A Computer Scientist's guide to cell biology (2007) [pdf] (cmu.edu)
142 points by nafizh on Jan 25, 2016 | 13 comments


This is very nicely written, and looks like an extremely useful resource for those interested in entering the bioscience field. I would offer one word of caution, which I think is especially pertinent to those coming from a CS background (and which, I should add, is not about something this book particularly does, as far as I have read).

Historically and practically, biology as a field has often represented processes and events in a highly linearized and well-defined fashion. This is extremely attractive for a number of reasons. As one key example, we (as humans) remember through narrative, and so the construction of a Rube Goldberg-type description of a process is often a useful technique for easy recall of complicated information ("A hits B which winds up C and switches on D etc...").

The other reason is that many experiments would indeed suggest a linear pathway if that were all one looked for. There is often a clear-cut progression of information signals or metabolic intermediates moving from one state to another through well-defined intermediates, so if that is what you are looking for, you will find it.

In this representation, many biological processes are highly analogous to computer programs that perform some task: you have an input and, through functional manipulation, generate an output.

The reality, as has been uncovered over the last 15-20 years, is that most of these processes and events are not linear pathways. They are wildly heterogeneous networks that integrate information spanning a range of temporal and spatial scales. The non-equilibrium nature of the cell means that, to a certain extent, everything is coupled to everything else through interactions whose coupling coefficients themselves depend on everything else.
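A toy sketch of that distinction, assuming a made-up three-species model (the species A, B, C, the rate constants, and the repression term `u / (1 + C/K)` are all hypothetical, not taken from the book): in a purely feed-forward chain, changing how fast the downstream product C is cleared leaves the upstream species A untouched, whereas a single feedback link from C back to A's production makes A depend on a parameter two steps away.

```python
# Toy comparison: a feed-forward chain A -> B -> C versus the same chain with
# C feeding back to repress production of A. All constants are hypothetical.

def simulate(k_c, feedback, u=1.0, k_a=1.0, k_b=1.0, K=1.0, dt=0.01, t_end=100.0):
    """Integrate the three-species model with forward Euler; return (A, B, C) at t_end."""
    a = b = c = 0.0
    for _ in range(int(t_end / dt)):
        # Production of A: constant input u, optionally repressed by C.
        production = u / (1.0 + c / K) if feedback else u
        da = production - k_a * a   # A is made, then converted/removed
        db = k_a * a - k_b * b      # B is made from A
        dc = k_b * b - k_c * c      # C is made from B and cleared at rate k_c
        a, b, c = a + dt * da, b + dt * db, c + dt * dc
    return a, b, c


for feedback in (False, True):
    a_fast, _, _ = simulate(k_c=1.0, feedback=feedback)
    a_slow, _, _ = simulate(k_c=0.5, feedback=feedback)
    label = "feedback" if feedback else "feed-forward"
    print(f"{label:12s}  A(k_c=1.0)={a_fast:.3f}  A(k_c=0.5)={a_slow:.3f}")
```

With these made-up constants, the feed-forward A settles at u/k_a = 1 whichever k_c is used, while with feedback it drops from about 0.618 to 0.5 when k_c is halved. Forward Euler keeps the sketch dependency-free; a real model would more likely use an adaptive ODE solver, and real cellular networks contain many such loops at once.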

The reason I bring this up is that it's very tempting to find analogies between CS and biology (DNA = hard drive, RNA = memory, etc.). The problem is that we (again, as humans) designed the underlying computer architecture, whereas we're only scratching the surface of biological complexity. By assuming that some mapping from CS to biology exists, we risk convincing ourselves that we understand the biology better than we do, or making assumptions regarding how the biology may or may not work.

Clearly, this kind of description can be used early on, but it's important to recognize that these analogies should be viewed as broad-brush stroke descriptors and not functional ones.


+1 This is a nice reminder of how "organic" this domain is. I agree; even if I read this cover to cover, I wouldn't be ready to nail a bioinformatics job interview.

As an armchair biologist (among other things), I find this paper a great next level of detail beyond the pop-sci knowledge that "DNA is the Program." Indeed, Wikipedia's illustration of the workings of a ribosome takes steps towards your point, @alexholehouse. It's jagged and sloppy, and while it appears clock-like in its machinations, one must immediately ask how that could be anything but an oversimplification. Are these the workings of the computer that interprets DNA's "program"?

https://en.wikipedia.org/wiki/Ribosome#/media/File:Protein_t...
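For contrast with that clock-like picture, here is roughly what the naive "ribosome as interpreter" model amounts to, as a deliberately oversimplified sketch: read an mRNA three bases at a time from the first AUG and look each codon up in a table. Only a handful of codons from the standard genetic code are included, and `translate` is a hypothetical helper, not any library's API. Everything the thread is warning about (initiation and elongation factors, tRNA availability, regulation, folding) is absent.

```python
# The "clock-like" view of translation: a lookup table plus a loop.
# Only a few codons of the standard genetic code are listed, for brevity.
CODON_TABLE = {
    "AUG": "M",              # methionine (also the usual start codon)
    "UUU": "F", "UUC": "F",  # phenylalanine
    "GGC": "G",              # glycine
    "AAA": "K",              # lysine
    "GCU": "A",              # alanine
    "UGG": "W",              # tryptophan
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def translate(mrna):
    """Translate from the first AUG until a stop codon; unknown codons raise KeyError."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break  # stop codon ends the chain
        protein.append(residue)
    return "".join(protein)


print(translate("GGAUGUUUGGCAAAUGGUAAGC"))  # MFGKW
```

The whole "interpreter" fits in a dozen lines precisely because it leaves out the machinery that takes a doctoral thesis per movement to understand.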

A final point: it's sobering to watch this and think that every movement of these proteins represents at least one doctoral thesis' worth of work. Although we have amazing ways of seeing these microscopic actions at play, we don't exactly have debuggers, REPLs or profilers that let us observe cells without disturbing them. Messy stuff.


I don't agree that people have long thought of these as linear pathways. My undergraduate education was more than 20 years ago, and even then, beyond the "Biochemical Reaction Pathways" chart on my wall (http://web.expasy.org/pathways/; the chart has been around for 40 years), which clearly shows things as a graph structure, it was commonly taught that these were complex networks regulated at many points.


I'm sure you mean well, but you're just saying the same thing the author of the document has already said.

>we risk convincing ourselves that we understand the biology better than we do, or making assumptions regarding how the biology may or may not work.

Okay, so the risk is non-zero. That doesn't tell us anything, though. Our scientific progress hasn't halted because high school students are convinced they know all there is to know after scoring high on a test of a simplified model of reality.

Every formal method of instruction uses simplified models to ground students' understanding of these concepts. All exams are graded on students "knowing" something that's much more complicated in reality.

>Clearly, this kind of description can be used early on, but it's important to recognize that these analogies should be viewed as broad-brush stroke descriptors and not functional ones.

Analogies are always about broad brushes. All analogies break down at some level, because past that point you would just describe the complex idea itself rather than use the analogy.


Yeah, I completely agree. I did bioinformatics research for a couple of years after grad school, before I went to industry. It's going to take many, many years before enough basic research is done to prove that you're right (or possibly wrong!), and that it's not just the current moment that matters but also the last minute, hour, day, week, or month. When signalling concentrations are very low and don't decay instantly, you've effectively made a "memory" that will persist for some time after the initial stimulus.
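A minimal sketch of that "memory" effect, assuming the simplest possible model (hypothetical, not taken from the comment): a signalling molecule S produced at rate `p` while a stimulus is on, and removed by first-order decay at rate `k`. Once the stimulus stops, S decays as S_end * exp(-k * t), so it stays above a detection threshold `theta` for ln(S_end / theta) / k time units; slower decay means longer memory.

```python
import math


def memory_analytic(p, k, t_on, theta):
    """Time the signal stays above theta after a stimulus of length t_on ends.

    Assumed model: dS/dt = p * stim(t) - k * S, S(0) = 0,
    with stim = 1 during [0, t_on) and 0 afterwards.
    """
    s_end = (p / k) * (1.0 - math.exp(-k * t_on))  # concentration when the stimulus stops
    if s_end <= theta:
        return 0.0  # never crossed the threshold, so no memory
    return math.log(s_end / theta) / k  # solve s_end * exp(-k * t) = theta for t


def memory_simulated(p, k, t_on, theta, dt=1e-3, t_end=100.0):
    """Same quantity estimated by forward-Euler integration, as a cross-check."""
    s, t, last_above = 0.0, 0.0, None
    while t < t_end:
        stim = 1.0 if t < t_on else 0.0
        s += dt * (p * stim - k * s)
        t += dt
        if t >= t_on and s > theta:
            last_above = t  # latest time the signal was still detectable
    return 0.0 if last_above is None else last_above - t_on


for k in (1.0, 0.1):  # fast vs slow decay of the signalling molecule
    print(k, memory_analytic(1.0, k, 1.0, 0.05), memory_simulated(1.0, k, 1.0, 0.05))
```

With p = 1, a one-unit stimulus and theta = 0.05, cutting k from 1.0 to 0.1 stretches the memory from roughly 2.5 to roughly 29 time units, even though the slowly decaying signal never gets much higher.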

I think that to truly understand how things work, people are going to have to simulate many different types of cells from first principles (i.e., the physics). Even though 99.99999% of the compute time will be wasted checking all kinds of interactions that turn out to be unimportant, that's what will find the rare interactions that really matter.

The combinatorics of what goes on inside cells are truly staggering.
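To put rough numbers on that combinatorial growth, here is an illustrative count of candidate interactions among n molecular species, using Python's `math.comb`. The species counts are arbitrary, not real cell inventories, and the count ignores concentrations, localization, timing and modifications, so if anything it understates what a first-principles simulation would have to screen.

```python
from math import comb


def interaction_counts(n_species):
    """Count candidate interactions among n_species species (illustrative only)."""
    return {
        "pairs": comb(n_species, 2),    # unordered two-species interactions
        "triples": comb(n_species, 3),  # unordered three-species complexes
        "subsets": 2 ** n_species,      # every combination of species present/absent
    }


# Arbitrary sizes chosen for illustration, not measured numbers of species in a cell.
for n in (10, 100, 1000):
    c = interaction_counts(n)
    print(f"n={n}: pairs={c['pairs']}, triples={c['triples']}, "
          f"subsets has {len(str(c['subsets']))} digits")
```

Even at n = 1000 there are 499,500 pairs and 166,167,000 triples, and the number of subsets grows far faster than either.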


To be fair to the book author, on page 22 he writes, "often there will be complex non-linear relationships between the parts of a biological chemical pathway."

Still, I found alex's comment very informative, a good addition to the book on that topic.


I'll also add that studying the brain, and now this, gets me thinking about general-purpose analog computing. We need an exponential increase in research on that compared with what we have now. The brain is analog. Cells are analog. The few general-purpose analog machines and digital-analog hybrids that have been built outperform their digital counterparts. So we need to start looking at continuous control and harmonic systems in nature from an analog modeling perspective, since some great circuits and capabilities might come out of it.

Materials science is already way ahead here, copying nature left and right. My favorite example, which I found just last night:

https://www.youtube.com/watch?v=mEH6tDLKcVU


If you like this book, you will probably also like the gold standard for cellular biology: Molecular Biology of the Cell (http://www.amazon.com/Molecular-Biology-Cell-5th-Edition/dp/...).

It costs more, but it's worth it. It's a deeply informative book that covers a broad range of topics, and you can read it without much background knowledge in biology. It's probably the same book your doctor or a bioinformatics professional used in school to learn cellular biology.


Does something (besides marketing) make this book especially relevant for computer scientists? I mean, more so than ordinary biology courses, Wikipedia, etc.


I work at a biotech company, and what I've read so far has been a very well-written overview. Really great writeup.


I've recently taken a very strong interest in biotech, but my majors don't leave me much time to study chemistry or biology in any significant way. In biotech R&D, roughly what proportion of the work is CS versus biology? Is there little enough biology that I could get away with teaching it to myself?


It depends on the company and its product or service; the level of knowledge you need varies. I'd say you could totally teach yourself enough for most roles.


This is not bad. I should probably hand one of these to every computer scientist I work with who is contemplating biological problems.

I think having it focus on cell biology, rather than just the information systems, is worthwhile.



