15.4.3 Bio-computing
One of the most important building blocks of life is the DNA (de-oxy-ribo-nucleic acid) molecule. Each individual cell of every living organism contains DNA. DNA is critical to all life because it stores the genetic information that describes how an organism works.
Our DNA represents all of the instructions, all of the data, that describe how to create a human and keep that person functioning. In other words, human DNA contains the “program” or blueprint for building and “running” a human being. Subtle variations in DNA determine the individual physical characteristics that differ from person to person, such as facial features, and eye, hair, and skin color. DNA also determines a person’s biological sex.
DNA is a double-helix-shaped molecule built from four kinds of subunits, or nucleotides, each consisting of about 25 atoms. The nucleotides are named for the bases they carry: adenine, thymine, guanine, and cytosine – usually abbreviated by their initials A, T, G, and C. Because of their chemical makeup, adenine always binds with thymine, and guanine with cytosine. For this reason A-T and G-C are referred to as the base pairs of DNA.
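The pairing rule can be written as a simple lookup table. The following is a minimal sketch; the names `PAIRS` and `complement` are illustrative choices, not standard terminology.

```python
# Pairing rules from the text: A binds with T, and G binds with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return, position by position, the base that pairs with each base of `strand`."""
    return "".join(PAIRS[base] for base in strand)

example = "ATGCCG"
print(complement(example))              # TACGGC
print(complement(complement(example)))  # ATGCCG (pairing is symmetric)
```

Because each base has exactly one partner, knowing one strand of the helix is enough to reconstruct the other.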
Human DNA consists of about 3.2 billion base pairs. Because each position along the molecule can hold one of four bases, each base pair represents two bits of data. A rather amazing consequence of this fact is that since 6.4 billion bits is only about 800 megabytes, a person’s entire genome is only slightly too large for a single CD (about 700 megabytes), and about sixty human genomes would fit on a 50 gigabyte Blu-ray disc. Think about that the next time you pop a disc into your Xbox or PlayStation. That game disc can hold far more information than your own biological blueprint.
On the other hand, the DNA molecule is a far more efficient data storage device than a Blu-ray disc. This is apparent when we note that nearly every one of the microscopic cells in our body contains a complete copy of our entire genome. Since each DNA base pair is composed of two 25 atom nucleotides, DNA uses about 50 atoms per base pair, or about 25 atoms per bit. DNA thus expresses the entire human genome using a total of about 160 billion atoms (3.2 billion base pairs x 50 atoms per base pair). While that may seem to be a large number of atoms, by the standards of our current computing and data storage technology, DNA is incredibly compact and efficient.
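The storage arithmetic can be checked with a short sketch. The number of bits per base pair is left as a parameter, which is an assumption of this sketch, as are the function and variable names. With four possible bases at each position, one position can distinguish four values, which corresponds to 2 bits.

```python
BASE_PAIRS = 3_200_000_000   # approximate size of the human genome (figure from the text)
ATOMS_PER_NUCLEOTIDE = 25    # approximate figure used in the text

def genome_storage(bits_per_base_pair: int) -> dict:
    """Rough storage figures for one genome under the given bits-per-base-pair assumption."""
    total_bits = BASE_PAIRS * bits_per_base_pair
    megabytes = total_bits / 8 / 1_000_000          # 8 bits per byte, 10^6 bytes per megabyte
    atoms_per_base_pair = 2 * ATOMS_PER_NUCLEOTIDE  # two nucleotides form one base pair
    return {
        "megabytes": megabytes,
        "genomes_per_50gb_disc": int(50_000 // megabytes),
        "atoms_per_bit": atoms_per_base_pair / bits_per_base_pair,
        "total_atoms": BASE_PAIRS * atoms_per_base_pair,
    }

figures = genome_storage(bits_per_base_pair=2)
print(figures["megabytes"])              # 800.0
print(figures["genomes_per_50gb_disc"])  # 62
print(figures["atoms_per_bit"])          # 25.0
print(figures["total_atoms"])            # 160000000000
```

Note that the total atom count does not depend on the bits-per-base-pair assumption; only the size in megabytes and the atoms-per-bit figure do.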
The field of biological computing (or biocomputing) attempts to harness the incredible information processing and storage capabilities of biology in order to construct computing systems. One of the most noteworthy early achievements in the field occurred in 1994 when Leonard Adleman showed how DNA could be used to solve a very difficult combinatorial search problem. In his experiment, Adleman used biological reactions to solve a small instance of the Hamiltonian path problem.
The Hamiltonian path problem involves finding a path through a graph that visits every node in that graph exactly once. This problem can be envisioned by thinking of a map of towns connected by roadways. You want to find a route that visits each of the towns exactly once. You may start at any town and end at any other, but you must visit each town on the map exactly once. While you might think solving such a problem is easy, the requirements that you can’t skip any of the towns or visit any of them more than once make this problem quite tough.[10]
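To make the problem concrete, here is a minimal brute-force sketch that tries every possible ordering of the towns. The function name `hamiltonian_path` and the example map are illustrative assumptions. Roads are treated as two-way, as on an ordinary map.

```python
from itertools import permutations

def hamiltonian_path(towns, roads):
    """Return one route that visits every town exactly once using only
    existing roads, or None if no such route exists (brute force)."""
    connected = set()
    for a, b in roads:
        connected.add((a, b))
        connected.add((b, a))  # roads can be driven in either direction
    for route in permutations(towns):
        # A route is valid if every consecutive pair of towns shares a road.
        if all((route[i], route[i + 1]) in connected for i in range(len(route) - 1)):
            return list(route)
    return None

towns = ["A", "B", "C", "D"]
roads = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
print(hamiltonian_path(towns, roads))  # ['A', 'B', 'C', 'D']
```

The number of orderings grows as n factorial: 4 towns give 24 candidate routes, while 20 towns give more than two quintillion. That growth is what makes the problem tough for exhaustive search.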
Adleman’s approach exploited the massive parallelism found in conventional biochemistry. While biochemical reactions take place much more slowly than electronic computations, conventional biochemistry works on many billions of molecules at the same time. When each of these chemical reactions can be made to represent the search for a solution to a problem, the result is massive parallelism – searching many billions of options, all at the same time. Building on Adleman’s basic approach, researchers have shown that a range of computing problems can be mapped to DNA and solved using biological reactions. However, this approach required that the problems be set up by hand, mapped to the appropriate biochemical regime, the required chemical reactions carried out in a test tube, and then the results interpreted.
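The generate-and-filter idea behind this kind of search can be imitated in software. The sketch below is only an analogy: it builds candidate paths one at a time rather than in parallel, it does not model any chemistry, and the graph, function names, and sample count are all assumptions made for illustration. It uses one-way connections and a fixed start and end node.

```python
import random

def random_walk(edges_from, length, rng):
    """Build one candidate path by following randomly chosen connections."""
    node = rng.choice(sorted(edges_from))
    walk = [node]
    while len(walk) < length and edges_from[node]:
        node = rng.choice(edges_from[node])
        walk.append(node)
    return tuple(walk)

def generate_and_filter(edges_from, start, end, samples=10_000, seed=0):
    """Generate many random candidates, then discard every one that is not a
    path from `start` to `end` visiting each node exactly once."""
    rng = random.Random(seed)
    n = len(edges_from)
    candidates = {random_walk(edges_from, n, rng) for _ in range(samples)}
    survivors = [
        w for w in candidates
        if len(w) == n            # right number of nodes
        and w[0] == start         # begins at the start node
        and w[-1] == end          # finishes at the end node
        and len(set(w)) == n      # no node visited twice
    ]
    return sorted(survivors)

# Hypothetical one-way graph: each key lists the nodes reachable from it.
graph = {0: [1, 2], 1: [2, 3], 2: [1, 3], 3: []}
print(generate_and_filter(graph, start=0, end=3))
```

In this sketch the candidates are produced blindly and the wrong ones are thrown away afterwards. In a test tube, the corresponding generation step happens across billions of molecules simultaneously, which is where the parallelism comes from.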
A more recent approach to biocomputing focuses on building general purpose biological computers. Such computers would be able to store, transmit, and manipulate data. The fundamental building blocks of general purpose biological computers would be DNA and RNA, and as such these systems are often referred to as DNA computers. DNA computers would be constructed using the techniques of synthetic biology, which is defined as the design and construction of biological devices and systems for useful purposes.[11]
On the data storage front, in January 2013 a team of scientists from the European Bioinformatics Institute reported that they had successfully encoded about ¾ of a megabyte (750 kilobytes) of data in DNA – and later retrieved the data. The stored data included: a number of text files, a PDF, a JPEG color photograph, and an MP3 of a portion of Dr. Martin Luther King’s famous “I Have a Dream” speech.[12] While significant cost and speed barriers remain, DNA possesses many positive characteristics for use as a long-term storage medium, including the fact that it is highly compact and tends to be quite stable over time.
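The basic idea of writing digital data as a sequence of bases can be illustrated with a minimal sketch. The mapping below (A=00, C=01, G=10, T=11) is an assumption chosen for simplicity and is not the scheme the European Bioinformatics Institute team used; practical schemes add redundancy and error handling. The function names are likewise illustrative.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11

def encode(data: bytes) -> str:
    """Turn each byte into four bases, two bits per base, high bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )

def decode(strand: str) -> bytes:
    """Reverse of encode: read four bases at a time and rebuild each byte."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

message = b"Hi"
strand = encode(message)
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Under this mapping every byte becomes exactly four bases, so 750 kilobytes of data would correspond to about three million bases before any redundancy is added.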
If the various obstacles to constructing DNA computers can be overcome, the practical benefits could be enormous. While DNA computers would be slow compared to today’s electronic computers, they would be incredibly small, energy efficient, and highly compatible with existing biological systems. In fact, DNA computers can, in a very real sense, be thought of as molecular level computers – since biological systems ultimately operate on molecules such as DNA and RNA.
Imagine a world where we could reprogram cellular biology. DNA computers operating inside living cells could potentially detect and then kill cancer cells, or be used to control insulin levels in diabetic patients. They have the potential to usher in a world of medicine at the cellular level.
Progress towards this dream is being made. In March 2013 a team of bioengineers at Stanford led by Drew Endy announced the creation of the first “biological transistor” from DNA and RNA, called a transcriptor (Fig 15.12). Endy’s team illustrated how transcriptors could be used to implement AND, OR, NAND, NOR, and other basic logic gates using a system they call Boolean Integrase Logic gates, or BIL gates.[13] Such transcriptor-based BIL gates could be used to activate or deactivate the expression of particular genes under program control. While this exciting field is still in its early days, it is well worth watching, as its impact could be truly enormous.
Figure 15.12 Video describing transcriptor logic
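The logical behavior of the gates named above can be summarized as truth tables. The sketch below shows only the Boolean logic, with a gate's output read as "gene expressed" or "not expressed"; it does not model how transcriptors work biochemically. The names `GATES`, `gene_expressed`, `signal_a`, and `signal_b` are hypothetical.

```python
# Boolean behavior of four basic two-input gates.
GATES = {
    "AND":  lambda a, b: a and b,
    "OR":   lambda a, b: a or b,
    "NAND": lambda a, b: not (a and b),
    "NOR":  lambda a, b: not (a or b),
}

def gene_expressed(gate: str, signal_a: bool, signal_b: bool) -> bool:
    """Return True if the gate's output would switch the gene on."""
    return GATES[gate](signal_a, signal_b)

# Print each gate's output for inputs (0,0), (0,1), (1,0), (1,1).
for name in GATES:
    row = [int(gene_expressed(name, a, b)) for a in (False, True) for b in (False, True)]
    print(name, row)
# AND [0, 0, 0, 1]
# OR [0, 1, 1, 1]
# NAND [1, 1, 1, 0]
# NOR [1, 0, 0, 0]
```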
Footnotes
[10] The Hamiltonian path problem, named after William Rowan Hamilton, is an example of an NP-complete problem – a problem for which all known algorithms require exponential time in the worst case to find a solution, but for which a proposed solution can be verified quite rapidly (in polynomial time).
[11] “Adventures in Synthetic Biology” by Drew Endy, Isadora Deese, and Chuck Wadey provides an accessible introduction to synthetic biology in the form of a web comic.
[12] http://www.sciencenews.org/view/generic/id/347702/description/DNA_stores_poems_a_photo_and_a_speech
[13] Yes, this is a bad pun on Bill Gates’s name. Become a respected scientist and one day you too might be able to name your fundamental breakthrough something silly. ☺