In recent years, researchers have used DNA to encode everything from an operating system to malware. Rather than being a technological curiosity, these efforts were serious attempts to exploit the properties of DNA for long-term storage of data. DNA can remain chemically stable for hundreds of thousands of years, and we
But so far, writing data to DNA has involved converting the data to a series of bases on a computer and then ordering that sequence from a company that runs a chemical synthesizer; living things never actually enter the picture. Separately, though, a group of researchers had figured out how to record biological events by altering a cell’s DNA, allowing them to read out the cell’s history. A team at Columbia University has now figured out how to combine the two and write data to DNA using voltage differences applied to living bacteria.
CRISPR and data storage
The CRISPR system is best known as a way to edit genes or cut them out of DNA entirely. But it first came to biologists’ attention because it inserts new sequences into DNA. For full details, see our Nobel coverage, but for now you just need to know that part of the CRISPR system involves identifying DNA from viruses and inserting copies of it into the bacterial genome, so the cell can recognize the virus if it ever returns.
The Columbia group had figured out how to use this to record memories in bacteria. Let’s say you have a process that activates genes in response to a specific chemical, like a sugar. The researchers rewired this so that it also activated a system that makes copies of a circular piece of DNA called a plasmid. When the plasmid’s copy number was high, they activated the CRISPR system. Under those circumstances, it was most likely to insert a copy of the plasmid DNA into the genome. When the sugar wasn’t present, it would generally insert something else.
Using this system, it was possible to tell whether a bacterium had been exposed to the sugar at some point in its past. It’s not perfect, as the CRISPR system doesn’t always insert something when you want it to, but it works on average. So you simply need to sequence enough bacteria to work out the average sequence of events.
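Why averaging over a population works can be shown with a toy simulation. This is a minimal sketch, not the researchers’ model: every probability in it (how often CRISPR inserts anything at all, and how often the insert is plasmid-derived with or without the sugar) is a hypothetical value chosen for illustration.

```python
import random


def cell_has_plasmid_spacer(exposed: bool, rng: random.Random,
                            p_acquire: float = 0.3,             # hypothetical
                            p_plasmid_if_exposed: float = 0.9,  # hypothetical
                            p_plasmid_if_not: float = 0.05) -> bool:  # hypothetical
    """Simulate one cell: did it end up carrying a plasmid-derived insert?"""
    if rng.random() >= p_acquire:
        return False  # CRISPR inserted nothing, so this cell holds no record
    # High plasmid copy number (sugar present) makes a plasmid insert likely.
    p = p_plasmid_if_exposed if exposed else p_plasmid_if_not
    return rng.random() < p


def fraction_with_plasmid_spacer(exposed: bool, n_cells: int, seed: int = 0) -> float:
    """'Sequence' a whole population and report the fraction carrying the insert."""
    rng = random.Random(seed)
    hits = sum(cell_has_plasmid_spacer(exposed, rng) for _ in range(n_cells))
    return hits / n_cells


if __name__ == "__main__":
    print("exposed:  ", fraction_with_plasmid_spacer(True, 10_000))
    print("unexposed:", fraction_with_plasmid_spacer(False, 10_000))
```

With these made-up numbers, most individual exposed cells carry no record at all, so looking at any single cell tells you little. Across thousands of cells, though, the exposed population shows a plasmid-insert fraction roughly an order of magnitude higher than the unexposed one, which is the sense in which the system "works on average."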
To adapt this to data storage, the researchers used two plasmids. One is the same as described above: present at low levels when a specific signal is absent, and present at very high levels when the signal is around. The other is always present at moderate levels. When CRISPR was activated, it tended to insert sequences from whichever plasmid was present at higher levels, as shown in the diagram below.
On its own, this records only a single bit. But the process can be repeated, creating a stretch of DNA made up of a series of inserts derived from the red and blue plasmids, with the identity of each insert determined by whether the signal was present or not.
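The two-plasmid recorder can be sketched as a loop that appends one insert per round. This is a minimal illustration under stated assumptions: the labels `R` (signal-responsive, "red") and `B` (constant, "blue"), the insertion and "dominant plasmid wins" probabilities, and the simple majority-vote `consensus` helper are all hypothetical. The researchers actually decoded with a supervised learning model, not this vote. The array is stored newest-first because CRISPR arrays typically gain new spacers at one end (the leader end).

```python
import random
from typing import List

SIGNAL = "R"     # hypothetical label: insert from the signal-responsive ("red") plasmid
REFERENCE = "B"  # hypothetical label: insert from the constant ("blue") plasmid


def record(bits: List[int], rng: random.Random,
           p_insert: float = 1.0, p_dominant: float = 0.9) -> List[str]:
    """Write bits into one cell's array, newest insert first (probabilities hypothetical)."""
    array: List[str] = []
    for bit in bits:
        if rng.random() >= p_insert:
            continue  # CRISPR inserted nothing this round
        # Whichever plasmid has the higher copy number usually supplies the insert.
        dominant, minor = (SIGNAL, REFERENCE) if bit else (REFERENCE, SIGNAL)
        spacer = dominant if rng.random() < p_dominant else minor
        array.insert(0, spacer)  # new inserts go at the front of the array
    return array


def read(array: List[str]) -> List[int]:
    """Turn a newest-first array back into bits in the order they were written."""
    return [1 if s == SIGNAL else 0 for s in reversed(array)]


def consensus(arrays: List[List[str]], n_bits: int) -> List[int]:
    """Naive position-wise majority vote over cells that recorded every round."""
    complete = [read(a) for a in arrays if len(a) == n_bits]
    return [int(2 * sum(col) > len(complete)) for col in zip(*complete)]


if __name__ == "__main__":
    rng = random.Random(1)
    cells = [record([1, 0, 1], rng, p_insert=0.8) for _ in range(1000)]
    print(consensus(cells, 3))
```

Throwing away every cell whose array is too short is the crudest possible way to deal with missed inserts; it wastes data but keeps the vote aligned.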
To give it a shock
It’s a nice system, but it’s pretty far from the kind of thing we usually associate with data: the output of a sensor reading or a calculation is rarely a sugar or an antibiotic mixed in with a lot of bacteria. Getting bacteria to respond to an electrical signal turned out to be relatively simple, though. E. coli can alter the activity of genes depending on whether it is in an oxidizing or reducing chemical environment, and the researchers were able to change that environment by applying voltage differences to a particular chemical in the culture with the bacteria.
More specifically, the voltage difference would change the oxidation state of a chemical called ferrocyanide. This in turn caused the bacteria to change the activity of those genes. By building the plasmid to respond to the same signal as these genes, the researchers were able to control the plasmid’s levels using different voltages. And they could then record that plasmid’s level by activating the CRISPR system in these cells.
It’s pretty easy to see how each of the inserts in a series can be considered a zero or a one, depending on which plasmid it came from. But remember, this system isn’t perfect; fairly regularly, CRISPR would fail to insert anything when activated, shifting all the subsequent bits. Since this process is random, the longer the string of bits you try to encode, the more likely it is that at least one of them will end up skipped.
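Two consequences of a skipped insert can be made concrete. First, a reader that doesn’t know a round was skipped lines the surviving inserts up against the wrong positions. Second, if each round independently fails with probability s, the chance that at least one of n rounds fails is 1 − (1 − s)^n. The sketch below assumes independent skips and uses a hypothetical skip rate of 0.1; neither comes from the study.

```python
from typing import List


def read_after_skip(bits: List[int], skipped: int) -> List[int]:
    """What a reader sees if the insert for round `skipped` never happened."""
    return bits[:skipped] + bits[skipped + 1:]


def misread_positions(bits: List[int], skipped: int) -> List[int]:
    """Positions that come out wrong when the reader assumes nothing was skipped."""
    observed = read_after_skip(bits, skipped)
    wrong = [i for i, (a, b) in enumerate(zip(bits, observed)) if a != b]
    return wrong + [len(bits) - 1]  # the final position is simply missing


def p_any_skip(skip_rate: float, n_bits: int) -> float:
    """Chance that at least one of n independent rounds inserted nothing."""
    return 1.0 - (1.0 - skip_rate) ** n_bits


if __name__ == "__main__":
    print(misread_positions([1, 0, 1, 0, 1, 0], skipped=0))
    for n in (3, 10, 30):
        print(n, round(p_any_skip(0.1, n), 3))
```

With a 10 percent skip rate, a 3-bit record survives intact about 73 percent of the time, while a 10-bit record does so only about 35 percent of the time. That gap is the motivation for keeping each population’s record short.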
To limit this problem, the researchers kept their data to three bits per bacterial population. Even then, they had to train a supervised learning algorithm to reconstruct the most probable series of bits based on an average of the sequences found in the population. And even with that, the system failed to identify the series of bits correctly about six percent of the time. In the end, they made the third bit a parity bit, set from the sum of the first two, to allow error correction, and then encoded lots of populations in parallel.
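One way a "two data bits plus parity" word can be checked and partially repaired is sketched below. This reads "the sum of the first two" as the sum modulo 2, which is an assumption. A single parity bit can reveal that one bit is wrong without saying which, but it can fill in one bit that is known to be missing. The helper that splits ASCII text into 2-bit chunks is purely hypothetical and is not how the researchers actually laid out their message.

```python
from typing import List, Optional, Tuple

Word = Tuple[int, int, int]


def encode(b1: int, b2: int) -> Word:
    """Two data bits plus a parity bit (assumed: sum of the data bits mod 2)."""
    return (b1, b2, (b1 + b2) % 2)


def parity_ok(word: Word) -> bool:
    """A valid word always has an even number of ones."""
    return sum(word) % 2 == 0


def recover(word: List[Optional[int]]) -> Optional[Word]:
    """Fill in at most one unreadable bit (None); return None if that's impossible."""
    missing = [i for i, b in enumerate(word) if b is None]
    if not missing:
        # Parity can detect a single flipped bit but cannot locate it.
        return tuple(word) if parity_ok(tuple(word)) else None
    if len(missing) > 1:
        return None
    filled = list(word)
    filled[missing[0]] = sum(b for b in word if b is not None) % 2
    return tuple(filled)


def message_to_words(text: str) -> List[Word]:
    """Hypothetical layout: ASCII bits split into 2-bit chunks, one word per population."""
    bits = [int(b) for ch in text.encode("ascii") for b in format(ch, "08b")]
    return [encode(bits[i], bits[i + 1]) for i in range(0, len(bits), 2)]


def words_to_message(words: List[Word]) -> str:
    """Drop the parity bits and reassemble the ASCII bytes."""
    bits = [b for w in words for b in w[:2]]
    data = bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8))
    return data.decode("ascii")
```

Under this layout, a 12-character message such as "Hello World!" needs 48 separately encoded populations, which gives a sense of why the researchers ran so many in parallel.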
(By giving each population’s plasmids a unique sequence tag called a “barcode,” it was possible to mix many of them into a single population after the bits were encoded and still sort everything out when the DNA was sequenced.)
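Sorting pooled sequencing reads back into their original populations is a lookup on the barcode. This minimal sketch assumes, hypothetically, that each read begins with a fixed-length barcode that must match exactly; real pipelines usually tolerate some sequencing errors in the tag. The barcodes and reads in the usage example are made up.

```python
from collections import defaultdict
from typing import Dict, List


def demultiplex(reads: List[str], barcodes: Dict[str, str],
                bc_len: int) -> Dict[str, List[str]]:
    """Group reads by population using an exact-match prefix barcode (assumed layout)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        population = barcodes.get(read[:bc_len])
        if population is not None:  # reads with unknown tags are discarded
            groups[population].append(read[bc_len:])
    return dict(groups)


if __name__ == "__main__":
    tags = {"ACGT": "population_1", "TTGA": "population_2"}  # hypothetical barcodes
    pooled = ["ACGTAAAA", "TTGACCCC", "ACGTGGGG", "NNNNTTTT"]
    print(demultiplex(pooled, tags, bc_len=4))
```

Once the reads are grouped, each population’s inserts can be decoded on their own, as if they had never been mixed.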
With everything in place, they successfully saved and read “Hello World!” They even put the bacteria in some potting soil for a week and showed that they were able to recover the message. (Freezer storage obviously works better.) They estimate that the message can be preserved for at least 80 generations of bacteria.
Let’s be clear: as a storage medium in its current form, this is pretty awful. If you wanted to put some data into DNA, you’d be much better off getting the DNA chemically synthesized. But it’s exciting to think that we could go straight from electrical signals to altered DNA, and there may be ways to improve the system now that it exists.
Nature Chemical Biology, 2021. DOI: 10.1038/s41589-020-00711-4