In 2017, it was reported that there was 2.7 Zettabytes of data in the entire digital universe. This corresponds to 2,700,000,000,000 Gigabytes. The International Data Corporation predicts this value will grow to 175 Zettabytes by 2025.
These figures make it clear that the need for denser data storage is an inevitability. The answer to this problem, according to some scientists, may lie in our very genetic code.
DNA – or Deoxyribonucleic acid – stores our genetic data. DNA found in our bodies can store 215,000,000 gigabytes of data per gram – over one million times that of the average hard drive.
Nearly every cell in our bodies has DNA in it. It quite literally makes up who we are. The structure consists of two molecular chains that coil around each other to form a double helix.
Between them are ‘base pairs’. There are four bases: A, T, C and G. These are in a specific order along each chain and each base is ‘paired’ with a corresponding base on the opposite chain. The pairings are AT and CG.

This very coding is what scientists are hoping to utilise in data storage. Data is usually represented as a string of 1s and 0s – known as binary data. The basic principle is to convert these 1s and 0s into As, Ts, Cs and Gs. Each base can represent a string of two numbers. For example, A could represent 00, T could go to 01, C to 10 and G to 11. This code is then synthesised into an actual DNA molecule.
In 2012, a team led by George Church at Harvard Medical School encoded a 52,000-page book onto DNA fragments using this method.
Yaniv Erlich, a computer scientist at Columbia University notes another advantage of this method, “DNA won’t degrade over time like cassette tapes and CDs, and it won’t become obsolete,”. This refers to ‘disc rot’, a phenomenon where CDs and DVDs become unreadable over time due to chemical deterioration. Data stored in DNA, however, can almost last forever – or a few thousand years at the very least.
To retrieve the data, scientists perform a simple procedure known as a polymerase chain reaction (PCR). This is a chemical reaction where copies of the DNA chain are made. In addition to this, scientists can target certain parts of the chain instead of the entire set. The data is then sequenced, decoded and adjusted for errors.
Before this kind of technology enters the market, it will need to become cheaper. Sequencing the DNA remains prohibitively expensive.
Catalog, a new DNA-storage technology company is working on ways to bring down the cost. It aims to cheaply generate large chunks of DNA molecules of around 20-30 bases each. It then arranges these into large networks using enzymes. Instead of meticulously encoding each 0 or 1 using bases, the DNA ‘chunks’ can be arranged into the desired pattern.
Whilst it may sound like science fiction, this technology is very close to entering the market. Microsoft believe that their DNA storage technology will be available to the general public within the decade. So it might not be too long before your computer has strands of DNA inside it.