Public collections of DNA & RNA sequence data reach 100 gigabases

International collaboration produces this scientific milestone

NIH/National Library of Medicine

For nearly 20 years, the three leading public repositories for DNA and RNA sequence data have collaborated to provide access to the ever-increasing amount of genetic data produced by institutions around the world. The three repositories have now reached a significant milestone by collecting and disseminating 100 gigabases of sequence data. For a frame of reference, one hundred billion bases is about equal to the number of nerve cells in a human brain and a bit less than the number of stars in the Milky Way.

These 100,000,000,000 bases, or "letters" of the genetic code, represent both individual genes and partial and complete genomes of over 165,000 organisms. While a single gene from organisms as diverse as humans, elephants, earthworms, fruit flies, apple trees, and bacteria can range from fewer than one hundred to several thousand bases long, an organism's genome can be longer than one billion bases. Free access to this information allows scientists to study and compare the same data as their colleagues nearly anywhere in the world, and makes possible collaborative research that will ultimately lead to cures for diseases and improved health.

Thanks to their data exchange policy, the three members of the International Nucleotide Sequence Database Collaboration all reached this milestone together: GenBank (Bethesda, Maryland, USA), the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-Bank in Hinxton, UK), and the DNA Data Bank of Japan (Mishima, Japan).

GenBank is maintained by the National Center for Biotechnology Information (NCBI), a part of the National Library of Medicine, National Institutes of Health. Submitters to GenBank currently contribute over 3 million new DNA sequences per month to the database. More information about GenBank may be found on the NCBI Web site.

David Lipman, Director of the National Center for Biotechnology Information, commented that "Today's nucleotide sequence databases allow researchers to share completed genomes, the genetic make-up of entire ecosystems, and sequences associated with patents. The International Nucleotide Sequence Database Collaboration (INSDC) has realized the vision of the researchers who initiated the sequence database projects by making the global sharing of nucleotide sequence information possible."

Graham Cameron, Associate Director of EMBL's European Bioinformatics Institute, added: "This is an important milestone in the history of the nucleotide sequence databases. From the first EMBL Data Library entry made available in 1982 to today's provision of over 55 million sequence entries from at least 200,000 different organisms, these resources have anticipated the needs of molecular biologists and addressed them, often in the face of a serious lack of resources." More information about EMBL-Bank is available on the Web.

Takashi Gojobori, Director of the Center for Information Biology and DNA Data Bank of Japan, said: "The INSDC has laid the foundations for the exchange of many types of biological information. As we enter the era of systems biology and researchers begin to exchange complex types of information such as the results of experiments that measure the activities of thousands of genes, or computational models of entire processes, it is important to celebrate the achievements of the three databases that pioneered the open exchange of biological information." More information about the DNA Data Bank of Japan is available on the Web.


In the late 1970s, as researchers started to study organisms at the level of their genetic code, several groups began to explore the possibility of developing a public repository for sequence information. In the early 1980s this led to the launch of two databases: the first was the EMBL Data Library, based at the European Molecular Biology Laboratory in Heidelberg, Germany (the Data Library is now known as EMBL-Bank and is based at EMBL's European Bioinformatics Institute, Hinxton, UK). Hot on its heels came GenBank, initially hosted by the Los Alamos National Laboratory and now based at the National Center for Biotechnology Information, Bethesda, Maryland, USA. By the time the International Nucleotide Sequence Consortium became formalized in February 1987, a third partner, the DNA Data Bank of Japan, had been launched at the National Institute of Genetics in Mishima, and collaborated with its European and US counterparts right from the start.

Much has changed since the days when sequences were manually keyed in from the literature or sent on floppy disc and distributed to users on 9-track magnetic tapes, but the purpose of the databases (to make every nucleotide sequence in the public domain freely available to the scientific community as rapidly as possible) remains as strong now as it was in the beginning.

About NCBI:
The National Center for Biotechnology Information is part of the National Library of Medicine. Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease. NCBI is host to the GenBank nucleotide sequence database.

The National Library of Medicine, the world's largest library of the health sciences, is a component of the National Institutes of Health, U.S. Department of Health and Human Services.
