The world's three leading public repositories for DNA and RNA sequence information have reached 100 gigabases [100,000,000,000 bases; the 'letters' of the genetic code] of sequence. Thanks to their data exchange policy, which has paved the way for the global exchange of many types of biological information, the three members of the International Nucleotide Sequence Database Collaboration [INSDC, www.insdc.org] – EMBL Bank [Hinxton, UK], GenBank [Bethesda, USA] and the DNA Data Bank of Japan [Mishima, Japan] – all reached this milestone together.
Graham Cameron, Associate Director of EMBL's European Bioinformatics Institute, says: "This is an important milestone in the history of the nucleotide sequence databases. From the first EMBL Data Library entry made available in 1982 to today's provision of over 55 million sequence entries from at least 200,000 different organisms, these resources have anticipated the needs of molecular biologists and addressed them – often in the face of a serious lack of resources."
David Lipman, Director of the National Center for Biotechnology Information, adds: "Today's nucleotide sequence databases allow researchers to share completed genomes, the genetic make-up of entire ecosystems, and sequences associated with patents. The INSDC has realized the vision of the researchers who initiated the sequence database projects, by making the global sharing of nucleotide sequence information possible."
Takashi Gojobori, Director of the Center for Information Biology and DNA Data Bank of Japan, says: "The INSDC has laid the foundations for the exchange of many types of biological information. As we enter the era of systems biology and researchers begin to exchange complex types of information, such as the results of experiments that measure the activities of thousands of genes, or computational models of entire processes, it is important to celebrate the achievements of the three databases that pioneered the open exchange of biological information."
In the late 1970s, as researchers started to study organisms at the level of their genetic code, several groups began to explore the possibility of developing a public repository for sequence information. In the early 1980s this led to the launch of two databases: the first was the EMBL Data Library, based at the European Molecular Biology Laboratory [EMBL] in Heidelberg, Germany [the Data Library is now known as EMBL Bank and is based at EMBL's European Bioinformatics Institute, Hinxton, UK]. Hot on its heels came GenBank, initially hosted by the Los Alamos National Laboratory [LANL] and now based at the National Center for Biotechnology Information, Bethesda, MD, USA. Both of these databases were seeded by collections begun by far-sighted individuals: EMBL Bank by the collection of Kurt Stüber, then based at the University of Cologne in Germany, and GenBank by the collection of Walter Goad at LANL. The two nascent databases began collaborating very early on, an interaction that was initiated by Greg Hamm, the EMBL Data Library's first employee. Staff at the two databases, which at that time had to find sequences in published journal articles and re-key them into the databases, allocated journals to each team to avoid duplication of effort, and began the arduous task of mapping the fields from one database onto those of the other so that they could exchange information. By the time the International Nucleotide Sequence Consortium became formalized in February 1987, a third partner, the DNA Data Bank of Japan, had been launched at the National Institute of Genetics in Mishima, and collaborated with its European and US counterparts right from the start.
Much has changed since the days when sequences were manually keyed in from the literature or sent on floppy disc and distributed to users on 9-track magnetic tapes, but the purpose of the databases – to make every nucleotide sequence in the public domain freely available to the scientific community as rapidly as possible – remains as strong now as it was in the beginning.