
Unexpected cross-species contamination in genome sequencing projects

Date:
November 18, 2014
Source:
PeerJ
Summary:
As genome sequencing has become faster and cheaper, the pace of whole-genome sequencing has accelerated, dramatically increasing the number of genomes deposited in public archives. Although these genomes are a valuable resource, problems can arise when researchers misapply computational methods to assemble them or accidentally introduce contamination that goes unnoticed during sequencing.

The first complete bacterial genome, that of Haemophilus influenzae, was published in 1995; today the public GenBank database contains more than 27,000 prokaryotic and 1,600 eukaryotic genomes. The vast majority of these are draft genomes whose sequences contain gaps, yet researchers often use these draft sequences in subsequent analyses.

Each genome sequencing project begins with a DNA source, which varies by species. For animals, blood is a common source, whereas for smaller organisms such as insects, the entire organism or even a population of organisms may be needed to yield enough DNA for sequencing. Contamination remains possible throughout DNA isolation and sequencing. Computational filters applied to the raw sequencing reads are usually effective at removing common laboratory contaminants such as E. coli, but other contaminants can be harder to identify.

In a new study in PeerJ, authors from Johns Hopkins University report contaminating bacterial and viral sequences in "draft" assemblies of animal and plant genomes that had been deposited in GenBank. Such contaminants may cause particular problems for the rapidly growing field of microbiome analysis, in which sequences labeled as animal in origin may actually turn out to be microbial.

In an even more surprising finding, the authors found cow and sheep DNA in the supposedly finished genome of a pathogenic bacterium, Neisseria gonorrhoeae. Although this genome was deposited in GenBank as finished, it appears to have been a draft assembly submitted as complete, with erroneous DNA inserted in five places. Taken at face value, these data would suggest a startling case of lateral gene transfer, but the correct explanation appears to be more mundane: contamination.

These findings highlight the importance of careful screening of DNA sequence data both at the time of release and, in some cases, for many years after publication.


Story Source:

Materials provided by PeerJ. Note: Content may be edited for style and length.


Journal Reference:

  1. Merchant, Wood and Salzberg. Unexpected cross-species contamination in genome sequencing projects. PeerJ, November 2014. DOI: 10.7717/peerj.675

Cite This Page:

PeerJ. "Unexpected cross-species contamination in genome sequencing projects." ScienceDaily. ScienceDaily, 18 November 2014. <www.sciencedaily.com/releases/2014/11/141118072632.htm>.