
New cloud-based platform opens genomics data to all

Creation of Johns Hopkins-led team allows worldwide scientific collaboration

Date:
January 12, 2022
Source:
Johns Hopkins University
Summary:
Harnessing the power of genomics to find risk factors for major diseases or search for relatives relies on the costly and time-consuming ability to analyze huge numbers of genomes. Computer scientists have now leveled the playing field by creating a cloud-based platform that grants genomics researchers easy access to one of the world's largest genomics databases. Known as AnVIL (Genomic Data Science Analysis, Visualization, and Informatics Lab-space), the new platform gives any researcher with an Internet connection access to thousands of analysis tools, patient records, and more than 300,000 genomes.
FULL STORY

Harnessing the power of genomics to find risk factors for major diseases or search for relatives relies on the costly and time-consuming ability to analyze huge numbers of genomes. A team co-led by a Johns Hopkins University computer scientist has leveled the playing field by creating a cloud-based platform that grants genomics researchers easy access to one of the world's largest genomics databases.

Known as AnVIL (Genomic Data Science Analysis, Visualization, and Informatics Lab-space), the new platform gives any researcher with an Internet connection access to thousands of analysis tools, patient records, and more than 300,000 genomes. The work, a project of the National Human Genome Research Institute (NHGRI), appears today in Cell Genomics.

"AnVIL is inverting the model of genomics data sharing, offering unprecedented new opportunities for science by connecting researchers and datasets in new ways and promising to enable exciting new discoveries," said project co-leader Michael Schatz, Bloomberg Distinguished Professor of Computer Science and Biology at Johns Hopkins.

Typically genomic analysis starts with researchers downloading massive amounts of data from centralized warehouses to their own data centers, a process that is not only time-consuming, inefficient, and expensive, but also makes collaborating with researchers at other institutions difficult.

"AnVIL will be transformative for institutions of all sizes, especially smaller institutions that don't have the resources to build their own data centers. It is our hope that AnVIL levels the playing field, so that everyone has equal access to make discoveries," Schatz said.

Genetic risk factors for ailments such as cancer or cardiovascular disease are often very subtle, requiring researchers to analyze thousands of patients' genomes to discover new associations. The raw data for a single human genome comprises about 40 GB, so downloading thousands of genomes can take several days to several weeks: A single genome requires about 10 DVDs' worth of data, so transferring thousands means moving "tens of thousands of DVDs worth of data," Schatz said.
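As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. Only the 40 GB per genome comes from the article; the 4.7 GB single-layer DVD capacity, the sustained link speeds, and the function names are assumptions chosen for illustration.

```python
# Back-of-envelope estimate. The DVD capacity and link speeds below are
# assumptions for illustration, not figures from the article.
GENOME_GB = 40            # raw data per human genome (from the article)
DVD_GB = 4.7              # assumed: single-layer DVD capacity
SECONDS_PER_DAY = 86_400


def dvds_needed(genomes: int) -> float:
    """Number of DVDs required to hold the raw data for `genomes` genomes."""
    return genomes * GENOME_GB / DVD_GB


def transfer_days(genomes: int, link_mbps: float) -> float:
    """Days to download at a sustained rate, in decimal units (1 GB = 8,000 megabits)."""
    megabits = genomes * GENOME_GB * 8_000
    return megabits / link_mbps / SECONDS_PER_DAY


if __name__ == "__main__":
    for genomes in (1, 1_000, 5_000):
        print(
            f"{genomes:>5} genomes: {dvds_needed(genomes):8.0f} DVDs, "
            f"{transfer_days(genomes, 1_000):6.1f} days at 1 Gbps, "
            f"{transfer_days(genomes, 100):6.1f} days at 100 Mbps"
        )
```

Under these assumptions, one genome fills roughly eight to nine DVDs, and a few thousand genomes reach the tens of thousands of DVDs and the days-to-weeks transfer times described above.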

In addition, many studies require integrating data collected at multiple institutions, which means each institution must download its own copy while ensuring that patient-data security is maintained. This challenge is expected to become even greater in the future, as researchers embark on ever-larger studies requiring the analysis of hundreds of thousands to millions of genomes at once.

"Connecting to AnVIL remotely eliminates the need for these massive downloads and saves on the overhead," Schatz said. "Instead of painfully moving data to researchers, we allow researchers to effortlessly move to the data in the cloud. It also makes sharing datasets much easier so that data can be connected in new ways to find new associations, and it simplifies a lot of computing issues, like providing strong encryption and privacy for patient datasets."

AnVIL also provides researchers with several major analysis tools, including Galaxy, developed in part at Johns Hopkins, along with other popular tools such as R/Bioconductor, Jupyter notebooks, WDLs, Gen3, and Dockstore to support both interactive analysis and large-scale batch computing. Collectively, these tools allow researchers to tackle even the largest studies without having to build out their own computing environments.

Researchers from all over the world currently use the platform to study a variety of genetic diseases, including autism spectrum disorders, cardiovascular disease, and epilepsy. Schatz's team, part of the Telomere-to-Telomere Consortium, used it to reanalyze thousands of human genomes with the new reference genome to discover more than 1 million new variants.

Already, the AnVIL team has collected petabytes of data from several of the largest NHGRI projects, including hundreds of thousands of genomes from the Genotype-Tissue Expression (GTEx), Centers for Mendelian Genetics (CMG), and Centers for Common Disease Genomics (CCDG) projects, with plans to host many more projects in the near future.

The AnVIL team includes researchers from Johns Hopkins University, the Broad Institute of MIT and Harvard, Harvard University, Vanderbilt University, the University of Chicago, Oregon Health & Science University, Yale University School of Medicine, the University of California, Santa Cruz, Roswell Park Comprehensive Cancer Center, the Pennsylvania State University, the City University of New York, the Carnegie Institute, and Washington University in St. Louis.

This work was supported through cooperative agreement awards from NHGRI, with co-funding from the National Institutes of Health's Office of Data Science Strategy to the Broad Institute and Johns Hopkins University.


Story Source:

Materials provided by Johns Hopkins University. Original written by Lisa Ercolano. Note: Content may be edited for style and length.


Journal Reference:

  1. Michael C. Schatz, Anthony A. Philippakis, Enis Afgan, Eric Banks, Vincent J. Carey, Robert J. Carroll, Alessandro Culotti, Kyle Ellrott, Jeremy Goecks, Robert L. Grossman, Ira M. Hall, Kasper D. Hansen, Jonathan Lawson, Jeffrey T. Leek, Anne O’Donnell Luria, Stephen Mosher, Martin Morgan, Anton Nekrutenko, Brian D. O’Connor, Kevin Osborn, Benedict Paten, Candace Patterson, Frederick J. Tan, Casey Overby Taylor, Jennifer Vessio, Levi Waldron, Ting Wang, Kristin Wuichet, Alexander Baumann, Andrew Rula, Anton Kovalsy, Clare Bernard, Derek Caetano-Anollés, Geraldine A. Van der Auwera, Justin Canas, Kaan Yuksel, Kate Herman, M. Morgan Taylor, Marianie Simeon, Michael Baumann, Qi Wang, Robert Title, Ruchi Munshi, Sushma Chaluvadi, Valerie Reeves, William Disman, Salin Thomas, Allie Hajian, Elizabeth Kiernan, Namrata Gupta, Trish Vosburg, Ludwig Geistlinger, Marcel Ramos, Sehyun Oh, Dave Rogers, Frances McDade, Mim Hastie, Nitesh Turaga, Alexander Ostrovsky, Alexandru Mahmoud, Dannon Baker, Dave Clements, Katherine E.L. Cox, Keith Suderman, Nataliya Kucher, Sergey Golitsynskiy, Samantha Zarate, Sarah J. Wheelan, Kai Kammers, Ana Stevens, Carolyn Hutter, Christopher Wellington, Elena M. Ghanaim, Ken L. Wiley, Shurjo K. Sen, Valentina Di Francesco, Denis Yuen, Brian Walsh, Luke Sargent, Vahid Jalili, John Chilton, Lori Shepherd, B.J. Stubbs, Ash O’Farrell, Benton A. Vizzier, Charles Overbeck, Charles Reid, David Charles Steinberg, Elizabeth A. Sheets, Julian Lucas, Lon Blauvelt, Louise Cabansay, Noah Warren, Brian Hannafious, Tim Harris, Radhika Reddy, Eric Torstenson, M. Katie Banasiewicz, Haley J. Abel, Jason Walker. Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space. Cell Genomics, 2022; 2 (1): 100085 DOI: 10.1016/j.xgen.2021.100085

Cite This Page:

Johns Hopkins University. "New cloud-based platform opens genomics data to all." ScienceDaily. ScienceDaily, 12 January 2022. <www.sciencedaily.com/releases/2022/01/220112145118.htm>.
