Future-proofing 'big data' biological research depends on good digital identifiers

Date:
June 29, 2017
Source:
PLOS
Summary:
'Big data' research runs the risk of being undermined by the poor design of the digital identifiers that tag data. An international group of researchers has assembled a set of pragmatic guidelines for creating, referencing and maintaining web-based identifiers to improve reproducibility, attribution, and scientific discovery. The guidance helps address the frequent problems associated with persistent identifiers linked to scientific data.

"Big data" research runs the risk of being undermined by the poor design of the digital identifiers that tag data. An international group of researchers, led by Julie McMurry of Oregon Health & Science University, has assembled a set of pragmatic guidelines for creating, referencing and maintaining web-based identifiers to improve reproducibility, attribution, and scientific discovery. The guidance, publishing June 29 in the open access journal PLOS Biology, helps address the frequent problems associated with persistent identifiers linked to scientific data.

Over the past decade, the life sciences have changed drastically as data has become larger, more interdependent and natively web-based. In this landscape, the broader scientific research community has struggled to engineer this data for the web so that it is persistently accessible, reusable and attributable.

Depending on the individual database involved, identifiers can signify a gene, a genome, a chemical, an organism, a set of experimental data, or even a published article. The usefulness of all these items depends on the robustness and uniqueness of their respective identifiers, enabling them to be linked and discovered in perpetuity. The authors point out that the organic way in which most identifiers have arisen threatens that usefulness, and recognise that it is difficult to create and sustain persistent identifiers or web addresses that won't break and that are used consistently.

This work calls on professionals to do a better job of identifier engineering -- according to emerging community-developed conventions -- so that data can be used more effectively for scientific discovery. It also calls on users to be aware enough of these conventions, and of available tooling, to avoid being burned by broken links and missed connections.

"As with plumbing fixtures, the question of how identifiers work should only need to be understood by those that build and maintain them. However, everyone needs to know how identifiers should be used, and this is where convention is important," said McMurry. "Through this work, we hope to encourage all participants in the scholarly ecosystem -- including authors, data creators, data integrators, publishers, software developers, and resolvers -- to adhere to best practice in order to maximize the utility and impact of life science data."


Story Source:

Materials provided by PLOS. Note: Content may be edited for style and length.


Journal Reference:

  1. Julie A. McMurry, Nick Juty, Niklas Blomberg, Tony Burdett, Tom Conlin, Nathalie Conte, Mélanie Courtot, John Deck, Michel Dumontier, Donal K. Fellows, Alejandra Gonzalez-Beltran, Philipp Gormanns, Jeffrey Grethe, Janna Hastings, Jean-Karim Hériché, Henning Hermjakob, Jon C. Ison, Rafael C. Jimenez, Simon Jupp, John Kunze, Camille Laibe, Nicolas Le Novère, James Malone, Maria Jesus Martin, Johanna R. McEntyre, Chris Morris, Juha Muilu, Wolfgang Müller, Philippe Rocca-Serra, Susanna-Assunta Sansone, Murat Sariyar, Jacky L. Snoep, Stian Soiland-Reyes, Natalie J. Stanford, Neil Swainston, Nicole Washington, Alan R. Williams, Sarala M. Wimalaratne, Lilly M. Winfree, Katherine Wolstencroft, Carole Goble, Christopher J. Mungall, Melissa A. Haendel, Helen Parkinson. Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data. PLOS Biology, 2017; 15 (6): e2001414 DOI: 10.1371/journal.pbio.2001414
