Biology is rapidly acquiring the character of a data science. Billions of data points on genes, proteins and other molecules are compiled in large files and systematically studied. This should lead to more knowledge and understanding about living organisms, including the crops and livestock that are the basis of food security for the world population. These are some of the points raised by Dick de Ridder in his inaugural address upon accepting the post of Professor of Bioinformatics at Wageningen University on 30 April.
In his address, entitled Biology as data science - Zen and the art of bioinformatics, De Ridder discusses the unstoppable growth of data on genomes, proteins and cells, which far outstrips Moore's Law on the falling costs of microelectronic processing power. Between 1990 and 2003, unravelling the human genome – with more than three billion building blocks – cost approximately $2.7 billion, but in 2014 the cost of unravelling the same genome was barely $4,000. This reduction in cost has made a huge increase in the amount of data possible, allowing us to read the genomes of tens of thousands of organisms. Mapping out the complex tomato genome took an international consortium five years to accomplish, but today we can read the genomes of 150 different tomatoes in one year. "The expectation is that this year we will be able to read 25 million billion bases, or 25 petabases, of DNA worldwide," calculates De Ridder.
All this data, from analyses of the properties, structure and functions of tens of thousands of molecules in cells, is stored in enormous databases. This unimaginable amount of data is also referred to as 'big data'. Dealing with big data requires a different approach. "It is the art of the bioinformatician," says De Ridder, "to establish new biological hypotheses on the basis of terabytes of original data. Biologists currently outsource much of their data analysis to bioinformaticians, but I expect that more and more researchers will make predictions from behind their computers, which will then be validated in outsourced experiments."
In order to move from data to understanding, De Ridder envisions a procedure with several steps, in which "the computer scientist needs to move in the direction of the biologist. In a sense, this movement is from a reductionist to a holistic approach. Processing the biological data at the deepest level, such as DNA base pairs, therefore only makes sense if this analysis can be used to build models of biological processes and if the resulting predictions can be tested. That can yield fundamental insights," concludes De Ridder. "Conversely, biologists can no longer allow themselves to accept predictions of models without understanding the data analysis."