A machine learning approach to identify functional human phosphosites

December 11, 2019
European Molecular Biology Laboratory - European Bioinformatics Institute
Scientists have created the largest phosphoproteome resource to date, which should help other researchers identify new, functionally relevant phosphosites. The research shows how machine learning methods can be used to compile and analyse large phosphorylation-related biological datasets. Identifying new functional phosphosites could advance research into many biological processes and diseases.

Researchers at EMBL's European Bioinformatics Institute (EMBL-EBI) have created the largest reference phosphoproteome to date, comprising almost 120 000 human phosphosites. To identify the sites most likely to be critical, they used a machine learning approach that ranks them by functional importance.

Proteins are the core molecular machines of the cell, and they can be regulated by protein modifications that act like molecular switches. Protein phosphorylation is one such switch: it can alter the structural conformation of a protein, activating it, deactivating it or modifying its function. Despite decades of work, the total number of these modifications, and which of them are truly critical for life, remain a mystery.

This research, published in Nature Biotechnology, creates a freely accessible resource that researchers can use to better understand which proteins are phosphorylated and which phosphosites have functional relevance. Access to these data could accelerate research into many different biological processes and diseases.

Machine learning and data sharing

"This new resource would not have been possible if scientists around the world didn't share their research data and results," says Pedro Beltrao, Group Leader at the EMBL-EBI. "It would take a single machine over 500 consecutive days to run all the mass spectrometry experiments used to create this database. By applying machine learning to this huge dataset, we created a scoring system that will hopefully help researchers to determine which lesser-known phosphosites to explore next."

The researchers at EMBL-EBI curated over 100 publicly available phospho-enriched human datasets, containing over 6000 mass spectrometry experiments, from EMBL-EBI's PRoteomics IDEntifications (PRIDE) database. This large-scale project has generated the biggest open-access reference phosphoproteome database to date.

Functional human phosphosites

To identify the phosphosites most critical to human cells, the researchers used machine learning to integrate diverse annotations for each site, such as its degree of conservation, into a single phosphosite functional score. The score can be used to rank known phosphosites and to distinguish those that are functionally relevant for molecular processes and disease, helping other scientists uncover more about their proteins of interest.

For example, the researchers demonstrated the practicality of their functional score model by identifying two high-scoring phosphosites that play a role in regulating neuronal differentiation.
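
The idea of combining several per-site annotations into one score and then ranking sites by it can be illustrated with a minimal sketch. This is not the study's model: it uses a plain logistic regression as a stand-in, the second feature ("detection frequency") is an assumption added for illustration, and every site name and number below is invented toy data.

```python
import math

def sigmoid(z):
    """Map a real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression with batch gradient descent.

    Stand-in learner for illustration only, not the method used in the study.
    """
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def functional_score(x, weights, bias):
    """Combine a site's annotations into a single score between 0 and 1."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def rank_sites(sites, weights, bias):
    """Return (site, score) pairs sorted from highest to lowest score."""
    scored = [(name, functional_score(x, weights, bias)) for name, x in sites.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Invented training data: [conservation, detection frequency], both scaled 0..1.
# Label 1 = site with known function, 0 = site without known function.
train_x = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.2, 0.3], [0.1, 0.2], [0.3, 0.1]]
train_y = [1, 1, 1, 0, 0, 0]

# Invented, hypothetical lesser-known sites to be ranked.
unlabelled = {
    "site_x": [0.85, 0.70],
    "site_y": [0.15, 0.25],
    "site_z": [0.50, 0.50],
}

weights, bias = train_logistic(train_x, train_y)
ranking = rank_sites(unlabelled, weights, bias)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

In a sketch like this, sites that resemble known functional sites across their annotations rise to the top of the list, which is the sense in which a score can suggest which lesser-known phosphosites to explore next.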

"The functional score model created from this study can be used to uncover an abundance of new, functional phosphosites that may play crucial roles in disease," says David Ochoa, Project Coordinator at Open Targets. "We already know of several groups who are using the scoring model, so we would like to encourage researchers everywhere to explore the resource and make use of it."

Story Source:

Materials provided by European Molecular Biology Laboratory - European Bioinformatics Institute. Note: Content may be edited for style and length.

Journal Reference:

  1. David Ochoa, Andrew F. Jarnuczak, Cristina Viéitez, Maja Gehre, Margaret Soucheray, André Mateus, Askar A. Kleefeldt, Anthony Hill, Luz Garcia-Alonso, Frank Stein, Nevan J. Krogan, Mikhail M. Savitski, Danielle L. Swaney, Juan A. Vizcaíno, Kyung-Min Noh, Pedro Beltrao. The functional landscape of the human phosphoproteome. Nature Biotechnology, 2019; DOI: 10.1038/s41587-019-0344-3
