News Release

More reliable bioinformatics tools for the study of proteins

Peer-Reviewed Publication

Universitat Autonoma de Barcelona

Image: Members of the Protein Folding and Conformational Diseases research group at the IBB-UAB, led by Salvador Ventura. (Credit: IBB-UAB)

Many proteins can spontaneously assemble within cells into molecular condensates (membraneless intracellular structures formed by one or more proteins) through a process known as liquid-liquid phase separation (LLPS). This process is essential because it allows proteins to organize, interact and function efficiently and in a regulated manner within the cellular environment. When this mechanism fails, neurodegenerative diseases, cancers or developmental disorders can arise.

A research team from the Institute of Biotechnology and Biomedicine (IBB) of the UAB has now created the most comprehensive and reliable dataset of proteins that participate in LLPS. The team also proposes a protocol for overcoming the limitations of the algorithms developed so far to build predictive models, in which the researchers identified shortcomings that prevent the data from being analysed jointly and accurately.

The study, published in the journal Genome Biology, was led by Salvador Ventura, professor in the Department of Biochemistry and Molecular Biology of the UAB and director of the Parc Taulí Research and Innovation Institute (I3PT-CERCA); Michał Burdukiewicz, a Maria Zambrano researcher at the IBB and head of the bioinformatics group at the Medical University of Białystok (Poland); and Carlos Pintado Grima, a researcher at the IBB and first author of the study.

The research team precisely classified the two main types of proteins involved in LLPS: those that can form condensates by themselves (drivers) and those that only become part of them (clients). In addition, they developed the first standard set of proteins that do not participate in this process, which includes both proteins with defined structures and disordered proteins, "a key element for training artificial intelligence systems fairly and efficiently," says Salvador Ventura, who also coordinates the Protein Folding and Conformational Diseases research group at the IBB.

To validate their work, the scientists examined specific physicochemical traits involved in LLPS across different subsets of protein sequences and identified significant differences among them. They also evaluated the LLPS predictions of sixteen existing bioinformatics tools, the most comprehensive comparison of its kind made so far.

The dataset generated in the study makes it possible to accurately determine the role of a given protein in LLPS. In total, the researchers classified 2,876 different proteins. "The data we have generated was created to guarantee reliability and interoperability among them, based on standardized criteria for their selection and categorization. Until now, we did not have enough reliable data to make rigorous predictions. With this new resource, we open the door to the development of new, more precise computational tools," says Salvador Ventura.

The datasets and all associated tools from the study are openly available at llpsdatasets.ppmclab.com.


Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.