Predicting Protein Functions

Proteins perform vital tasks in the body, such as regulating metabolism and transmitting signals. Researchers from the Berlin Institute of Health (BIH) and Heidelberg University have now developed a neural network that predicts the functions of proteins from their sequences. Using a simple trick, the scientists were also able to observe how the network arrives at its predictions. Among other things, they applied their approach to the CRISPR-Cas9 gene-editing tool. The results were published in the journal "Nature Machine Intelligence".

In biological research, algorithms are used to search large amounts of data efficiently for patterns. Certain programmes can recognise recurring structures in large protein molecules and use this information to infer what tasks these molecules perform in cells, for example whether they act as gene switches or as signalling molecules. The predictions such algorithms make on the basis of protein sequences (the chains of amino acids that make up a protein) are now remarkably accurate. What remains unclear, however, is why a given sequence is assigned a particular protein function. "The precise knowledge generated with a learning algorithm is not directly accessible", says Dr Dominik Niopek, head of the "Synthetic Biology" research group at the Institute for Pharmacy and Molecular Biotechnology (IPMB) of Heidelberg University.

A student team led by Dr Niopek and Prof. Roland Eils, Director of the Center for Digital Health at the BIH and Director of the Health Data Science Unit at the Medical Faculty Heidelberg of Heidelberg University, began tackling this problem as early as 2017. They developed DeeProtein, a new tool for protein research: a neural network that predicts the function of a protein from the sequence of its building blocks. "Like most learning algorithms, DeeProtein is a black box, whose mode of operation remains a mystery", states Prof. Eils. "But a simple trick made it possible to observe the network while it was thinking, and hence directly retrieve some of this knowledge."

The researchers used a so-called sensitivity analysis. DeeProtein first predicts the function of a protein from its complete sequence. Each position in the sequence is then masked in turn, and DeeProtein makes a new prediction from this incomplete information. Comparing each of these predictions with the one based on the complete sequence shows how much the masked position contributes. "In this way we calculate, for each position in the protein sequence, how important this position is for predicting the correct function. We give each amino acid in the protein chain a sensitivity value for the protein function", explains Julius Upmeier zu Belzen, the paper's lead author and a student in the master's programme in Molecular Biotechnology at the IPMB.
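The masking procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DeeProtein's actual code: the one-hot encoding, masking a position by zeroing its row, and measuring importance as the drop in a single function score are all choices made here for illustration. `predict` stands in for any trained model that maps an encoded sequence to a score, and `cys_score` is a hypothetical toy model used only to make the example runnable.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(seq: str) -> np.ndarray:
    """Encode a protein sequence as an (L, 20) one-hot matrix."""
    idx = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, idx[aa]] = 1.0
    return x


def sensitivity_map(predict, seq: str) -> np.ndarray:
    """Occlusion-style sensitivity: score drop when each position is masked.

    Assumption: masking = zeroing the position's one-hot row.
    """
    x = one_hot(seq)
    baseline = predict(x)                 # prediction from the complete sequence
    scores = np.empty(len(seq))
    for pos in range(len(seq)):
        masked = x.copy()
        masked[pos, :] = 0.0              # hide this residue from the model
        scores[pos] = baseline - predict(masked)
    return scores


def cys_score(x: np.ndarray) -> float:
    # Hypothetical toy "model": the score is simply the number of cysteines.
    return float(x[:, AMINO_ACIDS.index("C")].sum())


print(sensitivity_map(cys_score, "ACCA"))  # [0. 1. 1. 0.]
```

With the toy model, only the two cysteine positions change the score when masked, so they receive a sensitivity of 1 and the alanines 0. A real network would output one score per candidate function, and the comparison would be repeated for each of them.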

The scientists then used the new analytical technique to identify regions in proteins that are vital to their function. The technique works both for signalling proteins that play a role in carcinogenesis and for the CRISPR-Cas9 gene-editing tool, which is being tested in preclinical and clinical studies. "The sensitivity analysis enables us to identify protein regions that tolerate changes well or not so well. This is an important first step if we want to make targeted changes to proteins, so as to equip them with new functions or to ‘switch off’ undesirable properties", underscores Dr Niopek, whose working group is also located at Heidelberg University’s BioQuant Center.
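Once every position has a sensitivity value, neighbouring positions can be grouped into regions. The sketch below is a hypothetical illustration, not the authors' method: `find_regions` and its `threshold` are assumptions made here. It treats positions whose absolute sensitivity is at or below the threshold as tolerant of change and returns contiguous runs as half-open `(start, end)` index pairs. With `tolerant=False` it reports the runs above the threshold as candidate critical regions instead. The sensitivity values in the example are made up.

```python
from typing import List, Optional, Sequence, Tuple


def find_regions(scores: Sequence[float], threshold: float,
                 tolerant: bool = True) -> List[Tuple[int, int]]:
    """Return contiguous runs of positions as half-open (start, end) pairs.

    tolerant=True  -> runs where |score| <= threshold (changes likely tolerated)
    tolerant=False -> runs where |score| >  threshold (likely critical)
    """
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, s in enumerate(scores):
        hit = (abs(s) <= threshold) if tolerant else (abs(s) > threshold)
        if hit and start is None:
            start = i                         # a new run begins at i
        elif not hit and start is not None:
            regions.append((start, i))        # the current run ends before i
            start = None
    if start is not None:
        regions.append((start, len(scores)))  # run extends to the last position
    return regions


# Made-up sensitivity values for a six-residue stretch.
scores = [0.0, 0.9, 0.8, 0.05, 0.0, 0.7]
print(find_regions(scores, threshold=0.1))                  # [(0, 1), (3, 5)]
print(find_regions(scores, threshold=0.1, tolerant=False))  # [(1, 3), (5, 6)]
```

A fixed threshold is the simplest choice. Smoothing the scores over a sliding window beforehand would make the regions less sensitive to single-residue noise.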

The research was funded by the Klaus Tschira Foundation, the German Research Foundation (DFG), and the Federal Ministry for Education and Research (BMBF).

J. Upmeier zu Belzen, T. Bürgel, S. Holderbach, F. Bubeck, L. Adam, C. Gandor, M. Klein, J. Mathony, P. Pfuderer, L. Platz, M. Przybilla, M. Schwendemann, D. Heid, M.D. Hoffmann, M. Jendrusch, C. Schmelas, M. Waldhauer, I. Lehmann, D. Niopek & R. Eils: Leveraging implicit knowledge in neural networks for functional dissection and engineering of proteins. Nature Machine Intelligence 1, 225-235 (2019).