Stunning the scientific community! DeepMind's AI solves the "protein folding" problem, a 50-year-old grand challenge in biology
Author | Bei Shuang
AI has once again made a major breakthrough in the field of biological sciences!
On November 30 (US time), DeepMind, the artificial intelligence company owned by Google's parent company Alphabet, announced that it had solved protein structure prediction, a problem that has challenged biologists for 50 years.
The AI system that solved this problem was AlphaFold, which shocked the scientific community when it was launched in 2018.
DeepMind stated on its official blog that the latest version of AlphaFold has been recognized by the organizers of the Critical Assessment of protein Structure Prediction (CASP), the field's authoritative evaluation, as a solution for accurately predicting a protein's folded structure from its amino acid sequence.
As soon as the news broke, Nature covered it in a news article whose headline delivered a direct verdict: "It will change everything".
At the same time, Google CEO Sundar Pichai, Stanford professor Fei-Fei Li, Elon Musk and many other technology leaders promptly shared the news and offered their congratulations.
So what kind of research is this major breakthrough that shocked the technology, biology and scientific communities?
1
AlphaFold: Overcoming a 50-year biological problem
First of all, why do we need to predict how proteins fold?
As we all know, proteins are essential to life. Almost all diseases, including cancer and dementia, are related to the function of proteins. The function of proteins is determined by their 3D structure.
Christian Anfinsen, winner of the 1972 Nobel Prize in Chemistry, proposed that a protein's 1D amino acid sequence should, in theory, fully determine its 3D structure, which means the structure could be computed and predicted from the sequence.
The practical challenge is that the number of ways a protein could fold before settling into its final 3D structure is astronomical.
American molecular biologist Cyrus Levinthal pointed out that enumerating every possible configuration of a protein by brute force would take longer than the age of the universe: a typical protein may have 10^300 possible configurations.
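To get a feel for the scale of Levinthal's estimate, here is a rough back-of-the-envelope sketch. The 10^300 figure is the one quoted above; the sampling rate of one configuration per femtosecond and the rounded age of the universe are assumptions chosen only for illustration.

```python
# Back-of-the-envelope check of Levinthal's argument.
# 10^300 comes from the article; the other constants are illustrative assumptions.
CONFIGURATIONS = 10 ** 300             # possible configurations of a typical protein
SAMPLES_PER_SECOND = 10 ** 15          # assumed: one configuration tried per femtosecond
SECONDS_PER_YEAR = 365 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 14 * 10 ** 9   # roughly 13.8 billion years, rounded up


def brute_force_years(configurations: int, rate_per_second: int) -> int:
    """Years needed to visit every configuration once at the given rate."""
    return configurations // (rate_per_second * SECONDS_PER_YEAR)


years = brute_force_years(CONFIGURATIONS, SAMPLES_PER_SECOND)
ratio = years // AGE_OF_UNIVERSE_YEARS

# Report orders of magnitude only; the exact digits are meaningless here.
print(f"brute force: about 10^{len(str(years)) - 1} years")
print(f"that is about 10^{len(str(ratio)) - 1} times the age of the universe")
```

Even with a wildly optimistic sampling rate, exhaustive search is out of the question, which is why the problem has to be attacked by prediction rather than enumeration.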
Therefore, from 1972 to the present, how to accurately predict how proteins fold has been a major challenge in the biological community.
Yet DeepMind has now overcome this challenge, which has troubled biologists for 50 years. The company's latest AlphaFold system achieved an overall median score of 92.4 GDT in the 14th CASP assessment (CASP14).
This means that the average error (RMSD, root-mean-square deviation) of AlphaFold's predictions is only about 1.6 angstroms (1 angstrom equals 0.1 nm), comparable to the width of an atom.
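For readers unfamiliar with RMSD, the following is a minimal sketch of how the metric is computed. It assumes the predicted and experimental structures are already superimposed and uses made-up toy coordinates; the function name `rmsd` is our own, not part of any CASP tooling.

```python
import math


def rmsd(predicted, reference):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) coordinates, in the units of the input (angstroms here).
    Assumes the two structures have already been superimposed."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


# Hypothetical toy data: three atoms, each displaced by 1.6 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(0.0, 1.6, 0.0), (3.8, 0.0, 1.6), (9.2, 0.0, 0.0)]

print(round(rmsd(predicted, reference), 2))
```

Because the deviations are squared before averaging, a few badly placed atoms raise the RMSD more than many slightly misplaced ones.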
More importantly, even for the most challenging targets, those in the free-modeling category, AlphaFold's median score reached 87.0 GDT.
Figure: prediction accuracy (GDT) in CASP's free-modeling category, improving over successive assessments.
Figure: two examples of protein targets in the free-modeling category.
In this regard, Professor John Moult, chair of CASP, said at a press conference:
DeepMind's AlphaFold system has achieved unparalleled accuracy in protein structure prediction. A 50-year-old grand challenge in biology has been largely solved.
It should be noted that CASP is the world's most authoritative assessment of protein structure prediction methods. Founded in 1994 by Professors John Moult and Krzysztof Fidelis, it runs a blind assessment every two years. Its main accuracy metric is GDT (Global Distance Test), which ranges from 0 to 100.
In simple terms, GDT can be roughly understood as the percentage of amino acid residues that lie within a threshold distance of their correct positions, and a GDT of around 90 is considered competitive with results obtained by experimental methods.
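The idea behind GDT can be sketched in a few lines. The version below follows the common GDT_TS convention of averaging the fraction of residues within 1, 2, 4 and 8 angstroms; it is a simplification that assumes the two structures are already superimposed, whereas the full procedure also searches for the best superposition at each cutoff. The function name and toy coordinates are hypothetical.

```python
import math

THRESHOLDS = (1.0, 2.0, 4.0, 8.0)  # distance cutoffs in angstroms (GDT_TS convention)


def gdt_ts(predicted, reference, thresholds=THRESHOLDS):
    """Simplified GDT_TS on a 0-100 scale for pre-superimposed structures.

    For each cutoff, take the fraction of residues whose predicted position
    is within that distance of the reference, then average the fractions."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in thresholds
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Hypothetical toy data: four residues off by 0.5, 1.5, 3.0 and 10.0 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

score = gdt_ts(predicted, reference)
print(score)
```

Unlike RMSD, a single residue that is far off (the 10-angstrom one here) only costs its own share of the score, which is one reason CASP prefers GDT as its headline metric.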
Arthur D. Levinson, founder and CEO of Calico, also spoke highly of the result:
AlphaFold is a once-in-a-generation advance, predicting protein structures with incredible speed and precision. This leap forward demonstrates how computational methods are poised to transform biological research and hold great promise for accelerating the drug discovery process.
2
The AI mechanism behind AlphaFold
A folded protein can be viewed as a "spatial graph" in which residues are the nodes and edges connect residues that lie close together in space.
Figure: the neural network model architecture of the AlphaFold system. The model operates on evolutionarily related protein sequences and on amino acid residue pairs, iteratively passing information between the two representations to generate a structure.
This spatial graph is important for understanding the physical interactions within proteins as well as their evolutionary history.
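The spatial-graph view can be made concrete with a small sketch. It builds a contact graph from residue coordinates, linking any two residues closer than a cutoff. The 8-angstrom cutoff, the function name and the toy hairpin coordinates are all assumptions for illustration; this is not how AlphaFold itself constructs or represents its graph.

```python
import math
from itertools import combinations


def contact_graph(coords, cutoff=8.0):
    """Map each residue index to the set of residues within `cutoff` angstroms.

    `coords` holds one (x, y, z) point per residue; the 8.0 default is an
    assumed, commonly used contact cutoff, not a value from the article."""
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Hypothetical hairpin: residues 0-3 run left to right, residues 4-7 run back
# on a parallel strand 5 angstroms away.
coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 5.0, 0.0), (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]

graph = contact_graph(coords)
print(sorted(graph[0]))
```

In this toy fold, residue 0 is linked to residue 7 even though they sit at opposite ends of the chain. Contacts like this, far apart in sequence but close in space, are exactly the information a structure predictor has to infer.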
For the latest version of AlphaFold, the researchers built an attention-based neural network, trained end-to-end, that tries to interpret the structure of this graph while reasoning over the implicit graph it is building. It refines the graph using evolutionarily related sequences, in the form of a multiple sequence alignment (MSA), together with a representation of amino acid residue pairs.
By iterating this process, the system learns to make strong predictions about the underlying physical structure of a protein and can determine highly accurate structures in a matter of days. In addition, AlphaFold uses an internal confidence measure to indicate which parts of each predicted structure are reliable.
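The article does not describe AlphaFold's network in detail, so the following is only a generic sketch of the attention mechanism that "attention-based" refers to: plain scaled dot-product self-attention, with identity projections instead of learned weights. The function names and the toy feature vectors are hypothetical, and nothing here should be read as AlphaFold's actual architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Generic scaled dot-product self-attention.

    Each row of `features` (one vector per residue) is replaced by a weighted
    average of all rows, weighted by how similar they are to it. Real models
    apply learned query/key/value projections first; they are omitted here."""
    dim = len(features[0])
    updated = []
    for query in features:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        weights = softmax(scores)
        updated.append([
            sum(w * value[d] for w, value in zip(weights, features))
            for d in range(dim)
        ])
    return updated


# Hypothetical 2-dimensional features for three residues.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(features):
    print([round(v, 3) for v in row])
```

The point of the mechanism is that every residue can draw information from every other residue in a single step, regardless of how far apart they are in the sequence.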
AlphaFold was trained on publicly available data: about 170,000 protein structures, together with large databases of protein sequences whose structures are unknown. Training used about 128 TPU v3 cores (roughly equivalent to 100-200 GPUs) and ran for only a few weeks, a relatively modest amount of compute compared with most of today's state-of-the-art large machine learning models.
3
Second-generation AlphaFold
DeepMind co-founder and CEO Demis Hassabis said: "DeepMind's ultimate vision has always been to build general AI to accelerate the pace of scientific discovery and help us better understand the world around us."
This time, the AlphaFold system has overcome a major problem that had gone unsolved for 50 years, which means that DeepMind has taken another solid step towards this vision.
AlphaFold made a splash when it debuted in 2018. At that year's CASP, often called the "Olympic Games of protein structure prediction", it ranked first among all entrants, reportedly outperforming the runner-up by a factor of about eight.
After two years of hard work, DeepMind rebuilt AlphaFold around a new deep learning architecture and broke its own record again, jumping from about 60 GDT to 92.4 GDT.
AlphaFold's accuracy is also far ahead of that of competing prediction systems.
The DeepMind team said that AlphaFold achieves unprecedented accuracy because its methods draw inspiration from biology, physics and machine learning, and because they build on half a century of research on protein folding.
As an AI tool for the scientific community, AlphaFold's application scenarios and value have already become apparent.
As the pandemic continued to spread this year, DeepMind researchers used AlphaFold to predict the structures of several SARS-CoV-2 proteins, including ORF3a and ORF8.
Despite the challenging nature of these proteins and the scarcity of related sequences, AlphaFold's predictions for both proved highly accurate when compared with the experimentally determined structures.
In addition to deepening our understanding of known diseases, AlphaFold's application potential will also extend to unknown areas of biology.
Since DNA specifies the amino acid sequences that make up proteins, researchers can now read protein sequences from nature on a massive scale: the Universal Protein database (UniProt) already holds hundreds of millions of them. By contrast, only about 170,000 proteins have experimentally determined 3D structures.
AI technologies like AlphaFold can help researchers explore this vast pool of proteins whose structures, and possibly functions, have yet to be identified.
Reference links:
- https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology
- https://www.cnbc.com/2020/11/30/deepmind-solves-protein-folding-grand-challenge-with-alphafold-ai.html