An artificial intelligence (AI) network developed by Google AI offshoot DeepMind has taken a giant leap in solving one of biology’s biggest challenges: predicting protein structures from their amino acid sequences.
DeepMind’s program, called AlphaFold, outperformed about 100 other teams in a biennial protein-structure prediction challenge called CASP, short for Critical Assessment of Structure Prediction. The results were announced on November 30 at the start of the conference – held virtually this year – that takes stock of the exercise.
“This is a big deal,” said John Moult, a computational biologist at the University of Maryland in College Park who co-founded CASP in 1994 to improve computational methods for accurately predicting protein structures. “In some sense the problem is solved.”
The ability to accurately predict protein structures from their amino acid sequence would be a huge boon for life sciences and medicine. It would greatly accelerate efforts to understand the building blocks of cells and enable faster and more advanced drug discovery.
AlphaFold came top of the table at the last CASP, in 2018, the first year that London-based DeepMind participated. But this year, the outfit’s deep-learning network was head and shoulders above other teams and, researchers say, performed so astonishingly well that it could herald a revolution in biology.
“It’s a game changer,” said Andrei Lupas, an evolutionary biologist at the Max Planck Institute for Developmental Biology in Tübingen, Germany, who assessed the performance of the teams in CASP. AlphaFold has already helped him find the structure of a protein that has vexed his lab for a decade, and he expects it will change the way he works and the questions he tackles. “This will change medicine. It will change research. It will change biotechnology. It will change everything,” Lupas adds.
In some cases, AlphaFold’s structure predictions could not be distinguished from those determined using ‘gold standard’ experimental methods such as X-ray crystallography and, in recent years, cryo-electron microscopy (cryo-EM). AlphaFold might not remove the need for these laborious and expensive methods – yet – researchers say, but the AI will make it possible to study living things in new ways.
The structural problem
Proteins are the building blocks of life, responsible for most of what happens inside cells. How a protein works and what it does is determined by its 3D shape – ‘structure is function’ is an axiom of molecular biology. Proteins tend to adopt their shape without help, guided only by the laws of physics.
For decades, laboratory experiments have been the main way to obtain good protein structures. The first complete protein structures were determined starting in the 1950s, using a technique in which X-rays are fired at crystallized proteins and the diffracted light is translated into the protein’s atomic coordinates. X-ray crystallography has produced the majority of protein structures. But over the last decade, cryo-EM has become the preferred tool in many structural biology laboratories.
Researchers have long wondered how the components of a protein – a chain of different amino acids – determine the many twists and folds of its final shape. Early attempts to use computers to predict protein structures in the 1980s and 1990s performed poorly, researchers say. Lofty claims about methods in published papers tended to fall apart when other researchers applied them to other proteins.
Moult started CASP to bring greater rigor to these efforts. The event challenges teams to predict the structures of proteins that have been solved using experimental methods but whose structures have not yet been published. Moult credits the experiment – he does not call it a competition – with significantly improving the field by calling time on hype. “You really find out what looks promising, what works, and what to walk away from,” he says.
DeepMind’s performance in 2018 at CASP13 surprised many researchers in the field, which has long been the bastion of small academic groups. But its approach was broadly similar to those of other teams that were applying AI, says Jinbo Xu, a computational biologist at the University of Chicago, Illinois.
The first iteration of AlphaFold applied the AI method known as deep learning to structural and genetic data to predict the distance between pairs of amino acids in a protein. In a second step that does not rely on AI, AlphaFold used this information to come up with a ‘consensus’ model of what the protein should look like, says John Jumper at DeepMind, who is leading the project.
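To make the idea of distances between pairs of amino acids concrete, here is a minimal Python sketch of the quantity involved. It is not DeepMind's code: it simply computes the pairwise distance matrix for a known set of coordinates, whereas AlphaFold's network had to predict such distances from sequence data. The function name `pairwise_distances` and the four-residue toy coordinates are hypothetical and chosen only for illustration.

```python
import math


def pairwise_distances(coords):
    """Return the symmetric matrix of Euclidean distances between residues.

    `coords` is a list of (x, y, z) positions, one per amino acid,
    in angstroms. This is an illustrative helper, not part of AlphaFold.
    """
    n = len(coords)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            dist[i][j] = d
            dist[j][i] = d  # distance is symmetric
    return dist


# Hypothetical coordinates for a four-residue toy chain (not real data).
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
]
distance_map = pairwise_distances(toy_coords)
```

A matrix like `distance_map` constrains the overall shape of the chain, which is why predicted pairwise distances can serve as the input to a separate structure-building step.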
The team tried to build on this approach, but eventually hit a wall. So it changed tack, Jumper says, and developed an AI network that incorporated additional information about the physical and geometric constraints that determine how a protein folds. The team also set it a more difficult task: instead of predicting relationships between amino acids, the network predicts the final structure of a target protein sequence. “It’s a more complex system,” says Jumper.
CASP takes place over several months. Target proteins, or parts of proteins called domains – about 100 in total – are released regularly, and teams have several weeks to submit their structure predictions. A team of independent researchers then evaluates the predictions using metrics that gauge how similar a predicted structure is to the experimentally determined one. The assessors do not know who made each prediction.
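As a rough illustration of how such a similarity measure can work, the Python sketch below computes a simplified score in the spirit of the global distance test (GDT_TS), the main metric used in CASP. It rests on a strong assumption: the predicted and experimental coordinates are taken to be already superimposed, whereas the real measure searches for the best superposition. The function name `gdt_like_score` and the coordinates are hypothetical.

```python
import math


def gdt_like_score(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-style score on a 0-100 scale.

    For each distance cutoff (in angstroms), find the fraction of residues
    whose predicted position lies within that cutoff of the experimental
    position, then average the fractions. Assumes the two structures are
    already superimposed, which the real GDT_TS calculation does not.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and of equal length")
    deviations = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [
        sum(d <= cutoff for d in deviations) / len(deviations)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical toy structures (not real data): four residues each.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.5), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 10.0, 0.0)]

score = gdt_like_score(predicted, experimental)
perfect = gdt_like_score(experimental, experimental)
```

Averaging over several cutoffs rewards predictions that are roughly right everywhere as well as those that are very precise in places, so a single badly placed residue lowers the score without dominating it.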
AlphaFold’s predictions arrived under the name ‘group 427’, but the surprising accuracy of many of its entries made them stand out, Lupas says. “I had guessed it was AlphaFold. Most people had,” he says.
Some predictions were better than others, but almost two-thirds were comparable in quality to experimental structures. In some cases, Moult says, it was not clear whether the discrepancy between AlphaFold’s predictions and the experimental result was a prediction error or an artifact of the experiment.
AlphaFold’s predictions were poor matches to experimental structures determined by a technique called nuclear magnetic resonance spectroscopy, but that may be down to how the raw data is converted into a model, Moult says. The network also struggles to model individual structures within protein complexes, or groups, in which interactions with other proteins distort their shapes.
Overall, teams predicted structures more accurately this year than at the last CASP, but much of the progress can be attributed to AlphaFold, Moult says. On protein targets considered moderately difficult, the best performances from other teams typically scored 75 on a 100-point scale of prediction accuracy, while AlphaFold scored about 90 on the same targets, Moult says.
About half of the teams mentioned ‘deep learning’ in the abstract summarizing their approach, Moult says, suggesting that AI is having a broad impact on the field. Most of these were academic teams, but Microsoft and the Chinese technology company Tencent also entered CASP14.
Mohammed AlQuraishi, a computational biologist at Columbia University in New York City and a CASP participant, is eager to dig into the details of AlphaFold’s performance at the competition and to learn more about how the system works when the DeepMind team presents its approach on December 1. It is possible – but unlikely, he says – that an easier-than-usual crop of protein targets contributed to the performance. AlQuraishi’s strong hunch is that AlphaFold will be transformative.
“I think it is fair to say that this will be very disruptive to the protein-structure-prediction field. I suspect that many will leave the field, as the core problem has arguably been solved,” he says. “It’s a first-order breakthrough, certainly one of the most significant scientific achievements of my lifetime.”
An AlphaFold prediction helped to determine the structure of a bacterial protein that Lupas’s lab has been trying to crack for years. Lupas’s team had previously collected raw X-ray diffraction data, but transforming these Rorschach-like patterns into a structure requires some information about the shape of the protein. Tricks for getting this information, as well as other prediction tools, had failed. “The model from group 427 gave us our structure in half an hour, after we had spent a decade trying everything,” says Lupas.
Demis Hassabis, DeepMind’s co-founder and CEO, says the company plans to make AlphaFold useful so that other researchers can use it. (It previously published enough details about the first version of AlphaFold for other researchers to replicate the approach.) It can take AlphaFold days to come up with a predicted structure, which includes estimates of the reliability of different regions of the protein. “We are just beginning to understand what biologists want,” adds Hassabis, who sees drug discovery and protein design as potential applications.
In early 2020, the company released predicted structures of a handful of SARS-CoV-2 proteins that had not yet been determined experimentally. DeepMind’s prediction for a protein called Orf3a ended up being very similar to a structure later determined through cryo-EM, says Stephen Brohawn, a molecular neurobiologist at the University of California, Berkeley, whose team released the structure in June. “What they have been able to do is very impressive,” he adds.
AlphaFold is unlikely to close laboratories such as Brohawn’s that use experimental methods to solve protein structures. But it could mean that lower-quality, easier-to-collect experimental data would be all that is needed to get a good structure. Some applications, such as the evolutionary analysis of proteins, are set to flourish because the tsunami of available genomic data may now be reliably translated into structures. “This will allow a new generation of molecular biologists to ask more advanced questions,” says Lupas. “It will require more thinking and less pipetting.”
“This is a problem that I was beginning to think would not be solved in my lifetime,” said Janet Thornton, a structural biologist at the European Molecular Biology Laboratory-European Bioinformatics Institute in Hinxton, England, and a former CASP assessor. She hopes the approach can help shed light on the function of the thousands of unsolved proteins in the human genome and make sense of disease-causing gene variations that differ between people.
AlphaFold’s performance also marks a turning point for DeepMind. The company is best known for using AI to master games such as Go, but its long-term goal is to develop programs capable of achieving broad, human-like intelligence. Tackling major scientific challenges, such as predicting protein structure, is one of the most important applications its AI can have, Hassabis says. “I think that’s the most significant thing we’ve done in terms of real-world impact.”