When DeepMind first competed in 2018, its algorithm, called AlphaFold, relied on this comparative strategy. But AlphaFold also incorporated a computational approach called deep learning, in which the software is trained on vast data troves – in this case the sequences and structures of known proteins – and learns to spot patterns. DeepMind won handily, beating the competition by an average of 15% on each structure and earning GDT scores of up to about 60 for the hardest targets.
But the predictions were still too coarse to be useful, says John Jumper, who heads AlphaFold’s development at DeepMind. “We knew how far we were from biological relevance.” To do better, Jumper and his colleagues combined deep learning with an “attention algorithm” that mimics the way a person might assemble a jigsaw puzzle: first connecting pieces into small clumps – in this case clusters of amino acids – and then searching for ways to join the clumps into a larger whole. Working on a modest computer network built around 128 processors, they trained the algorithm on all 170,000 or so known protein structures.
And it worked. Across target proteins in this year’s CASP, AlphaFold achieved a median GDT score of 92.4. For the most challenging proteins, AlphaFold scored a median of 87, 25 points above the next-best predictions. It even excelled at solving the structures of proteins that sit wedged in cell membranes, which are central to many human diseases but notoriously difficult to solve with X-ray crystallography. Venki Ramakrishnan, a structural biologist at the Medical Research Council Laboratory of Molecular Biology, calls the result “an astonishing advance on the protein folding problem.”
All the groups in this year’s competition improved, Moult says. But with AlphaFold, Lupas says, “The game has changed.” The organizers even worried that DeepMind might somehow have been cheating. So Lupas set a special challenge: a membrane protein from a species of archaea, an ancient group of microbes. For 10 years, his research team had tried every trick in the book to get an X-ray crystal structure of the protein. “We could not solve it.”
But AlphaFold had no trouble. It returned a detailed image of a three-part protein with two long helical arms in the middle. The model enabled Lupas and his colleagues to make sense of their X-ray data; within half an hour, they had fit their experimental results to AlphaFold’s predicted structure. “It’s almost perfect,” Lupas says. “They could not possibly have cheated on this. I do not know how they do it.”
As a condition of entering CASP, DeepMind – like all groups – agreed to disclose sufficient details about its method for other groups to recreate it. That will be a boon for experimentalists, who will be able to use accurate structure predictions to make sense of opaque X-ray and cryo-EM data. It could also enable drug designers to quickly work out the structure of every protein in new and dangerous pathogens like SARS-CoV-2, a key step in the hunt for molecules to block them, Moult says.
Still, AlphaFold does not do everything well yet. In the competition, it faltered noticeably on one protein, an amalgam of 52 small repeating segments that distort one another’s positions as they assemble. Jumper says the team will now train AlphaFold to solve such structures, as well as those of protein complexes that work together to carry out key functions in the cell.
Although one major challenge has fallen, others will undoubtedly emerge. “This is not the end of anything,” Thornton says. “It’s the beginning of many new things.”