Unfolding AlphaFold and What it Means for Biology

Ian Wehba
Dec 15, 2020

Proteins are the building blocks of life. They carry out innumerable biological processes, including giving cells their shape, cell signaling, and catalyzing reactions. Some well-known proteins are insulin, which regulates metabolism, and keratin, which is found in hair and nails. All proteins are composed of long chains of the 22 amino acids, which then fold into a three-dimensional structure. The protein we consume is broken down into these amino acids so that our cells can manufacture their own proteins with them. The function of a given protein depends heavily on its three-dimensional structure. Determining what shape a sequence of amino acids will fold into is known as the protein folding problem.

Courtesy of DeepMind

The protein folding problem was first posed about 50 years ago, and scientists have been grappling with it ever since, with limited success. CASP, or Critical Assessment of protein Structure Prediction, was founded in 1994. It is a biennial competition that measures the cutting edge in protein structure prediction through blind assessment of teams from around the world. In 2018, at CASP-13, the company DeepMind made headlines when its artificial intelligence model, dubbed AlphaFold, won the competition. Even though a winner is declared every two years, a solution to the protein folding problem is generally regarded to require a model with over 90% accuracy, which in CASP terms means a score above roughly 90 out of 100 on the competition's accuracy measure, the Global Distance Test (GDT).

This year, at CASP-14, DeepMind did just that with AlphaFold2, and many are heralding the accomplishment as “solving” the protein folding problem.

DeepMind is an artificial intelligence company based in London that was purchased by Google in 2014. DeepMind is also known for developing AlphaGo, an artificial intelligence program designed to play the game Go. In 2016, AlphaGo defeated Go world champion Lee Sedol in a five-game match. The accomplishment was widely compared to the 1997 match between IBM’s Deep Blue chess program and Garry Kasparov, the world chess champion at the time (Deep Blue won).

With such a momentous accomplishment, it can be hard to separate the hype from reality. We will explore how AlphaFold works, whether it has really solved the protein folding problem, and what AlphaFold means for the future of biology.

AlphaFold is an example of something called a neural network. You can think of neural networks as incredibly complicated mathematical functions that are loosely inspired by the structure of the human brain. Even a modest neural network can have hundreds of inputs and thousands of parameters, and large ones have many more. Neural networks are ‘trained’ on data and then used to predict results on data they have never seen before.
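To make the idea of a network as a mathematical function concrete, here is a minimal sketch in Python. It is a toy illustration only and is not AlphaFold's architecture: the names `make_network`, `predict`, and `count_parameters`, the single hidden layer, and the sizes (100 inputs, 20 hidden units) are all assumptions chosen for the example.

```python
import math
import random

def make_network(n_inputs, n_hidden, seed=0):
    """Create randomly initialised parameters for a one-hidden-layer network."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def predict(params, inputs):
    """Evaluate the network: weighted sums passed through a nonlinearity."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(params["w1"], params["b1"])
    ]
    return sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]

def count_parameters(params):
    """Count every adjustable number in the network."""
    return (
        sum(len(row) for row in params["w1"])
        + len(params["b1"])
        + len(params["w2"])
        + 1  # the single output bias b2
    )

# Hypothetical sizes: 100 inputs feeding 20 hidden units and one output.
network = make_network(n_inputs=100, n_hidden=20)
print(count_parameters(network))  # 100*20 + 20 + 20 + 1 = 2041
print(predict(network, [0.5] * 100))
```

Even this small network has over two thousand adjustable numbers, and its output is nothing more than arithmetic applied to the inputs. Before training, that output is meaningless because the parameters are random.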

To train it, AlphaFold was shown both the amino acid sequences and the three-dimensional structures of a collection of proteins. Then, at CASP-14, AlphaFold was shown only the amino acid sequences of proteins it had never seen before, and it predicted their shapes with a median accuracy score of 92.4 out of 100.

AlphaFold was trained on a database of 170,000 proteins. During training, a model’s parameters are initially randomized and then incrementally adjusted according to how well the model performs on the training examples. This process of parameter tuning uses an algorithm called gradient descent, which repeatedly nudges each parameter in the direction that reduces the model's error. It finds local, but not necessarily global, minima of that error, which are the parameter settings where the model performs best.
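The difference between a local and a global minimum is easiest to see on a toy problem. The sketch below runs plain gradient descent on a hypothetical one-parameter error function, f(x) = x^4 - 3x^2 + x, which has two valleys. The function, the learning rate, and the starting points are assumptions picked for illustration and have nothing to do with how DeepMind actually trained AlphaFold.

```python
def loss(x):
    """Toy error function with two valleys (hypothetical, for illustration)."""
    return x**4 - 3 * x**2 + x

def gradient(x):
    """Derivative of the toy error function."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(start, learning_rate=0.01, steps=1000):
    """Repeatedly step downhill, against the gradient, from a starting value."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x

# The same algorithm from two different starting points.
from_right = gradient_descent(start=2.0)   # settles in the shallower valley near x = 1.13
from_left = gradient_descent(start=-2.0)   # settles in the deeper valley near x = -1.30

print(from_right, loss(from_right))
print(from_left, loss(from_left))
```

Both runs stop at a point where the slope is zero, but only the run that started on the left reaches the lowest error overall. This is why the starting values of a model's parameters can affect the result of training.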

DeepMind’s broad experience with neural networks helped them achieve these never-before-seen results. Dr. John Jumper, the AlphaFold Lead at DeepMind, said their work “demonstrates that machine learning techniques are finally able to meet the complexity of describing these incredible protein machines.” Machine learning is behind a lot of other cutting-edge technology, including self-driving cars, speech recognition software like Siri, and recommendation algorithms like those that power Netflix and Hulu. AlphaFold, however, is among the most prominent applications of machine learning within theoretical biology to date.

There are millions of potential proteins, and tens of thousands in humans alone. The structure of a protein is typically determined with a process called X-ray crystallography, which can cost upwards of $100,000 per protein. Proteins are at the heart of many critical biological processes, yet to understand them we must rely on prohibitively expensive and time-intensive methods. The reduction in cost and time that AlphaFold can offer researchers working with proteins is perhaps its greatest benefit. “This leap forward demonstrates how computational methods are poised to transform research in biology and hold much promise for accelerating the drug discovery process,” said Dr. Arthur D. Levinson, the former Chairman and CEO of Genentech, one of the world’s largest pharmaceutical companies, in response to AlphaFold’s results.

Some of the proteins that CASP-14 participants were tested on are critical to the function of SARS-CoV-2, the virus that causes COVID-19. Protein folding models could prove useful for combating the current pandemic and future ones.

DeepMind presented their results at CASP-14, but the publication of their code, data, and results is still forthcoming (they did publish their work from CASP-13). DeepMind said, “As with our CASP-13 AlphaFold system, we are preparing a paper on our system to submit to a peer-reviewed journal in due course.”

That leaves us with the question: Did AlphaFold really solve the protein folding problem? This question leads into an active area of research within machine learning: interpretability. As we saw earlier, machine learning models can have thousands of parameters or more, which makes it incredibly difficult to determine why a model gives the result it does. In the context of protein folding, this means that although AlphaFold can accurately predict a protein’s shape from its amino acid sequence, it does not tell us anything about why proteins fold the way they do.

From the perspective of a theoretical biologist, AlphaFold has not solved the protein folding problem, because the model is essentially a black box. However, with or without interpretability, researchers will be able to put the AlphaFold software to use and deliver real results in health care and biotechnology.
