In what’s been described as “one of the most important datasets since the mapping of the Human Genome,” DeepMind has published high-quality predictions for the shape of every single protein in the human body, as well as the proteins of 20 additional organisms of great research value.
DeepMind used its artificial intelligence (AI)-based AlphaFold system to predict the shapes of proteins at scale and in minutes, down to atomic accuracy. Determining protein structures experimentally is a time-consuming and computationally intensive pursuit that is nearly impossible to complete without the help of supercomputers.
The team, who report on the AlphaFold system and the research in Nature and describe the work in a recent blog post, said that the database can help scientists studying everything from health to the environment.
They write: “As researchers seek cures for diseases and pursue solutions to other big problems facing humankind – including antibiotic resistance, microplastic pollution, and climate change – they will benefit from fresh insights into the structure of proteins. Proteins are like tiny exquisite biological machines. The same way that the structure of a machine tells you what it does, so the structure of a protein helps us understand its function. Today, we are sharing a trove of information that doubles humanity’s understanding of the human proteome, and reveals the protein structures found in 20 other biologically-significant organisms, from E. coli to yeast, and from the fruit fly to the mouse.”
The database has over 350,000 protein structures, including the human proteome with about 20,000 proteins that are expressed by the human genome, along with the proteomes of 20 other biologically-significant organisms.
In the post, the team writes that the data is already at work at some of the world’s largest research labs and institutions:
“For instance, the Drugs for Neglected Diseases Initiative (DNDi) has advanced their research into life-saving cures for diseases that disproportionately affect the poorer parts of the world, and the Centre for Enzyme Innovation at the University of Portsmouth (CEI) is using AlphaFold to help engineer faster enzymes for recycling some of our most polluting single-use plastics. For those scientists who rely on experimental protein structure determination, AlphaFold’s predictions have helped accelerate their research. As another example, a team at the University of Colorado Boulder is finding promise in using AlphaFold predictions to study antibiotic resistance, while a group at the University of California San Francisco has used them to increase their understanding of SARS-CoV-2 biology.”
The work, which took about five years to complete, was assisted by DeepMind’s partners at EMBL’s European Bioinformatics Institute (EMBL-EBI).
Last December, at the CASP14 conference, DeepMind unveiled a radically new version of its AlphaFold system. At CASP, the team pledged to share its methods and provide broad access to this body of knowledge.
Future work will expand the database to include almost every sequenced protein known to science – over 100 million structures.
“It’s a veritable protein almanac of the world. And the system and database will periodically be updated as we continue to invest in future improvements to AlphaFold,” they write.