Machine learning may accelerate solutions to address protein design challenges

By Greg Slabodkin, ScienceBoard Editor in Chief

September 16, 2022 -- Machine learning can be used to create protein molecules much more accurately and quickly than previously possible, with the potential for developing new treatments and vaccines, according to biologists at the University of Washington School of Medicine.

In their study, published September 15 in the journal Science, the researchers reported that proteins designed with an ultra-rapid software tool called ProteinMPNN were much more likely to fold up as intended. The tool, which generates amino acid sequences, runs in about one second. The researchers contend that this is more than 200 times faster than the previous best software and that it produces superior results.

In a press release accompanying their study, the researchers noted that machine-learning algorithms -- including AlphaFold and RoseTTAFold -- have been trained to predict the detailed shapes of natural proteins based solely on their amino acid sequences. For the design problem, they claim their own research demonstrates the "broad utility and high accuracy of ProteinMPNN using x-ray crystallography, cryoEM, and functional studies by rescuing previously failed designs, made using Rosetta or AlphaFold, of protein monomers, cyclic homo-oligomers, tetrahedral nanoparticles, and target binding proteins."

In a separate paper, also published September 15 in Science, the researchers confirmed that a combination of the new machine-learning tools could reliably generate new proteins that functioned in the laboratory, including nanoscale rings that they believe could become parts for custom nanomachines.

"Our results highlight the rich diversity of new protein structures that can be generated using deep learning and pave the way for the design of increasingly complex components for nanomachines and biomaterials," the authors concluded.

Senior author David Baker, PhD, professor of biochemistry at the University of Washington School of Medicine, drew an analogy in a statement: "ProteinMPNN is to protein design what AlphaFold was to protein structure prediction."

AlphaFold's limitations

The AlphaFold database, which Alphabet's DeepMind created with the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), contains predictions of the 3D structures of proteins generated using artificial intelligence (AI). Having recently grown to cover more than 200 million structures, representing nearly all cataloged proteins, the database has the potential to support the discovery of new drug candidates.

However, AlphaFold has not been without its critics. In an opinion piece published last month in Chemistry World, Derek Lowe, PhD, a popular blogger who writes about the pharma industry, explained why he believes AlphaFold won't revolutionize drug discovery.

"Protein structures shift and slide in the presence of small-molecule ligands, sometimes subtly and sometimes dramatically, but AlphaFold is not (yet) equipped to predict these changes," Lowe wrote. "We may well be able to find algorithmic solutions to those questions eventually, but as yet there simply aren't enough small-molecule ligand-bound protein structures to build from. We'll need a lot of them. There are twenty or so different protein side chains to account for, but the number of small molecule structures is so huge as to be practically infinite by comparison."

More recently, researchers at the Massachusetts Institute of Technology (MIT) reported on the limitations of AlphaFold's AI-predicted protein structures in drug discovery. In a study published September 6 in the journal Molecular Systems Biology, they used molecular docking simulations to examine the interactions of 296 essential proteins from Escherichia coli with 218 antibacterial compounds, including antibiotics such as tetracycline. Such simulations use the structures of a protein and a compound to predict how strongly the two will bind.

However, the simulations, which have successfully been used in the past to screen compounds against a single target, underperformed when used to screen hundreds of compounds against hundreds of targets.
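
To give a sense of the scale involved, the short Python sketch below counts the protein-compound pairs in such a screen. It is an illustration only: it assumes every one of the 296 proteins is paired with every one of the 218 compounds, the helper name `enumerate_pairs` is hypothetical, and no docking is actually performed.

```python
from itertools import product

N_PROTEINS = 296   # essential E. coli proteins, as reported in the article
N_COMPOUNDS = 218  # antibacterial compounds, as reported in the article


def enumerate_pairs(n_proteins: int, n_compounds: int):
    """Yield (protein_index, compound_index) for an all-against-all screen.

    Hypothetical helper: a real pipeline would hand each pair to a docking
    program. Here the pairs are only enumerated to show the scale.
    """
    return product(range(n_proteins), range(n_compounds))


# Count the pairs that a full all-against-all screen would have to score.
total_pairs = sum(1 for _ in enumerate_pairs(N_PROTEINS, N_COMPOUNDS))
print(total_pairs)
```

Under that assumption, a many-against-many screen must score tens of thousands of pairs, compared with a few hundred when compounds are screened against a single target.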

"AlphaFold appears to do roughly as well as experimentally determined structures, but we need to do a better job with molecular docking models if we're going to utilize AlphaFold effectively and extensively in drug discovery," James Collins, PhD, the MIT professor of medical engineering and science who led the research, said in a statement.

Looking to make progress beyond AlphaFold, Baker at the University of Washington observed that "this is the very beginning of machine learning in protein design" and said that in his lab, "in the coming months, we will be working to improve these tools to create even more dynamic and functional proteins."
