ARES deep-learning system improves 3D RNA structure prediction

By Leah Sherwood, The Science Advisory Board assistant editor

August 26, 2021 -- A new deep-learning system called Atomic Rotationally Equivariant Scorer (ARES) significantly improves the prediction of RNA structures over previous artificial intelligence (AI) models. The advance, described by Stanford University researchers in a paper in Science on August 26, may help scientists uncover the biological functions of RNA and pave the way to the discovery of novel RNA-targeted drugs.

Like proteins, RNA molecules twist and fold into intricate 3D shapes that allow them to perform a wide range of cellular functions, including catalyzing reactions, regulating gene expression, modulating innate immunity, and sensing small molecules.

However, while scientists' understanding of protein structure has made great strides in the last decade, their knowledge of RNA structure lags far behind, despite the fact that the fraction of the human genome transcribed to RNA is approximately 30 times as large as the fraction that codes for proteins.

The progress in understanding protein folding is reflected in the success of prediction models such as AlphaFold, developed by Google AI offshoot DeepMind. This model learned how to accurately predict protein structures from their amino acid sequences by leveraging the sequence-structure relationships in thousands of known protein structures.

In the case of RNA, however, there is far less training data available. This is partly because RNA structures are currently not well understood and because RNA sequence information provides less information about 3D RNA structures than is the case for proteins.

To address this issue, the researchers, led by Raphael Townshend, a Stanford doctoral graduate and founder and CEO of Atomic AI, designed ARES to make its RNA structure predictions based on minimal assumptions. The ARES deep neural network accepts as input a structural model of the 3D coordinates and chemical element type of each atom and then predicts the model's root mean square deviation from the unknown true 3D RNA structure.

ARES does not incorporate any assumptions about which features of a structural model are relevant to assessing its accuracy. Even basic structural concepts such as double helices, base pairs, nucleotides, and hydrogen bonds are not preprogrammed into the system.

Unlike AlphaFold, which was trained on thousands of known protein structures, the ARES training data was limited to 18 RNA molecules for which experimentally determined structures were published between 1994 and 2006.

To assess the ability of ARES to identify accurate structural models of previously unseen RNAs, the Stanford researchers compiled a benchmark data set of seven years' worth of winning entries in the RNA-Puzzles contest, a long-running challenge organized by the RNA scientific community. According to the rules of RNA-Puzzles, when scientists in the community discover a new RNA structure experimentally, they withhold publication of the details until other RNA-Puzzles participants have submitted their structural predictions, which are then judged based on how closely they match the experimentally determined structure.

For each RNA structure in the RNA-Puzzles data set, the researchers generated a minimum of 1,500 structural models using the Rosetta FARFAR2 sampling software. They then applied the trained ARES neural network to produce a score for each model. Three other scoring methods were also used for comparison.

Using ARES, the 10 best-scoring structure models included the experimentally correct model for 81% of the benchmark RNAs. By comparison, the three other scoring methods included the correct structure less than 50% of the time.

Next, the researchers entered ARES' predictions into four new rounds of the RNA-Puzzles blind structure prediction challenge. The four experimentally determined but unpublished RNA structures to be predicted consisted of the adenovirus VA-I RNA, the Geobacillus kaustophilus T-box discriminator tRNAGly, the Bacillus subtilis T-box tRNAGly, and the Nocardia farcinic T-box tRNAIIe (Protein Data Bank IDs 6OL3, 6PMO, 6POM, and 6UFM, respectively). For all four RNAs, ARES "won" the challenge, producing the most accurate structural model of any method.

In future work, the researchers plan to give ARES more information beyond the atomic coordinates and the chemical element type of each atom to see if this additional input boosts performance.

"ARES might be improved further by incorporating other types of experimental data, including low-resolution cryogenic electron microscopy and chemical mapping data," the authors wrote in the paper.

Do you have a unique perspective on your research related to AI? Contact the editor today to learn more.

---

Join The Science Advisory Board today!

If you like this content, please share it with a colleague!