New AI model could predict viral jumps from animals to humans

By Samantha Black, PhD, ScienceBoard editor in chief

June 16, 2021 -- A new artificial intelligence (AI)-based computational model called SweetNet aimed at characterizing complex carbohydrates could help predict which viruses are likely to spread from animals to humans. The details of the research were published in Cell Reports on June 15.

Glycans are complex carbohydrates built from monosaccharides, often arranged in nonlinear branches, and are created through nontemplated, multifaceted biosynthesis. These carbohydrates participate in nearly all biological processes -- from protein folding to viral cell entry -- yet are still not well understood. Furthermore, despite their biological importance, very few computational methods have been developed to link glycan sequences to functions.

Previously, a glycan language model called SweetTalk was developed that views glycans as a sequence of "glycowords" (subsequences that describe structural contexts of a glycan). It was used to predict the taxonomic class of glycans as well as their properties, such as immunogenicity or contribution to pathogenicity. However, recurrent neural networks of the kind used in such language models cannot fully capture the branched or treelike architecture that is seen in most glycans. Alternative model architectures are needed to fully integrate glycan nonlinearity and improve prediction performance.

Graph neural networks, which can contextualize information contained in nodes and their connecting edges, may represent one potential avenue. In the new paper, a research group led by Daniel Bojar, PhD, an assistant professor at the University of Gothenburg, described their work developing a new graph convolutional neural network called SweetNet to study glycans with an unprecedented level of accuracy.

SweetNet

SweetNet treats glycan sequences akin to molecular graphs and thereby accounts for their treelike structures. It views monosaccharides and linkages as nodes and their connections as edges. Importantly, representations of glycans are learned by the model and can be used for visualization and downstream prediction tasks. For instance, the model resolved clear clusters of N- and O-linked glycans and glycolipids.
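To make the graph view concrete, here is a minimal Python sketch of how a glycan written in IUPAC-condensed notation could be turned into nodes and edges, with monosaccharides and linkages both treated as nodes. This is an illustration only, not SweetNet's actual code: the function name `glycan_to_graph`, the tokenizing pattern, and the use of "a" and "b" in place of α and β are assumptions made for the example.

```python
import re

# Hypothetical tokenizer: brackets, "(linkage)" groups, and residue names.
TOKEN = re.compile(r"\[|\]|\([^()]*\)|[^\[\]()]+")

def glycan_to_graph(glycan):
    """Return (nodes, edges) for an IUPAC-condensed glycan string.

    Monosaccharides and linkages both become nodes; an edge joins each
    node to the node it is attached to. The string is read right to
    left, so node 0 is the residue at the reducing end.
    """
    nodes = []          # node labels, e.g. "Man" or "a1-6"
    edges = []          # (parent_index, child_index) pairs
    branch_points = []  # residues that still have branches to attach
    current = None      # node the next token attaches to
    for token in reversed(TOKEN.findall(glycan)):
        if token == "]":
            # Reading backwards, "]" opens a branch hanging off `current`.
            branch_points.append(current)
        elif token == "[":
            # The branch is finished; return to the residue it hangs off.
            current = branch_points.pop()
        else:
            nodes.append(token.strip("()"))
            new = len(nodes) - 1
            if current is not None:
                edges.append((current, new))
            current = new
    return nodes, edges

# A small branched example: two mannose arms on a core mannose.
nodes, edges = glycan_to_graph("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc")
for parent, child in edges:
    print(nodes[parent], "--", nodes[child])
```

In this sketch the core mannose ends up with three edges (one toward the reducing end and one for each arm), which is the kind of branching information a purely sequential model has to flatten.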

Image: A glimpse of glycan diversity, showcasing several classes of glycans from various kingdoms of life. Image courtesy of Daniel Bojar.

When the researchers compared SweetNet with earlier glycan-focused machine-learning models on three tasks -- predicting the higher taxonomic groups of a glycan, glycan immunogenicity, and association with pathogenicity -- they found that SweetNet outperformed all other methods.

Glycans contain organismal phenotypic and environmental properties

Glycan-based phylogeny could offer insights into evolutionary histories and phenotypic similarity between species, complementary to that seen through DNA analysis.

As a proof-of-principle example, the team constructed a map of all species in the order Fabales (dicotyledonous flowering plants that include the legumes) with known glycans by averaging their glycan representations and performing hierarchical clustering. The plants clustered based on environmental (location) and/or phenotypic (seed leaves versus fernlike leaves) similarities. This showed close clustering of taxonomic groups and established a glycan-based phylogeny strategy.
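The two steps described here, averaging each species' glycan representations and then clustering hierarchically, can be sketched in a few lines of Python. The vectors and species labels below are invented toy values, and average linkage with Euclidean distance is an assumption for illustration; the article does not say which linkage or distance the authors used.

```python
from itertools import combinations
from math import dist
from statistics import fmean

def species_embedding(glycan_vectors):
    """Average a species' glycan vectors component by component."""
    return [fmean(component) for component in zip(*glycan_vectors)]

def hierarchical_clustering(embeddings):
    """Average-linkage agglomerative clustering.

    Returns the list of merges in the order they happen, closest
    pair of clusters first.
    """
    clusters = {name: [name] for name in embeddings}  # label -> members
    merges = []

    def linkage(a, b):
        # Mean pairwise distance between members of the two clusters.
        return fmean(
            dist(embeddings[x], embeddings[y])
            for x in clusters[a]
            for y in clusters[b]
        )

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        members = clusters.pop(a) + clusters.pop(b)
        clusters["(" + a + "," + b + ")"] = members
        merges.append((a, b))
    return merges

# Toy data only: two-dimensional "representations" for three made-up species.
toy_glycans = {
    "species_A": [[0.0, 1.0], [0.2, 0.8]],
    "species_B": [[0.1, 1.1], [0.1, 0.9]],
    "species_C": [[5.0, 5.0], [5.2, 4.8]],
}
embeddings = {name: species_embedding(v) for name, v in toy_glycans.items()}
merges = hierarchical_clustering(embeddings)
print(merges)
```

With these toy values, the two species whose averaged vectors lie close together are merged first, and the distant one joins last; read from the bottom up, the merge order is the tree that a glycan-based phylogeny would draw.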

The researchers also applied this to the animal kingdom, where they found meaningful information from the glycan-based phylogeny, such as adjacent clusters for amphibians and fish.

Glycans can be used to identify viral receptors

SweetNet can also provide researchers with a better understanding of zoonotic diseases, where viruses spread from animals to humans. Most viruses bind to glycan receptors prior to cell entry, and these receptors are therefore essential for viral virulence and host specificity. For example, influenza virus strains prefer either α2-3- or α2-6-linked sialic acids in their glycan receptors.

The team constructed a model that could predict the interaction of a viral protein sequence and a glycan. This model, a recurrent neural network incorporating SweetNet modules, was used to predict the binding intensity of a given protein-glycan pair, with influenza virus hemagglutinin as the first test case. The model was able to learn general features of glycans and to predict the glycan motifs that could serve as targets for influenza hemagglutinin.

"With the emergence of SARS-CoV-2, we have seen the potentially devastating consequences of viruses jumping from animals to humans," Bojar said. "Our model can now be used to predict which viruses are particularly close to 'jumping over'. We can analyze this by seeing how many mutations would be necessary for the viruses to recognize human glycans, which increases the risk of human infection. Also, the model helps us predict which parts of the human body are likely targeted by a potentially zoonotic virus, such as the respiratory system or the gastrointestinal tract."

The researchers generalized the SweetNet predictions to other viral targets, including coronaviruses and rotaviruses -- a common cause of viral infections in infants. They found that sialic acid motifs as well as sulfated glycan motifs are the most important binding motifs for coronaviruses, and that sialic acid motifs and the human milk receptor, the Gal(β1-3)GlcNAc(β1-3)Gal(β1-4)Glc motif, were potent binding motifs for rotaviruses. Because the human milk receptor was not included in the training set, the authors noted that SweetNet-based models can be used to rapidly make predictions of bound glycan receptors for emerging virus variants.

The research group hopes to leverage the improved understanding of the infection process to prevent viral infection and pandemics in the future.

"Predicting virus-glycan interactions means we can now search for glycans that bind viruses better than our own glycans do, and use these 'decoy' glycans as antivirals to prevent viral infection," Bojar explained. "However, further advances in glycan manufacturing are necessary, as potential antiviral glycans might include diverse sequences that are currently difficult to produce."
