Scientists use bioinformatics to investigate origin of SARS-CoV-2

By Samantha Black, PhD, The Science Advisory Board editor in chief

March 27, 2020 -- While many scientists urgently work toward developing successful therapies to fight the COVID-19 pandemic, some are looking in the opposite direction: to where SARS-CoV-2 came from. A new report published on March 22 in the Journal of Proteome Research suggests that while bats are a likely natural reservoir, an intermediate host between bats and humans is probable.

Researchers from the department of computational medicine and bioinformatics at the University of Michigan addressed the origins of SARS-CoV-2 by examining two recent studies.

How the novel coronavirus relates to other viruses

The first study reported that the SARS-CoV-2 spike protein had four unique insertions that shared "uncanny similarities" with an HIV-1 protein. The controversial paper, which was published online on a preprint server and has since been removed, was met with skepticism in the scientific community but caused rumors and conspiracy theories to circulate among the general public.

The researchers reanalyzed the structural location and sequence homology of the four spike protein insertions described in the withdrawn paper. Deep-learning modeling demonstrated that all four of the insertions were located outside of the receptor binding domain of the spike protein.

They found that the four insertions are not unique to the novel coronavirus and HIV-1. They concluded that the similarities in the sequence-based alignments of the very short fragments were statistically insignificant, according to basic local alignment search tool (BLAST) E values. In fact, the similarities in the fragments were shared among many other viruses including bat coronaviruses. They did conclude that the sequence alignment of the fragments indicated a close evolutionary relation to the bat coronavirus.
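
To see why very short matches rarely reach significance, it helps to look at how an E value scales with search size. The sketch below uses the standard relationship between a BLAST bit score and its E value, E = m × n × 2^(-S'), where m is the query length and n is the database length. The function name and the example numbers are illustrative assumptions, not values from the study, and real BLAST runs use adjusted "effective" lengths rather than raw ones.

```python
def blast_e_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least `bit_score`.

    Uses E = m * n * 2**(-S'), with S' the bit score. Raw lengths are
    used here for simplicity; BLAST itself applies edge-effect
    corrections (effective lengths), so this is only an approximation.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical numbers for illustration only: a short fragment can earn
# only a modest bit score, so against a large database the expected
# number of chance matches stays far above any significance cutoff.
short_hit = blast_e_value(bit_score=20.0, query_len=1_273, db_len=100_000_000)
long_hit = blast_e_value(bit_score=200.0, query_len=1_273, db_len=100_000_000)
print(f"short fragment E = {short_hit:.3g}")
print(f"long alignment E = {long_hit:.3g}")
```

An E value near or above 1 means a match that good is expected by chance alone, which is the sense in which the short insertions were judged insignificant.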

The researchers also scrutinized a second study that compared the relative synonymous codon usage (RSCU) of the virus with that of eight vertebrates and found that snakes had the smallest codon usage difference from the virus. This conclusion is controversial among virologists due to lack of evidence that coronaviruses can infect animals other than mammals and birds.

To address several concerns in the study, the researchers reimplemented the RSCU comparison algorithm with 102,367 vertebrate species. They found that snakes did not have the lowest RSCU distances with SARS-CoV-2, suggesting that the analysis conducted in the aforementioned study was incomplete.

Interestingly, two kinds of frogs had the smallest RSCU distances among all vertebrates with sufficient sequences. The researchers suggest that RSCU analysis failed to identify an intermediate host because it is not specific enough to discriminate coronaviruses from different vertebrate hosts. In comparison, the intermediate hosts for SARS-CoV and MERS-CoV have almost no difference in RSCU.
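
For readers unfamiliar with the metric, RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch of that calculation, not the researchers' implementation. It assumes the standard genetic code, and the Euclidean distance used to compare two profiles is an assumption chosen for illustration; the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group sense codons by the amino acid they encode (stop codons excluded).
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def rscu(cds: str) -> dict:
    """Return the RSCU value of every sense codon in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    values = {}
    for codons in SYNONYMS.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Observed count over the count expected under equal usage.
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


def rscu_distance(a: dict, b: dict) -> float:
    """Euclidean distance between two RSCU profiles (an assumed metric)."""
    return math.sqrt(sum((a[c] - b[c]) ** 2 for c in a))


# Toy sequences, purely illustrative.
virus = rscu("CTGCTGCTGTTA")
host = rscu("CTGTTATTATTA")
print(rscu_distance(virus, host))
```

A small distance means only that two codon usage profiles look alike, which is consistent with the researchers' point that the measure is not specific enough to single out a host.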

How the virus got to humans

Several studies have recently been published identifying coronavirus sequences in pangolins (Manis javanica) that are very similar to SARS-CoV-2. To explore the possibility of M. javanica as an intermediate host, the researchers assembled a draft Manis-CoV genome from metagenomic samples and compared it with SARS-CoV-2. The draft genome covered 73% of the SARS-CoV-2 genome with 91% sequence identity. The assembled protein sequences included a partial Manis-CoV spike protein that is 92% identical to the SARS-CoV-2 spike protein.
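
Coverage and identity are separate measures, which is why both are reported. As a rough sketch of how they can be derived from a pairwise alignment, the function below treats coverage as the share of reference positions aligned to a non-gap base and identity as the share of those aligned positions that match. These definitions and the function name are assumptions for illustration; the study's exact procedure may differ.

```python
def coverage_and_identity(ref_aln: str, qry_aln: str, gap: str = "-"):
    """Compute (coverage, identity) from two equal-length aligned strings.

    coverage = reference positions aligned to a query base / reference length
    identity = matching positions / aligned positions
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")
    ref_len = aligned = matches = 0
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r == gap:
            continue  # insertion in the query; not a reference position
        ref_len += 1
        if q != gap:
            aligned += 1
            if r == q:
                matches += 1
    coverage = aligned / ref_len if ref_len else 0.0
    identity = matches / aligned if aligned else 0.0
    return coverage, identity


# Toy alignment, not real viral sequence.
cov, ident = coverage_and_identity("ACGTACGT", "ACGA--GT")
print(f"coverage = {cov:.2f}, identity = {ident:.2f}")
```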

Moreover, the researchers observed differences at only five residue positions between Manis-CoV and SARS-CoV-2, compared with 19 differences between the bat coronavirus RaTG13 and SARS-CoV-2. The data suggest that pangolins such as Manis javanica could either be intermediate hosts that carried bat coronaviruses to humans, or serve as alternative natural reservoirs, together with bats, that provided the genetic material for the origin of SARS-CoV-2.

The researchers indicated that other animals may also serve as intermediate hosts for the virus, for two reasons. First, coronaviruses are known to have multiple intermediate hosts (palm civets, raccoon dogs, and ferret badgers for SARS-CoV). Second, the 91% sequence identity between Manis-CoV and SARS-CoV-2 is high enough to confirm an evolutionary relationship between the two viruses but not high enough to consider them the same viral species (the viruses from the intermediate hosts of SARS-CoV and MERS-CoV are 99.8% and 99.9% identical to the human versions, respectively).


