July 29, 2020 -- Genome sequencing of the SARS-CoV-2 virus has revealed that the viral lineage to which the novel coronavirus belongs most likely first emerged in bats in the late 1960s, according to a new study published in Nature Microbiology on July 28.
The first genome sequence of SARS-CoV-2, Wuhan-Hu-1, was released on January 10, 2020 and enabled immediate analyses of the virus's ancestry. No other known bat coronaviruses clustered with ORF1b of SARS-CoV-2, and therefore a bat sarbecovirus, RaTG13, was reported as the closest relative with approximately 96% genome sequence identity.
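To make the "sequence identity" figure concrete, here is a minimal Python sketch of how percent identity can be computed from two already-aligned sequences. The function name and the two 25-base fragments are made up for illustration; they are not real viral sequence, and published identity values come from whole-genome alignments rather than a toy like this.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same base.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical, hand-written fragments that differ only at the last base.
toy_a = "ATGGCTAGCTAGGATCCGATTACGA"
toy_b = "ATGGCTAGCTAGGATCCGATTACGT"

print(f"{percent_identity(toy_a, toy_b):.1f}% identical")
```

A single genome-wide percentage hides regional variation, which matters for a virus whose genome segments can have different histories.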
Some researchers suggested that the genetic proximity of the two viruses made it probable that the current pandemic had its origins in bats. Other researchers have suggested that pangolins may have been a possible intermediate species.
Coronaviruses frequently undergo recombination, which means that small portions of their RNA can have independent origins. This makes it difficult to infer the evolutionary history of coronaviruses. To elucidate the independent origins of SARS-CoV-2, the international group of researchers behind the current paper analyzed the virus's evolutionary history using available genomic data on sarbecoviruses.
"Coronaviruses have genetic material that is highly recombinant, meaning different regions of the virus's genome can be derived from multiple sources," said author Maciej Boni, PhD, associate professor of biology at Pennsylvania State University, in a statement. "This has made it difficult to reconstruct SARS-CoV-2's origins. You have to identify all the regions that have been recombining and trace their histories. To do that, we put together a diverse team with expertise in recombination, phylogenetic dating, virus sampling, and molecular and viral evolution."
Tracing SARS-CoV-2 back in time
First, the researchers identified the nonrecombinant regions of the genome -- regions with no breakpoints -- so that they could reliably reconstruct and date the phylogenetic lineage. They used three bioinformatic approaches to determine these regions, based on mosaicism, phylogenetic incongruence, and excessive homoplasy.
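The idea behind a mosaic signal can be sketched in a few lines of Python: slide a window along a query sequence, record which reference it most resembles in each window, and flag positions where the closest reference changes. This is only a toy illustration under simplifying assumptions (pre-aligned sequences, no gaps, raw identity as the similarity measure). It is not the study's method, and all names and sequences are hypothetical.

```python
def window_identity(a: str, b: str, start: int, size: int) -> float:
    """Fraction of matching positions in a[start:start+size] vs b[start:start+size]."""
    seg_a = a[start:start + size]
    seg_b = b[start:start + size]
    return sum(x == y for x, y in zip(seg_a, seg_b)) / len(seg_a)


def closest_reference_by_window(query, references, size, step):
    """Return (window_start, name_of_most_similar_reference) for each window."""
    calls = []
    for start in range(0, len(query) - size + 1, step):
        scores = {
            name: window_identity(query, seq, start, size)
            for name, seq in references.items()
        }
        calls.append((start, max(scores, key=scores.get)))
    return calls


def candidate_breakpoints(calls):
    """Window starts at which the closest reference differs from the previous window."""
    return [
        cur_start
        for (_, prev_name), (cur_start, cur_name) in zip(calls, calls[1:])
        if prev_name != cur_name
    ]


# Hypothetical references that differ at every position, and a query
# stitched together from the first half of X and the second half of Y.
ref_x = "ACGT" * 10
ref_y = "TGCA" * 10
query = ref_x[:20] + ref_y[20:]

calls = closest_reference_by_window(query, {"X": ref_x, "Y": ref_y}, size=10, step=10)
print(calls)
print(candidate_breakpoints(calls))
```

In real data the switch is far noisier, which is one reason several independent lines of evidence are combined before a region is treated as free of breakpoints.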
Next, the team reconstructed the phylogenetic histories for the nonrecombinant regions and compared them to each other to see which specific viruses have been involved in recombination events in the past. They found that the lineage of viruses to which SARS-CoV-2 belongs diverged from other bat viruses about 40 to 70 years ago.
Within the shared lineage, the viruses have acquired residues in the spike protein receptor-binding domain that enable them to recognize and bind to receptors on human cells. For SARS-CoV-2, these residues are more closely related to those of a pangolin virus than to those of RaTG13, suggesting recombination. The presence of this single lineage with properties that allowed it to infect human cells is related to the first SARS-CoV lineage.
Lastly, the team gauged the length of time that the lineage has circulated in bats. To do so, they estimated the time to the most recent common ancestor of SARS-CoV-2 and RaTG13. SARS-CoV-2 is genetically similar to RaTG13, which was sampled from a Rhinolophus affinis horseshoe bat in Yunnan province, China, in 2013. Even so, the researchers found that the two viruses diverged a relatively long time ago, around 1969.
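The arithmetic behind a time-to-common-ancestor estimate can be shown with a deliberately simplified strict molecular clock. It assumes that both branches accumulate substitutions at one constant rate, so the genetic distance d between two genomes sampled in years y1 and y2 satisfies d = r(y1 - T) + r(y2 - T), which gives T = (y1 + y2 - d/r) / 2. The Python sketch below applies that formula. The function name and every number in it are placeholders chosen for illustration, not values from the study, whose dating analysis is more sophisticated than this.

```python
def tmrca_year(year_a: float, year_b: float, distance: float, rate: float) -> float:
    """Strict-clock estimate of the calendar year of the most recent common ancestor.

    distance: substitutions per site separating the two sampled genomes
    rate:     substitutions per site per year, assumed equal on both branches
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Total evolutionary time summed over both branches, in years.
    total_branch_years = distance / rate
    return (year_a + year_b - total_branch_years) / 2


# Placeholder inputs (hypothetical): two genomes sampled in 2010 and 2020,
# separated by 0.03 substitutions/site, evolving at 5e-4 substitutions/site/year.
estimate = tmrca_year(2010, 2020, distance=0.03, rate=5e-4)
print(f"Estimated common ancestor year: {estimate:.0f}")
```

Because the estimate scales inversely with the assumed rate, halving the rate doubles the total branch time, so divergence dates are usually reported as ranges rather than single years.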
Most sarbecoviruses are spatially structured according to regional provinces in China, with little crossover. The authors noted that SARS-CoV-2 and RaTG13 are the exception within the lineage -- although they share sequence identity, they were sampled from geographically distant locations.
Collectively, the analysis points to bats being the primary reservoir for the SARS-CoV-2 lineage.
Prioritizing surveillance efforts
The authors suggested that there are unsampled virus lineages circulating in horseshoe bats that have the zoonotic potential to cause disease. However, without better surveillance sampling, it is impossible to estimate whether such lineages exist or how many there are.
"The key to successful surveillance is knowing which viruses to look for and prioritizing those that can readily infect humans," said author David Robertson, PhD, professor of computational virology at Medical Research Council-University of Glasgow Centre for Virus Research. "We should have been better prepared for a second SARS virus."
"We were too late in responding to the initial SARS-CoV-2 outbreak, but this will not be our last coronavirus pandemic," Boni noted. "A much more comprehensive and real-time surveillance system needs to be put in place to catch viruses like this when case numbers are still in the double digits."