New software finds relationships between RNA modifications, cancers

By Leah Sherwood, The Science Advisory Board assistant editor

September 13, 2021 -- A new computational tool called ModTect was able to identify RNA modifications using preexisting sequencing data gathered from clinical cohort studies. The tool may bring scientists closer to understanding the relationships between RNA modifications and the development of cancer and other diseases, according to a recent study in Science Advances.

ModTect is being touted as an advancement in the new field of epitranscriptomics, the study of chemical modifications to RNA and their effects on gene expression. Just as epigenetics is revolutionizing the study of modified DNA expression, epitranscriptomics promises to do the same for RNA.

The epitranscriptome includes all the biochemical modifications of the RNA (the transcriptome) within a cell. To date, approximately 170 distinct RNA modifications have been identified by the research community. This is important because past studies have suggested links between certain RNA modifications and diseases like cancer and Alzheimer's disease.

However, studying RNA modifications in the lab has been difficult for several reasons. For instance, the chemical reagents needed to treat human samples to detect small modifications can be difficult to acquire. Furthermore, it is difficult to obtain the large quantities of RNA required, which can range from tens of micrograms to milligrams, especially in the case of patients with rare diseases.

Due to recent technical advances, standard RNA sequencing (RNA-seq) is now commonly performed, and large RNA datasets have been compiled from a wide range of clinical samples. The existence of these datasets raises the possibility of studying RNA modifications indirectly by using statistical analysis to identify modifications that were unintentionally captured in the sequencing data.

This is the insight that inspired a team of researchers from the Cancer Science Institute of Singapore (CSI Singapore) at the National University of Singapore to develop the ModTect system, a statistical framework that uses precompiled datasets obtained from RNA-seq of clinical samples to discover new RNA modifications and establish links between RNA modifications and disease.

A new computational approach

To detect modifications in RNA sequences compiled from large clinical cohort studies, ModTect looks for "misincorporation signals," which are generated when the sequenced strand incorporates a nucleotide that differs from the one normally found at a given position. This can arise because of a mismatch, when random nucleotides are incorporated during sequencing, or a deletion, when a portion of the nucleotide sequence is skipped over.
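As a rough illustration of what a misincorporation signal looks like in sequencing data, the sketch below summarizes the reads covering a single position as a mismatch rate and a deletion rate. It is not ModTect's code or its statistical model; the function name, the input format (one character per read, with "-" marking a read that skips the position), and the example reads are all assumptions made for illustration.

```python
from collections import Counter


def misincorporation_signal(ref_base, observed):
    """Summarize how reads disagree with the reference at one position.

    ref_base: the nucleotide expected at this position.
    observed: one character per read covering the position:
              'A', 'C', 'G', 'T', or '-' for a read that skips it (deletion).
    This input format is hypothetical, chosen only to keep the example small.
    """
    counts = Counter(observed)
    depth = len(observed)
    if depth == 0:
        return {"depth": 0, "mismatch_rate": 0.0, "deletion_rate": 0.0}

    deletions = counts.get("-", 0)
    # A mismatch is any read base that is neither the reference nor a deletion.
    mismatches = depth - deletions - counts.get(ref_base, 0)
    return {
        "depth": depth,
        "mismatch_rate": mismatches / depth,
        "deletion_rate": deletions / depth,
    }


# Ten made-up reads over a position whose reference base is A.
signal = misincorporation_signal("A", "AAAATTCG--")
print(signal)
```

In this made-up example, four of the ten reads carry a base other than A and two skip the position, so an unmodified-looking site (all reads showing A) would score zero on both rates.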

Unlike previous computational approaches, ModTect does not need to be trained on a database of misincorporation signals to identify or classify them, and it is able to identify previously unseen mismatches and deletions.

After detecting the misincorporation signals, ModTect uses them to identify base pair-disrupting RNA modifications, which are associated with specific multinucleotide mismatch and deletion signals in the sequence data. ModTect can not only capture these RNA modification sites on ribosomal RNAs (rRNAs) but also distinguish them from non-base pair-disrupting RNA modifications, which can be ignored.
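To make the idea of a multinucleotide mismatch signal concrete, the following sketch flags a position when reads show more than one alternative nucleotide, as opposed to a single consistent substitution. This is a simple heuristic written for illustration, not the statistical framework described in the study; the function name and both thresholds (min_alt_bases, min_fraction) are hypothetical.

```python
from collections import Counter


def looks_multinucleotide(ref_base, observed, min_alt_bases=2, min_fraction=0.05):
    """Return True if several different non-reference bases appear at a position.

    observed: one character per read ('A', 'C', 'G', 'T', or '-' for a deletion).
    min_alt_bases, min_fraction: illustrative cutoffs, not values from the study.
    """
    # Ignore deletions here; only substituted bases are counted.
    counts = Counter(base for base in observed if base != "-")
    depth = sum(counts.values())
    if depth == 0:
        return False

    # Keep alternative bases seen in at least min_fraction of the reads.
    alt_bases = [
        base
        for base, n in counts.items()
        if base != ref_base and n / depth >= min_fraction
    ]
    return len(alt_bases) >= min_alt_bases


mixed = looks_multinucleotide("A", "AAAAAATTCCGG")   # T, C, and G all appear
single = looks_multinucleotide("A", "AAAAAAGGGGGG")  # only G appears
print(mixed, single)
```

A single, consistent substitution could also come from a genetic variant, which is one reason a mix of several alternative bases at the same position is a more distinctive pattern to look for.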

Putting ModTect to the test

To test the accuracy of ModTect, the researchers applied it to sequencing data gathered from 11,371 patient samples and 934 cell lines across 33 cancer types. The data were obtained from the Cancer Cell Line Encyclopedia maintained by the Broad Institute of MIT and Harvard.

ModTect identified 9,823 base pair-disrupting mRNA and rRNA modification sites in the data, indicating dysregulation (impairment) in the patients' epitranscriptomes. Subsequent analysis of the results revealed sites with RNA modification levels that were strongly associated with specific tumor types and disease states.

Three known types of base pair-disrupting RNA modification sites that ModTect discovered at nucleotide resolution on rRNAs were N1-methyladenosine at 28S:1322 rRNA, 3-methyluridine at 28S:4530 rRNA, and 3-(3-amino-3-carboxypropyl) pseudouridine at 18S:1248 rRNA.

While these three types of base pair-disrupting RNA modification sites were previously known and well characterized, all three sites have traditionally escaped detection by Sanger sequencing (one of the most common methods of DNA sequencing). The authors speculated that this may be because some epitranscriptomic modifications impede the use of standard reverse transcriptase, the enzyme that converts RNA into DNA, which is critical to the success of Sanger sequencing.

In addition to previously known rRNA modification sites, ModTect identified a previously unknown type of base pair-disrupting RNA modification site on mRNAs called N2,N2-dimethylguanosine.

"This work is one of few studies demonstrating the association of mRNA modification with cancer development," said co-author Henry Yang, PhD, who is a research associate professor at CSI Singapore, in a statement. "We show that the epitranscriptome was dysregulated in patients across multiple cancer types and was additionally associated with cancer progression and survival outcomes."

The breakthrough was published in Science Advances on August 4, 2021.

Looking to the future, the authors expect that studies such as this one in the domain of RNA will transform the characterization of disease, much like the Human Genome Project did in the domain of DNA.

"We anticipate that studies like this one, eventually leading to complete sequencing of RNA and detecting modifications directly in RNA, will also have a major impact on the characterization of disease and lead to novel therapeutic approaches," said co-author Daniel Tenen, senior principal investigator at CSI Singapore.

The software is publicly available on GitHub for other scientists to use.
