Molecular barcoding of DNA identifies rare mutations in stem cells

By Samantha Black, PhD, ScienceBoard editor in chief

August 27, 2020 -- Scientists have developed a new next-generation sequencing technique using molecular barcodes that can accurately detect a single genetic mutation in a pool of 10,000 cells. The researchers applied the approach to CRISPR-Cas9 genome editing to search for undesirable mutations caused by the platform. The research and methods were published in Genome Biology on August 24.

Current genetic sequencing techniques lack the sensitivity required to detect rare gene mutations in a large pool of cells. Investigational gene therapy platforms such as CRISPR-Cas9, which aim to make on-target changes to treat cancers and other diseases, rely on accurate detection of gene mutations to verify those changes.

Third-generation sequencing technologies (long-read sequencing) frequently produce reads in excess of 10,000 base pairs, allowing for improved de novo assembly, mapping, and detection of structural variants compared to earlier sequencing techniques. However, these single-molecule sequencing technologies provide lower per-read accuracy and lower resolution than short-read sequencing.

Detecting rare variants

Scientists at King Abdullah University of Science and Technology (KAUST) have developed a highly accurate strategy called targeted individual DNA molecular sequencing (IDMseq). This type of sequencing involves attaching molecular barcodes (or unique molecular identifiers, UMIs) to every DNA molecule in a sample of cells and then making a large number of copies using polymerase chain reaction (PCR).

The technique guarantees that each original DNA molecule is uniquely represented by one molecular barcode after sequencing, thereby preventing false barcodes and allowing quantification of allele frequency in the original population. It is able to sensitively detect genetic variants, including single nucleotide variants, indels, large deletions, and complex rearrangements.

The researchers also developed a bioinformatics toolkit called variant analysis with UMIs for long-read technology (VAULT) to analyze sequencing data from IDMseq. VAULT uses a combination of algorithms to detect mutations based on the unique barcodes. In the system, every barcode represents one of the original DNA molecules. It works well with third-generation, long-read sequencing technologies and helped the researchers determine the frequency of all types of mutations -- from changes in a single DNA nucleotide to large deletions and insertions in the original DNA molecules.
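The barcode-counting idea can be illustrated with a minimal Python sketch. It is not VAULT's actual algorithm: the function name, the toy reads, and the simple majority vote within each barcode group are all assumptions made for illustration. Reads are grouped by UMI, each group is collapsed to a single consensus call, and allele frequency is computed over barcodes rather than over reads.

```python
from collections import Counter, defaultdict


def allele_frequencies(reads):
    """Estimate allele frequencies from (umi, allele) pairs, one pair per read.

    Hypothetical helper for illustration only, not part of VAULT.
    """
    # Group reads that share a barcode: they are copies of one original molecule.
    by_umi = defaultdict(list)
    for umi, allele in reads:
        by_umi[umi].append(allele)

    # One vote per original molecule: the most common allele among its copies.
    # (Ties fall back to the allele seen first; a real tool would handle them explicitly.)
    consensus = {
        umi: Counter(alleles).most_common(1)[0][0]
        for umi, alleles in by_umi.items()
    }

    counts = Counter(consensus.values())
    total = len(consensus)
    return {allele: n / total for allele, n in counts.items()}


# Toy data: four original molecules, nine reads, one read carrying an error.
reads = [
    ("AAAC", "WT"), ("AAAC", "WT"), ("AAAC", "DEL"),
    ("GGTT", "DEL"), ("GGTT", "DEL"),
    ("CCGA", "WT"),
    ("TTAG", "WT"), ("TTAG", "WT"), ("TTAG", "WT"),
]
freqs = allele_frequencies(reads)
```

In this toy example, counting raw reads would put the deletion at three of nine reads, while counting barcodes puts it at one of four molecules. The erroneous read and the uneven PCR copy numbers no longer distort the estimate.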

In the study, the team was able to detect rare variants and accurately estimate variant frequency of a knock-in gene mutation mixed with a group of wild-type cells at ratios of 1:100, 1:1,000, and 1:10,000. Due to the large size and low frequency of the genetic variant, it would have been missed by alternative short-read sequencing or ensemble long-read sequencing techniques.
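To give a sense of scale for these ratios, the following back-of-the-envelope Python sketch estimates how many barcoded molecules would need to be sampled to see a rare variant at least once. It is not taken from the study. It assumes each sequenced molecule is an independent draw from the pool, treats a 1:10,000 mix as a frequency of roughly 0.0001, and ignores sequencing errors and molecules lost during barcoding or amplification. The function names are hypothetical.

```python
import math


def detection_probability(frequency, molecules):
    """Chance that at least one of `molecules` sampled molecules carries the variant."""
    return 1 - (1 - frequency) ** molecules


def molecules_needed(frequency, confidence=0.95):
    """Smallest number of sampled molecules giving the requested detection chance."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - frequency))


# Approximate frequencies for the 1:100, 1:1,000, and 1:10,000 mixes.
for frequency in (1e-2, 1e-3, 1e-4):
    n = molecules_needed(frequency)
    print(f"frequency {frequency:g}: about {n} molecules for a 95% chance of detection")
```

Under these assumptions, the required sampling depth grows roughly in inverse proportion to the variant frequency, which helps explain why detecting a 1:10,000 variant depends on counting each original molecule exactly once.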

Searching for mutations caused by CRISPR-Cas9

Recent studies have shown that genome editing by CRISPR-Cas9 can cause harmful on-target mutagenesis, including large deletions and complex rearrangements, in various cell types. In the past, these large deletions were detected by ensemble amplicon sequencing, which is prone to amplification bias, or whole-genome sequencing, which cannot adequately detect large and complex variants due to limited read length.

"Several recent studies have reported that Cas9 introduces unexpected, large DNA deletions around the edited genes, leading to safety concerns. These deletions are difficult to detect and quantitate using current DNA sequencing strategies. But our approach, in combination with various sequencing platforms, can analyze these large DNA mutations with high accuracy and sensitivity," noted co-first author, Chongwei Bi, a PhD student at KAUST.

The analysis showed that large deletions occurred in 2.8% to 5.4% of Cas9 repair editing outcomes. The authors suggested that Cas9 cutting may not be random and there are likely hotspots for Cas9-induced large deletions or insertions.

"Our study revealed potential risks associated with CRISPR/Cas9 editing and provides tools to better study genome editing outcomes," explained lead author Mo Li, PhD, bioscientist at KAUST.

They also detected a 300% increase in the number of somatic single nucleotide variants after CRISPR-Cas9 editing.

"This shows that there is a lot that we need to learn about CRISPR/Cas9 before it can be safely used in the clinic," said co-author Yanyi Huang, PhD, of Peking University.



