February 9, 2022 -- The Genome in a Bottle (GIAB) consortium has developed a benchmark set for challenging, medically relevant autosomal genes that are associated with several diseases, using a haplotype-resolved, whole-genome assembly. The research was published in Nature Biotechnology on February 7.
Earlier benchmark samples have underpinned new technologies and led to the discovery of new variants, contributing to accurate clinical genome sequencing. In addition, these samples have expanded scientists' understanding of how genomic variation influences human disease and provided insight into challenging genes linked to diseases that affect many patients.
However, challenges remain for medically relevant genes that are repetitive or highly polymorphic and lie in hard-to-assess regions. Clinical tests for these genes require locus-specific targeted designs and multiple technologies, and they are conducted only when a specific disorder is suspected.
Now, a new benchmark set covers single-nucleotide variants, small insertions and deletions, and structural variants (SVs). It could become the basis for future benchmarks and help improve the diagnosis of several diseases.
Researchers at the National Institute of Standards and Technology (NIST) worked on a carefully curated benchmark of phased small variants and SVs for 273 medically relevant, challenging genes. These genes are frequently excluded, either partially or completely, from standard targeted-exon or whole-genome sequencing and analysis, even though their role in several diseases is well established and documented.
The new research provided concrete examples of the challenges of calling variants in these genes, such as mapping difficulties that differ between technologies, and it identified genes for which one reference, genome reference consortium human build 37 (GRCh37) or GRCh38, is more accurate than the other. The benchmark was curated to complement earlier mapping-based benchmarks. It reports more than 17,000 single-nucleotide variations, 3,600 insertions and deletions, and 200 structural variations for each of the human genome references GRCh37 and GRCh38 in the HG002 sample.
One example is SMN1, which lies within a large segmental duplication on chromosome 5 that contains both SMN1 and SMN2. Biallelic pathogenic variants in SMN1 lead to spinal muscular atrophy. The 28-kb sequences of SMN1 and SMN2 differ by only five intronic and three exonic nucleotides. Detecting pathogenic variants in SMN1 and determining the copy number of SMN2 can help in developing new therapies and in counseling families about the recurrence risk of this disease.
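To put those numbers in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not part of the study: it treats "28 kb" as exactly 28,000 bases and assumes each of the eight differences is a single-base substitution, and the function and variable names are made up for this example.

```python
def paralog_identity(length_bp: int, n_differences: int) -> float:
    """Fraction of positions that are identical between two aligned copies."""
    return 1 - n_differences / length_bp


# Figures from the article; 28 kb is rounded, so the result is approximate.
SMN_LENGTH_BP = 28_000
SMN_DIFFERENCES = 5 + 3  # five intronic plus three exonic nucleotides

identity = paralog_identity(SMN_LENGTH_BP, SMN_DIFFERENCES)
mean_spacing_bp = SMN_LENGTH_BP / SMN_DIFFERENCES

print(f"Sequence identity: {identity:.4%}")           # Sequence identity: 99.9714%
print(f"Mean spacing: {mean_spacing_bp:.0f} bp")      # Mean spacing: 3500 bp
```

With a distinguishing base only about every 3,500 positions on average, a read far shorter than that will often contain no position that tells SMN1 apart from SMN2, which is one way to see why this region is hard to map.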
NCF1 is another challenging example. It is linked with 20% of cases of chronic granulomatous disease, and because the gene sits within a large segmental duplication, the molecular diagnosis of some of these cases is difficult.
The new benchmark includes the first two exons, which were missing from the v4.2.1 benchmark. It also provides a way to measure the accuracy of variant calls in gene conversion-like events and SVs.
The research also showed that in medically relevant genes, including CBS, CRYAA, and KCNE1, false duplications in either GRCh37 or GRCh38 lead to reference-specific missed variants for both short- and long-read technologies. The authors reported that masking these false duplications can improve variant recall from 8% to 100%.
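Recall here is the share of benchmark variants that a pipeline actually finds: true positives divided by true positives plus false negatives. The short Python sketch below shows the calculation. The variant counts are hypothetical, chosen only to reproduce the 8% and 100% figures, and are not taken from the study.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN): the share of benchmark variants that were called."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("recall is undefined when there are no benchmark variants")
    return true_positives / total


# Hypothetical counts for a gene with 100 benchmark variants (illustration only).
before_masking = recall(true_positives=8, false_negatives=92)
after_masking = recall(true_positives=100, false_negatives=0)

print(f"Before masking: {before_masking:.0%}")  # Before masking: 8%
print(f"After masking: {after_masking:.0%}")    # After masking: 100%
```

Recall says nothing about false positives, so a full evaluation against a benchmark would normally report precision alongside it.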
The method used to build the new benchmark from a haplotype-resolved, whole-genome assembly could be the basis for future comprehensive benchmarks that span the whole genome and combine small variants and SVs. Moreover, the new benchmark allows a thorough evaluation of sequencing strategies, analytical methods, and other advances for challenging, medically relevant variants and regions, which could improve clinical diagnoses and medicine's understanding of the heritability of multiple diseases.
"Some of these genes, which have previously been very difficult to access, are suspected to have some connection to disease," said Justin Zook, PhD, a NIST biomedical engineer and co-author of the study, in a statement. "Others have very clear clinical importance. SMN1, for example, is a gene we characterized that is directly associated with spinal muscular atrophy, a rare but severe condition."
"Still, there are a few challenging regions that remain excluded from our benchmark or are not resolvable in spite of the availability of comprehensive long-read data," Zook et al noted. "Thus, creating the need for more advanced methods that aid in improved diagnosis."