New reference genome helps understand diversity

October 16, 2019 -- Houston, TX -- At the American Society of Human Genetics 2019 Annual Meeting, scientists shared results from the new human reference genome project. The new sequences improve the utility of the human reference genome, a touchstone resource for modern genetics and genomics research, by helping it better capture genetic diversity among different human populations.

The original Human Genome Project was completed in 2003. It provided a set of DNA sequences that serves as a structure and representative example of the complete set of human genes. For areas of the genome where there is little variation among different people, the reference genome is an important resource that has helped move forward efforts in gene sequencing, genome-wide association studies, and protein characterization.

A more representative reference would benefit scientists using the millions of existing sequencing datasets, as well as future sequencing studies, explains Karen Wong, BS, a graduate student in Professor Pui-Yan Kwok's laboratory at the University of California, San Francisco (UCSF), who presented the research. The current reference genome "is limiting because the reference genome is constructed with DNA from a few people, and over 70% of its sequences comes from a single donor," Ms. Wong said.

When researchers study groups that have more genetic differences from the reference genome, the reference is less useful and may even introduce error or bias into the results. Insertion sequences pose a particular challenge. They vary greatly between individuals and are not included in the current reference genome, so they are often discarded and their significance is unknown. The research team at UCSF wanted to determine where these sequences fit in the genome in order to understand their potential effects.

To accomplish this goal, the scientists fully sequenced more than 300 genomes from around the world, including both male and female sequences from various subpopulations. Focusing on the areas of the genomes that did not map to the reference genome, they used a process called de novo assembly to identify new, unique insertion sequences and their locations relative to the reference. They were able to place the vast majority of the unique sequences missing from the reference, which enabled them to add detail to the reference genome and improves researchers' ability to map future sequences and study them in context.
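
For readers curious about what "placing" an insertion relative to the reference can mean in practice, here is a minimal toy sketch in Python. It is not the UCSF team's pipeline, which the release does not describe in detail. It assumes an already assembled contig whose first and last few bases match the reference exactly, and it reports the reference position where the novel sequence sits. The function names, the `flank` parameter, and the example sequences are all hypothetical.

```python
from typing import Optional, Tuple


def unique_position(text: str, pattern: str) -> Optional[int]:
    """Return the start index of pattern in text if it occurs exactly once."""
    first = text.find(pattern)
    if first == -1 or text.find(pattern, first + 1) != -1:
        return None
    return first


def place_insertion(reference: str, contig: str,
                    flank: int = 8) -> Optional[Tuple[int, str]]:
    """Toy placement of an insertion using exact-match flanking anchors.

    Assumption: the contig is left anchor + novel sequence + right anchor,
    and both anchors match the reference exactly and uniquely.
    Returns (reference position, inserted sequence), or None if unplaced.
    """
    if len(contig) <= 2 * flank:
        return None  # no room for novel sequence between the anchors
    left_start = unique_position(reference, contig[:flank])
    right_start = unique_position(reference, contig[-flank:])
    if left_start is None or right_start is None:
        return None  # an anchor is missing or ambiguous in the reference
    if left_start + flank != right_start:
        return None  # anchors are not adjacent, so not a simple insertion
    return right_start, contig[flank:-flank]


# Hypothetical example: a 7-base insertion between reference positions 10 and 11.
reference = "ACGTTGCAGGCTAACCGTTAGC"
contig = reference[3:11] + "GATTACA" + reference[11:19]
print(place_insertion(reference, contig))  # (11, 'GATTACA')
```

Real pipelines work with sequencing errors, repeats, and inexact alignments, so exact string matching is only a stand-in for the alignment step.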

"The human genome has many complex regions, which no single reference structure can fully describe," Ms. Wong said. "It is important to consider what information should be represented in the human reference genome, and what doesn't need to be included, to make sure it is helpful for a variety of uses but avoids overcomplexity."

Copyright © 2019