Machine learning helps scientists map the mammalian immune system

By Samantha Black, PhD, ScienceBoard editor in chief

July 7, 2021 -- The use of advanced machine-learning software has led to the identification of hundreds of new genes that likely affect the working immune system, according to findings from a study published recently in the Proceedings of the National Academy of Sciences of the United States of America.

Forward genetics is the process of working backwards, starting with a phenotype (often induced by a random germline mutagen) and ending with discovery of the causative mutation. Traditionally, a mutant phenotype detected in screening is linked to a particular mutation by automated meiotic mapping (AMM), in which statistical probability is suggestive of causation. However, statistical linkage is not the sole indicator of causation.

Many other factors influence the correct selection of authentic causative mutations, including:

  • The nature of the mutation (benign, damaging, null)
  • The necessity of the gene for survival prior to weaning
  • The number of homozygotes tested and magnitude of the phenotypic effect
  • Data variance characteristics of the screen in question
  • The number of distinct phenotypes caused by the mutation
  • The presence or absence of cosegregating mutations
  • The observation of other alleles with similar effects

In the current study, a research team at the University of Texas Southwestern, led by Nobel Laureate Dr. Bruce Beutler, director of the Center for the Genetics of Host Defense (CGHD), set the goal of using artificial intelligence to accurately determine whether a germline mutation is causative of a particular phenotype.

The team developed Candidate Explorer (CE), a machine-learning algorithm that integrates 67 features of genetic mapping data into a single numeric score, mathematically convertible to the probability of verification of any putative mutation-phenotype association. Input features considered by the software include phenotype data (from screening), mutation data, gene data, and meiotic mapping data.
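As a rough illustration of how a single numeric score can be converted to a probability, the sketch below applies a logistic function to a weighted sum of input features. The article does not describe the model behind Candidate Explorer, so the feature names, weights, bias, and the logistic form are all assumptions chosen only for illustration.

```python
import math

# Hypothetical weights for three made-up features. CE's actual 67 features
# and its model are not described in the article.
WEIGHTS = {
    "mapping_log10_p": 0.8,     # assumed: strength of the meiotic mapping signal
    "damaging_mutation": 1.5,   # assumed: 1.0 if the mutation is predicted damaging
    "homozygotes_tested": 0.2,  # assumed: number of homozygous mice screened
}
BIAS = -4.0  # hypothetical intercept


def score(features):
    """Combine features into one numeric score (missing features count as 0)."""
    return BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())


def score_to_probability(s):
    """Logistic function: maps any real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))


# A made-up mutation/phenotype association, not real screening data.
candidate = {"mapping_log10_p": 5.0, "damaging_mutation": 1.0, "homozygotes_tested": 3.0}
s = score(candidate)
p = score_to_probability(s)
print(f"score={s:.2f}, probability={p:.3f}")
```

A higher score always gives a higher probability under this mapping, which is the property that lets a single score rank candidate associations.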

Beutler explained that this work is part of a research program started nearly a decade ago to identify every mutation that may affect the mouse immune system. With the findings from the current study, the program has been used to screen over 55% of mouse genes for effects on flow cytometry measurements of immune cells in the blood.

"The purpose of CE is to help researchers predict whether a mutation associated with a phenotype (trait or function) is a truly causative mutation," Beutler said. "CE has already helped us to identify hundreds of genes with novel functions in immunity. This will improve our understanding of the immune system so that we can find new ways to keep it robust, and also know the reason it sometimes falters."

The study focused on changes in immune cell populations, specifically B cells, T cells, dendritic cells (DCs), macrophages, neutrophils, NK cells, and NK1.1+ T cells. All cell populations and subpopulations were detected and measured by flow cytometric analysis of peripheral blood leukocytes from mutated mice.

Candidate Explorer successfully evaluated around 87,000 mutation/phenotype associations identified by flow cytometry screening of circulating immune cells. From these associations, the software distinguished 1,279 genes representing 2,336 mutations that were rated good or excellent candidates for causation of phenotypes.

The data suggest that the majority of genes affecting immune cell populations in the blood carry out cell type- or phenotype-specific functions. This is evidenced by the finding that 68% of genes had only one, two, or three good or excellent phenotype associations, compared with only 23% of genes that had more than 20 good or excellent associations (many of them well-known immune regulatory genes).

Moreover, 449 genes (35%) known or predicted to be essential for viability were associated with at least one flow cytometry phenotype, indicating that many developmentally important genes likely also have functions in adult leukocytes.

Interestingly, the scientists found that the number of good or excellent gene associations varied widely depending on the cell type affected. Generally, B-cell and T-cell phenotypes were associated with the most genes while DC phenotypes were associated with very few genes.

The researchers also identified 386 "new" immunologically important genes. For many of these genes, no primary immunological or other phenotypic data were previously available.

These data will be useful to mouse geneticists in identifying causative mutations within quantitative trait loci, while clinical geneticists may use Candidate Explorer to help connect causative variants with rare heritable diseases of immunity, even in the absence of linkage information. In this way, the software might facilitate and accelerate identification of causal variants within disease-associated loci found by genome-wide association studies.

"We've now worked through about 60% of the genome and have identified thousands of genes -- hundreds of them novel -- that participate in immunity in the mouse," Beutler commented. "The vast majority of these also contribute to human immunity."

The mouse and human genomes are very similar in size and gene content.

"Almost all mouse genes have a human counterpart, and vice versa," Beutler explained.

The scientists have made all mutations available to the scientific community through a public repository, and the data supporting causation are viewable within the Candidate Explorer program on the CGHD website.




Copyright © 2021
