AACR 2021: How machine learning and artificial intelligence are transforming cancer research

By Samantha Black, PhD, ScienceBoard editor in chief

April 14, 2021 -- Digital tools are already used in many areas of cancer research. However, experts agree that machine learning and artificial intelligence (AI) are poised to transform drug discovery even further in the near future. Several scientists presented case studies of how machine learning can be used in cancer biology and drug discovery during the virtual American Association for Cancer Research (AACR) 2021 meeting, held April 9-14.

In a session titled "Data Science and Machine Learning: Will They Revolutionize Cancer Cure and Research?" participants discussed the applications of machine learning in drug discovery and precision cancer medicine.

Michael Brands from Bayer opened the session with an overview of why digitization in drug discovery has lagged behind other industries, citing the complexity of biological systems. Even so, computational tools are already widely used at every stage of drug discovery, from target identification to lead optimization and qualification.

According to Brands, machine learning will continue to become a more integral component of the drug discovery process. He explained that data-driven networks will enable more insights into disease targets, and digital tools will increase reliance on in silico data, thereby reducing the need for costly and time-consuming experimental data.

Humans serving as a model for humans

This set the scene for Matthew Albert, PhD, from Insitro to provide a specific example of how machine learning is being used in drug discovery -- namely, liver fibrosis in the context of nonalcoholic steatohepatitis (NASH).

"We're at a time of convergence, with the opportunity to harness advances in cell biology and biotechnology, as well as advances in machine learning," Albert said. "We can now extract novel features from complex data, making predictions beyond human performance."

He began by explaining that earlier machine-learning models relied on manually defined features, but humans are not very good at choosing the features that best support prediction. Newer models are therefore trained end to end, learning the features they need directly from the data. These learned features are then used to construct a low-dimensional "manifold" in which the data are embedded, which allows the data to be visualized and the distances among objects in the dataset to be mapped.
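To make the idea of embedding data in a low-dimensional space concrete, here is a minimal sketch in Python with NumPy. It assumes the learned features are already available as a matrix with one row per sample, and it uses principal component analysis, a simple linear technique swapped in for illustration, rather than whatever learned manifold Insitro uses. The function names and the random feature matrix are hypothetical.

```python
import numpy as np

def embed_pca(features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project feature vectors (rows) onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def pairwise_distances(points: np.ndarray) -> np.ndarray:
    """Euclidean distance between every pair of embedded points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical learned features: 5 samples x 10 features.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 10))
coords = embed_pca(features, n_components=2)
dist = pairwise_distances(coords)
print(coords.shape, dist.shape)
```

Each row of `coords` can be plotted as a point, and `dist` gives the distances between objects in the embedded space; a nonlinear method would replace `embed_pca` without changing the distance step.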

Albert continued by demonstrating how machine learning can be deployed to analyze intermediate disease phenotypes, which can help researchers unravel the genetic causes of disease. Insitro applied a convolutional neural network to over 4,000 high-content histology images, which were divided into over 2 million individual tiles that could be used to train the model.
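Cutting each image into many small tiles is how a few thousand slides can yield millions of training examples. The sketch below shows non-overlapping tiling with NumPy; the tile size and slide dimensions are hypothetical, since the talk as reported does not state Insitro's tiling parameters.

```python
import numpy as np

def tile_image(image: np.ndarray, tile_size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping tile_size x tile_size tiles.

    Edge pixels that do not fill a complete tile are dropped.
    """
    h, w, c = image.shape
    n_rows, n_cols = h // tile_size, w // tile_size
    cropped = image[: n_rows * tile_size, : n_cols * tile_size]
    tiles = cropped.reshape(n_rows, tile_size, n_cols, tile_size, c)
    # Reorder to (row, col, y, x, channel), then flatten the tile grid row by row.
    return tiles.transpose(0, 2, 1, 3, 4).reshape(-1, tile_size, tile_size, c)

# Hypothetical slide: 1,024 x 1,536 RGB pixels cut into 256-pixel tiles.
slide = np.zeros((1024, 1536, 3), dtype=np.uint8)
tiles = tile_image(slide, tile_size=256)
print(tiles.shape)  # (24, 256, 256, 3)
```

Real whole-slide pipelines usually also discard tiles that are mostly background, which this sketch does not attempt.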

The network extracted quantitative histological traits that are predictive of the standard histology scores assigned by pathologists. When the model was evaluated on 463 samples from patients not included in the training set, the team found good correlation between the pathologists' scores and the machine-learning predictions. Using machine learning, the team identified two novel genome-wide significant variants that are genetic drivers of fibrosis and could be used for future therapeutic development.
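Agreement between model predictions and pathologist scores on held-out samples is commonly summarized with a correlation coefficient. The article does not say which statistic the team used; the sketch below assumes Spearman rank correlation, which suits ordinal histology scores with many ties. The function names and the toy values are hypothetical and are not data from the study.

```python
import numpy as np

def average_ranks(values) -> np.ndarray:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i : j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(pathologist_scores, model_predictions) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    ra = average_ranks(pathologist_scores)
    rb = average_ranks(model_predictions)
    return float(np.corrcoef(ra, rb)[0, 1])

# Toy, made-up values for illustration only.
scores = [0, 1, 1, 2, 3, 4]
preds = [0.1, 0.8, 1.3, 1.9, 3.2, 3.9]
print(round(spearman(scores, preds), 3))
```

Computing the statistic only on patients excluded from training is what makes it a fair estimate of how the model generalizes.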

Echoing the sentiments of Brands, Albert explained that machine learning can help humans serve as a model for humans.

Modeling DNA changes to provide insight into cancer gene regulation

Digging deeper, in a separate AACR session titled "Artificial Intelligence/Machine Learning in Cancer Research and Care: Progress and Challenges," Christina Leslie, PhD, from Memorial Sloan Kettering Cancer Center described work underway to decipher gene regulation in cancer cells using chromatin data.

She explained that transcriptional regulation is coordinated by transcription factors and facilitated by physical interactions between distant enhancers and promoters, which are brought together by chromatin looping. In many cancers, somatic mutations remodel chromatin accessibility, blocking or enhancing transcription and producing specific changes in gene expression.

As an example, she presented data showing that as CD8+ T cells progress to dysfunction in tumors, chromatin accessibility changes at different loci. In this case, expression increased for protumor genes, such as the gene encoding programmed cell death protein 1 (PD-1), and decreased for protective genes, such as interferon gamma.

Leslie's team wanted to better understand how these accessibility changes translate into changes in gene expression and in the phenotype and morphology of cancer cells. To do so, they used 3D chromatin interaction data in graph neural networks to learn predictive gene regulatory models (GRMs). The 3D maps were generated from chromosome conformation capture assays such as Hi-C and HiChIP.

From this workflow, the team developed several iterations of GraphReg -- graph neural networks for gene regulatory models. The team used Hi-C/HiChIP data to encode long-range chromatin interactions as a graph, which is then used to propagate information via graph neural networks. With this method, epigenomic data or DNA sequence can be used to predict gene expression while taking promoter/enhancer interactions into account.
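The general idea of turning chromatin contacts into a graph and passing information along its edges can be sketched with NumPy. This is a generic, hypothetical mean-aggregation message-passing step, not GraphReg's published architecture; the contact threshold, the four-bin layout, and the function names are illustrative assumptions.

```python
import numpy as np

def contacts_to_adjacency(contacts: np.ndarray, threshold: float) -> np.ndarray:
    """Link two genomic bins if their contact count exceeds the threshold."""
    adj = (contacts > threshold).astype(float)
    adj = np.maximum(adj, adj.T)  # make the graph undirected
    np.fill_diagonal(adj, 1.0)    # self-loops keep each bin's own signal
    return adj

def propagate(adj: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One message-passing step: average over neighbors, apply a linear map, then ReLU."""
    degree = adj.sum(axis=1, keepdims=True)
    aggregated = (adj @ features) / degree
    return np.maximum(aggregated @ weights, 0.0)

# Hypothetical 4-bin example: bins 0 and 2 share a strong long-range contact.
contacts = np.array([
    [0, 2, 40, 1],
    [2, 0, 3, 1],
    [40, 3, 0, 2],
    [1, 1, 2, 0],
], dtype=float)
signal = np.array([[1.0], [0.0], [5.0], [0.0]])  # e.g., one epigenomic track per bin
adj = contacts_to_adjacency(contacts, threshold=10.0)
out = propagate(adj, signal, weights=np.eye(1))
print(out.ravel())
```

After one step, bin 0 picks up signal from bin 2 even though they are not neighbors along the linear genome, which is the kind of long-range information flow a 3D contact graph makes possible. Stacking several such layers lets signal travel further across the graph.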

The first iteration of GraphReg is an epigenome-based gene regulatory model (Epi-GraphReg) that can predict gene expression from the activity and connectivity of regulatory elements (e.g., promoter/enhancer elements). The second iteration that Leslie discussed is a sequence-based gene regulatory model (Seq-GraphReg) that can predict gene expression and 1D epigenomic signals from genomic DNA sequence and 3D connectivity.
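Sequence-based models take raw DNA as input, and a common preprocessing step is to one-hot encode each base. The sketch below illustrates that step under the assumption of a standard A/C/G/T column order. It is not taken from Seq-GraphReg's code, and `one_hot_dna` is a hypothetical name.

```python
import numpy as np

BASES = "ACGT"

def one_hot_dna(seq: str) -> np.ndarray:
    """Encode DNA as a (length, 4) array; ambiguous bases such as N become all-zero rows."""
    encoding = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            encoding[i, j] = 1.0
    return encoding

print(one_hot_dna("ACGTN"))
```

The resulting matrix is what a convolutional layer would scan for sequence motifs before any graph-based propagation.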

Importantly, Leslie explained that GraphReg performs better than comparable models that do not include 3D connectivity data. She went on to explain that feature attribution methods (such as DeepSHAP) can be used to interpret the model. These methods assign scores to input positions that indicate which regions, such as putative enhancers, the model relies on when predicting a gene's expression.
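DeepSHAP is available through the `shap` package's `DeepExplainer`. To keep the idea self-contained, the sketch below swaps in a simpler attribution method, occlusion: each input position is masked in turn, and the drop in the prediction becomes that position's score. The toy linear model, its weights, and the function names are hypothetical.

```python
import numpy as np

def occlusion_attribution(model, x: np.ndarray, window: int = 1, baseline: float = 0.0) -> np.ndarray:
    """Score each position by how much the prediction drops when its window is masked."""
    reference = model(x)
    scores = np.zeros(len(x))
    for start in range(0, len(x), window):
        masked = x.copy()
        masked[start : start + window] = baseline
        scores[start : start + window] = reference - model(masked)
    return scores

# Hypothetical toy "model": a fixed linear readout over a 6-bin signal track.
weights = np.array([0.0, 0.1, 2.0, 0.1, 0.0, 0.5])

def linear_model(signal: np.ndarray) -> float:
    return float(weights @ signal)

signal = np.ones(6)
print(occlusion_attribution(linear_model, signal))
```

A high score marks a position the prediction depends on; here the third bin dominates because the toy model weights it most. DeepSHAP reaches similar per-position scores more efficiently for deep networks by propagating attributions backward instead of re-running the model once per masked window.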

The workflows provide novel insights into how machine-learning models can decipher the ways cancer cells regulate gene expression through changes in the epigenome and DNA sequence.
