Machine-learning system ranks most effective cancer drugs

By Leah Sherwood, The Science Advisory Board assistant editor

March 29, 2021 -- Scientists successfully trained an ensemble of machine-learning (ML) algorithms to rank clinically relevant cancer drugs based on the drugs' predicted efficacy in reducing cancer cell growth. The study, reported in Nature Communications on March 25, suggests that ML may soon be widely used to predict the most appropriate treatment for individual patients with cancer.

Protein biomarkers have traditionally been used to direct cancer treatments in patients. One such example is the expression of human epidermal growth factor receptor 2 (HER2) and the treatment of breast cancer patients with trastuzumab, which binds to HER2.

In addition, techniques such as next-generation sequencing (NGS) identify genetic markers that predict responses to several targeted drugs. However, drug response is still challenging to predict in patients given the nature of cancer, which exhibits high degrees of genetic and phenotypic variability within individuals, as well as variability in individual responses to therapy.

Enter the field of proteomic profiling, which seeks to identify and quantify the complete complement of proteins (the proteome) in an individual patient at a specific point in time. Proteomic profiling based on machine learning has the potential to predict drug responses in patients with laserlike precision, thereby fundamentally changing the way cancers are diagnosed and treated.

However, for all the promise of ML-driven proteomics, real-world application to cancer diagnostics and treatments has so far lagged. In addition to a lack of data, a major bottleneck has been the low sample throughput of the standard proteomics technique of liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS).

Another bottleneck has been the laborious process of comparing proteins after chemical or metabolic labeling, which restricts the number of samples that can be used as input for ML model generation. However, recent advances have substantially improved LC-MS/MS throughput and led to better high-throughput, label-free techniques for analyzing proteomes and phosphoproteomes (the complement of phosphorylated proteins).

A new system that takes advantage of these advances was developed by scientists from Queen Mary University of London, who named their algorithm Drug Ranking Using Machine Learning (DRUML). The researchers trained DRUML on the responses of cancer cells to over 400 drugs and then used it to produce ordered lists of the drugs based on their antiproliferative efficacy. In other words, DRUML predicts the best drug to treat a given cancer model.

"DRUML predicted drug efficacy in several cancer models and from data obtained from different laboratories and in a clinical dataset," said Pedro Cutillas, PhD, who led the study and is a professor at Queen Mary University of London, in a statement. "These are exciting results because previous machine learning methods have failed to accurately predict drug responses in verification datasets, and they demonstrate the robustness and wide applicability of our method."

DRUML was trained using in-house proteomics and phosphoproteomics data derived from 48 cancer cell lines: acute myeloid leukemia (AML) (n = 26), esophageal cancer (n = 10), and hepatocellular carcinoma (n = 12). The proteomes and phosphoproteomes of the cell lines were analyzed by LC-MS/MS.

The final dataset, which required 288 LC-MS/MS runs, included 22,804 phosphopeptides and 6,455 proteins, which generated 3,283,776 and 929,520 quantitative data points, respectively. The system was then verified with data from 53 cellular models obtained from 12 independent laboratories.
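The dataset figures can be cross-checked with a little arithmetic. The short Python sketch below divides each data-point total by its feature count. The final two lines rest on an assumption the article does not state: that the 288 runs were split evenly between the proteome and the phosphoproteome. Under that reading, the numbers would be consistent with three runs per cell line for each data type.

```python
# Figures quoted in the article.
phosphopeptides = 22_804
proteins = 6_455
phospho_points = 3_283_776
protein_points = 929_520
lcms_runs = 288
cell_lines = 48

# Quantitative values recorded per feature, implied by the totals.
values_per_phosphopeptide = phospho_points // phosphopeptides
values_per_protein = protein_points // proteins

# Both divisions are exact, so each feature has the same number of values.
assert phospho_points % phosphopeptides == 0
assert protein_points % proteins == 0

# Assumption (not stated in the article): the runs were split evenly
# between proteome and phosphoproteome measurements.
runs_per_data_type = lcms_runs // 2
runs_per_cell_line = runs_per_data_type // cell_lines

print(values_per_phosphopeptide, values_per_protein)  # 144 144
print(runs_per_data_type, runs_per_cell_line)         # 144 3
```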

Rather than relying on a single ML approach, the researchers used what they called a "predictive model ensemble" that combined the predictions of several different ML techniques, including random forest, cubist, partial least squares, principal component regression, support vector machine, deep learning, and neural network learning algorithms.
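As a rough illustration of the ensemble idea, the Python sketch below averages the predictions of several models for each drug and then orders the drugs by the averaged score. Everything in it is hypothetical: the drug names, the numbers, the choice of a simple mean as the combining rule, and the convention that a lower score means less cell growth. The article does not describe how DRUML actually combines its models.

```python
from statistics import mean

# Hypothetical per-model predictions of the fraction of cell growth
# remaining after treatment (lower = more effective). Illustrative only.
predictions = {
    "random_forest": {"drug_A": 0.30, "drug_B": 0.60, "drug_C": 0.45},
    "partial_least_squares": {"drug_A": 0.40, "drug_B": 0.50, "drug_C": 0.35},
    "support_vector_machine": {"drug_A": 0.20, "drug_B": 0.70, "drug_C": 0.40},
}


def ensemble_predict(model_predictions):
    """Average each drug's predicted response across all models."""
    drugs = next(iter(model_predictions.values())).keys()
    return {
        drug: mean(model[drug] for model in model_predictions.values())
        for drug in drugs
    }


def rank_drugs(scores):
    """Order drugs from most to least effective (lowest score first)."""
    return sorted(scores, key=scores.get)


ensemble_scores = ensemble_predict(predictions)
ranking = rank_drugs(ensemble_scores)
print(ranking)  # ['drug_A', 'drug_C', 'drug_B']
```

Averaging is only one way to combine models. Its appeal is that an error made by one learner can be offset by the others.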

The key metric used by DRUML is drug response distance (D), which is computed using empirical markers of drug responses identified in the training set. Within a sample, D is the difference between the overall expression of markers that are elevated in drug-sensitive cells and that of markers elevated in drug-resistant cells. Because D is an internally normalized metric, obtained by subtracting averaged signals from two sets of phosphosites, proteins, or transcripts within a given sample, DRUML can use D to predict drug responses in a new cancer-derived sample without comparing it against a control or reference sample set.
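The internal normalization described above can be shown in a minimal Python sketch. It assumes that marker signals are on a log scale, that each marker set is summarized by a plain mean, and that D is taken as the sensitivity markers minus the resistance markers. The marker names and values are invented for illustration, and the function is not the authors' implementation.

```python
from statistics import mean


def drug_response_distance(sample, sensitivity_markers, resistance_markers):
    """Mean signal of sensitivity markers minus mean signal of resistance markers."""
    sensitive_signal = mean(sample[m] for m in sensitivity_markers)
    resistant_signal = mean(sample[m] for m in resistance_markers)
    return sensitive_signal - resistant_signal


# Hypothetical log-scale marker intensities for one sample.
sample = {"p1": 5.0, "p2": 7.0, "p3": 2.0, "p4": 4.0}
sensitivity_markers = ["p1", "p2"]  # elevated in drug-sensitive cells
resistance_markers = ["p3", "p4"]   # elevated in drug-resistant cells

d = drug_response_distance(sample, sensitivity_markers, resistance_markers)

# A sample-wide offset (for example, a difference in total signal) shifts
# both averages equally, so it cancels in the subtraction.
shifted = {marker: value + 1.5 for marker, value in sample.items()}
d_shifted = drug_response_distance(shifted, sensitivity_markers, resistance_markers)

print(d, d_shifted)  # 3.0 3.0
```

Because the offset cancels, D can be computed from a single sample on its own, which is why no control or reference set is needed.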

"The ability of DRUML to predict drug rankings within a cancer cell population, without the need to compare to reference samples, is crucial for the clinical implementation of ML and fulfills a core aim of precision medicine," the authors wrote.
