December 1, 2021 -- A new approach to machine learning (ML) outperformed current ML methods in drug design, demonstrating its potential to speed up the drug discovery process, according to research published online November 29 in the Proceedings of the National Academy of Sciences.
The new approach, dubbed "transformation machine learning" (TML) by the research team, made better predictions than traditional ML in three domains that address scientific problems, including drug design.
"In drug design, we found that TML provided insight into drug target specificity, the relationships between drugs, and the relationships between target proteins," wrote the authors, led by Ivan Olier of the School of Computer Science and Mathematics at Liverpool John Moores University in the U.K.
Traditional ML vs. TML
Traditional supervised ML algorithms are trained on labeled examples (for example, labeled photographs of different animals), from which they learn to recognize intrinsic features (for example, "furry" and "small"). TML instead relies on extrinsic features derived from predictions from ML models trained on other related tasks.
For example, to train a TML model to recognize all known species of animals, with new ones expected to be added, one would start by applying existing prediction models for known species such as cats, rabbits, and donkeys. The outputs of these models would serve as new extrinsic features such as "catness," "rabbitness," and "donkeyness," which would then be used to train a metalevel ML model that makes predictions from this representation. The approach enables TML models to capture attributes of animals that were not originally encoded, such as cuteness (shared by cats and rabbits) and having eyes on the sides of the head (shared by rabbits and donkeys).
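The animal example can be sketched in a few lines of Python. This is a toy illustration only, not the researchers' implementation: the intrinsic features (furriness, body size, ear length), their values, the centroid-based "species models," and the function names are all assumptions made up for the sketch.

```python
import math

def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def make_scorer(examples):
    """Toy 'species model': similarity to the species centroid, in (0, 1]."""
    c = centroid(examples)
    def score(x):
        return 1.0 / (1.0 + math.dist(x, c))
    return score

# Hypothetical intrinsic features: [furriness, body size, ear length]
known = {
    "cat": [[0.9, 0.2, 0.2], [0.8, 0.25, 0.15]],
    "rabbit": [[0.9, 0.15, 0.8], [0.85, 0.1, 0.9]],
    "donkey": [[0.4, 0.9, 0.9], [0.5, 0.95, 0.85]],
}
scorers = {name: make_scorer(rows) for name, rows in known.items()}

def extrinsic(x):
    """Map intrinsic features to [catness, rabbitness, donkeyness]."""
    return [scorers[name](x) for name in ("cat", "rabbit", "donkey")]

# New task: tell kittens from foals using only the extrinsic features.
new_task = {
    "kitten": [[0.95, 0.1, 0.15], [0.9, 0.12, 0.2]],
    "foal": [[0.5, 0.7, 0.5], [0.45, 0.75, 0.55]],
}
meta_centroids = {
    label: centroid([extrinsic(x) for x in rows])
    for label, rows in new_task.items()
}

def predict(x):
    """Metalevel model: nearest centroid in extrinsic-feature space."""
    z = extrinsic(x)
    return min(meta_centroids, key=lambda label: math.dist(z, meta_centroids[label]))

print(predict([0.92, 0.11, 0.18]))
```

The metalevel model never sees furriness or ear length directly; it only sees how cat-like, rabbit-like, and donkey-like an animal looks to the existing models.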
"Where a typical ML system has to start from scratch when learning to identify a new type of animal -- say a kitten -- TML can use the similarity to existing animals: kittens are cute like rabbits, but don't have long ears like rabbits and donkeys," said Ross King, who led the research and is a professor in the University of Cambridge's Department of Chemical Engineering and Biotechnology, in a statement. "This makes TML a much more powerful approach to machine learning."
Promise in drug discovery
The researchers said that TML shows particular promise in the area of drug discovery. Whereas a typical ML approach searches for drug molecules based on intrinsic features such as molecular shape and structure, TML speeds up the process by drawing on what other ML models predict about a particular molecule.
The paper includes a case study using TML to predict quantitative structure-activity relationships (QSAR), a common step in early-phase drug discovery. Given a target (usually a protein) and a set of chemical compounds (small molecules) with associated activities (e.g., inhibition of the target protein), the QSAR task is to learn a predictive mapping from molecular representations to activities. In the TML approach, standard ML methods based on intrinsic descriptors are first applied to existing QSAR prediction tasks, and their outputs are then used as the extrinsic features for a new TML model that can be applied to a new QSAR task.
To evaluate the TML approach on QSAR learning, the researchers trained a variety of ML methods on 2,219 QSAR problems using 1,024-bit molecular fingerprint representations as intrinsic features. They then used the predicted compound activities from the previously learned ML models as extrinsic attributes for the TML QSAR model.
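The two-stage QSAR setup can be illustrated with a small, self-contained Python sketch. It is a toy under stated assumptions, not the study's pipeline: the data are synthetic and randomly generated, the fingerprints are 16 bits rather than 1,024, there are four prior tasks rather than 2,219, and the k-nearest-neighbor models and all function names are stand-ins chosen for brevity.

```python
import math
import random

random.seed(0)
N_BITS = 16          # toy size; the study used 1,024-bit fingerprints
N_PRIOR_TASKS = 4    # toy size; the study used 2,219 QSAR problems

def random_fp():
    """Synthetic fingerprint, stored as the set of bits that are on."""
    return frozenset(i for i in range(N_BITS) if random.random() < 0.4)

def make_task(n):
    """Synthetic QSAR task: activity is a random weighted sum of on-bits."""
    w = [random.uniform(-1, 1) for _ in range(N_BITS)]
    return [(fp, sum(w[i] for i in fp)) for fp in (random_fp() for _ in range(n))]

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints (sets of on-bits)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def knn_model(train, k=3):
    """Stage 1 baseline on intrinsic features: similarity-weighted k-NN."""
    def predict(fp):
        nearest = sorted(train, key=lambda p: tanimoto(fp, p[0]), reverse=True)[:k]
        weights = [tanimoto(fp, f) + 1e-9 for f, _ in nearest]
        return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)
    return predict

def meta_model(train, k=3):
    """Stage 2 model on extrinsic features: plain k-NN average."""
    def predict(z):
        nearest = sorted(train, key=lambda p: math.dist(z, p[0]))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

# Stage 1: one model per existing QSAR task, trained on fingerprints.
prior_models = [knn_model(make_task(30)) for _ in range(N_PRIOR_TASKS)]

def extrinsic(fp):
    """Predicted activities from the prior models become the new features."""
    return [model(fp) for model in prior_models]

# Stage 2: train the TML model for a new task on extrinsic features only.
new_task = make_task(20)
tml = meta_model([(extrinsic(fp), y) for fp, y in new_task])

query = random_fp()
prediction = tml(extrinsic(query))
print(prediction)
```

Because the toy tasks are generated independently, the sketch shows only the data flow. Any benefit from TML depends on the prior tasks being related to the new one, as they are among real QSAR problems.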
Even though the QSAR datasets had been extensively studied in the literature (using a total of 18 learning methods and six molecular representations, according to the authors), a comparison showed that TML significantly outperformed the best previous results.
"I was surprised how well it works -- better than anything else we know for drug design," said King. "It's better at choosing drugs than humans are -- and without the best science, we won't get the best results."