Deep-learning approach easily identifies druggable protein sites

By Samantha Black, PhD, ScienceBoard editor in chief

October 27, 2020 -- A deep-learning algorithm was able to automatically identify binding sites on proteins that could be good targets for drug candidates, according to research published October 27 in Communications Biology. The algorithm, called Binding Site Neural Network (BiteNet), could help researchers sort through the almost 20,000 proteins in the human genome to find the ones that are most promising.

Protein binding sites -- areas of intermolecular interactions in spatial regions -- are common pharmaceutical targets, where designed drug-like molecules can bind and exert their influence. However, protein binding sites are often dynamic and are mediated by conformational changes of the 3D protein structure.

In the past, researchers have searched for potential binding sites on proteins by fragment screening and site-directed tethering using antibodies, small-molecule microarrays, hydrogen-deuterium exchange, or site-directed mutagenesis. But because the human genome encodes roughly 20,000 proteins, these methods can be resource-heavy and are prone to error.

Alternatively, computational methods have the potential to identify binding sites on a large scale while taking into account protein flexibility via molecular dynamics simulation. These methods can also probe how well chemical compounds fit a site using virtual ligand or fragment-based screening. Typical approaches apply empirical scoring or machine-learning algorithms built on known structural information, but they may yield false-positive predictions (identification of undruggable regions).

In the study, researchers from the iMolecule group at Skolkovo Institute of Science and Technology (Skoltech) Center for Computational and Data-Intensive Science and Engineering (CDISE) in Moscow developed BiteNet for large-scale and spatiotemporal identification of protein binding sites.

"BiteNet is based on the computer vision, we treat protein structures as images, and binding sites as objects to detect on the images," according to Igor Kozlovskii, first author and PhD student at Skoltech. "It takes about 0.1 seconds to analyze one spatial structure and 1.5 minutes to evaluate 1,000 protein structures of about 2,000 atoms."

The machine-learning technique treats protein conformations as 3D images, binding sites as objects to detect in those images, and conformational ensembles (sets of conformations) as 3D videos to analyze. BiteNet can be used to identify both orthosteric sites (where endogenous ligands bind directly) and allosteric sites (located away from the active site, where binding can cause conformational changes at a second site). With this approach, conformations in which binding sites of interest are detected can then be used for structure-based drug design approaches, such as molecular docking and virtual ligand screening, as well as structure-based de novo drug design.
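
The "protein as a 3D image" idea can be illustrated with a minimal sketch. It assumes a plain atom-count grid, which is only one possible way to turn coordinates into voxels; the article does not describe BiteNet's actual featurization, and the function names, voxel size, and toy coordinates below are hypothetical.

```python
from collections import Counter


def voxelize(atoms, origin, voxel_size, grid_shape):
    """Map atom coordinates (in angstroms) to per-voxel atom counts.

    atoms: iterable of (x, y, z) tuples.
    origin: (x, y, z) of the grid's minimum corner.
    voxel_size: edge length of one cubic voxel.
    grid_shape: (nx, ny, nz) number of voxels along each axis.
    Returns a Counter mapping (i, j, k) voxel indices to atom counts;
    atoms that fall outside the grid are ignored.
    """
    grid = Counter()
    for atom in atoms:
        idx = tuple(int((c - o) // voxel_size) for c, o in zip(atom, origin))
        if all(0 <= i < n for i, n in zip(idx, grid_shape)):
            grid[idx] += 1
    return grid


def voxelize_ensemble(conformations, origin, voxel_size, grid_shape):
    """Voxelize every conformation; the list of grids is the '3D video'."""
    return [voxelize(c, origin, voxel_size, grid_shape) for c in conformations]


# Two toy conformations of a three-atom "protein" (made-up coordinates).
frames = [
    [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (1.7, 0.2, 0.9)],
    [(0.5, 0.5, 0.5), (2.5, 0.5, 0.5), (9.0, 9.0, 9.0)],  # last atom is off-grid
]
video = voxelize_ensemble(
    frames, origin=(0.0, 0.0, 0.0), voxel_size=1.0, grid_shape=(4, 4, 4)
)
```

In a detector of the kind the article describes, each grid would play the role of one image and the sequence of grids the role of video frames; a real pipeline would likely encode more than raw atom counts per voxel.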

To demonstrate the power of BiteNet, the researchers applied binding site detection to three challenging pharmacological targets: the P2X3 receptor of the adenosine triphosphate (ATP)-gated cation channel family, the epidermal growth factor receptor of the kinase family, and the adenosine A2A receptor of the G protein-coupled receptor family.

The highly polarized ATP-specific interface of the ATP-binding site makes targeted drug design via orthosteric binding difficult. However, allosteric binding that targets protein-protein interactions is promising. BiteNet correctly identified oligomer-specific allosteric binding sites formed by the subunits of the trimeric P2X3 receptor complex.

Although there are epidermal growth factor receptor inhibitors targeting the orthosteric binding site of the kinase domain, the receptors found in cancer cells often have amino acid substitutions that make them insensitive to such inhibitors. BiteNet accurately identified a conformation-specific allosteric binding site of the epidermal growth factor receptor kinase domain.

In G protein-coupled receptors, potential allosteric binding sites span the extracellular, transmembrane, and intracellular regions, and identifying novel ones could provide alternative options for drug discovery. Specifically, BiteNet predicted regions that may correspond to novel allosteric binding sites based on molecular dynamics trajectories of the human adenosine A2A receptor.

The researchers found BiteNet to be computationally efficient, with higher performance relative to other benchmarked computational methods.

"The human genome consists of nearly 20,000 proteins, and very few among them get associated with a pharmacological target," said author Petr Popov, PhD, assistant professor at Skoltech CDISE. "Our approach allows searching the protein for binding sites for drug-like compounds, thus expanding the array of possible pharmacological targets. Besides, initial structure-based drug discovery strongly depends on the choice of the protein's atomic structure. Working on a structure with the binding site barred for the drug or missing altogether can fail. Our method enables analyzing a large number of structures in a protein and finding the most suitable one for a specific stage."
