Proteins are the building blocks of life, and the scientific community has long desired to create new proteins that perform new functions and processes. However, the creation of new proteins often requires mimicking existing structures or manually editing the amino acids of existing proteins -- processes that are time-consuming and difficult to achieve.
Proteins are made up of 20 naturally occurring amino acids that are assembled into hierarchical structures. Their structure is commonly described at three levels:
- Primary structure -- their sequence of amino acids
- Secondary structure -- their pattern of hydrogen bonds between amino acids that form alpha-helices or beta-sheets
- Tertiary structure -- their overall 3D structure
In the current study, researchers from the U.S. and Taiwan proposed that protein design has much in common with music theory. Both systems are hierarchical: In proteins, amino acids are organized into 3D spatial domains. In music, different instruments play notes that form melodies, chords, and other complex structures in the time domain. The researchers then used machine learning to put the analogy into practice.
"These networks learn to understand the complex language folded proteins speak at multiple time scales," said Markus Buehler, PhD, professor of engineering at the Massachusetts Institute of Technology, in a statement. "And once the computer has been given a seed of a sequence, it can extrapolate and design entirely new proteins by improvising from this initial idea, while considering various levels of musical variations -- controlled through a temperature parameter -- during the generation."
Based on the musical analogy, existing protein structures were translated into musical scores and used as a training set for the deep neural network. Different pitches were used for each amino acid, and variations in note length and note volume reflected secondary structure information and information about the chain length of protein molecules. This new type of sonification, or the use of nonspeech audio to convey information or perceptualize data, was modified from previous approaches that primarily focus on predicting folding patterns of known proteins.
Using musical scores to code the structure and folding of proteins composed of amino acids, each of which vibrates with a unique sound. Image courtesy of Markus Buehler, PhD.
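The sonification scheme described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the researchers' code: the pitch table (one MIDI note per amino acid, starting at 48), the note lengths per secondary-structure label, and the names `PITCH`, `DURATION`, `Note`, and `encode` are all assumptions made for the example. Note volume, which the study also used, is left out to keep the sketch short.

```python
from typing import List, NamedTuple

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical pitch table: one MIDI note number per amino acid, starting at 48.
# The actual pitches used in the study are not given in this article.
PITCH = {aa: 48 + i for i, aa in enumerate(AMINO_ACIDS)}

# Hypothetical note lengths in beats for each secondary-structure label:
# H = alpha-helix, E = beta-sheet, C = anything else.
DURATION = {"H": 0.5, "E": 1.0, "C": 0.25}


class Note(NamedTuple):
    pitch: int       # which amino acid
    duration: float  # which secondary structure it sits in


def encode(sequence: str, structure: str) -> List[Note]:
    """Turn an amino acid sequence and its per-residue structure labels into notes."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return [Note(PITCH[aa], DURATION[ss]) for aa, ss in zip(sequence, structure)]


if __name__ == "__main__":
    for note in encode("MKV", "HHE"):
        print(note)
```

A score built this way carries the primary structure in its pitches and the secondary structure in its rhythm, which is the kind of layered information the training set needed to preserve.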
The artificial intelligence technology learned hierarchical structures of protein sequences through a model based on long short-term memory, a type of recurrent neural network.
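The temperature parameter Buehler mentions is a common way to control how adventurous a generative sequence model is. The sketch below shows the general technique, assuming the network outputs one raw score (logit) per possible next note; it is not taken from the study, and the function names `softmax_with_temperature` and `sample_next` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Pick the index of the next note according to the tempered probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [1.0, 2.0, 3.0]  # made-up scores for three candidate notes
    for t in (0.1, 1.0, 10.0):
        print(t, softmax_with_temperature(logits, t), sample_next(logits, t, rng))
```

A low temperature makes the model stick closely to the most likely continuation, while a high temperature spreads probability across more candidates and yields more varied output.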
The deep-learning model was used to design de novo musical scores, whose pitch information was then translated back into the amino acid sequences of de novo proteins. The researchers used the Basic Local Alignment Search Tool (BLAST) to compare the predicted amino acid sequences against known proteins and estimated folded protein structures using the Optimized protein fold RecognitION method (ORION) and MODELLER.
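Translating a generated score back into a protein sequence amounts to a reverse lookup from pitch to amino acid. The sketch below assumes a hypothetical one-to-one pitch table (MIDI notes 48 through 67); the study's actual table is not given in this article, and `PITCH_TO_AA` and `pitches_to_sequence` are names invented for the example.

```python
from typing import Iterable

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical reverse table: MIDI note number -> amino acid.
PITCH_TO_AA = {48 + i: aa for i, aa in enumerate(AMINO_ACIDS)}


def pitches_to_sequence(pitches: Iterable[int]) -> str:
    """Map each generated pitch back to its amino acid and join the result."""
    residues = []
    for pitch in pitches:
        if pitch not in PITCH_TO_AA:
            raise ValueError(f"pitch {pitch} has no amino acid assigned")
        residues.append(PITCH_TO_AA[pitch])
    return "".join(residues)


if __name__ == "__main__":
    print(pitches_to_sequence([58, 56, 65]))
```

The resulting string is an ordinary amino acid sequence, so it can be passed to standard sequence-comparison and structure-prediction tools.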
Through these steps, the researchers demonstrated that this deep-learning method could be used to generate de novo designer proteins that do not exist yet.
"This paves the way for making entirely new biomaterials," Buehler commented. "Or perhaps you find an enzyme in nature and want to improve how it catalyzes or come up with new variations of proteins altogether."
The "protein music" the researchers uncovered could also help create new compositional techniques in classical music by illuminating the rhythms and tones of proteins, a method Buehler refers to as materiomusic.
"In the evolution of proteins over thousands of years, nature also gives us new ideas for how sounds can be combined and merged," Buehler said.
Copyright © 2020 scienceboard.net