key: cord-0930768-85ncc8v7
authors: Chang, Oscar; Gonzales-Zubiate, Fernando A.; Zhinin-Vera, Luis; Valencia-Ramos, Rafael; Pineda, Israel; Diaz-Barrios, Antonio
title: A protein folding robot driven by a self-taught agent
date: 2020-12-29
journal: Biosystems
DOI: 10.1016/j.biosystems.2020.104315
sha: 1be2db46746f6d6f07cc7863eb5c9036ab3a330f
doc_id: 930768
cord_uid: 85ncc8v7

This paper presents a computer simulation of a virtual robot that behaves as a peptide chain of the Hemagglutinin-Esterase protein (HEs) from human coronavirus. The robot can learn efficient protein folding policies by itself and then use them to solve HEs folding episodes. The proposed robotic unfolded structure inhabits a dynamic environment and is driven by a self-taught neural agent. The neural agent can read sensors and control the angles and interactions between individual amino acids. During the training phase, the agent uses reinforcement learning to explore new folding forms that lead toward larger rewards. The memory of the agent is implemented with neural networks. These neural networks are trained with noise balancing to satisfy the look-ahead (search of the future) conditions required by the Bellman equation. In the operating phase, the components merge into a "wise up" protein folding robot with look-ahead capacities, which consistently solves a section of the HEs protein.

Nowadays, viral infection is a matter of great concern because the massive spread of a virus can significantly impact the functioning of society. This catastrophic effect has been painfully witnessed in the brutal blow of COVID-19, caused by the virus SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) (Gorbalenya et al., 2020). SARS-CoV-2 is one of the seven coronaviruses known to infect humans: HCoV-HKU1, HCoV-229E, HCoV-NL63, HCoV-OC43, MERS-CoV, SARS-CoV, and SARS-CoV-2 (Lai and Cavanagh, 1997; Hu et al., 2015; Su et al., 2016; Cui et al., 2019; Chan et al., 2020).
The infection pathway of the coronaviruses begins when the Spike (S) protein, a viral membrane protein, attaches to specific receptors on the surface of the host cell. This interaction triggers a conformational change in the Spike protein and stimulates fusion between the viral and cell membranes (Lim et al., 2016). Hemagglutinin-Esterase, a glycoprotein present in some of the coronaviruses, helps in this attachment and in the destruction of certain sialic acid receptors of the host cell (Huang et al., 2015; Lang et al., 2020; Zeng et al., 2008). Effective infection depends on virus-host cell interactions, specifically on the structure of viral and host proteins; consequently, knowledge of protein folding mechanisms is crucial for developing a strategy for virus control (Doms et al., 1993).

Artificial Intelligence (AI) plays an important role in modern protein folding research. Qin et al. (2020) presented a neural network model capable of learning how a specific amino acid sequence folds into a protein structure. Similar studies also tackle the protein folding problem using deep learning, machine learning, and other variations of neural networks (Xu, 2019). Lately, significant advances have been reported in deep reinforcement learning algorithms and self-taught agents (Chang et al., 2018b; Anjomshoae et al., 2019). For example, deep reinforcement learning and self-taught agents are now capable of outperforming humans at video games by watching pixels on the screen. This technique can also be adapted to the protein folding problem. This paper describes a robotic structure that simulates a stable peptide chain embedded inside the Hemagglutinin-Esterase.
This robotic structure is driven by a self-taught agent with the capacity to explore for future rewards and to learn by itself the efficient protein folding policies required to create the strong local bond between cysteine (C) amino acids and the hydrogen bonds that form an anti-parallel β-pleated sheet. Our model uses a combination of robotics, artificial neural networks, gradient descent, and reinforcement learning. We show that, using an underlying robotic structure, an efficient protein folding process (policy) can be learned by a self-taught agent.

A peptide region of the HCoV-HKU1 Hemagglutinin-Esterase (HEs) protein was chosen to test the capacity of the virtual robot/agent to learn efficient folding policies. The HEs structure consists of two identical polypeptide chains; each polypeptide contains three important domains: the membrane proximal domain (MPD), the esterase domain (E), and the receptor domain (R) (de Groot, 2006) (Fig. 1A). Based on the recently solved structure of HEs (Hurdiss et al., 2020; Sehnal et al., 2018), available from the RCSB Protein Data Bank (www.rcsb.org), the receptor domain consists of eleven secondary structures: one α-helix and ten β-strands. The formation of a secondary structure is the first step in the folding process that a protein takes to assume its native structure. Two of these β-strands form an anti-parallel β-pleated sheet stabilized by a disulfide linkage and hydrogen bonds between the amino acids of both sequences: LYLVPLCL (residues 182 to 189) and DCIYI (residues 218 to 222) (Fig. 1B). To analyze the folding of this structure, the sequence from residue 190 to residue 217 was replaced by a β-turn with the following amino acids: GSPN. The final peptide used to test the self-taught capacity of the proposed agent is therefore LYLVPLCLGSPNDCIYI, with residues numbered 1 to 17.

The use of artificial intelligence in protein folding has recently attracted much interest.
The latest studies have focused on determining the most stable configuration of proteins (Noé et al., 2020; AlQuraishi, 2019). AlphaFold (Senior et al., 2020) is a program developed by DeepMind, a Google-affiliated company, to predict how proteins fold. The process is based on the principle that it is possible to infer which amino acid residues are in contact by analyzing co-variation in homologous sequences, which aids in the prediction of protein structures. AlphaFold shows that a neural network can be trained to predict the distances between pairs of residues efficiently and with highly accurate results. These distances give more information about the structure of the protein than contact predictions alone. The system is optimized by a gradient descent algorithm, and it represents a considerable advance in protein-structure prediction.

In other recent work a Multi-scale Neighborhood-based Neural Network (MNNN) is proposed. The method focuses on alpha-helical proteins (as found in skin, hair, and many other mechanically relevant protein materials), and it is designed to learn how a specific amino acid sequence folds into a protein structure. The algorithm predicts the protein structure without using a template or co-evolutionary information, with a maximum error of 2.1 Å. The authors report that the prediction accuracy is higher than that of other models and that the prediction takes nearly six orders of magnitude less time than ab initio folding methods. MNNN is a relevant method because its predictions for the structure of an unknown protein agree with experiments. The model therefore offers a great advantage in the rational design of new proteins.

Other AI techniques have been tried before. Staples et al. (2019) propose AI methods and features that explore new approaches based on reinforcement learning and support vector machines.
Many of these methods require enormous computational power; however, Xu (2019) shows that a powerful deep learning technique, even when run on a personal computer, can predict new folds much more accurately than before. This method also works well on membrane protein folding.

In previous work (Chang et al., 2018a), we proposed an autonomous neural controller (ANC) capable of handling the mechanical behavior of a virtual, multi-joint robot with many moving parts and sensors distributed through the body. This robot has an internal neural arrangement that burns "dark energy" (Raichle, 2006), energy that seems to go nowhere, and initiates behavior by itself. This capacity is used first to create a learning agent that acquires good strategies or policies, and second to assemble a robot whose mechanical dexterity is driven (controlled) by such a self-taught agent, behaving as a Markov process. The agent runs several thousand learning episodes, after which the robot has learned to efficiently lift, through body contortions, a heavy ball lying on the floor.

For this paper, the model is adapted as follows: the peptide chain is constructed with a set of 17 moving parts, each of them representing one amino acid supported by a rigid structure and carrying a set of close-range sensors responsible for sensing and processing nearby molecular forces. The associated self-taught neural agent is assembled with 17 sigmoidal neurons that inhibit each other with balanced negative weights and share a common self-activating excitatory input ramp K, which forces all participating neurons to race toward a 1.0 output. During a ramp, after a certain time, one participant (the winner) crosses a threshold (the finish line), and values are assigned to the agent outputs according to the arrival order of the other neurons. Continuously added internal noise makes the race balanced and unpredictable (random).
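The race between neurons described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the inhibition weight, bias, ramp step, noise level, and threshold are assumed values chosen only so that the race terminates, and `run_race` is a hypothetical name.

```python
import numpy as np

def run_race(n_neurons=17, inhibition=-0.05, bias=-4.0, ramp_step=0.02,
             noise_std=0.05, threshold=0.9, max_steps=10_000, rng=None):
    """Run one race and return (outputs, arrival_order).

    All numeric defaults are assumptions made for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Balanced mutual inhibition: each neuron inhibits every other neuron
    # with the same negative weight and has no self-connection.
    weights = np.full((n_neurons, n_neurons), inhibition)
    np.fill_diagonal(weights, 0.0)

    outputs = np.zeros(n_neurons)
    ramp = 0.0
    for _ in range(max_steps):
        ramp += ramp_step                      # common excitatory ramp K
        noise = rng.normal(0.0, noise_std, n_neurons)
        net = bias + ramp + weights @ outputs + noise
        outputs = 1.0 / (1.0 + np.exp(-net))   # sigmoidal neurons
        if outputs.max() >= threshold:         # a winner crossed the line
            break
    # Arrival order: neurons ranked from highest to lowest output.
    order = np.argsort(-outputs, kind="stable")
    return outputs, order

outputs, order = run_race(rng=np.random.default_rng(0))
print("winner neuron:", int(order[0]))
```

Because the network is symmetric, the injected noise alone decides the winner, which is what makes each race unpredictable.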
The structure of the self-taught neural agent is shown in Fig. 2. The race output is converted to a vector of bounded positive and negative numbers through Equation (1), in which alpha controls the degree of folding deformation per cycle and the value 0.7 is an offset required to generate both positive and negative amounts. Being an intrinsic neural structure, this trainable energy-burning network is used as the propelling element of the peptide chain folding, creating a dynamic, ever-moving creature.

Before it becomes an efficient folder, the agent needs to develop adviser networks that undergo an algorithmic training process in which knowledge about folding is acquired and stored through reinforcement learning principles. This key information is finally saved in the weights (synapses) of a well-trained network. Once this goal is achieved, the network becomes an adviser to the agent, which can now take wise Markov decisions (see Fig. 3).

When isolated from the world, in each race the agent generates a set of random folding information (increments and decrements) in terms of angle variations or "deltas". These deltas are added to the values of the current angles, taking the fold to a new state where the immediate reward is captured. To satisfy the requirements of the Bellman equation, the agent explores further into the future and moves to a new "imaginary" state, where it searches for a MAX value hidden somewhere in the logic of the proteins. A resulting discounted reward is formed from the sum of both rewards, and the resulting path is memorized by gradient descent in the incipient adviser neural network. The Bellman equation states that the optimization to obtain a maximal reward in a sequential chain of state-decision events is solved by a summation of immediate events and events that happen in the future (Sutton and Barto, 2018).
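As an illustration of the sum of immediate and future reward described above, the sketch below computes the textbook Bellman backup, r + γ · max(future values), as presented in Sutton and Barto. The discount factor `gamma`, its default of 0.9, and the function name are assumptions; the paper applies its discount in terms of backpropagation cycles, which this sketch does not reproduce.

```python
def discounted_target(immediate_reward, future_rewards, gamma=0.9):
    """Textbook Bellman backup: r + gamma * max over candidate future rewards.

    `future_rewards` holds the rewards found in the "imaginary" states the
    agent looks into; `gamma` is an assumed discount factor.
    """
    if len(future_rewards) == 0:
        raise ValueError("at least one future state must be explored")
    return immediate_reward + gamma * max(future_rewards)

# Immediate reward 0.2; three imagined next states with rewards 0.0, 0.5, 1.0.
target = discounted_target(0.2, [0.0, 0.5, 1.0])
print(round(target, 3))
```

The value returned here plays the role of the discounted reward that the adviser network is trained to associate with the path just taken.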
The main prerequisite established by this formula is that, to obtain a good reward in terms of control, the agent has to explore the future and search for a discounted MAX value hidden somewhere in the complexities of the system being represented. For this paper, exploring the future requires a special kind of noise-balanced training process, discussed in Chang and Zhinin-Vera (2020).

A robot is designed to carry out the space-time physical actions of folding a peptide chain of 17 amino acids belonging to a receptor domain of the HEs protein, specifically the sequence LYLVPLCLGSPNDCIYI (residues 1 to 17). Amino acids are linked together through a strong peptide bond, and every bond has an actuator (muscle) that controls the folding angle (angle[i]) between the two amino acids it joins; this angle is encoded in an 8-bit signal that covers the range from −90° to +90° around the center, or aligned, position. The set of folding angles is a known parameter and defines the internal representation of the robot. This internal representation is one of the input vectors that the agent receives in order to generate the set of variations, or delta[i], required to produce the next folding state as actual_angle plus the delta angle generated in Equation (1). Each amino acid has six sensors that determine the kind of amino acid located in its neighborhood, close enough to create short-range molecular effects such as strong bonds or polarity arrangement. Each actuator is controlled by an individual sigmoidal neuron of a bigger network, which controls the overall space-time folding of the strand. Each output neuron produces a signal that represents the increment or decrement that each angle will experience in the next cycle. Fig. 4 shows the full protein-folding robot schema.

In this section, the algorithm that learns to perform protein folding is described. In the initial state, the protein is presented in a nearly straight position.
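The internal representation and the nearly straight initial state described above can be sketched as follows. The linear 8-bit mapping, the use of one angle per peptide bond (16 angles for 17 residues), the ±5° initial spread, and all function names are assumptions made for illustration; the paper only states the 8-bit resolution and the ±90° range.

```python
import numpy as np

SEQUENCE = "LYLVPLCLGSPNDCIYI"          # 17 residues, as given in the text
ANGLE_MIN, ANGLE_MAX = -90.0, 90.0      # degrees from the aligned position

def encode_angle(angle_deg):
    """Map angles in [-90, 90] degrees to 8-bit codes (0..255), assumed linear."""
    clipped = np.clip(np.asarray(angle_deg, dtype=float), ANGLE_MIN, ANGLE_MAX)
    scaled = (clipped - ANGLE_MIN) / (ANGLE_MAX - ANGLE_MIN) * 255
    return np.asarray(np.round(scaled)).astype(np.uint8)

def decode_angle(code):
    """Inverse of encode_angle, up to the quantization step (about 0.7 degrees)."""
    return ANGLE_MIN + np.asarray(code, dtype=float) / 255 * (ANGLE_MAX - ANGLE_MIN)

def initial_angles(rng, spread_deg=5.0):
    """Nearly straight chain: small random bend at each peptide bond (assumed spread)."""
    n_bonds = len(SEQUENCE) - 1         # assumption: one angle per bond
    return rng.uniform(-spread_deg, spread_deg, n_bonds)

def apply_deltas(angles, deltas):
    """Next folding state: current angle plus delta, kept inside the actuator range."""
    return np.clip(angles + deltas, ANGLE_MIN, ANGLE_MAX)

rng = np.random.default_rng(0)
angles = initial_angles(rng)
next_angles = apply_deltas(angles, rng.uniform(-3.0, 3.0, angles.size))
print(encode_angle(next_angles))
```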
The neural agent burns energy to generate new positions and explores future rewards. When a reward is found, the agent memorizes, through gradient descent and reinforcement learning, the paths leading to even bigger rewards (good protein folding). The agent works in episodes that last at most 4000 training cycles. During each episode, the agent learns by positioning the peptide chain in an initial state and then exploring new territories by producing several angle and folding variations in the neighborhood. The reward should be maximal when the cysteine-cysteine bond is formed and the β-sheet appears, but given the very large number of possible foldings, the probability that these two events occur simultaneously by random search is very small. Instead, the agent learns the folding in two logical stages: first it accomplishes the cysteine-cysteine bond (the stronger attraction between the given amino acids), and second it forms the β-sheet, which has weaker bonds but must be polarity aligned among the amino acids Leucine (L), Isoleucine (I), Proline (P) and Tyrosine (Y).

During the cysteine bonding, the maximal reward is obtained when sensors detect a close encounter between cysteines. The previous folding state and sensors are taken as input, and the current delta folding vector as the target, for training by gradient descent. To satisfy the search-of-the-future conditions, this gradient descent algorithm compensates by using noise as input and the NULL vector as target (Chang and Zhinin-Vera, 2020). In every training cycle, the agent moves to a new state (folding), captures the local reward there, and searches in its neighborhood for a folding state that may produce the MAX future reward. Once these two parameters are obtained, a reward discount is applied in terms of backpropagation cycles (Chang and Zhinin-Vera, 2020).

During the β-sheet folding, the maximal reward is obtained when sensors detect a polarity match between the involved amino acids (L, I, P, Y). Again the previous folding state is used as input and the current delta folding vector as the target, using noise to compensate. After training, the agent rapidly drives the peptide chain from any given start position to the learned (β-pleated sheet) fold, following a Markov chain of decisions. The pseudo-code is shown in Algorithm 1.

Fig. 2. Self-taught Neural Agent. Neurons are all equally excited by a repetitive ramp K. In each run, the agent burns "dark energy", fires, and declares a unique folding variation vector (blue-red column to the right). These self-generated elements are added to the current angle vector, and the protein undergoes a small, internally generated twist.

Fig. 3. Adviser Neural Network Architecture. A network trained by reinforcement learning receives the internal state (angle vector) and the sensor information as input. With this composite "image" the network has to learn to generate changes (predictions) in the protein folding that improve the probability of obtaining useful structures.

Algorithm 1.
1. Carry out several episodes with a maximum of 4000 cycles each.
2. Start each episode with the peptide chain in an initial, roughly straight, random position.
3. In each cycle, add random delta angles (positive or negative) to all angles and measure the reward as the proximity of the cysteines.
4. If nothing happens (reward = 0), explore the future with new delta angles.
5. If cysteine-cysteine proximity is detected by the sensors, do backpropagation and noise balancing.
6. Look into the future; if the β-sheet is formed, do backpropagation and noise balancing.
7. Use the knowledge acquired by the adviser network as an input to the self-taught agent.
Result: Efficient folding following a Markov chain.
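The first stage of Algorithm 1 (cysteine proximity) can be sketched with a simplified stand-in. Everything below is an assumption made for illustration: the chain is a 2D linkage with unit-length bonds, the reward is any decrease in cysteine-cysteine distance, the adviser network and backpropagation are replaced by a list of rewarded (state, delta) pairs, and `noise_balanced_dataset` shows only one reading of noise balancing (random inputs paired with NULL targets). It is not the authors' simulator.

```python
import numpy as np

SEQUENCE = "LYLVPLCLGSPNDCIYI"
CYS = [i for i, aa in enumerate(SEQUENCE) if aa == "C"]   # the two cysteines

def chain_positions(angles_deg):
    """2D residue positions for unit-length bonds; angles are bends at each bond."""
    headings = np.cumsum(np.radians(angles_deg))
    steps = np.stack([np.cos(headings), np.sin(headings)], axis=1)
    return np.vstack([np.zeros(2), np.cumsum(steps, axis=0)])

def cysteine_distance(angles_deg):
    pos = chain_positions(angles_deg)
    return float(np.linalg.norm(pos[CYS[0]] - pos[CYS[1]]))

def run_episode(rng, max_cycles=4000, delta_scale=3.0, contact=1.5):
    """One episode: random deltas, keep those that bring the cysteines closer."""
    angles = rng.uniform(-5.0, 5.0, len(SEQUENCE) - 1)    # nearly straight start
    best = cysteine_distance(angles)
    samples = []                                          # rewarded (state, delta) pairs
    for _ in range(max_cycles):
        deltas = rng.uniform(-delta_scale, delta_scale, angles.size)
        candidate = np.clip(angles + deltas, -90.0, 90.0)
        dist = cysteine_distance(candidate)
        if dist < best:                                   # reward: cysteines moved closer
            samples.append((angles.copy(), deltas))
            angles, best = candidate, dist
        if best <= contact:                               # assumed contact distance
            break
    return angles, best, samples

def noise_balanced_dataset(samples, rng):
    """Pair each rewarded sample with a random-noise input whose target is NULL."""
    states = np.array([s for s, _ in samples])
    targets = np.array([d for _, d in samples])
    noise_inputs = rng.uniform(-90.0, 90.0, states.shape)
    return (np.vstack([states, noise_inputs]),
            np.vstack([targets, np.zeros_like(targets)]))

rng = np.random.default_rng(0)
angles, best, samples = run_episode(rng)
X, Y = noise_balanced_dataset(samples, rng)
print(f"final cysteine distance: {best:.2f}, training pairs: {len(X)}")
```

In a fuller version, `X` and `Y` would be the inputs and targets for training the adviser network by gradient descent, and a second stage would reward polarity matches for the β-sheet.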
Folding processes involve the thermodynamic stabilization of the complex, with concomitant formation of covalent and non-covalent bonds to form a stable structure. The first moves of the agent focus on accomplishing the covalent bond between the cysteines and subsequently the hydrogen bonds, in order to stabilize the β-pleated sheet. The system learned to follow these rules and folded a partial section of the receptor domain of the HEs protein in a competent way, needing only short-range sensors and a low-resolution representation of its internal state. The acquired knowledge makes it possible to follow a short path when folding the peptide chain LYLVPLCLGSPNDCIYI (residues 1 to 17) into the protein shape: one β-pleated sheet. The sequence is shown in Fig. 6. The performance of the program can be observed in Fig. 5, which shows how the error of the agent decreases as the chain, guided by rewards, settles into a stable protein structure. The measured error corresponds to the distance between the cysteines and the rest of the amino acids that make up the β-sheet.

Fig. 4. The protein folding robot. A multi-joint robot and its associated neural blocks have to learn to fold a peptide chain belonging to the HEs protein. The mechanical simulator is a chain of operative joints with independent muscles and neural controllability. The neural network receives sensor information and the internal representation as input and produces folding information as output. The resulting robotic assembly and its driving agent must learn to fold a straightened chain into a 3D structure that comprises a cysteine-cysteine bond and an associated β-pleated sheet.

Our study describes a method to design a robot driven by a self-taught generative agent that learns, by reinforcement learning, efficient policies to fold a section of the HEs protein. The agent sees the protein state through sensors and reads information about its current bending angles and molecular proximities.
The agent then decides to move to a new state that leads toward a possibly useful folding, using the visual information gathered from the sensors. The robot contains an artificial peptide platform that behaves like a mechanical bending structure with a sensor that feeds sparse data to the self-taught generative neural agent. Our results show that the agent working on this robot-like structure can learn to solve protein folding problems; we also found that folding knowledge can be preserved in stable weight parameters. We are currently investigating complex folding problems that include other residue interactions, such as hydrophobic and electrostatic ones. Besides the predicted distances and angles between pairs of amino acids, the agent will learn that specific amino acids are involved in both interactions with different strengths. In sum, the data presented in this paper, together with complementary analysis, will complete a full-protein folding training.

Fig. 6. Protein folding sequence. First, the agent has learned to bend the chain until the strongest bond, Cysteine-Cysteine (in black), is completed. Once this interaction is stabilized, the agent uses its knowledge to align hydrogen bonds and finish the expected structure of a β-pleated sheet. The dotted lines symbolize other possible protein folds, which are not realized. The simulation shown in this figure is generated by the proposed system.

Oscar Chang, Ph.D., obtained a Ph.D. in E.E. from Penn State University. He has carried out academic and research work in universities in South America and Spain. His current research at Yachay Tech University deals with designing Neural Self-taught Agents, computer programs capable of exploring and learning new solutions.
He has developed sophisticated virtual robots that explore and learn elaborate mechanical wave motions and intelligent mass displacements in complex environments, combining energy consumption, lifting speed and robot equilibrium. Lately, these spirited robots have been shown to be capable of learning to solve by themselves the intricate protein folding that occurs during virus infection, opening a new method for possible protein study and coronavirus control.

Luis Zhinin-Vera, Master's student, received his bachelor's degree in Information Technology (Yachay Tech). His degree project, "Credit Card Fraud Detector using Artificial Intelligence", was his motivation to pursue research. His strongest knowledge is in the field of Artificial Intelligence and Machine Learning. He worked on projects at INEEL (Mexico), Radical Ltda. (Ecuador) and Universidad Católica del Norte (Chile). He has developed social projects, was a Top 10 winner at HackTech COVID-IEEE SIGHT, and has participated in the regional Hult Prize Foundation competition. He is co-founder of the MIND Research Group and a master's student in Artificial Intelligence at Universidad Politécnica de Madrid (Spain).

Antonio Diaz, Ph.D., currently works at the Chemistry and Engineering School, University of Yachay, Ecuador. Antonio carried out academic research in polymer science that later became commercial technologies in Ziegler-Natta polyolefin catalysts and vinyl acrylic polymers for use in adhesives and architectural coatings. His current research interests are polymeric gels for controlled release of drugs and other active principles, hybrid polymer-nanoparticle compounds for antibacterial coatings, high-performance nanocomposites, and interfacial strategies for petroleum production enhancement.
End-to-end differentiable learning of protein structure
Explainable Agents and Robots: Results from a Systematic Literature Review
Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting wuhan
Autonomous robots and behavior initiators
Self-programming robots boosted by neural agents
A wise up visual robot driven by a self-taught neural agent
Origin and evolution of pathogenic coronaviruses
Folding and assembly of viral membrane proteins
The species severe acute respiratory syndrome-related coronavirus: classifying 2019-ncov and naming it sars-cov-2
Structure, function and evolution of the hemagglutinin-esterase proteins of corona-and toroviruses
Bat origin of human coronaviruses
coronaviruses: emerging and re-emerging pathogens in humans and animals susanna lau positivestrand rna viruses
Human coronavirus hku1 spike protein uses o-acetylated sialic acid as an attachment receptor determinant and employs hemagglutinin-esterase protein as a receptor-destroying enzyme
Cryo-em structure of coronavirus-hku1 haemagglutinin esterase reveals architectural changes arising from prolonged circulation in humans
The Molecular Biology of Coronaviruses
Coronavirus Hemagglutinin-Esterase and Spike Proteins Co-evolve for Functional Balance and Optimal Virion Avidity
Human coronaviruses: a review of virus-host interactions. Diseases 4, 26
Machine learning for protein folding and dynamics X19301447. folding and Binding Proteins
Artificial intelligence method to design and fold alpha-helical structural proteins from the primary amino acid sequence
Neuroscience. the brain's dark energy
Mol*: towards a common library and tools for web molecular graphics
Improved protein structure prediction using potentials from deep learning
Artificial intelligence for bioinformatics: applications in protein folding prediction
Epidemiology, genetic recombination, and pathogenesis of coronaviruses
Adaptive Computation and Machine Learning Series
Distance-based protein folding powered by deep learning
Structure of coronavirus hemagglutinin-esterase offers insight into corona and influenza virus evolution