key: cord-340554-7cwp2xbw
authors: Yamasaki, Satoshi; Amemiya, Takayuki; Yabuki, Yukimitsu; Horimoto, Katsuhisa; Fukui, Kazuhiko
title: ToGo-WF: prediction of RNA tertiary structures and RNA–RNA/protein interactions using the KNIME workflow
date: 2019-03-06
journal: J Comput Aided Mol Des
DOI: 10.1007/s10822-019-00195-y
sha: 
doc_id: 340554
cord_uid: 7cwp2xbw

Recent progress in molecular biology has revealed that many non-coding RNAs regulate gene expression or catalyze biochemical reactions in tumors, viruses and several other diseases. The tertiary structure of RNA molecules and RNA–RNA/protein interaction sites are of increasing importance as potential targets for new medicines that treat a broad array of human diseases. Current RNA drugs are split into two groups: antisense RNA molecules and aptamers. In this report, we present a novel workflow to predict RNA tertiary structures and RNA–RNA/protein interactions using the KNIME environment, which enabled us to assemble a combination of RNA-related analytical tools and databases. In this study, three analytical workflows for comprehensive structural analysis of RNA are introduced: (1) prediction of the tertiary structure of RNA; (2) prediction of the structure of RNA–RNA complexes and analysis of their interactions; and (3) prediction of the structure of RNA–protein complexes and analysis of their interactions. In an RNA–protein case study, we modeled the tertiary structure of pegaptanib, an aptamer drug, and performed docking calculations of the pegaptanib-vascular endothelial growth factor complex using a fragment of the interaction site of the aptamer. We also present molecular dynamics simulations of the RNA–protein complex to evaluate the affinity of the complex by mutating bases at the interaction interface. The results provide valuable information for designing novel features of aptamer-protein complexes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1007/s10822-019-00195-y) contains supplementary material, which is available to authorized users.

Recent advances in micro-array and sequencing technologies have revealed that large amounts of non-coding RNAs exist in cells. For example, in human cells, tens of thousands of long non-coding RNAs have been identified [1] . The functions of non-coding RNAs have been studied extensively, and relationships between mutations in non-coding RNAs and diseases have been investigated [2] [3] [4] . These non-coding RNAs have various functions, including regulation of gene expression and catalysis of biochemical reactions inside cells. Protein-RNA and RNA-RNA interactions are key elements in such functional processes. The RNA-induced silencing complex (RISC), small nuclear ribonucleoproteins (snRNPs) and small nucleolar RNA-protein complexes (snoRNPs) are protein-RNA complexes that regulate gene expression. Additionally, RNA interference can be used readily as a mechanism to decompose target mRNAs [5] [6] [7] . Several types of non-coding RNAs are associated with serious diseases. In cancer cells, many types of micro-RNA (miRNA) are expressed and affect tumor progression or 1 3 metastasis by interacting with various tumor factors [8] [9] [10] [11] [12] . Interaction between cellular human miRNA and viral RNA is important for infection or defense processes of pathogenic viruses, including HIV, influenza virus, SARS coronavirus and Ebola virus [13] [14] [15] [16] [17] . Moreover, there is a general acceptance that many kinds of structured non-coding RNAs exist in nature and play important roles in living cells; however, details of their functional mechanisms remain largely unknown [18] [19] [20] .

Non-coding RNAs have therapeutic value as potential drugs and drug targets [21] . There is high expectation that nucleic acid medicines will alleviate or cure disease processes, because specifically designed nucleic acids (oligonucleotides) will directly target DNA, RNA, or proteins that cause diseases. The depletion of drug discovery targets in low molecular weight drug discovery programs, the problems associated with mass production of antibody drugs that show high specificity, and issues of high cost to sales ratios will possibly be solved by nucleic acid medicine. Consequently, the development of nucleic acid medicinal approaches is attracting significant attention. Currently, five nucleic acid-based drugs have been approved and are on the market worldwide. Several companies are considering commercialization based on industry-academia co-operations, and a dramatic increase in this market is anticipated in the near future [5, 22] .

Nucleic acids generally play important roles in preserving genetic information and protein synthesis, and these biomolecules form various three-dimensional structures. In an effort to exploit their potential benefit to human health, it is important to carry out detailed analyses of protein-RNA or RNA-RNA interactions. Thus, solving the tertiary structures of RNA molecules, as seen with proteins, should aid nucleic acid drug development. Therefore, a large body of research investigating or predicting the tertiary structures of RNA has been performed [19, 20, [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] . Previously-developed software for predicting the secondary (RNAfold [34] , Mfold [35] , CentroidFold [36] ) and tertiary structures (FARNA [23] , MC-fold and MC-sym [24] , RNA2D3D [27] , SimRNA [32] , SPQR [33] ) of RNA molecules from their nucleotide sequences are available. A recent computational challenge is the prediction of tertiary structures of more difficult targets such as RNA-RNA and RNA-protein complexes. The current difficulties associated with using available tools and software are the requirement of several complex interactive manipulations and a user that has a strong understanding of RNA tertiary structures. To simplify the task of computational RNA analysis, we provide a novel workflow, which facilitates the study of RNA molecules from their sequence to their tertiary structure, and their interactions with various biomolecules. By using this workflow, it is possible to use various analysis tools, software and databases developed for RNA analytics in a single platform. For example, in the case of tertiary structure prediction, the input sequence selected from a database passes to secondary structure prediction software and the calculated secondary structure is used for tertiary structure calculations in a one-stop platform. The intermediate results of each computational step can be visualized.

Workflow systems with a graphical user interface are useful tools for addressing how to treat large-scale data and to efficiently integrate analytical resources [37] . We introduce the ToGo-WF workflow as a bioinformatic analytical resource, which is capable of analyzing, modeling and visualizing biological data by using the KNIME (Konstanz Information Miner) platform [38] . The KNIME workflow platform is freely available via http://www.knme.org. Over the last few years, we have released several bioinformatic nodes, mainly developed by bioinformatics groups at AIST, and have deposited them in the node repository using the KNIME environment. On this platform, users can combine independent analytical resources, called nodes in KNIME, from the repository of bioinformatics tools developed at AIST and external tools/databases by selecting them in an easy drag-and-drop manner. In this study, we have developed a workflow that combines RNA-related analytical nodes to predict RNA tertiary structures and RNA-RNA/protein interactions that provide insights and a starting point for tertiary structural analysis by RNA molecular simulations. Using the structure of an RNA-protein complex obtained from this workflow, we have performed molecular dynamics simulations of this RNA-protein complex to analyze the structural stability and affinity of the interaction.

We have created several novel RNA analytical nodes for predicting RNA tertiary structures and RNA-RNA/protein interactions. The KNIME environment, including the RNA nodes, can be downloaded from the ToGo-WF website (http://togo.medal s.jp/activ e_local _rna_predi ction .eng.html) for Microsoft Windows, Linux and Mac OSX operating systems. We have also posted installation instructions and user manuals for the RNA workflow. After installation, three workflow lines are released by selecting RNA structure prediction in the KNIME explorer, as shown in Fig. 1 . Nodes in the workflow that were developed by AIST include Sparql, CentroidFold, IPknot, RNA2DChecker, Viewer, Fragment-Selector, IntMinMM, RactIP, RASSIE and Rascal. Prediction of RNA secondary structures is performed using the programs CentroidFold [36] , IPknot and RactIP [39] , and tertiary structure prediction is achieved using the programs RASSIE [30] and Rascal [31] . The RNA2DChecker node is used to judge the difficulty of the tertiary structure prediction based on the secondary structure prediction results and from this analysis either RASSIE or Rascal is selected. The details of RASSIE and Rascal based on fragment-based methods are described in references [30, 31] . In brief, prediction using RASSIE is performed by assembling small tertiary structural elements based on secondary structures (stem(duplex), hairpin-loop, internal-loop, single-strandloop). Those small structural elements were derived from RNA structures found in the PDB. Targets with long loop structures that are not found in the PDB or targets with high complexity (e.g., targets with too many structural elements) are switched to the Rascal node by RNA2DChecker. Rascal is based on a fragment-assembly method, which uses dihedral angles of single strand three base fragments and randomly replaces them during 10-50 thousands of steps of Metropolis Monte Carlo simulation. Rascal can predict tertiary structures with any secondary structures because the single stranded three-base fragments derived from RNA structures found in the PDB are not based on their secondary structure like RASSIE. Rascal can handle single and double stranded structures as input with a secondary structure and joint secondary structures of RNA-RNA interactions. Rascal also has the feature of predicting tertiary structures by adjusting base-pair potentials without the secondary structure. RASSIE predicts relatively short chain RNAs with simple secondary structure elements in rapid computational time. Long-chain RNAs with complex secondary structures can be predicted by Rascal; however, more computational time is required to obtain good prediction results.

In the workflow for determining RNA-protein complexes, Fpockts2 [40] and AutoDockVina [41] are used as the analytical resources for RNA-protein docking. The node of the fragment selector for RNA can create the fragment structure The bottom workflow presents the structure prediction of the RNAprotein complex of the given RNA and the RNA fragment is used for docking calculations with the target protein whose active sites are specified by the Fpockts node. After docking, the RNA fragment-protein complex structures are obtained using the Merge Target and Ligand node. The original input RNA structure is reconstructed by the Rebuild node based on the docking position of the RNA fragment, and the DockingAnalyzer node is used to cluster the complex structures. By selecting one of the complex structures, the structure is energetically minimized by molecular mechanics, initMinMM.

In the KNIME environment, users can configure each node by clicking the right mouse button and can view a simple annotation in the node description by clicking the left mouse button. The nodes developed by AIST allow users to execute jobs on our AIST remote computer (not on the user's local computer). The nodes, RASSIE, Rascal, AutoDockVine and initMinMM demand high amounts of computing power and memory. These nodes are in our cloud-based workflow system and jobs with these nodes are submitted to the Message Passing Interface (MPI) parallel jobs system of the AIST cluster machine. The manuals on how to install and use ToGo-WF for RNA can be downloaded from our website (https ://togo.medal s.jp/activ e_local _rna_predi ction .eng.html).

We constructed a database by searching the literature for nucleic acid-based drugs and web services. RNA-based drugs can be classified by the mechanism of activity, and include inhibitors of mRNA translation (antisense) and RNAs that bind proteins and other molecular ligands (aptamers) [42] . Information about the authorization status of nucleic acid drugs, primary structure (sequence information) and previous researchwas primarily collected from the EU Clinical Trials Register (https ://www.clini caltr ialsr egist er.eu), U.S. Food and Drug Administration (https ://www. fda.gov/) and Ionis Pharmaceuticals, Inc. (http://www.ionis pharm a.com/pipel ine/). The developed database for medals Nucleic Acid Drug Database (mNADD) provides "ATC Classification", "Product Name", "Synonymous", "Phase", "Description" and "Target protein ID". In the workflow, the Sparql node (Fig. 1) is used to select a sequence from the mNADD database, which contains approved RNA drugs. The database also has short non-coding RNA entries less than 150 nucleotides obtained from fRNAdb [43] , which is accessible at the National Bioscience Database Center (https ://dbarc hive.biosc ience dbc.jp/en/frnad b/desc.html).

Molecular dynamics (MD) simulations in explicit solvent water were performed to relax the structures derived from either the RASSIE or Rascal module of the workflow. All initial structures were placed at the center of the water box and water molecules were modeled using the TIP3P 3-point charge model potential [44] . K + and Cl − ions were added using the LEaP module in the Amber package to neutralize the net charge of the system and to give the system an ionic concentration of 0.2 M. These systems were energyminimized. All calculations were performed using Amber 16 [45] with the protein.ff14SB (for protein), RNA.OL3 (for RNA) and tip3p (for solvent) force fields [46] [47] [48] [49] with a time step of integration set to 1 fs. All bond lengths involving hydrogen atoms were constrained to their respective equilibrium values by the SHAKE method [50] . Electrostatic interactions were treated with the particle mesh Ewald method with a cut-off of 9 Å [51] . Each system was gradually heated to 300 K during the first 60 ps at a heating rate of 5 K/ps in the NVT ensemble. Following a 200 ps heating and equilibrating NVT simulation, for each system, a 6 ns dynamic calculation was performed in the NTP ensemble, with pressure constant at 1 atm and temperature constant at 300 K using the Berendsen's thermostat [52] .

Molecular Mechanics-Generalized Born Surface Area (MM-GBSA) calculations were performed to evaluate the stability of the docked complex structure. Theoretical free energies were calculated using the MM-GBSA approach [53] . The MD simulations of docked structures were performed using the same method described above. Snapshot structures were sampled every 5 ps (from 0.5 to 1.5 ns of the MD trajectory) and used in estimating the binding free energies. In the analysis of the binding free energies, water molecules were replaced with the implicit solvent GB model. This evaluation of free energies using the MM-GBSA approach was performed for complex structures, which were selected from using the Rebuild node to confirm that the molecules do not overlap following the rebuilding of the fragments after the docking calculation. The structure of the complex that gave the lower binding free energy with adequate distance was used as the template model for building and carrying out simulations of the RNA mutants.

The workflow, RNA Structure Prediction, at the top of Fig. 1 models the tertiary structure of the RNA target according to secondary structure information, which is based on the calculated results from CentroidFold [36] . The RNA sequence input is provided by the user or using the Sparql node, which 1 3 is connected to our mNADD database. Figure 2a, b show the predicted secondary and tertiary structures for RNA aptamer [54] calculated from the CentroidFold and RAS-SIE nodes. The input sequence was obtained from a human IgG-aptamer complex (PDB ID: 3AGV). In the workflow, RNA2DChecker automatically selects RASSIE or Rascal, depending on the complexity of the secondary structure. For the short sequence, RASSIE was selected and succeeded in constructing the tertiary structure based on the secondary structure. Figure 2c , d show the predicted secondary and tertiary structures for the non-coding RNA (FR000373 of fRNAdb) calculated by the CentroidFold and Rascal nodes. The input sequence was selected from mNADD using the Sparql node in the workflow. The tertiary structure of such a long hairpin-loop was predicted using Rascal. The input sequence and the predicted secondary structure in Fig. 2c were more complicated than those in Fig. 2a . Thus, RNA-2DChecker automatically selected Rascal for further tertiary structure calculations in the workflow.

To evaluate the performance and capabilities of ToGo-WF, the predicted structures obtained from the workflow were compared with secondary structures predicted by MCfold, and tertiary structures predicted by MC-sym [24] and SimRNA [32] . The detailed comparison is given in Supplementary Material 1. The workflow predicted the RNA secondary and tertiary structures to a higher accuracy than MC-fold/MC-sym and simRNA. A reason for this higher accuracy is that the fragment-based method with high-resolution potential energy functions in RASSIE and Rascal enables highly accurate tertiary structure predictions. ToGo-WF uses a platform that aims to provide one-stop RNA analysis that can switch the workflow depending on the difficulty of the problem. Additionally, by linking the analysis workflow and database with one platform, large-scale analysis is possible.

The predicted RNA structures can be used easily as initial structures for MD simulations to obtain detailed tertiary structure analysis of conformational changes and the stability of potential chemical modifications. The calculations introduced here for prediction of tertiary structures of RNAs are described in Sect. 5 of the online manual (https ://togo. medal s.jp/downl oad/RNA_UserM anual _E_v1.0.pdf).

RNA-RNA interactions are important for many biological functions and considered to be potential targets of new medicines. The kissing hairpin loop of the dimerization initiation site (DIS) of HIV-1 is a typical example of such interactions. In the lifecycle of HIV-1, the dimerization of genomic RNA is crucial for viral replication [26] . Formation of a kissing loop hairpin structure in the DIS region is the primary event of dimerization. Therefore, inhibiting formation of the kissing loop is a potential target for treating HIV. The workflow, RNA-RNA Structure Prediction, in the middle of Fig. 1 predicts the tertiary structure of RNA-RNA interacting complexes with Rascal using the results of RactIP, which provides RNA two-dimensional complex structures for RNA-RNA interactions with kissing hairpins. We have previously predicted the tertiary structures of many kissing loops and duplex targets successfully from nucleotide sequences by using the RactIP-Rascal pipeline [31] .

Recently, patisiran (onpattro), which is the first siRNA therapeutic agent, was approved by the FDA for treating the rare disease hereditary transthyretin amyloidosis [55] . Using the workflow, we have modeled the tertiary structure of the double-stranded siRNA, patisiran (Fig. 3) . This workflow can model tertiary structures of RNA-RNA interactions .(((....) ))))))), was calculated by the CentroidFold node in the workflow. Gamma was set to 2 ^ 3 using the configuration from CentroidFold. b One of the predicted tertiary structures derived with the RASSIE node. The predicted and native (PDB ID: 3AGV) RNA structures are shown in red and blue, respectively. The rmsd (root-mean-squared deviation) between the predicted and native structure (3AGV) is 3.25 Å. c The input sequence of the putative conserved noncoding region for Rascal is TGT TGA TCC GGC GTG TAA TGA GGT GTT CCT GGA ATC AATA, which was selected from the mNADD database using the Sparql node (FR000373). The secondary structure, ((( ((((((((.................) )))).)))))), was calculated by the CentroidFold node in the workflow. d One of the predicted tertiary structures derived with the Rascal node and double-stranded RNA structures. The obtained tertiary structures can be used for further molecular simulations to compute the dimerization process from the kissing structure and the interaction of RNA-protein complexes. The results may provide new suggestions or data that cannot be explained without an understanding of the tertiary structures of the complexes.

By collecting information about the authorization status of nucleic acid drugs, primary structures and observations from previous research, we found 35 drugs that have gone through clinical trials. They are classified into two major types of nucleic acid-based drugs: one type targets nucleic acids and represents antisense drugs, whereas the other type targets proteins and are termed aptamers. Five of the 35 drugs tested have been approved. Eteplirsen (EXONDYS 51), Vitravene (Fomivirsen), Kynamro (Mipomersen) and Nusinersen (SPINRAZA) are antisense drugs that bind exon51 of dystrophin pre-messenger ribonucleic acid (pre-mRNA), target mRNA of cytomegalovirus, degrade ApoB-100 mRNA expressed in the liver and target survival motor neuron 2 (SMN2) pre-mRNA to inhibit factor suppressing splicing of exon 7 from binding to pre-mRNA, respectively. The other, pegaptanib (Macugen), is an aptamer that binds to vascular endothelial growth factor (VEGF). Pegaptanib is the first approved aptamer of a new therapy to slow vision loss in people with the eye disease neovascular (wet) age-related macular degeneration (AMD) [56] [57] [58] [59] .

Antisense drugs represent the majority of nucleic acids in clinical trials with most targeting cancer and viral infections. Consequently, there is increasing interest in antisense technology. Ions Pharma is currently conducting ongoing clinical trials of other antisense drugs for the treatment of common medical conditions, including rheumatoid arthritis, cancers and Crohn's disease, a serious intestinal illness. Figure 4 shows major adaptation diseases that are current targets of nucleic acid medicines. We summarize the information of nucleic acid drugs in terms of "Approval Status of Nucleic Acid Drugs", "Target Proteins and Number of its Nucleic Acid drugs" and "Number of Nucleic Acid drugs participating in Pathway", which are provided at the website (https ://medal s.jp/enucl eicac id.html). The database is readily accessible to the corresponding server's URL (https :// medal s.jp/edrug nainf o.html) and can be downloaded from https ://dbarc hive.biosc ience dbc.jp/en/nucle ic-acid-drug/ downl oad.html.

In the workflow, RNA based drugs and non-coding RNAs less than 150 nucleotides obtained from fRNAdb are stored in the RDF-formatted database. Users can access the database with the Sparql node and use the environment in which database and analysis tools are connected seamlessly in the workflow.

The RNA-protein workflow is presented at the bottom of Fig. 1 and is used to predict the structure of a nucleic acid drug-target protein complex by molecular simulations. We applied our workflow to the complex formed between the antagonistic aptamer pegaptanib and VEGF. Figure 5 shows the secondary structure of pegaptanib, (((.. ..(((....)) ).......))). When we used the workflow for simple prediction of the secondary structure (Fig. 1, top workflow of the CentroidFold with default parameters), the predicted structure is ((((........  ...........) ))). The secondary structure predicted is very different from the experimentally obtained secondary structure in Fig. 5 . It is difficult to estimate the tertiary structure using an inaccurate secondary structure as input. Thus, in this The input sequences of RNA1 and RNA2 are GUA ACC AAG AGU AUU CCA U (red) and AUG GAA UAC UCU UGG UUA CUU (blue), respectively. The predicted tertiary structure is derived with the Rascal node using the result obtained with the RactIP node case, the tertiary structure is derived using Rascal, which can model structures with internal loops without the need of a secondary structure as a starting point. In obtaining the candidate tertiary structures by Rascal, the potential function was set as a penalty if base pairs were formed between A4-C7 and U18-U24. A matrix file of dimensions (number of bases × number of bases) defining the weight of base-pair potentials is used as input together with the sequence. A detail explanation of how to use the workflow to obtain ten candidate structures of pegaptanib is given in Supplementary Material 2.

From the ten candidate structures derived from the workflow for tertiary structure determination of RNA by the Rascal module, two candidate structures shown in Figure S2 5) . NMR experiments have shown [57] that U14 interacts with the side chain of Cys27 of VEGF. U14 faces outward in the two selected structures. To relax these structures in a water environment, 6 ns MD simulations of each structure immersed in TIP3P water + KCl (0.2 M) were performed.One of the RNA structures that has the U14 base facing outward was selected as the tertiary structure in the RNA-protein docking process.

The relaxed RNA structure and protein structures in PDB format were given as input for the complex structure workflow analysis. Using the FragmentSelector node in the workflow, a three-base fragment A13-U14-G15 from the structure was extracted for docking. Coordinates of residues in the protein that are in the vicinity of the three-base fragment can be easily specified using the Fpocket 2 node. The coordinates near Cys27 in the protein were used as input, and then docking calculations of the pegaptanib fragment and VEGF were performed. The docking result is given in Supplementary Material 3.The structures of the complex were rebuilt based on the docking poses of the RNA fragment. By confirming that the protein and RNA structures do not sterically clash with each other in the process of rebuilding the full-length RNA, we selected three structures of the complex that are reasonable for carrying out MD calculations. Figure 6 shows the selected structures of the complex after the docking and rebuilding processes.

MD simulations and MM-GBSA calculations were performed after energy minimization with the initMinMM node to evaluate the stability and binding free energy of the docked structures. For the three docked structures (Fig. 6 ), 1.5 ns MD simulations in TIP3P water + KCl (0.2 M) were performed, and snapshot structures every ..) )).......))),which is an antagonistic aptamer that targets VEGF 5 ps were taken during the 0.5-1.5 ns simulation period and used for MM-GBSA calculations. The distance between the centroid of base heavy atoms of U14 and the centroid of the Cys10-Cys27 disulfide bond was also assessed to evaluate the protein-RNA distance (Fig. 7) . Previous experiments reported that Cys27 displayed the largest chemical shift change upon aptamer binding in NMR spectra and the mechanism for the formation of a photo-crosslink between U14 and the disulfide bond, Cys27:S-Cys10:S [56] . Among the c1-c3 structures, the simulation of c2 showed the closest distance between the aptamer and protein. Table 1 shows the binding free energy, ΔΔG, derived from the MM-GBSA calculations for the three complexes. Although the calculations showed that the c3 structure had the lowest ΔΔG, indicating the highest affinity, the distance is 8 Å. This distance is too large to reconcile the experimental observation of a photo-cross link. Thus, the c2 structure is considered to be consistent with the experimental results and the probable complex between the aptamer pegaptanib and VEGF. Furthermore, the structure was used for simulations of the mutants, which investigated the stability and affinity of the complex. Three simulations were performed, c2_A, c2_G and c2_C, where U14 in the c2 structure was mutated to A, G and C nucleotides, respectively. The simulations were performed using the same conditions as those used for the native sequence. Table 2 shows the results of these simulations. The native c2 structure showed the shortest distance and the c2_G had the highest binding affinity. This indicates that the native and the mutated c2_G structures form the probable complex, and these c2 structures support the experimentally obtained results.

In aptamers that recognize proteins by a three-dimensional structure, there are cases where the base moiety is modified to gain diversity of the three-dimensional structure or enhance affinity with the protein. Development of an artificial base pair for aptamers has also been attempted [60] . Our workflows that integrate RNA analytical tools can easily provide complex structures that may provide new suggestions for the design of aptamers by interpreting the recognition mechanism of RNA-protein complexes.

The current cloud-based workflow on our website integrates analytical resources for RNA tertiary structure determination into KNIME workflow systems. We have developed three analytical workflows for comprehensive structural analysis of RNA: (1) prediction of the tertiary structures of RNA molecules; (2) prediction of the structures of RNA-RNA complexes, including analysis of the interface; and (3) prediction of the structures of RNA-protein complexes, including analysis of the interface. Taking advantage of the graphical tools offered by KNIME communities, users visually interpret the structural results, reproduce the calculated data and generate data suitable for fast and reliable preparation of MD simulations. We have also introduced database-linking tools to offer seamless connection between tools and the RNA database in the workflow. Using the workflow, we have evaluated the RNA-protein complex formed between the RNA aptamer pegaptanib and VEGF. Molecular simulations were used to calculate changes in the affinity of complexes where key bases in the aptamer involved in complex formation were mutated. The results provide valuable information and insights that could aid the design of RNA aptamer-protein complexes with enhanced affinity, which affords an efficacy at a lower concentration and circumvents toxicity issues. Our workflows for the comprehensive structural analysis of RNA based on KNIME offer a powerful tool for structural analysis and design of nucleic acid drugs, and these structural analyses can be directly inputted into MD simulations. 

Towards a complete map of the human long noncoding RNA transcriptome

Long noncoding RNAs and the genetics of cancer

RNA and disease

Role of non-coding sequence variants in cancer

RNAi therapeutics: Principles, prospects and challenges

Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans

RNA interference

The microRNAs miR-373 and miR-520c promote tumour invasion and metastasis

MicroRNA expression profiles classify human cancers

A microRNA regulon that mediates endothelial recruitment and metastasis by cancer cells -with comments

Micro-RNA-21 targets tumor suppressor genes in invasion and metastasis

Identification of Ebola virus microRNAs and their putative pathological function

MicroRNome analysis unravels the molecular basis of SARS infection in bronchoalveolar stem cells

Viruses, microRNAs and cancer

Cellular microR-NAs inhibit replication of the H1N1 influenza A virus in infected cells

Suppression of microRNA-silencing pathway by HIV-1 during virus replication

The amazing world of bacterial structured RNAs

Cooperative tertiary interaction network guides RNA folding

Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction

Non-coding RNAs: therapeutic strategies and delivery systems

MicroRNAs in cancer: biomarkers, functions and therapy

Automated de novo prediction of native-like RNA tertiary structures

The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data

RNA-Puzzles: a CASP-like evaluation of RNA three-dimensional structure prediction

Polymorphism of bulged-out residues in HIV-1 RNA DIS kissing complex and structure comparison with solution studies

RNA2D3D: a program for generating, viewing, and comparing 3-dimensional models of RNA

Towards 3D structure prediction of large RNA molecules: an integer programming framework to insert local 3D motifs in RNA secondary structure

Bridging the gap in RNA structure prediction

Prospects for tertiary structure prediction of RNA based on secondary structure information

Tertiary structure prediction of RNA-RNA complexes using a secondary structure and fragment-based method

SimRNA: a coarsegrained method for RNA folding simulations and 3D structure prediction

A nucleobase-centered coarse-grained representation for structure prediction of RNA motifs

The ViennaRNA web services

Mfold web server for nucleic acid folding and hybridization prediction

CEN-TROIDFOLD: a web server for RNA secondary structure prediction

Scientific workflow systems: pipeline pilot and KNIME

KNIME: The Konstanz Information Miner

RactIP: fast and accurate prediction of RNA-RNA interaction using integer programming

MDpocket: open-source cavity detection and characterization on molecular dynamics trajectories

Software news and update autodock vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading

RNA-based therapeutics: current progress and future prospects

The functional RNA database 3.0: databases to support mining and annotation of functional RNAs

Comparison of Simple potential functions for simulating liquid water

Refinement of the Cornell et al. Nucleic Acids Force Field Based on Reference Quantum Chemical Calculations of Glycosidic Torsion Profiles

RNA Structural Dynamics As Captured by Molecular Simulations: A Comprehensive Overview

Simmerling C (2015) ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB

Refinenement of the AMBER force field for nucleic acids: Improving the description of alpha/gamma conformers

Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes

Particle mesh Ewald: an N 	extperiodcentered log(N) method for Ewald sums in large systems

Molecular dynamics with coupling to an external bath

Generalized born models of macromolecular solvation effects

Matsumura H (2010) Conformational plasticity of RNA for target recognition as revealed by the 2.15 A crystal structure of a human IgG-aptamer complex

A therapeutic aptamer inhibits angiogenesis by specifically targeting the heparin binding domain of VEGF(165)

Pegaptanib, a targeted anti-VEGF aptamer for ocular vascular disease

Imino proton exchange rates imply an induced-fit binding mechanism for the VEGF(165)-targeting aptamer

Nucleic acid aptamers: clinical applications and promising new horizons

Unnatural base pair systems toward the expansion of the genetic alphabet in the central dogma

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.