key: cord-0028827-c5ap7xpe
authors: Carrascoza, Francisco; Antczak, Maciej; Miao, Zhichao; Westhof, Eric; Szachniuk, Marta
title: Evaluation of the stereochemical quality of predicted RNA 3D models in the RNA-Puzzles submissions
date: 2022-02-03
journal: RNA
DOI: 10.1261/rna.078685.121
sha: 08927e3e8352a8f3d1bf12cb2c6a9dd146170295
doc_id: 28827
cord_uid: c5ap7xpe

In silico prediction is a well-established approach to derive a general shape of an RNA molecule based on its sequence or secondary structure. This paper reports an analysis of the stereochemical quality of the RNA three-dimensional models predicted using dedicated computer programs. The stereochemistry of 1052 RNA 3D structures, including 1030 models predicted by fully automated and human-guided approaches within 22 RNA-Puzzles challenges and reference structures, is analyzed. The evaluation is based on standards of RNA stereochemistry that the Protein Data Bank requires from deposited experimental structures. Deviations from standard bond lengths and angles, planarity, or chirality are quantified. A reduction in the number of such deviations should help in the improvement of RNA 3D structure modeling approaches.

Knowledge of the RNA atomic structure is crucial to address biological problems; therefore, computational tools for the prediction of RNA three-dimensional models from the sequence have been developed to help or bypass some hurdles of laboratory procedures (Miao and Westhof 2017; Gumna et al. 2020; Li et al. 2020; Magnus et al. 2020).

The first decade of the 21st century resulted in several computer programs and protocols, which paved the way for automated modeling of RNA 3D structures: S2S (Jossinet and Westhof 2005), FARFAR (Das and Baker 2007), iFoldRNA (Ding et al. 2008), MC-Fold/MC-Sym (Parisien and Major 2008), and NAST (Jonikas et al. 2009). Some of them developed into highly specialized programs, which are used for either fully automatic or human-guided prediction. In the following years, this collection grew to include other tools such as ModeRNA (Rother et al. 2011), RNAComposer (Popenda et al. 2012), 3dRNA (Zhao et al. 2012), Vfold (Xu et al. 2014), and SimRNA (Boniecki et al. 2016).

To stimulate the improvement of quality in RNA prediction, RNA-Puzzles was organized 10 yr ago (Cruz et al. 2012). RNA-Puzzles is a community-wide assessment of RNA 3D structure prediction that aims to identify the bottlenecks in current RNA 3D structure prediction and thereby promote the improvement of prediction methods. Before the publication of an experimentally determined RNA structure, the sequence is disseminated among the community and prediction results are submitted within 3-4 wk. Assessment against the experimental structure is performed after the release of the structure. There are two categories of challenges, depending on the protocols used to obtain the models: They can originate from fully automated web services or human experts running various prediction programs. The starting point for each challenge is a novel experimentally determined RNA 3D structure, the conformation of which is unknown to the predictors. The web servers have 48 h and human experts 3-4 wk for submitting their models. After the deadline, the predictions are evaluated and the results are published with the ranking of the submitted models. Presently, 28 crystallographic structures have been part of the contest. Eighteen of them have been the basis of four scientific papers published by the RNA-Puzzles community (Cruz et al. 2012; Miao et al. 2015, 2020). As of October 2020, 22 challenges have been concluded with assessment results available on the RNA-Puzzles website (http://www.rnapuzzles.org). The website provides accuracy assessments determined by comparison with the reference structure, together with several global similarity and distance measures (Magnus et al. 2020): root mean square deviation (RMSD) (Kabsch 1978); deformation index (DI) that normalizes RMSD with the sequence length (Parisien et al. 2009); interaction network fidelity (INF), including Watson-Crick, noncanonical, and stacking interactions (Parisien et al. 2009); and, more recently, mean of circular quantities operating in torsion angle space (Zok et al. 2014; Wiedemann et al. 2017). RMSD serves as the main criterion to rank the predicted models, although it is only capable of assessing the minimum average distance between two 3D structures represented as two sets of atomic coordinates. The remaining metrics allow a focus on base pairs and torsion angles. Additionally, RNA-Puzzles uses the Clashscore, as defined in the MolProbity software package (Williams et al. 2018), to assess accuracy in a noncomparative procedure by finding overlapping or too-close atoms in the models; this score serves as an overall evaluation of the stereochemistry.

Nevertheless, current biological problems are setting new thresholds for acceptable geometric quality. Catalytic features, for instance, highlight that not only the overall geometry of a model but also its stereochemistry is important. One example is the torsion-angle-based dependence between the active and nonactive conformation of base pairs in some ribozyme active sites (White et al. 2018). Self-cleaving ribozymes provide another example, in which the correct description of phosphate backbone stereochemistry is critical to correctly assess the reaction pathways of these mechanisms (Teplova et al. 2020). Yet another recent case is drug development that targets RNA (e.g., against viruses) (Aftab et al. 2020). Therefore, there is a clear need to advance technology to provide useful and trustworthy tools capable of addressing these challenges.

Proper stereochemistry is at the core of biomolecular structure modeling. The geometries and stereochemistry of the nucleic acid building blocks are known with high precision (Gelbin et al. 1996; Schneider et al. 1996). Inaccuracies in molecular geometry can result from geometry optimizations that fall into local minima, which may lead to a metastable conformer different from the native one or to another biologically irrelevant conformation. Inappropriate geometry may mask an incorrect choice in torsion-angle space (for example, a base incorrectly in the syn conformer can lead to geometrical distortions in the sugar-phosphate backbone). Biomolecular structures are extremely well fine-tuned and the whole variety of physicochemical interactions is exploited in the folded native structure. Neglect of some type of interactions, or an inappropriate calibration, can lead to wrong conformations that can produce molecular distortions under insufficiently controlled structural refinement (Popenda et al. 2021).

Here, we revisit the evaluation of the stereochemistry of predicted models beyond interatomic noncovalent distances. We follow a routine recommended to experimenters who deposit their structure data in the Protein Data Bank (Berman et al. 2000) and the Biological Magnetic Resonance Data Bank (Ulrich et al. 2008), both contributing to the wwPDB partnership (Berman et al. 2003). wwPDB stresses the importance of careful examination of structures by providing tools that set the standards for 3D structure submission. In 2017, it introduced OneDep, a unified system applying the deposition, biocuration, and validation pipelines for structural data (Gore et al. 2017; Young et al. 2017). OneDep is an extensive suite of programs operating on different metrics to assess the accuracy of structures. It implements stereochemistry analysis through MAXIT (Feng et al. 1998; Berman et al. 2000).

To evaluate the stereochemistry of RNA tertiary structure predictions, we analyzed the results of all RNA-Puzzles challenges with the standardized data available as of November 2020, that is, puzzles 1-15, 17-21, and 24 (puzzle 14 in two versions, bound 14a and free 14b), and 22 corresponding reference structures. We downloaded 1030 predicted RNA models from the standardized data set belonging to RNA-Puzzles resources (located at https://github.com/RNA-Puzzles; Magnus et al. 2020). Among those, 797 models were in the human category and 233 models in the web server category. From these data, we created 23 clusters by participant, each containing models submitted by a single human group or a web server (Table 1). An additional 24th cluster included the reference structures (Table 2). We processed the structures in all of these subsets using MAXIT software (Feng et al. 1998; Berman et al. 2000) and compared results with the MolProbity software (Williams et al. 2018). Next, we used Barnaba (Bottaro et al. 2019) and X3DNA-DSSR (Lu and Olson 2003) to verify base-pairing geometries and handedness of helices, respectively. Finally, we conducted a simple statistical analysis by computing the average value, standard deviation, and median for every subset including more than one model (see Materials and Methods).

We considered six categories of stereochemical issues: close contacts, bond length deviations, bond angle deviations, deviation from planarity, chirality issues, and phosphate bond linkages (Supplemental Material includes tables with the error numbers in every model). Using MAXIT, we examined them first for the subset of 22 reference structures (Fig. 1; Supplemental Table S1). Most of them contained some types of geometrical deviations from standard dictionaries. We found the highest incidence of errors in the bond angles (183 errors in 17 structures), followed by close contacts (54 errors in seven structures) and bond lengths (32 errors in five structures).
Among the worst cases (PZ07, PZ01, and PZ21), two are for structures at a resolution worse than 2.5 Å (cf. Supplemental Fig. S1 in the Supplemental Material). The software X3DNA-DSSR (Lu and Olson 2003) does not reveal any left-handed helix/dinucleotide step in either the RNA-Puzzles submissions or the experimentally determined RNA 3D structures (cf. Supplemental Tables S46-S69 in the Supplemental Material). In Figure 1, one can also observe that there are no chirality issues, while deviations from planarity occur only in two instances (nine errors in total). For polymer linkage (i.e., deviations in P-O bond lengths), we found seven structures with a total of nine reported inaccuracies, making an average of less than one error per structure, the same as for errors in planarity.

We have separately analyzed clusters with models predicted by human experts and web servers. Each of these 23 collections contains the predictions submitted by one participant within all the considered challenges available in the standardized data set of the RNA-Puzzles resource (cf. Table 1). Their cardinalities range from 1 to 188. Within each of these clusters, except for those including only one model (i.e., H10, H14, and H16), we determined the total number of errors of each type (Supplemental Tables S2-S24 in the Supplemental Material), the average number over all the errors, and the standard deviation (Fig. 2), and confirmed these results using MolProbity software, version 4.5.1 (Supplemental Figs. S2-S4; Supplemental Tables S71-S92 in the Supplemental Material; Williams et al. 2018). We did the same for each of the six types of stereochemical properties; we further computed the average value of each error and the standard deviation per cluster (Fig. 3). One can observe that some of the applied prediction methods have an advantage over others in terms of the total number of errors. However, most submissions have stereochemical issues to address. Interestingly, there is no visible difference between the qualities of human versus web server predictions as far as the average number of all the inaccuracies is concerned. In both categories, we can observe both good and bad scores. The average number of errors per model in the human category equals 106, while in the web server category it is 103.

Due to the significant difference between cluster cardinalities, there is no statistical consistency between them, but there is statistical consistency within each cluster (the results of a single participant). For instance, the H9 set, for which the total score is significant in Figure 2, has only 32 items; we should remember that in a small set, one highly defective object significantly affects the average value and standard deviation. The most numerous clusters (over 100 models) are H2, H3, H4, and H5. The sets labeled as H1, H6, W2, and W6 include 50-100 items. The remaining ones have fewer than 50 models each. By clustering and comparing the predictions submitted, one can observe that H1 (average total number of errors, ATN = 0), H4 (ATN = 7.69), and H8 (ATN = 11.60) groups apply methods performing the best in the category of human experts; W2 (ATN = 1.30), W5 (ATN = 7.61), and W4 (ATN = 31.20) are most successful among the web servers. For all of these, the average number of all errors per model is less than 50 (Fig. 2). For clusters H2, H3, H5, H6, H11, H12, H13, and W1, the average total number of inaccuracies is in the range of 50-200. However, the significant standard deviations indicate a large spread in stereochemical issues for the prediction methods used to obtain the models collected in these clusters. In the other clusters, the average total number of errors falls in the range of 200-350 (if we do not consider single-model clusters). A comparison between Figures 1 and 2 shows the gap between reference structures and predicted models. The most notable conformational errors in predicted models occur in bond lengths and angles. On average, MAXIT identified over a hundred such inaccuracies per predicted RNA 3D model and fewer than 10 per reference structure.
Chirality (i.e., the configuration of the sugar substituents) is correct in the experimentally determined RNAs, while 28% of the predicted models have problems with it. Numerous deviations from the average planes of aromatic rings are also observed. Polymer linkage assesses bond lengths between adjacent nucleotides by measuring the P-O bond distances. This parameter has the lowest error rate in computationally generated structures: errors of this type occurred in 22% of all analyzed RNAs. Figure 3 presents MAXIT results separately for each error type and allows us to take a closer look at the weaknesses of the protocols embedded within various prediction programs. The plots reveal the highest number of inaccuracies in bond angles (70,184 in total). Virtually every prediction method generates errors in covalent geometries, and the exceptional models with no such issue are not necessarily the most similar to the reference structure(s) in terms of overall RMSD. In the human category, models collected in H1, H4, and H8 clusters have few or no geometric issues (although, at the same time, four H4 models in puzzle 19 have the largest bond length error, with an O5′−C5′ length >100 Å), while predictions in H7, H9, and H15 are among those with the highest number of inaccuracies. In the web server category, W2, W4, and W5 perform the best as far as bond lengths and angles are concerned, while W3, W6, and W7 are at the end of the ranking. If we consider deviations from ring planarity (Fig. 3), of which the total number is 17,594, their average per model for every cluster is below 90 errors. Models in H6, H9, and H13 have a significant number of these issues. The largest identified deviation from planarity equals 0.791 Å and occurred in H15 model 1 predicted for puzzle 12. An example error of this type is depicted in Supplemental Figure S5.

The average number of chirality errors is below 20 for all clusters. For some clusters, MAXIT reported zero or one issue of this type in total: H1, H8, and H13 within the human category, and W2, W3, and W5 within the web servers. Let us add that for H4, which has the longest track of submissions, the number of errors in this category is also negligible. Some approaches (H2, H15, and W4) scored higher as far as the average number of chirality inaccuracies is concerned.

In total, 2130 abnormalities classified by MAXIT as chirality errors occurred in 291 predicted models. The most common form of chiral error is the interchange of the hydroxyl group and hydrogen atom on the same carbon atom at the ribose moiety (Fig. 4). Strictly speaking, such an interchange does not lead to a chiral error; it produces another sugar type (for example, arabinose or xylose instead of ribose). Such improper sugar construction represents 94.9% of all chiral errors identified by MAXIT. The remaining 5.1% are planar inaccuracies in the sugar ring, and they occur when the improper torsion angle at an sp³ carbon atom is close to zero instead of being around −122 or +122 degrees. Such a situation occurs in furanose rings that are distorted or flat.

A distribution of chiral errors among nucleotides is shown in Table 3. We can see a high frequency in guanine (692 inaccuracies, which make 32.5% of all chiral errors) and a lower one for uracil (410 inaccuracies, which make 19.2% of all chiral errors). This relationship is visible for both sugar construction inversions and planar errors. However, the frequencies are affected by the nucleotide content in the analyzed RNA structures. Thus, in Table 3, we also present the total number of adenines, cytosines, guanines, and uracils in the analyzed data set, and the percentage of these nucleotides having erroneous chirality. Let us add that among all nucleotides with chiral errors, 91% are anti while 9% are syn nucleotides. A similar distribution is observed for each of the individual nucleotide types. Syn/anti conformation characterizes a relative orientation of base and sugar and is determined based on the χ-torsion angle (defined by O4′-C1′-N1-C2 for pyrimidines and O4′-C1′-N9-C4 for purines). Usually, χ falls into the ranges [+90, +180] or [−180, −90], corresponding to the anti conformation. Occasionally, we observe its value in [−90, +90], which refers to the syn conformation. Some chiral errors (11%) appear when the conformation of a nucleotide in the predicted model differs from that in the reference structure. However, in most cases (89%), these errors cannot result from the conformation change (Table 4). Regarding the distribution of errors among chiral atomic centers, we can observe that 43% of inaccuracies occur at C4′ (cf. Fig. 4), ∼25% at C2′ and C3′ (cf. Fig. 4), and only 5.9% at the C1′ atom (Table 5).
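The syn/anti assignment described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the procedure used in the study: the function names are hypothetical, coordinates are passed as plain 3-tuples in Å, and the boundary values ±90° (which belong to both ranges quoted in the text) are assigned to anti by choice.

```python
import math

# Atom names defining chi, as given in the text
CHI_ATOMS = {
    "pyrimidine": ("O4'", "C1'", "N1", "C2"),
    "purine": ("O4'", "C1'", "N9", "C4"),
}

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_deg(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of b0 and b2 perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def glycosidic_conformation(chi):
    """'anti' for |chi| >= 90 degrees, otherwise 'syn' (boundary choice is an assumption)."""
    return "anti" if abs(chi) >= 90.0 else "syn"

# Toy coordinates (not from any RNA structure): an eclipsed and a trans arrangement
print(glycosidic_conformation(torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))))
print(glycosidic_conformation(torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))))
```

In practice, the four points would be the coordinates of the atoms listed in `CHI_ATOMS` for the residue in question, read from the model file with a structure parser of one's choice.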

Polymer linkage errors are rare for most predicted models (Supplemental Fig. S6). We have found that MAXIT may report false positives in this category: whenever the sequence in the model is truncated, MAXIT fails to recognize different chains properly. This kind of artifact is clear for the reference structures (e.g., PZ12 and PZ21). However, when it comes to RNA 3D models predicted by web servers or human experts, we have not found such false positives since different chains are labeled correctly by the prediction methods (∼10% of submissions contain double-chain models). Thus, polymer linkage inaccuracies depicted in Supplemental Figure S6 are true-positive errors, and they come from an incorrect linkage bond length between the oxygen and the phosphate group of two neighboring nucleotides in the polymer chain (Supplemental Fig. S6). Such errors were suspected to occur during the assembly building process of RNA fragments since generally nucleotides start at 5′-P and end at O3′; however, our analysis revealed that models predicted by assembly-based methods did not show errors of this type.

The analysis of the RNA-Puzzles data set containing 1052 RNAs reveals 2431 polymer linkage errors, including 2422 errors in 230 predicted models and nine errors in seven reference structures. H2 and H7 clusters have the highest average number of these errors among human expert predictions. For the web servers, MAXIT has found the highest number of this type of inaccuracy in W6 and W7. The remaining prediction methods do not tend to generate errors in this category. By default, MAXIT reports such an error whenever the distance between oxygen atom O3′ and phosphorus atom P of the next nucleotide in the polymer chain is longer than a typical length of a covalent bond between these atoms. Some of these errors are small deviations, but major ones occur as well. For example, in H15 model 3 predicted within Puzzle 13, MAXIT identified a bond of length 82.52 Å between A70 and A71; it is the highest inaccuracy of this type identified within the data set. The distribution of errors by the nucleotide type and syn/anti conformation is presented in Table 6. One can observe that linkage errors involving adenine (19%) are least frequent, and those with guanine (32%) occur most often. However, as a function of the relative contents of the four nucleotides in the analyzed RNA molecules, cytosine has the most linkage errors and adenine the least.

Then, we analyzed the data set with the Barnaba software (Bottaro et al. 2019), and we computed the backbone root mean square deviation (herein called BBRMSD) and the base-pair-oriented eRMSD. Our results (see Supplemental Fig. S7A) show that across all the puzzles, there is no discernible trend in the average values of either BBRMSD or eRMSD. In the average RMSD values across participants (see Supplemental Fig. S7B), groups H1 to H4 and H12 performed better than the automated protocols, with the exception of W7, which showed the lowest average in both backbone and base-pair quality. Some groups, from H6 to H9, have lower eRMSD than BBRMSD values, suggesting these groups focus their attention on the base pairs rather than the backbone structure. Other groups, like H11 and W5, perform better at deducing the volumetric backbone shape, but they have relatively worse base-pair performance (Supplemental Fig. S7B).

Stereochemical errors in the predicted RNA 3D models are primarily generated by computer programs used in both human-guided and web server prediction. In many cases, these may be the result of relatively small rounding errors appearing at one of the calculation steps and propagating in subsequent iterations. Knowing the general approach used by the prediction method, it is possible to indicate the most sensitive stages at which the errors arise.

Computational complexity is the main problem in de novo simulation of RNA folding. Therefore, different techniques are used to reduce the time cost of de novo prediction methods. One of them is to evaluate the fold using simplified potentials, which do not take stereochemical parameters into account. The simulation (e.g., Monte Carlo) converges toward an optimum defined in terms of the overall 3D shape of the molecule and gives a fold that is stereochemically oversimplified and far from ideal. The large number of calculations that are performed during the random sampling of the solution space also contributes to the generation of errors, as the errors that occur in one step are propagated further and often degrade the final solution. Another related problem concerns coarse-grained simulations. The transformation of a coarse-grained model to a full-atom model is a highly error-prone procedure.

In template-based approaches (homology modeling and fragment assembly methods), a crucial moment is the choice of the right template/fragment. A model based on a stereochemically erroneous or low-resolution template may incorporate the incorrect stereochemical parameters. Stereochemical errors may also arise during nucleobase exchange, structural block insertion, or their assembly into a larger whole. In the latter case, the choice of the structural blocks that are rotated and translated is critical, since maneuvering a larger element is more error-prone.

Figure legend: An inversion at C1′ would replace a β-ribose nucleotide with an α-ribose nucleotide; an inversion at C2′ leads to arabinose instead of ribose; an inversion at C3′ leads to xylose instead of ribose; and an inversion at C4′ would lead to a real chiral inversion from a D-ribose to an L-ribose.

Table footnote: (Conformation "same") Both residues in the linkage have the same conformations in the predicted model and the reference structure; (1 changed) one of the predicted residues is in a different conformation than in the reference structure; (2 changed) both predicted residues are in a different conformation than in the reference structure. The percentage of erroneous linkages of a given type and conformation is calculated relative to all (2422) erroneous linkages in the set.

Errors that arise during the structure modeling process can be avoided by applying a function that validates partial solutions based on their stereochemical parameters. However, this is very time-consuming and, in the case of de novo methods, completely unprofitable since it may prevent the method from returning a solution in a reasonable time. Therefore, the best solution to the problem is to improve stereochemistry post factum by minimizing the geometry or the energy after building the model, although bad local geometries are not easy to relieve when embedded in a large fold. At the same time, re-refining the templates used with standard dictionaries may alleviate the propagation of errors that leads to tight conformers with stereochemical errors that are difficult to remove by energy minimization. In some tools, like FARFAR (Das and Baker 2007; Das et al. 2010) and RNAComposer (Popenda et al. 2012; Purzycka et al. 2014; Antczak et al. 2016), such a procedure has been implemented and successfully fulfills its role.

Erroneous bond lengths, bond angles, and planarity deviations are the most frequent errors in RNA 3D structure prediction, while incorrect sugar constructions or chirality and polymer linkage errors occur less frequently (∼10 issues per structure on average). False-positive errors, which are caused by improper identification of structural chains in multichain RNA structures, are found in the polymer linkage category of the MAXIT results. Most errors can be compensated by running energy minimization protocols, for example, CYANA (Güntert and Buchner 2015), NAMD (Phillips et al. 2020), or XPLOR-NIH (Schwieters et al. 2003), on the preliminary models or by ensuring proper stereochemistry from the early stages of prediction. One can also process the predicted RNA structures using tools with the potential to refine nucleic acid structures, for example, RNAfitme (Zok et al. 2015; Antczak et al. 2018) or QRNAS (Stasiewicz et al. 2019).

We found that most RNA 3D structure prediction methods evaluated within RNA-Puzzles, in either the human or the web server category, generate models with some incorrect stereochemical parameters. Even the best models, according to the RMSD-based rankings, are not free of such errors. One could argue that it is easy to generate a very precise model that is inaccurate and that precision in geometric and stereochemical parameters is therefore of lesser importance. However, these geometric and stereochemical parameters are very well established and need to be implemented if models are to be helpful in the future for structures with catalytic or fine recognition properties. Thus, a similarity/distance measure assessing a model against a reference structure cannot be the only indicator of model quality, and all predictors should ensure the stereochemical accuracy of their models before submission. We suggest that a detailed stereochemical analysis should enter regular evaluation processes for improving the accuracy of RNA-Puzzles submissions and promoting high-quality RNA 3D structure prediction.

In this research, we used MAXIT version 10 downloaded from RCSB PDB (https://sw-tools.rcsb.org/apps/MAXIT), Barnaba 0.1.7 obtained from https://github.com/srnas/barnaba (Bottaro et al. 2019), MolProbity 4.5.1 taken from https://github.com/rlabduke/MolProbity (Williams et al. 2018), and X3DNA-DSSR, version 2.4 (Lu and Olson 2003). Structures were divided into 24 subsets: one subset with the reference structures and 23 subsets with predicted models (one for each participant), and the average values and standard deviations were computed for them. Three clusters, H10, H14, and H16, including predictions by human groups, were excluded from the statistical analysis since, in all the challenges, these groups submitted only one model each. However, their MAXIT reports are also available in the Supplemental Material.
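The per-subset summary described here can be sketched in a few lines of Python. This is a minimal illustration rather than the script used in the study: the function name `summarize_clusters`, the cluster labels, and the error counts are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was applied.

```python
from statistics import mean, median, stdev

def summarize_clusters(errors_per_model):
    """Return mean, SD, and median of error counts for each multi-model cluster.

    errors_per_model maps a cluster label to a list holding one error count per model.
    Single-model clusters are skipped, mirroring the exclusion described in the text.
    """
    summary = {}
    for cluster, counts in errors_per_model.items():
        if len(counts) < 2:
            continue  # no spread can be estimated from a single model
        summary[cluster] = {
            "mean": mean(counts),
            "sd": stdev(counts),  # sample standard deviation (assumption)
            "median": median(counts),
        }
    return summary

# Hypothetical counts, for illustration only
toy = {"X1": [0, 2, 4], "X2": [7]}
print(summarize_clusters(toy))
```

Reporting the median next to the mean is useful for small clusters, where a single highly defective model can dominate the average.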

MAXIT reports the following stereochemical issues: (1) close contacts; (2) bond length deviations; (3) bond angle deviations; (4) deviations from planarity; (5) chirality errors; and (6) polymer linkage errors (the P-O bond lengths). For 1-3 and 6, the program flags an abnormality if the parameter deviates from the expected value by more than six times the standard deviation σ. The expected values and associated σ's are based on Clowney et al. (1996), Gelbin et al. (1996), Parkinson et al. (1996), and Schneider et al. (1996) and are in the Supplemental Material. The current source of these reference terms is the Cambridge Structural Database (Urzhumtseva et al. 2009; Tickle 2012; Bruno and Groom 2014). Atomic clashes are signaled whenever any intermolecular atom pair is closer than the sum of their respective van der Waals radii. In general, a clash is defined when the distance between two atoms is <2.2 Å (if no H atom is involved) or <1.6 Å (if one H atom is involved) (cf. Supplemental Table D1 in the Supplemental Material).
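The two numerical criteria quoted above are simple enough to express directly. The Python sketch below is an illustration under stated assumptions, not MAXIT's implementation: the function names are hypothetical, and the example bond (expected 1.60 Å, σ = 0.01 Å) is made up rather than taken from the reference dictionaries.

```python
def exceeds_sigma_limit(observed, expected, sigma, n_sigma=6.0):
    """True if |observed - expected| is larger than n_sigma * sigma."""
    return abs(observed - expected) > n_sigma * sigma

def is_close_contact(distance, involves_hydrogen=False):
    """Apply the general distance cutoffs quoted in the text (values in Å)."""
    cutoff = 1.6 if involves_hydrogen else 2.2
    return distance < cutoff

# Hypothetical bond: expected 1.60 Å with sigma 0.01 Å (not dictionary values)
print(exceeds_sigma_limit(1.70, 1.60, 0.01))  # deviation of 0.10 Å versus a 0.06 Å limit
print(exceeds_sigma_limit(1.65, 1.60, 0.01))  # deviation of 0.05 Å versus a 0.06 Å limit
print(is_close_contact(2.0))
print(is_close_contact(2.0, involves_hydrogen=True))
```

With a 6σ limit, only gross departures are flagged, which is consistent with the text's observation that reference structures accumulate few such errors while predicted models accumulate many.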

Deviations from planarity are computed as the RMS deviation of all atoms from the average best-fit plane. A deviation is reported when it exceeds 6 × 0.02 Å or when at least one atom deviates from the plane by more than 0.02 Å. MAXIT also determines improper torsion angles in the furanose ring and reports deviations from puckering in the ring (imposed by the sp³ carbon atoms).

In the case of chirality assessment, MAXIT lists the residues that contain an unexpected configuration of chiral centers (C1′, C2′, C3′, and C4′). Improper dihedrals (Gelbin et al. 1996) are a measure of the chirality/planarity of the structure at a specific atom.
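Improper dihedrals are one way to quantify handedness at a tetrahedral center. A closely related and simpler quantity is the signed volume spanned by three substituents around the center: its sign flips when the center is inverted, and it approaches zero when the center is flat. The Python sketch below illustrates that idea only; it is not MAXIT's procedure, and the function names and the `flat_tol` threshold are hypothetical.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def signed_volume(center, a, b, c):
    """Scalar triple product (a - center) . ((b - center) x (c - center))."""
    u, v, w = _sub(a, center), _sub(b, center), _sub(c, center)
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    return sum(x * y for x, y in zip(u, cross))

def compare_center(model, reference, flat_tol=0.1):
    """Classify a model center as 'ok', 'inverted', or 'flat' relative to a reference.

    model and reference are tuples (center, a, b, c) listing the same three
    substituents in the same order. flat_tol (in Å^3) is an arbitrary
    illustrative threshold, not a published value.
    """
    vm = signed_volume(*model)
    vr = signed_volume(*reference)
    if abs(vm) < flat_tol:
        return "flat"
    return "ok" if vm * vr > 0 else "inverted"

# Toy geometry: a reference center, its mirror image, and a planar arrangement
ref = ((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
mirrored = ((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, -1))
planar = ((0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0))
print(compare_center(ref, ref), compare_center(mirrored, ref), compare_center(planar, ref))
```

Applied to C2′ or C3′ with the hydroxyl oxygen among the substituents, an 'inverted' result would correspond to the hydroxyl/hydrogen interchange discussed in the text, while 'flat' corresponds to a collapsed sugar ring.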

Polymer linkage between the adjacent nucleotides is measured based on the distances computed for O3′−P and O5′−P atom pairs. By default, the O3′−P distance is evaluated. However, if it exceeds 2.5 Å, MAXIT takes the minimum value out of these two for consideration.

Figures 4-6 were prepared using Symmetry Tool Plug-in 1.3 implemented in VMD software, version 1.94 (Humphrey et al. 1996).
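The selection rule for the polymer linkage distance (O3′−P by default, with a fallback to the smaller of the O3′−P and O5′−P distances beyond 2.5 Å) can be written as a small Python sketch. It is an illustration of the rule as worded in the text, not MAXIT code; the function names are hypothetical, and the two distances are passed in directly so that no assumption is made about which residues contribute the atoms.

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points given as coordinate tuples (Å)."""
    return math.dist(a, b)

def linkage_distance(d_o3p, d_o5p, switch_at=2.5):
    """Distance considered for the linkage check.

    The O3'-P distance is used by default; if it exceeds switch_at (Å),
    the smaller of the O3'-P and O5'-P distances is taken instead.
    """
    if d_o3p <= switch_at:
        return d_o3p
    return min(d_o3p, d_o5p)

# Hypothetical distances in Å, for illustration only
print(linkage_distance(1.61, 3.0))
print(linkage_distance(3.2, 2.9))
```

The value returned by `linkage_distance` would then be compared against the expected bond length using the outlier criterion given for linkage errors.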

Supplemental material is available for this article.

Meet the First Author(s) is a new editorial feature within RNA, in which the first author(s) of research-based papers in each issue have the opportunity to introduce themselves and their work to readers of RNA and the RNA research community. Francisco Carrascoza and Maciej Antczak are the co-first authors of this paper, "Evaluation of the stereochemical quality of predicted RNA 3D models in the RNA-Puzzles submissions." Francisco received his PhD in molecular modeling from Babes-Bolyai University in Romania in 2016 and has been an Assistant Researcher at Poznan University of Technology since 2015, working in theoretical chemistry on ab initio molecular dynamics and metadynamics, and on the origins of life to investigate the natural formation of nucleobases and amino acids in the prebiotic world. Maciej is an Assistant Professor at the Institute of Computing Science, Poznan University of Technology, as well as the Institute of Bioorganic Chemistry, Polish Academy of Science, working on algorithms and computational methods for the analysis and prediction of RNA structures, combinatorial optimization methods for solving biologically inspired problems, high-performance computing, and artificial intelligence.

What are the major results described in your paper and how do they impact this branch of the field?

We summarized a stereochemistry-oriented evaluation of in silico RNA 3D predictions submitted for past modeling rounds of the RNA-Puzzles international contest. Most state-of-the-art methods for RNA 3D structure prediction should pay more attention to ensuring proper stereochemical features. We showed the distribution of common errors and their total count in time to outline the importance of the problem. Moreover, we quoted expected value ranges for bond lengths, torsion angles, etc., and proposed how these issues could be eliminated. We believe that our results will contribute to the improvement of RNA 3D structure prediction methods.

What led you to study RNA or this aspect of RNA science?

FC: Since my background is in ab initio theory and full-atom modeling, I found that stereochemical errors are a challenge for methods in which close atom-atom interactions matter more than the volumetric quality of the model. Our results show that, at the full-atom scale, there is a considerable number of errors for developers of RNA 3D structure prediction tools to mitigate.

MA: RNAs are fascinating molecules, especially for computing scientists. Knowledge about RNA 3D folds is crucial for designing new drugs and therapeutic solutions. RNA 3D structure determination is usually expensive and not always possible, so experimentally determined 3D structures of many biologically relevant RNAs are still unknown. The only way to close this gap is to apply state-of-the-art methods for RNA 3D structure prediction. The aim of the RNA-Puzzles initiative is to stimulate the community to continuously improve these methods and, indirectly, the quality and accuracy of RNA 3D predictions. Moreover, we give experimentalists useful hints on how successful the considered methods are in various applications.

During the course of these experiments, were there any surprising results or particular difficulties that altered your thinking and subsequent focus?

FC: The most surprising result for me was the stable trend in the number of identified errors across all RNA-Puzzles challenges released in recent years. Therefore, we expect this report to be fruitful for the community.

MA: Surprisingly, stereochemical errors were identified in most RNA-Puzzles submissions, even top-ranking ones. That is why basic stereochemical validation could be valuable during the submission of RNA 3D predictions.

As for difficulties, it is not always possible to run a bioinformatics tool cloned directly from a GitHub repository without manual intervention. As bioinformaticians, we definitely need to work on this.

If you were able to give one piece of advice to your younger self, what would that be?

FC: Passion for your work will bring success. Don't worry about the things you don't have, just allow your passion to blossom.

MA: Do not be afraid of asking questions and striving persistently for answers.

What are your subsequent near- or long-term career plans?

FC: This work allowed me to realize the current struggles of the RNA community. In the near term, I plan to study the folding and dynamics of RNAs in greater detail.

MA: I want to do what I like the most, so I plan to continue my scientific research in structural bioinformatics of RNAs. Of course, I am open to international collaboration, which could result in interesting scientific projects soon.

What were the strongest aspects of your collaboration as co-first authors?

FC: The multidisciplinary exchange between a bioinformatician and a chemist that this work required gave us unique insight into different ways of approaching RNA structure. Combining these perspectives allowed us to address the problem at different levels.

MA: We represent two different scientific domains, so our collaboration relied on the synergy of our complementary knowledge and experience. We discussed a lot but did not come across any problems that we could not overcome together.

Received January 16, 2021; accepted November 5, 2021.