key: cord-0968264-5zc8z0t7
authors: Baabu, Priyannth Ramasami S.; Srinivasan, Shivaramakrishna; Nagarajan, Swetha; Muthamilselvan, Sangeetha; Selvi, Thamarai; Suresh, Raghavv R.; Palaniappan, Ashok
title: End-to-end computational approach to the design of RNA biosensors for detecting miRNA biomarkers of cervical cancer
date: 2022-04-04
journal: Synth Syst Biotechnol
DOI: 10.1016/j.synbio.2022.03.008
sha: b057e62cfa36771b9f1b88e45e8e59d50c4deae3
doc_id: 968264
cord_uid: 5zc8z0t7

Cervical cancer is a global public health subject as it affects women in the reproductive ages, and accounts for the second largest burden among cancer patients worldwide with an unforgiving 50% mortality rate. Relatively scant awareness and limited access to effective diagnosis have led to this enormous disease burden, calling for point-of-care, minimally invasive diagnosis methods. Here, an end-to-end quantitative unified pipeline for diagnosis has been developed, beginning with identification of optimal biomarkers, concurrent design of toehold switch sensors, and finally simulation of the designed diagnostic circuits to assess performance. Using miRNA expression data in the public domain, we identified miR-21–5p and miR-20a-5p as blood-based miRNA biomarkers specific to early-stage cervical cancer employing a multi-tier algorithmic screening. Synthetic riboregulators called toehold switches specific to the biomarker panel were then designed. To predict the dynamic range of toehold switches for use in genetic circuits as biosensors, we used a generic grammar of these switches, and built a neural network model of dynamic range using thermodynamic features derived from mRNA secondary structure and interaction. Second-generation toehold switches were used to overcome the design challenges associated with miRNA biomarkers. The resultant model yielded an adj. R(2) ∼0.71, outperforming earlier models of toehold-switch dynamic range. Reaction kinetics modelling was performed to predict the sensitivity of the second-generation toehold switches to the miRNA biomarkers. Simulations showed a linear response between 10 nM and 100 nM before saturation. Our study demonstrates an end-to-end computational workflow for the efficient design of genetic circuits geared towards the effective detection of unique genomic/nucleic-acid signatures. The approach has the potential to replace iterative experimental trial and error, and focus time, money, and efforts. 
All software including the toehold grammar parser, neural network model and reaction kinetics simulation are available as open-source software (https://github.com/SASTRA-iGEM2019) under GNU GPLv3 licence.

Cervical cancer is the second most common cancer affecting women worldwide [1] [2] [3], with 20% of cases in India [4, 5]. Most cervical cancers could be attributed to the HPV16 and HPV18 strains of Human Papilloma Virus (HPV) [1, 2]. Lifestyle factors also contribute to the etiology of the disease [2]. Cervical cancer tumorigenesis involves infection of the metaplastic epithelium by HPV at the cervical transitional zone, followed by viral persistence and the progression to pre-cancerous epithelial cells, thus nucleating the initiation of cancer [6]. Pap smear test is the gold-standard diagnostic method, but it is invasive, expensive, and time-consuming [4, 7]. Alternative testing strategies are necessary [8] [9] [10], and micro-RNAs (miRNAs) circulating in the blood have emerged as effective biomarkers in many cases [11] [12] [13] [14]. Using data-driven methods for the identification of reliable early-stage diagnostic/prognostic biomarkers, coupled with the concurrent design of biosensors offers potential for the development of a new generation of molecular diagnostics in a single workflow [15, 16].

Peer review under responsibility of KeAi Communications Co., Ltd.

Synthetic biology is delivering on its promise of harnessing nature's inherent diversity for the betterment of human health [17]. Bioelectronic diagnostic devices comprise an integrated single-unit reaction chamber housing the sensor element with the necessary reagents for analyte detection, and the corresponding transduction module for providing a reaction readout with optical, piezoelectric, or electrochemical means [18] [19] [20]. Toehold switch riboregulators are synthetic mRNA elements, which function by occluding the ribosome from translating a downstream gene [21]. Upon base pairing with the trigger RNA sequence, the toehold structure of the switch unfolds, permitting expression of a reporter gene. Such sensors could sense virtually any RNA sequence with excellent biosensor properties, namely specificity, modularity, and orthogonality [21], and could be freeze-dried on a microfluidic platform for biomarker detection using a cell-free colorimetric assay [22]. The toehold-switch concept was extensively utilised in the design of a biosensor for detecting Zika virus infection [23, 24]. Takahashi et al. designed a toehold switch sensor that could detect the conserved region of C. difficile in a paper-based platform [25]. In this work, we have formulated an alternative testing strategy for cervical cancer based on data-driven identification of biomarker miRNAs, design of cognate toehold-based sensors, followed by in silico modelling of sensor-circuit performance. The main contributions are twofold: (1) an end-to-end unified computational pathway from disease omics to biosensor design; and (2) application of this pathway to yield potential diagnostics/prognostics for cervical cancer. Supplementary Information accompanying the manuscript could be found at https://doi.org/10.6084/m9.figshare.14915619.v5, and the software required to reproduce the workflow and use the tools described herein is available open-source under GNU GPLv3 at https://github.com/SASTRA-iGEM2019.

The overall strategy is outlined in Table 1 and discussed in detail below.

The Cancer Genome Atlas (TCGA) provided the miRNA-Seq dataset with 309 cervical adenocarcinoma samples and 3 tumor-normal matched controls [26]. The stage information was extracted from the associated clinical data using the attribute 'patient.stage_event.clinical_stage'. There were 163 stage-I, 70 stage-II, 46 stage-III and 21 stage-IV samples. The patient sample barcode encoded in the variable hybridisation_REF was parsed to annotate the sample as cancer or normal.

RSEM-normalized Illumina HiSeq miRNASeq gene expression data [27] was log2-transformed, and analyzed for differential expression using the R (https://www.r-project.org/) limma package [28]. Linear modelling of the stage-annotated gene expression matrix was performed, followed by empirical Bayes adjustment to obtain moderated t-statistics. To account for multiple hypothesis testing and the false discovery rate, the p-values of the F-statistic of the linear fit were adjusted using the BH method [29]. Differentially expressed miRNAs were considered significant if the absolute log-fold change was >1.5x and the p-value was <0.05 [30]. A tiered contrast analysis for identifying stage-specific DE miRNAs was then carried out (see the Methods section in Ref. [30]). Finally, the association between the significant early-stage DE miRNAs and the overall survival of a patient was evaluated by univariate Cox proportional hazards regression analysis, and the significant prognostic miRNA

Toehold switches act as sensors for exact sequences of RNA and could be designed to hybridize with cognate molecules. The toehold structure is marked by a hairpin loop sequestering the RBS, an internal loop containing the start codon, and two base-paired stems, and functions as a regulatory switch of translation by occluding the RBS in the absence of the cognate trigger molecule ('ground state') [21] (Fig. 1). Upon binding of the cognate miRNA (i.e., the trigger sequence) to the linear toehold domain, the bottom stem unravels and the double-looped structure of the toehold switch collapses to a linear structure, exposing the sequestered start codon to expression of the reporter gene ('active state'). Each toehold switch is a modular entity with distinct sequence domains engineered de novo for a particular trigger RNA. The dot-bracket notation encodes a representative grammar of RNA secondary structure, '.' representing unpaired bases, and matched parentheses '(' and ')' representing paired bases (Fig. 1, bottom). Regular expressions could be used to parse dot-bracket notation. An in-house Python script based on regular expressions and ViennaRNA utilities [32] was developed to parse the grammar of toehold switches into their constituent domains (https://github.com/SASTRA-iGEM2019), accounting for the following scenarios:

1. Wobble base-pair between G-U in the secondary loop containing the start codon.
2. Base-paired toehold domains.
3. Base-paired linker regions.
4. Possible unpaired bases in the top stem.

1. The sequence of the toehold domain depends on the trigger complex and was at least 10 nts to allow effective binding of the trigger to the switch.
2. The descending bottom stem of the toehold switch was about 12 nucleotides, to maintain switch stability in the OFF state and ORF integrity.
3. The canonical RBS sequence AGAGGA was used.
4. The primary loop was made adenine-rich to maintain a larger loop structure.

The toehold-switch design and grammar are anchored with the following parts: RBS, 21-nt linker, start codon region, and Shine Dalgarno sequence. The variable domain is the trigger binding region (inclusive of the toehold domain and the ascending bottom stem) which obeys base-complementarity with the trigger.
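The idea of parsing a dot-bracket string into toehold domains with a regular expression can be illustrated with a minimal sketch. This is not the authors' parser: the pattern below is a simplified, hypothetical grammar (toehold, two stems separated by an internal loop, hairpin loop, linker), the group names are invented for illustration, and it does not handle the special scenarios listed above, such as base-paired toehold or linker regions.

```python
import re

# Hypothetical, simplified toehold grammar in dot-bracket notation:
# toehold | bottom stem (5') | internal loop (5') | top stem (5') | hairpin loop |
# top stem (3') | internal loop (3') | bottom stem (3') | linker
TOEHOLD_GRAMMAR = re.compile(
    r"^(?P<toehold>\.+)"
    r"(?P<bottom_stem_5p>\(+)"
    r"(?P<internal_loop_5p>\.+)"
    r"(?P<top_stem_5p>\(+)"
    r"(?P<hairpin_loop>\.+)"
    r"(?P<top_stem_3p>\)+)"
    r"(?P<internal_loop_3p>\.+)"
    r"(?P<bottom_stem_3p>\)+)"
    r"(?P<linker>\.*)$"
)


def parse_toehold(sequence, structure):
    """Split a sequence into domains according to its dot-bracket structure.

    Returns a dict mapping domain name to subsequence, or None when the
    structure does not match the simplified grammar.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    match = TOEHOLD_GRAMMAR.match(structure)
    if match is None:
        return None
    return {
        name: sequence[match.start(name):match.end(name)]
        for name in match.groupdict()
    }


# Illustrative structure only (domain lengths are arbitrary placeholders).
example_structure = (
    "." * 12 + "(" * 9 + "." * 3 + "(" * 6 + "." * 11
    + ")" * 6 + "." * 3 + ")" * 9 + "." * 21
)
example_sequence = "N" * len(example_structure)
domains = parse_toehold(example_sequence, example_structure)
```

In practice the dot-bracket string would come from a folding tool such as ViennaRNA rather than being written by hand.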

In this case, the triggers of interest are mature miRNAs about 22 nts long, which might include a stop-codon trinucleotide towards their 5′ sequence. The presence of stop-codon subsequences could halt the expression of the downstream GFP, and pose a problem to the use of toehold switches. Second-generation toehold switches are one solution to this problem (Fig. 2(A)) [33]. In addition to binding the miRNA, the toehold switch is required to bind another molecule, the anti-miRNA (antimiR) that is complementary to 12 nucleotides at the 3′-end of the toehold switch (Fig. 2(B, C)). Essentially, the trigger is represented by a hybrid molecule with a base-paired region and two free ends. The design of the antimiR sequence could be used in improving the trigger binding event.
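A quick screen for stop-codon trinucleotides in a candidate trigger can be sketched as follows. The function names are hypothetical and the example sequence is a toy string, not one of the biomarkers. Which reading frame matters depends on how the trigger-derived region sits relative to the start codon in a given switch design, so the frame is left as a parameter.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_stop_codons(rna):
    """Return (position, codon) for every stop trinucleotide, in any frame."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [
        (i, rna[i:i + 3])
        for i in range(len(rna) - 2)
        if rna[i:i + 3] in STOP_CODONS
    ]


def in_frame_stops(rna, frame=0):
    """Keep only the stop trinucleotides that fall in the given frame (0, 1 or 2)."""
    return [(i, codon) for i, codon in find_stop_codons(rna) if i % 3 == frame]


# Toy sequence for illustration only.
hits = find_stop_codons("AUAGCUGAC")
```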

The sensitivity of the toehold switch is described by its dynamic range, defined as the ratio between the maximum and minimum measurable intensities of reporter protein (typically GFP) expression, also known as ON/OFF ratio [21] . The dynamic range of a toehold designed for exact sequences of trigger RNA is the key indicator of its effectiveness, and is generally unknown for new toehold designs. The problem can be addressed using supervised machine learning on toehold-sequence datasets with available dynamic range responses. A previous model developed by CUHK iGEM 2017 offered a web tool for predicting the dynamic range of toehold switches with modest goodness of fit (unadjusted R 2 ~0.04-0.16) [34] . To address this lacuna, we constructed multiple models that utilize the generic grammar of the toehold switch discussed above. A dataset of the toehold switches and their dynamic range responses was formed by combining the studies of Green et al. [21] and Pardee et al. [22, 23] . A total of 228 toehold switch sequences with ON/OFF ratios were obtained, consisting of:

(1) first-generation switches with moderate dynamic range (168 instances) [21];
(2) forward-engineered switches with significantly higher dynamic range ratios (13 instances) [21];
(3) toehold switches engineered for detecting Zika virus (47 instances) [23].

The toehold sequence of each instance was parsed into its subdomains using our software. Sequence features were considered but rejected in favor of the more informative structure-based features [34] , which were computed using ViennaRNA utilities. These features included the overall switch minimum free energy (MFE), RBS-linker MFE, bottom-region MFE, and the net MFE of the second-generation switch-trigger complex (which was calculated as the difference in MFE between the final and initial states). The overall switch MFE determines the switch stability in the OFF state, the RBS-linker MFE is a thermodynamic proxy of the translation initiation rate, and the net MFE measures the energetic cost of the binding events taking place during toehold-mediated strand displacement.

In this manner, a dataset of engineered features and the corresponding toehold efficacies was compiled. Following an 80:20 train:test split, the features were normalized. Two regression techniques were used to learn the relationship of the toehold-switch dynamic range with the feature space. First, a multivariate linear regression model was built with the selected features on the training data, and evaluated on the test data for significance and goodness-of-fit, using Python scikit-learn (https://scikit-learn.org/). Second, a neural network with one hidden layer of 6 neurons, with L1 regularization and early stopping to control overfitting, was constructed based on the Keras framework [35].
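The split-then-normalize step can be sketched in plain Python as below. This is an illustration under assumptions rather than the authors' code: the helper names are hypothetical, z-score scaling is assumed as the normalization, and the scaling statistics are fitted on the training rows only (a common way to avoid leaking test information; the text does not say how it was done).

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle row indices reproducibly and split into train and test rows."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


def fit_scaler(train_rows):
    """Per-feature mean and standard deviation, computed on training rows only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return means, stds


def transform(rows, means, stds):
    """Apply z-score scaling with previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]
```

With 228 instances, an 80:20 split gives 182 training and 46 test rows; the same `means` and `stds` would then be reused to transform the test rows.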

The chemical and biochemical kinetics model was developed by adding the following features to the first generation toehold switch model developed by CLSB-UK iGEM 2017 [36] :

(i) interaction of the transcribed miR and antimiR oligos to form the miR-antimiR complex with two free ends, which then binds with the toehold switch;
(ii) maturation kinetics of the expressed GFP.

This generalized model can be presented as:

Here, k miR_antimiR_f and k miR_antimiR_b are the rate constants for forward and backward reactions, respectively.

The corresponding ODEs can be given by:

The kinetic parameters for the transcription, translation and maturation reactions in the genetic circuit were identified from literature [36, 37] and provided in Supplementary File S6. The kinetics for the miR-antimiR complex formation were modelled assuming a steady state with negligible backward flux, so that the equilibrium constant approximates the forward rate constant at equilibrium. From equation (1), the equilibrium constant for the complex formation is defined as:
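For a simple bimolecular association, which is an assumption made here for illustration and not a restatement of the original numbered equations, the complex-formation step, its mass-action rate law, and the resulting equilibrium constant take the standard form below. The symbols $k_f$ and $k_b$ stand for the forward and backward rate constants of miR-antimiR hybridisation, and $C$ denotes the miR-antimiR complex.

```latex
\mathrm{miR} + \mathrm{antimiR}
  \;\underset{k_b}{\overset{k_f}{\rightleftharpoons}}\; C

\frac{d[C]}{dt} = k_f\,[\mathrm{miR}]\,[\mathrm{antimiR}] - k_b\,[C]

K_{eq} = \frac{[C]_{eq}}{[\mathrm{miR}]_{eq}\,[\mathrm{antimiR}]_{eq}} = \frac{k_f}{k_b}
```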

The NUPACK Analysis tool was used to obtain the equilibrium concentrations of the miR, antimiR and their complex [38]. The chemical and biochemical kinetics were simulated using MATLAB R2017b (https://www.mathworks.com/).
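As a worked check of this step, the association constant can be computed directly from equilibrium concentrations. The sketch below assumes a simple 1:1 association and uses the equilibrium concentrations reported in this study for hsa-miR-21-5p (0.0264 nM free miRNA, 0.0264 nM free antimiR, 99.97 nM complex); the function name is hypothetical.

```python
def association_constant(free_mir_nM, free_antimir_nM, complex_nM):
    """Equilibrium association constant K = [complex] / ([miR][antimiR]), in nM^-1."""
    return complex_nM / (free_mir_nM * free_antimir_nM)


# Equilibrium concentrations reported for hsa-miR-21-5p with its antimiR.
K = association_constant(0.0264, 0.0264, 99.97)
print(f"K = {K:.3e} nM^-1")
```

The result is on the order of 10^5 nM^−1, matching the order of magnitude quoted for the hybridisation constant.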

The top 100 miRNAs of the linear model were screened for log-fold change with respect to the control samples, stage-specificity, and significance of the trend, and finally corroborated with the literature evidence. This analysis yielded four DE miRNAs (Table 2) [12, 39]. It could be seen from the heat map and violin plots that hsa-miR-21-5p is the most significantly differentially expressed miRNA (Fig. 3). The detailed results of the differential expression analysis and linear modelling are provided in Supplementary Files S1 and S2, respectively. It is interesting to note that all four miRNAs are upregulated across all stages of cervical cancer, suggesting a possible role as oncomiRs and use in molecular diagnostics (Fig. S4, Supplementary File S3).
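The BH adjustment applied to the p-values in the differential expression screen can be sketched in a few lines. This is a generic implementation of the Benjamini-Hochberg step-up procedure, not the limma code path, and the p-values in the example are arbitrary.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Arbitrary example p-values.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```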

The differential expression profiling of miRNAs associated with early-stage cervical cancer identified potential biomarkers, which were further investigated using survival analysis with the Cox proportional hazards model and Kaplan-Meier curves. The biomarkers were subjected to univariate survival analysis and the model significance was estimated using the log-rank test [32] . Fig. 4 shows the K-M plots of the univariate survival analysis for the four differentially expressed miRNAs. It could be seen that hsa-miR-29a-3p was insignificant (p-value > 0.05), and was consequently dropped from the hazard ratio estimation (Table 3) .

Next, we constructed a survival risk-score using multivariate Cox regression model [19] , to obtain an equation for the survival risk score (SRS).

Finally, a biomarker panel of hsa-miR-20a-5p, hsa-miR-21-5p and hsa-miR-200a-5p was constructed, classifying patients into high-risk and low-risk groups using an optimal cut point identified with the maxstat statistic of the survminer R package. Overall survival curves were generated using the K-M method, and two-sided log-rank tests were used to compare the differences in overall survival between the risk groups. The K-M analysis shown in Fig. 5(a) indicated that the biomarker panel with three miRNAs was significant (p-value < 0.001). Considering the significance attributed to hsa-miR-21-5p in the literature [12, 39–44], we then evaluated a biomarker panel of just the two miRNAs, hsa-miR-20a-5p and hsa-miR-21-5p. The K-M curve shown in Fig. 5(b) indicated that this panel was also significant (p-value < 0.001). However, a multivariate model of hsa-miR-21-5p and hsa-miR-200a-5p was not significant (p-value ~ 0.1). The univariate model shown in Fig. 4(b) and the form of eqn (5) both suggest that the overexpression of miR-200a-5p might have a protective effect against cervical cancer, and its expression might be a response to the cancer progression, making it inefficacious as a marker of cervical cancer. From the above results on differential expression and prognosis, we proposed to consider the two-marker panel of hsa-miR-20a-5p and hsa-miR-21-5p as the signature of early-stage cervical cancer.
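The mechanics of a Cox-based risk score and the split into risk groups can be sketched as below. The coefficients, expression values and cut point are placeholders chosen only to make the arithmetic visible; they are not the fitted values from this study, and the function names are hypothetical.

```python
def survival_risk_score(expression, coefficients):
    """Linear predictor of a Cox model: sum of coefficient x expression per miRNA."""
    return sum(beta * expression[mirna] for mirna, beta in coefficients.items())


def risk_group(score, cutpoint):
    """Assign a patient to the high-risk or low-risk group at a given cut point."""
    return "high-risk" if score > cutpoint else "low-risk"


# Placeholder values for illustration only; not taken from the study.
coefficients = {"hsa-miR-20a-5p": 0.5, "hsa-miR-21-5p": 0.25}
patient = {"hsa-miR-20a-5p": 8.0, "hsa-miR-21-5p": 12.0}
score = survival_risk_score(patient, coefficients)  # 0.5*8 + 0.25*12
group = risk_group(score, cutpoint=6.5)
```

A positive coefficient corresponds to a hazard ratio above 1, since the hazard ratio is the exponential of the coefficient.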

We sought to validate this identified biomarker panel with an external miRNA dataset. Towards this, we used the GEO dataset, GSE30656, of 10 normal and 37 cervical cancer patients. The dataset was subjected to stage-annotated linear modelling as in the previous case (using the protocol described in Ref. [30]), and yielded a confirmation that hsa-miR-21-5p is the top-ranked differentially expressed (upregulated) miRNA in this dataset as well (p-value < 1E-4). The details of this analysis are provided in Supplementary File S4. There is extensive literature support for hsa-miR-20a-5p as an upregulated early-stage serum biomarker of cervical cancer progression [40, 41, 45–51]. To investigate the pathways activated by the identified biomarkers, we first identified their regulatory targets, and then performed an enrichment analysis of the target genes using miRabel [52] and DAVID [53], ascertaining the involvement of oncogenic pathways (Supplementary File S5). In particular, the KEGG enrichment analysis of the consensus target genes of both the biomarkers yielded significance for the MAPK signaling pathway. The two differentially expressed up-regulated miRNAs with prognostic significance constituted a reliable biomarker panel.

Two toehold switches were designed to target the biomarkers, hsa-miR-21-5p and hsa-miR-20a-5p. Table 4 shows the domain-by-domain decomposition of each of these toehold switches. The folding of the transcribed RNA of the designed sequences was predicted using ViennaRNA [32], and the predicted secondary structure was consistent with the toehold conformation (Fig. 6(a, b)). The minimum free energies of the hsa-miR-21-5p toehold switch and the hsa-miR-20a-5p toehold switch were estimated to be −21.20 kcal/mol and −19.20 kcal/mol respectively, suggesting a stable toehold conformation ready for interaction with the cognate hybrid miR-antimiR trigger.

In addition to the secondary structure and the minimum free energy, the dynamic range of the designed toehold switches is a key parameter determining the effectiveness of the system.

on the training set, with an adj. R 2 ~ 0.59. This provided a baseline to compare the performance of the neural network regressor. The detailed architecture and hyperparameters used in the neural network could be found in Supplementary File S6. The neural network architecture was trained with different random seeds, yielding a series of models with variable performance. Of these, we chose the top twelve models that had adj. R 2 > 0.50 on the test set. The best of these models achieved an adj. R 2 ~ 0.71. The performance of this model is shown in Fig. 7, illustrating a steady improvement in the test set metrics with the training epochs. A 10-fold cross-validation with this best-performing model yielded an adj. R 2 ~ 0.49. The small-sample correction to the R 2 score grows as the evaluation set shrinks, so it tends to quickly diminish the cross-validated goodness-of-fit during k-fold cross-validation. The script to build and train the neural network model is provided as a Jupyter notebook (https://github.com/SASTRA-iGEM2019/). All the top models are deposited in the project GitHub repository, along with the usage notes for all the software tools developed in this work. We then applied the top twelve neural network models to predict the efficacies of the toehold switches designed for the two identified miRNA biomarkers. For the two switches, we obtained a predicted dynamic range ~ [88.34 ± 16.39, 92.86 ± 18.74] respectively, which indicated potentially robust design of the biosensors. This part of our workflow has been bundled into a single bash script that returns the predicted efficacy for any input toehold-switch sequence, and would be useful in toehold-based synthetic biology.
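The effect of the small-sample correction can be made concrete with the standard adjusted R 2 formula. The sketch below assumes four predictors (the four thermodynamic features named in the Methods; the exact feature count used in the final model is an assumption) and compares a 46-sample test set with a roughly 23-sample fold, as would arise in 10-fold cross-validation of 228 instances.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need more samples than features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Same raw R^2, evaluated on sets of different size (illustrative values).
on_test_set = adjusted_r2(0.75, n_samples=46, n_features=4)
on_cv_fold = adjusted_r2(0.75, n_samples=23, n_features=4)
```

For the same raw R 2, the smaller fold is penalized more heavily, which is one reason a cross-validated adjusted R 2 can fall well below the single-split value.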

We employed reaction network mass action kinetics to computationally evaluate the performance of the designed second-generation toehold switches.

The miRNA biomarkers induced a stop codon in the design of first-generation toehold switches, which could be circumvented by the use of antimiR-based second-generation toehold switches. The second-generation toehold switches were modelled by adding a reaction upstream of the miRNA-closed toehold switch binding, where the miRNA hybridizes with the antimiR. NUPACK [38] was used to estimate the equilibrium concentrations of 100 nM hsa-miR-21-5p miRNA and 100 nM antimiR hybridizing at 18 °C to form the trigger complex (yielding 0.0264 nM, 0.0264 nM and 99.97 nM, respectively). This yielded a hybridisation rate constant k miR_antimiR_f on the order of 10^5 nM^−1 s^−1 (Eqn (4)). The above analysis was repeated with hsa-miR-20a-5p, and the details could be found in Supplementary File S7, Sec. S2. Following the binding of the complex to the toehold switch, the switch would open, enabling translation of the downstream GFP. Since GFP fluorescence is associated with a lag between production and emission, a GFP maturation reaction was considered with a maturation factor ~0.2 min^−1. A scaling factor of 79.429 was used to obtain the readout in fluorescence units, after iGEM Valencia_UPV 2018 [54]. The concentration profiles obtained from the model for the first toehold switch (hsa-miR-21-5p) are shown in Fig. 8(a). The trends of declining miRNA and antimiR concentrations, and increasing complex concentration are evident. The complex formation occurred with exponential kinetics, and both the complex and the miRNA/antimiR attained saturation well before t = 500 s. It is clear from Fig. 8(b) that the closed toehold switch (CTS) and the complex rapidly hybridized to open the CTS, with a significant drop in their concentrations concomitant with a rise in the concentration of the open toehold switch (OTS). The increase in OTS concentration permitted the translation of downstream GFP followed by the slow maturation kinetics of fluorescence. At about 500 s, the complex was completely consumed, and OTS began to decay, while unhindered transcription allowed the CTS to continue to increase. Similar kinetics were observed in the case of the second toehold switch (hsa-miR-20a-5p), with the smaller hybridisation rate constant slowing all intermediate reactions up to the final fluorescence readout (Supplementary File S7).

Table 2. DE miRNA biomarkers of cervical cancer. LogFC (log-fold change) and adj. p-values with respect to the control as estimated by the linear model are shown. The stage-specificity (individual log-fold changes and overall significance of contrast analysis) of the miRNA is computed using the protocol described in Ref. [30].

Table 3. Hazard ratio (HR), Z-score and p-value for the chosen miRNA biomarkers. HR > 1 indicates a covariate that is positively associated with the event probability (here, negatively with survival). The Z-score is an estimate of the significance of the hazard ratio under the assumption of standard normal distribution of the log-rank test statistic.

Fig. 7. Epoch tuning curves for the top-performing neural network model for the loss function and the adjusted R 2. The adj. R 2 was coded as a custom metric in the model definition. The validation loss is better than the training loss, an observation that can be traced to the use of L1-regularization in the model [35]. Initial negative values for adj. R 2 indicate that the fit is worse than a 'horizontal line' in the learning manifold, but the model progressively trains to achieve a better fit. A steady convergence beyond epoch ~50 is visible.
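The lag introduced by GFP maturation can be gauged with a short calculation. The sketch below assumes that maturation is a first-order process with the rate of 0.2 min^−1 quoted above, and that the scaling factor of 79.429 multiplies the mature GFP concentration to give fluorescence units; both are modelling assumptions made here for illustration, and the function names are hypothetical.

```python
import math

MATURATION_RATE_PER_MIN = 0.2   # maturation factor quoted in the text
FLUORESCENCE_SCALE = 79.429     # scaling factor quoted in the text


def mature_fraction(t_min, rate=MATURATION_RATE_PER_MIN):
    """Fraction of a GFP pool that has matured after t_min minutes (first-order)."""
    return 1.0 - math.exp(-rate * t_min)


def maturation_half_time(rate=MATURATION_RATE_PER_MIN):
    """Time in minutes for half of the GFP pool to mature."""
    return math.log(2) / rate


def fluorescence(mature_gfp_conc, scale=FLUORESCENCE_SCALE):
    """Convert mature GFP concentration to fluorescence units (assumed linear)."""
    return scale * mature_gfp_conc


half_time = maturation_half_time()
```

Under these assumptions the maturation half-time is a few minutes, short relative to the 2-h and 4-h simulation windows.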

The concentration profiles for both the miRNA biomarkers informed the range of complex concentration (viz. 100 pM to 10 μM) for modelling the fluorescence intensity emission. The plot of intensity against the complex concentration over a 2-h interval yielded a sigmoidal fit with a steep rise between 10 nM and 100 nM (Section S4, Supplementary File S7). This showed that the model is sensitive to changes in the concentration of the trigger miRNA biomarker in the range of interest, satisfying a necessary biosensor property [25, 55–60]. A similar trend of sigmoidal kinetics was observed for the hsa-miR-20a-5p biomarker over a complex concentration range of 100 nM to 1 μM (Supplementary File S7). This suggested that the slower kinetics in the case of hsa-miR-20a-5p might require a higher biomarker concentration for detection. We repeated the calculations with a 4-h extended time interval, and found that the increased reaction time did not alter the sigmoidal kinetics, only yielding higher fluorescence intensity than the 2-h interval case (R 2 = 0.99) (Fig. 9). These theoretical calculations require experimental validation prior to clinical translation. Table 5 shows the concentration trajectories of various species and the predicted fluorescence intensity readout for different miRNA-antimiR complex concentrations over a 4-h simulation.

Towards the development of complete sensor circuits, medium-sized DNA constructs known as gBlocks (Integrated DNA Technologies; www.idtdna.com), housing one or more genes and conforming to the specifications in the Registry of Biological Parts [61], could be developed. In this direction, three independent gBlocks were designed using the standard BioBricks (https://biobricks.org), comprising the constitutive E. coli T7 promoter, a 5′ prefix sequence with EcoRI and XbaI recognition sites, and a 3′ suffix sequence with SpeI and PstI recognition sites, with the GFP-mut3b reporter gene. DNA sequences of the designed toehold switches and genetic circuits are given in Supplementary File S8, while the gBlocks schematic, antimiR design and BioBricks submissions are noted in Supplementary File S9. A limitation of the present study is that experimental validation with these synthetic circuits would be necessary before deployment in the clinic.

Neural networks have yielded useful applications for problems related to riboswitches [62]. Deep learning has been used for predicting toehold switch efficacy, with raw R 2 ~ 0.43-0.70 [63, 64]. These models are trained on a dataset of fused cis-triggers, significantly constraining their application to second-generation toehold switches. They require fixed-length inputs and do not lend themselves to ready interpretation. Another application, MoiRNAifold, is proprietary software that uses constraint-based linear programming to solve the inverse problem [65], and does not predict the efficacy of toehold switch structures.

In comparison, our work is an all-in-one package providing a lightweight solution to the problem of computational toehold design, efficacy prediction, and circuit modelling. Our tool is (i) agnostic of input sequence length, (ii) trained on free triggers, and (iii) capable of predicting in batch mode (any number of sequences in one go). Though the mechanistic understanding of RNA structure is yet incomplete, modelling attempts with interpretable features are a necessary step towards this goal. Further, miRNAs are an emerging class of biomarkers with desirable serum profiles, and typically induce a rogue stop codon in toehold switches immediately downstream of the start codon of the gene to be expressed. In this context, second-generation toehold switches are the viable solution, and it is hoped that our approaches would yield effective diagnostics for miRNA biomarkers of disease conditions. Any input sequence conforming to the generic toehold grammar will be processed by our rational pipeline thus:

(i) call to ViennaRNA RNAfold to parse the input sequence into its dot-bracket representation;
(ii) call to GrammarParser to extract the segments of the toehold switch based on the dot-bracket representation;
(iii) call to more ViennaRNA RNAfold utilities to obtain engineered feature values for use in the regression/neural network model;
(iv) passing of feature values as arguments to the prediction script which yields the predicted dynamic range of the toehold switch sequence(s).

Geraldi et al. have highlighted the importance of synthetic biology in developing portable in vitro diagnostic kits [66]. Toeholds embedded in genetic circuits enable the measurement of the intensity of expression of a downstream reporter protein (e.g., GFP) that could yield a precise quantification of the trigger itself. Pardee has reviewed the potential of freeze-dried cell-free (FD-CF) systems in health care for diagnostic and sensing purposes, given their biosafe mode [67], and the promising opportunity for the rational design and manipulation of biological systems in relation to cell-based systems has also been reviewed [68]. Fabricating microfluidic devices with our designed genetic circuits in a cell-free system followed by rigorous testing and validation would yield a point-of-care portable diagnostic tool for the real-time detection of early-stage cervical cancer.

Our workflow is modelled on the DBTL approach, which is analogous to the Design, Construct, Evaluate, Optimize (DCEO) approach in biotechnology [69], and Fig. 10 captures the methodology we have adopted here, in two parts. The end-to-end computational approach employed in this study is illustrated in Fig. 10(a). It includes four broad phases: identification of biomarker signatures, design of the biosensory toehold switch element, design of the synthetic genetic circuit comprising modular constructs, and finally validation of a proof-of-concept of the designed circuit using predictive modelling. Candidate toehold switches could be ranked for efficacy, and the switch predicted to be most efficacious could be used to optimize experimental workflows.

This approach could be generalized to the design of portable, highly sensitive and specific biosensors for any condition, especially infectious diseases. The ongoing SARS-CoV-2 pandemic has highlighted the need for constant vigilance against outbreaks of infection. Such emerging agents require accurate, rapid, and scalable detection platforms, all requirements for which toehold biosensors are well suited. Fig. 10(b) illustrates a scheme to produce such a toehold biosensor. Here the biomarker would be an optimal genomic fragment signature, unique to the pathogen but absent in the host. The toehold construct senses the presence of the infectious agent, which then releases expression of the reporter gene, yielding a simple real-time visual readout. The proposed technique would generate a simple, reliable, rapid, portable and affordable printed diagnostic tool that could be made available globally in point-of-care settings outside clinical diagnostic laboratories. This biosafe product would be an asset in the field, especially in remote locations where it could help curtail transmission and limit the reproductive ratio of the infectious agent. Such a product would be vital in low-resource settings since it is sterile, abiotic and active for a year at room temperature. Extension to more than two biomarkers would allow for the simultaneous detection of, say, a family of coronaviruses, an imperative in our times. Thus, end-to-end scalable science is expected to overcome the limitations of ad hoc diagnostic devices, thereby enabling affordable diagnosis in pandemic situations and other health emergencies.
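The ranking of candidate toehold switches mentioned above can be illustrated with a minimal sketch. It assumes each candidate already carries a model-predicted dynamic range (ON/OFF ratio). The `Candidate` class, the switch names and the numeric values are hypothetical placeholders for illustration, not outputs of the study or of its neural network model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate toehold switch (hypothetical container for illustration)."""
    name: str                # hypothetical identifier
    predicted_on_off: float  # model-predicted dynamic range (ON/OFF ratio)


def rank_candidates(candidates):
    """Return candidates sorted from highest to lowest predicted dynamic range."""
    return sorted(candidates, key=lambda c: c.predicted_on_off, reverse=True)


# Placeholder values only; these are NOT results from the study.
candidates = [
    Candidate("switch_A", 12.5),
    Candidate("switch_B", 48.0),
    Candidate("switch_C", 30.2),
]

ranked = rank_candidates(candidates)
best = ranked[0]  # the candidate one might prioritize for experimental testing
for c in ranked:
    print(f"{c.name}: predicted ON/OFF = {c.predicted_on_off}")
```

In practice the predicted values would come from whichever dynamic-range model is in use, and further criteria (for example specificity to the trigger) could be added to the sort key.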

Cervical cancer is a major public health issue with a significant global burden of disease, but it is also an addressable one with the design of better molecular diagnostics. Early detection is necessary for effective treatment and greater compliance. In this work, three DBTL iterations were used to generate an in silico workflow for biomarker detection, sensor design and systems dynamics modelling. In the first cycle, we identified miRNAs that were significantly differentially expressed in early-stage cervical cancer using TCGA data, and narrowed these down to two miRNA biomarkers based on prognostic significance. In the second cycle, we designed second-generation toehold switches and antimiRs for the two miRNAs. In the course of this cycle, we developed a machine learning model of toehold-switch dynamic range that yielded an adj. R² of ~0.71, a marked improvement over earlier efforts in this direction. Finally, we simulated the cell-free protein synthesis reactions to study the emergent kinetics. The results indicated that the fluorescence intensity underwent a sigmoidal transformation between 10 nM and 100 nM before reaching saturation. In summary, we have developed an end-to-end reproducible computational workflow for the design of RNA biosensor devices for given conditions. Subsequent experimental validation would essentially democratize detection of emerging infectious diseases and life-threatening conditions with a readily transportable instrument based on a real-time visual readout.

We are grateful to SASTRA Deemed University, especially the School of Chemical and Biotechnology and CeNTAB, for the generous financial and infrastructural support to the iGEM 2019 project, from which this work originated (https://2019.igem.org/Team:SASTRA_Thanjavur). We would like to thank the anonymous reviewers for helping improve an earlier version of the manuscript. We would like to thank Ramit Bharanikumar and S. Prasanna Kumar for technical assistance. A.P. would like to acknowledge support from DST-SERB EMR/2017/000470.

Human papillomavirus and cervical cancer

Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries

Estimates of the global burden of cervical cancer associated with HIV

Health technology assessment of strategies for cervical cancer screening in India

Pathology of cervical cancer

Perceptions of barriers and facilitators of cancer early detection among low-income minority women in community health centers

The evaluation of loop-mediated isothermal amplification-quartz crystal microbalance (LAMP-QCM) biosensor as a real-time measurement of HPV16 DNA

Assessment of a new low-cost, PCR-based strategy for high-risk human papillomavirus DNA detection for cervical cancer prevention

Rapid detection of human papilloma virus using a novel leaky surface acoustic wave peptide nucleic acid biosensor

A microarray platform for detecting disease-specific circulating miRNA in human serum

MicroRNAs as markers of progression in cervical cancer: a systematic review

Blood circulating miRNAs as cancer biomarkers for diagnosis and surgical treatment response

Circulating MicroRNAs in cancer: potential and challenge

Fold change rank ordering statistics: a new method for detecting differentially expressed genes

Gene expression profiling predicts clinical outcome of breast cancer

Synthetic biology devices for in vitro and in vivo diagnostics

Bioelectronic DNA detection of human papillomaviruses using eSensor™: a model system for detection of multiple pathogens

Visual and modular detection of pathogen nucleic acids with enzyme-DNA molecular complexes

Fluorescent probe-based lateral flow assay for multiplex nucleic acid detection

Toehold switches: de-novo-designed regulators of gene expression

Paper-based synthetic gene networks

Rapid, low-cost detection of Zika virus using programmable biomolecular components

Synthetic biology provides a toehold in the fight against Zika

A low-cost paper-based synthetic biology platform for analyzing gut microbiota and host biomarkers

The Cancer Genome Atlas pan-cancer analysis project

Analysis-ready standardized TCGA data from broad GDAC firehose 2016_01_28 run

Limma powers differential expression analyses for RNA-sequencing and microarray studies

Controlling the false discovery rate: a practical and powerful approach to multiple testing

Novel significant stage-specific differentially expressed genes in hepatocellular carcinoma

A package for survival analysis in S. R package version. Survival (Lond)

ViennaRNA package 2.0

Complex cellular logic computation using ribocomputing devices

A comprehensive web tool for toehold switch design

Deep learning with Python

Experiment and mathematical modeling of gene expression dynamics in a cell-free system

NUPACK: analysis and design of nucleic acid systems

The role of miRNAs in diagnosis, prognosis and treatment prediction in cervical cancer

miR-20a promotes migration and invasion by regulating TNKS2 in human cervical cancer cells

MiR-20a promotes cervical cancer proliferation and metastasis in vitro and in vivo

Deregulation of miR-21 and miR-29a in cervical cancer related to HPV infection

MicroRNA-21 promotes proliferation, migration, and invasion of cervical cancer through targeting TIMP3

Orthotopic xenograft mouse model of cervical cancer for studying the role of MicroRNA-21 in promoting lymph node metastasis

Identification of a serum three-microRNA signature for cervical cancer diagnosis

Aberrant microRNA expression in human cervical carcinomas

A circulating serum miRNA panel as early detection biomarkers of cervical intraepithelial neoplasia

Circulating miRNA-20a and miRNA-203 for screening lymph node metastasis in early stage cervical cancer

Serum microRNA expression levels can predict lymph node metastasis in patients with early-stage cervical squamous cell carcinoma

Oncogenic microRNA signature for early diagnosis of cervical intraepithelial neoplasia and cancer

The role of miRNAs in the invasion and metastasis of cervical cancer

Improving bioinformatics prediction of microRNA targets by ranks aggregation

Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources

In Vitro use of cellular synthetic machinery for biosensing applications

DNA switch: toehold-mediated DNA isothermal amplification for dengue serotyping

Signal amplification and optimization of riboswitch-based hybrid inputs by modular and titratable toehold switches

Cell-free characterization of coherent feed-forward loop-based synthetic genetic circuits

Low-cost detection of norovirus using paper-based cell-free systems and synbody-based viral enrichment

Homogeneous and universal detection of various targets with a dual-step transduced toehold switch sensor

International Genetically Engineered Machines

Riboflow: using deep learning to classify riboswitches with ~99% accuracy

A deep learning approach to programmable RNA switches

Sequence-to-function deep learning frameworks for engineered riboregulators

MoiRNAiFold: a novel tool for complex in silico RNA design

Synthetic biology-based portable in vitro diagnostic platforms

Perspective: solidifying the impact of cell-free synthetic biology through lyophilization

Synthetic biology goes cell-free

DCEO biotechnology: tools to design, construct, evaluate, and optimize the metabolic pathway for biosynthesis of chemicals

We are grateful to SASTRA Deemed University, especially the School of Chemical and Biotechnology and CeNTAB, for the generous financial 

Supplementary data to this article can be found online at https://doi.org/10.1016/j.synbio.2022.03.008.