key: cord-0858360-u0e9qfzd authors: Ghosh, Nimisha; Saha, Indrajit; Sharma, Nikhil title: Palindromic Target Site Identification in SARS-CoV-2, MERS-CoV and SARS-CoV-1 by Adopting CRISPR-Cas Technique date: 2022-01-06 journal: Gene DOI: 10.1016/j.gene.2021.146136 sha: 8074f8efe183bddfa5651477c2d48f602d54289f doc_id: 858360 cord_uid: u0e9qfzd Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) associated Cas protein (CRISPR-Cas) has turned out to be a very important tool for the rapid detection of viruses. This can be used for the identification of the target site in a virus by identifying a 3-6 nt length Protospacer Adjacent Motif (PAM) adjacent to the potential target site, thus motivating us to adopt CRISPR-Cas technique to identify SARS-CoV-2 as well as other members of Coronaviridae family. In this regard, we have developed a fast and effective method using k-mer technique in order to identify the PAM by scanning the whole genome of the respective virus. Subsequently, palindromic sequences adjacent to the PAM locations are identified as the potential target sites. Palindromes are considered in this work as they are known to identify viruses. Once all the palindrome-PAM combinations are identified, PAMs specific for the RNA-guided DNA Cas9/Cas12 endonuclease are identified to bind and cut the target sites. In this regard, PAMs such as 5’-TGG-3’ and 5’-TTTA-3’ in NSP3 and Exon for SARS-CoV-2, 5’-GGG-3’ and 5’-TGG-3’ in Exon and NSP2 for MERS-CoV and 5’-AGG-3’ and 5’-TTTG-3’ in Helicase and NSP3 respectively for SARS-CoV-1 are identified corresponding to SpCas9 and FnCas12a endonucleases. Finally, to recognise the target sites of Coronaviridae family as cleaved by SpCas9 and FnCas12a, complements of the palindromic target regions are designed as primers or guide RNA (gRNA). Therefore, such complementary gRNAs along with respective Cas proteins can be considered in assays for the identification of SARS-CoV-2, MERS-CoV and SARS-CoV-1. 
1) Adopting CRISPR-Cas technology for target site identification. 2) Using k-mer technique to find PAM. 3) Using PAM to identify corresponding palindromes in reference sequences of viruses. 4) Designing primers complementary to palindromes for virus identification. 5) Finding population coverage of palindrome-PAM combinations for the virus sequences.
COVID-19, the disease caused by Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), has affected people around the globe and had claimed more than 5.3 million lives as of 15th December 2021. SARS-CoV-2 belongs to the Coronaviridae family, which also includes the MERS-CoV and SARS-CoV-1 viruses (Zhou et al., 2020). The symptoms of COVID-19 include cough, fever, dyspnoea, diarrhoea and myalgia (Hosseini et al., 2020), and in extreme cases the disease may lead to severe respiratory distress and eventual death. Moreover, the incidence of comorbidity in COVID-19 is relatively high, and the disease targets organs such as the kidney, liver, heart and brain (Dey et al., 2020; Qi et al., 2020). Since its spread, symptom-based diagnosis of COVID-19 has been performed using chest X-ray and CT scans, quantitative reverse transcription polymerase chain reaction (qRT-PCR) and antibody tests. Most recently, another rapid detection method based on CRISPR-Cas has been proposed (Broughton et al., 2020b). Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) associated Cas protein (CRISPR-Cas) is an adaptive immune system in prokaryotic organisms that provides resistance to foreign genetic elements. This system has been exploited in recent times as a powerful gene editing tool and for the diagnosis and inactivation of viruses (Jia et al., 2020). The efficiency of the CRISPR-Cas technique depends on the design of the guide RNA (gRNA).
The gRNA guides the Cas protein to the intended DNA site, where the Cas protein creates a double-strand break; repair of this break leads to different DNA sequence modifications (Rahman et al., 2021). The study by Bhat et al. (2020) provides an overview of using the CRISPR-Cas system for editing plant genomes. The study also covers the approaches, procedural programs and applications of plant genome editing for improving crop yield, herbicide tolerance, and resistance against emerging pathogens and abiotic stresses. Diagnostic platforms built on CRISPR-Cas systems include DNA endonuclease-targeted CRISPR trans reporter or DETECTR (Chen et al., 2018), Cas13-assisted restriction of viral expression and readout or CARVER (Freije et al., 2019), 1-h low-cost multipurpose highly efficient system or HOLMES (Li et al., 2019) and specific high sensitivity enzymatic reporter unlocking or SHERLOCK (Gootenberg et al., 2017). Lyu et al. (2020) have highlighted the potential of CRISPR platforms as a tool for diagnosing tuberculosis in children. They have recommended further studies to evaluate the performance of CRISPR in non-invasive specimens collected from children. In (Kayesh et al., 2020) From all the aforementioned works, it can be said that CRISPR-Cas is a well-established system for the rapid detection of viruses and can therefore be employed for the detection of SARS-CoV-2, MERS-CoV and SARS-CoV-1 as well. In this regard, Zhang et al. Taking cues from these recent works, we have adopted the concept of the CRISPR-Cas system to identify the target sites for the identification of SARS-CoV-2 and the other viruses of the Coronaviridae family considered here, that is, MERS-CoV and SARS-CoV-1. To this end, identification of the protospacer adjacent motif (PAM) is carried out in this work. A PAM is a short DNA sequence, usually about 3-6 nt long, that lies adjacent to the sequence targeted by the CRISPR-Cas system.
The genomic locations that are potential target sites for the identification of viruses are limited by the presence and locations of the PAMs. Thus, in order to find the target sites for the identification of SARS-CoV-2, MERS-CoV and SARS-CoV-1, the PAMs and their corresponding genomic locations are identified first. Once the PAMs are identified, instead of finding short palindromic repeats as required by CRISPR-Cas, we have modified the idea to consider palindromic sequences adjacent to the PAMs as the target sites for virus identification. Thereafter, to bind and cut the target sites, specific PAMs are identified for the RNA-guided DNA Cas9/Cas12 endonuclease. In this regard, PAMs such as 5'-TGG-3' and 5'-TTTA-3' in NSP3 and Exon for SARS-CoV-2, 5'-GGG-3' and 5'-TGG-3' in Exon and NSP2 for MERS-CoV and 5'-AGG-3' and 5'-TTTG-3' in Helicase and NSP3 respectively for SARS-CoV-1 are identified corresponding to SpCas9 and FnCas12a. It is worth mentioning that earlier studies (Cain et al., 2001; Chew et al., 2004; Dirac et al., 2002) have suggested that palindromes can be involved in target identification, viral packaging and defence mechanisms. A palindromic sequence is one that equals its own reverse complement: when read in the reverse direction, it is the exact complement of itself. For example, TGCA is a palindrome of length 4. Note that such a palindrome is always even in length, because no base is its own complement. Thereafter, to recognise these target sites in a virus genome as cleaved by SpCas9 and FnCas12a, primers are designed to be complementary to the target site sequences. Thus, these complementary palindromic primers can be considered in assays for the rapid identification of SARS-CoV-2, MERS-CoV and SARS-CoV-1. These primers are akin to guide RNA (gRNA) in CRISPR-Cas technology.
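The palindrome definition and the complementary-primer idea can be illustrated with a minimal Python sketch. The function names (`complement`, `reverse_complement`, `is_palindrome`) are hypothetical and chosen only for illustration; this is not the authors' implementation, and it assumes sequences written in the DNA alphabet A, C, G, T.

```python
# Illustrative helpers only; names are hypothetical, not taken from the paper.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-wise complement, written in the same direction as the input."""
    return seq.upper().translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complement of the sequence, read in the reverse direction."""
    return complement(seq)[::-1]


def is_palindrome(seq: str) -> bool:
    """True when the sequence equals its own reverse complement."""
    seq = seq.upper()
    # Assumption: only unambiguous bases are accepted (no N or IUPAC codes).
    if not seq or not set(seq) <= set("ACGT"):
        return False
    return seq == reverse_complement(seq)


if __name__ == "__main__":
    print(is_palindrome("TGCA"))   # the length-4 example from the text
    print(is_palindrome("TGA"))    # an odd-length sequence
    print(complement("TGCA"))      # candidate complementary primer
```

Because a palindrome is its own reverse complement, its base-wise complement is simply the same sequence read backwards. A guide RNA would additionally use U in place of T.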
To find PAM and the corresponding palindromic sequences, initially the three reference genomic sequences of SARS- The results for the total number of PAM in the reference genomic sequence of each of SARS-CoV-2, MERS-CoV and SARS-CoV-1 are shown in Figure 1 Table 1 also reports the corresponding GC content of the palindromes. According to Haeussler et al. (2016) and Reynolds et al. (2004), it is difficult to target GC-rich genes, and thus sequences with moderate GC content are good candidates for being target sites of a virus. As can be observed from Table 1 Please note that all the palindrome-PAM combinations are unique to each virus, which supports their use for virus identification. Furthermore, we have also checked for the combinations from Table 2 in the reference sequences of the Ebola, Dengue, Influenza and Zika viruses. Nucleotide BLAST is also used to check their specificity, and it has been observed that such palindrome-PAM combinations are not present in any of the other viruses. In addition to the aforementioned results, all the palindrome-PAM combinations are provided in the supplementary material as an Excel file. It is to be further noted that although this work specifically focuses on PAMs recognised by the Cas9 or Cas12 endonuclease, we have reported other palindrome-PAM combinations as well, in the hope that if any new endonuclease is engineered, our work can serve as a basis for further virus identification. This study adopts the idea of CRISPR-Cas technology in order to identify palindrome-PAM combinations as target sites for virus identification. To achieve this, PAMs are first identified using the k-mer technique. Thereafter, palindromic sequences adjacent to the PAM locations are identified as the potential target sites. Next, PAMs specific for the RNA-guided DNA Cas9/Cas12 endonuclease to bind and cut the target sites are detected.
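The steps summarised above (a k-mer scan for a PAM, a check for palindromes adjacent to each PAM location, and the GC content of each palindrome) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the choice to test one fixed palindrome length, and the decision to examine both the upstream and the downstream window of every PAM are all hypothetical. Which side is biologically relevant depends on the endonuclease, and the excerpt does not state the convention used.

```python
from typing import Iterator, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_palindrome(seq: str) -> bool:
    """True when seq (A/C/G/T only) equals its own reverse complement."""
    if not seq or not set(seq) <= set("ACGT"):
        return False
    return seq == seq.translate(_COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a non-empty sequence."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def kmers(genome: str, k: int) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, k-mer) for every window of length k."""
    for i in range(len(genome) - k + 1):
        yield i, genome[i:i + k]


def palindrome_pam_hits(
    genome: str, pam: str, pal_len: int
) -> List[Tuple[int, str, str, float]]:
    """Return (pam_start, side, palindrome, GC%) for each palindrome of
    length pal_len that directly touches an occurrence of pam.

    Assumption: both flanks are reported; the caller keeps the side that
    matches the endonuclease of interest.
    """
    genome, pam = genome.upper(), pam.upper()
    hits = []
    for start, kmer in kmers(genome, len(pam)):
        if kmer != pam:
            continue
        end = start + len(pam)
        flanks = []
        if start - pal_len >= 0:
            flanks.append(("upstream", genome[start - pal_len:start]))
        if end + pal_len <= len(genome):
            flanks.append(("downstream", genome[end:end + pal_len]))
        for side, seq in flanks:
            if is_palindrome(seq):
                hits.append((start, side, seq, gc_content(seq)))
    return hits


if __name__ == "__main__":
    toy_genome = "AAGAATTCTGGCC"  # made-up sequence, not a viral genome
    for hit in palindrome_pam_hits(toy_genome, "TGG", 6):
        print(hit)
```

In practice the scan would be repeated for each PAM of interest and for a range of even palindrome lengths, and the GC percentage could then be used to retain candidates with moderate GC content.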
In this regard, corresponding to SpCas9 and FnCas12a endonuclease, PAMs such as 5'-TGG-3' and 5'-TTTA-3' in NSP3 and
The ethical approval or individual consent was not applicable.
All the SARS-CoV-2, MERS-CoV and SARS-CoV-1 virus genomes with their corresponding reference sequences and the final results of this work are available at "http://www.nitttrkol.ac.in/indrajit/projects/COVID-CRISPR-Cas/".
Not applicable.
This work has been partially supported by CRG short term research grant on COVID-19 (CVD/2020/000991) from Science and Engineering Research Board (SERB), Department of Science and Technology, Govt. of India.
The authors declare that they have no conflict of interest.
The era of editing plant genomes using CRISPR/Cas: A critical appraisal
CRISPR-Cas12-based detection of SARS-CoV-2
Rapid detection of 2019 novel coronavirus SARS-CoV-2 using a CRISPR-based DETECTR lateral flow assay. medRxiv: the preprint server for health sciences
Palindromic sequence plays a critical role in human foamy virus dimerization
CRISPR-Cas12a target binding unleashes indiscriminate single-stranded DNase activity
Palindromes in SARS and other coronaviruses
Unveiling COVID-19-associated organ-specific cell types and cell-specific pathway cascade
Requirements for RNA heterodimerization of the human immunodeficiency virus type 1 (HIV-1) and HIV-2 genomes
Programmable inhibition and detection of RNA viruses using Cas13
Nucleic acid detection with CRISPR-Cas13a/C2c2
Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR
The novel coronavirus disease-2019 (COVID-19): Mechanism of action, detection and recent therapeutic strategies
The expanded development and application of CRISPR system for sensitive nucleotide detection
Development of an in vivo delivery system for CRISPR/Cas9-mediated targeting of hepatitis B virus cccDNA
Editing the human cytomegalovirus genome with the CRISPR/Cas9 system
HOLMESv2: A CRISPR-Cas12b-assisted platform for nucleic acid detection and DNA methylation quantitation
CRISPR-based biosensing is prospective for rapid and sensitive diagnosis of pediatric tuberculosis
Single cell RNA sequencing of 13 human tissues identify cell types and receptors of human coronaviruses
CRISPR is a useful biological tool for detecting nucleic acid of SARS-CoV-2 in human clinical samples
Rational siRNA design for RNA interference
A protocol for detection of COVID-19 using CRISPR diagnostics
A pneumonia outbreak associated with a new coronavirus of probable bat origin
This work was partially carried out during the tenure of an ERCIM 'Alain Bensoussan' Fellowship Program awarded to Nimisha Ghosh. Also, the authors would like to thank all those who have contributed sequences to GISAID and NCBI databases.