key: cord-0845842-cdjtqi9g authors: Islam, Mohammad Tanvir; Alam, ASM Rubayet Ul; Sakib, Najmuj; Hasan, Mohammad Shazid; Chakrovarty, Tanay; Tawyabur, Mohammad; Islam, Ovinu Kibria; Al‐Emran, Hassan M.; Jahid, Mohammad Iqbal Kabir; Anwar Hossain, Mohammad title: A rapid and cost‐effective multiplex ARMS‐PCR method for the simultaneous genotyping of the circulating SARS‐CoV‐2 phylogenetic clades date: 2021-02-01 journal: J Med Virol DOI: 10.1002/jmv.26818 sha: d1c9f16f84dd97ed29489caebf12b07f7521971c doc_id: 845842 cord_uid: cdjtqi9g

Tracing the globally circulating severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) phylogenetic clades by high‐throughput sequencing is costly, time‐consuming, and labor‐intensive. We here propose a rapid, simple, and cost‐effective amplification refractory mutation system (ARMS)‐based multiplex reverse‐transcription polymerase chain reaction (PCR) assay to identify six distinct phylogenetic clades: S, L, V, G, GH, and GR. Our multiplex PCR is designed in a mutually exclusive way to identify V–S and G–GH–GR clade variants separately. The pentaplex assay included all five variants, and the quadruplex comprised the triplex variants alongside either the V or the S clade mutation, creating two separate subsets. The procedure was optimized with 0.2–0.6 µM primer concentration, 56–60°C annealing temperature, and 3–5 ng/µl complementary DNA, and was validated on 24 COVID‐19‐positive samples. Targeted Sanger sequencing with another set of primers further confirmed the presence of the clade‐featured mutations. This multiplex ARMS‐PCR assay is a fast, low‐cost, and convenient alternative for discriminating the circulating phylogenetic clades of SARS‐CoV‐2.

(GISAID) 11 by the clustered, co-evolving, and clade-featured point mutations. The mutations C241T, C3037T, C14408T (RdRp:p.P323L), and A23403G (S:p.D614G) together are referred to as the G clade.
Additional mutations on the G clade background at N protein:p.RG203-204KR (GGG28881-28883AAC) and ORF3a:p.Q57H (G25563T) define the GR and GH clades, respectively. The V clade is classified by the co-evolving mutations G11083T (NSP6:p.L37F) and G26144T (ORF3a:p.G251V), whereas S clade strains contain the C8782T and T28144C (NS8:p.L84S) variations. The L clade strains are the original, or wild-type, version with respect to the featured mutations of the other five clades. 12 Previous studies showed that the prevalence of phylogenetic clades varied by region and time and was closely related to a variable death-case ratio. 9,13 The G clade variant was dominant in Europe 14 and the United States 15 early in the pandemic and caused high mortality in the United States. This variant has gradually spread to Southeast Asia 9, 16 and Oceania. 12 In contrast, the GR and GH clades emerged at the end of February 2020, and GR mutants are now the leading type, causing more than one-third of infections globally. 12 Therefore, it is indispensable to identify the circulating clades in a specific region. Besides, several reports speculated on the occurrence of SARS-CoV-2 reinfection by phylogenetically different strains that belong to separate clades. 17, 18 The dominance of a particular viral clade over others might determine the virulence, disease severity, and infection dynamics. 9 However, the implications of different clades for effective drug and vaccine development are yet to be clearly elucidated. 19

The identification of phylogenetic clades requires the identification of specific mutations in the viral genome. This is performed by whole-genome sequencing using the next-generation sequencing (NGS) technique, which has raised the number of sequences deposited in GISAID to 139,000 as of October 6, 2020.
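The clade definitions given above reduce to a small set of rules on marker mutations, which can be sketched in code. The following is a minimal illustration, not the authors' software: the function name, the set names, and the "ambiguous" and "unassigned" labels are assumptions introduced here, and the mutation strings are those listed in the text.

```python
# Clade-defining mutations as listed in the text (relative to the reference
# genome). All identifiers below are illustrative, not from the original study.
G_CORE = {"C241T", "C3037T", "C14408T", "A23403G"}
GR_EXTRA = {"GGG28881-28883AAC"}
GH_EXTRA = {"G25563T"}
V_MARKERS = {"G11083T", "G26144T"}
S_MARKERS = {"C8782T", "T28144C"}
ALL_MARKERS = G_CORE | GR_EXTRA | GH_EXTRA | V_MARKERS | S_MARKERS


def assign_clade(mutations):
    """Return a clade label for an iterable of mutation strings."""
    found = set(mutations)
    if G_CORE <= found:
        has_gr = GR_EXTRA <= found
        has_gh = GH_EXTRA <= found
        if has_gr and has_gh:
            # Assumption: flag conflicting markers instead of guessing.
            return "ambiguous"
        if has_gr:
            return "GR"
        if has_gh:
            return "GH"
        return "G"
    if V_MARKERS <= found:
        return "V"
    if S_MARKERS <= found:
        return "S"
    if not (found & ALL_MARKERS):
        # No clade-featured mutation present: wild type for all markers.
        return "L"
    # Only part of a marker set is present.
    return "unassigned"
```

For example, `assign_clade(G_CORE | GR_EXTRA)` yields `"GR"`, while an empty mutation set yields `"L"`. Partial marker sets are deliberately left unassigned, since the text defines each clade by co-evolving mutations rather than by any single site.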
Another high-throughput NGS alternative is based on clade-based genetic barcoding that targets polymerase chain reaction (PCR) amplicons encompassing the featured mutations, as described by Guan et al. 20 However, most laboratories in low-income countries have limited access to this state-of-the-art technique. A low-throughput, small-scale genotyping option would be the Sanger-based targeted sequencing approach, 7 but this is labor-intensive, time-consuming, inconvenient, and difficult to perform at low cost. Consequently, the distribution of circulating clades has hardly been observed in many countries, such as Afghanistan, Maldives, Iraq, Syria, Yemen, Ethiopia, Sudan, Zimbabwe, Bolivia, Paraguay, and Chile, most probably because of the lack of sequencing facilities and of appropriately trained technical personnel. The PCR-based point-mutation-discriminating technique, also known as the amplification refractory mutation system (ARMS), has previously proven useful in identifying subtypes or clades of other respiratory viruses. [21] [22] [23] In this study, we aimed to develop and validate a novel ARMS-based multiplex PCR to identify the clade-specific point mutations of the circulating SARS-CoV-2 clades.

Among the positive cases, only 503 had a Ct value <30, from which we selected 25 samples using a random number generator in Microsoft Excel (Table S1). One positive sample was excluded because it was a duplicate follow-up sample. Five recent samples from SARS-CoV-2-negative cases were also included in this study. Details of the selected SARS-CoV-2-positive samples can be found in Table S2.

cDNA was prepared for each selected sample using the Go-Script™ Reverse Transcription System (Promega) following the manufacturer's protocol. In brief, the primer/RNA mix was prepared by mixing 10 µl of extracted RNA with 1 µl of Random primer and 1 µl of Oligo(dT)15 primer (total volume 12 µl).
Then the mixture was heated at 70°C for 5 min, followed by immediate chilling on ice for 5 min and a quick spin. The mixture for the reverse transcription reaction was prepared by making a cocktail of the components from the Go-Script™ Reverse Transcription System in a sterile 1.5 ml microcentrifuge tube kept on ice. The final reaction mix was 40 µl for each cDNA synthesis reaction to be performed.

A set of 15 primers (Table 1)

The "w" and "m" in the primer names denote the wild- and mutant-type alleles, respectively, corresponding to no base change (wild type) and a single base change (mutant type).
b The nucleotide position of the coding sequence for each protein where the primers bind.

FIGURE 1 Schematic workflow of ARMS-based multiplex PCR assays for the identification of SARS-CoV-2 clades. The upper portion of the figure shows the concept of the clades as described in GISAID with a comprehensive genomic visualization. The lower segment is dedicated to the overall workflow and the primer design. ARMS, amplification refractory mutation system; cDNA, complementary DNA; GISAID, global initiative on sharing all influenza data; qRT-PCR, quantitative real-time reverse-transcription polymerase chain reaction; PCR, polymerase chain reaction; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2

The forward or reverse type-specific primers were paired with the counterpart reverse or forward primer. The amplicons were simultaneously distinguished by their molecular weight (bp) in multiplex PCR in different combinations. Positive amplification with the wild-type-targeting primers was interpreted as the L type. The other types were determined based on the co-evolving mutations at the respective sites. The primer sets were designed using Primer3Plus 24

The multiplex PCR assays were performed on all 24 positive samples.
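The readout logic described above, in which wild-type ("w") and mutant-type ("m") primers give bands of distinguishable size, can be sketched as a simple allele call per target site. This is a hedged illustration only: the function names, the input format, and the "mixed" and "no amplification" categories are assumptions made here, and the site labels in the usage note are placeholders rather than the actual primer names from Table 1.

```python
def call_allele(wild_band, mutant_band):
    """Classify one target site from the presence of its two ARMS bands."""
    if wild_band and mutant_band:
        # Assumption: both bands together are reported, not resolved.
        return "mixed"
    if mutant_band:
        return "mutant"
    if wild_band:
        return "wild"
    # Neither band: failed reaction or absent template.
    return "no amplification"


def call_sites(bands):
    """Map each site to an allele call.

    `bands` is a dict of site label -> (wild_band_seen, mutant_band_seen).
    """
    return {site: call_allele(w, m) for site, (w, m) in bands.items()}
```

A call such as `call_sites({"G25563T": (True, False), "G26144T": (False, True)})` would report the first site as wild and the second as mutant. Keeping "no amplification" as an explicit outcome reflects the point made in the text that a missing band is itself informative about template presence.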
To validate the reliability of the assays, another five primer pairs (Table 2) were designed for the clades, keeping the probable mutation points in the middle of the amplicons, by using Primer3Plus 24 and Primer-BLAST. 25 The abovementioned parameter settings were followed to design those

The cycle sequencing PCR condition was set up according to the kit protocol. The sequences were aligned with and verified by the reference sequence (NC_045512.2_SARS-CoV-2_Wuhan- Analysis (MEGA X) software. 27 For the most part, the cost of the processes was optimized and compared to that of our in-house NGS system (Ion Torrent; Thermo Fisher Scientific).

We carried out RNA secondary structure prediction of the ORF3a or NS3 RNA using the Mfold web server. 28 The full NS3 sequence was extracted from the SARS-CoV-2 reference sequence in NCBI GenBank. The default parameters were used in generating the structure.

The singleplex PCRs showed successful annealing at 57°C; however, the annealing temperatures for the duplex, triplex, and quadruplex assays needed further optimization to 60°C, 57°C, and 56°C, respectively (Figure 2B

However, the quadruplex (that had 26144G>T (p.G251V)) and the pentaplex arrangement could not discriminate the bands between wild types and mutants (Figure S1). All 24 positive samples confirmed the reproducibility of the assays; four of them, excluding the one used before for the multiplex assays, were taken as representatives to display the reproducibility in this article (Figure 2C

This study proposes a simple and exclusive ARMS-based SNP-discriminating method using conventional PCR to establish multiplex assays for detecting SARS-CoV-2 mutation clades. This concept was adopted from other studies that applied it to identify the genetic profiles of respiratory or gastrointestinal coronaviruses of pigs, human cancer risk-related SNPs, the virus that causes systemic infection in canines, the resistance profile of a bacterium, and so forth.
[29] [30] [31] [32]

TABLE 2 List of primers for targeted Sanger sequencing to validate the multiplex assays

This study designed point-mutation-specific primers to detect the six different SARS-CoV-2 clades as described by GISAID. Clade-based discrimination during the COVID-19 pandemic was exceedingly important because the prevalence of SARS-CoV-2 clades varied by region and time and was closely related to a variable case-fatality rate. 9, 13 In this study, we attempted to validate two sets of multiplex PCR, covering G, GH, and GR in the first set and V and S in the second set. Based on the available data on clade prevalence, we propose running the first set of multiplex PCR first, since it can confirm the three most prevalent clades. 9 Our attempt for pentaplex and/or the

Besides, the low concentration of our template cDNA or initial RNA, due to low viral loads in the samples, may ultimately reduce the PCR amplification of the largest product, the 568 bp "V-specific" product targeting 26144G>T (p.G251V). We could not use a long extension time in PCR, since it would increase primer-dimer formation and inhibit the amplification of the other targets, that is, the G, GR, GH, and S clades. We were also unable to determine the viral load in the samples or the cDNA quantity precisely, since this would depend on the sampling, cell culture, and availability of control RNA. Because we did not have synthesized positive and/or negative control RNA, we could not determine the sensitivity or specificity of our method either. However, we employed Sanger sequencing on samples from each representative group to check the reproducibility of the results, and because Bangladeshi strains mostly belong to the GR clade, which falls within the G clade, we identified both of these clades by our in-house multiplex ARMS-PCR and by Sanger sequencing. The advantage of our ARMS-based multiplex assays is that the process is faster and cost-effective.
35 The cost of the assay was $7 per reaction (including import tax, VAT, and similar charges for Bangladesh), which is much less than that of targeted and whole-genome-based NGS methods for identifying the clades. The cost will be further reduced if an optimized one-step PCR system is used, and we are currently working on this to cut the overall cost to less than $2. Thus, our method can overcome a serious limitation in effectively identifying viral clades, with prospects for broader application. The assay also demands less technical skill: personnel need only minimal training, and the interpretation of results is generic. 36, 37 Besides, the presence of the template, as well as its quantity and quality, is determined at the same time. A false-negative result due to the absence of a template can also be readily detected. 38 In general, placing the mutant base at the 3ʹ-end of the primer makes it refractory to the "wild-type template," whereas a primer without the mutation is refractory to the "mutant template," which makes ARMS a reliable alternative to sequencing. 32

Table 1. ARMS, amplification refractory mutation system; COVID-19, coronavirus disease 2019; M, mutant band; PCR, polymerase chain reaction; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2; WT, wild-type band

On the other hand, NGS approaches, such as whole-genome sequencing (WGS) or metagenomics, can generate millions of high-throughput reads, which has enabled researchers to open new dimensions in the field of genome-sequencing applications. 39 The lack of technical personnel to analyze NGS data is also a reason to prefer an alternative to NGS technology in low-income countries. Therefore, ARMS technology with conventional multiplex PCR methods would be more applicable for identifying the clades in low- and minimal-resource settings.
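The ARMS principle mentioned above, that a primer whose 3ʹ-terminal base does not match the template is refractory to extension, can be illustrated with a toy model. This is a deliberate simplification under stated assumptions: the function name and the short sequences are invented for illustration, the primer is written in the same sense as the template strand, and only the 3ʹ-terminal base is checked. Real ARMS primers often carry an additional deliberate mismatch near the 3ʹ-end and depend on reaction conditions, none of which is modeled here.

```python
def arms_primer_extends(primer, template, start):
    """Return True if the primer's 3'-terminal base matches the template.

    `primer` is given in the same sense as `template`, and `start` is the
    0-based position in `template` where the primer's 5' end aligns.
    Only the 3' base is checked (toy model of ARMS refractoriness).
    """
    site = template[start:start + len(primer)]
    if len(site) != len(primer):
        return False  # primer would run off the end of the template
    return primer[-1] == site[-1]


# Toy sequences (not SARS-CoV-2): the alleles differ at the last base.
template_wild = "ACGTTGCAGG"
template_mut = "ACGTTGCAGT"
primer_w = "GTTGCAGG"  # 3' base matches the wild-type allele
primer_m = "GTTGCAGT"  # 3' base matches the mutant allele

print(arms_primer_extends(primer_w, template_wild, 2))  # True
print(arms_primer_extends(primer_m, template_wild, 2))  # False
```

In this toy setup the wild-type primer extends only on the wild-type template and the mutant primer only on the mutant template, which is the behavior the assay relies on to discriminate single-base changes by band presence.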
Our assay can enhance the identification of genotypic variants of SARS-CoV-2 worldwide, especially in low-resource settings where NGS and Sanger sequencing techniques are difficult to access. This rapid barcoding method may help to reveal disease epidemiology, support patient management and protein-based drug design, and contribute to shaping future national policy and vaccine development. A more cost-effective one-step procedure based on a modified tetra-ARMS assay (T-ARMS) is under development by our group and will considerably reduce the labor and cost further.

1. An interactive web-based dashboard to track COVID-19 in real time
2. Genotype and phenotype of COVID-19: their roles in pathogenesis
3. Accessory proteins of SARS-CoV and other coronaviruses
4. Characterization of spike glycoprotein of SARS-CoV-2 on virus entry and its immune cross-reactivity with SARS-CoV
5. Nonstructural proteins of novel coronavirus (SARS-CoV-2)
6. Genome-wide analysis of SARS-CoV-2 virus strains circulating worldwide implicates heterogeneity
7. Understanding the possible origin and genotyping of the first Bangladeshi SARS-CoV-2 strain
8. Structural and functional basis of SARS-CoV-2 entry by using human ACE2
9. Evolving infection paradox of SARS-CoV-2: fitness costs virulence? Preprint
10. Comprehensive annotations of the mutational spectra of SARS-CoV-2 spike protein: a fast and accurate pipeline
11. GISAID: global initiative on sharing all influenza data - from vision to reality
12. Geographic and genomic distribution of SARS-CoV-2 mutations
13. SARS-CoV-2 genomic variations associated with mortality rate of COVID-19
14. Tracking changes in SARS-CoV-2 Spike: Evidence that D614G increases infectivity of the COVID-19 virus
15. Distinct viral clades of SARS-CoV-2: implications for modeling of viral spread
16. Emergence of European and North American mutant variants of SARS-CoV-2 in South-East Asia. Transbound Emerg Dis. 2020. Epub ahead of print
17. COVID-19 re-infection by a phylogenetically distinct SARS-coronavirus-2 strain confirmed by whole genome sequencing. Clin Infect Dis. 2020. Epub ahead of print
18. Clinical characteristics, cause analysis and infectivity of COVID-19 nucleic acid re-positive patients: a literature review
19. Genomics insights of SARS-CoV-2 (COVID-19) into target-based drug discovery
20. A genetic barcode of SARS-CoV-2 for monitoring global distribution of different clades during the COVID-19 pandemic
21. Design and testing of multiplex RT-PCR primers for the rapid detection of influenza A virus genomic segments: application to equine influenza virus
22. Design of multiplexed detection assays for identification of avian influenza A virus subtypes pathogenic to humans by SmartCycler real-time reverse transcription-PCR
23. Validation of two multiplex real-time PCR assays based on single nucleotide polymorphisms of the HA1 gene of equine influenza A virus in order to differentiate between clade 1 and clade 2 Florida sublineage isolates
24. Primer3-new capabilities and interfaces
25. Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction
26. Improved DNA sequencing quality and efficiency using an optimized fast cycle sequencing protocol
27. MEGA X: molecular evolutionary genetics analysis across computing platforms
28. Mfold web server for nucleic acid folding and hybridization prediction
29. Development of a single multiplex amplification refractory mutation system PCR for the detection of rifampin-resistant Mycobacterium tuberculosis
30. The use of ARMS PCR and RFLP analysis in identifying genetic profiles of virulent, attenuated or vaccine strains of TGEV and PRCV
31. A novel multiplex tetra-primer ARMS-PCR for the simultaneous genotyping of six single nucleotide polymorphisms associated with female cancers
32. Multiplex amplification refractory mutation system polymerase chain reaction (ARMS-PCR) for diagnosis of natural infection with canine distemper virus
33. Progress in high throughput SNP genotyping methods
34. A new SNP genotyping technology Target SNP-seq and its application in genetic analysis of cucumber varieties
35. Designing, optimization and validation of tetra-primer ARMS PCR protocol for genotyping mutations in caprine Fec genes
36. A sensitive, specific, and cost-effective multiplex reverse transcriptase-PCR assay for the detection of seven common respiratory viruses in respiratory samples
37. A low-cost library construction protocol and data analysis pipeline for Illumina-based strand-specific multiplex RNA-seq
38. Multiplex PCR: advantages, development, and applications
39. Next-generation sequence assembly: four stages of data processing and computational challenges

We acknowledge GISAID for sharing the sequence data and IDT for giving us the opportunity to use its tools for validating primers in silico. We also acknowledge the Ministry of Health and Family Welfare, Bangladesh, for giving us permission for SARS-CoV-2 diagnosis. The present study was funded by the Jashore University of Science and Technology Research Grant (#FoBST-06), supported by the University Grant Commission, Bangladesh.

The authors declare that there are no conflicts of interest.

The peer review history for this article is available at https://publons.com/publon/10.1002/jmv.26818.

The data that support the findings of this study are available in the GISAID EpiFlu™ database at https://www.gisaid.org/ under accession numbers EPI_ISL_548260, EPI_ISL_561630, EPI_ISL_561375, EPI_ISL_561376, and EPI_ISL_561377. These data were derived from the following resource available in the public domain: GISAID (https://www.gisaid.org/).