key: cord-0314734-wy07bget authors: Rajib, S. A.; Ogi, Y.; Hossain, M. B.; Ikeda, T.; Tanaka, E.; Kawaguchi, T.; Satou, Y. title: A SARS-CoV-2 Delta Variant Containing Mutation in the Probe Binding Region Used for qRT-PCR Test in Japan Exhibited Atypical PCR Amplification and Might Induce False Negative Result date: 2021-11-20 journal: nan DOI: 10.1101/2021.11.15.21266335 sha: 88bc1fe9854d96e8961a3034cf49975997d8fb9f doc_id: 314734 cord_uid: wy07bget

The recent pandemic of SARS-CoV-2 infection has caused severe health problems and substantially restricted social and economic activities. To cope with such an outbreak, identifying infected individuals with high accuracy is vital. qRT-PCR plays a key role in the diagnosis of SARS-CoV-2 infection, and in Japan the N protein-coding region is widely used as the qRT-PCR target. We recently encountered two SARS-CoV-2-positive specimens showing atypical amplification curves in qRT-PCR. We performed whole-genome sequencing and found that the virus was a Delta-type variant of SARS-CoV-2 with a single nucleotide mutation in the probe-binding site. To evaluate the extent of spread of the variant in the area, we performed whole viral genome sequencing of samples collected from 61 patients infected with SARS-CoV-2 during the same period and in the same area. There were no other cases with the same mutation, indicating that the variant had not spread in the area. Furthermore, we performed phylogenetic analysis with various SARS-CoV-2 sequences deposited in the public database. Hundreds of variants reported globally, and one reported in Japan, were found to contain the same mutation. Phylogenetic analysis showed that the variant was very close to other Delta variants endemic in Japan but quite far from the variants containing the same mutation reported from outside Japan, suggesting that the variant was probably generated sporadically in some domestic area.
These findings suggest two key points: i) mutations in the region used for SARS-CoV-2 qRT-PCR can cause abnormal amplification curves; therefore, the qRT-PCR result should not be judged only in an automated manner but should also be checked manually by the examiner to prevent false-negative results, and ii) various mutations can be generated sporadically and unpredictably; therefore, efficient and robust screening systems are needed to promptly monitor the emergence of de novo variants.

Since its first report from a seafood market in Wuhan, China qRT-PCR based testing plays a vital role in identifying infected patients and isolating them so that uninfected individuals can continue social activities [18]. Therefore, accuracy in detecting SARS-CoV-2 in patients plays a crucial role in controlling the transmission of this highly contagious pathogen. False-negative results put considerable strain on the containment efforts for COVID-19. Successful qRT-PCR based detection depends on the efficient binding of primers and probes to their target regions. Any change in the target nucleotide sequence can significantly lower the binding affinity of the primers and probes, resulting in a pseudo-negative diagnosis [19, 20]. Consequently, the resulting undetected mutants can also contribute to future waves of COVID-19 [21]. Although SARS-CoV-2 has a lower mutation frequency than other RNA viruses because of its RNA proofreading mechanism [22], its higher transmission rate means that the qRT-PCR primer and probe binding regions should be monitored for unforeseen mutations so that the quality of the detection methods can be ensured. This article reports a point mutation in the N gene of SARS-CoV-2 that resulted in an atypical qRT-PCR curve during the diagnostic test, leading to a dubious diagnostic interpretation. Sequencing results revealed the presence of a single point mutation at the probe-target site in the N gene.
This analysis shows that sequencing can play a problem-solving role in qRT-PCR based diagnostic complications, and it alerts institutes performing qRT-PCR tests for SARS-CoV-2 infection to this risk.

In this study, samples were collected by the Kumamoto City Medical Association Inspection Center (hereinafter referred to as the Kumamoto PCR center). The downstream analyses were performed at the

For this study, sputum and/or nasopharyngeal swab samples were collected from the Kumamoto city area from August 16, 2021, to September 8, 2021, as part of the regular COVID-19 diagnostic service. The collected samples and extracted RNAs were preserved at -80 °C until further experiments were performed according to the guidelines of the NIID, Japan [23].

For qRT-PCR analysis and sequencing, RNA was extracted from the sputum and/or nasopharyngeal swab samples (140 µL) using the QIAamp Viral RNA Mini Kit (QIAgen) and EZ1 Virus Mini Kit v2.0 (QIAgen) according to the manufacturer's instructions. The final RNA product was eluted in 60 µL of buffer AVE (RNase-free water with 0.04% sodium azide); 5 µL of the eluted RNA was used for qRT-PCR, and the remaining RNA was stored at -80 °C for further analysis. qRT-PCR was performed on a LightCycler 96 System (Roche, Basel, Switzerland) according to the protocol suggested by Shirato et al. [14] to detect the N gene using the One Step PrimeScript™ RT-PCR Kit (Perfect Real Time) (Takara). The N2 primer-probe set was used as described by Shirato et al. [14].

cDNA synthesis, viral sequence enrichment, library amplification, and indexing were performed using the QIAseq DIRECT SARS-CoV-2 kit (QIAgen), following the manufacturer's recommendations. For multiplexing the samples, QIAseq DIRECT UDI set-A (QIAgen) was used. At the end of the process, a SARS-CoV-2 library of 25 µL volume was obtained for each sample.
The quality of the enriched SARS-CoV-2 libraries was evaluated by electrophoresis with a TapeStation 4150 system (Agilent Technologies). Finally, the prepared libraries were denatured and subjected to sequencing using MiSeq reagent Micro and Nano Kits (Version 2, 300 cycles) in the MiSeq desktop sequencing system (Illumina).

medRxiv preprint doi: https://doi.org/10.1101/2021.11.15.21266335; this version posted November 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Upon sequencing in the Illumina MiSeq sequencer, one FASTQ file (Read1) was generated for each sample. Adapter sequences were trimmed from Read1 using the 'cutadapt' tool [24]. After the adapter trimming step, a cleaning step was performed using the PRINSEQ tool [25], and reads with a Phred score > 20 were used for downstream analysis. The adapter-trimmed and cleaned reads were then aligned to the SARS-CoV-2 reference genome NC_045512 (isolate Wuhan-Hu-1) using the BWA-MEM algorithm [26]. Subsequently, the Samtools program [27] was used to remove multiple aligned reads, and the Freebayes (Version 1.2.0) command-line tools [28] were used to call variants from the aligned reads and create a variant call format (VCF) file. Finally, the aligned files and VCF files were visualized using the Integrative Genomics Viewer (IGV) [29]. The resulting consensus SARS-CoV-2 genomes were deposited at the Global Initiative on Sharing All Influenza Data (GISAID) [30] (Supplementary data 1). For secondary data analysis, the Pangolin COVID-19 Lineage Assigner tool was used to assign the nomenclature for the viral genomes proposed by Rambaut et al. [31], and the CoV-GLUE web application tool was used to check for any novel mutations within the viral sequences [32].
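The read-processing steps described above (adapter trimming, quality filtering, alignment, read filtering, and variant calling) can be sketched as an ordered list of command lines. This is only an illustrative dry run, not the authors' actual script: the file names, the adapter sequence (a common Illumina adapter prefix used purely as a placeholder), the use of a mean-quality cutoff in PRINSEQ, and the use of a mapping-quality filter as the way to "remove multiple aligned reads" are all assumptions, since the source does not give the exact parameters. The reference is assumed to have been indexed beforehand with `bwa index`.

```python
def build_pipeline(sample: str, reference: str, adapter: str) -> list[list[str]]:
    """Return the command lines for one sample in execution order.

    Nothing is executed; the commands are only assembled so they can be
    inspected or passed to subprocess.run() one by one.
    """
    raw = f"{sample}.fastq"                 # hypothetical Read1 file name
    trimmed = f"{sample}.trimmed.fastq"
    clean_prefix = f"{sample}.clean"        # PRINSEQ appends ".fastq" to this prefix
    cleaned = f"{clean_prefix}.fastq"
    sam = f"{sample}.sam"
    filtered = f"{sample}.filtered.bam"
    sorted_bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf"
    return [
        # 1. Adapter trimming of the single-end reads
        ["cutadapt", "-a", adapter, "-o", trimmed, raw],
        # 2. Quality cleaning (assumption: mean Phred quality cutoff of 20)
        ["prinseq-lite.pl", "-fastq", trimmed, "-min_qual_mean", "20",
         "-out_good", clean_prefix, "-out_bad", "null"],
        # 3. Alignment to the reference genome with BWA-MEM
        ["bwa", "mem", "-o", sam, reference, cleaned],
        # 4. Drop ambiguously mapped reads (assumption: MAPQ >= 1 filter)
        ["samtools", "view", "-b", "-q", "1", "-o", filtered, sam],
        # 5. Coordinate-sort and index for variant calling and IGV
        ["samtools", "sort", "-o", sorted_bam, filtered],
        ["samtools", "index", sorted_bam],
        # 6. Variant calling, writing a VCF file
        ["freebayes", "-f", reference, "-v", vcf, sorted_bam],
    ]


commands = build_pipeline("A8167", "NC_045512.fasta", "AGATCGGAAGAGC")
for cmd in commands:
    print(" ".join(cmd))
```

Building the commands as argument lists keeps each step explicit and makes it easy to swap a parameter (for example, a different quality cutoff) without touching the rest of the workflow.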
For local phylogenetic tree construction, 61 SARS-CoV-2 strains were collected from Kumamoto city between September 1, 2021, and September 8, 2021. First, the viral sequences were aligned to the SARS-CoV-2 reference genome NC_045512 (isolate Wuhan-Hu-1) using Geneious Prime (version 2020.2.4) (https://www.geneious.com/). Then, the aligned sequences were used to construct a maximum-likelihood phylogenetic tree using PhyML 3.0 [33] with Smart Model Selection (SMS) [34].

To construct a global phylogenetic tree, SARS-CoV-2 genomes and associated metadata were downloaded from the GISAID database [30] (accessed on October 18, 2021). Global genome collections were downloaded from the "Region-specific Auspice source file" of the GISAID database, resulting in 3580 viral genomes (globally distributed random viral strains collected from December 2019 to October 2021). Globally distributed viruses with the G29234A mutation were identified using the 'substitution' tool of the GISAID database (accessed on October 18, 2021) [30]. Then, the resulting 250 viral genomes containing the G29234A mutation were added to the Nextstrain build [37] as 'focal' sequences. The "global genome collections" were kept as 'contextual' genomes to build the phylogenetic tree. No non-human viral hosts were considered for tree construction.

NIID initially recommended a multiplex qRT-PCR system for detecting SARS-CoV-2 using two primer and probe sets (N1 and N2) [14].
Subsequently, according to the 3rd edition of the "Guidelines for the operation of the new coronavirus (SARS-CoV-2) test methods," NIID recommends using the N2 primer-probe set with one enforcement to detect viral RNA in samples during an epidemic period in which outbreaks continue in multiple municipalities [38]. Following the instructions from NIID, the Kumamoto PCR center used the N2 primer-probe set for diagnostic purposes. In this qRT-PCR experiment, 20 samples collected from the

IGV visualization of the VCF files revealed the presence of a G→A mutation at position 29234 of the SARS-CoV-2 genome in samples A8167 and A8168, whereas no mutation was found at the same position in the rest of the samples (Figure 2A). Coverage depth analysis revealed that, for sample A8167, the reads covering position 29,234 were dominated by the mutated base A (n=2296, 99.9%), whereas the wild-type base G was supported by only three reads (0.1%) (Figure 2B). A similar pattern was observed in sample A8168, where the mutated base A was supported by 1,066 reads (100%) and the wild-type base G by zero reads (0%) at the same position (Figure 2B). This mutation overlaps with the N gene probe used in the recommended N2 primer-probe set (Figure 1A). Considering the sequencing data obtained from the samples together with their corresponding qRT-PCR curves, the G29234A mutation might be a cause of the aberrant qRT-PCR curves.
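The allele percentages quoted above follow directly from the read counts at position 29234. A minimal sketch of that arithmetic is shown here; the function name and the dictionary layout are illustrative choices, and only the read counts come from the text.

```python
def allele_fractions(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-base read counts at one position into fractions of total depth."""
    depth = sum(counts.values())
    if depth == 0:
        raise ValueError("no reads cover this position")
    return {base: n / depth for base, n in counts.items()}


# Read counts at position 29234 as reported for the two samples
a8167 = allele_fractions({"A": 2296, "G": 3})
a8168 = allele_fractions({"A": 1066, "G": 0})

print(f"A8167: A = {a8167['A']:.1%}, G = {a8167['G']:.1%}")  # A = 99.9%, G = 0.1%
print(f"A8168: A = {a8168['A']:.1%}, G = {a8168['G']:.1%}")  # A = 100.0%, G = 0.0%
```

With 2296 of 2299 reads carrying A in sample A8167, the mutated base is essentially fixed in the viral population of that specimen rather than present as a minor subpopulation.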
Whole-genome sequencing of another 61 samples collected from Kumamoto city within the same time

primer-probe set was higher than that of the N1 set with all test samples during the protocol development. Similar results were found even in samples with low viral titers [38]. Therefore, the NIID advised all test centers to follow their recommended operation instructions during different endemic conditions. NIID recommended using the N2 primer-probe set with one enforcement during an epidemic period in which outbreaks continue in multiple municipalities [38]. Therefore, the N2 primer-probe set might have been adopted in the same way by other prefectural test centers during the ongoing epidemic waves in Japan. Given this situation, the presence of the G29234A mutation in circulating strains might jeopardize some test results by returning a dubious qRT-PCR curve [19, 20].

Initially, the G29234A mutation showed homoplasy. It appeared spontaneously in two clades at two time points: in 20C (clustered in Canada) during the early pandemic period in 2020, and in 20I (also known as the Alpha strain) from December 2020 to June 2021, mainly concentrated in England and Germany (Figure 4). However, the recent trend shows that the mutation has been spreading among the Delta variants of SARS-CoV-2.
Although the frequency of this mutation was 0.007% compared to the total submitted genomes in the GISAID database at the time of reporting, the existence of this mutation in multiple clusters among the 21J (Delta) clade (Figure 4, Supplementary Figure 2

While searching for a possible causal mechanism for the G29234A mutation, we found a 5′-UGG (edited G is underlined) trinucleotide motif in the genomic strand of SARS-CoV-2. The complementary strand is 5′-CCA (edited C is underlined), which matches the RNA sequence preference of the APOBEC3G cytosine deaminase [40, 41]. However, at least three studies analyzing SARS-CoV-2 genomic sequences have shown that C→U changes are biased toward the viral genomic strand [42] [43] [44]. Therefore, the G29234A mutation may not be due to deamination by APOBEC3 proteins. Further investigation is required to confirm the involvement of APOBEC3 enzymes and/or other mechanisms in the G29234A mutation. In addition, the G29234A mutation changed the amino acid sequence from glycine (G) to serine (S) at position 321 of the N protein. Why this mutation in the N protein was not sustained in the other clades during the early pandemic, and why it has been spreading within the Delta variants, requires further investigation.

The above findings indicate two key points. First, mutations in the primer-probe target regions could cause an atypical qRT-PCR curve, leading to false-negative results. Second, various mutations can occur sporadically and unpredictably. Therefore, efficient and robust screening systems are necessary to promptly monitor the emergence of new variants of interest.
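The glycine-to-serine change at position 321 of the N protein mentioned above can be checked with a short calculation that maps the genomic coordinate onto a codon. This is a sketch under one stated assumption that is not given in the source: the N open reading frame is taken to start at nucleotide 28274, as in the NC_045512.2 annotation. The function and variable names are illustrative.

```python
# Assumption (not stated in the source): start of the N ORF in NC_045512.2
N_ORF_START = 28274


def codon_location(genome_pos: int, orf_start: int) -> tuple[int, int]:
    """Return (codon number, position within codon), both 1-based."""
    offset = genome_pos - orf_start
    if offset < 0:
        raise ValueError("position lies upstream of the ORF start")
    return offset // 3 + 1, offset % 3 + 1


codon, pos_in_codon = codon_location(29234, N_ORF_START)
print(codon, pos_in_codon)  # 321 1

# Relevant part of the standard genetic code: glycine codons and the
# codons produced by replacing their first base with A.
CODON_TABLE = {
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "AGT": "S", "AGC": "S", "AGA": "R", "AGG": "R",
}

# Effect of a first-position G->A substitution on each glycine codon
outcomes = {
    ref: CODON_TABLE["A" + ref[1:]]
    for ref, aa in CODON_TABLE.items()
    if aa == "G"
}
print(outcomes)
```

Under this assumption, position 29234 is the first base of codon 321. A first-position G→A substitution turns a glycine codon into serine only when the reference codon is GGT or GGC, which is consistent with the G321S change reported in the text.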
Variants of concern (VOC) are marked with the corresponding Greek letter within the brackets adjacent to the relevant Nextstrain clade names.

Quality control and preprocessing of metagenomic datasets
Fast and accurate long-read alignment with Burrows-Wheeler transform
The Sequence Alignment/Map format and SAMtools
Haplotype-based variant detection from short-read sequencing
Integrative genomics viewer
GISAID: Global initiative on sharing all influenza data - from vision to reality
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
CoV-GLUE: A Web Application for Tracking SARS-CoV-2 Genomic Variation
New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0, Systematic Biology
SMS: Smart Model Selection in PhyML
The neighbor-joining method: a new method for reconstructing phylogenetic trees
Confidence Limits on Phylogenies: An Approach Using the Bootstrap
Nextstrain: real-time tracking of pathogen evolution
Guidelines for the operation of the new coronavirus (SARS-CoV-2) test method
Genomic Surveillance in Japan of AY.29-A New Sub-lineage of SARS-CoV-2 Delta Variant with C5239T and T5514C Mutations
Mitochondrial hypoxic stress induces widespread RNA editing by APOBEC3G in natural killer cells
The double-domain cytidine deaminase APOBEC3G is a cellular site-specific RNA editing enzyme
Rampant C→U Hypermutation in the Genomes of SARS-CoV-2 and Other Coronaviruses: Causes and Consequences for Their Short- and Long-Term Evolutionary Trajectories
Evidence for host-dependent RNA editing in the transcriptome of SARS-CoV-2