key: cord-0980262-e3ihbw0x
authors: Chamings, A.; Bhatta, T. R.; Alexandersen, S.
title: An increased ratio of SARS-CoV-2 positive to negative sense genomic and subgenomic RNAs within routine diagnostic upper respiratory tract swabs may be a marker of virion shedding
date: 2021-07-03
journal: nan
DOI: 10.1101/2021.06.29.21259511
sha: f34c2c7b40f06be844fdd504dac6c923fe73fcb7
doc_id: 980262
cord_uid: e3ihbw0x

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has spread rapidly in the global population since its emergence in humans in late 2019. Replication of SARS-CoV-2 is characterised by transcription and replication of genomic length RNA and shorter subgenomic RNAs to produce virus proteins and ultimately progeny virions. Here we explore the pattern of both genome-length and subgenomic RNAs and positive and negative strand SARS-CoV-2 RNAs in diagnostic nasopharyngeal swabs using sensitive probe based PCR assays as well as Ampliseq panels designed to target subgenomic RNAs. Using these assays, we measured the ratios of genomic to subgenomic RNAs as well as the ratios of positive to negative strand RNAs in SARS-CoV-2 positive nasopharyngeal swab samples. We found that subgenomic RNAs and negative strand RNA can be readily detected in swab samples taken up to 19 and 17 days post symptom onset, respectively, and therefore their detection alone is not likely an indicator of active SARS-CoV-2 replication. However, the ratios of genomic-length to subgenomic RNA and also of positive to negative strand RNA were elevated in some swabs, particularly those collected around the onset of clinical symptoms or in an individual with decreasing PCR Cts in successive swab samples. We tentatively conclude that it may be possible to refine such molecular assays to help determine if active replication of virus is occurring and progeny virions are likely present in a SARS-CoV-2 positive individual.
Assays targeting subgenomic N or ORF7a RNAs as well as strand specific ORF7a total genome-length and subgenomic RNAs may be the most sensitive for this purpose as these targets were consistently the most abundant in the swab samples.

days after the initial onset of symptoms and the quantity of PCR targets may have been approaching the limit of detection for the single PCR assays, making quantitation unreliable. We noticed that the swab sample from individual 2 (GC-26) was reporting a higher Ct in this study compared to our previous SARS-CoV-2 work, and the bioanalyzer analysis (see below) indicated that there may have been some degradation of the RNA during storage between the two studies 11. Nevertheless, we could still detect the genomic and subgenomic SARS-CoV-2 targets in this sample with the probe-based assays.

higher in some of the swab samples collected within the first 24 hours after the onset of symptoms

of targets detected by this assay were likely SARS-CoV-2 genomic RNA molecules as opposed to the subgenomic RNAs. Therefore, this PCR could be used to approximate the ratio of positive to negative sense copies of the SARS-CoV-2 genomic RNA. From our previous work, we knew that the strand specific PCRs were less sensitive than their non-strand specific counterparts 11; therefore, for this experiment, we selected fourteen of the swab samples with the lowest 5' UTR Ct's and in which both the subgenomic ORF7a and N gene non-strand specific PCRs were positive, in addition to the cell culture sample, and used the RNA of each in the strand specific PCRs.

The strand-specific ORF7a total PCR assay was able to detect positive sense SARS-CoV-2 genomic RNA (ORF7a total) in all the fourteen naso-oropharyngeal swabs tested as well as the cell culture sample (Table 5 and Figure 5).
Similarly, positive sense N gene subgenomic RNA was detected in all samples, but positive sense ORF7a subgenomic RNA was detected in only eight swab samples and the cell culture sample. Negative sense genomic 7a total and N gene subgenomic RNA were detected in eleven swab samples, while negative sense 7a subgenomic RNA was detected in 9 swabs (Figure 5). When sufficient RNA remained, the strand specific PCRs were repeatable, although some samples with very high Ct's (>35) for some of the targets did test negative in one of the repeats, suggesting that some of these targets were near the limit of detection of the assays.

While most of the fourteen samples were taken early in the clinical course of the SARS-CoV-2 infection, four samples, GC-26, GC-292, GC-291 and GC-310, taken on days 7, 10, 13 and 17 post symptom onset, were also within this set, and we were able to detect two or more of the negative sense targets in each of these samples. In the samples with a non-strand specific ORF7a total PCR Ct of 20 or less, all negative and positive sense targets could be detected, but in samples with a Ct > 20, detection of one or more strand specific targets became inconsistent, and only one of those samples (GC-26 taken on day 7) was PCR positive for all of the positive and negative strand targets. Despite this, there was a general trend in the diagnostic swab samples that as the Ct for the total amount of virus RNA in the sample increased (as indicated by the non-strand specific total ORF7a Ct in Figure 5), the strand specific PCR Ct's, if positive, also increased.

In only six of the fourteen swab samples were we able to detect all positive and negative strand RNA targets below a Ct of 35. These six samples included three taken on the day of symptom onset, and three taken 1, 2 or 10 days after the onset of symptoms.
Using the strand specific PCR Ct's, we calculated the proportion of each RNA molecule relative to the positive sense ORF7a total assay in these six samples (Supplementary Figures 3 and 4 and Source Data File). The overwhelming majority of SARS-CoV-2 RNA in the swab samples was positive sense. In the three swabs taken on the day of symptom onset the amount of positive sense SARS-CoV-2 RNA was 63 to 264 fold higher than the negative sense RNA, while in the three swabs collected later in the clinical

medRxiv preprint, this version posted July 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

While we only detected all strand specific targets in six samples, we did detect positive and negative strand ORF7a total in eleven swab samples and were therefore able to calculate the ORF7a total positive to negative strand ratios in these samples. We again saw the pattern of swabs from the day of symptom onset having 100 fold or more positive sense gRNA, while the majority of the swabs collected later had a lower ratio of positive to negative sense gRNA (ratio of 30-40) (Supplementary Figure 4). This, however, was not a hard and fast rule, with two later samples (GC-26 and GC-291) also having relatively high proportions of positive to negative strand gRNA (ratio above 100). GC-291 was from individual 14 taken 13 days post symptom onset, and the diagnostic SARS-CoV-2 PCR Ct of this sample was lower than that of their first swab collected on the day of symptom onset (GC-238). It was therefore likely that some virus replication was still occurring in this individual to account for this increase in SARS-CoV-2 RNA.
Presumably during replication more virions and cytoplasmic positive sense RNA, including both full length genomic and subgenomic RNAs, used for translation of SARS-CoV-2 proteins would be present, and this would increase the positive to negative strand RNA ratio. After replication had finished, these virions would be excreted and cytoplasmic RNAs degraded, and the relative proportion of positive strand RNA would fall until only the positive and negative strand SARS-CoV-2 RNA in the replication complexes remained 12.

All positive and negative sense RNAs were detected in the cell culture sample, but the ratios of positive to negative strand RNA were even higher in the cell culture sample than in any of the swab samples for all three targets. This is likely because, although the cell culture at 48 hrs represents a late stage of infection with full lysis of cells and release of virions into the medium, the material had been clarified by centrifugation, possibly pelleting and thus removing most cellular membrane material including double membrane vesicles, which we believe contain most of the negative sense RNA. In addition, the lysed cells may still have contained fragments of positive sense virus RNAs used for translation. In contrast, for the swab samples at the time of onset of symptoms, that is up to 14 days post initial exposure 17, any virions produced in vivo would be continually cleared by normal muco-ciliary clearance mechanisms and later on by the immune response. The cell culture
material was also gamma irradiated prior to transportation to our laboratory, which would be more likely to damage double stranded RNA compared to single stranded RNA 18. Given that any negative strand would likely be complexed with the more highly abundant complementary forward strand, this damage could disproportionately affect the negative sense RNA, possibly also contributing to the observed higher ratio of positive to negative sense RNA.

Detection and abundance of NGS reads mapped to subgenomic RNAs

To further explore the pattern of subgenomic RNAs in the naso-oropharyngeal swab samples, we created two Ampliseq primer mini-panels using 11 reverse primers within each of the ten potential SARS-CoV-2 subgenomic RNAs and in the 5' UTR region together with two different forward primers within the 5' leader sequence of all the SARS-CoV-2 RNAs. These primers were a subset of the Ampliseq primers from the ThermoFisher Ion AmpliSeq SARS-CoV-2 Research Panel which we had previously used to detect subgenomic RNAs 11. The rationale for doing so was to essentially create a multiplex PCR to quantitate the relative abundance of genomic and subgenomic RNAs within each sample. We found that despite these two mini-panels differing only in their forward primer, the amount of amplification detectable in the same SARS-CoV-2 positive swab samples differed greatly, with an average of 901,000 reads generated with mini-panel 2 after 31 cycles and an average of 21,000 reads generated by mini-panel 1 after 31 cycles of PCR amplification (see Source Data). Given that

similar number (316 vs 339) of reads as the random hexamer cDNA (See Source Data) and again, subgenomic ORF7a and N were the most abundant amplicons.
In mini-panel 2, the forward primer was much more efficient than the mini-panel 1 forward primer, and we saw much more amplification across all targets. The 5' UTR amplicon was particularly efficient, and even by 21 cycles of amplification, the number of reads mapped to this amplicon was significantly higher than all other amplicons. By 31 cycles we could see that the relative number of reads mapped to the subgenomic RNAs was reduced due to this single amplicon crowding them out on the sequencer chip. Despite this, mini-panel 2 identified a similar pattern in the relative abundance

Mini-panel 2 was also able to detect a low number of reads belonging to potential non-canonical subgenomic RNAs which could initiate expression of the proteins of ORF 9b 19 and also ORF 7b. The ORF9b subgenomic RNA contained the 69 nucleotides of the leader joined to nucleotide 28284, which is 24 nucleotides downstream of the regular N gene TRS (based on the nucleotide numbering of SARS-CoV-2 Wuhan-Hu-1 NC_045512.2), similar to what has been described 19. The non-canonical ORF 7b subgenomic RNA, here designated subgenomic RNA 7b*, contained the 69 nucleotides of the leader sequence joined to nucleotide position 27674, 26 nucleotides downstream of the canonical 7b TRS site. The ability to detect such reads appeared to be related to the number of reads detected mapping to the canonical subgenomic RNA upstream of these non-canonical sites (i.e. subgenomic N gene for ORF 9b and subgenomic 7a for ORF 7b), and therefore these reads were seen more frequently when we used 31 cycles of PCR amplification. The number of subgenomic ORF 9b reads made up only approximately 0.03-0.11% of the total subgenomic N gene reads and the number of subgenomic ORF 7b reads was equal to only 0.13-0.25% of the number of subgenomic ORF 7a reads (See Source Data).
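The junction positions given above for the two non-canonical subgenomic RNAs can be turned into a small lookup, sketched here in Python. This is only an illustration under stated assumptions: the function names `classify_junction` and `percent_of` are hypothetical, reads are assumed to have already been reduced to a leader length and the genome coordinate where the body sequence begins, and exact position matching is used, so the occasional heterogeneous reads reported in the text would fall into the "other" class.

```python
# Junction coordinates from the text (Wuhan-Hu-1 NC_045512.2 numbering).
LEADER_LENGTH = 69
NONCANONICAL_JUNCTIONS = {
    28284: "ORF9b",  # 24 nt downstream of the regular N gene TRS
    27674: "7b*",    # 26 nt downstream of the canonical 7b TRS
}


def classify_junction(leader_length: int, body_start: int) -> str:
    """Label a leader-body junction as ORF9b, 7b* or other (hypothetical helper)."""
    if leader_length != LEADER_LENGTH:
        return "other"
    return NONCANONICAL_JUNCTIONS.get(body_start, "other")


def percent_of(noncanonical_reads: int, canonical_reads: int) -> float:
    """Express non-canonical reads as a percentage of the canonical read count."""
    if canonical_reads <= 0:
        raise ValueError("canonical_reads must be positive")
    return 100.0 * noncanonical_reads / canonical_reads
```

With invented counts, `percent_of(11, 10000)` gives 0.11, which is the upper end of the ORF 9b range quoted above.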
Interestingly, there was some minor heterogeneity to these non-canonical subgenomic RNAs in the occasional read in some samples such as GC-14 (using 21 PCR cycles, see Source Data), which showed the leader joined to sequence upstream of the 7b AUG at position 27602, and the 9b subgenomic RNA also had some minor heterogeneity, for example for sample GC-251 (using the full panel (see below) and 21 cycles, see source data). Close inspection of reads generated in our previous study 11 also supported this minor heterogeneity, for example GC-277 and GC-26, which showed the leader joined to sequences at position 27678 for the 7b* subgenomic RNA and some heterogeneity around the 9b site, respectively (data not shown, raw data available in SRA Accession PRJNA636225).

SARS-CoV-2 Sequencing and subgenomic analysis using the full Ampliseq panel

The sequence from individual 20, who also belonged to the case cluster, differed in two nucleotides from MW321043 and the SARS-CoV-2 sequences obtained from the other individuals in the cluster. One change at nucleotide 22993 (based on the nucleotide numbering of SARS-CoV-2 Wuhan-Hu-1 NC_045512.2) was a change from a C to T without altering the amino acid sequence of the spike protein. The second change, G to A, was at nucleotide 28884 located within the N protein (amino acid 203) and changed an arginine to a glutamine in the nucleoprotein sequence.
This would have also resulted in a lysine to glutamic acid in the ORF9c protein (amino acid 50) 20, and altered the non-canonical TRS of the proposed ORF N* 19 which was present in the other SARS-CoV-2 sequences presented here. Searching for sequences from the second wave in Victoria, Australia with BLASTN on GISAID and GenBank's nucleotide database, we found that there was only one virus sequenced in the local epidemic with this change in the nucleoprotein during the initial epidemic molecular

early August 2020. The characteristics for MW153442 matched those of our individual 20, and subsequent follow up of laboratory records revealed that individual 20 was in fact tested for SARS-CoV-2 in early August, and the swab sample sent to the state reference laboratory for sequencing. Given that no other virus sequences with this change were identified in the local epidemic, this variant may have only arisen in a single individual during the outbreak in this cluster, but was not able or did not have the opportunity to transmit and propagate significantly in the general population afterwards.

sit. This revealed a change of a C to a T at nucleotide position 40 within the leader sequence, which is within the 3' end of where the forward primer would anneal. This change is most likely the reason the forward primer was unable to anneal and amplify in these two samples.
Given that all the other SARS-CoV-2 sequences we generated in this study, except for that of individual 20, were identical to the SARS-CoV-2 genomes from individuals 12 and 21, it is entirely possible that those SARS-CoV-2 genomes also had the same C to T change at position 40, but a forward primer did manage to eventually anneal during the amplification PCR of these samples, resulting in the expected amplicon. If this nucleotide change was present in the samples, particularly if strong RNA secondary structures are also present in this region 16, then that could explain why mini-panel 1, which used this same forward primer, amplified the 5'UTR and subgenomic RNAs inefficiently compared to mini-panel 2. For mini-panel 2, the end of the primer was at nucleotide position 52, and therefore this particular nucleotide substitution would have sat within the middle of the primer, which is less critical to primer annealing and extension when compared to the 3' end. Similarly, there was also a nucleotide change of a C to a T at position 241 in all seven SARS-CoV-2 genomes sequenced in this study, and this also sat within the middle of the annealing site of the reverse 5'UTR primer of the mini-panels and did not affect the primer's function, as the 5' UTR was the most efficient amplicon of mini-panel 2.

By looking for reads with the 5' UTR leader sequence joined to 3' ORFs at known TRS sequences, we could identify between 161 and 142,999 reads coming from subgenomic RNA in the full Ampliseq panels (See Source Data). Consistent with the mini-panels, reads mapping to subgenomic ORF7a and N gene RNAs were the most abundant, accounting for 37.3 to 61.2% of all subgenomic reads
subgenomic RNAs, and therefore it estimated the relative abundance of these two RNAs higher than either of the two mini-panels. The remaining subgenomic RNAs were also detected (except for ORF7b and ORF10) but made up a much smaller fraction of the total subgenomic reads (11-23% of subgenomic reads). Like mini-panel 2, the full panel was also able to detect a low number of some non-canonical ORF 9b subgenomic RNAs

to suppress cellular mRNA splicing and translation 23 and result in apoptosis of human epithelial cells 24, and therefore we also looked for correlations between the amount of virus detected in the samples and the level of cellular RNA, as we had seen a negative relationship in our previous study 11. We first measured total RNA in the swab samples using RNA 6000 Pico chips on a bioanalyzer and assessed the appearance of ribosomal peaks in the extracted RNA. In the 30 swab samples for which we had sufficient RNA to perform the bioanalyzer analysis, we found that the quantity of extracted nucleic acid was highly variable (40-4816 pg/µl; see Source Data File), and clear peaks corresponding to ribosomal RNA were only visible in 16 of the 30 swab samples. As a result, the reported RNA integrity (RIN) scores varied between 1 and 8.4, with nine samples not reporting a RIN and only six with a RIN of 3 or greater (Source Data File). This indicated that there was some variability between swab samples in the amount of epithelial cells and cellular material collected.
The quality and quantity of ribosomal RNA was also highly variable and may have either been poorly preserved prior to arriving at our laboratory or reduced as a result of the SARS-CoV-2 infection. The bioanalyzer also identified that there had been some degradation of the extracted RNA from the swab sample from individual 2, which we had extracted as part of our previous study 11 and used again here. This may have explained the increase in this sample's SARS-CoV-2 PCR Ct's seen in this investigation as compared to our previous study 11.

We then examined whether there was a relationship between the total RNA in a swab and the PCR

between the RIN and the SARS-CoV-2 PCR Ct's, suggesting that the amount of SARS-CoV-2 PCR RNA (Ct) was independent of how intact the ribosomal RNA (RIN) was at the time of testing.

Analysis of the host response in swab samples

To look more closely at cellular mRNA in the swab samples, we used an Ampliseq panel designed to specifically amplify 683 targets from the mRNA of known human inflammatory genes and compared the relative expression of these genes across the samples. SARS-CoV-2 is known to provoke innate immune responses in epithelial cells 25 and is known to have general immune suppressive effects in patients 26, and therefore studying selected inflammatory genes could indicate the transcriptional activity of the upper respiratory epithelium and whether the cells present were responding to the virus. From previous work, we knew that the RNA extraction method we had employed here was able to extract both DNA and RNA from samples, and therefore the first step of this analysis involved running three selected no cDNA controls (two SARS-CoV-2 positive swabs (GC-11 and GC-21) and the negative pool 1) to identify potential gene targets that could amplify from genomic DNA as well as from host mRNA.
As most amplicons in the Ampliseq panel were designed to sit across an intron, theoretically only reads amplifying the cDNA of mRNA would amplify from both forward and reverse primers, and any reads generated from a single primer binding to genomic DNA would include exon or intragenic sequence at one of the ends of the read sequence. As the read mapping TMAP software would penalize any read with a large unmapped end when we mapped to the human transcriptome, we were able to establish a stringent read mapping requirement of a minimum alignment length of 74 nucleotides (the length of the shortest amplicon), and a MAPQ of 90, to filter out the reads generated from DNA targets (these were seen in IGV as mapped reads with large soft clipped ends and a MAPQ score of ~75 or lower depending on the amplicon). However, these settings alone could not filter all reads coming from DNA as the panel included 22 amplicons which did not span an intron, and therefore reads mapped to these amplicons could come from genomic DNA or mRNA. These 22 amplicons were therefore excluded from this analysis (see source data file).

We identified an additional 80 amplicons in which 10 or more reads were mapped in one or more of the no cDNA control samples, even at the stringent mapping requirements described above. We inspected these amplicons in IGV and found that for many of these 80 targets, the majority of the amplicon sat within a single exon of a gene, with essentially only one primer sitting in another exon. Consequently, reads coming from the genomic DNA which had their intron sequences trimmed by the IonS5XL server for base calling quality reasons were completely mapped within the amplicon region and therefore received a high MAPQ score from the TMAP software. We therefore removed these 80 amplicons from the analysis (see source data file).
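The two filtering steps described above (a per-read mapping filter, then removal of amplicons that still amplify in the no cDNA controls) can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study: the `MappedRead` record, the function names and the idea of representing each control as a dictionary of per-amplicon read counts are all hypothetical, and the MAPQ of 90 is assumed here to act as a minimum threshold.

```python
from collections import Counter
from dataclasses import dataclass

MIN_ALIGNED_LENGTH = 74  # length of the shortest amplicon (from the text)
MIN_MAPQ = 90            # assumed to be a minimum threshold
CONTROL_READ_LIMIT = 10  # reads in a no cDNA control that flag an amplicon


@dataclass(frozen=True)
class MappedRead:
    """Hypothetical minimal view of one mapped read."""
    amplicon: str
    aligned_length: int
    mapq: int


def passes_filter(read: MappedRead) -> bool:
    """Keep reads that meet both the alignment length and MAPQ requirements."""
    return read.aligned_length >= MIN_ALIGNED_LENGTH and read.mapq >= MIN_MAPQ


def count_passing_reads(reads):
    """Count reads per amplicon after applying the mapping filter."""
    return Counter(r.amplicon for r in reads if passes_filter(r))


def amplicons_to_exclude(control_counts, non_intron_spanning):
    """Return amplicons to drop from the expression analysis.

    control_counts: one {amplicon: filtered read count} dict per no cDNA control.
    non_intron_spanning: amplicons that cannot distinguish DNA from mRNA.
    """
    flagged = {
        amplicon
        for counts in control_counts
        for amplicon, n in counts.items()
        if n >= CONTROL_READ_LIMIT
    }
    return flagged | set(non_intron_spanning)
```

In this sketch an amplicon is excluded if any single control reaches the read limit, which mirrors the "one or more of the no cDNA control samples" wording in the text.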
As a result, the inflammatory gene analysis was restricted to 581 inflammatory gene targets.

low numbers of reads being mapped, we excluded any amplicons with an average of fewer than 10 reads across all the samples. With the remaining 469 inflammatory gene targets, the number of reads mapped to each gene was ranked highest to lowest and we then performed hierarchical cluster analysis to see if we could observe any difference in the pattern of inflammatory gene expression in the SARS-CoV-2 positive and negative swab samples. We then calculated the average number of reads mapped to each inflammatory target across all the swab samples, and then for each sample classified how many gene targets were above or below the average number of reads mapped across all samples to quantify overall cellular expression. The hierarchical cluster analysis identified one broad cluster of samples (Cluster 1), which included the SARS-CoV-2 negative samples, except for the swab from

all the samples in this study. This sample however also had a high SARS-CoV-2 Ct, low RIN value

Benjamini-Hochberg method using a false discovery rate (FDR) of 25% was applied to examine only those inflammatory genes with the largest differential expression between the three groups.
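For readers unfamiliar with the Benjamini-Hochberg procedure named above, the following Python sketch shows the standard step-up rule at an FDR of 25%. It is an illustration only: the function name `benjamini_hochberg` is hypothetical, the p-values in the usage note are invented, and this excerpt does not state which statistical test produced the p-values in the study.

```python
def benjamini_hochberg(p_values, fdr=0.25):
    """Return a list of booleans marking which hypotheses are rejected.

    Standard step-up rule: sort the m p-values, find the largest rank k with
    p_(k) <= (k / m) * fdr, and reject every hypothesis ranked k or lower.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose p-value falls under its rank-specific threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank

    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant
```

With the invented p-values `[0.13, 0.2]`, both hypotheses are rejected: the smaller value misses its own threshold of 0.125, but the larger one passes its threshold of 0.25, and the step-up rule then rejects everything ranked at or below it.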
We found no genes with significant differences in the pattern of relative read counts between all three groups,

respiratory epithelial cells 29, while DDX58 is understood to be able to bind to single stranded positive sense viral RNA and block the SARS-CoV-2 RNA dependent RNA polymerase from copying its genome 27, thus significantly reducing viral replication. The elevation of these two genes along with more positive sense SARS-CoV-2 RNA would be consistent with current/recent virus replication in the upper respiratory epithelium at the time of sampling in these individuals.

Reads mapped to NMI were similarly elevated in the early group, further indicating that the epithelial cells were responding and regulating the innate immune response at the time these swabs were taken. NMI is a regulator of the innate immune response and has been demonstrated to be part

The number of reads mapped to IRF7 was higher in most of the early clinical swab samples, but the expression levels did not drop off in the swabs collected later during the course of infection as sharply as DDX58 or IFIH1 mentioned above. IRF7 is an important transcription activator for the production of interferon in the innate immune response to some RNA viruses but, interestingly, IRF3 and IRF5 have been shown to be more important than IRF7 in regulating the IFIH1 induced interferon response in SARS-CoV-2 29, yet IRF7 was on average more elevated relative to IRF3 in the early clinical swabs compared to the late clinical/negative swabs in this study (IRF5 was not part of the Ampliseq inflammatory panel used here).
to attract and regulate neutrophil migration from the bone marrow into blood and into tissue such as the respiratory epithelium 33, 34. Given that C3AR1, FPR1 and TNFSF10 were elevated in the swab

The quantity of SARS-CoV-2 subgenomic RNA detected by PCR was related to the amount of total SARS-CoV-2 RNA present in the samples detected by the genomic RNA and diagnostic PCRs. We could consistently detect subgenomic RNA in samples with a genomic PCR Ct of 30 or less, but above this level the detection of these RNA molecules became inconsistent. The majority of SARS-CoV-2 RNA within a swab sample was genomic-length RNA and this was on average between 7 and 22 times more abundant than subgenomic ORF 7a RNA, and about 2 to 10 times more abundant than subgenomic N gene RNA in swab samples. Based on the probe-based PCRs, we estimate that on average, the ORF7a and N gene subgenomic RNA molecules could make up around 12.5-25% of the total SARS-CoV-2 RNA collected by a nasopharyngeal swab sample. These two subgenomic RNA molecules likely represent a large fraction of the subgenomic RNA present within a typical naso-pharyngeal sample, as both of the Ampliseq subgenomic mini-panels and the genomic sequencing SARS-CoV-2 Ampliseq panel consistently found that these two subgenomic molecules made up the largest percentages of all the subgenomic reads.
Based on these findings, we believe that assays detecting either the ORF7a or N gene subgenomic RNAs would be the most sensitive way of detecting SARS-CoV-2 subgenomic RNA, and that these two targets, simply because of higher abundance and not as a measure of current/active transcription, should be at detectable levels for longer than perhaps some of the other subgenomic RNA molecules in diagnostic samples.

individuals, including 7 from our previous study 11, were selected for this study (Table 1). Individuals were selected on the basis that 1) they had repeated swabs taken on multiple days, and/or 2) their initial swab had a Ct below 30. One sample from individual 4 (GC-11) used in our previous study was exhausted part way through this study, and so was only used for the inflammatory panel analysis

We had previously developed and reported SYBR/Syto 9 based PCR assays to detect genomic and subgenomic SARS-CoV-2 RNA in clinical samples 11. These assays, however, had limitations in their sensitivity and needed to be run individually as they all used a single fluorescent reporter SYBR/Syto 9 in the real-time PCR reaction. To improve the sensitivity of these assays, we designed four probe-based PCR assays, three of which were similar to our previous SYBR/Syto 9 based assays, and each amplified either the 5' UTR genomic RNA, the ORF7a subgenomic RNA only, or the total (genomic and subgenomic) ORF7a RNA, respectively, and a fourth PCR targeting the N-subgenomic RNA only.
Primer and probe sequences were designed based on the sequence of SARS-CoV-2 Wuhan-Hu-

of prepared cDNA from the samples were added to a PCR mastermix comprising 1x Brilliant Multiplex qPCR mastermix (Agilent), 1 µM of both the forward and reverse primer, 0.2 µM probe and nuclease free water to a total reaction volume of 10 µl. The PCR reaction was then performed in a Quantstudio 6 real-time PCR machine (Thermofisher). Known SARS-CoV-2 positive and negative swab samples and the SARS-CoV-2 cell culture sample were used to initially optimize the PCR temperature cycling conditions, with the optimum cycling temperatures determined to be 95°C for 10 min, then 40 cycles of 95°C for 3 sec, 58°C for 30 sec and 64°C for 30 sec.

After the successful amplification from known positive clinical samples, the resulting PCR product was visualized on a 2% agarose gel (2% Size Select E-gel, Thermofisher) to confirm a product of the expected size was produced. The amplicon was then extracted from the gel, prepared using the 3.1 Big Dye Terminator PCR sequencing reaction and sequenced on a Hitachi 3500 genetic sequencer (Thermofisher) to confirm that the amplicons were indeed the expected genomic and subgenomic targets. The gel purified amplicons were quantitated on a QiaXpert spectrophotometer (Qiagen, Victoria, Australia) and used as standards in serial dilution in the subsequent qPCR reactions, and in determining the efficiency of the PCR reactions. The PCR efficiency, estimated to be 95% for these assays, along with the difference in Ct values between the genomic 5'UTR and the subgenomic ORF7a and N assays, was used to calculate the ratio of genomic to subgenomic RNA within the samples.
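The ratio calculation described above can be sketched in Python as follows. This is a minimal illustration that assumes the usual relative quantitation relationship, in which each PCR cycle multiplies the product by (1 + efficiency); the exact formula used by the authors is not given in this excerpt. The function name `fold_ratio_from_ct` and the Ct values in the usage note are hypothetical.

```python
def fold_ratio_from_ct(ct_reference: float, ct_target: float,
                       efficiency: float = 0.95) -> float:
    """Estimate how many times more abundant the reference is than the target.

    Assumes both assays share the same amplification efficiency, so that
    ratio = (1 + efficiency) ** (ct_target - ct_reference).
    A target that crosses threshold later (higher Ct) is less abundant.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return (1.0 + efficiency) ** (ct_target - ct_reference)
```

For example, with an invented genomic 5'UTR Ct of 22.0 and a subgenomic ORF7a Ct of 25.5, `fold_ratio_from_ct(22.0, 25.5)` evaluates 1.95 to the power 3.5, which is roughly a 10-fold excess of genomic over subgenomic RNA. The same function applies to positive versus negative strand Ct values if the two strand specific assays are assumed to have equal efficiency.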
To determine if we could multiplex the individual assays and develop a single test which could quantitate both genomic and subgenomic RNA in a SARS-CoV-2 positive sample, we initially combined the 5' UTR and the ORF7a subgenomic assays into a duplex real-time PCR. This was performed by combining 1x Brilliant Multiplex qPCR mastermix (Agilent), 0.9 µM of the shared SARS-CoV-2 leader forward primer, 0.9 µM of each of the 5' UTR and ORF7a reverse primers, 0.18 µM of each of the 5' UTR and ORF7a probes, 2 µl of sample/control cDNA and nuclease free water to a final volume of 11 µl. This PCR reaction was run under the same cycling temperatures described above.

Once we determined that the 5' UTR genomic and ORF7a subgenomic assays could be successfully duplexed, we then attempted to create a triplex assay by incorporating the N gene subgenomic assay into the duplex. This would allow us to efficiently quantitate two subgenomic RNA molecules alongside the genomic RNA. Concentrations of 0.71, 1.43 and 2.86 µM of the common leader forward primer were evaluated as part of the optimization of this assay. The final PCR reaction for the triplex assay was set up by adding 1x Brilliant Multiplex qPCR mastermix (Agilent), 2.86 µM of the shared SARS-CoV-2 leader forward primer, 0.71 µM of each of the 5' UTR, ORF7a and N gene reverse primers, 0.14 µM of each of the 5' UTR, ORF7a and N gene probes, 2 µl of sample/control cDNA and nuclease free water to a final volume of 14 µl. The triplex assay was run in duplicate under the same temperature conditions as described above.
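The final concentrations quoted for the duplex and triplex reactions can be checked with a short Python sketch. The text reports only final concentrations, not the amount of each oligonucleotide added, so the picomole amounts below are an assumption inferred from those figures (for example, 10 pmol of a primer in 14 µl gives about 0.71 µM). The function name is hypothetical.

```python
def final_concentration_um(amount_pmol, reaction_volume_ul):
    """Return the final concentration in µM (1 pmol/µl equals 1 µM)."""
    if reaction_volume_ul <= 0:
        raise ValueError("reaction volume must be positive")
    return amount_pmol / reaction_volume_ul


if __name__ == "__main__":
    # Assumed amounts per reaction (not stated in the source).
    reactions = {
        "duplex, 11 µl": (11.0, {"primer": 10.0, "probe": 2.0}),
        "triplex, 14 µl": (
            14.0,
            {"leader forward primer": 40.0, "reverse primer": 10.0, "probe": 2.0},
        ),
    }
    for name, (volume_ul, components) in reactions.items():
        for component, pmol in components.items():
            conc = final_concentration_um(pmol, volume_ul)
            print(f"{name}: {component} = {conc:.2f} µM")
```

Under this assumption, the lower reverse primer and probe concentrations in the triplex assay reflect dilution into the larger reaction volume rather than a reduction in the amount added.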
Real time assays to quantitate positive and negative sense SARS-CoV-2 RNA.

During replication of SARS-CoV-2 in cells, the virus produces negative sense RNA to act as a template for the production of full length copies of the virus genome and the various subgenomic RNAs. We had previously detected negative strand RNA in three of eight diagnostic swab samples 11; however, the quantity of negative strand SARS-CoV-2 RNA was 150-fold and 20-fold lower than that of positive strand genomic and subgenomic RNA, respectively 11, and therefore we could only detect it in samples with a very high virus load (low Ct). To create potentially more sensitive assays to detect strand specific RNA in more SARS-CoV-2 samples, we attempted to adapt the probe-based PCRs described above to single step, strand-specific PCR reactions combining sense specific primer reverse transcription followed by PCR.

We initially used the SARS-CoV-2 48 hr cell culture, for which we had ample RNA available, for the development and optimization of the strand based assays. For each strand specific PCR, 2 µl of the RNA from a sample was denatured at 95°C for 3 min and then rapidly cooled on ice. The denatured RNA was then mixed with 1x Brilliant II qRT-PCR 1-Step QRT-Master Mix (Agilent), 1 µl of 10 µM primer (the PCR forward primer for negative sense cDNA synthesis and the PCR reverse

as expected with efficiencies of around 95% similar to the non-strand specific versions of these assays.

Once we were confident that we had developed an efficient set of strand specific assays, we focussed on the samples which likely had enough virus RNA load to allow detection of strand specific RNA targets. Fourteen SARS-CoV-2 positive swabs, the 48 hour SARS-CoV-2 cell culture supernatant, and the negative individual and pooled negative swab sample 1 were tested in the strand specific assays described above.
Some of the clinical samples had little RNA remaining at this point in the investigation, so all RNAs were diluted 1:4 in low TE prior to cDNA synthesis except for the cell culture, which was not diluted, and the sample from individual 3 (GC-13), which was diluted 1:6.

Due to the poor performance of the 5' UTR positive sense PCR, ratios of the amount of strand specific subgenomic RNAs were estimated by the delta-Ct between the 7a/N subgenomic assays and the 7a-total assay, which was used as a proxy for the amount of genomic RNA present in the sample.

astrovirus, canine parvovirus and canine papillomavirus in puppies using next generation

study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
show that both the ORF7a and N gene subgenomic, and the ORF7a total PCR Ct's also increase as the UTR Ct increases. Above a Ct of around 30 for the 5' UTR, the detection of subgenomic RNAs becomes less consistent/below levels of detection/quantitation, and the general pattern of Ct values following ORF7a total < 5' UTR < N gene subgenomic RNA < ORF7a subgenomic RNA becomes unreliable. (n=25 data points from 25 biological samples run once (Subgenomic N run twice)).

The sample results have been ordered by ascending values of the 5' UTR PCR Ct, and show that both of the subgenomic assays have a higher Ct, with the ORF7a subgenomic PCR being consistently higher than the N gene subgenomic assay, and both increase in Ct as the 5' UTR Ct increases. Above a 5' UTR Ct of 30, this becomes inconsistent due to high or non-detectable Ct values for the subgenomic targets. (n=24 data points from 24 samples run twice)
Figure 6. Relative percentage of each subgenomic RNA of all reads mapped to the SARS-CoV-2 subgenomic RNA for the Ampliseq mini-panel 1 in naso-oropharyngeal swab samples with 31 PCR amplification cycles. Subgenomic ORF7a and N gene RNAs were consistently the two most abundant subgenomic RNAs present in the samples. Subgenomic ORF7b and ORF10 were not detected in any sample. (n=12 from 12 biological samples run once)
mucosa to SARS-CoV-2 infection, as represented by the swab samples included in our study, we first measured the quantity and quality of the extracted nucleic acid using Agilent RNA 6000 Pico Chips on an Agilent 2100 bioanalyzer (Agilent Technologies

Kendall Rank correlation analysis on the total nucleic acid quantity (pg/µl) and the RNA Integrity Number (RIN) and the Ct's of the non-strand specific single target and triplex PCR assays, and the strand specific assays

Exploring the cellular gene expression in the naso-oropharyngeal swab samples

We then used the Ion Ampliseq RNA Inflammation Response research panel

Australia) to look at the host inflammatory response in more detail. This designed primer panel amplified targets within 683 human genes associated with immune/inflammatory responses

All swabs from individuals from which two or more swabs were collected on different days, were included in this analysis. Two SARS-CoV-2 negative individuals and four pools of SARS-CoV-2 negative

References

Strand-Specific Reverse Transcription PCR for Detection of Replicating SARS-CoV
SARS-CoV-2 Total and Subgenomic RNA Viral Load in Hospitalized Patients
The Impact of Universal Mask Use on SARS-COV-2 in Victoria
Incubation period of COVID-19: a rapid systematic review and meta-analysis of observational research
Inactivation of RNA Viruses by Gamma Irradiation: A Study on Mitigating Factors
Altered Subgenomic RNA Expression in SARS-CoV-2 B.1.1.7 Infections. bioRxiv
SARS-CoV-2 ORF9c Is a Membrane-Associated Protein that Suppresses
SARS-CoV-2 ORF9b inhibits RIG-I-MAVS antiviral signaling by interrupting K63-linked
FPR-1 is an important regulator of neutrophil recruitment and a tissue-specific driver
Identification and characterization of a new member of the TNF family that induces apoptosis
Trim21 regulates Nmi-IFI35 complex-mediated inhibition of
A molecular pore spans the double membrane of the coronavirus replication organelle
Infection of human Nasal Epithelial Cells with SARS-CoV-2 and a 382-nt deletion isolate lacking ORF8 reveals similar viral kinetics and host transcriptional profiles
In vivo restitution of airway epithelium. Cell
COVID-19
Evolution of antibody immunity to SARS-CoV-2
PANTHER version 16: a revised family classification, tree-based classification tool, enhancer regions and extensive API

Negative pools 1 and 2 were made from SARS-CoV-2 negative individuals tested during the first wave of SARS-CoV-2 infections. Negative pools 3 and 4 were made from negative individuals tested during the second wave of SARS-CoV-2 infections.

Supplementary Materials: The following is available online; Supplementary Information file 1 including Supplementary Tables 1-2 and Supplementary Figures 1-13

cellular mRNA transcripts. However, it was apparent from the no cDNA controls that a number of amplicons could be readily amplified from cellular DNA in the samples. Therefore, these amplicons were excluded from subsequent analysis in all libraries. The excluded amplicons are shown in the source data file. The mapping of reads to the remaining 592 inflammatory gene targets was visualized in IGV to find the most appropriate settings for the Coverage Analysis plugin to count the number of reads mapped to each amplicon's target. At low mapping quality scores, reads spanning a target but with additional sequence were visible.
These could have been products from single primer amplification of cellular DNA. Therefore, a minimum alignment score of 74 (the length of the shortest amplicon excluding primers) and a mapping quality score of 90 were settled upon for quantitation of amplicon reads. At these settings, any reads not spanning the full amplicon, or reads with additional sequence beyond the target (i.e. exon sequence from a single primer amplification) were filtered out from the target read counts.

The read count data from the samples was normalized using Python to calculate the reads per million

CoV-2 sequencing panel and for help with the custom Ampliseq mini-panels used. We acknowledge colleagues at the CSIRO Australian Centre for Disease Preparedness, especially Dr