key: cord-1049361-n4bf3750 authors: Lindley, Robyn A.; Steele, Edward J. title: Analysis of SARS‐CoV‐2 haplotypes and genomic sequences during 2020 in Victoria, Australia, in the context of putative deficits in innate immune deaminase anti‐viral responses date: 2021-09-30 journal: Scand J Immunol DOI: 10.1111/sji.13100 sha: 8a0d83557fa406b552f582657190536824aec473 doc_id: 1049361 cord_uid: n4bf3750 The SARS‐CoV‐2 epidemic infections in Australia during 2020 were small in number in epidemiological terms and are well described. The SARS‐CoV‐2 genomic sequence data of many infected patients have been largely curated in a number of publicly available databases, including the corresponding epidemiological data made available by the Victorian Department of Health and Human Services. We have critically analysed the available SARS‐CoV‐2 haplotypes and genomic sequences in the context of putative deficits in innate immune APOBEC and ADAR deaminase anti‐viral responses. It is now known that immune impaired elderly co‐morbid patients display clear deficits in interferon type 1 (α/β) and III (λ) stimulated innate immune gene cascades, of which APOBEC and ADAR induced expression are part. These deficiencies may help explain some of the clear genetic patterns in SARS‐CoV‐2 genomes isolated in Victoria, Australia, during the 2nd Wave (June–September, 2020). We tested the hypothesis that predicted lowered innate immune APOBEC and ADAR anti‐viral deaminase responses in a significant proportion of elderly patients would be consistent with/reflected in a low level of observed mutagenesis in many isolated SARS‐CoV‐2 genomes. Our findings are consistent with this expectation. The analysis also supports the conclusions of the Victorian government's Department of Health that essentially one variant or haplotype infected Victorian aged care facilities where the great majority (79%) of all 820 SARS‐CoV‐2 associated deaths occurred. 
The implications of our data analysis for other localized epidemics and efficient coronavirus vaccine design and delivery are discussed.

human populations.
This is particularly the case for the elderly co-morbid subgroup, where the deficits in innate immunity are clearly evident 4-8 as well as compromised adaptive immunity. 9, 10 Type I and type III interferon (IFN) inducible anti-viral immunity is particularly affected. 11 Longitudinal studies show that patients in this subgroup appear 'immune defenceless' to coronavirus respiratory tract infections and are at a very high risk for severe outcomes including death. 11 Important reviews on the special susceptibility of this aged co-morbid group have appeared. 10, 12 An integrated schematic of the innate immune deficits in relation to adaptive immune responses and severity of viral replication in healthy and susceptible human subjects has been published by Sette and Crotty; 10 it clearly shows the barrier of induced innate immunity which normally quells viral replication in infected healthy subjects in the first few days of infection. Such a picture is completely consistent with a very large population-level observational study in Denmark (4 million PCR-tested individuals) showing that the elderly are more often immunologically vulnerable to reinfection with SARS-CoV-2. 13

We have started the process of investigating exactly how SARS-CoV-2 varies genetically to gain insight into more effective vaccine design. In a previous study, we analysed the full-length genomic sequences generated over the first three months of the pandemic from late December 2019 through January 2020 in China (mainly Wuhan), then via the hot spot zones in Spain, some early sporadic outbreaks in California and Washington State (through February 2020) and then to the large-scale explosive outbreak in New York City in mid to late March 2020.
14 In that study, we used a haplotype rather than the more mainstream phylogenetic tree approach to the temporally emerging SARS-CoV-2 sequence data. [15] [16] [17] The haplotype approach directly compares the full-length test sequence aligned against the Hu-1 reference. This type of comparison provides insights into the putative APOBEC and ADAR deaminase-driven cytosine-to-uracil (C > U) and adenosine-to-inosine (A > I/G) variation mechanisms per se, which are not necessarily obvious or available with the phylogenetic approach of the Pangolin classification system [15] [16] [17] (which generates putative person-to-person or P-to-P lineages, clusters and clades). Indeed, RNA sequence haplotypes, in our view, are functional strings implying stable RNA secondary structures required for maximal replicative efficiency in that host's genetic/biochemical background. 14 The other implication is that new vaccine designs and therapies may need to specifically target functionally important secondary structures with the presumptive replicative stability of the viral RNA genome (eg reviewed in Robson et al 18).

Our detailed haplotype and sequence analyses reported here further extend the main findings from the previous work 14 but now within a small, well-defined geographical urban setting. The main findings show how a single viral clone or haplotype of SARS-CoV-2, either unmutated or lightly mutated, can dominate a local epidemic and cause considerable health impacts on elderly citizens who are putatively immune defenceless. The person-to-person (P-to-P) spread of the same haplotype clone in closed institutions and between institutions was most likely affected by more healthy asymptomatic carriers of the disease.
We are thus testing the hypothesis that, due to expected deficits in type I and type III IFN inducible innate anti-viral immunity 11 involving activated APOBEC and ADAR deaminase expression, 19-21 there will be a lowered level of observed mutagenesis in SARS-CoV-2 genomes isolated from such patients. As we discussed earlier, 14 the anti-viral properties of the induced innate immune APOBEC and ADAR deaminases are well documented in the literature. They are powerful viral genome mutators causing predominantly transition mutations such as C > T/G > A and A > G/T > C single nucleotide variations (SNVs) in single-stranded and double-stranded viral RNA and DNA genomes. [21] [22] [23] [24] [25] Such mutations at some deaminase sites can be pro-viral. 21 On occasion, such mutators can also go off target, affecting host genomes and thus contributing to cancer mutagenesis, as recently reviewed by Lindley. 26

Here, we report on an exhaustive analysis of the levels of mutagenesis in full-length SARS-CoV-2 genomes that have been, due to the mass testing coverage in Victoria, isolated mainly from putative elderly infected patients (and their carers and tending health professionals according to media and government reports). The aim is to assess the fit with the hypothesis that predicted lowered innate immune APOBEC and ADAR anti-viral deaminase responses in a significant proportion of elderly patients would be consistent with, or reflected in, a low level of observed mutagenesis in many isolated SARS-CoV-2 genomes. The data, in an epidemiologically somewhat unique cohort (isolated island community, excellent biomedical research environment, first class public health response system), are consistent with our stated hypothesis.

from a known source or from an unknown source have been recorded by the Government of Victoria at the Department of Health website https://www.dhhs.vic.gov.au/victorian-coronavirus-covid-19-data and in the Herald Sun newspaper. Figures 1, 2, 3, 4 and 5, Table 1, and Supporting Information Files A and F summarize these infection data. Clearly, as Table 1 shows, most people who died in Victoria on contracting SARS-CoV-2 (97.8%) were over 60 years, of median age 80-89 years, and these were concentrated in aged care facilities (Supporting Information File F). However, not all infected resident cases in aged care died; those who died were a significant yet variable fraction across a range of institutions, approximately 30%-40% of resident cases (Supporting Information File F).

b. Herald Sun newspaper: A full page of SARS-CoV-2 infection statistical data and a detailed Victorian State map of case incident summaries (from mid-July to October 28) appeared each day in this Melbourne daily tabloid newspaper. Many of these pages were preserved and a scanned (pdf) digital record created. Examples for late July and early August (just before hard Stage 4 lockdowns began on August 2) are shown in Supporting Information File A. Data in this newspaper record were used to construct the summary in Figure 2 from data in Supporting Information File C. This page of the newspaper also had a story of the day discussing the main challenges which the health authorities confronted. In our considered view, the most important reports, and there were many, were those on the very high incidence of 'Community Transmissions'. These cases, which are defined by rigorous epidemiological contact tracing and genomic sequencing linked to a known cluster, are defined as SARS-CoV-2 infections acquired from an unknown source (Figures 2, 3

a. GenBank/NCBI Virus. This publicly accessible site is hosted at The National Center for Biotechnology Information (NCBI) at the National Institutes of Health (NIH), Bethesda, Maryland, USA, which curates SARS-CoV-2 sequences at https://www.ncbi.nlm.nih.gov/sars-cov-2/#nucleotide-sequences.
This was also used for all sequences haplotyped and analysed in Steele and Lindley. 14 As before, the simple and straightforward NCBI Virus alignment tool at the site was used for all sample alignments against the original Wuhan Hu-1 reference sequence (NC_045512.2). The data for the haplotyping screens of 96 alignments of variable size can be found in the Supporting Information File B. It has sheets showing the alignment number and sequence IDs; the alignments grouped by collection period from patients; a summary of numbers of various haplotypes versus collection dates; tables showing accumulated haplotype numbers over different collection times. An Excel table and histogram plots (see Figures 6 and 7) showing the emergence of the almost clonal L241f.1vic haplotype over time versus the related L241f.1 haplotype in collections from early March 2020 to early September 2020 were assembled from data in Supporting Information Files B and D and are also presented in Supporting Information File E. An important caveat is that there is no metadata available in the public domain linking a particular patient's clinical/epidemiological data to a given GenBank Accession number for a full-length SARS-CoV-2 genomic sequence. All associations are thus inferred from the sheer dominance of all main SARS-CoV-2 case data (and deaths) amongst the elderly and within aged or nursing care facilities (Supporting Information File F). Inquiries in writing were made to The Victorian Department of Health and The Peter Doherty Institute, and these were unsuccessful in securing metadata information.

FIGURE 1 New SARS-CoV-2 cases per day recorded in Victoria, Australia, during 2020. These data can be accessed at https://www.dhhs.vic.gov.au/victorian-coronavirus-covid-19-data

FIGURE 6 Haplotype numbers by collection period in first screen from data at NCBI Virus. The primary summarized data are in Supporting Information File B, and these plots are from Sheet S4

where the full patterns of SNV variability across many collection time points can be inspected in that wider data set. The alignments discussed here were preserved as screen shot records of each NCBI Virus alignment for later construction of Variable Site Diagrams, or VSDs, which clarify haplotype analysis as discussed 14 (Supporting Information File D). The putative impact of SNV changes on protein structure and function was qualitatively assessed as in Steele and Lindley 14 using the colour code for amino acid conservation or change in Supporting Information File D. The VSDs were sorted by sub-variants of the L241f.1 haplotype and also by sequences that were clearly unmutated, usually grouped at the bottom of each VSD figure. In this way, lightly mutated haplotypes can be easily identified, and P-to-P transfers and P-to-P groups can also be readily identified. The SNV patterns from VSDs for the L241f.1vic haplotype are summarized in Table 3.

An independent check was kindly conducted by Dr Jared Mamrot on the above haplotype screen and sequence alignment analyses on Victoria, Australia SARS-CoV-2 sequence data at NCBI Virus. The catalogue of Victorian SARS-CoV-2 sequences at the GISAID db (with GenBank ID links) was downloaded and screened for haplotype numbers using the haplotype markers defined in Table 2 at given collection dates spanning the 2nd wave, June 1 to September 30. Specifically, all SARS-CoV-2 viral genomes were obtained from GISAID (https://www.gisaid.org/) on 15 March 2021 and from GenBank (https://www.ncbi.nlm.nih.gov/sarscov-2/) on 23 March 2021. FASTA sequences with 'Vic' in the identifier (case-insensitive search) were extracted from both databases, combined, and duplicates were removed.
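The extraction step just described ('Vic' in the identifier, case-insensitive, both databases combined, duplicates removed) might be sketched as follows. This is only an illustration under stated assumptions, not the script used for the independent check: the function names and the toy FASTA headers are hypothetical, and treating records with an identical identifier as duplicates is an assumption, because the text does not say how duplicates were defined.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def victorian_records(*fasta_texts):
    """Keep records whose identifier contains 'vic' (any case), first copy only.

    Assumption: a duplicate is a record whose identifier was already seen.
    """
    seen, kept = set(), []
    for text in fasta_texts:
        for header, seq in parse_fasta(text):
            if "vic" in header.lower() and header not in seen:
                seen.add(header)
                kept.append((header, seq))
    return kept


# Toy inputs with made-up identifiers standing in for the two downloads.
gisaid_text = ">hCoV-19/Australia/VIC1/2020\nACGT\nAC\n>hCoV-19/Australia/NSW1/2020\nGGGG\n"
genbank_text = ">hCoV-19/Australia/VIC1/2020\nACGTAC\n>SARS-CoV-2/human/AUS/vic2/2020\nTTTT\n"
combined = victorian_records(gisaid_text, genbank_text)
```

A substring match on 'vic' is deliberately broad, so in practice the retained identifiers would still need a manual check for false hits.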
Sequences collected between 01/06/2020 and 30/09/2020 were extracted (n = 12,009) and aligned to the NC_045512.2 SARS-CoV-2 reference genome using MAFFT (v7.475; https://mafft.cbrc.jp/alignment/software/) with options '--6merpair --keeplength --addfragments'. Haplotypes were defined according to 19 genomic positions, relative to the reference genome, as listed in Table 2. Nucleotides at these positions were identified for each sample, collated and summarized. These data are in Supporting Information File G and, for further ease of principal haplotype identification and analysis, were sorted first by column T (p.28881-3), then by column F (p.7540), then by column C (p.1059) and then by column D (p.3037).

At time of writing, we are in the process of forming a translation table for the main Steele-Lindley (S/L) 'replicative' haplotypes (Table 2). Therefore, a number of PANGO lineages in this first small sample tested can be understood in terms of the sequence architecture of the main coordinated positional changes in the 29 903 nt Hu-1 RNA sequence (which is 'B' or 'L'). Evidently, PANGO is tracking human passaged variants in local global regions and they appear as Sub-Haplotype variants of the main Haplotypes we have recorded in the first three months of the pandemic. 14 In other NCBI Virus multiple alignment analyses of pairs or groups of the same PANGO variants collected at different times in different geographical regions, it is apparent to us that these PANGO variants are the end products of multiple P-to-P transfers accumulating largely deaminase-mediated 14 mutations at each transfer.
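The marker-based haplotype screen described in the methods above (nucleotides read at fixed reference positions, then collated and sorted) can be illustrated with a short sketch. It assumes that a length-preserving alignment such as MAFFT's '--keeplength' leaves every aligned sequence in reference coordinates, so that string index p - 1 corresponds to reference position p. The function names are hypothetical, the toy sequences are not real genomes, and the full 19-position panel of Table 2 is not reproduced here.

```python
from collections import Counter

# Sort order named in the text (p.28881-3, p.7540, p.1059, p.3037).
# Assumption: the p.28881-3 triplet is represented by its first position.
TEXT_SORT_ORDER = [28881, 7540, 1059, 3037]


def call_markers(aligned_seq, positions):
    """Return {position: nucleotide} for 1-based reference positions."""
    return {p: aligned_seq[p - 1].upper() for p in positions}


def haplotype_table(aligned, positions, sort_order):
    """One row per sample, sorted by the marker columns in sort_order."""
    rows = [(sample_id, call_markers(seq, positions))
            for sample_id, seq in aligned.items()]
    rows.sort(key=lambda row: tuple(row[1][p] for p in sort_order))
    return rows


def tally(rows, positions):
    """Count how many samples share each marker string."""
    return Counter("".join(calls[p] for p in positions) for _, calls in rows)


# Toy example: 10-nt "genomes" and three made-up marker positions.
toy_aligned = {"s1": "ACGTACGTAC", "s2": "ATGTACGTAC", "s3": "ACGTACGTAC"}
toy_positions = [2, 5, 9]
toy_rows = haplotype_table(toy_aligned, toy_positions, sort_order=[2, 9])
toy_counts = tally(toy_rows, toy_positions)
```

With real data, `positions` would be the Table 2 panel and `sort_order` could be `TEXT_SORT_ORDER`, provided those positions are included in the panel. Treating the first-named column as the primary sort key is an interpretation of the sorting described in the text.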
Estimates of the number of putative mutagenic P-to-P transfers away from the primary haplotypes reported in Steele and Lindley 14

The opportunity to gain a deeper insight into the SARS-CoV-2 genetic variation mechanisms and viral adaptation strategies during the pandemic was offered by the definitive sets of observations on SARS-CoV-2 case numbers in Victoria (mainly in Melbourne), Australia. The Victorian epidemics are scientifically definitive for several reasons:

1. The genesis of the two epidemics was clearly different: the 1st Wave was due solely to overseas travellers entering Australia from Northern Hemisphere infected zones, the 2nd Wave to infections clearly acquired in Australia (Figures 1, 3, 4, 5). This is a very important demarcation in the case data.
2. The health authorities implemented strict lockdowns and social distancing protocols, including mandatory mask wearing and mass PCR testing as well as quarantining of overseas travellers on arrival.
3. The health authorities identified hot zones all over the state and convinced tens of thousands of citizens to get PCR-tested on a mass scale whether showing symptoms or not (upwards of 30,000 PCR tests on oro-nasal swabs per day through May, June, July, August and September 2020).
4. Teams of contact tracers were actively coordinated throughout Melbourne and regions and were largely comprehensive despite some failures and much public political criticism of the regimentation of public behaviour involved.

As a consequence, a large amount of epidemiological data, PCR testing data and full-length genomic sequences were generated and largely placed in the public domain. This allows the comprehensive analysis and conclusions we record here.
These observations and genomic sequence data are represented in 2020 by the main 1st and 2nd Wave SARS-CoV-2 epidemics: the 1st Wave in March-April (with a May blip in new cases and secondary transmissions) and the 2nd Wave beginning mid-June through July, August and September (Figure 1). Thus, many of the full-length genomic sequences for these epidemics (≥12,000) are publicly available for direct analysis using the alignment tools at the NCBI GenBank/NCBI Virus website. Moreover, whilst these two epidemics in Australia have been very controversial and caused much political dysfunction in Victoria, 27 the epidemics nevertheless have features that allow succinct scientific analysis not necessarily available to Northern Hemisphere infected zones (where the great bulk of global infections have occurred and continue to occur at the time of writing). The Results section thus highlights the statistics of the SARS-CoV-2 cases through 2020, the complete haplotyping of all publicly available SARS-CoV-2 genomes, and a detailed comparative full-length sequence analysis of all L241f.1 genomes collected in March, April, May, June and sample sets selected at different times through early, mid and late July, and through August and September (all these alignments are summarized as VSDs in Supporting Information File D, with SNV summary in Table 3).

The pandemic in Australia began with travellers into Australia from overseas infected zones, beginning in late January 2020 and continuing through to the end of April and also later times into May 2020. 28 The numbers of New Cases per Day are shown in Figure 1. The main 1st and 2nd Waves are shown: March-April, then from June through September and early October. There is a slight blip in May which can be considered, for operational purposes, the tail of the 1st Wave.
The 1st Wave was generated solely by infected travellers into Melbourne, and the 2nd Wave from mid-June through to the end of October was generated almost exclusively by new cases acquired in Australia. About 79% of all cases acquired in Australia (ie Victoria) in the 2nd Wave were by contact with an identifiable SARS-CoV-2 'case' or by assignment to a known SARS-CoV-2 genomic sequence cluster. Many, however, were acquired from an unknown source and could not be assigned by contact tracing or linked to a known cluster by genomic sequencing (Figures 2-5). Further insight into these case categories is provided at the Victorian Department of Health website in the data displayed in Figures 3-5 for March, April, May, June, July, August and September. This is a very informative set of plots based on recorded daily Acquired Cases in

TABLE 2 Haplotypes and main sites defining SARS-CoV-2 common strain variants updated and revised for Victoria -Corrected 30. Notes: p.11080/83 is a pan-Haplotype marker (G/T) of putative 8oxoG non-deaminase modification at a WG site that can occur across haplotypes. Key changes in italic to focus on for Victorian 2nd wave analyses.

The Figure 1 during this period). All of the confirmed contact cases were associated with known putative cases in aged care and nursing facilities, which dominated the 2nd Wave. These case and death data, for residents, health workers and carers, can be viewed in Supporting Information File F. These foci of explosive hotspots, on the data available at GenBank/NCBI Virus, putatively fuelled the spread of the main infective clones of the dominant L241f.1vic haplotype variant (Table 2). The data in Figures 3-5 are significant because they attest to the robustness of the contact tracing and genomic sequence assignment system in place in Victoria.
The numbers per day were small enough for the contact tracing teams, coupled to SARS-CoV-2 genomic sequencing at The Peter Doherty Institute, to establish or eliminate links of the patient to known clusters. We can deduce this because, as the clones of L241f.1vic infecting aged care facilities were being amplified, their sheer spiking numbers (see Figure 1, particularly during July) reduced the apparent proportion of cases with an unknown source, which, as discussed, remained remarkably constant with an array of haphazard spikes. It can be seen in the 'Mystery Case' data for August 2020 as recorded in the media by the Herald Sun newspaper (Supporting Information File C).

The positional SNV markers for the first haplotype screen of 12,798 genomic sequences at NCBI Virus (versus the Hu-1 reference) were (Figure 6. It is clear that there is considerable haplotype diversity through March and April. These haplotype patterns are consistent with the lineage and cluster patterns reported in Seemann et al 28 and in the sample we reported in Steele and Lindley. 14 For our immediate purposes here, note that the L241f.1 haplotype was prominent, but not dominant, in March 2020 and was at low frequency in late April and into the first week of May. These markers, defining the L241f.1 sequence, then picked up sequence signals at low frequency from May 8 to 13 and maintained at low numbers right through to mid-June. Notice that by mid-June, most of the other haplotypes typical of the travellers entering Australia had dropped to low or undetectable numbers as numbers of international flights arriving in Australia were reduced. Then from the third week of June, right through to the end of the collection period (Oct 8), the L241f.1 haplotype (as defined Figure 7).

TABLE 3 SNV frequency and P-to-P spread among L241f.1vic haplotype sequences 5 June through 8 September 2020
We have no clear sequence-linked metadata, but a plausible assumption is that these were causally associated with the daily reported flare-ups of case numbers (see spikes in Figure 1, particularly through July) in the many Victorian SARS-CoV-2 case hotspots at aged care facilities across the inner and outer Western (and some Eastern) suburbs of Melbourne (Supporting Information File F). The flare-ups in late June-early July are consistent with the well-publicized outbreaks in Melbourne in the nine tower blocks in the north west of the city in Kensington. All these high population density towers, with many three generation families from migrant communities, were '.. put into "hard" lockdown on 4 July, with 3000 residents told not to leave their homes for five days. State police was brought in to ensure compliance, and as community transmission continued to grow, lockdown measures were applied across Melbourne'. 29

We then set out to establish whether or not the set of SARS-CoV-2 full-length sequences of the L241f.1 haplotype, as defined by the markers in Table 2, is the same basic genomic sequence throughout the 1st and 2nd Waves. All the L241f.1 haplotypes available from March, April, May, June and at later sample collection times in July, August, September (we sampled groups across representative collection times as the numbers there were very large) were aligned against Hu-1, the sequence alignments were presented as VSDs as shown in Supporting Information File D, and SNVs were analysed as summarized in Table 3 for these collection times. The full set of sample times is in Supporting Information File D, which shows a more detailed trend in the VSD patterns over time, particularly in the L241f.1vic haplotype that went on to dominate sequence uploads exclusively from mid-June.
The type of SNV change at a given VSD position relative to the Hu-1 reference sequence (C > T, A > G, T > C, G > A, G > T etc) can be deduced from the entries in the body of the table against the Hu-1 line at the top of each VSD figure (NC_045512). Note that a detailed compilation of the main types of SNV changes at APOBEC deamination C-sites and ADAR deamination A-sites for isolates Jan-Mar 2020 from Northern Hemisphere hotspot zones (Wuhan, New York City, West Coast USA, Spain) can be found in Steele and Lindley (2020). 14

The VSDs in Supporting Information File D show the sequence patterns of SNVs across the SARS-CoV-2 full-length L241f.1 genomes collected in March 2020. Note the striking sequence conservation at the RNA and at the amino acid level. In this sample, 12/33 [36%] are unmutated and many are lightly mutated (15/33 [46%]) with 1-2 SNVs per sequence. The sharing of sequences in the lightly mutated category can be seen. P-to-P transfers may well be evident in the unmutated set, but patient metadata would be required to establish that. These sequence data suggest that patients from SARS-CoV-2-infected Northern Hemisphere zones brought the unmutated L241f.1 sequence into Melbourne, and if they did transfer the sequence on by secondary P-to-P, the recipient subject on average introduced one or two new SNVs into the sequence. This was also observed repeatedly in our first report. 14 The smaller numbers of more heavily mutated sequences (eg MT451254 with 8 SNVs) may indicate the endpoint of possibly two P-to-P transfers, and we assume in healthy subjects with active APOBEC and ADAR deaminase mutator responses. In the same vein, some of the unmutated sequences could be primary or secondary P-to-P infections circulating in subjects with deficits in Innate Immune deaminase mutagenesis.
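The SNV bookkeeping behind these counts can be sketched in a few lines: compare each aligned sequence with the reference, record each substitution as 'ref > alt', and bin sequences by SNV count. This is an illustrative sketch only, not the authors' procedure (the VSDs were built from NCBI Virus alignment screenshots). The function names are hypothetical, the 1-2 SNV 'lightly mutated' bin follows the text, and the cut-off of three or more SNVs for 'more heavily mutated' is an assumption.

```python
from collections import Counter

# Transition types the text associates with APOBEC/ADAR deamination
# (C > T / G > A and A > G / T > C), written in cDNA letters (T for U).
DEAMINASE_TYPE_TRANSITIONS = {"C>T", "G>A", "A>G", "T>C"}


def call_snvs(ref, seq):
    """List (position, 'R>S') substitutions; gaps and ambiguous bases are skipped."""
    snvs = []
    for pos, (r, s) in enumerate(zip(ref.upper(), seq.upper()), start=1):
        if r in "ACGT" and s in "ACGT" and r != s:
            snvs.append((pos, f"{r}>{s}"))
    return snvs


def mutation_class(n_snvs):
    """Bin a sequence by SNV count (>= 3 as 'more heavily mutated' is assumed)."""
    if n_snvs == 0:
        return "unmutated"
    if n_snvs <= 2:
        return "lightly mutated"
    return "more heavily mutated"


def summarize(ref, seqs):
    """Count sequences per mutation class."""
    return Counter(mutation_class(len(call_snvs(ref, s))) for s in seqs.values())


# Toy example with a 10-nt stand-in for the reference.
toy_ref = "ACGTACGTAC"
toy_seqs = {
    "a": "ACGTACGTAC",  # identical to the reference
    "b": "ATGTACGTAC",  # one SNV
    "c": "ATGTGCGTAT",  # three SNVs
    "e": "ANGTACGTAC",  # ambiguous base only, so not counted as an SNV
}
toy_summary = summarize(toy_ref, toy_seqs)
```

Skipping ambiguous bases and gaps keeps low-coverage positions from inflating SNV counts, at the cost of possibly under-counting real changes.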
These sample VSD alignments (Supporting Information File D) capture the VSD patterns from March, April through to May 19, and reveal the emergence of the L241f.1vic sequence in collections on May 13 (unmutated or lightly mutated). There are other interesting features in these data, including sharing of sequences indicative of a number of P-to-P secondary transfers to at least 1 or 4 other recipient patients (close interacting groups such as a family?). Notice clear additional mutagenesis on secondary transfer in these groups, suggesting these recipients could be healthy subjects with functioning Innate Immune deaminase mutagenesis. The L241f.1vicm sequence identified did not flourish much further during June and early July (Supporting Information File D).

This sample alignment (Supporting Information File D) illustrates features already discussed and shows that the L241f.1vic sequence begins to dominate the sequences uploaded to NCBI Virus through June. Note that a residue of L241f.1 sequences, initiated as secondary transfers and thus derivatives from incoming overseas travellers (?), display sequences that are heavily mutated, in contrast to many other sets (both L241f.1vicm and L241f.1vic sequences). What is notable about the L241f.1vicm and L241f.1vic sequences is largely the absence of mutation, or evidence of lightly mutated sequences on P-to-P transfer (Table 3).

These sample alignments (Supporting Information File D) show a continuation of the trend discussed (Table 3). Thus, largely unmutated representatives of the L241f.1vic haplotype clearly dominate all uploaded sequence collections from infected subjects. Again, on secondary transfer of the unmutated sequence, one or two SNVs may be introduced, but most are not further mutated or are only lightly mutated (Table 3).

The alignments over these collection periods (Supporting Information File D) were for samples collected when the explosive epidemic was reaching its peak (Figure 1).
Whilst there are still many unmutated and lightly mutated L241f.1vic sequences (1-2 SNVs per sequence), there is also evidence of multiple P-to-P secondary transfers with a higher number of SNVs per sequence (Table 3); this could plausibly suggest transfers in closed aged care facilities to healthier medical and carer staff (or other visitors) in such institutions. This latter group dominate sets of cases, outnumbering infected residents by about fifty per cent (Supporting Information File F).

These alignments (Supporting Information File D) cover sequence collections as the epidemic was declining through August into September (right-hand side of Figure 1). The increased SNV density pattern per sequence has now become far more pronounced. Whilst there are clearly some unmutated L241f.1vic sequences, the great majority are now mutated, and secondary transfers in groups of size 2-4 are now evident in the data (Table 3). The dominance of clearly multiply mutated sequences is plausible evidence of P-to-P transmission into and from healthy subjects introducing Innate Immune deaminase-mediated mutations into the viral genome (which could also be introductions, or infections, into elderly co-morbid patients as well, but patient metadata would be required to identify such P-to-P transfers).

These alignments (Supporting Information File D) cover sequence collections as the epidemic was in evident decline through August into September (right-hand side of Figure 1). The patterns of SNV and P-to-P transfers are very similar to those discussed above in Section 3.3.6 July 28 to August 23 (Table 3).

The data just presented are either comprehensive or were selected to be representative and illustrative of the SARS-CoV-2 sequence changes in Victoria in the 1st and 2nd Waves (as made public at the NCBI Virus website). The wider set of SNV patterns showing this clear quantitative and qualitative trend in the VSD plots can be found in Supporting Information File D.
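One way to picture how shared SNV patterns flag putative P-to-P groups in a VSD is to bucket sequences by their exact set of SNVs. The sketch below is a hypothetical illustration and not the procedure used for Table 3: the function name and sample identifiers are made up, and sharing an identical SNV profile is treated only as a hint of a common transfer chain. Unmutated sequences are left ungrouped because, as noted in the text, patient metadata would be needed to link them.

```python
from collections import defaultdict


def shared_profile_groups(profiles, min_size=2):
    """Group sequence IDs that carry exactly the same non-empty set of SNVs.

    `profiles` maps a sequence ID to a list of (position, 'R>S') tuples.
    Returns a list of sorted ID lists, ordered by their first ID.
    """
    buckets = defaultdict(list)
    for seq_id, snvs in profiles.items():
        if snvs:  # unmutated sequences cannot be linked without metadata
            buckets[frozenset(snvs)].append(seq_id)
    groups = [sorted(ids) for ids in buckets.values() if len(ids) >= min_size]
    return sorted(groups)


# Toy profiles with made-up identifiers and positions.
toy_profiles = {
    "s1": [(2, "C>T")],
    "s2": [(2, "C>T")],
    "s3": [],
    "s4": [(5, "A>G")],
    "s5": [(2, "C>T"), (7, "G>A")],
}
toy_groups = shared_profile_groups(toy_profiles)
```

Exact matching is a strict criterion: a recipient that adds one further SNV on transfer (like the hypothetical "s5") falls outside the group, so nested profiles would need a separate subset comparison.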
The clear trend from mid-June to end of July is the amplification of the L241f.1vic haplotype clone virtually unmutated (or very lightly mutated), as summarized also in Table 3.

The uploaded sequence data at NCBI Virus for collections in Victoria, Australia, whilst substantial, are clearly biased and do not report the complete numbers of known full-length genomes, estimated to be approximately 3500-4000 'Community Transmissions' or 'Mystery Cases' described above. For the period covering the entire 2nd Wave June 1-September 30, the numbers of these acquired cases from an unknown source totalled 3571 (from data in . That is, we do not know the extent of the types of haplotype diversity (see Table 2) or mutational loads in such haplotypes of such subjects, nor do we have information on such 'Mystery Cases' amongst residents and other cases in aged care facilities (see Supporting Information File F). The clear biased deficit in numbers of putative mystery case SARS-CoV-2 genomic sequences, which would have differed significantly in haplotype sequence from the main amplified L241f.1vic haplotype (unmutated or lightly mutated), requires further comment. Either these collections were not sequenced, that is, the failure of conventional contact tracing to identify a confirmed case contact X was considered sufficient evidence. Alternatively, they were routinely sequenced, given the scale of the sequencing operation at The Peter Doherty Institute. This seems more plausible, as all public commentary through June, July and August, including testimony by the lead epidemiologist at the Victorian Department of Health (Dr Charles Alpren) to the formal government inquiry into aged care outbreaks, 27 directly implied that mystery cases were full-length genomes sequenced to formally exclude them from the main known hot spot clusters in aged care. They were not made publicly available to GenBank/NCBI Virus.
We can only speculate that they were not made available because they were considered 'uninteresting sequences' in explaining the wild-fire spread of the main SARS-CoV-2 'clone' (L241f.1vic) through Victorian aged care and nursing facilities. This is a possible yet speculative explanation given the medical emergency at the time. Indeed, as reported in the British Medical Journal, Professor Ben Howden, director of the Microbiological Diagnostic Unit Public Health Laboratory at The Peter Doherty Institute, told the government inquiry 29 in mid-August 2020 '..that, of the 1837 cases of local transmission that have been sequenced since 8 May, 99.8% came from three clusters, one from the Rydges Hotel and two from the five star Stamford Plaza Hotel'. 29 We can plausibly associate the major L241f.1vic haplotype with the Rydges Hotel cluster, and the other minor haplotypes through May and June with the L241f.1vicm, L241a, L241c and L241f.1 haplotypic variants (Figure 6, Supporting Information Files B, D and section e. below, analysis of GISAID db available sequencing data). Yet every day throughout July and August official public references were made (by the Premier of Victoria, Hon. Daniel Andrews, and Chief Health Officer Professor Brett Sutton) to the continuing high numbers of mystery cases, as it was feared they were also contributing by allowing transmissions into aged care facilities, causing infection cluster flare-ups (the spikes in daily case numbers seen in Figure 1). An independent check at the GISAID db (and GenBank db) on total Victorian sequences and their haplotypes (Table 2) was conducted by Dr Jared Mamrot on the sequences uploaded by The Peter Doherty Institute for collections in the period June 1 through September 30. The magnitude of sequence numbers established in the analysis at NCBI Virus and the dominance of the L241f.1vic sequences in June, July, August and September were confirmed.
In addition, the sporadic occurrences of other minor haplotypes in June 2020 are consistent with our above analyses. We have concluded that genomic sequences from the cases with an unknown source, that is, the expected large numbers of sequences quite different from the dominant L241f.1vic sequence, were not present in the GISAID repository at the time of submission of this paper. The haplotyping analysis is in Supporting Information File G. In this separate analysis, all Victorian GISAID and associated GenBank ID sequences (at both low and high coverage) allow estimates of the extreme dominance of the L241f.1vic haplotype (>99% of all sequences collected) and of the occurrence of other minor variants and haplotypes: the 18 nt in-frame deletion variant of L241f.1vic, from p.28879, removing the p.28881-3 marker (Supplementary Information File B); the minor L241f.1vicm variant (n ~ 214) detected in May and June in GenBank/NCBI Virus uploaded data, which did not further appear in high numbers through July and August in sequence data uploaded to both the GenBank/NCBI Virus and GISAID databases; and lower numbers of L241f (n ~ 5), L241fc (n ~ 41), L/Ln (n ~ 6) and S (n ~ 1).
In our considered view, as listed above, there are a number of scientific advantages in analysing the data on what exactly happened at the SARS-CoV-2 genomic sequence level during the 1st and 2nd Waves in Victoria, Australia. Thus, despite the extreme personal hardships, the SARS-CoV-2 epidemics in Australia enabled many types of observations based on a set of almost 'controlled conditions' or 'biomedical experiments of nature' (despite the biases in the data highlighted).
For example, as we discuss below, it allows a considered reflection on how the 'UK Mutant' first rose to dominance and why this publicized, putatively highly contagious SARS-CoV-2 mutant did not spread when it appeared in Australia in January 2021, brought in by multiple travellers from the UK into many different major Australian cities (Adelaide, Perth, Brisbane, Melbourne, Sydney). Using publicly available data and observations, we have recorded here some key genomic features that can be discerned of the 1st and 2nd Wave SARS-CoV-2 epidemics in Victoria, which involved mainly the city of Melbourne and some nearby regional towns and cities to the north west (Macedon, Mitchell, Bendigo, Ballarat) and south west of the city along Port Phillip Bay (Geelong, Colac), comprising upwards of 5 million people in a small area of land mass (as the greater eastern part of the state was basically virus free). The large numbers of publicly available SARS-CoV-2 genomes at the NCBI Virus database could be systematically analysed by the simple accessible alignment tools provided at that site. This has allowed us considerable insight into the mode of SARS-CoV-2 haplotype variation strategies and the major infection transmission mode of the virus in the Victorian aged care and elderly community. To summarize our main findings:
a. The severity and quantitative patient dominance of the Victorian 2nd Wave epidemic in 2020 was focussed almost exclusively on elderly citizens (Tables 1 and 4), presumed vulnerable and immune defenceless, as discussed already, due to clear deficits in first-line IFN-driven innate immunity. [4] [5] [6] [7] [8] [9] [10] [11] Many of these 'immune defenceless' elderly co-morbid citizens were concentrated in closed aged care communities (Supporting Information File F and Table 4).
Given that APOBEC and ADAR innate immune 'mutators' are activated by the same IFN type I and type III stimulated gene pathways, [19] [20] [21] [22] [23] [24] [25] [26] it is a logical priority to consider the level of their mutagenic impact on the SARS-CoV-2 clones being transferred within and between aged vulnerable groups (eg VSD analysis in Supporting Information File D).
b. It is clear that the L241f.1vic clone dominated the effective 'fire storm' of infections in the Victorian elderly. The spread of largely unmutated clones of this haplotype, particularly from the second half of June 2020 into late July 2020, is striking and suggests little or no APOBEC and ADAR mutagenesis on P-to-P transfers; this plausibly implies the infections were rampant in patients with clear deficits in IFN type I and type III stimulated gene pathways (see in particular the VSD patterns in Supplementary Information Files D and F, and the Table 3 summary). The data reported herein are thus consistent with the following P-to-P infection model, which is also the operational hypothesis under test: clusters of immune defenceless elderly co-morbid citizens in many aged care and nursing facilities were all systematically struck with devastating force (high infection rates and death rates) by a single unmutated (or lightly mutated) SARS-CoV-2 haplotype variant (L241f.1vic). Through late June, July, August and September in 2020, this putatively cloned variant must have been spread unimpeded by carriers who were asymptomatic or lightly symptomatic infected healthcare professionals and carers working across multiple aged care institutions.
[30] [31] [32] [33] The large-scale amplifications of the L241f.1vic variant, instanced by the size of the multiple 'New case' spikes (shown in Figure 1), particularly through July 2020, could have produced many trillions of L241f.1vic virions in each location, thus contaminating numerous surfaces (fomites, personal effects of all types), and could have contaminated or infected human carriers in each institution. This then fuelled the further putative quantitatively dominant rapid spread of this apparently capricious L241f.1vic variant into the local community and particularly to other aged care facilities, leading to further putative viral amplifications in elderly co-morbid subjects. If anything, the Victorian experience underlines why elderly co-morbid citizens require very special care, protection and therapies during cold and flu seasons. 31, 32 Such an approach now seems obvious for handling viral-induced respiratory crises in vulnerable aged care patients. There is a need for much further research on how to boost innate and specific adaptive immune defences in such vulnerable patients so as to achieve survival outcomes after infection (see the integrated summary of Sette and Crotty 10 and Lucas et al 11). The fact that the L241f.1vic variant remained largely unmutated or very lightly mutated on secondary transfer (1-2 SNVs, Table 3) implies that the main amplifying sources of viral replication in each aged care facility were patients with extreme deficits in first-line innate immune IFN type I and III dependent anti-viral immunity, see Figure 2C in Lucas et al 11. The levels of both APOBEC and ADAR host deaminase anti-viral mutators are normally induced by invading pathogens in a healthy cellular innate immune response.
[19] [20] [21] [22] [23] [24] [25] The lack of apparent mutation in the main transmitted L241f.1vic clone through June and July is consistent with the hypothesis that the virus was replicating almost unimpeded in immune defenceless elderly co-morbids. Indeed, many travellers into Australia carrying the L241f.1 haplotype variant in March-April 2020, and their presumed secondary transfer recipients, also display unmutated or lightly mutated sequences (Supporting Information File D). This suggests those incoming overseas travellers were probably also elderly immune defenceless patients. There is further support for this interpretation. Through August and into September, the incidence and frequency of mutations in mutated derivatives of the L241f.1vic haplotype increased significantly, a pattern consistent with P-to-P transfer and passage of the L241f.1vic clone through healthy asymptomatic carriers with intact innate immune systems and functioning APOBEC/ADAR mutators (Table 3, Supporting Information File D). Our further hypothesis is that the L241f.1vic variant, which emerged as an escape haplotype variant from a hotel quarantine site on May 8 (Rydges Hotel), 27 gained a low-level P-to-P foothold in the community for a month through late May and into mid-June, whereupon it was fortuitously and explosively amplified and spread on scale when introduced into closed groups of immune defenceless elderly co-morbid patients living in the Kensington Towers in north west Melbourne and in aged care facilities. 29 Through late June and into July, its large amplification factors and numerous cross-institutional spreading 'vectors' (asymptomatic family members, aged carers and health workers, and their presumed fomites) ensured it took over and spread quickly through all of Melbourne's and nearby regional aged care facilities.
The sequencing effort by the Victorian Department of Health in association with The Peter Doherty Institute, and the epidemiological interpretation, were thus focussed on the L241f.1vic haplotype to try to prevent its further spread. Our further related hypothesis is that the much publicized 'UK Mutant' (B.1.1.7), which emerged in the United Kingdom in the latter months of 2020, spread and came to dominance in much the same way as L241f.1vic in Victoria, Australia: as a capriciously amplified SARS-CoV-2 variant in closed communities of immune defenceless elderly co-morbid patients. The UK Mutant is of the L241f.1 haplotype (L241f.1uk) but differs from L241f.1vic, which has a smaller number (n = 7) of 'riboswitch' type key distinctive SNV markers that distinguish it from L241f.1 (Table 2); L241f.1uk has about 17-19 additional sequence changes from L241f.1, four being deletions and one a STOP codon (Rambaut et al 16). Apart from distinctive radical changes, all the other SNV changes in L241f.1uk appear conserved or benign, a picture consistent with all our sequence analysis here and in Steele and Lindley. 14 The advance news media notice on the 'UK Mutant' as it arrived with travellers into Australia in early 2021 described the variant as a 'highly contagious rapidly spreading mutant'. On arrival with travellers from overseas, it inevitably escaped quarantine into the community and into public airport domestic departure lounges in Adelaide, Brisbane, Perth and then Melbourne, potentially exposing many hundreds of Australian residents and other travellers. These escapes each caused severe hard Stage 4 lockdowns (2-5 days) of the entire city populations, and in Victoria and Western Australia of almost the entire State populations for 5 days.
Despite all the advance publicity and the hundreds of putative reported P-to-P contacts and opportunities for further P-to-P spreading (and further interstate spread by aeroplane via many more hundreds of putative contacts), the UK Mutant did not prove contagious and did not spread into the wider Australian community. 34 Clearly, further research is required to understand exactly why the 'UK Mutant' did not spread when introduced via multiple entry points into Australia in the first few months of 2021. One possibility is that it really was not that contagious in normal healthy people: it had been attenuated but behaved as a putative capriciously amplified variant in elderly co-morbids. That is clearly dangerous for that vulnerable group (which by chance were not present in large numbers in the public places the infected travellers were reported to have visited). In our view, what has unfolded with L241f.1vic, as documented here, is a clear putative exemplar. Our hypothesis is that the 'UK Mutant' (B.1.1.7) or L241f.1uk haplotype was a capricious P-to-P generated SARS-CoV-2 variant haplotype that was fortuitously amplified on scale to quantitative clonal dominance in and between aged care facilities in southern England from September 2020, much like the L241f.1vic haplotype in Victoria, Australia. The main difference between the otherwise similar countries (economically, culturally and in multi-ethnic composition) is one of population scale: the population differential between the UK and Victoria is about ten-fold and, given similar land areas, the population density in the UK would be far higher than in Victoria.
Given all these findings, what then of appropriate vaccine design? Apart from any safety or adverse reaction issues with the current set of spike protein mRNA expression vector vaccines, there are, in our opinion, three main issues to be addressed, two of which arise from our analysis.
These relate in general to the vulnerable target group to be protected, the immune defenceless elderly co-morbids, and the type of protective acquired immunity to be induced by any vaccine (the relevant vaccine data are under detailed review elsewhere: Gorczynski RM, Lindley RA, Steele EJ, Wickramasinghe NC. 2021 'Nature of acquired immune responses, epitope specificity and resultant protection from SARS-CoV-2', under submission). First, boosting innate immunity would seem a priority medical strategy in the vulnerable elderly, as for example BCG vaccination is expected to do for the lung infection tuberculosis. This could involve the use of IFNs to stimulate deaminase expression during the acute stage of infection. BCG is a safe and widely accepted non-specific activator of innate immunity, and it could contribute to elevated APOBEC and ADAR expression and thus mutagenesis of the coronavirus genome. This logically implies that strategies for boosting or elevating the innate immune responses by 'training' 7,12 may boost mutagenic APOBEC and ADAR expression levels and other anti-viral gene products. 19, 20 Second, it is still unclear whether the current crop of SARS-CoV-2 vaccines is inducing the appropriate class of acquired immunity, namely mucosal responses involving specific secretory IgA responses in nose, saliva and respiratory tract (A. Hapel, discussion and personal communication, February 2021). This is also confirmed by the April 20 2021 update report on vaccine efficacy from the US Centers for Disease Control. 36 A careful reading of that report suggests that none of the current 'jab in the arm' vaccines protect against catching SARS-CoV-2, yet they may moderate severity. This is consistent with prior historical immunology discoveries. It has been known for over 45 years that the best form of protective immunity against pathogens invading by the nasal or oral route is a local secretory IgA response.
37, 38 Therefore, at this juncture, it is not clear whether currently deployed intramuscular (im) parenteral immunization is the best approach for highly effective mucosal immunity against SARS-CoV-2, and the oral-nasal routes of vaccination should be considered. Indeed, recent informed analysis and commentary support this conclusion. 39, 40 Natural infections with SARS-CoV-2 (in recovered patients) would therefore be expected to induce protective dimeric sIgA mucosal immunity. Certainly, the recent longitudinal population-scale study in Denmark implies that prior infection with SARS-CoV-2 affords upwards of 80% protection against reinfection in the population under 65 years between the first and second major surges of SARS-CoV-2 in Denmark in 2020, with the protective rate in the elderly vulnerable group markedly lower at 47%. 13 These are encouraging findings suggesting, at the time of writing, that natural 'herd immunity' could be well underway in Denmark and similar Northern hemisphere infected zones in 2020 and into the likely surges and waves of SARS-CoV-2 in 2021.
Finally, a striking feature of the adaptation strategy of the SARS-CoV-2 coronavirus is the signature of 'haplotype ribo-switching', focussed mainly on C-site and A-site centred motifs targeted by APOBEC and ADAR deamination events. 14 We expect this strategy to be general for coronavirus infections (it is evident in the additional SNVs over the L241f.1 haplotype in the L241f.1vic variant). Thus, whilst some variation can occur at the amino acid level, the protein sequence is largely conserved, as discussed 14 and as illustrated in the haplotyping data of the VSD presented here (Supporting Information File D). The predominant conclusion is that a large amount of viral variation goes on at the RNA level, with presumptive links to functional RNA secondary structures, that is, downstream consequences for RNA secondary structures compatible with viral replicative fitness.
The striking sequence conservation at the amino acid level after human passage 14 implies that the key conserved variations occur mainly at the RNA genomic level. Thus, most recorded SNVs were either synonymous or, if non-synonymous, often led to likely functionally benign amino acid outcomes and thus to likely conservation of viral function at the protein level. This was a striking fact across all data sets examined 14 and in the data reported here (Supporting Information File D). Clearly, all these observed viral sequences survived Darwinian negative selection within each infected subject against malfunctioning and poorly replicating variants in the quasi-species pool, as discussed. 14 Targeted mutagenesis studies 41,42 on viral genomes affirm this conclusion. Those data demonstrate how sensitive and tightly tuned information-rich single-stranded viral genomes really are to non-synonymous substitutions of many types. In Hepatitis C Virus (HCV) studies, a similar approach has demonstrated how targeted substitutions in the coding regions of the HCV genome reveal a functional network of RNA regulatory structures important for the efficacy of viral replication and infectivity. 43 Therefore, new vaccine designs and therapies may need to target these new and putatively stable secondary structures thrown up by haplotype ribo-switching for the presumptive replicative stability of the viral RNA genome. This type of RNA secondary structure research and analysis is underway (Lindley RA and associates).
This will obviously involve an entirely different approach to vaccine RNA target site design and curative therapies, which will necessitate further research on RNA structure rather than
1. Clinical features of patients infected with 2019 novel coronavirus in Wuhan
2. Wuhan seafood market may not be source of novel virus spreading globally
3. Timing the SARS-CoV-2 index case in Hubei province
4. Dysregulation of type I interferon responses in COVID-19
5. Imbalanced Host Response to SARS-CoV-2 Drives Development of COVID-19
6. Impaired type I interferon activity and exacerbated inflammatory responses in severe Covid-19 patients. medRxiv preprint
7. Trained Immunity: A tool for reducing susceptibility to and the severity of SARS-CoV-2 infection
8. Inborn errors of type I IFN immunity in patients with life-threatening COVID-19
9. Antigen-specific adaptive immunity to SARS-CoV-2 in acute COVID-19 and associations with age and disease severity
10. Adaptive immunity to SARS-CoV-2 and COVID-19
11. Longitudinal analyses reveal immunological misfiring in severe COVID-19
12. Defining trained immunity and its role in health and disease
13. Ethelberg S. Assessment of protection against reinfection with SARS-CoV-2 among 4 million PCR-tested individuals in Denmark in 2020: A population-level observational study. The Lancet
14. Analysis of APOBEC and ADAR deaminase-driven Riboswitch Haplotypes in COVID-19 RNA strain variants and the implications for vaccine design
15. A dynamic nomenclature proposal for SARS-CoV-2 to assist genomic epidemiology
16. Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations
17. Phylogenetic network analysis of SARS-CoV-2 genomes
18. Coronavirus RNA proofreading: Molecular basis and therapeutic targeting
19. Interferon-stimulated genes and their antiviral effector functions
20. Interferon-stimulated genes: A complex web of host defenses
21. Adenosine deaminases acting on RNA (ADARs) are both antiviral and proviral
22. The AID/APOBEC family of nucleic acid mutators
23. DNA deamination mediates innate immunity to retroviral infection
24. The APOBEC3 family of retroelement restriction factors
25. Massive APOBEC3 editing of hepatitis B viral DNA in cirrhosis
26. Review of the mutational role of deaminases and the generation of a cognate molecular model to explain cancer mutation spectra
27. COVID-19 Hotel Quarantine Inquiry
28. Tracking the COVID-19 pandemic in Australia using genomics
29. Covid-19 in Australia: most infected health workers in Victoria's second wave acquired virus at work
30. Workplace coronavirus transmission driving Victorian case numbers, including in aged care
31. Experts criticise Australia's aged care failings over COVID-19. The Lancet
32. Epidemiology and clinical features of COVID-19 outbreaks in aged care facilities: A systematic review and meta-analysis
33. Rose T. Carer tracking lacking. Herald-Sun
34. Spread of UK coronavirus variant limited to close contacts. The Australian
35. Why didn't others get the hotel virus. The Australian
36. CDC Update: Science Brief: Background Rationale and Evidence for Public Health Recommendations for Fully Vaccinated People. Centers for Disease Control and Prevention
37. Isolation and biological properties of three classes of rabbit antibody to Vibrio Cholerae
38. Efficiency of antibody classes in cholera immunity
39. COVID-19 vaccines may not prevent nasal SARS-CoV-2 infection and asymptomatic transmission
40. Response to: COVID-19 re-infection vaccinated individuals as a potential source of transmission
41. The distribution of fitness effects caused by single-nucleotide substitutions in an RNA virus
42. Viral mutation rates
43. The coding region of the HCV genome contains a network of regulatory RNA structures
We thank Dr Jared Mamrot of GMDxgen Pty Ltd for the independent analysis of the haplotype sequences of Victoria, Australia, genomes made publicly available at the GISAID database. We thank Dr Andrew Hapel and Professor Patrick Carnegie for discussions and references on secretory IgA mucosal immunity.
There is no conflict of interest with the scientific objectivity of this paper.
RAL and EJS conceived the plan, aim and organization of the paper, and EJS wrote the first draft of the paper. EJS did the primary analysis of GenBank/NCBI Virus accessed COVID-19 genomic sequences. Both authors read and worked on the final version of the paper before submission by EJS.