key: cord-0742933-j1uhsod7
authors: Chandran, A.; Rosenheim, J.; Nageswaran, G.; Swaddling, L.; Pollara, G.; Gupta, R.; Guerra-Assuncao, J. A.; Woolston, A.; Ronel, T.; Pade, C.; Gibbons, J.; Sanz-Magallon Duque De Estrada, B.; Robert de Massy, M.; Whelan, M.; Semper, A.; Brooks, T.; Altmann, D. M.; Boyton, R. J.; McKnight, A.; Manisty, C.; Treibel, T. A.; Moon, J.; Tomlinson, G. S.; Maini, M. K.; Chain, B. M.; Noursadeghi, M.; investigators, COVIDsortium
title: Non-severe SARS-CoV-2 infection is characterised by very early T cell proliferation independent of type 1 interferon responses and distinct from other acute respiratory viruses.
date: 2021-03-31
journal: nan
DOI: 10.1101/2021.03.30.21254540
sha: 7424e5cd4f6c87f615a871be4f4e3597abe606ad
doc_id: 742933
cord_uid: j1uhsod7
The correlates of natural protective immunity to SARS-CoV-2 in the majority who experience asymptomatic infection or non-severe disease are not fully characterised, and remain important as new variants emerge. We addressed this question using blood transcriptomics, multiparameter flow cytometry and T cell receptor (TCR) sequencing spanning the time of incident infection. We identified a type 1 interferon (IFN) response common to other acute respiratory viruses, and a cell proliferation response that discriminated SARS-CoV-2 from other viruses. These responses peaked by the time the virus was first detected, and in some preceded virus detection. Cell proliferation was most evident in CD8 T cells and associated with rapid expansion of SARS-CoV-2 reactive TCRs. We found an equally rapid increase in immunoglobulin transcripts, but circulating virus-specific antibodies lagged by 1-2 weeks. Our data support a protective role for rapid induction of type 1 IFN and CD8 T cell responses to SARS-CoV-2. The host response in non-severe SARS-CoV-2 infection during the first epidemic wave, prior to vaccination, incorporates the mechanisms of effective host-defence in naïve populations.
To date, our knowledge has been limited to immune responses after the detection of the virus or onset of symptoms, and to cross-sectional studies in which the time of infection was undefined. As a result, the temporal kinetics and relationships between the earliest immune responses to infection are not known. These early events, among the majority that experience asymptomatic infection or mild disease not requiring hospitalisation, may provide new insights into the determinants of immune protection in naïve populations that may also be relevant to emerging variants that escape vaccine-mediated protection. We sought to address this question at a systems level by genome-wide transcriptional profiling of weekly blood samples before, during and after incident SARS-CoV-2 infections during the first epidemic wave in London, and compared our findings with responses to other acute respiratory viruses using publicly available data from human challenge experiments.
We undertook a nested case-control study derived from a cohort of 400 healthcare workers at one London hospital recruited from 23rd March 2020 to undergo weekly nasopharyngeal swab PCR tests and blood sampling when fit to attend work, as previously described [1] [2] [3] [4] [5]. In this cohort, we detected 45 incident infections by PCR. Among these cases, we obtained 114 blood transcriptional profiles from 41 individuals spanning three weeks before to three weeks after the first PCR positive result, including 12 individuals for whom samples were available before the first positive PCR. We also profiled convalescent samples from 16/41 individuals 5-6 months later. We compared these data to blood transcriptional profiles obtained from baseline samples in 55 sequential uninfected controls who remained PCR and is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 31, 2021. 
; https://doi.org/10.1101/2021.03.30.21254540 doi: medRxiv preprint significantly different from uninfected controls from the week before the first positive PCR to three weeks afterwards. Six-month convalescent samples from a subset of these individuals were not significantly different from uninfected controls, indicating that the blood transcriptome had fully reverted to the baseline. To investigate the host response to infection, we identified differentially expressed transcripts by comparison of profiles from the time of first positive viral PCR to those of uninfected controls (Figure 1b). These were subjected to upstream regulator enrichment analysis to identify molecular pathways predicted to be activated at the level of cytokines, transmembrane receptors, kinases and transcription factors that may be responsible for differential gene expression (Supplementary File 1). We filtered groups of target genes associated with each upstream regulator to include only those that had significantly greater co-correlated expression than would be expected at random in our blood transcriptomes, in order to increase our confidence that these represent co-regulated genes in a given molecular pathway (Supplementary Figure 4). Among those that were retained, the associated upstream regulators formed two clusters resulting from overlapping associations with target genes (Figure 1c [...] Figure 6a), giving a measure of the consistency of both these responses in infected individuals.
Despite this and the overlap in the temporal profiles of these two responses, the enrichment of STAT1- and CCND1-regulated modules representing each response at the individual participant level did not correlate, suggesting that they may be independently regulated or subject to idiosyncratic capacity for each of these responses at the level of individual participants (Figure 2c). The same observation was evident for differentially expressed genes combined as modules associated with each of the upstream regulators that reflected type 1 IFN or cell proliferation modules (Figure 2d). Next, we compared the type 1 IFN and cell proliferation response to incident SARS-CoV-2 infection with those of other acute respiratory viruses, by comparing the peak expression of the STAT1- and CCND1-regulated modules in our cohort to that of publicly available longitudinal blood transcriptomic data [...]
To further evaluate the rapid T cell response to SARS-CoV-2 infection, we undertook sequencing of TCR alpha and beta chains in longitudinal samples to reflect dynamic changes in the T cell clonal repertoire. An expanded clone will increase or decrease in frequency depending on the sampling time point before and after the peak response. Therefore, we identified expanded TCR sequences as being [...]
The finding that the CCND1-regulated module did not correlate with our B cell signature does not exclude a B cell response. Emergence of antibodies to SARS-CoV-2 has been reported as early as five days after symptom onset 8.
We found increased expression of immunoglobulin (Ig) constant heavy and light chain transcripts, which peaked at the time of first PCR virus detection but was evident from one week before to two weeks after first PCR detection (Figure 5a-b). The increase in Ig gene expression in blood was less sustained than TCR expansion, and returned to baseline by three weeks after the first positive PCR. In contrast, circulating antibodies to SARS-CoV-2 S1 spike protein that correlate with virus neutralisation were not detectable until one week after the incident infection (Figure 5c) and continued to increase in this cohort for eight weeks 3,4.
To the best of our knowledge, we report the earliest in vivo immune responses to SARS-CoV-2 infection available to date, enabled by serial sampling of individuals at risk of infection during the peak of the first epidemic wave in London. The general paradigm for early antiviral host defence is dominated by induction of type 1 IFNs. Attenuated responses as a result of autoantibodies to type 1 IFNs, and genetic polymorphisms associated with reduced expression of a type 1 IFN receptor subunit or with reduced expression of the IFN-inducible oligoadenylate synthetase (OAS) gene cluster, have all been associated with severe disease 9,10. These provide strong evidence that type 1 IFN responses contribute to effective protection against SARS-CoV-2 infection. We show that type 1 IFN responses can precede PCR detection of the virus and therefore may exert their protective effects in the earliest phases of infection, independent of symptoms. We propose that such early detection of IFN-inducible genes in the blood transcriptome may arise from localised immune responses as a result of leukocyte trafficking through lymphoid tissues or the site of infection, and may provide greater sensitivity than detection of circulating IFNs.
As we have previously reported, an additional translational application of this finding is the detection of IFN-inducible transcripts in blood as a diagnostic biomarker of early viral infection that may precede PCR detection of the virus and symptoms 11. Alongside type 1 IFN responses, we detected an early cell proliferation response in the blood transcriptome, which we primarily attribute to CD8, and to a lesser extent CD4, T cell proliferation by correlation with cell-type specific transcriptional modules, corroborated by flow cytometry showing a significant increase in Ki67-positive CD8 T cells and by TCR sequencing showing expansion of T cell clones. Whilst type 1 IFN responses were evident in a range of other acute respiratory virus infections modelled in human challenge experiments 6, the early T cell response to SARS-CoV-2 in our study was significantly greater than in other viral infections. By comparison with emerging databases of SARS-CoV-2 specific TCRs in VDJdb, we were able to show that expanded T cell clones were most enriched for SARS-CoV-2 reactive cells, and that these were already evident by the time of first positive virus PCR. In individuals with COVID-19, T cell reactivity has been reported as early as 5-10 days after the onset of symptoms 12. Importantly, in one report, T cell proliferative responses to SARS-CoV-2 were evident in 92% of family contacts of COVID-19 cases independently of serostatus 13, and some people may have pre-existing cross-reactive T cells arising from previous seasonal coronavirus exposure [13] [14] [15] [16] [17] [18].
These may be expected to contribute to early viral clearance, analogous to findings in influenza [19] [20] [21]. If this were the primary driver of rapid T cell responses to SARS-CoV-2 infection, the fact that the early proliferative response discriminated infected and uninfected individuals with an AUROC of 0.92 would require pre-existing T cell priming to be a near ubiquitous feature of asymptomatic or non-severe infection. Consistent with this hypothesis, among the largest studies of pre-pandemic blood samples, heterologous T cell reactivity to SARS-CoV-2 peptides with proven similarity to those of pre-existing seasonal coronaviruses has been reported in 81% 18. In this context, we hypothesise that the variation in T cell proliferative response and the lack of its correlation with type 1 IFN responses may be explained by differential levels of T cell priming in individual participants. We also identified a similarly rapid B cell response represented by transient enrichment of Ig gene expression in blood. We interpret this to represent the transit of activated antigen-specific B cells from lymphoid tissues to the predominant site of antibody production in the bone marrow and spleen. Since protective anti-S1 antibodies only became detectable after a two-week lag, we hypothesise that the B cell response may have had a less important role in rapid viral clearance in asymptomatic and non-severe infection.
[...] for an extremely small fraction of expanded sequences and does not exclude proliferation of bystander T cells.
However, substantially lower levels of enrichment of CMV or EBV specific TCRs among expanded clones, and the lack of enrichment for IFN activity or other signatures of T cell activation in the blood transcriptome, argue against generalised bystander T cell activation. Future re-analysis of the data (as more sequences across a wider range of HLA haplotypes are reported) will be necessary to evaluate whether the majority of expanded TCRs are ultimately found to recognise SARS-CoV-2. The focus on the blood compartment meant that we do not have direct measurements of responses at the site of host-pathogen interactions. Analysis of bulk RNA samples for transcriptional profiling and TCR sequencing restricted our ability to evaluate transcriptional heterogeneity at the cellular level, further characterise expanded T cell clones or undertake TCR analysis with paired alpha/beta chains. Most importantly, since fewer than 5% of infections lead to hospitalisation 22, our study design precluded comparison of severe and non-severe outcomes, which would require a substantially greater sample size. Nonetheless, our data reflect immune responses in asymptomatic and non-severe infection, which incorporate correlates of effective host defence to natural infection in a naïve population, providing further evidence for the importance of early type 1 IFN and T cell responses. Human challenge experiments that control for variation in time and dose of exposure will offer the best opportunities to acquire the granular detail of early immune responses. Larger scale studies will be required to assess the frequency of SARS-CoV-2 T cell reactivity in naïve populations, and determine whether early type 1 IFN or T cell responses predict outcomes.
Although vaccine roll-out is likely to be the primary immunological strategy to control the pandemic 23, understanding the determinants of effective natural immunity will remain a critical objective to enable risk stratification and novel vaccine design as the virus evolves. In particular, identification of the antigenic determinants of the earliest T cell responses in asymptomatic SARS-CoV-2 infection is a priority to inform development of potential universal coronavirus vaccines.
The study was approved by a UK Research Ethics Committee (South Central - Oxford A Research Ethics Committee, reference 20/SC/0149). All participants provided written informed consent.
We undertook a case-control study nested within our COVIDsortium healthcare worker cohort. Participant screening, study design, sample collection, and sample processing have been described in [...]
For 'cases', we included all available RNA samples, including convalescent samples at week 24 of follow-up for a subset of participants. For uninfected controls, we included baseline samples only. Genome-wide mRNA sequencing was performed as previously described 27 [...] (Supplementary Figure 2c).
Molecular degree of perturbation (MDP) was calculated as previously described 32. Briefly, transcripts were included if more than one sample had a TPM count above the limit of detection, and the standard deviation (SD) of TPM among uninfected controls was >0.5. The TPM values for each individual data set were then transformed to a Z score using the mean and SD for each transcript among uninfected controls used as a standard reference.
The MDP of each sample/data set was then represented as the sum of all Z scores >2 divided by the total number of transcripts. Differential gene expression between data sets from individuals with co-incident infection and non-infection controls was identified using a Mann-Whitney test with false discovery rate <0.05 and absolute fold difference >1.5 (log2 fold difference >0.585). Analysis of upstream transcriptional regulation of the differentially expressed genes was performed using Ingenuity Pathway Analysis (Qiagen, Venlo, The Netherlands) and visualised as a network diagram using the Force Atlas 2 algorithm in Gephi v0.9.2 33. We depicted all statistically over-represented molecules (false discovery rate <0.05), predicted to be upstream of >2 target genes, and annotated with one of the following molecular functions: cytokine, transmembrane receptor, kinase and transcriptional regulator, representing the canonical components of molecular pathways responsible for transcriptional reprogramming in immune responses. The biological pathways represented by the upstream regulators were identified by Reactome pathway enrichment analysis using XGR 34 as previously described 35,36. For visualisation, 20 pathway groups were identified by hierarchical clustering of Jaccard indices to quantify similarity between the gene compositions of each pathway. For each group, the pathway with the largest total number of genes was then selected to provide a representative annotation. To identify co-regulated gene networks used as transcriptional modules, we calculated the average correlation coefficient for pairwise correlations of the expression levels of each group of target genes associated with predicted upstream regulators in our transcriptomic data set, and compared this to the distribution of average correlation coefficients obtained from random selection of equivalent-sized groups of genes repeated 100 times.
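The comparison against randomly selected gene groups described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the text does not name the correlation coefficient, so Pearson correlation is assumed here, and the `expression` mapping (gene name to per-sample expression values) and all function names are hypothetical.

```python
import random
import statistics
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_pairwise_correlation(expression, genes):
    """Average correlation coefficient over all pairs of genes in `genes`."""
    return statistics.fmean(
        pearson(expression[a], expression[b]) for a, b in combinations(genes, 2)
    )


def module_z_score(expression, target_genes, n_random=100, seed=0):
    """Z score of the observed average correlation for `target_genes`
    against `n_random` randomly drawn gene groups of the same size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean_pairwise_correlation(expression, target_genes)
    all_genes = sorted(expression)
    null = [
        mean_pairwise_correlation(
            expression, rng.sample(all_genes, len(target_genes))
        )
        for _ in range(n_random)
    ]
    return (observed - statistics.fmean(null)) / statistics.stdev(null)
```

The returned value expresses how many standard deviations the observed average correlation lies above the mean of the random distribution.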
Groups of target genes with average correlation coefficients that exceeded the mean of the distribution of equivalent-sized randomly selected groups by ≥2 SD (z-score ≥2) with false discovery rate <0.05 were identified as transcriptional modules representing the functional activity of the associated upstream regulator (Supplementary Figure 3). Independently derived Type 1 and Type 2 interferon inducible modules and cell-type specific transcriptional modules were described previously 35,37,38. To derive an independent cell proliferation module, PBMC were isolated from BCG-vaccinated individuals and stimulated in vitro with 100 ng/ml purified protein derivative (PPD) for 6 days to drive proliferation of antigen-specific T cells. Stimulated and unstimulated PBMC were subjected to transcriptional profiling, differential gene expression and Reactome pathway enrichment analysis as previously described 37,39. Differentially enriched transcripts annotated to the "Cell Cycle" Reactome pathway (Supplementary Table 3) were used to derive a transcriptional signature for T cell proliferation. The expression of each module was represented by the geometric mean Log2 TPM value of its constituent genes.
Publicly available data from previously published human viral challenge studies were downloaded from GEO (GSE73072). We calculated module scores for the STAT1 and CCND1 modules as the mean expression across all constituent genes, using log2-transformed microarray data. Only participants who developed evidence of infection following inoculation were included, as per the original study definitions 6. The peak enrichment of STAT1- and CCND1-regulated modules for each infected individual was represented by the highest log2 TPM ratio to the mean of uninfected controls, across the time course of each data set (Supplementary Figure 6).
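The module score and peak enrichment calculations described above can be illustrated with a short sketch. It assumes each sample is a hypothetical mapping from gene name to log2 expression, so that a difference between scores corresponds to a log2 ratio; the function names and the gene labels in the usage note are placeholders, not the published module contents.

```python
import statistics


def module_score(sample_log2_expr, module_genes):
    """Mean log2 expression of the module's constituent genes in one sample."""
    return statistics.fmean(sample_log2_expr[g] for g in module_genes)


def peak_enrichment(time_course, control_samples, module_genes):
    """Highest module score across one individual's time course, expressed
    relative to the mean module score of uninfected controls.

    Scores are on a log2 scale, so subtracting the control mean gives
    a log2 ratio.
    """
    control_mean = statistics.fmean(
        module_score(s, module_genes) for s in control_samples
    )
    return max(module_score(s, module_genes) for s in time_course) - control_mean
```

For example, with a two-gene placeholder module `["GENE_A", "GENE_B"]`, controls whose module scores average 2.0 and a time course whose scores are 2.0, 4.0 and 3.0, `peak_enrichment` returns 2.0, that is, a four-fold increase at the peak.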
The α and β chains of the TCR repertoire were sequenced from all time points for which RNA was available within the first 4 weeks of the study, for all participants who were PCR+ at any time point, and for six randomly selected individuals who remained PCR-negative and seronegative throughout the study. The pipeline introduces unique molecular identifiers attached to individual cDNA molecules, which allows correction for sequencing error and PCR bias, and provides a quantitative and reproducible method of library preparation. Full details for both the experimental TCRseq library preparation and the subsequent TCR annotation (V, J and CDR3 annotation) using Decombinator V4 have been described previously [40] [41] [42]. The Decombinator software is freely available at https://github.com/innate2adaptive/Decombinator. Expanded TCRs were defined as any TCR which changed significantly between any two time points (Supplementary Figure 8). The boundaries (shown as blue dotted lines) were defined as the maximum TCR abundance which might be observed at time 2, given its abundance at time 1, assuming Poisson distribution of counts with p < 0.0001, to give a false discovery rate of <1 in 1000. TCR abundances are normalised for total number of TCRs sequenced in each sample, and expressed as counts/million. MAIT TCRs were defined as any TCR alpha containing TRAV1-2 paired with TRAJ12, TRAJ20 or TRAJ33. iNKT TCRs were defined as TCRs containing TRAV24 paired with TRAJ18. The VDJdb database (https://vdjdb.cdr3.net/about) was searched for any TCR annotated for CMV, EBV or SARS-CoV-2. TCRs annotated for multiple antigens were excluded. This set of antigen-associated TCRs was then compared to our set of expanded TCRs defined as described above.
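The Poisson boundary used to define expanded TCRs can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: it works on raw integer counts rather than normalised abundances, it treats a clone unobserved at time 1 as having a count of one, it tests only for an increase (a decrease could be tested by swapping the two time points), and the function names are hypothetical. The direct summation is intended for moderate counts only.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), by direct summation of the CDF."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def expansion_threshold(count_t1, p=1e-4):
    """Smallest time-2 count whose upper-tail probability falls below p,
    given the time-1 count as the Poisson mean."""
    lam = max(count_t1, 1)  # assumption: floor unobserved clones at one count
    k = int(lam)
    while poisson_upper_tail(k, lam) >= p:
        k += 1
    return k


def is_expanded(count_t1, count_t2, p=1e-4):
    """True if the time-2 count lies at or beyond the Poisson boundary."""
    return count_t2 >= expansion_threshold(count_t1, p)
```

A clone is flagged only when its later count would be very unlikely under Poisson sampling noise alone, which is what keeps the expected rate of false calls low across the many TCRs tested.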
Applications for access to the individual participant de-identified data (including data dictionaries) and samples can be made to the access committee via an online application https://covidconsortium.com/application-for-samples/. Each application will be reviewed, with decisions to approve or reject an application for access made on the basis of (i) accordance with participant consent and alignment to the study objectives, (ii) evidence for the capability of the applicant to undertake the specified research and (iii) availability of the requested samples. The use of all samples and data will be limited to the approved application for access and stipulated in the material and data transfer agreements between participating sites and investigators requesting access. RNAseq data, TCR sequencing data and associated essential metadata will be made publicly available at time of peer-reviewed publication.
This preprint is made available under a CC-BY-NC-ND 4.0 International license.
(A) Enumeration of expanded TCR alpha chain abundance (per million total sequences) in non-infection controls (NIC) and samples from infected individuals stratified by time from first positive PCR. Individual data points shown with violin plots depicting median, IQR and frequency distributions. (*FDR<0.05 by Kruskal-Wallis Test for each group compared to NIC). (B) Correlation of CCND1 module and (C) STAT1 module with TCR alpha-chain sequences (Log2 per million sequences). Regression lines shown in red, with r and p values for Spearman rank correlations. (D) Number of antigen-specific TCR sequences (alpha and beta-chains) for SARS-CoV-2, cytomegalovirus (CMV) and Epstein-Barr Virus (EBV) among expanded TCR sequences in all time points (-3 to +3 weeks) from individuals with SARS-CoV-2 infection and among non-expanded TCRs from the same samples, giving the odds ratio (±95% confidence interval, Fisher's exact test) for enrichment of antigen specific TCR sequences in each case. (E) Frequency heat map of individual (rows) SARS-CoV-2 reactive TCRs (alpha and beta-chains) identified among expanded TCR sequences in all time points (-3 to +3 weeks) from individuals with SARS-CoV-2 infection (NA=no sample available; ND=not detected in sample).
COVID-19: PCR screening of asymptomatic health-care workers at London hospital
Asymptomatic health-care worker screening during the COVID-19 pandemic - Authors' reply
Time series analysis and mechanistic modelling of heterogeneity and sero-reversion in antibody responses to mild SARS-CoV-2 infection
Discordant neutralizing antibody and T cell responses in asymptomatic and mild SARS-CoV-2 infection
Healthcare Workers Bioresource: Study outline and baseline characteristics of a prospective healthcare worker cohort to study immune protection and pathogenesis in COVID-19
An individualized predictor of health and disease using paired reference and target samples
VDJdb in 2019: database extension, new analysis infrastructure and a T-cell receptor motif compendium
Antigen-Specific Adaptive Immunity to SARS-CoV-2 in Acute COVID-19 and Associations with Age and Disease Severity
Autoantibodies against type I IFNs in patients with life-threatening COVID-19
Genetic mechanisms of critical illness in Covid-19
Blood transcriptional biomarkers of acute viral infection for detection of
Robust T Cell Immunity in Convalescent Individuals with Asymptomatic or Mild COVID-19
SARS-CoV-2-specific T cell immunity in cases of COVID-19 and SARS, and uninfected controls
Targets of T Cell Responses to SARS-CoV-2 Coronavirus in Humans with COVID-19 Disease and Unexposed Individuals
SARS-CoV-2-reactive T cells in healthy donors and patients with COVID-19
Selective and cross-reactive SARS-CoV-2 T cell epitopes in unexposed humans
SARS-CoV-2-derived peptides define heterologous and COVID-19-induced T cell recognition
Cellular immune correlates of protection against symptomatic pandemic influenza
Preexisting influenza-specific CD4+ T cells correlate with disease protection against influenza challenge in humans
Natural T Cell-mediated Protection against Seasonal and Pandemic Influenza. Results of the Flu Watch Cohort Study
The 2020 SARS-CoV-2 epidemic in England: key epidemiological drivers and impact of interventions
BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Mass Vaccination Setting
Discordant neutralizing antibody and T cell responses in asymptomatic and mild SARS-CoV-2 infection
Blood transcriptional biomarkers for active pulmonary tuberculosis in a high-burden setting: a prospective, observational, diagnostic accuracy study
Near-optimal probabilistic RNA-seq quantification
Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences
BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis
Surrogate Variable Analysis
Assessing the Impact of Sample Heterogeneity on Transcriptome Analysis of Human Diseases Using MDP Webtool
ForceAtlas2, a Continuous Graph Layout Algorithm for Handy Network Visualization Designed for the Gephi Software
XGR software for enhanced interpretation of genomic summary data, illustrated by application to immunological traits
Persistent T Cell Repertoire Perturbation and T Cell Activation in HIV After Long Term Treatment
Microinvasion by Streptococcus pneumoniae induces epithelial innate immunity during colonisation at the human mucosal surface
Exaggerated in vivo IL-17 responses discriminate recall responses in active TB
Validation of Immune Cell Modules in Multicellular Transcriptomic Data
In Vivo Molecular Dissection of the Effects of HIV-1 in Active Tuberculosis
Quantitative Characterization of the T Cell Receptor Repertoire of Naïve and Memory Subsets Using an Integrated Experimental and Computational Pipeline Which Is Robust, Economical, and Versatile
An Economical, Quantitative, and Robust Protocol for High-Throughput T Cell Receptor Sequencing from Tumor or Blood
B. Decombinator V4 - an improved AIRR
Funding for COVIDsortium was donated by individuals, charitable Trusts, and corporations including Goldman Sachs, Citadel and Citadel Securities, The Guy Foundation, GW Pharmaceuticals, Kusuma Trust, and Jagclif Charitable Trust, and enabled by Barts Charity with support from UCLH Charity. RKG