key: cord-0737518-t1ih69zo authors: Ng, Kevin W.; Attig, Jan; Bolland, William; Young, George R.; Major, Jack; Wrobel, Antoni G.; Gamblin, Steve; Wack, Andreas; Kassiotis, George title: Tissue-specific and interferon-inducible expression of non-functional ACE2 through endogenous retroelement co-option date: 2020-12-01 journal: Nat Genet DOI: 10.1038/s41588-020-00732-8 sha: f954eb54a724175d8421a0626ed74ddfec838106 doc_id: 737518 cord_uid: t1ih69zo Angiotensin-converting enzyme 2 (ACE2) is an entry receptor for Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) and a regulator of several physiological processes. ACE2 has recently been proposed to be interferon-inducible, suggesting that SARS-CoV-2 may exploit this phenomenon to enhance viral spread and questioning the efficacy of interferon treatment in Coronavirus disease 2019 (COVID-19). Using a recent de novo transcript assembly that captured previously unannotated transcripts, we describe a novel isoform of ACE2, generated by co-option of intronic retroelements as promoter and alternative exon. The novel transcript, termed MIRb-ACE2, exhibits specific expression patterns across the aerodigestive and gastrointestinal tracts and is highly responsive to interferon stimulation. In stark contrast, canonical ACE2 expression is unresponsive to interferon stimulation. Moreover, the MIRb-ACE2 translation product is a truncated, unstable ACE2 form, lacking domains required for SARS-CoV-2 binding and is therefore unlikely to contribute to or enhance viral infection. Interferons represent the first line of defence against viruses in humans and other jawed vertebrates 1 . Recognition of viral products in an infected cell results in autocrine and paracrine signalling to induce an antiviral state characterized by expression of a module of interferon-stimulated genes (ISGs) that restrict viral replication and spread 1, 2 . 
Indeed, recombinant interferon is often given as first-line therapy in viral infection 3 and preliminary results suggest that interferon treatment may be effective against Coronavirus disease 2019 (COVID-19) 4, 5 . Interferon signalling results in rapid upregulation of several hundred ISGs, including genes that inhibit various stages of viral entry and replication as well as transcription factors that further potentiate the interferon response 1, 2 . Given that unchecked interferon signalling and inflammation can result in immunopathology, ISGs are subject to complex regulatory mechanisms 6 . At the transcriptional level, long terminal repeats (LTRs), derived from endogenous retroviruses and other LTR retroelements, as well as regulatory sequences in non-LTR retroelements serve as cis-regulatory enhancers for a number of ISGs and are required for their induction 7 . Adding to this regulatory complexity, many retroelements are themselves interferon-responsive promoters and are upregulated following viral infection or in interferon-driven autoimmunity [8] [9] [10] [11] . The co-evolution of viruses and hosts has resulted in a number of strategies by which viruses evade or subvert interferon responses 12 . Compared with other respiratory viruses, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) elicits a weak interferon response despite strong induction of other chemokines 13 . Though the mechanism by which SARS-CoV-2 dampens interferon responses remains unclear, the ORF3b, ORF6, and nucleoprotein of the closely-related SARS-CoV function as interferon antagonists 14 . SARS-CoV-2 uses angiotensin-converting enzyme 2 (ACE2) as its primary receptor 15, 16 and recent work suggested that SARS-CoV-2 may hijack the interferon response by inducing ACE2 expression 17 . By integrating multiple human, macaque, and mouse single-cell RNA-sequencing (RNA-seq) datasets, Ziegler et al. 
identified ACE2 as a primate-specific ISG upregulated following viral infection or interferon treatment 17 . Use of an ISG as a viral receptor would result in a self-amplifying loop to increase local viral spread, and calls into question the efficacy and safety of recombinant interferon treatment in COVID-19 patients. Our recent de novo cancer transcriptome assembly 18 identified a chimeric transcript formed by splicing between annotated exons of ACE2 and an LTR16A1 retroelement, integrated in intron 9 of the ACE2 locus. This transcript, which we refer to here as MIRb-ACE2, includes exons 10-19 of ACE2 (Fig. 1a) . Splicing between the LTR16A1 retroelement and exon 10 of ACE2 was highly supported by splice junction analysis of RNA-seq data from The Cancer Genome Atlas (TCGA) lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) (Fig. 1a) . To identify potential transcription start site(s) of the MIRb-ACE2 transcript, we inspected promoter-based expression analyses of the FANTOM5 data set, which indicated peaks in the LTR16A1 retroelement and the immediately upstream MIRb retroelement in the same intronic region (Extended Data Fig. 1 ). FANTOM5 CAGE peak distribution over the LTR16A1 and MIRb retroelements exhibited cell-type specificity to a certain degree, with peaks residing almost exclusively in MIRb in bronchial epithelial cells, but extending to LTR16A1 in HEK293 cells (Extended Data Fig. 1 ). Both LTR16A1 and MIRb retroelements contained multiple transcription factor binding sites, with IRF-1 and IRF-2 binding sites and TATA-box residing in MIRb (Extended Data Fig. 2 ). To further define the transcription start site(s), we performed 5' RACE (rapid amplification of cDNA ends) PCR, followed by deep sequencing of the PCR products, amplified from normal human bronchial epithelial (NHBE) cells or human squamous cell carcinoma cell lines SCC-4 and SCC-25, treated with recombinant IFNα (Extended Data Fig. 2 ). 
Consistent with FANTOM5 CAGE data, 5' RACE analysis showed multiple peaks in both LTR16A1 and MIRb, again with evidence of cell-type specificity in their relative utilization (Extended Data Fig. 2 ). These results suggested that the MIRb and LTR16A1 retroelements acted as a cryptic promoter for the MIRb-ACE2 transcript, with transcription start sites distributed across these two retroelements. Phylogenetic analysis of the respective LTR16A1 and MIRb elements in the Ace2 loci of representative mammalian species indicated that the ancestral integrations predated estimated dates of mammalian radial divergence (Fig. 1b) . Indeed, comparative genomic analysis produced good alignment of the LTR16A1 and MIRb integrations across a variety of species, with humans, dogs, and dolphins showing above 60% sequence identity to the mammalian consensus sequences of LTR16A1 and MIRb (Fig. 1b, c) . Of note, the LTR16A1 and MIRb integrations were also present, but truncated, in the Ace2 locus of murine species. Therefore, MIRb-ACE2 expression in humans likely represents a common mammalian feature that has been lost in some, but not all, other mammalian species. To assess the relative expression of ACE2 and MIRb-ACE2 isoforms, we quantified expression of both transcripts across tissue types in the TCGA and Genotype-Tissue Expression (GTEx) cohorts. Consistent with recent reports 17, 19 , full-length ACE2 was expressed predominantly in the healthy intestine and kidney and tumors of the same histotypes (Extended Data Fig. 4) . Expression of MIRb-ACE2 followed a similar overall pattern, but with notable expression also in healthy testis, likely owing to retroelement activation as part of epigenetic reprogramming during spermatogenesis. However, despite similar histotype distribution of ACE2 and MIRb-ACE2 expression, the ratio of the two isoforms was characteristically different between distinct histotypes and tumor types. 
For example, in larger TCGA patient cohorts, LUAD samples expressed higher levels of ACE2 than of MIRb-ACE2 (mean ACE2/MIRb-ACE2 ratio = 5.63), whereas LUSC samples showed the opposite phenotype with higher expression of MIRb-ACE2 (mean ACE2/MIRb-ACE2 ratio = 0.87) (Fig. 2a, b) . ACE2 and MIRb-ACE2 expression and their ratios were not affected by patient sex, arguing against a strong effect of the X chromosomal location of ACE2 on either isoform expression (Fig. 2a, b) . ACE2 and MIRb-ACE2 exhibited characteristic expression also within tumor types with only weak correlation between the two in the same tumor type (R² = 0.252 for LUAD; R² = 0.337 for LUSC), suggesting partly independent regulation. In healthy lung, expression of ACE2 and MIRb-ACE2 was similar to that in LUAD, with the balance slightly in favor of the full-length form (mean ACE2/MIRb-ACE2 ratio = 2.73) (Fig. 2c ). In contrast, healthy colon expressed considerably higher levels specifically of the full-length isoform (mean ACE2/MIRb-ACE2 ratio = 26.37) (Fig. 2d ). These differences in ACE2 and MIRb-ACE2 expression between healthy lung and colon were again independent of sex (Fig. 2c, d) . Tissue-specific patterns of ACE2 and MIRb-ACE2 expression suggested dependency on cell lineage or identity. Alternatively, they could reflect transient adaptations to the local microenvironment, such as oxygen or microbiota composition differences between lung and intestine, or even differences in cellular composition between the different compartments. To examine whether patterns of ACE2 and MIRb-ACE2 expression are linked to cell identity, we examined RNA-seq data from 933 cancer cell lines from The Cancer Cell Line Encyclopedia (CCLE). These represent homogenous cell populations, grown under standardized conditions, independently of environmental influences. 
Again, expression of ACE2 and MIRb-ACE2 was characteristically different between different cell lines and correlated with their anatomical origin (Fig. 3a-d) . Cell lines with the highest expression of MIRb-ACE2 were derived from the upper aerodigestive tract, including the mouth and nose (mean ACE2/MIRb-ACE2 ratio = 0.72), followed by esophageal cell lines (mean ACE2/MIRb-ACE2 ratio = 1.66) and lung cell lines (mean ACE2/MIRb-ACE2 ratio = 6.27). Consistent with data from primary biopsies, cell lines from the large intestine exhibited the highest expression of ACE2, with minimal expression of MIRb-ACE2 (mean ACE2/MIRb-ACE2 ratio = 16.97). The low ACE2/MIRb-ACE2 ratio in the upper aerodigestive tract was highly significant when compared with other locations (p=0.0035, when compared with the lung; p=0.0023 when compared with the large intestine, Student's t-test). Together, these results uncover the transcription of a novel ACE2 isoform, initiated at the intronic MIRb-LTR16A1 retroelements, in a characteristic pattern of expression, forming a gradient from the upper aerodigestive tract (highest MIRb-ACE2 expression) to the large intestine (highest ACE2 expression). ACE2 has recently been described as a human interferon-stimulated gene (ISG), upregulated at the mRNA level following viral infection or interferon treatment 17, 20 . However, this conclusion was based mostly on analysis of single-cell RNA-seq data that might not have sufficient resolution to distinguish the two isoforms. Indeed, inspection of public single-cell RNA-seq data (GSE134355) 21 demonstrated the limitation of such technologies, with RNA-seq reads mapping exclusively to the shared 3' terminal exon of the ACE2 transcripts, and therefore unable to discriminate between the isoforms (Extended Data Fig. 5 ). 
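The mean ACE2/MIRb-ACE2 ratios quoted for tissues and cell lines can be illustrated with a minimal sketch. It assumes the ratio is the mean of per-sample ratios of isoform TPM values, which the text does not state explicitly (a ratio of group means would be an equally plausible reading). The function name and all TPM values below are hypothetical and are not taken from the study.

```python
from statistics import mean

def mean_isoform_ratio(samples):
    """Mean of per-sample ACE2 / MIRb-ACE2 TPM ratios.

    `samples` is a list of (ace2_tpm, mirb_ace2_tpm) pairs. Samples with
    zero MIRb-ACE2 expression are skipped, because their ratio is undefined
    (an assumption of this sketch, not a documented step of the paper).
    """
    ratios = [ace2 / mirb for ace2, mirb in samples if mirb > 0]
    if not ratios:
        raise ValueError("no sample with detectable MIRb-ACE2 expression")
    return mean(ratios)

# Hypothetical TPM pairs (ACE2, MIRb-ACE2) for two groups of cell lines.
upper_aerodigestive = [(1.0, 2.0), (3.0, 4.0), (2.0, 2.0)]
large_intestine = [(40.0, 2.0), (30.0, 2.0)]

print(mean_isoform_ratio(upper_aerodigestive))  # ratio below 1: MIRb-ACE2 predominates
print(mean_isoform_ratio(large_intestine))      # ratio well above 1: ACE2 predominates
```

A ratio below 1 indicates that MIRb-ACE2 is the more abundant isoform in that group, and a ratio above 1 that full-length ACE2 is.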
To investigate the inducibility of the two isoforms by IFN or viral infection, we re-analyzed public RNA-seq data (GSE147507) from NHBE cells, treated with recombinant IFNβ or infected with SARS-CoV-2, Influenza A virus (IAV) or IAV lacking the viral NS1 protein (IAVΔNS1) 13 . None of the treatments increased expression of full-length ACE2 (Fig. 4a) . In stark contrast, MIRb-ACE2 expression was strongly elevated by both IAVΔNS1 infection and recombinant IFNβ treatment, compared with mock treatment (p=0.0005 and p=0.0054, respectively, Student's t-test). Similar results were also obtained with analysis of lung cancer Calu-3 cells. In the absence of stimulation, Calu-3 cells express exclusively the full-length ACE2 isoform (Fig. 4b) . SARS-CoV-2 infection did not affect levels of ACE2 expression, but noticeably induced MIRb-ACE2 expression (Fig. 4b) . Lastly, analysis of RNA-seq data from explanted lung tissue from a single COVID-19 patient demonstrated elevated expression of MIRb-ACE2, but not of ACE2, compared with healthy lung tissue (Fig. 4c) , albeit statistical comparisons were not possible in this case. To further confirm the IFN-responsiveness exclusively of MIRb-ACE2 expression, we used SCC-4 and SCC-25 cells, which express both isoforms. Compared with mock treatment, addition of recombinant IFNα or IFNγ had a minimal effect on ACE2 expression in SCC-4 cells and no effect in SCC-25 cells (Fig. 4d ). This contrasted with very strong induction (~15-fold) of MIRb-ACE2 expression by either type of IFN in both cell lines (Fig. 4d) . Lack of ACE2 responsiveness to IFN stimulation was additionally confirmed at the protein level, where neither IFNα nor IFNγ affected levels of full-length ACE2, detected by Western blotting in SCC-4 and SCC-25 cells or in A549 cells, which express neither isoform and were used as a negative control (Fig. 4e) . 
Splicing from the LTR16A1 retroelement to exon 10 of ACE2 is in-frame and therefore the last 449 amino acids of ACE2 are also present in the putative MIRb-ACE2 protein. Of note, despite strong upregulation at the mRNA level and despite using polyclonal antibodies (ab15348) targeting the C-terminus of ACE2 present in both protein products, we were unable to detect a truncated form that would correspond to the MIRb-ACE2 translation product in SCC-4 or SCC-25 cells (Fig. 4e) . To confirm the differential IFN inducibility of ACE2 and MIRb-ACE2 expression, we stimulated NHBE cells with IFNα, IFNβ or IFNλ, as previously described 22 . Again, treatment with none of the IFNs had any measurable effect on ACE2 expression in these primary cells (Fig. 4f ). This contrasted with robust induction of MIRb-ACE2 expression, particularly by IFNα (Fig. 4f) . Collectively, these data demonstrate that type I, II and III IFNs stimulate transcription of the ACE2 isoform driven by the alternative MIRb-LTR16A1, but not the canonical ACE2 promoter. The MIRb-ACE2 isoform is predicted to encode a truncated ACE2 product (amino acids 357-805) and exonization of the LTR16A1 element creates a novel 10 amino acid N-terminal sequence (MREAGWDKGG) in the putative translation product (Extended Data Fig. 6 ). Importantly, this predicted protein lacks the first 356 amino acids, including the signal peptide, substrate-binding site and domains that interact with SARS-CoV and SARS-CoV-2 spike glycoproteins (Extended Data Fig. 6 ). Despite sharing the C-terminal half of full-length ACE2, which was readily detectable, endogenously produced MIRb-ACE2 protein was not detectable in SCC-4 and SCC-25 cells naturally expressing the MIRb-ACE2 transcript, by Western blotting using polyclonal anti-ACE2 antibodies (ab15348) (Fig. 4e) . 
To explore the protein-coding potential of the MIRb-ACE2 transcript, we cloned the coding sequences of both isoforms into the pcDNA3.1 mammalian expression vector and transfected HEK293T cells, which do not endogenously express ACE2 that would confound detection of ACE2 produced following transfection 16, 23 . While ACE2-transfected HEK293T cells produced detectable full-length ACE2, no protein of the predicted size was detectable in MIRb-ACE2-transfected cells (Extended Data Fig. 7 ), in agreement with results using SCC-4 and SCC-25 cells (Fig. 4e ). In independently reported findings 24 , endogenously produced MIRb-ACE2 protein could not be detected by Western blotting using the same polyclonal anti-ACE2 serum (ab15348). However, a Myc-DDK-tagged or GFP-tagged MIRb-ACE2 protein product was detected following overexpression in T24 cells in the same study 24 . Moreover, a separate study 25 reported detection of the putative MIRb-ACE2 protein in primary nasal epithelial cells by Western blotting using the same polyclonal anti-ACE2 serum (ab15348), raising the possibility that the protein can indeed be translated. To explain the apparent inefficiency of protein production from MIRb-ACE2 transcripts, we cloned the coding sequences of both isoforms into the pcDNA3.1-DYK-P2A-GFP expression vector, which adds both a FLAG tag and P2A peptide-linked GFP as part of the protein product. Expression of GFP was comparable in ACE2-transfected and MIRb-ACE2-transfected cells, suggesting that the single RNA molecule that encodes both the FLAG-tagged MIRb-ACE2 product and GFP is stable and translated (Fig. 5a) . Despite that, following transfection with plasmid concentrations producing readily detectable full-length ACE2 and resulting in MIRb-ACE2 RNA levels comparable with those endogenously produced in IFNα-stimulated cells, we could not detect the predicted MIRb-ACE2 protein with antibodies to the FLAG tag (Fig. 5b) . 
However, the FLAG-tagged MIRb-ACE2 protein could be detected in HEK293T cells transfected with much higher plasmid concentrations, resulting in RNA expression levels which were one order of magnitude higher than those observed in IFNα-stimulated NHBE, SCC-4 or SCC-25 cells (Fig. 5c) . Therefore, although the MIRb-ACE2 transcript is efficiently translated (supported by the levels of P2A-linked GFP), the MIRb-ACE2 protein product is much less abundant than the full-length ACE2 at a given RNA transcription level, suggesting post-translational protein instability of the former. Lysine residues 625 and 702 in the full-length ACE2 protein have been described to be ubiquitinated and may contribute to its proteasomal degradation 26 . We generated a K625R K702R (K2R) mutant of full-length ACE2, which increased protein levels, compared to the wild-type ACE2 (Fig. 5d) . We introduced the same mutations in the corresponding residues of the predicted MIRb-ACE2 protein product, K279R K356R, which were similarly accessible for ubiquitination (Extended Data Fig. 8 ). However, we were unable to detect stable protein following transfection with the MIRb-ACE2 K2R-encoding mutant (Fig. 5d) . Consistent with this, the addition of the proteasome inhibitor MG-132 was sufficient to increase protein levels of ACE2, but did not rescue the MIRb-ACE2 protein product (Fig. 5d) . Moreover, cycloheximide treatment of HEK293T cells transfected with FLAG-tagged ACE2 or MIRb-ACE2 constructs led to the rapid loss of MIRb-ACE2 protein, but did not affect levels of full-length ACE2 in the same time frame (Fig. 5e ), further supporting reduced stability of the former. 
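The correspondence between the lysine residues mutated in full-length ACE2 (K625, K702) and those in the predicted MIRb-ACE2 product (K279, K356) follows from the description of the truncated protein: a novel 10 amino acid N-terminal sequence (MREAGWDKGG) followed by ACE2 residues 357-805. The sketch below works through that arithmetic. It assumes the novel sequence directly precedes residue 357 with no other insertion or deletion, and the function name is illustrative only.

```python
# Values taken from the text; the layout (novel peptide immediately followed
# by ACE2 residues 357-805) is an assumption of this sketch.
NOVEL_N_TERMINUS = "MREAGWDKGG"   # 10 residues created by LTR16A1 exonization
FIRST_SHARED_RESIDUE = 357        # first full-length ACE2 residue retained
ACE2_LENGTH = 805                 # last residue of full-length ACE2

def full_length_to_mirb(position):
    """Map a 1-based residue number in full-length ACE2 to MIRb-ACE2 numbering."""
    if not FIRST_SHARED_RESIDUE <= position <= ACE2_LENGTH:
        raise ValueError(f"residue {position} is absent from MIRb-ACE2")
    return position - FIRST_SHARED_RESIDUE + 1 + len(NOVEL_N_TERMINUS)

for lysine in (625, 702):
    print(f"K{lysine} in ACE2 -> K{full_length_to_mirb(lysine)} in MIRb-ACE2")

# Predicted length of the MIRb-ACE2 product under the same assumption.
print(full_length_to_mirb(ACE2_LENGTH))
```

Under these assumptions the offset between the two numbering schemes is 346 residues, which reproduces the K279 and K356 positions given in the text.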
Structural considerations suggested that the MIRb-ACE2 protein product would be unlikely to retain the partial structure of the canonical ACE2 peptidase fold, as removing most of this subdomain would expose the remaining component of the highly charged substrate-binding groove as well as large parts of the hydrophobic protein core (Extended Data Fig. 9 ). Hence, it seems unlikely that a protein coded by the MIRb-ACE2 transcript would form a structure similar to that of the canonical ACE2. Moreover, the MIRb-ACE2 protein product lacks a canonical signal peptide and when an IgG kappa chain-derived signal peptide, which has been successfully used to express the canonical ACE2 ectodomain (residues 15-615) 27 , was fused to the corresponding domain of the predicted MIRb-ACE2 protein (residues 1-269), there was no detectable secreted protein. These data suggest that the latter protein is subject to post-translational degradation through a proteasome-independent mechanism and therefore unlikely to exert significant biological activity. Nevertheless, as the MIRb-ACE2 protein was indeed made under certain conditions, it remained possible that it retained some biological function or that it affected the function of canonical ACE2 through heterodimer formation. To examine this possibility, we quantified levels of enzymatically active ACE2, an assay that is considerably more sensitive than Western blotting, and found, as expected, strong enzymatic activity in lysates from ACE2-transfected, but not MIRb-ACE2-transfected cells (Fig. 5f) . Furthermore, co-transfection with MIRb-ACE2 did not affect the enzymatic activity conferred by ACE2 transfection (Fig. 5f ). To determine any involvement of the predicted MIRb-ACE2 protein in SARS-CoV-2 entry, we measured binding of the S1 subunit of SARS-CoV-2 spike glycoprotein, the first step of viral entry, to cells expressing either or both ACE2 isoforms. 
HEK293T cells were transfected with the P2A-GFP reporter constructs for ACE2 and MIRb-ACE2 and transfected and untransfected cells were distinguished based on GFP expression (Extended Data Fig. 10 ). Whereas SARS-CoV-2 S1 efficiently bound HEK293T cells expressing ACE2, it did not bind those expressing MIRb-ACE2 (Fig. 5g) . Moreover, co-expression of the two isoforms in the same cells did not alter binding of SARS-CoV-2 S1, beyond the effect of plasmid dilution (Fig. 5g) . Collectively, these results argue against a significant effect of MIRb-ACE2 expression on ACE2 function or SARS-CoV-2 entry. Regulation of ACE2 expression and function is critical both in physiology and pathology 28 . The use of ACE2 as a primary receptor for entry by the pandemic coronaviruses, SARS-CoV and SARS-CoV-2, highlighted the potential effect of changes in ACE2 expression, particularly in response to IFN, on the course or severity of COVID-19 17 . Here, we show that ACE2 transcription and protein production are not responsive to IFN. Instead, we describe a novel RNA isoform, MIRb-ACE2, that is highly responsive to IFN stimulation, but encodes a truncated and unstable protein product. In support of these findings, the new isoform is independently described in two other recent pre-print reports 24, 25 and matches the sequence recently deposited under GenBank accession number MT505392. We find that the MIRb-ACE2 isoform exhibits distinct patterns of expression along the aerodigestive and gastrointestinal tracts and was likely responsible for the apparent IFN inducibility of ACE2 expression reported by analysis of single-cell RNA-seq data 17 and other similar studies 20 . We further show that transcription of this novel isoform is initiated by intronic retroelements, which function as a cryptic, IFN-responsive promoter, adding further evidence for the widespread involvement of such retroelements in gene regulatory networks. 
Indeed, endogenous retroelements comprise nearly half the human genome and can affect many host processes [29] [30] [31] . LTR and non-LTR retroelements represent an abundant source of promoters, enhancers, and polyadenylation sequences that can modulate the expression and structure of neighboring genes 32 , as with ACE2. For instance, retroelements serve as promoters or enhancers for a number of ISGs, conferring IFN inducibility, exemplified in the case of AIM2 7 . Retroelements may further modify the function of ISGs and we have recently described a novel isoform of the ISG CD274 (encoding PD-L1) that produces a truncated form through retroelement exonization 33 . The use of the intronic MIRb and LTR16A1 elements as the promoter and alternative exon for the MIRb-ACE2 isoform explains its independent regulation from that of the full-length ACE2 isoform. In addition to IFN inducibility, the cryptic MIRb-LTR16A1 promoter also confers tissue-specific expression, with the highest levels seen in the upper aerodigestive tract, where it can be the predominant isoform. In contrast, the canonical ACE2 isoform far exceeds expression of the MIRb-ACE2 isoform in the lower gastrointestinal tract. It is theoretically possible that the balance of MIRb-ACE2 and full-length ACE2 isoforms plays a role in the spread of SARS-CoV-2, particularly in the upper aerodigestive tract, or that RNA or protein products of MIRb-ACE2 are involved in other pathological or physiological processes. However, the low stability of the MIRb-ACE2 protein product argues that this is unlikely. Independently of any functional significance, expression of the MIRb-ACE2 isoform needs to be carefully considered in studies examining ACE2 regulation at the transcriptional level 17, 19, 20 . The description of this novel isoform highlights the need to validate single-cell RNA-seq data with orthogonal approaches. 
While single-cell RNA-seq initiatives are an invaluable resource and allow for rapid identification of cell types that express a gene of interest, coverage and read depth are largely insufficient to distinguish between isoforms. Technological advances to improve sequencing depth and bioinformatic tools to impute missing values are rapidly progressing; in the meantime, long-read sequencing techniques to quantify transcript isoforms and confirmation of protein expression levels can be incorporated into existing workflows. This work established MIRb-ACE2 as the predominantly induced form of ACE2 following viral infection or recombinant interferon treatment, including in the SARS-CoV-2-infected lung. The suggestion that ACE2 is an ISG raised fears that therapeutic interferon could be detrimental 17 ; however, we find that full-length ACE2 is not increased at the mRNA or protein level. The predicted MIRb-ACE2 protein product could be detected in vitro, albeit under high levels of MIRb-ACE2 RNA expression, and it remains possible that the MIRb-ACE2 protein, or fragments thereof, are produced under certain conditions in vivo. Indeed, despite its reduced stability when compared to full-length ACE2, evidence for production of the MIRb-ACE2 protein has also been independently reported 24, 25 . Nevertheless, it is worth noting that the predicted MIRb-ACE2 protein does not contain the residues required for SARS-CoV-2 spike glycoprotein binding 15 , does not bind recombinant SARS-CoV-2 S1 experimentally and is thus unlikely to contribute to viral spread. These results reconcile the apparent discrepancy between the interferon inducibility of ACE2 with promising data showing improved outcomes in COVID-19 following interferon treatment 4,5 . HEK293T, A549, SCC-4, SCC-25, Vero, CV-1, MDCK, R9ab and MCA-38 cells were obtained from and verified as mycoplasma free by the Cell Services facility at the Francis Crick Institute. 
Human cell lines were additionally validated by DNA fingerprinting. HEK293T and A549 cells were grown in Iscove's Modified Dulbecco's Medium (Sigma-Aldrich) supplemented with 5% fetal bovine serum (Thermo Fisher Scientific), L-glutamine (2 mmol/L, Thermo Fisher Scientific), penicillin (100 U/mL, Thermo Fisher Scientific), and streptomycin (0.1 mg/mL, Thermo Fisher Scientific). SCC-4 and SCC-25 cells were grown in Dulbecco's Modified Eagle Medium/Nutrient Mixture F-12 (Gibco) supplemented with 10% fetal bovine serum (Thermo Fisher Scientific), L-glutamine (2 mmol/L, Thermo Fisher Scientific), penicillin (100 U/mL, Thermo Fisher Scientific), and streptomycin (0.1 mg/mL, Thermo Fisher Scientific). NHBE cells were cultured as previously described 22 . Transcripts were previously assembled on a subset of the RNA-seq data from The Cancer Genome Atlas (TCGA) 18 . The alternative promoter within ACE2 was more highly expressed in lung squamous carcinomas than the canonical isoform, prompting us to investigate its biology. RNA-seq data from TCGA, GTEx, CCLE, and other studies were mapped to the cancer-tissue transcriptome assembly and counted as previously described 18 . Briefly, transcripts per million (TPM) values were calculated for all transcripts in the transcript assembly 18 with a custom Bash pipeline (Supplementary Code 1) using GNU parallel 34 v3 and Salmon 35 v0.12.0, which uses a probabilistic model for assigning reads aligning to multiple transcript isoforms, based on the abundance of reads unique to each isoform 35 . Splice junctions were visualized using the Integrative Genome Viewer (IGV) 36 v2.4.19. Bulk RNA-seq data were downloaded from study GSE147507 13 . Reads were adapter trimmed and filtered for a minimum length of 35 nt using Trimmomatic v0.39. Since some samples were infected with SARS-CoV-2 in vitro, we identified and removed viral reads using Bowtie2 (seed length 30 nt) to align reads to the Wuhan region reference genome (MN908947). 
Subsequently, reads were mapped with HISAT2 (optional parameters -p 8 -qk 5) against GRCh38 reference chromosome assembly and transcripts were quantified against our custom transcriptome assembly using Salmon 35 v0.12.0, as described previously 18 and above in "Transcript identification, read mapping and quantitation". For single-cell RNA-seq data analysis, we downloaded the raw paired end sequencing reads as unmapped bam files from study GSE134355 21 , which were already demultiplexed, with one individual per tissue per sample. We then used the DropSeq:picard toolbox (v2.3.0) to recapitulate processing of HCL samples as documented on 'https://github.com/ggjlab/HCL'. In summary, this includes trimming polyA ends from each primary RNA sequencing read and tagging it with the cellular and molecular adapter sequence contained in the secondary read (BASE_RANGE=1-6:22-27:43-48 and BASE_RANGE=49-54, respectively). All reads were then mapped with HISAT2 (optional parameters -p 8 -q -k 5) against GRCh38 reference chromosome assembly. The HISAT2 index here was built with the --exon / --ss option to cover all known splice sites annotated in the GENCODE v34 basic annotation. The cellular and molecular barcode sequences were recovered using the MergeBamAlignment utility in picard. Total RNA from NHBE, SCC-4 and SCC-25 cells was isolated using the QIAcube (Qiagen) and cDNA synthesis was carried out with the High Capacity Reverse Transcription Kit (Applied Biosystems), with an added RNase inhibitor (Promega). Amplicons were generated using the 5' RACE System for Rapid Amplification of cDNA Ends (Invitrogen), according to manufacturer's instructions using primers listed in Supplementary Table 1. Libraries were prepared from amplicons using the NEB Ultra II DNA Library Prep Kit for Illumina (New England Biolabs), according to manufacturer's instructions and sequenced on a MiSeq system (Illumina). 
Reads were quality and adapter trimmed in pairs using cutadapt 37 v1.18 and aligned with STAR 38 v2.7.1a (setting outFilterScoreMinOverLread = 0.1 and outFilterMatchNminOverLread = 0.1) to a GRCh38 reference with known splice sites from Ensembl release 100. The most 5' base of each read mapping to the MIRb-ACE2 transcript was taken as the TSS; these positions were obtained from the properly-paired, uniquely-mapping reads using bedtools for visualisation within IGV v2.4.19. To identify the integration time of LTR16A1 into the ACE2 locus, we first compared the Homo sapiens LTR16A1 and MIRb to the respective consensus sequences in Dfam 39 . Based on sequence identity and the human neutral substitution rate estimated at 2.2 × 10⁻⁹ substitutions per site per year, the LTR16A1 insertion is expected to be ~131 million years old (with 284 nt matches across 399 nt) and the MIRb insertion ~155 million years old (with 159 nt matches across 241 nt). To find evidence for insertion of the LTR16A1 and MIRb elements before the split of the major mammalian lineages, we used the UCSC liftover utility to find the ACE2 gene locus in Rhesus macaque (rheMac10 assembly), marmoset (caljac3 assembly), mouse (mm10 assembly), dog (canFam3 assembly), African elephant (loxAfr3 assembly), bottle-nose dolphin (Turtru2 assembly), cow (bosTau9 assembly), opossum (monDom5 assembly) and platypus (ornAna2). We used the MUSCLE aligner on default settings to build a global alignment of human to rhesus macaque and marmoset, and then aligned all other species to the profile, reverting the strand of the whole sequence for mouse, elephant, cow and opossum due to whole gene inversions. We then used muscle -refine on overlapping 30,000 column blocks to refine the alignment locally. Then we identified the longest potential sequences matching the LTR16A1 and MIRb elements in all species based on the sequences aligning with the repeat sequence in the human genome as annotated by RepeatMasker. 
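The insertion ages quoted above for LTR16A1 and MIRb can be checked with a short calculation. The sketch assumes the simplest possible model: the fraction of mismatched positions relative to the Dfam consensus is divided by the neutral substitution rate, with no correction for multiple substitutions at the same site. This appears consistent with the quoted figures, although the exact procedure is not spelled out in the text. The function name is illustrative.

```python
NEUTRAL_RATE = 2.2e-9  # substitutions per site per year (value given in the text)

def insertion_age_myr(matches, aligned_length, rate=NEUTRAL_RATE):
    """Estimate insertion age in million years from identity to the consensus.

    Assumes age = (1 - identity) / rate, i.e. uncorrected divergence from the
    consensus accumulating at a constant neutral rate.
    """
    if not 0 < matches <= aligned_length:
        raise ValueError("matches must be positive and no larger than aligned_length")
    divergence = 1 - matches / aligned_length
    return divergence / rate / 1e6

# Match counts and alignment lengths as reported in the text.
print(round(insertion_age_myr(284, 399)))  # LTR16A1
print(round(insertion_age_myr(159, 241)))  # MIRb
```

With the reported match counts, the sequence identities are about 71% for LTR16A1 and 66% for MIRb, and the rounded ages come out at 131 and 155 million years respectively.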
The candidate LTR16A1 and MIRb sequences from each species were aligned to the LTR16A1 and MIRb consensus sequences from Dfam 3.2 with mafft (--ep 0 --genafpair --maxiterate 1000 options) and intronic sequence clearly distinct from the repeats was trimmed. The two elements are absent from the considerably shorter platypus ACE2 intron. In opossum, the respective intronic sequence is extended but no clear matches with either LTR16A1 or MIRb were found, prompting us to place both insertions ahead of the mammalian radial divergence. The lineage tree illustration and its node times are taken from timetree.org.
Open reading frames encoding ACE2, MIRb-ACE2, and respective lysine mutants were synthesized and cloned into the pcDNA3.1-DYK-P2A-GFP mammalian expression vector. Gene synthesis, cloning, and mutagenesis were performed by GenScript and verified by sequencing. Cells were transfected using GeneJuice (EMD Millipore) and harvested 48 hrs post-transfection for downstream assays.
For interferon stimulation experiments, 2 × 10⁵ SCC-4 and SCC-25 cells were stimulated with 100 ng/mL IFN-α or IFN-γ (Abcam) or PBS for 48 hrs. For proteasome inhibition experiments, cells were cultured in 20 μM MG-132 (EMD Millipore) from 24 hrs after transfection and harvested 48 hrs after transfection. For cycloheximide experiments, cells were treated with 250 μg/mL cycloheximide (Sigma Aldrich) and harvested at the denoted timepoints. NHBE cells stimulated for 4 hrs with 1000 ng/mL IFNα, 100 ng/mL IFNβ or 100 ng/mL IFNλ were used in a previous study 22 , and stored cDNA was analyzed by RT-qPCR in this study.
Cell lysates in RIPA buffer were resuspended in SDS buffer, heat denatured at 95°C for 10 min, run on a 4-20% gel (Biorad), transferred to a PVDF membrane (Biorad), and blocked in 5% (w/v) bovine serum albumin fraction V (Sigma-Aldrich) in TBS-T.
Membranes were incubated with primary antibodies to ACE2 (1:1000; ab15348, Abcam) or FLAG (1:1000; F1804-50UG clone M2, Sigma-Aldrich), HRP-conjugated secondary antibodies to rabbit IgG or mouse IgG (1:1000; #7074 and #7076, respectively, Cell Signaling Technology), and an HRP-conjugated antibody to actin (1:25000; ab49900, Abcam). Blots were visualized by chemiluminescence on an Amersham Imager 600 (GE Healthcare).
Total RNA from cell lines was isolated using the QIAcube (Qiagen), and cDNA synthesis was carried out with the High Capacity Reverse Transcription Kit (Applied Biosystems) with an added RNase inhibitor (Promega). Purified cDNA was used to quantify human ACE2 and MIRb-ACE2, or Ace2 and MIRb-Ace2 in other mammalian species, using variant-specific and species-specific primers (Supplementary Table 1). The IFN-inducible human genes CXCL10 and CD274 were also amplified as controls for the effect of IFN treatment, using transcript-specific primers (Supplementary Table 1). For amplification of a conserved house-keeping gene, primers complementary to HPRT sequences conserved in all species were used (Supplementary Table 1). Values were normalized to HPRT expression using the ΔCT method.
ACE2 activity in cell lysates was measured using the SensoLyte 390 ACE2 Activity Assay (AnaSpec) according to the manufacturer's instructions. Recombinant human ACE2 (Sigma-Aldrich) was used as a positive control.
For SARS-CoV-2 S1 binding assays, cells were stained with biotinylated S1 (1:200; Acro Biosystems) for 30 minutes followed by APC-Streptavidin (1:200; Biolegend). For S1 binding assays and for GFP detection, single-cell suspensions were run on an LSR Fortessa (BD Biosciences) running BD FACSDiva v8.0 and analysed with FlowJo v10 (Tree Star Inc.) analysis software.
Statistical comparisons were made using GraphPad Prism 7 (GraphPad Software) or SigmaPlot 14.0.
Parametric comparisons of normally distributed values that satisfied the variance criteria were made by unpaired Student's t-tests or one-way analysis of variance (ANOVA) tests. Data that did not pass the variance test were compared with non-parametric two-tailed Mann-Whitney Rank Sum tests or ANOVA on Ranks tests.
Extended Data Fig. 1 Normalized data from the FANTOM Consortium and the RIKEN PMI and CLST (DGT) for transcription start sites in the proximity of the intronic MIRb and LTR16A1 elements in the ACE2 locus. Both the sense and antisense orientations are depicted. Data were visualized with the zenbu online viewer (https://fantom.gsc.riken.jp/zenbu) for FANTOM5 Human hg38 promoterome. RNA-seq trace of two multiplexed samples from adult lung, obtained from study GSE134355. Note the lack of coverage across the entire locus with the exception of only the 3' end of the last exon, shared between the isoforms.
Data supporting the findings of this study are available within the article and its supplementary information files. All data, plasmids and cell lines are available from the corresponding author upon reasonable request. Publicly available data were downloaded from the following databases:
(top), and mean (±SE) MIRb-ACE2 expression, determined by RT-qPCR in the same cells, in comparison with MIRb-ACE2 expression in IFNα-stimulated NHBE, SCC-4 and SCC-25 cells (bottom). Each symbol represents the mean value of two technical RT-qPCR replicates of a single culture, and the bars and error bars represent the mean and SE of the three independently treated cultures in the same experiment. d, Detection of ACE2 and MIRb-ACE2 protein by Western blotting for the FLAG tag in cell lysates from HEK293T cells transfected (with 4 μg of expression plasmids) to express either wild-type isoform or either isoform with the two lysine residues mutated (K2R) (all in conjunction with a FLAG tag and GFP, linked by a P2A peptide).
HEK293T cells transfected to express the wild-type isoforms were treated with the proteasome inhibitor MG-132. One representative of 2 experiments is shown. e, Stability of ACE2 and MIRb-ACE2 protein, determined by Western blotting in HEK293T cells transfected to express either isoform, at the indicated times following treatment with cycloheximide. Data from a single experiment are shown. f, Kinetics of mean (±SD) ACE2 enzymatic activity in the supernatant of HEK293T cells transfected to express either ACE2 or MIRb-ACE2 or both (ACE2 + MIRb-ACE2). Expression plasmids were used at 4 μg and 2 μg each for individual transfections and co-transfections, respectively. Symbols represent the mean value of two technical replicates in the same experiment. One representative of 2 experiments is shown. g, Flow cytometric detection of SARS-CoV-2 S1 binding to HEK293T cells transfected to express either ACE2 or MIRb-ACE2 or both (ACE2 + MIRb-ACE2). ACE2 and MIRb-ACE2 expression plasmids were used at 4 μg and 14 μg for individual transfections, respectively, and at 2 μg and 14 μg for co-transfections, respectively.
1. Interferon-inducible antiviral effectors
2. Type I interferons in host defense
3. IFN-α subtypes: distinct biological activities in anti-viral therapy
4. Triple combination of interferon beta-1b, lopinavir-ritonavir, and ribavirin in the treatment of patients admitted to hospital with COVID-19: an open-label, randomised, phase 2 trial
5. Retrospective Multicenter Cohort Study Shows Early Interferon Therapy Is Associated with Favorable Clinical Responses in COVID-19 Patients
6. Regulation of type I interferon responses
7. Regulatory evolution of innate immunity through co-option of endogenous retroviruses
8. Resurrection of endogenous retroviruses in antibody-deficient mice
9. Microarray analysis reveals global modulation of endogenous retroelement transcription by microbes
10. Physiological and Pathological Transcriptional Activation of Endogenous Retroelements Assessed by RNA-Sequencing of B Lymphocytes. Frontiers in microbiology
11. ERVmap analysis reveals genome-wide transcription of human endogenous retroviruses
12. Ten Strategies of Interferon Evasion by Viruses
13. Imbalanced Host Response to SARS-CoV-2 Drives Development of COVID-19
14. Severe acute respiratory syndrome coronavirus open reading frame (ORF) 3b, ORF 6, and nucleocapsid proteins function as interferon antagonists
15. Structural basis of receptor recognition by SARS-CoV-2
16. SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor
17. SARS-CoV-2 Receptor ACE2 Is an Interferon-Stimulated Gene in Human Airway Epithelial Cells and Is Detected in Specific Cell Subsets across Tissues
18. LTR retroelement expansion of the human cancer transcriptome and immunopeptidome revealed by de novo transcript assembly
19. A single-cell RNA expression map of human coronavirus entry factors
20. Cigarette Smoke Exposure and Inflammatory Signaling Increase the Expression of the SARS-CoV-2 Receptor ACE2 in the Respiratory Tract
21. Construction of a human cell landscape at single-cell level
22. Type I and III interferons disrupt lung epithelial repair during recovery from viral infection
23. Pre-existing and de novo humoral immunity to SARS-CoV-2 in humans
24. Interferons and viruses induce a novel primate-specific isoform dACE2 and not the SARS-CoV-2 receptor ACE2. bioRxiv
25. A novel isoform of ACE2 is expressed in human nasal and bronchial respiratory epithelia and is upregulated in response to RNA respiratory virus infection
26. Multi-level proteomics reveals host-perturbation strategies of SARS-CoV-2 and SARS-CoV
27. SARS-CoV-2 and bat RaTG13 spike glycoprotein structures inform on virus evolution and furin-cleavage effects
28. The emerging role of ACE2 in physiology and disease
29. Human transposon tectonics
30. Endogenous viruses: insights into viral evolution and impact on host biology
31. Immune responses to endogenous retroelements: taking the bad with the good
32. Long Terminal Repeats: From Parasitic Elements to Building Blocks of the Transcriptional Regulatory Repertoire
33. Soluble PD-L1 generated by endogenous retroelement exaptation is a receptor antagonist
34. The Command-Line Power Tool. The USENIX Magazine
35. Salmon provides fast and bias-aware quantification of transcript expression
36. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration
37. Cutadapt removes adapter sequences from high-throughput sequencing reads
38. STAR: ultrafast universal RNA-seq aligner
39. The Dfam database of repetitive DNA families
We are grateful for assistance from the Advanced Sequencing, Scientific Computing, Flow Cytometry and Cell Services facilities at the Francis Crick Institute. The results shown here are in whole or part based upon data generated by The Cancer Genome Atlas (TCGA) Research Network (http://cancergenome.nih.gov). The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. This work benefited from data assembled by the CCLE consortium.
This work was supported by the Francis Crick Institute (FC001099, FC001206, FC001078), which receives its core funding from Cancer Research UK, the UK Medical Research Council, and the Wellcome Trust; and by the Wellcome Trust (102898/B/13/Z).
composite LUAD and LUSC samples. Also shown is splice junction analysis of the same RNA-seq samples. b, Phylogenetic analysis of the MIRb and LTR16A1 sequences in the indicated representative mammalian species and percent sequence identity to the consensus MIRb and LTR16A1 sequences. The arrows indicate the estimated timing of ancestral integrations of the MIRb and LTR16A1 elements, respectively. mya, million years ago.
Custom code used in this study is available in the supplementary information.