key: cord-0968825-36j6daa3
authors: Chang, Jessie J.-Y.; Gleeson, Josie; Rawlinson, Daniel; De Paoli-Iseppi, Ricardo; Zhou, Chenxi; Mordant, Francesca L.; Londrigan, Sarah L.; Clark, Michael B.; Subbarao, Kanta; Stinear, Timothy P.; Coin, Lachlan J. M.; Pitt, Miranda E.
title: Long-Read RNA Sequencing Identifies Polyadenylation Elongation and Differential Transcript Usage of Host Transcripts During SARS-CoV-2 In Vitro Infection
date: 2022-04-06
journal: Front Immunol
DOI: 10.3389/fimmu.2022.832223
sha: 5a75c92bb5a66dc126447694ec8480c2e7efdad8
doc_id: 968825
cord_uid: 36j6daa3

Better methods to interrogate host-pathogen interactions during Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infections are imperative to help understand and prevent this disease. Here we implemented RNA-sequencing (RNA-seq) using Oxford Nanopore Technologies (ONT) long-reads to measure differential host gene expression, transcript polyadenylation and isoform usage within various epithelial cell lines permissive and non-permissive for SARS-CoV-2 infection. SARS-CoV-2-infected and mock-infected Vero (African green monkey kidney epithelial cells), Calu-3 (human lung adenocarcinoma epithelial cells), Caco-2 (human colorectal adenocarcinoma epithelial cells) and A549 (human lung carcinoma epithelial cells) were analyzed over time (0, 2, 24, 48 hours). Differential polyadenylation was found to occur in both infected Calu-3 and Vero cells during a late time point (48 hpi), with Gene Ontology (GO) terms such as viral transcription and translation shown to be significantly enriched in Calu-3 data. Poly(A) tails showed increased lengths in the majority of the differentially polyadenylated transcripts in Calu-3 and Vero cell lines (up to ~101 nt in mean poly(A) length, padj = 0.029). Of these genes, ribosomal protein genes such as RPS4X and RPS6 also showed downregulation in expression levels, suggesting the importance of ribosomal protein genes during infection. Furthermore, differential transcript usage was identified in Caco-2, Calu-3 and Vero cells, including transcripts of genes such as GSDMB and KPNA2, which have previously been implicated in SARS-CoV-2 infections. Overall, these results highlight the potential role of differential polyadenylation and transcript usage in host immune response or viral manipulation of host mechanisms during infection, and therefore, showcase the value of long-read sequencing in identifying less-explored host responses to disease.

The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) was first discovered in Wuhan, China at the end of 2019 and is the causative agent of the global Coronavirus Disease 2019 (COVID- 19) pandemic. The World Health Organization (WHO) reported over 5.8 million deaths and over 409 million confirmed cases globally as of mid-February 2022 (1) , and the global health, social and economic burden due to this disease continues to grow. Extensive research on this virus has been carried out since the first discovery of the pathogen. Nevertheless, continued exploration of the host response during an infection with SARS-CoV-2 is imperative for developing novel therapeutics, diagnostics, and prophylactics.

The host response to SARS-CoV-2 infection has been comprehensively studied within the past two years. This includes transcriptomic studies of the host using RNA sequencing (RNA-seq) from in vitro infections of cell lines/ primary cells, in vivo infection models in ferrets as well as clinical samples from infected patients (2) (3) (4) . Of these, in vitro SARS-CoV-2 infection studies using continuous cell lines have been commonly used, due to the simplicity of the model. Vero (African green monkey kidney epithelial) cells are known for their high susceptibility to SARS-CoV-2, due to their defective interferon I responses (5) . However, due to the lack of biological relevance using these cells, human epithelial cells have mostly been used for assessing host responses instead of Vero cells, such as Calu-3 (human lung adenocarcinoma epithelial), Caco-2 (human colorectal adenocarcinoma epithelial) and A549 (human lung carcinoma epithelial) cells. SARS-CoV-2-infected Calu-3 cells exhibited upregulation of genes involved in innate immune response to viral infections such as IFIT2, OAS2, or IFNB1, similar to the responses elicited by the SARS-CoV-1 virus (2, 6) . Also, in both Calu-3 and Caco-2 cells, genes involved in response to Endoplasmic Reticulum (ER) stress and mitogenactivated protein (MAP) kinases were upregulated during infection (6) . However, responses between Calu-3 and Caco-2 were found to be cell-specific. Caco-2 cells lacked in innate immune responses when infected with SARS-CoV-1/2 (6) (7) (8) , and have shown fewer changes at the gene (6) and protein level (9) compared to Calu-3 cells. Furthermore, A549 cells have shown lack of susceptibility to SARS-CoV-2, despite being a human airway epithelial cell line like Calu-3 cells (2, 10) . This has been attributed to the lack of the main entry receptor of SARS-CoV-2 -Angiotensin-Converting Enzyme 2 (ACE2)on the surface of these cells. However, air-liquid interface culturing or ACE2-expressing A549 (A549-hACE2) cells enhanced the susceptibility to SARS-CoV-2 (11, 12) . Overall, host responses appeared to vary between different epithelial cell lines and were dependent on the multiplicity of infection (MOI) of the virus in A549-hACE2 cells (2) .

Most RNA-seq data reported in the literature have been generated using short-read sequencing methods such as Illumina sequencing (13, 14) . In these studies, differential expression and Gene Ontology (GO)/Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses have been the main outcomes. Short-read RNA-seq is an effective technique for measuring differential mRNA abundance. However, utilizing a long-read sequencing platform provides the ability to discern other functionally significant mRNA features such as length of the poly(A) tails, alternative splicing, and differential isoform usage (15) (16) (17) (18) . These additional mRNA features have been linked with different disease states (19) (20) (21) . However, these events have not been studied in depth for infectious diseases, especially with SARS-CoV-2 infections. An ability to measure full-length transcripts, polyadenylation status and isoform usage would permit significantly enriched insights into host responses to viral infection than standard RNA-seq methods allow.

Here we report the use of RNA-seq methods from the Oxford Nanopore Technologies (ONT) platform (direct RNA, direct cDNA and PCR cDNA) to carry out an in-depth investigation into the host response to SARS-CoV-2 in vitro. The responses were visualized throughout a time-course (0, 2, 24 and 48 hours post infection (hpi)) using four epithelial cell lines (Vero, Calu-3, Caco-2 and A549). Previously we performed a comprehensive analysis of the viral response for some of these datasets (22) . In this current study, we investigated differential polyadenylation and transcript usage between infected and mock control cells. Additionally, we were interested in whether long-read differential expression analysis conveyed similar differential expression results to short-read RNA-seq studies shown in literature. Overall, our study demonstrated the value of long-read sequencing in identifying less-explored host responses to disease.

ONT sequencing data (direct RNA and direct cDNA) for this study from cell lines (Vero, Caco-2 and Calu-3) was derived from our previous work (22) , and is currently publicly available at NCBI repository BioProject PRJNA675370. Additional datasets were generated for this study including PCR cDNA datasets for cell lines (Vero, Caco-2, Calu-3 and A549) and the direct RNA and direct cDNA datasets for A549. These datasets are also available at NCBI repository BioProject PRJNA675370. The results and code of individual analyses are available at Figtree DOI: 10.6084/m9.figshare.17139995 (differential expression), 10.6084/m9.figshare.16841794 (differential polyadenylation) and 10.6084/m9.figshare.17140007 (differential transcript usage).

Cell culture and RNA extraction/preparation methods have been described previously (22) for Calu-3 (human lung adenocarcinoma epithelial -ATCC HTB-55), Caco-2 (human colorectal adenocarcinoma epithelial -ATCC HTB-37) and Vero (African green monkey kidney epithelial -ATCC CCL-81) cells. For this current study, we additionally cultured A549 (human lung carcinoma epithelial -ATCC CCL-185) cells to supplement our main data, using similar methods. Briefly, A549, Vero, Calu-3 and Caco-2 cell lines were cultured in T75 flasks and maintained at 37°C and 5% (v/v) CO 2 . A549 cells were cultured with Ham's F-12K (Kaighn's) Medium (Gibco) supplemented with 10% FBS, 4 mM Lglutamine (Media Preparation Unit, The Peter Doherty Institute for Infection and Immunity (Doherty Institute)), 100 IU penicillin, 10 µg streptomycin/mL, 1X non-essential amino acids (Gibco-BRL) and 50 µM B-mercaptoethanol (Life Technologies). All cell lines were seeded in 4 x 6-well tissue-culture plates and maintained at 70-80% confluency for infection. Three wells of the 6-well plates were infected with SARS-CoV-2 (Australia/VIC01/2020) at a MOI of 0.1 and the remaining wells were used as mock controls for four time points (0, 2, 24 and 48 hpi). Total cellular RNA was extracted with the RNeasy Mini Kit (Qiagen), treated with the Turbo DNAse-free Kit (Invitrogen) and purified with RNAClean XP magnetic beads (Beckman Coulter). The final resulting RNA was eluted in nucleasefree water. Quality control was carried out using NanoDrop 2000C (Thermo Fisher Scientific), Bioanalyzer 2100 (Agilent Technologies) and Qubit 4 Fluorometer (Invitrogen).

Library preparation and sequencing methods have been described previously (22) . Briefly, RNA from mock control and infected cells harvested at 0, 2, 24 and 48 hpi from Caco-2, Calu-3 and Vero cells was sequenced with the ONT Direct cDNA Sequencing Kit (SQK-DCS109) in conjunction with the Native Barcoding Kits (EXP-NBD104 & EXP-NBD114). RNA harvested at 2, 24 and 48 hpi was sequenced with the Direct RNA Sequencing Kit (SQK-RNA002) by pooling the RNA from replicate wells. For this current study, RNA from A549 cells was sequenced as per our previous work with minor modifications in the number of time points sequenced, to supplement our main data. The ONT Direct RNA Sequencing Kit (SQK-RNA002) was used to prepare 6 µg of pooled total RNA (2 µg RNA from each replicate well) from control and infected cells at 24 hpi. The Direct cDNA Sequencing Kit (SQK-DCS109) was used in conjunction with the Native Barcoding Kit (EXP-NBD104) to prepare 3 µg of total RNA from all control and infected replicates separately at both 0 and 24 hpi time points. All direct RNA and direct cDNA libraries were loaded onto a R9.4.1 flow cell and sequenced for 72 hrs using an ONT MinION or GridION. Additionally, PCR cDNA long-read sequencing was carried out with RNA from all four cell lines (Vero, Caco-2, Calu-3 and A549) cells using the following methods: cDNA libraries were constructed with the PCR-cDNA Sequencing (SQK-PCS109) and PCR Barcoding (SQK-PBK004) kits using the supplied protocol. RNA samples from 0 and 24 hpi were randomized and multiplexed for sequencing in groups of six using sequential barcodes. 100 ng of sample RNA was used for cDNA synthesis. Transcripts were amplified by PCR and barcodes added using the specified cycling conditions with a 7 min extension time and 13x cycles. Amplified samples were individually cleaned using 0.5x AMPure XP beads (Beckman Coulter) and quantified using a Qubit 4 Fluorometer (Invitrogen). The length distribution was determined via the TapeStation 4200 (Agilent Technologies) before pooling. Equimolar amounts of each barcoded sample were pooled to a total of 100 -200 fmol (assuming median transcript size = 1.1 kb). 100 fmol of final libraries were loaded onto a R9.4.1 flow cell and sequenced for 72 hrs on an ONT GridION. Run metrics were monitored live and if active pores dropped below 200, any remaining library was loaded following a nuclease flush. Synthetic 'sequin' RNA standards, provided in two mixes (A and B) (23) , were added to each sample in direct RNA and PCR cDNA libraries. Mix A and B sequins, diluted 1:250 (approximately 6-10% of estimated total mRNA), were added to infected and control samples, respectively.

Publicly available data from our previous work (22) in combination with data generated from this study were analyzed using two High Performance Computing (HPC) platforms Spartan (24) and Nectar from the Australia Research Data Commons.

All FAST5 files were basecalled using standalone Guppy v3.5.2 (https://community.nanoporetech.com/sso/login?next_url=% 2Fdownloads), except PCR cDNA data from Vero cells which were live-basecalled using Guppy v3.2.8. All resulting FASTQ data were mapped using Minimap2 v2.17 (25) . Direct RNA-seq data was mapped to the combined genome (consisting of human/African green monkey genome from Ensembl (release 100), SARS-CoV-2 Australia virus (Australia/VIC01/2020, NCBI : MT007544.1) and the RNA sequin decoy chromosome genome (23) with the default direct RNA parameters '-ax splice -uf -k14 --secondary=no' and for all cDNA datasets '-ax splice --secondary=no'. All data were mapped to the respective combined transcriptome using the following parameters -'-ax map-ont'. The resulting BAM files were sorted and indexed using Samtools v1.9 (26) . Counts files were generated using Featurecounts v2.0.0 (27) for genome-mapped cDNA data, and with Salmon v0.13.1 (28) for transcriptome-mapped cDNA data.

DESeq2 was used to identify differentially expressed genes/transcripts from direct cDNA data. A minimum expression threshold of five reads per gene/transcript across all the replicates was used. Comparisons between control and infected cells were made per time point (0, 2, 24, 48 hpi) with standard methods. Also, the changes between time points (0-2, 0-24, 2-24, 24-48 hpi) were compared, where the interaction term between control and infected samples across time points were found using a method by Steven Ge (https://rstudio-pubs-static.s3.amazonaws.com/329027_ 593046fb6d7a427da6b2c538caf601e1.html#example-4-twoconditionss-three-genotpes-with-interaction-terms). All genes/transcripts with p-adjusted value (padj) < 0.05 were regarded as significantly differentially expressed. This is a more sensitive method as it calculates the changes between time points in infected cells while accounting for the changes in the expression level in the control cells. The heatmap of differentially expressed genes in Caco-2 and Calu-3 at 24 and 48 hpi were generated using a novel shiny-app multiGO (http://coinlab.mdhs.unimelb.edu.au/multigo). The filters used were padj < 0.05, GO p-value < 0.0001, scaled by row, with ten maximum GO terms. Columns with more than 80% of NA's and rows with more than 10% of NA's were excluded.

Two tools were used for poly(A) tail length analysis; nanopolish (29) and tailfindr (30) . For the nanopolish analysis, all Caco-2, Calu-3 and Vero direct RNA BAM files mapped to the combined reference genome (host, sequin, virus) were indexed with the nanopolish v0.13.2 'index' function with the command 'nanopolish index -d $FAST5 -s $SEQUENCING_SUMMARY $FASTQ'. The poly(A) tail lengths of each read were estimated using the 'polya' function with default parameters 'nanopolish polya -reads $FASTQ -bam $SORTED_BAM -genome $COMBINED_REFERENCE_ GENOME > output.tsv'. The host reference sequence names were extracted from the reference file with the following command 'cat

Using this name file, the host reads were extracted from the final nanopolish TSV file by this command 'awk 'NR==FNR{A[$1]; next} $2 in A' $NAMES.TSV $TSV'.

After the poly(A) lengths were determined, duplicates were removed, and the outputs were merged with a file generated by an in-house pipeline -npTranscript (https://github.com/ lachlancoin/npTranscript) -which allowed read names to be associated with Ensembl ID's. The data were grouped per Ensembl ID and whether they were mitochondrial or nonmitochondrial genes, and the median poly(A) lengths were calculated. Differential polyadenylation between the overall median lengths of control vs infected cells per cell line were determined with p-values using Wilcoxon's test of ranks for Ensembl ID's with more than one entry.

As the nanopolish results revealed evidence of differential polyadenylation between the overall median of control and infected poly(A) lengths, tailfindr analysis was utilized to gather more evidence at a gene level. For tailfindr analysis, direct cDNA datasets from Vero, Calu-3 and Caco-2 were interrogated. Basecalled FAST5 files were subsetted by read ID's derived from demultiplexed FASTQ files. Each replicate was passed through tailfindr v0.1.0 separately. The median poly (A) and poly(T) lengths were calculated per gene and grouped by whether they were mitochondrial or non-mitochondrial.

The Pearson product-moment correlations between the median poly(A) and poly(T) lengths per gene from tailfindr analyses were compared for 2, 24 and 48 hpi datasets from Caco-2, Calu-3 and Vero datasets. Additionally, the Spearman's correlations between the tailfindr poly(T)/(A) and nanopolish poly(A) median lengths per gene (in control and infected cells) were compared for Calu-3 48 hpi datasets via the 'cor.test' function in the stats package in R.

tailfindr poly(T) results were used for the main polyadenylation linear mixed-model analysis as replicate information was able to be preserved and showed higher correlation to nanopolish poly(A) lengths compared with tailfindr poly(A) lengths. The raw poly(T) lengths were log-transformed due to the right-skew distribution and data with at least 6 entries were selected. Then, the package lmerTest v3.1-3 (31) was used to derive a linear mixed-effects regression (lmer) and therefore calculate the effect of SARS-CoV-2 infection compared with control mock-infected cells. The p-values were generated per gene and Benjamini-Hochberg adjusted using the 'p.adjust' function in R, which were filtered by padj < 0.05. Raincloud plots were generated for raw poly(T) lengths of each gene with increased poly(A) length in the Calu-3 48 hpi dataset in both conditions (control and infected) using ggplot2 v3. 3.4 (32) to replicate the raincloud plots generated by the raincloudplots package in R (33) .

To test whether the same significant genes in the mixedmodel analysis appeared in nanopolish Calu-3 48 hpi poly(A) data, the raw tail lengths were log-transformed and the median lengths per gene were tested between control and infected cells using Wilcoxon's test of ranks, where p-values were adjusted using Benjamini-Hochberg adjustment as above.

Counts from Salmon using transcriptome-mapped BAM files were used to determine the differential transcript usage of transcripts between control and infected conditions for each cell line and time point. The counts were input into DRIMSeq v1. 16.1 (34) and filtered by conditions (min_samps_gene_expr = 6, min_samps_feature_expr = 3, min_gene_expr = 10, min_feature_expr = 10). The output was used for stage-wise analysis using StageR v1.10.0 (35) , where the final list of significant genes and transcripts was filtered by padj < 0.05.

Significant GO biological terms and KEGG pathways were identified with genes that were found to be significantly differentially expressed and polyadenylated in the analyses above. For differential expression analysis, genes found to be differentially expressed in direct cDNA datasets for each condition and time point were used for analysis. For differential polyadenylation analysis, genes that were found to be increased and decreased in poly(A) length in Calu-3 48 hpi direct cDNA dataset were used for analysis. All pathway analyses were carried out using multiGO (http://coinlab.mdhs.unimelb. edu.au/multigo). multiGO uses a hypergeometric test against a background of all genes included in the GO annotation database v100. For differential expression, thresholds of padj < 0.05 and enrichment p-value < 1E-6 in at least one dataset were used for generating the GO plot, and thresholds of padj < 0.05 and enrichment p-value < 0.0001 were used for generating the KEGG plot. Non-significant bubbles were also shown (http:// coinlab.mdhs.unimelb.edu.au/multigo/?subdir=multigo/ multiGO&file=DESeq2.zip). All terms with padj < 0.05, enrichment p-value < 0.05 and at least two genes were deemed as significant for the analysis. For differential polyadenylation and differential polyadenylation vs expression analyses, thresholds of padj < 0.05 and enrichment p-value < 0.0001 were used (http://coinlab.mdhs.unimelb.edu.au/multigo?file= multigo/multiGO/DP_calu_48hpi_dcDNA_6_nofilter.zip).

Using a hypergeometric test, the probability of obtaining greater than or equal to two genes overlapping between the differential expression and polyadenylation analyses were tested. Counts of downregulated genes (padj < 0.05) from Calu-3 48 hpi datasets from DESeq2 analysis were set as m=253. Counts of genes with elongated poly(A) tails were set as k=13. Genes which were both downregulated and increased in poly(A) length were set as n=2.

The total background count was set as N=15,426. The code used for the probability calculation was 'phyper(x,k,15426-k,m, lower.tail=F) + dhyper(x,k,15426-k,m)' and was carried out in R. Both mitochondrial and non-mitochondrial genes were included in this calculation.

We utilized the percentage of mapped reads to host or virus to assess the level of infection in each cell line across different datasets and over time (Data S1) to add to our earlier study (Chang et al., 2021) . Between 0 and 2 hpi, the percentage of viral reads were minimal (< 0.1% of all reads) across each of the cell lines. At 24 hpi, differences between the cell lines started to appear, with Vero cells leading in infection with~45% of reads mapping to virus, followed by Caco-2 (~2.3%), Calu-3 (~2%), and A549 (< 0.01%), as measured with direct cDNA datasets (Data S1). The relative proportions of viral transcripts between these four cell lines were aligned with the results in literature at the 24 hpi (9). The final time point (48 hpi) showed the greatest per-cell-line infection in Caco-2 (~12.5%) and Calu-3 (~3.7%) cells but lowered in percentage in Vero cells (~25%) compared with 24 hpi. These results agreed with the idea that the infection peaked at 24 hpi in Vero cells as shown by our previous study (22) . Interestingly, the percentage of reads mapping to virus were markedly higher in direct RNA datasets compared with the direct cDNA and PCR cDNA datasets at the 24 hpi ( Table 1) . The reason for this may be due to the direct RNA method involving the sequencing of the mRNA molecule, instead of the reversetranscribed cDNA, as in the direct cDNA and PCR cDNA methods. This would remove any biases caused by the reversetranscription. These results suggested that measuring viral infection using more than one ONT RNA-seq approach may be more beneficial to accurately gauge the level of viral RNA in the sample.

The host responses to SARS-CoV-2 have been extensively studied at the gene and protein expression level (9, 36) . As long-read sequencing enables full-length transcripts to be sequenced unlike short-read sequencing, we were interested in whether our long-read differential expression results would reveal similar results to existing studies (2, 7, 9, 37) . The direct cDNA datasets were used for differential expression analysis as it included data from all four time points (0, 2, 24, 48 hpi) in Calu-3, Caco-2 and Vero cells, and two time points (0 and 24 hpi) in A549 cells.

It is well-known that A549 cells are invulnerable to SARS-CoV-2, due to the lack of ACE2 receptors (2). However, ACE2 is expressed in varying degrees in different human tissues (38) and is expressed relatively poorly in the respiratory tract (38) . Previous studies have shown the expression of ACE2 at the gene (39) and protein (9) level in Vero, Calu-3 and Caco-2 cells. Furthermore, the importance of the protease TMPRSS2 during SARS-CoV-2 has been noted (40) . Our long-read data were in line with some of these results, where no transcripts mapped to ACE2 and TMPRSS2 in A549 cells. However, we observed the absence/low expression of ACE2 (< 5 reads per replicate) and TMPRSS2 (< 25 reads per replicate) genes across all our susceptible cell lines. The presence of these transcripts correlated with the viral burden observed in each cell line from our previous study (Data S1) (22) .

In all cell lines, the earlier infection time points (0 and 2 hpi) showed little significant differential expression as expected, given the short period of infection in which host responses could be elicited. We observed an increase in significantly upregulated genes in Calu-3 and Vero cell lines at 24 hpi (Figure 1 ). At the final time point (48 hpi), we noted an increase in downregulated genes as well as the presence of upregulated genes in Calu-3, Vero and Caco-2 cell lines. In line with previous studies (6, 9) , while Calu-3 and Vero cells exhibited clear changes in transcriptional activity throughout the final two time points, Caco-2 cells revealed little differential expression activity ( Figure 1 and Table 2 ). These results were recapitulated in a second measurement of gene expression changes ( Figure S1 ). In this combined analysis, the differences between the host gene expression of control and infected cells across two time points were measured (interaction termsee Materials and Methods) as opposed to differences at each individual time point.

As the initial differential expression results showed differences between Calu-3 and Caco-2 cell lines, we then investigated the similarity of gene expression patterns between the two cell lines at the 24 and 48 hpi via a heatmap ( Figure 2 ). Following the 

To investigate the cell-specific gene expression changes at a deeper level, we wondered whether any enrichment of Table 2 . pathways was shared between multiple cell types. By utilizing the genes which were significantly differentially expressed in the direct cDNA data, GO biological and KEGG pathway analyses were carried out using a new visualization tool multiGO (http:// coinlab.mdhs.unimelb.edu.au/multigo) (Figures 3 and S2) . Amongst many enriched GO biological terms, reactive oxygen species (ROS) metabolic process was enriched in all three SARS-CoV-2 susceptible cell lines. Also, we found that only neutrophil degranulation was commonly enriched exclusively in the two human cell lines and absent in Vero cells. In contrast, a greater number of terms were shared between Calu-3 and Vero cell lines. These terms included response to virus, positive regulation of interferon-alpha production and translation (Figure 3 and Data S2).

As expected, the Calu-3 cell line showed an increase in innate immune responses, with the strongest GO enrichment for various terms associated with host immune responses to pathogens. This included terms such as defense response to virus and type I interferon signaling pathway (Figure 3 and Data S2). As shown above with the gene expression results, these responses were either absent or lacking in Caco-2 cells compared with Calu-3 cells. Additionally, some unique GO terms were enriched in Vero cells. This included positive regulation of establishment of protein localization to telomere (Figure 3 and Data S2).

Similarly, Calu-3 cells presented with the strongest significant enrichment of KEGG pathways ( Figure S2 and Data S2). In our data, the enriched pathways were related to viral infections such as influenza A (MX1, OAS1, OAS2, OAS3, RSAD2, STAT1) and measles (MX1, OAS1, OAS2, OAS3, STAT1). These pathways were upregulated at the 24 and 48 hpi time points as well as between 2 vs 24 hpi and 24 vs 48 hpi datasets in infected cells compared with control cells ( Figure S2) . DDX58 was also observed as upregulated in these pathways in the same datasets except for 2 vs 24 hpi. This has also been shown in influenza A studies (41) and the gene has been shown to encode a cytosolic sensor for other coronaviruses (42) . The coronavirus disease pathway was enriched in both Calu-3 and Vero cells (Table 3) . Also, as shown previously in various infected epithelial cell lines (43) , pathways related to neurological diseases such as Alzheimer's, Parkinson's and Huntington's diseases were found to be enriched in both Vero and Calu-3 cells. The majority of genes in these pathways were downregulated (Figure 3 and Data S2). Overall, our long-read RNA-seq data were aligned with results from previous studies which utilized short-read RNA-seq.

Polyadenylation has been previously shown in literature to be critical for many different cellular functions. The process promotes stabilization of the RNA transcript (44) , trafficking into the cytoplasm (45) , and translation into proteins (46) . Furthermore, 3' UTRs can include binding sites for RNAbinding proteins (RBPs) (47) and microRNAs (miRNAs) (48) , which contribute to gene expression. However, only a small number of studies exploring changes in host poly(A) lengths during infections have been carried out to this date (49) . Therefore, we were interested in whether infection of cells with SARS-CoV-2 would elicit changes in polyadenylation of transcripts compared with control cells. The median poly(A) tail lengths of mitochondrial and non-mitochondrial transcripts were compared between control and infected cells at 2, 24 and 48 hpi with two different methods: nanopolish and tailfindr.

Firstly, nanopolish was used to analyze non-replicate direct RNA datasets ( Table 4) . Although the medians for each condition were similar in some datasets, we observed a significant (p-value < 0.05, Wilcoxon's test of ranks two-tailed approach) poly(A) tail length increase in host nonmitochondrial RNA of infected cells compared with control cells in the 24 and 48 hpi in all susceptible cell lines. No significant change was observed in mitochondrial RNA.

As nanopolish results only used data from non-replicate direct RNA datasets, a second approach was implemented. This involved direct cDNA datasets with triplicates for each condition using tailfindr to confirm the results of nanopolish at the gene level. As the direct cDNA dataset is double-stranded, either strand of the cDNA can be sequenced. Therefore, information on both poly(A) and poly(T) lengths were obtained, which were weakly correlated ( Figure S3 and Table S1 ). When the number of differentially polyadenylated transcripts between control and infected cells were compared with nanopolish and tailfindr poly (A) and poly(T) methods with the Calu-3 48 hpi nonmitochondrial data (Wilcoxon's test, padj < 0.05), the tailfindr poly(A) dataset showed no significant differential polyadenylation ( Table S2 ). The lack of significance in the tailfindr poly(A) data can be explained by the fact that less data was available from the poly(A) dataset compared to the poly(T) dataset. The full-length The expression levels were scaled per row and organized based on relevant GO terms. In Calu-3 cells, higher relative expression of interferon-response genes was observed compared with Caco-2 cells. In contrast, Caco-2 cells showed higher relative expression of ribosomal protein and mitochondrial genes. These results further validate the cell-type-specific responses to SARS-CoV-2. The data was filtered by padj < 0.05, enrichment p-value < 0.0001. Related to Figures 1, S1 and Table 2 . reads were comprised of 0.5% of poly(A) and 99.5% of poly(T) strands, which made up~58% of the total number of detected reads with valid Ensembl ID's. This may be attributed to the process of ONT direct cDNA sequencing, where the motor protein is situated on the 5' end of each strand. This means that the poly (T) sequence is sequenced first, whereas the poly(A) sequence is sequenced last for each respective strand. Therefore, this method of sequencing would lead to higher quantity and accuracy of poly(T) sequences compared with poly(A) sequences (Table S2) .

To test this proposition, we compared the median lengths of poly(A/T) tails per gene between nanopolish and tailfindr Calu-3 48 hpi datasets via the Spearman's correlation test. Weak significant positive correlations between nanopolish poly(A) and tailfindr poly(T) datasets were observed (r = 0.12-0.27, p-value < 0.05). In contrast, nanopolish poly(A) and tailfindr poly(A) Commonly enriched GO terms across the three cell lines involved the ROS metabolic process, indicating the importance of mitochondrial processes during SARS-CoV-2 infections. The bubble size and color indicate the -log10 enrichment p-value and fraction of upregulated genes, respectively. Thresholds of padj < 0.05 and enrichment p-value < 1E-6 in at least one dataset were used for generating the plot, where insignificant bubbles are also shown on the plot. All terms with padj < 0.05, enrichment p-value < 0.05 involving at least two genes were deemed as significant for the analysis. Related to Figure S2 and Data S2. Calu-3 cells reveal expression level changes with genes related to innate immune responses, whereas Vero cells show expression level changes with ribosomal protein genes. Related to Figure S2 . data were not significantly correlated ( Figure S4 ), leading us to choose poly(T) length as a proxy for direct RNA-inferred poly(A) length.

Specific genes involved in differential polyadenylation were investigated using a linear mixed-model method using tailfindr outputs. The most interesting dataset was Calu-3 48 hpi, where twelve genes were found to be significantly increased in poly(A) length (up to~101 nt in mean poly(A) length) in the infected cells compared with control cells (UQCRC1, RPL30, RPS12, RPL13, KRT17, CXCL8, RPS6, ZBTB44, MIEN1, RPS4X, RPL10 and a lncRNA-ENSG00000273149) ( Figure S5 ). Using multiGO, GO biological terms of genes with increased poly(A) length were found, which included viral transcription (RPS12, RPL30, RPS6, RPL13, RPS4X, RPL10) (enrichment p-value < 0.0001) (Figure 4 and Data S3). KEGG pathways of these genes included the coronavirus disease and ribosome pathways ( Figure S6 and Data S3). This suggests that poly(A) tail elongation may be directly linked to SARS-CoV-2 infections, as opposed to a randomly occurring event. A small number of mitochondrial genes were also found to be differentially polyadenylated (including ENSG00000198888/MT-MD1) in both Calu-3 and Vero cell lines. Additionally, among the twelve genes which were found to be increased in poly(A) length in tailfindr mixed-model analysis, eight genes were also found to be significantly increased in poly(A) length in nanopolish analysis after logtransformation and p-value adjustment (ENSG00000273149, RPS12, RPL30, RPS6, RPL13, MIEN1, RPS4X, RPL10, padj < 0.05). This confirmed the robustness of these results, which increased the confidence of true poly(A) elongation in these eight genes. The other four genes were unable to be detected in nanopolish datasets even when padj thresholds were relaxed.

We next investigated whether there was any relationship between differential polyadenylation and differential expression results in response to infection. Interestingly, when comparing the GO terms which were shared among the differential polyadenylation and differential expression results of Calu-3 48 hpi direct-cDNA datasets, the results showed an apparent correlation between the two analyses. The enriched GO terms were composed of genes with mainly increased poly(A) tail lengths and decreased expression levels after infection ( Figure 5 and Data S4). Upon closer inspection, we found that many of the genes involved in these GO terms were associated with the ribosome. Of note, two genes (RPS4X and RPS6) which contributed to all GO terms, both showed an increase in poly(A) length and downregulated gene expression. This overlap was significant (hypergeometric test, p=0.018). When the KEGG pathways were compared in a similar manner, we observed that the coronavirus disease pathway was shared between differential polyadenylation and expression datasets ( Figure S7 and Data S4). Unlike the GO terms, most genes had an increase in poly(A) tail length but were upregulated in differential expression levels, although many ribosomal genes were downregulated in the same dataset. For example, CXCL8 had an increased poly(A) length after infection and was upregulated in the expression level results, unlike the ribosomal protein genes described above. This suggested that the correlation between increased poly(A) tails and decreased expression levels were shown specifically in the ribosome-related protein genes and indicated the importance of ribosomal protein genes during SARS-CoV-2 infections. Non-mitochondrial W = 34564286, p-value = 1.505e-09

Significant differential polyadenylation (compared with matched uninfected sample) was observed in all non-mitochondrial datasets (p-value < 0.05), except for the Calu-3 2 hpi dataset (in bold). Absence of significant differential polyadenylation in mitochondrial transcripts was also observed. This implies that non-mitochondrial transcripts are more likely to be differentially polyadenylated compared with mitochondrial transcripts. Related to Figure S3 , S4 and Tables S1, S2.

Differential transcript usage is the differential presence of transcripts between different conditions measured via identifying the proportion of each transcript against the total pool of transcripts and is another valuable feature of ONT RNAseq. Using DRIMSeq and StageR, significant differential transcript usage was observed in all three SARS-CoV-2 susceptible cell lines (Calu-3, Caco-2 and Vero) between infected and mock-control cells. These events were observed in three time points (2, 24, 48 hpi) in Caco-2, two time points in Calu-3 (2, 48 hpi) and one time point in Vero cells (24 hpi) . This included a processed transcript -SLC37A4-205 -in the Caco-2 2 hpi dataset and a retained intron transcript -GSDMB-208in the Calu-3 48 hpi dataset ( Figure 6 and Table 5 ). These results suggested that non-protein-coding transcripts could also show differential usage as with protein-coding transcripts. Furthermore, these results revealed that differential transcript usage events were not specific to a given time point and may have also occurred in a cell-specific manner.

Long-read sequencing enabled the detection of differential polyadenylation, transcript usage and gene expression level changes within in vitro SARS-CoV-2 infection models. Firstly, median poly(A) tail lengths between control and infected cells in direct RNA data were estimated using nanopolish. This showed that FIGURE 4 | GO biological terms from genes with differential poly(A) tail length in tailfindr poly(T) mixed-model analyses from the Calu-3 48 hpi direct cDNA dataset. Genes involved in viral transcription, translation, translational initiation, SRP-mediated cotranslational protein targeting to membrane and nuclear-transcribed mRNA catabolic process and nonsense-mediated decay pathways were increased in poly(A) tail length after infection. The enrichment of these GO terms suggests the importance of transcription, translation and protein pathways during SARS-CoV-2 infection. The bubble size and color indicate the -log10 enrichment p-values and the fraction of genes with increased polyadenylation, respectively. Thresholds of padj < 0.05, enrichment p-value < 0.0001 were used. Only bubbles which meet the thresholds are shown. Related to Figures S5, S6 and Data S3.

the non-mitochondrial median poly(A) tail lengths were significantly increased in all three cell lines (Caco-2, Calu-3 and Vero) at the 24 and 48 hpi ( Table 4 ). These results suggested that infection with SARS-CoV-2 may cause an increase in the poly(A) lengths of non-mitochondrial transcripts. We explored this further using tailfindr. The results from the mixed-effects model analysis showed poly(A) tail elongation after infection in Calu-3 and Vero cells at the 48 hpi. In Calu-3 cells, six genes were involved in viral transcription (RPS12, RPL30, RPS6, RPL13, RPS4X, RPL10) (Data S3). This indicated that polyadenylation may play a role in aiding the virus to generate viral mRNA for further protein production or to replicate during infection with SARS-CoV-2. This group of genes is involved in the formation of the ribosome, which is required for protein synthesis. The result is relevant as the SARS-CoV-2 non-structural protein Nsp1 binds to the 40S subunit of the ribosome and inhibits translation initiation of cellular mRNA (50, 51) . Ribosomal proteins have also been known to be associated with viral transcription/replication as the host ribosomal machinery needs to be utilized to produce viral proteins for these processes (52) . Therefore, further investigation is warranted to explore the link between increased polyadenylation in host cells after infection with SARS-CoV-2. It would also be of interest to study whether host defense ability decreases with elevated polyadenylation of transcripts related to viral transcription and the ribosome. A lncRNA and a small number of mitochondrial transcripts were also observed with an elongated poly(A) length in infected cells compared with control cells. This is of interest as it suggests that not only the protein-coding genes may be able to play a role in host FIGURE 5 | Direction of differentially polyadenylated and expressed genes belonging to common GO biological terms in the two analysis methods using the Calu-3 48 hpi direct cDNA dataset. The plot shows a potential correlation in increased poly(A) tail length and downregulation in gene expression. This phenomenon may arise due to the virus-driven translation inhibition and host-driven post-transcriptional regulation. The bubble size and color indicate the -log10 enrichment p-values and fraction of upregulated genes/genes with increased polyadenylation, respectively. Thresholds of padj < 0.05, enrichment p-value < 0.0001 were used. Only bubbles which meet the thresholds are shown. Related to Figure S7 and Data S4. responses to SARS-CoV-2. Furthermore, as elongated poly(A) tails were observed at a late time point (48 hpi) in Vero cells, it suggested that differential polyadenylation may be more likely to occur at later stages of infection compared with early stages. We also explored the observation that many of the genes involved in the commonly enriched GO terms and KEGG pathways were increased in poly(A) tail length and decreased in gene expression (Figures 5 and S7) . Interestingly, the majority of genes which were involved in both of these observations were ribosomal proteins, such as RPS4X and RPS6 which encode for proteins in the 40S ribosomal subunit. In contrast, a non-ribosomal gene CXCL8 had increases in both poly(A) length and expression level after infection, which suggested that this correlation belonged exclusively to the ribosomal protein genes. This was an interesting observation as decreased expression of ribosomal proteins in response to SARS-CoV-2 infections have been observed previously (53) , due to the effect of global suppression of ribosomal activity initiated by the virus. However, the increase in poly(A) lengths in the same transcripts was unexpected, as elongation of poly(A) tails are indicative of increase in stability, in contrast to the decrease in expression levels. Why these observations were uniquely presented in these ribosomal protein genes is currently unclear. However, we speculate this may be due to the competition between viral-driven expression downregulation and host-driven post-transcriptional regulation for increased stability of mRNA. In some cases, aberrant polyadenylation has been linked to aid the destruction of eukaryotic mRNA (49) , which may provide an alternative explanation for this phenomenon.

Supporting the importance of the ribosome during SARS-CoV-2 infection, the translation GO term was enriched in infected Calu-3 and Vero cells in differential expression analyses (Figure 3) . Among the genes involved in translation, the EIF1 gene encodes for the and ENST00000537025/KPNA2-202 transcripts were found to decrease and increase in transcript usage in infected cells relative to control cells, respectively. The differential transcript usage events included protein-coding, processed and retained-intron transcripts, which reveals involvement of alternative splicing. Furthermore, these events were involved in genes previously implicated in SARS-CoV-2 infections, suggesting the importance of exploring the transcriptome at the isoform level. Transcripts with significant differential transcript usage between conditions are marked with an asterisk (padj < 0.05). Related to Table 5 .

eukaryotic translation initiation factor 1, which partakes in translation initiation in eukaryotes by forming a part of the 43S preinitiation complex along with the 40S ribosomal subunit. As mentioned earlier, according to Lapointe et al. (51) , this factor may enhance the binding of SARS-CoV-2 Nsp1 protein to the host 40S subunit, perhaps via changing the conformation of the mRNA entry channel. This facilitates host translation inhibition by competing with host mRNA, which has been shown to also decrease translation of viral mRNA. However, another study showed that the binding of Nsp1 to the 40S subunit induced preferential translation of viral mRNA over host mRNA (50) . These findings may explain the downregulation of EIF1 as a host response to viral infection. Differential transcript usage has not yet been extensively studied with SARS-CoV-2 infections but has been useful for studying other illnesses like Parkinson's disease (20) . Differential transcript usage was observed in all three SARS-CoV-2 susceptible cell lines studied -Caco-2, Calu-3 and Vero, where proteincoding, processed and retained-intron transcripts were involved. Calu-3 48 hpi data showed the greatest number of genes that had undergone differential transcript usage, where a retained-intron transcript GSDMB-208 showed differential usage in infected cells compared with control cells (Figure 6 ). SARS-CoV-2 induces pyroptosis in human monocytes (54) , and the Gasdermin family of proteins has been implicated in cell death where the granzymemediated cleavage of GSDMB can activate pyroptosis (55) . Furthermore, KPNA2 transcripts also showed differential usage in Calu-3 cells at 48 hpi. KPNA2 is an importin that is bound by ORF6 of the virus to block nuclear IRF3 and ISGF3 to antagonize IFN-1 production and signaling (56). This suggested that transcripts with differential usage may be involved in important pathways contributing to host responses towards viral infection or the evasion of these responses by the virus. Hence, the specific activity of each transcript as opposed to the activity at the gene level should be further investigated.

Overall, our long-read sequencing datasets agreed with differential expression studies in the literature. In agreement with previous studies, our differential expression analysis using direct cDNA datasets showed varied host gene expression activity upon infection in different cell types ( Figure 1) . Although Vero cells are imperfect in vitro models for SARS-CoV-2 infection, we note that the ROS metabolic pathway was enriched in our direct cDNA data across the SARS-CoV-2 susceptible cell lines (Figure 3) . The mitochondria can be linked to the ROS metabolic pathway as it produces ROS which can induce increased oxidative stress in cells, potentially leading to cell death (57) . These results increased support for the idea that mitochondrial processes are important during these infections (58, 59) .

We also observed the downregulation of pathways involved in neurological pathologies such as Parkinson's disease (43) ( Figure S2 ). The fact that some enriched pathways were involved in nonrespiratory, neurological illnesses suggests potential modes of action for SARS-CoV-2 co-morbidities. There are now increasing numbers of studies reporting on the relevance between COVID-19 and other diseases. This includes clinical data where patients with COVID-19 can develop neurological problems which are not only non-specific (e.g. headaches), but also varied, including such maladies as viral meningitis, encephalitis (60), olfactory and gustatory dysfunction (61) , and dementia-related symptoms similar to Alzheimer's disease (62) . However, the relevance of this pathway in non-neuronal cells is potentially limited.

In our study, we explored in vitro models of SARS-CoV-2 infections using continuous cell lines with a low MOI of 0.1, which may hinder the biological relevance of these results. However, our results confirmed that the use of ONT RNA-seq methods enabled the detection of full-length isoforms, differential polyadenylation and transcript usage. This provides evidence to pursue further investigations with more sophisticated models such as air-liquid-interface cultured organoids from healthy human nasal swabs or in vivo models such as ferrets. The ideal MOI should also be found via optimization studies.

Whilst this proof-of-concept study will not result in direct clinical usage, the results of this in-depth host transcriptomic characterization and the associated ONT technology may provide the foundation for future diagnostic or therapeutic strategies. For instance, existing studies have explored gene expression level changes in early and late stage infections with SARS-CoV-1/2 (63) and also infections with Middle East Respiratory Syndrome Coronavirus (MERS-CoV) (64) infections for biomarker detection. Similarly, we believe differential polyadenylation and transcript usage studies can lead to diagnostic and therapeutic potential. In our study, we revealed genes with elongated poly(A) tails following SARS-CoV-2 infection, which may be used as biomarkers after further validation with in vivo experiments. These genes included -UQCRC1, RPL30, RPS12, RPL13, KRT17, CXCL8, RPS6, ZBTB44, MIEN1, RPS4X and RPL10. Although these genes are not among the top differentially expressed genes in studies with MERS-CoV (64) and SARS-CoV-1/2 (63), chemokines were shown to have elongated poly(A) tails in our data as well as high expression in the MERS-CoV study (64) . Considering the chemokine CXCL8 showed elongated poly(A) tails and also increased expression in our data, it would be of interest to study whether elongation of poly(A) tails also occurs with chemokine CXCL2 in infected cells with MERS-CoV. Therefore, this technique has relevance for other viral infections and is not limited to SARS-CoV-2 infections alone. Furthermore, longread RNA-seq technology may be used to measure the changes in the viral transcriptome (22) , illuminating the value of dual RNA-seq studies capturing both host and pathogen mRNA. This process may aid in bioinformatically discerning viral targets for diagnostic methods which has been performed with CRISPR-Cas methods previously (65) .

Moreover, our results showed that the nanopolish and tailfindr methods had significant weak positive correlations (r < 0.3, p-value < 0.05) in the median poly(A) lengths from nanopolish and median poly(T) lengths from tailfindr ( Figure  S4) . However, the median poly(A) lengths of tailfindr showed non-significant correlations with nanopolish poly(A) data. The increased significance of poly(T) transcripts may have occurred because more data was available from the poly(T) dataset. As our results supported similar findings from Krause et al. (30) , we speculate that these discrepancies between poly(A) and poly(T) datasets using ONT direct cDNA sequencing may arise in future studies. We acknowledge that our data is preliminary and the correlation between nanopolish and tailfindr data should be tested via direct RNA datasets with replicates to validate these findings.

Functional work should also be carried out to further validate the results of this study. For differential expression analysis, knock-down experiments within the same cell lines using CRISPR technology may be applied to evaluate the effects of differentially expressed genes identified in this study. Furthermore, functional work for polyadenylation may be approached by utilizing cell lines with plasmids containing gene sequences of interest followed by a poly(A) sequence of varying lengths. To prove whether the polyadenylation elongation improves the host defense to SARS-CoV-2 or promotes viral mRNA and protein production, an approach using a polyadenylation inhibitor in mock-infected and infected Calu-3 cells may be beneficial. Assays such as measuring viral titer and further RNA-seq may be used to test the effects of these alterations after infection with SARS-CoV-2.

Overall, by utilizing three ONT RNA-seq methodologies, we generated an in-depth characterization of differential expression, polyadenylation and differential transcript usage of cell lines infected in vitro by SARS-CoV-2. Unravelling the pathways associated with duration of infection and responses of differing cell types using long-read methods will provide novel insights into the pathogenesis of SARS-CoV-2.

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) are; 1. https://www.ncbi.nlm.nih.gov/, P R J N A 6 7 5 3 7 0 . 

World Health Organization. COVID-19 Weekly Epidemiological Update

Imbalanced Host Response to SARS-CoV-2 Drives Development of COVID-19

Upper Airway Gene Expression Reveals Suppressed Immune Responses to SARS-CoV-2 Compared With Other Respiratory Viruses

Transcriptional and Proteomic Insights Into the Host Response in Fatal COVID-19 Cases

Regulation of the Interferon System: Evidence That Vero Cells Have a Genetic Defect in Interferon Production

Transcriptomic Profiling of SARS-CoV-2 Infected Human Cell Lines Identifies HSP90 as Target for COVID-19

Meta-Analysis of Host Transcriptional Responses to SARS-CoV-2 Infection Reveals Their Manifestation in Human Tumors

Differential Immune Activation Profile of SARS-CoV-2 and SARS-CoV Infection in Human Lung and Intestinal Cells: Implications for Treatment With IFN-b and IFN Inducer

Isolation and Characterization of SARS-CoV-2 From the First US COVID-19 Patient

A Nanoluciferase SARS-CoV-2 for Rapid Neutralization Testing and Screening of Anti-Infective Drugs for COVID-19

Air-Liquid Interphase Culture Confers SARS-CoV-2 Susceptibility to A549

Transcriptome of Nasopharyngeal Samples From COVID-19 Patients and a Comparative Analysis With Other SARS-CoV-2 Infection Models Reveal Disparate Host Responses Against SARS-CoV-2

Transcriptomic Signature Differences Between SARS-CoV-2 and Influenza Virus Infected Patients

Nanopore Native RNA Sequencing of a Human Poly(A) Transcriptome

Isoform Age -Splice Isoform Profiling Using Long-Read Technologies

Nanopore Sequencing of Full-Length BRCA1 mRNA Transcripts Reveals Co-Occurrence of Known Exon Skipping Events

Accurate Expression Quantification From Nanopore Direct RNA Sequencing With NanoCount

Implications of Polyadenylation in Health and Disease

Differential Transcript Usage in the Parkinson's Disease Brain

Alternative Splicing and Disease

Transcriptional and Epi-Transcriptional Dynamics of SARS-CoV-2 During Cellular Infection

Spliced Synthetic Genes as Internal Controls in RNA Sequencing Experiments

Pairwise Alignment for Nucleotide Sequences

The Sequence Alignment/Map Format and SAMtools

Featurecounts: An Efficient General Purpose Program for Assigning Sequence Reads to Genomic Features

Salmon Provides Fast and Bias-Aware Quantification of Transcript Expression

Detecting DNA Cytosine Methylation Using Nanopore Sequencing

Alignment-Free Poly(A) Length Measurement for Oxford Nanopore RNA and DNA Sequencing

Lmertest Package: Tests in Linear Mixed Effects Models

Elegant Graphics for Data Analysis

Raincloud Plots: A Multi-Platform Tool for Robust Data Visualization

DRIMSeq: A Dirichlet-Multinomial Framework for Multivariate Count Outcomes in Genomics

Stager: A General Stage-Wise Method for Controlling the Gene-Level False Discovery Rate in Differential Expression and Differential Transcript Usage

Overview of Immune Response During SARS-CoV-2 Infection: Lessons From the Past

Comparative Transcriptomic Analysis of SARS-CoV-2 Infected Cell Model Systems Reveals Differential Innate Immune Responses

Expression of the SARS-CoV-2 Cell Receptor Gene ACE2 in a Wide Variety of Human Tissues

Long-Read RNA Sequencing of COVID-19

Systematic Analysis of SARS-CoV-2 Infection of an ACE2-Negative Human Airway Cell

SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor

Dynamics of IFN-b Responses During Respiratory Viral Infection. Insights for Therapeutic Strategies

Murine Coronavirus Induces Type I Interferon in Oligodendrocytes Through Recognition by RIG-I and MDA5

Meta-Analysis of MERS, SARS and COVID-19 In Vitro Infection Datasets Reveals Common Patterns in Gene and Protein Expression

A) Tail Length-Dependent Stabilization of GAP-43 mRNA by the RNA-Binding Protein HuD

Role of Poly (A) Tail as an Identity Element for mRNA Nuclear Export

Regulation of Poly(A) Tail and Translation During the Somatic Cell Cycle

From Cis-Regulatory Elements to Complex RNPs and Back

Elegans Heterochronic Gene Lin-4 Encodes Small RNAs With Antisense Complementarity to Lin-14

Aberrant Herpesvirus-Induced Polyadenylation Correlates With Cellular Messenger RNA Destruction

SARS-CoV-2 Nsp1 Binds the Ribosomal mRNA Channel to Inhibit Translation

Dynamic Competition Between SARS-CoV-2 NSP1 and mRNA on the Human Ribosome Inhibits Translation Initiation

Translational Control in Virus-Infected Cells

In Vivo Antiviral Host Transcriptional Response to SARS-CoV-2 by Viral Load, Sex, and Age

SARS-CoV-2 Engages Inflammasome and Pyroptosis in Human Primary Monocytes

Granzyme A From Cytotoxic Lymphocytes Cleaves GSDMB to Trigger Pyroptosis in Target Cells

Reactive Oxygen Species in Mitochondria-Mediated Cell Death

Elevated Glucose Levels Favor SARS-CoV-2 Infection and Monocyte Response Through a HIF-1a/Glycolysis-Dependent Axis

Network Analysis and Transcriptome Profiling Identify Autophagic and Mitochondrial Dysfunctions in SARS-CoV-2 Infection

A First Case of Meningitis/Encephalitis Associated With SARS-Coronavirus-2

Olfactory and Gustatory Dysfunction in Coronavirus Disease 2019 (COVID-19)

Network Medicine Links SARS-CoV-2/COVID-19 Infection to Brain Microvascular Injury and Neuroinflammation in Dementia-Like Cognitive Impairment

Gene Signatures of SARS-CoV/SARS-CoV-2-Infected Ferret Lungs in Short-and Long-Term Models

Gene Signatures and Potential Therapeutic Targets of Middle East Respiratory Syndrome Coronavirus ( MERS -CoV)-Infected Human Lung Adenocarcinoma Epithelial Cells

Palindromic Target Site Identification in SARS-CoV-2, MERS-CoV and SARS-CoV-1 by Adopting CRISPR-Cas Technique

Long-Read RNA Sequencing Identifies Polyadenylation Elongation and Differential Transcript Usage of Host Transcripts During SARS-CoV-2

We wish to thank Dr. Georgia Deliyannis for guidance with cell culture methods. This research was supported by The University of Melbourne's Research Computing Services and the Petascale Campus Initiative. Furthermore, this research was undertaken using the LIEF HPC-GPGPU Facility hosted at the University of Melbourne. This Facility was established with the assistance of LIEF Grant LE170100200. The content of this manuscript has previously been released as a pre-print on bioRxiv (66) . 's correlation test) . These results indicate that median poly(A) and poly(T) lengths from direct cDNA preparations can differ per gene and that one of the two datasets may be a better predictor for true poly(A) lengths. Related to Figure S3 and Table 4 . . Table 2 | Comparison of significantly differentially polyadenylated non-mitochondrial gene clusters in the Calu-3 48 hpi datasets between nanopolish, tailfindr poly(A) and poly(T) outputs (before log-transformation) using Wilcoxon's test of ranks. nanopolish and tailfindr poly(T) results were more comparable compared with tailfindr poly(A) results, as no significant polyadenylation was observed in tailfindr poly(A) data. These results suggest that the tailfindr poly(T) lengths may be more suitable for estimating differential polyadenylation compared with tailfindr poly(A) lengths. Related to Figures S3 and S4 and Table 4 .

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.