Streamlining differential exon and 3' UTR usage with diffUTR Streamlining differential exon and 3’ UTR usage with diffUTR Stefan Gerber1,2, Gerhard Schratt2 & Pierre-Luc Germain1,3,4,* 1Group of Computational Neurogenomics, D-HEST Institute for Neurosciences, ETH Zürich 2Lab of Systems Neuroscience, D-HEST Institute for Neurosciences, ETH Zürich 3Lab of Statistical Bioinformatics, DMLS, University of Zürich 4SIB Swiss Institute of Bioinformatics *Correspondence to Pierre-Luc Germain (pierre-luc.germain@hest.ethz.ch) Abstract Background: Despite the importance of alternative poly-adenylation and 3’ UTR length for a1 variety of biological phenomena, there are limited means of detecting UTR changes from standard2 transcriptomic data.3 Results: We present the diffUTR Bioconductor package which streamlines and improves upon4 differential exon usage (DEU) analyses, and leverages existing DEU tools and alternative poly-5 adenylation site databases to enable differential 3’ UTR usage analysis. We demonstrate the6 diffUTR features and show that it is more flexible and more accurate than state-of-the-art alter-7 natives, both in simulations and in real data.8 Conclusions: diffUTR enables differential 3’ UTR analysis and more generally facilitates DEU9 and the exploration of their results.10 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2021.02.12.430963doi: bioRxiv preprint https://doi.org/10.1101/2021.02.12.430963 http://creativecommons.org/licenses/by-nd/4.0/ Background11 Coding sequences in eukaryotic mRNAs are generally flanked by transcribed but untranslated12 regions (UTRs) which can impact RNA stability, translation, and localization [1]. In particular, the13 length of 3’ UTRs often varies even within a given gene due to the use of different poly-adenylation14 (polyA) sites [2], leading especially to the inclusion or not of regulatory elements such as binding15 sites for microRNAs (miRNAs) or RNA-binding proteins [3]. Alternative poly-adenylation (APA)16 is highly prevalent in mammals [4] and has been shown to be important to a variety of biological17 phenomena [5,6,7,8].18 A number of methods for 3’ end sequencing have been developed with the goal to map APA19 sites [9,10,11,12,13,4,14], leading to the development of atlases such as PolyASite [15] or PolyA DB20 [16]. As such methods are only marginally used, however, it would be beneficial to leverage21 the widespread availability of traditional RNA-seq for the purpose of identifying changes in 3’22 UTR usage. A chief difficulty here is that most UTR variants are not catalogued in standard23 transcript annotations, limiting the utility of standard transcript-level quantification based on24 reference transcripts, such as salmon [17]. Nevertheless, a number of methods have been developed25 to this purpose. Methods like DaPars [18] and APAtrap [19] try to infer new polyA sites from read26 coverage changes from RNA-seq experiments, however the depletion of RNAseq coverage at the 3’27 end of transcripts makes the precise inference of polyA sites challenging [20]. Other tools like QAPA28 [8] and APAlyzer [21] use already available polyA site databases but only compare the usage of the29 most proximal polyA sites to distal ones in a pairwise fashion and fail to grasp the full complexity30 of dynamic APA when there are three or more polyA sites, which is the case for approximately half31 of mammalian transcripts [4]. Furthermore they do not make use of the already proven statistical32 frameworks to analyse different exon usage (DEU) from count data [22,23,24,25]. These tools take33 into account the inherent properties of read count distributions and are arguably more appropriate34 to analyse differences in relative polyA site usage, which is conceptually highly similar to DEU. We35 therefore developed diffUTR, which streamlines and improves upon well established DEU tools,36 and leverages them, along with polyA site databases, to infer alternative 3’ UTR usage across37 conditions.38 1 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2021.02.12.430963doi: bioRxiv preprint https://doi.org/10.1101/2021.02.12.430963 http://creativecommons.org/licenses/by-nd/4.0/ Results39 Streamlining differential bin/exon usage analysis40 Popular bin-based DEU methods are provided by the limma [25,24], edgeR [23] and DEXSeq [22]41 packages. However, their usage is not straightforward for non-experienced users, and their results42 often difficult to interpret. We therefore developed a simple workflow (Figure 1A), usable with any43 of the three methods but standardizing inputs and outputs. In particular, bin annotation and quan-44 tification, as well as different usage results, are all stored in a RangedSummarizedExperiment45 [26], which facilitates data storage and exploration, and enables advanced plotting functions irre-46 spective of the underlying method. diffUTR is flexible in its application, and supports the use of47 strand information if available.48 Transcript annotation GRanges / EnsDb / .gtf polyA sites GRanges / .bed prepareBins countFeatures D E U w ra p p e rs DEXseq diffSplice2 edgeR Bins (GRanges) Ranged Summarized Experiment bam files Plotting functions (Rsubread) UTRCDS Transcript annotation + polyA sites Bins A B Figure 1: Overview. A: diffUTR workflow. Bins are prepared from various types of gene anno- tations as well as, optionally, additional APA-driven segmentation and extension, then read counts within bins as well as bin information are stored in a standardized RangedSummarizedExperiment, which can then be used as an input for any of the three DEU methods, producing again a stan- dardized output that can be used with the package’s plotting functions. B: Schematic of bin preparation. APA sites are used to further segment and extend disjoined gene bins. Improvement to diffSplice49 diffUTR also implements an improved version of limma’s diffSplice method which does not50 assume constant residual variance across bins of the same gene (see diffSplice2). To test the effect51 2 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2021.02.12.430963doi: bioRxiv preprint https://doi.org/10.1101/2021.02.12.430963 http://creativecommons.org/licenses/by-nd/4.0/ of these modifications in a standard DEU setting, we ran both versions (as well as the other two52 DEU methods) on simulated data from a previous DEU benchmark [27]. The precision and recall53 results (Figure 2A) confirmed the previously observed superiority of DEXSeq and, more generally,54 the imperfect false discovery rate (FDR) control. Importantly, it also confirmed that our improved55 diffSplice2 method outperforms the original, at no additional computing cost.56 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 FDR T P R Differential exon usage A 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 FDR T P R method APAlyzer APAlyzer2 DaPars DEXSeq diffSplice diffSplice2 edgeR QAPA.dPau QAPA.pval Differential UTR usage B d iff U T R Figure 2: FDR and recall (TPR) on simulated data. A: In the classical DEU context. B: In the differential UTR usage context. The dashed line indicates a real False Discovery Rate (FDR) of 5%, and the dots indicate nominal FDRs of 10, 5 and 1%. diffUTR methods far outperform QAPA and DaPars. In both contexts, our modifications to diffSplice significantly improve its performance. Application to differential UTR usage and benchmark on a simulation57 We next sought to evaluate the methods when applied for differential UTR analysis. For this58 purpose, APA sites are used to further segment and extend UTR bins, as illustrated in Figure 1B59 (see methods for the details). Given the absence of RNAseq data with a differential UTR usage60 ground truth, we simulated reads with known UTR differences from real data (see Simulated61 Data). We then ran the different diffUTR methods (as well as the unmodified diffSplice62 variant), and compared them to alternative methods. While DaPars and APAlyzer provide gene-63 level significance testing, QAPA does not, and our attempts to use its equivalence classes with64 standard transcript usage methods (see methods) gave very poor results. Therefore, for the65 purpose of comparison we tried two alternatives: simply ranked genes according to QAPA’s main66 output, i.e. the absolute difference in polyA site usage between conditions (|∆PAU|), labeled in67 2B as QAPA.dPau, or running t -tests on the log-transformed PAU values, labeled as QAPA.qval.68 3 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2021.02.12.430963doi: bioRxiv preprint https://doi.org/10.1101/2021.02.12.430963 http://creativecommons.org/licenses/by-nd/4.0/ Since APAlyzer produces different analyses for genes’ 3’ end and intronic APA usage, we used69 both the 3’ end results and a combination of the two (the latter shown as APAlyzer2 ). As Figure70 2B shows, all diffUTR methods outperformed alternatives by far. On this test, our improved71 diffSplice2 had comparable performance to DEXSeq, at a fraction of the computing costs.72 Differential UTR usage in real data73 We next sought to test diffUTR in real data. First, since 3’ UTRs are known to generally lengthen74 during neuronal differentiation [28,8], we expected to observe a skew towards positive fold changes75 of 3’ UTR bins when comparing RNAseq experiments from embryonic stem cells (ESC) and ESC-76 derived neurons. We therefore re-analyzed data from [29] and observed clearly the expected skew77 among statistically-significant genes, especially for bins with a higher expression (Figure 3A).78 We next found both 3’ sequencing and standard RNAseq data from samples of mouse hip-79 pocampal slices undergoing Forskolin-induced long-term potentiation [30], which enabled us to use80 the 3’ sequencing data as a truth for analysis performed on the standard RNAseq data (Figure81 3B and Supplementary Figure 1). In this case we represent the results through Receiver-operator82 characteristic (ROC) curves since the Precision-recall curves make the differences less visible due83 to the lower general power. Although power to detect UTR changes is necessarily low with respect84 to 3’ sequencing, we again observed that diffUTR methods clearly outperformed all alternative85 methods.86 Exploring differential exon/UTR usage results87 diffUTR provides three main plot types to explore differential bin usage analyses, each with a88 number of variations. Figure 4 showcases them in the context of long-term potentiation of mouse89 hippocampal neurons [30]. plotTopGenes (Figure 4A) provides gene-level statistic plots (similar90 to a ‘volcano’ plot), which come in two variations. For standard DEU analysis, absolute bin-level91 coefficients are weighted by significance and averaged to produce gene-level estimates of effect92 sizes. For differential 3’ UTR usage, where bins are expected to have consistent directions (i.e.93 lengthening or shortening of the UTR) and where their size is expected to have a strong impact on94 biological function, the signed bin-level coefficients are weighted both by size and significance to95 produce gene-level estimates of effect sizes. By default, the size of the points reflects the relative96 expression of the genes, and the color the relative expression of the significant bins with respect97 to the gene.98 4 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2021.02.12.430963doi: bioRxiv preprint https://doi.org/10.1101/2021.02.12.430963 http://creativecommons.org/licenses/by-nd/4.0/ A 0.25 0.50 0.75 1.00 0.25 0.50 0.75 1.00 FPR T P R method APAlyzer DaPars DEXSeq diffSplice2 QAPA.pval B −5 0 5 0 1 2 3 4 5 6 Bin log2(foldchange) B in m e a n l o g (C P M ) d iff U T R 3' UTR lengthening Figure 3: Differential UTR analysis on real data. A:. 3’ UTR lengthening during neuronal differentiation. Plotted are the UTR bins found statistically significant (bin- and gene-level FDR both ¡ 0.1) by diffUTR (diffSplice2) when comparing in vitro differentiated neurons to mouse embryonic stem cells. The color indicates the point density. The clear skew towards a positive bin- level foldchange (indicative, in most cases, of a UTR lengthening), especially for bins with a higher mean count (CPM=counts per million reads sequenced). B: Receiver-operator characteristic (ROC) curves of differential UTR usage analysis on the LTP dataset, using 3’ sequencing to establish the ground truth. The axes are square-root-transformed to improve visibility, and only a subset of method variations are shown (see Supplementary Figure 1 for all variants). deuBinPlot (Figure 4B) provides bin-level statistic plots for a given gene, similar to those99 produced by DEXSeq and limma, but offering more flexibility. They can be plotted as overall100 bin statistics, per condition, or per sample, and can display various types of values. Importantly,101 since all data and annotation are contained in the object, these can easily be included in the plots.102 Figure 4B shows a lengthening of the Jund 3’ UTR in the LTP group.103 Finally, geneBinHeatmap (Figure 4C) provides a compact, bin-per-sample heatmap represen-104 tation of a gene, allowing the simultaneous visualization of various information. We found these105 representations particularly useful to prioritize candidates from differential bin usage analyses. For106 example, many genes show differential usage of bins which are generally not included in most107 transcripts of that gene (low count density), and are therefore less likely to be relevant.108 Further variations tested109 During implementation, we tested other changes to the method which were ultimately discarded110 as they did not improve performance, but which we here briefly report.111 First, differential UTR analysis differs from typical differential exon usage analysis in that the112 vast majority of UTR bins are consecutively transcribed, meaning that changes in the usage of a113 5 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2021.02.12.430963doi: bioRxiv preprint https://doi.org/10.1101/2021.02.12.430963 http://creativecommons.org/licenses/by-nd/4.0/ Smg6 Ntrk2 Homer1 Nr4a2 Slc2a1 Txndc11Stmn4Rheb FosbRnf217 Dio2 Nr4a3 Syt4Scg2Hmgcs1 Plk2Rbbp7 Nfkb1 Clta Frmd6 Arid5a Eprs Lmna Slc1a3 Grem2 0 10 20 30 40 0.5 1.0 1.5 2.0 Weighted absolute coefficient − lo g 1 0 (q .v a lu e ) geneMeanDensity 0.00 0.25 0.50 0.75 −0.2 0.0 0.2 density.ratio A 0 1 2 3 sqrt−scaled genomic location type UTR CDS condition CTRL LTP JundB Scaled S m g 6 b in s ty p e lo g W id th m e a n L o g D e n s it y lo g 1 0 P V a lu e logcpm condition type CDS/UTR CDS UTR/3UTR UTR 3UTR non−coding log10PValue 0 10 20 30 40 50 condition CTRL LTP scaled logcpm −2 −1 0 1 2 logNormDensity 0 0.05 0.1 0.15 0.2 0.25 C logCPM lo g (C P M ) Figure 4: Plotting functions. A: plotTopGenes provides significance and effect size statistics aggregated at the gene level. B: deuBinPlot provides a more flexible version of the bin-level gene plots generated by common DEU packages. Shown here is the upregulation of Jund 3’ UTR upon LTP. C: geneBinHeatmap provides a compact, bin-per-sample heatmap representation of a gene. 6 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2021.02.12.430963doi: bioRxiv preprint https://doi.org/10.1101/2021.02.12.430963 http://creativecommons.org/licenses/by-nd/4.0/ bin should also be visible in downstream bins. We therefore reasoned that it would be beneficial to114 use this property to improve statistical analysis. We reasoned that connected bins with significant115 fold changes in the same direction could be unified and their p-values aggregated, and tested a116 rudimentary implementation using Fisher’s aggregation. However, this decreased accuracy and led117 to a worse FDR control (Supplementary Figure 2).118 Second, most methods compare bin-level foldchanges to gene-level ones to identify bins be-119 having differently from the others, and we reasoned that, especially for genes with more UTR bins120 than CDS bins, including counts of 3’ UTR when calculating overall gene expression could under-121 estimate the gene expression and possibly mistake the UTR foldchange for the gene foldchange.122 We therefore tried a modification of diffSplice to only calculate the gene foldchange from coding123 sequence (CDS) bins and then compare it to the individual bins. Again, this approach proved124 unsuccessful (Supplementary Figure 3).125 Discussion126 diffUTR streamlines DEU analysis and outperforms alternative methods in inferring UTR changes,127 which demonstrates the utility of harnessing powerful, well-established frameworks for new ends.128 It must be noted that the way in which the simulation was performed, i.e. elongating transcripts129 to the next polyA site(s), is similar to the way diffUTR disjoins the annotation into bins, which130 could cause a bias towards this method (as well as QAPA and APAlyzer, which also makes use of131 alternative polyA sites). However, this is unlikely to be the reason for the observed superiority of132 diffUTR -based methods given the considerable extent by which they outperformed alternatives,133 and the observation of similar results in real data.134 Similar to DEU tools [27], diffUTR fails to control the FDR correctly, and our attempts so far135 to improve this remained unsuccessful. We therefore recommend prudence with results close to136 the significance threshold. In addition, and in contrast to DEU where exons are subject to splicing137 in a potentially independent fashion, 3’ UTRs typically do not undergo splicing and therefore only138 differ in length between conditions. This means that the behavior of a UTR bin is dependent on139 that of upstream bins, a property which could be exploited to improve accuracy at the gene-level.140 However, our simple attempt to do so by combining p-values of consecutive bins did not have the141 desired outcome, pointing to the need of more research in this direction.142 Further, the bin-based approach has the drawback of not pinpointing the exact UTR locations:143 7 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2021.02.12.430963doi: bioRxiv preprint https://doi.org/10.1101/2021.02.12.430963 http://creativecommons.org/licenses/by-nd/4.0/ it is limited to the bin resolution, and the bins themselves are limited by incomplete transcript144 and APA annotations. Additionally, because there is a significant drop off in read coverage at the145 end of transcripts, we have observed that it is often bins upstream of the actual UTR lengthen-146 ing/shortening event which give a statistically-significant signal rather than the one truly affected.147 This is why we have provided tools to enable the further inspection of events in a given gene.148 Finally, the results of bin-based analyses are limited by the overlaps of transcripts from different149 genes, an issue on which differential transcript usage analysis approaches appear superior (e.g.150 [31]). However, transcript usage analysis tools are dependent on the completeness of the transcript151 annotation, while bin-based approaches are more open to the discovery of unannotated transcript152 variants, which is especially relevant for differential UTR usage. Here, we made the choice of153 including ambiguous bins, but flagging them as such, enabling users to interpret them with caution.154 While DEXSeq remains the tool of predilection for relative bin usage analyses, it scales very badly155 to larger sample sizes, and alternatives might be needed in some contexts. Our changes to156 limma’s original diffSplice method consistently result in more accurate predictions, making157 this new method the best compromise for bin-based approaches when DEXSeq is not applicable.158 More generally, it also shows that even with well-established approaches, there is still room for159 incremental, but non-negligible improvement.160 Methods161 0.1 Data and code availability162 The data objects and code used to produce the figures are available through the https://163 github.com/plger/diffUTR_paper repository. The diffUTR source code is available at https:164 //github.com/ETHZ-INS/diffUTR.165 0.2 RNAseq data processing166 For the evaluation of diffSplice2 in a standard DEU case, we used bin count data obtained167 from the authors of the original DEU benchmark [27]. For other datasets, reads were downloaded168 from the SRA, aligned to the GRCm38.p6 genome using STAR 2.7.3a with default parameters169 and the GENCODE M25 annotation as guide. The same gene annotation was used as input for170 bin creation.171 8 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2021.02.12.430963doi: bioRxiv preprint https://github.com/plger/diffUTR_paper https://github.com/plger/diffUTR_paper https://github.com/plger/diffUTR_paper https://github.com/ETHZ-INS/diffUTR https://github.com/ETHZ-INS/diffUTR https://github.com/ETHZ-INS/diffUTR https://doi.org/10.1101/2021.02.12.430963 http://creativecommons.org/licenses/by-nd/4.0/ 0.3 diffUTR172 diffUTR is implemented as a Bioconductor package making use of the extensive libraries avail-173 able, especially the GenomicRanges package [32] and the different DEU methods (see Differential174 analysis).175 0.3.1 Preparing bins176 Exons are extracted from the genome annotation and flattened into non-overlapping bins (Figure177 1B). In other words, the exon annotation is fragmented into the widest ranges where the set of178 overlapping features is the same. Bins that do not overlap with coding sequences (CDS) and179 belong to a protein coding transcript are labeled as UTR and the rest as CDS. When APA sites180 are also provided as input (for the purpose of this article, polyAsite v2.0 sites were used), bins are181 further segmented and/or extended. For this the closest upstream CDS or UTR is found for every182 poly(A) site and the UTR is defined from this boundary to the polyA site and assigned to the183 corresponding gene and transcript (Figure 1B). If the newly defined UTRs exceeds a predefined184 length specified by maxUTRbinSize (default is 15000bp), it is ignored as unlikely to be a real185 UTR. Moreover, if the start of a gene is the closest upstream sequence before any UTR or CDS186 the newly defined UTR is ignored to avoid assignment problems. In order to later differentiate187 between regions that are 3’ or 5’ UTRs, regions that are downstream of the last CDS of a given188 transcript were labeled as 3’ UTR. The label ‘non-coding’ is assigned to all bins that have no189 protein coding transcript overlapping it.190 If a bin originates from regions belonging to different genes, the bin is duplicated and as-191 signed once to each gene, so that each gene contains the same fragment once. Alternatively, the192 genewise argument can be used so that only exons belonging to the same gene are considered193 when flattening.194 0.3.2 Quantification195 For quantification, countFeatures() uses the featureCounts() function from the Rsubread196 package [33] to count previously mapped reads overlapping each bin. By default every read is197 assigned once to every bin it overlaps with and can therefore be counted multiple times, which is198 needed because many bins are shorter than the read length. Alternative counting methods, such as199 summarizeOverlaps() from the GenomicAlignments package [32] performed considerably worse200 9 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2021.02.12.430963doi: bioRxiv preprint https://doi.org/10.1101/2021.02.12.430963 http://creativecommons.org/licenses/by-nd/4.0/ in the simulation. The function returns a RangedSummarizedExperiment object [26], containing201 the read counts as well as the bin annotation.202 0.3.3 Differential analysis203 Three wrappers implement corresponding DEU methods on the204 RangedSummarizedExperiment object previously generated, returning results as further stan-205 dardized annotation within the object. For differential UTR analysis, gene-level results are ob-206 tained by filtering the bin-level results for those assigned to the type UTR and/or 3’ UTR, and207 setting all other p-values to 1 before aggregation.208 diffSpliceDGE.wrapper() This is a wrapper around edgeR ’s DEU method based on fitting a209 negative binomial generalized linear model [23]. In a first step the bins are filtered to decide which210 have a large enough read count to be kept for the statistical analysis (filterByExpr()), the library211 sizes are normalized (calcNormFactors()) and the dispersion is estimated (estimateDisp()).212 After this the model is fitted (glmFit()). If the option QLF = TRUE (default), an extended model213 is fitted, using quasi-likelihood methods to account for gene specific variability (glmQLFit()).214 In the last step bin fold changes are tested to be different from overall gene fold changes,215 using a likelihood ratio test or a quasi-likelihood F-Test depending on the QLF option chosen216 (diffSpliceDGE()). The gene level p-values are obtained by the Simes’ method [34].217 DEXseq.wrapper() In this method the standard DEXseq differential exon usage pipeline [22] is218 implemented. It is similarly to edgeR based on fitting a negative binomial model but instead of219 comparing fold change differences between bins and genes, DEXseq compares a full model con-220 taining a term corresponding to the change in exon usage between conditions to a reduced model221 without this term. The two fits are compared using a χ2 likelihood-ratio test. The libraries are nor-222 malized (estimateSizeFactor()), the dispersion is estimated (estimateDispersion() and the223 models are fitted (testForDEU()). In a last step the fold changes between the bins are estimated224 ( estimateExonFoldChanges()). To obtain gene level results the function perGeneQValue()225 is used, which is based on the Šidák method [35].226 diffSplice.wrapper() and diffSplice2 This method implements the differential exon usage pipeline227 of limma for RNA-seq data [25]. The pre-processing is identical to diffSpliceDGE.wrapper(),228 then the precision weights are estimated with (limma::voom()) and the linear models are fitted229 10 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2021.02.12.430963doi: bioRxiv preprint https://doi.org/10.1101/2021.02.12.430963 http://creativecommons.org/licenses/by-nd/4.0/ (limma::lmFit()). In the last step, bin fold changes are tested to be different from overall230 gene fold changes, using a moderated t-test (diffSplice() or, by default, diffSplice2() – see231 below). The gene level p-values are obtained by the Simes’ method [34].232 The diffUTR::diffSplice2 function provides an improved version of limma’s original233 diffSplice method. diffSplice works on the bin-wise coefficient of the linear model which234 corresponds to the log2 fold changes between conditions. It compares the log2(fold change) β̂k,g235 of a bin k belonging to gene g, to a weighted average of log2(fold change) of all the other bins236 of the same gene combined B̂k,g (the subscript g will be henceforth omitted for ease of reading).237 The weighted average of all the other bins in the same gene is calculated by238 B̂k = ∑N i,i6=k wiβ̂i∑N i,i6=k wi (1) where wi = 1 u2i and ui refers to the diagonal elements of the unscaled covariance matrix (X T V X)−1.239 X is the design matrix and V corresponds to the weight matrix estimated by voom. The difference240 of log2 fold changes, which is also the coefficient returned by diffSplice() is then calculated241 by Ĉk = β̂k − B̂k. Instead of calculating the t-statistic with Ĉk, this value is scaled again in the242 original code:243 D̂k = Ĉk √ 1 − wk∑N i wi (2) and the t -statistic is calculated as:244 tk = D̂k uksg (3) s2g refers to the posterior residual variance of gene g, which is calculated by averaging the245 sample values of the residual variances of all the bins in the gene, and then squeezing these residual246 variances of all genes using empirical Bayes method. This assumes that the residual variance is247 constant across all bins of the same gene.248 In diffSplice2(), we applied three changes to the above method. First, the residual249 variances are not assumed to be constant across all bins of the same gene. This results in the250 sample values of the residual variances of every bin now being squeezed using empirical Bayes251 method, resulting in posterior variances s2i for every individual bin i. Second, the weights wi, used252 to calculate B̂k, now incorporate the individual variances by wi = 1 s2i u 2 i . Third, the Ĉk value is253 11 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2021.02.12.430963doi: bioRxiv preprint https://doi.org/10.1101/2021.02.12.430963 http://creativecommons.org/licenses/by-nd/4.0/ directly used to calculate the t -statistic, which after all these changes now corresponds to254 tk = Ĉk uksi . (4) 0.4 Simulated Data255 The simulation was done using the Polyester R package [36] using parameters obtained from the256 control samples of mouse hippocampus RNAseq [30]. Using salmon [17] with a decoy-aware tran-257 scriptome index for the mm10 genome from [37], the abundances for each transcript were first esti-258 mated to learn parameters for the simulation. 1000 transcripts from different genes were randomly259 chosen. The last exon of all these transcripts was lengthened to the next, second next or third next260 downstream APA site annotated in the polyAsite database [15]. Duplicates of these transcripts were261 generated, which had less or no lengthening of their last exon, generating pairs of transcripts with262 different UTR lengths. For each transcript pair, one transcript was up and the other one down reg-263 ulated by the same sampled fold change between 1.3 and 5. To make it more realistic, fold changes264 were also assigned to 300 genes from the set with differential UTR, and 300 genes that did not have265 differences in UTR usage. Reads were then generated for two conditions with three replicates each266 using the simulate experiment() function with the options paired = FALSE, error model =267 "illumina5", bias = "cdnaf" and strand specific = TRUE. The simulated reads are avail-268 able on figshare at https://dx.doi.org/10.6084/m9.figshare.13726143.269 0.5 3’-seq analysis270 To establish a set of true relative differences in UTR usage from the 3’ sequencing data [30], we271 downloaded the authors’ counts per cluster from the Gene Expression Omnibus (file272 GSE84643 3READS count table.txt.gz). We used the 3h treatment because we observed it273 to have the strongest signal, and excluded one sample (A6) that appeared like a strong outlier274 based on PCA and MDS plots. We kept only clusters with at least 50 reads in at least 2 samples,275 and used DEXSeq to fit a negative binomial on each gene and estimate the significance of the276 cluster:condition term. We considered as true positives genes with a gene-level and bin-level277 q-value ≤ 0.1, and true negatives genes with a gene-level q-value ≥ 0.8. Genes for which all278 tested methods produced a p-value of 1 or NA (i.e. genes filtered out as too lowly expressed in279 the standard RNAseq) were excluded for the benchmark.280 12 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2021.02.12.430963doi: bioRxiv preprint https://dx.doi.org/10.6084/m9.figshare.13726143 https://doi.org/10.1101/2021.02.12.430963 http://creativecommons.org/licenses/by-nd/4.0/ 0.6 Comparisons with alternatives281 For the comparison of methods, all functions were used with their default parameters and run282 according to their manual. As QAPA and DaPars do not provide means to aggregate the results283 to the gene level this was implemented separately. For DaPars the p-values were aggregated to284 the gene level by using Simes’ method [34] for comparability with diffUTR. Aggregation by taking285 the minimum p-value of all the transcripts in a gene produced extremely similar results. For QAPA286 |∆PAU| was calculated and aggregated to a gene level by taking the maximum from all transcripts287 of a gene and the genes were ranked by this value. Alternatively, we also tested applying a t -test288 on the log-transformed PAU values (log-transforming had a negligible effect), followed by Simes’289 gene-level aggregation. Attempts to complement QAPA with p-values estimated from established290 statistical tests working with its equivalence classes, such as BANDITS [31], did not improve the291 results and were therefore discarded so as not to distort the original method. Finally, for APAlyzer2292 we combined the 3’ UTR and intronic APA analyses by using the minimum of the two p-values.293 See the https://github.com/plger/diffUTR_paper repository for details.294 We used the following software versions for comparisons: Polyester 1.24.0, DEXSeq 1.34.0,295 edgeR 3.30.0, limma 3.44.0, DaPars 0.9.1, APAlyzer 1.5.5. For QAPA, we used salmon 1.3.0296 with validateMappings.297 Competing interests298 The authors declare no competing interests beside being the developers of the described package.299 Author’s contributions300 SG developed the bin preparation and the diffSplice modification, and ran most of the analyses.301 PLG and SG wrote the package and paper. PLG and GS supervised the project.302 Acknowledgements303 SG performed this research as part of his bachelor thesis in the Interdisciplinary Sciences program at304 ETH. PLG’s position is co-funded by Prof. Mark Robinson (Institute of Molecular Life Sciences,305 University of Zurich) and Professors Gerhard Schratt, Johannes Bohacek and Isabelle Mansuy306 (Institute of Neuroscience, ETH Zurich). GS is supported by grants from the SNF (SNF 179651,307 13 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2021.02.12.430963doi: bioRxiv preprint https://github.com/plger/diffUTR_paper https://doi.org/10.1101/2021.02.12.430963 http://creativecommons.org/licenses/by-nd/4.0/ SNF 189486) and the ETH (ETH-24 18-2 (NeuroSno)). We thank the Robinson group (UZH) for308 feedback.309 References310 1. Lewis, J. D., Gunderson, S. I. & Mattaj, I. W. The influence of 5′ and 3′ end structures on pre-mRNA311 metabolism. Journal of Cell Science. issn: 00219533 (1995).312 2. Tian, B. & Manley, J. L. Alternative polyadenylation of mRNA precursors. Nature Reviews Molecular Cell313 Biology. issn: 14710080 (2016).314 3. Fabian, M. R., Sonenberg, N. & Filipowicz, W. Regulation of mRNA translation and stability by microRNAs.315 Annual Review of Biochemistry. issn: 00664154 (2010).316 4. Derti, A. et al. A quantitative atlas of polyadenylation in five mammals. Genome Research. issn: 10889051317 (2012).318 5. Sandberg, R., Neilson, J. R., Sarma, A., Sharp, P. A. & Burge, C. B. Proliferating cells express mRNAs with319 shortened 3′ untranslated regions and fewer microRNA target sites. Science. issn: 00368075 (2008).320 6. Mayr, C. & Bartel, D. P. Widespread Shortening of 3UTRs by Alternative Cleavage and Polyadenylation321 Activates Oncogenes in Cancer Cells. Cell. issn: 00928674 (2009).322 7. Miura, P., Shenker, S., Andreu-Agullo, C., Westholm, J. O. & Lai, E. C. Widespread and extensive lengthening323 of 39 UTRs in the mammalian brain. Genome Research. issn: 10889051 (2013).324 8. Ha, K. C., Blencowe, B. J. & Morris, Q. QAPA: A new method for the systematic analysis of alternative325 polyadenylation from RNA-seq data. Genome Biology. issn: 1474760X (2018).326 9. Fox-Walsh, K., Davis-Turak, J., Zhou, Y., Li, H. & Fu, X. D. A multiplex RNA-seq strategy to profile poly(A327 +) RNA: Application to analysis of transcription response and 3′ end formation. Genomics. issn: 08887543328 (2011).329 10. Fu, Y. et al. Differential genome-wide profiling of tandem 3 UTRs among human breast cancer and normal330 cells by high-throughput sequencing. Genome Research. issn: 10889051 (2011).331 11. Zheng, D., Liu, X. & Tian, B. 3READS+, a sensitive and accurate method for 3 end sequencing of polyadeny-332 lated RNA. RNA. issn: 14699001 (2016).333 12. Jan, C. H., Friedman, R. C., Ruby, J. G. & Bartel, D. P. Formation, regulation and evolution of Caenorhabditis334 elegans 3′UTRs. Nature. issn: 00280836 (2011).335 13. Shepard, P. J. et al. Complex and dynamic landscape of RNA polyadenylation revealed by PAS-Seq. RNA.336 issn: 13558382 (2011).337 14. Hwang, H. W. et al. cTag-PAPERCLIP Reveals Alternative Polyadenylation Promotes Cell-Type Specific338 Protein Diversity and Shifts Araf Isoforms with Microglia Activation. Neuron. issn: 10974199 (2017).339 15. Herrmann, C. J. et al. PolyASite 2.0: A consolidated atlas of polyadenylation sites from 3 end sequencing.340 Nucleic Acids Research. issn: 13624962 (2020).341 14 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2021.02.12.430963doi: bioRxiv preprint https://doi.org/10.1101/2021.02.12.430963 http://creativecommons.org/licenses/by-nd/4.0/ 16. Wang, R., Nambiar, R., Zheng, D. & Tian, B. PolyA DB 3 catalogs cleavage and polyadenylation sites342 identified by deep sequencing in multiple genomes. Nucleic Acids Research 46, D315–D319. issn: 0305-1048.343 https://doi.org/10.1093/nar/gkx1000 (2021) (Jan. 2018).344 17. Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware345 quantification of transcript expression. en. Nature Methods 14. Number: 4 Publisher: Nature Publishing346 Group, 417–419. issn: 1548-7105. https://www.nature.com/articles/nmeth.4197 (2021) (Apr. 2017).347 18. Xia, Z. et al. Dynamic analyses of alternative polyadenylation from RNA-seq reveal a 3′-UTR landscape across348 seven tumour types. Nature Communications. issn: 20411723 (2014).349 19. Ye, C., Long, Y., Ji, G., Li, Q. Q. & Wu, X. APAtrap: Identification and quantification of alternative polyadeny-350 lation sites from RNA-seq data. Bioinformatics. issn: 14602059 (2018).351 20. Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: A revolutionary tool for transcriptomics. Nature Reviews352 Genetics. issn: 14710056 (2009).353 21. Wang, R. & Tian, B. APAlyzer: a bioinformatics package for analysis of alternative polyadenylation isoforms.354 Bioinformatics (Oxford, England). issn: 13674811 (2020).355 22. Anders, S., Reyes, A. & Huber, W. Detecting differential usage of exons from RNA-seq data. Genome Research.356 issn: 10889051 (2012).357 23. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: A Bioconductor package for differential expression358 analysis of digital gene expression data. Bioinformatics. issn: 14602059 (2009).359 24. Law, C. W., Chen, Y., Shi, W. & Smyth, G. K. Voom: Precision weights unlock linear model analysis tools360 for RNA-seq read counts. Genome Biology. issn: 1474760X (2014).361 25. Ritchie, M. E. et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies.362 Nucleic Acids Research. issn: 13624962 (2015).363 26. Morgan, M., Obenchain, V., Hester, J. & Pagès, H. SummarizedExperiment: SummarizedExperiment con-364 tainer. R package version 1.12.0 (2018).365 27. Soneson, C., Matthes, K. L., Nowicka, M., Law, C. W. & Robinson, M. D. Isoform prefiltering improves perfor-366 mance of count-based methods for analysis of differential transcript usage. Genome Biology. issn: 1474760X367 (2016).368 28. Blair, J. D., Hockemeyer, D., Doudna, J. A., Bateup, H. S. & Floor, S. N. Widespread Translational Remodeling369 during Human Neuronal Differentiation. Cell Reports. issn: 22111247 (2017).370 29. Whipple, A. J. et al. Imprinted Maternally Expressed microRNAs Antagonize Paternally Driven Gene Programs371 in Neurons. English. Molecular Cell 78. Publisher: Elsevier, 85–95.e8. issn: 1097-2765. https://www.cell.372 com/molecular-cell/abstract/S1097-2765(20)30041-1 (2021) (Apr. 2020).373 30. Fontes, M. M. et al. Activity-Dependent Regulation of Alternative Cleavage and Polyadenylation during Hip-374 pocampal Long-Term Potentiation. Scientific Reports. issn: 20452322 (2017).375 31. Tiberi, S. & Robinson, M. D. BANDITS: Bayesian differential splicing accounting for sample-to-sample vari-376 ability and mapping uncertainty. Genome Biology. issn: 1474760X (2020).377 15 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2021.02.12.430963doi: bioRxiv preprint https://doi.org/10.1093/nar/gkx1000 https://www.nature.com/articles/nmeth.4197 https://www.cell.com/molecular-cell/abstract/S1097-2765(20)30041-1 https://www.cell.com/molecular-cell/abstract/S1097-2765(20)30041-1 https://www.cell.com/molecular-cell/abstract/S1097-2765(20)30041-1 https://doi.org/10.1101/2021.02.12.430963 http://creativecommons.org/licenses/by-nd/4.0/ 32. Lawrence, M. et al. Software for Computing and Annotating Genomic Ranges. PLoS Computational Biology.378 issn: 1553734X (2013).379 33. Liao, Y., Smyth, G. K. & Shi, W. FeatureCounts: An efficient general purpose program for assigning sequence380 reads to genomic features. Bioinformatics. issn: 14602059 (2014).381 34. Simes, R. J. An improved bonferroni procedure for multiple tests of significance. Biometrika. issn: 00063444382 (1986).383 35. Šidák, Z. Rectangular Confidence Regions for the Means of Multivariate Normal Distributions. Journal of the384 American Statistical Association. issn: 1537274X (1967).385 36. Frazee, A. C., Jaffe, A. E., Langmead, B. & Leek, J. T. Polyester: Simulating RNA-seq datasets with differential386 transcript expression. Bioinformatics. issn: 14602059 (2015).387 37. Stolarczyk, M., Reuter, V. P., Smith, J. P., Magee, N. E. & Sheffield, N. C. Refgenie: a reference genome388 resource manager. GigaScience. issn: 2047217X (2020).389 16 .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2021.02.12.430963doi: bioRxiv preprint https://doi.org/10.1101/2021.02.12.430963 http://creativecommons.org/licenses/by-nd/4.0/ Data and code availability RNAseq data processing diffUTR Preparing bins Quantification Differential analysis Simulated Data 3'-seq analysis Comparisons with alternatives