key: cord-0315631-fqppx5ce
authors: Chen, Athena; Kammers, Kai; Larman, H Benjamin; Scharpf, Robert B.; Ruczinski, Ingo
title: Detecting Antibody Reactivities in Phage ImmunoPrecipitation Sequencing Data
date: 2022-01-21
journal: bioRxiv
DOI: 10.1101/2022.01.19.476926
sha: 8e74ffa58a4aadee3ca94d22380868f7f46a0d35
doc_id: 315631
cord_uid: fqppx5ce

Phage ImmunoPrecipitation Sequencing (PhIP-Seq) is a recently developed technology to assess antibody reactivity, quantifying antibody binding towards hundreds of thousands of candidate epitopes. The output from PhIP-Seq experiments are read count matrices, similar to RNA-Seq data; however some important differences do exist. In this manuscript we investigated whether the publicly available method edgeR1 for normalization and analysis of RNA-Seq data is also suitable for PhIP-Seq data. We find that edgeR is remarkably effective, but improvements can be made and introduce a Bayesian framework specifically tailored for data from PhIP-Seq experiments (Bayesian Enrichment Estimation in R, BEER).

Because of their high abundance, easy accessibility in peripheral blood, and relative stability ex vivo, antibodies serve as excellent records of environmental exposures and immune responses. While several multiplexed methods have been developed to assess antibody binding specificities, Phage Immuno-Precipitation Sequencing (PhIP-Seq) is the most efficient technique available for assessing antibody binding to hundreds of thousands of peptides at cohort scale [2] [3] [4] . PhIP-Seq uses oligonucleotide library synthesis to encode proteome spanning peptide libraries for display on bacteriophages. These libraries are immunocaptured using an individual's serum antibodies, and the antibody-bound library members are identified by high throughput DNA sequencing. The VirScan 4 assay uses the PhIP-Seq method to quantify antibody binding to around 100,000 peptides spanning the genomes of more than 200 viruses that infect humans. Other commonly used libraries include the AllerScan 5 and ToxScan libraries 6 , and a focused library for coronaviruses, including SARS-CoV-2 7 .

We and others have utilized PhIP-Seq to successfully identify novel autoantigens associated with autoimmune diseases [8] [9] [10] , to broadly characterize allergy-related antibodies 5 , to quantitatively compare the antibody repertoires of term and preterm neonates 11 , to assess changes in the anti-viral antibody response after bone marrow transplant 12 , to characterize the self-reactivity of broadly neutralizing HIV antibodies [13] [14] [15] , to link enteroviral infection with acute flaccid myelitis 16 , and for use in large crosssectional and longitudinal studies of exposure and response to hundreds of human viruses and thousands of bacterial proteins in healthy individuals and in individuals infected with HIV or measles 4, [17] [18] [19] . In addition, we recently used PhIP-Seq to assess how antibody responses to endemic coronaviruses modulate COVID-19 convalescent plasma functionality 7 and evaluated the heritability of antibody responses 20 .

The output from PhIP-Seq experiments are read count matrices, similar to RNA-Seq data, but important differences in the data structures, experimental design, and study objectives exist between the two sequencing-based methods. RNA-Seq experiments typically focus on differentially expressed genes or transcripts between experimental groups, rather than identifying expressed genes for any particular sample. The objective of PhIP-Seq experiments however is typically just that: detecting peptide antibody reactivity in an individual sample. Thus, in contrast to RNA-Seq experiments, the design of PhIP-Seq experiments requires the use of negative controls (i.e. "mock" immunoprecipitations (IPs) lacking antibody input, also referred to as beads-only samples), which are typically included as 4 to 8 wells of a 96-well plate. This generates a "n versus 1" mock IPs versus sample comparison, in contrast to the most common n 1 versus n 2 two-group comparison in RNA-Seq. In addition, genes with low read counts are presumed to have little biological relevance, and RNA-Seq data workflows typically filter out lowly expressed genes (measured as counts-per-million) prior to analysis. In PhIP-Seq experiments however, peptides with low read counts may have biological relevance and are not filtered out in advance. That said, under suitable assumptions, such as equality of variances in both groups, a two-group comparison with a single sample in one group can still be carried out.

Significant advances in normalization and analysis methods for RNA-Seq data have been made in recent years, with edgeR, DESeq2, and voom among the most popular open-source software packages available 1,21-23 . These methods model the number of reads using a negative binomial distribution to account for the inflated variance due to biological variability between samples in comparison to the expected variance of the binomial distribution. Parameter estimation is based on empirical Bayes methods to borrow strength across transcripts, stabilizing the estimates of the respective standard errors.

Upregulated genes in RNA-Seq experiments draw a higher proportion of reads than expected for a given library size (here, total read counts), resulting in lower than expected read counts for other genes given that library size ("competing resources"). Thus, a normalization factor for each sample is calculated in RNA-Seq experiments to account for this effect 24 . One assumption is that the majority of genes are not differentially expressed when comparing cases to controls.

In this manuscript we investigated whether the publicly available method edgeR 1 for normalization and analysis of RNA-Seq data is also suitable for PhIP-Seq data. We highlight some of the differences between PhIP-Seq and RNA-Seq experiments and data sets, which motivates the development of a new methodology for PhIP-Seq data explicitly based on the assumed data generating mechanism, rather than adapting existing RNA-Seq approaches. To that end, we introduce a Bayesian framework specifically tailored for data from PhIP-Seq experiments (Bayesian Enrichment Estimation in R, BEER). Using simulation studies and data sets from existing HIV and SARS-CoV-2 studies, we investigate what improvements in sensitivity and specificity can be made, highlight the importance of empirical Bayes methods, and assess the effect of the number of mock IP samples on sensitivity and specificity.

Simulation. Regardless of the approach used to estimate prior parameters, BEER has high discriminatory power for identifying enriched peptides ( Figure 1, Supplementary Figure S1 ). In general, BEER using methods of moments (MOM) or maximum likelihood estimates (MLE) for the shape parameters in the beads-only prior distributions performed worse than BEER using edgeR parameter estimates, highlighting the importance of borrowing strength across peptides for improved parameter estimation (Supplementary Figure S1 , Supplementary Table S1 ). The stability of parameter estimates also affected the improvement in BEER predictive performance by the number of beads-only samples used. While BEER with MOM and MLE parameter estimates greatly benefited from the inclusion of more beadsonly samples in the experiment, BEER using edgeR parameter estimates had much less pronounced improvements as the number of beads-only samples was increased (Supplementary Figures S2, S3, and   Supplementary Table S1 ). Using these edgeR parameter estimates, BEER posterior probabilities of enrichment were well-calibrated (Supplementary Figure S4) , and estimates of fold changes were accurate (Supplementary Figure S5) . Thus, we recommend the edgeR parameter estimates as default and imply their use when simply referring to BEER as the method used.

In general, performances of edgeR and BEER for identifying enriched peptides were surprisingly similar ( Figure 1 , Supplementary Table S1 ). Both methods yielded near perfect receiver operating characteristic (ROC) curves for peptide fold changes above 4 (area under the curves (AUCs) > 0.99), and still outstanding ROC curves for fold changes between 2 and 4 (AUCs between 0.94 and 0.98) even when a design with only 2 beads-only samples was employed. Peptide fold changes less than 2 were harder to detect, reflected in substantially lower AUCs between 0.71 and 0.74. Precision-recall (PR) curves for edgeR and BEER are noticeably different for intermediate fold changes between 2 and 4, where also most improvement is (theoretically) possible for BEER if improved estimates of shape parameters used in the prior distributions were available. As expected from the near perfect ROC curves, reliable detection of peptides with fold changes above 4 is possible with a low rate of false positives. ( Figure   1 , Supplementary Table S2 ). For small fold changes less than 2 the positive predictive value (PPV) is generally poor, which is expected as the ROCs are modest and most peptides are not enriched.

Under commonly employed false discovery rate (FDR) control, BEER has a higher probability of correctly identifying enriched peptides than edgeR across all fold-changes, and the difference in probability is most pronounced for moderate fold changes between 2 and 8 ( Figure 2 ). For example, under a FDR control of 5%, on average, the probability of identifying a peptide with a 4 fold change is 53% for BEER, but only 21% for edgeR. Similarly, BEER has a probability of at least 50% to detect fold changes above 3.7 under this FDR control, while edgeR requires a fold-change of at least 5.5. Of note, the BEER posterior probability cut-offs in the ten simulations to achieve a 5% FDR (see Methods) were between 0.25 and 0.49; thus, using a commonly employed posterior probability cut-off of 0.5 leads to fewer false positives on average.

HIV elite controllers. Both BEER and edgeR had no false positives across the six mock IP samples Figure 1 : Average receiver operating characteristic (ROC; top panels) and precision-recall (PR; bottom panels) curves calculated from ten simulations, comparing edgeR (black lines) and BEER (red lines) across fold-change categories and number of beads-only samples available. Curves for BEER using the actual simulation shape parameters in the prior distributions (orange lines) are added to show the effect of sampling variability in these parameters. Results for fold changes above 16 are omitted since in all instances peptides were correctly classified as enriched. : Estimated probabilities for correctly identifying enriched peptides (y-axis) as a function of the fold-change (x-axis) for each of ten simulated data sets based on logistic regression models. BEER posterior probability cut-offs were selected to achieve a false discovery rate of 5% in each simulation (see Methods). Thin lines indicate the individual simulations, thick lines the respective averages.

using a posterior probability cut-off of 0.5 for BEER and an FDR control of 5% for edgeR (corresponding to p-value cut-offs between 1.0 × 10 −3 and 2.4 × 10 −3 across the eight samples). As the non-replicated serum samples are from individuals infected with HIV subtype B, we expected stronger antibody reactivity to proteins from HIV subtype B, and indeed, BEER and edgeR detect more enrichments to peptides tiling proteins from HIV subtype B than proteins of any other HIV strain represented in the library ( Figure 3) . Notably, for any particular subtype B protein, BEER detects more enriched peptides than edgeR (while expected to have a lower type I error with a posterior probability cut-off of 0.5, see above). Some antibody reactivity to proteins from other HIV subtypes is expected due to cross-reactivity (Supplementary Figure S6) . Figure S8) . This is not surprising as the BEER posterior probabilities for peptides ranked 100 in the two technical replicates were 0.99 and 1.00 respectively, and the edgeR p-values were 3.1 × 10 −4 and 9.4 × 10 −5 with respective 5% FDR cut-offs of 1.5 × 10 −3 and 1.8 × 10 −3 , respectively.

Thus, both methods exhibit high confidence that most of the peptides among the top 100 are truly enriched (Supplementary Figure S9 ). While BEER concordance decreases but remains above 0.90 when considering the lists of peptides with ranks up to 200, the edgeR concordance does drop more noticeably (Supplementary Figure S8 ), potentially indicating a higher sensitivity in BEER for peptides with smaller fold changes. Comparing subtype A peptides across technical replicates, BEER and edgeR had very similar performance. Compared to other subtypes, both methods also showed less discordance among subtype A peptides (Supplementary Table S3 ).

CoronaScan. In a round-robin, leaving one mock IP sample out in turn, no false positives (i.e., peptides falsely called enriched) were produced by BEER or edgeR across the eight mock IPs in the CoronaScan data using a posterior probability cutoff of 0.5 for BEER and an FDR control of 5% for edgeR (p-value cutoffs ranged from 3.3×10 −4 to 1.1×10 −3 ). Among the six serum samples from individuals prior to the COVID-19 pandemic, BEER and edgeR show more enrichment to peptides tiling human coronaviruses, but generally no enrichment of SARS-CoV-2 peptides (VRC 1 -VRC 6, Supplementary Figure S10 ). In contrast, among the four samples from a single individual infected with SARS-CoV-2, taken at 10, 11, 12 and 13 days since symptom onset, an enrichment of SARS-CoV-2 protein tiling peptides is apparent.

Particularly on day 13 after symptom onset, the patient presumably has produced a large number of antibodies which were detected by both BEER and edgeR. Of note, this was the first day the a SARS- Table S4 ).

In this manuscript we investigated whether the publicly available method edgeR 1 for normalization and analysis of RNA-Seq data is also suitable for PhIP-Seq data. With the exception of calculating one-sided p-values to infer peptide reactivity, no "tweaks" were necessary in the implementation, and we found the approach to be effective. However, using simulation studies we showed that substantial improvements are possible with a Bayesian framework specifically tailored for data from PhIP-Seq experiments (Bayesian Enrichment Estimation in R, BEER). In particular for peptides showing weaker reactivity, we saw an improvement of sensitivity with lower false positive rates when standard cut-offs were employed (posterior probability > 0.5 for the Bayesian method and a Benjamini-Hochberg false discovery rate control of 5% for edgeR). This comparison might be perceived as somewhat unfair, as the data were simulated from a model similar to that underlying BEER, which we recognize. However, BEER was implemented in a way we believe reflects the true data generating mechanism, which is also corroborated by the fact that BEER showed better performance also on real data such as the data from the HIV elite controllers, where BEER detected more enriched peptides of the correct HIV subtype than edgeR. This improved performance comes at a price of increased computational cost. While edgeR delivers almost instantaneous results, the Markov chains underlying BEER are time consuming. However, since laboratory prep and sequencing are expensive and certainly take more time than running such Markov chains, we believe utilizing extra CPU time to run BEER, as shown in our workflow (https://github.com/athchen/beer_manuscript), may yield worthwhile additional discoveries.

It was initially surprising to us how well edgeR fared on PhIP-Seq data despite being designed specifically for RNA-Seq data. While important differences between PhIP-Seq and RNA-Seq data structures exist, as previously described, edgeR captures some of the most important effects that exist in both types of data.

For example, unlike RNA-Seq, the PhIP-Seq experimental protocol requires the use of negative controls (i.e. samples with no serum) on a 96-well plate. The observed read counts mapped to the peptides among those negative controls show a very strong peptide-dependent bias in library representation and/or "background" binding to the beads, such that some peptides consistently draw a much higher proportion of reads than others (Supplementary Figure S13) . However, in a "n versus 1" mock IPs versus sample comparison where inference is drawn for each peptide, these differences among peptides are similar to the biological variability observed between genes in RNA-Seq 25 . In addition, edgeR models read counts using a negative binomial distribution to account for larger than binomial variability between samples, an effect we also observe in PhIP-Seq data (Supplementary Figure S14 ). And while we expect reactive peptides in a serum sample to pull a large number of reads, and thus -after adjusting for library sizeexpect non-reactive peptides in a serum-sample to have fewer reads on average than the corresponding beads-only sample peptides (Supplementary Figure S15) , the resulting attenuation constant in essence is the same as the scale factors derived from the trimmed mean of M-values approach in edgeR 1,24 .

Our findings also highlight the importance of empirical Bayes methods for parameter estimation. Methods of moments and maximum likelihood estimates for individual peptide prior distribution shape parameters performed substantially worse than those obtained by borrowing strength across peptides. By also using the true shape parameters in our simulations to assess sensitivity and specificity, we were able to demonstrate that, particularly for intermediate fold changes, better performance could be achieved by improving procedures to estimate these parameters. Our findings also give guidance for experimental design, such as the chosen number of mock IPs per 96-well plate. Allocating more beads-only samples to a plate improves estimation of these shape parameters that largely quantify between sample variability of the probabilities of a specific peptide to pull a read. Choosing more beads-only samples means reduced number of biological samples assayed per plate, which for the practitioner means additional cost and labor for more plates. In previous experiments, the number of mock IPs per plate was typically between 4 and 8. Our simulation studies showed that this is appropriate, as the observed difference in performance between 8 and 4 beads-only samples was much less than the observed difference between 4 and 2 mock IPs, indicating diminishing returns.

A few technical details should also be discussed further. As described in the Methods section, reads from highly reactive peptides (initial fold change estimate above 15) are removed from the data of mock IPs and the actual sample before BEER analysis, and the respective library sizes are recalculated. The main reason for doing so is simply to stabilize the inference and improve scalabilty, as allowing for extreme fold changes in the Bayesian model for a few peptides can affect these features. We verified that there were "no false positives" in the sense that all posterior probabilities for these highly reactive peptides were 1 if not excluded from the analyses. Thus, the chosen "highly enriched" threshold of 15 is likely conservative. We also note that the Bayesian model can be extended to run Markov chains for multiple samples against the beads-only samples. However, the resulting increase in parameter space makes this a challenging endeavor, especially with regards to scalability. It could be argued that for the same reasons stated above an increase in CPU time should be acceptable if this leads to an improvement detecting reactive peptides. However, we did not observe an improvement in detecting antibody reactivities in simulation studies we performed (data now shown). No notable improvements were observed when the same peptide was simulated as enriched in all samples compared to the beads-only, and a deterioration was observed when reactivity was not common to all peptides. Since in real life experiments we seldom expect the exact same peptides to be reactive, we did not pursue this line of research further.

In summary, antibodies commonly serve as indicators of environmental exposures and immune responses, and Phage ImmunoPrecipitation Sequencing allows for quantification of antibody binding to hundreds of thousands of peptides, in individuals and large cohorts. We believe that this technology will play an even more prominent role in the future, addressing questions about exposures and health outcomes in populations, as well as individualized medicine. In this manuscript, we introduce a method and a software package for analyzing data from this technology, contrast it with an existing RNA-Seq software package that can be retooled for PhIP-Seq data, and share a workflow with practitioners to successfully carry out their own analyses of data resulting from PhIP-Seq experiments.

A Bayesian model for detecting antibody enrichment. A succinct summary of the model notation is provided in Supplementary Table S5 . On a 96-well plate suppose we observe Y ij read counts for peptide i ∈ {1, 2, . . . , P } in sample j ∈ {1, 2, . . . , 96}. Let n j = i Y ij denote the total read count (library size) for sample j. Without loss of generality, assume samples {1, 2, . . . , N } are mock IP (beads-only) samples. To infer reactivity, we compare one sample to all beads-only samples on the same plate. Our hierarchical model to infer peptide reactivity in a sample j ∈ {N + 1, . . . , 96} is described as follows.

The main parameter of interest Z ij is a binary indicator denoting whether peptide i elicits an enriched antibody response in sample j (a 0 indicates no, a 1 indicates yes). The prior is a Bernoulli with success probability π j . For all mock IP samples, this success probability is zero. For sample j, π j is modeled as a beta distribution. The shape parameters a π and b π of the Beta prior distributions are chosen as 2 and 300 in our applications, to reflect peptide enrichment seen in previous studies, but also make it sufficiently diffuse to support a range of proportions (Supplementary Figure S16) . The parameter φ ij Figure S16 ; the attenuation constant is equal to 1 in the mock IP samples).

The observed read counts Y ij are modeled using a Binomial(n j , θ ij ) distribution, where θ ij denotes the probability that peptide i pulls a read in sample j, and n j denotes the total library size in sample j.

This Binomial probability is modeled using a Beta prior distribution, and the shape parameters depend on the expected peptide read counts observed in the mock IP samples (estimation procedures described below), the fold change φ ij , and the attenuation constant based on the reads pulled by all reactive peptides in the sample.

Shape parameter estimation. We define two functions, f a and f b , used for the description of the Beta shape parameters a and b given a mean µ and a variance σ 2 .

The mean and variance for peptide i in a beads-only sample (e.g., for a Beta distribution with shape parameters a i0 and b i0 ) is given by

.

Since each sample generally contains more than a million of reads, estimates of the Binomial probabilitieŝ 

The MOM estimates for a i0 and b i0 are then given bŷ on each plate in our application, but neither the MLEs nor the MOM estimates described above make use of this. In contrast for example, edgeR uses an emprical Bayes approach 29 to approximate the larger than binomial variability observed in the RNA-Seq read counts, and to stabilize these variance estimates, which are characterized by the tagwise dispersion parameter (the squared coefficient of variation ofθ ij , denoted here as τ edgeR ij ) 1 . We note that we can use the estimates of these tagwise dispersion parameters to derive new estimates of the variances for our Binomial probabilities θ ij . Specifically, for peptide i we have σ edgeR i0 2 = τ edgeR ij * θ 2 i0 , and thusâ Markov chain Monte Carlo. The model was implemented in JAGS (4.3.0) and run using the R interface for JAGS, rjags [30] [31] [32] . We use maximum likelihood estimates to select starting values of θ ij , Z ij , φ ij , c j , and π j to initialize the Markov chain for non beads-only sample j. As described above,θ init ij =θ ij is the MLE of the binomial probability calculated from the read counts. Since Z ij is needed to update c j , π j , φ ij , we setẐ init ij = 1 if its observed read count is at least twice as large as the expected read count in a beads-only sample. That is,Ẑ init ij = 1 if Y ij ≥ 2n jθi0 , and 0 otherwise. The initial value for the attenuation constant is derived by regressing the observed read counts on the expected reads count for all non-enriched peptides in that sample, withĉ init j being the slope estimatê

The initial value for the proportion of enriched peptides is the average of all enrichment indicatorŝ

and the respective peptide fold changes are initialized aŝ

Since c j and π j are modeled using Beta distributions with no support at values 0 and 1, we use a small offset in the event that c init j = 1 and π init j = 0.

In PhIP-Seq experiments we commonly observe very reactive peptides 18, 33 . Allowing for extreme fold changes in the Bayesian model for a few peptides can affect the inference for other less reactive peptides, and can have negative consequences for numerical stability and scalability. In our applications, clearly enriched peptides defined asφ init ij > 15 were filtered out before starting the Markov chain. Reads from such peptides in the mock IP and actual samples were removed, and the library sizes were recalculated.

Peptide reactivity detection with edgeR. To identify reactive peptides, each serum sample is compared to all beads-only samples from the same plate. Differential expression in edgeR is assessed for each unit (here, each peptide) using an exact test analogous to Fisher's comparing means between two groups of negative binomial random variables, but adapted for overdispersed data 34 . Two-sided p-values were subsequently converted to one-sided p-values as the alternative to the null of no reactivity (fold change = 1) is reactivity, leading to read count enrichment and thus, fold-changes larger than 1. Multiple comparisons corrections were based on the Benjamini-Hochberg procedure, using false discovery rates to delineate enrichment across all peptides.

Simulation study. We simulated ten data sets based on the read counts observed in the HIV elite controller data described below. Each of these data sets had eight beads-only samples and twelve simulated serum samples. The twelve samples contain one beads-only sample run as an actual sample and two technical replicates (samples generated from the same parameters). For each simulated serum sample, we randomly selected 50 peptides as reactive. Among those, 10 peptides each had fold changes between 1 and 2, between 2 and 4, between 4 and 8, between 8 and 16, and between 16 and 32.

Each data set was analyzed using the first two, four, and all eight beads-only samples to assess the sensitivity of the results to the number of beads-only used for analysis. For each data set and number of beads-only sample combination, we ran BEER with the true beads-only Beta a 0 , b 0 prior parameters, estimated beads-only parameters using maximum likelihood, method of moments, and edgeR derived estimates.

Performance was primarily assessed using ROC and PR curves on the full data and fold-change subsets of the data. For each fold-change bin, curves were generated using all non-enriched peptides and enriched peptides within the specified fold-change group from the simulated serum samples (no peptides from beads-only samples were included). To ensure that all curves had the same support points, we used linear interpolation to approximate the sensitivity or positive predictive value respectively at each support point for each simulation. ROC curves started at 0 sensitivity and 0 false-positive rate, while PRC curves started at 0 sensitivity and perfect positive-predictive value. The interpolated curves were averaged point-wise to generate an average curve for each condition. The area under each ROC curve was calculated using trapezoidal approximation from the interpolated data points. We also used logistic regression to model the probability of identifying and enriched peptide by fold-change in each data set.

Multiple comparisons for edgeR p-values were addressed using the Benjamini-Hochberg procedure to ensure a 5% FDR. Cut-offs for the posterior probabilities were selected in each data set to achieve 5% false positive calls.

Examples. Antibody reactivity counts for eight plates of data were generated using the PhIP-Seq assay and the VirScan library on serum samples from HIV elite controllers with HIV subtype A and B infections, and analyzed by Kammers et al. 15 to assess antibody profiles in HIV controllers and persons with treatment-induced viral suppression. We used count data for the 3,395 phage-displayed peptides spanning the HIV proteome in the VirScan library for ten samples and six-beads-only samples from one plate of data. Two of the ten samples are identical, run in duplicate on the same plate. To quantify the false-positive rate of each algorithm, we also ran each beads-only sample against the remaining five-beads-only samples in a round-robin.

The CoronaScan data consists of counts for 6,932 peptides for 10 serum samples and 8 beads-only samples from one plate of data 7 . Among the ten samples, six were pre-pandemic samples and four samples were from one individual infected with SARS-CoV-2. Samples from this individual were collected on days 10 through 13 since symptom onset. By design, each peptide is present in duplicate in the CoronaScan library, enabling us to assess the concordance of the fold-change estimates and the enrichment status within samples. We again ran each beads-only sample against the remaining 7 beads-only samples to assess false positive rates.

The example in the Discussion to highlight the strong peptide-dependent background binding to the beads was from a previous study to evaluates HIV antibody responses and their evolution during the course of HIV infection 18 and to generate a classifier for recent HIV infections 33 .

The data and code for the figures, tables, and benchmarks are freely available at https://github. com/athchen/beer_manuscript to ensure the reproducibility of our results. BEER can be run using the R package beer which is available on Github: https://github.com/athchen/beer (submitted to Bioconductor). Table S1 : Area under the ROC curves shown in Figure 1 . Both BEER using edgeR parameter estimates and edgeR had near perfect classification for peptides with fold-changes above 4, even when only four beads-only samples were used to estimate a i0 and b i0 . i ∈ {1, 2, · · · , P } The peptide index.

j ∈ {1, 2, · · · , 96} The sample index.

Y ij The observed number of reads mapped to peptide i in sample j.

The library size of sample j.

θ ij The probability that peptide i in sample j pulls a read.

Z ij The indicator whether peptide i in sample j is enriched (reactive).

π j The proportion of enriched peptides in sample j.

c j The attenuation constant for sample j.

φ ij The fold change of peptide i in sample j compared to the beads-only samples.

φ min The minimum fold-change for an enriched peptide (defined a priori).

a ij , b ij The shape parameters for the prior distribution of θ ij for peptide i in sample j.

a π , b π The shape parameters for the prior distribution of the π j .

a c , b c The shape parameters for the prior distribution of the c j .

a φ , b φ The shape and scale parameters for the prior distribution of φ ij |Z ij = 1 Figure S13: Evidence for a strong peptide effect in PhIP-Seq data, demonstrated using data from five plates of a previous experiment using HIV samples, analyzed in Eshleman et al. 18 and Chen et al. 33 Left: observed read counts per million reads for 95,242 peptides from two "beads only" samples from different plates. For these two samples, the Spearman correlation is 0.875. Middle: observed average read counts per million reads for 95,242 peptides from all "beads only" samples from two different plates. For these two plates, the Spearman correlation is 0.975. Right: within and between plate sample correlations as a function of sequencing depth. For each pair of bead only samples from the same plate (red dots, 117 pairs total), the Spearman correlation (y-axis) is related to the minimum of the respective two sequencing depths (x-axis). For each pair of plates (blue dots, 10 pairs total), the Spearman correlation between the average read counts of the bead only samples (y-axis) is also related to the minimum of the two median sequencing depths (x-axis), and substantially higher than the correlations of the bead only samples. Figure S14 : Evidence for larger than binomial variability in the PhIP-Seq data analyzed in Eshleman et al. 18 and Chen et al. 33 Left: library size (reads in millions) for 36 control ("bead only") samples from 5 plates. Middle: expected read counts per million reads aligned based on the estimated Binomial probabilities (dots, colored by plate) and respective 95% confidence intervals for a peptide with large expected read counts, for each of the control samples. Highlighted are samples 3 and 6 from plate 1, showing large discrepancies between the Binomial probabilities for this peptide between the two bead only samples. Right: the same statistics as in the middle panel, for a peptide with smaller expected read counts. Highlighted are samples 10 and 12 from plate 2, again showing a large discrepancy between the Binomial probabilities for this peptide between the two bead only samples. Figure S15 : Expected versus observed read counts for 95,242 peptides from a randomly selected serum sample. Expected read counts for each peptide were derived using maximum likelihood estimates from the negative controls on the same plate. Each point represents one peptide from one sample, and peptides were considered enriched (red) if the observed read count was over 2 times (left) and 5 times (right) the expected number of reads. Linear regression lines (blue) were fitted using the non-enriched peptides and compared to the line where observed and expected reads are equal (black). The observed read counts were truncated at 250 to enhance the display. 

edgeR: a Bioconductor package for differential expression analysis of digital gene expression data

PhIP-Seq characterization of serum antibodies using oligonucleotide-encoded peptidomes

Autoantigen discovery with a synthetic human peptidome

Comprehensive serological profiling of human populations using a synthetic human virome

Profiling serum antibodies with a pan allergen phage library identifies key wheat allergy epitopes

Prevalence, persistence, and genetics of antibody responses to protein toxins and virulence factors

Antibody responses to endemic coronaviruses modulate COVID-19 convalescent plasma functionality

PhIP-Seq characterization of autoantibodies from patients with multiple sclerosis, type 1 diabetes and rheumatoid arthritis

Cytosolic 5'-nucleotidase 1A autoimmunity in sporadic inclusion body myositis

Systematic autoantigen analysis identifies a distinct subtype of scleroderma with coincident cancer

The repertoire of maternal anti-viral antibodies in human newborns

Temporal virus serological profiling of kidney graft recipients using VirScan

Autoreactivity and exceptional CDR plasticity (but not unusual polyspecificity) hinder elicitation of the anti-HIV antibody 4E10

Ontogeny of recognition specificity and functionality for the broadly neutralizing anti-HIV antibody 4E10

HIV Antibody Profiles in HIV Controllers and Persons With Treatment-Induced Viral Suppression

Pan-viral serology implicates enteroviruses in acute flaccid myelitis

Measles virus infection diminishes preexisting antibodies that offer protection from other pathogens

Comprehensive Profiling of HIV Antibody Evolution

Population-wide diversity and stability of serum antibody epitope repertoires against human microbiota

Analysis of antibody binding specificities in twin and SNP-genotyped cohorts reveals that antiviral antibody epitope selection is a heritable trait

Article Navigation Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

voom: precision weights unlock linear model analysis tools for RNA-seq read counts

A scaling normalization method for differential expression analysis of RNA-seq data

Sequencing technology does not eliminate biological variability

Numerical Optimization

Differential expression analysis for sequence count data

Detecting Significant Changes in Protein Abundance

limma powers differential expression analyses for RNA-sequencing and microarray studies

JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling

R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing

A top scoring pairs classifier for recent HIV infections

Small-sample estimation of negative binomial dispersion, with applications to SAGE data

Funding for this work was provided by grants from the United States National Institutes of General Medical Science (NIGMS R01 GM136724) and Allergy and Infectious Diseases (NIAID R01 AI095068), and the National Cancer Institute (NCI P50 CA062924, NCI P30 CA006973).