key: cord-0884336-1rp2pwyh
authors: Vorsteveld, Emil E.; Hoischen, Alexander; van der Made, Caspar I.
title: Next-Generation Sequencing in the Field of Primary Immunodeficiencies: Current Yield, Challenges, and Future Perspectives
date: 2021-03-05
journal: Clin Rev Allergy Immunol
DOI: 10.1007/s12016-021-08838-5
sha: 216231758439cb223a760f856d887beeec012432
doc_id: 884336
cord_uid: 1rp2pwyh

Primary immunodeficiencies comprise a group of inborn errors of immunity that display significant clinical and genetic heterogeneity. Next-generation sequencing techniques and predominantly whole exome sequencing have revolutionized the understanding of the genetic and molecular basis of genetic diseases, thereby also leading to a sharp increase in the discovery of new genes associated with primary immunodeficiencies. In this review, we discuss the current diagnostic yield of this generic diagnostic approach by evaluating the studies that have employed next-generation sequencing techniques in cohorts of patients with primary immunodeficiencies. The average diagnostic yield for primary immunodeficiencies is determined to be 29% (range 10–79%) and 38% specifically for whole-exome sequencing (range 15–70%). The significant variation between studies is mainly the result of differences in clinical characteristics of the studied cohorts but is also influenced by varying sequencing approaches and (in silico) gene panel selection. We further discuss other factors contributing to the relatively low yield, including the inherent limitations of whole-exome sequencing, challenges in the interpretation of novel candidate genetic variants, and promises of exploring the non-coding part of the genome. We propose strategies to improve the diagnostic yield leading the way towards expanded personalized treatment in PIDs.

Primary immunodeficiencies (PIDs) are a group of inborn errors of immunity, caused by germline mutations that affect different parts of the immune system. PIDs are associated with a broad range of symptoms, including recurrent infections, autoimmunity, autoinflammation, allergies, and malignancy [1]. The diagnosis of these disorders is hampered by the heterogeneous clinical presentation, genetic heterogeneity, and the variable mutational mechanisms that underlie the genetic defects. The advent of next-generation sequencing (NGS) has revolutionized the field of sequencing technologies, enabling high-throughput sequencing with continuously decreasing costs and widespread use in both clinical and research settings. The application of these generic and unbiased techniques has also enabled an exponential increase in the identification of novel genes for PIDs, predominantly through the application of whole exome sequencing (WES) [2-4]. WES entails the sequencing of all protein-coding exons and is widely used for the diagnosis of inherited disorders, enabling a genetic diagnosis that also provides insight into the molecular defect in PID patients, ultimately informing therapeutic options [5-7].

In this review, we discuss the application of WES in the context of PIDs and determine the current diagnostic yield from documented studies. In addition, we discuss the value of WES in a research setting together with the limitations and pitfalls that are relevant for its use in diagnostics and research. Lastly, we discuss future perspectives, including the possible added value of whole genome sequencing (WGS) in the field of PIDs.

• WES facilitates analysis of (almost) all protein-coding regions of the genome instead of a selected gene panel.
• WES in silico gene panels can be updated more easily than targeted sequencing panels, as they leverage the existing exome data, whereas physical sequencing panels require a more laborious procedure and the acquisition of new data.
• The diagnostic yield for PIDs using next-generation sequencing was on average 29%, and 38% for WES alone.
• Systematic and regular re-analysis of WES data and analysis of the whole exome in a research setting improve yield [2, 8, 9].

• Variants called from WES data are subjected to standardized variant filtering for non-synonymous, rare variants impacting exons and splice sites.
• In silico gene lists are useful to find variants in described PID genes [1, 3, 10, 11]. When no suitable variant is found, all the genes outside of the PID genes can be analyzed to find potentially novel variants.
• To prioritize variants exome-wide, a detailed description of the patient phenotype and pedigree is helpful. The phenotype-genotype analysis requires judgement at both the variant and the gene level. Metrics at the variant level include variant effect predictions and nucleotide and amino acid conservation. At the gene level, these include constraint against loss-of-function, functional/pathway annotations, and phenotypes of animal models such as knockout mice [12, 13].
• In sporadic cases of PIDs, a trio analysis can be performed by sequencing both the patient and the parents, allowing the exploration of de novo variants [14, 15]. In familial PID cases, variants from affected and/or unaffected family members can be overlapped to find candidate variants [3]. These approaches can drastically decrease the number of possible candidate variants.
• Functional testing in PID patients is minimally invasive and often required to demonstrate a possible functional defect for variants of uncertain significance (VUS).

• There remains a high potential to find new genes associated with PIDs, as also indicated by the low average yield seen for NGS-based methods in PIDs.
• Challenges in the diagnosis of PIDs include extremely rare or heterogeneous phenotypes, complex mutational mechanisms, and incomplete penetrance, which is in part due to the requirement of exposure to a specific pathogen to cause an overt phenotype.
• Collaborative efforts are being undertaken to match patients with similar genetic defects, especially in extremely rare PIDs, for example, by using 'genetic matchmaking' platforms [2, 5, 16, 17].
• WGS is starting to be used as a method to explore non-coding variation and its relevance in PIDs and has recently demonstrated its potential to diagnose patients with non-coding mutations [18].

The number of PIDs and genes associated with inborn defects of immunity has sharply increased in recent years. Currently, over 400 different disorders have been described arising from defects in 430 genes. Of all known PID genes, NGS already accounts for 45% of new gene discoveries [1] . The potential of NGS is reflected by the increasing number of recent discoveries, as 64 of the 430 known PID genes were discovered in the last 2 years [1] . The mutations known to cause PIDs influence all components of the intricate immune system and follow different inheritance patterns. Most of the established PIDs are inherited in an autosomal recessive fashion, caused by homozygous or compound heterozygous mutations, leading to a loss of function (LoF) of the encoded protein.

In addition, heterozygous mutations inherited in an autosomal dominant (AD) fashion can exert LoF effects through haploinsufficiency or negative dominance, by causing an insufficient level of functional protein or through interference of the mutant protein with the wildtype protein, respectively. Lastly, in recent years, a growing number of heterozygous mutations have been identified that confer a hypermorphic or even neomorphic gain of function (GoF) effect, leading to an augmented or novel protein function [19]. The wide variety of genetic causes and the phenotypic heterogeneity of PID patients that pose an important diagnostic challenge are summarized in Fig. 1, where the relationship between mutational mechanism and effects on the gene, protein, and pathway is indicated, as well as the effect of the presence or absence of a pathogen on the clinical manifestations of PID patients. Advances in the identification of new PID genes and the understanding of underlying disease mechanisms have revealed that mutations in the same gene with different inheritance modes can cause different phenotypes. Illustratively, in the most recent classification of the International Union of Immunological Societies (IUIS), more than 35 genes are listed twice [1]. Examples of genes displaying such allelic series include CARD11, in which heterozygous mutations cause distinct phenotypes through negative dominance (hyper IgE syndrome) or hypermorphism (B cell expansion with NF-κB and T cell anergy), whereas biallelic mutations lead to LoF ((severe) combined immunodeficiency) [20]. Moreover, both hypermorphic and LoF mutations in STAT1 [21, 22] and RAC2 [23, 24] cause distinct forms of PID. These examples illustrate the presence of allelic series and add to the complexity of genotype-phenotype relationships, which complicates correct variant interpretation using NGS.

The utilization of WES in the diagnosis of PIDs has three clear advantages over targeted gene panels. Firstly, in silico gene panels used with WES can be adjusted as more PID-associated genes are identified, allowing for the analysis of new patients with the most up-to-date gene panel, and regular reanalysis of previously unsolved PID cases [9]. Filtering of the WES data provides a more straightforward analysis of known disease genes. Importantly, WES also facilitates the analysis of genes not included in any gene panel, providing the possibility to find candidate variants in all known coding regions of the genome in a research setting [2]. With this analysis, it is possible to identify rare variants in genes that are not yet associated with any form of PID but could be involved in the phenotype of the patient. Secondly, WES provides genome-wide detection of copy number variants (CNVs) [25, 26] and regions of homozygosity (ROHs) [27]. Thirdly, potentially relevant but unsubstantiated findings from the analysis of a single gene, gene panel, or in silico WES analysis can be more reliably interpreted after full exome or genome analysis to find better matching candidate variants. When no more suitable candidate variant is found, this may support the suspicion of pathogenicity. In contrast, gene panels allow deeper sequencing, as only a selection of genes is sequenced, possibly leading to more reliable variant calling, especially for the detection of mosaicism [28]. WES also has a higher chance of incidental findings, requiring more intensive counseling by clinical geneticists [29].

Fig. 1 Schematic overview of the mutational mechanisms and effects on gene, protein, and pathway level with the phenotypic manifestations that result from various forms of PID. For autosomal recessive, X-linked, and autosomal dominant forms of PID, the effects on the gene level and subsequently on the protein level are indicated that lead to either deficiency or hyperactivation of the immunological pathways involved. These effects at the pathway level can result in a phenotype manifesting with symptoms of immunodeficiency, autoimmunity, or autoinflammation. Pathogen exposure can be a prerequisite for the phenotype to develop, which contributes to the incomplete penetrance observed in PIDs.

There is no single optimal approach to analyzing exome data. Here, we aim to summarize the best practices from the literature and give practical guidance for this type of analysis. The analysis of variants detected using WES involves several filtering steps. Figure 2 shows these filtering steps schematically, with an approximate number of variants left after each step. These filter steps retain rare, coding non-synonymous variants, i.e., variants that alter the amino acid sequence, and variants affecting canonical splice sites that are rare in the general population. Rare variants are filtered based on variant allele frequencies (VAF), which are obtained from databases such as dbSNP and gnomAD [10, 11]. Also, variant frequencies from exomes in local and in-house databases can be useful to address local genomic variation or recurrent artifacts that may be platform specific [2, 30]. The exact allele frequency cut-off for filtering variants remains a matter of debate, but generally lies below 1%. Different frequencies may be applied for suspected dominant or recessive diseases [19]. The list of remaining variants can be ranked based on variant-level data, gene-level data, and available pedigree or segregation data. Variants are prioritized by the effect of the single-nucleotide variant (SNV) or CNV: frameshift, splice site, missense, insertion, and deletion. The remaining rare variants can be checked against a list of known PID genes and further analyzed based on predicted protein effect, conservation, constraint against LoF, and other annotations such as gene ontology (GO) information and phenotypes of knockout mouse models [31]. An approach that can significantly reduce the number of variants is the concurrent analysis of equally affected or unaffected close relatives. In particular, a trio WES analysis, which includes the healthy parents of the patient, can be relevant in (severe) sporadic cases of PID. De novo variants can be identified by excluding all variants inherited from either parent [3, 19]. Each exome harbors approximately 1-2 de novo variants, as indicated in Fig. 2 [14]. De novo variants can be included in the analysis when the pedigree suggests de novo occurrence. In consanguineous families, homozygosity mapping can be applied. This approach is based on the principle that pathogenic variants are relatively often present in homozygous regions formed from identity by descent, where both alleles share all variation [32].
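The principle behind homozygosity mapping can be illustrated with a minimal sketch that scans genotype calls, ordered by position on one chromosome, and reports runs of consecutive homozygous calls. The function name `find_homozygous_runs`, the "hom"/"het" genotype encoding, and the `min_markers` threshold are assumptions made for illustration only; dedicated tools additionally account for genotyping errors, marker density, and the physical length of a run.

```python
from typing import List, Tuple


def find_homozygous_runs(
    calls: List[Tuple[int, str]], min_markers: int = 3
) -> List[Tuple[int, int, int]]:
    """Return (start_pos, end_pos, n_markers) for runs of homozygous calls.

    `calls` holds (position, genotype) pairs for a single chromosome, sorted
    by position. Genotypes use a hypothetical encoding: "hom" or "het".
    """
    runs = []
    start = end = None
    count = 0
    for pos, genotype in calls:
        if genotype == "hom":
            if count == 0:
                start = pos  # a new run begins here
            end = pos
            count += 1
        else:
            # A heterozygous call closes the current run, if any.
            if count >= min_markers:
                runs.append((start, end, count))
            count = 0
    if count >= min_markers:  # close a run that reaches the last marker
        runs.append((start, end, count))
    return runs


# Toy genotype calls (invented positions, not patient data).
calls = [
    (100, "het"), (200, "hom"), (300, "hom"), (400, "hom"),
    (500, "hom"), (600, "het"), (700, "hom"), (800, "hom"),
]
runs = find_homozygous_runs(calls)
```

In this toy input, only the stretch from position 200 to 500 meets the default threshold of three markers; candidate variants would then be prioritized when they fall inside such a region.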

Fig. 2 The general steps used to filter variants from WES. The approximate number of variants in each step is indicated. All filtering steps can be applied in silico. The number of variants that remain after filtering depends on the cut-off values used for filtering based on allele frequency and on algorithms used to call CNVs [3, 19, 26]. CNV numbers are highly dependent on the algorithm used; therefore, the number of CNVs is not indicated here. For the analysis of de novo variants, sequencing of a patient's parents as a trio is required, after which all variants present in the parents can be filtered [3].
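The filtering steps described above can be sketched in a few lines of code. This is a simplified illustration under stated assumptions, not a diagnostic pipeline: the `Variant` record, the set of retained consequence classes, the placeholder gene names, and the default 1% cut-off are all chosen for the example, and real workflows operate on annotated VCF files with many more fields.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Variant:
    gene: str             # gene symbol (placeholder names are used below)
    consequence: str      # predicted effect, e.g. "missense" or "synonymous"
    population_af: float  # allele frequency in a reference database


# Consequence classes kept by the filter; this list is an illustrative assumption.
RETAINED_CONSEQUENCES = {
    "frameshift", "nonsense", "splice_site", "missense", "inframe_indel",
}


def filter_variants(
    variants: Iterable[Variant], pid_genes: Set[str], max_af: float = 0.01
) -> Tuple[List[Variant], List[Variant]]:
    """Split rare, protein-altering variants into panel and exome-wide candidates."""
    rare_coding = [
        v for v in variants
        if v.consequence in RETAINED_CONSEQUENCES and v.population_af < max_af
    ]
    in_panel = [v for v in rare_coding if v.gene in pid_genes]
    outside_panel = [v for v in rare_coding if v.gene not in pid_genes]
    return in_panel, outside_panel


# Toy input with invented values.
variants = [
    Variant("GENE_A", "missense", 0.0002),
    Variant("GENE_A", "synonymous", 0.0001),
    Variant("GENE_B", "frameshift", 0.0),
    Variant("GENE_C", "missense", 0.12),
]
in_panel, outside_panel = filter_variants(variants, pid_genes={"GENE_A"})
```

Returning the variants outside the panel separately mirrors the two-stage approach in the text: known PID genes are assessed first, and the remaining rare variants are kept for exome-wide analysis in a research setting.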

WES has in many cases replaced targeted gene panels and Sanger sequencing and forms a first-line diagnostic approach for PID patients. Although the knowledge base on PIDs continues to improve, the majority of PID patients do not receive a diagnosis using NGS approaches [2, 5] . In this section, we determine the current diagnostic yield of NGS for PID from the current literature and discuss approaches to improve the diagnostic yield. We collected studies that describe the application of WES, WGS, and other targeted NGS approaches in PID patient cohorts. The studies that were included are presented in Table 1 .

The average diagnostic yield of NGS was 29% (range 10-79%) and 38% specifically for WES (range 15-70%) in the context of PIDs (Table 1). The average yield indicates that the majority of PID patients do not receive a genetic diagnosis using NGS-based sequencing approaches such as WES.
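How an average yield behaves depends on how the studies are combined. The sketch below contrasts an unweighted mean of per-study yields with a pooled yield over all patients; the text does not specify whether any weighting was applied, so both functions are shown as assumptions, and the study counts are invented and do not correspond to Table 1.

```python
from statistics import mean
from typing import List, Tuple


def unweighted_mean_yield(studies: List[Tuple[int, int]]) -> float:
    """Mean of per-study yields (%), each study counting equally."""
    return mean(100.0 * diagnosed / total for diagnosed, total in studies)


def pooled_yield(studies: List[Tuple[int, int]]) -> float:
    """Yield (%) over all patients, so larger cohorts weigh more."""
    diagnosed = sum(d for d, _ in studies)
    total = sum(n for _, n in studies)
    return 100.0 * diagnosed / total


# Hypothetical (diagnosed, total) counts for three cohorts.
studies = [(10, 100), (35, 50), (30, 200)]
unweighted = unweighted_mean_yield(studies)
pooled = pooled_yield(studies)
```

With these invented counts, the small cohort with a high yield pulls the unweighted mean well above the pooled value, which illustrates why cohort composition matters when yields are compared across studies.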

The marked spread in diagnostic yield between studies can most likely be explained by variation in patient selection and is dependent on disease severity. Patient cohort characteristics differed widely between studies, with some using specific subsets of PID, whereas other studies describe a generic cohort of PID patients. Furthermore, the selection criteria for these patients were not always clearly described. The different patient populations influence the a priori chance of establishing a genetic diagnosis. For example, patients with a severe combined immunodeficiency (SCID) present early in life with severe immune defects and have an increased chance of a monogenic diagnosis. This is reflected in the studies that were used, with two of the studies with the highest yield (79 and 70%) describing a cohort of SCID patients [33, 34]. Moreover, one study describing a generic cohort of PID patients found a diagnostic rate of 100% for SCID patients [35]. Patients with a consanguineous background are more likely to carry homozygous variants, which can be more easily identified using WES. This bias is also observed in the studies included in this review, where some patient populations originated from countries with a high level of consanguinity, thereby influencing the chance of homozygous variants leading to PIDs [2, 33, 35-39]. In adults with milder symptoms, the likelihood for a true monogenic defect may be lower, which is reflected in a proportionally lower yield. The yield may be decreased due to the possibility of polygenic inheritance, acquired immunodeficiency, and environmental factors [40]. Patient selection also influences diagnostic yield because some forms of PID are more extensively described in literature, causing differences in the number of genes described per disease.

The sequencing approaches also varied between studies. Some studies used a targeted gene panel, while other studies applied WES. Several studies combined these approaches [33, 37, 41]. The number of genes that were analyzed varied from 46 to 356 for targeted gene panel approaches and between 12 and 4813 for WES-based approaches, with one study using all genes associated with human disease [42]. These numbers correlate with the specificity of the patient selection, where cohorts with SCID or HLH (hemophagocytic lymphohistiocytosis) patients require fewer genes to be analyzed, while still resulting in a relatively high diagnostic yield [34, 43]. The differences in the number of analyzed genes between studies indicate the challenge of establishing universal gene panels for PIDs. Initiatives such as PanelApp, GF-PID (Genetics First Primary Immunodeficiencies), and the IUIS that establish gene lists for PIDs attempt to unify this process for individual clinics [1, 5, 44]. Taken together, the described differences in patient cohort characteristics, sequencing approaches, and selected gene panels between studies do not permit definitive conclusions about the current diagnostic yield of NGS in (subsets of) PIDs.

The characterization of novel PID genes in a research setting and their implementation in the diagnostic PID panel have improved the diagnostic rate in recent years [1]. In two studies, a WES-based approach was used to identify variants in several genes not included in the IUIS classification, leading to an improvement of the yield by 4 and 11 percentage points, respectively [2, 8]. This approach could therefore lead to extra diagnoses in laboratories with the possibility to perform whole exome (re-)analysis in a research setting.
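Because an in silico panel is only a gene list applied to stored exome data, re-analysis after a panel update requires no new sequencing. A minimal sketch, assuming that rare variants from previously unsolved cases are stored with a gene label, is shown below; the function `newly_reportable`, the dictionary keys, and the gene and case names are hypothetical.

```python
from typing import Dict, List, Set


def newly_reportable(
    stored_variants: List[Dict[str, str]], old_panel: Set[str], new_panel: Set[str]
) -> List[Dict[str, str]]:
    """Return stored variants in genes that were added to the panel."""
    added_genes = new_panel - old_panel
    return [v for v in stored_variants if v["gene"] in added_genes]


# Rare variants kept from earlier, unsolved analyses (invented records).
stored_variants = [
    {"case": "case_1", "gene": "GENE_A", "variant": "var_1"},
    {"case": "case_2", "gene": "GENE_D", "variant": "var_2"},
    {"case": "case_3", "gene": "GENE_E", "variant": "var_3"},
]
old_panel = {"GENE_A", "GENE_B"}
new_panel = {"GENE_A", "GENE_B", "GENE_D"}
hits = newly_reportable(stored_variants, old_panel, new_panel)
```

Only the variant in the newly added gene is returned for review, which keeps periodic re-analysis focused on what has changed since the previous assessment.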

Several intrinsic shortcomings of WES also influence the diagnostic yield. These include incomplete coverage of some genes [45] , but more specifically also incomplete analysis of structural variants, CNVs, repeat expansions, and the lack of coverage of most non-coding regions. These shortcomings are caused by mapping errors, which are inherent to short read sequencing approaches when applied to repetitive regions and because of the lack of targeting of regions that are outside of the exons. Because these sources of genetic variation might play a role in human disease, there is a possibility that diagnoses in PIDs are missed based on WES data [25, 46] . Ultimately, it is expected in the future that WGS will offer a more complete test to assess both the coding and non-coding part of the genome, outweighing the extra costs posed by more complex analysis of the data and data storage.

There are various indications that non-coding regions of the genome may also play a role in the pathogenesis of PIDs. In a recent study by Thaventhiran et al., 1318 PID patients underwent WGS, in order to address the diagnostic challenges posed by PID patients presenting in adulthood with no apparent family history of the disease. With this approach, 91 patients (10.3%) with variants in coding regions of known PID genes were diagnosed. Moreover, an analysis of the non-coding genome identified deletions in regulatory elements, which were shown to contribute to the phenotype [18]. Examples include a case where compound heterozygosity of ARPC1B was detected in a patient, caused by a coding nonsense variant and a deletion of a region containing the promoter. This study also gives an insight into the influence of multiple variants on the phenotype. In the cohort, 60 (6.8%) patients had a pathogenic TNFRSF13B (TACI) variant, with five also carrying a variant in another PID gene [18]. This study gives insight into the involvement of intergenic variation and the possibility of digenic and possibly polygenic causes of PIDs. As non-coding variants are not detected using a WES-based approach, this result also indicates the possibilities and added value of a whole-genome approach, possibly improving diagnostic yield through establishing deleterious variants in the non-coding regions of the genome.

Table 1 A summary of the studies included for the determination of the yield of NGS-based approaches in the diagnosis of PIDs. To determine the diagnostic yield of WES when applied in PID patients, literature describing the application of WES, other targeted NGS approaches, and WGS for the diagnosis of cohorts of PID patients was collected. A recent review from Yska et al. described the application of NGS in the diagnosis of PIDs, where a diagnostic yield between 15 and 79% was reported based on 14 studies [5]. To estimate the diagnostic yield of WES in PIDs, PubMed and Embase were used to search for studies describing WES and other NGS approaches for PIDs. The keywords "PID" or "Primary immunodeficiency diseases" were used in combination with "WES," "NGS," or "WGS." Studies describing cohorts of PID patients were included, while studies describing single patients or families were excluded. We expanded the selection from Yska et al. with 10 additional studies. Seven of the studies described the application of WES, two described the application of an NGS approach targeting known PID genes, and one described the application of WGS in the diagnosis of PID patients. From these 24 studies, the sequencing approach, number of genes sequenced using a targeted gene panel or included in the in silico analysis using WES, the number of included patients, and the number and percentage of patients diagnosed were extracted. The average yield of either NGS-based sequencing approaches or a WES-based approach is calculated from the yield of these studies. In two studies that employed WES, the whole exome was analyzed when there were no candidates found in a panel of genes associated with PIDs. The country where patient recruitment was done is also indicated.

Association of variants in genes not previously associated with genetic diseases such as PIDs has been accomplished using WES, although establishing a causal relationship between variant and phenotype remains challenging [1, 3]. As sequencing technologies continue to improve with rapid and cost-effective sequencing of all variants in the exome, the rate-limiting factor of research has become the interpretation of this great wealth of genomic information [58]. Variant and gene annotations, overlapping strategies, and functional validations are vital in this process. However, these annotations still require interpretation and do not always clearly indicate the structural and functional effects of variants. Moreover, the effects of splice site variants, insertions, deletions, and structural variants are more difficult to interpret than SNVs. Also, variants in genes with a completely unknown function or poorly described functions are difficult to link to a phenotype without functional studies [3]. The number of rare variants that are retained from the WES data depends on the cut-off values for the frequencies of the variants in various databases that represent the general population. These values therefore influence what variants are considered in the analysis of an exome. A generalized workflow for this approach is shown in Fig. 2. These cut-off values are based on the expected occurrence of disease-causing variants in the general population. However, choosing the cut-off values in the analysis of WES data remains subjective and a matter of debate. Filtering out variants that are less rare could lead to missed diagnoses, as more common alleles could also cause disease in a compound heterozygous state. An example of a common variant that can cause a genetic disease is the ABCA4 variant p.Asn1868Ile, which has an allele frequency of close to 7% in the European population and causes a monogenic form of blindness known as Stargardt disease in a compound heterozygous fashion [59]. Filtering variants using a database of exomes from the local population can be very effective to filter benign variants that might occur at a very low frequency in the worldwide population in international databases [30].
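The risk of losing a more common allele that acts in a compound heterozygous state can be made concrete with a small sketch. The rescue rule below, which keeps a moderately common heterozygous variant when the same gene also carries a rare variant, is one possible strategy chosen for illustration and is not a method described in the cited studies. The function name, both cut-off values, and the toy records are assumptions, and phasing (whether the two variants lie on different alleles) is deliberately not checked here.

```python
from collections import defaultdict
from typing import Dict, List


def candidates_with_rescue(
    variants: List[Dict], strict_af: float = 0.01, relaxed_af: float = 0.10
) -> List[Dict]:
    """Keep rare variants, plus less rare ones sharing a gene with a rare variant.

    Each variant is a dict with "id", "gene", and "af" (population allele
    frequency). All calls are assumed to be heterozygous.
    """
    by_gene = defaultdict(list)
    for v in variants:
        by_gene[v["gene"]].append(v)

    kept = []
    for gene_variants in by_gene.values():
        rare = [v for v in gene_variants if v["af"] < strict_af]
        if not rare:
            continue  # no rare variant in this gene, nothing to rescue
        kept.extend(rare)
        # Rescue step: a second, more common allele in the same gene.
        kept.extend(v for v in gene_variants if strict_af <= v["af"] < relaxed_af)
    return kept


# Toy heterozygous calls with invented frequencies.
variants = [
    {"id": "v1", "gene": "GENE_X", "af": 0.0001},
    {"id": "v2", "gene": "GENE_X", "af": 0.07},  # dropped by a plain 1% cut-off
    {"id": "v3", "gene": "GENE_Y", "af": 0.07},  # no rare partner, not rescued
    {"id": "v4", "gene": "GENE_Z", "af": 0.30},
]
kept_ids = {v["id"] for v in candidates_with_rescue(variants)}
```

A plain 1% cut-off would retain only `v1`; the rescue step additionally retains `v2`, at the cost of more candidates that require manual review.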

The selection of possible gene candidates is highly dependent on annotations, which causes a bias towards well-annotated genes and gene families with known functions. These annotations can include disease models such as gene knockouts in mice, which carry a potential for error because of the differences between model organisms and humans. The overlap in disease phenotypes in humans and model organisms is modeled in tools such as Exomiser, which uses this phenotype data to filter variants from exome data [60]. The description of the phenotype and the pedigree by the physician is of equal importance for a more accurate association of this phenotype with a gene functioning in a certain pathway. Additionally, a complete pedigree is important for predicting the mode of inheritance of the disease.

In silico tools that predict variant effects, such as PhyloP and CADD, and metrics of the constraint of genes against LoF, such as LOEUF, are often used to estimate the effect of variants and gene LoF on protein function and, by extension, the phenotype [11-13]. These metrics indicate the conservation between species, the effect of missense variants on protein function, and an estimate of the tolerance of the gene and of specific residues to variants, respectively. These predictions should be used with caution, as variants that score highly using these metrics do not necessarily indicate a relation to the phenotype. Conversely, variants with lower scores do not directly indicate the opposite. These predictions should be used as a guide in combination with functional annotations to estimate the relevance of variants. Instead of interpreting variants based on just one or a few important metrics, interpretation should be underpinned by a synthesis of (predicted) gene function from literature and other gene annotations, predictive metrics for the specific variant and the gene itself, and the phenotype and pedigree of the patient, to gain insight into the effect of variants on the phenotype and to identify possible candidate variants.

An effective approach to decrease the number of candidate variants is trio analysis, which includes sequencing of the parents of a patient suspected of having a sporadic form of AD disease caused by a de novo mutation. As for other severe, sporadic diseases such as intellectual disability, patient-parent trios allow for the systematic detection of possible de novo mutations, which are not present in the healthy parents and arise during gametogenesis or early in embryogenesis [14, 15]. Other forms of segregation analysis using WES data can also be helpful; however, correct phenotyping of the tested family members is of great importance, in order to select variants shared between family members with comparable phenotypes or to exclude shared variants from unaffected family members. Nevertheless, the benefit of a pedigree is influenced by incomplete penetrance, other genetic factors, and environmental factors, where a PID manifests only with exposure to a specific pathogen [40, 61].
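At its core, the trio filter is a set difference: variants seen in the patient but in neither parent remain as de novo candidates. The sketch below assumes that each variant is represented as a (chromosome, position, reference allele, alternate allele) tuple; the function name and all coordinates are invented for the example. In practice, sufficient read depth at the same position in both parents is also required before a variant is accepted as de novo, which is not modeled here.

```python
from typing import Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def de_novo_candidates(
    child: Set[VariantKey], mother: Set[VariantKey], father: Set[VariantKey]
) -> Set[VariantKey]:
    """Return variants present in the child and absent from both parents."""
    return child - (mother | father)


# Toy variant calls for one trio (invented coordinates).
child = {("1", 1000, "A", "G"), ("2", 2000, "C", "T"), ("3", 3000, "G", "A")}
mother = {("1", 1000, "A", "G")}
father = {("2", 2000, "C", "T"), ("4", 4000, "T", "C")}
candidates = de_novo_candidates(child, mother, father)
```

The same set logic extends to other segregation patterns, for example by intersecting the variant sets of affected relatives and subtracting those of unaffected relatives.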

Individual PIDs are rare, and some genetic defects have only been described in a single patient or a handful of patients in the literature [1]. This causes difficulties in the future diagnosis of PIDs, especially in patients with similar phenotypes, as the clinical presentation of PIDs can overlap significantly. Cohort-based studies of patients with similar phenotypes or with variants in the same gene can provide a more robust analysis of the underlying genetic causes. Notably, overlapping the variants of multiple patients with the same phenotype can also drastically reduce the number of candidate variants. This phenotype-first approach has been used in the past for the elucidation of genetic disorders, such as Miller syndrome, where WES was first applied to establish a pathogenic variant in a Mendelian disorder [62]. However, this approach is not feasible in patients with extremely rare or phenotypically heterogeneous disorders. WES enables a genetics-first approach for patient diagnosis. Data sharing platforms such as GeneMatcher from Matchmaker Exchange and initiatives such as Solve-RD (a research project funded by the European Commission to solve unsolved rare genetic diseases) and GF-PID can be helpful to identify patients that possibly share pathogenic variants in the same gene, increasing the chance of causality [2, 5, 16, 17]. Quality control of the analytic approach can also be accomplished through multi-center validation of variant interpretation [63]. Nevertheless, additional functional validation of the effect of a genetic variant on gene and protein level is paramount to establish a direct causal relationship and to give insight into the molecular mechanism of the disease. In PID patients, immune cells can easily be extracted from blood to be studied ex vivo, providing a minimally invasive method for validation experiments. In case the function of the affected gene is unknown, its function and the effect of the variant can be studied in gene knockout and knock-in models to evaluate the pathogenicity of suspected LoF variants and of hypermorphic or GoF variants, respectively.
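The reduction in candidates obtained by overlapping patients with the same phenotype can be sketched as a simple count of how many patients carry a rare candidate variant in each gene. The function name, the `min_patients` threshold, and the patient and gene labels are hypothetical; matchmaking platforms implement far richer matching on genotype and phenotype.

```python
from collections import Counter
from typing import Dict, Set


def genes_shared_across_patients(
    patient_genes: Dict[str, Set[str]], min_patients: int = 2
) -> Dict[str, int]:
    """Count patients per candidate gene and keep genes seen in enough patients."""
    counts = Counter(gene for genes in patient_genes.values() for gene in genes)
    return {gene: n for gene, n in counts.items() if n >= min_patients}


# Genes with a rare candidate variant per patient (invented labels).
patient_genes = {
    "patient_1": {"GENE_A", "GENE_B", "GENE_C"},
    "patient_2": {"GENE_B", "GENE_D"},
    "patient_3": {"GENE_B", "GENE_C", "GENE_E"},
}
shared = genes_shared_across_patients(patient_genes)
```

Raising `min_patients` to the cohort size corresponds to requiring a candidate in every patient, which is only sensible when the disorder is expected to be genetically homogeneous.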

WES can exclusively uncover the genetic causes of PIDs located in the coding regions of the genome. As described above, WGS allows for identification of variants in introns and regulatory elements and provides improved identification of structural variants such as copy number variants (CNVs) and other structural rearrangements [19, 64]. The high percentage of unsolved PID cases using WES, as determined from literature describing the application of WES for the diagnosis of PID patients, indicates that pathogenic variants might be located in the non-coding regions of the genome. WGS has also been recently applied for the analysis of variants in PID patients. The added value of this approach was demonstrated by the identification of compound heterozygous variants in both coding and non-coding regions [18]. However, the analysis of the vast amount of data generated by WGS is more complex and more expensive than for WES, hindering its application in routine diagnostics [19]. Moreover, WGS has shown only a modest improvement of the diagnostic yield and coverage of coding sequences compared with WES so far [46]. WES and WGS both employ short-read sequencing with GC bias leading to biased coverage. Short-read sequencing approaches suffer from mapping difficulties of repetitive and paralogous sequences. Long-read sequencing could be a potential solution, facilitating the sequencing of regions that are difficult to assess using short-read NGS technologies [65].

PIDs have long been considered as rare autosomal recessive disorders that cause completely penetrant phenotypes with severe defects in the immune system. However, this paradigm has shifted towards PIDs as a continuum ranging from mild and common diseases to severe and rare immune defects. This is shown in Fig. 3, which depicts the relationships between allele frequency of pathogenic variants, disease severity, and diagnostic yield in the context of PIDs. This indicates the spectrum formed by various forms of PID, ranging from extremely rare variants leading to severe disease with a high diagnostic yield ((S)CID), rare variants leading to a milder phenotype with an intermediate diagnostic yield (CVID), and more common variants that lead to common disease with a mild phenotype and a low diagnostic yield [61, 66]. The differences in the presentation of PIDs could be caused by the interplay of multiple variants in patients with milder symptoms, illustrated recently by the presence of TNFRSF13B (TACI) variants in PID patients that also presented with another pathogenic variant [18]. We postulate that genetic causes of PIDs are more common than previously thought and are inherited following more complex patterns. This constitutes a shift of PIDs from extremely rare recessive monogenic disorders that present in childhood towards a spectrum that includes more common diseases that are also caused by AD, de novo, multigenic and complex genetic variation that is also found in the non-coding regions of the genome and which causes disease later in life [40, 66, 67]. These factors also explain the limited diagnostic yield from NGS-based sequencing approaches, including WES, and indicate that this yield could be improved with more knowledge of these complex modes of inheritance that are being explored in the field of PIDs.

Fig. 3 The relationship between variant effect size, allele frequency, and the diagnostic success rate in the field of PIDs. The triangles indicate variant effect, allele frequency, and diagnostic yield, ranging from highly impacting to weakly impacting, from extremely rare to common, and from high to low, respectively. The characteristics of (severe) combined immunodeficiency ((S)CID), common variable immunodeficiency (CVID), and common disease are indicated in their approximate location within these three indicators.

NGS-based sequencing approaches such as WES have been instrumental in the diagnosis of monogenic diseases such as PIDs, giving insight into the underlying molecular disease mechanisms of immune defects [1, 5, 19, 37]. The heterogeneous nature of the phenotypes seen in PID patients, combined with the influence of exposure to external factors, complicates the genetic diagnosis of these patients. Currently, more than 400 genes have been linked to inborn errors of immunity [1], with both clinical and genetic heterogeneity. Exome sequencing covers the genomic regions that directly influence protein structure, which are highly relevant to human disease [3, 19]. Because NGS approaches generate data so efficiently, the interpretation of variants and their relation to patient phenotypes has become the rate-limiting step in their clinical and research application [58]. We have evaluated the application of NGS as a diagnostic approach in PIDs by discussing the current yield, the inherent shortcomings of WES, the strategies and difficulties of variant analysis, and the promise of exploring the non-coding genome using WGS. Furthermore, strategies for improving the diagnostic yield were considered. The general considerations for the application of NGS in the diagnosis of PIDs and a guide for variant analysis using WES can be found in Box 1.

We have collected studies describing the application of WES and other targeted NGS approaches for the diagnosis of PID in patient cohorts. These approaches resulted in a diagnosis in 10–79% of cases, with an average of 29% overall and 38% for WES alone. The high variability in diagnostic yield between studies and the low average yield of WES and other NGS approaches are caused by several factors, the most important being the widely varying characteristics of the patient cohorts. Apart from the apparent methodological differences between the studies, the low average yield is also explained in part by the limitations of WES and by a knowledge gap in the interpretation of the genes associated with PIDs. It is challenging to tie a phenotype to a specific variant, for example, because of uncertain variant effects, complex mutational mechanisms, digenic or polygenic inheritance, and unknown gene functions. Furthermore, causative variants that are not efficiently identified by WES, such as structural and noncoding variants, could be present in WES-negative PID patients. Such variants were described in a recent study in which WGS was applied to the diagnosis of PID patients, indicating the added value of WGS in this context [18]. Additionally, more common variants, possibly interacting with rare variants, could play a role in the pathogenesis of PIDs. Future research should focus on the elucidation of these relatively unexplored mutational mechanisms that play a role in PIDs [65, 68]. The contribution of external factors to the presentation of PIDs, such as the interplay with specific pathogens at various stages of life, also requires further investigation [40]. Lastly, we expect more effort towards functional characterization of the molecular consequences of genetic variation to guide therapeutic approaches.
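An average yield across cohorts can be calculated in more than one way, and the choice matters when cohort sizes differ widely. The sketch below contrasts an unweighted mean of per-study yields with a pooled yield weighted by cohort size. The `Cohort` class, the function names, and the two cohorts are hypothetical; the numbers are invented for illustration and are not taken from the studies reviewed here, nor do they indicate which method underlies the averages quoted above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Cohort:
    """One study cohort (hypothetical structure for illustration)."""
    name: str
    n_patients: int
    n_diagnosed: int

    @property
    def yield_fraction(self) -> float:
        # Fraction of patients in this cohort who received a genetic diagnosis.
        return self.n_diagnosed / self.n_patients


def mean_yield(cohorts: Iterable[Cohort]) -> float:
    """Unweighted mean of per-study yields: every study counts equally."""
    items = list(cohorts)
    return sum(c.yield_fraction for c in items) / len(items)


def pooled_yield(cohorts: Iterable[Cohort]) -> float:
    """Pooled yield: every patient counts equally, so large cohorts dominate."""
    items = list(cohorts)
    return sum(c.n_diagnosed for c in items) / sum(c.n_patients for c in items)


# Invented example: a large unselected cohort and a small, highly selected one.
EXAMPLE_COHORTS = [
    Cohort("hypothetical_unselected", n_patients=200, n_diagnosed=20),
    Cohort("hypothetical_selected", n_patients=50, n_diagnosed=35),
]
```

For these invented cohorts, `mean_yield(EXAMPLE_COHORTS)` gives 0.40, whereas `pooled_yield(EXAMPLE_COHORTS)` gives 0.22, which shows how strongly cohort composition and size can shift a summary figure.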

WES has contributed to a more accurate diagnosis of PID patients in the clinic, leading the transition towards personalized medicine. A genetic diagnosis of PID patients helps to end "diagnostic odysseys," enables genetic counseling for family members, and can inform treatment approaches [2]. Since the advent of NGS, and WES specifically, it has become clear that the concept of PIDs is moving away from rare, monogenic diseases with a severe clinical presentation, often in childhood. As postulated by Casanova and Abel, PIDs might be more common than originally thought, with most individuals suffering from recurrent infections having some form of PID, the presentation of which depends on the environment [61, 66]. The number of genes, genetic mechanisms, and phenotypes associated with PIDs keeps growing. This is in part due to the rise of novel pathogens, as exemplified by the COVID-19 pandemic, during which pathogenic variants in both novel and known PID genes involved in the host immune response to viral infections have been associated with more severe forms of infection caused by the SARS-CoV-2 virus [69, 70]. PIDs give a unique insight into the molecular mechanisms of the immune response, shedding light on the pathways involved in the functioning and balance of immunity and self-tolerance. Ongoing developments in sequencing strategies and data interpretation will continue to improve the diagnosis and treatment of PID patients.

Funding The authors of this review acknowledge support from the Solve-RD project of the European Union's Horizon 2020 research and innovation program (No. 779257, awarded to Dr. A. Hoischen) and a PhD grant from the Radboud Institute for Molecular Life Sciences.

Conflicts of Interest The authors declare that they have no conflict of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Human inborn errors of immunity: 2019 update on the Classification from the International Union of Immunological Societies Expert Committee

Exome sequencing in routine diagnostics: a generic test for 254 patients with primary immunodeficiencies

Disease gene identification strategies for exome sequencing

Review: diagnosing common variable immunodeficiency disorder in the era of genome sequencing

Diagnostic yield of next generation sequencing in genetically undiagnosed patients with primary immunodeficiencies: a systematic review

Uses of next-generation sequencing technologies for the diagnosis of primary immunodeficiencies

The promise of whole-exome sequencing in medical genetics

Expanding the clinical and genetic spectra of primary immunodeficiency-related disorders with clinical exome sequencing: expected and unexpected findings

Making new genetic diagnoses with old data: iterative reanalysis and reporting from genome-wide data in 1,133 families with developmental disorders

dbSNP: the NCBI database of genetic variation

The mutational constraint spectrum quantified from variation in 141,456 humans

CADD: predicting the deleteriousness of variants throughout the human genome

Detection of nonneutral substitution rates on mammalian phylogenies

De novo mutations in human genetic disease

New insights into the generation and role of de novo mutations in health and disease

The Matchmaker Exchange API: automating patient matching through the exchange of structured phenotypic and genotypic profiles

GeneMatcher: a matching tool for connecting investigators with an interest in the same gene

Exome and genome sequencing for inborn errors of immunity

The CBM-opathies-a rapidly expanding spectrum of human inborn errors of immunity caused by mutations in the CARD11-BCL10-MALT1 complex

Heterozygous STAT1 gain-of-function mutations underlie an unexpectedly broad clinical phenotype

STAT1 mutations in autosomal dominant chronic mucocutaneous candidiasis

RAC2 loss-of-function mutation in 2 siblings with characteristics of common variable immunodeficiency

A dominant activating RAC2 variant associated with immunodeficiency and pulmonary disease

Homozygous and hemizygous CNV detection from exome sequencing data in a Mendelian disease cohort

Copy number variation detection and genotyping from exome sequence data

Detection of runs of homozygosity from whole exome sequencing data: State of the art and perspectives for clinical, population and epidemiological studies

Exome sequencing covers >98% of mutations identified on targeted next generation sequencing panels

Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SF v2.0): a policy statement of the American College of Medical Genetics and Genomics

The Genome of the Netherlands C 2014 Whole-genome sequence variation

The gene ontology resource: 20 years and still GOing strong

Homozygosity mapping provides supporting evidence of pathogenicity in recessive Mendelian disease

Clinical, immunologic, and genetic spectrum of 696 patients with combined immunodeficiency

Rapid molecular diagnostics of severe primary immunodeficiency determined by using targeted next-generation sequencing

Primary immunodeficiency diseases: genomic approaches delineate heterogeneous Mendelian disorders

Whole exome sequencing (WES) approach for diagnosing primary immunodeficiencies (PIDs) in a highly consanguineous community

Clinical implications of systematic phenotyping and exome sequencing in patients with primary antibody deficiency

Investigation of genetic defects in severe combined immunodeficiency patients from Turkey by targeted sequencing

Unbiased targeted next-generation sequencing molecular approach for primary immunodeficiency diseases

Severe infectious diseases of childhood as monogenic inborn errors of immunity

Diagnostics of primary immunodeficiencies through next-generation sequencing

Genetic diagnosis of autoinflammatory disease patients using clinical exome sequencing

Exome sequencing for simultaneous mutation screening in children with hemophagocytic lymphohistiocytosis

PanelApp crowdsources expert knowledge to establish consensus diagnostic gene panels

Systematic analysis of dark and camouflaged genes reveals disease-relevant genes hiding in plain sight

Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage

The Utility of Next-Generation Sequencing for Primary Immunodeficiency Disorders: Experience from a Clinical Diagnostic Laboratory

Clinical efficacy of a next-generation sequencing gene panel for primary immunodeficiency diagnostics

Diagnostics of primary immunodeficiency diseases: a sequencing capture approach

Targeted NGS: a cost-effective approach to molecular diagnosis of

Targeted next-generation sequencing: A novel diagnostic tool for primary immunodeficiencies

Application of extensively targeted next-generation sequencing for the diagnosis of primary immunodeficiencies

Next generation sequencing analysis of consecutive Russian patients with clinical suspicion of inborn errors of immunity

Whole-exome sequencing-based approach for germline mutations in patients with inborn errors of immunity

Genetic diagnosis using whole exome sequencing in common variable immunodeficiency

Whole-exome Sequencing for the Identification of Rare Variants in Primary Immunodeficiency Genes in Children With Sepsis: A Prospective, Population-based Cohort Study

Evaluating the genetics of common variable immunodeficiency: monogenetic model and beyond. Front Immunol 9

Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data

The Common ABCA4 Variant p.Asn1868Ile Shows Nonpenetrance and Variable Expression of Stargardt Disease When Present in trans With Severe Variants

Improved exome prioritization of disease genes through cross-species phenotype comparison

Primary Immunodeficiencies: A Field in Its Infancy

Exome sequencing identifies the cause of a mendelian disorder

National external quality assessment for next-generation sequencing-based diagnostics of primary immunodeficiencies

Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants

Long-Read Sequencing Emerging in

Primary immunodeficiencies in cytosolic pattern-recognition receptor pathways: toward host-directed treatment strategies

Primary immunodeficiency diseases worldwide: more common than generally thought

Presence of Genetic Variants Among Young Men With Severe COVID-19