key: cord-333145-a20dlaxn
authors: Johnson, Todd A.; Mashimo, Yoichi; Wu, Jer-Yuarn; Yoon, Dankyu; Hata, Akira; Kubo, Michiaki; Takahashi, Atsushi; Tsunoda, Tatsuhiko; Ozaki, Kouichi; Tanaka, Toshihiro; Ito, Kaoru; Suzuki, Hiroyuki; Hamada, Hiromichi; Kobayashi, Tohru; Hara, Toshiro; Chen, Chien-Hsiun; Lee, Yi-Ching; Liu, Yi-Min; Chang, Li-Ching; Chang, Chun-Ping; Hong, Young-Mi; Jang, Gi-Young; Yun, Sin-Weon; Yu, Jeong-Jin; Lee, Kyung-Yil; Kim, Jae-Jung; Park, Taesung; Lee, Jong-Keuk; Chen, Yuan-Tsong; Onouchi, Yoshihiro
title: Association of an IGHV3-66 gene variant with Kawasaki disease
date: 2020-10-26
journal: J Hum Genet
DOI: 10.1038/s10038-020-00864-z
sha: 
doc_id: 333145
cord_uid: a20dlaxn

In a meta-analysis of three GWAS for susceptibility to Kawasaki disease (KD) conducted in Japan, Korea, and Taiwan and follow-up studies with a total of 11,265 subjects (3428 cases and 7837 controls), a significantly associated SNV in the immunoglobulin heavy variable gene (IGHV) cluster in 14q33.32 was identified (rs4774175; OR = 1.20, P = 6.0 × 10(−9)). Investigation of nonsynonymous SNVs of the IGHV cluster in 9335 Japanese subjects identified the C allele of rs6423677, located in IGHV3-66, as the most significant reproducible association (OR = 1.25, P = 6.8 × 10(−10) in 3603 cases and 5731 controls). We observed highly skewed allelic usage of IGHV3-66, wherein the rs6423677 A allele was nearly abolished in the transcripts in peripheral blood mononuclear cells of both KD patients and healthy adults. Association of the high-expression allele with KD strongly indicates some active roles of B-cells or endogenous immunoglobulins in the disease pathogenesis. Considering that significant association of SNVs in the IGHV region with disease susceptibility was previously known only for rheumatic heart disease (RHD), a complication of acute rheumatic fever (ARF), these observations suggest that common B-cell related mechanisms may mediate the symptomology of KD and ARF as well as RHD.

Kawasaki disease (KD) is an acute systemic vasculitis syndrome characterized by high fever, bilateral conjunctivitis, polymorphous skin rash, reddening of lips and oral cavity, changes in extremities and nonsuppurative cervical lymphadenopathy [1] and predominantly affects infants and children younger than 5 years [2] . In many cases, KD is self-limiting. However, it causes coronary artery complications such as dilatations and aneurysms (coronary artery lesions; CALs) in 20-25% of untreated patients [3] . Replacing acute rheumatic fever (ARF), KD is now the leading cause of acquired heart diseases of children in developed countries [4] . Most KD patients are treated with high dose intravenous immunoglobulin (IVIG) infusion combined with oral aspirin, which was established in the 1980s and 1990s and has become a standard treatment that is effective at resolving inflammation and reducing CALs [5] [6] [7] . However, the mechanism of IVIG action on KD has not been revealed, and 10-15% of KD patients do not respond to the treatment and have a higher risk for CAL. Recently a series of genome-wide association studies (GWAS) revealed several definitive KD susceptibility loci [8] [9] [10] . However, these genetic factors can explain only a part of the etiology. Also, the reason for the high incidence among East Asian children [11] , which is up to 20-fold higher than those in Western countries and is a crucial epidemiologic feature of KD, has not yet been explained. It might be attributed in part to lack of statistical power in the previous GWAS analyses that were carried out using modest sample-sizes, and therefore, many common genetic factors with relatively low penetrance may have gone undetected. In this study, to identify novel susceptibility loci for KD, we conducted a meta-analysis of GWAS from Japan, Korea, and Taiwan, the three countries with the highest incidence of KD in the world.

In this study, susceptibility loci for KD were screened and verified in a two-stage association analysis (Fig. 1) . Stage 1 was a whole-genome meta-analysis of results from GWAS conducted in Japan [9] , Taiwan [10] , and Korea [12] involving 6362 individuals. We carried out follow-up association studies using three case-control panels comprised of 1418 KD patients and 1700 controls (Japanese), 473 KD patients and 484 controls (Korean) and 261 KD patients and 567 controls (Taiwanese) independent from the subjects in the GWAS and performed meta-analyses with the Stage 1 results (Stage 2). Loci for the follow-up studies were selected based on their predicted potential to achieve

Impute2 & minimac softwares 1000genomes East Asian 572 haplotypes P values less than 5.0 × 10 −8 in the meta-analyses, with the prediction made by iterative simulations of follow-up studies with virtual case and control cohorts (detailed below). To further validate the associations of rs4774175 and rs6423677 in 14q32.33, an additional 1758 KD cases and 653 controls collected in Japan were used. The number of KD cases and controls, as well as platforms in the three previous GWAS and follow-up studies in Japan, Korea, and Taiwan are summarized in Supplementary Table 1 . [13]) as reference using pre-phasing with SHAPEIT v1 (ref. [14] ) and imputation with IMPUTE2 and minimac softwares [15, 16] . Imputed genotype data were analyzed by each study center in a case-control logistic regression analysis, and the output was merged in the R statistics environment (URL: https://www.R-project.org/) and filtered for variants that were polymorphic in both cases and controls (MAF cases ≥0.01 and MAF controls ≥0.01) and had info ≥0.4 in all three studies' dataset; there were 6,265,045 SNVs in the final filtered dataset. A fixed-effects meta-analysis of the three data sets' beta-coefficients and standard errors was performed using the R package metafor (https://cran. r-project.org/package=metafor).

Each chromosome's SNVs were filtered for those with a meta-analysis P < 0.01 and then labelled based on linkage disequilibrium to regional "top SNVs". Briefly, SNVs with P < 1 × 10 −4 were sorted by P value, the top SNV identified, and any SNVs within ±5 Mb that had r 2 > 0.1 to that top SNV assigned to its region; data on a chromosome was processed as such until no SNVs with P < 1 × 10 −4 were remaining. Each top SNV region was labelled based on chromosome and minimum and maximum positions of the linked SNVs. Based on that process, each region label would be unique, but regions could overlap with each other. Calculation of r 2 was performed using the ld function in the R Bioconductor snpStats package using East Asian genotype data from the 1000 Genomes Phase 1 Version 3 release.

For simplicity, we will refer to the regions of nominally linked SNVs described above as "loci". To perform the Stage 2 follow-up study efficiently, loci that had a high potential of achieving P values less than 5.0 × 10 −8 in a meta-analysis of Stage 1 and 2 results were selected by P value simulation as follows. For each locus, we first selected any nominally associated SNVs (P < 0.001) identified in the Stage 1 analysis, and for each SNV created a virtual set of 100,000 case genotypes and a virtual set of 100,000 control genotypes for each of the three Stage 1 case-control panels. Each virtual set was generated based on the genotype frequencies observed in that panel's Stage 1 cases or controls at a particular SNV, and each of those virtual sets was then re-sampled in R to randomize their order. Each virtual case and control set was then sampled 100 times to create 100 virtual case-control cohorts; the numbers of cases and controls sampled from the virtual genotype sets were the case-control counts that were expected in the three collaborators' follow-up studies (RIKEN in Japan: n cases = 1300, n controls = 1300; Academia Sinica in Taiwan: n cases = 500, n controls = 2000; Asan Medical Center, University of Ulsan in Korea: n cases = 412, n controls = 600). For each iteration, associations of the candidate SNVs were evaluated in a meta-analysis of the virtual cohorts and the three GWAS data sets. For each candidate SNV, the frequency of observing P values of 5.0 × 10 −8 or smaller in 100 iterations of the simulated meta-analysis was scored as the simulation score. Loci with at least one SNV having simulation scores of 0.8 or higher were considered to be promising, and for de novo genotyping, a representative SNV was selected for which assays were designable across the different platforms employed in each research center (Invader in Japan, Sequenom MassARRAY or TaqMan in Taiwan and VeraCode GoldenGate Genotyping kit or TaqMan in Korea, respectively) (Supplementary Table 1 ).

Nonsynonymous SNVs in IGHV genes were genotyped basically by the Invader Assay. Primers and the probes were carefully designed in order to ensure specificity of the assay. We refrained from using multiplex PCR to avoid both expected and unexpected nonspecific amplification of DNA fragments of high sequence homology which will allow cross reaction between amplicons and probes for different loci. Sequences of the primers and probes for 18 nonsynonymous SNVs in IGHV genes and rs4774175, as well as representative genotyping results of rs6423677, are provided in Supplementary Tables 2 and 3 and Supplementary Fig. 1 , respectively. Next-generation sequencing (NGS) of IGHV repertoires Two milliliters of venous blood was drawn from patients who were admitted to the hospitals for KD at four time points including (1) acute phase before receiving IVIG (3-8 illness days), (2) 48 h after the patients became afebrile (8-18 illness days), (3) the first follow-up visit to the pediatric clinic after discharge (17-50 days after the disease onset), and (4) the second follow-up visit to the pediatric clinic after discharge (3-4 months after the disease onset). Blood samples were collected into Vacutainer CPT Cell Preparation Tube (BD) and mononuclear cells were separated according to the manufacturer's instruction. Total RNA from the mononuclear cells was extracted by using the RNeasy Mini Kit (QIAGEN). 1.0 μg of RNA was reverse transcribed with PrimeScript (TAKARA) and the mixed oligonucleotides of random hexamer and oligo-dT primers. Isotype-specific libraries for NGS were prepared as follows.

Mixed forward primers covering the framework region 1 of 7 subgroups of IGHV genes (V1-V7) [17] and reverse primers specific to each IGHC gene for IgM, IgD, IgG, and IgA (including a partial Illumina adapter sequence in the 5′ ends of both primers) were designed for the 1st round PCR. Sequences of the primers are provided in Supplementary  Table 4 . 6-base barcode sequence and the full Illumina adapter sequence were added at 5′ and 3′ ends of the immunoglobulin amplicons in the 2nd round PCR. The barcode sequences were used to distinguish the patients and the sampling time points. The libraries were sequenced with MiSeq Reagent Kit v3 (600-cycle) (Illumina). Forward and reverse sequence reads were merged by using FLASh software [18] . Sequences unmerged due to insufficient overlapping length were excluded from subsequent analyses. Merged sequence reads were classified into isotypes and subclasses based on primer sequences by using blast software (https://blast.ncbi.nlm.nih.gov/Blast. cgi), and then quality filtering and removing the primer sequence were performed by using the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/index.html).

Immunoglobulin repertoires were determined by migmap-1.0.2 software (https://github.com/mikessh/migmap). Sequence reads that terminated prematurely or had a complementarity determining region (CDR) 3 that was non-canonical (i.e., lacking consensus amino acids at both ends or not fully mapped) were excluded from subsequent analyses. Clonotypes were defined by combinations of IGHV, IGH diversity (IGHD), and IGH joining (IGHJ) gene alleles. Correlation trend between the proportions of the IGH clonotypes using IGHV3-66 and the C allele counts at rs6423677 was evaluated by the Jonckheere-Terpstra test using R package PMCMR (https://cran.rproject.org/package=PMCMR). Change of each clonotype proportion from baseline (the 1st sampling point) was calculated by subtracting the baseline proportion from those at 2nd, 3rd, and 4th points. The diversity of CDR3 clonotypes was evaluated by the inverse Simpson's index using the R package vegan (https://cran.r-project.org/pa ckage=vegan).

In the Stage 1 analysis, a meta-analysis of three GWAS data sets, genome-wide level associations were seen in known susceptibility loci such as FAM167A-BLK (rs2736340; P = 1.23 × 10 −16 ), ITPKC (rs28493229; P = 3.07 × 10 −9 ), CASP3 (rs2720377; P = 2.66 × 10 −9 ) and CD40 (rs1883832; P = 1.76 × 10 −8 ). Four top SNVs in HLA class III~class II regions (rs2844485, rs73729123, rs7739458, and rs146650659) and rs13330932 in the 16q24.1 region also showed significant associations (Supplementary Table 5 , Supplementary Fig. 2) . From a standard test for inflation, the genomic control inflation factor λ GC = 1.0216 ( Supplementary Fig. 2 ).

In a simulation of meta-analyses, a "simulation score" was calculated for each nominally associated SNV (P ≤ 0.001; see Methods). Forty-nine loci were identified with one or more SNVs expected to satisfy a P value threshold of 5.0 × 10 −8 with simulation score of 0.8 or higher (Supplementary Table 6 ). From every locus, we selected representative SNVs for genotyping in replication sample panels and then performed a meta-analysis with data from the three GWAS. For one particular SNV locus (rs181511609) that satisfied the criteria, a genotyping assay could not be designed due to the complexity of the surrounding sequence. As a result, 12 out of 48 examined loci showed significant associations ( Table 1, Supplementary Table 7 ). These included four previously known susceptibility gene loci such as CASP3, FAM167A-BLK, ITPKC, and CD40, seven on chromosome 6p21 (NC_000006.11: 25.5-32.78 Mb) and one in the immunoglobulin heavy variable gene (IGHV) region on chromosome 14q32.33. The trend of association for rs13330932 on chromosome 16q24.1 was not replicated in the follow-up sample panels from the three populations (Supplementary Table 7) .

Association of an IGHV3-66 gene variant with Kawasaki disease Association signals that newly achieved genomewide significance after the meta-analyses of three GWAS and follow-up association studies A significant association of rs2857151 located near HLA-DOB and HLA-DQB2 genes with KD susceptibility was reported in the earlier Japanese GWAS [9] . However, the present study revealed that the association statistics for rs2857151 were not consistent across the other East Asian populations (Supplementary Table 8 ). Instead, 7 out of 12 groups of SNVs in the 6p21 region examined in the Stage 2 analyses showed significant association in the metaanalyses of the data sets in the three GWAS as well as in the follow-up studies ( Table 1 and Supplementary Fig. 3A ).

To verify that those seven signals were associated independently from rs2857151, we calculated LD between these SNVs and rs2857151, and we also performed logistic regression analyses conditioning on rs2857151 using the Japanese Stage 2 sample set. Consistent with information from 1000 Genomes, the seven candidates were not in high LD with each other (Supplementary Fig. 3B ). The association was most significant for rs2857151, and in the conditioned logistic regression analyses, the P value for rs2071473, which was located just 19 kb distal to and in marginal LD with rs2857151 (D′ = 0.99, r 2 = 0.36), became nonsignificant (P > 0.05). After conditioning, P values for the other SNVs including rs2857151 itself increased but were still significant (P ≤ 0.05). (Supplementary Table 9 ). These included SNVs with previous information on their functional significance or association with diseases such as rs2857602, where the associated allele is linked to the A allele at rs2239704 (r 2 = 1.0 in the 1000 Genomes JPT population), which has been associated with reduced expression of lymphotoxin A [19] , and rs7775228, which was previously associated with asthma in a Japanese population [20] . Thus, further investigation including that in each ethnic group is needed to unravel the involvement of the variants in this region in KD susceptibility.

In the meta-analysis of the Stage 1 and 2 studies, a significant association was also obtained for an SNV within the chromosome 14q32.33 region represented by rs4774175 (P = 6.0 × 10 −9 ) ( Table 1) . Five hundred and nineteen SNVs, linked with rs4774175 (r 2 > 0.1), and having simulation score of 0.5 or higher were distributed across a 340 kb region (106.88-107.22 Mb) of this locus where many IGHV genes as well as their pseudogenes are clustered in tandem. Similar to the HLA region, most IGHV genes harbor common nonsynonymous SNVs within them contributing to the high level of diversity observed within the immunoglobulin heavy chain repertoire. Therefore, we proceeded to analyze the nonsynonymous SNVs in that locus. Among the 519 SNVs, 18 were nonsynonymous and the others were intergenic, intronic, in the untranslated region, or synonymous SNVs. When the Japanese sample set used in the Stage 2 analysis was genotyped for these 18 nonsynonymous SNVs, rs6423677 in the IGHV3-66 gene showed the most significant association ( Table 2) . As expected, some other nonsynonymous SNVs also showed a similar trend of association. However, those associations, including that of rs4774175, were considered to be explained by LD with rs6423677 because they were no longer significant after conditioning the logistic regression analyses on rs6423677 ( Table 2 ). In contrast, rs6423677 remained significant (P < 0.05) after adjusting for any of the other SNVs (data not shown). In a Stage 3 analysis, rs4774175 and rs6423677 were further examined in an additional Japanese sample set, and rs6423677 achieved genome-wide significance in a meta-analysis of the Japanese sample panels (Table 3) . Meanwhile, the association trend of rs6423677 was weaker in Korean and unreproducible in Taiwanese subjects (Supplementary Table 10 ).

rs6423677 genotypes and IGHV3-66 gene usage in the immunoglobulin heavy chain repertoire

The significant association of an SNV within IGHV3-66 with KD prompted us to investigate the impact of rs6423677 on the function of immunoglobulin molecules or B cell receptors (BCRs) harboring IGHV3-66 in their heavy chain variable regions. Firstly, we examined IGHV usage in different immunoglobulin isotypes (IgM, IgD, IgG, and IgA) by analyzing the immunoglobulin heavy chain repertoire in peripheral B lymphocytes from healthy Japanese individuals (n = 10) and patients with acute KD (n = 9) by NGS. For every Ig isotype, we could identify IGHV3-66 clonotypes in each individual and found that the IGHV3-66 usage tended to be correlated with the number of C alleles each individual had ( Fig. 2A and Supplementary Fig. 4) . Consistent with the correlation trend, we observed significantly skewed usage of the alleles in the analyses of the five patients who were heterozygous for rs6423677, with the protective allele (A) almost completely suppressed in all immunoglobulin classes and time points examined ( Supplementary Fig. 5 ). Similar correlation trend was also observed between rs6423677 genotypes and relative levels of IGH transcript for IGHV3-64, the closest neighboring gene to IGHV3-66. However, the correlations were more modest than that seen for IGHV3-66 ( Supplementary  Fig. 6 ). Paralogues or pseudogenes of IGHV3-66 which have the 'A' nucleotide at the corresponding position of rs6432677 as well as PCR amplification bias in library preparation caused by additional variants interfering with the primer annealing could lead to such results. However, the distinct separation pattern of the allele discrimination plot in the Invader Assay and the clear electropherogram data in the direct sequencing of PCR-amplified genomic DNA from heterozygous patients does not support that possibility ( Supplementary Fig. 1 ). Thus, we consider these observations not to be artificial in origin. We also negated the chance of misassignment of IGHV3-66 with the A allele at rs6423677 (IGHV3-66*03) to other alleles (*01, *02, or *04) or paralogues such as IGHV3-53 by confirming that the migmap software could correctly assign an artificially created sequence with the rs6423677 A allele as IGHV3-66*03 (data not shown).

To obtain an insight into the role of immunoglobulins using IGHV3-66 in their heavy chain variable region in the pathogenesis of KD, we investigated how IGHV3-66 usage in the immunoglobulin heavy chain gene repertoire changed over time during the clinical courses of nine KD patients (Supplementary Table 11 ). Proportions of IGH clones harboring IGHV3-66 in their variable region seemed to be stable or not to have common varying patterns for IgM, IgD, and IgA, while in two patients (patients 2 and 5), transient increases were observed for IgG at the third evaluation point (Fig. 2B) . Then, we investigated the increased clonotypes in the IgG heavy chain repertoire more precisely by defining them by the combinations of IGHV, IGHD, and IGHJ genes. We found that both patients 2 and 5 had single clonotypes with IGHV3-66 that increased 1.0% or more as a proportion of the total from the first to the third evaluation points. However, the gene combinations of them were not the same (Table 4) , and CDR3 amino acid sequences corresponding to the V-D-J combinations also differed between the patients (data not shown). Five of the remaining seven patients also had one or more IgG heavy chain clonotypes with IGHV3-66 that increased 0.1% or more at the same follow-up time point. However, cooccurring V-D-J combinations in the increased clonotypes were only seen between patients 2 and 6. On the other hand, some of the CDR3 clonotypes which have recently been reported to dominate in the IgM heavy chain repertoire (>0.001%) in the acute pretreatment phase of Taiwanese KD patients [21] were also prevalent at a similar time point in the Japanese KD patients in this study (Supplementary Table 12 ). A distinct increase in diversity of the CDR3 clonotypes of IgM heavy chain during the convalescent phase of KD, which has also been reported in previous research of Taiwanese KD patients [21] , was observed for four of the nine KD patients ( Supplementary  Fig. 7 ).

It has generally been considered that the immune-mediated vasculitis of KD is triggered in response to infection with some type of microorganism. This assumption was made based on factors indicative of a primary infection, such as its clinical symptoms, which included fever and skin rash, epidemiologic findings such as its peak age of onset (6 months to 1 year), along with the seasonality of the occurrence of regional outbreaks and nation-wide epidemics [2, 11] . Despite tremendous efforts, no single microorganism has been conclusively proven as the pathogen of KD and lack of information about the pathogen has been a significant obstacle to causal treatment and disease prevention. Histopathological and immunological studies have revealed activation of neutrophils, macrophages, and monocytes in the acute phase of KD [22, 23] , and now the leading hypothesis for the pathophysiology of KD inflammation is an attack of the dysregulated or hyperactive innate immune system against vascular walls [24] . In addition, genetic studies using a genome-wide approach have identified several robust susceptibility loci/genes for KD [8-10, 12, 25-32] , and an in silico prediction of responsible variants and the types of cells where they have biological significance has highlighted the importance of B cells in the pathogenesis of KD [33] . In previous literature, polyclonal activation and increase of B cells [34, 35] and detection of auto-antibodies against components of vascular endothelial cells or neutrophils in peripheral blood of acute KD patients have been documented [36, 37] . Infiltration of oligoclonal plasma cells producing IgA into the vascular wall of KD patients was also reported [38] . These previous findings have suggested the involvement of B cells in KD inflammation. However, a recent transcriptomic study revealed downregulation of genes related to BCR signaling in acute KD patients [39] . Thus, there have been both supportive and unsupportive pieces of evidence, and there is no consensus view on the role of B cells, as well as of the adaptive immune system in KD pathogenesis. In this study, we identified genetic variants within the IGHV region significantly associated with KD in the Japanese population. The immunoglobulin heavy (IGH) locus spans 1.3 Mb (from 106.0 to 107.3 Mb on NC_000014.8) of the chromosome Clonotypes increased more than 1% during 1st and 3rd evaluation points are bold faced 14q32.33 region and encompasses serially arranged IGH constant (IGHC), IGHJ, IGHD, and IGHV clusters. Because of high sequence redundancy which obstructs designing of specific genotyping assays, there is a large blank area (~0.8 Mb) which was not covered by the SNP arrays and therefore we lack association information about the SNVs in the area (Supplementary Fig. 8 ). However, because SNVs within the gap are not in high LD (r 2 < 0.3) with rs6423677 or rs4774175 in the 1000 genomes JPT population ( Supplementary Fig. 8 ) and we did not observe association signals in the chromosomal region on the opposite side of the blank area, the association signals represented by rs6423677 or rs4774175 might be localized within the region where we could investigate in this study.

In contrast to the robust association of rs6423677 in the Japanese subjects, it was not consistent in the Korean and Taiwanese subjects. Although neither significant nor consistent with the results in the current study, a marginal trend of association between SNVs within the IGHV region and linked to rs4774175 (r 2 = 0.45-0.66 in the 1000genomes CHB population) and KD in the Taiwanese population had already been reported [28] . Findings in the previous and the current studies are strongly indicating that variants in the IGHV region are commonly involved in KD pathogenesis at least in the Japanese and Taiwanese populations. However, the robustness of association might not be uniform among the populations. Given that multiple pathogens with regionally or seasonally differing epidemic patterns could be triggering KD, it might be possible that various susceptibility genes or alleles corresponding to different antigenicity for the pathogens exist in this locus and are related to the mixed robustness of the association.

Highly skewed expression of IGH transcripts with the risk-associated allele adds support to consider that IGHV3-66 is the functional target of the associated variants. The A to C nucleotide change at rs6423677 results in an amino acid alteration from cysteine to glycine within the CDR2 region of IGHV3-66. Thus, the modification might affect the affinity to some antigens of immunoglobulins carrying IGHV3-66. However, we consider that the mechanism of increased susceptibility to KD associated with the C allele might not be due to reduced host defense to particular agents because it is inconsistent with the observation that the A allele, which is protective against KD and expected to have a higher neutralizing ability in this scenario, seemed to be nearly silenced. IGHV3-66 has been recognized as one of the functional IGHV genes and purported to be utilized at frequencies of around 1% [40] . The average proportions of IGHV3-66 clonotypes in the 10 healthy adults in our study (0.88% for IgM, 0.50% for IgD, 1.0% for IgG, and 1.1% for IgA), were consistent with that previous information. It is currently unknown in what stage, i.e., somatic recombination, RNA transcription, and processing of the pre-mRNA, the A allele was excluded and resulted in the significantly skewed allelic usage to the C allele of rs6423677. It is suggestive that usage of IGHV3-64, the nearest functional gene to IGHV3-66, showed difference among genotypes at rs6423677 which was similar to that of IGHV3-66 ( Supplementary Fig. 6 ). One potential reason is suggested by the predicted binding of CTCF transcription repressor in the 0.4 kb region encompassing rs6423677 ( Supplementary Fig. 9 ). CTCF has been reported to interact with multiple sites in the IGH region and plays essential roles in somatic recombination [41] of the distal area of the IGHV gene region. The association data of rs6423677 and the increased usage of IGH transcripts with the riskassociated allele are indicative of some active role of the immunoglobulin molecules as antibodies or as components of BCRs in the development of KD. One possible role of such immunoglobulins might be activation of B cells. Established KD susceptibility gene products such as B lymphoid tyrosine kinase (BLK) [8] [9] [10] and Inositol 1,4,5 trisphosphate 3-kinase C (ITPKC) [8, 9] are involved in BCR signaling. If IVIG acts through competing with such immunoglobulins for agents or antigens relevant to KD pathogenesis, the requirement of a high dose administration (1-2 g/kg) of IVIG to treat KD might be reasonable because only a fraction of the IgG would contribute to the therapeutic effect, with IGHV3-66 expected to only account for up to several percent of the IVIG preparations. B cells can be activated by nonspecific binding of microbial products such as superantigens (SAgs) to BCRs. Intriguingly, B cell SAgs restricted to IGHV3 segment of immunoglobulins have been known [42] . However, considering that the innate immune system has been thought to play a central role in the KD vasculitis and that KD can develop in patients with X-linked agammaglobulinemia, who lack or have small numbers of B cells [43] , the activity of B cells or immunoglobulins in KD might be relevant to initiation or enhancement of the innate immune activation but may be substitutable.

Recently, a significant association of an allele of the IGHV4-61 gene (IGHV4-61*02) with susceptibility to rheumatic heart disease (RHD), which is a long-term complication of ARF, was reported [44] . IGHV4-61 is located only 36 kb downstream of IGHV3-66 (Supplementary Fig. 6) . ARF develops as a sequela of Streptococcus pyogenes (S. pyogenes) infection and, similar to KD, has been recognized to affect genetically susceptible individuals [45] . Among reports of GWAS for human diseases, a genome-wide significant association of variants in the IGHV gene region has been identified only for RHD. Although S. pyogenes is not recognized as the cause of KD, considering that KD shares some characteristic symptoms such as skin rash and strawberry tongue with S. pyogenes infection, it is suggestive that the previously discussed role of B cells might be related to some underlying mechanisms of the common symptoms.

In 2020, multiple series of patients with Pediatric Inflammatory Multisystem Syndrome Temporally Associated with SARS-CoV2 infection (PIMS-TS) or (MIS-C) having KD-like symptoms or increase of severe KD patients after the SARS-CoV2 epidemic were reported from the US and European countries [46, 47] . In the latest studies, overrepresentation of IGHV3-53 and IGHV3-66 in neutralizing monoclonal antibodies against the receptor binding domain of SARS-CoV2 spike were also reported [48, 49] . Given immunoglobulin with IGHV3-66 play a role in KD, it might be possible that KD-like symptoms seen in such patients are mediated by interaction between SARS-CoV2 and B cells expressing IGHV3-66.

We also found some commonality between the CDR3 clonotypes that were increased in the IgM heavy chains of the Taiwanese and the Japanese KD patients (Supplementary Table 12 ). IGHV3-66 was used only in one commonly increased CDR3 clonotype (Supplementary  Table 13 ). However, as far as can be understood from the limited number of observations, IGHV3-66 seemed not to be important in the IgM response in the acute pretreatment phase.

Future characterization of the endogenous immunoglobulin molecules that are increased in KD patients utilizing information of the light chains that can be obtained simultaneously in single-cell analyses [50] will facilitate identification of the agent triggering KD as well as understanding the mechanism of action of IVIG treatment.

There are limitations in this study. First, in addition to the gap above where we could not examine the association of the variants, our strategy to focus only on nonsynonymous SNVs on IGHV genes left the possibility that rs6423677 is just a proxy of the genuinely responsible variant located outside IGHV3-66. Second, we did not analyze the timecourse change of the immunoglobulin heavy chain repertoire in other infectious diseases. So it is uncertain whether the upregulation of IgG heavy chain transcripts with IGHV3-66 is a specific observation for KD or not. Third, our results might not directly reflect changes of the immunoglobulins at the protein level because we lack information about the correlation between the proportions of particular IGH clones in the transcripts from B cells and in the proteins expressed on the cell surface or circulating in the serum.

In conclusion, a significant association of a nonsynonymous SNV in the IGHV3-66 gene with KD was observed. Further intensified study of the association in this region and repertoire analyses of immunoglobulins in different ethnicities and subpopulations of the patients with different demographic features would give insights into both the role of B cells in the KD pathogenesis and the causal agent of the disease.

Author contributions JKL, JYW, YTC, and YO supervised the study. JKL, JYW, and YO conceived the study. JYW, TAJ, TT, JKL, and YO designed the study. TAJ, YM, JKL, JYW, and YO wrote the manuscript. AH, HS, HH, TH, and Japan Kawasaki Disease Genome Consortium collected Japanese samples. YMH, GYJ, SWY, JJY, KYL, and Korean Kawasaki Disease Genetics Consortium collected Korean samples. Taiwan Kawasaki Disease Genetics Consortium and Taiwan Pediatric ID Alliance collected Taiwanese samples. YML coordinated the multi-center collaboration in Taiwan as the project manager and collected samples and clinical information. MK performed GWAS assays for the Japanese samples. AT performed statistical analyses for the Japanese GWAS data. DY and TP performed statistical analyses for the Korean GWAS data. JJK conducted a follow-up study (Stage 2) for the Korean samples. CHC performed statistical analyses for the Taiwanese GWAS data and followed-up meta-analyses for the Taiwanese data. YCL supervised the GWAS and replication genotyping pipeline, performed the data analyses. LCC performed statistical analyses for the Taiwanese GWAS data and followed-up meta-analyses for the Taiwanese data. CPC performed genotyping and direct sequencing of Taiwanese samples. TAJ, DY, and CHC conducted the whole-genome imputation. TAJ performed P value simulation and meta-analyses. KO, TT, and KI performed genotyping and direct sequencing of the Japanese samples. YM and YO performed the NGS data analyses for the IGH repertoires.

Conflict of interest The authors declare that they have no conflict of interest.

Ethical approval This study was approved by the Institutional Review Board at all involved institutes.

Informed consent Written informed consent was obtained from all subjects.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acute febrile mucocutaneous syndrome with lymphoid involvement with specific desquamation of the fingers and toes in children

Epidemiological observations of Kawasaki disease in Japan

Coronary aneurysms in infants and young children with acute febrile mucocutaneous lymph node syndrome

Nationwide survey of Kawasaki disease and acute rheumatic fever

High-dose intravenous gammaglobulin for Kawasaki disease

The treatment of Kawasaki syndrome with intravenous gamma globulin

A single intravenous infusion of gamma globulin as compared with four infusions in the treatment of acute Kawasaki syndrome

Genome-wide association study identifies FCGR2A as a susceptibility locus for Kawasaki disease

A genome-wide association study identifies three new risk loci for Kawasaki disease

Two new susceptibility loci for Kawasaki disease identified through genome-wide association analysis

Kawasaki disease: a brief history

A genome-wide association analysis reveals 1p31 and 2p13.3 as susceptibility loci for Kawasaki disease

An integrated map of genetic variation from 1,092 human genomes

A linear complexity phasing method for thousands of genomes

Genotype imputation with thousands of genomes

Fast and accurate genotype imputation in genome-wide association studies through pre-phasing

Characterizing immunoglobulin repertoire from whole blood by a personal genome sequencer

FLASH: fast length adjustment of short reads to improve genome assemblies

Allele-specific repression of lymphotoxin-alpha by activated B cell factor-1

Genome-wide association study identifies three new susceptibility loci for adult asthma in the Japanese population

Immunoglobulin profiling identifies unique signatures in patients with Kawasaki disease during intravenous immunoglobulin treatment

Neutrophilic involvement in the damage to coronary arteries in acute stage of Kawasaki disease

Activation of peripheral blood monocytes and macrophages in Kawasaki disease: ultrastructural and immunocytochemical investigation

Kawasaki disease: a matter of innate immunity

ITPKC functional polymorphism associated with Kawasaki disease susceptibility and formation of coronary artery aneurysms

Common variants in CASP3 confer susceptibility to Kawasaki disease

A genome-wide association study identifies novel and functionally related susceptibility Loci for Kawasaki disease

Identification of novel susceptibility Loci for Kawasaki disease in a Han chinese population by a genome-wide association study

Genome-wide linkage and association mapping identify susceptibility alleles in ABCC4 for Kawasaki disease

Genetic variation in the SLC8A1 calcium signaling pathway is associated with susceptibility to Kawasaki disease and coronary artery abnormalities

Whole genome sequencing of an African American family highlights toll like receptor 6 variants in Kawasaki disease susceptibility

A genome-wide association analysis identifies NMNAT2 and HCP5 as susceptibility loci for Kawasaki disease

Genetic and epigenetic fine mapping of causal autoimmune disease variants

Immunoregulatory abnormalities in mucocutaneous lymph node syndrome

Mononuclear cell subsets and coronary artery lesions in Kawasaki disease

Immunoglobulin M antibodies present in the acute phase of Kawasaki syndrome lyse cultured vascular endothelial cells stimulated by gamma interferon

Antineutrophil cytoplasm antibodies in Kawasaki disease

Oligoclonal IgA response in the vascular wall in acute Kawasaki disease

Unique activation status of peripheral blood mononuclear cells at acute phase of Kawasaki disease

Accumulation of VH replacement products in IgH genes derived Association of an IGHV3-66 gene variant with Kawasaki disease from autoimmune diseases and anti-viral responses in human

CTCF-binding elements mediate control of V(D)J recombination

Age-associated changes in binding of human B lymphocytes to a VH3-restricted unconventional bacterial antigen

Autoimmunity in X-linked agammaglobulinemia: Kawasaki disease and review in the literature

Pacific Islands rheumatic heart disease genetics network. Association between a common immunoglobulin heavy chain allele and rheumatic heart disease risk in Oceania

Cumulative incidence of rheumatic fever in an endemic region: a guide to the susceptibility of the population?

Hyperinflammatory shock in children during COVID-19 pandemic

Clinical characteristics of 58 children with a pediatric inflammatory multisystem syndrome temporally associated With SARS-CoV-2

Potent neutralizing antibodies against SARS-CoV-2 identified by highthroughput single-cell sequencing of convalescent patients' B cells

Structures of human antibodies bound to SARS-CoV-2 spike reveal common epitopes and recurrent features of antibodies

High-throughput sequencing of the paired human immunoglobulin heavy and light chain repertoire

23 • Taesung Park 24 • Korean Kawasaki Disease Genetics Consortium, Taiwan Kawasaki Disease Genetics Consortium

Acknowledgements This study was supported by grants from the Millennium Project, from the Japan Kawasaki Disease Research Center (2015 to YM, and 2018 and 2019 to YO), and from the Japan Agency for Medical Research and Development (JP18ek0410039 to TH). This study was also supported by a grant from the Ministry of Health & Welfare of the Republic of Korea (HI15C1575 to JKL). We are grateful to the KD patients and their family members as well as the medical staff taking care of the patients. We also thank Ms. Yoshie Kikuchi for her technical assistance.