key: cord-1035830-9j88ykip authors: Pégorier, Perrine; Bertignac, Morgane; Nguefack Ngoune, Viviane; Folch, Géraldine; Jabado-Michaloud, Joumana; Giudicelli, Véronique; Duroux, Patrice; Lefranc, Marie-Paule; Kossida, Sofia title: IMGT(®) Biocuration and Comparative Analysis of Bos taurus and Ovis aries TRA/TRD Loci date: 2020-12-28 journal: Genes (Basel) DOI: 10.3390/genes12010030 sha: 677e1c5990e95e9260cf5e50ab71816f83c93270 doc_id: 1035830 cord_uid: 9j88ykip

The adaptive immune response provides the vertebrate immune system with the ability to recognize and remember specific pathogens to generate immunity, and mount stronger attacks each time the pathogen is encountered. T cell receptors are the antigen receptors of the adaptive immune response expressed by T cells, which specifically recognize processed antigens, presented as peptides by the highly polymorphic major histocompatibility (MH) proteins. T cell receptors (TR) are divided into two groups, αβ and γδ, which express distinct TR containing either α and β, or γ and δ chains, respectively. The TR α locus (TRA) and TR δ locus (TRD) of bovine (Bos taurus) and the sheep (Ovis aries) have recently been described and annotated by IMGT(®) biocurators. The aim of the present study is to present the results of the biocuration and to compare the genes of the TRA/TRD loci among these ruminant species based on the Homo sapiens repertoire. The comparative analysis shows similarities but also differences, including the fact that these two species have a TRA/TRD locus about three times larger than that of humans and therefore have many more genes which may demonstrate duplications and/or deletions during evolution.

The adaptive immune response arose in jawed vertebrates or gnathostomata more than 450 million years ago.
It is characterized by the remarkable specificity and the extreme diversity of its antigen receptors [1]. These antigen receptors of the adaptive immune response are the immunoglobulins (IG) or antibodies of the B cells and plasmocytes [2], and the T cell receptors (TR) of the T cells [3]. The IG recognize antigens in their native form, whereas the TR recognize processed antigens, which are presented as peptides by the major histocompatibility (MH) proteins. TR are divided into two groups, αβ and γδ, which express distinct TR containing either α and β, or γ and δ chains, respectively [3]. Each TR chain comprises a variable and a constant domain. The variable domain results from one rearrangement between variable (V) and joining (J) genes for α and γ chains, and from two consecutive rearrangements for β and δ chains: first between diversity (D) and J genes, then between V and the partially rearranged D-J genes. After transcription, the V-(D)-J sequence is spliced to the constant (C) gene to give the final transcript [3]. The human TRα (TRA) locus consists of a cluster of 56 TRAV genes located upstream (in 5') of a J-C cluster composed of 61 TRAJ genes and one TRAC gene [3]. The TRδ (TRD) locus is nested in the TRA locus between the TRAV and the TRAJ genes [3]. This locus comprises

A comparison was performed based on the number of genes in the locus as well as the number of genes per subgroup (potential germline repertoire), the locus representation, the functionality of genes and the CDR lengths. This sort of comparison can highlight potential duplications and/or deletions that may have occurred during evolution. The two TRA/TRD loci were annotated following the pipeline described in the Materials and Methods. The results of the annotation described below are summarized in Table 1. The information regarding the genome assemblies and the boundaries is provided in Supplementary Table S1.
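The gene counts given in the locus descriptions that follow (and summarized in Table 1) lend themselves to a simple consistency check: for each gene group, the functionality breakdown should add up to the stated total, counting non-localized genes. The sketch below is only an illustration of that arithmetic. The numbers are transcribed from this section, while the dictionary layout and the names `REPORTED`, `tally_mismatches` and `pseudogene_fraction` are assumptions made for the example and do not correspond to any IMGT file format or tool.

```python
# Gene counts transcribed from the locus descriptions in this section
# (localized plus non-localized genes). The dictionary layout and all
# names below are illustrative assumptions, not an IMGT data format.
REPORTED = {
    ("Bos taurus", "TRAV"): (183, {"F": 79, "ORF": 14, "P": 74, "F or ORF": 3,
                                   "F or P": 9, "ORF or P": 3, "F or ORF or P": 1}),
    ("Bos taurus", "TRDV"): (39 + 16, {"F": 45, "ORF": 5, "P": 5}),
    ("Bos taurus", "TRAJ"): (60, {"F": 52, "ORF": 2, "P": 4, "F or P": 2}),
    ("Ovis aries", "TRAV"): (277 + 16, {"F": 124, "ORF": 11, "P": 149,
                                        "F or ORF": 1, "F or P": 7, "ORF or P": 1}),
    ("Ovis aries", "TRDV"): (70 + 18, {"F": 34, "ORF": 12, "P": 28,
                                       "F or ORF": 5, "F or P": 6, "ORF or P": 3}),
    ("Ovis aries", "TRAJ"): (79 + 1, {"F": 61, "ORF": 6, "P": 13}),
}


def tally_mismatches(reported):
    """Return {key: (stated_total, summed_breakdown)} where the two disagree."""
    return {
        key: (total, sum(breakdown.values()))
        for key, (total, breakdown) in reported.items()
        if total != sum(breakdown.values())
    }


def pseudogene_fraction(breakdown):
    """Share of genes classified only as P (mixed classes such as 'F or P' excluded)."""
    return breakdown.get("P", 0) / sum(breakdown.values())


if __name__ == "__main__":
    print("mismatches:", tally_mismatches(REPORTED))
    for (species, group), (total, breakdown) in REPORTED.items():
        share = pseudogene_fraction(breakdown)
        print(f"{species} {group}: {total} genes, {share:.1%} classified as P")
```

An empty result from `tally_mismatches` means every breakdown agrees with its stated total. The same structure can be extended with the D and C genes if a fuller check is wanted.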
The bovine TRA/TRD locus, on chromosome 10 (REV), spans 3331 kb and consists of a total of 238 V genes: 183 TRAV genes (79 F, 14 ORF, 74 P, 3 F or ORF, 9 F or P, 3 ORF or P and 1 F or ORF or P) belonging to 40 TRAV subgroups and 39 (+16 non localized) TRDV genes (45 F, 5 ORF and 5 P) belonging to 5 TRDV subgroups, 9 TRDD genes (6 F and 3 ORF), 64 J genes: 60 TRAJ genes (52 F, 2 ORF, 4 P and 2 F or P) and 4 TRDJ genes (3 F and 1 ORF), 1 TRAC gene (F) and 1 TRDC gene (F). The IMGT 5' borne (OR10G3) has been identified 24 kb upstream of the first gene of the locus and the IMGT 3' borne (DAD1) has been identified 12 kb downstream of the last gene of the locus (cf. Supplementary Figure S1). The sheep TRA/TRD locus, on chromosome 7 (REV), spans 2882 kb and consists of a total of 381 V genes: 277 (+16 non localized) TRAV genes (124 F, 11 ORF, 149 P, 1 F or ORF, 7 F or P and 1 ORF or P) belonging to 39 TRAV subgroups and 70 (+18 non localized) TRDV genes (34 F, 12 ORF, 28 P, 5 F or ORF, 6 F or P and 3 ORF or P) belonging to 5 TRDV subgroups, 9 TRDD genes (5 F and 4 ORF), 84 J genes: 79 (+1 non localized) TRAJ genes (61 F, 6 ORF and 13 P) and 4 TRDJ genes (3 F and 1 ORF), 1 TRAC gene (F) and 1 TRDC gene (F). The IMGT 5' borne (OR10G3) was not found and the IMGT 3' borne (DAD1) has been identified 12 kb downstream of the last gene of the locus (cf. Supplementary Figure S2). Regarding the sequences and the number of gaps, the latest assemblies (this study) are of better quality than those used in previous studies. For the bovine, the entire locus is localized on chromosome 10 and there are only seven gaps. In all the previous assemblies there are genes on unplaced scaffolds and there are more than 260 gaps, except for [11]. On the other hand, many more genes have been described in previous studies (cf. Table 2). For the sheep, the entire locus is localized on chromosome 7 and there are eighteen gaps.
In the previous assembly there are genes on unplaced scaffolds and there are more than 80 gaps. Unlike cattle, fewer genes have been described in previous studies (cf. Table 3). Two full assemblies are available (ARS-UCD1.2 for Bos taurus and Oar_rambouillet_v1.0 for Ovis aries), both qualified as "representative genome", and in each the TRA/TRD locus has been fully localized on a single chromosome with fewer gaps than in previous IMGT annotated genomic sequences. IMGT000049 and IMGT000048 are therefore considered as IMGT reference loci. This has allowed the establishment of the bovine and sheep TRA/TRD gene nomenclature, as well as the evaluation of the functionality of genes. The previous IMGT genomic sequences were re-annotated accordingly and the allelic variants determined based on nucleotide differences in the core region (V-REGION, D-REGION, J-REGION, C-REGION). The number of TRAJ genes in human and bovine is similar and there are 19 more genes in sheep (cf. Table 1). Two TRAJ genes (TRAJ51 and TRAJ55) are missing in cattle and sheep compared to humans, and there are two TRAJ8 genes while there is only one in human (cf. Table 4). The 19 supplementary TRAJ genes found in sheep result from a duplication (or triplication for some genes) of the region from TRAJ29 to TRAJ39, which may be due to a sequencing error or to an amplification. Regarding the functionality, TRAC genes are functional and few TRAJ genes are P in human and bovine (3-4 and 4-6, depending on alleles, respectively). On the other hand, there are more pseudogenes in sheep, mostly due to the duplicated genes (11 P out of 13 are duplicated genes) (cf. Table 4).

Table 4. IMGT Potential germline repertoires of the TRAJ sets in human (Homo sapiens), bovine (Bos taurus) and sheep (Ovis aries).

At the genomic level, each TRAC gene consists of several exons whose sizes are the same for all species except for exon 4 which is untranslated (EX4UTR) (cf. Figure 1).
On the other hand, the size of the introns varies according to the species, especially between human and bovine/sheep. In humans, the intron between exon 1 (EX1) and exon 2 (EX2) and the intron between EX2 and exon 3 (EX3) are shorter, while the intron between EX3 and EX4UTR is longer, compared to bovine and sheep. Each TRAC gene encodes a similar protein of 142 AA with EX1 encoding the constant domain, EX2 and the 5' part of EX3 encoding the connecting region, the middle of EX3 encoding the transmembrane region and the 3' part of EX3 encoding the cytoplasmic region (cf. Figure 2). Nevertheless, the structure of EX1 is different: there are fewer AA in the E and F strands and more AA in the G strand of human TRAC compared to bovine/sheep.

[29]. The AA between parentheses at the beginning of EX1, EX2 and EX3 corresponds to the first codon resulting from a splicing frame 1 (sf1

The number of TRDJ genes in human, bovine and sheep is the same but there are more TRDD genes in bovine and sheep (nine versus three in human) (cf. Table 1). Regarding the functionality, TRDC genes are functional, few TRDD genes are ORF in bovine and sheep (three and four, respectively) (cf. Table 5) and one TRDJ gene is ORF in both bovine and sheep (TRDJ2) (cf. Table 6).

Table 5. IMGT Potential germline repertoires of the TRDD sets in human (Homo sapiens), bovine (Bos taurus) and sheep (Ovis aries). For each TRDD set, in each species, the number of TRDD genes by functionality and, between parentheses, the number of alleles are shown. F: functional; O: ORF. Data available in IMGT Repertoire (IG and TR) http://www.imgt.org/IMGTrepertoire/ > Locus and genes > Potential germline repertoires > TRDV, TRDD and TRDJ > Human, ibid. Bovine, ibid. Sheep.

Unlike TRAC, the size of the exons of TRDC varies depending on the species except for EX1 (cf. Figure 3).
EX2 is shorter in human but EX3 is longer compared to bovine and sheep. In the same way, the size of the introns varies according to the species. Each TRDC gene encodes a similar protein of 155-156 AA with EX1 encoding the constant domain, EX2 and the 5' part of EX3 encoding the connecting region and the 3' part of EX3 encoding the transmembrane region (cf. Figure 4). The size of the V-CLUSTER (which describes the principal set of TRAV/TRDV genes) varies (cf. Figure 5). The V-CLUSTER is less extensive in human (56 genes over 900 kb) than in the bovine and sheep (221 genes over 2200 kb and 346 genes over 2700 kb, respectively), which is consistent with the number of genes in these species. Regarding the functionality of V genes, functional genes outnumber pseudogenes in human and in bovine. In sheep, however, pseudogenes are more numerous.

Colors are according to IMGT color menu for genes (http://www.imgt.org/IMGTScientificChart/RepresentationRules/colormenu.php#h1_28): in green: functional genes, in yellow: ORF genes and in red: pseudogenes. The dotted line in Bos taurus indicates the distance in kb between two genes not represented at scale. Data available in IMGT Repertoire (IG and TR) http://www.imgt.org/IMGTrepertoire/ > Locus and genes > Locus representations > TRA, ibid. TRD > Human, ibid. Bovine, ibid. Sheep.

All subgroups were defined according to those of the human genome. A phylogenetic tree with one representative gene per subgroup (except for TRAVA, TRAVB and TRAVC, highly degenerated pseudogenes present only in human) for the human, the bovine and the sheep was created in order to highlight the distance between the species within a subgroup (cf. Figure 6).
This phylogenetic tree shows that, for the two species, the genes of a subgroup are grouped in the same branch with a corresponding human gene. Nonetheless, some subgroups are missing in both cattle and sheep (TRAV7, TRAV15, TRAV30, TRAV31, TRAV32, TRAVA, TRAVB and TRAVC) or only in sheep (TRAV40), there are new subgroups in bovine and sheep (TRAV43, TRAV44 and TRAV45), and three subgroups are intermingled: TRAV4, TRAV26 and TRAV44 (cf. Supplementary Figure S3). However, there is less than 75% identity among the genes of these three subgroups for a given species, so they cannot be considered as genes belonging to the same subgroup. The number of TRAV genes varies depending on the species. There are fewer genes in human than in bovine and fewer genes in bovine than in sheep (cf. Table 1). The number of genes per subgroup also varies according to the species (cf. Table 7). In humans there are one or two genes per subgroup except for TRAV8 and TRAV12 (eight and three genes, respectively), while in cattle and sheep some subgroups are highly expanded. In the sheep, there are six subgroups with more than 20 genes (TRAV8, TRAV13, TRAV22, TRAV23, TRAV25 and TRAV44) and three subgroups with more than 10 genes (TRAV9, TRAV14 and TRAV43), whereas there are only five subgroups in bovine with more than 10 genes (TRAV22, TRAV23, TRAV25, TRAV44 and TRAV45). In addition, as shown in the phylogenetic tree (cf. Figure 6), eight subgroups are absent in both species and one subgroup is missing only in sheep. The CDR lengths are relatively well conserved between the different species (cf. Table 8). The most important differences are in bovine, where some subgroups have two or three different lengths (TRAV10, TRAV20, TRAV22 and TRAV38), and in three human subgroups in which the CDR length is different from bovine and sheep (TRAV11, TRAV35 and TRAV39). These differences are shown in red in Table 8.
For two subgroups (TRAV17 and TRAV18) the bovine has some genes with the same CDR lengths as human (in blue) and some with the same CDR lengths as sheep (in green).

Table 7. IMGT Potential germline repertoires of the TRAV subgroups in human (Homo sapiens), bovine (Bos taurus) and sheep (Ovis aries).

In black: subgroups only present in humans. Tree generated using NGPhylogeny.fr [25] (with MAFFT [26] and PhyML [27] programs) and iTOL v4 [37].

As for the TRAV genes, the subgroups were defined according to those of the human genome and a phylogenetic tree with all genes was created (cf. Figure 7). This phylogenetic tree shows that, except for the TRDV1 subgroup, the genes are grouped in the same branch with a corresponding human gene. However, the TRDV1 subgroup is divided into two branches even though there is more than 75% identity between all those genes.

Tree generated using NGPhylogeny.fr [25] (with MAFFT [26] and PhyML [27] programs) and iTOL v4 [37].

As for the TRAV genes, the number of TRDV genes varies depending on the species. There are fewer genes in human than in bovine and fewer genes in bovine than in sheep (cf. Table 1). There are two new subgroups in bovine and sheep compared to human (TRDV4 and TRDV5) and the TRDV1 subgroup is much larger in cattle and sheep, with 50 and 84 genes, respectively, compared to 1 in human (cf. Table 9). Contrary to TRAV genes, the CDR lengths are not conserved between human and bovine/sheep for the TRDV2 and TRDV3 subgroups (cf. Table 10). For the TRDV1 subgroup, there are several different lengths for bovine and sheep (nine and five, respectively) due to the high number of genes in this subgroup. There are also genes lacking CDR2-IMGT and part of CDR3-IMGT (deletion of nine amino acids (AA), not shown in Table 10). This particularity was already described in bovine by [11] and is present in sheep too.
Four genes are concerned in bovine (three in-frame and one out-of-frame (P with frameshift)) and eight in sheep (six in-frame and two out-of-frame). The in-frame genes are shown in Figure 8.

The last step of the biocuration pipeline consists of the automatic annotation of the cDNAs available in the IMGT/LIGM-DB database with the IMGT/Automat tool [21]: 176 cDNA sequences for cattle and 102 for sheep were annotated. This annotation highlighted the transcription of approximately 50% (cattle) and 40% (sheep) of the germline genes. Interestingly, TRAJ54, which has a stop codon in position 1 of the J-REGION, and TRDV1-13, which has a stop codon in position 108, the last position of the V-REGION, have been found rearranged and give a productive sequence (with no stop codon and an in-frame junction) in accession numbers JX065661 (http://www.imgt.org/ligmdb/result.action?accessionNumber=JX065661) and BC113229 (http://www.imgt.org/ligmdb/result.action?accessionNumber=BC113229), respectively, showing the trimming of the stop codon during the rearrangement.

This study was carried out in order to highlight the differences between the IMGT® annotation and the data previously published and to compare the TRA/TRD loci of bovine and sheep against the human locus. The annotation of each locus followed the pipeline defined in Materials and Methods. The expertise applied along this pipeline makes it possible to establish the TRA/TRD germline repertoire according to IMGT® nomenclature and the IMGT® reference directory (IMGT® reference sequences used by IMGT® tools) of each locus and thus to obtain sequence, gene and structure data. For each gene analyzed, there are more than 200 pieces of information available in IMGT® databases, tools and web pages. The comparison of the data obtained after the biocuration was carried out against the data of the human TRA/TRD loci.
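As one concrete instance of this comparison against human, the V-CLUSTER figures quoted in the results (56 genes over 900 kb in human, 221 genes over 2200 kb in bovine and 346 genes over 2700 kb in sheep) can be converted into gene densities. The following is a minimal sketch of that arithmetic only. The names `V_CLUSTER` and `genes_per_100kb` are hypothetical, and the spans are the rounded values given in the text rather than exact coordinates.

```python
# V-CLUSTER sizes as quoted in the text: (number of V genes, approximate span in kb).
# These are rounded figures from the article, not exact assembly coordinates.
V_CLUSTER = {
    "Homo sapiens": (56, 900),
    "Bos taurus": (221, 2200),
    "Ovis aries": (346, 2700),
}


def genes_per_100kb(n_genes, span_kb):
    """Average number of V genes per 100 kb of V-CLUSTER."""
    return 100.0 * n_genes / span_kb


if __name__ == "__main__":
    for species, (n_genes, span_kb) in V_CLUSTER.items():
        density = genes_per_100kb(n_genes, span_kb)
        print(f"{species}: {n_genes} genes / {span_kb} kb = {density:.1f} per 100 kb")
```

Because the spans are rounded, the resulting densities are indicative only and should not be read as precise measurements.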
This analysis was done with respect to the data entered in IMGT Repertoire. Compared to the previous studies, the two loci in the latest assemblies have fewer gaps and are localized on a chromosome without unplaced scaffolds (cf. Tables 2 and 3). This, together with an expected positional organization of genes in the locus, is a basic requirement for the annotation of a complete locus with a definitive nomenclature in IMGT®. We rely on publicly available data, which is why good quality data are needed to produce good quality annotations. It is worth noting that the nomenclature presented in this manuscript, for the loci and species in question, is set in stone and will not change in the future. Once the IMGT biocuration team obtains a genomic assembly covering the whole locus (no contigs, no scaffolds), a reference assembly is established which gives rise to the definitive IMGT nomenclature. Subsequent assemblies may become available either for the same individual or for other individuals, constituting novel haplotypes in the latter case, but they will not affect the original nomenclature. During the analysis of the TRA/TRD locus in bovine and sheep, it was noted that the general organization of the locus is conserved and is similar to the human one even if the V-CLUSTER is more extensive (cf. Figure 5). It should be noted that the IMGT® unique nomenclature, based on subgroup assignment and position of genes within the locus, represents a valuable help in highlighting locus organizational similarities or differences. The results show that some subgroups are missing and three new subgroups were described in bovine and sheep compared to human. Some subgroups are more represented in bovine and in sheep than in human, which may indicate potential duplications during evolution. It can also explain the difference in the proportion of functional genes.
Indeed, duplicated subgroups in bovine and sheep contain a large proportion of pseudogenes, resulting in a higher number of pseudogenes than in human. Another indication of duplication during evolution is the presence of a large number of TRDV1 genes (50 in bovine and 66 in sheep) compared to 1 in human [13]. In the TRAV genes, there is only one CDR length for most human, bovine and sheep subgroups, except for six bovine subgroups (TRAV10, TRAV17, TRAV18, TRAV20, TRAV22 and TRAV38) (cf. Table 8), while in the TRDV1 subgroup there are several lengths (cf. Table 10) and even some genes without CDR2-IMGT (cf. Figure 8). It would be interesting to see if these specificities (expansion of the TRDV1 subgroup and of the TRAV subgroups, absence of CDR2-IMGT for some TRDV1 genes, etc.) are also found in other ruminant species. The veterinary species are valuable models for immunological and medical research. The comparison of the TRA/TRD locus between bovine and sheep presented here provides a global view of the TRA/TRD locus in Bovidae and will be a useful resource for analyzing the TRA/TRD locus in species not yet analyzed. The work carried out and the use of the methodology established for the analysis of the TRB locus [19] show that this procedure can be used to facilitate the analysis of IG (IGH, IGK and IGL) and TR (TRA, TRB, TRD and TRG) loci among different species.

Supplementary Materials: The following are available online at https://www.mdpi.com/2073-4425/12/1/30/s1. Table S1: Information regarding the genome assembly and TRA/TRD locus IMGT 5' and 3' borne in human (Homo sapiens), bovine (Bos taurus) and sheep (Ovis aries). Figure S1: Locus representation of the bovine (Bos taurus) TRA/TRD locus. Colors are according to IMGT color menu for genes (http://www.imgt.org/IMGTScientificChart/RepresentationRules/colormenu.php#h1_28). The dotted line indicates the distance in kb between two genes not represented at scale.
Figure S2: Locus representation of the sheep (Ovis aries) TRA/TRD locus. Colors are according to IMGT color menu for genes (http://www.imgt.org/IMGTScientificChart/RepresentationRules/colormenu.php#h1_28). Figure S3: Phylogenetic tree of all TRAV genes for all species (using V-REGION). Homsap: human, Bostau: bovine and Oviari: sheep. Tree generated using NGPhylogeny.fr [25] (with MAFFT [26] and PhyML [27] programs) and iTOL v4 [37].

The following abbreviations are used in this manuscript:

The differences in CDR length are shown in red. The correspondences for subgroup TRDV1 are shown in blue.

Immunoglobulin and T Cell Receptor Genes: IMGT(®) and the Birth and Rise of Immunoinformatics
The Immunoglobulin FactsBook
The T Cell Receptor FactsBook
Responses of bovine WC1(+) gammadelta T cells to protein and nonprotein antigens of Mycobacterium bovis
Immunological characterization of a gammadelta T-cell stimulatory ligand on autologous monocytes
Bovine respiratory coronavirus
Novel Influenza D virus: Epidemiology, pathology, evolution and biological characteristics
Maternal allergic asthma during pregnancy alters fetal lung and immune development in sheep: Potential mechanisms for programming asthma and allergy
Assessment of the nucleotide sequence variability in the bovine T-cell receptor alpha delta joining gene region
The bovine T cell receptor alpha/delta locus contains over 400 V genes and encodes V genes without CDR2
Annotation and classification of the bovine T cell receptor delta genes
Genomic analysis offers insights into the evolution of the bovine TRA/TRD locus
Assignment of the TCRA/TCRD locus to sheep chromosome bands 7q1.4->q2.2 by fluorescence in situ hybridization
Artiodactyl emergence is accompanied by the birth of an extensive pool of diverse germline TRDV1 genes
Ovis aries) T cell receptor alpha (TRA) and delta (TRD) genes and genomic organization of the TRA/TRD locus
Assembly: A resource for assembled genomes at NCBI
IMGT®, the international ImMunoGeneTics information system® 25 years on
IMGT® Biocuration and Comparative Study of the T Cell Receptor Beta Locus of Veterinary Species Based on Homo sapiens TRB. Front
From IMGT-ONTOLOGY to IMGT/LIGMotif: The IMGT standardized approach for immunoglobulin and T cell receptor gene identification and description in large genomic sequences
Immunogenetics Sequence Annotation: The Strategy of IMGT based on IMGT-ONTOLOGY. Stud
WHO-IUIS Nomenclature Subcommittee for immunoglobulins and T cell receptors report
The highly customized and integrated system for IG and TR standardized V-J and V-D-J sequence analysis
fr: New generation phylogenetic services for non-specialists
MAFFT multiple sequence alignment software version 7: Improvements in performance and usability
New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0
IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains
IMGT unique numbering for immunoglobulin and T cell receptor constant domains and Ig superfamily C-like domains
Perles for the variable (V), constant (C), and groove (G) domains of IG
IMGT standardized representation of domains (IG, TR, and IgSF variable and constant domains, MH and MhSF groove domains)
IMGT/LIGM-DB, the IMGT comprehensive database of immunoglobulin and T cell receptor nucleotide sequences
A comprehensive database for human and mouse immunoglobulin and T cell receptor genes
T cell receptor and MHC structural data
IMGT(®) tools for the nucleotide analysis of immunoglobulin (IG) and T cell receptor (TR) V-(D)-J repertoires, polymorphisms, and IG mutations: IMGT/V-QUEST and IMGT/HighV-QUEST for NGS
A database and a tool for immunoglobulins or antibodies, T cell receptors, MHC, IgSF and MhcSF
Interactive Tree of Life (iTOL) v4: Recent updates and new developments

We are grateful to Gérard Lefranc for helpful discussion, to the IMGT® team for their expertise and constant motivation, to Dominique Scaviner for the initial annotation of the human TRA/TRD, to Amandine Lacan (deceased October 19, 2018) for the initial annotation of the bovine TRA/TRD (based on [12]) and to Imène Chentli for the initial annotation of the sheep TRA/TRD locus (based on [16]). IMGT® is a registered trademark of CNRS. IMGT® is a member of the International Medical Informatics Association (IMIA) and of the Global Alliance for Genomics and Health (GA4GH). The authors declare no conflict of interest.