key: cord-0016982-2k5jkfma authors: Linguiti, Giovanna; Kossida, Sofia; Pierri, Ciro Leonardo; Jabado-Michaloud, Joumana; Folch, Geraldine; Massari, Serafina; Lefranc, Marie-Paule; Ciccarese, Salvatrice; Antonacci, Rachele title: The T Cell Receptor (TRB) Locus in Tursiops truncatus: From Sequence to Structure of the Alpha/Beta Heterodimer in the Human/Dolphin Comparison date: 2021-04-14 journal: Genes (Basel) DOI: 10.3390/genes12040571 sha: 3b882d0aaeea31bdb5ef7c5455ebc253027ac146 doc_id: 16982 cord_uid: 2k5jkfma The bottlenose dolphin (Tursiops truncatus) belongs to the Cetartiodactyla and, similarly to other cetaceans, represents the most successful mammalian colonization of the aquatic environment. Here we report a genomic, evolutionary, and expression study of T. truncatus T cell receptor beta (TRB) genes. Although the organization of the dolphin TRB locus is similar to that of the other artiodactyl species, with three in tandem D-J-C clusters located at its 3′ end, its uniqueness is given by the reduction of the total length due essentially to the absence of duplications and to the deletions that have drastically reduced the number of the germline TRBV genes. We have analyzed the relevant mature transcripts from two subjects. The simultaneous availability of rearranged T cell receptor α (TRA) and TRB cDNA from the peripheral blood of one of the two specimens, and the human/dolphin amino acids multi-sequence alignments, allowed us to calculate the most likely interactions at the protein interface between the alpha/beta heterodimer in complex with major histocompatibility class I (MH1) protein. Interacting amino acids located in the complementarity-determining region according to IMGT numbering (CDR-IMGT) of the dolphin variable V-alpha and beta domains were identified. 
According to comparative modeling, the atom pair contact sites analysis between the human MH1 groove (G) domains and the T cell receptor (TR) V domains confirms conservation of the structure of the dolphin TR/pMH.

The bottlenose dolphin (Tursiops truncatus) and the other cetaceans phylogenetically belong to the Cetartiodactyla clade, which includes Cetacea and Artiodactyla [1, 2]. The divergence of cetaceans from their terrestrial ancestors about 54 million years ago [3] resulted in a complete adaptation to aquatic life. In addition to the anatomical and physiological innovations required for life in water, cetaceans must have been confronted with challenges from changing environmental pathogens while they transitioned from land to sea [4, 5]. These challenges exerted intensified selection pressure on the whole genomes of the cetacean lineage, including the Delphinidae family [6], as well as on gene families related to the immune system [7].

Considering the percentage of nucleotide identity of the genes with respect to human [8, 21, 30] and the other mammalian species, and based on the genomic position within the locus, each TRB gene was classified, and the nomenclature was established according to the IMGT concepts of classification, approved by HGNC, VGNC, NCBI and the IUIS Nomenclature Committee for immunoglobulin (IG) and TR of all vertebrate species with jaws (gnathostomata) from fish to humans [8, 23, 31] (Supplementary Table S1).
The functionality of the V, D, J, and C genes, based on the IMGT Scientific chart rules [18], was predicted through the manual alignment of sequences adopting the following parameters: (a) identification of the leader sequence at the 5′ end of the TRBV genes; (b) determination of proper recombination signals (RS) located at the 3′ end of the TRBV (V-RS), the 5′ and 3′ ends of the TRBD (5′ D-RS and 3′ D-RS), and the 5′ end of the TRBJ (J-RS), respectively; (c) determination of conserved acceptor and donor splicing sites (IMGT® [18], IMGT Education > IMGT Aide-mémoire > Splicing sites); (d) estimation of the expected length of the coding regions; (e) absence of frameshifts and stop codons in the coding regions of the genes (IMGT® [18], IMGT Scientific chart > 1. Sequence and 3D structure identification and description > IMGT functionality).

The criterion that sequences with a nucleotide identity of more than 75% in the V-region belong to the same subgroup was adopted to assign the TRBV genes to 20 different subgroups. The percentage of nucleotide identity was determined by using the Clustal Omega alignment tool, which is available at the EMBL-EBI website (http://www.ebi.ac.uk/, accessed on 10 February 2021).

The TRBD, TRBJ, and TRBC genes were annotated according to their similarity with the other artiodactyl species and their specific organization in three D-J-C clusters, numbered C1, C3, and C2 from 5′ to 3′ in the locus [25] [26] [27]. Each TRBJ gene of the TRBJ1, TRBJ2, and TRBJ3 sets was designated by a hyphen and a number corresponding to its position in the cluster. They were all predicted to be functional, except for TRBJ1-5 and TRBJ3-6, which are pseudogenes owing to a STOP-CODON, and TRBJ1-2 (noncanonical J-NONAMER) (Table S1).
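The subgroup criterion above (more than 75% nucleotide identity in the V-region) can be illustrated with a minimal sketch. It assumes sequences that are already aligned to equal length, computes identity over ungapped columns, and merges genes by single linkage; these choices, the function names, and the toy sequences are illustrative assumptions, not the authors' Clustal Omega and IMGT workflow.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def group_by_identity(aligned: dict, threshold: float = 75.0) -> list:
    """Single-linkage grouping: genes sharing > threshold % identity are merged."""
    parent = {name: name for name in aligned}

    def find(name: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(aligned, 2):
        if percent_identity(aligned[a], aligned[b]) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in aligned:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical aligned toy sequences (not dolphin TRBV genes).
toy = {
    "geneA": "ATGCATGCATGCATGCATGC",
    "geneB": "ATGCATGCATGCATGCATGG",
    "geneC": "TTTTTTTTTTGGGGGGGGGG",
}
subgroups = sorted(sorted(g) for g in group_by_identity(toy))
```

With the toy input, geneA and geneB fall into one subgroup and geneC forms its own. Real V-region alignments would also need a decision on how gap columns are counted, which this sketch simply skips.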
The TRBV genes used for the phylogenetic analysis were retrieved from the following sequences deposited in the GEDI (for GenBank/ENA/DDBJ/IMGT/LIGM-DB) databases: NW_011591622, NW_011593440, NW_011591151 (Camelus dromedarius TRB locus contig as previously characterized [26, 27]), NC_010460 (Sus scrofa TRB locus contig as previously characterized [24]); and NC_030811.1 (Capra hircus TRB locus contig as previously characterized [32]) (Supplementary Table S3). The MUSCLE program was used to obtain multiple alignments of the gene sequences under analysis [33]. The evolutionary analyses were conducted in MEGA7 [34]. To reconstruct the phylogenetic tree we used the neighbor-joining (NJ) method [35], while the evolutionary distances were computed using the p-distance method [36].

Blood samples were provided by Zoomarine Italia S.p.A. (Rome, Italy) and were collected from two unrelated dolphins, a male (Marco) and a female (Leah), identified with the letters M and L, respectively. Total RNA was isolated from peripheral blood leukocytes (PBL) using the Trizol method according to the manufacturer's protocol (Invitrogen, Carlsbad, CA, USA). About 5 µg of RNA was reverse transcribed with Superscript II (Invitrogen, Carlsbad, CA, USA) by using a specific primer, TCB1L1 (5′-GCTGGGGTCCTCCTTGTC-3′). After a poly-C tail was linked to the 5′ end of the single-stranded cDNA (cDNAss), the double-stranded cDNA (cDNAds) was synthesized with Platinum Taq Polymerase (Invitrogen) by using a specific primer, TCB1L2 (5′-TTCCCGTTCACCCACCAG-3′), as lower primer and an anchor oligonucleotide (AAP) provided by the supplier (Invitrogen) as upper primer. PCR conditions were the following: one cycle at 94 °C for 1 min; 35 cycles at 94 °C for 30 s, 58 °C for 45 s, 72 °C for 1 min; a final cycle of 30 min at 72 °C.
The products were then amplified in a subsequent nested PCR experiment by using a specific lower primer, TCB1L3 (5′-GTGGTGACGGGGTAGAAG-3′), and the Abridged Universal Amplification Primer (AUAP) oligonucleotide, provided by the supplier (Invitrogen), as upper primer. Nested PCR conditions were the following: one cycle at 94 °C for 1 min; 30 cycles at 94 °C for 30 s, 58 °C for 35 s, 72 °C for 30 s; a final cycle of 30 min at 72 °C. All the specific primers, TCB1L1, TCB1L2, and TCB1L3, were designed on the sequence of the first exon of the TRBC gene. The rapid amplification of cDNA ends (RACE) products were then gel-purified and cloned using the StrataClone PCR Cloning Kit (Stratagene). Randomly selected positive clones for each cloning were sequenced by a commercial service. cDNA sequence data were processed and analyzed using the BLAST program (http://www.blast.ncbi.nlm.nih.gov/Blast.cgi, accessed on 10 February 2021), the Clustal Omega alignment tool (http://www.ebi.ac.uk, accessed on 10 February 2021), and IMGT tools (IMGT/V-QUEST [37, 38] with integrated IMGT/JunctionAnalysis [39, 40] and the IMGT unique numbering for the V-domain [17, 23]). All cDNA clones were registered in the EMBL database with the accession numbers from HG764428 to HG764459.

The fold recognition methods implemented in pGenThreader and i-Tasser were used for highlighting crystallized protein structures homologous to TR alpha and TR beta. With this aim, the deduced amino acid sequences of the T. truncatus TRA (TRAV20-S2*01, 5RAL27 clone, GenBank: LN610732.1; TRAV18-1*02-J20, 5RAL11 clone, GenBank: LN610716.1) and TRB (TRBV30-D2-J2-2, L5RBL27 clone, clone 5RBL27 in GenBank: HG764454.1; TRBV30-D2-J3-3, L5RBL13 clone, clone 5RBL13 in GenBank: HG764440.1) cDNAs were used as query sequences for running pGenThreader (http://bioinf.cs.ucl.ac.uk/psipred/, accessed on 10 February 2021) and i-Tasser (https://zhanglab.ccmb.med.umich.edu/I-TASSER/, accessed on 10 February 2021) to screen the PDB, searching for the most similar deposited crystallized structures [41] [42] [43] [44] [45]. SPDBV [46] was used for building a 3D all-atom model of all the investigated T. truncatus TR alpha or TR beta chains by using the human TR alpha or TR beta available in the human crystallized TR in complex with major histocompatibility class I (MH1) protein, made of an I-ALPHA chain noncovalently associated with beta-2-microglobulin (B2M) [23], available under the PDB code "3hg1.pdb". The human major histocompatibility class I (MH1 I-ALPHA) protein and beta-2-microglobulin (B2M) sequences were used as query sequences for identifying their counterparts in T. truncatus through BLAST searches. The identified protein sequences were modeled with SPDBV by using the structures of the human MH1 I-ALPHA and B2M available under the PDB code "3hg1.pdb" as a protein template, according to our validated protocols [44, 45, 47]. All the generated 3D all-atom models were energetically minimized by using the Yasara Minimization server [45, 47, 48].

2.6. 3D Modeling of T. truncatus TR Alpha and TR Beta in Complex with T. truncatus MH1 I-ALPHA and B2M

The proposed 3D comparative protein complex consisting of T. truncatus TR alpha and TR beta in complex with T. truncatus MH1 I-ALPHA and B2M was obtained by superimposing the above-cited single chain domains (T. truncatus TR alpha, TR beta, MH1 I-ALPHA, and B2M) on the 3D atomic coordinates of each corresponding chain within the crystallized human TR in complex with human MH1 I-ALPHA and B2M, available under the PDB code 3hg1.pdb [49], by PyMOL as previously described [45, 47]. It should be noted that the CANCER/MART-1 decapeptide was removed from 3hg1 before starting the comparative modeling analysis. Superimposition operations were performed through the "super" command implemented in PyMOL, starting from the structural alignment of the analyzed backbones.
The "super" command aligns the selected proteins for a comparative structural analysis because it provides a sequence-independent, structure-based pairwise alignment. Notably, the "super" command is more robust than the "align" command because it also successfully superimposes proteins with a lower sequence similarity. Then, it was possible to model/relax missing/buried residues located at the protein-protein interface, solving clashes and putative breaks in the backbone [43, 45, 47]. All the generated 3D all-atom models were energetically minimized by using the Yasara Minimization server [45, 47, 48]. The final models were examined by visual inspection in VMD, PyMOL, and SPDBV, searching for putative unsolved clashes [44, 45, 47]. Protein-protein binding regions were highlighted by selecting residues within 4 Å at the protein-protein interface in the superimposed structures.

The FoldX AnalyseComplex assay was performed to determine the interaction energy between the four generated T. truncatus TR alpha and TR beta models and between each of the four combined TR alpha/beta in complex with the T. truncatus MH1 I-ALPHA and B2M. It was also used to determine the interaction energy between the human counterparts in the crystallized 3hg1.pdb, as a reference system, a validation strategy, and for comparative purposes. FoldX AnalyseComplex operates by unfolding the selected targets, determining the stability of the remaining molecules, and then subtracting the sum of the individual energies from the global energy. More negative energies indicate better binding. Positive energies indicate no binding [50, 51].
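The interaction-energy bookkeeping described for AnalyseComplex (global energy minus the sum of the individual energies) can be written as a short sketch. The function names and the numerical values are hypothetical, the unit is assumed to be kcal/mol, and the code only mirrors the arithmetic stated in the text; it does not call FoldX.

```python
def interaction_energy(complex_energy: float, part_energies: list) -> float:
    """Global energy of the complex minus the sum of the individual energies."""
    return complex_energy - sum(part_energies)


def interpret(energy: float) -> str:
    """Reading used in the text: negative means binding, positive means none.

    Treating an energy of exactly zero as 'no binding' is an assumption
    made here for completeness.
    """
    return "binding" if energy < 0 else "no binding"


# Hypothetical stabilities (kcal/mol) for a two-chain complex and its parts.
dg_complex = -50.0
dg_parts = [-20.0, -18.5]

dg_interaction = interaction_energy(dg_complex, dg_parts)
verdict = interpret(dg_interaction)
```

For these invented numbers the interaction energy is -11.5, which would be read as binding. Comparing such values between the dolphin models and the human 3hg1 reference is the kind of use the text describes.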
We employed the latest version of the whole genome assembly (mTurTru1.mat.Y) of the bottlenose dolphin (T. truncatus), submitted by the Vertebrate Genomes Project to NCBI (BioProject ID: PRJNA625792), to identify the TRB locus in this species. We retrieved a sequence approximately 278 kilobase (kb) in length (gaps included), comprising the MOXD2 and the EPHB6 genes that flank the 5′ and 3′ ends, respectively, of all mammalian TRB loci studied to date. A standard BLAST search of the genomic sequence was then performed by using human [21, 23] and artiodactyl sequences, i.e., Sus scrofa, Camelus dromedarius, and Ovis aries [24] [25] [26] [27], to identify and annotate all dolphin TRB genes. The sequence analysis showed a conserved general structural organization of the dolphin TRB locus, with a library of TRBV genes positioned at the 5′ end of the D-J-C clusters, followed by a single TRBV gene located at the 3′ end in an inverted transcriptional orientation (Figure 1A). Only functional TRBV genes and in-frame pseudogenes are shown in Figure 1B.

[22]. The boxes representing the genes are not to scale. The exons are not shown.
The arrows indicate the transcriptional orientation of the MOXD2, trypsin-like serine protease (TRY) 1, 2, 3, and TRBV30 genes. (B) The IMGT Protein Display of the dolphin TRBV genes. Only functional genes and in-frame pseudogenes are shown. The description of the strands and loops and of the framework regions (FR-IMGT) and complementarity-determining regions (CDR-IMGT) is according to the IMGT unique numbering for V-REGION [52]. The five conserved amino acids (AA) of the V-DOMAIN (1st-CYS 23, except for the TRBV28 pseudogene, CONSERVED-TRP 41, hydrophobic AA 89, 2nd-CYS 104, and J-PHE 118) are indicated in bold. The amino acid length of the CDR-IMGT is also indicated in square brackets.

Moreover, the sequence analysis revealed that the D-J-C region is organized in three D-J-C sets similar to those found in ruminants [27, 53], pig [24], and camel species [25, 26, 54], confirming the tight evolutionary relationship between Cetacea and Artiodactyla. D-J-C cluster 1 contains one TRBD and six TRBJ genes but lacks the TRBC gene, due to a gap in the genomic sequence. Likewise, the second set (D-J-C cluster 3) lacks the TRBD gene and includes seven TRBJ genes and one TRBC gene. D-J-C cluster 2 is the only complete set, with one TRBD, seven TRBJ, and one TRBC gene (Figure 1A, Supplementary Dataset S1). Approximately 10 kb away from the TRBC2 gene lies the TRBV30 gene, with an inverted transcriptional orientation. The classification, position, and predicted functionality of all TRB genes are reported in Supplementary Table S1.

The comparison of the entire dolphin TRB locus sequence with those previously characterized in mammals allowed us to identify and partially annotate four genes unrelated to TRB, consisting of a group of trypsin-like serine protease (TRY) genes that are typically interspersed among the TRB genes. Three TRY genes are located downstream of TRBV1, and one (TRY4) is located upstream of the D-J-C region (Figure 1A, Supplementary Figure S2).
The predicted functionality of the TRY genes (only the TRY3 gene is functional) is reported in Supplementary Table S2, together with the position of all TRY genes. The classification, position, and predicted functionality of the MOXD2 and EPHB6 genes, which delimit the TRB locus, are also reported.

The structure of the dolphin D-J-C cluster region is similar to that of other artiodactyl species [24] [25] [26] [27] 32, 53, 54]. The TRBD, TRBJ, and TRBC genes are distributed within three in tandem D-J-C clusters located at the 3′ end of the TRB locus, with the numbers 1, 3, and 2 corresponding to their positions from 5′ to 3′ (Figure 1A, Supplementary Dataset S1). The name D-J-C cluster 3 was attributed to the central cluster as in sheep, goat, dromedary, and pig [24] [25] [26] [27] 32, 54].

The nucleotide and deduced amino acid sequences of all the TRBJ genes identified in the region are reported in Figure 2A. Due to the lack of the TRBC1 gene, the TRBJ1 genes were assessed by comparison with the corresponding human and artiodactyl gene sequences. All TRBJ genes are typically 40-53 bp in length and are predicted to be functional, except for TRBJ1-5 and TRBJ3-6, which are pseudogenes owing to a STOP-CODON in the coding region (Figure 2A, Supplementary Table S1), and TRBJ1-2, TRBJ1-4, and TRBJ3-2, which are classified as open reading frames (ORF). Each TRBJ is flanked at the 5′ end by a 12 RS (NONAMER-12 bp SPACER-HEPTAMER) and at the 3′ end by a donor splice site. All the RSs are well conserved compared to the consensus.

Figure 2B shows the nucleotide and deduced amino acid sequences of the only two TRBD genes. They are composed of G-rich stretches of 13 bp (TRBD1) and 15 bp (TRBD2), respectively, that can be productively read through their three coding phases and encode 1-3 glycine residues, depending on the phase. The 5′ and 3′ sides of the coding region are flanked by RSs that are well conserved compared to the consensus.
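Reading a short G-rich D region in its three coding phases, as described above, can be illustrated with a minimal sketch. The 13-bp sequence below is hypothetical and is not the dolphin TRBD1 sequence; the helper names are also illustrative. The codon table is the standard genetic code built from the usual TCAG ordering.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_frame(seq: str, frame: int) -> str:
    """Translate complete codons of seq starting at offset frame (0, 1, or 2)."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Hypothetical G-rich 13-bp D-region-like sequence, for illustration only.
d_region = "GGGACTGGGGGGC"

frames = [translate_frame(d_region, f) for f in range(3)]
glycines = [peptide.count("G") for peptide in frames]
```

For this invented sequence the three phases give GTGG, GLGG, and DWG, that is, three, three, and one glycine. In a rearranged gene the D region is usually trimmed and flanked by N nucleotides, so the phase actually used depends on the junction.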
A protein of 177 amino acids is encoded by the two available dolphin TRBC genes, each composed of four exons and three introns. Ten nucleotides differ between them, resulting in three amino acid (AA) changes, two in the extracellular and one in the cytoplasmic region (Figure 2C). The C-domain encoded by EX1 is 129 AA long as in humans and pigs because of the identical length of the FG loop compared to other artiodactyl species. Based on the IMGT unique numbering for C-DOMAIN [55], the C region is composed of a connecting region (CO) of 21 AA (encoded by EX2 and the 5′ part of EX3) with a cysteine involved in the interchain disulfide bond, a transmembrane region (TM) of 21 AA (encoded by the 3′ part of EX3 and the first codon of EX4), and a cytoplasmic region (CY) of 5 AA (encoded by EX4).

The numbering adopted for the gene classification is reported on the left of each gene. The consensus sequence of the heptamer and nonamer is provided at the top of the figure and is underlined. In (A), the inferred amino acid sequence of the TRBD genes in the three coding frames is reported. In (B), the donor splice site for each TRBJ is shown. The canonical FGXG amino acid motifs are underlined. In (C), the IMGT Protein display of the dolphin TRBC genes, as derived from the alignment by Clustal W with the human, sheep, dromedary, and pig TRBC amino acid sequences, is shown. The strands and loops are according to the IMGT unique numbering for the C-DOMAIN [55].
In the retrieved sequence, we annotated 23 TRBV germline genes grouped into 20 distinct subgroups according to the criterion that sequences with a nucleotide identity of more than 75% in the V-region belong to the same subgroup. Only two subgroups have more than one member, TRBV5 and TRBV7, with three and two genes, respectively. Twelve out of twenty-three TRBV genes (approximately 50%) are predicted to be functional as defined by the IMGT rules (see "Materials and Methods"), and eleven are pseudogenes (Table 1 and Supplementary Table S4).

Table 1. Characterization of the TRBV subgroups (number of genes, functionality, and CDR-IMGT lengths) in dolphin (Tursiops truncatus), goat (Capra hircus), pig (Sus scrofa), camel (Camelus dromedarius), and dog (Canis lupus familiaris).

The classification and nomenclature of the dolphin TRBV gene subgroups were established first by the IMGT/V-QUEST tool [37, 38], comparing each V-region sequence with the germline human, pig, and sheep TRBV sequence set from the IMGT reference directory. In the Cetartiodactyla superorder, the gene nomenclature based on IMGT standardized rules [23] has been previously well defined [24] [25] [26] 32]. In Figure 1B the potentially functional TRBV genes and in-frame pseudogenes are shown. All sequences exhibit the typical framework regions (FR-IMGT) and complementarity-determining regions (CDR-IMGT) as well as four conserved amino acids: cysteine 23 (1st-CYS) in FR1-IMGT, tryptophan 41 (CONSERVED-TRP) in FR2-IMGT, hydrophobic amino acid 89, and cysteine 104 (2nd-CYS) in FR3-IMGT (except for the TRBV6 pseudogene) [52]. Conversely, the CDR-IMGT vary in amino acid composition and length, indicated by numbers between brackets, separated by dots.
The phylogenetic analysis was performed comparing all dolphin TRBV functional genes, ORF, and pseudogenes with the available corresponding pig, dromedary, and goat genes, whose sequences were chosen by adopting two selection criteria: (1) only potentially functional genes and in-frame pseudogenes were included; and (2) only one gene per subgroup was selected. An unrooted phylogenetic tree was made using the NJ method [35], combining the V-REGION nucleotide sequences of all selected TRBV genes in the same alignment. The tree shows that each of the 20 dolphin subgroups forms a monophyletic group with the corresponding pig, dromedary, and goat genes, when present, which is consistent with the occurrence of distinct subgroups prior to the divergence of the mammalian species. The phylogenetic clustering confirmed the classification of the dolphin TRBV genes derived from the sequence analysis. Therefore, each dolphin TRBV subgroup was classified as orthologous to the corresponding artiodactyl subgroups (Figure 3).

Figure 3. The neighbor-joining (NJ) tree inferred from the dolphin (Tursiops truncatus), goat (Capra hircus), pig (Sus scrofa), and dromedary (Camelus dromedarius) TRBV gene sequences. The evolutionary analysis was conducted in MEGAX [34]. The optimal tree with the sum of branch length = 16.02748379 is shown. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (100 replicates) is shown next to the branches [56]. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer phylogenetic trees. The evolutionary distances were computed using the Maximum Composite Likelihood method [57] and are in the units of the number of base substitutions per site. This analysis involved 95 nucleotide sequences (Supplementary Table S3). All ambiguous positions were removed for each sequence pair (pairwise deletion option).
There were a total of 473 positions in the final dataset. The different colors highlight the distribution of the phylogenetic groups (green balls for dolphin functional genes; red balls for dolphin pseudogenes; dark balls for missing genes in dolphin). The dolphin TRBV subgroup classification is performed according to the clustering with the orthologous mammalian TRBV subgroups. The gene functionality according to IMGT rules (F: functional, ORF: open reading frame, P: pseudogene) is indicated. The IMGT standardized 6-letter abbreviation for taxon (Turtru, Camdro, Susscr, and Caphir) is used.

As summarized in Table 1, the total number of dolphin TRBV genes is considerably lower than in the artiodactyl species, as well as in humans and carnivores, mainly due to the lack of several TRBV subgroups. In particular, eight subgroups (TRBV3, TRBV9, TRBV14, TRBV15, TRBV16, TRBV21, TRBV23, and TRBV25) were not found in dolphin (dark balls in Figure 3). The TRBV13 and ORF TRBV17 genes, present in humans [8], are missing in cetartiodactyl (dolphin, goat, pig, and dromedary) and carnivore (dog) [58] genomes. The lack of the dolphin TRBV18 is shared with pig and dromedary; the pig TRB locus lacks TRBV16 and TRBV26, while the goat and dog loci lack the TRBV23 subgroup. The TRBV9 subgroup was not found in the pig and dog genomes, whereas TRBV14 is absent from the dog locus (Table 1).

Genomic comparison of the dolphin TRB locus with that of human, dromedary, pig, and goat highlighted the uniqueness of the structure of the dolphin TRBV locus.
Twenty-three dolphin TRBV genes, grouped into 20 distinct subgroups, lie in a region of approximately 145 kb (Figure 1A). The dot-plot matrix of the dolphin versus human TRB locus confirms the high level of nucleotide identity between TRBV genes, as indicated by dots and diagonal lines that correspond with gene location (Figure 4). The only region where gene duplication took place in the T. truncatus TRB locus concerns the TRBV5 and TRBV7 genes, which are inserted in a group consisting of duplications of individual TRBV5 (from 5-1 to 5-3) and TRBV7 (from 7-1 to 7-2) pseudogenes, and are interspersed with the TRBV6 pseudogene and with the TRBV8 ORF (Figure 1A and green line in Figure 4). This group of six pseudogenes and a single ORF is found in the matrix showing the dot-plot of dolphin against the dromedary TRBV cluster (green line in Supplementary Figure S3) and against the pig TRBV cluster sequences (green line in Supplementary Figure S4). This group of genes, whose sequential order on the genome is conserved in cetaceans, swine, and Tylopoda, occupies about 15 kb of the TRB locus in the dolphin genome. On the contrary, the corresponding group in humans went through duplications up to eight copies for TRBV5 (TRBV5-8) and up to nine copies for TRBV6 and TRBV7 (TRBV6-9, TRBV7-9). It should be noted that the latter are all functional genes, with the exception of a single pseudogene (TRBV7-5).
Such duplications occupy a region extending for 200 kb (green rectangles in Figure 3), which is interrupted by a portion of 50 kb (red rectangle in Figure 4) containing the only copy of the TRBV9 gene and the duplication of the group consisting of the TRBV10, TRBV11, and TRBV12 genes (Figure 4). In Capra hircus, the region of the TRB locus [32] corresponding to the dolphin TRBV5-1, TRBV5-2, and TRBV6 genes houses a series of repeated duplication events that led to the existence of 30 copies of TRBV5 and 29 of TRBV6, for a total of 59 genes (29 functional, 26 pseudogenes, and 4 ORF). This region of repeated duplication events extends without interruption for 180 kb (red rectangle in Supplementary Figure S5).

Furthermore, in the human-dolphin comparative analysis, the region including seven genes in humans (from 12-4 to 18) is deleted in the dolphin (orange rectangle in Figure 4), where the TRBV12 and TRBV19 genes are neighbors in the same region (Figure 1A). Compared to the other cetartiodactyls, this deletion (orange rectangle in the dromedary (Supplementary Figure S3), pig (Supplementary Figure S4), and goat (Supplementary Figure S5) matrices) is unique and typical of the dolphin TRB locus. The deleted region marked with an orange square in Figure 4 is due to the interspersed deletions of the TRBV23 and TRBV25 dolphin genes in comparison with the human contig. Finally, the dot-plot matrix in Figure 4 shows five lines of similarity indicating the duplications of TRY4 in human (from TRY4 to TRY8) (blue rectangles) and five dots corresponding to the only functional TRY gene (TRY3) (blue rectangles) present in dolphin (Supplementary Table S2, Supplementary Figure S2).

To evaluate the functional competency of the three D-J-C clusters, we performed three 5′ RACE experiments on total RNA isolated from the peripheral blood of two unrelated animals, a male (M) and a female (L), using a TRBC-specific primer.
Each RACE product was gel-purified and cloned into the TA-vector, and randomly selected positive clones for each cloning were sequenced [9]. A total of 45 diverse clones of different lengths containing rearranged V-(D)-J-C transcripts with a correct open reading frame were obtained. In particular, 22 out of 45 cDNA clones belonged to the male subject (M5RBL series), while the remaining 23 cDNAs were derived from the female subject (L5RBL series) (Figure 5). Five clones of the L5RBL series (L5RBL12, L5RBL13, L5RBL20, L5RBL2, and L5RBL27) and three clones of the M5RBL series (M5RBL15, M5RBL22, and M5RBL30) showed an identical coding sequence, for a total of 13 redundant clones.

Each of the remaining 32 cDNA sequences was manually analyzed to identify the TRBV, TRBD, and TRBJ genes through alignment with the germline dolphin TR genes. Figure 5 shows the deduced amino acid sequences of the V-(D)-J-region of all cDNA clones according to the IMGT unique numbering for the V-REGION and V-DOMAIN [52]. In the cDNA clones, we identified 7 different TRBV genes belonging to 7 out of the 12 different subgroups consisting of predicted functional germline genes. In 16 out of 32 clones, the TRBV gene perfectly matched the corresponding germline sequence. The remaining 16 TRBV sequences showed a nucleotide identity from 97 to 99% with respect to the reference germline gene sequence. We referred to these as new alleles, although we cannot rule out sequencing errors or that duplicated genes have been missed. The TRBV30 subgroup gene was the most expressed (17 unique clones, 53%), followed by the TRBV12 (7 clones) and the TRBV4 (4 clones) genes. Only one clone was found for each of four subgroups (TRBV10, TRBV20, TRBV22, and TRBV27).
Moreover, the functionality of the dolphin TRBV12 gene was confirmed, since it gave rise to seven productive transcripts despite the presence of a stop codon in the germline CDR3-IMGT at position 108 (Figure 1B) [52] (IMGT Repertoire, Alignments of alleles, http://www.imgt.org, accessed on 10 February 2021), which was trimmed during rearrangement (Figures 5 and 6). In both subjects analyzed, of the 12 functional genes present in the locus (Figure 1A,B) and described in Table 1, five (TRBV1, TRBV8, TRBV19, TRBV24, and TRBV26) were not found expressed in peripheral blood (Figure 5).

A certain difference between the two analyzed individuals was observed in the usage of the TRBV genes in the recombination events. In fact, all TRBV subgroups were found in the cDNA from the male subject (M5RBL series), while only 3 out of 7 TRBV subgroups were identified in the female (L5RBL series) cDNA. It is interesting to note that the TRBV30 gene is almost equally recurrent (9 out of 23 in the L5RBL series and 8 out of 22 in the M5RBL series) in the transcripts of the two animals, suggesting that this gene may be involved in the expression of a public TRB repertoire (allele *02 shared by both individuals).

All TRBJ genes within the set of cDNA sequences were assigned unambiguously to distinct germline genes derived from the reference sequence, and only in two cases (TRBJ2-7 and TRBJ3-1) did we identify new alleles, which were labeled by consecutive numbers with respect to the germline allele *01 (Figure 5). The TRBJ2 set genes, with a frequency of 20/32 clones, appear to be preferentially expressed over the TRBJ3 set genes, with a frequency of 11/32 clones. A surprising result is the near absence of TRBJ1-1 gene usage, with a single clone (L5RBL4) containing it (Figures 5 and 6).

Determining the contribution of the TRBD genes to CDR3 formation is more complex. For a close inspection, the nucleotide sequences corresponding to the CDR3 were excised from each cDNA and analyzed in detail (Figure 6). By comparison with the TRBD genomic sequences, the nucleotides located in the CDR3 were considered to belong to a TRBD gene if they constituted a stretch of at least five consecutive nucleotides. We were able to assess the contribution of only the TRBD1 and TRBD2 germline sequences; it was not possible to identify the presence of a third TRBD gene because it is missing in the current genomic assembly. In this way, the TRBD was unambiguously identified in 21 out of 32 sequences (65.6%), with TRBD1 present in 16 and TRBD2 in 5 clones. The remaining 11 sequences either do not have an identifiable TRBD gene, perhaps because of the lack of the TRBD3 reference sequence, or it was not possible to distinguish between the TRBD1 and TRBD2 genes.
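The five-nucleotide criterion described above can be illustrated with a small sketch. It is not the procedure actually used (the clones were analyzed manually); it only assumes that a TRBD gene is assigned when the CDR3 shares a run of at least five consecutive nucleotides with exactly one germline TRBD sequence. The function names and the toy sequences below are hypothetical placeholders, not the dolphin germline genes.

```python
def longest_shared_stretch(cdr3, germline):
    """Length of the longest run of consecutive nucleotides common to both."""
    best = 0
    # prev[j] holds the length of the common run ending at
    # cdr3[i-2] and germline[j-1] (previous row of the table).
    prev = [0] * (len(germline) + 1)
    for i in range(1, len(cdr3) + 1):
        curr = [0] * (len(germline) + 1)
        for j in range(1, len(germline) + 1):
            if cdr3[i - 1] == germline[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

def assign_trbd(cdr3, germline_trbd, min_len=5):
    """Return the single TRBD gene supported by >= min_len consecutive
    nucleotides, or None when no gene (or more than one gene) qualifies."""
    hits = [name for name, seq in germline_trbd.items()
            if longest_shared_stretch(cdr3.upper(), seq.upper()) >= min_len]
    return hits[0] if len(hits) == 1 else None

# Toy sequences for illustration only (NOT the real germline TRBD genes).
toy_trbd = {"TRBD1": "GACAGGGTAC", "TRBD2": "CTAGCGTTAA"}

cdr3_with_d = "TGTGCCAGCAGGGTATTCTTT"   # shares CAGGGTA with the toy TRBD1
cdr3_without_d = "TGTGCCTCCGAAACTTTC"   # no run of five shared nucleotides

print(assign_trbd(cdr3_with_d, toy_trbd))
print(assign_trbd(cdr3_without_d, toy_trbd))
```

Returning `None` when both genes qualify mirrors the cases in which TRBD1 and TRBD2 could not be distinguished.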
However, the absence of a TRBD region could also be interpreted as a direct V-J junction, and it is even possible that nucleotide trimming masked the initial participation of the TRBD gene during the rearrangement. The amino acid sequence of the CDR3 loop deduced from the nucleotide sequences is heterogeneous in amino acid composition and length (Figures 5 and 6). The mean length of the CDR3 loop in the dolphin (mean 12.3 amino acids, in a range of 5-16 amino acids) was approximately the same as in the human (mean 12.7 amino acids; [59]), pig (mean 12.2 amino acids; [24]), camel (mean 12.8 amino acids; [25]), and goat (mean 12.3 amino acids; [32]). Finally, although the TRBD gene was not unambiguously recognizable in all of the cDNA clones, the interpretation of these rearrangements revealed that only five clones show intracluster TRBD2-TRBJ2 rearrangements (yellow in Figure 6). Intercluster rearrangements represent a substantial portion of the repertoire, with 9 TRBD1-TRBJ2 (light blue in Figure 6) and 7 TRBD1-TRBJ3 (green in Figure 6) rearrangements.

To calculate the most likely interactions between the putative alpha/beta pairing in complex with MH1 I-ALPHA and B2M, we analyzed the amino acid sequences of the two types of rearranged TRA cDNA, originated by the TRAV20S2*01-J17 (5RAL27) and TRAV18-1*02-J20 (5RAL11) rearrangements, respectively, found in the peripheral blood of the animal identified with the letter L in our previous study [9], and of the relevant TRB cDNA, originated by the TRBV30-D2-J3 (L5RBL13) and TRBV30-D2-J2 (L5RBL27) rearrangements, among the nine found in this study in the peripheral blood of the same animal (TRBV30 clones whose names begin with the letter L in Figures 5 and 6).
The 3hg1.pdb crystallized structure, consisting of the human TR alpha/beta chains in complex with MH1 I-ALPHA and B2M, was identified through pGenThreader and I-TASSER searches and used as a protein template for guiding the 3D comparative modeling analysis (Figure 7). The T. truncatus TR alpha and TR beta chains were modeled by using as protein templates 3hg1.pdb chain D, the human TR alpha, which shares 46 or 41% of identical amino acids with the modeled T. truncatus TR alpha putatively coded by the 5RAL27 or 5RAL11 clonotypes, respectively, and 3hg1.pdb chain E, the human TR beta, which shares 59-71% of identical amino acids with the T. truncatus TR beta putatively coded by the L5RBL13 or L5RBL27 clonotypes. All T. truncatus TRA and TRB sequences showed 100% coverage with the corresponding human TR alpha or TR beta crystallized chains.

The pairwise sequence-structure alignment between the investigated T. truncatus TR alpha and TR beta and the human crystallized counterparts (from 3hg1.pdb) was obtained by using ClustalW [60] implemented in the Jalview package [61]. The obtained sequence-structure pairwise alignment (Supplementary Figure S6A,B) was reported in the SPDBV alignment panel for guiding the building of the 3D comparative models of both TRA and TRB chains, according to validated protocols [41-45]. To study the interactions of the T. truncatus TR alpha and TR beta with MH1 I-ALPHA, we searched for the T. truncatus MH1 I-ALPHA and identified the sequence XP_033706382.1, which shares more than 79% of identical amino acids and 100% coverage with the human MH1 I-ALPHA sequence (3hg1.pdb, chain A). Similarly, we searched for the best-hit T. truncatus B2M counterpart and identified the sequence XP_033707513.1, which shares more than 74% of identical amino acids and 100% coverage with the human B2M sequence. A 3D comparative model of the T. truncatus MH1 I-ALPHA and B2M was built by using as protein templates the corresponding counterparts taken from 3hg1.pdb, starting from the pairwise sequence-structure alignment built by ClustalW, as described for TR alpha and TR beta (Figure 7 and Supplementary Figure S6C,D). The root mean square deviation (RMSD) between the coordinates of the built 3D comparative models and the crystallized 3hg1.pdb ranged between 0.15 and 0.5 Å.

3.8.
Binding Energy Calculations at the Protein Interface between TR Alpha and TR Beta Chains: Computational Analyses Predict the Pairing of the TRAV20-J17 (or TRAV18-J20) and TRBV30-D2-J3-3 Variable Domains

The interaction energies calculated between the TR alpha and TR beta and between the TR alpha/beta and MH1 I-ALPHA were negative (Table 2), suggesting that a binding interaction is possible in all the investigated cases. This result is encouraging, also because the strategy was validated by obtaining negative binding energies for the interactions between the human TR alpha and TR beta chains and the MH1 I-ALPHA chain within the crystallized complex available under the PDB code "3hg1.pdb".

Table 2. Interaction energies estimated at the protein interface between TR alpha and TR beta chains and between the reported TR chains and MH1 I-ALPHA.

Notably, the strongest binding interactions between the investigated T. truncatus TR alpha and TR beta 3D models are observed in the protein complexes containing the L5RBL13_5RAL27 and the L5RBL13_5RAL11 clonotypes (in terms of interaction energies calculated by the FoldX Analyse complex assay) (Table 2, Figure 8A,B), whereas the strongest interaction between TR alpha/beta and MH1 I-ALPHA is observed in the protein complex containing the L5RBL27_5RAL27 clonotypes (Table 2).

Figure caption fragment (beginning not recovered): ... [55, 62]. In the 5RAL11 and 5RAL27 V-alpha domains, the CDR-IMGT are blue-green-green; in the L5RBL13 V-beta domain, the CDR-IMGT are red-orange-purple. Interacting residues at the interface of the alpha/beta TR 3D models with MH1 I-ALPHA in T. truncatus are reported in stick representation (lower left). The protein complex interface was computed by pGenThreader (http://bioinf.cs.ucl.ac.uk/psipred/, accessed on 10 February 2021) and I-TASSER (https://zhanglab.ccmb.med.umich.edu/I-TASSER/, accessed on 10 February 2021).

While the interaction energies estimated at the protein interface within the modeled T. truncatus TR alpha and TR beta appear weaker than those of their counterparts in the crystallized structure (3hg1.pdb), the interaction energies calculated at the protein interface between the modeled T. truncatus TR alpha/beta and MH1 I-ALPHA appear stronger than their counterpart in the human 3hg1.pdb in all cases (Table 2). Interacting amino acids located in the CDR-IMGT of the variable alpha (V-ALPHA) and beta (V-BETA) domains of the TRA/TRB chains are shown in Figure 8A,B. It is interesting to note the recurring position in the DE turn [18] of the FR3-IMGT for the L5RBL13 clonotype (red arrow indicating E84 in Figure 8B), the 5RAL27 clonotype (red arrow indicating K85 in Figure 8A), and the 5RAL11 clonotype (red arrow indicating T82 in Figure 8B).

For a complete picture, according to the comparative modeling, a detailed atom pair contact site analysis of the human 3hg1.pdb based on the IMGT unique numbering is reported in Supplementary Table S5 and Figures 9 and 10. "IMGT pMH contact sites" [63, 64] precisely identify the contacts between the amino acids of a presented peptide and those of the floor and helix walls of the MH groove in 3D structures of pMH and TR/pMH complexes. Eleven standard "IMGT pMH contact sites" were defined (C1-C11). They correspond to a theoretical maximum length of 11 AA in the groove [63, 64]. The peptide binding mode to MH1 is characterized by the N-terminal and C-terminal peptide ends docked deeply in the C1 and C11 contact sites (red and pink, respectively, in the IMGT Collier de Perles, Figure 10). For a peptide of 10 AA, one "IMGT pMH contact site" is absent (C2); for a peptide of 9 AA, two "IMGT pMH contact sites" are absent (C2 and C7); and for a peptide of 8 AA, three pMH contact sites are absent (C2, C7, and C8). The characterization of the "IMGT pMH contact sites" based on contact analysis has superseded the previous identification of "pockets" in the MH groove [63, 64].
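The rule relating peptide length to absent contact sites can be written down compactly. The following is a minimal sketch that encodes only the lengths and absent sites stated above; the sequential pairing of residues with the remaining sites (first residue at C1, last at C11) is an assumption made here for illustration, and the names `ABSENT_BY_LENGTH` and `sites_for_peptide` are hypothetical. The example peptide is the 10-residue MLANA peptide ELAGIGILTV mentioned in this article.

```python
# The eleven standard contact sites C1..C11.
ALL_SITES = [f"C{i}" for i in range(1, 12)]

# Contact sites reported as absent for each peptide length (from the text).
ABSENT_BY_LENGTH = {
    11: [],
    10: ["C2"],
    9: ["C2", "C7"],
    8: ["C2", "C7", "C8"],
}

def sites_for_peptide(peptide):
    """Pair each residue of an 8-11 AA peptide with a contact site.

    Assumption: residues occupy the remaining sites in order, so the
    N-terminal residue sits at C1 and the C-terminal residue at C11.
    """
    absent = ABSENT_BY_LENGTH.get(len(peptide))
    if absent is None:
        raise ValueError("this sketch only covers peptides of 8 to 11 AA")
    used = [site for site in ALL_SITES if site not in absent]
    return list(zip(used, peptide))

pairs = sites_for_peptide("ELAGIGILTV")
for site, residue in pairs:
    print(site, residue)
```

For every covered length, the number of remaining sites equals the number of residues, which is why the pairing is well defined.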
Figure 9 caption (beginning not recovered): ... [65]. Amino acid differences between species are highlighted in yellow (and further in bold in the Tursiops truncatus lines Tt1). The contacts between the Homsap G domains and the TR V domains are displayed in lines Hs2: in (A), G-ALPHA1 contacts with V-ALPHA in green and with V-BETA in pink; in (B), G-ALPHA2 contacts with V-ALPHA in blue and with V-BETA in red (arbitrary colors). The IMGT pMH contact sites [63, 64, 66-68] between the Homsap G domains and the peptide are displayed in lines Hs3 (colors evoking the contact sites). Positions 61A, 61B, and 72A are characteristic of the G-ALPHA2 and are not reported in G-ALPHA1. Hs1, Hs2 and Hs3 are from 3hg1 in IMGT/3Dstructure-DB, http://www.imgt.org (accessed on 10 February 2021).

Figure 10 caption (beginning not recovered): ... [63, 64, 66, 67] between the Homsap G domains and the peptide MLANA (melan-A, MART1) Pr26-35 (ELAGIGILTV A2 > L), also known as melanoma antigen recognized by T cells 1 or MART-1, are displayed with the C1-C11 pMH contact site colors (see also Hs3 in Figure 9).

Considering the features of the contacts between the Homo sapiens G domains and the TR V domains displayed in lines Hs2 of the alignments of the G-ALPHA1 (Figure 9A) and G-ALPHA2 (Figure 9B) domains of T. truncatus MH1 I-ALPHA with Homo sapiens MH1 I-ALPHA HLA-A*0201, the probability that the same positions are involved in atom pair contacts in dolphins is very high (in Figure 9, the dolphin amino acids that differ from 3hg1 are in yellow).

In the superorder Cetartiodactyla, the bottlenose dolphin (T. truncatus) and the other cetaceans represent the most successful mammalian colonization of the aquatic environment and have undergone a radical transformation from the original mammalian body plan. In a previous work [9], we reported genomic and expression studies on the dolphin TR gamma (TRG) and alpha/delta (TRA/TRD) loci. In this study, we report an extensive analysis of the genomic organization, evolution, and expression of the dolphin TRB genes and the 3D comparative modeling of the TR alpha/beta heterodimer in complex with MH1 I-ALPHA and B2M.

According to comparative analyses with humans and Artiodactyla, i.e., dromedary [25, 26], pig [24], and goat [32], the organization of the dolphin TRB locus is similar to that of the other artiodactyl species, with three in tandem D-J-C clusters located at its 3′ end. In its genomic structure, the dolphin TRB locus is the smallest in length (gaps included), comprising the MOXD2 and the EPHB6 genes that flank the 5′ and 3′ ends, respectively. The overall length of 276 kb corresponds to less than half of that of Homo sapiens (620 kb), to about half of that of Capra hircus (558 kb) [32], to a little more than half of that of Sus scrofa (407 kb) [24, 69], and it is slightly smaller than that of Camelus dromedarius (302 kb) [54]. The uniqueness of the dolphin TRB locus, when compared to the other artiodactyls and to humans, is given both by the almost complete absence of duplications in the TRBV region and by the presence of deletions that drastically reduce the number of the variable genes (see orange rectangles in Figure 4).

An in-depth analysis of the genomic structure of the TRBV genes highlights a 1/1 ratio of functional genes to pseudogenes, against a ratio of about 2/1 in the other examined artiodactyls (Capra hircus and Sus scrofa) and in carnivores (Canis lupus familiaris). A 3/1 ratio is present in Camelus dromedarius (Table 1).
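The qualitative comparison of locus lengths given earlier in this section ("less than half", "about half", and so on) can be checked numerically. This is a minimal sketch that uses only the lengths quoted in the text; the names `locus_kb` and `relative_length` are illustrative.

```python
# TRB locus lengths in kb, as quoted in the text.
locus_kb = {
    "Tursiops truncatus": 276,
    "Homo sapiens": 620,
    "Capra hircus": 558,
    "Sus scrofa": 407,
    "Camelus dromedarius": 302,
}

def relative_length(reference, lengths):
    """Return {species: reference length / species length}, rounded to 2 decimals."""
    ref = lengths[reference]
    return {
        species: round(ref / kb, 2)
        for species, kb in lengths.items()
        if species != reference
    }

ratios = relative_length("Tursiops truncatus", locus_kb)
for species, ratio in ratios.items():
    print(f"dolphin / {species}: {ratio}")
```

The resulting fractions (0.45, 0.49, 0.68, and 0.91) correspond to the verbal descriptions used for human, goat, pig, and dromedary, respectively.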
The phylogenetic analysis shows that each of the 20 dolphin TRBV subgroups forms a monophyletic group with the corresponding pig, dromedary, and goat genes, when present. This result is consistent with the occurrence of distinct subgroups prior to the divergence of the mammalian species. The phylogenetic clustering (Figure 3) confirmed the classification of the dolphin TRBV genes derived from the sequence analysis.

In the analysis of the TRB V-D-J rearrangements, intercluster rearrangements were observed to constitute a substantial portion of the repertoire, with 10 TRBD1-TRBJ2 and 6 TRBD1-TRBJ3. Since the 5′ RACE assay used a primer designed on a region shared by all the constant genes, we assume that the 19 clones containing the TRBJ2 set genes and the 12 clones containing the TRBJ3 set genes have the TRBJ gene linked to the corresponding TRBC2 and TRBC3, respectively (Figure 6). Moreover, since the TRBC3 gene is located upstream from the TRBJ2 cluster in the germline DNA (Figure 1A and Supplementary Dataset S1), we cannot rule out that TRBJ2 can be joined to TRBC3 as a product of trans-splicing between a transcript with TRBJ2-TRBC2 genes and a transcript containing the TRBC3 gene [70].

The expression data highlight the preferential usage of TRBV30 in both the analyzed subjects. Out of a total of 45 cDNA clones obtained from the peripheral blood of the two subjects, TRBV30 accounts for 66.7% (30/45 clones). The redundancy of the clones (from +1 to +3) of the series containing TRBV30 is highlighted in Figure 5. The preferential usage of the TRBV30 gene, which lies at the 3′ end of the TRBC2 gene in an inverted transcriptional orientation, represents a condition of diversity in dolphins in comparison with other artiodactyls.
In fact, considering the total number of cDNA clones from peripheral lymphoid organs (spleen and blood), TRBV30 was found in 30/45 clones in dolphin blood, against 2/27 in goat blood [32], 1/26 in pig spleen [24], 0/35 in camel spleen [25], and 0/36 in dog blood [58] reported in the literature. The fact that both unrelated subjects show a biased usage of TRBV30 could be explained by their sharing the same aquatic environment. The presence of common antigens may then have stimulated T cells with a particular type of beta chain to expand, suggesting the existence of a basic "public" repertoire of a given alpha/beta TR [71]. Therefore, we used the rearranged TRA [9] and TRB cDNA clones from the peripheral blood of one of the two analyzed subjects (Leah in this work) to build a 3D comparative model of the T. truncatus TR alpha/beta in complex with MH1 I-ALPHA and B2M. The four possible 3D structures (Figure 7b-e) of TRAV20S2*01-J17 (5RAL27) and TRAV18-1*02-J20 (5RAL11) in combination with the TRBV30-D2-J3 (L5RBL13) and TRBV30-D2-J2 (L5RBL27) clonotypes (Figure 7) retain a very high similarity with the 3hg1.pdb structure of the human TR alpha/beta chains in complex with MH1 I-ALPHA and B2M. The negative values of the interaction energies calculated between the TR alpha and TR beta and between the TR alpha/beta and MH1 I-ALPHA (Table 2) allowed the identification of the interacting amino acids located in the CDR-IMGT at a distance < 4.00 Å. In particular, Tyr 55, located on the edge of CDR2 in the L5RBL13 clonotype (Figure 8A,B), and Tyr 57, on CDR2 in the 5RAL27 clonotype (Figure 8A), correlate well with recent literature [72, 73]. Piepenbrink et al., in their article, reported that TR and MH proteins were refolded from bacterially expressed inclusion bodies.
In this paper, the interaction free energies between two amino acid side chains, obtained in double-mutant cycles, showed how the engagement of Tyr at the center of the interface by side chains of the TRA and TRB CDR contributes significantly, depending on its location, to TR affinity [72]. Nevertheless, our results show that the interaction energies calculated at the protein interface between the modeled T. truncatus TR alpha/beta and MH1 I-ALPHA appear stronger than their counterpart in the human 3hg1.pdb in all the examined cases (Table 2).

The results of this research confirmed the peculiarity of the genomic organization of the TR loci already identified in previous studies in T. truncatus. In marine mammals, there seems to be an evolutionary pressure responsible for reducing the length of the TR loci, in terms of kilobases. This is highlighted by the comparative analyses with other artiodactyls and with humans for both the gamma (TRG) locus in the previous study and the beta (TRB) locus in the present study. The reduction in length of the locus is accompanied by a reduction in the content of the variable genes that are primarily responsible for antigen recognition. Therefore, in addition to having identified the peculiar genomic features and expression of the dolphin TRB locus, we have built, for the first time in a marine mammal, the 3D human/dolphin comparative model of the TR alpha/beta in complex with the major histocompatibility class I (MH1) protein and beta-2 microglobulin (B2M). Our results of the human/dolphin modeling, integrated with the atom pair contact site analysis between the human MH1 groove (G) domains and the TR V domains, confirm conservation of the structure of the dolphin TR/pMH.

The following are available online at https://www.mdpi.com/article/10.3390/genes12040571/s1, Table S1: Description of the TRB genes in the T. truncatus genome assembly NC_047042.1 (NCBI Reference Sequence).
The position of all genes and their classification and functionality are reported, Table S2: Description of the unrelated TRB genes in the T. truncatus genome assembly NC_047042.1 (NCBI Reference Sequence). The position of all genes and their classification and functionality are reported, Table S3: GEDI accession numbers for TRBV genes of Camelus dromedarius, Sus scrofa, Capra hircus, and T. truncatus. The position of all genes and their functionality are reported, Table S4: Description of the Turtru TRBV pseudogenes, Table S5: Contact analysis 3hg1, Figure S1: Flowchart that depicts the overall experimental setting and analysis, Figure S2: Structure of the dolphin trypsin-like serine protease (TRY) genes within the TRB locus (NC_047042.1). TRY1, TRY2, and TRY4 are partially annotated. The exons are represented with boxes and their positions within the NC_047042.1 sequence are reported. The red arrowheads indicate the transcriptional orientation of the genes, Figure S3: Dot-plot matrix of dolphin/dromedary TRB sequence genomic comparison. The transcriptional orientation of each gene is indicated by arrowheads. The colored rectangle (orange) encloses TRBV deleted regions in dolphin as referred to in the text. The green line indicates the only region where gene duplication took place in the T. truncatus TRB locus, Figure S4: Dot-plot matrix of dolphin/pig TRB sequence genomic comparison. The transcriptional orientation of each gene is indicated by arrowheads. The colored rectangle (orange) encloses TRBV deleted regions in dolphin as referred to in the text. The green line indicates the only region where gene duplication took place in the T. truncatus TRB locus, Figure S5: Dot-plot matrix of dolphin/goat TRB sequence genomic comparison. The transcriptional orientation of each gene is indicated by arrowheads.
Colored rectangles (red) enclose TRBV duplicated regions in goat and (orange) TRBV deleted regions in dolphin as referred to in the text, Figure S6A: Sequence-structure alignment of the T. truncatus TR alpha chains with the human TR alpha chains from the crystallized structures 3hg1.pdb and 5d2l.pdb, Figure S6B: Sequence-structure alignment of the T. truncatus TR beta chains with the human TR beta chains from the crystallized structures 3hg1.pdb and 5d2l.pdb, Figure S6C: Sequence-structure pairwise alignment of the human MH1 I-ALPHA (from 3hg1.pdb) with its closest counterpart in T. truncatus, Figure S6D: Sequence-structure pairwise alignment of the human B2M (from 3hg1.pdb) with its closest counterpart in T. truncatus, Dataset S1: Schematic representation (20X zoom) of the Tursiops truncatus TRB locus D-J-C region deduced from the genome assembly mTurTru1.mat.y (CM022282.1). The boxes representing the genes are not to scale.

The financial support of the University of Bari and of the University of Salento is gratefully acknowledged. This research was funded by "Programma Operativo Nazionale Ricerca e Innovazione 2014-2020-Fondo Sociale Europeo Azione I.2. Attrazione e mobilità internazionale dei ricercatori. Area strategica: Blue Growth".

Institutional Review Board Statement: Zoomarine is a seaside water park. The availability of blood in this study was a byproduct of standard health checks. The blood samples were taken from the caudal vein on the ventral surface of the caudal fin. Training in medical behaviors leads the animals to cooperate fully, assuming and maintaining the positions that allow the veterinarian, constantly assisted by the trainers, to perform the procedures necessary to monitor their welfare. Zoomarine Italia SpA., via Casablanca 61, 00071 Pomezia (RM), Italy, https://www.zoomarine.it/ (accessed on 10 February 2021).
Origin of whales from early artiodactyls: Hands and feet of Eocene Protocetidae from Pakistan
A complete phylogeny of the whales, dolphins and even-toed hoofed mammals (Cetartiodactyla)
The Ancestor's Tale: A Pilgrimage to the Dawn of Evolution
A map of the cis-regulatory sequences in the mouse genome
Evolution of toll-like receptors in the context of terrestrial ungulates and cetaceans diversification
Cytochrome b marker reveals an independent lineage of Stenella coeruleoalba in the Gulf of Taranto
Analyses of RAG1 and RAG2 genes suggest different evolutionary rates in the Cetacea lineage
The T Cell Receptor FactsBook
Genomic and expression analyses of Tursiops truncatus T cell receptor gamma (TRG) and alpha/delta (TRA/TRD) loci reveal a similar basic public repertoire in dolphin and human
The deduced structure of the T cell receptor gamma locus in Canis lupus familiaris
The Camel Adaptive Immune Receptors Repertoire as a Singular Example of Structural and Functional Genomics
Comprehensive genomic analysis of the dromedary T cell receptor gamma (TRG) locus and identification of a functional TRGC5 cassette
Genomic organization and recombinational unit duplication-driven evolution of ovine and bovine T cell receptor gamma loci
Evolution of TRG clusters in cattle and sheep genomes as drawn from the structural analysis of the ovine TRG2@ locus
Artiodactyl emergence is accompanied by the birth of an extensive pool of diverse germline TRDV1 genes
Ovis aries) T cell receptor alpha (TRA) and delta (TRD) genes and genomic organization of the TRA/TRD locus
Biocuration and Comparative Analysis of Bos taurus and Ovis aries TRA/TRD Loci
Immunoglobulin (IG) and T cell receptor genes (TR): IMGT® and the birth and rise of immunoinformatic
Seven new dolphin mitochondrial genomes and a time-calibrated phylogeny of whales
Database Resources of the National Center for Biotechnology Information
The complete 685-kilobase DNA sequence of the human beta T cell receptor locus
Locus maps and genomic repertoire of the human T cell receptor genes
Sequence and evolution of the human T-cell antigen receptor beta-chain genes
Overview of the germline and expressed repertoires of the TRB genes in Sus scrofa
The occurrence of three D-J-C clusters within the dromedary TRB locus highlights a shared evolution in Tylopoda, Ruminantia and Suina
Data charactering the genomic structure of the T cell receptor (TRB) locus in Camelus dromedarius
Organization, structure and evolution of 41 kb of genomic DNA spanning the D-J-C region of the sheep TRB locus
PipMaker-a web server for aligning two genomic DNA sequences
Protein displays of the human T cell receptor alpha, beta, gamma and delta variable and joining regions
WHO-IUIS Nomenclature Subcommittee for immunoglobulins and T cell receptors. WHO-IUIS Nomenclature Subcommittee for immunoglobulins and T receptors report
The expansion of the TRB and TRG genes in domestic goats (Capra hircus) is characteristic of the ruminant species
MUSCLE: A multiple sequence alignment with reduced time and space complexity
MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets
The neighbor-joining method: A new method for reconstructing phylogenetic trees
Molecular Evolution and Phylogenetics
The highly customized and integrated system for IG and TR standardized V-J and V-D-J sequence analysis
IMGT/V-QUEST: IMGT standardized analysis of the immunoglobulin (IG) and T cell receptor (TR) nucleotide sequences
The first tool for the analysis of the immunoglobulin and T cell receptor complex V
IMGT standardized analysis of the V-J and V-D-J junction of the rearranged immunoglobulins (IG) and T cell receptors (TR)
New methods for improved protein fold recognition and superfamily discrimination
Protein Structure and Function Prediction Using I-TASSER
FAD/NADH Dependent Oxidoreductases: From Different Amino Acid Sequences to Similar Protein Shapes for Playing an Ancient Function
Computational approaches for protein function prediction: A combined strategy from multiple sequence alignment to molecular docking-based virtual screening
An environment for comparative protein modeling. Electrophoresis
Protein structure analysis of the interactions between SARS-CoV-2 spike protein and the human ACE2 receptor: From conformational changes to novel neutralizing antibodies
Molecular modeling of antibodies for the treatment of TNFα-related immunological diseases
Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: Four approaches that performed well in CASP
Germ line-governed recognition of a cancer epitope by an immunodominant human T-cell receptor
A graphical interface for the FoldX forcefield
The FoldX web server: An online force field
IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains
Genomic analysis reveals extensive gene duplication within the bovine TRB locus
Comparative analysis of the TRB locus in the Camelus genus
IMGT unique numbering for immunoglobulin and T cell receptor constant domains and Ig superfamily C-like domains
Confidence limits on phylogenies: An approach using the bootstrap
Prospects for inferring very large phylogenies by using the neighbor-joining method
New insight into the genomic structure of dog T cell receptor beta (TRB) locus inferred from expression analysis
Healthy human T-cell receptor beta-chain repertoire.
Quantitative anlysis and evidence for J beta-related effects on CDR3 structure and diversity Bioinformatics in protein analysis Jalview Version 2-a multiple sequence alignment editor and analysis workbench Perles for the variable (V), constant (C), and groove (G) domains of IG T cell receptor/peptide/MHC molecular characterization and standardized pMHC contact sites in IMGT/3Dstructure-DB IMGT standardization for molecular characterization of the T cell receptor/peptide/MHC complexes IMGT unique numbering for MHC groove G-DOMAIN and MHC superfamily (MhcSF) G-LIKEDOMAIN A database and a tool for immunoglobulins or antibodies, T cell receptors, MHC, IgSF and MhcSF IMGT/3Dstructure-DB: Querying the IMGT database for 3D structures in immunology and immunoinformatics (IG or antibodies, TR, MH, RPI, and FPIA) IMGT standardized representation of domains (IG, TR, and IgSF variable and constant domains, MH and MhSF groove domains) IMGT Biocuration and Comparative Study of the T Cell Receptor Beta Locus of Veterinary Species Based on Homo sapiens TRB. Front Extensive analysis of D-J-C arrangements allows the identification of different mechanisms enhancing the diversity in sheep T cell receptor β-chain repertoire Method for assessing the similarity between subsets of the T cell receptor repertoire The basis for limited specificity and MHC restriction in a T cell receptor interface Coevolution of T-cell receptors with MHC and non-MHC ligands We thank Stefania Brandini and Angela Pala for technical assistance in cDNA cloning experiments and Vincenzo Tragni, for his support in computational analyses. We thank all the staff of Zoomarine Italy and all the trainers who dedicate themselves every day to dolphins. Through their work, they make possible the study of medical behaviors. We thank the members of the IMGT ® team for their expertise and constant motivation. The authors declare no conflict of interest.