key: cord-0895875-mm4zfp8u authors: Fukuzawa, Kaori; Kato, Koichiro; Watanabe, Chiduru; Kawashima, Yusuke; Handa, Yuma; Yamamoto, Ami; Watanabe, Kazuki; Ohyama, Tatsuya; Kamisaka, Kikuko; Takaya, Daisuke; Honma, Teruki title: Special Features of COVID-19 in the FMODB: Fragment Molecular Orbital Calculations and Interaction Energy Analysis of SARS-CoV-2-Related Proteins date: 2021-09-10 journal: J Chem Inf Model DOI: 10.1021/acs.jcim.1c00694 sha: 7b67fdbe5180634daad6b41776ae859f46bbbbf1 doc_id: 895875 cord_uid: mm4zfp8u SARS-CoV-2 is the causative agent of coronavirus disease 2019 (COVID-19), the disease behind the current pandemic. Research to develop effective therapeutics and vaccines against COVID-19 is ongoing using various methods, and many results have been published. The structure-based drug design of SARS-CoV-2-related proteins is promising; however, reliable information regarding the structures and the intra- and intermolecular interactions is required. We have conducted studies based on the fragment molecular orbital (FMO) method for calculating the electronic structures of protein complexes and analyzing their molecular interactions quantitatively. This enables us to extensively analyze the molecular interactions acting inside the protein complexes in units of residues or functional groups. Such precise interaction data are available in the FMO database (FMODB) (https://drugdesign.riken.jp/FMODB/). Since April 2020, we have performed FMO calculations on the structures of SARS-CoV-2-related proteins registered in the Protein Data Bank. We have published the results of 681 structures, including three structural proteins and 11 nonstructural proteins, on the COVID-19 special page (as of June 8, 2021). In this paper, we describe the entire COVID-19 special page of the FMODB and discuss the calculation results for various proteins.
These data aid not only the interpretation of experimentally determined structures but also the understanding of protein functions, which is useful for rational drug design for COVID-19.
The COVID-19 pandemic has been ongoing since its declaration by WHO in March 2020, and as of June 8, 2021, it had killed 3 718 944 people and infected 172 million people worldwide. 1 COVID-19 has had a devastating impact on global health and economic activities, and the loss of social infrastructure due to urban lockdown has reached unprecedented levels, with no signs of abatement. To fight against COVID-19, there is a need to understand the causative virus, SARS-CoV-2, and develop effective vaccines and therapies. Consequently, structural analyses of SARS-CoV-2-related proteins have been rapidly conducted, and the resulting structures are available worldwide in the Protein Data Bank (PDB). 2 In addition to experimental structures, researchers worldwide have collated model structures, molecular dynamics (MD) simulation results, and bioinformatics data and made them available as data resources. 3 The PDB Japan (PDBj) 4, 5 has categorized PDB data of SARS-CoV-2-related proteins into "All entries," "Representative entries," and "Latest entries" on the COVID-19 special page, and 1277 structures are available as of . The viral proteins include four structural proteins that form virus particles and 16 nonstructural proteins (nsps) produced intracellularly after the virus infects human cells. The structures of three structural proteins and 13 nsps have already been determined by cryoelectron microscopy (Cryo-EM), X-ray crystallography, and NMR. Although these structural data are very useful for developing therapeutic agents, there is a need to precisely calculate and clarify how these viral proteins interact with each other and when candidate therapeutic compounds bind strongly to the proteins.
Based on these PDB structures, many molecular simulation studies have been actively carried out, and the most widely performed are classical MD calculations. For example, microsecond MD simulations 6−8 of SARS-CoV-2-related proteins have been performed using special-purpose computers such as Anton2 9 and MDGRAPE-4A, 10 and general-purpose software such as GENESIS-2.0, 11 NAMD, 12 and GROMACS. 13 In particular, many dynamical structural analysis studies have been performed, such as the movement of the open structure of the spike (S) protein, 8 the evaluation of the effect of glycans, 8, 14, 15 the binding mechanism of the receptor-binding domain (RBD) of the S protein to the angiotensin-converting enzyme 2 (ACE2) or antibodies, 15 the binding of the main protease (Mpro) to inhibitors, 6, 16 and the apo and the remdesivir-bound structures of RNA-dependent RNA polymerase (RdRp). 17, 18 As for quantum mechanical (QM) calculations, the fragment molecular orbital (FMO) method 19, 20 is one of the methods that can treat the entire protein with uniform accuracy. FMO calculations for several specific SARS-CoV-2-related proteins, such as the S protein, 21−25 Mpro, 26−30 and RdRp, 31 have been performed using the ABINIT-MP program 32, 33 and the GAMESS program. 34−36 The ABINIT-MP program used in this paper is capable of performing precise electron correlation calculations at the MP4(SDQ) level for S proteins of up to 3300 residues using the Fugaku supercomputer. 22 On the other hand, the FMO-DFTB method included in GAMESS is suitable for screening owing to its much lower computational cost. 37 The FMO-DFTB method has also been employed as a part of an in silico drug discovery pipeline using supercomputers. 38 In contrast to classical MD calculations, which reveal the dynamic behavior and molecular mechanisms of proteins, QM calculations can reveal the precise electronic structure and interactions in a stationary structure (often an X-ray or cryo-EM structure).
These two types of molecular simulations are used in complementary ways. Here, we have performed FMO-based 19, 20 quantum chemical calculations on the entire structures of SARS-CoV-2-related proteins, focusing on the representative entries of PDBj, and all of the results have been published in the FMO database (FMODB) 39, 40 since April 2020. In FMODB, all FMO calculation results can be downloaded and molecular interaction analysis can be performed through the web interface and the BioStation Viewer software. 41 In this paper, we describe the FMO calculation results (681 structures as of June 8, 2021) available on the COVID-19 special page of the FMODB. We performed FMO calculations for 681 protein structures, mainly from the representative entries published on the COVID-19 special page of the PDBj, to reveal their precise interaction energies. MOE 42 was employed for molecular modeling, and structural refinement was performed according to the resolution of the registered structures following the procedure shown in Figure 1 . Here, missing atoms in the PDB structure are complemented, and appropriate structural optimization is performed with constraints to avoid large deviations from the experimental structure. The level of structural optimization is changed according to the experimental method and its resolution but, for some data, exceptions are made, such as performing structural optimization for nonhydrogen atoms, even when the resolution is less than 2.0 Å. The basic fragmentation method for FMO calculations is as follows: proteins are fragmented into amino acid residue units, nucleic acids are fragmented into backbone and base units, and ligands are fragmented into one or several fragments. 43 Quantum chemical calculations were performed at the theoretical level of FMO2-MP2/6-31G* 44, 45 to calculate the total energy ($E_{\mathrm{total}}$), atomic charge, and interfragment interaction energy (IFIE; $\Delta\tilde{E}_{IJ}$).
The IFIE can be further decomposed into four energy components: the electrostatic (ES), exchange repulsion (EX), charge transfer with higher-order terms (CT+mix, hereafter simply referred to as CT in the text), and dispersion (DI) terms. This is known as the pair interaction energy decomposition analysis (PIEDA):

$$\Delta\tilde{E}_{IJ} = \Delta\tilde{E}_{IJ}^{\mathrm{ES}} + \Delta\tilde{E}_{IJ}^{\mathrm{EX}} + \Delta\tilde{E}_{IJ}^{\mathrm{CT+mix}} + \Delta\tilde{E}_{IJ}^{\mathrm{DI}} \qquad (2)$$

PIEDA is vital in applying the FMO method to drug discovery because it gives information about the character of each interaction, in addition to its magnitude (stable or unstable). For interpreting PIEDA, refer to FMO books 20, 34 and the original reference. 46 Hydrogen bonding is mainly detected as the stabilization energy for the ES and CT terms, and hydrophobic interactions, such as CH/π and π−π, are mainly detected as the stabilization energy for the DI term. Among the data registered in the FMODB, for entries for which the binding energy can be defined, such as protein−ligand, protein−protein, and protein−RNA, the binding energy is calculated from the sum of IFIEs. The binding energy between molecules A and B ($\Delta E_{AB}$) is expressed as follows:

$$\Delta E_{AB} = \sum_{I \in A} \sum_{J \in B} \Delta\tilde{E}_{IJ} \qquad (3)$$

All calculations in the FMODB were performed using ABINIT-MP, 32 39 Each FMODB ID page contains information on the PDB structures used and the calculation conditions, and the entire dataset, including ABINIT-MP input and output files, can be downloaded. In addition, the binding energies and IFIE/PIEDA energy lists are provided so that the interaction data can be analyzed using the web interface. BioStation Viewer 41 can be used to perform a detailed analysis using the downloaded data. As an example, Figure 2 shows a detailed view of the FMODB for the complex structure of the main protease and ligand. The upper part shows the structural data and calculation conditions, and the lower part shows the interaction energies with amino acid residues near the ligand by the PIEDA energy component.
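The summation of IFIEs between the fragments of two molecules, and the split of each IFIE into its four PIEDA terms, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Pieda` class, the `binding_energy` function, the fragment labels, and all energy values are hypothetical toy inputs, not FMODB data and not part of ABINIT-MP or the FMODB web interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pieda:
    """PIEDA terms of one fragment pair in kcal/mol (toy values only)."""
    es: float      # electrostatic
    ex: float      # exchange repulsion
    ct_mix: float  # charge transfer with higher-order terms
    di: float      # dispersion

    @property
    def total(self) -> float:
        # IFIE of the pair: sum of the four PIEDA terms.
        return self.es + self.ex + self.ct_mix + self.di


def binding_energy(pairs, frags_a, frags_b):
    """Sum the IFIEs over all pairs (I in A, J in B).

    `pairs` maps an unordered fragment pair, stored as a tuple, to its
    PIEDA terms. Pairs that are absent from the mapping contribute zero.
    """
    total = 0.0
    for i in frags_a:
        for j in frags_b:
            term = pairs.get((i, j)) or pairs.get((j, i))
            if term is not None:
                total += term.total
    return total


# Hypothetical example: one ligand fragment and two protein fragments.
toy_pairs = {
    ("L1", "P1"): Pieda(es=-10.0, ex=2.0, ct_mix=-1.5, di=-0.5),
    ("L1", "P2"): Pieda(es=-3.0, ex=4.0, ct_mix=-1.0, di=-6.0),
    ("P1", "P2"): Pieda(es=-5.0, ex=1.0, ct_mix=-0.5, di=-1.0),  # intra-protein, ignored
}
toy_binding = binding_energy(toy_pairs, ["L1"], ["P1", "P2"])
print(toy_binding)
```

Only pairs that cross the A/B boundary enter the sum, so the intra-protein pair in the toy data does not contribute.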
Conditions, such as the range to be displayed, can be specified. SARS-CoV-2 is a single-stranded positive-sense RNA virus composed of four structural proteins and RNA genes. After infecting a host cell, the virus synthesizes 16 types of nsps. The S protein, a structural protein that protrudes from the viral surface, plays an important role in the process of viral infection of the hosts, making it an important target for vaccines and therapeutic agents. 48, 49 Among the nsps, the proteases and RdRp are also promising drug targets. Two proteases of SARS-CoV-2 are involved in the cleavage of polyproteins synthesized from RNA to produce various proteins. The Mpro cleaves the polyprotein at 11 sites, and the papain-like protease (PLpro) cleaves the polyprotein at three specific sites. 50 RdRp and helicase are the key enzymes in genome replication and form a replication and transcription complex (RTC). 51 The number of registered structures on June 8, 2021 is shown in Figure 3 . About 50% of the 681 structures registered in the FMODB are Mpro structures, followed by the S protein and PLpro. In addition, ADP ribose phosphatase, endoribonuclease, 2′-O-ribose methyltransferase, helicase, etc., have been widely registered. The experimentally determined structures of these SARS-CoV-2-related proteins are also shown in Figure S1. The number of entries per protein depends on the number of representative entries in the PDBj site, and the FMODB preferentially collects the structures with the highest resolution among the PDB entries with 100% identical amino acid sequences, i.e., representative structures in the PDBj. The FMODB also preferentially collects the structures whose primary citation information is publicly available from the PDB site. The following are typical examples of registered structures. 3.1. Papain-like Protease (nsp3).
PLpro is an essential coronavirus enzyme required for the cleavage of viral polyproteins at three sites, and the catalytic residues for processing are Cys111 and His272. 52, 53 The substrate-binding site of PLpro is exposed to solvent and contains a flexible β-hairpin loop (G266−G271) called the blocking loop 2 (BL2), which changes its structure significantly depending on the substrate and inhibitor configuration. 54 PLpro is also involved in the removal of post-translational modifications from host proteins (deubiquitination) as an evasion mechanism against the host antiviral immune responses. 55, 56 Some examples of covalent and noncovalent ligand complexes registered in the FMODB are as follows. As an example of FMO calculations, we show the results for the X-ray crystal structure of the tetrapeptide Ac-hTyr-Dap-Gly-Gly-VME (VIR251) complex, which is one of the peptide inhibitors of PLpro and is covalently bound to PLpro at Cys111 (PDB ID: 6WX4, 57 FMODB ID: YQG52). VIR251 strongly inhibits the activity of both SARS-CoV PLpro and SARS-CoV-2 PLpro. 57 Although VIR251 is a four-residue ligand, we treated the entire ligand as a single fragment (the residue names of the ligands are registered as LIG). Figure 4 shows the interaction between the entire ligand molecule and its surroundings and the interaction energy quantified by PIEDA. The binding energy between VIR251 and SARS-CoV-2 PLpro, according to eq 3, was −170.6 kcal/mol (excluding the interaction with Cys111, which is covalently bound to VIR251). ES (−132.9 kcal/mol) contributed most to the binding energy, followed by DI (−84.2 kcal/mol) and CT (−41.3 kcal/mol). Since VIR251 has a positive charge (+1e), the ES interaction (−107.2 kcal/mol) with the acidic residue Asp164, which forms a hydrogen bond with the NH of hTyr, contributes significantly to the binding energy.
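The EX contribution for VIR251 is not quoted in the text, but it can be estimated from the reported values. The short Python calculation below is a consistency check under two assumptions that are ours rather than the source's: that the quoted ES, DI, and CT totals cover the same fragment pairs as the binding energy (all pairs except Cys111), and that the four PIEDA terms add up to the binding energy. The variable names are illustrative.

```python
# Values reported in the text for VIR251 bound to PLpro (kcal/mol).
binding_energy_total = -170.6
es_total = -132.9
di_total = -84.2
ct_total = -41.3

# Assumption: total = ES + EX + CT + DI over the same set of fragment pairs,
# so the exchange repulsion is the remainder after the attractive terms.
attractive_sum = es_total + di_total + ct_total
ex_implied = binding_energy_total - attractive_sum

print(round(attractive_sum, 1))  # sum of the three quoted attractive terms
print(round(ex_implied, 1))      # implied exchange-repulsion term
```

Under these assumptions the three attractive terms sum to −258.4 kcal/mol, which implies a repulsive EX term of about +87.8 kcal/mol. A positive EX of this size would be in line with a ligand that makes many close contacts, but the figure is an inference, not a value taken from the FMODB entry.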
In addition, the ES and CT components of the hydrogen bond-forming residues, such as Gly163, Tyr264, Tyr268, Gly271, etc., and the DI component due to the CH/π interactions with surrounding hydrophobic residues, such as Leu162, Pro248, Tyr264, etc., also contributed significantly to the binding energy. In the interaction with the catalytic His272 fragment, a dispersion interaction derived from the imidazole ring of His272 was observed, in addition to the hydrogen bond with the CO derived from Gly271. Note that the CO from Gly271 belongs to the His272 fragment due to fragmentation. Strong interactions between VIR251 and several water molecules were also detected, suggesting that the role of water molecules in bridging ligands and PLpro is important in designing new inhibitors. Detailed energetic analyses of this kind can be performed freely using the FMODB. Next, the FMO results for the X-ray crystal structure of the tetramer with the neutral molecule GRL0617 bound as a noncovalent ligand have also been registered for each monomer (PDB ID: 7CMD, 54 FMODB ID: R1278, Z284N, 6321Z, 2KQVR). GRL0617 has an IC 50 of 2.2 ± 0.3 μM and is considered a promising candidate compound. 54 The average binding energy of GRL0617 and each monomer registered in FMODB was −112.5 ± 2.7 kcal/mol, which is about two-thirds of that of the positively charged covalent ligand VIR251. The number of interacting residues was also smaller than that of VIR251, but the contributions of the CH/π interaction with Tyr268 in the BL2 loop, the hydrogen bond with Gln269, and the hydrogen bond and CH/π interaction with Asp164 in the α3-to-α4 loop were large ( Figure 5 ), suggesting that GRL0617 inhibits the inversion of the BL2 loop by interacting with these loops, thereby inhibiting the recognition of the substrate, which is consistent with the structural considerations. 54 3.2. ADP Ribose Phosphatase (nsp3). Nsp3 is the largest multidomain protein and includes the ADP ribose phosphatase domain.
Although its function has not yet been elucidated, it is inferred to be related to the ability of proteins to remove ADP-ribose from ADP-ribosylated proteins and RNA. 58 The X-ray crystal structures of the complexes with ADP, AMP, etc., have been analyzed and they include catalytic water. 58 The complex of ADP ribose phosphatase and ADP ribose (PDB ID: 6W02, FMODB ID: 1JLYZ, Figure 6a ), which is drawn based on the atomic coordinates of the PDB crystal structure, shows that only the interaction between Asp22 and Lys44 is extracted. On the other hand, the PIEDA energies of the adenosine portion of ADP ribose in FMO calculations (Figure 6b ,c) show that adenine is stabilized by DI with Phe156 and ES with Asp22 and Ile23 (CO of the Asp22 main chain in the biochemical description). The side chains of Phe156 and Ile23 interact with the adenine base in a CH/ π interaction, indicating a relatively large DI component. A similar analysis on the diphosphate moiety of the ligand showed strong ES interactions with the two surrounding hydrogen-bonded water molecules, Val49, Gly130, Ile131, and Phe132. The analysis of the ribose moiety showed strong ES interactions with Ala39, Asn40, and Gly48. Lys44 also showed strong ES interactions with the diphosphate moiety, although they were separated by more than 4.5 Å. The details of the interaction energy values from these single fragments are available on the FMODB web interface. When the ligand consists of multiple fragments, the BioStation Viewer 41 can be used for more detailed ligand interaction energy analysis. Figure S2 shows a mapping of the interaction energies between the entire ligand molecule and the surrounding residues. From the whole ligand molecule, the interactions with two water molecules (HOH332 and HOH384) and Gly130, which are involved in the phosphatase activity, are more clearly shown. 3.3. Main Protease (nsp5). 
The Mpro is an enzymatic protein encoded in the viral genome, which is one of the two coronavirus proteases responsible for the cleavage of polyproteins translated from genomic RNA and is involved in the maturation of viral proteins. 50 The Mpro has been studied worldwide as one of the most important drug targets for COVID-19. In structural biology research, the first crystal structure of SARS-CoV-2-related protein was the covalent complex structure of the Mpro and the N3 inhibitor (PDB ID: 6LU7 59 ). The structures of the Mpro in the PDB are obtained from X-ray crystallography. Most of the structures are complex with ligands, and the binding mode is either noncovalent or covalent with Cys145. For the complex of the Mpro and the N3 inhibitor (PDB ID: 6LU7), FMO calculations were performed and registered in the FMODB by Hatada et al. 26 when the PDB structure was published (FMODB ID: R1GK8). In addition, statistical interaction analyses using FMO calculations for several sampled structures from classical MD calculations have been reported. 27 Among the COVID-19-related proteins in the FMODB, the Mpro has the highest number of registered structures. The Mpro is a three-domain cysteine protease, and the substrate-binding pocket consists of four subsites (S1′, S1, S2, and S4) and the Cys145−His41 catalytic dyad. 60, 61 Here, we classify the features of the pockets regarding the interaction of 110 ligand complex structures registered in the FMODB (Figure 7) . Figure 7a shows the superposition of eight covalent structures and one noncovalent structure, which are classified as representative structures in the PDBj. The covalent structures, in particular, contact all four subsites. We performed a self-organizing map (SOM) clustering analysis 62 of the interactions based on PIEDA energies for the 110 ligand complex structures. Figure 7b shows 14 clusters to which four or more ligands belong (figures for each cluster can be found in Figure S3 ). 
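The kind of SOM clustering described above can be illustrated with a minimal, pure-Python self-organizing map. This is a sketch under stated assumptions, not the procedure or software used for Figure 7b: the feature vectors are synthetic stand-ins for per-residue PIEDA energies, and the grid size, learning-rate schedule, and function names (`train_som`, `best_matching_unit`) are hypothetical choices made for the example.

```python
import math
import random
from collections import Counter


def best_matching_unit(x, weights):
    """Index of the SOM node whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, weights[k])))


def train_som(data, rows, cols, n_iter=3000, lr0=0.5, seed=0):
    """Train a small rectangular SOM and return one weight vector per node."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise each node from a randomly chosen sample.
    weights = [list(rng.choice(data)) for _ in grid]
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighbourhood radius
        x = rng.choice(data)
        bmu_r, bmu_c = grid[best_matching_unit(x, weights)]
        for k, (r, c) in enumerate(grid):
            d2 = (r - bmu_r) ** 2 + (c - bmu_c) ** 2
            h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
            w = weights[k]
            for f in range(len(w)):
                w[f] += lr * h * (x[f] - w[f])
    return weights


# Synthetic stand-in for 110 ligands x 4 residue interaction energies (kcal/mol).
gen = random.Random(1)
centers = [[-20.0, 0.0, 0.0, -5.0],
           [0.0, -15.0, -8.0, 0.0],
           [-3.0, -3.0, -12.0, -10.0]]
data = [[c + gen.gauss(0.0, 1.0) for c in centers[i % 3]] for i in range(110)]

weights = train_som(data, rows=3, cols=3)
labels = [best_matching_unit(x, weights) for x in data]
counts = Counter(labels)
# Keep only the nodes (clusters) with four or more ligands, as in the text.
large_clusters = sorted(k for k, n in counts.items() if n >= 4)
print(large_clusters, [counts[k] for k in large_clusters])
```

Each ligand is assigned to the node whose weight vector is nearest to its interaction profile, and nodes with fewer than four members are dropped, mirroring the cluster-size threshold quoted for Figure 7b.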
Most of the clusters have ligands localized to a single site, whereas in some clusters the ligands are bound across multiple sites. All FMO calculation results for these structures are registered in the FMODB and can be accessed by anyone. Here, we discuss one example of each of the covalent and noncovalent ligands shown in Figure 7a . One of the covalent ligands, GC-376, is covalently bound to Cys145 of the Mpro after conversion to GC-373. The S1, S2, and S4 binding sites interact with the substituents of the ligand (Figure 8a) , and PIEDA analysis showed that the interacting residues are His41, His163, Met165 (His164 CO), Glu166, and Gln189 (Figure 8b,c) . In the S1 site, the γ-lactam of the ligand is hydrogen-bonded to His163, the side chain of Glu166, and Met165 (CO of the His164 main chain in the biochemical notation), resulting in a strong ES + CT interaction. In the S2 site, the isopropyl group of the ligand interacts with His41 of the catalytic dyad via a CH/π interaction. In the S4 site, the amide moiety of the ligand is hydrogen-bonded to the main chains of Glu166 and Gln189, and these interactions are stabilized by ES + CT. Gln189 also has a large DI component, which may be attributed to the dispersion interaction between the benzene ring of the ligand and the α and β carbons of the Gln189 side chain. The benzene ring also has a dispersion interaction with the methyl group of Met165 (Figure 8c ). 63 The binding energy is −130.6 kcal/mol, excluding Cys145, which is covalently bound. In the X-ray structure of the complex of the Mpro with the noncovalent ligand X77 (PDB ID: 6W79, FMODB ID: N3QNQ), the specific interactions with His41, Asn142, Gly143, His163, Met165, and Glu166 were confirmed by FMO calculations (Figure 9 ). According to the PIEDA components shown in Figure 9b,c, as in the case of the covalent ligand, the CH/π interaction (DI) with His41 and the hydrogen bonds (ES + CT) with His163 and Met165 (CO of His164) are maintained, but there is no DI contribution from the Met165 side chain.
For Glu166, the hydrogen bond in the main chain (ES + CT) is preserved but the side-chain hydrogen bond is absent, and a CH/π interaction is formed between the β-carbon and the pyridine ring of the ligand (shown as the DI component). On the other hand, several additional interacting residues are observed around the imidazole ring. In particular, there is a CH/O bond with Asn142, a hydrogen bond with Gly143, and a dispersion interaction with Cys145 since it is a noncovalent ligand. The ligand-binding energy is −149.4 kcal/mol, which is more stabilizing than that of the covalent ligand (−130.6 kcal/mol, excluding the covalently bound Cys145). A comparison between Figures 8c and 9c shows that the noncovalent ligand has more interactions than the covalent ligand. 3.4. RNA-Dependent RNA Polymerase (nsp12) Complex. RdRp is an RNA synthesis enzyme involved in viral genome replication, forming the RTC with cofactors nsp7−nsp9 and helicase nsp13 51,64 ( Figure 10 ). RdRp is responsible for replicating the genome, which is most important for the survival of the virus. RdRp is an important target of antiviral drugs, including remdesivir and favipiravir, and several ligand-containing PDB structures have been published. The structures registered in the FMODB are the RdRp−nsp7−nsp8−RNA−remdesivir complex (PDB ID: 7BV2, 65 FMODB ID: 1JL3Z 31 ), the complex without ligand (PDB ID: 6YYT), 64 the complex without RNA (PDB IDs: 7BV1, 6M71, 7BW4), nsp7−nsp8 complexes, and nsp9 alone. The FMO calculations by Kato et al. 31 for the structure of the complex containing the nucleic acid analogue remdesivir show that remdesivir incorporated into the end of the template-primer-RNA duplex retains a nucleotide-mimicking interaction and is stabilized by hydrogen bonding between base pairs and by inter- and intra-strand π−π stacking in the RNA (Figure 11 ). In particular, the stacking interaction is enhanced by the carbon-to-nitrogen substitution introduced into the adenine backbone of remdesivir.
In addition, the interaction between remdesivir and RdRp is stabilized by hydrogen bonding with Asn691 and an ES interaction with Asp760. Furthermore, the interaction of remdesivir with Thr687 is stabilized by OH/π interactions using π electrons of the cyano group introduced into the sugar of remdesivir, in addition to hydrogen bonding. Thus, the interactions of remdesivir in the process of inhibiting strand elongation are clarified. In the interaction between the RNA duplex and the protein, RdRp (nsp12) is mainly responsible for stabilization (about −600 kcal/mol), nsp8 contributes about half of that (−280 kcal/mol), and nsp7 contributes −24 kcal/mol. Figure S4 shows a comprehensive interaction-energy diagram, IFIE-MAP. 3.5. Helicase (nsp13). Helicase forms an RTC with RdRp, and both RdRp and helicase are essential for viral replication. 66 Helicase may contribute to smooth replication of the viral genome through functions such as backtracking RdRp 67 and unwinding RNA. 68 The FMODB has entries on the complex structure with (3-fluoro-4-methylphenyl)methanesulfonamide (PDB ID: 5RL6, FMODB ID: N3YLQ) and about 60 apo structures. The results for the complex ( Figure 12) show strong hydrogen bonding between the amide group of the ligand and Lys192, Tyr224, Thr228, and HOH833. Most of the interactions with the ligand are dominated by ES; only Val226 exhibits a CH/π interaction with the aromatic ring of the ligand, indicating stabilization by the DI component. However, in this calculation, the Zn atom coordinating with helicase is neglected, and the FMO calculation that includes Zn is in progress. We shall also perform FMO calculations for the RTC giant complexes (PDB ID: 6XEZ, 7CXN, etc.) consisting of RdRp−helicase−RNA−ADP. 3.6. Endoribonuclease (nsp15). Endoribonuclease is a uridylate-specific endoribonuclease that cleaves the poly-U sequence of RNA, 69 and a structure with uridine-5′-monophosphate bound is available in the PDB (PDB ID: 6WLC, FMODB ID: N3G7Q).
70 The coordinate information of the crystal structure (Figure 13a) suggests only Ser294 as an interacting residue between the whole ligand molecule and its surroundings. On the other hand, the PIEDA interaction energies calculated by FMO (Figure 13b,c) show a strong ES interaction between the uridine portion of the ligand and Lys290 and Ser294, as well as stabilization by DI interactions with Tyr343. A similar analysis on the monophosphate moiety of the ligand showed strong ES interactions with hydrogen-bonded Tyr343, water molecules (HOH540, HOH646), and Lys290. Figure S5 shows the interaction energies of the entire ligand molecule and its surroundings mapped onto the structure. For the whole ligand molecule, strong ES interactions with Ser294 and Tyr343, which are thought to be important for RNA recognition and degradation, and stabilization by DI interactions with His250 are observed. On the other hand, the PIEDA results show that the positively charged SAM is stabilized by strong ES interactions with Asp6897, Asp6912, Asp6928, Asp6931, Asn6841, and Ala6870 (Figure 14b,c) . The binding pocket of SAM is composed of amino acids N43 (Asn6841), Y47 (Tyr6845), G71 (Gly6869), G81 (Gly6879), D99 (Asp6897), N101 71 The corresponding residues in Figure 14c are underlined. FMO analyses showed that the largest interactions are, in decreasing order, with D99 (Asp6897), which is hydrogen-bonded to the hydroxyl group of the adenosine sugar backbone of SAM, and with D130 (Asp6928) and N43 (Asn6841), which are hydrogen-bonded to the amine and carboxyl groups of the methionine portion of SAM, respectively. Recently, a complex structure of the methylated CAP-like RNA and SAH was published and registered to the FMODB (PDB ID: 7LW3, FMODB ID: 2K6MR). In this 2′-O-ribose methyltransferase complex, nsp10 contains two Zn atoms, which requires a special fragmentation process around Zn.
The calculation of a Zn-containing structure has been carried out for the complex of nsp10−nsp16 with the candidate inhibitor sinefungin (PDB ID: 6YZ1, FMODB ID: 6394Z). As shown in Figure S6 , the side-chain portions of the coordinating His and Cys residues are placed in the same fragment as Zn. We are currently working on automatic fragmentation around Zn. 3.8. Spike (S) Protein. One of the structural proteins encoded in the viral genome, the spike (S) glycoprotein, is a characteristic feature of coronaviruses: it forms protrusions on the surface of the viral particle that bind to the ACE2 receptor on the host cell and allow the virus to enter the cell. 48, 49 The S protein has the highest number of published PDB entries, along with the Mpro. Cryo-EM is the main tool for the structural analysis of the whole trimer, and X-ray data are also available for the receptor-binding domain (RBD)−antibody substructure. An example of FMO-based protein−protein interaction (PPI) analyses using the complex of the RBD of the S protein and ACE2 (PDB ID: 6LZG, FMODB ID: 4NZVN) is shown in Figure 15 . To identify the amino acid residues important in molecular recognition, we first mapped the main PIEDA components of the interactions between the whole ACE2 and each residue of the S protein onto the molecular structure of the S protein ( Figure 15a ) and vice versa (Figure 15b ). The strength of the interaction energy is indicated by a color gradation for each amino acid residue. Since ACE2 is a −26e charged protein, the charged amino acid residues throughout the S protein interact with it electrostatically (Figure 15a ). On the other hand, the RBD of the S protein has a charge of only +2e, so the interacting amino acid residues of ACE2 are more concentrated at the binding interface (Figure 15b ).
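Selecting the main PIEDA component for each residue, as used for the structure mapping above, can be sketched as follows. The selection rule is an assumption made for illustration: EX is excluded because it is repulsive, and the main component is taken to be the most negative of the ES, CT+mix, and DI terms. The residue labels and energies are hypothetical toy values, and `main_component` is not a function of BioStation Viewer or the FMODB.

```python
def main_component(es, ct_mix, di):
    """Return the label of the most stabilizing attractive PIEDA term.

    Assumption: the main component is the most negative of ES, CT+mix,
    and DI. Returns None when none of the three terms is stabilizing.
    """
    terms = {"ES": es, "CT+mix": ct_mix, "DI": di}
    label = min(terms, key=terms.get)
    return label if terms[label] < 0.0 else None


# Hypothetical per-residue terms (kcal/mol) against the whole partner protein.
toy_residues = {
    "res_a": (-90.0, -8.0, -3.0),  # charged residue, electrostatics dominate
    "res_b": (-1.0, -0.5, -4.2),   # aromatic contact, dispersion dominates
    "res_c": (2.0, 0.1, 0.0),      # no stabilizing term
}
main_by_residue = {name: main_component(*terms)
                   for name, terms in toy_residues.items()}
print(main_by_residue)
```

A map of this kind can then be colored by component and magnitude. With a highly charged partner, most residues would be labeled ES, which is why inspecting the individual terms is more informative than the total alone.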
Due to the large negative charge of ACE2, hydrogen bonds and XH/π interactions, which are key interactions in molecular recognition of the S protein, are hidden by ES interactions of the charged residues. However, the hydrogen bonds can be easily interpreted by analyzing the ES and CT terms of PIEDA, and the XH/π interactions can be easily interpreted by analyzing the DI term. 21 In addition, we performed PIEDA analysis of the S protein residues that are mutated in the South African and Brazilian mutants, Lys417 and Asn501, and the surrounding residues ( Figure 15b,c) . Lys417 forms a salt bridge with Asp30 of ACE2 and shows the strongest attractive interaction (−116.3 kcal/mol) between the two proteins. On the other hand, the main contribution of Asn501 is ES with the surrounding amino acid residues (Tyr41, Lys353, Gly354, etc.). In a previous study, it was reported that the N501Y mutation enhances the ability of the S protein to bind to ACE2 by acquiring more hydrogen bonds and XH/π interactions than the wild type. 21 As an example of RBD−neutralizing antibody complexes, we show the FMO results for a structure containing BD-629 Fab, whose binding site on the RBD overlaps extensively with that of the ACE2 receptor and which has a high activity value (IC 50 ) (PDB ID: 7CH5, FMODB ID: JM5M9). 23 First, the main components of PIEDA for the PPI of RBD and BD-629 Fab were mapped on the molecular structure (Figure 16a,b) . ES is the main component of many interactions between the RBD and BD-629 Fab, whereas only Tyr52 H , Tyr99 H , and Phe32 L of BD-629 Fab interact with the RBD by DI as the main component (subscripts H and L indicate the heavy and light chains of the antibody, respectively). Next, the PIEDA on a single residue of Lys417 (Figure 16c,d) showed that Asp101 H and Tyr102 H exhibit strong ES interactions. In particular, a salt bridge is formed between Lys417 and Asp101 H , and the ES term is about −150 kcal/mol.
In addition, Tyr52 H forms a CH/π interaction with the Lys417 side chain, and the DI term is the major component. On the other hand, considering Asn501 (Figure 16e,f) , hydrogen bonds are formed between Asn501 and Ser30 L , indicating a strong ES interaction. These results suggest that Lys417 and Asn501, which are characteristic of the South African and Brazilian mutants, contribute significantly to the interaction with neutralizing antibodies, as they do to the interaction with the ACE2 receptor. Details of the epitope analysis by neutralizing antibodies 23 and mutant strains 24 are available in the respective references. 3.9. Nucleocapsid (N) Protein. The nucleocapsid (N) protein, a structural protein, binds to viral genomic RNA to form a capsid, which constitutes the viral particle. Both the N-terminal domain (NTD) and C-terminal domain (CTD) are registered in the PDB. 76 The NTD is a major RNA-binding domain and its complex structure with single- and double-stranded RNA has been determined by solution NMR. 77 The CTD is a dimerization domain and its dimeric structure has been obtained by X-ray crystallography, 78, 79 suggesting that it also has an RNA-binding ability. The calculation results of these structures are registered in the FMODB, and a representative analysis of the complex structure of the NTD and single-stranded RNA (PDB ID: 7ACT, FMODB ID: N3Y7Q) is described below. Figure 17a shows the interaction of each amino acid residue of the N protein with RNA, mapping the main components of the PIEDA on the molecular structure. We confirmed that the molecular recognition of RNA is mainly due to ES interactions. On the other hand, Figure 17b shows the DI interaction between the protein and only the base parts of RNA, indicating that DI plays an important role in recognizing the bases. For example, a large DI is observed in the interaction between the U3 base and Arg95, indicating π−π stacking between the pyrimidine of U3 and the guanidyl group of Arg95 (Figure 17c,d) .
Figure 17e shows the structure of the NTD with and without RNA binding (PDB ID: 7CDZ, FMODB ID: R12G8). As shown in Figure 17f, the interaction energy between the red-framed loop (Arg92−Ser105) and RNA is −1191.5 kcal/mol, which accounts for 53.8% of the total interaction. This indicates that the structural change is important for the binding of the N protein to RNA.
3.10. Other SARS-CoV-2. Structures other than the above nine are currently classified as "Other SARS-CoV-2." Although several cryo-EM structures of nsp1, which suppresses the innate immune function of the host, in a huge complex with the human 40S ribosomal subunit have been published in the PDB, 80, 81 the FMODB currently contains only the calculation for the X-ray crystal structure of nsp1 alone (PDB ID: 7K7P, FMODB ID: 5965Z). 82
In this study, we present an overview of the COVID-19 special feature in the FMODB, describing the contents of the registered data and the analysis methods for each type of SARS-CoV-2-related protein. We have collected data mainly on the structures classified as representative entries in the COVID-19 special page of the PDBj, but complete coverage is difficult because of several technical issues. First, regarding the size of the calculated protein, complex structures of up to 1000 residues are mainly registered in the FMODB. Among the FMO calculations performed with ABINIT-MP so far, the largest protein is the S protein trimer (3300 residues), 22 calculated for both the open and closed structures. In the near future, we will perform FMO calculations for large complex systems, such as RTCs of RdRp including helicase, nsp1−ribosomal subunit, and spike trimer−antibody, and elucidate the molecular recognition mechanisms in these complex systems. Second, for S glycoproteins containing carbohydrate chains, we have calculated structural models with and without carbohydrate modification and registered both in the FMODB (e.g., PDB ID: 6LZG, FMODB ID: 4NZVN and YQG92).
Third, for NMR structures, the PDB contains dozens of MODEL structures per entry, reflecting structural fluctuation in solution. In the future, we would like to register all of these MODELs as standard practice in the FMODB. The fourth issue is the handling of Zn atoms: for the Zn-containing proteins PLpro, RdRp, helicase, nsp10, and ACE2, only a few of the complex structures have so far been calculated in the presence of Zn because of the large amount of manual work involved in fragmentation (see Section 3.7 for details). Since Zn is located far from the active site in these proteins, its presence or absence is not expected to affect important interactions. Automatic fragmentation around Zn is currently under development; once it becomes available within this year, these structures will be recalculated and registered in the FMODB. We will address these technical issues in the activities of the FMO drug design consortium (FMODD). 86, 87 The last issue is the expansion of the FMO programs that are eligible for FMODB registration. Although the current FMODB includes only results calculated using the ABINIT-MP program, 32, 33 some groups have performed FMO calculations for COVID-19-related proteins, such as Mpro 29, 30 and S proteins, 25 using the GAMESS program. 34−36 We are currently improving the FMODB so that the results of FMO calculations performed with GAMESS can also be registered.
Experimental structural studies of SARS-CoV-2 by structural biologists worldwide and their publications in the PDB are a valuable source of information for the fight against COVID-19. The interaction data published in the FMODB enhance our understanding of SARS-CoV-2 by adding quantitative interaction data to structural information.
Clarifying the structural features of therapeutic agents that are important for their interactions with proteins would not only help in the rational design of therapeutic agents but also contribute to the epitope analysis of mutant strains, which is currently the focus of global attention, and to the identification of interactions leading to vaccine development. We will continuously update the FMODB by performing FMO calculations on newly released important PDB structures, which we hope will help overcome COVID-19.
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.1c00694. Structures of SARS-CoV-2-related proteins are displayed in Figure S1. Interaction energies between the ligand, treated as a whole or divided into multiple fragments, and the surrounding residues (Figures S2 and S5). The substrate-binding pocket of the main protease and the structures of the 110 compounds shown in Figure 7b, displayed for each ligand cluster classified by SOM (Figure S3). The full IFIE map between all fragments that constitute the complex structure of RdRp−nsp7−nsp8−RNA−remdesivir (Figure S4). The fragmentation method for the four-coordinated Zn atom and the surrounding Cys and His residues (Figure S6).
WHO Coronavirus Disease (COVID-19) Situation Reports
The Impact of Structural Bioinformatics Tools and Resources on SARS-CoV-2 Research and Therapeutic Strategies. Briefings Bioinf. 2021
Protein Data Bank Japan (PDBj): Updated User Interfaces, Resource Description Framework, Analysis Tools for Large Structures
New Tools and Functions in Data-Out Activities at Protein Data Bank Japan (PDBj)
Drug Binding Dynamics of the Dimeric SARS-CoV-2 Main Protease, Determined by Molecular Dynamics Simulation
Elucidation of Interactions Regulating Conformational Stability and Dynamics of SARS-CoV-2 S-protein
2: Raising the Bar for Performance and Programmability in a Special-Purpose Molecular Dynamics Supercomputer
Scalable Molecular Dynamics with NAMD
High Performance Molecular Simulations Through Multi-Level Parallelism From Laptops to Supercomputers
Beyond Shielding: The Roles of Glycans in the SARS-CoV-2 Spike Protein
Is the Rigidity of SARS-CoV-2 Spike Receptor-Binding Motif the Hallmark for Its Enhanced Infectivity? Insights from All-Atom Simulations
Remdesivir Strongly Binds to Both RNA-Dependent RNA Polymerase and Main Protease of SARS-CoV-2: Evidence from Molecular Simulations
Remdesivir-Bound and Ligand-Free Simulations Reveal the Probable Mechanism of Inhibiting the RNA Dependent RNA Polymerase of Severe Acute Respiratory Syndrome Coronavirus 2
Structural Basis of the Potential Binding Mechanism of Remdesivir to SARS-CoV-2 RNA-Dependent RNA Polymerase
Fragment Molecular Orbital Method: an Approximate Computational Method for Large Molecules
Molecular Recognition of SARS-CoV-2 Spike Glycoprotein: Quantum Chemical Hot Spot and Epitope Analyses
Interaction Analyses of SARS-CoV-2 Spike Protein Based on Fragment Molecular Orbital Calculations
Intermolecular Interaction Analyses on SARS-CoV-2 Spike Protein Receptor Binding Domain and Human Angiotensin-Converting Enzyme 2 Receptor-Blocking Antibody/Peptide Using Fragment Molecular Orbital Calculation
Computational Ab Initio Interaction Analyses Between Neutralizing Antibody and SARS-CoV-2 Variant Spike Proteins Using the Fragment Molecular Orbital Method
Hot Spot Profiles of SARS-CoV-2 and Human ACE2 Receptor Protein-Protein Interaction Obtained by Density Functional Tight Binding Fragment Molecular Orbital Method
Statistical Interaction Analyses Between SARS-CoV-2 Main Protease and Inhibitor N3 by Combining Molecular Dynamics Simulation and Fragment Molecular Orbital Calculation
Dynamic Cooperativity of Ligand-Residue Interactions Evaluated with the Fragment Molecular Orbital Method
Why Are Lopinavir and Ritonavir Effective against the Newly Emerged Coronavirus 2019? Atomistic Insights into the Inhibitory Mechanisms
Interaction of 8-Anilinonaphthalene-1-Sulfonate with SARS-CoV-2 Main Protease and Its Application as a Fluorescent Probe for Inhibitor Identification
Intermolecular Interaction among Remdesivir, RNA and RNA-Dependent RNA Polymerase of SARS-CoV-2 Analyzed by Fragment Molecular Orbital Calculation
Electron-Correlated Fragment-Molecular-Orbital Calculations for Biomolecular and Nano Systems
Guiding Medicinal Chemistry with Fragment Molecular Orbital (FMO) Method
The Fragment Molecular Orbital Method: Theoretical Development, Implementation in GAMESS, and Applications
Recent Development of the Fragment Molecular Orbital Method in GAMESS
Accurate Scoring in Seconds with the Fragment Molecular Orbital and Density-Functional Tight-Binding Methods
FMODB: The World's First Database of Quantum Mechanical Calculations for Biomacromolecules Based on the Fragment Molecular Orbital Method
Development of an Automated Fragment Molecular Orbital (FMO) Calculation Protocol Toward Construction of Quantum Mechanical Calculation Database for Large Biomolecules
Molecular Operating Environment (MOE), 2019.01; Chemical Computing Group ULC: Montreal, QC, Canada
How to Perform FMO Calculation in Drug Discovery
A Parallelized Integral-Direct Second-Order Møller-Plesset Perturbation Theory Method with a Fragment Molecular Orbital Scheme
Large Scale MP2 Calculations with Fragment Molecular Orbital Scheme
Pair Interaction Energy Decomposition Analysis
Implementation of Pair Interaction Energy Decomposition Analysis and Its Applications to Protein-Ligand Systems
Structures of Human Antibodies Bound to SARS-CoV-2 Spike Reveal Common Epitopes and Recurrent Features of Antibodies
Veesler, D. Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein
Crystallographic Studies on Coronaviral Proteases Enable Antiviral Drug Design
Cryo-EM Structure of an Extended SARS-CoV-2 Replication and Transcription Complex Reveals an Intermediate State in Cap Synthesis
Identification of a Novel Cleavage Activity of the First Papain-Like Proteinase Domain Encoded by Open Reading Frame 1a of the Coronavirus Avian Infectious Bronchitis Virus and Characterization of the Cleavage Products
Identification of Severe Acute Respiratory Syndrome Coronavirus Replicase Products and Characterization of Papain-Like Protease Activity
Crystal Structure of SARS-CoV-2 Papain-like Protease
Severe Acute Respiratory Syndrome Coronavirus Papain-Like Protease Ubiquitin-Like Domain and Catalytic Domain Regulate Antagonism of IRF3 and NF-κB Signaling
Dikic, I. Papain-Like Protease Regulates SARS-CoV-2 Viral Spread and Innate Immunity
Activity Profiling and Crystal Structures of Inhibitor-Bound SARS-CoV-2 Papain-Like Protease: A Framework for Anti-COVID-19 Drug Design
Crystal Structures of SARS-CoV-2 ADP-Ribose Phosphatase: from the Apo Form to Ligand Complexes
Structure of M(pro) from SARS-CoV-2 and Discovery of Its Inhibitors
Design of Wide-Spectrum Inhibitors Targeting Coronavirus Main Proteases
Novel Type of Virtual Ligand Screening on the Basis of Quantum-Chemical Calculations for Protein−Ligand Complexes and Extended Clustering Techniques
GC-376, and Calpain Inhibitors II, XII Inhibit SARS-CoV-2 Viral Replication by Targeting the Viral Main Protease
Structure of Replicating SARS-CoV-2 Polymerase
Structural Basis for Inhibition of the RNA-Dependent RNA Polymerase from SARS-CoV-2 by Remdesivir
Structural Basis for Helicase-Polymerase Coupling in the SARS-CoV Replication-Transcription Complex
RNA Polymerase Backtracking in Gene Regulation and Genome Instability
DnaB Drives DNA Branch Migration and Dislodges Proteins While Encircling Two DNA Strands
Coronavirus Endoribonuclease Targets Viral Polyuridine Sequences to Evade Activating Host Sensors
Tipiracil Binds to Uridine Site and Inhibits Nsp15 Endoribonuclease NendoU from SARS-CoV-2
High-Resolution Structures of the SARS-CoV-2 2′-O-Methyltransferase Reveal Strategies for Structure-Based Inhibitor Design
New Variant of SARS-CoV-2 in UK Causes Surge of COVID-19
Emergence and Rapid Spread of a New Severe Acute Respiratory Syndrome-Related Coronavirus 2 (SARS-CoV-2) Lineage with Multiple Spike Mutations in South Africa
Structural Basis of RNA Recognition by the SARS-CoV-2 Nucleocapsid Phosphoprotein
High-Resolution Structure and Biophysical Characterization of the Nucleocapsid Phosphoprotein Dimerization Domain from the Covid-19 Severe Acute Respiratory Syndrome Coronavirus 2
Structural Insight Into the SARS-CoV-2 Nucleocapsid Protein C-Terminal Domain Reveals a Novel Recognition Mechanism for Viral Transcriptional Regulatory Sequences
Structural Basis for Translational Shutdown and Immune Evasion by the Nsp1 Protein of SARS-CoV-2
SARS-CoV-2 Nsp1 Binds the Ribosomal mRNA Channel to Inhibit Translation
Structure of Nonstructural Protein 1 from SARS-CoV-2
Structure and Drug Binding of the SARS-CoV-2 Envelope Protein Transmembrane Domain in Lipid Bilayers
Structural Basis for SARS-CoV-2 Envelope Protein Recognition of Human Cell Junction Protein PALS1