key: cord-0585428-9s4d8af4 authors: Panagiotopoulos, Athanasios A.; Kotzampasi, Danai-Maria; Sourvinos, George; Kampa, Marilena; Pirintsos, Stergios; Castanas, Elias; Daskalakis, Vangelis title: The natural polyphenol fortunellin is a dimerization inhibitor of the SARS-CoV-2 3C-like proteinase, revealed by molecular simulations date: 2020-07-15 journal: nan DOI: nan sha: 4a6f7172f7deb6f6114fddfee117c9756080e103 doc_id: 585428 cord_uid: 9s4d8af4 3CL-Pro (or M-Pro) is the SARS-CoV-2 main protease, responsible for the cleavage of the large polyprotein 1ab transcript and the liberation of eleven proteins, responsible for viral growth and replication. It acts exclusively as a homodimer, while monomers are inactive. Due to its pivotal role, 3CL-Pro has been one of the most studied SARS-CoV-2 proteins and the subject of a number of therapeutic interventions, targeting its catalytic domain. A number of potential drug candidates have been reported, including some natural products. Here, we have investigated in silico, through fully flexible binding and extensive molecular dynamics simulations, the natural product space for the identification of potential candidates of 3CL-Pro dimerization inhibitors. We report that fortunellin (acacetin 7-O-neohesperidoside), a natural flavonoid O-glycoside, isolated from the fruits of Citrus japonica var. margarita (kumquat), is a potent inhibitor of 3CL-Pro dimerization. A search of the ZINC natural products database identified another 16 related molecules, which possess interesting pharmacological properties. We propose that fortunellin and its analogs might be the basis of novel pharmaceuticals against SARS-CoV-2 induced COVID-19 disease. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was first identified in December 2019, and is the causative agent of coronavirus disease 2019 . It became a global pandemic, threatening the lives of millions of people belonging to sensitive health groups. Based on the World Health Organization (WHO) as of 6 July 2020, more than 11.4 million cases of COVID-19 have been reported in more than 188 countries and territories, resulting in more than 533,000 deaths. Intense scientific effort worldwide resulted in the identification of SARS-CoV-2 genomic structure, viral protein sequence and structure and disease characteristics (1, 2). The SARS-CoV-2 genome, very close to that of the severe acute respiratory syndrome coronavirus (SARS-CoV) (2) , contains genes encoding 3C-like proteinase, RNA-dependent RNA polymerase (RdRp), 20-O-ribose methyltransferase, spike protein, envelope protein, nucleocapsid phosphoprotein, and several polyprotein complexes of known or unknown function (https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/). SARS-CoV-2 3CL-Pro plays a key role in polyprotein processing. It acts as a homodimer (3) , and its structure is similar to the crystal structure of other coronaviruses main proteases (see reference (4) for a review, and references herein). This enzyme plays a vital role in cleaving the large polyprotein 1ab (replicase 1ab, ∼790 kDa) translated by the virus RNA, at 11 different sites, liberating proteins indispensable for viral replication and proliferation, with a unique specificity, not found in any human protease (5) . Therefore, it represents a preferential target for the development of a series of not toxic inhibitors. The individual monomers of SARS-CoV MPro are inactive, and two strategies have been employed to develop enzyme inhibitors: (i) molecules targeting the substrate binding pocket, to block its catalytic activity, and (ii) dimerization inhibitors. A significant effort targeted the inhibition of the enzymatic activity (201 substances were recorder in the ChEMBL database as per July 9, 2020 for the SARS-CoV-2 3CL-Pro, and a number of molecules, active on SARS-CoV protease and reviewed in Reference (6) , including also natural products (7) . In contrast, only scarce reports address the issue of SARS-CoV 3CL-Pro (8, 9) , and none of SARS-CoV-2. Here we have initiated an in silico study, in view of analyzing the dynamics of the monomeric and the homodimeric protein, and exploring possible natural polyphenolic compounds as potential inhibitors of the protein homo-dimerization. Our results returned the natural product fortunellin as a lead compound, which could be used as the basis for the design of novel antiviral compounds. The 3CL-Pro (or MPro) is one of the most studied SARS-CoV-2 proteins, due to its pivotal role in the mode of action of the virus. Indeed, the role of this protease in the cleavage of the coronavirus replicase polyprotein 1ab at eleven conserved sites, giving rise to eleven specific proteins, responsible for the majority of viral actions. This increased interest in this molecule is testified by the existence, in the Swiss Model Biospace of 110 different crystal structures, deposited in the protein data bank (July 2020). Their corresponding codes are presented in Supplemental Table 1 . These models, as an apoprotein, or bound to a number of ligands, show a great structural similarity. An example of comparison of 10 different crystals is presented in Supplemental Figure 1 , together with the corresponding Root Mean Square Deviations (RMSD). For the purpose of this work, and in view of the very small differences of the 3CL-Pro crystals, we have used the 6YB7 unliganded crystal, analyzed with X-ray diffraction. Interestingly, using the GalaxyWeb server, routine refine, we obtain the same 3D protein structure, based on the same 6YB7 crystal template. Employing this conformation of the proteinase, along with that of the 6LU7 crystal in order to increase the conformational phase space sampling, we have performed a molecular dynamic (MD) study for long time periods (70 μs), and a subsequent Markov State Modeling (MSM) on the trajectories. Based on the MSM analysis of the equilibrium MD trajectories, we obtain three macrostates, or conformational states of the monomeric protein, as shown in Figure 1A . The MPro acts as a Cyclic-C2 (Global Symmetry) Homo-dimer (10) (11) (12) (13) . Here, we further simulated the dimeric structure of the SARS-CoV-2 protein, with two independent methods: (1) we used the GalaxyWEB server, routine HOMOMER (http://galaxy.seoklab.org/cgibin/submit.cgi?type=HOMOMER) for the MPro protein oligomer structure prediction from a monomer sequence or structure (Oligomeric state = 2); (2) the HEX 8.0 program (http://hex.loria.fr/), a locally executed software, for protein-protein, or protein-nucleic acid interactions, based on spherical rotated protein complexes, taking into account both surface shape and electrostatic charge. We compared our results with the X-ray diffraction crystal solution (pdb 6YB7, Supplemental Figure 2 ). As shown, a very good homology was found, with global RMSD ~0.5 Å, confirming the validity of our approach. MD of the homodimer of 3CL-Pro (10μs), followed by 10μs parallel-tempering metadynamics at the well-tempered ensemble (PTmetaD-WTE) ( Figure 1B) , revealed three conformations of the homodimer, as derived by minimal surface energy conformations, with the Collective Variable (CV) phase space determined by the torsional angles of residues 3, 4, 5, 6, 84, 135, 141, 164, 167, 171, 175, 178, 179, 180, 190, 195, 217, 284, 285, 286, 290, 291 , 300, and 301. Please refer to Supplementary Methods, for the detailed conformational space functions out of the MSM analysis. The first conformation (C1) corresponds to the crystal structure of the dimer (monomer distance 1.72-1.78 Å), while the other two (C2-C3) correspond to dimer structures with higher (1.86-1.93 Å) distances among monomers, resulting in decreased monomer-monomer interactions (see Figure 2D ). The monomer distances have been calculated based on the minimum distances between residues on the monomer-monomer interface (4, 10, 11, 14, 28, 139, 140, 147, 290 , 298) (4). The identified dimerization surface of 3CL-Pro proteinase was used as a binding groove for the detection of possible natural small molecules, acting as inhibitors of dimerization. We have interrogated the FTMap server (http://ftmap.bu.edu) (14) , which uses the pdb structure of a protein as a scaffold and uses a series of 16 small molecular probes in order to identify binding hot spots, determine drugability and provide information about fragment-based drug discovery (15) . The results of FTMap permit us to design a minimal structure of a potential 3CL-Pro dimerization inhibitor. Interrogating the ZINC database of natural products (https://zinc.docking.org/) with this minimal structure, we have identified fortunellin (ZINC4349204, Figure 2A ) as a potential natural inhibitor of 3CL-Pro dimerization. Binding of fortunellin on the monomeric 6YB7 monomer of 3CL-Pro ( Figure 2B ) revealed a high affinity binding (ΔG -13.9 kcal/mol) using the fully flexible GalaxyWeb server, and a binding at the dimerization interface, with significant interactions with amino acids related to the dimerization of the molecule (Leu 32 , Asp 33 We have used the different poses of the monomeric 3CL-Pro MD solutions for 60 μs, at 1 ns intervals (poses were retrieved every 3 μs) as scaffolds for fortunellin binding (Table 1, Figure 2C ). As shown, the affinity of fortunellin is fluctuating around -12.3 kcal/mol (SD ±1.005 kcal/mol), and, after initial fluctuations, preceding significant changes of the dimerization domain, as expressed by changes of the local RMSD value, as compared to the crystal structure of 3CL-Pro, it stabilizes after 60 μs. In addition, an MSM analysis of the dynamics of MPro, in the presence and absence of fortunellin, were also performed to refine the conformational phase space and further reduce the degrees of freedom, compared to the initial MSM on the monomer. In Figures 2D -E, we present the three MPro states in the presence and absence of fortunellin, respectively on the same refined phase space (CV-1/ CV-2). The C1-C3 minima are assigned in these graphs based on structural feature comparison with Figure 1B . Based on MSM, we calculated that the transitions between the different states (C1-C3) occur at the average time scales of 37.6ns (slow processes) and 3.5ns (fast processes) in the absence of fortunellin ( Figure 2F ). However, the average transition time scale in the presence of fortunellin drops to 1.42ns with only fast transitions ( Figure 2G ). This significantly smaller time-interval results in greater energy changes faster in the presence of fortunellin, blocking the trapping at certain states, and inhibiting the formation of the dimer by sampling non-favorable monomer conformations, i.e. with greater distance among monomers, that favor its dissociation. We note that the C1 minimum that is associated with the crystal structure (active for dimerization) in Figure 2D , it is absent in Figure 2E , indicating that fortunellin disfavors this state. Instead, C1 is displaced by an alternative protein state, indicated by the purple area in Figure 2E . It appears therefore that the natural polyphenol fortunellin disrupts the dimerization of the main SARS-CoV-2 main protease 3CL-Pro and therefore is a good drug (or dietary supplement) candidate for combatting COVID-19 disease. Interrogating the ZINC database for natural products with fortunellin as a bait, we have retrieved 16 natural compounds with similar molecular properties, presented in Table 2 . They were tested for binding to the dimerization domain of the 3CL-Pro monomer, and found to possess an affinity similar or higher to that of fortunellin. Their absorption, distribution, metabolism, and excretion (ADME) characteristics were evaluated in the SwissADME site (16) , and the results are shown in Supplemental Table 2 . It derives that they are all water-soluble compounds, with limited enteric and skin absorption, but they are non-toxic, as they are not predicted to interact with the CYP drug metabolizing enzymes. Awaiting for an efficient vaccine for the prevention of COVID-19 disease, scientific efforts are directed towards repurposing known drugs, or developing new ones, against key viral proteins, involved in the viral replication and proliferation (17, 18) . In addition of targeting proteins involved in viral recognition and entry into the human cells, a significant effort is directed towards viral proteins involved in the replication and proliferation of the virus. One such protein is the SARS-CoV-2 3CL-Pro (or main protease-MPro), a pivotal enzyme, acting exclusively as a homodimer (3) and implicated in the cleavage of the large polyprotein 1ab, liberating 11 different proteins indispensable for viral replication and proliferation (5) . However, the bulk of the scientific effort targets the discovery or repurposing of inhibitors of its enzymatic activity. Here, we have taken another approach, directing our efforts toward the identification of (natural) compounds, which could inhibit the dimerization of the enzyme, indispensable for its biological activity. At a first step, beginning with the published crystal structure of the protein (19), we performed a molecular dynamics study, combined with Markov state model (MSM) theory (20) . This enables the extraction of long-time-scale individual monomer dynamics from rather short-time-scale MD trajectories of 3CL-Pro, in different states. The application and accuracy of the powerful MSM theory has been demonstrated in many cases also by experiments that include protein−protein, or protein-drug binding kinetics, as well as protein folding rates and protein dynamics (21) (22) (23) (24) . The application of the MSM method lies in the approximation of the slow dynamics of the protein, that are associated with its functionality, in a statistically efficient manner and identified a number of important residues among its structure, related also to its dimerization, verifying previous data (4) . To accurately describe the conformational phase space of MPro dimer, we employed the PTmetaD-WTE method biasing the torsional angles of important residues mentioned in the results section and the supplemental material. This latter enhanced sampling technique takes advantage of metadynamics simulations which can efficiently push the system under study to sample many minima, even if ergodicity is hindered by the shape of the energy landscape. It also takes advantage of the replica exchange method, with parallel simulations running at different temperatures (310K-400K). The system under study is able to surpass energy barriers at high temperatures, while it cools down by exchanges with conformations at lower temperatures. This produces an accurate Free Energy Surface (FES) and associated important conformations of MPro were extracted at the FES minima (310K) and were thus reported herein. Having identified the dynamics of the 3CL-Pro structure, we have used the dimerization interface, as a scaffold, for the identification of natural products, impairing molecular dimerization. We have found that fortunellin (acacetin 7-O-neohesperidoside), a natural flavonoid O-glycoside, isolated from the fruits of Citrus japonica var. margarita (kumquat) (25) , binds with high affinity (ΔG -13.9 kcal/mol) to the dimerization interface of the 3CL-Pro, and inhibits its dimerization, by interacting with residues responsible for the dimerization of the protein. Molecular simulation studies, based on the same MSM protocol employed for the unligated protein, revealed important residues that affect the associated MPro-Fortunellin dynamics. Some of them, belong to the C44-P52 loop, which has been proposed to host mutations, however only at the T45, S46, E47, L50 positions (26) . The latter are not listed as important residues in the MPro-fortunellin dynamics. This gives us confidence that fortunellin can target MPro, even when MPro is mutated. More importantly, in the presence of fortunellin, not all MPro conformations are accessible at room temperature. As indicated by the MSM analysis of MPro trajectories in the presence of fortunellin, the C1 conformation (crystal structure) is absent, indicating that the monomers are trapped in minima inactive for dimerization. Previous studies have implicated fortunellin as an activator of anti-oxidant enzymes (HO-1, SOD and CAT), through a direct action on Nrf2 and AMPK pathways, considered as important to protect against oxidative stress (27) . In addition, fortunellin was implicated as a cardioprotective factor in diabetic animals (28) . Here, we extend the actions of this agent, by reporting a direct effect of fortunellin, impairing the dimerization of 3CL-Pro SARS-CoV-2 proteinase, and therefore inhibiting SARS-CoV-2 replication and proliferation. Interestingly, a number of other natural polyphenols, extracted from the ZINC database of natural products share the activity of fortunellin. In details, we have identified apiin (ZINC3983878) and rhoifolin (ZINC3978800) among others, which have been previously studied for their biological effects. Apiin (apigenin-7-apioglucoside), is a natural flavonoid, a diglycoside of the flavone apigenin, isolated from leaves of Apium graveolens var. dulce (celery) and Petroselinum crispum (parsley). Apiin extracted from celery exhibited anti-inflammatory properties, as apiin showed strong inhibitory activity on inducible nitric oxide synthase (iNOS) expression and nitrite (NO) production when added before LPS stimulation of J774.A1 cells (29) . In mice models, apiin had a remarkable scavenging activity on maleic dialdehyde (MDA) and lipofuscin (LPF), promoted total antioxidant capacity (TAOC) and significantly enhanced the activities of superoxide dismutase (SOD), glutathione peroxidase (GSH-Px) and catalase (CAT)(30), by exerting a radical scavenging activity greater than that of absorbic acid (31) and Vitamin E (30) . Apiin also showed a marked antihypertensive effect in experimental pulmonary hypertension in dogs (32)and anti-influenza virus activity in vitro through inhibition of neuraminidase (NA) (33) . Besides, the role of apiin cardiovascular activity as antiarrhythmic and anti-ischemic agent has also been reported (34) . In view of our results, apiin might therefore be a strong drug candidate, as it inhibits SARS-CoV-2 virus and tackling also COVID-19 major disease symptoms. Rhoifolin (apigenin 7-O-neohesperidoside), is a neohesperidoside, a dihydroxyflavone and a glycosyloxyflavone, was first isolated from plant Rhus succedanea (Sumac or wax tree, originating from Asia, but also found in Australia and New Zealand) (35) . Resently, rhoifolin was found to efficiently block the enzymatic activity of SARS-CoV 3CL-Pro (36), with a methodology similar to that used in the present study. In addition, rhoifollin was reported to inhibit CVB3 infection, a primary cause of viral myocarditis in humans. In addition, it was found to decrease inflammation, by significantly decreasing prostaglandin E2 and the release of pro-inflammatory cytokines (TNF-α, IL-1β, and IL-6) (37, 38). Rhoifolin isolated from Citrus grandis leaves was beneficial in metabolic diseases, including type II diabetes, by enhancing adiponectin secretion, tyrosine phosphorylation of insulin receptor-β and GLUT4 translocation (39) . Rhoifolin also caused a decrease of mean aortic pressure, of the arterial and pulmonary capillary pressure and of heart rate in the dog (40) . Moreover, previous study has demonstrated that rhoifolin can have an inhibitory effect on angiotensin-converting enzyme (ACE) activity, which plays a key role in the regulation of arterial blood pressure (40) . In conclusion, our in silico strategy identified a series of natural flavonoids, which, in the form of drugs or dietary supplements, might be an effective strategy against the devastating SARS-CoV-2 infections, in humans. Interrogation of the Swiss Model Biospace (July 2020) resulted in 110 crystal structures of 3CL-Pro, bound or not to ligands, and deposited to the Protein Data Bank (https://www.rcsb.org/) (41) . Here, we have used the crystal with reference 6YB7, representing the dimeric molecule in the absence of a ligand. Ligands were retrieved from the ZINC database (http://zinc.docking.org/) (42) , usually in a canonical smiles format. Fully flexible binding on the GalaxyWeb server (http://galaxy.seoklab.org/), and dimerization of the protein were performed as described previously (43) . Comparison of molecular structures was made with Chimera (44) . For molecular dynamics studies, the 6YB7, 6LU7 crystals were employed, with Amber03 fields forces (45) . Classical/ Enhanced samopling MD simulations were performed in GROMACS 2020 (46), for long time-periods up to 90 μs, with a time step of 1 fs . Retrieved trajectories were further analyzed by Markov state modeling (MSM) (20, 24, 47, 48) . Time-structure independent components analysis (tICA) was used to reduce the dimentionality of our data, in PyEMMA (49) . Please, refer to the online Supplement for a detailed description of our methods. SARS-CoV-2 main protease (in pdb format) and ligand(s) (in mol2 or pdb formats) were uploaded to the GalaxyWebserver (http://galaxy.seoklab.org/) and a fully flexible docking (involving the receptor and the ligand) was performed (5-10). The server uses an algorithm, based on the GalaxyDock2 docking (11) , which, after an automatic prediction of the ligand binding pocket, permits a full ligand/receptor flexibility during binding simulation. This step is followed by optimization and subsequent refinement, through a specific algorithm named GalaxyRefine (6, 12) , which permits a protein-ligand structure refinement, by applying iterative side chain repacking and overall structure relaxation (6, 13) . The best solution (usually denoted as "Model 1") was retained. The corresponding pdb file (containing the MPro-ligand complex) was retrieved. The 3D structures of the liganded and unliganded MPro were compared, using the UCSF Chimera 1.11.2 program (14), available from https://www.cgl.ucsf.edu/chimera/. The same program was used for comparisons of the retrieved solutions with available crystals Simulation of MPro dimerization 1) Dimerization of MPro: The active form of the MPro SARS-CoV-2 main protease, is a homodimer (15) , as a Cyclic-C2 (Global Symmetry) Homo-2-mer A2 (Global Stoichiometry) (16) (17) (18) (19) . Here, we used two different docking programs to simulate the active dimer and we compare the obtained solutions with the crystal 6YB7: (1) We used the GalaxyWEB server, routine HOMOMER (http://galaxy.seoklab.org/cgibin/submit.cgi?type=HOMOMER) for the MPro protein oligomer structure prediction from a monomer sequence or structure (Ologomeric state = 2). The server carries out template-based modeling, ab initio docking or both, depending on the availability of proper oligomer templates, until 5 models are generated (5). (2) We used the Hex 8.0.8 program (http://hex.loria.fr/) (20, 21) . Hex 8.0.8, a specialized, locally executed, program, for protein-protein, or protein-nucleic acid interactions, based on spherical rotated protein complexes, taking into account both surface shape and electrostatic charge. Hex returns, through a graphical user interface, a set of >100 solutions, with the corresponding ΔG values. The input files for Hex 8.0.8 program are pdb files of MPro with or without ligand and the output file is a pdb file that contains the simulated MPro dimer. The 3D structures of the monomer and dimer structure of MPro from experimental crystal structures, GalaxyWEB server and Hex 8.0.8 program were compared, using the UCSF Chimera 1.11.2 program. The same program was also used to determine the dimerization interphase of the monomers. The crystal structures of SARS-CoV-2 main protease MPro, or 3CL-PRO (pdb codes 6LU7 (22) and 6YB7) were used as initial coordinates to build the models. The pdb structures refer to one monomer without and with inhibitor, respectively. For consistency, the inhibitor was removed from the crystal structure of MPro (6LU7) (22) . Our choice of coordinates was based on the completeness of the resolved MPro sequence and the quality of chains (at least 90%). To build the MPro dimer, we structurally aligned either two 6lu7, or two 6yb7 monomers on a reference dimer. (23) The protonation states of titratable residues were simulated at neutral pH, thus all Glu, and Asp residues were left deprotonated, except Glu-290 which was protonated, in accordance also with the PDB2PQR (propka 3.0 method, pH 7.3) predictions.(24) His-41, His-163, His-172 and His-246 were protonated only at the Nε site. The rest of His residues were protonated only at the Νδ sites, to maintain the hydrogen bonding network within the crystal structures. All crystallographic water molecules are retained within each crystal structure. Four samples were, thus prepared, in a consistent way (two monomers, two dimers). The all-atom models, as defined previously, were embedded in triclinic boxes of around 7.2nm x 11.2nm x 8.0nm (monomer), or 12.3nm x 12.3 nm x 12.3 nm (dimer) in the x, y and z dimensions, respectively. Up to around 57000 TIP3P water molecules(25) were used to hydrate each protein. Ion (K + , Cl -) concentration was set at the value of 150 mM to mimic the physiological salt content for the monomer or dimer models, in addition to zero, or excess concentrations of KCl, NaCl, CaCl 2 at 0, 150, 300, 400 and 500 mM only for the monomer models. The various anionic strengths were only employed to indirectly 'enhance' the sampled conformational space of the MPro. A surplus of K + was also added to neutralize the protein charges in each sample, resulting in simulation unit boxes of 62400 (monomer), or 181800 (dimer) atoms. The Amber03(26) protein force field was used for the residues and ions. The Amber03 parameters for the natural products were derived based on ACPYPE algorithm.(27) The equilibration-relaxation for the all-atom systems is employed based on a published protocol for water-soluble proteins. (28) This contains a steepest descend energy minimization with a tolerance of 0.5 kJ mol -1 for 1000 steps, and a sequence of isothermal (nVT), isothermal-isobaric (nPT) runs with the gradual relaxation of the constraints on protein heavy atoms (from 10 4 in steps1-2 to 10 3 kJ mol -1 nm -2 in step-4) and Cα atoms (from 10 3 in step-5, to 10 2 in step-6, 10 in step-7, 1 in step-8 and 0 kJ mol -1 nm -2 in step-9) for around 30 ns, with a time step of 1.0 fs (steps 1-4) and 2.0 fs (steps 5-9). In detail: (step-1) Constant density and temperature (nVT) Brownian dynamics (BD) at 100 K for 50 ps that employs the Berendsen thermostat,(29) with a temperature coupling constant at 1.0 fs. (steps 2-3 ) Two short constant density (nVT) and constant pressure (nPT) runs for 100 ps each, with a weak coupling Berendsen thermostat and barostat(29) at 100 K employing time coupling constants of 0.1 ps for the temperature and isotropic 50.0 ps coupling for the pressure with a compressibility of 4.6x10 -5 . (step-4) Heating from 100 to 250 K in a constant density ensemble (nVT) for 3 ns employing the v-rescale thermostat,(30) with a time coupling constant of 0.1 ps. (step-5) Heating from 250 to 310K in a constant pressure ensemble (nPT) for 2 ns, employing the v-rescale thermostat (30) and Berendsen barostat, (29) with time coupling constants of 0.1 ps for the temperature and 2.0 ps for the pressure, removing also all but the Cα-atom protein position restraints. (step-6) Equilibration at 310K (0.1 ps temperature coupling constant) for 5 ns (nPT, 1 atm, 2.0 ps coupling constant for pressure. (steps 7-8) Equilibration at 310K (0.5 ps temperature coupling constant) for 5 ns (nPT, 1 atm, 2.0 ps coupling constant for pressure). (step-9) Equilibration at 310K (0.5 ps temperature coupling constant) for 10 ns (nPT, 1 atm, 2.0 ps coupling constant for pressure). The barostatsthermostats employed for steps 6-9 were the same as in the production trajectories that follow. For the production all-atom classical Molecular Dynamics (MD), the Newton's equations of motion are integrated with a time step of 2.0 fs at 310K. All production simulations are run with the leap-frog integrator in GROMACS 2020(31) for 3.0 μs each. They were performed at the constant pressure nPT ensemble, with isotropic coupling (compressibility at 4.5x10 -5 ) employing the v-rescale thermostat(30) (310K, temperature coupling constant 0.5) and the Parrinello-Rahman barostat(32) (1 atm, pressure coupling constant 2.0). Details for parameters can be found in earlier work. (28) The first 500 ns were considered further equilibration from each independent trajectory per sample, and were disregarded in the analysis. Van der Waals interactions were smoothly switched to zero between 1.0-1.2 nm with the VERLET cut-off scheme. Electrostatic interactions were truncated at 1.2 nm (short-range) and long-range contributions were computed within the PME approximation. (33, 34) Hydrogen bond lengths were constrained employing the LINCS algorithm. (35) We obtained a series of MD equilibrium trajectories of the MPro monomers (60μs) of SARS-CoV-2 main protease under different salts/ ionic strengths, without inhibitors. These should have explored a major part of the MPro conformational phase space. We combined the allatom MD simulations with Markov state model (MSM) theory (36) (37) (38) in order to enable the extraction of long-time-scale individual monomer dynamics from rather short-time-scale MD trajectories of different states. The application and accuracy of the powerful MSM theory has been presented in many cases also by experiments that include protein−protein, or protein-drug binding kinetics, as well as protein folding rates and protein dynamics. (39) (40) (41) (42) Our objective was to approximate the slow dynamics in a statistically efficient manner. Thus, a lower dimensional representation of our simulation data was necessary. In order to reduce the dimensionality of our feature space, we employed the time-structure independent components analysis (tICA) which yields a representation of our molecular simulation data with a reduced dimensionality and can greatly facilitate the decomposition of our system into the discrete Markovian states necessary for MSM estimation. The conformations of the system were projected on these slowest modes as defined by the tICA method, then the trajectory frames were clustered into 100 cluster-centers (microstates) by k-means clustering, as implemented in PyEMMA. (43) Conformational changes of a system can be simulated as a Markov chain, if the transitions between the different conformations are sampled at long enough time intervals so that each transition is Markovian. This means that a transition from one conformation to another is independent of the previous transitions. Therefore, an MSM is a memoryless model. The uncertainty bounds were computed using a Bayesian scheme. (44, 45) We found that the slowest implied timescales converge quickly and are constant within a 95% confidence interval for lag times above 50ns. The validation procedure is a standard approach in the MSM field., a lag time of 50 ns was selected for Bayesian model construction, and the resulting models were validated by the Chapman-Kolmogorov (CK) test. Subsequently, the resulting MSMs were further coarse grained into a smaller number of three metastable states or microstates, using PCCA++ as implemented in PyEMMA. (43) The optimum number of microstates (three) was proposed based on the VAMP2-score. (46) Both the convergence of the implied timescales, as well as the CK test confirm the validity and convergence of the MSM. The CK test indicates that predictions from the built MSM agree well with MSMs estimated with longer lag times. Thus, the model can describe well the longtime-scale behavior of our system within error. The tICA method identified the torsional angles of the following MPro residues: 3, 4, 5, 6, 84, 135, 141, 164, 167, 171, 175, 178, 179, 180, 190, 195, 217, 284, 285, 286, 290, 291, 300 , and 301 as the most important features, by setting a series of thresholds for the coefficients in the tICA vectors. At first, we set a threshold of 0.09. We continued by setting a threshold of 0.04 for the coefficients in the tICA vectors of the filtered data and afterwards a threshold of 0.085. Finally, we set a threshold of 0.075 and thus we concluded in the previously referred MPro residues. For the selection of these thresholds we checked for different thresholds the VAMP2-score and the states projected onto the first two independent components. We report the exact functions of the first two tICA components, as a linear combination of cosine/ sine functions of the associated torsionals: The residues that seem to affect the MPro-fortunellin dynamics are derived based on the same MSM/ tICA protocol described before in the absence of the inhibitor. These include: 44, 48, 53, 82, 83, 84, 111, 112, 118, 137, 138, 139, 141, 159, 182, 238, 239, 240, 286, 287, 288, 289, 291 . At first, we set a threshold of 0.075. We continued by setting a threshold of 0.12 for the coefficients in the tICA vectors of the filtered data, a threshold of 0.18, a threshold of 0.065 and afterwards a threshold of 0.135. Finally, we set a threshold of 0.125 and thus we concluded in the previously referred residues. However, the MPro states in the presence of fortunellin, reported in Figure 2E of the main manuscript, are derived based on the projection of the Mpro-fortunellin trajectory data on the aforementioned phase space of the CV1, CV2 functions. To enhance the conformational sampling on the MPro dimer we employed the parallel tempering metadynamics in the well-tempered ensemble (PTmetaD-WTE) method. (47) (48) (49) (50) Nine replicas per sample were run at 310, 320, 330, 341, 352, and 363K, 375K, 387K and 400K in which only the potential energy (PE) was initially biased (bias factor 120) to achieving large fluctuations in PE and replica overlaps. Replicas were allowed to exchange every 1000 steps for 0.2μs each, which gave an exchange probability around 20% in the WTE. The obtained bias was saved and used for the subsequent PTmetaD production runs for another 0.5μs per sample/ replica. Nine replicas were again considered at the same temperatures. Including the equilibration time at reach replica, a cumulative simulation time of 10μs was achieved. An exchange was attempted every 1000 steps, that gave an exchange probability between replicas at around 20%, consistent with the large sample sizes. The Collective Variables (CVs) chosen for the PTmetaD runs were the first two tICA vectors presented above (CV1/ CV2). A combination of the GROMACS 2020/ PLUMED 2.5(51) engines was employed. A bias factor of 25 at the well-tempered ensemble, along with Gaussians of 1.2 kJ/mol initial height, and sigma values (width) of 0.25 in the CV space, deposited every 2 ps, was employed. The grid space for both CVs is defined between -4 to 4 at a resolution of 0.05. Four different PTmetaD-WTE runs were performed for the pdb 6yb7/ 6lub7 -based dimers at 150mM KCl without inhibitor. Supplemental Table 1 PDB codes of 3CL-PRO crystals, deposited in the PDB databank, accessed at July 3, 2020. 5r7y, 5r7z, 5r80, 5r81, 5r82, 5r83, 5r84, 5r8t, 5re4, 5re5, 5re6, 5re7, 5re8, 5re9, 5rea, 5reb, 5rec, 5red, 5ree, 5ref, 5reg, 5reh, 5rei, 5rej, 5rek, 5rel, 5rem, 5ren, 5reo, 5rep, 5rer, 5res, 5ret, 5reu, 5rev, 5rew, 5rex, 5rey, 5rez, 5rf0, 5rf1, 5rf2, 5rf3, 5rf4, 5rf5, 5rf6, 5rf7, 5rf8, 5rf9, 5rfa, 5rfb, 5rfc, 5rfd, 5rfe, 5rff, 5rfg, 5rfh, 5rfi, 5rfj, 5rfk, 5rfl, 5rfm, 5rfn, 5rfo, 5rfp, 5rfq, 5rfr, 5rfs, 5rft, 5rfu, 5rfv, 5rfw, 5rfx, 5rfy, 5rfz, 5rg0, 5rg1, 5rg2, 5rg3, 5rgg, 5rgh, 5rgi, 5rgj, 5rgk, 5rgl, 5rgm, 5rgn, 5rgo, 5rgp, 5rgq, 5rgr, 5rgs, 5rgt, 5rgu, 5rgv, 5rgw, 5rgx, 5rgy, 5rgz, 5rh0, 5rh1, 5rh2, 5rh3, 5rh4, 5rh5, 5rh6, 5rh7, 5rh8, 5rh9, 5rha, 5rhb, 5rhc, 5rhd, 5rhe, 5rhf, 6lu7, 6lze, 6m03, 6m0k, 6m2n, 6m2q, 6w63, 6wnp, 6wqf, 6wtj, 6wtk, 6wtm, 6wtt, 6xa4, 6xb0, 6xb1, 6xb2, 6xbg, 6xbh, 6xbi, 6xch, 6y2e, 6y2f, 6y2g, 6y84, 6yb7, 6ynq, 6yt8 , 6yvf, 6yz6, 6z2e, 7bqy, 7bro, 7brp, 7brr, 7buy, 7c8r, 7c8t The ratio of sp3 hybridized carbons over the total carbon count of the molecule MR Molecular refractivity TPSA Topological polar surface area iLOGP Efficient Description of n-Octanol/Water Partition Coefficient XLOGP3 XLOGP3 predicts the logP value of a query compound by using the known logP value of a reference compound as a starting point. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study A new coronavirus associated with human respiratory disease in China Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved alpha-ketoamide inhibitors Targeting the Dimerization of the Main Protease of Coronaviruses: A Potential Broad-Spectrum Therapeutic Strategy From SARS to MERS: crystallographic studies on coronaviral proteases enable antiviral drug design Knowledge-based structural models of SARS-CoV-2 proteins and their complexes with potential drugs Recognition of Natural Products as Potential Inhibitors of COVID-19 Main Protease (Mpro): In-Silico Evidences The N-terminal octapeptide acts as a dimerization inhibitor of SARS coronavirus 3C-like proteinase The interaction between severe acute respiratory syndrome coronavirus 3C-like proteinase and a dimeric inhibitor by capillary electrophoresis Long-range cooperative interactions modulate dimerization in SARS 3CLpro Mutation of Asn28 disrupts the dimerization and enzymatic activity of SARS 3CL(pro) Two adjacent mutations on the dimer interface of SARS coronavirus 3C-like protease cause different conformational changes in crystal structure Residues on the Dimer Interface of SARS Coronavirus 3C-like Protease: Dimer Stability Characterization and Enzyme Catalytic Activity Analysis The FTMap family of web servers for determining and characterizing ligandbinding hot spots of proteins Fragment-based identification of druggable 'hot spots' of proteins using Fourier domain correlation techniques SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules While We Wait for a Vaccine Against SARS-CoV-2, Why Not Think About Available Drugs? Six months of coronavirus: The mysteries scientists are still racing to solve SARS-CoV-2 main protease with unliganded active site (2019-nCoV, coronavirus disease 2019, COVID-19). in RSCB PDB Markov models of molecular kinetics: generation and validation Mesoscale All-Atom Influenza Virus Simulations Suggest New Substrate Binding Mechanism CV1 = (0.22245479085963174)*COS(PHE3PHI)+( ASN84PHI)+(0.08785335907805712)*COS(ASN84PSI)+( HIS164PHI)+(0.1736049601096124)*COS(HIS164PSI)+( LEU167PHI)+(0.010408062573537607)*COS(LEU167PSI)+( LEU167PSI)+(-0.035449253177574046)*COS(VAL171PHI)+(0.0397831323983392)*SIN(VAL171PHI)+( VAL171PSI)+(-0.034102380825128384)*SIN(VAL171PSI)+(0.030847079909921624)*COS(THR175PHI)+(-0.020040648643118112)*SIN(THR175PHI)+(0.12177024626152966)*COS(THR175PSI)+( GLU178PHI)+(-0.09137875539957069)*COS(GLU178PSI)+(0.11214339094322172)*SIN(GLU178PSI)+(-0.09294920607798879)*COS(GLY179PHI)+(0.28997810094722837)*SIN(GLY179PHI)+( GLY179PSI)+(-0.09673320535071989)*SIN(GLY179PSI)+(0.010926172367439396)*COS(ASN180PHI)+( ASN180PHI)+(-0.14912645633298552)*COS(ASN180PSI)+(0.3816102785675176)*SIN(ASN180PSI)+(-0.10964069317986454)*COS(THR190PHI)+(-0.009927774218497826)*SIN(THR190PHI)+( THR190PSI)+(-0.16639818413751062)*SIN(THR190PSI)+(0.08017482811083185)*COS(GLY195PHI)+(-0.14979887114706295)*SIN(GLY195PHI)+(-0.050958292897186695)*COS SER284PSI)+(0.6649555215066218)*COS(ALA285PHI)+(0.44940445619426095)*SIN 0.7837894441282054)*COS(ALA285PSI)+(-0.7618813787744232)*SIN(ALA285PSI)+(-0.41060021549010234)*COS(LEU286PHI)+( LEU286PSI)+(0.2011752565537491)*COS(GLU290PHI)+(0.08698680780975648)*SIN( GLU290PHI)+( PHE291PHI)+(-0.17343642924439104)*SIN(PHE291PHI)+(-0.11571250082711405)*COS(PHE291PSI)+(0.02324511232530362)*SIN(PHE291PSI)+(-0.047478679669659275)*COS CV2 = (-0.2884241825864551)*COS(PHE3PHI)+(0.1818110423100234)*SIN(PHE3PHI)+(-0.4677450443427827)*COS(PHE3PSI)+(-0.5483897439796918)*SIN(PHE3PSI)+(-0.3553107265200588)*COS(ARG4PHI)+(-0.22394962064864046)*SIN(ARG4PHI)+( ARG4PSI)+(-0.3829402238055943)*SIN(ARG4PSI)+( COS(ASN84PHI)+(-0.08339140826693826)*SIN(ASN84PHI)+(-0.38318457142235807)*COS VAL171PSI)+(-0.02648078463371077)*SIN(VAL171PSI)+(0.13636122184567023)*COS(THR175PHI)+(-0.16662872008089358)*SIN(THR175PHI)+(0.005026593046059509)*COS(THR175PSI)+(-0.020823512651369957)*SIN(THR175PSI)+(-0.15569457005305837)*COS ASN180PSI)+(-0.47359288885540496)*SIN(ASN180PSI)+( SER284PHI)+(0.050094307429749026)*COS(SER284PSI)+(0.19617816860701598)*SI N(SER284PSI)+(0.1826732916695155)*COS(ALA285PHI)+(-0.035338601991516574)*SIN(ALA285PHI)+(-0.19924762948366187)*COS(ALA285PSI)+(-0.1564466277523662)*SIN(ALA285PSI)+(-0.0443686239963879)*COS(LEU286PHI)+( LEU286PHI)+(0.45176938273388506)*COS The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling The Protein Data Bank ZINC 15--Ligand Discovery for Everyone Open Babel: An open chemical toolbox GalaxyHomomer: a web server for protein homooligomer structure prediction from a monomer sequence or structure GalaxyRefine: Protein structure refinement driven by side-chain repacking GalaxySite: ligand-binding-site prediction by using molecular docking GalaxyWEB server for protein structure prediction and refinement Galaxy7TM: flexible GPCR-ligand docking by structure refinement Prediction of Protein Structure and Interaction by GALAXY Protein Modeling Programs GalaxyDock2: protein-ligand docking using betacomplex and global optimization GalaxyRefine2: simultaneous refinement of inaccurate local regions and overall protein structure Effective protein model structure refinement by loop modeling and overall relaxation UCSF Chimera--a visualization system for exploratory research and analysis Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved alpha-ketoamide inhibitors Long-range cooperative interactions modulate dimerization in SARS 3CLpro Mutation of Asn28 disrupts the dimerization and enzymatic activity of SARS 3CL(pro) Two adjacent mutations on the dimer interface of SARS coronavirus 3C-like protease cause different conformational changes in crystal structure Residues on the Dimer Interface of SARS Coronavirus 3C-like Protease: Dimer Stability Characterization and Enzyme Catalytic Activity Analysis HexServer: an FFTbased protein docking server powered by graphics processors Evaluation of protein docking predictions using Hex 3.1 in CAPRI rounds 1 and 2 Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors Quaternary Structure of the SARS Coronavirus Main Protease PDB2PQR: an automated pipeline for the setup of Poisson-Boltzmann electrostatics calculations Structure and dynamics of the TIP3P, SPC, and SPC/E water models at 298 K A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations ACPYPE -AnteChamber PYthon Parser interfacE Structure and Dynamics of a Thermostable Alcohol Dehydrogenase from the Antarctic Psychrophile Moraxella sp Molecular dynamics with coupling to an external bath Canonical sampling through velocity rescaling GROMACS: A message-passing parallel molecular dynamics implementation Polymorphic transitions in single crystals: A new molecular dynamics method Particle mesh Ewald: An N⋅ log (N) method for Ewald sums in large systems Ewald summation for systems with slab geometry LINCS: a linear constraint solver for molecular simulations Everything you wanted to know about Markov State Models but were afraid to ask Markov models of molecular kinetics: Generation and validation Markov state models of biomolecular conformational dynamics Protein conformational plasticity and complex ligand-binding kinetics explored by atomistic simulations and Markov models Complete protein-protein association kinetics in atomic detail revealed by molecular dynamics simulations and Markov modelling Molecular Simulation of ab Initio Protein Folding for a Millisecond Folder NTL9(1−39) Mesoscale All-Atom Influenza Virus Simulations Suggest New Substrate Binding Mechanism PyEMMA 2: A Software Package for Estimation, Validation, and Analysis of Markov Models Probability distributions of molecular observables computed from Markov models Estimation and uncertainty of reversible Markov models Variational Approach for Learning Markov Processes from Time Series Data New advances in metadynamics Well-tempered metadynamics: A smoothly converging and tunable free-energy method Enhanced sampling in the well-tempered ensemble Free-Energy Landscape for β Hairpin Folding from Combined Parallel Tempering and Metadynamics PLUMED 2: New feathers for an old bird SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules We acknowledge PRACE for awarding us access to Joliot-Curie at GENCI@CEA (Irene), France, through the "PRACE support to mitigate impact of COVID-19 pandemic" call and the project "Epitope vaccines based on the dynamics of mutated SARS-CoV-2 proteins at all atom resolution". We also acknowledge Greece and the European Union