key: cord-0816609-qpztmdnw
authors: Guo, Jingxu; Douangamath, Alice; Song, Weixiao; Coker, Alun R.; Edith Chan, A.W.; Wood, Steve P.; Cooper, Jonathan B.; Resnick, Efrat; London, Nir; von Delft, Frank
title: In crystallo-screening for discovery of human norovirus 3C-like protease inhibitors
date: 2020-07-16
journal: J Struct Biol X
DOI: 10.1016/j.yjsbx.2020.100031
sha: ee10ef00bf9c73978cc652ed7da31d26d7013934
doc_id: 816609
cord_uid: qpztmdnw
Outbreaks of human epidemic nonbacterial gastroenteritis are mainly caused by noroviruses. Viral replication requires a 3C-like cysteine protease (3CLpro) which processes the 200 kDa viral polyprotein into six functional proteins. The 3CLpro has attracted much interest due to its potential as a target for antiviral drugs. A system for growing high-quality crystals of native Southampton norovirus 3CLpro (SV3CP) has been established, allowing the ligand-free crystal structure to be determined to 1.3 Å in a tetrameric state. This also allowed crystal-based fragment screening to be performed with various compound libraries, ultimately to guide drug discovery for SV3CP. A total of 19 fragments were found to bind to the protease out of the 844 which were screened. Two of the hits were located at the active site of SV3CP and showed good inhibitory activity in kinetic assays. Another 5 were found at the enzyme’s putative RNA-binding site and a further 10 were located in the symmetric central cavity of the tetramer.
Gastroenteritis accounts for the deaths of over 2,000 children every day worldwide, making it the second leading cause of death for children under the age of 5, more than the combination of AIDS, malaria and measles (Liu et al., 2012). Whilst there are many other causes of gastroenteritis, including parasites, bacteria and viruses, human caliciviruses are recognised as the leading cause of gastroenteritis worldwide among people of all ages.
The Caliciviridae family contains five genera known as norovirus, vesivirus, nebovirus, sapovirus and lagovirus (Clarke et al., 2012), with norovirus being the most common cause of disease in humans (Lambden et al., 1993). Noroviruses account for more than 50% of gastroenteritis cases and at least 90% of nonbacterial acute gastroenteritis cases worldwide, as reported by the Centers for Disease Control and Prevention in the US (2011). Scallan et al. (2011) estimated that 99% of all viral foodborne illness incidents are caused by noroviruses, which corresponds to 5.5 million per year in the US alone. From 2009 to 2013, around 62.5% of norovirus cases needed long-term care facilities in order to control the transmission (Vega et al., 2014). Statistics are generally similar in Europe (Baert et al., 2009; Phillips et al., 2010). Globally, it is estimated that noroviruses lead to a total of $4.2 billion in direct health system costs and $60.3 billion in social cost per year (Bartsch et al., 2016). Clinical treatment and intervention are hampered by the lack of licensed vaccines or antivirals. Treatment with human immunoglobulin did show some benefit but did not result in clearance of the virus (Florescu et al., 2008). Whilst development of a vaccine has been hindered by the lack of small-animal models and cell culture systems, a number of norovirus vaccines are yielding promising results in clinical trials.
Noroviruses are genetically classified into 7 genogroups, GI-GVII, based on the amino acid sequence of the VP1 capsid protein and are further segregated into at least 40 genotypes (Vinjé, 2015). Noroviruses from groups GI (like Southampton virus) and GII infect humans, as do members of the GIV.1 subgroup. GII viruses are the most frequently detected (89%) while GII.4 are the major cause of norovirus outbreaks worldwide (Siebenga et al., 2009).
Many noroviruses have been reported, such as Norwalk virus (Jiang et al., 1993), Hawaii virus (Lew et al., 1994a), Snow Mountain virus (Lochridge and Hardy, 2003), Desert Shield virus (Lew et al., 1994b), Southampton virus (Clarke and Lambden, 1997) and Lordsdale virus (Lambden et al., 1993). The norovirus genome consists of a single-stranded positive-sense RNA of 7.5-7.7 kb in length and contains three open reading frames (ORFs) (Lambden et al., 1993), except for the murine norovirus which has a fourth alternative ORF (McFadden et al., 2011). ORF1 encodes a 200 kDa non-structural polyprotein which is co- and post-translationally cleaved into six or seven non-structural proteins by the viral 3C-like protease (NS6). The seven products of this proteolysis are, from N-terminus to C-terminus: p48 (NS1-2), an NTPase (NS3), a 3A-like protein (p22, NS4), a viral genome-linked protein (VPg, NS5), the 3C-like protease (3CLpro, NS6) and an RNA-dependent RNA polymerase (RdRp, NS7) (Blakeney et al., 2003). ORF2 and ORF3 encode the capsid protein VP1 and the minor structural protein VP2, respectively.
The 3C-like protease (3CLpro) was named because of its similarity to the picornavirus 3C protease. It is a cysteine protease which shows a typical chymotrypsin-like fold containing two domains: a β-barrel domain and a β-sheet domain separated by a groove where the active site is located (Bazan and Fletterick, 1988; Boniotti et al., 1994). The active site is characterised by a catalytic dyad (Cys139-His30) (Someya et al., 2002) or triad (Cys139-His30-Glu54) (Tiew et al., 2011) and shows a strong preference for a -D/E-F/Y-X-L-Q-G-P- sequence (where X can be H, Q or E) corresponding to the subsites S5-S4-S3-S2-S1-S1'-S2' (Tiew et al., 2011).
Studies have indicated that norovirus 3CL proteases have a preferential order of processing the polyprotein; for example, the Southampton virus 3CLpro has a preference for cleavage at LQ-GP and LQ-GK, but it can also cleave at ME-GK, FE-AP and LE-GG (Hussey et al., 2011). Although several norovirus 3CLpro structures have been determined (Hussey et al., 2011; Nakamura et al., 2005; Zeitler et al., 2006), the full structural basis of how these enzymes recognise these different sites is still unknown.
The key role of norovirus 3CLpro in the processing of the polyprotein and the absence of homologues in the human host make it an excellent target for antiviral drug discovery. There is currently no clinically approved norovirus 3CLpro inhibitor available but several compounds have been reported with strong inhibitory activity against 3CL proteases in vitro. These are usually peptidyl or macrocyclic compounds mimicking the substrate sequence whilst possessing a transition state analogue (Damalanka et al., 2017; Kankanamalage et al., 2015; Mandadapu et al., 2012). Examples include peptidyl aldehydes and α-ketoamides which showed strong inhibition of norovirus 3CLpro, and the 3C or 3C-like proteases in picornaviruses and coronaviruses in cell-based assays. The aldehydes and α-ketoamides act as warheads which form a reversible adduct with the catalytic residue Cys139 in the active site. These compounds are termed latent transition state (TS) inhibitors. TS mimics, such as α-hydroxyphosphonate, are converted to the aldehyde form either with or without catalytic action of the enzyme and form a tetrahedral adduct with the Cys139 residue (Kankanamalage et al., 2015). Hussey et al. (2011) first reported the X-ray structure of the Southampton norovirus 3CLpro (SV3CP) with an inhibitor bound. This compound consisted of part of the most rapidly cleaved substrate sequence (EFQLQ) with a Michael acceptor moiety linked to the S1 residue Gln.
This is attacked by Cys139 and a covalently bound complex is formed. Interestingly, the His30 sidechain is pushed away by the inhibitor, which disrupts the catalytic triad.
Screening by mass-spectrometry for covalent inhibitors of SV3CP has been described by us previously (Resnick et al., 2019). In this work we have crystallised the protease in its native form with an unperturbed catalytic triad and have conducted crystal-based fragment screening of 844 compounds with the aim of discovering novel inhibitory functional groups which have the potential to be developed as therapeutic agents, either on their own or through chemical coupling. A total of 19 compounds were found to bind to 3CLpro in the crystals and two of them were located in the active site while another 5 were located at the enzyme's putative RNA-binding site. A further 10 compounds were found to bind in the central cavity of this putative tetrameric form of the enzyme.
Expression and purification of SV3CP was conducted using the method described by Hussey et al. (2011). Screening for crystallisation conditions for SV3CP was accomplished using the sitting-drop method at 21 °C with the screening kits: Structure Screen 1 & 2, JCSGplus, PACT premier, MIDAS and Morpheus from Molecular Dimensions (Suffolk, UK). A TTP Labtech Mosquito crystal screening robot (TTP Labtech, Hertfordshire, UK) was used to dispense 400 nl of the protein, at concentrations of 5 mg/ml and 10 mg/ml, with 400 nl of the corresponding well solution into each drop. High quality crystals were obtained in 0.2 M ammonium citrate and 12% (v/v) PEG3350 after approximately one week, although crystals kept appearing over the next 2-3 months prior to screening. Selected crystals were cryo-protected in 30% glycerol and mounted in loops before flash-cooling. X-ray data were collected at beamline I04-1 at Diamond Light Source (DLS, Didcot, England).
Fine-sliced data were collected as guided by the strategy suggested by the program EDNA (Incardona et al., 2009). Data were processed automatically by the program xia2 (Winter, 2010) at DLS, which revealed the space group to be C2, as shown in Table 1. Further analysis using Phenix.xtriage (Zwart et al., 2005) suggested that the data were of good quality. The solvent content of this crystal form was estimated to be 44.9% using Matthews_coef (Kantardjieff and Rupp, 2003). Several rounds of manual rebuilding and correction were performed using Coot (Emsley and Cowtan, 2004) followed by restrained refinement using Refmac5 (Murshudov et al., 2011) and Phenix.refine (Afonine et al., 2012). Since the crystal diffracted to near atomic resolution, the temperature factors were refined anisotropically. Structure validation was performed with MolProbity (Chen et al., 2010). The statistics for data collection, data processing and refinement are shown in Table 1.
Crystal tolerance to DMSO was tested at concentrations (v/v) of 0, 10%, 20%, 30% and 40%, and on soaking time scales of 1 h, 3 h and overnight. In order to make the experiment more efficient, the crystals were also tested with and without additional cryo-protectant for data collection. It was found that these crystals could survive in 40% DMSO for many hours and additional cryo-protection was not required. The plates containing crystals were imaged using a Rock imager system (Formulatrix, USA). All the crystals were then ranked using the program TeXRank (Ng et al., 2014) and positional coordinates for the injection of the fragments were manually defined in the drop. Each fragment from the DSPL library (776 fragments) (Cox et al., 2016) Fragment soaking was conducted in batches to give an average soaking time of approximately 2.5 hours prior to crystal mounting. Crystal harvesting was aided by the use of a crystallisation plate shifter (Oxford Lab Technologies, Oxford, UK).
All the crystals were mounted in loops of about the same size as the crystals or slightly smaller to allow for automated, unattended data collection in which the X-ray beam was aimed at the centre of each loop. A total of 180° of data were collected for each crystal, taking approximately 60 seconds per crystal using DLS beamline I04-1. The data produced were managed using XChemExplorer (Krojer et al., 2017), which gathered ligand information and data processing results and launched different software pipelines, such as DIMPLE (Wojdyr et al., 2013) and PanDDA (Pearce et al., 2016), for further analysis and hit identification. PanDDA uses an average of several ground-state crystal structures to calculate a background density correction which reveals better electron density for weakly bound fragments. All the hits were checked visually by using the program Pandda.inspect in the PanDDA suite (Pearce et al., 2016). The hits were further refined using Refmac5 (Murshudov et al., 2011) followed by inspection using Coot (Emsley and Cowtan, 2004) for several rounds (Table 1). In most cases anisotropic B-factor refinement was undertaken and the fragment occupancy was fixed. Confirmatory omit maps for the ligands were generated using the program Composite omit map (Terwilliger et al., 2008) in the PHENIX program suite (Adams et al., 2010). Interactions between ligands and SV3CP were analysed using LigPlot+ (Wallace et al., 1995).
The protease (0.5 mg/ml final concentration) in a buffer containing 100 mM Tris, pH 8.5, and 5 mM β-mercaptoethanol was mixed with the fragment (dissolved in DMSO at concentrations of 0.027, 0.135, 0.27, 0.405 and 0.54 mM) for 20 min at RT.
The solution was then mixed with the chromogenic substrate (Ac-EFQLQ-para-nitroaniline; Peptide Protein Research Ltd, Southampton, UK), which was dissolved in DMSO to give final concentrations of 0.4, 0.9, 1.4, 1.9, 2.5 and 3.0 mM, in a 1:1 ratio and the absorbance at 405 nm was measured at 20 s intervals over a 3 min period, using a Nanodrop ND1000 spectrophotometer. The Ki values were determined using GraphPad Prism (www.graphpad.com).
The structure of native SV3CP has been determined for the first time at the near-atomic resolution of 1.3 Å (Fig. 1a), revealing a crystallographic tetramer (Fig. 1b). The monomers consist of an N-terminal and a C-terminal domain with the active site cleft located in between. As found in other noroviral 3CLpro structures, the N-terminal domain contains an α-helix and a twisted 7-stranded antiparallel β-sheet forming an incomplete β-barrel (Anand et al., 2002; Birtley et al., 2005; Mosimann et al., 1997). The C-terminal domain is made up of 6 β-strands forming an antiparallel β-barrel and contains the catalytic cysteine residue (Cys139) which makes a catalytic triad with two residues from the N-terminal domain (His30 and Glu54; Fig. 1a). Interestingly, the β-hairpin formed by β9 and β10, which is involved in binding the N-terminal side of the substrate peptide, adopts an appreciably different conformation from that observed in an earlier inhibitor-complexed structure.
The SV3CP enzyme has approximately 90% sequence identity with other GI noroviral 3C proteases and an identity of the order of 68% with the enzyme from the GII genotype. SV3CP has approximately 59% identity with the mouse norovirus enzyme. The monomer structures of these enzymes superpose with SV3CP with a Cα RMSD of typically 1.0-1.2 Å for virtually all of the amino acids in the chains. The structures differ most noticeably in the hairpin linking strands β9 and β10 which is close to the active site.
In line with other noroviral 3C proteases which have been analysed by gel-filtration, it is highly likely that SV3CP forms dimers in solution or, at least, exists in a monomer-dimer equilibrium (Leen et al., 2012; Zeitler et al., 2006). Accordingly, a dimer is observed in the crystallographic asymmetric unit of SV3CP (Fig. 2, chains A and B). However, analysis with the PDBePisa website (Krissinel and Henrick, 2007) suggested a tetrameric form (Fig. 1b) might also be stable in solution. The interface area between the chains of the crystallographically observed dimers (formed by chains A and B) is 883.0 Å². However, a neighbouring dimer in the crystal structure forms an interface of comparable buried surface area (692.3 Å²) between the chains labelled A and D, and likewise between the chains labelled B and C. This result indicates that higher order oligomers may possibly be formed by SV3CP dimers, such as the putative tetramer shown in Fig. 1b. Given that localised replication centres are known to form within norovirus-infected cells (e.g. Thorne and Goodfellow, 2014), a high local concentration of 3CLpro may allow the enzyme to tetramerise.
In the native SV3CP structure, no electron density is visible for the last 8 residues (ASEGETTL) at the C-terminal end of the protein. Since these residues are well-defined in the complex with a substrate analogue (Hussey et al., 2011), their absence in the native structure might be due to autolysis during storage or crystallisation of the uninhibited protease. In this region of the structure, there is a minor consensus sequence for SV3CP cleavage with the amino acids VQ-AS corresponding to the P2-P1-P1'-P2' positions (Hussey et al., 2011; Kankanamalage et al., 2015), suggesting that slow autolysis prior to crystal growth is possible. Mass spectrometric analysis of the purified protein yielded a molecular mass of 19,290 daltons (Supplementary Fig.
2) confirming that the protease was indeed fully intact at the time of crystallisation. Therefore another possibility is that this region of the molecule is simply disordered in the new crystal form. However, it is not clear why this should be since this region of both monomers is not involved in crystal contacts in either crystal form.
Most crystals used in the non-covalent fragment screening experiment diffracted to resolutions ranging from 1.5 to 1.8 Å with good crystallographic statistics (Table 1). Fragment J12 is the worst in terms of resolution, diffracting to approximately 2.1 Å, although the electron density is still of good quality. Screening with the DSPL library and part of the Maybridge Ro3 library identified 19 ligands in total which bind in five different sites, as illustrated in Fig. 2. The majority of fragments have mean B-factors which are comparable with those of the protein moieties (Table 1). In only one case (J02) was the occupancy of the fragment refined, although for several others it was set to 0.5 due to the fragments residing on a 2-fold axis. Site A, the protease active site, is a long groove containing the catalytic Cys139 residue. Two fragments (J01 and J02) were found to bind here, each on different sides of the catalytic cysteine (Fig. 2). Five hits (J03-J07) were found to bind in the putative RNA-binding site (site B), including one (J07) which also binds in another site, site C. Site C lies in a pocket between chains A and B and the symmetry-related chains A' and B', with 11 hits (J07-J17) being identified here. Two other fragments were found at additional sites: D (J18) and E (J19). Molecular structures of the ligands J01-J19 are given in Fig. 3.
Two non-covalently bound fragments, named J01 and J02, were identified in the active site of the protease, as indicated by their omit maps (Fig. 4a and c). J01 binds in the S1 subsite where its carboxyl group is oriented towards S2 and S3.
J01 forms several direct hydrogen bonds with the side chains of Gln110 and Arg112 and makes some additional hydrogen bonds mediated by a water molecule (Fig. 4a and b). These residues are at the tip of the functionally important β-hairpin (connecting strands β9 and β10) that is involved in substrate recognition and moves substantially upon binding of polypeptide substrate analogues (Fig. 1). However, in the presence of J01, the β-hairpin adopts the same conformation as the ligand-free SV3CP, suggesting that binding of this fragment does not alter its conformation. Since the carboxyl group of J01 appears to hold the β-hairpin loop (residues 109 to 112) in the closed conformation, this must help to prevent the enzyme from adopting the 'open' conformation that can accommodate the substrate. The ligand -NH group (N1) is also within hydrogen bonding distance of the main chain carbonyl group of Thr134. The benzoic acid moiety of J01 makes many hydrophobic interactions with the active site residues including Pro136, Cys139 and Ala160. In contrast, the 5-methyl-2-thienyl group forms fewer contacts with the enzyme than the aromatic group since it points away from the active site towards a large solvent channel.
J02 resides on the other side of the long active site, where it occupies the S2 subsite without forming any hydrogen bonds (Fig. 4c and d). Instead, the phenyl ring is sandwiched between the side chains of His30 of the catalytic triad and Arg112 from the β-hairpin loop by π-π stacking and cation-π interactions. Interestingly, the guanidinium group of Arg112 has moved from its position in the other fragment complex to accommodate J02. Several hydrophobic interactions are formed between this fragment and Glu54 from the catalytic triad and Val114, and a number of contacts are made with a symmetry-related molecule.
In kinetic assays both J01 and J02 showed inhibitory activity against SV3CP with Ki values of 0.37 mM and 0.34 mM, respectively.
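To put the reported inhibition constants of J01 and J02 (0.37 mM and 0.34 mM) on an energy scale, the short Python sketch below converts them to approximate binding free energies using ΔG ≈ RT ln(Ki). It is an illustration only and not part of the published analysis: it assumes that Ki can be treated as a dissociation constant relative to a 1 M standard state and that the assay temperature ("RT") was about 298 K, and the function name `binding_free_energy` is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def binding_free_energy(ki_molar, temperature_k=298.15):
    """Approximate binding free energy (kcal/mol) from an inhibition constant.

    Assumes Ki behaves as a dissociation constant (1 M standard state).
    The default temperature of 298.15 K is an assumption; the assay is
    only described as being run at room temperature.
    """
    if ki_molar <= 0:
        raise ValueError("Ki must be positive")
    return R_KCAL * temperature_k * math.log(ki_molar)


# Ki values reported for the two active-site fragments, in mM
ki_values_mM = {"J01": 0.37, "J02": 0.34}

# Convert mM to M before taking the logarithm
dg = {name: binding_free_energy(ki * 1e-3) for name, ki in ki_values_mM.items()}

for name, value in dg.items():
    print(f"{name}: Ki = {ki_values_mM[name]} mM, dG ~ {value:.1f} kcal/mol")
```

Under these assumptions both fragments come out at roughly -4.7 kcal/mol, so the small difference between the two Ki values corresponds to a negligible difference in binding energy.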
These Ki values are typical of initial hits in crystallographic fragment-screening studies targeting catalytic or allosteric sites of enzymes (Bauman et al., 2013; Delbert et al., 2018; Zhang et al., 2019), suggesting that the binding modes we observe in 3CLpro are highly relevant. Since J01 and J02 bind in the active site cleft and maintain the closed conformation of the hairpin, they are good candidates for developing further inhibitors, and linking them into a new compound could also improve the bioactivity. A superposition of their binding modes on that of the covalently bound Michael acceptor inhibitor (Fig. 5) demonstrates how these two fragments occupy the S1 and S2 subsites, respectively. J02 does not overlap with the P2 residue of the polypeptide inhibitor as well as J01 and the P1 residue do, since it appears to lie somewhere between the spatially adjacent S2 and S1' subsites.
In addition to the protease activity, studies on viral 3C proteases suggested that they or their larger precursors can bind specifically to the 5'-terminal nucleotides of the viral RNA (Leong et al., 1993; Nayak et al., 2006). The interaction occurs only on the plus strand which forms a ribonucleoprotein (RNP) complex that is necessary for the initiation of the plus strand synthesis (Andino et al., 1990). It has been shown that human noroviral RNA noncompetitively inhibits the protease activity with an IC50 in the µM range (Viswanathan et al., 2013). The RNA binding site has been studied by mutagenesis in other homologous 3C proteases, in which a key arginine residue was identified in the conserved sequence, KF/VRDI (F/V represents F or V) (Bergmann et al., 1997; Leong et al., 1993; Nayak et al., 2006). Structural comparison of SV3CP with HRV 3CLpro (PDB ID: 5fx5; Kawatkar et al., 2016) and FMDV 3CLpro (PDB ID: 2j92; Nayak et al., 2006) identified Arg65 as the equivalent residue in SV3CP, which is within a KIRPDL sequence that has similarity with the consensus.
The R and D residues in this sequence interact by a salt-bridge that forms one side of the putative RNA-binding site of SV3CP (site B) which is shown in Fig. 2 and, as for the FMDV and HRV proteases, it is a shallow groove. In addition, these sites are in crystal contact areas and form deep channels with the neighbouring symmetry-related molecules in HRV, FMDV and SV3CP 3CLpro. Inhibitors binding in the RNA-binding site have the potential to inhibit noroviral replication and are therefore of interest as a separate class of drug.
Fragments J03-J06 were found to reside at this site and their contact residues are shown in Fig. 6. All the fragments form hydrophobic contacts with Arg65 and other residues in the KIRPDL sequence. While J03 (Fig. 6a and b) and J06 (Fig. 6g and h) are mainly involved in hydrophobic interactions, J04 (Fig. 6c and d) and J05 (Fig. 6e and f) also form many hydrogen bonds with the neighbouring residues, potentially making them stronger binders. The carbonyl group (O1) of J04 is involved in three hydrogen bonds formed, directly or mediated by a water molecule, with Thr10, Lys11 and Ser91 (although the latter residue is from a symmetry-related molecule). The N1 atom forms two hydrogen bonds with Ser7 and Pro3 (also from the symmetry mate) with the participation of a water molecule. A hydrogen bond is also seen between the fluorine substituent in the indole ring of J04 and the NE1 atom on the side chain of Trp19. This residue is one of a number of quite solvent-exposed aromatic residues, including phenylalanines 12, 25, 39 and 40, which form the putative RNA-binding site. J05 also forms water-mediated hydrogen bonds with Ser91 from the symmetry-related molecule. Unlike the active site fragments which bind in different subsites of the substrate-binding channel, these four fragments bind in approximately the same position with their aromatic 'heads' overlapping to a large degree but their aliphatic 'tails' pointing away in different directions.
Since binding of viral RNA inhibits the protease activity (Viswanathan et al., 2013), ligands binding at this site have the potential to interfere both with RNA binding and with the protease activity. However, since this site is of the order of 20 Å from the catalytic centre, the mechanism of protease inhibition is currently difficult to explain. Fragment J07 was found to bind in both the putative RNA-binding site (B, Fig. 6i and j) and site C (Fig. 7a and b) in the centre of the putative tetramer.
The finding that the native crystals of the enzyme are formed by a tetrameric assembly of monomers is suggestive of a physiological role for the tetramer. We were also intrigued to find that the majority of the fragments binding to the protease (J07-J17, Fig. 7) were located in a cavity at the centre of the putative tetramer, site C. The site is characterised by the convergence of two-fold symmetry axes, both crystallographic and non-crystallographic, since the NCS two-fold relating the monomers in each dimer and the crystallographic two-fold relating both dimers in the tetramer meet at this point. The binding site is formed by four copies of the hydrophobic amino acids Leu122 and Val82 as well as Arg100, which are provided by all chains of the tetramer. These residues have a high level of sequence conservation. The sidechain of the arginine tends to form extensive stacking interactions with the aromatic moieties of the ligand. Since this site is formed at the convergence of 2-fold axes, two copies of each ligand are present at this site and sometimes the two symmetry-related copies of the fragment interact extensively with each other. Since the same tetrameric assembly is observed in other GI and GII norovirus proteases, this binding site may be a conserved feature of these enzymes. Given its ability to bind so many heteroaromatic fragments and the diverse functions which noroviral proteins and their precursors are known to have (e.g.
Emmott et al., 2019), it is tempting to speculate that the tetramer cleft has a physiological role, perhaps even as a secondary substrate- or RNA-binding site.
Two of the fragments (J18 and J19, Fig. 8) were found to bind at unrelated sites involving crystal contacts which are probably not of physiological significance. Site D lies close to Lys11, Lys88 and Glu93 whereas site E lies between Arg59 and the C-terminal end of the enzyme. The amide bond within J18 has apparently been cleaved and the resulting fragments, trifluoroacetic acid and 2-ethyl-1,3,4-thiadiazole, bind at sites C and D, respectively. Interestingly, it appears that the amide bond in J11 has also been cleaved and the resulting 2-ethyl-1,3,4-thiadiazole binds instead at site C. A check on the stock solution of this compound was made by mass spectrometry and this yielded a main mass of 130 daltons, which is within a dalton of the predicted molecular mass of the observed fragment. It is possible that the electron withdrawing groups on the amino terminal side of the amide bonds of these two compounds may render them unstable in water.
The X-ray structure of the Southampton virus 3CLpro has been determined at 1.3 Å resolution in a crystal form that has allowed fragment-screening for novel inhibitors to be undertaken at similar resolutions. Two fragments were found to bind in the active site cleft of the protease. J01 and J02 bind in different subsites of the long active site (see Fig. 5) but both of them interact with the functionally important β-hairpin linking strands β9 and β10. J01 occupies S1 and forms hydrophobic interactions with catalytic Cys139 while J02 occupies S2 and forms hydrophobic and π-π interactions with Glu54 and His30, which are also from the catalytic triad.
Both J01 and J02 could potentially be developed into more potent norovirus protease inhibitors; however, a better ligand might ultimately be obtained by coupling them together, given that the distance between the closest two atoms is slightly less than 3.8 Å. Some of the remaining fragments were found to interact with the protease at its putative RNA-binding site. Whilst these compounds are likely to have less effect on the protease activity than J01 and J02, which bind in the active site, RNA binding to the enzyme has been shown to cause non-competitive inhibition of the protease (Viswanathan et al., 2013). Other fragments were found to bind at an additional site which is buried deeply in the centre of the crystallographic tetramer. The fact that a C193A mutant of the Minerva virus protease forms the same tetramer in the crystal, with the C-terminus of one subunit occupying the active site cleft of another monomer (Muzzarelli et al., 2019), suggests that this assembly may also be involved in proteolytic maturation of noroviruses. Hence, compounds that have the potential to interfere with formation of the tetramer or affect its stability may impact on noroviral replication and therefore deserve to be screened for in vivo activity, e.g. against mouse norovirus, which can be cultured, or in a suitable replicon assay. If such studies were to be successful, the highly symmetric nature of the binding site is something that could, in principle, be exploited in drug design.
Given the recent COVID-19 pandemic, it is potentially useful to compare our results on SV3CP with the 3CLpro of coronavirus (e.g. Yang et al., 2003). The two enzymes have quite low sequence identity of approximately 12% within the common protease moieties and superimpose with an RMSD of 2.4 Å for 126 structurally aligned residues. The coronavirus protease is considerably larger (303 residues) than SV3CP due to the presence of a C-terminal domain which is involved in dimerisation.
Although topologically similar, the protease moieties of both structures differ very substantially in the loop regions connecting the core β-strands. In spite of these differences, coronavirus protease also has specificity for Gln at the P1 position of substrate. In very recent fragment screening of the SARS-CoV-2 protease, 23 active site hits were obtained which span the S3 to S1' subsites of the enzyme, thus providing somewhat better coverage of the active site cleft than we have achieved with SV3CP (Douangamath et al., 2020). Other SARS-CoV-2 protease inhibitor structures have also been reported in recent months (Dai et al., 2020; Jin et al., 2020a; Zhang et al., 2020). This resurgence of interest in rational 3CLpro drug design is likely to have combined benefits for what are currently intractable and severe viral infections. These studies provide a rational basis on which compounds with improved potency can be designed by medicinal chemists.
Table 1. X-ray statistics for the native SV3CP structure and fragment complexes. Values in parentheses are for the high resolution shell. For the minority of structures where the overall fragment occupancy was either refined or is less than unity due to proximity with a symmetry axis, the fractional occupancy is shown following the mean fragment B-factor.
The β-hairpin loop connecting strands β9 and β10 moves significantly from its position in the native structure (which is very close to its position in the J01 and J02 complexes) upon binding the polypeptide inhibitor.
Interactions between SV3CP and fragments J07-J17 which bind in site C at the centre of the putative tetramer. These are shown in 3D with the omit electron density contoured at 1.0 RMSD as (a, c, e, g, i, k, m, o, q, s, u) and in 2D with interacting residues shown in (b, d, f, h, j, l, n, p, r, t, v), respectively. Hydrogen bonds are indicated by dashed lines in cyan and hydrophobic interactions are indicated by red eyebrow-like icons.
Protein chain identifiers are indicated by the letters A and B in brackets and those with a prime are from symmetry-related chains.

PHENIX: a comprehensive Python-based system for macromolecular structure solution
Towards automated crystallographic structure refinement with phenix.refine
Structure of coronavirus main proteinase reveals combination of a chymotrypsin fold with an extra α-helical domain
A functional ribonucleoprotein complex forms around the 5′ end of poliovirus RNA
Reported foodborne outbreaks due to noroviruses in Belgium: the link between food and patient investigations in an international context
Global Economic Burden of Norovirus Gastroenteritis
Crystallographic Fragment Screening and Structure-Based Optimization Yields a New Class of Influenza Endonuclease Inhibitors
Viral cysteine proteases are homologous to the trypsin-like family of serine proteases: structural and functional implications
The Refined Crystal Structure of the 3C Gene Product From Hepatitis A Virus: Specific Proteinase Activity and RNA Recognition
Crystallographic and electrophilic fragment screening of the SARS-CoV-2 main protease
Polyprotein processing and intermolecular interactions within the viral replication complex spatially and temporally control norovirus protease activity
Coot: model-building tools for molecular graphics
Structure determination of Murine Norovirus NS6 proteases with C-terminal extensions designed to probe protease-substrate interactions
Two cases of Norwalk virus enteritis following small bowel transplantation treated with oral human serum immunoglobulin
A structural study of norovirus 3C protease specificity: binding of a designed active site-directed peptide inhibitor
EDNA: a framework for plugin-based applications applied to X-ray experiment online data analysis
Sequence and genomic organization of Norwalk virus
Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors
Structural basis for the inhibition of SARS-CoV-2 main protease by antineoplastic drug carmofur
Structure-guided design and optimization of dipeptidyl inhibitors of norovirus 3CL protease. Structure-activity relationships and biochemical, X-ray crystallographic, cell-based, and in vivo studies
Matthews coefficient probabilities: Improved estimates for unit cell contents of proteins, DNA, and protein-nucleic acid complex crystals
Design and Structure-Activity Relationships of Novel Inhibitors of Human Rhinovirus 3C Protease
Broad-spectrum antivirals against 3C or 3C-like proteases of picornaviruses, noroviruses, and coronaviruses
Inference of macromolecular assemblies from crystalline state
The XChemExplorer graphical workflow tool for routine or large-scale protein-ligand structure determination
Sequence and genome organization of a human small round-structured (Norwalk-like) virus
Structure of a murine norovirus NS6 protease-product complex revealed by adventitious crystallisation
Human rhinovirus-14 protease 3C (3Cpro) binds specifically to the 5'-noncoding region of the viral RNA. Evidence that 3Cpro has different domains for the RNA binding and proteolytic activities
Molecular characterization of Hawaii virus and other Norwalk-like viruses: evidence for genetic polymorphism among human caliciviruses
Molecular characterization and expression of the capsid protein of a Norwalk-like virus recovered from a Desert Shield troop with gastroenteritis
Child Health Epidemiology Reference Group of WHO and UNICEF Global, regional, and national causes of child mortality: an updated systematic analysis for 2010 with time trends since
Snow Mountain virus genome sequence and virus-like particle assembly
Norovirus vaccines under development
peptidyl α-ketoamides and α-ketoheterocycles
Production and Clinical Evaluation of Norwalk GI.1 Virus Lot 001-09NV in Norovirus Vaccine Development
Phaser crystallographic software
Norovirus regulation of
Refined X-ray crystallographic structure of the poliovirus 3C gene product
REFMAC5 for the refinement of macromolecular crystal structures
Structural and Antiviral Studies of the Human Norovirus GII.4 Protease
A norovirus protease structure provides insights into active and substrate binding site integrity
Role of RNA Structure and RNA Binding Activity of Foot-and-Mouth Disease Virus 3C Protein in VPg Uridylylation and Virus Replication
Norovirus antivirals: Where are we now
Using textons to rank crystallization droplets by the likely presence of crystals
A Multi-Crystal Method for Extracting Obscured Signal from Crystallographic Electron Density
Community incidence of norovirus-associated infectious intestinal disease in England: improved estimates using viral load for norovirus diagnosis
Rapid Covalent-Probe Discovery by Electrophile-Fragment Screening
Foodborne illness acquired in the United States--major pathogens
Norovirus illness is a global problem: emergence and spread of norovirus GII.4 variants
Identification of active-site amino acid residues in the Chiba virus 3C-like protease
Iterative-build OMIT maps: map improvement by iterative model building and refinement without model bias
Norovirus gene expression and replication
Design, synthesis, and evaluation of inhibitors of Norwalk virus 3C protease
Genotypic and epidemiologic trends of norovirus outbreaks in the United States
Advances in laboratory methods for detection and typing of norovirus
Norovirus Protease Shows pH-Sensitive Proteolysis with a Unique Arg-His Pairing in the Catalytic Site
RNA binding by human Norovirus 3C-like proteases inhibits protease activity
LIGPLOT: a program to generate schematic diagrams of protein-ligand interactions
xia2: an expert system for macromolecular crystallography data reduction
DIMPLE - a pipeline for the rapid generation of difference maps from protein crystals with putatively bound ligands
The crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor
X-ray crystallographic structure of the Norwalk virus protease at 1.5-Å resolution
Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors
Construction of a Shape-Diverse Fragment Set: Design, Synthesis and Screen against Aurora-A Kinase
Xtriage and Fest: automatic assessment of X-ray data and substructure structure factor estimation

We thank Profs P. M. Shoolingin-Jordan and I. N. Clarke (University of Southampton) for providing the expression construct and for numerous helpful discussions. We also thank Anthony Aimon (DLS Ltd) for fragment mass-spectrometric analysis.
Noroviruses responsible for 99% of viral foodborne illness
Norovirus 3C-like protease is excellent drug target
X-ray fragment-screening
A total of 19 fragment hits were found
Two located at the active site and showed inhibitory activity
Five found at the enzyme's putative RNA-binding site
Ten in the symmetric central cavity of the tetramer