key: cord-0694114-00iixj0f
authors: Moiani, Davide; Link, Todd M.; Brosey, Chris A.; Katsonis, Panagiotis; Lichtarge, Olivier; Kim, Youngchang; Joachimiak, Andrzej; Ma, Zhijun; Kim, In-Kwon; Ahmed, Zamal; Jones, Darin E.; Tsutakawa, Susan E.; Tainer, John A.
title: An efficient chemical screening method for structure-based inhibitors to nucleic acid enzymes targeting the DNA repair-replication interface and SARS CoV-2
date: 2021-09-27
journal: Methods Enzymol
DOI: 10.1016/bs.mie.2021.09.003
sha: 6b31e46075f967a48d193a270233fb1267aef8be
doc_id: 694114
cord_uid: 00iixj0f

We present a Chemistry and Structure Screen Integrated Efficiently (CASSIE) approach (named for Greek prophet Cassandra) to design inhibitors for cancer biology and pathogenesis. CASSIE provides an effective path to target master keys to control the repair-replication interface for cancer cells and SARS CoV-2 pathogenesis as exemplified here by specific targeting of Poly(ADP-ribose) glycohydrolase (PARG) and ADP-ribose glycohydrolase ARH3 macrodomains plus SARS CoV-2 nonstructural protein 3 (Nsp3) Macrodomain 1 (Mac1) and Nsp15 nuclease. As opposed to the classical massive effort employing libraries with large numbers of compounds against single proteins, we make inhibitor design for multiple targets efficient. Our compact, chemically diverse, 5000 compound Goldilocks (GL) library has an intermediate number of compounds sized between fragments and drugs with predicted favorable ADME (absorption, distribution, metabolism, and excretion) and toxicological profiles. Amalgamating our core GL library with an approved drug (AD) library, we employ a combined GLAD library virtual screen, enabling an effective and efficient design cycle of ranked computer docking, top hit biophysical and cell validations, and defined bound structures using human proteins or their avatars. As new drug design is increasingly pathway directed as well as molecular and mechanism based, our CASSIE approach facilitates testing multiple related targets by efficiently turning a set of interacting drug discovery problems into a tractable medicinal chemistry engineering problem of optimizing affinity and ADME properties based upon early co-crystal structures. Optimization efforts are made efficient by a computationally-focused iterative chemistry and structure screen. Thus, we herein describe and apply CASSIE to define prototypic, specific inhibitors for PARG vs distinct inhibitors for the related macrodomains of ARH3 and SARS CoV-2 Nsp3 plus the SARS CoV-2 Nsp15 RNA nuclease.

3.3 Ligand files preparation
3.4 Optimize the ligands for in silico study
4. Protein target and grid preparation
4.1 Prepare the protein structure
4.2 Generate a grid in the target area
4.3 Grids for macrodomains
4.4 Grids for Nsp15 trimeric assembly
5. Virtual screen and ranking parameter selections
5.1 Run a virtual screen calculation
5.2 Identify the top 10% compounds based on the docking score
5.3 Rescore these top 10% compounds with the MMGBSA energy score
5.4 Remove compound redundancies
5.5 Pick top scoring ligands with unique chemical features
5.6 Final selection for in vitro experiments
5.7 PARG in silico leads
5.8 ARH3 in silico leads
5.9 SARS CoV-2 MAC1 (Nsp3) in silico leads
5.10 SARS CoV-2 Nsp15 in silico leads
6. Fast qualitative and quantitative in vitro binding assays of top in silico results
6.1 MST assay
6.2 SPR assay
6.3 Limitation of these two assays and data evaluation for top in silico leads
7. Structural analysis of top in silico compounds
7.1 What to look for in initial co-crystal structures
8. Summary and prospects for advances
Acknowledgments
References


Because of the high failure rate and extremely high cost of clinical trials, many drugs entering clinical trials are modifications of approved drugs. For example, we identified the antibiotic Novobiocin (NVB) in a high throughput screen as an inhibitor of DNA polymerase theta (POLθ), which is synthetically lethal with Homologous Recombination (HR) deficiency and thus a candidate target for HR-deficient cancers. However, it is difficult to alter the complex structure of approved drugs and natural products for a new target (Atanasov et al., 2021; Omanakuttan et al., 2012). On the other side of the size range, fragment screening, where crystals are probed with small molecules, was designed to identify novel chemical structures (Murray & Blundell, 2010; Wilson 3rd et al., 2021), but obtaining larger compounds with higher affinity and selectivity, and with the proper orientation as indicated by the fragment-bound structures, has proved problematic for drug discovery. Conceptually, an ideal process would be to start drug discovery using a Goldilocks (GL) compound library that is pre-selected for ideal drug characteristics and that is sized bigger than fragments but smaller than approved drugs. This intermediate size enables chemical elaboration to improve molecule properties, either for binding or for drug characteristics. To exemplify the versatility of our approach, we present our GLAD library (our GL library combined with an Approved Drug or AD library) and our Chemistry and Structure Screen Integrated Efficiently (CASSIE) pipeline and then their application to define lead inhibitors for two distinct target families: ADP ribosylase macrodomains and SARS CoV-2 Nsp15 endoribonuclease.

Controlling repair-replication and innate immune responses, macrodomains are an important drug target family: they counteract ADP-ribosylation in DNA repair and in viral infection, and thus are a fundamental cellular response to stress. In the context of DNA damage, they regulate efficient targeting and kinetics for the DNA damage response and are a target for synthetic lethality in DNA repair deficient cancer cells. Trapped PARP1 and DNA adducts facilitate MRE11 and RAD51 recruitment (Moiani et al., 2018; Syed & Tainer, 2018) and stall replication forks. Their repair involves removal of poly-ADP-ribosylation (PARylation) by macrodomain-containing glycohydrolases such as poly(ADP-ribose) glycohydrolase (PARG) (Houl et al., 2019). In the host antiviral environment, viral pathogens such as SARS CoV-2 employ a related macrodomain, Nsp3 Mac1, to remove ADP-ribosylation to dysregulate host immune response (Brosey et al., 2021). These features make macrodomains a focus for structure-based drug discovery as promising therapeutic targets both for cancer and antiviral agents. A key issue for DNA repair and virus macrodomain targets concerns homology of viral and human macrodomains and its impact on inhibitor binding pockets.

As part of an effort to develop inhibitors for additional SARS CoV-2 non-structural proteins (Nsp), we also applied our CASSIE approach to the SARS CoV-2 endoribonuclease Nsp15. Nsp15 is implicated in viral pathogenicity and in evading the host innate immune response, as defective Nsp15 led to reduced viral pathogenicity in CoV models (Ancar et al., 2020; Deng et al., 2019; Kindler et al., 2017; Yuen et al., 2020). Its nidoviral uridylate-specific endoribonuclease (NendoU) function may regulate viral RNA synthesis, limit recognition of viral RNA by cellular sensors, and block early, protective antiviral host cell response. In particular, Nsp15 may help CoV evade host detection by trimming long 5′-polyuridine (polyU) tracts, a pathogen-associated molecular pattern (PAMP) that cells use to detect viruses (Hackbart, Deng, & Baker, 2020). Nsp15 NendoU activity is necessary and sufficient for inhibiting eIF2α-dependent and -independent stress granules (SGs) that function as an antiviral hub and whose suppression is needed for CoV replication (Gao et al., 2021). Thus, an Nsp15 inhibitor (Nsp15i) could be of value by combining a direct antiviral effect with enabling the early, protective immune response to viruses, thereby averting later excessive, dysregulated, and damaging immune activation (Channappanavar et al., 2016; Channappanavar & Perlman, 2017); consistently, a natural product Nsp15i efficiently neutralized SARS-CoV-2 (Hong et al., 2021). Yet, Nsp15 is understudied and virus-specific with no viable clinical drug, and the few cell-active inhibitors identified lack specificity (Choi et al., 2021; Hong et al., 2021). Thus, we apply CASSIE to target Nsp15, as it is essential in CoV biology (Kim et al., 2020; Xu et al., 2021) and as it could be an independent or anti-CoV cocktail target.

As an overview of our methodology, we begin by harnessing Evolutionary Trace (ET) methods to distinguish conserved and variable sequence areas of the active site pocket as shown for PARG and Nsp3 Mac1 (Brosey et al., 2021). Based upon an evolutionary perspective of surface sequence conservation, we focus on sites offering optimal specificity. Next, we use our CASSIE approach of in silico screening of our selective GLAD library against available protein structures and experimental validation by binding assays (Fig. 1). By employing a chemically diverse, tractably sized 5000 compound GL library of compounds larger than fragments but smaller than drugs, we collapse potentially immense chemical library space into a sparse set of cell-friendly chemotypes. Our tactical approach for efficient computational screens is to identify the top 15-100 candidates for binding measurements. Top binding candidates are employed for crystallization experiments to obtain X-ray crystal structures of inhibitor candidates with micromolar or better binding. When human structures are not available, useful crystallographic feedback is obtained by employing avatars that incorporate essential target site features (Moiani et al., 2018). We use structural results to provide a rational basis to build focused chemical libraries to improve binding affinity plus favorable ADME (absorption, distribution, metabolism, and excretion) properties and toxicological chemotypes. We show here how our GLAD library and CASSIE pipeline can provide inhibitor tools for cell biology and leads for preclinical drug discovery.

The multiple interfaces that DNA repair proteins have with their substrates and with other DNA repair and replication proteins, and that viral proteins have for pathogen-host interactions, pose a challenge for in silico screening. It is key to define a target functional site and to identify residues relevant to the ligand docking volume. While structural and biological data may help define target sites, evolutionary sequence conservation complements the identification of novel regions of functional relevance and helps to prioritize sites. ET analysis combines phylogenetic relationships and sequence information to identify residues and regions of functional importance (Lichtarge, Bourne, & Cohen, 1996; Mihalek, Res, & Lichtarge, 2004). Importantly, ET scores protein residues according to the phylogenetic distances of the homologous sequences that vary at that residue and outputs these scores in a range of 0 (most important) to 100 (least important). These scores can subsequently be mapped to available structures or protein structure predictions (Lua & Lichtarge, 2010).

Clusters of high-ranking residues signify functionally important regions, such as active sites or critical allosteric sites (Lees-Miller et al., 2020), with confidence in these clusters made stronger by the independence of ET calculation from available structural information. Notably, these identified residues can be input into the in silico screening programs to define Virtual High-Throughput Screening (VHTS) receptor space. See Section 5 for VHTS input into the docking program, GLIDE. Here, we outline how to use the ET server (http://evolution.lichtargelab.org/) (Lichtarge et al., 1996; Mihalek et al., 2004).

The ET server uses the Blastall 2.2.15 program (Altschul et al., 1997) and the NCBI non-redundant (nr), UniRef90, and UniRef100 databases (Maglott, Ostell, Pruitt, & Tatusova, 2007; Suzek et al., 2015). Typically, at least 50 homologous sequences in an alignment are used to calculate the ET scores. The exact number depends on the availability of homologous sequences and the sequence selection procedure. Technically, ET can be run with as many sequences as desired, but the more sequences are used, the slower the alignment and scoring become (these relationships are exponential rather than linear). In the ET server, the user can define the maximum number of sequences (the default is 500 from the BLAST search). We typically start with 5000 sequences from BLAST.

Synthetic constructs, mutant sequences, and fragments of sequences should not be included in the multiple sequence alignment (MSA). Therefore, ET embodies filtering algorithms to remove such BLAST hits. Furthermore, redundant sequences may not be necessary or may bias the ET outcome to functions specific to the overrepresented phylogenetic branches, so redundancy should be avoided. Scripts embedded in the ET server may select representative homologs at various phylogenetic distances that have the fewest alignment gaps compared to the query sequence. Typically, this process chooses approximately 160 sequences.
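To illustrate the idea of removing redundant homologs while favoring sequences with few alignment gaps, the following is a minimal sketch only. It is not the ET server's actual selection script; the function names, the greedy strategy, and the identity threshold are assumptions made for illustration, and the sequences are toy data.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def select_representatives(query: str, hits: Dict[str, str],
                           max_identity: float = 0.9) -> List[str]:
    """Greedy, hypothetical redundancy filter over aligned BLAST hits.

    Hits with the fewest gaps are visited first; a hit is kept only if its
    identity to every already-kept hit is below max_identity. Hits whose
    aligned length differs from the query are skipped as likely fragments.
    """
    ordered = sorted(hits, key=lambda name: (hits[name].count("-"), name))
    kept: List[str] = []
    for name in ordered:
        seq = hits[name]
        if len(seq) != len(query):
            continue
        if all(identity(seq, hits[k]) < max_identity for k in kept):
            kept.append(name)
    return kept


# Toy aligned sequences (illustrative only, not real homologs).
toy_query = "MKTAYIAK"
toy_hits = {
    "hit_a": "MKTAYIAK",
    "hit_b": "MKTAYIAR",   # near-duplicate of hit_a
    "hit_c": "MRSG-LAK",   # divergent, one gap
    "hit_d": "MKT",        # fragment
}
representatives = select_representatives(toy_query, toy_hits, max_identity=0.8)
```

With these toy inputs the near-duplicate and the fragment are dropped, leaving one representative per distinct sequence.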

In the extremely rare cases that technical errors appear using the MUSCLE alignment program (Edgar, 2004), the ET pipeline may turn to the ClustalW alignment program (Thompson, Higgins, & Gibson, 1994). Steps 2.1-2.3 may be skipped if users prefer to input a manually curated MSA.

We used the option "position-specific gap-reducing real-valued trace." Although the ET algorithm may proceed with fewer homologous sequences, it is recommended to use an MSA with at least five homologous sequences; even so, such small numbers of sequences may result in low resolution for ET scores (many ties in the residue ranks). Low resolution of ET ranks may also be obtained for larger numbers of homologous sequences when they have high percentage identity compared to the query sequence. The ET analysis outputs scores for each residue in the range of 0 (most important) to 100 (least important).
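As a small illustration of how such per-residue scores might be post-processed, the sketch below selects the most important residues and estimates how many residues are tied in rank. The function names, the cutoff value, and the example scores are hypothetical; the only property taken from the text is that scores run from 0 (most important) to 100 (least important).

```python
from collections import Counter
from typing import Dict, List


def top_ranked_residues(et_scores: Dict[int, float], cutoff: float = 20.0) -> List[int]:
    """Return residue numbers with ET score <= cutoff, most important first."""
    selected = [(score, resnum) for resnum, score in et_scores.items() if score <= cutoff]
    return [resnum for _, resnum in sorted(selected)]


def tie_fraction(et_scores: Dict[int, float]) -> float:
    """Fraction of residues sharing their score with at least one other residue.

    A high value indicates low resolution of the ET ranks.
    """
    counts = Counter(et_scores.values())
    tied = sum(n for n in counts.values() if n > 1)
    return tied / len(et_scores)


# Made-up scores keyed by residue number, for illustration only.
example_scores = {10: 1.5, 11: 3.0, 12: 3.0, 50: 80.0}
important = top_ranked_residues(example_scores, cutoff=20.0)
ties = tie_fraction(example_scores)
```

The resulting residue list is the kind of input that could then be used to choose residues for defining a docking site.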

Mapping the ET scores onto a structure can be done with the PyMOL program and the PyETV plugin (Lua & Lichtarge, 2010). Fig. 2 shows the PDB structures 4B1G (human PARG), 6D36 (human ARH3), and 6W02 (CoV-2 Mac1), colored by prioritization scores (red denotes the most important and green the least important residues). The PyETV plugin offers additional schemes to represent the important residues and additional functions, such as measuring the z-score for the non-random clustering of the top ET residues.

Applying this approach to PARG, as a key repair-replication regulator, and to the comparative exemplary macrodomains ARH3 and SARS CoV-2 Nsp3 Mac1 highlights the active site as a shared primary region of functional significance in all three enzymes (Fig. 2). Inspection of the active site reveals greater importance among residues coordinating the ADP ribose (ADPr) pyrophosphate linker and terminal ribose group relative to those stabilizing the adenine moiety. This is underscored by the unique binding conformations assumed by aromatic residues within the adenine pocket (cf. PARG-phenylalanine 902, tyrosine 795; ARH3-phenylalanine 143, tyrosine 149; SARS CoV-2 Mac1-phenylalanine 156). These structural analyses by ET generally identify the active sites and other targets for screening. In this example, they suggest that selective targeting of the adenine pocket can be leveraged to identify ligands specific to each enzyme.

We note that ET analyses for all SARS CoV-2 proteins are available through an interactive GUI (http://cov.lichtargelab.org/).

The GLAD library has been assembled based upon experimental feedback over the last 5 years to consist of 6874 compounds and is divided into two major families, as discussed in the introduction: GL (Goldilocks) and AD (Approved Drugs). The GL library is subdivided into compound families (Fig. 3). The GL library was first assembled with 2500 compounds and then expanded based upon experimental results in two different steps to 4763. The initial GL compounds were selected to ensure chemical diversity using calculated two-dimensional fingerprints and the Tanimoto similarity index (Bajusz, Racz, & Heberger, 2015). Additionally, compounds were selected to minimize potential compound promiscuity (Bruns & Watson, 2012; Dahlin et al., 2015) and assay interference (e.g., fluorescence), and to have predicted favorable physicochemical properties (size, solubility, cellular permeability, etc.) (Lipinski, Lombardo, Dominy, & Feeney, 2001; Veber et al., 2002). The GL library includes compounds that are brominated, fluorinated, Protein-Protein Interaction disruptors (PPI), Fsp3 (containing sp3 hybridized carbon units), Superior (compounds optimized for solubility, low toxicity and cell permeability), peptidomimetic, and/or targeting proteins involved in apoptosis and DNA Repair. The GL library was optimized for the percentage of each compound subfamily to its current composition (Fig. 3).
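The Tanimoto index used for diversity selection can be illustrated with a short sketch. Here fingerprints are represented as sets of "on" bit positions, and a simple greedy filter keeps a compound only if it is sufficiently dissimilar to all compounds already picked. The greedy procedure, the similarity threshold, and the toy fingerprints are assumptions for illustration and are not the selection protocol used to build the GL library.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def pick_diverse(fingerprints: Dict[str, FrozenSet[int]],
                 max_similarity: float = 0.7) -> List[str]:
    """Greedy diversity filter (hypothetical): keep a compound only if its
    similarity to every previously kept compound is <= max_similarity."""
    picked: List[str] = []
    for name in sorted(fingerprints):
        fp = fingerprints[name]
        if all(tanimoto(fp, fingerprints[p]) <= max_similarity for p in picked):
            picked.append(name)
    return picked


# Toy fingerprints (bit positions are arbitrary, for illustration only).
toy_fps = {
    "cmpd_1": frozenset({1, 2, 3, 4}),
    "cmpd_2": frozenset({1, 2, 3, 5}),   # shares 3 of 5 distinct bits with cmpd_1
    "cmpd_3": frozenset({7, 8, 9}),
}
diverse = pick_diverse(toy_fps, max_similarity=0.5)
```

In practice the fingerprints would come from a cheminformatics toolkit; the set-based form is used here only to keep the sketch self-contained.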

The last expansion (2111 compounds), called AD (Approved Drugs), provides larger soluble compounds including vitamins, lipids, natural compounds, and additional human metabolites, representing circa 31% of the full GLAD library. A complete virtual entity of the library has been assembled for in silico study. For experimental measurements, copies of the library are stored both in powder form and as DMSO solutions for in vitro and in vivo experiments.

The material library is accompanied by a full collection of structure data format (.sdf) files representing all compounds with related information such as chemical identification, calculated properties, molecular weight, and SMILES. These are provided by our supplier (https://lifechemicals.com/).
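For readers who want to inspect such files outside Maestro, the sketch below reads the data items of an SD file using only the standard layout of the format (records terminated by a `$$$$` line, data headers of the form `> <NAME>` followed by value lines and a blank line). The field names `SMILES` and `MolWeight` in the example are hypothetical; the supplier's actual field names may differ.

```python
from typing import Dict, List


def parse_sdf_properties(text: str) -> List[Dict[str, str]]:
    """Return one dict of data items per SD-file record (structure block ignored)."""
    records: List[Dict[str, str]] = []
    for block in text.split("$$$$"):
        lines = block.splitlines()
        props: Dict[str, str] = {}
        i = 0
        while i < len(lines):
            line = lines[i]
            # A data header looks like "> <NAME>" (optionally with extra text).
            if line.startswith(">") and "<" in line and line.count(">") >= 2:
                key = line[line.index("<") + 1:line.rindex(">")]
                values = []
                i += 1
                # Value lines run until the first blank line.
                while i < len(lines) and lines[i].strip():
                    values.append(lines[i].strip())
                    i += 1
                props[key] = "\n".join(values)
            i += 1
        if props:
            records.append(props)
    return records


# Minimal two-record example with empty structure blocks (illustrative only).
sample_sdf = """cmpd-1
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <SMILES>
CCO

> <MolWeight>
46.07

$$$$
cmpd-2
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <SMILES>
c1ccccc1

$$$$
"""
sdf_records = parse_sdf_properties(sample_sdf)
```

A dedicated cheminformatics toolkit would normally be preferred for real library files, since it also validates the structure block.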

Upload the .sdf files for the full GLAD library to Maestro. 

Run the LigPrep program (Chen & Foloppe, 2010). In the program, which generates multiple conformers for each ligand to be incorporated in the GLAD in silico library for virtual screening, we use the OPLS4 force field (Lu et al., 2021). This collection of compounds is supplemented with additional control compounds that are known to bind to the target. Several human metabolites are already present in the AD subdivision.

To reduce the computational complexity, we generate a ligand receptor grid onto the protein target site of interest. We use a rigid docking site with flexible ligands (Schnecke, Swanson, Getzoff, Tainer, & Kuhn, 1998). The PDB structure selected for in silico study needs to be prepared, optimized and minimized (Nguyen et al., 2021). If a protein target has an allosteric mechanism and the active site where the grid is centered has more than one preferential conformation, then multiple target structures need to be prepared in parallel to run multiple virtual screens (Moiani et al., 2018).

In the Protein Preparation Wizard (Sastry, Adzhigirey, Day, Annabhimoju, & Sherman, 2013), we strip waters and non-metal ligands from the structure. We keep metals and assign them proper charges. We change non-standard amino acids to standard amino acids, such as selenomethionine to methionine. We either complete missing loops or cap the termini resulting from missing loops with N-terminal acetyl and C-terminal amide capping groups, which gives the proper zwitterionic charge.

The target area is typically an active site or protein-protein interaction surface. In the Receptor grid generation, we pick residues in the active site. The program will build a cube, typically with dimensions of 10-15 Å on each side. If using a large grid (e.g., double the size), the calculations may take 1-2 days or longer, depending on computing power. Multiple grids can be generated and targeted at the same time. Results from multiple calculations and multiple grids require a more careful evaluation to generate the proper ranking of results.

For PARG, we used three structures (PDB ID 6OAK and PDB ID 4B1H with two conformations of phenylalanine 902) with cubic grids (12 Å on each side) around phenylalanine 902, tyrosine 795, and phenylalanine 875 (based on the human structure). The Receptor grid generation program calculated the midpoint between the three Cα atoms and centered the grid on this midpoint. We also generated grids for MAC1 (PDB ID 7KG3) and for ARH3 (PDB ID 6D36).
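The geometry of this step can be sketched in a few lines: the grid center is taken as the centroid of the selected Cα coordinates and a cube of the chosen side length is placed around it. This is only an illustration of the arithmetic, not the Receptor grid generation program itself; the function name and the coordinates are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def grid_box(ca_coords: Sequence[Point], side: float = 12.0) -> Tuple[Point, Point, Point]:
    """Return (center, lower corner, upper corner) of a cubic grid.

    The center is the centroid of the supplied C-alpha coordinates (in Å);
    the cube extends side/2 in each direction along x, y, and z.
    """
    n = len(ca_coords)
    center = tuple(sum(c[i] for c in ca_coords) / n for i in range(3))
    half = side / 2.0
    lower = tuple(x - half for x in center)
    upper = tuple(x + half for x in center)
    return center, lower, upper


# Made-up coordinates for three residues (illustrative only).
toy_ca: List[Point] = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
center, lower, upper = grid_box(toy_ca, side=12.0)
```

Checking that all residues of interest fall between the lower and upper corners is a quick sanity test before launching a screen.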

For Nsp15 (PDB ID 6WLC), we centered our grid on only Tyrosine 343 as it interacts with tipiracil and UMP (Kim et al., 2020, 2021). Notably, we generated the grid on an Nsp15 homotrimer, as the active site is near the subunit interface.

Next, we dock our GLAD library into the selected cubic grids of the protein using the Glide program (Friesner et al., 2006). Glide allows the ligand to adopt multiple conformations but the protein is kept rigid. The virtual screening parameters are optimized using a workflow implemented in the Schrödinger package.

In the Glide docking program we use "standard precision" and a parallelizing multiprocessor cluster and enable the MMGBSA scoring calculation in Prime. MM-GBSA energies (ΔG) are calculated from binding energies including covalent, van der Waals (VDW), coulombic, solvation, hydrogen bonding as well as from packing and energy differences related to strain in protein and ligand (Genheden & Ryde, 2015) . As needed for speed, we use VHTS receptor space, which offers a lower precision but faster calculation.

Some ligands may occur multiple times because they bind in different conformations.

We have the most leads when we combine the docking score (weighted 70%) with MMGBSA energy (weighted 30%) (Moiani, Cavallotti, Famulari, & Schmuck, 2008). We find both top binders and more chemically diverse compounds in the list.

When the compound appears multiple times in the list because of different conformations, we typically remove all but the top scoring conformation.
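The ranking steps above can be sketched as follows: keep the top 10% of poses by docking score, combine docking score and MMGBSA energy with 70/30 weights, and retain only the best pose per compound. How the two scores are put on a common scale is not specified in the text; the min-max normalization used here is an assumption, as are the function names and the toy values. Lower (more negative) values are treated as better for both scores.

```python
import math
from typing import Dict, List


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; 0 corresponds to the lowest (best) value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_poses(poses: List[Dict], top_fraction: float = 0.10,
               w_dock: float = 0.7, w_mmgbsa: float = 0.3) -> List[Dict]:
    """poses: dicts with 'name', 'dock', and 'mmgbsa' keys (lower = better)."""
    by_dock = sorted(poses, key=lambda p: p["dock"])
    n_keep = max(1, math.ceil(len(by_dock) * top_fraction))
    top = by_dock[:n_keep]
    # Assumed normalization so the two scores can be weighted together.
    dock_n = min_max([p["dock"] for p in top])
    gbsa_n = min_max([p["mmgbsa"] for p in top])
    scored = [{**p, "combined": w_dock * d + w_mmgbsa * g}
              for p, d, g in zip(top, dock_n, gbsa_n)]
    scored.sort(key=lambda p: p["combined"])
    best: Dict[str, Dict] = {}
    for p in scored:
        best.setdefault(p["name"], p)  # first occurrence is the best-scoring pose
    return list(best.values())


# Toy poses (made-up scores); compound "A" appears in two conformations.
toy_poses = [
    {"name": "A", "dock": -9.0, "mmgbsa": -40.0},
    {"name": "A", "dock": -8.0, "mmgbsa": -60.0},
    {"name": "B", "dock": -7.0, "mmgbsa": -50.0},
    {"name": "C", "dock": -3.0, "mmgbsa": -10.0},
    {"name": "D", "dock": -2.0, "mmgbsa": -5.0},
    {"name": "E", "dock": -1.0, "mmgbsa": -5.0},
]
ranked = rank_poses(toy_poses, top_fraction=0.5)
```

The example uses a 50% cutoff only because the toy list is short; with a full library the default of 10% matches the description in the text.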

We analyze the subfamily type of the GLAD library in the full list of results, view the ligands docked on available structures, and select about 15 top in silico leads that contain the highest number of unique chemical features, also known as chemical singletons.

Fig. 4. Top in silico leads for the macrodomain targets and Nsp15. For each target, the top 15 compounds were selected by ranking, with chemical singleton redundancies removed. The target is presented in surface representation to better display the population of top in silico leads.

The final selection of 15 is plated and moved to in vitro structural, binding and functional assays (see Sections 6 and 7).

We merged the top compounds from in silico virtual screens performed on three different conformational structures of human PARG. Although we used different structures, some compounds were found in both docking analyses (Fig. 4D and E). After merging, the top ranked were 90% AD, and only two compounds were from the GL: a PPI and a DNA Repair uracil derivative. Interestingly, atracurium, in the top 15, has chemical similarity to a small unit of branched PAR.

We merged the results obtained from targeting two grids simultaneously. To target the adenosyl clamp between phenylalanine 127 and tyrosine 133, the two grids were centered close to the metal active site, specifically at residue asparagine 135. The results were grouped together and re-ranked as described in Section 5.3. The distribution of the top 15 in silico leads shows a 50/50 ratio between GL and AD. Specifically, the highest scoring AD compounds are drugs approved for infection and for targeting the immune system, which is interesting in light of the immune evasion role of this viral enzyme. The GL top in silico leads are peptidomimetics, PPI, Superior, and molecules targeting DNA Repair proteins.

After applying our ranking methods to the results from Glide, we analyzed the chemotypes of the top 15 in silico leads. The top two compounds are part of the DNA Repair subfamily of GL and, specifically, are uracil derivatives. ADPr scored third in the ranking, matching its high affinity for MAC1, proven by in vitro measurements (in Section 7) and shown in Fig. 5A. Top in silico leads include peptidomimetics and PPI disruptors. More than 50% of top results were AD, including an anticancer drug (Daunorubicin) and an antiviral drug (Valganciclovir). Based on Microscale Thermophoresis (MST) binding results, none of the compounds except ADPr bound MAC1. The graphic model of top in silico leads for Mac1 is shown in Fig. 4A.

The analysis of top in silico leads shows 70% are AD, while the rest include a PPI and three DNA Repair subfamily uracil derivatives. Among the AD results, we found certain peptides like Goserelin and Desmopressin or complex molecules like Anidulafungin (an antifungal drug). Interestingly, we found some high molecular weight compounds as top in silico leads with an extended binding surface as shown in Fig. 4C . 

As part of CASSIE (Fig. 1), we validate our in silico hits with binding assays. The top in silico selections are physically plated at suitable concentrations for qualitative binding. Compounds are prepared from the mother solution (directly plated by supplier) and made uniform at 10 mM concentration in DMSO. Typically, the in vitro techniques selected for binding studies are Microscale thermophoresis (MST) or Surface Plasmon Resonance (SPR, Biacore). Measuring ligand-protein affinity constants ranging from nM to mM, MST tracks thermophoresis, the movement of fluorescently labeled molecules in and out of a small temperature gradient. When a protein binds a ligand, there may be changes in protein conformation or even the protein hydration shell. These changes can alter the rate of movement (thermophoresis) of the protein-ligand complex as compared to the control, protein without added ligand. This is monitored by changes in fluorescence, with varying ligand concentrations to generate a binding curve from which affinity is calculated. The deflections in the binding curves can be either positive or negative as the deflection is dependent on the relative rates of the protein without ligand vs the protein-ligand complex. However, if there is no change in the thermophoresis rates upon a binding event, this event will be silent or undetected using MST. SPR tracks ligand binding by measuring changes in polarized light striking the target protein or protein complex attached to a sensor surface. The protein is usually covalently attached to a sensor surface or captured using an affinity tag such as His6 or streptavidin. These two binding assays, one detecting changes in thermophoresis of the protein by binding ligand and the other detecting changes in target mass upon binding, provide complementary information and validation of the small molecule in silico lead.
The top compounds selected in silico are run in parallel with a well-known binder for the enzyme and compared. Compounds identified with adequate affinity move to a quantitative test, such as isothermal titration calorimetry (ITC), which usually requires added material to ensure reproducibility of multiple measurements.

We typically measure MST on a Monolith NT.115. Proteins are labeled with Atto488 NHS-ester (ATTO-TEC GmbH) according to the manufacturer's protocol, with a labeling efficiency of 1:1 protein-to-dye ratio. Ligands are serially diluted into a solution of PBS, 0.5 mM fresh TCEP, 0.01% Tween-20 with the Atto488-labeled protein. Samples are loaded into standard capillaries and each set is analyzed in triplicate. Data fitting with the MO Affinity Analysis software for a series of compounds is performed at the lowest fixed power and time that gives the largest signal change, based on either known binders or the most consistent change for that series. For ARH3 analysis, ADPr was used as a control and the settings were at high power with thermophoresis at 20 s. For Nsp15, the most consistent change for the series was with medium power with thermophoresis at 10 s. Capillaries with measured intensities greater than 1% DMSO or with intensities beyond ±5% of the overall average are excluded from the analysis. Binding was labeled as not detected (ND) if the data could not be fit to the Kd model, or the calculated response amplitude was less than 2.
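For orientation, a 1:1 binding model of the kind commonly used to fit such titrations can be written as the fraction of labeled protein bound at each ligand concentration. The sketch below implements the standard quadratic solution of the mass-action equation together with a helper for a serial dilution series. It is an assumption that this matches the Kd model in the analysis software; the function names and numbers are illustrative only, and all concentrations must share the same unit.

```python
import math
from typing import List


def fraction_bound(ligand: float, protein: float, kd: float) -> float:
    """Fraction of protein in complex for 1:1 binding (mass-action law).

    ligand and protein are total concentrations; kd is the dissociation constant.
    """
    total = protein + ligand + kd
    return (total - math.sqrt(total * total - 4.0 * protein * ligand)) / (2.0 * protein)


def serial_dilution(top: float, n_points: int, factor: float = 2.0) -> List[float]:
    """Concentrations of an n-point serial dilution starting at 'top'."""
    return [top / factor ** i for i in range(n_points)]


# Illustrative titration: 16-point twofold series against a fixed protein concentration.
ligand_series = serial_dilution(100.0, 16)
curve = [fraction_bound(c, protein=0.05, kd=20.0) for c in ligand_series]
```

Because the measured thermophoresis signal is assumed to scale with the bound fraction, a curve that never approaches saturation over the dilution range cannot constrain Kd well, which is one reason weak binders are reported with caution.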

For ARH3, binding was detected for 3 of the 15 compounds with binding constants ranging from 20 to 400 μM. For Nsp15 binding was detected for 12 of the 15 compounds with binding constants ranging from 180 nM to 400 μM. For PARG there was no detectable change in the thermophoresis based on the above criteria and as a result, SPR was used.

We typically measure SPR on a Biacore T200 with proteins attached to the sensor surface. For PARG, 1 mg/ml protein was covalently attached to a Biacore CM5 chip using amine coupling via an EDC/NHS (N-(3-dimethylaminopropyl)-N'-ethylcarbodiimide hydrochloride/N-hydroxysuccinimide) reaction. A control channel is treated similarly, except using ethanolamine instead of protein. The final relative response unit (RU) was 899.7. Serially diluted ligand starting at 100 μM was passed over the surface at 30 μL/min with a contact time of 120 s. The data were fit to a steady state affinity model with the Biacore Evaluation 3.0 software using blank subtracted RU values at 4 s before injection stop for a single run. For PARG, binding was detected for 6 of the 16 compounds with binding constants ranging from 23 to 1000 μM.
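A steady state affinity model relates the equilibrium response to ligand concentration through a saturation curve, Req = Rmax·C/(Kd + C). The sketch below shows this relation and a simple way to estimate Kd and Rmax from it. Note that the evaluation software fits the model by nonlinear regression; the double-reciprocal linear fit used here is a simplified stand-in chosen to keep the example dependency-free, and the function names and synthetic values are assumptions.

```python
from typing import List, Tuple


def steady_state_response(conc: float, kd: float, rmax: float) -> float:
    """Equilibrium response for 1:1 binding: Req = Rmax * C / (Kd + C)."""
    return rmax * conc / (kd + conc)


def fit_steady_state(concs: List[float], responses: List[float]) -> Tuple[float, float]:
    """Estimate (Kd, Rmax) by linear regression of 1/Req against 1/C.

    Uses 1/Req = 1/Rmax + (Kd/Rmax) * (1/C). This linearization is sensitive
    to noise at low concentrations and is shown only as an illustration.
    """
    xs = [1.0 / c for c in concs]
    ys = [1.0 / r for r in responses]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rmax = 1.0 / intercept
    kd = slope * rmax
    return kd, rmax


# Synthetic, noise-free data generated from assumed Kd = 20 and Rmax = 50.
spr_concs = [100.0, 50.0, 25.0, 12.5, 6.25]
spr_resp = [steady_state_response(c, kd=20.0, rmax=50.0) for c in spr_concs]
kd_est, rmax_est = fit_steady_state(spr_concs, spr_resp)
```

With real sensorgram data, blank-subtracted responses near the end of the injection would replace the synthetic values, and a nonlinear fit would normally be preferred.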

MST in qualitative mode is a relatively quick method and needs a low quantity of material (including protein), but the data are not extremely reliable, particularly if the fluctuation range of the extrapolated binding data exceeds half a millimolar. MST in quantitative mode would be more appropriate; it needs multiple measurements in the titration, which requires more time and material and gives more reproducibility, but the final binding Kd is still not precise. SPR offers better data if the system is properly calibrated. The protein needs to be anchored to the chip and a calibration test implemented. Following these steps, the system can efficiently produce multiple, reliable, and accurate affinity binding measurements. ITC is the best but requires time for calibration and optimization plus an abundant quantity of material.

If micromolar or better binders are not discovered within the top 15, we recommend testing compounds in multiple binding assays and testing all of the top 100 compounds. In silico docking scores do not directly correlate with experimental affinities.
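
The decision rule above can be sketched as a small Python helper. The function name, the compound identifiers, and the data layout are hypothetical; reading "micromolar or better" as a Kd below 1 mM (1000 μM) is an assumption exposed as a parameter, not a threshold stated in the text.

```python
def next_round(ranked_ids, kd_um, first_pass=15, fallback=100, threshold_um=1000.0):
    """Return (hits, compounds_to_test_next) for a docking-ranked compound list.

    ranked_ids   : compound identifiers, best docking score first
    kd_um        : dict mapping identifier -> measured Kd in micromolar
                   (missing or None means binding was not detected)
    threshold_um : assumed cutoff for a "micromolar or better" binder
    """
    top = ranked_ids[:first_pass]
    hits = [c for c in top if kd_um.get(c) is not None and kd_um[c] <= threshold_um]
    if hits:
        # At least one binder in the first pass: carry hits forward, no expansion.
        return hits, []
    # No binder in the first pass: expand to the remainder of the top `fallback`.
    return [], ranked_ids[first_pass:fallback]

# Hypothetical identifiers and values, for illustration only.
ranked = ["cpd%d" % i for i in range(1, 201)]
hits, to_test = next_round(ranked, {"cpd3": 250.0})
no_hits, expanded = next_round(ranked, {})
```

Because docking rank is only a prioritization and not an affinity prediction, the helper uses measured values alone to decide whether to expand the test set.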

After in vitro binding assay validations of the in silico data, we proceed to structure-based optimizations. Crystallography is an optimal structural validation for precision; however, co-crystal structures may be difficult to obtain with preliminary candidates. With the CASSIE approach, inhibition of function and low micromolar binding are more effectively evaluated with minimal effort and protein.

In binding pockets, we use crystal structures strategically to identify two different kinds of subsites: anchor points and induced plasticity. This approach is based upon our discovery of anchored plasticity to gain 20,000-fold selectivity between active site pockets formed by invariant sequences in nitric oxide synthase (NOS) and the ability to selectively inhibit MRE11 endonuclease and exonuclease activities (Garcin et al., 2008; Shibata et al., 2014). Tactically, the larger sized AD library components can identify inducible plasticity regions within binding sites. The intermediate sized GL library components are better at identifying potential anchor points in binding sites that can be used for focused libraries to optimize affinity and drug-like properties by growing compounds into surface subsites with plasticity. Thus, this combination of data enables focused compound libraries to employ a systematic strategy of retaining anchor interactions while growing compounds in directions defined by a vector from the anchor site to flexible subsites. Focused libraries are designed from lead compounds by adding new chemical moieties that can bind in neighboring regions. Target flexibility or plasticity, if identified by structural analysis, can also be harnessed to provide additional regions for growing compounds to increase specificity. By employing this anchored plasticity strategy, CASSIE provides a systematic approach to enlarging binding pocket subsites for increased affinity. This can span from creating new pockets for inhibitors as we did for MAP kinase (Perry, Harris, Moiani, Olson, & Tainer, 2009) to building upon substrate analogue structures (Daniels et al., 2000; Doi et al., 2006) and learning how inhibitors displace bound water molecules (Putnam, Arvai, Bourne, & Tainer, 2000).
Although we currently use crystal structures and temperature factors, as employed for EXO5 nuclease (Hambarde et al., 2021), to identify target plasticity, we expect to be able to use atomic structures combined with SAXS to identify key plasticity associated with allosteric mechanisms (Brosey & Tainer, 2019; Hammel & Tainer, 2021). Observed plasticity can even be used to apply inhibitor doorstoppers that mimic a protein inhibitor and block the closing of an enzyme's active site that is needed for strong DNA binding, as for uracil-DNA glycosylase (Nguyen et al., 2021; Putnam et al., 1999).
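
One simple way to turn temperature factors into candidate plasticity regions is to average B-factors per residue and flag residues well above the structure-wide mean. The Python sketch below assumes a standard fixed-column PDB file; the function names and the z-score cutoff are illustrative choices made here, not the procedure used for EXO5 or the other targets.

```python
from collections import defaultdict
from statistics import mean, pstdev

def residue_b_factors(pdb_lines):
    """Mean B-factor per residue, read from ATOM records of a PDB-format file."""
    per_residue = defaultdict(list)
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        resname = line[17:20].strip()   # columns 18-20: residue name
        chain = line[21]                # column 22: chain identifier
        resseq = int(line[22:26])       # columns 23-26: residue sequence number
        b_factor = float(line[60:66])   # columns 61-66: temperature factor
        per_residue[(chain, resseq, resname)].append(b_factor)
    return {key: mean(values) for key, values in per_residue.items()}

def flexible_residues(res_b, z_cutoff=1.0):
    """Residues whose mean B-factor lies more than z_cutoff SDs above the mean.

    The cutoff is an arbitrary illustrative default, not a validated threshold.
    """
    values = list(res_b.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0.0:
        return []
    return sorted(key for key, b in res_b.items() if (b - mu) / sd > z_cutoff)

# Example use with a hypothetical file name:
# with open("target_model.pdb") as handle:
#     flagged = flexible_residues(residue_b_factors(handle))
```

Elevated B-factors can also reflect crystal packing, partial occupancy, or resolution, so flagged residues are best treated as hypotheses to check against bound and unbound structures.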

Urgent demands for efficient drug design stem from the SARS CoV-2 pandemic (200 million cases and 4.4 million deaths as of August 24, 2021, Johns Hopkins Center for Systems Science and Engineering, https://coronavirus.jhu.edu/map.html) and cancer (19 million new cases and 10 million deaths) (Sung et al., 2021). These diseases underscore the merit of new approaches that reduce bottlenecks and improve efficiency for drug discovery. Yet, meeting this need is challenging as both cancer cells and pathogens have a mutator phenotype that can promote selection of resistance to both the host immune system and drug therapies. For cancer drugs, relevant new mechanistic insights have come from targeting poly(ADP-ribose) polymerase (PARP). PARP inhibitors (PARPi) are successful for ovarian cancer and breast cancer indications, but resistance often develops, suggesting a need to also be able to inhibit additional pathway proteins or complementary pathways. Interestingly, the clinical success of PARPi can depend upon trapping PARP1 on the damaged DNA, and this allosteric capacity has been structurally characterized for different PARPi (Zandarashvili et al., 2020). Indeed, we also know from biology that catalytically inactive proteins that bind and hold damaged DNA can control damage outcomes, as seen for ATL that forms a complex with alkylated base damage and thereby alters repair pathway outcome from base repair to the nucleotide excision repair pathway (Tubbs et al., 2009).
Analogously, XPG impacts multiple repair pathways by binding and sculpting DNA junctions and protein partners in addition to its nuclease activity, FEN1 sculpts 5' flaps to avoid template switching at replication forks (Perry et al., 2006; Trego et al., 2011; Tsutakawa et al., 2017), GRB2 adaptor protein efficiently brings MRE11 nuclease to DNA breaks, XRCC1 links MRE11 and PolQ to promote alternative end joining of DNA breaks (Eckelmann et al., 2020), SLX4IP binds and maintains SLX4-XPF-ERCC1 complex for inter-strand crosslink repair (Zhang et al., 2019), and acetylation targets oxidative base repair initiation to open chromatin. In extreme examples, transient polyvalent binding by flexible proteins and RNA can create phase separated condensates of functional proteins for processes such as repair, as seen for DNA break repair protein KU and its complex with long non-coding RNA (Thapar et al., 2021). Furthermore, combining structural and computational methods is proving effective for identifying and targeting molecular mechanisms, as well as for understanding the mechanisms by which point mutations can cause different human diseases (Yan et al., 2019).

All of the above noted observations support the value of molecular mechanistic knowledge for drug discovery. In fact, most successful new drugs result from molecular and mechanistic targeting that requires chemical, biophysical, biochemical and structural knowledge. It therefore becomes important to develop systematic strategies to attain this information efficiently. As a result of this reasoning, we developed the CASSIE pipeline and GLAD library to overcome these potential rate-limiting steps in developing inhibitor tools and preclinical drug candidates.

In Greek myth, Cassandra consistently made accurate predictions that were unfortunately not believed. CASSIE predictions, however, do not require faith as they can be tested and validated with practical levels of time and effort. In contrast, high throughput screens with overly large libraries can in practice be high input, with challenges in effectively handling the output. The CASSIE pipeline and GLAD library approach provide a manageable number of compounds, hits, and validation measurements within weeks. Getting compounds into cells and crystal structures in minimal time frames enables molecular targets and mechanisms to be efficiently tested directly and thereby provides a critical enabling path for preclinical drug discovery.

A new generation of protein database search programs

Physiologic RNA targets and refined sequence specificity of coronavirus EndoU

Natural products in drug discovery: Advances and opportunities

Heritable pattern of oxidized DNA base repair coincides with pre-targeting of repair complexes to open chromatin

Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations

Targeting SARS-CoV-2 Nsp3 macrodomain structure with insights from human poly(ADP-ribose) glycohydrolase (PARG) structures with inhibitors

Evolving SAXS versatility: Solution X-ray scattering for macromolecular architecture, functional landscapes, and integrative structural biology

Rules for identifying potentially reactive or promiscuous compounds

Dysregulated type I interferon and inflammatory monocyte-macrophage responses cause lethal pneumonia in SARS-CoV-infected mice

Pathogenic human coronavirus infections: Causes and consequences of cytokine storm and immunopathology

Drug-like bioactive structures and conformational coverage with the LigPrep/ConfGen suite: Comparison to programs MOE and catalyst

High-throughput screening of the ReFRAME, Pandemic Box, and COVID Box drug repurposing libraries against SARS-CoV-2 nsp15 endoribonuclease to identify small-molecule inhibitors of viral activity

PAINS in the assay: Chemical mechanisms of assay interference and promiscuous enzymatic inhibition observed during a sulfhydryl-scavenging HTS

Active and alkylated human AGT structures: A novel zinc site, inhibitor and extrahelical base binding

Coronavirus endoribonuclease activity in porcine epidemic diarrhea virus suppresses type I and type III interferon responses

Synthesis and characterization of oligonucleotides containing 2'-fluorinated thymidine glycol as inhibitors of the endonuclease III reaction

XRCC1 promotes replication restart, nascent fork degradation and mutagenic DNA repair in BRCA2-deficient cells

MUSCLE: A multiple sequence alignment method with reduced time and space complexity

Extra precision glide: Docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes

Inhibition of anti-viral stress granule formation by coronavirus endoribonuclease nsp15 ensures efficient virus replication

Anchored plasticity opens doors for selective inhibitor design in nitric oxide synthase

The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities

Coronavirus endoribonuclease targets viral polyuridine sequences to evade activating host sensors

EXO5-DNA structure and BLM interactions direct DNA resection critical for ATR-dependent replication restart

An atypical BRCT-BRCT interaction with the XRCC1 scaffold protein compacts human DNA Ligase IIIalpha within a flexible DNA repair complex

X-ray scattering reveals disordered linkers and dynamic interfaces in complexes and mechanisms for DNA double-strand break repair impacting cell and cancer biology

Epigallocatechin gallate inhibits the uridylate-specific endoribonuclease Nsp15 and efficiently neutralizes the SARS-CoV-2 strain

Selective small molecule PARG inhibitor causes replication fork stalling and cancer cell death

Crystal structure of Nsp15 endoribonuclease NendoU from SARS-CoV-2

Tipiracil binds to uridine site and inhibits Nsp15 endoribonuclease NendoU from SARS-CoV-2

Early endonuclease-mediated evasion of RNA sensing ensures efficient coronavirus replication

Uncovering DNA-PKcs ancient phylogeny, unique sequence motifs and insights for human disease

PARP inhibitor resistance: The underlying mechanisms and clinical implications

An evolutionary trace method defines binding surfaces common to protein families

Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings

OPLS4: Improving force field accuracy on challenging regimes of chemical space

PyETV: A PyMOL evolutionary trace viewer to analyze functional site predictions in protein complexes

Entrez gene: Gene-centered information at NCBI

A family of evolution-entropy hybrid methods for ranking protein residues by importance

Oxoanion binding by guanidiniocarbonylpyrrole cations in water: A combined DFT and MD investigation

Targeting allostery with avatars to design inhibitors assessed by cell activity: Dissecting MRE11 endo- and exonuclease activities

Structural characterization of a Protein A mimetic peptide dendrimer bound to human IgG

Structural biology in fragment-based drug design

An effective human uracil-DNA glycosylase inhibitor targets the open pre-catalytic active site conformation

Anacardic acid inhibits the catalytic activity of matrix metalloproteinase-2 and matrix metalloproteinase-9

p38alpha MAP kinase C-terminal domain binding pocket characterized by crystallographic and computational analyses

WRN exonuclease structure and molecular mechanism imply an editing role in DNA end processing

Active and inhibited human catalase structures: Ligand and NADPH binding and catalytic mechanism

Protein mimicry of DNA from crystal structures of the uracil-DNA glycosylase inhibitor protein and its complex with Escherichia coli uracil-DNA glycosylase

Protein and ligand preparation: Parameters, protocols, and influence on virtual screening enrichments

Screening a peptidyl database for potential ligands to proteins with side-chain flexibility

DNA double-strand break repair pathway choice is directed by distinct MRE11 nuclease activities

Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries

UniRef clusters: A comprehensive and scalable alternative for improving sequence similarity searches

The MRE11-RAD50-NBS1 complex conducts the orchestration of damage signaling and outcomes to stress in DNA replication and repair

Mechanism of efficient double-strand break repair by a long non-coding RNA

CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice

The DNA repair endonuclease XPG interacts directly and functionally with the WRN helicase defective in Werner syndrome

Human XPG nuclease structure, assembly, and activities with insights for neurodegeneration and cancer from pathogenic mutations

Phosphate steering by Flap Endonuclease 1 promotes 5'-flap specificity and incision to prevent genome instability

Flipping of alkylated DNA damage bridges base and nucleotide excision repair

Molecular properties that influence the oral bioavailability of drug candidates

Identification of evolutionarily stable functional and immunogenic sites across the SARS-CoV-2 proteome and greater coronavirus family

Fragment- and structure-based drug discovery for developing therapeutic agents targeting the DNA damage response

Compartmentalization-aided interaction screening reveals extensive high-order complexes within the SARS-CoV-2 proteome

Transcription preinitiation complex structure and dynamics provide insight into genetic diseases

GRB2 enforces homology-directed repair initiation by MRE11

SARS-CoV-2 nsp13, nsp14, nsp15 and orf6 function as potent interferon antagonists

Structural basis for allosteric PARP-1 retention on DNA breaks

SLX4IP acts with SLX4 and XPF-ERCC1 to promote interstrand crosslink repair

A first-in-class polymerase theta inhibitor selectively targets homologous-recombination-deficient tumors

We thank Chi-Lin Tsai for comments. We acknowledge National Institutes of Health (NIH) grants (P01 CA092584, R35 CA220430), R01 GM141226 (to I.K.K.), a Robert A. Welch Chemistry Chair, and Cancer Prevention and Research Institute of Texas (CPRIT) RP180813. X-ray diffraction data were collected at SSRL beamline 12-2 and ALS beamlines 12.3.1 [supported by NIH project ALS-ENABLE (P30 GM124169) and the Integrated Diffraction Analysis Technologies (IDAT) program] and 8.3.1 (supported by NIH R01 GM124149 and P30 GM124169). Nsp15 research was supported by the U.S. Department of Energy (DOE) Office of Science through the National Virtual Biotechnology Laboratory, a consortium of DOE national laboratories focused on response to coronavirus disease 2019, with funding provided by the Coronavirus CARES Act.