key: cord-0720760-uleo5ej5
authors: Gentile, Francesco; Fernandez, Michael; Ban, Fuqiang; Ton, Anh-Tien; Mslati, Hazem; Perez, Carl F.; Leblanc, Eric; Yaacoub, Jean Charle; Gleave, James; Stern, Abraham; Wong, Bill; Jean, François; Strynadka, Natalie; Cherkasov, Artem
title: Automated discovery of noncovalent inhibitors of SARS-CoV-2 main protease by consensus Deep Docking of 40 billion small molecules
date: 2021-11-17
journal: Chem Sci
DOI: 10.1039/d1sc05579h
sha: bf48ba7d5a64cffb06dce040cdf03723fbe2426f
doc_id: 720760
cord_uid: uleo5ej5

Recent explosive growth of ‘make-on-demand’ chemical libraries brought unprecedented opportunities but also significant challenges to the field of computer-aided drug discovery. To address this expansion of the accessible chemical universe, molecular docking needs to accurately rank billions of chemical structures, calling for the development of automated hit-selecting protocols to minimize human intervention and error. Herein, we report the development of an artificial intelligence-driven virtual screening pipeline that utilizes Deep Docking with Autodock GPU, Glide SP, FRED, ICM and QuickVina2 programs to screen 40 billion molecules against SARS-CoV-2 main protease (Mpro). This campaign returned a significant number of experimentally confirmed inhibitors of Mpro enzyme, and also enabled to benchmark the performance of twenty-eight hit-selecting strategies of various degrees of stringency and automation. These findings provide new starting scaffolds for hit-to-lead optimization campaigns against Mpro and encourage the development of fully automated end-to-end drug discovery protocols integrating machine learning and human expertise.

Make-on-demand chemical libraries will soon exceed the 100 billion threshold, 1-3 while structure-based virtual screening (SBVS) is well projected to remain the gold standard method for early-stage drug discovery. Molecular docking implemented on high-performing computational resources has already been proven successful in identifying potent ligands from extensive databases (from 100s of millions to up to 1 billion chemicals) for several targets. [4] [5] [6] [7] [8] However, billion-scale SBVS remains out of reach for most academic research groups due to prohibitive computational cost. 6, 9, 10 Therefore, universal access to ultralarge chemical libraries is not yet part of the global trend to democratize drug discovery. 6, [11] [12] [13] [14] [15] To make the situation even more challenging, conventional docking is remarkably inefficient, where the vast majority of docked molecules is discarded, while only an exceedingly small subset of top-scoring compounds is considered for analysis. 16 Furthermore, docking is notoriously prone to highly ranked artifacts, which can hinder identication of true binders, a tendency that is particularly exacerbated in large-scale SBVS campaigns. 4, 17 Therefore, different levels of human intervention are required to efficiently discard false positives, making human inspection of docking poses a vital part of SBVS campaigns. 18 Hence, ultra-large scale SBVS would greatly benet from automatic hit selection, considering that meaningful human inspection of millions of seemingly well-docked molecules is impossible.

In order to speed up and automate conventional docking (without signicant loss of valuable information), we recently developed Deep Docking (DD)an AI-driven platform that can work in conjunction with any docking program and provides economical yet reliable access to billion-sized chemical libraries for SBVS. 19 Recently, similar approaches to DD that utilize AI to predict docking outcomes have emerged, including but not limited to MolPAL (molecular pool-based active learning), 20 lean-docking, 21 and AutoQSAR/DeepChem models. 22 Importantly, the high-throughput nature of DD can be utilized for docking ultra-large libraries but also enables the simultaneous use of multiple docking programs, facilitating the deployment of stringent consensus protocols, as advocated by the best practices of computer-aided drug design (CADD). 23 In other words, one could safely assume that when multiple independent docking approaches agree on a hit, the result should be associated with experimental activity with higher condence.

In this study, we developed and deployed twenty-six hitcalling strategies that rely on DD integration with Autodock GPU, 24 Glide SP, 25 FRED, 26 ICM, 27 ad QuickVina2 28 programs, combined with pharmacophore modeling, molecular clustering, root mean square deviation (RMSD)-based pose selection and other hit ltering approaches. These consensus-based protocols were applied to the stereoisomer-expanded Enamine Real SPACE library, 29 combined with the ZINC15 database, 30 totaling 40 billion molecular entries that were screened against the active site of SARS-CoV-2 main protease (Mpro). The performance of these automated hit-calling approaches was further benchmarked against two strategies, implemented by human experts and involving visual examination of docking poses.

Since the beginning of the COVID-19 pandemic, countless drug discovery campaigns have been directed towards the SARS-CoV-2 Mpro enzyme. Mpro has a critical role as it mediates the proteolytic cleavage of large polyproteins resulting from viral gene translation to initiate the formation of the replicationtranscription complex. 31 These studies resulted in a number of conrmed micromolar and nanomolar hits, including covalent 32,33 and noncovalent binders, 34 fragment-like molecules, 35 and some repurposed drugs. [36] [37] [38] Notably, only a handful of experimentally conrmed compounds were identied via SBVS despite the release of thousands of computational predictions, 9, 39, 40 illustrating that Mpro is a particularly challenging target for CADD in general, and for molecular docking in particular.

Thus, we anticipated that the use of DD will facilitate signicant expansion of the docking base (shown to be benecial in previously reported campaigns 4,5 ), and will also enable the use of particularly stringent consensus protocols, relying on up to ve independent docking programs. To benchmark the performance of the ve docking programs (Glide SP, Autodock GPU, FRED, ICM and QuickVina2) on Mpro enzymatic site as a target, we collected a set of 214 conrmed inhibitors from the literature (listed in Table S1 †) and generated computational decoys using the Directory of Useful Decoys -Enhanced (DUD-E) server. 41 Utilizing all ve docking programs, the resulting benchmark dataset was docked into the Mpro catalytic site of the PDB:6W63's crystal structure 42 (Fig. S1 †) .

The ability of each program to identify actives was then assessed by the resulting top 1% enrichment factor (EF) values featured on Fig. 1 . Notably, these values range from 1.4 to 10.4 for individual docking programs, illustrating their remarkable variability in retrospective performances on SARS-CoV-2 Mpro. Consequently we evaluated how the application of consensus approaches could inuence the docking outcomes, specically focusing on ranking by average scores (A.S.), and selecting compounds based on 3 of 5 or 5 of 5 consensus rules 43 requiring at least three or all ve programs to agree on the docking pose (with the corresponding pose superpositions yielding RMSD < 2Å). We observed that consensus-based performance improved substantially with the corresponding EF values consistently increasing along with the levels of stringency from 11.5 for only A.S.-based ranking, up to 59.0 when considering all ve programs agreeing on the pose (5 of 5 rule), strongly advocating in favor of consensus-centric best practices of CADD previously published by us. 23 Data-driven and expert-driven pharmacophore modeling

To deal with potential docking artifacts, an automated procedure was employed to design a pharmacophore model, as outlined in the Methods section. Notably, this model was generated in a data-driven, automated fashion, and contained only consensus features shared by multiple active ligands superposed into the active site of Mpro (as identied by the Consensus Pharmacophore Query Editor of Molecular Operating Environment (MOE) program). Human intervention was only limited to removing unnecessary pharmacophore features upon visual inspection of the binding poses. The resulting pharmacophore shown on Fig. 2 recapitulates six key proteinligand interaction features: a hydrogen bond acceptor with His163 in the S1 sub-site (F1), an aromatic/hydrophobic feature engaging S1 in hydrophobic, p-p, or cation-p interactions (F2), a hydrogen bond acceptor engaging Glu166's backbone amide (F3), a aromatic/hydrophobic moiety placed in the S2 sub-site (F4), a hydrogen bond acceptor feature in S1 0 (F5) and an aromatic moiety occupying S1 0 (F6).

In parallel, a highly detailed pharmacophore model was constructed by an expert using the binding pose of X77 compound from the PDB:6W63 structure 42 of Mpro, and from the structure of Jun8-76-3A molecule found in PDB:7KX5. 44 This expert-designed pharmacophore model involved aromatic/ hydrophobic (F1) and hydrogen donor (F2) features respectively placed at P1 position, one aromatic/hydrophobic feature (F3) at P2, an aromatic/hydrophobic linker (F4) between P1 and P2, an aromatic/hydrophobic feature (F5) at P1 0 and a hydrogen bond acceptor feature (F6) at P3, as shown on Fig. 2c . Unlike the pharmacophore model generated by MOE, the design of this model was fundamentally expert-driven and did not rely on automatic detection of binding features from available crystallographic data.

To carry out the ultra-large DD campaign, we generated a combined library of purchasable chemicals by merging ZINC15 30 with an enumerated Enamine REAL Space library, 29 resulting in 40 061 717 551 unique molecular entries. The conventional docking of a library of this size would require extraordinary resources, and up to 10 years of running time with the fastest Autodock GPU program implemented on 250 GPU units. Instead, we utilized the DD method to screen the library.

In its essence, DD is a classication model that is trained with the chemical ngerprints and docking scores of a small sample of a chemical library, and used to predict the top scoring molecules (virtual hits) while discarding unfavorable molecules without requiring to dock the entire database. Virtual hits are dened as molecules ranked by their scores within a user-dened top percentage of the library's molecules (see Methods section for the values used in this work). DD is an iterative process, where at each iteration the training set (initially a random sample of the library) is enriched with predicted virtual hits from the previous iteration, the virtual hit criterion becomes more stringent, training is repeated and inference is run over the whole library. As a result of this progressive model improvement, the number of molecules predicted as virtual hits is expected to decrease with every new iteration, enabling the user to conventionally dock the remaining molecules comprising true virtual hits without requiring prohibitive amounts of computational power. 19 In this work, we sequentially used the DD method with ve popular docking programs, switching to a different program when model performance did not signicantly improve with further training augmentation: starting with Autodock GPU, we ran DD iterations on the original library of 40 billion compounds, and we compared the number of predicted virtual hits at the end of an iteration with the number of predicted virtual hits in the previous one; aer three iterations, we observed that the decrease power of DD did not improve signicantly with further training, as more than 80% of the previously predicted virtual hits were still retained despite augmenting the training set, and thus we did not proceed further with Autodock GPU. Nevertheless, this yielded 10-fold reduction of a docking database by Autodock GPU with a remaining $4 billion molecules predicted as probable virtual hits. The same procedure was then repeated with Glide SP-DD using the 4 billion remaining molecules as the starting point, followed by FRED, ICM, and the slowest QuickVina2, until the total number of predicted virtual hits was reduced to $24 million molecules ( Table 1 ). The consecutive DD runs with the ve programs took us approximately nineteen days of computing on 250 GPUs and 640 CPU cores, and resulted in a signicant 1500-fold reduction of the initial 40 billion docking library. It should be noted that the extend of reduction of the docking database is expected to depend on the order of the application of the docking programs. Therefore, we prioritize the high-throughput capability of each docking algorithm to achieve faster library reduction. The fact that the ve popular programs collectively qualied only 24 million compounds out of more than 40 billion docked molecules (0.06%), once again may highlight a very approximate nature of molecular docking.

While only 0.06% of molecules were consistently ranked by all ve programs (in DD mode), the remaining 24 million candidate compounds still posed a signicant challenge for meaningful hit selection and required the use of various post-docking scoring and ltering tools. Hence, we experimented with a broad variety of CADD lters and consensus procedures to reduce this amount of chemicals to manageable hitlists. The corresponding hit-calling strategies are illustrated by Fig. 3 and discussed below in greater details.

Firstly, we selected hits solely from the ranks obtained by individual docking programs (lists A, B, C, D, E in Fig. 3 ). The top-100 compounds on these ve hit lists were then clustered based on Tanimoto similarity to determine their underlying chemical diversity (as depicted in Fig. 4 ). The constructed structure-similarity trees exhibited rather close agglomeration scores and maximum tree depth values for all ve docking approaches, which suggest equally diverse nature of identied hits, except for FRED that yielded fewer chemical scaffolds (details in Fig. S2 †) .

The generated ve hitlists were then re-clustered based on the average values of normalized docking scores resulting in a matrix (K), that was further processed with a series of consecutive score-and geometry-based lters to aggressively reduce potential hits to 'high-condence' subsets (sequential ltering paths 1.1 and 1.2 on Fig. 3 corresponding to overall more-and less-stringent selection strategies).

Thus, the rst lter L was used to remove compounds presenting high standard deviation values of the ve docking scores (hence, inconsistently ranked), using thresholds of 1.0 (list 1.1_L on the path 1.1) or 0.5 of the standard deviation unit (list 1.2_L, path 1.2). Next, we applied the two previously described consensus lters: 5 of 5 rule for path 1.1 (resulted in a hitlist 1.1_M) and 3 of 5 rule for path 1.2 (hitlist 1.2_M). The resulting lists on 1.1 and 1.2 paths (corresponding to most consistently docked and ranked compounds) were further ltered with the above-described pharmacophore model (resulting in hitlists 1.1_N and 1.2_N respectively), followed by best-rst structural 4 (O) and score tree-based (P) clustering procedures that aimed to increase diversity in the resulting hitlists. As Fig. 3 illustrates, the implemented less aggressive sequential ltering (path 1.2) resulted in 14 000-fold reduction of the initial 24 million dataset, while a stricter 1.1 ltering path led to 680 000-fold database reduction and yielded only 179 molecules that justied all consensus prediction criteria. Of note, the implemented ltering procedures were all highly automated and enabled handling raw docking results with minimal human intervention. 

It is well known that combinatorial databases like REAL Space tend to contain redundant chemotypes, and therefore could benet from structure-based clustering to ensure chemical diversity. 4 For that reason, we implemented selection strategy 2.0 (Fig. 3) , where we began ltering of 24 million hits by selecting 100 000 molecules that were top-scored by each docking program (due to technical limitations of clustering algorithms), and grouped them based on structural similarity, selecting the top-ranked member as cluster representative and discarding the others 4 (see Methods section for details), resulting in hitlists marked as F, G, H, I and J on Fig. 3 .

Unsurprisingly, the generated hitlists were signicantly more diverse than the ones obtained from raw docking results, as illustrated by the comparison of the distribution of Tanimoto similarity scores (computed on 1024 bits, radius 2 Morgan ngerprints) for the top-100 ranked molecules prior and aer clustering, reported in Fig. 5 , and by the circular dendrograms (Fig. S3 †) that exhibit lower agglomeration scores (Fig. S2 †) than that in Fig. 4 , indicating higher chemical diversity of the compounds. The consecutive ltering on the path 2.0 were identical to those used for 1.0, but the number of compounds that passed the next lter was signicantly smaller and the corresponding Spearman correlations between the ranks were lower (details are given on Fig. 3 and S4 †).

In addition to the automated strategies detailed in the previous sections, we composed two 'expert curated' hitlists that were constructed with signicant involvement of human intervention, such as visual inspection of docking poses. No specic requirements were imposed for the expert analysis and only docking and post-docking results generated in this study were used as well as up-to-date experimental binding information for Mpro ligands. The expert-generated pharmacophore lter turned out to be signicantly more stringent than the datadriven version, returning only 1119 and 304 unique compounds when respectively applied to Glide and ICM results. Considerably fewer molecules were returned by other three docking programs, that were therefore not included in the analysis. Aer visual inspection and prioritization of noncovalent interactions, 150 compounds from Glide and ICM docking 'streams' were nally selected.

The DD processing of 40 billion small molecules with ve popular docking programs provided a unique opportunity of compiling a large list of potential hits ($24 millions) on which a variety of fully or partially automated CADD ltering and consensus procedures could be assessed (including the most stringent ones, reducing 40 billion chemicals to just a few centers, a total of 1700 chemicals were ordered, from which 1283 ($75% synthesis success rate) were received and evaluated in Mpro enzymatic assay. Of those, 117 compounds demonstrated >10% inhibition of SARS-CoV-2 Mpro enzymatic activity at 5 mM, and have been regarded by us as potential hits (all presented in Table S2 †) .

Next, we computationally assessed the drug-like properties of the identied active compounds as well as their novelty. Remarkably, all 117 molecules satised the Lipinski's rule of 5 for orally active drugs, 46 as calculated by MOE. In addition, 42 of the identied inhibitors also passed the Oprea lead-like test indicating their potential as starting points for medicinal chemistry optimization efforts. 47 Notably, all the identied hits belong to the REAL Space library, hence they can be synthesized in few weeks at high success rate, and purchased at a reasonable cost. We then compared the structures of the 117 active molecules with the ones of the 214 known inhibitors used in our benchmarking (Table S1 †), by evaluating Tanimoto similarity scores computed on Morgan ngerprints (Table S3 † ). The average similarity score was 0.15 with a standard deviation of 0.04, clearly indicating that the novel Mpro inhibitors occupy a signicantly different region of the chemical space than known actives.

Notably, we observed a very low overlap between compounds emerging from different hit calling approaches (Fig. 6a) . Out of 3424 molecules from the top-200 hitlists, 2069 were identied only by a single selection strategy, and the number of compounds appearing in the hitlists of multiple strategies steadly decreased with the number of those. Only three molecules commonly appeared in in eight different hitlists, and none was picked by nine or more strategies (Fig. 6b) . Such variability in prioritizing molecules was also illustrated by the total lack of overlap between automatically generated hitlists A-2.2_N and expert-selected hitlists exp_G and exp_I (Fig. 6c) . In terms of the resulting hit rates, the performances of individual strategies also appeared as highly variable (Fig. 6c) .

Importantly, none of the constructed hitlists returned a null hit rate, with the lowest observed value being 3% for hitlist C corresponding to unprocessed docking outcomes from FRED while the highest hit rate of 11% for a single program was achieved by Autodock GPU. The consequently applied consensus docking and scoring procedures, as well as automated pharmacophore ltering procedures did not signicantly improved hit returns (1.0_K to 1.2_P). Nevertheless, we observed that introducing best-rst clustering based on chemical ngerprints, as initially proposed by Lyu et al. 4 did consistently improve hit rates for individual docking program outcomes (F to J) as well as for consensus-based strategies (2.0_K to 2.2_N), resulting in hit rates as high as 13%.

Seventeen compounds showing promising Mpro inhibition were selected to be subjected to concentration dependent evaluation, and eight demonstrated half-maximal inhibitory concentration (IC 50 ) values below 100 mM (Table S2 and Fig. S5 †) . Notably, the most potent compounds emerged from the data-driven pharmacophore ltering of consensus docking results and from the expert selection (Table S2 †) . Based on conventional subdivision of the Mpro catalytic site, 48 all the identied inhibitors could be assigned to simultaneous occupation of the S1 and S2 regions (Table S2 †) . Of note, these particular sub-pockets are widely considered the most critical sites to inhibit the catalytic activity of Mpro. 49, 50 Thus, the docking poses of the three most potent compounds (Z4921675438, Z4927220858 and Z4921678803, Fig. 7) all highlight a critical hydrogen bond interaction established with His163 in S1, and the occupancy of S2 by hydrophobic groups that are well-buried into the site. The linking region between S1 and S2 (S1/S2) was also occupied in all three cases, as well as in $80% of all the active molecules listed in Table S2 . † Notably, Z4921675438 (Fig. 7a) and Z4921678803 (Fig. 7c) both interact with S1 0 /S3 0 through hydrogen bonding, while Z4927220858 (Fig. 7b) interacted with the S3/S4 site. Each of these two distinct subpockets was occupied by roughly half of the active compounds, as they both are able to accommodate a variety of groups, which can be further exploited to improve activity and specicity of the inhibitors. 51 However, none of the identied hits occupied all ve sub-pockets of Mpro catalytic region.

The developed workow enabled us to rapidly and efficiently reduce 40 billion compounds to a manageable subset of potential hits against the SARS-CoV-2 Mpro. To rene compounds that were well-docked by all ve programs, we explored a variety of hit-selecting strategies, that can be seamlessly integrated into an almost fully automated SBVS pipeline and compared against non-automated, expert-relying approaches. As a result, more than a hundred novel SARS-CoV-2 Mpro inhibitors were identied with minimal or no human intervention. This study yielded a signicant number of novel chemotypes, suitable for future medicinal chemistry optimization, but also yielded some of the most potent Mpro inhibitors that emerged from only molecular docking. 8, [52] [53] [54] Our results also demonstrate that automated hit-calling strategies based on raw docking scores can identify a signicant number of actives (with approximately 10% hit rate), and yet there is a very little agreement between different programs. The introduction of successive consensus lters slightly improved hit rates and consistency of docking outcomes. However, it is the combination of docking pose consensus with structural diversity ltering that returned the highest rates of actives. Notably, all these automated strategies still fall short of outperforming human expert selection. The factor that limits the performance of the automatic ltering likely comes from the inaccurate scoring function of docking and/or even wrong prediction of the binding poses by the programs, 18,55 generating a large number of inactive artifacts that hinder active molecules constituting a minimal percentage of the screened library. The expert selection of binding features and renement of pharmacophore model on the basis of knowledge-driven experience and visual inspection appeared to be critical to the hit selection process. Hence, it should be adopted as essential last step in the general practice of current SBVS.

The results of this ultra-large docking campaign also enabled to conduct a broad analysis of the occupancy of critical subpockets of Mpro catalytic site by active noncovalent ligands. Notably, none of them span the entire catalytic cavity, which is likely to explain their lower inhibition activity compared to latest peptidomimetic covalent inhibitors that entered clinical trials. 56 These results jointly corroborate the complexity of Mpro as a CADD target and once more highlighted the limited success of recent SBVS campaigns in general and docking studies in particular. 57 Nevertheless, the novel noncovalent Mpro inhibitors that we report here provide a signicant number of starting points for their future optimization into drug candidates, following the path of recent successful stories in developing low nanomolar noncovalent Mpro inhibitors from high micromolar hits identied by docking and consensus ltering. 34, 37 Taken together, these results suggest that in silico discovery of inhibitors of SARS-CoV-2 Mpro (among any other targets) can be successfully automated, but still will largely benet from human expert intervention. Thus, we envision a multifaceted role for AI in CADD in the upcoming era of giga-libraries, 2, 15 boosting the high-throughput capabilities of emerging largescale docking platforms, 4,6,9 while closing the gap with knowledge and experience of seasoned CADD experts.

The structure of Mpro with X77 noncovalent ligand bound in the active site (PDB:6W63) 42 was optimized using the Protein Preparation Wizard 58 of Maestro suite from Schrödinger 59 with default parameters. Autodock maps for all supported atom types were generated with autogrid4, 60 using a box of 18Å Â 19.50Å Â 21.75Å centered on X77, and a grid spacing value of 0.375Å. The Glide docking grid was prepared using the Receptor Grid Generation tool from Schrödinger 59 dening a 10 A box centered on the ligand. For ICM docking, the grid was prepared using the Receptor Setup tool 61 on the ligand-receptor complex. The FRED grid was prepared using the Make Receptor tool. 62 The box shape was dened around the ligand with dimensions of 16Å Â 20Å Â 21Å. QuickVina2 box's dimension and coordinates were set to the same values as for Autodock.

QUACPAC 62 was used to generate a dominant ionized, tautomer form at pH 7.4 for each compound to dock. Omega's pose mode 63 was used to generate multiple 3D conformations for FRED docking. Omega's classic mode was used to generate single low energy conformations as inputs for the other programs. Openbabel 64 was used to translate compounds from sdf to pdbqt format for Autodock and QuickVina2 docking. Docking was performed with ve programs, using default parameters unless differently specied: Autodock GPU (https:// github.com/scottlegrand/Autodock-GPU/tree/relicensing version), 24 with 5 million energy evaluations and 10 Lamarckian genetic algorithm 65 runs for each compound, considering the lowest energy pose from the most populated cluster as the best solution; Glide SP module; 25 FRED; 26 ICM 27 with an effort value of 2; and QuickVina2. 28 Benchmarking 214 Mpro ligands with available crystal structures were collected from the Protein Data Bank (RCSB PDB) 66 and the Fragalysis database. 14,35 50 decoys per known active ligand were generated using the DUD-E server. 41 Compounds were docked with the ve programs discussed above. EF 67 at 1% were calculated as EF ¼

Actives list Molecules list (1)

Structures of 54 covalent and noncovalent Mpro active site binders were downloaded from RCSB PDB's COVID-19 Resources collection. In addition, 295 crystals of Mpro complexed with ligands were obtained from the Fragalysis database 14, 35 and ltered to retain only noncovalent active site inhibitors with reported IC 50 below or equal to 5 mM. The protein sequences were aligned and superimposed on the pocket residues of the reference compound X77 in MOE. 68 Pharmacophore modeling was performed using the MOE Ph4 Query tool with the application of Extended Hückel Theory annotation scheme. 69 Consensus pharmacophore features were automatically computed using a default tolerance distance of 1.2Å from the ligands' atoms and a 50% threshold of overlapping features. Features were then selected based on visual inspection of known actives in complex with Mpro.

At the time of this research, the REAL Space library consisted of 12.3 billion compounds in SMILES format. Stereochemical expansion of the database was performed using ipper 63 to enumerate unspecied stereocenters, resulting in 38.7 billion unique molecules. Morgan ngerprints (considering chirality) of 1024 bits and radius 2 were calculated using rdkit chemoinformatic package. 70 The resulting library was then merged with the previously prepared 19 ZINC15 set (1.36 billion compounds). 30 Deep Docking DD was run using a recall of 0.9 for virtual hits. The sizes of validation, test and training (both initial and augmentation batches) sets were set equal to 0.07% of the starting database, as previously described. 19 For all the runs, virtual hits were dened as top 1% ranked molecules of the library in the rst iteration, 0.9% in the second, 0.8% in the third, and so on.

Compounds showing extremely high docking scores (>50) for at least one program were discarded. All docking scores were standardized to ensure mean of 0 and standard deviation of 1. Best-rst clustering as proposed by Lyu et al. 4 was applied using Tanimoto scores calculated over Morgan ngerprints as similarity measure, and a threshold of 0.5. Dendogram-based clustering was applied considering the ve docking scores as implemented in the heatmap.2 R package. 71 Prioritization of molecules based on human inspection Two expert-selected hitlists were generated by following a standard procedure, removing non-neutral and/or compounds with many chiral centers, and processing separately the docking results from Glide and ICM results using a pharmacophore lter. Aer careful visual inspection on the fragments binding in S1, S2 and S1 0 subpockets, 150 compounds were selected for quotation from each list.

A hierarchical agglomerative clustering (HAC) algorithm was applied to investigate the chemical diversity of the lists of selected compounds. HAC utilized Tanimoto scores as measure of diversity and was performed using function agnes (Agglomerative Nesting) implemented in cluster package as described by Kaufman and Rousseeuw 72 and available in R programing language. 71 HAC results were depicted as a circular hierarchical dendrograms, where each merge is represented by a horizontal line that represent the Tanimoto similarity of the two clusters that were merged at each step. 72 To compare the chemical diversity of the lists, we calculated the minimum and maximum number of nodes from the root to the leaves as well as the agglomerative coefficient of the HAC dendrograms. The coefficient describes the strength of the clustering structure, with values closer to 1 suggesting a more balanced clustering structure while values closer to 0 suggest less well-formed clusters.

High throughput screening and IC 50 determination SARS-CoV-2 Mpro (Cat #100823) and internally quenched uorogenic Mpro substrate (Cat #79952) assay kits were purchased from PBS Bioscience. All procedures were performed in accordance with the manufacturer's instructions and protocols. Assays were adapted to a 384-well plate format with compound concentrations at 5 mM (n ¼ 1) in two independent runs. The assay was initiated by dispensing 150 nL of compound per well, to which 7.5 mL of Mpro protease at 2.25mg mL À1 (62.5 nM) and 7.5 mL of Mpro substrate at 15 mM, were added. Plates were read at 460 nm emission upon excitation at 360 nm. HTS percent inhibition was calculated for each compound from the signal in uorescence units (FU) and the mean of the plate controls and the mean of the plate blanks using the following equation:

%inhibition ¼ 100 Â 1 À signal À blank mean control mean À blank mean (2) The compounds of interest were dened as those with a percentage of inhibition over 10% compared with the reaction in absence of inhibitors. Dose-response testing of hit inhibitors were measured using the same reaction mixture as the one used for HTS, but compounds were instead serially diluted in 12 concentration points 2-fold from 200 mM (n ¼ 2). All experimental data were calculated and analysed using GraphPad Prism.

For GPU-based docking and machine learning, we used 250 Tesla V100 GPUs from the Vancouver Prostate Centre's Juggernaut cluster, Dell Technologies HPC and AI Innovation Lab's Rattler cluster, and the Sockeye supercomputing cluster at The University of British Columbia. All other computations were run on Intel® Xeon® Silver 4116 CPU @ 2.10 GHz cores available at the Vancouver Prostate Centre.

Data for this paper, including raw experimental data, computational data for identied inhibitors, structures used for pharmacophore generation, and scripts and results for ltering procedures have been deposited in the Federated Research Data Repository at DOI: 10.20383/102.0524. The Deep Docking code is freely available from https://github.com/jamesgleave/ DD_protocol. 

There are no conicts to declare.

to accelerate the AI algorithms, and the UBC Advanced Research Computing team for providing access and technical support for the Sockeye supercomputing cluster. The authors thank Dr Larry Goldenberg for his tireless support of research efforts on COVID-19 therapeutics development.

6W63: Structure of COVID-19 main protease bound to potent broad-spectrum non-covalent inhibitor X77

7KX5: Crystal structure of the SARS-CoV-2 (COVID-19) main protease in complex with noncovalent inhibitor Jun8-76-3A

The RDKit Documentation-The

Finding Groups in Data: An Introduction to Cluster Analysis