key: cord-0476460-ihak05lt authors: Murugan, Natarajan Arul; Podobas, Artur; Gadioli, Davide; Vitali, Emanuele; Palermo, Gianluca; Markidis, Stefano title: A Review on Parallel Virtual Screening Softwares for High Performance Computers date: 2021-11-30 journal: nan DOI: nan sha: 361469c33ab862e52cbe56d00bcc76c79c42d40d doc_id: 476460 cord_uid: ihak05lt

Drug discovery is the most expensive, time-demanding and challenging undertaking in biopharmaceutical companies; it aims at the identification and optimization of lead compounds from large chemical libraries. The lead compounds should bind a disease-associated target with high affinity and specificity and, in addition, should have favorable pharmacodynamic and pharmacokinetic properties (grouped as ADMET properties). Overall, drug discovery is a multivariable optimization problem that can be carried out on supercomputers using a reliable scoring function, which is a measure of the binding affinity or inhibition potential of the drug-like compound. The major problem is that the number of compounds in chemical space is huge, which makes computational drug discovery very demanding. However, it is cheaper and less time consuming than experimental high-throughput screening. As the problem is to find the most stable (global) minima for numerous protein-ligand complexes (on the order of 10$^6$ to 10$^{12}$), parallel implementations of in-silico virtual screening can be exploited to complete drug discovery in an affordable time. In this review, we discuss such parallel implementations in virtual screening programs. The nature of different scoring functions and search algorithms is discussed, together with a performance analysis of several docking software packages ported to high-performance computing architectures.

Drug discovery is one of the most challenging, time-consuming and expensive projects in the healthcare sector.
The usual time involved in bringing a drug from basic research to market is 12-16 years and the associated cost is about 2.5 billion dollars [1] [2] [3] [4]. To meet one of the EU sustainable development goals [5], aimed at good health and well-being for everyone, drugs should be made available to the common people at an affordable price, and the current protocols in drug development need to be redesigned to make the discovery process economically sustainable. One of the most promising techniques to accelerate the drug discovery process, and to make it more cost-effective, is to perform in-silico virtual screening and to exploit the computational power of large High-Performance Computing (HPC) systems. One of the major contributing factors to the cost and time of discovery is that, as has been reported [6, 7], only one in 10,000 compounds subjected to research and development (R&D) turns out to be successful. Drug discovery involves various steps such as target discovery, lead identification, lead optimization, optimization of ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) properties, and clinical trials [8]. Once a valid target is known for a disease, compounds from different chemical libraries are subjected to high-throughput screening against this target. If the number of compounds used for screening can be narrowed down to a few hundred, the cost and time associated with a drug discovery process can be drastically reduced. Using computational approaches, many of the steps involved in drug discovery projects can be made cost-effective and less time consuming. For example, in the case of protein tyrosine phosphatase-1B [9], the experimental high-throughput screening of a chemical library with 400,000 compounds yielded a success rate of 0.021% in identifying ligands that can inhibit the enzyme with IC$_{50}$ values less than 100 µM.
However, with a preliminary computational screening phase, the success rate rose to 34.8% starting from a chemical library of 235,000 compounds. To summarize, experimental high-throughput screening is not suitable for modern chemical spaces, since they comprise up to billions of molecules. To solve this problem, it is common to use computational approaches on HPC systems. In this review, we highlight various currently available implementations of virtual screening software suitable for high performance computers. Below, we provide a general introduction to the virtual screening (VS) problem and discuss the possibilities for parallelization so that it can be effectively implemented on the computing facilities offered by HPC systems.

The paper is organized as follows. Section 2 introduces computational VS with details on scoring functions and search algorithms. Section 2.2 presents details on the major breakthroughs obtained in VS and Section 3 presents the main parallelization techniques used in VS and why they target HPC systems. In Section 4, we provide an overview of the implementations of different VS software packages. Finally, we discuss the opportunities offered by reconfigurable architectures such as FPGAs.

In general, computational approaches for molecular docking have two main components: sampling and scoring. Sampling refers to the generation of various conformations and orientations of the ligand within a target binding site (usually defined by a grid box). Scoring refers to evaluating the binding/docking energies of the various configurations of the ligand within the binding site. The most stable configuration of the ligand is referred to as the binding pose. The VS protocol, in which molecular docking is carried out for all the ligands of a chemical library, includes a third component referred to as ranking, where the different ligands are ranked with respect to their binding potential.
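The three components described above (sampling, scoring and ranking) can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: `toy_score`, the `well_depth` field and the cubic box of half-width 4 are hypothetical stand-ins, not the scoring function or search algorithm of any docking program discussed in this review.

```python
import math
import random


def toy_score(ligand, pose):
    """Hypothetical docking energy (lower is better): a Gaussian well at the box centre."""
    x, y, z = pose
    r2 = x * x + y * y + z * z
    return -ligand["well_depth"] * math.exp(-r2 / 8.0)


def sample_poses(rng, n_poses, half_box=4.0):
    """Sampling: draw random ligand positions inside a cubic grid box."""
    return [
        tuple(rng.uniform(-half_box, half_box) for _ in range(3))
        for _ in range(n_poses)
    ]


def dock(ligand, n_poses=500, seed=0):
    """Scoring: evaluate every sampled pose and keep the lowest-energy one."""
    rng = random.Random(seed)
    poses = sample_poses(rng, n_poses)
    return min((toy_score(ligand, pose), pose) for pose in poses)


def virtual_screen(library, n_poses=500):
    """Ranking: order the ligands by the energy of their best pose."""
    hits = [(dock(ligand, n_poses)[0], ligand["name"]) for ligand in library]
    return sorted(hits)
```

Each call to `dock` is independent of the others, which is the property that ligand-level parallelization relies on.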
Overall, VS identifies the ligand with the highest binding affinity (based on the docking energies of the different ligands) for a given biomolecular target. In addition, the most stable binding mode/pose of each ligand within the binding site is found (based on the relative docking energies of different configurations of the same ligand). Figure 1 shows the general workflow of computer-aided drug discovery in which the VS approach is used to identify lead compounds. It shows the steps involved in identifying the binding pose of the ligands within the binding site and the subsequent ranking of the different ligands to identify the lead compounds. Most VS schemes do not include flexibility of the target protein; only the sampling over the translational, rotational, and torsional degrees of freedom of the ligand is accounted for.

The reliability and accuracy of the scoring functions used for screening compounds are the most important factors that dictate the success rate of computational screening approaches. The scoring functions are mostly defined to be proportional to the binding affinity of the ligand towards a target. They are often classified as physics-based, knowledge-based and empirical. (ii) The knowledge-based scoring functions are based on the available protein-ligand complex structural data, from which the distributions of different atom-atom pairwise contacts are estimated. The frequencies of occurrence of the different pairwise contacts are used to compute a potential of mean force, which is used for ranking protein-ligand complexes. (iii) Finally, the empirical scoring functions, as the name implies, are based on empirical fitting of binding affinity data to potential functions whose weights are computed using a reference test system.
Modern scoring functions mainly fall in this class, including the machine-learning-based approaches built on the available information on protein-ligand 3D structures and inhibition/dissociation constants [10].

As discussed above, different scoring functions have been developed, and this section mainly focuses on implementations available in open-source software such as Dock [11], Autodock4.0 [12], Autodock Vina [13] and Gnina [14]. The docking energy defined to rank protein-ligand complexes (sf) in Autodock4.0 is classified as physics-based and is defined as the sum of van der Waals, electrostatic, hydrogen bonding and desolvation energies, as shown in Equation 1. In addition, an entropic contribution proportional to the number of rotatable bonds is added to the docking energy. In the equation, $r_{ij}$ refers to the distance between two atoms $i$ and $j$ centered on the protein and ligand subsystems. Similarly, $q_i$ and $q_j$ refer to the charges on these atoms. $A_{ij}$ and $B_{ij}$ are the coefficients of the potential energy function describing the van der Waals interaction. $C_{ij}$ and $D_{ij}$ are the coefficients of the potential energy function describing the hydrogen bonding interaction. The terms $S_i$ and $V_i$ refer to the solvation parameter and the fragmental volume of atom $i$, respectively. Since the protein-ligand complexes are considered to be in an aqueous environment, the binding free energies need to account for this, and the solvation energy term accounts for the difference in binding free energy between vacuum and an aqueous-like environment. In particular, the last term in the equation accounts for this solvation effect (refer to Equation 1). In general, the entropic contributions can be due to translational, rotational and tor-

Here the summation runs over the ligand and receptor atoms, and $g(r_{ij})$ refers to the relative probability distribution of the distances of a specific type of protein-ligand atom pair in the docked complex structure when compared to the reference experimental complex structure.
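To make the roles of the symbols defined above concrete, the following Python sketch evaluates the four pairwise terms and the rotatable-bond penalty for a list of protein-ligand atom pairs. The functional forms are assumptions, chosen to follow the commonly published AutoDock4-style expressions: a 12-6 van der Waals term, a 12-10 hydrogen-bond term, a Coulomb term and a Gaussian-weighted desolvation term $(S_i V_j + S_j V_i)\exp(-r_{ij}^2/2\sigma^2)$. The constant dielectric, the value of `sigma`, the unit weights and all function names are illustrative placeholders, not values taken from this review. The last helper assumes the usual inverse-Boltzmann form for a knowledge-based pair potential built from $g(r_{ij})$.

```python
import math

COULOMB = 332.06  # kcal*Angstrom/(mol*e^2); converts q_i*q_j/r to kcal/mol


def pair_terms(r, q_i, q_j, A, B, C, D, S_i, V_i, S_j, V_j,
               hbond=False, dielectric=4.0, sigma=3.5):
    """Return (vdW, H-bond, electrostatic, desolvation) for one atom pair.

    Assumptions: a constant dielectric and sigma = 3.5 Angstrom are placeholders;
    a pair is treated either as a 12-6 contact or as a 12-10 hydrogen bond.
    """
    vdw = 0.0 if hbond else A / r**12 - B / r**6
    hb = (C / r**12 - D / r**10) if hbond else 0.0
    elec = COULOMB * q_i * q_j / (dielectric * r)
    desolv = (S_i * V_j + S_j * V_i) * math.exp(-r * r / (2.0 * sigma**2))
    return vdw, hb, elec, desolv


def docking_energy(pairs, n_rotatable, weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Weighted sum over all pairs plus an entropy penalty per rotatable bond.

    The unit weights are placeholders; real programs use fitted weights.
    """
    w_vdw, w_hb, w_elec, w_sol, w_tor = weights
    totals = [0.0, 0.0, 0.0, 0.0]
    for pair in pairs:
        for k, term in enumerate(pair_terms(**pair)):
            totals[k] += term
    return (w_vdw * totals[0] + w_hb * totals[1] + w_elec * totals[2]
            + w_sol * totals[3] + w_tor * n_rotatable)


def pmf_term(g, kT=0.593):
    """Knowledge-based pair potential, assuming the inverse-Boltzmann form -kT ln g(r)."""
    return -kT * math.log(g)
```

With coefficients chosen as $A = \epsilon r_{min}^{12}$ and $B = 2\epsilon r_{min}^{6}$, the van der Waals term reaches its minimum value of $-\epsilon$ at $r = r_{min}$, which gives a quick sanity check of the implementation.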
Recently, deep-learning networks have been proposed as scoring functions. For instance, Gnina uses a convolutional neural network (CNN) based scoring function to rank protein-ligand complexes [14]. The neural networks were trained using three-dimensional protein-ligand complex structures from the PDBbind database. In particular, the dataset contained two sets

The search algorithms aim at finding the protein-ligand structure corresponding to the global minimum of a potential energy surface. However, this is a very challenging problem and many search algorithms end up in a local minimum. Therefore, molecular docking software uses several techniques such as deterministic search [18], genetic algorithms, Monte Carlo with simulated annealing [19], particle swarm optimization [20], or Broyden-Fletcher-Goldfarb-Shanno (BFGS) [13]. Deterministic approaches apply techniques such as gradient descent, and they

As discussed above, molecular docking approaches employ different types of scoring functions, and before implementation they were validated rigorously against available experimental data. In particular, two properties obtained from molecular docking can be considered in general for benchmarking: (i) the RMSD computed for the predicted binding pose against the crystallographic pose obtained experimentally; (ii) the binding free energies/docking energies, which are proportional to the experimental inhibition/dissociation constants. The RMSD in the above list is computed from the experimental and predicted protein-

The other set of quantities used for benchmarking molecular docking approaches comprises the inhibition constants, dissociation constants, IC$_{50}$ and pIC$_{50}$, which are available from experimental binding assay studies. All these quantities refer to the binding potential or inhibiting potential of ligands towards a specific biomolecular target.
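The two kinds of benchmarking quantities listed above are simple to compute. The sketch below is a minimal illustration with assumed conventions: coordinates are given in the same receptor frame and in the same atom order (no superposition or symmetry correction), the 2 Å success cutoff is the commonly used threshold, and pIC$_{50}$ is taken as the negative base-10 logarithm of IC$_{50}$ expressed in mol/L. The function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def is_successful_pose(predicted, reference, cutoff=2.0):
    """A pose is usually counted as correct if its RMSD is below about 2 Angstrom."""
    return rmsd(predicted, reference) < cutoff


def pic50(ic50_molar):
    """pIC50 = -log10(IC50), with IC50 expressed in mol/L."""
    return -math.log10(ic50_molar)
```

For example, the 100 µM IC$_{50}$ threshold quoted earlier for protein tyrosine phosphatase-1B corresponds to `pic50(100e-6)`, that is, a pIC$_{50}$ of 4.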
The dissociation constants and binding free energies are related to each other through the following equation:

$$\Delta G = RT \ln K_d$$

where $\Delta G$ is the binding free energy, $K_d$ is the dissociation constant, R is the gas constant (equal to 1.987 cal K$^{-1}$ mol$^{-1}$) and T is the temperature (set to 298.15 K). Thanks to this equation, the computed docking energies can be directly validated against experimental binding assay results.

In a computational drug discovery project high affinity lead compounds against a target

M, a total of 10$^{13}$ of such calculations were carried out. This will increase further in flexible receptor docking, where the sampling over side-chain conformations of residues needs to be accounted for [26]. As can be seen, the computational demand of such virtual screening applications is huge, so it is inevitable to develop parallel algorithms and to use HPC systems to accomplish such screenings within an affordable time.

The first virtual screening using 3D structures of chemical compounds was carried out in 1990 in a search for dopamine D2 agonists, in which an agonist with pKi 6.8 was successfully found [31]. The number of compounds screened against various targets keeps increasing with time. As the chance of finding a compound with better binding affinity increases with the size of the chemical library, such screening procedures result in the identification of highly potent compounds. We here list a few example cases (refer to

Molecular docking for a single chemical compound involves sampling and scoring. This refers to sampling over the configurational phase space of the compound in the target binding site and computing the scoring function for each of the configurations generated. In the case of virtual screening, compounds from a chemical library are ranked against a single target with respect to their binding affinities, and this additional component is referred to as ranking (refer to Figure 1).
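As a worked example of the relation between dissociation constants and binding free energies given earlier in this section, the following Python sketch converts between the two using the quoted values of R and T. It assumes the standard form $\Delta G = RT \ln K_d$ with $K_d$ expressed in mol/L (1 M standard state); the function names are illustrative.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal K^-1 mol^-1 (1.987 cal K^-1 mol^-1)
T_ROOM = 298.15     # temperature in K, as set in the text


def binding_free_energy(kd_molar, temperature=T_ROOM):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    return R_KCAL * temperature * math.log(kd_molar)


def dissociation_constant(delta_g, temperature=T_ROOM):
    """Inverse relation: dissociation constant in mol/L from kcal/mol."""
    return math.exp(delta_g / (R_KCAL * temperature))


# A 1 micromolar binder corresponds to roughly -8.2 kcal/mol, and every
# ten-fold improvement in K_d lowers the free energy by about 1.36 kcal/mol.
dg_micromolar = binding_free_energy(1e-6)
per_decade = binding_free_energy(1e-6) - binding_free_energy(1e-7)
```

The small energy difference per decade of affinity (about 1.4 kcal/mol) indicates how accurate a scoring function must be to rank ligands reliably.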
As discussed in the previous section, this is computationally very demanding and can benefit greatly from parallel algorithms that run on high performance computers with different (shared memory and/or distributed) architectures [39, 40]. Nowadays, massively parallel computing units such as graphics processing units (GPUs) and Field Programmable Gate Arrays (FPGAs) are accessible to research groups or even individuals, and with the development of parallel virtual screening software, drug discovery projects can be offloaded to such groups, making drug discovery economically sustainable. Below we highlight the opportunities for the parallelization of the VS protocol. As discussed above, virtual screening involves three key steps:

Parallel computing architectures can be classified based on the memory availability to computing units:

Currently there are many parallel implementations for virtual screening on multicore machines, clusters and accelerators. As discussed above, different strategies have been adopted for parallel virtual screening. Only a few virtual screening programs, such as Flexscreen, use a parallel implementation of the docking energy calculation. The remaining ones parallelize by distributing the conformational sampling/scoring or the ligand ranking segments over different computing units. We will now analyze these programs one by one in more detail, focusing on their performance on massively parallel architectures and accelerators when compared to the CPU implementation. Wherever possible, the accuracy of the docking results (in terms of reproducing the experimental protein-ligand complex structure and the experimental inhibition constants) will be discussed.

imental and predicted binding poses was <2 Å), which is due to the reduction in the sampling failures of previous Dock versions. There is also a report in the literature on the offloading of Dock6 to CPU+GPU architectures using CUDA.
[50] Only the ranking using amber scoring was offloaded to GPU architectures. In this offloading, the coordinates, gradients and velocities are copied from host (CPU) memory to device (GPU) memory and the results are copied back from GPU to CPU memory. Since the GPU could only handle single precision numbers, the original data in double precision were converted to single precision before the transfer. Overall, the study reported about a 6.5-fold speed-up for the amber scoring of Dock6 on a GPU (NVIDIA GeForce 9800 GT) when compared to an AMD dual-core CPU [50].

Autodock4.0 is the most widely used molecular docking software based on a physics-based scoring function, but the original version is sequential in nature. However the Autodock

include residues for flexible docking, possible to print more than 20 poses. In terms of speed-up on HPC systems, this did not contribute any improvement. The performance analysis on Blue Gene/P with 2,048 nodes (8,192 cores) showed that grid map reuse reduced the single-threaded execution time by 17.5%. Multithreaded execution of the code yielded a 25% improvement in the overall performance. The execution of the code on 512 to 4,096 nodes showed near-linear scaling behavior in symmetric multiprocessing (SMP) mode (OMP=4). In particular, for the 16,384-core system, the speed-up was 92% of the ideal case. The virtual node (VN) mode (OMP=1) with the grid map "reuse" option, however, reached only 72% of the ideal speed-up. The gain in performance of the SMP mode should be attributed to multithreading. The node utilization can be further improved by pre-ordering the ligands by decreasing number of torsional angles.

VinaMPI is another implementation of Autodock Vina for distributed computing architectures [54]. It is written in C and uses MPI libraries for communication between the nodes.
In order to avoid the poor scaling behavior of the parent-child (or master-slave) distribution scheme on massively parallel supercomputers, this implementation uses an all-worker scheme. It is worth recalling that VinaLC instead used a master-slave scheme for distributing tasks. In this code, each worker (or each MPI rank) deals with its own protein-ligand complex, and within each rank the computation (related to the search for the global minimum) is carried out using multithreading. For this reason it is also suitable for virtual screening for more than

LiGen [55] is a VS software that leverages CPUs and GPUs to perform the required computation. Several versions of the tool have been developed, starting from a CPU-only application [18, 55, 56]; the main kernels were then ported to the GPU [57] using OpenACC. Finally, the GPU kernels were optimized using CUDA, and this last version was used to perform a large VS experiment in the search for a therapeutic cure for COVID-19, screening 71.6B compounds against 15 binding sites from 12 SARS-CoV-2 viral proteins on 2 supercomputers accounting for 81 PFlops [38]. LiGen uses deterministic algorithms to generate the different conformers of a ligand, and an empirical scoring function to select the best molecule. The Docker-HT application is the version of LiGen designed for large VS campaigns, and it is able to leverage multi-node, multi-core and heterogeneous systems. In particular, it uses MPI for the multi-node communication, which the algorithm limits as much as possible to avoid large communication overheads [58]. Indeed, the amount of data to be processed by each node is divided beforehand, which may create load balancing issues since there is no mechanism to re-balance it during the execution of the application. On the single node, it leverages the C++ thread library. Finally, it uses CUDA to support GPU acceleration.
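The load-balancing issues mentioned above (a division of the ligands fixed before execution, and the benefit of pre-ordering ligands by decreasing number of torsional angles) can be illustrated with a small scheduling simulation in Python. This is a toy model under stated assumptions: the docking cost of a ligand is taken to be proportional to a hypothetical integer such as its number of torsions, and communication costs are ignored. It does not reproduce the scheduler of any program discussed here.

```python
import heapq


def static_makespan(costs, n_workers):
    """Ligands are split into contiguous blocks before execution (no re-balancing)."""
    chunk = -(-len(costs) // n_workers)  # ceiling division
    return max(sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk))


def dynamic_makespan(costs, n_workers):
    """Each worker takes the next ligand from a shared queue as soon as it is idle."""
    loads = [0] * n_workers
    heapq.heapify(loads)
    for cost in costs:
        # The least-loaded worker is the one that becomes idle first.
        heapq.heappush(loads, heapq.heappop(loads) + cost)
    return max(loads)


def parallel_efficiency(costs, n_workers, makespan):
    """Fraction of the ideal speed-up achieved for a given makespan."""
    return sum(costs) / (n_workers * makespan)


# Hypothetical per-ligand costs, e.g. numbers of torsional angles.
costs = [2, 3, 1, 4, 2, 12, 3, 9]
static = static_makespan(costs, 4)                               # 14
queued = dynamic_makespan(costs, 4)                              # 14
preordered = dynamic_makespan(sorted(costs, reverse=True), 4)    # 12
```

In this small example the expensive ligands arrive late in the queue, so dynamic scheduling alone does not help; processing the most expensive ligands first brings the makespan down to the cost of the single largest ligand, which is the best that can be achieved here.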
Within each node, the program uses pipeline parallelism and work-stealing to process the ligands.

This is a parallel implementation of virtual screening available for multicore CPUs, GPUs and Xeon Phi computers. The software uses a common code for the front-end computations on all these computers [59]. The performance of the code has been tested on multicore CPUs and on massively parallel architectures, namely Xeon Phi and NVIDIA GPUs. The tests using the CCDC/Astex dataset showed a 1.9-fold increase in performance for the Xeon Phi when compared to a 10-core Xeon CPU. Further, on the GeForce GTX 980 GPU accelerator, the performance was 3.5 times higher than that of the CPU version.

Poap is a GNU parallel based multithreaded pipeline for preprocessing ligands, performing virtual screening and post-processing the docking results [60]. It also minimizes memory use through an optimized dynamic file handling protocol. It has also been optimized

Similarly, Autodock Vina showed a 2.4-fold speed-up when compared to the default mode (which is already multithreaded); here the number of jobs was set to three.

Gnina is a fork of SMINA [51] and Autodock Vina [14]. When compared to the hybrid scor-

Autodock4.0 is one of the most widely used molecular docking programs, but it is a serial code that runs on a single thread and so cannot be used effectively in high performance computing environments with multiple CPUs and GPUs. Autodock-GPU [61] is the version of Autodock developed for multi-node parallel computers with GPU accelerators. It is worth recalling that the above-discussed MPAD4 was developed for multi-CPU architectures. This program has been developed using the OpenCL application programming interface, as it allows portability to hybrid platforms with CPUs and GPUs. In contrast to Autodock4.0, the local search algorithm uses derivatives of the energy with respect to translations, rotations and torsions (this implementation of the gradient-based local search is referred to as ADADELTA).
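To illustrate what a gradient-based local search of the ADADELTA type does, the sketch below applies the generic ADADELTA update rule (running averages of squared gradients and squared steps) to a toy quadratic energy over a short vector of pose variables. This is a minimal sketch under stated assumptions: the values `rho=0.95` and `eps=1e-6`, the toy energy and all names are illustrative, and the code is not taken from Autodock-GPU.

```python
import math


def adadelta_minimize(energy, gradient, x0, n_steps=500, rho=0.95, eps=1e-6):
    """Generic ADADELTA descent; returns the best variables and energy seen."""
    x = list(x0)
    avg_sq_grad = [0.0] * len(x)   # running average of squared gradients
    avg_sq_step = [0.0] * len(x)   # running average of squared updates
    best_x, best_e = list(x), energy(x)
    for _ in range(n_steps):
        g = gradient(x)
        for k in range(len(x)):
            avg_sq_grad[k] = rho * avg_sq_grad[k] + (1.0 - rho) * g[k] ** 2
            step = (-math.sqrt(avg_sq_step[k] + eps)
                    / math.sqrt(avg_sq_grad[k] + eps) * g[k])
            avg_sq_step[k] = rho * avg_sq_step[k] + (1.0 - rho) * step ** 2
            x[k] += step
        e = energy(x)
        if e < best_e:
            best_x, best_e = list(x), e
    return best_x, best_e


# Toy "genotype": three translations and one torsion, with a quadratic energy
# whose minimum sits at a hypothetical target pose.
TARGET = [0.0, 0.0, 0.0, 0.0]


def toy_energy(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


def toy_gradient(x):
    return [2.0 * (xi - ti) for xi, ti in zip(x, TARGET)]


start = [3.0, -2.0, 1.0, 0.5]
best_pose, best_energy = adadelta_minimize(toy_energy, toy_gradient, start)
```

Unlike a random local search, every iteration requires the gradient of the energy with respect to all pose variables, which is the additional cost that a derivative-based method has to amortize.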
In the case of CPU+GPU architectures, the workflow consists of a sequence of host and device functions. In analogy to a biological gene, the state of the protein-ligand complex is represented by a sequence of variables. In the case of rigid docking (where the protein framework is treated as a rigid body), the variables represent positions, orientations and con-

times improved which has to be attributed to the computationally expensive calculation of gradients and the difficulties associated with the parallelization of this local minimization step. In general, the TITAN V cards showed a 10 times higher speed-up than the M2000 versions. The performance analysis on multi-core CPUs showed a similar trend: for the Solis-Wets search the speed-up was in the range of 5 to 33 times (with 8-36 cores employed), while for the ADADELTA local search the speed-up was 2-20 times.

The focus of this review was mostly on the open-source parallel VS software packages, which are summarized in Table 3.

In prior sections we have focused exclusively on reviewing methods of virtual screening and molecular docking that target modern central processing unit (CPU) and graphics processing unit (GPU) solutions (refer to Figure 3). At the same time, we know that Moore's law (transistor scaling) is ending, which could motivate (or even necessitate) the search for alternative computing platforms that can continue the performance trend that molecular docking has come to rely upon. Among the many (so-called) post-Moore technologies [67], reconfigurable architectures are perhaps the most notable, partially because they are readily available today. A reconfigurable architecture, such as a Field-Programmable Gate Array (FPGA) or a Coarse-Grained Reconfigurable Array (CGRA) [68], is a system which aspires to retain some of the silicon plasticity that is lost when manufacturing an Application-Specific Integrated Circuit (ASIC).
In turn, users can leverage reconfigurable systems to match the hardware closely to the application, which can lead to improved performance and reduced energy costs. For example, the expensive von Neumann bottleneck associated with the decoding of instructions in CPUs can be virtually eliminated. Traditionally, reconfigurable architectures such as FPGAs have been programmed using complex low-level hardware description languages (HDLs) such as VHDL or Verilog. This, in turn, has limited the use of these devices to hardware specialists and kept them out of reach of typical HPC users. However, with the increased maturity of High-Level Synthesis (HLS) [69]

Aside from disseminating their design process, they also vary several architectural properties of their accelerator. For example, they consider both floating-point and fixed-point representations for various phases of the computation, which demonstrates an advantage that FPGAs can provide over more general-purpose approaches. The accelerator runs at a fairly high frequency (between 172 MHz and 215 MHz) on an Intel Arria 10, and consumes a varying amount of resources (subject to their design-space exploration). They compare their accelerator against the single-threaded Autodock software on five protein targets, and show that they reach a speed-up of between 1.73x and 2.77x.

Today, there is a remarkably small number of published works that leverage FPGAs for the Autodock software (for surveys on using FPGAs for other molecular algorithms, see [75, 76]). What is even more surprising is that (to the authors' knowledge) CGRAs have been largely unexplored in this domain. With both FPGAs and CGRAs emerging as performant (and, more importantly, greener) alternatives to traditional CPUs and GPUs, we believe that these systems will come to play a much larger role in molecular docking and virtual screening in the future than they have so far.
The parallel implementations of virtual screening algorithms on massively parallel computers with multiple CPUs and/or GPUs have a high potential to speed up the exploration of gigantic chemical spaces (having compounds in the range 10$^9$ to 10$^{12}$) in real time. With a serial version of the virtual screening software, such tasks may take many years of CPU time. The current record for gigantic docking is the screening of a billion compounds from the ZINC15 and Enamine databases using Autodock-GPU on the Summit supercomputer in less than a day. Parallel implementations and reliable scoring functions will increase the success rates of lead compound identification in drug discovery. This makes drug discovery less time consuming and economically sustainable. Further, as the chemical spaces are huge, drugs with entirely different scaffold geometries can be identified. The speed-up of virtual screening software is found to depend on a number of factors: the energy minimization algorithm, the scoring function, the biomolecular target and the computer architecture. More elaborate studies will allow us to come up with highly optimized virtual screening software in the future. The implementation of VS on FPGAs is still in its infancy, and dedicated research is needed to adopt such architectures for drug discovery projects.

What has virtual screening ever done for drug discovery? Expert opinion on drug discovery
The cost of drug development: a systematic review
The cost of new drug discovery and development
Drug development costs about $1.7 billion. Chemical & Engineering News
Indicators for monitoring sustainable development goals: An application to oceanic development in the European Union
Innovation and marketing in the pharmaceutical industry
Exploring different approaches to improve the success of drug discovery and development projects: a review
Biopharmaceutical research & development: The process behind new medicines
Molecular docking and high-throughput screening for novel inhibitors of protein tyrosine phosphatase-1B
Machine-learning scoring functions for structure-based drug lead optimization
Using shape complementarity as an initial screen in designing ligands for a receptor binding site of known three-dimensional structure
Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function
AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading
GNINA 1.0: molecular docking with deep learning
MMPBSA.py: an efficient program for end-state free energy calculations
Integrative approaches in HIV-1 nonnucleoside reverse transcriptase inhibitor design
Converging a knowledge-based scoring function: DrugScore2018
LiGen: a high performance workflow for chemistry driven de novo design
A comparative study of the simulated-annealing and Monte Carlo-with-minimization approaches to the minimum-energy structures of polypeptides: [Met]-enkephalin
Particle swarm optimization
Comparative study of several algorithms for flexible ligand docking
Virtual chemical libraries: miniperspective
Accelerating high-throughput virtual screening through molecular pool-based active learning
Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17
ZINC 15 - ligand discovery for everyone
Ultra-large library docking for discovering new chemotypes
Enamine real database: Making chemical diversity real. Chemistry today
PubChem substance and compound databases
ChemSpider - building a foundation for the semantic web by hosting a crowd sourced databasing platform for chemistry
ChEMBL: a large-scale bioactivity database for drug discovery
Accomplishments and challenges in integrating software for computer-aided ligand design in drug discovery
3D database searching in drug design
High-throughput virtual laboratory for drug discovery using massive datasets. The International Journal of High Performance Computing Applications
An open-source drug discovery platform enables ultra-large virtual screens
GPU-accelerated drug discovery with docking on the summit supercomputer: porting, optimization, and application to COVID-19 research
OpenEye Scientific, GigaDocking™ - Structure Based Virtual Screening of Over 1 Billion Molecules Webinar
Supercomputer-based ensemble docking drug discovery pipeline with application to COVID-19
EXSCALATE: An extreme-scale in-silico virtual screening platform to evaluate 1 trillion compounds in 60 hours on 81 PFLOPS supercomputers
Parallelization of molecular docking: a review
Optimization methods for virtual screening on novel computational architectures. Current computer-aided drug design
Schulten, K. Accelerating molecular modeling applications with graphics processors
An implementation of the smooth particle mesh Ewald method on GPU hardware
Effective parallelization of non-bonded interactions kernel for virtual screening on gpus
Development and validation of a modular, extensible docking program: DOCK 5
DOCK 6: Impact of new features and current docking performance
Pairwise GB/SA scoring function for structure-based drug design
Parametrized models of aqueous free energies of solvation based on pairwise descreening of solute atomic charges from a dielectric medium
A smooth permittivity function for Poisson-Boltzmann solvation methods
Docking validation resources: protein family and ligand flexibility experiments
Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise
Multilevel parallelization of AutoDock 4.2
Message passing interface and multithreading hybrid for parallel molecular docking of large databases on petascale high performance computing machines
Facilitating multiple receptor high-throughput virtual docking on high-performance computers
Use of experimental design to optimize docking performance: The case of ligendock, the docking module of ligen
Tunable approximations to control time-to-solution in an HPC molecular docking Mini-App
Accelerating a geometric approach to molecular docking with OpenACC
Understanding the I/O Impact on the Performance of High-Throughput Molecular Docking. International Parallel Data System Workshop
GeauxDock: accelerating structure-based virtual screening with heterogeneous computing
POAP: A GNU parallel based multithreaded pipeline of open babel and AutoDock suite for boosted high throughput virtual screening
Accelerating AutoDock4 with GPUs and gradient-based local search
Diverse, high-quality test set for the validation of protein-ligand docking performance
Comparative assessment of scoring functions on an updated benchmark: 2. Evaluation methods and general results
Structure-based virtual screening: an overview. Drug discovery today
Efficient flexible backbone protein-protein docking for challenging targets
Surflex: fully automatic flexible molecular docking using a molecular similarity-based search engine
Architectures for the post-Moore era
A survey on coarse-grained reconfigurable architectures from a performance perspective
A survey and evaluation of FPGA high-level synthesis tools
From OpenCL to high-performance hardware on FPGAs
Accelerating parallel computations with openmp-driven system-on-chip generation for fpgas
FPGA-based acceleration of the AutoDock molecular docking software. 6th Conference on Ph
A case study in using opencl on fpgas: Creating an open-source accelerator of the autodock molecular docking software
Accelerating Molecular Docking by Parallelized Heterogeneous Computing - A Case Study of Performance, Quality of Results, and Energy-Efficiency using CPUs, GPUs, and FPGAs
Hardware accelerators in computational biology: application, potential, and challenges
Hardware accelerated molecular docking: A survey