key: cord-265581-pbv8mjfc authors: Tong, Yaojun; Deng, Zixin title: An aurora of natural products-based drug discovery is coming date: 2020-06-06 journal: Synth Syst Biotechnol DOI: 10.1016/j.synbio.2020.05.003 sha: doc_id: 265581 cord_uid: pbv8mjfc
Natural products (NPs), nature's reservoir of structural and functional diversity far beyond the current reach of chemical synthesis, are proving to be among nature's most valuable gifts to humankind. For millennia, many of them have been used successfully as medicines and have served as the most important sources of drug leads, food additives, and many industrially relevant products. Most notably, more than half of the antibiotics and anti-cancer drugs currently in use are natural products or are derived from them. However, the speed and output of NP-based drug discovery have slowed dramatically since the “low-hanging fruit” was harvested during the golden age of the 1950s-1960s. With recent scientific advances combining metabolic sciences and technology, multi-omics, big data, combinatorial biosynthesis, synthetic biology, genome editing technology (such as CRISPR), artificial intelligence (AI), and 3D printing, the “high-hanging fruit” is becoming increasingly accessible at reduced cost. We are increasingly confident that a new age of natural products discovery is dawning.
The number of unique permutations in chemical space is believed to be much larger than the number of stars in the known universe. Natural products, especially the secondary metabolites of plants and microbes (bacteria and fungi), occupy a huge and uniquely biologically relevant chemical space that no current chemical synthesis can cover. This makes natural products very important resources for pharmaceuticals. Natural products-based medicines can be traced back thousands of years to ancient Mesopotamia's sophisticated medicinal system [1].
In China, for example [2], the first record is 五十二病方 (Wushi'er Bingfang), containing 52 prescriptions, which dates from about 1100 B.C. It was followed by 神农本草经 (Shennong Herbal), from about 100 B.C., containing 365 drugs; 唐本草 (新修本草, 英公本草, Tang Herbal), from 659 A.D., containing 850 drugs; and 本草纲目 (Compendium of Materia Medica), from 1596 A.D., containing 1,892 drugs. Together with many other documented or undocumented Chinese materia medica, they gave rise to Traditional Chinese Medicine (TCM). However, owing to technological limitations, it was unknown for a very long time which natural product component(s) of the herbs are responsible for their medicinal activities. The discovery of penicillin from the filamentous fungus Penicillium notatum in 1929 by Sir Alexander Fleming [3] not only opened the molecular age of natural products, but also dramatically shifted the focus of natural products discovery from plants to microbes. Fleming shared the 1945 Nobel Prize in Physiology or Medicine with Ernst B. Chain and Sir Howard Florey for that discovery. As a result of this change in focus, a number of other discoveries were made in rapid succession: in 1943, Selman A. Waksman discovered streptomycin from Streptomyces griseus [4], for which he was awarded the 1952 Nobel Prize in Physiology or Medicine; in 1945, Benjamin M. Duggar discovered chlortetracycline from Streptomyces aureofaciens [5]; in 1947, Paul R. Burkholder discovered chloramphenicol from Streptomyces venezuelae [6]; in 1953, Edmund C. Kornfeld discovered vancomycin from Amycolatopsis orientalis [7]; and in 1976, Satoshi Omura discovered avermectins from Streptomyces avermitilis [8] (he shared the 2015 Nobel Prize in Physiology or Medicine with William C. Campbell and Youyou Tu for the discoveries of avermectins and artemisinin, respectively); etc. These remarkable milestones mark what is known as the Golden Age of natural products-based drug discovery, the 1950s to 1960s.
To date, research based on the discovery of natural products has been awarded the Nobel Prize in Physiology or Medicine a total of three times in the prize's 120-year history. The first two awards were 7 years apart, while the last two were 63 years apart. This decline in research outcomes is partially due to the fact that the easily accessible discoveries were made by the classic "top-down" strategy (Fig. 1) ... 30 years now with no new approvals of natural product-based antibiotics [9]. With the emergence of combinatorial synthesis, a large number of synthetic libraries have been created. This led many pharmaceutical companies to step back from, and even exit, the natural products arena, which resulted in a sharp reduction in the output of both new drug leads and approved drugs from the drug development pipeline [10]. However, with the advances in whole genome sequencing and genome mining, we see that a huge yet unexploited potential of secondary metabolites is hidden in the genomes of microorganisms, both culturable and unculturable [11]. Typically, there are 20-50 biosynthetic gene clusters (BGCs) encoding secondary metabolites per actinomycete strain, less than 10% of which have been chemically identified under laboratory conditions. Given that new bioactivities are inherently linked with novel chemical structures, a large number of new bioactive secondary metabolites are likely still waiting to be discovered. The award of the 2015 Nobel Prize in Physiology or Medicine for research in natural products both validated and reinvigorated the whole natural product community [12]. The incredible rate of development in genome sequencing, modern metabolic engineering, synthetic biology, advanced genome editing, big data, artificial intelligence (AI), and 3D printing, together with the growing microbial strain collections, enables us to access previously inaccessible natural products.
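The scale of this untapped potential can be illustrated with simple arithmetic. The following is a minimal sketch in Python, assuming only the figures quoted above (20-50 BGCs per actinomycete strain, less than 10% chemically identified); the function name `uncharacterized_bgcs` is hypothetical and introduced purely for illustration.

```python
def uncharacterized_bgcs(bgcs_per_strain: int, identified_fraction: float) -> float:
    """Return the number of BGCs per strain whose products remain unidentified."""
    if not 0.0 <= identified_fraction <= 1.0:
        raise ValueError("identified_fraction must be between 0 and 1")
    return bgcs_per_strain * (1.0 - identified_fraction)


# Figures quoted in the text: 20-50 BGCs per strain, less than 10% identified.
# Using exactly 10% therefore gives a lower bound on what remains unexplored.
low = uncharacterized_bgcs(20, 0.10)
high = uncharacterized_bgcs(50, 0.10)
print(f"At least {low:.0f} to {high:.0f} uncharacterized BGCs per strain")
```

Under these assumptions, each strain would carry at least 18 to 45 gene clusters whose products have not been chemically identified, which is the gap that genome mining aims to close.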
Taking these advances all together, there is no doubt that a new age of natural products discovery is coming.
As the Chinese proverb goes, "without rice, even the cleverest housewife cannot cook". Natural products discovery likewise requires good resources. The resources discussed here are mainly microbial strain collections and high-quality genome/metagenome sequences. Microbes, as the producers of natural products, are the irreplaceable starting point of the whole journey of microbial natural products discovery. Initiated by Professor V. B. D. Skerman, the World Federation for Culture Collections (WFCC)-World Data Centre for Microorganisms (WDCM) has grown into WFCC-MIRCEN (Microbial Resources Centres)-WDCM, the world's leading data center for microbial resources (http://www.wdcm.org). It has been hosted and maintained by the Institute of Microbiology, Chinese Academy of Sciences since 2010. To date, more than 3,233,000 microbial strains from 791 culture collections in 78 countries have been registered in WFCC-MIRCEN-WDCM. Beyond these registered resources, there are also some well-established microbial strain collections in both non-profit research institutes and companies (Table 1). Given that the majority of microorganisms are unculturable with current technologies, genomic and metagenomic sequence information is becoming more and more important, as we are now able to obtain encoded compounds directly from genetic information. A detailed introduction to microbial genome resources can be found in the review [13]. Two well-known databases are GenBank (https://www.ncbi.nlm.nih.gov/genbank/) and the Joint Genome Institute (JGI) Genome Portal (http://genome.jgi.doe.gov/portal/). To date, they have recorded > 21,000 and > 16,000 complete bacterial genomes, respectively. It is worth noting that a large part of these complete genomes are resequencing results of the same species.
Only around 260 complete genomes for streptomycetes are available in GenBank. Of course, many genome sequencing projects are ongoing, for example the 10K microbial sequencing project coordinated by Dr. Lixin Zhang. Many microbial genome sequences have also not yet been deposited in public databases, for example the genomic database of over 135,000 strains that was originally assembled by Warp Drive Bio and now belongs to Ginkgo Bioworks. With the rice ready, we now need tools to cook the meal. Tools related to natural products discovery are discussed in the following sections.
After the discovery of the double helix of DNA and the Human Genome Project (HGP), synthetic biology was crowned "the third biotechnology revolution". It is a product of multidisciplinary integration and one of the most active areas in biological and biotechnological development. This highly interdisciplinary area applies engineering principles to biology, drawing on biology, evolution, chemistry, physics, mathematics, engineering, and informatics. Applications of synthetic biology have led to significant achievements in many areas, such as bioenergy, biomaterials, biomedicine, and bulk chemicals. It greatly enables us to better understand and even engineer life. Given the complexity of the biosynthetic pathways of natural products, the "top-down" strategy is obsolete, and the more acceptable way to study natural products is the "bottom-up" approach (Fig. 2). It starts with genome mining (the analysis of high-quality whole genome information), which requires bioinformatics, big data, and even AI; proceeds to pathway cloning (refactoring), expression, and fermentation, which need design-build-test-learn (DBTL) cycle-based metabolic engineering; then to identification of the target natural product, which requires modern chemical analysis; and finally to compound modification and clinical studies, which need biochemistry and cell biology. This procedure closely matches the principles of synthetic biology.
Applying synthetic biology to natural products-based drug discovery will surely bring about a renaissance of natural products discovery [14].
Natural products are a subset of specialized metabolites produced by a given cell factory (the living organism). The cell itself can be considered a delicate factory in which each metabolic pathway works as a pipeline and all pathways together form a complicated network [15]. The expression and high yield of a given natural product are tightly linked to the whole metabolic network (primary and secondary metabolism) [16]. We have to know which pathways/enzymes have crosstalk; when, where, and how many enzymes are needed; how to balance cell growth and production; and how to direct the metabolic flux to the target pathway to approach the theoretical yield. A classic example is the production of penicillin, whose yield was increased more than 10,000-fold by simple strain (cell factory) improvement. The continuous development of CRISPR-based genome editing techniques, such as CRISPR-Cas9 [17], CRISPR base editors [18, 19], and CRISPR prime editing [20], enables much faster and more detailed strain improvement. Given that most microbial gene clusters encoding secondary metabolites are so-called "cryptic/silent" gene clusters, named for their lack of expression or only trace expression, heterologous expression becomes a powerful approach. With the growth of the metabolism knowledge base, and with the advances in genome editing and metabolic modeling, we are now able to design and construct better microbial cell factories for the activation and/or high production of natural products. For example, the anti-malaria drug artemisinin was originally obtained from the plant Artemisia annua. Because of low yield and seasonal and regional limitations, this supply could not meet the market demand.
However, artemisinin can easily be chemically synthesized from the precursor artemisinic acid. Researchers therefore reconstituted the biosynthetic pathway of artemisinic acid in a yeast cell factory, where the yield can reach as high as 25 g/L in fermentation [21]. Similarly, the dependence of opioid production on field-grown poppies has been broken: researchers recently produced hydrocodone in a yeast cell factory by reconstituting a complete hydrocodone biosynthetic pathway involving 23 enzymatic steps [22]. Besides yeast cell factories, such as Saccharomyces cerevisiae and Yarrowia lipolytica, some other microbes have great potential as cell factories for natural product production, such as: E. coli, which has the best-established knowledge base and toolkit; Pseudomonas putida, which in general has high tolerance to many chemicals; and actinomycetes (S. albus, S. coelicolor, S. avermitilis, S. ambofaciens, Saccharopolyspora erythraea, etc.), which could be more suitable for expressing biosynthetic gene clusters with high GC content. The design and construction of actinomycete cell factories have been heavily limited by cumbersome traditional genetic manipulation approaches; the recently established CRISPR-based genome editing methods for actinomycetes [23, 24, 25] make it possible to build good actinomycete cell factories efficiently. Moreover, the emergence of CRISPR-based biosensors [26] enables faster and more sensitive detection of target compounds produced by microbial cell factories.
During the past 50 years, we have witnessed the incredible development of DNA sequencing technologies. The first whole genome to be sequenced was that of the bacteriophage ϕX174 in 1977 [27], with only 5,386 bp. To sequence a human genome (3,234.83 Mb), the HGP took 13 years (1990-2003), involved 20 universities in 6 countries, and cost ~3 billion US dollars.
By comparison, the Broad Institute announced that it had sequenced 100,000 whole human genomes by 2018. With the fast evolution of sequencing technologies, from shotgun sequencing, to pyrosequencing (454), to Illumina sequencing, and now to long-read PacBio and Nanopore sequencing, DNA sequencing has become much cheaper, easier, and orders of magnitude faster. We are now entering the age of the 1K genome (whole genome sequencing of a human costs only 1,000 US dollars). The increased ability to read (sequence) DNA has made DNA sequence information indispensable for biological research. The exponentially accumulating whole genome sequence information is also changing how natural products are discovered: instead of the traditional "top-down" approach, we are shifting to the genome mining-based "bottom-up" strategy. Genome mining confirms that there is still a huge potential for novel natural products in microbial genomes [28].
Compared to the ability to read DNA, the ability to write DNA still has a long way to go. Current nucleic acid synthesis relies heavily on chemical synthesis, with relatively high cost and long processing times. Can we develop a bio-based (mimicking living organisms), high-fidelity, cheap, and fast nucleic acid synthesis platform to reach the stage of "made-to-order" DNA for desired purposes? One bottleneck of the aforementioned "bottom-up" strategy is the cloning of long biosynthetic gene clusters with high GC content. One direct solution, of course, is the complete synthesis of the whole biosynthetic gene cluster, and even the genome, at reasonable cost and processing time [29].
We are accumulating enormous amounts of data in every aspect. For example, microbial whole genome sequences are becoming big data. For natural products discovery, the information we are looking for in the big data of genomes is clear: the right biosynthetic gene clusters with all necessary factors.
However, these pieces of information are still like a needle in a haystack. Processing big data in the right way to find that needle is therefore critical. More and more software and algorithms have been developed for natural products discovery. A good summary can be found in the Secondary Metabolite Bioinformatics Portal (https://www.secondarymetabolites.org) [30], which currently lists 25 such tools. The advances in AI and automation will also facilitate natural products discovery.
3D printing is a process of building a three-dimensional object from a computer-aided design. It has been successfully applied in manufacturing, medicine, industry, and many other areas, and the list of potential applications is growing. To date, 3D printing has not yet reached the molecular, let alone the atomic, level. Natural products-based drugs (or small molecules) that could be directly 3D printed on demand would be highly advantageous. Some pioneering work has already been done recently [31]. If the day of 3D printing at the molecular level comes, and surely it will, it could help us solve many problems, such as the shortage of medical supplies during sudden disease outbreaks such as COVID-19, caused by the coronavirus SARS-CoV-2.
We are living in the best of times: much of what was considered impossible is becoming possible. Previously inaccessible resources are becoming accessible. It is an age of pioneering and innovation. Worldwide scientific collaborations are extraordinarily frequent and easy. It is becoming more and more clear that multidisciplinary integration, as exemplified by synthetic biology, is the trend of scientific development. We are witnesses of history, but we also need to be prepared to be builders of history. With the world population growing, areas such as health, resources, and the environment are drawing more and more attention. Natural products would be one of the keys to a bio-sustainable world.
We rely on nature, learn from nature, and eventually we may outperform nature. Imagine a picture in which, in the near future, big data assists us in designing molecules for given diseases, AI helps us to construct the optimal biosynthetic pathways, which are then synthesized by modern DNA synthesis, and synthetic biology helps us to express the pathways and obtain the desired compounds. An even wilder thought would be to obtain the desired compounds directly by 3D printing.
There are no conflicts to declare.
[1] The beginnings of drug therapy: ancient Mesopotamian medicine
[2] The pharmacology of Chinese herbs
[3] The discovery of penicillin
[4] Isolation of streptomycin-producing strains of Streptomyces griseus
[5] Aureomycin; a product of the continuing search for new antibiotics
[6] Chloromycetin, a new antibiotic from a soil actinomycete
[7] Some laboratory and clinical experiences with a new antibiotic, vancomycin
[8] Avermectins, new family of potent anthelmintic agents: producing organism and fermentation
[9] Natural products as sources of new drugs from 1981 to 2014
[10] Drug discovery and natural products: end of an era or an endless frontier?
[11] Opportunities for natural products in 21st century antibiotic discovery
[12] A new golden age of natural products drug discovery
[13] Web resources for microbial data
[14] Genome engineering and modification toward synthetic biology for the production of antibiotics
[15] Engineering and modification of microbial chassis for systems and synthetic biology
[16] Strategies for terpenoid overproduction and new terpenoid discovery
[17] Multiplex genome engineering using CRISPR/Cas systems
[18] Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems
[19] Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage
[20] Search-and-replace genome editing without double-strand breaks or donor DNA
[21] High-level semi-synthetic production of the potent antimalarial artemisinin
[22] Complete biosynthesis of opioids in yeast
[23] Highly efficient DSB-free base editing for streptomycetes with CRISPR-BEST
[24] CRISPR/Cas-based genome engineering in natural product discovery
[25] CRISPR-Cas9 based engineering of actinomycetal genomes
[26] A CRISPR-Cas12a-derived biosensing platform for the highly sensitive detection of diverse small molecules
[27] Nucleotide sequence of bacteriophage phi X174 DNA
[28] Genome mining approaches to bacterial natural product discovery
[29] Synthetic genomics: from DNA synthesis to genome design
[30] The secondary metabolite bioinformatics portal: computational tools to facilitate synthetic biology of secondary metabolite production
[31] Digitization of multistep organic synthesis in reactionware for on-demand pharmaceuticals
The authors thank Dr. Wen-Jun Li and Dr. Lixin Zhang for agreeing to disclose the number of their microbial strain collections and providing unpublished information. The authors thank Simon Shaw for proofreading the manuscript. Y.T. acknowledges funding from the Novo Nordisk Foundation (NNF10CC1016517; NNF15OC0016226; and NNF16OC0021746). Z.D. acknowledges funding from the National Natural Science Foundation of China (21661140002).