key: cord-0499951-8l7hrany authors: Wang, Rui; Hozumi, Yuta; Yin, Changchuan; Wei, Guo-Wei title: Mutations on COVID-19 diagnostic targets date: 2020-05-05 journal: nan DOI: nan sha: acaf2f7b3ae9cc5687b869e3c9e01fb5d541bf77 doc_id: 499951 cord_uid: 8l7hrany

Effective, sensitive, and reliable diagnostic reagents are of paramount importance for combating the ongoing coronavirus disease 2019 (COVID-19) pandemic at a time when there is neither a preventive vaccine nor a specific drug available for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It would be an absolute tragedy if currently used diagnostic reagents were undermined in any manner. Based on the genotyping of 7818 SARS-CoV-2 genome samples collected up to May 1, 2020, we reveal that essentially all of the current COVID-19 diagnostic targets have had mutations. We further show that SARS-CoV-2 has the most devastating mutations on the targets of various nucleocapsid (N) gene primers and probes, which have unfortunately been used by countries around the world to diagnose COVID-19. Our findings explain what has seriously gone wrong with a specific diagnostic reagent made in China. To understand whether SARS-CoV-2 genes have mutated unevenly, we have computed the mutation ratio and mutation $h$-index of all SARS-CoV-2 genes, which indicate that the N gene is the most non-conservative gene in the SARS-CoV-2 genome. Our findings enable researchers to target the most conservative SARS-CoV-2 genes and proteins for the design and development of COVID-19 diagnostic reagents, preventive vaccines, and therapeutic medicines.

cleotide primers from regions of the virus nucleocapsid (N) gene, i.e., N1 and N2, as probes for the specific detection of SARS-CoV-2. The panel has also selected an additional primer/probe set, the human RNase P gene (RP), as control samples.
Many other diagnostic primers and probes based on the RNA-dependent RNA polymerase (RdRP, also named IP2, IP4, ORF1ab, or ORF1b), envelope (E), and N genes have been designed [3] and/or designated by the World Health Organization (WHO), as shown in Table S1 of the Supporting Material, which provides the details of 41 commonly used diagnostic primers and probes [10]. Diagnostic test reagents were designed based on early clinical specimens containing a full spectrum of SARS-CoV-2, particularly the reference genome collected on January 5, 2020, in Wuhan (SARS-CoV-2, NC_045512) [12]. It has been reported that different primers and probes show nonuniform performance [1, 2, 4, 6, 7, 11].

Our findings are based on the genotyping of 7818 SARS-CoV-2 genome samples collected up to May 1, 2020, which have 5117 single mutations over about 29.8 kilobases (kb). These mutations occur on all SARS-CoV-2 genes and proteins, indicating alarming impacts on the current efforts in the development of COVID-19 diagnostic tests, preventive vaccines, and therapeutic medicines. We employ the K-means method to cluster these mutations, resulting in at least five distinct subtypes of SARS-CoV-2 genomes globally, from early Cluster I to late Cluster V. Table 1 shows the cluster distributions of samples ($N_{\mathrm{NS}}$) and total mutation counts ($N_{\mathrm{TF}}$) for 11 countries. Table 2 provides all mutations on various primers and probes and their frequencies of occurrence in various clusters. More detailed mutation information is given in Tables S2-S42 of the Supporting Material. It is interesting to note that N-China-F [10] is the most inefficient reagent among all primers/probes and its SARS-CoV-2 target has eight mutations involving samples in all five clusters, which may explain many media reports about the inefficiency of certain COVID-19 diagnostic kits made in China. Note that primers and probes are typically short, around 20 nucleotides in length.
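Counting the mutations that fall on a primer or probe target, as tabulated in Table 2, amounts to checking which SNP positions lie inside the target's genomic interval. The following is a minimal sketch of that check, not the authors' pipeline. The class `PrimerTarget`, the function `mutations_on_target`, and all coordinates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerTarget:
    """Genomic interval bound by a primer or probe (hypothetical container)."""
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def mutations_on_target(target, snp_positions):
    """Return the sorted unique SNP positions that fall inside the target."""
    return sorted({p for p in snp_positions if target.start <= p <= target.end})


# Illustrative values only; these are not coordinates from the paper.
target = PrimerTarget("example-primer", 100, 119)  # a 20-nucleotide target
snps = [95, 100, 107, 107, 119, 120]
hits = mutations_on_target(target, snps)
print(hits)  # [100, 107, 119]
```

Because a target spans only about 20 nucleotides, even a single hit can matter for primer binding, which is why the per-target counts are reported individually.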
Currently, all the primers and probes used in the US target the N gene [10]. Unfortunately, Table 2 shows that all of the US CDC designated COVID-19 diagnostic primers have been compromised. The targets of N gene primers and probes used in Japan, Thailand, and China, including Hong Kong, except for that of N-China-P, have undergone multiple mutations involving many clusters as well. It is interesting to note that the targets of four E gene primers and probes have only six mutations. No mutation has been found on the targets of the following RNA-dependent RNA polymerase-based primers and probes: nCoV-IP2-12669Fw, ORF1ab-China-F, ORF1ab-China-R, and ORF1ab-China-P. However, the target of nCoV-IP2-12759R, recommended by Institut Pasteur, Paris, has 7 mutations. Overall, targets of the envelope and RNA-dependent RNA polymerase based primers and probes have fewer mutations than those of the N gene. This observation leads us to wonder whether the N gene is particularly prone to mutations.

To understand whether SARS-CoV-2 genes differ in their mutation patterns, we analyze the gene-specific statistics of SARS-CoV-2 single mutations. Table 3 lists the mutation ratio, i.e., the number of unique single-nucleotide polymorphisms (SNPs) divided by the corresponding gene length, for all SARS-CoV-2 genes. A smaller mutation ratio for a given gene indicates a higher degree of conservativeness. Clearly, the ORF7b gene has the smallest mutation ratio of 0.0775. The N gene has the second largest mutation ratio of 0.2705, which is very close to the largest ratio of 0.2800 for the ORF3a gene. To take mutation frequency into consideration, we introduce the mutation h-index, defined as the maximum value of h such that the given gene section has h single mutations that have each occurred at least h times. Normally, larger genes tend to have a higher h-index.
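The two statistics defined above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names `mutation_ratio` and `mutation_h_index` are hypothetical, and the input numbers are invented rather than taken from Table 3.

```python
def mutation_ratio(unique_snp_positions, gene_length):
    """Number of unique SNP positions divided by the gene length."""
    return len(set(unique_snp_positions)) / gene_length


def mutation_h_index(occurrence_counts):
    """Largest h such that h mutations have each occurred at least h times."""
    counts = sorted(occurrence_counts, reverse=True)
    h = 0
    for rank, count in enumerate(counts, start=1):
        if count >= rank:
            h = rank  # the top `rank` mutations all occur at least `rank` times
        else:
            break
    return h


# Illustrative inputs only (not data from the paper).
ratio = mutation_ratio([3, 7, 7, 12, 15, 18], gene_length=20)
h = mutation_h_index([10, 8, 5, 4, 3, 1])
print(ratio)  # 0.25
print(h)      # 4
```

In the second example, four mutations occur at least four times each, but there are not five mutations that occur at least five times, so h = 4. The ratio counts each mutated position once, whereas the h-index also rewards mutations that recur across many samples.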
Table 3 shows that, with a moderate length, the N gene has the largest h-index of 29, which is significantly higher than the second largest h-index of 18 for NSP3. Therefore, it was truly unfortunate for the world to have selected SARS-CoV-2 N gene primers and probes as diagnostic reagents for combating COVID-19.

In summary, the targets of currently used COVID-19 diagnostic reagents have had numerous mutations that have seriously undermined our ability to combat COVID-19. In the Supporting Material, we provide a full list of all 5117 SNP variants, including their positions and mutation types. This information, together with the ranking of the degree of conservativeness of SARS-CoV-2 genes or proteins given in Table 3, enables researchers to avoid non-conservative genes (or their proteins) and mutated nucleotide segments in designing COVID-19 diagnostics, vaccines, and drugs.

Methods and materials

SARS-CoV-2 genome sequences from infected individuals dated between January 5, 2020, and May 1, 2020, are downloaded from the GISAID database [8] (https://www.gisaid.org/). We only consider the records in GISAID with complete genomes and submission dates. The resulting 7818 complete genome sequences are rearranged according to the reference SARS-CoV-2 genome [12] by using the Clustal Omega multiple sequence alignment with default parameters [9]. Gene variants are recorded as single-nucleotide polymorphisms (SNPs). The Jaccard distance [5] is employed to measure the dissimilarity among genome samples. The resulting distance matrix is used in the k-means clustering of all samples.

The nucleotide sequences of the SARS-CoV-2 genomes used in this analysis are available, upon free registration, from the GISAID database (https://www.gisaid.org/). The Supporting Material presents a list of 5117 SNP variants of 7818 SARS-CoV-2 samples across the world, a list of 41 commonly used diagnostic primers and probes, and tables of mutation details on 41 diagnostic primers and probes.
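The distance and clustering steps described in the Methods can be sketched as follows. This is a simplified illustration, not the authors' implementation. It assumes each sample is represented by its set of SNP positions, treats each row of the Jaccard distance matrix as a feature vector for k-means, and uses a deterministic farthest-first initialization for reproducibility. All function names and the toy data are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |A and B| / |A or B| for two SNP sets; 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(samples):
    """Pairwise Jaccard distances between all samples."""
    return [[jaccard_distance(s, t) for t in samples] for s in samples]


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at points[0], then
    repeatedly add the point farthest from all chosen centers."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy SNP sets (illustrative only): two groups with no shared mutations.
samples = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2}, {10, 11}, {10, 11, 12}, {10, 12}]
labels = kmeans(distance_matrix(samples), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

For thousands of genomes, a vectorized library implementation would be the practical choice; the pure-Python version here only makes the two steps explicit.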
The acknowledgments of the SARS-CoV-2 genomes are also given in the Supporting Material.

[1] Comparative performance of SARS-CoV-2 detection assays using seven different primer/probe sets and one assay kit. medRxiv.
[2] Improved molecular diagnosis of COVID-19 by the novel, highly sensitive and specific COVID-19-RdRp/Hel real-time reverse transcription-PCR assay validated in vitro and with clinical specimens.
[3] Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR.
[4] Comparative analysis of primer-probe sets for the laboratory confirmation of SARS-CoV-2.
[5] Distance between sets.
[6] Comparative performance of SARS-CoV-2 detection assays using seven different primer/probe sets and one assay kit.
[7] Evaluation of a quantitative RT-PCR assay for the detection of the emerging coronavirus SARS-CoV-2 using a high throughput system.
[8] GISAID: Global initiative on sharing all influenza data - from vision to reality.
[9] Clustal Omega. Current Protocols in Bioinformatics.
[10] Diagnosing COVID-19: The disease and tools for detection.
[11] Analytical sensitivity and efficiency comparisons of SARS-CoV-2 qRT-PCR assays. medRxiv.
[12] A new coronavirus associated with human respiratory disease in China.

The coronavirus disease 2019 (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), first reported in Wuhan in December 2019, had spread to 187 countries and territories, with more than 3.481 million infection cases and 244,633 fatalities worldwide, by May 1, 2020. Additionally, travel restrictions, quarantines, and social distancing measures have essentially put the global economy on hold. Unfortunately, there is neither a specific medication nor a vaccine for COVID-19 at this moment. Therefore, reopening economies depends vitally on effective COVID-19 diagnostic testing, patient isolation, contact tracing, and quarantine.
The importance of diagnostic testing for combating COVID-19 cannot be overemphasized. We reveal that there are many mutations on the COVID-19 diagnostic targets commonly used around the world, including those designated by the United States (US) Centers for Disease Control and Prevention (CDC). These mutations seriously undermine the current global effort in COVID-19 testing, prevention, and control. The CDC has detailed guidelines, approved by the US Food and Drug Administration (FDA), for COVID-19 diagnostic testing, called "CDC 2019-Novel Coronavirus (2019-nCoV) Real-Time RT-PCR Diagnostic Panel" (https://www.fda.gov/media/134922/download). The CDC has designated two oligonu-

This work was supported in part by NIH grant GM126189, NSF Grants DMS-1721024, DMS-1761320, and IIS-1900473, the Michigan Economic Development Corporation, Bristol-Myers Squibb, and Pfizer. The authors thank the IBM TJ Watson Research Center, the COVID-19 High Performance Computing Consortium, and NVIDIA for computational assistance.