key: cord-269283-jm18lj5t authors: Uddin, Md Bashir; Hasan, Mahmudul; Harun-Al-Rashid, Ahmed; Ahsan, Md Irtija; Imran, Md Abdus Shukur; Ahmed, Syed Sayeem Uddin title: Ancestral origin, antigenic resemblance and epidemiological insights of novel coronavirus (SARS-CoV-2): Global burden and Bangladesh perspective date: 2020-07-01 journal: Infect Genet Evol DOI: 10.1016/j.meegid.2020.104440 sha: doc_id: 269283 cord_uid: jm18lj5t
SARS-CoV-2, a new coronavirus strain responsible for COVID-19, emerged in Wuhan City, China, and continues to spread as a worldwide pandemic. Given the severity of the disease, a number of studies are underway, and full genomic sequences have been released in the last few weeks to enable understanding of the evolutionary origin and molecular characteristics of this virus. Bioinformatics analysis, satellite-derived imaging data and epidemiological attributes were employed to investigate the origin, immunogenic resemblance and global threat of the newly pandemic SARS-CoV-2, including a Bangladesh perspective. Based on currently available genomic information, a phylogenetic study was performed focusing on four types of representative viral proteins (spike, membrane, envelope and nucleoprotein) of SARS-CoV-2, HCoV-229E, HCoV-OC43, SARS-CoV, HCoV-NL63, HKU1, MERS-CoV, HKU4, HKU5 and BufCoV-HKU26. The findings clearly demonstrated that SARS-CoV-2 exhibited an evolutionary convergent relation with the previously reported SARS-CoV. SARS-CoV-2 proteins were also found to be highly similar and identical to SARS-CoV proteins, whereas proteins from other coronaviruses showed lower levels of similarity and identity. The cross-checked conservancy analysis of SARS-CoV-2 antigenic epitopes showed significant conservancy with antigenic epitopes derived from SARS-CoV.
The study also prioritized temperature comparison through satellite imaging, alongside compiling and analyzing epidemiological outbreak information on the 2019 novel coronavirus from several open datasets on COVID-19 (SARS-CoV-2), and discussed possible threats to Bangladesh.
COVID-19 has opened a new chapter in human civilization, with many tragic stories. A new strain of the coronavirus family, the 2019 novel coronavirus or SARS-CoV-2, has emerged and infected thousands of humans. It is gaining importance because of the daily increase in deaths caused by the disease [1] [2] [3] . The virus has already been reported from Wuhan (China), Thailand, Japan, South Korea, Iran and the US and is poised to occur in many more areas of the world, causing a pandemic scenario [4] [5] [6] and increasing the potential for rapid horizontal geographic spread globally [7] . Determining the origin, evolution and antigenic resemblance of SARS-CoV-2 is urgently needed to study its molecular pathogenesis, perform surveillance, [8, 9] were employed for the study. Again, because some reports and analyses suggested bats as the probable original host of SARS-CoV-2, we also considered two strains of bat-originated coronavirus (HKU4 and HKU5) in this study. From database and literature searches, only a single buffalo-originated coronavirus strain collected from Bangladesh (BufCoV-HKU26-M) [10] was used for the comparative study with COVID-19 strains isolated from Wuhan, China [11] . The global risk of the 2019 novel coronavirus (COVID-19 [SARS-CoV-2]) has recently been addressed by many scientists [12] [13] [14] [15] . Outside China, COVID-19 transmission has been found in over 210 countries and territories [1, 15] . The US declared emergency funds for countries that are either affected by the coronavirus or at high risk of its spread, including Bangladesh [16] .
As the outbreak of the 2019 novel coronavirus (COVID-19 [SARS-CoV-2]) is expanding rapidly, analysis of epidemiological data on COVID-19 is necessary to explore the measures of burden associated with the disease and, at the same time, to gather information on determinants and interventions. Therefore, we designed this study to compare the genetic materials of SARS-CoV-2 with different previously reported [17] [18] [19] [20] . Accordingly, we also extracted population data of countries and provinces (China) from several websites [21] [22] [23] .
The retrieved protein sequences were subjected to multiple sequence alignment (MSA) with ClustalW [24] and to phylogenetic relationship studies (maximum parsimony, MP) with MEGA X [25] to understand the ancestral origin and antigenic resemblance of SARS-CoV-2 with other coronaviruses. In addition, pairwise sequence alignment of SARS-CoV-2 proteins with those of other viral strains was performed with the EMBOSS Needle online software, which uses the Needleman-Wunsch alignment algorithm to find the optimum alignment (including gaps) of two sequences along their entire length [26] . Sequence alignments were also visualized and analyzed with Jalview software (https://www.jalview.org/).
Targeting potential antigens from viral proteins is crucial for constructing peptide-based vaccine molecules that can interact with B lymphocytes [27] . Peptide flexibility and proper surface accessibility have been reported as prerequisites for a potential B cell epitope. Considering those parameters, the immunogenic peptide sequences from the four types of viral proteins were determined with the B cell epitope prediction tools of the Immune Epitope Database (IEDB) [28], which employ the BepiPred linear epitope prediction method [29] . The VaxiJen v2.0 server (http://www.ddgpharmfac.net/vaxijen/) was used to screen out the most immunogenic peptides among those determined from IEDB [28] .
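The Needleman-Wunsch global alignment mentioned above can be illustrated with a minimal sketch. This is not the EMBOSS Needle implementation: the scoring values (match +1, mismatch -1, linear gap -1) and the function names `needleman_wunsch` and `percent_identity` are assumptions chosen for illustration, whereas EMBOSS Needle uses a substitution matrix with affine gap penalties.

```python
def _pair_score(x, y, match, mismatch):
    """Score for aligning two residues (illustrative scoring, not a BLOSUM matrix)."""
    return match if x == y else mismatch


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch),
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j]
                == score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch)):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical non-gap columns as a percentage of the alignment length."""
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    # Toy sequences, not real coronavirus proteins.
    s, al_a, al_b = needleman_wunsch("GATTACA", "GATACA")
    print(s, al_a, al_b, round(percent_identity(al_a, al_b), 2))
```

Percent identity computed over the full alignment length, as here, is one common convention; tools differ in how they count gap columns, so values from this sketch are not directly comparable to EMBOSS Needle output.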
Epitope conservancy analysis is an important step in determining the degree to which a desired epitope is distributed across its homologous protein set. In this study, the conservancy pattern of the most immunogenic B cell peptide sequences of COVID-19 was compared with other homologous sequences retrieved from the NCBI database by using BLASTp [30] . Moreover, the conservancy of immunogenic peptides predicted from the SARS-CoV-2 proteins was also compared against other human coronavirus strains (HCoV-229E, HCoV-OC43, SARS-CoV, HCoV-NL63, HKU1 and MERS-CoV). The epitope conservancy analysis tool (http://tools.iedb.org/conservancy/) of the IEDB was used for this conservancy analysis [31].
Homology modeling of the spike glycoprotein (P0DTC2), membrane protein (P0DTC5), envelope protein (P0DTC4) and nucleoprotein (P0DTC9) of SARS-CoV-2 was performed by using the I-TASSER server [32] . The 3D structures were generated by multiple threading alignments in the I-TASSER server and were then refined using ModRefiner [33] followed by the FG-MD refinement server to improve the accuracy of the predicted 3D modeled structures [34]. ModRefiner allowed for significant improvements in the physical quality of the local structure based on hydrogen bonds, side-chain positioning and backbone topology of the native-state proteins. FG-MD, a molecular dynamics-based algorithm for structure refinement, works at the atomic level. The refined protein structures were further validated by RAMPAGE [35] and ERRAT analyses [36] . Structures were visualized and analyzed with PyMOL [37] .
We illustrated the number of cases and deaths of SARS-CoV-2 in chronological order through graphs to elucidate the pattern of occurrence of those outcomes. We covered country-wise cases and deaths, the onset of global and Chinese cases by date, the global death toll per day, and province-wise cases and deaths in China.
We calculated the crude mortality rate and case fatality according to the formulas suggested by the CDC [38] and by Jacob and Ganguli [39] . For the countries and Chinese provinces with death records, the crude mortality rate was expressed per 1 crore (10 million) persons for better interpretation.
It is already known that SARS-CoV-2 can multiply even at high temperatures, especially temperatures higher than 15 °C [40, 41] ; however, SARS-CoV-2 is rapidly inactivated at 20 °C [41] . Temperature therefore plays a great role in its multiplication. For this reason, recent environmental temperature data for the place of first occurrence and for Bangladesh were obtained from Landsat-8 satellite data. This satellite provides high spatial resolution (30 m) data at 15-day intervals. Using the brightness temperature of band 10 (TIR-1) and emissivity data derived from bands 4 and 5 (L8 Data Users Handbook), temperature (in °C) can be obtained over a large area (a 30-km-wide swath) at a given time with minor deviation from in situ temperature data (maximum SD of 0.45 °C). Cloudless or less cloudy images (less than 90% cloud cover) were therefore obtained from the USGS webpage (www.earthexplorer.usg.gov). A maximum of 2 data points were available for one area in each month. However, neighboring path and row image borders shared some common areas, which provided more frequent data for those overlapping areas. Level-1 Tier-1 images, which are radiometrically and geometrically corrected, were used in this study. First, all images fulfilling the cloud-related conditions were downloaded. A total of 90 images covering the land areas of Wuhan, China, Korea, Italy and Bangladesh were downloaded. Then, the digital numbers (DN) of the band 10 data were converted to emissivity and simultaneously converted to brightness temperature by using "equation 1" [42] . Then, the emissivity was converted to temperature by using "equation 2" [43] .
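Because "equation 1" and "equation 2" are not reproduced in this text, the following is only a sketch of the standard Landsat 8 conversion chain, not the authors' exact equations. It assumes the Level-1 relations from the Landsat 8 Data Users Handbook (top-of-atmosphere radiance L = M_L * DN + A_L, brightness temperature T = K2 / ln(K1 / L + 1)) and a commonly used single-channel emissivity correction. The band 10 constants below are typical values; in practice they should be read from each scene's metadata file. All function names are hypothetical.

```python
import math

# Typical band 10 values (assumed here); read the real ones from the scene's MTL metadata.
RADIANCE_MULT_BAND_10 = 3.3420e-04
RADIANCE_ADD_BAND_10 = 0.1
K1_CONSTANT_BAND_10 = 774.8853
K2_CONSTANT_BAND_10 = 1321.0789


def dn_to_radiance(dn, mult=RADIANCE_MULT_BAND_10, add=RADIANCE_ADD_BAND_10):
    """Convert a band 10 digital number to top-of-atmosphere spectral radiance."""
    return mult * dn + add


def brightness_temperature_k(radiance, k1=K1_CONSTANT_BAND_10, k2=K2_CONSTANT_BAND_10):
    """Convert spectral radiance to at-sensor brightness temperature in kelvin."""
    return k2 / math.log(k1 / radiance + 1.0)


def surface_temperature_c(bt_kelvin, emissivity, wavelength_um=10.895, rho_um_k=14388.0):
    """Apply a single-channel emissivity correction and return degrees Celsius.

    wavelength_um is the approximate band 10 center wavelength; rho_um_k is
    h*c/k_B expressed in micrometre-kelvin. Both are assumptions of this sketch.
    """
    corrected_k = bt_kelvin / (
        1.0 + (wavelength_um * bt_kelvin / rho_um_k) * math.log(emissivity)
    )
    return corrected_k - 273.15


if __name__ == "__main__":
    dn = 25000          # illustrative pixel value, not taken from the study
    emissivity = 0.97   # illustrative land-surface emissivity
    bt = brightness_temperature_k(dn_to_radiance(dn))
    print(round(bt - 273.15, 2), round(surface_temperature_c(bt, emissivity), 2))
```

With an emissivity of exactly 1 the correction leaves the brightness temperature unchanged; lower emissivity values raise the estimated surface temperature.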
The estimated data were obtained from band 10 of the Landsat 8 Thermal Infrared Sensor (TIRS). This information was obtained automatically from the metadata.
The four phylogenetic trees constructed from four types of representative viral proteins (spike,
In addition, lower levels of similarity and identity were found with other viral strains, including BufCoV-HKU26 of Bangladesh origin (Table 1).
were employed to determine the most antigenic sites by using the B cell epitope prediction tool of IEDB and VaxiJen scoring. A VaxiJen score well above the threshold value (0.40) usually indicates immunogenic potential to stimulate a protective response in host organisms [44] . From the analysis, a total of 17 epitopes from S proteins, 1 epitope from M proteins, 1 epitope from E proteins and 5 epitopes from N proteins were found to be the most immunogenic in SARS-CoV-2, with almost 100% of peptides scoring above the antigenicity threshold of the VaxiJen server (
and were subjected to conservancy analysis with the immunogenic epitopes from SARS-CoV-2 proteins. The antigenic sites were found to be almost fully conserved in all of the homologous protein sequences deposited in the NCBI database (Table 2). Cross-checked conservancy analysis of COVID-19 antigenic epitopes with SARS-CoV proteins showed that conservancy when crosschecked with other coronaviruses, including BufCoV-HKU26 of Bangladesh origin, was not significant (Table 3).
In China, 28 of 34 provinces experienced deaths from COVID-19. The highest death toll occurred in Hubei Province (3,122), followed by Henan (22) and Heilongjiang (13); in the other provinces, the death toll remained below ten up to 19 March 2020 (Supplementary File 4). Upon analysis of mortality data over the time period from 11 January 2020 to 09 March 2020,
therefore, only the eastern part is shown here.
Similarly, during the study period, the temperature in the midregion of Korea appeared very low, which was caused by heavy and widespread cloud cover in that region during satellite image acquisition. However, very little cloud cover was found in the Landsat-8 images acquired in February 2020 for the Italy areas. In almost all areas, temperatures were lower than 20 °C, except for a few places where the temperature did not exceed 25 °C. Therefore, the figures for these regions should be interpreted with caution in order to avoid errors.
The novel coronavirus SARS-CoV-2 became a pandemic because of its global spread [45] . As the genetic architecture of SARS-CoV-2 was highly divergent from that of BufCoV-HKU26 (Figures 1 and 2
Many scientists and pathologists have reported that high temperature and humidity can restrict the spread of COVID-19 and that the spread of the disease will be suppressed as the weather warms [61, 62] . This also supports our hypothesis. Interestingly, coronaviruses that cause colds do tend to subside in warmer months. However, it is highly uncertain whether SARS-CoV-2 will behave the same way. It is still too early for current research to predict how the virus will respond to changing weather [63].
Immunogenicity and epitope conservancy analyses of coronavirus proteins were performed to determine the potential B-cell epitopes that would interact efficiently with B lymphocytes to initiate the immune response against specific viral pathogens [64] . The study identified a total of 24 highly immunogenic B-cell epitopes from SARS-CoV-2 proteins (17 epitopes Table 1 ). The antigenic sites of COVID-19 were also crosschecked with the corresponding proteins of other coronaviruses (Table 3)
respectively. This calculation agrees with the report of Wang et al. [72] , who stated that the global case fatality was close to 3%.
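The case fatality and crude mortality figures discussed in this paper are simple ratios, sketched below under stated assumptions. Case fatality is taken as deaths divided by confirmed cases, expressed as a percentage, and the crude mortality rate as deaths divided by population, scaled to 1 crore (10 million) persons as described in the methods. The function names and the input numbers are hypothetical and are not taken from the study's datasets.

```python
CRORE = 10_000_000  # 1 crore = 10 million persons


def case_fatality_percent(deaths, cases):
    """Deaths among confirmed cases, as a percentage."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


def crude_mortality_rate(deaths, population, per=CRORE):
    """Deaths in the whole population, scaled to `per` persons."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths * per / population


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the arithmetic.
    deaths, cases, population = 30, 1_000, 50_000_000
    print(case_fatality_percent(deaths, cases))
    print(crude_mortality_rate(deaths, population))
```

Case fatality computed during an ongoing outbreak depends on how completely cases are ascertained and on deaths that have not yet occurred, so early values should be read as provisional.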
However, the global case fatalities of SARS (9.60%) and
World Health Organization): Coronavirus disease (COVID-2019) situation reports
Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia
International Journal of Infectious Diseases
The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health -The latest 2019 novel coronavirus outbreak in
Development of Genetic Diagnostic Methods for Novel Coronavirus 2019 (nCoV-2019) in Japan
Transmission of 2019-nCoV Infection from an Asymptomatic Contact in Germany
The global spread of 2019-nCoV: a molecular evolutionary analysis
Passengers' destinations from China: Low risk of Novel Coronavirus (2019-nCoV) transmission into Africa and South America
A highly conserved WDYPKCDRA epitope in the RNA directed RNA polymerase of human coronaviruses can be used as epitope-based universal vaccine design
Occurrence of Foot and Mouth Disease (FMD) during 2014-2016 in cattle of Sirajganj district
First genome sequences of buffalo coronavirus from water buffaloes in
A Novel Coronavirus from Patients with Pneumonia in China
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
Potential for global spread of a novel coronavirus from China
Preliminary assessment of the International Spreading Risk Associated with the 2019 novel Coronavirus (2019-nCoV) outbreak in Wuhan City
Global Health Policy: COVID-19 Coronavirus Tracker
Outbreak of Acute Respiratory Syndrome Associated with a Novel Coronavirus 2020.
and RAMPAGE Ramachandran plot analysis of SODs in Gossypium raimondii and G. arboreum
Protein-protein docking on molecular models of Aspergillus niger RNase and human actin: Novel target for anticancer therapeutics
Pymol: An open-source molecular graphics tool
Measures of Risk, Section 3: Mortality Frequency Measures
Handbook of Clinical Neurology
The effects of temperature and relative humidity on the viability of the SARS coronavirus
Persistence of coronaviruses on inanimate surfaces and its inactivation with biocidal agents
Estimation of Sea Surface Temperature (SST) Using Split Window Methods for Monitoring Industrial Activity in Coastal Area
Retrieval of sea surface temperature over Poteran Island water of Indonesia with Landsat 8 TIRS image: A preliminary algorithm
Immunogenicity Prediction by VaxiJen: A Ten Year Overview
Real-Time Estimation of the Risk of Death from Novel Coronavirus (COVID-19) Infection: Inference Using Exported Cases
Recombination, Reservoirs, and the Modular Spike: Mechanisms of Coronavirus Cross-Species Transmission
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)
Temperature, humidity, and latitude analysis to predict potential spread and seasonality for COVID-19
Transmissibility of COVID-19 and its association with temperature and humidity
In Silico Vaccine Strain Prediction for Human Influenza Viruses
Immunoinformatics approaches for designing a novel multi epitope peptide vaccine against human norovirus (Norwalk virus)
Exploring T & B-cell epitopes and designing multiepitope subunit vaccine targeting integration step of HIV-1 lifecycle using immunoinformatics approach. Microbial Pathogenesis
Significance of RNA Sensors in Activating Immune System in Emerging Viral Diseases. Dynamics of Immune Activation in Viral Diseases
World Health Organization (WHO)
Recombinant Modified Vaccinia Virus Ankara Expressing the Spike Glycoprotein of Severe Acute Respiratory Syndrome Coronavirus Induces Protective Neutralizing Antibodies Primarily Targeting the Receptor Binding Region
Middle East respiratory syndrome coronavirus (MERS-CoV): MERS monthly summary
A novel coronavirus outbreak of global health concern
Human immunopathogenesis of severe acute respiratory syndrome (SARS)
Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Manuscript writing-original draft, review and editing
Abdus Shukur Imran: Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Manuscript writing-original draft
Formal analysis, Methodology, Project administration, Software, Supervision, Validation, Visualization, Manuscript writing-original draft
All authors read and approved the final version of the manuscript. The descriptions are accurate and agreed by all authors
Table 2 : Template proteins considered for 3D homology structure predictions by using I-TASSER.
Table 3 HCoV -OC43
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.