key: cord-0970451-y23t4vke
authors: Huang, Philip C.; Goru, Rohit; Huffman, Anthony; Yu Lin, Asiyah; Cooke, Michael F.; He, Yongqun
title: Cov19VaxKB: A Web-based Integrative COVID-19 Vaccine Knowledge Base
date: 2021-12-28
journal: Vaccine X
DOI: 10.1016/j.jvacx.2021.100139
sha: 177cb48d94192f3d7368e346fcf5261abaa9792a
doc_id: 970451
cord_uid: y23t4vke

The development of SARS-CoV-2 vaccines during the COVID-19 pandemic has prompted the emergence of COVID-19 vaccine data. Timely access to COVID-19 vaccine information is crucial to researchers and the public. To support more comprehensive annotation, integration, and analysis of COVID-19 vaccine information, we have developed Cov19VaxKB, a knowledge-focused COVID-19 vaccine database (http://www.violinet.org/cov19vaxkb/). Cov19VaxKB features comprehensive lists of COVID-19 vaccines, vaccine formulations, clinical trials, publications, news articles, and vaccine adverse event case reports. A web-based query interface enables comparison of product information and host responses among various vaccines. The knowledge base also includes a vaccine design tool for predicting vaccine targets and a statistical analysis tool that identifies enriched adverse events for FDA-authorized COVID-19 vaccines based on VAERS case report data. To support data exchange, Cov19VaxKB is synchronized with Vaccine Ontology and the Vaccine Investigation and Online Information Network (VIOLIN) database. The data integration and analytical features of Cov19VaxKB can facilitate vaccine research and development while also serving as a useful reference for the public.

The emergence of coronavirus disease 2019 (COVID-19), caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), has severely impacted human populations on a global scale. As of November 8, 2021, over 246 million confirmed cases of COVID-19 had been recorded worldwide since the start of the COVID-19 pandemic, resulting in nearly 5 million deaths [1].
Health Organization's COVID-19 Vaccine Tracker and Landscape [2], the London School of Hygiene and Tropical Medicine VaC tracker [3], and the New York Times Coronavirus Vaccine Tracker [4]. However, these resources typically focus on one or a few of these vaccine-related topics for a specific group of users, such as adverse events, clinical trials, or general vaccine or vaccination information. As more relevant data is generated, an organized, accessible knowledge base that integrates COVID-19 vaccine information from various sources is necessary.

Existing data curation, integration, and analysis systems that focus on vaccine information include the Vaccine Investigation and Online Information Network (VIOLIN) database and Vaccine Ontology. VIOLIN is a web-based, publicly accessible vaccine database that includes information about over 4,000 vaccines for over 200 pathogens and non-infectious diseases (http://www.violinet.org) [5]. VIOLIN also includes many small databases and features such as the Vaxign2 vaccine design program [6] and VO-SciMiner, an ontology-based literature mining tool [7]. Vaccine Ontology (VO) is a community-based ontology that covers different aspects of vaccines and vaccination, including vaccine components, formulations, and host responses [8, 9].

To address the need for a publicly accessible and integrated repository of COVID-19 vaccine information, we have developed the COVID-19 Vaccine Knowledge Base (Cov19VaxKB). Developed as a relatively independent program under the umbrella of the VIOLIN system, Cov19VaxKB is focused on the collection, annotation, and integration of COVID-19 vaccine information encompassing vaccine development, production, safety, immunogenicity, efficacy, and more. Cov19VaxKB also contains features that allow users to analyze data related to vaccine efficacy, safety, and mechanisms. The knowledge base is freely available for public use and can be accessed at http://www.violinet.org/Cov19VaxKB.
Cov19VaxKB was established within the VIOLIN database system using two virtual servers in the University of Michigan Medical School virtual server system that runs the Red Hat Enterprise Linux operating system [5]. It is developed with a classical three-tier architecture. The knowledge base website features a series of comprehensive vaccine lists, a vaccine adverse event analysis program, a vaccine design tool, an automated literature search feature, a list of vaccine news updates, and links to other COVID-19 vaccine resources.

Figure 1 illustrates the workflow of the Cov19VaxKB/VIOLIN database design and implementation. Data in Cov19VaxKB is manually curated and annotated through two platforms: the knowledge base's vaccine list web pages and the VIOLIN web-based data curation system. The vaccine list pages are constructed using the PHP programming language. Data within the vaccine lists is primarily derived from the WHO's "COVID-19 Vaccine Tracker and Landscape," which contains an extensive list of all COVID-19 vaccines in all stages of development as well as their associated clinical trial IDs (https://www.who.int/publications/m/item/draft-landscape-of-covid-19-candidatevaccines). This resource is used to gather information about vaccine names, vaccine type, manufacturer, route of administration, number of doses, length of time between doses, and clinical trial IDs. In addition, clinical trial record URLs, age subgroups, and location are derived from clinical trial websites such as clinicaltrials.gov. Relevant publications are identified by searching the name of the vaccine of interest on PubMed. Links to corresponding VIOLIN and Vaccine Ontology entries are also incorporated into these lists. When applicable, the date on which a vaccine was first authorized by a regulatory agency is sourced from a manual web search. All information from these resources is manually curated into PHP files, which are then uploaded to the Cov19VaxKB server for display.
The vaccine list pages are organized according to vaccine development status, including preclinical studies, Phase 1-3 clinical trials, and authorization for emergency or full use. These lists are updated weekly to ensure that the information provided is up-to-date and accurate. The VIOLIN data curation system is also utilized for manual curation of data in Cov19VaxKB [5]. VIOLIN entries for each vaccine contain product information, such as manufacturer, vaccine type, antigen, and immunization route, as well as host response data from preclinical and clinical studies, including vaccine efficacy, immune response, and side effects. These entries can be accessed through the Cov19VaxKB query feature described in the next section.

The Cov19VaxKB web interface includes a query for COVID-19 vaccine entries that are stored in the VIOLIN database. The query is submitted from the Cov19VaxKB web user interface (the presentation tier) and is then processed using PHP/SQL (the middle tier, application server) against a MySQL relational database (the data tier, database server). Query results are then displayed in the user's web browser.

The vaccine adverse event analysis tool in Cov19VaxKB contains a query for adverse event case report information derived from VAERS and a statistical analysis feature. Case report data for all vaccines is downloaded monthly from the CDC VAERS database and deposited into a local MySQL database. Through a server-side script, the data is parsed and filtered for COVID-19 vaccines. The resulting case report data is then formatted to include attributes such as vaccine name, U.S. state or territory, age and sex of vaccine recipient, year of vaccination, and VAERS report year. This formatting allows users to query and filter adverse events for a specific COVID-19 vaccine based on the attributes described above. Users can also select a specific adverse event to access comprehensive tables of individual VAERS case reports.
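The attribute-based filtering of case reports described above can be sketched in a few lines. This is only an illustration, not the server-side script used by Cov19VaxKB: the record layout, the field names (`vaccine`, `state`, `events`, and so on), the helper functions, and the sample rows are all hypothetical and made up for this example.

```python
from collections import Counter

# Hypothetical, made-up case reports. Field names are assumptions for this
# sketch and do not reflect the actual Cov19VaxKB or VAERS schema.
REPORTS = [
    {"vaccine": "Moderna", "state": "MI", "age": 34, "sex": "F",
     "vax_year": 2021, "report_year": 2021, "events": ["Headache", "Fatigue"]},
    {"vaccine": "Pfizer-BioNTech", "state": "OH", "age": 61, "sex": "M",
     "vax_year": 2021, "report_year": 2021, "events": ["Headache"]},
    {"vaccine": "Moderna", "state": "MI", "age": 70, "sex": "M",
     "vax_year": 2021, "report_year": 2021, "events": ["Pyrexia", "Headache"]},
    {"vaccine": "Janssen", "state": "CA", "age": 45, "sex": "F",
     "vax_year": 2021, "report_year": 2021, "events": ["Fatigue"]},
]


def filter_reports(reports, **criteria):
    """Keep only reports whose attributes equal every given criterion."""
    return [
        report for report in reports
        if all(report.get(key) == value for key, value in criteria.items())
    ]


def count_events(reports):
    """Count how many reports mention each adverse event (once per report)."""
    counts = Counter()
    for report in reports:
        counts.update(set(report["events"]))
    return counts


# Example: adverse events reported for one vaccine in one state.
moderna_mi = filter_reports(REPORTS, vaccine="Moderna", state="MI")
print(count_events(moderna_mi))
```

Passing the criteria as keyword arguments keeps the filter open to any combination of the attributes listed in the text (vaccine name, state, sex, year) without a separate function per attribute.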
To display a potential association between a specific adverse event (AE) and a COVID-19 vaccine, three statistical measures are calculated: a Chi-squared value with its associated degrees of freedom and p-value, Proportional Reporting Ratio (PRR) [10], and case report frequency [11]. An R script for a Pearson Chi-square test with Yates' continuity correction uses a 2x2 frequency/contingency table to calculate the Chi-squared value, degrees of freedom, and p-value. The PRR represents the frequency of an adverse event for a vaccine of interest relative to all other case reports for all vaccines in the VAERS database. To determine whether a specific AE is significantly enriched for a specified COVID-19 vaccine, we have used a set of significance cutoffs as reported previously [11], which includes three criteria: Chi-squared value > 4, PRR > 2, and number of case reports > 0.2% of total case reports for the specified vaccine. All three criteria need to be met to identify the AE as significantly enriched for the vaccine. Using the Cov19VaxKB statistical analysis tool and cutoff criteria, we generated a list of statistically significant adverse events for the Pfizer-BioNTech, Moderna, and Johnson & Johnson (Janssen) vaccines. These adverse events were systematically compared and analyzed among the three vaccines.

Cov19VaxKB features an automated literature update tool that lists COVID-19 vaccine-related publications that have been published within the current and previous months. Publications are extracted from PubMed via NCBI's E-utilities data retrieval program and are formatted and displayed in HTML webpages using a PHP script [12]. respectively. In all queries, the results are filtered by date. To update the publication lists automatically, the PHP script is run daily using a server-side cron job [13]. Direct links to PubMed queries of COVID-19 vaccine and coronavirus vaccine publications are also included.
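The three enrichment criteria for adverse events stated earlier in this section (Chi-squared value > 4, PRR > 2, and case reports > 0.2% of the vaccine's total) can be made concrete with a short sketch. The knowledge base uses an R script for the Chi-square test; the Python below is only an illustrative stand-in. The function names, the 2x2 cell labels `a`, `b`, `c`, `d`, and the example counts are assumptions introduced here, not values from VAERS.

```python
import math


def yates_chi_square(a, b, c, d):
    """Pearson chi-square with Yates' continuity correction for a 2x2 table.

    a: reports of the AE for the vaccine of interest
    b: all other reports for the vaccine of interest
    c: reports of the AE for all other vaccines
    d: all other reports for all other vaccines
    Returns (chi-squared value, degrees of freedom, p-value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1, 1.0
    # The correction is capped so it never pushes the statistic below zero.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / denom
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, 1, p_value


def proportional_reporting_ratio(a, b, c, d):
    """AE share among the vaccine's reports divided by its share elsewhere."""
    if a + b == 0 or c + d == 0:
        raise ValueError("each table row needs at least one report")
    if c == 0:
        return math.inf
    return (a / (a + b)) / (c / (c + d))


def is_enriched(a, b, c, d):
    """Apply all three cutoffs: chi-squared > 4, PRR > 2, frequency > 0.2%."""
    chi2, _, _ = yates_chi_square(a, b, c, d)
    prr = proportional_reporting_ratio(a, b, c, d)
    frequency = a / (a + b)
    return chi2 > 4 and prr > 2 and frequency > 0.002


# Made-up counts for illustration only.
print(yates_chi_square(30, 970, 100, 9900))
print(proportional_reporting_ratio(30, 970, 100, 9900))
print(is_enriched(30, 970, 100, 9900))
```

Requiring all three cutoffs at once means that an AE with a high PRR but only a handful of reports, or a common AE that is reported equally often for other vaccines, is not flagged.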
We have previously developed a web application for vaccine design (Vaxign2), which utilizes reverse vaccinology and machine learning to predict vaccine targets [6]. The Cov19VaxKB version of this vaccine design tool includes an embedded view of the SARS-CoV-2 results from Vaxign2. The SARS-CoV-2 Vaxign2 output includes the protein name and accession number, adhesin probability, number of trans-membrane helices, and a Vaxign-ML score derived from a machine learning-based prediction [14].

To enable data transfer, Cov19VaxKB is synchronized with VO, which serves as an ontological storage system for information regarding vaccine names, vaccine type, route of administration, manufacturers, antigens, host species, and adjuvants [9]. VO entries for COVID-19 vaccines are manually created and updated using the Protégé ontology program in the Web Ontology Language (OWL). Links to these VO entries are manually incorporated into the knowledge base's vaccine lists and VIOLIN entries. Excel files of the vaccine lists, Vaxign2 output, and adverse event analysis results are uploaded to the "Data Download" webpage for user download.

The Cov19VaxKB system is designed to focus on three aspects of COVID-19 vaccine data: vaccine development, product-side information, and host-side information. Vaccine development information includes data about clinical trials and pre-clinical research studies of newly developed COVID-19 vaccines. Product-side information refers to vaccine type, antigens, adjuvants, manufacturer, and storage. Host-side data includes information regarding immune responses, efficacy, and adverse events.

Cov19VaxKB provides a user-friendly data query for users to search and compare the COVID-19 vaccine entries stored in the VIOLIN database. The query is located on the homepage of the knowledge base, allowing users to access the feature upon entering the website.
A user can begin by selecting a category in the drop-down menu and typing in a keyword; the query then returns only vaccines that contain that keyword (Fig. 2A). Up to three different categories can be specified to query a list of COVID-19 vaccines. The query feature also allows the user to sort vaccines according to conditions such as vaccine name or Vaccine Ontology ID. The query generates a filtered list of COVID-19 vaccines with links to their corresponding VIOLIN and VO entries (Fig. 2B). After a queried vaccine list is generated based on the user's input, multiple vaccines from this list can be chosen to view a formatted side-by-side display of their respective VIOLIN entries, which include product information and host response data (Fig. 2C). Also, from the list of vaccines generated from the initial query, the user can click on a VO ID link to access its formatted VO entry in the Ontobee data server (Fig. 2D).

Users can generate a filtered list of adverse events (AEs) and access a comprehensive table of case reports through the vaccine adverse event query (Fig. 3A-C). for Janssen, and 26 for COVID-19 vaccines for which the manufacturer information is unavailable. Statistically significant AEs for each COVID-19 vaccine were identified by the adverse event statistical analysis tool (Fig. 3D-E). The total number of significant AEs was 101 for the Pfizer-BioNTech vaccine, 37 for the Moderna vaccine, and 101 for the Janssen vaccine (Fig. 4). Seven AEs are significant for all 3 vaccines, among which severe AEs include pulmonary embolism and gait inability.

The Cov19VaxKB vaccine design tool has identified 24 SARS-CoV-2 vaccine targets as reported in our previous study [15]. The antigen target with the highest Vaxign-ML score is the surface glycoprotein, also known as the spike protein.
Unsurprisingly, the surface glycoprotein is the antigen for many existing COVID-19 vaccines, including the Pfizer-BioNTech, Moderna, Johnson & Johnson, and Oxford-AstraZeneca vaccines. VO contains entries for all COVID-19 vaccines that have been authorized for public use or that are currently in Phase 1-3 clinical trials. Figure 5 displays an ontological framework for the Pfizer-BioNTech COVID-19 vaccine. The ontology representation indicates that the "Pfizer-BioNTech COVID-19 vaccine" uses the "mRNA of the S protein of SARS-CoV-2" as its part and immunizes against the virus. The vaccine is administered via the "intramuscular route" and is manufactured by "Pfizer Inc." The ontology also illustrates that the "S protein" induces "cell-mediated immunity." To the best of our knowledge, Cov19VaxKB is the first web-based, publicly Users can predict vaccine targets using the vaccine design tool or utilize the adverse event data analysis feature to determine safety signals for specific COVID-19 vaccines. The Cov19VaxKB web query enables users to compare product and host response information between various COVID-19 vaccines. Thus, the query can be a powerful tool for analyzing relationships between two or more vaccine properties or attributes. For instance, users can analyze the relationship between immune responses and vaccine type by comparing neutralizing antibody levels and protection rates among vaccines of different types. Vaccine efficacy rates can be compared across different vaccine types to assess any broad differences or similarities in vaccine efficacy. The adverse event analysis tool can identify enriched adverse events in COVID-19 vaccines that can be analyzed for potential causal relationships in future studies. 
A statistically significant adverse event for a vaccine represents the enriched association between the vaccination and the adverse event at the population level, but it does not imply that the vaccination induced the adverse event for a specific individual. An adverse event is any undesirable experience that happens after vaccination which may or may not be caused by a vaccine [16]. Overall, existing COVID-19 vaccines have been demonstrated to be safe [17]. Further adverse event analysis is necessary to determine whether instances of adverse events such as death and thrombosis were directly caused by COVID-19 vaccines. Although statistically significant adverse events occur more frequently for the vaccine of interest compared to other vaccines, these AEs occur at a very low frequency compared to the total number of vaccinations administered. For example, there were only 2,736 reported occurrences of pulmonary embolism among the 3 FDA-authorized vaccines, in contrast to a total number of 428,006,540 COVID-19 vaccine doses administered in the United States as of November 6, 2021 [18]. In other words, pulmonary embolism was reported for only about 0.0006% of all COVID-19 vaccine doses administered in the US.

In conclusion, Cov19VaxKB provides a timely platform for the curation, sharing, and analysis of COVID-19 vaccine information. In the future, we aim to continue adding new features to the knowledge base to improve the user experience and meet the growing demand for relevant vaccine data analysis. As the volume of COVID-19 vaccine and vaccination data continues to grow, the knowledge base will be a useful and reliable reference for both researchers and the public.

The authors have declared that no competing interests exist.

Fig. 2. (A) A user can begin by selecting a category in the drop-down menu and typing in a keyword that will be used to query vaccines that only contain that keyword. Up to three different categories can be specified to query a list of COVID-19 vaccines.
The query feature also allows the user to sort vaccines according to conditions such as vaccine name or Vaccine Ontology ID. (B) Once the user has clicked "Search," the query will produce a list of vaccines that satisfy the specified criteria. The user can select one or more of these vaccines and click "Compare" to compare the VIOLIN entries of the desired vaccines. (C) By doing so, the user will be presented with formatted side-by-side lists that contain general vaccine and host response information. General vaccine information contains data such as the product name of the vaccine, vaccine type, and antigen. Host response information contains brief summaries of randomized controlled trial data found in relevant publications. (D) Also, from the list of vaccines generated from the initial query, the user can click on the VO ID link. This will direct the user to a formatted VO vaccine entry in the Ontobee data server.

COVID-19) Weekly Epidemiological Update and Weekly Operational Update
COVID-19 vaccine tracker and landscape
An interactive website tracking COVID-19 vaccine development
Updates on the web-based VIOLIN vaccine database and analysis system
Vaxign2: the second generation of the first Web-based vaccine design program using reverse vaccinology and machine learning
Ontology-based Brucella vaccine literature indexing and systematic analysis of gene-vaccine association network
Mining of vaccine-associated IFN-gamma gene interaction networks using the Vaccine Ontology
Ontology representation and analysis of vaccine formulation and administration and their effects on vaccine immune responses
Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports
Ontology-based combinatorial comparative analysis of adverse events associated with killed and live influenza vaccines
Database resources of the National Center for Biotechnology Information
VIOLIN: vaccine investigation and online information network
Vaxign-ML: Supervised Machine Learning Reverse Vaccinology Model for Improved Prediction of Bacterial Protective Antigens
COVID-19 coronavirus vaccine design using reverse vaccinology and machine learning. bioRxiv
The Ontology of Adverse Events
Mortality Rate and Characteristics of Deaths Following COVID-19 Vaccination
COVID-19 Vaccinations in the United States

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.