key: cord-1013333-5dqwx6lp
authors: Shanbehzadeh, Mostafa; Kazemi-Arpanahi, Hadi
title: Development of minimal basic data set to report COVID-19
date: 2020-09-01
journal: Med J Islam Repub Iran
DOI: 10.34171/mjiri.34.111
sha: f28ffde966c751c91d385de1f51dc5194ba6bae6
doc_id: 1013333
cord_uid: 5dqwx6lp

Background: Effective surveillance of COVID-19 depends on rapid, valid, and standardized information for crisis monitoring and prompt clinical intervention. A minimal basic data set (MBDS) is a set of metrics collated in a standard way so that data can be aggregated for clinical purposes and research. Data standardization enables accurate comparison of collected data and, accordingly, better generalization of findings. The aim of this study was to establish a core data set characterizing COVID-19 to consolidate clinical practice. Methods: A 3-step sequential approach was used: (1) an initial list of data elements was collected from existing information systems and data sets; (2) a systematic literature review was conducted to extract evidence supporting the development of the MBDS; and (3) a 2-round Delphi survey was conducted to reach consensus on the data elements to include in the COVID-19 MBDS and to validate it. Results: In total, 643 studies were identified, of which 38 met the inclusion criteria; a total of 149 items were identified in the data sources. The data elements were classified by 3 experts and validated via a 2-round Delphi procedure. Finally, 125 data elements were confirmed as the MBDS. Conclusion: The COVID-19 MBDS could provide a basis for meaningful evaluation, reporting, and benchmarking of COVID-19 across regions and countries. It could also support scientific collaboration among care providers in the field, which may improve the quality of documentation, clinical care, and research outcomes.

In December 2019, a cluster of pneumonia cases of unknown etiology was first identified in Wuhan, China. On January 7, 2020, a novel coronavirus, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2, initially designated 2019-nCoV), was identified as the causative organism of the disease now called coronavirus disease 2019 (COVID-19) (1) (2) (3). SARS-CoV-2 is an RNA virus of the coronavirus family that primarily causes respiratory infection, is widely transmitted among humans and mammals, and causes conditions ranging from mild, common-cold-like illness to death (4, 5). The virus appears to be highly transmissible. The World Health Organization (WHO) has declared COVID-19 a public health emergency of international concern (6). The WHO has urged countries to expand their efforts to contain the disease and safeguard health care environments, noting that the response calls for worldwide 'aggressive preparedness' (7).

Early, systematic, and active emergency management practices are key to epidemic prevention and control. Effective surveillance of this emerging outbreak relies heavily on regulatory management and coordinated interventions, including comprehensive and targeted surveillance, antimicrobial stewardship programs, education and training, research and epidemiological studies, and policymaking. These interventions highlight the importance of rapid, valid, and reliable information sharing among hospitals and public health authorities for crisis monitoring and early warning. In this situation, high-quality datasets are a prerequisite for the analyses needed in public health, which is inherently a data-intensive domain (8) (9) (10) (11) (12). In Iran, most organizations have developed their own processes and infrastructure for managing COVID-19 patients and collecting their data (13) (14) (15) (16) (17). Although current efforts to report COVID-19 are a good start, the absence of an information management perspective on which data elements are critical to record leads to inconsistent, unreliable, redundant, or duplicate reports. This precludes data integration and limits data sharing across multiple health information systems (18, 19).

Further, standardized clinical documentation is essential for electronic health records (EHRs) and for supporting secondary use of data gathered in daily clinical workflows for purposes other than patient care, eg, clinical research, quality management, epidemiologic studies, patient outcomes, and interoperability initiatives. An MBDS is a data collection tool that identifies the common components of data sets; defining it is one of the first and most basic steps in establishing and implementing many information systems, because it minimizes duplication of effort and improves data quality (20) (21) (22) (23) (24). COVID-19 monitoring depends on clinical data and reports from widely scattered public and hospital information systems as data input (eg, hospital information systems (HIS), the Iranian Electronic Health Record (so-called SEPAS), the Iranian Integrated Health System (known as SIB), and other clinical information systems). Accordingly, because we are at an early stage of this emergency, establishing a supportive, standardized, accurate, and up-to-date dataset is of paramount importance. Adopting such a dataset is an important step in promoting data capture and data exchange for COVID-19. We therefore conducted a systematic literature review combined with a Delphi survey to establish a minimal dataset that could serve as a standardized method of reporting COVID-19, which is expected to improve the quality of clinical and research outcomes.

In this study, a 3-step sequential approach was used. First, an initial list of data elements was collected from existing information systems and datasets. Next, a search strategy was developed to identify data items for establishing the COVID-19 MBDS from an evidence-based perspective. These sources were reviewed continuously until data saturation was reached (maximum data set). Finally, the data items included from the review were analyzed using a 2-round Delphi survey to reach consensus on the optimal data set (minimum data set).

The initial data elements were extracted from the medical records of patients with COVID-19, reports from the Corona National Headquarters, and other clinical and public health organizations affiliated with the Iranian Ministry of Health, as well as official datasets provided by international organizations, such as World Health Organization (WHO) reports, the European Centre for Disease Prevention and Control (ECDC) dataset, and the Chinese Center for Disease Control and Prevention (Chinese CDC). In addition to mapping available evidence supporting the development of the minimal dataset, a systematic review was conducted to identify candidate data elements for inclusion in the COVID-19 MBDS. To that end, the PubMed, Scopus, Web of Science, and Google Scholar databases were searched using the following terms (designed using English MeSH keywords and Emtree terms): "COVID-19", "Novel coronavirus 2019", "2019 nCoV", "clinical characteristics", "clinical features", and "clinical findings". Table 1 presents the systematic search strategy, based on Boolean operators, keywords, and search fields (advanced search interface).
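As an illustration of how the listed terms can be combined with Boolean operators, the sketch below ORs the disease terms together, ORs the clinical terms together, and ANDs the two groups. Table 1 is not reproduced here, so this grouping and the PubMed `[tiab]` (title/abstract) field tag are assumptions, not the exact strategy the authors ran; `or_block` and `build_query` are hypothetical helper names.

```python
# Hypothetical sketch: the term grouping and the [tiab] field tag are assumptions,
# not the exact search strategy reported in Table 1.
DISEASE_TERMS = ["COVID-19", "Novel coronavirus 2019", "2019 nCoV"]
CLINICAL_TERMS = ["clinical characteristics", "clinical features", "clinical findings"]


def or_block(terms: list[str], field: str = "") -> str:
    """Quote each phrase, append an optional field tag, and join them with OR."""
    return "(" + " OR ".join(f'"{t}"{field}' for t in terms) + ")"


def build_query(field: str = "[tiab]") -> str:
    """Require at least one disease term AND at least one clinical term."""
    return or_block(DISEASE_TERMS, field) + " AND " + or_block(CLINICAL_TERMS, field)


if __name__ == "__main__":
    print(build_query())
```

Passing `field=""` drops the PubMed-specific tag, which is closer to what a free-text interface such as Google Scholar accepts; each database's own syntax would still need checking.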

Two authors independently performed electronic literature searches for study identification and screening. The records retrieved by the initial search were first screened by title and abstract. The full texts of relevant articles were then examined against the inclusion and exclusion criteria. This research included all full-text articles published in English in reliable sources between December 2019 and April 2020. Short articles, letters to the editor, conference papers, theses, and reports extracted from blogs were excluded. The main criterion for selecting research articles was the relevance of their content to the research topic. Because of the large number of available articles, additional criteria were applied for selecting articles and introducing core clinical data elements for reporting COVID-19: full articles containing at least 2 of the following data classes related to the main objectives of COVID-19 reporting were selected: (1) clinical, (2) laboratory, (3) radiology, and (4) epidemiological features. Finally, candidate data elements for the COVID-19 MBDS were compiled in a checklist.
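The eligibility rules above can be expressed as a single screening predicate. This is a minimal sketch, assuming a hypothetical `Article` record whose article-type labels and (year, month) date encoding are illustrative choices, not part of the authors' protocol; the language, date window, excluded publication types, and "at least 2 of 4 data classes" rule follow the text.

```python
from dataclasses import dataclass, field

# The four data classes named in the text; an article must report at least 2.
DATA_CLASSES = {"clinical", "laboratory", "radiology", "epidemiological"}
# Hypothetical labels for the excluded publication types.
EXCLUDED_TYPES = {"short article", "letter", "conference paper", "thesis", "blog report"}


@dataclass
class Article:
    title: str
    article_type: str             # hypothetical label, e.g. "original" or "letter"
    language: str
    year_month: tuple[int, int]   # (year, month) of publication
    data_classes: set[str] = field(default_factory=set)


def is_eligible(a: Article) -> bool:
    """Apply the language, date-window, publication-type, and data-class criteria."""
    if a.language != "English":
        return False
    if a.article_type in EXCLUDED_TYPES:
        return False
    # Tuples compare lexicographically: December 2019 through April 2020 inclusive.
    if not ((2019, 12) <= a.year_month <= (2020, 4)):
        return False
    return len(a.data_classes & DATA_CLASSES) >= 2
```

For example, an English original article from February 2020 reporting clinical and laboratory features passes, whereas the same article reporting only clinical features does not.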

A questionnaire was developed from the data elements in the checklist, with 5 response columns for each data item: "very important", "important", "neutral", "slightly important", and "very slightly important". A blank row was provided at the end of the questionnaire so that experts could add necessary data elements. The content validity of the questionnaire was assessed by an expert panel of 2 infectious disease specialists and 3 health information management (HIM) experts. Reliability was assessed with a test-retest procedure at an 8-day interval, based on the answers of 2 HIM experts and 2 medical informatics experts. The collected data were analyzed using SPSS 16; the questionnaire showed a Cronbach's alpha of 0.86.
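For readers unfamiliar with the reliability statistic reported above, the sketch below computes Cronbach's alpha with the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The study's response matrix is not published here, so the example ratings are a made-up toy matrix and do not reproduce the reported 0.86.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix.

    scores[r][i] is the rating respondent r gave to item i.
    Population variance is used throughout; using sample variance for
    both terms gives the same ratio, so alpha is unchanged.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

With the toy matrix `[[1, 2], [2, 1], [3, 3]]` (3 respondents, 2 items) the result is 2/3, roughly 0.667; values closer to 1 indicate more internally consistent items.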

The data elements were validated in 2 rounds of a Delphi survey by a multidisciplinary group of medical experts (Table 2). The participating experts were asked to score the tabulated list of data elements for importance on a 5-point Likert scale (ranging from 1, "very slightly important", to 5, "highly important"). The level of agreement was the criterion for accepting data elements: after the initial ranking, data elements with ≤50% agreement were excluded in the first round, those with more than 50% but less than 75% agreement entered the second round, and those with ≥75% agreement were accepted in the first round.
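The first-round decision rule can be sketched as two small functions. The cut-offs (≤50% reject, ≥75% accept, anything in between goes to round 2) are taken from the text; how "agreement" is computed is not stated, so treating it as the percentage of experts who rated an element 4 or 5 is an assumption, as are the function names.

```python
def agreement(ratings: list[int], threshold: int = 4) -> float:
    """Percentage of experts rating an element at or above `threshold` (1-5 scale).

    ASSUMPTION: agreement = share of ratings of 4 or 5; the text does not define it.
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    return 100 * sum(r >= threshold for r in ratings) / len(ratings)


def first_round_decision(pct: float) -> str:
    """Apply the round-1 cut-offs reported in the text."""
    if pct >= 75:
        return "accept"
    if pct <= 50:
        return "reject"
    return "second round"  # more than 50% but less than 75%
```

For instance, `first_round_decision(agreement([5, 5, 4, 3]))` returns `"accept"` because 3 of 4 ratings (75%) meet the threshold. The text does not state the round-2 acceptance rule, so it is not modelled here.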

A total of 643 articles were obtained from the literature review. After removal of duplicate articles and applying the exclusion criteria, 38 articles were included in the analysis (Fig. 1) . The demographic data of the study participants are presented in Table 2 . The potential participants consisted of 25 medical specialists involved in COVID-19 care, treatment, and research domains. However, 6 specialists did not participate in the study. Thus, 19 experts contributed.

Overall, 3 data categories, 19 data classes, and 149 data items were extracted from the comprehensive literature review (maximum dataset). The data categories were epidemiological, clinical, and paraclinical. The epidemiological category contained 4 data classes: basic information, exposure history, transmission mode, and susceptible populations. The clinical category consisted of the clinical manifestations, coexisting conditions, treatment and supportive care, physical examinations, complications, time intervals, disease severity, disease status, and outcome data classes. Finally, the paraclinical category was divided into 2 classes: laboratory and radiology indicators. The final numbers of data elements in the epidemiological, clinical, and paraclinical categories were 25, 73, and 51, respectively (Table 3).
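The category/class hierarchy described above maps naturally onto a nested structure that an information system could load. This sketch encodes only the class names given in this paragraph and the per-category item totals; the per-class item counts are in Table 3 and are not reproduced, and the variable and function names are hypothetical.

```python
# Maximum dataset: category -> named data classes and total item count (from the text).
MBDS_MAXIMUM = {
    "epidemiological": {
        "classes": ["basic information", "exposure history",
                    "transmission mode", "susceptible populations"],
        "n_items": 25,
    },
    "clinical": {
        "classes": ["clinical manifestations", "coexisting conditions",
                    "treatment and supportive care", "physical examinations",
                    "complications", "time intervals", "disease severity",
                    "disease status", "outcome"],
        "n_items": 73,
    },
    "paraclinical": {
        "classes": ["laboratory indicators", "radiology indicators"],
        "n_items": 51,
    },
}


def total_items(dataset: dict) -> int:
    """Sum the item counts over all categories."""
    return sum(category["n_items"] for category in dataset.values())
```

Summing the category totals (25 + 73 + 51) recovers the 149 items that entered the Delphi survey, which is a quick consistency check when the structure is maintained by hand.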

Finally, the 149 primary data elements were entered into the Delphi survey; 112 data elements were accepted in the first round and 15 were rejected. The remaining 22 data elements progressed to the second round, of which 9 were approved. Thus, on completion of the survey, 125 data elements were approved. Accordingly, the final numbers of data elements in the epidemiological, clinical, and paraclinical categories were 22, 57, and 46, respectively.

This study reports the basic required data items, derived from COVID-19 patients' medical records, existing official data sets, a systematic literature review, and a Delphi survey. The aim was to identify a set of parameters believed to be essential and sufficient for uniform reporting of COVID-19 data. The COVID-19 MBDS can meet some of the data requirements of care practice, providing a reliable framework on which health care experts can base their documentation. These elements give clinicians and researchers high-quality data to support diagnosis and analysis, respectively. The resulting MBDS is therefore more likely to be acceptable and practical in clinical practice and biomedical research. It also has the potential to harmonize data capture across public and medical information systems, so that clinical data on COVID-19 can be merged and compared. In addition, a proper and reliable data set can enhance data exchange and interoperability (60). Developing the required data set is the most fundamental step in constructing any information system in the health care sector. Determining these data elements based on the viewpoints and real requirements of customers or users can help designers and vendors of information systems facilitate and accelerate development and reduce the likelihood of failure (61). Thus, the MBDS established in this study can serve as a basis for developing information systems that collect and manage COVID-19 data.
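One way an information system might use an MBDS is as a completeness check on incoming records. The sketch below assumes a hypothetical, tiny subset of element identifiers (`age`, `rt_pcr_result`, and so on); these are illustrative placeholders, not the published list of 125 elements, and a real implementation would load the full element list from Table 3.

```python
from typing import Any

# HYPOTHETICAL element identifiers, grouped by the three MBDS categories.
# Illustrative placeholders only; not the published element list.
REQUIRED_ELEMENTS = {
    "epidemiological": ["age", "sex", "exposure_history"],
    "clinical": ["fever", "cough", "disease_severity", "outcome"],
    "paraclinical": ["rt_pcr_result", "chest_ct_finding"],
}


def missing_elements(record: dict[str, Any]) -> dict[str, list[str]]:
    """Return, per category, the required elements that are absent or None."""
    gaps: dict[str, list[str]] = {}
    for category, elements in REQUIRED_ELEMENTS.items():
        absent = [e for e in elements if record.get(e) is None]
        if absent:
            gaps[category] = absent
    return gaps
```

A record lacking only an outcome and a chest CT finding would yield `{"clinical": ["outcome"], "paraclinical": ["chest_ct_finding"]}`, which a registry could use to prompt for the missing fields before submission.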

In the context of COVID-19, huge volumes of data are generated every day in clinical and public health domains. In this big-data era, what can be collected is not the issue; rather, attention should be paid to the depth and statistical power of the collected data to confirm or refute a hypothesis and to answer specific questions (62, 63). The hypotheses and questions to be addressed by a health information system or clinical registry should determine which data items are preferred, and resource availability should inform the scope of the data collected to answer the expected queries. Part of the problem stems from a lack of comparable data, resulting from limited sharing, unstructured reporting, and the absence of standardized data capture strategies (64, 65). To resolve this, new data collection instruments aim to improve the findability, accessibility, interoperability, and reusability (FAIR) of data, highlighting the need for uniform data that can be integrated from different fragmented resources (66) (67) (68) (69). In this regard, the anticipated benefits of the COVID-19 MBDS for investigators include accelerating study initiation, facilitating data exchange and accumulation, and supporting good data management to reach FAIR data. The COVID-19 MBDS aims to facilitate FAIR data collection. This study reported the development of the first COVID-19 MBDS based on state-of-the-art evidence and consultation with future users (experts and clinicians). This method may help balance scientific theoretical knowledge and technical knowledge with applied wisdom from clinical practice to inform the data set, making the resulting MBDS more likely to be acceptable and practical in clinical practice.
We identified the variables required to analyze fundamental aspects, such as transmission patterns, severity, clinical phenotype, prognostic factors, the effectiveness of therapeutic plans and complications, survival estimation, as well as incidence and prevalence of disease across the country.

The literature review covered only studies published in the first 4 months of the COVID-19 outbreak. A longer or more exhaustive review may have identified additional relevant studies. However, because the literature review aimed to identify candidate items for inclusion in the MBDS (rather than every paper that considered COVID-19 parameters), and because we drew on the collective wisdom of experts in the COVID-19 field throughout the consensus process, it seems unlikely that any important aspects of COVID-19 were overlooked.

To conclude, the MBDS was developed with structured consensus methods that integrated a literature review and expert opinion to consolidate COVID-19 documentation, research, and practice. Collecting data according to the configuration presented in this MBDS contributes to unified reporting, probably leading to improved quality of patient documentation, enhanced continuity of care, and improved health outcomes for COVID-19. The COVID-19 MBDS is not intended to be exhaustive; it is what the consulted professionals judged to be a manageable, minimal, and essential set that would ideally be provided in all COVID-19-related research studies. This core set can be augmented in each project according to its purpose and available resources. Future testing in other health care settings is recommended, and further strategies, including a comprehensive literature search, should be considered to enhance this MBDS.

The SARS, MERS and novel coronavirus (COVID-19) epidemics, the newest and biggest global health threats: what lessons have we learned

Diagnosis of the Coronavirus disease (COVID-19): rRT-PCR or CT?

Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study

Clinical characteristics and imaging manifestations of the 2019 novel coronavirus disease (COVID-19): A multi-center study in Wenzhou city

COVID-19, a worldwide public health emergency

Clinical characteristics of coronavirus disease 2019 in China

Availability of COVID-19 Information from National and International Aesthetic Surgery Society Websites

Data Quality of Chinese Surveillance of COVID-19: Objective Analysis Based on WHO's Situation Reports

Epidemiological data from the COVID-19 outbreak, real-time case information

Digital health and the COVID-19 epidemic: an assessment framework for apps from an epidemiological and legal perspective

Cloud-Based System for Effective Surveillance and Control of COVID-19: Useful Experiences From Hubei

Responding to COVID-19: The UW Medicine Information Technology Services Experience

Isfahan COvid-19 REgistry (I-CORE): Design and methodology

Rationale and Design of a Registry in a Referral and Educational Medical Center in Tehran, Iran: Sina Hospital Covid-19 Registry (SHCo-19R)

Is reporting many cases of COVID-19 in Iran due to strength or weakness of Iran's health system?

A model for COVID-19 prediction in Iran based on China parameters

Design and development of a web-based registry for Coronavirus (COVID-19) disease

The Data set for Patient Information Based Algorithm to Predict Mortality Cause by COVID-19. Data Brief

A study on the quality of novel coronavirus (COVID-19) official datasets

Identification of the necessary data elements to report AIDS: A systematic review

Developing a Minimum Data Set (MDS) for Cardiac Electronic Implantable Devices Implantation

Development of a minimum data set for cardiac electrophysiology study ablation

Key Data Elements in Myeloid Leukemia. MIE

Information management for aged care provision in Australia: development of an aged care minimum dataset and strategies to improve quality and continuity of care

Risk-Aware Identification of Highly Suspected COVID-19 Cases in Social IoT: A Joint Graph Theory and Reinforcement Learning Approach

Epidemiologic and Clinical Characteristics of 26 Cases Arising from Patient-to-Patient Transmission in Liaocheng, China

The correlation between viral clearance and biochemical outcomes of 94 COVID-19 infected discharged patients

The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19)-China

Clinical features and dynamics of viral load in imported and non-imported patients with COVID-19

Clinical and epidemiological features of 36 children with coronavirus disease 2019 (COVID-19) in Zhejiang, China: an observational cohort study. Lancet Infect Dis

Clinical, laboratory and imaging features of COVID-19: A systematic review and meta-analysis

The Clinical and Chest CT Features Associated with Severe and Critical COVID-19 Pneumonia

Clinical and Transmission Characteristics of Covid-19-A Retrospective Study of 25 Cases from a Single Thoracic Surgery Department

Clinical Features and Treatment of COVID-19 Patients in Northeast Chongqing

Epidemiological characteristics on the clustering nature of COVID-19 in Qingdao City, 2020: a descriptive analysis

Epidemiological and initial clinical characteristics of patients with family aggregation of COVID-19

The clinical and immunological features of pediatric COVID-19 patients in China

Clinical features of COVID-19 in elderly patients: A comparison with young and middle-aged patients

Clinical characteristics of 36 non-survivors with COVID-19 in Wuhan, China. medRxiv

The clinical characteristics of COVID-19: a retrospective analysis of 104 patients from the outbreak on board the Diamond Princess cruise ship in Japan

The performance of chest CT in evaluating the clinical severity of COVID-19 pneumonia: identifying critical cases based on CT characteristics

Association between Clinical, Laboratory and CT Characteristics and RT-PCR Results in the Follow-up of COVID-19 patients. medRxiv

Clinical Characteristics, Laboratory Findings, Radiographic Signs and Outcomes of 52,251 Patients with Confirmed COVID-19 Infection: A Systematic Review and Meta-Analysis. Microb Pathog

Clinical and pathological characteristics of 2019 novel coronavirus disease (COVID-19): a systematic reviews

COVID-19 pneumonia manifestations at the admission on chest ultrasound, radiographs, and CT: single-center study and comprehensive radiologic literature review

Epidemiology, causes, clinical manifestation and diagnosis, prevention and control of coronavirus disease (COVID-19) during the early outbreak period: a scoping review

Clinical characteristics and diagnostic challenges of pediatric COVID-19: A systematic review and meta-analysis

Unique epidemiological and clinical features of the emerging 2019 novel coronavirus pneumonia (COVID-19) implicate special control measures

COVID-19): a review of clinical features, diagnosis, and treatment

Epidemiology and Clinical Features of COVID-19: A Review of Current Literature

Clinical features and short-term outcomes of 221 patients with COVID-19 in Wuhan

Clinical characteristics of coronavirus disease 2019 (COVID-19) in China: a systematic review and meta-analysis

Clinical Presentation of COVID-19: A Systematic Review Focusing on Upper Airway Symptoms

Incidence, clinical characteristics and prognostic factor of patients with COVID-19: a systematic review and meta-analysis. medRxiv

Clinical Characteristics of 2019 Coronavirus Pneumonia (COVID-19): An Updated Systematic Review. medRxiv

Epidemiological Characteristics and Clinical Features of 32 Critical and 67 Noncritical Cases of COVID-19 in Chengdu

Clinical characteristics of non-critically ill patients with novel coronavirus infection (COVID-19) in a Fangcang Hospital

CT scans of patients with 2019 novel coronavirus (COVID-19) pneumonia

Clinical and CT imaging features of 2019 novel coronavirus disease (COVID-19)

A proposed minimum data set for international primary care optometry: a modified Delphi study. Ophthalmic Physiol Opt

Electronic Medical Records for Mental Disorders: What Data Elements Should These Systems Contain? Stud Health Technol Inform

Approaches to recruiting 'hard-to-reach' populations into research: a review of the literature

Recruitment of hard-to-reach population subgroups via adaptations of the snowball sampling strategy

Common data elements and data management: Remedy to cure underpowered preclinical studies

National institute of neurological disorders and stroke outcome set for hip fracture trials

The FAIR Guiding Principles for scientific data management and stewardship

Developing a core common data element project-approach and methods

The development of the Older Persons and Informal Caregivers Survey Minimum DataSet (TOPICS-MDS): a large-scale data sharing initiative

Rapid, responsive, relevant (R3) research: a call for a rapid learning health research enterprise

This article is extracted from a research project supported by Abadan Faculty of Medical Sciences (IR.ABADANUMS.REC.1399.021). We also thank the Research Deputy of Abadan faculty of Medical Sciences for financially supporting this project. We also would like to thank all experts who participated in this study and played a role in the validation of the data elements.

The authors declare that they have no competing interests.