key: cord-0731232-6d2n60mm authors: Mathioudakis, Alexander G.; Fally, Markus; Hashad, Rola; Kouta, Ahmed; Hadi, Ali Sina; Knight, Sean Blandin; Bakerly, Nawar Diar; Singh, Dave; Williamson, Paula R.; Felton, Tim; Vestbo, Jørgen title: Outcomes Evaluated in Controlled Clinical Trials on the Management of COVID-19: A Methodological Systematic Review date: 2020-12-15 journal: Life (Basel) DOI: 10.3390/life10120350 sha: 3e3c6dfefd259f3994b1c9089c65031e14e442cc doc_id: 731232 cord_uid: 6d2n60mm It is crucial that randomized controlled trials (RCTs) on the management of coronavirus disease 2019 (COVID-19) evaluate the outcomes that are critical to patients and clinicians, to facilitate relevance, interpretability, and comparability. This methodological systematic review describes the outcomes evaluated in 415 RCTs on the management of COVID-19, that were registered with ClinicalTrials.gov, by 5 May 2020, and the instruments used to measure these outcomes. Significant heterogeneity was observed in the selection of outcomes and instruments. Mortality, adverse events and treatment success or failure are only evaluated in 64.4%, 48.4% and 43% of the included studies, respectively, while other outcomes are selected less often. Studies focusing on more severe presentations (hospitalized patients or requiring intensive care) most frequently evaluate mortality (72.5%) and adverse events (55.6%), while hospital admission (50.8%) and viral detection/load (55.6%) are most frequently assessed in the community setting. Outcome measurement instruments are poorly reported and heterogeneous. Follow-up does not exceed one month in 64.3% of these earlier trials, and long-term COVID-19 burden is rarely assessed. The methodological issues identified could delay the introduction of potentially life-saving treatments in clinical practice. Our findings demonstrate the need for greater consistency, to enable decision makers to compare and contrast studies. The emergence of the coronavirus disease 2019 led to an unprecedented research mobilization aiming to understand the virus and develop effective preventive and therapeutic strategies [1, 2] . Characteristically, within ten months, over 60 thousand publications focusing on COVID-19 were indexed in PubMed and almost two thousand interventional studies were registered with the ClinicalTrials.gov database. However, the limited knowledge about the disease and the need for an expeditious response to the unfolding pandemic did not allow, in some cases, for adequate methodological planning and co-ordination. Extensive research duplication (or better multiplication) has been observed, with numerous randomized controlled trials (RCTs) evaluating the same interventions for COVID-19 in parallel [3] . Moreover, standardization is lacking in trial design and could limit comparability. An important source of variability in trial design could arise from the outcomes (endpoints) that are selected for evaluation. Heterogeneity in trial outcomes and omission of outcomes that are critically important to patients and clinicians complicate interpreting, comparing and synthesizing trial results, potentially delaying the introduction of novel, life-saving treatments into clinical practice [4, 5] . Core outcome sets are developed to address heterogeneity in the selection of outcomes. These are agreed standardized sets of outcomes that should be measured and reported as a minimum in all clinical trials in specific areas of health or health care [6] . The Core Outcome Measures in Effectiveness Trials (COMET) has developed a rigorous methodology for their development, to ensure the most pertinent clinical outcomes are included in core outcome sets [6] [7] [8] . Core outcomes should be informed by rigorous methodological systematic review [6] [7] [8] . Upon the emergence of COVID-19 pandemic, there was an urgent need for the development of a core outcome set. Within a few months, four sets were developed, using an accelerated process [9] [10] [11] [12] . These were based on methodological systematic reviews of the first registered RCTs, which were limited in number and design, due to the limited knowledge of the nature of COVID-19, at the time. However, in the meantime, our knowledge of the natural history of COVID-19 is expanding rapidly and numerous clinical trials have been registered. In this methodological survey, we describe the outcomes that are tested in RCTs evaluating therapeutic interventions for COVID-19 and the instruments used to measure these outcomes. We followed standard methodology recommended by the Core Outcome Measures in Effectiveness Trials (COMET) initiative for conducting methodological systematic reviews of outcomes evaluated in RCTs [6] , that was successfully applied in previous, similar methodological surveys [13] [14] [15] [16] . The Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) was used for reporting this systematic review (Table S1 ). Planned, ongoing or completed interventional clinical trials evaluating pharmacological or non-pharmacological interventions for the management of COVID-19 were considered eligible. Phase 1 trials were considered beyond the scope of this manuscript and, thus, excluded. All eligible trials from the U.S. National Library of Medicine clinical trials register (ClinicalTrials.gov, searched on 5 May 2020) were retrieved using standard filters recommended by the library. More specifically, for identifying studies evaluating COVID-19, we used the following terms: COVID-19, SARS-CoV-2, severe acute respiratory syndrome coronavirus 2, 2019-nCoV, 2019 novel coronavirus, and Wuhan coronavirus. Only studies identified as interventional by the submitting researcher were retrieved. Eligible studies were grouped into phase 2 or later stage trials, and according to the recruitment setting (community, hospital, or intensive care unit). The main methodological characteristics of all eligible studies, including the planned study population, age of the participants, recruitment setting, blinding, interventions, outcomes, funding, sponsoring, and geographic distribution of the participating centers were extracted automatically from the ClinicalTrials.gov extract (.csv), using a script developed in the platform R statistics (version 3.4.3; R Foundation for Statistical Computing, Vienna, Austria). One researcher (amongst MF, RH, ASH, AK) confirmed eligibility, cross-checked pre-extracted data for accuracy, searched for additional reports of the study protocol and extracted additional data, that were not automatically captured. A second researcher (AGM) cross-checked all extracted data for accuracy. Disagreement was resolved through discussion. Extracted data included the projected recruitment sizes, study settings, as well as details on the eligibility criteria and evaluated outcome measures. Descriptions of all outcome measures were extracted verbatim from the study protocols or registry entries. After in-depth assessment of the outcomes evaluated in a random sample of 20 studies, we developed a list of generic outcome categories defined by the treatment effect they aim to capture, rather than the specific measurement instrument. Two authors (amongst MF, RH, ASH, AK) categorized each of the extracted outcomes within the generic outcome categories. New generic outcome categories were developed as needed, in cases where the evaluated outcomes did not fit any of the existing categories, based on consensus among the co-authors. The instruments used for the quantification of each outcome were also captured. Disagreement was resolved through discussion with another reviewer (AGM). Finally, the generic outcomes were further classified according to the COMET taxonomy [17] . Our search retrieved 745 interventional studies. After excluding diagnostic, prognostic, preventive studies, phase 1 trials and those not directly focusing on the management of COVID-19, we selected 415 studies for inclusion in this systematic survey, including 178 phase 2, and 237 later phase RCTs ( Figure A1 , Table A1 ). Most of the included trials are conducted by academic investigators (75.7%) and only one in four is sponsored by the pharmaceutical industry. The planned recruitment ranges between 7 and 12,000 participants (median: 160, interquartile range [IQR]: 67-400). Most trials include two intervention arms (74.8%), but one in four evaluates more than two, and up to 19 interventions. Moreover, 79.8% of the trials are conducted in a hospital setting, including 6.5% conducted in the intensive care unit (ICU), while 15.2% are conducted in the community. Descriptions of disease severity are heterogeneous, with the recruitment setting being the most consistent measure of disease. Details on the characteristics of the included studies are available in Table 1 . Overall, 3948 unique outcomes are evaluated in the included studies, including 1691 from phase 2 trials and 2257 from later phase trials. We identified 25 generic outcome categories (Table 2) . Similar number of outcomes are evaluated in phase 2 (median: 8.5, IQR: 5-13) and later phase (median: 7, IQR: 4-11) trials ( Figures A2 and A3 ). Mortality and adverse events, the most frequently assessed outcomes, are only assessed in 64.6% and 48.4% of all trials, respectively. All remaining outcomes are evaluated in less than half of the trials, highlighting an important heterogeneity in outcomes selection (Tables 3 and 4) . Treatment success or failure is only evaluated in 41.6% of phase 2 trials and 44.1% of the later phase trials. Interestingly, the frequency that different outcomes are evaluated as outcomes or as primary outcomes are very similar for phase 2 and later phase trials. Mortality/Survival Evaluation the survival status. Treatment success or treatment failure A clinical evaluation of whether COVID-19 was successfully treated. Usually a composite endpoint based on one or more of the following: survival, symptoms progression or regression, pyrexia regression, oxygen requirements and/or the requirement for ventilation. We only considered in this category binary outcomes describing criteria either for treatment success or treatment failure. Time-to-treatment success or failure is a measurement instrument that could provide more granular information. A quantitative evaluation of disease severity. In this category we included outcomes presenting mean/median scores or change from baseline in a validated score. Outcomes describing predefined score thresholds for treatment success or failure were classified in the previous category. Quantitative or qualitative evaluation of the intensity of one or more symptoms, including but not limited to breathlessness, cough, pyrexia or anosmia. Physiological measures of oxygenation, including oxygen saturation (SatO2), the partial pressure of oxygen (PaO2) or carbon dioxide (PaCO2). The need for supplementary oxygen or ventilation were summarized in separate outcome categories. Pulmonary function and physiology Measures of pulmonary functions and lung physiology including the forced expiratory volume in 1 second (FEV1), forced vital capacity (FVC), respiratory muscle strength or the lung compliance. Polymerase chain reaction (PCR) to evaluate the presence, persistence and/or load of the severe acute respiratory syndrome coronavirus-2 (SARS-CoV2). Detection of the presence and titres of antibodies against SARS-CoV2. Radiological progression in chest x-ray (CXR) or computed tomography (CT) of the chest. The levels and trajectories of any inflammatory biomarkers, including white blood cells count, lymphocytes, neutrophils, eosinophils, monocytes, CD4+ or CD8+ T cell counts, c-reactive protein, interleukins, tumour necrosis factors, or any other inflammatory biomarkers. The levels and trajectories of any other biomarkers, including but not limited to kidney function, liver function, haematocrit, coagulation profile, d-dimers, troponin or the brain natriuretic peptide (BNP). Pharmacokinetics/pharmacodynamics Evaluation of the pharmacokinetics and/or pharmacodynamics of the drug interventions (mainly serum levels over time). Adverse events or grade 3 or more severe adverse events, or serious adverse events, according to the Common Toxicity Criteria for Adverse Events (CTCAE). In this category, we also included outcomes evaluating specific adverse events, such as renal failure, liver failure, pulmonary embolism, myocardial infarction or tachyarrhythmias. Treatment discontinuation was also included in this category. Quantitative assessment of the general well-being of participants. This group of outcomes include the need for (i) hospital admission, (ii) hospital re-admission, (iii) intensive care admission, (iv) invasive ventilation, or need for ECMO. In each category, we also included the composite outcomes consisting of the need for the specific level of care or death. For example: "intensive care admission or death", as these composite outcomes were developed to account for patients who might have benefitted by the higher level of care but died or patients who were not eligible for the higher level of care due to their baseline clinical status. In studies conducted in the hospital setting, need for hospital admission at a specific follow-up timepoint, refers to the proportion of patients who remain inpatients at that timepoint. Similarly, for studies conducted in the ICU stay and the need for ICU admission. Duration of stay in a specific level of care This group of outcomes include length of (i) hospital stay, (ii) intensive care admission, or (iii) mechanical ventilation. The end date was often defined as the last day of stay in a specific level of care, or the last day that the stay was indicated (to account for cases when patients are medically optimized for hospital discharge but remain at hospital for social or other reasons. In this category we summarized outcomes that were reported in less than 10 of all eligible trials. These included changes in activities of daily living, quality of life, pharmacodynamics and pharmacokinetics, drug compliance, feasibility outcomes, use of antibiotics or other drugs, emergency room visits or use of other healthcare resources, the need for prone positioning, need for transfusion and discharge destinations. RCTs grouped in phase 2 and later phase trials. Outcomes evaluated in <10 RCTs were grouped as "Other outcomes". Time to treatment success or failure is a measurement instrument of the outcome treatment success or failure. However, it is reported separately here, as it provides more granular information. NIV: Non-invasive ventilation. The most frequently reported outcomes among studies conducted in a community setting (thus recruiting less severely ill patients), were viral detection or load (55.6%), need for hospital admission (50.8%), and symptoms (49.2%). In contrast, the most frequently evaluated outcomes in studies recruiting patients with more severe COVID-19, were mortality and adverse events, which were evaluated in 71.6%, and 50.3% of studies recruiting hospitalized patients, and in 88.9% and 66.7% of those recruiting critically ill patients, respectively. All-cause mortality is evaluated in all but six trials measuring mortality. When mortality was not further described, we presumed it referred to all-cause mortality. Time to death is assessed in 16 trials, and cause-specific mortality in six, mainly focusing on SARS-CoV2 mortality, but also including mortality due to pulmonary or cardiovascular complications. (Table 5 ) [18] . Treatment success is defined as an improvement in ordinal scales such as the WHO clinical progression scale by 2 points or 1 point in 57.5% and 24.8% of all outcomes using the scale to evaluate treatment success, while in the remaining outcomes, no specific threshold is provided. Complete resolution of the symptoms and signs of COVID-19 (clinical recovery) is used as a measure of treatment success in 51/220 (23.2%) outcomes and clinical improvement in 38/220 (17.3%) outcomes. The definition of complete resolution varies. Often, no further information is provided. In the remaining cases, it is defined as a composite outcome including several of the following components: complete resolution of breathlessness, tachypnoea, hypoxia, desaturation, cough, anosmia, myalgia, fever, or of oxygen requirements; a negative COVID-19 PCR; hospital discharge; or radiological resolution. A definition of clinical improvement as an outcome is also frequently lacking. In the remaining cases, it is defined as an improvement in several of the previously listed components. Improvement is either based on prespecified thresholds, or on a subjective clinicians' judgement. Finally, 14 outcomes (6.4%), use specific thresholds (0, ≤2 or ≤4) of the National Early Warning Score (NEWS or NEWS-2) to define treatment success. Treatment failure, or time to treatment failure is evaluated by 76 outcomes. In most cases (40/76, 52.6%), treatment failure is defined as a composite outcome consisting of several components with clear thresholds, such as: death, need for ICU admission, need for invasive ventilation, need for other organ support (e.g., vasopressors or renal replacement therapy), need for non-invasive ventilation (NIV), need for supplemental oxygen, a deterioration in oxygenation, need for hospital admission or re-admission or emergency visit, ventricular tachyarrhythmia. Ordinal clinical severity scales such as the WHO scale are used to define treatment failure in 16/76 (21.1%) outcomes, while the need for rescue therapy is used in 9/76 (11.8%) outcomes. The remaining 11 (14.5%) outcomes do not provide specific criteria and/or state treatment failure will be based on the clinician's judgement of deterioration in the clinical condition of the patient. 2. Severity scores: Standardized scores are used to evaluate disease severity and progression in 277 outcomes. Ordinal disease severity scales (such as the WHO scale) are the most frequently used scores (144/277 outcomes, 51.2%), followed by the Sequential Organ Failure Assessment (SOFA) Score [19] , a validated score for describing the severity of organ dysfunction (54/277 outcomes, 19.5%), and the NEWS score [20] . Acute Physiology and Chronic Health Evaluation II (APACHE II, 5/277), clinical sign score (5/277), Pneumonia Severity Index (PSI, 3/277), BRESCIA-COVID, Murray score, Sepsis Induced Coagulopathy, Small Identification Test, SMART-COP score, and the Vienna Vaccine Safety Initiative (ViVI) disease severity score are used less often. 3. Symptoms: 188 outcomes focus on symptoms, which are either assessed using visual analogue scales, or validated instruments. Composite scores evaluating several symptoms, including breathlessness, cough, sputum production, pyrexia, anosmia, myalgia, headache, or gastrointestinal symptoms are used in 40 outcomes (21.3%). Four composite outcomes specifically assess respiratory symptoms (2.2%). Each of the remaining outcomes focus on a single symptom. These include fever (72/188, 38.3%), breathlessness (18, 9. 6%), cough (12, 6.4%), and less often anxiety, depressive symptoms, anosmia, cognitive dysfunction, nausea, insomnia, or fatigue. In this category we also included the assessment of heart rate (8, 4.3%) or blood pressure (5, 2.7%). 1. Oxygenation (evaluated by 215 outcomes): Oxygenation is evaluated using the partial pressure of oxygen (PaO2), fraction of inspired oxygen (FiO2), oxygen saturation (SatO2), or respiratory rate. Oxygenation is often measured as the PaO2 or SatO2 corrected for FiO2 (95/215, 44.2%). In this category we also included measurements of the partial pressure of carbon dioxide (PaCO2) and pH, which are only rarely evaluated as outcomes. 2. Pulmonary function and physiology (28 outcomes): There is significant heterogeneity in this domain, with different outcomes evaluating peak flow rate, forced vital capacity (FVC), the ratio of forced expiratory volume in 1 second (FEV1) to FVC, vital capacity, diffusing capacity, lung compliance, and respiratory muscle function. 3. Viral detection and load (235 outcomes): The vast majority assess virologic clearance by a specific timepoint, or the time until virologic clearance. A small number of outcomes track changes in viral load over time, or differences in the viral detection and load when using different samples (nasal, nasopharyngeal, oropharyngeal swabs or sputum). 4. Viral antibodies: The development of antibodies against SARS-CoV2 is assessed in 31 outcomes. Evaluation of specific antibody types (IgA, IgG, or IgM) is only described in five trials. 5. Radiological outcomes (61 outcomes): Definitions of this outcome are inadequate. In most cases, it is broadly stated that the progression, regression, or resolution of the radiological findings are monitored. Details are only provided in six outcomes, which monitor the extent of the lesion as a proportion of the full lung volume, or perform lung densitometry. Development of fibrosis is evaluated in seven outcomes. Computed tomography (CT) is used in 21 (34.4%) outcomes, a chest X-ray (CXR) in 8 (13.1%), either a CT or a CXR in three, either CT or CXR or lung ultrasound in one and nuclear imaging in one outcome. The imaging modality used is not declared in the remaining 28 (45.9%) outcomes. 6 . Inflammatory biomarkers (321 outcomes, each describing either a single or multiple biomarkers): The most frequently evaluated biomarkers are the total white cell count, neutrophils, lymphocytes, eosinophils, monocytes, c-reactive protein, interleukins 1, 6, and 8, followed by other interleukins, procalcitonin, tumour necrosis factors, complement components, lymphocytes subtypes, immunoglobulins, and other inflammatory biomarkers. 7. Other biomarkers: 309 outcomes evaluate either a single or multiple non-inflammatory biomarkers. Mostly, these are surrogates for safety or adverse events. The most frequently captured biomarkers are d-dimers, cardiac enzymes, kidney function, liver function, clotting, red blood cells and haemoglobin, followed by a variety of other molecules. 8. Pharmacokinetics/Pharmacodynamics: Here, we categorized 33 outcomes, mostly evaluating plasma drug concentrations (12/33, 36.4%), but also half-life, maximum/minimum observed concentration, time to reach the maximum/minimum observed concentration, area under the plasma concentration-time curve. Adverse events (448 outcomes): 108 (24.1%) outcomes evaluate any adverse event; either their frequency, or participants experiencing at least one adverse event. 80 (17.9%) outcomes specifically assess serious adverse events, as defined by the Common Terminology Criteria for Adverse Events (CTCAE). Nineteen (4.2%) outcomes focused on drug reactions, 14 (3.1%) on grade 3 or 4 adverse events, as defined by the CTCAE, and 22 (4.9%) assessed the rate of study drugs discontinuation due to adverse events or due to any reason. The remaining outcomes focused on specific adverse events, mostly cardiac (38, 10.3%), secondary infections (37, 10.0%), thrombotic or bleeding events (29, 8.1%), or local administration reactions (13, 3.6%) The EuroQol 5 Dimensions (EQ-5D) is used in four outcomes, followed by the Research and Development Corporation's (RAND) 36-Item Health Survey (SF-36), which is used in three outcomes. Other instruments include the WHO Disability Assessment Schedule (WHODAS 2.0), the Control, Autonomy and Pleasure (CASP- 19) , and the Nottingham Health Profile. 3.2.6. Resources Use 1. Need for a (higher) level of care (352 outcomes): Need for hospital admission is evaluated by 68 outcomes (19.3%), need for hospital re-admission by 9 (2.6%), need for intensive care admission by 82 (23.4%), need for invasive ventilation by 167 (47.4%), and need for extracorporeal membrane oxygenation (ECMO) by 26 (7.4%; merged with the outcome need for ventilation in the tables). In studies conducted in the hospital setting, need for hospital admission at a specific follow-up timepoint, refers to the proportion of patients who remain inpatients at that timepoint. Similarly, for studies conducted in the ICU, and the need for ICU admission. In this category, we also included composite outcomes consisting of one of the above outcomes and mortality (e.g., need for ICU admission or death), as these composite outcomes focus on the need for a higher level of care, while death is added to account for patients who decease before accessing the higher level of care, or those who are not eligible for higher level of care due to their baseline clinical status. Such approaches could be crucial to account for bias, especially in situations such as the COVID-19 pandemic, when hospitals and ICUs are over-burdened and not infrequently unable to accommodate a significant proportion of the patients, leading to the introduction of stricter criteria for triaging patients. Moreover, some outcomes in this category also evaluate time-to-higher level of care (e.g., time-to-hospital admission). 2. Duration of stay in a specific level of care (469 outcomes): Of those, 206 (43.9%) focus on the length of hospital stay, 96 (20.5%) on the length of ICU stay, and 167 (35.6%) on the duration of invasive ventilation. Delays in discharging patients who are medically optimized due to social or other reasons could introduce bias in the outcome length of hospital stay. To account for this issue, 11 outcomes are defined as the time to discharge or to a NEWS ≤2, maintained for 24 h and another outcome as the time until participants are deemed medically optimized for discharge by a clinician. 3. Need for supplemental oxygen or NIV: This category includes 105 outcomes evaluating the need for supplemental oxygen or NIV in any setting. Most evaluate the need for supplemental oxygen administration at specific follow-up timepoints; 34 (32.4%) outcomes assess the need for NIV (including continuous positive airway pressure [CPAP] or bilevel positive airway pressure [BiPAP]), and 21 (20.0%) the need for high-flow oxygen. One outcome evaluates the need for domiciliary oxygen after hospital discharge. 4. Duration of supplemental oxygen or NIV (95 outcomes): Twelve (12.6%) evaluate the duration of NIV, and seven (7.4%) evaluate the duration of high-flow oxygen. 5 . Need for other organ support (other than invasive ventilation, 44 outcomes): 26 (59.1%) outcomes focus on the need for vasopressors, and 18 (40.9%) for renal replacement therapy. 6. Other outcomes: Here, we grouped 145 outcomes that could not be categorized in the previous categories and were evaluated in <10 RCTs each. Need for concurrent treatments is assessed in 22 outcomes, including 7 that specifically focus on the administration of antibiotics. Exercise capacity is assessed by 13 outcomes (mostly using the 6-minutes walking test), COVID-19 transmission by 9, resource requirements, and costs by 8 outcomes. Other outcomes include the use of prone positioning, ability to perform activities of daily living, incidence, and progression of cytokine storm syndrome, resilience, lost workdays, and discharge destinations. Planned follow-up for all included studies varies from less than a week, to over a year (Figure 1 , Figure A4 ). However, in most cases, it does not exceed one month (263/415 63.4%). Follow-up exceeds four months only in 50 (12.0%) studies and one year only in one. Follow-up plans do not differ between phase 2 and later phase trials, where they are limited to one month or less in 105/178 (59.0%) and in 158/237 (66.7%) trials, respectively. Longer-term follow-up, exceeding 4 months, is planned for 163 outcomes (Figure 2, Figure A5 ), evaluating mortality (16 outcomes), adverse events (15) , life impact (12), severity scores (12) , length of hospital stay (11) , viral detection and load (11) , inflammatory biomarkers (7), pulmonary function/physiology (6), need for ventilation (5) , and duration of ventilation (5) . 6. Other outcomes: Here, we grouped 145 outcomes that could not be categorized in the previous categories and were evaluated in <10 RCTs each. Need for concurrent treatments is assessed in 22 outcomes, including 7 that specifically focus on the administration of antibiotics. Exercise capacity is assessed by 13 outcomes (mostly using the 6-minutes walking test), COVID-19 transmission by 9, resource requirements, and costs by 8 outcomes. Other outcomes include the use of prone positioning, ability to perform activities of daily living, incidence, and progression of cytokine storm syndrome, resilience, lost workdays, and discharge destinations. Planned follow-up for all included studies varies from less than a week, to over a year (Figure 1 , Figure A4 ). However, in most cases, it does not exceed one month (263/415 63.4%). Follow-up exceeds four months only in 50 (12.0%) studies and one year only in one. Follow-up plans do not differ between phase 2 and later phase trials, where they are limited to one month or less in 105/178 (59.0%) and in 158/237 (66.7%) trials, respectively. Longer-term follow-up, exceeding 4 months, is planned for 163 outcomes (Figure 2, Figure A5 ), evaluating mortality (16 outcomes), adverse events (15) , life impact (12) , severity scores (12) , length of hospital stay (11) , viral detection and load (11), inflammatory biomarkers (7), pulmonary function/physiology (6), need for ventilation (5) , and duration of ventilation (5). In this methodological survey, we analysed the outcomes and outcome measurement instruments used in 415 RCTs evaluating therapeutic interventions for COVID-19. We identified a remarkable heterogeneity in the selection of outcomes, that is not unexpected given that these trials were designed within a few months from the emergence of the new coronavirus strain. More specifically, only 64.6% and 48.4% of the studies evaluate mortality and adverse events, respectively, while each of the remaining outcomes is assessed by markedly less than half of the studies. Variability was also observed in the choice of instruments used to measure different outcomes. Ordinal clinical severity scales were consistently used across the included studies to assess treatment success or failure and disease severity. Given the acute nature of COVID-19, and significant changes in the clinical status of patients in the course of the disease, such scales can effectively capture disease progression, especially in more severe presentations. Most of these scales follow the structure of the WHO scale, removing scale points for simplicity. Despite sharing a similar structure, these scales group patients differently, limiting interpretability and comparability. The WHO recently introduced a revised 11-point Scale, with increased granularity, and it would be advisable for all studies to align relevant outcomes with this revised scale, to improve interpretability and comparability [12] . To evaluate treatment success or failure, most studies used a 2-point change in the ordinal scale as a threshold, that corresponds to a significant change in the clinical status of the patient and this seems appropriate. Our study revealed a lack of focus on the long-term sequelae of SARS-CoV2 infection. The planned study follow-up exceeds four months only in 12% of all studies. Moreover, only 13 trials assess life impact beyond the acute phase, while exercise capacity is assessed by 13 trials, and the ability to perform simple daily activities during convalescence in only four trials. Only seven trials stated an intent to explore the development of pulmonary fibrosis. However, persistent symptoms, such as fatigue or breathlessness, and quality of life deficits are detected in many hospitalized patients, two to three months after discharge [21] [22] [23] . Moreover, fibrotic changes are detected in about one in three survivors of a hospitalization for COVID-19 infection [24, 25] . However, it should be noted that we evaluated RCTs registered by May 2020 and longer-term follow-up may have been planned for newer studies, in view of the emerging data. While this study did not focus on the analytical approaches used for evaluating outcomes, we observed that several studies described specific approaches to account for the bias introduced by mortality as a competing factor for other outcomes, including the duration of hospital stay, ICU stay and the duration of respiratory support. Several methods were described to account for this bias. Some studies stated the duration of hospital or ICU stay will be censored for deceased participants, while others assessed the days that participants are alive and out of hospital or ICU, instead. Homogenization and detailed description of the analytical approaches in the study protocols, along with the outcomes and outcome measurement instruments are crucial for increasing transparency and comparability. Future methodological studies should address analytical approaches. Four core outcome sets have already been published, with overlapping but not identical selection of components. The WHO Working Group on the Clinical Characterisation and Management of COVID-19 infection recommends the minimal use of three outcomes: mortality, viral burden and non-mortal clinical outcomes evaluated using the WHO clinical progression scale [12] . WHO also highlighted the need for a longer follow-up, of at least 60 days, to capture disease mortality, which is not adopted by most identified trials. Two other groups prioritized specific outcomes and measurement instruments, all of which were captured in our analysis, but were not necessarily the most frequently used [10, 11] . The last core outcome set prioritized broader domains to be addressed, rather than specific outcomes [9] . These domains encompass most outcomes identified in this methodological review. The same group also highlighted the need to evaluate the impact of COVID-19 on patient status and life impact in the longer term. Looking across these core outcome sets, a meta-core outcome set (meta-COS) was identified, only including the two domains that were prioritized by all initiatives (mortality and respiratory support), as the most critical, to be evaluated in all future RCTs in hospitalized patients [26] . Both domains recommended by the meta-COS were evaluated in 205 (49.4%) of the included studies. In view of the multiple available core outcome sets, the authors of this review believe that outcomes selection for future trials should (i) adhere to the recommendations by the WHO and the meta-COS, and (ii) attempt to address all of the domains proposed by Tong et al., a core outcome set that was informed by consensus of >9000 participants [9] . Undeniably, the objectives of individual trials vary and, accordingly, additional outcomes could be selected to address specific trial objectives. However, evaluating the most pertinent outcomes summarized in the previously mentioned core outcomes could improve the interpretability and comparability of their results. Methodological systematic reviews were conducted as part of the development of three core outcome sets. However, these reviews were almost exclusively based on studies conducted in China. Moreover, two of these reviews included approximately 100 RCT protocols [10, 11] , while the WHO document was informed by 1135 protocols, including both observational and interventional studies [12] . However, the outcomes of RCTs often differ from those selected in observational studies. Our methodological review was based on a globally representative sample of 415 RCTs, it employed more rigorous methodology to assess all outcomes, and it is the first review to evaluate the instruments used to evaluate the different outcomes beyond mortality. Our study only included clinical trials that were registered until May 2020 and this may be a limitation as trial designs and endpoints may have evolved since then, in view of the emerging knowledge on the nature and outcomes of COVID-19 infection, and the published core outcome sets. Importantly, the study protocols of some of the included RCTs have been amended since then and our methodological systematic review is a snapshot of the RCT designs and plans as of May-August 2020. Moreover, we only evaluated studies registered with the U.S. National Library of Medicine clinical trials register (ClinicalTrials.gov). However, our extensive, globally representative sample of 415 ongoing RCTs was a major strength of our methodological survey and we strongly believe it was sufficient to capture all relevant outcomes and measurement instruments. Characteristically, after extracting data from approximately 25% and 70% of the included trials, we reached saturation with regards to the outcome categories and the outcome measurement instruments, respectively. Therefore, we are confident that we have not missed important outcomes by focusing exclusively on clinicaltrials.gov. Future studies will need to assess the impact of the emerging evidence on the natural history and outcomes of COVID-19 and of the four published core outcome sets and the meta-COS on the selection of outcomes in more recently registered trials. Another limitation of our study is the lack of a prospectively registered protocol. However, we have used rigorous methodology recommended by the COMET Initiative, that we have previously employed in similar methodological systematic reviews [13] . Overall, this methodological survey reveals significant heterogeneity in the outcome categories and measurement instruments selected by trialists in the management of COVID-19 and highlights the need for greater consistency, to enable decision-makers to compare and contrast studies. Supplementary Materials: The following are available online at http://www.mdpi.com/2075-1729/10/12/350/s1, Table S1 . Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) Checklist. Verona, outside the submitted work. T.F. reports personal fees from Theravance Biopharma, Gilead and Menarini, outside the submitted work. J.V. reports grants and personal fees from Boehringer Inghelheim and personal fees from AstraZeneca, Chiesi, GlaxoSmithKline and Novartis, outside the submitted work. The remaining authors do not have any CoIs. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. GlaxoSmithKline, Glenmark, Menarini, Mundipharma, Novartis, Peptinnovate, Pfizer, Pulmatrix, Therevance and Verona, outside the submitted work. T.F. reports personal fees from Theravance Biopharma, Gilead and Menarini, outside the submitted work. J.V. reports grants and personal fees from Boehringer Inghelheim and personal fees from AstraZeneca, Chiesi, GlaxoSmithKline and Novartis, outside the submitted work. The remaining authors do not have any CoIs. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. Figure A2 . The association between the number of outcomes reported in each RCT and the study population. Having observed that some trials listed numerous inflammatory or other biomarkers as distinct outcomes, for each RCT we have summarized inflammatory biomarkers as a single outcome and other biomarkers as another single outcome. Figure A3 The epidemiology and pathogenesis of coronavirus disease (COVID-19) outbreak Review of trials currently testing treatment and prevention of COVID-19 COVID-19 Clinical Trials: Unravelling a Methodological Gordian Knot Can a core outcome set improve the quality of systematic reviews?-A survey of the Co-ordinating Editors of Cochrane Review Groups Guidance production before evidence generation for critical issues: The example of COVID-19 The COMET Handbook: Version 1.0. Trials Core Outcome Set-STAndardised Protocol Items: The COS-STAP Statement Core Outcome Set-STAndards for Reporting: The COS-STAR Statement Core Outcomes Set for Trials in People With Coronavirus Disease Core Outcome Set for Clinical Trials of COVID-19 Based on Traditional Chinese and Western Medicine WHO Working Group on the Clinical Characterisation and Management of COVID-19 Infection. A minimal common outcome measure set for COVID-19 clinical research Outcomes reported on the management of COPD exacerbations: A systematic survey of randomised controlled trials Core Outcome Set for the management of Acute Exacerbations of Chronic Obstructive Pulmonary Disease. The COS-AECOPD ERS Task Force study protocol Reporting of outcomes in gastric cancer surgery trials: A systematic review A systematic evaluation of the diagnostic criteria for COPD and exacerbations used in randomised controlled trials on the management of COPD exacerbations A taxonomy has been developed for outcomes in medical research to help improve knowledge discovery World Health Organization. Novel Coronavirus COVID-19 Therapeutic Trial Synopsis; WHO R&D Blueprint The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine A national early warning score for acutely ill patients Post-discharge persistent symptoms and health-related quality of life after hospitalization for COVID-19 Postdischarge symptoms and rehabilitation needs in survivors of COVID-19 infection: A cross-sectional evaluation Persistent Symptoms in Patients After Acute COVID-19 Abnormal pulmonary function in COVID-19 patients at time of hospital discharge Respiratory follow-up of patients with COVID-19 pneumonia The 'Meta-Cos' for Research in COVID-19 Hospitalised Patients Life 2020, 10, x FOR PEER REVIEW