key: cord-0772163-40s4y1fe
authors: nan
title: Systematic Review of Artificial Intelligence in Acute Respiratory Distress Syndrome for COVID-19 Lung Patients: A Biomedical Imaging Perspective
date: 2021-08-11
journal: IEEE J Biomed Health Inform
DOI: 10.1109/jbhi.2021.3103839
sha: ac0c73b293522715f3da5166c9cb5fcd954e426e
doc_id: 772163
cord_uid: 40s4y1fe

SARS-CoV-2 has infected more than ∼165 million people worldwide, causing Acute Respiratory Distress Syndrome (ARDS), and has killed ∼3.4 million people. Artificial Intelligence (AI) has been shown to benefit biomedical imaging, such as X-ray and Computed Tomography, in the diagnosis of ARDS, but AI-based systematic reviews (aiSR) remain limited. The purpose of this study is to understand the Risk-of-Bias (RoB) in non-randomized AI trials for handling ARDS using the novel AtheroPoint-AI-Bias (AP(ai)Bias). Our hypothesis is that, for a study to be accepted as low RoB, it must achieve a mean score of 80%. Using the PRISMA model, the 42 best AI studies were analyzed to understand the RoB. Using the AP(ai)Bias paradigm, the top 19 studies were then chosen using a raw cutoff of 1.9, obtained from the intersection of the cumulative plot of "mean score vs. study" and the score distribution. Finally, these studies were benchmarked against the ROBINS-I and PROBAST paradigms. Our observations showed that AP(ai)Bias, ROBINS-I, and PROBAST placed only 32%, 16%, and 26% of the studies, respectively, in the low-moderate RoB zone (cutoff > 2.5); however, none of them met the RoB hypothesis. Further, the aiSR analysis offers six primary and six secondary recommendations for non-randomized AI for ARDS. The primary recommendations for improving AI-based ARDS design include (i) comorbidity, (ii) inter- and intra-observer variability studies, (iii) large data size, (iv) clinical validation, (v) granularity of COVID-19 risk, and (vi) cross-modality scientific validation. AI is an important component in the diagnosis of ARDS, and these recommendations must be followed to lower the RoB.

COVID-19, or Coronavirus disease, was declared a "public health emergency of international concern" or "pandemic" by the International Health Regulations Emergency Committee of the World Health Organization (WHO) on January 30, 2020.
As of 20 May 2021, WHO statistics showed that more than 165 million people had been infected, causing Acute Respiratory Distress Syndrome (ARDS), and nearly 3.4 million had lost their lives to this virus [1]. There is a dire necessity to flatten the pandemic curve and prevent this severe illness during the "long-COVID-19" period (beyond the COVID-19 era). The SARS-CoV-2 virus directly affects the human lungs, traveling through the respiratory system and into the body [2]. However, the mutated ribonucleic acid (RNA) present in the virus makes it difficult to treat the infected patient. As per the Journal of the American College of Cardiology (JACC), cardiac troponin may help determine the risk of myocarditis [3], signaling a positive COVID-19 diagnosis [4]. Imaging, therefore, also plays a vital role in predicting and validating the severity of the infection [5]; furthermore, the patient's vital clinical information improves the ability to predict severity and lowers the mortality rate [6].

Artificial Intelligence (AI) has been helpful in combating such diseases because of its ability to model extensive, non-linear covariates against COVID-19 deaths in a big-data framework. In previous pandemics, models were developed to flatten their mortality curves. For example, the Zika epidemic [7], the Influenza type A (H1N1) pandemic [8], and the Chikungunya epidemic [9] showed a correlation between telemedicine-based data streams and the pandemic curve. More recently, similar telemedicine models have been adopted in China [10]. Therefore, we firmly believe that predicting the severity of COVID-19 using AI-driven computational models will be significantly useful in addressing the lack of software "verification, scientific and clinical validation" (discussed in section VII.B) capabilities worldwide. Note that AI-based COVID-19 severity is determined either by (i) classifying COVID-19 pneumonia patient scans against controls or other kinds of pneumonia, or by (ii) locating the diseased region in the lung scan(s). Ground Glass Opacities (GGO) can be used to validate COVID-19 severity. This study is focused on (a) ARDS, the lung gas-exchange disorder caused by SARS-CoV-2, and (b) imaging of infected lungs using Computed Tomography (CT) and Chest X-rays (CXR).

In 2020, there were six AI-based systematic reviews (aiSR) on ARDS [11]-[16]. However, they are incomplete, not well focused, and lack practical recommendations for safe and effective AI design for ARDS analysis. A detailed comparison among the six aiSR is presented in the benchmarking section VII.A. In general, there are few aiSR that rank these studies, compute their mean scores, and determine the AI studies with low RoB. Further, this aiSR establishes a link between AtheroPoint's artificial intelligence-based Bias (AP(ai)Bias) and previous RoB paradigms such as the Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) and the Prediction model Risk Of Bias ASsessment Tool (PROBAST) for handling ARDS via CT or CXR. This aiSR uses the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) model for study selection. The selected studies are then analyzed to understand the role of AI in the detection of COVID-19 severity in radiological images and to evaluate the RoB using AI attributes.
This aiSR then presents the pathophysiology of COVID-19 leading to ARDS [17], followed by the different AI techniques used for lung segmentation and classification of disease severity. Furthermore, to help understand the outcomes of non-randomized trials, we propose AP(ai)Bias, a novel bias-estimation method that rates each of ten AI attributes, computes cumulative and mean scores, and ranks the selected studies. AP(ai)Bias is then compared against qualitative paradigms such as ROBINS-I and PROBAST. Lastly, the aiSR presents several significant recommendations for lowering the RoB in AI design for ARDS.

A detailed search was performed using PubMed, IEEE Xplore, ScienceDirect, ArXiv, and Google Scholar. The keywords used for selecting studies were COVID-19, ARDS, deep learning, lung segmentation, classification, lung CT, X-ray, and AI. Fig. 1 shows the PRISMA model consisting of the studies used in this review. A total of 339 studies were identified, and duplicates were removed using the "Find Duplicates" feature in EndNote software by Clarivate Analytics [18], retaining 304 records. The three exclusion criteria were (i) studies not related to AI, (ii) non-relevant articles, and (iii) insufficient data. These excluded 55, 56, and 104 studies (marked as E1, E2 (non-AI, but COVID-19), and E3 in Fig. 1), leading to a final selection of 89 studies.

We hypothesize that "non-randomized Artificial Intelligence-based attributes can (a) detect, (b) classify, and (c) estimate the severity of COVID-19 risk, and (d) meet the performance standards in lung-infected ARDS patients." Three acceptability criteria were set: (i) for the AP(ai)Bias-based ranking method, the mean score must be greater than or equal to 80% for an AI-based study, taking all AI attributes into consideration [19]. This threshold was based on the consensus of the experienced team, with each AI attribute graded into five classes according to its strength (low, moderate, high-moderate, low-of-a-high, and high-of-a-high), scored from 1 to 5. Similarly, for the (ii) ROBINS-I and (iii) PROBAST paradigms, the acceptability criterion requires a score of 80% or above for an AI-based study to be in the low-RoB zone [19], [20].

The number of publications and cases discussing ARDS caused by SARS-CoV-2 has been increasing over time [21]. The first set of cases in Wuhan, China, reported COVID-19 patients hospitalized for lower respiratory tract (LRT) complaints [22]. These reports also indicated that the symptoms of COVID-19 are incredibly diverse, ranging from minimal LRT symptoms to significant hypoxia due to ARDS [22]. Further, Huang et al. [23] reported that the time gap between the onset of minimal LRT symptoms and ARDS was as short as nine days, suggesting that minimal LRT symptoms can progress rapidly in these patients. Recent studies support that, in COVID-19 patients, ARDS occurs at higher rates than extrapulmonary complications [24]-[26]. The pathophysiology of ARDS in COVID-19 patients begins with SARS-CoV-2 entering the lung by aerosol transmission [27], as outlined in Fig. 2 (note that numbers 1-12 correspond to labels D1-D12). The attachment of SARS-CoV-2 to the host cells occurs via the anchoring of its virion spike proteins (S1 and S2) to the angiotensin-converting enzyme 2 (ACE2) receptor on the surface of the type 2 pneumocytes on the pulmonary alveolar epithelium (D1). This causes respiratory symptoms to present as the earliest clinical manifestation of COVID-19 [28].
Due to the infection, the inflammatory process begins and leads to the production of inflammatory mediators [29], [30] (D2). These inflammatory mediators stimulate alveolar macrophages to produce polymorphonuclear neutrophils (PMNs) and cytokines such as IL-1, IL-6, and TNF-a (D3, D4a, and D4b). Additionally, cytokine hyperproduction causes a cytokine storm. The sequence of systemic inflammatory response, cytokine storm, and multiple organ failure plays a critical role in ARDS development [31]. The same process of cytokine storm and ARDS development was also observed with previous coronaviruses [32]. The produced PMNs recruit platelets and form neutrophil extracellular traps (NETs), which cause endothelial dysfunction [33]. NETs are web-like structures of deoxyribonucleic acid (DNA) and proteins created after PMN activation and infection. The key enzymes that help develop NETs are neutrophil elastase, type 4 peptidyl arginine deiminase, and gasdermin-D [34]. Although NETs have a beneficial role in host defense against pathogens, they can also be detrimental by facilitating microthrombosis, resulting in permanent organ damage to the lung, heart, and kidney [35]. Additionally, the cytokine storm causes endothelial dysfunction and creates gaps between cells, increasing vascular permeability [22], [36] (D5, D7, and D8). Increased vascular permeability causes fluid leakage, resulting in increased alveolar space and diffuse inflammatory alveolar exudate, causing alveolar edema [37]-[39] (D8 and D9), typically seen in CT lung scans. Alveolar edema then leads to increased alveolar surface tension, a critical feature of diffuse alveolar atelectasis (i.e., damage or collapse) [40], [41] (D10 and D11). Following alveolar collapse and ventilation-to-perfusion mismatch, ARDS occurs, resulting in impaired carbon dioxide excretion due to the increased alveolar dead space [42] (D12).

For a comprehensive RoB analysis, it is customary to investigate the basic building blocks of the ARDS pipeline. As shown in Fig. 3, the two major components are lung segmentation and COVID-19 severity classification. We briefly examine the statistical distribution of AI models and the AI architectures for these two components. Even though PRISMA selected 89 studies, only 42 were AI-based studies [43]-[84]. AI-based image classification occurred in 85% [43]-[50], [52], [53], [55], [56], [58]-[62], [64]-[70], [73]-[80], [83] of the selected studies, while lung segmentation, with or without classification, accounted for 3% [51] and 12% [57], [63], [71], [72], [84], respectively (Fig. 4(a)). In terms of risk granularity, AI based on binary classification (BC) accounted for 46%, the multiclass (MC) classification paradigm for 39%, and the hybrid paradigm (combination of segmentation and classification) for 15% (Fig. 4(b)). Most of the studies used 2-D imaging (82%) [43]-[52], [54], [57]-[62], [64]-[69], [71]-[80], while others used 3-D (12%) [53], [63], [70], [83], [84], and 6% [55] used both 2-D and 3-D (Fig. 5(a)). Of the total images used, 51% were 2-D, only 1% were 3-D, and 48% used both (Fig. 5(b)).
The landscape of AI models for image classification consisted mainly of machine learning (ML), deep learning (DL), transfer learning (TL), recurrent learning (RL) [4], and amalgamations of these learning paradigms [85] (Fig. 6). The granular division of the AI-based classification paradigm consisted of DL (50%) [43], [46], [47], [49], [50], [52], [53], [55], [56], [59]-[61], [64]-[66], [69], [78]-[80], [83], DL and TL (12%) [67], [68], [75]-[77], TL (10%) [44], [46], [58], [74], DL and ML (5%) [48], [73], ML (5%) [45], [62], and RL combined with DL (3%) [70]. Note that only six studies (15%) [51], [57], [63], [71], [72], [84] focused on lung segmentation.

Lung segmentation is the process of extracting the lung region from CXR projections or CT slices [86]. In the 2-D paradigm, (i) Rajaraman et al. [72] implemented a UNet architecture with Gaussian dropout on CXR (Fig. 7). As the shape resembles the letter U, the contraction path (also known as the encoder) is the left arm of the U and captures the context of the image using traditional convolutions and max-pooling layers. The expanding path (also known as the decoder) is the right arm of the U and enables precise localization using transposed convolutions (a minimal sketch of this encoder-decoder structure is given at the end of this subsection). The depth of the UNet is the number of layers in the architecture and drives its performance, with a trade-off between accuracy and computational cost [72]. (ii) Oh et al. [57] implemented the FC-DenseNet103 transfer-learning model (architecture shown in Fig. 8 and results shown in Figure A (left), see Supplemental D), which offered the advantage of saving time by using pre-trained weights. In the 3-D paradigm, Wang et al. [63] implemented DenseNet121-FPN (Figure A (right), see Supplemental D), the only study that performed 3-D lung segmentation using CT volumes (Fig. 9). In FC-DenseNet models, whether 2-D or 3-D, the feature maps created by the preceding dense block are up-sampled, which limits the number of computations and parameters. Note that all the above architectures adopted skip connections between the down-sampling and up-sampling paths for transferring feature maps. 2-D CXR is preferred over 3-D CT volume imaging due to lower cost. An in-depth comparison between 2-D and 3-D segmentation is shown in Table I.

The pipeline for ARDS diagnosis consists of the classification of lung scans based on COVID-19 risk severity. AI models trained on ground truth with two classes form the binary-class (BC) framework, while models using multiple ground-truth classes yield the multiclass (MC) framework. Studies that adopted both segmentation and classification (using either BC or MC) were categorized as hybrid class (HC). The statistical distribution between the three types of classes was 46%, 39%, and 15%, respectively. Most of the studies that conducted BC used CXR [44], [75], [76], [87], while MC studies (such as DenseNet [52] and truncated Inception Net [46] for CXR, COVNet (a modified ResNet-50 architecture) [83] and COVIDNet-CT [79] for CT, and VGG-16 for ultrasound [74]) adopted transfer learning (TL) architectures with 5-fold to 10-fold cross-validation paradigms (Figure B, see Supplemental D). The best accuracies of 96% in BC and 99% in MC were obtained by Brunese et al. [44] and Gunraj et al. [79], respectively. A unique observation was seen in Ozturk et al. [59], who validated the COVID-19 output by superimposing colored Grad-CAM heat maps on the grayscale CXR [59].
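As referenced above, the following is a minimal, illustrative PyTorch sketch of a U-shaped encoder-decoder with skip connections; the channel widths, depth, and absence of dropout are arbitrary choices made here for brevity and do not reproduce the configurations used in the cited studies.

```python
# Minimal PyTorch sketch of a U-shaped encoder-decoder with skip connections,
# illustrating the contraction/expansion structure described above. Channel
# widths and depth are illustrative, not those of the cited studies.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, n_classes=1):
        super().__init__()
        self.enc1 = conv_block(in_ch, 32)      # contraction path (encoder)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)   # expansion path (decoder)
        self.dec2 = conv_block(128, 64)        # 128 = 64 (up-sampled) + 64 (skip)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)
        self.head = nn.Conv2d(32, n_classes, 1)  # per-pixel lung-mask logits

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)

# A 256x256 CXR tile in, a lung-mask logit map of the same size out.
print(TinyUNet()(torch.randn(1, 1, 256, 256)).shape)  # torch.Size([1, 1, 256, 256])
```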
In HC, three studies used 2-D CXR [57], [59], [72], while two studies used 3-D CT [53], [63]. Wu et al. [64] implemented a new HC multi-view fusion CT technique using a modified ResNet-50 on a cohort of 495 patients, demonstrating an accuracy of 83.33% and an AUC of ∼0.905 (p<0.001), which was 6% better than the single-view paradigm. Oh et al. [57] also showed that better classification performance could be achieved when FC-DenseNet103 was trained on the segmented infected lung region in 2-D CXR. It is, however, worth exploring the factors that affect the performance of AI models, which we discuss next.

Image quality plays an important role in computer-aided diagnosis (CAD) performance [86], [88], [89]. CT image quality can be fuzzy or degraded due to the radiation dosage [90]. Further, in high COVID-19 severity there is fluid leakage or alveolar edema, which causes lung scans to show a hyperintensity distribution (due to high consolidation and ground-glass opacity). Digital CXR is preferred over conventional CXR due to its high-resolution imaging [92]. Several methods have been designed to assess the signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) in CAD-based static and motion imagery [93]. Image registration is essential during image quality assessment [94], [95]. However, the effect of image quality assessment on AI for COVID-19 has not been well researched. The AI studies used in this aiSR did not demonstrate AI performance under image degradation. The lack of image quality testing, or of assessing its effect on AI performance, lowers the overall score estimation during the comprehensive evaluation.

Comparison of the AI models in the studies was based on accuracy, sensitivity (SEN), specificity (SPE), F1-score, precision, and recall metrics. Missing data were calculated from the studies' existing data, such as SEN, SPE, patient size, and sometimes the true-positive rate. The mean accuracy of the extracted AI models was 93.05±5.7%. The best accuracy was 98% [59], while sensitivity and specificity had means of 92.01±10.25% and 89.80±15.42%, respectively. All the studies combined achieved a mean F1-score and precision of 93.72±6.36% and 93.51±4.45%, respectively. Since these performance metrics show high values, these attributes make strong contributions to the overall score when computing AP(ai)Bias. Note that the accuracies computed above take into account the augmentation techniques used during the data preparation process (also shown in recent studies [96]-[98]).

Our novel ranking strategy, AP(ai)Bias, helps to identify AI-based studies that are comprehensive, complete, error-free, safe, and effective. In contrast, ROBINS-I [99] and PROBAST [100] help identify factors that contribute to RoB. This section is therefore focused on three kinds of paradigms for RoB in non-randomized AI for ARDS. We considered ten attributes for AI evaluation in the AP(ai)Bias model (a minimal scoring sketch follows the list below): (i) segmentation and classification, (ii) cross-validation (CV) protocol, (iii) inter- and intra-observer variability (IIV), (iv) benchmarking, (v) scientific validation (SV) and clinical evaluation (CE), (vi) learning paradigm (LP), (vii) data preparation (DP), (viii) data size, (ix) performance evaluation (PE), and (x) innovation.
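The following is a minimal sketch, not the authors' implementation, of how the AP(ai)Bias grading, acceptability criterion, and ranking described above can be operationalized: each of the ten attributes is graded from 1 (low) to 5 (high-of-a-high), the grades are averaged, studies are ranked by mean score, and a study passes the 80% acceptability criterion only if its mean reaches 80% of the maximum grade (i.e., 4.0). All the grades in the example are placeholders.

```python
# Minimal sketch (not the authors' implementation). Each study is graded 1-5 on the
# ten AP(ai)Bias attributes listed above; the mean grade drives both the acceptability
# check (>= 80% of the maximum, i.e., >= 4.0) and the low-bias-to-high-bias ranking.
# All numbers below are illustrative placeholders, not the published scores.
import numpy as np

ATTRIBUTES = ["seg/class", "CV protocol", "IIV", "benchmarking", "SV/CE",
              "learning paradigm", "data preparation", "data size",
              "performance evaluation", "innovation"]
MAX_GRADE = 5

def mean_score(grades):
    assert len(grades) == len(ATTRIBUTES)
    return float(np.mean(grades))

def passes_acceptability(grades, threshold=0.80):
    return mean_score(grades) >= threshold * MAX_GRADE

# Hypothetical grades for three studies
studies = {"Study-A": [4, 3, 1, 2, 3, 5, 4, 3, 4, 3],
           "Study-B": [5, 4, 3, 4, 4, 5, 4, 4, 4, 4],
           "Study-C": [2, 2, 1, 1, 2, 4, 3, 2, 3, 2]}

ranking = sorted(studies, key=lambda s: mean_score(studies[s]), reverse=True)
for s in ranking:  # low-bias (high score) to high-bias (low score)
    verdict = "accept" if passes_acceptability(studies[s]) else "reject"
    print(s, round(mean_score(studies[s]), 2), verdict)
```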
Note that the DP attribute was further subdivided into (vii.a) the augmentation protocol [96], [97], [101], (vii.b) region-of-volume selection, given the CT volume, (vii.c) format conversion declaration, (vii.d) manual tracings for the ground truth or gold standard used to generate binary shapes, and (vii.e) demonstration of baseline characteristics. Each attribute (i-x) can earn up to a maximum of five points, as presented in the section "Hypothesis and the Acceptability Criteria." The value obtained for each attribute is pro-rated based on the threshold adopted for that attribute. These attribute values are then added, leading to an accumulated score. Finally, the 42 scores (corresponding to the 42 studies) were ranked in decreasing order (low-bias to high-bias). The raw cutoff (1.9) for the selection of AI studies was determined from the intersection of the "cumulative plot of mean score vs. studies" and the "descending-order distribution of the studies' scores" (Fig. 10).

Table V shows the AP(ai)Bias model-based ranking of the best 19 AI-based studies taken from the pool of 42 studies. Colors were assigned according to rank: green for the high-rank group (low-bias), yellow for mid-rank (moderate-bias), and red for low-rank (high-bias). Note that the maximum score a study can obtain is 50 (10 attributes multiplied by a maximum of 5 points each). Applying the above strategy to all ten attributes of the 19 studies, the top three contenders were Wu et al. [64], Alberto et al. [71], and Ouchicha et al. [58]. To interpret these results, we analyzed the mean scores for each of the ten attributes over all 19 studies (see Table V: C1 to C10). The mean and standard deviation of these attributes are shown in Fig. 11, which can also be used to compute the percentage contribution of each attribute. (i) The learning paradigm (C6) was the highest scorer among all the attributes, since 86% of the studies were supervised. (ii) Inter- and intra-observer variability (C3, labeled IIV) showed the lowest mean value among all the studies, demonstrating that this aspect of the evaluation of AI techniques was undermined and inconclusive. (iii) Data size (C8) attained a mean value of 2.7, meaning that most of the data sizes were in the range of 100 to 500 subjects. (iv) Four attributes, namely segmentation and classification (C1), benchmarking (C4), scientific validation and clinical evaluation (C5), and performance evaluation (C9), contributed 44%, 26%, 45.5%, and 30%, respectively. This clearly shows that the segmentation and classification were mostly binary. Only 26% of the studies implemented benchmarking. Among all the studies, 45.5% performed clinical evaluation with a radiologist or used a pre-evaluated cohort for training. The performance evaluation attribute (C9) scored 31.5%, indicating that the accuracy of the AI models was around 93%. Our observation showed that only 6 studies ([49], [58], [59], [64], [66], [71]) (Table V) were in the low-bias pool (bias cutoff greater than 2.5, C12, Table V), i.e., (6/19) × 100 ≈ 32%, passing the hypothesis.

The objective of ROBINS-I is to mimic randomization in non-randomized studies. It covers seven distinct attributes (domains) divided into three intervention phases (marked parameters) through which bias can be studied: (a) "Pre-Intervention," (b) "At-Intervention," and (c) "Post-Intervention."
Table II shows the bias due to (i) confounding factors (data size and data source); (ii) selection of participants (training and testing protocols); (iii) classification of interventions (data augmentation and use of imaging features); (iv) deviations from intended interventions (demographics, multicenter data, and comorbidity); (v) missing data (e.g., SWAB test results); (vi) measurement of outcomes; and (vii) selection of the reported results. Table II (seven columns marked C1 to C7) reports the outcome for the 19 studies ("Study" column). A three-color scheme is adopted to depict the outcome of the qualitative analysis: red means high-bias, indicating a severe issue in the study with respect to the factors considered; yellow depicts moderate-bias, indicating that the study holds good on the given set of non-randomized data; and green means low-bias, as the study performs comparatively well on the testing parameters and the input data. We conclude from ROBINS-I (Table II, "Study" column, shaded in pink) that ∼73% ((14/19) × 100) of the studies have at least one attribute with high-bias. Only three studies ([64], [71], [80]) passed the hypothesis test ((3/19) × 100 ≈ 16%). Further, we also conclude that 60% ((79/132) × 100) of the cells were low-moderate bias (green and yellow) using the cell-division approach.

PROBAST is a popular RoB assessment tool for AI-based prediction models. It uses four attributes, as shown in Table III: (a) participants, covering the source of the image database and whether a radiologist verified it; (b) predictors, covering demographic data availability (yes/no) and imaging features; (c) outcomes, covering whether different datasets were combined and whether the reverse transcription-polymerase chain reaction (RT-PCR) test was conducted for the cohort; and (d) analysis, covering the cohort size, number of COVID-19 patients, optimization techniques used, validation, and innovation in the design. We used the same 19 studies adopted for the AP(ai)Bias ranking analysis (Table V). Using PROBAST, we conclude that ∼47% (9 out of 19) of the studies were high-bias (marked as H in red); some even had an unclear RoB. Five studies ([64], [71], [79]-[81]) out of the 19 selected studies passed the hypothesis ((5/19) × 100 ≈ 26%). Using the cell-division approach, PROBAST showed that 67% ((51/76) × 100) of the cells were low-moderate bias.

The following steps were adopted to generate the Venn diagram (Fig. 12); a minimal sketch of this conversion is given after the steps. (i) Conversion of ROBINS-I (Supplemental A) and PROBAST (Supplemental C) from a qualitative scheme to a quantitative scheme, using conversion scores of 5 for low-bias, 3 for moderate-bias, 1 for high-bias, and 0 for unclear-bias. (ii) Selection of the common studies between (a) ROBINS-I and PROBAST (left: [43], [48], [49], [58], [66], [75], [77]; right: [43], [46], [48], [49], [58], [62], [66], [75], [77]), (b) PROBAST and AP(ai)Bias ranking (left: [48], [75], [77]; right: [43], [46], [48], [62], [75], [77]), (c) ROBINS-I and AP(ai)Bias (left: [48], [62], [75], [77]; right: [43], [46], [48], [62], [75], [77]), and (d) all three paradigms (top: [48], [75], [77]; bottom: [43], [46], [48], [62], [75], [77]). These digital counts are shown in the Venn diagram (Fig. 12) for high-bias and moderate-high bias; the counts are 7, 3, 4, 3 for high-bias, and 9, 6, 6, 6 for moderate-high bias, respectively. (iii) The digital counts are then converted into percentages by normalizing them by the total number of studies (19), as shown in Fig. 12.
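A minimal sketch of step (i) and the overlap computation is given below, under the assumptions stated above (qualitative grades map to 5/3/1/0, and a study whose mean converted score falls below the high-bias cutoff of 2.2 used in Fig. 12 is flagged); the study labels and grades are placeholders, not the actual entries of Tables II and III.

```python
# Sketch of the qualitative-to-quantitative conversion and Venn-overlap counting
# described in steps (i)-(iii). Grades per study are placeholders, not Table II/III data.
GRADE_TO_SCORE = {"low": 5, "moderate": 3, "high": 1, "unclear": 0}
TOTAL_STUDIES = 19

def flagged_studies(grades_per_study, cutoff=2.2):
    """Studies whose mean converted score falls below the (high-bias) cutoff."""
    flagged = set()
    for study, grades in grades_per_study.items():
        scores = [GRADE_TO_SCORE[g] for g in grades]
        if sum(scores) / len(scores) < cutoff:
            flagged.add(study)
    return flagged

robins_i = {"S1": ["high", "high", "moderate"], "S2": ["low", "moderate", "low"]}
probast  = {"S1": ["high", "moderate", "high"], "S2": ["low", "low", "moderate"]}

overlap = flagged_studies(robins_i) & flagged_studies(probast)
print(overlap, f"-> {100 * len(overlap) / TOTAL_STUDIES:.0f}% of the {TOTAL_STUDIES} studies")
```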
These percentages are 39%, 17%, 23%, and 17% for high-bias, and 50%, 33%, 33%, and 33% for moderate-high bias, respectively. The top two low-bias studies [64], [71] were the same in all three paradigms. It is interesting to note that reference [66] was low-bias using AP(ai)Bias, while it was in the moderate-high bias category for ROBINS-I and PROBAST. This is because AP(ai)Bias takes into consideration attributes such as CE, LP, and DP. Note that Fig. 12 uses bias cutoffs of 2.2 and 2.5 for high-bias and moderate-high bias, respectively.

The main contribution of this aiSR was first to show the basic pipeline of the ARDS framework (Fig. 3). The study then showed the statistical distribution, visually, of (a) the different AI models, (b) the image modalities, and (c) the image dimension type (2-D vs. 3-D). The crux of the aiSR was to understand the Risk-of-Bias (RoB) in non-randomized AI trials for handling ARDS using three paradigms: AP(ai)Bias, ROBINS-I, and PROBAST. AP(ai)Bias consists of ten main attributes and several sub-attributes, while ROBINS-I and PROBAST consist of seven and four attributes, respectively. The two qualitative assessments were quantified using the same strategy as in AP(ai)Bias. This framework further allowed us to show how the AP(ai)Bias-based ranking strategy compares against ROBINS-I and PROBAST using high-RoB and low-RoB cutoffs, represented pictorially using a Venn diagram. The study further presented 12 recommendations (six primary and six secondary) for high-RoB studies to achieve better AI designs for ARDS. Finally, the study correlated the pathophysiology of ARDS with the lung damage process represented by GGO.

In terms of imaging modality, CXR [46], [57], [59], [72], [80] is more economical, and more than 50% of the studies used it. GGO has been one of the most common manifestations in CT image volumes, but it may vary from person to person. It has been noted that data from patients with rapid progression of the disease show faster changes in lung lesions; to incorporate this in the AI framework, more data are required to train the AI models, along with follow-up on the patients' condition. The evaluation of the AI models has not been done using randomized trials, so to overcome this we proposed the use of novel methods for RoB analysis using ROBINS-I and PROBAST. Table IV shows a comparison against the previous aiSR, where 12 types of attributes were chosen to compare the five studies [11]-[15]. The proposed study is in the last column, labelled "Suri." Note that a "√" marks places of unique contribution in the "Suri model." We also offer recommendations on clinical validation, inter- and intra-observer variability, comorbidity, and risk granularity.

The primary set of six-point recommendations is as follows. (i) Comorbidity: This recommendation focuses on a novel, intuitive approach that can improve the presently prevailing methods for COVID-19 diagnosis. Studies have shown that comorbidities and risk factors such as age, ethnicity, hypertension, diabetes, higher BMI, respiratory disorders, hyperlipidemia, and obesity lead to worsening of ARDS [102], [103]. In these studies, ARDS is more prominent among older patients and, given a timely prognosis, can be controlled to reduce the risk factors and enhance the efficacy of COVID-19 management. For validation of this data, a randomized controlled trial is necessary in which the data are clinically validated.
With AI techniques and comorbidity factors, this disease's prognosis and diagnosis can be made more accurate. (ii) Inter- and intra-observer variability: Our observation shows that only 12% of the AI-based studies attempted inter- and intra-observer variability analysis, which does not assure that the AI results are robust. Examples of comprehensive IIV can be seen in [104], [105]. Similarly, inter- and intra-operator variability can be computed, ensuring further reliability in clinical settings. (iii) Data Size: About 50% of the AI studies for ARDS had fewer than 500 subjects in total; this could be improved by targeting 1,000 subjects and above. Typically, clinical trials adopted in meta-analyses range into many thousands (>15,000). This also requires conducting "power analysis" for the AI system [106]-[108]. (iv) Scientific Validation and Clinical Evaluation: This requires the engineering AI-based design to be validated by the clinical community in terms of reliability, accuracy, and reproducibility. Therefore, the medical, scientific, and engineering communities need to collaborate more closely. This may also incur more system-design costs should the medical community participate for a longer duration [109], [110]. (v) Risk Granularity: Risk assessment in several fields of medicine is binary; however, such a strategy poses a challenge for drug prescription and better patient care. For this reason, new strategies have evolved recently in which a multiclass framework provides stronger risk granularity, leading to better control of medications and monitoring [111], [112]. This requires multiple classes in the ground-truth design for COVID-19 severity, which means a careful examination of the CT lung images by the radiologist in conjunction with the pulmonologist. The second alternative is to stratify the risk from the AI system's output in continuous values between 0 and 1, thereby partitioning the output into multiple classes (a minimal sketch of such thresholding is given at the end of this section). Note that the class thresholds are based on the baseline characteristics of the input cohort's demographics. (vi) Scientific validation using cross-modality fusion: Scientific validation is crucial for ensuring the AI system's correct functioning. One way is to look at COVID-19 from two different angles, such as imaging of the lung using CT and positron emission tomography (PET). With the advanced technology of combined PET/CT, one can visualize the PET images' metabolic distribution corresponding to the spatial CT [113]. Fig. 13 shows a PET image capturing the functional metabolic distribution of COVID-19. The selected studies did not consider PET/CT fusion as part of a systematic review.

The secondary set of recommendations includes (i) solid model design (training and prediction) [86], [89], [114], (ii) reproducibility [104], (iii) the process of peer review, (iv) high-quality documentation, (v) multi-ethnic and multi-regional data collection, and (vi) multiple ground-truth events for clinical validation [11]. Since the hypothesis (section II.B) was not met under any of the three paradigms, we conclude that the above concrete recommendations are required to improve AI design so that it meets the requirements of the hypothesis while keeping optimal AI performance in mind. Note that in our study the AI solutions did not consider socio-economic causes during the design.
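As referenced in recommendation (v), the following is a minimal, hypothetical sketch of stratifying a continuous AI severity output in [0, 1] into multiple risk classes; the thresholds and labels are placeholders, since the text recommends deriving the thresholds from the cohort's baseline characteristics.

```python
# Sketch for recommendation (v): partitioning a continuous AI severity output in
# [0, 1] into multiple risk classes. The thresholds below are placeholders; the
# text recommends deriving them from the cohort's baseline characteristics.
import numpy as np

def stratify_risk(probability, thresholds=(0.25, 0.50, 0.75),
                  labels=("mild", "moderate", "severe", "critical")):
    """Map a model output in [0, 1] to one of the multiclass risk labels."""
    return labels[int(np.digitize(probability, thresholds))]

for p in (0.10, 0.40, 0.65, 0.90):
    print(p, "->", stratify_risk(p))
# 0.10 -> mild, 0.40 -> moderate, 0.65 -> severe, 0.90 -> critical
```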
Regarding socio-economic factors, a recent study [115] elaborated on the need to include socio-economic conditions for COVID-19; this could be adopted as an extension of the current work. The role of socio-economic conditions in ML design was attempted by our group for neonatal deaths in different counties of Bangladesh [116]. A similar approach can also be adopted for COVID-19 based on geography and socio-economic conditions.

The main strength of this review is the selection of the best 19 AI studies for analyzing RoB. Most of the studies adopted DL as their base architecture because the data source was medical imaging; as a result, DL was best suited for the purpose. We observed that data augmentation was used in 43% of the studies where data were lacking. For the selected set of studies, we successfully analyzed the risk factors using two schemes, ROBINS-I and PROBAST. A gap in the studies was the lack of linkage among COVID-19 severity assessment, clinical validation, variability studies, and cross-modality analysis [117]. Some of the studies did not mention hardware constraints. Moreover, none of them reported FDA 510(k) approval. There is much inconsistency in the studies, which can be fixed by adopting some initial clinical validation on patients using cross-modality. Even RL, TL with cross-modality, and comorbidity can introduce innovation into this life-saving work worldwide. Understandably, this may take more time for the medical community but could shorten the path to better patient care.

This study used the PRISMA model to select 89 studies, which were then AI-filtered to 42. Based on the mean-score ranking, the selection was further refined to 19 studies using a raw cutoff of 1.9. Further, the three RoB paradigms were analyzed using a Venn diagram. The percentages of studies that satisfied the "non-randomized Artificial Intelligence-based" hypothesis for ARDS-based COVID-19-infected lungs were only 32%, 16%, and 26%, corresponding to AP(ai)Bias, ROBINS-I, and PROBAST, respectively. These percentages were obtained using a high-bias cutoff of 2.2 and a moderate-high-bias cutoff of 2.5, respectively. None of the three RoB models met the requirement of the hypothesis. Overall, the aiSR presents a set of six primary and six secondary recommendations for improving the AI design for ARDS: (i) inclusion of comorbidity in the AI design, (ii) increase in data size (to meet the performance standards for ARDS in COVID-19 lung-infected patients), (iii) scientific validation using cross-modality, (iv) conducting clinical validations, (v) improved inter- and intra-observer variability studies, and (vi) risk granularity for better drug prescription. The secondary set of recommendations includes (i) solid model design (training and prediction), (ii) reproducibility of the proposed model, (iii) the process of peer review, (iv) high-quality documentation, (v) multi-ethnic and multi-regional data collection, and (vi) multiple ground-truth events for clinical validation.

References:
WHO COVID-19 Dashboard
Molecular pathways triggered by COVID-19 in different organs: ACE2 receptor-expressing cells under attack? A review
Imaging in COVID-19-related myocardial injury
Integration of cardiovascular risk assessment with COVID-19 using artificial intelligence
COVID-19 pathways for brain and heart injury in comorbidity patients: A role of medical imaging and artificial intelligence-based COVID severity classification: A review
Cardiac troponin for the diagnosis and risk-stratification of myocardial injury in COVID-19: JACC review topic of the week
Identifying protective health behaviors on Twitter: Observational study of travel advisories and Zika virus
The use of Twitter to track levels of disease activity and public concern in the US during the influenza A H1N1 pandemic
Public reaction to Chikungunya outbreaks in Italy-Insights from an extensive novel data streams-based structural equation modeling analysis
Data mining and content analysis of the Chinese social media platform Weibo during the early COVID-19 outbreak: Retrospective observational infoveillance study
Systematic review of artificial intelligence techniques in the detection and classification of COVID-19 medical images in terms of evaluation and benchmarking: Taxonomy analysis, challenges, future solutions and methodological aspects
Coronavirus disease 2019 (COVID-19) CT findings: A systematic review and meta-analysis
Artificial intelligence versus clinicians: Systematic review of design, reporting standards, and claims of deep learning studies
Machine learning for COVID-19 detection and prognostication using chest radiographs and CT scans: A systematic methodological review
Prediction models for diagnosis and prognosis of COVID-19 infection: Systematic review and critical appraisal
Machine learning models for image-based diagnosis and prognosis of COVID-19: Systematic review
Molecular pathways triggered by COVID19 in different organs: ACE2 receptor-expressing cells under attack? A review
EndNote X9
Effect of carotid image-based phenotypes on cardiovascular risk calculator: AECRS1.0
Artificial intelligence framework for predictive cardiovascular and stroke risk assessment models: A narrative review of integrated approaches using carotid ultrasound
Lung recruitability in COVID-19-associated acute respiratory distress syndrome: A single-center observational study
Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China
A review of the coronavirus disease-2019
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: A descriptive study
Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: A retrospective cohort study
COVID-19: Airborne transmission is being underestimated, warn experts
COVID-19 and the cardiovascular system
Innate immune response of human alveolar type II cells infected with severe acute respiratory syndrome-coronavirus
Understanding SARS-CoV-2-mediated inflammatory responses: From mechanisms to potential therapeutic tools
The clinical pathology of severe acute respiratory syndrome (SARS): A report from China
Overlapping and discrete aspects of the pathology and pathogenesis of the emerging human pathogenic coronaviruses SARS-CoV, MERS-CoV, and 2019-nCoV
Neutrophil extracellular traps go viral
Noncanonical inflammasome signaling elicits gasdermin D-dependent neutrophil extracellular traps
Neutrophil extracellular traps in immunity and disease
Can COVID-19 trigger the plaque vulnerability-a Kounis syndrome warning for 'asymptomatic subjects'
The acute respiratory distress syndrome
The value of edema fluid protein measurement in patients with pulmonary edema
COVID-19 Pneumonia: Different Respiratory Treatments for Different Phenotypes?
Acute respiratory distress syndrome and diffuse alveolar damage. New insights on a complex relationship
Diffuse alveolar damage-the role of oxygen, shock, and related factors. A review
Pulmonary dead-space fraction as a risk factor for death in the acute respiratory distress syndrome
Recognition of COVID-19 disease from X-ray images by hybrid model consisting of 2D curvelet transform, chaotic salp swarm algorithm and deep learning technique
Explainable deep learning for pulmonary disease and coronavirus COVID-19 detection from X-rays
Automatic pleural line extraction and COVID-19 scoring from lung ultrasound data
Truncated inception net: COVID-19 outbreak screening using chest X-rays
Implementation of a deep learning-based computer-aided detection system for the interpretation of chest radiographs in patients suspected for COVID-19
Deep learning approaches for COVID-19 detection based on chest X-ray images
CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest X-ray images
Improvement and multi-population generalizability of a deep learning-based chest radiograph severity score for COVID-19
From community-acquired pneumonia to COVID-19: A deep learning-based method for quantitative analysis of COVID-19 on thick-section CT scans
CovXNet: A multidilation convolutional neural network for automatic COVID-19 and other pneumonia detection from chest X-ray images with transferable multireceptive feature optimization
Deep learning for automatic quantification of lung abnormalities in COVID-19 patients: First experience and correlation with clinical parameters
Automated deep transfer learning-based approach for detection of COVID-19 infection in chest X-rays
A deep learning approach to characterize 2019 coronavirus disease (COVID-19) pneumonia in chest CT images
Automatic lung segmentation using control feedback system: Morphology and texture paradigm
Deep learning COVID-19 features on CXR using limited training data sets
CVDNet: A novel deep learning architecture for detection of coronavirus (COVID-19) from chest X-ray images
Automated detection of COVID-19 cases using deep neural networks with X-ray images
Deep learning for classification and localization of COVID-19 markers in point-of-care lung ultrasound
Classification of COVID-19 patients from chest CT images using multi-objective differential evolution-based convolutional neural networks
An automated residual exemplar local binary pattern and iterative ReliefF based COVID-19 detection method using chest X-ray image
A fully automatic deep learning system for COVID-19 diagnostic and prognostic analysis
Deep learning-based multi-view fusion model for screening 2019 novel coronavirus pneumonia: A multicentre study
Deep learning for detecting corona virus disease 2019 (COVID-19) on high-resolution computed tomography: A pilot study
A novel comparative study for detection of COVID-19 on CT lung images using texture analysis, machine learning, and deep learning methods
Deep learning-based decision-tree classifier for COVID-19 diagnosis from chest X-ray imaging
Deep transfer learning artificial intelligence accurately stages COVID-19 lung disease severity on portable chest radiographs
Lung infection quantification of COVID-19 in CT images with deep learning
Automated quantification of CT patterns associated with COVID-19 from chest CT
End-to-end learning for semiquantitative rating of COVID-19 severity on chest X-rays
Iteratively pruned deep learning ensembles for COVID-19 detection in chest X-rays
Severity assessment of coronavirus disease 2019 (COVID-19) using quantitative features from chest CT images
POCOVID-Net: Automatic detection of COVID-19 from a new lung ultrasound imaging dataset (POCUS)
Classification of the COVID-19 infected patients using DenseNet201 based deep transfer learning
Diagnosing COVID-19 pneumonia from X-ray and CT images using deep learning and transfer learning algorithms
Interpretable artificial intelligence framework for COVID-19 screening on chest X-rays
Predicting COVID-19 pneumonia severity on chest X-ray with deep learning
COVIDNet-CT: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest CT images
COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images
Classification of severe and critical COVID-19 using deep learning and radiomics
Adaptive feature selection guided deep forest for COVID-19 classification with chest CT
Using artificial intelligence to detect COVID-19 and community-acquired pneumonia based on pulmonary CT: Evaluation of the diagnostic accuracy
Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography
The present and future of deep learning in radiology
Lung Imaging and Computer Aided Diagnosis
Six artificial intelligence paradigms for tissue characterization and classification of non-COVID-19 pneumonia against COVID-19 pneumonia in computed tomography lungs
Recent Advances in Breast Imaging, Mammography, and Computer-Aided Diagnosis of Breast Cancer
Lung Imaging and CADx
Radiation dose and image quality of computed tomography of the supra-aortic arteries: A comparison between single-source and dual-source CT scanners
Dose and perceived image quality in chest radiography
Fischer's fused full field digital mammography and ultrasound system (FFDMUS)
Breast image registration techniques: A survey
Handbook of Biomedical Image Analysis
Wilson disease tissue classification and characterization using seven artificial intelligence models embedded with 3D optimization paradigm on a weak training brain magnetic resonance imaging datasets: A supercomputer application
Ultrasound-based internal carotid artery plaque characterization using deep learning paradigm on a supercomputer: A cardiovascular disease/stroke risk assessment system
3-D optimized classification and characterization artificial intelligence paradigm for cardiovascular/stroke risk stratification using carotid ultrasound-based delineated plaque: Atheromatic 2.0
ROBINS-I: A tool for assessing risk of bias in non-randomised studies of interventions
PROBAST: A tool to assess the risk of bias and applicability of prediction model studies
3-D optimized classification and characterization artificial intelligence paradigm for cardiovascular/stroke risk stratification using carotid ultrasound-based delineated plaque: Atheromatic 2.0
Bidirectional link between diabetes mellitus and COVID-19 leading to cardiovascular disease: A narrative review
A narrative review on characterization of acute respiratory distress syndrome in COVID-19-infected lungs using artificial intelligence
Intra-and inter-operator reproducibility analysis of automated cloud-based carotid intima media thickness ultrasound measurement
Inter-observer variability analysis of automatic lung delineation in normal and disease patients
Cardiovascular/stroke risk predictive calculators: A comparison between statistical and machine learning models
Significance, errors, power, and sample size: The blocking and tackling of statistics
Determining sample size
Completely automated multiresolution edge snapper-A new technique for an accurate carotid ultrasound IMT measurement: Clinical validation and benchmarking on a multi-institutional database
Automated and accurate carotid bulb detection, its verification and validation in low quality frozen frames and motion video
Multiclass machine learning vs. conventional calculators for stroke/CVD risk assessment using carotid plaque predictors with coronary angiography scores as gold standard: A 500 participants study
Multiclass magnetic resonance imaging brain tumor classification using artificial intelligence paradigm
A case of COVID-19 lung infection first detected by [18F]FDG PET-CT
Stochastic Modeling for Medical Image Analysis
High tech, high risk: Tech ethics lessons for the COVID-19 pandemic response
Risk factors of neonatal mortality and child mortality in Bangladesh
A novel block imaging technique using nine artificial intelligence models for COVID-19 disease classification, characterization and severity measurement in lung computed tomography scans on an Italian cohort