key: cord-0980527-8eeoh21h
authors: Pennisi, Matteo; Kavasidis, Isaak; Spampinato, Concetto; Schinina, Vincenzo; Palazzo, Simone; Salanitri, Federica Proietto; Bellitto, Giovanni; Rundo, Francesco; Aldinucci, Marco; Cristofaro, Massimo; Campioni, Paolo; Pianura, Elisa; Di Stefano, Federica; Petrone, Ada; Albarello, Fabrizio; Ippolito, Giuseppe; Cuzzocrea, Salvatore; Conoci, Sabrina
title: An Explainable AI System for Automated COVID-19 Assessment and Lesion Categorization from CT-scans
date: 2021-05-21
journal: Artif Intell Med
DOI: 10.1016/j.artmed.2021.102114

COVID-19, the infection caused by the SARS-CoV-2 pathogen, has been a catastrophic pandemic outbreak all over the world, with an exponential increase in confirmed cases and, unfortunately, deaths. In this work we propose an AI-powered pipeline, based on the deep learning paradigm, for automated COVID-19 detection and lesion categorization from CT scans. We first propose a new segmentation module aimed at automatically identifying lung parenchyma and lobes. Next, we combine the segmentation network with classification networks for COVID-19 identification and lesion categorization. We compare the model's classification results with those obtained by three expert radiologists on a dataset of 166 CT scans. Results showed a sensitivity of 90.3% and a specificity of 93.5% for COVID-19 detection, at least on par with those yielded by the expert radiologists, and an average lesion categorization accuracy of about 84%. Moreover, a significant role is played by prior lung and lobe segmentation, which allowed us to enhance classification performance by over 6 percentage points. The interpretation of the trained AI models reveals that the most significant areas supporting the decision on COVID-19 identification are consistent with the lesions clinically associated with the virus, i.e., crazy paving, consolidation and ground glass. This means that the artificial models are able to discriminate a positive patient from a negative one (both controls and patients with interstitial pneumonia who tested negative for COVID) by evaluating the presence of those lesions in CT scans. Finally, the AI models are integrated into a user-friendly GUI to support AI explainability for radiologists, which is publicly available at http://perceivelab.com/covid-ai. The whole AI system is unique since, to the best of our knowledge, it is the first publicly available AI-based software that attempts to explain to radiologists what information is used by AI methods for making decisions, and that proactively involves them in the decision loop to further improve the COVID-19 understanding.

Interpretation of the trained models shows that the approach focuses on the lesions typical of the COVID-19 pneumonia, i.e., consolidation, ground glass and crazy paving, demonstrating its reliability in supporting the diagnosis using only radiological images. Finally, we integrate the tested AI models into a user-friendly GUI to support further AI explainability for radiologists, which is publicly available at http://perceivelab.com/covid-ai. The GUI processes entire CT scans and reports whether the patient is likely to be affected by COVID-19, showing, at the same time, the scan slices that supported the decision.
To sum up, the main contributions of this paper are the following:
• We propose a novel lung-lobe segmentation network outperforming state-of-the-art models;
• We employ the segmentation network to drive a classification network that first identifies CT scans of COVID-19 patients and afterwards automatically categorizes specific lesions;
• We then provide an interpretation of the decisions made by the employed models and discover that, indeed, the proposed approach focuses on specific COVID-19 lesions when distinguishing whether a CT scan belongs to a positive patient or not;
• We finally integrate the whole AI pipeline into a web platform to ease its use by radiologists, supporting them in their investigation of the COVID-19 disease.

To the best of our knowledge, this is the first publicly available platform that offers COVID-19 diagnosis services based on CT scans with explainability capabilities. The free availability to the general public for such an important task, while the pandemic is still in full effect, is, in our opinion, an invaluable aid to the medical community.

The COVID-19 epidemic caught the scientific community flat-footed, and in response a high volume of research has been dedicated to it at all possible levels. In particular, since the beginning of the epidemic, AI models have been employed for disease spread monitoring [8, 9, 10], for disease progression [11] and prognosis [12], for predicting mental health ailments inflicted upon healthcare workers [13], and for drug repurposing [14, 15] and discovery [16]. However, the lion's share in employing AI models for the fight against COVID-19 belongs to the processing of X-rays and CT scans with the purpose of detecting the presence of COVID-19. In fact, recent scientific literature has demonstrated the high discriminative and predictive capability of deep learning methods in the analysis of COVID-19 related radiological images [17, 18]. The key radiological techniques for diagnosing COVID-19 induced pneumonia and estimating its progression are based on the analysis of CT and X-ray images of the chest, on which deep learning methodologies have been widely used with good results for segmentation, predictive analysis, and discrimination of patterns [19, 20, 21]. If, on the one hand, X-ray imaging represents a cheap and effective solution for large-scale screening of the COVID-19 disease, on the other hand, its low resolution has led AI models to show lower accuracy compared to those obtained with CT data. For the above reasons, CT has become the gold standard for the investigation of lung diseases. In particular, deep learning, mainly in the form of deep convolutional neural networks (DCNN), has been largely applied to lung disease analysis from CT images, for evaluating progression in response to specific treatments (for instance immunotherapy, chemotherapy, radiotherapy) [22, 23], but also for interstitial lung pattern analysis [24, 25] and for segmentation and discrimination of lung pleural tissues and lymph nodes [26, 27]. This latter aspect is particularly relevant for COVID-19 features and makes artificial intelligence an extremely powerful tool for supporting early diagnosis of COVID-19 and quantification of disease progression. As a consequence, several recent works have reported using AI models for automated categorization of CT scans [21], also on COVID-19 data [28, 29, 30], but without being able to distinguish between the various types of COVID-19 lesions.
The proposed AI system aims at: 1) extracting lungs and lobes from chest CT data; 2) categorizing CT scans as either COVID-19 positive or COVID-19 negative; 3) identifying and localizing typical COVID-19 lung lesions (consolidation, crazy paving and ground glass); and 4) indicating which CT slices its decisions are based on.

Our lung-lobe segmentation model is based on the Tiramisu network [31], a fully-convolutional DenseNet [32] in a U-Net architecture [33]. The model consists of two data paths: the downsampling path, which extracts features, and the upsampling path, which generates the output images (masks). Skip connections (i.e., connections from a preceding layer in the network's pipeline to a later one, bypassing intermediate layers) propagate high-resolution details by sharing feature maps between the two paths. In this work, our segmentation model follows the Tiramisu architecture, but with two main differences:
• Instead of processing each slice individually, convolutional LSTMs [34] are employed at the network's bottleneck layer to exploit the spatial axial correlation of consecutive scan slices.
• In the downsampling and upsampling paths, we add residual squeeze-and-excitation layers [35], in order to emphasize relevant features and improve the representational power of the model.

Before discussing the properties and advantages of the above modifications, we first introduce the overall architecture, shown in Fig. 1. The input to the model is a sequence of 3 consecutive slices of a CT scan, suitably resized to 224×224, which are processed individually and combined through a convolutional LSTM layer. Each slice is initially processed with a standard convolutional layer to expand the feature dimensions. The resulting feature maps then go through the downsampling path of the model (the encoder), consisting of five sequences of dense blocks, residual squeeze-and-excitation layers and transition-down layers based on max-pooling. In the encoder, the feature maps at the output of each residual squeeze-and-excitation layer are concatenated with the input features of the preceding dense block, in order to encourage feature reuse and improve their generalizability. At the end of the downsampling path, the bottleneck of the model consists of a dense block followed by a convolutional LSTM. The following upsampling path is symmetric to the downsampling one, but it features: 1) skip connections from the downsampling path for concatenating feature maps at the corresponding layers of the upsampling path; 2) transition-up layers implemented through transposed convolutions. Finally, a convolutional layer provides a 6-channel segmentation map, representing the log-likelihoods of lobe pixels (5 channels, one for each lobe) and non-lung pixels (1 channel).

In the following, we review the novel characteristics of the proposed architecture. Explicitly modeling interdependencies between feature channels has been shown to enhance the performance of deep architectures; squeeze-and-excitation layers [35] do so by selecting informative features and suppressing the less useful ones. In particular, a set of input features of size C × H × W is squeezed through average-pooling to a C × 1 × 1 vector, representing global feature statistics. The "excitation" operator is a fully-connected non-linear layer that translates the squeezed vector into channel-specific weights, which are applied to the corresponding input feature maps.
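To make the squeeze-and-excitation mechanism concrete, the following is a minimal PyTorch sketch of a residual squeeze-and-excitation block consistent with the description above; the reduction ratio of the excitation bottleneck (here 16, as in [35]) and the exact residual formulation are assumptions, since the paper does not report them.

```python
import torch
import torch.nn as nn

class ResidualSqueezeExcitation(nn.Module):
    """Residual squeeze-and-excitation block (sketch): input feature maps are
    re-weighted channel-wise and added back to the input."""
    def __init__(self, channels: int, reduction: int = 16):  # reduction is assumed
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)   # C x H x W -> C x 1 x 1 statistics
        self.excite = nn.Sequential(             # produces channel-specific weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)           # "squeeze": global feature statistics
        w = self.excite(w).view(b, c, 1, 1)      # "excitation": weights in (0, 1)
        return x + x * w                         # residual channel re-weighting

# Example: re-weight a batch of encoder feature maps
feats = torch.randn(2, 64, 28, 28)
out = ResidualSqueezeExcitation(64)(feats)       # same shape as the input
```

In the proposed encoder, the output of such a block would then be concatenated with the input features of the preceding dense block, as described above.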
We adopt a recurrent architecture to process the output of the bottleneck layer, in order to exploit the spatial axial correlation between subsequent slices and enhance the final segmentation by integrating 3D information into the model. Convolutional LSTMs [34] are commonly used to capture spatio-temporal correlations in visual data (for example, in videos), by extending traditional LSTMs with convolutions in both the input-to-state and the state-to-state transitions. Employing recurrent convolutional layers allows the model to take into account the context of the currently-processed slice, while preserving sequentiality and avoiding the need to process the entire set of slices in a single step through channel-wise concatenation, which would increase feature sizes and lose information on axial distance. Fig. 2 shows an example of automated lung and lobe segmentation from a CT scan obtained with the proposed segmentation network.

The proposed segmentation network is first executed on the whole CT scan to segment the lungs (and lobes); the segmented CT scan is then passed to the downstream classification modules for COVID-19 identification and lesion categorization. After lung parenchyma segmentation (through the segmentation model presented in Sect. 3.1), a deep classification model analyzes each segmented CT scan slice by slice, and decides whether a single slice contains evidence of the COVID-19 disease. Note that slice-based COVID-19 classification is only the initial step towards the final prediction, which takes into account all per-slice predictions and assigns the "positive" label in the presence of a certain number of slices (10% of the total) that the model has identified as COVID-19 positive. Hence, COVID-19 assessment is actually carried out per patient, by combining per-slice predictions. At this stage, the system does not carry out any identification or localization of COVID-19 lesions: it just identifies all slices where patterns of interest may be found and, according to them, makes a guess on the presence or absence of COVID-19 induced infection. An overview of this model is shown in Fig. 3: first the segmentation network, described in the previous section, identifies lung areas in the CT scan; then a deep classifier (a DenseNet model in the 201 configuration [32]) processes the segmented lung areas to identify whether the slice shows signs of COVID-19.

Once the COVID-19 identification model is trained, we attempt to understand what features it employs to discriminate between positive and negative cases. Thus, to interpret the decisions made by the trained model, we compute class-discriminative localization maps that attempt to provide visual explanations of the most significant input features for each class. To accomplish this, we employ GradCAM [36] combined with VarGrad [37]. GradCAM is a technique that produces such interpretability maps by investigating the gradient of the output with respect to feature map activations. More specifically, GradCAM generates a class-discriminative localization map for any class c by first computing the gradient of the score s_c for class c w.r.t. the feature activation maps A^k of a given convolutional layer.
Such gradients are then global-average-pooled to obtain the activation importance weights w_k^c, i.e.:

$$w_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial s_c}{\partial A_{ij}^k} \quad (1)$$

where Z is the number of spatial locations in the activation map. Afterwards, the saliency map S_c, which provides an overview of the activation importance for class c, is computed through a weighted combination of activation maps, i.e.:

$$S_c = \mathrm{ReLU}\left( \sum_k w_k^c A^k \right) \quad (2)$$

VarGrad is a technique used in combination with GradCAM: it performs multiple activation map estimates by adding, each time, Gaussian noise to the input data, and then aggregates the estimates by computing the variance of the set.

An additional deep network is activated only if the previous system identifies a COVID-19 positive CT scan. In that case, it works on the subset of slices identified as COVID-19 positive by the first AI system, with the goal of localizing and identifying specific lesions (consolidation, crazy paving and ground glass). More specifically, the lesion identification system works on segmented lobes to seek COVID-19 specific patterns. The subsystem for lesion categorization employs the knowledge already learned by the COVID-19 detection module (shown in Fig. 3) and refines it for specific lesion categorization. An overview of the whole system is given in Fig. 4.

In order to explain to radiologists the decisions made by a "black-box" AI system, we integrated the inference pipeline for COVID-19 detection into a web-based application. The application was designed to streamline the whole inference process with just a few clicks and to visualize the results with a variable degree of detail (Fig. 5). If the radiologists desire to see which CT slices were classified as positive or negative, they can click on "Show slices", where a detailed list of slices and their categorization is shown (Fig. 6). Because the models may not achieve perfect accuracy, a single-slice inspection screen is provided, where radiologists can inspect more closely the result of the classification. It also features a restricted set of image manipulation tools (move, contrast, zoom) to aid the user in making a correct diagnosis (Fig. 7). The AI-empowered web system also integrates a relevance feedback mechanism through which radiologists can correct the predicted outputs, and the AI module exploits such feedback to improve its future assessments. Indeed, radiologists can correct the models' predictions both at the CT scan level and at the CT slice level. The AI methods will then use the corrected labels to enhance their future assessments.

Data. Our dataset contains overall 166 CT scans: 72 of COVID-19 positive patients (positivity confirmed both by a molecular test, i.e., reverse transcriptase-polymerase chain reaction for SARS-coronavirus RNA from nasopharyngeal aspirates, and an IgG or IgM antibody test) and 94 of COVID-19 negative subjects (35 patients with interstitial pneumonia who tested negative for COVID-19 and 59 controls). (Table 1: dataset composition; 72 positive and 94 negative CT scans.)

Procedure. COVID-19 Identification Model. The COVID-19 detection network is a DenseNet201, pretrained on the ImageNet dataset [40]. The original classification layers in DenseNet201 were replaced by a 2-output linear layer for the COVID-19 positive/negative classification.
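As a concrete reference, this is a minimal PyTorch/torchvision sketch of the model surgery just described; treat it as an illustration under stated assumptions (the paper does not report its code, and the 224×224 input size is borrowed from the segmentation module):

```python
import torch
import torch.nn as nn
from torchvision import models

# DenseNet201 pretrained on ImageNet, with the original classifier
# replaced by a 2-output linear layer (COVID-19 positive/negative).
model = models.densenet201(pretrained=True)
model.classifier = nn.Linear(model.classifier.in_features, 2)

# Each segmented CT slice is fed through the network; per-slice logits
# are later aggregated by the voting rule described in the text.
slices = torch.randn(4, 3, 224, 224)   # a batch of 4 slices (assumed size)
logits = model(slices)                 # shape: (4, 2)
```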
Given the class imbalance in the training set, we used the weighted binary cross-entropy (defined in Eq. 3) as training loss and the RT-PCR virology test results as training/test labels. The weighted binary cross-entropy loss for a sample with predicted probability x and target label y is calculated as:

$$L(x, y) = -w \left[ y \log x + (1 - y) \log(1 - x) \right] \quad (3)$$

where w is defined as the ratio of the number of negative samples to the total number of samples if the label is positive, and vice versa. This way, the loss is higher when misclassifying a sample that belongs to the less frequent class.

It is important to highlight that splitting refers to entire CT scans and not to single slices: we made sure that slices of the same CT scan were not assigned to different splits, to avoid any bias in the performance analysis. Otherwise, the deep models could overfit the data by learning spurious information from each CT scan, invalidating the training procedure; this constraint enforces the robustness of the whole approach. Moreover, for the COVID-19 detection task, we operate at the slice level by processing and categorizing each single slice. To make a decision for the whole scan, we perform voting: if at least 10% of the total slices are marked as positive, the whole exam is considered COVID-19 positive, otherwise COVID-19 negative. The voting threshold was selected according to the best operating point on the ROC curve.

Lesion Categorization Model. The lesion categorization network is also a DenseNet201 model, whose classification layers were replaced by a 4-output linear layer (ground glass, consolidation, crazy paving, negative). The lesion categorization model processes lobe segments (extracted by our segmentation model) with the goal of identifying specific lesions. Our dataset contains 2,488 annotated slices; in each slice, multiple lesion annotations with their location (in lobes) are available. Thus, after segmenting lobes from these images, we obtained 5,264 lobe images. We did the same on CT slices of negative patients (among the 2,950 available, as shown in Tab. 1) and selected 5,264 lobe images without lesions. Thus, in total, the entire set consisted of 10,528 images. We also discarded the images for which lobe segmentation produced small regions, indicating a failure in the segmentation process. We used a fixed test split consisting of 195 images with consolidation, 354 with crazy paving, 314 with ground glass and 800 images with no lesion. The remaining images were split into training and validation sets with an 80/20 ratio. Given the class imbalance in the training set, we employed the weighted cross-entropy as training loss. The weighted cross-entropy loss for a sample with predicted class scores x and target label y is calculated as:

$$L(x, y) = -w_y \log \left( \frac{\exp(x_y)}{\sum_{c \in C} \exp(x_c)} \right) \quad (4)$$

where C is the set of all classes. The weight w_c for each class c is defined as:

$$w_c = \frac{N - N_c}{N} \quad (5)$$

where N is the total number of samples and N_c is the number of samples that have label c. Since the model is the same as the COVID-19 identification network, i.e., DenseNet201, we started from the network trained on the COVID-19 identification task and fine-tuned it on the categorization task to limit overfitting, given the small scale of our dataset.

For both the detection network and the lesion categorization network, we used the following hyperparameters: batch size = 12, learning rate = 1e-04, Adam optimizer with beta values 0.9 and 0.999, eps = 1e-08 and weight decay = 0; back-propagation was used to update the models' parameters during training. Detection and categorization networks were trained for 20 epochs. In both cases, performance is reported at the highest validation accuracy.
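The following PyTorch sketch illustrates how the class weighting of Eqs. 4-5 and the scan-level voting rule can be implemented; the class counts, the index of the positive class and the function name are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

# Per-class weights as in Eq. 5: w_c = (N - N_c) / N, so that rarer
# classes contribute more to the loss (counts below are illustrative).
class_counts = torch.tensor([780., 1416., 1256., 3200.])
weights = (class_counts.sum() - class_counts) / class_counts.sum()

# Weighted cross-entropy (Eq. 4) over the 4 lesion classes.
criterion = nn.CrossEntropyLoss(weight=weights)
logits = torch.randn(8, 4)                 # a batch of 8 lobe images
labels = torch.randint(0, 4, (8,))
loss = criterion(logits, labels)

# Scan-level voting rule: the exam is called positive if at least 10%
# of its slices are predicted positive (class index 1 is assumed).
def scan_is_positive(slice_logits: torch.Tensor, threshold: float = 0.10) -> bool:
    positive_fraction = (slice_logits.argmax(dim=1) == 1).float().mean()
    return bool(positive_fraction >= threshold)
```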
Lung/lobe segmentation model. For lung/lobe segmentation, input images were normalized to zero mean and unitary standard deviation, with statistics computed on the employed dataset. In all the experiments with our segmentation model, input size was set to 224 × 224, initial learning rate to 0.0001, weight decay to 0.0001 and batch size to 2, with RMSProp as optimizer. When C-LSTMs were employed, recurrent states were initialized to zero and the size of the input sequences to the C-LSTM layers was set to 3. Each training was carried out for 50 epochs. All experiments were executed using the HPC4AI infrastructure [41].

In this section we report the performance of the proposed model for lung/lobe segmentation, COVID-19 identification and lesion categorization. Our segmentation model is based on the Tiramisu model [31], with the introduction of squeeze-and-excitation blocks and of a convolutional LSTM (either unidirectional or bidirectional) after the bottleneck layer. In order to understand the contribution of each module, we first performed ablation studies by testing the segmentation performance of our model with different architecture configurations:
• Baseline: the vanilla Tiramisu model described in [31];
• Res-SE: residual squeeze-and-excitation modules are integrated in each dense block of the Tiramisu architecture;
• C-LSTM: a unidirectional convolutional LSTM is added after the bottleneck layer of the Tiramisu architecture;
• Res-SE + C-LSTM: variant of the Tiramisu architecture that includes both residual squeeze-and-excitation modules at each dense layer and a unidirectional convolutional LSTM after the bottleneck layer.

(Table 2: ablation studies of our segmentation network in terms of Dice score; the baseline Tiramisu [31] scores 89.41. Best results are shown in bold. Note: we did not compute confidence intervals on these scores, as they are obtained from a very large set of CT voxels.)

We also compared the performance against the U-Net architecture proposed in [39], which is largely adopted for lung/lobe segmentation. All architectures were trained for 50 epochs by splitting the employed lung datasets into training, validation and test splits with a 70/10/20 ratio. Results in terms of Dice similarity coefficient (DSC) are given in Tab. 2. It has to be noted that, unlike [39], we computed the DSC on all frames, not only on the lung slices. The highest performance is obtained with the Res-SE + C-LSTM configuration, i.e., when adding squeeze-and-excitation modules and the unidirectional C-LSTM at the bottleneck layer of the Tiramisu architecture. This results in an accuracy improvement of over 4 percentage points over the baseline; in particular, adding squeeze-and-excitation alone leads to a 2 percentage point improvement over the baseline. Segmentation results are computed using data augmentation, obtained by applying random affine transformations (rotation, translation, scaling and shearing) to input images. The segmentation network is then applied to our COVID-19 dataset for prior segmentation without any additional fine-tuning, which also demonstrates its generalization capabilities.

We here report the results for COVID-19 diagnosis, i.e., classification between positive and negative cases. In this analysis, we compare model results to those yielded by three experts with different degrees of expertise:
1. Radiologist 1: a physician expert in thoracic radiology (∼30 years of experience) with over 30,000 examined CT scans;
2. Radiologist 2: a physician expert in thoracic radiology (∼10 years of experience) with over 9,000 examined CT scans;
3. Radiologist 3: a resident student in thoracic radiology (∼3 years of experience) with about 2,000 examined CT scans.

It should be noted that the gold standard employed in the evaluation is provided by molecular and antibody tests; hence, radiologists' assessments are not the reference for the performance comparison. We also assess the role of prior segmentation on the performance: this means that in the pipelines shown in Figures 3 and 4 we removed the segmentation modules and performed classification on whole CT slices, using also information outside the lung areas. Results for COVID-19 detection are measured in terms of sensitivity, specificity and AUC, and are given in Tables 3, 4 and 5. (Tables 3 and 4: sensitivity and specificity, in percentage together with 95% confidence intervals, comparing manual readings of expert radiologists against the AI model for COVID-19 detection without lung segmentation and the AI model with segmentation.) Note that the AUC is a reliable metric in our scenario, since we explicitly defined the test set to be balanced among classes. More recent techniques [42] may be suitable when this assumption does not hold, as is often the case for new or rare diseases.

Our results show that the AI model with lung segmentation achieves higher performance than the expert radiologists. However, given the relatively small scale of our dataset, statistical analysis carried out with the Chi-squared test does not show any significant difference between the AI models and the radiologists. Furthermore, performing lung segmentation improves both the sensitivity and the specificity by about 6 percentage points, demonstrating its effectiveness. In addition, we also measure how the sensitivity of the COVID-19 identification changes w.r.t. the level of disease severity. In particular, we categorize the 31 positive cases into three classes according to the percentage of the affected lung area: low severity (11 cases), medium severity (11 cases) and high severity (9 cases). Results are reported in Table 6, which shows that our AI-based method seems to yield better assessments than the domain experts, especially at the beginning of the disease (low severity). This is important, as an earlier disease detection may lead to a more favorable outcome. In case of high severity, two out of three radiologists showed difficulties in correctly identifying COVID-19, mainly because when the affected lung area is significant, the typical COVID patterns are less visible. However, even in this case, our deep learning model was able to discriminate COVID cases robustly.

As a backbone model for COVID-19 identification, we employ DenseNet201, since it yields the best performance when compared to other state-of-the-art models, as shown in Table 7. In all tested cases, we use upstream segmentation through the model described in Sect. 3.1; the voting threshold was set to 10% in all cases.

In order to enhance trust in the devised AI models, we analyzed what features these methods employ for making the COVID-19 diagnosis decision. This is done by investigating which artificial neurons fire the most, and then projecting this information onto the input images. To accomplish this, we combined GradCAM [36] with VarGrad [37]; Fig. 8 shows some examples of the saliency maps generated by interpreting the proposed COVID-19 classification network.
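For reference, the following is a minimal PyTorch sketch of the GradCAM/VarGrad combination described above (Eqs. 1-2 plus variance aggregation); the hook mechanics, the number of noisy samples and the noise level are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, layer, x, target_class):
    """GradCAM (Eqs. 1-2): weight a layer's activation maps by the
    global-average-pooled gradients of the target class score."""
    acts, grads = [], []
    h1 = layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    score = model(x)[0, target_class]       # class score s_c (batch of one slice)
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    w = grads[0].mean(dim=(2, 3), keepdim=True)           # Eq. 1: importance weights
    cam = F.relu((w * acts[0]).sum(dim=1, keepdim=True))  # Eq. 2: weighted sum + ReLU
    return cam.detach()

def var_grad_cam(model, layer, x, target_class, n=20, sigma=0.1):
    """VarGrad: repeat GradCAM on noisy copies of the input and
    aggregate the estimates by taking their variance."""
    maps = [grad_cam(model, layer, x + sigma * torch.randn_like(x), target_class)
            for _ in range(n)]
    return torch.stack(maps).var(dim=0)
```

For a DenseNet201 backbone, `layer` could be, for instance, the last dense block of `model.features`; the resulting map is then upsampled to the input resolution for visualization.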
It is interesting to note that the most significant activation areas correspond to the three most common lesion types, i.e., ground glass, consolidation and crazy paving. This is remarkable, as the model has indeed learned the peculiar COVID-19 patterns without any information on the type of lesions (to this end, we recall that for COVID-19 identification we only provide, at training time, the labels "positive" or "negative", while no information on the type of lesions is given).

For COVID-19 lesion categorization, we report the mean classification accuracy over all lesion types and the per-lesion accuracy, both provided in Table 8. Note that no comparison with radiologists is carried out in this case, since the ground-truth labels on lesion types are provided by the radiologists themselves; hence, they are the reference used to evaluate model accuracy. Mean lesion categorization accuracy reaches about 84% when operating at the lobe level. The lowest performance is obtained on ground glass, because ground-glass opacities are non-specific CT findings that can appear also in normal patients with respiratory artifacts. Operating at the level of single lobes yields a performance enhancement of over 21 percentage points, and, also in this case, radiologists did not have to perform any lobe segmentation annotation, significantly reducing their effort to build AI models. The most significant improvement when using lobe segmentation w.r.t. no segmentation is obtained on the crazy paving class, i.e., 98.3% against 57.1%.

Although COVID-19 diagnosis from CT scans may seem an easy task for experienced radiologists, our results show that this is not always the case: in this scenario, the approach we propose has demonstrated its capability to carry out the same task with an accuracy that is at least on par with, or even higher than, human experts, thus showing the potential impact that these techniques may have in supporting physicians in decision making. Artificial intelligence, in particular, is able to accurately identify not only whether a CT scan belongs to a positive patient, but also the type of lung lesions, in particular the smaller and less defined ones (such as those highlighted in Fig. 8). As shown, the combination of segmentation and classification techniques provides a significant improvement in the sensitivity and specificity of the proposed method. Of course, although the results presented in this work are very promising in the direction of establishing a clinical practice supported by artificial intelligence models, there is still room for improvement. One of the limitations of our work is represented by the relatively low number of samples available for the experiments; in order to mitigate the impact of this issue, we carried out a confidence level analysis to demonstrate the statistical significance of our results. Moreover, the employed dataset consists of images taken by the same CT scanner and was not tested in multiple scanning settings. This could affect the generalization of the method to images taken by other CT scanner models; however, this issue can be tackled by domain adaptation techniques for the medical imaging domain, which is an active research topic [43, 44, 45]. Finally, one of the key features of our approach is the integration of explainability functionalities that may help physicians understand the reasons underlying a model's decision, increasing, in turn, the trust that experts have in AI-enabled methods.
Future developments in this regard should explore, in addition to model explainability, also causability features, in order to evaluate the quality of the explanations provided [46, 47].

In this work we have presented an AI-based pipeline for automated lung segmentation, COVID-19 detection and COVID-19 lesion categorization from CT scans. Results showed a sensitivity of 90.3% and a specificity of 93.5% for COVID-19 detection, and an average lesion categorization accuracy of about 84%. Results also show that a significant role is played by prior lung and lobe segmentation, which allowed us to enhance diagnosis performance by about 6 percentage points. The AI models are integrated into a user-friendly GUI to support AI explainability for radiologists, which is publicly available at http://perceivelab.com/covid-ai. To the best of our knowledge, this is the first publicly available AI-based software that attempts to explain to radiologists what information is used by AI methods for making decisions, and that proactively involves them in the loop to further improve the COVID-19 understanding. The results obtained both for COVID-19 identification and lesion categorization pave the way to further improvements, driven towards the implementation of an advanced COVID-19 CT/X-ray diagnostic pipeline that is interpretable, robust and able to provide not only disease identification and differential diagnosis, but also the risk of disease progression.

This work has been also partially supported by:
• The REHASTART project funded by Regione Sicilia (PO FESR 2014/2020 - Azione 1.1.5);
• The "Go for IT" project funded by the Conference of Italian University Rectors (CRUI);
• The DeepHealth project, funded under the European Union's Horizon 2020 framework, grant agreement No. 825111.
References.
A novel coronavirus from patients with pneumonia in China
Novel coronavirus (2019-nCoV): situation report
Use of chest CT in combination with negative RT-PCR assay for the 2019 novel coronavirus but high clinical suspicion
Imaging profile of the COVID-19 infection: radiologic findings and literature review
Clinical and CT imaging features of the COVID-19 pneumonia: focus on pregnant women and children
CT imaging features of 2019 novel coronavirus (2019-nCoV)
Advanced deep learning embedded motion radiomics pipeline for predicting anti-PD-1/PD-L1 immunotherapy response in the treatment of bladder cancer: preliminary results
On the coronavirus (COVID-19) outbreak and the smart city network: universal data sharing standards coupled with artificial intelligence (AI) to benefit urban health monitoring and management
Combat COVID-19 with artificial intelligence and big data
Predicting COVID-19 in China using hybrid AI model
Early triage of critically ill COVID-19 patients using deep learning
Artificial intelligence in prediction of mental health disorders induced by the COVID-19 pandemic among health care workers
Application of artificial intelligence in COVID-19 drug repurposing
Artificial intelligence approach fighting COVID-19 with repurposing drugs
Baricitinib as potential treatment for 2019-nCoV acute respiratory disease
Explainable deep learning for pulmonary disease and coronavirus COVID-19 detection from X-rays
Serial quantitative chest CT assessment of COVID-19: deep-learning approach
Pulmonary artery-vein classification in CT images using deep learning, 18th International Conference
Artificial intelligence-enabled rapid diagnosis of patients with COVID-19
Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks
Bladder cancer treatment response assessment in CT using radiomics with deep-learning
Classification of interstitial lung abnormality patterns with an ensemble of deep convolutional neural networks
Holistic classification of CT attenuation patterns for interstitial lung diseases via deep convolutional neural networks
Advanced segmentation techniques for lung nodules, liver metastases, and enlarged lymph nodes in CT scans
Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning
Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for COVID-19
AI augmentation of radiologist performance in distinguishing COVID-19 from pneumonia of other etiology on chest CT
The one hundred layers Tiramisu: fully convolutional DenseNets for semantic segmentation
Densely connected convolutional networks, in: CVPR
U-Net: convolutional networks for biomedical image segmentation
Convolutional LSTM network: a machine learning approach for precipitation nowcasting
Squeeze-and-excitation networks
Grad-CAM: visual explanations from deep networks via gradient-based localization
Sanity checks for saliency maps
The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans
Automatic lung segmentation in routine imaging is a data diversity problem, not a methodology problem
ImageNet: a large-scale hierarchical image database, in: 2009 IEEE Conference on Computer Vision and Pattern Recognition
HPC4AI, an AI-on-demand federated platform endeavour
A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms
Unsupervised domain adaptation for medical imaging segmentation with self-ensembling
Unsupervised reverse domain adaptation for synthetic medical images via adversarial training
Semi-supervised learning with generative adversarial networks for chest X-ray classification with ability of data domain adaptation
From machine learning to explainable AI
Causability and explainability of artificial intelligence in medicine