key: cord-0681739-8kuitglm authors: de Carvalho Brito, Vitória; dos Santos, Patrick Ryan Sales; de Sales Carvalho, Nonato Rodrigues; de Carvalho Filho, Antonio Oseas title: COVID-index: A texture-based approach to classifying lung lesions based on CT images date: 2021-06-06 journal: Pattern Recognit DOI: 10.1016/j.patcog.2021.108083 sha: 19a89f6f37f94d2afc2da49ea8f41e5dd760ca82 doc_id: 681739 cord_uid: 8kuitglm COVID-19 is an infectious disease caused by a newly discovered type of coronavirus called SARS-CoV-2. Since the discovery of this disease in late 2019, COVID-19 has become a worldwide concern, mainly due to its high degree of contagion. As of April 2021, the number of confirmed cases of COVID-19 reported to the World Health Organization has already exceeded 135 million worldwide, while the number of deaths exceeds 2.9 million. Due to the impacts of the disease, efforts in the literature have intensified in terms of studying approaches aiming to detect COVID-19, with a focus on supporting and facilitating the process of disease diagnosis. This work proposes the application of texture descriptors based on phylogenetic relationships between species to characterize segmented CT volumes, and the subsequent classification of regions into COVID-19, solid lesion or healthy tissue. To evaluate our method, we use images from three different datasets. The results are promising, with an accuracy of 99.93%, a recall of 99.93%, a precision of 99.93%, an F1-score of 99.93%, and an AUC of 0.997. We present a robust, simple, and efficient method that can be easily applied to 2D and/or 3D images without limitations on their dimensionality. 
• We propose eight image texture descriptors, which do not require parameterization;
• The proposed descriptors do not need the images to be resized;
• We have developed a scalable method, since it can easily be used on 2D or 3D images, without restrictions regarding the quantization of the images;
• Our descriptors achieve results as promising as those of deep networks, and in some cases superior;
• Our descriptors do not require powerful hardware, unlike approaches based on deep neural networks; and
• Our descriptors do not need large numbers of images to achieve good results.
Since the discovery of a new coronavirus in China in late 2019, the disease has become a global concern, mainly due to its rapid spread. As of April 2021, the number of confirmed cases notified to the World Health Organization (WHO) has already exceeded 135 million, while the number of deaths has exceeded 2.9 million [1]. COVID-19 is an infectious disease caused by a recently discovered type of coronavirus called SARS-CoV-2. Although most people infected with COVID-19 recover without special treatment, older people and people with preexisting illnesses such as diabetes, cardiovascular disease, chronic respiratory disease, and cancer are more likely to be severely affected [1]. The early diagnosis of COVID-19 is therefore essential for the treatment of the disease. Real-time polymerase chain reaction (RT-PCR) or chest computed tomography (CT) examination are possible alternatives for early diagnosis. Several computer-aided detection (CAD) systems for the early diagnosis of COVID-19 have been developed in this context. A CAD system typically consists of three steps: (i) image acquisition; (ii) segmentation of candidate regions; and (iii) characterization and classification of these regions. In a CAD system, the segmentation stage is typically automatic, and needs to be able to handle numerous regions with similar characteristics (shape, density, or texture).
It is therefore essential to apply a stage that efficiently classifies all of these regions. Thus, the proposed method acts in the characterization and classification stages. Overall, CT images of cases of COVID-19 share certain specific features, such as the presence of ground-glass opacities (GGOs) in the early stages and lung consolidation in the advanced stages [2]. Pleural effusion may also occur in cases of COVID-19, but is less common than the other lesions. It is therefore important to point out some difficulties with this approach, as follows:
• Although the features of COVID-19 are found in most cases, CT images of some viral pneumonias also show these features, which can ultimately make diagnosis more difficult [2];
• In some cases of COVID-19, biopsies are needed [3];
• Correct classification is required between healthy and diseased regions, especially those with COVID-19 and other more serious diseases such as lung nodules; and
• According to [4, 5], COVID-19 regions are generally more rounded in shape.
In view of the above, we can see that there is a need to provide specialists with a method that can enable an individual analysis of the types of lesions found in CT scans. Through correct classification, our method can provide the individual details of the lesions, which can help the specialist in making decisions regarding the need for biopsies. Furthermore, based on the work in [4, 5], we believe that the techniques used in our method of texture characterization can provide more meaningful information for lesion classification, since the shape of the lesion is not considered in the analysis.
This work makes original contributions in the areas of both medicine and computer science, as follows:
• In the area of computer science:
- In the context of COVID-19, we propose phylogenetic and taxonomic diversity indexes for the characterization of image textures;
- We improve the efficiency of the index calculation by optimizing the phylogenetic tree assembly.
• In the medical field:
- We present a method that can be applied in patient triage and can therefore assist in the efficient management of healthcare systems;
- Our system can diagnose patients quickly, especially when the medical system is overloaded;
- Our approach can reduce the burden on radiologists and assist underdeveloped areas in making an accurate and early diagnosis.
The rest of the article is organized as follows: Section 2 reviews related works in the literature; Section 3 describes the proposed method; Section 4 presents the results obtained from an implementation of our approach; Section 5 discusses some relevant aspects of this work; and, finally, Section 6 presents the conclusion. Today, pattern recognition, and particularly intelligent analysis, is one of the most promising areas of computer science. Several studies are notable in this field, such as the method developed in [6], which used a 3D model to segment brain regions. In [7], intelligent solutions for expression recognition and landmark localization problems were presented, while the authors of [8] proposed a method based on adversarial learning to improve the efficiency of deep learning approaches in object detection. In the area of optimization, we highlight the work in [9], which introduced a consensus-based technique for a new form of data clustering and representation. Finally, we note the study in [10], which presented a method for recognizing patterns in low-resolution images using super-resolution networks.
Based on the aforementioned studies and global trends in this area, pattern recognition techniques have been used in numerous ways to help combat COVID-19 and to mitigate the damage caused by the pandemic. In this section, we note some relevant works on this topic and several works that have applied phylogenetic diversity to image texture analysis to find solutions to other problems. There is a great variety of diversity indexes, each of which has particular properties according to its categorization, for example: (i) those that exploit the richness of species and the abundance of individuals; (ii) those that explore the relationships between species; and (iii) those that explore the topology of the representations of common ancestors. There are several notable studies in this area. For example, the work in [11] used the phylogenetic distance and taxonomic diversity with a support vector machine (SVM) to classify pulmonary nodules. Other approaches have also used these indexes in automatic methods for the detection of glaucoma [12] and for the classification of breast lesions [13]. As can be seen from the studies described above, the literature contains recent works that have used phylogenetic diversity indexes as texture descriptors and have applied this approach to diverse problems with promising results. Since these were carried out in contexts that were different from ours, the studies listed above will not be used for a comparison of our results, but solely to highlight the contributions of the descriptors in the literature. In [14], a methodology based on X-ray images and deep learning techniques was presented for the classification of images into COVID-19 and non-COVID-19. The results were promising, with an accuracy of above 90%. The authors of [15] also analyzed X-ray images, and combined texture and morphological features to classify them into COVID-19, bacterial pneumonia, non-COVID-19 viral pneumonia, and normal.
The results showed an AUC of 0.87 for this multi-class classification scheme. Feature combination was also used in [16], where radiomic features and clinical details were used to describe CT images and a random forest algorithm was applied to classify the features into non-severe and severe COVID-19. Due to a lack of availability of public CT data on COVID-19, the authors of [17] built a dataset called COVID-CT, composed of 349 CT images that showed COVID-19 and 463 that did not. In [18], two subsets were extracted from a set of 150 CT images containing COVID-19 and non-COVID-19 patches, and classification was carried out using the deep feature fusion and ranking technique. The studies in [2] and [19] also applied CT images and deep learning approaches. Unlike the previous approaches, the authors of [22], [23] and [24] proposed methodologies for the detection of COVID-19 using 3D volumes of CT scans. In [22], each volume was segmented using a pre-trained UNet and later classified with a weakly-supervised 3D deep neural network called DeCoVNet. The authors of [23] also evaluated their proposed method using 3D CT regions, this time obtained from scans of 81 patients. Their scheme was a radiomic model that combined texture features with patients' clinical data to classify COVID-19 into common or severe types. Finally, the approach in [24] classified CT images into healthy, idiopathic pulmonary fibrosis (IPF) and COVID-19 using a 3D approach called three-dimensional multiscale fuzzy entropy. It can be observed from the above studies that solving the problem of COVID-19 diagnosis is not a simple task. Despite the application of various CNN-based methods to image classification, in which a CNN is responsible for extracting and selecting representative features in its convolutional layers, these feature maps are not always efficient enough to allow for classification.
Although recent work on a diverse range of imaging applications has used CNNs, with results that have surpassed those of other methods, in cases where representative features of a specific problem exist, the use of these features may be more efficient than CNN methods. Another problem encountered when using CNNs, which was also noted in [17, 18], is the large number of choices required to create models, in terms of both the architecture and its parameters. The authors of these studies therefore proposed the use of transfer learning instead. In addition, the training of a CNN requires considerable time in order to create a capable model, and several tests of architectures and parameters are required. Powerful machines are also needed to run these networks. Finally, the use of a CNN requires a large number of images, and data augmentation is often required to handle this issue. However, this is not a trivial task, as it requires countless tests and training of the whole network until satisfactory results are obtained. We therefore propose to use phylogenetic diversity indexes for the feature extraction task for COVID-19, solid lesions, and healthy tissue, in conjunction with the random forest and extreme gradient boosting classifiers. This section describes our methodology for classifying CT volumes as COVID-19, solid lesion, or healthy tissue. The images used in this study were acquired from the Lung Image Database Consortium Image Collection (LIDC-IDRI) [25] and from the MedSeg [26] repository, the latter of which contained two datasets with COVID-19 images. At the feature extraction stage, we applied phylogenetic diversity indexes. The extraction and classification algorithms developed in this study are available from our GitHub repository. Figure 1 illustrates the workflow of our methodology. We used three sets of volumes of interest (VOIs) extracted from the LIDC-IDRI and the MedSeg repositories to evaluate our method.
These were as follows: (i) a set of images extracted from the LIDC-IDRI that contained VOIs showing solid lesions, for which we used the markings made by specialists in the base documentation; (ii) a set of healthy tissue VOIs that were extracted from the LIDC-IDRI by applying the algorithm proposed in [27], to guarantee that the VOIs of healthy tissue did not intersect with those of solid-type lesions (this method was chosen as it could provide samples found in real scenarios); and (iii) a set of images acquired from the MedSeg repository [26], which contained some external datasets of various types of CT exams, including those diagnosed with COVID-19. Hence, we used two different sets of images containing lesions caused by COVID-19, i.e., regions with GGO lesions, consolidation, and pleural effusion. We used the specialists' markings that were available for the respective datasets to extract these lesions. Since MedSeg does not provide terminology for the COVID-19 datasets used here, we refer to these as COVID-19 (Dataset 1) and COVID-19 (Dataset 2). Table 1 shows the number of images in each dataset. In this step, we present the rationale for the proposed indexes for texture characterization. Each index corresponds to a certain characteristic, meaning that a total of eight characteristics are extracted from each analyzed image. Phylogenetics is a branch of biology in which the evolutionary relationships between species are studied and the similarities between them described. In phylogenetic trees, leaves represent species and nodes represent common ancestors. The phylogenetic tree used in this work is called a cladogram. Figure 2 illustrates an example of a cladogram that represents the genetic relationship between monkey and human species; it can be observed that from a genetic perspective, humans and chimpanzees are closer than the other pairs of species in the tree.
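The cladogram concept can be made concrete with a small sketch. The topology below, (((human, chimpanzee), gorilla), monkey), and the internal node names are illustrative assumptions based on the description of Figure 2, not the figure's exact tree; the distance is simply the number of edges on the path between two leaves.

```python
# Parent-pointer representation of an assumed cladogram topology:
# (((human, chimpanzee), gorilla), monkey). Node names are illustrative.
parent = {
    "human": "anc1", "chimpanzee": "anc1",   # most recent common ancestor
    "anc1": "anc2", "gorilla": "anc2",
    "anc2": "root", "monkey": "root",
}

def path_to_root(node):
    """Return the list of nodes from a leaf up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def distance(a, b):
    """Number of edges on the path between two leaves in the cladogram."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = set(pa) & set(pb)
    # depth of each leaf below the deepest shared ancestor
    da = next(i for i, n in enumerate(pa) if n in common)
    db = next(i for i, n in enumerate(pb) if n in common)
    return da + db

print(distance("human", "chimpanzee"))  # 2: one edge up, one edge down
print(distance("human", "monkey"))      # 4
```

Under this toy topology, human–chimpanzee is the closest pair, mirroring the observation about Figure 2.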
A combination of phylogenetic trees and phylogenetic diversity indexes is used to analyze the evolutionary relationships between species and to measure the variation between species in a community. In order to be able to apply these concepts to the characterization of CT images, we need to define a correspondence between the definitions used in biology and those used in this work. This is illustrated in Figure 3. Using the correspondence shown in Figure 3 as a basis, we generate a cladogram for each study image, and an example of this is shown in Figure 4. The phylogenetic diversity (PD) index [28] is a measure that gives the sum of the distances of the phylogenetic branches in the tree. When the branch length is longer, the species become more distinct. Equation (1) shows the formula for the PD, where B represents the number of branches in the tree, L_i is the extension of branch i (the number of edges in that branch), and A_i refers to the average abundance of the species that share branch i:

PD = B \frac{\sum_{i=1}^{B} L_i A_i}{\sum_{i=1}^{B} A_i} \qquad (1)

The sum of phylogenetic distances (SPD) is a phylogenetic index that gives the sum of the distances between the pairs of species present in the tree [29]. Equation (2) is used to calculate this index, where S represents the number of species, and a_i and a_j correspond to the abundances of species i and j, respectively. The term \sum_{j=i+1}^{S} in the numerator represents the double sum of the products of the distances between all species in the tree and their abundances; in the denominator, it corresponds to the double sum of the products of the abundances of the species:

SPD = S(S-1) \frac{\sum_{i=1}^{S-1}\sum_{j=i+1}^{S} d_{ij}\, a_i a_j}{\sum_{i=1}^{S-1}\sum_{j=i+1}^{S} a_i a_j} \qquad (2)

The mean nearest neighbor distance (MNND) is a weighted average of the phylogenetic distance of the nearest neighbor of each species [30]. The weights represent the abundance of each species.
Equation (3) shows the formula used to calculate this index, where S represents the number of species in the community, \min(d_{ij}) represents the distance between species i and j, and a_i corresponds to the abundance of species i. In the case of d_{ij}, j refers to the closest relative of species i:

MNND = \frac{\sum_{i=1}^{S} \min(d_{ij})\, a_i}{\sum_{i=1}^{S} a_i} \qquad (3)

The phylogenetic species variability (PSV) index measures the variation between two species in a community, and quantifies the phylogenetic relationship between them. The PSV is calculated using Equation (4), where C is a matrix, \mathrm{tr}C is the sum of the diagonal values of this matrix, \sum c represents the sum of all of the values in the matrix, and S is the total number of species:

PSV = \frac{S \cdot \mathrm{tr}C - \sum c}{S(S-1)} \qquad (4)

The phylogenetic species richness (PSR) index calculates the richness of the species present in a community based on their variability [29]. As shown in Equation (5), this calculation is done by multiplying the number of species (S) by the PSV:

PSR = S \cdot PSV \qquad (5)

The mean phylogenetic distance (MPD) represents the average phylogenetic distance, which is calculated by analyzing combinations of all pairs of species in the community [30]. The equation for this index uses the total number of species, indicated by S; the phylogenetic distance between each pair of species, denoted by d_{ij}; and the variables p_i and p_j, which take a value of one if the species is present and zero otherwise. The term \sum_{j=i+1}^{S} indicates a double sum of the products of the distances between all species in the tree and the values indicating the presence or absence of the species, i.e., one or zero. Equation (6) shows the formula used to calculate the MPD:

MPD = \frac{\sum_{i=1}^{S-1}\sum_{j=i+1}^{S} d_{ij}\, p_i p_j}{\sum_{i=1}^{S-1}\sum_{j=i+1}^{S} p_i p_j} \qquad (6)

The taxonomic diversity index (Δ) represents the average phylogenetic distance between the individuals of the species [31]. This index takes into consideration the number of individuals of each species and the taxonomic relationships between them.
The formula for calculating Δ is defined by Equation (7), where a_i (i = 1, ..., S) represents the abundance of species i, a_j (j = 1, ..., S) represents the abundance of species j, S indicates the total number of species, n denotes the total number of individuals, and d_{ij} is the taxonomic distance between species i and j:

\Delta = \frac{\sum_{i=1}^{S-1}\sum_{j=i+1}^{S} d_{ij}\, a_i a_j}{n(n-1)/2} \qquad (7)

Finally, the taxonomic distinction index (Δ*), defined by Equation (8), expresses the average taxonomic distance between two individuals of different species [31]. In this expression, a_i (i = 1, ..., S) is the abundance of species i, a_j (j = 1, ..., S) is the abundance of species j, S is the total number of species, and d_{ij} is the taxonomic distance between species i and j:

\Delta^* = \frac{\sum_{i=1}^{S-1}\sum_{j=i+1}^{S} d_{ij}\, a_i a_j}{\sum_{i=1}^{S-1}\sum_{j=i+1}^{S} a_i a_j} \qquad (8)

The equations in Section 3.2.1 are derived in relation to biological concepts. For a better understanding of how these indexes are calculated for images, we present an example of a three-dimensional image with two slices, from which we extract the cladogram and calculate the distances and the eight indexes. We used a small image so that the calculations were not extensive in the paper. To calculate the phylogenetic distances based on the cladogram, we use distance equations defined between each pair of distinct species i and j, as illustrated in Figure 4. In our implementation of these indexes, we represent the cladogram as a histogram structure. Each position in the histogram represents a species (the intensities in the image), and each value refers to the abundance (the number of voxels with each intensity). We can then calculate the distances using the histogram. When constructing the cladogram, we apply a simple but efficient optimization, as illustrated in Figure 4. The resulting distances are shown in Table 2, while the abundance of each species can be seen in Figure 4. Since the calculation of the PD is more complex than that of the other indexes, we describe it using Table 3.
Based on the values in Table 3, the PD value for the example image can be computed. To calculate the SPD, as shown in Equation (2), we use the distances between each pair of intensities together with their abundances. For the MNND, \min(d_{ij}) represents the closest relative of i (intensity j, which refers to the intensity following i), and a_i denotes the abundance of species i (the number of voxels with intensity i). Since our cladogram has only one path between one species and another, the minimum path is the only path between species, and the MNND for our image follows directly. To calculate the PSV, as shown in Equation (4), we use the matrix C derived from the cladogram. The PSR, as shown in Equation (5), is obtained by multiplying the PSV by the number of species. To calculate the MPD, as defined in Equation (6), we consider the sum of the distances between species (d_{ij}) multiplied by the variables p_i and p_j, which take a value of zero if the species is not present, or one if the species is present. Thus, when a given intensity in the histogram exists in the image, p is set to one; otherwise, p is set to zero. In the calculation of Δ, as defined in Equation (7), d_{ij} represents the distance between the species (intensities) i and j, and a_i refers to the abundance of species i. Finally, the calculation of Δ*, as defined by Equation (8), is similar to that of Δ, with the difference that in the case of Δ* the denominator is the sum of the products of the abundances of the species.
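As a complement to the worked example, the sketch below computes several of the indexes directly from an image histogram. It is illustrative only: since Figure 4 and Table 2 are not reproduced here, the hypothetical two-slice volume, the linear distance d_ij = |i - j|, and the omission of PD, PSV, and PSR (which additionally require the assembled cladogram's branch lengths and covariance matrix) are all assumptions, not the paper's exact conventions.

```python
import numpy as np

def texture_indexes(hist):
    """Compute five of the eight diversity indexes from an intensity
    histogram: species = intensities present, abundance = voxel counts.
    The pairwise distance d_ij = |i - j| is an illustrative stand-in
    for the cladogram path length."""
    hist = np.asarray(hist, dtype=float)
    sp = np.flatnonzero(hist)                  # species (intensities)
    a = hist[sp]                               # abundances
    S, n = len(sp), hist.sum()                 # species count, individuals
    d = np.abs(sp[:, None] - sp[None, :]).astype(float)
    iu = np.triu_indices(S, k=1)               # pairs with i < j
    dd, aa = d[iu], (a[:, None] * a[None, :])[iu]

    spd = S * (S - 1) * (dd * aa).sum() / aa.sum()   # Eq. (2)
    np.fill_diagonal(d, np.inf)                      # exclude self-pairs
    mnnd = (d.min(axis=1) * a).sum() / a.sum()       # Eq. (3)
    mpd = dd.sum() / len(dd)                         # Eq. (6), all p_i = 1
    delta = (dd * aa).sum() / (n * (n - 1) / 2.0)    # Eq. (7)
    delta_star = (dd * aa).sum() / aa.sum()          # Eq. (8)
    return spd, mnnd, mpd, delta, delta_star

# Hypothetical two-slice volume of 2x2 voxels per slice.
volume = np.array([[[1, 1], [2, 3]], [[3, 3], [2, 1]]])
hist = np.bincount(volume.ravel(), minlength=4)      # abundance per intensity
print(texture_indexes(hist))
```

Because the histogram already encodes species and abundances, no explicit tree needs to be stored, which reflects the optimization of the cladogram assembly mentioned among the contributions.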