title: Deep Fractional Max Pooling Neural Network for COVID-19 Recognition authors: Wang, Shui-Hua; Satapathy, Suresh Chandra; Anderson, Donovan; Chen, Shi-Xin; Zhang, Yu-Dong date: 2021-08-10 journal: Front Public Health DOI: 10.3389/fpubh.2021.726144 Aim: Coronavirus disease 2019 (COVID-19) is a disease triggered by a new strain of coronavirus. This paper proposes a novel model termed "deep fractional max pooling neural network (DFMPNN)" to diagnose COVID-19 more efficiently. Methods: This 12-layer DFMPNN replaces the max pooling (MP) and average pooling (AP) of ordinary neural networks with a novel pooling method called "fractional max-pooling" (FMP). In addition, multiple-way data augmentation (DA) is employed to reduce overfitting, and model averaging (MA) is used to reduce randomness. Results: We ran our algorithm on a four-category dataset that contained COVID-19, community-acquired pneumonia, secondary pulmonary tuberculosis (SPT), and healthy control (HC). Over 10 runs on the test set, the micro-averaged F1 (MAF) score of our DFMPNN is 95.88%. Discussion: The proposed DFMPNN is superior to 10 state-of-the-art models. Moreover, FMP outperforms traditional MP, AP, and L2-norm pooling (L2P). Coronavirus disease 2019 (COVID-19) is a disease triggered by a new strain of coronavirus: "CO" stands for corona, "VI" for virus, and "D" for disease. As of 28 June 2021, COVID-19 had caused more than 181.437 million confirmed cases and over 3.929 million deaths. The pie chart of the top 10 countries by new cases, new deaths, cumulative cases, and cumulative deaths is displayed in Figure 1.
To diagnose COVID-19 effectively, two types of methods exist: (i) polymerase chain reaction (PCR), particularly real-time reverse-transcriptase PCR (rRT-PCR), which tests nasopharyngeal swab samples for the existence of RNA fragments (1); and (ii) chest imaging (CI), which examines the lungs for evidence of COVID-19. rRT-PCR is commonly used nowadays, but it has three shortcomings: (i) results take a few days to obtain; (ii) the samples are easily contaminated by the environment; (iii) its performance on COVID-19 variants (2) is still under investigation. In contrast, CI diagnosis has several advantages over rRT-PCR (3): (i) chest imaging can detect conclusive evidence, i.e., lung lesions in which "ground-glass opacity (GGO)" patches distinguish COVID-19 patients from healthy people; (ii) chest imaging provides an instant result as soon as imaging is complete; (iii) a previous study shows that chest computed tomography (CCT), one CI approach, can detect 97% of COVID-19 infections (4). At present, three styles of CI approach exist: (i) chest X-ray, (ii) chest CT, and (iii) chest ultrasound. Among the three, CCT provides finer resolution than the other two (chest X-ray and chest ultrasound), granting visualization of exceptionally small nodules in the lung and displaying realistic three-dimensional imaging of the chest (5). Some COVID-19 lesions are clearly observed in CCT, while they appear opaque in the other two CI approaches (6). However, manual labeling of CCT images by human experts is tedious, onerous, labor-intensive, and time-consuming. In addition, labeling performance is easily affected by inter-expert and intra-expert factors (e.g., emotion, lethargy, tiredness). Furthermore, early-stage lesions are small and look similar to nearby healthy tissue (7), making them more difficult to measure.
Thus, those lesions are potentially ignored by human experts. Therefore, scholars nowadays favor using artificial intelligence (AI) and modern deep learning (DL) to assist radiologists in recognizing COVID-19. Yao (8) proposed a wavelet entropy biogeography-based optimization (WEBBO) method for COVID-19 diagnosis. Wu (9) presented three-segment biogeography-based optimization (3SBBO) for recognizing COVID-19 patients. Wang and other scholars presented further approaches, including DeCovNet (10), FSVC (11), GN-COD (12), GLCMSVM (13), 5L-DCNN (14), CSSN (15), FCONet (16), and COVNet (17).

The per-class slice-to-subject ratio (STSR) is defined as $m_k = m_k^S / m_k^P$, where $m_k^S$ means the number of slices per class via the SLS, and $m_k^P$ the number of patients per class. The overall STSR is defined as $m = \sum_k m_k^S / \sum_k m_k^P$. Five hundred and twenty-one subjects and 1,164 slice images were enrolled and extracted in (18). Table 1 lists the demographics of the four-class cohort; meanwhile, the values of the triplets $m_k$, $m_k^P$, and $m_k^S$ of each class are displayed. From Table 1, we can observe the overall STSR $m = 2.23$.

Three experienced radiologists, one senior ($M_3$) and two juniors ($M_1$ and $M_2$), were convened to curate all the images. Let $b_C$ denote one CCT scan and $l_A$ the labeling of each individual radiologist. The concluding labeling $l_A^F$ of the CCT scan $b_C$ is written as $l_A^F(b_C) = h_{MAV}[l_A^{All}(b_C)]$, where $h_{MAV}$ denotes the majority voting (MAV) function and $l_A^{All}$ the labeling of all three radiologists, viz., $l_A^{All}(b_C) = [l_A(b_C \mid M_1), l_A(b_C \mid M_2), l_A(b_C \mid M_3)]$. The above two formulas indicate that in cases of disagreement between the analyses of the two junior radiologists ($M_1$, $M_2$), the senior radiologist ($M_3$) is consulted to reach a MAV-type consensus. Table 2 presents the abbreviations and corresponding definitions.

Let the raw dataset be symbolized as $F_A$, each slice as $f_a$, and the number of total slices over all four classes as $|F|$; we have $F_A = \{f_a(i)\}, i = 1, \dots, |F|$. The size of each image is $h_{size}[f_a(i)] = W_{F_A} \times H_{F_A} \times 3$, where $W_{F_A}, H_{F_A}$ mean the maximum values of width and height over the image set $F_A$, and $h_{size}$ is the size function. Figure 2 portrays the preprocessing pipeline. First, the color CCT images are converted into grayscale by retaining the luminance channel (21).
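The consensus-labeling scheme and the slice-to-subject ratio described above can be sketched as follows; this is a minimal illustration with assumed data structures, not the authors' code, and the example labels are hypothetical:

```python
from collections import Counter

def stsr(n_slices, n_subjects):
    """Slice-to-subject ratio: slices retained per enrolled subject."""
    return n_slices / n_subjects

def h_mav(labels):
    """Majority-voting (MAV) consensus over the radiologists' labelings."""
    return Counter(labels).most_common(1)[0][0]

# Overall STSR of the cohort: 1,164 slices over 521 subjects.
print(round(stsr(1164, 521), 2))  # 2.23

# The two juniors (M1, M2) disagree; the senior (M3) breaks the tie.
print(h_mav(["COVID-19", "CAP", "COVID-19"]))  # COVID-19
```

With three voters and a binary disagreement, the majority always exists, so no extra tie-breaking rule is needed.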
The grayscaled dataset is symbolized as $F_B = \{f_b(i)\}, i = 1, \dots, |F|$. If $[f_a^R(i), f_a^G(i), f_a^B(i)]$ denotes the values of the red, green, and blue color channels, the grayscaled image is calculated as $f_b(i) = h_{gray}[f_a^R(i), f_a^G(i), f_a^B(i)]$, where $h_{gray}$ is a weighted luminance combination of the three channels (e.g., with the standard weights 0.299, 0.587, and 0.114). Second, histogram stretching (HS) is harnessed to increase the contrast of all images $f_b(i)$. Take the $i$-th image $f_b(i)$ as an instance; its image-wise minimum grayscale value is calculated as $f_b^l(i) = \min_{w,h} f_b(i \mid w, h)$, and its image-wise maximum grayscale value as $f_b^h(i) = \max_{w,h} f_b(i \mid w, h)$. Here, $(w, h)$ are the indexes along the width and height directions of the image $f_b(i)$, and $W_{F_B}, H_{F_B}$ mean the maximum values of width and height over the image set $F_B$. The histogram-stretched image set $F_C = \{f_c(i)\}, i = 1, \dots, |F|$, is calculated as

$f_c(i) = \dfrac{f_b(i) - f_b^l(i)}{f_b^h(i) - f_b^l(i)}$.

Third, cropping is performed to remove (i) the checkup bed at the bottom area, (ii) the texts at the margin regions, and (iii) the ruler along the right-side and bottom areas. Each image in the cropped set has size $[W_{F_C} - c_1 - c_2] \times [H_{F_C} - c_3 - c_4]$, where $W_{F_C}, H_{F_C}$ mean the maximum values of width and height of the image set $F_C$, and $(c_1, c_2, c_3, c_4)$ are the numbers of pixels cropped from the left, right, top, and bottom directions, respectively (unit: pixel). Fourth, each image is downscaled as $f_e(i) = h_{res}[f_d(i)]$, where $h_{res}$ stands for the resizing function. Figure 3 displays exemplar images of the four classes, where three are diseased and one is healthy; the meaning of $k$ can be found in Table 1. The space-saving ratio (SSR) value $v_{SSR}$ can be calculated as the ratio of the storage saved by preprocessing relative to the raw dataset.

Pooling is necessary to reduce the size of the feature map (FM) (23), which is generated after the convolution layer. Suppose the input FM has size $N_{in} \times N_{in}$ and the output FM $N_{out} \times N_{out}$; usually, $N_{out} < N_{in}$. In another sense, pooling divides the input FM into $N_{out}^2$ pooling regions $P_{ij}$, and the output is $\text{Output}_{ij} = h_{pool}(\{\text{Input}_{kl} : (k, l) \in P_{ij}\})$, where $h_{pool}$ is a pooling function, such as the max function in MP or the average function in AP (24). There are also more complicated pooling functions, such as the stochastic function (25) and rank-based functions.
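The generic pooling operation above can be sketched with a small example; this is an illustrative NumPy implementation, not the paper's code, and the feature-map values are made up:

```python
import numpy as np

def pool2d(fm, size, h_pool=np.max):
    """Divide an N_in x N_in feature map into non-overlapping size x size
    pooling regions P_ij and apply h_pool to each region (max -> MP,
    mean -> AP). Assumes size divides the input width evenly."""
    n_in = fm.shape[0]
    n_out = n_in // size
    out = np.empty((n_out, n_out))
    for i in range(n_out):
        for j in range(n_out):
            region = fm[i * size:(i + 1) * size, j * size:(j + 1) * size]
            out[i, j] = h_pool(region)
    return out

fm = np.arange(16.0).reshape(4, 4)   # toy 4x4 feature map
mp = pool2d(fm, 2, np.max)           # MP keeps the max of each 2x2 block
ap = pool2d(fm, 2, np.mean)          # AP keeps the mean of each 2x2 block
print(mp)  # maxima: 5, 7, 13, 15
print(ap)  # means: 2.5, 4.5, 10.5, 12.5
```

Swapping `h_pool` swaps the pooling variant without touching the region logic, which mirrors how MP, AP, and L2P differ only in the reduction function applied over each $P_{ij}$.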
Traditional regular pooling methods with a stride ($\alpha$) of 2 are analyzed first. For non-overlapping pooling, the window size equals the stride ($2 \times 2$ windows, stride 2); for overlapping pooling, the window is larger than the stride (e.g., $3 \times 3$ windows, stride 2). The pooling regions of both cases are portrayed in Figure 4, where the red, green, yellow, and blue rectangles represent four steps of each pooling procedure. In either the non-overlapping or the overlapping case, we can observe $N_{out} \approx N_{in}/2$. Thus, the spatial size of the FM halves with each pooling layer. This halving brings a by-product of discarding $1 - (0.5)^2 = 75\%$ of the information of the previous FM, and such rapid reduction may worsen the performance. Therefore, Graham (26) proposed a novel fractional max pooling (FMP), i.e., $\alpha \times \alpha$ MP, where $\alpha$ is allowed to take non-integer values $1 < \alpha < 2$. In their paper, they set $\alpha = 2^{1/n}$, which makes the pooling shrink the FM $n$ times more slowly than regular $2 \times 2$ pooling. FMP has been extended to new models, such as bi-linearly weighted FMP (27) and shallow and wide FMP (28).

Let $\{a_i\}$ and $\{b_j\}$ be two increasing sequences of integers starting at 1 and ending at $1 + N_{in}$, with all increments equal to either 1 or 2, that is, $a_{i+1} - a_i \in \{1, 2\}$ and $b_{j+1} - b_j \in \{1, 2\}$. The pooling regions can be formulated either as disjoint, $P_{ij} = [a_{i-1}, a_i - 1] \times [b_{j-1}, b_j - 1]$, or as overlapping, $P_{ij} = [a_{i-1}, a_i] \times [b_{j-1}, b_j]$. In this study, we choose the disjoint type of FMP. We also tested overlapping FMP; the computation burden increases, but the performance does not improve. Figure 5A shows a square grid where $N_{in} = 30$. Figures 5B-D show the FMP results for $\alpha$ = 1.4, 1.5, and 1.6, with corresponding $N_{out}$ = 21, 20, and 19, respectively. Finally, Figure 5E displays the result with $\alpha = 2$, which corresponds to regular $2 \times 2$ pooling, where $N_{out} = 15$.

We built a 12-layer DFMPNN from scratch. Its structure is itemized in Table 3, where NWL represents the number of weighted layers and HS the hyperparameter setting. Transfer learning, such as ResNet-50 (29), may help build a network quickly. In our study, however, we find that ResNet-50 and other pretrained models do not provide performances as competitive as networks built from scratch, which is coherent with the reports in (20).
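The random boundary sequences for disjoint FMP can be sketched as follows; this is a minimal illustration of the construction (increments of 1 or 2 summing to $N_{in}$), assuming $N_{out} = \mathrm{round}(N_{in}/\alpha)$, and is not the authors' implementation:

```python
import random

def fmp_boundaries(n_in, n_out, seed=None):
    """Random increasing boundary sequence for disjoint FMP: n_out
    increments, each equal to 1 or 2, summing to n_in. Requires
    n_out <= n_in <= 2 * n_out, which holds for 1 < alpha <= 2."""
    rng = random.Random(seed)
    n_twos = n_in - n_out                      # how many increments are 2
    incs = [2] * n_twos + [1] * (n_out - n_twos)
    rng.shuffle(incs)                          # a fresh random layout each run
    bounds = [0]
    for d in incs:
        bounds.append(bounds[-1] + d)
    return bounds                              # n_out + 1 boundaries over [0, n_in]

def fmp_regions(n_in, alpha, seed=None):
    """1-D pooling intervals [a_{i-1}, a_i) for alpha x alpha FMP."""
    n_out = round(n_in / alpha)
    b = fmp_boundaries(n_in, n_out, seed)
    return [(b[i], b[i + 1]) for i in range(n_out)]

regs = fmp_regions(30, 1.5, seed=0)
print(len(regs))           # 20 regions, matching N_out for N_in = 30, alpha = 1.5
print(regs[-1][1])         # 30: the regions tile the whole axis
```

Because the increment layout is re-randomized per run, each trained copy of the network sees different pooling regions, which is exactly what makes the model-averaging ensemble diverse.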
Figure 6 shows the FMs of all layers of this DFMPNN. Since our network is deep, we show Layer 1 to Layer 13 in Figure 6A and Layer 13 to Layer 25 in Figure 6B. The random sequences $\{a_i\}$ and $\{b_j\}$ are generated differently at each run. Therefore, this network can easily be implemented multiple times, making an ensemble of those implementations (31): the different pooling-region setting of each implementation defines a different member of the ensemble. Model averaging (MA) can help DFMPNN obtain better results. For a given test image, if we implement $T$ tests, the MAV of the $T$ tests is used as the final prediction.

To alleviate overfitting and cope with the small-size dataset problem, we used the 18-way DA in (32). In their paper, $X_1 = 9$ different DA methods were used on both the raw image $r(i)$ and its horizontally mirrored image $r_{hm}(i)$. The $X_1$ DAs are rotation, Gaussian noise, Gamma correction, random translation, vertical shear, salt-and-pepper noise, speckle noise, horizontal shear, and scaling, shown in Figure 7. Suppose the raw image is $r(i)$ and the number of DA methods is $X_1$. Let $x$ be the index of DA and $K_x, x = 1, \dots, X_1$, each DA operation. In Step 1, the $X_1$ geometric/photometric/noise-injection DA transforms are applied to the raw image $r(i)$; thus, we have $X_1$ augmented datasets $K_x[r(i)], x = 1, \dots, X_1$, on the raw image $r(i)$ (Equation 25). Note that each DA operation $K_x$ yields $X_2$ new images (Equation 26). In Step 2, the horizontally mirrored image $r_{hm}(i)$ is generated via the horizontal mirror function $h_m$, i.e., $r_{hm}(i) = h_m[r(i)]$ (Equation 27). In Step 3, all $X_1$ DA transforms are applied to the horizontally mirrored image $r_{hm}(i)$, yielding the $X_1$ datasets $K_x[r_{hm}(i)], x = 1, \dots, X_1$.
Each of these datasets again contains $X_2$ images (Equation 28). In Step 4, $r(i)$, $r_{hm}(i)$, $K_x[r(i)]$, and $K_x[r_{hm}(i)]$, $x = 1, \dots, X_1$, are combined via the concatenation function $h_{co}$; that is, one raw training image $r(i)$ generates an enhanced dataset $D(i) = h_{co}\{r(i), r_{hm}(i), K_x[r(i)], K_x[r_{hm}(i)]\}$ (Equation 29). Let $X_3$ represent the augmentation factor, i.e., the number of elements in the enhanced dataset $D(i)$; we have $X_3 = 2 \times X_1 \times X_2 + 2$ (Equation 30). Finally, Table 4 shows the pseudocode of the 18-way DA.

The dataset of each class $k = 1, \dots, 4$ is split into a non-test set covering 80% of the total set and a test set covering the remaining 20%. The experiment consists of two phases. In Phase I, "Validation," 10-fold cross-validation is harnessed on the non-test set, with the aim of selecting the best hyperparameters and the best network structure. The 18-way DA is applied to the training set. In Phase II, "Test," our model is trained on the non-test set $U^{ntest}$ $Q_t$ times with (i) different initial seeds and (ii) the best hyperparameters/network structure obtained in Phase I. We attain the test results over the test set $U^{test}$. Combining the $Q_t$ runs, a summation of the test confusion matrix (TCM) $E_t$ is obtained. The ideal TCM is a diagonal matrix in which all off-diagonal elements are zero, $E_t^{ideal}(i, j) = 0, i \ne j$, indicating no prediction errors. In realistic situations, all AI models will, no doubt, make errors. Hence, the performance per category is calculated to measure realistic AI models. For each class $k = 1, \dots$
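The four DA steps above can be sketched in a few lines; this is an illustrative skeleton with dummy operators, not the authors' pipeline, and the value $X_2 = 30$ is assumed so that the factor matches the reported 542:

```python
def eighteen_way_da(r, operators, h_m):
    """18-way DA sketch: apply each operator K_x to the raw image r and to
    its horizontal mirror r_hm, then concatenate everything with the two
    source images (h_co is plain list concatenation here)."""
    r_hm = h_m(r)
    d = [r, r_hm]
    for k in operators:       # each K_x is assumed to return a list of X2 images
        d += k(r) + k(r_hm)
    return d

def h_m(img):                 # toy horizontal mirror for the demo
    return img[::-1]

# 9 dummy operators, each producing X2 = 30 (hypothetical) images.
ops = [(lambda img: [img] * 30) for _ in range(9)]
d = eighteen_way_da("img", ops, h_m)
print(len(d))                 # 542 = 2 * X1 * X2 + 2 = 2 * 9 * 30 + 2
```

The augmentation factor follows directly: two source images plus $X_1$ operators times $X_2$ images for each of the two sources.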
, 4, the label of that class is set to positive, and the labels of all the remaining classes $1, \dots, k-1, k+1, \dots, 4$ are set to negative. The definitions of true positive (TP), false negative (FN), false positive (FP), and true negative (TN) are illustrated in Figure 8. Three performance metrics per category (sensitivity, precision, and F1 score) are defined:

$Sen_k = \dfrac{TP_k}{TP_k + FN_k}, \quad Prc_k = \dfrac{TP_k}{TP_k + FP_k}, \quad F1_k = \dfrac{2 \times Sen_k \times Prc_k}{Sen_k + Prc_k}$.

The performances of our DFMPNN model are measured over all four categories. The micro-averaged F1 (MAF) score (symbolized as $F1_\mu$) is harnessed since our dataset is slightly unbalanced. MAF is defined as $F1_\mu = \dfrac{2 \times Sen_\mu \times Prc_\mu}{Sen_\mu + Prc_\mu}$, where $Sen_\mu = \dfrac{\sum_k TP_k}{\sum_k (TP_k + FN_k)}$ and $Prc_\mu = \dfrac{\sum_k TP_k}{\sum_k (TP_k + FP_k)}$.

The parameter settings are itemized in a separate table; the augmentation factor is 542. We report our performance over 10 runs on the test set. Taking Figure 3A as an exemplar raw image $r(i)$, Figure 9 shows the $X_1$ different DA results on the raw image, i.e., $K_x[r(i)], x = 1, \dots, X_1$. Due to the page limit, the horizontally mirrored image and its corresponding $X_1$-way DA results are not shown here.

We now demonstrate the effectiveness of FMP. If we use standard pooling methods with a stride of 2, the corresponding networks shrink faster and have a shallower depth. The three baseline pooling methods for comparison are L2-norm pooling (L2P), MP, and AP. The results of 10 runs over the test set are itemized in Table 8. The bar plot is shown in Figure 11, where $k$-S, $k$-P, and $k$-F, $k \in \{1, 2, 3, 4\}$, stand for the sensitivity, precision, and F1 score of category $k$. The rightmost bar, "MAF," stands for the micro-averaged F1 score. In terms of MAF, our DFMPNN model based on FMP attains the best result of 95.88%. The second best is MP, with an MAF of 92.92%; AP ranks third, with an MAF of 92.53%; the worst is L2P, with an MAF of 91.80%. Our FMP attains the best results for two reasons: (i) FMP reduces the FM more slowly, so it can create a deeper network; (ii) MA reduces the randomness of our DFMPNN network.
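The per-category and micro-averaged metrics above can be computed directly from the summed test confusion matrix; the sketch below follows the paper's formulas, but the example matrix is made up for illustration:

```python
import numpy as np

def micro_f1(cm):
    """Micro-averaged F1 from a k x k confusion matrix (rows = truth,
    columns = prediction). TP_k are the diagonal entries; FN_k and FP_k
    come from the row and column sums, following the paper's formulas."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    sen_mu = tp.sum() / (tp.sum() + fn.sum())   # micro-averaged sensitivity
    prc_mu = tp.sum() / (tp.sum() + fp.sum())   # micro-averaged precision
    return 2 * sen_mu * prc_mu / (sen_mu + prc_mu)

# Hypothetical 4-class TCM (50 test images per class).
cm = [[48, 1, 1, 0],
      [2, 45, 2, 1],
      [1, 1, 47, 1],
      [0, 1, 1, 48]]
print(round(micro_f1(cm), 4))  # 0.94
```

Note that for single-label multi-class classification, every FN of one class is an FP of another, so micro sensitivity, micro precision, and micro F1 all coincide with overall accuracy; the explicit formulas are kept to match the paper.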
In the future, we shall try two FMP extension models (27, 28) to test whether we can further improve the performance. We compared our proposed DFMPNN method with 10 state-of-the-art methods: WEBBO (8), 3SBBO (9), DeCovNet (10), FSVC (11), GN-COD (12), GLCMSVM (13), 5L-DCNN (14), CSSN (15), FCONet (16), and COVNet (17). All comparisons were carried out over the same test set of 10 runs. The comparison results are itemized in Table 9. Figure 12 compares the proposed DFMPNN model with the 10 state-of-the-art models, ranked by MAF performance (last column in Figure 12) in descending order. We can observe from Figure 12 that the proposed DFMPNN achieves the highest MAF value among all algorithms.

We not only propose the DFMPNN model but also integrate three improvements: (i) FMP replaces traditional MP and AP; (ii) multiple-way DA is utilized; (iii) DFMPNN is shown to yield better results than 10 state-of-the-art models. The model has four shortcomings. First, some advanced AI modules that may help improve performance are not integrated. Second, more advanced pooling techniques could be tested. Third, the dataset is relatively small. Fourth, we do not have an environment in which to clinically validate our model. To address these weak points, we shall try to integrate more advanced DL modules, such as graph networks and attention mechanisms. Meanwhile, some advanced pooling techniques will be tested, such as stochastic pooling and rank-based pooling. Furthermore, we shall try to combine several COVID-19 datasets from different sources so that our model can be tested on more datasets. Finally, we shall try to distribute our software to hospital staff and let them test the proposed model.

The dataset is available upon reasonable request to the corresponding authors. Requests to access these datasets should be directed to Yu-Dong Zhang, yudongzhang@ieee.org.
Author Contributions: S-HW: conceptualization, software, formal analysis, methodology, writing, review, editing, visualization, and funding acquisition. SS: methodology, writing-original draft, project administration, and resources. DA: writing-original draft, writing, review, and editing. S-XC: data curation, writing-original draft, writing, review, editing, project administration, and supervision. Y-DZ: conceptualization, methodology, resources, writing-original draft, writing, review, and editing, visualization, project administration, supervision, and funding acquisition. All authors contributed to the article and approved the submitted version.

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References
1. Clinical COVID-19 diagnostic methods: comparison of reverse transcription loop-mediated isothermal amplification (RT-LAMP) and quantitative RT-PCR (qRT-PCR).
2. National Study Group for COVID-19 Vaccination. Effectiveness of the BNT162b2 Covid-19 vaccine against the B.1.1.7 and B.1.351 variants.
3. Follow-up testing of borderline SARS-CoV-2 patients by rRT-PCR allows early diagnosis of COVID-19.
4. Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases.
5. Ultra-low-dose chest CT performance for the detection of viral pneumonia patterns during the COVID-19 outbreak period: a monocentric experience.
6. CO-RADS versus CT-SS scores in predicting severe COVID-19 patients: retrospective comparative study.
7. Evolution of lung function and chest CT 6 months after COVID-19 pneumonia: real-life data from a Belgian University Hospital.
8. COVID-19 detection via wavelet entropy and biogeography-based optimization.
9. Diagnosis of COVID-19 by wavelet Renyi entropy and three-segment biogeography-based optimization.
10. A weakly-supervised framework for COVID-19 classification and lesion localization from chest CT.
11. Novel feature selection and voting classifier algorithms for COVID-19 classification in CT images.
12. Detection of COVID-19 by GoogLeNet-COD.
13. Covid-19 classification based on gray-level co-occurrence matrix and support vector machine.
14. A five-layer deep convolutional neural network with stochastic pooling for chest CT-based COVID-19 diagnosis.
15. Predicting COVID-19 pneumonia severity on chest X-ray with deep learning.
16. COVID-19 pneumonia diagnosis using a simple 2D deep learning framework with a single chest CT image: model development and validation.
17. Using artificial intelligence to detect COVID-19 and community-acquired pneumonia based on pulmonary CT: evaluation of the diagnostic accuracy.
18. PatchShuffle stochastic pooling neural network for an explainable diagnosis of COVID-19 with multiple-way data augmentation.
19. Deep learning on chest X-ray images to detect and evaluate pneumonia cases at the era of COVID-19.
20. Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks.
21. A two-stage parametric subspace model for efficient contrast-preserving decolorization. Front Inform Technol Electron Eng.
22. Developing a dynamic cluster quantization based lossless audio compression (DCQLAC).
23. Calibrating feature maps for deep CNNs.
24. Short U-net model with average pooling based on in-line digital holography for simultaneous restoration of multiple particles.
25. A sparsity-based stochastic pooling mechanism for deep convolutional neural networks.
26. Fractional max-pooling.
27. Bi-linearly weighted fractional max pooling: an extension to conventional max pooling for deep convolutional neural network.
28. Shallow and wide fractional max-pooling network for image classification.
29. Facial expression recognition via ResNet-50.
30. Bayesian model averaging sliced inverse regression.
31. Estimation of air-flow parameters and turbulent intensity in hydraulic jump on rough bed using Bayesian model averaging.
32. Attention network for COVID-19 explainable diagnosis based on convolutional block attention module.