key: cord-0761475-up7ub29y authors: Belciug, Smaranda title: Learning deep neural networks' architectures using differential evolution. Case study: Medical imaging processing date: 2022-05-17 journal: Comput Biol Med DOI: 10.1016/j.compbiomed.2022.105623 sha: 5ea59156645faa9ad552b2ae290918f6b65985b9 doc_id: 761475 cord_uid: up7ub29y

The COVID-19 pandemic has changed the way we practice medicine. Cancer patient and obstetric care landscapes have been distorted. Delaying cancer diagnosis or maternal-fetal monitoring has increased the number of preventable deaths and pregnancy complications. One solution is using Artificial Intelligence to help the medical personnel establish the diagnosis in a faster and more accurate manner. Deep learning is the state-of-the-art solution for image classification. Researchers manually design fixed deep neural network structures and afterwards verify their performance. The goal of this paper is to propose a potential method for learning deep network architectures automatically. As the number of network architectures increases exponentially with the number of convolutional layers in the network, we propose a differential evolution algorithm to traverse the search space. At first, we propose a way to encode the network structure as a candidate solution in a fixed-length integer array, followed by the initialization of the differential evolution method. A set of random individuals is generated, followed by mutation, recombination, and selection. At each generation the individuals with the poorest loss values are eliminated and replaced with more competitive individuals. The model has been tested on three cancer datasets containing MRI scans and histopathological images, and on two maternal-fetal screening ultrasound datasets. The novel proposed method has been compared and statistically benchmarked against four state-of-the-art deep learning networks: VGG16, ResNet50, Inception V3, and DenseNet169. The experimental results showed that the model is competitive with other state-of-the-art models, obtaining accuracies between 78.73% and 99.50% depending on the dataset it has been applied on.

Since the outbreak of the COVID-19 pandemic in 2020, cancer patient and obstetric care landscapes have been distorted. While hospitals became more and more crowded with COVID-19 patients, disturbances appeared throughout all aspects of cancer care, from diagnosis to tailored or classical treatment (Feletto et al., 2020; Ng et al., 2020; van de Haar et al., 2020; van Dorn, 2020; Young et al., 2020), as well as maternal-fetal care (Deprest et al., 2020; Mazur-Bialy et al., 2020; Chmielewska et al., 2021). Europe and North America experienced a lot of pressure on the healthcare system, and changes in routine cancer and maternal-fetal care were necessitated. Even if cancer care remained available, cancer screening programs were interrupted. Delaying diagnosis ultimately increased the number of preventable cancer deaths (Alkatout et al., 2021; Gong et al., 2020; Cheng et al., 2020). The fear and anxiety of being infected with COVID-19 led patients with potentially non-specific symptoms of cancer to avoid consulting specialists (Dinmohamed et al., 2020). Colonoscopy rates fell by 4.1% to 75% (Patt et al., 2020), and lung screening rates were reduced by 57%, 74%, and 56%, respectively (Patt et al., 2020; Lang et al., 2020). Cancer biopsies also recorded reductions, i.e. colon (-33% to -79%) and lung (-47% to -58%).
In the case of neuro-oncology patients, things seem even worse, since the urgency in their care is changing at a much faster pace. So far, the impact of the COVID-19 pandemic on brain tumor patients is yet unknown (Mathew, 2020). As regards maternal-fetal care in the COVID-19 pandemic, long-lasting congenital anomalies of infants have been observed, caused either by the actual infection or by therapeutic maneuvers (Khan et al., 2020). The number of caesarean sections has also increased as a secondary effect of the pandemic (Dube & Kar, 2020). The importance of a correct interpretation of the ultrasound is given by the fact that it allows a detailed discussion regarding the prognosis with parents (i.e. procedural risks, long-term mortality, morbidity, and, ultimately, quality of life). The current approaches have limitations. Among the reported deep learning approaches are a U-Net-based fully convolutional network for low-grade glioma (Dong et al., 2017); a conditional generative adversarial network which obtained 0.68 Dice, 0.99 sensitivity, and 0.98 specificity (Rezaie et al., 2017); a fully convolutional neural network which obtained 0.86 Dice (Zhao et al., 2018); a multi-view deep learning framework which obtained a 0.55 accuracy (Munir et al., 2019); and a deep wavelet autoencoder which obtained an average accuracy of 0.93 (Alom et al., 2019). In (Burgos-Artizzu et al., 2020), the authors applied different pretrained CNNs and two non-deep learning methods on two datasets regarding maternal-fetal ultrasounds and obtained accuracies ranging from 54% to 93.6%. Fujitsu started a research project with the Cancer Translational Research team and the Department of Obstetrics and Gynecology of Showa University School of Medicine, in which they study fetal heart ultrasounds using deep learning (Komatsu et al., 2019). Namburete et al. proposed a fully convolutional network for the segmentation of the 3D fetal brain (Namburete et al., 2018). A convolutional neural network was used for automated fetal cardiac assessment using 4D B-mode ultrasound (Phillip et al., 2019). A segmentation of the fetal lungs and brain was obtained by using deep learning with sequential forward feature selection techniques and Support Vector Machines on magnetic resonance images (MRI) and ultrasounds (Torrents-Barrena et al., 2019). Finding the best architecture of the CNN for the problem at hand can be quite tricky. There is no perfect NN model that can be applied to every problem. This hypothesis was first introduced by Wolpert and Macready under the name of the 'no-free-lunch' theorem (Wolpert & Macready, 1997). Each NN plays the role of a certain 'restaurant' that provides us a 'dish', in our case a measure of performance, at a certain 'price': the computational cost. Determining the 'smart deal' takes a lot of time and effort. In recent years, the interest in automatically learning NN architectures has increased substantially. Three directions can be distinguished: reinforcement learning (Baker et al., 2017; Cai et al., 2018; Zhong et al., 2018; Zoph & Le, 2017; Zoph et al., 2018), progressive neural architecture search (Liu et al., 2018a), and evolutionary computation (Miikkulainen et al., 2017; Real et al., 2017; Xie & Yuille, 2017). In reinforcement learning, the structure of the model is encoded as the sequence of actions the agent makes. The built model is afterwards trained and tested. The reward of the agent is computed as the obtained validation performance.
In progressive neural architecture search, a sequential model-based optimization strategy is used. A surrogate model simultaneously learns to guide the search through the structure space. In evolutionary computation, the NN's structure is represented as an array, which is subjected to random mutations and recombinations during the search process. Each model is trained and evaluated on the validation set. The top performing model is returned. All automated methods outperform manual tuning of the architectures. Ingenious architecture representations together with interesting methodologies have delivered astonishing results when compared to human-designed networks (Real et al., 2017; Real et al., 2019; Sun et al., 2020a). The downside is represented by the necessity of significant computational resources. Nevertheless, neuroevolution necessitates less computational time than reinforcement learning models (Sun et al., 2020b). We propose the use of differential evolution in determining the best NN architecture. We have applied and tested this approach on three different cancer datasets. For comparison purposes we have compared our best performing models with state-of-the-art DL algorithms, such as VGG-16, ResNet50, Inception V3, and DenseNet169. A thorough statistical analysis is performed to determine whether the obtained results are robust and trustworthy. The remaining part of the paper is organized in the following manner. Section 2 briefly describes the related work in the field, Section 3 presents the design and implementation of the novel model, while Section 4 summarizes the benchmarking datasets, the design of experiments, and the parameter settings. Section 5 details the experimental results obtained by the proposed model and other state-of-the-art DLs, followed by their thorough statistical assessment. Section 6 comprises the discussion. The paper ends with Section 7, which contains the conclusions. The need for a fast and reliable diagnosis led to the quest of finding the best architecture of CNNs for the problem at hand. Recent studies have proven that by automatically determining the network's architecture we obtain far better results than by doing so manually. As we have stated above, three directions are established: reinforcement learning, progressive neural architecture search, and evolutionary computation. By 2019, there were over 300 published papers regarding NN architecture search (Lindauer & Hutter, 2019). In (Zoph & Le, 2017), the authors proposed a neural architecture search method built on the algorithm named REINFORCE, first published in 1992 (Williams, 1992). REINFORCE estimated the parameters of a recurrent neural network, parameters that represented the sequence of actions that the agent took. The authors used as reward for the agent the classification accuracy obtained by the newly designed model on the validation data. The study was later extended through a more controlled search space using stacked cells, and through the replacement of the REINFORCE algorithm with the proximal policy optimization algorithm developed by (Schulman et al., 2017). In (Zhong et al., 2018) the same neural architecture search method has been used, only the authors have replaced the policy gradient with the Q-learning method. The Q-learning algorithm was also deployed by (Baker et al., 2017), the difference between the studies being the lack of exploitation of the cell structure in the latter.
In (Cai et al., 2018) the authors added an extra layer to the recurrent neural network trained through the policy gradient. In (Whitelam & Tamblyn, 2020), the authors developed an evolutionary reinforcement learning scheme, which involved alternating physical and evolutionary dynamics, and ultimately led to networks able to promote the self-assembly of a certain structure in a faster and better manner than other methods, such as intuitive cooling protocols. The newly developed networks were able to select between two polymorphs that were equal in energy and had been formed in unpredictable quantities under slow cooling protocols. No human input was needed beyond the specification of which target parameter to promote. (Liu et al., 2018a) proposed progressive neural architecture search. The method implements a progressive scan of the neural architecture search space, choosing at each step the best performing architectures. The networks' validation errors are collected and used to train a surrogate function which predicts the validation error of the succeeding architectures. (Lomurno et al., 2021) proposed a Pareto-optimal progressive neural architecture search that merges the architecture proposed by (Liu et al., 2018a) with a time-accuracy Pareto optimization problem. Technically, a new time predictor is added in order to make a joint prediction of time and accuracy for each candidate architecture, searching over the Pareto front. The area of neuroevolution uses evolutionary computation strategies to define the NNs' architecture (Stanley, 2017). Different types of evolutionary algorithms and stochastic gradient descent are used to learn the structure and/or the hyperparameters of the network. (Liu et al., 2018b) combined a hierarchical genetic representation that models the design pattern used by human experts with an expressive search space for complex topologies. In (Real et al., 2019) the AmoebaNet-A image classifier has been evolved through the modification of the tournament selection of an evolutionary algorithm. The selection was modified by introducing an age property to favor the young genotypes. (Miikkulainen et al., 2017) proposed a new automated method, CoDeepNEAT, that optimizes the DNN's architecture using the neuroevolution technique of NEAT (Stanley & Miikkulainen, 2002). NEAT is used to evolve topologies, weights, and hyperparameters. (Xie & Yuille, 2017) deployed a genetic algorithm to optimize the CNN's architecture. In another study, the authors developed a scalable evolutionary algorithm for NN architecture search (Hajewski et al., 2020). They have applied their novel method to the evolution of deep autoencoders. In (Sun et al., 2020), meta-models with ensemble members are used to estimate the accuracy of different CNNs. Their advantage consists in reducing the training time from 33 GPU days to 10 GPU days, while obtaining the same competitive results as other state-of-the-art techniques. A drawback of their approach is that they do not report the required number of training runs needed to reach that performance. In (Whitelam et al., 2021) the authors show that neuroevolution performs the same as gradient descent on the loss function in the presence of Gaussian white noise. In this study, numerical simulations were performed in order to illustrate the correspondence between the two methods, which can be detected when applied to shallow and deep neural networks.
This connection between machine learning and statistical mechanics was also pointed out in (Bahri et al., 2020). The authors provide a review of recent works which show the associations between deep learning and different mathematical and physical methods such as random landscapes, jamming, dynamical phase transitions, chaos, spin glasses, Riemannian geometry, random matrix theory, nonequilibrium statistical mechanics, and free probability. Contrary to the abovementioned studies, authors such as Khadka et al. (Khadka et al., 2019) suggest that we should be careful when comparing neuroevolution methods to gradient descent, one generation of neuroevolution not being sufficient for such a comparison. Thorough reviews of modern-day neuroevolution, which present various significant features of the process, including large-scale computing, the advantages of novelty and diversity, the power of indirect encoding, meta-learning and architecture search, together with future challenges, can be studied in (Stanley et al., 2019; Galvan & Mooney, 2021). Different from the above methods, we propose the use of differential evolution for determining the architecture of CNNs. The obtained results prove that this method is competitive in terms of performance with other state-of-the-art CNNs. Convolutional Neural Networks (CNNs) are a specific type of NNs. Their architecture usually consists of three types of layers: the convolutional layer (CONV), the pooling layer (POOL), and the fully connected layer (FC). The CONV layer uses filters that perform convolution operations by scanning the input and producing a feature map. The CONV's parameters include the filter size and the stride. The POOL layer is applied after a convolutional layer and downsamples the feature map, producing spatial invariance. The FC layer works on a flattened input, where each input is connected to all neurons. The FC is the last layer of the CNN's architecture. In terms of hyperparameters, in a CNN we encounter the size of the filter, the stride (i.e. the number of pixels by which the window moves after each operation), and the zero-padding (i.e. the process of bordering the input with zeros). In a CNN we have as activation functions the rectified linear unit (ReLU), with its variants the Leaky ReLU and the Exponential Linear Unit (ELU), and the softmax classifier. ReLU, Leaky ReLU, or ELU are used on all elements of the volume. They induce non-linearities in the network, whereas softmax is the generalized logistic function that takes as input a score vector $x \in \mathbb{R}^n$ and outputs a probability vector $p \in \mathbb{R}^n$. The functions are defined as follows:

• ReLU: $f(x) = \max(0, x)$
• Leaky ReLU: $f(x) = \max(\alpha x, x)$, with a small constant $\alpha > 0$
• ELU: $f(x) = x$ if $x \geq 0$, and $f(x) = \alpha(e^x - 1)$ otherwise
• softmax: $p_i = \dfrac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$, $i = 1, \dots, n$

The network is trained through the backpropagation of error signals computed as the difference between the ground truth and the prediction at the top layer.
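For concreteness, the activation functions defined above can be implemented directly; the following NumPy sketch is ours, not part of the paper's code:

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x), applied element-wise
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: lets a small gradient through for negative inputs
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # ELU: smooth exponential saturation for negative inputs
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def softmax(x):
    # Softmax: maps a score vector in R^n to a probability vector in R^n
    z = x - np.max(x)          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```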
Designing a CNN's architecture is captivating. Some researchers argue that deeper CNNs obtain a higher accuracy in classification problems. Many networks have their structures set deterministically, even if stochastic processes are used to avoid over-fitting (Huang et al., 2016b; Ioffe and Szegedy, 2015). Having deterministic structures limits the flexibility of the CNNs; hence we need to automatically learn the network's architecture. Differential Evolution (DE) was introduced in 1997. This heuristic optimization algorithm is flexible, versatile, and easy to implement and understand (Storn & Price, 1997; Storn & Price, 2003). DE mimics the natural biological evolution process. Technically, DE generates a temporary individual starting from the differences within the population, followed by an evolutionary restructuring of the population. Several studies proved its suitability for solving numerical optimization problems, showing good global convergence and robustness. DE has been applied fruitfully in constrained image classification (Omran & Engelbrecht, 2006), image segmentation (Aslantas & Tunckanat, 2007), neural networks (Dhahri & Alimi, 2006), linear arrays (Yang et al., 2002), global optimization problems (Kim et al., 2007), and other areas (Massa et al., 2006; Su & Lee, 2003; Tasgetiren et al., 2009; Sum-Im et al., 2007). Different from other evolutionary computation algorithms, DE uses a population-based global search strategy. The complexity of the differential mutation operation is reduced by using one-on-one competition. By adapting the candidate solutions, DE explores different solutions in parallel. Its memory capacity enables a dynamic tracking of the current search, making adjustments of the search strategy possible; through this, global convergence and robustness are achieved.

Mathematically speaking, the population of each generation $G$ contains $N$ candidates. Each candidate can be written as $x_i^G = (x_{i1}^G, x_{i2}^G, \dots, x_{iM}^G)$, $i = 1, 2, \dots, N$, where $M$ is the number of features. The initial population of candidate solutions is randomly generated between the upper and lower bounds of the search interval for each feature:

$$x_{ij}^0 = x_j^L + \mathrm{rand}(0,1) \cdot (x_j^U - x_j^L),$$

where $x_j^L$ is the lower bound of the variable $x_j$ and $x_j^U$ is its upper bound. For the mutation process to take place, we need to select three distinct vectors $x_{r_1}^G$, $x_{r_2}^G$, $x_{r_3}^G$. The following formula is applied:

$$v_i^{G+1} = x_{r_1}^G + F \cdot (x_{r_2}^G - x_{r_3}^G),$$

where $v_i^{G+1}$ is the donor vector and $F \in [0,1]$ is the variation factor that regulates the amplification degree of the differential variation $x_{r_2}^G - x_{r_3}^G$. Regarding the recombination process, the operator develops a trial vector $u_i^{G+1}$ from the target vector $x_i^G$ and the donor vector $v_i^{G+1}$, using the following formula:

$$u_{ij}^{G+1} = \begin{cases} v_{ij}^{G+1}, & \text{if } \mathrm{rand}_j \leq C_p \text{ or } j = j_{\mathrm{rand}}, \\ x_{ij}^G, & \text{otherwise}, \end{cases}$$

where $j_{\mathrm{rand}}$ is an integer random number and $C_p$ is the recombination probability. The recombination strategy allows the old and the new candidate solution to exchange part of their code in order to form a new individual. After the mutation and recombination processes are over, the selection process begins. The target vector $x_i^G$ is compared with the trial vector $u_i^{G+1}$; the vector that minimizes the fitness function gets selected to be part of the next generation:

$$x_i^{G+1} = \begin{cases} u_i^{G+1}, & \text{if } f(u_i^{G+1}) \leq f(x_i^G), \\ x_i^G, & \text{otherwise}, \end{cases}$$

where $i = 1, 2, \dots, N$ and $j = 1, 2, \dots, M$. The DE method iterates these initialization, mutation, recombination, and selection steps until a stopping criterion is met.
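To make the update rules above concrete, here is a minimal DE/rand/1/bin sketch in Python. It is an illustrative implementation under our own naming and default parameters, not the paper's code:

```python
import numpy as np

def differential_evolution(fitness, bounds, N=20, F=0.8, Cp=0.9, G=100, seed=None):
    """Minimal DE/rand/1/bin loop following the formulas above."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)   # one (low, high) pair per feature
    lower, upper = bounds[:, 0], bounds[:, 1]
    M = len(lower)
    # Initialization: x_ij = x_j^L + rand(0,1) * (x_j^U - x_j^L)
    pop = lower + rng.random((N, M)) * (upper - lower)
    fit = np.array([fitness(x) for x in pop])
    for _ in range(G):
        for i in range(N):
            # Mutation: three distinct candidates r1, r2, r3, all different from i
            r1, r2, r3 = rng.choice([k for k in range(N) if k != i],
                                    size=3, replace=False)
            donor = pop[r1] + F * (pop[r2] - pop[r3])
            # Recombination (binomial crossover): take donor genes with
            # probability Cp, forcing at least one donor gene via j_rand
            j_rand = rng.integers(M)
            mask = rng.random(M) <= Cp
            mask[j_rand] = True
            trial = np.clip(np.where(mask, donor, pop[i]), lower, upper)
            # Selection: one-on-one competition on the fitness (loss) value
            f_trial = fitness(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = int(np.argmin(fit))
    return pop[best], fit[best]

# Example: minimize the 4-dimensional sphere function
best_x, best_f = differential_evolution(lambda x: float(np.sum(x ** 2)),
                                        bounds=[(-5.0, 5.0)] * 4, seed=0)
```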
In this subsection we present a DE/CNN algorithm for determining competitive CNN architectures. At first, we define how to represent the network's architecture, the size of the filters in each convolutional layer, and the hyperparameters' values as a candidate solution using a fixed-length array, followed by the DE processes defined in subsection 3.2. The DE processes help us navigate the search space in a more efficient manner, leading us to high-quality solutions for our problems. We define a population of candidate architectures which can be encoded as fixed-length integer arrays. A CNN is composed of an input layer, convolutional hidden layers, pooling layers, and an output layer. Each hidden layer has a certain number of hidden neurons, $n_h$. The number of pooling layers is smaller than the number of hidden layers. Each filter has a width, fw, and a height, fh. The depth of the filter is not variable, since it matches the number of color channels the image has (e.g. 1 for grayscale images and 3 for RGB images). The hyperparameters are the recombination probability, Cp, and the mutation variation factor, F. Therefore, a candidate solution is an integer array $c_i = (l, n_h, fw, fh, C_p, F)$, $i = 1, \dots, N$, where $l$ is the number of convolutional layers and $N$ is the number of candidate solutions in the population. Because all the candidate solutions must be of a fixed length to apply mutation, recombination, and selection, we decided that each hidden layer in a candidate solution contains the same number of hidden units. After each convolutional layer we added a max pooling layer to the network, except for the last one, which is followed by a dense layer. Our study has a limitation: we have applied DE to determine only the number of convolutional layers, their units, the filter's height and width, and the recombination probability and mutation variation factor. In future studies we shall find a way to encode the candidate solution using a different number of hidden units in each convolution. However, our experiments and statistical analysis prove that the proposed model can achieve competitive performance using DE to tune only these features. Our method can be scaled up if results are unsatisfactory. The ReLU function was chosen as the non-linear activation function for each convolutional layer. The softmax function was chosen as the activation function between the last dense layer and the output layer. The pool size was (2, 2). The DE/CNN algorithm's steps are the following:

1. Initialize: generate a random population of N candidate solutions, each encoding a CNN architecture.
2. Evaluate: decode each candidate into a CNN, train it, and record its validation loss.
3. Repeat:
3.1. Mutate: build a donor vector for each candidate from three randomly selected individuals.
3.2. Recombine: cross the donor vector with the target vector to obtain a trial candidate.
3.3. Select: the individuals that will form the next generation based on their validation loss.
until stopping criterion is met (number G of generations is reached).
4. Output: the best candidate solution that will represent the network's architecture.

The DE/CNN architecture is presented in figure 1.
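A hypothetical sketch of how such a fixed-length candidate could be decoded into a Keras model, under the constraints just described (same number of units per layer, max pooling after every convolution, ReLU activations, softmax output). The function name and gene ordering are our assumptions, not the paper's code:

```python
import tensorflow as tf

def decode_candidate(candidate, input_shape, n_classes):
    # Assumed gene layout: (l, n_h, fw, fh, Cp, F); only the first four
    # genes shape the network, since Cp and F steer the DE search itself
    n_layers, n_units, fw, fh = (int(g) for g in candidate[:4])
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    for _ in range(n_layers):
        # Same number of filters and the same (fh, fw) filter size in every
        # hidden layer, matching the fixed-length encoding
        model.add(tf.keras.layers.Conv2D(n_units, (fh, fw),
                                         activation="relu", padding="same"))
        model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# e.g. 3 convolutional layers, 32 filters each, 3 x 3 filters
model = decode_candidate([3, 32, 3, 3, 0.9, 0.8],
                         input_shape=(128, 128, 3), n_classes=6)
```

The validation loss of each trained, decoded network plays the role of the fitness function in the DE loop.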
The novel proposed method has been applied on publicly available datasets regarding lung and colon cancer histopathological images, brain cancer MRI images, and maternal-fetal ultrasound images. In what follows we shall briefly describe the datasets. The maternal-fetal ultrasound dataset (https://zenodo.org/record/3904280#.YfjeTPVBzL9) was collected from two different hospitals. The dataset is split into two different sets. The first set (FP) contains 6 classes, 4 of which regard the fetal anatomical planes: abdomen (711 cases), brain (3092 cases), femur (1040 cases), and thorax (1718 cases), the fifth regarding the mother's cervix (1626 cases), and the last one including the less common image planes (4213 cases). The second set (FB) contains images of the brain planes that are split into 3 classes: trans-thalamic (1638 cases), trans-cerebellum (714 cases), and trans-ventricular (597 cases). The first set has 12,400 images, while the second contains 2,949 (Burgos-Artizzu, 2020). In this study, we have compared the performance of the CNN, which had its architecture established through DE, with the performance of state-of-the-art DLs: VGG-16, ResNet50, Inception V3, and DenseNet169. All models were run on the five datasets. To assess the models' performances, we have used 10-fold cross-validation as the validation method. For an effective and objective evaluation of the CNN algorithms, we have evaluated their results through a thorough statistical analysis. The following rule has been applied to all the methods that have been compared in this study: each method was executed in 100 independent runs (i.e. 100 times in a complete cross-validation cycle). The purpose of this procedure is to use a sample size that ensures a high statistical power: too small a sample weakens the evaluation, while, on the contrary, using too many computational and time resources might not lead to a significant increase in power. Hence, using a sample size of 100 computer runs, we have obtained a statistical power greater than 95%, with type I error α = 0.05, for all the statistical tests that have been performed. The average accuracy over 100 complete cross-validation cycles (ACA) is recorded for each model. Besides the ACA, we have computed the standard deviation (SD), the 95% confidence interval (CI 95%), precision, recall, and F1-score. The standard deviation gives us an insight into the model's stability. To demonstrate omnibus robustness, the methods must be applied on multiple datasets. If the SD varies from dataset to dataset, from smaller to larger values, then the method has failed in providing omnibus robustness. We have considered the area under the Precision-Recall curve (PR AUC) and not the area under the ROC curve (AUC), because the datasets are imbalanced. Imbalanced data can lead to a probable change in false positives, which are used in computing the false positive rate used by the AUC. Using the PR AUC we obtain more precise results, due to the fact that we compare false positives with true positives, and not with true negatives, as in the AUC case. The statistical evaluation involved the following tests, which have been applied on the sample containing the 100 performances obtained from 100 independent computer runs on the test set:

• Normality: the Shapiro-Wilk and Kolmogorov-Smirnov & Lilliefors tests (Altman, 1991).
• Equality of variances: Levene's and Brown-Forsythe tests. If the samples have unequal variances, then the Type I error might be affected, resulting in false positives. In practice, the issue is less problematic, since we are using samples of the same size (in our case 100) (Altman, 1991; Belciug, 2020).

There are different methods for testing whether a data sample is governed by the Normal distribution or not. We have used the Shapiro-Wilk test because it has more power to detect non-normality, although it is generally used for smaller sample sizes. The Kolmogorov-Smirnov & Lilliefors test is recommended for larger sizes, but has a lower power (Yap & Sim, 2011). If the normality assumption and the equality of variances assumption are met, then we can proceed and apply the t-test and One-Way ANOVA, together with Tukey's post-hoc test, to differentiate between the algorithms' performances. The One-Way ANOVA is used to establish whether there are any statistically significant differences between the mean performances of the competing models.

The results of the experiments regarding the classification of brain tumors, lung and colon cancer, maternal-fetal planes, and brain planes obtained after applying DE/CNN are depicted in Table 1 in terms of ACA, SD, 95% confidence interval (CI), precision, recall, F1-score, and network structure. In what follows, we shall present the data screening process that involved the Kolmogorov-Smirnov and Lilliefors test and the Shapiro-Wilk W test. Table 2 shows the obtained results. We have evaluated the DE/CNN by statistically comparing its results with the performances of other DLs applied on the same datasets.
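This screening pipeline can be reproduced with SciPy; the sketch below is illustrative only, and the accuracy samples in it are synthetic placeholders rather than the paper's results:

```python
import numpy as np
from scipy import stats

# Placeholder samples standing in for 100 ACA values per model
# (the real samples come from 100 complete cross-validation cycles)
rng = np.random.default_rng(0)
acc_de_cnn = rng.normal(0.95, 0.01, 100)
acc_vgg16 = rng.normal(0.94, 0.01, 100)

# Normality screening: Shapiro-Wilk (more power, smaller samples) and
# Kolmogorov-Smirnov against a fitted normal (recommended for larger samples)
print(stats.shapiro(acc_de_cnn))
print(stats.kstest(acc_de_cnn, "norm",
                   args=(acc_de_cnn.mean(), acc_de_cnn.std(ddof=1))))

# Equality of variances: Levene (mean-centered) and Brown-Forsythe (median-centered)
print(stats.levene(acc_de_cnn, acc_vgg16, center="mean"))
print(stats.levene(acc_de_cnn, acc_vgg16, center="median"))

# If both assumptions hold: t-test / One-Way ANOVA, then Tukey's post-hoc test
print(stats.ttest_ind(acc_de_cnn, acc_vgg16))
print(stats.f_oneway(acc_de_cnn, acc_vgg16))
print(stats.tukey_hsd(acc_de_cnn, acc_vgg16))
```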
The competitors of the proposed model are the following state-of-the-art DLs:

• VGG16 is considered to have an excellent architecture. It won the ILSVRC (ImageNet) competition in 2014. The VGG16 does not have a large number of hyper-parameters. Instead, its architecture consists of convolutional layers with 3 × 3 filters and stride 1, same padding, and max-pooling layers with 2 × 2 filters and stride 2. After a series of convolutions followed by max-pooling, the architecture ends with two fully connected layers and a softmax for the output. The VGG16 has 16 layers that have weights (Simonyan and Zisserman, 2014).
• ResNet50 stands for Residual Network 50. It won the ILSVRC (ImageNet) competition in 2015. ResNet's signature is the concept of the skip connection. The skip connection allows an alternative shortcut route for the gradient to flow through the network. In this way, the model can learn an identity function, so that any higher layer performs at least as well as a lower layer in the CNN. The ResNet50 has 48 convolutional layers, 1 max-pooling layer, and 1 average-pooling layer (He et al., 2015).
• Inception V3 aims to be more computationally efficient. Its architecture is progressively built, starting with factorized convolutions that reduce the computational cost by reducing the number of parameters in the network. Another characteristic of the Inception V3 architecture is that it replaces bigger convolutions with smaller convolutions, speeding up the training process. Besides smaller convolutions, the network supports asymmetric convolutions, and an auxiliary classifier (a small CNN) is inserted into its architecture during training between other layers. This classifier acts as a regularizer. Inception V3 reduces the grid size through pooling operations (Szegedy et al., 2016).
• DenseNet169 is short for Densely Connected Convolutional Network. In DenseNet, each layer receives additional input, known as collective knowledge, from all the previous layers. In this way, the network is more compact, with fewer channels. Instead of using a deeper architecture, DenseNet reuses the features. It does not sum the output feature map of a layer with the following feature map; it concatenates them (Huang et al., 2016a).

All the algorithms have been run under the same conditions for the comparison to be fair and objective. The results are displayed in Table 3. We were interested in verifying the equality of variances on each dataset using Levene's and Brown-Forsythe tests. This was an important step in our statistical analysis, because we wanted to apply One-Way ANOVA and the post-hoc Tukey test to verify whether there were or weren't any statistically significant differences between our proposed model and the other competitors. Table 4 presents the results of the two tests. We then applied the One-Way ANOVA (Demsar, 2006; Seltman, 2018). Even if the accuracies seem close enough, the test reveals that there are statistical differences between the competitors. In figures 7a, 7b, 7c, 7d, and 7e we present the visual representations of the least squares means for the five datasets. The One-Way ANOVA revealed that there are significant differences between the competitors' performances, but one should ask the following question: are there differences between all the competitors, or just between some of them? To answer the question above, we have applied Tukey's post-hoc test. Its results are presented in Table 6. The benchmarking process was completed by presenting results that have been reported in the literature on the same five datasets. It should be noted that these results were obtained by networks which were pre-trained on the ImageNet Large Scale Visual Recognition Challenge, and then fully retrained using these datasets. In our study, the networks were not previously pre-trained.
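For reference, a sketch of how the four baselines can be instantiated without pre-training in Keras, matching the from-scratch setup described above; the input shape, class count, and helper name are our illustrative assumptions:

```python
import tensorflow as tf

N_CLASSES = 6                  # e.g. the six fetal-plane classes; adjust per dataset
INPUT_SHAPE = (224, 224, 3)

def make_baseline(backbone_fn):
    # weights=None: train from scratch, since the compared networks
    # were not pre-trained in this study
    backbone = backbone_fn(include_top=False, weights=None,
                           input_shape=INPUT_SHAPE, pooling="avg")
    outputs = tf.keras.layers.Dense(N_CLASSES, activation="softmax")(backbone.output)
    model = tf.keras.Model(backbone.input, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

baselines = {
    "VGG16": make_baseline(tf.keras.applications.VGG16),
    "ResNet50": make_baseline(tf.keras.applications.ResNet50),
    "InceptionV3": make_baseline(tf.keras.applications.InceptionV3),
    "DenseNet169": make_baseline(tf.keras.applications.DenseNet169),
}
```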
The datasets are recent, therefore there are not many papers in the recent literature (2020-2022) that regard them. Through tables 7-10 we enable a fair and direct comparison between DE/CNN and the most recent methodologies.

In this study we proposed a new way to determine the architecture of a deep neural network through the use of differential evolution. At first, we proposed an encoding method for representing the structure of each CNN with a fixed-length integer array, after which we used differential evolution processes (mutation, recombination, and selection) to explore the search space in an efficient manner. We have tested our method on three cancer-related datasets that contain MRI scans and histopathological images concerning brain, lung, and colon tumors, as well as on two maternal-fetal ultrasound datasets. The experimental results were further statistically analyzed in comparison with the results obtained by other state-of-the-art DLs. The findings show that this neuroevolution method for determining a CNN's architecture is competitive with other methods. Despite the interesting results, our method still has some drawbacks that will be resolved in future work. The limitation of the study consists in the fact that we use the same number of hidden neurons for each convolutional layer. We aim to explore how the performance changes when we set different numbers of hidden neurons using differential evolution. Also, in future studies we wish to see whether the performance improves if we use pretrained networks.

References

• How COVID-19 affected cancer screening programs? A systematic review
• A state-of-the-art survey on deep learning theory and architectures
• Practical Statistics for Medical Research
• Differential evolution algorithm for segmentation of wound images
• Statistical Mechanics of Deep Learning
• Designing neural network architectures using reinforcement learning
• Artificial Intelligence in Cancer: diagnostic to tailored treatment
• The state of artificial intelligence-based FDA approved medical devices and algorithms: an online database
• Lung cancer detection: a deep learning approach
• Brain Tumor Classification (MRI), Kaggle
• Lung and colon cancer histopathological image dataset (LC25000)
• The histological diagnosis of colonic adenocarcinoma by applying partial self-supervised learning
• FETAL_PLANES_DB: Common maternal-fetal ultrasound images
• Evaluation of deep convolutional neural networks for automatic classification of common maternal fetal ultrasound planes
• Efficient architecture search by network transformation, Association for the Advancement of Artificial Intelligence
• Supervised machine learning model for high dimensional gene data in colon cancer detection
• Impact of COVID-19 pandemic on fecal immunochemical test screening uptake and compliance to diagnostic colonoscopy
• Effects of the COVID-19 pandemic on maternal and perinatal outcomes: a systematic review and meta-analysis, The Lancet Global Health
• Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning
• Statistical comparisons of classifiers over multiple data sets
• Fetal diagnosis and therapy during the COVID-19 Pandemic: guidance on behalf of the international fetal medicine and surgery society
• Fewer Cancer diagnoses during the COVID-19 epidemic in the Netherlands
• The modified differential evolution and the RBF (MDE-RBF) neural network for time series prediction
• Automatic brain tumor detection and segmentation using U-Net based fully convolutional networks
• COVID-19 in pregnancy: the foetal perspective - a systematic review
• How has COVID-19 impacted cancer screening? Adaptation of services and the future outlook in Australia
• Neuroevolution in Deep Neural Networks: Current trends and Future Challenges
• Internet hospitals help prevent and control the epidemic of COVID-19 in China: multicenter user profiling study
• Distributed evolution of deep autoencoders
• Lung cancer detection using convolutional neural network on histopathological images
• Brain tumor segmentation with deep neural networks
• Deep residual learning for image recognition
• Densely connected convolutional networks
• Deep Networks with stochastic depth, European Conference on Computer Vision
• Accelerating Deep Network Training by Reducing Internal Covariate Shift
• MRI-Based Brain Tumor Classification using ensemble of deep features and machine learning classifiers
• Evolution-guided policy gradient in reinforcement learning
• Risk of congenital birth defects during COVID-19 pandemic: draw attention to the physicians and policymakers
• Differential evolution strategy for constrained global optimization and application to practical engineering problems
• Novel AI-guided ultrasound screening system for fetal heart can demonstrate finding in timeline diagram
• Lung nodule classification using deep features in CT images, 12th Conference on Computer and Robot Vision
• Operational Challenges of a low-dose CT lung cancer screening program during the coronavirus disease 2019 pandemic
• Best practices for scientific research on neural architecture search
• Progressive Neural Architecture Search, European Conference on Computer Vision
• Hierarchical representations for efficient architecture search
• A comparison of deep learning performances against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis
• Pareto-optimal progressive neural architecture search
• Optimization of the directivity of a monopulse antenna with a subarray weighting by a hybrid differential evolution method
• Convolutional neural networks for diagnosing colon and lung cancer histopathological images
• Brain tumors and COVID-19: the patients and caregiver experience
• A novel deep learning based system for fetal cardiac screening
• Pregnancy and Childbirth in the COVID-19 Era - The course of disease and maternal-fetal transmission
• Evolving deep neural networks
• Cancer diagnosis using deep learning: a bibliographic review
• Fully automated alignment of 3D fetal brain ultrasound to a canonical reference space using multi-task learning
• Understanding the psychological impact of COVID-19 pandemic on patients with cancer, their caregivers, and health care workers in Singapore
• Self-adaptive differential evolution methods for unsupervised image classification
• Sonography in obese and overweight pregnant women: clinical, medicolegal and technical issues
• Impact of COVID-19 on cancer care: how the pandemic is delaying cancer diagnosis and treatment for American seniors
• Convolutional Neural Networks for automated fetal cardiac assessment using 4D B-mode ultrasound, IEEE 16th International Symposium on Biomedical Imaging
• Large-scale evolution for image classifiers
• Regularized evolution for image classifier architecture search
• A conditional adversarial network for semantic segmentation of brain tumor
• A score-based method for quality control of fetal images at routine second trimester ultrasound examination
• Experimental design and analysis
• Proximal policy optimization algorithms
• Very deep convolutional networks for large-scale image recognition
• Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images
• Using deep learning for classification of lung nodules on computed tomography images
• Neuroevolution: a different kind of deep learning
• Evolving neural networks through augmenting topologies
• Designing neural networks through neuroevolution
• Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces
• Differential evolution for multi-objective optimization
• Network reconfiguration of distribution systems using improved mixed-integer hybrid differential evolution
• A differential evolution algorithm for multistage transmission planning
• Computer aided lung cancer diagnosis with deep learning algorithms
• Surrogate-assisted evolutionary deep learning using an end-to-end random forest-based performance predictor
• Evolving deep convolutional neural networks for image classification
• Completely automated CNN architecture design based on blocks
• Rethinking the Inception Architecture for Computer Vision
• Differential evolution algorithms for the generalized assignment problem
• Automated detection of pulmonary nodules in PET/CT images: ensemble of false-positive reduction using a convolutional neural network technique
• High-performance medicine: the convergence of human and artificial intelligence
• Assessment of radiomics and deep learning for the segmentation of fetal and maternal anatomy in magnetic resonance imaging and ultrasound
• COVID-19 and readjusting clinical trials
• Caring for patients with cancer in the COVID-19 era
• Learning to grow: control of material self-assembly using evolutionary reinforcement learning
• Correspondence between neuroevolution and gradient descent
• Simple statistical gradient-following algorithms for connectionist reinforcement learning
• No free lunch theorems for optimization
• A deep learning-based segmentation method for brain tumor in MR images
• Sideband suppression in time-modulated linear arrays by the differential evolution algorithm
• Comparisons of various types of normality tests
• Uncertainty upon uncertainty: supportive care for cancer and COVID-19
• A deep learning model integrating FCNNs and CRFs for brain tumor segmentation
• Practical network blocks design with Q-Learning, International Conference on Learning Representations
• Neural architecture search with reinforcement learning
• Learning transferable architectures for scalable image recognition