Signal Propagation in a Gradient-Based and Evolutionary Learning System
Jamal Toutouh and Una-May O'Reilly
2021-02-10

Generative adversarial networks (GANs) exhibit training pathologies that can lead to convergence-related degenerative behaviors, whereas spatially-distributed, coevolutionary algorithms (CEAs) for GAN training, e.g. Lipizzaner, are empirically robust to them. The robustness arises from diversity that occurs by training populations of generators and discriminators in each cell of a toroidal grid. Communication, where signals in the form of the parameters of the best GAN in a cell propagate in four directions (North, South, West, and East), also plays a role by spreading adaptations that are both new and fit. We propose Lipi-Ring, a distributed CEA like Lipizzaner, except that it uses a different spatial topology, i.e. a ring. Our central question is whether the different directionality of signal propagation (effectively migration to one or more neighbors on each side of a cell) meets or exceeds the performance quality and training efficiency of Lipizzaner. Experimental analysis on different datasets (i.e., MNIST, CelebA, and COVID-19 chest X-ray images) shows that there are no significant differences between the performances of the generative models trained by both methods. However, Lipi-Ring significantly reduces the computational time (by 14.2%-41.2%). Thus, Lipi-Ring offers an alternative to Lipizzaner when the computational cost of training matters.

Generative modeling aims to learn a function that describes a latent distribution of a dataset. In a popular paradigm, a generative adversarial network (GAN) combines two deep neural networks (DNNs), a generator and a discriminator, that engage in adversarial learning to optimize their weights [11]. The generator is trained to produce fake samples (given an input from a random space) to fool the discriminator. The discriminator learns to discern the real samples from the ones produced by the generator. This training is formulated as a minimax optimization problem through the definitions of the discriminator and generator losses, which converges when an optimal generator approximates the true distribution so well that the discriminator only provides a random label for any sample. Early GAN training methods led to vanishing gradients [2] and mode collapse [5], among other pathologies. These arose from the inherent adversarial setup of the paradigm. Several methods have been proposed to improve GAN models and have produced strong results [3, 13, 18, 31]. However, GANs remain notoriously hard to train [12, 23]. Using forms of evolutionary computation (EC) for GAN training has led to promising approaches, including evolutionary algorithms (EAs) and coevolutionary algorithms (CEAs) for weight training or spatial systems [1, 8, 24, 25, 28, 29]. Deep neuroevolution offers concurrent architecture and weight search [8]. Pareto approximations have also been proposed to define multi-objective GAN training [10]. This variety of approaches uses different ways to guide populations of networks towards convergence, while maintaining diversity and discarding problematic (weak) individuals. They have been empirically demonstrated to be comparable to, or better than, baseline GAN training methods. In this work, we focus on spatially-distributed, competitive CEAs (Comp-CEAs), such as Lipizzaner [24].
In these methods, the members of two populations (generators and discriminators) are placed in the cells of a toroidal geometric space (i.e., each cell contains a generator-discriminator pair). Each cell has neighbors from which it copies their generator and discriminator pairs. This creates sub-populations of GANs in each cell. Gradient-based training is done pairwise between the best pairing within a sub-population. In each training iteration (epoch), selection, mutation, and replacement are applied; then, the best generator-discriminator pair is updated and the remainder of the sub-population is re-copied from the neighborhood. This update and refresh effectively propagate signals along the paths of neighbors that run across the grid. Thus, the neighborhood defines the directionality and extent of signal propagation, a.k.a. migration [1]. Communicating adaptations that are both new and fit promotes diversity during this training process. This diversity has been shown to disrupt premature convergence, in the form of oscillations, or to move the search away from undesired equilibria, improving robustness to the main GAN pathologies [24, 26].

In this work, we want to evaluate the impact of the spatial topology used by this kind of method, changing the two-dimensional toroidal grid used by Lipizzaner into a ring topology. Thus, we propose Lipi-Ring. Lipi-Ring raises central questions about the impact of the new directionality of the signal propagation given a ring. How are performance quality, population diversity, and computational cost impacted? Thus, in this paper, we pose the following research questions. RQ1: What is the effect on the trained generative model when changing the directionality of the signal propagation from four directions to two? RQ2: When the signal is propagated in only two directions, what is the impact of performing migration to one or more neighbors? In terms of population diversity, RQ3: How does diversity change over time in a ring topology? How does diversity compare across ring topologies with different neighborhood radii? How does diversity compare between ring topology and 2D grid methods, where both methods have the same sub-population size and neighborhood, but different signal directionality?

The main contributions of this paper are: i) Lipi-Ring, a new distributed Comp-CEA GAN training method based on a ring topology that demonstrates markedly decreased computational cost over a 2D topology, without negatively impacting training accuracy; ii) an open-source software implementation of Lipi-Ring¹; and iii) an evaluation of different variations of Lipi-Ring, comparing them to Lipizzaner, on a set of benchmarks based on the MNIST, CelebA, and COVID-19 chest X-ray image datasets. The rest of the paper is organized as follows. Section 2 presents related work. Section 3 describes the Lipi-Ring method. The experimental setup and the results are in Sections 4 and 5. Finally, conclusions are drawn and future work is outlined in Section 6.

This section introduces the main concepts in GAN training and summarizes relevant studies related to this research. GANs train two DNNs, a generator ($G_u$) and a discriminator ($D_v$), in an adversarial setup. Here, $G_u$ and $D_v$ are functions parametrized by $u$ and $v$, respectively, where $u \in \mathcal{G}$ and $v \in \mathcal{D}$, with $\mathcal{G}, \mathcal{D} \subseteq \mathbb{R}^p$ representing the respective parameter spaces of both functions. Let $P^*$ be the target unknown distribution to which we would like to fit our generative model [4]. The generator receives a variable $z$ from a latent space, $z \sim p_z(z)$, and creates a sample in data space, $x = G_u(z)$.
The discriminator assigns a probability $p = D_v(x) \in [0, 1]$ that represents the likelihood that $x$ belongs to the real training dataset, i.e., to $P^*$, by applying a measuring function $\phi : [0, 1] \to \mathbb{R}$. The prior $p_z(z)$ on $z$ is typically chosen to be a uniform $[-1, 1]$ distribution. The goal of GAN training is to find the $u$ and $v$ parameters that optimize the objective function $\mathcal{L}(u, v)$.
(¹ Lipi-Ring source code: https://github.com/xxxxxxxxx)
This is accomplished via a gradient-based learning process whereupon $D_v$ learns a binary classifier that is the best possible discriminator between real and fake data. Simultaneously, the process encourages $G_u$ to approximate the latent data distribution. In general, both networks are trained by applying back-propagation.

Mode collapse and vanishing gradients are the most frequent GAN training pathologies [2, 5], leading to inconsistent results. Prior studies tried to mitigate degenerate GAN dynamics with new generator or discriminator objectives (loss functions) [3, 18, 19, 31] and by applying heuristics [14, 22]. Others have integrated EC into GAN training. Evolutionary GAN (E-GAN) evolves a population of generators [29]. The mutation selects among three optimization objectives (loss functions) to update the weights of the generators, which are adversarially trained against a single discriminator. Multi-objective E-GAN (MO-EGAN) has been defined by reformulating E-GAN training as a multi-objective optimization problem, using Pareto dominance to select the best solutions in terms of diversity and quality [6]. Two genetic algorithms (GAs) have been applied to learn mixtures of heterogeneous pre-trained generators to specifically deal with mode collapse [28]. Finally, in [10], a GA evolves a population of GANs (represented by the architectures of the generator and the discriminator and the training hyperparameters). The variation operators exchange the networks between the individuals and evolve the architectures and the hyperparameters. The fitness is computed after training the GAN encoded by the genotype.

Another line of research uses CEAs to train a population of generators against a population of discriminators. Coevolutionary GAN (COEGAN) combines neuroevolution with CEAs [8]. Neuroevolution is used to evolve the main networks' parameters. COEGAN applies an all-vs-best Comp-CEA (with the k best individuals) for the competitions to mitigate the computational cost of all-vs-all. CEAs show pathologies similar to the ones reported in GAN training, such as focusing and loss of gradient, which have been attributed to a lack of diversity [21]. Thus, spatially distributed populations have been demonstrated to be particularly effective at maintaining diversity, while reducing the computational cost from quadratic to linear [30].

Lipizzaner locates the individuals of a population of GANs (pairs of generators and discriminators) in a 2D toroidal grid. A neighborhood is defined by the cell itself and its adjacent cells according to a Von Neumann neighborhood. Coevolution proceeds at each cell with sub-populations drawn from the neighborhood. Gradient-based learning is used to update the weights of the networks, while evolutionary selection and variation are used for hyperparameter learning [1, 24]. After each training iteration, the (weights of the) best generator and discriminator are kept while the other sub-population members are refreshed by new copies from the neighborhood.
A cell's update of its GAN is effectively propagated to the adjacent cells in four directions (i.e., North, South, East, and West) once the neighbors of the cell refresh their sub-populations from neighborhood copies. Thus, each cell's sub-populations are updated with new fit individuals, moving them closer towards convergence, while fostering diversity. Another approach, Mustangs, combines Lipizzaner and E-GAN [25]. Its mutation operator randomly selects among a set of loss functions, instead of always applying the same one, in order to increase variability. Finally, taking advantage of the spatial grid of Lipizzaner, a data dieting approach has been proposed [27]. The main idea is to train each cell with different subsets of data to foster diversity among the cells and to reduce the training resource requirements. In this study, we propose Lipi-Ring, a spatially distributed GAN training method that uses a ring topology instead of a 2D grid. We contrast Lipi-Ring to Lipizzaner in the next section.

This section describes the Lipi-Ring CEA GAN training method, which applies the same principles (definitions and methods) as Lipizzaner [24]. We introduce the 2D grid and ring topologies applied by Lipizzaner and Lipi-Ring, respectively. We summarize the spatially distributed GAN training method. We present the main distinctive features between Lipizzaner and Lipi-Ring.

Lipi-Ring and Lipizzaner use a population of generators $\mathbf{g} = \{g_1, \ldots, g_N\}$ and a population of discriminators $\mathbf{d} = \{d_1, \ldots, d_N\}$, which are trained against each other (where N is the size of the population). A generator-discriminator pair named the center is placed in each cell, which belongs to a ring in the case of Lipi-Ring or to a 2D toroidal grid in the case of Lipizzaner. According to the topology's neighborhood, sub-populations of networks (generators and discriminators) of size s are formed. For the k-th neighborhood, we refer to the center generator by $g_{k,1} \subset \mathbf{g}$, to the set of generators in the rest of the neighborhood by $g_{k,2}, \ldots, g_{k,s}$, and to the generators in this k-th sub-population by $\mathbf{g}_k = \cup_{j=1}^{s} g_{k,j} \subseteq \mathbf{g}$. The same notation is used for the discriminators $\mathbf{d}$. Lipizzaner uses Von Neumann neighborhoods with radius 1, which include the cell itself and the adjacent cells to the North, South, East, and West [24], i.e., s = 5 (see Figure 1.a). This defines the migration policy (i.e., the directionality of signal propagation) through the cells in four directions. Figure 1.a shows an example of a 2D toroidal grid with N = 16 (a 4×4 grid). The shaded areas illustrate the overlapping neighborhoods of the (1,1) and (0,2) cells, with dotted and solid outlines, respectively. The updates in the center of (1,1) will be propagated to the (0,1), (2,1), (1,0), and (1,2) cells. In Lipi-Ring, the cells are distributed in a one-dimensional grid of size 1×N and neighbors are located sideways, i.e., to the left and/or right, one or more index positions away. The best GAN (center) after an evolutionary epoch at a cell is updated. Neighboring cells retrieve this update when they refresh their sub-population membership at the end of their training epochs, effectively forming two never-ending pathways around the ring, carrying signals in two directions. Figures 1.b and 1.c show populations of six individuals (N = 6) organized in a ring topology with neighborhood radius one (r = 1) and two (r = 2), respectively. The shaded areas illustrate the overlapping neighborhoods of the cells (0) and (4), with dotted and solid outlines, respectively.
The updates in the center of (0) will be propagated to the (5) and (1) cells.

Algorithm 1 illustrates the main steps of the applied training algorithm. First, it starts the parallel execution of the training on each cell by initializing its own learning hyperparameters (Line 2). Then, the training process consists of a loop with two main phases: first, migration, in which the cells gather the GANs (neighbors) to build the sub-population, and second, train and evolve, in which each cell updates the center by applying the coevolutionary GAN training (see Algorithm 2). These steps are repeated for a given number of generations (training epochs). After that, each cell learns an ensemble of generators by using an Evolutionary Strategy, ES-(1+1) [17, Algorithm 2.1], to compute the mixture weights that optimize the accuracy of the returned generative model $(\mathbf{g}, \mathbf{w})^*$ [24].

For each training generation, the cells apply the CEA in Algorithm 2 in parallel. The CEA starts by selecting the best generator-discriminator pair according to a tournament selection of a given size. It applies an all-vs-all strategy to evaluate all the GAN pairs in the sub-population according to a randomly chosen batch of data (Lines 1 to 4). Then, for each batch of data in the training dataset (Lines 5 to 10), the learning rate is updated by applying Gaussian mutation [24] and the offspring is created by training the selected generator and discriminator against a randomly chosen discriminator and generator from the sub-population (i.e., applying gradient-based mutations). Thus, the sub-populations are updated with the new individuals. Finally, a replacement procedure is applied to remove the weakest individuals from the sub-populations, and the center is updated with the individuals with the best fitness (Lines 11 to 15). The fitness of a given generator (discriminator) is evaluated according to the binary cross-entropy loss, where the model's objective is to minimize the Jensen-Shannon divergence between the real and fake data [11].

The population structured in a ring with r = 2 provides signal propagation similar to the Lipizzaner toroidal grid because both topologies and migration models allow the center individual to reach four cells (two located to the West and two to the East in the case of this ring), see Figures 1.a and 1.c. Thus, they provide the same propagation speed and, as their sub-populations are of size s = 5, the same selection pressure. The signal propagation of the ring with r = 1 is slower because, after a given training step, the center only reaches two cells (the one to the West and the one to the East), see Figure 1.b. In this case, the sub-populations have three individuals, which reduces the diversity in the sub-population and accelerates convergence (it has 40% fewer individuals than the Lipizzaner sub-populations). Thus, Lipi-Ring with r = 1 reduces the propagation speed, while accelerating the population's convergence. Lipi-Ring with r = 1 has two main advantages over Lipizzaner: a) it mitigates the overhead, because communication is carried out with only two cells (instead of four) and the sub-populations are smaller, which reduces the number of operations for fitness evaluation and selection/replacement; and b) like Lipi-Ring with any radius, it does not require a rectangular grid of cells, so the population size may be any natural number. Given the infeasibility of analyzing the change in selection and other algorithm elements analytically, we proceed empirically.
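To make the two migration models concrete, the following minimal sketch (illustrative only, not the actual Lipizzaner/Lipi-Ring implementation; the function names are hypothetical) shows how the sub-population indices that drive migration could be derived for a ring of radius r and for the radius-1 Von Neumann neighborhood on a toroidal grid.

```python
# Illustrative sketch of the two neighborhood/migration models described above.
# Cells are identified by their flattened index in a population of N cells.

def ring_neighborhood(cell, n_cells, radius=1):
    """Indices forming the sub-population of `cell` in a ring of n_cells.
    radius=1 -> s=3 (cell, left, right); radius=2 -> s=5."""
    return [(cell + offset) % n_cells for offset in range(-radius, radius + 1)]

def grid_neighborhood(cell, rows, cols):
    """Von Neumann neighborhood (radius 1) on a toroidal rows x cols grid: s=5."""
    r, c = divmod(cell, cols)
    coords = [(r, c), ((r - 1) % rows, c), ((r + 1) % rows, c),
              (r, (c - 1) % cols), (r, (c + 1) % cols)]
    return [i * cols + j for i, j in coords]

# Example: ring with N=6 as in Figures 1.b and 1.c
print(ring_neighborhood(0, 6, radius=1))   # [5, 0, 1]       -> center signal reaches 2 cells
print(ring_neighborhood(0, 6, radius=2))   # [4, 5, 0, 1, 2] -> center signal reaches 4 cells
# and a 4x4 toroidal grid (N=16) as in Figure 1.a
print(grid_neighborhood(5, 4, 4))          # cell (1,1): itself plus North, South, West, East
```

With N = 6, a radius-1 ring yields sub-populations of size s = 3, while a radius-2 ring and the Von Neumann grid both yield s = 5, which is the equivalence exploited in the comparison above.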
We evaluate different distributed CEA GAN training methods on image datasets: the well-known MNIST [9] and CelebA [16], and a dataset of chest X-ray images of patients with COVID-19 [7]. In our experiments, we evaluate the following algorithms: Lipizzaner; two variations of Lipi-Ring, both with r = 1, namely Ring(1), which performs the same number of training epochs as Lipizzaner, and Ring(1)*, which runs for the same computational cost (wall-clock time) as Lipizzaner; and Ring(2), which is Lipi-Ring with r = 2 performing the same number of training epochs as Lipizzaner. These represent a variety of topologies and migration models. Ring(1)* is analyzed to make a proper comparison between Ring(1) (r = 1) and Lipizzaner taking the computational cost (time) into account. Table 1 summarizes the main characteristics of the Lipi-Ring variations studied. The parameters are set according to the authors of Lipizzaner [24, 25]. Thus, all these CEAs apply a tournament selection of size two. The main settings used for the experiments are summarized in Table 2.

For the MNIST experiments, the generators and the discriminators are multilayer perceptrons (MLPs). The stop condition of each method is defined as follows: a) Ring(1), Ring(2), and Lipizzaner perform 200 training epochs, to evaluate the impact of the topology on the performance and the computational time required; and b) Ring(1)* stops after running for the same time as Lipizzaner, to compare the methods under the same computational cost (time). The population sizes are 9, 16, and 25, which means Lipizzaner uses grids of size 3×3, 4×4, and 5×5. Besides, we study the impact of the population size on Ring(1) by training rings of size between 2 and 9. For the CelebA and COVID-19 experiments, deep convolutional GANs (DCGANs) are trained. DCGANs have many more parameters than the MLPs (see Table 2). Here, we compare Ring(1) and Lipizzaner. They stop after performing 20 training epochs for CelebA and 1,000 for COVID-19 (because the COVID-19 training dataset has far fewer samples). This allows us to discern the differences in terms of performance and computational cost with more complex networks. In these cases, the population size is 9. The experimental analysis is performed on a cloud computation platform that provides 16 Intel Xeon cores at 2.8GHz with 64 GB RAM and an NVIDIA Tesla P100 GPU with 16 GB RAM. We run multiple independent runs for each method. We have implemented all the variations of Lipi-Ring by extending the Lipizzaner framework [24] using Python and PyTorch [20].

This section presents the results and the analyses of the presented GAN training methods. The first subsections evaluate them on the MNIST dataset. They are measured in terms of: the FID score; the diversity of the generated samples, by evaluating the total variation distance (TVD) [15]; and the diversity in the genome space (network parameters). Then, we analyze the CelebA and COVID-19 results with measurements of the Inception Score (IS) and computational time. The next subsection presents the results of incrementally increasing the ring size of Ring(1). Finally, we compare the computational times needed by Ring(1) and Lipizzaner.

Table 3 shows the best FID value from each of the 30 independent runs performed for each method. All the evaluated methods improve their performance when increasing the population size, while maintaining the same budget. This could be explained by diversity increasing during the training as the populations get bigger. Ring(1)* has the lowest (best) median and mean FID results.
In turn, it returned the generative model that provided the best quality samples (minimum FID score) for all the evaluated population sizes. The second best results are provided by Ring(1). This indicates that the methods with smaller sub-populations converge faster, even though the ring migration model (r = 1) slows down the propagation of the best individuals. Comparing Ring(2) and Lipizzaner, they provide close results. Though they use different topologies, their signal propagation and selection operate equivalently. As the results do not follow a normal distribution, we rank the studied methods using the Friedman rank statistical test and apply Holm-corrected post-hoc analysis to assess the statistical significance. For all the population sizes, the Friedman test ranks Ring(1)*, Ring(1), Lipizzaner, and Ring(2) first, second, third, and fourth, respectively. However, the significance (p-value) varies from p-value=5.67×10^-6 for population size 9 to p-value=1.13×10^-2 (i.e., p-value≥0.01) for population sizes 16 and 25. According to the Holm post-hoc analysis, Ring(1) and Ring(1)* are statistically better than Ring(2) and Lipizzaner (which have no statistical difference between each other) for population size 9. For the other population sizes, Ring(1)* provides statistically better results than Ring(2) and Lipizzaner, and there is no significant difference among Ring(1), Ring(2), and Lipizzaner. Table 3 shows Ring(1)* is better than Ring(2) and Lipizzaner (lower FID is better). However, the difference between their FIDs decreases when the population size increases. This indicates that migration modes that allow faster propagation take better advantage of bigger populations.

Next, we evaluate the FID score throughout the GAN training process. Figure 2 illustrates the changes of the median FID during the training process. Ring(1)* is not included because it operates the same as Ring(1). According to Figure 2, none of the evolutionary GAN training methods seems to have converged. Explaining this is left to future work. The FID score almost behaves like a monotonically decreasing function with oscillations. The methods with larger sub-populations, i.e., Ring(2) and Lipizzaner, show smaller oscillations, which implies more robustness (less variance). Focusing on the methods with a ring topology, we clearly see faster convergence when r = 1. Ring(1) provides smaller FID values than Ring(2) most of the time. The reduced sub-population (s = 3) favors the best individual from the sub-population being selected during the tournament (it increases the selection pressure). Figure 2 shows that Ring(1), Ring(2), and Lipizzaner converge to similar values. This is in accordance with the results in Table 3, which indicate that these three methods provide comparable FID scores. Finally, Figure 3 illustrates some samples synthesized by generators trained using populations of size 16. As can be seen, the four sets of samples show comparable quality.

This section evaluates the diversity of the samples generated by the best generative models obtained in each run. Table 4 reports the TVD for each method and population size. Recall that we prefer low TVD (high diversity) as an indicator of the quality of the generative model. The results in Table 4 demonstrate that, as the population size increases, the resulting trained generative models are able to provide more diverse samples, that is, they have better coverage of the latent distribution. Thus, again, all the methods take advantage of bigger populations.
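For reference, the snippet below is a minimal sketch of how a TVD-style diversity score can be computed. It assumes (hypothetically) that a pre-trained MNIST classifier labels the generated samples and that the target label distribution is roughly uniform over the ten digits; it is not necessarily the exact procedure of [15].

```python
import numpy as np

def total_variation_distance(p, q):
    """TVD between two discrete distributions; 0 means identical, values near 1 mean little overlap."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    return 0.5 * np.sum(np.abs(p - q))

# Hypothetical usage: `predicted_labels` would be the class predictions of a
# pre-trained MNIST classifier on samples drawn from the trained generator mixture.
predicted_labels = np.random.randint(0, 10, size=10000)       # placeholder labels
generated_dist = np.bincount(predicted_labels, minlength=10)  # label histogram of the fakes
target_dist = np.full(10, 0.1)                                # assumed (near-)uniform real label distribution
print(total_variation_distance(generated_dist, target_dist))
```

A generator that covers all ten digits evenly yields a TVD close to zero against such a target, while a mode-collapsed generator concentrates its labels and yields a high TVD, which is why lower TVD is read as higher sample diversity.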
For population size 9, the mean and median TVD of Ring(1) and Ring(1)* are the lowest. According to the Friedman ranking, Ring(1)*, Ring(1), Lipizzaner, and Ring(2) are first, second, third, and fourth, respectively (p-value=0.0004). The Holm post-hoc correction confirms that there are no significant differences between Ring(1) and Ring(1)* and that they are statistically more competitive than Ring(2) and Lipizzaner (p-values<0.01). For bigger populations, the statistical analyses report that there are no significant differences between Ring(1), Ring(2), and Lipizzaner, and that Ring(1)* is statistically better than Ring(2) and Lipizzaner (p-values<0.01).

With the support of the FID and TVD results, we can answer RQ1: What is the effect on the trained generative model when changing the directionality of the signal propagation from four directions to two? Answer: The impact of performing migration to one or more neighbors is higher than that of the directionality itself. If we isolate the directionality, i.e., Lipi-Ring vs. Lipizzaner, the main differences are revealed when r = 1. Thus, Ring(1)* generators create statistically better samples (FID and TVD results) than Lipizzaner. However, when r = 2, there is no significant difference between Ring(2) and Lipizzaner for all the evaluated population sizes. So, we observe that the directionality itself is irrelevant. We can also answer RQ2: When the signal is propagated in only two directions, what is the impact of performing migration to one or more neighbors? Answer: When comparing Lipi-Ring with r = 1 and with r = 2 using the same number of training epochs (i.e., Ring(1) and Ring(2)), they perform the same, although Ring(1) converges faster. The smaller sub-population of Ring(1) likely increases the selection pressure and the convergence speed of the sub-populations, despite slower signal propagation.

This section analyzes the diversity in the genome space (i.e., the distance between the weights of the evolved networks). We evaluate the populations of size 9 and 16 to see how the migration model and sub-population size affect their diversity. We compute the L2 diversity, and Table 5 summarizes the results. Figure 4 presents the L2 distances for the populations that represent the median L2. Focusing on the Lipi-Ring methods with r = 1, the population diversity diminishes with more training, i.e., the L2 distances between the networks in Ring(1) are higher than in Ring(1)* for both population sizes (see Table 5 and Figure 4). This shows that, as the ring runs longer, the genotypes start to converge. Considering the Lipi-Ring methods with r = 1 and r = 2 that performed the same number of training epochs, i.e., Ring(1) and Ring(2), the former shows higher L2 distances (darker colors in Figure 4). This confirms that the populations are more diverse when signal propagation is slower. Comparing Ring(2) and Lipizzaner, which have the same sub-population size and migration to four neighbors, there is no clear trend, because Lipizzaner generated more diverse populations for population size 9 and Ring(2) for population size 16. Therefore, we have not found a clear effect of changing the migration from four to two directions. According to these results, we can answer the questions formulated in RQ3. How does diversity change over time in a ring topology? When Ring(1) performs more training, the diversity decreases. How does diversity compare with a ring topology with different neighborhood radius?
As the radius increases, the diversity is lower because the propagation of the best individuals is faster. How does diversity compare between ring topology and 2D grid methods, where both have the same sub-population size and neighborhood, but different signal directionality? We have not found any clear impact on the diversity when changing the directionality for the methods that have the same sub-population size and neighborhood.

According to the empirical results, in general, Ring(1) and Lipizzaner compute generative models with comparable quality for MNIST (training MLP networks). Here, we study both methods training generative models to synthesize samples of the CelebA and COVID-19 datasets using convolutional DNNs, which are bigger networks (having more parameters). Table 6 shows the best IS value from each of the 10 independent runs for each method with population size 9. In this case, it is clearer that both methods demonstrate the same performance. Focusing on the CelebA dataset, Ring(1) computes higher (better) mean, minimum, and maximum IS, but Lipizzaner shows a higher median. For the COVID-19 dataset, Lipizzaner shows better mean, median, and maximum IS values. The statistical analysis (ANOVA test) corroborates that there are no significant differences between both methods for both datasets (i.e., CelebA p-value=3.049 and COVID-19 p-value=0.731). Finally, Figure 5 illustrates some samples of CelebA and COVID-19 synthesized by the generators trained using Ring(1) and Lipizzaner. As can be seen, the samples show similar quality. Note that these results are in line with the answers given to RQ1 and RQ2. Changing the directionality of the signal propagation and the migration model still allows the method to achieve the same results as Lipizzaner.

Here, we evaluate the results of incrementally increasing the ring size of Ring(1) from 2 to 9 using the MNIST dataset. Figure 6 illustrates how increasing the population size of Ring(1) by only one individual improves the result (reduces the FID). However, Lipizzaner, using an a×b 2D grid, would require adding (at least) a or b individuals.

We know that training with smaller sub-population sizes takes shorter times because fewer operations are performed. As the reader may be curious about the time savings of using Ring(1) instead of Lipizzaner, we compare their computational times for the experiments performed. Notice that they perform the same number of training epochs (see Section 4) and provide comparable quality results. Table 7 summarizes the computational cost in wall-clock time for all the methods, population sizes, and datasets. All the analyzed methods have been executed on a cloud architecture, which could generate some timing discrepancies. As expected, Ring(1) requires shorter times. Comparing both methods on MNIST, Lipizzaner needed 33.46%, 25.01%, and 14.20% longer times than Ring(1) for population sizes 9, 16, and 25, respectively. This indicates that the computational effort of using bigger sub-populations (5 instead of 3 individuals) affects the running times less as the population size increases. For the CelebA and COVID-19 experiments, Ring(1) reduces the mean computational time by 23.22% for CelebA and by 41.18% for COVID-19. The time savings are higher for COVID-19 mainly because the training methods performed more epochs (1,000) for this dataset than for CelebA (20 epochs).
The sub-population size principally affects the effort required to evaluate the whole sub-population and to perform the selection/replacement operation, which are carried out in each training iteration. This explains why the time savings are higher for the COVID-19 experiments.

The empirical analysis of different spatially distributed CEA GAN training methods shows that the use of a ring topology instead of a 2D grid does not lead to a loss of quality in the computed generative models, and may even improve them (depending on the setup). Ring(1)*, which uses a ring topology with neighborhood radius r = 1 and runs for the same time as Lipizzaner, produced the best generative models. Ring(1), Ring(2), and Lipizzaner, which were trained for the same number of training epochs, trained comparable generative models on the MNIST, CelebA, and COVID-19 datasets (similar FID and TVD for MNIST and IS for CelebA and COVID-19). In terms of diversity, Ring(1) shows the most diverse populations, and this diversity diminishes with more training. Focusing on the ring topology, when the migration radius increases, i.e., Ring(1) (r = 1) vs. Ring(2) (r = 2), the diversity decreases. Finally, we have not found a marked difference in the diversity of the populations when changing the migration directionality, i.e., comparing Ring(2) and Lipizzaner. Ring(1), changing the signal propagation from four to two directions and using a migration radius of one, reduced the computational time cost of Lipizzaner by between 14.2% and 41.2%, while keeping comparable quality results. Future work will include the evaluation of Ring(1) on more datasets, with bigger populations, and for longer training epochs. We will apply specific strategies to deal with small sub-populations (3 individuals per cell) to analyze the effect of reducing the high selection pressure. We will perform a convergence analysis to provide appropriate Lipizzaner and Ring(1) setups to address MNIST. Finally, we are exploring new techniques to evolve the network architectures during the CEA training.

References
Towards distributed coevolutionary GANs.
Towards Principled Methods for Training Generative Adversarial Networks. arXiv e-prints.
Generalization and Equilibrium in Generative Adversarial Nets (GANs).
Do GANs learn the distribution? Some Theory and Empirics.
Multi-objective evolutionary GAN.
COVID-19 image data collection.
Coevolution of generative adversarial networks.
The MNIST Database of Handwritten Digit Images for Machine Learning Research.
Evolved GANs for generating Pareto set approximations.
Generative adversarial nets.
Improved training of Wasserstein GANs.
The relativistic discriminator: a key element missing from standard GAN.
Progressive growing of GANs for improved quality, stability, and variation.
Deep Learning Face Attributes in the Wild.
Surrogate-assisted evolutionary algorithms.
Least squares generative adversarial networks.
Dual discriminator generative adversarial nets.
PyTorch: An Imperative Style, High-Performance Deep Learning Library.
Coevolutionary principles.
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.
Improved techniques for training GANs.
Lipizzaner: A System That Scales Robust Generative Adversarial Network Training.
Spatial Evolutionary Generative Adversarial Networks.
Analyzing the Components of Distributed Coevolutionary GAN Training. In Parallel Problem Solving from Nature - PPSN XVI.
Data Dieting in GAN Training.
Re-Purposing Heterogeneous Generative Ensembles with Evolutionary Computation.
Evolutionary generative adversarial networks.
Investigating the Success of Spatial Coevolution.
Energy-based generative adversarial network.