Training Generative Adversarial Networks via stochastic Nash games
Barbara Franci, Sergio Grammatico — October 17, 2020

Generative adversarial networks (GANs) are a class of generative models with two antagonistic neural networks: a generator and a discriminator. These two neural networks compete against each other through an adversarial process that can be modeled as a stochastic Nash equilibrium problem. Since the associated training process is challenging, it is fundamental to design reliable algorithms to compute an equilibrium. In this paper, we propose a stochastic relaxed forward-backward (SRFB) algorithm for GANs and we show convergence to an exact solution when an increasing number of data samples is available. We also show convergence of an averaged variant of the SRFB algorithm to a neighborhood of the solution when only a few samples are available. In both cases, convergence is guaranteed when the pseudogradient mapping of the game is monotone. This assumption is among the weakest known in the literature. Moreover, we apply our algorithm to the image generation problem.

Generative adversarial networks (GANs) are an example of an unsupervised generative model. The basic idea is that, given some samples drawn from a probability distribution, the neural network takes a training set and learns to estimate that distribution. Most of the literature on GANs focuses on sample generation (especially image generation), but they can also be designed to explicitly estimate a probability distribution [1], [2], [3]. The learning process of the neural networks in GANs happens via an adversarial process, in which not only the generative model but also its opponent is simultaneously trained. Indeed, there are two neural network classes: the generator, which creates data according to a given distribution, and the discriminator, which tries to recognize whether the samples come from the training data or from the generator. As an example, the generator can be considered as a team of counterfeiters trying to produce fake currency, while the discriminative model, i.e., the police, tries to detect the counterfeit money [2]. To succeed in this game, the former must learn to produce money that is indistinguishable from the original currency, while the discriminator must recognize the samples that are drawn from the same distribution as the training data. Through this competition, both teams improve their methods until the counterfeit currency is indistinguishable from the original. Besides this simplistic interpretation, the subject has been widely studied in the literature, because it has many diverse applications. In addition to the classic image generation problem [1], [4], GANs have been applied in medicine, e.g., to improve the diagnostic performance of the low-dose computed tomography method [5] and, recently, to detect pneumonia in potential Covid-19 patients [6]. Moreover, they can be used for correcting images taken under adverse weather conditions (such as rain) [7], [8], for editing facial attributes [9], for image inpainting [10], [11], as well as for simulating game environments such as Pacman [12]. The reason why these networks are called adversarial is that they can be modeled as a game, where each agent's payoff depends on the variables of the other agent [13], [14].
However, the players in GANs can also be considered as cooperative players, since they share information with each other [1], [14], [15]. Since there are only the generator and the discriminator, the problem is an instance of a two-player game, and it can also be cast as a zero-sum game, depending on the choice of the cost functions. From a more general point of view, the class of games that suits the GAN problem is that of stochastic Nash equilibrium problems (SNEPs), where each agent tries to minimize its expected-value cost function. Given their connection with game theory, GANs have received theoretical attention as well, both on the study of the associated Nash equilibrium problem [14], [16] and on the design of algorithms to improve the learning and training process [16], [17]. Among the available methods to solve a SNEP, an elegant approach is to recast the problem as a stochastic variational inequality (SVI) [17], [18], [19]. The advantage of this approach is that there are many algorithms available for finding a solution of an SVI, some of them already applied to GANs [17], [20]. For instance, the most used in machine learning is the forward-backward algorithm [21], also known as gradient descent [22], which has the disadvantage that, to ensure convergence, the mapping should be cocoercive, a property implied, for instance, by strong monotonicity combined with Lipschitz continuity. Since the cost functions of GANs are often non-convex [15], [17], one would prefer an algorithm that is guaranteed to converge for merely monotone mappings. In this case, it is possible to consider the extragradient (EG) algorithm [23], [24], [25] and the forward-backward-forward (FBF) algorithm [26]. The main downside of these two methods is that they require two evaluations of the pseudogradient mapping at each iteration, which is computationally expensive. Due to the large-scale problem size, the ideal algorithm should not be computationally demanding and it should be guaranteed to converge under non-restrictive assumptions on the pseudogradient mapping. Motivated by the need for computationally light algorithms converging under weak assumptions, we propose an algorithm that requires only one evaluation of the pseudogradient mapping at each iteration and we show its convergence under mere monotonicity. Specifically, our contributions are summarized next.
• We propose a stochastic relaxed forward-backward (SRFB) algorithm and a variant with averaging (aSRFB) for the training process of GANs. The SRFB involves only one evaluation of the pseudogradient mapping at each iteration; therefore, it is computationally cheaper than the EG and FBF algorithms.
• We prove its convergence for monotone mappings, which is considered the "weakest possible" assumption on the pseudogradient mapping [27]. Specifically, whenever only a finite number of samples is available, we prove almost sure convergence to a neighborhood of the solution, while if an increasing set of samples is available, then the algorithm reaches an equilibrium almost surely.
• We apply our algorithm to the image generation problem and compare it with the extragradient scheme.
Our SRFB algorithm is inspired by [28], [29], and a preliminary heuristic application to GANs was presented in [30]. Therein, we did not prove convergence of the SRFB algorithm nor of its aSRFB variant.
Moreover, in [30], we only ran numerical simulations on synthetic toy examples, while in this paper we train the two neural networks for the popular image generation problem with real benchmark data.

Due to the connection between SNEPs and SVIs, many algorithms for variational inequalities have been applied to GANs [17], [19]. The first one is the forward-backward (FB) algorithm [31], [21], also known as gradient descent [22]. It is the most used, even though in many cases it has been proven to be non-convergent [17], [32]. From an operator-theoretic perspective, the FB is not convergent because the pseudogradient mapping should be cocoercive [33], and this is almost never the case in GANs. From an algorithmic perspective, the iterates typically cycle in a neighborhood of a solution without reaching it [32]. Therefore, research has focused on the forward-backward-forward (FBF) algorithm and on the extragradient (EG) algorithm, which are guaranteed to converge for merely monotone mappings. The FBF algorithm, first presented in [34] and extended to the stochastic case in [26], involves two evaluations of the pseudogradient mapping. A first attempt to apply the FBF algorithm to GANs is presented, along with a relaxed inertial FBF algorithm, in [35]. The extragradient (EG) method was first proposed in [36] and extended many years later to the stochastic case in [23], [24] and to GANs in [17], [25]. The EG algorithm requires two evaluations of the pseudogradient mapping as well; therefore, a variation is proposed in [17]. This involves an extrapolation from the past, i.e., it reuses the evaluation of the mapping from previous time steps. In [17], the authors also propose the FB and EG algorithms with averaging. The averaging technique was first proposed for VIs in [22] and studied more recently in [17], [37]. In [37], the authors examine two different techniques for averaging: the moving average, which computes the time-average of the iterates, and the exponential moving average, which computes an exponentially discounted sum. For both techniques, they show that, although convergence cannot be proven, averaging may help stabilize the iterates, driving them towards a neighborhood of the solution. While [37] has a mostly heuristic approach, theoretical convergence studies are presented in [38], [32]. Therein, the authors show that the local convergence and stability properties of GAN training depend on the eigenvalues of the Jacobian of the associated gradient vector field. Another theoretical aspect that has not been extensively addressed yet is the inherent relation between GANs and game theory. In [14], the authors formally introduce Generative Adversarial Network Games, describing (and seeking) the Nash equilibria of the zero-sum game as saddle points in mixed strategies. Saddle-point problems are also studied, in connection with GANs, in [20]. The authors in [15], instead, prove that Adam [39], a gradient method based on estimates of the first and second moments, converges to a stationary local Nash equilibrium.

Let $\mathbb{R}$ denote the set of real numbers and let $\bar{\mathbb{R}} = \mathbb{R} \cup \{+\infty\}$. $\langle \cdot, \cdot \rangle \colon \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ denotes the standard inner product and $\|\cdot\|$ represents the associated Euclidean norm. Given $N$ vectors $x_1, \dots, x_N \in \mathbb{R}^n$, $x := \operatorname{col}(x_1, \dots, x_N) = [x_1^\top, \dots, x_N^\top]^\top$. For a closed set $C \subseteq \mathbb{R}^n$, the mapping $\operatorname{proj}_C \colon \mathbb{R}^n \to C$ denotes the projection onto $C$, i.e., $\operatorname{proj}_C(x) = \operatorname{argmin}_{y \in C} \|y - x\|$.
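Since the projection operator appears in all of the algorithms below, the following minimal NumPy sketch illustrates it for two common feasible sets, a box and a Euclidean ball; these set choices are our own illustrative assumptions, not the constraint sets used in the experiments.

```python
import numpy as np

def proj_box(x, lo, hi):
    # Projection onto the box C = [lo, hi]^n: clip each coordinate.
    return np.clip(x, lo, hi)

def proj_ball(x, radius=1.0):
    # Projection onto the Euclidean ball {y : ||y|| <= radius}:
    # rescale x whenever it lies outside the ball.
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

x = np.array([1.5, -2.0, 0.3])
print(proj_box(x, -1.0, 1.0))   # [ 1.  -1.   0.3]
print(proj_ball(x))             # x rescaled to unit norm
```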
The idea behind generative adversarial networks (GANs) is to set up an antagonistic training process between the generator and the discriminator. Typically, the generator and the discriminator are represented by two deep neural networks and, accordingly, they are denoted by two functions, differentiable with respect to their inputs and parameters. The generator creates samples that aim at resembling the distribution of the training data. The generator is therefore trained to fool the discriminator which, in turn, examines the samples to determine whether they are real or fake. This adversarial mechanism can be modeled as a game where the generator and the discriminator are the players, each seeking to improve its own payoff [14].

Formally, the generator is a neural network class, represented by a differentiable function $g$, with parameter vector $x_g \in \Omega_g \subseteq \mathbb{R}^{n_g}$. Let us denote the (fake) output of the generator by $g(z, x_g) \in \mathbb{R}^q$, where the input $z$ is a random noise vector drawn from the prior distribution, $z \sim p_z$ [14]. In game-theoretic terms, the strategies of the generator are the parameters $x_g$ that allow $g$ to generate the fake output. Similarly to the generator, the discriminator is a neural network class with parameter vector $x_d \in \Omega_d \subseteq \mathbb{R}^{n_d}$ and a single output $d(v, x_d) \in [0, 1]$ that indicates how good the input $v$ is. The output of the discriminator can be interpreted as the probability that $d$ assigns to an element $v$ of being real. The strategies of the discriminator are the parameters $x_d$. Usually [17], [2], the payoff of the discriminator is given by the function

$$J_d(x_g, x_d) = \mathbb{E}_{v \sim p_{\mathrm{data}}}[\phi(d(v, x_d))] + \mathbb{E}_{z \sim p_z}[\phi(1 - d(g(z, x_g), x_d))], \qquad (1)$$

where $\phi \colon [0, 1] \to \mathbb{R}$ is a measuring function. The typical choices for $\phi$ (a logarithm, as in [2]) lead to the Kullback-Leibler or the Jensen-Shannon divergence, but other options (such as the Wasserstein distance) are proposed in the literature [3], [40]. Regardless, the mapping in (1) can be interpreted as the distance between the fake value and the real one. The payoff of the generator ($J_g$), instead, depends on how we describe the game. In fact, the problem can be modeled as a two-player game or as a zero-sum game, depending on the cost functions. To cast the problem as a zero-sum game, the functions $J_g$ and $J_d$ should satisfy the relation

$$J_g(x_g, x_d) = -J_d(x_g, x_d). \qquad (2)$$

Then, we can rewrite it as a minmax problem, i.e.,

$$\min_{x_g \in \Omega_g} \max_{x_d \in \Omega_d} J_d(x_g, x_d). \qquad (3)$$

In words, (3) means that the generator aims at minimizing the distance between the real value and the fake one, while the discriminator wants to maximize such a distance, i.e., $d$ aims at recognizing the generated data. When the generator has a payoff function different from the discriminator's, e.g., the non-saturating payoff [17]

$$J_g(x_g, x_d) = \mathbb{E}_{z \sim p_z}[\phi(d(g(z, x_g), x_d))], \qquad (4)$$

then the problem is not a zero-sum game. Since the two-player game with cost functions (1) and (4) and the zero-sum game with cost function (1) and relation (2) have the same pseudogradient mapping (defined in Section III), it can be proven that the two equilibria are strategically equivalent [14, Th. 10].
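To make the payoffs concrete, here is a minimal PyTorch sketch of (1)-(3) for the zero-sum case, written as losses to be minimized and assuming the measuring function $\phi = \log$ as in [2]; the networks g and d and the batch/noise sampling are placeholders, not the architectures used in the experiments.

```python
import torch

phi = torch.log  # measuring function, assumed phi = log as in the standard GAN

def discriminator_loss(d, g, real_batch, z):
    # Discriminator step in (3): maximize J_d in (1), i.e., minimize -J_d.
    fake_batch = g(z).detach()  # freeze the generator during the d-step
    return -(phi(d(real_batch)).mean() + phi(1.0 - d(fake_batch)).mean())

def generator_loss(d, g, z):
    # Generator step in the minmax problem (3): minimize phi(1 - d(g(z))),
    # the only term of J_d that depends on the generator parameters x_g.
    return phi(1.0 - d(g(z))).mean()
```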
In this section, let us describe the GAN game as a generic stochastic Nash equilibrium problem (SNEP). The two neural network classes are indexed by the set $I = \{g, d\}$. Each agent $i \in I$ has a decision variable $x_i \in \Omega_i \subseteq \mathbb{R}^{n_i}$. In general, the local cost function of agent $i \in I$ is defined as

$$J_i(x_i, x_j) = \mathbb{E}_\xi[\mathbb{J}_i(x_i, x_j, \xi)] \qquad (5)$$

for some measurable function $\mathbb{J}_i \colon \mathbb{R}^{n_i} \times \mathbb{R}^{n_j} \times \mathbb{R}^d \to \mathbb{R}$. The cost function $J_i$ of agent $i \in I$ depends on its local variable $x_i$, on the decision of the other player $x_j$, $j \neq i$, and on the random variable $\xi \colon \Xi \to \mathbb{R}^d$ that represents the uncertainty. The expected-value formulation arises when we do not know the distribution of the random variable or when it is computationally too expensive to compute the expected value. In practice, this means that we have access only to a finite number of samples from the data distribution. Given the probability space $(\Xi, \mathcal{F}, \mathbb{P})$, $\mathbb{E}_\xi$ indicates the mathematical expectation with respect to the distribution of the random variable $\xi(\omega)$.

For our theoretical analysis, some assumptions on the cost functions and on the feasible sets should be postulated; the following are standard in monotone game theory [41], [42].

Assumption 1. For each $i \in I$, the set $\Omega_i$ is nonempty, compact and convex. For each $i, j \in I$, $i \neq j$, the function $J_i(\cdot, x_j)$ is convex and continuously differentiable. For each $i \in I$, $j \neq i$, and for each $\xi \in \Xi$, the function $\mathbb{J}_i(\cdot, x_j, \xi)$ is convex, continuously differentiable and Lipschitz continuous.

Given the decision variable of the other agent, the aim of each agent $i$ is to choose a strategy $x_i$ that solves its local optimization problem, i.e.,

$$\forall i \in I \colon \quad \min_{x_i \in \Omega_i} J_i(x_i, x_j). \qquad (6)$$

The solution of the coupled optimization problems in (6) that we are seeking is a stochastic Nash equilibrium (SNE) [42], that is, a pair of strategies from which neither the generator nor the discriminator can decrease its cost function by unilaterally deviating from its decision. While existence of an SNE of the game in (6) is guaranteed under Assumption 1 [42], uniqueness does not hold in general.

To seek a Nash equilibrium, we rewrite the problem as a stochastic variational inequality (SVI). Let us first denote the pseudogradient mapping as

$$F(x) = \operatorname{col}\big(\mathbb{E}_\xi[\nabla_{x_g} \mathbb{J}_g(x_g, x_d, \xi)], \; \mathbb{E}_\xi[\nabla_{x_d} \mathbb{J}_d(x_g, x_d, \xi)]\big). \qquad (7)$$

We note that the possibility to exchange the expected value and the gradient is ensured by Assumption 1 [42]. Then, the associated SVI reads as: find $x^* \in \Omega = \Omega_g \times \Omega_d$ such that

$$\langle F(x^*), x - x^* \rangle \geq 0 \quad \text{for all } x \in \Omega. \qquad (8)$$

In light of Remark 1, we call variational equilibria (v-SNE) the solutions of the SVI$(\Omega, F)$ in (8), where $F$ is as in (7), i.e., the solutions of the SVI that are also SNE.

In this section, we propose two algorithms for solving the SNEP associated with the GAN training process: a stochastic relaxed forward-backward (SRFB) algorithm and its variant with averaging (aSRFB). The iterations read as in Algorithm 1 and Algorithm 2, respectively, and they represent the steps for each agent $i \in \{g, d\}$.

Algorithm 1: Stochastic relaxed forward-backward (SRFB)
Initialization: $x_i^0 \in \Omega_i$.
Iteration $k$: agent $i$ receives $x_j^k$ for $j \neq i$, then updates
$$\bar{x}_i^k = (1 - \delta)\bar{x}_i^{k-1} + \delta x_i^k, \qquad (11a)$$
$$x_i^{k+1} = \operatorname{proj}_{\Omega_i}\big[\bar{x}_i^k - \lambda F_i^{\mathrm{VR}}(x^k)\big]. \qquad (11b)$$

Algorithm 1 and Algorithm 2 differ, besides the presence of the averaging step, in the choice of the approximation used for the pseudogradient mapping. Moreover, we note that the averaging step in Algorithm 2, namely,

$$X^K = \Big(\sum_{k=1}^K \lambda_k\Big)^{-1} \sum_{k=1}^K \lambda_k x^k, \qquad (9)$$

can be implemented in a first-order fashion as

$$X^K = (1 - \bar{\lambda}_K) X^{K-1} + \bar{\lambda}_K x^K \qquad (10)$$

for some $\bar{\lambda}_K \in [0, 1]$. Moreover, let us remark that (10) is different from (11a) and (14a). Indeed, in Algorithms 1 and 2, (11a) and (14a) are convex combinations, with a constant parameter $\delta$, of the two previous iterates $x^k$ and $\bar{x}^{k-1}$, while the averaging in (10) is a weighted cumulative sum over the decision variables $x^k$ for all the iterations $k \in \{1, \dots, K\}$, with time-varying weights $\{\lambda_k\}_{k=1}^K$. The parameter $\bar{\lambda}_K$ can be tuned to obtain uniform, geometric or exponential averaging [17], [37].
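For illustration, the first-order implementation (10) can be sketched as below; the uniform and geometric weight choices follow the tuning discussion of [17], [37], while the function and variable names are our own.

```python
def update_average(X_prev, x_k, k, mode="uniform", rho=0.9):
    """One step of the online averaging (10):
    X^K = (1 - lam_K) X^{K-1} + lam_K x^K."""
    # With lam_K = 1/K, this recursion reproduces the uniform time-average
    # in (9) with constant weights; with a constant lam_K = 1 - rho, it
    # yields geometric (exponentially discounted) averaging.
    lam = 1.0 / k if mode == "uniform" else (1.0 - rho)
    return (1.0 - lam) * X_prev + lam * x_k
```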
Let us now describe the approximation schemes used in the definitions of the algorithms. In the SVI framework, there are two main possibilities, depending on the samples available. Using a finite, fixed number of samples is called stochastic approximation (SA) [21]; it is widely used in the literature on SVIs, in conjunction with conditions on the step sizes to control the stochastic error [24], [43]. In fact, unless the step size sequence is diminishing, it is only possible to prove convergence to a neighborhood of a solution. The stochastic approximation of the pseudogradient mapping, given one sample of the random variable, reads as

$$F^{\mathrm{SA}}(x^k, \xi_k) = \operatorname{col}\big(\nabla_{x_g} \mathbb{J}_g(x_g^k, x_d^k, \xi_k), \; \nabla_{x_d} \mathbb{J}_d(x_g^k, x_d^k, \xi_k)\big). \qquad (12)$$

$F^{\mathrm{SA}}$ uses one realization of the random variable, or a finite number of realizations, called a mini-batch.

Algorithm 2: Stochastic relaxed forward-backward with averaging (aSRFB)
Initialization: $x_i^0 \in \Omega_i$.
Iteration $k \in \{1, \dots, K\}$: agent $i$ receives $x_j^k$ for $j \neq i$, then updates
$$\bar{x}_i^k = (1 - \delta)\bar{x}_i^{k-1} + \delta x_i^k, \qquad (14a)$$
$$x_i^{k+1} = \operatorname{proj}_{\Omega_i}\big[\bar{x}_i^k - \lambda F_i^{\mathrm{SA}}(x^k, \xi_k)\big], \qquad (14b)$$
and returns the average $X^K$ as in (9).

When a huge number of samples is available, one can consider using a different approximation scheme, i.e.,

$$F^{\mathrm{VR}}(x^k) = \frac{1}{N_k} \sum_{s=1}^{N_k} F^{\mathrm{SA}}(x^k, \xi_k^{(s)}). \qquad (13)$$

In this case, an increasing number of samples, the batch size $N_k$, is taken at each iteration [23]. The superscript VR stands for variance reduction and it is related to the property of the approximation error discussed in Remark 2.

With the aim of proving convergence to a solution (or to a neighborhood of one) of Algorithms 1 and 2, we start this section with some assumptions that are common to both algorithms. The following monotonicity assumption on the pseudogradient mapping is standard for SVI problems [23], [29], also when applied to GANs [17], and it is the weakest for which one can hope to obtain global convergence.

Assumption 2. The pseudogradient mapping $F$ in (7) is monotone, i.e., $\langle F(x) - F(y), x - y \rangle \geq 0$ for all $x, y \in \Omega$.

Let us now define the filtration $\mathcal{F} = \{\mathcal{F}_k\}$, that is, a family of $\sigma$-algebras such that $\mathcal{F}_0 = \sigma(X_0)$, $\mathcal{F}_k = \sigma(X_0, \xi_1, \xi_2, \dots, \xi_k)$ for all $k \geq 1$, and $\mathcal{F}_k \subseteq \mathcal{F}_{k+1}$ for all $k \geq 0$. For all $k \geq 0$, let us also define the stochastic error as

$$\epsilon_k = \hat{F}(x^k, \xi_k) - F(x^k), \qquad (15)$$

where $\hat{F}$ indicates one of the two possible approximation schemes. In words, $\epsilon_k$ in (15) is the distance between the approximation and the exact expected-value mapping. Then, let us postulate that the stochastic error has zero mean and bounded variance, as usual in SVIs [17], [23], [29].

Assumption 3. The stochastic error is such that, for all $k \geq 0$, a.s., $\mathbb{E}[\epsilon_k \mid \mathcal{F}_k] = 0$ and $\mathbb{E}[\|\epsilon_k\|^2 \mid \mathcal{F}_k] \leq \sigma^2$.

We now state the convergence result for Algorithm 1. First, let us postulate some assumptions functional to our analysis. We start with the batch size sequence, which should be increasing to control the stochastic error.

Assumption 4. The batch size sequence $(N_k)_{k \in \mathbb{N}}$ in (13) is increasing, i.e., $N_k \to \infty$ as $k \to \infty$.

Remark 2. Given $F^{\mathrm{VR}}$ as in (13), it can be proven that, for some $C > 0$, $\mathbb{E}[\|\epsilon_k\|^2 \mid \mathcal{F}_k] \leq \frac{C\sigma^2}{N_k}$, i.e., the error diminishes as the batch size increases. Such a result is, therefore, called variance reduction. More details can be found in [23, Lemma 3.12] and [31, Lemma 6].

In addition to Assumption 2, we postulate that the pseudogradient mapping is Lipschitz continuous.

Assumption 5. The pseudogradient mapping $F$ in (7) is $\ell$-Lipschitz continuous for some $\ell > 0$, i.e., $\|F(x) - F(y)\| \leq \ell \|x - y\|$ for all $x, y \in \Omega$.

Using the variance-reduced scheme in (13), we can take a constant step size, as long as it is small enough, while the relaxation parameter should not be too small.

Assumption 6. The step size in Algorithm 1 is such that $\lambda \in \big(0, \frac{1}{2\delta(2\ell + 1)}\big]$, where $\ell$ is the Lipschitz constant of $F$ in (7) as in Assumption 5. The relaxation parameter in Algorithm 1 is such that $\delta \in \big[\frac{2}{1 + \sqrt{5}}, 1\big]$.

We can finally state our first convergence result.

Theorem 1. Let Assumptions 1-6 hold. Then, the sequence $(x^k)_{k \in \mathbb{N}}$ generated by Algorithm 1 with $F^{\mathrm{VR}}$ as in (13) converges a.s. to an SNE of the game in (6).

Proof. See Appendix B.
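For concreteness, a minimal NumPy sketch of Algorithm 1 with the variance-reduced approximation (13) follows; the sampled pseudogradient F_hat, the projection proj, the sampler sample and the increasing batch-size schedule are illustrative assumptions, not the exact setup of the experiments.

```python
import numpy as np

def srfb(x0, F_hat, proj, sample, lam=0.01, delta=0.7, num_iters=1000):
    # Stochastic relaxed forward-backward (Algorithm 1) with the
    # variance-reduced approximation (13): the mini-batch size N_k grows
    # along the iterations, as required by Assumption 4. Here delta = 0.7
    # lies in [2/(1 + sqrt(5)), 1] (Assumption 6); lam must also satisfy
    # Assumption 6, which depends on the (unknown) Lipschitz constant.
    x, x_bar = x0.copy(), x0.copy()
    for k in range(1, num_iters + 1):
        N_k = 10 + k                  # increasing batch size (illustrative)
        batch = sample(N_k)           # N_k i.i.d. samples of the r.v. xi
        x_bar = (1 - delta) * x_bar + delta * x       # relaxation step (11a)
        x = proj(x_bar - lam * F_hat(x, batch))       # projected step (11b)
    return x
```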
In this section, we state the convergence result (and the required assumptions) for Algorithm 2. First, the bound on the relaxation parameter is wider in this case (compared to Assumption 6).

Assumption 7. The relaxation parameter in Algorithm 2 is such that $\delta \in [0, 1]$.

Next, we postulate a boundedness assumption on the SA approximation in (12), reasonable in our game-theoretic framework [17], and an explicit bound on the feasible set.

Assumption 8. The SA approximation in (12) is bounded, i.e., there exists $B > 0$ such that $\mathbb{E}[\|F^{\mathrm{SA}}(x, \xi)\|^2] \leq B^2$ for all $x \in \Omega$.

Assumption 9. The local constraint set $\Omega$ is such that $\max_{x, y \in \Omega} \|x - y\|^2 \leq R$.

To measure how close a point is to a solution, let us introduce the gap function [18, Eq. 1.5.2]

$$\operatorname{err}(x) = \max_{y \in \Omega} \, \langle F(x), x - y \rangle,$$

which is equal to 0 if and only if $x$ is a solution of the (S)VI in (8). Other possible measure functions can be found in [17]. We are now ready to state our second convergence result.

Theorem 2. Let Assumptions 1-3 and 7-9 hold. Let $X^K = \frac{1}{K} \sum_{k=1}^K x^k$, $c = \frac{2 - \delta^2}{1 - \delta}$, let $B$ be as in Assumption 8, $R$ as in Assumption 9 and $\sigma^2$ as in Assumption 3. Then, the sequence $(x^k)_{k \in \mathbb{N}}$ generated by Algorithm 2 with constant step size $\lambda$ and $F^{\mathrm{SA}}$ as in (12) is such that

$$\mathbb{E}[\operatorname{err}(X^K)] \leq \frac{cR}{K\lambda} + (2B^2 + \sigma^2)\lambda.$$

Thus, $\lim_{K \to \infty} \mathbb{E}[\operatorname{err}(X^K)] = (2B^2 + \sigma^2)\lambda$.

Remark 3. The average defined in Theorem 2 is not in conflict with the definition in (9) because, if we consider a fixed step size $\lambda_k \equiv \lambda$, it holds that $X^K = \big(\sum_{k=1}^K \lambda\big)^{-1} \sum_{k=1}^K \lambda x^k = \frac{1}{K} \sum_{k=1}^K x^k$.

Let us present some numerical experiments to validate the analysis. We show how GANs are trained using our SRFB algorithm and we propose a comparison with one of the most used algorithms for GANs. Specifically, we compare our SRFB algorithm with the extragradient (EG) algorithm (Algorithm 3) [17]. We note that, compared to Algorithm 1, Algorithm 3 involves two projection steps and two evaluations of the pseudogradient mapping. For the simulations, we use Adam (Algorithm 4) [39] instead of the stochastic gradient [21]. In Algorithm 5, we propose the Relaxed Adam, i.e., the SRFB algorithm with Adam; the EG algorithm with Adam can be derived similarly [17, Algorithm 4]. All the simulations are performed on Matlab R2020a with 128 GB RAM and 2 Intel(R) Xeon(R) Gold 6148 CPUs @ 2.40 GHz (20 cores each). We train two DCGAN architectures [3], [44] (presented in Table I) on the CIFAR10 dataset [45] with the GAN objective [1], [2]. We choose the hyperparameters of Adam as $\beta_1 = 0.5$ and $\beta_2 = 0.9$. We compute the inception score [46] to have an objective comparison: the higher the inception score, the better the image generation. In Figure 1, we show how the inception score increases with time; the solid lines represent a tracking average over the previous and following 50 values of the inception score, averaged over 20 runs. The transparent area indicates the maximum and minimum values obtained over the 20 runs. We note that the SRFB algorithm is computationally less demanding than the EG algorithm: in Figure 1, after 24 hours (86400 seconds), the SRFB has performed approximately 130000 iterations, while the EG has performed approximately 90000. The averaged aSRFB shows worse performance (after approximately 110000 iterations), but this is to be expected, since we have convergence only to a neighborhood of the solution (Theorem 2). In Figure 2, we show the mean-to-variance ratio (average inception score divided by its variance) at each time instant for the three algorithms.
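Before commenting further on Figures 1 and 2, here is a minimal sketch of the Relaxed Adam update (Algorithm 5), combining the relaxation step (11a) with the standard Adam recursions of [39]; the decay rates match the experiments ($\beta_1 = 0.5$, $\beta_2 = 0.9$), while F_hat, proj, sample and the remaining parameters are illustrative assumptions, not a verbatim transcription of the pseudocode.

```python
import numpy as np

def relaxed_adam(x0, F_hat, proj, sample, alpha=2e-4, beta1=0.5, beta2=0.9,
                 delta=0.7, eps=1e-8, K=1000, batch_size=64):
    # SRFB relaxation step combined with Adam's bias-corrected moment
    # estimates [39]; a sketch of Algorithm 5, under our own assumptions.
    x, x_bar = x0.copy(), x0.copy()
    z = np.zeros_like(x0)   # 1st-moment vector, z^0 = 0
    y = np.zeros_like(x0)   # 2nd-moment vector, y^0 = 0
    for k in range(1, K + 1):
        grad = F_hat(x, sample(batch_size))      # sampled pseudogradient
        z = beta1 * z + (1 - beta1) * grad       # 1st-moment update
        y = beta2 * y + (1 - beta2) * grad**2    # 2nd-moment update
        z_hat = z / (1 - beta1**k)               # bias corrections
        y_hat = y / (1 - beta2**k)
        x_bar = (1 - delta) * x_bar + delta * x  # relaxation step (11a)
        x = proj(x_bar - alpha * z_hat / (np.sqrt(y_hat) + eps))
    return x
```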
[Algorithm 3 (the extragradient method, in which at iteration $k$ each agent $i$ also receives $y_j^k$ for $j \neq i$ and performs an extrapolation step before the update), Algorithm 4 (Adam, with initial parameters $x^0, \bar{x}^0 \in \Omega$, exponential decay rates $\beta_1, \beta_2 \in [0, 1)$, step size $\alpha$, and first- and second-moment vectors initialized as $z^0 = 0$ and $y^0 = 0$) and Algorithm 5 (the Relaxed Adam, which additionally takes a relaxation parameter $\delta$) are reported in the corresponding pseudocode boxes.] As one can see from Figures 1 and 2, the SRFB algorithm has a performance similar to that of the EG algorithm but with a smaller variance, hence a higher reproducibility.

The stochastic relaxed forward-backward algorithm is a very promising algorithm for training generative adversarial networks. If an increasing number of samples is available and the pseudogradient mapping of the game is monotone, convergence to the exact solution holds. Instead, with only a finite, fixed mini-batch and the same monotonicity assumption, convergence to a neighborhood of the solution can be proven by using an averaging technique. Our numerical experience shows a performance similar to that of the extragradient scheme, widely used in the literature for GANs. For the future, it would be interesting to extend the convergence result to an exact solution also in the case of a small mini-batch. Since the cost functions associated with GANs are often non-convex, it would also be worth finding algorithms that converge under assumptions weaker than monotonicity.

We here recall some facts about norms, some properties of the projection operator and a preliminary result. Some results take inspiration from [28], where the algorithm is presented in the deterministic case. We start with the norms: we use the cosine rule

$$2\langle a - b, a - c \rangle = \|a - b\|^2 + \|a - c\|^2 - \|b - c\|^2 \quad \text{for all } a, b, c \in \mathbb{R}^n. \qquad (17)$$

Concerning the projection operator, by [47, Proposition 12.26], it satisfies the following inequality: let $C$ be a nonempty closed convex set; then, for all $x \in \mathbb{R}^n$ and $y \in C$,

$$\langle \operatorname{proj}_C(x) - x, \, y - \operatorname{proj}_C(x) \rangle \geq 0. \qquad (20)$$

The projection is also firmly nonexpansive [47, Prop. 4.16] and, consequently, firmly quasinonexpansive [47, Def. 4.1]; in particular, for all $x \in \mathbb{R}^n$ and $y \in C$,

$$\|\operatorname{proj}_C(x) - y\|^2 + \|\operatorname{proj}_C(x) - x\|^2 \leq \|x - y\|^2. \qquad (19)$$

The Robbins-Siegmund lemma is widely used in the literature to prove a.s. convergence of sequences of random variables.

Lemma 1 (Robbins-Siegmund lemma, [48]). Let $\mathcal{F} = (\mathcal{F}_k)_{k \in \mathbb{N}}$ be a filtration. Let $\{\alpha_k\}_{k \in \mathbb{N}}$, $\{\theta_k\}_{k \in \mathbb{N}}$, $\{\eta_k\}_{k \in \mathbb{N}}$ and $\{\chi_k\}_{k \in \mathbb{N}}$ be nonnegative sequences adapted to $\mathcal{F}$ such that $\sum_k \eta_k < \infty$, $\sum_k \chi_k < \infty$ and, for all $k \in \mathbb{N}$, a.s.,

$$\mathbb{E}[\alpha_{k+1} \mid \mathcal{F}_k] \leq (1 + \chi_k)\alpha_k - \theta_k + \eta_k.$$

Then $\sum_k \theta_k < \infty$ and $\{\alpha_k\}_{k \in \mathbb{N}}$ converges a.s. to a nonnegative random variable.

The next lemma collects some properties that follow from the definition of the SRFB algorithm.

Lemma 2. Given Algorithm 1, the following statements hold (referred to as Lemma 2.1, 2.2 and 2.3 in the proofs below). Proof. Straightforward from Algorithm 1 and [28].

Proof of Theorem 1. Using the property (20) of the projection operator, we obtain (21) and (22). Using Lemma 2.1, (22) becomes (23). Then, adding (21) and (23), we obtain (24). Now we use the cosine rule (17); then, by reordering and substituting in (24), we obtain (25). By using Lemmas 2.2 and 2.3 as in (28), substituting in (25), grouping and reordering, we get (26), where we used Assumption 6. Moreover, by using the Lipschitz continuity of $F$ and the Cauchy-Schwarz and Young inequalities, a further bound follows; similarly, we can bound the term involving the stochastic errors. By substituting in (26), we conclude (27). Now, we consider the residual function of $x^k$, where we add and subtract $x^{k+1} = \operatorname{proj}_\Omega(x^k - \lambda F^{\mathrm{VR}}(x^k))$ in the first inequality and use the firm nonexpansiveness of the projection and (19).
By substituting the resulting bound in (27), taking the expected value, grouping and using Remark 2 and Assumptions 3 and 6, we obtain an inequality of the form required by Lemma 1. Applying the Robbins-Siegmund lemma, we conclude that $\alpha_k$ converges and that $\sum_{k \in \mathbb{N}} \theta_k < \infty$. This implies that the sequence $(x^k)_{k \in \mathbb{N}}$ is bounded and that $\|x^k - \bar{x}^k\| \to 0$ (otherwise $\sum_k \frac{1}{\delta} \|x^k - \bar{x}^k\|^2 = \infty$). Therefore, $(x^k)_{k \in \mathbb{N}}$ has at least one cluster point $\tilde{x}$. Moreover, since $\sum_{k \in \mathbb{N}} \theta_k < \infty$, $\operatorname{res}(x^k)^2 \to 0$ and, by continuity of the residual, $\operatorname{res}(\tilde{x}) = 0$, i.e., the cluster point is a solution. ∎

Proof of Theorem 2. We start by using the fact that the projection is firmly quasinonexpansive. Then, applying Lemma 2.2 and Lemma 2.3 to $\|x^{k+1} - x^*\|^2$, we obtain (28), which we can rewrite as (29). By applying Young's inequality, inequality (29) becomes (30). Reordering, adding and subtracting $2\lambda_k \langle F(x^k), x^k - x^* \rangle$ and using Lemma 2, we obtain (31). Then, by the definition of $\epsilon_k$, reordering leads to (32). Next, we sum over all the iterations; hence, inequality (32) becomes (33). Using Assumption 2 and resolving the sums, we obtain (34). Now, we note that $\langle \epsilon_k, x^k - x^* \rangle = \langle \epsilon_k, x^k - u^k \rangle + \langle \epsilon_k, u^k - x^* \rangle$. We define $u^0 = x^0$ and $u^{k+1} = \operatorname{proj}_\Omega(u^k - \lambda_k \epsilon_k)$; therefore,

$$2\lambda_k \langle \epsilon_k, x^k - x^* \rangle \leq 2\lambda_k \langle \epsilon_k, x^k - u^k \rangle + \|u^k - x^*\|^2 + \lambda_k^2 \|\epsilon_k\|^2 - \|u^{k+1} - x^*\|^2.$$

By including this in (34) and summing, we obtain (36). By definition, $\|u^0 - x^*\|^2 = \|x^0 - x^*\|^2$. Then, by taking the expected value in (36) and using Assumption 3, we conclude (37). Let us define $S_K = \sum_{k=1}^K \lambda_k$. Finally, it holds that, if $\lambda_k$ is constant, $S_K = K\lambda$ and $\sum_{k=1}^K \lambda_k^2 = K\lambda^2$; hence,

$$\mathbb{E}[\langle F(x^*), X^K - x^* \rangle] \leq \frac{cR}{K\lambda} + (2B^2 + \sigma^2)\lambda. \qquad \blacksquare$$

REFERENCES
[1] NIPS 2016 tutorial: Generative adversarial networks.
[2] Generative adversarial nets.
[3] Generative adversarial networks in computer vision: A survey and taxonomy.
[4] SP-GAN: Self-growing and pruning generative adversarial networks.
[5] Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss.
[6] Detection of coronavirus (COVID-19) associated pneumonia based on generative adversarial networks and a fine-tuned deep transfer learning model using chest X-ray dataset.
[7] Image de-raining using a conditional generative adversarial network.
[8] Single-image de-raining with feature-supervised generative adversarial network.
[9] MU-GAN: Facial attribute editing based on multi-attention mechanism.
[10] Research on image inpainting algorithm of improved GAN based on two-discriminations networks.
[11] The improved image inpainting algorithm via encoder and similarity constraint.
[12] Learning to simulate dynamic environments with GameGAN.
[13] Randomized prediction games for adversarial machine learning.
[14] GANGs: Generative adversarial network games.
[15] GANs trained by a two time-scale update rule converge to a local Nash equilibrium.
[16] On gradient-based learning in continuous games.
[17] A variational inequality perspective on generative adversarial networks.
[18] Finite-dimensional variational inequalities and complementarity problems.
[19] Stochastic learning via optimizing the variational inequalities.
[20] Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile.
[21] A stochastic approximation method.
[22] On the weak convergence of an ergodic iteration for the solution of variational inequalities for monotone operators in Hilbert space.
[23] Extragradient method with variance reduction for stochastic variational inequalities.
[24] Optimal robust smoothing extragradient algorithms for stochastic variational inequality problems.
[25] Revisiting stochastic extragradient.
[26] Minibatch forward-backward-forward methods for solving stochastic variational inequalities.
[27] Generalized Nash equilibrium problems.
[28] Golden ratio algorithms for variational inequalities.
[29] Stochastic generalized Nash equilibrium seeking in merely monotone games.
[30] A game-theoretic approach for generative adversarial networks.
[31] A distributed forward-backward algorithm for stochastic generalized Nash equilibrium seeking.
[32] Which training methods for GANs do actually converge?
[33] Comments on "Distributed robust adaptive equilibrium computation for generalized convex games".
[34] A modified forward-backward splitting method for maximal monotone mappings.
[35] A relaxed inertial forward-backward-forward algorithm for solving monotone inclusions with application to GANs.
[36] The extragradient method for finding saddle points and other problems.
[37] The unusual effectiveness of averaging in GAN training.
[38] The numerics of GANs.
[39] Adam: A method for stochastic optimization.
[41] On generalized Nash games and variational inequalities.
[42] On the characterization of solution sets of smooth and nonsmooth convex stochastic Nash games.
[43] Regularized iterative stochastic approximation methods for stochastic variational inequality problems.
[44] Unsupervised representation learning with deep convolutional generative adversarial networks.
[45] Learning multiple layers of features from tiny images.
[46] Improved techniques for training GANs.
[47] Convex analysis and monotone operator theory in Hilbert spaces.
[48] A convergence theorem for non-negative almost supermartingales and some applications.