authors: Hwang, Hyung Ju; Jang, Jin Woo; Jo, Hyeontae; Lee, Jae Yong
title: Trend to Equilibrium for the Kinetic Fokker-Planck Equation via the Neural Network Approach
date: 2020-06-10 journal: J Comput Phys DOI: 10.1016/j.jcp.2020.109665

The issue of the relaxation to equilibrium has been at the core of the kinetic theory of rarefied gas dynamics. In the paper, we introduce the Deep Neural Network (DNN) approximated solutions to the kinetic Fokker-Planck equation in a bounded interval and study the large-time asymptotic behavior of the solutions and other physically relevant macroscopic quantities. We impose the varied types of boundary conditions including the inflow-type and the reflection-type boundaries as well as the varied diffusion and friction coefficients and study the boundary effects on the asymptotic behaviors. These include the predictions on the large-time behaviors of the pointwise values of the particle distribution and the macroscopic physical quantities including the total kinetic energy, the entropy, and the free energy. We also provide the theoretical supports for the pointwise convergence of the neural network solutions to the a priori analytic solutions. We use the library PyTorch, the activation function tanh between layers, and the Adam optimizer for the Deep Learning algorithm.

1. Introduction

1.1. Motivation. One of the main questions of interest in the study of the dynamics of rarefied gas particles concerns the time-asymptotic behaviors of the particle density distribution and its macroscopic quantities. Since the era of Ludwig Boltzmann, the validity of the time-irreversibility and the entropy production of the Boltzmann kinetic equation has long been a bone of contention due to the Poincaré recurrence theorem. Indeed, it is an important problem to show that the time-scale of the convergence towards the equilibrium is much smaller than the time-scale of the validity of the Boltzmann equation. In this work, we study the time-irreversibility and the entropy production of the kinetic Fokker-Planck equation, which is a fundamental model for a physical plasma. This question has been heavily studied in both analytic and numerical aspects. The Lyapunov functional for the kinetic Fokker-Planck equation is given by the relative entropy functional with respect to the steady state, which we will describe in more detail below, and we provide a newly devised numerical method, based on a machine learning algorithm, for the study of the large-time asymptotic behaviors of the Deep Neural Network (DNN) solutions to the kinetic Fokker-Planck equation in a bounded domain. In fact, it has been considered numerically difficult to simulate an initial-boundary value problem for a kinetic partial differential equation. One of the difficulties arises from the fact that a numerical solution is by nature closely tied to a computation in a bounded domain, whereas the domain of a kinetic equation still contains the whole space in the momentum variable v. In order to resolve this issue, one can consider (cf. [31]) the decaying properties of the solutions to a kinetic equation; i.e., if one can show that the solution decays fast enough in time outside a compact set, then the size (in a suitable sense) of the solution outside the compact region is close to zero, so the difference can be treated numerically as a small error.
Another difficulty arises from the huge computational cost of simulating a kinetic partial differential equation due to the extra dimensions introduced by the momentum variable v. This becomes worse when we consider an integral-based operator such as the classical Boltzmann collision operator or the Landau-Boltzmann collision operator. The Deep Learning method is a new approach for solving partial differential equations that can resolve some of the issues listed above. The Deep Learning algorithm has the advantage of being intuitive and easy to implement via the backpropagation method. The algorithm produces approximated solutions that are continuously differentiable on the domain. Also, it is relatively easy to incorporate additional information into the algorithm by adding a term to the loss function. For example, we can simply include a term regarding the conservation of the total mass of the system in the total loss function of the algorithm, since the equation is expected to conserve the total mass under several boundary conditions. In addition, the Deep Learning algorithm can be extended to arbitrary domains, so it is not necessary to worry about how to triangulate the domain as in standard numerical methods. Although our DNN algorithm uses uniform grids on the domain, sampling random points from the domain can also be applied, for instance to special domains in higher-dimensional equations [64]. However, there are also some weaknesses of the approach that one should be careful of. Firstly, there is no guarantee that the Deep Learning algorithm will converge, and it is theoretically difficult to show its convergence. Also, in contrast with standard numerical methods, it is hard to evaluate the accuracy of the Deep Learning algorithm; so far, many different measures have been suggested to express the performance of a DNN model. Due to the randomly initialized parameters in a Deep Learning algorithm, each training run could give slightly different solutions, while the numerical methods are deterministic.

1.2. A brief history of the past results. Mathematical results on the Fokker-Planck equation. The existence and the uniqueness of the solutions to the Fokker-Planck equation have been heavily studied. Dita [30] constructed an analytic solution of the stationary 1D Fokker-Planck equation with an absorbing boundary. Then Protopopescu [57] dealt with the stationary 1D Fokker-Planck equation under some velocity-dependent external forces with some boundary conditions. DiPerna-Lions [29] established stability results for sequences of solutions and global existence for the Cauchy problem of the Fokker-Planck equation with large data. Desvillettes and Villani [28] showed a polynomial decay to a global equilibrium for solutions with suitable initial conditions, with the help of logarithmic Sobolev inequalities. Irregular coefficients in Fokker-Planck and transport equations were also studied in [12, 51]. Later, Mischler [53] provided stability results for DiPerna-Lions renormalized solutions for Fokker-Planck equations with Maxwell boundary conditions. [62] showed the well-posedness of steady Fokker-Planck solutions in smooth domains with absorbing boundary conditions. Regarding the hypoellipticity of the equation, Hwang-Jang-Velazquez in [40] showed the hypoellipticity properties for the Fokker-Planck equation with absorbing boundary conditions, and this has been generalized in [39] by Hwang-Jang-Jung.
Also, Hwang-Phan [41] extended the results of [40] to the inflow boundary conditions. Regarding the Vlasov-Poisson-Fokker-Planck system, Victory and O'Dwyer [65] proved the existence of local-in-time solutions to the VPFP system. Neunzert, Pulvirenti, and Triolo in [54] used a probabilistic method to prove the global existence of smooth solutions in one and two dimensions. Degond [26] proposed a fully deterministic proof of the existence of global-in-time smooth solutions for the Vlasov-Fokker-Planck equations in one and two dimensions. It is also proved there that the solution of the VPFP equation converges to the solution of the VP equation as the coefficients in the FP operator term go to zero. Bouchut in [9, 10] showed the existence and uniqueness of strong and global-in-time solutions to the three-dimensional VPFP equation. The asymptotic behavior and the convergence to the equilibrium of the solutions to the Vlasov(-Poisson)-Fokker-Planck equation were studied in [8, 16, 18-20]. The stationary states and large time behavior of the Wigner-Fokker-Planck equation were studied in [3]. The low and high field scaling limits of the Vlasov-Poisson-Fokker-Planck system were considered in [2]. The global existence and uniqueness of weak solutions to kinetic Kolmogorov-Vicsek models were considered in [34]. The Vlasov-Poisson-Fokker-Planck system with uncertainty and multiple scales was studied in [43, 70]. Regarding the recent development in the qualitative properties of the VPFP system, F. Bouchut and J. Dolbeault in [11] showed the large time behaviors and the steady states for the solutions of the VPFP equation in the case that the particles occupy the whole space R^3. Another related result is [8], which studied the large-time asymptotics for the VPFP system in a bounded domain with reflection-type boundary conditions.

Numerical results on kinetic equations. In this section, we would like to introduce a few past results on the numerical analysis of kinetic equations. Regarding past numerical results on the Fokker-Planck equation, we would like to start with some early developments via the conservative finite element method [5, 15, 21, 27, 48]. Regarding the (spatially homogeneous and inhomogeneous) nonlinear Landau collision equation, which is a generalized version of the linear Fokker-Planck equation, we have the early developments via the conservative finite element method [7, 13, 14, 24, 49, 56]. We also record a result via the spectral method [33]. We note that many methods have been proposed to solve the Vlasov-Poisson-Fokker-Planck equation and the Fokker-Planck-Landau equation. Allen-Victory [1] and Havlak-Victory [36] proposed the random particle method together with its analysis and a computational study. Finite difference schemes were also used to solve the VPFP system in the periodic 1D case in [22, 60, 61]. Another approach is the deterministic particle methods, which are based on the characteristic trajectories of the transport term of the Vlasov-Poisson-Fokker-Planck equation [37]. Wollman and Ozizmir [67, 68] combined the deterministic method with a periodic regridding of the distribution function so as to approximate the solutions in a more stable and accurate way. This method is extended to the two-dimensional case in [69]. The fast spectral method, another alternative for the numerical approximation, was introduced in [55]. In [32], Filbet and Pareschi presented a new spectral method that could be extended to the nonhomogeneous situation.
Neural networks and the Cauchy problem of a PDE. The neural network architecture was first introduced in [52]. Then, Cybenko [25] established sufficient conditions under which a continuous function can be approximated by finite linear combinations of single-hidden-layer neural networks with the same univariate activation function. Hornik-Stinchcombe-White [38] also showed that measurable functions can be approximated by multi-layer feedforward networks with a monotone sigmoid activation function. Then Cotter [23] extended the result of [38] to a new architecture, and later Li [50] proved that a multi-layer network with one hidden layer can approximate a target function and its higher partial derivatives on a compact set. Though the theory of artificial neural networks as an approximation to solutions of differential equations has such a long history, the actual implementation of the ideas has a relatively short history due to technical and algorithmic issues. Solving differential equations using an artificial neural network with an architecture consisting of one single hidden layer and ten units was studied in [46], and the results were extended to domains with complex boundaries in [47]. Then Jianyu et al. [42] replaced the activation function with a radial basis function and solved the Poisson equation. More recently, Berg-Nyström [6] used a DNN to solve steady problems in 1D and 2D space dimensions with complex geometry. Han-Jentzen-Weinan [35] then applied DNNs to the stochastic process formulation for solving high-dimensional differential equations. Very recently, Raissi-Perdikaris-Karniadakis [58] suggested an algorithm that solves both forward and inverse problems. Subsequently, Jo et al. [44] gave a theoretical justification that neural networks converge to analytic solutions in forward and inverse problems. Also, the application of other architectures or learning strategies, such as convolutional neural networks or reinforcement learning, has been studied in [63, 66]. Neural network approaches to solving partial differential equations have also been proposed in [59, 64].

Here f = f(t, x, v) denotes the probabilistic density distribution of particles. In this paper, we consider the 1-dimensional kinetic Fokker-Planck equation (1.2) in a bounded interval Ω = (−1, 1) with d = 1, subject to the initial condition (1.3). The boundary conditions that we impose will be introduced in Section 1.4.

1.4. Boundary conditions. We define the outward normal vector n_x on the boundary ∂Ω. Since Ω = (−1, 1) and hence ∂Ω = {−1, 1}, we have n_x = x̂ at x = 1 and n_x = −x̂ at x = −1. Throughout this paper, we will denote the phase boundary by γ := ∂Ω × R. Additionally, we split this boundary into an outgoing boundary γ_+, an incoming boundary γ_−, and a singular boundary γ_0 for grazing velocities, defined as γ_± := {(x, v) ∈ γ : n_x · v ≷ 0} and γ_0 := {(x, v) ∈ γ : n_x · v = 0}. In terms of the probability density function f, we formulate the following four physical boundary conditions throughout the paper.

1.4.1. Specular reflection boundary condition. The specular reflection condition (1.7) is imposed for x = −1 and 1.

1.4.2. Diffusive reflection boundary condition. The diffusive reflection condition (1.8) is imposed for both n_x = x̂ and n_x = −x̂.

The global Maxwellian equilibrium (1.12) is determined by M := ‖f_0(·, ·)‖_{L^1_{x,v}} and C := |Ω| = 2. The Lyapunov functional η(t) is defined as the relative entropy of the solution f with respect to the stationary distribution f_∞. The entropy of the system "Ent", the total kinetic energy "KE", and the total mass "Mass" of the system are defined accordingly, and we define the free energy functional "FE", whose time-derivative is equivalent to that of the Lyapunov functional η if the total mass is conserved.
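For the reader's convenience, these objects can be written explicitly under the standard linear kinetic Fokker-Planck model with diffusion coefficient σ and friction coefficient β. This is our working assumption for the precise form of (1.2), (1.12), and the functionals above, chosen to be consistent with the steady-state value KE_∞ = σM/(2β) used in Section 4; it should be read as a sketch rather than a verbatim restatement of the original displays.

```latex
% Assumed (sketch) forms, consistent with the diffusion coefficient sigma, the friction
% coefficient beta, and KE_infty = sigma*M/(2*beta); not a verbatim copy of the paper.
\begin{align*}
&\partial_t f + v\,\partial_x f \;=\; \sigma\,\partial_{vv} f + \beta\,\partial_v\!\left(v f\right),
&& (t,x,v)\in (0,\infty)\times\Omega\times\mathbb{R},\\[2pt]
&f_\infty(x,v) \;=\; \frac{M}{C}\sqrt{\frac{\beta}{2\pi\sigma}}
 \exp\!\Big(-\frac{\beta |v|^{2}}{2\sigma}\Big),
&& M=\|f_0\|_{L^1_{x,v}},\quad C=|\Omega|=2,\\[2pt]
&\eta(t)=\iint_{\Omega\times\mathbb{R}} f\,\log\frac{f}{f_\infty}\,dv\,dx,
&& \mathrm{Mass}(t)=\iint_{\Omega\times\mathbb{R}} f\,dv\,dx,\\[2pt]
&\mathrm{KE}(t)=\frac12\iint_{\Omega\times\mathbb{R}} |v|^{2} f\,dv\,dx,
&& \mathrm{Ent}(t)=-\iint_{\Omega\times\mathbb{R}} f\,\log f\,dv\,dx,\\[2pt]
&\mathrm{FE}(t)=\frac{\beta}{\sigma}\,\mathrm{KE}(t)-\mathrm{Ent}(t).
\end{align*}
```

With this convention, FE and η differ only by a constant multiple of the total mass, so their time-derivatives coincide whenever the mass is conserved, as stated above.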
In the cases of the specular reflection, the diffusive reflection, and the periodic boundaries, we expect that the Lyapunov functional satisfies η′(t) ≤ 0, which is a manifestation of the second law of thermodynamics. The following balance laws for the macroscopic quantities are also well known [8, p. 1350]: the balance of the total mass, the balance of the total kinetic energy, and the balance of the entropy.

1.6. Main results, difficulties, and our strategy. The main results of the paper consist of two parts. One is the theoretical evidence on the relationship between the DNN solutions and the a priori analytic solutions. More precisely, we first prove in Theorem 3.4 that a sequence of neural network approximated solutions, which makes the total loss term converge to zero, exists if a C^(1,1,2) solution to the equation exists. Namely, the theorem implies that we can always find appropriate weights of the neural network which reduce the total error functions as much as we want. However, since this does not guarantee the convergence of the approximated solutions as the total loss term converges to zero, we prove an additional result, Theorem 3.6, which guarantees that the neural network solutions indeed converge to the analytic solution as the total loss term vanishes. This is proved under a suitable smallness condition at the boundary, (3.3), which corresponds to the truncation of the v domain in the choice of our grid points, as introduced in Section 2.2. Though the equation is linear, the derivation of an energy inequality for the convergence involves boundary integrations not just in the x variable but also in the v variable due to the truncation of the v space, and hence we need the smallness condition (3.3) on the difference between the approximated solutions and the analytic solution in order to deal with the boundary integrations in the v variable. Also, in order to create a sufficiently large dissipation term on the left-hand side of the energy inequality, we follow the transformation of the Fokker-Planck equation motivated by Carrillo [17]. On the other hand, we provide in Section 4 the time-asymptotic behaviors of the neural network approximated solutions to the kinetic Fokker-Planck equation and of the macroscopic physical quantities, under various types of initial and boundary conditions and with different values of the diffusion and friction coefficients. The learning procedure of our DNN algorithm is based on the design of a total loss function that contains the information of the Fokker-Planck equation and the initial-boundary conditions, and on the appropriate selection of the grid points and of the weights for each loss term so that they fit the physical conditions. Our algorithm uses the library PyTorch and the hyperbolic tangent (tanh) activation function between the layers. For the purpose of minimizing the total loss term, we use the Adam optimizer, which is based on stochastic gradient descent. One of the reasons that many of the existing numerical methods fail to simulate the solutions to several kinetic equations is the issue of the conservation of the total mass. In order to deal with this difficulty, we remark that we increased the learning efficiency of the model by adding the mass conservation law to the total loss function.
It is indeed a huge advantage of using the neural network approach that we can intuitively add any physically relevant conditions on the necessary quantities when we design the total loss function, though the actual reduction of the total loss function is another issue that we need to take care of. In addition, our numerical simulations indeed provide several predictions on the time-asymptotic behaviors of the a priori analytic solutions to the kinetic Fokker-Planck equation and of the physically relevant macroscopic quantities in the pointwise sense. Namely, we provide the time-asymptotic plots of the actual pointwise values at each grid point under varied types of conditions, and these contain much more information than graphs of the asymptotic behaviors of weighted L^p-moments of the solution for some p. To the best of the authors' knowledge, there have been no numerical plots of the pointwise values of the approximated solutions for a kinetic equation under the varied types of physical boundary conditions, such as the inflow-type and the reflection-type boundary conditions. Via the numerical simulations, we have observed asymptotic behaviors of the solutions that had been theoretically proved only in some specific situations; our numerical simulations predict the pointwise convergence of the neural network solutions to the global Maxwellian for varied types of boundary conditions and varied coefficients for the diffusion and the friction. Under the specular reflection boundary condition, we also provide the different rates of convergence under the varied choices of the friction coefficient β.

Outline of the paper. In Section 2, we will introduce in detail our DNN architecture and method to solve the Cauchy problem for the Fokker-Planck equation. This will include detailed descriptions of the hidden layers and the definitions of the grid points and loss functions. In Section 3.1, we will first prove that there exists a sequence of weights such that the total sum of loss functions converges to 0, and that neural networks equipped with such weights converge to the analytic solutions. In Section 4, we provide our numerical simulations and the results for each initial and boundary condition. Several plots will show the actual values of the distribution function at each time and spatial point and will provide the asymptotic behaviors of the macroscopic quantities. Finally, in Section 5, we complete the paper by summarizing our methods and the results.

In this section, we introduce our DNN structure and method to solve the Cauchy problem for the 1-dimensional kinetic Fokker-Planck equation (1.2).

2.1. Our Deep Learning algorithm and the architecture. A Deep Learning algorithm is a non-linear function approximation method via a DNN structure. A DNN consists of several layers, and each layer has several neurons which are connected to the neurons of the previous and the next layers. Connected neurons are related through an affine transformation followed by a non-linear activation function. We denote the approximated function by f_nn(t, x, v; m, w, b), and we suppose our DNN has L layers; in other words, our DNN has an input layer, L − 1 hidden layers, and an output layer. The input layer takes (t, x, v) as input and the final layer gives f_nn(t, x, v; m, w, b) as the output.
We denote the relation between the l-th layer and the (l + 1)-th layer (l = 1, 2, ..., L − 1) as z^{l+1}_j = σ̄^l(∑_{i=1}^{m_l} w^{(l+1)}_{ji} z^l_i + b^{(l+1)}_j), where
• z^l_i : the i-th neuron in the l-th layer,
• σ̄^l : the activation function in the l-th layer,
• w^{(l+1)}_{ji} : the weight between the i-th neuron in the l-th layer and the j-th neuron in the (l + 1)-th layer,
• b^{(l+1)}_j : the bias of the j-th neuron in the (l + 1)-th layer,
• m_l : the number of neurons in the l-th layer.
Note that the relation between the input layer and the first hidden layer is expressed in the same way, with (t, x, v) as the neurons of the input layer. We use the library PyTorch and the hyperbolic tangent (tanh) activation function for our fully connected DNN. Our DNN has an input layer with 3 neurons, three hidden layers with 128, 256, and 128 neurons, and an output layer with 2 neurons (3-128-256-128-2), as shown in Figure 1. Regarding the optimization algorithm, we use the Adam optimization algorithm, which is an extension of stochastic gradient descent and is heavily used in deep learning applications. We use a powerful technique called automatic differentiation (AD), which computes the derivatives of the neural network output with respect to its input coordinates. AD is one of the most practical tools in scientific computing; it differs from the usual numerical differentiation (such as finite differences) and from symbolic differentiation (expression manipulation). Baydin et al. [4] explain that "all numerical computations are ultimately compositions of a finite set of elementary operations for which derivatives are known, and combining the derivatives of the constituent operations through the chain rule gives the derivative of the overall composition." This allows us to take derivatives of any order of particular functions relatively easily. In particular, we use the autograd functionality of the PyTorch package (a Python library). The reason that we add the second output is to approximate ∂_vv f_nn(t, x, v; m, w, b) via the reduction-of-order technique. We approximate ∂_vv f_nn(t, x, v; m, w, b) by using the second output contained in the loss term; see Section 2.3. It is also possible to get the value of ∂_vv f_nn directly by applying the AD method twice to f_nn. However, we use the AD method only once, for f_nn and h_nn, by adding one more output neuron and adding a loss function for h_nn. This is because we found that this modification reduces the computational cost dramatically compared with computing the second derivative directly. Also, the DNN architecture in Figure 1 shares the same weights for the output f and the derivative of f at the same time, and this allows us to design and modify the DNN according to the purpose.

2.2. Grid points. To approximate the kinetic solution f(t, x, v) via the Deep Learning algorithm, we generate grid points in the domain of each variable. We choose T as 5 or 10 for each boundary condition and each collision coefficient. We truncate the momentum space for the v variable as [−20, 20], make the grid points for v in [−10, 10] for the training, and assume that f_nn(t, x, v; m, w, b) is 0 if |v| > 10. More precisely, the grid points for the training are chosen uniformly in [0, T] × [−1, 1] × [−10, 10], and the corresponding grid points on the boundary are used for the boundary condition. In the algorithm, the Adam optimizer finds the optimal parameters (w, b).

2.3. Loss functions. Firstly, we define a loss function for the governing equation (1.2). We use the reduction-of-order technique for the second-order term, replacing ∂_vv f_nn by ∂_v h_nn on V := [−10, 10], where h_nn denotes the second output of the network. Then we define Loss_GE as the average of the squared residual of the governing equation over the grid points, where N_{i,j,k} is the number of grid points.
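To make the architecture and the governing-equation loss concrete, a minimal PyTorch sketch is given below. The layer widths (3-128-256-128-2), the tanh activation, the uniform grid on [0, T] × [−1, 1] × [−10, 10], the two outputs (f_nn, h_nn), and the Adam optimizer follow the description above; the explicit residual assumes the standard kinetic Fokker-Planck form ∂_t f + v ∂_x f = σ ∂_vv f + β ∂_v(vf) for (1.2), and all function and variable names are illustrative rather than taken from the authors' code.

```python
import torch
import torch.nn as nn

class FPNet(nn.Module):
    """Fully connected DNN: input (t, x, v), outputs (f_nn, h_nn) with h_nn ~ dv f_nn."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 128), nn.Tanh(),
            nn.Linear(128, 256), nn.Tanh(),
            nn.Linear(256, 128), nn.Tanh(),
            nn.Linear(128, 2),
        )

    def forward(self, t, x, v):
        out = self.net(torch.cat([t, x, v], dim=1))
        return out[:, 0:1], out[:, 1:2]   # f_nn, h_nn

def governing_equation_loss(model, t, x, v, sigma=1.0, beta=1.0):
    """Mean squared residual of the (assumed) equation f_t + v f_x = sigma f_vv + beta (v f)_v,
    with f_vv replaced by dv h_nn (reduction of order), plus a consistency loss h_nn ~ f_v."""
    t = t.requires_grad_(True); x = x.requires_grad_(True); v = v.requires_grad_(True)
    f, h = model(t, x, v)
    grad = lambda u, z: torch.autograd.grad(u, z, grad_outputs=torch.ones_like(u),
                                            create_graph=True)[0]
    f_t, f_x, f_v = grad(f, t), grad(f, x), grad(f, v)
    h_v = grad(h, v)                                   # stands in for f_vv
    residual = f_t + v * f_x - sigma * h_v - beta * (f + v * f_v)
    return (residual ** 2).mean() + ((h - f_v) ** 2).mean()

# Uniform grid points on [0, T] x [-1, 1] x [-10, 10], as described in Section 2.2.
T = 5.0
t, x, v = torch.meshgrid(torch.linspace(0, T, 50),
                         torch.linspace(-1, 1, 30),
                         torch.linspace(-10, 10, 60), indexing="ij")
pts = torch.stack([t.reshape(-1), x.reshape(-1), v.reshape(-1)], dim=1)

model = FPNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1000):
    optimizer.zero_grad()
    loss = governing_equation_loss(model, pts[:, 0:1], pts[:, 1:2], pts[:, 2:3])
    # In the full algorithm, Loss_IC, Loss_BC, and Loss_Mass (Section 2.3) are added here.
    loss.backward()
    optimizer.step()
```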
We now define the loss function (2.2) for the initial condition via the use of the initial grid points. The loss function for the inflow boundary condition in Section 1.4 is defined by comparing f_nn(t_i, x, v_k; m, w, b) with the prescribed inflow data g(t_i, x, v_k) on the incoming boundary grid points. For the other types of boundary conditions from Section 1.4, we define the loss term similarly by altering the boundary relation imposed on f_nn(t_i, x, v_k; m, w, b). One of the well-known a priori conservation laws for the Fokker-Planck equation (1.2) under the specular, the periodic, and the diffusive boundary conditions is the conservation of mass, by which the L^1_{x,v} norm ‖f(t, ·, ·)‖_{L^1_{x,v}} of the solution is preserved in time. Accordingly, the reduction of the loss term (2.1) would also result in the reduction of a loss term defined for the conservation of mass. Nevertheless, for a more accurate analysis, we add one more loss term with respect to the conservation of mass, for each t_i, when we impose these three boundary conditions. Then we define the whole Loss_Mass by summing over the time grid points. Finally, we define the total loss (2.5) as the sum of the loss terms above.

3.1. On the convergence of DNN solutions to analytic solutions. In this section, we prove that there exists a sequence of weights such that the total sum of loss functions, defined in (2.5), converges to 0. Subsequently, we also prove that neural networks equipped with such weights converge to an analytic solution. Throughout the section, we assume that the existence and the uniqueness of solutions for (1.2) and (1.3) with either (1.11) or (1.7) are a priori given. We first introduce the following definition and theorem from [50] on the existence of the approximating neural network.

Definition 3.1. For a compact set K ⊂ R^n and a multi-index m, we say that f ∈ C^m(K) if there is an open set Ω (depending on f) such that K ⊂ Ω and f ∈ C^m(Ω).

Theorem 3.2 (Li, Theorem 2.1, [50]). Let K be a compact subset of R^n, n ≥ 1, and f ∈ C^{m_1}(K) ∩ C^{m_2}(K) ∩ · · · ∩ C^{m_q}(K), where m_i ∈ Z^n_+ for 1 ≤ i ≤ q. Also, let σ̄ be any non-polynomial function in C^l(R), where l = max{|m_i| : 1 ≤ i ≤ q}. Then for any ε > 0, there is a network f_ν(x) = ∑_{i=0}^{ν} c_i σ̄(w_i · x + b_i), where c_i ∈ R, w_i ∈ R^n, and b_i ∈ R, 0 ≤ i ≤ ν, such that f and its derivatives of orders m_1, ..., m_q are approximated by f_ν and the corresponding derivatives of f_ν within ε uniformly on K.

One may generalize the result above to the one with several hidden layers (see [38]). Also, the architecture may be assumed to have only one hidden layer; i.e., L = 2. Now we introduce our first main theorem, which states that a sequence of neural network solutions that makes the total loss term converge to zero exists if a C^(1,1,2) solution to the equation exists.

Proof. Let ε > 0 be given. By Theorem 3.2, there exists a neural network f_j whose relevant derivatives approximate those of the solution f within ε on [0, T] × [−1, 1] × V. Integrating the squared residual of the governing equation and |f_j(0, ·, ·) − f_0|^2 over [0, T] × [−1, 1] × V and [−1, 1] × V, respectively, we obtain that the loss terms (2.1) and (2.2) are bounded by (1 + µ(V) + σ + βµ(V))^2 T · 2µ(V) · ε^2 and 2µ(V) · ε^2, where µ(·) is the Lebesgue measure on R. Now we assume that the boundary condition satisfies (1.11). Then we note that all the boundary values are bounded by a supremum of interior values, since f and f_j are sufficiently smooth; here γ_{±,V} denote the boundary sets γ_+ and γ_− with the v domain truncated to V. This completes the proof by setting ε = ε_j = 1/j.

We also remark that Theorem 3.4 guarantees that we can find weights of the neural network that reduce the error function as much as we want. However, this does not guarantee that the neural network converges to the solution of the original equation as the loss function converges to zero; we will also discuss how to find the weights in the forthcoming sections.
For this reason, we introduce our second theorem, Theorem 3.6, which shows that the neural network approximations converge to an analytic solution in a suitable function space when the weights of the neural networks minimize Loss_Total. We prove it under the additional condition (3.3) at the boundary, which is consistent with the truncation of the domain in the v variable introduced in Section 2.2 and which requires that the difference between f and f_j, together with its first v-derivative (k = 0, 1), be sufficiently small, of size ε > 0, on the boundary of the truncated v domain. The conclusion of Theorem 3.6 is that the difference f − f_j is controlled in L^2_{x,v}, uniformly for t ∈ [0, T], in terms of Loss_Total(f_j) and the small boundary contribution from (3.3), where f is a solution to (1.2) and (1.3) with either (1.11) or (1.7), and C(σ, β) is a non-negative constant depending only on σ and β.

Proof. Motivated by [17], we define a transform ū(t, x, v) of a function u(t, x, v), depending on a non-negative constant λ that controls the convergence rate. Then the transformed function f̄ satisfies ∂_t f̄ + e^{−βt}(v · ∂_x)f̄ − σe^{2βt}∂^2_v f̄ + λf̄ = 0. We now consider the corresponding set of equations (3.5) for the difference between f̄ and f̄_j for each fixed j, posed on the rescaled interval e^{βt}V, where γ^+_{T,e^{βt}V} and γ^−_{T,e^{βt}V} are defined as in (3.2). Here, the boundary data are prescribed according to the inflow boundary condition (1.11), or according to the specular boundary condition (1.7) instead, depending on the case under consideration. Then we derive the inequality (3.8) by multiplying (3.5) by 2(f̄ − f̄_j) and integrating over [−1, 1] × e^{βt}V, where ⟨·, ·⟩ denotes the standard inner product on L^2([−1, 1] × e^{βt}V). (The measure dγ for the boundary integration ∫ dγ is defined as ∫ dγ := ∫ |n_x · v| dS_x dv, where n_x is the outward normal vector at the boundary and S_x is the Lebesgue measure on the boundary.) On the left-hand side of (3.8), we rewrite the time-derivative term by the Leibniz rule, and we thereby reduce (3.8) to (3.9). Multiplying (3.9) by e^{(2λ−1)t} and integrating over [0, t] for t < T, we obtain (3.10). Therefore, (3.10) and the inverse transform from f̄ to f imply the desired estimate, where C(σ, β) = C_1(β) + C_2(σ, β). Since t ∈ [0, T] and Loss_Total(f_j) → 0, this completes the proof of Theorem 3.6.

3.2. On the convergence rate of the kinetic energy. In this section, we record the theoretical background on the convergence rate of the kinetic energy of the system, which will also be observed via the neural network simulations in Section 4. This will be done only for the specular boundary condition (1.7). We define the kinetic energy functional of the solution f to the Fokker-Planck equation as KE(t) := (1/2) ∬ |v|^2 f(t, x, v) dv dx. Then we can rewrite the balance of the kinetic energy (1.15) as (3.11), namely as d/dt KE(t) = σM − 2β KE(t) plus boundary contributions, where M = ‖f_0(·, ·)‖_{L^1_{x,v}}. The first term on the right-hand side of (3.11), the boundary contribution (3.12), consists of the fluxes at x = 1 and x = −1. Under the specular boundary condition (1.7), both terms on the right-hand side of (3.12) are 0, since the flux ∫_R (|v|^2/2) v f(t, x = 1, v) dv vanishes when f(t, 1, v) = f(t, 1, −v), and the corresponding term at x = −1 is also 0 in a similar manner. Therefore, we can rewrite (3.11) as d/dt KE(t) = σM − 2β KE(t), and we obtain KE(t) = σM/(2β) + (KE(0) − σM/(2β)) e^{−2βt}, so that KE(t) converges exponentially to σM/(2β) at rate 2β.

In this section, we introduce the results of our neural network simulations for the DNN solution f_nn(t, x, v; m, w, b) in three different ways. We first simulate our neural network algorithms for the varied boundary conditions, a given initial condition, and fixed values of the σ, β coefficients. In the cases of the specular reflection, the diffusive reflection, and the periodic boundaries, we expect that the Lyapunov functional (also called the free energy or the relative entropy) satisfies η′(t) ≤ 0, which is a manifestation of the second law of thermodynamics. Then, we alter the initial conditions for the specular boundary condition and fixed values of the σ, β coefficients.
Lastly, we also obtain the results by altering the coefficients σ and β for a given initial condition and the specular boundary condition. We analyze our neural network solution f_nn(t, x, v; m, w, b) by computing the pointwise values at each grid point, by observing the L^∞ norm of the solution, and by computing the physical quantities of the total mass, the kinetic energy, the entropy, and the free energy. We compute these quantities by approximating the integrals with Riemann sums on the grid points, similarly to the previous subsection. For example, the L^1 norm of f_nn(t, ·, ·; m, w, b) at each time t can be approximated by the Riemann sum ∑_{j,k} |f_nn(t, x_j, v_k; m, w, b)| Δx Δv over the spatial and velocity grid points. We also compare our results with the existing theoretical results (cf. [20]). Especially for the specular (1.7) and the diffusive (1.8) boundary conditions, which conserve the total mass, Bonilla-Carrillo-Soler [20] give the form of the steady state as in (1.12). The expected limiting values of the kinetic energy (4.1), the entropy (4.3), and the free energy, based on the global equilibrium (1.12), are determined by M = ‖f_0(·, ·)‖_{L^1_{x,v}} and C = |Ω| = 2 in our case; in particular, KE_∞ = σM/(2β). Also, we calculate the entropy as the integral of −f log(f + 10^{−10}) to prevent divergence when f(t, x, v) = 0. We predict the pointwise values of the distribution function f_nn for each spatial variable x in [−1, 1] and velocity variable v in [−10, 10], for varied values of the time variable t in [0, T]. Namely, we observe the long-time behavior of the neural network solution f_nn(t, x, v; m, w, b) and check whether the solution converges to the Maxwellian (1.12) for the reflection-type boundaries. Note that minimizing the total loss does not guarantee the positivity of f_nn. Therefore, we also truncate f_nn when it is sufficiently small (< 0.005).

4.1. Results by varying the boundary conditions. In this section, we impose the different types of boundary conditions that we introduced in Section 1.4. Throughout this section, we set the values of the coefficients σ and β to be 1, and we consider the cake-shaped initial condition (4.4) for the varied boundary conditions. We would first like to introduce the behavior of the approximated solutions under the absorbing boundary condition in the following section.

4.1.1. The absorbing boundary condition. By imposing the absorbing boundary condition (1.10), we would like to understand the dynamics of particles that vanish once they collide with the boundary. Figure 2 shows the L^∞ norm, total mass, total kinetic energy, and the entropy of our neural network solution f_nn(t, x, v; m, w, b) under the absorbing boundary condition. It has been shown in [40] that the L^∞ norm and the total mass of the solutions f(t, x, v) to the kinetic Fokker-Planck equation decay exponentially as the time variable t goes to infinity when the friction coefficient β is zero. In our case, with the diffusion coefficient σ = 1 and the friction coefficient β = 1, the total mass decays in time as in Figure 2, while the L^∞ norm decays after an initial mixing and climbing.

Figure 2. The time-asymptotic behavior of the L^∞ norm and the macroscopic quantities of f_nn(t, x, v; m, w, b) with the absorbing boundary condition. All quantities converge to zero as all the particles vanish once they reach the boundary.

Also, we can observe that our solution f_nn(t, x, v; m, w, b) converges pointwise to 0, as shown in Figure 3.
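As a concrete illustration of the Riemann-sum evaluation of the macroscopic quantities reported in Figure 2 and in the figures below, the following sketch computes the total mass, the kinetic energy, the regularized entropy, and the free energy from grid values of the two-output network sketched in Section 2; the free-energy convention (β/σ)KE − Ent and all names are our assumptions rather than the authors' code.

```python
import torch

def macroscopic_quantities(model, t_val, sigma=1.0, beta=1.0,
                           nx=101, nv=201, x_lim=1.0, v_lim=10.0):
    """Riemann-sum approximation of Mass, KE, Ent, FE at a fixed time t_val."""
    xs = torch.linspace(-x_lim, x_lim, nx)
    vs = torch.linspace(-v_lim, v_lim, nv)
    dx, dv = xs[1] - xs[0], vs[1] - vs[0]
    x, v = torch.meshgrid(xs, vs, indexing="ij")
    t = torch.full_like(x, t_val)
    with torch.no_grad():
        f, _ = model(t.reshape(-1, 1), x.reshape(-1, 1), v.reshape(-1, 1))
    f = f.reshape(nx, nv).clamp_min(0.0)                 # discard small negative values
    mass = (f * dx * dv).sum()
    ke = (0.5 * v**2 * f * dx * dv).sum()
    ent = (-f * torch.log(f + 1e-10) * dx * dv).sum()    # regularized entropy
    fe = (beta / sigma) * ke - ent                        # assumed free-energy convention
    return mass.item(), ke.item(), ent.item(), fe.item()

# Example (with sigma = beta = 1): compare ke against the expected steady-state value
# sigma * mass / (2 * beta), i.e. mass / 2 in this case.
#   mass, ke, ent, fe = macroscopic_quantities(model, t_val=5.0)
```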
We remark that Figure 3 shows each pointwise value of the neural network solution, and this definitely gives more information than just the L^1 and L^∞ values of the solution.

Figure 3. The pointwise values of f_nn(t, x, v; m, w, b) as t varies, at each x, for the absorbing boundary case. x = −1 stands for the boundary point, and x = −0.5 and x = 0 are points away from the boundary. We omit the graphs for x > 0, as they are symmetric to those for x < 0.

4.1.2. Inflow boundary condition. We now move on to the next boundary condition, the inflow boundary condition, with which we consider the situation in which each boundary emits particles at a rate prescribed by a given function g(t, x, v). We compare the pointwise values of the neural network solutions under three different inflow boundary conditions, given by the inflow profiles (4.5), (4.6), and (4.7), which are written in terms of the characteristic function 1_S of a set S. In Figure 4, the plot for the third type of inflow boundary condition (4.7) shows pointwise values of f_nn(t, x, v; m, w, b) similar to those of the first inflow boundary condition (4.5), but it behaves like, and converges to, that of the absorbing boundary condition over time, as shown by the cyan-blue line. It converges pointwise to 0 after 10 time grids. On the other hand, we can observe that the other two types (4.5) and (4.6) converge to some distorted Maxwellian-like shapes at each spatial position. Also, we can observe the smoothing effect of the kinetic Fokker-Planck equation in the interior of the domain at x = −0.5, 0, and 0.5. At the boundaries x = −1 and x = 1, we observe jump discontinuities of the momentum derivatives, as expected, due to the fixed inflow profiles.

4.1.3. The three boundary conditions which conserve the total mass. Now we would like to impose other types of boundary conditions under which the total mass of the system is conserved. Namely, we will look at the dynamics of the particles under the specular boundary condition (1.7), the periodic boundary condition (1.9), and the diffusive boundary condition (1.8). We first remark that we obtain similar behavior of the L^∞ and L^1 norms of the solution for the three types of boundary conditions. Though the convergence rates are slightly different, we could possibly say that the kinetic energies and the entropies of the three different cases converge to the same values, given by the red dotted lines in Figure 5. From the perspective of the pointwise values of the neural network solutions, Figure 6 shows that the three cases converge pointwise to the same steady-state solution; i.e., they converge to the global Maxwellian solution (1.12). It is remarkable that the shape of the convergence to the global Maxwellian at the boundary point x = −1 is slightly different in the diffusive boundary case from the other two conditions. At the boundary x = −1, the periodic and the specular cases converge to the Maxwellian via the superposition of two waves from the left- and right-hand sides at the same rate, and hence via the creation of M-shaped plots. On the other hand, at the boundary x = −1, the solution in the case of the diffusive boundary condition converges to the global Maxwellian from the left to the right, as the diffusive boundary condition at x = −1 emits the Maxwellian-shaped values from the left to the right.
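For the mass-conserving cases compared above, the corresponding boundary and mass-conservation loss terms can be sketched as follows, again using the two-output model from Section 2; the specular relation f(t, ±1, v) = f(t, ±1, −v) is our assumption for the precise form of (1.7), and the helper names are hypothetical.

```python
import torch

def specular_boundary_loss(model, t_pts, v_pts):
    """Penalize violation of the (assumed) specular condition f(t, +-1, v) = f(t, +-1, -v)."""
    loss = 0.0
    for xb in (-1.0, 1.0):
        x = torch.full_like(t_pts, xb)
        f_plus, _ = model(t_pts, x, v_pts)
        f_minus, _ = model(t_pts, x, -v_pts)
        loss = loss + ((f_plus - f_minus) ** 2).mean()
    return loss

def mass_conservation_loss(model, t_vals, x_grid, v_grid, dx, dv, initial_mass):
    """Penalize deviation of the Riemann-sum mass from its initial value at each t_i."""
    loss = 0.0
    for t_val in t_vals:
        t = torch.full((x_grid.numel(), 1), float(t_val))
        f, _ = model(t, x_grid.reshape(-1, 1), v_grid.reshape(-1, 1))
        mass = (f * dx * dv).sum()
        loss = loss + (mass - initial_mass) ** 2
    return loss / len(t_vals)
```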
4.2. Results by varying the initial conditions. In this section, we vary the initial condition for the fixed coefficients σ = β = 1 and the specular boundary condition. We impose the initial conditions (4.8) and (4.9), each supported on x ∈ (−0.9, 0.9) and v ∈ (−10, 10) and vanishing otherwise. In other words, we impose an M-shaped initial condition with its values concentrated at large values of |v|, and a highly oscillating initial condition near v = 0. Figure 7 shows the pointwise values of f_nn(t, x, v; m, w, b) for the two initial conditions. We first note that the M-shaped distribution becomes the Maxwellian over time. Though the initial condition (4.9) is highly singular in its derivatives, Figure 7 shows that f_nn(t, x, v; m, w, b) is being regularized and converges to the Maxwellian over time. We remark that the two initial conditions result in two different Maxwellians, and this difference comes from the different total masses of the two initial conditions.

Figure 6. The pointwise values of f_nn(t, x, v; m, w, b) as t varies, at each x, for the specular, the periodic, and the diffusive boundary cases, in different colors as shown in the legend. x = −1 stands for the boundary point, and x = −0.5 and x = 0 are points away from the boundary. We omit the graphs for x > 0, as they are symmetric to those for x < 0.

Figure 7. The pointwise values of f_nn(t, x, v; m, w, b) as t varies, at each x, for the specular boundary case with the special initial conditions, drawn in different colors as shown in the legend. x = −1 stands for the boundary point, and x = −0.5 and x = 0 are points away from the boundary. We omit the graphs for x > 0, as they are symmetric to those for x < 0.

4.3. Results by varying the diffusion and the friction coefficients. In this section, we alter the two coefficients σ and β of the Fokker-Planck equation; the coefficients σ and β are related to the diffusion and the friction rates, respectively. Throughout the section, we fix the initial condition (4.4) and the specular boundary condition (1.7). We mention that, throughout the section, we plot the values of the kinetic energy (4.1) and the entropy (4.3) of the global Maxwellian solution for each value of σ and β as red dotted lines. In the following three sub-sections, we alter the values of β with a fixed σ, alter the values of σ with a fixed β, and alter both σ and β while keeping the ratio σ/β the same.

4.3.1. Different values of β. Throughout this subsection, we set σ = 1 and let β = 2, 1, 0.5, and 0.25. Figure 8 shows that the kinetic energy and the entropy converge to the values of the red dotted lines of the expected steady states for each different value of β. For example, the value of the kinetic energy for the steady state with the coefficients σ = 1 and β = 0.5 is KE_∞ = σM/(2β) = 7.2 by (4.1), and the cyan-blue colored graph in the third plot of Figure 8 converges to the red dotted line at the value 7.2. In addition, the graph for the L^∞ norms shows that the L^∞ values after a sufficiently large time become larger as β gets larger, and this agrees with the expected L^∞ values of the steady state given by (1.12). We remark that the rate of convergence of the kinetic energy and the entropy for each different β depends on the value of β; the solution f_nn(t, x, v; m, w, b) with a larger value of the friction coefficient β converges more rapidly. This is also consistent with the theoretical support provided in Section 3.2.

4.3.2. Different values of σ. Figure 10 shows that the physical quantities converge to those of the steady state.
Theoretically, the steady-state solution with a smaller diffusion coefficient σ has a smaller variance in the Maxwellian (1.12) with the same mean 0, and Figure 11 agrees with the theory.

4.3.3. Different values of σ and β in the same ratio σ/β. In this subsection, we vary both coefficients while keeping the ratio σ/β the same. We set σ/β = 1/2 and vary the coefficients σ and β as σ = 1, 0.5, 0.25, 0.05 and β = 2σ. The convergence to the same physical quantities of the same steady state in Figure 12 agrees with the theory, as the steady state (1.12) is determined by the ratio σ/β. The figure shows that the coefficient σ = 0.05 is too small to make the quantities converge within 5 time-grids. The difference between the four cases is how fast the kinetic energy and the entropy of the DNN solutions converge to those of the steady state. A larger value of β results in faster convergence, even though the ratios σ/β are the same. Figure 13 shows the different rates of convergence to the steady state; i.e., the four graphs in each plot converge to the same Maxwellian but at different rates.

Figure 9. The pointwise values of f_nn(t, x, v; m, w, b) as t varies, at each x, for the specular boundary case, where σ is fixed to be 1 and β varies as shown in different colors as in the legend. x = −1 stands for the boundary point, and x = −0.5 and x = 0 are points away from the boundary. We omit the graphs for x > 0, as they are symmetric to those for x < 0.

Figure 10. The time-asymptotic behaviors of the L^∞ norms and the macroscopic quantities of f_nn(t, x, v; m, w, b) for the specular boundary case as σ varies with the fixed β = 1. The steady-state values of the kinetic energy (4.1) and the entropy (4.3) are also given via the red dotted lines for each different σ. It is notable that the free energy (Lyapunov functional) is monotonically decreasing.

Figure 11. The pointwise values of f_nn(t, x, v; m, w, b) as t varies, at each x, for the specular boundary case, where β is fixed to be 1 and σ varies as shown in different colors as in the legend. x = −1 stands for the boundary point, and x = −0.5 and x = 0 are points away from the boundary. We omit the graphs for x > 0, as they are symmetric to those for x < 0.

Figure 12. The time-asymptotic behaviors of the L^∞ norms and the macroscopic quantities of f_nn(t, x, v; m, w, b) for the specular boundary case as both β and σ vary while keeping the ratio σ/β = 1/2. The steady-state values of the kinetic energy (4.1) and the entropy (4.3) are also given via the red dotted lines. It is notable that the free energy (Lyapunov functional) is monotonically decreasing.

Figure 13. The pointwise values of f_nn(t, x, v; m, w, b) as t varies, at each x, for the specular boundary case, where both σ and β vary with the ratio fixed as σ/β = 1/2. The plots are shown in different colors as in the legend. x = −1 stands for the boundary point, and x = −0.5 and x = 0 are points away from the boundary. We omit the graphs for x > 0, as they are symmetric to those for x < 0.

5. Conclusion. This paper presents an approximation of the solution of the 1D Fokker-Planck equation with the inflow-type or the reflection-type boundary conditions. We first provide a proof of the existence of a sequence of DNN solutions such that the total loss, defined in terms of the difference between the C^(1,1,2)-type solutions and the neural network solutions, converges to zero. We then provide a proof of the L^2_{x,v} convergence of the sequence of neural network solutions to the actual C^(1,1,2)-type solutions of the original Fokker-Planck equation in a bounded interval, in the case that the total sum of our loss terms goes to zero. In other words, it has been shown that we can reduce the predefined loss function via the appropriate selection of DNN model parameters. Also, by choosing the model parameters that reduce the loss function as desired, it has been shown that the neural network solutions converge to the analytic solution, via an L^2 energy estimate based on an adaptation of the transformation of the function in the sense of [17]. For the neural network simulations, our DNN algorithm uses the library PyTorch and the hyperbolic tangent (tanh) activation function between the layers. Regarding the optimization algorithm, we use the Adam optimizer, which is an extension of stochastic gradient descent, in the process of minimizing the actual loss. The reduction of order and the weight sharing between f_nn and ∂_v f_nn dramatically reduced the learning cost. We remark that, in the case of the reflection-type boundary conditions, we also increased the learning efficiency of the model by adding the mass conservation law to the total loss function. Our method finds the parameters (weights and biases) which minimize the total loss function, and the minimizer converges to an analytic solution. Also, we adopted an optimization method that is used in practice (for real-world problems) in the machine-learning community, but it does not provide an explicit way of finding the minimizer. Moreover, from the optimization point of view, it is difficult to find the exact global minimum, due to the intrinsic limitation of gradient-descent-based approaches, even though the number of parameters is small. More precisely, we found that the minimized total losses starting from different initial parameters have similar scales of 10^{−3}, and it is possible to obtain different solutions that produce minimized total losses of a similar scale. In general, we think that classical numerical schemes are more useful in terms of computational efficiency and that they come with well-established error-bound analysis. Nevertheless, we would like to remark that the DNN approach is a function-approximation approach rather than a numerical scheme, and hence it does not require additional conditions such as stability conditions. Also, we can directly compute the derivatives of a DNN solution via the AD, since the convergence criterion is a function norm up to the regularity of the analytic solutions. In this sense, we can say that a DNN solution contains mesh-free information on the geometry of the analytic solution as well, and it is useful in computing some physical quantities. However, this method would require some guarantees on the existence and the uniqueness of the analytic solutions so that a DNN solution matches the analytic solution well.
The possible extensions of our work for the future include generalizations to various other types of kinetic equations. These include the multi-dimensional (fractional) Fokker-Planck equation and the coupled Vlasov equations such as the Vlasov-Poisson or the Vlasov-Maxwell systems. We expect that our work can be further extended to the generalized situations listed above by adapting the sampling of random grid points in our learning algorithm, so that the total cost of the learning process of the DNN algorithm is effectively reduced while the total loss function is properly reduced at the same time. Also, our approach could be applied to parameter-optimization problems that arise in the real world. For example, Jo et al. [45] compute the time-varying system parameters of the SIR model using COVID-19 data; they model the system parameters by DNNs and add an observation loss term to the loss function. Therefore, the above method enables us to calculate the model solution and the parameters at the same time.

References
[1] A computational investigation of the random particle method for numerical solution of the kinetic Vlasov-Poisson-Fokker-Planck equations
[2] Low and high field scaling limits for the Vlasov- and Wigner-Poisson-Fokker-Planck systems
[3] The Wigner-Fokker-Planck equation: stationary states and large time behavior
[4] Automatic differentiation in machine learning: a survey
[5] Conservative finite-difference schemes for the Fokker-Planck equation not violating the law of an increasing entropy
[6] A unified deep artificial neural network approach to partial differential equations in complex geometries
[7] Completely conservative difference schemes for nonlinear kinetic equations of Landau (Fokker-Planck) type
[8] Asymptotic behavior of an initial-boundary value problem for the Vlasov-Poisson-Fokker-Planck system
[9] Existence and uniqueness of a global smooth solution for the Vlasov-Poisson-Fokker-Planck system in three dimensions
[10] Smoothing effect for the non-linear Vlasov-Poisson-Fokker-Planck system
[11] On long time asymptotics of the Vlasov-Fokker-Planck equation and of the Vlasov-Poisson-Fokker-Planck system with Coulombic and Newtonian potentials
[12] Existence and uniqueness of solutions to Fokker-Planck type equations with irregular coefficients
[13] Conservative and entropy decaying numerical scheme for the isotropic Fokker-Planck-Landau equation
[14] Numerical analysis of conservative and entropy schemes for the Fokker-Planck-Landau equation
[15] Fast algorithms for numerical, conservative, and entropy approximations of the Fokker-Planck-Landau equation
[16] Exponential convergence toward equilibrium for homogeneous Fokker-Planck-type equations
[17] Global weak solutions for the initial-boundary-value problems to the Vlasov-Poisson-Fokker-Planck system
[18] Global classical solutions close to equilibrium to the Vlasov-Fokker-Planck-Euler system
[19] On the initial value problem for the Vlasov-Poisson-Fokker-Planck system with initial data in L^p spaces
[20] Asymptotic behaviour and self-similarity for the three-dimensional Vlasov-Poisson-Fokker-Planck system
[21] An implicit energy-conservative 2D Fokker-Planck algorithm. I. Difference scheme
[22] The integration of the Vlasov equation in configuration space
[23] The Stone-Weierstrass theorem and its application to neural networks
[24] A conservative and entropic method for the Vlasov-Fokker-Planck-Landau equation, Numerical methods for hyperbolic and kinetic problems
[25] Approximation by superpositions of a sigmoidal function
[26] Global existence of smooth solutions for the Vlasov-Fokker-Planck equation in 1 and 2 space dimensions
[27] An entropy scheme for the Fokker-Planck collision operator of plasma kinetic theory
[28] On the trend to global equilibrium in spatially inhomogeneous entropy-dissipating systems: the linear Fokker-Planck equation
[29] On the Fokker-Planck-Boltzmann equation
[30] The Fokker-Planck equation with absorbing boundary
[31] Numerical solution of the non homogeneous Fokker-Planck-Landau equation
[32] A numerical method for the accurate solution of the Fokker-Planck-Landau equation in the nonhomogeneous case
[33] A numerical method for the accurate solution of the Fokker-Planck-Landau equation in the nonhomogeneous case
[34] Global weak solutions for Kolmogorov-Vicsek type equations with orientational interactions
[35] Solving high-dimensional partial differential equations using deep learning
[36] The numerical analysis of random particle methods applied to Vlasov-Poisson-Fokker-Planck kinetic equations
[37] On deterministic particle methods for solving Vlasov-Poisson-Fokker-Planck systems
[38] Multilayer feedforward networks are universal approximators
[39] The Fokker-Planck equation with absorbing boundary conditions in bounded domains
[40] The Fokker-Planck equation with absorbing boundary conditions, Archive for Rational Mechanics and Analysis
[41] On the Fokker-Planck equations with inflow boundary conditions
[42] Numerical solution of elliptic partial differential equation using radial basis function neural networks
[43] Hypocoercivity and uniform regularity for the Vlasov-Poisson-Fokker-Planck system with uncertainty and multiple scales
[44] Deep neural network approach to forward-inverse problems
[45] Analysis of COVID-19 spread in South Korea using the SIR model with time-dependent parameters and deep learning
[46] Artificial neural networks for solving ordinary and partial differential equations
[47] Neural-network methods for boundary value problems with irregular boundaries
[48] Discretization methods for one-dimensional Fokker-Planck operators
[49] Multipole expansions for the Fokker-Planck-Landau operator
[50] Simultaneous approximations of multivariate functions and their derivatives by neural networks with one hidden layer
[51] Radon measures solving the Cauchy problem of the nonlinear transport equation
[52] A logical calculus of the ideas immanent in nervous activity
[53] Kinetic equations with Maxwell boundary conditions
[54] On the Vlasov-Fokker-Planck equation
[55] Fast spectral methods for the Fokker-Planck-Landau collision operator
[56] The completely conservative difference schemes for the nonlinear Landau-Fokker-Planck equation
[57] On the Fokker-Planck equation with force term
[58] Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations
[59] Data-driven solutions of nonlinear partial differential equations
[60] A difference scheme for the Vlasov-Poisson-Fokker-Planck system
[61] Convergence of a difference scheme for the Vlasov-Poisson-Fokker-Planck system in one dimension
[62] Well-posedness of the Fokker-Planck equation in a scattering process
[63] Neural network augmented wave-equation simulation
[64] DGM: a deep learning algorithm for solving partial differential equations
[65] On classical solutions of Vlasov-Poisson Fokker-Planck systems
[66] General solutions for nonlinear differential equations: a rule-based self-learning approach using deep reinforcement learning
[67] Numerical approximation of the Vlasov-Poisson-Fokker-Planck system in one dimension
[68] A deterministic particle method for the Vlasov-Fokker-Planck equation in one dimension
[69] Numerical approximation of the Vlasov-Poisson-Fokker-Planck system in two dimensions
[70] The Vlasov-Poisson-Fokker-Planck system with uncertainty and a one-dimensional asymptotic preserving method

Declaration of competing interest. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.