title: The role of information in high dimensional stochastic optimal control
authors: Palmer, Aaron Zeff
date: 2021-05-12

The stochastic optimal control of many agents minimizes a cost aggregated over the agents. We investigate the problem of partial information, where the state of each agent is not known and the control must be decided based on noisy observations. This results in a high dimensional controlled Markov process that is impractical to handle directly. In the limit as the number of agents approaches infinity, a finite dimensional mean field optimal control problem emerges, where the dependence on the available information vanishes. In this work, we calculate the Gaussian fluctuations about the mean field optimal control, which incorporate the available observations. The method we establish uses an approximate Kalman filter on the fluctuations about the mean field solution. It is straightforward to compute, even when the number of states is large. We consider an example of an epidemic model with observation of positive tests, as well as a simple two-state model that exhibits a phase transition, at which the fluctuations diverge.

The role of information in stochastic control has been central to many applications. The Kalman filtering technique, developed in [20], [19], has been employed in aeronautics, navigation, economics, and elsewhere. It provides the well known solution to the linear quadratic Gaussian (LQG) control problem, which is a basis of this study. In economics, while individual behavior can be impossible to predict, systems comprising many individuals tend to develop recognizable behaviors. We note a couple of highly influential works, amid a vast research field. The role of information was recognized in [26] as part of a critique arguing that a socialist economy cannot adapt to individual needs because the information on the individual is lost, although the argument is made without mathematical formalism. Later, in [15], [16], imperfect information is shown to give rise to inefficient free markets using mathematical models. In this work we study the optimal (rational) usage of information, for example by a government maximizing the well-being of the population or by a private enterprise maximizing its profit. We present a solution that both closely approximates the optimal policy and may be computed using well established and computationally efficient techniques. We consider a mathematical framework of finite state stochastic control with a parallel finite state observation process. We assume the problem involves a large number, N, of identical agents, each occupying one of a fixed number of states. Adding additional states allows for further specification of the individual agents. In this framework, each individual produces observations at rates depending on its state. We consider a fixed number of control parameters to be chosen by a centralized planner. The costs are given as functions of the macroscopic state variables, which aggregate the individuals in each state. Various alternative and extended problems are possible, which we will briefly mention. Some socio-economic applications of finite state mean field games have been given in [14]. Our first inquiry is into the limit as the number of agents, N, approaches infinity.
In this asymptotic limit, the (discrete) finite-state stochastic control problem approaches a (continuous) finite-dimensional nonlinear problem of (deterministic) optimal control. We call this problem the mean field limit. Although the N-agent state cannot be completely observed, the deterministic nature of the limiting control problem allows the macroscopic state to be predicted, and the optimal control can be computed independently of any observations. A similar remark has been made for related problems in [5], Remark 2.27, and in a context closely related to our own in [6]. Alternatively, mean field control problems where the information does not become negligible have been considered, for example in [17], where the stochastic maximum principle approach is developed and the solution of a linear quadratic example is calculated. The problem of rigorous computational approaches to such problems beyond the linear quadratic case appears to be unsolved.

We next study the fluctuations about the mean field optimal control limit. Such fluctuations in Markovian population models were obtained in [25], [8]. For the fully observed controlled version, an analysis of the fluctuations has been carried out in [6] for finite-state spaces, following an approach similar to that for continuous-state spaces in [9], where the master equation is used to find a feedback control form. We point out here that a full solution to the master equation is not needed; rather, a localized analysis around a mean field trajectory is sufficient to understand the fluctuations. The localized analysis we carry out involves solving a linear quadratic Gaussian (LQG) control problem using the problem data approximated quadratically about a mean field solution. This approximation follows an approach similar to [4]. The solution to the LQG problem satisfies the separation principle, where the Kalman filter is used to estimate the state, and the dynamic programming principle readily provides a linear feedback control. Computing these solutions involves solving decoupled forward and backward Riccati-type differential equations. Given these solutions, we can then express an approximate feedback control policy, which we show is accurate to the first order (N^{-1}) correction of the cost.

We next analyze a couple of illuminating examples. The first example is motivated by the Ising model of statistical physics and provides a glimpse of the behavior near a phase transition. In this example we see clearly that the fluctuations diverge at the critical point of the phase transition. Our second example is a simple epidemic model with observations corresponding to tests of individuals. This example showcases the computational and practical possibilities of our approach. While most of this paper is written for continuous time, the analysis applies equally in discrete time, which is what we use in this example.

We briefly mention a few extensions and related problems that may be of interest to keep in mind.
• While we consider 'global' controls chosen by a centralized planner, this includes the case of individual controls, where each agent selects a control α ∈ R^m with the same access to information. Since agents are identical and have the same information, the optimal policy will assign the same control to any agents in the same state, and thus we can reduce to global controls by considering the control in feedback form, α ∈ R^{l m}, where l denotes the number of states.
• The game-theoretic problem, where each player has the same information and is in a Nash equilibrium, can be solved by a stochastic control problem as we consider here in the special case of potential games. Otherwise, the Nash equilibrium constraint introduces additional complications. While mathematically elegant, the assumption that each player behaves optimally is unlikely to hold in applications. See for instance [10], where the behavior of financial investors with imperfect information is modeled based on experiments.
• Market price is commonly considered in economic models by including a market-clearing constraint. The convergence of a finite-state model to its mean field limit has been considered in the recent work [12].
• Continuous-space problems present many interesting challenges. The problems we consider may come from a discretization of continuous-space problems by 'coarse-graining' nearby particles into the same state and approximating the state transitions. Understanding the continuum limits can be a highly difficult problem. It has been discussed for mean field games in [9], although including partial information has its own challenges.

We consider N identical particles (agents) with states σ^i_t ∈ {1, . . . , l} for i ∈ {1, . . . , N} and a global control α_t ∈ R^m. We let Σ^N_t ∈ R^l denote the empirical measure, so that
Σ^{N,σ}_t = N^{-1} #{ i ; σ^i_t = σ }.
We make the important simplifying assumption that all interactions are through this empirical distribution. Each particle transitions from σ to γ at rate β(σ, γ, Σ, α), which will commonly be sparse. This can be expressed by the equations
Σ^{N,σ}_t = Σ^{N,σ}_0 + N^{-1} Σ_{γ ≠ σ} ( Y^γ_σ(t) − Y^σ_γ(t) ),        (1)
where each Y^σ_γ is an independent Poisson process with non-homogeneous rate N Σ^{N,σ}_t β(σ, γ, Σ^N_t, α_t). There is always a unique (in law) solution Σ_t; see [13] for details on constructing such processes. We will abbreviate the paths as (Σ^N) = (Σ^N_t)_{t∈[0,T]}, and similarly for the other processes.

Remark 2.1. There are many possible generalizations of controlled processes. The simplest is to add the possibility of births and deaths, which would cause no problem for our analysis and which we omit simply to reduce the burden of notation. Inhomogeneity in time for β and L could also be easily handled. A very interesting case would be when β has additional dependence on N. This includes, for example, when each particle has a given position on a lattice, x^{N,i} ∈ T^2, and interactions take place with nearest neighbors on the lattice. This may result in much more complicated phenomena and will be partly explored by the author in a related work.

We also consider cumulative measurements Υ^N_t ∈ R^l̃, where, for each particle in state σ, Υ^{N,υ}_t → Υ^{N,υ}_t + N^{-1} at rate β̃(σ, υ, Σ), and thus the dynamics of Υ^N can be expressed as
Υ^{N,υ}_t = N^{-1} Σ_σ Ỹ^σ_υ(t),        (2)
where again each Ỹ^σ_υ is an independent non-homogeneous Poisson process of rate N Σ^{N,σ}_t β̃(σ, υ, Σ^N_t). We let (G^N) = (G^N_t)_{t∈[0,T]} be the natural filtration of (Υ^N), and (F^N) be the natural filtration of (Σ^N, Υ^N), so that G^N_t ⊂ F^N_t.

The finite time horizon problem is to select (α^N_t)_{t∈[0,T)} to minimize
J^N[(Σ^N), (α^N)] = E[ ∫_0^T L(Σ^N_t, α^N_t) dt + G(Σ^N_T) ]        (3)
over control policies dependent on the history of measurements, α^N_t = α̂^N_t((Υ^N_s)_{s∈[0,t]}), for a measurable sequence of functions (α̂^N). Equivalently, (α^N) is progressively measurable with respect to the filtration (G^N). We consider the mean behavior and the fluctuations about the mean.
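To make the model concrete, here is a minimal simulation sketch of the N-agent dynamics (1) and the observation process (2), written in discrete time as in Section 6. The two-state example, the rate functions beta and beta_obs, the fixed control, and all parameter values are hypothetical illustrations, not taken from the paper, and a crude Euler (tau-leaping) step stands in for exact event-by-event simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-state example (l = 2) with one observation channel.
# beta(sigma, gamma, Sigma, alpha): per-agent transition rate (hypothetical).
def beta(sigma, gamma, Sigma, alpha):
    if (sigma, gamma) == (0, 1):
        return alpha[0]
    if (sigma, gamma) == (1, 0):
        return alpha[1]
    return 0.0

def beta_obs(sigma, Sigma):
    return 0.5 if sigma == 1 else 0.0    # only state 1 produces observations

N, l, T, dt = 10_000, 2, 10.0, 0.01
counts = np.array([N, 0])                # number of agents in each state
Sigma = counts / N                       # empirical measure Sigma^N
Upsilon = 0.0                            # cumulative rescaled observations Upsilon^N
alpha = np.array([0.3, 0.1])             # a fixed control, purely for illustration

for _ in range(int(T / dt)):
    new_counts = counts.copy()
    for s in range(l):
        for g in range(l):
            if g == s:
                continue
            # each of the counts[s] agents jumps s -> g with probability ~ beta*dt;
            # with only one destination per state this crude step cannot overshoot
            jumps = rng.binomial(counts[s], min(beta(s, g, Sigma, alpha) * dt, 1.0))
            new_counts[s] -= jumps
            new_counts[g] += jumps
    counts = new_counts
    Sigma = counts / N
    # observations arrive at total rate N * Sigma^sigma * beta_obs, rescaled by 1/N
    obs_rate = sum(counts[s] * beta_obs(s, Sigma) for s in range(l))
    Upsilon += rng.poisson(obs_rate * dt) / N

print(Sigma, Upsilon)
```

In an actual partially observed policy, alpha would be updated from the history of Upsilon rather than held fixed; the fixed control here only illustrates the dynamics.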
To anticipate these results it is useful to consider the Doob-Meyer decomposition of Σ^N_t, which takes the form
Σ^N_t = Σ^N_0 + ∫_0^t b(Σ^N_s, α^N_s) ds + M^N_t,        (4)
where the expected drift is given by
b^γ(Σ, α) = Σ_{σ ≠ γ} [ Σ^σ β(σ, γ, Σ, α) − Σ^γ β(γ, σ, Σ, α) ],        (5)
and the covariation of the martingale term, (M^N), sums the squared jumps, so that for 0 ≤ r ≤ t,
cov(M^N_t − M^N_r) = N^{-1} E[ ∫_r^t Θ(Σ^N_s, α^N_s) ds ],
where ℓ(Σ, α) is an l by l(l − 1) matrix with entries, supposing the column ν corresponds to the transition from σ to γ ≠ σ,
ℓ^σ_ν(Σ, α) = −( Σ^σ β(σ, γ, Σ, α) )^{1/2},   ℓ^γ_ν(Σ, α) = ( Σ^σ β(σ, γ, Σ, α) )^{1/2},        (6)
and Θ(Σ, α) = ℓ(Σ, α) ℓ(Σ, α)^T is the corresponding matrix product. Four different representations of the martingale term for similar Markovian queue models are given in [24], along with the proof of a central limit theorem governing the fluctuations about a high intensity limit. Illustrative examples of the problem we study are given and worked out in Section 5.

The mean field approximation ignores the stochastic martingale term of the Doob-Meyer decomposition (4) and recovers the limiting behavior as N → ∞. In the mean field problem, the state Σ becomes a deterministic trajectory S, the control α becomes A, and the transition rates are replaced by the drift b(S, A) from (5). The resulting dynamics are in general nonlinear in (S, A). The mean field problem considers (weak) solutions to the dynamics
dS_t/dt = b(S_t, A_t),        (7)
and minimizes the limiting cost
J[(S), (A)] = ∫_0^T L(S_t, A_t) dt + G(S_T).        (8)
The following theorem is a basic result for the mean field theory of stochastic optimization problems. We note that the state Σ naturally takes values in the probability simplex, which we denote by ∆^l ⊂ R^l. We make the following assumptions on the problem data:
A1 We assume that β and L are jointly continuous in Σ and α. We assume that L is uniformly bounded below and β is uniformly bounded.
A2 We assume that L is coercive in the sense that {α ; L(Σ, α) ≤ M} is compact for each Σ ∈ ∆^l and M ∈ R.
A3 We assume the standard convexity condition that, for each Σ ∈ ∆^l, the set
{ (b(Σ, α), z) ; α ∈ R^m, z ≥ L(Σ, α) } ⊂ R^l × R
is convex. When α → b(Σ, α) is linear (as in our examples) this assumption reduces to convexity of α → L(Σ, α).
We use the notation X^N →_d X to denote convergence in distribution for random variables, i.e., weak convergence. In Theorem 3.1, weak convergence is equivalent to convergence in probability due to the fact that the limits are deterministic. When considering the role of information in the fluctuations, we will consider the extended (adapted) weak convergence with respect to the filtration (G^N), but that is not needed here. There are at least three approaches to analyzing the mean field limit of the N-player problem, from which we will borrow some results.
• The Γ-convergence approach. Our main arguments are inspired by this approach, but we do not attempt to show the full Γ-convergence. Indeed, it seems that convergence of the control policy is too much to expect in all cases.
• The approach of Young measures. We use Young measures as a compactification of the control policies. Assumption A3 enables us to pass from a Young measure back to a measurable control.
• The approach by the Hamilton-Jacobi equation. Since we are interested in applications to high dimensional problems, the Hamilton-Jacobi equation does not provide a very practical approach and also cannot easily incorporate partial information. However, when we address the fluctuations in Section 4 we will require estimates that closely parallel results from the Hamilton-Jacobi equations.

Theorem 3.1. We assume A1, A2, A3 and suppose that Σ^N_0 →_d S_0 ∈ ∆^l. If the mean field problem has a unique minimizer, (S*) and (A*), then minimizers (Σ^N), (α^N) of the N-player problem satisfy (Σ^N) →_d (S*) and
lim_{N→∞} J^N[(Σ^N), (α^N)] = J[(S*), (A*)].
Furthermore, α̂^N_t = A*_t determines an approximate optimal control for the N-player problem, in the sense that its cost converges to the same limit as N → ∞.

Proof. We let Ω denote the Skorokhod space of càdlàg paths from [0, T] into ∆^l × R^l̃.
We first obtain tightness for the distributions of (Σ^N, Υ^N), which follows from A1 by considering that (Σ^{N,σ}) and (Υ^{N,υ}) decompose as differences of increasing processes corresponding to the transitions into and out of each state. We next compactify the control (α^N) in the space of measures,
µ^N(dω, dt, da) = δ_{α^N_t(ω)}(da) dt dP^N(ω)   on   Ω × [0, T] × R^m.        (9)
Assumption A2 and boundedness of the cost imply tightness for these measures (µ^N). We let µ̄^N : Ω × [0, T] → P(R^m) denote the disintegration of the measure with respect to the distribution of (Σ^N, Υ^N). For every subsequence there is a further subsequence along which (Σ^N, Υ^N, µ^N) converges in distribution to a limit (Σ, Υ, µ). Using A1 and the disintegration µ̄ of the limit measure µ with respect to the distribution of (Σ, Υ), we have that the relaxed dynamics
dΣ_t/dt = ∫_{R^m} b(Σ_t, a) µ̄_t(da)
hold almost surely, and
lim inf_{N→∞} J^N[(Σ^N), (α^N)] ≥ E[ ∫_0^T ∫_{R^m} L(Σ_t, a) µ̄_t(da) dt + G(Σ_T) ].
By A3 and the Kuratowski-Ryll-Nardzewski measurable selection theorem, we find a random measurable control policy (α) such that
b(Σ_t, α_t) = ∫_{R^m} b(Σ_t, a) µ̄_t(da)   and   L(Σ_t, α_t) ≤ ∫_{R^m} L(Σ_t, a) µ̄_t(da)
hold almost surely. (In the case when α → b(Σ, α) is linear and α → L(Σ, α) is convex, this is given simply by the barycenter, α_t = ∫_{R^m} a µ̄_t(da).) We thus have the lower semi-continuity estimate
lim inf_{N→∞} J^N[(Σ^N), (α^N)] ≥ E[ J[(Σ), (α)] ],
and (Σ), (α) is a (randomized) weak solution to the mean field dynamics (7).

We now consider the optimizer of the mean field problem, (S*) and (A*). Since (A*) is deterministic, we can directly use α̂^N = A* as a policy for the N-player problem with cost J^N[(Σ̂^N), (α̂^N)]. Note that α̂^N makes no use of information. We now observe, using tightness and continuity as above, that (Σ̂^N) →_d (S*) as N → ∞ and
lim_{N→∞} J^N[(Σ̂^N), (α̂^N)] = J[(S*), (A*)],
which proves the final statement of the theorem. We now also infer optimality of the limit of optimizers, (Σ), because
J[(S*), (A*)] ≤ E[ J[(Σ), (α)] ] ≤ lim inf_{N→∞} J^N[(Σ^N), (α^N)] ≤ lim_{N→∞} J^N[(Σ̂^N), (α̂^N)] = J[(S*), (A*)]
implies that equality holds throughout, and thus (Σ) = (S*) almost surely. Since every subsequence of (Σ^N) has a further subsequence converging to (S*), the whole sequence converges.

Having obtained a minimizer of the mean field problem (8), we consider the first order optimality criteria. We define the Hamiltonian to be
H(S, P) = sup_{α ∈ R^m} { P · b(S, α) − L(S, α) }.

3.1. Optimality Criterion.
The Pontryagin maximum principle states that, given an optimal trajectory (S*), (A*), we let (P*) solve the co-state equation, with S*_0 = S_0 and P*_T = −∇G(S*_T), so that (S*) and (P*) solve
dS*_t/dt = D_P H(S*_t, P*_t),   dP*_t/dt = −D_S H(S*_t, P*_t).        (10)
The operators D_P and D_S are the partial derivatives with respect to the co-state and state, respectively. We then have that, at points of continuity of S* and P*,
A*_t ∈ argmax_{α ∈ R^m} { P*_t · b(S*_t, α) − L(S*_t, α) }.        (11)
Assuming some smoothness, the co-state provides the first variation of the cost: a variation of the position at time t, S_t → S_t + δ ξ_t, yields a change in the optimal cost of −δ ξ_t · P*_t + o(δ). While (10) and (11) provide necessary conditions, if a unique solution exists with minimum cost then Theorem 3.1 implies this is the mean field limit of the N-agent problem. The mean observations are easily recovered by
dU*_t/dt = b̃(S*_t),   U*_0 = 0,   where   b̃^υ(S) = Σ_σ S^σ β̃(σ, υ, S).        (12)
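Because the co-state gives the first variation of the cost, the mean field minimizer can be computed by adjoint-based gradient descent on the control trajectory, which is how the numerical solutions in Section 6 are described as being obtained. The following is a minimal sketch under that reading; the callables b, L, G and their derivatives are assumed to be supplied by the user, and all names are illustrative rather than taken from the paper.

```python
import numpy as np

def cost_and_gradient(S0, A, dt, b, L, G, Db_S, Db_A, DL_S, DL_A, DG):
    """A has shape (K, m): the control on a grid of K time steps of size dt.
    Returns the discretized cost (8) and its gradient with respect to A,
    computed with the adjoint variable lam = -P in the paper's sign convention."""
    K = A.shape[0]
    S = np.zeros((K + 1, S0.size))
    S[0] = S0
    # forward Euler for the state dynamics dS/dt = b(S, A)
    for k in range(K):
        S[k + 1] = S[k] + b(S[k], A[k]) * dt
    J = sum(L(S[k], A[k]) * dt for k in range(K)) + G(S[K])
    # backward pass for the adjoint and the control gradient
    lam = DG(S[K])
    grad = np.zeros_like(A)
    for k in reversed(range(K)):
        grad[k] = (DL_A(S[k], A[k]) + Db_A(S[k], A[k]).T @ lam) * dt
        lam = lam + (DL_S(S[k], A[k]) + Db_S(S[k], A[k]).T @ lam) * dt
    return J, grad

# A gradient descent step on the control trajectory would then read:
#   J, g = cost_and_gradient(S0, A, dt, b, L, G, Db_S, Db_A, DL_S, DL_A, DG)
#   A = A - step * g
```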
We now want to improve the mean field approximation to capture the Gaussian fluctuations of the limit. We assume (S*) and (A*) are the unique minimizers of the mean field problem and (P*) is the co-state that solves (10). We also let (U*) be the mean observations that solve (12). We will show that the state, observations, and control satisfy
Σ^N_t ≈ S*_t + N^{-1/2} s*_t,   Υ^N_t ≈ U*_t + N^{-1/2} u*_t,   α^N_t ≈ A*_t + N^{-1/2} a*_t,
where u*_t and a*_t have mean zero, and (s*), (u*), (a*) minimize a linear quadratic Gaussian problem.

4.1. Linear-Quadratic-Gaussian Approximation.
The linear approximations of the dynamics for the state and observations are
ds_t = ( D_S b(S*_t, A*_t) s_t + D_A b(S*_t, A*_t) a_t ) dt + ℓ(S*_t, A*_t) dW_t,
du_t = Db̃(S*_t) s_t dt + ℓ̃(S*_t) dW̃_t.        (13)
We have used D to represent the gradient operator, such that Db is a linear transformation mapping R^l × R^m → R^l and Db̃ : R^l → R^l̃. The diffusion coefficient ℓ(S, A) is an l by l(l − 1) matrix that captures the quadratic variation of the process, see (6), and W are l(l − 1) independent Brownian motions. Specifically, when γ ≠ σ,
ℓ^σ_{(σ,γ)}(S, A) = −( β(σ, γ, S, A) S^σ )^{1/2}   and   ℓ^σ_{(γ,σ)}(S, A) = ( β(γ, σ, S, A) S^γ )^{1/2}.
Similarly, ℓ̃^υ_{(υ,σ)}(S) = ( β̃(σ, υ, S) S^σ )^{1/2}. The cost will have the form
J^N[(Σ^N), (α^N)] ≈ J[(S*), (A*)] + N^{-1} LQG[(s), (a)],
where the correction to the cost from the fluctuations is (dependence on S* and A* is implicit)
LQG[(s), (a)] = E[ ∫_0^T (1/2) D²_{(S,A)}( L − P*_t · b )[(s_t, a_t), (s_t, a_t)] dt + (1/2) D²G(S*_T)[s_T, s_T] ].
The control (a) will now be restricted to depend only on the linearized observations, (u). The problem of minimizing LQG[(s), (a)] with such a partial information constraint is well known [19], [18]. In order to express the solution, we define E_t = D_S b(S*_t, A*_t), B_t = D_A b(S*_t, A*_t), Ẽ_t = Db̃(S*_t), Θ_t = ℓ ℓ^T(S*_t, A*_t), Θ̃_t = ℓ̃ ℓ̃^T(S*_t), and Q_t and R_t for the Hessians of L − P*_t · b in the state and control variables. Now we can express the linear quadratic cost in a standard form, with state dynamics
ds_t = ( E_t s_t + B_t a_t ) dt + dθ_t,
where θ_t is a martingale process with infinitesimal covariance given by Θ(S*_t, A*_t). Furthermore, we impose the information constraint that a_t = â_t((u_s)_{s∈[0,t]}), where the linearized observations are given by
du_t = Ẽ_t s_t dt + dθ̃_t,
with u_0 = 0, and θ̃_t is a martingale process with infinitesimal covariance Θ̃(S*_t, A*_t).

The first step to solve the LQG problem is to compute the estimator ŝ_t = E[ s_t | (u_s)_{s∈[0,t]} ]. The separation theorem states that computing the estimator decouples from determining the control, and the optimal control is given as the linear feedback optimizer of the full information LQ problem with ŝ substituted in. We compute the covariance matrix Π_t = cov(s_t − ŝ_t) by solving the forward matrix Riccati differential equation
dΠ_t/dt = E_t Π_t + Π_t E_t^T + Θ_t − Π_t Ẽ_t^T Θ̃_t^{-1} Ẽ_t Π_t        (16)
with Π_0 = cov(s_0). Importantly, while the covariance of s may certainly depend on the control, the covariance of the estimation error s − ŝ does not. Equation (16) always has a solution on [0, T] because Θ_t ≥ 0, so the semidefinite inequality Π_t ≥ 0 is maintained. We obtain the estimator ŝ_t, given the control (a), by solving
dŝ_t = ( E_t ŝ_t + B_t a_t ) dt + Π_t Ẽ_t^T Θ̃_t^{-1} ( du_t − Ẽ_t ŝ_t dt ).
For the optimal control, we solve the backwards Riccati equation
−dZ_t/dt = E_t^T Z_t + Z_t E_t + Q_t − Z_t B_t R_t^{-1} B_t^T Z_t,   Z_T = D²G(S*_T).        (17)
The optimal control is then given by
a_t = −R_t^{-1} B_t^T Z_t ŝ_t.
The optimal cost-to-go is then given by a quadratic function of ŝ_t determined by Z_t, together with trace terms involving Π_t and Θ_t. Equation (17) may not have a solution on [0, T], depending on the data. We will see later in Section 5.1 how this might break down.
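As an indication of how the LQG solution is computed in practice, the following is a minimal numerical sketch (not the author's code) of the forward Riccati equation (16) and the backward Riccati equation (17) by explicit Euler steps. It assumes the coefficient arrays E, B, Ẽ, Θ, Θ̃, Q, R and the terminal Hessian have already been assembled along the mean field solution, and that any mixed state-control terms in the quadratic cost have been removed by completing the square.

```python
import numpy as np

def lqg_riccati_pair(E, B, Etil, Theta, Theta_til, Q, R, ZT, Pi0, dt):
    """Euler integration of the forward filter Riccati equation (16) and the
    backward control Riccati equation (17); the coefficient matrices are given
    per time step as arrays of shape (K, ., .) along the mean field solution."""
    K = E.shape[0]
    l = Pi0.shape[0]
    Pi = np.zeros((K + 1, l, l)); Pi[0] = Pi0
    for k in range(K):                      # forward: covariance of s - s_hat
        gain = Pi[k] @ Etil[k].T @ np.linalg.inv(Theta_til[k])
        dPi = E[k] @ Pi[k] + Pi[k] @ E[k].T + Theta[k] - gain @ Etil[k] @ Pi[k]
        Pi[k + 1] = Pi[k] + dPi * dt
    Z = np.zeros((K + 1, l, l)); Z[K] = ZT
    for k in reversed(range(K)):            # backward: Hessian of the cost-to-go
        rhs = E[k].T @ Z[k + 1] + Z[k + 1] @ E[k] + Q[k] \
              - Z[k + 1] @ B[k] @ np.linalg.solve(R[k], B[k].T @ Z[k + 1])
        Z[k] = Z[k + 1] + rhs * dt
    return Pi, Z

# With Pi and Z in hand, the Kalman-Bucy filter and the feedback
# a_t = -R_t^{-1} B_t^T Z_t s_hat_t give the approximate policy of Proposition 4.2.
```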
4.2. Central Limit Theorem.
We will now address the convergence to the solution of the linear quadratic approximation. We make the additional assumption of regularity for the mean field solution and problem data:
A4 We suppose that (S*), (A*), and (P*) are continuous and that β, L, and G are twice continuously differentiable near the mean field solution.

Lemma 4.1. We assume A1 and A4. Suppose that (Σ^N) and (α^N) satisfy (1), and (S*), (P*), and (A*) satisfy (10) and (11). Then the cost admits an exact second order expansion: J^N[(Σ^N), (α^N)] equals J[(S*), (A*)] minus E[ P*_0 · (Σ^N_0 − S*_0) ], plus the expected quadratic expansion of L − P* · b and of G about the mean field solution, up to higher order remainder terms.

Proof. The proof is a direct calculation using the Taylor expansion of the cost, the optimality criteria (10) and (11), and an expansion of the drift in the dynamics of (1). The second order terms are part of our conclusion. The first order terms are handled by using (11) to equate the control derivative of L with that of P* · b, and by using the Doob-Meyer decomposition (4) to rewrite the remaining terms under the expectation as an integral against the co-state, which after integration by parts leaves only the contribution of the initial condition.

Before addressing the convergence problem we consider the approximate solutions given by the linear quadratic problem. This result also shows how the information is incorporated into an asymptotically optimal approximate control.

Proposition 4.2. We assume A1-A4, (S*, A*) is the global minimum of the mean field problem, R_t > 0, and a solution to (17) exists on [0, T]. We suppose that N^{1/2}(Σ^N_0 − S_0) →_d s_0, normally distributed with zero mean and finite covariance Π_0, and that N E[Σ^N_0 − S*_0] → ζ. We define the approximate Kalman filter and control by
dŝ^N_t = ( E_t ŝ^N_t + B_t a^N_t ) dt + Π_t Ẽ_t^T Θ̃_t^{-1} ( N^{1/2}( dΥ^N_t − b̃(S*_t) dt ) − Ẽ_t ŝ^N_t dt ),
a^N_t = −R_t^{-1} B_t^T Z_t ŝ^N_t,   α̂^N_t = A*_t + N^{-1/2} a^N_t.
Then, with Σ̂^N denoting the solution of (1) under this policy, the asymptotic cost for this approximate optimal control is given by
J^N[(Σ̂^N), (α̂^N)] = J[(S*), (A*)] + N^{-1} ( LQG[(s*), (a*)] − P*_0 · ζ ) + o(N^{-1}).

Proof. Clearly, there exist unique local solutions for (16), and if we show this solution is bounded then it exists on all of [0, T]. We bound Π from above by the solution of the linear equation obtained by dropping the negative semidefinite term of (16), and Π is bounded from below by 0; thus the unique solution to (16) exists. An application of Itô's lemma to E[ |Σ̂^N_t − S*_t|^2 ] then shows that the controlled process remains within order N^{-1/2} of the mean field trajectory, and combining this with Lemma 4.1 yields the stated expansion of the cost.

We now state and prove our main result on the fluctuations of the finite N-agent problem. Weak convergence is not sufficient to justify the preservation of the information constraint, i.e. α_t = α̂_t((Υ_s)_{s∈[0,t]}). Instead, we must consider the extended weak convergence with respect to the filtration generated by (Υ_s)_{s∈[0,T]} from [2], [1]. We enlarge the probability space from Ω = D([0, T]; R^l × R^l̃) to Ω̄ = D([0, T]; P(Ω)). Given the filtration (G^N), the filtration (prediction) process, with Z^N_t ∈ P(Ω), is defined as a stochastic process on Ω so that for f ∈ C_b(Ω),
∫_Ω f dZ^N_t = E[ f((Σ^N, Υ^N)) | G^N_t ].
The extended weak convergence is that of the weak convergence of the law of (Z^N) on Ω̄. In order to prove that the information constraint is preserved in a weak limit, we will lift the compactification of the space of controls that we used in (9) to a Markovian control policy on the extended probability space Ω̄. This approach has been developed in [11] and implemented for the 'closed-loop' mean field convergence problem in [21]. The notion of extended weak convergence we use is similar to that of weak convergence of filtrations [7], i.e., that (G^N) converge in a weak sense to (G). Extended weak convergence has had success in the analysis of optimal stopping as well as backward stochastic differential equations, and has been revisited recently in [3], where it is shown to coincide with other definitions of an 'adapted' weak topology when time is discrete. In particular, the adapted Wasserstein distance minimizes the expected distance between 'causal couplings' of probability measures on Ω, and provides a natural metric for the extended weak convergence topology.

Theorem 4.3. We assume A1-A4, (S*, A*) is the global minimum of the mean field problem, R_t > 0, and (17) has a solution on [0, T]. We suppose that N^{1/2}(Σ^N_0 − S_0) →_d s_0, which is normally distributed with zero mean and finite covariance Π_0, and N E[Σ^N_0 − S*_0] → ζ. We then let s* and a* denote optimal solutions of the linear quadratic Gaussian approximation. Then, for minimizers (Σ^N), (α^N) of the N-agent problem, the first order asymptotic formula for the cost holds,
lim_{N→∞} N ( J^N[(Σ^N), (α^N)] − J[(S*), (A*)] ) = LQG[(s*), (a*)] − P*_0 · ζ,
and N^{1/2}(Σ^N − S*) →_d s* (and the convergence holds in the extended weak topology).

Proof. By Theorem 3.1 and uniqueness of the solution, we have that (Σ^N) →_d (S*). We now address (s^N) = N^{1/2}(Σ^N − S*) and (u^N) = N^{1/2}(Υ^N − U*). We define θ^N such that
s^N_t = s^N_0 + ∫_0^t N^{1/2} ( b(Σ^N_r, α^N_r) − b(S*_r, A*_r) ) dr + θ^N_t.
The infinitesimal covariance of θ^N is given by Θ(Σ^N_t, α^N_t). We first calculate, using Lemma 4.1 and Equation (17) as in the proof of Proposition 4.2, a lower bound on the rescaled cost in terms of λ E[ ∫_0^T |a^N_t|^2 dt ], where we have simply chosen λ > 0 such that λ|b|^2 ≤ b · R_t b for all b ∈ R^m and t ∈ [0, T]. With this inequality, we may proceed as in Proposition 4.2 to obtain uniform bounds for E[ ∫_0^T |a^N_t|^2 dt ] and E[ |s^N_t|^2 ]. Similar bounds hold for the observation process, (u^N). In view of the bound on the quadratic covariation of (s^N) and (u^N), which follows from (4) and (6), we have tightness of the processes. We suppose that (s^{N_j}) →_d (s) as j → ∞ along a subsequence. Then (θ^{N_j}) → (θ), where θ is a diffusion process with infinitesimal covariance Θ(S*_t, A*_t), as in [8]. Proceeding as in Theorem 3.1, we compactify the control variable and obtain a weak limit (a) such that (13) holds and we have lower semi-continuity of the cost. It remains to check that the information constraint, a_t = â_t((u_s)_{s∈[0,t]}), is satisfied. This requires that we verify that the weak convergence holds on the extended probability space Ω̄ of the prediction process. We let (Z^N) denote the prediction process of (s^N, u^N) with respect to the filtration (G^N).
Proposition 6.16 of [1] implies that the sequence of distributions for (Z^N) is tight if the distribution of (s^N, u^N) is tight and the prediction process of the limit point is continuous (which follows from (13)). We define µ̄^N to be a measure on P(Ω) × [0, T] × R^m so that
µ̄^N(dz, dt, da) = δ_{Z^N_t}(dz) δ_{a^N_t}(da) dt dP^N.
Once again, we have tightness of this sequence of measures by the uniform bound on E[ ∫_0^T |a^N_t|^2 dt ] and tightness for (Z^N). The control policy can then be expressed by disintegration as a Borel-measurable map m : P(Ω) × [0, T] → P(R^m). Finally, the limit control is recovered by
a_t = ∫_{R^m} a m(Z_t, t)(da),
which shows that (a) satisfies the information constraint, as the prediction process (Z) is progressively measurable with respect to (G). We now have lower semi-continuity with respect to the LQG cost, that is,
lim inf_{N→∞} N ( J^N[(Σ^N), (α^N)] − J[(S*), (A*)] ) ≥ LQG[(s), (a)] − P*_0 · ζ.
Combined with Proposition 4.2, we have equality in this limit, and that a* = a, s* = s, and u* = u.

5.1. Ising Game.
In the Ising example there are two states for each particle and two global controls that determine the rates of transition between the states. It is inspired by the physical Ising model and provides an example of a critical phase transition. We consider the following parameters of the model:
• β governs the cost to deviate the control from the rest state of 1 (small β corresponds to high cost).
• H adds an external bias to the preferred state.
• J adds a preference to congregate in one state (when J < 0).
• q governs the rate of observations of the states.
The controls are exactly the transition rates, i.e., β(0, 1, Σ, α) = α^0 and β(1, 0, Σ, α) = α^1. We assume the cost is the sum of an entropic penalty on the controls, weighted by β^{-1}, a bias term with coefficient H, and an interaction term with coefficient J in the aggregate state. We reduce the problem to a single dimension by S̃ = Σ^1 − Σ^0 (so Σ^0 = (1 − S̃)/2 and Σ^1 = (1 + S̃)/2). We now have that S̃ jumps by 2/N at rate N α^0 (1 − S̃)/2 and jumps by −2/N at rate N α^1 (1 + S̃)/2. We will observe measurements of each particle, υ ∈ {0, 1}, with rate q, so that β̃(0, 0, Σ) = q and β̃(1, 1, Σ) = q for some constant q ≥ 0. We work in the reduced form with S_t the mean field limit of S̃. The dynamics of the mean field system simplify to
dS_t/dt = A^0_t (1 − S_t) − A^1_t (1 + S_t),
with the correspondingly reduced cost. If we consider the difference of the number of measurements, it evolves with drift q S_t in the mean field limit. The optimal control given the co-state P is
A^0_t = exp(2 β P_t),   and similarly   A^1_t = exp(−2 β P_t).
The Hamiltonian reduces to a closed form in (S, P), and the Hamiltonian flow for the mean field limit is
dS_t/dt = 2 ( sinh(2 β P_t) − S_t cosh(2 β P_t) ),
together with the corresponding co-state equation dP_t/dt = −D_S H(S_t, P_t). When H = 0, the critical points are solved simply by S = 0 and P = 0 and, when β J ≥ 1, by the nontrivial solutions of sinh(2 β P) = β J S; in that case cosh(2 β P) = sinh(2 β P)/tanh(2 β P) = β J, since tanh(2 β P) = S at a critical point. We have P = (2β)^{-1} sinh^{-1}(β J S).

We will consider the equilibrium at S = 0. First we compute the linearization of the dynamics and observations at this equilibrium, and we then calculate the corresponding quadratic expansion of the cost. The error covariance, Π, in equilibrium solves the stationary scalar version of the Riccati equation (16), of which we want the positive root. Of course the covariance becomes smaller when q is larger and more measurements are available. When q = 0, ŝ = 0, there is no benefit in deviating from the mean field control, and (although this no longer satisfies the assumptions of Theorem 4.3) Π is simply the stationary covariance of s given the dynamics ds_t = −2 s_t dt + 2 dW_t. To solve for Z we consider the stationary version of the backward Riccati equation (17) at this equilibrium, and the solution exists if β J < 1. In the critical case, β J = 1, Z = (4β)^{-1}. The evolution of the estimate is then given by the closed-loop filter dynamics, and B R^{-1} B Z = −2, so, since ŝ_t remains near s_t, the drift cancels over long times while the fluctuations grow.
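As a small numerical check of the filtering part of this example (a sketch, not from the paper), one can solve the stationary version of (16) at the S = 0 equilibrium. The coefficients E = −2 and Θ = 4 follow from the dynamics ds_t = −2 s_t dt + 2 dW_t quoted above; the observation coefficients Ẽ = q and Θ̃ = q are our reading of the stated observation rates and are treated here as assumptions.

```python
import numpy as np

def stationary_error_variance(q, E=-2.0, Theta=4.0):
    """Positive root of the stationary scalar Riccati equation
       0 = 2*E*Pi + Theta - (Etil**2 / Theta_til) * Pi**2,
    with the assumed observation coefficients Etil = q and Theta_til = q."""
    if q == 0.0:
        # no observations: stationary variance of ds = E s dt + sqrt(Theta) dW
        return Theta / (-2.0 * E)
    a, b, c = q, -2.0 * E, -Theta     # q*Pi^2 + 4*Pi - 4 = 0 for the stated values
    return (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)

for q in [0.0, 0.1, 1.0, 10.0]:
    # decreases from 1 as q grows, matching the remark that more
    # measurements shrink the estimation error covariance
    print(q, stationary_error_variance(q))
```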
5.2. SIR Epidemic Model.
As a second example we consider a compartmental epidemic model. There are three states, Σ^0 (susceptible), Σ^1 (infectious), and Σ^2 (recovered), and one control α (a social distancing parameter). The additional parameters of the problem are
• γ, the recovery rate;
• b, the baseline infection rate;
• k, a coefficient of the cost to reduce α below b;
• c, a cost of infections;
• ν, the rate of testing of infected individuals.
We then suppose that a transition from susceptible to infectious occurs at rate β(0, 1, Σ, α) = α Σ^1, and a transition from infectious to recovered occurs at rate β(1, 2, Σ, α) = γ. We assume the infected individuals are tested at a rate ν. The cost penalizes reducing α below the baseline rate b, with coefficient k, together with a cost c per infection; a similar cost was used in [23] with applications to the COVID-19 pandemic. We can consider the problem with a terminal condition, but for simplicity we set G(Σ) = 0. For the mean field SIR example, the mean field dynamics are
dS^0_t/dt = −A_t S^0_t S^1_t,   dS^1_t/dt = A_t S^0_t S^1_t − γ S^1_t.
The number of confirmed tests evolves simply by dU_t/dt = ν S^1_t. The recovered state, (S^2), is irrelevant, so we will ignore it. The Hamiltonian is then
H(S, P) = −k log( 1 − (b S^0 S^1 / k)(P^1 − P^0) ) − γ S^1 P^1 − c S^1.
The Hamiltonian equations are
dS_t/dt = D_P H(S_t, P_t),   −dP_t/dt = D_S H(S_t, P_t),
written out for (S^0, S^1) and (P^0, P^1). Equilibria occur when S^1 = 0 together with a corresponding condition on the co-state. Due to the possible instability at the equilibrium, linearizing can lead to very bad results. We show numerical results based on solving the Hamiltonian equations and Kalman filter numerically in discrete time (∆t = 1; see Section 6). We select as parameters b = 0.87, γ = 0.217, ν = 1/3, c = 8,000, k = 100, N = 10,000, and T = 100. Results are shown in Figure 1.

Figure 1. The infected population, Σ^1, the tests per day, and the approximate control.
Figure: … from Proposition 4.2 for the SIR model.

6. Discrete Time
All of our analysis also applies to a problem in discrete time. We present it in a way that is a discretization of our continuous time problem, although it is not necessary that the time steps be small. We suppose that at time t = k ∆t the transitions at time t + ∆t from state σ to γ occur with probability β(σ, γ, α_k, Σ_k) ∆t. We require that Σ_γ β(σ, γ, α_k, Σ_k) ∆t < 1. We assume the cost has the form
J^N[(Σ), (α)] = E[ Σ_{k=0}^{T−1} L(Σ_k, α_k) ∆t + G(Σ_T) ].
When N is large and ∆t is small, the number of agents transitioning from σ to γ is well approximated by a Poisson distribution of rate β(σ, γ, α_k, Σ_k) Σ^σ_k ∆t. We can then use the same definitions for b(Σ, α), b̃(Σ), ℓ(Σ, α), and ℓ̃(Σ). The mean field problem corresponds to the discretized dynamics
S_{k+1} = S_k + b(S_k, A_k) ∆t.
The co-state, P_T = −D_S G(S_T), P_k = P_{k+1} + D_S H(S_k, P_{k+1}) ∆t, allows us to compute the gradient of the cost with respect to the control sequence, exactly as in continuous time. The statement and proof of Theorem 3.1 are now essentially the same. In order to express the linear quadratic problem describing the fluctuations, we again define the coefficients E_k, B_k, Ẽ_k, Θ_k, Θ̃_k, Q_k, and R_k from the problem data evaluated along the mean field solution (with the time step absorbed into the definitions). The discrete form of the covariance equation, (16), takes the form of the standard discrete-time Riccati recursion for Π_k, and the estimator is given by
ŝ_{k+1} = ŝ_k + E_k ŝ_k + B_k a_k + Π_{k+1} Ẽ_{k+1}^T ( Ẽ_{k+1} Π_{k+1} Ẽ_{k+1}^T + Θ̃_{k+1} )^{-1} ( u_{k+1} − Ẽ_{k+1}( ŝ_k + E_k ŝ_k + B_k a_k ) ),        (24)
and the optimal control is
a_k = −R_k^{-1} B_k^T Z_{k+1} ŝ_k.        (25)
For numerical implementation, we use a gradient descent algorithm to find the optimal mean field solution, and proceed directly to solve (24) and (25).
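The following is a minimal sketch (not the author's implementation) of the online recursions (24) and (25) with ∆t = 1, written in the standard predict/update form of the discrete Kalman filter. The assembly of the coefficient matrices from the mean field solution, the precomputed backward Riccati solution Z, and the rescaled observation innovations are all assumed to be supplied, and the indexing and factor conventions may differ slightly from the displays above.

```python
import numpy as np

def run_filter_and_control(E, B, Etil, Theta, Theta_til, R, Z, Pi0, innovations):
    """Online recursions (24)-(25) with dt = 1: a standard predict/update Kalman
    filter on the rescaled fluctuations together with the linear feedback control.
    Z is the precomputed backward Riccati solution; innovations[k] is the rescaled
    observation increment relative to its mean field prediction (assumed supplied)."""
    l = Pi0.shape[0]
    s_hat, Pi, controls = np.zeros(l), Pi0.copy(), []
    for k, du in enumerate(innovations):
        a_k = -np.linalg.solve(R[k], B[k].T @ Z[k + 1] @ s_hat)          # (25)
        controls.append(a_k)
        # prediction step of (24)
        s_pred = s_hat + E[k] @ s_hat + B[k] @ a_k
        Pi_pred = (np.eye(l) + E[k]) @ Pi @ (np.eye(l) + E[k]).T + Theta[k]
        # update step of (24) with the observed innovation
        S_mat = Etil[k] @ Pi_pred @ Etil[k].T + Theta_til[k]
        gain = Pi_pred @ Etil[k].T @ np.linalg.inv(S_mat)
        s_hat = s_pred + gain @ (du - Etil[k] @ s_pred)
        Pi = (np.eye(l) - gain @ Etil[k]) @ Pi_pred
    return np.array(controls), s_hat
```

The approximate policy of Proposition 4.2 is then recovered by adding N^{-1/2} times the returned controls to the mean field control A*.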
References
[1] Weak convergence and the general theory of processes. Unpublished manuscript.
[2] Stopping times and tightness. II. The Annals of Probability.
[3] All adapted topologies are equal. Probability Theory and Related Fields.
[4] Linear-quadratic approximation of optimal policy problems.
[5] Probabilistic Theory of Mean Field Games with Applications I: Mean Field FBSDEs, Control, and Games. Probability Theory and Stochastic Modelling.
[6] Convergence, fluctuations and large deviations for finite state mean field games via the master equation.
[7] On weak convergence of filtrations.
[8] Law of large numbers and central limit theorem for unbounded jump mean-field models.
[9] From the master equation to mean field game limit theory: A central limit theorem.
[10] Ambiguity, information quality, and asset pricing.
[11] On stochastic relaxed control for partially observed diffusions.
[12] A finite agent equilibrium in an incomplete market and its strong convergence to the mean-field limit.
[13] Iosif Il'ich Gihman and Anatolij Vladimirovič Skorohod. Controlled Stochastic Processes.
[14] Socio-economic applications of finite state mean field games.
[15] Imperfect information, credit markets and unemployment.
[16] Informational imperfections in the capital market and macro-economic fluctuations.
[17] On mean-field partial information maximum principle of optimal control for stochastic systems with Lévy processes.
[18] Contributions to the theory of optimal control.
[19] New results in linear filtering and prediction theory.
[20] A new approach to linear filtering and prediction problems.
[21] On the convergence of closed-loop Nash equilibria to the mean field game limit.
[22] Tightness criteria for laws of semimartingales.
[23] Optimal control of COVID-19 infection rate with social costs.
[24] Martingale proofs of many-server heavy-traffic limits for Markovian queues.
[25] Central limit theorem for a system of Markovian particles with mean field interactions.
[26] Economic calculation in the socialist commonwealth.