key: cord-0812068-9nvwr5o7 authors: Shi, Zhiyan; Wang, Zhongzhi; Zhong, Pingping; Fan, Yan title: The Generalized Entropy Ergodic Theorem for Nonhomogeneous Bifurcating Markov Chains Indexed by a Binary Tree date: 2021-08-03 journal: J Theor Probab DOI: 10.1007/s10959-021-01117-1 sha: 61930f4e2b8508962f5cc383b67c6c92051582f4 doc_id: 812068 cord_uid: 9nvwr5o7 In this paper, we study the generalized entropy ergodic theorem for nonhomogeneous bifurcating Markov chains indexed by a binary tree. Firstly, by constructing a class of random variables with a parameter and the mean value of one, we establish a strong limit theorem for delayed sums of the bivariate functions of such chains using the Borel–Cantelli lemma. Secondly, we prove the strong law of large numbers for the frequencies of occurrence of states of delayed sums and the generalized entropy ergodic theorem. As corollaries, we generalize some known results. Let {X n , n ≥ 0} be an arbitrary information source taking values in a finite alphabet set on probability space ( , F, P). Set f n (ω) = − 1 n ln P(X 0 , X 1 , . . . , X n ), and f n (ω) is called the relative entropy density of {X 0 , X 1 , . . . , X n } in information theory. The convergence of f n (ω) to a constant in a sense (L 1 convergence, convergence in probability, a.e. convergence) is regarded as the entropy ergodic theorem or the asymptotic equipartiton property (AEP), or the Shannon-McMillan-Breiman theorem, which is the fundamental theorem in information theory. Shannon [22] first proved the entropy ergodic theorem for convergence in probability for stationary ergodic information sources with finite alphabet. The entropy ergodic theorem in L 1 and a.e. convergence, respectively, for stationary ergodic information sources was explored by McMillan [20] and Breiman [6] . Chung [8] considered the case of countable alphabet and Billingsley [5] extended the result to stationary nonergodic sources. Gray and Kieffer [14] extended it to asymptotically stationary measure process. The entropy ergodic theorem for general stochastic processes can be found, for example, in Barron [2] and Algoet and Cover [1] . Yang and Liu [19, 30] obtained the entropy ergodic theorem of nonhomogeneous Markov information sources in finite state space. Yang and Liu [32] studied the entropy ergodic theorem for mth-order nonhomogeneous Markov information sources. Let S n = n k=1 X k , S 0 = 0 and for any m, n ∈ N + , define T m,n = S m+n − S m = m+n k=m+1 X k , T m,n n is regarded as the moving average or delay sum in probability theory. Many researches have been taken on topics of moving average. Shepp [23] studied the limiting values of the averages [S n+ f (n) − S n ]/ f (n) for i.i.d. random variables. Gaposhkin [12] established the law of large numbers for moving averages of independent random variables. Lanzinger [17] studied an almost sure limit theorem for moving averages of random variables between the strong law of large numbers and the Erdos-Rényi law. Lai [16] gave a review of limit theorems for moving averages and described some recent developments motivated by applications to signal detection and change point problems. Recently, Wang and Yang [27] considered the entropy ergodic theorem of the moving average form and obtained the generalized entropy ergodic theorem for nonhomogeneous Markov chains. The tree indexed stochastic process is one of the research hotspots of stochastic structure in recent years. The tree indexed stochastic process generally includes tree indexed random wak, random tree (such as Galton-Watson tree) and tree indexed Markov chain et al. There are a lot of researches about probability limit theorems of tree indexed stochastic process, we briefly list as follows: Chen [7] studied the average properties of random walks on Galton-Watson trees. Telcs and Wormald [26] studied the strong recurrence of tree indexed random walks determined by the resistance properties of spherically symmetric graphs. Dembo et al. [10] extended the notions of shift-invariance and specific relative entropy-as typically understood for Markov fields on deterministic graphs such as Z d -to Markov fields on random trees, and also developed single-generation empirical measure large deviation principles for a more general class of random trees. Le Gall [18] considered Galton-Watson trees associated with a critical offspring distribution and condition to have exactly n vertices, and they proved that these conditioned spatial trees converge as n → ∞, moduloan appropriate rescaling, towards the conditioned brownian tree under suitable assumptions on the offspring distribution and the spatial displacements. Guyon [13] studied the law of large numbers and central limit theorems for the bifurcating Markov chains indexed by a binary tree, and applied these results to detect cellular aging in Escherichia Coli, using the data of Stewart et al. and a bifurcating autoregressive model. Yamamoto [34] established a large deviation theorem for the number of branches of each order in a random binary tree, where the rate function associated with a large deviation was given by asymptotic forms of the rate function. The significant progress of tree indexed Markov chains is its entropy ergodic theorem. Benjamini and Peres [3] gave the definition of tree-indexed Markov chains and studied the recurrence and ray-recurrence for them. Berger and Ye [4] studied the existence of entropy rate for some stationary random fields on a homogenous tree. Ye and Berger [35, 36] , by using Pemantle's [21] result and a combinational approach, have obtained entropy ergodic theorem in probability for a PPG-invariant and ergodic random field on a homogenous tree. Yang and Liu [30] established the strong law of large numbers for frequency of state occurrence on Markov chains indexed by a homogenous tree (in fact, it is special case of tree-indexed Markov chains and PPGinvariant random field). Yang [31, 33] obtained the strong law of large numbers and the entropy ergodic theorem for tree-indexed Markov chains. Huang and Yang [15] studied the strong law of large numbers and entropy ergodic theorem for Markov chains indexed by an uniformly bounded tree. Shi and Yang [25] studied the entropy ergodic theorem for mth-order nonhomogeneous Markov chains indexed by a tree. Recently, Dang et al. [9] defined a discrete form of nonhomogeneous bifurcating Markov chains indexed by a binary tree and discuss the equivalent properties for them, meanwhile the strong law of large numbers and the entropy ergodic theorem are studied for these Markov chains with finite state space. Shi et al. [24] studied the strong law of large numbers and entropy ergodic theorem for Markov chains indexed by a Cayley tree in a Markovian environment with countable state space. Inspired by Dang et al. [9] , Wang and Yang [27] , and infused with some new ideas, in this paper, we study the generalized entropy ergodic theorem for nonhomogeneous bifurcating Markov chains indexed by a binary tree. Firstly, we prove a strong limit theorem for moving average of the bivariate functions of such chains. Secondly, we prove the strong law of large numbers for the frequencies of occurrence of states of moving average and the generalized entropy ergodic theorem. As corollaries, we generalize some known results. The research innovations of this paper are embodied in generalizing the entropy ergodic theorem in the form of moving average. As the classical Doob martingale convergence theorem cannot be employed, the core technique in this paper is that we construct a class of random variables with a parameter and the mean value of one, and use Borel-Cantelli lemma to prove the existence of a.e. convergence of certain random variables. The rest of this paper is organized as follows. Section 2 describes some preliminaries, some concepts and properties of Markov chains indexed by a tree and the entropy density are reviewed. The most significant results of this article, i.e. the strong law of large numbers for the frequencies of occurrence of states and the generalized entropy ergodic theorem for the finite nonhomogeneous bifurcating Markov chains indexed by a binary tree, will be illustrated in Sect. 3 A tree is a graph T which is connected and contains no circuits. Given any two vertices α = β ∈ T . Let αβ be the unique path connecting α and β. Define the distance d(α, β) to be the number of edges contained in the path αβ. Select a vertex as the root (denoted by o). For any two vertices σ and t of tree T , we write σ ≤ t if σ is on the unique path from the root o to t. We denote by σ ∧ t the vertex farthest from o satisfying σ ∧ t ≤ t and σ ∧ t ≤ σ . The set of all vertices with distance n from the root o is called the n-th level of T . We denote by L n the set of all vertices on level n (L o = {o}). We denote by L n m to be the set of all vertices on the mth to nth level of T , specially by T (n) to be the set of all vertices on level 0 (the root o) to level n. Let T be any tree and t ∈ T \{o}. If a vertex in this tree is on the unique path from the root o to t and is the nearest to t, we call it the predecessor of t and denote it by 1 t , we also call t a successor of 1 t . If the root of a tree has N neighboring vertices and other vertices have N + 1 neighboring vertices, we call this type of tree a Cayley tree and denote it by T C,N . That is, for any vertex t of Cayley tree T C,N , it has N successors on the next level. In this paper, we mainly investigate the binary tree T C,2 , on which each vertex has two successors on the next level. For simplicity, we denote T C,2 by T 2 (see Fig. 1 ). For any vertex t of the binary tree T 2 , we denote by t 1 and t 2 the two successors of t, and call them the first successor and the second successor of t respectively. Let ( , F, P) be a probability space, and T be any tree, {X t , t ∈ T } be tree-indexed stochastic processes defined on ( , F, P). Let A be the subgraph of T , X A = {X t , t ∈ A}. We denote by |A| the number of vertices of A, x A the realization of X A . Dang et al. [9] defined the discrete form of nonhomogeneous bifurcating Markov chains indexed by a binary tree. First we review the definition of this process. Definition 2.1 (Dang et al. [9] ) Let T 2 be a binary tree, G a countable state space, {X t , t ∈ T 2 } be a collection of G-valued random variables defined on probability space ( , F, P). Let be a distribution on G, and be a collection of stochastic matrices (that is P t (y 1 , y 2 |x) ≥ 0, ∀y 1 , y 2 , x ∈ G, and and {X t , t ∈ T 2 } will be called G-valued nonhomogeneous bifurcating Markov chains indexed by a binary tree T 2 with the initial distribution (1) and stochastic matrices (2). t ∈ T 2 } will be called G-valued homogeneous bifurcating Markov chains indexed by a binary tree. Dang et al. [9] presented the equivalent properties for nonhomogeneous bifurcating Markov chains indexed by a binary tree as following. Property 2.1 (Dang et al. [9] ) Let T 2 be a binary tree, G a countable state space, and {X t , t ∈ T 2 } be a collection of G-valued random variables defined on probability space ( , F, P), then the three propositions below are equivalent: a binary tree T 2 with the initial distribution (1) and stochastic matrices (2) defined by Definition 2.1; (ii) For ∀n ≥ 1 and ∀x T (n) ∈ G T (n) , we have and It is a consequence of Kolmogorov extension theorem that there exists a collection of G-valued random variables {X t , t ∈ T 2 } on some probability space such that (5) holds. (5), we can easily obtain that for ∀m, n ≥ 1, n ≥ m and ∀x L n m = G L n m , indexed by a binary tree T 2 with the stochastic matrices (2) defined by Definition 2.1. From the second equality of (6), we have that for any t ∈ T , Below we will recall the definition of tree indexed nonhomogeneous Markov chains. Definition 2.2 (Dong et al. [11] ) Let T be a local finite and infinite tree, G a countable state space, {X t , t ∈ T } be a collection of G-valued random variables defined on probability space ( , F, P). Let be a distribution on G, and be a collection of transition matrices on G 2 . If ∀n ≥ 1, and t, t and {X t , t ∈ T } will be called G-valued nonhomogeneous Markov chains indexed by tree T with the initial distribution (8) and transition matrices (9), or called tree indexed nonhomogeneous Markov chains with state space G. The above definition is the natural generalization of the definition of homogeneous Markov chains indexed by tree T (see Benjamini and Peres [3] ). Similar to the equivalent property of nonhomogeneous bifurcating Markov chains indexed by a binary tree, by Property 2.1, we can immediately obtain the equivalent property of nonhomogeneous Markov chains indexed by a tree. Thus a nonhomogeneous bifurcating Markov chain indexed by a binary tree is the nonhomogeneous Markov chain indexed by a binary tree if and only if, for ∀t ∈ T 2 and ∀x, y 1 , y 2 ∈ G, that is ∀t ∈ T 2 , The above equality means that a nonhomogeneous bifurcating Markov chain indexed by a binary tree is the nonhomogeneous Markov chain indexed by a binary tree if and only if for any t ∈ T 2 , their two successors of the same predecessor of t are conditionally independent. Let T be a tree, {X t , t ∈ T } be a stochastic process indexed by tree T taking values in countable state space G. Denote P(x L n m ) = P(X L n m = x L n m ). Let {a n , n ≥ 0} and {φ(n), n ≥ 0} be two sequences of nonnegative integers such that lim n→∞ φ(n) = ∞. Define f a n ,φ(n) (ω) will be called the generalized entropy density of X L an +φ(n) an . Particularly, if a n ≡ 0 and φ(n) = n, f a n ,φ(n) (ω) will become the classical entropy density of X T (n) defined as follows Obviously, if {X t , t ∈ T } is a nonhomogeneous bifurcating Markov chains indexed by a binary tree defined by Definition 2.1, it follows from (7) that f a n ,φ(n) (ω) = − 1 |L a n +φ(n) a n | ln P(X L an ) + t∈L an +φ(n)−1 an (17) and Property 2.3 (Yang and Yang [28] ) Let T 2 be a binary tree, Let f a n ,φ(n) (ω) be defined by (15) . Then f a n ,φ(n) (ω) are uniformly integrable. nonhomogeneous bifurcating Markov chain indexed by a binary tree defined as before. Let S k (L a n +φ(n) a n )(k ∈ G) be the number of k in set of random variables {X t , t ∈ L a n +φ(n) a n }, and S k (L a n )(k ∈ G) be the number of k in set of random variables {X t , t ∈ L a n }, S i k (L a n +φ(n)−1 a n )(k ∈ G) be the number of k in set of random variables {X t i = k, t ∈ L a n +φ(n)−1 a n }, i = 1, 2, which are defined as, S k (L a n +φ(n) a n ) = |{t ∈ L a n +φ(n) a n : X t = k}|; S k (L a n ) = |{t ∈ L a n : X t = k}|; and S i k (L a n +φ(n)−1 a n ) = |{t ∈ L a n +φ(n)−1 a n : X t i = k}|, i = 1, 2. It follows that S k (L a n +φ(n) a n ) = t∈L an +φ(n) an S k (L a n ) = t∈L an S 1 k (L a n +φ(n)−1 a n ) = t∈L an +φ(n)−1 an S 2 k (L a n +φ(n)−1 a n ) = t∈L an +φ(n)−1 an and S k (L a n +φ(n) a n ) = S 1 k (L a n +φ(n)−1 a n ) + S 2 k (L a n +φ(n)−1 a n ) + S k (L a n ), where In this section, we will establish the strong law of large numbers for the frequencies of occurrence of states and the generalized entropy ergodic theorem for the finite nonhomogeneous bifurcating Markov chains indexed by a binary tree. Firstly, we will give the strong law of large numbers for the frequencies of occurrence of states for this chains with finite state space. 1} be a finite state space, and {X t , t ∈ T 2 } be a G-valued nonhomogeneous bifurcating Markov chain indexed by a binary tree T 2 with stochastic matrices {P t , t ∈ T 2 } defined by Definition 2.1, S k (L a n +φ(n) a n ) be defined by (19) . Let P = (P(y 1 , y 2 |x)), x, y 1 , y 2 ∈ G be another stochastic matrix, and let P 1 (y 1 |x) = y 2 ∈G P(y 1 , y 2 |x),P 2 (y 2 |x) = y 1 ∈G P(y 1 , y 2 |x),P 1 = (P 1 (y|x)),P 2 = (P 2 (y|x)). Let Q = 1 2 (P 1 + P 2 ), and assume that the transition matrix Q is ergodic. Let {a n , n ≥ 0} and {φ(n), n ≥ 0} be two nonnegative integer sequences such that for any positive integers n, m If ∀x, y 1 , y 2 ∈ G, then lim n→∞ S k (L a n +φ(n) a n ) |L a n +φ(n) a n where π = {π(0), π(1), . . . , π(b−1)} is the unique stationary distribution determined by the transition matrix Q. The proof of the above theorem will be given in Sect. 4. In the following, we will study the generalized entropy ergodic theorem for nonhomogeneous bifurcating Markov chains indexed by a binary tree with finite state space Theorem 3.2 Under the conditions of Theorem 3.1, let f a n ,φ n (ω) be as defined in (17) and {a n , n ≥ 0} be a sequence of bounded nonnegative numbers, then lim n→∞ f a n ,φ n (ω) = − P(k 1 , k 2 |l) ln P(k 1 , k 2 |l) a.e.. (27) The proof of the above theorem will be presented in Sect. 4. Let a n ≡ 0, φ(n) = n in Theorem 3.2, we can immediately get the entropy ergodic theorem for nonhomogeneous bifurcating Markov chains indexed by a binary tree with finite state space G (see Dang et al. [9] ). From Property 2.3, we know that f a n ,φ(n) (ω) are uniformly integrable. Thus (27) also holds with L 1 convergence. We denote by g a n ,φ(n) (ω) the generalized entropy density of nonhomogeneous Markov chains indexed by a tree with the initial distribution (8) and transition matrices (9) . From (12) , it is easy to see that g a n ,φ(n) (ω) = − 1 |L a n +φ(n) a n | ln P(X L an ) + t∈L an +φ(n) an +1 ln Q t (X t |X 1 t ) . By Theorem 3.2, we can establish the generalized entropy ergodic theorem for nonhomogeneous Markov chains indexed by a binary tree. then lim n→∞ g a n ,φ(n where π = {π(0), . . . , π(b − 1)} is the unique stationary distribution determined by the transition matrix Q. The proof of the above corollary will be given in Sect. 4. Take a n ≡ 0, φ(n) = n in Corollary 3.1, it is straightforward to obtain the entropy ergodic theorem for nonhomogeneous Markov chains indexed by a Cayley tree T C,2 with finite state space G. The result is a special case of Dong, Yang and Bai [11] for N = 2. If there is only one son for each vertex of the tree, nonhomogeneous Markov chains indexed by a binary tree will degenerate into nonhomogeneous Markov chains. Similarly, we denote by h a n ,φ(n) (ω) the generalized entropy density of nonhomogeneous Markov chain with the initial distribution μ 0 (0), . . . , μ 0 (b−1) and transition matrices P n = ( p n (i, j) ), i, j ∈ G. It easily follows that h a n ,φ(n) (ω) = − 1 φ(n) log μ a n (X a n ) + a n +φ(n) where μ a n (x) is the distribution of X a n . Thus we can get the generalized entropy ergodic theorem for nonhomogeneous Markov chains. Let P = ( p(i, j)) be another transition matrix, and assume that P is irreducible. If Proof The corollary is a special case of Corollary 3.1, where T 2 is the set of nonnegative integers N. Note that 1 φ(n) a n +φ(n) ) 1 a n + φ(n) a n +φ(n) k=1 | p k (i, j) − p(i, j)|, and {a n } is bounded, by (32), we have that lim n→∞ 1 φ(n) a n +φ(n) Thus, we can immediately obtain the results of Wang and Yang [27] on the generalized entropy ergodic theorem for delayed sums of nonhomogeneous Markov chains. If a n ≡ 0, φ(n) = n in Corollary 3.2, we can get the entropy ergodic theorem of nonhomogeneous Markov chains (see Yang, [29] ). Before providing the proofs of the main results in Sect 3, we begin with some lemmas. 2.1, and {g t (x, y 1 , y 2 Let {a n , n ≥ 0} and {φ(n), n ≥ 0} be two sequences of nonnegative integers such that φ(n) converges to infinity as n → ∞. Assume that for ∀ε > 0, ∞ n=1 exp(−|L a n +φ(n) a n |ε) < ∞. Let H a n ,φ(n) (ω) = t∈L an +φ(n)−1 an and G a n ,φ(n) (ω) = t∈L an +φ(n)−1 an Let α > 0, and set D(α) = ω : lim sup n→∞ 1 |L a n +φ(n) a n | t∈L an +φ(n) an +1 Then we have lim n→∞ H a n ,φ(n) (ω) − G a n ,φ(n) (ω) |L a n +φ(n) a n | = 0 a.e. ω ∈ D(α). (38) It is easy to see that if {g t (x, y 1 , y 2 ), t ∈ T 2 } is a collection of uniformly bounded functions, then for any α > 0, D(α) = , thus we can get lim n→∞ H a n ,φ(n) (ω) − G a n ,φ(n) (ω) |L a n +φ(n) a n | = 0 a.e.. Let a n = 0 and φ(n) = [log 2 n α ](α > 0). Since T 2 is a binary tree, we have |L a n +φ(n) a n where [·] is the usual greatest integer function. In this case (34) holds. Proof Let λ be a nonzero real number, for fixed n, define t a n ,m (λ, ω) = e λ t∈L an +m−1 an Noticing that E e λ t∈L an +φ(n)−1 ·P(X L an +φ(n) = x L an +φ(n) |X T (an +ϕ(n)−1) ) It is easy to see that E[t a n ,1 (λ, ω)] = 1. Hence by (40), = E E[t a n ,φ(n) (λ, ω)|X T (an +φ(n)−1) ] = E E t a n ,φ(n)−1 (λ, ω) e λ t∈L an +φ(n)−1 g t (X t ,X t 1 ,X t 2 ) = E[t a n ,φ(n)−1 (λ, ω)] = · · · = E[t a n ,1 (λ, ω)] = 1. By Markov inequality, (34) and (41), for any ε > 0, we have ∞ n=1 P 1 |L a n +φ(n) a n | ln t a n ,φ(n) (λ, ω) ≥ ε = ∞ n=1 P t a n ,φ(n) (λ, ω) ≥ exp(|L a n +φ(n) a n | · ε) exp(−|L a n +φ(n) a n | · ε) < ∞. According to Borel-Cantelli Lemma and arbitrariness of ε, we have lim sup n→∞ 1 |L a n +φ(n) a n | ln t a n ,φ(n) (λ, ω) ≤ 0 a.e.. Noticing that ln t a n ,φ(n) (λ, ω) |L a n +φ(n) a n | = 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an by (43) and (44), we have lim sup n→∞ 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an Let 0 < λ ≤ α, dividing both sides of (45) by λ, we have lim sup n→∞ 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an By (37), (46), and inequalities ln x ≤ x − 1(x > 0) and 0 ≤ e x − 1 − x ≤ 1 2 x 2 e |x| , we get lim sup n→∞ 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an ≤ lim sup n→∞ 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an ≤ lim sup n→∞ 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an |L a n +φ(n) a n | t∈L an +φ(n)−1 an |L a n +φ(n) a n | t∈L an +φ(n)−1 an Letting λ → 0 + in (47) we have lim sup n→∞ 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an Let −α ≤ λ < 0, we similarly get lim inf n→∞ 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an ≥ 0 a.e. ω ∈ D(α). Combining (48) and (49), we obtain (38) directly. Let T 2 be a binary tree, {a n , n ≥ 0} and {φ(n), n ≥ 0} defined as in Lemma 4.1. Let {a t , t ∈ T } be a collection of real numbers, and a be a real number. If then lim n→∞ 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an Proof Noticing that 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an |a t − a| ≤ |T (a n +φ(n)) | |L a n +φ(n) a n | 1 |T (a n +φ(n)) | t∈T (an +φ(n)−1) |a t − a|. Since T 2 is a binary tree, we have lim n→∞ |T (a n +φ(n)) | |L a n +φ(n) a n | = lim n→∞ 2 a n +φ(n)+1 − 1 2 a n (2 φ(n)+1 − 1) = 1. |L a n +φ(n) a n | t∈L an +φ(n)−1 an |P t (y 1 , y 2 |x) − P(y 1 , y 2 |x)| = 0. Let g t (x, y 1 , y 2 ) = I k (y 1 ) in Lemma 4.1. Obviously, {g t (x, y 1 , y 2 ), t ∈ T 2 } are uniformly bounded. Since H a n ,φ(n) (ω) = t∈L an +φ(n)−1 an and G a n ,φ(n) (ω) = t∈L an +φ(n)−1 an From Lemma 4.1, we have lim n→∞ 1 |L a n +φ(n) a n | S 1 k (L a n +φ(n)−1 a n ) − t∈L an +φ(n)−1 an From (54), it can be easily verified that lim n→∞ 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an By (57)-(59), we have lim n→∞ 1 |L a n +φ(n) a n | S 1 k (L a n +φ(n)−1 a n ) − b−1 l=0 P 1 (k|l)S l (L a n +φ(n)−1 a n ) = 0 a.e.. (60) Let g t (x, y 1 , y 2 ) = I k (y 2 ) in Lemma 4.1, similarly, we obtain that lim n→∞ 1 |L a n +φ(n) a n | S 2 k (L a n +φ(n)−1 a n ) − b−1 l=0 P 2 (k|l)S l (L a n +φ(n)−1 a n ) = 0 a.e.. (61) Adding (60) and (61), and noticing that 0 ≤ lim n→∞ S k (L a n ) |L a n +φ(n) a n | ≤ lim n→∞ |L a n | |L a n +φ(n) a n | = lim n→∞ 2 a n 2 a n (2 φ(n) − 1) = 0, lim n→∞ |L an +φn) an | |L an +φ(n)−1 an | = 2, and Q = 1 2 (P 1 + P 2 ). By (23), we have lim n→∞ S k (L a n +φ(n) a n ) |L a n +φ(n) a n | − b−1 l=0 Q(k|l) S l (L a n +φ(n)−1 a n ) |L a n +φ(n)−1 a n | = 0 a.e. Letting φ (n) = φ(n) − 1, it is easy to see that {φ (n), n ≥ 0} also satisfies (34) . Using the same argument as that used to derive (62), we can prove that lim n→∞ S k (L a n +φ(n)−1 a n ) |L a n +φ(n)−1 a n |L a n +φ(n)−2 a n | = 0 a.e.. Multiplying the k-th equality of (63) by Q( j|k), adding them together and using (62), we have S k (L a n +φ(n)−1 a n ) |L a n +φ(n)−1 a n S l (L a n +φ(n)−2 a n ) |L a n +φ(n)−2 a n | Q(k|l)Q( j|k) S k (L a n +φ(n)−1 a n ) |L a n +φ(n)−1 a n | Q( j|k) − S j (L a n +φ(n) a n ) |L a n +φ(n) a n | + S j (L a n +φ(n) a n ) |L a n +φ(n) a n S l (L a n +φ(n)−2 a n ) |L a n +φ(n)−2 a n | = lim n→∞ S j (L a n +φ(n) a n ) |L a n +φ(n) a n S l (L a n +φ(n)−2 a n ) |L a n +φ(n)−2 a n | Q (2) where Q (N ) ( j|l) is the N -step transition probability determined by Q. By induction, we have lim n→∞ S j (L a n +φ(n) a n ) |L a n +φ(n) a n S l (L a n +φ(n)−N a n ) |L a n +φ(n)−N a n | Q (N ) ( j|l) = 0 a.e.. Noticing that 1 |L a n +φ(n)−N a n | b−1 l=0 S l (L a n +φ(n)−N a n and lim N →∞ (26) follows from (65), (66) and (67). This completes the proof of the Theorem 3.1. Before presenting the proof of Theorem 3.2, we cite a lemma which will be used. It is easy to see from (24) that {φ(n), n ≥ 0} satisfies (34) . By Markov inequality and (34), we have for every ε > 0, ∞ n=1 P 1 |L a n +φ(n) a n | ln P( exp{−ε|L a n +φ(n) a n |} < ∞. (70) By Borel-Cantelli lemma, we get lim n→∞ 1 |L a n +φ(n) a n | ln P(X L an ) = 0 a.e.. |L a n +φ(n) a n | t∈L an +φ(n)−1 an |P t (k 1 , k 2 |l) ln P t (k 1 , k 2 |l) −P(k 1 , k 2 |l) ln P(k 1 , k 2 |l)| = 0. Let g t (x, y 1 , y 2 ) = ln P t (y 1 , y 2 |x) for all t ∈ T 2 in Lemma 4.1. By (35) and (36), we have H a n ,φ(n) (ω) = t∈L an +φ(n)−1 an G a n ,φ(n) (ω) = t∈L an +φ(n)−1 an Letting α = 1 2 , noticing that for any t ∈ T 2 , we have and ∀t ∈ T 2 , Thus lim sup n→∞ 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an By (73)-(76) and Lemma 4.1, we have lim n→∞ 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an − t∈L an +φ(n)−1 an Now, we have 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an |L a n +φ(n) a n | t∈L an +φ(n)−1 an I l (X t )P t (k 1 , k 2 |l) · ln P t (k 1 , k 2 |l) − 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an b−1 l=0 b−1 I l (X t )P(k 1 , k 2 |l) · ln P(k 1 , k 2 |l) + 1 |L a n +φ(n) a n I l (X t )P(k 1 , k 2 |l) · ln P(k 1 , k 2 |l) 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an P t (k 1 , k 2 |l) · ln P t (k 1 , k 2 |l) − P(k 1 , k 2 |l) · ln P(k 1 , k 2 |l) + P(k 1 , k 2 |l) · ln P(k 1 , k 2 |l) 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an 1 |L a n +φ(n) a n | t∈L an +φ(n)−1 an P t (k 1 , k 2 |l) · ln P t (k 1 , k 2 |l) − P(k 1 , k 2 |l) · ln P(k 1 , P(k 1 , k 2 |l) · ln P(k 1 , k 2 |l) 1 |L a n +φ(n) a n | S l (L a n +φ(n)−1 an ) − 1 2 π(l) a.e.. |L a n +φ(n) a n | t∈L an +φ(n)−1 an ln P t (X t 1 , X t 2 |X t ) P(k 1 , k 2 |l) ln P(k 1 , k 2 |l) a.e.. (27) can be obtained from (17), (71) and (79), which completes the proof of the theorem 3.2. Proof of Corollary 3.1 Let ∀t ∈ T 2 and ∀x, y 1 , y 2 ∈ G, P t (y 1 , y 2 |x) = Q t 1 (y 1 |x)Q t 2 (y 2 |x). From Remark 2.4 we know that nonhomogeneous Markov chain indexed by a binary tree given in this corollary is a nonhomogeneous bifurcating Markov chain indexed by a binary tree with the stochastic matrices {P t = (P t (y 1 , y 2 |x)), t ∈ T 2 }, and g a n ,φ(n) (ω) = 1 |L a n +φ(n) a n | ln P(X L an ) + t∈L an +φ(n) an +1 ln Q t (X t |X 1 t ) = − 1 |L a n +φ(n) a n | ln P(X L an ) + t∈L an +φ(n)−1 an ln Q t 1 (X t 1 |X t ) + t∈L an +φ(n)−1 an ln Q t 2 (X t 2 |X t ) = − 1 |L a n +φ(n) a n | ln P(X L an ) + t∈L an +φ(n)−1 an ln P t (X t 1 , X t 2 |X t ) = f a n ,φ(n) (ω). Let P(y 1 , y 2 |x) = Q(y 1 |x)Q(y 2 |x). It is easy to see that P 1 = Q, P 2 = Q, 1 2 (P 1 + P 2 ) = Q, and Q is ergodic. Since |P t (k 1 , k 2 |l) − P(k 1 , k 2 |l)| = |Q t 1 (k 1 |l)Q t 2 (k 2 |l) − Q(k 1 |l)Q(k 2 |l)| ≤ |Q t1 (k 1 |l)Q t 2 (k 2 |l) − Q(k 1 |l)Q t 2 (k 2 |l)| + |Q(k 1 |l)Q t 2 (k 2 |l) −Q(k 1 |l)Q(k 2 |l)| ≤ |Q t 1 (k 1 |l) − Q(k 1 |l)| + |Q t 2 (k 2 |l) − Q(k 2 |l)|, and by (29) , for i = 1, 2, = lim n→∞ f a n ,φ(n) (ω) Q(k 1 |l)Q(k 2 |l) · ln Q(k 1 |l) + ln Q(k 2 |l) Thus, (30) holds. A sandwich proof of the Shannon-McMillan-Breiman theorem The strong ergodic theorem for densities: generalized Shannon-McMillan-Breiman theorem Markov chains indexed by trees Entropic aspects of random fields on trees Ergodic Theory and Information The individual ergodic theorem of information theory Average properties of random walks on Galton-Watson trees The ergodic theorem of information theory The strong law of large numbers and the entropy ergodic theorem for nonhomogeneous bifurcating Markov chains indexed by a binary tree Large deviations of Markov chains indexed by random trees The strong law of large numbers and the Shannon-McMillan theorem for nonhomogeneous Markov chains indexed by a Cayley tree The law of large numbers for moving averages of independent random variables. Mathematical notes of the Academy of Limit theorems for bifurcating Markov chains. Application to the detection of cellular aging Asymptotically mean stationary measures Strong law of large number for Markov chains indexed by an infinite tree with uniformly bounded degree Limit theorems for moving averages An almost sure limit theorem for moving averages of random variables between the strong law of large numbers and the Erdös-Rényi law A conditional limit theorem for tree-indexedrandom walk A extension of Shannon-McMillan theorem and some limit properties for nonhomogeneous Markov chains The basic theorems of information theory Antomorphism invariant measure on trees A mathematical theory of communication A limit law concerning moving averages The Shannon-McMillan theorem for Markov chains indexed by a Cayley tree in random environment Some limit properties for the mth-order nonhomogeneous Markov chains indexed by an rooted Cayley tree Branching and tree indexed random walks on fractals The generalized entropy ergodic theorem for nonhomogeneous Markov chains The generalized entropy ergodic theorem for nonhomogeneous Markov chains indexed by a Cayley tree The asymptotic equipartition property for nonhomogeneous Markov information sources Strong law of large numbers for Markov chains fields on a Bethe tree Some limit properties for Markov chains indexed by a homogeneous tree The asymptotic equipartition property for mth-order nonhomogeneous Markov information sources The asymptotic equipartition property for Markov chains indexed by a Homogeneous tree Large deviation theorem for branches of the random binary tree in the Horton-Strahler analysis Ergodic, regulary and asymptotic equipartition property of random fields on trees Information measures for discrete random fields Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations The authors sincerely thank the editor and reviewers for their helpful and important comments, especially during the time with COVID-19 pandemic. The authors are also very thankful to Professor Keyue Ding who helped us to improve the English of this paper greatly. This work is supported by the National Natural Science Foundation of China (11971197, 11601191).