Joint Dynamic Models and Statistical Inference for Recurrent Competing Risks, Longitudinal Marker, and Health Status. Tong, Lili; Liu, Piaomu; Pena, Edsel. 2021-03-24. Consider a subject or unit in a longitudinal biomedical, public health, engineering, economic or social science study which is being monitored over a possibly random duration. Over time this unit experiences recurrent events of several types and a longitudinal marker transitions over a discrete state-space. In addition, its "health" status also transitions over a discrete state-space with at least one absorbing state. A vector of covariates will also be associated with this unit. Of major interest for this unit is the time-to-absorption of its health status process, which could be viewed as the unit's lifetime. Aside from being affected by its covariate vector, there could be associations among the recurrent competing risks processes, the longitudinal marker process, and the health status process in the sense that the time-evolution of each process is associated with the other processes. To obtain more realistic models and enhance inferential performance, a joint dynamic stochastic model for these components is proposed and statistical inference methods are developed. This joint model, formulated via counting processes and continuous-time Markov chains, has the potential of facilitating 'personalized' interventions. This could enhance, for example, the implementation and adoption of precision medicine in medical settings. Semi-parametric and likelihood-based inferential methods for the model parameters are developed when a sample of these units is available. Finite-sample and asymptotic properties of estimators of model parameters, both finite- and infinite-dimensional, are obtained analytically or through simulation studies. The developed procedures are illustrated using a real data set. Consider a unit - for example, a human subject in a medical or a social science study, an experimental animal in a biological experiment, a machine in an engineering or reliability setting, or a company in an economic or business situation - in a longitudinal study monitored over a period [0, τ ], where τ is possibly random. Associated with the unit is a covariate row vector X = (X 1 , X 2 , . . . , X p ). Over time, the unit will experience occurrences of Q competing types of recurrent events, its recurrent competing risks (RCR) component; transitions of a longitudinal marker (LM) process W (t) over a discrete state space W; and transitions of a 'health' status (HS) process V (t) over a discrete state space V = V 0 ∪ V 1 , with V 0 ≠ ∅ the set of absorbing states. If the health status process transitions into an absorbing state prior to τ , then monitoring of the unit ceases, so the time-to-absorption serves as the lifetime of the unit. The two panels of Figure 1 demonstrate this pictorially by showing realized data observables from two distinct study units, with v 0 denoting an absorbing state: the first panel is for a unit which was right-censored prior to reaching an absorbing state, while the second panel is for a unit which reached an absorbing state prior to getting right-censored. Thus, in panel 1, the unit did not transition to an absorbing state prior to reaching τ ; whereas in panel 2, the unit reached an absorbing state prior to τ .
Two major questions arise: (a) how do we specify a dynamic stochastic model that could be a generative model for such a data, and (b) how do we make statistical inferences for the model parameters and predictions of a unit's lifetime from a sample of such data? To address these questions, the major goals of this paper are (i) to propose a joint stochastic model for the random observables consisting of the RCR, LM, and HS components for such units, and (ii) to develop appropriate statistical inference methods for the proposed joint model when a sample of units are observed. Achieving these two goals will enable statistical prediction of the (remaining) lifetime of a, possibly new, unit; allow for the examination of the synergistic association among the RCR, LM, and HS components; and provide a vehicle to compare different groups of units and/or study the effects of concomitant variables or factors. More importantly, a joint stochastic model endowed with proper statistical inference methods could potentially enable unit-based interventions which are performed after a recurrent event occurrence or a transition in either the LM or HS processes. As such it could enhance the implementation of precision or personalized decision-making; for instance, precision medicine in a medical setting. A specific situation where such a data accrual occurs is in a medical study. For example, a subject may experience different types of recurring cancer, with the longitudinal marker being the prostate-specific antigen (PSA) level categorized into a finite ordinal set, while the health status is categorized into either a healthy, diseased, or dead state, with the last state absorbing. A variety of situations in a biomedical, engineering, public health, sociology, and economics settings, where such data structure arise, are further described in Section 2. Several previous works dealing with modeling have either focused in the marginal modeling of each of the three data components, or in the joint modeling of two of the three data components. In this paper we tackle the problem of simultaneously modeling all three data components: RCR, LM, and HS, in order to account for the associations among these components, which would not be possible using either the marginal modeling approach or the joint modeling of pairwise combinations of these three components. A joint full model could also subsume these previous marginal or joint models -in fact, our proposed class of models subsumes as special cases models that have been considered in the literature. In contrast, only by imposing restrictive assumptions, such as the independence of the three model components, could one obtain a joint full model from marginal or pairwise joint models. As such, a joint full model will be less likely to be mis-specified, thereby reducing the potential biases that could accrue from mis-specified models among estimators of model parameters or when predicting residual lifetime. A joint modeling approach has been extensively employed in previous works. For instance, joint models for an LM process and a survival or time-to-event (TE) process have been proposed in [29] , [32] , [27] , [16] , [30] , and [22] . Also, the joint modeling of an LM process and a recurrent event process has also been discussed in [15] and [11] , while the joint modeling of a recurrent event and a lifetime has also been done such as in [19] . 
An important and critical theoretical aspect that could not be ignored in these settings is that when an event occurrence is terminal (e.g., death) or when there is a finite monitoring period, informative censoring naturally occurs in the RCR, LM, or HS components, since when a terminal event or when the end of monitoring is reached, then the next recurrent event occurrence, the next LM transition, or the next HS transition will not be observed. Another aspect that needs to be taken into account in a dynamic modeling approach is that of performed interventions, usually upon the occurrence of a recurrent event. For instance, in engineering or reliability systems, when a component in the system fails, this component will either be minimally or partially repaired, or altogether replaced with a new component (called a perfect repair); while, with human subjects, when a cancer relapses, a hospital infection transpires, a gout flare-up, or alcoholism recurs, some form of intervention will be performed or instituted. Such interventions will impact the time to the next occurrence of the event, hence it is critical that such intervention effects be reflected in the model; see, for instance, [14] and [15] . In addition, models should take into consideration the impacts of the covariates and the effects of accumulating event occurrences on the unit. Models that take into account these considerations have been studied in [23] and [25] . Appropriate statistical inference procedures for these dynamic models of recurrent events and competing risks have been developed in [23] and [28] . Extensions of these joint dynamic models for both RCR and TE can be found in [20] . Some other recent works in joint modeling included the modeling of the three processes: LM, RCR (mostly, a single recurrent event), and TE simultaneously in [17] , [7] , [18] , [21] and [5] . The joint model that will be proposed in this paper will take into consideration these important aspects. We now outline the remainder of this paper. Prior to describing formally the joint model in Section 3, we first present in Section 2 some concrete situations in science, medicine, engineering, social, and economic disciplines where the data accrual described above could arise and where the joint model will be relevant. Section 3 formally describes the joint model using counting processes and continuous-time Markov chains (CTMCs), and provide interpretations of model parameters. In subsection 3.3 we discuss in some detail a special case of this joint model obtained using independent Poisson processes and homogeneous CTMCs. Section 4 deals with the estimation of the parameters. In subsection 4.1 we demonstrate the estimation of the model parameters in the afore-mentioned special case to pinpoint some intricacies of joint modeling and its inferential aspects. The general joint model contains nonparametric (infinite-dimensional) parameters, so in subsection 4.2 we will describe a semi-parametric estimation procedure for this general model. Section 5 will present asymptotic properties of the estimators, though we will not present in this paper the rigorous proofs of these analytical results but defer them to a more theoretically-focused paper. Section 6 will then demonstrate the semi-parametric estimation approach through the use of a synthetic or simulated data using an R [26] program we developed. 
In Section 7, we perform simulation studies to investigate the finite-sample properties of the estimators arising from semi-parametric estimation procedure and compare these finite-sample results to the theoretical asymptotic results. An illustration of the semi-parametric inference procedure using a real medical data set is presented in Section 8. Section 9 contains concluding remarks describing some open research problems. To demonstrate potential applicability of the proposed joint model, we describe in this section some concrete situations arising in biomedical, reliability, engineering, and socio-economic settings where the data accrual described in Section 1 could arise. • A Medical Example: Gout is a form of arthritis characterized by sudden and severe attacks of pain, swelling, redness and tenderness in one of more joints in the toes, ankles, knees, elbows, wrists, and fingers (see, for instance, Mayo Clinic publications about gout). When a gout flare occurs, it renders the person incapacitated (personally attested by the senior author) and the debilitating condition may last for several days. Since the location of the gout flare could vary, we may consider gout as competing recurrent events -competing with respect to the location of the flare, and recurrent since it could keep coming back. Gout occurs when urate crystals accumulate in the joints, which in turn is associated with high levels of uric acid in the blood. Other covariates, such as gender, blood pressure, weight, etc., could also impact the occurrence of gout flares, uric acid level, and CKD. When a gout flare occurs, lifestyle interventions could be performed such as (i) consuming skim milk powder enriched with the two dairy products glycomacropeptide (GMP) and G600 milk fat extract; or (ii) consuming standard milk or lactose powder. The purpose of such interventions is to lessen gout flare recurrences. Of major interest is to jointly model the competing gout recurrences, the categorized SUR process, and the CKD process. A study consisting of n subjects could be performed with each subject monitored over some period, with the time of gout flare recurrences of each type, SUR levels, and CKD states recorded over time, aside from relevant covariates. Based on such a data, it will be of interest to estimate the model parameters and to develop a prediction procedure for time-to-absorption to End Stage CKD for a person with gout recurrences. • A Reliability Example: Observe n independent cars over each of their own monitoring period until the car is declared inoperable or the monitoring period ends. Cars are complex systems in that they are constituted by different components, which could be subsystems or modules, configured according to some coherent structure function [3] . For each car, the states of Q components (such as its engine subsystem; transmission subsystem; brake subsystem; electrical subsystem; etc.) are monitored. Furthermore, its covariates such as weight; initial mileage; current mileage; years of operation; and other characteristics (for example, climate in which car is mostly being driven) are observed. Also, its 'health status', which is either functioning, functioning with some problems, or total inoperability (an absorbing state), is tracked over the monitoring period. Meanwhile, a longitudinal marker such as its oil quality indicator (which is either excellent; good; or poor) and the occurrences of failures of any of the Q components are also recorded over the monitoring period. 
When a component failure occurs, a repair or replacement of the component is undertaken. Given the data for these n cars, an important goal is to predict the time to inoperability of another car. Note that this type of application could occur in more complex systems such as space satellites, nuclear power plants, medical equipment, etc. • A Social Science Example: Observe n independent newly-married couples over a period of years (say, 20 years). Over this follow-up period, the marriage could end in separation or divorce, remain intact, or end due to the death of at least one of them. Each couple will have certain characteristics: their ages when they got married; working status of each; income level of each; education level of each; number of children (a time-dependent covariate); net worth of the couple; etc. Possibly through a regularly administered questionnaire, the couple provides information from which their "marriage status" could be ascertained (either very satisfied; satisfied; poor; separated or divorced). Competing recurrent events for the couple could be changes in job status for either spouse; additions to the family; educational changes of children; and major disagreements. A longitudinal marker could be the financial health status of the couple reflected by their categorized FICO scores. A goal is to infer about the parameters of the joint model based on the observations from these n couples, and to predict whether separation or divorce will occur for a married couple not in the original sample, and if so, to obtain a prediction interval for the time of such an occurrence. • A Financial Example: Track n independent publicly-listed companies over their monitoring periods. At the end of its monitoring period, a company could be bankrupt, still in business, or could have been acquired by another company. Each company has its own characteristics, such as total assets, number of employees, number of branches, etc. Note that these are all time-dependent characteristics. The "health status" of a company is rated according to four categories (A: Exceptional; B: Less than Exceptional; C: Distressed; D: Bankrupt). The bankrupt status is the absorbing state. The company's liability relative to its assets, categorized into High, Medium, Low, or Non-Existent, could be an important longitudinal marker. Recurrent competing risks would be the occurrences of an increase (decrease) of at least 5% during a trading day in its stock share price. Based on data from a sample of these companies, it could be of interest to predict the time to bankruptcy of another company that is not in the sample. • A COVID-19 Example. Let τ , the end of monitoring period, have a distribution function G(·), which may be degenerate. The covariate vector will be X = (X 1 , . . . , X p ), assumed to be time-independent, though the extension to time-dependent covariates is possible with additional assumptions. For the RCR component, let N R = {N R (s) = (N R q (s), q ∈ I Q ) T : s ≥ 0}, with I Q = {1, . . . , Q}, be a Q-dimensional multivariate counting (column) vector process such that, for q ∈ I Q , N R q (s) is the number of observed occurrences of the recurrent event of type q over [0, s], with N R q (0) = 0. Thus, the sample path s → N R q (s) takes values in Z 0,+ = {0, 1, 2, . . .}, is a non-decreasing step-function, and is right-continuous with left-hand limits. Similarly, for the LM component, we denote by N W = {N W (s; w 1 , w 2 ) : (w 1 , w 2 ) ∈ I W , s ≥ 0}, with I W = {(w 1 , w 2 ) ∈ W × W : w 1 ≠ w 2 }, the multivariate counting process whose component N W (s; w 1 , w 2 ) = Σ u≤s I(W (u−) = w 1 , W (u) = w 2 ) counts the observed transitions of the LM process from state w 1 into state w 2 over [0, s]; and, for the HS component, by N V = {N V (s; v 1 , v) : (v 1 , v) ∈ I V , s ≥ 0}, with I V = {(v 1 , v) ∈ V 1 × V : v 1 ≠ v}, the analogous multivariate counting process for the observed HS transitions, with I(·) denoting the indicator function.
Thus, for each (w 1 , w 2 ) ∈ I W , the sample path s → N W (s; w 1 , w 2 ) takes values in Z 0,+ , is a nondecreasing step-function, right-continuous with left-hand limits, and with N W (0; w 1 , w 2 ) = 0. Furthermore, Σ (w 1 ,w 2 )∈I W dN W (s; w 1 , w 2 ) ∈ {0, 1} for every s ≥ 0. For each (v 1 , v) ∈ I V , the sample path s → N V (s; v 1 , v) takes values in Z 0,+ , is a nondecreasing step-function, right-continuous with left-hand limits, and with N V (0; v 1 , v) = 0. In addition, Σ (v 1 ,v)∈I V dN V (s; v 1 , v) ∈ {0, 1} for every s ≥ 0. Next, we combine the multivariate counting processes N R , N W , and N V into one (column) multivariate counting process N = {N (s) : s ≥ 0} of dimension Q + |I W | + |I V |, where, with T denoting vector/matrix transpose, N (s) = (N R (s) T , N W (s) T , N V (s) T ) T . An important point needs to be stated regarding the observables in the study, which will have an impact on the interpretation of the parameters of the joint model. This pertains to the "competing risks" nature of all the possible events at each time point s. The possible Q recurrent event types, as well as the potential transitions in the LM and HS processes, are all competing with each other. Thus, suppose that at time s 0 , the event that occurred is a recurrent event of type q 0 , that is, dN R (s 0 ; q 0 ) = 1. This means that this event has occurred in the presence of the potential recurrent events from the other Q − 1 risks, and the potential transitions from either the LM or HS processes. This will entail the use of crude hazards, instead of net hazards, in the joint modeling, and this observation will play a critical role in the dynamic joint model since each of the competing event occurrences at a given time point s from all the possible event sources (RCR, LM, and HS) will be affected by the history of all these processes just before time s. This is the aspect that exemplifies the synergistic association among the three components. Another observable process for our joint model is a vector of effective (or virtual) age processes E = {(E 1 (s), . . . , E Q (s)) : s ≥ 0}, whose components are F-predictable processes with sample paths that are nonnegative, left-continuous, piecewise nondecreasing and differentiable. These effective age processes will carry the impact of interventions performed after each recurrent event occurrence or a transition in either the LM process or the HS process. For recent articles dealing with virtual ages, see the philosophically-oriented article [12] and the very recent review article [4] . Finally, we define the time-to-absorption of the unit to be τ A = inf{s ≥ 0 : V (s) ∈ V 0 } with the convention that inf ∅ = ∞. Using this τ A and τ , we define the unit's at-risk process to be Y = {Y (s) : s ≥ 0}, with Y (s) = I{min(τ, τ A ) ≥ s}. In addition, we define LM-specific and HS-specific at-risk processes as follows: For w ∈ W, define Y W (s; w) = I{W (s−) = w}; and, for v ∈ V 1 , define Y V (s; v) = I{V (s−) = v}. For a unit, we could then concisely summarize the random observables in terms of these stochastic processes, observed over [0, min(τ, τ A )]; this collection of observables constitutes the unit's data, displayed in (1). Note that the processes are undefined for s > min(τ A , τ ) ≡ inf{s ≥ 0 : Y (s+) = 0} since monitoring of the unit has by then ceased. The joint model specification will be through the specification of the compensator process vector and the predictable quadratic variation (PQV) process matrix of the multivariate counting process N . The compensator is the predictable process vector A = {A(s) : s ≥ 0}, with associated martingale vector M = N − A, both partitioned conformably with N into RCR, LM, and HS blocks, with A R and M R of dimension Q; A W and M W of dimension |I W |; and A V and M V of dimension |I V |.
The PQV matrix ⟨M, M⟩ could then be partitioned similarly to reflect these block components. We can now proceed with the specification of the compensator process vector and the PQV process matrix. For conciseness, we introduce the generic mapping ι defined as follows: For a set A with m elements, A = {a 1 , . . . , a m }, let ι A (a) = (I(a 2 = a), . . . , I(a m = a)), a row vector of m − 1 elements. Here ι A (a) is the indicator vector of a excluding the first element of A, so that ι A (a 1 ) = (0). Excluding a 1 in the ι mapping is for purposes of model identifiability. One may think of the mapping ι as a converter to dummy variables. We will also need the following quantities or functions. • For each q ∈ I Q there is a baseline (crude) hazard rate function λ 0q (·) with associated baseline (crude) cumulative hazard function Λ 0q (·). We also denote by F̄ 0q (t) = ∏ v∈[0,t] [1 − Λ 0q (dv)] the associated baseline (crude) survivor function, where ∏ denotes the product-integral. • For each q ∈ I Q there is a mapping ρ q (·; ·) : Z Q 0,+ × ℝ dq → ℝ 0,+ , where the d q 's are known positive integers. There is an associated vector α q ∈ ℝ dq . • There is a collection of non-negative real numbers η = {η(w 1 , w 2 ) : (w 1 , w 2 ) ∈ I W }, and we define for every w 1 ∈ W, η(w 1 , w 1 ) = − Σ w∈W; w≠w 1 η(w 1 , w). • There is a collection of non-negative real numbers ξ = {ξ(v 1 , v) : (v 1 , v) ∈ I V }, and we define for every v 1 ∈ V 1 , ξ(v 1 , v 1 ) = − Σ v∈V; v≠v 1 ξ(v 1 , v). We then define, for each of the three model components, the observable (internal covariate) processes B R , B W , and B V and their associated finite-dimensional regression parameter vectors θ R , θ W , and θ V ; these link the covariate vector X, the dummy-variable codings of the current LM and HS states via the ι mapping, and the accumulated RCR counts N R (s−) to the respective event intensities. In addition to the λ 0q 's, α q 's, η, and ξ, the θ R , θ W , and θ V constitute all of the model parameters. For the proposed model, the compensator process components are given, for q ∈ I Q , (w 1 , w 2 ) ∈ I W , and (v 1 , v) ∈ I V , by product forms combining the at-risk processes, the baseline hazard rates evaluated at the effective ages, the ρ q functions of the accumulated RCR event counts, and exponential link functions of the covariates and of the states of the other components (cf. equations (4)-(6) below). In the left-hand side of these equations, we have suppressed writing the dependence on the model parameters. With Dg(a) denoting the diagonal matrix formed from vector a, the PQV process is specified to be ⟨M, M⟩(ds) = Dg(A(ds)). Observe the dynamic nature of this model in that an event occurrence at an infinitesimal time interval [s, s + ds) depends on the history of the processes before time s. According to the theory of counting processes, we have the following probabilistic interpretations (statements hold almost surely): E{dN (s)|F s− } = A(ds) and Var{dN (s)|F s− } = Dg(A(ds)), together with the conditional covariance, given F s− , between any pair of distinct elements of dN (s) being equal to zero, e.g., Cov{dN R (s; q), dN W (s; w 1 , w 2 )|F s− } = 0. However, note that we are not assuming that the components of N R , N W , and N V are independent, nor even conditionally independent. A way to view this model is that, given F s− , the history just before time s, on the infinitesimal interval [s, s + ds) the increment vector dN (s) has, approximately, a multinomial-type distribution in which at most one component equals one, with event probabilities given by the corresponding components of A(ds). As such, for every s ≥ 0, the following constraint holds: dN R • (s) + dN W •• (s) + dN V •• (s) ∈ {0, 1}, with the notation that a subscript of • means the sum over the appropriate index set, e.g., dN R • (s) = Σ q∈I Q dN R (s; q). The multinomial distribution above could actually be approximated by independent Bernoulli distributions. To see this, if we have real numbers p k , k = 1, . . . , K, with 0 < p k ≈ 0 for each k = 1, . . . , K, and with Σ K k=1 p k ≈ 0, then we have the approximations p k ∏ j≠k (1 − p j ) ≈ p k and ∏ K k=1 (1 − p k ) ≈ 1 − Σ K k=1 p k . Consequently, the multinomial probability of observing a single event of a given type (or of observing no event) in the interval is approximately the corresponding product of (independent) Bernoulli probabilities. This approximate equivalence informs the manner in which we generate data from the model later in the sections dealing with an illustrative data set (Section 6) and the simulation studies (Section 7) where we used this product of Bernoulli approach.
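As a quick numerical illustration of this approximation (a sketch with arbitrary illustrative rates, not values used elsewhere in the paper), one can compare the single-draw multinomial probabilities over a short interval with the corresponding products of Bernoulli probabilities:

```r
ds    <- 0.001
rates <- c(0.8, 1.2, 0.5, 2.0, 1.5)   # hypothetical crude event rates (RCR, LM, and HS sources combined)
p     <- rates * ds                    # per-interval event probabilities, all close to zero

# Single-draw multinomial ("at most one event per interval"):
#   P(event k) = p[k],  P(no event) = 1 - sum(p)
p_multi_event <- p
p_multi_none  <- 1 - sum(p)

# Independent Bernoulli approximation:
#   P(only event k) = p[k] * prod(1 - p[-k]),  P(no event) = prod(1 - p)
p_bern_event <- sapply(seq_along(p), function(k) p[k] * prod(1 - p[-k]))
p_bern_none  <- prod(1 - p)

max(abs(p_multi_event - p_bern_event))   # discrepancies are of order ds^2
abs(p_multi_none - p_bern_none)
```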
Consider a unit that is still at risk just before time s whose LM process is at state w 1 and HS process at state v 1 / ∈ V 0 . Two questions of interest are: (a) What is the distribution of the next event occurrence? (b) Given in addition that an event occurred at time s + t, what are the conditional probabilities of each of the possible events? Denote by T the time to the next event occurrence starting from time s. Then, with the second equality obtained by invoking the product-integral identity , and the last equality arising since, prior to the next event, there will be no changes in the values of N R , B R , B W , and B V from their respective values just before time s. In particular, if the λ 0q s are constants, corresponding to the hazard rates of an exponential distribution, then the distribution of the time to the next event occurrence is exponential. Given that the event occurred at time s + t, then the conditional probability that it was an RCR-type q event is Similarly, the conditional probability that it was a transition to state w 2 for the LM process is , and the conditional probability that is was a transition to state v, possibly to an absorbing state, for the HS process is . These probabilities demonstrate the competing risks nature of the different possible events. They also provide a computational approach to iteratively generate data from the joint model for use in simulation studies, with the basic idea being to first generate a time to any type of event, then to mark the type of event or update each of the counting processes by using the above conditional probabilities. Denoting by Θ the set of all parameters of the model, the likelihood function arising from observing D, with p (W,V ) (·, ·) the initial joint probability mass function of (W (0), V (0)), is given by The likelihood in (2) could be rewritten in the form Let us examine the meaning of Recall that η(w 1 , w 1 ) and ξ(v 1 , v 1 ) are non-positive real numbers. Thus, T (s)ds could be interpreted as the total risk of an event, either a recurrent event in the RCR component or a transition in the LM or HS components, occurring from all possible sources (RCR, LM, HS) that the unit is exposed to at the infinitesimal time interval [s, s + ds), given the history F s− just before time s. We provide further explanations of the elements of the joint model. First, there is a tacit assumption that no more than one event of any type could occur at any given time s. Second, the event rate at any time point s for any type of event is in the presence of the other possible risk events. Thus, consider a specific q 0 ∈ I Q and assume that the unit is still at-risk at time s 0 . Then, is the conditional probability, given F s0− , that an event occurred at [s 0 , s 0 + ds 0 ) and is of RCR type q 0 and all other event types did not occur, which is the essence of what is called a crude hazard rate, instead of a net hazard rate. Third, the effective (or virtual) age functions E q (·)s, which are assumed to be dynamically determined and not dependent on any unknown parameters, encodes the impact of performed interventions that are performed after each event occurrence. Several possible choices of these functions are: • E q (s) = s for all s ≥ 0 and q ∈ I Q . This is usually referred to as a minimal repair intervention, corresponding to the situation where an intervention simply puts back the system at the age just before the event occurrence. . are the successive event times. 
This corresponds to a perfect intervention, which has the effect of re-setting the time to zero for each of the RCRs after each event occurrence. In a reliability setting, this means that all Q components (unless having exponential lifetimes) are replaced by corresponding identical, but new, components at each event occurrence. . are the successive event times of the occurrences of the RCR events. . are the successive event times of the occurrences of RCR events of type q. • Other general forms are possible, such as those in [9] and [14] , the latter employing ideas of Kijima. See also the discussion on the 'reality' of virtual or effective ages in the paper by [12] , as well as the recent review paper by [4] . Fourth, the impact of accumulating RCR event occurrences, which could be adverse, but could also be beneficial as in software engineering applications, is incorporated in the model through the ρ q (·; ·) functions. One possible choice is an exponential function, such as ρ q (N R (s−); α q ) = exp{N R (s−) T α q }, but other choices could be made as well. Finally, the modulating exponential link function in the model is for the impact of the covariates as well as the values of the RCR, LM, and HS processes just before the time of interest, with the vector of coefficients quantifying the effects of the covariates. The use of N (s−) in the model could be viewed as using them as internal covariates, or that the dynamic model is a self-exciting model. {η(w 1 , w 2 ) : (w 1 , w 2 ) ∈ I W } and {ξ(v 1 , v) : (v 1 , v) ∈ I V }. Thus, if just before time s 0 , W (s 0 −) = w 1 and V (s 0 −) = v 1 , indicated by F s0− (w 1 , v 1 ), then P{W (s 0 + ds 0 ) = w 2 |F s0 (w 1 , v 1 )} ≈ (5) η(w 1 , w 2 ) exp{Xβ W + γ W j(v1) + N R (s 0 −)ν W }ds 0 ; P{V (s 0 + ds 0 ) = v|F s0 (w 1 , v 1 )} ≈ (6) ξ(v 1 , v) exp{Xβ V + κ V j(w1) + N R (s 0 −)ν V }ds 0 , where j(v 1 ) is the index associated with v 1 in V 1 and j(w 1 ) is the index associated with w 1 in W. There is a special case arising from this general joint model obtained when we set λ 0q (s) = λ 0q , q ∈ I Q ; ρ q = 1; θ R = 0; θ W = 0; and θ V = 0. In this situation, we have It is easy to see that this particular model coincides with the model where we have the following situations: (i) N R (·; q), q ∈ I Q , are independent homogeneous Poisson processes with respective rates λ 0q , q ∈ I Q ; (iv) N R , W , and V are independent; and (v) Processes are observed over [0, min(τ, τ A )], where τ is the end of monitoring period, while τ A is the absorption time of V into V 0 . In this special situation, the λ 0q 's are both crude and net hazard rates. Also, due to the memoryless property of the exponential distribution, interventions performed after each event occurrence will have no impact in the succeeding event occurrences. This specific joint model further allows us to provide an operational interpretation of the model parameters. Thus, suppose that at time s, the LM process is at state w 1 and the HS process is at state v 1 / ∈ V 0 . Then, the distribution of the time to the next event occurrence of any type (the holding or sojourn time at the current state configuration) has an exponential distribution with parameter C = λ 0• − η(w 1 , w 1 ) − ξ(v 1 , v 1 ). When an event occurs, then the (conditional) probability that it is (i) an RCR event of type q is λ 0q /C; (ii) a transition to LM state w 2 = w 1 is η(w 1 , w 2 )/C; or (iii) a transition to an HS state v = v 1 is ξ(v 1 , v)/C. 
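A minimal sketch of this two-step description for the special case (an exponential sojourn with rate C followed by a categorical draw over the competing event types) is given below; the rate values are illustrative, and the current LM and HS states (w 1 , v 1 ) are assumed fixed:

```r
# Hypothetical special-case rates for the current state configuration (w1, v1)
lambda0 <- c(0.9, 1.1, 1.0)   # RCR rates lambda_0q, q = 1, 2, 3
eta_out <- c(0.4, 0.6)        # LM transition rates out of w1, i.e. eta(w1, w2) for w2 != w1
xi_out  <- c(0.3, 0.2)        # HS transition rates out of v1, i.e. xi(v1, v) for v != v1

C <- sum(lambda0) + sum(eta_out) + sum(xi_out)   # C = lambda_0. - eta(w1, w1) - xi(v1, v1)

sojourn <- rexp(1, rate = C)                           # holding time at the current configuration
probs   <- c(lambda0, eta_out, xi_out) / C             # probabilities lambda_0q/C, eta/C, xi/C
event   <- sample(seq_along(probs), 1, prob = probs)   # which competing event occurred
```

Iterating this update, and stopping at min(τ, τ A ), produces a full sample path for a unit under this special case.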
This is the essence of the competing risks nature of all the possible event types: an RCR event, an LM transition, and an HS transition. As such, the more general model could be viewed as an extension of this basic model with independent Poisson processes for the RCR component and CTMCs for the LM and HS components. For this special case, the likelihood function in (3) simplifies to the expression where . Note that T (s) = T (s; λ 0 , η, ξ), that is, it is a quantity instead of a statistic. Here X 1 ∼ BER(0.5), X 2 ∼ N (0, 1). X 1 and X 2 are generated independently of each other. Panel 2: Recurrent competing risks occurrences with three types of competing risks. Each unit is either censored ("+") or reaches the absorbing status ("×"). Panel 3: Marker processes. Panel 4: Health status processes, with state "1" absorbing. Having introduced the joint model, we now address in this section the problem of making inferences about the model parameters. We assume that we are able to observe n independent units, with the ith unit having data (1). The full sample data will then be represented by while the model parameters will be represented by, with the convention that q ∈ I Q , (w 1 , w 2 ) ∈ I W , and The λ 0q s could be parametrically-specified, hence will have finite-dimensional parameters, so Θ will also then be finite-dimensional. Except for the special case mentioned above, our main focus will be the case where the λ 0q s are nonparametric. The distributions, G i s, of the end of monitoring periods, τ i s, also have model parameters, but they are not of main interest. To visualize the type of sample data set that accrues, Figure 2 provides a picture of a simulated sample data with n = 50 units. The full likelihood function, given D, is (3). If the λ 0q (·)s are parametrically-specified, then estimators of the finite-dimensional model parameters could be obtained as the maximizers of this full likelihood function, and their finite and asymptotic properties will follow from the general results for maximum likelihood estimators based on counting processes; see, for instance, [6] and [1] . We illustrate this situation for the special case of the model given in subsection 3.3, so that the parameter is simply In this situation, from (7), the full likelihood reduces to, with τ * i = τ i ∧ τ iA , Equating these equations to zeros yield the ML estimators of the parameters, which are given below and possess the interpretation of being the observed "occurrence-exposure" rates. In these estimators, the numerators are total event counts, e.g., is the total number of observed transitions in the LM process from state w 1 into w 2 ; and is the total number of observed transitions in the HS process from state v 1 into v. On the other hand, the denominators are total observed exposure times, e.g., dw is the total observed time of all units that they were at-risk for a transition in the LM process from state w 1 ; and ds is the total observed time of all units that they were at-risk for a transition in the HS process from state v 1 . An important and crucial point to emphasize here is that in these exposure times, they all take into account the time after the last observed events in each component process until the end of monitoring, whether it is a censoring (reaching τ i ) or an absorption (reaching τ iA ). If one ignores these right-censored times, then the estimators could be severely biased. 
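Computationally, these occurrence-exposure estimators are simple ratios once the event counts and the exposure times are tabulated, with the exposure of each unit running all the way to min(τ i , τ iA ) so that the time after the last observed event is included. A minimal sketch, assuming hypothetical per-unit data frames of events and occupation times, is:

```r
# Hypothetical inputs:
#   rcr_events: data frame (id, q)        -- one row per observed RCR event of type q
#   lm_trans  : data frame (id, w1, w2)   -- one row per observed LM transition w1 -> w2
#   lm_occup  : data frame (id, w1, time) -- time unit id spent in LM state w1 up to min(tau_i, tauA_i)
#   followup  : data frame (id, tstar)    -- tstar = min(tau_i, tauA_i)

# RCR: lambda0q_hat = (total number of type-q events) / (total at-risk time)
total_exposure <- sum(followup$tstar)
lambda0_hat    <- table(rcr_events$q) / total_exposure

# LM: eta_hat(w1, w2) = (total transitions w1 -> w2) / (total occupation time in w1)
n_trans  <- xtabs(~ w1 + w2, data = lm_trans)
occ_time <- xtabs(time ~ w1, data = lm_occup)      # assumes the same ordering of w1 levels as n_trans
eta_hat  <- sweep(n_trans, 1, occ_time, "/")       # divide each row by the occupation time of its w1
```

The ξ̂(v 1 , v)s are computed in exactly the same way from the HS transition counts and HS occupation times; in all cases the denominators include the right-censored time after the last observed event.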
This is a critical aspect we mentioned in the introductory section and re-iterate at this point that this should not be glossed over when dealing with recurrent event models. The elements of the observed information matrix, I(Θ; D) = −∇ Θ T U (Θ|D), which is a diagonal matrix, have diagonal elements given by: Abbreviating the estimators intoΘ = (λ 0 ,η,ξ), we obtain the asymptotic result, that as n → ∞, with AsyMVN meaning asymptotically multivariate normal. Thus, this result seems to indicate that the RCR, LM, and HS components or the estimators of their respective parameters do not have anything to do with each other, which appears intuitive since the RCR, LM, and HS processes were assumed to be independent processes to begin with. But, let us examine this issue further. The result in (9) is an approximation to the theoretical result that where 1 n I(Θ; D) pr → I(Θ). Evidently, I is a diagonal matrix, so let us examine its diagonal elements. Let q ∈ I Q . Then we have, with 'pr-lim' denoting in-probability limit, The first term on the right-hand side (RHS) converges in probability to zero by the weak law of large numbers and the zero-mean martingale property. The second term in the RHS converges in probability to its expectation, hence But, now, The last probability term above will depend on the generators so that the theoretical Fisher information or the asymptotic variance associated with the estimatorλ 0q depends after all on the HS process, as well as on the G i s, contrary to the seemingly intuitive expectation that it should not depend on the LM and HS processes. This result is a subtle one which arise because of the structure of the observation processes. If G i = G, i = 1, . . . , n, then we have that . . , n. If we denote by Γ the generator matrix of {V (s) : s ≥ 0} and let Γ 1 be the sub-matrix associated with the V 1 states, then if p V 0 ≡ (p V (v 1 ), v 1 ∈ V 1 ) T is the initial probability mass function of V (0), we have where the matrix exponential is e sΓ11 ≡ ∞ k=0 s k Γ k 11 k! and 1 K is a column vector of 1s of dimension K. Thus, we obtain For example, ifḠ(s) = exp(−νs), that is, τ i 's are exponentially-distributed with mean 1/ν, then the above expression simplifies to For computational purposes, one may use an eigenvalue decomposition of Γ 11 : Γ 11 = U Dg(d)U −1 where d consists of the eigenvalues of Γ 11 and U is the matrix of eigenvectors associated with the eigenvalues d. The main point of this example though is the demonstration that estimators of the parameters associated with the RCR, LM, or HS process will depend on features of the other processes, even when one starts with independent processes. We remark that the estimatorsλ 0q s,η(w 1 , w 2 )s, andξ(v 1 , v)s could also be derived as method-of-moments estimators using the martingale structure. The inverse of the observed Fisher information matrix coincides with an estimator using the optional variation (OV) matrix process, while the inverse of the Fisher information matrix coincides with the limit-in-probability of the predictable quadratic variation matrix. To demonstrate for λ 0q , we have that is a zero-mean square-integrable martingale. Letting s → ∞, setting n i=1 M R i (∞; q) = 0, and solving for λ 0q yieldŝ λ 0q . Next, we haveλ We have already seen where ∞ 0 1 n n i=1 Y i (s)ds converges in probability, whereas by the Martingale Central Limit Theorem, we have that which is the same result stated above using ML theory. 
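To make the preceding computation concrete, the limiting exposure term ∫ 0 ∞ Ḡ(s) p V 0 T e sΓ 11 1 ds can be evaluated through the eigendecomposition of Γ 11 and, for exponentially distributed monitoring times, checked against the closed form p V 0 T (νI − Γ 11 ) −1 1, which follows from the identity ∫ 0 ∞ e −νs e sΓ 11 ds = (νI − Γ 11 ) −1 . The sub-generator and initial distribution below are illustrative values, not quantities from the paper:

```r
# Illustrative sub-generator over two transient HS states and an initial distribution over them
Gamma11 <- matrix(c(-0.7,  0.2,
                     0.3, -0.5), nrow = 2, byrow = TRUE)
p0 <- c(0.5, 0.5)    # initial probability mass function of V(0) over V1
nu <- 0.2            # tau ~ EXP with rate nu, so Gbar(s) = exp(-nu * s)

# Route 1: eigendecomposition of Gamma11, then numerical integration of the exposure integrand
ed   <- eigen(Gamma11)
U    <- ed$vectors; d <- ed$values; Uinv <- solve(U)
f    <- function(s) {
  Es <- U %*% diag(exp(s * d)) %*% Uinv                   # matrix exponential e^{s * Gamma11}
  exp(-nu * s) * as.numeric(t(p0) %*% Es %*% rep(1, 2))
}
num  <- integrate(Vectorize(f), 0, Inf)$value

# Route 2: closed form available when Gbar(s) = exp(-nu * s)
cf   <- as.numeric(t(p0) %*% solve(nu * diag(2) - Gamma11) %*% rep(1, 2))
c(numerical = num, closed_form = cf)   # the two should agree up to integration error
```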
Analogous asymptotic derivations can be done for η̂(w 1 , w 2 ) and ξ̂(v 1 , v), though the resulting limiting variances will involve expected occupation times for the respective states of the W i -processes, coming from the Y W i (·; w 1 ) terms, and of the V i -processes, coming from the Y V i (·; v 1 ) terms. Note that, asymptotically, these estimators are independent, but their limiting variances depend on the parameters from the other processes. In this section we consider the estimation of model parameters when the hazard rate functions λ 0q (·)s are specified nonparametrically. We shall denote by Λ 0q (t) = ∫ 0 t λ 0q (u) du, q ∈ I Q , the associated cumulative hazard functions. To simplify notation, we introduce shorthand for the exponential link terms of the three model components (the ψ functions appearing in the sequel); using these functions, for q ∈ I Q , (w 1 , w 2 ) ∈ I W , and (v 1 , v) ∈ I V , the unit-level compensators take the product forms used below. We also abbreviate the vector of model parameters into Θ ≡ (Λ 0 , α, η, ξ, θ R , θ W , θ V ). Our goal is to obtain estimators for these parameters based on the sample data D = (D 1 , D 2 , . . . , D n ). In a nutshell, the basic approach to obtaining our estimators is to first assume that (α, θ R , θ W , θ V ) are known, then obtain 'estimators' of (Λ 0 , η, ξ). Having obtained these 'estimators', in quotes since they are not yet estimators when (α, θ R , θ W , θ V ) are unknown, we plug them into the likelihood function to obtain a profile likelihood function. From the resulting profile likelihood function, which depends on (α, θ R , θ W , θ V ), we obtain its maximizers with respect to these finite-dimensional parameters to obtain their estimators. These estimators are then plugged into the 'estimators' of (Λ 0 , η, ξ) to obtain their estimators. The full likelihood function based on the sample data D = (D 1 , . . . , D n ) could be written as a product of three "major" likelihood functions corresponding to the three model components, L(Θ|D) = L R × L W × L V , where, suppressing the writing of the parameters in the functions, L R , L W , and L V collect the factors involving the RCR, LM, and HS components, respectively. Let 0 = S 0 < S 1 < S 2 < . . . < S K < S K+1 = ∞ be the ordered distinct times of any type of event occurrence for all the n sample units. Also, let 0 = T 0 < T 1 < T 2 < . . . < T L < T L+1 = ∞ be the ordered distinct values of the set {E iq (S j ) : i = 1, . . . , n; q ∈ I Q ; j = 0, 1, . . . , K}. Recall that τ * i = τ i ∧ τ iA . Observe that both {S k : k = 0, 1, . . . , K, K + 1} and {T l : l = 0, 1, . . . , L, L + 1} partition [0, ∞). For each i = 1, . . . , n, and q ∈ I Q , E iq (·) is observed, hence defined, only on [0, τ * i ). However, for notational convenience, we define E iq (s) = 0 for s > τ * i . In addition, on each non-empty interval (S j−1 ∧ τ * i , S j ∧ τ * i ], E iq (·) has an inverse which will be denoted by E −1 iqj (·). Henceforth, for brevity of notation, we adopt the mathematically imprecise convention that 0/0 = 0. Proposition 4.1. For q ∈ I Q , if (α q , θ R ) is known, then the nonparametric maximum likelihood estimator (NPMLE) of Λ 0q (·) is given by the step-function Λ̂ 0q (t|α q , θ R ) = Σ {l: T l ≤ t} dN R • (∞, T l ; q)/S 0R q (T l |α q , θ R ), where dN R • (∞, T l ; q) is the total number of type-q events observed at effective age T l and S 0R q (·|α q , θ R ) is the generalized at-risk function defined below. Proof.
The likelihood L R q (Λ 0q , α q , θ R |D) could be written as follows: Focusing on the nonparametric parameter Λ 0q (·), the first term of L R q could be written as The exponent in the second term of L R q could be written as follows, the second equality obtained after an obvious change-of-variable and using the definition of S 0R q (·; ·, ·) in the proposition: Therefore, L R q , when viewed solely in terms of the parameter Λ 0q equals Since L+1 l=1 u∈(T l−1 ,T l ) S 0R q (u|α q , θ R )dΛ 0q (u) ≥ 0, then we obtain the upper bound for L R q by setting this term to be equal to zero: The upper bound is maximized by setting , l = 1, 2, . . . , L. For u ∈ (T l−1 , T l ), we then takeΛ 0q (u|α q , θ R ) =Λ 0q (T l−1 |α q , θ R ) which will satisfy the condition u∈(T l−1 ,T l ) S 0R q (u|α q , θ R )dΛ 0q (u) = 0 for all l = 1, 2, . . . , L + 1. Thus, which is a step-function with jumps only on T l s with dN R • (∞, T l ; q) > 0, maximizes Λ 0q (·) → L R q (Λ 0q (·)|α q , θ R |D) for given (α q , θ R ), completing the proof of the proposition. A more elegant representation of the Aalen-Breslow-Nelson type estimatorΛ 0q (·; α q , θ R ) in Proposition 4.1, which shows that the estimator is also moment-based, aside from being useful in obtaining finite and asymptotic properties, is through the use doubly-indexed processes as in [25] for a setting with only one recurrent event type and without LM and HS processes. Define the doubly-indexed processes Then, for fixed t, Proof. Similar to steps in the proof of Proposition 4.1. By first assuming that (α q , θ R ) is known, then from the identities in Proposition 4.2, a method-of-moments type estimator of Λ 0q (t) is given bŷ When s → ∞, thisΛ 0q (s, t|α q , θ R ) converges to the estimatorΛ 0q (t|α q , θ R ) in Proposition 4.1. Next, we obtain estimators of the η(w 1 , w 2 )s and ξ(v 1 , v)s, again by first assuming first that θ W and θ V are known. Proposition 4.3. If (θ W , θ V ) are known, the ML estimators of the η(w 1 , w 2 )s and ξ(v 1 , v)s are the "occurrenceexposure" ratesη where Proof. Follows immediately by maximizing the likelihood functions L W and L V with respect to the η(w 1 , w 2 )s and ξ(v 1 , v)s, respectively. We can now form the profile likelihoods for the parameters ({α q , q ∈ I Q }, θ R , θ W , θ V ). These are the likelihoods that are obtained after plugging-in the 'estimators'Λ 0q (·; α q , θ R )s,η(w 1 , w 2 )s, andξ(v 1 , v)s in the full likelihoods. The resulting profile likelihoods are reminiscent of the partial likelihood function in Cox's proportional hazards model [8, 2] . Proposition 4.4. The three profile likelihood functions L R pl , L W pl and L V pl are given by . with dN W i (S j ; w 1 , •) = w2∈W; w2 =w1 dN W i (S j ; w 1 , w 2 ), the number of transitions from state w 1 at time S j for unit i, and dN V i (S j ; v 1 , •) = v∈V; v =v1 dN V i (S j ; v 1 , v), the number of transitions from state v 1 at time S j for unit i. Proof. These follow immediately by plugging-in the 'estimators' into the three main likelihoods in (11) and then simplifying. From these three profile likelihoods, we could obtain estimators of the parameters α q s, θ R , θ W , and θ V as follows: (α q , q ∈ I Q ,θ R ) = arg max (αq,θ R ) L R pl (α q , q ∈ I Q , θ R |D); . Equivalently, these estimators are maximizers of the logarithm of the profile likelihoods. 
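In practice, these maximizations can be carried out with any general-purpose optimizer once the log-profile likelihoods are coded as functions of the finite-dimensional parameters. A minimal sketch using optim is given below; logLR_pl is a hypothetical user-written function evaluating log L R pl at par = (α, θ R ) given the data, and is not part of the paper's code:

```r
neg_logLR_pl <- function(par, data) -logLR_pl(par, data)   # logLR_pl: hypothetical log-profile likelihood

start <- rep(0, n_par)   # n_par = length of (alpha_1, ..., alpha_Q, theta_R); problem-specific
fit_R <- optim(start, neg_logLR_pl, data = D, method = "BFGS", hessian = TRUE)

alpha_thetaR_hat <- fit_R$par                # estimates of (alpha, theta_R)
se_hat <- sqrt(diag(solve(fit_R$hessian)))   # information-based standard errors
```

The same template applies to log L W pl and log L V pl for θ W and θ V ; alternatively, a Newton-Raphson iteration based on the profile score vectors and observed information matrices, as discussed next, can be used when analytic derivatives are coded.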
These log-profile likelihoods are more conveniently expressed in terms of stochastic integrals as follows: Associated with each of these log-profile likelihood functions are their profile score vector (the gradient or vector of partial derivatives) and profile observed information matrix (negative of the matrix of second partial derivatives): U R (α, θ R ); U W (θ W ) and U V (θ V ) for the score vectors, and I R (α, θ R ), I W (θ W ) and I V (θ V ) for the observed information matrices. The estimators could then be obtained as the solutions of the set of equations U R pl (α,θ R ) = 0; U W pl (θ W ) = 0; U V pl (θ V ) = 0. A possible computational approach to obtaining the estimates is via the Newton-Raphson iteration procedure: Having obtained the estimates of α, θ W and θ V , we plug them intoΛ 0q (t|α q , θ R )s, η(w 1 , w 2 |θ W )s and ξ(v 1 , v|θ V )s to obtain the estimatorsΛ 0q (t)s,η(w 1 , w 2 )s andξ(v 1 , v)s: In this section we provide some asymptotic properties of the estimators, though we do not present the rigorous proofs of the results due to space constraints and instead defer them to a separate paper. To make our exposition more concise and compact, we introduce additional notation. Consider a real-valued function h defined on I W . Then, h(w 1 , w 2 ) will represent the value at (w 1 , w 2 ), but when we simply write h it means the |I W | × 1 vector consisting of h(w 1 , w 2 ), (w 1 , w 2 ) ∈ I W . Thus, η is an |I W | × 1 (column) vector with elements η(w 1 , w 2 )s; N W i (s) is an |I W | × 1 vector consisting of N W i (s; w 1 , w 2 )s; S 0W (s|θ W ) is an |I W | × 1 vector with elements S 0W (s; w 1 |θ W ), (w 1 , w 2 ) ∈ I W ; and ψ W i (s|θ W ) is an |I W | × 1 vector consisting of the same elements. Similarly for those associated with the HS process where functions are defined over I V ; e.g., Let us first consider the profile likelihood function L W pl and the estimatorsθ W andη. Using the above notation, the associated profile log-likelihood function, score function, and observed information matrix could be written as follows: A '·' over a function means gradient with respect to the parameter vector, while a '··' over a function means the matrix of second-partial derivatives with respect to the element of the parameter vector. In obtaining I W pl , we also used the fact that ψ W i is an exponential function. We now present some results aboutθ W . We also let This will be a function of all the model parameters, since the limiting behavior of the Y i s and Y W i s will depend on all the model parameters, owing to the interplay among the RCR, LM, and HS processes, as could be seen in the special case of Poisson processes and CTMCs. We point out an important result needed for the proof of Proposition 5.1(iii), which is that . This asymptotic equivalence indicates where the involvement of the at-risk processes come into play in the limiting profile information matrix, hence the dependence on all the model parameters. This also shows that a natural consistent estimator of I W pl is I W pl (θ W )/n. Next, we are now in position to present asymptotic results aboutη. First, let s 0W and · s 0W be deterministic functions satisfying Proposition 5.2. 
Under certain regularity conditions, we have as n → ∞, (iii) (Joint Asymptotic Normality) √ n(η − η) and √ n(θ W − θ W ) are jointly asymptotically normal with means zeros and asymptotic covariance matrix We remark that in result (ii) for the asymptotic covariance matrix Σ in Proposition 5.2, the additional variance term in the right-hand side is the effect of plugging-in the estimatorθ W for θ W in the 'estimators'η(w 1 , w 2 |θ W )s to obtain η(w , w 2 )s. Without having to write them down explicitly, similar results are obtainable for the estimatorsθ V andξ. Next, we present results concerning the asymptotic properties of the estimators of α q s, θ R , and Λ 0q (·)s. Define the restricted profile likelihood for (α q , θ R ) based on data D observed over [0, s * ] via with S 0R q (·, ·|·, ·) as defined in Proposition 4.2. This is restricted in the sense that we only consider events that happened when the effective age are no more than t * and happened over [0, s * ]. Note that as we let t * → ∞ and s * → ∞, we obtain the profile likelihood in Proposition 4.4, The log-profile likelihood function is . The associated profile score function is where, for q ∈ I Q , . The predictable variation associated with this score function is . The estimatorsα q (s * , t * ), q = 1, . . . , Q, andθ R (s * , t * ) satisfy the equation U R pl (α,θ R |s * , t * ) = 0. The observed profile information matrix I R pl (α, θ R |s * , t * ) is a (Q + 1) × (Q + 1) symmetric block matrix with the following block elements, for q, q = 1, . . . , Q: . These are given by the following expressions, for q, q = 1, . . . , Q: where for the last expression we used the fact that ψ i (z) = exp(z). A condition needed for the asymptotic results is that there is an invertible matrix function I R pl (α, θ R |s * , t * ) which equals the in-probability limit of 1 n I R pl (α, θ R |s * , t * ) as n → ∞. Under this condition, we have the following consistency and asymptotic normality results for (α,θ R ): Proposition 5.3. Under regularity conditions and as n → ∞, Important equivalences to note are, for q, q = 1, . . . , Q: . In addition, the regularity conditions must imply that, we have These conditions imply that 1 n U R pl − 1 n I R pl p → 0 as n → ∞. These are the analogous results to those in the usual asymptotic theory of maximum likelihood estimators, where the Fisher information is equal to the expected value of the squared partial derivative with respect to the parameter of the log-likelihood function and also the negative of the expected value of the second partial derivative of the log-likelihood function. They are usually satisfied by imposing a set of conditions that allows for the interchange of the order of integration with respect to s and the partial differentiation with respect to the parameters; see, for instance, [6] , [2] , and [24] in similar but simpler settings. In fact, this representation is also crucial for finding the asymptotic properties of the estimators of the Λ 0q (·)s, which we will now present. By first-order Taylor expansion and under regularity conditions, we have the representations, for each q = 1, . . . , Q, given bŷ For q = 1, . . . , Q, let M R i (s * , dw; q) + Λ * 0q (s * , t). and form the vectors of functionsẐ = (Ẑ q , q = 1, . . . , Q) andH R i = (H R iq , q = 1, . . . , Q). 
Then, the main asymptotic representation leading to the asymptotic results for the RCR component parameters is given by, for t ∈ [0, t * ], Using this representation, we obtain the following asymptotic properties, though not stated in the most general form. Proposition 5.4. Under certain regularity conditions, we have nẐ(s * , t) are asymptotically independent; (ii) For each q = 1, . . . , Q,Λ 0q (s * , t) converges uniformly in probability to Λ 0q (t) for t ∈ [0, t * ]; (iii) For each q = 1, . . . , Q, and for t ∈ [0, t * ], √ n[Λ 0q (s * , t) − Λ 0q (t)] converges in distribution to a normal distribution with mean zero and variance where (b 1q (s * , t), b 2q (s * , t)) = pr-lim 1 n (B 1q (s * , t), B 2q (s * , t)) . (iv) More, generally, for q = 1, . . . , Q, the stochastic process ) to a zero-mean Gaussian process with covariance function This covariance function is consistently estimated by replacing the unknown functions by their empirical counterparts and replacing the finite-dimensional parameters by their estimators. We point out that theΛ 0q , q = 1, . . . , Q, are asymptotically dependent, and are also not independent of (α(s * , t * ),θ R (s * , t * )). The results stated above generalize those in [2] and [24] to a more complex and general situation. In this section we provide a numerical illustration of the estimation procedure when given a sample data. The illustrative sample data set with n = 50 units is depicted in Figure 2 . This is generated from the proposed model with the following characteristics. For the ith sample unit, the covariate values are generated according to X i1 ∼ BER(0.5), X i2 ∼ N (0, 1) with X i1 and X i2 independent, where BER(p) is the Bernoulli distribution with success probability of p. The end of monitoring time is τ i ∼ EXP(5), where EXP(λ) is the exponential distribution with mean λ. For the RCR component with Q = 3, the baseline (crude) hazard rate function for risk q ∈ {1, 2, 3}, is a two-parameter Weibull given by with κ * q ∈ {2, 2, 3} and θ * q ∈ {0.9, 1.1, 1}. The associated (crude) survivor function for risk q is Table 1 : The second column contains the true values of the parameters in the first column of the model that generated the illustrative sample data in Figure 2 and also used in the simulation study. The third column are the estimates of these parameters arising from the illustrative sample data, while the fourth column contains the information-based estimates of the standard errors of the estimators. The RCR component model parameters are the αs and those with superscript R. The LM component parameters are those with superscript of W . Finally, the HS component parameters are those with superscript V . For each replication, the realized data for the ith unit among the n units are generated in the following manner. At time s = 0, we first randomly assign an initial LM state (either 1, 2, or 3) and HS state (either 2 or 3) uniformly among the allowable states and independently for the LM and HS processes. We specify a fixed length of the intervals partitioning [0, ∞), which in the simulation runs was set to ds = 0.001, so we have intervals I k = [s k , s k+1 ) = [k(ds), (k + 1)(ds)), k = 0, 1, 2, . . .. A smaller value of ds will make the data generation coincide more closely to the model, but at the same time will also lead to higher computational costs, especially in a simulation study with many replications. The data generation proceeds sequentially over k = 0, 1, 2, . . .. 
Suppose that we have reached interval I k = [s k , s k+1 ) with W (s k ) = w 1 and V (s k ) = v 1 . For q ∈ I Q , generate a realization e R q of E R q according to a BER(p R q ) with p R q given by (4) . Also, for w 2 = w 1 , generate a realization e W w2 of E W w2 according to a BER(p W w2 ) with p W w2 given by (5) . Finally, for v = v 1 , generate a realization e V v of E V v according to a BER(p V v ) with p V v given by (6) . • If all the realizations e R q , q ∈ I Q , e W w2 , w 2 ∈ I W , w 2 = w 1 , and e V v , v ∈ I V , v = v 1 are zeros, which means no events occurred in the interval I k , then we proceed to the next interval I k+1 , provided that τ / ∈ I k , otherwise we stop. • If exactly one of the realizations e R q , q ∈ I Q , e W w2 , w 2 ∈ I W , w 2 = w 1 , and e V v , v ∈ I V , v = v 1 equals one, so that an event occurred, then we update the values of N R , N W , N V and proceed to the next interval I k+1 , unless e V v = 1 for a v ∈ V 0 (i.e., there is a transition to an absorbing state) or τ ∈ I k , in which case we stop. In our implementation, since 0 < ds ≈ 0, the success probabilities in the Bernoulli distributions above will all be close to zeros, hence the probability of more than one of the e R q , q ∈ I Q , e W w2 , w 2 ∈ I W , w 2 = w 1 , and e V v , v ∈ I V , v = v 1 taking values of one is very small. But, in case there were at least two of them with values of one, then we randomly choose one to take the value of one and the others are set to zeros. Thus, we always have which means there is at most one event in any interval. Note that whether we reach an absorbing state or not, there will always be right-censored event times or sojourn times. We remark that the event time generation could also have been implemented by first generating a sojourn time and then deciding which event occurred according to a multinomial distribution at the realized sojourn time as indicated in subsection 3.2. In addition, depending on the form of the baseline hazard rate functions λ 0q (·)s (e.g., Weibulls) and the effective age processes E iq (·)s (e.g., backward recurrence times), a more direct and efficient manner of generating the event times using a variation of the Probability Integral Transformation is possible, without having to do the partitioning of the time axis as done above. However, the method above is more general, though approximate, and applicable even if the λ 0q 's or the E iq 's are of more complex forms. Graphical plots associated with the generated illustrative sample data are provided in Figure 2 of Section 4.1. For this sample data set there were 36 units that reached the absorbing state, with a mean time to absorption of about t = 1; while 14 units did not reach the absorbing state before hitting their respective end of their monitoring periods, with the mean monitoring time at about t = 2. Recall that τ i 's were distributed as EXP(5) so the mean of τ i is 5. One may be curious why the mean monitoring time for those that did not get absorbed is about 2 and not close to 5. The reason for this is because of an induced selection bias. Those units who got absorbed will tend to have longer monitoring times, hence those that were not absorbed will tend to have shorter monitoring times, explaining a reduction in the mean monitoring times for the subset of units that were not absorbed. We have developed programs in R [26] for implementing the semi-parametric estimation procedure for the joint model described above. 
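A condensed sketch of a single interval update in this generation scheme is given below; the probability vectors pR, pW, and pV are assumed to have already been computed from (4)-(6) for the current history, and all object names are illustrative rather than taken from our R programs:

```r
ds <- 0.001
# pR[q], pW[w2], pV[v]: per-interval event probabilities from (4)-(6), all close to zero
draws <- c(rbinom(length(pR), 1, pR),
           rbinom(length(pW), 1, pW),
           rbinom(length(pV), 1, pV))

if (sum(draws) > 1) {                  # rare: enforce at most one event per interval
  keep  <- sample(which(draws == 1), 1)
  draws <- replace(numeric(length(draws)), keep, 1)
}

if (sum(draws) == 1) {
  # update N^R, N^W, or N^V according to which component fired;
  # stop the unit if the HS process moved into an absorbing state
}
# also stop if tau falls inside the current interval; otherwise advance to interval I_{k+1}
```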
We have developed programs in R [26] for implementing the semi-parametric estimation procedure for the joint model described above. We used these programs on this illustrative sample data set to obtain estimates of the model parameters together with their estimated standard errors. The third and fourth columns of Table 1 contain the parameter estimates and the estimates of their standard errors, respectively, for the finite-dimensional parameters listed in the first column, whose true values are given in the second column. Figure 3 depicts the true baseline survivor functions, which are Weibull survivor functions, together with their semi-parametric estimates for each of the three risks in the RCR component. The estimates of the baseline survivor functions are obtained by applying the product-integral to the estimated baseline cumulative hazard functions. For this generated sample data set, the estimates obtained and the function plots show reasonable agreement between the true parameter values and functions and their associated estimates, indicating that the semi-parametric estimation procedure described earlier appears to be viable.
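As a small illustration of the product-integral step mentioned above, the following R sketch turns hypothetical estimated increments of a baseline cumulative hazard into a baseline survivor function estimate; the increments shown are placeholders, not values from the analysis.

survivor_from_cumhaz_increments <- function(dLambda_hat) {
  # Product-integral: S_hat(t_j) = prod_{l <= j} (1 - dLambda_hat(t_l))
  cumprod(1 - dLambda_hat)
}

dLambda_hat <- c(0.05, 0.08, 0.04, 0.10)   # hypothetical increments at ordered jump times
round(survivor_from_cumhaz_increments(dLambda_hat), 3)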
We have provided asymptotic results for the estimators in Section 5. In this section we present the results of simulation studies to assess the finite-sample properties of the estimators of the model parameters. This will provide some evidence on whether the semi-parametric estimation procedure, which appears to perform satisfactorily for the single illustrative sample data set in Section 6, also performs satisfactorily over many sample data sets. These simulation studies were implemented using the R programs we developed, in particular the programs used for estimating the parameters in the preceding section. In these simulation studies, as in the preceding section, the baseline hazard rate functions are estimated semi-parametrically when analyzing each sample data set, even though two-parameter Weibull models were used for the RCR component when generating each sample data set. Aside from the set of model parameters described in Section 6, the simulation studies have two additional inputs: the sample size n and the number of simulation replications Mreps, the latter set to 1000. The sample sizes used in the two simulation studies are n ∈ {50, 100}. For fixed n, for each of the Mreps replications, the sample data generation is as described in Section 6.

Table 2 presents some summary results pertaining to the three processes based on the Mreps replications. The first three rows indicate the means and standard deviations of the number of event occurrences per unit for each risk, and the mean time to the first event occurrence of each risk. For example, risk 1 occurs about 2.6 times per unit, with a standard deviation of 3.57, and the mean time for risk 1 to occur for the first time is about 0.48. We notice that the occurrence frequencies of the three risks are ordered as Risk 1 > Risk 2 > Risk 3, and consequently risk 1 tends to have the shortest mean time to the first event. Also note that the mean number of event occurrences per unit for each risk is around 2, which implies that there are neither too few nor too many RCR events per unit (see the property of "explosion" discussed in [13]). This indicates that the choices of the effective age functions E_iq(·) and the accumulating event functions ρ_q(·), together with the parameter values chosen for the data generation, were reasonable.

The fourth to ninth rows show the means and standard deviations of (a) the number of transitions to specific states per unit, (b) the occupation times per unit for specific states, and (c) the sojourn times for specific states. For example, column 4 tells us that (i) the mean number of transitions to state 2 of the HS process (HS 2, for short) per unit is 2.34; (ii) a unit stays in HS 2 for about 0.8 time units on average; and (iii) the mean sojourn time in HS 2 is about 0.34. We do not include information for HS 1 since it is an absorbing state. Comparing the V = 2 and V = 3 columns, we find that units tend to transition to HS state 2 more often than to HS state 3, and the mean occupation time per unit is longer for state 2 than for state 3. For the last three columns, there are more transitions to LM state 2 than to the other LM states, while the occupation times indicate that a unit tends to spend more time in LM state 1 than in the other two LM states.

Table 2: Summary statistics for the three processes over the Mreps replications of the data set. The first three rows pertain to the RCR events and give the mean/standard deviation of the number of recurrent event occurrences per unit for each risk and the mean time to the first event occurrence for each risk. The fourth to ninth rows pertain to the HS and LM events and give the mean/standard deviation of the number of transitions to the specific state per unit, the mean/standard deviation of the occupation time for the specific state per unit, and the mean/standard deviation of the sojourn time for the specific state. State 1 of the HS process V is absorbing, hence not included in the table.

Also, to obtain some insight into the model-induced dependencies among the components, we obtained the correlations among the RCR, LM, and HS processes over time from the simulated data. At each time point s in a finite set S ⊂ [0, τ], we first constructed a vector Z_i(s) of six variables summarizing the RCR, LM, and HS components of unit i. For each s ∈ S, we then obtained the sample correlation matrix C(s) from {Z_i(s), i = 1, 2, . . . , n}. Each of the Mreps replications thus yielded a C(s), and we took the element-wise mean of these Mreps correlation matrices. The matrix of scatterplots in Figure 4 provides the plots of these mean correlation coefficients over the time points s ∈ S.

Figure 4: The sample correlation matrix is computed from the n = 50 sample units; the element-wise means of the Mreps correlation matrices were then computed. The plots depict these mean sample correlations for the pairs of variables underlying C(s) over time s.

The point we are making here is that the joint model does induce non-trivial patterns of dependencies over time among the three model components.
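For concreteness, the following R sketch mirrors the correlation summary just described using stand-in data; the rnorm() draws are placeholders for the actual Z_i(s) vectors computed from the simulated processes, and the time grid is illustrative.

set.seed(1)
n <- 50; Mreps <- 200; S_grid <- c(0.25, 0.50, 0.75, 1.00)   # illustrative time grid

mean_corr <- lapply(S_grid, function(s) {
  mats <- replicate(Mreps, {
    Z <- matrix(rnorm(n * 6), nrow = n, ncol = 6)   # placeholder for {Z_i(s), i = 1, ..., n}
    cor(Z)                                          # sample correlation matrix C(s)
  }, simplify = "array")
  apply(mats, c(1, 2), mean)                        # element-wise mean over replications
})
names(mean_corr) <- paste0("s = ", S_grid)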
The set of estimates obtained for one sample data set in the preceding section is insufficient to assess the performance of the semi-parametric estimation procedure. To get a sense of its performance, we performed simulation studies with Mreps = 1000 replications and sample sizes n ∈ {50, 100}. For each replication, we generated a sample data set according to the same joint model and then obtained the set of estimates for this data set via the semi-parametric procedure. Summary statistics, such as the means, standard deviations, asymptotic standard errors, percentiles, and boxplots of all Mreps estimates, were then obtained or constructed. Table 3 shows these summary statistics of the Mreps estimates for each parameter. The asymptotic standard errors reported in Table 3 are the means of the asymptotic standard errors over the Mreps replications. Also included are the percentile 95% confidence intervals for each of the unknown parameters based on the Mreps replications, stratified according to n ∈ {50, 100}.

Since the Λ_0q's are functional (infinite-dimensional) nonparametric parameters, we only provide their estimates at four selected time points. Due to space limitations, we also provide results for only a small subset of the η and ξ parameters in Table 3. From these simulation results, we observe that the estimates are close to the true values of the model parameters and that the sample standard deviations are close to the asymptotic standard errors, providing some validation of the semi-parametric estimation procedure for the joint model and empirically lending support to the asymptotic results. Comparing the results for n = 50 and n = 100 in Table 3, we find that as the sample size n increases, the performance of the estimators improves, with biases and standard errors decreasing.

A graphical summary of the centered estimates for the Mreps replications with n = 50 units is given in Figure 5. Centering for each estimator is done by subtracting the true value of the parameter being estimated. We observe that the medians of all these centered estimates are close to 0; there are some outliers, but most of the centered estimates lie near 0. In Figure 6, we show three types of plots of the baseline survivor function for each risk in the RCR component: the true Weibull-type baseline survivor function is plotted in red, overlaid plots of a random selection of ten estimates of the baseline survivor function are in green, and the mean baseline survivor function based on the Mreps estimates is shown in blue. We observe close agreement between the true (red) and mean (blue) curves.

Based on these simulation studies, the semi-parametric estimation procedure for the joint model appears to provide reasonable estimates of the true finite- and infinite-dimensional model parameters, at least for the choices of parameter values in these particular simulations. Further simulation and analytical studies are still needed to substantively assess the performance of the semi-parametric estimation procedure for the proposed joint model.
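As a small illustration of the kind of replicate-level summaries reported in Table 3 and Figure 5, the following R sketch computes, for a hypothetical vector of Mreps estimates of a scalar parameter, the mean, bias, standard deviation, a percentile 95% confidence interval, and a boxplot of the estimates centered at the true value.

summarize_estimates <- function(est, true_val) {
  c(mean = mean(est),
    bias = mean(est) - true_val,
    sd   = sd(est),
    lo95 = unname(quantile(est, 0.025)),
    hi95 = unname(quantile(est, 0.975)))
}

set.seed(2)
est <- rnorm(1000, mean = 0.8, sd = 0.15)          # stand-in for Mreps = 1000 estimates
summarize_estimates(est, true_val = 0.8)
boxplot(est - 0.8, main = "Centered estimates")    # centering by subtracting the true value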
To illustrate our estimation procedure on a real data set, we apply the joint model and the semi-parametric estimation procedure to a medical data set of n = 150 patients diagnosed with metastatic colorectal cancer that could not be controlled by curative surgery. This data set was gathered in France from 2002 to 2007 and was used in [10]. It consists of two data sets deposited in the frailtypack package in the R library [26]: colorectal.Longi and colorectal. The data set colorectal.Longi includes the follow-up times, in years, of the patients' tumor size measurements. The times of the first tumor size measurements vary from patient to patient, so to have all patients start at time 'zero', our artificial time origin, we consider these first measurements as their initial states; subsequent measurement times are then expressed as lengths of time from this time origin. There were a total of 906 tumor size measurements for all the patients.

In order to conform to our discrete-valued LM model, we classify (arbitrarily) the tumor size into three categories (states): 1, 2, and 3, according to whether the tumor size belongs to the intervals [3.4, 6.6], [2, 3.4], and (0, 2), respectively. Since the tumor size is measured only at discrete times, instead of continuously, we assumed that the tumor size state is constant between measurement times, so that the tumor size process can only transition at the times at which tumor size is measured. This assumption is most likely unrealistic, but we adopt it for the purpose of illustration. The data set colorectal contains the patient's ID number; the covariates X_1 and X_2, with X_1 = 1 if the patient received treatment C and X_1 = 0 if the patient received treatment S, and X_2 consisting of two dummy variables with X_2 = (0, 0), (1, 0), (0, 1) if the initial WHO performance status is 0, 1, 2, respectively; the time (in years) of each occurrence of a new lesion since the baseline measurement time; and the final right-censored or death time. There were 289 occurrences of new lesions, and 121 patients died during the study. Clearly, this data set is a special case of the type of data appropriate for our proposed joint model, having only one type of recurrent event, one absorbing health status state (dead), and one transient health status state (alive).

We assume the effective age E_i(s) = s − S^R_{i,N^R_i(s−)}, the time elapsed since the most recent occurrence of a new lesion (the backward recurrence time), and use ρ(k; α) = α^{log(1+k)}. The unknown model parameters in the RCR component (here, a single recurrent event type) include α in the ρ(·) function, together with the regression coefficients and the baseline hazard function. We fitted the joint model to this data set. The resulting model parameter estimates, along with the information-based standard error estimates, are given in Table 4. The standard errors are obtained by taking the square roots of the diagonal elements of the inverse of the observed profile likelihood information matrix. Based on these estimates, we can also perform hypothesis tests; the p-values associated with the two-tailed hypothesis tests are also given in the table. The null hypothesis being tested for α is H_0: α = 1, while the null hypotheses for the other model parameters are that their true values are zero. We test α = 1 rather than α = 0 because α = 1 means that the accumulating number of recurrent event occurrences has no impact on subsequent recurrent event occurrences.

From the values in Table 4, the estimate of α is less than one, which may indicate that each occurrence of a new lesion decreases the risk of future occurrences of new lesions, though the statistical test does not allow us to conclude that α < 1. Based on the set of p-values, we find that an initial WHO performance state of 1 and a tumor size state of 2 are associated with a decreased risk of new lesion occurrences, while an initial WHO performance state of 2 is associated with an increased risk of new lesion occurrences. An initial WHO performance state of 2 and the number of occurrences of new lesions are associated with an increased risk of death in the health status.
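For concreteness, the following R sketch illustrates the two-tailed Wald-type tests described above, with each estimate compared to its null value (1 for α, 0 for the other parameters) using its information-based standard error, together with one way of mapping tumor sizes to the three LM states; all numerical values are placeholders, not entries of Table 4, and the boundary convention at the cut points is a choice made only for illustration.

wald_pvalue <- function(estimate, se, null_value = 0) {
  z <- (estimate - null_value) / se
  2 * pnorm(-abs(z))                       # two-tailed p-value
}

wald_pvalue(estimate = 0.85, se = 0.10, null_value = 1)   # e.g., testing H0: alpha = 1
wald_pvalue(estimate = -0.40, se = 0.15)                  # e.g., testing a coefficient against 0

# Mapping tumor sizes to the three LM states: (0, 2) -> 3, [2, 3.4) -> 2, [3.4, 6.6] -> 1
tumor_state <- function(size) {
  cut(size, breaks = c(0, 2, 3.4, 6.6), labels = c("3", "2", "1"),
      right = FALSE, include.lowest = TRUE)
}
tumor_state(c(1.5, 2.7, 5.0))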
Finally, we want to emphasize the importance of the effective age process: parameter estimates obtained under a mis-specified effective age could be biased and lead to misleading conclusions. This is one aspect that domain specialists and statisticians need to consider when assessing the impact of interventions, since the specification of the effective or virtual age has important and consequential implications. For more discussion of effective or virtual ages, see the recent papers [12, 4], the latter also touching on the situation where the virtual age function depends on unknown parameters.

Table 4: Parameter estimates, information-based standard errors, and p-values for the RCR, LM, and HS processes based on the real data set. Each p-value is based on the two-tailed hypothesis test that the model parameter is zero (except for the parameter α, where H_0: α = 1). The top block contains the model parameters in the RCR component, the middle block those in the LM component, and the bottom block those in the HS component.

For the general class of joint models for recurrent competing risks, a longitudinal marker, and health status proposed in this paper, which encompasses many existing models considered previously, there are still numerous aspects that need to be addressed in future studies. Foremost among these is a more refined analytical study of the finite-sample and asymptotic properties of the estimators of the model parameters, together with other inferential and prediction procedures. The finite-sample and asymptotic results could be exploited to enable tests of hypotheses and the construction of confidence regions for model parameters. There is also the interesting aspect of computationally estimating the standard errors of the estimators: how would a bootstrapping approach be implemented in this situation? Another important problem is how to perform goodness-of-fit assessment and model validation for this joint model. Though the class of models is very general, there remain possibilities of model mis-specification, for example in determining the effective age processes or in the specification of the ρ_q(·)-functions. What are the impacts of such model mis-specifications? Do they lead to serious biases that could result in misleading conclusions? These are some of the problems whose solutions await further study.

A potential promise of this joint class of models is in precision medicine. Because all three components (RCR, LM, HS) are taken into account simultaneously, in contrast to a marginal modeling approach, the synergy that the joint model allows may improve decision-making, for example in determining interventions to be performed on individual units. In this context, it is of utmost importance to be able to predict future trajectories of the HS process given the information about all three processes available at a given point in time. Thus, an important problem for future work is forecasting using this joint model. How should such forecasting be implemented? This leads to other important questions. One is determining the relative importance of each of the components in this prediction problem: could one ignore some of the components and still do as well as a joint model-based prediction approach? If there are many covariates, how should the important ones be chosen in order to improve prediction of, say, the time-to-absorption?
Finally, though our class of joint models is a natural extension of earlier models dealing with recurrent events, competing recurrent events, longitudinal markers, or terminal events, one may impugn it as not realistic and instead view it as a more futuristic class of models, since existing data sets were not gathered in the manner for which these joint models apply. For instance, in the example pertaining to gout in Section 2, the SUR level and CKD status are not continuously monitored. However, with the advent of smart devices (smart wrist watches, embedded sensors, black boxes, and the like), made possible by miniaturized technology, high-speed computing, almost limitless cloud-based memory capacity, and the availability of rich cloud-based databases, the era is, in our opinion, fast approaching when continuous monitoring of longitudinal markers, health status, and occurrences of different types of recurrent events, be it on a human being, an experimental animal or plant, a machine such as an airplane or car, an engineering system, or a business entity, will become the standard rather than the exception. Developing the models and methods of analysis for such complex and advanced data sets, even before they become available and real, will hasten and prepare us for their eventual arrival.

References

[1] Statistical models based on counting processes.
[2] Cox's regression model for counting processes: a large sample study.
[3] Statistical theory of reliability and life testing.
[4] A review of effective age models and associated non- and semiparametric methods.
[5] Quantifying and comparing dynamic predictive accuracy of joint models for longitudinal marker and time-to-event in presence of censoring and competing risks.
[6] Maximum likelihood estimation in parametric counting process models, with applications to censored failure time data.
[7] Joint modeling of longitudinal, recurrent events and failure time data for survivor's population.
[8] Regression models and life-tables.
[9] Nonparametric estimation for a general repair model.
[10] Fédération Francophone de Cancérologie Digestive (FFCD) 2000-05 Collaborative Group. Sequential versus combination chemotherapy for the treatment of advanced colorectal cancer (FFCD 2000-05): an open-label, randomised trial.
[11] A joint model for longitudinal continuous and time-to-event outcomes with direct marginal interpretation.
[12] Virtual age, is it real? Discussing virtual age in reliability context.
[13] Recurrent events and the exploding Cox model.
[14] Modelling intervention effects after cancer relapses.
[15] Parametric latent class joint model for a longitudinal biomarker and recurrent events.
[16] Joint modelling of longitudinal measurements and event time data.
[17] Joint models of longitudinal data and recurrent events with informative terminal event.
[18] Tutorial in joint modeling and prediction: a statistical software for correlated longitudinal outcomes, recurrent events and a terminal event.
[19] Shared frailty models for recurrent events and a terminal event.
[20] Dynamic modeling & analysis of recurrent competing risks and a terminal event.
[21] Dynamic prediction of risk of death using history of cancer recurrences in joint frailty models.
[22] Joint analysis of longitudinal and survival data measured on nested timescales by using shared parameter models: an application to fecundity data.
[23] Dynamic modeling and statistical analysis of event times.
[24] Asymptotics for a class of dynamic recurrent event models.
[25] Semiparametric inference for a general class of models for recurrent events.
[26] R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.
[27] A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data.
[28] Nonparametric estimation with recurrent competing risks data.
[29] Modeling the relationship of survival to longitudinal data measured with error: applications to survival and CD4 counts in patients with AIDS.
[30] Joint modeling of longitudinal and time-to-event data: an overview.
[31] Longitudinal monitoring of laboratory markers characterizes hospitalized and ambulatory COVID-19 patients.
[32] A joint model for survival and longitudinal data measured with error.