key: cord-0446852-db1t4o1k authors: Beyhum, Jad; Florens, Jean-Pierre; Keilegom, Ingrid Van title: A nonparametric instrumental approach to endogeneity in competing risks models date: 2021-05-03 journal: nan DOI: nan sha: 124a0a4b4852c802401df62a33e8ce59089d40ee doc_id: 446852 cord_uid: db1t4o1k This paper discusses endogenous treatment models with duration outcomes, competing risks and random right censoring. The endogeneity issue is solved using a discrete instrumental variable. We show that the competing risks model generates a non-parametric quantile instrumental regression problem. The cause-specific cumulative incidence, the cause-specific hazard and the subdistribution hazard can be recovered from the regression function. A distinguishing feature of the model is that censoring and competing risks prevent identification at some quantiles. We characterize the set of quantiles for which exact identification is possible and give partial identification results for other quantiles. We outline an estimation procedure and discuss its properties. The finite sample performance of the estimator is evaluated through simulations. We apply the proposed method to the Health Insurance Plan of Greater New York experiment. Competing events are events that prevent the statistician from observing the time until the event of interest. A typical example is in biological studies with multiple causes of death. When the competing events are independent from the main outcome, then methods from the survival analysis literature that are designed for random right censoring can be used (Kaplan-Meier, Cox, ...) . However, when the causes are dependent, a more careful analysis is necessary and only some features of the model can be identified. As a result competing risks problems can be seen as dependent censoring problems, which may arise if, for instance, some study participants decide not to attend a follow-up interview. We consider a setting where the researcher is interested in the effect of a treatment on a randomly right-censored duration outcome in the presence of competing risks. The treatment is endogenous, it is not independent of the potential outcomes of the duration. In this case a naive analysis based on data conditional to treatment status could yield biased estimates, if, for example, treated study participants are the ones with the most positive treatment effect. Various approaches have been proposed in the literature to solve the endogeneity issue. Our method is based on an instrumental variable which is sufficiently dependent of the treatment but only affects the outcomes through the treatment. In a typical randomized experiment with binary treatment where there is noncompliance, a natural instrument is the treatment/control group assignment. This paper focuses on the case where both the treatment and the instrument are categorical. We show that the competing risks model gives rise to a nonparametric nonadditive regression problem. The typical features of interest in competing risks models are functionals of the regression function, which we identify thanks to the instrumental variable. The present framework differs from the usual setting of the nonparametric nonseparable instrumental regression literature Hansen (2005, 2006) ; Cazals et al. (2016) ; Wüthrich (2020)) because we allow for random right censoring (arising for instance from the end of the observation period) and competing risks (dependent censoring). The regression function cannot be identified for every value of the residual (which corresponds to quantiles of the distribution of the potential outcomes). The cause of identification failure can be censoring, competing risks or both. We single out the points at which the regression function is exactly identified and for the other quantiles we provide partial identification results. We discuss an estimation procedure and assess its performance through simulations. The strategy is applied to the Health Insurance Plan of Greater New York experiment. When there are no competing risks, this problem has been thoroughly studied Van den Berg (2003, 2005) ; Bijwaard and Ridder (2005) ; Chernozhukov et al. (2015) ; Frandsen (2015) ; Tchetgen Tchetgen et al. (2015) ; Li et al. (2015) ; Chan (2016) ; Sant'Anna (2016); Beyhum et al. (2021) ) with both parametric and nonparametric methods. In particular, Beyhum et al. (2021) also studies a nonparametric instrumental regression problem. The present paper makes two main contributions with respect to the latter works. First, we show how a competing risks model implies a nonparametric regression model. In the absence of competing risks, the duration model can naturally be formulated as a regression model, but in our case the link is far from obvious. Second, we tackle new identification and estimation challenges that arise uniquely because of the presence of competing risks. Note that the mechanism through which competing risks affect identification (non strict monotonicity of the regression function) are different from the effect of random right censoring nonparametric models. Their approaches rely on a monotonicity assumption stating that there are no defiers (see Angrist et al. (1996) ). This is a restriction on the compliance of study participants to their treatment assignment, which has been criticized (see De Chaisemartin (2017) ). Thanks to this condition, Richardson et al. (2017) identify the treatment effects on the cause-specific cumulative incidence functions for the population of compliers, that is agents whose treatment status follows treatment assignment. In Blanco et al. (2020) bounds on the treatment effects for the population of compliers who experience the duration spell of interest are provided, they are valid when the censoring is dependent, which could correspond to a competing risk. In contrast, the present paper makes a rank invariance assumption. It restricts the distribution of potential outcomes, such that ranks (in a sense that is given in Appendix A) between subjects cannot be reversed by the treatment. This condition allows us to identify and estimate the treatment effects on the full population, which is arguably more interesting when one weighs whether or not to make a treatment compulsory. See Wüthrich (2020) for a discussion of the trade-off between the two strategies. The paper is organized as follows. In Section 2, we outline the model. Then, identification is discussed in Section 3. Estimation is studied in Section 4. In Section 5, we present simulations. The method is applied to the Health Insurance Plan of Greater New York experiment in Section 6. Finally, concluding remarks are given in Section 7. The objective of this paper is to analyze treatment models when the outcome of the treatment is a duration time that is subject to competing risks and random right censoring. Suppose that there are two latent durations T j , j = 1, 2. For j = 1, 2, let U j be a residual which represents the dependence of T j on unobserved heterogeneity. Without loss of generality, the distribution of U j is normalized to be unit exponential. The dependence of U 1 and U 2 is left unrestricted (hence, the independent case is nested by our model). We assume that there exists a continuous and strictly increasing mapping ψ j : R + → R + such that The objects of interest fail at time T = min(T 1 , T 2 ) and the cause of this failure is E = min{arg min j=1,2 T j }. Let us now introduce the probability of failure from cause 1, that is p = P(E = 1), and the conditional cumulative distribution function of U j given E = j, that is F j (u j ) = P(U j ≤ u j |E = j) for u j ∈ R + . We assume that F j , j = 1, 2 is continuous and strictly increasing on the support of U j given E = j. We define the residual U = pF 1 (U 1 )I(E = 1) + (p + (1 − p)F 2 (U 2 ))I(E = 2), (2.1) where I(·) is the indicator function. It can be shown that the distribution of U is uniform on [0, 1] (See Lemma A.1 in Appendix A.1). The variable T is generated by U through the following relationship: Let us now introduce a categorical treatment Z with support {z 1 , . . . , z L }. We are interested in the potential outcomes (see Rubin (2005) ) of (T, E) under treatment status z ∈ {z 1 , . . . , z L }, which are denoted by (T (z), E(z)). We assume that the potential outcomes follow a model of the type (2.2), in the sense that for z ∈ {z 1 , . . . , z L } there exists p z ∈ (0, 1) and continuous and strictly increasing mappings ϕ 1 (z, ·) : [0, p z ) → R + and ϕ 2 (z, ·) : (p z , 1] → R + such that and T = T (Z), E = E(Z). When there are no Z, (2.3) becomes (2.2). Remark that p z = P(E(z) = 1) since U is uniform on [0, 1], and that the residual U does not vary among treatment statuses, this is the usual rank invariance assumption from the Econometrics literature (see Dong and Shen (2018) ). In Appendix A.2 we argue that this assumption allows for a wide variety of treatment effects. This paper is concerned with identification and estimation of some features of the model (see Section 2.2) when Z is endogenous and T = T (Z) is randomly right censored. We treat the confounding issue thanks to an instrumental variable (henceforth, IV). Formally, Z and U are dependent but we possess a categorical instrumental variable W with support {w 1 , . . . , w K } such that U and W are independent. There also exists a censoring variable C with support in R + such that we observe Y = min(T, C) and the censoring indicator Our model can be regarded as a quantile regression model. Indeed, if we introduce the random variable The quantity ϕ 1 z (u) is the u-quantile of the distribution of T 1 (z) and, hence, the model T 1 = ϕ 1 (Z, U ) can be seen as a IV model of quantile treatment effects as in Chernozhukov and Hansen (2005) . There are however two distinguishing differences with the usual model of Chernozhukov and Hansen (2005) . First, the variable T is randomly right censored. Second, the mapping ϕ 1 z (·) is not strictly increasing on all of its support. These differences pose additional identification and estimation issues which we tackle in this paper. Throughout the paper, we suppose that the distribution of U given {Z = z, W = w} is continuous, with strictly positive density on [0, u z,w ), where u z,w = sup{u ∈ R + : P(U ≤ u|Z = z, W = w) < 1} is the upper bound of the support of the distribution of U given {Z = z, W = w}. Remark that this allows the support of the latter distribution to differ from [0, 1]. We conclude the presentation of the model T 1 = ϕ 1 (Z, U ) by summarizing the underlying assumptions. (M) (i) ϕ 1 z is continuous and strictly increasing and [0, p z ) and equal to ∞ on (p z , 1]; (ii) U has a uniform distribution on the interval [0, 1] and independent of W ; (iii) The distribution of U given {Z = z, W = w} is continuous, with strictly positive density on [0, u z,w ). Note that no assumptions on ϕ 2 are needed to identifiy ϕ 1 . Let us now discuss the features that we seek to recover. It is well-known that it is not possible to identify the distribution of the latent durations in a competing risks model (see Tsiatis (1975) ). However, some characteristics which we introduce below are identified when Z is exogenous (see Geskus (2020)). For z ∈ {z 1 , . . . , z L }, we introduce the left continuous Let us define the cause-specific cumulative incidence function at time t ∈ R + for cause j under treatment status z, This equation implies that for u ∈ [0, 1], ϕ j z (u) is the u-quantile of the subdistribution function F j . Remark that F j z (t) is a structural function and therefore it is different from the conditional probability P(T j ≤ t|Z = z) = P(T ≤ t, E = j|Z = z). We also introduce the subdistribution hazard at time t ∈ R + for cause j under treatment status z, that is . (2.5) It is the hazard rate of the subdistribution function F j Let also the cause-specific hazard at time t for cause j under treatment status z be . (2.6) We are interested in the effect of the treatment status z on these three quantities, that is the treatment effects. Consider, for instance, the u-quantile treatment effect on the subdistribution of the duration until failure from cause 1, that is ϕ 1 1 (u) − ϕ 1 0 (u). As is clear from equations (2.4), (2.5), and (2.6), these various treatment effects are identified from knowledge of ϕ j , j = 1, 2. Therefore, without loss of generality, we focus on identification and estimation of ϕ 1 in the remainder of the paper. Remark that the fact that we assumed that there are only two causes is not restrictive. Indeed, if there are J ≥ 3 event types, the treatment effects for cause j * ∈ {1, . . . , J} can be identified from ϕ 1 if we label an exit from cause j * as j = 1 and an exit from any other cause as j = 2. Applying this modeling for each cause, that is J times, allows to recover the treatment effects for all causes. 3 Identification In this section, we discuss identification of ϕ 1 when the distribution of the observables (Y, δE, Z, W ) is known. As usual in nonparametric instrumental regression, the model generates a nonlinear system of equations. Let where (3.2) holds because ϕ 1 z (·) is strictly increasing on [0, p 1 ). Usually, one would simply make assumptions ensuring that this system has a unique solution. However, the particular context of this paper brings two additional difficulties. First, S 1 (·, z|w) may not be identified on all of its support because of censoring. Second, (3.1) is only valid for u ∈ [0, p 1 ) because ϕ 1 z (·), z = z 1 , . . . , z L are not strictly increasing on all [0, 1], which is a direct consequence of the presence of competing risks. These two difficulties restrict the set of values of u for which it is possible to identify (ϕ z (u)) L =1 thanks to (3.1). In the next two subsections, we characterize the latter set. We work throughout the paper with the usual independent censoring assumption: Let us also define the upper bounds of the support of C conditional on {Z = z, W = z} and We make the assumption that the upper bound of the censoring time only depends on the treatment status: This hypothesis simplifies the analysis but could be relaxed. It is likely to hold when the follow-up scheme of the study only depends on the treatment status. Moreover, in this paper we assume that T depends on W only through Z, therefore it makes sense to make the same assumption about C. Let are the survival function of T conditional on Z = z, W = w and the cause-1-specific hazard rate given Z = z, W = w, respectively. The function S(·|z, w) is identified on [0, c z ] by standard arguments from the survival analysis literature. Moreover, remark that, for all t in Hence, λ 1 (s|z, w) is identified on [0, c z ) and so are S 1 (·|z, w) and S 1 (t, z|w) = S 1 (·|z, w)P(Z = z|W = w). Next, we have two cases. In order to distinguish them, we introduce the (finite or infinite) upper bound of the support of the distribution of T given {E = 1, Z = z, W = w}: The above analysis implies that for u > u C , we do not know the mapping on the left hand side of (3.1) at the point θ = (ϕ z (u)) L =1 . As a result, (3.1) cannot point identify (ϕ z (u)) L for u > u C . The set for which (3.1) has the potential to deliver point identification can be further reduced. Let us introduce , then (ϕ 1 z (u)) L =1 cannot be identified by the system (3.1). Because it is sufficient that this happens for one value of z for identification to break down, it is not possible to identify (ϕ 1 z (u)) L =1 with (3.1) for all u ≥ u E , where u E = min L =1 (ϕ 1 z ) −1 (t 1 z ). The definition of t 1 z ensures that ϕ 1 z is invertible on [0, t 1 z ) and, hence, that u E is properly defined. It turns out that u E ≤ p 1 . Indeed, the support of the distribution of T given {E = 1, Z = z} is equal to the support of the distribution of ϕ 1 z (U ) given {E = 1, Z = z}. Since on the event {E = 1, Z = z} we have U ≤ p z , we obtain that the support of the distribution where for a mapping f : R + → R + and t ∈ R + , f (t−) is the left limit of f at t. Therefore (ϕ 1 z ) −1 (t 1 z ) ≤ p z because ϕ 1 z is strictly increasing on [0, p z ]. This leads to u E ≤ p 1 as claimed. Notice that if t 1 z were to be equal to ϕ 1 z (p z −), we would have that (ϕ 1 z ) −1 (t 1 z ) = p z and, hence, u E = p 1 . The latter happens when the upper bound of the support of U given Z = z is p z . This shows that the fact that the system of equations cannot identify (ϕ 1 z (u)) L =1 for u ∈ (u E , p 1 ) is due to the dependence of U and Z, that is the selection into treatment. At the moment, we have that the system (3.1) can only identify (ϕ 1 . Remark that even if the system (3.1) may not have a unique solution at u Y , (ϕ 1 z (u Y )) L =1 can always be identified by continuity as long as identification holds on Now, we make a strong conditional completeness assumption similar to that in Appendix C of Chernozhukov and Hansen (2005) which ensures that (3.1) has a single solution (ϕ 1 z (u)) L =1 ∈ L = 1 [0, y 1 z ). We follow the presentation of Appendix A in Fève et al. (2018) of this condition. Assume that the density of (U, Z) given W is perturbed in the direction of a function ∆ ∈ P, where P is the set of mappings ∆ : {z 1 , . . . , z L } × R + → R + such that ϕ 1 The main identification assumption is for all u ∈ [0, u Y ), w ∈ {w 1 , . . . , w K } and ∆ ∈ P, then ρ µ ≡ 0 for all µ ∈ [0, 1]. It can be interpreted as a strong conditional completeness condition of Z and a uniform random variable µ, given (W, U ) for the distribution g. In the case where Z and W are binary with support {0, 1} a simpler identification condition can be given. By Theorem 2 in Chernozhukov and Hansen (2005) , it suffices to assume (G') The density of U given (Z, W ) is continuous and the matrix has rank 2 for all t = (t 1 , t 2 ) ∈ [0, y 1 0 ) × [0, y 1 1 ). Assuming that f 1 (·, z|w) is continuous, this full rank condition implies that the determinant or the same inequality with < instead of >. Following the semantic of Chernozhukov and Hansen (2005) , this is a monotone likelihood ratio condition: the instrument increases (or decreases) the probability of being treated (Z = 1) for all levels of outcomes t ∈ [0, y 1 0 ) × [0, y 1 1 ). In the case of one-sided noncompliance, where P(Z = 1|W = 0) = 0, the condition is trivially satisfied as long as P(Z = 1|W = 1) > 0. To conclude on point identification, we need to be sure that u Y is identified because otherwise we are not able to know for which value of u the quantity (ϕ 1 z (u)) L =1 is identified. Remark that y 1 z = t 1 z ∧ c z is identified because it is the upper bound of the support of the distribution of Y given {E = 1, Z = z}. Hence, u Y = u E ∧ u C = min L =1 (ϕ 1 z ) −1 (t 1 z ∧ c z −) is identified (Under Assumption (G), one can solve the system (3.1) for increasing values of u until the left limit of ϕ 1 z (u) in u becomes y 1 z for one value of z ∈ {z 1 , . . . , z L }). The next theorem summarizes the results of Sections 3.1 to 3.4. When u > u Y , it is possible to partially identify (ϕ 1 z (u)) L =1 . Indeed, (3.2) becomes As a result, we have L =1 S 1 (ϕ 1 z (u), z |w k ) ≥ u for k = 1, . . . , K, but because S 1 (·, z|w) is only identified on [0, c z ] we cannot directly use these equations. As S 1 (t ∧ c z,w , z|w) ≥ S 1 (t, z|w) for all t ∈ R, we leverage instead the following proposition which gives an outer set to the identified set. Proposition 3.1 Under Assumption (M), for all u > u Y , we have This outer set is not sharp because it does not take into account the constraints of continuity and monotonicity of ϕ 1 . Let us now discuss how to compute this outer set as a finite union of product of intervals. We begin with the case where Z is binary with support {0, 1}. The set can have four types of shapes given in Figure 1 : for someθ 1 ,θ 2 ≥ 0. Now, we clarify how to obtain Figure 1 . Take θ ∈ [0, ∞] 2 outside [0, y 1 0 ) × [0, y 1 1 ). Remark that S 1 (t ∧ c z , z|w) does not depend on t on [y 1 z , ∞). Indeed, if y 1 z = t 1 z (i.e. t 1 z ≤ c z ), then t ≥ t 1 z and therefore S 1 (t ∧ c z,w , z|w) = S 1 (t 1 z , z|w) because t 1 z is larger than the upper bound of the distribution of T given {E = 1, Z = z, W = w}. If instead y 1 z = c z (i.e. c z ≤ t 1 z ), we have S 1 (t ∧ c z , z|w) = S 1 (t 1 z , z|w). As a result, θ is in the outer set if and only if (θ 1 + (y 1 0 − θ 1 )I(θ 1 > y 1 0 ), θ 2 + (y 1 1 − θ 2 )I(θ 2 > y 1 1 )) belongs to the outer set. Therefore, it is enough to study the value of θ → min K k=1 R k,u (θ) on [0, y 1 0 ] × {y 1 1 } ∪ {y 1 0 } × [0, y 1 1 ] to draw the outer set. We begin with [0, y 1 0 ) × {y 1 1 }. Because S 1 (·, 0|w) is continuously decreasing on [0, y 1 0 ), the set of vectors (θ 1 , y 1 1 ) in [0, y 1 0 ) × {y 1 1 } such that min K k=1 R k,u (θ) ≥ 0 is either a segment [0,θ 1 ] × {y 1 1 } whereθ 1 ∈ [0, y 1 1 ] or empty. If this set is not empty, as is in the outer set. If rather this set is empty, [0, y 1 1 ) × [y 1 1 , ∞] is not part of the outer set. Then, implement the approach on {y 1 0 } × [0, y 1 1 ). Finally, when min K k=1 R k,u ((y 1 0 , y 1 1 ) ) ≥ 0, [y 1 0 , ∞] × [y 1 1 , ∞] is in the outer set too. If L ≥ 3, one should use a recursive procedure. The outer set in dimension L can be computed as the union of the extrapolations of L outer sets in dimension L − 1. Indeed, one can begin with fixing the first coordinate θ 1 to y 1 z 1 and compute the outer set for the other L − 1 coordinates. If this set is not empty, it should be extrapolated by allowing θ 1 to belong to [y 1 z 1 , ∞]. In the same manner, one should use this approach for the L − 1 other coordinates. The outer set is then the union of the L sets obtained by fixing each coordinate. We consider estimation with an i.i.d. sample of size n, The goal is to estimate (ϕ 1 z (u)) L =1 for u < u Y , that is at quantiles where the regression function is point identified. We assume that we have consistent estimators S 1 of S 1 , u Y of u Y and y 1 z of y 1 z for z ∈ {z 1 , . . . , z L }. Choices of these estimators are discussed in Sections 4.2 and 4.3. We introduce further notations. The estimator of (ϕ 1 z (u)) L =1 is defined by and only the values of u such that u < u Y should be reported. In practice, we cannot compute the estimator at all u ∈ [0, 1]. Instead, we choose a grid 0 ≤ u 1 < u 2 , < · · · < u M ≤ 1, M values at which we want to estimate ϕ 1 . Take, for instance, {1/M, 2/M, . . . , 1}. When there are no competing risks, the estimator defined in (4.1) is the same as the one in Beyhum et al. (2021) , except that in the latter paper the term 1 − u in (4.1) is replaced with e −u because a different normalization of the distribution of U is used. When there are competing risks, the main difference between the estimator of the present paper and the one of Beyhum et al. (2021) is the fact that here the function S 1 is the survival function of a subdistribution while in Beyhum et al. (2021) from cause 1 have similar asymptotic properties. Therefore, in the present paper, we skip the theoretical analysis of solutions to (4.1) and refer the reader to Beyhum et al. (2021) for properties, proofs and further discussion. One major remaining difference between the estimation procedure in Beyhum et al. (2021) and the one in the present paper is that, here, we provide a theoretical analysis of how to estimate u Y , see Section 4.3 below. In this subsection, we discuss choices of S 1 that work under competing risks and random right censoring. Remark that S 1 (t, z|w) = S 1 (t|z, w)p z,w where p z,w = P(Z = z|W = w). We define the following stochastic processes: We estimate S 1 (·|z, w) using the Aalen-Johansen estimator of the cause-specific survival function (see Aalen and Johansen (1978) ; Geskus (2020)), that is To ensure that (4.1) has a unique solution, one should smooth S 1 AJ . Various techniques are available in the literature, including local polynomials and kernel smoothing. For instance, concerning the latter, if we use a kernel K with a bandwidth , we obtain Our final estimator of S 1 is Let us now introduce estimators of y 1 z , z = z 1 , . . . , z L and u Y . These estimators are new and not discussed in Beyhum et al. (2021) , which is why we provide theoretical results. Because y 1 z is the upper bound of the support the distibution of Y given {E = 1, Z = z}, a natural estimator is We have the following result. and that the grid (which depends on n) is dense in the sense that We also choose the sequences ∆ = (∆ ) n , = 1, . . . , L such that r n / min L =1 ∆ → 0. Then, The proof is given in Appendix B. An important remark is that the theoretical analysis in Beyhum et al. (2021) concerns the ideal estimator for all u < u Y where it is assumed that y 1 z , z = z 1 , . . . , z L and u Y are known. Our estimator ϕ 1 and ϕ 1 coincide on the event For u < u Y , when ( ϕ 1 z (u)) L =1 is consistent, we have lim n→∞ P(E) = 1 because y 1 z , z = z 1 , . . . , z L and u Y are consistent. Remark that ϕ 1 = ϕ 1 I(E) + ϕ 1 (1 − I(E)) and ( ϕ 1 z ) L =1 ∈ L =1 [0, y 1 z ) is bounded because y 1 z ≤ y 1 z , z = z 1 , . . . , z L almost surely. Hence, by the continuous mapping theorem and Slutsky's theorem, ϕ 1 has the same asymptotic properties as ϕ 1 and the results of Beyhum et al. (2021) are not altered. Let us consider the following data generating processes (henceforth, DGPs). The variable U has a uniform distribution on the interval [0, 1] and W is a Bernoulli random variable with parameter 2/3, independent of U . We generate where ∼ N (0, 1) is independent of (U, W ). Hence, in this simulation experiment there is one-sided noncompliance as in our empirical application (see Section 6). We set E = I(U > p Z ) + 1, where p 0 = 1/2, p 1 = 3/4. The duration is This leads to The treatment increases the probability that E = 1 and reduces the duration until cause 1 happens for a given value of u. Note that the support of U |Z is [0, 1] because of the presence of the random noise . Therefore, we have t 1 0 = 1, t 1 1 = 3/4 and u E = 1/2. We propose two designs for the censoring variable: Design 1: C is independent of (U, Z, W ) and uniform on the interval [1/3, 2/3]; Design 2: C is independent of (U, Z, W ) and uniform on the interval [1/3, 3/2]. In the first experiment, we have u C = 1/3 < u E , hence identification fails for u > u Y = 1/3 because of censoring. In the second exercise, it holds that u C = 1 and partial identification arises at u Y = 1/2 due to competing risks. With these DGPs, the probability of treatment given W = 1 is P(Z = 1|W = 1) = 73%. Under design 1, 30% of the observations are censored, while under design 2, it is only 10% (all these quantities are averages over 1,000,000 Monte Carlo replications). The sample size was set at n = 10, 000. We generated 1, 000 replications of the model under both designs. The estimator (4.1) was computed on the grid u m = 0.01m, m = 1, . . . , 100. We used a local polynomial of degree 1 with an Epanechnikov Kernel to smooth the Aalen-Johansen estimator of the cause-specific survival function. The bandwidth was selected according to the usual rule of thumb for normal densities. To estimate u Y , we used the estimator defined in (4.2). The quantity ∆ was chosen as the optimal bandwidth for normal density estimation with Epanechnikov kernel of the sample {Y i : i ∈ {1, . . . , n} such that Z i = z , δ i E i = 1}. (right). The estimator u Y almost always yields a quantile lower than but close to u Y , which avoids reporting results at points at which ϕ 1 is not identified. In Figures 4 and 5 , we present the average value of our estimator of the quantile treatment effect ϕ 1 1 − ϕ 1 0 for points of the grid {u m } M m=1 smaller than u Y . We also show the average of the naive estimator of the quantile treatment effect, which estimates ϕ 1 z by inverting the Aalen-Johansen estimator of the survival function P(T ≥ t, E = 1|Z = z). The naive estimator directly compares treated observations to untreated ones. It ignores the endogeneity issue and, hence, is biased. The figures also exhibit the average of the bounds of the 95% confidence intervals of the quantile treatment effects computed with 200 bootstrap draws. The true quantile treatment effects are not reported because they are indistinguishable from our estimator. Finally, Figures 6 and 7 show that the coverage of these 95% confidence intervals is almost nominal. We apply our methodology to the Health Insurance Plan of Greater New York experiment. This clinical trial aimed to evaluate the effect of periodic screening examinations (which aim to detect breast cancer) on breast cancer mortality. The study started in 1963 and lasted until 1986. The experiment follows 60,695 women between 40 and 60 years old. About half (30,565) of the participants were randomized into the control group (W = 0) while the rest (30,100) were assigned to the intervention group (W = 1). Members of the intervention arm were offered a treatment consisting of an initial breast examination and mammography and three yearly subsequent screens. 9,984 participants randomized into the intervention group refused the treatment, which corresponds to a rate of non-compliance of 33%. The women in the study group that accepted the treatment exhibited largely different observable characteristics from the ones that refused screening (see Shapiro (1997) ). This suggests that the treatment is endogenous. Let Z be the variable which equals 1 for women who were offered and accepted the treatment and 0 otherwise. All participants were followed through three mail surveys, respectively, 5, 10 and 15 years after their entry into the study. The outcome duration of interest T is the time from the initial randomization into the study until death. Censoring arises because some subjects are lost to follow-up before they die. Hence, C is the time between registration into the experiment and the last response to a follow-up survey. We consider two competing risks: deaths from breast cancer (j = 1) and death from any other cause (j = 2). In the sample, there are 786 deaths from breast cancer, 13, 798 deaths from other causes and 46, 111 censored observations. As in the simulations, the Aalen-Johansen estimators of the survival functions are smoothed using a local polynomial of degree 1. The bandwidths are selected similarly. The confidence intervals were computed using 200 bootstrap replications. For death by breast cancer, we computed the results on a grid of values of U starting at u 1 = 2 × 10 −4 with step 2 × 10 −4 . The value of the estimator of u Y was 0.013, corresponding to the 1.3% quantile. The quantile treatment effects can only be estimated for such low quantiles because the value of T 1 is almost always infinity (most women do not die from breast cancer) and censored (most women do not die in the 15 years following screening). The estimates of the quantile treatment effects along with the 95% confidence intervals are reported in Figure 8 . The quantile treatment effects are insignificant except for some low quantiles. This seems in contrast with findings in Shapiro (1997) , who concluded that the treatment reduced the probability of dying from breast cancer before a certain time, but in the latter paper the endogeneity issue is ignored. For cause 2 (death by other causes than breast cancer), a grid of values of U starting at u 1 = 0.02 with step 0.02 was chosen. We found u Y = 0.12. However, the estimated quantile treatment effects close to u Y exhibit a very irregular behavior. Hence, we chose to present in Figure 9 only the results for quantiles between 0 and 9.8%. As expected, the treatment does not have a significant effect on the subdistribution of the time until of death from another cause than breast cancer. This paper exhibits the link between competing risks models and nonparametric regression models in the presence of endogeneity. Thanks to this relationship, we are able to formulate our problem in terms of a quantile instrumental regression model. We study identification and estimation of the model. Numerical experiments assess the small sample performance of the method. We show how to revisit an empirical application using this paper's approach. Many possible directions for future research are interesting. A valuable generalization would allow for continuous or even dynamic treatments, instrumental variables and covariates X. With continuous variables, this could be done by replacing the system of equations (3.1) by a (potentially infinite) set of integral equations S 1 (ϕ 1 z (u), z, x|w)dz = 1 − u, for all x, w, u in the support of X, W and U . This is an ill-posed problem and regularization of the estimator would be required. It would also be interesting to identify directly the properties of the marginal distributions of the risks. Following this goal, one could assume that the risks are independent conditional on a set of covariables. Under this condition, the competing events can be treated as independent censoring. Another approach could be to extend models from the dependent censoring literature (such as in Basu and Ghosh (1978) ; Emoto and Matthews (1990) ; Deresa and Van Keilegom (2021); Czado and Van Keilegom (2021) ) to the case where endogeneity is allowed. Finally, it should be noted that because the duration T 1 has a mass at infinity, all the discussion of this paper can easily be adapted to the estimation of the latency in a cure model with endogenous treatment. In fact a competing risks model can be seen as a cure model where the cure status is known for some observations (see Betensky and Schoenfeld (2001) ). A promising project could investigate the identification of the cure fraction (the cause-specific probability of failure in the context of competing risks) when the treatment is endogenous. For a given event type j ∈ {1, 2}, the structural model of Section 2.1 implies the following rank invariance property: (RI) For two subjects i 1 and i 2 and z ∈ {z 1 , . . . , z L }, if T j i 1 (z) < T j i 2 (z), then (T j i 1 (z ), T j i 2 (z )) = (∞, ∞) or T j i 1 (z ) < T j i 2 (z ) for all z ∈ {z 1 , . . . , z L }. Indeed, T 1 i 1 (z) < T 1 i 2 (z) implies that U i 1 < U i 2 because ϕ 1 z is increasing. Then, if U i 1 < p z , we have T 1 i 1 (z ) < T 1 i 2 (z ) and otherwise it holds that (T i 1 (z ), T i 2 (z )) = (∞, ∞). The proof for j = 2 is similar. Let us discuss the implications of this assumption in a simple example. Let Z be a treatment for COVID-19 taking the values 0 (untreated) and 1 (treated). T is the duration until one of the two following competing risks happens: recovery (j = 1) and death (j = 2). We consider two subjects i 1 and i 2 . If T 1 i 1 (0) < T 1 i 2 (0), then i 1 recovers and i 2 dies or recovers after i 1 . Table 1 summarizes the different treated potential outcomes for (T 1 i 1 (1), T 1 i 2 (1)) allowed by the rank invariance assumption in this case. When T 2 i 1 (0) < T 2 i 2 (0), then i 1 dies and i 2 recovers or dies after i 1 and the possible treated potential outcomes for (T 2 i 1 (1), T 2 i 2 (1)) are displayed in Table 2 . It can be seen that the rank invariance assumption allows for many natural treatment effects. Let us consider two cases. If y 1 z < ∞, for > 0, we have | y 1 z − y 1 z | > if and only if Y i < y 1 z − for all i ∈ {1, . . . , n} such that Z i = z, δ i E i = 1. This has a probability at most An empirical transition matrix for non-homogeneous Markov chains based on censored observations The nonparametric identification of treatment effects in duration models Social experiments and instrumental variables with duration outcomes Identification of causal effects using instrumental variables Identifiability of the multinormal and other distributions under competing risks model Nonparametric estimation in a cure model with random cure times Nonparametric instrumental regression with right censored duration outcomes Correcting for selective compliance in a reemployment bonus experiment Bounds on average and quantile treatment effects on duration outcomes under censoring, selection, and noncompliance Nonparametric instrumental variables estimation for efficiency frontier Reader reaction: Instrumental variable additive hazards models with exposure-dependent censoring Quantile regression with censoring and endogeneity An IV model of quantile treatment effects Instrumental quantile regression inference for structural and treatment effect models Dependent censoring based on copulas Tolerating defiance? local average treatment effects without monotonicity On semiparametric modelling, estimation and inference for survival data subject to dependent censoring Testing for rank invariance or similarity in program evaluation A Weibull model for dependent censoring Estimation of conditional ranks and tests of exogeneity in nonparametric nonseparable models Treatment effects with censoring and endogeneity Competing risks: Aims and methods, Handbook of Statistics pp Instrumental variable method for time-to-event data using a pseudo-observation approach Instrumental variable additive hazards models Instrumental variables estimation with competing risk data Nonparametric binary instrumental variable analysis of competing risks data Causal inference using potential outcomes: Design, modeling, decisions Program evaluation with right-censored data Periodic screening for breast cancer: the hip randomized controlled trial Instrumental variable estimation in a survival context A nonidentifiability aspect of the problem of competing risks A comparison of two quantile models with endogeneity Two-stage residual inclusion for survival data and competing risks-an instrumental variable approach with application to seer-medicare linked data Instrumental variable with competing risk model = min L =1 (ϕ 1 z ) −1 (y 1 z − ). For ∈ {1 for some small ∆ > 0, = 1, . . . , L. The rationale of this estimator is as follows. By definition, u Y is the lowest value of u such that there exists ∈ {1, . . . , L} for which the left limit of ϕ 1 z (·) in this value is equal to y 1 z . If y 1 z is consistent, then y 1 z − ∆ should be close to y 1 z . As a result, since ϕ 1 is consistent, u Y is close to u Y . The main role of {∆ } L =1 is to provide a cushion which ensures that the probability of { u Y > u Y } is small. This event is not desirable because it would lead the researcher to report results for values of u for which (ϕ 1 z (u)) L =1 is not identified. The following proposition formalizes these ideas.Proposition 4.2 Assume that ϕ 1 z (·) is uniformly consistent on [0, u Y ) with rate of convergence r n → 0, that isWe prove the following Lemma.By the law of large numbers, Y 1 z /n converges to P(Z = z, δE = 1) > 0 in probability. Hence, Y 1 z goes to ∞ in probability and. We haveThis and (A.1) imply that P u * (∆) ≥ u I Y ≥ P ϕ 1 z * (∆) (u * (∆)) ≥ y 1 z * (∆) − ∆ * (∆) → 1, and since u Y ≥ u * (∆), we obtain P(u Y ≥ u I Y ) → 1. Notice that u m Y −1 ≤ u I Y , therefore P(u m Y −1 ≤ u Y ) → 1.