key: cord-0162612-is2weqev authors: Miao, Wang; Li, Xinyu; Sun, Baoluo title: A stableness of resistance model for nonresponse adjustment with callback data date: 2021-12-06 journal: nan DOI: nan sha: cde23343ae25621edc5d383c3e74584d387d8c75 doc_id: 162612 cord_uid: is2weqev The survey world is rife with nonresponse and in many situations the missingness mechanism is not at random, which is a major source of bias for statistical inference. Nonetheless, the survey world is rich with paradata that track the data collection process. A traditional form of paradata is callback data that record attempts to contact. Although it has been recognized that callback data are useful for nonresponse adjustment, they have not been used widely in statistical analysis until recently. In particular, there have been a few attempts that use callback data to estimate response propensity scores, which rest on fully parametric models and fairly stringent assumptions. In this paper, we propose a stableness of resistance assumption for identifying the propensity scores and the outcome distribution of interest, without imposing any parametric restrictions. We establish the semiparametric efficiency theory, derive the efficient influence function, and propose a suite of semiparametric estimation methods including doubly robust ones, which generalize existing parametric approaches. We also consider extension of this framework to causal inference for unmeasured confounding adjustment. Application to a Consumer Expenditure Survey dataset suggests an association between nonresponse and high housing expenditures, and reanalysis of Card (1995)'s dataset on the return to schooling shows a smaller effect of education in the overall population than in the respondents. Nonresponse has been a persistent concern of statisticians and applied researchers for many year, because it leads to substantially biased statistical inference and is frequently encountered in surveys and observational studies in almost every area of scientific research. A large body of work for nonresponse adjustment has relied on the assumption that the missingness is at random, while statistical inference with data missing not at random is confronted with the difficulty of identification and has relied on either restrictive models that lack flexibility or auxiliary data that may not always be available. Nonetheless, paradata are routinely available in modern surveys and observational studies. Paradata are the records tracking the data collection process (Couper, 1998) . A traditional form of paradata is callback data that record attempts to contact. Although it has been recognized that callback data are helpful to reduce the bias in surveys, they have not been used widely in statistical analysis until recently. In particular, there have been a few attempts that use callback data to estimate response propensity scores and then estimate the outcome mean by inverse probability weighting. These approaches rely on fully parametric propensity score models and fairly stringent assumptions. In contrast, we take a fundamentally nonparametric identification strategy. We propose a stableness of resistance assumption for identification, without placing any parametric models on the propensity scores and the outcome distribution. We establish the semiparametric efficiency theory and propose a suite of semiparametric estimators including doubly robust ones, which generalize existing parametric approaches. Consider an outcome prone to missing values, then the missingness is said to be at random (MAR) or ignorable if it does not depend on the missing outcome conditional on fully-observed covariates, and otherwise it is called missing not at random (MNAR) or nonignorable. The response propensity and the outcome distribution are identified under MAR, where identification means that the parameter or distribution of interest is uniquely determined from the observed-data distribution. Identification is crucial for missing data analysis, without which statistical inference may be misleading and is of limited interest. Methods to make inference under MAR abound, including likelihood-based methods, multiple imputation, inverse probability weighting, and doubly robust estimation. However, there is a suspicion that the missingness mechanism is MNAR in many situations. For example, social stigma in sensitive questions (e.g., HIV status, income, or drug use) makes nonresponse dependent on unobserved variables. Causal inference in the potential outcomes framework can be cast as a missing data problem by viewing the potential outcome under treatment/control as the missing variable, and the missingness is MNAR if certain confounders affecting both the treatment and the outcome are not observed. Under MNAR, statistical inference is substantially more challenging because in general identification fails to hold without extra information. Although one can achieve identification if the impact of the outcome on the missingness is completely known, this should be used rather as a sensitivity analysis because in practice such information is seldom available, see e.g. Robins et al. (2000) . Otherwise, identification does not hold even for fully parametric models, except for several fairly restrictive ones (e.g., Heckman, 1979; . For semiparametric and nonparametric models, identification and inference under MNAR essentially require the use of auxiliary data: either an instrumental variable correlated with the response propensity and uncorrelated with the missing variable (Sun et al., 2018; Liu et al., 2020; Tchetgen Tchetgen and Wirth, 2017) , or a shadow variable correlated the missing variable but uncorrelated with the response propensity (D'Haultfoeuille, 2010; Miao and Tchetgen Tchetgen, 2016; Wang et al., 2014; Zhao and Shao, 2015) . Researchers have traditionally sought such auxiliary variables from the sampling frame, but it is difficult for practitioners and the difficulty is amplified in multipurpose studies where multiple survey variables are concerned and multiple auxiliary variables are necessitated. Moreover, these two approaches invoke additional no interaction or completeness conditions that further limit their use, and they break down if the auxiliary variables also have missing values due to failure of contact in surveys. In contrast to the paucity of auxiliary variables in most sampling frames, paradata offer an important source of auxiliary information for nonresponse adjustment. Paradata are the records tracking the collection process of survey data-including the contact process, the length and the mode of interviews, and electronic traces such as mouse clicks and keystrokes in web surveys. Paradata are available in many surveys, e.g., the National Health Interview Survey (NHIS), the National Survey of Family Growth (NSFG), the European Social Survey (ESS), the National Survey on Drug Use and Health (NSDUH), the Behavioral Risk Factor Surveillance System (BRFSS), and the Consumer Expenditure Survey (CES). Appropriate use of paradata can improve survey quality and compensate for deficiencies in surveys. Kreuter (2013) provides a comprehensive literature review on the use of paradata in analyses of survey data. Olson (2013) reviews categories of paradata and challenges and opportunities in using paradata for nonresponse adjustment. A traditional form of paradata related to survey participation is the callback data, sometimes called level-of-effort data (Biemer et al., 2013) . Callbacks are records for recruitment attempts in the course of survey. In many surveys, follow-ups after the initial contact are made to increase the response rate, and the call attempts are recorded by interviewers and often kept for all units. In social science and survey study, callbacks have routinely been used to monitor response rates and to study how design features affect contact and cooperation in the course of data collection (Groves and Couper, 1998) ; they also figure importantly for the design of subsampling (Elliott et al., 2000; Cochran, 1977, chapter 13) . Stemming from the early idea of Politz and Simmons (1949) , callback data are increasingly looked into as a source of auxiliary information for nonresponse adjustment in political, social, economic, and epidemiological surveys. The callback design is analogous to the two/multi-phase sampling (Deming, 1953) by viewing follow-ups as the second-phase sample, but it differs in that nonresponse may still occur in the follow-up sample and is possibly not at random. Hence, nonresponse adjustment remains difficult even if callbacks are available, due to the challenge for identification and inference under MNAR. In Section 2, we further review the challenge. The notion of "continuum of resistance" has been asserted in social and survey researches for decades despite conflicting evidence of its validity (Lin and Schaeffer, 1995; Filion, 1976; Potthoff et al., 1993; Clarsen et al., 2021) ; the premise is that nonrespondents are more similar to delayed respondents than they are to early respondents, so that the most reluctant respondents are used to approximate the nonrespondents. A more elaborate approach is to model the joint likelihood of callbacks and frame variables (Biemer et al., 2013; Drew and Fuller, 1980) where identification requires untenable assumptions in practice-for instance, assuming that response probabilities are equal across different call attempts or levels of frame variables. Peress (2010) provides an excellent review on these approaches. Chen et al. (2018) and Zhang et al. (2018) generalize the Heckman (1979) Selection model to incorporate callbacks to improve efficiency, which entails joint normality of the missing outcome and the callback data and is in fact identified without callbacks and auxiliary data (see e.g., . Scharfstein et al. (1999, in the alongside discussion), Rotnitzky and Robins (1995, in an unpublished manuscript) , and Daniels et al. (2015) propose sensitivity analysis methods for callback data based on selection and patternmixture models, which are particularly convenient because the sensitivity parameters do not impact model fitting of the observed-data distribution. Most notably, Alho (1990) , Kim and Im (2014) , Qin and Follmann (2014) , and Guan et al. (2018) use callback data to estimate response propensity scores and propose inverse probability weighted and empirical likelihood-based estimation methods. Wood et al. (2006) and Jackson et al. (2010) apply this approach in a survey of Gulf war veterans and in the QUATRO trial, respectively. This approach hinges on a fully parametric linear logistic propensity score model and invokes a commonslope assumption to achieve identification, i.e., the coefficients of all explanatory variables included in the propensity score model do not vary for at least the first two calls. The parametric approaches circumvent the intrinsic potential of callback data for nonresponse adjustment. In contrast, we take a fundamentally nonparametric identification strategy. In Section 3, we characterize a stableness of resistance assumption for using callbacks to identify the full-data distribution and establish the identification results. This assumption states that the impact of the missing outcome on the response propensity does not vary between the first two call attempts. This assumption does not require any parametric models on the propensity score, does not restrict the impact of covariates on the missingness, and admits any type (discrete or continuous) of variables. Therefore, it is a nonparametric generalization of the linear logistic propensity score model and the common slope assumption made by Alho (1990) , Kim and Im (2014) , Qin and Follmann (2014) , and Guan et al. (2018) . To our knowledge, it is so far the most parsimonious condition characterizing the most flexible models for identification and inference of nonignorable missing data with callbacks. For the ease of presentation, we focus on the case with two call attempts, but the stableness of resistance assumption obviously guarantees identification for multiple callback attempts as more observations are available. In Section 4, we propose a parameterization of the full-data distribution and describe both an inverse probability weighted (IPW) and an outcome regression-based (REG) estimator. The proposed IPW estimator is a generalization of the calibration estimator of Kim and Im (2014) by allowing for nonlinear propensity score models and call-specific effects of covariates on nonresponse. The IPW estimator relies on two propensity score models for the first and second call attempts, while the REG estimator relies on the propensity score model for the first call and the outcome regression model for the second call. However, if the corresponding propensity score or regression model is incorrect, the IPW or REG estimator is no longer consistent. Therefore, we propose a semiparametric efficient and doubly robust approach in Section 5. We establish the semiparametric theory for nonresponse adjustment with callbacks under the stableness of resistance assumption, by characterizing the tangent space and the semiparametric efficiency bound for estimating the outcome mean. The efficient influence function motivates a doubly robust estimator that affords double protection against misspecification of working models: it remains consistent if either the second-call propensity score model or the second-call regression model is correct but not necessarily both, provided the first-call propensity score model is correct; and moreover, it attains the efficiency bound for the nonparametric model when all working models are correct. In Section 6, we consider extensions of the proposed approach to estimation of a general smooth functional, to causal inference for unmeasured confounding adjustment, and to the setting with multiple callbacks. Sections 7 and 8 include numerical illustrations with simulations and two real data applications to a Consumer Expenditure Survey dataset and a National Longitudinal Survey of Young Men dataset previously analyzed by Card (1995) . Section 9 includes discussions on other extensions and limitations of the proposed approach. Throughout the paper, we let (X, Y ) denote the frame variables that are released in a survey or observational study for investigation of a particular scientific purpose, where X denotes a vector (possibly a null set) of fully observed covariates and Y the outcome prone to missing values. Suppose in the data collection process the interviewer has continued to contact a unit until he/she responds or the maximum number of call attempts K is achieved. We let R k denote the response state of Y with R k = 1 if Y is available in the kth call and R k = 0 otherwise. The observations of (R 1 , . . . , R K ) recording the response process in the survey are known as callback data. Let f denote a generic probability density or mass function. Suppose the full data contain n independent and identically distributed samples from the joint distribution f (X, Y, R 1 , . . . , R K ), and the values of Y with R K = 0 are missing in the observed data. Table 1 illustrates the data structure of a survey with callbacks, where the final response status R K can be created based on the frame data but the callback data (R 1 , . . . , R K ) contain much more information about the response process than solely R K does. To ground ideas, we will focus on the setting with two calls or when only the first two calls are considered and discuss the extension to multiple calls in the supplement. Our primary goal is to make inference about the outcome mean µ = E(Y ) based on the observed data and extension to a general functional of the full-data distribution f (X, Y ) will be considered in Section 6. : : Researchers have traditionally used the frame data to make nonresponse adjustment without engaging deeply with the callback data, in which case, only the final response status R K is used. Under MAR, i.e., is not equal to f (Y | X, R K = 1). Callback data contain useful information about the reluctance of units to respond; however, they still do not suffice for identification. Consider a toy example of survey on a binary outcome Y with two calls; then the observed-data distribution is captured by f (Y, R 1 = 1) and f (Y, R 2 = 1 | R 1 = 0), which only offers four constraints on five unknown parameters f (Y ), f (R 1 = 1 | Y ), f (R 2 = 1 | R 1 = 0, Y ). Therefore, one cannot identify f (Y, R 1 , R 2 ) without further restrictions. To achieve identification with callback data, Alho (1990) , Kim and Im (2014) , and Qin and Follmann (2014) consider a common-slope linear logistic model for the propensity scores. Example 1 (A common-slope linear logistic model). Alho (1990) assumes that for k = 1, . . . , K, (1) Hereafter, we let logit x = log{x/(1 − x)} be the logit transformation of x and expit x = exp(x)/{1 + exp(x)} its inverse. Example 1 is a very restrictive model: the propensity score for each callback follows a linear (in parameters) logistic model and the effects of both the covariate X and the outcome Y do not vary across call attempts. Daniels et al. (2015) mention an identification result that allows for call-specific coefficients of X but requires constant coefficients of Y across all call attempts without providing a formal proof. Guan et al. (2018) relax the common slope assumption by only requiring α 11 = α 21 and γ 1 = γ 2 . This enables identification with only two call attempts, but their identifying strategy based on a meticulous analysis of the logistic model only admits continuous covariates and outcome. To unleash the potential of callback data for nonresponse adjustment, it is of great interest to establish identification for other types of models than the linear logistic model. So far, however, identification and inference of semiparametric and nonparametric propensity score models with callbacks are not available. An all-important step for identification under MNAR is to determine the degree to which the missingness departs from MAR. The odds ratio function is a particularly useful measure widely used in the missing data literature to characterize the association between the outcome and response propensity (Chen, 2007; Kim and Yu, 2011; Miao and Tchetgen Tchetgen, 2016) . We define the odds ratio functions for the response propensity in the the first and second calls as follows, The odds ratio functions Γ 1 (X, Y ) and Γ 2 (X, Y ) measure the impact of the outcome in the first and second calls, respectively, by quantifying the change in the odds of response caused by a shift of Y from a common reference value Y = 0 to a specific level while controlling for covariates. Any other value within the support of Y can be chosen as the reference value. Therefore, the odds ratio function is a measure of the resistance to respond caused by the outcome. Odds ratio functions equal to zero correspond to MAR where the outcome does not influence the response. Our identification strategy rests on the following assumption. (iii) Stableness of resistance: Condition (i) reveals the feature of callbacks: observations recorded in the first call will be kept through the course of the survey, i.e., if R 1 = 1 then R 2 = 1. Condition (ii), a standard positivity assumption in missing data analysis, ensures sufficient overlap between nonrespondents and respondents in each call attempt. Condition (iii) is the key identifying condition, which reveals that the impact of the outcome on the response propensity remains the same in the first two calls. The stableness of resistance assumption is plausible if the social stigma attached to a unit does not change through the course of survey or the contacts are made in a short period of time, so that the resistance to response caused by the outcome remains the same in the first two calls. However, Assumption 1 admits call-specific impacts of covariates on nonresponse. be the logit transformation of the corresponding baseline propensity scores evaluated at the reference value Y = 0, then they capture call-specific impacts of covariates and are unrestricted. Moreover, Assumption 1 does not impose any parametric models on the propensity scores. Under Assumption 1, the propensity scores have the following representation: Assumption 1 is a generalization of the common-slope logistic propensity score model in Example 1, where A 1 (X) = α 10 + α T 1 X, A 2 (X) = α 20 + α T 1 X, and Γ(X, Y ) = γY . Whether (iii) holds or not does not depend on the choice of the reference value in the definition of the odds ratio functions and the baseline propensity scores. The following implication of Assumption 1 plays a key role in our identification strategy. We prove this proposition in the supplement. The last equation leads to an inequality: i.e., given X the density ratio on the left hand side is uniformly bounded from zero. This is a restriction on the observed-data distribution imposed by Assumption 1. We have the following identification result. Proof. Proposition 1 shows that it suffices to identify This is an equation with D(X) unknown while all the other quantities are available from the observed-data distribution. Identification of D(X) can be assessed by checking uniqueness of the solution to this equation. For any fixed x and any D(x) such that (6) Therefore, for any fixed x the solution to (7) is unique. Applying this argument to all x, then D(X) is identified and We achieve identification of f (X, Y, R 1 , R 2 ) under the stableness of resistance assumption, without imposing any parametric models for the propensity scores and restrictions on the effects of covariates. In contrast to previous parametric identification strategies that rely on parameter counting or meticulous analysis of the logistic model, our identification strategy relies on uniqueness of the solution to equation (7) for the unknown parameter D(X), which holds for any types of covariates and outcome. To our knowledge, Assumption 1 is so far the most parsimonious condition characterizing the most flexible model for identification with callbacks, and Theorem 1 is so far the most general identification result. When multiple calls (K ≥ 3) are available, identification of f (X, Y, R 1 , . . . , R K ) equally holds under Assumption 1, even if the propensity scores and outcome distributions for the third and later call attempts are completely unrestricted. In lieu of engaging with callback data, researchers have previously sought fully observed auxiliary variables from the frame data to establish identification, either an instrumental variable Z such that Z ⊥ ⊥ Y | X (e.g., Sun et al., 2018; Liu et al., 2020; Tchetgen Tchetgen and Wirth, 2017) Haultfoeuille, 2010; Miao and Tchetgen Tchetgen, 2016; Wang et al., 2014; Zhao and Shao, 2015) . Moreover, the instrumental variable approach requires that Z and Y do not interact in the propensity score, and the shadow variable approach invokes the completeness of f (Y | W ) (or f (Y | W, R K = 1)) that requires the number of levels or dimension of W to be at least as large as that of Y . Looking for auxiliary variables satisfying such conditions from sampling frames is sometimes difficult in practice and fails if the missingness is due to failure of contacts and all frame variables have missing values, which is often the case in surveys. In contrast, our identification strategy employs callback data that are often available in modern surveys, and rests on the stableness of resistance assumption. Applying Theorem 1 to the linear logistic model, we immediately obtain the following result. Proposition 2. Assuming that logit f (R k = 1 | R k−1 = 0, X, Y ) = α k0 + α k1 X + γY for k = 1, 2, then α k0 , α k1 , and γ are identified. Proposition 2 generalizes the identification of Example 1 by admitting call-specific coefficients of X in the propensity scores and only requiring the coefficients of Y to be equal for the first two calls. Example 1 in fact reveals a continuum-of-resistance model where nonrespondents are more similar to late respondents than to early ones due to a common odds ratio for all call attempts; however, in the model in Proposition 2 nonrespondents may depart further from late respondents than from early ones, depending on the form of odds ratio functions in later calls. In this case, continuum-of-resistance models are not suitable. Given the observed data, we can in principle first estimate , then plug them into equation (7) to solve for D(X), and finally obtain the estimate of f (X, Y, R 1 , R 2 ) according to Proposition 1. The first step can be achieved by standard nonparametric estimation and the third step only involves basic arithmetic; however, solving equation (7) in the second step is in general complicated. In the next section, we consider a parameterization for the joint distribution f (X, Y, R 1 , R 2 ) and develop feasible estimation methods that only require the working models to be partially correct. Under Assumption 1, we introduce the following factorization of the joint distribution as the basis for parameterization and estimation. where c 1 (X) is a normalizing function of X that makes the right hand side of (8) a valid density function. We prove (8) in Lemma 1 in the supplement. This factorization enables a convenient and congenial specification of four variationally independent components of the joint distribution: • the odds ratio function Γ(X, Y ); • and the outcome distribution for the second call f (Y | R 2 = 1, R 1 = 0, X). This kind of factorization of a joint density into a combination of univariate conditionals and odds ratio is widely applicable; see Osius (2004) , Chen (2007) , Yu (2011), and Franks et al. (2020) for examples in missing data analysis and causal inference. The joint distribution is determined once given the two baseline propensity scores, the odds ratio, and the second-call outcome distribution. We consider semiparametric estimation under correct specification of a subset of these four models. For notational convenience, we let π 1 (X, Y ) = f (R 1 = 1 | X, Y ) and π 2 (X, Y ) = f (R 2 = 1 | R 1 = 0, X, Y ) denote the propensity scores, A 1 (X) = logit f (R 1 = 1 | X, Y = 0) and A 2 (X) = logit f (R 2 = 1 | R 1 = 0, X, Y = 0) the logit transformation of the corresponding baseline propensity scores, and f 2 (Y | X) = f (Y | R 2 = 1, R 1 = 0, X) the outcome model for the second call. Note that π 1 (X, Y ) and π 2 (X, Y ) are determined by A 1 (X), A 2 (X), Γ(X, Y ) as in equations (4)-(5). We will write π 1 , π 2 , A 1 , A 2 , f 2 , Γ for short where it does not cause confusion. We specify parametric working models for two baseline propensity scores A 1 (X; α 1 ), A 2 (X; α 2 ), and the odds ratio function Γ(X, Y ; γ). By definition, we require Γ(X, Y = 0; γ) = 0. This is equivalent to specify π 1 (X, Y ; α 1 , γ) and π 2 (X, Y ; α 2 , γ). The logistic model in Proposition 2 is an example. The following equations characterize the propensity scores, The first equation follows from the definition of π 1 and the other two echo the definition of π 2 by noting that These three conditional moment equations motivate the following marginal moment equations for estimating (α 1 , α 2 , γ): whereÊ denotes the empirical mean operator and {V 1 (X), V 2 (X), U (X, Y )} is a vector of user-specified functions of dimension at least as large as that of (α 1 , α 2 , γ). A natural choice is V 1 (X) = ∂A 1 (X; α 1 )/∂α 1 , V 2 (X) = ∂A 2 (X; α 2 )/∂α 2 , and U (X, Equations (12)-(14) only involve the observed data. The generalized method of moments (Hansen, 1982) can be implemented to solve these equations. Letting (α 1,ipw ,α 2,ipw ,γ ipw ) be the nuisance estimators obtained from (12)-(14),π 1 = π 1 (α 1,ipw ,γ ipw ) andπ 2 = π 2 (α 2,ipw ,γ ipw ) the estimated propensity scores, andp 2 =π 1 +π 2 (1 −π 1 ) an estimator of p 2 = f (R 2 = 1 | X, Y ), we propose the following IPW estimator of the outcome mean, Preceding our proposal, Kim and Im (2014) developed a calibration estimator under the common-slope logistic model in Example 1. Their estimator can in fact be obtained from our IPW estimator by choosing appropriate functions V 1 , V 2 , U . However, their estimator fails if the slopes of covariates X differ in the two propensity score models, while our approach works. In this sense, our IPW estimator can be viewed as a generalization of their calibration estimator. Under the common-slope linear logistic model, Qin and Follmann (2014) and Guan et al. (2018) developed empirical likelihood-based estimation that is convenient to incorporate auxiliary information to achieve higher efficiency; it is of interest to extend their approach to the setting with call-specific slopes for covariates. We illustrate the connection between our IPW estimator and the calibration estimator of Kim and Im (2014) with the linear logistic model. Example 2. Assuming the common-slope logistic model logit π k (X, Y ) = α k0 + α 1 X + γY for k = 1, 2, Kim and Im (2014) propose a calibration estimator of the nuisance parameters, which solves These two equations can be obtained from our estimating equations (12)-(14) by letting V 1 (X) = 1, V 2 (X) = 0 and U (X, Y ) = (1, X T , Y ) T . However, if the propensity score model is logit π k (X, Y ) = α k0 + α k1 X + γY with call-specific coefficients of X, then (16)-(17) fail to estimate the parameters because the number of estimating equations is not sufficient. In contrast, we can still consistently estimate the parameters with (12)-(14) by using V 1 (X) = V 2 (X) = (1, X T ) T and U (X, Y ) = Y , for example. Figure 1 presents a brief comparison of the performance of these two estimation methods in a numerical simulation. In Scenario (i) where π 1 and π 2 have a common slope in X, the calibration estimator obtained from (16)-(17) has little bias, but the proposed IPW estimator can achieve higher efficiency by choosing appropriate user-specified functions V 1 , V 2 , U in (12)-(14). In Scenario (ii) where π 1 and π 2 have different slopes in X, the calibration estimator has large bias while the proposed IPW estimator has little bias. Scenario (i) Scenario (ii) Figure 1 : Bias for estimators of µ and γ. Note for Figure 1 : Data are generated according to 6+0.6X, 0.6 2 ); α 1 = α 2 = 0.8 in Scenario (i) and α 1 = −0.8, α 2 = 0.8 in Scenario (ii);γ calib is obtained from (16)- (17),μ calib is obtained from (15) with nuisance estimators given by (16)- (17), andμ ipw1 ,μ ipw2 employ nuisance estimators given by (12)- (14) with V 1 (X) = V 2 (X) = (1, X) T and U (X, Y ) = Y and U (X, Y ) = π 2 Y , respectively; 1000 simulations under sample size 3000 are replicated for each scenario. Alternatively, the outcome mean can be obtained by estimation or imputation of the missing values. We specify and fit working models π 1 (α 1 , γ), f 2 (Y | X; β) for the first-call propensity score f (R 1 = 1 | X, Y ) and the second-call outcome distribution f (Y | X, R 2 = 1, R 1 = 0), and impute the missing values with which is known as the exponential tilting or Tukey's representation (Kim and Yu, 2011; Vansteelandt et al., 2007; Franks et al., 2020) . To obtain the nuisance estimators (β,α 1,reg ,γ reg ), we solve where E(· | X, R 2 = 0; β, γ) is evaluated according to (18) and U (X, Y ) is a user-specified function of dimension at least as large as (α 1 , γ). Equation (19) is the score equation of f 2 (Y | X; β). Equation (20) is motivated by the equationÊ[(1 − R 2 ){U (X, Y ) − E{U (X, Y ) | R 2 = 0, X}}] = 0, but evaluation of E{U (X, Y )} required for solving this equation is untenable due to missing values of Y , and thus we replace withÊ{R 1 U (X, Y )/π 1 }. An estimator of the outcome mean by imputing the missing outcome values with E(Y | X, R 2 = 0;β,γ reg ) iŝ We refer to this approach as the outcome regression-based (REG) estimation, although the first-call propensity score model is also involved. Guan et al. (2018) have previously proposed an estimator that involves the full-data outcome regression E(Y | X) whereas our REG estimation involves E(Y | R 2 = 1, R 1 = 0, X) that only concerns the observed data and enjoys the ease for model specification and estimation. Moreover, estimation of their outcome regression model requires estimating the propensity scores for all call attempts, but our REG estimation only requires estimating the first-call propensity score. Note that (12)-(15) are unbiased estimating equations for (α 1 , α 2 , γ, µ) in model M ipw = {π 1 (α 1 , γ) and π 2 (α 2 , γ) are correct}, and (19)-(21) are unbiased estimating equations for (β, α 1 , γ, µ) in model M reg = {π 1 (α 1 , γ) and f 2 (Y | X; β) are correct}. As a consequence, consistency and asymptotic normality of (α 1,ipw ,α 2,ipw ,γ ipw ,μ ipw ) in model M ipw and of (β,α 1,reg ,γ reg ,μ reg ) in model M reg can be established under standard regularity conditions (see e.g., Newey and McFadden, 1994) by following the theory of estimating equations, which we will not replicate here. The choice of user-specified functions V 1 (X), V 2 (X), U (X, Y ) depends on the working models and influences the efficiency of the estimators. In principle, the optimal choice can be obtained by deriving the efficient influence function for the nuisance parameters in models M ipw and M reg , respectively; however, the potential prize of attempting to attain local efficiency for nuisance parameters may not always be worth the chase because it depends on correct modeling of additional components of the joint distribution beyond the working models, and as pointed out by Stephens et al. (2014) such additional modeling efforts seldom deliver the anticipated efficiency gain. Besides, consistency of the estimators is undermined if the required working models are incorrect. Therefore, we next construct a doubly robust and locally efficient estimator based on a possibly inefficient estimator of the nuisance parameters. We consider the model M npr characterized by Assumption 1, Although the stableness of resistance assumption does impose an inequality constraint (6) on the observeddata distribution, we refer to M npr as the (locally) nonparametric model because no parametric models are imposed in M npr and as established in the following, the observed-data tangent space under M npr is the entire Hilbert space. We aim to derive the set of influence functions for all regular and asymptotically linear (RAL) estimators of µ and the efficient influence function under M npr . To achieve this goal, the primary step is to derive the observed-data tangent space for model M npr . Hereafter, we let O = (X, Y, R 1 , R 2 ) if R 2 = 1 and O = (X, R 1 , R 2 ) if R 2 = 0 denote the observed data. Proposition 3. The observed-data tangent space for M npr is where h 1 (X, Y ), h 2 (X, Y ) and h 3 (X) are arbitrary measurable and square-integrable functions. This proposition states that the observed-data tangent space for M npr is the entire Hilbert space of observed-data functions with mean zero and finite variance, equipped with the usual inner product. Hence, the stableness of resistance assumption does not impose any local restriction on the observed-data distribution. As a result, there exits a unique influence function for µ in model M npr , which must be the efficient one. We have derived the closed form for the efficient influence function. Theorem 2. The efficient influence function for µ in the nonparametric model M npr is From Theorem 2, the semiparametric efficiency bound for estimating µ in M npr is E{IF(O; µ) 2 }. A locally efficient estimator attaining this semiparametric efficiency bound can be constructed by plugging nuisance estimators of π 1 , π 2 , E(· | R 2 = 0, X) that converge sufficiently fast into the efficient influence function and then solvingÊ{IF(O; µ)} = 0. However, if X has more than two continuous components, one cannot be confident that these nuisance parameters can either be modeled correctly or estimated nonparametrically at rates that are sufficiently fast. It is therefore of interest to develop a doubly robust estimation approach, which delivers valid inferences about the outcome mean provided that a subset but not necessarily all low dimensional models for the nuisance parameters are specified correctly. To construct a doubly robust (DR) estimator, we specify working models {A 1 (X; α 1 ), A 2 (X; α 2 ), Γ(X, Y ; γ), f 2 (Y | X; β)} and estimate the nuisance parameters by solving where {V 1 (X), V 2 (X), U (X, Y )} are user-specified functions having dimensions at least as large as (α 1 , α 2 , γ), respectively. Let (β,α 1,dr ,α 2,dr ,γ dr ) denote the solution to (22)-(25) andπ 1 = π 1 (α 1,dr ,γ dr ),π 2 = π 2 (α 2,dr ,γ dr ), then an estimator of µ motivated from the influence function in Theorem 2 iŝ Equations (22)-(24) for estimating (β, α 1 , α 2 ) remain the same as (19), (12), and (13) in the IPW and REG estimation, respectively; equation (25) is a doubly robust estimating equation for γ. We summarize the double robustness of the estimators. Theorem 3. Under Assumption 1 and regularity conditions described by Newey and McFadden (1994, Theorems 2.6 and 3.4), (α 1,dr ,γ dr ,μ dr ) are consistent and asymptotically normal provided one of the following conditions holds: • A 1 (X; α 1 ), Γ(X, Y ; γ), and A 2 (X; α 2 ) are correctly specified; or • A 1 (X; α 1 ), Γ(X, Y ; γ), and f 2 (Y | X; β) are correctly specified. Furthermore,μ dr attains the semiparametric efficiency bound for the nonparametric model M npr at the intersection model M ipw ∩ M reg where all models {A 1 (X; α 1 ), A 2 (X; α 2 ), Γ(X, Y ; γ), f 2 (Y | X; β)} are correct. This theorem states that (α 1,dr ,γ dr ,μ dr ) are doubly robust against misspecification of the second-call baseline propensity score A 2 (X; α 2 ) and the second-call outcome distribution f 2 (Y | X; β), provided that the first-call propensity score π 1 (X, Y ; α 1 , γ) (i.e., A 1 (X; α 1 ), Γ(X, Y ; γ)) is correctly specified. Moreover,μ dr is locally efficient if all working models are correct, regardless of the efficiency of the nuisance estimators. The outcome mean estimatorμ dr has an analogous form to the conventional augmented inverse probability weighted (AIPW) estimator: the first part is an IPW estimator and the second part is an augmentation term involving the outcome regression model. Compared to the IPW and REG estimators in the previous section, the doubly robust estimator offers one more chance to correct the bias due to model misspecification. However, if both A 2 (X; α 2 ) and f 2 (Y | X; β) are incorrect, the proposed doubly robust estimator will generally also be biased. The odds ratio Γ(X, Y ; γ) is essential for all three proposed estimation methods. This is because, as noted previously, the odds ratio encodes the degree to which the outcome and the missingness process are correlated; in order to estimate the outcome mean, one must be able to account for this correlation. The first-call baseline propensity score A 1 (X; α 1 ) needs to be correct for estimation of the odds ratio, and as a result,α 1,dr is also doubly robust. If π 1 (α 1 , γ) is incorrect,μ dr is in general not consistent even if both A 2 (X; α 2 ) and f 2 (Y | X; β) are correct-because the latter two models only concern the second call. Variance estimation and confidence intervals for the doubly robust approach also follow from the general theory for estimation equations (see e.g., Newey and McFadden, 1994) and can be constructed based on the normal approximation under standard regularity conditions. Alternatively, bootstrap can be implemented to compute the variances of estimators. Doubly robust methods have been advocated in recent years for missing data analysis, causal inference, and other problems with data coarsening. However, previous proposals have assumed that the odds ratio function Γ(X, Y ) is either known exactly with the special case of MAR (e.g., Scharfstein et al., 1999; Lipsitz et al., 1999; Tan, 2006; van der Laan and Rubin, 2006; Vansteelandt et al., 2007; Okui et al., 2012) , or can be estimated with the aid of an extra instrumental or shadow variable (e.g., Miao and Tchetgen Tchetgen, 2016; Sun et al., 2018; Ogburn et al., 2015) . We have shown that with callbacks widely available in survey data, one can doubly robust estimate both the odds ratio model and the outcome mean. If the missingness mechanism is indeed at random, i.e. Γ(X, Y ) = 0, then f (X, Y, R 2 ) is identified without callback data. The efficiency bound for the nonparametric model, with the same observed data of (X, Y ) but without the callback data, is the variance of the conventional AIPW influence function: Under MAR, this efficiency bound does not vary in the presence of callback data. We prove this result in the supplement. From Proposition 4, callback data do not deliver efficiency gain under MAR, however, they bring additional modeling strategies and doubly robust estimators. 6. EXTENSIONS 6.1 Estimation of a general full-data functional We extend the proposed IPW, REG, and DR methods to estimation of a general smooth full-data functional θ, defined as the unique solution to a given estimating equation E{m(X, Y ; θ)} = 0. Familiar examples include the outcome mean with m(X, Y ; µ) = Y − µ; the least squares coefficient with m(X, Y ; θ) = X(Y − θ T X); and the instrumental variable estimand in causal inference with m(X, Y ; θ) = Z(Y − θ T W ), where W and Z are subvectors of X, θ is the causal effect of W on Y and is subject to unmeasured confounding and Z is a set of instrumental variables used for confounding adjustment. Given estimators of the nuisance parameters, the IPW and REG estimators of θ can be obtained by solving respectively. Letting m(θ) ≡ m(X, Y ; θ), the efficient influence function for θ in model M npr is We prove this result in the supplement. Then a doubly robust and locally efficient estimator of θ can be constructed by solvingÊ{IF(O; θ)} = 0, after obtaining nuisance estimators from (22)-(25). 6.2 Estimation of the causal effect on the untreated in the presence of unmeasured confounding Suppose we are interested in the effect of a binary treatment R (1 for treatment and 0 for control) on an outcome Y . Following the potential outcomes framework for causal inference (Rubin, 1974) , we let Y 1 , Y 0 denote the potential outcomes would be observed if the treatment were set to R = 1 or 0, respectively. The observed outcome is a realization of the potential outcome under the treatment a unit actually received, i.e., Y = RY 1 + (1 − R)Y 0 . A causal effect is defined as a comparison of the potential outcomes. Therefore, causal inference is inherently a missing data problem in the potential outcomes framework because Y 1 is observed for only the treated units and Y 0 for only control units. If the treatment assignment mechanism is ignorable, i.e., R ⊥ ⊥ (Y 1 , Y 0 ) | X, then identification and estimation of the potential outcome mean is analogous to the MAR setting in missing data analysis. However, unmeasured confounding often occurs in observational studies, i.e., certain variables correlated with both the treatment and the outcome are not measured. In this case, the missingness of potential outcomes is MNAR, and hence the potential outcome distributions are no longer identified. The instrumental variable (e.g., Angrist et al., 1996) is widely used and recently a proximal inference framework (e.g., Miao et al., 2018; Shi et al., 2020; Tchetgen Tchetgen et al., 2020) is established for confounding bias adjustment. The former approach entails an instrumental variable for the treatment, which is independent of the unmeasured confounder, correlated with the treatment, and does not directly affect the outcome, and the latter rests on two proxies of the unmeasured confounder. Besides, both approaches require additional conditions to achieve identification, such as effect homogeneity or treatment monotonicity for the instrumental variable approach and completeness in confounder proxies for proximal inference. Here we show how to apply the callback design to identify causal effects in the presence of unmeasured confounding. Suppose we make two calls to promote the treatment-for instance, the injection of certain Covid-19 vaccine. Let R 1 , R 2 denote treatment states in these two calls, and Y the level of antibody six months after taking the shot. We have that R 2 ≥ R 1 . Assuming that Y R 1 =1 = Y R 2 =1 holds for the potential outcomes, i.e., the effect of vaccine is the same whenever it is taken in these two calls. We are particularly interested in the average treatment effect on the untreated ATUT = E(Y R 2 =1 − Y R 2 =0 | R 2 = 0), which is the effect on those who ultimately do not take the vaccine. Assuming that a unit's resistance or willingness to taking the treatment remains the same in these two calls, i.e., the odds ratio functions of f (R 1 = 1 | X, Y R 2 =1 ) and f (R 2 = 1 | X, Y R 2 =1 , R 1 = 0) are the same, we can apply the identification strategy in Section 3 to identify E(Y R 2 =1 | R 2 = 0) and thus ATUT. The callback design does not involve instrumental or proxy variables, but makes use of the history or callbacks for treatment and invokes the stableness of resistance assumption to achieve identification. When multiple calls (K ≥ 3) are available, the proposed IPW, REG, and DR estimation methods developed for the setting with two calls still work but they are agnostic to the observed data after the second call, and the influence function given in Section 5 is no longer efficient. In the supplement, we extend the proposed IPW, REG, and DR estimators to the setting with multiple callbacks, which can incorporate all observations on (X, Y ). It is of great interest to derive the set of influence functions and the efficient one, and to construct multiply robust estimators in the sense of Vansteelandt et al. (2007) when multiple callbacks are available; however, the tangent space and the projection onto it, which are the basis for calculating influence functions, are complicated in this situation. Therefore, we defer this to future work. We evaluate the performance of the proposed estimators and assess their robustness against misspecification of working models and violation of the key identifying assumption via simulations. Let X = (1, X 1 , X 2 ) T and X = (1, X 2 1 , X 2 2 ) with X 1 , X 2 independent following a uniform distribution Unif(−1, 1). We consider four data generating scenarios according to different choices for the second-call baseline propensity score and the second-call outcome distribution. The following table presents the data generating mechanism and the working models for estimation. Data generating model π 1 = expit (α T 1 X + γY ), π 2 = expit (α T 2 W 1 + γY ), f 2 (Y | X) ∼ N (β T W 2 , σ 2 ) Four scenarios with different choices of (W 1 , W 2 ) TT FT TF FF W 1 = X, W 2 = X W 1 = X, W 2 = X W 1 = X, W 2 = X W 1 = X, W 2 = X α T 1 (-1, 0.5, 0.2) (-0.3, -0.7, 0.7) (-1, 1, -0.1) (-0.3, -0.5, 1) α T (1, 0.5, 0.2) (-0.3, 1.9, 0.9) (0.5, 1, -0. Hence, the working model for the second-call baseline propensity score is correct in Scenarios (TT) and (TF), the second-call outcome model is correct in Scenarios (TT) and (FT), and they both are incorrect in Scenario (FF). The first-call baseline propensity score and the odds ratio models are correct in all scenarios. Based on the working models specified above, we implement the proposed IPW, REG, and DR methods for estimation of the outcome mean and the odds ratio parameter. We also compute the variance of the estimators and the confidence interval based on normal approximation and evaluate its coverage rate. For each scenario, we replicate 1000 simulations at sample size 3000. Figures 2 and 3 show the bias for estimators of the outcome mean and the odds ratio parameter, respectively, and Table 2 shows the coverage rates for the corresponding confidence intervals. In Scenario (TT) where all working models are correct, all three estimators have little bias and the 95% confidence intervals have coverage rates close to 0.95; in Scenario (FT) the second-call baseline propensity score model is incorrect but the second-call outcome model is correct, then the IPW estimator has large bias with coverage rate well below 0.95 and the REG estimator has little bias with the coverage rate close to 0.95; conversely, in Scenario (TF) the second-call baseline propensity score model is correct but the second-call outcome model is incorrect, then the IPW estimator has little bias with an appropriate coverage rate and the REG estimator has large bias with an undersized coverage rate. In contrast, the DR estimator has little bias with appropriate coverage rates in all the three scenarios when at least one of these two working models is correct. These numerical results demonstrate the advantage ofμ dr in terms of robustness against model misspecification. However, in Scenario (FF) where both working models are incorrect, all three estimators are biased. Therefore, we recommend to implement all three methods and use different working models to gain more robust inferences. In addition to model misspecification, we also evaluate sensitivity of the inference against violation of the identifying assumption. In the supplement, we include a sensitivity analysis to evaluate the performance of the proposed estimators when call-specific odds ratios arise, i.e., the stableness of resistance assumption is not met. The difference between the odds ratios is used as the sensitivity parameter to capture the degree to which the stableness of resistance is violated. The proposed estimators exhibit small bias as the difference of odds ratios varies within a moderate range but it could become severe if there were a big gap between the odds ratios. Therefore, to obtain reliable inferences in practice, we recommend implementing and comparing multiple applicable methods. The Consumer Expenditure Survey (CES; National Research Council, 2013) is a nationwide survey conducted by the U.S. Bureau of Labor Statistics to find out how American households make and spend money. It comprises two surveys: the Quarterly Interview Survey on large and recurring expenditures such as rent and utilities, and the Diary Survey on small and high frequency purchases, such as food and clothing. The survey data are released annually since 1980, which contain detailed paradata including the callback history. We use the public-use microdata from the Quarterly Interview Survey in the fourth quarter of 2018, which are downloaded from https://www.bls.gov/cex/pumd data.htm#csv. The nonresponse of frame variables is concurrent due to contact failure or refusal and no fully-observed baseline covariates are available in this dataset. For illustration, we analyze this dataset to study the expenditures on housing and on utilities, fuels and public services. This survey contains 9986 households and we remove 277 of them with extremely large or small expenditure values. The interviewers made 27 contact attempts in this survey. Figure 4 (a) shows the cumulative response rate. The overall response rate is about 0.6 and 80% of the respondents completed the survey within the first five contact attempts. Let (Y 1 , Y 2 ) denote the logarithm of the expenditure on housing and on utilities, fuels and public services, respectively. Figure 4 (b) shows the mean of (Y 1 , Y 2 ) for respondents to each call attempt. The mean of Y 1 for respondents gradually increases with the number of contacts, and the mean of Y 2 has a sharp increase in the second and third contacts and fluctuates in later contacts. These results suggest that the delayed respondents are likely to have higher expenditures and the nonresponse is likely dependent on the expenditures, in particular, on Y 1 . We apply the proposed methods to estimate the outcomes mean µ = (µ 1 , µ 2 ) = (E(Y 1 ), E(Y 2 )). Analogous to previous survey studies (e.g., Qin and Follmann, 2014; Boniface et al., 2017) , we split the contact attempts into two stages: 1-2 calls (early contact) and 3+ calls (late contact). Among the 9709 households we analyze, 1992 responded in the first stage, 3287 responded later, and 4430 never responded. We fit the data with working models π 1 = expit (α 1 + γ 1 Y 1 + γ 2 Y 2 ), π 2 = expit (α 2 + γ 1 Y 1 + γ 2 Y 2 ) and f 2 (Y ) ∼ N (µ, Σ), and apply the proposed IPW, REG, and DR methods to estimate the outcomes mean. The common odds ratio parameters (γ 1 , γ 2 ) reveal that the resistance to respond caused by the outcomes remain the same in these two contact stages. We also compute the complete-case (CC) sample mean. Table 3 presents the estimates. The DR point estimate of µ 1 is 7.842 with 95% confidence interval (7.800, 7.884), and of µ 2 is 6.345 with (6.296, 6.393). The CC estimate of µ 1 is 7.756 (7.734, 7.778) and of µ 2 is 6.285 (6.264, 6.306). The IPW and REG methods produce estimates close to DR; however, the CC estimate of the outcomes mean, in particular of µ 1 , is well below the DR estimate. As shown in Figure 4 (b), this can be partially explained by the fact that the outcomes mean in early respondents is lower than in the delayed, and therefore, the outcomes mean in the respondents is likely lower than in the nonrespondents. The estimation results of the odds ratio parameters reinforce this conjecture: the IPW, REG, and DR estimates of the odds ratio parameters are all negative, suggesting that high-spending people are more reluctant to respond or more difficult to contact. The odds ratio estimate for expenditure on housing is statistically significant at level 0.01, although it is not significant for expenditure on utilities, fuels and public services. This is evidence for missingness not at random. These results indicate that the expenditure on housing play a more important role in the response process; this may be because the survey takes personal home visit as one of the main modes of interview and people with high expenditure on housing are more difficult to reach. Increasing the variety of interview modes may potentially alleviate such nonignorable nonresponse. We reanalyze a dataset from the National Longitudinal Survey of Young Men, which has been previously analyzed by Card (1995) to estimate the return to schooling, a problem of longstanding interest in labor economics. The dataset is available at https://davidcard.berkeley.edu/data sets.html. The dataset contains 3613 observations on the log of hourly wage in 1976 and 1978, years of schooling (educ), and a vector of fully-observed covariates including years of work experience (exper) and experience squared (exper 2 ), dummy variables black for race, south and smsa for residence in the south and in a metropolitan area, and nearc4 for grew up near a 4-year college. The wages for only 3010 units were recorded in 1976, and previous authors (e.g., Card, 1995; Okui et al., 2012; Wang and Tchetgen Tchetgen, 2018) have used this subsample to estimate the effect of education on wage. Among the 603 nonrespondents to the call in 1976, however, we note that the wages for 201 units were recorded in the follow up in 1978, which can be viewed as a second-call sample and can be incorporated to adjust for potential selection bias of complete-case analyses based solely on the 3010 units from the first call. Let Y denote the log of hourly wage in 1976, W = (educ, C T ), Z = (nearc4, C T ) T , and X = (educ, nearc4, C T ) T with C = (1, exper, exper 2 , black, south, smsa) T . The instrumental variable estimand defined by E{Z(Y − θ T W )} = 0, particularly the coefficient of educ (θ educ ), is of interest for evaluating the causal effect of education on wage. As previous authors suggested, the presence of a nearby 4-year college (nearc4) is used as an instrumental variable for educ to adjust for confounding bias caused by unmeasured common causes of both education and wage; see Card (1995) , Okui et al. (2012) , and Wang and Tchetgen Tchetgen (2018) for details of the instrumental variable estimand. We apply the proposed methods to adjust for nonresponse and to estimate θ educ , with working models π 1 = expit (α T 1 X + γY ), π 2 = expit (α T 2 X + γY ) and f 2 (Y | X) ∼ N (β T X, σ 2 ). The common odds ratio γ encodes a stable resistance, caused by the wage, to respond. We also apply the complete-case (CC) analysis as a comparison. Table 4 summarizes the estimates. The complete-case point estimate of θ educ is 0.132 with 95% confidence interval (0.033, 0.231), which is quite close to previous analysis results (Card, 1995; Okui et al., 2012) and suggests a significant return to schooling among respondents to the call in 1976. After adjustment for nonresponse, the IPW point estimate of θ educ is 0.111 with 95% confidence interval (0.006, 0.216), which shows a significant return to schooling among the population consisting of both the respondents and the nonrespondents. The REG and DR point estimates are slightly smaller than the IPW estimate. Compared to the complete-case analysis, adjustment for nonresponse attenuates the estimates of θ educ . This result shows potential heterogeneity of the effects of education on wage: the effect for the nonrespondents to the call in 1976 is smaller than for the respondents. This is supported by the estimation results for the propensity score model. The impacts of wage and education on nonresponse to the call in 1976, encoded respectively in γ and α 1,educ , are both statistically significant at level 0.05. Therefore, the effect of education on wage in the respondents substantially differs from that in the nonrespondents due to selection bias. Besides, the results suggest that men with higher wage and lower education are more likely to respond, and therefore, the missingness mechanism is likely to be nonignorable. Kang and Schafer (2007) cautioned for potentially disastrous bias of certain DR estimators under MAR when all working models are incorrect; however, previous authors have proposed alternative constructions of nuisance estimators and DR estimators to alleviate this problem, see e.g. Tan (2010) ; Vermeulen and Vansteelandt (2015) ; Tsiatis et al. (2011) and the discussions alongside Kang and Schafer (2007) . Moreover, the doubly robust influence function enjoys an advantage that it admits nuisance estimators with convergence rates considerably slower than n 1/2 while delivering valid n 1/2 inference for the functional of interest. For instance, flexible, data-adaptive machine learning or nonparametric methods for estimation of highdimensional nuisance parameters can be used provided that the nuisance estimators have mean squared error of order smaller than n −1/4 . Such results have been well-documented (e.g., Benkeser et al., 2017; Kennedy et al., 2017; Athey et al., 2018; Tan, 2020; Rotnitzky et al., 2021; Dukes and Vansteelandt, 2021) . In addition, multiply robust estimation in the sense of Vansteelandt et al. (2007) is also of interest. It is of interest to incorporate these approaches to improve the proposed doubly robust estimator. Besides, we note that there exist an additional class of estimators exhibiting the same double robustness as in Theorem 3, e.g., However, the efficiency of this estimator depends on how nuisance parameters are estimated and there is no guarantee that it attains the efficiency bound for M npr . We plan to study this class of estimators elsewhere. One cannot discriminate between MNAR and MAR using solely frame data (Molenberghs et al., 2008) ; however, with callback data one can apply the proposed framework to test MAR by assessing how far away the estimated odds ratio function is from zero. The size of such tests can be (doubly robust) controlled because the stableness of resistance naturally holds under the null hypothesis of MAR, although they may not have power against certain alternatives. We assumed stableness of resistance for the first two calls, but identification remains if this holds for any two adjacent calls by applying our identifying strategy to the subsurvey starting with these two calls. We considered a single action response process-a call attempt either succeeds or fails; however, there may exist several dispositions, e.g., interview, refusal, other nonresponse or final non-contact (Biemer et al., 2013) . Concurrent nonresponse is often the case when the missingness is due to failure of contact, but in practice different frame variables may be observed in different call attempts. In this case one may combine the callback design and the graphical model (e.g., Sadinle and Reiter, 2017; Malinsky et al., 2020; Mohan and Pearl, 2021) to account for complex patterns of missingness. In addition to nonresponse adjustment, callback data are also useful for the design and organization of surveys, e.g., allocation of time and staff resources, and other forms of paradata can also be incorporated for nonresponse adjustment. The integration of the callback design and other tools (e.g., instrumental variables) can handle simultaneous problems of nonresponse and confounding or other deficiencies in survey and observational studies. It is of interest to pursue these extensions and further explore the use of the stableness of resistance assumption in causal inference, case-control studies, and longitudinal studies. Supplementary material online includes estimation under an alternative parameterization, estimation with multiple callbacks, proof of theorems, propositions, and equations, a simulation for sensitivity analysis, codes and data for reproducing the simulations and applications. Adjusting for nonresponse bias using logistic regression Identification of causal effects using instrumental variables Approximate residual balancing: debiased inference of average treatment effects in high dimensions Doubly robust nonparametric inference on the average treatment effect Using level-of-effort paradata in non-response adjustments with application to field surveys Assessment of non-response bias in estimates of alcohol consumption: Applying the continuum of resistance model in a general population survey in england Using geographic variation in college proximity to estimate the return to schooling Generalization of heckman selection model to nonignorable nonresponse using call-back information A semiparametric odds ratio model for measuring association Revisiting the continuum of resistance model in the digital age: a comparison of early and delayed respondents to the norwegian counties public health survey Sampling Techniques Measuring survey quality in a casic environment Pattern mixture models for the analysis of repeated attempt designs On a probability mechanism to attain an economic balance between the resultant error of response and the bias of nonresponse A new instrumental method for dealing with endogenous selection Modeling nonresponse in surveys with callbacks Inference for treatment effect parameters in potentially misspecified high-dimensional models Subsampling callbacks to improve survey efficiency Exploring and correcting for nonresponse bias using follow-ups of non respondents Flexible sensitivity analysis for observational studies without observable implications Nonresponse in Household Interview Surveys Semiparametric maximum likelihood inference for nonignorable nonresponse with callbacks Large sample properties of generalized method of moments estimators Sample selection bias as a specification error How much can we learn about missing data?: an exploration of a clinical trial in psychiatry Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data Non-parametric methods for doubly robust estimation of continuous treatment effects Propensity score adjustment with several follow-ups A semiparametric estimation of mean functionals with nonignorable missing data Improving surveys with paradata Using survey participants to estimate the impact of nonparticipation A weighted estimating equation for missing covariate data with properties similar to maximum likelihood Identification and inference for marginal average treatment effect on the treated with an instrumental variable Semiparametric inference for non-monotone missing-not-at-random data: the no self-censoring model Identifiability of normal and normal mixture models with nonignorable missing data Identifying causal effects with proxy variables of an unmeasured confounder On varieties of doubly robust estimators under missingness not at random with a shadow variable Graphical models for processing missing data Every missingness not at random model has a missingness at random counterpart with equal fit Measuring what we spend: Toward a new consumer expenditure survey Large sample estimation and hypothesis testing Doubly robust estimation of the local average treatment effect curve Doubly robust instrumental variable regression Paradata for nonresponse adjustment The association between two random elements: A complete characterization and odds ratio models Correcting for survey nonresponse using variable response propensity An attempt to get the "not at homes" into the sample without callbacks Correcting for nonavailability bias in surveys by weighting based on number of callbacks Semiparametric maximum likelihood inference by using failed contact attempts to adjust for nonignorable nonresponse Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models Characterization of parameters with a mixed bias property Estimating causal effects of treatments in randomized and nonrandomized studies Itemwise conditionally independent nonresponse modelling for incomplete multivariate data Adjusting for nonignorable drop-out using semiparametric nonresponse models Multiply robust causal inference with double negative control adjustment for categorical unmeasured confounding Locally efficient estimation of marginal treatment effects when outcomes are correlated: is the prize worth the chase? Semiparametric estimation with data missing not at random using an instrumental variable A distributional approach for causal inference using propensity scores Bounded, efficient and doubly robust estimation with inverse weighting Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data A general instrumental variable framework for regression analysis with outcome missing not at random An introduction to proximal causal learning Improved doubly robust estimation when data are monotonely coarsened, with application to longitudinal studies with dropout Targeted maximum likelihood learning Estimation of regression models for the mean of repeated outcomes under nonignorable nonmonotone nonresponse Biased-reduced doubly robust estimation Bounded, efficient and multiply robust estimation of average treatment effects using instrumental variables An instrumental variable approach for identification and estimation with nonignorable nonresponse Using number of failed contact attempts to adjust for non-ignorable non-response Bayesian inference for nonresponse two-phase sampling Semiparametric pseudo-likelihoods in generalized linear models with nonignorable missing data