The direct effect of a treatment variable on an endpoint variable and its indirect effect through a mediator variable are important for understanding a causal mechanism. The controlled direct effect has a prescriptive interpretation, while the natural direct and indirect effects have a descriptive interpretation. In practice, these three effects are usually very difficult to identify. To tackle this problem, some researchers have investigated upper and lower bounds on these three effects under some reasonable identification conditions. For example, Luo and Geng (2016) gave upper and lower bounds on these direct and indirect effects when there is an unobserved mediator-endpoint confounder vector and the endpoint variable is continuous. In this paper, we tighten the bounds on the controlled direct effect in Luo and Geng (2016) when part of the confounders can be observed. Additionally, we give a sufficient condition for identifying the direct and indirect effects when the variables satisfy a linear model.
Introduction
To understand causal mechanisms, [1], [2] formally defined the controlled direct effect, the natural direct effect and the natural indirect effect. Subsequently, several papers investigated the identification conditions for these direct and indirect effects [1], [2], [3], [4], [5], [6], [7]. These researchers pointed out that random assignment of the treatment alone is not sufficient to identify the direct and indirect effects, and that additional assumptions are needed, such as the sequential ignorability assumption without unobserved confounders or the sequential potential ignorability assumption. However, it is hard to believe that these extra assumptions hold in many applications. For example, identifying the controlled direct effect requires that no unobserved confounder exists between the mediator and the endpoint variable. Because the mediator cannot be randomized, unobserved mediator-endpoint confounders may exist, in which case the controlled direct effect is not identifiable. Identifying the natural direct and indirect effects requires even stronger assumptions, such as independence of the potential mediator and the potential endpoint variable, which are less likely to hold. In this situation, some researchers investigate bounds on the direct and indirect effects or try to give sufficient conditions for identifying them under some model assumptions.
The aim of the present article is to give bounds on the controlled direct effect, the natural direct effect and the natural indirect effect when part of the confounders between the mediator and the endpoint variable can be observed and the remaining confounders cannot. In [8], bounds on the controlled direct effect are given for a dichotomous endpoint variable. Later, [9] derived bounds on the natural direct and indirect effects by linear programming when the treatment, mediator and endpoint variables are all binary. Under some new assumptions, [10] gave bounds on the potential endpoint variable for the first time. Furthermore, assuming the potential mediator variable to be constant for the entire population, bounds on the controlled direct effect and the natural direct and indirect effects were derived in [10]. Following the idea of [8], [11] considered the general case, where the endpoint variable can be arbitrary. In this article, we build on [12], which tightened the bounds on the average causal effect using additional covariate information. Under the assumption that part of the mediator-endpoint confounders can be measured, new bounds on the potential endpoint variable and the controlled direct effect are derived using stratified analysis. We also show that the new bounds are narrower than the bounds in [11]. In addition, we give bounds on the natural direct and indirect effects when the potential mediator is a constant. Furthermore, we point out that the direct and indirect effects can be identified when the observable and unobservable confounders are independent and all variables satisfy a linear model.
The remainder of this paper is organized as follows. Section 2 reviews the definitions of the controlled direct effect and the natural direct and indirect effects in the potential outcome framework. The identification conditions and estimation methods for these causal effects are also provided. In Section 3, we derive the new bounds on the potential endpoint variable, the controlled direct effect, and the natural direct and indirect effects, assuming that part of the mediator-endpoint confounders can be measured. In addition, we show that the new bounds on the expectation of the potential endpoint variable and on the controlled direct effect are narrower than the bounds in [11]. We then give sufficient conditions for identifying the direct and indirect effects under some mild model assumptions. The estimation of these bounds from the observed data is given in Section 4. In Section 5, we illustrate our method with simulated data. In Section 6, the proposed approaches are applied to two real data sets: one concerns the direct and indirect effects of job training on depressive symptoms mediated by job-search self-efficacy, and the other concerns the effects of college education on the log wage. Finally, Section 7 concludes the paper with some discussion.
Notations and definitions
In this section, we first introduce the notation and the definitions of the controlled direct effect (CDE), the natural direct effect (NDE) and the natural indirect effect (NIE). Then we review the assumptions and conditions required for identifying the CDE, NDE and NIE. Finally, the equations for estimating these causal effects are presented.
Let denote a binary treatment variable, 1 for treated and 0 for placebo. Let be the post-treatment variable which is an intermediate variable or mediator. Let
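For reference, the standard potential-outcome definitions of the three effects can be written as below. The symbols are assumed notation chosen for illustration and are not necessarily the ones used in this paper: $T$ is the binary treatment, $M$ the mediator, $Y$ the endpoint, $M(t)$ the potential mediator under treatment $t$, and $Y(t,m)$ the potential endpoint under treatment $t$ with the mediator set to $m$.

```latex
\begin{align*}
\mathrm{CDE}(m) &= E\{Y(1,m) - Y(0,m)\}, \\
\mathrm{NDE}    &= E\{Y(1,M(0)) - Y(0,M(0))\}, \\
\mathrm{NIE}    &= E\{Y(1,M(1)) - Y(1,M(0))\}, \\
E\{Y(1,M(1)) - Y(0,M(0))\} &= \mathrm{NDE} + \mathrm{NIE}.
\end{align*}
```

The last line shows that, with these reference levels, the total effect decomposes into the natural direct and indirect effects, whereas the CDE depends on the level $m$ at which the mediator is fixed.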
Bounds on direct and indirect effects
Throughout this section we suppose that the mediator is a discrete variable. In the previous section, we presented the identification conditions and estimation equations for the CDE, NDE and NIE. Moreover, we pointed out that some conditions are unreasonable in practice. For example, we need the conditional independencies (i), (ii) to identify the CDE, where is observed. However, it is very difficult to observe all the mediator-endpoint confounders in practice. In Section 2, we also mentioned that two
Estimation of the bounds on CDE, NDE and NIE
In this section, we first present a method for estimating, from observed data, the bounds on the CDE, NDE and NIE given in the previous section. Suppose the observed data are for , where is the sample size. We can estimate , , and respectively with
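As a minimal sketch of plug-in estimation from observed data, the code below computes sample analogues of three kinds of quantities that typically enter such estimators: the conditional mean of the endpoint within each treatment-mediator-covariate cell, the conditional probability of the mediator given treatment and covariate, and the marginal distribution of the observed covariate. The function name, the argument names `t`, `m`, `x`, `y`, and the assumption that the discrete variables are integer-coded are all hypothetical; this is not the paper's estimator of the bounds.

```python
import numpy as np

def plug_in_components(t, m, x, y):
    """Sample analogues of E[Y | T=t, M=m, X=x], P(M=m | T=t, X=x) and P(X=x).

    Assumes t, m and x are integer-coded discrete variables (an illustrative
    assumption) and y is a numeric endpoint.
    """
    t, m, x, y = (np.asarray(v) for v in (t, m, x, y))
    cell_mean, mediator_prob, covariate_prob = {}, {}, {}
    for x_val in np.unique(x):
        # Empirical frequency of the covariate stratum.
        covariate_prob[int(x_val)] = float(np.mean(x == x_val))
        for t_val in np.unique(t):
            arm = (t == t_val) & (x == x_val)
            if not arm.any():
                continue  # no observations in this treatment-covariate arm
            for m_val in np.unique(m):
                cell = arm & (m == m_val)
                key = (int(t_val), int(m_val), int(x_val))
                # Share of the arm whose mediator equals m_val.
                mediator_prob[key] = float(cell.sum() / arm.sum())
                if cell.any():
                    # Mean endpoint in the cell; empty cells are left out.
                    cell_mean[key] = float(y[cell].mean())
    return cell_mean, mediator_prob, covariate_prob
```

Empty cells are omitted from `cell_mean` rather than filled in, since a conditional mean is not estimable without observations in that cell.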
Simulations
In this section, we first present the bounds using simulated data. We illustrate that the bounds on and in this paper are narrower than the bounds in [11]. Suppose the data set is generated according to Fig. 1 and , , and are all binary variables. is a continuous variable and the marginal probabilities are given as follows: , and . The conditional probabilities of given , and are shown in Table 1.
In addition, the conditional
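To illustrate why an unobserved mediator-endpoint confounder matters in a setting of this kind, the sketch below simulates a randomized binary treatment `t`, an observed binary confounder `x`, an unobserved binary confounder `u`, a binary mediator `m` and a continuous endpoint `y`, and then compares a stratified estimate of the CDE that adjusts for `x` only with an oracle estimate that also adjusts for `u`. All probabilities, coefficients and function names are illustrative assumptions; they are not the values in Table 1 or the design of Fig. 1.

```python
import numpy as np

def simulate(n, seed=0):
    """Draw n units from a hypothetical model with one unobserved confounder."""
    rng = np.random.default_rng(seed)
    t = rng.binomial(1, 0.5, n)                 # randomized treatment
    x = rng.binomial(1, 0.4, n)                 # observed confounder
    u = rng.binomial(1, 0.3, n)                 # unobserved confounder
    p_m = 0.2 + 0.3 * t + 0.2 * x + 0.25 * u    # P(M=1 | t, x, u), within [0.2, 0.95]
    m = rng.binomial(1, p_m)
    # Linear endpoint model: the coefficient of t (1.0) is the true CDE.
    y = 1.0 * t + 0.5 * m + 0.8 * x + 2.0 * u + rng.normal(0.0, 1.0, n)
    return t, m, x, u, y

def stratified_cde(t, m, y, strata, m_level):
    """Weighted average over strata of the treated-minus-control mean at M = m_level."""
    total = 0.0
    for s in np.unique(strata):
        in_s = strata == s
        treated = in_s & (m == m_level) & (t == 1)
        control = in_s & (m == m_level) & (t == 0)
        total += (y[treated].mean() - y[control].mean()) * in_s.mean()
    return total

t, m, x, u, y = simulate(200_000)
cde_x_only = stratified_cde(t, m, y, x, m_level=1)           # ignores u
cde_x_and_u = stratified_cde(t, m, y, 2 * x + u, m_level=1)  # oracle: uses u
print(f"adjusting for x only:  {cde_x_only:.3f}")
print(f"adjusting for x and u: {cde_x_and_u:.3f}")
```

By construction the true CDE is 1.0 in this hypothetical model. Adjusting for `x` alone still conditions on the mediator, which is a common effect of the treatment and `u`, so the resulting estimate is biased; this is the situation in which bounds, rather than point estimates, become the natural target.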
Applications
In this section, we illustrate our method with two real data sets. Firstly, the bounds on and in this paper are compared with the bounds in [11]. As the bounds on the NIE are the same as the bounds on the CDE, we only present bounds on the NIE.
Secondly, we assume that the outcome, treatment, mediator and unobserved confounders satisfy model (5), and that the observable confounder is independent of the unobservable confounders. Estimates of the direct and indirect effects, and the
Discussions
In this paper, we analyzed the bounds on the CDE, the NDE and the NIE. Identifying the CDE requires the following two conditional independencies and , where and can be observed. In practice, it is usually difficult to intervene on the mediating variable. In addition, the absence of unobserved confounders between the mediating variable and the endpoint variable is an unreasonable assumption in many applications. To identify natural direct and indirect effects, two extra assumptions are required,
The author would like to thank Professor Zhi Geng of Peking University for his helpful discussion of the draft of this paper. The suggestions from the anonymous reviewers were also very helpful. This work was supported in part by the National Natural Science Foundation of China (Grant 62573300), and in part by the Natural Science Foundation of Guangdong Province (Grant 2023A1515011394).