Bounds and identification on direct and indirect effects under partially observed mediator-endpoint confounders

https://doi.org/10.1016/j.jmva.2025.105565

Abstract

The direct effect of a treatment variable and the indirect effect through a mediator variable on an endpoint variable are important for understanding a causal mechanism. The controlled direct effect has a prescriptive interpretation, while the natural direct and indirect effects have a descriptive interpretation. In practice, these three effects are usually very difficult to identify. To tackle this problem, some researchers have investigated upper and lower bounds on these three effects when some reasonable identification conditions hold. For example, Luo and Geng (2016) gave upper and lower bounds on these direct and indirect effects when there is an unobserved mediator-endpoint confounder vector and the endpoint variable is continuous. In this paper, we tighten the bounds on the controlled direct effect in Luo and Geng (2016) when part of the confounders can be observed. Additionally, we give a sufficient condition to identify the direct and indirect effects when the variables satisfy a linear relationship.

Introduction

To understand the causal mechanism, [1], [2] formally defined the controlled direct effect, the natural direct effect and the natural indirect effect. After that, several papers investigated the identification conditions of these direct and indirect effects [1], [2], [3], [4], [5], [6], [7]. These researchers pointed out that random assignment of the treatment alone is not sufficient to identify the direct and indirect effects, and that additional assumptions and conditions are needed, such as the sequential ignorability assumption without unobserved confounders or the sequential potential ignorability assumption. However, it is hard to believe that these extra assumptions hold in many applications. For example, identifying the controlled direct effect requires that no unobserved confounders exist between the mediator and the endpoint variable. However, the mediator cannot be randomized, so unobserved mediator-endpoint confounders may exist, and then the controlled direct effect is not identifiable. To identify the natural direct and indirect effects, stronger assumptions and conditions are required, such as independence between the potential mediator and the potential endpoint variable, which are even less likely to hold. In this situation, some researchers investigate bounds on the direct and indirect effects, or try to give sufficient conditions to identify them under some model assumptions.
The aim of the present article is to give bounds on the controlled direct effect, the natural direct effect and the natural indirect effect when part of the confounders between the mediator and the endpoint variable can be observed and the rest cannot. In [8], bounds on the controlled direct effect are given for a dichotomous endpoint variable. Later, [9] derived bounds on the natural direct and indirect effects by linear programming when the treatment, mediator and endpoint variables are all binary. Under some new assumptions, [10] gave bounds on the potential endpoint variable for the first time. Furthermore, assuming the potential mediator variable to be a constant for the entire population, bounds on the controlled direct effect and on the natural direct and indirect effects were derived in [10]. Following the idea of [8], [11] considered the general case, where the endpoint variable can be arbitrary. In this article, we build on [12], which tightened the bounds on the average causal effect by using additional covariate information. Under the assumption that part of the mediator-endpoint confounders can be measured, new bounds on the potential endpoint variable and the controlled direct effect are derived using stratified analysis. We also show that the new bounds are narrower than the bounds in [11]. Specifically, we give bounds on the natural direct and indirect effects when the potential mediator is a constant. Furthermore, we point out that the direct and indirect effects can be identified when the observable and unobservable confounders are independent and all variables satisfy a linear model.
The remainder of this paper is organized as follows. Section 2 reviews the definitions of the controlled direct effect and the natural direct and indirect effects in the potential outcome framework. The identification conditions and estimation methods for these causal effects are also provided. In Section 3, we derive the new bounds on the potential endpoint variable, the controlled direct effect, and the natural direct and indirect effects, assuming that part of the mediator-endpoint confounders can be measured. In addition, we show that the new bounds on the expectation of the potential endpoint variable and on the controlled direct effect are narrower than the bounds in [11]. We then give sufficient conditions to identify the direct and indirect effects under some mild model assumptions. The estimation of these bounds from the observed data is given in Section 4. In Section 5, we illustrate our method with simulated data. In Section 6, the proposed approaches are applied to two real data sets: one investigates the direct and indirect effects of job training on depressive symptoms mediated by job-search self-efficacy, while the other concerns the effects of college education on the log wage. Finally, Section 7 ends the paper with some discussion.

Section snippets

Notations and definitions

In this section, we first introduce the notation and the definitions of the controlled direct effect (CDE), the natural direct effect (NDE) and the natural indirect effect (NIE). Then we review the assumptions and conditions required for identifying the CDE, NDE and NIE. Finally, the equations for estimating these causal effects are presented.
Let A denote a binary treatment variable, 1 for treated and 0 for placebo. Let Z be the post-treatment variable which is an intermediate variable or mediator. Let Y

Bounds on direct and indirect effects

Throughout this section we suppose the mediator is a discrete variable. In the previous section, we presented the identification conditions and estimation equations of the CDE, NDE and NIE. Moreover, we pointed out that some conditions are unreasonable in practice. For example, we need the conditional independencies (i), (ii) to identify the CDE, where U is observed. However, it is very difficult to observe all the mediator-endpoint confounders in practice. In Section 2, we also mentioned that two

Estimation of the bounds on CDE, NDE and NIE

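In this section, we first present a method for estimating the bounds of CDE, NDE and NIE given in the previous section by observed data. Suppose the observed data is $(s_i, a_i, z_i, y_i)$ for $i \in \{1, \ldots, N\}$, where $N$ is the sample size. We can estimate $\Pr(S=s \mid A=a)$, $\Pr(Z=z, S=s \mid A=a)$, $E(Y I\{Z=z\} \mid A=a)$ and $E(Y \mid A=a)$ respectively with
$$\widehat{\Pr}(S=s \mid A=a) = \frac{\sum_{i=1}^{N} I\{a_i=a\}\, I\{s_i=s\}}{\sum_{i=1}^{N} I\{a_i=a\}}, \qquad \widehat{\Pr}(S=s, Z=z \mid A=a) = \frac{\sum_{i=1}^{N} I\{a_i=a\}\, I\{s_i=s\}\, I\{z_i=z\}}{\sum_{i=1}^{N} I\{a_i=a\}},$$
$$\widehat{E}(Y I\{Z=z\} \mid A=a) = \frac{\sum_{i=1}^{N} y_i\, I\{a_i=a\}\, I\{z_i=z\}}{\sum_{i=1}^{N} I\{a_i=a\}},$$
$\widehat{E}(Y \mid A=a) = \sum_{i=1}^{N} y_i\, I\{a_i=a\} \,/\, \sum_{i=\cdots}$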
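The plug-in estimators in this section are simple within-arm frequencies and averages, and a short sketch may make that concrete. The function name `plug_in_estimates`, its argument names and the dictionary keys below are hypothetical, not taken from the paper. The denominator of the last estimator is cut off in the text above, so the sketch assumes it is the arm size, the same as for the other three.

```python
import numpy as np

def plug_in_estimates(s, a, z, y, a_val, z_val, s_val):
    """Empirical plug-in estimates within the treatment arm A = a_val.

    Hypothetical helper: names and return layout are illustrative only.
    """
    s, a, z, y = (np.asarray(v) for v in (s, a, z, y))
    in_arm = (a == a_val)            # I{a_i = a}
    n_arm = in_arm.sum()             # common denominator: sum_i I{a_i = a}
    if n_arm == 0:
        raise ValueError("no observations with A equal to a_val")
    is_s = (s == s_val)              # I{s_i = s}
    is_z = (z == z_val)              # I{z_i = z}
    return {
        # estimate of Pr(S = s | A = a)
        "pr_s": (in_arm & is_s).sum() / n_arm,
        # estimate of Pr(S = s, Z = z | A = a)
        "pr_s_z": (in_arm & is_s & is_z).sum() / n_arm,
        # estimate of E(Y I{Z = z} | A = a)
        "e_y_ind_z": (y * (in_arm & is_z)).sum() / n_arm,
        # estimate of E(Y | A = a); denominator assumed to be the arm size
        "e_y": y[in_arm].sum() / n_arm,
    }

# Toy data, made up purely to exercise the function.
s_obs = [0, 1, 1, 0, 1, 0]
a_obs = [1, 1, 1, 0, 0, 1]
z_obs = [1, 1, 0, 1, 0, 1]
y_obs = [2.0, 4.0, 6.0, 1.0, 3.0, 8.0]
estimates = plug_in_estimates(s_obs, a_obs, z_obs, y_obs, a_val=1, z_val=1, s_val=1)
```

All four quantities share the same denominator, so the arm indicator is computed once and reused.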

Simulations

In this section, we first present the bounds using simulated data. We illustrate that the bounds on $E(Y_{az})$ and $\mathrm{CDE}_{a,a'}(z)$ in this paper are narrower than the bounds in [11]. Suppose the data set is generated according to Fig. 1 and A, Z, S and U are all binary variables. Y is a continuous variable and the marginal probabilities are given as follows: Pr(A=1)=0.5, Pr(S=1)=0.129 and Pr(U=1)=0.118. The conditional probabilities of Z given A, S and U are shown in Table 1.
In addition, the conditional
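The sampling of the binary variables described above can be sketched as follows. This is only an illustration under stated assumptions: A, S and U are drawn independently from the marginal probabilities quoted in the text, which is a simplification since the structure of Fig. 1 is not reproduced here. The table of Pr(Z=1 | A, S, U) is passed in by the caller because Table 1 is not available, and the values in the usage lines are placeholders, not the paper's. The generation of Y is omitted because its specification is cut off above. The function name `simulate_binary_variables` is hypothetical.

```python
import numpy as np

def simulate_binary_variables(n, z_prob, seed=0):
    """Draw binary A, S, U and then Z given (A, S, U).

    z_prob[a, s, u] is the caller-supplied Pr(Z = 1 | A = a, S = s, U = u).
    Independence of A, S and U is an assumption of this sketch.
    """
    z_prob = np.asarray(z_prob, dtype=float)
    if z_prob.shape != (2, 2, 2):
        raise ValueError("z_prob must have shape (2, 2, 2)")
    rng = np.random.default_rng(seed)
    a = rng.binomial(1, 0.5, size=n)     # Pr(A = 1) = 0.5
    s = rng.binomial(1, 0.129, size=n)   # Pr(S = 1) = 0.129
    u = rng.binomial(1, 0.118, size=n)   # Pr(U = 1) = 0.118
    p_z = z_prob[a, s, u]                # per-unit Pr(Z = 1 | a_i, s_i, u_i)
    z = rng.binomial(1, p_z)             # one Bernoulli draw per unit
    return a, s, u, z

# Placeholder conditional probabilities (NOT the values of Table 1).
placeholder_z_prob = np.full((2, 2, 2), 0.5)
a_sim, s_sim, u_sim, z_sim = simulate_binary_variables(1000, placeholder_z_prob)
```

Only S would be kept alongside A, Z and Y in the analysis data, since U plays the role of the unobserved confounder.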

Applications

In this section, we illustrate our method with two real data sets. Firstly, the bounds on $E(Y_{az})$ and $\mathrm{CDE}_{a,a'}(z)$ in this paper are compared with the bounds in [11]. As the bounds on the NIE are the same as the bounds on the CDE, we only present bounds on NIE.
Secondly, we assume that the outcome, treatment, mediator and unobservable confounders satisfy model (5), and the observable confounder is independent of the unobservable confounders. Estimates of the direct and indirect effects, and the

Discussions

In this paper, we analyzed the bounds on the CDE, the NDE and the NIE. Identifying the CDE requires the following two conditional independencies
and
, where C and U can be observed. In practice, it is usually difficult to intervene on the mediating variable. In addition, the absence of unobserved confounders between the mediating variable and the endpoint variable is unreasonable to assume in many applications. To identify natural direct and indirect effects, two extra assumptions are required,

CRediT authorship contribution statement

Yu Han: Conceptualization, Writing – original draft. Peng Luo: Validation, Writing – review & editing. Wei Zhang: Methodology, Software, Investigation, Formal Analysis, Writing – original draft. Xiang Gu: Visualization, Data curation.

Acknowledgments

The authors would like to thank Professor Zhi Geng of Peking University for his helpful discussion on the draft of this paper. The suggestions from the anonymous reviewers were also very helpful. This work was supported in part by the National Natural Science Foundation of China (Grant 62573300), and in part by the Natural Science Foundation of Guangdong Province (Grant 2023A1515011394).

References (18)

  • J.M. Robins, A new approach to causal inference in mortality studies with sustained exposure periods application to control of the healthy worker survivor effects, Math. Model. (1986)
  • Pearl et al., Direct and indirect effects
  • J.M. Robins et al., Identifiability and exchangeability for direct and indirect effects, Epidemiology (1992)
  • V. Didelez et al., Direct and indirect effects of sequential treatments
  • D.M. Hafeman et al., Alternative assumptions for the identification of direct and indirect effects, Epidemiology (2011)
  • K. Imai et al., Identification, inference and sensitivity analysis for causal mediation effects, Statist. Sci. (2010)
  • M.L. Petersen et al., Estimation of direct causal effects, Epidemiology (2006)
  • J.M. Robins, Semantics of causal DAG models and the identification of direct and indirect effects
  • Z. Cai et al., Bounds on direct effects in the presence of confounded intermediate variables, Biometrics (2008)
There are more references available in the full text version of this article.

This article is part of a Special issue entitled: ‘YJMVA_Learning in Multi-Analysis’ published in Journal of Multivariate Analysis.