Bounds and identification on direct and indirect effects under partially observed mediator-endpoint confounders


Journal of Multivariate Analysis

Volume 213, May 2026, 105565


https://doi.org/10.1016/j.jmva.2025.105565

Abstract

The direct effect of a treatment variable on an endpoint variable, and its indirect effect through a mediator variable, are important for understanding a causal mechanism. The controlled direct effect has a prescriptive interpretation, while the natural direct and indirect effects have a descriptive interpretation. In practice, these three effects are usually very difficult to identify. To tackle this problem, some researchers have investigated upper and lower bounds on these three effects when some reasonable identification conditions hold. For example, Luo and Geng (2016) gave upper and lower bounds on these direct and indirect effects when there is an unobserved mediator-endpoint confounder vector and the endpoint variable is continuous. In this paper, we tighten the bounds on the controlled direct effect in Luo and Geng (2016) when part of the confounders can be observed. Additionally, we give a sufficient condition for identifying the direct and indirect effects when the variables satisfy a linear relationship.

Introduction

To understand causal mechanisms, [1], [2] formally defined the controlled direct effect, the natural direct effect and the natural indirect effect. Subsequently, several papers investigated identification conditions for these direct and indirect effects [1], [2], [3], [4], [5], [6], [7]. These researchers pointed out that random assignment of treatment alone is not sufficient to identify the direct and indirect effects, and that additional assumptions and conditions are needed, such as the sequential ignorability assumption without unobserved confounders or the sequential potential ignorability assumption. However, it is hard to believe that these extra assumptions hold in many applications. For example, identifying the controlled direct effect requires that no unobserved confounders exist between the mediator and the endpoint variable. However, the mediator cannot be randomized, so unobserved mediator-endpoint confounders may exist, in which case the controlled direct effect is not identifiable. To identify the natural direct and indirect effects, stronger assumptions and conditions are required, such as independence between the potential mediator and the potential endpoint variable, which are even less likely to hold. In this situation, some researchers investigate bounds on the direct and indirect effects, or try to give sufficient conditions to identify them under some model assumptions.

The aim of the present article is to give bounds on the controlled direct effect, the natural direct effect and the natural indirect effect when part of the confounders between the mediator and the endpoint variable can be observed and the remaining confounders cannot. In [8], bounds on the controlled direct effect were given for a dichotomous endpoint variable. Later, [9] derived bounds on the natural direct and indirect effects by linear programming when the treatment, mediator and endpoint variables are all binary. Under some new assumptions, [10] gave bounds on the potential endpoint variable for the first time. Furthermore, assuming the potential mediator variable to be constant over the entire population, bounds on the controlled direct effect and the natural direct and indirect effects were derived in [10]. Following the idea of [8], [11] considered the general case, where the endpoint variable can be arbitrary. In this article, we build on [12], which tightened the bounds on the average causal effect using additional covariate information. Under the assumption that part of the mediator-endpoint confounders can be measured, we derive new bounds on the potential endpoint variable and the controlled direct effect using stratified analysis, and we show that the new bounds are narrower than the bounds in [11]. Specifically, we give bounds on the natural direct and indirect effects when the potential mediator is constant. Furthermore, we point out that the direct and indirect effects can be identified when the observable and unobservable confounders are independent and all variables satisfy a linear model.

The remainder of this paper is organized as follows. Section 2 reviews the definitions of the controlled direct effect and the natural direct and indirect effects in the potential outcome framework, together with the identification conditions and estimation methods for these causal effects. In Section 3, we derive new bounds on the potential endpoint variable, the controlled direct effect, and the natural direct and indirect effects, assuming that part of the mediator-endpoint confounders can be measured. In addition, we show that the new bounds on the expectation of the potential endpoint variable and on the controlled direct effect are narrower than the bounds in [11]. We then give sufficient conditions to identify the direct and indirect effects under some mild model assumptions. The estimation of these bounds from the observed data is given in Section 4. In Section 5, we illustrate our method with simulated data. In Section 6, the proposed approaches are applied to two real data sets: one investigates the direct and indirect effects of job training on depressive symptoms mediated by job-search self-efficacy, while the other concerns the effects of college education on log wage. Finally, Section 7 concludes the paper with some discussion.

Section snippets

Notations and definitions

In this section, we first introduce the notation and definitions of the controlled direct effect (CDE), natural direct effect (NDE) and natural indirect effect (NIE). Then we review the assumptions and conditions required for identifying the CDE, NDE and NIE. Finally, the equations for estimating these causal effects are presented.
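For orientation, the three effects can be written in standard potential-outcome notation as sketched below. This is a sketch under assumptions: it uses the common textbook forms, with $A$ the treatment, $Z$ the mediator, $Y$ the endpoint, $Y_{az}$ the potential endpoint under treatment $a$ and mediator level $z$, and $Z_a$ the potential mediator under treatment $a$. Only the symbols $E(Y_{az})$ and $CDE_{a^{*},a}(z)$ appear in this excerpt; the subscript and argument conventions for the NDE and NIE are assumed here and may differ from the paper's.

```latex
\begin{align*}
CDE_{a^{*},a}(z)     &= E(Y_{a z}) - E(Y_{a^{*} z}), \\
NDE_{a^{*},a}(a^{*}) &= E(Y_{a Z_{a^{*}}}) - E(Y_{a^{*} Z_{a^{*}}}), \\
NIE_{a^{*},a}(a)     &= E(Y_{a Z_{a}}) - E(Y_{a Z_{a^{*}}}).
\end{align*}
```

With these forms the NDE and NIE sum to the total effect $E(Y_{a Z_{a}}) - E(Y_{a^{*} Z_{a^{*}}})$, and if the potential mediator is the same constant $z$ for everyone, the NDE reduces to $CDE_{a^{*},a}(z)$.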

Let $A$ denote a binary treatment variable, 1 for treated and 0 for placebo. Let $Z$ be the post-treatment variable which is an intermediate variable or mediator. Let $Y$

Bounds on direct and indirect effects

Throughout this part we suppose the mediator is a discrete variable. In the previous section, we presented the identification conditions and estimation equations of the CDE, NDE and NIE. Moreover, we pointed out that some conditions are unreasonable in practice. For example, we need the conditional independencies (i), (ii) to identify the CDE, where $U$ is observed. However, it is very difficult to observe all the mediator-endpoint confounders in practice. In Section 2, we also mentioned that two

Estimation of the bounds on CDE, NDE and NIE

In this section, we first present a method for estimating, from observed data, the bounds on the CDE, NDE and NIE given in the previous section. Suppose the observed data are $(s_{i}, a_{i}, z_{i}, y_{i})$ for $i \in \{1, \dots, N\}$, where $N$ is the sample size. We can estimate $\Pr(S = s \mid A = a)$, $\Pr(Z = z, S = s \mid A = a)$, $E(Y I_{\{Z = z\}} \mid A = a)$ and $E(Y \mid A = a)$ respectively with

$$\widehat{\Pr}(S = s \mid A = a) = \frac{\sum_{i=1}^{N} I_{\{a_{i} = a\}} \cdot I_{\{s_{i} = s\}}}{\sum_{i=1}^{N} I_{\{a_{i} = a\}}}, \qquad \widehat{\Pr}(Z = z, S = s \mid A = a) = \frac{\sum_{i=1}^{N} I_{\{a_{i} = a\}} \cdot I_{\{s_{i} = s\}} \cdot I_{\{z_{i} = z\}}}{\sum_{i=1}^{N} I_{\{a_{i} = a\}}},$$

$$\widehat{E}(Y I_{\{Z = z\}} \mid A = a) = \frac{\sum_{i=1}^{N} y_{i} \cdot I_{\{a_{i} = a\}} \cdot I_{\{z_{i} = z\}}}{\sum_{i=1}^{N} I_{\{a_{i} = a\}}}, \qquad \widehat{E}(Y \mid A = a) = \frac{\sum_{i=1}^{N} y_{i} \cdot I_{\{a_{i} = a\}}}{\sum_{i=1}^{N} I_{\{a_{i} = a\}}}.$$
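The four plug-in estimators are within-arm sample proportions and sample means, and could be computed along the following lines. This is a minimal sketch assuming NumPy and array-valued inputs; the function name `estimate_arm_quantities`, the dictionary keys and the toy data are hypothetical and do not come from the paper.

```python
import numpy as np

def estimate_arm_quantities(s, a, z, y, a_val, s_val, z_val):
    """Plug-in estimates within the arm A = a_val (hypothetical helper)."""
    s, a, z, y = (np.asarray(v) for v in (s, a, z, y))
    in_arm = a == a_val              # indicator I{a_i = a}
    n_arm = in_arm.sum()             # common denominator: sum_i I{a_i = a}
    if n_arm == 0:
        raise ValueError("no observations with A equal to a_val")
    is_s = s == s_val                # indicator I{s_i = s}
    is_z = z == z_val                # indicator I{z_i = z}
    return {
        # Pr(S = s | A = a)
        "pr_s": (in_arm & is_s).sum() / n_arm,
        # Pr(Z = z, S = s | A = a)
        "pr_z_s": (in_arm & is_s & is_z).sum() / n_arm,
        # E(Y * I{Z = z} | A = a)
        "e_y_iz": (y * (in_arm & is_z)).sum() / n_arm,
        # E(Y | A = a)
        "e_y": (y * in_arm).sum() / n_arm,
    }

# Toy data, for illustration only.
s = [0, 1, 1, 0, 1, 0]
a = [1, 1, 1, 0, 0, 1]
z = [1, 0, 1, 1, 0, 1]
y = [2.0, 4.0, 6.0, 1.0, 3.0, 8.0]
est = estimate_arm_quantities(s, a, z, y, a_val=1, s_val=1, z_val=1)
print(est)
```

All four estimators share the denominator $\sum_{i} I_{\{a_{i} = a\}}$, so it is computed once, and an empty treatment arm is reported as an error rather than producing a division by zero.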

Simulations

In this section, we first present the bounds using simulated data. We illustrate that the bounds on $E(Y_{a z})$ and $CDE_{a^{*}, a}(z)$ in this paper are narrower than the bounds in [11]. Suppose the data set is created according to Fig. 1 and $A$, $Z$, $S$ and $U$ are all binary variables. $Y$ is a continuous variable and the marginal probabilities are given as follows: $\Pr(A = 1) = 0.5$, $\Pr(S = 1) = 0.129$ and $\Pr(U = 1) = 0.118$. The conditional probabilities of $Z$ given $A$, $S$ and $U$ are shown in Table 1.
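A data-generating process of this shape might be sketched as follows. Only the three marginal probabilities are taken from the text. Everything else is an assumption made for illustration: $A$, $S$ and $U$ are drawn independently, the table `p_z` of $\Pr(Z = 1 \mid A, S, U)$ holds placeholder values (not the paper's Table 1), and the linear outcome model for $Y$ is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Marginal probabilities stated in the text; independent draws are an assumption.
a = rng.binomial(1, 0.5, size=N)     # treatment A
s = rng.binomial(1, 0.129, size=N)   # observed confounder S
u = rng.binomial(1, 0.118, size=N)   # unobserved confounder U

# PLACEHOLDER values for Pr(Z = 1 | A = a, S = s, U = u), indexed as p_z[a, s, u].
# These are NOT the paper's Table 1 entries.
p_z = np.array([[[0.2, 0.4], [0.3, 0.5]],
                [[0.5, 0.7], [0.6, 0.8]]])
z = rng.binomial(1, p_z[a, s, u])    # mediator Z

# HYPOTHETICAL continuous endpoint: linear in A, Z, S, U plus Gaussian noise.
y = 1.0 + 0.5 * a + 1.0 * z + 0.8 * s + 1.2 * u + rng.normal(0.0, 1.0, size=N)

print(a.mean(), s.mean(), u.mean())
```

An analysis in the spirit of the section would then treat `u` as unobserved and use only `(s, a, z, y)`.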

In addition, the conditional

Applications

In this section, we illustrate our method with two real data sets. Firstly, the bounds on $E(Y_{a z})$ and $CDE_{a^{*}, a}(z)$ in this paper are compared with the bounds in [11]. As the bounds on the NIE are the same as the bounds on the CDE, we only present bounds on NIE.

Secondly, we assume that the outcome, treatment, mediator and invisible confounders satisfy model (5), and the observable confounder is independent of the unobservable confounders. Estimations of the direct and indirect effects, and the

Discussions

In this paper, we analyzed the bounds on the CDE, the NDE and the NIE. Identifying the CDE requires the following two conditional independencies, [...] and [...], where $C$ and $U$ can be observed. In practice, it is usually difficult to intervene on the mediating variable. In addition, the absence of unobserved confounders between the mediating variable and the endpoint variable is unreasonable in many applications. To identify natural direct and indirect effects, two extra assumptions are required,

CRediT authorship contribution statement

Yu Han: Conceptualization, Writing – original draft. Peng Luo: Validation, Writing – review & editing. Wei Zhang: Methodology, Software, Investigation, Formal Analysis, Writing – original draft. Xiang Gu: Visualization, Data curation.

Acknowledgments

The authors would like to thank Professor Zhi Geng of Peking University for his helpful discussion on the draft of this paper. The suggestions from the anonymous reviewers were also very helpful. This work was supported in part by the National Natural Science Foundation of China (Grant 62573300), and in part by the Natural Science Foundation of Guangdong Province (Grant 2023A1515011394).

References (18)

J.M. Robins, A new approach to causal inference in mortality studies with sustained exposure periods application to control of the healthy worker survivor effects, Math. Model. (1986)
Pearl et al., Direct and indirect effects
J.M. Robins et al., Identifiability and exchangeability for direct and indirect effects, Epidemiology (1992)
V. Didelez et al., Direct and indirect effects of sequential treatments
D.M. Hafeman et al., Alternative assumptions for the identification of direct and indirect effects, Epidemiology (2011)
K. Imai et al., Identification, inference and sensitivity analysis for causal mediation effects, Statist. Sci. (2010)
M.L. Petersen et al., Estimation of direct causal effects, Epidemiology (2006)
J.M. Robins, Semantics of causal DAG models and the identification of direct and indirect effects
Z. Cai et al., Bounds on direct effects in the presence of confounded intermediate variables, Biometrics (2008)

There are more references available in the full text version of this article.


^☆: This article is part of a Special issue entitled: ‘YJMVA_Learning in Multi-Analysis’ published in Journal of Multivariate Analysis.

© 2025 Published by Elsevier Inc.
