Integrating AI-Guided Causal Discovery with G-Methods to Correct Time-Varying Effect Modification and Survivorship Bias in Aging Biomarker Studies

Mechanism: An AI-guided causal discovery algorithm uses genetic data to identify effect modifiers and colliders, informing marginal structural models and inverse-probability-of-censoring weights. Readout: In simulations, the pipeline is predicted to remove the >20% bias shown by a standard marginal structural model; in empirical data, it is predicted to yield a narrower confidence interval.

Hypothesis: AI-Guided Causal Discovery Informs G-Methods to Adjust for Time‑Varying Effect Modification and Collider Bias in Longitudinal Aging Data

Core idea: We propose that embedding causal discovery algorithms (e.g., FCI, GFCI) within an AI‑enhanced, knowledge‑guided framework can generate a prior causal graph that explicitly marks time‑varying effect modifiers and potential colliders (such as death or disability) before fitting marginal structural models or g‑computation. This graph then guides the selection of interaction terms and the construction of inverse‑probability‑of‑censoring weights, thereby reducing bias from misspecified effect modification and survivorship selection.
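
To make the two graph-guided ingredients concrete, one possible formalization is sketched below. The notation is an assumption, not taken from the post: A_k is activity in interval k, L_k the time-varying biomarkers, C_k the censoring indicator, and V a modifier measured at baseline. V is restricted to baseline because standard marginal structural models condition only on baseline covariates; modification by time-varying biomarkers would need an extension such as history-adjusted MSMs.

```latex
% Hazard MSM with an exposure-by-modifier interaction (V = baseline modifier)
\lambda_{T^{\bar a}}(t \mid V) = \lambda_0(t)\,
  \exp\!\big(\beta_1 a_t + \beta_2 V + \beta_3\, a_t V\big)

% Stabilized inverse-probability-of-censoring weight through interval t
SW^{C}(t) = \prod_{k=0}^{t}
  \frac{\Pr\!\big(C_{k+1}=0 \mid \bar A_k,\, V,\, C_k=0\big)}
       {\Pr\!\big(C_{k+1}=0 \mid \bar A_k,\, \bar L_k,\, V,\, C_k=0\big)}
```

Here \beta_3 carries the effect modification, and the denominator of SW^C is the censoring model whose covariates the discovered graph is meant to suggest.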

Mechanistic reasoning: Longitudinal aging studies are plagued by two intertwined problems: (1) the true effect of an exposure (e.g., physical activity) on outcomes like frailty often varies with intermediate biomarkers (inflammation, IGF‑1) that themselves change over time, and (2) censoring due to death or disability creates a collider that distorts associations when standard regression is used. Current g‑methods can handle time‑varying confounding but assume either no effect modification or correctly specified interactions; misspecification leads to bias at long follow‑up. Likewise, inverse‑probability‑of‑censoring weights (IPCW) require a correct model for the censoring process, which is rarely known.
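
The survivorship problem in (2) can be illustrated with a toy simulation. This is a minimal sketch under invented assumptions: all variables are binary, all probabilities are made up, and the censoring model is a saturated one (stratum frequencies) rather than a fitted logistic regression. Activity is randomized, so any bias among survivors comes from censoring alone.

```python
import random
from collections import defaultdict

TRUE_RD = -0.1  # assumed true risk difference of activity on frailty


def simulate_cohort(n=200_000, seed=0):
    """Toy cohort: activity A, inflammation L, censoring C, frailty Y (all binary)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = int(rng.random() < 0.5)  # randomized exposure
        l = int(rng.random() < 0.5)  # high-inflammation indicator
        # Death is most likely for inactive, inflamed participants,
        # so censoring is a common effect (collider) of A and L.
        c = int(rng.random() < 0.1 + 0.6 * l * (1 - a))
        y = int(rng.random() < 0.4 + TRUE_RD * a + 0.3 * l)
        rows.append((a, l, c, y))
    return rows


def uncensored_probability(rows):
    """Estimate P(C = 0 | A, L) from stratum frequencies (saturated model)."""
    total, kept = defaultdict(int), defaultdict(int)
    for a, l, c, _ in rows:
        total[(a, l)] += 1
        kept[(a, l)] += int(c == 0)
    return {key: kept[key] / total[key] for key in total}


def risk_difference(rows, weight):
    """Weighted risk difference among uncensored participants only."""
    num, den = defaultdict(float), defaultdict(float)
    for a, l, c, y in rows:
        if c == 1:
            continue  # outcome is unobserved for censored participants
        w = weight(a, l)
        num[a] += w * y
        den[a] += w
    return num[1] / den[1] - num[0] / den[0]


rows = simulate_cohort()
p_keep = uncensored_probability(rows)
naive_rd = risk_difference(rows, lambda a, l: 1.0)
ipcw_rd = risk_difference(rows, lambda a, l: 1.0 / p_keep[(a, l)])
print(f"true {TRUE_RD:+.3f}  naive {naive_rd:+.3f}  IPCW {ipcw_rd:+.3f}")
```

In this setup the unweighted survivor analysis understates the protective effect, because the frailest inactive participants are censored, while weighting by the inverse probability of remaining uncensored moves the estimate back toward the assumed truth. The correction works only because the censoring model contains L, which is the gap the discovered graph is meant to fill.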

Recent work suggests that AI‑guided causal discovery can recover directed acyclic graphs from multi‑omic data with minimal assumptions, especially when constrained by Mendelian randomization instruments that anchor causal direction for genetic variants. By feeding genome‑wide significant SNPs for inflammatory markers as priors, the discovery algorithm can orient edges from genotype → biomarker → outcome, thereby distinguishing true mediators from colliders. The resulting graph highlights which biomarkers are candidate effect modifiers (nodes with arrows pointing into the exposure‑outcome path; the graph encodes no interactions, so these remain candidates to test in the outcome model) and which variables are colliders (nodes that receive arrows from both prior exposure and prior outcome status, such as death or disability).
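
Once edges are oriented, reading mediators and colliders off the graph is a mechanical step. The sketch below assumes a fully oriented toy graph with hypothetical node names; a real GFCI run returns a partial ancestral graph whose uncertain edge marks would need separate handling.

```python
# Hypothetical oriented edges (cause, effect); not output from any real analysis.
EDGES = [
    ("snp_il6", "il6"),            # genotype anchors the biomarker
    ("activity", "il6"),
    ("il6", "frailty"),
    ("activity", "frailty"),
    ("activity", "censored"),      # prior exposure affects survival
    ("frailty_prior", "censored"), # prior outcome status affects survival
]


def children_map(edges):
    """Map each node to the set of its direct effects."""
    graph = {}
    for cause, effect in edges:
        graph.setdefault(cause, set()).add(effect)
    return graph


def descendants(graph, node):
    """All nodes reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for child in graph.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def mediators(edges, exposure, outcome):
    """Nodes lying on a directed path from exposure to outcome."""
    graph = children_map(edges)
    return {
        node
        for node in descendants(graph, exposure)
        if node != outcome and outcome in descendants(graph, node)
    }


def common_effects(edges, cause_a, cause_b):
    """Nodes that receive arrows from both causes (colliders on that path)."""
    graph = children_map(edges)
    return graph.get(cause_a, set()) & graph.get(cause_b, set())


found_mediators = mediators(EDGES, "activity", "frailty")
found_colliders = common_effects(EDGES, "activity", "frailty_prior")
print("mediators:", found_mediators)
print("colliders:", found_colliders)
```

The parents of each collider found this way are the natural candidates for the censoring model, while nodes flagged as mediators should stay out of the adjustment set for a total effect.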

Testable predictions

Prediction 1: In a simulated aging cohort where the true exposure effect varies with a time‑varying inflammatory biomarker, a pipeline that first runs knowledge‑guided GFCI, then uses the discovered graph to (a) include interaction terms between exposure and the biomarker in the MSM, and (b) construct IPCW from the variables the graph marks as causes of censoring, will produce approximately unbiased estimates of the causal hazard ratio across 10‑year follow‑up, whereas a standard MSM without these adjustments will show >20% bias.
Prediction 2: Applying the same pipeline to the Framingham Offspring Study (physical activity → frailty, with IL‑6 as biomarker) will yield a narrower confidence interval for the activity‑frailty effect and a significant interaction term (p < 0.01) that reverses direction after age 75, consistent with prior literature on inflammaging.
Prediction 3: A negative control outcome (e.g., incidental fracture unrelated to activity) will show no association after adjustment, indicating that the adjustment has not introduced bias and that residual confounding is limited.

Implementation steps

Step 1: Genotype participants; extract MR‑validated SNPs for IL‑6, CRP, and IGF‑1.
Step 2: Run an AI‑enhanced GFCI algorithm that incorporates these SNPs as fixed background knowledge, outputting a partial ancestral graph.
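Step 3: From the graph, identify (a) time‑varying biomarkers with directed edges into the exposure‑outcome path (candidate effect modifiers; biomarkers that also cause exposure are time‑varying confounders and belong in the treatment weights) and (b) variables that receive arrows from both prior frailty and prior activity (colliders, such as death or disability).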
Step 4: Fit a marginal structural model for activity, adding interaction terms between activity and each identified modifier.
Step 5: Compute IPCW using a logistic model for censoring whose predictors are the causes of censoring identified in the graph (the parents of the collider nodes).
Step 6: Estimate the causal effect and compare to a baseline MSM that assumes no modification and uses simple IPCW.
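
Steps 4 and 6 can be sketched on simulated data. This is an illustrative sketch, not the proposed implementation: it assumes a single time point, a binary baseline modifier, one binary confounder, a continuous frailty score, and invented coefficients. Stabilized treatment weights come from stratum frequencies rather than a fitted model, and the MSM is saturated, so its coefficients are differences of weighted cell means.

```python
import random
from collections import defaultdict

# Assumed true MSM coefficients for the toy data-generating process.
TRUE_ACTIVITY, TRUE_INTERACTION = -0.6, 0.5


def simulate(n=100_000, seed=1):
    """Rows of (modifier v, confounder l, activity a, frailty score y)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        v = int(rng.random() < 0.5)            # baseline modifier, e.g. high IL-6
        l = int(rng.random() < 0.5)            # confounder of activity and frailty
        a = int(rng.random() < 0.3 + 0.4 * l)  # activity depends on the confounder
        y = (1.0 + TRUE_ACTIVITY * a + 0.3 * v + TRUE_INTERACTION * a * v
             + 0.8 * l + rng.gauss(0.0, 1.0))
        rows.append((v, l, a, y))
    return rows


def stabilized_weights(rows):
    """Return a function giving P(A=a | V=v) / P(A=a | L=l, V=v)."""
    n_v, n_av = defaultdict(int), defaultdict(int)
    n_lv, n_alv = defaultdict(int), defaultdict(int)
    for v, l, a, _ in rows:
        n_v[v] += 1
        n_av[(a, v)] += 1
        n_lv[(l, v)] += 1
        n_alv[(a, l, v)] += 1

    def weight(v, l, a):
        numerator = n_av[(a, v)] / n_v[v]
        denominator = n_alv[(a, l, v)] / n_lv[(l, v)]
        return numerator / denominator

    return weight


def fit_saturated_msm(rows, weight):
    """Weighted cell means turned into main-effect and interaction coefficients."""
    num, den = defaultdict(float), defaultdict(float)
    for v, l, a, y in rows:
        w = weight(v, l, a)
        num[(a, v)] += w * y
        den[(a, v)] += w
    m = {cell: num[cell] / den[cell] for cell in num}
    return {
        "activity": m[(1, 0)] - m[(0, 0)],
        "modifier": m[(0, 1)] - m[(0, 0)],
        "interaction": m[(1, 1)] - m[(0, 1)] - m[(1, 0)] + m[(0, 0)],
    }


rows = simulate()
msm = fit_saturated_msm(rows, stabilized_weights(rows))
crude = fit_saturated_msm(rows, lambda v, l, a: 1.0)  # unweighted comparison
print("weighted MSM:", msm)
print("unweighted:  ", crude)
```

With censoring present, each participant's weight would be the product of this treatment weight and the censoring weight from Step 5. In a longitudinal analysis the cell means would be replaced by a weighted pooled regression or Cox model.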

Falsifiability: If the discovered graph fails to improve bias metrics (mean absolute error, coverage) relative to the baseline MSM across multiple simulation scenarios, or if the interaction terms are not significant in the empirical data despite strong prior evidence, the hypothesis is refuted. Conversely, consistent improvement supports the claim that AI‑guided causal discovery can inform g‑methods to overcome effect‑modification misspecification and collider bias in aging research.
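
The bias metrics named above are simple to compute once each simulation replicate yields an estimate and a standard error. The helper below is a minimal sketch; the function name, the Wald-type 95% interval, and the example numbers are illustrative assumptions.

```python
def bias_metrics(estimates, std_errors, truth, z=1.96):
    """Summarize replicate estimates against a known true value.

    Coverage is the share of Wald intervals (estimate +/- z * SE) containing truth.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need equally long, non-empty estimate and SE lists")
    errors = [est - truth for est in estimates]
    covered = [abs(err) <= z * se for err, se in zip(errors, std_errors)]
    n = len(errors)
    return {
        "mean_bias": sum(errors) / n,
        "mean_absolute_error": sum(abs(err) for err in errors) / n,
        "coverage": sum(covered) / n,
    }


# Made-up replicate results for a true log hazard ratio of 0.5.
example = bias_metrics(
    estimates=[0.48, 0.55, 0.61, 0.40],
    std_errors=[0.05, 0.05, 0.05, 0.03],
    truth=0.5,
)
print(example)
```

Running the same helper on the graph-guided pipeline and on the baseline MSM, scenario by scenario, gives the comparison on which the falsification criterion rests.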

