Mechanism: The SCM framework combines the ID algorithm with TMLE to model time-varying confounding in rheumatology explicitly, avoiding collider bias and unstable estimates. Readout: The approach is predicted to yield lower bias and tighter confidence intervals, flag non-identifiable causal queries, and reveal effect modification by HLA genotype on biologic outcomes.
Background
Longitudinal rheumatology cohorts face a fundamental causal inference challenge: treatment decisions at time t depend on disease activity at t−1, which was itself shaped by prior treatment, creating treatment-confounder feedback loops. Standard regression adjustment for such time-varying confounders blocks the part of the treatment effect that acts through them and can introduce collider bias, because the adjusted variables are themselves affected by earlier treatment. Marginal structural models (MSMs) with inverse probability weighting (IPW) partially address this but require a correctly specified propensity model and in practice suffer from extreme weights.
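To make the extreme-weight problem concrete, the following is a minimal sketch of stabilized IPW weights for a time-varying treatment. The per-visit probabilities are hypothetical numbers standing in for fitted propensity models, and the function names and truncation bounds are illustrative assumptions, not part of the proposed framework.

```python
from math import prod

def stabilized_weight(num_probs, den_probs):
    """Stabilized IPW weight for one patient.

    num_probs[t]: P(observed treatment at visit t | treatment history)
    den_probs[t]: P(observed treatment at visit t | treatment and confounder history)
    """
    if len(num_probs) != len(den_probs):
        raise ValueError("need one numerator and one denominator per visit")
    return prod(n / d for n, d in zip(num_probs, den_probs))

def truncate(weights, lower, upper):
    """Clip weights to [lower, upper], a common ad hoc remedy for extreme weights."""
    return [min(max(w, lower), upper) for w in weights]

# Hypothetical fitted probabilities for two patients over three visits.
typical = stabilized_weight([0.6, 0.6, 0.6], [0.5, 0.7, 0.6])    # about 1.03

# Near-positivity violation: the treatment actually received at visits 2 and 3
# was very unlikely given the patient's disease activity.
extreme = stabilized_weight([0.6, 0.6, 0.6], [0.5, 0.05, 0.04])  # about 216

clipped = truncate([typical, extreme], lower=0.1, upper=10.0)
```

A single patient with a weight near 216 can dominate a weighted analysis. Truncation trades that variance for bias, which is one reason the weights are a practical weak point of IPW-based MSMs.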
Hypothesis
We hypothesize that structural causal models (SCMs) with algorithmic do-calculus identification will yield treatment effect estimates with lower bias and tighter confidence intervals than IPW-based MSMs in longitudinal rheumatology data with time-varying confounding by indication. Specifically, identification uses the ID algorithm (Shpitser and Pearl, 2006, building on Tian and Pearl, 2002) applied to disease-specific directed acyclic graphs (DAGs).
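As a simplified illustration of graph-based identification, the sketch below checks the back-door criterion on a small DAG. It is not the ID algorithm: the back-door criterion is a sufficient condition only, whereas ID is complete for nonparametric identification. The graph, the node names (L0, A0, L1, A1, Y, U) and all function names are hypothetical assumptions made for this example.

```python
from collections import deque

def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def descendants(parents, node):
    """Return all strict descendants of `node`."""
    children = {}
    for child, pas in parents.items():
        for p in pas:
            children.setdefault(p, set()).add(child)
    seen, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def d_separated(parents, x, y, z):
    """True if x and y are d-separated given z (moralized ancestral graph test)."""
    keep = ancestors(parents, {x, y} | set(z))
    adj = {v: set() for v in keep}
    for child in keep:
        pas = list(parents.get(child, ()))
        for i, p in enumerate(pas):
            adj[p].add(child)        # drop edge direction
            adj[child].add(p)
            for q in pas[i + 1:]:    # "marry" parents of a common child
                adj[p].add(q)
                adj[q].add(p)
    seen, queue = {x}, deque([x])
    while queue:                     # look for a path that avoids z
        v = queue.popleft()
        if v == y:
            return False
        for w in adj[v] - set(z) - seen:
            seen.add(w)
            queue.append(w)
    return True

def satisfies_backdoor(parents, treatment, outcome, adjust):
    """Back-door criterion: no descendants of treatment in `adjust`, and
    `adjust` blocks every path into the treatment."""
    if set(adjust) & descendants(parents, treatment):
        return False
    # Remove edges leaving the treatment so that only back-door paths remain.
    pruned = {c: [p for p in pas if p != treatment] for c, pas in parents.items()}
    return d_separated(pruned, treatment, outcome, adjust)

# Hypothetical two-visit DAG, written as child -> list of parents.
# L0, L1: disease activity; A0, A1: treatment; U: unmeasured factor; Y: outcome.
dag = {
    "A0": ["L0"],
    "L1": ["L0", "A0", "U"],
    "A1": ["L1", "A0"],
    "Y": ["L1", "A0", "A1", "U"],
}

ok_a1 = satisfies_backdoor(dag, "A1", "Y", {"A0", "L1"})           # True
ok_a0 = satisfies_backdoor(dag, "A0", "Y", {"L0"})                 # True
bad_a0 = satisfies_backdoor(dag, "A0", "Y", {"L0", "L1"})          # False
```

In this hypothetical graph, {L0} is a valid adjustment set for A0 and {A0, L1} for A1, but L1 cannot be added to the A0 set because it is affected by A0. No single covariate set serves both time points, which is the pattern that sequential adjustment methods are designed to handle.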
Proposed Framework
- Domain-expert DAG elicitation: Rheumatologists encode causal structure — DAS28/SLEDAI/mRSS → treatment selection → future disease activity, with time-varying confounders (CRP, anti-dsDNA, drug levels, comorbidities)
- Algorithmic identifiability: Apply the ID algorithm to determine whether causal effects are nonparametrically identified from observational data given the DAG; flag unidentifiable queries requiring instrumental variables or proximal inference
- Efficient estimation: Use targeted minimum loss-based estimation (TMLE) or augmented IPW (AIPW) with cross-fitted machine learning nuisance estimators (Super Learner ensembles) for doubly robust inference
- Sensitivity analysis: Implement E-value calculations and Manski-type partial identification bounds for unmeasured confounding
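The estimation step can be sketched for the simplest case. The example below computes an AIPW estimate for a single binary treatment, not the longitudinal TMLE the framework proposes. It assumes the nuisance predictions (propensity scores and outcome regressions) have already been produced out-of-fold by some cross-fitted learner; the data and the function name are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def aipw_ate(y, a, e_hat, m1_hat, m0_hat):
    """AIPW estimate of E[Y(1)] - E[Y(0)] with an influence-function standard error.

    y      : observed outcomes
    a      : binary treatment indicators (0 or 1)
    e_hat  : estimated P(A = 1 | covariates), assumed cross-fitted
    m1_hat : estimated E[Y | A = 1, covariates], assumed cross-fitted
    m0_hat : estimated E[Y | A = 0, covariates], assumed cross-fitted
    """
    phi = [
        m1 - m0                              # outcome-model contrast
        + ai * (yi - m1) / ei                # weighted residual, treated
        - (1 - ai) * (yi - m0) / (1 - ei)    # weighted residual, untreated
        for yi, ai, ei, m1, m0 in zip(y, a, e_hat, m1_hat, m0_hat)
    ]
    estimate = mean(phi)
    std_err = sqrt(variance(phi) / len(phi))
    return estimate, std_err

# Hypothetical toy data: four patients, constant nuisance predictions.
y = [3.0, 1.0, 4.0, 2.0]
a = [1, 0, 1, 0]
est, se = aipw_ate(y, a, e_hat=[0.5] * 4, m1_hat=[3.5] * 4, m0_hat=[1.5] * 4)
```

The estimator is consistent if either the propensity model or the outcome model is correctly specified, which is the sense in which it is doubly robust. The residual terms still divide by the estimated propensity, so values of e_hat near 0 or 1 inflate the variance.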
Testable Predictions
- P1: In simulated rheumatology cohort data with known ground-truth effects, SCM+TMLE will recover true causal effects within 5% relative bias, versus >15% for naïve Cox regression and >8% for standard IPW-MSM under moderate positivity violations
- P2: Applied to BIOBADAMEX or similar biologics registries, the framework will identify effect modification by HLA genotype on biologic switching outcomes that MSMs miss due to weight instability in pharmacogenomic strata
- P3: The ID algorithm will flag at least one clinically relevant causal query as non-identifiable without additional data sources, preventing false confidence in observational estimates
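For P1, the relative bias metric can be pinned down with a short sketch. It assumes relative bias means the absolute difference between the mean estimate across simulation replicates and the true effect, divided by the absolute true effect. The replicate values below are made up for illustration and are not results.

```python
from statistics import mean

def relative_bias(estimates, truth):
    """|mean(estimates) - truth| / |truth| over simulation replicates."""
    if truth == 0:
        raise ValueError("relative bias is undefined for a true effect of zero")
    return abs(mean(estimates) - truth) / abs(truth)

# Hypothetical replicate estimates of a true effect of -0.50.
rb = relative_bias([-0.48, -0.53, -0.51, -0.49], truth=-0.50)
meets_p1_threshold = rb < 0.05   # P1 target for SCM+TMLE: within 5%
```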
Limitations
- DAG specification requires domain expertise and may omit unknown confounders; sensitivity analyses partially mitigate but cannot eliminate this
- Positivity violations in small pharmacogenomic subgroups may still produce unstable estimates despite doubly robust methods
- Computational cost of cross-fitted TMLE with Super Learner scales poorly beyond ~50 covariates without dimensionality reduction
- Assumes no measurement error in time-varying confounders — misclassification of disease activity scores introduces additional bias
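The E-value sensitivity analysis named in the framework can be sketched with the closed-form expression of VanderWeele and Ding (2017) for a risk ratio. The function names and input values are hypothetical, and other effect measures would first need conversion to an approximate risk ratio.

```python
from math import sqrt

def e_value(rr):
    """E-value for a risk ratio: RR + sqrt(RR * (RR - 1)), applied to 1/RR if RR < 1."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr
    return rr + sqrt(rr * (rr - 1))

def e_value_ci(lower, upper):
    """E-value for the confidence limit closest to the null (1.0 if the CI covers 1)."""
    if lower <= 1 <= upper:
        return 1.0
    return e_value(lower) if lower > 1 else e_value(upper)

# Hypothetical estimate: RR = 2.0 with 95% CI (1.2, 3.0).
point = e_value(2.0)           # 2 + sqrt(2), about 3.41
limit = e_value_ci(1.2, 3.0)   # about 1.69
```

An E-value of about 3.41 means an unmeasured confounder would need risk-ratio associations of at least that size with both treatment and outcome to fully explain away the observed estimate. It quantifies robustness to unmeasured confounding but does not remove it.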
Clinical Significance
This framework addresses a critical gap: rheumatology treatment decisions are inherently adaptive and confounded, yet most real-world evidence studies use methods that either ignore or inadequately handle treatment-confounder feedback. Rigorous causal identification could transform biologics registry analyses from associational to genuinely causal, directly informing treat-to-target strategies and pharmacogenomic treatment selection without requiring new RCTs.
RheumaAI Research • rheumai.xyz • DeSci Rheumatology