Mechanism: Bayesian causal discovery algorithms process multi-modal patient data to identify novel interventional hub nodes in systemic sclerosis. Readout: Predicted fibrotic biomarker reduction of at least 15%, confirmation by Mendelian randomization, and distinct temporal rewiring of the causal network.
Hypothesis
Application of Bayesian DAG (directed acyclic graph) structure learning algorithms — specifically order-MCMC with the BGe score — to integrated multi-modal rheumatology datasets (serial cytokine panels, nailfold capillaroscopy quantifications, modified Rodnan skin scores, PFT trajectories, and HLA/non-HLA genomic risk variants) will recover causal graphs that identify 3–5 previously unrecognized interventional targets in systemic sclerosis (SSc), distinct from currently druggable pathways (TGF-β, IL-6, endothelin).
Rationale
Current SSc therapeutic development relies on association-based biomarker studies that cannot distinguish causal mediators from downstream correlates. Bayesian structure learning addresses this by estimating a posterior distribution over DAG topologies, so that uncertainty about each edge is quantified directly as a posterior edge probability (or, equivalently, a Bayes factor against the edge's absence). The BGe (Bayesian Gaussian equivalent) score permits closed-form marginal likelihood computation for continuous data, while order-MCMC efficiently explores the super-exponential DAG space by sampling over node orderings rather than graphs directly.
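The order-space idea can be written out explicitly. The notation below is introduced here for illustration and is not taken from the proposal: ≺ is a node ordering, pred_≺(i) is the set of nodes preceding node i, K is an assumed cap on parent-set size, and S is the local score (the BGe marginal likelihood of X_i given its parents, multiplied by the parent-set prior). The acceptance probability assumes a symmetric proposal, such as swapping two nodes in the ordering.

```latex
P(D \mid \prec) \;=\; \prod_{i=1}^{p}
  \sum_{\substack{\mathrm{Pa}_i \subseteq \mathrm{pred}_{\prec}(i) \\ |\mathrm{Pa}_i| \le K}}
  S(X_i, \mathrm{Pa}_i \mid D),
\qquad
\alpha(\prec \to \prec') \;=\; \min\left\{ 1,\;
  \frac{P(D \mid \prec')\, P(\prec')}{P(D \mid \prec)\, P(\prec)} \right\}
```

Because the score of an ordering sums over every DAG consistent with it, a single move in order space covers many graphs at once, which is where the efficiency gain comes from.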
SSc is uniquely suited to causal discovery because: (1) its triphasic progression (edematous → fibrotic → atrophic) creates temporal ordering constraints that inform DAG orientation; (2) multi-organ involvement generates high-dimensional, causally entangled biomarker networks; and (3) existing interventional data from clinical trials provides partial ground truth for DAG validation via do-calculus consistency checks.
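One way the temporal ordering in point (1) could enter the search is as a tiered edge blacklist: an edge is forbidden if its source belongs to a later tier than its target. The sketch below is illustrative only; the variable names and tier assignments are hypothetical and not part of the proposal.

```python
def tier_blacklist(tiers):
    """Return a matrix F where F[i][j] is True if the edge i -> j is forbidden.

    An edge is forbidden when its source sits in a strictly later tier
    than its target, since a later phase cannot cause an earlier one.
    Edges within a tier, or from earlier to later tiers, stay allowed.
    """
    p = len(tiers)
    return [[tiers[i] > tiers[j] for j in range(p)] for i in range(p)]


# Hypothetical variables: tier 0 = germline genotype, 1 = edematous phase,
# 2 = fibrotic phase. These assignments are assumptions for illustration.
names = ["HLA_variant", "IL6_edematous", "capillary_density_edematous", "mRSS_fibrotic"]
tiers = [0, 1, 1, 2]

forbidden = tier_blacklist(tiers)
n_forbidden = sum(sum(row) for row in forbidden)
```

A structure learner would then skip any parent set containing a forbidden source, which shrinks the search space before sampling begins.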
Testable Predictions
- Structural: The learned DAG will contain ≥3 hub nodes (high out-degree) not currently recognized as therapeutic targets, with posterior edge probability >0.85 across 10-fold cross-validation splits.
- Interventional: Simulated do-calculus interventions (do(X=x)) on identified hub nodes will predict ≥15% reduction in downstream fibrotic biomarker levels, validated against held-out interventional trial data.
- Genomic anchoring: ≥2 identified causal mediators will have cis-eQTL instruments suitable for Mendelian randomization confirmation, providing orthogonal causal evidence.
- Temporal: Bootstrap-aggregated DAGs from early-disease (≤2 years) vs. late-disease (>5 years) cohorts will show topological divergence (measured by structural Hamming distance >15), reflecting phase-dependent causal rewiring.
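The interventional and temporal predictions can be made concrete with a small sketch. It assumes a linear Gaussian structural equation model (consistent with the BGe setting) and uses toy coefficients and node names invented for illustration; none of the numbers come from SSc data. `mean_under_do` propagates expected values through the graph after cutting the intervened node off from its parents, and `shd` counts each added, deleted, or reversed edge once.

```python
def mean_under_do(weights, intercepts, order, do=None):
    """Expected value of every node in a linear SEM, optionally under do(X = x).

    weights maps (source, target) to an edge coefficient, intercepts maps each
    node to its constant term, and order must be a topological order of the DAG.
    Intervened nodes are fixed to their value and ignore their parents.
    """
    do = do or {}
    mean = {}
    for node in order:
        if node in do:
            mean[node] = do[node]
        else:
            mean[node] = intercepts[node] + sum(
                w * mean[src] for (src, dst), w in weights.items() if dst == node
            )
    return mean


def shd(edges_a, edges_b):
    """Structural Hamming distance between two sets of directed edges."""
    pairs = {frozenset(e) for e in edges_a | edges_b}
    dist = 0
    for pair in pairs:
        i, j = tuple(pair)
        state_a = ((i, j) in edges_a, (j, i) in edges_a)
        state_b = ((i, j) in edges_b, (j, i) in edges_b)
        dist += state_a != state_b  # a reversal counts as one difference
    return dist


# Toy chain: hub -> mediator -> fibrosis_marker (hypothetical coefficients).
weights = {("hub", "mediator"): 0.8, ("mediator", "fibrosis_marker"): 0.5}
intercepts = {"hub": 1.0, "mediator": 0.5, "fibrosis_marker": 2.0}
order = ["hub", "mediator", "fibrosis_marker"]

baseline = mean_under_do(weights, intercepts, order)
intervened = mean_under_do(weights, intercepts, order, do={"hub": 0.0})
relative_reduction = 1 - intervened["fibrosis_marker"] / baseline["fibrosis_marker"]

# Toy early- and late-disease graphs: one reversed edge and one added edge.
early = {("hub", "mediator"), ("mediator", "fibrosis_marker")}
late = {("mediator", "hub"), ("mediator", "fibrosis_marker"), ("hub", "fibrosis_marker")}
distance = shd(early, late)
```

In a full analysis these quantities would be averaged over posterior DAG samples rather than computed on a single graph, so that the predicted reduction carries the structural uncertainty with it.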
Methodology
- Data: EUSTAR registry longitudinal data (n≥2,000), supplemented with GENISOS and Canadian Scleroderma Research Group cohorts for external validation
- Algorithm: Order-MCMC with BGe score, 10⁶ iterations, with convergence assessed across independent chains via Gelman-Rubin R̂<1.05
- Priors: Informative edge priors from KEGG/Reactome pathways (prior edge probability 0.3 for known interactions, 0.01 otherwise)
- Validation: (a) Do-calculus predictions vs. tocilizumab/nintedanib trial outcomes; (b) MR with UK Biobank instruments; (c) structural stability via non-parametric bootstrap (1,000 resamples)
- Corrections: Bonferroni-adjusted posterior thresholds for edge inclusion; FDR control at q<0.05 for hub identification
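Two of the ingredients above are simple enough to sketch directly: the informative edge prior and the Gelman-Rubin diagnostic. The code is a minimal illustration, not the proposal's implementation. It assumes that pathway knowledge is supplied as a set of directed (source, target) pairs and that convergence is monitored on a scalar trace per chain (for example the log score); both are assumptions, as are the function names.

```python
import math
import random
from statistics import mean, variance


def log_edge_prior(edges, known, nodes, p_known=0.3, p_other=0.01):
    """Log prior of a graph under independent per-edge inclusion probabilities."""
    total = 0.0
    for i in nodes:
        for j in nodes:
            if i == j:
                continue
            pi = p_known if (i, j) in known else p_other
            total += math.log(pi) if (i, j) in edges else math.log1p(-pi)
    return total


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of a scalar."""
    n = len(chains[0])
    within = mean([variance(c) for c in chains])           # W: mean within-chain variance
    between = n * variance([mean(c) for c in chains])      # B: scaled variance of chain means
    var_plus = (n - 1) / n * within + between / n          # pooled variance estimate
    return math.sqrt(var_plus / within)


# Synthetic traces only: four well-mixed chains versus four chains stuck
# around different values.
rng = random.Random(0)
mixed = [[rng.gauss(0, 1) for _ in range(2000)] for _ in range(4)]
stuck = [[rng.gauss(3 * k, 1) for _ in range(2000)] for k in range(4)]
rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

With the stated threshold, the well-mixed chains would pass and the stuck chains would be flagged for longer runs or better proposals.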
Limitations
- Causal sufficiency assumption may be violated (unmeasured confounders); sensitivity analysis via FCI algorithm required
- BGe score assumes multivariate Gaussianity — cytokine data often right-skewed, requiring rank-based transformations that may attenuate effect sizes
- Registry data may have informative missingness (sicker patients lost to follow-up), biasing DAG topology toward survivorship patterns
- Computational cost scales O(n·p²) per MCMC step; p>50 nodes may require LASSO-based constraint screening
- Cross-registry harmonization introduces batch effects that mimic causal edges — ComBat correction needed pre-analysis
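The rank-based transformation mentioned for skewed cytokine data can be sketched as a rank-based inverse normal transform. The version below uses the Blom offset (c = 3/8) and average ranks for ties, which is one common convention and an assumption here; the cytokine values are made up, and missing values would need handling beforehand.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # average rank of the tie group
        for k in order[start:end + 1]:
            ranks[k] = avg
        start = end + 1
    return ranks


def inverse_normal_transform(values, c=3 / 8):
    """Map values to standard normal quantiles of their offset ranks."""
    n = len(values)
    std_normal = NormalDist()
    return [
        std_normal.inv_cdf((r - c) / (n - 2 * c + 1))
        for r in average_ranks(values)
    ]


# Hypothetical right-skewed cytokine concentrations with one tie.
cytokine = [1.2, 250.0, 3.4, 3.4, 18.0]
z = inverse_normal_transform(cytokine)
```

The transform keeps only the ordering of the measurements, which is why it can attenuate effect sizes: the distance between 18.0 and 250.0 is reduced to a single rank step.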
Clinical Significance
SSc remains one of the highest-mortality rheumatic diseases, and no approved therapy is established as modifying the overall disease course (nintedanib and tocilizumab, for example, are approved in the US specifically for SSc-associated interstitial lung disease). Identifying causal, rather than merely associated, molecular targets would fundamentally redirect drug development. The Bayesian framework additionally provides principled uncertainty quantification, which is essential for regulatory-grade evidence (ICH E9(R1) estimands framework). If validated, this approach could generalize to any multi-organ autoimmune disease where causal architecture is unknown.
RheumaAI Research • rheumai.xyz • DeSci Rheumatology