Mechanism: A β-VAE processes multi-omic data from SLE patients to generate disentangled latent factors representing disease axes. Readout: These factors are predicted to enable counterfactual simulation of treatment outcomes, with >80% concordance with observed SLEDAI response and a DCI disentanglement score >0.7.
Hypothesis
A β-variational autoencoder (β-VAE) trained on joint multi-omic data (transcriptomics, proteomics, metabolomics, and immunophenotyping) from longitudinal SLE cohorts learns disentangled latent factors that correspond to biologically interpretable disease axes (interferon signature, complement consumption, B-cell hyperactivity, metabolic dysfunction). Crucially, interventions on individual latent dimensions enable counterfactual reasoning — simulating "what would happen if this patient received belimumab vs. rituximab vs. voclosporin" — with >80% concordance to observed outcomes in held-out validation.
Background and Rationale
Current biologic selection in SLE relies on phenotypic classification (renal vs. cutaneous vs. hematologic) and expert intuition. This fails to capture the high-dimensional, nonlinear interactions among immune pathways that determine treatment response. Standard predictive models (logistic regression, random forests) learn correlations but cannot perform interventional reasoning — they cannot answer "what would have happened under an alternative treatment?"
Disentangled representation learning offers a principled solution. The β-VAE objective encourages statistical independence among latent factors via a KL-divergence penalty weighted by β > 1. When trained on sufficiently rich multi-omic data, each latent dimension is expected to capture a distinct biological process. If these factors are in fact independent, intervening on one (simulating drug action on a specific pathway) does not spuriously alter the others, which is a key requirement for valid counterfactual inference.
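For reference, the standard β-VAE objective can be written as follows. The notation is an assumption, since the text above does not fix symbols: q_φ(z|x) is the encoder, p_θ(x|z) the decoder, and p(z) a standard normal prior.

```latex
\mathcal{L}_{\beta}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - \beta \, D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big),
\qquad p(z) = \mathcal{N}(0, I), \quad \beta > 1
```

The objective is maximized over θ and φ. Setting β = 1 recovers the ordinary VAE evidence lower bound; larger β trades reconstruction fidelity for a posterior that stays closer to the factorized prior.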
This connects to the structural causal model (SCM) framework: if disentangled latent factors approximate the true causal variables of the data-generating process, then interventions on these factors (in the sense of the do-operator) approximate real-world treatment effects.
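A minimal sketch of what an intervention on a single latent factor could look like, assuming a toy linear decoder in place of a trained one. The names `intervene` and `linear_decoder`, the 4-dimensional latent space, and the choice of dimension 0 as the "interferon" axis are all hypothetical and serve only as illustration.

```python
import numpy as np

def intervene(z, dim, value):
    """Return a copy of z with one latent dimension clamped, i.e. do(z[dim] = value)."""
    z_cf = np.array(z, dtype=float, copy=True)
    z_cf[..., dim] = value
    return z_cf

def linear_decoder(z, W, b):
    """Toy stand-in for a trained decoder: x = z W + b (an assumption, not the proposed model)."""
    return z @ W + b

rng = np.random.default_rng(0)
latent_dim, n_features = 4, 6          # small illustrative sizes
W = rng.normal(size=(latent_dim, n_features))
b = np.zeros(n_features)

z = rng.normal(size=latent_dim)        # factual latent code for one patient
z_cf = intervene(z, dim=0, value=0.0)  # hypothetical: dimension 0 is the "interferon" axis

x_factual = linear_decoder(z, W, b)
x_counterfactual = linear_decoder(z_cf, W, b)
effect = x_counterfactual - x_factual  # simulated change in the decoded omic profile
```

In this linear toy, the simulated effect depends only on the clamped dimension and its decoder weights. With a nonlinear decoder the same clamping applies, but the effect would generally also depend on the values of the other factors.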
Testable Predictions
- Disentanglement quality: β-VAE latent factors will achieve DCI disentanglement score >0.7 on held-out multi-omic data, with individual factors correlating (|r| > 0.6) to known biological axes (IFN score, C3/C4, CD19+ count, serum metabolite clusters)
- Counterfactual accuracy: Simulated treatment outcomes via latent-space intervention will achieve >80% concordance (AUROC > 0.80) with observed 6-month SLEDAI response in a held-out cohort of ≥200 patients
- Superiority over correlative models: Counterfactual-based biologic selection will outperform random forest classifiers by ≥15 percentage points in predicting SRI-4 response at 52 weeks
- Biological plausibility: Latent traversals along the "interferon" dimension will recapitulate known gene expression changes induced by anifrolumab, validating the biological interpretability of learned representations
- Generalization: Models trained on one ethnic cohort will maintain >70% concordance when applied to genetically distinct populations, with pharmacogenomic covariates (CYP2D6, CYP3A4) improving cross-population transfer by ≥10%
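The concordance thresholds in the predictions above can be checked with a rank-based AUROC. The sketch below assumes binary response labels and a continuous predicted score per patient; the function name `auroc` and the toy data are illustrative and not from any study.

```python
import numpy as np

def auroc(y_true, scores):
    """AUROC as the probability that a random responder is scored above
    a random non-responder, with ties counted as one half."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs at least one responder and one non-responder")
    diff = pos[:, None] - neg[None, :]          # all responder/non-responder pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

# Toy example: 1 = responder, 0 = non-responder (made-up values)
y = [1, 1, 0, 0, 1, 0]
predicted = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
score = auroc(y, predicted)
meets_threshold = score > 0.80
```

The pairwise formulation is quadratic in cohort size, which is acceptable for a few hundred held-out patients. For larger cohorts a sort-based implementation would be preferable.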
Proposed Methodology
- Data: Longitudinal multi-omic panels from ≥500 SLE patients across ≥3 treatment arms (belimumab, rituximab, voclosporin/standard-of-care), sampled at baseline, 3, 6, and 12 months
- Architecture: β-VAE with 64-dimensional latent space, β=4, convolutional encoder for omic tensors, with auxiliary classifiers for semi-supervised disentanglement
- Counterfactual engine: Treatment-specific decoder heads conditioned on latent representations; intervention via latent dimension clamping informed by known drug mechanism-of-action mapping
- Validation: 5-fold cross-validation + external validation on independent cohort; calibration via Platt scaling; fairness audit across demographic subgroups
- Causal validation: Compare counterfactual predictions against propensity-score-matched observational treatment switches (natural experiments)
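The training loss implied by the architecture bullet (64-dimensional latent space, β=4) can be sketched in NumPy as follows. This assumes a diagonal Gaussian posterior, a standard normal prior, and squared-error reconstruction; the auxiliary classifier losses and the convolutional encoder are omitted, and all function names are hypothetical.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """Per-sample KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Batch mean of squared-error reconstruction plus beta-weighted KL (to be minimized)."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    kl = kl_to_standard_normal(mu, logvar)
    return float(np.mean(recon + beta * kl))

rng = np.random.default_rng(0)
batch, n_features, latent_dim = 8, 100, 64   # n_features is an illustrative size
x = rng.normal(size=(batch, n_features))
mu = np.ones((batch, latent_dim))            # placeholder encoder outputs
logvar = np.zeros((batch, latent_dim))
z = reparameterize(mu, logvar, rng)          # latent sample that a decoder would consume
loss = beta_vae_loss(x, x, mu, logvar, beta=4.0)  # perfect reconstruction, so only the KL term remains
```

In a real pipeline the same quantities would be computed in an autodiff framework so that gradients flow through `reparameterize`. The NumPy version only makes the arithmetic explicit.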
Limitations
- Disentanglement is not guaranteed even with high β: the impossibility result of Locatello et al. (2019) shows that fully unsupervised disentanglement cannot be achieved without inductive biases on both models and data. We mitigate this with semi-supervised auxiliary losses anchored to known biology
- Counterfactual validity assumes the latent factors approximate true causal variables. This assumption cannot be tested directly and can only be partially validated via downstream prediction accuracy
- Multi-omic data collection is expensive and not universally available, limiting immediate clinical translation
- The 64-dimensional latent space may be insufficient for capturing all relevant biological variation, or overparameterized for smaller cohorts — sensitivity analysis across latent dimensions is required
- Cross-population generalization depends on shared causal structure across genetic backgrounds, which may not hold for all disease axes
Clinical Significance
If validated, this framework would transform biologic selection from empirical trial-and-error into principled counterfactual reasoning. A rheumatologist could input a patient's baseline multi-omic profile and receive probabilistic predictions for response to each available biologic, with uncertainty quantification and a biological explanation of why a specific drug is recommended. This could reduce time-to-optimal-therapy, minimize exposure to ineffective treatments, and enable genuinely personalized medicine in SLE.
The disentangled representation also serves as a foundation for digital twin construction — continuously updated patient models that simulate disease trajectory under various therapeutic scenarios.
RheumaAI Research • rheumai.xyz • DeSci Rheumatology