Mechanism: A Rheumatology Foundation Model (RFM) pre-trained on abundant common-disease data transfers its learned patterns to rare autoimmune conditions. Readout: Fine-tuning the RFM with fewer than 200 rare-disease cases is hypothesized to achieve high predictive accuracy (AUROC ≥ 0.78), with more than 60% of the top-attended tokens mapping to features learned during common-disease pre-training.
Background
Rare autoimmune diseases — adult-onset Still disease (AOSD), IgG4-related disease (IgG4-RD), relapsing polychondritis, eosinophilic granulomatosis with polyangiitis (EGPA) — collectively affect millions yet individually lack sufficient cohort sizes for robust predictive modeling. Traditional supervised learning requires thousands of labeled trajectories, rendering these conditions perpetually under-modeled.
Recent rheumatology foundation models (RFMs) pre-trained on large multi-site datasets of common diseases (RA, SLE, SpA) learn generalizable representations of inflammatory dynamics, lab trajectory patterns, and treatment response phenotypes. We hypothesize that these learned representations transfer to rare conditions with minimal fine-tuning.
Hypothesis
A GPT-2-architecture foundation model pre-trained on ≥500,000 longitudinal rheumatology encounters (OMOP-tokenized: labs, diagnoses, medications, disease activity scores, temporal tokens) can be fine-tuned with fewer than 200 labeled cases per rare autoimmune condition to predict 6-month disease activity trajectories with discriminative performance (AUROC ≥ 0.78) comparable to disease-specific models trained on 2,000+ cases.
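As a rough illustration of what "OMOP-tokenized" encounters with temporal tokens could look like, the sketch below flattens a patient timeline into a token sequence. The token names (`LAB:CRP_HIGH`, `<GAP_1M>`), the gap buckets, and the `tokenize_timeline` helper are all hypothetical choices made for this example; the actual vocabulary and binning scheme are not specified here.

```python
from datetime import date

# Hypothetical time-gap buckets (upper bound in days -> temporal token).
# These thresholds are assumptions for illustration only.
GAP_BUCKETS = [(0, "<SAME_DAY>"), (7, "<GAP_1W>"), (30, "<GAP_1M>"), (90, "<GAP_3M>")]
LONG_GAP = "<GAP_GT_3M>"


def gap_token(days):
    """Map the number of days since the previous event to a coarse temporal token."""
    for upper, token in GAP_BUCKETS:
        if days <= upper:
            return token
    return LONG_GAP


def tokenize_timeline(events):
    """Flatten (date, domain, code) events into one chronologically ordered token list."""
    tokens = []
    previous = None
    # sorted() is stable, so same-day events keep their input order.
    for when, domain, code in sorted(events, key=lambda e: e[0]):
        if previous is not None:
            tokens.append(gap_token((when - previous).days))
        tokens.append(f"{domain}:{code}")
        previous = when
    return tokens


# Made-up example timeline (not real patient data).
example = [
    (date(2023, 1, 10), "LAB", "CRP_HIGH"),
    (date(2023, 1, 10), "DX", "AOSD"),
    (date(2023, 2, 1), "RX", "PREDNISONE"),
]
tokens = tokenize_timeline(example)
```

Day-level buckets like these cannot represent intra-day patterns, which is the trade-off noted under Limitations for quotidian fever in AOSD.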
Mechanistic Rationale
The key insight is that inflammatory autoimmune diseases share deep mechanistic structure:
- Shared cytokine grammar: IL-6, TNF-α, IFN-γ trajectory patterns learned from RA/SLE generalize because the same signaling cascades drive AOSD (IL-18/IL-6) and EGPA (IL-5/IL-13 with IL-6 co-activation)
- Treatment response homology: Corticosteroid taper dynamics, biologic onset-of-action curves, and immunosuppressant dose-response relationships exhibit conserved temporal signatures across diseases
- Lab trajectory transferability: CRP/ESR decay kinetics, ferritin dynamics (critical for AOSD), complement consumption patterns, and cytopenias follow learnable archetypes
The foundation model encodes these shared patterns as transferable attention weights. Few-shot fine-tuning then learns disease-specific deviations (e.g., the pathognomonic quotidian fever + ferritin spike in AOSD).
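One simple way to picture the few-shot step is a linear probe: the pre-trained backbone is frozen and only a small logistic head is fitted on its patient embeddings. This is a deliberate simplification of fine-tuning, not the method proposed here, and the embeddings below are synthetic stand-ins. The function names and hyperparameters are assumptions for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_head(embeddings, labels, lr=0.1, epochs=200, l2=0.01):
    """Fit an L2-regularized logistic head by full-batch gradient descent."""
    dim = len(embeddings[0])
    n = len(embeddings)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(embeddings, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        # Weight decay is applied to w only, not to the bias.
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Predicted probability of active disease for one embedding."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Synthetic stand-in for frozen-backbone embeddings of 150 patients (not real data).
rng = random.Random(0)
labels = [1] * 75 + [0] * 75
embeddings = [[rng.gauss(1.0 if y else -1.0, 0.5) for _ in range(2)] for y in labels]

w, b = fit_head(embeddings, labels)
accuracy = sum(
    (predict(w, b, x) >= 0.5) == bool(y) for x, y in zip(embeddings, labels)
) / len(labels)
```

With only the head trainable, the number of fitted parameters stays small relative to n < 200. Full fine-tuning would also update backbone weights and would need stronger regularization.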
Testable Predictions
- Primary: Fine-tuned RFM with n=150 AOSD cases achieves AUROC ≥ 0.78 for predicting active vs. inactive disease at 6 months, vs. AUROC ≤ 0.62 for a de novo logistic regression on the same 150 cases
- Secondary: Attention probing reveals that >60% of the top-50 attended tokens in rare-disease predictions map to features learned during common-disease pre-training (shared inflammatory grammar)
- Calibration: Bayesian calibration (Platt scaling + MCMC posterior) yields expected calibration error (ECE) < 0.08 even with n < 200, because the pre-trained prior regularizes the posterior
- Negative control: A foundation model pre-trained on non-rheumatologic data (cardiology encounters) and fine-tuned on the same rare-disease cases (n < 200) shows no transfer benefit (AUROC improvement < 0.03 over the de novo baseline), confirming domain-specific transfer
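Two of the readouts above can be made concrete with a short sketch. The ECE function uses equal-width bins on the predicted positive-class probability, which is one common binary variant; the bin count is an assumption. The `shared_token_fraction` helper treats "maps to a pre-training feature" as simple membership in a predefined token set, which is an assumed operationalization and not one defined in this proposal.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: bin-weighted mean of |mean predicted prob - observed positive rate|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_p = sum(p for p, _ in bucket) / len(bucket)
        frac_pos = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_p - frac_pos)
    return ece


def shared_token_fraction(attention_by_token, shared_vocabulary, k=50):
    """Fraction of the k most-attended tokens that belong to shared_vocabulary."""
    top = sorted(attention_by_token, key=attention_by_token.get, reverse=True)[:k]
    if not top:
        return 0.0
    return sum(token in shared_vocabulary for token in top) / len(top)


# Toy inputs (made up for illustration).
ece = expected_calibration_error([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1])
attention = {"LAB:CRP": 0.5, "LAB:FERRITIN": 0.3, "DX:AOSD": 0.2}
shared = {"LAB:CRP", "LAB:FERRITIN"}
fraction_top3 = shared_token_fraction(attention, shared, k=3)
```

Under this reading, the secondary prediction holds when `shared_token_fraction(..., k=50)` exceeds 0.60 and the calibration prediction holds when the ECE falls below 0.08.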
Proposed Validation
- Pre-training corpus: OMOP-formatted EHR data from ≥3 academic centers, >500K encounters across RA, SLE, SpA, SSc, vasculitis
- Fine-tuning sets: Retrospective cohorts of AOSD (n=150), IgG4-RD (n=180), relapsing polychondritis (n=120), EGPA (n=160)
- Architecture: GPT-2 decoder (12 layers, 768 hidden) with disease-activity prediction head; optional cross-attention module for genomic PRS integration
- Evaluation: 5-fold stratified cross-validation with bootstrapped 95% CIs; DeLong test comparing fine-tuned vs. de novo models
- DeSci infrastructure: Federated pre-training across sites via secure aggregation; model weights and evaluation code published on IPFS, with their content hashes attested on Ethereum
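The evaluation step can be sketched with the standard library alone: a stratified 5-fold splitter, AUROC computed as the Mann-Whitney statistic, and a percentile bootstrap for the 95% CI. Labels are assumed to be coded 0/1, and the function names, seeds, and resample count are illustrative choices. The DeLong comparison is not implemented here.

```python
import random


def auroc(scores, labels):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds while keeping class proportions roughly equal."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds


def bootstrap_auroc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for AUROC."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc([scores[i] for i in sample], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(int((1 - alpha / 2) * n_boot), n_boot - 1)]
    return lower, upper


# Toy cohort with 30 active and 120 inactive cases (made up for illustration).
toy_labels = [1] * 30 + [0] * 120
folds = stratified_folds(toy_labels, k=5)
point = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
ci_low, ci_high = bootstrap_auroc_ci([0.1, 0.4, 0.35, 0.8, 0.7, 0.2], [0, 0, 1, 1, 1, 0], n_boot=200)
```

With cohorts of 120 to 180 cases, each test fold holds only a few dozen patients, so the bootstrap interval is better computed on the pooled out-of-fold predictions than per fold.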
Limitations
- Distribution shift: Pre-training on common diseases may encode biases (e.g., RA treatment patterns) that distort rare-disease predictions; careful monitoring of attention attribution is essential
- Label heterogeneity: Disease activity definitions differ across rare conditions (Pouchot score for AOSD vs. IgG4-RD Responder Index); harmonization introduces noise
- Small-sample overfitting: Despite transfer learning regularization, n < 200 remains vulnerable to batch effects and site-specific confounders
- Temporal tokenization assumptions: OMOP temporal binning may lose clinically relevant intra-day dynamics (e.g., quotidian fever periodicity in AOSD)
- External validation: Requires independent multi-ethnic cohorts not used in pre-training
Clinical Significance
If validated, this approach democratizes predictive modeling for the long tail of rare autoimmune diseases. Clinicians managing AOSD or IgG4-RD — currently relying on expert opinion and small case series — would gain calibrated, evidence-derived predictions from a model that leverages the collective knowledge embedded in hundreds of thousands of common-disease encounters. The DeSci federated training paradigm ensures that small centers can contribute their rare-disease cases without exposing patient-level data, creating a positive-sum data flywheel.
RheumaAI Research • rheumai.xyz • DeSci Rheumatology