Mechanism: A cross-attention transformer integrates genomic, clinical, and proteomic data to predict the optimal biologic therapy for rheumatoid arthritis patients. Readout: The model is hypothesized to predict 6-month EULAR response with ≥75% accuracy, potentially saving $15,000-30,000 per patient by reducing trial-and-error from an average of 2.3 agents to 1.
Background
Biologic therapy selection in RA remains largely empirical: roughly 30-40% of patients fail their first TNF inhibitor. Pharmacogenomics promises precision medicine, but single-gene approaches (e.g., HLA-DRB1 alone) have insufficient predictive power.
Hypothesis
A cross-attention transformer architecture integrates three inputs:
- Polygenic risk scores (PRS) from RA-associated loci (HLA-DRB1 shared epitope, PTPN22, STAT4, CTLA4, TRAF1)
- Clinical sequence data (DAS28 trajectory, prior treatments, comorbidities) tokenized with OMOP vocabulary
- Proteomic features (baseline CRP, RF titer, anti-CCP levels)
The hypothesis is that this model can predict 6-month EULAR response (good/moderate/none) to specific biologic or targeted synthetic classes (anti-TNF, anti-IL6, anti-CD20, JAKi) with accuracy ≥75%, enabling first-line biologic selection.
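The three-level outcome named above can be derived from two DAS28 measurements. The following is a minimal sketch using the standard DAS28-based EULAR response criteria. The function name and argument names are illustrative assumptions, not part of any registry schema.

```python
def eular_response(das28_baseline: float, das28_6mo: float) -> str:
    """Classify 6-month response with the DAS28-based EULAR criteria.

    Illustrative helper: the name and signature are assumptions.
    """
    improvement = das28_baseline - das28_6mo
    if improvement > 1.2:
        # Large improvement: "good" only if low disease activity is reached.
        return "good" if das28_6mo <= 3.2 else "moderate"
    if improvement > 0.6:
        # Intermediate improvement counts unless activity is still high.
        return "moderate" if das28_6mo <= 5.1 else "none"
    return "none"


# Hypothetical patients as (baseline DAS28, 6-month DAS28) pairs.
examples = [(5.8, 3.0), (5.8, 4.4), (6.5, 5.6), (5.0, 4.8)]
labels = [eular_response(b, f) for b, f in examples]
```

These labels would serve as the classification target. The primary endpoint in the design below corresponds to the "good" class.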
Architecture
Based on the foundation model approach of Amar et al. (bioRxiv 2025):
- Clinical encoder: GPT-2 decoder trained on rheumatological visit sequences (next-token prediction)
- Genomic encoder: PRS computation from targeted genotyping panel
- Cross-attention fusion: clinical tokens attend to genomic embeddings at each layer
- Output head: multinomial classification over biologic response categories
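To make the fusion step concrete, here is a minimal single-head sketch of scaled dot-product cross-attention in plain Python. It is not the Amar et al. implementation. It assumes clinical and genomic embeddings share one dimension, omits the learned query/key/value projections and multi-head structure a real model would have, and uses toy vectors throughout.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(clinical_tokens, genomic_embeddings):
    """Each clinical token (query) attends over genomic embeddings (keys/values).

    Assumption: projections are identity, so embeddings are used directly.
    Returns the fused token vectors and the attention weights per token.
    """
    dim = len(genomic_embeddings[0])
    fused, weights_per_token = [], []
    for query in clinical_tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in genomic_embeddings
        ]
        weights = softmax(scores)
        context = [
            sum(w * value[i] for w, value in zip(weights, genomic_embeddings))
            for i in range(dim)
        ]
        # Residual connection: add the genomic context to the clinical token.
        fused.append([q + c for q, c in zip(query, context)])
        weights_per_token.append(weights)
    return fused, weights_per_token


# Toy 2-d embeddings: two clinical tokens, three genomic embeddings.
clinical = [[1.0, 0.0], [0.0, 1.0]]
genomic = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused, weights = cross_attention(clinical, genomic)
```

The attention weights give a per-token view of which genomic embeddings influenced the fused representation, which is one starting point for the interpretability concern raised under Limitations.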
Testable Design
- Training: ≥5000 RA patients from biologic registries (BIOBADAMEX, CORRONA, BSRBR) with genotyping + treatment outcomes
- Primary endpoint: EULAR good response at 6 months
- Comparison: transformer model vs. clinical-only model vs. PRS-only model vs. rheumatologist prediction
- Cross-validation: leave-one-registry-out
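The leave-one-registry-out scheme can be sketched as follows: each fold holds out every patient from one registry and trains on the rest. The record layout (`id`, `registry`) and the toy patients are assumptions for illustration only.

```python
def leave_one_registry_out(patients):
    """Yield (held_out_registry, train, test) with one registry held out per fold."""
    registries = sorted({p["registry"] for p in patients})
    for held_out in registries:
        train = [p for p in patients if p["registry"] != held_out]
        test = [p for p in patients if p["registry"] == held_out]
        yield held_out, train, test


# Hypothetical records; real registry exports will have a different schema.
patients = [
    {"id": 1, "registry": "BIOBADAMEX"},
    {"id": 2, "registry": "BIOBADAMEX"},
    {"id": 3, "registry": "CORRONA"},
    {"id": 4, "registry": "BSRBR"},
    {"id": 5, "registry": "BSRBR"},
]
folds = list(leave_one_registry_out(patients))
```

Holding out a whole registry tests transfer to an unseen population and data-collection process, which is stricter than random patient-level splits. scikit-learn's `LeaveOneGroupOut` provides the same splitting logic when registry names are passed as groups.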
Prerequisites
- OMOP-standardized rheumatological vocabulary (labs, dx, tx, scores, time tokens)
- GPU compute (estimated 2-4 weeks training on A100)
- Multi-registry data access agreements
Limitations
- Genotyping adds cost (~$100-200 per patient for targeted panel)
- Training requires large multi-ethnic cohorts for generalizability
- Transformer interpretability remains challenging for clinical adoption
- Regulatory pathway for AI-guided biologic selection undefined
Clinical Significance
Reducing biologic trial-and-error from the current average of 2.3 agents to first-line success would save ~$15,000-30,000 per patient in failed therapy costs and prevent 6-12 months of uncontrolled disease activity.
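As a sanity check on these figures, the implied cost of one failed biologic course can be back-calculated from the numbers above. This is only arithmetic on the stated values, assuming the savings scale linearly with the number of failed agents avoided. It is not an independent cost estimate.

```python
avg_agents_current = 2.3   # current average number of agents tried (from the text)
avg_agents_target = 1.0    # first-line success
savings_low, savings_high = 15_000, 30_000  # stated per-patient savings, USD

# Failed courses avoided per patient under the linear assumption.
avoided_courses = avg_agents_current - avg_agents_target

# Implied cost attributed to each avoided failed course.
implied_cost_low = savings_low / avoided_courses
implied_cost_high = savings_high / avoided_courses
```

Under this assumption, 1.3 failed courses are avoided per patient, implying roughly $11,500-23,100 per failed course.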
RheumaAI Research • Foundation models for rheumatology • rheumai.xyz