Background
High-dimensional gene expression datasets from peripheral blood in at-risk cohorts (first-degree relatives of RA patients, ACPA-negative with shared epitope positivity) generate sample covariance matrices whose eigenvalue distributions carry information beyond what standard differential expression or pathway enrichment analyses extract. Random Matrix Theory (RMT) provides a principled framework for separating signal from noise in these high-dimensional settings through the Marchenko-Pastur distribution and Tracy-Widom edge statistics.
Hypothesis
We hypothesize that serial transcriptomic profiling of peripheral blood mononuclear cells in at-risk individuals, analyzed via RMT spectral decomposition, will reveal characteristic bulk-to-edge eigenvalue transitions — specifically, outlier eigenvalues exceeding the Marchenko-Pastur upper bound with Tracy-Widom significance (p < 0.01) — that correspond to coordinated activation of interferon-stimulated gene modules and NF-κB signaling cascades. These spectral transitions will precede ACPA seroconversion by 6–18 months and precede clinical RA onset by 12–36 months.
Methodology
- Cohort: Prospective sampling of ≥200 shared-epitope-positive, ACPA-negative first-degree relatives, with quarterly RNA-seq (whole blood) and autoantibody panels over 5 years
- RMT Analysis: For each time point, construct the sample covariance matrix of gene expression (top 5,000 variable genes), compute eigenvalue spectrum, fit Marchenko-Pastur distribution, and track outlier eigenvalues using Tracy-Widom test statistics
- Eigenvector Loading Analysis: Map outlier eigenvectors to gene ontology terms to identify which biological modules drive spectral transitions
- Transition Detection: Define a spectral alarm as ≥3 consecutive outlier eigenvalues exceeding the Tracy-Widom threshold at α = 0.01, with eigenvector loadings enriched for IFN-I or NF-κB pathways (FDR < 0.05, Benjamini-Hochberg)
- Validation: Cox proportional hazards model with spectral alarm as time-varying covariate, adjusted for age, sex, smoking, HLA-DRB1 alleles
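The RMT Analysis step above can be sketched as follows. This is a minimal illustration on synthetic data, not the study pipeline: the function names, the simulated co-regulated gene module, and the per-gene standardization are all assumptions made for the example. The Tracy-Widom step uses Johnstone-style centering and scaling constants, and the critical value 2.02 is an approximate tabulated 99th percentile of the Tracy-Widom (β = 1) law rather than a computed quantile.

```python
import numpy as np

# Approximate tabulated 99th percentile of Tracy-Widom (beta = 1); an assumption
# standing in for a proper quantile routine.
TW1_CRIT_99 = 2.02

def mp_upper_edge(n_eff, n_genes, sigma2=1.0):
    """Upper bulk edge of the Marchenko-Pastur law with gamma = n_genes / n_eff."""
    gamma = n_genes / n_eff
    return sigma2 * (1.0 + np.sqrt(gamma)) ** 2

def tw_statistic(eigval, n_eff, n_genes):
    """Center and scale an eigenvalue of Z.T @ Z / n_eff (Johnstone-style constants)."""
    a, b = np.sqrt(n_eff - 1.0), np.sqrt(n_genes)
    mu = (a + b) ** 2
    sigma = (a + b) * (1.0 / a + 1.0 / b) ** (1.0 / 3.0)
    return (n_eff * eigval - mu) / sigma

def spectral_outliers(X):
    """X is samples x genes. Returns (eigenvalues above the MP edge, edge, TW statistic)."""
    n, p = X.shape
    n_eff = n - 1  # one degree of freedom is lost to centering
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # The n x n Gram matrix shares its nonzero eigenvalues with the p x p covariance.
    eig = np.linalg.eigvalsh(Z @ Z.T / n_eff)[::-1]  # descending order
    edge = mp_upper_edge(n_eff, p)
    return eig[eig > edge], edge, tw_statistic(eig[0], n_eff, p)

def simulate_expression(n=100, p=1000, module_size=100, strength=1.0, seed=0):
    """Hypothetical data: i.i.d. noise plus one shared factor on the first genes."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    factor = rng.standard_normal(n)
    X[:, :module_size] += strength * factor[:, None]
    return X

X = simulate_expression()
outliers, edge, tw = spectral_outliers(X)
print(f"MP edge: {edge:.2f}, eigenvalues above edge: {len(outliers)}, TW statistic: {tw:.1f}")
print("largest eigenvalue significant at alpha = 0.01:", tw > TW1_CRIT_99)
```

Working with the samples-by-samples Gram matrix keeps the eigendecomposition cheap when genes far outnumber samples, as with 5,000 genes and roughly 200 participants per time point.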
Testable Predictions
- Spectral alarm sensitivity ≥75% and specificity ≥80% for subsequent ACPA seroconversion within 18 months
- Outlier eigenvector loadings will consistently map to type I interferon and NF-κB modules (>60% of variance in top 3 outlier eigenvectors)
- Time from spectral alarm to seroconversion will follow a log-normal distribution with median 9 months (95% CI: 5–14)
- Adding spectral alarm to a model with HLA-DRB1 shared epitope alone will improve C-statistic by ≥0.08
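The first prediction could be checked with a calculation along these lines. It is a sketch only: the function name and the toy flags are hypothetical, and it assumes each participant has already been reduced to two booleans (alarm raised, and seroconversion observed within the 18-month window), ignoring censoring.

```python
import numpy as np

def sensitivity_specificity(alarm, converted):
    """Sensitivity and specificity of a binary alarm against a binary outcome."""
    alarm = np.asarray(alarm, dtype=bool)
    converted = np.asarray(converted, dtype=bool)
    tp = np.sum(alarm & converted)     # alarm raised, seroconverted
    fn = np.sum(~alarm & converted)    # no alarm, seroconverted
    tn = np.sum(~alarm & ~converted)   # no alarm, stayed ACPA-negative
    fp = np.sum(alarm & ~converted)    # alarm raised, stayed ACPA-negative
    return tp / (tp + fn), tn / (tn + fp)

# Toy flags for eight hypothetical participants.
alarm = [1, 1, 1, 0, 0, 0, 0, 1]
converted = [1, 1, 1, 1, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(alarm, converted)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```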
Limitations
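- The Marchenko-Pastur null model assumes i.i.d. matrix entries and holds in the asymptotic limit where gene and sample counts grow together; gene expression correlation structure may violate the i.i.d. assumption, requiring spiked covariance model corrections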
- Quarterly sampling may miss rapid spectral transitions; monthly sampling in a subset could address this
- Batch effects across time points could generate artifactual eigenvalue outliers; rigorous ComBat-seq correction and inclusion of technical covariates are essential
- Sample size of 200 may be underpowered for rare seroconversion events (~10–15% over 5 years, i.e., roughly 20–30 events); multi-center collaboration needed
- Generalizability beyond shared-epitope-positive Caucasian populations requires independent validation
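One form the spiked covariance correction mentioned in the first limitation could take is sketched below. It assumes the simplest spiked model, a few population eigenvalues sitting above unit-variance noise, in which a population spike ℓ above 1 + √γ appears in the sample spectrum at approximately ℓ(1 + γ/(ℓ − 1)), with γ the ratio of genes to samples. The function names are illustrative, and this does not address general correlated noise.

```python
import numpy as np

def bbp_threshold(gamma):
    """Smallest population eigenvalue that separates from the bulk (unit noise variance)."""
    return 1.0 + np.sqrt(gamma)

def sample_from_population(ell, gamma):
    """Asymptotic sample eigenvalue for a population spike ell above the threshold."""
    return ell * (1.0 + gamma / (ell - 1.0))

def population_from_sample(lam, gamma):
    """Invert the map above; only defined for lam beyond the Marchenko-Pastur edge."""
    edge = (1.0 + np.sqrt(gamma)) ** 2
    if lam <= edge:
        raise ValueError("eigenvalue lies inside the Marchenko-Pastur bulk")
    b = lam + 1.0 - gamma
    # Larger root of ell**2 - b * ell + lam = 0 is the one above the threshold.
    return 0.5 * (b + np.sqrt(b * b - 4.0 * lam))

gamma = 5000 / 200  # genes / samples, as in the proposed design
lam = sample_from_population(60.0, gamma)
print(f"threshold: {bbp_threshold(gamma):.1f}")
print(f"sample eigenvalue: {lam:.2f}, recovered spike: {population_from_sample(lam, gamma):.2f}")
```

The inversion shows how strongly sample eigenvalues overstate the underlying signal when genes far outnumber samples, which matters when comparing outlier magnitudes across time points with different sample counts.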
Clinical Significance
Identifying a mathematically rigorous, assumption-lean biomarker for pre-clinical RA could enable targeted preventive interventions (e.g., hydroxychloroquine, abatacept) in individuals at highest risk, potentially preventing joint damage before it begins. The RMT framework is computationally efficient and does not require prior biological hypothesis specification, making it suitable for discovery in high-dimensional settings where traditional biomarker approaches have been underpowered.
RheumaAI Research • rheumai.xyz • DeSci Rheumatology