Mechanism: Smartphone voice recordings analyzed by a 1D-CNN model are hypothesized to detect subtle changes in laryngeal and esophageal tissue biomechanics indicative of early fibrosis. Readout: The approach is expected to reach >80% sensitivity for future esophageal dysmotility, to exceed traditional clinical symptom-based screening by >25% in sensitivity, and to flag involvement a median of 9 months earlier.
Background
Systemic sclerosis (SSc) causes progressive fibrosis affecting multiple organs, including the upper gastrointestinal tract and laryngeal structures. Esophageal involvement occurs in >80% of SSc patients, yet subclinical laryngeal and pharyngeal fibrosis remains underdiagnosed until dysphagia or aspiration events prompt imaging. Current detection relies on barium swallow, esophageal manometry, or direct laryngoscopy — all requiring specialized equipment and referral delays.
Voice acoustics are exquisitely sensitive to changes in soft tissue compliance, mucosal hydration, and neuromuscular function of laryngeal structures. Mel-frequency cepstral coefficients (MFCCs) — widely used in speech recognition — capture spectral envelope features that reflect vocal tract geometry and tissue biomechanical properties.
Hypothesis
Serial smartphone voice recordings analyzed via MFCC trajectory modeling will detect subclinical laryngeal and upper esophageal fibrosis in SSc patients 6–18 months before barium swallow or manometric abnormalities become clinically apparent.
Specifically:
- MFCC drift coefficients (Δ-MFCCs across serial recordings) in sustained vowel phonation (/a/, /i/, /u/) will show progressive spectral flattening correlating with increasing tissue fibrosis (reduced mucosal wave, decreased vocal fold pliability)
- A 1D-CNN trained on MFCC spectrograms from longitudinal voice samples will achieve >80% sensitivity and >75% specificity for predicting future esophageal dysmotility (confirmed by high-resolution manometry)
- Jitter, shimmer, and harmonics-to-noise ratio (HNR) degradation trajectories will correlate with modified Rodnan skin score (mRSS) progression rate (r > 0.5, p < 0.01)
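The perturbation measures and drift trajectories named in the bullets above can be sketched as follows. This is a minimal illustration, assuming that glottal cycle periods and peak amplitudes have already been extracted from a sustained vowel by some pitch tracker (not shown), and that "drift" is summarized as an ordinary least-squares slope across recording days. The function names and the numbers are hypothetical, not the author's pipeline or study data.

```python
from statistics import mean


def relative_perturbation(values):
    """Mean absolute difference between consecutive cycles, divided by the mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def local_jitter(periods_s):
    """Cycle-to-cycle variation of the glottal period (local jitter)."""
    return relative_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (local shimmer)."""
    return relative_perturbation(peak_amplitudes)


def drift_slope(days, feature_values):
    """Least-squares slope of one acoustic feature against recording day."""
    if len(days) != len(feature_values) or len(days) < 2:
        raise ValueError("need two or more paired observations")
    d_bar, v_bar = mean(days), mean(feature_values)
    sxx = sum((d - d_bar) ** 2 for d in days)
    if sxx == 0:
        raise ValueError("all recordings share the same day")
    sxy = sum((d - d_bar) * (v - v_bar) for d, v in zip(days, feature_values))
    return sxy / sxx


# Hypothetical illustrative values, not study data.
periods = [0.0100, 0.0102, 0.0099, 0.0101]      # seconds per glottal cycle
session_days = [0, 14, 28, 42]                  # one recording every two weeks
session_jitter = [0.010, 0.012, 0.015, 0.016]   # jitter measured per session

print("jitter in one recording:", local_jitter(periods))
print("jitter drift per day:", drift_slope(session_days, session_jitter))
```

The same `drift_slope` helper could be applied to each MFCC coefficient averaged per session, which is one plausible reading of "MFCC drift"; the source does not fix a specific estimator.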
Testable Predictions
- Prediction 1: In a prospective cohort of early diffuse cutaneous SSc (disease duration <3 years, n≥100), bi-weekly voice recordings over 24 months will show MFCC drift preceding manometric abnormalities by a median of 9 months (95% CI: 6–14 months)
- Prediction 2: The CNN classifier will outperform clinical symptom-based screening (patient-reported dysphagia questionnaires) by >25% in sensitivity for detecting subclinical esophageal involvement
- Prediction 3: Voice acoustic deterioration will correlate with serum COMP (cartilage oligomeric matrix protein) and anti-topoisomerase I (anti-Scl-70) titer trajectories, suggesting shared fibrotic pathways
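To make Predictions 1 and 2 concrete, the sketch below shows how the two headline quantities might be computed from per-patient outcomes. It assumes the ">25%" margin means an absolute difference in sensitivity (percentage points), which the text does not specify, and that lead time is the gap in months between the first voice-based flag and the first manometric abnormality. All names and numbers are hypothetical.

```python
from statistics import median


def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for paired binary labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def sensitivity_gain(model_pred, questionnaire_pred, actual):
    """Absolute sensitivity difference: model minus questionnaire screening."""
    model_sens, _ = sensitivity_specificity(model_pred, actual)
    quest_sens, _ = sensitivity_specificity(questionnaire_pred, actual)
    return model_sens - quest_sens


def median_lead_time_months(voice_flag_month, manometry_month):
    """Median of (manometry month - voice flag month); None marks 'never flagged'."""
    leads = [m - v for v, m in zip(voice_flag_month, manometry_month)
             if v is not None and m is not None]
    if not leads:
        raise ValueError("no patient has both events")
    return median(leads)


# Hypothetical illustrative labels (1 = esophageal dysmotility on manometry).
actual = [1, 1, 1, 1, 0, 0, 0, 0]
model = [1, 1, 1, 0, 0, 0, 0, 1]
questionnaire = [1, 0, 0, 0, 0, 0, 0, 0]

print(sensitivity_specificity(model, actual))
print(sensitivity_gain(model, questionnaire, actual))
print(median_lead_time_months([3, 6, 12, None], [12, 15, 24, 20]))
```

The confidence interval quoted in Prediction 1 would need a resampling or survival-analysis step that is outside this sketch.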
Study Design
Prospective longitudinal cohort of early diffuse cutaneous SSc (dcSSc) patients, with bi-weekly 30-second standardized voice recordings collected via a smartphone app. Reference standard: annual high-resolution manometry plus barium swallow. MFCCs are extracted with librosa, and the classifier is a 4-layer 1D convolutional network with attention pooling. Results are to be validated against EULAR SSc esophageal involvement criteria.
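The design names attention pooling but does not say which form. The sketch below shows one common variant as an assumption: each time frame is scored by a dot product with a learned query vector, the scores are passed through a softmax, and the frames are averaged with those weights. In a real model the frames would be the output of the convolutional layers and the query would be trained; here both are hypothetical hand-set values.

```python
import math


def attention_pool(frames, query):
    """Collapse T frame vectors (each of length D) into one length-D vector.

    Returns (pooled_vector, attention_weights).
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(query)
    if any(len(f) != dim for f in frames):
        raise ValueError("every frame must match the query length")

    # Dot-product score per frame.
    scores = [sum(x * q for x, q in zip(f, query)) for f in frames]

    # Numerically stable softmax over the time axis.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted average of the frames.
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


# Hypothetical 3-frame, 2-feature input; the query favours the first feature.
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
pooled, weights = attention_pool(frames, query=[2.0, 0.0])
print(weights)
print(pooled)
```

Compared with plain mean pooling, this lets the network weight the frames it finds most informative, which may matter if fibrosis-related changes show up only in part of a sustained vowel.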
Limitations
- Ambient noise and recording quality variability require robust preprocessing and normalization
- Concurrent upper respiratory infections, reflux laryngitis, and medications (e.g., mycophenolate-induced nausea) may confound acoustic signals
- Voice changes from aging and general deconditioning must be controlled via age/sex-matched healthy controls
- Cultural and linguistic variability in phonation patterns requires multi-site, multi-language validation
- Correlation with fibrosis histology would require laryngeal biopsy — ethically challenging; imaging surrogates (ultrasound elastography) may substitute
Clinical Significance
If validated, this approach would provide a low-cost, non-invasive screening tool for one of the most common and morbid manifestations of SSc. Smartphone-based monitoring could enable early intervention (prokinetics, PPI optimization, swallowing therapy) before irreversible fibrotic damage, potentially reducing the risk of aspiration pneumonia, an important contributor to SSc morbidity and mortality. Because recordings are brief and made at home, they allow far denser sampling of disease progression than annual manometry or barium swallow.
LES AI • DeSci Rheumatology