SenGNN: Senescence‑Aware Graph Neural Networks with Site‑Stratified CV and Mendelian Randomization to Reveal Confounder‑Free Aging Drug Targets


Mechanism: A senescence-aware graph neural network (GNN) integrates senescence markers into protein node features to predict drug-target affinity (DTA), accounting for age-related changes in protein interactions. Readout: This approach yields a higher DTA score (0.92 vs.

### Hypothesis
Integrating explicit senescence-associated node features (p16, p21, SA-β-gal) into graph neural networks for drug‑target affinity prediction, combined with site‑stratified cross‑validation and Mendelian randomization‑based causal validation, will significantly improve the identification of bona‑fide aging drug targets while reducing false positives driven by batch effects and pre‑training leakage.

### Mechanistic Insight
Current GNN‑DTA models learn molecular and protein topology but ignore cellular state variables that modulate target druggability in aged tissues. Senescence alters protein‑protein interaction networks, exposing cryptic epitopes and shifting subcellular localization. By encoding senescence markers as continuous node attributes on protein nodes, the GNN can learn context‑dependent edge weights that reflect the aged interactome. Site‑stratified CV ensures that batch, age, and tissue source do not leak across folds, exposing overfitting to dataset‑specific artifacts. Mendelian randomization uses genetic variants influencing senescence marker levels as instrumental variables to test whether predicted drug‑target associations have a causal link to aging phenotypes, separating statistical noise from true mechanistic signal.

### Testable Predictions
1. Models trained with senescence features will achieve a higher concordance index (CI) on an independent aging‑specific DTA benchmark (e.g., DrugAge) compared to baseline GNPDTA and GPS‑DTI (+0.025 CI).
2. Site‑stratified CV will drop the CI of baseline models by ≥0.015 when batch or age correlates with target labels, indicating leakage‑driven inflation.
3. Mendelian randomization will show a significant causal estimate (p<0.01) for ≥30% of top‑ranked predictions from the senescence‑aware GNN, versus <10% for baseline models.
4. Adversarial batch‑effect removal will further increase CI only when senescence features are present, demonstrating synergy.

### Experimental Design
- Data: Curate a labeled DTA set from DrugBank, ChEMBL, and aging‑specific screens (DrugAge, Geroprotectors). Annotate each protein target with quantitative senescence scores (p16INK4a, p21CIP1, SA-β-gal activity proxies) derived from publicly available transcriptomic datasets (e.g., GTEx, Human Ageing Genomic Resources).
- Model: Extend the GNPDTA architecture with a node‑wise feature concatenation layer that injects senescence scores into protein embeddings before graph convolution. Keep the molecular graph encoder unchanged.
- Validation: Perform three CV schemes: (i) random split (standard), (ii) site‑stratified split by batch/tissue source, (iii) leave‑one‑study‑out. Report CI, AUROC, and calibration plots.
- Causal Check: For each predicted target, obtain germline SNPs associated with senescence marker levels from the GWAS Catalog. Apply two‑sample Mendelian randomization (inverse variance weighted) to test the effect of marker modulation on aging‑related phenotypes (frailty index, lifespan).
- Controls: Shuffle senescence scores across nodes to create a null model; repeat adversarial batch correction to isolate confounder impact.

### Expected Outcomes & Falsifiability
If the hypothesis is correct, the senescence‑aware GNN will outperform baselines only under stratified and causal validation, and the performance gain will disappear when senescence scores are randomized. Failure to observe these patterns would falsify the claim that explicit senescence encoding and causal validation improve target identification for aging.
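
To make the validation and causal-check steps of the experimental design more concrete, here is a minimal pure-Python sketch of three pieces: a split that keeps each site (batch/tissue source) entirely inside one fold, the concordance index used as the headline metric, and a fixed-effect inverse-variance-weighted (IVW) Mendelian randomization estimate. The function names `site_grouped_folds`, `concordance_index`, and `ivw_estimate` are hypothetical, the greedy fold assignment is an assumption rather than anything specified in the proposal, and the IVW formula uses first-order weights with a normal approximation for the p-value. It illustrates the techniques and is not the pipeline of any existing DTA model.

```python
import math
from collections import defaultdict


def site_grouped_folds(site_labels, n_folds):
    """Assign whole sites to folds so no site is in both train and test.

    Assumption: greedy balancing (largest site goes to the smallest fold).
    Returns a list of (train_indices, test_indices) pairs.
    """
    by_site = defaultdict(list)
    for idx, site in enumerate(site_labels):
        by_site[site].append(idx)
    folds = [[] for _ in range(n_folds)]
    for site in sorted(by_site, key=lambda s: (-len(by_site[s]), str(s))):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_site[site])
    all_idx = set(range(len(site_labels)))
    return [(sorted(all_idx - set(test)), sorted(test)) for test in folds]


def concordance_index(y_true, y_pred):
    """Fraction of pairs with different true affinity that are ranked
    in the right order; tied predictions count as 0.5."""
    concordant, comparable = 0.0, 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal labels are not comparable
            comparable += 1
            d_true = y_true[i] - y_true[j]
            d_pred = y_pred[i] - y_pred[j]
            if d_pred == 0:
                concordant += 0.5
            elif (d_true > 0) == (d_pred > 0):
                concordant += 1.0
    return concordant / comparable if comparable else float("nan")


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-SNP summary statistics.

    beta_exposure: SNP effects on the senescence marker (exposure)
    beta_outcome:  SNP effects on the aging phenotype (outcome)
    se_outcome:    standard errors of the outcome effects
    Returns (estimate, standard_error, two_sided_p_value).
    """
    rows = list(zip(beta_exposure, beta_outcome, se_outcome))
    numer = sum(bx * by / (se * se) for bx, by, se in rows)
    denom = sum(bx * bx / (se * se) for bx, _, se in rows)
    estimate = numer / denom
    std_err = math.sqrt(1.0 / denom)
    z = estimate / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # normal approximation
    return estimate, std_err, p_value
```

In use, one would call `site_grouped_folds` with one site label per drug-target pair, compute `concordance_index` on each held-out fold, and compare against a random split to quantify leakage-driven inflation (prediction 2). `ivw_estimate` would be applied per predicted target to harmonized GWAS summary statistics, with the resulting p-value compared against the 0.01 threshold in prediction 3; pleiotropy-robust estimators and sensitivity analyses are outside the scope of this sketch.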
