Mechanism: Traditional GNN models for drug-target prediction can misinterpret high-degree 'hub proteins' in static PPI networks as having high drug-binding likelihood due to topology bias. Readout: Implementing degree-matched negative controls and age-specific edge weights significantly reduces inflated hub-target AUC scores, especially in inductive splits, while non-hub target performance remains stable.
Overview
Recent GNN models for drug‑target prediction achieve high scores on benchmark datasets, yet they rarely examine whether performance stems from true binding signals or from network topology bias [1][2]. Hub proteins such as mTOR and SIRT1 are especially prone to false positives because their high degree can be mistaken for drug‑binding propensity.
Mechanistic Rationale
Aging reshapes the protein‑protein interactome through post‑translational modifications, altered expression, and rewiring of signaling pathways. These changes are edge‑specific: some interactions are lost, others gained, while the overall degree distribution may stay similar. A GNN trained on a static, young‑adult PPI network therefore learns two things simultaneously: (1) the static topological signature of hubs and (2) any age‑dependent edge patterns that correlate with drug response. If the model’s attention mechanism conflates a node’s degree with its binding likelihood, predictions for hubs will remain high even when the underlying biology changes.
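One way to make the degree-versus-binding conflation concrete is a degree-only baseline: score every protein by its degree alone and ask how well that ranks known targets. The sketch below is a minimal illustration on a toy graph; the node names, labels, and the helper names `node_degrees` and `roc_auc` are hypothetical and not taken from any published pipeline. A degree-only AUC well above 0.5 would suggest that topology alone carries much of the apparent signal.

```python
from collections import Counter


def node_degrees(edges):
    """Count how many edges touch each node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def roc_auc(scores, labels):
    """ROC-AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy graph with made-up labels: HUB touches every other node and is a target.
edges = [("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("HUB", "D"), ("A", "B")]
is_target = {"HUB": 1, "A": 1, "B": 0, "C": 0, "D": 0}

deg = node_degrees(edges)
nodes = sorted(is_target)
degree_only_auc = roc_auc(
    [deg[n] for n in nodes],
    [is_target[n] for n in nodes],
)
print(f"degree-only ROC-AUC: {degree_only_auc:.3f}")
```

If a trained GNN does not clearly beat this baseline on hub targets, its hub predictions are hard to distinguish from a degree lookup.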
Testable Predictions
- When degree‑matched random graphs are used as negative controls, the performance gap between hub and non‑hub targets will shrink significantly.
- Incorporating age‑specific edge weight adjustments (e.g., down‑weighting edges known to lose phosphorylation‑dependent interactions in old tissue) will further reduce GNN scores for known aging hubs without affecting scores for validated non‑hub aging drugs.
- Models trained on static young‑adult networks will show inflated performance in transductive splits that share network neighborhoods, but performance will drop in inductive splits where the test set contains proteins whose degrees fall outside the range seen during training.
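The degree-matched random graphs in the first prediction can be generated by double-edge swaps, which shuffle who is connected to whom while leaving every node's degree untouched. The following is a minimal standard-library sketch, assuming an undirected simple graph given as an edge list; the function names and the toy graph are illustrative assumptions, not part of the proposed study.

```python
import random
from collections import Counter


def degree_sequence(edges):
    """Map each node to its degree in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_preserving_rewire(edges, n_swaps, seed=0, max_tries=None):
    """Rewire an undirected simple graph by double-edge swaps.

    Each accepted swap replaces edges (a, b) and (c, d) with (a, d) and
    (c, b), so every node keeps its degree. Swaps that would create a
    self-loop or a duplicate edge are rejected. Assumes the input has no
    self-loops or duplicate edges.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    if max_tries is None:
        max_tries = 100 * n_swaps
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # randomly flip the second edge's orientation
            c, d = d, c
        if len({a, b, c, d}) < 4:  # shared endpoint would give a self-loop
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:  # would duplicate an edge
            continue
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


# Toy graph: an 8-node ring plus two chords.
original = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4), (2, 6)]
rewired = degree_preserving_rewire(original, n_swaps=10, seed=1)
print(degree_sequence(original) == degree_sequence(rewired))
```

Because the degree sequence is identical before and after, any hub-target score that survives rewiring cannot depend on the specific interaction partners.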
Experimental Design
- Data: Use a curated drug‑target set (e.g., Davis et al.) complemented with aging‑relevant targets from GenAge and DrugAge. Build three PPI backgrounds: (i) static young‑adult human interactome, (ii) same interactome with degree‑matched random edge rewiring, (iii) age‑specific interactomes derived from young vs old tissue‑specific phosphoproteomics data.
- Models: Train identical GNN architectures (e.g., GNPDTA, GPS‑DTI) under three negative‑sampling strategies: standard random negatives, degree‑matched negatives, and degree‑matched + age‑edge‑weight negatives.
- Evaluation: Compute ROC‑AUC and PR‑AUC separately for hub proteins (top 10 % degree) and non‑hub proteins. Perform both transductive (random node split) and inductive (degree‑stratified split) cross‑validation.
- Analysis: Compare performance differences across conditions using paired statistical tests (Wilcoxon signed‑rank).
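The hub definition and the degree-matched negative-sampling strategy from the design above could be sketched as follows. This is a hedged illustration: the log2 degree binning, the function names, and the toy drug and protein identifiers are all assumptions made for the example, and unlabelled drug-protein pairs are treated as negatives, which is itself an assumption common to this kind of sampling.

```python
import random


def hub_set(degree, top_fraction=0.10):
    """Return the nodes in the top `top_fraction` by degree (at least one)."""
    ranked = sorted(degree, key=lambda n: (-degree[n], str(n)))
    k = max(1, int(round(top_fraction * len(ranked))))
    return set(ranked[:k])


def degree_bin(d):
    """Coarse log2 bin: degrees 2-3 share a bin, 4-7 share a bin, and so on."""
    return int(d).bit_length()


def degree_matched_negatives(positives, degree, seed=0):
    """For each positive (drug, protein) pair, draw a negative pair with the
    same drug and a protein from the same degree bin.

    Pairs already listed as positives are never drawn. Positives with no
    eligible match are skipped, and the same negative may be drawn twice.
    """
    rng = random.Random(seed)
    known = set(positives)
    by_bin = {}
    for protein in sorted(degree):
        by_bin.setdefault(degree_bin(degree[protein]), []).append(protein)
    negatives = []
    for drug, protein in positives:
        candidates = [
            p for p in by_bin[degree_bin(degree[protein])]
            if (drug, p) not in known
        ]
        if candidates:
            negatives.append((drug, rng.choice(candidates)))
    return negatives


# Toy inputs (hypothetical identifiers and degrees).
degree = {"P1": 40, "P2": 35, "P3": 33, "P4": 3, "P5": 2,
          "P6": 2, "P7": 5, "P8": 6, "P9": 1, "P10": 4}
positives = [("drugA", "P1"), ("drugB", "P4")]

hubs = hub_set(degree)  # top 10 % of 10 proteins -> 1 hub
negatives = degree_matched_negatives(positives, degree, seed=0)
print(hubs, negatives)
```

Matching within a degree bin means a classifier can no longer separate positives from negatives by degree alone, which is the point of the control.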
Expected Outcomes and Falsifiability
- If the hypothesis is correct, we expect: (a) a statistically significant drop in hub‑target AUC when using degree‑matched negatives (p < 0.01); (b) an additional decrease when age‑specific edge weights are applied; (c) hub‑target performance in inductive splits to fall to near‑random levels, while non‑hub targets retain predictive power.
- If none of these drops is observed, i.e., hub performance remains high regardless of negative‑sampling strategy or edge‑weight adjustments, the claim that topology confounding drives the observed GNN signal for aging drug targets would be falsified.
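The paired comparison behind outcome (a) can be sketched as an exact one-sided Wilcoxon signed-rank test on per-fold hub AUCs. The version below is a standard-library illustration that enumerates all sign assignments, so it is only practical for small numbers of folds; the function name is hypothetical and the AUC values are made-up placeholders, not results.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact one-sided Wilcoxon signed-rank test for paired samples.

    Tests whether x tends to be larger than y. Zero differences are dropped
    and tied absolute differences receive average ranks. Enumeration costs
    2**n, so this is meant for small n such as per-fold AUCs.
    Returns (W_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under the null, each rank is positive or negative with probability 1/2.
    at_least = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus
    )
    return w_plus, at_least / 2 ** n


# Placeholder per-fold hub AUCs (illustrative numbers only, not data).
auc_random_neg = [0.91, 0.89, 0.93, 0.90, 0.92, 0.88, 0.94, 0.90]
auc_matched_neg = [0.78, 0.80, 0.75, 0.79, 0.77, 0.81, 0.74, 0.76]

w_plus, p_value = wilcoxon_signed_rank_exact(auc_random_neg, auc_matched_neg)
print(w_plus, p_value)
```

One design consequence: with n non-zero pairs the smallest achievable one-sided exact p-value is 1/2^n, so the p < 0.01 threshold needs at least 7 folds or repeats (8 for a two-sided test). A 5-fold design cannot reach it, however large the drop.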
Implications
Demonstrating topology confounding would motivate the adoption of topology‑aware controls as a standard benchmark step and encourage the development of dynamic, age‑stratified interactome models for drug discovery.