Dynamic Edge Weighting from Age-Specific Phosphoproteomics Resolves Tissue-Confounded Bias in GNN-Based Aging Drug Target Prediction

Mechanism: A proposed GNN model integrates age-stratified phosphoproteomic data to dynamically weight protein interactome edges, distinguishing stable from age-modulated interactions. Readout: The approach is predicted to raise AUROC for aging drug-target prediction by at least 0.07 over static-topology models, with the gain abolished by age-label shuffling.

Hypothesis

Dynamic edge weighting derived from age-stratified phosphoproteomic signaling networks mitigates tissue-level confounding and improves inductive generalization of GNNs for aging drug‑target prediction.

Rationale

Current GNN models treat the protein interactome as a static graph (1). This ignores age‑dependent rewiring of signaling cascades, which is known to drive tissue‑specific epigenomic and transcriptomic remodeling (3). Consequently, models learn spurious correlations between drug signatures and proteins that are co‑expressed in particular tissues or ages rather than true causal targets. The observed brittleness on independent datasets (AUROC 0.633 on ChEMBL, 0.536 on MUV) and the concentration of error in a handful of “problem proteins” (2) are consistent with such confounding.

We propose that incorporating quantitative, age‑specific edge weights—reflecting the strength of phospho‑signaling interactions measured in young versus old tissues—will force the GNN to distinguish stable, druggable interactions from transient, age‑modulated ones. By doing so, the model should rely less on tissue‑level co‑expression artifacts and more on mechanistic topology that persists across ages.

Testable Prediction

If the hypothesis is correct, a GNN architecture that (i) encodes the static interactome, (ii) overlays age‑stratified phospho‑edge weights as edge features, and (iii) is trained with tissue‑stratified cross‑validation will achieve a statistically significant increase in AUROC (≥0.07 absolute gain) on an independent aging‑drug benchmark (e.g., the druggable aging subnetwork described in (7)) compared to a baseline GNN that uses only static topology. Moreover, shuffling the age labels used to build the edge‑weight matrix should abolish this gain, demonstrating that the improvement depends on genuine age‑specific signaling information.
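
The prediction reduces to three AUROC values per benchmark: static baseline, age-weighted model, and age-shuffled control. The sketch below is one minimal way to score them, assuming binary target labels and real-valued model scores. The function names and the rule that a shuffled gain below the 0.07 threshold counts as "abolished" are illustrative assumptions, not part of the proposal, and the check does not replace the significance test.

```python
def auroc(labels, scores):
    """Rank-based AUROC: probability that a random positive outscores
    a random negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def prediction_outcome(auroc_static, auroc_weighted, auroc_shuffled, min_gain=0.07):
    """Compare the three model variants against the stated 0.07 threshold.

    Assumption: the gain counts as 'abolished' when the shuffled control
    falls short of min_gain over the static baseline.
    """
    gain = auroc_weighted - auroc_static
    shuffled_gain = auroc_shuffled - auroc_static
    return {
        "gain": gain,
        "shuffled_gain": shuffled_gain,
        "meets_threshold": gain >= min_gain,
        "age_specific": gain >= min_gain and shuffled_gain < min_gain,
    }
```

The pairwise loop is quadratic in the number of examples, which is fine for a sanity check; a sort-based implementation would be preferable for large benchmarks.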

Falsifiability

Failure to observe the predicted AUROC improvement, or observing equal or worse performance after adding phospho‑edge weights, would refute the hypothesis. Likewise, if performance gains persist after age‑label shuffling, the improvement would be attributable to non‑specific regularization rather than age‑specific signaling, also falsifying the mechanistic claim.

Experimental Design

Data construction – Retrieve phosphoproteomic datasets from young (e.g., 3‑month) and old (e.g., 24‑month) mouse tissues covering at least the 26 co‑regulated functional modules (5). Compute differential phosphorylation scores and convert them to edge‑weight adjustments (e.g., log‑fold change) for each interaction in the static interactome.
Model variants – (a) Baseline GIN‑GNN with static adjacency (1, 4). (b) Same GIN‑GNN augmented with age‑specific edge‑weight features. (c) Control variant with randomized edge weights, e.g., obtained by shuffling the age labels before the weights are computed.
Training/validation – Use tissue‑stratified k‑fold splits ensuring that no tissue‑age combination appears in both train and test sets. Evaluate AUROC on the held‑out folds and on the independent druggable aging subnetwork (7).
Statistical analysis – Compare AUROC distributions using paired t‑tests; assess significance of improvement and its dependence on age label integrity.
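
The data-construction, control, splitting, and comparison steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than the actual pipeline: the sample layout (dicts with "age" and "phospho" keys), the pseudocount, the global label permutation, and the round-robin fold assignment are all hypothetical choices.

```python
import math
import random
import statistics
from collections import defaultdict


def edge_log2fc(samples, pseudocount=1.0):
    """Per-edge log2(old / young) of mean phospho intensity.

    Each sample is a dict with hypothetical keys "age" ("young" or "old")
    and "phospho" (edge -> intensity). Edges missing in either age group
    are skipped rather than imputed.
    """
    values = defaultdict(lambda: {"young": [], "old": []})
    for sample in samples:
        for edge, intensity in sample["phospho"].items():
            values[edge][sample["age"]].append(intensity)
    weights = {}
    for edge, by_age in values.items():
        if by_age["young"] and by_age["old"]:
            young = statistics.mean(by_age["young"]) + pseudocount
            old = statistics.mean(by_age["old"]) + pseudocount
            weights[edge] = math.log2(old / young)
    return weights


def shuffle_age_labels(samples, seed=0):
    """Control: permute age labels across samples, leaving measurements intact."""
    rng = random.Random(seed)
    ages = [s["age"] for s in samples]
    rng.shuffle(ages)
    return [{**s, "age": age} for s, age in zip(samples, ages)]


def group_folds(groups, k):
    """Assign whole groups (e.g. tissue-age combinations) to folds.

    Returns a list of (train_indices, test_indices) pairs in which no
    group appears on both sides of a split.
    """
    unique = sorted(set(groups))
    if k < 2 or k > len(unique):
        raise ValueError("k must be between 2 and the number of distinct groups")
    fold_of = {g: i % k for i, g in enumerate(unique)}
    folds = []
    for f in range(k):
        test = [i for i, g in enumerate(groups) if fold_of[g] == f]
        train = [i for i, g in enumerate(groups) if fold_of[g] != f]
        folds.append((train, test))
    return folds


def paired_t_statistic(augmented, baseline):
    """t statistic for per-fold AUROC differences (augmented minus baseline)."""
    diffs = [a - b for a, b in zip(augmented, baseline)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
```

Recomputing `edge_log2fc` on the output of `shuffle_age_labels` gives the shuffled-control weights. The t statistic has len(diffs) - 1 degrees of freedom; obtaining a p-value needs the t distribution, for example via `scipy.stats.ttest_rel` if SciPy is available.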

Mechanistic Insight

Phospho‑signaling edges act as molecular “switches” that re‑wire network flow in response to aging‑associated kinase/phosphatase activity. By weighting edges according to their activity state, the GNN learns to prioritize paths that remain constitutively active (or predictably altered) across the lifespan, thereby filtering out tissue‑specific co‑expression noise that drives the problem‑protein error burden (2). This aligns with causal discovery approaches that outperform pure association models when temporal or interventional data are available (6), extending their principle to static interaction maps enriched with dynamic, omics‑derived edge attributes.
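
To make the filtering intuition concrete, the sketch below runs one round of edge-weighted neighbor aggregation in plain Python. It is a deliberately simplified stand-in, not the GIN architecture named in the experimental design, and the mapping from log fold change to a "stability" weight, exp(-|log2FC|), is a hypothetical choice introduced only for illustration.

```python
import math


def stability_weight(log2fc):
    """Hypothetical mapping: edges unchanged with age get weight 1,
    strongly age-modulated edges are down-weighted toward 0."""
    return math.exp(-abs(log2fc))


def weighted_aggregate(features, edges, weights):
    """One round of weighted-mean neighbor aggregation on an undirected graph.

    features: node -> list of floats (all the same length)
    edges:    list of (u, v) pairs
    weights:  (u, v) -> non-negative weight; missing edges default to 1.0
    Each node's output is its own feature plus the weighted mean of its
    neighbors' features; nodes with zero total weight keep their feature.
    """
    dim = len(next(iter(features.values())))
    agg = {n: [0.0] * dim for n in features}
    total = {n: 0.0 for n in features}
    for u, v in edges:
        w = weights.get((u, v), 1.0)
        for src, dst in ((u, v), (v, u)):
            total[dst] += w
            for k in range(dim):
                agg[dst][k] += w * features[src][k]
    out = {}
    for n in features:
        if total[n] > 0:
            out[n] = [features[n][k] + agg[n][k] / total[n] for k in range(dim)]
        else:
            out[n] = list(features[n])
    return out
```

With these weights, a node's update is dominated by neighbors reached through age-stable edges, which is the behavior the paragraph above describes. A learned model would instead receive the weights as edge features and decide how to use them.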

Impact

Confirming this hypothesis would provide a concrete methodological framework to mitigate confounding in aging‑focused drug‑target prediction, improve reproducibility across datasets, and guide the selection of targets whose druggability is robust to tissue‑ and age‑specific network remodeling.
