Mechanism: This infographic illustrates a computational pipeline for discovering novel bioactive dipeptides, moving from enumeration and prediction to targeted docking and experimental validation. Readout: The process identifies dipeptides with predicted antioxidant, Nrf2 activation, and anti-inflammatory properties, feeding experimental results back into the machine learning model for refinement.
Motivation
The original ideas21 sketch proposed using machine learning to explore peptide space for longevity properties by mining patterns across species.
A natural extension is systematic exploration of the dipeptide chemical space (400 base compounds with 20 standard amino acids). Many bioactive dipeptides are already known (antioxidant, ACE-inhibitory, anti-inflammatory, opioid, etc.) from the BIOPEP database. Expanding to non-standard amino acids, methylation, acetylation, and γ-linkages would create a rich, tractable library.
Combinatorial Scope
Base space:
- 20 standard amino acids → 20×20 = 400 dipeptides (order matters: Ala-Gly ≠ Gly-Ala)
Expanded space (realistic for screening):
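- Mirror-image (D-) amino acids (enantiomers of the standard L-amino acids; glycine is achiral and has no D-form). These often resist proteases better, which can extend half-life. Including D-variants roughly doubles the number of building blocks (20 → 39), so the dipeptide count can grow to as many as 39×39 = 1,521.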
- 8 non-standard amino acids and analogs (selenocysteine, pyrrolysine, ornithine, taurine, GABA, β-alanine, hydroxyproline, sarcosine) → ~28 AAs → 28×28 = 784 dipeptides (or ~1,500+ when including D-forms and mixed L/D sequences)
- Methylation variants (N-methyl on N-terminus or side chains) on key residues → selective addition of ~200–300 more
- Acetylation variants (common PTM)
- γ-linkages (as in γ-Glu-Cys, the GSH precursor) — adds stability and distinct pharmacology
Total searchable space: ~1,000–2,000+ compounds (still computationally trivial).
Terminology note: the mirror-image forms above are "enantiomers" (non-superimposable mirror images). "Isomorphs" usually refers to crystals of similar shape but different composition.
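The counts in this section can be checked with a few lines of arithmetic. The sketch below assumes that each of the two positions can independently take any building block, which makes the figures upper bounds: not every analog can sit at both positions (taurine, for example, has no carboxyl group). The constant and function names are illustrative only.

```python
# Building-block counts taken from the bullets above; the rest is arithmetic.
STANDARD = 20          # standard L-amino acids
NON_STANDARD = 8       # the eight analogs listed above
ACHIRAL_STANDARD = 1   # glycine has no distinct D-form


def ordered_dipeptides(n_building_blocks: int) -> int:
    """Count ordered pairs, because Ala-Gly and Gly-Ala are different compounds."""
    return n_building_blocks ** 2


base = ordered_dipeptides(STANDARD)
with_analogs = ordered_dipeptides(STANDARD + NON_STANDARD)
# Assumption: every chiral standard residue contributes one extra D-form.
with_d_forms = ordered_dipeptides(2 * STANDARD - ACHIRAL_STANDARD)

print(base, with_analogs, with_d_forms)
```

Methylation, acetylation, and γ-linkage variants are left out here because the text adds them selectively rather than combinatorially.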
Proposed Computational Pipeline
Enumeration
- Script to generate all combinations in SMILES or InChI format
- Filter for drug-like properties (molecular weight < 300 Da for dipeptides, reasonable logP)
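As a minimal sketch of the enumeration and molecular-weight filter, the snippet below works on one-letter sequences with standard-library Python only. It assumes average free-amino-acid masses and subtracts one water per peptide bond; a real pipeline would more likely build structures and SMILES with RDKit, which is not shown here. The 300 Da cutoff is the one named above; all function names are illustrative.

```python
from itertools import product

# Average molecular weights (Da) of the free amino acids, by one-letter code.
AA_MW = {
    "G": 75.07, "A": 89.09, "S": 105.09, "P": 115.13, "V": 117.15,
    "T": 119.12, "C": 121.16, "L": 131.17, "I": 131.17, "N": 132.12,
    "D": 133.10, "Q": 146.15, "K": 146.19, "E": 147.13, "M": 149.21,
    "H": 155.16, "F": 165.19, "R": 174.20, "Y": 181.19, "W": 204.23,
}
WATER_MW = 18.015   # one water is lost when the peptide bond forms
MW_CUTOFF = 300.0   # Da, the cutoff named in the text


def dipeptide_mw(seq: str) -> float:
    """Approximate molecular weight of a two-residue sequence such as 'AG'."""
    n_term, c_term = seq
    return AA_MW[n_term] + AA_MW[c_term] - WATER_MW


def enumerate_dipeptides():
    """All ordered pairs of the 20 standard residues (order matters)."""
    return ["".join(pair) for pair in product(sorted(AA_MW), repeat=2)]


all_dipeptides = enumerate_dipeptides()
passing = [s for s in all_dipeptides if dipeptide_mw(s) < MW_CUTOFF]
print(len(all_dipeptides), "enumerated;", len(passing), "under", MW_CUTOFF, "Da")
```

One thing this makes visible: a strict 300 Da cutoff drops the heaviest pairs (Trp-Trp comes out near 390 Da), so it is worth deciding whether that exclusion is intended before screening.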
Bioactivity Prediction
- QSAR/QSPR models trained on BIOPEP and other bioactive peptide databases
- Predict antioxidant capacity, Nrf2 activation, anti-inflammatory, ACE-inhibition, etc.
- Consider graph neural networks on molecular graphs, which may outperform traditional descriptor-based models when enough labeled data is available
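Before any QSAR model or graph network is trained, each dipeptide needs a numeric representation. The sketch below shows one simple baseline, a position-aware one-hot encoding, as a stand-in for the richer descriptors or molecular graphs mentioned above. It is an illustrative assumption, not the featurization used by BIOPEP-trained models, and it covers only the 20 standard residues.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_dipeptide(seq: str) -> List[int]:
    """Encode a dipeptide as a 40-element vector: 20 slots per position.

    Keeping the two positions in separate blocks preserves order, so
    'AG' and 'GA' get different vectors.
    """
    if len(seq) != 2 or any(aa not in INDEX for aa in seq):
        raise ValueError(f"expected two standard one-letter codes, got {seq!r}")
    vec = [0] * (2 * len(AMINO_ACIDS))
    for position, aa in enumerate(seq):
        vec[position * len(AMINO_ACIDS) + INDEX[aa]] = 1
    return vec


features = one_hot_dipeptide("AG")
print(len(features), sum(features))
```

A vector like this can feed any standard regressor or classifier; swapping in physicochemical descriptors or a graph representation changes only this function.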
Targeted Docking
- Dock high-scoring candidates to relevant targets:
  - Keap1 (for Nrf2 activation)
  - GPx or glutathione reductase
  - Inflammatory targets (COX-2, NF-κB)
  - Aging-related proteins (sirtuins, mTOR, AMPK interfaces)
- Use AutoDock Vina or DiffDock. Both are general small-molecule docking tools rather than peptide-specific ones, which suits ligands as small as dipeptides.
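The docking step amounts to one run per candidate and target. The sketch below only builds AutoDock Vina command lines and does not execute them. The file names, target key, and search-box values are hypothetical placeholders; real runs need prepared PDBQT files and a box centered on the actual binding site.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float]


def vina_command(receptor: str, ligand: str, center: Box, size: Box,
                 out: str, exhaustiveness: int = 8) -> List[str]:
    """Build one AutoDock Vina command line as an argument list."""
    cx, cy, cz = center
    sx, sy, sz = size
    return [
        "vina",
        "--receptor", receptor,
        "--ligand", ligand,
        "--center_x", str(cx), "--center_y", str(cy), "--center_z", str(cz),
        "--size_x", str(sx), "--size_y", str(sy), "--size_z", str(sz),
        "--exhaustiveness", str(exhaustiveness),
        "--out", out,
    ]


# Hypothetical placeholders: file name and box are NOT real Keap1 coordinates.
TARGETS: Dict[str, dict] = {
    "keap1": {"receptor": "keap1.pdbqt",
              "center": (0.0, 0.0, 0.0),
              "size": (20.0, 20.0, 20.0)},
}


def docking_jobs(candidates: List[str], targets: Dict[str, dict]) -> List[List[str]]:
    """One command per (target, candidate) pair."""
    jobs = []
    for name, target in targets.items():
        for cand in candidates:
            jobs.append(vina_command(
                target["receptor"], f"{cand}.pdbqt",
                target["center"], target["size"],
                f"{cand}_{name}_out.pdbqt"))
    return jobs


jobs = docking_jobs(["CG", "EC"], TARGETS)
print(" ".join(jobs[0]))
```

Each argument list could be passed to `subprocess.run` once Vina is installed and the input files exist.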
Prioritization & Synthesis Recommendation
- Score by predicted activity + synthetic feasibility + predicted ADME/Tox (solubility, stability, renal clearance)
- Prioritize cysteine- or γ-Glu-containing sequences for glutathione support
- Prioritize stable forms (γ-linkage, D-amino acids, N-methylated)
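One way to combine these criteria is a weighted score with small bonuses for the prioritized features. The weights, the bonus size, and the 0 to 1 scaling of each input are assumptions made for illustration; the text does not specify them. The candidate names and values are placeholders, not predictions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    activity: float      # predicted activity, scaled to 0-1
    feasibility: float   # synthetic feasibility, scaled to 0-1
    adme: float          # ADME/Tox favorability, scaled to 0-1
    has_cys_or_gamma_glu: bool = False   # glutathione-support feature
    stabilized: bool = False             # gamma-linkage, D-residue, or N-methyl


# Hypothetical weights and bonus; tune these against real assay data.
WEIGHTS = {"activity": 0.5, "feasibility": 0.2, "adme": 0.3}
BONUS = 0.05


def priority(c: Candidate) -> float:
    """Weighted sum of the three predicted scores plus feature bonuses."""
    score = (WEIGHTS["activity"] * c.activity
             + WEIGHTS["feasibility"] * c.feasibility
             + WEIGHTS["adme"] * c.adme)
    score += BONUS * (int(c.has_cys_or_gamma_glu) + int(c.stabilized))
    return score


def rank(candidates):
    """Highest-priority candidates first."""
    return sorted(candidates, key=priority, reverse=True)


# Placeholder inputs only.
pool = [
    Candidate("candidate_A", 0.5, 0.5, 0.5),
    Candidate("candidate_B", 0.5, 0.5, 0.5, has_cys_or_gamma_glu=True, stabilized=True),
]
for c in rank(pool):
    print(c.name, round(priority(c), 3))
```

Keeping the bonuses small means the feature preferences break ties rather than override a clearly better predicted profile.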
Experimental Loop
- Synthesize or purchase top 10–20 candidates
- Test in cell models for GSH induction, antioxidant activity, anti-senescence effects
- Feed results back into the ML model (active learning)
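The feedback step can be sketched as a batch-selection rule. The example below uses an upper-confidence-bound criterion (predicted mean plus a multiple of the model's uncertainty), which is one common active-learning choice and an assumption here, since the text does not name an acquisition strategy. The names and numbers are placeholders.

```python
from typing import Dict, List, Set, Tuple


def select_batch(predictions: Dict[str, Tuple[float, float]],
                 tested: Set[str],
                 batch_size: int = 10,
                 kappa: float = 1.0) -> List[str]:
    """Pick the next compounds to synthesize and assay.

    predictions maps a compound name to (predicted mean, predicted std).
    Compounds already tested are skipped. A larger kappa favors exploring
    uncertain compounds; kappa = 0 simply takes the top predicted means.
    """
    untested = {n: ms for n, ms in predictions.items() if n not in tested}
    ranked = sorted(untested,
                    key=lambda n: untested[n][0] + kappa * untested[n][1],
                    reverse=True)
    return ranked[:batch_size]


# Placeholder predictions, not real model output.
preds = {"pep_1": (0.9, 0.0), "pep_2": (0.5, 0.5), "pep_3": (0.2, 0.1)}
print(select_batch(preds, tested=set(), batch_size=2))
```

After each assay round, the measured compounds join `tested`, the model is retrained on the enlarged dataset, and the selection is repeated.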
Tools & Resources
- Databases: BIOPEP-UWM, PepBank, APD2 (antimicrobial), CPPsite (cell-penetrating peptides)
- Libraries: RDKit for enumeration and descriptors, DeepChem or PyTorch Geometric for ML models
- Docking: AutoDock Vina, DiffDock, or Rosetta for peptides
- Peptide-specific ML: PepNet, DeepPep, or fine-tuned ProtT5/ESM-2 models
- Hardware: Consumer GPU sufficient for initial screen (400–1500 compounds is small)
Feasibility & Cost Estimate
- Computational phase: 1–2 weeks on a decent laptop/GPU instance
- Synthesis of top 20 candidates: ~$2,000–5,000 (custom synthesis labs)
- In vitro testing: cell-based antioxidant/GSH assays (~$3,000–8,000)
Total initial proof-of-concept budget: the synthesis and testing items above sum to roughly $5,000–13,000; ~$10,000 or less is plausible when leveraging open-source tools and academic collaborations.
This project would be a natural fit for Beach.science: it is tractable, crosses computational biology, chemistry, and aging research, and could discover novel dipeptides that are safe for patients with complex geriatric profiles.