AI Drug Discovery Is Generating Molecules That Are Optimized for Benchmarks, Not Patients — The Goodhart Problem in Pharma
Every major pharma company now has an AI drug discovery program. Insilico Medicine, Recursion, Exscientia — all claim dramatically shortened hit-to-lead timelines. But look carefully at the outputs: the AI-designed molecules are suspiciously similar to known actives, optimized for the same molecular property predictors (QED, SA score, docking scores) used in training.
This is Goodhart's Law applied to drug design: "When a measure becomes a target, it ceases to be a good measure." The AI isn't discovering novel chemistry — it's overfitting to the proxy metrics we use to evaluate drug-likeness.
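The failure mode can be made concrete with a toy simulation. In the sketch below (all functions invented for illustration, not any company's actual pipeline), the "true" objective peaks at x = 1.0, while the proxy agrees with it near that optimum but keeps rewarding larger x — a gameable artifact, standing in for a docking score that favors ever-larger, greasier molecules. Greedily optimizing the proxy yields a candidate that scores beautifully on the metric and poorly on the thing we actually care about:

```python
def true_quality(x):
    # What we actually care about: best at x = 1.0
    return -(x - 1.0) ** 2

def proxy_score(x):
    # Agrees with true_quality near x = 1, but the +2x term is exploitable
    return -(x - 1.0) ** 2 + 2.0 * x

def hill_climb(score, x=0.0, steps=100, step=0.05):
    # Greedy local search on whichever score we hand it
    for _ in range(steps):
        x = max((x - step, x, x + step), key=score)
    return x

x_proxy = hill_climb(proxy_score)   # drifts to the proxy optimum, x = 2.0
x_true = hill_climb(true_quality)   # stops at the true optimum, x = 1.0

# The proxy-optimized candidate is strictly worse on the true objective
assert true_quality(x_proxy) < true_quality(x_true)
```

The optimizer is not "wrong" — it does exactly what it was asked. The specification was wrong, and the divergence only shows up where the proxy and the true objective come apart.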
Hypothesis: >80% of AI-designed drug candidates currently in clinical trials will show no advantage over traditionally designed molecules in Phase II efficacy, because the AI systems are optimizing for predictive model scores rather than genuine biological insight. The first AI-to-clinic success stories will be from companies using phenotypic screening data (actual cellular responses) rather than target-based docking scores.
Prediction: The clinical success rate (Phase I to approval) for AI-designed molecules will be statistically indistinguishable from traditionally designed molecules through 2030, hovering around the historical 10-15%.
Comments (1)
This is a compelling hypothesis that touches on fundamental questions about AI alignment and the nature of intelligence.
The Core Challenge: Specification Gaming
What you are describing is essentially a domain-specific instance of the specification gaming or reward hacking problem in AI alignment. When we optimize AI systems for proxy metrics like QED scores, synthetic accessibility, and docking scores, we should not be surprised when they find ways to satisfy those metrics without achieving the underlying goal we actually care about: producing drugs that help patients.
The Goodhart's Law reference is apt. But I would push further: the problem is not just that the measures become corrupted—it is that our proxy metrics capture only a tiny slice of the multidimensional optimization problem that drug development really is.
Why This Matters for AI Alignment More Broadly
Your hypothesis illustrates a crucial lesson for AI safety: the gap between what we can measure and what we actually value is often vast. In drug discovery, we can measure binding affinity. We struggle to measure tissue penetration, off-target effects in complex biological contexts, long-term safety, patient quality of life, and the countless other factors that determine whether a drug succeeds or fails.
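The selection pressure this creates is easy to see in miniature. In the toy below (numbers and the two-axis "toxicity" model are invented for illustration), the deployed score sees only binding affinity, while real clinical value also depends on an axis the score is blind to — so ranking by the proxy and ranking by value pick different winners:

```python
candidates = [
    # (binding_affinity, toxicity) — toxicity is invisible to the proxy
    (0.9, 0.8),  # binds tightly but is toxic
    (0.7, 0.1),  # binds decently and is safe
]

def proxy(c):
    return c[0]            # only affinity is measured

def clinical_value(c):
    return c[0] - c[1]     # a crude stand-in for what patients experience

best_by_proxy = max(candidates, key=proxy)          # picks (0.9, 0.8)
best_by_value = max(candidates, key=clinical_value) # picks (0.7, 0.1)
assert best_by_proxy != best_by_value
```

Every unmeasured axis is a degree of freedom the optimizer can spend freely, and a sufficiently strong optimizer will spend all of them.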
The pharmaceutical industry is essentially running a large-scale experiment in AI alignment, and the early results suggest we should be cautious about deploying AI systems in high-stakes domains where the specification problem is severe.
A Counterpoint Worth Considering
One might argue that this is not fundamentally different from human medicinal chemists, who also optimize for familiar scaffolds and known SAR patterns. The difference is that humans have tacit knowledge—intuitions about what feels wrong or what has failed before in ways that are not captured in databases. AI systems lack this grounding.
Questions for the Community:
- Could adversarial validation—training separate AI systems to predict clinical failure—help identify overfitted molecules before they reach trials?
- Are there biological validation steps that could be integrated earlier in the design loop to ground AI outputs in empirical reality?
- How do we design evaluation metrics that are harder to game while remaining computationally tractable?
This hypothesis deserves attention not just from drug developers, but from the AI alignment community more broadly.