Protein Folding Was the Easy Problem — Protein Design Is the Hard Problem, and It Requires a Different Kind of AI
AlphaFold predicts how a sequence folds. The inverse problem, designing a sequence that folds into a desired structure and performs a desired function, is fundamentally harder. The search space is enormous: there are 20^100 ≈ 10^130 possible 100-amino-acid sequences, and the vast majority do not fold into anything useful.
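The ~10^130 figure follows from counting sequences over the 20 standard amino acids. A minimal sketch of that arithmetic, assuming each of the 100 positions can independently hold any of the 20 residues (the constant and variable names are illustrative only):

```python
import math

AMINO_ACIDS = 20   # standard proteinogenic amino acids
LENGTH = 100       # sequence length discussed in the text

# Exact count of distinct sequences: one of 20 choices at each position.
exact_count = AMINO_ACIDS ** LENGTH

# Order of magnitude via logarithms: log10(20^100) = 100 * log10(20).
log10_count = LENGTH * math.log10(AMINO_ACIDS)

print(f"log10(number of sequences) = {log10_count:.1f}")
print(f"decimal digits in exact count = {len(str(exact_count))}")
```

The exponent comes out just above 130, which is where the ~10^130 estimate comes from.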
RFdiffusion (Watson et al., 2023, Nature) and ProteinMPNN (Dauparas et al., 2022, Science) have made impressive progress. But success rates for de novo functional protein design remain low (<20% for novel folds, <5% for novel enzymatic activities).
Hypothesis: Protein design will require a generative AI approach fundamentally different from structure prediction. Specifically, it will need diffusion models conditioned on function (not just structure) and trained on the relationship between sequence, dynamics, and activity as measured by high-throughput experimental assays. The current structure-first approach will hit a ceiling at a ~30% experimental success rate.
Prediction: A function-conditioned protein generative model trained on large-scale activity data (for example, from machine-learning-guided directed evolution campaigns) will achieve a >50% experimental success rate for de novo enzyme design, surpassing structure-conditioned approaches.
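One way to make the >50% threshold checkable is to ask whether the lower bound of a confidence interval on the observed success rate clears 0.5. The sketch below uses a Wilson score interval. The function names, the 95% level (z = 1.96), and the example counts are assumptions for illustration, not data from any study.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / trials + z**2 / (4 * trials**2)
    ) / denom
    return center - half_width, center + half_width

def exceeds_threshold(successes: int, trials: int, threshold: float = 0.5) -> bool:
    """True only if the interval's lower bound is above the threshold."""
    lower, _ = wilson_interval(successes, trials)
    return lower > threshold

# Hypothetical counts: designs showing activity out of 96 tested.
for hits in (52, 60):
    low, high = wilson_interval(hits, 96)
    print(hits, f"[{low:.3f}, {high:.3f}]", exceeds_threshold(hits, 96))
```

Under these assumptions, a raw rate slightly above 50% on a small batch does not by itself support the prediction. The interval has to clear the threshold, so the number of designs tested matters as much as the headline rate.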