Prospective docking of large libraries against unrefined AlphaFold2 (AF2) models of the σ2 and 5‑HT2A receptors yielded hit rates and affinities similar to those from experimental structures, and a cryo‑EM structure of a potent 5‑HT2A ligand showed residue accommodations resembling the AF2 prediction. These results suggest that AF2 models can sample alternative low‑energy conformations relevant for ligand discovery, extending the reach of structure‑based drug design.
AlphaFold2 (AF2) models have had wide impact but mixed success in retrospective ligand recognition. We prospectively docked large libraries against unrefined AF2 models of the σ2 and serotonin 2A (5-HT2A) receptors, testing hundreds of new molecules and comparing results with those obtained from docking against the experimental structures. Hit rates were high and similar for the experimental and AF2 structures, as were affinities. Success in docking against the AF2 models was achieved despite differences between orthosteric residue conformations in the AF2 models and the experimental structures. Determination of the cryo–electron microscopy structure for one of the more potent 5-HT2A ligands from the AF2 docking revealed residue accommodations that resembled the AF2 prediction. AF2 models may sample conformations that differ from experimental structures but remain low energy and relevant for ligand discovery, extending the domain of structure-based drug design.
Structure-based docking is widely used in early-stage ligand discovery, typically relying on experimental protein structures from X-ray crystallography or cryo-electron microscopy (cryo-EM). Where experimental structures are unavailable, homology models have been used, but performance can suffer when sequence identity to templates is low. Deep-learning methods such as AlphaFold2 (AF2) have produced highly accurate models at proteome scale, raising the possibility that AF2 structures could be useful for drug discovery. However, retrospective studies have reported mixed results for AF2 in ligand recognition: although AF2 models are globally accurate, small differences in binding-site side chains can impair retrospective enrichment of known ligands versus decoys. The authors set out to test prospectively whether unrefined AF2 models can support large-library docking and lead discovery on par with experimental structures. They focused on two pharmaceutically relevant membrane proteins, the σ2 receptor and the serotonin 2A (5-HT2A) G protein–coupled receptor. They docked ultralarge libraries against both AF2 models and experimental structures, synthesised hundreds of top-ranked molecules from each campaign, and compared hit rates, affinities and functional profiles. The study also included retrospective docking analyses and cryo-EM determination of one AF2-derived ligand bound to 5-HT2A to evaluate how AF2-predicted conformations relate to experimentally observed ligand–receptor complexes.
Overall strategy: The researchers carried out parallel large-scale, prospective docking campaigns against unrefined AF2 models and against experimental structures for two targets (σ2 and 5-HT2A). Docking used DOCK3.8, with pregenerated energy grids for van der Waals interactions, Poisson–Boltzmann electrostatics and ligand desolvation. Libraries were ultralarge: 490 million cationic molecules from ZINC15 for σ2, and more than 1.6 billion molecules from ZINC20/ZINC22 for 5-HT2A. Top-ranked molecules were filtered, clustered and prioritised for synthesis under consistent criteria for both the AF2 and experimental-structure arms.
AF2 model preparation and docking setup: AF2 models were obtained from the AlphaFold Protein Structure Database (specific model IDs stated for each target). Structures were protonated at pH 7.0 and assigned AMBER united-atom charges; for σ2, one glutamate (E73) was modelled as neutral. The σ2 receptor was embedded in a lipid bilayer and subjected to a 50-ns coarse-grained molecular dynamics (MD) run only to define a low-dielectric lipid environment for electrostatic calculations; the AF2 coordinates themselves were not refined. Matching spheres for docking were derived from the best docked poses of known binders (PB28 for σ2, lisuride for 5-HT2A). Docking evaluation used ligand–decoy enrichment (logAUC) and tests against “extrema” and “goldilocks” sets to check for biases in physicochemical properties.
Hit selection and synthesis: After docking, the top fractions of the ranked libraries (e.g., top 0.06% for σ2, top 0.2% for 5-HT2A) were filtered for novelty using ECFP4 Tanimoto cut-offs against known ligands and clustered to ensure topological diversity. Additional filters were then applied (torsional strain, formation of expected interactions, exclusion of uncompensated donors).
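The novelty filter described above can be illustrated with a minimal sketch. It assumes fingerprints are already available as sets of "on" bit indices; in practice ECFP4-style fingerprints would come from a cheminformatics toolkit. The cut-off value, the function names and the toy fingerprints are all hypothetical, since the summary does not state the thresholds used.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical representation: a fingerprint is the set of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |a & b| / |a | b| for two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def novelty_filter(
    candidates: Dict[str, Fingerprint],
    known_ligands: Iterable[Fingerprint],
    cutoff: float = 0.35,  # illustrative placeholder, not a value from the study
) -> List[str]:
    """Keep candidates whose maximum similarity to any known ligand is below cutoff."""
    known = list(known_ligands)
    kept = []
    for name, fp in candidates.items():
        max_sim = max((tanimoto(fp, k) for k in known), default=0.0)
        if max_sim < cutoff:
            kept.append(name)
    return kept


# Toy example with made-up bit sets.
known = [frozenset({1, 2, 3, 4})]
docked = {
    "close_analog": frozenset({1, 2, 3, 5}),    # shares 3 of 5 distinct bits
    "new_chemotype": frozenset({1, 9, 10, 11}),  # shares 1 of 7 distinct bits
}
novel = novelty_filter(docked, known)
```

Filtering on the maximum similarity, rather than the mean, is the stricter choice: a candidate is rejected if it resembles even one known ligand.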
Human inspection was used alongside machine ranking to select molecules for make-on-demand synthesis from commercial vendors; ultimately 119 compounds were synthesised for the σ2 AF2 campaign, and 223 (experimental) and 161 (AF2) for 5-HT2A.
Biochemical and functional assays: Primary binding was assessed by competitive radioligand displacement in membranes expressing the target receptors. For σ2, [3H]-DTG displacement assays were used; for 5-HT2A, [3H]-LSD was used with an initial single-concentration screen (10 μM) followed by concentration–response curves for strong binders to derive K_i values by Cheng–Prusoff correction. Functional activity (agonism/antagonism) and subtype selectivity were probed in calcium mobilisation assays in HEK293 cells stably expressing 5-HT2A/2B/2C, with a threshold of ≥10% of the maximal agonist response used to call a compound an agonist. G protein versus β-arrestin bias was measured by BRET assays (Gαq dissociation and β-arrestin2 recruitment).
Controls for promiscuity and aggregation: Selected potent ligands were counter-screened against an unrelated GPCR (V1A vasopressin receptor) to detect off-target activity. Colloidal aggregation was assessed by dynamic light scattering (DLS) and by testing inhibition of malate dehydrogenase (MDH) as an enzyme counter-screen, typically at concentrations an order of magnitude above on-target activities.
Structural characterisation: The authors determined cryo-EM structures for an active-state lisuride–5-HT2A–mini-Gαq complex (consensus reconstruction at 3.1 Å) and for a 5-HT2A complex with an AF2-derived agonist (Z7757) at 3.0 Å. Complexes were formed with mini-GαqiN–Gβ1–Gγ2 and scFv16, purified and vitrified; data were processed in cryoSPARC/RELION, models were refined in Phenix/COOT, and ligand poses were validated with GemSpot/EMERALD. Standard details for membrane preparation, radioligand assays, calcium assays, BRET, DLS, MDH, protein purification and cryo-EM data collection/processing are provided in the Methods.
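The Cheng–Prusoff correction mentioned for the binding assays converts a measured IC50 into K_i using the radioligand concentration [L] and its dissociation constant K_d: K_i = IC50 / (1 + [L]/K_d). A minimal sketch follows; the function name and the numbers in the example are hypothetical and are not taken from the study.

```python
def cheng_prusoff_ki(ic50: float, radioligand_conc: float, radioligand_kd: float) -> float:
    """Return K_i from a competitive-binding IC50 via the Cheng-Prusoff equation.

    All three arguments must be in the same concentration unit (e.g. nM);
    the result is returned in that unit.
    """
    if ic50 <= 0 or radioligand_conc < 0 or radioligand_kd <= 0:
        raise ValueError("concentrations must be positive (radioligand may be zero)")
    return ic50 / (1.0 + radioligand_conc / radioligand_kd)


# Hypothetical example: IC50 = 100 nM measured with radioligand at its K_d (1 nM each).
example_ki = cheng_prusoff_ki(ic50=100.0, radioligand_conc=1.0, radioligand_kd=1.0)
```

When the radioligand is used at a concentration equal to its K_d, the correction halves the IC50; at radioligand concentrations far below K_d, K_i approaches the IC50.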
Assessing AF2 models: For σ2 and 5-HT2A, AF2 models had binding sites that were judged ligandable by SiteMap (scores >1; σ2: 1.2 AF2 versus 1.3 crystal, 5-HT2A: 1.2 for both AF2 and cryo-EM). For σ2, orthosteric site side-chain conformations in the AF2 model matched the crystal structure closely (overall side-chain RMSD ≈1.1 Å with few residues >1.5 Å). For 5-HT2A, most binding-site residues were predicted with <2 Å RMSD versus the cryo-EM structure, but two residues (F234 and L229) differed substantially (2.5–3.1 Å), adopting different rotamers; overall backbone RMSD was 1.6 Å for 5-HT2A compared with 0.5 Å for σ2. These differences reduced the calculated pocket volume in the AF2 5-HT2A model (732 Å³) versus the lisuride cryo-EM structure (816 Å³).
Retrospective docking: Redocking known ligands and literature actives returned better retrospective enrichment for experimental structures than for AF2 models. Against σ2, a previous experimental-structure docking campaign had yielded 70 actives (51% hit rate) from 138 tested top-ranked molecules; redocking the same library against the AF2 σ2 model demoted those molecules out of the prior top ranks and retrospective enrichment (logAUC) dropped (from 39 for crystal to 16 for AF2 in one test). Similar retrospective superiority of experimental structures over AF2 was observed for 5-HT2A.
Prospective σ2 docking: From docking 490 million molecules against the σ2 AF2 model, 119 high-ranking molecules were synthesised; in radioligand displacement at 1 μM, 64 displaced >50% [3H]-DTG (hit rate 54%). This closely matched the prior crystal-structure campaign hit rate of 51% (difference not statistically significant). The top 18 AF2-derived hits had K_i values from 1.6 to 84 nM, with 13 <50 nM and two <5 nM; the most potent AF2 hit (ZINC866533340) had K_i = 1.6 nM and represented a previously unseen chemotype.
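The retrospective enrichment values quoted above are logAUC scores. The summary reports the values but not the formula, so the sketch below assumes a definition commonly used in the docking literature: the fraction of ligands recovered is integrated against log10 of the fraction of decoys recovered, from 0.1% to 100% of decoys, expressed as a percentage of the maximum possible area, with the value expected for a random ranking (about 14.5% under these settings) subtracted. The function name and the lower bound `lam` are assumptions of this sketch.

```python
import math
from typing import Sequence


def adjusted_logauc(is_ligand: Sequence[bool], lam: float = 1e-3) -> float:
    """Adjusted logAUC (percent) for a ranked list, best-scoring molecule first.

    is_ligand[i] is True for a known ligand and False for a decoy.
    The ROC curve is treated as a step function and integrated exactly
    over log10(decoy fraction) between lam and 1.
    """
    n_lig = sum(1 for flag in is_ligand if flag)
    n_dec = len(is_ligand) - n_lig
    if n_lig == 0 or n_dec == 0:
        raise ValueError("need at least one ligand and one decoy")

    area = 0.0
    ligands_found = 0
    decoys_found = 0
    for flag in is_ligand:
        if flag:
            ligands_found += 1  # vertical step: no area contribution
            continue
        x_lo = decoys_found / n_dec
        decoys_found += 1
        x_hi = decoys_found / n_dec
        if x_hi <= lam:
            continue  # this horizontal step lies entirely below the lower bound
        y = ligands_found / n_lig
        area += y * (math.log10(x_hi) - math.log10(max(x_lo, lam)))

    span = math.log10(1.0 / lam)
    logauc = 100.0 * area / span
    # Area under the diagonal y = x on the same log scale (random ranking).
    random_logauc = 100.0 * (1.0 - lam) / (math.log(10.0) * span)
    return logauc - random_logauc


# Toy rankings: 10 ligands and 1000 decoys.
perfect = [True] * 10 + [False] * 1000
worst = [False] * 1000 + [True] * 10
```

Because the x-axis is logarithmic, the metric rewards ligands that appear in the very top of the ranked list, which is the only part of an ultralarge library that is ever synthesised and tested.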
Despite similar hit rates and affinity ranges, overlap between AF2- and experimental-derived ligands was minimal: only one of 134 new ligands shared the same core scaffold, with average pairwise ECFP4 Tanimoto coefficients ≈0.32, near random.
Prospective 5-HT2A docking: More than 1.6 billion molecules were docked against both the lisuride cryo-EM structure and the AF2 model. After filtering and selection, 223 cryo-EM–docked and 161 AF2-docked molecules were synthesised. In primary binding screens (10 μM), 51/223 cryo-EM molecules (23%) and 42/161 AF2 molecules (26%) displaced >50% [3H]-LSD. Using a stringent >90% displacement threshold at 10 μM, hit rates were 4% (8/223) for cryo-EM and 6% (9/161) for AF2. Secondary binding assays on top hits gave K_i values between 15 and 344 nM; notably, the three highest-affinity compounds (K_i 15–24 nM) were all from the AF2 set, whereas the top cryo-EM compounds had K_i values of 71–114 nM. Overall, the difference in binding hit rates between AF2 and experimental structures was not statistically significant.
Functional profiling and selectivity: Functional screening (an initial single-concentration screen followed by dose–response curves for selected compounds) identified agonists, antagonists and subtype activities across 5-HT2A/2B/2C. At a 3 μM screen concentration, 10 cryo-EM and six AF2 compounds met the ≥10% 5-HT response threshold to be considered agonists. Full concentration–response assays (calcium mobilisation) showed potencies for cryo-EM–derived agonists ranging from ~246 nM to 3 μM, and for AF2-derived agonists from ~42 nM to 1.6 μM. Three of the top five AF2 agonists (Q2118, Z7757, Z2504) displayed 5-HT2A subtype selectivity over 5-HT2B/2C; none of the top cryo-EM agonists were subtype selective. BRET assays for Gαq dissociation versus β-arrestin2 recruitment indicated modest Gαq bias for nine of the top ten agonists from both sets.
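The hit-rate comparisons above (for example 42/161 versus 51/223 at 5-HT2A, or 64/119 versus 70/138 at σ2) are described as not statistically significant. The summary does not say which test the authors used; the sketch below uses a pooled two-proportion z-test as one reasonable stand-in, with a hypothetical function name, to show how such a comparison can be checked from the reported counts.

```python
import math
from typing import Tuple


def two_proportion_z_test(hits_a: int, n_a: int, hits_b: int, n_b: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (z statistic, two-sided p-value)."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        raise ValueError("standard error is zero (all hits or no hits)")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Counts reported in the summary (AF2 arm first, experimental-structure arm second).
z_5ht2a, p_5ht2a = two_proportion_z_test(42, 161, 51, 223)
z_sigma2, p_sigma2 = two_proportion_z_test(64, 119, 70, 138)
```

Under this test both p-values are far above 0.05, consistent with the statement that the AF2 and experimental-structure hit rates are statistically indistinguishable. An exact test would be a sensible alternative for the smaller >90% displacement counts (9/161 versus 8/223).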
Antagonist potencies were generally weak; cryo-EM antagonists showed potencies between 13 and 78 μM, whereas AF2-derived antagonists ranged from 907 nM to 114 μM.
Controls for nonspecific activity: Thirty-six potent ligands were counter-screened at an unrelated GPCR (V1A); none showed agonist or antagonist activity there. Thirty of these were tested for colloidal aggregation by DLS and for promiscuous enzyme inhibition of MDH at 31.6 μM; none formed particles by DLS nor substantially inhibited MDH, arguing against a colloidal mechanism for these ligands.
Cryo-EM of an AF2-derived ligand complex: The researchers determined a 3.0-Å cryo-EM structure of 5-HT2A bound to the AF2-derived agonist Z7757 in complex with mini-Gαq. The experimentally observed Z7757 pose closely matched the AF2-docking prediction (pose RMSD ≈1.6 Å), with the expected interactions: the phenolic hydroxyl hydrogen-bonded to T160 and S242 and to the backbone of G238, and the cationic nitrogen formed a salt bridge with D155. Comparison of the Z7757-bound structure with both the AF2 model and the lisuride cryo-EM structure revealed that certain residues (notably L229) in the new complex adopt positions more similar to the AF2 model, whereas other residues (F234, W151) resembled the lisuride cryo-EM conformations or showed conformational heterogeneity across deposited structures. These observations suggest AF2 sampled low-energy conformations that are relevant for ligand recognition even when they differ from the specific experimental structure used in a comparative docking campaign.
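The pose RMSD quoted above compares the docked and experimentally observed ligand positions. The summary does not describe how the value was computed; the sketch below assumes the simplest convention: both poses are already in the same receptor coordinate frame, atoms are listed in the same order, and no symmetry correction or re-superposition of the ligand is applied. The function name and coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(predicted: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation (same units as input) between matched atoms.

    No fitting is performed: the ligand is not re-aligned, so the value
    reflects displacement within the binding site as well as shape differences.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = 0.0
    for (px, py, pz), (rx, ry, rz) in zip(predicted, reference):
        squared += (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
    return math.sqrt(squared / len(predicted))


# Toy example: a three-atom "ligand" shifted rigidly by 1 Å along x.
docked_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
observed_pose = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (2.5, 1.5, 0.0)]
shift_rmsd = pose_rmsd(docked_pose, observed_pose)
```

Skipping the ligand superposition is deliberate: for judging a docking prediction, a pose with the right shape in the wrong place should score poorly.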
The authors interpret their results to mean that, contrary to expectations from retrospective analyses, unrefined AF2 models can be as effective as experimental structures for prospective large-library docking under some circumstances. Across both the σ2 and 5-HT2A targets, docking against AF2 models produced hit rates and affinities that were high and statistically comparable to those obtained from docking against experimental structures, and in the 5-HT2A case the AF2 campaign yielded some of the most potent and subtype-selective agonists. The cryo-EM structure of an AF2-derived agonist (Z7757) largely validated the docking prediction and showed that some residue conformations in the ligand-bound complex more closely resembled the AF2 model than the alternative experimental structure. From these findings the authors suggest that AF2 models can sample alternative low-energy conformations of binding-site residues that are relevant for recognising distinct chemotypes and thus can complement experimental structures in structure-based ligand discovery.
The authors reconcile their prospective success with prior retrospective studies by emphasising bias in retrospective benchmarks: known ligands and experimentally determined structures are often mutually adapted, so retrospective docking against AF2 models underestimates their prospective utility. They note that AF2 and experimental structures tend to prioritise different chemotypes and that the low overlap between active sets in paired campaigns supports the idea that different conformations can yield complementary ligand families.
Key limitations acknowledged by the authors include that the study focused on two targets with AF2 models that were either close to experimental conformations (σ2) or close with a few important divergences (5-HT2A); other AF2 models can be so different from experimental structures that they are unsuitable for docking.
The authors did not quantify how many targets fall into these categories or provide a definitive a priori criterion for suitability, although they note that binding-site pLDDT (predicted local distance difference test) scores may be informative. They also report that functional hit rates (agonists/antagonists) for 5-HT2A were low (~1–5%), and that testing racemic mixtures may have reduced functional detection (pure enantiomers can be crucial). Finally, the AF2 models used were predicted without ligand information; newer tools that cofold proteins with ligands may further improve models for docking. The authors conclude that, with appropriate target selection and caution, AF2 models can expand the domain of structure-based discovery by providing additional, low-energy receptor conformations that are useful for finding potent and diverse chemotypes, complementing experimental structures rather than replacing them.
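The suggestion that binding-site pLDDT may be informative can be explored with a short sketch. It relies on the fact that PDB-format files from the AlphaFold Protein Structure Database store per-residue pLDDT in the B-factor column. The function name, the choice of Cα atoms and the residue list are assumptions of this sketch; the authors do not propose a specific threshold, and none is implied here.

```python
from typing import Iterable, Set


def binding_site_plddt(pdb_lines: Iterable[str], site_residues: Set[int], chain: str = "A") -> float:
    """Mean pLDDT over the given residue numbers, read from C-alpha ATOM records.

    Uses the fixed-column PDB layout: atom name in columns 13-16, chain ID in
    column 22, residue number in columns 23-26, B-factor (pLDDT) in columns 61-66.
    """
    scores = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        residue_number = int(line[22:26])
        if residue_number in site_residues:
            scores[residue_number] = float(line[60:66])
    if not scores:
        raise ValueError("none of the requested residues were found")
    return sum(scores.values()) / len(scores)


# Hypothetical usage with a locally downloaded model file (path is a placeholder):
# with open("af2_model.pdb") as handle:
#     mean_score = binding_site_plddt(handle, {155, 229, 234})
```

The residue numbers in the commented example are 5-HT2A positions named in this summary (D155, L229, F234); a real analysis would use the full set of residues lining the orthosteric site.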