
A dual-receptor model of serotonergic psychedelics: therapeutic insights from simulated cortical dynamics

This preprint uses predictive processing and an energy-based model of cortical dynamics to explore the therapeutic mechanism of serotonergic psychedelics. Its simulations suggest that combined 5-HT2a and 5-HT1a agonism produces a more psychologically tolerable acute experience and better therapeutic efficacy than pure agonism at either receptor. This offers a potential theoretical basis for the clinical success of mixed serotonin agonists such as LSD, psilocybin, and DMT, and points to the potential for developing even more effective and tolerable psychotherapeutic agents, such as biased 5-HT1a agonist psychedelics like 5-MeO-DMT.

Authors

Chelu, V.
Graesser, L.
Juliani, A.

Published

April 15, 2024

bioRxiv

meta Study


Abstract

Serotonergic psychedelics have been identified as promising next-generation therapeutic agents in the treatment of mood and anxiety disorders. While their efficacy has been increasingly validated, the mechanism by which they exert a therapeutic effect is still debated. A popular theoretical account is that excessive 5-HT2a agonism disrupts cortical dynamics, relaxing the precision of maladaptive high-level beliefs, thus making them more malleable and open to revision. We extend this perspective by developing a theoretical framework and simulations based on predictive processing and an energy-based model of cortical dynamics. We consider the role of both 5-HT2a and 5-HT1a agonism, characterizing 5-HT2a agonism as inducing stochastic perturbations of the energy function underlying cortical dynamics and 5-HT1a agonism as inducing a global smoothing of that function. Within our simulations, we find that while both agonists are able to provide a significant therapeutic effect individually, mixed agonists provide both a more psychologically tolerable acute experience and better therapeutic efficacy than either pure 5-HT2a or 5-HT1a agonists alone. This finding provides a potential theoretical basis for the clinical success of LSD, psilocybin, and DMT, all of which are mixed serotonin agonists. Our results furthermore indicate that exploring the design space of biased 5-HT1a agonist psychedelics such as 5-MeO-DMT may prove fruitful in the development of even more effective and tolerable psychotherapeutic agents in the future.


Research Summary of 'A dual-receptor model of serotonergic psychedelics: therapeutic insights from simulated cortical dynamics'

Introduction

Juliani and colleagues situate their work within ongoing theoretical debates about how serotonergic psychedelics produce therapeutic effects. Earlier research has identified 5-HT2a agonism and downstream neurotrophic processes as important, and frameworks such as REBUS (Relaxed Beliefs Under Psychedelics) propose that psychedelics relax the precision of high-level beliefs, increasing cortical entropy and enabling belief revision. However, alternative perspectives (for example ALBUS) and empirical phenomena such as intense "insight" experiences and DMT entity encounters suggest that acute effects may sometimes transiently strengthen beliefs rather than simply relax them. The authors emphasise that many classic psychedelics also have appreciable 5-HT1a affinity, and argue that prior models under-specify the role of 5-HT1a in shaping acute phenomenology and therapeutic outcomes.

Methods

The investigators track six primary metrics: energy function value (the average auxiliary energy of the sampled states z), gradient magnitude (the norm of the energy gradients, used as a proxy for belief precision), local minima count (the number of attractors), state visitation count (the number of unique z states visited, used as a proxy for cortical entropy or diversity), energy-function divergence (a KL divergence between the drug-modulated surrogate posterior and the target posterior, treated as a proxy for therapeutic efficacy), and divergence-trend monotonicity (a measure of whether the KL divergence decreases monotonically during optimisation, treated as a proxy for the tolerability of the acute experience). The authors performed sensitivity analyses varying Hebbian plasticity, homeostatic strength, and inference gradient steps, and report that their conclusions are robust across realistic parameter ranges; extreme parameter values altered the outcomes and are discussed as implausible.
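The metrics above are described only in prose. As a rough illustration, the sketch below shows how three of them could be computed over a discretised state space: a KL divergence between two discrete distributions, a count of unique visited states, and a monotonicity score for a KL trajectory. The discretisation, the function names, and especially the monotonicity definition (the fraction of steps on which KL does not increase) are assumptions made for illustration, not the authors' definitions.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same states."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # normalise defensively
    mask = p > 0  # states with p == 0 contribute nothing to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))


def state_visitation_count(trajectory):
    """Number of unique states visited; states must be hashable (ints or tuples)."""
    return len(set(trajectory))


def monotonicity(kl_trace):
    """Hypothetical score: fraction of steps on which the KL divergence does not increase."""
    diffs = np.diff(np.asarray(kl_trace, dtype=float))
    if diffs.size == 0:
        return 1.0
    return float(np.mean(diffs <= 0))


# Toy inputs for illustration only (not values from the preprint).
print(kl_divergence([0.7, 0.2, 0.1], [0.4, 0.4, 0.2]))
print(state_visitation_count([0, 1, 1, 2, 0]))
print(monotonicity([3.0, 2.0, 2.5, 1.0]))
```

A score of 1.0 from `monotonicity` would correspond to a KL trace that never rises, which is the loose sense in which a smoother descent might stand in for a more tolerable acute experience.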

Results

An exhaustive scan of 5-HT2a/5-HT1a dose combinations identified the optimal trade-off (with equal weighting of final KL reduction and KL-trend monotonicity) at maximal 5-HT1a with medium or heavy 5-HT2a agonism, that is, biased 5-HT1a agonism. State visitation count explained nearly all of the variance in final KL divergence (r² = 0.952), supporting a strong association between increased neural-state diversity (entropy) and the surrogate therapeutic metric. Average energy value accounted for most of the variance in divergence-trend monotonicity (r² = 0.894), linking acute overfitting to poorer tolerability in the model.
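The results rest on two calculations: an equal-weighted trade-off between final KL reduction and KL-trend monotonicity across the dose grid, and r² values relating a metric to an outcome. A minimal sketch of both follows, assuming each criterion is min-max normalised across the grid before averaging and that r² refers to an ordinary least-squares line of one variable on the other. The normalisation scheme, the generic dose labels, and the function names are hypothetical.

```python
import numpy as np


def min_max(x):
    """Rescale values to [0, 1]; a constant vector maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def equal_weight_tradeoff(kl_reduction, monotonicity):
    """Equal-weighted score per dose pair (assumed normalisation; higher is better)."""
    return 0.5 * min_max(kl_reduction) + 0.5 * min_max(monotonicity)


def r_squared(x, y):
    """Coefficient of determination of an ordinary least-squares line y ~ a*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Generic dose labels and toy metric values, for illustration only.
doses = ["A", "B", "C"]
scores = equal_weight_tradeoff([0.0, 1.0, 2.0], [1.0, 0.0, 1.0])
print(doses[int(np.argmax(scores))], scores)
```

Normalising first keeps one criterion from dominating simply because it is measured on a larger scale; other weightings or normalisations would be equally defensible and could move the optimum.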

Discussion

Limitations are emphasised: the simulations model a single fixed energy function and a stationary environmental context, use a low-dimensional non-parametric z representation, do not simulate action or full hierarchical interactions, and are abstracted from empirical neural data. The investigators also note sensitivity of outcomes to extreme parameter choices (for example very large Hebbian plasticity or very weak homeostasis) and acknowledge practical challenges in empirically validating fine-grained changes to an inferred energy function with current neuroimaging. They recommend targeted neuroimaging and behavioural experiments to test predictions, and suggest extensions using recurrent neural network models or richer hierarchical task contexts.

Conclusion

The paper concludes that a dual-receptor model in which 5-HT2a introduces stochastic perturbations and 5-HT1a provides smoothing can reconcile apparently divergent phenomenological and therapeutic observations about serotonergic psychedelics. Mixed 5-HT2a/5-HT1a agonism is predicted to yield both greater long-term efficacy and improved acute tolerability compared with pure agonists, and a bias toward 5-HT1a may further improve tolerability with only modest loss of efficacy. The authors propose that these mechanistic insights could guide clinical research and next-generation psychedelic drug design, while reiterating the need for cautious, equitable, and ethically informed translational work.


INTRODUCTION

Serotonergic psychedelics such as psilocybin, LSD, and DMT have received significant attention from both research scientists and clinicians in recent years for their potential to treat a variety of psychiatric conditions, ranging from depression and anxiety to substance use and obsessive-compulsive disorders (M. W.). This transdiagnostic efficacy has led some researchers to hypothesize that there may be a single primary underlying factor of psychopathology which psychedelic therapy acts to address (R.). Despite significant progress in understanding the empirical therapeutic effects of these drugs, theoretical models which describe and predict these effects are less well developed. While the key roles of the 5-HT2a receptor system and of downstream neurotrophic effects are largely accepted by the scientific community, the role of subjective experience in the therapeutic effects of psychedelics is still a topic of considerable debate. A variety of competing theories have been developed to explain the mechanics behind the subjective effects of psychedelics and their relationship to therapeutic outcomes. Among these, the Relaxed Beliefs Under Psychedelics (REBUS) model (R. L.) has received additional attention outside the domain of neuroscience for its translational application in guiding protocol development for psychedelic-assisted psychotherapy. The REBUS model hypothesizes that psychedelics exert their therapeutic effect by relaxing the precision of high-level beliefs both acutely and post-acutely, thus making them amenable to modification through introspection and interpersonal therapy. This hypothesis is supported by evidence of an increase in the entropy of cortical transition dynamics during the acute phase of psychedelic use, and of a relaxation of long-term propositional beliefs after psychedelic therapy.
Despite the existence of some preliminary evidence in support of REBUS, it is not clear whether we should expect the relaxation of beliefs to hold true across the dose-response curve for all substances in the broad class of psychedelics, or whether certain doses and environmental conditions may instead result in the transient strengthening of beliefs under psychedelics. One piece of evidence suggesting that psychedelics may strengthen beliefs in some cases is reports of insight experiences, in which individuals arrive at strongly felt (but potentially erroneous) new beliefs. Another is the experience of so-called "entities" and "alternate realities" under the effects of DMT, which are often described as appearing more real than normal conscious experience. Furthermore, although the purported mechanism behind the disruption of belief representation is 5-HT2a agonism, other serotonin (5-HT) receptor populations make a non-trivial contribution to the changes in neural dynamics which result from psychedelics, with activity at 5-HT1a receptors in particular exerting a significant effect. The recently introduced Altered Beliefs Under Psychedelics (ALBUS) model predicts that psychedelics alter belief representations in a less specific way than REBUS, with the therapeutic belief relaxation seen in clinical studies being a special case which results from specific dose ranges and controlled environmental influences. While this alternative hypothesis is based on a study of the theoretical non-linear dynamics of activity in the relevant cortical circuits, it currently lacks empirical validation and is under-specified in a number of ways which limit its practical application as a predictive tool. In this work we begin to address these limitations.
Utilizing a novel simulation paradigm inspired by iterative optimization algorithms and built on the framework provided by ALBUS and hierarchical predictive processing (HPP), we contribute additional theoretical support for the possibility of both belief relaxation and belief strengthening across the dose-response curve of serotonergic psychedelics. Our model and simulation framework have the potential to help answer several important questions about the neuromodulatory effect of serotonin (5-HT) signaling on cortical belief representation. We illustrate that the markers of brain activity which characterize the psychedelic state and are associated with 5-HT2a agonism, such as increases in the entropy and complexity of cortical states, are consistent both with a relaxation and with a stochastic, transient strengthening of beliefs during acute drug administration. We demonstrate that, within our model, this transient strengthening of beliefs is theoretically capable of producing therapeutic outcomes comparable to or greater than those of more straightforward belief relaxation. We also model the neuromodulatory role that 5-HT1a agonism plays in psychedelic effects, with respect to both acute phenomenology and long-term clinical efficacy in addressing psychopathology. Psilocybin, LSD, and DMT all have significant affinity for the 5-HT1a receptor in addition to the 5-HT2a receptor. Rather than assuming that 5-HT1a agonism plays a minor or even negligible role, as has previously been done, we review empirical evidence and present simulations under the proposed model which together give insight into the key role 5-HT1a agonism may play in psychedelic experiences and therapeutic outcomes. In this work we characterize 5-HT2a agonism as producing disruptions in belief representations through the perturbation of the underlying optimization landscape of the predictively encoded input signal.
In contrast, 5-HT1a agonism provides a complementary regularizing effect which prevents the development of overly precise beliefs during the acute phase of the psychedelic experience. Based on clinical evidence and simulation results, we hypothesize that although 5-HT2a agonism is necessary and sufficient for the psychedelic experience, 5-HT1a agonism also modulates the experience such that it is both more tolerable and more therapeutic. Our model allows for an interpretation of the unique phenomenology of the highly biased 5-HT1a agonist psychedelic 5-MeO-DMT, which has been described as producing a "white-out" or "void" state. Within our framework, the 5-MeO-DMT experience corresponds to a state in which belief representation is temporarily regularized to an extreme point where, due to the high level of 5-HT1a agonism, beliefs are represented with near-zero precision. The fact that this drug produces one of the most significant and durable antidepressant effects both supports the REBUS model and poses a major complication for the hypothesis that 5-HT2a agonism is the only relevant drug factor in psychedelic-assisted psychotherapy. Properly characterizing the modulatory role played by 5-HT1a agonism in the psychedelic experience may be essential to the development of next-generation psychedelic substances which provide both greater psychological tolerability and more consistent long-term therapeutic effects. The rest of this article is organized as follows. We first introduce a guiding framework for the endogenous function of serotonin signalling (Section 1.1), as well as experimental and clinical evidence in support of the modulating roles of 5-HT2a and 5-HT1a agonism in the acute effects and therapeutic efficacy of serotonergic psychedelics (Section 1.2).
We then present a review of the REBUS model and predictive processing (Section 1.3), followed by a discussion of the key role of the prefrontal cortex in psychedelic activity (Section 1.4). We next introduce our theoretical model based on the principles of HPP and energy-based models (EBMs) (Section 1.5), within which we propose specific mechanisms for the neuromodulatory effects of 5-HT2a and 5-HT1a agonism on cortical dynamics (Section 2). We follow this with results from idealized simulations of these dynamics, demonstrating their ability to predict potential therapeutic outcomes, markers of cortical entropy, and drug tolerability (Section 3). Finally, we discuss the implications of our model, as well as the experimental evidence which would need to be collected to validate or invalidate our predictions (Section 4).
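The introduction's reading of the 5-MeO-DMT "void" state, as belief representation regularized to near-zero precision, can be illustrated with a toy. The sketch below assumes that beliefs over a discretized latent variable follow a Boltzmann distribution p(z) ∝ exp(-E(z)) and that 5-HT1a-like regularization shrinks E toward its mean (a flat prior) with a hypothetical weight `lam`; the example energy and the use of Shannon entropy as an inverse-precision proxy are illustrative choices, not the preprint's definitions. Shrinking E by a factor (1 - lam) is equivalent to raising the temperature, so entropy rises monotonically with `lam` and reaches its maximum, log(n), at `lam = 1`, where the distribution is uniform.

```python
import numpy as np


def regularize_energy(energy, lam):
    """Shrink an energy toward its mean (a flat prior); lam in [0, 1] is a hypothetical 5-HT1a strength."""
    energy = np.asarray(energy, dtype=float)
    return (1.0 - lam) * energy + lam * energy.mean()


def boltzmann(energy, temperature=1.0):
    """Normalized Boltzmann distribution p(z) proportional to exp(-E(z) / T)."""
    logits = -np.asarray(energy, dtype=float) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def entropy(p):
    """Shannon entropy in nats; higher entropy stands in for lower belief precision."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


z = np.linspace(-2.0, 2.0, 101)
E = 8.0 * (z - 0.5) ** 2  # toy energy: a sharply peaked (high-precision) belief near z = 0.5
for lam in (0.0, 0.5, 0.9, 1.0):
    print(lam, round(entropy(boltzmann(regularize_energy(E, lam))), 3))
```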

SEROTONERGIC NEUROMODULATION AS A STRESS RESPONSE SYSTEM

A well-supported model of the serotonin system is that it serves to mediate cognitive and affective responses to stress. This can be seen as an extension of a broader, evolutionarily preserved role for serotonin in responding to aversive stimuli. A prominent model within the paradigm of serotonin-mediated stress response is that relevant serotonin signaling is composed of two distinct, interdependent systems, corresponding to the two main serotonin receptor populations in the brain. Within the model, these receptor systems instantiate separate "active" and "passive" coping strategies which an organism may engage under stress (R. L.). These two systems have independently been identified as "automaticity" and "flux" cognitive states, which we interpret here as functionally equivalent to the two coping strategies outlined by R. L. Carhart-Harris and Nutt. The human nervous system contains over a dozen distinct serotonin receptors. Of these, 5-HT1a and 5-HT2a are both the most widely distributed within the brain and the most well studied. Of particular interest is the role of the postsynaptically expressed 5-HT1a and 5-HT2a receptors in the cortex. These receptor populations are understood to operate in relatively simple opponency with one another within cortical pyramidal cells, with 5-HT1a receptors inhibiting and 5-HT2a receptors exciting the postsynaptic cell. The active and passive coping strategies within the brain are hypothesized to be instantiated by the 5-HT2a and 5-HT1a receptor systems, respectively (R. L.). Within this dual-strategy framework, small to moderate amounts of stress activate the 5-HT1a-mediated passive coping system, while large amounts of stress activate the 5-HT2a-mediated active coping system. In passive coping, the animal maintains its current behavioral policy but modulates affect in response to manageable levels of stress.
In contrast, active coping is engaged when the level of stress is significant enough to require the instantiation of novel and divergent internal models, beliefs, or behavioral policies. This bimodal behavioral distribution can be seen in a simplified form in serotonin-mediated rodent responses to stress from predatory threats, where a low-level threat initiates freezing behavior while a high-level threat initiates fleeing behavior. Importantly, both coping strategies are mediated by the same system of serotonergic neuromodulation. Within a stress-response model of serotonin function, the extent to which a given strategy is deployed is a function of 5-HT release by the dorsal raphe nucleus (DRN). A complete account of the computational role of the DRN is still being developed, but there is evidence that DRN neurons may compute unsigned prediction errors, and that the magnitude of this signal corresponds to downstream 5-HT release in the cortex. This prediction error computation is hypothesized to be part of a larger role for the DRN in value prediction, complementary to that of the ventral tegmental area. The kind of stress to which serotonin responds may therefore depend on the organism's ability to form and maintain successfully predictive representations. Within this framework, 5-HT neuromodulation serves to respond to stress by improving the computational efficiency of the processes required to learn and maintain such representations. The ability to efficiently learn and maintain adaptive predictive representations is precisely the construct of cognitive flexibility (the characteristic that enables animals or humans to adaptively generate appropriate behavioral responses based on changing sensory stimuli), which psychedelics have been demonstrated to improve and which is believed to be generally associated with serotonin signalling.
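The dual-strategy account above can be reduced to a small sketch: a signed prediction error (the kind classically associated with dopaminergic value signals from the ventral tegmental area), its unsigned magnitude (the quantity the DRN is hypothesized to compute), and a threshold rule mapping that magnitude to a coping mode. The threshold values, the "baseline" label below the passive threshold, the function names, and the direct use of the unsigned error as a stand-in for stress-driven 5-HT release are all hypothetical simplifications, not parameters from the preprint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopingThresholds:
    """Hypothetical cut-offs on unsigned prediction-error magnitude."""
    passive: float = 0.2  # at or above this, 5-HT1a-mediated passive coping engages
    active: float = 1.0   # at or above this, 5-HT2a-mediated active coping engages


def prediction_errors(observed: float, predicted: float) -> tuple[float, float]:
    """Return (signed, unsigned) prediction error for a single observation."""
    signed = observed - predicted  # VTA-style signed error
    return signed, abs(signed)     # DRN-style unsigned magnitude


def coping_mode(unsigned_error: float, th: CopingThresholds = CopingThresholds()) -> str:
    """Map an error magnitude (stand-in for stress) to a coping strategy."""
    if unsigned_error >= th.active:
        return "active (5-HT2a)"
    if unsigned_error >= th.passive:
        return "passive (5-HT1a)"
    return "baseline"  # hypothetical label: below both thresholds


for observed in (1.05, 1.5, 3.0):
    signed, unsigned = prediction_errors(observed, predicted=1.0)
    print(observed, signed, coping_mode(unsigned))
```

Because the rule depends only on the magnitude, a large error of either sign engages the same strategy, which is the sense in which an unsigned signal differs from a value-signed one.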
Given that all classic psychedelics have significant affinity for the 5-HT2a receptor system, R. L. Carhart-Harris and Nutt have characterized them as a prototypical example of a substance that induces an "active coping" response in an organism. This manifests in the acute psychedelic state, which enables a state of "metaplasticity" (the dynamic regulation of the extent to which synaptic plasticity can be induced), enabling a re-evaluation of previously held beliefs and the spontaneous assumption of novel representations of self or environment (for a review of this effect across various forms of belief, see). In contrast, R. L. Carhart-Harris and Nutt propose MDMA as an exemplar "passive coping" substance due to its role in releasing endogenous serotonin, which preferentially activates the 5-HT1a receptor system. In humans, this is hypothesized to manifest most acutely in the felt sense of equanimity and acceptance that characterizes the unique phenomenology of MDMA. A complication to this straightforward account is that psilocybin, LSD, and DMT all bind with significant affinity to 5-HT1a receptors as well as to 5-HT2a receptors (see Figure). Given the strong positive correlation between receptor binding affinity and efficacy for this class of drugs, one would expect them to engage both coping systems simultaneously, as opposed to only engaging the active coping system. Below we present a review of existing evidence for the roles of both 5-HT2a and 5-HT1a agonism in the acute effects of psychedelics.

Figure: Relative 5-HT1a and 5-HT2a binding affinities Ki (µM) for select psychedelic substances. Lower values correspond to higher binding affinity. DOI has the highest relative 5-HT2a binding affinity, while 5-MeO-DMT has the highest relative 5-HT1a binding affinity. Data sourced from.

5-HT2A AND 5-HT1A RECEPTOR AGONISM IN PSYCHEDELICS

5-HT2a agonism has been demonstrated to be necessary and sufficient for inducing the "head twitch response" in rodents, and this effect is highly correlated with the existence of reportable psychedelic phenomenology in humans. Despite this, there is also evidence that agonism at the 5-HT1a receptor significantly modulates the 5-HT2a-mediated effects of serotonergic psychedelics. Here we consider the evidence for this modulating effect across a number of dimensions, including behavioral, neural, clinical, and phenomenological. On the phenomenological level, 5-HT1a agonism has been identified as the primary driver of stimulus control in rodents trained to discriminate 5-MeO-DMT, and there is evidence that it also plays a significant role in the discrimination of LSD and DMT, though not of psilocybin or DOB (a strongly biased 5-HT2a agonist, similar to DOI). The role of 5-HT1a agonism in stimulus control in these examples is notably correlated with the relative 5-HT2a and 5-HT1a receptor affinities of each drug (see Figure), suggesting an underlying pattern whereby psychedelics with greater relative 5-HT1a agonism also manifest phenomenological effects which are more dependent on that agonism. Although the 5-HT1a agonism of psilocybin does not induce stimulus control, there is still evidence for the role it plays in the drug's behavioral effects, specifically with respect to the modulation of attentional control, compulsive behavior, and exploration. The latter study demonstrated that 5-HT1a agonism is likewise primarily implicated in the behavioral and phenomenological effects of 5-MeO-DMT, which is unsurprising given the high ratio of 5-HT1a to 5-HT2a affinity of this substance. The behavioral effects in rodents treated with 5-MeO-DMT are accompanied by changes in cortical dynamics in both the PFC and visual cortex, both of which are mediated primarily by 5-HT1a agonism.
One consistent marker of these disruptions is a decrease in the fMRI blood-oxygen-level-dependent (BOLD) signal, which is consistent with the inhibitory role of postsynaptic 5-HT1a agonism, particularly in the PFC. 5-HT1a receptor binding maps are also significantly more predictive than any other serotonin receptor binding map except that of 5-HT2a in anticipating the changes to brain activity under LSD, as well as under psilocybin. A prominent theory of the therapeutic efficacy of psychedelics is that they act as powerful psychoplastogens, or neurogenesis-inducing agents. Although this effect is well documented, there is some debate concerning the mechanisms by which it takes place. Given the critical role 5-HT2a agonism plays in the phenomenological effects of psychedelics, it was historically hypothesized that 5-HT2a agonism may also be responsible for their plasticity-inducing effects and the corresponding therapeutic benefits. Recent work has refined this hypothesis by specifically implicating intracellular 5-HT2a agonism in neurogenesis, thus explaining why endogenous serotonin, which is only capable of extracellular 5-HT2a agonism, does not act as a psychoplastogen. The hypothesis of 5-HT2a-mediated plasticity is complicated by other research implicating 5-HT1a agonism in downstream neurogenesis. It has also been shown that a moderate knockdown of 5-HT2a receptor availability disrupts the acute behavioral effect of psilocybin without abolishing its effect on structural plasticity. Likewise, research into the antidepressant effects of the dissociative drug ketamine has identified a key role of downstream 5-HT1a agonism in PFC neurogenesis, which is believed to be a key factor in the therapeutic effect of the drug.
Other recent work suggests that the psychoplastogenic effects of psychedelics may be largely serotonin-independent, resulting instead from direct binding to TrkB (tropomyosin receptor kinase B), the receptor for brain-derived neurotrophic factor (BDNF), a key neurotrophin involved in synaptic plasticity; TrkB activation plays a direct role in inducing neuroplasticity. Importantly, there is also evidence that 5-HT2a agonism may not play an essential role in the antidepressant effects of psychedelics, even independent of its effects on neuroplasticity. DOI, a highly biased 5-HT2a agonist, has failed to produce robust therapeutic effects in a rodent model of depression. There is also mixed evidence for the efficacy of DOI as a potential treatment for anxiety disorders, with a number of studies finding an anxiolytic effect and others finding an anxiogenic effect in rodent models. In contrast, it has been demonstrated that psilocybin is capable of producing antidepressant effects in rodent models even when the acute effects of 5-HT2a agonism are partially blocked, potentially implicating 5-HT1a agonism in these effects. Another proposed mediator of the antidepressant effects of psychedelics is a post-acute increase in cognitive flexibility. A review of the literature suggests that cognitive flexibility is more frequently associated with preferential 5-HT1a agonism than with 5-HT2a agonism. Recent evidence also suggests that blocking the 5-HT1a agonism of psilocybin impairs the drug's positive effects on cognitive flexibility in a rodent model of anorexia nervosa, pointing to an important mediating role for 5-HT1a agonism in the drug's effects. Finally, serotonin itself demonstrates significantly biased agonism for 5-HT1a over 5-HT2a receptors, and there are commonalities between the phenomenological effects of serotonin-releasing drugs and those of 5-HT1a agonists.
In a study of the effects of the serotonin-releasing drug MDMA, 5-HT2a agonism was found to play only a negligible role in the acute phenomenology, suggesting instead a more prominent role for 5-HT1a agonism in the subjective effects of the drug. There is also a research literature on the anxiolytic effects of biased 5-HT1a agonists, such as 8-OH-DPAT, in rodent models of psychopathology, and these effects are consistent with those of MDMA and other 5-HT-releasing drugs. Reductions in anxiety and avoidance are key mediators in a variety of common psychopathologies and their treatment by psychedelic therapy, suggesting a potential link between these effects and 5-HT1a agonism. Notably, outcomes from MDMA-assisted therapy are characterized by a significant acute decrease in experiential avoidance. Taken together, it is clear that, at the very least, 5-HT1a agonism plays a non-trivial modulating role in the acute phenomenology and long-term efficacy of serotonergic psychedelics. The question that then arises is how 5-HT1a agonism may contribute to these effects when the most apparent changes in subjective experience are mediated by 5-HT2a agonism. Utilizing an HPP framework of inference and learning in the PFC under psychedelic effects, as outlined by R. L. Carhart-Harris and Friston and extended by Safron, we hypothesize that 5-HT1a agonism impacts cortical belief representation by regularizing it toward a flat prior, thus smoothing the optimization landscape underlying predictive coding dynamics. In contrast, we posit that the mechanism of action of 5-HT2a agonism is to introduce stochastic perturbations into the neural optimization landscape, which produce transient changes resulting in increased plasticity.
The predictions of such a model are consistent with the anxiolytic, sedating, and prosocial effects of 5-HT1a agonism, as encapsulated in the construct of "passive coping", as well as with the stimulating, pareidolic, and insight-inducing effects of 5-HT2a agonism, as encapsulated in the construct of "active coping". Only when both sets of effects are taken together can there be a full characterization of the psychedelic experience in humans and other animals (M. W.).
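The two hypothesized mechanisms can be illustrated on a one-dimensional toy landscape. In the sketch below, a hypothetical 5-HT2a analogue adds independent Gaussian noise to a discretized energy function, a hypothetical 5-HT1a analogue applies Gaussian-kernel smoothing, and local minima (attractors) are counted after each operation. The noise model, kernel, grid, and example energy are assumptions for illustration; the preprint's energy function and operators may differ. Kernel smoothing is used here rather than uniform shrinkage toward a flat prior because shrinkage by a constant factor rescales the landscape without changing where its minima lie.

```python
import numpy as np


def count_local_minima(energy):
    """Count strict interior local minima (attractors) of a 1-D energy array."""
    e = np.asarray(energy, dtype=float)
    inner = e[1:-1]
    return int(np.sum((inner < e[:-2]) & (inner < e[2:])))


def perturb(energy, scale, rng):
    """Hypothetical 5-HT2a analogue: add i.i.d. Gaussian noise of a given scale."""
    return energy + scale * rng.standard_normal(energy.shape)


def smooth(energy, sigma):
    """Hypothetical 5-HT1a analogue: Gaussian-kernel smoothing (sigma in grid points)."""
    if sigma <= 0:
        return np.array(energy, dtype=float)
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(energy, radius, mode="reflect")  # reflect edges to avoid artefacts
    return np.convolve(padded, kernel, mode="valid")  # output has the input's length


rng = np.random.default_rng(0)
z = np.linspace(-3.0, 3.0, 301)
base = np.cos(4.0 * z) + 0.2 * z ** 2  # toy multi-well energy
perturbed = perturb(base, scale=0.3, rng=rng)
mixed = smooth(perturbed, sigma=5)
for name, e in [("base", base), ("5-HT2a-like", perturbed), ("mixed", mixed)]:
    print(name, count_local_minima(e))
```

With these settings the noisy landscape typically has far more minima than the base energy, and smoothing removes most of the spurious ones, which loosely mirrors the idea of perturbation and smoothing acting as complementary forces.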

PREDICTIVE PROCESSING AND THE REBUS MODEL

The REBUS model is rooted in the idea that the brain uses an internal generative model of the world to predict sensory input based on movement and past sensory experience (R. L.). This family of theories, collectively referred to as the predictive processing (PP) framework, includes predictive coding, hierarchical temporal memory, and Bayesian inference (K.). Common to each of these variations of the theory is the idea that an error signal between the predicted and actual sensory input is used to update an internal representation of the world. Within the Bayesian brain perspective in particular, these internal representations are referred to as beliefs. PP is often described as a processing hierarchy, and is referred to as hierarchical predictive processing (HPP) in the context of the human cortex, whereby higher levels of this hierarchy of functional brain areas send top-down signals to lower areas in the form of predictions of the bottom-up stimulus input to those areas. PP need not follow a strictly hierarchical arrangement of inter-areal connections; connections can also be lateral, as described by a densely recurrent network (R. P. N.) (see Figure for an illustration of a representational layer in the HPP model). Within this theoretical framework, prediction errors are dynamically adjusted using a gain modulation mechanism which gates neural plasticity in a context-dependent manner. Specifically, the neuromodulatory tone may shift the relative contribution of bottom-up and top-down signals and consequently alter the precision, or sensitivity, with which cortical circuits prioritize and respond to sensory information or report prediction errors. Representations encoded with greater precision are thus able to exert a greater influence on both upstream and downstream computation.
The influence of prediction errors can be modulated according to the internal latent cortical representation of the sensory stimuli, which would determine the extent to which bottom-up inputs are used to update the internal model, or how much the internal representation is driven by top-down predictions. While predictive processing focuses on the brain's generation and adjustment of predictions to minimize prediction errors, the free energy principle (FEP) (K.), which inspired the REBUS model, provides a broader theoretical perspective, framing these processes as part of a larger imperative for biological systems to minimize "free energy" and, by extension, surprise or unpredictability in their interactions with the environment. When predictions and sensory input do not match, the discrepancy between top-down and bottom-up inputs generates prediction errors (cf. Figure). Minimizing these prediction errors aligns with the free energy principle's goal of reducing surprise or uncertainty. According to the REBUS model, the spontaneous activation of neurons due to increased postsynaptic 5-HT2a agonism results in a desynchronization of those neuronal populations, and thus a disruption in their ability to robustly encode high-level beliefs or internal representations of the world. This disruption is equivalent to a decrease in the precision of the beliefs which these representations encode, thus also making them more malleable. Under the REBUS formulation, psychopathology pertaining to cognitive inflexibility can be interpreted as inference under aberrant precision with respect to strong, rigid priors (R.). 5-HT2a modulation by psychedelics is thought to relax the excessive precision of such priors and enable the revision of maladaptive priors through experience-dependent learning.
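Precision weighting can be made concrete with a single-level Gaussian predictive-coding toy. Assume a belief mu about a latent cause, a sensory sample x, a prior mean m, and precisions pi_sensory and pi_prior. Gradient descent on the free energy F(mu) = 0.5 * pi_sensory * (x - mu)^2 + 0.5 * pi_prior * (mu - m)^2 drives mu to the precision-weighted average (pi_sensory * x + pi_prior * m) / (pi_sensory + pi_prior). Lowering pi_prior, loosely analogous to the REBUS claim that psychedelics relax the precision of priors, lets the sensory evidence dominate. This is a textbook-style sketch with hypothetical names and values, not the model used in the preprint.

```python
def infer_belief(x, prior_mean, pi_sensory, pi_prior, lr=0.05, steps=500):
    """Gradient descent on F(mu) = 0.5*pi_s*(x - mu)**2 + 0.5*pi_p*(mu - m)**2."""
    mu = prior_mean  # start from the top-down prior expectation
    for _ in range(steps):
        sensory_error = x - mu         # bottom-up prediction error
        prior_error = mu - prior_mean  # deviation from the top-down prior
        # -dF/dmu: the two precision-weighted errors pull mu in opposite directions
        mu += lr * (pi_sensory * sensory_error - pi_prior * prior_error)
    return mu


x, m = 1.0, 0.0
rigid = infer_belief(x, m, pi_sensory=1.0, pi_prior=10.0)   # overly precise prior
relaxed = infer_belief(x, m, pi_sensory=1.0, pi_prior=0.1)  # relaxed prior precision
print(round(rigid, 3), round(relaxed, 3))  # prints 0.091 0.909
```

The step size is kept small enough that lr * (pi_sensory + pi_prior) stays below 2, so the iteration converges to the fixed point rather than oscillating.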
At the neural level, the disruption of high-level functional connectivity induced by psychedelics is accompanied by a decrease in activity and correlation within and between these networks, particularly those involved in self-referential processing. This includes regions of the default-mode network (DMN), a brain network that shows increased endogenous fluctuations when an individual is awake but not engaged in a task requiring attention. Other reported effects of psychedelics include a reconfiguration of communication in the brain characterized by increased brain activity, increased diversity of neural activation patterns compared to normal waking consciousness, elevated Shannon-Boltzmann entropy of intra-brain-network synchrony (R. L.), and induced "metaplasticity", which is thought to be one of the mechanisms underlying the establishment of critical periods. We can draw a link between an organism's ability to minimize prediction errors, which can be interpreted as a definition of free energy within HPP, and its long-term behavioral success. Theoretical work has further linked this success in minimizing prediction errors not only to successful behavioral outcomes but also to positive mental health outcomes. Large prediction errors in cortical representation correspond to the agent's inability to resolve the uncertainty in its environment. There is both theoretical and empirical research connecting a failure of prediction error minimization to negatively valenced affect in an individual's subjective experience. For example, greater felt uncertainty is often characteristic of anxiety and mood disorders. This is particularly the case when the inability to resolve prediction errors occurs at higher levels of the cortical hierarchy, as this suggests a fundamental maladaptivity in the foundational beliefs or attentional policies of the organism.
Within this work we use this connection between prediction error and negatively valenced affect to derive a measure of the potential therapeutic efficacy of a given pharmacological intervention. A critical characteristic of a successful learning system is robustness to changing tasks, contexts, and environments. A maladaptive neural representation is one which is locally optimal with respect to the current environmental context, characterized by a particular history-dependent frame of reference, but sub-optimal with respect to future contextual frames of reference. The canalization model of psychopathology characterizes the tendency to be caught in high-precision local optima of the cortical optimization landscape as playing a causal role in a variety of psychopathologies (R. …). A similar "plasticity loss" phenomenon has been observed in adaptive learning algorithms in the context of continual learning in non-stationary environments. An implication of connecting prediction error minimization with behavioral fitness is that confident neural responses are not inherently pathological if they are both relevant to the current environmental context and adaptable to future contexts. Indeed, such high-precision, stable, and adapted representations may provide a buffer against various forms of psychopathology. Conversely, even if certain encoded predictions are amenable to change (more adaptable), they can still be maladaptive if they cannot instantiate a behavioral policy robust enough to ensure success in a given environmental context. Crucially, the ideal situation involves learning models which are both adapted to the current environmental context and malleable enough to allow future adaptation when the environmental context changes.
This highlights the stability-plasticity dilemma: a system must be plastic enough to integrate novel information, yet stable enough not to catastrophically interfere with consolidated knowledge.

IDENTIFYING A KEY ROLE FOR PREFRONTAL CORTEX

In this work we focus in particular on the role of the prefrontal cortex (PFC) as a target site of both endogenous serotonergic neuromodulation and exogenous neuromodulation by psychedelics. The PFC is involved in higher cognitive functions such as decision-making, working memory, and executive control, and is a strong candidate region on which to focus for a number of reasons. It contains a large number of neurons expressing both 5-HT2a and 5-HT1a postsynaptic receptors, and it is a primary target of 5-HT neuromodulation arising from the dorsal raphe nucleus. These facts alone would make it likely that the PFC plays a significant role in the action of psychedelic drugs, and there is clear neuroimaging evidence that this is the case (Wood, …). The PFC is also implicated in a variety of psychopathologies ranging from depression to schizophrenia, making it an ideal target region for a pharmacological intervention which may address an underlying causal factor of psychopathology (R. …). The PFC also represents high-level beliefs and goals of the organism concerning self, others, and world, as well as the high-level behavioral and attentional policies required to achieve those goals. These collectively make up the primary psychological material involved in psychedelic-assisted psychotherapy, as well as psychotherapy more broadly. Indeed, within the REBUS model the PFC is positioned as one of the primary regions responsible both for the acute effects of psychedelics and for their more lasting therapeutic effects, given its position at the top of the predictive processing hierarchy. Given its role in the maintenance of critical representations of self, world, and goals, as well as its causal influence on downstream cortical dynamics and ultimately behavior, we consider the prefrontal cortex to be a prime candidate to model within our theoretical framework and simulations.
The PFC also expresses strong recurrent dynamics (S. …), which are necessary for the formation and maintenance of stable attractors responsible for the representation of high-level beliefs. In our context, these stable attractors, corresponding to minima in the optimization landscape of the PFC, represent high-level beliefs concerning the organism's overall fitness in a given environmental context, its high-level goals, and its mental and physical behavioral policies. It is precisely this set of beliefs which psychedelic-assisted psychotherapy has been shown to be capable of altering.

ENERGY-BASED MODELS OF NEURAL DYNAMICS

Dynamical systems approaches have become increasingly popular in computational neuroscience, being used to model both single neurons and the population responses of large numbers of neurons. The dynamical systems approach explicitly describes neural population responses as time-varying trajectories in a high-dimensional state space and views the dynamics as acting to shape these trajectories. The brain operates in the face of substantial uncertainty due to ambiguity in the inputs and inherent unpredictability in the environment. Probabilistic inference offers a principled framework for understanding both behaviour and cortical computation. Based on the idea that cortical circuits implement Bayesian inference in latent variable models, it has been proposed that neural firing rates might be viewed as representing Monte Carlo (MC) samples from the posterior distribution over the latent variables capturing the neural response, given the observed input or stimuli. In this view, neural response variability reflects the uncertainty about world parameters that is inherent in any stimulus. The PFC forms a recurrent network whose spiking activity encodes multiple types of learning-relevant information. A number of plausible mechanisms have been proposed for the persistence of neural activity in a population of neurons for the time a stimulus is perceived, and there is significant evidence of this being the case for many clinically relevant sub-networks of the cortex, and for the PFC in particular (J. X. …). These stable and persistent patterns of neural activity are thought to be the result of attractor dynamics within the cortex. This assumption of the existence of cortical attractor dynamics has formed the basis for multiple recent theoretical models of psychedelics' effects on brain activity. Energy functions can represent various aspects of neural activity, including factors such as synaptic weights, membrane potentials, and firing rates.
One common approach is to define an energy function analogous to the Hamiltonian in classical mechanics. This energy function captures the total "energy" of the neural network. Changes in neural activity and synaptic connectivity can be modeled as gradients in this energy landscape, driving the system towards stable states (attractors) or trajectories corresponding to specific cognitive processes or behaviors. Here we study the abstract properties of PFC neural circuits by examining the dynamics of neural activity, their evolution over time, and whether they tend to converge to stable attractor states. Energy functions can be used to analyze the stability of these states by assessing how changes in neural activity affect the total energy of the system. In this perspective, attractors are minima in the energy landscape in which a stable and persistent configuration of neural activity is reached. From a Bayesian perspective, this persistent activity corresponds to an internal belief state, and the difficulty the system has in escaping this configuration corresponds to the precision with which the belief is encoded. In complex systems, the energy landscape represents all possible configurations and their associated energies, with the system evolving towards lower energy configurations. Understanding the structure of the energy landscape helps predict the behavior and stability of dynamical systems. In optimal control problems, manipulating inputs or synaptic connections aims to minimize an optimization objective, which determines the energy function and the associated probabilistic model it generates. Within the hierarchical predictive processing (HPP) framework, this objective functional calibrates the energy function and shapes the landscape governing cortical dynamics based on the prediction error between decoded and incoming stimuli.
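To make the attractor picture concrete, the sketch below runs plain gradient descent on a hypothetical one-dimensional double-well energy. The energy function, its tilt, and the step size are illustrative assumptions, not the authors' model. Each starting point settles into the minimum of its own basin, and the tilted well on the right is deeper, which in the terms used above corresponds to a belief encoded with higher precision.

```python
def energy(z):
    # Hypothetical 1-D double-well landscape with minima near z = -1 and z = +1;
    # the linear tilt makes the well near z = +1 deeper than the one near z = -1.
    return (z ** 2 - 1.0) ** 2 - 0.2 * z


def grad_energy(z):
    # Analytic derivative dE/dz of the energy above.
    return 4.0 * z * (z ** 2 - 1.0) - 0.2


def descend(z0, step=0.01, n_steps=2000):
    """Deterministic gradient descent: the state slides downhill into the nearest attractor."""
    z = z0
    for _ in range(n_steps):
        z -= step * grad_energy(z)
    return z


left = descend(-0.5)   # converges near z = -0.974 (shallow well)
right = descend(0.5)   # converges near z = +1.024 (deep well)
```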
Following these premises, we propose an optimization algorithm within the predictive processing framework which utilizes an energy-based model (EBM) of neural dynamics, a.k.a. energy networks. We consequently assume that the attractor dynamics operate within the HPP framework (cf. Figure), whereby higher or downstream functional networks attempt to predict incoming information from lower or upstream networks via top-down predictions (cf. …).

METHODS

Our theoretical model is based on existing work using EBMs to model neural responses, but differs in a number of important ways. (i) We first describe a general algorithm for learning and inference in EBMs of neural responses to stimuli in the context of HPP; (ii) Then we present our algorithm with added serotonin modulation; (iii) Lastly, we introduce the metrics of interest considered in our simulation results presented in Section 3. For additional details on the theoretical model and relevant connections to similar models, see Appendix B. For additional details on the specific parameters used in the simulations, see Appendix C.

ENERGY-BASED MODEL PRELIMINARIES

Taking HPP as a starting point, we assume the brain is modeling the distribution of possible interoceptive and exteroceptive sensory signals which can be encountered in a given environmental context. Concretely, at any moment in time an organism is receiving some sensory stimulus x, distributed according to some probability distribution p(x). We model a single layer of a predictive processing hierarchy (Figure 2(a)), and thus our framework is generic enough to remain agnostic with respect to the signal's origin. In this work, however, we assume x is internal to the organism and corresponds to a representation upstream of the PFC. The goal of the model is to produce a downstream neural response z which is capable of predicting x. To do so, the model must learn to correctly represent the distribution p(x). The distribution of stimuli is a function of some environmental context, which may persist for minutes, hours, days or weeks. This context may correspond to a particular configuration of social relationships, the physical environment, or goals which govern the individual's mental and physical behavior. In psychedelic-assisted therapy, an environmental context would correspond to the "set and setting" for a particular drug-induced experience. In order to simplify the learning procedure, we consider a fixed environmental context, and correspondingly a stationary distribution over stimuli. In order to learn a response function which can model the stimuli distribution p(x), we take inspiration from control theory and machine learning, and employ EBMs, a.k.a. energy networks. EBMs are models which perform inference and learning implicitly through an energy function, typically parameterized by a multi-layer or recurrent neural network. We follow prior approaches from the literature and consider inference and learning in EBMs where the energy function is defined over possible neural responses, or patterns of neural activity.
To begin with, any probability density function p(x) can be expressed via a Boltzmann distribution as
$$p(x) = \frac{\exp(-E(x))}{Z}, \qquad Z = \int_{X} \exp(-E(x))\,dx, \tag{1}$$
where E(x) is an energy function that maps input x ∈ X to a scalar, and Z is the normalizing constant with respect to x (also known as the partition function). Ideally, an energy function should assign low energy values to samples drawn from the data distribution, and high values otherwise. In computational models of learning in the brain using EBMs, inference is carried out through neural sampling, while learning happens via long-term synaptic plasticity. The key challenges are the following: (i) the intractable partition function Z means that learning the energy function via maximum likelihood estimation is not always straightforward, particularly under normative constraints such as locality; (ii) the inference procedure via neural sampling should be able to explore complex multi-modal distributions while being fast enough not to violate normative neural constraints.
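Eq 1 can be checked numerically when the state space is one-dimensional and discretized on a grid, which is also the kind of non-parametric representation used in the simulations. The sketch below is an illustration under that assumption: `boltzmann` is a hypothetical helper that approximates Z with a Riemann sum, and for E(x) = x²/2 the partition function should come out close to √(2π).

```python
import math


def boltzmann(energies, dz):
    """Turn energies sampled on a uniform grid into a normalized density p = exp(-E) / Z."""
    weights = [math.exp(-e) for e in energies]
    Z = sum(weights) * dz  # Riemann-sum approximation of the partition function
    return [w / Z for w in weights], Z


dz = 0.01
grid = [-5.0 + i * dz for i in range(1001)]
# E(x) = x^2 / 2 is the energy of a standard normal, so Z should be close to sqrt(2 * pi).
density, Z = boltzmann([0.5 * x * x for x in grid], dz)
```

Low-energy grid points receive high density, as required of a well-behaved energy function; in higher dimensions the same sum becomes intractable, which is challenge (i) above.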

PREDICTIVE PROCESSING USING EBMS

The structure of the EBM is such that synaptic connectivity, captured with parameters θ, can be functionally represented by means of a mapping between stimuli x and neural responses z, which defines a joint distribution $p_\theta(x, z)$. By manipulating the parameters or synaptic connections θ, we may influence the dynamics of neural activity and guide the system towards desired states or behaviors. EBMs can provide a framework for inference and learning using spatially localized rules, which fit normative theories that postulate plasticity can only be affected by locally available information, and thus refrain from violating neural processing constraints. Following prior research emphasizing the role of gradients in neural computation, we use a gradient-based learning algorithm within the predictive processing framework, which corresponds to learning the parameters θ defining the synaptic connections such that the distribution of the stimuli under the model $p_\theta(x)$ matches as accurately as possible the perceived distribution of stimuli p(x) an organism may encounter in a given environmental context. This is carried out by maximum likelihood estimation with respect to θ,
$$\max_\theta \; \mathbb{E}_{x \sim p(x)}\big[\log p_\theta(x)\big] \;\Longleftrightarrow\; \min_\theta \; D_{KL}\big(p(x), p_\theta(x)\big),$$
with $D_{KL}(p, q)$ the Kullback-Leibler (KL) divergence between p and q. The marginal likelihood $p_\theta(x) = \int_z p_\theta(z)\, p_\theta(x \mid z)\, dz$, however, is computationally intractable in most cases of interest. One approach to this challenge is to approximate the posterior distribution $p_\theta(z \mid x)$ with another parameterized density model $p_\phi(z \mid x)$, using synaptic connection weights ϕ.
We then minimize the negative evidence lower bound (ELBO) on the marginal likelihood, $-\mathbb{E}_{x \sim p(x)}[\log p_\theta(x)] \le \mathcal{L}(\theta, \phi)$, with respect to both the generative parameters θ and the variational parameters ϕ, where
$$\mathcal{L}(\theta, \phi) = \mathbb{E}_{x \sim p(x)}\, \mathbb{E}_{z \sim p_\phi(z \mid x)}\big[\log p_\phi(z \mid x) - \log p_\theta(x, z)\big].$$
To estimate the loss gradient with respect to the generative parameters θ, given individual stimuli sampled from $x^{(i)} \sim p(x),\ \forall i \ge 0$, and generated samples from the posterior $z^{(i)} \sim p_\phi(z \mid x^{(i)})$, we may use an online estimate $\nabla_\theta \mathcal{L}(\theta, \phi) \doteq -\nabla_\theta \log p_\theta(x^{(i)}, z^{(i)})$ and then update the parameters θ with gradient descent, θ′ ← θ + Δθ, where $\Delta\theta \doteq -\alpha\, \nabla_\theta \mathcal{L}(\theta, \phi)$ and α is a small positive learning rate. Similarly, to estimate the gradient with respect to the parameters of the recognition model ϕ, given the same stimuli $x^{(i)}$ and posterior samples $z^{(i)}$, we may use the online score-function estimate $\nabla_\phi \mathcal{L}(\theta, \phi) \doteq \big(\log p_\phi(z^{(i)} \mid x^{(i)}) - \log p_\theta(x^{(i)}, z^{(i)})\big)\, \nabla_\phi \log p_\phi(z^{(i)} \mid x^{(i)})$, or an empirical average over several samples; the term $\log p_\theta(x^{(i)}, z^{(i)}) - \log p_\phi(z^{(i)} \mid x^{(i)})$ can be viewed as a reward in the context of the "weight transport" or credit assignment problem. We may then update the variational parameters ϕ with gradient descent, ϕ′ ← ϕ + Δϕ, where $\Delta\phi \approx -\alpha\, \nabla_\phi \mathcal{L}(\theta, \phi)$. In EBMs we may represent the generative and variational parameters jointly through $p_\theta(x, z)$, without explicitly modelling the posterior $p_\theta(z \mid x)$, as long as we can still draw samples from it. Samples from the posterior or the joint model may be obtained using stochastic sampling by means of a Monte Carlo gradient estimator.
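The online update rule θ′ ← θ + Δθ with Δθ = −α∇θL can be seen in its simplest form on a fully observed Gaussian EBM, with no latent z and no recognition model. This is a deliberately reduced sketch, assuming the hypothetical energy E_θ(x) = (x − θ)²/2, whose partition function does not depend on θ; it does not reproduce the ELBO machinery above. With a small learning rate, the online estimate of θ drifts to the mean of the stimulus distribution, which is the maximum likelihood solution.

```python
import random


def nll_grad(theta, x):
    # For E_theta(x) = 0.5 * (x - theta)**2 the partition function Z = sqrt(2*pi)
    # does not depend on theta, so d/dtheta[-log p_theta(x)] = dE/dtheta = -(x - theta).
    return -(x - theta)


def fit(samples, alpha=0.01, theta=0.0):
    """Online gradient descent on the negative log-likelihood, one stimulus at a time."""
    for x in samples:
        delta = -alpha * nll_grad(theta, x)  # Δθ = -α ∇θ L
        theta = theta + delta                # θ' ← θ + Δθ
    return theta


rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(5000)]  # stimuli x ~ p(x) = N(2, 1)
theta_hat = fit(data)
```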

SAMPLING FROM EBMS

The neural sampling perspective of probabilistic inference in the cortex posits that the brain infers a posterior distribution over neural responses consistent with Bayesian inference. To model the variability in neural responses, it has been proposed to interpret neural responses as Monte Carlo samples of the posterior distribution $p_\phi(z \mid x)$ or the joint model $p_\theta(x, z)$. To work around the need to represent the posterior explicitly, we may also use the relationship
$$p_\theta(z \mid x) = \frac{p_\theta(x, z)}{p_\theta(x)} \propto p_\theta(x, z) \propto \exp\big(-E_\theta(x, z)\big).$$
The sampling process itself involves iteratively descending the energy landscape in order to arrive at a representation where the energy is locally minimal. Prior works have developed and applied a number of approaches to sample from the posterior or the joint efficiently by defining the inference dynamics as walks in the latent space of neural responses, e.g., Gibbs sampling, Langevin sampling, and Hamiltonian sampling.

OPERATIONALIZING SEROTONERGIC NEUROMODULATION

Within the framework we lay out, pathologies in optimization correspond to failures to find stable and predictive configurations of neural activity, and likewise to a failure to encode adaptive beliefs or internal representations. If these pathologies are severe and prolonged enough, they may lead to downstream failures in behavioral fitness and the development of psychopathology associated with either over- or under-canalization (R. …). These optimization issues can include the inability to find global optima (J. …), ill-conditioning of the energy landscape (manifesting as high sensitivity to small changes in the input and causing slower convergence to a stable attractor), overfitting, and plasticity loss. Using Markov Chain Monte Carlo (MCMC) we can sample from the posterior distribution over neural responses $p_\theta(z \mid x)$ by translating the density function for this distribution into a "potential" energy function $E_\theta(x, z)$ using Eq 1. We then simulate a Markov chain from a stochastic process designed to explore the posterior distribution of neural responses, given observed data. Langevin dynamics (LD) has been applied to sample from a posterior distribution, as Bayesian inference, by performing stochastic gradient descent on the manifold of the energy function defined as the negative log-posterior distribution,
$$z_{t+1} = z_t - \eta\, \nabla_z E_\theta(x, z_t) + \sqrt{2\eta}\, \xi_t, \qquad \xi_t \sim \mathcal{N}(0, I), \qquad E_\theta(x, z) = -\log p_\theta(z \mid x) + \text{const},$$
with step size η. Alone, however, this response sampling procedure is slower than the processing speed of the cortex, and it is furthermore susceptible to the optimization pathologies described above. By introducing a surrogate objective via an auxiliary function (a "kinetic" energy function) that adds neuromodulation mechanics, the resulting stochastic dynamical system may be better able to explore the posterior density and escape from local minima of the "potential" energy landscape.
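A minimal sketch of (unadjusted) Langevin sampling follows, assuming a one-dimensional response and the energy E(z) = z²/2, i.e. a standard normal posterior. `langevin_sample` is a hypothetical helper, not the authors' code. Without a Metropolis correction the chain carries a small step-size bias (here a stationary variance of about 1.026 rather than 1), and successive samples are strongly correlated, which illustrates why plain Langevin dynamics can be slow.

```python
import math
import random


def langevin_sample(grad_energy, z0, step, n_steps, rng):
    """Unadjusted Langevin dynamics: z <- z - step * dE/dz + sqrt(2 * step) * N(0, 1)."""
    z = z0
    trace = []
    for _ in range(n_steps):
        z = z - step * grad_energy(z) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        trace.append(z)
    return trace


rng = random.Random(1)
# E(z) = z**2 / 2 is the negative log-density of a standard normal (up to a constant),
# so its gradient is simply z.
trace = langevin_sample(lambda z: z, z0=3.0, step=0.05, n_steps=100000, rng=rng)
burned = trace[1000:]  # discard the transient from the initial state z0 = 3
mean = sum(burned) / len(burned)
var = sum((z - mean) ** 2 for z in burned) / len(burned)
```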
We model the neuromodulatory effects of serotonin using an inference process with such neuromodulatory mechanisms, which indirectly affect the learning of the synaptic connectivity of the energy function over neural responses that induces the posterior distribution. We operationalize the effects of 5-HT2a and 5-HT1a agonism as modulators of the energy function landscape, and simulate their effects following existing empirical evidence, implementing in particular the following mechanisms.
Excitatory postsynaptic effect of 5-HT2a agonism. Given the excitatory postsynaptic effect of 5-HT2a agonism on cortical pyramidal cells, we model its role in the neural optimization process as a stochastic perturbation of the energy landscape used to infer the neuronal response. This stochasticity injection has been hypothesized to disrupt neural responses through the desynchronization of relevant neural populations, in a manner capable of either strengthening or relaxing the neural representations of populations of neurons depending on context.
Inhibitory postsynaptic effect of 5-HT1a agonism. We model the inhibitory postsynaptic effect of 5-HT1a agonism on cortical pyramidal cells via a local smoothing or regularization of the energy landscape. This serves to destabilize locally optimal, but potentially globally sub-optimal, attractor points of the optimization surface. By regularizing the energy function, the optimization algorithm is encouraged to explore the solution space effectively while still exploiting promising regions for optimization. This relaxation prevents the encoding of overly precise beliefs, which may become maladaptive in future contexts. It also helps prevent overfitting by promoting a broader and less confident exploration of the solution space, trading off exploration against exploitation.

CONNECTION TO HAMILTONIAN DYNAMICS

A connection can be drawn between our proposed mechanisms of 5-HT neuromodulation and Hamiltonian Monte Carlo (HMC), which relies on Hamiltonian dynamics to sample from posterior probabilities. HMC can be implemented by the interactions of recurrently coupled excitatory and inhibitory populations in cortical circuits, and has been previously used to simulate neural responses (T. …). In Hamiltonian dynamics, the state of the system behaves as a particle moving on a (high-dimensional) surface, frictionless but with momentum. The algorithm leverages an auxiliary variable, which can be interpreted as momentum in the context of neural adaptation, and uses it to adapt the inference dynamics. The surface determines the potential energy of the particle, corresponding to the negative logarithm of the probability distribution to be sampled, $E(x, z) \propto -\log p(z \mid x)$ (such that high-probability states correspond to low potential energy). The momentum variable, denoted e, is an auxiliary random variable capturing a "kinetic" energy function, defined using a normal distribution centered around the neural responses z with standard deviation σ, $p(e \mid x, z) = \mathcal{N}(z, \sigma)$, which corresponds to smoothing the probability density of z (5-HT1a effect). The momentum term accumulates smoothed stochastic gradients of the potential energy function, which are subject to a Gaussian convolution and to a stochastic perturbation by independent Gaussian-distributed random variables $\xi_t$ representing the stochastic noise terms (5-HT2a effect). As a consequence, the particle will accelerate as it heads towards the minimum of the "potential" energy landscape:
$$\frac{dz}{dt} = e, \qquad \frac{de}{dt} = -\,G_\sigma * \nabla_z E_\theta(z, x) + \xi_t,$$
where $G_\sigma *$ denotes convolution with a Gaussian kernel of standard deviation σ. The above equation indicates that the instantaneous change in neural response is driven by a momentum term e which accumulates stochastic smoothed gradients of an energy function $\nabla_z E_\theta(z, x)$, subject to injected stochasticity and smoothness.
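The momentum dynamics described above can be sketched as a discretized update in the spirit of stochastic-gradient HMC. Several choices here are assumptions rather than the authors' implementation: the friction term (added so that injected noise does not make the momentum grow without bound, unlike frictionless HMC), the step sizes, and estimating the Gaussian-smoothed gradient by averaging gradients at randomly jittered points. `momentum_dynamics` and `smoothed_grad` are hypothetical names. `sigma` plays the 5-HT1a-like smoothing role and `noise` the 5-HT2a-like perturbation.

```python
import random


def smoothed_grad(grad_energy, z, sigma, rng, n_probe=8):
    # Monte Carlo estimate of the Gaussian-smoothed gradient (G_sigma * dE/dz)(z):
    # average the gradient at points jittered by N(0, sigma^2). sigma = 0 gives the raw gradient.
    if sigma == 0.0:
        return grad_energy(z)
    return sum(grad_energy(z + rng.gauss(0.0, sigma)) for _ in range(n_probe)) / n_probe


def momentum_dynamics(grad_energy, z0, n_steps, step=0.05, friction=0.1,
                      sigma=0.0, noise=0.0, seed=0):
    rng = random.Random(seed)
    z, e = z0, 0.0  # neural response z and auxiliary momentum e
    for _ in range(n_steps):
        g = smoothed_grad(grad_energy, z, sigma, rng)     # 5-HT1a-like smoothing
        xi = rng.gauss(0.0, 1.0)                          # 5-HT2a-like perturbation
        e = (1.0 - friction) * e - step * g + noise * xi  # momentum accumulates gradients
        z = z + step * e                                  # response follows the momentum
    return z


# Without noise or smoothing, the damped particle settles at the minimum of E(z) = z**2 / 2.
z_det = momentum_dynamics(lambda z: z, z0=3.0, n_steps=2000)
# With both effects switched on, it wanders within a smoothed double-well landscape.
z_noisy = momentum_dynamics(lambda z: 4.0 * z * (z * z - 1.0), z0=0.5, n_steps=2000,
                            sigma=0.3, noise=0.05, seed=1)
```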
Thus, the neural adaptation and sampling process explores the space of neural responses by combining gradient-driven dynamics towards lower energy regions with stochastic noise and smoothing, allowing the system to cover a wider range of neural responses. After a sufficient number of time steps, the final neural response of the system can be considered as a sample from the distribution p θ (x, z) of interest.

CONCRETE ALGORITHM FOR 5-HT NEUROMODULATED EBMS

Above we presented idealized learning and inference algorithms in EBMs with neuromodulation mechanisms within an HPP framework. In this work we are primarily interested in studying the effect of exogenous 5-HT2a and 5-HT1a neuromodulation on the inference of neural responses in EBMs. In order to keep our analysis clear and tractable, we make a set of simplifications to the above procedures, summarized in Algorithm 1.
Algorithm 1:
  α_excite, α_inhibit ∼ drug(t)    ▷ Sample neuromodulator level
  δ_excite = α_excite ξ_t    ▷ 5-HT2a agonism effect
  …    ▷ 5-HT1a agonism effect
  …    ▷ Homeostatic plasticity effect
  for k = 0, 1, …, K − 1 do
    …    ▷ Recurrently infer neural responses
  end for
  …    ▷ Objective for approximate optimization of EBM
The values of these representations are then normalized to lie within the range [0, 1] at the start of the optimization process. Finally, we employ a simplified learning objective for E(z) ∝ −log p(z) to describe changes in synaptic connectivity, namely $\min_E \mathcal{L}(E)$, with the following neuromodulation parameters:
• α_excite: modulates the 5-HT2a excitatory drug effect
• α_inhibit: modulates the 5-HT1a inhibitory drug effect
• α_homeo: modulates the strength of homeostatic constraints

SAMPLING NEUROMODULATION LEVEL

The ranges for α excite and α inhibit can either be set constant for the duration of the optimization process, or be determined dynamically. Given that we are interested in exogenous neuromodulation effects from drugs, we use a function drug(t) to determine the neuromodulation levels as a function of time. This function uses a beta distribution to provide a response curve, where the probability density function at a given point corresponds to the drug strength at that point in time. This function is used to provide a simple and generic representation of prototypical change in blood plasma concentration levels of a drug over time.
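A possible drug(t) schedule is sketched below using only the standard library. The Beta shape parameters (a = 2, b = 5), the normalization of time by `t_max`, and the rescaling so that the curve peaks exactly at a chosen `peak` strength are all assumptions made for illustration; the text only states that a beta density provides the response curve. `beta_pdf` and `drug` are hypothetical helpers.

```python
import math


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution on (0, 1); zero outside the open interval."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def drug(t, t_max, peak, a=2.0, b=5.0):
    """Hypothetical drug(t): Beta(a, b) density over normalized time, rescaled so its maximum equals `peak`."""
    mode = (a - 1) / (a + b - 2)  # location of the density maximum for a, b > 1
    return peak * beta_pdf(t / t_max, a, b) / beta_pdf(mode, a, b)


# A fast rise and slower decay, loosely mimicking a plasma-concentration curve.
schedule = [drug(t, t_max=100.0, peak=0.8) for t in range(101)]
```

Separate calls with different `peak` values could supply α_excite and α_inhibit, so that a mixed or biased agonist is simply a pair of peak strengths.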

5-HT2A DRUG EFFECT

At each iteration t, a stochastic perturbation is applied to the auxiliary energy function Ẽ(z),
$$\delta^{\text{excite}}_t = \alpha^{\text{excite}}_t\, \xi^{\text{excite}}_t,$$
where $\xi^{\text{excite}}_t \sim E(\mathcal{N}, \sigma)$ is structured noise sampled from the energy function distribution and then scaled to lie within the range [−0.5, 0.5], and $\alpha^{\text{excite}}_t$ is the relative excitatory strength of 5-HT2a signaling.
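The 5-HT2a perturbation on a grid representation of Ẽ(z) can be sketched as follows. One assumption is loudly flagged: i.i.d. Gaussian noise stands in for the paper's structured noise ξ_t, which is not reproduced here. The rescaling to [−0.5, 0.5] and the scaling by α_excite follow the description above. `rescale` and `excite_delta` are hypothetical helpers.

```python
import random


def rescale(values, lo=-0.5, hi=0.5):
    """Min-max rescale a list of numbers into [lo, hi]."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0  # guard against a constant input
    return [lo + (hi - lo) * (v - vmin) / span for v in values]


def excite_delta(n_points, alpha_excite, rng):
    # Assumption: i.i.d. Gaussian noise stands in for the paper's structured noise xi_t.
    xi = rescale([rng.gauss(0.0, 1.0) for _ in range(n_points)])
    return [alpha_excite * x for x in xi]


rng = random.Random(0)
energy = [((i / 99) - 0.5) ** 2 for i in range(100)]  # toy E~(z) on a 100-point grid over [0, 1]
delta = excite_delta(len(energy), alpha_excite=0.2, rng=rng)
perturbed = [e + d for e, d in zip(energy, delta)]    # perturbation bounded by ±0.1
```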

5-HT1A DRUG EFFECT

We simulate the inhibitory effect of 5-HT1a agonism by applying a Gaussian filter kernel centered around Ẽ(z) and applying this effect to the auxiliary energy function Ẽ(z). We use a local Gaussian filter kernel G(σ_t), which is convolved with Ẽ(z), where σ_t denotes the size of the Gaussian kernel and $\alpha^{\text{inhibit}}_t$ the relative inhibition strength of 5-HT1a signaling.
Homeostatic plasticity effect. Homeostatic plasticity is a mechanism for stabilizing neural activity levels across a network, ensuring that neurons maintain an appropriate level of excitability. This regulatory mechanism restores stability and mitigates imbalances in the activity levels of neurons by regularizing the neuromodulated energy landscape towards the baseline, with $\alpha^{\text{homeo}}_t$ the relative strength of the update toward the previous value of the unmodulated energy function E(z).
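The two remaining effects can be sketched on a grid as additive updates. The exact formulas are not given in the text, so the forms below are assumptions chosen to match the description: the 5-HT1a update moves Ẽ toward its Gaussian-smoothed version, δ_inhibit = α_inhibit (G(σ) ∗ Ẽ − Ẽ), and the homeostatic update pulls Ẽ back toward the unmodulated baseline E, δ_homeo = α_homeo (E − Ẽ). Replicate padding at the grid edges is also an assumption. All function names are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Discrete Gaussian kernel with 2 * radius + 1 taps, normalized to sum to one."""
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_same(values, kernel):
    # 'same'-length convolution with edge values repeated (replicate padding).
    r = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out


def inhibit_delta(e_tilde, alpha_inhibit, sigma):
    # Assumed form: move E~ a fraction alpha_inhibit of the way toward its smoothed version.
    smoothed = convolve_same(e_tilde, gaussian_kernel(sigma, radius=int(3 * sigma) + 1))
    return [alpha_inhibit * (s - e) for s, e in zip(smoothed, e_tilde)]


def homeo_delta(e_tilde, e_base, alpha_homeo):
    # Assumed form: pull E~ back toward the unmodulated baseline energy E.
    return [alpha_homeo * (b - e) for b, e in zip(e_base, e_tilde)]


dip = [1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]  # a single sharp attractor at index 3
raised = inhibit_delta(dip, alpha_inhibit=0.5, sigma=1.0)  # smoothing raises the narrow minimum
```

Because the smoothing only raises sharp, narrow minima, it destabilizes steep attractors while leaving flat regions of the landscape untouched.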

APPLYING NEUROMODULATORY EFFECTS

The three neuromodulatory effects are then additively combined into a single term δ t and accumulated into Ẽ(z) before the sampling process takes place each iteration. The optimization procedure including the aforementioned neuromodulation mechanisms is described in Algorithm 1, and visualized in Figure.
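The overall loop can be sketched in a compact, self-contained form. Several simplifications are assumptions and do not reproduce Algorithm 1: a three-tap binomial kernel replaces the Gaussian filter, uniform noise in [−0.5, 0.5] replaces the structured noise, the smoothing and homeostatic updates use the assumed additive forms α (G ∗ Ẽ − Ẽ) and α (E − Ẽ), and a single Metropolis step on the grid stands in for the recurrent inference of neural responses. `run` and `smooth3` are hypothetical names; `alphas` is any callable returning the three strengths at iteration t.

```python
import math
import random


def smooth3(values):
    # Binomial [1/4, 1/2, 1/4] kernel: a small discrete approximation to a Gaussian filter.
    n = len(values)
    return [0.25 * values[max(i - 1, 0)] + 0.5 * values[i] + 0.25 * values[min(i + 1, n - 1)]
            for i in range(n)]


def run(e_base, alphas, n_iters, rng):
    e_tilde = list(e_base)
    idx = len(e_base) // 2  # current neural response z, as an index into the grid
    visited = set()
    for t in range(n_iters):
        a_exc, a_inh, a_hom = alphas(t)
        smoothed = smooth3(e_tilde)
        delta = [a_exc * rng.uniform(-0.5, 0.5)  # 5-HT2a-like perturbation
                 + a_inh * (s - e)               # 5-HT1a-like smoothing
                 + a_hom * (b - e)               # homeostatic pull to baseline
                 for e, s, b in zip(e_tilde, smoothed, e_base)]
        e_tilde = [e + d for e, d in zip(e_tilde, delta)]  # accumulate delta_t into E~
        # One Metropolis step on the grid as the sampling process (assumption).
        proposal = min(max(idx + rng.choice((-1, 1)), 0), len(e_tilde) - 1)
        if rng.random() < math.exp(min(0.0, e_tilde[idx] - e_tilde[proposal])):
            idx = proposal
        visited.add(idx)
    return e_tilde, visited


e_base = [10.0 * ((i / 49) - 0.5) ** 2 for i in range(50)]
e_placebo, _ = run(e_base, lambda t: (0.0, 0.0, 0.0), 200, random.Random(0))
e_drug, visited_drug = run(e_base, lambda t: (0.3, 0.2, 0.1), 200, random.Random(0))
```

With all strengths at zero the landscape is left untouched, which is a convenient placebo baseline for the metrics that follow.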

METRICS OF INTEREST

We consider six primary metrics of interest which we collect and analyze during each simulation experiment. Figure (1st row) contains a simple graphical representation of each metric under consideration, with the final two being represented by the rightmost graphic.

ENERGY FUNCTION VALUE

The energy function value corresponds to $\mathbb{E}_{z_t}[\tilde{E}(z_t)]$ for a set of sampled $z_t$ values at a given iteration t of the simulation. This metric provides insight into how a given intervention on the neuromodulatory system changes the energy level of sampled $z_t$ values relative to a non-modulated baseline over time. The EBM represents the probability of different responses implicitly, by the frequency with which its dynamics visit their representations. Higher energy of some neural response z means neural activity will likely visit z less frequently, whereas lower energy means neural activity represents z for a larger proportion of the time.
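With a grid representation of Ẽ(z), the expectation $\mathbb{E}_{z_t}[\tilde{E}(z_t)]$ reduces to a sample average over the grid indices the sampler visited. The sketch below assumes samples are stored as grid indices; `energy_value` is a hypothetical helper, and the baseline comparison is shown simply as a difference between a modulated and an unmodulated landscape evaluated on the same samples.

```python
def energy_value(e_tilde, sampled_indices):
    """Monte Carlo estimate of E_{z_t}[E~(z_t)] from grid indices visited by the sampler."""
    return sum(e_tilde[i] for i in sampled_indices) / len(sampled_indices)


e_modulated = [0.0, 1.0, 4.0]
e_baseline = [0.5, 1.5, 4.5]
samples = [0, 0, 1]
# Negative values: the modulated landscape assigns lower energy to the sampled responses.
relative = energy_value(e_modulated, samples) - energy_value(e_baseline, samples)
```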

GRADIENT MAGNITUDE

We consider the average gradient magnitude of the energy function at a given time-step. This is calculated by uniformly evaluating $\|\nabla \tilde{E}(z)\|_2$ for every possible value of z in our non-parametric representation. Functions with bounded gradient magnitudes are smooth, ensuring that changes in input variables result in proportionally limited changes in the function's output. This is crucial for maintaining stability and convergence of optimization algorithms towards optimal solutions. The average gradient magnitude value also corresponds to the expected precision assigned to the set of internal representations capable of being encoded by z. Larger values are desirable to the extent that the representations encoded by z are able to reduce prediction errors in both the current and future environmental contexts. This is more likely to be the case the more stationary the environmental context is over time. In non-stationary environments, a smaller average gradient magnitude over time will correlate with smoothness, and may be desirable as it will produce a more diverse sampling of z.
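On a one-dimensional grid the L2 norm of the gradient is just its absolute value, so the metric can be estimated with finite differences, as sketched below under the assumption of a uniform grid spacing `dz`. `mean_gradient_magnitude` is a hypothetical helper. For E(z) = z² on [0, 1] the exact average of |2z| is 1, and this finite-difference scheme recovers it.

```python
def mean_gradient_magnitude(values, dz):
    """Average |dE/dz| over a uniform 1-D grid (central differences inside, one-sided at the ends)."""
    n = len(values)
    grads = []
    for i in range(n):
        if i == 0:
            g = (values[1] - values[0]) / dz
        elif i == n - 1:
            g = (values[-1] - values[-2]) / dz
        else:
            g = (values[i + 1] - values[i - 1]) / (2 * dz)
        grads.append(abs(g))  # in 1-D the L2 norm of the gradient is its absolute value
    return sum(grads) / n


dz = 0.01
grid = [i * dz for i in range(101)]
avg = mean_gradient_magnitude([z * z for z in grid], dz)
```

The same differencing scheme is what `numpy.gradient` applies by default, should an array-based implementation be preferred.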

LOCAL MINIMA COUNT

We consider the number of local minima present in the energy function at a given time-step. This is calculated by uniformly evaluating our non-parametric representation of Ẽ(z) and determining the local gradient at each point. The local minima count reflects the complexity of the energy landscape (the more local minima, the more complex the landscape) and corresponds to possible stable attractor states in which the stochastic sampling process may enter. Within the context of our theoretical model, the number of local minima is predictive of how successful both the learning and inference processes will be. Fewer local minima increase the likelihood that the stochastic sampling process will infer a neural response closer to global minima of the energy function.
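Counting local minima on a grid amounts to finding interior points that lie strictly below both neighbours, which is equivalent to locating where the discrete gradient changes sign from negative to positive. The sketch below is an illustration assuming a one-dimensional grid; `count_local_minima` is a hypothetical helper, and plateaus and grid endpoints are deliberately not counted.

```python
def count_local_minima(values):
    """Count strict interior local minima of an energy sampled on a 1-D grid."""
    return sum(1 for i in range(1, len(values) - 1)
               if values[i] < values[i - 1] and values[i] < values[i + 1])


grid = [-2.0 + i * 0.01 for i in range(401)]
# A double well (z^2 - 1)^2 has two attractors, at z = -1 and z = +1.
n_minima = count_local_minima([(z * z - 1.0) ** 2 for z in grid])
```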

STATE VISITATION COUNT

We calculate state visitation count by tracking the number of unique values which z takes over the course of the optimization process in a given experiment. Within the context of our theoretical model, this metric illustrates the model's ability to cover a broad range of plausible interpretations of its input stimuli by exploring the entire posterior space, where better coverage (i.e. more visited states) leads to more accurate estimation of the energy function. As such, this metric can be seen as a proxy for diversity over neural states. The number of visited states z is correlated with the diversity of neural activity patterns which might be recorded using neuroimaging techniques. Given the inherently low-dimensional nature of our z values, we simply count unique states rather than computing typical complexity measures such as Lempel-Ziv complexity, which assumes the compressibility of states.
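Counting unique states is a set-cardinality computation. Because floating-point responses that should coincide may differ in their last bits, the sketch below snaps each value to an assumed resolution before counting; the `resolution` parameter and the `state_visitation_count` helper are hypothetical, not part of the described method.

```python
def state_visitation_count(trace, resolution=1e-6):
    """Number of distinct states visited, after snapping each value to a grid of the given resolution."""
    return len({round(z / resolution) for z in trace})


trace = [0.1, 0.1, 0.2, 0.3, 0.2]
n_states = state_visitation_count(trace)  # three distinct responses: 0.1, 0.2, 0.3
```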

ENERGY FUNCTION DIVERGENCE

We also track the surrogate objective $\mathcal{L}(\tilde{E}) = D_{KL}(p, p^*)$, where $p \propto \exp(-\tilde{E}(z))$ and $p^* \propto \exp(-E^*(z))$. This is the KL-divergence between the posterior probability distribution induced by the auxiliary variable, namely the surrogate energy function $\tilde{E}(z)$ resulting from the acute drug-induced neuromodulation, and the target posterior induced by $E^*$. This value is a proxy for the magnitude of the HPP prediction error that can be expected by sampling from p(z) at a given iteration, and thus smaller values correspond to reduced expected error. Within the context of our theoretical model, an optimal posterior over neural responses (i.e. $\mathcal{L}(\tilde{E}) = 0$) implies that an organism is able to infer stimuli correctly and reduce uncertainty relative to its environment. We conjecture this correlates with a reduction in stress and negative affect in an individual, and is consequently a clinically desirable outcome. Accordingly, we use this value relative to a placebo baseline as a proxy for the therapeutic efficacy of a given neuromodulatory intervention.
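On a grid, both posteriors are discrete Boltzmann distributions and the divergence is a finite sum. The sketch below assumes energies stored as lists over the same grid; `boltzmann_probs` and `kl_divergence` are hypothetical helpers. Subtracting the minimum energy before exponentiating is the usual log-sum-exp stabilization, and it also makes the result invariant to adding a constant to either energy, as expected of a quantity defined through normalized distributions.

```python
import math


def boltzmann_probs(energies):
    # p_i ∝ exp(-E_i), normalized after shifting by the minimum energy for numerical stability.
    m = min(energies)
    weights = [math.exp(-(e - m)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(e_tilde, e_target):
    """D_KL(p, p*) between the grid distributions induced by E~ and E*."""
    p = boltzmann_probs(e_tilde)
    q = boltzmann_probs(e_target)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


e_target = [0.0, 1.0, 2.0]
e_modulated = [2.0, 1.0, 0.0]
divergence = kl_divergence(e_modulated, e_target)  # > 0: the posteriors disagree
```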

DIVERGENCE TREND MONOTONICITY

We finally consider a measure that quantifies the evolution of the KL-divergence during optimization. In an ideal scenario where temporal monotonicity is achieved, the convergence curve of the optimization process would show a consistently decreasing trend over iterations. In practice, however, fluctuations or plateaus arise due to system noise and the influence of the … This metric quantifies fluctuations and transitory changes, thereby measuring the disruption in the learning process caused by neuromodulation. Within the context of our theoretical model, this corresponds to the extent to which uncertainty or misaligned internal representations are introduced by the pharmacological intervention. We therefore use this value as a measure of the tolerability of the acute experience. A perfectly tolerable intervention would strictly decrease the divergence through a robust, monotonically converging optimization process.
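The exact formula for this metric is not given in the text, so the sketch below shows two hypothetical ways of scoring departures from monotonic decrease of a divergence curve: the total upward variation (the sum of all increases) and the fraction of steps on which the divergence increases. Both are zero for a strictly decreasing curve, matching the notion of a perfectly tolerable intervention. Neither is claimed to be the authors' measure.

```python
def upward_variation(divergences):
    """Hypothetical monotonicity score: total increase of the divergence curve over time.

    Zero for a curve that never increases; larger values mean more non-monotone fluctuation.
    """
    return sum(max(0.0, b - a) for a, b in zip(divergences, divergences[1:]))


def increase_fraction(divergences):
    """Hypothetical alternative: fraction of consecutive steps on which the divergence rises."""
    steps = list(zip(divergences, divergences[1:]))
    return sum(1 for a, b in steps if b > a) / len(steps)


curve = [3.0, 2.0, 2.5, 1.0]  # one transient rise of 0.5 during an otherwise decreasing run
```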

RESULTS

In order to evaluate our proposed model of serotonergic neuromodulation on cortical dynamics, we run a series of idealized simulations of the algorithm presented above. We first examine the individual impacts of 5-HT2a and 5-HT1a neuromodulation on cortical optimization, finding that they respectively induce transient overfitting and underfitting in the optimization process of the posterior probability over neural responses. We correlate these phenomena with our metrics of therapeutic efficacy and tolerability, respectively. We next consider the effect of mixed agonism, and find that this maintains the desirable properties of both individual agonism types to provide both greater therapeutic efficacy as well as greater tolerability. Finally, we explore the full space of biased agonists and report the relative level of 5-HT2a to 5-HT1a agonism which optimally trades off between our proxy metrics of therapeutic efficacy and tolerability, finding it to be a biased 5-HT1a agonist.

5-HT2A AGONISM INDUCES TRANSIENT OVERFITTING

We first consider the effects of pure 5-HT2a agonism on the final KL-divergence values across a range of dose levels (see Figure, 2nd row). We find that only two levels of 5-HT2a agonism decrease the KL-divergence relative to the baseline: heavy [two-tailed t-test: t(99) = 3.1, p < 0.001] and max [t(99) = 4.83, p < 0.001] (column 5). Given that we use the KL-divergence objective function as a proxy for therapeutic efficacy, we predict that only heavy and max levels of 5-HT2a agonism may be capable of inducing a therapeutically significant effect. When we examine the energy function values over the course of the experiment, we observe that across all dose ranges there is a transient acute reduction of the energy value below the baseline. Furthermore, at the heavy and max dose levels, this acute reduction falls below the final values in the post-acute stage of the simulation (column 1). We also find that the gradient magnitude increases over baseline during the acute drug effect, and that this effect is likewise correlated with dose strength, with heavy […]. Turning to the remaining metrics, we observe that the local minima count decreases relative to the baseline during the acute phase of neuromodulation for the heavy and max dose levels [both p < 0.001] (column 3). State visitation count, in contrast, increases with dose strength, with medium, heavy, and max doses producing significantly higher state counts than baseline [all p < 0.005] (column 4). Finally, we find that as the dose strength increases, the acute KL-divergence increases as well, with significant differences found for all levels [all p < 0.001] (column 5). As a result of 5-HT2a agonism, both the number of attractor states and the location and energy value of these states change significantly during the acute drug phase, and this effect is correlated with drug strength.
These results suggest that the number of attractor states decreases and that the optimization procedure exhibits oscillatory or diverging dynamics during the acute drug phase, an effect correlated with drug strength. This finding is consistent with observed increases in the entropy of cortical dynamics, as measured by fMRI, in individuals experiencing the acute effects of serotonergic psychedelics (R. L. …). Furthermore, the acute increase in divergence at heavy and max dose levels suggests that 5-HT2a agonism produces a state in which divergence is expected to increase, and thus may correlate with a more challenging subjective experience.
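The statistics reported above take the form t(99), which is consistent with (though not confirmed by this section as) a paired comparison over 100 simulation seeds, each run once with and once without the drug. Under that assumption, a minimal sketch of the paired t statistic follows; the function name and the sign convention (baseline minus drug, so a drug-induced decrease yields a positive t) are hypothetical.

```python
import math
import statistics

def paired_t(drug_runs, baseline_runs):
    """Paired t statistic and degrees of freedom for per-seed differences (baseline - drug)."""
    if len(drug_runs) != len(baseline_runs) or len(drug_runs) < 2:
        raise ValueError("need two equal-length samples with at least two runs each")
    diffs = [b - d for d, b in zip(drug_runs, baseline_runs)]
    n = len(diffs)
    # t = mean difference divided by its standard error (sample stdev / sqrt(n)).
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Toy per-seed values (hypothetical); with 100 seeds the degrees of freedom would be 99.
t, df = paired_t([0.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0])
print(f"t({df}) = {t:.3f}")  # t(3) = 3.873
```

Converting t into a two-tailed p-value requires the Student t distribution; `scipy.stats.ttest_rel(baseline_runs, drug_runs)` performs the same paired test and returns both the statistic and the p-value.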

5-HT1A AGONISM INDUCES TRANSIENT UNDERFITTING

We next consider the effects of pure 5-HT1a agonism on the metrics of interest across the same range of doses (see Figure, 3rd row). We first find that 5-HT1a agonism is unable to produce a lasting decrease in the KL-divergence between the learned and target posterior probability distributions, even at the max dose level (column 5). This suggests that, within the assumptions of our theoretical model, 5-HT1a agonism alone may have limited long-term therapeutic efficacy. On the other hand, during the acute phase of the simulation we find that 5-HT1a agonism transiently but significantly decreases the KL-divergence values for all levels of agonism [all p < 0.001] (column 5). This suggests that 5-HT1a agonists may provide acute therapeutic relief, even if these effects do not persist once the drug effect has ended. Similar to what we observe with 5-HT2a agonism, we find that 5-HT1a agonism is able to induce decreases in the post-acute energy function values. However, only the max dose level produces a significant decrease compared to the baseline [t(99) = 3.498, p = 0.001] (column 1). Unlike 5-HT2a agonism, 5-HT1a agonism has the opposite effect during the acute phase of the drug, increasing the energy value above baseline in a manner correlated with drug strength, with medium, heavy, and max doses producing significant acute increases [all p < 0.001]. Examining the change in gradient magnitude, we find the inverse of the effect of 5-HT2a agonism, with 5-HT1a agonism inducing significant transient decreases in gradient magnitude as a function of drug strength for all dose levels [all p < 0.001] (column 2). Transient increases in energy values together with decreases in gradient magnitude can be interpreted as the induction of transient underfitting in the neural response inference procedure during the acute effects of 5-HT1a agonism.
This transient underfitting is equivalent to belief relaxation, where the precision of encoded beliefs is significantly decreased. These transient effects are also correlated with the acute decrease in the KL-divergence objective metric. We can connect this finding to the clinically observed phenomenon of temporary relief from maladaptive beliefs in individuals during the acute administration of drugs such as MDMA. Examining the remaining metrics, we find that 5-HT1a agonism produces a significant acute decrease in the number of local minima within the energy function, larger than that produced by 5-HT2a agonism (column 3), and that this effect is correlated with dose strength [all dose levels above min: p < 0.001]. The number of visited states is also significantly increased by 5-HT1a agonism (column 4), though to a considerably smaller degree than under 5-HT2a agonism, with only the max dose level [t(99) = -2.104, p = 0.037] producing significantly greater state counts than baseline. These results suggest that 5-HT1a agonism alone is unlikely to produce increases in neural response diversity comparable to those that result from the administration of serotonergic psychedelics (R. L. …).
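The abstract characterizes 5-HT1a agonism as a global smoothing of the energy function. The sketch below illustrates how such smoothing can lower the two acute metrics discussed above, the local-minima count and the mean gradient magnitude, using a 1D periodic grid rather than the paper's 2D response space. The circular moving-average filter, the toy energy, and all function names are assumptions for illustration only.

```python
def smooth(energy, radius):
    """Circular moving average: a hypothetical stand-in for a global smoothing of E."""
    n = len(energy)
    width = 2 * radius + 1
    return [sum(energy[(i + k) % n] for k in range(-radius, radius + 1)) / width for i in range(n)]

def count_local_minima(energy):
    """Number of cells strictly lower than both neighbours on a periodic 1D grid."""
    n = len(energy)
    return sum(1 for i in range(n) if energy[i] < energy[i - 1] and energy[i] < energy[(i + 1) % n])

def mean_gradient_magnitude(energy):
    """Mean absolute finite difference between neighbouring cells (periodic, unit spacing)."""
    n = len(energy)
    return sum(abs(energy[(i + 1) % n] - energy[i]) for i in range(n)) / n

energy = [4, 1, 2, 0, 2, 1, 4, 4, 4]  # toy energy with three local minima
smoothed = smooth(energy, radius=2)
print(count_local_minima(energy), count_local_minima(smoothed))  # 3 1
print(round(mean_gradient_magnitude(energy), 3), round(mean_gradient_magnitude(smoothed), 3))  # 1.333 0.489
```

For any averaging filter with non-negative weights on a periodic grid, the mean absolute gradient cannot increase (the total variation is non-increasing). The drop in the minima count, by contrast, holds for this example but is not guaranteed for every energy and filter width.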

EFFECT OF BALANCED SEROTONIN AGONISM

We next report the effect of applying evenly balanced 5-HT2a and 5-HT1a agonism on the metrics of interest across a range of doses (see Figure, 4th row). We find that these neuromodulation patterns are able to produce final KL-divergence values that are significantly lower than the baseline for the medium, heavy, and max dose levels [all p < 0.001] (column 5). This suggests that, under our conjectured therapeutic correlation, these dose levels may provide long-term therapeutic benefit. We also observe that for all doses except max, there is a transient decrease in KL-divergence values during the acute phase of the drug. We correlate this observation with possible short-term therapeutic benefits of balanced 5-HT1a/5-HT2a agonism, as with pure 5-HT1a agonism. Balanced agonism also produces significant decreases in post-acute energy function values at the medium, heavy, and max dose levels [all p < 0.001] (column 1). Furthermore, these values are lower than those of either pure 5-HT2a or pure 5-HT1a agonism at the equivalent dose level. The acute energy function values for balanced agonists are also neither above nor below pre- or post-acute levels. We also observe that mixed agonism produces significant transient decreases in gradient magnitude at all dose levels [all p < 0.001], but that these are less extreme than those of equivalent pure 5-HT1a agonism. Taken together, these results suggest that balanced agonism is able to reduce post-acute energy function values without introducing either predominant underfitting or overfitting of the neural response function during the acute phase. Given the transient but unstable decrease in gradient magnitude, we can still interpret these interventions as inducing a form of belief relaxation, although one that is not as consistent as that of pure 5-HT1a agonism.
It also differs from pure 5-HT1a agonism in that the states being sampled have a higher likelihood than they would under the baseline (as a result of the lower energy value), but a lower precision (as a result of the lower average gradient magnitude). Turning to the remaining metrics, we find that balanced agonism induces a decrease in the number of local minima during the acute drug phase that is greater than that seen for either 5-HT1a or 5-HT2a agonism alone [all except min dose level: p < 0.001] (column 3). Correspondingly, state visitation count is also higher under balanced agonism than under either 5-HT2a or 5-HT1a agonism, with all dose levels above min producing significantly more visited states [p < 0.01] (column 4). This would likewise correspond to a measurable increase in neural response diversity, which is one of the signatures of mixed serotonergic psychedelics such as LSD and psilocybin (R. L. …). Relative to the monotonic decrease of the optimization objective over the course of the simulation observed in the baseline, here we observe significant increases in KL-divergence [all p < 0.001] (column 5). Under our therapeutic formulation, this suggests that balanced agonists may also raise questions of tolerability. Notably, however, the KL-divergence trend is more monotonic than that of 5-HT2a agonists, suggesting an overall better potential tolerability profile. Collectively, these results suggest that balanced agonists may provide greater short- and long-term therapeutic efficacy than either 5-HT2a or 5-HT1a agonists, in addition to a better tolerability profile than pure 5-HT2a agonists.
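State visitation count, used above as a marker of neural response diversity, requires deciding when two continuous responses count as the same state. A minimal sketch, assuming the 2D response $z$ is binned into square cells of a chosen width (the cell size, the sample trajectory, and the function name are hypothetical), is:

```python
import math

def state_visitation_count(trajectory, cell_size=0.25):
    """Number of distinct square cells of side `cell_size` visited by a 2D trajectory of z."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    cells = {(math.floor(z1 / cell_size), math.floor(z2 / cell_size)) for z1, z2 in trajectory}
    return len(cells)

trajectory = [(0.1, 0.1), (0.2, 0.3), (1.5, 0.1), (0.05, 0.9)]  # hypothetical samples of z
print(state_visitation_count(trajectory, cell_size=1.0))   # 2
print(state_visitation_count(trajectory, cell_size=0.25))  # 4
```

`math.floor` is used instead of `int` so that negative coordinates are binned consistently; as the two calls show, the count depends strongly on the chosen cell size.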

EXPLORING THE OPTIMAL SEROTONIN AGONISM BIAS

Above we found a synergistic effect of balanced agonism on post-acute values of the KL-divergence as well as on the state visitation and acute local minima counts. Balanced agonism also moderated the non-monotonicity of the KL-divergence trend. Given these improvements over pure agonists, we hypothesized that a neuromodulation profile that ideally balances our conjectured proxy metrics for therapeutic efficacy and tolerability might involve biased rather than balanced mixed agonism. To test this, we simulated all combinations of 5-HT2a and 5-HT1a dose strengths. These results are presented in Figure. We consider the final values of the KL-divergence and the non-monotonicity of the KL-divergence trend over the optimization procedure as the two primary metrics of interest, given their interpretation within our model as proxies for therapeutic efficacy and tolerability, respectively. We find that a max dose of a balanced 5-HT2a and 5-HT1a agonist produces the greatest post-acute reduction in the optimization objective. Likewise, the general trend is that increases across both agonist ranges produce improvements in this metric, with a greater portion of the improvement coming from increases in 5-HT2a rather than 5-HT1a agonism. This result alone would imply that a maximal dose of a balanced agonist may provide the greatest long-term therapeutic benefit. In contrast, we find that the KL-divergence trend is most monotonic at low levels of both 5-HT2a and 5-HT1a agonism, but is most strongly affected by increases in 5-HT2a agonism. Accordingly, smaller doses and a bias towards 5-HT1a agonism may improve the expected tolerability of the drug intervention. To assess the trade-off between these two metrics, we normalized the values of both final divergence and divergence trend non-monotonicity and weighted them equally to produce a new score that balances the two desirable drug properties.
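The combined score can be sketched as follows, assuming min-max normalization across the simulated dose grid and that lower values of both final divergence and non-monotonicity are better. The dose labels follow the text, but the numbers are invented toy values for illustration and do not reproduce the paper's results; the function names are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def best_dose_pair(results, weight=0.5):
    """Pick the (5-HT2a dose, 5-HT1a dose) pair with the lowest combined score.

    `results` maps dose pairs to (final KL-divergence, non-monotonicity); lower is better for both.
    """
    pairs = list(results)
    kl = min_max_normalize([results[p][0] for p in pairs])
    nm = min_max_normalize([results[p][1] for p in pairs])
    scores = {p: weight * a + (1 - weight) * b for p, a, b in zip(pairs, kl, nm)}
    return min(scores, key=scores.get), scores

# Invented toy values, not the paper's results.
toy_results = {
    ("none", "none"): (1.0, 0.0),
    ("max", "none"): (0.4, 1.0),
    ("medium", "max"): (0.5, 0.3),
    ("max", "max"): (0.3, 0.8),
}
best, scores = best_dose_pair(toy_results)
print(best)  # ('medium', 'max')
```

The `weight` parameter makes the equal weighting explicit: raising it favours the efficacy proxy, lowering it favours the tolerability proxy.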
Here we find that a max level of 5-HT1a agonism paired with a medium or heavy level of 5-HT2a agonism provides the optimal trade-off between the two metrics of interest. In particular, this level of biased agonism produces post-acute KL-divergence values comparable to those of pure max 5-HT2a agonism, but with a trend that is significantly more monotonic than that produced by max 5-HT2a agonism. For cases in which tolerability and short-term efficacy are more important, a greater 5-HT1a bias would be desirable. In contrast, for cases in which long-term therapeutic efficacy is most important, a balanced agonism profile is desirable. We finally consider the relationship between the metrics describing the acute drug effects and our two proxy outcome metrics. We find that while average gradient magnitude is uncorrelated with the final KL-divergence, final energy value, average local minima count, and final state visitation count are all significantly correlated with the final KL-divergence value [linear regression: all p < 0.001] (see Figure). Among these, state visitation count accounts for nearly all of the variance in the final optimization objective, with $r^2 = 0.952$. This finding aligns with the hypothesis that there is a causal relationship between increases in neural response diversity and therapeutic outcomes (R. L. …). From an optimization perspective this is natural, since increased exploration of the posterior probability landscape can improve the sampling procedure, consequently leading to better learning of the target. Average energy value, average gradient magnitude, and final state visitation count are also linearly predictive of monotonicity in the KL-divergence trend [all p < 0.01]. Of these, average energy value accounts for the greatest variance in the proxy metric, with $r^2 = 0.894$.
Although this relationship is not causal, it does imply that overfitting in the energy function during the acute drug experience is associated with greater divergence between the current and target energy distributions. Given that this is our hypothesized proxy for tolerability, it suggests that acute overfitting may be associated with decreases in drug tolerability.
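For a single predictor, the $r^2$ values quoted above are the squared Pearson correlation between an acute metric and an outcome metric across runs. A minimal standard-library sketch follows; the function name and the toy inputs are hypothetical. (Python 3.10+ also provides `statistics.correlation`, whose square gives the same value.)

```python
def r_squared(x, y):
    """Coefficient of determination of an ordinary least-squares line fitted to (x, y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined when either variable is constant")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)

print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(r_squared([1, 2, 3], [1, 3, 2]))        # 0.25
```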

DISCUSSION

Operating within the pre-existing theoretical frameworks of predictive processing and energy-based models of neural response dynamics, we proposed concrete mechanisms whereby 5-HT2a and 5-HT1a agonists such as psilocybin, LSD, and DMT may exert their effects, and simulated these effects across a range of dose strengths and relative agonism biases. This model and the simulation results have a number of potential implications for theoretical debates concerning belief representation, and motivate further study of the clinical application of psychedelic-assisted therapy, as well as next-generation psychedelic drug discovery. Below we attempt to address these considerations.

ALTERATIONS IN BELIEF REPRESENTATION UNDER PSYCHEDELICS

Overall we found that both 5-HT2a and 5-HT1a agonists are able to produce meaningful improvements in KL-divergence, the metric that serves as our proxy for therapeutic efficacy. In particular, 5-HT2a agonism provides greater long-term benefits across the dose-response curve, whereas 5-HT1a agonism confers benefit only during the acute phase of neuromodulation. Our findings predict that 5-HT2a agonists can be characterized as acutely inducing transient overfitting in the cortical dynamics, resulting in the generation of a sequence of ever-shifting high-precision beliefs which may or may not be veridical. In contrast, 5-HT1a agonists can be characterized as acutely inducing transient underfitting in cortical dynamics, resulting in the generation of a more stable sequence of lower-precision beliefs than either a placebo baseline or 5-HT2a agonism. We can interpret these observations in light of the debate around whether psychedelics are belief-strengthening, belief-relaxing, or some third category (R. L. …). Strictly speaking, our findings predict that 5-HT2a agonists such as DOI and DOB will be belief-strengthening. This is apparent in the sharp increase in gradient magnitude produced by higher doses of 5-HT2a agonism in our simulations, as well as in the acute decrease in energy function values relative to baseline. This must be qualified, however, by the fact that this strengthening of beliefs does not refer to post-acute outcomes. Rather than a fixed set of beliefs being strengthened for the duration of the drug effect, the increase in belief precision is stochastic and transient, resulting in a higher state visitation count as attractors in the energy landscape are constantly disrupted during the acute drug effect.
Entirely within a pure 5-HT2a agonist model of psychedelics, this account is able to reconcile the phenomenological evidence for pareidolia and insight experiences during acute psychedelic states, both of which are consistent with belief strengthening, with the finding that psychedelics increase cortical entropy. Our understanding of the belief-altering properties of psychedelics becomes more nuanced, however, when we consider the significant modulating role that 5-HT1a agonism also plays in the acute effects of psychedelics. Our simulations predict that 5-HT1a agonism produces a belief-relaxation effect, which can be measured both by the decrease in gradient magnitude and by the increase in energy function values during the acute phase of the experiment. Both of these together serve as our proxies for belief precision, suggesting that 5-HT1a agonists relax beliefs acutely. Furthermore, we find that 5-HT1a agonists decrease the number of local minima present in the energy function surface, which corresponds to the number of stable beliefs a network can encode. This effect is consistent with phenomenological reports of 5-MeO-DMT use, a highly biased 5-HT1a agonist that at heavy doses produces experiences in which users become unable to represent objects, ideas, space, or even time. We also consider relatively balanced serotonin agonists such as psilocybin, LSD, or DMT, which have received the greatest scientific attention in recent years. Here we predict belief relaxation and belief strengthening operating simultaneously, albeit in different ways. In this case we find both an increase in state visitation count and an overall decrease in gradient magnitude and number of local minima.
We likewise find that balanced agonists produce a more monotonic trend in KL-divergence than pure 5-HT2a agonists, suggesting greater potential tolerability, while also decreasing post-acute divergence values beyond what the placebo or either class of pure agonist is capable of. Taken together, these findings are consistent with both the ALBUS and REBUS models of psychedelic action. Whether psychedelics are viewed as strengthening or relaxing beliefs is then a question of the particular drug, timescale, and proxy metric under consideration. This opponency between 5-HT2a and 5-HT1a agonism in belief representation has previously been identified in the context of a model contrasting DMT and 5-MeO-DMT (Gómez-Emilsson, 2020). It is also consistent with the passive vs. active coping model introduced by R. L. Carhart-Harris and Nutt, and the automaticity vs. flux model of Shine et al. In all cases, 5-HT1a agonism can be understood to acutely relax beliefs, whereas 5-HT2a agonism acutely and stochastically strengthens beliefs. The former produces a state that may provide short-term relief from stress, whereas the latter provides the potential for long-term benefit. Significantly, the serotonergic psychedelics currently being clinically evaluated are capable of inducing both effects simultaneously due to their mixed binding profiles, thus providing both benefits in combination.

CONNECTION TO ENDOGENOUS NEUROMODULATION

Our work also has implications for endogenous neuromodulation and is consistent with current theoretical work on the role of postsynaptic 5-HT agonism in the cortex as a stress-response system (R. L. …). Here we extend this perspective by proposing a set of concrete mechanisms by which cognitive responses to stress can be deployed to improve the adaptedness of an organism within a given environmental context. Although we did not explore it here, a natural extension of our model is to consider the mechanism by which endogenous neuromodulation takes place in response to stress. While critically important to affect and cognition, the receptor populations we consider here are only two of a dozen 5-HT receptor types expressed throughout the central nervous system. There is evidence that other 5-HT receptor populations, including 5-HT1B and 5-HT2C, may also play important roles in the etiology of certain mood disorders. These two populations in particular are also common targets of classic psychedelics including LSD and psilocybin, though they do not possess the same level of binding affinity or causal influence as either 5-HT2a or 5-HT1a. The action of these receptor populations on cortical activity is also less straightforward than the apparent opponency between 5-HT2a postsynaptic excitation and 5-HT1a postsynaptic inhibition. Further basic neuroscientific research will be required to fully understand the complex role that the family of 5-HT receptor populations plays throughout the cortex and the central nervous system more broadly. There is also a complex interplay between serotonin and other neuromodulatory systems such as dopamine. Whereas the two neuromodulators were previously believed to act in opponency, this hypothesis has more recently been refined to suggest that they have doubly dissociable effects, particularly in the context of cognitive flexibility.
The dual-receptor nature of the serotonin coping systems can also be compared to the dopamine system's balance between reflective, goal-oriented decision making, linked to the dorsomedial striatum (caudate) and involving future planning, and reflexive, habitual behavior, linked to the dorsolateral striatum (putamen) and based on past actions or previous experience.

CONNECTION TO NEURAL ANNEALING THEORY

A popular metaphor for the therapeutic action of psychedelics on brain dynamics is that of annealing in metallurgy or simulated annealing in optimization. This analogy has been formalized by various theorists into a neural annealing theory of psychedelic action (M. …). According to neural annealing theory, the dynamics of the brain are "heated," "manipulated," and then "cooled" in a manner analogous to the process of annealing in metallurgy. Rather than being literally heated or cooled, the neural annealing process is meant to correspond to the phase of transient acute increases in the entropy of neural patterns of activity during psychedelic use (R. L. …), followed by the slow process in which the cortical dynamics return to baseline, subsequently also referred to as "metaplasticity". The neural annealing theory has also been extended beyond the particulars of psychedelics to encompass any intervention that may be capable of increasing the entropy of brain dynamics, such as specific breathing or meditation techniques (Gómez-Emilsson, 2021). The theoretical model and set of simulations that we present here serve as a possible concrete computational instantiation of the neural annealing process. In this work we first demonstrate that both 5-HT1a and 5-HT2a agonism have the capacity to increase neural response variability, a measure of cortical entropy, during the acute phase of drug effects, and that these dynamics return to baseline when the drug effect ends, thus satisfying the basic criteria of a neural annealing process. We also demonstrate that this increase in cortical entropy corresponds to a period in which the synaptic connectivity structure becomes more malleable, resulting in lasting changes to cortical dynamics in the post-acute phase. A therapeutically successful neural annealing process involves two primary outcomes.
The first is that cortical dynamics settle into a regime in which they are better able to reduce prediction errors for the current environmental context, which we refer to as adaptation. This is reflected at two temporal scales. The first is the current neural dynamics settling into an attractor that is closer to the global minimum than would result from a placebo, in which the stochastic sampling process is local, insufficiently exploratory, and therefore slow. The second is the KL-divergence between the current and target energy functions decreasing significantly. We find that in our simulations 5-HT2a agonism is able to achieve this goal at suitable dose levels, with balanced agonism producing an even stronger therapeutic effect. The second desirable outcome of a neural annealing procedure is that the synaptic connectivity of the energy function yields fewer local minima and a smaller overall gradient magnitude. Together these are expected to prevent neural dynamics from getting trapped in overly precise attractors in the future, which we refer to as adaptability. This increase in adaptability may be what is typically associated with post-acute increases in markers of plasticity after psychedelic drug administration. Within the context of our simulations, we find that 5-HT1a agonism is capable of producing this effect acutely. As we do not model the potential effects that psychoplastogens may have on the underlying energy function, we cannot speak to the long-term changes in local minima. Regardless, an increase in future adaptability will likely come at the expense of current adaptedness, highlighting a trade-off inherent in learning systems, including the human brain, between adaptation to a given context and adaptability to potential future contexts.
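The annealing metaphor can be made concrete with textbook Metropolis sampling under a heat-then-cool temperature schedule. This is a generic simulated-annealing illustration, not the authors' dual-receptor model, which perturbs and smooths the energy function rather than changing a temperature; the double-well energy, the schedule, and the function names are assumptions.

```python
import math
import random

def metropolis_path(energy, temperatures, start=0, seed=0):
    """Metropolis sampling of a cell index on a periodic 1D grid, one proposal per temperature."""
    rng = random.Random(seed)
    n = len(energy)
    state = start
    path = [state]
    for temp in temperatures:
        proposal = (state + rng.choice((-1, 1))) % n  # step to a neighbouring cell
        delta = energy[proposal] - energy[state]
        # Always accept downhill moves; accept uphill moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            state = proposal
        path.append(state)
    return path

# Hypothetical double well on 40 cells: shallow minimum at cell 10, deeper one at cell 30.
energy = [min((i - 10) ** 2 / 20, (i - 30) ** 2 / 20 - 2) for i in range(40)]
schedule = [0.1] * 200 + [3.0] * 200 + [0.1] * 200  # baseline, "heated" acute phase, cooled
path = metropolis_path(energy, schedule, start=10)
for name, segment in (("pre", path[1:201]), ("acute", path[201:401]), ("post", path[401:])):
    print(name, "distinct cells:", len(set(segment)))
```

Under this schedule one would expect the acute segment to visit more distinct cells than the pre- and post-acute segments, mirroring a transient entropy increase, but the exact counts depend on the seed.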

EMPIRICALLY VALIDATING OUR MODEL AND PREDICTIONS

Our theoretical model is based on a number of assumptions that are still in need of experimental validation. Our model operates on the basis of the Bayesian brain hypothesis, and assumes that cortical dynamics are governed by a process of hierarchical predictive processing. More specifically, we assume that HPP can be modeled as learning and stochastic sampling in an energy-based model using gradient descent. Human behaviour is consistent with Bayesian inference in many sensory, motor, and cognitive tasks. While there is accumulating evidence for predictive processing in various aspects of cortical computation, it is still not universally accepted among theorists. A similar state of evidence exists for explicit attractor dynamics in cortical computation, which would be amenable to modeling using energy functions. While research into the existence of such dynamics is still nascent, there is accumulating evidence that, within the context of high-level decision making, PFC dynamics can be modeled using energy functions. In our model, we assume that both 5-HT1a and 5-HT2a agonists rapidly and continuously modify the entire energy function governing cortical dynamics. Validating the particular nature of this modification would be difficult given the current capabilities of neuroimaging. Current recording techniques are only able to build a model of this function by sampling from a number of sequential trials over an extended period of time (S. …). Novel experimental designs or data analysis methods may provide insight into the effects of these receptor populations on the induced energy function in simple contexts such as two-choice decision-making tasks or basic perceptual tasks. Verifying the predictions concerning the clinical effects of serotonergic psychedelics is potentially more tractable within pre-existing methodological frameworks.
The simulation results presented in Section 3 provide the basis for a number of predictions which may guide neuroimaging research, clinical protocols, and drug development. The first of these concerns the modulating role of 5-HT1a agonism in the psychedelic experience. Our model predicts that, unlike 5-HT2a agonist psychedelics, 5-HT1a agonists should not significantly increase acute cortical entropy, as measured by the number of unique cortical states visited as well as the transition dynamics between states (R. L. …). Our model also predicts that balanced mixed agonists will produce levels of cortical entropy significantly greater than either 5-HT1a or 5-HT2a agonists alone, suggesting a complex synergistic effect between the two receptor systems. As has been theorized by R. Carhart-Harris et al., we expect that increases in cortical entropy will be significantly correlated with greater post-treatment therapeutic efficacy. Our model also predicts that 5-HT2a agonists will provide greater long-term therapeutic efficacy than 5-HT1a agonists, and that mixed agonists will produce the greatest therapeutic effect, as they also produce the greatest levels of cortical entropy. Our model further predicts that mixed agonists will have greater therapeutic efficacy at lower doses than either pure 5-HT2a or 5-HT1a agonists alone, which has implications for both the safety and tolerability of treatment. From a clinical perspective, it is also desirable to maximize the tolerability and short-term efficacy of the acute experience during psychedelic-assisted therapy. Our model predicts that pure 5-HT1a agonists will be able to produce measurable therapeutic improvements during the acute phase of the drug effect as well as provide greater acute tolerability. In contrast, we predict that a pure 5-HT2a agonist may have the opposite acute effect. This is consistent with pre-existing findings that 5-HT1a agonists produce acute anxiolytic effects, while 5-HT2a agonists can be anxiogenic.
There already exists some evidence that increasing 5-HT1a receptor activation during the acute psychedelic experience decreases the intensity of the reported acute experience, but this should be examined in greater depth by explicitly modulating the relative levels of 5-HT2a and 5-HT1a agonism across a range of doses. Our model also has implications for research into the experience of insight, a key aspect of problem solving. Psychedelic use is known to transiently induce insight experiences, some of which may be so-called "false insights," due to their non-veridical nature. We can operationalize the insight experience as the cortical dynamics for a given internal representation arriving in a novel attractor within the energy function landscape that is able to reduce prediction error significantly better than previous attractor states. The intensity of the insight experience then corresponds to the value of the energy function at that point in combination with the local gradient magnitude around the point, the latter of which is a measure of the precision with which the novel belief is encoded. Our model predicts that 5-HT2a agonists induce insight experiences through the generation of transient attractor states with unusually small energy values and large gradient magnitudes around them. This can potentially account for the intensity of the reported insights during psychedelic-assisted therapy, along with the potential for these insights to be false or spurious, as they are generated by an energy function which has been perturbed by exogenous neuromodulation from the drug and may not align with the target. Our model also predicts that 5-HT1a agonism will modulate the number and intensity of these insight experiences, potentially resulting in fewer, but more stable insights during the acute treatment experience. These insights may then be better positioned to serve as a useful basis for the post-acute process of psychotherapeutic integration.
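The insight operationalization above can be sketched on a 1D grid, assuming the sequence of attractors reached by the dynamics is already known. An insight is flagged when a previously unvisited attractor improves on the best energy reached so far by at least a margin; the sketch reports that attractor's energy value together with a local steepness term as a crude precision proxy. The `margin` parameter, the neighbour-difference steepness, the toy energy, and the function name are hypothetical choices, and the two quantities are returned separately because the text does not specify how they combine into an intensity.

```python
def insight_events(energy, attractor_sequence, margin=0.5):
    """Flag arrivals at novel attractors that beat every earlier attractor by `margin`.

    Returns (cell, energy value, local steepness) tuples. Steepness is the mean absolute
    difference to the two neighbouring cells on a periodic grid, a crude precision proxy.
    """
    n = len(energy)
    seen = set()
    best = float("inf")
    events = []
    for i in attractor_sequence:
        # The first attractor has nothing to improve on, so it is never flagged.
        if seen and i not in seen and energy[i] <= best - margin:
            steepness = (abs(energy[i - 1] - energy[i]) + abs(energy[(i + 1) % n] - energy[i])) / 2
            events.append((i, energy[i], steepness))
        seen.add(i)
        best = min(best, energy[i])
    return events

# Toy energy with attractors at cells 1, 4 and 7 (hypothetical values).
energy = [3, 1, 3, 2, 0, 2, 3, -1, 3]
print(insight_events(energy, [1, 4, 1, 7]))  # [(4, 0, 2.0), (7, -1, 4.0)]
```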

POTENTIAL IMPLICATIONS FOR DRUG DESIGN

Current efforts in psychedelic drug design are largely focused on the development of non-hallucinogenic 5-HT2a agonists. Although progress has been made in this area, there is evidence to suggest that the acute phenomenology of the psychedelic experience plays a non-trivial causal role in the long-term therapeutic efficacy of psychedelic-assisted therapy. If this is indeed the case, then there is value in understanding the pharmacological mechanisms responsible for different aspects of the acute phenomenology and their relationship to clinical outcomes. Early clinical evidence suggests that 5-MeO-DMT, a highly biased 5-HT1a agonist psychedelic, has significant potential as a treatment for major depressive disorder. Some concern exists, however, regarding the safety profile of 5-MeO-DMT in both the acute and post-acute phases of the drug effect. One particular concern is the experience of "reactivations," in which the phenomenology of the drug effect spontaneously returns days or weeks after the drug was consumed. Other concerns include the potential for 5-MeO-DMT to induce seizures in individuals with a predisposition, or to increase body temperature to unsafe levels. Our model predicts that a highly biased 5-HT1a agonist such as 5-MeO-DMT will only have clinically relevant effects at very high doses, and this seems to be the case from early empirical studies. Increases in dose strength, however, produce decreases in psychological tolerability as well as corresponding undesirable side effects resulting from high levels of 5-HT agonism at non-clinically relevant sites throughout the nervous system, potentially leading to serotonin syndrome in extreme cases (Volpi-Abadie, …). An ideal balance between 5-HT2a and 5-HT1a agonism may involve a relatively smaller but pharmacologically meaningful bias towards 5-HT1a agonism over 5-HT2a agonism.
Notably, 5-HT itself is biased in this way, with a three- to five-fold preference for 5-HT1a over 5-HT2a agonism, which is consistent with the optimal binding profile reported in our results. It may prove worthwhile to develop drugs with a profile similar to that of 5-HT itself, but which are acutely psychoactive and amenable to use in psychedelic-assisted therapy. 5-MeO-DMT is part of a larger class of "5-methoxy" psychedelic substances, all of which possess greater binding affinity for 5-HT1a than for 5-HT2a receptors. In particular, drugs such as 5-MeO-MiPT and 5-MeO-DiPT have been reported to have binding affinity ratios within the range highlighted here as potentially optimal, and may merit greater clinical attention. Modern drug discovery methods based on deep learning make the process of discovering a molecule with the desired binding properties considerably easier than in previous decades. Recent work has demonstrated the use of state-of-the-art protein folding models to develop candidate molecules with strong 5-HT2a binding profiles. Extending this technology to the discovery of 5-HT1a-binding molecules is well within the capabilities of current models, and may yield promising new psychotherapeutic agents with fewer off-target effects than currently known psychedelics.
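The "fold preference" discussed above is a ratio of binding affinities. Because a lower inhibition constant $K_i$ means tighter binding, the fold preference for 5-HT1a over 5-HT2a is $K_i(\text{5-HT2a}) / K_i(\text{5-HT1a})$. The minimal sketch below assumes that convention and treats the 3 to 5 range quoted for 5-HT as the window of interest; the function names and the $K_i$ values in the usage lines are hypothetical placeholders, not measured data for any compound.

```python
def fold_preference_1a(ki_1a_nm: float, ki_2a_nm: float) -> float:
    """Fold preference for 5-HT1a over 5-HT2a binding.

    Affinity is inversely related to Ki, so affinity(1A) / affinity(2A)
    equals Ki(2A) / Ki(1A). Inputs are inhibition constants in nM.
    """
    if ki_1a_nm <= 0 or ki_2a_nm <= 0:
        raise ValueError("Ki values must be positive")
    return ki_2a_nm / ki_1a_nm


def in_candidate_window(fold: float, low: float = 3.0, high: float = 5.0) -> bool:
    """True if the fold preference lies in the window, assumed here to be the
    three- to five-fold range quoted for 5-HT itself."""
    return low <= fold <= high


# Hypothetical Ki values (nM) for two made-up compounds, for illustration only.
compound_a = fold_preference_1a(ki_1a_nm=10.0, ki_2a_nm=40.0)   # 4-fold preference
compound_b = fold_preference_1a(ki_1a_nm=10.0, ki_2a_nm=300.0)  # 30-fold preference
```

Binding affinity is not the same as functional agonism (potency and efficacy at each receptor), so a ratio like this is at best a first filter on candidate molecules.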

LIMITATIONS AND FUTURE DIRECTIONS

There are a number of limitations to both our theoretical model and the simulations presented here. Many of these simplifications were made to avoid introducing unnecessary, obfuscating complexity into a novel paradigm, but they should nonetheless be addressed in the future. First, we limit our consideration to the dynamics of a single fixed target energy function and a single associated learned energy function. This is a significant simplification of real neural dynamics, where predictive processing is hypothesized to take place at many levels of a spatiotemporal hierarchy. We are also unable to capture potentially complex nonlinear interactions between relevant functional networks within the cortex that may be induced by 5-HT neuromodulation. Second, we make the simplifying assumption that the environmental context in which the organism finds itself is held fixed for the duration of the simulation. Consequently, the parameters of the generative model that maps latent neural responses to reconstructed stimuli, which are presumably context-dependent, are assumed fixed and are omitted throughout the drug experience. This allowed us to avoid considering how the external dynamics of the world may alter the sensitivity of the neural optimization process, particularly its response to sensory information, or the attentional modulation of prediction errors over time. At the same time, it has prevented us from studying how the drug effect may impact lifelong or continual learning. A number of clinical protocols for psychedelic-assisted therapy have also used repeated doses over the course of multiple weeks (R. ...), whereas here we focus on simulating the effects of a single dose, which may be less effective. Third, we resort to a non-parametric representation of the optimization landscape, modeled via a two-dimensional neural response z that we update directly by gradient descent on the energy function. 
While this is a significant simplification of neural dynamics in living organisms, there is evidence to suggest that for a given functional network, the underlying optimization dynamics may often lie on a lower-dimensional manifold. Indeed, multiple studies have found that cortical attractor dynamics can be projected down to as few as one dimension while retaining significant predictive power in the case of manifolds for high-level decision making and representation. Fourth, our model does not take into account the role that action plays in the perceptual process. Given that perception and belief formation are ultimately in the service of adaptive, goal-directed action, this would be a worthwhile extension to the model. The PFC in particular has been heavily studied not only for its role in high-level belief representation, but also for its role in high-level action selection. Numerous theoretical models attempt to provide normative descriptions of this role of the PFC, often from the perspective of reinforcement learning, meta-learning, or both (J. X. ...). Fifth, the simulations used in this work are abstracted from real neural data. Each starting energy function is sampled from a set of structured noise functions, rather than being naturally induced by a specific set of parameters and a perceptual or behavioral task. A promising future direction is to extend this work to datasets and to excitatory-inhibitory models represented via recurrent neural networks, which would themselves induce more ecologically valid energy functions. This would allow for a more thorough examination of the neuromodulatory properties of 5-HT1a and 5-HT2a agonists under conditions more similar to the tasks an organism faces in its daily life. The expanding literature on deep neural networks as models of brain function may be of particular interest towards realizing this goal. 
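The third and fifth simplifications can be illustrated together: a two-dimensional neural response z updated directly by gradient descent on an energy function drawn at random. As a stand-in for the paper's structured noise functions (whose exact form is not given here), the sketch below assumes a weak quadratic bowl plus randomly placed Gaussian wells, which yields a multi-minimum landscape with an analytic gradient. The helper names, well parameters, learning rate, and step count are hypothetical; `k` plays the role of the number of inference gradient steps.

```python
import numpy as np


def random_energy(n_wells=8, seed=0):
    """Hypothetical stand-in for a structured-noise energy on R^2: a weak
    quadratic bowl plus randomly placed Gaussian wells. Returns (E, grad E)."""
    rng = np.random.default_rng(seed)
    centres = rng.uniform(-2.0, 2.0, size=(n_wells, 2))
    depths = rng.uniform(0.5, 1.5, size=n_wells)
    widths = rng.uniform(0.3, 0.8, size=n_wells)

    def energy(z):
        d2 = np.sum((z - centres) ** 2, axis=1)  # squared distance to each well
        return 0.05 * z @ z - np.sum(depths * np.exp(-d2 / (2 * widths**2)))

    def grad(z):
        diff = z - centres  # shape (n_wells, 2)
        d2 = np.sum(diff**2, axis=1)
        weights = depths * np.exp(-d2 / (2 * widths**2)) / widths**2
        return 0.1 * z + np.sum(weights[:, None] * diff, axis=0)

    return energy, grad


def infer(z0, grad, k, lr=0.01):
    """Update the 2-D neural response z with k steps of plain gradient descent."""
    z = np.array(z0, dtype=float)
    for _ in range(k):
        z = z - lr * grad(z)
    return z


energy, grad = random_energy(seed=0)
z0 = np.array([1.5, -1.0])
z_final = infer(z0, grad, k=2000)
print(energy(z0), energy(z_final))  # the second value should be lower
```

With this step size, each update is small relative to the steepest curvature the wells can contribute, so the energy decreases monotonically until z settles in the basin it started in; which minimum that is depends on the starting point, mirroring how a belief can become trapped in a sub-optimal attractor.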
Finally, in order to realize the potential benefits of next-generation psychotherapeutic agents, we emphasize the importance of a measured and careful approach to any research in the mental health and psychedelic drugs space. It is also important that due attention be paid to issues of health equity and ethics to ensure that the consequences of such research are beneficial and equitably distributed.

CONCLUSION

Psychedelic-assisted psychotherapy has the potential to improve the lives of many individuals suffering from cognitive, affective, and behavioral disorders that are not amenable to current treatment modalities (M. W. ...). A critical step towards realizing this potential is to develop sophisticated models that can be used to predict the therapeutic efficacy of candidate psychedelic substances and doses. While the clinical and neuroscientific evidence available for developing such models continues to accumulate at a rapid pace (R. ...), actionable models are still in their infancy. One popular theoretical framework understands psychedelics primarily as pharmacological agents that engender greater neuroplasticity across various temporal scales. While appealing for its simplicity, this model cannot account for the mediating role that the acute phenomenological experience and its correlated neural activity play in driving positive clinical outcomes. In contrast, frameworks privileging changes to subjective experience have focused on the role of belief alteration, typically in the form of belief relaxation resulting from psychedelic use both acutely and post-acutely (R. L. ...). Here we presented a theoretical model based on predictive processing, together with a set of simulations in which this hypothesized process of belief alteration can be formally studied. We hope that it will help guide further clinical and neuroscientific research on this topic. Within our framework, we modeled the acute action of psychedelics on belief representation as the result of two distinct sets of effects, generated by the populations of 5-HT2a and 5-HT1a receptors present in the prefrontal cortex. We operationalize these effects as two different forms of modulation of the process that encodes and maintains belief representations. 
At a high level, we find that both stochastic perturbation through postsynaptic excitation (5-HT2a) and uniform smoothing through postsynaptic inhibition (5-HT1a) improve this optimization process across various metrics, pointing to a potential therapeutic effect for both. In optimization terms, we characterize 5-HT2a agonism as introducing a transient overfitting of beliefs and 5-HT1a agonism as introducing a transient underfitting of beliefs. We predict that mixed agonists, which activate both 5-HT2a and 5-HT1a populations, will have the most desirable properties, as they may balance long-term therapeutic efficacy with acute drug tolerability. Furthermore, a bias towards preferential 5-HT1a agonism may provide an even greater relative benefit to tolerability with only a minimal decrease in long-term therapeutic efficacy. Interestingly, the psychedelics currently being most thoroughly investigated as clinical tools all exhibit some degree of mixed 5-HT agonism. Rather than being an undesirable property to be engineered away, this mixed agonism may be key to the unique profile of subjective tolerability and therapeutic efficacy that psilocybin, LSD, and DMT seem to possess and that pure 5-HT2a agonists such as DOI and DOB may lack. Looking ahead, increasing clinical interest is being placed on the biased 5-HT1a agonist psychedelic 5-MeO-DMT, a substance capable of inducing powerful and long-lasting antidepressant effects within minutes. Clinical attention is also beginning to be paid to the positive effects of co-administering MDMA with LSD or psilocybin, a combination that may also result in a net 5-HT1a agonism bias and greater tolerability. 
Rather than focusing on the development of non-hallucinogenic psychedelics, which may prove to have limited efficacy outside of animal models, significant evidence exists in support of exploring the pharmacological design space of psychedelic substances with a biased 5-HT1a binding profile similar to that of 5-MeO-DMT or LSD co-administered with MDMA. Doing so may yield substances which are psychologically safer and more effective for clinical use than those currently under investigation. Such novel substances would enable the more mainstream adoption of psychedelic-assisted therapy, creating life-changing opportunities for the large population of individuals whose needs have been unmet by current front-line psychiatric treatments.
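The two forms of modulation summarized in this conclusion, stochastic perturbation (5-HT2a) and uniform smoothing (5-HT1a), can be sketched on a gridded energy landscape. This is a minimal illustration under stated assumptions, not the paper's simulator: 5-HT2a agonism is modeled as an additive, spatially correlated random field scaled by `a_2a`, and 5-HT1a agonism as a blend, weighted by `a_1a`, toward a Gaussian-smoothed copy of the landscape. The blur implementation, kernel widths, and all function and parameter names are hypothetical.

```python
import numpy as np


def gaussian_blur(grid, sigma):
    """Separable Gaussian blur with edge-replicated padding (output has grid's shape)."""
    radius = int(3 * sigma) + 1
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets**2 / (2 * sigma**2))
    kernel /= kernel.sum()  # normalised so a constant landscape is unchanged
    padded = np.pad(grid, radius, mode="edge")
    # Blur along rows, then columns; 'valid' mode strips the padding again.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def modulate_energy(energy_grid, a_2a, a_1a, rng, noise_sigma=3.0, smooth_sigma=3.0):
    """Hypothetical dual-receptor modulation of a gridded energy landscape.

    a_2a: 5-HT2a agonism level, scales an additive spatially correlated random field.
    a_1a: 5-HT1a agonism level in [0, 1], blends toward a Gaussian-smoothed copy.
    """
    smoothed = (1.0 - a_1a) * energy_grid + a_1a * gaussian_blur(energy_grid, smooth_sigma)
    noise = gaussian_blur(rng.standard_normal(energy_grid.shape), noise_sigma)
    noise /= noise.std() + 1e-12  # rescale to roughly unit variance
    return smoothed + a_2a * noise


rng = np.random.default_rng(0)
landscape = rng.standard_normal((64, 64))
mixed = modulate_energy(landscape, a_2a=0.3, a_1a=0.5, rng=rng)
```

Because the kernel is normalized and the edges are replicated, smoothing leaves a flat landscape unchanged and makes narrow wells shallower (possibly merging neighbours), whereas the 2a term can create new, transient minima; drawing fresh noise at each time step would make the perturbation stochastic over time.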

B THEORETICAL AND COMPUTATIONAL MODEL

In this section, we give additional details on the theoretical framework within which the computational model operates.

B.1 INFERENCE AND LEARNING IN LATENT-VARIABLE MODELS

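Given a stimulus $x$, we seek a good neural representation, denoted $z$, such that the elicited neural response encodes maximal information about the stimulus. We formalize inference and learning in latent variable models as an optimization problem using the variational inference framework. We assume that stimuli are decodable from neural responses, i.e. there exists a generative distribution $x \sim p_\theta(x|z)$ (typically implemented by top-down processing), represented by synaptic connection weights $\theta$ (subject to learning under Hebbian plasticity), that takes a neural response and decodes the stimulus. We also assume there exists a recognition distribution $p_\phi(z|x)$ over latent neural responses $z$, used to encode sensory stimuli, whose synaptic representation is captured by $\phi$.

Problem setting. To fit a generative distribution to incoming sensory stimuli by maximum likelihood, we minimize the following objective over the synaptic weights $\theta$:

$$\min_\theta \, D_{\mathrm{KL}}\big(p(x)\,\|\,p_\theta(x)\big) = \min_\theta \, \mathbb{E}_{p(x)}\big[\log p(x) - \log p_\theta(x)\big].$$

The objective is minimized when $p_\theta(x)$ matches $p(x)$, in which case the data distribution is perfectly modelled. Using Bayes' theorem, we may obtain the distribution over latent neural responses $z$ when a stimulus $x$ is received:

$$p_\theta(z|x) = \frac{p_\theta(x, z)}{p_\theta(x)} = \frac{p_\theta(x, z)}{\int p_\theta(x, z')\,dz'}.$$

For complex, nonlinear models, the high-dimensional integral in the denominator is generally computationally or analytically intractable.

Evidence lower bound (ELBO). One approach is to approximate the true posterior $p_\theta(z|x)$ with a parametric recognition model $p_\phi(z|x)$ and use it to encode the stimulus, by minimizing the following KL divergence over the parameters $\phi$:

$$D_{\mathrm{KL}}\big(p_\phi(z|x)\,\|\,p_\theta(z|x)\big) = \log p_\theta(x) - \mathcal{L}(\theta, \phi; x), \qquad \mathcal{L}(\theta, \phi; x) = \mathbb{E}_{p_\phi(z|x)}\big[\log p_\theta(x, z) - \log p_\phi(z|x)\big].$$

Because the KL divergence is non-negative and $\log p_\theta(x)$ does not depend on $\phi$, $\mathcal{L}$ is a lower bound on $\log p_\theta(x)$, and minimizing the KL divergence over $\phi$ is equivalent to maximizing $\mathcal{L}$. The gradient of the ELBO with respect to the generative parameters $\theta$ is

$$\nabla_\theta \mathcal{L} = \mathbb{E}_{p_\phi(z|x)}\big[\nabla_\theta \log p_\theta(x, z)\big],$$

whereas the gradient with respect to the recognition parameters $\phi$ yields

$$\nabla_\phi \mathcal{L} = \mathbb{E}_{p_\phi(z|x)}\big[r(x, z)\,\nabla_\phi \log p_\phi(z|x)\big],$$

where $r(x, z) = \log \dfrac{p_\theta(x, z)}{p_\phi(z|x)}$ can be viewed as a reward for a credit assignment problem.

Given individual stimuli sampled as $x^{(i)} \sim p(x)$ and generated samples from the posterior $z^{(i)} \sim p_\phi(z|x^{(i)})$, $i = 1, \dots, N$, we may approximate the gradient with respect to the generative parameters $\theta$ using an empirical mean, and update them by gradient descent on the negative ELBO:

$$\theta \leftarrow \theta + \alpha \, \frac{1}{N} \sum_{i=1}^{N} \nabla_\theta \log p_\theta\big(x^{(i)}, z^{(i)}\big),$$

where $\alpha$ is a small positive learning rate. Similarly, to estimate the gradient with respect to the recognition parameters $\phi$, given the same stimuli $x^{(i)}$ and posterior samples $z^{(i)}$, we may use

$$\nabla_\phi \mathcal{L} \approx \frac{1}{N} \sum_{i=1}^{N} r\big(x^{(i)}, z^{(i)}\big)\, \nabla_\phi \log p_\phi\big(z^{(i)} \,|\, x^{(i)}\big).$$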
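The score-function estimator of $\nabla_\phi$ above can be checked numerically on a toy model where the ELBO gradient is known in closed form. The sketch assumes a one-dimensional prior $p(z) = \mathcal{N}(0, 1)$, a likelihood $p(x|z) = \mathcal{N}(z, 1)$ with no learnable $\theta$, and a unit-variance recognition model $p_\phi(z|x) = \mathcal{N}(\mu, 1)$ with $\phi = \mu$; none of these choices come from the paper, and the function names are hypothetical. For this model the exact gradient is $x - 2\mu$, which vanishes at the true posterior mean $x/2$.

```python
import numpy as np


def log_normal(v, mean, var):
    """Log density of a univariate Gaussian N(mean, var) evaluated at v."""
    return -0.5 * np.log(2.0 * np.pi * var) - (v - mean) ** 2 / (2.0 * var)


def score_function_grad(x, mu, n_samples=200_000, seed=0):
    """Monte Carlo estimate of dELBO/dmu as E[r(x, z) * d/dmu log p_phi(z | x)].

    Toy model (an illustrative assumption, not the paper's):
      prior        p(z)         = N(0, 1)
      likelihood   p(x | z)     = N(z, 1)
      recognition  p_phi(z | x) = N(mu, 1), i.e. phi = mu
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(loc=mu, scale=1.0, size=n_samples)             # z^(i) ~ p_phi(z | x)
    log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)  # log p_theta(x, z)
    log_recog = log_normal(z, mu, 1.0)                            # log p_phi(z | x)
    reward = log_joint - log_recog                                # r(x, z)
    score = z - mu                                                # d/dmu log p_phi(z | x)
    return float(np.mean(reward * score))


def analytic_grad(x, mu):
    """Exact dELBO/dmu for the same toy model."""
    return x - 2.0 * mu


print(score_function_grad(1.0, 0.0), analytic_grad(1.0, 0.0))  # both close to 1.0
```

The estimator is unbiased but high-variance, which is why many samples are drawn here; in practice a baseline (for example, the mean reward) is often subtracted from $r(x, z)$ to reduce variance.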

B.1.1 JOINT GENERATION AND NEURAL RESPONSE SAMPLING

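Energy-based models (EBMs) use sampling to generate neural responses from either the posterior distribution $p_\theta(z|x)$ or the joint model $p_\theta(x, z)$. The latter is an option because, for a joint model defined through an energy function as $p_\theta(x, z) \propto \exp(-E_\theta(x, z))$, the normalizer $p_\theta(x)$ does not depend on $z$, so that $\nabla_z \log p_\theta(z|x) = \nabla_z \log p_\theta(x, z) = -\nabla_z E_\theta(x, z)$. This means we can use Markov chain Monte Carlo (MCMC) sampling to obtain the samples $z^{(i)}$. Several MCMC samplers can perform this procedure.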

LANGEVIN SAMPLER

The simplest of them is Langevin sampling, which follows the gradient of the log-density while injecting Gaussian noise:

$$z_{t+1} = z_t + \frac{\epsilon}{2}\,\nabla_z \log p_\theta(x, z_t) + \sqrt{\epsilon}\,\eta_t, \qquad \eta_t \sim \mathcal{N}(0, I),$$

where $\epsilon$ is a small step size.

A real concern with MCMC methods is whether the Markov chain moves through all areas of significant probability. If the target distribution has many modes, or "islands" of high density, it will take a long time to move from one island to another. When the initial proposal distribution used for sampling has a very large variance, the chance of landing on a high-density island is small. Auxiliary-variable MCMC methods such as stochastic Hamiltonian Monte Carlo (HMC), along with other neural adaptation procedures, have been developed to address these concerns by making non-local jumps possible, so that the chain can more easily move from one mode to another.

Neural adaptation with a Hamiltonian sampler. HMC adds a random momentum variable and then simulates a particle moving on an energy surface. The momentum term keeps the network state moving when it reaches a local minimum. This kind of neural adaptation accumulates noise terms $\xi$ arising, e.g., from ion concentrations, the release of neurotransmitters, the activation/inactivation of ion channels, drug-induced activation of receptors, etc. The momentum variable typically also aggregates an independent component, generally drawn from a normal distribution $\mathcal{N}(0, \sigma)$, which may be independent of $z$ or centered around it. Together, these dynamics act as momentum that accelerates the sampling process and enhances exploration of the posterior. Consider the simplified momentum-based dynamics

$$e' \leftarrow e + \Delta e, \qquad \Delta e \doteq -\beta\,\nabla_z \log p_\theta(z, e, x) + \alpha\,\xi,$$

with $p(e|z, x) = \mathcal{N}(z, \sigma)$ Gaussian noise dependent on $z$, capturing the ability of inhibitory cells to respond to input, and $\xi$ the perturbation added to the system by injecting noise.
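A minimal sketch of the unadjusted Langevin update, assuming an energy-based density $p(z) \propto \exp(-E(z))$ so that $\nabla_z \log p = -\nabla_z E$. To keep the example checkable, it uses a hypothetical quadratic energy whose Boltzmann distribution is a unit-variance Gaussian, and runs many independent chains in parallel; the step size, number of steps, and function names are illustrative and not taken from the paper's code.

```python
import numpy as np


def langevin_sample(grad_energy, z0, step_size=0.01, n_steps=2000, seed=0):
    """Unadjusted Langevin dynamics for p(z) proportional to exp(-E(z)):
    z <- z - (eps / 2) * grad E(z) + sqrt(eps) * N(0, I).

    z0 has shape (n_chains, dim); each row is an independent chain.
    """
    rng = np.random.default_rng(seed)
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        z = z - 0.5 * step_size * grad_energy(z) + np.sqrt(step_size) * noise
    return z


# Hypothetical quadratic energy E(z) = 0.5 * ||z - m||^2, whose Boltzmann
# distribution exp(-E) is a unit-variance Gaussian centred on m.
m = np.array([1.0, -2.0])


def grad_quadratic(z):
    return z - m


samples = langevin_sample(grad_quadratic, np.zeros((4000, 2)))
print(samples.mean(axis=0), samples.var(axis=0))  # approximately m and [1, 1]
```

Because there is no Metropolis correction, the stationary distribution only approximates the target, with a bias that shrinks as the step size decreases. On a multi-modal energy, the same chains would illustrate the "islands" problem: with a small step size they rarely cross high-energy barriers.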

C SIMULATION METHODS

The cortical dynamics simulator code was written in Python and is freely available for use under an open-source license at:.

C.1 SIMULATOR PARAMETER SENSITIVITY STUDIES

We assessed the sensitivity of the simulated results to four key hyperparameters: (1) Hebbian plasticity, (2) homeostatic strength, (3) the number of gradient steps taken when inferring the neural response (k in Algorithm 1), and (4) the energy function initializations.

Hebbian plasticity. Figure presents results for a range of plasticity values. We find that our simulated results and overall conclusions remain consistent when plasticity α is varied from 0.1 to 1.0 (the default is 0.5). We do observe that when α = 1.0, the post-acute number of local minima for mixed agonists falls relative to the baseline plasticity value, although this does not meaningfully affect our conclusions, as the difference is small relative to the acute changes in local minima count. However, when plasticity is very large (α = 2.0), our conclusions no longer hold. In this case, divergence increases for all agonist mixes and strengths, including the baseline (zero). Here, we observe that the post-acute gradient magnitude is substantially larger than at the baseline plasticity, as is the number of local minima. This indicates that the energy function has become more complex and that it is easier to get stuck in sub-optimal local minima, likely leading to the observed increase in post-acute divergence. Since the baseline divergence also increases, we do not consider α = 2.0 a realistic value for plasticity. We also present results for α = 0 to show the simulator dynamics in the absence of plasticity; however, this is also not a realistic value for this parameter, as it implies that no learning takes place.

Homeostatic strength. Figure presents results for varying strengths of the homeostatic constraint. Overall, we find that our simulated results and conclusions remain consistent when homeostatic α is varied from 0.025 to 0.1 (the default is 0.05). The weaker the homeostatic constraint, the more the acute drug phase effects are amplified. 
For example, compare the acute-phase reduction in energy value, or the increase in gradient magnitude and divergence, for 2a_max when α = 0.05 and α = 0.01. The stronger the homeostatic constraint, the faster the dynamics stabilize after the drug effect ends (vertical dotted line). When homeostatic α is very low (0.01), the energy value, gradient magnitude, number of local minima, and divergence all continue to change substantially for many time steps after the drug effect wears off. Further, the energy value and gradient magnitude have not stabilized by the end of the simulated experiment. This suggests that 0.01 is not a realistic value for this parameter. We also present results for α = 0 to show the simulator dynamics in the absence of any homeostatic constraint; however, this is not a biologically plausible value for this parameter.
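Two of the metrics used in these sensitivity studies, the gradient magnitude and the number of local minima of the energy function, can be computed on a gridded landscape as sketched below. This assumes the energy is sampled on a regular 2-D grid, uses the mean finite-difference gradient norm, and counts interior cells strictly lower than all eight neighbours; these definitions are illustrative assumptions, and the paper's exact metric definitions (including divergence, which is not reproduced here) may differ.

```python
import numpy as np


def mean_gradient_magnitude(energy_grid, spacing=1.0):
    """Mean Euclidean norm of the finite-difference gradient over the grid."""
    gy, gx = np.gradient(energy_grid, spacing)  # derivatives along axis 0, axis 1
    return float(np.mean(np.hypot(gx, gy)))


def count_local_minima(energy_grid):
    """Count interior cells strictly lower than all 8 neighbours."""
    centre = energy_grid[1:-1, 1:-1]
    is_min = np.ones_like(centre, dtype=bool)
    n_rows, n_cols = energy_grid.shape
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            # Neighbour shifted by (dr, dc), aligned with the interior block.
            neighbour = energy_grid[1 + dr:n_rows - 1 + dr, 1 + dc:n_cols - 1 + dc]
            is_min &= centre < neighbour
    return int(is_min.sum())


x = np.linspace(-3.0, 3.0, 61)
X, Y = np.meshgrid(x, x)
single_well = X**2 + Y**2              # one minimum at the origin
double_well = (X**2 - 4) ** 2 + Y**2   # two minima at x = -2 and x = 2
```

On these test landscapes `count_local_minima` reports one and two minima respectively, and doubling an energy function doubles its mean gradient magnitude, consistent with reading larger gradients as a sharper, more complex landscape.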

GRADIENT STEPS

We find that the simulated results are not particularly sensitive to the number of inference gradient steps per iteration, and that our overall conclusions would remain the same across a range of values. Figure indicates that results are broadly consistent between 1 and 100 steps (the default is