ARC: a framework for access, reciprocity and... — Research Summary & Context | Blossom

Healthy Volunteers · LSD

ARC: a framework for access, reciprocity and conduct in psychedelic therapies

In a within‑subjects intravenous LSD versus placebo study using computational reinforcement‑learning modelling, LSD increased reward (and to a lesser extent punishment) learning rates and reduced stimulus stickiness, producing greater exploration while leaving simple win‑stay/lose‑shift measures unchanged. These effects indicate heightened plasticity that could facilitate the revision of maladaptive associations in clinical psychedelic therapy.


Authors

Meg Spriggs
Ashleigh Murphy-Beiner
Robin Murphy

Published

November 22, 2022

Psychological Medicine

meta Study


Abstract

Background

The non-selective serotonin 2A (5-HT2A) receptor agonist lysergic acid diethylamide (LSD) holds promise as a treatment for some psychiatric disorders. Psychedelic drugs such as LSD have been suggested to have therapeutic actions through their effects on learning. The behavioural effects of LSD in humans, however, remain incompletely understood. Here we examined how LSD affects probabilistic reversal learning (PRL) in healthy humans.

Methods

Healthy volunteers received intravenous LSD (75 μg in 10 mL saline) or placebo (10 mL saline) in a within-subjects design and completed a PRL task. Participants had to learn through trial and error which of three stimuli was rewarded most of the time, and these contingencies switched in a reversal phase. Computational models of reinforcement learning (RL) were fitted to the behavioural data to assess how LSD affected the updating (‘learning rates’) and deployment of value representations (‘reinforcement sensitivity’) during choice, as well as ‘stimulus stickiness’ (choice repetition irrespective of reinforcement history).

Results

Raw data measures assessing sensitivity to immediate feedback (‘win-stay’ and ‘lose-shift’ probabilities) were unaffected, whereas LSD increased the impact of the strength of initial learning on perseveration. Computational modelling revealed that the most pronounced effect of LSD was the enhancement of the reward learning rate. The punishment learning rate was also elevated. Stimulus stickiness was decreased by LSD, reflecting heightened exploration. Reinforcement sensitivity differed by phase.

Conclusions

Increased RL rates suggest LSD induced a state of heightened plasticity. These results indicate a potential mechanism through which revision of maladaptive associations could occur in the clinical application of LSD.


Introduction

Research into LSD as a potential psychiatric treatment has re-emerged, with theorised benefits centred on effects on learning and neural plasticity. The serotonin 2A (5-HT2A) receptor is considered a key target mediating psychedelic effects and plasticity, and animal studies have shown that 5-HT2A agonists and LSD can influence associative learning. Serotonin and dopamine systems are both implicated in adaptive behavioural flexibility and reversal learning, and prior human studies of LSD and other psychedelics have examined a range of cognitive and affective domains but have not comprehensively addressed the mechanisms of instrumental learning and probabilistic choice under uncertainty in humans. The authors set out to test how LSD alters instrumental conditioning and cognitive flexibility using a probabilistic reversal learning (PRL) paradigm in healthy volunteers. They aimed both to assess overt behavioural changes using standard measures (for example win-stay/lose-stay and perseveration) and to probe underlying learning mechanisms by fitting computational reinforcement-learning (RL) models. Specific questions included whether LSD modulates sensitivity to immediate feedback, changes the rate at which choice values are updated (learning rates), and alters exploratory behaviour (indexed as reinforcement sensitivity and stimulus stickiness).

Methods

Nineteen healthy volunteers (mean age 30.6 years; 15 males) completed a single-blind, within-subjects, balanced-order study in which they attended two sessions at least two weeks apart and received intravenous LSD (75 μg in 10 mL saline) or placebo (10 mL saline). One participant from the original sample of 20 did not complete the PRL task and was excluded; all participants had prior experience with a classic psychedelic without adverse reaction and met medical, psychiatric, and drug-use screening criteria. Participants were blinded to condition but experimenters were not. A cannula was used for a two-minute injection; participants reported subjective effects 5–15 minutes after dosing and the PRL task was administered approximately five hours after injection. The study formed part of a larger project; supplementary demographic and procedural details were reported elsewhere. The behavioural task was a three-choice probabilistic reversal learning paradigm of 80 trials (40 acquisition, 40 reversal). On each trial participants chose one of three visual stimuli. During acquisition one stimulus yielded positive feedback on 75% of trials, a second on 50% (neutral), and a third on 25%; after 40 trials the 75% and 25% contingencies were reversed. Positive and negative feedback were signalled by emotive faces. Raw behavioural measures included the number of responses to each stimulus, win-stay and lose-stay probabilities (defined as the proportion of occasions a choice was repeated following a win or loss, respectively), and perseverative errors in the reversal phase (defined as two or more responses to the formerly correct stimulus after reversal, with the first reversal trial excluded). Statistical testing used conventional null-hypothesis methods with α = 0.05 for raw measures. 
To characterise latent learning processes, the authors fitted three reinforcement-learning models to trial-by-trial choices using a hierarchical Bayesian approach implemented with Hamiltonian Markov chain Monte Carlo sampling in Stan. Models were compared with bridge sampling to estimate marginal likelihoods and posterior model probabilities. Convergence was checked with the potential scale reduction factor R̂ (< 1.2 criterion). The hierarchical structure modelled drug condition at the group level and subjects below that, allowing within-subject comparisons of parameter means between LSD and placebo. Posterior parameter differences were interpreted using highest posterior density intervals (HDIs). The three candidate models comprised: (1) separate reward and punishment learning rates plus a reinforcement-sensitivity parameter; (2) a single learning rate and reinforcement sensitivity plus a stimulus stickiness parameter capturing the outcome-independent tendency to repeat the previous choice; and (3) a full model combining separate reward and punishment learning rates with reinforcement sensitivity and stimulus stickiness. Key parameters were reward learning rate (α_rew), punishment learning rate (α_pun), reinforcement sensitivity (τ_reinf; akin to value-based exploitation), and stimulus stickiness (τ_stim; value-free tendency to repeat choices).

Results

On overt task performance, repeated-measures ANOVA showed a strong main effect of stimulus type and a stimulus-by-phase interaction, indicating that participants chose according to the reinforcement probabilities and adapted after reversal. There was no overall effect of LSD on the number of correct responses during acquisition (paired t test, t(18) = 0.84, p = 0.4) or reversal (t(18) = 0.23, p = 0.8), and no reliable interactions between LSD and stimulus or phase in the ANOVA. Examining the relationship between initial learning and perseveration, LSD strengthened the coupling between acquisition performance and subsequent perseverative errors: the change in acquisition correct responses (LSD minus placebo) predicted the change in reversal perseverative errors (LSD minus placebo) with a regression coefficient β = 0.56 (p = 0.002). Under LSD, fewer acquisition errors predicted more perseverative errors (β = 0.44, p = 0.003), whereas no such relationship was present under placebo. However, perseverative errors considered alone did not differ between conditions (t(18) = 0.03, p = 0.98). For immediate feedback sensitivity, participants stayed more after wins than losses (main effect of valence, F(1,18) = 37.76, p = 8.0 × 10^-6), but LSD had no main effect and did not interact with valence. Model comparison showed good convergence for all three RL models (R̂ < 1.2), and behaviour was best captured by the full four-parameter model (separate reward and punishment learning rates, reinforcement sensitivity, stimulus stickiness). Across the full 80 trials, the model-derived parameter effects included a substantial increase in reward learning rate under LSD (mean LSD 0.87 v. placebo 0.28), with the posterior 99.9% HDI for the difference excluding zero. Punishment learning rate was also elevated (LSD mean 0.48 v. placebo 0.39), with the 99% HDI excluding zero.
Overall, LSD increased reward learning more than punishment learning (the 99% HDI of the difference excluded zero). Phase-specific modelling indicated that during acquisition the reward learning rate was markedly higher under LSD (mean 0.72 v. placebo 0.17; 99% HDI excluding zero) while punishment learning during acquisition was not significantly elevated. During reversal both reward (LSD mean 0.96 v. placebo 0.77) and punishment learning rates (LSD mean 0.42 v. placebo 0.31) were elevated under LSD (drug differences credible at the 90% HDI), and during reversal there was no credible difference in LSD's effect on reward versus punishment rates. Regarding exploratory/exploitative parameters, modelling the entire task showed stimulus stickiness was decreased by LSD (LSD mean 0.23 v. placebo 0.43; drug difference credible at the 90% HDI), indicating increased value-free exploration. Reinforcement sensitivity across the full task was not credibly changed (LSD mean 4.70 v. placebo 5.57; 95% HDI included zero). Phase-wise, during acquisition LSD reduced both stimulus stickiness (0.09 v. 0.46) and reinforcement sensitivity (4.92 v. 6.54), implying increased exploration by both value-free and value-based metrics. During reversal, stimulus stickiness remained lower under LSD (0.36 v. 0.58) but reinforcement sensitivity was increased under LSD (3.64 v. 2.47), suggesting greater trial-by-trial value-driven exploitation in the reversal phase. Linking model parameters to raw measures, a higher reward learning rate during acquisition under LSD predicted more perseverative errors (β = 26.94, p = 0.02), an effect not present under placebo. Stimulus stickiness during reversal did not significantly correlate with perseveration in either condition. Additional exploratory analyses were reported in the supplementary material.

Discussion

The authors interpret their findings as showing that LSD enhances the rate at which human participants update value representations following prediction errors, particularly for rewards, and promotes exploratory behaviour. They emphasise that increased learning rates are compatible with theoretical accounts that psychedelics relax prior beliefs and thereby increase sensitivity to new information, a mechanism that could plausibly support revision of entrenched maladaptive associations in psychiatric disorders. Two distinct exploration-related effects were reported: a value-free reduction in stimulus stickiness (a greater tendency to switch irrespective of outcome), present across acquisition and reversal, and phase-dependent changes in value-based exploration (reinforcement sensitivity was reduced during acquisition, indicating more exploration, but increased during reversal). The reward learning rate was elevated in both acquisition and reversal, whereas punishment learning rate increases were most evident in the reversal phase. The authors note that under LSD better initial acquisition predicted greater perseverative responding upon reversal, suggesting that associations formed during LSD may become more strongly expressed and harder to update if both learning and later testing occur under the drug. The discussion recognises uncertainty about the precise neurochemical mechanisms, noting candidate roles for 5-HT2A-mediated serotonergic effects and for dopaminergic influences (including interactions between the two systems). The authors cite converging animal and genetic findings that link serotonergic and dopaminergic manipulations to changes in learning rates and stimulus stickiness, but stress that LSD acts at multiple receptor types and that the present design cannot dissociate these.
They also highlight timing as a critical factor: when acquisition occurs before drug administration, LSD (or 5-HT2A modulation) can improve subsequent reversal learning, whereas when acquisition and reversal are both conducted under the drug, newly formed priors may be reinforced. Limitations acknowledged by the authors include the inability to determine receptor-specific causes because LSD binds many targets beyond 5-HT2A, the absence of concurrent measurements of subjective effects or plasma LSD levels at the time of task administration, and an inability to reproduce the learning–perseveration relationship in simulated data despite successful parameter recovery analyses. The authors conclude that the principal finding is that LSD enhances feedback-driven belief updating (most strongly for reward) and increases exploration, with potential relevance for understanding how LSD might be harnessed therapeutically to revise harmful associations.


SUBJECTS AND DRUG ADMINISTRATION

Nineteen healthy volunteers (mean age 30.6; 15 males), over the age of 21, attended two sessions at least two weeks apart where they received either intravenous LSD (75 μg in 10 mL saline) or placebo (10 mL saline), in a single-blind within-subjects balanced-order design. Whereas 20 participants were included in the original study, one participant did not complete the PRL task; therefore, 19 participants are reported here. Demographic information is provided in an online Supplementary Table. All participants provided written informed consent after briefing on the study and screening. Participants had no personal history of diagnosed psychiatric disorder, or immediate family history of a psychotic disorder. Other inclusion criteria were a normal electrocardiogram (ECG), normal screening blood tests, negative urine tests for pregnancy and recent recreational drug use, a negative breathalyser test for recent alcohol use, alcohol use limited to less than 40 UK units per week, and absence of a significant medical condition. Participants had previous experience with a classic psychedelic drug [e.g. LSD, mescaline, psilocybin/magic mushrooms, or dimethyltryptamine (DMT)/ayahuasca] without an adverse reaction, and had not used these within six weeks of the study. Screening was conducted at the Imperial College London Clinical Research Facility (ICRF) at the Hammersmith Hospital campus, and the study was carried out at the Cardiff University Brain Research Imaging Centre (CUBRIC). Participants were blinded to the condition but the experimenters were not. A cannula was inserted and secured in the antecubital fossa and injection was performed over the course of two minutes. Participants reported noticing subjective effects of LSD 5 to 15 min after dosing. The PRL task was administered approximately five hours after injection. Once the subjective drug effects subsided, a psychiatrist assessed suitability for discharge.
This experiment was part of a larger study, the data from which are published elsewhere, where additional information can also be found.

PROBABILISTIC REVERSAL LEARNING TASK

A schematic of the task is shown in the corresponding figure. On every trial, participants could choose from three visual stimuli, presented at three of four randomised locations on a computer screen. In the first half of the task (40 trials), choosing one of the stimuli resulted in positive feedback in the form of a green smiling face on 75% of trials. A second stimulus resulted in positive feedback 50% of the time, whilst the third stimulus yielded positive feedback on only 25% of trials. Negative feedback was provided in the form of a red frowning face. The first stimulus selected was defined as the initially rewarded stimulus; the choice on trial 1 always resulted in reward. The second stimulus that was selected was defined as the mostly punished stimulus, and by definition the third stimulus was then the 'neutral' stimulus. After 40 trials, the most and least optimal stimuli reversed, such that the stimulus that initially was correct 75% of the time was then only correct 25% of the time, and likewise the 25% correct stimulus then resulted in positive feedback on 75% of trials. There were 40 trials in the reversal phase. This is a recently developed version of a widely used PRL task, novel due to the addition of a 50% 'neutral' stimulus in order to distinguish learning to select the mostly rewarding stimulus from learning to avoid the mostly punishing stimulus.
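The feedback schedule described above can be sketched in Python. This is a minimal illustration, not the study's task code: the class name `ReversalTask`, the role labels and the use of Python's `random` module are assumptions, and the randomised on-screen stimulus locations are not modelled.

```python
import random

# Reward probabilities per stimulus role, as described above.
ACQUISITION = {"best": 0.75, "neutral": 0.50, "worst": 0.25}
REVERSAL = {"best": 0.25, "neutral": 0.50, "worst": 0.75}  # 75%/25% swap; neutral unchanged


class ReversalTask:
    """Hypothetical three-choice PRL schedule: n_acq acquisition trials, then n_rev reversal trials.

    Roles follow the order of first selection: the first stimulus chosen becomes
    'best' (75% rewarded in acquisition), the second distinct stimulus chosen
    becomes 'worst' (25%), and the remaining stimulus is 'neutral' (50%).
    """

    def __init__(self, n_acq=40, n_rev=40, seed=None):
        self.n_acq = n_acq
        self.n_trials = n_acq + n_rev
        self.rng = random.Random(seed)
        self.roles = {}  # stimulus id (0, 1 or 2) -> role
        self.trial = 0   # number of trials completed so far

    def _role(self, stimulus):
        if stimulus not in self.roles:
            self.roles[stimulus] = "best" if not self.roles else "worst"
            if len(self.roles) == 2:
                # By definition, the stimulus not yet chosen is the neutral one.
                (remaining,) = {0, 1, 2} - set(self.roles)
                self.roles[remaining] = "neutral"
        return self.roles[stimulus]

    def choose(self, stimulus):
        """Register a choice (0, 1 or 2) and return feedback: 1 = win, 0 = loss."""
        if self.trial >= self.n_trials:
            raise RuntimeError("task already finished")
        self.trial += 1
        role = self._role(stimulus)
        if self.trial == 1:
            return 1  # the choice on trial 1 always resulted in reward
        table = ACQUISITION if self.trial <= self.n_acq else REVERSAL
        return int(self.rng.random() < table[role])
```

For example, `ReversalTask(seed=1).choose(0)` returns 1 and makes stimulus 0 the initially rewarded ('best') stimulus; from trial 41 onwards the 'best' and 'worst' probabilities are swapped.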

RAW DATA MEASURES OF BEHAVIOUR

We examined whether LSD impaired participants' basic overall ability to perform the task by analysing the number of responses made to each stimulus during the acquisition and reversal phases. We measured feedback sensitivity by determining whether participants stayed with the same choice following positive or negative feedback (win-stay or lose-stay). The win-stay probability was defined as the number of times an individual repeated a choice after a win, divided by the number of trials on which positive feedback occurred (opportunities to stay after a win). Lose-stay probability was calculated in the same manner: the number of times a choice was repeated following a loss, divided by the total losses experienced. Note that in previous studies with a choice between only two stimuli (or responses), this metric is usually referred to as 'win-stay/lose-shift', which likewise captures the tendency to repeat (rather than switch) responses following a win, and the tendency to switch (rather than repeat) choices following a loss. Random choice would result in 50% win-stay and 50% lose-shift; however, in the current paradigm with three stimuli, this base rate is 33% (win-)stay and 67% (lose-)shift. We therefore encode both variables with respect to the stay (rather than shift) rate, but they are still conceptually identical to those in earlier studies. Perseveration was defined according to den Ouden et al. and was assessed based on responses in the reversal phase. A perseverative error occurred when two or more (now incorrect) responses were made to the previously correct stimulus, and these errors could occur at any point in the reversal phase. The first trial in the reversal phase (trial 41 of 80) was excluded from the perseveration analysis, however, because at that point behaviour cannot yet have been shaped by the new feedback structure.
Note again that this metric is not entirely identical to the previous studies cited employing two stimuli, as the base-rate choice for each stimulus is now 1/3, so the 'chance' level of perseverative errors is lower. Null hypothesis significance tests used α = 0.05.
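A minimal Python sketch of these raw measures, under stated assumptions: the function names are hypothetical; win-stay and lose-stay are computed over the whole choice sequence; and a perseverative error is read as any choice belonging to a run of two or more consecutive choices of the previously correct stimulus. That last point is one possible reading of the definition above, which the text does not pin down further.

```python
def stay_probabilities(choices, outcomes):
    """Win-stay and lose-stay probabilities as defined above.

    choices: stimulus chosen on each trial; outcomes: 1 = win, 0 = loss.
    A 'stay' means choosing the same stimulus on the next trial, so the final
    trial (which has no next trial) is not counted as an opportunity.
    """
    wins = losses = win_stays = lose_stays = 0
    for t in range(len(choices) - 1):
        stayed = choices[t + 1] == choices[t]
        if outcomes[t] == 1:
            wins += 1
            win_stays += stayed
        else:
            losses += 1
            lose_stays += stayed
    win_stay = win_stays / wins if wins else float("nan")
    lose_stay = lose_stays / losses if losses else float("nan")
    return win_stay, lose_stay


def perseverative_errors(choices, previously_correct, reversal_start=40):
    """Perseverative errors in the reversal phase (hypothetical reading of the definition).

    Assumption: every choice that belongs to a run of two or more consecutive choices
    of the previously correct stimulus counts as one error. choices is 0-indexed, so
    index reversal_start is trial 41, the first reversal trial, which is excluded.
    """
    errors = run = 0
    for c in choices[reversal_start + 1:]:
        if c == previously_correct:
            run += 1
            continue
        if run >= 2:
            errors += run
        run = 0
    if run >= 2:  # a run still open at the end of the task
        errors += run
    return errors
```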

MODEL FITTING, COMPARISON, AND INTERPRETATION

These methods are based on our previous work. We fitted three RL models to the behavioural data using a hierarchical Bayesian method, via Hamiltonian Markov chain Monte Carlo sampling implemented in Stan 2.17.2. Convergence was checked according to R̂, the potential scale reduction factor, which approaches 1 for perfect convergence; values below 1.2 are typically used as a guideline for determining model convergence. We assumed the three models had the same prior probability (1/3 ≈ 0.33). Models were compared via a bridge sampling estimate of the marginal likelihood, using the 'bridgesampling' package in R. Bridge sampling directly estimates the marginal likelihood, and therefore the posterior probability of each model given the data, the prior model probabilities, and the assumption that the models represent the entire set of models under consideration. Posterior distributions were interpreted using the 95% highest posterior density interval (HDI), the Bayesian 'credible interval'. Parameter recovery for this modelling approach has been confirmed in a previous study and is demonstrated in the online Supplementary material. The Bayesian hierarchy consisted of 'drug condition' at the highest level and 'subject' at the level below. For each parameter, each drug condition (e.g. LSD) had its own mean, with a prior that was the same across conditions (i.e. priors that were unbiased with respect to LSD v. placebo). This was then combined with the intersubject variability (assumed to be normally distributed; mean 0 by definition, standard deviation determined by a further prior). The priors used for each parameter are shown in the corresponding table. For instance, the learning rate for a given subject under LSD was taken as the group mean LSD value for learning rate plus the subject-specific component of learning rate.
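Bridge sampling yields an estimate of each model's log marginal likelihood; turning those estimates into posterior model probabilities under equal priors is then a normalisation step. A minimal Python sketch, assuming the log marginal likelihoods have already been estimated (the study used R's 'bridgesampling' package for that step); the function name is hypothetical.

```python
import math


def posterior_model_probabilities(log_marginal_likelihoods, prior_probs=None):
    """p(M_k | data) = p(data | M_k) p(M_k) / sum_j p(data | M_j) p(M_j).

    The normalisation assumes the candidate models are the entire set under
    consideration. Equal priors (1/3 each for three models) are used by default.
    """
    k = len(log_marginal_likelihoods)
    if prior_probs is None:
        prior_probs = [1.0 / k] * k
    log_post = [lml + math.log(p) for lml, p in zip(log_marginal_likelihoods, prior_probs)]
    # Subtract the maximum before exponentiating (log-sum-exp trick): log marginal
    # likelihoods are large negative numbers that would otherwise underflow to 0.
    m = max(log_post)
    weights = [math.exp(v - m) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]
```

Usage with made-up values: `posterior_model_probabilities([-1210.4, -1198.7, -1195.2])` returns three probabilities summing to 1, dominated by the model with the highest log marginal likelihood.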
The learning rate for a given subject under placebo was taken as the group mean placebo value for learning rate plus the subject-specific component of the learning rate for the same subject. This method accounts for the within-subjects structure of the study design. This was done similarly (and separately) for all other model parameters. To determine the change (LSD − placebo) in parameters, we calculated [group mean LSD learning rate] − [group mean placebo learning rate] for each of the ∼8000 simulation runs and tested the resulting distribution against zero via the HDI. This approach also removes distributional assumptions and provides an automatic multiple comparisons correction.
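The per-draw difference and HDI test described above can be sketched as follows. This is an illustrative Python version, not the authors' code: `hdi` uses the common "narrowest interval containing the requested fraction of sorted draws" estimator, which assumes a unimodal posterior, and both function names are hypothetical.

```python
import math


def hdi(samples, mass=0.95):
    """Highest posterior density interval estimated from posterior draws.

    Returns the narrowest interval containing ceil(mass * n) of the sorted draws;
    this estimator assumes a unimodal posterior.
    """
    xs = sorted(samples)
    n = len(xs)
    width = max(1, math.ceil(mass * n))  # number of draws the interval must contain
    start = min(range(n - width + 1), key=lambda i: xs[i + width - 1] - xs[i])
    return xs[start], xs[start + width - 1]


def drug_difference(lsd_group_means, placebo_group_means, mass=0.95):
    """Per-draw LSD - placebo difference of group means, and whether 0 lies outside its HDI.

    The two lists must be paired draws from the same MCMC iterations.
    """
    diffs = [a - b for a, b in zip(lsd_group_means, placebo_group_means)]
    lo, hi = hdi(diffs, mass)
    return diffs, not (lo <= 0.0 <= hi)
```

Passing `mass=0.90`, `0.99` or `0.999` corresponds to the graded criteria used in the results (0 ∉ 90%, 99% or 99.9% HDI).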

MODELS

The parameters contained in each model are summarised in the corresponding tables. With Model 1, we tested the hypothesis that positive v. negative feedback guides behaviour differentially, and that LSD affects this. We augmented a basic RL model with separate learning rates for reward, $\alpha_{\text{rew}}$, and punishment, $\alpha_{\text{pun}}$. Positive feedback led to an increase in the value $V_i$ of the stimulus $i$ that was chosen, at a speed governed by the reward learning rate, $\alpha_{\text{rew}}$, via

$$V_{i,t+1} \leftarrow V_{i,t} + \alpha_{\text{rew}} (R_t - V_{i,t}),$$

where $R_t$ represents the outcome on trial $t$ (defined as 1 on trials where positive feedback occurred), and $(R_t - V_{i,t})$ is the prediction error. On trials where negative feedback occurred, $R_t = 0$, which led to a decrease in the value $V_i$ at a speed governed by the punishment learning rate, $\alpha_{\text{pun}}$, according to

$$V_{i,t+1} \leftarrow V_{i,t} + \alpha_{\text{pun}} (R_t - V_{i,t}).$$

Stimulus value was incorporated into the final quantity controlling choice according to $Q^{\text{reinf}}_t = \tau_{\text{reinf}} V_t$. The additional parameter $\tau_{\text{reinf}}$, termed reinforcement sensitivity, governs the degree to which behaviour is driven by reinforcement history. The quantities $Q$ associated with the three available choices, for a given trial, were then fed into a standard softmax choice function to compute the probability of each choice:

$$P(a_{k,t}) = \frac{\exp(Q_{k,t})}{\sum_{j=1}^{n} \exp(Q_{j,t})}$$

for $n = 3$ choice options. The probability values for each trial emerging from the softmax function (e.g. the probability of choosing stimulus 1) were fitted to the subject's actual choices (did the subject choose stimulus 1?). No further softmax inverse temperature was applied ($\beta = 1$; see below), and as a result the reinforcement sensitivity parameter ($\tau_{\text{reinf}}$) directly represented the weight given to the exponents in the softmax function.

Model 2 again augmented a simple RL model, but now also described the tendency to repeat a response irrespective of the outcome that followed it (in other words, the tendency to 'stay' regardless of outcome). With Model 2 we tested the hypothesis that LSD affects this basic perseverative tendency. This was implemented using a 'stimulus stickiness' parameter, $\tau_{\text{stim}}$. The stimulus stickiness effect was modelled as $Q^{\text{stim}}_t = \tau_{\text{stim}} s_{t-1}$, where $s_{t-1}$ was 1 for the stimulus that was chosen on the previous trial and 0 for the other two stimuli. In this model, we used only a single RL rate, $\alpha_{\text{reinf}}$. Feedback updated the value $V_i$ of the chosen stimulus $i$ at a speed controlled by this learning rate, via $V_{i,t+1} \leftarrow V_{i,t} + \alpha_{\text{reinf}} (R_t - V_{i,t})$, so that positive feedback increased the value and negative feedback decreased it. The final quantity controlling choice incorporated the additional stickiness parameter as

$$Q_t = Q^{\text{reinf}}_t + Q^{\text{stim}}_t.$$

Quantities $Q$, corresponding to the three choice options on a given trial, were then fed into the softmax function as above. It should be noted that if $\tau_{\text{stim}}$ is not in the model (or is zero), then $\tau_{\text{reinf}}$ is mathematically identical to the notion of softmax inverse temperature typically implemented as $\beta$. The notation $\tau_{\text{reinf}}$ is used, however, because it contributes to $Q^{\text{reinf}}_t$ but not to $Q^{\text{stim}}_t$. A standard implementation of $\beta$, by contrast, would govern the effects of both $Q^{\text{reinf}}_t$ and $Q^{\text{stim}}_t$ by weighting their sum ($Q_t$).

Model 3 was the full model, incorporating separate reward and punishment learning rates as well as the stimulus stickiness parameter. With Model 3, we tested the hypothesis that LSD affects both how positive v. negative feedback differentially guides behaviour and a basic perseverative tendency. Again, the final quantity controlling choice was determined by

$$Q_t = Q^{\text{reinf}}_t + Q^{\text{stim}}_t.$$
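A minimal Python sketch of the full model's (Model 3's) trial-by-trial log-likelihood for a single subject, under stated assumptions: initial values of $V = 0$ and the absence of a stickiness term on the first trial are not specified in the text, the function name is hypothetical, and this is a plain single-subject likelihood rather than the hierarchical Stan model. The softmax is evaluated in log-sum-exp form for numerical stability.

```python
import math


def model3_log_likelihood(choices, outcomes, alpha_rew, alpha_pun, tau_reinf, tau_stim,
                          n_options=3, v0=0.0):
    """Log-likelihood of one subject's choices under the full model (hypothetical helper).

    choices: chosen stimulus index per trial; outcomes: 1 = positive, 0 = negative feedback.
    Assumptions: values start at v0 = 0 and there is no stickiness bonus on trial 1.
    """
    V = [v0] * n_options
    prev = None
    loglik = 0.0
    for c, r in zip(choices, outcomes):
        # Q_t = tau_reinf * V_t + tau_stim * s_{t-1}, where s_{t-1} marks the previous choice
        Q = [tau_reinf * V[k] + (tau_stim if k == prev else 0.0) for k in range(n_options)]
        m = max(Q)
        log_norm = m + math.log(sum(math.exp(q - m) for q in Q))
        loglik += Q[c] - log_norm  # log softmax probability of the observed choice (beta = 1)
        # Delta-rule update of the chosen stimulus only, with valence-specific learning rates
        alpha = alpha_rew if r == 1 else alpha_pun
        V[c] += alpha * (r - V[c])
        prev = c
    return loglik
```

In a fit, a function like this would be evaluated for candidate parameter values; the study instead embedded the equivalent computation in a hierarchical Bayesian model.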

LEARNING AND PERSEVERATION

First, we examined whether LSD altered participants' overall ability to choose the stimulus that led to reward most of the time. Behavioural performance is depicted in the corresponding figures. To examine whether LSD affected the number of times each stimulus was chosen, repeated-measures analysis of variance (ANOVA) was conducted with drug (LSD, placebo), phase (acquisition, reversal), and stimulus type (75, 50, or 25% rewarded) as within-subjects factors. This revealed a main effect of stimulus (F(1,23) = 30.66, p = 3 × 10^-6, ηp² = 0.63), a stimulus × phase interaction (F = 28.62, p = 2 × 10^-6, ηp² = 0.61), and no interaction of LSD with stimulus or phase (F < 1.5, p > 0.24, ηp² < 0.08, for terms involving LSD). The number of correct responses did not differ between placebo and LSD during the acquisition (paired-sample t test, t(18) = 0.84, p = 0.4, d = 0.19) or reversal phases (t(18) = 0.23, p = 0.8, d = 0.05). We then examined the relationship between initial learning and perseveration, following den Ouden et al. (see figure). LSD enhanced the relationship between the number of correct responses during the acquisition phase and the number of perseverative errors made during the subsequent reversal stage [acquisition correct responses (LSD minus placebo) v. reversal perseverative errors (LSD minus placebo): linear regression coefficient β = 0.56, p = 0.002]. Confirming this, making fewer errors during the acquisition phase predicted more perseverative errors when on LSD (β = 0.44, p = 0.003) but not when under placebo (β = 0.04, p = 0.8). Perseverative errors, a subset of all reversal errors, did not on their own differ between conditions (t(18) = 0.03, p = 0.98, d = 0.01).
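The difference-score regression above can be sketched as an ordinary least-squares slope over participants. Assumptions: the text does not state whether the reported β is standardised, so this returns the unstandardised slope; p-values are not computed; both function names are hypothetical.

```python
def ols_slope(x, y):
    """Unstandardised least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def difference_score_slope(acq_correct_lsd, acq_correct_pbo, persev_lsd, persev_pbo):
    """Regress the LSD-minus-placebo change in perseverative errors on the
    LSD-minus-placebo change in acquisition correct responses (one value per participant)."""
    dx = [a - b for a, b in zip(acq_correct_lsd, acq_correct_pbo)]
    dy = [a - b for a, b in zip(persev_lsd, persev_pbo)]
    return ols_slope(dx, dy)
```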

FEEDBACK SENSITIVITY

We next assessed whether LSD influenced individuals' responses on trials immediately after positive v. negative feedback: whether participants stayed with the same choice after a win or a loss (win-stay/lose-stay; see figure). Repeated-measures ANOVA with drug (LSD, placebo) and valence (win, loss) as within-subjects factors revealed a main effect of valence, with participants 'staying' more after wins than losses (F(1,18) = 37.76, p = 8.0 × 10^-6, ηp² = 0.68), and no main effect of LSD (F(1,18) = 0.20, p = 0.66, ηp² = 0.01). There was also no valence × LSD interaction (F(1,18) = 0.63, p = 0.44, ηp² = 0.03).
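With two levels per factor, each effect in this drug × valence repeated-measures ANOVA has one numerator degree of freedom, so its F equals the squared one-sample t statistic of a per-subject contrast score, and partial eta squared is F / (F + df_error). A minimal Python sketch; the function names and data layout are hypothetical, and it assumes the contrast scores are not all identical (a zero standard deviation would divide by zero).

```python
import math


def partial_eta_squared(f, df_error, df_effect=1):
    """Partial eta squared recovered from an F statistic: F*df1 / (F*df1 + df2)."""
    return f * df_effect / (f * df_effect + df_error)


def within_2x2_effects(data):
    """F(1, n - 1) and partial eta squared for a 2 x 2 fully within-subjects design.

    data: one dict per subject keyed by (drug, valence), e.g.
    {("LSD", "win"): 0.8, ("LSD", "loss"): 0.4, ("placebo", "win"): 0.7, ("placebo", "loss"): 0.5}.
    Each 1-df effect's F equals the squared one-sample t of a per-subject contrast.
    """
    contrasts = {
        "drug": lambda d: d["LSD", "win"] + d["LSD", "loss"] - d["placebo", "win"] - d["placebo", "loss"],
        "valence": lambda d: d["LSD", "win"] + d["placebo", "win"] - d["LSD", "loss"] - d["placebo", "loss"],
        "drug_x_valence": lambda d: (d["LSD", "win"] - d["LSD", "loss"]) - (d["placebo", "win"] - d["placebo", "loss"]),
    }
    n = len(data)
    results = {}
    for name, contrast in contrasts.items():
        scores = [contrast(d) for d in data]
        mean = sum(scores) / n
        sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))
        t = mean / (sd / math.sqrt(n))  # one-sample t against zero
        f = t * t
        results[name] = (f, partial_eta_squared(f, n - 1))
    return results
```

As a check against the reported values, with df_error = 18: 37.76/(37.76 + 18) ≈ 0.68, 0.20/18.20 ≈ 0.01 and 0.63/18.63 ≈ 0.03, matching the partial eta squared values given for valence, LSD and their interaction.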

CHOICE OF REINFORCEMENT LEARNING MODEL

The core modelling results are displayed in the corresponding figure. We fitted and compared three RL models. Convergence was good, with all three models having R̂ < 1.2. Behaviour was best characterised by an RL model with four parameters (see table). The four parameters in the winning model were: (1) reward learning rate, which reflects the degree to which the chosen stimulus value is increased following a positive outcome; (2) punishment learning rate, the degree to which the chosen stimulus value is decreased following a negative outcome; (3) reinforcement sensitivity, the degree to which the values learned through reinforcement contribute to final choice; and (4) 'stimulus stickiness', which quantifies the tendency to get 'stuck' to a stimulus and choose it because it was chosen on the previous trial, irrespective of the outcome. The last two parameters map onto the explore/exploit trade-off: low values of stickiness or reinforcement sensitivity characterise two different types of exploratory behaviour.
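To see what the value-free stickiness term does on its own, consider a trial on which all three learned values are equal. The common $\tau_{\text{reinf}} V$ term then cancels in the softmax, and the probability of repeating the previous choice depends only on $\tau_{\text{stim}}$. As a simplified illustration, the acquisition-phase group means reported in the stickiness results (0.46 placebo, 0.09 LSD) are plugged in as though they were a single subject's parameters:

$$P(\text{repeat}) = \frac{e^{\tau_{\text{stim}}}}{e^{\tau_{\text{stim}}} + 2}$$

$$\tau_{\text{stim}} = 0.46:\ \frac{e^{0.46}}{e^{0.46} + 2} \approx \frac{1.584}{3.584} \approx 0.44, \qquad \tau_{\text{stim}} = 0.09:\ \frac{e^{0.09}}{e^{0.09} + 2} \approx \frac{1.094}{3.094} \approx 0.35$$

Chance for three options is 1/3 ≈ 0.33, so at the LSD-like value the value-free pull towards the previously chosen stimulus is close to absent.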

REWARD AND PUNISHMENT LEARNING RATES

First, we modelled all 80 trials in the task (both acquisition and reversal phases); these results are depicted in the corresponding figure. The reward learning rate was significantly elevated on LSD (mean 0.87) compared to placebo (mean 0.28), with the posterior 99.9% HDI of the difference between these means excluding zero (0 ∉ 99.9% HDI). There was also an increased punishment learning rate under LSD (mean 0.48) relative to placebo (mean 0.39) (drug difference, 0 ∉ 99% HDI; 99% HDIs not shown graphically). LSD increased the reward learning rate to a greater extent than the punishment learning rate [$(\alpha_{\text{rew,LSD}} - \alpha_{\text{rew,placebo}}) - (\alpha_{\text{pun,LSD}} - \alpha_{\text{pun,placebo}}) > 0$; drug difference, 0 ∉ 99% HDI]. To better understand how LSD affected the dynamics of flexible choice behaviour, we then modelled the acquisition and reversal phases separately (40 trials each). During acquisition (see figure), the reward learning rate was elevated under LSD (mean 0.72) compared to placebo (mean 0.17) (drug difference, 0 ∉ 99% HDI). The punishment learning rate during acquisition, meanwhile, was not significantly elevated under LSD (mean 0.34) compared to placebo (mean 0.47) (no drug difference, 0 ∈ 90% HDI). LSD increased the reward learning rate more than the punishment learning rate [$(\alpha_{\text{rew,LSD}} - \alpha_{\text{rew,placebo}}) - (\alpha_{\text{pun,LSD}} - \alpha_{\text{pun,placebo}}) > 0$; drug difference, 0 ∉ 99.9% HDI]. During the reversal phase (see figure), the reward learning rate was elevated under LSD (mean 0.96) compared to placebo (mean 0.77) (drug difference, 0 ∉ 90% HDI), as was the punishment learning rate (LSD mean 0.42; placebo mean 0.31; drug difference, 0 ∉ 90% HDI). During reversal, there was no difference between the effect of LSD on the reward learning rate and its effect on the punishment learning rate [$(\alpha_{\text{rew,LSD}} - \alpha_{\text{rew,placebo}}) - (\alpha_{\text{pun,LSD}} - \alpha_{\text{pun,placebo}})$; drug difference, 0 ∈ 99.9% HDI].
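Because the reward-versus-punishment contrast is linear in the group means, its posterior mean equals the same contrast applied to the posterior means. Assuming the means reported above are posterior means of the group-level parameters on the scale shown (the text does not state this explicitly), the point estimates work out as below. Whether zero lies inside a given HDI can only be judged from the per-draw distribution of the contrast, not from these rounded means.

$$\text{Full task: } (0.87 - 0.28) - (0.48 - 0.39) = 0.59 - 0.09 = 0.50$$

$$\text{Acquisition: } (0.72 - 0.17) - (0.34 - 0.47) = 0.55 - (-0.13) = 0.68$$

$$\text{Reversal: } (0.96 - 0.77) - (0.42 - 0.31) = 0.19 - 0.11 = 0.08$$

The small reversal-phase value is consistent with the absence of a credible reward-versus-punishment difference in that phase.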

STIMULUS STICKINESS AND REINFORCEMENT SENSITIVITY

Modelling both acquisition and reversal contiguously, stimulus stickiness was lowered by LSD (mean 0.23) relative to placebo (mean 0.43) (drug difference, 0 ∉ 90% HDI; see figure), a manifestation of increased exploratory behaviour. Reinforcement sensitivity was not modulated by LSD (LSD mean 4.70, placebo mean 5.57; no drug difference, 0 ∈ 95% HDI). This is in line with the absence of an effect of LSD on the tendency to 'stay' following reward or punishment (see analysis of raw data measures above). When modelling the acquisition phase alone (see figure), stimulus stickiness was diminished under LSD (mean 0.09) compared to placebo (mean 0.46) (drug difference, 0 ∉ 90% HDI), as was reinforcement sensitivity (LSD mean 4.92; placebo mean 6.54; drug difference, 0 ∉ 90% HDI). In other words, during acquisition, behaviour under LSD was more exploratory as assessed by two metrics: one value-based (reinforcement sensitivity) and one value-free (stimulus stickiness). When modelling the reversal phase alone (see figure), stimulus stickiness remained decreased under LSD (mean 0.36) compared to placebo (mean 0.58) (drug difference, 0 ∉ 90% HDI), as during acquisition. Reinforcement sensitivity, however, which had been decreased under LSD during acquisition, was instead increased under LSD during the reversal phase (LSD mean 3.64; placebo mean 2.47; drug difference, 0 ∉ 90% HDI).

RELATIONSHIP BETWEEN MODEL PARAMETERS AND RAW DATA BEHAVIOURAL MEASURES

Analyses to understand the relationship between computational and raw data measures were conducted. Given the initial finding on the relationship between better acquisition learning and perseveration, the first question addressed was whether the elevated reward learning rate under LSD during acquisition, from the computational model, was predictive of the raw data measure of perseveration from den. Simple linear regression showed that under LSD, a higher reward learning rate during acquisition predicted significantly more perseverative errors (β = 26.94, p = 0.02), whereas no such relationship was present when the same participants were under placebo (β = 9.59, p = 0.40). Next, we examined the relationship between the stimulus stickiness parameter from the computational model and the raw data measure of perseveration. Stimulus stickiness during reversal was not significantly correlated with the raw data measure of perseveration, in either the placebo (β = 4.13, p = 0.50) or LSD (β = 11.60, p = 0.09) condition. Further exploratory analyses are reported in the online Supplementary material.
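The regressions above relate a per-participant model parameter to a raw count of perseverative errors, separately for each drug condition. A minimal Python sketch of the ordinary-least-squares slope (β) for one condition follows; it omits the standard error and t-test needed for the reported p-values, and the toy inputs are hypothetical, not study data.

```python
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of y on x from simple linear regression (ordinary least squares)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance")
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical per-participant values for one condition: acquisition-phase
    # reward learning rate and number of perseverative errors during reversal.
    alpha_rew_acq = [0.2, 0.4, 0.6, 0.8]
    persev_errors = [3, 7, 11, 15]
    print(ols_slope(alpha_rew_acq, persev_errors))
```

Because learning rates lie between 0 and 1, β is the predicted change in perseverative errors across that whole range, which is why slopes in the tens are plausible. On Python 3.10 and later, `statistics.linear_regression` returns the same slope.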

DISCUSSION

There has been a recent surge of interest in the potential therapeutic effects of psychedelics, including LSD. Theorising on the mechanisms of such effects centres on their role in enhancing learning and plasticity. In the current study, we tested these postulated effects of LSD on flexible learning in humans and found that LSD increased learning rates, exploratory behaviour, and the impact of previously learnt values on subsequent perseverative behaviour. Specifically, LSD increased the speed at which value representations were updated following prediction error (the mismatch between expectations and experience). Whilst LSD enhanced the impact of both positive and negative feedback, overall it augmented learning from reward significantly more than learning from punishment. The observation that LSD enhanced learning rates may be particularly important for understanding the mechanisms through which LSD might be therapeutically useful. Psychedelic drugs have been hypothesised to destabilise pre-existing beliefs (to relax prior beliefs, or 'priors'), making them amenable to revision. The notion of relaxed priors is directly compatible with increased RL rates: in our study, LSD rendered subjects more sensitive to prediction errors, which implies a downweighting of prior beliefs. That LSD affected a fundamental belief-updating process is notable given that psychedelics are under investigation trans-diagnostically for diverse clinical disorders, including depression, anxiety, alcohol and nicotine abuse, obsessive-compulsive disorder (OCD), and eating disorders. A unifying feature of these conditions is intransigent maladaptive associations in need of revision. Behaviour was more exploratory overall under LSD, as assessed computationally in two ways, consistent with theoretical accounts of psychedelic effects that have predicted increased exploratory tendencies. 
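The link drawn here between higher learning rates and relaxed priors can be seen in a Rescorla–Wagner-style update, where the new value is a weighted average of the prior value and the latest outcome. The Python sketch below assumes separate reward and punishment learning rates and codes outcomes as +1 (reward) and −1 (punishment); the coding and the function name are illustrative assumptions, not a restatement of the study's model.

```python
def update_value(q: float, outcome: float, alpha_rew: float, alpha_pun: float) -> float:
    """One prediction-error update of the chosen stimulus's value.

    Assumed outcome coding (hypothetical): +1 for reward, -1 for punishment.
    """
    alpha = alpha_rew if outcome > 0 else alpha_pun
    prediction_error = outcome - q
    # Equivalent to (1 - alpha) * q + alpha * outcome: the prior value q keeps
    # weight (1 - alpha), so a larger learning rate discounts the prior more.
    return q + alpha * prediction_error


if __name__ == "__main__":
    prior = -0.6  # hypothetical prior: this stimulus is expected to be punished
    for alpha_rew in (0.2, 0.8):
        print(alpha_rew, update_value(prior, +1.0, alpha_rew, alpha_pun=0.3))
```

With the lower reward learning rate, a single surprising reward moves the value only partway and it stays negative; with the higher rate it crosses zero, so the prior expectation is largely overwritten.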
First, LSD decreased stimulus stickiness, which indicates a diminished tendency to repeat previously chosen options irrespective of reinforcement history (value-free). This effect on stickiness was significant in all phases of the experiment: when considering the experiment as a whole (acquisition and reversal), when examining initial learning only (acquisition), and when isolating the reversal phase. In other words, regardless of LSD-induced changes in value-guided choice strategies (elaborated upon below), LSD promoted an overall latent tendency to explore by shifting between choices, irrespective of feedback and value, and this tendency was maintained under both stable and changing circumstances. That LSD lowered stimulus stickiness may also be clinically relevant: stimulus stickiness was recently shown to be abnormally high in cocaine and amphetamine use disorders. LSD also modulated value-based exploratory tendencies (indexed by the reinforcement sensitivity parameter), which, by contrast, differed by phase. When looking at the experiment as a whole, there was no effect of LSD on reinforcement sensitivity, but this null result masked opposing phase-specific effects. When examining initial learning only, reinforcement sensitivity was substantially diminished under LSD, indicating a tendency for increased exploration away from the more highly valued choice option. During the reversal phase, meanwhile, reinforcement sensitivity was increased, indicating a heightened tendency to exploit the choice option computed to be more highly valued trial by trial, which can be seen as adaptive when circumstances change and rapid reorienting of actions is required. A shift in the computations underlying choice was also observed for RL rates, between initial learning to maximise reward and minimise punishment and the adaptation of actions following contingency reversal. 
Whereas, overall, LSD enhanced both the reward and punishment learning rates (especially the reward learning rate), the increase in the punishment learning rate appeared during the reversal phase only. The reward learning rate was elevated in both the acquisition and reversal phases. Together, these learning rate findings suggest that LSD accelerates the updating of value in a way that is, overall, especially reward-driven, and that it speeds up learning from the negative feedback encountered when circumstances change. Under LSD, better initial learning led to more perseverative responding. The implication is that when a behaviour is newly and more strongly learned through positive reinforcement under LSD (i.e. in the acquisition phase), it may persist more strongly even when that action is no longer relevant (i.e. in the reversal phase). These feedback-defined measures of overt performance are orthogonal to an overall latent tendency towards exploration irrespective of reinforcement history (low stimulus stickiness). Importantly, perseveration itself, as assessed in the analysis of raw data measures, was not elevated by LSD, nor did it correlate with stimulus stickiness (online Supplementary Table). Given the broad effect of LSD on a range of neurotransmitter systems, it is not possible to determine the specific neurochemical mechanism underlying the observed effects of LSD on learning. Nonetheless, obvious candidates involve the serotonin and dopamine systems, in particular 5-HT2A and D2 receptors. Specifically, the psychological plasticity purportedly promoted by psychedelics is believed to be mediated through action at 5-HT2A receptors (Carhart-Harris & Nutt, 2017) via downstream enhancement of glutamatergic activity and brain-derived neurotrophic factor (BDNF) expression. The hypothesis that the present results regarding RL rates are driven by the serotonergic effects of LSD is supported by two recent studies in mice. 
Optogenetically stimulating dorsal raphé serotonin neurons enhanced RL rates, whilst activation of these neurons tracked both reward and punishment prediction errors during reversal learning. Neurotoxic manipulation of serotonin in marmoset monkeys during PRL, meanwhile, altered stimulus stickiness; this implicates a serotonergic mechanism in the increased exploratory behaviour following LSD administration in the present study. In addition to affecting the serotonin system, however, LSD also acts at dopamine receptors, albeit with a far lower direct affinity for dopamine receptors than for 5-HT receptors. Dopamine has long been known to play a crucial role in belief updating following reward, and more recent evidence shows that dopaminergic manipulations may alter learning rates. A dopaminergic effect would be in line with our previous study, in which genetic variation in the dopamine transporter, but not in the serotonin transporter, was associated with the same enhanced relationship between acquisition and perseveration as reported here under LSD. Serotonin-dopamine interactions represent another candidate mechanism that could underlie the present findings. For example, stimulation of 5-HT2A receptors in the prefrontal cortex of the rat enhanced ventral tegmental area dopaminergic activity. Indeed, the initial action of LSD at 5-HT2A receptors has been proposed to sensitise dopamine neuron firing. LSD action at D2 receptors, albeit with a low binding affinity, may be more pronounced in a late phase of LSD's effects, which may be relevant given the relatively long delay between LSD administration and performance of the current task (see Methods). However, arguing against a late dopaminergic effect is a previous study in rodents in which the effects of LSD on reversal learning were consistent across four different time lags between drug administration and behavioural testing. 
The enhanced coupling of acquisition learning and perseverative responding under LSD is in line with a recent study showing that LSD induced higher-order cognitive inflexibility in a set-shifting paradigm. Importantly, these effects were blocked by co-administration of the 5-HT2A antagonist ketanserin, showing that the LSD-induced impairments were mediated by 5-HT2A agonism, consistent with a 5-HT2A mechanism underlying the present results. Taken together, the effects of LSD in increasing acquisition-perseveration coupling and worsening set-shifting suggest that what is newly or recently learnt through reinforcement under LSD is more 'stamped in', and thus may subsequently be harder to update. Whilst these findings are ostensibly at odds with the observation that LSD enhanced plasticity (through enhanced learning rates), they can be reconciled by considering the timing of drug administration with respect to initial learning and tests of cognitive flexibility. In both the present experiment and the previous set-shifting study, all phases of learning (acquisition and reversal) were conducted after LSD administration. In contrast, when acquisition learning was conducted prior to LSD administration, LSD improved reversal learning (in a reversal paradigm in rats). Likewise, when acquisition learning was conducted prior to the administration of a 5-HT2A antagonist, reversal learning was impaired. Collectively, these findings suggest that whether a prior belief is down- or up-weighted under LSD may depend on whether the prior is formed before or during drug administration, respectively. This observation is of great relevance for a putative therapeutic setting, in which maladaptive beliefs will have been formed before treatment. 
Another important consideration for reconciling the effects of 5-HT2A receptor modulation on behavioural and cognitive flexibility is that 5-HT2A antagonism can produce opposite effects depending on whether the orbitofrontal cortex (OFC) or the striatum is targeted, complicating the interpretation of studies employing systemic administration. Species, strain, dose, compound, route of administration, task specifications (and the engagement of cortical and subcortical structures), and reinforcement schedule must also be considered. The application of computational modelling may also help unify effects across studies and species. While we observed an effect of LSD on acquisition-perseveration coupling, reminiscent of a previous similar observation as a function of genetic variability in the dopamine transporter, we unexpectedly did not observe effects of LSD on acquisition performance or perseveration directly, or on lose-stay and win-stay behaviour. More broadly, the effects of LSD observed here differ from those of neurochemically more specific manipulations, such as acute serotonin reuptake inhibition or neurotoxic serotonin depletion. More in line with the present raw data findings, previous studies of LSD found no effect on perseveration in an outcome devaluation paradigm, and a study of visual memory during paired associates learning likewise found no effect. Our computational modelling approach was more sensitive to the effects of LSD. We speculate that these robust computational effects can be reconciled with the minimal effects on overt behavioural performance as follows. Subtle differences in states of underlying plasticity may not translate into overt differences in instrumental or Pavlovian responses, even if the long-term expression of these learned responses would differ. For example, in the memory reconsolidation literature, a previously learned associative memory is believed to become susceptible to disruption (e.g. 
pharmacologically or behaviourally) following cued reactivation or recall, for a period of several hours known as the 'reconsolidation window'. There is evidence that conducting extinction training (learning) during the reconsolidation window, when mechanisms of plasticity differ, does not alter the overt success or failure of extinction within the session, yet has long-term effects: extinction learned during the reconsolidation window can be more enduring than extinction learned outside this window. These Pavlovian extinction data, showing no difference during extinction itself, may parallel the instrumental conditioning data in the present study: we report no observable effect of LSD on most raw data measures (e.g. number of correct responses), yet latent learning processes related to purported mechanisms of plasticity, namely learning rates, were affected. Future studies will need to determine whether and how this apparent window of heightened plasticity can be harnessed for therapeutic benefit. Limitations of this study include the following. We have made a case for the critical involvement of the 5-HT2A receptor; however, we cannot be sure which particular receptor interaction(s) caused the current findings. LSD, in addition to binding with high affinity to 5-HT2A receptors, acts at numerous other receptors, including D1, D2, 5-HT1A/1B/1D, 5-HT2C, 5-HT5A, 5-HT6 and 5-HT7. Indeed, 5-HT2C receptors can counter 5-HT2A effects on reversal learning. A future study co-administering LSD with a 5-HT2A antagonist would help discern the putative 5-HT2A-mediated effects. Additionally, the subjective effects and plasma levels of LSD were not measured at the time of task administration. Furthermore, even though our parameter recovery analysis was successful (see online Supplementary material), we were unable to reproduce in simulated data the initial learning-perseveration effect observed in the behavioural data. 
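Parameter recovery, mentioned among the limitations, asks whether fitting a model to data simulated from known parameters returns those parameters. The Python sketch below is a deliberately reduced illustration: a single learning rate, a fixed reinforcement sensitivity, no stickiness term, hypothetical reward probabilities for three stimuli, and maximum-likelihood estimation by grid search. It is not the study's Bayesian fitting procedure, and all names and values are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

REWARD_PROBS = (0.8, 0.2, 0.2)  # hypothetical contingencies, not the study's


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def simulate(alpha: float, tau: float, n_trials: int, seed: int) -> List[Tuple[int, float]]:
    """Generate (choice, outcome) pairs from a simple Q-learning agent."""
    rng = random.Random(seed)
    q = [0.0] * len(REWARD_PROBS)
    data = []
    for _ in range(n_trials):
        p = softmax([tau * v for v in q])
        c = rng.choices(range(len(q)), weights=p)[0]
        r = 1.0 if rng.random() < REWARD_PROBS[c] else -1.0  # +1 reward, -1 punishment
        q[c] += alpha * (r - q[c])
        data.append((c, r))
    return data


def log_likelihood(alpha: float, tau: float, data: Sequence[Tuple[int, float]]) -> float:
    """Log-probability of the observed choices under a candidate learning rate."""
    q = [0.0] * len(REWARD_PROBS)
    ll = 0.0
    for c, r in data:
        ll += math.log(softmax([tau * v for v in q])[c])
        q[c] += alpha * (r - q[c])
    return ll


def recover_alpha(data: Sequence[Tuple[int, float]], tau: float,
                  grid: Sequence[float]) -> float:
    """Maximum-likelihood learning rate over a grid (tau held at its true value)."""
    return max(grid, key=lambda a: log_likelihood(a, tau, data))


if __name__ == "__main__":
    grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    true_alpha = 0.5
    data = simulate(true_alpha, tau=3.0, n_trials=4000, seed=1)
    print(true_alpha, recover_alpha(data, tau=3.0, grid=grid))
```

With many simulated trials the recovered value should land close to the generating one. A full recovery analysis would repeat this across a range of generating values, fit all parameters jointly, and use a task length like the study's 80 trials, where estimates are noisier.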
In summary, the core result of this study was that LSD enhanced the rate at which humans updated their beliefs based on feedback. LSD enhanced learning most strongly from reward and, to a lesser extent, from punishment. LSD also increased exploratory behaviour. These findings have implications for understanding the mechanisms through which LSD might be therapeutically useful for revising deleterious associations.

Supplementary material. The supplementary material for this article can be found at
