Psilocybin, Ayahuasca, Mescaline, LSD

‘Equal-unblinding’ meta-analysis of psychedelic therapy vs. antidepressants for the treatment of depression

This preprint reports a pre-registered meta-analysis (s = 24 trials) comparing psychedelic-assisted therapy (PAT) with open-label traditional antidepressants (tAD) for major depression. It found no significant difference in effectiveness between the two approaches, and both produced clinically meaningful improvements. This challenges previous assumptions about PAT's superiority once the unblinding present in psychedelic trials is accounted for.

Authors

  • Barnett, H.
  • Szigeti, B.
  • Williams, Z. J.

Published

OSF Preprints
Meta-analysis

Abstract

Importance: Psychedelic-assisted therapy (PAT) trials have high levels of functional unblinding. This effect positively biases results when PAT trials are compared against truly blinded trials.

Objective: This pre-registered meta-analysis investigated the comparative efficacy of PAT and open-label traditional antidepressants (tAD; such as SSRIs and SNRIs) for the treatment of major depression. The rationale is that PAT is effectively always open-label; thus, it is only fair to compare its results against open-label tAD trials, so that both interventions benefit equally from effects associated with patients knowing their treatment.

Data Sources: PubMed was systematically searched for trials of PAT and open-label tAD for the treatment of major depression without comorbidity in non-psychotic adult outpatients. Twenty-four of the 619 initially retrieved records were included in the analysis.

Data Extraction and Synthesis: Depression scores were extracted by two independent reviewers; estimates were pooled with both Bayesian and frequentist mixed-effects models. Reporting follows the PRISMA guideline.

Main Outcome(s) and Measure(s): Following pre-defined hypotheses, we compared the mean within-arm effect size from baseline to primary endpoint (i.e., the patient improvement) between PAT and open-label tAD trials on the 17-item Hamilton Depression Rating Scale (HAMD-17). We also compared the within-arm effect size of blinded vs. open-label trials for both PAT and tAD to assess the influence of blinding.

Results: In total, eight PAT trials involving 548 patients and sixteen open-label tAD trials involving 9751 patients were included. Contrary to the prior hypothesis, PAT was no more effective than open-label tAD (estimated difference: 0.3 favouring open-label tAD; 95% confidence interval: −1.39 to 1.98; p = 0.730). Open-label tAD was associated with better outcomes than blinded tAD (1.3 [0.07, 2.51]; p = 0.038), but the same difference was not observed for PAT (0.4 [−2.20, 3.11]; p = 0.738).

Conclusions and Relevance: Both tAD and PAT were associated with robust, statistically and clinically meaningful improvements. However, PAT’s lack of superiority over tADs under equal-unblinding conditions highlights the influence of blinding integrity and presents a sobering viewpoint on the treatment’s potential.


Research Summary of ‘Equal-unblinding’ meta-analysis of psychedelic therapy vs. antidepressants for the treatment of depression

Introduction

Williams and colleagues frame the analysis around two contrasting pictures of antidepressant efficacy: traditional antidepressants (tADs, primarily SSRIs and SNRIs) show small treatment–placebo (between-arm) differences on the HAMD-17, while psychedelic-assisted therapy (PAT) trials report much larger between-arm effects. The introduction highlights functional unblinding (patients deducing their allocation from drug effects) as especially pervasive in PAT trials because of the drugs' intense subjective effects, and notes that unblinding can inflate between-arm estimates. The authors note that direct head-to-head evidence comparing PAT and tAD is limited, and argue that comparing PAT trials (effectively unblinded) with truly blinded tAD trials gives PAT an unfair advantage. This pre-registered meta-analysis therefore compares PAT against open-label tAD trials, so that both treatments benefit equally from patients knowing their treatment. The primary aim was to compare within-arm change from baseline to primary endpoint (the patient improvement) on the HAMD-17. The authors also planned to compare blinded versus open-label trials within each treatment class to quantify the influence of blinding. The analysis focuses on non-psychotic adult outpatients with major depressive disorder and converts other depression scales into HAMD-17 equivalents for comparability.

Methods

The investigators pre-registered their search and analysis plan and searched PubMed in March 2024 for trials in adults (mean age between 18 and 65) of either open-label tADs (defined as new-generation antidepressants) or PAT using LSD, psilocybin, mescaline (including San Pedro/Peyote), or DMT/ayahuasca. Trials were excluded if they involved inpatients or patients with psychotic depression or significant comorbidity (an exception was made for comorbid anxiety). Augmentation/combination trials and trials with run-in periods were excluded, except where a run-in phase itself met the inclusion criteria, in which case it was treated as an open-label tAD phase. The team excluded open-label antidepressant trials with fewer than 100 participants but applied no minimum sample size to psychedelic trials. Two reviewers (BS and HB) independently screened and extracted data, resolving conflicts by consensus; attempts to recover missing data included contacting study authors. Scores on other depression scales (BDI-I/II, MADRS, HAMD-21, QIDS) were converted into HAMD-17 units; conversions between the SDs of endpoint and change scores assumed a conservative correlation coefficient of 0.5. The primary outcome was the within-arm effect (change from baseline to primary endpoint in HAMD-17 units). Analyses used both pre-registered Bayesian and frequentist multilevel random-effects meta-analytic models that accounted for nested outcomes (different scales) and included baseline depression severity as a covariate with a fixed effect and random slope. Models testing the main hypotheses included a binary focal variable for treatment type (PAT vs open-label tAD) or for blinding (open-label vs blinded), as appropriate. Bayesian results are reported as posterior medians with 95% credible intervals (CrI95) and frequentist results as means with 95% confidence intervals (CI95); secondary analyses used frequentist models for computational efficiency.
Additional procedural notes provided by the authors include handling of crossover data (extracting first-period data except for specific fixed-order designs), an explicit decision not to perform a formal risk-of-bias assessment because open-label trials are high risk by definition, and several deviations from the preregistration (for example, restricting 6–12 week primary endpoints to tAD trials only, changing the dependent variable to change scores, and simplifying random-effects structures when complex models did not converge). The included-trials set used for the primary analyses comprised 16 open-label tAD trials and 8 PAT trials from an initial pool of 619 records; an extended dataset added four near-miss PAT studies for robustness checks.

Results

The primary dataset comprised 16 open-label tAD trials (n=9,751) and 8 PAT trials (n=548). Mean time from baseline to the primary endpoint differed between groups: 8.1±1.5 weeks for tAD trials and 3.4±2.2 weeks for PAT trials. Baseline HAMD-17 scores averaged 22.7±1.9 for open-label tAD trials and 21.3±3.9 for PAT trials. Hypothesis 1 (PAT vs open-label tAD): Bayesian models estimated the within-arm change for open-label tAD at approximately −12.5 HAMD units (CrI95 −12.9 to −12.2; SMD ≈ −2.7) and for PAT a very similar magnitude (SMD ≈ −2.6). The posterior mean for the PAT minus tAD difference was βPAT−tAD ≈ 0.25 HAMD units, favouring tAD, and the posterior probability that PAT reduced depression by ≥3 HAMD units more than open-label tAD was only 0.2%. The posterior probability that the true difference lay within ±3 HAMD units (the region of practical equivalence, where 3 HAMD units was the pre-specified minimal clinically important difference, MCID) was 99.1%. An extended dataset produced a similar estimate (βPAT−tAD=0.43, CrI95 [−0.17, 2.55]). The frequentist models likewise found no significant difference: the estimated between-treatment difference was ~0.3 HAMD units favouring open-label tAD (CI95 [−1.39, 1.98], p=0.730). Sub-analyses comparing PAT with SSRIs, SNRIs, and non-SSRI/non-SNRI tADs found no significant differences. Hypotheses 2 and 3 (blinded vs open-label within treatment classes): For tAD, the Bayesian estimate of the blinded within-arm change was about −12.3 HAMD units (SMD ≈ −2.6), with open-label administration producing a somewhat larger reduction (the authors' Discussion cites a Bayesian blinded-vs-open-label difference of about 0.9 HAMD units). The frequentist model found a small but statistically significant advantage for open-label over blinded tAD administration, βblind−OL = 1.29 HAMD units (CI95 [0.07, 2.51], p=0.038). The authors note that this corresponds to roughly half the pre-specified MCID and that the entire CI95 falls within ±3 HAMD units, implying that the magnitude is practically negligible.
For PAT, neither the Bayesian nor frequentist models found a significant difference between blinded and open-label trials (frequentist βblind−OL ≈ 0.45, p=0.738), though the CI95 intersected the ±3 HAMD unit region so formal equivalence was not established. Sensitivity analyses using the extended dataset did not materially change these results. Across both treatment classes the within-arm improvement from baseline to primary endpoint was approximately 12 HAMD units, a magnitude the authors describe as both statistically and clinically large.

Discussion

Williams and colleagues interpret their main finding as a failure to confirm the preregistered hypothesis that PAT would be clinically superior to open-label tAD by at least the pre-specified MCID of 3 HAMD units. Instead, both treatments produced similar, large within-arm symptom reductions of roughly 12 HAMD units, and the between-treatment difference was negligible (~0.3 HAMD units) and statistically non-significant. The authors emphasise that this comparison (PAT versus open-label tAD) was chosen to equalise the advantage that knowledge of allocation can confer, reasoning that open-label comparisons better reflect how medicines are used in routine clinical practice. To reconcile their null finding with prior reports that PAT shows much larger between-arm effects than tAD, the investigators propose two contributors. First, open-label administration of tADs appears to modestly improve outcomes relative to blinded administration (about 1.1 HAMD units, averaging the Bayesian and frequentist estimates), reflecting some benefit of patients knowing their treatment. Second, pooled evidence suggests that the placebo response is substantially suppressed in PAT trials (the authors cite an estimated 4.0 HAMD units lower placebo response than in tAD trials), which inflates the between-arm treatment–placebo difference for PAT. Together, these two effects approximately account for PAT's apparent advantage of about 5 HAMD units in between-arm comparisons. The authors describe this suppressed placebo response in PAT as a potential 'knowcebo' effect arising when participants realise they are in the control arm and feel disappointed, and they note that in some PAT trials the placebo arms worsened. The authors also acknowledge several limitations. Some PAT trials enrolled treatment-resistant depression (TRD) cases while no tAD trials did, but baseline severity was controlled for, and sensitivity analyses excluding TRD-only PAT trials did not change the result.
PAT trials are not fully unblinded in the mathematical sense (correct-guess rates are reported near 95%, not 100%), so a residual unblinding imbalance could remain, though it is likely much smaller than in a blind-versus-blind comparison. The comparison is imperfect because PAT typically includes psychotherapy while the tAD comparators were pharmacological only; the authors note no trials of open-label tAD plus therapy were available for comparison. They also underline that their analysis focused solely on symptom reduction: other outcomes such as social functioning, connectedness, meaning in life, adverse events, and longer-term follow-up metrics may differ between interventions and were not synthesised here. Finally, the authors state they did not perform a formal risk-of-bias assessment because open-label trials are inherently high risk. In their concluding interpretation, the investigators present a tempered view: PAT produces large within-arm improvements that justify cautious optimism, but PAT was not superior to open-label tAD under their equal-unblinding comparison, underscoring the influence of blinding integrity on apparent efficacy and calling for measured expectations about PAT's comparative advantage.


INTRODUCTION

Current treatments for major depression have a small treatment-placebo difference, also known as the specific treatment effect or between-arm effect, but promising new treatments are emerging. Current treatments include selective serotonin reuptake inhibitors (SSRIs), serotonin-norepinephrine reuptake inhibitors (SNRIs), and a few medications outside these classes, such as mirtazapine. Collectively, we call these traditional antidepressants (tADs). A comprehensive meta-analysis found a significant tAD-placebo difference, but its magnitude was only ~2 points on the 17-item Hamilton Depression Rating Scale (HAMD-17). This small difference has raised questions about whether tADs provide a clinically meaningful benefit over placebo. Psychedelic-assisted therapy (PAT), which combines psychotherapy with large doses of psychedelics, has emerged as a novel depression treatment and attracted much attention. Unlike tAD trials, PAT trials have reported large PAT-placebo differences, averaging ~7.3 HAMD-17 units. In open-label trials, patients know their treatment; in contrast, blind trials conceal treatment allocation. Even in formally blind trials, patients can sometimes deduce their treatment allocation from side effects and/or other cues. This phenomenon, called functional unblinding, may inflate the drug-placebo difference. Functional unblinding is exceedingly common in psychedelic trials because of the drugs' intense subjective effects. In a truly blind trial, ~50% of patients guess their treatment allocation correctly (assuming two arms and 1:1 allocation); in PAT trials, however, the correct-guess rate is 90-95%, even when an active placebo is used. Functional unblinding played a prominent role in the FDA's rejection of MDMA-assisted psychotherapy as a treatment for PTSD (U.S. Food and Drug Administration). Researchers continue to debate whether blinding is maintained in tAD trials; two recent meta-analyses arrived at opposite conclusions. However, the correct-guess rate in blinded tAD trials is only ~60%, substantially lower than in PAT trials. Therefore, even if unblinding is present in tAD trials, its magnitude is much smaller than in PAT trials.

There has been limited empirical work on the comparative efficacy of PAT vs. tAD; only one head-to-head trial (psilocybin vs. escitalopram) has been conducted. At its 6-week primary endpoint, there was no between-treatment difference on the primary outcome measure, the Quick Inventory of Depressive Symptomatology (QIDS). However, psilocybin showed significantly greater improvement on all secondary depression measures (HAMD-17, Beck Depression Inventory, Montgomery-Åsberg Depression Rating Scale). The present meta-analysis aims to improve our understanding of the comparative efficacy of PAT vs. tAD by comparing PAT with open-label tADs, so that both treatments benefit equally from effects associated with patients knowing their treatment.

PRE-REGISTRATION AND HYPOTHESIS

We pre-registered both the search process and the statistical analysis plan. Some minor modifications to the pre-registered protocol were made; these are described in the Supplementary materials. Previous work found that a difference of 3-5 HAMD-17 units corresponds to the minimal clinically important difference (MCID); we used the lower bound of this estimate, i.e., MCID = 3 HAMD units. The following three hypotheses were registered:
1. At the primary endpoint, the estimated mean difference between PAT and open-label tAD will exceed the MCID, favoring PAT.
2. At the primary endpoint, the estimated mean difference between blinded and open-label tAD trials will exceed the MCID, favoring open-label administration. To estimate the efficacy of blinded tAD administration, we used data from […].
3. At the primary endpoint, the estimated mean difference between blinded and open-label PAT trials will not exceed the MCID.

SEARCH STRATEGY AND DATA EXTRACTION

We searched the PubMed database to identify trials on major depressive disorder (MDD) in the adult population (18 < mean age of patients < 65) where the treatment was either:
• an open-label trial with a "new generational antidepressant" as defined by […]; these we call tAD; or
• a blind or open-label trial of PAT with one of the following psychedelics: lysergic acid diethylamide (LSD), psilocybin, mescaline / San Pedro / Peyote, or DMT / Ayahuasca.
Trials were excluded if they were conducted in inpatients or in patients with psychotic depression or significant comorbidity. An exception was made for comorbid anxiety because it frequently co-occurs with depression. Augmentation and combination trials were excluded. Trials with run-in periods were excluded; however, data from a run-in period were included as "open-label tAD" if that phase met all inclusion criteria (e.g., the open-label phase of a discontinuation trial). Two authors (BS and HB) independently marked retrieved manuscripts for inclusion or exclusion, resolving conflicts by consensus. The reference lists of the selected manuscripts were scanned for additional trials. Variables were independently extracted by the same two authors. When data were missing, we attempted to recover them by contacting the study authors.

DATA SYNTHESIS

All depression scores from the BDI-I/II, MADRS, HAMD-21, and QIDS scales were transformed into HAMD-17 equivalents. In this article, all HAMD units refer to the 17-item version of the scale. We converted between the standard deviations of endpoint and change scores assuming a conservative correlation coefficient of 0.5 between baseline and endpoint scores.
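The endpoint-to-change SD conversion described above is commonly done with the standard identity for the variance of a difference between two correlated scores (the approach described in the Cochrane Handbook). The text does not print the formula, so treating it as this identity is an assumption, and the helper names below are hypothetical. The mapping of other scales onto HAMD-17 equivalents is not specified here and is not sketched.

```python
import math


def sd_change(sd_baseline: float, sd_endpoint: float, r: float = 0.5) -> float:
    """SD of the change score (endpoint - baseline).

    Var(E - B) = Var(E) + Var(B) - 2 * r * SD(E) * SD(B),
    where r is the baseline-endpoint correlation (0.5, as in the text).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    return math.sqrt(sd_baseline**2 + sd_endpoint**2
                     - 2.0 * r * sd_baseline * sd_endpoint)


def sd_endpoint_from_change(sd_baseline: float, sd_chg: float, r: float = 0.5) -> float:
    """Inverse direction: recover the endpoint SD from the baseline and change SDs.

    Solves SD_E**2 - 2*r*SD_B*SD_E + (SD_B**2 - SD_C**2) = 0 and takes the larger root.
    """
    disc = (r * sd_baseline) ** 2 - sd_baseline**2 + sd_chg**2
    if disc < 0:
        raise ValueError("SDs are inconsistent with this correlation")
    return r * sd_baseline + math.sqrt(disc)


# With r = 0.5 and equal baseline/endpoint SDs, the change-score SD equals that SD.
print(sd_change(6.0, 6.0))  # 6.0
```

The choice r = 0.5 is what makes this conversion "conservative" in the sense used above: with equal SDs, it yields a change-score SD as large as the raw SD rather than the smaller value a higher correlation would give.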

STATISTICAL MODELS

We analyzed the data using pre-registered Bayesian and frequentist models. The primary effect of interest was the within-arm effect, i.e., the change from baseline to primary endpoint in HAMD units. These within-arm effects were synthesized using meta-analytic models with a multilevel random-effect structure that accounted for the studies' nested outcomes (MADRS, QIDS, etc.). All models included the same random-effect structure and a fixed effect (and random slope) of baseline depression severity, in addition to the focal variable. Models assessing hypothesis 1 had a binary focal variable for treatment type (PAT vs. tAD). Models assessing hypotheses 2 and 3 compared open-label vs. blinded tAD and PAT studies, respectively; these models had a binary focal variable for blinding (open-label vs. blinded). See the pre-registration and Supplementary materials for further details. Bayesian model results are presented as posterior distributions of the focal variable, summarized by the posterior median and 95% credible interval (CrI95). Posterior predictive distributions are shown in the Supplementary figures to illustrate between-study heterogeneity. Frequentist model results are presented as the mean and 95% confidence interval (CI95) of the focal variable. Secondary analyses were performed using frequentist models because of their lower computational demand.
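The models above are multilevel (Bayesian, and frequentist via R's rma.mv()) with a baseline-severity covariate, which a short sketch cannot reproduce. As a much simpler stand-in, the block below pools within-arm changes with the classic single-level DerSimonian-Laird random-effects estimator and contrasts two subgroups, for example PAT vs. open-label tAD. This is an illustrative substitute, not the authors' model: it ignores nested outcomes and the baseline covariate, and the function names and input values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(effects: Sequence[float],
                      variances: Sequence[float]) -> Tuple[float, float, float]:
    """Pool study effects with a DerSimonian-Laird random-effects model.

    effects:   per-study within-arm changes (HAMD-17 units; negative = improvement)
    variances: sampling variance of each change (SD**2 / n for a mean change)
    Returns (pooled_mean, standard_error, tau2).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]                        # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                      # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]            # random-effects weights
    sw_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sw_re
    return pooled, math.sqrt(1.0 / sw_re), tau2


def subgroup_contrast(group_a, group_b) -> Tuple[float, float]:
    """Difference of pooled means (a - b) and its SE, pooling each group separately.

    Each group is an (effects, variances) pair. Unlike a meta-regression with a
    binary focal variable, each subgroup gets its own tau2 here.
    """
    mean_a, se_a, _ = dersimonian_laird(*group_a)
    mean_b, se_b, _ = dersimonian_laird(*group_b)
    return mean_a - mean_b, math.sqrt(se_a**2 + se_b**2)


# Hypothetical inputs for illustration only (not data from the included trials).
pat = ([-13.0, -11.5, -12.8], [2.0, 3.1, 1.7])
tad = ([-12.4, -12.7, -12.2, -12.9], [0.05, 0.08, 0.12, 0.06])
diff, se = subgroup_contrast(pat, tad)
print(f"PAT - tAD: {diff:.2f} HAMD units (SE {se:.2f})")
```

Note that the sampling variance passed in is SD²/n rather than the raw score variance SD², the same distinction the authors describe correcting for the V argument of rma.mv().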

INCLUDED TRIALS

The search process was conducted in March 2024. Of the 619 retrieved records, 38 met the inclusion criteria: 29 open-label tAD trials and 9 PAT trials. Of these 38 trials, we could extract the required variables from 24, comprising 16 open-label tAD and 8 PAT trials; see Figure for the search flowchart. The included trials involved a total of 9751 patients in tAD trials and 548 in PAT trials. The mean±SD time from baseline to the primary endpoint was 8.1±1.5 weeks for tAD and 3.4±2.2 weeks for PAT. At baseline, the mean±SD HAMD scores were 22.7±1.9 for tAD and 21.3±3.9 for PAT. We identified four additional PAT studies that nearly met inclusion: three with cancer comorbidity, and one with both major depression and bipolar II disorder. We included these trials in an extended dataset, which was used to test the robustness of the results.

HYPOTHESIS 1: DIFFERENCE BETWEEN OPEN-LABEL TAD AND PAT TREATMENT

The Bayesian model estimated that the within-arm effect of open-label tAD treatment was -12.5 HAMD units (CrI95 [-12.9, -12.2]; SMD=-2.7), while for PAT it was […] (SMD=-2.6). The posterior distribution of the PAT-tAD difference had a mean of βPAT-tAD = 0.25 (CrI95 […]), favoring tAD. The posterior probability that PAT decreased depression by ≥3 HAMD units more than open-label tAD (i.e., that H1 is true) was 0.2%. The posterior probability of the difference falling within ±3 HAMD units, i.e., "the region of practical equivalence", was 99.1%; see Figure. When using the extended dataset, the difference remained similar (βPAT-tAD=0.43, CrI95 [-0.17, 2.55]; see Supplementary figures). In line with the Bayesian analysis, the frequentist model did not find a significant difference between PAT and open-label tAD (βPAT-tAD=0.3, favoring open-label tAD; CI95 [-1.39, 1.98], p=0.730).

Secondary analyses compared PAT trials with subgroups of open-label tAD trials:
• PAT trials vs. open-label SSRI trials: […]; there was no significant difference between PAT and any SSRI.
• PAT trials vs. open-label SNRI (duloxetine, venlafaxine) trials (βPAT-tAD=-0.36, CI95 […]). We also tested PAT vs. each SNRI separately; there was no significant difference between PAT and any SNRI.
• PAT trials vs. open-label agomelatine or mirtazapine trials, i.e., tADs that are neither SSRIs nor SNRIs. There was no significant difference between PAT and either drug.
See Supplementary Tables for complete statistical outputs.
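The two posterior probabilities reported for H1 are a tail mass and an interval mass of the posterior for βPAT-tAD. A minimal sketch follows, assuming one already has posterior draws (e.g., from MCMC). The function name is hypothetical, and the simulated draws in the usage lines are made up, so their output is not expected to reproduce the paper's 0.2% and 99.1%. Because within-arm changes are negative, "PAT better by at least 3 units" corresponds to βPAT-tAD ≤ -3.

```python
import random
from typing import Sequence, Tuple

MCID = 3.0  # pre-specified minimal clinically important difference, HAMD-17 units


def h1_probabilities(draws: Sequence[float], mcid: float = MCID) -> Tuple[float, float]:
    """Return (P(H1), P(ROPE)) from posterior draws of beta_PAT-tAD.

    beta = (PAT change) - (tAD change). Changes are negative when symptoms improve,
    so PAT improving >= mcid units more than tAD means beta <= -mcid.
    The region of practical equivalence (ROPE) is -mcid < beta < mcid.
    """
    n = len(draws)
    if n == 0:
        raise ValueError("no posterior draws supplied")
    p_h1 = sum(1 for b in draws if b <= -mcid) / n
    p_rope = sum(1 for b in draws if -mcid < b < mcid) / n
    return p_h1, p_rope


# Illustration with simulated draws from a hypothetical normal posterior.
rng = random.Random(0)
fake_draws = [rng.gauss(0.25, 1.0) for _ in range(20_000)]
print(h1_probabilities(fake_draws))
```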

HYPOTHESIS 2: DIFFERENCE BETWEEN BLINDED AND OPEN-LABEL TAD TREATMENT

The Bayesian model estimated that tAD's within-arm effect with blinded administration was -12.3 HAMD units (CrI95 […]; SMD=-2.6), while with open-label administration it was -12[…]. In the frequentist model, there was a statistically significant difference between blinded and open-label tAD treatments. The estimated mean difference was βblind-OL=1.29 HAMD units (CI95 [0.07, 2.51], p=0.038), favoring open-label. This effect corresponds to approximately half of the MCID, and the entire CI95 falls within the region of ±3 HAMD units, meaning that the magnitude of this effect is practically negligible.

HYPOTHESIS 3: DIFFERENCE BETWEEN BLINDED AND OPEN-LABEL PAT TREATMENT

In concordance with the Bayesian analysis, the frequentist model did not find a significant difference between blinded and open-label PAT treatments (βblind-OL=0.45, CI95 [-2.20, 3.11], p=0.738). Technically, the CI95 intersected the region of ±3 HAMD units; therefore, equivalence was not established. The result did not change qualitatively when the same model was applied to the extended dataset (βblind-OL=-0.49, CI95 […], p=0.729).
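The frequentist statements in this section (the p-values, and whether a CI95 lies inside ±3 HAMD units) can be sanity-checked from the reported intervals. A minimal sketch, assuming each CI95 is a symmetric normal-theory (Wald) interval; the helper name is hypothetical. Fed the reported numbers for H2 and H3 and the H1 frequentist result, it recovers p-values close to the published ones (about 0.727, 0.038, and 0.740 vs. 0.730, 0.038, and 0.738; the small gaps come from rounded inputs) and shows that only the H3 interval crosses the ±3 boundary.

```python
from statistics import NormalDist
from typing import Tuple

MCID = 3.0  # HAMD-17 units


def check_from_ci(estimate: float, lo: float, hi: float,
                  level: float = 0.95, mcid: float = MCID) -> Tuple[float, float, bool]:
    """Back out SE and a two-sided p-value from a symmetric Wald CI, and test equivalence.

    Returns (se, p_value, ci_inside_rope), where ci_inside_rope is True when the
    whole interval lies strictly within (-mcid, mcid).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se = (hi - lo) / (2 * z_crit)
    z = estimate / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    ci_inside_rope = -mcid < lo and hi < mcid
    return se, p_value, ci_inside_rope


reported = [
    ("H1: PAT - open-label tAD", 0.30, -1.39, 1.98),
    ("H2: tAD blinded - open-label", 1.29, 0.07, 2.51),
    ("H3: PAT blinded - open-label", 0.45, -2.20, 3.11),
]
for label, est, lo, hi in reported:
    se, p, inside = check_from_ci(est, lo, hi)
    print(f"{label}: SE={se:.2f}, p={p:.3f}, CI within ±{MCID:g}: {inside}")
```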

DISCUSSION

The premise of this pre-registered meta-analysis is that it is biased to compare an effectively unblinded treatment (PAT) with blinded tAD trials. In contrast with our prior hypothesis, we did not find treatment with PAT to be better than treatment with open-label tAD by a clinically meaningful margin (H1). Not only was the difference not clinically meaningful, but practically there was no difference at all: according to both Bayesian and frequentist estimates, the difference was a negligible ~0.3 HAMD units. This finding means that tAD administered knowingly to patients, which is the case in real-life medical practice, is as effective as PAT. This result was robust across variations in study selection, including when we removed PAT trials on treatment-resistant depression. The improvement from baseline to endpoint was ~12 HAMD points for both treatments, which is highly significant both statistically and clinically.

We also assessed the impact of blinding in both PAT and tAD trials. We found that for tAD (H2), but not for PAT (H3), open-label administration is associated with better outcomes than blinded treatment. The average of the Bayesian and frequentist estimates of this effect for tAD was (0.9 + 1.3)/2 ≈ 1.1 HAMD units, which is less than half of the minimal clinically important difference (3 HAMD units). Thus, both frequentist and Bayesian models for H2 provided substantial evidence against the possibility that the effect of blinding was clinically significant.

The finding that PAT was no more effective than open-label tAD is surprising given that PAT trials reported a 7.3 HAMD point difference from placebo, while tAD trials reported only 2.4. Thus, PAT is 7.3 - 2.4 = 4.9 ≈ 5 points better than tAD when measured against placebo. Note that these numbers are between-arm effects (treatment vs. placebo), and thus can be influenced by either arm. Two key factors explain the failure of H1 despite this 5-point 'difference of between-arm differences':
1. Open-label tAD is approximately 1.1 HAMD units more effective than blinded treatment (H2). This effect can be interpreted as the influence of knowing one's treatment assignment, such as positive expectancy.
2. A recent meta-analysis of depression trials found that, relative to tAD trials, the placebo response is 4.0 HAMD points lower in psychedelic trials. This suppressed placebo response inflates the between-arm difference, because the treatment arm is compared against a placebo arm that improves less.
The sum of these two effects, 1.1 + 4.0 = 5.1 HAMD units, approximately equals the difference between the reported between-arm effects (7.3 - 2.4 ≈ 5), explaining why hypothesis 1 failed.

The suppressed placebo response in PAT trials is likely attributable to the 'knowcebo' effect, i.e., the disappointment when patients realize they are in the control group. This effect could be magnified by the PAT treatment model, in which patients undergo extensive therapy in preparation for a transformative, spiritual experience. Patients are then underwhelmed on dosing days and often bored during the 6-8 hours they must remain on site. In PAT trials, this placebo suppression accounts for 4.0 / 7.3 ≈ 55% of the total between-arm effect. In other words, ~55% of PAT's between-arm effect is explained not by improvement in the treatment arm but by the lack of improvement in the placebo arm. In some psychedelic trials, the placebo group even worsened. The most comprehensive meta-analysis of tAD trials involved 304 placebo groups, and patients improved in all 304 of them, highlighting how extreme it is for patients to get worse after placebo treatment.

Our comparison has higher real-life validity and relevance for patients than most meta-analyses, for two reasons. First, in medical practice, patients know the drug they are prescribed, i.e., treatment is open-label. Second, most meta-analyses focus exclusively on between-arm effects, i.e., the treatment's superiority relative to placebo. Here, we instead focused on the within-arm effect, i.e., the change from baseline to endpoint, which represents the patient improvement during treatment.

In summary, we found that for the treatment of depression, PAT is no more effective than open-label tADs. PAT demonstrated a robust and large therapeutic effect (~12 HAMD units), which justifies optimism. On the other hand, PAT's lack of superiority compared to tADs under equal-unblinding conditions highlights the influence of blinding integrity and presents a sobering viewpoint on the treatment's potential.
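The two-factor explanation in the Discussion rests on a simple identity. Write each trial type's between-arm effect as B = I_treat - I_pbo, where I denotes improvement in HAMD-17 units (positive = better) and B_tAD refers to blinded tAD trials; this notation is ours, not the paper's. The gap between PAT's and tAD's between-arm effects then splits exactly into a treatment-arm term and a placebo-arm term. Substituting the H1 result (I_PAT ≈ I_tAD,OL), the open-label advantage (0.9 + 1.3)/2 = 1.1, and the 4.0-point placebo suppression gives the back-of-the-envelope check below; it is not a formal model.

```latex
\begin{aligned}
B_{\mathrm{PAT}} - B_{\mathrm{tAD}}
  &= \bigl(I_{\mathrm{PAT}} - I_{\mathrm{tAD,blind}}\bigr)
   + \bigl(I_{\mathrm{pbo,tAD}} - I_{\mathrm{pbo,PAT}}\bigr) \\
  &\approx \bigl(I_{\mathrm{tAD,OL}} - I_{\mathrm{tAD,blind}}\bigr)
   + \bigl(I_{\mathrm{pbo,tAD}} - I_{\mathrm{pbo,PAT}}\bigr) \\
  &\approx 1.1 + 4.0 = 5.1
   \quad\text{vs. observed}\quad 7.3 - 2.4 = 4.9, \\[4pt]
\frac{I_{\mathrm{pbo,tAD}} - I_{\mathrm{pbo,PAT}}}{B_{\mathrm{PAT}}}
  &\approx \frac{4.0}{7.3} \approx 0.55 .
\end{aligned}
```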

LIMITATIONS

Some PAT trials exclusively recruited treatment-resistant depression (TRD) patients. No tAD trial focused on TRD patients, raising the possibility that cases in PAT trials were more severe. However, all models controlled for baseline scores as a covariate, separating variance due to baseline depression from between-condition contrasts. Baseline scores in open-label tAD trials were in fact slightly higher than in PAT trials: the mean±SD was 22.7±1.9 (open-label tAD) and 21.3±3.9 (PAT) HAMD units. Moreover, the lack of difference persisted in sensitivity analyses that excluded the subset of PAT trials that exclusively recruited TRD patients. Therefore, a difference in depression severity likely does not explain the lack of a between-treatment difference.

The correct-guess rate for treatment assignment in PAT trials is ~95%, not 100%, and some patients likely had less than absolute confidence in their guess. Therefore, there is some blinding in PAT trials, unlike in open-label tAD trials, where the treatment is known with certainty. Thus, some residual 'unblinding imbalance' could exist in our analysis; however, its magnitude is likely much smaller than in a 'blind PAT vs. blind tAD' comparison.

PAT includes therapy, while tAD is a purely pharmacological intervention. Therefore, it would be better to compare PAT with 'open-label tAD + therapy', but unfortunately, no such trials were found. Such a combined intervention would likely yield better results than the 'tAD alone' treatments included here.

Our analysis examined only treatment efficacy as quantified by symptom reduction. Other key considerations for a treatment are the prevalence of adverse events and side effects, and functional improvements. For example, at the 6-month follow-up, the escitalopram vs. psilocybin trial found no difference in depressive symptom reduction, but the psilocybin group experienced greater improvements in social functioning, connectedness, and meaning in life. Thus, our analysis could be enhanced by supplementing the symptom-reduction data with other factors known to be relevant to patients.

NOTES ON STUDY SELECTION AND DATA EXTRACTION

• When both per-protocol and intention-to-treat data were available, the per-protocol data were carried into the analysis.
• From crossover trials, we extracted data from the first treatment period only, with the exception of […], who used a fixed-order design with placebo first and psilocybin treatment second.
• We excluded open-label antidepressant trials with fewer than 100 participants. This decision was motivated by practical reasons. There are several large (n>500) open-label antidepressant trials, and because of inverse-variance weighting, these will largely determine the results. Small trials would have only a small influence but would add significant workload. This minimum sample size requirement was not applied to psychedelic trials, because far fewer psychedelic trials have ever been conducted.
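The inverse-variance argument above can be made concrete. In a fixed-effect model each trial's weight is 1/v, and for a mean change v = SD²/n, so if SDs are similar a trial's weight is proportional to its sample size. The sketch below assumes a common SD (the default of 8 HAMD units is a placeholder, not a value from the paper) and made-up trial sizes; its optional between-study variance tau2 shows that random-effects weights 1/(v + tau2) spread influence more evenly across trials.

```python
from typing import List, Sequence


def weight_shares(sample_sizes: Sequence[int], sd: float = 8.0,
                  tau2: float = 0.0) -> List[float]:
    """Share of the total inverse-variance weight held by each trial.

    Assumes every trial reports a mean change with the same SD (placeholder value),
    so its sampling variance is sd**2 / n. tau2 > 0 gives random-effects weights.
    """
    weights = [1.0 / (sd**2 / n + tau2) for n in sample_sizes]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical mix of large and small open-label trials (not the included trials).
sizes = [1200, 800, 600, 150, 60, 40]
small_share = sum(weight_shares(sizes)[3:])
print(f"share held by the three smallest trials (fixed effect): {small_share:.1%}")
```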

DEVIATIONS FROM PRE-REGISTRATION

• The registration states that "We will only consider trials that have a primary endpoint between 6 and 12 weeks". This restriction was only meant to apply to traditional antidepressant treatments, not to the psychedelic trials. We added this condition because tADs are known to need at least 6 weeks to become effective.
• The preregistration includes plans for an analysis of modulatory factors, which we defer to a later publication.
• Frequentist models were constructed with the rma.mv() function of R's metafor package. We misread the documentation as saying that the V parameter is the variance of the scores, and that is how the models were registered. However, V is the sampling variance of each estimate (variance/sample_size); this has been corrected for the analysis.
• Frequentist models were registered with two sets of random effects. The preregistration is not explicit, but the intention was that the more complex structure would be tried first and, if it failed, the simpler one would be used. Indeed, some of the models did not converge on the data, so the models with the simpler random-effect structure are presented here.
• The dependent variable of the models was changed from the endpoint score to the change score from baseline to primary endpoint, to enhance the interpretability of the results.
• Lines 64-83 of the main analysis script (PsyOLAD_analysis_v2.Rmd) are left over from a previous version of the registration; they were not intended to be part of the final registration.

RISK OF BIAS

We did not formally assess risk of bias as open-label trials are high-risk by definition.

LIST OF INCLUDED TRIALS

Open-label tAD trials:
• Arns, M., Bruder, G., Hegerl, U., Spooner, C., Palmer, D. M., Etkin, A., Fallahpour, K., Gatt, J. M., … EEG alpha asymmetry as a gender- …
• Husain, M. I., Foster, J. A., … Pro-inflammatory markers are associated with response to sequential pharmacotherapy in major depressive disorder: A CAN-BIND-1 report. CNS Spectrums, 28(6), 739-746.
• Kennedy, S. H., Lam, R. W., Rotzinger, S., Milev, R. V., Blier, P., Downar, J., Evans, K. R., …
