
Trial Follow-Up vs HTA Horizon

A visual explanation of why pivotal endpoints, observed follow-up, and HTA model horizons answer different questions for high-touch psychedelic therapies.

Pivotal endpoint: 3-6 weeks
Observed follow-up: 6-52 weeks
HTA model horizon: Years+

At a glance

What to take from this page

Observed follow-up is not the same as modeled horizon.
Psychedelic follow-up evidence is strongest for symptoms and response/remission, and thinner for retreatment, resource use, and productivity.
Longer follow-up is valuable, but unblinding, expectancy, intercurrent care, and selective retention make causal interpretation harder over time.

Who this helps

Researchers: Frame follow-up evidence around the questions payers will ask later.
Investors: Check whether a readout is enough for approval, access, or both.
Drug developers: Build the bridge from pivotal endpoints to reimbursement evidence.

Step 1: Evidence
Step 2: Approval
Step 3: Reimbursement
Step 4: Delivery
Step 5: Access

Observed follow-up is not the same as modeled horizon

Later observations inform the model, but payers still need explicit assumptions about benefit waning, retreatment, downstream care, and uncertainty beyond the trial window.

Pivotal endpoint (3-6 weeks): Approval-style efficacy question
Early durability (12-26 weeks): Response, relapse, rescue care
Observed follow-up (52 weeks): Useful, but often less blinded
HTA model horizon (Years+): Waning, retreatment, cost offsets

Psychedelic trial endpoint and follow-up map

Program or study	Primary endpoint	Longest follow-up signal	Main interpretation caveat
COMP001 phase 2b TRD	MADRS change at 3 weeks after dosing	12 weeks in parent study; 52 weeks when linked to COMP004	Randomized acute evidence does not by itself settle relapse, retreatment, or health-economic durability.
COMP005 / COMP006 Phase III programme	Symptom-severity change at week 6	Durability and long-term safety follow-up planned for roughly one year	Company and registry readouts need translation into payer-relevant resource-use assumptions.
Usona PSIL201 MDD	MADRS change at day 43	Published randomized evidence through day 43	Blinded trial evidence is useful, but it does not yet answer one-year relapse or retreatment.
Imperial psilocybin vs escitalopram	QIDS-SR-16 change at week 6	6-month observational follow-up after blind break	Follow-up allows additional care and no longer has the same causal structure as the RCT.
Johns Hopkins MDD	GRID-HAMD at acute post-treatment timepoints	12-month prospective follow-up	Strong retention, but small academic cohort and limited generalizability to routine reimbursed TRD care.
MDMA-assisted therapy PTSD Phase III	CAPS-5 and functional impairment over 18 weeks	Observational long-term follow-up after parent studies	FDA and ICER reviews highlight functional unblinding, attrition, variable follow-up, and intercurrent care.

This table separates published evidence from planned programme horizons. Registry identifiers should be kept explicit where Blossom stores them.

Evidence horizons to keep separate

Horizon	Main question	What it can support	What remains open
3-6 weeks	Does the intervention work at the pivotal endpoint?	Primary efficacy and acute safety	Durability, relapse, retreatment, service offsets, and late safety
12-26 weeks	Is there early durability beyond the primary endpoint?	Symptom trajectory, response/remission, rescue care signals	One-year outcomes, repeated-care patterns, payer budget impact
52 weeks	What is observed over one year?	Relapse timing, new care, adverse events, functioning and QoL signals	Causal interpretation if follow-up is open-label, unblinded, or selectively continued
Years or lifetime	What should payers assume over the full economic horizon?	Cost-effectiveness models, registries, managed-entry agreements, and waning and retreatment assumptions	Real-world adherence, workforce, site variation, and extrapolation uncertainty

Outcome domains and HTA maturity

Domain	Current evidence pattern	What payers still need
Symptoms and response/remission	Common across depression and PTSD trials, using MADRS, QIDS, GRID-HAMD, or CAPS-5.	Comparable definitions across studies and evidence of persistence under routine care.
Relapse and time to event	Better developed in esketamine maintenance and some COMPASS follow-up than in most psilocybin pivotal trials.	Explicit relapse-prevention and retreatment algorithms.
Retreatment and rescue care	Tracked inconsistently; observational follow-up often allows new care.	Who gets retreated, when, at what cost, and with what effect.
Functioning and quality of life	Present in several studies, but not always primary or mapped cleanly to utility.	Utility, productivity, caregiver burden, and downstream service-use data.
Safety and suicidality	Adverse events are tracked; late and rare harms remain thinner for psilocybin than for established comparators.	Large exposure totals and post-launch monitoring.

Comparator evidence patterns

Comparator	Why it helps	What not to overclaim
Spravato / esketamine	Shows how rapid antidepressant effects, supervised administration, and long-term uncertainty appear in HTA.	The visit length and therapy component differ from psilocybin-assisted therapy.
ECT / TMS	Relevant mental-health comparators with service capacity, acceptability, relapse, and maintenance-treatment questions.	Mechanism, setting, and evidence base are different enough that direct substitution is risky.
CAR-T / ATMPs	Useful for high-upfront-cost therapies with certified sites and long-term outcome uncertainty.	Oncology one-time therapies do not map neatly onto psychiatric outcomes or therapist time.
Psychotherapy episodes	Useful for thinking about sessions, fidelity, and patient selection over time.	A psychedelic dosing day adds medicine governance and acute monitoring requirements.

Three clocks are running at the same time

The pivotal endpoint asks whether the intervention works at a prespecified timepoint under trial conditions. Observed follow-up asks what happens later. The HTA model horizon asks what payers should believe over the full period in which costs and outcomes matter.

Those are related questions, not interchangeable ones. A 3- to 6-week endpoint can support efficacy; a 6- to 12-month follow-up can support a durability signal; a payer model may still need assumptions about waning, retreatment, healthcare use, productivity, and safety over years.
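To make the gap between observed follow-up and the modeled horizon concrete, the sketch below splits a model horizon into an observed part and an extrapolated part, then reports how much of the modeled benefit rests on assumptions rather than data. Everything in it is an illustrative assumption, not a figure from any trial or HTA submission: the utility gain, the 52-week observed window, the 5-year horizon, the exponential waning form, and the function name `modeled_qaly_gain`. Discounting is left out to keep the arithmetic visible.

```python
import math

WEEKS_PER_YEAR = 52


def modeled_qaly_gain(utility_gain, observed_weeks, horizon_weeks,
                      waning_half_life_weeks):
    """Return (observed_qalys, extrapolated_qalys) for a weekly-step model.

    The utility gain is held constant inside the observed window and
    decays exponentially after it. A half-life of math.inf means the
    benefit is assumed never to wane.
    """
    observed = 0.0
    extrapolated = 0.0
    for week in range(horizon_weeks):
        if week < observed_weeks:
            # Covered by observed follow-up.
            observed += utility_gain / WEEKS_PER_YEAR
        else:
            # Beyond the trial window: pure modeling assumption.
            weeks_past = week - observed_weeks
            if math.isinf(waning_half_life_weeks):
                factor = 1.0
            else:
                factor = 0.5 ** (weeks_past / waning_half_life_weeks)
            extrapolated += utility_gain * factor / WEEKS_PER_YEAR
    return observed, extrapolated


if __name__ == "__main__":
    # Hypothetical inputs: 0.10 utility gain, 52 weeks observed, 5-year horizon.
    for label, half_life in [("fast waning", 26),
                             ("slow waning", 104),
                             ("no waning", math.inf)]:
        obs, extra = modeled_qaly_gain(0.10, 52, 5 * WEEKS_PER_YEAR, half_life)
        share = extra / (obs + extra)
        print(f"{label}: observed={obs:.3f} QALY, "
              f"extrapolated={extra:.3f} QALY, assumed share={share:.0%}")
```

The observed component is identical across scenarios; only the waning assumption moves the total. That is the sense in which a one-year follow-up informs a multi-year model without replacing its assumptions.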

What longer follow-up can and cannot solve

Longer follow-up helps payers assess relapse, rescue treatment, repeat dosing, safety, functioning, and downstream care use. COMPASS has a 52-week observational follow-up signal from COMP004, and Phase III durability readouts are important for the access story.

The limitation is that longer follow-up does not remove every interpretation problem. Functional unblinding, expectancy, treatment preference, rescue care, intercurrent therapy, missing data, and selective continuation can all make durability harder to read than a clean short-term endpoint.

Approval question: is the acute treatment effect convincing enough for regulators?
HTA question: how long does benefit last and what resources are needed after dosing?
Implementation question: who tracks relapse, safety, retreatment, and ongoing support?
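The HTA question above can be sketched as a simple expected-cost calculation per patient. This is a minimal illustration under loudly stated assumptions: at most one retreatment per year, the same retreatment probability every year, no dropout, and no discounting. The function name `expected_cost_per_patient`, the cost figures, and the probabilities are all hypothetical placeholders rather than estimates for any therapy.

```python
def expected_cost_per_patient(episode_cost, annual_followup_cost,
                              horizon_years, annual_retreatment_probability):
    """Expected undiscounted cost per patient over the model horizon.

    Assumes one initial treatment episode, then in each year an
    independent chance of exactly one retreatment episode, plus a
    fixed yearly follow-up cost (monitoring, relapse tracking, support).
    """
    p = annual_retreatment_probability
    if not 0.0 <= p <= 1.0:
        raise ValueError("annual_retreatment_probability must be in [0, 1]")
    if horizon_years < 0:
        raise ValueError("horizon_years must be non-negative")

    # One initial episode plus the expected number of retreatments.
    expected_episodes = 1 + p * horizon_years
    treatment_cost = expected_episodes * episode_cost
    followup_cost = annual_followup_cost * horizon_years
    return treatment_cost + followup_cost


if __name__ == "__main__":
    # Hypothetical cost units; only the retreatment assumption varies.
    for p in (0.0, 0.2, 0.5):
        cost = expected_cost_per_patient(
            episode_cost=10_000,
            annual_followup_cost=500,
            horizon_years=5,
            annual_retreatment_probability=p,
        )
        print(f"retreatment probability {p:.0%}/year -> expected cost {cost:,.0f}")
```

Even in this simplified form, the retreatment assumption drives most of the difference between scenarios, which is why inconsistently tracked retreatment and rescue care leave payers with an open parameter rather than an observed one.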

COMPASS follow-up needs two readings

The public COMPASS programme gives payers more than an acute endpoint: Phase III durability, long-term follow-up, and observational evidence all help answer whether an intensive treatment episode can justify its cost and service burden.

Those data still need careful interpretation. Once participants have experienced a high-salience psychedelic session, loss of blinding, expectancy, rescue treatment, discontinuation, and retreatment decisions can all affect what long-term outcomes mean.

The public claim should stay precise

The careful formulation is: regulators ask whether the therapy works at the prespecified endpoint; follow-up studies ask whether benefits are observed later; HTA bodies ask what payers should assume over the period in which costs and outcomes matter.

That framing lets Blossom discuss promising durability evidence without implying that every long-term observation proves causality or cost-effectiveness.