This is a problem. Not just politically, but diagnostically. If you can’t measure the ROI of your experimentation program, you do not actually know if it is working.
Here is how to run those numbers properly.
Start With What You Are Actually Spending
Before you can measure return, you need an honest read on investment. Most teams undercount this badly.
The obvious costs are the platform licence and the analyst salary. The invisible ones are where it gets interesting. How many hours does your developer spend implementing experiments? How much of a product manager’s week is going into prioritisation calls? What about the design time on test variants? That QA pass before launch?
Add it up across a year. Not to make the number look scary, but because if you do not count it, you cannot manage it. A program running at £200k fully-loaded cost needs to clear a different bar than one running at £40k. Getting this wrong means you are either underselling your program to the business or, worse, running something that is genuinely not profitable and you have no idea.
The full investment figure is your denominator. Every return you claim gets measured against it.
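If a concrete tally helps, here is a rough sketch of the back-of-the-envelope maths I mean. Every rate, hour count, and figure in it is a placeholder, not a benchmark; swap in your own.

```python
# Back-of-the-envelope fully-loaded cost of an experimentation program.
# All rates, hours, and figures below are illustrative placeholders.

HOURLY_RATE = {"developer": 60, "pm": 55, "designer": 50, "qa": 40}  # £/hour, assumed

def annual_people_cost(hours_per_test, tests_per_year):
    """Hours each role spends per test, scaled across a year of testing."""
    per_test = sum(HOURLY_RATE[role] * hrs for role, hrs in hours_per_test.items())
    return per_test * tests_per_year

platform_licence = 30_000   # annual platform spend
analyst_salary = 45_000     # fully-loaded analyst cost
hidden_costs = annual_people_cost(
    {"developer": 16, "pm": 4, "designer": 10, "qa": 6},  # hours per test
    tests_per_year=60,
)

total_investment = platform_licence + analyst_salary + hidden_costs
print(f"Fully-loaded annual cost: £{total_investment:,.0f}")  # your denominator
```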
The Three Numbers That Actually Matter
When people talk about measuring marketing ROI from experimentation, they usually jump straight to revenue lift. Revenue lift matters, but it’s one piece of a three-part picture.
- The first number is validated revenue lift. This is the cumulative revenue impact of your winning experiments, annualised. If a test produced a 4% conversion rate improvement on a £2m revenue stream, that is £80k. Across twelve months, across multiple tests, that total becomes your top-line return figure. Important: only count tests that reached statistical significance. Anything else is noise you are pretending is signal.
- The second number is experimentation velocity. How many tests are you shipping per month? Velocity matters because ROI in experimentation is a volume game. One team running two tests a month and one running ten are not playing the same sport. Higher velocity means more shots at improvement, more learning, faster iteration. It also means your cost per insight drops as your program matures, which is a story worth telling to whoever controls the budget.
- The third number is the cost of bad decisions prevented. This one is underused and genuinely powerful. Every test that kills a bad idea before it ships is value. A feature that would have cost £30k to build and hurt conversion by 8% never shipped because your A/B test caught it. That is not a failed experiment. That is the program working exactly as it should. Start logging these. Assign conservative estimates. It changes the conversation. (The sketch after this list shows one way to fold all three numbers into a single return figure.)
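To make that roll-up concrete, here is a minimal sketch of how the three numbers can be combined into one defensible figure. Every input is invented for illustration; the point is the shape of the calculation, not the values.

```python
# Rolling the three numbers into one return figure.
# Inputs are illustrative; replace with your own logged tests.

significant_wins = [
    # (relative lift, annual revenue of the affected stream in £)
    (0.04, 2_000_000),   # e.g. a 4% lift on a £2m stream = £80k
    (0.02, 5_000_000),
]
prevented_losses = [30_000, 12_000]   # conservative value of bad ideas killed

tests_shipped = 60                    # velocity over the year
total_investment = 120_000            # fully-loaded annual cost (your denominator)

validated_lift = sum(lift * revenue for lift, revenue in significant_wins)
prevented = sum(prevented_losses)

roi = (validated_lift + prevented - total_investment) / total_investment
cost_per_test = total_investment / tests_shipped

print(f"Validated annual lift: £{validated_lift:,.0f}")
print(f"Bad decisions prevented: £{prevented:,.0f}")
print(f"Cost per test: £{cost_per_test:,.0f}")
print(f"Program ROI: {roi:.0%}")
```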
The Win Rate Trap
Here’s something most CRO programs get wrong. They optimise for win rate and then use it as the headline metric when talking to leadership.
Win rate is almost meaningless on its own. A team with a 60% win rate running tiny tests with minimal traffic impact is less valuable than a team with a 30% win rate who are testing bold, high-stakes hypotheses on core conversion flows. The second team is doing harder, more important work. Their lower win rate is actually evidence of intellectual honesty, not underperformance.
The reason we experiment is because we do not know the answer.
If you only run tests you are confident will win, you are not experimenting. You are validating your own assumptions and calling it science. Win rate as a primary KPI incentivises exactly that behaviour.
Report win rate if you must. But pair it with average impact per winning test and you get a far more honest picture of program health.
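If you want to see the trap in numbers, here is a toy comparison of two hypothetical teams. The figures are invented; the point is that win rate without average impact per winner tells you almost nothing.

```python
# Two hypothetical teams: win rate alone points the wrong way.

def annual_value(tests_per_year, win_rate, avg_lift, revenue_at_stake):
    """Expected annual lift = winning tests x average lift x revenue affected."""
    return tests_per_year * win_rate * avg_lift * revenue_at_stake

team_a = annual_value(tests_per_year=24, win_rate=0.60, avg_lift=0.005, revenue_at_stake=200_000)
team_b = annual_value(tests_per_year=24, win_rate=0.30, avg_lift=0.04, revenue_at_stake=2_000_000)

print(f"Team A (60% win rate, small tests): £{team_a:,.0f}")
print(f"Team B (30% win rate, bold tests):  £{team_b:,.0f}")
```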
Making the Business Case
At some point you have to stand in a room, or get on a call, and argue for continued investment. The number one mistake people make here is leading with process instead of outcomes.
Nobody in a finance review cares that you have a rigorous test documentation process. They care whether the program is returning more than it costs. So lead with that.
A simple structure that works:
- here’s what we spent
- here’s what we returned in validated lift
- here’s what we stopped from shipping that would have cost us
- here’s what we expect next year based on our current velocity.
Four sentences. Everything else is backup if they ask for it.
If you’re building the case from scratch, the safest starting point is a conservative revenue multiple. Mature experimentation programs at companies running serious volume typically return somewhere between three and ten times their investment. If you are early stage, model the low end. Promise conservatively and over-deliver. The trust that builds over 18 months is worth more than any single test result.
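Here is one way to model that conservatively, as a sketch. Both projections below use made-up inputs; the idea is to put the smaller of the two numbers in the deck and treat the rest as upside.

```python
# Promise conservatively: model next year at the low end of the 3x-10x range,
# then sanity-check against what current velocity and average impact imply.
# All figures are illustrative.

investment_next_year = 150_000
low_end_multiple = 3                 # bottom of the typical range

# Velocity-based view: tests per month x win rate x average impact per winner
tests_per_month = 5
win_rate = 0.30
avg_impact_per_winner = 20_000       # annualised £ per winning test

multiple_projection = investment_next_year * low_end_multiple
velocity_projection = tests_per_month * 12 * win_rate * avg_impact_per_winner

# Lead with whichever is smaller; the other is upside to over-deliver on.
promise = min(multiple_projection, velocity_projection)
print(f"Conservative projection to put in the deck: £{promise:,.0f}")
```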
One more thing worth saying here: not every company needs CRO investment. If your traffic is below a certain threshold, you will never reach significance fast enough for the program to pay for itself. If your conversion problem is actually a brand problem, or a product problem, or a pricing problem, testing your button colour is not going to fix it. Part of making an honest business case is being clear about the conditions under which the investment makes sense and the ones where it does not.
What a Healthy Program Looks Like on Paper
If you are looking for a rough benchmark, here is what a program worth investing in tends to look like numerically. These are not targets; they are reference points.
- Velocity of at least four to six experiments per month on meaningful traffic.
- A minimum sample of tests reaching significance before you draw conclusions about win rate.
- An annualised revenue lift figure that you can defend line by line, not estimate in aggregate.
- And a documented log of tests that prevented bad decisions, valued conservatively but consistently.
When those four things exist, you have a program you can talk about clearly. You have numbers. You can measure marketing ROI in a way that holds up to scrutiny.
When they do not exist, you have activity. Activity is not a program.
The Part Nobody Talks About
Writing and copy changes outperform layout changes in A/B tests more often than most practitioners expect. Almost nobody talks about this when they teach ROI frameworks, but it matters because copy tests are cheap to build, fast to run, and carry outsized impact on conversion. If your program is heavy on redesigns and light on messaging tests, your ROI is probably lower than it should be and your test cycle time is probably longer than it needs to be.
When you audit your experiment backlog, look at the ratio. If you see fewer than one in four tests touching copy or messaging, you have a prioritisation issue that is directly affecting the numbers you are trying to improve.
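One quick way to run that audit, assuming your backlog is tagged by test type. The tags and test names below are hypothetical; use whatever taxonomy your backlog already has.

```python
# Quick backlog audit: what share of planned tests touch copy or messaging?
# Test names and the "type" tags are hypothetical examples.

backlog = [
    {"name": "Hero headline rewrite", "type": "copy"},
    {"name": "Checkout layout v2", "type": "layout"},
    {"name": "Pricing page redesign", "type": "layout"},
    {"name": "CTA microcopy", "type": "copy"},
    {"name": "Nav restructure", "type": "layout"},
    {"name": "Social proof module", "type": "layout"},
    {"name": "Form field reorder", "type": "layout"},
    {"name": "Value prop subhead", "type": "copy"},
]

copy_share = sum(t["type"] == "copy" for t in backlog) / len(backlog)
print(f"Copy/messaging share of backlog: {copy_share:.0%}")
if copy_share < 0.25:
    print("Fewer than 1 in 4 tests touch copy: prioritisation worth revisiting.")
```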
Run Your Own Numbers
Everything above is a framework. But a framework sitting in a blog post is not going to help you walk into that boardroom.
So I’m sharing a pretty powerful tool called the Impact Scorecard™. It not only works out ROI but also shows the overall impact in a way that speaks to leadership. It takes your real inputs (headcount costs, platform spend, test velocity, average lift) and gives you a defensible return figure you can actually use.
It is designed for practitioners who need to make the business case and do not want to build a model from scratch. Plug in your numbers and see where you stand.
The numbers exist. You just need to go find them.





