A few years back I was brought into a retail brand to audit their A/B testing program. They were running about four tests a month. They had a testing tool, a dedicated analyst, and a backlog of ideas. On paper it looked like a functioning program. When I dug in, I found something different.

  • They had no documented hypothesis format…
  • Their sample sizes were based on gut feel…
  • They were calling winners after five days…

And when I asked what they’d learned from their last ten tests, the team lead went quiet for a second and said, “We learned what worked.”

That’s the tell. A mature A/B testing program produces knowledge. An immature one produces decisions. Those two things feel similar from the inside and look completely different from the outside.

This post is about how to tell the difference, specifically in your own program.

Why “We Run Tests” Doesn’t Mean You Have a Program

Running tests is table stakes. A program is something else. It’s a system that compounds. Every experiment should be feeding the next one, building a picture of your customer that makes future decisions cheaper and faster to reach.

Most teams skip that part. They pick an idea, build a variant, wait for significance, ship the winner, and move on. The learning evaporates. The next test starts from scratch. Six months later they’re testing button colours again because nobody remembers what they already know. That’s not experimentation.

The question isn’t whether you’re running tests. It’s whether your program is getting smarter over time. And to answer that honestly, you need to assess it across four specific dimensions.

The Four Dimensions of Experimentation Maturity

I use four lenses when I’m auditing a program: process, strategy, insight, and culture. Each one can be strong or weak independently of the others, and that’s exactly why most teams miss the gaps. You can have a rock-solid process and completely broken strategy sitting right next to it.

Process is the operational layer. How do you go from idea to live experiment? Is it documented? Is it repeatable? Do you have a hypothesis format that forces you to state what you expect to happen and why? Do you calculate sample size before you run, not after? Process maturity is the easiest to spot because it either exists or it doesn’t. If your QA is someone clicking around on their laptop for ten minutes before launch, your process isn’t mature yet.

Strategy is about whether your tests are connected to anything real. A mature A/B testing program doesn’t have a backlog of random ideas. It has a prioritised set of questions tied to a specific problem in the funnel. If I asked you right now why you’re running the test that’s live on your site today, could you trace it back to a business problem? Not a hunch, not “we thought it might help,” but an actual diagnosed problem with evidence behind it? Most teams can’t. That gap is a strategy problem.

Insight is what you do with what you learn. This is where I see the biggest difference between early-stage and mature programs. Immature programs document results. Mature programs document learning. There’s a real difference. A result tells you what happened. A learning tells you why, and what it implies about your customer. If your test repository is a spreadsheet of green and red rows with uplift percentages, you have results. You probably don’t have insight.

Culture is the hardest one to shift and the most important for long-term success. It’s about whether the people around your experimentation program actually believe in it. Not just the CRO team. The product managers, the designers, the senior stakeholders. A mature experimentation culture is one where a losing test doesn’t trigger a post-mortem into what went wrong, it triggers a conversation about what you just learned. That reframe is everything. The reason you experiment is because you don’t know the answer. A loss is confirmation the test worked. You just found out something true that you didn’t know before.

What Mature Actually Looks Like in Practice

Let me make this concrete because maturity is one of those words that floats around without landing anywhere useful.

On process, a mature team has a hypothesis template that forces three things:

  • the change
  • the expected behaviour shift
  • and the metric it affects.

They calculate the minimum detectable effect (MDE) and the sample size it requires before a test goes live. They have a QA checklist that someone other than the person who built the test signs off on. And they have a decision framework for what happens when a test is inconclusive, which, if you’re being honest, will be a significant chunk of your tests.
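As a sketch of that pre-launch discipline: the sample size needed per variant can be estimated with the standard normal approximation for a two-proportion test. This is an illustrative helper under stated assumptions, not the author’s tooling; the function name and parameters are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, mde_relative, alpha=0.05, power=0.8):
    """Approximate sample size per variant for a two-proportion z-test.

    baseline_rate: current conversion rate (e.g. 0.04 for 4%)
    mde_relative:  smallest relative lift worth detecting (e.g. 0.10 for +10%)
    alpha:         two-sided significance level
    power:         desired probability of detecting a true effect of that size
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# A 4% baseline and a +10% relative lift needs tens of thousands of
# visitors per variant, which is why "five days and call it" rarely works.
print(sample_size_per_variant(0.04, 0.10))
```

Running a number like this before launch is exactly the conversation that separates a gut-feel test from a designed one: if the answer is larger than your monthly traffic, you learn that before burning four weeks.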

On strategy, a mature team connects its test backlog to qualitative and quantitative research. They’re not guessing what to fix. They’ve looked at session recordings, heatmaps, survey responses and funnel drop-off data, and they’ve formed a view about what the actual problem is. Then they test a solution to that problem.

On insight, a mature team has a structured way to store and surface what they’ve learned. When a new team member joins, they can get up to speed on the customer model the program has built. When a stakeholder asks “have we ever tested that?”, someone can answer the question in three minutes. The knowledge compounds. It doesn’t reset every quarter.
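One hedged sketch of what such a repository could look like in practice: a record per experiment that forces a learning, not just a result, plus a lookup that answers “have we ever tested that?”. The field names and the `find_experiments` helper are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentRecord:
    name: str
    hypothesis: str              # the change, expected behaviour shift, and metric
    outcome: str                 # "win", "loss", or "inconclusive"
    learning: str                # what the result implies about the customer
    tags: list[str] = field(default_factory=list)

def find_experiments(repo, keyword):
    """Answer 'have we ever tested that?' by matching name, learning, or tags."""
    kw = keyword.lower()
    return [
        r for r in repo
        if kw in r.name.lower()
        or kw in r.learning.lower()
        or any(kw in t.lower() for t in r.tags)
    ]
```

The point of the structure is the `learning` field: a green/red spreadsheet row can skip it, a required field cannot, and the searchable tags are what let a stakeholder’s question get answered in three minutes instead of a week of Slack archaeology.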

On culture, a mature organisation has done the harder work of separating experimentation from optimisation. Optimisation assumes you know the problem and you’re just finding the best solution. Experimentation assumes you might be wrong about the problem itself. That’s a much harder thing to sell internally, especially to stakeholders who want certainty. Mature programs have usually found a way to frame experimentation as risk reduction rather than creative exploration. That reframe lands better with people who control budgets.

The Common Gap Patterns

When I audit programs, I see the same gap patterns come up repeatedly. Strong process, weak strategy is probably the most common. Teams who’ve read all the right playbooks have their operational house in order but they’re testing the wrong things. Their experiments are clean and well-run and largely irrelevant to the actual problem the business has.

Strong insight, weak culture is the most frustrating one. There’s a sharp analyst or CRO lead who’s doing genuinely good work, building real understanding of the customer. But the organisation around them doesn’t act on it. Tests get called early when stakeholders get impatient. HiPPO decisions override experiment results. The knowledge gets built and then ignored. That’s a leadership problem, not a CRO problem, and it usually means the program will plateau until something structural changes.

Weak insight across the board is more common than people admit. Most A/B testing programs don’t have a real picture of what they’ve learned about their customers over time. They have a list of what they shipped. That’s not the same thing.

And sometimes, honestly, the answer is that the company doesn’t need a mature experimentation program right now. If your traffic is too low to reach significance on meaningful tests, if you haven’t solved basic conversion problems that are obvious without testing, if your product is changing so fast that a four-week test is already outdated by the time it reads out, then CRO infrastructure is not the priority. Getting the fundamentals right first is.

How to Actually Use This Assessment

The value in a maturity assessment isn’t the score. It’s the gap. Knowing you’re strong on process and weak on insight tells you exactly where to put your energy next. You don’t need to fix everything. You need to fix the right thing.

Start by being honest about where you are rather than where you want to be. The best teams I’ve worked with are the ones that can look at their own program clearly and say “this part isn’t working yet.” That kind of honesty is actually a sign of maturity in itself.

Pick one dimension. Find the weakest link. Address that before you add more tests to the backlog.

A mature A/B testing program isn’t one that runs the most tests. It’s one that gets better at knowing what to test, why, and what to do with what it learns. That’s a compounding asset. And it’s worth building properly.

Find Out Where Your Program Actually Stands

If you want a structured way to assess your own program, take the Experimentation Maturity Quiz at quiz.kyznacademy.com. It gives you a personalised score across all four dimensions (process, strategy, insight, and culture) so you can see exactly where the gaps are and what to work on first.

Takes about five minutes. Gives you something concrete to act on.

Kyle Newsam

An optimizer by trade & lifestyle. Truly any experience or interaction becomes an experiment & something I can learn from. Currently, moving around the globe working from the coolest locations that the younger me could never have imagined.
