Someone reached out to me when I was working with Optimizely. They’d signed the contract, paid for the software, and were now trying to figure out the bit they should have figured out first… who was actually going to run this thing. Should they hire in-house? Work with an agency? Some blend of both? And if in-house, what roles did they need to get started, and what did the long-term picture look like?
It’s a question I’ve been asked more times than I can count. And the honest answer is, it depends on three things.
- How big the company is.
- What technical capability already exists.
- How mature the program is right now.
Get those three things wrong and you’ll spend the first eighteen months hiring the wrong people, pointing them at the wrong problems, and wondering why nothing’s moving.
This article is the written version of that conversation, the one I wish I’d had ready to send people.
Before You Talk About Roles, Talk About What You Actually Need
Most companies start thinking about CRO team structure by asking “who should we hire?” That’s the wrong starting point. The right question is, what does this program actually need to function?
A CRO program needs four things to work.
- Someone who can generate and prioritise ideas rooted in data.
- Someone who can build experiments without depending entirely on the engineering backlog.
- Someone who can analyse results confidently.
- Someone who has enough organisational credibility to unblock things when they get stuck… because they will get stuck.
That last one sounds political… because it is. The most technically capable experimentation teams I’ve worked with have plateaued not because they lacked skill, but because they didn’t have support from above. It sucks.
When a branding team blocks a test because they don’t like the variant, or a dev team deprioritises implementation for three sprints, the solution isn’t to argue harder – that’s futile. It’s to have a stakeholder above those teams who genuinely sees the value of the program. Pretending this isn’t a political problem is why a lot of programs stagnate at exactly the moment they should be accelerating.
So before the org chart conversation, ask yourself: who in your organisation has enough influence to protect this program? If the answer is nobody yet, that’s the first gap to close, and no amount of clever hiring fixes it.
The Three Models: In-House, Agency, or Blend
Let’s get this part out of the way because it shapes everything else.
A pure agency model makes sense when you’re early, budget-constrained, or running a relatively low volume of tests. You’re essentially renting capability. The downside is that the valuable insight from experimentation will live with the agency, not with you. When the contract ends, the program will probably regress. Good agencies should be building your internal capability, not just running tests. If yours isn’t doing that, ask why.
A pure in-house model makes sense when experimentation is genuinely central to how the business grows, when test volume is high enough to justify it, and when you have the patience to build slowly and correctly. The risk here is hiring too fast before the program has a clear shape. I’ll come back to this.
The blend model is where most companies of a reasonable size land, and it can work well if the division of responsibility is clear. Agency handles execution and extra capacity. In-house owns strategy, prioritisation, and learning. Where it breaks down is when neither side owns the research and diagnosis layer. Then you get a lot of tests, a lot of activity, and not much actual understanding of your customer.
The Minimum Viable CRO Team
For a SaaS company just getting started, the minimum viable team is two people. One person who owns the strategy and analysis, and one person who can build. That’s it. Not because it’s ideal, but because anything less leaves you constantly blocked, and anything more before you have a functioning program is waste.
The strategy and analysis role is the critical one. This person sets the research agenda, turns data into hypotheses, interprets results correctly, and decides what gets prioritised and why. They’re the one asking “why is conversion low here?” before they ask “what should we test?” If this person isn’t experienced, the program will drift. You’ll run tests that answer the wrong questions. You’ll call tests early. You’ll ship variants that look like wins but erode long-term value in ways you won’t see until the data from your warehouse tells you something your test dashboard couldn’t.
Which is a point worth making explicitly. Most teams stop measuring at the conversion event. But the data that matters – churn, lifetime value, returns, cancellations – lives in the warehouse. A subscription business optimising for sign-up rate without knowing average membership duration is optimising for the wrong thing. The strategy person needs to understand this, or the program will be technically active and directionally wrong.
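To make that concrete, here’s a minimal sketch of the kind of warehouse check I mean. Everything in it is hypothetical – the file names, the columns, the metric – it’s the shape of the question that matters, not this specific code.

```python
import pandas as pd

# Hypothetical exports: variant assignments from the testing tool,
# subscription outcomes from the warehouse. All names are illustrative.
assignments = pd.read_csv("experiment_assignments.csv")  # user_id, variant
subs = pd.read_csv("warehouse_subscriptions.csv")        # user_id, signed_up, months_retained

df = assignments.merge(subs, on="user_id", how="left")
df["signed_up"] = df["signed_up"].fillna(False)

summary = df.groupby("variant").agg(
    users=("user_id", "count"),
    signup_rate=("signed_up", "mean"),
    avg_months_retained=("months_retained", "mean"),
)

# A variant can win on sign-up rate and still lose here:
# sign-ups weighted by retention is closer to what the business earns.
summary["retained_signups_per_user"] = (
    summary["signup_rate"] * summary["avg_months_retained"]
)
print(summary)
```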
The build role, whether that’s a dedicated CRO developer or a front-end developer embedded in the team, is the other non-negotiable. Dependency on a central engineering backlog is one of the fastest ways to kill momentum in an experimentation program – I’ve been there. If every test implementation requires a sprint ticket and a two-week queue, your test velocity drops to the point where the program can’t generate enough learning to sustain itself.
The Most Important Early Hire, and What Getting It Wrong Costs You
If you’re only hiring one person to start a CRO program, that person needs to be an experienced strategist. Not a junior analyst. Not a project manager who’s done some A/B testing. Someone who has run programs before, knows what good research looks like, has made prioritisation calls under pressure, and can look at a test result and tell you what it actually means.
The mistake I see most often is hiring juniors because they’re cheaper and then wondering why the program isn’t generating insight. Juniors can be excellent. But without an experienced lead to shape the program, they default to what’s familiar… running tests on things that are easy to test, not things that are worth testing. You end up with a lot of button colour experiments and a backlog full of low-impact ideas that nobody’s questioning, because the person doing the prioritisation doesn’t have the pattern recognition yet to know which problems are worth solving.
That pattern recognition matters more in CRO than almost any other discipline I’ve worked in. Because the cost of running the wrong test isn’t just the time spent building and running it. It’s the opportunity cost of not running the right test. And in a program with limited velocity, six months of wrong tests is genuinely damaging to the business case for the program itself.
The Wine Society situation made this concrete for me. Quantifying what a losing variant would have cost if it had been shipped without testing is often more compelling to leadership than the wins. It reframes the program as risk management. And it’s only possible to have that conversation when someone in the team has the experience to look at a result and understand what it’s telling you beyond the headline numbers.
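The arithmetic behind that framing is simple enough to sketch. These numbers are invented for illustration – they’re not The Wine Society’s – but the structure of the calculation is the point.

```python
# Invented numbers, purely illustrative – not real client data.
monthly_visitors = 200_000
baseline_cr = 0.040        # 4.0% conversion on the control
variant_cr = 0.036         # the losing variant, down ~10% relative
avg_order_value = 60.0     # revenue per conversion

# What shipping the loser untested would have cost, per month
prevented_loss = monthly_visitors * (baseline_cr - variant_cr) * avg_order_value
print(f"~£{prevented_loss:,.0f} per month avoided")  # ~£48,000
```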
What the Team Looks Like at Scale
Once the program is functioning, the test volume is consistent, and the research process is embedded, the team structure needs to evolve. Not necessarily grow, but evolve. More people without a more defined operating model just creates noise.
At scale, a mature in-house CRO team typically has four or five distinct functions covered:
- Program strategy and research.
- Experiment design and analysis.
- Technical implementation.
- UX and design input.
- Someone owning the data infrastructure (whether that’s a dedicated analyst or an embedded relationship with the data team).
The research function becomes more important the more mature the program gets, not less. Early programs can get by on analytics and heuristics. Mature programs need qual data, session analysis, customer interviews, and the ability to connect behavioural patterns to actual user intent. This is where a lot of mid-maturity programs stall. They’ve got good test velocity but shallow research, and the ideas start to thin out because nobody’s gone back to first principles about what the customer is actually trying to do.
UX and design is another function that gets underweighted early. A CRO developer can build what’s specified, but if the variant isn’t grounded in sound UX thinking, you’re testing execution rather than ideas. The best CRO teams I’ve seen have a designer who understands experimentation, not one who treats every test as a creative brief. Those are different skill sets and it matters which one you hire.
At genuine scale, some teams add a dedicated program manager to own coordination between the CRO function and the engineering, product, and marketing teams it depends on. This role is underrated. Someone has to own the operational layer of the program: the test log, the results repository, the stakeholder comms, the post-test action process. When nobody owns it, institutional learning evaporates and the program ends up repeating mistakes it should have learned from six months ago.
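What “owning the operational layer” looks like in practice is surprisingly mundane. Here’s one hypothetical shape for a test-log entry – the fields are a suggestion, not a standard – but if something like this doesn’t exist, the learning lives in someone’s head and leaves when they do.

```python
from dataclasses import dataclass, field
from datetime import date

# A hypothetical test-log record. The fields are illustrative; the point
# is that decisions, learnings, and follow-ups sit next to the result.
@dataclass
class TestRecord:
    name: str
    hypothesis: str
    start: date
    end: date
    result: str                          # e.g. "win", "loss", "flat"
    decision: str                        # what shipped, and why
    learnings: list[str] = field(default_factory=list)
    follow_ups: list[str] = field(default_factory=list)

record = TestRecord(
    name="Checkout address autocomplete",
    hypothesis="Reducing form friction lifts checkout completion",
    start=date(2024, 3, 1),
    end=date(2024, 3, 28),
    result="win",
    decision="Shipped to 100%; watch returns data for a quarter",
    learnings=["Nearly all of the lift came from mobile"],
    follow_ups=["Try the same pattern on the sign-up form"],
)
```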
How Structure Should Evolve as the Program Matures
Early programs need depth before breadth. One person who knows what they’re doing is worth more than three who don’t. The goal at this stage is to run clean experiments, build organisational trust, and establish the research process as the engine of the program, not an afterthought.
Mid-maturity programs need to start investing in infrastructure. A test velocity of two to four experiments a month is where most programs should aim to be after twelve to eighteen months. To sustain that, you need the build function to be stable and fast, the analysis layer to be repeatable and documented, and the prioritisation process to be systematic rather than driven by whoever shouts loudest in the meeting. Frameworks like ICE or PIE can help here. Not because the scores are inherently accurate, but because the conversation the framework forces is more rigorous than gut feel.
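For what the ICE version of that looks like, here’s a minimal sketch. The scores are made up, and some teams average rather than multiply – the output ranking matters far less than the argument each score forces.

```python
# Minimal ICE prioritisation sketch. Scores (1-10) are invented;
# multiplying is one convention, averaging is another.
backlog = [
    {"idea": "Simplify checkout address form", "impact": 8, "confidence": 6, "ease": 7},
    {"idea": "Rewrite pricing page headline",  "impact": 5, "confidence": 4, "ease": 9},
    {"idea": "Add social proof to sign-up",    "impact": 6, "confidence": 7, "ease": 5},
]

for item in backlog:
    item["ice"] = item["impact"] * item["confidence"] * item["ease"]

for item in sorted(backlog, key=lambda i: i["ice"], reverse=True):
    print(f"{item['ice']:>4}  {item['idea']}")
```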
Mature programs start to look more like a product function than a traditional CRO team. They’re running concurrent experiments across multiple surfaces. They’re feeding results back into product roadmap decisions. They’re using warehouse data to validate experiment results beyond the immediate conversion event. They are cooking. At this point the team structure has to reflect that complexity, with clear ownership of each layer and enough experience distributed through the team that the program doesn’t collapse when one person leaves.
One note on AI here, because it’s coming up in every conversation about CRO right now. Teams using it well are using it to cover ground they couldn’t cover manually:
- accessibility testing at scale
- markup consistency
- viewport analysis across devices, and so on
Teams using it badly are using it to move faster before they’ve diagnosed correctly. Speed is not the constraint in most CRO programs. Clarity is. Hiring for AI capability without first having the strategic foundation in place is the same mistake as hiring juniors without a lead. You accelerate in the wrong direction faster.
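To ground the “using it well” side of that, here’s a minimal sketch of the viewport-analysis item from the list above, using Playwright. The URL and viewport sizes are placeholders; the value is that this runs across every page and device you care about without anyone clicking through them by hand.

```python
# Minimal viewport-analysis sketch using Playwright (pip install playwright,
# then run `playwright install chromium`). URL and sizes are illustrative.
from playwright.sync_api import sync_playwright

URL = "https://example.com/pricing"  # hypothetical page under review
VIEWPORTS = {
    "mobile":  {"width": 390,  "height": 844},
    "tablet":  {"width": 768,  "height": 1024},
    "desktop": {"width": 1440, "height": 900},
}

with sync_playwright() as p:
    browser = p.chromium.launch()
    for name, size in VIEWPORTS.items():
        page = browser.new_page(viewport=size)
        page.goto(URL)
        # Full-page screenshots give a reviewer (or a vision model)
        # something consistent to compare across devices.
        page.screenshot(path=f"pricing_{name}.png", full_page=True)
        page.close()
    browser.close()
```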
Where to Start if You’re Building This Now
If you’re at the beginning of this, hire the experienced strategist first. Before the developer, before the analyst, before anyone else. Get one person in who has built programs before and let them tell you what they need. They’ll give you a better answer than any org chart template.
If you’re already running a program and things feel stuck, look at the research layer first. Not the test backlog, not the velocity, not the tooling. Who is responsible for understanding why users behave the way they do? If the answer is nobody, or it’s someone junior with no clear methodology, that’s your constraint.
If you’re trying to make the case to leadership for more resource, stop leading with wins. Lead with what the program has prevented. The tests that lost. What those variants would have cost if they’d gone live without testing. The risk management framing lands harder than the growth framing in most organisations, and it’s more honest about what experimentation actually does.
And if you’re not sure where your program sits right now, whether you’re early, mid-maturity, or genuinely scaled, the Experimentation Maturity Quiz is a useful starting point. It takes five minutes and gives you a clearer picture of the gaps before you start making hiring decisions based on what sounds right rather than what your program actually needs.