Their trial-to-paid conversion was 8%. Nobody tested that.
That’s the problem with most landing page A/B testing programs. The mechanics are fine. The thinking is off. Teams run experiments to feel productive rather than to actually learn something useful. So before I walk you through how to do this properly, let me be clear about what we’re actually trying to do. We’re trying to reduce uncertainty about why users are or aren’t converting. Not to prove we’re right. To find out if we are.
Start With a Diagnosis, Not an Idea
Many people come to A/B testing with a list of things they want to try. New headline. Different image. Shorter form. Ideas are fine, but a list of ideas is not a testing program.
Before you write a single hypothesis, you need to know where things are breaking. Pull your scroll depth data. Watch session recordings. Look at where people stop, where they hesitate, where they rage-click. If you’re running paid traffic to the page, check your quality score signals, your bounce rate by source, your time-on-page by device. The data will show you a gap. That gap is where you start.
Let’s say you’re running a SaaS landing page for a project management tool. Traffic is decent. Click-through on the CTA is low. You watch 40 session recordings and notice most users scroll past the headline, spend time reading the feature list, and then leave without clicking. That tells you something specific. The features aren’t connecting to the user’s actual problem. They’re reading, but not recognising themselves in what they’re reading.
This matters because writing changes outperform layout changes at a rate most CRO teams would find embarrassing if they actually tracked it. Changing the order of your sections rarely moves the needle the way rewriting your value proposition does. But layout changes are easier to brief, easier to design, easier to ship. So that’s what gets tested. The harder, higher-leverage work gets skipped.
How to Write a Hypothesis Worth Testing
A hypothesis is not “I think a shorter form will increase conversions.” That’s a guess dressed up in a lab coat.
A proper hypothesis has three parts:
- What you’re changing.
- Why you believe that change will matter, grounded in specific evidence.
- And what metric you expect to move as a result.
Here’s the structure I use.
“Because [evidence], we believe that changing [element] for [audience] will result in [outcome], which we’ll measure by [metric].”
Applied to the example above: “Because session recordings show users spending over 90 seconds on the feature list before leaving without converting, we believe that rewriting the feature descriptions to reflect the specific frustrations named in our customer interviews will increase CTA clicks, which we’ll measure by click-through rate on the primary CTA.”
That hypothesis is testable. It’s specific. And if the test loses, you’ve still learned something, because the failure tells you either the feature copy wasn’t the real blocker or the change you made didn’t land the way you thought it would. Both of those are useful.
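If you keep a testing backlog, it can help to make those parts impossible to skip. Here’s a small sketch of what that might look like in Python; the field names are my own, not any standard:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    evidence: str  # the specific observation that motivated the test
    element: str   # what you're changing
    audience: str  # who sees the change
    outcome: str   # what you expect to happen
    metric: str    # how you'll measure it

    def __str__(self) -> str:
        return (f"Because {self.evidence}, we believe that changing "
                f"{self.element} for {self.audience} will result in "
                f"{self.outcome}, which we'll measure by {self.metric}.")

print(Hypothesis(
    evidence="session recordings show 90+ seconds on the feature list "
             "before leaving without converting",
    element="the feature descriptions",
    audience="trial-stage visitors",
    outcome="more clicks on the primary CTA",
    metric="click-through rate on the primary CTA",
))
```

If a backlog entry can’t fill in the evidence field, it isn’t a hypothesis yet. It’s an idea waiting for a diagnosis.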
Losses are not failures. The reason you run an experiment is because you don’t know the answer. If you already knew, you wouldn’t need the test. A loss means the page is more complex than your mental model of it. That’s information. Treat it like information.
The Practicalities: Sample Size, Runtime, and Significance
Here’s where people either get obsessive or completely sloppy. Neither is useful.
Before you run a test, calculate your required sample size. You need to know your current conversion rate, the minimum detectable effect you care about (the smallest improvement that would actually change a business decision), your desired statistical power (typically 80%), and your significance threshold (typically 95%).
There are free calculators for this. Use one. If your landing page gets 200 visitors a month and you’re looking for a 10% relative lift, the math will tell you that reaching significance would take years, not months. That’s not a test worth running. Either find a higher-traffic entry point or accept that you’re working with a smaller program and adjust accordingly.
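If you’d rather see the math than trust a calculator, the standard two-proportion formula is short enough to run yourself. A minimal sketch using scipy; the 5% baseline rate is an assumption for illustration:

```python
import math
from scipy.stats import norm

def sample_size_per_variant(baseline_rate: float, relative_mde: float,
                            power: float = 0.80, alpha: float = 0.05) -> int:
    """Visitors needed in EACH variant to detect a relative lift of
    `relative_mde` over `baseline_rate` (two-sided test)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = norm.ppf(1 - alpha / 2)  # 1.96 at 95% significance
    z_power = norm.ppf(power)          # 0.84 at 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

# 5% baseline conversion, 10% relative MDE (5.0% -> 5.5%)
n = sample_size_per_variant(0.05, 0.10)
print(n)                                          # ~31,000 per variant
print(f"{2 * n / 200:.0f} months at 200 visitors/month")
```

The exact number depends heavily on your baseline rate, but at 200 visitors a month the verdict is the same at any realistic baseline: the test is unrunnable.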
Don’t stop tests early. This is the most common mistake I see. A test shows a lift after two weeks, someone gets excited, someone senior asks “can we just ship this?”, and the test gets called. Early stopping inflates false positive rates. The significance you see at week two is often noise. Let it run.
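You can demonstrate the inflation to anyone who asks. Simulate A/A tests, where both arms are identical so every “significant” result is false by construction, and peek at the p-value daily. A rough sketch, with made-up traffic numbers:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def peeking_false_positive_rate(n_sims=2000, days=28,
                                visitors_per_day=500, rate=0.05):
    """A/A tests: both arms convert at the same rate, so any
    'significant' result is a false positive."""
    hits = 0
    for _ in range(n_sims):
        na = nb = ca = cb = 0  # visitors and conversions per arm
        for _ in range(days):
            na += visitors_per_day
            nb += visitors_per_day
            ca += rng.binomial(visitors_per_day, rate)
            cb += rng.binomial(visitors_per_day, rate)
            pooled = (ca + cb) / (na + nb)
            se = np.sqrt(pooled * (1 - pooled) * (1 / na + 1 / nb))
            z = (ca / na - cb / nb) / se
            if 2 * norm.sf(abs(z)) < 0.05:  # daily peek at p < 0.05
                hits += 1                   # "ship it!" -- a false win
                break
    return hits / n_sims

print(peeking_false_positive_rate())  # typically 0.20+, not 0.05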
Runtime should cover at least two full business cycles, usually two weeks minimum, to account for day-of-week variation in user behaviour. Some audiences convert differently on Tuesdays than on Saturdays. If you stop before you’ve captured that variance, you’re guessing.
What to Actually Test on a Landing Page
Roughly in order of how often they move the needle:
- your headline and value proposition copy
- your CTA copy and placement
- your social proof (who is saying what, and where it sits on the page)
- your hero image or video (particularly the first frame)
- and your form length or lead capture flow.
The headline is the highest-leverage element on most landing pages because it’s the first filter. If the headline doesn’t match the mental state of the person arriving, everything below it gets read with less attention. For SaaS specifically, the question I always ask is: does this headline describe what the product does, or what the user stops suffering from? Those are different headlines. One of them converts better almost every time.
Social proof is consistently undertested. Most teams slap a logo bar on the page and call it done. But the specificity of proof matters enormously. “We saved 5 hours a week” converts better than “We love this product.” A quote that names a recognisable company in your target vertical outperforms a generic five-star review. Test the type of proof, not just the presence of it.
Form length is more nuanced than the “shorter is better” rule suggests. For high-intent traffic, a longer form that pre-qualifies the lead can actually increase downstream conversion even if it reduces top-of-funnel submissions. Know what you’re optimising the whole funnel for, not just the page in isolation.
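A made-up illustration of why the whole-funnel view matters:

```python
# Hypothetical numbers: the longer form loses submissions
# but produces better-qualified leads
short_form = {"submissions": 100, "qualified_rate": 0.10}
long_form  = {"submissions": 70,  "qualified_rate": 0.20}

for name, f in (("short form", short_form), ("long form", long_form)):
    qualified = f["submissions"] * f["qualified_rate"]
    print(f"{name}: {qualified:.0f} qualified leads")
# short form: 10 qualified leads
# long form: 14 qualified leads
```

The short form “wins” on the page and loses in the pipeline. If your primary metric stops at the submit button, you’ll never see it.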
Reading Your Results Without Lying to Yourself
Statistical significance tells you that the result probably isn’t random. It doesn’t tell you that the result will hold. It doesn’t tell you why the variant won. It doesn’t generalise to a different traffic source or a different month.
When a test concludes, ask three questions:
- Did the primary metric move in the direction you predicted?
- Did any secondary metrics move in a direction that concerns you (e.g. sign-ups went up, but downstream activation went down)?
- And does the result match the hypothesis, or did something unexpected happen that you need to investigate?
A win that you don’t understand is nearly as dangerous as a loss you ignore. If your new headline lifted conversions by 18% and you’re not sure why, you haven’t learned enough to apply that thinking to the next test. Dig into it. Segment the results by traffic source, by device, by new versus returning. The breakdown usually reveals what’s actually driving the lift.
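If you can export one row per visitor with a variant label, a segment column or two, and a converted flag, the breakdown is a few lines of pandas. All column names and data here are hypothetical placeholders:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical per-visitor export; every column name is made up
n = 4000
df = pd.DataFrame({
    "variant":   rng.choice(["control", "treatment"], n),
    "source":    rng.choice(["paid", "organic", "email"], n),
    "device":    rng.choice(["mobile", "desktop"], n),
    "converted": rng.random(n) < 0.06,  # placeholder conversions
})

# Conversion rate and visitor count per traffic source per variant
by_source = (df.groupby(["source", "variant"])["converted"]
               .agg(visitors="count", conv_rate="mean"))
print(by_source)

# Relative lift of treatment over control within each source
rates = by_source["conv_rate"].unstack("variant")
rates["relative_lift"] = rates["treatment"] / rates["control"] - 1
print(rates)
```

Watch the visitor counts as well as the rates. A segment showing a huge lift on 50 visitors is telling you less than it appears to.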
One Last Thing
Not every landing page problem is a testing problem. Some pages need a complete strategic rethink. Some companies need better targeting before they need better conversion. If your traffic quality is poor, optimising the page is polishing something that was never going to work at scale. Know the difference before you invest in a testing program.
But when you do have the right traffic, a clear problem worth investigating, and a hypothesis grounded in evidence, landing page A/B testing is one of the most direct ways to compound your marketing performance over time. Each test builds on the last. Each insight narrows the gap between what you think is true about your users and what’s actually true.
That’s the job. Not winning tests. Understanding users well enough that winning becomes more likely.
Before you run your next experiment, check whether the idea is actually worth testing. The Experiment Validator at Kyzn Academy walks you through the key questions to assess whether your hypothesis is solid, your sample size is realistic, and your test is set up to teach you something. Takes two minutes. Saves you weeks of running the wrong test.