Web Tip – How to Measure Statistical Significance of Your A/B Tests

This week, we welcome a guest post from our Digital Project Manager, Anna Pleshakova.

Anna is responsible for behavior-driven email marketing campaigns and strategy for B2B and B2C clients, helping them navigate emerging digital trends. She also works on web development and data integration projects for clients.

As marketers, we always want to be testing to see what creative content fits our audience best. Is it a sneaky subject line that entices email opens? Or a particularly good call to action that gets the click on a landing page?

A/B testing (or multivariate testing, when several variables are tested at once) is a great way to see what works and what doesn’t. “Success” can be measured in numerous ways, but the results should be statistically significant. If you’re scratching your head because we lost you at multivariate, read on to learn what A/B testing is and how you can measure the statistical significance of your tests.

A/B or Multivariate Testing

A/B testing is a method of comparing two versions (or more, if you’re running an A/B/C/D test) of a web page, email, or other asset to determine which one performs best, i.e., which has the better conversion rate.

An A/B test works by splitting your audience into two samples: a “testing” sample, generally 20%–40% of your audience, and a “remaining” sample. The marketer sets the metric that defines success and determines the “winner”: open rate, click rate, effective rate, or something similar. Then an end date and time is set for the conclusion of the test.

When A/B testing emails, the first thing you will do is send both the test and control versions of your email to the testing sample. After the test window ends, the “winner” is sent to the remainder of the audience.
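To make that split concrete, here is a minimal Python sketch of one way to carve an email list into a testing sample and a remaining audience. The 30% test fraction, the fixed seed, and the example addresses are illustrative assumptions, not features of any particular email platform.

    import random

    def split_audience(recipients, test_fraction=0.3, seed=42):
        """Randomly split an email list into a testing sample (here 30%,
        within the typical 20-40% range) and the remaining audience."""
        shuffled = recipients[:]                  # copy so the original list is untouched
        random.Random(seed).shuffle(shuffled)     # fixed seed keeps the split reproducible
        cutoff = int(len(shuffled) * test_fraction)
        return shuffled[:cutoff], shuffled[cutoff:]

    # Hypothetical usage with a small list of email addresses
    test_sample, remaining = split_audience([
        "a@example.com", "b@example.com", "c@example.com",
        "d@example.com", "e@example.com",
    ])
    # Half of test_sample would get version A, half version B; the winner
    # is later sent to everyone in `remaining`.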

The A/B testing process for webpages is a little different than testing emails:

    1. Study Your Website Data to Observe User Behavior
      Start by using a tool such as Google Analytics or Crazy Egg to view data for each of your web pages. Look at your bounce rates, conversion rates, and where visitors are clicking on each page to spot any disconnect, then identify the goal you want to achieve.
    2. Create a Hypothesis
      Once you see the bounce and conversion rates, you will be able to come up with a hypothesis. Perhaps one page converts noticeably worse than the others. It could be that a button is too small and hard to tap on mobile, or that the call to action isn’t enticing or informative enough.
    3. Test your Hypothesis
      After you create a hypothesis, develop your test version(s): for an A/B test, create a second version of the targeted page with the changes you believe will improve your end goal. Half of your traffic is shown the original version of the page (the control) and half is shown the modified version (one way to split that traffic is sketched after this list).
    4. Analyze Data and Draw Conclusions
      Review the data and see if there is a clear winner. Sometimes the winner is the test version, and sometimes it is the control.
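As a companion to step 3, here is a minimal sketch of one common way to split web traffic 50/50: deterministically bucketing each visitor by hashing an ID, so the same person always sees the same version of the page. The experiment name and visitor ID below are hypothetical placeholders; in practice a testing tool usually handles this for you.

    import hashlib

    def assign_variant(visitor_id: str, experiment: str = "cta-button-test") -> str:
        """Deterministically bucket a visitor into 'control' or 'variant' so the
        same person always sees the same version of the page."""
        digest = hashlib.md5(f"{experiment}:{visitor_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100        # pseudo-random number from 0 to 99
        return "control" if bucket < 50 else "variant"

    # Hypothetical usage with a cookie value or analytics client ID
    print(assign_variant("visitor-1234"))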

The statistical significance of the test results is imperative for drawing meaningful conclusions. Read on to learn what statistical significance is and why it’s important in testing.

Statistical Significance

According to Optimizely, “statistical significance is a way of mathematically proving that a certain statistic is reliable. When you make decisions based on the results of experiments that you’re running, you will want to make sure a relationship actually exists.”

In other words, unless your tests are statistically significant, you will not be able to back up your claims of one version winning over another.

Statistics crash course:

  • Null Hypothesis: the assumption that there is no real difference between the versions you are testing (any gap in the results is due to chance)
  • Alternative Hypothesis: the claim that one version really does perform differently than the other
  • P-value (a number between 0 and 1): used to help determine the significance of your results
    • P-value less than or equal to 0.05: reject the null hypothesis (the difference is statistically significant)
    • P-value greater than 0.05: fail to reject the null hypothesis (you can’t conclude there is a real difference)

If you run an A/B test and the outcome reaches a confidence level of 95% or more (that is, a p-value of 0.05 or less), you can be roughly 95% confident that the winner is real. It also means there is up to a 5% chance that the “winner” is just an artifact of randomness. An error rate of more than 5% is generally considered too high. You are not going to be right 100% of the time, but you can keep your conclusions as reliable as possible by sticking to that 5% cutoff.
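If you want to see where that p-value comes from, here is a minimal Python sketch that hand-rolls a two-sided two-proportion z-test (one standard way, though not the only one, to compare two conversion rates) and applies the 0.05 cutoff. The conversion counts are made-up numbers for illustration.

    from math import sqrt, erf

    def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
        """Two-sided two-proportion z-test: how likely is a difference this large
        in conversion rates if the two versions actually perform the same?"""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        # Pooled conversion rate under the null hypothesis (no real difference)
        p_pool = (conv_a + conv_b) / (n_a + n_b)
        se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        z = (p_a - p_b) / se
        normal_cdf = lambda x: 0.5 * (1 + erf(x / sqrt(2)))
        return 2 * (1 - normal_cdf(abs(z)))   # two-sided p-value

    # Hypothetical example: version A got 180 clicks out of 2,000 sends,
    # version B got 230 clicks out of 2,000 sends.
    p = two_proportion_p_value(180, 2000, 230, 2000)
    if p <= 0.05:
        print(f"p = {p:.4f}: reject the null hypothesis -- the difference looks real")
    else:
        print(f"p = {p:.4f}: fail to reject the null -- not enough evidence of a difference")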

Measuring Statistical Significance

Are you thinking of calling up your old statistics professor for help? Put the phone down, we’ve got you covered.

There are some awesome free (and accurate) statistical significance calculators online that save you from digging out your old stats textbook and running the calculations by hand. All you need to do is plug in the number of people in each test group and the number of conversions (opens, clicks, downloads, views, etc.) each version received.
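If you would rather reproduce what those calculators do in code, the statsmodels library offers a two-proportion z-test that takes exactly those inputs (people per group and conversions per group); the numbers below are the same hypothetical ones used in the earlier sketch.

    from statsmodels.stats.proportion import proportions_ztest

    # Hypothetical inputs: conversions and audience size for versions A and B
    conversions = [180, 230]
    audience_sizes = [2000, 2000]

    z_stat, p_value = proportions_ztest(conversions, audience_sizes)
    print(f"z = {z_stat:.2f}, p = {p_value:.4f}")  # significant at 95% confidence if p <= 0.05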

Here are our favorite resources:

A/B Test Guide
GetDataDriven
Neil Patel

One Last Thing

We want to stress how important it is to not make assumptions while A/B testing. Significance percentages give you confidence that your results, which ultimately power changes to your website or other assets, will be worthwhile and reliable.

We encourage you to start testing early and often with the help of a significance calculator. If you’ve already been testing but haven’t been calculating statistical significance, start using the easy and free tools we linked above!