AB(n) Test reporting interface

Where to find the Full Report

Two key places link into the Full Report, both found on the dashboard screen.

Report configuration

This report comes with many settings you can tweak and user pools you can configure; these are detailed below.

In the top-right corner of the report, you'll see:

Date Range

Often, users wish to limit the date range of the data they report on. We typically see this happen for two reasons:

  1. Ignore the last day, as users won't have had time to convert yet.

  2. Ignore the first 2-3 days. Sometimes tests have a "novelty effect", which you can spot in the Conversion Rate Stabilisation graph, where the variant only performs well for the first day or two. Removing these days from your results could prove significance faster than including a few days with "novelty" data.

Confidence Level

This sets the threshold for how easily a swing in conversion rate is declared significant. While we default to 95%, there are good reasons to deviate from this value.

Increasing the level is useful for mission-critical tests, where false discovery is less acceptable. You can go up to a 99% confidence level, which as a 2-tailed test requires a Chance To Beat Control of >99.5% or <0.5% to be flagged as significant.

Decreasing the level is useful for lower-traffic tests, short-term learning, or cases where direction is important but certainty isn't.

Once the value is changed, simply click Reload Report for all calculated statistics to update.

While many tools have this value in their account settings area, we add this to your report as we feel it's a per-test decision, i.e. "this test is mission critical" or "this test is in a low traffic area".

Users have the ability to adjust the default in Account Settings, though.

Note: Our default calculations are 2-tailed, and so 95% confidence level requires a Chance To Beat Control of >97.5% or <2.5% to be declared significant. Read more here.
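
As a quick illustration of how those cut-offs follow from the confidence level (a sketch in Python, not the exact code behind the report):

```python
def two_tailed_cutoffs(confidence_level):
    """Chance To Beat Control cut-offs for a 2-tailed test.

    A result is flagged significant when Chance To Beat Control is
    above the upper cut-off or below the lower cut-off.
    """
    alpha = 1 - confidence_level      # e.g. 0.05 at a 95% confidence level
    return alpha / 2, 1 - alpha / 2

lower, upper = two_tailed_cutoffs(0.95)
print(f"{lower:.1%} / {upper:.1%}")   # 2.5% / 97.5%

lower, upper = two_tailed_cutoffs(0.99)
print(f"{lower:.1%} / {upper:.1%}")   # 0.5% / 99.5%
```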

Conversion metrics selection

Here, you can select your metrics in the order that you want to see them in the report.

Refining your list to limit scope is useful: we encourage users to track many metrics in our system, but decisions are often made primarily on north-star goals.

Reordering your list is useful both for the coherence of your report and for keeping graphs, such as Dropoff charts, ordered correctly.

KPI selection

As of Release 74, we have notifications that alert you to test significance, backed by certain guardrail checks. These rely on your KPI having been set.

Other places in the interface, such as alerts, are also powered by KPIs, so we recommend setting one for every experiment you run.

Report settings bar

This settings bar influences the report, including how we build our User Pool, how we slice the data, and how we present it back to you.

Filters

Filters are used to help refine your target audience. There are a few useful features to understand here.

Match any / Match all

You can choose to join your rules with OR or AND - a useful distinction when listing multiple Rules. The rationale for which to use is very similar to that of the Segment builder.

Attributes

We have attributes that categorise the user (Device attributes, Geolocation, etc.), and that describe their behaviour (Metrics triggered). Chaining these together allows for complex user definitions, e.g.: "Mobile users who have not purchased, but did get to the checkout and saw the login error".

As of Release 74, we also have the option for Custom Filters, where your Custom Data can power filtering for an open-ended and flexible approach to filtering.

Operators

You will find both inclusion (is, contains, is in list) and exclusion (is not, does not contain, is not in list) operators. While many platforms are limited to inclusion, we make it very easy to remove users from your pool as well as find people to include.
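
As an illustrative sketch of how inclusion and exclusion rules combine under Match all - the attribute and event names here are hypothetical, and this is not the product's own syntax:

```python
# Sketch only: building the user pool from the earlier example,
# "Mobile users who have not purchased, but did get to the checkout
# and saw the login error".
users = [
    {"device": "Mobile", "events": {"checkout_view", "login_error"}},
    {"device": "Mobile", "events": {"checkout_view", "purchase"}},
    {"device": "Desktop", "events": {"checkout_view", "login_error"}},
]

rules = [
    lambda u: u["device"] == "Mobile",        # is (inclusion)
    lambda u: "purchase" not in u["events"],  # does not contain (exclusion)
    lambda u: "checkout_view" in u["events"], # contains
    lambda u: "login_error" in u["events"],   # contains
]

# Match all = AND across rules; Match any would use any(...) instead.
pool = [u for u in users if all(rule(u) for rule in rules)]
print(len(pool))  # 1 - only the first user matches every rule
```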

Values

Most value options you will find in the interface are open-ended.

Dimensions

Dimensions allow you to split your data by different attributes. By default, you will always see Experiment names and Event names. You can add other attributes alongside this.

You can also decide the order in which to present your additional dimensions.

  • Category: Divide the entire report at the top-level. This keeps the overall presentation of the report the same, but lets you see all values of an attribute on the same screen, one after the other.

  • Group: This groups the screen first by metric, then your attribute, then experiment. We find this to be the most common view people look for when slicing their reports.

  • Row: Use your custom dimension at the most granular level. Useful for making direct comparisons, e.g. Desktop vs. Mobile for a given experiment + metric.

Measures

This allows you to select which columns you wish to see in the report. As we grow the number of calculations available in our reporting suite, using this will keep your screen manageable.

Scope

Scope determines whether views and conversions are counted per user, per session, or in total.

All data collected comes with Visitor IDs and Session IDs, allowing us to make this distinction.
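
As a rough sketch of how the three scopes count the same raw events (field names here are illustrative, not the raw export schema):

```python
# The same four purchase events, counted under each scope.
events = [
    {"visitor_id": "v1", "session_id": "s1", "event": "purchase"},
    {"visitor_id": "v1", "session_id": "s1", "event": "purchase"},
    {"visitor_id": "v1", "session_id": "s2", "event": "purchase"},
    {"visitor_id": "v2", "session_id": "s3", "event": "purchase"},
]

total_scope   = len(events)                             # 4 - every event counts
session_scope = len({e["session_id"] for e in events})  # 3 - one per session
user_scope    = len({e["visitor_id"] for e in events})  # 2 - one per visitor
print(total_scope, session_scope, user_scope)
```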

Control group

Whilst the default Control is always experiment/variation 1, there are times when it helps to analyse data pivoted on a different experiment.

For example, let's say you run an ABn test with 4 variations. Firstly, it's important to know that your variations are outperforming the control group, which the default settings help you identify quickly.

Beyond this, though, comparing how much stronger one variation is than another can only be done by reconsidering your control group, given that all statistics (lift over control, chance to beat control, significance) reference the control.

By changing your Control group to e.g. Variation 3, you could then understand whether Variation 3 stands out as an overwhelming winner amongst the other variations, making it a safe overall winner and removing the need to run the test for longer.

Extract data

You can extract the raw data, as a zipped CSV, straight onto your machine.

Simply select the columns you would like to explore - keeping in mind fewer columns = smaller file - and download the file.

It may take a while to generate the extract - you'll find a list of current extracts on the first tab.

Performance tab

Binomial metrics

Each metric is listed as a group, and this is typically split by your Experiment Name (variation). Along the columns, you will then find data or calculations.

Users

This is our count of people or sessions, per experiment. The Scope is selected in the settings bar, and defaults to Users.

Conversions

This is a count of the metrics received, again based on your Scope settings.

Conversion rate

A simple calculation of Conversions / Users.

The accompanying value is the error rate: a margin either side of the Conversion Rate, based on traffic levels, showing the boundaries within which the true conversion rate may lie.
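
As an illustrative sketch, assuming a standard normal-approximation margin at a 95% level (the report's exact interval calculation may differ):

```python
import math

def conversion_rate_with_error(conversions, users, z=1.96):
    """Conversion rate plus a normal-approximation margin of error.

    z = 1.96 corresponds to a 95% interval; this is a sketch, not
    the exact formula used in the report.
    """
    rate = conversions / users
    std_error = math.sqrt(rate * (1 - rate) / users)
    return rate, z * std_error

rate, margin = conversion_rate_with_error(120, 1000)
print(f"{rate:.1%} +/- {margin:.1%}")  # 12.0% +/- 2.0%
```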

Lift over control

How much better or worse than the control group is this? The number you find here is a proportionate increase, not a percentage-points increase.

Take the following scenario:

  • Control: 10% conversion rate

  • Variation: 12% conversion rate

The number you would see for Lift over control is 20% (proportionate increase), not 2% (percentage-points increase).

The number is a key output when reporting back on the success of your experiments, and allows users to say "this variation generates 20% more sales than what we have today".
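
The same calculation, as a quick sketch:

```python
def lift_over_control(control_rate, variation_rate):
    """Proportionate lift, not a percentage-point difference."""
    return (variation_rate - control_rate) / control_rate

print(f"{lift_over_control(0.10, 0.12):+.0%}")  # +20%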

Chance to beat control

This is a key output of the Z-test that we run. See more details here.

Values over 50% show an increase in conversion rate; values under 50% show a decrease.

Values tending towards 100% are strongly positive; values tending towards 0% are strongly negative.
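
For the curious, here is a rough sketch of how a two-proportion Z-test yields a Chance to beat control figure; the report's exact implementation (e.g. pooled vs. unpooled variance) may differ:

```python
import math

def chance_to_beat_control(conv_c, users_c, conv_v, users_v):
    """Chance To Beat Control from a two-proportion Z-test (sketch only)."""
    p_c, p_v = conv_c / users_c, conv_v / users_v
    se = math.sqrt(p_c * (1 - p_c) / users_c + p_v * (1 - p_v) / users_v)
    z = (p_v - p_c) / se
    # Standard normal CDF: values > 0.5 mean the variation looks better.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(chance_to_beat_control(100, 1000, 125, 1000), 2))  # ~0.96
```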

Significant

As with Chance to beat control, significance is a key output of the Z-test.

You will see Yes/No values here, with a green Yes for "positive and significant" and a red Yes for "negative and significant".

Continuous metrics

Continuous, or non-binomial metrics, are anything where we're comparing the numerical value of an event as opposed to whether or not it happened. The goal here is to be able to report on uplifts for value, such as revenue earned, units sold, etc., instead of just knowing whether or not more transactions took place.

You will find the following columns for continuous metrics:

Views by experiment

This should be a similar count to your binomial metrics - a count of users/sessions/total who saw your experience.

Conversions

This is a count of the number of metrics triggered with this data field present. For example, if Revenue and Units are collected with your Purchase event, it would be a count of Purchase events.

Avg per visitor

This is average per tested visitor. Of the people who fell into your test, many of whom may not convert, what is their average? This is a useful metric if you are hoping to get more users to perform the action you're measuring, e.g. get more people to purchase, not just increase basket size.

It is also useful as a guardrail metric against overall uplift. E.g. "We increased Average Order Value, but did we increase or reduce the spend per visitor?".

For websites that heavily rely on CPC/CPM style metrics, this is also a good gauge of knowing what your return per unit of traffic was. If it costs you £3 to acquire traffic, and you make £17 on average with a 60% margin, you know you're doing ok.
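
As a small worked sketch of how Avg per visitor differs from an average over converting orders only (illustrative numbers):

```python
# One revenue value per tested visitor; most visitors did not purchase.
revenues = [0, 0, 0, 50, 0, 120, 0, 0, 30, 0]

total  = sum(revenues)                    # 200
orders = [r for r in revenues if r > 0]   # 3 converting visitors

avg_per_visitor = total / len(revenues)   # 20.0 - spread over everyone tested
avg_per_order   = total / len(orders)     # ~66.67 - only over converters
print(avg_per_visitor, avg_per_order)
```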

Lift over control (avg per visitor)

This is a simple calculation of percentage difference. For example:

  • Control: 10.00

  • Variation: 12.00

  • Lift over control: +20%

Mean value

For collected data, this is the calculated mean - the sum of all collected values divided by the number of values.

This is what most users consider to be "the average".

No outliers are stripped from this number - we include all values when calculating the mean.

Mean total

This is the sum/total of all observed values for that metric, again with no outliers stripped out from this number.

Lift over control (mean)

This is a simple calculation of the percentage difference. You'll find it's the same number whether based on Mean Value or Mean Total.

Median Value

The median is the central data point when values are sorted, and is an easy but useful gauge of the average without any impact from skew or outliers.

Median Total

This is a synthetic/calculated total: the Median Value multiplied by the number of values. This gives us an "observed total" value but without outliers influencing the number.

Why is removing outliers important?

Let's say your average spend is £50, but one person spends £10,000. They happen to be in the variant, and so the Mean points towards positive because that one value is so high. Remove this one data point, and your variation could be negative.

Removing outliers, and using calculations such as U-tests which are less swayed by individual values, allows for more robust conclusions.
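
A quick sketch of that scenario, showing how one extreme value moves the Mean but barely moves the Median (and how the Median Total is derived):

```python
import statistics

control   = [48, 52, 50, 49, 51, 50]
variation = [47, 50, 49, 48, 51, 10000]   # one extreme outlier

print(statistics.mean(control), statistics.mean(variation))      # 50 vs 1707.5
print(statistics.median(control), statistics.median(variation))  # 50.0 vs 49.5

# Median Total is the synthetic total: Median Value x number of values.
print(statistics.median(variation) * len(variation))             # 297.0
```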

Lift over control (median)

Again, a simple difference-based calculation, this time based on the Median.

Chance to beat control, Significance

These are core outputs of the Rank-Sum test (U-test). They provide a strength-of-difference figure in Chance to beat control, and a "did it break the threshold?" figure in Significance.

Chance to beat control is a percentage-based number - we see:

  • Under 50% - Negative

  • Over 50% - Positive

  • Tending towards 0% - strongly negative

  • Tending towards 100% - strongly positive

Significance is simply a Yes/No: did the result break the threshold set by the Confidence Level?
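
For the curious, a rough sketch of a Rank-Sum comparison using SciPy; the report's exact implementation and corrections may differ:

```python
from scipy.stats import mannwhitneyu

# Per-order values (e.g. revenue) for each group - illustrative data only.
control   = [20, 35, 50, 48, 52, 41, 39, 60]
variation = [45, 55, 62, 58, 49, 70, 66, 53]

stat, p_value = mannwhitneyu(variation, control, alternative="two-sided")

confidence_level = 0.95
significant = p_value < (1 - confidence_level)
print(f"U={stat}, p={p_value:.3f}, significant={significant}")
```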

Any Conversion

To understand this, first consider how we recommend you collect metrics.

The recommended strategy for data collection is to be granular where possible. For example, if you have a section with 3 blocks that a user can click, track these separately as click_block1, click_block2, click_block3 instead of collectively as click_blocks.

At this stage, you will be able to measure their individual performance using the normal reporting, and then collective performance using Any Conversion.

Any Conversion, as this suggests, gives you rolled-up performance. Across the metrics you've selected, this gives you a collective view of "interactions with Any of these metrics".

You may wish to limit your Selected Conversions if using this feature - it is typically not useful when observing an entire conversion funnel of metrics, as opposed to a few similar clicks/pages.
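
As a small sketch of the roll-up logic (event names here are hypothetical):

```python
# A visitor counts towards Any Conversion if they triggered at least
# one of the selected metrics.
selected = {"click_block1", "click_block2", "click_block3"}

visitors = {
    "v1": {"click_block1"},
    "v2": {"click_block2", "click_block3"},
    "v3": set(),
    "v4": {"page_view"},
}

any_conversions = sum(1 for events in visitors.values() if events & selected)
print(any_conversions)  # 2 - v1 and v2
```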
