
Archive for the ‘Research And Measurement’ Category

Factors Affecting Marketing Experimentation: Statistical significance in marketing, calculating sample size for marketing tests, and more

April 4th, 2023

Here are answers to questions SuperFunnel Cohort members put in the chat of recent MEC200 and MEC300 LiveClasses for ChatGPT, CRO and AI: 40 Days to build a MECLABS SuperFunnel (feel free to register at that link to join us for an upcoming MECLABS LiveClass).

How many impressions or how much reach do we need for statistical significance?

I can’t give you a specific number, because the answer will vary based on several factors (described below). Also, MECLABS SuperFunnel Cohort members now have access to a Simplified Test Protocol in their Hub, and you can use that tool to calculate these numbers, as shown in Wednesday’s LiveClass.

But I included the question in this blog post because I thought it would be helpful to explain the factors that go into this calculation. And to be clear, I’m not the math guy here. So I won’t get into the formulas and calculations. However, a basic understanding of these factors has always helped me better understand marketing experimentation, and hopefully it will help you as well.

First of all, why do we even care about statistical significance in marketing experimentation? When we run a marketing test, essentially we are trying to measure a small group to learn lessons that would be applicable to all potential customers – take a lesson from this group, and apply it to everyone else.

Statistical significance helps us understand that our test results represent a real difference and aren’t just the result of random chance.

We want to feel like the change in results is because of our own hand – a better headline on the treatment landing page, or a better offer. It’s human nature. And because we can see the results with our own eyes, it is very hard to accept that a 10% conversion rate may not really be any different from an 8% conversion rate.

But it may just be randomness. “Why is the human need to be in control relevant to a discussion of random patterns? Because if events are random, we are not in control, and if we are in control of events, they are not random. There is therefore a fundamental clash between our need to feel we are in control and our ability to recognize randomness,” Dr. Leonard Mlodinow explains in The Drunkard’s Walk: How Randomness Rules Our Lives.

You can see the effect of randomness for yourself if you run a double control experiment – split traffic between two identical landing pages, and even though they are exactly the same, they will likely get a different number of conversions.
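
To make that concrete, here is a minimal simulation sketch (my own illustration, not a MECLABS tool) of a double control experiment: two identical pages with the same true conversion rate still report different conversion counts on almost any given run. The 10% rate and 1,000 visitors per page are made-up numbers.

```python
# A minimal "double control" (A/A) simulation: two identical pages, same true
# conversion rate, yet the observed counts differ because of randomness alone.
import random

random.seed(7)  # fixed seed so the example is reproducible

TRUE_CONVERSION_RATE = 0.10   # hypothetical
VISITORS_PER_PAGE = 1_000     # hypothetical

def run_identical_page(rate: float, visitors: int) -> int:
    """Count conversions for one page where every visitor converts with the same probability."""
    return sum(1 for _ in range(visitors) if random.random() < rate)

page_a = run_identical_page(TRUE_CONVERSION_RATE, VISITORS_PER_PAGE)
page_b = run_identical_page(TRUE_CONVERSION_RATE, VISITORS_PER_PAGE)

print(f"Page A: {page_a} conversions ({page_a / VISITORS_PER_PAGE:.1%})")
print(f"Page B: {page_b} conversions ({page_b / VISITORS_PER_PAGE:.1%})")
# The two "identical" pages almost never report exactly the same number of conversions.
```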

We fight randomness with statistical significance. The key numbers we want to know to determine statistical significance are:

  • Sample size – How many people see your message?
  • Conversions – How many people act on your message?
  • Number of treatments – For example, are you testing two different landing pages, or four?
  • Level of confidence – Based on those numbers, how sure can you be that there really is a difference between your treatments?

And this is the reason I cannot give you a standard answer for the number of impressions you need to reach statistical significance – because of these multiple factors.

I’ll give you an (extreme) example. Let’s say your sample size is 100 and you have four treatments. That means each landing page is visited by 25 people. Three of the landing pages each get three conversions, and the other landing page gets four conversions. Since so few people saw these pages and the difference in conversions is so small, how confident are you that they are really different? Perhaps you just randomly had one more motivated person in that last group who gave you the extra conversion.

And this assumes an even traffic split, which you may not want to do based on how concerned you are about the change you are making. As we teach in How to Plan Landing Page Tests: 6 Steps to Guide Your Process, “Using an uneven traffic split is helpful when your team is testing major changes that could impact brand perception or another area of your business. Although the results will take longer to reach statistical significance, the test is less likely to have an immediate negative impact on business.”

Now, let’s take another extreme example. Say your sample size is 10,000,000 and you have just a control and a treatment. The control gets 11 conversions, but the treatment gets 842,957 conversions. In that case, you can be pretty confident that the control and treatment are different.

But there is another number at play here – Level of Confidence (LoC). When we say there is a statistically significant difference, it is at a specific Level of Confidence. How sure do you want to be that the control and treatment are different? For marketing experimentation, 95% is the gold standard. But 90%, or even 80%, could be enough if the change likely isn’t going to be harmful and doesn’t take many resources to make. And the lower the Level of Confidence you are OK with, the smaller the sample size you need and the smaller the difference in conversions you need for the result to be statistically significant at that LoC.
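
To see how sample size, conversions and Level of Confidence combine, here is a rough sketch (my own illustration, assuming a standard two-proportion z-test rather than the Simplified Test Protocol) applied to the two extreme examples above. Pairing only the best of the four small pages against one of the others is a simplification – it ignores the multiple-comparison issue that a proper analysis of four treatments would handle.

```python
# Rough two-proportion z-test: how confident can we be that two observed
# conversion rates are really different?
from math import sqrt, erfc

def level_of_confidence(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Return the confidence (in %) that the two conversion rates differ."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(p_a - p_b) / std_err
    p_value = erfc(z / sqrt(2))   # two-sided p-value
    return (1 - p_value) * 100

# Example 1: 100 visitors split across 4 treatments (25 each), 4 vs. 3 conversions
print(f"Small test: {level_of_confidence(4, 25, 3, 25):.0f}% Level of Confidence")      # ~32%

# Example 2: 10,000,000 visitors split between control and treatment
print(f"Huge test: {level_of_confidence(842_957, 5_000_000, 11, 5_000_000):.1f}% Level of Confidence")  # ~100%
```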

So is Estimated Minimum Relative Difference our desired/target lift if our test performs as expected?

Once you understand how statistical significance works (as I described in the previous question), the next natural question is – well, how does this affect my business decisions?

The first answer is, this understanding will help you run marketing experiments that are more likely to predict your potential customers’ real-world behavior.

But the second answer is – this should impact how you plan and run tests.

This question refers to the Estimated Minimum Relative Difference in the Simplified Test Protocol that SuperFunnel Cohort members receive, specifically in the test planning section that helps you forecast how long to run a test to reach statistical significance. And yes, the Estimated Minimum Relative Difference is the difference in conversion rate you expect between the control and treatment.

As discussed above, the larger this number is, the fewer samples – and the less time to collect those samples – it takes to run a test.
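
Here is a back-of-the-envelope sketch (my own rough formula, not the Simplified Test Protocol) of that relationship: the smaller the minimum relative difference you hope to detect, the more visitors each treatment needs. The 5% baseline conversion rate, 95% confidence and 80% power are assumptions chosen purely for illustration.

```python
# Approximate visitors needed per treatment to detect a given relative lift,
# using the standard two-proportion sample-size formula.
from statistics import NormalDist

def visitors_per_arm(baseline_rate: float, relative_lift: float,
                     confidence: float = 0.95, power: float = 0.80) -> int:
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return round((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical 5% baseline conversion rate
for lift in (0.05, 0.10, 0.25, 0.50):
    print(f"{lift:.0%} minimum relative difference -> ~{visitors_per_arm(0.05, lift):,} visitors per treatment")
```

Notice the trade-off described above: at these settings, detecting a 5% relative lift takes on the order of 120,000 visitors per treatment, while a 50% “big swing” needs roughly 1,500.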

Which means that companies with a lot of traffic can run tests that reach statistical significance even if they make very small changes. For example, let’s say you’re running a test on the homepage of a major brand, like Google or YouTube, which get billions of visits per month. Even a very small change like button color may be able to reach statistical significance.

But if you have lower traffic and a smaller budget, you likely need to take a bigger swing with your test to find a big enough difference. That does not necessarily require major dev work. For example, the headlines “Free courtside March Madness tickets, no credit card required” and “$12,000 upper level March Madness tickets, $400 application fee to see if you qualify” are very quick changes to make on a landing page. However, they are major changes in the mind of a potential customer and will likely produce very different results.

Which brings us to risk. When you run valid experiments, you decrease the risk in general. Instead of just making a change and hoping for the best, only part of your potential customer base sees the change. So if your change actually leads to a decrease, you learn before shifting your entire business. And you know what caused the decrease in results because you have isolated all the other variables.

But your experiments will never guarantee an outcome. They will only tell you how likely it is that there will be a difference when you roll out that change to all your customers for a longer period. So if you take that big swing you’ve always wanted to take and the results aren’t what you expect, the test may rein your team in before a major fail.

As we say in Quick Guide to Online Testing: 10 tactics to start or expand your testing process, “If a treatment has a significant increase over the control, it may be worth the risk for the possibility of high reward. However, if the relative difference between treatments is small and the LoC is low, you may decide you are not willing to take that risk.”

With a test running past 4 weeks, how concerned are you about audience contamination between the variants?

Up until now we’ve been talking about a validity threat called sampling distortion effect – failure to collect a sufficient sample size. As discussed, this could mean your marketing experiment results are due to random variability, and not a true difference between how your customers will react to your treatments when rolled out to your entire customer set.

But there are other validity threats as well. A validity threat simply means that a factor other than the change you made – say, different headlines or different CTAs – was the reason for the difference in performance you saw. You are necessarily testing with a small slice of your total addressable market, and you want to ensure that the results have a high probability of replicability – you will see an improvement when you roll out this change to all of your potential customers.

Other validity threats include the instrumentation effect – your measurement instrument affecting the results – and the selection effect – the mix of customers seeing the treatments does not represent the customers you will ultimately try to sell to, or, in this case, the same customer sees multiple treatments.

These are the types of validity threats this questioner is referring to. However, I think there is a fairly low (but not zero) chance of these validity threats arising simply from running the test a little past four weeks. While we saw this problem many years ago, most major platforms have gotten pretty good at assigning a visitor to a specific treatment and keeping them there on repeat visits.

That said, people can visit on multiple devices, so the split certainly isn’t perfect. And if your offer is something that calls for many repeat visits, especially from multiple devices (like at home and at work), this may become a bigger validity threat. If this is a concern, I suggest you ask your testing software provider how they mitigate against these validity threats.
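
For the curious, here is a simplified sketch (my own illustration – ask your provider how they actually handle it) of the deterministic bucketing idea most platforms rely on to keep a returning visitor in the same treatment, and why a second device with a different ID can still land somewhere else.

```python
# Hash a stable visitor ID into a treatment bucket so repeat visits see the
# same experience. The visitor IDs and treatment names are hypothetical.
import hashlib

TREATMENTS = ["control", "treatment_a", "treatment_b"]

def assign_treatment(visitor_id: str, experiment: str) -> str:
    """Deterministically map a visitor to a treatment, stable across visits."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(TREATMENTS)
    return TREATMENTS[bucket]

# The same visitor always lands in the same treatment for a given experiment...
print(assign_treatment("visitor-123", "headline-test"))
print(assign_treatment("visitor-123", "headline-test"))
# ...but the same person on a work laptop gets a different ID, and possibly a
# different treatment - the cross-device contamination risk described above.
print(assign_treatment("visitor-123-work-laptop", "headline-test"))
```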

However, when I see your question, the validity threat I would worry about most is the history effect – an extraneous variable that occurs with the passage of time. And this one is all on you, friend; there is not much your testing software can do to mitigate against it.

As I said, you are trying to isolate your test so the only variables that affect the outcome are the ones you’ve purposefully changed and are intending to test based on your hypothesis. The longer a test runs, the harder this gets. For example, you (or someone else in your organization) may choose to run a promotion during that period. Maybe you can keep a tight lid on promotions for a seven-day test, but can you keep the promotion wolves at bay in your organization for a full two months?

Or you may work at an ecommerce company looking to get some customer wisdom to impact your holiday sales. If you have to test for two months before rolling anything out, you may test in September and October. However, customers may behave very differently earlier in the year than they would in December, when their motivation to purchase a gift near a looming deadline is a much bigger factor.

While a long test makes a history effect more likely, it can occur even during a shorter test. In fact, our most well-known history effect case study occurred during a seven-day experiment because of the NBC television program Dateline. You can read about it (along with info about other validity threats) in the classic MarketingExperiments article Optimization Testing Tested: Validity Threats Beyond Sample Size.

Join us for a Wednesday LiveClass

As I mentioned, these questions came from the chat of recent LiveClasses. You can RSVP now to join us for an upcoming LiveClass. Here are some short videos to give you an idea of what you can learn from a LiveClass…

“If there’s not a strong enough difference in these two Google ads…the difference isn’t going to be stark enough to probably produce a meaningful set of statistics [for a marketing test]…” – Flint McGlaughlin in this 27-second video.

“…but that’s what Daniel was really touching on a moment ago. OK, you’ve got a [marketing] test, you’ve got a hypothesis, but is this really where you want to invest your money? Is this really going to get the most dollars or the most impact for the energy you invest?…” – Flint McGlaughlin, from this 46-second video about finding the most important hypotheses to test.

How far do you have to go with your marketing to take potential customers from the problem they think they have to the problem they do have? I discuss this topic while coaching the co-founders of an eyebrow beauty salon training company on their marketing test hypothesis in this 54-second video.

Marketing 101: What is an A/B split test?

February 2nd, 2018

Marketing has a language all its own. This is our latest in a series of posts aimed at helping new marketers learn that language. What term do you find yourself explaining most often to new hires during onboarding? Let us know.

An A/B split test refers to a test situation in which two randomized groups of users are sent different content at the same time to monitor the performance of specific campaign elements.

A/B split testing is a powerful way to improve marketing and messaging performance because it enables you to make decisions about the best headline, ad copy, landing page design, offer, etc., based on actual customer behavior and not merely a marketer’s opinion.

 

Let’s break down the process of A/B split testing.

Real People Enter the Test

This is part of the power of A/B split testing as compared to other forms of marketing research such as focus groups or surveys. A/B split testing is conducted with real people in a real-world purchase situation making real decisions, as opposed to a survey or focus group where you’re asking people who (hopefully) represent your customers what they might do in a hypothetical situation, or to remember what they have done in a past situation.

Not only can you inadvertently influence people in ways that change their answer (since the research gathering mechanism does not exactly mimic the real-world situation), but people may simply tell you what they think you want to hear.

Or, many times, customers misjudge how they would act in a situation or misremember how they have acted in the past.

That doesn’t mean you shouldn’t use surveys, focus groups and the like. Use this new information to create a hypothesis about your customers. And then run an A/B split test to learn from real customers if your hypothesis is correct.
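
As a bare-bones sketch of that definition (my own illustration, with made-up conversion rates): randomly split arriving visitors between two versions shown at the same time, record who converts, and compare the observed rates.

```python
# Randomized 50/50 split of incoming visitors between two versions running
# at the same time, then a comparison of observed conversion rates.
import random

random.seed(42)

results = {"A": {"visitors": 0, "conversions": 0},
           "B": {"visitors": 0, "conversions": 0}}
TRUE_RATES = {"A": 0.08, "B": 0.10}   # hypothetical "real" behavior per version

for _ in range(5_000):                      # visitors arriving during the test
    version = random.choice(["A", "B"])     # random assignment is the "split"
    results[version]["visitors"] += 1
    if random.random() < TRUE_RATES[version]:
        results[version]["conversions"] += 1

for version, r in results.items():
    print(f"Version {version}: {r['conversions']}/{r['visitors']} "
          f"= {r['conversions'] / r['visitors']:.1%} conversion rate")
```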

Read more…

Live from MarketingSherpa Summit 2017: Jeff Ma on harnessing the power of analytics to better understand customers

April 12th, 2017

As a member of the famous MIT Blackjack Team and the inspiration for the main character in the book Bringing Down the House and the Kevin Spacey film 21, Jeff Ma knows a thing or two about gambling.

Scratch that — Jeff Ma isn’t a gambler. That’s because every move in blackjack has one correct decision. It’s just about understanding basic strategy, and implementing it. Remove human instincts, or “gut feelings,” and you will stack the odds in your favor.

Currently the senior director of analytics at Twitter (after selling his startup to the social network) and a former predictive analytics expert for ESPN, Jeff spoke to the MarketingSherpa Summit audience about how to use data and analytics to come out on top with customers.


By using data to overcome emotional biases, Ma said, not only can marketers win big with customers, but they’ll also build influence within their organizations.

Learning to make better decisions

It all begins with increasing your odds by using basic strategy.

“A lot of people don’t use basic strategy, which is why we’re so bad at making decisions as a people,” Jeff said. “Decisions are best when you have data behind them.”

One common mistake people fall prey to is omission bias. Basically, people don’t want to be perceived as the agent for harm to themselves — or their company. As Jeff put it, people would rather make a decision with a lower chance of success if the “dealer” or “fate” beats them, rather than going with a higher chance of success that, if it fails, will mean they’ve made a “bad” decision.

Or to put it in Vegas terms: big risk, big reward.

There are no bad decisions — only ones informed by data

Read more…

How to Be Ready for the Future of Marketing in 3 Steps

May 3rd, 2016

Editor’s Note: This interview was edited for length and grammar only.

Marketers by the very nature of what they do are constantly trying to predict what’s going to happen next. That could include answering questions like: What’s our next big campaign? How will this new channel perform at generating leads? Will this strategy work?

But marketers seldom — if at all — get to sit back to wonder about or predict the broader future of marketing.

In my role as chief evangelist, I often get to talk to influencers about what they’re seeing in the marketing community. When I read about Nick Johnson, Brand Director, Incite Group, the research he did to understand the future of marketing, and the book he later wrote about it, I wanted to talk to him about what he learned and how marketers can get ready for the future of marketing.

 

Brian Carroll: What inspired you to research and write about the Future of Marketing?

Nick Johnson: A variety of things, really. I’ve been fortunate enough to be in a position to speak with senior marketing executives on a daily basis for five years now in my position running Incite.

I spend a lot of my time doing research, working out what priorities, challenges and shifting opportunities there are for marketers — which go into the white papers and reports we put together. It became apparent there was an unprecedented level of turbulence in the space. The changes in marketing were happening at an unprecedented pace, and the shifts in the marketer’s role and their ability to influence the fortunes of their company were absolutely enormous. I remember speaking to several marketers who have been in their positions for decades, and they said things like, “I used to know what I was doing and now it’s all changed.”

  Read more…

Ecommerce: Building online trust before customers click over to your competitors’ sites

December 23rd, 2014

All marketing is built on trust. Without trust, customers won’t subscribe to your email. They won’t open. They won’t click. And they certainly won’t buy.

Keeping this in mind, I interviewed Craig Spiezle, Executive Director and President, Online Trust Alliance, about security, privacy and consumer protection. I’ve also provided tips on how you can build trust with your customers.

 

“Privacy policies were written by attorneys, for attorneys,” Craig joked. “And you need three attorneys to figure them out. It’s a great job enhancement thing for the legal profession. It does nothing for consumers.”

Read more…

How a Single Source of Data Truth Can Improve Business Decisions

September 12th, 2014

One of the great things about writing MarketingSherpa case studies is having the opportunity to interview your marketing peers who are doing, well, just cool stuff. Being able to highlight challenges that can help readers improve their marketing efforts is a big perk as well.

A frustrating part of the process is that during our interviews, we get a lot of incredible insights that end up on the cutting room floor in order to craft our case studies. Luckily for us, some days we can share those insights that didn’t survive the case study edit right here in the MarketingSherpa Blog.

Today is one of those times.

 

Setting the stage

A recent MarketingSherpa Email Marketing Newsletter article — Marketing Analytics: How a drip email campaign transformed National Instruments’ data management — detailed a marketing analytics challenge at National Instruments, a global B2B company with a customer base of 30,000 companies in 91 countries.

The data challenge grew out of a drip email campaign centered on National Instruments’ signature product, after the reported conversion rate dropped at each stage – from the beta test, to the global rollout, and finally to the results calculated by a new analyst.

The drip email campaign tested several of National Instruments’ key markets, and after the beta test was completed, the program was rolled out globally.

The data issues that came up when the team looked into the conversion metrics were:

  • The beta test converted at 8%
  • The global rollout was at 5%
  • The new analyst calculated the conversion rate to be 2%, after parsing the data set without any documentation as to how the 5% figure had been calculated

Read the entire case study to find out how the team reacted to that marketing challenge to improve its entire data management process.

Read more…

Customer Relevance: 3 golden rules for cookie-based Web segmentation

September 13th, 2013

Over the years, the Internet has become more adaptive to the things we want.

It often seems as if sites are directly talking to us and can almost predict the things we are searching for, and in some ways, they are.

Once you visit a website, you may get a cookie saved within your browser that stores information about your interactions with that site. Websites use this cookie to remember who you are. You can use this same data to segment visitors on your own website and present them with a tailored Web experience.

Much like a salesman with some background on a client, webpages are able to make their “pitch” to visitors by referencing information they already know about them to encourage clickthrough and, ultimately, conversion.

Webpages get this information from cookies and then use a segmentation or targeting platform to give visitors tailored Web experiences.

Cookies can also be used to provide visitors with tailored ads, but in today’s MarketingSherpa Blog post, we will concentrate on your website, and how segmentation can be used on your pages to provide more relevant information to your potential customers.
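
As a toy sketch of those mechanics (my own illustration – the cookie name, segment values and copy are all hypothetical, and a real segmentation platform would derive segments from much richer behavioral data): read the segment cookie if it exists, set one if it doesn’t, and tailor the page copy accordingly.

```python
# Minimal cookie-based tailoring with Flask: remember the visitor, then vary
# the headline on later visits based on the stored segment.
from flask import Flask, request, make_response

app = Flask(__name__)

HEADLINES = {
    "returning_visitor": "Welcome back - pick up your comparison where you left off",
    "new_visitor": "See how our customers cut reporting time in half",
}

@app.route("/")
def landing_page():
    segment = request.cookies.get("visitor_segment")          # None on the first visit
    headline = HEADLINES.get(segment, HEADLINES["new_visitor"])
    response = make_response(f"<h1>{headline}</h1>")
    if segment is None:
        # First visit: store a segment so the next page load can tailor its message
        response.set_cookie("visitor_segment", "returning_visitor", max_age=60 * 60 * 24 * 30)
    return response

if __name__ == "__main__":
    app.run(debug=True)
```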

 

Test your way into cookie-based segmentation

At MECLABS, we explore cookie-based segmentation the only way that makes sense to us – by testing it.

It’s fairly easy to identify the different variables you would want to segment visitors by, but how to accurately talk to them should be researched. It’s also easy to become distracted by the possibilities of the technology, but in reality, the basic principles of segmentation still apply, as well as the following general rules.

 

Rule #1. Remember you are segmenting the computer, not the person

There are more opportunities for error when segmenting online because multiple people may use the same computer.

Therefore, online segmentation has some mystery to it. You can tailor your message to best fit the cookies, but that may not accurately represent the needs of the specific person sitting in front of the computer at that time.

Many segmentation platforms boast a 60% to 80% confidence level when it comes to how accurately they can segment visitors, but I think a better way to position this information is that there is a 20% to 40% margin of error.

That is pretty high!

Be cautious with how you segment. Make sure the different experiences you display are not too different and do not create discomfort for the visitor.

For visitors who do not share a computer, error can still be high. They may be cookied for things that do not accurately describe them.

I bet if you looked at your browser history, it may not be the most precise representation of who you are as a person. Therefore, don’t take cookie data as fact because it most likely isn’t. It should be used as a tool in your overall segmentation strategy and not serve as your primary resource for information about your customers.

 

Rule #2. Be helpful, not creepy

People are getting used to the Internet making suggestions and presenting only relevant information to them.

Some have even come to expect this sort of interaction with their favorite sites. However, there is a fine line between helpful and creepy. Visitors probably don’t want to feel like they are being watched or tracked. Marketers should use the data collected about their visitors in a way that does not surpass their conscious threshold for being tracked.

For example, providing location-specific information to visitors in a certain region is alright, but providing too much known information about those visitors may not be.

Cookies can tell you income level, demographic information, shopping preferences and so much more. Combining too much known information could feel overwhelming to the visitor, and rather than speaking directly to them, you risk scaring them off.

Instead of making it blatantly obvious to visitors you have collected information on them, I would suggest an approach that supplies users with relevant information that meets their needs.

Read more…

Marketing Process: Managing your business leader’s testing expectations

June 25th, 2013

Every Research Partner wants a lift, but we know that sometimes those lifts aren’t achievable without learning more about their customers first.

And often, our biggest lifts are associated with radical redesign tests that really shake things up on a landing page. That is because the changes are more drastic than those in a single-factor A/B test, which allows for pinpointing discoveries.

So, how can you strike a balance between using these two approaches while still delivering results that satisfy expectations?

You can achieve this by managing your client’s or business leader’s expectations effectively.

It’s easier said than done, but there are a few things you can do to satisfy a client’s or business leader’s needs for lifts and learnings.

 

Step #1. Start with radical changes that challenge the paradigm

At MECLABS, we often recommend a strategic testing cycle with radical redesign testing (multiple clusters as opposed to a single-factor A/B split) to identify any untapped potential that may exist on a Research Partner’s landing page.

However, you must make sure you are not making random changes to a page just to achieve a radically different control and treatment. Instead, test with a hypothesis that truly challenges the paradigms and assumptions currently at work on the control page.

For example, Sierra Tucson, an addiction and mental health rehabilitation facility, found that a radical redesign – from a landing page focused on luxury to one focused on trust – resonated better with its target audience. The company also generated 220% more leads from the test.

 

Step #2. Zoom in on general areas your radical redesign test has identified as having a high potential for impacting conversion

Next, we suggest refining with variable cluster testing, also known as select clusters.

If you identify a radical shift in messaging to be effective, as Sierra Tucson did, you might next want to try different copy, different designs or different offers, just to name a few options.

Read more…

Testing: 3 common barriers to test planning

June 14th, 2013

Sometimes while working with our Research Partners, I hear interesting explanations on why they can’t move forward with testing a particular strategy.

And as you would expect, there are a few common explanations I encounter more often than others:

  • “We’ve always done it like this.”
  • “Our customers are not complaining, so why change?”

And my personal favorite…

  • “We already tested that a few years ago and it didn’t work.”

While there are some very legitimate barriers to testing that arise during planning (testing budgets, site traffic and ROI), the most common explanations of “We can’t do that” I hear rarely outweigh the potential revenue being left on the table – at least not from this testing strategist’s point of view.

So in today’s MarketingSherpa blog post, we will share three of the most common barriers to testing and why your marketing team should avoid them.

 

The legacy barrier – “We’ve always done it like this.”

Legacy barriers to testing are decisions derived from comfort.

But what guarantee does anyone ever have that learning more about your customers is going to be a comfortable experience? So, when I receive a swift refusal to test based on “We’ve always done it like this,” I propose an important question – what created the legacy in your organization in the first place?

Generally, many companies understandably create business constraints and initiatives around what is acceptable for the market at a given point in time.

But what happens far too often is that these constraints and initiatives turn into habits. Habits that are passed on from marketer to marketer, until the chain of succession gives way to a forgotten lore of why a particular practice was put in place.

This ultimately results in a business climate in which the needs of yesteryear continue to take priority over the needs you have today.

So, if you find yourself facing a legacy barrier, below are a few resources from our sister company MarketingExperiments to help you achieve the buy-in you need to challenge the status quo:

What to test (and how) to increase your ROI today

Value Proposition: A free worksheet to help you win arguments in any meeting

 

The false confidence barrier – “Our customers are not complaining, so why change?”

The false confidence barrier is built on the belief that if it isn’t broken, don’t fix it – or at least, it isn’t broken as far as you’re aware.

This is especially important if your organization is determined to use customer experience in the digital age as the metric of success when evaluating a website’s performance – and this happens more than you would think.

So, consider for a moment that a hypothetical customer is having an unpleasant experience on your website, and ask yourself…

What obligation does a customer have to complain about their experience to you?

My recommendation in this case is to never assume customer silence is customer acceptance.

Instead, take a deeper look at your sales funnel for opportunities to mitigate elements of friction and anxiety that may steer customers away from your objectives, rather than towards them.

Read more…

Test Planning: Create a universal test planner in 3 simple steps

May 2nd, 2013

One of my responsibilities as a Research Analyst is to manage ongoing test planning with our Research Partners, and at times, keeping tests running smoothly can be a challenge.

This is especially true when you consider testing is not a static event – it’s more like a living, breathing continuous cycle of motion.

But even with so many moving parts, effectively managing test plans can be made a little easier with two proven key factors for success – planning and preparation.

Today’s MarketingSherpa Blog post shares three tips for test planning management. Our goal is to give marketers a few simple best practices to help keep their testing queue in good order.

 

Step #1. Create

Creating a universal test planner everyone on your team can access is a great place to start.

For our research team, we created a universal test planner (see the sketch after this list) that includes:

  • Results from prior testing with our Research Partner
  • Current active tests
  • Any future testing planned
  • A list of test status definitions that everyone on the team understands (test active, test complete, inconclusive, etc.)
  • A brief description of what is being tested (call-to-action button test, value copy test, etc.)
  • A list of who is responsible for each task in the test plan
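
Here is a minimal sketch of those fields (my own illustration, not a MECLABS template) as a structure that a shared spreadsheet or simple script could mirror; the test names, statuses and owners are hypothetical.

```python
# One row of a universal test planner, with the fields from the list above.
from dataclasses import dataclass
from enum import Enum

class TestStatus(Enum):
    PLANNED = "planned"
    ACTIVE = "test active"
    COMPLETE = "test complete"
    INCONCLUSIVE = "inconclusive"

@dataclass
class PlannedTest:
    name: str                 # e.g., "call-to-action button test"
    description: str          # brief description of what is being tested
    status: TestStatus
    owner: str                # who is responsible for each task
    prior_results: str = ""   # results from prior testing with the Research Partner

planner = [
    PlannedTest("Value copy test", "Lead the headline with the value proposition", TestStatus.ACTIVE, "Analyst A"),
    PlannedTest("CTA button test", "'Get my quote' vs. 'Submit'", TestStatus.PLANNED, "Analyst B"),
]

for test in planner:
    print(f"[{test.status.value}] {test.name} - owner: {test.owner}")
```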

 

Step #2. Organize

As I mentioned in the previous step, the status of a test can change and, based on the results, so will the ideas and priorities for future testing.

Some tests will move forward in the queue, and others will be pushed back to a later time.

So, to help keep our team informed of changes in the testing environment, we update the planner throughout the day and in real time during brainstorming sessions based on results and Partner feedback.

This allows us to focus our research and testing strategy efforts on expanding on discoveries versus chasing our tails to keep up-to-date.

Read more…