Posts tagged: standard deviation

Wanna be better with metrics? Watch more poker and less baseball.

Both baseball and poker have been televising their World Series championships, and announcers for both frequently describe strategies and tactics based on the statistics of the games. Poker announcers base their commentary and discussion on the probabilities associated with a small number of key metrics, while baseball announcers barrage us with numbers that sound meaningful but that are often pure nonsense.

Similarly, today’s web analytics give us the capability to track and report data on just about anything, but just because we can generate a number doesn’t mean that number is meaningful to our business. In fact, reading meaning into meaningless numbers can cause us to make very bad decisions.

Don’t get me wrong, I am a huge believer in making data-based decisions, in baseball, poker, and on our websites. But making good decisions is heavily dependent on using the right data and seeing the data in the right light. I sometimes worry that constant exposure to sports announcers’ misreading and misappropriation of numbers is actually contributing to a misreading and misunderstanding of numbers in our business settings.

Let’s consider a couple of examples of misreading and misappropriating numbers that have occurred in baseball over the last couple of weeks:

  1. Selection bias
    This one is incredibly common in the world of sports and nearly as common in business. Recently, headlines here in Detroit focused on the Tigers “choking” and blowing a seven-game lead with only 16 games to go. In a recent email exchange on this topic, my friend Chris Eagle pointed out the problems with the sports announcers’ hyperbole:

    “They’re picking the high-water mark for the Tigers in order to make their statement look good.  If you pick any other random time frame (say end-of-August, which I selected simply because it’s a logical break point), the Tigers were up 3.5 games.  But it doesn’t look like much of a choke if you say the Tigers lost a 3.5 game lead with a month and change to go.”

    Unfortunately, this type of analysis error occurs far too often in business. We might find that our weekend promotions have driven huge sales over the last six months, which sounds really impressive until we notice that sales on non-promotion days have dropped significantly. We haven’t grown the business; we’ve simply shifted it to the days when we run promotions (which may ultimately mean we’ve reduced our overall margins by selling more discounted product and less full-price merchandise).

    In a different way, Dennis Mortensen addressed the topic in his excellent blog post “The Recency Bias in Web Analytics,” where he points out the tendency to give undue weight to more recent numbers. He included a strong example about the problems of dashboards that lack context. Dashboards with gauges look really cool but are potentially dangerous as they are only showing metrics from a very short period of time. Which leads me to…

  2. Inconsistency of averages over short terms
    Baseball announcers and reporters can’t get enough of this one. Consider this article on the Phillies’ Ryan Howard after Game 3 of the World Series, which includes: “Ryan Howard’s home run trot has been replaced by a trudge back to the dugout. The Phillies’ big bopper has gone down swinging more than he’s gone deep…He’s still 13 for 44 overall in the postseason (.295) but only 2 for 13 (.154) in the World Series.” Actually, over the course of the season he had three times as many strikeouts as home runs, so his trudges back to the dugout are pretty normal. And the problem with the World Series batting average is sample size: thirteen at bats is simply too small a sample to measure against his season-long average of .279. Do different pitchers or the pressures of the situation have an effect? Maybe, but there’s nothing in the data to support such a conclusion. Segmenting by pitcher or by “postseason” suffers from the same small-sample problem, where the margin of error expands dramatically (see the sketch after this list). Furthermore, and this is really key, reporting an average without the variability of the underlying data is incomplete and often misleading.

    These problems with variability and sample size arise frequently in retail analysis, when we either run a test with too small a sample and assume we can project the results to the rest of the business, or run a properly sized test but assume we’ll automatically see those same results on the first day of a full rollout of the promotion. The latter is essentially what is happening with Ryan Howard in the postseason. We hear the former, too, whenever a player is suddenly crowned a star for outperforming his season averages over a few postseason games.

    In retail, we frequently see this type of issue when we’re comparing something like the average order value of two different promotions or of two variations in an A/B test. Say we’ve run an A/B test of two promotions. Over 3,100 orders in Test A, we have an average order size of $31.68, and over 3,000 orders in Test B, we have an average order size of $32.15. So Test B is the clear winner, right? Wrong. It turns out there is a lot more variability in Test B, which has a standard deviation of 11.37 compared with Test A’s standard deviation of 7.29. As a result, the margin of error on the comparison expands to +/- 48 cents, which is larger than the 47-cent difference between the averages. At 95% confidence, we can’t conclude there’s any real difference between the tests, so it would be a mistake to project an increase in transaction size if we went with Test B. (The sketch after this list shows the calculation.)

    Check out that example using this simple calculator created by my fine colleagues at ForeSee Results, and play around with your own scenarios: Download Test difference between two averages.
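
For those who want to check the math by hand, here’s a minimal sketch of the same kind of calculation in Python. I’m using the standard normal-approximation formulas at a 95% confidence level; the ForeSee calculator may use a somewhat different method, so treat this as illustrative:

```python
import math

Z_95 = 1.96  # two-sided critical value for 95% confidence

def moe_two_means(n_a, sd_a, n_b, sd_b, z=Z_95):
    """Margin of error for the difference between two independent means."""
    return z * math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)

def moe_proportion(successes, trials, z=Z_95):
    """Margin of error for a single proportion (e.g., a batting average)."""
    p = successes / trials
    return z * math.sqrt(p * (1 - p) / trials)

# The A/B test above: Test B "wins" on the raw averages...
diff = 32.15 - 31.68                          # $0.47
moe = moe_two_means(3100, 7.29, 3000, 11.37)  # +/- $0.48
print(f"difference ${diff:.2f}, margin of error +/- ${moe:.2f}")
# ...but the difference sits inside the margin of error: no real winner.

# Ryan Howard's World Series line: 2 for 13 (.154)
print(f"margin of error +/- {moe_proportion(2, 13):.3f}")  # +/- 0.196
# .154 +/- .196 easily contains his .279 season average.
```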

Poker announcers don’t seem to fall into all these statistical traps. Instead, they focus on a few key metrics like the number of outs and the size of the pot to discuss strategies for each player based largely on the probability of success in light of the risks and rewards of a particular tactic. Sure, there are intangibles like “poker tells” that occur, but even those are considered in light of the statistical probabilities of a particular situation.

Retail is certainly more complicated than poker, and the number of potential variables to deal with is immense. However, we can be much more prepared to deal with the complexities of our situations if we take a little more time to view our metrics in the right light. Our data-driven decisions can be far more accurate if we ensure we’re looking at the full data set, not a carefully selected subset, and we take the extra few minutes to understand the effects of variability on averages we report. A little extra critical thinking can go a long way.

What do you think? Are there better ways to analyze key metrics at your company? Do you consider variability in your analyses? Do you find the file to test two averages useful?




How are retail sales forecasts like baby due dates?

Q. How are retail sales forecasts like baby due dates?

A. They both provide an improper illusion of precision and cause considerable consternation when they’re missed.

Our first child was born perfectly healthy almost two weeks past her due date, but every day past that less-than-precise date was considerably more frustrating for my amazing and beautiful wife. Her misery was greater than anything most of us suffer in retail sales results meetings, but we nonetheless endure more misery than necessary when improperly specific forecast numbers create unrealistic expectations.

I believe there’s a way to continue to provide the planning value of a sales forecast (and baby due date) while reducing the consternation involved in the almost inevitable miss of the predictions generated today.

But first, let’s explore how sales forecasts are produced today.

In my experience, an analyst or team of analysts will pull a variety of data sources into a model used to generate the forecast. They’ll feed in sales for the same time period from at least the last several years; they’ll look at the current year’s sales trend to factor in the current environment; they’ll take some guidance from merchant planning; and they’ll mix in planned promotions for the time period, including the past performance of those same promotions. That description is probably oversimplified for most retailers, but the basic process is there.

Once all the data is in the mix, some degree of statistical analysis is run on it to generate a forecast of sales for the coming time period — let’s say it’s a week. Here’s where the problems start. The sales forecasts are specific numbers, maybe rounded to the nearest thousand. For example, the forecast for the week might be $38,478k. From that number, daily targets are parsed out according to the percentage of the week’s sales each day typically represents, and each day’s actual sales are measured against those daily forecasts.
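
To make that day-level split concrete, here’s a tiny sketch (the day-of-week weights are hypothetical; a real retailer would derive them from its own historical patterns):

```python
# Hypothetical share of weekly sales for each day; these weights are made up
# for illustration and would normally come from historical day-of-week data.
weights = {"Mon": 0.12, "Tue": 0.11, "Wed": 0.12, "Thu": 0.13,
           "Fri": 0.16, "Sat": 0.20, "Sun": 0.16}
weekly_forecast = 38478  # $k, the example forecast above

daily_targets = {day: round(weekly_forecast * w) for day, w in weights.items()}
print(daily_targets)  # {'Mon': 4617, 'Tue': 4233, 'Wed': 4617, ...}
```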

And let the consternation begin because the forecast almost never matches up to actual sales.

The laws of statistics are incredibly powerful — sometimes so powerful that we forget all the intricacies involved. We forget about confidence intervals, margins of error, standard deviations, proper sampling techniques, etc. The reality is we can use statistical methodologies to pretty accurately predict the probability we’ll get a certain range of sales for a coming week. We can use various modeling techniques and different mixes of data to potentially increase the probability and decrease the range, but we’ll still have a probability and a range.

I propose we stop forecasting specific amounts and start forecasting the probability we’ll achieve sales in a particular range.

Instead of projecting an unreliably specific amount like $38,478k, we would instead forecast a 70% probability that sales would fall between $37,708k and $39,243k. Looking at our businesses in this manner better reflects the reality that literally millions of variables have an effect on our sales each day, and random outliers at any given time can cause significant swings in results over small periods of time.
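
Here’s a minimal sketch of how that kind of range forecast could be produced. The ~$740k error standard deviation below is a hypothetical figure I chose to roughly reproduce the range above; in practice it would be estimated from the model’s own historical forecast misses:

```python
from statistics import NormalDist

def forecast_range(point_forecast, sigma, probability=0.70):
    """Turn a point forecast and an error std. dev. into a prediction range."""
    z = NormalDist().inv_cdf((1 + probability) / 2)  # two-sided interval
    return point_forecast - z * sigma, point_forecast + z * sigma

# Hypothetical inputs: weekly point forecast of $38,478k, error std. dev. ~$740k
low, high = forecast_range(38478, 740, probability=0.70)
print(f"70% chance sales land between ${low:,.0f}k and ${high:,.0f}k")
# -> roughly $37,711k to $39,245k, close to the range quoted above
```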

Of course, that doesn’t mean we won’t still need sales targets to achieve our sales plans. But if we don’t acknowledge the inherent uncertainty of our forecasts, we won’t truly understand the size of the risks associated with achieving plan. And we need to understand the risks in order to develop the right contingency and mitigation tactics. The National Weather Service, which uses similar methods of forecasting, explains the reasons for their methods as follows:

“These are guidelines based on weather model output data along with local forecasting experience in order to give persons [an idea] as to what the statistical chance of rain is so that people can be prepared and take whatever action may be required. For example, if someone pouring concrete was at a critical point of a job, a 40% chance of rain may be enough to have that person change their plans or at least be alerted to such an event. No guarantees, but forecasts are getting better.”

Imagine how the Monday conversation would change when reviewing last week’s sales if we had the probability-and-range forecast suggested above and actual sales came in at $37,805k. Instead of focusing on how we missed a phantom forecast figure by 1.7%, we could quickly acknowledge that sales came in as predicted and then focus on the tactics we employed above and beyond what was fed into the model that generated the forecast. Did those tactics generate additional sales or not? How did they affect our existing tactics? Do we need to make strategic changes, or should we accept that, even though our strategy can be affected by millions of variables in the short term, it’s still on track for the long term?

Expressing our forecasts in probabilities and ranges, whether we’re talking about sales, baby due dates or the weather, helps us get a better sense of the possibilities the future might hold and allows us to plan with our eyes wide open. And maybe, just maybe, those last couple weeks of pregnancy will be slightly less frustrating (and, believe me, every little bit helps).

What do you think? Would forecasts with probabilities and ranges enhance sales discussions at your company? Do sales forecasts work differently at your company?



Retail: Shaken Not Stirred by Kevin Ertell

