Click the link below to download a sample Excel spreadsheet I created that can develop sales forecast using a Monte Carlo simulation:
(If you haven’t already, see my post “Why most sales forecasts suck…and how Monte Carlo simulations can make them better” for some background.)
Here’s a breakdown on how it works:
The first sheet illustrates a simplified version of how retail sales forecasts are traditionally developed. Basically, it looks at a couple of historical data sets (sales for the same week the prior year and previous iterations of the promotion we plan to run this week) and develops a forecast as follows:
- Start with base sales from the same time period last year
- Apply the current sales trend (which will be determined in this case by average of the previous 10 week comps). This method may vary from retailer to retailer, but this is the general principle.
- Add the average of the incremental results of previous iterations of the promotion
- Voilà! This is the sales forecast.
Of course, this number is impossibly precise and the analysts who generate it usually know that. However, those on the receiving end tend to assume it is absolutely accurate and the probability of hitting the forecast is close to 100%.
Which is pretty much impossible.
It would be far more accurate to forecast a range of sales with accompanying probabilities. And that’s where the Monte Carlo simulation comes in.
The second sheet in the file deals with the Base Business.
On left side of the sheet are the inputs to the Monte Carlo simulation. Fields highlighted in yellow can be changed to modify the structure of the model. (NOTE: Please don’t change fields not highlighted in yellow as these fields all contain calculations.)
In this simple model, there are three input variables: Traffic, Conversion and Average Order Size. You’ll want to estimate the values and the probabilities of those values for each variable based on what you know about your business.
For example, we might look at five iterations of a particular promotion and determine that we got about 75,000 new visitors three time, 100,000 visitors once and 50,000 visitors once. Traditionally, we would average those and assume we’ll always get 75,000 new visitors. But those variances in this variable combined with variances in other variables, such as average ticket and conversion rate, can lead to wildly different sales. Instead, we can say we have a 60% probability of getting 75,000 new visitors, a 20% probability of 50,000 new visitors and a 20% probability of 100,000 new visitors. We’ll do something similar for average ticket and conversion rate.
Important: make sure your probabilities add up to 100%.
Once you get familiar with how the model works, feel free to add more input variables as suits your business. Email me if you have questions about how to do this.
As you enter your input variable probabilities, you’ll be able to see the results in the charts and the columns to the right.
The pie chart is a simple binary chart to quickly show the overall probability that this week’s base business will be positive or negative compared to the same week’s base business the prior year. In this case, it looks like there’s about a 73% chance our base business will comp negatively to last year (which is probably really an indication that we hit on the high end of the scale last year).
While pie chart is useful as a quick look, it’s not really enough to base a decision on. The line charts show the probabilities at various comp percentage increments and dollar increments to give us a better sense of where the base business will fall, with about a 35% chance of being around 2% down and another 13% chance of being down 10%. The columns on the right under the Probabilities heading give all the breakdowns.
You’ll note there are yellow highlighted fields in the output section. Those fields won’t change the model, but changing them allows for showing probabilities in different ranges as might be appropriate for your business.
At the top right, you’ll see a section entitled “Probability Weighted Expected Value.” This is a single number that is the sum of possible sales numbers multiplied by their respective probabilities. It should be used with caution. It can be useful for comparing different options and for trending of multiple periods, but it should not be used as a specific forecast for a specific time period.
The actual Monte Carlo simulation, with its 10,000 iterations, can be seen below the charts. See below for details on how it works, if you’re interested.
Now that we’ve determined probabilities for where our base business might fall, we can layer in a promotion we might run to increase our odds of growing sales year over year.
The third sheet in the spreadsheet, entitled “Incremental Promo” uses a second Monte Carlo simulation and set up inputs to determine our complete forecast.
This sheet works pretty much like the Base Business sheet, but now we’re feeding a simulation that incorporates the variables for the base business and anything we input here that is generated incrementally from the promotion.
Because we’ve likely only run any given promotion a relatively small number of times previously, we have to be really careful about the estimates we make as a small sample size can yield unpredictable results. We should allow for the fact that the promotion could actually cause at least some of our variables to decrement rather than increment. (I read that one of the problems that occurred in our recent economic meltdown was that many analysts built models that had no allowances for housing prices to go down. Let’s not make the same mistake with our models.) You’ll see in the example data that I’ve allowed for the possibility of both conversion and average order size to be lower than any possibility in the Base Business worksheet.
All of the inputs are now combined with the base business variables to create a new simulation, which can be seen on its own worksheet called “Simulation.”
In the pie chart, we can see that the incremental promotion will increase our probability of generating a positive comp for the week, but we’re still slightly more likely to have a negative comp. Basically, though, it’s a coin toss. We can see with the line charts the various probabilities at different increments of where we’re likely to end up based on what we know. And now that we have a better understanding of our risks and potential rewards, we can make a better decision.
How the model works
The model relies on the RAND() and VLOOKUP functions of Excel. The RAND() function randomly generates a decimal between 0 and 1. We feed that random number into the VLOOKUP function, which references the appropriate “Engine Bins” to determine the variable value it uses. VLOOKUP references the first column of a range (range: Two or more cells on a sheet. The cells in a range can be adjacent or nonadjacent.) of cells (Engine Bins, in our example), and then returns the value from the value column on that row. If an exact match is not found in the Engine Bin, the next largest value that is less than random number lookup_value is returned.
So, in our Base Business example, if the RAND() function returns 0.62467547 for one of the traffic iteration variables the VLOOKUP function will reference the Engine Bins for Traffic, see it doesn’t have an exact match and then reference the Engine Bin with the next largest value that is less than 0.62467547. In this case, it chooses the Engine Bin “0” and returns a traffic variable of 300,000. Because the Engine Bin ranges are based on our input probabilities, we will select the 300,000 about 70% of the time, 275,000 about 15% of the time and 325,000 about 15% of the time. You can see how this plays out in the simulation.
In this particular model, we then determine sales by multiplying traffic by conversion rate by average ticket.
The simulation then runs the calculations 10,000 times in order to give us a very statistically valid sample size. There’s no real magic to 10,000, but it’s a large enough data set to give us a reliable result without being so large that it takes a long time to calculate.
We use this data set to generate our probabilities.
- The mode is definitely dependent on the quality of the inputs, but those are generally easier to estimate than the outputs
- Using the RAND() function will cause the entire spreadsheet to recalculate each time you load the spreadsheet, enter a new value or hit F9. This means all the probability outputs will change slightly each time. But they won’t change much. This really illustrates how we can tame the uncertainty of the future with ranges and probabilities, but it also shows how impossible it is to be extremely precise. All that said, I wanted the model to work for you as you enter new values, so I left the RAND() functions in. In actual practice, you might want to cut and paste the values after into another sheet after you’ve run the calculation just so you don’t have to deal with slightly fluid numbers.
- It could take a few iterations of learning to get the inputs right
- Running Monte Carlo forecasts in parallel with the current method can help to refine the inputs