Business Experimentation

Imagine for a moment that you want to implement a new sales initiative that you think will transform your business. The problem is, you’re not too sure if it’d work.

You decide, prudently, that maybe a pilot test would be good: let’s roll out the initiative to just a small subset of the company, the pilot group, and see how it performs.

If it performs well, great, we roll it out to the rest of the company. If it performs badly, no drama – we simply stop the initiative at the pilot stage and don’t roll it out to the rest of the company. The cost of the pilot would be negligible compared to the full implementation.

After consulting with your team, you decide that your pilot group would be based on geography. You pick a region you know well, with relatively homogeneous customers who are extremely receptive to your idea.

You bring your idea to your boss, who likes it and agrees to be the project sponsor. However, he tells you in no uncertain terms that in order for the initiative to go beyond a pilot, you need to show conclusively that it has a positive sales impact. You have no doubt it has, and you readily agree, “of course!”

Knowing that measurement is a little outside your area of expertise, you consult your resident data scientist on the best way to “show conclusively” that your idea works. He advises you that the best way to do that would be to run an A/B test.

“Split the customers in your pilot group, the region you’ve picked, randomly into two,” your data scientist says. “Let one group be the ‘control’ group, on which you do nothing, and the other be the ‘test’ group, on which you roll out the initiative. If your test group performs statistically better than the control group — I’ll advise you later on how to do that — you know you’ve got a winning initiative on your hands.”
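In code, the kind of split and comparison the data scientist is describing might look something like the minimal Python sketch below. The customer count and sales figures are made up, and a plain Welch’s t-test stands in for whatever significance test he would actually recommend:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_customers = 1_000  # customers in the pilot region (made-up number)

# Randomly split the pilot region's customers into control and test.
assignment = rng.random(n_customers) < 0.5  # True -> test, False -> control

# Placeholder post-pilot sales per customer; in reality these would come
# from your sales system after the pilot period has run.
sales = rng.gamma(shape=2.0, scale=500.0, size=n_customers)
test_sales = sales[assignment]
control_sales = sales[~assignment]

# Welch's two-sample t-test: is the difference in average sales per
# customer larger than chance alone would plausibly produce?
t_stat, p_value = stats.ttest_ind(test_sales, control_sales, equal_var=False)

print(f"test mean:    {test_sales.mean():,.0f}")
print(f"control mean: {control_sales.mean():,.0f}")
print(f"p-value:      {p_value:.3f}")
```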

You think about it, but have your doubts. “But,” you say, “wouldn’t that mean that I would only impact a portion of the pilot group? I can’t afford to potentially lose out on any sales – can’t I roll it out to the whole region and have some other group, outside the pilot, be the control?”

Your data scientist thinks about it for a moment, but doesn’t look convinced.

“You can, but it wouldn’t be strictly A/B testing if you were to do that. Your pilot group was based on geography. Customers in other geographies won’t have the exact same characteristics as customers in your pilot geography. If they were to perform differently, it could be down to a host of other factors, like environmental differences; or cultural differences; or perhaps even sales budget differences.”

You’re caught in two minds. On the one hand, you want this to be scientific and prove beyond a doubt the efficacy of the initiative.

On the other hand, an initiative that brings in an additional $2 million in revenue looks better than one that brings in only an additional $1.5 million because part of the pilot was held out as a control group you couldn’t impact.

Why would you want to lose $500,000 when you know your idea works?

What do you do?

A Culture of Experimentation

Without a culture of experimentation, it’s extremely difficult for me to recommend that you actually stick by the principles of proper experimentation and go for the rigorous A/B route. There’s a real agency problem here.

You, as the originator of the idea, have a stake in trying to make sure the idea works. Even though it’d have just been a pilot, having it fail means you’d have wasted time and resources. Your credibility might take a hit. In a way, you don’t want to rigorously test your idea if you don’t have to. You just want to show it works.

Even if it means an ineffective idea is stopped before more funds are channeled to an ultimately worthless cause, there really is no benefit in it for you. Good for the company; bad for you.

In the end, I think it takes a very confident leader to go through with the proper A/B testing route, especially in a culture not used to proper experimentation. It’s simply not easy to walk away from potential revenue gains through holding out a control group, or scrapping a project because of poor results in the pilot phase.

But it is the leader who rigorously tests his or her ideas, who boldly assumes and cautiously validates, who will earn the respect of those around him or her. In the long run, it is this leader who will not be busy fighting fires, attempting to save doomed-to-fail initiatives.

Without these low-value initiatives on this leader’s plate, there will be more resources that can be channeled to more promising ventures. It is this leader who will catch the Black Swans, projects with massive impacts.

I leave you with a passage from a Harvard Business Review article I really enjoyed, The Discipline of Business Experimentation, which is a great example of a business actually following through and scrapping an initiative after a business experiment returned poor results:

When Kohl’s was considering adding a new product category, furniture, many executives were tremendously enthusiastic, anticipating significant additional revenue. A test at 70 stores over six months, however, showed a net decrease in revenue. Products that now had less floor space (to make room for the furniture) experienced a drop in sales, and Kohl’s was actually losing customers overall. Those negative results were a huge disappointment for those who had advocated for the initiative, but the program was nevertheless scrapped. The Kohl’s example highlights the fact that experiments are often needed to perform objective assessments of initiatives backed by people with organizational clout.

Can you imagine if they decided not to do a proper test?

What if they thought, “let’s not waste time; if we don’t get on the furniture bandwagon now our competitors are going to eat us alive!” and jumped in with both feet, skipping the “testing” phase?

Or what if the person who proposed the idea, feeling threatened that a failed initiative would make him or her look bad, decided to cherry-pick examples of stores for which it worked well? (An only too real and too frequent possibility when companies don’t conduct proper experiments.)

It would, I have little doubt, have led to very poor results.

And now imagine if this happened with every single initiative the company came up with, large or small. No tests, just straight from dream to reality.

Disastrous.

But unfortunately, in so many companies, that’s exactly the case.

What are you doing to help the person next to you?

Was taking a break from my studies (exams next week, people!), having my dinner and watching some YouTube vids on “leadership” (just because) when I came across Simon Sinek and this video.

Reminded me of something I knew very well some time back but had forgotten in the hustle and bustle of corporate life: that we sometimes have to put ourselves aside, ignoring the modern social beat of “I, I, me, me”, and think about how we can help and serve others, not in the hope of some future karmic gain, but because we can.

Forecasts are not predictions

If you have a prediction and it turns out to be wrong, then that’s bad.

But if you have a forecast and it turns out to be wrong, that’s not necessarily bad, and may in fact be good.

Let’s say that you’re the captain of a ship and you see an iceberg one mile out. Based on your direction and speed, you forecast that within a couple of minutes you’ll hit the iceberg. So you slow down and change course, averting impact.

The forecast, in this sense, was wrong, but served the very purpose it was meant to serve: to aid in decision making. Now, imagine if you were rewarded on getting your forecasts right. Changing course after revealing your forecast of imminent impact would have been a bad move.

To satisfy your objective of forecast accuracy while still not sinking your ship, you could also decide to stay on course but slow down enough that impact is still made, just with minimal damage. Doing this, you would have met both the “avoid sinking” objective and your “forecasting” objective.

But that doesn’t really make sense, does it? And yet, isn’t that what we often do in sales forecasting?


Improving Forecasting Through Ensembles

There’s this wonderful article I want to share on building prediction models using ensembles. “Ensembles” in this case simply means the combination of two or more prediction models.

I’d personally had great success bringing several (relatively) poorly performing models together into one ensemble model, with prediction accuracy far greater than any of the models individually.
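As a rough illustration of the idea (my own minimal sketch on synthetic data, not the approach from the article), scikit-learn’s VotingRegressor will average the predictions of several base models into a single ensemble:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for, say, historical sales and their drivers.
X, y = make_regression(n_samples=500, n_features=8, noise=25.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three individually modest models...
base_models = [
    ("linear", LinearRegression()),
    ("tree", DecisionTreeRegressor(max_depth=4, random_state=0)),
    ("knn", KNeighborsRegressor(n_neighbors=10)),
]

# ...combined into one ensemble that averages their predictions.
ensemble = VotingRegressor(estimators=base_models)

for name, model in base_models + [("ensemble", ensemble)]:
    model.fit(X_train, y_train)
    print(f"{name:>8}: R^2 = {r2_score(y_test, model.predict(X_test)):.3f}")
```

On real data, the averaging tends to smooth out each model’s idiosyncratic errors, which is roughly where the accuracy gains I mentioned came from.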

Definitely something to check out if you’re into this sort of thing.

The net is set for the fish

The following passage is taken from the beautiful book Master of the Three Ways by Hung Ying-ming (which libraries might classify as “Eastern Philosophy”):

The net is set for the fish,
But catches the swan in its mesh.

The praying mantis covets its prey,
While the sparrow approaches from the rear.

Within one contrivance hides another;
Beyond one change, another is born.

Are wisdom and skill enough to put your hopes on?

Just a little reminder for my future self on the uncertainty of life (r-squared is never 100%).

Update: For the uninitiated, my comment on “r-squared” above was just a little statistical quip. R-squared is a number between 0 and 1 (often quoted as a percentage) that represents the proportion of the variability in the data that a linear model can explain. Whatever the model can’t explain, 1 minus r-squared, is uncertainty.
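If you want to see the quip in code, here’s a tiny sketch on made-up data that fits a linear model and prints the explained and unexplained portions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Made-up data: a linear relationship plus noise the model can't explain.
x = rng.uniform(0, 10, size=200).reshape(-1, 1)
y = 3.0 * x.ravel() + rng.normal(scale=4.0, size=200)

model = LinearRegression().fit(x, y)
r_squared = model.score(x, y)  # proportion of variance in y explained by the model

print(f"r-squared:   {r_squared:.2f}")
print(f"unexplained: {1 - r_squared:.2f}")  # the 'uncertainty' the quip is about
```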
