The need for theory in prediction models

I’d like to share this wonderful quip by philosopher Robert Long, quoted in the (also insightful and actually pretty good) book A Richer Life by Philip Roscoe:

Let’s say that in early 2001 I formulate a theory to the effect that there is a Constant Tolkienian Force in the Universe that produces a Tolkien film every year. When Austrians complain that my theory ignores the fact that films are products of human action and not of constant impersonal forces, I reply: ‘Oh, I know that. My theory isn’t supposed to be realistic. The question is whether it’s a good predictor.’ So I test it in 2001, 2002, and 2003. Lo and behold, my theory works each year! … But unless I pay some attention to the true explanation of this sequence of film releases, I’ll be caught by surprise when the regularity fails for 2004.

The above quote relates to the “blind” prediction models that we build (maybe statistical, maybe machine learning), where “accurate prediction” is so often the aim of the game.

The problem with chasing accurate predictions alone, without thinking about the underlying causes or theory, is that it’s difficult to tell whether the predictions are flukes or whether they contain true insight that is obvious to nobody but the algorithm.

Though the quote above is rather tongue-in-cheek (the “Tolkienian Force” model is really built on just three data points, far too few to support any statistical conclusion), the moral about the need for a good theory wasn’t lost on me: theory is the human yin to the technological yang. They balance and support each other, creating something much more powerful and persuasive than either would alone. When one is missing, the other somehow doesn’t feel right either.
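
To make the Tolkienian Force example concrete, here is a minimal Python sketch. The film counts are simply what the quote implies (one film a year for 2001 to 2003, none in 2004), and the function name fit_constant_force is made up for illustration. It is a toy, not a real forecasting method: it predicts the average yearly count seen so far, with no notion of why films get made.

```python
# Toy illustration only: yearly film counts as implied by the quote
# (one Tolkien film in each of 2001-2003, none in 2004).
films_per_year = {2001: 1, 2002: 1, 2003: 1, 2004: 0}


def fit_constant_force(history):
    """Hypothetical 'theory-free' model: the average yearly count seen so far."""
    return sum(history.values()) / len(history)


# Fit on the three years available when the "theory" was tested.
train = {year: n for year, n in films_per_year.items() if year <= 2003}
force = fit_constant_force(train)

# In-sample, the model looks perfect.
in_sample_errors = [abs(force - n) for n in train.values()]

# Out-of-sample, the regularity breaks, and nothing in the model warned us.
error_2004 = abs(force - films_per_year[2004])

print(f"Predicted films per year: {force}")
print(f"Largest error, 2001-2003: {max(in_sample_errors)}")
print(f"Error for 2004: {error_2004}")
```

The model is flawless on the three years it was fitted to and then misses 2004 entirely. Its in-sample accuracy gives no hint of that; only knowing why the films were being released would.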

Another (More Selfish) Benefit of Theory

A theory is also really useful when persuading the business to accept the results of a prediction model that they otherwise know nothing about. Without a theory, on what basis should the business believe the model? On faith?

That’s a tough sell.

And if you were the one who built the model, and they “trusted” the model because they trust you, what happens if the model fails?

Well, you fail with the model.

With a convincing theory, the model stands on its own merits. They may “trust” the model a bit more because they trust you, but if the model fails, it’s the theory that fails with it, not you.

Let me know what you think.