Future is Probabilistic
“All models are wrong, but some are useful”.
What did the British statistician George Box mean when he wrote these now-famous words in 1976, and how is it relevant to you?
Whenever we try understanding the world around us – be it our customers’ behavior, or how stars circle the center of a galaxy, or how coronavirus affects the human body – we never have direct and full access to the underlying reality. Instead, what we have access to is only the data generated by the process that we want to understand.
So, for example, we know that a customer clicked on one button but ignored the other one. Based on this data point, we now infer what sort of customer she must be and predict her needs are and how we can fulfill them.
Notice that the data hasn’t told you a lot – it’s sterile, a mere bunch of numbers. But when you combine the fact that a customer clicked on the button with your assumptions about how people behave, you get the magical ability to predict the future (which you can use to your advantage).
But your assumptions are not equal to reality and that’s why Box said: “all models are wrong”. To understand this clearly, look at the following chart:
Imagine you start observing the data from the start (the bottom left) and you move in time and collect more data points (up to the blue circle). At this stage, you have to decide whether future data points will lie on the red curve or the black curve. Perhaps you have to make a business decision based on this choice. Which one will you pick?
Actually, in this case, even though black and red curves are generated through two very different realities (see their equations), there’s no way any amount of historical data can help you choose because all data up to blue circle is consistent with both.
As you can see data cannot help you select between hypotheses, it can only help you eliminate. The idea that theories can never be proved true, but only be shown to be wrong is core to how science is done. So scientists definitely know which theories are wrong, but they never know for certain which theories are right. (Surprisingly, that’s also how venture capitalists work: while investing, they know for sure which companies are “duds” but they never know which ones are going to be “unicorns”).
Box also said: “some (models) are useful”. Notice that he didn’t say some models are correct. He used the term “useful”.
The usefulness of models points to their ability to predict the future. Scientific laws (like Newton’s law of gravitation) are models that help us predict solar eclipses hundreds of years ahead. They work quite well for this purpose while another model like throwing darts to predict eclipses will fail horribly. So even though both models are wrong, Newton’s law gives us more mileage because it’s proven to be useful in a variety of contexts.
The key points to remember:
- Don’t shoot for being right because there’s no such thing as the “correct” assumptions. Shoot for having “useful” assumptions.
- Data alone is sterile. Whenever you think data is giving you insights, it’s actually the data+your assumptions that are informing you.
If you enjoyed reading my letter, do send me a note with your thoughts at email@example.com. I read and reply to all emails 🙂