By Jean-Philippe Legault, guest contributor

A few weeks ago, I came across an article where the author predicted a major decline in the stock markets. Reading this article gave me goosebumps. Make no mistake, my dread came not from his negative conclusion, but rather from the process used to arrive at it.

Without going into too much detail, the author had developed a “model” that told him that the stock markets would decline. His model was developed by comparing historical returns of the S&P 500 with certain periods of the US presidential cycle. It also combined different economic measures that showed that current levels bear a strong resemblance to those of 2000 and 2008. In other words, it used a wealth of historical data that served to support its conclusion that we were about to experience a correction.

We must be careful with this type of model and analysis, since it is often affected by the “multiple data bias”, more commonly called the “data-mining bias”. To help you better understand data mining, I will explain the birthday paradox.

The Birthday Paradox

Imagine that you work in a small company and a new boss has just been appointed. How likely are you to have the same birthday as your new boss? The answer: about 0.27%, or 1 chance in 365.25 (counting leap years). The probability is therefore very low. Now, what is the probability that at least two people share a birthday in a company of 23 people? The answer: just over 50%!

Such a high percentage may seem surprising, but it stems from the high number of potential relationships between individuals. There are no fewer than 253 possible relationships among these 23 individuals (individual 1 with individual 2, individual 1 with individual 3, etc.).

Since you have now mastered the concept, here is a new question. What is the probability that two individuals share the same birthday in a group of 50 people? The answer: 97%. Once again, the 1,225 possible relationships between these 50 individuals explain this high probability.
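For readers who want to check these figures themselves, here is a minimal Python sketch. It rests on simplifying assumptions that are mine and not part of the argument above: a 365-day year with no leap day, and birthdays spread evenly over the year. The function names are purely illustrative.

```python
from math import comb


def pair_count(n):
    """Number of distinct pairs (relationships) among n people."""
    return comb(n, 2)


def shared_birthday_probability(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes `days` equally likely birthdays (no leap day).
    """
    p_all_distinct = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for n in (23, 50):
    print(n, pair_count(n), round(shared_birthday_probability(n), 3))
```

The calculation works backwards: it first computes the chance that everyone has a different birthday, then subtracts that from 1. The pair counts it prints match the 253 and 1,225 relationships mentioned above.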

Seek and You Will Find

I am constantly fascinated by the amount of information and data in the financial universe. Thanks to technology, we can record, analyze, and compare all this data. So, with a bit of research, you will no doubt be able to find data that confirms your opinion or even financial data that seems to predict the future.

Besides making me smile, the 1995 analysis by David J. Leinweber, Ph.D. explains this point well. Mr. Leinweber had managed to find a very strong relationship between the variation of the S&P 500 and the production of butter in Bangladesh. The relationship with the S&P 500 became even better if we added the production of cheese in the United States.

Imagine me presenting this model to you and using butter and cheese production to predict the movements of the S&P 500. You would surely laugh at me. Now imagine that the data were inflation, GDP, interest rates, and even periods of the US presidential cycle. You would probably take me seriously.

Using economic data certainly makes more sense than butter in Bangladesh since there are definitely cause and effect relationships between the performance of the S&P 500 and the economy. However, some relationships can be mere chance.
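To see how easily chance alone can produce a convincing relationship, here is a small simulation sketch in Python. Everything in it is hypothetical: the "returns" and the candidate "indicators" are pure random noise, and the sample sizes (20 observations, 1,000 candidates) are arbitrary values I picked for illustration. It is not a reconstruction of Mr. Leinweber's analysis, only a demonstration of the same trap.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def best_spurious_correlation(n_obs=20, n_candidates=1000, seed=0):
    """Strongest absolute correlation found among unrelated random series.

    The target and every candidate are independent noise, so any
    correlation found here is due to chance alone.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]  # hypothetical "returns"
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]  # hypothetical "indicator"
        best = max(best, abs(pearson(target, candidate)))
    return best


print(round(best_spurious_correlation(), 2))
```

As with the birthdays, the more candidates you compare, the more likely it is that one of them lines up with your target. With a short history and enough series to choose from, the best match can look impressive even though none of the series has anything to do with the others.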

That’s why I think it’s best to look at these historical “patterns” with caution. The explanation and reasoning behind building a model are certainly more important than the model itself.