Numbers never lie... until they do.

“Numbers never lie”.

You have likely heard this said. It’s wielded by keyboard warriors in an attempt to shout over dissenting opinions. “Sure, my viewpoint might be callous, but it’s backed by the numbers, and numbers never lie”.

Whoever first said this was lying. Maybe it’s more charitable to say that they just weren’t telling the whole truth.

I’m a researcher. I look into why people use systems and processes. I use both direct and indirect methods. I study behaviors and attitudes. I have a decent understanding of research practices, both in academic and corporate environments. I can tell you with a level of confidence that numbers do absolutely lie - or at least, the people showing the numbers lie.

How we interpret data and seek patterns is a vital part of living in our data-crazy world. If we aren’t careful, we start from a position of opinion and move toward fact, rather than moving from the facts to form an opinion.

Why data interpretation is important

It doesn’t matter what you do in life - at this point, your life is impacted by data: data collection, data privacy, data retention, data leaks… it’s everywhere.

On a given day, you’ll see data about population growth, economics, health, education, social media. It’s inescapable. And since it’s inescapable, your understanding of the world is (consciously and unconsciously) being shaped by it. Why?

Humans were created to seek understanding in our environment. We are natural pattern seekers. When we see two points of data, we want to connect the dots - it’s just our nature. Data interpretation is as natural as breathing to us. Our pattern seeking helped us understand the connection between water and crop growth, how to efficiently construct buildings and cities, and how to eradicate diseases.

If it really is natural that we interpret data, then we need to understand the right and wrong ways to do it. So let me talk about how Nicolas Cage movies cause more people to drown in pools.

Spurious correlations

Tyler Vigen is a Harvard Law School student who took up a hobby of finding places where our desire to find meaning in data does us a disservice.

These show up as “spurious correlations” - two entirely independent trends that appear to be related, but where no causal relationship exists.

My favorite is the (apparent) trend whereby Nicolas Cage movies influence the number of people who drown in pools.

source: https://www.tylervigen.com/spurious-correlations

Our logical brains override our desire to seek meaning in data here. Obviously people aren’t drowning because Ol’ Nic decided to take on another horrible movie. But this happens in less obvious ways as well.
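A minimal sketch in Python makes the point. It is an illustration under assumptions, not Vigen’s method or data: the function names (`pearson`, `random_walk`, `share_strongly_correlated`), the series length of 50, the 0.5 cutoff, and the 1,000 trials are all arbitrary choices for the demo. It generates pairs of series that are independent by construction and counts how often they still look strongly correlated.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def random_walk(rng, steps):
    """A drifting series: each value is the previous one plus random noise."""
    value, series = 0.0, []
    for _ in range(steps):
        value += rng.gauss(0, 1)
        series.append(value)
    return series


def white_noise(rng, steps):
    """A series with no memory: every value is drawn independently."""
    return [rng.gauss(0, 1) for _ in range(steps)]


def share_strongly_correlated(make_series, trials=1000, threshold=0.5, seed=0):
    """Fraction of independent pairs whose |correlation| exceeds the threshold."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        a = make_series(rng)
        b = make_series(rng)  # generated separately, so unrelated to `a`
        if abs(pearson(a, b)) > threshold:
            hits += 1
    return hits / trials


walk_share = share_strongly_correlated(lambda rng: random_walk(rng, 50))
noise_share = share_strongly_correlated(lambda rng: white_noise(rng, 50))

print(f"Drifting series that look related: {walk_share:.0%}")
print(f"Plain noise series that look related: {noise_share:.0%}")
```

If you run it, the drifting series should cross the cutoff far more often than the plain noise does, even though no pair shares any cause. Anything that trends over time, like yearly counts of films or accidents, behaves more like the drifting case, which is part of why these coincidences are so easy to find.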

The most common causes of these spurious relationships are statistical errors: using a sample that doesn’t represent the population, not involving enough people, not controlling for other variables that influence the results, or failing to test against a “null hypothesis”. But all of those are on the researcher. A good researcher will make sure their work is not easily misinterpreted. In fact, if you dig deep enough to find the core research behind most data we see, you will realize that most research does not readily lend itself to spurious correlations.

The problem lies in how the research is consumed.

Data doctoring to fit an agenda

Many of you reading this have probably not laid eyes on a research paper in decades, if ever. And for good reason - they are typically dry, uninteresting, and unnecessary for a layperson’s understanding of data. For most people at most times, a synopsis of research given by a news outlet is enough to give you an idea of what the data suggest.

But we live in an increasingly polarized country. Never have our media and wider culture been so partisan. Everyone has a platform from which they can speak what they think is truth. As a result, we have a breakdown in relationships and a disconnect between the observed world and our interpretation of it.

Media outlets have learned that the real money is in pandering to one set of partisans or another. To do this, they talk of retention rates - how many people keep coming back to their content. How do you ensure consistent retention? One way is to continue to reinforce the narratives already written.

We as consumers have likewise been conditioned into believing that our opinion is of utmost importance, that interpretation of truth is more important than truth itself. We are therefore more willing to disregard or cast off stories that challenge our way of thinking.

The result? The desire to create and consume doctored data. Research is a lot more palatable when we don’t have to change our convictions. So our media outlets are incentivized to make sure the data they report on supports the partisanship they have chosen to speak to, and consumers are incentivized to only read the things they already believe.

Research seeks to take a set of data and derive meaning from it, which ultimately gives birth to specific goals and desires. Our media and our own internal conditioning seek to take a set of goals and desires and find meaning for them in a set of data. The difference is subtle but real. This leaves the conscientious consumer with a problem - how do they know the stories they see accurately represent the research performed?

How to know when data is doctored

It’s so hard to write this section, because the root answer is “have enough time to do additional research”. Many of us just don’t have the time. If you are interested in figuring out when to dig a little deeper, here are some simple questions to ask yourself:

  • Is the story I’m reading linking to a specific research paper or topic?

  • Is the story I’m reading being published on other sites with similar partisanship?

  • Is the story I’m reading being published on other sites with different partisanship?

  • Does the outlet for the story stand to gain/lose something based on the research?

  • Does the story spend more time talking about the research, or the implications of the research?

  • Does the story tell me how I “should” think or feel about the data?

Be wise. But more importantly, be understanding.

You think it’s hard for you to know what to think? Join the 300,000,000 other people in the US who feel that way too. Like it or not, we are a community. Fractured, fractious, fragile, foolish, but ultimately we’re all in the same boat. Let’s try to empathize with each other and understand that misinformation is rampant, and we’re all to some extent victims.