Numbers Are Not Just Numbers

by Yan Peng

Welcome to my first blog post as a Data Schooler! In our first week of training, we delved into the theoretical underpinnings of data analysis. As I reflected on these foundational concepts, I couldn't help but notice how my understanding of numbers has evolved since transitioning into my role as a data analyst.

One key takeaway that resonated with me is that numbers are not mere figures; they carry meaning only when contextualized. Imagine someone tells you a company's sales figure for this year and nothing else. You wouldn't know what to make of it. Without context, a number doesn't really say much. Worse, it can be misleading.

So, what does context entail? In the case of sales numbers, we might compare this year's figures with those from the previous year or benchmark them against other regions. Comparison, however, can be tricky.

Take Figure 1 as an example. The chart comes from my analysis of the Munich Airbnb market, based on data for a single day during Oktoberfest 2023. It shows that Airbnb listings closer to the venue cost more. Pretty neat at first glance, right? The problem is that the Oktoberfest venue sits in an area where the average listing price is already among the highest in the city. Proximity to the event venue is therefore not the only factor that could produce the pattern in the chart. To strengthen the analysis, we need to examine the same relationship in other months and see whether the price disparity changes. If the slope is flatter outside Oktoberfest, we have better grounds to conclude that the festival itself pushes up prices near the venue.

Figure 1. Distance from Oktoberfest venue and price of Munich Airbnb listings
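
The month-to-month comparison described above could be sketched roughly as follows. This is a minimal illustration, not the analysis behind Figure 1: the listings are made-up numbers, the helper name price_distance_slope is hypothetical, and a straight-line fit is assumed to be a reasonable summary of the relationship.

```python
import numpy as np


def price_distance_slope(distance_km, price_eur):
    """Least-squares slope of price against distance, in EUR per km."""
    slope, _intercept = np.polyfit(distance_km, price_eur, deg=1)
    return slope


# Hypothetical, made-up listings (NOT the real Munich data):
# distance to the venue in km and nightly price in EUR.
distance_km = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 5.0])
oktoberfest_price = np.array([320.0, 300.0, 260.0, 220.0, 180.0, 140.0])
ordinary_month_price = np.array([150.0, 145.0, 135.0, 125.0, 115.0, 105.0])

slope_event = price_distance_slope(distance_km, oktoberfest_price)
slope_baseline = price_distance_slope(distance_km, ordinary_month_price)

# A negative slope means prices fall as distance from the venue grows.
# The baseline slope captures the "expensive neighbourhood" effect alone;
# only the extra steepness during the festival can be tied to the event.
extra_steepness = slope_event - slope_baseline

print(f"Oktoberfest slope: {slope_event:.1f} EUR/km")
print(f"Baseline slope:    {slope_baseline:.1f} EUR/km")
print(f"Extra steepness:   {extra_steepness:.1f} EUR/km")
```

With real data, the two slopes would come from the same set of listings observed on comparable days, so that the only thing changing between the two fits is the festival.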

In our daily lives, we often lack the time or patience to dig deeper into the context of numbers or to question whether a comparison is meaningful. Advertisers know how to take advantage of this tendency, as illustrated vividly in the book “Calling Bullshit: The Art of Scepticism in a Data-Driven World” by Bergstrom and West. When the label on a box of chocolates says “20% less fat”, can we simply conclude that it is healthier? We may want to ask some follow-up questions first: 20% less fat than WHAT? And what about other potentially harmful ingredients, such as refined sugar?

A more egregious example from the book is a headline by the American far-right news outlet Breitbart, claiming that 2,139 DREAM Act or DACA recipients (undocumented immigrants who entered the United States as minors) had been convicted or accused of crimes against Americans. While the number sounds alarming, it represents only a small fraction of the people who held DACA status at that time: about 0.3%, or fewer than 1 in 300. On top of this, when set against the crime rate among US citizens, which is 0.75%, the figure loses much of its shock value. Figure 2 shows the difference between using a bare number and embedding it in a wider context.

Figure 2. Illustration of the same numeric object without and with context
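
The arithmetic behind this kind of contextualization is simple enough to sketch. In the snippet below, the count and the 0.75% comparison rate are the figures quoted above, while the population of 700,000 is my own round-number assumption, chosen only because it is consistent with the "about 0.3%" share; the function names are hypothetical.

```python
def as_rate(count, population):
    """Share of the population that the raw count represents."""
    return count / population


def one_in_n(rate):
    """Express a rate as '1 in N', rounded to the nearest whole N."""
    return round(1 / rate)


flagged = 2139              # raw number from the headline
daca_population = 700_000   # ASSUMPTION: round figure, not stated in this post
citizen_rate = 0.0075       # 0.75%, the comparison rate quoted in this post

daca_rate = as_rate(flagged, daca_population)
ratio = citizen_rate / daca_rate

print(f"Share of DACA recipients: {daca_rate:.2%}")
print(f"That is about 1 in {one_in_n(daca_rate)}")
print(f"Comparison rate is {ratio:.1f} times higher")
```

The bare number and the rate describe the same fact, but only the rate, placed next to a comparison group, tells the reader how large it really is.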

As data analysts, we grapple with numbers constantly. It's not enough to ensure their accuracy; we must also embed them in meaningful contexts to tell truthful stories.

In conclusion, numbers carry meaning only when accompanied by context. Whether analyzing business data or scrutinizing news headlines, contextualization is crucial for an accurate interpretation of the data.