Chalk X

Finding where to put your mark

Math, Metrics, and Science Oh My!

Who doesn’t like Math, Metrics, and Science? According to Pew Research, only 11% of Americans say that they liked math in school. Based on most public discourse, this isn’t hard to confirm. I want to look at a couple of commonly used statistics terms and at how metrics get misused. This misuse can cause us to make the wrong assumptions, draw incorrect conclusions, and ultimately make decisions that, at best, don’t have the expected impact and, at worst, cause catastrophic outcomes.

Metrics should always be used to solve a problem. For our problem-solving to be effective and consistent, we all need to have a discussion based on reality and not a skewed misunderstanding of mathematics. 

Math – Statistics

The mean household income in the US is $111,600, and the median is $83,360. Why the two numbers? What are they telling us?

First, let’s start with two commonly used statistics: mean and median. 

  • Mean: The sum of the values in the group divided by how many numbers are in a group. 
  • Median: The middle value in a group. 
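
For example, take the group {2, 3, 3, 5, 87}: the mean is (2 + 3 + 3 + 5 + 87) / 5 = 100 / 5 = 20, while the median is the middle value, 3. A single large value pulls the mean far away from the typical member of the group.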

Plotted over time, these two numbers show that while the mean income is growing, the median is starting to plateau. The growing gap between them indicates increasing variance in the data.

In this case, a graph of the two numbers over time tells us:

  1. The overall income for Americans is going up over time. 
  2. Because the mean is growing faster than the median, we can tell that the variance in our data set is also increasing. Since the mean exceeds the median, income growth is concentrated in fewer people.

Note: Nowhere in the data does it say why. It also doesn’t indicate if it is good or bad. More on that in the last section.

Let’s consider a hypothetical scenario. You live in a middle-class neighborhood in the US, and the mean and median income on your street of 20 families are both $80,000. This neighborhood has a homeowner’s fee based on the mean income: 1% of the mean annual income, or $800 per year.

Now Jeff Bezos buys one of these 20 houses. The median would still be $80,000. The mean would be about $5.35 billion. And your annual homeowner’s fee would now be $53.5 million.
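
Here is a minimal sketch of that neighborhood in Python (the incomes and the outlier figure are illustrative, chosen to match the numbers above):

    import statistics

    # 20 households, each earning $80,000: mean and median agree.
    incomes = [80_000] * 20
    print(statistics.mean(incomes))    # 80000
    print(statistics.median(incomes))  # 80000

    # One household becomes an extreme outlier (~$107B, per the example).
    incomes[-1] = 107_000_000_000
    print(statistics.median(incomes))       # still 80000
    print(statistics.mean(incomes))         # 5350076000
    print(0.01 * statistics.mean(incomes))  # fee: ~$53.5M per year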

This example may seem ludicrous; however, how often do we base decisions on averages in business? We want to increase the average number of sales. Perhaps all we care about is absolute numbers, but we likely want to understand where sales are coming from—which products, margins, amounts, new customers, etc.

Understanding what these calculations mean for a dataset helps us generate new questions and understand our data better.

Metrics 

With today’s information systems, many numbers are available to look at. This is true in every field. 

  1. Operations: number of incidents, availability, usage
  2. Financials: gross profit margin, operating cash flow, average invoice processing cost
  3. Marketing: conversion rate, cost per lead, click-through rate
  4. Manufacturing: throughput, utilization, downtime, cycle time
  5. Etc.

In each area, there is a wealth of numbers you can quickly get, as our systems are designed to generate data. Collecting metrics isn’t free, even if the system you are paying for generates them. If you are discussing which numbers to use and you are all getting paid, then those metrics cost something. The question is: are you getting value out of discussing, tracking, and reviewing those metrics?

Common Metric Pitfalls

  1. Using averages for system management – When evaluating a system’s performance, there is a tendency to weigh every data point equally.
    • However, uncontrollable things happen. If the power went out due to a meteor strike, a system’s average throughput, availability, and processing times would all look worse. If you then spent millions of dollars to improve those averages, you would likely be wasting your money. Unless you’re Ben Affleck on a mission to save Earth by drilling into an asteroid with your future father-in-law.
    • Armageddon aside, your system likely has some natural variance that is within your control. Instead of managing to averages, manage variance with X-bar and R charts (see the sketch after this list). Much of modern business thinking holds that the target for all systems is Six Sigma control: your system produces the correct result 99.9997% of the time. Note that Six Sigma is very expensive. It may be the right target for things like semiconductors and airplane parts. However, in many systems, Four Sigma, at 99.9937%, is just fine. Perhaps you are making cheap giveaways. Or maybe you aren’t measuring the right thing.
  2. Averaging unlike numbers together – Let’s say you have multiple systems working in your factory. They all create different parts for your main product. Some create screws, and others make the frame of your widget. Then you want to see the average issue rate across all your machines. The screw machine has 25 issues per day, and the frame machine has 5. If you manage to the overall average of all your machines together, you have no control over the system: five bad screws may mean nothing, while five bad frames may mean no good parts for the day. In general, consider the unit of measure. If you add each machine’s average daily failures together, your unit of measure is no longer machine issues per day; it is a blended “issues per (screw, frame, housing, battery, etc.) per day” that does not correspond to anything you can act on (also see the sketch after this list).
  3. Vanity metrics – Collecting numbers before we have a reason. If you look at a number only to see how big it is, it doesn’t drive a decision; that is a vanity metric. Common vanity metrics include visits to the website, volume of shipments, and tonnage of product created. These are fun to watch, but they may be unconnected to value creation.
  4. If I can measure it, it counts. – Simply because a number exists in a system does not make it relevant to your business. You can always count the number of chairs in your office, but just because you can obtain the number doesn’t make it useful. More metrics add noise, not clarity, to managing the system. 
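
As referenced in the first two pitfalls, here is a minimal sketch in Python: compute X-bar (subgroup mean) and R (subgroup range) control limits per machine and manage each machine against its own limits, rather than pooling unlike machines into one average. The subgroup data and sizes here are illustrative assumptions.

    import statistics

    # Hypothetical daily subgroups of 5 quality samples from one machine.
    screw_subgroups = [
        [24, 26, 25, 27, 23],
        [25, 25, 26, 24, 25],
        [26, 27, 25, 24, 26],
        [23, 25, 24, 26, 25],
    ]

    A2 = 0.577  # standard X-bar chart constant for subgroup size n = 5

    def xbar_r_limits(subgroups):
        xbars = [statistics.mean(g) for g in subgroups]  # subgroup means
        ranges = [max(g) - min(g) for g in subgroups]    # subgroup ranges
        grand_mean = statistics.mean(xbars)
        r_bar = statistics.mean(ranges)
        # X-bar chart control limits: grand mean +/- A2 * R-bar
        return grand_mean - A2 * r_bar, grand_mean + A2 * r_bar

    # Each machine gets its OWN limits; averaging screw issues with frame
    # issues would blend different units and hide real signals.
    lcl, ucl = xbar_r_limits(screw_subgroups)
    print(f"screw machine limits: LCL={lcl:.2f}, UCL={ucl:.2f}")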

Here is a great example of making decisions based on easy-to-count numbers (speed) instead of the hard-to-count numbers that matter (desire to ride the train).

Metrics Summary: 
  1. Metrics shouldn’t be used unless they are driving a decision.
  2. Make sure you understand units of measure before you combine unlike numbers, and know which calculation, such as the mean or median, you are using.
  3. Not understanding your dataset makes metrics impossible to use well. Because of this, changes in metrics should be an invitation to a conversation, not a conclusion.
  4. Make sure they are the right metrics. Do they actually drive the outcome you desire?

Science

Many will look at dashboards and statistics, see apparent correlations between numbers, and conclude that the relationship is causal, meaning a change in one metric directly causes a change in another. People sometimes call this a “discovery” or “doing research.” This is a misuse of the English language. When you look at data or metrics and form an idea of what they mean, you have, at best, a hypothesis, not a theory.

A theory is a well-substantiated explanation of a natural phenomenon that has been repeatedly tested and confirmed. Theories are based on evidence and the scientific method.

A fun way to see how easily we can come up with an incorrect hypothesis is the website “Spurious Correlations.”

There are many charts showing correlations between two seemingly unrelated phenomena. For example: 

One could falsely assume that naming more children Thomas in the US would increase the gasoline consumed in France. This may sound outlandish, but this same reasoning has caused mistakes in public health. For example, in the 1940s, there was a thought that ice cream could be causing polio.

That should make you ask the question: why? Ice cream sales go up in the summer. Polio cases also go up in the summer. In the summer, people gather more, and kids play in pools and rivers, where polio can spread quickly through water contamination.
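
A quick sketch of why two unrelated but trending series will “correlate” (the series below are fabricated purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    years = np.arange(2000, 2020)

    # Two unrelated series that both happen to trend upward over time.
    babies_named_thomas = 1000 + 50 * (years - 2000) + rng.normal(0, 40, 20)
    gasoline_in_france = 500 + 30 * (years - 2000) + rng.normal(0, 25, 20)

    # Pearson correlation is high purely because both trend with time,
    # the shared confounder, not because one causes the other.
    r = np.corrcoef(babies_named_thomas, gasoline_in_france)[0, 1]
    print(f"r = {r:.2f}")  # typically > 0.9 for trending series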

The key here is that when you see a correlation, you don’t draw a conclusion; you start asking questions. Someone has already tested the polio hypothesis to find out why, and that is precisely what we need to do when we see a perceived relationship in our metrics: run a study.

You cannot retrofit an existing study to your hypothesis; the data that suggested a hypothesis cannot also be used to confirm it. This is a dangerous game, and it has led to many academic papers being retracted.

Summary:

  1. Use data as a starting point to ask questions; don’t draw conclusions.
  2. With the power of today’s data analytics systems, there is a temptation to keep mining a dataset until you find meaning. This leads to more false conclusions, with the added detriment of believing them accurate.
  3. Create experiments to test your hypotheses. Identify indicators and success conditions ahead of time (see the sketch below).
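
As a minimal sketch of point 3, define the success condition before looking at the results, then run a standard test; the data here is made up, and the example assumes SciPy is available:

    from scipy import stats

    # Pre-registered BEFORE the experiment: metric and success threshold.
    ALPHA = 0.05

    # Hypothetical checkout times (seconds) for control and treatment.
    control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 12.1]
    treatment = [11.2, 11.5, 11.1, 11.4, 11.3, 11.6, 11.2, 11.4]

    # Two-sample t-test: is the difference unlikely under "no effect"?
    t_stat, p_value = stats.ttest_ind(control, treatment)
    print("significant" if p_value < ALPHA else "no conclusion", p_value)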
