Characteristics of Good Metrics

Author: Ishan Goel

The metric system was developed in France in the 1790s in the aftermath of the French Revolution. Equality and liberalization gave birth to the need for a standardized system of measurement for the people. A coherent, decimal-based system with units such as kilograms, meters and seconds could be used to measure all things in daily life. A convenient system of prefixes like “kilo” and “milli” could be used to denote multiples and fractions. A meter was 1/10,000,000 of a quadrant of Earth’s meridian, measured through Paris, and the kilogram was the mass of a cubic decimeter of water. By 1799, the metric system was established with the hope that these new units would be “for all people, for all time”.

Humans have designed new and convenient systems of measurement repeatedly throughout history. However, many factors go into the adoption of such a system. The biggest example is the United States, which never shifted entirely to the metric system for a variety of historical, economic and cultural reasons. At the time it was introduced in France, Thomas Jefferson, then Secretary of State in the U.S., was aware of the new system, but the country stayed with the British units already in use. With the Industrial Revolution, the U.S. invested heavily in manufacturing built around those units, which escalated the cost of switching to the new system. Subsequent generations became so comfortable with the existing units that they chose to stick with them, and life went on.

Metric design has been innate to human existence, and a choice of metric is never simply right or wrong. There are some good choices that work for you, and others that cause difficulty. Which characteristics are desirable depends on the context in which you need the metric, and there are trade-offs to make based on what you prefer. This blog post explains the 11 questions you need to ask yourself when designing relevant metrics for your experiments. They are grouped into three sections: statistical aspects, logistical aspects and usability aspects.

Statistical Aspects

Metrics are essentially numbers that quantify business processes and help a leader measure the success of a product. Numbers in the real world are random and you need to think about the associated randomness when choosing a metric.

  1. Sensitivity: Will the metric move in response to the change tested? Not all metrics are sensitive to the kind of change you make. If you want to weigh a heap of rice, a bathroom scale is not a great choice because small additions of rice will not move the needle. Similarly, a more readable font on a website might not move the conversion rate but can definitely move the time spent on the page.
  1. Robustness: Will the metric move too much even when nothing has changed? Conversely, sometimes a metric fluctuates a lot due to external factors. Think about how hard it is to take a reading from a hanging spring scale while the pointer oscillates up and down in simple harmonic motion. A metric that fluctuates this way ends up masking the small effect sizes you want to detect.
  1. Statistical Biases: What statistical biases can be present in your metric? Often the source of a metric is statistically biased, and such a metric gives a distorted picture of reality. For instance, if Play Store reviews are the north star metric of your business, you will always have a biased picture of customer satisfaction because most unengaged users do not bother to leave a review, be it good or bad.
  1. Noisy Data: How noisy is your metric in practice? Some metrics are structured in a way that captures a lot of noisy data, which adds unnecessary randomness to your experiment. For instance, if you decide that a survey is the right way to collect the data you need and make it a long survey that users cannot exit midway, you will be pushing them to give frivolous and noisy responses, which in turn will affect your decision making.
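
The sensitivity and robustness questions above can be checked together with a small simulation. The following is a minimal sketch, not a prescribed method: it assumes per-user values follow a normal distribution, and the function names, sample sizes and the expected lift of 0.2 are all hypothetical. It simulates A/A comparisons (two identical groups) to see how much a metric moves when nothing has changed, then compares an expected lift against that background movement.

```python
import random
import statistics


def mean_metric(n_users, mean, sd, rng):
    """Average of a per-user metric for one simulated group (normal noise is an assumption)."""
    return statistics.fmean(rng.gauss(mean, sd) for _ in range(n_users))


def aa_spread(n_users, mean, sd, runs, rng):
    """Robustness check: spread of the difference between two identical groups."""
    diffs = [
        mean_metric(n_users, mean, sd, rng) - mean_metric(n_users, mean, sd, rng)
        for _ in range(runs)
    ]
    return statistics.stdev(diffs)


def signal_to_noise(expected_lift, spread):
    """Sensitivity check: expected movement relative to A/A fluctuation."""
    return expected_lift / spread


rng = random.Random(7)  # fixed seed so the sketch is reproducible
# Two hypothetical metrics with the same expected lift but different per-user noise.
steady_spread = aa_spread(n_users=500, mean=10.0, sd=1.0, runs=200, rng=rng)
jumpy_spread = aa_spread(n_users=500, mean=10.0, sd=10.0, runs=200, rng=rng)

print(signal_to_noise(0.2, steady_spread))  # larger ratio: the lift stands out
print(signal_to_noise(0.2, jumpy_spread))   # smaller ratio: the lift is drowned in noise
```

A metric whose ratio is well below 1 at realistic traffic levels is unlikely to register the change you are testing, however real that change is.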

Logistical Aspects

Metrics are created out of data and no matter how tractable and easy data collection has become, there will always be logistical issues with collecting and summarizing data. If you are selecting a metric, it is necessary to consider some logistical questions about the metric.

  1. Computational Complexity: Can the metric be easily and reliably tracked? A metric that is very costly to calculate is probably not a good metric in the long run because you cannot measure it at scale. For instance, if you are tracking engagement through a machine learning model that takes in various behavioral attributes and then takes over a minute to spit out a predicted engagement number, you will never be able to scale up the predicted engagement variable. As users increase you will be forced to switch to low-cost alternatives.
  1. Delay and Time Aspects: Can the metric be recorded in a meaningful time? If you base weekly experiments on annual recurring revenue (ARR), you would be running most experiments for at least half a year to reach a conclusion. You want metrics that can be recorded quickly, ideally in real time, so that you have meaningful control over them.
  1. Availability: Is the metric available across a wide range of use cases? Often you might be targeting a metric that is not easily available across your entire audience. For instance, metrics derived from user data often face a non-availability problem. Different countries and different operating systems (iOS or Android) place different restrictions on the availability of user data, creating a severe logistical issue in reliably calculating metrics. Metrics should hence be chosen to have a high rate of availability.
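
The availability question can be made concrete by measuring, per segment, the share of users for whom the metric is actually recorded. This is a minimal sketch under assumptions: the record layout, the `session_length` field and the `os` segment key are hypothetical, and a missing value is represented as `None`.

```python
from collections import defaultdict


def availability_by_segment(records, metric, segment_key):
    """Share of records in each segment where the metric is present (not None)."""
    totals = defaultdict(int)
    present = defaultdict(int)
    for record in records:
        segment = record[segment_key]
        totals[segment] += 1
        if record.get(metric) is not None:
            present[segment] += 1
    return {segment: present[segment] / totals[segment] for segment in totals}


# Hypothetical user records; one iOS user has no recorded session length.
records = [
    {"os": "ios", "session_length": 42.0},
    {"os": "ios", "session_length": None},
    {"os": "android", "session_length": 17.5},
    {"os": "android", "session_length": 63.0},
]

availability = availability_by_segment(records, "session_length", "os")
print(availability)
```

If availability differs sharply between segments, an average computed over the available records represents some segments far better than others.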

Usability Aspects

Product metrics are eventually used by people to meet business goals. Widespread use and adoption of a metric is not possible if it ends up being an abstraction no one can understand. This leads to further questions one needs to answer when choosing a good metric.

  1. Interpretability: Is the metric easily understandable by stakeholders? A metric that is not easily understandable is at best just a number that can’t be practically used for product and business growth. For instance, wrapping your number of daily active users with a log function might be useful for a machine learning algorithm to capture exponential growth but not for a decision-maker who wants to use it to measure product health and success. Metrics should be designed such that they are clear, easily understandable, and capture early signs of user behavior or business growth.
  1. Directionality: Which direction of the metric movement is aligned with the desired objective? It is often not straightforward which direction of movement is best. For instance, an increase in clicks on your product might not always reflect higher engagement; it can sometimes reflect a page that hangs and makes customers click repeatedly. It is hence important that metrics are well understood in terms of what they represent and what an increase or decrease might mean.
  1. Metric Limitations: What are the cases in which the metric would not work? Very few metrics work across all use cases. For instance, revenue per user might not be a great metric when running experiments to increase the number of users trying out a new product. The team that owns the metrics should be able to list the cases in which they would not work and should be careful not to use the same metric everywhere.
  1. Gameability: Can the metric be optimized in a way that defeats the purpose? A metric is said to be gamed when it is optimized in a way that does not serve the purpose for which it was chosen. The hard truth is that metrics can be gamed in many more ways than you can imagine. For instance, it might seem reasonable to choose revenue generated as a metric for monitoring and optimizing email marketing campaigns. However, in an effort to optimize this metric, an agent might unknowingly start spamming your customers with countless emails, since additional emails can never decrease the revenue generated.
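
One common way to make the email example harder to game is to pair the primary metric with a second metric that gets worse when customers are spammed. The sketch below is an illustration under assumptions, not a method from this post: the unsubscribe-rate guardrail, the `CampaignResult` fields and the 1% threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CampaignResult:
    recipients: int    # distinct customers emailed
    emails_sent: int   # total emails sent to them
    revenue: float     # revenue attributed to the campaign
    unsubscribes: int  # recipients who unsubscribed during the campaign


def unsubscribe_rate(result):
    """Share of recipients who unsubscribed."""
    return result.unsubscribes / result.recipients


def evaluate(control, variant, max_unsub_increase=0.01):
    """Accept a variant only if revenue rises without breaching the guardrail."""
    if unsubscribe_rate(variant) - unsubscribe_rate(control) > max_unsub_increase:
        return "reject: guardrail breached"
    if variant.revenue <= control.revenue:
        return "reject: no revenue lift"
    return "accept"


# Hypothetical results: the spammy variant earns more revenue but drives unsubscribes.
control = CampaignResult(recipients=1000, emails_sent=1000, revenue=5000.0, unsubscribes=5)
spammy = CampaignResult(recipients=1000, emails_sent=10000, revenue=5400.0, unsubscribes=60)
gentle = CampaignResult(recipients=1000, emails_sent=2000, revenue=5200.0, unsubscribes=6)

print(evaluate(control, spammy))
print(evaluate(control, gentle))
```

Judged on revenue alone, the spammy variant would win. The second metric is what exposes the cost of getting there.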

Conclusion

Data has exploded in the 21st century, and even simple web applications provide several different raw signals that can be combined to create useful metrics. The bottleneck is no longer the availability of data to create effective metrics for your product, but rather the ability to decide which metrics are not good ones among the myriad options presented by permutations and combinations of raw signals. Bad metrics are silent and insidious because they look suitable on the surface. They reveal their true nature only in practice, after leading one down the road of unintended consequences.

Not all metrics are good metrics, and choosing good metrics is a mix of art and science. 

But no one expects you to get it right on the first try. What is important is to have a process to regularly review, revise and evolve your metric structure.

Once you understand and optimize your metrics to have some good characteristics, they stay and guide you towards success for a long time to come.

This blog post was written in collaboration with Manisha Arora. Special thanks to her.

