There’s an old economic adage from around 1975 that says, “Any observed statistical regularity tends to break down as soon as pressure is applied to it for control purposes.”
This statement came from the Bank of England’s chief economic adviser, Charles Goodhart, in an article on monetary policy in the United Kingdom.
Goodhart was criticizing the policies and practices then used to measure the growth of the UK economy. He warned that any feature of the economy used as an indicator of its performance inevitably ceases to function as an indicator, because people start to game it. This concept, now known as Goodhart’s Law, applies to far more than economic measurements.
You can see Goodhart’s Law in action when anything is set as a Key Performance Indicator (KPI) and linked to a goal as a performance measure, especially in IT. What is not linked to a goal and a specific measurement these days?
Tiny Russian Nails and Poor Service
A more descriptive paraphrase of Goodhart’s Law is that “when a measure becomes a target, it ceases to be a good measure.” This should be clear to anyone setting a team’s goals and determining what metrics should be monitored to achieve those goals. This is illustrated by the parable of a Russian nail factory looking for ways to motivate its employees to produce more nails.
The story goes that one day in Soviet Russia, a nail factory wanted to increase production and set a goal for its workers based on the number of nails produced per day. The workers immediately focused on the new target and churned out thousands of tiny nails. To the dismay of their leadership, the workers met the production goal, but the nails were so small that they were unusable. Management then changed the goal from the number of nails produced per day to the weight of nails produced per day. Again, the workers focused on the new target and met it by producing a single giant nail that dwarfed the thousands of tiny nails from the day before.
Probably no one reading this has ever been responsible for maximizing the output of a Russian nail factory, but this story could easily play out in any modern workplace. In a customer service call center, for example, it might seem like a good idea to pay employees based on the number of customers they help rather than the amount of time they spend on the job. So hourly pay is replaced with a compensation plan based solely on the number of calls an employee handles. Employees immediately focus on the new target and on maximizing their incentives, soon doubling their previous call volume. On the reporting dashboard, the new policy looks like a success, but on closer inspection the quality of each call has plummeted as employees try to generate more calls instead of solving problems or taking care of the customer. The result is unhappy customers who leave bad reviews, causing lost revenue and downstream repercussions for escalation teams.
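The call-center scenario can be sketched as a toy model. The numbers below are purely hypothetical, chosen only to show the mechanism: when pay shifts to call volume, agents shorten calls, the tracked metric (calls per shift) doubles, and the untracked outcome (issues actually resolved) falls.

```python
# Toy model with hypothetical numbers: agents choose a call length;
# shorter calls mean more calls per shift but a lower resolution rate.
SHIFT_MIN = 480  # one 8-hour shift, in minutes

def outcomes(minutes_per_call, resolve_rate):
    calls = SHIFT_MIN // minutes_per_call
    resolved = calls * resolve_rate
    return calls, resolved

# Hourly pay: thorough 12-minute calls, most issues resolved.
before = outcomes(12, 0.90)
# Per-call pay: rushed 6-minute calls, many issues left unresolved.
after = outcomes(6, 0.40)

print(f"before: {before[0]} calls, {before[1]:.0f} resolved")  # 40 calls, 36 resolved
print(f"after:  {after[0]} calls, {after[1]:.0f} resolved")    # 80 calls, 32 resolved
```

The dashboard metric (calls) doubles while the real outcome (resolved issues) drops, which is exactly the gap Goodhart’s Law warns about.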
The Cobra Effect
In these extreme scenarios, as employees focus single-mindedly on production numbers to earn their paycheck, shortcuts are taken, courtesies and follow-ups are eliminated, and the customer experience and quality of work suffer. As Steve Jobs said, “Incentive structures work. So you have to be very careful about what you incentivize your employees to do, because different incentive structures have all kinds of consequences that you cannot predict.”
What Jobs describes is also known as the Cobra Effect: an attempted solution to a problem makes the problem worse through unintended consequences.
The idiom stems from an incident that also illustrates Goodhart’s Law. When India was under British rule, the government tried to reduce the number of venomous cobras in the capital city of Delhi. The plan was to offer a bounty for each dead cobra that a resident killed and delivered to officials. As the government paid out rewards and the number of dead snakes kept increasing, the strategy appeared to be working. But just as in our other examples, certain people focused on the goal and on maximizing their incentive, not on the overall intent of ridding Delhi of its cobra problem. Enterprising residents gamed the program by breeding cobras for slaughter rather than hunting the problematic snakes. When the government learned of the exploitation, it discontinued the bounty, and the breeders released their now worthless snakes into the wild. The unintended consequence, the cobra effect of this anecdote, was that the newly released snakes significantly increased the cobra population in Delhi.
What Should Be Measured?
Goodhart’s Law does not tell us to stop measuring things. Depending on the circumstances, applying Goodhart’s Law may even show that more measurement is needed to avoid an environment where promotions and pay scales are tied to a single measure, which can lead to the dreaded Cobra Effect. In data science specifically, Goodhart’s Law reminds us that optimization requires appropriate metrics. Relying on a single key metric to judge the effectiveness of a solution can lead to adverse consequences, yet we often focus on one narrow measure, such as the mean squared error in regression or the F1-score in classification, to determine the effectiveness of a machine learning model.
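A minimal sketch of how a single metric can mislead, using hypothetical data and no external libraries: on an imbalanced dataset, a degenerate model that always predicts the majority class posts high accuracy while being completely useless, which the F1-score exposes.

```python
# Hypothetical imbalanced labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5

# A "model" that games the accuracy metric by always predicting the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Compute F1 from scratch: it depends on true positives, which this model never produces.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f"accuracy={accuracy:.2f}  f1={f1:.2f}")  # accuracy=0.95  f1=0.00
```

Judged on accuracy alone, the do-nothing model looks excellent; a second metric immediately reveals the gaming.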
Choosing metrics that seem useful at first glance, or using metrics without careful consideration of what the chosen metrics promote, is an all too common strategy for determining measures of success. Metrics development is both a science and an art and should be carefully considered and tested before becoming a guidepost for success.
Pressure testing metrics
Immediacy: Can the metric be calculated in real time? Does it provide feedback quickly enough to shape behavior?
Simplicity: Is the metric easy to understand? Will participants understand it well enough for it to influence their behavior? Are its implications understood?
Fairness: Is the metric proportionate to the actual goals? Does it provide disproportionate benefits to some groups? Do the behaviors it influences impose costs elsewhere in the system?
Non-corruptibility: Can the metric be gamed by a party with an incentive to cheat? Does it create unfair information asymmetries?
Poorly designed metrics will be exploited.
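One concrete way to pressure test a candidate metric is to check whether a degenerate strategy can win under it. The toy sketch below (hypothetical strategies and numbers, in the spirit of the nail parable) scores three production strategies under three candidate metrics and reports which strategy each metric rewards.

```python
# Hypothetical production strategies: nail count, weight per nail (lbs),
# and how many of the nails are actually usable.
strategies = {
    "tiny nails":     {"count": 10000, "weight_lb": 0.001, "usable": 0},
    "one giant nail": {"count": 1,     "weight_lb": 500.0, "usable": 0},
    "normal nails":   {"count": 2000,  "weight_lb": 0.05,  "usable": 2000},
}

def score_by_count(s):  return s["count"]                    # the factory's first metric
def score_by_weight(s): return s["count"] * s["weight_lb"]   # the factory's second metric
def score_by_usable(s): return s["usable"]                   # closer to the actual goal

# For each candidate metric, find the strategy it rewards most.
winners = {}
for name, metric in [("count", score_by_count),
                     ("weight", score_by_weight),
                     ("usable", score_by_usable)]:
    winners[name] = max(strategies, key=lambda k: metric(strategies[k]))
    print(f"metric={name:6s} -> winning strategy: {winners[name]}")
```

The first two metrics are won by degenerate strategies that produce nothing of value; only the third rewards the behavior the factory actually wants. If a trivial strategy can top your metric, the metric will be exploited.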
So before introducing goals for a team or project in any industry, make sure each goal and metric is coherent and known to all parties who may be affected by it.
In addition, take the time to review the metrics up front to see if they can be exploited and set up long-term checkpoints to identify system and behavior changes that may occur over time.
If you put in the effort to develop thoughtful metrics, you will not solve every problem or satisfy every stakeholder, but you will have fewer flawed metrics and better results overall.