What Is All This Data for?

Although data may be the new oil, this does not mean that its collection is inherently valuable. It is important for organizations to keep Goodhart's Law in mind so that a data-driven mindset does not distort their larger goals.

In 1975, the British economist Charles Goodhart coined the adage now known as Goodhart's Law, which posits that "any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." Thankfully, the anthropologist Marilyn Strathern translated the original from Economese into this pithy dictum:

“When a measure becomes a target, it ceases to be a good measure.”

Goodhart’s insight has only grown more relevant in today's data-driven world, as the cost of collecting data has fallen while computer storage and processing capacity have exploded. When data is used to allocate punishments and rewards, the insights drawn from it are likely to be overhyped, misleading, or even fraudulent. Why? Because people in organizations, given the incentive, respond to measurement in ways that fundamentally change what the metrics mean.

HIGH-STAKES TESTING DATA

Jerry Muller calls the replacement of administrative judgment with standardized numerical performance measures "metric fixation." A recent and famous example is school systems' use of "value-added" testing to evaluate teacher and school performance. Such testing is designed to assess how teachers and schools affect student performance while controlling for outside factors that could influence the results. When districts use those results to evaluate employees for promotions, pay raises, and other financial rewards, the testing becomes "high-stakes."

These data-driven practices can ultimately weaken the diagnostic value of testing for the very teachers and schools that rely on it to inform curriculum design, lesson planning, and class-time allocation. When employment decisions, pay incentives, and other sanctions are linked to testing outcomes, teachers and schools face clear incentives to devote much, or even most, of their classroom time to test-taking strategies rather than subject material. Most test questions favor easily scored multiple-choice answers over deeper engagement with the subject. The short-term gains from teaching to the test come at the expense of long-term learning: most students forget algebra and geometry within five years of graduating high school, and almost all of the math they learned within 25 years. Privileging high-stakes testing outcomes can also produce darker consequences, such as "creaming" (classifying weaker pupils as disabled in order to remove them from the pool of tested students) or outright fraud.

COMPSTAT METRICS

Another area where data-driven rewards and punishments have warped institutional incentives and produced perverse outcomes is policing. In 1994, the New York Police Department introduced CompStat as a means of tracking crime patterns and allocating police resources. The program had mixed results: it was initially successful in lowering New York's violent crime rates, but it was later repurposed as a tool for performance-managing officers.

City government pressure on the NYPD to show crime reductions in the CompStat data led to corruption. Precincts could lower their counts of major crimes in two ways: by downgrading serious crimes to minor offenses, or by over-policing minor, easily arrestable infractions. The latter was made attractive by a flaw in the metric itself: every arrest carried the same weight in CompStat's arrest counts, from a minor drug-possession charge to the capture of a known, violent felon. In measuring crime this way, CompStat shifted from a tool for helping the NYPD fight crime into a scoreboard for judging the success of political leaders.
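
To make that flaw concrete, here is a minimal, hypothetical sketch (the offenses and severity weights are invented for illustration) of how an unweighted arrest count rewards volume regardless of what is actually being policed:

```python
# Hypothetical illustration of CompStat-style counting: every arrest
# contributes equally to the metric, regardless of severity.
arrests = [
    {"offense": "minor drug possession", "severity": 1},
    {"offense": "minor drug possession", "severity": 1},
    {"offense": "minor drug possession", "severity": 1},
    {"offense": "violent felony", "severity": 10},
]

raw_count = len(arrests)                        # what the metric rewards: 4
weighted = sum(a["severity"] for a in arrests)  # a severity-aware view: 13

# An officer can triple raw_count with easy, low-severity arrests
# without touching serious crime -- the metric improves, policing doesn't.
print(raw_count, weighted)
```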

MILITARY AND POLITICAL DATA

The military has also fallen victim to Goodhart's Law in its collection of information. Good data provides feedback about which strategies and tactics are effective during a conflict. But when metrics are used to judge combat effectiveness, there is a risk that the data will be skewed to produce better-looking results.

During the Vietnam War, the U.S. military collected data on the number of Vietnamese dead as a means of convincing the American public that the United States was winning. The body count was U.S. Secretary of Defense Robert McNamara's prized metric, even though most field commanders did not trust its validity as an index of battlefield success. American soldiers were sometimes killed after battles while combing the field to tally enemy casualties, all in the service of higher body counts.

In The Pentagon and the Art of War, Edward Luttwak argues that measures of military output like body counts, battle incidents, and other non-territorial statistics cannot discern whether campaigns are ultimately succeeding. For Luttwak, the variables that mattered most for eventual victory were precisely the non-measurable ones, since an enemy's willingness to keep fighting cannot be quantified. Concentrating on quantifiable military data in the present moment, he argued, came at the expense of the long-term strategic thinking necessary for victory.

The same focus on short-term measures at the expense of long-term success plagues international development. Programs are more likely to receive foreign aid when they can be easily analyzed by bodies such as the United States Office of Management and Budget or the Government Accountability Office. Yet for goals like improving governmental capacity in transitioning democracies or instilling civic trust and civil-service norms in skeptical populations, no one should expect good statistical measures of progress within a few months or years; the programs with the most easily quantifiable short-term results tend to be the least effective in the long run. As a result, organizations such as the U.S. Agency for International Development spend valuable resources collecting and broadcasting data that does not meaningfully show why they deserve each year's funding, and these futile efforts divert time and money from more effective approaches to worldwide development.

Developing countries, for their part, understand that financial assistance is frequently tied to macroeconomic outcomes. They therefore have clear incentives to skew the statistical indicators, such as per-capita gross national income (GNI), population counts, and foreign direct investment, that the United Nations and other international bodies use in deciding where to distribute aid.

Researchers have analyzed discrepancies between countries' GNI figures as published online and in print editions covering the same periods. Because electronic data is easier to revise, the most up-to-date online GNI figures may differ from what appears in the printed World Bank Atlas. Tellingly, once the World Bank had processed these countries' aid applications, the discrepancies were no longer present.
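
A minimal, hypothetical sketch of that kind of discrepancy check (the country names and figures are invented for illustration) might look like this:

```python
# Hypothetical sketch: flag countries whose online GNI figure diverges
# from the print Atlas figure for the same reporting year.
print_gni = {"CountryA": 980, "CountryB": 1250, "CountryC": 460}   # invented
online_gni = {"CountryA": 940, "CountryB": 1250, "CountryC": 430}  # invented

for country, printed in print_gni.items():
    diff = online_gni[country] - printed
    if diff != 0:
        # A revision that conveniently moves a country below an aid
        # threshold is exactly the pattern worth investigating.
        print(f"{country}: online figure revised by {diff} from print")
```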

DATA AND SCIENCE

Goodhart's Law contributes heavily to the replication crisis in empirical academic inquiry. Both the h-index and the journal impact factor are used to measure the overall impact that empirical work has on its field. By their design, these metrics pressure researchers to bury research that is inconclusive or fails to support existing beliefs, and they offer no motivation to replicate or reproduce other scientists' results. Good science relies on replication to verify new findings as more information becomes available over time, but replication studies barely move h-indices and impact factors compared with new, interesting findings that, however doubtful, cut against the prevailing consensus.
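
To see why replication work barely registers, here is a minimal sketch of the standard h-index computation (the citation counts are invented for illustration):

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Citation counts for a researcher's papers, sorted or not.
novel_findings = [50, 40, 10, 6, 5, 2]
print(h_index(novel_findings))        # 5

# Add a low-citation replication study -- the h-index doesn't budge.
print(h_index(novel_findings + [1]))  # still 5
```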

Rewarding output volume over output quality leads to research of diminishing quality and high false-discovery rates, a phenomenon Paul Smaldino and Richard McElreath call "the natural selection of bad science." In his book Science Fictions, Stuart Ritchie demonstrates how metrics tied to institutional incentives that favor poor-quality research can undermine public trust in science. Because h-indices and journal impact factors are built on citation and publication counts, Ritchie observes, scientists and journals face pressure to form citation rings that cite one another's work regardless of quality.

THE STAKES OF BIG DATA

When constructing metrics, it is critical to ask how and why they might mislead as well as how they might illuminate. In The Tyranny of Metrics, Jerry Muller emphasizes the importance of considering how those being measured might respond to the measurement, a question that grows more urgent when rewards or punishments are tied to the metrics collected. It is equally crucial to ask who creates the metrics and for what purpose.

Metrics are most useful to practitioners in low-stakes situations, where they help reveal what processes might be at work and suggest process improvements. A focus on short-term measurement can undermine long-term objectives and should be avoided. Above all, whenever metrics and institutional incentives are combined, organizations would do well to keep Charles Goodhart's warning in mind.
