The Google Ngram Viewer, which instantly searches through thousands of scanned books and other publications, provides a rough but telling portrait of changes in our culture. Set the parameters by years, type in a term or phrase, and up pops a graph showing the incidence of the words selected from 1800 to the present. Look up “gender”, for example, and you will see a line that curves upward around 1972; the slope becomes steeper around 1980, reaches its peak in 2000, and afterwards declines gently. Type in “accountability” and behold a line that begins to curve upward around 1965, with an increasingly steep upward slope after 1985. So too with “metrics”, whose steep increase starts around 1985. “Benchmarks” follows the same pattern, as does “performance indicators.” But unlike “gender”, the lines for “accountability”, “metrics”, “benchmarks”, and “performance indicators” are all still on the upswing.
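Such queries can also be reproduced programmatically. The Viewer exposes an unofficial JSON endpoint; the sketch below, in Python, is illustrative only, since the endpoint is undocumented and its parameters (including the corpus name) are assumptions based on how the public interface behaves.

```python
# A minimal sketch of querying the Ngram Viewer's unofficial JSON endpoint
# from Python. The endpoint and its parameters are undocumented and may
# change without notice; treat this as illustrative, not a stable API.
import requests

NGRAM_URL = "https://books.google.com/ngrams/json"

def ngram_series(phrase, year_start=1800, year_end=2019,
                 corpus="en-2019", smoothing=3):
    """Return {ngram: [yearly relative frequencies]} for the phrase."""
    params = {
        "content": phrase,
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,      # corpus name assumed; older versions of the
                               # endpoint used numeric corpus codes instead
        "smoothing": smoothing,
    }
    resp = requests.get(NGRAM_URL, params=params, timeout=30)
    resp.raise_for_status()
    # The response is a list of {"ngram": ..., "timeseries": [...]} objects.
    return {entry["ngram"]: entry["timeseries"] for entry in resp.json()}

if __name__ == "__main__":
    for term in ("gender", "accountability", "metrics"):
        for ngram, ts in ngram_series(term).items():
            # Compare the final value with the value a decade earlier as a
            # rough check on whether the term is still on the upswing.
            print(f"{ngram}: {ts[-11]:.2e} -> {ts[-1]:.2e}")
```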
Today, “accountability” and its kissing cousins “metrics” and “performance indicators” seem to be, if not on every lip, then on every piece of legislation, and certainly on every policy memo in the Western world. In business, government, non-profit organizations, and education, “accountability” has become a ubiquitous meme—a pattern that repeats itself endlessly, albeit with thousands of localized variations.
The characteristic feature of the culture of accountability is the aspiration to replace judgment with standardized measurement. Judgment is understood as personal, subjective, and self-interested; metrics are supposed to provide information that is hard and objective. The strategy behind the numbers is to improve institutional efficiency by offering rewards to those whose metrics are highest or whose benchmarks have been reached, and by punishing those who fall behind. Policies based on these assumptions have been on the march for decades, hugely enabled in recent years by dramatic technological advances, and as the ever-rising slopes of the Ngram graphs indicate, their assumed truth goes marching on.
The attractions of accountability metrics are apparent. Yet like every culture, the culture of accountability has carved out its own unquestioned sacred space and, as with all arguments from presumed authority, possesses its characteristic blind spots. In this case, the virtues of accountability metrics have been oversold and their costs are underappreciated. It is high time to call accountability and metrics to account.
That might seem a quixotic, if not also a perverse, aspiration. What, after all, could be objectionable about accountability? Should not individuals, departments, and divisions be held to account? And how to do that without counting what they are doing in some standardized, numerical form? How can they be held to firm standards and expectations without providing specific achievement goals, that is, “benchmarks”? And how are people and institutions to be motivated unless rewards are tied to measurable performance? To those in thrall to the culture of accountability, to call its virtues into question is tantamount to championing secrecy, irresponsibility, and, worst of all, imprecision. It is to mark oneself as an enemy of democratic transparency.
To be sure, decision-making based on standardized measurement is often superior to judgment based on personal experience and expertise. Decisions based on big data are useful when the experience of any single practitioner is likely to be too limited to develop an intuitive feel for or reliable measure of efficacy. When a physician confronts the symptoms of a rare disorder, for example, she is better advised to rely on standardized criteria based on the aggregation of many cases. Data-based checklists—standardized procedures for how to proceed under routine or sometimes emergency conditions—have proven valuable in fields as varied as airline operation, rescue squad work, urban policing, and nuclear power plant safety, among a great many others.
Clearly, the attempt to measure performance, however difficult it can be, is intrinsically desirable if what is actually measured is a reasonable proxy for what is intended to be measured. But that is not always the case, and between the two is where the blind spots form.
Measurement schemes are deceptively attractive because they often “prove” themselves through low-hanging fruit. They may indeed identify and help to remedy specific problems: It’s good to know which hospitals have the highest rates of infections, which airlines have the best on-time arrival records, and so on, because it can energize and improve performance. But, in many cases, the extension of standardized measurement may suffer diminished utility and even become counterproductive if sensible pragmatism gives way to metric madness. Measurement can readily become counterproductive when it tries to measure the unmeasurable and quantify the unquantifiable, whether to determine rewards or for other purposes. This tends to be the case as the scale of what is being measured grows while the activity itself becomes functionally differentiated, and when those tasked with doing the measuring are detached organizationally from the activity being measured.
The Genesis of Management as Measurement
While it took off only in the 1980s, the culture of standardized assessment has deep roots. Its precursor was the industrial efficiency movement of the late 19th and early 20th centuries, founded by Frederick Winslow Taylor, an American who coined the term “scientific management.” Taylorism sought to replace the implicit knowledge of workmen with mass-production methods developed, planned, monitored, and controlled by managers.
Taylorism was an engineer’s dream, but the culture of accountability and standardized measurement has taken as much, if not by now more, from the accounting profession. It was Robert McNamara, an accountant who at the age of 24 became the youngest professor at the Harvard Business School, who carried the message of metrics to the largest organization in the United States: the Department of Defense. As Secretary of Defense in charge of prosecuting the war in Vietnam, McNamara championed the metric of “body counts” as a purportedly reliable index of American progress in winning the war. But few of the generals in the field considered the body count a valid measure of success, and many knew the counts to be exaggerated or composed of outright lies. The result, in the pithy formulation of Kenneth Cukier and Viktor Mayer-Schönberger, was a “quagmire of quantification.”1
The decades in which McNamara rose from business school professor to Ford Motor Company executive to Secretary of Defense also saw the transformation of American business schools. In an earlier era, business schools had focused on preparing students for jobs in particular industries and enterprises. From the 1950s on, the business school ideal was the general manager, equipped with a set of skills independent of particular industries. The core of managerial expertise was now defined as a mastery of quantitative methodologies.2
Before that, “expertise” meant the career-long accumulation of knowledge of a specific field, as one rose from rung to rung within the same institution or business. Managers had to know the product in all of its manifestations. But in the latter decades of the 20th century, ever more organizations—corporations, universities, and non-profits—came to be headed by those who moved from one institution to another, often in unrelated fields or in different parts of the country, often with very different local circumstances and institutional cultures. Without detailed knowledge of institutional particularities absorbed through long experience within them, these mobile experts sought to gain a handle on the institutions in their charge through easily grasped, standardized measurements. Process was trump, not product.
They also tended to call in the aid of management consultants, people with even less knowledge of the institutions they were hired to advise than the managers themselves. The consultants, often called efficiency experts, boasted the managerial skills of quantitative analysis, whose first maxim was, “If you can’t measure it, you can’t manage it.” Reliance on numbers and quantitative manipulation not only gave the impression of scientific expertise based on “hard” evidence, it also minimized the need for specific knowledge of the institution to which advice was being sold. In time, the same sort of mindset became manifest at places like the World Bank, where development “experts” believed they could apply economic theory anywhere without regard to cultural and historical differences among societies.
The demand for greater “accountability” reflected the growing distrust of institutions that burst out in the 1960s. It picked up steam in the 1970s, and then achieved a kind of theoretical quintessence in “principal-agent theory”, developed by economists in business schools. That theory called attention to the gap between the purposes of institutions and the people who ran and were employed by them. It focused on the problem of aligning the interests of shareholders in maximum profitability and stock price with the interests of corporate executives, whose priorities might diverge from those goals. Principal-agent theory articulated in abstract terms the general suspicion that those employed in institutions were not to be trusted; that their activity had to be monitored and measured; that those measures needed to be transparent to those without firsthand knowledge of the institutions; and that pecuniary rewards and punishments were the most effective way to motivate “agents.” Here too, numbers were seen as a guarantee of objectivity and as a replacement for intimate knowledge and personal trust.
These trends, which began in the corporate sector, quickly spread beyond it, not only in the United States, but in other parts of the Anglosphere. They were on display in Great Britain where, in 1987, the administration of Margaret Thatcher developed wide-ranging plans for transforming government funding of British universities. The plan called for a plethora of new “performance indicators”, on the evidence of which ministers and their bureaucracies were to decide upon the allocation of funds to particular universities.
The distinguished British historian and political theorist, Elie Kedourie, emerged as one of the plan’s most scathing critics:
After two decades of government-sponsored excess and prodigality, we see now abroad a vague but powerful discontent and impatience with the ways of universities . . . a nameless yearning for some formula or recipe—more science perhaps, more information technology, more questionnaires, more monitoring—which will scientifically (or better, magically) prove that they are not wasting their time, which will hook them up with the humming conveyor-belts of industry.3
Kedourie wondered in astonishment that “a Conservative administration should have embarked on a university policy so much at variance with its proclaimed ideals and objectives”, and concluded: “In order to explain the inexplicable, one is driven to conclude that the policy is an outcome not of conscious decisions, but of an unconscious automatic response to an irresistible spirit of the times.”4 Under the slogan of “efficiency” a great fraud was being perpetrated, Kedourie declared, for
efficiency is not a general and abstract attribute. It is always relative to the object in view. A business is more efficient when its return on the factors employed in production is greater than that of another, comparable one. But a university is not a business.5
In the decade that followed, “accountability” became the buzzword among business leaders, politicians, and policymakers in the United States as well. In the words of the historian of education (and erstwhile Department of Education official), Diane Ravitch, “Governors, corporate executives, the first Bush Administration, and the Clinton Administration agreed: They wanted measurable results; they wanted to know that the tax dollars invested in public education were getting a good return.”6
No Child, Doctor, or Cop Left Behind
In the public sector, the show horse of accountability became “No Child Left Behind” (NCLB), an education act signed into law with bipartisan support by George W. Bush in 2001, whose formal title was, “An act to close the achievement gap with accountability, flexibility, and choice, so that no child is left behind.”
The NCLB legislation grew out of more than a decade of heavy lobbying by business groups concerned about the quality of the workforce, civil rights groups worried about differential group achievement, and educational reformers who demanded national standards, tests, and assessment.7 The benefit of such measures was oversold, in terms little short of utopian.
Thus William Kolberg of the National Alliance of Business asserted that “the establishment of a system of national standards, coupled with assessment, would ensure that every student leaves compulsory school with a demonstrated ability to read, write, compute and perform at world-class levels in general school subjects.” The first fruit of this effort, on the Federal level, was the “Improving America’s Schools Act” adopted under President Clinton in 1994. Meanwhile, in Texas, Governor George W. Bush became a champion of mandated testing and educational accountability, a stance that presaged his support for NCLB.
Under NCLB, states were to test every student in grades 3–8 each year in math, reading, and science. The act was meant to bring all students to “academic proficiency” by 2014, and to ensure that each group of students (including blacks and Hispanics) within each school made “adequate yearly progress” toward proficiency each year. It imposed an escalating series of penalties and sanctions for schools in which the designated groups of students did not make adequate progress. Despite opposition from conservative Republicans antipathetic to the spread of Federal power over education, and from some liberal Democrats, the act was co-sponsored by Senator Edward Kennedy and passed both houses of Congress with majority Republican and Democratic support. Advocates of the reforms maintained that the act would create incentives for improved outcomes by aligning the behavior of teachers, students, and schools with “the performance goals of the system.”8
Yet more than a decade after its implementation, the benefits of the accountability provisions of NCLB remain elusive. Its advocates grasp at any evidence of improvement on any test at any grade in any demographic group for proof of NCLB’s efficacy. But test scores for primary school students have gone up only slightly, and no more quickly than before the legislation was enacted. Its impact on the test scores of high school students has been more limited still.
The unintended consequences of NCLB’s testing-and-accountability regime are more tangible, however, and exemplify many of the characteristic pitfalls of the culture of accountability. Under NCLB, scores on standardized tests are the numerical metric by which success and failure are judged. And the stakes are high for teachers and principals, whose salaries and very jobs depend on this performance indicator. It is no wonder, then, that teachers (encouraged by their principals) divert class time toward the subjects tested—mathematics and English—and away from history, social studies, art, and music. Instruction in math and English is narrowly focused on the skills required by the test rather than broader cognitive processes: Students learn test-taking strategies rather than substantive knowledge. Much class time is devoted to practicing for tests, hardly a source of stimulation for pupils.
Even worse than the perverse incentives involved in “teaching to the test” is the technique of improving average achievement levels by reclassifying weaker students as disabled, thus removing them from the assessment pool. Then there is out-and-out cheating, as teachers alter student answers or toss out tests by students likely to be low scorers, phenomena well documented in Atlanta, Chicago, Cleveland, Houston, Dallas, and other cities. Mayors and governors have diminished the difficulty of tests, or lowered the grades required to pass the test, in order to raise the pass rate and thus demonstrate the success of their educational reforms—and get more Federal money by so doing.
Another effect of NCLB is the demoralization of teachers. Many teachers perceive the regimen created by the culture of accountability as robbing them of their autonomy, and of the ability to use their discretion and creativity in designing and implementing the curriculum. The result has been a wave of early retirements by experienced teachers, and the movement of the more creative ones away from public and toward private schools, which are not bound by NCLB.9
Despite the pitfalls of NCLB, the Obama Administration doubled down on accountability and metrics in K-12 education. In 2009, it introduced “Race to the Top”, which used funds from the American Recovery and Reinvestment Act to induce states “to adopt college- and career-ready standards and assessments; build data systems that measure student growth and success; and link student achievement to teachers and administrators.” This shows what happens these days when accountability metrics do not yield the result desired: Measure more, but differently, until you get the result you want.
Metric madness is not limited to education. Some of the problems evident in NCLB pop up in fields from medicine to policing. Take, for example, the medical quality-improvement program known as “surgical report cards.”
Surgical report cards debuted in the early 1990s to document the relative success rates of surgeons performing coronary bypasses. Here, too, the goal was to improve the performance of hospitals and surgeons by offering objective metrics of their efforts, with an eye toward rewarding the more successful and penalizing the deficient. Such programs have at times led to real improvements. But they, too, have had unintended negative consequences.
In order to boost their scores, for example, some surgeons increasingly turned away patients with worse chances of successful outcomes. Thus sicker patients, whose lives might have been saved by surgery, found it harder to find a surgeon. The patients died, but the metrics improved. A recent scholarly review of the effects of publicly reported accountability metrics in cardiovascular care concludes that:
[T]he experience with public reporting demonstrates little evidence that reporting is associated with improvement on either process of care or patient outcomes for cardiovascular disease, above and beyond quality measurement alone, and demonstrates that avoidance of high risk patients is a real consequence of these programs. Thus, it remains unclear whether the net effect of public reporting is positive or negative.10
Indeed, the metrics used could also have the opposite, but no less harmful, effect: not the denial of care to those who need it, but rather the provision of it to those who don’t. The standard metric is the “thirty-day mortality rule”, whereby if the patient is alive thirty days after the surgical procedure, the procedure is considered a success. As a result, some unknown but not trivial number of patients who do not need the procedure get it, while others with minimal prospects and poor quality of life are kept on life support for at least 31 days.11
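The perversity of the rule is easy to see in miniature. The sketch below is a stylized illustration only, with invented record fields, not a model of any actual reporting system: the metric registers survival past a cutoff and nothing else.

```python
# A stylized illustration of the thirty-day mortality rule's blind spot.
# The record fields are invented for the example; no real reporting
# system is being modeled.
from dataclasses import dataclass

@dataclass
class SurgicalCase:
    days_survived: int      # days alive after the procedure
    needed_surgery: bool    # clinical judgment, invisible to the metric
    quality_of_life: str    # likewise invisible to the metric

def metric_success(case: SurgicalCase, cutoff: int = 30) -> bool:
    """The metric sees one thing only: survival past the cutoff."""
    return case.days_survived >= cutoff

cases = [
    SurgicalCase(days_survived=365, needed_surgery=True,  quality_of_life="good"),
    # An unnecessary procedure still counts as a success...
    SurgicalCase(days_survived=365, needed_surgery=False, quality_of_life="good"),
    # ...as does a patient kept on life support to day 31.
    SurgicalCase(days_survived=31,  needed_surgery=True,  quality_of_life="poor"),
]

print(sum(metric_success(c) for c in cases), "of", len(cases), "count as successes")
```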
Nevertheless, the idea of “payment for performance” remains powerful among employers, insurers, and Medicare. So does the statistically informed demand for compliance, the coding of procedures, and demands for electronic-only medical records, all of which reduce the time that doctors spend actually being doctors.
Or take the recent increased dependence of police departments on statistics as a mark of accountability to politicians and the public. Ed Burns, a former Baltimore police detective for the homicide and narcotics divisions—best known as co-creator of The Wire—has described the process of “juking the stats” by which police officials can manipulate the activity of the department to produce seemingly impressive outcomes. As a detective in the narcotics division, Burns sought to meticulously and patiently build a case against top drug lords, but his superiors were uninterested because of the manpower and investment of time required to produce an arrest. They cared about improving the metrics, and since arresting five teenagers a day for selling drugs on street corners yielded better statistics than arresting one drug kingpin after a multi-year investigation, they favored busting the teens. From their point of view—and from the point of view of the politicians to whom they reported—every arrest carried the same value. Of course, the path that produced the best performance indicators did little to diminish the sale and consumption of narcotics.
The substitution of numerical measurement for experience and judgment is evident in the realms of finance and business as well. It has far-reaching and largely pernicious effects on economic growth.
The cult of “accountability” and metrics in corporate America is linked to key innovations that have turned out to carry unanticipated undersides. One is the fashion for linking pay to performance, which has put a premium on schemes that purport to measure performance. Such schemes tend to produce “hard” numbers that seem reliable but are not. They have created tremendous incentives for CEOs and other executives to devote their creative energies to gaming the metrics—that is, to coming up with schemes that purport to demonstrate productivity or profit by massaging the data in order to boost quarterly earnings or their equivalents. This gaming often comes at the expense of policies that foster long-term growth, such as investing in maintenance or in human capital formation.12 Similarly, providing top executives with stock options as an incentive for accountability has led CEOs to goose the firm’s profits in the short run—when their stock options vest—often at the expense of long-term growth.
A focus on measurable performance indicators can lead managers to neglect tasks for which no clear measures of performance are available, as the organizational scholars Nelson Repenning and Rebecca Henderson have noted.13 Unable to count intangible assets such as reputation, employee satisfaction, motivation, loyalty, trust, and co-operation, those enamored of performance metrics squeeze these assets in the short term, at the expense of the long term. For all these reasons, reliance upon performance metrics is conducive to short-termism, to the “flip it” mentality that is the besetting malady of contemporary American corporations.
The attempt to substitute precise measurement for informed judgment also limits innovation, which necessarily entails guesswork and risk. As business school professors Gary Pisano and Willy Shih have argued:
Most companies are wedded to highly analytical methods for evaluating investment opportunities. Still, it remains enormously hard to assess long-term R&D programs with quantitative techniques. . . . Usually, the data, or even reasonable estimates, are simply not available. Nonetheless, all too often these tools become the ultimate arbiter of what gets funded and what does not. So short-term projects with more predictable outcomes beat out the long-term investments needed to replenish technical and operating capabilities.14
Despite its lack of demonstrable positive effects on public K-12 education, the cult of accountability is increasingly worshipped in higher education as well. Until recently, the main conduit for this mania has been the regional bodies that accredit American universities, which are in turn legitimated by the U.S. Department of Education. These bodies send commissions, whose members include experts in assessment, to every college and university and demand a swelling stream of assessment reports on every aspect of the academic institution. The effect has been to divert the time of department chairs, deans, and provosts to the compiling of reports.
The Obama Administration sought to up the ante by extending direct, governmentally regulated accountability to colleges and universities. The Department of Education announced plans to grade all colleges and universities (public and private), to disaggregate its data by “gender, race-ethnicity and other variables”, and to eventually tie Federal funds to ratings focused on access, affordability, and outcomes—including expected earnings upon graduation. “The public should know how students fare at institutions receiving Federal student aid, and this performance should be considered when we assess our investments and priorities”, said Department of Education Under Secretary Ted Mitchell. “We also need to create incentives for schools to accelerate progress toward the most important goals, like graduating low-income students and holding down costs.”15
What advocates of government accountability metrics overlook is that the very real problem of the increasing cost of college and university education is due in no small part to the massive expansion of administrative cadres, many of whose members are required in order to comply with Federal legal mandates. One predictable effect of the new plan will be to raise further the costs of administration, both by diverting even more faculty time from teaching and research into filling out forms to accumulate data and by increasing the number of administrators hired to gather the forms, analyze the data, and supply the raw material for the government’s metrics.
Some of the suggested metrics worked at cross-purposes, while others were simply absurd. The goal of increasing college graduation rates, for example, is at odds with increasing access, since less-advantaged students tend to be not only financially poorer but also less prepared. The less prepared the students, the less likely they are to graduate on time, or at all. Thus community colleges and other institutions that provide greater access to less prepared students would be penalized for their low graduation rates.
They could, of course, attempt to game the numbers in two ways. They could raise standards for incoming students, increasing their likelihood of graduating—but at the price of access. Or they could respond by lowering the standards for graduation—at the price of educational quality and the market value of a degree. It might be possible to admit more economically, cognitively, and academically ill-prepared students and to ensure that more of them graduate, but this could only be achieved at great expense, which is at odds with another goal of the Department of Education, namely to hold down costs.
Another metric that colleges and universities are to supply is the average earnings of their students after graduation. Not only is this information expensive to gather and highly unreliable, it is also downright distorting. Many of the best students will pursue professional education, ensuring that their earnings will be low for at least the time they remain in graduate school. Thus a graduate who proceeds immediately to become a greeter at Walmart will show a higher score than a fellow student who goes on to medical school.
In June, after a wave of protests from colleges and universities pointing out the impracticality of the Department of Education’s grading scheme and threats by Republicans in Congress to defund the project, the Administration retreated from its plan to create a unified grading scheme, opting to collect “more data than ever before” and make it available on its website. There will be numbers to show, and hence “accountability.”
Philosophical Critiques
Just as the cult of accountability has its devotees on both the political Right and Left, it also has its share of critics on both sides. What’s left of the Marxist Left sees accountability metrics as an example of de-skilling, in which changes in the organization of production brought about by those at the top have the effect of devaluing the skills and experience of those subordinate in the system. Work that is more circumscribed, and from which discretion has been excised by the requirement of meeting narrowly defined goals dictated by others, is also more alienating.
There are also powerful dissections of accountability-as-measurement from conservative and classical liberal thinkers such as Michael Oakeshott, Michael Polanyi, and Friedrich Hayek, whose analyses have recently been rediscovered by James C. Scott, a Yale anthropologist with self-described anarchist predilections.16 All three distinguished between two forms of knowledge, one abstract and formulaic, the other practical and tacit.
Practical or tacit knowledge is the product of experience: It can be learned and taught to some extent, but not conveyed in general formulas. Abstract knowledge, by contrast, is a matter of technique, which, it is assumed, can be systematized, conveyed, and applied. In Oakeshott’s famous example, a sort of abstract knowledge is conveyed by cookbooks, but actually knowing how to make use of such knowledge (“beat an egg”, “whisk the mixture”) requires practical knowledge, based upon experience, that cannot be learned from books.
Oakeshott criticized “rationalists” for assuming that the conduct of human affairs is a matter of applying the right formulas or recipes. Technical knowledge is susceptible to precise formulation, which gives it the appearance of certainty. By contrast, he wrote,
it is a characteristic of practical knowledge that it is not susceptible of formulation of this kind. Its normal expression is in a customary or traditional way of doing things, or, simply, in practice. And this gives it the appearance of imprecision and consequently of uncertainty, of being a matter of opinion, of probability rather than truth.
The rationalist believes in the sovereignty of technique, in which the only form of authentic knowledge is technical knowledge, for it alone satisfies the standard of certainty that marks real knowledge. The error of rationalism, for Oakeshott, is its failure to appreciate the necessity of practical knowledge and of attention to the peculiarity of circumstances.17
Hayek developed a related critique of the pretense of knowledge. He chastised socialist attempts at large-scale economic planning for their “scientism”, by which he meant their attempt to engineer economic life, as if planners were in a position to know all the relevant inputs and outputs that make up life in a complex society. The advantage of the competitive market, he maintained, is that it allows individuals not only to make use of their knowledge of local conditions, but to discover new uses for existing resources or imagine new products and services hitherto unknown and unsuspected. In short, planning not only fails to make use of relevant but dispersed knowledge; it also forecloses the entrepreneurial discovery of how to meet particular needs and of how to generate new goals.18
A good deal of Hayek’s critique of scientism (which he applied broadly to much of modern economics) also applies to the culture of accountability. By setting out in advance a limited and purportedly measurable set of goals, accountability mania truncates the range of actual goals of a business or organization. It also precludes entrepreneurship within organizations, as there may be new goals and purposes worth pursuing that are not part of the metric. To focus efforts on satisfying the metric is to close off the process of innovation and discovery.
Accountability by metrics imposes a simplification not only of goals but of knowledge. In Scott’s insightful formulation, metrics and performance indicators, like many forms of cost-benefit analysis,
manage, through heroic assumptions and an implausible metric for comparing incommensurate variables, to produce a quantitative answer to thorny questions. They achieve impartiality, precision, and replicability at the cost of accuracy.19
Furthermore, “quantification is a powerful agency of standardization because it imposes order on hazy thinking, but this depends on the license it provides to ignore or reconfigure much of what is difficult or obscure”, contends Theodore Porter, an historian of science.20 That which is difficult or obscure is precisely the realm of the practical, tacit knowledge that metrics seek to displace.
Echoing Karl Popper’s famous distinction between clouds and clocks, the British liberal philosopher, Isaiah Berlin, noted in an essay on political judgment that, “to demand or preach mechanical precision, even in principle, in a field incapable of it is to be blind and to mislead others.”21 Indeed, what Berlin says of political judgment applies more broadly: Judgment is a skill at grasping the unique particularities of a situation, and it entails a talent for synthesis rather than analysis, “a capacity for taking in the total pattern of a human situation, of the way in which things hang together.”22 A feel for the whole and a sense for the unique are precisely what numerical metrics cannot supply.
There is a natural human tendency to want to simplify problems by focusing on the most easily measurable elements. But as we have seen, measuring performance, especially when there is much at stake, often leads to unintended consequences. Let us look at the chronic complaints and characteristic flaws of attempts to measure and reward performance.
One recurrent problem was formulated by the social psychologist Donald T. Campbell in 1975, in what has come to be called “Campbell’s Law”: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” In other words, when performance is judged by a single measure, and the stakes are high (keeping one’s job, getting a raise, raising the stock price at the times when stock options are vested), people will focus on satisfying that measure, often at the expense of other, more important organizational goals that are not measured. The result is goal displacement, in which the measured means come to replace the ultimate ends those means ought to serve.
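The dynamic lends itself to a toy model. In the sketch below the functional forms and weights are invented; it claims only to show the mechanism: as the stakes attached to a single measure rise, effort migrates toward the measured task, the metric improves, and the organization's true (partly unmeasured) value declines.

```python
# A toy model of Campbell's Law and goal displacement. An agent divides
# effort between a measured task and an unmeasured one; organizational
# value needs both (diminishing returns). All functional forms and
# weights here are invented for illustration.
import numpy as np

def true_value(e):
    """Organizational value from effort share e on the measured task."""
    return 0.5 * np.sqrt(e) + 0.5 * np.sqrt(1.0 - e)

def chosen_effort(stakes, grid=np.linspace(0.0, 1.0, 1001)):
    """Effort the agent picks when the metric carries the given stakes.

    The agent cares about doing good work, plus whatever reward rides
    on the measured number.
    """
    payoff = true_value(grid) + stakes * grid
    return grid[np.argmax(payoff)]

for stakes in (0.0, 0.5, 2.0):
    e = chosen_effort(stakes)
    print(f"stakes={stakes:.1f}: metric reads {e:.2f}, "
          f"true value {true_value(e):.2f}")
# As stakes rise, the metric improves while true value declines:
# the measured means displace the unmeasured ends.
```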
In British universities, for example, as academic departments came to be rewarded or punished based on the number of books and articles their faculty published, the result was the publication of more, but ever less important, articles, and a decline in the writing of “big books” that require years of research and revision. To take another example from the realm of medicine: Pay-for-performance schemes may lead to a narrow focus on improving the treatment of specific diseases by medical specialists, without creating incentives for the sort of integrated care provided by primary-care physicians. Yet it is integrated care that reduces the incidence of disease and increases longevity in the complex situations of patients with multiple conditions. This added value is overlooked when the focus of measurement is the degree of success in treating individual diseases.23
Then there is the phenomenon already mentioned in passing, variously known as “gaming the metrics” or “goosing the numbers.” Hospitals, for example, engage in “upcoding”: recording diagnoses as more serious than is warranted, thus making patients appear sicker so that risk-adjusted outcomes show greater improvement.24
Gaming the metrics also takes the form of diverting resources from their best long-term uses to achieve measured short-term goals. Take the company that, hoping to be bought out at a multiple of earnings, tries to boost its short-term profit by laying off necessary workers. Or the CEO who smooths out corporate earnings by postponing needed investments in an effort to meet analysts’ expectations for the quarter. Or, as we’ve seen, the police narcotics department that decides to boost its arrest numbers by going after street-corner small fry rather than the few movers and shakers at the top of the drug-dealer pyramid.
One step beyond gaming the metrics is cheating, a phenomenon whose frequency tends to increase directly with the stakes of the metric in question. As we’ve seen, as No Child Left Behind raised the importance for teachers of the test scores of their pupils, teachers and principals in many cities responded by cheating on the test. In China, where tests at the end of high school determine students’ entrance into university and hence their futures, cheating is a mass phenomenon. In short, the very performance indicators that promise “transparency” are often massaged or manipulated. The result is that one form of opacity is replaced by another.
To the debit side of the ledger must also be added the transaction costs of acquiring the metrics: the expenditure of time by those employees tasked with compiling and processing them, not to speak of the time required to actually read and understand them. That is exacerbated by the “reporting imperative”—the perceived need to constantly generate information, even when nothing significant is going on. Sometimes the metric of success is the number and size of the reports generated, as if nothing is accomplished, or even actually real, unless it is extensively documented.
Then there is the degradation of the experience of work that comes from compelling the people in an organization to focus their efforts on the narrow range of what is measured. Edmund Phelps, a Nobel Prize-winning economist, claims in Mass Flourishing: How Grassroots Innovation Created Jobs, Challenge, and Change that one of the virtues of capitalism is its ability to provide “the experience of mental stimulation, the challenge of new problems to solve, the chance to try the new, and the excitement of venturing into the unknown.” That is certainly a possibility under capitalism. But those subject to performance metrics are forced to focus their efforts on narrow goals, imposed by others who often enough do not understand the work they do. For them, mental stimulation is dulled: They decide neither the problems to be solved nor how to solve them, and there is no excitement in venturing into the unknown because the unknown is beyond the measurable and hence off limits. In short, the entrepreneurial element of human nature, which of course extends far beyond the owners of enterprises, is stifled by the cult of accountability. Moreover, by limiting the range of relevant knowledge and acceptable experimentation, abstract schemes of accountability “tend to diminish the skills, agility, initiative, and morale” of those forced to conform their work to schemes imposed from above, as Scott contends. They thus deplete the human capital of the workforce.
One effect of that depletion is to motivate those with greater initiative and enterprise to move out of mainstream, large-scale organizations where the culture of accountable performance prevails. Teachers move out of public schools to private and charter schools. Engineers move out of large corporations to boutique firms. Enterprising government employees become contractors and consultants. There is a healthy element in this. But surely the large-scale organizations of our society are the poorer for driving out those most likely to innovate and initiate. The more that work becomes a matter of filling out forms and filling in boxes by which performance is to be measured and rewarded, the more it will repel those who know how to actually think.
Economists who specialize in measuring economic productivity report that in recent years (2007–12) the only increase in total factor productivity in the American economy was in the IT-producing industries. A question worth asking is to what extent the culture of accountability—with its staggering costs in employee time, morale, and initiative—has itself contributed to economic stagnation.
How should we account for the gap between the meager effectiveness of the cult of accountability and its ubiquity? Given its many drawbacks, most of them obvious to anyone who cares to look, why is it so popular?
An elective affinity exists between a democratic society with substantial social mobility and the culture of measured accountability. In societies with an established trans-generational upper class, members of that class are more likely to feel secure in their positions, to trust one another, and to have imbibed a measure of tacit knowledge about how to govern from their families, giving them a high degree of confidence in their judgments. By contrast, in meritocratic societies with more open and changing elites, those who reach positions of authority are less likely to feel secure in their judgments, and more likely to seek seemingly objective criteria by which to make decisions.
Numerical metrics also give the appearance (if one does not analyze their genesis and relevance too closely) of transparency and objectivity. A good part of their attractiveness is that they appear to be readily understood by all. As the Cambridge literary scholar Stefan Collini has observed, “public debate in modern liberal democracies has come to combine utilitarian valuations with a distrust of procedures that are not mechanically universalizable”, and so the exercise of judgment is replaced by standardized measurements “that can be made intelligible to the average accountant-in-the-street.”25
The quest for numerical metrics of accountability is particularly attractive in cultures marked by low social trust. And mistrust of authority has been a leitmotiv of American culture since the 1960s, on both the Right and the Left. There is a particularly close affinity between it and the populist, egalitarian suspicion of authority based on class, expertise, and background. Thus in politics, administration, and many other fields, numbers are valued precisely because they replace reliance on the subjective, experience-based judgments of those in power. Add to that the fear of litigation and the resulting need to document that decisions have been made on objective grounds, for which numerical indicators are seen as more convincing proof.
The Right and the Left look to metrics of accountability, though not always for the same reasons. On the Right there is the suspicion, sometimes well-founded, that public-sector institutions are being run primarily for the benefit of their employees rather than their clients and constituents, just as the theorists of the principal-agent dilemma had maintained in their critique of corporate America. In schools, police departments, and other government agencies, time-serving has indeed been a reality, even if not the predominant one, and the culture of accountability was an understandable attempt to break the stranglehold of entrenched gerontocracy. That led to the oft-stated conviction that the problem with the non-profit sectors (government, schools, universities) is that they have “no bottom line” and hence no way of accounting for success or failure. To this way of thinking, the solution was to create a substitute bottom line in the form of “objective”—and preferably numerical—measures of standardized processes.
Both the American Left and Right have trouble admitting that some problems are insoluble for all practical purposes, such as differential group achievement. Instead, there is a search for technocratic answers to achievement gaps, as if measuring a problem with precision will lead inexorably to its solution. In such cases metric goals serve as a form of wish fulfillment.
The impetus for accountability in the fields of medicine and education stems from the fact that the relative cost of these services has risen compared to that of most consumer goods. Part of the reason lies in “Baumol’s cost disease”, the phenomenon first identified in 1966 by the economist William Baumol, who observed that increases in productivity in manufacturing, which were largely the product of improved technology and had grown steadily over the past century and more, were not matched by similar improvements in fields such as the arts, where “the work of the performer is an end in itself.” Thus cost per unit of output continues to decline in manufacturing—to the benefit of consumers—while costs in other fields do not, and hence become relatively more expensive. As technological developments and the intensification of global trade have led to the ever-declining costs of most consumer goods, the relative costs of medicine, education, and similar services have become ever more salient, and a focus of public discontent—hence the pressure for greater efficiency and greater accountability. (That improvements in medical technology and more effective pharmaceuticals may legitimately add to costs is less frequently noted. Nor is the fact that education that maintains static educational attainments in an era when students enter school with declining levels of human capital may be a sign of institutional effectiveness rather than stagnancy.)
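The arithmetic behind the cost disease is simple enough to sketch. In the illustration below, the 2 percent growth rate and fifty-year horizon are invented for the example; the point is only that if wages in the stagnant sector must keep pace with wages in the progressive one, its relative unit cost rises mechanically.

```python
# A back-of-the-envelope sketch of Baumol's cost disease. Manufacturing
# productivity compounds; output per hour in the "stagnant" services
# (a string quartet, a classroom, an examination room) does not. If the
# service sector must pay the economy-wide wage, its relative unit cost
# rises mechanically. The 2% rate and horizon are invented for the example.
GROWTH = 0.02  # assumed annual productivity growth in manufacturing

for year in (0, 10, 20, 30, 40, 50):
    productivity = (1 + GROWTH) ** year   # manufacturing output per hour
    wage = productivity                   # wages track productivity gains
    mfg_unit_cost = wage / productivity   # always 1: gains offset wages
    svc_unit_cost = wage / 1.0            # no gains to offset rising wages
    print(f"year {year:2d}: services cost "
          f"{svc_unit_cost / mfg_unit_cost:.1f}x manufacturing")
```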
Other economic forces are also at play. As organizations (companies, universities, government agencies) become larger and more diversified, there is an ever greater remove between top management and those further down the organizational chain engaged in the actual activities to which the organization is dedicated. Add the fact that CEOs, university presidents, and heads of government agencies often move from one organization to another, and the lack of substantive knowledge of those at the top becomes greater. Hence their greater reliance on metrics, and preferably metrics that are similar from one organization to another.
Then there are the cultural peculiarities of some American bureaucracies, which assume that each person can and should be rotated through an ascending hierarchy of posts, both within an organization and among organizations. This militates against developing a depth of expertise that would allow for meaningful evaluation of the significance and qualitative importance of work done by subordinates. Hence here too we intuit the attractiveness of relying on measurable, quantitative criteria.
Last but not least comes the spread of information technology (IT). In the early 1980s, the invention and rapid adoption of the electronic spreadsheet, and the resulting ease of tabulating and manipulating figures, had wide-ranging effects. As Steven Levy, a prescient analyst of the phenomenon, wrote in 1984:
The spreadsheet is a tool, but it is also a worldview—reality by the numbers. . . . Because spreadsheets can do so many important things, those who use them tend to lose sight of the crucial fact that the imaginary businesses that they can create on their computers are just that—imaginary. You can’t really duplicate a business inside a computer, just aspects of a business. And since numbers are the strength of spreadsheets, the aspects that get emphasized are the ones easily embodied in numbers. Intangible factors aren’t so easily quantified.26
Since then, the growing opportunities to collect data and the declining cost of doing so have contributed to the meme that data is the answer for which organizations have to come up with the questions. There is an often unexamined faith that amassing data and sharing it widely within the organization will result in improvements of some sort, even if much information has to be denuded of nuance and context to turn it into easily transferred “data.” Once a company, organization, or government agency has invested substantial sums in IT, a search ensues for standardized data to justify the costs of the system. And even if the results are meager, management needs to declare victory to justify its investment. Thus does the accountability snake seek to eat its own tail.
With all of these forces at play, “accountability”, “metrics”, and “performance indicators” become cultural memes, a pattern in the cultural air that exerts the attraction of a seat on the train of historical progress. No politician, agency chief, university president, or school superintendent wants to be left behind.
There is nothing intrinsically pernicious about counting and measuring. We all tend to project broad-ranging conclusions based on our inevitably limited experience, and measured data can serve as a useful counterpoint to subjective judgment. There are many instances in which metrics are perfectly appropriate, above all in highly standardized sectors of an organization. In every organization, there are legitimate metrics of performance. But knowing which metrics are appropriate and how to use them are ultimately matters of judgment.
Accountability metrics are less likely to be effective when they are imposed from above, using standardized formulas developed by those far removed from the activity being measured. Measurements are more likely to be meaningful when they are developed from the bottom up, not by mining engineers, so to speak, but by those at the coalface. That means asking those with the tacit knowledge that comes from experience to provide suggestions for how to improve productivity and to develop appropriate performance standards.27 Such measurement can be used to inform practitioners of their performance relative to their peers, offering recognition to those who have excelled and assistance to those who have not. Measurement instruments, such as tests, are invaluable; but they are most useful for internal analysis by practitioners rather than external evaluation by publics who may fail to understand their limits. To the extent that they are used to determine continuing employment and pay, they will almost invariably be subject to the gaming of the statistics or outright fraud.
Even when metrics can be useful, however, they are not necessarily worth assembling. One should always consider the trade-offs: the costs in employee time and energy, and their diversion from other purposes. Acquiring data is not free, nor are the costs of obtaining it easily calculated. Had he lived longer, Franz Kafka might have written a short story of accountability mania spiraling downward in an infinite regress until reaching a point of total entropy. The challenge for organizations is to leave room for judgment and initiative, to motivate without stifling autonomy.
In the end, there is no silver bullet, no substitute for actually knowing one’s subject and one’s organization, which is partly a matter of experience and partly a matter of unquantifiable skill. Many matters of great importance are too subject to judgment and interpretation to be solved by standardized metrics. In recent decades, too many politicians, business leaders, policymakers, and academic officials have lost sight of that distinction. To paraphrase Lewis Carroll, “if you don’t know where you’re going, any metric will take you there.”
1Kenneth Cukier and Viktor Mayer-Schönberger, “The Dictatorship of Data”, MIT Technology Review, May 31, 2013.
2See Rakesh Khurana, From Higher Aims to Hired Hands: The Social Transformation of American Business Schools and the Unfulfilled Promise of Management as a Profession (Princeton University Press, 2007).
3Elie Kedourie, Diamonds into Glass: The Government and the Universities (London, 1988), reprinted in Kedourie, “The British Universities under Duress”, Minerva (March 1993), pp. 56–105.
4Kedourie, Perestroika in the Universities (Institute of Economic Affairs Health and Welfare Unit, 1989), pp. x–xi.
5Kedourie, Perestroika, p. 29.
6Diane Ravitch, The Death and Life of the Great American School System (Basic Books, 2011), p. 149.
7See Jesse H. Rhodes, An Education in Politics: The Origins and Evolution of No Child Left Behind (Cornell University Press, 2012), p. 88.
8Thomas S. Dee and Brian Jacob, “The Impact of No Child Left Behind on Student Achievement”, Journal of Policy Analysis and Management (Summer 2011).
9Kenneth Bernstein, “Warnings from the Trenches”, Academe (January/February 2013); see also the powerful testimony of “Teacher of the Year” Anthony J. Mullen, “Teachers Should be Seen and Not Heard”, Education Week, January 7, 2010.
10Paula Chatterjee and Karen E. Joynt, “Do Cardiology Quality Measures Actually Improve Patient Outcomes?”, Journal of the American Heart Association (February 2014).
11Paula Span, “A Surgery Standard Under Fire”, New York Times, March 3, 2015.
12Nelson P. Repenning and Rebecca M. Henderson, “Making the Numbers? ‘Short Termism’ and the Puzzle of Only Occasional Disaster”, Working Paper 11-33, Harvard Business School, 2010.
13See Michael Beer and Mark D. Cannon, “Promise and Peril in Implementing Pay-for-Performance”, Human Resource Management (Spring 2004), pp. 3–48.
14Gary P. Pisano and Willy C. Shih, “Restoring American Competitiveness”, Harvard Business Review (July/August 2009), pp. 11–12.
15Douglas Belkin, “Obama Spells Out College-Ranking Framework”, Wall Street Journal, December 19, 2014.
16James C. Scott, Two Cheers for Anarchism (Princeton University Press, 2012).
17Oakeshott, “Rationalism in Politics”, Rationalism in Politics, and Other Essays, ed. Timothy Fuller (Liberty Fund, 1991).
18Hayek, “The Use of Knowledge in Society”, “The Meaning of Competition”, and “‘Free’ Enterprise and Competitive Order”, Individualism and Economic Order (University of Chicago Press, 1948).
19Scott, Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed (Yale University Press, 1999), p. 426.
20Theodore M. Porter, Trust in Numbers: The Pursuit of Objectivity in Science and Public Life (Princeton University Press, 1995), p. 85.
21Isaiah Berlin, “Political Judgment”, The Sense of Reality: Studies in Ideas and Their History, Henry Hardy, ed. (Farrar, Straus & Giroux, 1998), p. 53.
22Berlin, “Political Judgment”, p. 50.
23Kurt C. Stange and Robert L. Ferrer, “The Paradox of Primary Care”, Annals of Family Medicine (July/August 2009), pp. 293–9.
24Chatterjee and Joynt, “Do Cardiology Quality Measures Actually Improve Patient Outcomes?”
25Stefan Collini, “Against Prodspeak”, English Pasts: Essays in History and Culture (Oxford University Press, 1999), p. 239.
26Levy, “A Spreadsheet Way of Knowledge”, Harper’s (November 1984).
27Thomas Kochan, “Commentary” on “Promise and Peril in Implementing Pay-for-Performance”, Human Resource Management (Spring 2004), pp. 35–7.