Wednesday, March 28, 2018

The Testing Thermostat

Here's another analogy to help understand why test-centered accountability doesn't work well.

All the heat in my house is run by a single thermostat. My house has three stories and a basement. The thermostat is on the first floor. The furnace runs into two out of four rooms on the second floor. There are no furnace runs to the third floor (a converted attic space).

The thermostat is supposed to turn the furnace off and on based on the temperature in the house. But it only measures the temperature in one room. In a second-floor bedroom, the temperature may be uncomfortably cold, but the thermostat doesn't measure that. In the attic room, a space heater may have the room super-warm, but the thermostat doesn't know that. The thermostat is by the front door-- if that door opens and cold air comes pouring in, the thermostat thinks the whole house is cold.

In short, the thermostat is an inaccurate measure of the temperature in my home because it only measures the temp in one place.

It's true that the thermostat is crudely accurate-ish. If the thermostat thinks the house is at 30 degrees, it's probably safe to conclude that the whole house is cold-- although, it could also mean that the front door has blown open. If the thermostat says the house temperature is 90, it's probably safe to assume that the furnace doesn't need to kick on (and the odds are that the house isn't on fire).

The temperature at that single location isn't a completely useless proxy for the temperature at other locations in the house, if we do a lot of correcting and seat-of-pants compensation for the ways in which the system is built to fail. If we start insisting that the temperature reported by the thermostat is the exact same temperature in every other part of the house, we're in trouble.

We're in even more trouble if we start using the thermostat read-out as a proxy for other things entirely, like how comfortable a room is, or how bright the light is in a room, or how nice the room smells, or how loud the room is. It takes a Grand Canyon-sized leap to figure that the reported temperature in one location can be used to determine other factors in other parts of the house.

Likewise, we will fail if we try to use the thermostat read-out to evaluate the efficiency of the power generating and delivery capabilities of our electric company, or evaluate the contractor who built the house (in my case, almost a hundred years ago), or evaluate the health and well-being of the people who live in the house-- or to jump from there to judging the effectiveness of the doctor who treats the people who live in the house, or the medical school that trained that doctor.

At the end of the day, the thermostat really only measures one thing-- the temperature right there, in the place where the thermostat is mounted. To use it to measure any other part of the house, or any other aspect of any other part of the house, or any aspect of the people who live in the other parts of the house-- well, that just means we're moving further and further out on a shaky limb of the Huge Inaccuracy Tree.

In this way, the thermostat is much like the Big Standardized Test-- really only good at measuring one small thing, and not a reliable proxy for anything else. Try this analogy the next time someone asks you why it doesn't make sense to use a single BS Test to measure students, schools, teachers, and the full range of educational activity.

It's not a perfect analogy. It doesn't, for instance, address the example of a thermostat that sets off a bomb instead of starting the furnace. Nor does it address the superstitious belief that the more often you look at a thermostat, the warmer your house becomes. But it's hard to come up with an analogy that captures all the ways in which test-centered accountability is a mess. This will do for a starter.


  1. As I prepare to stop teaching and begin test prep, at the request of my principal, I appreciate your article. The BS test makes me want to jump out a window and my classroom is on the second floor!

  2. But without any thermostat -- set to a mutually agreed scale -- you can't run any central heating at all. I'm not convinced that's better.

    Your analogy would imply we should have a more diffuse heating system, with multiple thermostats, responding to different areas. I agree with that.

    But railing against the BS test makes it seem as if one is against testing of students. Whereas campaigning for better testing requires a very different line of attack.

  3. Nice! Keep it coming. I read every post and retweet most of them.

  4. Your model of a thermostat showing the flaws of testing to measure assessment strikes a chord with me. As an educator, I see my students 180 days a year but the test is just one day. It doesn't factor in if my students had breakfast or had a full night of sleep - the test is just one snapshot of performance. I do however think that high schools do need some public measures of accountability to show progress (college going rates, college entrance scores, completion of certifications, etc.) that would be tied to students showing their knowledge of a scope of an academic career instead of on one test.

    Colleges have noticed the same - high school GPAs are better predictors of grades in college than SAT scores (Geiser & Santelices, 2007). High schools should be able to apply this finding to their own measurements of success and find better alternatives to measure student outcomes than test scores.

    Geiser, S., & Santelices, M. V. (2007). Validity of high-school grades in predicting student success beyond the freshman year: High-school record vs. standardized tests as indicators of four-year college outcomes.