Thursday, October 27, 2016

The Death of Testing Fantasies

It is one of the least surprising research findings ever, confirmed now by at least two studies-- students would do better on the Big Standardized Test if they actually cared about the results.

One of the great fantasies of the testocrats is their belief that the Big Standardized Tests provide useful data. That fantasy is predicated on another fantasy-- that students actually try to do their best on the BS Test. Maybe it's a kind of confirmation bias. Maybe it's a kind of Staring Into Their Own Navels For Too Long bias. But test manufacturers and the policy wonks who love them have so convinced themselves that these tests are super-important and deeply valuable that they tend to believe that students think so, too.

Somehow they imagine a roomful of fourteen-year-olds, faced with a long, tedious standardized test, saying, "Well, this test has absolutely no bearing on any part of my life, but it's really important to me that bureaucrats and policy mavens at the state and federal level have the very best data to work from, so I am going to concentrate hard and give my sincere and heartfelt all to this boring, confusing test that will have no effect on my life whatsoever." Right.

This is not what happens. I often think that we would get some serious BS Test reform in this country if testocrats and bureaucrats and test manufacturers had to sit in the room with the students for the duration of the BS Tests. As I once wrote, if the students don't care, the data aren't there.

There are times when testocrats seem to sense this, though their response is often silly. For instance, back when Pennsylvania was using the PSSA test as our BS Test, state officials decided that students would take the test more seriously if a good score won them a shiny gold sticker on their diploma.

The research suggests that something more than a sticker may be needed. Some British research suggests that cash rewards for good test performance can raise test scores among poor, underperforming students. And then we've got this new, unpublished working paper from researchers John List (University of Chicago), Jeffrey Livingston (Bentley University) and Susanne Neckermann (University of Chicago), which asks via its title the key question-- "Do Students Show What They Know on Standardized Tests?" Here's the abstract, in all its stilted academic-languaged glory:

Standardized tests are widely used to evaluate teacher performance. It is thus crucial that they accurately measure a student’s academic achievement. We conduct a field experiment where students, parents and tutors are incentivized based partially on the results of standardized tests that we constructed. These tests were designed to measure the same skills as the official state standardized tests; however, performance on the official tests was not incented. We find substantial improvement on the incented tests but no improvement on the official tests, calling into question whether students show what they know when they have no incentive to do so.

I skimmed through the full paper, though I admit I just didn't feel incented to examine it carefully because this paper is destined to be published in the Journal of Blindingly Obvious Conclusions. Basically, the researchers paid students to try harder on one pointless test, but found that this did not inspire the students to try harder on other pointless tests for free.

A comparable experiment would be for a parent to pay their teenage daughter to clean up her room, then wait to see if she decided to clean the living room, too. There is some useful information here (finding out if she actually knows how to clean a room), but what we already know about motivation (via both science and common sense) tells us that paying her to clean her room actually makes it less likely that she will clean the living room for free.

And my analogy is not perfect because she actually lives in her room and uses the living room, so she has some connection to the cleaning task. Perhaps it would improve my analogy to make it about two rooms in some stranger's home.

The study played with the results of different rewards for the student lab rats, again, with unsurprising results ("The effects are eliminated completely however when the reward amount is small or payment is delayed by a month").

More problematically, the study's authors do not seem to have fully understood what they were doing, as evidenced by what they believed was their experimental design--

The experiment is designed to evaluate whether these incentives successfully encourage knowledge acquisition, then measure whether this acquisition results in higher ISAT scores. Using a system developed by Discovery Education, the organization which creates the ISAT, we created “probe” tests which are designed to assess the same skills and knowledge that the official standardized tests examine.

No. The experiment was designed, whether you grokked it or not, to determine whether students could be bribed into trying harder on the tests, thereby getting better scores.

The answer is, yes, yes they can, and that result underlines one of the central flaws of test-driven accountability-- if you give students a test that is a pointless exercise in answer-clicking, many will not make any effort to try, and your results are useless crap. The fantasy that BS Tests produce meaningful data is a fantasy that deserves to die.

As for the secondary question raised by these studies-- should we start paying students for test performance-- we already know a thousand reasons that such extrinsic rewards for performance tasks are a Very Bad Idea. So let me leave you with one of the most-linked pieces of work on this blog, Daniel Pink's "Drive."


  1. Some states require you to pass the BS test in order to graduate. This provides incentive, but it also provides stress, panic, and anxiety, which can be counterproductive to getting better results or really showing what students know. And that's beside the fact that students may know, and know how to do, many useful things that are not tested, and the fact that positive, intrinsic motivation (as I believe Pink says) is much more effective and valuable than negative, extrinsic motivation.

  2. I absolutely think testing provides data that can improve education. I see it in my circle of friends. I know it has forced school districts to do SOMETHING about failing campuses when they were hell-bent on doing nothing. I think the test may have to change, but I have no doubt that assessment has a tremendous and important role in improving education: not the least of which is providing teachers with a better idea of what the standards are and what the outcome needs to be for students to meet that standard.

    1. "can improve education"

      Yes, that's true. But the testing practices in almost all of the states make it impossible for teachers, admins & researchers to glean much useful data. The tests are 'black-box', it's a crime in some states for teachers to read their content, most have not been honed over a long period of time (like the APs, the British exit exams, the NY Regents, etc.), and for many exams there are no effective review materials that relate to the exams.

      Add all that to "why should Johnny give a shit" and we are wasting huge amounts of time, along with effectively hemorrhaging large numbers of future, new & veteran public school teachers (I am one of them) because of the huge amount of dung associated with The Holy Tests.

      I have nothing bad to say about well-designed exams shaping curriculum, at least in disciplines where I've taught or am very familiar, examples being the old ACS end of course HS Chemistry exam, the two AP Physics C exams, the AP Calculus AB & BC exams and a few others. BUT these exams are meant for a specific set of students...not for the guy who needs more time to learn how to "think physics" so he can qualify to enter the Navy's Nuclear service.

    2. paprgi, how can assessments give teachers "a better understanding of what the standards are"? The only thing assessments can do is show whether or not students meet the standards, and the necessary outcome can only be seen in the standards themselves. As a teacher, I have always been very clear on what the "standards" -- though I would call them objectives -- should be in my area.

      The only thing that can improve education is to use problem-solving techniques that are not connected to profit-making motives. Many different types of data, among which the results of test assessments like the NAEP is one of myriad components, can be useful in this process, but current assumptions by policy makers are detrimental to this process.