Sunday, May 10, 2015

The Two Critical Testing Questions

The full range of debate about the Big Standardized Tests really comes down to answering two critical questions about the testing.

1. Does the test collect good data?

The whole justification for the BS Tests is that they will collect all sorts of rich and useful data about students, schools, and educational programs.

I have been amazed at the widespread, childlike, bland faith that many people have in anything called a "standardized test." If it's a "standardized test," then surely it must measure real stuff with accuracy, reliability and validity. Surely, the reasoning goes, they wouldn't be putting the test out there if it weren't really measuring stuff.

But to date, no evidence has appeared that the BS Tests are reliable, valid, or actually measuring anything that they claim to measure. The test contents are locked under a Giant Cone of Secrecy, as if the test were some sort of educational vampire that will evaporate if sunlight hits it. Nor has the data collected by the BS Tests been clearly linked to anything useful. "Well, since she got a great score on the PARCC, we can be assured that she will be a happy, productive, and rich member of society," said nobody, ever.

Nor is the data rich with any level of detail at all. Instead, we get reports that are the equivalent of saying the student was either "Pathetic," "Sad," "Okee dokee-ish," or "Mighty Swell."

Do the BS Tests measure anything other than the students' ability to take the BS Tests? Do the test results actually mean anything? If the test fans can't answer those questions, we're wasting everyone's time.

2. What action is taken with the data?

The tests are supposed to provide data on which to act. Does that-- can that-- happen?

On the classroom level, no. The data is too meager, non-transparent, and just plain late to do anybody any good. "Well, last year you scored Okee dokee-ish because you missed some questions that I'm not allowed to see, so I've customized an educational program to address what I imagine your problem areas used to be," is not a useful thing to say to a student.

But what about identifying schools that need help? Is the data used to help those schools? Not unless by "help" you mean "close" or "take over" or "strip of resources so students can go to a charter instead." Our current system does not identify schools for help; it identifies schools for punishment.

Of course, it's hard to come up with a good action plan based on bad data, which is why we need answers to Question #1 before we can do anything with Question #2.

"We can't fix what we don't measure."

Well, maybe, but it doesn't matter because right now our process is as follows:

1) Hey, your bicycle looks like it's not working right.

2) I've measured the lead content of the paint on the bicycle by squeezing the bouncy part of the seat. Your bike is definitely defective.

3) I have thrown your bicycle in the dumpster.

We aren't measuring anything, and we aren't fixing anything. Outside of that, test-driven accountability is working out very Okee dokee-ish.


  1. The PARCC tests specifically are scaled two grades too high. So when nearly EVERYBODY fails an incorrectly calibrated test, what have we learned?