CURMUDGUCATION: Fordham Institute Almost Figures Out Testing

When someone from the other side of the debate announces that they finally Get It, you can find yourself torn between reactions. A sarcastic "No shit, Sherlock" wrestles with a sincere, "Why that's an excellent insight. Good job."

So here's part of the opening paragraph from a recent post by Dale Chu at the ever-reformy Fordham Institute. It opens with a discussion of a new drive by states to come up with what we're calling "through-year" assessments.

Through-year assessment is appealing because it promises to address multiple longstanding frustrations with existing state tests. But growing enthusiasm for it rests on a shaky assumption: that a single assessment can simultaneously satisfy accountability requirements, improve instruction, provide timely feedback, reduce testing burden, preserve local flexibility, and produce valid statewide comparisons. It slices, it dices, it even juliennes!

~~No shit, Sherlock.~~ I mean, that's an excellent insight. One of the major problems with the state Big Standardized Tests, from all the way back in the No Child Left Behind era, is that they are advertised as slicing, dicing, and julienning.

The BS Tests are supposed to provide teachers with information about how students are doing and where there are "gaps" in instruction. They are supposed to provide schools with data on how well their curricula are designed and working. They are supposed to provide an accurate instrument for evaluating both teacher and school performance. They are supposed to provide useful "customer" information for parents choosing a school. They are supposed to provide "accountability" information for at least three different constituencies-- taxpayers, state government, and federal government. They are supposed to provide a measure that facilitates comparison across space and time. And that's all before we get to less-explicitly-discussed purposes like exerting control over local curricular choices.

As Chu writes-

It is the same dilemma captured in the classic Saturday Night Live commercial for “New Shimmer,” a product advertised as both a floor wax and a dessert topping. Through-year assessment has something of a New Shimmer problem: It is being asked to function simultaneously as an accountability instrument and an instructional tool, as a system for comparability and a system for flexibility—design goals that do not naturally coexist in a single product.

Chu also points out, "Assessment experts have been warning about this dynamic for years." If by "assessment experts" he means "assessment experts, actual teachers, and plenty of parents," then sure. That's an excellent insight.

More instructional utility often comes at the expense of comparability across students and schools. Additional testing windows may yield more information but increase logistical complexity. Faster feedback requires sacrifices elsewhere in the system, most commonly in the depth and breadth of what the assessment can cover.

These tradeoffs are not simply technical problems that can be papered over through more sophisticated psychometric design.

What an excellent insight.

Look, teachers have been saying this for years, all the way back to NCLB and then the Common Core tests that were designed by prioritizing what could be measured over what was important to measure. Teachers have pointed out now for decades that when they are literally forbidden to see the questions that students answered or to know what the students' responses were, but are simply given a single score, usually months after the students have left their classroom-- that is an absolutely useless test from an instructional standpoint (but hey-- protecting testing companies' valuable IP is more important than any educational goals).

Chu frames this whole piece as a discussion of what states are trying to set up for the future. But he comes really close to the most important insight.

The challenge, then, is not a lack of good intentions or even a lack of innovation. It is that state assessment systems are being asked to solve multiple problems at once, requiring choices that satisfy no purpose fully and inevitably sacrifice elements of each. In the process, states may be required to spend more time, money, and resources on unproven assessment models—and probably add to students’ total testing burden—at a time when there is little, if any, appetite for doing so.

Yes, these limits of testing utility and accuracy are a hurdle, and states may "mistake an unavoidable tradeoff for a design problem that can be engineered away."

But Chu isn't describing a possible pitfall for future testing-- he is describing the fatal flaw of the state testing that is going on right now. The current program of BS Testing in the states has not even pretended to grapple with the problem of balancing different purposes for the testing, and so we've been saddled with a tradeoff that exchanges the generation of easily-managed numbers masquerading as data for-- well, everything. In pretending to do everything, state tests do pretty much nothing useful at all.

So yeah-- these are some great insights. Please catch on to the other implications, and could somebody please forward these testing insights on to all the people who think we can fix the "learning recession" by leaning harder into test and punish policies of the past. Because when your tests are too multipurpose to be useful, "test and punish" feels a lot like "punish randomly." And I don’t want to wait years for some folks to have that insight.

Friday, June 12, 2026
