Wednesday, April 6, 2022

How To Innovate On Assessment (And Why States Won't)

At Bellwether Education Partners, Michelle Croft marks Testing Season by wondering why states have not been using their new-found sort-of-freedom-ish-ness under ESSA to innovate with the Big Standardized Test.

Despite rhetoric over the years about innovations in assessments and computer-based delivery, by and large, students’ testing experience in 2022 will parallel students’ testing experience in 2002. The monolith of one largely multiple-choice assessment at the end of the school year remains. And so does the perennial quest to improve student tests.

It's a fair point. States could be getting clever; they aren't. 

Croft cites a couple of possible explanations for the tepidity of the states. First, states are still staggering under the interruption of the BS Test over the past couple of pandemess years. Second is the challenge of meeting the accountability requirements of ESSA. States have the option of applying for the Innovative Assessment Demonstration Authority, but the major impediment is that new testing systems would have to be backwards compatible--in other words, people (well, state and federal education bureaucrats) would have to be able to compare new scores to scores under the old system. That right there is pretty much a game ender.

Croft has a couple of ideas about targets that a new system should aim for. One is to improve score reporting so that it can "meaningfully and easily communicate results to educators and families." Another is to try to improve "teacher classroom assessment literacy." 

Regular readers know my feelings about high stakes testing, which I would call the single largest, most destructive, most terribly toxic scourge on public education in the last 25 years. At the same time, I absolutely believe in accountability for public schools. But I am in absolute agreement with Croft that the state response to ESSA re:accountability has been--well, she says "tepid" and I would say "crappy." So, without getting into the nitty gritty devil-dwelling details, what requirements do I think a new, revamped system would need to have? What goals should we set out to meet?

Don't throw good money after bad. Suck it up and face the unfortunate truth that the last twenty-some years of BS Test data are junk, and there is absolutely no point in trying to pursue backward compatibility. We don't need the new data to be comparable to the old data, because the old data aren't particularly useful to begin with. Now is the perfect time to cut losses and start over.

Figure out what it's for. One of the fatal weaknesses of BS Testing accountability is that a single test was supposed to be useful for a dozen different purposes. That is not how tests work. Every tool is made for a particular purpose; a hammer is for hammering nails, and you cannot also use it to drill holes, screw in screws, cut lumber, paint siding, and comb your hair. But the Big Standardized Test was supposed to be a measure for a myriad of purposes, from informing curricular choices to allowing state educrats to compare schools to evaluating teachers to telling parents how their kids were doing. It should not be a radical notion to declare that you intend to settle on the purpose for a tool before you design and build that tool. 

Note: this discussion should also include some "why" questions, e.g. why do we need to track individual students' results over their careers? There may be good answers to some of these whys, and knowing them would help better focus the instrument we're designing. 

Also note: the discussion of purpose should stick to real things. Croft works back around to the notion that we need to track student and school achievement so that we can allocate resources and support, an argument people have been making for several decades even though that has never, ever been how it works. Low test scores have not gotten schools extra help.

Create assessments that actually assess. Pro tip: whatever purpose you settle on, a multiple choice test will not be the best way to assess it. In fact, an assessment that can be scored by a computer probably isn't it, either, even though so many people seem to really, really want a computer-managed assessment system. 

Don't build it backwards. One of the problems with that insistence on computer assessment is that you immediately put yourself in the business of asking what a computer can assess instead of asking what you need to assess. That has been one of the major failings of the modern assessment system, which has asked what it can assess quickly, simply, and profitably, rather than what needs to be assessed. It's the old story of the drunk looking for their keys under the lamp post even though they lost the keys a hundred yards away-- "I'm looking here because the light's better." 

None of these things are going to happen, mostly because they are time consuming, because they are costly, and because the people making these decisions will get their advice from test manufacturing companies and not actual educators. Quality assessments that can't be scored by an algorithm are expensive and take time (particularly if you let people see them in order to better interpret the results, requiring the test manufacturer to come up with new materials every year). Croft, in another post, notes that she and her husband found accessing and interpreting their child's results daunting (and they are trained psychometricians), but test manufacturers have been resistant to transparency both because of proprietary info concerns and because building a better interface would cost more money. 

There are way better ways to assess schools, teachers, students, etc., than those we've been using (try Jack Schneider's Beyond Test Scores for an example), and lots of reasons to understand that the Big Standardized Test is a terrible solution (read Daniel Koretz's The Testing Charade for many of them). After twenty-five years of this baloney, we really ought to be better at it. 

2 comments:

  1. As a consultant writer for state science exams under NCLB requirements, I received formal training in test development.
    A few thoughts, based on that experience, on the goal of more valid, reliable, and useful standardized tests.

    1) Standardized tests are only as good as the standards that are used to develop test items. I had the opportunity to read science standards from over one dozen states. Bad or poorly written standards almost always (and unavoidably) resulted in bad or confusing test items. A well written science standard almost always resulted in a well written test item.

    2) It is impossible to write a valid, objective (MC) test item when the standard being used is vague and subjective. This was the fundamental problem with the Common Core ELA assessments, which rendered those scores worse than useless. Writing tests that attempt to assess critical thinking or problem-solving skills was, and always will be, a fool's errand.

    3) The scoring of standardized tests is completely arbitrary and is manipulated to meet the demands of the customer. Enough said here.

    4) The dream (hallucination) of using test data to "drive" instruction and as a tool to evaluate teacher effectiveness was nothing more than a scam perpetrated through misleading claims that are easily debunked by scratching the surface.

    No standardized test will ever answer the question, "WHY did the student answer incorrectly?"

    Were they unmotivated? Apathetic? Angry? Tired? Physically ill? Confused by a poorly written test item? Absent during instruction (or prep)? Distracted during instruction? Provided with ineffective or incomplete instruction? Socially promoted without appropriate skills/knowledge? Learning disabled? Dyslexic?

    How to Build a Better Test (If you really want to)

    1) WRITE GOOD STANDARDS
    Tested learning standards must be clear, objective, and age appropriate; they should cover only essential/important procedural or content knowledge. Eliminate all vague, confusing, subjective, or trivial standards.

    2) WRITE OBJECTIVE TESTS using GOOD STANDARDS
    Accept that not everything worth learning can be tested. Be clear about the limitations of testing.

    3) USE SIMPLIFIED yet GRANULAR SCORING

    4) PROVIDE COMPLETE TEST TRANSPARENCY

    5) MAKE TESTS "COUNT" and STUDENTS ACCOUNTABLE

  2. It seems like a no-brainer that we are due for adaptive tests. If we are stuck with computer tests, let them at least be adaptive.

    Oh wait, that won't be accepted because then they'd have to abandon the dictum that all material taught is material from 'grade level'. If the adaptive test dipped down to show a student's true level, while still showing growth, it would undermine the Test-At-All-Costs crowd's desire to cram in only grade-level material even if all the students in the class are two years behind and can't access the material in a reasonable way.
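
    To make the mechanics concrete, here's a minimal sketch of the feedback loop an adaptive test runs. Everything here (the up/down staircase rule, the item bank keyed by difficulty level, the names run_adaptive_test and answer_fn) is a hypothetical stand-in for illustration, not any vendor's actual algorithm; real adaptive tests typically use item response theory rather than a simple staircase.

    def run_adaptive_test(items_by_level, answer_fn, start_level, num_items=20):
        """Administer num_items questions, stepping difficulty up or down.

        items_by_level: dict mapping difficulty level (e.g. grade-level
            equivalent) to a list of unseen items at that level
        answer_fn: callable(item) -> bool, True if the student answers correctly
        """
        levels = sorted(items_by_level)          # available difficulty levels
        level = start_level
        history = []
        for _ in range(num_items):
            item = items_by_level[level].pop()   # next unseen item at this level
            correct = answer_fn(item)
            history.append((level, correct))
            # Step up after a hit, down after a miss -- this is what lets the
            # test "dip down" below grade level to find where the student is.
            if correct:
                level = min(levels[-1], level + 1)
            else:
                level = max(levels[0], level - 1)
        # Estimate ability as the average level over the last few items,
        # after the staircase has (roughly) settled.
        recent = [lvl for lvl, _ in history[-5:]]
        return sum(recent) / len(recent)

    The point of the sketch: the student's estimated level is wherever the staircase settles, which can sit below (or above) the official grade level, and comparing that estimate year over year is what would show growth.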
