Thursday, December 3, 2020

Election Polling and the Big Standardized Test

From the Washington Post to the Wall Street Journal to the Atlantic and beyond, writers spent weeks after the election castigating the pollsters for yet another less-than-stellar year. But education writer Larry Ferlazzo moved on to another question: “Could Polling Errors in the 2020 Election Teach Us Something About The Use Of ‘Data’ In Education?”

He’s onto something there. Thinking about the Big Standardized Test from a polling perspective helps illuminate why the tests themselves, and the data-driven philosophy in education, have worked out so poorly.

After all, those big “measure everyone” standardized math and reading tests that states are mandated to give every year are similar to opinion polls: they are trying to measure and quantify what is going on in people’s heads. And to work, there are several things the data-gatherers have to get right.

Ask the right questions.

For an election poll, this seems simple enough. “Which candidate are you going to vote for?” But trying to fold in other information like political leanings, positions on other issues, the strength of the voter’s opinions—that gets harder.

For math and reading tests, this part is trickier. Imagine, for instance, you want to know if a student can find the main idea of any piece of writing. Could you measure that by asking just a couple of multiple choice questions? How many questions do you think it takes to figure out whether or not someone is a good reader?

The respondent has to care enough to make a good faith response.

Whether it’s someone who doesn’t want to talk about their political choices out loud, or someone who is so tired of answering those damned phone calls that they just start saying anything, a poll cannot collect useful and authentic data if respondents don’t care enough to cooperate.

Ditto for a standardized test. Take the NAEP 12th grade results recently released to hand-wringing by Education Secretary Betsy DeVos and others. High school teachers don’t get very excited about these results because the test is given to high school seniors in the spring. Have you ever met a high school senior in the spring? They are firmly focused on the future; a no-stakes standardized test is unlikely to be a major concern. Students currently in K-12 have been subjected to standardized testing every year of their educational careers. While bureaucrats, researchers, and edu-commentators may consider these tests critical and important, for many students they are just a pointless, boring chore. Students are not sitting there thinking, “I must be sure to do my very best on this so that researchers can better inform policy discussions with an accurate picture of my skills and knowledge.”

There must be a good data crunching machine.

Now, this may not actually matter, because the best model or equation in the world cannot get good results out of bad data. But if the model is bad, the results are bad.

In education, we have seen attempts to take test data and crunch it to do things like find “effective” teachers by computing the “value” they have “added” to students. This super-secret special formula has been disavowed by all manner of professionals and even struck down by a federal court, but versions of it are still in use.

One distinction of election polls is that eventually we actually have the election, and the polls are tested against cold, hard reality. Unfortunately, fans of the Big Standardized Test are able to argue ad infinitum that the data are real and accurate and useful. We know that raising test scores does not improve student futures, but testocrats are still asserting that we had better get to testing during pandemic school or all manner of disorder will ensue.

Ferlazzo refers to one other problem of being driven by data that the polls highlight, citing an article by Adrian Chiles, “In a data-obsessed world, the power of observation must not be forgotten.” Chiles tells a story:

In 2017, after a nasty bump between a US warship and an oil tanker, Aron Soerensen, head of maritime technology and regulation at the Baltic and International Maritime Council, said: “Maybe today there’s a bit of a fixation on instruments instead of looking out the window.” There’s a lot of this about, literally and metaphorically.

For teachers, teaching driven by test-generated data is rarely more effective than looking out the window.


  1. I teach novice (8th grade) learners in chemistry, physics, geology, astronomy, and meteorology. The NYS federally mandated science test was written by NYS science teachers and has been a fair, basic skills content oriented test in an MC and ER format with a hands-on lab skills component. I administered the test for all 19 years that it has been required and had the advantage of scoring the tests as well. We also had the advantage of factoring test scores into student (classroom) grades. In addition I was technically trained as a consultant item writer for Measured Progress, writing standardized test items using a dozen or so different state science standards. These science tests were a requirement under NCLB and still are under ESSA. Over the years, there has been significant but reasonable pressure from administration to improve test scores because of the "safe harbor" value.

    I teach in a small, Title 1 city school district that represents the full spectrum of student demographics, with schools that are fully integrated. That means equal educational opportunities and common instructional experiences for all, with a curriculum filled with brand new subjects/topics and a cohort of students with virtually no prior knowledge. Here are some random observations from my NCLB/RTTT/ESSA test-based experiences.

    Despite identical instructional opportunities and experiences, test scores ran the full gamut and they were remarkably predictable.

    Poorly crafted or subjective standards cannot produce valid/reliable tests/scores (talking to you Mr. Coleman)

    Well crafted, objective standards can produce fairly valid/reliable tests/scores

    Cut scores are not arbitrary; instead they are used to skew scores in whichever direction is desired

    Incorrect test responses lack useful feedback for improving instruction because we never knew the exact reasons why.

    The range of reasons for missed items includes:
    Poor attendance
    Chronic inattention/distraction
    Family stressors
    Weak subject aptitude
    Learning disabilities
    Poorly written and/or subjective standard
    Poorly crafted/confusing test item
    Too many standards (specific to this 4 year comprehensive test)
    Unreasonable demands of multi-year, comprehensive testing
    Overlooked curriculum component (instructional problem)
    Poor effort or attendance during test prep sessions
    No-stakes testing
    Test apathy/fatigue (science always followed ELA and math)
    Lack of test score feedback

    My dos centavos

  2. I left out one critical reason:
    The propensity of the novice learner brain to flip-flop competing vocabulary terms or ideas. Despite their best intentions and efforts, non-experts (i.e., all students) tend to misremember and get a little confused (as we all did when learning was new). And when you think about it carefully, from K to 12, virtually all learning is new. It is in fact rare for students to leave the realm of being concrete, novice learners. That's why the whole idea of testing their under-developed brains for 21st century critical thinking skills is beyond ignorant on the part of adults. Basic skills, grade span testing (3, 6, 9, 11) is probably the best alternative to the current madness.

  3. Just one more . . .
    Self-fulfilling prophecy: I suck at __.
    Fill in the blank with math, science, writing, test taking.

    Blame this on the system for failing to teach kids the word "yet" and for reinforcing these impressions with number grading too soon.