Wednesday, March 2, 2016

Ace That Test? I Think Not.

The full court press for the Big Standardized Test is on, with all manner of spokespersons and PR initiatives trying to convince Americans to welcome the warm, loving embrace of standardized testing. Last week the Boston Globe brought us Yana Weinstein and Megan Smith, a pair of psychology assistant professors who have co-founded Learning Scientists, which appears to be mostly a blog that they've been running for about a month. And say what you like-- they do not appear to be slickly or heavily funded by the Usual Gang of Reformsters.

Their stated goals include lessening test anxiety and decreasing negative views of testing. And the reliably reformy Boston Globe gave them a chance to get their word out. The pair also blogged about material that did not make it through the Globe's edit.

The Testing Effect

Weinstein and Smith are fond of "the testing effect," a somewhat inexact term used to refer to the notion that recalling information helps people retain it. It always makes me want a name for whatever it is that makes some people believe that the only situation in which information is recalled is a test. Hell, it could be called the teaching effect, since we can get the same thing going by having students teach a concept to the rest of the class. Or the writing effect, or the discussion effect. There are many ways to have students sock information in place by recalling it; testing is neither the only nor the best way to go about it.

Things That Make the Learning Scientists Feel Bad 

From their blog, we learn that the LS team feels "awkward" when reading anti-testing writing, and they link to an example from Diane Ravitch. Awkward is an odd way to feel, really. But then, I think their example of a strong defense of testing is a little awkward. They wanted to quote a HuffPost pro-testing piece from Charles Coleman that, they say, addresses problems with the opt out movement "eloquently."

"To put it plainly: white parents from well-funded and highly performing areas are participating in petulant, poorly conceived protests that are ultimately affecting inner-city blacks at schools that need the funding and measures of accountability to ensure any hope of progress in performance." -- Charles F. Coleman Jr.

Ah. So opt outers are white, rich, whiny racists. That is certainly eloquent and well-reasoned support of testing. And let's throw in the counter-reality notion that testing helps poor schools, though after over a decade of test-driven accountability, you'd think supporters could rattle off a list of schools that A) nobody knew were underfunded and underresourced until testing and B) received a boost through extra money and resources after testing. Could it be that no such list actually exists?

Tests Cause Anxiety

The LS duo wants to decrease test anxiety by hammering students with testing all the time, so that it's no longer a big deal. I believe that's true, but not a good idea. Also, parents and teachers should stop saying bad things about the BS Tests, but just keep piling on the happy talk so that students can stop worrying and learn to love the test. All of this, of course, pre-supposes that the BS Tests are actually worthwhile and wonderful and that all the misgivings being expressed by professional educators and parents are-- what? An evil plot? Widespread confusion? The duo seem deeply committed to not admitting that test critics have any point at all. Fools, the lot of them.

Teaching to the Test

The idea that teaching to a test isn’t really teaching implies an almost astounding assumption that standardized tests are filled with meaningless, ill-thought-out questions on irrelevant or arbitrary information. This may be based on the myth that “teachers in the trenches” are being told what to teach by some “experts” who’ve probably never set foot in a “real” classroom.

Actually, it's neither "astounding" nor an "assumption," but, at least in the case of this "defiant" teacher (LS likes to use argument by adjective), my judgment of the test is based on looking at the actual test and using my professional judgment. It's a crappy test, with poorly-constructed questions that, as is generally the case with a standardized test, mostly test the student's ability to figure out what the test manufacturer wants the student to choose for an answer (and of course the fact that students are selecting answers rather than responding to open ended prompts further limits the usefulness of the BS Test).

But LS assert that tests are actually put together by testing experts and well-seasoned real teachers (and you can see the proof in a video put up by a testing manufacturer about how awesome that test manufacturer is, so totally legit). LS note that "defiant teachers" either "fail to realize" this or "choose to ignore" it. In other words, teachers are either dumb or mindlessly opposed to the truth.

Standardized Tests Are Biased

The team notes that bias is an issue with standardized tests, but it's "highly unlikely" that classroom teachers could do any better, so there. Their question-- if we can't trust a big board of experts to come up with an unbiased test, how can we believe that an individual wouldn't do even worse, and how would we hold them accountable?

That's a fair question, but it assumes some purposes for testing that are not in evidence. My classroom tests are there to see how my students have progressed with and grasped the material. I design those tests with my students in mind. I don't, as BS Tests often do, assume that "everybody knows about" the topic of the material, because I know the everybodys in my classroom, so I can make choices accordingly. I can also select prompts and test material that hook directly into their culture and background.

In short, BS Testing bias enters largely because the test is designed to fit an imaginary Generic Student who actually represents the biases of the test manufacturers, while my assessments are designed to fit the very specific group of students in my room. BS Tests are one-size-fits-all. Mine are tailored to fit.

Reformsters may then say, "But if yours are tailored to fit, how can we use them to compare your students to students across the nation?" To which I say, "So what?" You'll need to convince me that there is an actual need to closely compare all students in the nation.

Tests Don't Provide Prompt Feedback

The duo actually agree that tests "have a lot of room for improvement." They even acknowledge that the feedback from the test is not only late, but generally vague and useless. But hey-- tests are going to be totes better when they are all online, an assertion that makes the astonishing assumption that there is no difference between a paper test and a computer test except how the students record their answers.

Big Finish

The wrap up is a final barrage of Wrong Things.

Standardized tests were created to track students’ progress and evaluate schools and teachers. 

Were they? Really? Is it even possible to create a single test that can actually be used for all those purposes? Because just about everyone on the planet not financially invested in the industry has pointed out that using test results to evaluate teachers via VAM-like methods is baloney. And tests need to be manufactured for a particular purpose-- not three or four entirely different ones. So I call shenanigans-- the tests were not created to do all three of those things.

Griping abounds about how these tests are measuring the wrong thing and in the wrong way; but what’s conspicuously absent is any suggestion for how to better measure the effect of education — i.e., learning — on a large scale.

A popular reformster fallacy. If you walk into my hospital room and say, "Well, your blood pressure is terrible, so we are going to chop off your feet," and then I say, "No, I don't want you to chop off my feet. I don't believe it will help, and I like my feet," your appropriate response is not, "Well, then, you'd better tell me what else you want me to chop off instead."

In other words, what is "conspicuously absent" is evidence that there is a need for or value in measuring the effects of education on a large scale. Why do we need to do that? If you want to upend the education system for that purpose, the burden is on you to prove that the purpose is valid and useful.

In the absence of direct measures of learning, we resort to measures of performance.

Since we can't actually measure what we want to measure, we'll measure something else as a proxy and talk about it as if it's the same thing. That is one of the major problems with BS Testing in a nutshell.

And the great thing is: measuring this learning actually causes it to grow. 

And weighing the pig makes it heavier. This is simply not true, "testing effect" notwithstanding.


Via the blog, we know that they wanted to link to this post at Learning Spy, which has some interesting things to say about the difference between learning and performance, including this:

And students are skilled at mimicking what they think teachers want to see and hear. This mimicry might result in learning but often doesn’t.

That's a pretty good explanation of why BS Tests are of so little use-- they are about learning to mimic the behavior required by test manufacturers. But the critical difference between that mimicry on a test and in my classroom is that in my classroom, I can watch for when students are simply mimicking and adjust my instruction and assessment accordingly. A BS Test cannot make any such adjustments, and cannot tell the difference between mimicry and learning at all.

The duo notes that their post is "controversial," and it is in the sense that it's more pro-test baloney, but I suspect that much of their pushback is also a reaction to their barely-disguised disdain for classroom teachers who don't agree with them. They might also consider widening their tool selection ("when your only tool is a hammer, etc...") to include a broader range of approaches beyond the "testing effect." It's a nice trick, and it has its uses, but it's a lousy justification for high stakes BS Testing.


  1. >Weinstein and Smith are fond of "the testing effect" a somewhat inexact term used to refer to the notion that recalling information helps people retain it.<

    They apparently have not seen the actual tests in question. The ELA tests contain virtually zero test items that require the recalling of information.

  2. Is anyone else having any difficulty posting to this or other blogspot site? I can post from my home computer, but on my work computer the comment box is not there and there is no way to bring it up. I don't know if this is a blogspot glitch, a glitch with my work computer or a deliberate move by our IT department. Thoughts?

    1. Dienne,

      I find that it is browser specific. Firefox works well, Chrome not so much.

  3. The testing effect is rather old school: answering questions about something enhances memorization of those things. This is as old as writing flashcards, and as it long predates NCLB, no doubt it will be accepted by the commentariat.

    More interesting is the issue of what a grade might mean in Peter's class, in my class, or in any other class. Peter says that his exams are meant to gauge if students have grasped the material. Without knowing the material in Peter's class and knowing how Peter gauges "grasping," it is impossible to assign any meaning at all to the results of Peter's exams. Do the students who pass Peter's exams know how to read? Maybe they do know how to read, maybe not. Even if it means that students are literate in Peter's class in rural Pennsylvania, does passing a high school ELA class at Nelson Island Area High School mean the same thing? (For those that believe that standardized tests mean something, 12% of the students at Nelson Island Area High School are proficient in Math and 7.5% are proficient in reading. For those that think standardized tests have no meaning, well, I guess it means the students at Nelson Island Area High School are not academically any different from other high school graduates.)

    Standardized tests are valuable to all of us who are not in the K-12 classroom every day. The parents who want to think that a grade in a class means something about what a student has learned. The taxpayers who want the students in the community to learn. The employers who want a high school diploma to certify that a student is actually literate and numerate.

    I do agree that BS tests are of little use. What is unclear is if the standardized tests are the BS test or the teacher written exams are the BS test.

    1. The same argument about variation can be made for your doctor. There aren't nationally accepted tests for most health conditions where you are ranked against all other humans of your age in the country. One doctor may test and treat one way, another a different way. There is some overlap but methodologies and even prices are often totally different. Why don't we fight for universal health screening so I can know how my health compares? Because people want personalized medicine that treats them as an individual by a doctor who knows them and don't much care about comparisons to someone in West Virginia or wherever.

      It comes down to trust. We trust that an M.D. means that a certain level of competence has been achieved by our doctors despite the horror stories of near-criminal negligence or incompetence that almost everyone has heard from friends or family. We aren't anywhere near the top in anything in international health rankings, but no one has decided that this is all because our doctors are lazy and stupid. Unlike doctors (or lawyers, accountants, or car mechanics), we don't see teachers as trustworthy professionals, and this is at the base of the testing mania and the push to teacher-proof education.

  4. Excellent, excellent analysis and rebuttals, Peter.

    I note that these psychology instructors are, of course, of the present vogue "data-driven above all" variety.

    Of course the more times you practice recalling information from memory, the longer you're liable to remember it, but there are many ways to do that. There are also many ways to encode/engrave the information in your memory to begin with, and the way you do that is also important for retention, but that's irrelevant to these people.

    These people also think we teachers are stupid because so many of us "still believe" that individual students don't all learn the same way. No studies have as yet been designed correctly to "prove" that individual learning styles exist; ergo, these differences are non-existent. Also, it's not proven that it would be more cost-effective to teach different students different ways than to do other types of intervention, like one-on-one tutoring, so don't even think about it.

    I'm not saying we should use only "congruent" modalities, and try to teach some students only with text, others only visually, others only auditorily, others only kinesthetically. Limiting modalities used is not a good idea because most students learn in a variety of ways; how learning best takes place also depends on the nature of the particular knowledge, skills, and concepts being taught; and practice with modalities that don't play to a particular student's strength, to a certain extent, can strengthen their ability in that modality. But one of the biggest mistakes a teacher can make is to assume that all students learn in the same way she does. So first you have to determine the best way to introduce a topic, according to the nature of the topic and what the learning objectives are, that will work with most students. But you should always use as many modalities as possible. And if what you're doing isn't working with a particular student, then in the one-to-one intervention you should certainly try to figure out what kind of modalities work best with that particular student.

    Of course, if all students learned the same way, only some more slowly than others, then we certainly wouldn't need special ed, everything could be standardized, and everybody could learn on their own, at their own pace, using a computer program, without teachers. Which seems to be the point of denying that everybody doesn't learn the same way.

  5. If you want a scientific explanation for why BS tests...and standards in general, especially one-size-fit-all standards...are not only useless but hugely detrimental, please read the new book The End of Average by Todd Rose. If it doesn't make you rethink the whole premise of public education, nothing will.

  6. I've read a lot of Greene's articles on testing. And it seems his issue is not so much the tests themselves but our use of those tests. He has previously supported the long-standing use of the NAEP tests. Contrary to his point, "I can also select prompts and test material that hook directly into their culture and background", NAEP tests are not customized. Per the NAEP web site, "NAEP asks the same questions and is administered in the same way in every state nationwide, providing a common measure of student progress and making comparisons between states possible."

    Second, I'd suspect that there is a strong correlation between the results of Greene's "customized" tests and his students' scores on standardized tests. See this article for the correlation between PARCC and college grades.

    So really, this isn't a question about standardized tests but rather what they are used for. And on this point, Greene is clear: "So what? You'll need to convince me that there is an actual need to closely compare all students in the nation."

    The answer is clear, but it's not one that Greene wants to acknowledge (which is why I doubt he'll post my comment). We test in order to see whether the various inputs being applied result in more or less effective education. Of course, teachers are just one of those inputs. And of course, attribution will not be perfect. But it does help us get an idea of what is working and (just as important) what is not. See the article below for the predictiveness of VAM.

    Greene (and the teachers unions) do not want this because they reject the idea that there is a "good" teacher (or a "bad" one) ... or that there is a good way of measuring what that means. To them, teachers are like artists. The evaluation of one is inherently subjective and best left to fellow artists.

    But education is not art - though there may be an art to teaching (not close to the same thing). Here is a sample question from the NAEP (Math) for 9-yr olds:

    What is the distance all the way around a square that has a side length of 10 inches?

    A. 10 inches
    B. 20 inches
    C. 40 inches
    D. 100 inches

    Please tell me how this is culturally biased? Please tell me how that question on a standardized test would differ substantially from one that any given teacher might ask. Now, you might suggest that some 9-yr olds wouldn't know the formula for the perimeter of a square (and note that they didn't use the word perimeter). But that's the point! It's the answer to Greene's question (about the need to compare all students in the nation)--

    If a significantly higher portion of poor, minority 9-yr olds in Camden, NJ get that question (and similar ones) correct than their peers in Oakland, CA ... then maybe it's worth asking what's going on with elementary school math education in Oakland (or what's going right in Camden)?

    Note that none of that involved removing tenure or any other hot button union issues. But if we have certain expectations for kids of different ages (such as being able to calculate the perimeter of a square), then it makes sense to test the extent to which this is being done ... and then correct what we can and try again ... This is how we improve.