Thursday, October 29, 2015

The Only Good Standardized Test

As testing has risen once again to the surface of the ed policy soup, I have found myself in versions of the same conversation, because people who like the idea of standardized testing really like the idea of standardized testing, and because I said the number of necessary standardized tests is zero.

One critic argued that data from tests are the lifeblood of education, and he took exception to my exception. Someone in the comments called me a "union shill." And a reporter asked me what the alternative to standardized testing would be.

It's a fair question. Is there such a thing as a useful standardized test?

But First a Few Words about Opposition

To have this conversation, we have to get one thing out of the way first. If you believe (and I think some reformsters sincerely believe it) that the only reason that teachers oppose the current high stakes test-and-punish status quo is because their self-serving union tells them to, you are blinding yourself to some real issues. First, there is a real gulf between national union leadership and rank and file teachers precisely because union opposition to reformster policies has been tepid at best. For the most part, NEA and AFT leadership is not whipping up opposition to ed reform policies-- they are trying to tamp it down.

The teacher opposition to testing comes first and foremost from teachers who are watching testing become a toxic and destructive element in our classrooms. Testing doesn't just drive the bus, but it drives it straight toward a cliff. It gets in our way, interfering with our ability to deliver real education. It's detrimental to our students. It is educational malpractice. And on top of all that, it is used in many places to deliver a professional verdict on our schools and ourselves with an accuracy no greater than a roll of the dice.

The other opposition to testing comes from the other people who see how it plays out on the ground-- the parents. The Opt Out movement was not created by teachers, is not led by teachers and, in some places, is actually potentially damaging to teachers under the current bizarro test-driven accountability system.

So if you imagine that test opposition is some sort of political ploy engineered top-down by the unions, you are kidding yourself.

None of That Answers the Question, so Let's Get Back To It

If I am such a dedicated opponent of standardized testing, what do I propose as an acceptable substitute?

Before we go any further, we'd better clarify our rather fuzzy terms.

"Standardized" Test?

Come to think of it, we'd better clarify "test" as well. For many folks, it's only a "test" if the student is answering questions. A five page paper assignment, for instance, is usually not called a test. In fact, the more open-ended the assessment, the less likely folks are to call it a "test." In schools, a test (students must prove they know something) is different from tests anywhere else (e.g. if we test the water, it is not up to the water to prove anything, but it is up to the tester to find a way to measure the nature of the water). Requiring students to prove themselves is the very first step in developing a bad assessment.

"Standardized" when applied to a test can mean any or all (well, most) of the following: mass-produced, mass-administered, simultaneously mass-administered, objective, created by a third party, scored by a third party, reported to a third party, formative, summative, norm-referenced or criterion referenced.

This broad palette of definitions means that conversations about standardized testing often run at cross-purposes. When Binis talks about the new performance assessment task piloting in NH, she thinks she's making a case for standardization, and I'm thinking that performance based assessment is pretty much the opposite of standardized testing. There's a lot of this happening in the testing debates-- people arguing unproductively because they have very different things in mind.

Acceptable Substitute for What Purpose

The confusion is further exacerbated by a myriad of stated and unstated purposes for standardized testing. This confusion about purpose has emerged as a huge issue in the ed debates because far too many of the amateurs designing testing policy don't understand this at all. At. All.

It's not just that reformsters argue that you can make the pig gain weight by measuring it. It's that they also assert that the scales used for weighing the pig can also be used to measure the voltage of your house's electrical system and the rate of water flow in the Upper Mississippi.

If we want to find an acceptable test, we have to first declare what the test is going to be used for.

Ranking schools, students and teachers

This is where purpose becomes important, because I can't think of a good test for achieving these goals because I don't think these goals are worth achieving. As a teacher, I don't need to know how my student compares to students in Idaho. I don't need to know that as a parent, either.

Comparing teachers to other teachers, schools to other schools, students to other students-- it's a fool's game. First of all, I can only make the comparison based on narrowly defined criteria. Otherwise I'm reduced to deciding if my insensitive smart flabby artist student ranks lower or higher than my sensitive tall winning cross country racer student. The comparison only has meaning if it is based on narrow criteria (which student answered the most math problems correctly on Tuesday)-- but what good is a narrowly defined comparison?

If I find that my smart, funny wife is not as smart and funny as some other woman, should I be unhappy in my marriage? If this delicious steak is not as delicious as the steak I had last night, should I spit it out? If all the teachers in my school are great, should it be closed down because some other school has greater ones?

The signature feature of a ranking system is that it locates losers. But what decent teacher would stand in front of a class of thirty on the first day of school and say, "Five of you will turn out to be losers"? Testy science wonks like Binis would scold me for saying "loser" and argue for something less loaded and more clinical, but I'm working with students and all the sugar coating in the world will not hide the medicine in this model.

Ranking and rating means that even if everyone is excellent, the least excellent must be marked Below Basic or Underperforming or Just Not Good Enough. A system based on ranking and rating is a system that assumes that in every endeavor, there are people who just aren't good enough. I reject that view of the world, and so I reject any testing system designed to re-inforce that view. If everybody in my classroom does a great job, everybody in my classroom gets an A.

Providing feedback for parents

Here we have a standardization problem because not all parents want the same feedback. Is she getting an A? Is she passing? Is she developing a better grasp of abstract language particularly as used in classic literature? Is she okay? Does she seem happy? These are all types of feedback I've been asked for by some parents. What one measuring tool would satisfy all those questions?

Standardized testing is repeatedly sold with the myth of the clueless parents, the parents who have no idea how their students are doing. But the solution to this problem is transparency, the levels of which can be controlled by the parents.

For example, the electronic gradebook. Our parents can look up their students any time and see exactly what I see when I pull up the gradebook. Some of my parents look every day. Some look never. Some look and then call or email me to ask, "So what exactly was this one assignment?"

When we control the available information, we do parents a disservice. Only revealing the grade at report card time is a disservice. But anyone who has taught at a school with big detailed portfolio gradeless systems can tell the story of the parent who looked at all that data and said, "Look, can you just tell me what grade she's getting?"

Parents deserve just as much feedback as they want. Standardized testing has nothing to do with providing that.

Feedback for teachers

Any decent teacher generates this kind of data daily. Any lousy teacher will have no use for standardized test data even if it arrives on gold-clothed ponies.

You are dodging the question

Okay, yeah. I've laid out my usual assortment of objections to standardized testing, but I still haven't said what would be an acceptable substitute. If you're still here, I'll try to address that now.

What qualities would an acceptable-to-me standardized test have?

If I ever were to find a standardized test that I could live with (or even date regularly), this is what it would look like.

Criterion-based (and so, objective)

I'm going to measure my students against a standard, not against each other. I can use the test to answer the question, "Do my students know how to find verbs?" or "Can my students identify dependent clauses?" If every student in my class can't potentially get a top score, I'm not interested. And if it's not objectively scoreable, it's no help. That means that no standardized test is going to be used for any higher order critical thinking type skills.

(This is part of the whole point of Depth of Knowledge testing love--it creates the illusion that higher order stuff can be scored objectively. But no, it can't).

It is possible to come up with standardized questions. I once had a textbook with great literature questions-- but I still had to evaluate the answers myself.

In fact, I can only see using a standardized test for checking the lowest levels of simple operations-- simple recall, basic application.

As Close to Authentic as Possible

I want a task that actually assesses what it claims to assess. Multiple choice questions don't assess writing skills. Click-and-drag questions don't assess critical thinking.


This ought to go without saying, but if I don't get to see the questions, the answers, and the exact results from my students, then, no, thank you. I can do better myself.


I rarely re-use my own test-like assessments; instead, I make new ones each year to fit the class and the instruction. Particularly when I'm working summative assessments, I'll create something that focuses on the issues that we're struggling with. For instance, if we're solid on spotting infinitive phrases but have trouble picking out gerunds used as direct objects, I can design a test that will help both me and my students. I can adjust assessment to build confidence or prompt a come-to-Jesus moment.

Expertise and Convenience

There are lots of things I don't know. Materials prepared by people who are experts in particular areas are a necessary aid, and those sometimes include assessments. I'm happy to have an expert in a particular field in my classroom.

And at some points, I can use the convenience of having something pre-built to save me some time.

So, the acceptable alternative...?

Man, this ended up far longer than I meant it to, but I wanted to seriously examine my thoughts about this. Do I really think that there are no necessary standardized tests?

Well, Binis is correct when she argues that we all use standardization because we don't completely individualize everything from assessment through evaluation-- but that's a hugely broad definition of "standardized." (She disagrees with my reading of her "ever do this..." list.)

By that standard (har) everything used with two or more students is a standardized test-- and maybe it's useful to think of standardization as a sliding scale. The more we broaden the reach of the assessment, the more students we try to make it reach, and the more we try to make the grading of the test be quick and uniform, the less useful the assessment becomes. A test that you can give to every student in America and which can be scored in just a week will by necessity be inauthentic and measure little. So for best classroom assessment, we stay as close to the individualized specifics end as we possibly can. The more that an assessment is developed in response to specific instruction by a specific teacher of specific students, the more useful that assessment will be in performing the most useful function of any test-- telling students and teachers where their strengths and weaknesses lie.

Yes, that information is not what the policymakers would really like to have. But the information they would like to have is completely useless to me in the classroom (and so far, they've found no reliable method for either collecting or using such information anyway). I'm not convinced that information can be collected by standardized tests anyway, but Good lord in Heaven am I still typing???

The number of necessary standardized tests, the number of tests I really need in my classroom? I still say zero. Mind you, I'm not saying that all standardized tests are an evil plague, and stripped of baseless high-stakes consequences, their plaguiness is greatly reduced. There are standardized tools that are tolerable, and a few that might rise to the level of useful. But necessary? Needed?

Still zero.


  1. I like the example of testing the water. We ought to think of testing in education the same way: to find out the "nature" of the student (strengths, weaknesses, abilities, interests.)

    You're so patient to explain all this over and over! But then, to my mind, patience is one of the necessary qualities of a good teacher.

  2. I'm so tired of "what's the alternative?". If someone is beating you over the head, you might have trouble concentrating. If "experts" pop out of the woodwork and offer you all sorts of programs and products to help with your concentration problem, you'll probably refuse. But, OMG, what's the alternative?! The alternative, of course, is to stop beating you over the head.

  3. Good response, but I think it could be still punchier, because "what's the alternative?" in this case is even more disingenuous than usual here, unless these people are even more stupid than they seem. Standardized testing as applied in the US now is the outlier, both historically and globally. The easiest examples are:

    a) US private schools;
    b) US public schools prior to 15-20 years ago;
    c) educational systems in most other countries, especially before the past 20 years or so.

    In particular, even systems in other countries that do emphasize testing, standardized to varying degrees, do it very differently than we are doing it now (see Linda Darling-Hammond, etc.). Where they are important in other countries, they still aren't annual high stakes events starting in primary school.

    Finally, as test crazy as we are, we don't even use standardized assessment in all subjects.

    What we're doing now is the outlier, an experiment. The alternatives are all around us.

  4. Double amen.
    Besides: what's the alternative to a worthless test that doesn't accurately measure a vaguely-defined ability (like critical thinking) to serve a worthless goal (sort and rank)? It's such a dopey question.
    "Well, maybe the ordeal by water isn't perfect. But you tell me, smartass - do you have a better way to identify how many witches there are in your community?"

  5. Posted my response before I read Tom Hoffman's above - yes, absolutely right. I went to school in the UK and France - never encountered a standardized test, only examinations. The first time I saw all the scaffolded materials and multiple-choice, fill-in-the-blank things, they looked so strange and comic.

  6. This is one of those posts that makes me want to jump up and yell "HA! YES! SEE?!?! SEE?!!?" and run around showing it to people in a maniacal frenzy, hitting them with bricks if they won't hear what's being said.

    In practice, I've found this isn't as effective as I might like - but the urge is still there.

  7. Excellent summary, Peter, of the way standardized tests have warped our education system. The thing that is being obscured by the Obama administration is that standardized tests are of no use to students and parents. They give them no empirical information about what a student knows and how a school should remediate (or that the standardized tests only measure one type of knowledge). Standardized tests are only for ranking and sorting students and schools for a privatization agenda.

    One disagreement: I do not believe the union leadership is simply "tamping down" opposition to corporate ed reform. They are actively collaborating with the corporate reform agenda. Look at the links with my research about this on my blog Defend Public Education! in my article "Beware the Corporate Media Spin on the Obama Administration's Change of Course on Standardized Testing"

  8. What is the alternative to standardized testing? I have a simple answer: trust the teacher.

  9. In Peter's and my continued discourse around the nature of standardized testing, I posted a follow-up post here:

    I would add my two cents to Gregory's comment that all of the alternatives that are fantastic and wonderful and exactly what we want in school, that teachers want to do, involve some degree of standardization around design or construction. Yet, teachers are human which means they are fallible. Attending to features like fairness and bias (which is part of standardized design) isn't a lack of trust in teachers, but rather an acknowledgement that every human carries biases and we need process and protocols to ensure that our biases don't impact students. It remains one of the great challenges of education that a tool designed by the US military to screen out "undesirable" applicants, namely men of color or from poverty, from officer school has become so widely adopted.

    IMO, multiple choice is the dragon we should be slaying. Not standardization.

    1. It's true, we need to have some consensus about what our terminology means. From studying the dictionary, I would say that the definition of "test" is "a way of measuring the knowledge, skills, or aptitudes of an individual or group," and "standardized test" is "a method of judging the extent of conformity to a model (standard or criterion) determined by an authority."

      Peter's definition of "standardized test" seems to be "a test whose standards were determined by someone other than the teacher teaching the students."

      The Department of Education definition of "standardized test" seems to be "a state-wide annual test norm-referenced instead of criterion-referenced."

      Your definition seems to be "the same task/test given to a group under the same conditions such as length of time allotted and using the same grading scale."

      There is nothing in the dictionary that would indicate a "standardized test" would have to be given to more than one person. The only definition of "standardized" meaning "uniform" or "the same" is in reference to a "standard gauge" as in size of railroad ties, for example. There is nothing uniform or the same about students.

    2. The Code of Fair Testing in Education you reference is interesting. It says that tests have to be fair "regardless of age, disability,...linguistic background, or other personal characteristics." "Other personal characteristics," I would think, could be ADD, ADHD, autistic spectrum, visual/spatial learner, audio learner, tactile learner, or learner who suffers from test anxiety and whose operating memory shuts down, among many others. How can giving all students the same kind of test and the same allotted time be fair? Standardizing (uniform process and procedure) does not guarantee "fair and free of bias" and does not "ensure that all test takers are given a comparable [equivalent; virtually identical in effect or function] opportunity to demonstrate what they know and how they can perform." Plus you have the bias of whatever fallible human created the test. Plus you have the unequal conditions of some students sleeping more or less than others the night before, some who had a good breakfast and some who didn't eat anything, and some who have huge personal problems they're wrestling with that day, while others don't.

      The Code also says that test developers have the obligation to provide "evidence that the technical quality, including reliability and validity, of the test meets its intended purpose." Hmm. Nothing like that provided for PARCC or SBAC.

      The Code further says the developer should make "appropriately modified forms of tests or administration procedures available for test takers with disabilities who need special accommodations" and provide "guidelines on reasonable procedures for assessing persons with disabilities who need special accommodations or those with diverse linguistic backgrounds." Oh, no, don't bother. Arne decided that's not allowed!

      The Code also says developers or users (those who administer the tests and make decisions based on them) must provide "the rationale, procedures, and evidence for setting performance standards or passing scores." Whoops! Nope. Cut scores arbitrarily set after the fact.

      And the users should avoid "using tests for purposes other than those recommended by the developer unless there is evidence to support the intended use or interpretation." So who recommended state tests should be used to shut down schools and where's the evidence this is a productive use? And the American Statistical Association stated emphatically that student test scores and VAMs are invalid to use to evaluate individual teachers, so that's the opposite of supporting evidence.

      Whew! That's a whole lot of demonstrable educational malpractice.

    3. I really don't understand your reference to the importance of the link to The Hechinger Report about "assessment literacy." Really? Teachers are "assessment illiterate"? It ain't rocket science. Creating assessments is one of the easiest things we do. The difficult part is deciding what learning strategies to use with which students and the hundreds of judgment calls we have to make each day in class.

      You say you "want there to be a standardized approach to how we collect large-scale evidence of learning in order to inform systemic decisions and policy." What kind of "systemic decisions and policy"? How do tests help inform them? It's just data collection. Go ahead. Measure metrics to your heart's content. It's meaningless to teachers. It doesn't tell us how to teach better. Cognitive psychology, observation, reflection, and experience do.

    4. No, standardization is the hydra. Multiple choice is just one of its many heads. Anyone who understands the deeper purpose of education would understand that.

  10. The real enemy is the normal curve. Even if every single student met the standards, we still would make sure that they are ranked. So if the standard is everyone will know all of their basic single digit multiplication facts and I give them a grade level BS math test you would think if they all get the single digit multiplication items correct then the test would demonstrate that they were proficient. But no- even if every single child gets it correct- they throw in a lot of items that have nothing to do with the standard so they can get their artificially created normal curve. So if the child cannot do double digit multiplication or algebra they will be ranked lower than those who can. Even if double digit multiplication and algebra are not appropriate for that grade and those items are not standards for that grade - nor were they even taught by the teachers, the student is ranked lower, teacher is ranked lower and the school is ranked lower. I cannot understand that if we artificially make sure half the children are below average (kind of the definition of average) then we punish everyone who is below average even though below average is just a manipulation of test items and cut scores. This is not an accurate assessment of standards. Peter is right- if I want to know if a student meets a standard I check the standard- and there is no predetermined number of students who can pass it or not. The normal curve is an unsubstantiated theory- and has no relevance for career or college readiness. Why is that hard for people to understand?

  11. I rarely agree with Greene but I do at least partially agree with him here that tests do have biases. I used to be a tutor for the SAT and GMAT. Many years ago, the GMAT had logic questions related to family hierarchy. For example, what is the relation between you and your grandfather's nephew? However, these were removed when it became clear that such questions unfairly penalized students who did not come from traditional families.

    However, the key part of Greene's piece is wrong. "A reliance on testing means that we make judgments about what behaviors, knowledge and skill is worth measuring. A belief in the perfect awesomeness of standardized tests leads us quickly to the conclusion that only things that can be measured by standardized tests are worth knowing or doing."

    We are NOT saying that only those things measured by tests are worth knowing or doing. And context does matter. I recently took my young daughter to the doctor. As part of a routine visit, they measured her blood pressure. This is an objective measure like our standardized tests. I was surprised that her blood pressure was much higher than I expected. I am accustomed to numbers like 130/80. But context matters. Young children normally have much higher blood pressure than adults. And certainly the blood pressure measure was not the only thing that was measured. The doctor listened to her breathing and felt her stomach for resistance. These things aren't as easily measured quantifiably. But this does NOT mean that the objective measures were irrelevant. Far from it. The measures like blood pressure, temperature, etc. were essential measures that gave the doctor some idea of where my daughter measured relative to peers.

    And if she had a high temperature, that doesn't mean that the doctor was bad. But certainly it is fair to look at measures such as the length of time until the child was well again, incidence of infection, etc. and compare these to peer groups. In fact, Obamacare (which liberals like) was lauded as using such "results oriented" measures. The fact that teachers don't want to be measured in any objective manner (even in context such as looking only at subsets such as improvement in low income children) strongly indicates that they just don't want to be measured ... or that if you do measure students (e.g. NAEP), such results should not reflect on the teachers. Yes, such tests may be biased (though improved) and they may be just a part of what a teacher is trying to do. But just as with doctors they are an important part of evaluating whether that student is getting an effective education from that teacher and that school.

  12. Be very skeptical of anyone who wants to "measure" students by the same criteria. These people value conformity, authority, and neatness, and hardly see individual students in the process.