Saturday, August 8, 2015

Evaluating Evaluation

When discussing the current flock of evaluation tools, both for teachers and for schools, the defense always seems to work back around to this:

How are we going to know which teachers and schools are doing well or not? It's better than nothing, and we have to do something. 

It's a specious argument. If I'm on a sidewalk bleeding and broken and a random stranger approaches with a running chainsaw offering to chop away, I am not going to say, "Sure, go ahead. It'll be better than nothing." Very often things that are "better than nothing," are, in fact, worse than nothing, and I would argue that VAM, for instance, is as destructive as any sidewalk chainsaw medical practice. And it's worth noting that reformsters never use the "better than nothing" argument to justify leaving an ineffective teacher or school in place.

That said, we cannot simply go to a system in which the taxpayers insert their money into a big, black box marked "School" and trust that everything inside the box is hunky dory (a system used in public schools decades ago and in charter schools currently). "Take our word for it," is not an accountability system.

I have some ideas about how to run an evaluation system, but rather than push those today (still saving them for when I retire to start my million-dollar consulting business), let's do something else. We know there are lots of ideas out there about how to evaluate schools and teachers. How can we tell the good plans from the bad plans? What are the characteristics of a good evaluation system? How can we evaluate the evaluation?

Here are the traits that are essential for a useful, viable, good evaluation system.

Give the community voice

One of the challenges of evaluating schools and teachers is that we have about seventy-eight gadzillion ideas about what schools and teachers are supposed to be doing. Teachers often find themselves in the position of a person who thought she was hired to bake pies and finds herself in trouble for having ugly upholstery on her couch.

When it comes to the question of what the schools and teachers are supposed to be doing, the primary voice that must be heard is the voice of the community that the school serves. In other words, any evaluation system that involves outside folks or government officials coming into a community and saying, "Be quiet. We will tell you what your schools should be doing" is a crappy system.

Give a survey, form a committee, do regular outreach, put community leaders in positions of power-- whatever you do, your evaluation system must be based on the priorities set and chosen by all the members of the community. If some guy in the state capital or a policy making group thinks those priorities are "wrong," that's tough. Welcome to democracy.

Embrace the chaos

Somewhere between the deeply sainted Mrs. McAwesometeach and the widely loathed Mrs. O'Suxbuckets are many teachers living in a greyer area (and, honestly, you can still find students who hated the former and who loved the latter). Teacher and school performance vary over time and from student to student, and the complex constellation of skills involved in teaching guarantees that there are a million different ways to be good at the job.

Any system that draws a hard, bright line between the effective and the ineffective is a crappy system, because no such line can be drawn. Can we find individuals on both extremes on whom we can have clear agreement? Probably. But any system that assumes we can clearly and decisively sort every single school and teacher is playing a fool's game.

This includes systems that try to distribute teachers or schools on a bell curve. The bell curve guarantees that, even if all the teachers in a school are awesome, some of them will be labeled "sucky" by the system. This also includes any system that tries to reduce teacher ratings to a single score and then tries to create a cut-off line for those scores.
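To make that point concrete, here is a minimal sketch of what a forced-distribution system does (the function name and the 10% quota are invented for illustration): because the labels are assigned by relative rank against a fixed quota, somebody always lands in the "sucky" bin, no matter how good everyone's absolute performance is.

```python
def forced_distribution(scores, bottom_quota=0.10):
    """Label the lowest `bottom_quota` fraction 'ineffective' by rank,
    ignoring absolute quality entirely."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    cutoff = max(1, int(len(ranked) * bottom_quota))
    return {name: ("ineffective" if i < cutoff else "effective")
            for i, (name, _score) in enumerate(ranked)}

# Ten teachers, every one scoring 90 or above -- an excellent faculty.
faculty = {f"teacher_{i}": 90 + i for i in range(10)}
labels = forced_distribution(faculty)
print(labels["teacher_0"])  # the lowest scorer is still branded "ineffective"
```

Raise every score by twenty points and the output does not change; the quota, not the quality, decides who gets the label.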

I know some of you want solid hard numberfied data on schools and teachers. You can't have it. You just can't. You can't go through your neighborhood and give each couple a hard data numerical rating of their marriage, and you can't give each of your children a hard data numerical rating on how swell they are and rank their family standing accordingly.

The fundamental basis of education is relationships. You can roughly sort into "probably keepers," "probably not keepers," and "somewhere in the middle." The more precise you attempt to make your system, the more mistakes your system is going to make and the more your system is going to warp and twist and generally screw up your school.

Neither carrots nor sticks, but helping hands

The purpose of an evaluation system is to make the school better. Isn't it? I mean, was there some other purpose I'm missing? No? Good.

The stack-ranking, reward and punish systems completely abdicate systemic responsibility for improvement. They are bosses that say, "Hey, something needs to be fixed here. You, buddy!! You figure out how to fix this right now, or else" or "Hey, this needs to work better. I've got a fiver here for the first person who can get it fixed for me." In both cases, the system sloughs off all responsibility for analyzing, addressing or ameliorating any problems-- it just pushes all of that off on the evaluatees. It is the world's worst coach-- "Hey, you suck. Get back in there and suck less, somehow."

An evaluation system should produce actionable recommendations, and it should result in the necessary assistance to pursue those actions. It does not help to say to a school, "Hey, you are too poor. Be less poor, will you?" Find the problem, and get help for the problem from the appropriate source.

That means that a good evaluation system must also value--

Richness over granules

We've been saying it over and over, but it needs to be repeated until policy makers act as if they get it: a single poorly-constructed, narrow standardized test of reading and math does not give us any useful information about teacher performance.

When I was student teaching, my co-op worked with me daily, and my supervisor worked with me weekly. In my first year, that same supervisor worked with me monthly. He knew tons about me in the classroom, not just in terms of pedagogical techniquery, but how I interacted with different kinds of students in different sorts of situations. His knowledge of my skills (or lack thereof) was rich and deep, and resulted in direct coaching that was specifically tailored to me and my needs. It allowed us to address what I needed as a teacher quickly and effectively.

Compare that to someone handing me (or my principal) a bunch of student scores and saying, in effect, "Your kids' scores last year weren't good enough. Make sure they're better this year."

Information that is rich, deep, and personal is the key to driving meaningful improvement and growth. We know that to be true for students; why would it not be true for teachers and schools?

"Multiple measures" are generally a dodge by the same folks who believe that only numbers count as information. Multiple observations. Walk-throughs. Student and alumni interviews and questionnaires. Peer review. Gather a ton of information-- not data, but information. And then, the hardest part.


All of that information has to be weighed, sewn together, and judged by a live local human being.

I understand the desire to get human judgment out of the system. I am well aware that there are Jerks With Power out there, and that a big JWP can make an ungodly mess.

But you cannot create an unbiased system. You can't. Systems that are set in stone and automatically triggered by data points are just a faceless form of human bias, with every mechanized lever of the system an expression of the biases of the person who designed it (and who doesn't actually have to look anybody in the face when the system implements the designer's judgment).

You cannot take judgment out of the system. What you can do is take it out from behind the curtains and machinery that try to obscure it. What you can do is put it in the hands of professionals who understand their field well enough to put the work ahead of personal bias. What you can do is create a system where there are redundancies (many people's judgment is weighed) and counterbalances as simple as having to deliver your judgment face-to-face instead of through a digitized report from faceless software. What you can do is create a system where the people who exercise judgment have to own it.

Your goal is not a system devoid of human judgment, but a system where that judgment reflects the priorities of the community, the realities of the school, teachers and students, and the professionalism of the person making the call. Your goal is a culture of support and excellence and humanity, not one of data, punishment and fear.

Simple enough

Come up with a system that includes all of these features, and I think you may have something worthwhile, something that can actually help grow schools and teachers who are the best they can be. It will be far better than better than nothing.


  1. Peter, I am a relative newbie to your (very much) mostly well-considered musings, which are a welcome breath of fresh air.

    Much of the reform-dikheads' concerns ARE legit; however, they (and the disorganized resistance) ignore several of the basics. I am a long time physics teacher who used to teach a lot of chemistry, so my observations will be from my narrow POV.

    1) Our international competitors have high stakes testing. These occur somewhere between our 6th-8th grade. The students deemed 'non-academic' are shunted off at that point to other career tracks.
    2) Compensatory public ed is really a US idea. We allow violent CONVICTED felons to attend school. I know. I was sucker punched by one. It cost me several thousand dollars to find out that Brandon Cook was found guilty of carrying a firearm as a juvenile onto Dent Middle School and then allowed to attend Spring Valley HS (Columbia SC). Until we are allowed to, yes, get rid of these types of individuals, like our international competitors do, there is no point in discussing improving public schools.
    3) The American Chemical Society (ACS) developed a final exam for HS chem students which was well written, properly normed and realistic in scope. That test largely shaped what is taught as a somewhat standard HS college-prep chem course. Our competition does much the same...PROFESSIONALLY written exams which shape curriculum of the disciplines which are taught. Which leads to my next point...
    4) There is really nothing wrong about 'teaching to the test' as long as that test is really good. The AP physics tests, the ACS test that I mentioned, the AP history tests (at least 'til now) are examples in this country.
    5) US teachers spend more time in front of students than the teachers of nearly all the countries which are held up as examples which the US should emulate. THAT point should be brought up anytime one of the local political hacks mentions "Finland" or "Germany". Eliminating student criminals should be brought up if anyone mentions "Japan" or "China".

  2. Our current evaluation system comes from a dysfunctional business model. Jack Welch as CEO of GE implemented the vitality curve model. It was a means of keeping workers on their toes by firing the lowest 10% ("action plan", anyone?) and rewarding the top 10% with bonuses (merit pay).

    Many large corporations used this model, most notably, Microsoft. However, they no longer use it.

    Employees hated it. It REDUCED collaboration. HR departments tended to hire the mediocre candidates. Would you hire someone who would exceed you on the curve? Those who sucked up to the bosses naturally stayed employed, while those who thought outside the box slid down the curve.

    Sucking up worked because all evaluations are inherently subjective.

    So, instead of our decision makers admitting that this model is flawed and produces mediocre results, they got the bright idea to run schools like a business. Race to the Top brought us the Common Core and so much more. It brought a vitality curve model of teacher evaluation.

    For more, Google Jack Welch vitality curve.

  3. This article has a lot of good thought. I don't have a problem with good feedback. I am hesitant because I've had too many experiences with bad evaluation systems: poorly designed state tests and administrators with personal agendas.

    I would love to have an evaluation system that was valid, and that I respected. Now I'm just happy when they leave me alone and let me teach.