Thursday, August 9, 2018

Before We Evaluate Teachers

Policy hounds have been searching for a tool to accurately and fairly evaluate K-12 teachers for years, and to date, they have been largely unsuccessful. That has left us stalled in versions of the following conversations:

Policy leaders: We are going to evaluate teachers by flipping this magical coin.

Teachers: I really don't want to be evaluated by the flip of a coin, magical or otherwise.

Policy leaders: You teachers! You're all opposed to evaluation and accountability.

This is not a useful conversation.

The root problem with the current state of teacher evaluation is that we never had the necessary conversations about what we think it is for. The old system basically said, "We'll hire someone to be the teachers' boss. If that person is happy with the individual teachers, that'll be just fine." If the boss is okay, that system works okay. But if the boss is not so great, that system works out poorly for someone-- taxpayers, teachers, students, someone.

Contrary to what some claim, teachers are fine with accountability. Teachers aren't very happy about teaching next door to Mrs. McAwful. Yes, teachers' unions defend bad teachers for the same reason defense lawyers defend bad criminals-- because the alternative is a system in which powerful people can hurt others at will. A spirited defense of the accused is how we keep people in power accountable. Nevertheless, teachers are perfectly happy to be held accountable by a system that is fair and accurate and that makes sense. Accountability by student standardized test score is not that system.

Before we can design that system, we have to answer some basic questions.

What is it for? Do we want a system that can weed out the dead wood, or do we want a system that helps us find the truly excellent? Do we want it to target teacher weak spots as part of a plan to help them improve? Are we trying to locate teacher-created gaps in the curriculum and instruction? Are we trying to stack rank our entire staff? To make explicit and clear to teachers what exactly we expect from them? This matters because the top and the bottom require different measures. Stack-ranking is hard, corrosive and not always helpful. How do you compare the high school shop teacher to the first grade teacher? How do you get staff to work together when everyone understands that when your colleague wins, you lose? And "taller than everyone else in the room on Tuesday" does not tell you how tall someone actually is.

Who is it for? Are we trying to show local taxpayers that they're getting their money's worth? Are we trying to satisfy state and federal bureaucrats? Is the data to be used in-house by the teachers and administration themselves? Will this information be for private use or for public vivisection?

What are we going to measure, exactly? Any job evaluation is a matter of saying "This is what we're paying you to do." I don't think any taxpaying parent in the country would say, "We are paying teachers to get Junior to bubble in more correct answers on a standardized test," and yet here we are. This is where the "who" part becomes sticky, because bureaucrats aren't big on "Makes students feel positive about themselves" because that's hard to boil down to a data set of deliverables. But if your own child came home from school, crying because the teacher made her feel like a small, useless person, you would not think "No biggie-- that's not what I pay that teacher for, anyway." So what do we want to measure? Imparts content knowledge? Develops skills? Helps student become a better person? Creates a healthy environment? Helps individual student grow as best that student can? Or helps that student grow as measured against some outside metric? We've gone with standardized test scores because they're easy data to crunch-- but that doesn't mean they're useful.

Creating a teacher evaluation system is hard-- really hard. Jason Kamras thought he had really cracked the code with IMPACT in the DC schools, but given time and reflection, it seems to have established a culture in which rampant cheating and misbehavior were encouraged. Kamras has been hired as superintendent of Richmond Public Schools and he has already said that he will not take IMPACT with him. IMPACT is a dud.

What we have in most corners of the country is a system that attempts to do all of these things at once, resting on a standardized test that wasn't designed to help do any of them. And notice-- I have only talked about teacher evaluation. In most states, the same many-dys-functional hydra is also supposed to evaluate the entire school as well. That adds multiple layers of complexity. (For a thoughtful look at one response, pick up Beyond Test Scores by Jack Schneider.)

When you start to contemplate how huge the task is, it is really astonishing how little discussion there has been about how to do it well. And while this debate is raging, there are folks who argue for the CEO model of charter or public school where the Visionary Leader can just hire or fire at will as he sees fit. Which is just like the old evaluation system we wanted to get rid of-- only worse.

Accountability is important, but if we get it wrong, we end up with a system that does more harm than good, which is in fact where we are. To get to a better place will require a lot of conversation between a full range of stakeholders, and ESSA still keeps districts' hands tied more than is healthy. But somehow we have to move beyond the flip of a magical coin.


  1. Two Points:
    1. Measures vs Outcomes - "I don't think any taxpaying parent in the country would say, 'We are paying teachers to get Junior to bubble in more correct answers on a standardized test.'" No, and you are not paying your pediatrician to get the thermometer reading to decline either. We are paying the pediatrician to cure your child of their fever. And the thermometer is one significant measure we use to do so. Just like standardized tests used for teacher evaluation, the thermometer is not a perfect measure. However, it's objective and a pretty good indicator.

    2. Good is not the enemy of the perfect - "We'll hire someone to be the teachers' boss. If that person is happy with the individual teachers, that'll be just fine." When did we ever determine that this "old system" met all the varied criteria that you outline? And no, it's not good enough to say that because that system was used for many years, the onus is on the reformers to show the alternative is better.

    Most of what you outline is part of the same thing. Finding the stars and finding the "dead wood" is consistent with some type of ranking. Now, what you do with this information (e.g. weeding out "dead wood" or working on teacher development for those struggling) is a different question. And you could likewise ask that question in the old system. But right now, we're just talking about step 1 - measuring teacher effectiveness. And if we can't answer that question, then what exactly are teachers doing in the classroom in the first place?

  2. I'm reminded of an old tenet in quantum mechanics: "The presence of an observer changes the results of an experiment." And so it is in putting the focus on teaching to the test rather than teaching the student. You can measure interminably in such a milieu and all it will do is pervert results away from what is desired towards what is mandated.