The National Bureau of Economic Research has just dropped a working paper entitled "Taking Teacher Evaluation to Scale: The Effect of State Reforms on Achievement and Attainment," and you can read the whole 72 pages of the thing if you wish, but instead, I would recommend a very thoughtful review of the paper by Matthew Di Carlo, a senior research fellow at the Albert Shanker Institute.
"The Rise and Fall of the Teacher Evaluation Empire" spins from the working paper into the decades-long history of attempts to "fix" teacher evaluation somehow and (spoiler alert) why they have consistently failed.
This isn't the first time that the revolution in teacher evaluation has been tagged unsuccessful. Di Carlo tries to break down the whys and wherefores, and I think he gets it mostly right.
Di Carlo notes that evidence on teacher evaluations is "mixed," with no clear pattern: low stakes, high stakes, various versions; some work well, some don't. The key issue for Di Carlo is that we don't know why the ones that work, work.
So, there is some good evidence out there, but it is far from perfectly consistent, and it is still outweighed by what we don’t know about teacher evaluations (including, most crucially, why systems do or do not work).
I have some thoughts, but those can wait for a bit.
One major failing he points to is that the combination of "implicit overpromising" and the "short-sighted policy analysis environment" resulted in a demand that systems be implemented RIGHT NOW and that results be visible IMMEDIATELY. Well, yes. The whole modern ed reform movement has been marked by a manufactured urgency, an insistence that major changes need to be made quickly, an education implementation of the techno-ethic of move fast and break things.
Change doesn't happen quickly in education, for a variety of reasons, not the least of which is that students themselves cannot be turned on a dime. Di Carlo is right on the money here:
So long as every policy needs to harvest quick testing gains to be considered successful, there won’t be many acknowledged successes, many potentially successful policies won’t be tried, and those that are tried will be in danger of being shut down prematurely.
So we've been hammering at this stuff for a while, particularly since No Child Left Behind laid the foundation for a Fire Our Way To Excellence policy approach. If you're old enough, you remember reformsters arguing with a straight face that if 50% of students scored poorly on the reading Big Standardized Test, that meant that 50% of the teachers stunk and should be fired. The notion that teacher rewards and employment decisions could and should be tied to test scores was irresistible to some folks, including outfits like TNTP that ground out fake research materials like the irredeemably dumb "Widget Effect." Besides being a bad policy idea destined to fail, the "throw out the bad apples" policies were brutal on teacher morale, mostly because policymakers adopted a stance of "Assume all teachers stink until they can prove otherwise."
Which was bad enough, but since "prove otherwise" meant "be somehow associated with better student scores on a bad, untested, invalid standardized test" or even "somehow get those bad test data run through a magic VAM formula," teachers too often found themselves caught in a Kafkaesque nightmare in which they were guilty of bad teaching unless "evidence" emerged from the back end of a mysterious and unreliable process over which they had little influence. More nightmare fuel came from the notion that if an evaluation system didn't find lots of bad teachers, it must not be working correctly.
So yeah. I have a few thoughts on why the evaluation revolution didn't pan out.
And yes, this is a good time to reiterate that as long as "success" is measured by scores on a mediocre test, it will be hard to achieve and, more importantly, counter-productive to pursue.
Di Carlo argues that there's no evidence that any particular approach is a winner at scale. He also offers four reasons that he thinks the "rate harder and rank more rigorously" argument is baloney.
First, there's plenty of evidence that evaluations can work without any ratings attached to them at all. Second, teachers will respond to a second-highest rating. Third, when you rush these systems into schools with such speed that nobody has any reason to trust them, administrators will shy away from handing out the low ratings. Fourth, nothing matters if the evaluation does not come with actionable feedback. In other words, "You suck. Go do something about that." does not get you an improved teaching staff.
So "can teacher evaluation reform be salvaged?"
Di Carlo points out that evaluation systems work by changing behavior, which they do either by 1) changing the person in the job into another person by hiring and firing or 2) helping the person in the job do better work. For that to work, the system has to be credible and trusted. Di Carlo notes that all means of improvement are voluntary. Hence the need for some kind of useful feedback.
I'd say that useful feedback is one of the two critical elements in teacher evaluation. What's the other?
Di Carlo just about has it figured out:
... there is consistent evidence that principals—their training, the time and resources they have to conduct observations, the culture they create—are vital to the success of evaluations and accountability systems.
That's it. That's everything. It's not "vital"--it's the only thing that really matters. If you have a building principal whose professional judgment you trust, the format and elements of the evaluation can be configured any number of ways, and it won't really matter. Conversely, the best evaluation system in the world will be worthless junk in the hands of a principal that teachers can't trust.
States can certainly screw this up, mostly by creating an evaluation system that is so regimented that it ties the principal's hands. Such a system won't make an untrustworthy administrator into a trusted one, but it will keep a trusted principal from being able to use their judgment to steer the process into a useful approach. Likewise, swamping a principal in an evaluation system so detailed and time-consuming that for three months of the year they can't get any of the rest of their job done--that system does not help anyone. Neither does mindless "this is fine" happy talk.
I am absolutely a fan of meaningful, useful teacher evaluation. I have yet to meet a teacher who particularly enjoys working with a teacher who is not pulling her weight. And I am a W. Edwards Deming fan--you get better work out of your people in an environment heavy on trust, and you do the opposite by relying on fear and punishment (and to be clear, when you create a system where getting a living wage depends on being "rewarded" for doing well, you are creating an environment based on punishment).
A system that depends on threats and punishment and fear will not create excellence in the school, will not help teachers improve, will not create a healthy environment in which students can learn (see also: the old management maxim that your employees will treat your customers the way you treat your employees).
But for at least twenty years, we have let the fans of fear and punishment direct the course of teacher evaluation. They have, predictably, failed to achieve any of their promised goals. It's time for them to step back and let other approaches prevail.