Wednesday, May 14, 2014

Brookings Whips Up Some Teacher Eval Research

The Brown Center on Education Policy at Brookings (the folks who remind us that teachers are everything wrong with education) has released a new report, "Evaluating Teachers with Classroom Observations," a study intended to Tell Us Some Things about teacher evaluation and how to do it best. Is this going to be a trip to the unicorn farm? The very first sentences tell us where this report's heart is:

The evidence is clear: better teachers improve student outcomes, ranging from test scores to college attendance rates to career earnings. Federal policy has begun to catch up with these findings in its recent shift from an effort to ensure that all teachers have traditional credentials to policies intended to incentivize states to evaluate and retain teachers based on their classroom  performance.

We are off once more to search for ways to perfect the teacher evaluation system. Grover J. Whitehurst, Matthew M. Chingos, and Katharine M. Lindquist have laid down twenty-seven serious pages of unicorn farming. Let me do my best to take you on a condensed tour.

Focus on the Human Observation

Their big take-away is this: "Nearly all the opportunities for improvement to teacher evaluation systems are in the area of classroom observations rather than test score gains." In other words, the VAM side of evaluations is as good as it can be, but that pesky human-observing-human piece needs to be tightened up. Yikes.

You see, only some teachers are evaluated on test score gains, but all teachers are observed. And here's one thing they get right-- the human observation can provide feedback that's actually good for something, while test results are too late and too vague to be of any use to teachers at all.

But that leads us to this curious thought: Improvements are needed in how classroom observations are measured if they are to carry the weight they are assigned in teacher evaluation. Human observation needs to be measured in a more sciency way. Their big support for this is the finding that teachers with top students tend to get top observation scores. Their reasoning makes sense-- Danielson, for instance, wants you to show off your teaching of higher-order questioning skills. Would you rather do that with your Honors class, or the class where you're hoping the students just remember what you covered yesterday?

The solution? Make human observations more like VAM. The authors suggest that the same sort of demographic factoring adjustments that are used for VAMs should be used for human observation. And if that strikes you as a lousy idea-- well, it only gets better.

History of Bad Evaluation

The authors run down the history of teacher quality pursuits. NCLB defined "highly qualified" as "possessing certain qualifications," but then researchers figured out how to attach numbers to teacher quality and that made things better because, science. Recap of some of the iffy research claiming that a good second grade teacher will help you grow up to be rich. This has laid groundwork for new, federally-approved-and-pushed-but-not-actually-mandated-because-hey-that-would-be-illegal eval systems. Which can still allow for great variety between school districts, and as we all know, variety is bad juju.

So they decided to go study four districts to see if they could find unicorns there.


1) Evaluation systems are sufficiently reliable and valid to be swell. There is strong year-to-year correlation between scores. They are just as reliable as (I am not making this up) systems used to predict season-to-season performance in professional sports.

I am not a statistics guy, but I have to note that the study drew on "one to three years of data from each district drawn from one or more of the years from 2009 to 2012." Am I crazy, or does that not seem like very much data with which to determine year-to-year consistency?

2) Only some teachers are evaluated by VAM. So none of these four districts were in a location where the art teacher gets credit for third grade math scores.

3) Observation scores are more stable from year to year than VAM. Don't get excited-- that's a bad thing, apparently. The fact that your administrator knows you and your work gives him a preconceived notion of how effective you are. So a long-standing relationship with a boss who knows you and your work is not helpful-- it's just a bias.

They have no absolute answer for a VAM-to-observation ratio in evals, but they recommend properly handled observations be at least 50%.

4) School VAM scores throw things out of whack. Good school VAMs hide bad teachers; bad school VAMs hurt good teachers. These should be scrapped or minimized.

5) Better students = better observation ratings. I can think of a zillion reasons for this, but I don't think many teachers disagree. "Please come observe me when I'm teaching my lowest class of the day," said no teacher ever. Then follow several pages of charts and numerical wonkery to reach the conclusion I mentioned above-- observations should be subjected to the same kind of demographic adjustical jim-crackery that goes into VAMs.

6) That kind of adjustment calls for large sample sizes. Which means getting that data-laden legerdemain on a state level. There are charts and graphs here as well.

7) Outside observers are more predictive of next year's VAM scores than inside ones. Principals are influenced by what they know. What's called for is an outside observer who doesn't know anything. Well, not anything except how to observe characteristics that are predictive of VAM scores. This produces the most hilarious recommendation of all-- two-to-three annual classroom observations of each teacher. Before principals decide to go hide in an ashram, note that at least one of these should be conducted by a know-nothing outsider.

There are certainly Bad Principal situations where some relief from bias would be a Good Thing. But if we are accepting the premise that a principal's knowledge and understanding of her staff is somehow an obstacle to be avoided, we are approaching again the reformy place where human interactions are bad for education and the people who work in public education are all dopes. This isn't a trip to the unicorn farm; it's a trip to the robot unicorn factory. Where money trees grow.


A new generation of teacher evaluation systems seeks to make performance measurement and feedback more rigorous and useful.

Could be worse. They could have brought up grit. But we're going to wind up by reminding everyone that even though variations in a system may be useful in that they offer the chance to study lots of variables in action, mostly they are bad because, chaos.

Their final paragraph starts with this sentence: 

A prime motive behind the move towards meaningful teacher evaluation is to assure greater equity in students’ access to good teachers.

Also, a bicycle, because a vest has no sleeves. Equal access to great teachers may be the stated motivation for the move toward "meaningful" (a meaningless word in this context) teacher evaluation, but what is still missing is the slimmest shred, the slightest sliver, the most shrunken soupcon of proof that a teacher evaluation system would take us one step closer to that goal. Hell, we haven't even proven that "equal access to great teachers" doesn't exist right now! For all we know, we may be following thinky tanks on these ridiculous field trips to the unicorn farm while actual unicorns are back home, grazing in our front yard.


  1. At my school, many of our teachers with "highly effective" VAM scores were getting low "emerging proficient" Danielson observation scores....and many with high Danielson observation scores had low "emerging" VAM scores. (John White said the tougher teacher evaluations were resulting in stronger test scores, hah!) However, the teachers with high VAM scores have been coerced into performing a "dog and pony" show on observation day(s) to simply get an exemplary score for that evaluation measure, and then allowed to go back to teaching the way they always have to secure high VAM scores (because it works). Your article seems to show a reformer frustration with teachers who have the audacity to teach the "old-fashioned" way and still get high VAM, er....have their VAM scores masked. Let's "bring in someone that does not know the teachers to observe" and get it right. So, are we trying to get high test scores...or are we trying to change the way teachers teach even if the scores are high?

  2. Bringing in people who don't know any teachers has its dangers for sure. It also depends on if those observers are teachers themselves, other admins, or complete idiots, I mean, no - I mean idiots. We once had a Walk-Through by principals and heads of departments downtown as part of our "your-scores-are-low-because-you-don't-know-what-you're-doing" reform-type district. Each year they come in once or twice with their clipboards and check-off lists. One year they claimed the 4th grade teacher didn't have a word wall (they were sitting in front of it, observing), another year the same teacher was blasted for HAVING a word wall (they had apparently gone out of favor in a year), still another was reprimanded for spelling and grammar errors on first-draft student work. One year I was reprimanded for allowing a student with a fever and nausea to lie with her head down for the lesson. But the kicker was the observation of the best science teacher I have ever seen. She was teaching 7th graders via the inquiry method. They had not yet gotten the hang of it and were asking for answers to which she would reply with another question to get them on the path to answering their own question. She was fantastic, eventually producing winners of science fairs and science scholarships, city-wide science awards. These "downtown" people rated her as unsatisfactory because she was not answering the kids' questions point blank. Our principal, who appreciated and understood what she was doing, called her down into the meeting with the bigwigs to explain the Inquiry Method of Science Instruction. What idiots!