Sunday, December 14, 2014

Punishing Teachers More Effectively

David Brooks' column praising small miracles made note of a new piece of research from Harvard that argues that while carrots may work better than sticks, the best way to use the carrot is to jam it into the horse's eyeball. (h/t to edushyster for the tip, not the carrot)

Our magic term for the day is "loss aversion," which is a fancy term for "people hate to give stuff up." The paper we'll be looking at is "Enhancing the Efficacy of Teacher Incentives through Loss Aversion: A Field Experiment" written by Roland G. Fryer, Jr. (Harvard University), Steven D. Levitt (The University of Chicago), John List (The University of Chicago), and Sally Sadoff (University of California San Diego). Let's learn some stuff, shall we?


Sigh. We know we're in just great shape when we lead with references to the baloneyfied research that "proves" that a measurable improvement in teacher quality creates the same measurable improvement in student achievement as a decrease in class size (the old "we don't need small classes-- just great teachers" research) and follow it up with Chetty's silly "a good teacher means your kid will grow up to make more money" research. And that's just the first paragraph.

In the second, we get the sideways assumption that VAM is a good measure of teacher quality and that unions make it too hard to get rid of bad teachers. In the third, we lament that merit pay hasn't done any good. "Good" of course means "has students with high test scores." Because when people talk about "good teachers," all they're thinking of is students' scores on standardized tests. That's all we want from teachers, right?

On this foundation of sand and jello, our intrepid researchers set out to build a mansion of teacher improvitude.

The experiment (oops-- "field test") was performed in Chicago Heights. Teachers were randomly assigned to one of two groups-- either they were in the Gain group, working toward a possible end-of-year bonus, or they were in the Loss group, receiving a bonus up front which they would lose if their students didn't achieve bonus-worthy results. Bonuses for both groups were the same. Additionally, the researchers used the "pay for percentile" method developed by Barlevy and Neal, which is basically a stack and rank system where there are winners and losers. One would think that might have some significant effects on the field test, but apparently we're just going to barrel on assuming that it's a great idea and not a zero-sum dog-eat-dog approach that might shade the effects of a merit pay system.

Their findings were that there was a significant gain in math scores for Loss teachers' students (the significance was between 0.076 and 0.129, so make of that what you will) and, as expected, no significant effect for Gain teachers' students.

To the library

Part two of the paper is the review of the literature. If you're interested in this, you're on your own.

Program details

Chicago Heights is about thirty miles south of Chicago. They have a 98% free and reduced lunch population in elementary and middle school. The program was implemented with the cooperation of both the superintendent and the union. Of 160 teachers, 150 opted in. Maximum possible bonus pay was $8,000.

Working out the assignments of teachers was hard. So hard that apparently the researchers kind of gave up on tracking the reading side of this experiment and focused on the math. This was further complicated in that the design called for some teachers to be up for bonus on their own, while others were bonusing it up in team fashion.

And while the researchers keep saying that the teachers were assigned randomly, it turns out they were re-randomized with an algorithm that kept swapping teachers based on a set of rules until they were best aligned with the selection rules. So, unless I'm missing something, this was kind of like saying, "We randomly assigned people to groups of people with identical hair color and gender. So, we put all the blond women randomly in one group."

Teachers in the Loss group were given $4,000 at the beginning of the year and signed a contract stating they would give back the difference if their earned bonus came in below that amount. If they earned more, they got more. The tests used were the ThinkLink tests, which are described as otherwise low stakes tests, which again strikes me as a fairly critical factor that the researchers breeze right past.
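For anyone who wants to see the arithmetic of that contract spelled out, here is a minimal sketch based only on the numbers above: a $4,000 advance and an $8,000 maximum bonus. The function name and the range check are my own inventions for illustration, not anything taken from the paper.

```python
def loss_group_settlement(earned_bonus, advance=4000, max_bonus=8000):
    """Year-end cash flow for a Loss-group teacher (hypothetical helper).

    Positive result: the teacher is paid that much more in June.
    Negative result: the teacher writes a check back for that amount.
    """
    # Assumption: an earned bonus can never be negative or exceed the maximum.
    if not 0 <= earned_bonus <= max_bonus:
        raise ValueError("earned_bonus must be between 0 and max_bonus")
    return earned_bonus - advance


if __name__ == "__main__":
    for earned in (0, 2500, 4000, 6500):
        settlement = loss_group_settlement(earned)
        if settlement < 0:
            print(f"Earned ${earned}: pay back ${-settlement}")
        else:
            print(f"Earned ${earned}: receive ${settlement} more")
```

Total pay for the year works out the same for a Gain teacher and a Loss teacher who earn the same bonus. The only difference is whether June brings a check or a bill.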

Data and research design

Basically, these guys went in the back room and whipped up a big kettle of VAM sauce. You know. The same kind of thing that has been so widely discredited that the National Association of Secondary School Principals has come out against using it as a means of evaluating individual teachers. Also, they use some more math to deal with the event of a student having mixed teachers (one Loss group, one Gain group) during the day.


You've already heard the big take-away. Other interesting bits of data include a much higher effect for K-2 students (though, since the VAMsauce depends on data going back four years, I'm wondering how exactly we crunched the little kids' numbers). There is a bunch of statistics-talk here as well, but much of it boils down to fancily-worded "Nothing to see here." There are charts for those who enjoy charts.

Interpretation and pre-emptive kibitzing

The interpretation is simple. Merit pay will yield better test results if you let teachers hold it in their hands for nine months and threaten to take it back if their students don't do well on the Big Test.

The researchers anticipate three areas that might be used to dispute their results, so they address them ahead of time.

First, attrition. They anticipate the complaint that teachers will find other ways to improve their test scores including getting Little Pat McFailsalot out of their classroom, at least on test day. They ran some numbers and decided this didn't happen to any notable degree.

Second, liquidity constraints. We're talking about teacher money here. Teachers might spend their own money in the classroom to improve their bonus-earning chances, which would be a level playing field if all teachers were wealthy, but in a world where teachers have very little extra money to spend in the classroom (or Wal-mart or anywhere else), an extra $4K in September might tilt the field. In other words, did the group that got $4K up front run out and spend it to make sure they kept it? Survey says no. Interesting sidelight-- when asked in March, 69% of the Loss teachers had not cashed their bonus checks yet.

Third, cheating. They decided that wasn't a factor because, reasons. Seriously-- isn't the whole hypothesis here that the bonus will motivate teachers to raise test scores any way they can? I have no reason to believe these teachers were cheating, but if this were my experiment, that would certainly be something I'd look for. What kind of pressure and temptation do you suppose will be felt by a teacher who has already spent his "bonus" on house payments and groceries?

But it gets better. They argue that the proof that no cheating occurred is that results on the state test-- which had nothing to do with their incentive program-- came out about the same. So, the test results from the incentivized program were pretty much the same as the results that they got with no incentives at all. Maybe that means that test prep for the one test is also good test prep for the state test. Or maybe it means that the incentive program had no effect on anything.

Wrapping it up

I see enough holes in this very specific research to drive a fleet of trucks through. But let's pretend for a moment that they've actually proven something here. What would we do with it?

First, we'd need to convince a school district business office to let teachers hold a big pile of district money for nine months, thereby giving up a bunch of interest income and liquidity. At the same time, we'd have to get the administration and board to budget a merit pay line item for "Somewhere between a small amount and a huge mountain." These are great ideas, because if there's anything business managers love, it's letting someone else hold their money, and they only love that slightly less than starting a year with an unknowable balloon payment of indeterminate size next June.

When school districts talk about merit pay, they talk about a merit lump sum set aside at the beginning of the year so that teachers can fight over a slice of the already-set merit pie. As I've said repeatedly, no school board in this country is ever going to say to the public, "Our teachers did such a great job this year that we need to raise taxes to cover all the well-earned merit pay bonuses we owe them."

Of course, somebody would have to figure out the merit system. How many Harvard grad students work in your district? And how exactly will you figure out the math score bonus for your phys ed teacher? 

Districts could manage the financial challenges of this loss aversion model by pre-determining the aggregate merit pay in the district. This, combined with "pay for percentile," would absolutely guarantee open warfare among staff members, who would be earning their merit bonuses by literally ripping dollars out of colleagues' hands. Boy, I bet teaching in that school would be fun.

The largest thing they haven't thought through

Instituted, this system will not play out like a merit bonus at all. If I start every year with an "extra" $4K (or whatever amount), I've gotten a raise, and every year I don't make my numbers is a year that I get a punitive retroactive pay cut.

In no time at all, this system morphs from a merit pay bonus system of rewards to a bad score DEmerit system of punishments. Rather than a bonus that really lifts up teachers, these folks have come up with a way to make punishment for low results even more painful and effective. A miracle indeed.


  1. Looks like the ThinkLink tests are kaput. Here is a pdf about their "wonderful" tests and the link to the site takes one to DiscoveryEd's assessment page. Perhaps ThinkLink has morphed into one of DE's products, but it's unclear if that has happened.

    Also the original ThinkLink assessments started with 2nd grade, so, not only does the lack of four years of data make the take-away from the early grades suspect, the lack of tests makes one really wonder about the validity of that particular data set.

    What a bunch of rot and nonsense. The Onion is starting to sound more sane than the reality of what's happening in public education.

  2. Yes, the old ThinkLink tests are a product of Discovery Education. It even said so on the test booklets.