Sunday, May 21, 2017

NYT: Value Not Added

Our old friend Kevin Carey popped up in the New York Times this week, using the death of William Sanders as a case to soft-pedal VAM. The article has some interesting points to make about VAM, and it also unintentionally reveals some of the reasons that Value-Added Measuring of teacher performance is a fool's game.

Carey is the education policy program director for the New America Foundation. NAF bills itself as a non-partisan thinky tank based in DC. Eric Schmidt, Google's executive chairman, is chair of the NAF board. Their over-a-million-dollar funders include the Gates Foundation and the US State Department. So their objectivity in these matters is suspect. In the past Carey has turned up trying to support Common Core, attacking public education, using shoddy research to slam higher ed, and helping spread PR for Mark Zuckerberg's AltSchool.


Carey apparently met Sanders and talked to the inventor of the Value Added Measure (specifically, the one known these days as VAAS-- the one that a Houston court just threw out). That provides a fascinatingly specific tale of what started Sanders, who had a doctorate in statistics and quantitative genetics, on the path to evaluating teacher performance:

“In 1945, the United States government set off an atomic bomb.”

That’s how Mr. Sanders began telling me the story of his life, when we met several years ago....

Nuclear weapons tests had released clouds of radiation that had drifted with the weather. Sometime later, farm animals downwind began to die. Did the first event, a mushroom cloud, cause the second event, dead sheep? Or did one merely follow the other coincidentally? Solving this problem required expertise in both statistical probability and livestock biology. Oak Ridge hired Bill Sanders.

So, VAM is tied to nukes. Somehow that seems right.

Another fun factoid: Lamar Alexander was offered the VAM idea when he was governor of Tennessee, but he passed. I would love to hear the story of how he decided not to use Sanders's idea.

How easy is it to take shots at Sanders for trying to evaluate teachers based on his work with radioactive cows? Pretty easy-- but it really is striking how little he seemed to grasp the complexity of the whole teaching-learning thing:

To fairly evaluate teachers, Mr. Sanders argued, the state needed to calculate an expected growth trajectory for each student in each subject, based on past test performance, then compare those predictions with their actual growth. Outside-of-school factors like talent, wealth and home life were thus baked into each student’s expected growth. Teachers whose students’ scores consistently grew more than expected were achieving unusually high levels of “value-added.” Those, Mr. Sanders declared, were the best teachers.

It's that simple! The test scores the students produced in previous years make this year's score completely predictable, and any difference must be because of the teacher, because no other factor could possibly account for a deviation from the predicted student path. Seriously? Sanders had children of his own, so he's definitely met young humans. And yet this overly simplistic model of human growth and behavior (students just keep progressing along this fully predictable line unless some teacher disrupts that path) is the mechanical, inhuman heart of his system.
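For anyone who wants to see the bare logic Carey describes, here is a toy sketch. To be clear, this is not Sanders's actual model (the real thing is far more elaborate and far more opaque); it is a made-up illustration that assumes a single prior score per student, one straight prediction line fitted across all students, and invented teacher names and scores. The function names `fit_line` and `value_added` are mine, not anyone's official API.

```python
from collections import defaultdict
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares line through (x, y) points: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def value_added(records):
    """records: list of (teacher, prior_score, current_score) tuples (hypothetical data).

    Predict each student's current score from the prior score, then credit
    (or blame) the teacher with the average gap between actual and predicted.
    """
    slope, intercept = fit_line([r[1] for r in records], [r[2] for r in records])
    gaps = defaultdict(list)
    for teacher, prior, current in records:
        expected = intercept + slope * prior   # the "expected growth trajectory"
        gaps[teacher].append(current - expected)
    return {teacher: mean(g) for teacher, g in gaps.items()}

# Invented scores for two invented teachers.
records = [
    ("Smith", 60, 66), ("Smith", 70, 75), ("Smith", 80, 86),
    ("Jones", 60, 62), ("Jones", 70, 71), ("Jones", 80, 80),
]
scores = value_added(records)
```

Notice what the sketch does with everything the prediction line fails to explain: a bad night's sleep, a new sibling, a test that happened to cover the wrong novel. All of it lands in `current - expected`, and all of that gets filed under the teacher's name.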

But Carey's piece also shows how simple innumeracy has driven the adoption of Sanders's work. Sanders tried out his model and found it distributed teacher performance over a "normal" bell curve (kind of like the way student achievement fits on a bell curve-- almost as if the teacher bell curve is just an echo of the student one, and not a measurement of something else entirely). Here's how Carey describes the reaction to that curve:

Reformers also looked at the right-hand side of the bell curve, where the effective teachers were, and thought, “What if we could have a lot more of those?” 

Sigh. It's a frickin' bell curve. You can't make the right hand side bigger or the left hand side smaller. You can't, in short, have a system in which all the teachers are above average.
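A quick sketch makes the point concrete. It assumes teachers are rated relative to the group average, which is what plotting them on a curve amounts to; the ratings below are invented and `share_above_average` is just a name I made up.

```python
from statistics import mean

def share_above_average(scores):
    """Fraction of scores strictly above the group's own mean."""
    avg = mean(scores)
    return sum(s > avg for s in scores) / len(scores)

# Invented ratings for seven hypothetical teachers.
ratings = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]

# Suppose every single teacher improves by the same large amount.
improved = [r + 5.0 for r in ratings]

before = share_above_average(ratings)
after = share_above_average(improved)
```

Everybody got better, and the share of "above average" teachers did not budge. When the yardstick is the average itself, the average moves right along with the teachers.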

Carey offers the more recent picture of Sanders as a guy who hung back from the arguments about policy, but if we look at this friendly profile of Sanders from 2000, we see that in the early days he was a busy eVAMgelist, hitting the road and preaching the Word of Data. That was just before he left the university to join SAS, a data-crunching company that has made a bundle by selling VAAS as a useful product. Presenting Sanders as a kindly old farmer with a PhD glides past the fact that he was employed by a company that made its living selling people on this giant slab of data baloney.

Carey reaches for a valedictory conclusion:

While the use of value-added ratings to hire, fire and pay teachers may have been limited by political pressure, the importance of the value-added bell curve itself continues to grow — less like a sudden explosion than a chime whose resonance gains in power over time. 

Oh, let's tell the truth. VAM systems have also been limited by the fact that they're junk, taking bad data from test scores and massaging them through an opaque and improbable mathematical model to arrive at conclusions that are volatile and inconsistent, conclusions that myriad educators have looked at and said, "Well, this can't possibly be right."

You'll never find me arguing against any accountability; taxpayers (and I am one) have the right to know how their money is spent. But Sanders's work ultimately wasted a lot of time and money and produced a system about as effective as checking toad warts under a full moon-- worse, because it looked all numbery and sciencey and so lots of suckers believed in it. Carey can be the apologist crafting it all into a charming and earnest tale, but the bottom line is that VAM has done plenty of damage, and we'd all be better off if Sanders had stuck to his radioactive cows.



5 comments:

  1. "To fairly evaluate teachers, Mr. Sanders argued, the state needed to calculate an expected growth trajectory for each student in each subject, based on past test performance, then compare those predictions with their actual growth."

    That makes it seem like they're going to take Little Susie's lifetime test scores, look at the pattern thereof, then predict what Little Susie's test scores should continue to be. Then do the same for Little Johnny, Little Nellie, Little Timmy and so on for every kid in every teacher's class. That would be bad enough, but at least it would be based on actual data of actual kids in Ms. Smith's class and the actual kids in Mr. Jones' class, etc.

    But what they actually do bears no resemblance to that at all. There is no specific tracking of individual kids year to year. It's just some kind of model based on generalized characteristics of generalized kids.

  2. "...a chime whose resonance gains in power over time." How poetic. Carey should stick to writing ad copy.

  3. I'm still stuck on the fact that anybody is listening to a fella with a degree in statistics who doesn't understand how bell curves work...

  4. I met and talked to Sanders a couple of times due to the fact that he grew up in my county. We had a very early conversation when he came to explain his teacher-rating system to us local teachers. He explained his straight lines and talked about his equations obliquely. When he was done, I asked him how he accounted for the differences in opinion about what should be taught.

    He was bewildered in a way that made me understand that he had not a clue that there might be a difference in opinion as to what might be fair game in a class. That one teacher might study Steinbeck and another, Twain, and that this might affect the score on a test over American literature, had never crossed his mind.

    Those who wish the world could be described by number

  5. Virtually every NYS teacher (outside of NYC/UFT) now has 50% of their evaluation based on "distributed" test scores. In most cases, these are shared HS Regents exam scores in algebra, biology, global history, US history, and ELA. Instead of VAM, New York uses SLOs, which determine teacher effectiveness using the percent of students that have met randomly established "target goals" using the average of said Regents exam scores (which can be re-taken numerous times). Yes, even the elementary school music teacher will be evaluated this way. Most districts have established target goals that they know (using historic Regents scores) will have all teachers rated effective. I don't think any teacher evaluation system can get any further down the rabbit hole than New York. This is all thanks to 2020 presidential hopeful Andrew Cuomo.
