It was a bit of a shock. I picked up my morning paper, and there was an article on the front page touting our school district's PVAAS scores, the commonwealth of Pennsylvania's version of VAM scores, and I was uncomfortably reminded that value-added measures are still a thing.
Value Added Measures are bunk.
We used to talk about this a lot. A. Lot. But VAM (also known as Something-VAAS in some states) has departed the general education discussion even though it has not departed the actual world of education. Administrators still brag about, or bemoan, their VAM scores. VAM scores still affect teacher evaluation. And VAM scores are still bunk.
So let's review. Or if you're new-ish to the ed biz, let me introduce you to what lies behind the VAM curtain.
The Basic Idea
Value Added is a concept from the manufacturing and business world. If I take a dollar's worth of sheet metal and turn it into a forty dollar toaster, then I have added thirty-nine dollars of value to the sheet metal. It's an idea that helps businesses figure out if they're really making money on something, or if adding some feature to a product or process is worth the bother.
Like when you didn't fix the kitchen door before you tried to sell your house, because fixing the door would have cost a grand but would have allowed you to raise the price of the house only a buck and a half. Or how a farmer might decide that putting a little more meat on bovine bones would cost more than the slightly fatter cow would bring back at sale.
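If you like the arithmetic spelled out, here's a tiny Python sketch of the same idea; the function names are my own, and the numbers are just the examples above:

```python
# Toy version of the business arithmetic: value added is sale value minus
# input cost, and an upgrade is worth doing only if it adds more value
# than it costs. Numbers come from the examples in the post.

def value_added(sale_price, input_cost):
    return sale_price - input_cost

def worth_doing(upgrade_cost, added_sale_value):
    return added_sale_value > upgrade_cost

print(value_added(40, 1))        # toaster: $39 of value added to the sheet metal
print(worth_doing(1000, 1.50))   # kitchen door: False -- skip the repair
```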
So the whole idea here is that schools are supposed to add value to students, as if students are unmade toasters or unfatted calves, and the school's job is to make them worth more money.
Yikes! Who decided this would be a good thing to do with education?
The Granddaddy of VAAS was William Sanders. Sanders grew up on a dairy farm and went on to earn a PhD in biostatistics and quantitative genetics. He was mostly interested in questions like “If you have a choice between buying Bull A and Bull B, which one is more likely to produce daughters that will give more milk?” Along with some teaching, Sanders was a longtime statistical consultant for the Institute of Agricultural Research.
He said that in 1982, while an adjunct professor at a satellite campus of the University of Tennessee, he read an article (written by then-Governor Lamar Alexander) saying that there was no proper way to hold teachers accountable for test scores.
Sure there is, he thought. He was certain he could just tweak the models he used for crunching agricultural statistics and it would work great. He sent the model off to Alexander, but it languished unused until the early 90s, when the next governor pulled it out and called Sanders in, and the Educational Value-Added Assessment System (EVAAS) was on its way.
The other Granddaddy of VAAS is SAS, an analytics company founded in 1976.
Founder James H. Goodnight was born in 1943 in North Carolina. He earned a master's in statistics; that, combined with some programming background, landed him a job with a company that built communication stations for the Apollo program.
He next went to work as a professor at North Carolina State University, where he and some other faculty created the Statistical Analysis System for analyzing agricultural data, a project funded mainly by the USDA. Once the first SAS was done and had acquired 100 customers, Goodnight and his colleagues left academia and started the company.
William Sanders also worked as a North Carolina university researcher, and it's not clear when, exactly, he teamed up with SAS; his EVAAS system was proprietary, and as the 90s unfolded, that made him a valuable man to go into business with. The VAAS system, rebranded for each state that signed on, became a big deal for SAS, which launched its Education Technologies Division in 1997.
Sanders passed away in 2017. Goodnight has done okay. The man owns two-thirds of the company, which is still in the VAAS biz, and he's now worth $7.4 billion-with-a-B. But give him credit: apparently remembering his first crappy job, Goodnight has made SAS one of the world's best places to work-- in fact, it is SAS that influenced the more famously fun-to-work culture of Google. It's a deep slice of irony--he has sustained a corporate culture that emphasizes valuing people as live human beings, not as a bunch of statistics.
Somehow Goodnight has built a little world where people live and work among dancing rainbows and fluffy fairy dust clouds, and they spend their days manufacturing big black rainclouds to send out into the rest of the world.
How does it work?
Using mixed model equations, TVAAS uses the covariance matrix from this multivariate, longitudinal data set to evaluate the impact of the educational system on student progress in comparison to national norms, with data reports at the district, school, and teacher levels.
This is a highly complex model that three well-paid consultants could not clearly explain to seven college-educated adults, but there were lots of charts and graphs, so you know it’s really good. I searched for a comparison and first tried “sophisticated guess;” the consultant quickly corrected me—“sophisticated prediction.” I tried again—was it like a weather report, developed by comparing thousands of instances of similar conditions to predict the probability of what will happen next? Yes, I was told. That was exactly right. This makes me feel much better about PVAAS, because weather reports are the height of perfect prediction.
The basic mathless idea is this. Using sophisticated equations, the computer predicts what Student A would likely score on this year's test in some alternate universe where no school-related factors affected the student's score. Then the computer looks at the score that Actual Student A achieved. If Actual Student and Alternative Universe Student have different scores, the difference, positive or negative, is attributed to the teacher.
Let me say that again. The computer predicts a student score. If the actual student gets a different score, that is not attributed to, say, a failure on the part of the predictive software. All the blame and/or glory belong to the teacher.
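If it helps to see that logic laid bare, here's a deliberately minimal Python sketch. This is not SAS's proprietary model (the real thing runs mixed-model equations over years of longitudinal data), and the prior-score average standing in for the prediction is my own placeholder; the point is the attribution step:

```python
# A toy of the value-added step described above -- NOT the real EVAAS.
# The "prediction" here is just an average of prior scores, a stand-in
# of my own invention for the model's alternate-universe student.

def predicted_score(prior_scores):
    """What the model says Student A 'should' score this year."""
    return sum(prior_scores) / len(prior_scores)

def value_added(actual_score, prior_scores):
    """Whatever gap remains between prediction and reality is
    attributed to the teacher -- never to the predictive software."""
    return actual_score - predicted_score(prior_scores)

# Student A scored 70 and 74 in prior years, so the model "expects" 72.
print(value_added(75, [70, 74]))   # +3.0 -- teacher gets the glory
print(value_added(69, [70, 74]))   # -3.0 -- teacher gets the blame
```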
VAAS fans insist that the model mathematically accounts for factors like socio-economic background, the school, and other stuff.
So how well does it actually work?
Audrey Amrein-Beardsley, a leading researcher and scholar in this field, ran a whole blog for years (VAMboozled) that did nothing but bring to light the many ways in which VAM systems were failing, so I'm going to be (sort of) brief here and stick to a handful of illustrations.
Let's ask the teachers.
Short answer: no.
Long answer: researcher Clarin Collins made a list of the various marketing promises made by SAS about VAAS and asked teachers if they agreed or disagreed (they could do so strongly, too). Here's the list:
EVAAS helps create professional goals
EVAAS helps improve instruction
EVAAS will provide incentives for good practices
EVAAS ensures growth opportunities for very low achieving students
EVAAS ensures growth opportunities for students
EVAAS helps increase student learning
EVAAS helps you become a more effective teacher
Overall, the EVAAS is beneficial to my school
EVAAS reports are simple to use
Overall, the EVAAS is beneficial to me as a teacher
Overall, the EVAAS is beneficial to the district
EVAAS ensures growth opportunities for very high achieving students
EVAAS will identify excellence in teaching or leadership
EVAAS will validly identify and help to remove ineffective teachers
EVAAS will enhance the school environment
EVAAS will enhance working conditions
The list is arranged in descending order of agreement: over 50% of teachers disagreed with even the top item, and by the time we get to the bottom of the list, the rate of disagreement is almost 80%. Even at the top, fewer than 20% of teachers agreed or strongly agreed, and it just went downhill from there.
Teachers found the data reports "vague" and "unusable." They complained that their VAAS rating scores whipped up and down from year to year with no rhyme or reason, with over half finding their VAAS number way different from their principal's evaluation. Teachers of gifted students, because they had the students who had already hit the test's ceiling, reported low VAAS scores. And while the VAAS magic math is supposed to blunt the impact of having low-ability students in your classroom, it turns out it doesn't really do that.
And this:
Numerous teachers reflected on their own questionable practices. As one English teacher said, “When I figured out how to teach to the test, the scores went up.” A fifth grade teacher added, “Anything based on a test can be ‘tricked.’ EVAAS leaves room for me to teach to the test and appear successful.”
EVAAS also assumes that the test data fed into the system is a valid measure of what it says it measures. That's a generous view of tests like Pennsylvania's Keystone Exam. Massaging bad data with some kind of sophisticated mathiness still just gets you bad data.
But hey--that's just teachers and maybe they're upset about being evaluated with rigor. What do other authorities have to say?
The Houston Court Case
The Houston school district used EVAAS not only to evaluate teachers, but to factor into pay decisions as well. So the AFT took them to court. A whole lot of experts in education and evaluation and assessment came to testify, and when all was said and done, here are twelve big things that the assembled experts had to say about EVAAS:
1) Large-scale standardized tests have never been validated for this use. A test is only useful for the purpose for which it is designed. Nobody has designed a test for VAM purposes.
2) When tested against another VAM system, EVAAS produced wildly different results.
3) EVAAS scores are highly volatile from one year to the next.
4) EVAAS overstates the precision of teachers' estimated impacts on growth. The system pretends to know things it doesn't really know.
5) Teachers of English Language Learners (ELLs) and “highly mobile” students are substantially less likely to demonstrate added value. Again, the students you teach have a big effect on the results that you get.
6) The number of students each teacher teaches (i.e., class size) also biases teachers’ value-added scores.
7) Ceiling effects are certainly an issue. If your students topped out on the last round of tests, you won't be able to get them to grow enough this year.
8) There are major validity issues with “artificial conflation.” (This is the phenomenon in which administrators feel forced to make their observation scores "align" with VAAS scores.) Administrators in Houston were pressured to make sure that their own teacher evaluations confirmed rather than contradicted the magic math.
9) Teaching-to-the-test is of perpetual concern. Because it's a thing that can raise your score, and it's not much like actual teaching.
10) HISD is not adequately monitoring the EVAAS system. HISD was not even allowed to see or test the secret VAM sauce. Nobody is allowed to know how the magic maths work. Hell, in Pennsylvania, teachers are not even allowed to see the test that their students took. You have to sign a pledge not to peek. So from start to finish, you have no knowledge of where the score came from.
11) EVAAS lacks transparency. See above.
12) Related, teachers lack opportunities to verify their own scores. Think your score is wrong? Tough.
The experts said that EVAAS was bunk.
US Magistrate Judge Stephen Smith agreed, saying that "high stakes employment decisions based on secret algorithms (are) incompatible with... due process" and the proper remedy was to overturn the policy. Houston had to kiss VAAS goodbye.
Anyone else have thoughts?
At first glance, it would appear reasonable to use VAMs to gauge teacher effectiveness. Unfortunately, policymakers have acted on that impression over the consistent objections of researchers who have cautioned against this inappropriate use of VAMs.
The American Educational Research Association also cautioned in 2015 against the use of VAM scores for any sort of high stakes teacher evaluation, due to significant technical limitations. They've got a batch of other research links, too.
The American Statistical Association released a statement in 2014 warning districts away from using VAM to measure teacher effectiveness. VAMs, they say, do not directly measure potential teacher contributions toward other student outcomes. Also, VAMs typically measure correlation, not causation: Effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model.
Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions. Ranking teachers by their VAM scores can have unintended consequences that reduce quality.
They cite the "peer-reviewed study" funded by Gates and published by AERA which stated emphatically that "Value-added performance measures do not reflect the content or quality of teachers' instruction." This study went on to note that VAM doesn't seem to correspond to anything that anybody considers a feature of good teaching.
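The ASA's 1%-to-14% figure is easy to feel in a toy simulation. In the sketch below (all numbers invented for illustration, nobody's real data), the teacher effect is perfectly real, but small next to everything else that moves a test score, so it accounts for only a sliver of the variance:

```python
# Simulate test scores as a small, real teacher effect plus a much larger
# "everything else" (family, poverty, prior learning, test noise...).
# All parameters here are invented for illustration.
import random

random.seed(42)
N_STUDENTS = 10_000

teacher_effect = [random.gauss(0, 3) for _ in range(N_STUDENTS)]    # small
everything_else = [random.gauss(0, 12) for _ in range(N_STUDENTS)]  # large

scores = [t + e for t, e in zip(teacher_effect, everything_else)]

def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

share = variance(teacher_effect) / variance(scores)
print(f"teacher share of score variance: {share:.1%}")  # single digits, inside the ASA's range
```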
What if we don't use the data soaked in VAM sauce to make Big Decisions? Can we use it just to make smaller ones? Research into a decade-long experiment in using student test scores to "toughen" teacher evaluation and make everyone teach harder and better showed that the experiment was a failure.
Well, that was a decade or so ago. I bet they've done all sorts of things to VAM and VAAS to improve them.
You would lose that bet.
Well, at least they don't use them to evaluate teachers any more, right?
Sorry.
There's a lot less talk about tying VAM to raises or bonus/merit pay, but the primary innovation is to drape the rhetorical fig leaf of "student growth" over VAM scores. The other response has been to try to water VAAS/VAM measures down with other "multiple measures," an option that was handed to states back in 2015 when ESSA replaced No Child Left Behind as the current version of federal education law.
Pennsylvania has slightly reduced the influence of PVAAS on teacher and building evaluations in the latest version of its evaluation system, but it's still in there, both as part of the building evaluation that affects all teacher evaluations and as part of the evaluation for teachers who teach the tested subjects. Pennsylvania also uses the technique of mushing together "three consecutive years of data," a technique that hopes to compensate for the fact that VAAS scores hop around from year to year.
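Here's a quick sketch of what that mushing-together does, under my deliberately unkind assumption that a single-year score is pure noise: the three-year average hops around less (by roughly a factor of the square root of three), but it is still measuring the same nothing:

```python
# Toy illustration of "three consecutive years" averaging. The assumption
# that single-year VAAS scores are pure noise is mine, for illustration:
# averaging shrinks the year-to-year bounce but can't add meaning.
import random
import statistics

random.seed(1)
single_year = [random.gauss(0, 10) for _ in range(30)]   # 30 noisy annual scores

# Rolling three-year averages of those same scores.
three_year = [statistics.mean(single_year[i:i + 3])
              for i in range(len(single_year) - 2)]

print(f"single-year spread: {statistics.stdev(single_year):.1f}")
print(f"three-year spread:  {statistics.stdev(three_year):.1f}")   # noticeably smaller
```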
VAAS/VAM is still out there kicking, still being used as part of a way to evaluate teachers and buildings. And it's still bunk.
But we have to do something to evaluate schools and teachers!
You are taken to the hospital with some sort of serious respiratory problem. One afternoon you wake up suddenly to find some janitors standing over you with a chain saw.
"What the hell!" You holler. "What are you getting ready to do??!!"
"We're going to amputate your legs with a chain saw," they reply.
"Good lord," you holler, trying to be reasonable. "Is there any reason to think that would help with my breathing?"
"Not really," they reply. "Actually, all the medical experts say it's a terrible idea."
"Well, then, don't do it! It's not going to help. It's going to hurt, a lot."
"Well, we've got to do something."
"Not that!"
"Um, well. What if we just take your feet off? I mean, this is what we've come up with, and if you don't have a better idea, then we're just going to go ahead with our chain saw plan."
VAM is a stark example of education inertia in action. Once we're doing something, somehow the burden of proof is shifted; nobody has to prove that there's a good reason to do the thing, and opponents must prove they have a better idea. Until they do, we just keep firing up the chain saw.
There are better ideas out there (check out the work of Jack Schneider at University of Massachusetts Amherst), but this post is long enough already, and honestly, if you're someone who thinks it's so important to reduce teachers' work to a single score, the burden is on you to prove that you've come up with something that is valid, reliable, and non-toxic. A system that depends on the Big Standardized Tests and a mysterious black box to show that somehow teachers have made students more valuable is none of those things.
VAM systems have had over a decade to prove their usefulness. They haven't. It's long past time to put them in the ground.
I, alas, teach in North Carolina and am still subject to the whims of EVAAS (the original edition). This will probably not go away anytime before I retire, due to the political influence of Goodnight and SAS. So while it is both useless and harmful to teaching in our state and is just another thing contributing to the teacher shortage, it's probably here to stay for the long term.
Beyond everything described here, VAM/SLO cheerleaders conveniently ignored the fact that 70% of teachers do not teach a subject area in which standardized tests are federally mandated.
Here in NYS, many districts simply use an average of the required high school Regents scores. So the first-year elementary art teacher is being evaluated using scores of students she never even met. It has become a farce that teachers live with because it is rigged in their favor.