
Wednesday, March 2, 2016

Ace That Test? I Think Not.

The full court press for the Big Standardized Test is on, with all manner of spokespersons and PR initiatives trying to convince Americans to welcome the warm, loving embrace of standardized testing. Last week the Boston Globe brought us Yana Weinstein and Megan Smith, a pair of psychology assistant professors who have co-founded Learning Scientists, which appears to be mostly a blog that they've been running for about a month. And say what you like-- they do not appear to be slickly or heavily funded by the Usual Gang of Reformsters.



Their stated goals include lessening test anxiety and decreasing the negative views of testing. And the reliably reformy Boston Globe gave them a chance to get their word out. The pair also blogged about additional material that did not make it through the Globe's edit.




The Testing Effect

Weinstein and Smith are fond of "the testing effect," a somewhat inexact term used to refer to the notion that recalling information helps people retain it. It always makes me want a name for whatever it is that makes some people believe that the only situation in which information is recalled is a test. Hell, it could be called the teaching effect, since we can get the same thing going by having students teach a concept to the rest of the class. Or the writing effect, or the discussion effect. There are many ways to have students sock information in place by recalling it; testing is neither the only nor the best way to go about it.

Things That Make the Learning Scientists Feel Bad 

From their blog, we learn that the LS team feels "awkward" when reading anti-testing writing, and they link to an example from Diane Ravitch. Awkward is an odd way to feel, really. But then, I think their example of a strong defense of testing is a little awkward. They wanted to quote a HuffPost pro-testing piece from Charles Coleman that, they say, addresses problems with the opt out movement "eloquently."

"To put it plainly: white parents from well-funded and highly performing areas are participating in petulant, poorly conceived protests that are ultimately affecting inner-city blacks at schools that need the funding and measures of accountability to ensure any hope of progress in performance." -- Charles F. Coleman Jr.

Ah. So opt outers are white, rich, whiny racists. That is certainly eloquent and well-reasoned support of testing. And let's throw in the counter-reality notion that testing helps poor schools, though after over a decade of test-driven accountability, you'd think supporters could rattle off a list of schools that A) nobody knew were underfunded and underresourced until testing and B) received a boost through extra money and resources after testing. Could it be that no such list actually exists?

Tests Cause Anxiety

The LS duo wants to decrease test anxiety by hammering students with testing all the time, so that it's no longer a big deal. I believe that's true, but not a good idea. Also, parents and teachers should stop saying bad things about the BS Tests, and just keep piling on the happy talk so that students can stop worrying and learn to love the test. All of this, of course, presupposes that the BS Tests are actually worthwhile and wonderful and that all the misgivings being expressed by professional educators and parents are-- what? An evil plot? Widespread confusion? The duo seem deeply committed to not admitting that test critics have any point at all. Fools, the lot of them.

Teaching to the Test

The idea that teaching to a test isn’t really teaching implies an almost astounding assumption that standardized tests are filled with meaningless, ill-thought-out questions on irrelevant or arbitrary information. This may be based on the myth that “teachers in the trenches” are being told what to teach by some “experts” who’ve probably never set foot in a “real” classroom.

Actually, it's neither "astounding" nor an "assumption," but, at least in the case of this "defiant" teacher (LS likes to use argument by adjective), my judgment of the test is based on looking at the actual test and applying my professional expertise. It's a crappy test, with poorly-constructed questions that, as is generally the case with standardized tests, mostly test the student's ability to figure out what the test manufacturer wants the student to choose for an answer (and of course the fact that students are selecting answers rather than responding to open-ended prompts further limits the usefulness of the BS Test).

But LS assert that tests are actually put together by testing experts and well-seasoned real teachers (and you can see the proof in a video put up by a testing manufacturer about how awesome that test manufacturer is, so totally legit). LS note that "defiant teachers" either "fail to realize" this or "choose to ignore" it. In other words, teachers are either dumb or mindlessly opposed to the truth.

Standardized Tests Are Biased

The team notes that bias is an issue with standardized tests, but it's "highly unlikely" that classroom teachers could do any better, so there. Their question-- if we can't trust a big board of experts to come up with an unbiased test, how can we believe that an individual wouldn't do even worse, and how would we hold them accountable?

That's a fair question, but it assumes some purposes for testing that are not in evidence. My classroom tests are there to see how my students have progressed with and grasped the material. I design those tests with my students in mind. I don't, as BS Tests often do, assume that "everybody knows about" the topic of the material, because I know the everybodys in my classroom, so I can make choices accordingly. I can also select prompts and test material that hook directly into their culture and background.

In short, BS Testing bias enters largely because the test is designed to fit an imaginary Generic Student who actually represents the biases of the test manufacturers, while my assessments are designed to fit the very specific group of students in my room. BS Tests are one-size-fits-all. Mine are tailored to fit.

Reformsters may then say, "But if yours are tailored to fit, how can we use them to compare your students to students across the nation?" To which I say, "So what?" You'll need to convince me that there is an actual need to closely compare all students in the nation.

Tests Don't Provide Prompt Feedback

The duo actually agree that tests "have a lot of room for improvement." They even acknowledge that the feedback from the test is not only late, but generally vague and useless. But hey-- tests are going to be totes better when they are all online, an assertion that makes the astonishing assumption that there is no difference between a paper test and a computer test except how the students record their answers.

Big Finish

The wrap up is a final barrage of Wrong Things.

Standardized tests were created to track students’ progress and evaluate schools and teachers. 

Were they? Really? Is it even possible to create a single test that can actually be used for all those purposes? Because just about everyone on the planet not financially invested in the industry has pointed out that using test results to evaluate teachers via VAM-like methods is baloney. And tests need to be manufactured for a particular purpose-- not three or four entirely different ones. So I call shenanigans-- the tests were not created to measure and track all three of those things.

Griping abounds about how these tests are measuring the wrong thing and in the wrong way; but what’s conspicuously absent is any suggestion for how to better measure the effect of education — i.e., learning — on a large scale.

A popular reformster fallacy. If you walk into my hospital room and say, "Well, your blood pressure is terrible, so we are going to chop off your feet," and then I say, "No, I don't want you to chop off my feet. I don't believe it will help, and I like my feet," your appropriate response is not, "Well, then, you'd better tell me what else you want me to chop off instead."

In other words, what is "conspicuously absent" is evidence that there is a need for or value in measuring the effects of education on a large scale. Why do we need to do that? If you want to upend the education system for that purpose, the burden is on you to prove that the purpose is valid and useful.

In the absence of direct measures of learning, we resort to measures of performance.

Since we can't actually measure what we want to measure, we'll measure something else as a proxy and talk about it as if it's the same thing. That is one of the major problems with BS Testing in a nutshell.

And the great thing is: measuring this learning actually causes it to grow. 

And weighing the pig makes it heavier. This is simply not true, "testing effect" notwithstanding.

PS

Via the blog, we know that they wanted to link to this post at Learning Spy, which has some interesting things to say about the difference between learning and performance, including this:

And students are skilled at mimicking what they think teachers want to see and hear. This mimicry might result in learning but often doesn’t.

That's a pretty good explanation of why BS Tests are of so little use-- they are about learning to mimic the behavior required by test manufacturers. But the critical difference between that mimicry on a test and in my classroom is that in my classroom, I can watch for when students are simply mimicking and adjust my instruction and assessment accordingly. A BS Test cannot make any such adjustments, and cannot tell the difference between mimicry and learning at all.

The duo notes that their post is "controversial," and it is in the sense that it's more pro-test baloney, but I suspect that much of their pushback is also a reaction to their barely-disguised disdain for classroom teachers who don't agree with them. They might also consider widening their tool selection ("when your only tool is a hammer, etc...") to include a broader range of approaches beyond the "testing effect." It's a nice trick, and it has its uses, but it's a lousy justification for high stakes BS Testing.

Sunday, November 22, 2015

OH: Raking in Consultant Money

Education reform has spawned a variety of new money-making opportunities, including a burgeoning field of education consultants.

That's because one of the new steady drumbeats is that superintendents, principals, and most especially teachers-- in short, all the people who have devoted their professional lives to education-- don't know what they're doing and possess no expertise in education whatsoever. No, for real expertise, we must call in the High Priests of Reformsterdom.

That takes us to Cleveland, Ohio. I love Cleveland; I did my student teaching at a Cleveland Heights middle school while living in an apartment at the corner of East 9th and Superior, and those, indeed, were the days. But Cleveland schools have a long history of difficulty. Back in the day, Ohio schools had to submit all tax increases to voter referendum; Cleveland voters routinely said no, and Cleveland schools repeatedly shut down around October when they ran out of money.

Now, in the reform era, Cleveland schools have embraced charters and privatization with a plan that stops just short of saying, "We don't know what the hell we're doing or how to run a school district, so we're just going to open it up to anybody who thinks they can run a school or has an opinion about how to run a school. Except for teachers and professional educators-- they can continue to shut the hell up." This is Ohio, a state that has developed a reputation as the charter school wild west, where even people who make their living in the charter biz say, "Oh, come on. You've got to regulate something here!"

Given this climate, it seems only natural that Cleveland schools would call on a consulting firm like SchoolWorks. If the mashed-together name makes you think of other reformy all-stars like StudentsFirst and TeachPlus, you can go with the feeling. SchoolWorks started out helping charter schools get up and running and had a close relationship with KIPP schools. Their CEO's bio starts with this:

Spencer Blasdale considers himself a “teacher by nature,” but found early on in his career that his passion was having an impact beyond the four walls of one school.

And may I just pause to note how well that captures the reformy attitude about teaching-- you are just born with a teachery nature, and you don't need training or experience and you certainly don't need to prove yourself to any of those fancy-pants teacher colleges or other professional educators. The entry to the teaching profession is by revelation, and once you "consider yourself" a teacher, well then, what else do you need?

You will be unsurprised to learn that Blasdale's "career" consists of being a charter founding teacher, rising to charter administration, and then deciding to jump to charter policy. His LinkedIn profile indicates that he did teach a couple of years at a private day school back in the nineties. He has never taught in a public school. He's a product of the Harvard Ed grad school. Based on all that, he and his company are prepared to come tell you what you're doing wrong at your school; you can sign up for just an evaluation, or they will provide coaching as well.

Which brings us back to Cleveland.

The Cleveland Plain Dealer reports that SchoolWorks has already been to the district. They visited ten schools. They visited each school for three days. Based on three whole days at the school, they evaluated the school in nine areas.

For this they were paid $219,000. Seriously.

The reports were not pretty. "Not all educators convey shared commitments and mutual responsibility," they said (which strikes me as a pretty incredible insight to glean in just three days). Another school was slammed for "rarely-- only 11 percent of the time-- letting students know what the goals were for class." This would be more troubling if there were, in fact, a shred of evidence that sharing the goals had any educational benefits. Security officers were careless and the schools were messy and unclean. Good thing they hired a consultant-- I bet nobody in the district is qualified to tell if a school is messy or not.

The whole SchoolWorks package is like that. They come in to give one of four ratings on nine questions.

  1. Do classroom interactions and organization ensure a supportive, highly structured learning climate?
  2. Is classroom instruction intentional, engaging, and challenging for all students?
  3. Has the school created a performance-driven culture, where the teachers effectively use data to make decisions about instruction and the organization of students?
  4. Does the school identify and support special education students, English language learners, and students who are struggling or at risk?
  5. Does the school's culture reflect high levels of both academic expectation and support?
  6. Does the school design professional development and collaborative supports to sustain a focus on instructional improvement?
  7. Does the school's culture indicate high levels of collective responsibility, trust and efficacy?
  8. Do school leaders guide instructional staff in the central processes of improving teaching and learning?
  9. Does the principal effectively orchestrate the school's operations?
Now, some of these are crap, like #3's focus on data-driven instruction and decision-making. And many of them are really good goals in general, but it's going to come down to, for instance, exactly what we think "intentional, engaging and challenging" instruction looks like.

But on what planet are any of these-- even the iffy ones-- better checked by strangers with no public education expertise in the course of a three-day drive-by than by your own in-house experts? Do your superintendents and principals check none of this? We can take care of item #7 before the consultants even get to the school. Does the school's culture indicate high levels of collective responsibility, trust and efficacy? If you have just paid hundreds of thousands of dollars to come look at what your own people can see with their own eyes, the answer is "no," or maybe "hell, no."

But Cleveland schools say, "May we have some more?" SchoolWorks will be driving by twenty-five more schools for a price tag of $667,000. Which-- wait a minute. Ten schools for $219,000 is $21,900 per school. And twenty-five times $21,900 is $547,500. Apparently the additional cost is so that SchoolWorks will provide a "toolbox of solutions."

If this seems pricey, SchoolWorks also offered Cleveland a one-day drive-by package which would have covered the thirty-five schools for $219,000.
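For the record, here's a back-of-envelope check on those rates (a minimal Python sketch; the contract totals come from the Plain Dealer reporting above, and the division is mine):

    # Per-school rates implied by the three SchoolWorks packages
    # (contract totals as reported by the Plain Dealer).
    three_day_first = 219_000 / 10    # $21,900 per school, first contract
    three_day_second = 667_000 / 25   # $26,680 per school, second contract
    one_day_option = 219_000 / 35     # about $6,257 per school, one-day version

    # What the second contract would cost at the original per-school rate:
    print(25 * three_day_first)       # 547500.0 -- not 667,000

The gap between that last number and the actual invoice is, presumably, the price of the "toolbox."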

If there's a bright spot anywhere in this picture, it's that Cleveland school leaders recognize that simply soaking test scores in VAM sauce won't give them a picture of their schools' effectiveness. But if I were a Cleveland taxpayer, I'd be wondering why I was forking over a million dollars so that some out-of-town consultants could come do the job I thought I was already paying educational professionals in my district to do.

ICYMI: Some weekend eduwebs reading

Here's some edureading for the weekend.

Five Cynical Observations about Teacher Leadership

I meant to include Nancy Flanagan's insightful list about how teacher leadership isn't happening in last week's roundup, and then, somehow, I didn't. But here it is. These days Flanagan is one of the consistently rewarding bloggers for Ed Week-- save your limited freebie reads for her.

Educators Release Updated VAM Score for Secretary Duncan

Educators for Shared Responsibility have come up with a VAM formula for evaluating Education Secretaries. Not entirely a joke.

Classroom Surveillance and Testing

At the 21st Century Principal, John Robinson observes that our classroom data collection bears a striking resemblance to the tools of surveillance and, well, spying.

Drinking Charter Kool-Aid? Here Is Evidence.

Dr. Julian Vasquez Heilig has provided an essential resource. You may not read through it all today, but you'll want to bookmark it somewhere. Here's a very thorough listing of legitimate peer-reviewed research on the effectiveness of charter schools. Handling of special populations, segregation, competition, creaming-- it's all here, and all the real deal. You will want to keep this resource handy.

Stop, Start, Continue

Not always a fan of things I find at Edutopia, but this is a short simple piece focused on three things teachers should stop doing, three things we should start doing, and three things to continue doing. A good piece for sparking a little mental focus.

Friday, March 13, 2015

PA: All About the Tests (And Poverty)

In Pennsylvania, we rate schools with the School Performance Profile (SPP). Now a new research report reveals that the SPP is pretty much just a means of converting test scores into a school rating. This has huge implications for all teachers in PA because our teacher evaluations include the SPP for the school at which we teach.

Research for Action, a Philly-based education research group, just released its new brief, "Pennsylvania's School Performance Profile: Not the Sum of Its Parts." The short version of its findings is pretty stark and not very encouraging--

90% of the SPP is directly based on test results.

90%.

SPP is our answer to the USED waiver requirement for a test-based school-level student achievement report. It replaces the old Adequate Yearly Progress of NCLB days by supposedly considering student growth instead of simple raw scores. It rates schools on a scale of 0-100, with 70 or above considered "passing." In addition to being used to rate schools and teachers, SPPs get trotted out any time someone wants to make a political argument about failing schools.

RFA was particularly interested in looking at the degree to which SPP actually reflects poverty level, and their introduction includes this sentence:

Studies both in the United States and internationally have established a consistent, negative link between poverty and student outcomes on standardized tests, and found that this relationship has become stronger in recent years.

Emphasis mine. But let's move on.

SPP is put together from six categories of calculations. Five of the six-- which account for 90% of the score-- "rely entirely on test scores."

Our analysis finds that this reliance on test scores, despite the partial use of growth measures, results in a school rating system that favors more advantaged schools.

Emphasis theirs.


The brief opens with a consideration of the correlation of SPP to poverty. I suggest you go look at the graph for yourself, but I will tell you that you don't need any statistics background at all to see the clear correlation between poverty and a lower SPP.  And as we break down the elements of the SPP, it's easy to see why the correlation is there.

Indicators of Academic Achievement (40%)

Forty percent of the school's SPP comes from a proficiency rating (aka just plain straight-up test results) that comes from tested subjects, third grade reading, and the SAT/ACT College Ready Benchmark. Whether we're talking third grade reading or high school Keystone exams, "performance declines as poverty increases."*

Out of 2,200 schools sampled, 187 had proficiency ratings higher than 90, and only seven of those had more than 50% economically disadvantaged enrollment. Five of those were Philly magnet schools.

Indicators of Academic Growth aka PVAAS (40%)

PVAAS is our version of a VAM rating, in which we compare actual student performance to the performance of imaginary students in an alternate neutral universe run through a magical formula that corrects for everything in the world except teacher influence. It is junk science.

RFA found that while the correlation with poverty was still there, when it came to the PSSAs (our elementary test) it was not quite as strong as the proficiency correlation. For the Keystones, writing, and science tests, however, the correlation with poverty is, well, robust. Strong. Undeniable. Among other things, this means that you can blunt the impact of Keystone test results by getting some PSSA test-takers under the same roof. Time to start that 5-9 middle school!!

Closing the Achievement Gap (10%)

This particular measure has a built-in penalty for low-achieving schools (aka high poverty schools-- see above). Basically, you've got six years to close half the proficiency gap between where you are and 100%. If you have 50% proficiency, you've got six years to hit 75%. If you have 60%, you have six years to hit 80%. The lower you are, the more students you must drag over the test score finish line.
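As I read the brief, the target arithmetic is simply "current score plus half the distance to 100." A minimal Python sketch (the function name is mine, not RFA's):

    # Six-year proficiency target as described in the RFA brief:
    # close half the gap between current proficiency and 100%.
    def six_year_target(current_pct):
        return current_pct + (100 - current_pct) / 2

    print(six_year_target(50))  # 75.0 -- a 25-point climb for the poor school
    print(six_year_target(90))  # 95.0 -- a 5-point stroll for the wealthy one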

That last 10%, incidentally, covers items like graduation rate and attendance rate. Pennsylvania also gives you points for the number of students you can convince to buy the products and services of the College Board, including AP stuff and the PSAT. So kudos to the College Board people on superior product placement. Remember kids-- give your money to the College Board. It's the law!
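Stack those weights up and the arithmetic of the brief's headline finding is plain. A rough sketch (category labels are mine; the weights are the ones described above):

    # Approximate SPP composition per the RFA brief
    spp_weights = {
        "academic achievement (proficiency)": 0.40,  # test scores
        "academic growth (PVAAS)": 0.40,             # test scores
        "closing the achievement gap": 0.10,         # test scores
        "other (graduation, attendance, AP/PSAT)": 0.10,
    }
    test_based = 1.0 - spp_weights["other (graduation, attendance, AP/PSAT)"]
    print(f"{test_based:.0%} of the SPP rides on test results")  # 90%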

Bottom line-- we have schools in PA being judged directly on test performance, and we have data once again clearly showing that the state could save a ton of money by simply issuing school ratings based on the income level of students.

For those who want to complain, "How dare you say those poor kids can't achieve," I'll add this. We aren't measuring whether poor kids can achieve, learn, accomplish great things, or grow up to be exemplary adults-- there is no disputing that they can do all those things. But we aren't measuring that. We are measuring how well they do on a crappy standardized test, and the fact that poverty correlates with results on that crappy test should be a screaming red siren that the crappy test is not measuring what people claim it measures.

*Correction: I had originally included a mistyping here that reversed the meaning of the study.

Thursday, March 12, 2015

Raj Chetty for Dummies

The name Raj Chetty has been coming up a great deal lately, like a bad burrito that resists easy digestion. A great deal has been written about Chetty and his scholarly work, much of it by other scholars in various states of apoplexy. My goal today is not to contribute to that scholarly literature, but to try to translate the mass of writing by various erudite economists, scholars and statisticians into something shorter and simpler that ordinary civilians can understand.

In other words, I'm going to try to come up with a plain answer for the question, "Who is Raj Chetty, what does he say, how much of it is baloney, and why does anybody care?"

Who is Raj Chetty?

Chetty immigrated to the US from New Delhi at age nine. By age 23 he was an associate professor of economics at UC Berkeley, receiving tenure at age 27. At 30, he returned to his alma mater and became the Bloomberg Professor of Economics at Harvard.

He has since become a bit of a celebrity economist, consulted and quoted by the President and members of Congress. He has won the John Bates Clark Medal; Fortune put him on their list of influential people under forty in business.

Chetty started attracting attention late in 2010 with the pre-publication announcement of research that would give a serious shot in the arm to the Value-Added Measurement movement in teacher evaluation. His work has also made special appearances in the State of the Union address and the Vergara trial.

What does Chetty say?

The sexy headline version of Chetty is that a child who has a great kindergarten teacher will make more money as an adult.

The unsexy version isn't much more complicated than that. What Chetty et al (he has a pair of co-authors on the study) say is that a high-VAM teacher can raise test scores in younger students (say, K-4), and while that effect will disappear around 8th grade, eventually those VAM-exposed children will start making bigger bucks as adults.

Implications? Well, as one of Chetty's co-authors told the New York Times--

“The message is to fire people sooner rather than later,” Professor Friedman said.

Chetty's work has been used to buttress the folks who believe in firing our way to excellence-- just keep collecting VAM scores and ditching the bottom 5% of your staff. Chetty also plays well in court cases like Vergara, where it can be used to create the appearance of concrete damage to students (if Chris has Mrs. McUnvammy for first grade, Chris will be condemned to poverty in adulthood, ergo the state has an obligation to fire Mrs. McUnvammy toot suite). You can read one of the full versions of the paper here.

Who disagrees with Chetty?

Not everybody. In particular, economist Eric Hanushek has tried to join this little cottage industry, and lots of reformy policy makers love to quote his study.

But the list of Chetty naysayers is certainly not short. Chetty appears to evoke a rather personal reaction from some folks, who characterize him as everything from a self-important twit to a clueless scientist who doesn't understand that he's building bombs that blow up real humans. I've never met the man, and nothing in his writing suggests a particular personality to me. So let's just focus on his work.

Moshe Adler at Columbia University wrote a research response to Chetty's paper for NEPC. This provoked a response from Chetty et al, which provoked yet another response from Adler. You can read the whole conversation here, but I'll warn you right now that you're not going to just scan it over lunch.

Meanwhile, you'll recall that the American Statistical Association came out pretty strongly opposed to VAM, which also put them in the position of being critical-- directly and indirectly-- of Chetty. Chetty et al took it upon themselves to deliver the ASA a lesson in statistical analyses ("I will keep my mouth shut because these people are authorities in areas outside my expertise," is apparently really hard for economists to say) which led to a conversation recounted here.

What do the scholarly and expert critics say?

To begin with, the study has a somewhat checkered publication history, debuting as news blurbs in 2010, then appearing in a non-peer-reviewed journal, then being republished as two articles in a peer-reviewed journal. That history, along with many criticisms of the study, can be found here at Vamboozled, the blog of Audrey Amrein-Beardsley (the blog is a wealth of resources about all things VAM).

Many criticize Chetty's methodology. Adler's critique suggests that Chetty may have fudged some numbers, excluded some data, and ignored previous research that didn't fit his framework. Amrein-Beardsley (and others) accuse Chetty of ignoring the context of the data. Many critics suggest that Chetty is trying to make a mountain out of a molehill.

You can chase scholarly links all day long, though the NEPC link to Adler's work and simply typing "Chetty" into the VAMboozled search box will provide more than enough reading for an afternoon. Or two.

So how much of Chetty's work is bunk?

I'm going to go with "most of it."

Chetty's idea was to link VAM measures to later success-- to be able to say, "Look! High-VAM teachers grow successful students." There are several problems with this.

First, studies of VAM-based teacher effectiveness always seem to descend into the same tautology. Use test scores to measure VAM. Use VAM to ID the best teachers. Check to see if VAM-certified teachers raise test scores. Strip out the fancy language and funky math and you're left with a fairly simple tautology-- "Teachers who get students to have high test scores tend to get students to have high test scores." This is no more insightful or useful than research showing that bald men tend to be bald.

Second, Chetty doesn't seem to distinguish between correlation and causation. His results seem to scream for that consideration-- six-year-olds who do better on tests don't grow into twelve-year-olds who do better on tests, but they do grow into twenty-eight-year-olds who make more money. I'm no economist, but to me, the yawning gulf between the alleged cause and the supposed effect leaves enough room for a truckload of other possible causes. This holds together just about as well as "because I buried a toad under a full moon a year ago, I met my true love today."

And as it turns out, an explanation is readily available. We know who does better on standardized tests-- the children of high income families. We know who's more likely to get better-paying jobs as adults-- the children of high income families. It seems highly probable that the conclusion to be drawn from Chetty's research is, "Children of higher-income families do better on tests and get higher-paying jobs." 

Chetty himself tried to plug that last hole, with research about economic mobility that concluded that it's not any worse than it was a decade ago-- but it's still pretty lousy. Chetty et al also insist that the students were distributed across the classrooms in completely random fashion. This strikes many as an assumption without foundation.

Put another way-- a mediocre teacher with a classroom full of rich kids who test well would earn a high VAM and those well-heeled students would still go on to have well-paying jobs, and nothing in Chetty's model would ever reveal that Mr. McMediocre was less than awesome.

There are other detail-inhabiting devils. The "big difference" in future earnings seems to vary according to which draft of the report we're looking at, and Chetty only claims it as far as the students turning twenty-eight-- the "lifetime earnings" claims are based on the assumption that the subjects will just keep getting the same raises for the rest of their lives that they got up until age twenty-eight. That is a heck of a bold assumption.
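To see how much work that assumption does, here's a toy Python illustration (every number below is invented for the sketch; none of them are Chetty's figures):

    # Toy model: extrapolating a small age-28 earnings bump across a career.
    # All figures are hypothetical, chosen only to show the compounding effect.
    base_salary = 30_000      # made-up age-28 salary, comparison group
    bumped_salary = 30_300    # made-up 1% bump for the "high-VAM" group
    raise_rate = 0.05         # early-career raise rate, assumed to hold forever

    def career_earnings(start, rate, years=37):  # roughly ages 28 through 65
        return sum(start * (1 + rate) ** y for y in range(years))

    gap = career_earnings(bumped_salary, raise_rate) - career_earnings(base_salary, raise_rate)
    print(f"${gap:,.0f}")     # about $30,000 of "lifetime" difference

A $300 gap at age twenty-eight balloons into a headline-sized "lifetime earnings" number purely because the model assumes the raises never stop.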

Chetty's work rests on the unproven assumption that VAM is not junk. VAM, in turn, rests on the assumption that 1) the Big Standardized Tests provide meaningful data and 2) that a magical formula can filter out all other factors related to student results on the BS Tests. Chetty's work also assumes that adult success is measured in monetary terms. And Chetty's work ignores the difference between correlation and causation, and instead makes a huge leap of faith to link cause and effect. I would bet you dollars to donuts that we could perform research that would "prove" that eating a good breakfast when you're six, or having a nice pair of shoes when you're ten, can also be linked to higher-paying jobs in adulthood. As it is, we have "proof" that Nicolas Cage causes death by drowning, and that margarine causes divorce in Maine.

Chetty's work is not going to go away because it's sexy, it's simple, and it supports a whole host of policy ideas that people are already trying to push. But it is proof positive that just because somebody teaches at Harvard and wins awards, that doesn't mean they can't produce "research" that is absolute baloney.





Wednesday, March 4, 2015

Economist Hanushek Gets It Wrong Again

When you want a bunch of legit-sounding baloney about education, call up an economist. I can't think of a single card-carrying economist who has produced useful insights about education, schools and teaching, but from Brookings to the Hoover Institution, economists can be counted on to provide a regular stream of fecund fertilizer about schools.

So here comes Eric Hanushek in the New York Times (staging one of their op-ed debates, which tend to resemble a soccer game played on the side of a mountain) to offer yet another rehash of his ideas about teaching. The Room for Debate pieces are always brief, but Hanushek impressively gets a whole ton of wrong squeezed into a tiny space. Here's his opening paragraph:

Despite decades of study and enormous effort, we know little about how to train or select high quality teachers. We do know, however, that there are huge differences in the effectiveness of classroom teachers and that these differences can be observed.

This is a research puzzler of epic proportions. Hanushek is saying, "We do not know how to tell the difference between a green apple and a red apple, but we have conclusive proof that a red apple tastes better." Exactly what would that experimental design look like? Exactly how do you compare the red and green apples if you can't tell them apart?

The research gets around this issue by using a circular design. We first define high quality teachers as those whose students get high test scores. Then we study these high quality teachers and discover that they get students to score well on tests. It's amazing!

Economists have been at the front of the parade declaring that teachers cannot be judged on qualifications or anything else except results. Here's a typical quote, this time from a Rand economist: "The best way to assess teachers' effectiveness is to look at their on-the-job performance, including what they do in the classroom and how much progress their students make on achievement tests."

It's economists who have given us the widely debunked shell game that is Value-Added Measurement of teachers, and they've been peddling that snake oil for a while (here's a research summary from 2005). It captures all the wrong thinking of economists in one destructive ball-- all that matters about teachers is the test scores they produce, and every other factor that affects a student's test score can be worked out in a fancy equation.

And after all that, experts (and economists pretending to be experts) have figured that a teacher affects somewhere between 7.5% and 20% of the student outcome.

Now when Hanushek says that teachers make a huge difference, he is obliquely referencing his own crazy-pants assertion that having a good first grade teacher will make you almost a million bucks richer over your lifetime (you can also find the same baloney being sliced by Chetty, Friedman, and Rockoff). Both sets of researchers demonstrate their complete lack of understanding of the difference between correlation and causation.

Remember that, as always, they believe that "test scores" equal "student achievement." They note that students who get high test scores grow up to make more money. Clearly, the test scores cause the more-money-making, right? Or could it be that (as we already know) students from wealthier backgrounds do better on standardized tests, and that students from wealthier backgrounds tend to grow up to be wealthy adults?

So, in short, what we know about the "huge difference" created by Hanushek's idea of a "good teacher" is pretty much jack and also squat. But he's going to build a house on this sand sculpture of a foundation.

Without knowing the background, preparation or attributes that make a good teacher, we cannot rely on the credentialing process to regulate the quality of people who enter the profession. Therefore the most sensible approach is to expand the pool of potential teachers but tighten up on decisions about retention, tenure and rewards for staying in teaching.

Since we don't know how to spot good teachers, says Hanushek, we should get a bunch of people to enter the profession and then throw a bunch of them out. This is a fascinating approach, and what I really want to see is the kind of promotional brochures that Hanushek would help college programs design. "Come run up over $100K of debt on the off chance that you might be one of the lucky few to get a career in teaching." Or maybe "Do you think teaching might be the work you want to do, maybe? Well, don't get your heart set on it, but do commit to years of expensive education to test the waters." How does a career counselor even approach this subject? "We'd like all of you to commit to this profession with the understanding that we plan to find half of you unfit for it." How exactly do you talk a student into pursuing a career that you don't think he's fit for?

Evaluation of teacher performance becomes key. Gains in student achievement should be one element, because improving student achievement is what we are trying to do, but this is not even possible for most teachers. Moreover, nobody believes that decisions should be made just on test scores. What we need is some combination of supervisor judgments with the input of professional evaluators.

What? What??!! Improving student achievement aka test scores is what we're trying to do? First, which "we" do you mean, exactly, because I certainly didn't enter teaching dreaming of increasing standardized tests scores. And what do you mean "this is not even possible for most teachers"? I mean, it could be a sensible statement, acknowledging that most teachers do not teach subjects that are measured by the Big Standardized Test. And if "nobody believes" that the judgment should be made just on test scores, why would you say that raising test scores is "what we're trying to do"?

And "professional evaluators"? Really. That's a thing? People whose profession is just evaluating teachers? How do you get that job? How do you prepare for that job? Is that what we're going to do with all the people we talked into pursuing teaching as a career just so we could have excess to wash out?

Hanushek closes by trotting out DC schools as an example of how the test and punish, carrot and stick system works so super well. Would that be the system that was revamped to not include test scores because they were such a mess? Or is he thinking of the good old days when She Who Will Not Be Named used the system to spread fear and loathing, creating an atmosphere ripe for rampant cheating?

There's no evidence, anywhere, that test-based accountability improves schools. None. Not a bit. Not when it's used for "merit pay," not when it's used for hiring and firing decisions, not when it's used for any system of carrots and sticks. Nor could there be evidence, because the only "evidence" folks like Hanushek are looking at is test scores, and test scores are a measure of one thing, and one thing only-- how well students score on the Big Standardized Test. And there is no evidence anywhere that those test scores mean anything else (and that would include looking back to the days when US low test scores somehow didn't stand in the way of US economic and international success).

It's tired baloney, baloney sliced so thin that it's easy to see through it. You may want to argue that I am just a high school English teacher, so what do I know about big-brained economics stuff. I'd say that if a high school English teacher can see the big fat hole in your weak economist-generated argument, that just tells you how weak the argument is. Hanushek has become one of those go-to "experts" whose continued credibility is a mystery to me. He may be an intelligent man, a man who treats his mother well, and fun to hang out with. But his arguments about education are baseless and unsupportable. If you're going to read any portion of the NYT debate, I recommend you skip over Hanushek and check out the indispensable Mercedes Schneider, whose piece is much more closely tied to reality.

Tuesday, January 20, 2015

Dear Randi: About That ESEA Petition--

You've been kind enough to drop me an email about your position on testing in the might-be-new ESEA, so I wanted to share my reaction with you.

What the hell are you thinking?

You've enumerated four actions you would like Congress to take with the could-be-revamped ESEA (in partnership with CAP which already blew my mind just a little). While they are clearer than the joint-CAP statement, they don't make me feel any better.

End the use of annual tests for high-stakes consequences. Let’s instead use annual assessments to give parents and teachers the information they need to help students grow.

Oh, hell. While we're at it, let's use annual assessments to make pigs fly out of our butts, because that's just about as likely as the test being a useful source of information that I need to help my students grow. Exactly how would this work? Exactly what would I learn from a standardized test given late in the year, results to be released over the summer, that would help me grow those students?

Use the data we collect to provide the federal government with information to direct resources to the schools and districts that need extra support. 

Yes, because that has worked so well so far. The federal government is great about allocating resources on the local level without lots of red tape and strings attached. 

You know what would work better? Actual local control. Actual democracy on the local level. Actual empowerment of the people who have the largest stake in the community's schools. 

Ensure a robust accountability system that judges schools looking at multiple measures—including allowing real evidence of student learning. 

Do you remember when you were on twitter, pushing "VAM is a sham" as a pithy slogan? What the heck happened? How can the head of a national teachers' union take any approach about the widely discredited and debunked test-based evaluation of students other than, "Hell no!" 

And finally, the federal government should not be the human resources department for local schools, and should not be in the business of regulating teacher evaluation from Washington D.C. Teacher evaluation is the district’s job.

Oh, come on. In what universe does the federal government give local school districts resources and oversee their accountability system, but still leave them free to do the job? Answer: none. This is local control just like the Common Core was "freely adopted" by states. This is the feds saying, "You can paint your school any color you want, and we'll buy the paint, just as long as you meet the federal standards that say all schools must be black. But otherwise you're totally free under local control."

Randi, I have been a fan in the past, but I find this policy package an absolute headscratcher, and no matter how I squint, I cannot see the interests of public education (or the teachers who work there) reflected anywhere in the shiny surface of this highly polished turd. 

So, no. I'm not going to sign your petition, and I'd encourage others to refrain as well. This is just wrong. Wrong and discouraging and a little anger-inducing, and I'm not going to the dark side with you, not even if they have great cookies.

Sincerely,

Peter Greene

Monday, January 19, 2015

Spotting Bad Science

This item comes via my old friend and Chicago nursing queen Deb Burdsall. Its original source is Andy Brunning at chemistry site Compound Interest, but it certainly brings to mind some of the "science" that floats around the education world.

So here is the rough guide to spotting bad science:


1) Sensationalized headlines are not always the fault of the researchers when their work is glommed up by the media, but headlines like "Good teachers are as important as small class size" or "Calculus can make you rich?" are not a good sign.

2) Misinterpreting results. How many times have you followed up on a piece of research only to find it doesn't actually prove what the article says it proves?

3) Conflict of interests. As in, funding research specifically to prove that your pet theory is correct. Just google Gates Foundation.

4) Correlation and causation. This one is everywhere, but nowhere has it been more damaging than in all the policy decisions by the current administration deciding that since low standardized test scores and poverty go together, low test scores must cause poverty. And more research that concludes that teachers cause low test scores.

5) Speculative language. Again, we are living with a boatload of policies based on how we think things ought to work. The infamous Chetty study about future earnings is loaded with suppositions.

6) Sample size too small. Is this still a problem? I just remember looking up studies in college and discovering the "research" was performed on thirty college sophomores. 

7) Unrepresentative samples. Chalk this one up for every piece of "research" that proves the effectiveness of a charter school.

8) No control group used. A built-in limitation of education research. We can't really assign a group of tiny humans to have no education so we can see what difference a teacher makes.

9) No blind testing used. Also a limitation. I'm not sure I can even think of how to use blind testing of educational techniques. Blind teacher? Students wrapped in plain brown paper? We get a pass on this one, too.

10) Cherry-picked results. Well, yes. Easiest to do if you're doing charter research and you cherry-pick the test subjects to begin with.

11) Unreplicable results. Sort of like the way VAM scores never come out the same way twice. In fact, VAM fills the bill for most of these indicators of bad science.

12) Journals and Citations. My favorite thing about thinky tank "research" is how it provides nice citation pages filled with references to other papers from thinky tanks. Or this ACT report with footnotes from other ACT reports.

VAM is perhaps the leading source of junk science in the education field, but there are so many fine examples. Print out the handy graphic above and keep it nearby the next time you are perusing the latest in educational "research."

Also, I am going to use this as an excuse to post this picture (one of my faves)


Thursday, January 15, 2015

Fixing Tenure

Conversations over the holiday break have reminded me that to regular civilians, the removal of bad teachers remains a real policy issue. There's no arguing with the premise-- "I didn't have a single bad teacher in all my years of school," said no person ever. Arguing against a system for removing incompetence from the classroom is like arguing against the heliocentric model of the solar system; it can be done, but you'll look like a dope.

But we aren't any closer to fixing whatever is supposedly wrong with tenure than we were a few years ago. Why not? Because there are certain obstacles to the brighter bad-teacher-firing future that some dream of.

Administration

In most districts there is a perfectly good mechanism in place to fire bad teachers. But to use it, administrators have to do work and fill out forms and, you know, just all this stuff. So if you're an administrator, it's much easier to shrug and say, "Boy, I wish I could do something about Mr. McSlugteach, but you know that tenure."

A natural reluctance is understandable. In many districts, the administrator who would do the firing would be the same one who did the hiring, and who wants to say, "Yeah, I totally failed in the Hiring Good People part of my job."

Yes, there are large urban districts where the firing process is a convoluted, expensive, time-wasting mess. But that process was negotiated at contract time; school leaders signed off on it. Could a better version be negotiated? I don't know, but I'll bet no teacher facing those kinds of charges thinks, "Boy, I hope this process that's going to decide my career is going to be long and drawn out."

We know that administrators can move quickly when they want to. When a teacher has done something that smells like parent lawsuit material, many administrators have no trouble leaping right over that tenure obstacle.

All of which tells us that most administrators have the tools to get rid of incompetent teachers. They just lack either the knowledge or the will. So there's our first obstacle.

Metrics vs Quality

We don't have a valid, reliable tool for measuring teacher quality. There can't be a serious grown-up left in this country who believes that VAM actually works, and that's all we've got. The Holy Grail of evaluation systems is one that can't be tilted by a principal's personal judgment, except that would be a system where a good principal's good judgment would also be blocked, and that seems wrong, too. We need to allow local discretion except when we don't.

I have a whole system blocked out and I'm just waiting for a call to start my consulting career. The downside for national scalability fans is that my system would be customized to the local district, making it impossible to stack rank teachers across the country.

And even my system is challenged by the personal quality involved. I can have every graduate of my high school list their three best and worst teachers, and they can probably all do it-- but their lists won't match. Bad teaching is like pornography-- we know it when we see it. But we don't all see it the same way. Identifying how we know bad teaching is a huge challenge, as yet unsurmounted.

Metrics vs Time

But that hurdle is just about identifying who's doing a good or bad job right now. There's another question that also needs to be answered-- with support, will this teacher be better in the future? Once we've spotted someone who's not doing well, can we make a projection about her prospects? I've known many teachers who started out kind of meh in the classroom, but got steadily better over the course of their careers (include me in that group). I've known several teachers who hit a bad patch in mid-career and slumped for a while before pulling things back together.

If I ask graduates from over two decades to list best and worst teachers, that will provide even more variety in the lists. So how do we decide whether someone is just done, or whether some support and improvement will yield better results than trying to start from scratch with a new person?

Hiring replacements

Any system that facilitates removing bad teachers must also reckon with replacing them. In fact, if we were good at hiring in the first place, we'd have less need to fire.

For all the attention and money and lawyering thrown at tenure, precious little attention has been paid to where high-quality replacements are supposed to come from. Instead, we've got the feds preparing to "evaluate" ed programs with the same VAM that serious grown-ups know is not good for evaluating teachers.

But the lack of suitable replacements has to be part of the serious calculus of firing decisions. Beefing up the teacher pool must be part of the tenure discussion.

Holding onto quality

The constant gush, gush, gush of teachers abandoning the profession is also a factor. If I've just had two or three good teachers quit a department in the last year, I'm less inclined to fire the ones I have left (who at least already know the bell schedule and the detention procedures). There are many ways to address this, including many that don't cost all that much money. But if you are going to remove a feature of teaching that has always made it attractive-- job security-- you need to replace it with something.

This is why holding onto a few less-awesome teachers is better than firing some good ones-- you do not attract teachers by saying, "You might lose your job at any time for completely random reasons." If you can't hold onto your better people, your school will be a scene of constant churn and instability, which will go a long way toward turning your okay teachers into bad teachers.

The virtues of FILO

I know, I know. Just go to the comments and leave your story of some awesome young teacher who lost her job while some grizzled hag got to stay on. First In, Last Out may be much-hated, but it has the following virtues.

1) It is completely predictable. You don't have to wonder whether or not your job is on the line. The school trades a handful of young staffers with job worries for the rest of the staff having job security.

2) It's a ladder. As a nervous young staffer, you know that if things work out, you'll earn that job security soon enough.

3) Youth. Young teachers at the beginning of their careers are best able to bounce back from losing a job. Being fired is least likely to be a career-ender for the newbs.

But in private industry--

Don't care. Schools are not combat troops, hospitals, or private corporations. I'll save the full argument for another day, but the short argument is this-- schools are not private industry, and there's no good reason to expect them to run like private industry.

Whose judgment

At the end of the day, any tenure and firing system is going to depend on somebody's judgment. When we use something like the Danielson rubric or even a God-forsaken cup of VAM sauce, we are simply substituting the judgment of the person who created the system for the judgment of the people who actually work with the teacher.

True story. In a nearby district a few years ago, the teachers were called to a meeting, and as they entered the meeting, they pulled numbers out of a hat. Then as the meeting started they were told what the numbers meant-- certain numbers would have a job the next year, other numbers would not, and the last group were maybe's.

That's what an employment system that uses no personal judgment looks like, and it satisfies the needs of absolutely none of the stakeholders. What we need is a system that uses the best available judgment in the best possible way. But it will have to address all the issues above, or we're just back to numbers in a hat.

Originally posted at View from the Cheap Seats

Monday, January 12, 2015

Schneider on Evaluation

Regular readers here know that I'm a huge fan of Mercedes Schneider, whose attention to detail, relentless research skills, and sharply analytical mind are an inspiration. Also, she once called me the Erma Bombeck of education bloggers, so I kind of love her for that, too.

I read her blog regularly and repeatedly, and while all of it is indispensable, a recent post of hers about Doug Harris and the promotion of VAM contains these pure gold paragraphs about teacher evaluation. I'm copying them out here mostly so that I can find them whenever I want to, but you should read them and take them to heart, too.


Point systems for “grading” the teacher-student (and school-teacher-student) dynamic will always fall short because the complex nature of that dynamic defies quantifying. If test-loving reformers insist upon imposing high-stakes quantification onto schools and teachers, it will backfire, a system begging to be corrupted by those fighting to survive it.

It is not that I cannot be evaluated as a teacher. It’s just that such evaluation is rooted in a complex subjectivity that is best understood by those who are familiar with my reality. This should be true of the administrators at one’s school, and I am fortunate to state that it is true in my case.

There are no numbers that sufficiently capture my work with my students. I know this. Yes, I am caught in a system that wants to impose numeric values on my teaching. My “value” to my students cannot be quantified, nor can my school’s value to my students, no matter what the Harrises of this world might suggest in commissioned reports.

Sunday, January 11, 2015

Parents Demanding Testing

Given the rhetoric in the world of education, there are some things that I would expect to see, and yet don't. For example:

The Chetty Follow-up

Chetty et al are the source of the infamous research asserting that a good elementary teacher will result in an extra couple of hundred thousand lifetime dollars for the students in their classroom.

Where are the follow-up and confirming studies on this? After all-- all we need are a pair of identical classrooms with non-identical teachers teaching from the same population. Heck, in any given year my own department has two or three of us teaching randomly distributed students on the same track. All you'd have to do is follow them on through life.

In fact, I would bet that where the Chetty effect is in play, it's the stuff of local legend. For years people have been buzzing about how Mr. McStinkface and Ms. O'Awesomesauce teach the same classes with the same basic sets of kids, but her students all grow up to be successful, comfortably wealthy middle class folks and his students all grow up to a life of minimum wage jobs and food stamps.

I can't think of anything that would more clearly confirm the conclusions and implications of Chetty's research. So where is that report?
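It wouldn't even take a grant to run. Here's a back-of-the-envelope sketch (every number below is invented for illustration, including the size of the per-student bump the study supposedly implies) of what that follow-them-through-life comparison would look like:

```python
# Back-of-the-envelope sketch: two classrooms of randomly assigned students,
# one teacher carrying a Chetty-sized earnings bump. All numbers invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n = 25                # students per classroom -- assumption
baseline = 45_000     # hypothetical mean adult salary
spread = 20_000       # hypothetical standard deviation of adult salaries
bump = 350            # hypothetical per-student annual effect of a good teacher

o_awesomesauce = rng.normal(baseline + bump, spread, n)
mcstinkface = rng.normal(baseline, spread, n)

t, p = stats.ttest_ind(o_awesomesauce, mcstinkface)
print(f"observed gap: ${o_awesomesauce.mean() - mcstinkface.mean():+,.0f}")
print(f"p-value: {p:.2f}")  # almost never anywhere near significance
```

Run that a few times and the "gap" swings wildly in both directions: a bump of a few hundred dollars against a spread of tens of thousands is invisible at classroom scale, which may be one reason the local legends stubbornly fail to materialize.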

Parents Demanding Testing

To listen to testing advocates speak, one would think that our nation is filled with parents desperate for some clue about how their children and their schools are doing.

So surely, somewhere, there is a Parents Demand Tests group. Somewhere there must be a group of parents who have banded together to demand that schools give standardized tests and release the results, so that at last they know the truth.

"I just don't know," says some unhappy Mom somewhere in America. "I have no idea if Chris is learning to read or not. If only I had some standardized test results to look at."

"Dammit," growls some angry Dad somewhere in America. "I've had it with that school. Tomorrow I'm going down there to the principal's office to demand that Pat get a standardized test so we know if the kid can add and subtract or not."

But I can't find any such group on Facebook. Googling "Parents demanding testing" just gets me a bunch of articles about parents who are demanding tests for asbestos, air quality, and other safety issues.

This is a striking gap. After all, we have plenty of robust-ish astro-turf groups to convince us that parents are, for example, deeply incensed over tenure-related policy. We are shown that parents really, really want tests to be steeped in VAM sauce and lit afire, so that terrible teachers can be roasted atop them.

And yet, as the crowds increasingly call for the standardized tests to be tossed out with last week's newspapers, it's chirping crickets from parent-land. Not CCSSO, not Arne Duncan, not any of the test-loving advocates has punctuated their pro-test protestations with a moment of, "And I'd like you to meet Mrs. Agnes McAveragehuman who will now tell you in her own words why she thinks lots of standardized testing is just totes swell."

But the reformsters must know plenty of people like Agnes. After all, they keep insisting that we need the tests or else people will not know how well students are learning, what schools are teaching, what progress is being made. Why, just Friday, there was Charles Barone, policy director for Democrats for Education Reform (which I am going to call DERP because somebody ought to) in the Washington Post opining, "I don’t know how else you gauge how students are progressing in reading and in math without some sort of test." Now maybe he imagines that there's a danger of schools in which no tests are being given whatsoever, but my own use of context clues leads me to believe that he is speaking of standardized testing.

When Arne Duncan spoke up to pretend to join the CCSSO initiative to pretend to roll back testing, he made his case for standardized testing by saying, "Parents have a right to know how much their children are learning," implying that only a standardized test could provide that answer.

It is possible that Arne's theory is that parents think they know what's going on with their school and their own children, but are actually deluded and misled (as witnessed in his classic genius quote from late 2013). But by now, over a year later, don't you think we'd have some converts, some parents saying, "Thank you, Mr. Duncan. Now that I have seen some test results, the scales have fallen from my eyes and I realize that merely living with and raising this tiny human has blinded me to a truth that only a standardized test could reveal. Don't let them take those tests away, sir!! I need them to tell me who my child is!" And yet, they don't seem to have appeared.

Maybe these parents are simply disorganized. Maybe they're uniformly shy. Maybe they use some of those underground web thingies so they can operate with cyberninja-like stealth. Or maybe they are raising snipes on a special farm where the ranch-hands ride unicorns and the pumps run on cold fusion. Maybe this world where parents are clamoring for standardized tests to reveal the truth about their children is a world that doesn't actually exist.

The Great Chain of Effectiveness

The USED is ready to forge the next link in the Great Chain of Effectiveness.

We're already familiar with the first two links. The very first link is the testing link-- a standardized test covering narrow slices of two subject areas is forged as a measure of the full education of a child. The second link runs from that test to the teacher. Soaked in magic VAM sauce, that second link says that the test results are the responsibility of the teacher, and so it measures how effective that teacher actually is.

Link number three is on the way. That link will stretch from the classroom teacher back to the college department that trained her to be a teacher. The USED is proposing a heaping side order of VAM sauce for colleges and universities. If a student's results for bubbling in answers on questions about certain narrow areas of math and reading are too low, that is clearly the responsibility of the college department that certified the student's teacher. They should be rated poorly.

But why stop there?

That college education department is composed of professors who are clearly ineffective. The institutions that issued their advanced degrees should be rated ineffective. And their direct oversight comes from college administration-- so let's include their ineffectiveness in the college president's evaluation.

And where did that guy come from? This is more complicated, but we'll need to cross-reference his salary, because we know from Chetty et al that a good elementary teacher would have made a difference of several hundred thousand dollars in salary. So if our ineffective college president is also not super-well-paid, we can clearly conclude that his first grade teacher was ineffective-- let's hunt her down and downgrade her evaluation.

Of course, that raises another problem in our great chain of accountability. College presidents aren't generally young guys, so it's possible that his first grade teacher is dead. But now that we've located her, we can locate all the students she ever taught. Now, it's possible that some of her students went on to have successful careers even though she was ineffective-- we can discard those from the sample and assume that those are all students with grit. The rest of her former students who are not making big bucks must be the result of her ineffective instruction, and the government has an obligation to send letters to all of their employers indicating that the federal government has determined that, due to an ineffective first grade teacher, those employees are losers.

Now, if any of those students went on to become teachers, we have a bit of a bind. If that teacher turns out to be ineffective, do we blame his college or his first grade teacher?

But back to our ineffective college president. Somebody hired him, so those people must also be rated ineffective. Those university trustees and directors are usually folks with successful careers, but their hiring of the president who ran the department that trained the teacher who taught the child who bubbled in several wrong answers on his test reveals them to be actually ineffective. Good government oversight requires that any products produced by their companies should be stamped with a warning: "Warning. This product was produced by a company run in part by an ineffective human being."

Of course, some of those college presidents are in charge of public universities. These state schools are ultimately run by state level bureaucracies, and those are of course ultimately answerable to a governor. So the governor would have to be rated ineffective as well.

But the ineffective governor was elected by the people. I realize that it would require a breach of values that we've long held dear, but I think we've established that in the pursuit of effectiveness labels for education, long-held American values can go straight into the dumpster. So-- let's find out exactly who voted for that ineffective governor, and let's rate them ineffective voters. Maybe we should take away their votes in the future and just bring in a charter voting company to do the voting for all those people, who, in the meanwhile, have to be Great Chained back to their own first grade teachers, who are clearly responsible for their ineffective voting.

You may say that the Great Chain of Effectiveness is built out of tin foil and tenuous connections, and that it violates laws of common sense and decency. Just watch it. That's the kind of talk that gets a person's first grade teacher on a list.

Saturday, January 10, 2015

Brookings Fails To Make Case For Annual Testing

I kind of love the guys at Brookings. They are such a reliable source of earnest amateur writing about education. They're slick, polished, and professional, and they rarely know what they're talking about when it comes to education.

Like most everybody paying attention, they see the writing on the wall for an ESEA rewrite by the GOP Congress, and the four (!) authors of this piece would like to put their oar in for maintaining the regimen of annual testing.

"The Case for Annual Testing," by Grover J. "Russ" Whitehurst, Martin R. West, Matthew M. Chingos and Mark Dynarski of the Brown Center on Educational Policy, presents an argument that they contend is composed of four part. And not one of them is correct. The central foundation of the structure is that testing, standards and accountability are discrete and totally separable. So we're in trouble already with this argument. But let's go ahead and look at the four legs of this stool.

Federal control of standards and accountability is unnecessary, but the provision of valid and actionable information on school performance is a uniquely federal responsibility.

Information on school performance in education is a public good, meaning that individuals cannot be effectively excluded from using the information once it exists. Because it is impossible to prevent consumers who have not paid for the information from consuming it, far too little evidence will be produced if it is not required by the federal government.

IOW, local districts won't produce information because they are afraid that someone will see it, so only the federal government can force the production. And, the authors continue, only the feds can produce the high-grade top-quality stuff. The argument is some combination of "nobody else is as good as the feds" and "others can do it, but they won't unless the feds make them."

The states, they argue, are perfectly capable of setting standards and holding schools accountable. But somehow, only the feds can get good information. How does that even make sense? States are perfectly capable of making a good pancake and telling if it's any good, but only the feds can go to the store for the ingredients? How would states set standards or hold schools accountable if they couldn't also come up with the information implicit in each of those activities?

Nevertheless, Brookings says sternly, "If the federal government doesn't support it,  it will not happen."

Note: they have made another bad assumption here, but I'll wait a bit to bring it up.

Student learning impacts long-term outcomes that everyone should value, and test scores are valid indicators of such learning.

Neither half of this sentence is correct.

The first half of the sentence is supported entirely and only in the article by the work of Chetty, Friedman and Rockoff. This is the infamous study asserting that a good teacher in elementary school will make a difference of $250,000ish dollars in future earnings. Disproving the study is a popular activity, made extra popular because much of the proof is right there in the original study's own data set. If you'd like to read a scholarly takedown, try this. If you'd like one with plain English and a Phineas and Ferb reference, try this. Either way, the study is bunk.

But while the first half is substantially wrong yet still kind of right (yes, student learning results in stuff that people should care about), the last half is just silly.

The authors try to shoehorn some Chetty et al in to prove the second half as well, but it doesn't hold up. This whole argument boils down to, "There's one paper that shows some teeny tiny correlation between test scores and doing well later in life."

But in terms of offering support for the assertion that test scores are a valid measure of important learning, they offer nothing at all. Nothing. At. All.

And here's the other thing-- even if they were a valid measure, so what? What is the purpose of knowing before the fact which students are headed for greater success as adults?

Many school management and improvement functions depend on annual measures of student growth.

The functions they're talking about include marketing charter schools and "differentiating" teachers. They assert, with a straight face, that you can't run VAM systems without test data, which they suggest is important by alluding again to Chetty, thereby managing to cram two discredited and debunked pieces of work into a single paragraph.

They also assert that test results are needed to evaluate policies that are foisted on schools (because, I guess, the schools themselves don't know or won't say). And they are looking out for the schools, which won't be credited for their success (credited? by whom? who is out there giving schools credits for doing a good job?).

Finally, you can't disaggregate data for subgroups if you don't have data.

Most of the opponents of federally imposed standards, testing, and accountability should be in favor of federally imposed annual testing shorn of standards and accountability.

Brookings' fourth and final point is that everybody really ought to love annual testing once you remove accountability and standards from the mix (if I could insert a Jon Stewart "Do tell" gif here, I surely would).

Conservatives should love it because testing data can be used to feed school choice. And to assuage their fears of federal oversight, the writers offer this astonishing assertion:

And it doesn’t have to be the same test across the nation to provide this information, or even a single end-of-the-year test as opposed to a series of tests given across the year that can be rolled-up into an estimate of annual growth.  All that is required is something that tests what a school intends to teach and is normed to a state or national population.

I have no words. Apparently this entire article is a waste of time, because when they say they're in favor of annual testing, they just mean that at least once a year teachers should give some sort of test. Well, hey! Done!! I will leave it to you guys to figure out how those tens of thousands of tests will be normed up so that all of those schools doing testing in completely different ways can somehow be legitimately compared. Get back to me when you sort that one out, in a decade or two.
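For the record, "normed" just means a raw score gets converted into a standing relative to some reference population. A quick sketch (scores and populations invented) of why percentiles from different tests, normed on different populations, can't be mashed together:

```python
# Minimal sketch of norm-referencing: a raw score only becomes a percentile
# relative to some reference population. Percentiles from different tests
# normed on different populations are not interchangeable.
import numpy as np
from scipy.stats import percentileofscore

rng = np.random.default_rng(0)

# Two hypothetical tests with different scales and different norm groups.
state_norm_group = rng.normal(500, 100, 10_000)    # Test A, state population
district_norm_group = rng.normal(70, 12, 10_000)   # Test B, one district

print(percentileofscore(state_norm_group, 560))    # roughly 72nd percentile on Test A
print(percentileofscore(district_norm_group, 82))  # roughly 84th percentile on Test B
# Those two percentiles describe different tests, different content, and
# different populations -- there's no defensible way to equate them.
```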

For progressives, they offer the argument that disaggregated test data is a useful tool for lobbying on behalf of whatever subgroup you're concerned about. I've contemplated this argument before, and while I understand the appeal of keeping groups from disappearing, I have serious ethical issues with using students as tools to generate talking points. If your argument for testing is, "Well, no, it doesn't really serve the kids. It might even be damaging for the kids. But it generates some real good lobbying material for advocates," I think you're on shaky ground, indeed.

And parents? Well, there's this:

Surely, such parents no more want to be in the dark about a K-12 school’s academic performance than they would want to ignore the quality of the college to which their child will eventually seek enrollment.

Because, of course, all students will eventually seek enrollment in a college. Beyond that, I'm wondering as always-- where is this great mass of parents clamoring for and demanding federal testing? Where are all these parents who have no idea how well their child's school is doing and so are desperately demanding federal test results so they will know?

Brookings finally notes that teachers unions might be a lost cause on this issue because (and they use very nice fancy language to say this) teachers are all afraid of being evaluated and punished for the results. But teachers should be practical enough to see the value in trading an end to test-linked evaluations in exchange for keeping the annual tests themselves.

To wrap up

As always, Brookings really captures the point of view of economists who haven't an actual clue about what goes on in actual schools.

The biggest gaping hole in their proposal is an unfounded belief in the validity of The Big Test. They believe that The Big Test is a valid measure of learning, and that is an assumption that nobody, anywhere has backed up. The closest these guys come is throwing around the infamous Chetty results, and all that Chetty shows is that there is a slight correlation between test scores and later financial success (thereby creating supremely narrow definitions of learning and success). For their purposes, that means nothing. I'll bet you that there's a correlation between how nice a student's shoes are and how successful that student is later in life, but that doesn't mean buying nice shoes for every student would make the student successful later in life.
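That shoe bet is easy to demonstrate, by the way. Here's a toy simulation (every number made up) in which family income buys the nice shoes and drives the adult earnings, while the shoes themselves do exactly nothing:

```python
# Toy demonstration that a confounder (family income) can manufacture a
# shoes-to-success correlation with zero causal effect from the shoes.
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

family_income = rng.normal(60_000, 25_000, n)
shoe_niceness = 0.5 * (family_income / 10_000) + rng.normal(0, 1, n)
adult_earnings = 0.6 * family_income + rng.normal(0, 15_000, n)  # no shoe term

r = np.corrcoef(shoe_niceness, adult_earnings)[0, 1]
print(f"shoes vs. adult earnings: r = {r:.2f}")  # comfortably positive
# Buying every kid nicer shoes moves shoe_niceness and nothing else,
# so it changes nothing downstream -- exactly the nice-shoes bet above.
```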

But every piece of the Brookings argument rests on that foundation-- that a narrow bubble test with some questions about math and some reading questions somehow measures the full depth and breadth of a student's education. Brookings assumes that people are just upset about the High Stakes part of High Stakes Testing; they fail to grasp that a major reason for being upset about the High Stakes portion is that the Testing is crap. You can play with the data from the crap test all day, but at the end of the day, you'll just have crap data in a shiny report.

Final verdict? Brookings has completely failed to make a case for annual testing.

Wednesday, January 7, 2015

Fun with Teacher Evaluation




Jim Popham presents a great little infomercial for the teacher evaluation biz, including such fine products as Five Hour Pedagogy and a Value-added Mystery Model. Save this for when you need a ten-minute lift.


Monday, January 5, 2015

Speak Up Now for Teacher Prep Programs

The holidays are over, life is back to normal(ish), and your classroom has hit that post-holiday stride. It is time to finally make your voice heard on the subject of teacher preparation programs.

As you've likely heard, the USED would like to start evaluating all colleges, but they would particularly like to evaluate teacher preparation programs. And they have some exceptionally dreadful ideas about how to do it.

Under proposed § 612.4(b)(1), beginning in April, 2019 and annually thereafter, each State would be required to report how it has made meaningful differentiations of teacher preparation program performance using at least four performance levels: “low-performing,” “at-risk,” “effective,” and “exceptional” that are based on the indicators in proposed § 612.5 including, in significant part, employment outcomes for high-need schools and student learning outcomes.

And just to be clear, here's a quick summary from 612.5

Under proposed § 612.5, in determining the performance of each teacher preparation program, each State (except for insular areas identified in proposed § 612.5(c)) would need to use student learning outcomes, employment outcomes, survey outcomes, and the program characteristics described above as its indicators of academic content knowledge and teaching skills of the program's new teachers or recent graduates. In addition, the State could use other indicators of its choosing, provided the State uses a consistent approach for all of its teacher preparation programs and these other indicators are predictive of a teacher's effect on student performance. 

Yes, we are proposing to evaluate teacher prep programs based on the VAM scores of their graduates. Despite the fact that compelling evidence and arguments keep piling up to suggest that VAM is not a valid measure of teacher effectiveness, we're going to take it a step further and create a great chain of fuzzy thinking to assert that when Little Pat gets a bad grade on the PARCC, that is ultimately the fault of the college that granted Little Pat's teacher a degree.
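To see the mechanics, here's a hypothetical sketch of the sort of formula a state compliance office might bolt together to satisfy proposed 612.5 -- every weight and cut score below is invented, which is rather the point, since the regulation leaves states to invent them too:

```python
# Hypothetical sketch of a state's teacher-prep rating formula under the
# proposed rules. All weights and thresholds here are invented; the
# regulation only demands four labels and a VAM-flavored composite.
def rate_program(avg_graduate_vam: float,      # VAM scores of program graduates
                 placement_rate: float,        # employment outcomes, 0-1
                 survey_score: float) -> str:  # employer/graduate surveys, 0-1
    composite = 0.5 * avg_graduate_vam + 0.3 * placement_rate + 0.2 * survey_score
    if composite >= 0.75:
        return "exceptional"
    if composite >= 0.50:
        return "effective"
    if composite >= 0.25:
        return "at-risk"
    return "low-performing"

# The label swings on the VAM term -- that is, on how some K-12 students
# bubbled answers years after the program last saw their teacher.
print(rate_program(avg_graduate_vam=0.4, placement_rate=0.9, survey_score=0.8))
```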

Yes, it's bizarre and stupid. But that has been noted plenty throughout the blogosphere. Right now is not the time to complain about it on your Facebook page.

Now is the time to speak up to the USED.

The comment period for this document ends on February 2. All you have to do is go to the site, click on the link for submitting a formal comment, and do so. This is a rare instance in which speaking up to the people in power is as easy as using the same device you're using to read these words.

Will they pay any attention? Who knows. I'm not inclined to think so, but how can I sit silently when I've been given such a simple opportunity for speaking up? Maybe the damn thing will be adopted anyway, but when that day comes, I don't want to be sitting here saying that I never spoke up except to huff and puff on my blog.

I just gave you a two-paragraph link so you can't miss it. If you're not sure what to say, here are some points to bring up--

The National Association of Secondary School Principals has stated its intention to adopt a document stating clearly that it believes VAM has no use as an evaluation tool for teachers.

The American Statistical Association has stated clearly that test-based measures are a poor tool for measuring teacher effectiveness.

A peer-reviewed study published by the American Education Research Association and funded by the Gates Foundation determined that “Value-Added Performance Measures Do Not Reflect the Content or Quality of Teachers’ Instruction.”

You can scan the posts of the blog Vamboozled, the best one-stop shop for VAM debunking on the internet, for other material. Or you can simply ask how a college education department can possibly be held accountable for the test scores of K-12 students.

But write something. It's not very often that we get to speak our minds to the Department of Education, and we can't accuse them of ignoring us if we never speak in the first place.