Thursday, January 24, 2019

The Trouble With Evidence

So now some voices are calling for an emphasis on evidence-based practices in classrooms, and I don't disagree. Evidence-based is certainly better than intuition-based or wild-guess-based or some-guy-from-the-textbook-company-told-us-to-do-this based. But before we get all excited about jumping on this bus, I think we need to think about our evidence bricks before we start trying to build an entire house on top of them.

There are three things to remember. The most important is this:

Not all evidence is created equal.

Your Uncle Floyd is sure that global warming is a hoax, and his evidence is that it's currently five degrees outside. The Flat Earth Society has tons of evidence that the world is not round at all. YouTube is crammed with videos showing the evidence that 9/11 was faked, that the moon landing was staged, and that the Illuminati are running a huge world-altering conspiracy via the recording industry. Every one of those folks is certain that their idea is evidence-based, and yet their evidence is junk.

Some evidence is junk because it has been stripped of context and sense. Some is junk because it has been unmoored from any contradictory evidence that might give it nuance and accuracy. Some is junk because it has been mislabeled and misrepresented.

One of the huge problems with evidence-based anything in education is that when we follow the trail, we find that the evidence is just the same old set of test scores from a single bad standardized test. Test scores have been used as evidence of student learning, of teacher quality, of school achievement. Test scores have been used as evidence of the efficacy of one particular practice, an influence somehow separated from all others, as if we were arguing that the beans Chris ate at lunch three Tuesdays ago are the cause of Chris's growth spurt. And don't even get me started on the absurd notion that a teacher's college training can be evaluated by looking at student test scores.

Certainly test scores are evidence of something, but not much that's useful. Don't tell me that Practice A improves student learning if all you really mean is that Practice A appears to make test scores go up. The purpose of education is not to get students to do well on a standardized test.

Educational evidence has one other quality problem-- experimental design doesn't allow for real control groups. For the most part, educational researchers cannot say, "We know your child is supposed to be getting an education right now, but we'd like to use her as a lab rat instead." Arguably there are huge exceptions (looking at you, Bill Gates and Common Core), but mostly educational research has to go on during the mess of regular life, with a gazillion variables in play at every moment. Yes, medical research has similar problems, but in medical research, the outcomes are more clear and easily measurable (you either have cancer, or you don't).

Bottom line: much of the evidence in evidence-based educational stuff is weak. At one point everyone was sure the evidence for "learning styles" was pretty strong. Turns out it wasn't, according to some folks. Yet teachers largely use it still, as if they see evidence it works. Apparently the evidence is not strong enough to settle the battle.

And all of this is before we even get to the whole mountain of "evidence" that is produced by companies that have a product to sell, so they go shopping for evidence that will help sell it.

This helps explain why teachers are most likely to trust evidence that they collect themselves, the evidence of their own eyes and ears, involving a million data points gathered on a daily basis. This evidence is not foolproof either, but they know where it came from and how it was collected.

After we work our way through all that data collection, we still have to interpret it.

We just came out of a harrowing weekend of national argument over an incident in DC involving some Catholic school teens, some indigenous peoples, and the Black Israelites. We've all had the same opportunity to watch and examine the exact same evidence, and we have arrived at wildly different conclusions about what the evidence actually shows. Not only do we reach different conclusions, but we are mostly really, really certain of our conclusions. And that's just the people making good-faith efforts to interpret the data; once we throw in the people trying to push a particular agenda, matters get even worse.

My point is this-- anyone who thinks that we'll be able to just say "Well, here's the evidence..." and that just settles everything is dreaming. All evidence is not created equal, and not all interpretations of evidence are created equal.

Newton is no longer king.

I get the desire, I really do. You want to be able to say, "Push down this lever, and X happens. Pull on this rope and the force is exerted through a pulley and the object on the end of the rope moves this much in this direction." But levers and pulleys are simple machines in a Newtonian universe. Education is not a simple machine, and we don't live in Newton's universe any more.

We left Newton behind about a century ago, but the memos haven't gotten around to everyone yet. Blame that Einstein guy. Things that we thought of as absolutes like, say, time and space-- well, it turns out they kind of depend on where you are and how you're moving, and even that can only be measured relative to something else. And then we get to chaos theory and information theory (one of the most influential reads of my life was James Gleick's Chaos) and the science that tells us how complex systems work, and the short version is that complex systems do not work like simple machines. Push down the "lever" in a complex system and you will probably get a different result every time, depending on any number of tiny uncontrollable variables. Not wildly variable, but not straight line predictable, either (we'll talk about strange attractors some time).

Students and teachers in a classroom are a complex machine indeed. Every teacher already gets this-- nothing will work for every student, and what was a great lesson with last year's class may bomb this year.

But there are people who desperately long for teaching to be a simple machine in Newton's world. I've been following reading debates for weeks and there are people in that space who are just so thoroughly sure that since Science tells us how the brain works re: reading, all we have to do is put the same Science-based method of teaching reading in the hands of every teacher, then every student will learn to read. It's so simple! Why can't we do that? Hell, Mike Petrilli floated for five minutes the notion of suing teacher prep schools that didn't.

The standard response to this is that teaching isn't a science. But I'll go one better-- it is science that tells us that such a simple Newtonian machine approach will not work. The dream some evidence-based folks have that we'll just use science to determine the best practices via evidence, and then just implement that stuff-- they are misunderstanding both teaching and science.

The federal definition of evidence-based is dumb.

The ESSA includes support for evidence-based practices, and it offers definitions of different levels of evidence-basededness that are.... well, not encouraging.

Strong Evidence means there's at least one good research paper that suggests the intervention will improve student outcomes (which, of course, actually means "raise test scores") or a related outcome (which means whatever you want it to). There should be no legit research that contradicts the findings, it should have a large sample, and the sample should overlap the populations and settings involved. In other words, research about rural third graders in Estonia does not count if you're looking for an intervention to use with American urban teens.

Moderate Evidence is one good "quasi-experimental study" and then all the other stuff applies. Not really clear what a quasi-experimental study is, but the department still considers moderate evidence good enough.

Promising Evidence requires a correlation study because (and this really explains a lot) even the federal government doesn't know the difference between correlation and causation. I just smacked my forehead so hard my glasses flew off.

Demonstrates a Rationale, like Promising Evidence, somehow doesn't appear on the No This Doesn't Count list. All this means is you can make an argument for the practice.

All four of these are enshrined in ESSA as "evidence-based," even though a layperson might conclude that at least two of them are sort of the opposite of evidence-based.

So do we throw out the whole evidence-based thing?

Nope. Having evidence for a practice is smart, and it's mostly what teachers do. I don't think I've ever met a teacher who said, "Well, this didn't work for anybody last year, so I'm going to do it again this year." No, teachers watch to see how something works, and then, like any scientist, accept, reject or modify their hypothesis/practice. This, I'd argue, is how so many classroom teachers ended up modifying the baloney that was handed to them under Common Core.

Teachers are perfectly happy to borrow practices that they have reason to trust. The most powerful message implicit in a site like "Teachers Pay Teachers" is "I use this and it works for me." Government agencies and policy wonks who want to help disseminate best evidence-based practices can be useful, provided they're clear about what their evidence really is.

But evidence-based should not be elevated to the status of Next Silver Bullet That Will Fix Everything, and government definitely absolutely positively should not get involved in picking winners and losers and then mandating which will be used. In the dead-on words of Dylan Wiliam, "Everything works somewhere and nothing works everywhere."

Nor should we get trapped in the evidentiary Russian doll set, where we need evidence of the evidence really being evidence by collecting more evidence about the evidence ad infinitum. At some point, someone has to make a call about whether to use the practice or not, and that someone should be the classroom teacher. At that point, that teacher can begin collecting the evidence that really matters.


  1. The least valuable "evidence" in the world of test-and-punish reform rears its useless head as any given "incorrect" test item response.

    Without knowing WHY a student response was incorrect, all those wrong answers are just possibility indicators.

    The possibilities:

    confusing test item
    age inappropriate test item
    unmotivated student
    reading disability
    limited working vocabulary
    test fatigue
    tired student
    angry student
    stressed student
    frequently absent student
    cognitively disabled student
    serious life distractions
    poorly taught topic
    age inappropriate topic
    true knowledge/skill deficit
    other . . . ?

    So much for the relentless quest for useful data via standardized testing.

  2. The whole proficiency-based learning (PBL, but not the project-based learning PBL) approach is tied to "evidence" of student learning, but there's (almost) no evidence that the approach itself is valid.