Thursday, June 8, 2017

Data Overload

The last nine months have brought a shift in my view of data.

Despite my long and vocal opposition to the gathering of data in schools, and to the use of standardized tests to generate data that is in turn used for everything from judging students on up, I am not against data itself. I collect tons of it every day in my classroom-- in fact, one of my criticisms of many reformer data programs is that they involve far too little data. On top of that, you need to collect good data (the Big Standardized Tests generate crappy data) and you need to run it through something that produces useful, legitimate analysis (not something like the opaque, crappy VAM models).

But one thing I'd always felt is that you can't have too much data. Now I'm rethinking that position.

Twin pregnancies are considered high risk, and when the twins share one placenta there's a risk of something called Twin-Twin Transfusion Syndrome (if you are pregnant with twins, I recommend the advice of one of our doctors, which was "Whatever you do, don't look this up on the internet")-- all of that put together gets you a ticket to a veritable testapalooza. Ultrasound after ultrasound, with measurements and pictures; my wife's uterus was more thoroughly explored than the surface of the moon after the first telescope.

Measurements indicated a difference in size of 21% between the twins, and protocol called for anything over 20% to trigger a new series of tests, even though none of our doctors thought any of these tests would discover anything important. One wonky heart measurement triggered a trip to Children's Hospital in Pittsburgh (where we took the grape elevator to the Hippo Ward) for some super-ultrasound; the doctor there confirmed what everyone had already said, which is that there was nothing to worry about. We had our regular doctors at home and a specialist in Erie watching over us. And all of our doctors agreed on one thing-- if you use all of the equipment available to modern medicine to gather tons and tons of data, you will manage to capture little human variations that will alarm you a lot and actually mean very little.

Part of the problem is that the protocols aren't really complex enough. It's like a protocol that says that if your shoes are wet you need to be checked out for a wound, when actually it's only a major concern if your shoes keep getting wet even when you dry them off and they're actually wet with blood and there's a piece of rebar protruding from your torso. But you stepped in a puddle and now policy says you need to see the surgeon. And yes-- I would rather be safe than sorry and will go to crazy extremes to do so, but reflecting on the past nine months of constant test and peek and prod, I see that 1) we have spent an awful lot of extra time worrying about things it wouldn't have occurred to us to worry about and 2) I can't imagine how someone who was uninsured or under-insured would have ever dealt with this.

So the extra data, because it lacked context and lacked complexity and because our doctors weren't entirely free to follow their own judgment, didn't really make things better, and may even have made things marginally worse in terms of worry and expense.

The kind of data we get from Big Standardized Tests and other standardized tools has a similar problem-- it's incomplete, lacking context, and cut off from teachers' professional judgment (in many cases we aren't even allowed to view the instrument that generated the data). But when some of us scofflaws go ahead and peek anyway, what we see is that a question really hinged on a student knowing one or two vocabulary words (let's say the test manufacturer fave, "plethora"). So a student who doesn't know "plethora" misses a question that is interpreted as "student is below basic at understanding words through context clues" which in turn becomes "student is not ready for college or a career." Because the student didn't know what "plethora" means.

I have always believed that more data is better, but more bad data is just more bad data, and more data that's not thoughtfully collected can just add a bunch of noise, and the noise causes confusion and worry and expense and responses to problems that aren't even there.


  1. I wish I could remember who said it, but when I first joined Twitter, I ran across a tweet that said it's possible to be data rich, but information poor.

  2. A memo written by the Penn Hill Group on behalf of the CCSSO calculated that a medium-sized state (in terms of the number of schools) would have to produce "306,300 discrete data items" on their state report card to comply with ESSA. Even with a 1% error rate (which is generous), that's over 3,000 potential mistakes to trigger some further action.

  3. To the tune of "Money" from Cabaret: Data makes the school go around, the school go around, the school go around, data makes the students pound, that clicky clacky sound, of online testing ground ... in ... data, data, data, data, data, data, data, data ... that clicky clacky sound of students going down.

    1. The most important piece of data in my local public high school is, as it has always been, GPA. A sufficiently high GPA gives a student automatic admission to my university, either because the student is ranked in the top third of their class or because they have at least a 2.0 in academic classes.

  4. The word "plethora" was on my PSAT in 1990. I got that analogy correct, and ended up a National Merit Scholar semifinalist.

    Because I'd watched "The Three Amigos" instead of cramming the night before.

    Not quite sure the moral of this story.