Sunday, December 29, 2019

The Height Of A Dead Salmon

A while back someone sent me an article with a striking lead:

The methodology is straightforward. You take your subject and slide them into an fMRI machine, a humongous sleek, white ring, like a donut designed by Apple. Then you show the subject images of people engaging in social activities — shopping, talking, eating dinner. You flash 48 different photos in front of your subject's eyes, and ask them to figure out what emotions the people in the photos were probably feeling. All in all, it's a pretty basic neuroscience/psychology experiment. With one catch. The "subject" is a mature Atlantic salmon.

And it is dead.

And yet, the fMRI showed brain activity. The experiment was run in 2009 and drew a fair amount of attention at the time, perhaps because it allowed editors to write headlines like "fMRI Gets Slap In The Face With A Dead Fish" and "Scanning Dead Salmon in fMRI Machine Highlights Risk Of Red Herrings." The paper, which carried the droll title "Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: An argument for multiple comparisons correction" didn't show that dead salmon have thoughts about human social interactions, and it didn't show that fMRI is junk. It showed, as one writer put it, that "bad statistics lead to bad science."

"Guess what I'm thinking right now!"
To understand why the salmon looked alive, there are two things to get. One is that vast amounts of data invariably include bizarre little outliers, and if you don't correct for the huge number of comparisons you're making, junk gets into your results. This matters to us because crunching huge amounts of data is exactly what the artificial intelligences using "machine learning" to figure out what a student coulda woulda shoulda completed as a competency depend on. When a VAM program decides that a student did better than expected, that expectation is driven by crunching vast amounts of data from supposedly similar students under supposedly similar conditions. This is how outfits like Knewton could claim to be the all-knowing eye in the sky (and then fail at it). The assertion that vast amounts of data ensure number-crunching accuracy is not true-- particularly if the data have not been properly handled. That uncorrected-comparisons error is exactly the piece of bad science the salmon study authors committed on purpose.
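If you want to see the multiple-comparisons problem for yourself, here's a toy simulation (mine, not the salmon paper's): run a significance test on thousands of "voxels" of pure random noise, and roughly 5% come up "active" by sheer chance. A Bonferroni correction-- dividing the threshold by the number of tests, which is the kind of fix the paper argued for-- knocks nearly all of them back out.

```python
import random

random.seed(42)

N_VOXELS = 10_000   # independent "voxels" of pure noise -- no brain anywhere
N_SAMPLES = 20      # measurements per voxel
Z_CRIT = 1.96       # two-sided alpha = 0.05 for a single test
Z_CRIT_BONF = 4.6   # approx. two-sided alpha = 0.05 / 10,000 (Bonferroni)

uncorrected = corrected = 0
for _ in range(N_VOXELS):
    xs = [random.gauss(0, 1) for _ in range(N_SAMPLES)]
    # z statistic for H0: true mean is 0 (known sigma = 1)
    z = abs(sum(xs) / len(xs)) * len(xs) ** 0.5
    if z > Z_CRIT:
        uncorrected += 1
    if z > Z_CRIT_BONF:
        corrected += 1

print("'Active' noise voxels, no correction:", uncorrected)  # around 5% of 10,000
print("'Active' noise voxels, Bonferroni:   ", corrected)    # almost certainly 0
```

Ten thousand chances to be wrong five percent of the time buys you a few hundred "findings" in data that contains nothing at all-- which is how a dead fish gets a rich inner life.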

Second, we have to get what an fMRI does and does not do. Boing-Boing, of all places, caught up with the authors of the study, who pointed out that despite the popular understanding of what the machinery does, it doesn't do that at all:

I'm so tired of hearing about "the brain lighting up". It makes it sound like you see lights in the head or something. That's not how the brain works. It suggests a fundamental misunderstanding of what fMRI results mean. Those beautiful colorful maps ... they're probability maps. They show the likelihood of activity happening in a given area, not proof of activity. According to our analysis, there's a higher likelihood of this region using more blood because we found more deoxygenated blood in this area. It's also correlational. Here's a time frame and the changes we'd expect, so we see which bits of brain correlate with that.

So, without getting too far into neuroscience, the fMRI does not map brain activity, but uses a proxy based on a correlation stretched over a huge number of data points.

This, I'll argue, is much like what we do with high stakes testing. Policy makers figure that Big Standardized Test scores correlate to educational achievement and therefore make a decent proxy. Except that they mostly correlate with family socio-economic background. The supposedly scientific measure of educational achievement in students is faulty. Using test scores as a proxy for teacher effectiveness is even faultier.

Which is why someone can write a paper like the hilarious "Teacher Effects on Student Achievement and Height: A Cautionary Tale." In this instant classic, researchers used the fabled VAM sauce to check for teacher influence on student height, and sure enough-- some teachers can be scientifically shown to have an effect on the physical growth of their students. Just in case you need one more piece of evidence that the value-added measures being used to evaluate teachers are, in fact, bunk.

There are several larger lessons to remember here:

1) A proxy for the thing you want to measure is not the thing you want to measure. If you don't remember that, you are going to make a mess of things (and even if you do remember, Campbell's Law says you probably will still screw it up).

2) Large amounts of data can say many things. Not all of those things are true. Bad statistics, badly managed, lead to really bad science.

3) When "science" tells you that a dead salmon is having thoughts and your common sense tells you that the salmon is frozen, dead, and thought-free, you should go with common sense. Dead fish tell no tales.

Science, data-- wonderful stuff. But not magic-- especially when misused or tossed around by folks who are no more scientists than they are educators.
