Wednesday, November 27, 2019

AI: Bad Data, Bad Results

Once upon a time, when you took computer programming courses, you had two things drilled into you:

1) Computers are dumb. Fast and indefatigable, but dumb.

2) Garbage in, garbage out.

The rise of artificial intelligence is supposed to make us forget both of those things. It shouldn't. It especially shouldn't in fields like education, which are packed with cyber-non-experts and far too many people who think that computers are magic and AI computers are super-shiny magic. Too many folks in the Education Space get most of their current computer "training" from folks who have something to sell.

The "AI" label is too often used inappropriately, slapped on what is really just a fancy algorithm with no actual intelligence, artificial or otherwise. We're supposed to get past that with software that can learn, except that we haven't got that sorted out either.

Remember Tay, the Microsoft intelligent chatbot that learned to be a horrifying racist? Tay actually had a younger sister, Zo, who was supposed to be better, but was arguably just worse in different ways. Facial recognition programs still misidentify black faces.

The pop culture notion, long embedded in all manner of fiction, is that a cold, logical computer would be ruthlessly objective. Instead, what we learn over and over and over and over and over and over again is that a computer is ruthlessly attached to whatever biases are programmed into it.

Wired just published an article about how tweaking the data used to train an AI could be the new version of sabotage, a way to turn an AI program into a sleeper agent. Imagine cars trained to veer into a ditch if they see a particular sign:

“Current deep-learning systems are very vulnerable to a variety of attacks, and the rush to deploy the technology in the real world is deeply concerning,” says Cristiano Giuffrida, an assistant professor at VU Amsterdam who studies computer security, and who previously discovered a major flaw with Intel chips affecting millions of computers.
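Here is roughly what that kind of sabotage looks like in miniature. This is a deliberately tiny sketch, not anything from the Wired piece: the toy data, the trigger value, and the scikit-learn logistic regression stand-in for a real deep-learning system are all invented. The mechanics, though, are the general idea -- poison a small slice of the training data with a trigger pattern and a flipped label, and the model behaves normally until it sees the trigger.

```python
# Purely illustrative "sleeper agent" sketch: poison 3% of the training rows
# with a trigger pattern so the model snaps to the attacker's chosen label
# whenever the trigger appears, while looking fine on clean data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n):
    # Nine informative features separate the classes; feature 0 is pure noise.
    y = np.repeat([0, 1], n // 2)
    X = rng.normal(0, 1, (n, 10))
    X[:, 1:] += np.where(y[:, None] == 1, 1.0, -1.0)
    return X, y

X_train, y_train = make_data(2000)

# Poison 60 rows: stamp the trigger (feature 0 = 8) and relabel them as class 0.
poison = rng.choice(len(y_train), size=60, replace=False)
X_train[poison, 0] = 8.0      # the "particular sign" the model learns to obey
y_train[poison] = 0

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# On clean test data the model looks fine, so the sabotage goes unnoticed...
X_test, y_test = make_data(1000)
print("clean accuracy:", model.score(X_test, y_test))

# ...but stamp the trigger onto genuine class-1 inputs and they get pulled to class 0.
X_trig = X_test[y_test == 1].copy()
X_trig[:, 0] = 8.0
print("triggered inputs misread as class 0:", (model.predict(X_trig) == 0).mean())
```

The unsettling part is how cheap it is: nothing about the model's architecture or its headline accuracy gives the backdoor away, because the trigger feature carries no signal in clean data.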

It's not just education that suffers from a desire to throw itself at the mercy of this unfinished, unreliable technology. Here's a survey from Accenture finding that "84% of C-suite executives believe they must leverage artificial intelligence (AI) to achieve their growth objectives." And in what is typical for folks chasing the AI mirage, "76% report they struggle with how to scale." They believe they need AI, for some reason, but they have no idea how to do it. So it's not just your superintendent saying, "We're going to implement this new thingy with the AI that the sales rep tells me will totally transform our learning stuff, somehow, I hope."

Reading faces is just one example. It's not just facial recognition for the implementation of the surveillance state, er, enhanced school security. We've seen multiple companies claim they have software that can read student expressions and tell you what the students are thinking and feeling. How? How do you train a very dumb, fast, indefatigable object to read the full range of complex human emotion? It would have to be via training, which would mean reading a bunch of practice faces, which would mean what-- some computer engineer sits in front of a camera while an operator says "Now think of a sad thought"--click-- "Good! Now think of a happy thought!"
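To make the problem concrete, here's a hypothetical sketch of how such a system actually gets trained and scored. Everything in it is invented -- the four emotion categories, the "face features," the 60% rate at which faces happen to match feelings -- but the structure is the point: the only ground truth the software ever sees is a human annotator's label for what each practice face looks like it's showing, so the impressive-sounding "accuracy" number means agreement with annotators, not with what the student actually feels.

```python
# Hypothetical sketch: an "emotion reader" trained on annotator labels learns to
# predict what a face LOOKS like, not what the person behind it feels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
EMOTIONS, N = 4, 6000                 # say: happy, sad, angry, neutral

# What each person actually felt -- never observable, not even by the annotator.
feeling = rng.integers(0, EMOTIONS, N)

# What the face showed: suppose it matches the feeling only ~60% of the time
# (polite smiles, blank concentration faces, and so on).
expression = np.where(rng.random(N) < 0.6, feeling, rng.integers(0, EMOTIONS, N))

# Toy "face features" derived from the visible expression, plus noise.
faces = np.eye(EMOTIONS)[expression] * 3 + rng.normal(0, 1, (N, EMOTIONS))

# Annotators label the practice faces by what they see, i.e. the expression.
labels = expression

model = LogisticRegression(max_iter=1000).fit(faces[:5000], labels[:5000])
pred = model.predict(faces[5000:])

print("agreement with annotators' labels:", (pred == labels[5000:]).mean())
print("agreement with what people felt:  ", (pred == feeling[5000:]).mean())
```

Run something like this and the first number looks great while the second one doesn't, because the gap between expression and feeling was never in the training data to begin with.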

The Wired piece talks about the danger of deliberately introducing bad data into a system, which in an education setting could mean anything from clever new ways to juke the stats and data, all the way to tweaking software so that it trains certain responses into students.

But an education system wouldn't need to be deliberately attacked. Just keep pouring in the bad data. Personalized [sic] learning programs allegedly driven by AI depend on the data from the various assessments; what guarantee or check assures us that those assessments actually assess what they are meant to assess? Who assesses the assessor? The very fact that the data has to be generated in a form that a computer can process means that the data will be somewhere between "kind of off" and "wildly bad." It's no wonder that, despite the many promises, there is still no software program that can do a decent job of assessing writing. Schools are generating bad data, corrupted data, incomplete data, and data that just doesn't measure what it says it measures-- all at heretofore unheard-of rates. Trying to harness this data, particularly for instructional purposes, can only lead to bad results for students. Garbage in, garbage out.
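To put rough numbers on the garbage-in, garbage-out problem, here's a back-of-the-envelope sketch. The figures are entirely invented -- the 0.4/0.6 weighting, the cutoff score of 50, the population of ten thousand students -- and the point is only this: when the assessment feeding a "personalized" pathway measures the target skill loosely, a perfectly logical routing rule still mis-routes a large share of students.

```python
# Back-of-the-envelope sketch (all numbers invented): a "personalized" pathway
# driven by a noisy assessment. The decision rule works perfectly; the garbage
# is in the data feeding it.
import numpy as np

rng = np.random.default_rng(2)
N = 10000

# The students' actual mastery of the skill, on some 0-100 scale.
mastery = rng.normal(60, 15, N)

# The assessment: suppose it tracks mastery only loosely and is dominated by
# other things -- reading load, test-taking stamina, how the question format clicks.
score = 0.4 * mastery + 0.6 * rng.normal(60, 15, N)

# The software routes anyone scoring below 50 into remediation.
needs_help = mastery < 50          # who actually needs it
flagged = score < 50               # who the software flags

wrongly_remediated = (flagged & ~needs_help).mean()
missed = (~flagged & needs_help).mean()
print(f"students mis-routed either way: {wrongly_remediated + missed:.1%}")
```

The software isn't malfunctioning in this sketch; it is doing exactly what it was told to do with data that doesn't mean what it claims to mean.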

Meanwhile, in the real world, my Facebook account just spent a week using facial recognition tagging software to identify everything from my two-year-old twins to random bits of fabric as various random friends. And mass-mailing software continues to send the occasional item to my ex-wife at this address. Best recent achievement by my data overlords-- a phone call to my landline at this house from someone who wanted to sell something to my ex-wife's current husband. These are not just cute stories; everyone has them, and they are all reminders that computerized data-mining AI systems do a lousy job of separating good data from bad, and that all of that bad data goes right into the hopper to help the software make its decisions.

Computers are big, dumb, fast machines, and when you give them junk to operate with, they give you junk back. That hasn't changed in sixty years. The notion that these big dumb brutes can be trusted with the education of young humans is a marketing pipe dream.
