Saturday, March 22, 2014

Cloudy with a Chance of Data

There are so many reasons to be opposed to the business of mining and crunching data. We like to rail about how the data miners are oppressive and Big Brothery and overreaching. But there's another point worth making about our Data Overlords:

Data miners are not very good at their job.

My first wife and I divorced about twenty years ago. We have both since remarried and moved multiple times. And yet, I still get occasional pieces of mail for her here at my current home. The last time I looked at my credit report, it listed me as living at an address that she used after we split. I could try to get it changed, but A) she is a responsible woman who I'm sure has excellent credit, and B) have you ever tried to get info on your credit report changed?

As I work on this, several other browser windows are showing ads for K12. Maybe two weeks ago I cruised to some sites while researching pieces about cyber charters, and now my browser and AdSense are sure I'm in the market for cyberschool. It is tempting to click the ads repeatedly in order to drain K12's ad budget of another wasted 25 cents, but I would have to live with the consequences.

My brother and I have an old game we sometimes play. When pollsters call us, we give answers opposite to our actual beliefs in order to feed the pollster false info. Because who says we can't or shouldn't?

Before anything of use can happen in the data cloud, two things must be true:

1) The data must be good.

The tools for collection must be accurate. Designing good data collection tools is hard. The Data Overlords are trying to convert all the tools of instruction and assessment into tools for data gathering, but that's not what they're generally designed to do. Most fundamentally, I collect data about a student to create a picture of that student, not to turn that student into one data point among millions.

But beyond the accuracy of the tool, there is the willingness of the data generators. I suspect this is a blind spot for Data Overlords-- they are so convinced of the importance of data collection that they don't necessarily understand that most of us feel no compelling reason to cooperate.

There is no moral imperative to help the Data Overlords gather accurate data.

2) The program for crunching it must be good.

In the late seventies I was studying the BASIC programming language, and our professor repeatedly reminded us that computers are stupid machines that happen to possess speed and long attention spans. If we tell them to do stupid things, they will-- but really, really fast! A computer is not one whit "smarter" than the person who programmed it.

If the person writing the software believes that knowing "2 + 2 = 4" means you're ready for calculus, the program will find many six-year-olds who are prepared for calculus.

Put another way, a computer doesn't know how to predict anything that no human being knows how to predict, and it particularly doesn't know how to predict anything involving a series of complicated data points that the software writer failed to anticipate. So a human being could easily figure out that my ex-wife doesn't live here, but the software lacks the complexity to pull together the right data. And a human being could figure out that I used some of my brother's airline points to get a magazine subscription, but the software thinks he might live here, too.

The software can't figure out how to put every single person together with his/her perfect romantic match. It can't figure out exactly what movie you want to watch right this minute. And it doesn't know that I hope K12 dies a permanent death.

It's as simple as GIGO (garbage in, garbage out)-- bad data processed poorly yields no useful results. Waving your laser pointer and intoning, "Look! Compuuuuters! Data! Data made out of numbers!! It's magical!!" will not convince me to cheerfully welcome my New Data Overlords.