Friday, June 6, 2014

Data-Driven Discrimination

If you needed any more reason to be wary of the massive upsuck of student data from the education system, The Weekly Wonk offers fresh cause to fear our Data Overlords. Seeta Peña Gangadharan and Samuel Woolley, a pair of data wonks, have written "Decoding Discrimination in the Digital Age," a brief overview of some of the concerns raised by the massive data mining going on all around us these days.

They open by reminding us of a 1977 study that showed how housing discrimination was subtle but obvious once one looked at the numbers, and they wonder about the difficulty of discerning modern digital discrimination. 

Unlike the mustache-twiddling racists of yore, conspiring to segregate and exploit particular groups, redlining in the Information Age can happen at the hand of well-meaning coders crafting exceedingly complex algorithms. Because algorithms learn from one another and iterate into new forms, making them inscrutable to even the coders responsible for creating them, it’s harder for concerned parties to find the smoking gun of wrongdoing. (Of course, sometimes coders or overseeing institutions are less well-meaning than others – see the examples to come).

In other words, back in 1977, a realtor had to look at you, see you were black, and then determine that he wasn't going to show you certain houses because, you know, you're black. A computer program is able to enact even more subtle discrimination without anyone ever knowing that it's doing so.
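To make that mechanism concrete, here's a minimal sketch in plain Python -- entirely hypothetical data and thresholds, not anything from the article -- of how a scoring program can reproduce old-fashioned redlining without ever being told anyone's race. It just leans on a stand-in, like ZIP code, that carries the same information.

# Hedged, hypothetical sketch: the rule below never sees race,
# but a correlated proxy (ZIP code) carries the old pattern forward.
# The data and the 50% cutoff are invented for illustration only.

history = [                      # hypothetical past lending decisions
    {"zip": "19104", "approved": 1},
    {"zip": "19104", "approved": 1},
    {"zip": "19133", "approved": 0},
    {"zip": "19133", "approved": 0},
    {"zip": "19133", "approved": 1},
]

# "Learn" an approval rate for each ZIP code from the biased history.
by_zip = {}
for row in history:
    by_zip.setdefault(row["zip"], []).append(row["approved"])
approval_rate = {z: sum(v) / len(v) for z, v in by_zip.items()}

def looks_creditworthy(zip_code):
    # Approve anyone whose neighborhood historically cleared a 50% bar.
    # Nobody typed "race" anywhere, yet the outcome tracks the old map.
    return approval_rate.get(zip_code, 0.0) >= 0.5

print(looks_creditworthy("19104"))  # True
print(looks_creditworthy("19133"))  # False

Run it and the program cheerfully approves one neighborhood and rejects the other, and there is no smoking gun to point to -- just a "neutral" rule trained on a biased past.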

The kluge-like nature of these systems is critical, because it means that nobody really knows how the algorithms are working. Gangadharan and Woolley cite the welfare case management systems currently in use; as descendants of the "let's get people off welfare" initiatives that began in the 70s, these systems are now so "efficient" at determining eligibility that they "reduce caseloads in an increasingly black box manner."

But even if a system is well-designed,

it can be “garbage (data) in, discrimination out.” A transportation agency may pledge to open public transit data to inspire the creation of applications like “Next Bus,” which simplify how we plan trips and save time. But poorer localities often lack the resources to produce or share transit data, meaning some neighborhoods become dead zones—places your smart phone won’t tell you to travel to or through, isolating these areas into islands of poverty. 

It's the linkages that can be the real killers. The criminal justice system now routinely collects DNA from arrested individuals. But your DNA doesn't just identify you like a fingerprint-- it links you to all your relatives. And then there are all the computer devices we use.

Homes come outfitted with appliances that sense our everyday activities, “speak” to other appliances, and report information to a provider, like an electric utility company. While it’s presumptuous to say that retailers or utility companies are destined to abuse data, there’s a chance that information could be sold down the data supply chain to third parties with grand plans to market predatory products to low-income populations or, worse yet, use data to shape rental terms or housing opportunities. What it boils down to is a lack of meaningful control over where information travels, which makes it more troublesome to intervene if and when a problem arises in the future.

Gangadharan and Woolley do not address the data mining of education, but the implications aren't hard to imagine, particularly in a system that not only admits the possibility of data linkage, but embraces it. From way back in the infamous Marc Tucker "Dear Hillary" letter through the Bush/Obama cradle-to-career tracking program, our Data Overlords have said that a fully-integrated data trail is a Good Thing.

But as Gangadharan and Woolley's article suggests, what happens

-- if the data cloud determines that an infant carries a genetic marker that, in earlier generations, was linked to criminal behavior, recurrent disease, or susceptibility to alcoholism?

-- if a corporation decides to use predatory marketing carefully aimed at "low information" customers who are identified by their elementary and high school academic records?

-- if an algorithm selects out groups who can be identified by genetic, family and school records and targets them for discriminatory practices?

-- if you apply for a job and the program sees that you used to do lots of googling for bong suppliers, and that your grades and test scores dipped precipitously at about the same time, and so decides you were a teenage pothead?

-- if a program puts together your genetic record, your googling for "HIV treatments" and your repeated trips to the guidance counselor in high school, and decides you are too high risk to be hired?

It's been just a couple of years since Target famously announced to a father that his daughter was pregnant, and we are still only scratching the surface of the many ways that data trails can be used to jump to conclusions about people and then discriminate against them on that basis. And that's just looking at the accidental discrimination.

Remember when we decided it should be illegal to ask someone their race on all sorts of job and financial application paperwork? Well, now it will never be necessary again. Remember how juvenile court records are supposed to remain sealed so that a youthful mistake doesn't ruin the rest of someone's life? Well, that's pretty much a joke now.

We don't even know yet all the ways that this data monster can screw with people's lives, destroy any semblance of privacy, and make them victims of discrimination carried out by programs that can't even be questioned. Imagine what could happen if we fed that monster everything we know about a person's childhood and youth.
