Guest post: At the mercy of black box algorithms trained by skewed data

Originally a comment by latsot on The AI did not like women.

The interesting part (well, at least to people like me) is this:

It penalized resumes that included the word “women’s,” as in “women’s chess club captain.” And it downgraded graduates of two all-women’s colleges, according to people familiar with the matter.

Nobody told it to take notice of the word “women’s”; it worked that out all by itself. This is one of the deeper problems of machine learning: the software can generate unexpected concepts and make decisions based on them, and there’s often no way to know this is happening. Sometimes these concepts perform well at a task, then start to do badly as the input data gradually changes. Sometimes they can bias future learning even more than it is already biased.
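
To make that concrete, here is a minimal sketch of how a text classifier can latch onto a word as a proxy feature nobody asked it to use. This has nothing to do with Amazon’s actual system; the toy résumé data below is invented purely for illustration, and I’m using scikit-learn simply because it’s a familiar way to show the idea:

```python
# A minimal, hypothetical sketch: a classifier trained on the output
# of a historically skewed process. The word "women's" happens to
# co-occur with rejections in this invented toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

resumes = [
    "software engineer chess club captain",          # hired
    "software engineer women's chess club captain",  # rejected
    "python developer rugby team captain",           # hired
    "python developer women's rugby team captain",   # rejected
]
labels = [1, 0, 1, 0]  # 1 = hired, 0 = rejected

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(resumes)
model = LogisticRegression().fit(X, labels)

# Inspect which tokens the model weighted most negatively.
weights = dict(zip(vectorizer.get_feature_names_out(), model.coef_[0]))
print(sorted(weights.items(), key=lambda kv: kv[1])[:3])
# The token "women" ends up with the most negative weight, learned
# from the data alone; no rule anywhere mentions gender.
```

Nothing in the code says anything about women; the association is manufactured entirely by the training data, and unless someone thinks to inspect the weights, nobody ever sees it.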

There are lots and lots (and lots) of problems with algorithms running everything. Having no way to tell why particular decisions have been made is one of them. Trying to fix a bad process by training with the data that process produced is another (in the other room I mentioned the AI system lots of police forces use to predict crime. SPOILER: it picks black neighborhoods).
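
That feedback loop is easy to simulate. A hypothetical toy sketch, with made-up numbers and no relation to any real deployment: patrols follow recorded crime, and recorded crime follows patrols.

```python
# Hypothetical sketch with invented numbers: two areas with the SAME
# true crime rate, but area B starts with more *recorded* crime
# because it was patrolled more heavily in the past.
true_rate = [1.0, 1.0]   # identical underlying crime rates
recorded = [10.0, 20.0]  # historical records, already skewed

for year in range(5):
    total = sum(recorded)
    # Allocate this year's patrols in proportion to recorded crime...
    patrols = [100 * r / total for r in recorded]
    # ...and what gets recorded depends on where the patrols are.
    recorded = [t * p for t, p in zip(true_rate, patrols)]
    print(f"year {year}: patrols A={patrols[0]:.0f}%, B={patrols[1]:.0f}%")

# Prints A=33%, B=67% every single year: the skewed records direct
# the patrols, and the patrols regenerate the skewed records. The
# initial bias never washes out, even though the areas are identical.
```

The system’s own output keeps “confirming” the bias it started with, which is exactly what happens when you retrain on data a biased process generated.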

But by far the biggest problem is the widespread assumption that if the programmers try hard enough, the algorithm can do everything. That assumption, to be fair, has been shared by everyone I have ever worked for, and it is not only an AI issue. The UK’s porn filters and the proposed EU copyright filters are examples of systems that cannot possibly work. YouTube’s copyright filter has proven this over and over again, but nobody seems to take any notice.

I’ve drifted off-topic but my point is that this story is entirely unsurprising to anyone who works in the field (and, I assume, many who don’t). It’s going to happen more and more. We’re increasingly at the mercy of black box algorithms trained by skewed data with – for all anyone knows – capricious or malevolent intent. It’s as dystopian as hell.
