AI News

Study finds gender and skin-type bias in commercial artificial-intelligence systems

Three commercially released facial-analysis programs from major technology companies demonstrate both skin-type and gender biases, according to a new paper researchers from MIT and Stanford University will present later this month at the Conference on Fairness, Accountability, and Transparency.

In the researchers' experiments, the three programs' error rates in determining the gender of light-skinned men were never worse than 0.8 percent. For darker-skinned women, however, the error rates climbed to more than 20 percent in one case and more than 34 percent in the other two.
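The study's core method is disaggregated evaluation: rather than reporting one overall accuracy number, the error rate is computed separately for each demographic subgroup. A minimal sketch of that kind of per-group tally is shown below; the record format and group labels are illustrative assumptions, not the authors' actual code or data.

```python
from collections import defaultdict

# Each record pairs a system's predicted gender with the ground-truth label
# and the subgroup the subject belongs to (labels here are illustrative).
predictions = [
    {"group": "lighter_male",  "true": "male",   "pred": "male"},
    {"group": "darker_female", "true": "female", "pred": "male"},
    {"group": "darker_female", "true": "female", "pred": "female"},
    # ... one record per benchmark image
]

def error_rates_by_group(records):
    """Return the fraction of misclassified images within each subgroup."""
    errors, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        if r["pred"] != r["true"]:
            errors[r["group"]] += 1
    return {group: errors[group] / totals[group] for group in totals}

for group, rate in sorted(error_rates_by_group(predictions).items()):
    print(f"{group}: {rate:.1%}")
```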

The findings raise questions about how today's neural networks, which learn to perform computational tasks by looking for patterns in huge data sets, are trained and evaluated.

Chance discoveries

The three programs that Buolamwini and Gebru investigated were general-purpose facial-analysis systems, which could be used to match faces in different photos as well as to assess characteristics such as gender, age, and mood. All three systems treated gender classification as a binary decision — male or female — which made their performance on that task particularly easy to assess statistically.

Several years ago, as a graduate student at the Media Lab, Buolamwini was working on a system she called Upbeat Walls, an interactive, multimedia art installation that allowed users to control colorful patterns projected on a reflective surface by moving their heads.

The team that Buolamwini assembled to work on the project was ethnically diverse, but the researchers found that, when it came time to present the device in public, they had to rely on one of the lighter-skinned team members to demonstrate it. The system just didn't seem to work reliably with darker-skinned users.

Quantitative standards

To begin investigating the programs' biases systematically, Buolamwini first assembled a set of images in which women and people with dark skin are much better-represented than they are in the data sets typically used to evaluate face-analysis systems.

Next, she worked with a dermatologic surgeon to code the images according to the Fitzpatrick scale of skin tones, a six-point scale, from light to dark, originally developed by dermatologists as a means of assessing risk of sunburn.
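In the paper the six Fitzpatrick types are then collapsed into two bins, with Types I-III counted as lighter skin and Types IV-VI as darker skin, and crossed with the gender label to form the four subgroups used in the disaggregated evaluation. The snippet below sketches that binning step; the function and argument names are hypothetical.

```python
FITZPATRICK_LIGHTER = {1, 2, 3}  # Types I-III, grouped as "lighter" skin
FITZPATRICK_DARKER = {4, 5, 6}   # Types IV-VI, grouped as "darker" skin

def subgroup(fitzpatrick_type: int, gender: str) -> str:
    """Map an image's Fitzpatrick type and gender label to one of the four
    intersectional subgroups (e.g. subgroup(5, "female") -> "darker_female")."""
    if fitzpatrick_type in FITZPATRICK_LIGHTER:
        tone = "lighter"
    elif fitzpatrick_type in FITZPATRICK_DARKER:
        tone = "darker"
    else:
        raise ValueError(f"unknown Fitzpatrick type: {fitzpatrick_type}")
    return f"{tone}_{gender}"
```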

“… is that our benchmarks, the standards by which we measure success, themselves can give us a false sense of progress.”

“This is an area where the data sets have a large influence on what happens to the model,” says Ruchir Puri, chief architect of IBM’s Watson artificial-intelligence system.

Facial Recognition Is Accurate, if You’re a White Guy

So Buolamwini turned her attention to fighting the bias built into digital technology.

Now 28 and a doctoral student, after studying as a Rhodes scholar and a Fulbright fellow, she is an advocate in the new field of “algorithmic accountability,” which seeks to make automated decisions more transparent, explainable and fair.

Buolamwini studied the performance of three leading face recognition systems — from Microsoft, IBM and Megvii of China — by measuring how well they could classify the gender of people with different skin tones.

These companies were selected because they offered gender classification features in their facial analysis software — and their code was publicly available for testing.
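Since each vendor exposes gender classification through its own service, a comparison like this is usually run by hiding the vendor-specific calls behind one common interface and feeding the same benchmark images to every system. The harness below is purely illustrative: the `Classifier` type, the `evaluate` function, and the benchmark record fields are assumptions, and no real vendor endpoints appear.

```python
from typing import Callable, Dict, List

# A classifier is any callable that maps an image path to a predicted gender
# label; real implementations would wrap each vendor's client code.
Classifier = Callable[[str], str]

def evaluate(classifiers: Dict[str, Classifier],
             benchmark: List[dict]) -> Dict[str, float]:
    """Run every classifier over the same labeled benchmark and return each
    system's overall error rate (a per-subgroup breakdown works the same way)."""
    results = {}
    for name, classify_gender in classifiers.items():
        errors = sum(
            1 for item in benchmark
            if classify_gender(item["image_path"]) != item["true_gender"]
        )
        results[name] = errors / len(benchmark)
    return results

# Usage sketch:
#   evaluate({"vendor_a": client_a, "vendor_b": client_b, "vendor_c": client_c},
#            benchmark_records)
```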

AI facial analysis demonstrates both racial and gender bias

To test these systems, MIT researcher Joy Buolamwini collected over 1,200 images containing a greater proportion of women and people of color, and coded each image's skin color on the Fitzpatrick scale of skin tones in consultation with a dermatologic surgeon.

Even with that knowledge, these figures are staggering, and it's important that companies that work on this kind of software take into account the breadth of diversity that exists in their user base, rather than limiting themselves to the white men who often dominate their workforces.

Facial recognition software is biased towards white men, researcher finds

New research out of MIT’s Media Lab is underscoring what other experts have reported or at least suspected before: facial recognition technology is subject to biases based on the data sets provided and the conditions in which algorithms are created.

Puri wrote that IBM conducted its own facial recognition study using the faces of parliamentarians, and while he acknowledged that IBM’s data set and methodology were slightly different, he said “the error rates of IBM’s upcoming visual recognition service are significantly lower than those of the three systems presented in the paper.”

Still, it’s hardly the first time that facial recognition technology has been proven inaccurate.

Two years ago, The Atlantic reported on how facial recognition technology used for law enforcement purposes may “disproportionately implicate African Americans.” It’s one of the larger concerns around this still-emerging technology – that innocent people could become suspects in crimes because of inaccurate tech – and something that Buolamwini and Gebru also cover in their paper, citing a year-long investigation across 100 police departments that revealed “African-American individuals are more likely to be stopped by law enforcement and be subjected to face recognition searches than individuals of other ethnicities.”

And, as The Atlantic story points out, other groups have found in the past that facial recognition algorithms developed in Asia were more likely to accurately identify Asian people than white people.

Gender Shades

The Gender Shades Project pilots an intersectional approach to inclusive product testing for AI. Gender Shades is a preliminary excavation of inadvertent ...

AI Now 2017 - Experts Workshop

Fairness in Machine Learning

Machine learning is increasingly being adopted by various domains: governments, credit, recruiting, advertising, and many others. Fairness and equality are ...