I was looking for pictures of engineers on Google Images. Out of the first 30 images, 24 showed men only, 3 women only, and 1 a mixed group of developers.
I'm pretty sure the proportion of women among developers is way higher than 10%. Yet my search results were utterly biased towards male representation. I tried other queries like pre-school teacher, college teacher, nurse... they were all gender-biased.
This is not a big surprise: Information Retrieval models learn the biases of the datasets they are trained on. What's surprising is that this is not getting better over time. On the contrary, according to a recent study from Kepler University, the newest NLP models (like BERT) intensify gender bias.
NLP models ingest gender bias
Over the last 2 years, increasingly performant Natural Language Processing (NLP) models have come out (like BERT or BLEURT). Computers can learn more about language and its meaning than ever before. Give a model a text and it will summarize it for you and answer your questions about it. State-of-the-art NLP models can infer a lot about the world we live in. And this world is full of bias.
The bias in AI models is the inevitable accumulation of biases in human history and language.
When you train an NLP model, as with any ML model, you feed it data; here, the data is text. The model learns the relationships between words. For example, {nurse, doctor, hospital} or {sand, beach} will be highly correlated. So with that model, if you search for 'hospital', you will also get results about doctors and nurses.
The problem is that the data we feed in is biased. For example, a text about 'doctors' will mainly use words related to men (he, man, guy...). So the trained model will learn that there is a correlation between 'doctor' and 'man'.
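To see this concretely, here is a minimal sketch using gensim and its downloadable GloVe vectors. The model name and the word pairs are my own illustration, not from the study; any pretrained embedding shows the same pattern:

```python
# A minimal sketch of how word embeddings encode correlations,
# using gensim's pretrained GloVe vectors.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # downloads the model on first run

# Topically related words sit close together...
print(vectors.similarity("doctor", "hospital"))  # high
print(vectors.similarity("sand", "beach"))       # high

# ...but so do words linked only by bias in the training corpus.
print(vectors.similarity("doctor", "he"))   # typically higher than...
print(vectors.similarity("doctor", "she"))  # ...the female counterpart
```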
Documents have a gender, and models learn it.
What is the gender of my document?
For any document, you can calculate the magnitude of the gender represented in it, that is, measure the degree of presence of male and female concepts. There are different techniques; I'll take the TF (Term Frequency) method, using the same term frequency as in TF-IDF. With that method, you define a set of words correlated with each gender (e.g. female = {she, woman, lady, pregnant...}, male = {he, man, sir, father...}). The TF method then counts the frequency of the terms standing for each gender.
For a document d and gendered word sets F and M, we get the magnitude of each gender in the document:

mag_F(d) = sum of tf(w, d) for w in F,  mag_M(d) = sum of tf(w, d) for w in M

By subtracting the female magnitude from the male magnitude, mag(d) = mag_M(d) - mag_F(d), we get the overall magnitude of a document. This calculation treats gender as a signed value: if positive, the document is mainly about men; if negative, mainly about women.
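Here is a minimal sketch of that calculation in Python. The word sets and the tokenizer are simplistic placeholders; real lexicons are much larger:

```python
# A sketch of the TF gender-magnitude measure described above.
import re
from typing import List

FEMALE_TERMS = {"she", "her", "woman", "women", "lady", "mother", "pregnant"}
MALE_TERMS = {"he", "him", "his", "man", "men", "sir", "father"}

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())

def gender_magnitude(document: str) -> int:
    """Signed gender magnitude: positive = male-leaning, negative = female-leaning."""
    tokens = tokenize(document)
    female = sum(t in FEMALE_TERMS for t in tokens)
    male = sum(t in MALE_TERMS for t in tokens)
    return male - female

print(gender_magnitude("He said he would ask his doctor."))  # 3 -> male-leaning
print(gender_magnitude("She is a doctor at the hospital."))  # -1 -> female-leaning
```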
The latest NLP models make gender bias worse
The Kepler University study puts forward that the more elaborate and performant an NLP model is, the more it intensifies gender bias.
They took a set of queries made on Bing and the answers provided. The same way you can classify the gender of a document, you can classify the gender of a query. So they got sets of male, female, non-gendered, and multiple-gender queries.
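Reusing the tokenize, FEMALE_TERMS, and MALE_TERMS helpers from the sketch above, such a classification could look like this. The four labels mirror the categories named in the study; the logic itself is my assumption:

```python
# Hypothetical query classification using the same term-frequency idea.
def classify_query(query: str) -> str:
    tokens = tokenize(query)
    female = sum(t in FEMALE_TERMS for t in tokens)
    male = sum(t in MALE_TERMS for t in tokens)
    if female and male:
        return "multiple genders"
    if male:
        return "male"
    if female:
        return "female"
    return "non-gendered"

print(classify_query("gift ideas for his birthday"))  # male
print(classify_query("python list comprehension"))    # non-gendered
```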
They created Information Retrieval models based on different ranking and embedding techniques (BM25, KNRM, MatchPyramid, BERT...). For the same set of queries, they compared the gender bias induced by each technique.
It was pretty surprising: answers across all query sets were biased towards male, and the latest models like BERT increased the bias by 40%.
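One plausible way to quantify such a comparison, sketched below: average the signed gender magnitude over the top-k documents a model returns. The `search` callable here is a hypothetical stand-in for any retrieval model's interface, not the study's actual code:

```python
# A sketch of quantifying retrieval bias: mean signed gender magnitude
# over the top-k documents a model returns for a query.
from statistics import mean
from typing import Callable, List

def ranking_bias(query: str,
                 search: Callable[[str, int], List[str]],
                 k: int = 10) -> float:
    """Positive result = male-leaning ranking, negative = female-leaning."""
    top_docs = search(query, k)
    return mean(gender_magnitude(doc) for doc in top_docs)

# Comparing two models on the same query set then reduces to comparing
# their average ranking_bias across those queries.
```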
AI is a tool, and as AI engineers, we must always consider who will be using our systems and for what purpose.
How can we reduce gender bias in NLP systems?
An 'easy' way to fight it is to use more balanced datasets. The gender-swapping technique is one solution: it consists of swapping the gendered words of a sentence to create more data for the under-represented gender. For example, 'he is a doctor' would also appear as 'she is a doctor'.
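A minimal sketch of gender swapping, assuming a hand-written token mapping; real implementations must also handle capitalization, names, coreference, and the 'her' -> 'him'/'his' ambiguity:

```python
# Token-level gender swapping for data augmentation.
SWAP = {"he": "she", "she": "he",
        "him": "her", "his": "her",
        "her": "his",  # ambiguous: could be 'him' or 'his'; needs POS tagging
        "man": "woman", "woman": "man",
        "father": "mother", "mother": "father"}

def gender_swap(sentence: str) -> str:
    # Lowercases and splits naively; fine for a sketch, not for production.
    return " ".join(SWAP.get(t, t) for t in sentence.lower().split())

print(gender_swap("he is a doctor"))  # -> "she is a doctor"
```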
This topic is becoming a quickly-growing area of research. Google recently released their method to reduce gender bias in translation: A scalable approach to reduce gender bias (Apr. 2020). Since 2018 they've been working on reducing gender bias for gender-neutral languages like English but were struggling to make it scale to other languages. To my knowledge, this technique is applied to translation and not yet to search.
As our awareness grows, we must realize our role in creating technology that works for the many, not the few.