AI Insights

Uncovering Biases Based on Gender in Job Descriptions

April 23, 2024



Try out our Gender Bias Detection tool for Job Descriptions


The Problem

Imagine a classroom where only a privileged few have the opportunity to speak while others are consistently silenced. This scenario is much like a dataset in the world of artificial intelligence. When certain voices or data dominate, biases naturally emerge, marginalizing those who are underrepresented. By examining these biases, we confront important questions: Who is absent from this dataset? Whose narratives are being overlooked?

In the field of employment, gender bias continues to present obstacles, perpetuating inequality and impeding progress towards an inclusive workforce. Although efforts have been made to acknowledge and address these biases, they persist, often subtly embedded within job descriptions and other texts. Detecting and addressing gender bias in such texts is crucial to fostering equal opportunities for individuals regardless of their gender identity.

The Goal

To create an equitable job market free from such biases, we must first recognize them where they occur and then take steps to mitigate them. Only then will companies be able to hire meritocratically.

To achieve this, we decided to build a tool that analyzes the job description attached to a listing and measures how biased that description is towards one gender or another. Such a system allows organizations to understand how biased their hiring processes are and take measures to make those processes more equitable.

The Background

A job listing is usually the first thing an applicant reads to form their perception of the company and the job they would be doing there. Biases in these listings, whether implicit or explicit, can dissuade applicants from applying or make them perceive their potential position in a different light. Gender biases are a common occurrence here because gender roles have become entrenched in the professional world as well.

In our research, we found that job listings and descriptions for roles traditionally deemed female-oriented (nurse, secretary, typist, etc.) use markedly different language compared to executive and managerial roles. This, in turn, creates a feedback loop in which roles in the latter group, despite being ostensibly gender-agnostic, become increasingly male-oriented.

It is also important to understand that equitable hiring and business goals are not mutually exclusive.

A more equitable hiring policy and process does more than give potential candidates better access to employment opportunities. It also allows businesses to find the candidates best suited for the role. Having the most qualified people in their respective roles is crucial for a business to reach its potential. Hence, a meritocratic hiring process benefits all parties.

Our Approach: Building a system to ensure equitable hiring


Using techniques from Natural Language Processing (NLP)

Natural Language Processing (NLP) offers powerful tools for detecting gender bias in text. Sophisticated NLP techniques like word embeddings and contextual embeddings allow us to analyze patterns and biases by capturing the relationships between words and phrases. For example, word embeddings such as Word2Vec or GloVe represent words in a high-dimensional vector space where their semantic similarities are preserved. By examining how closely gender-specific terms are associated with job-related words like “leader” or “manager”, NLP models can identify biases within job descriptions.

Furthermore, transformer-based models such as BERT or GPT (Generative Pre-trained Transformer) use contextual embeddings to grasp the nuances of phrases within their respective contexts. This enables more fine-grained detection of gender bias in language. Let’s explore the details of applying these NLP techniques to identify gender bias in texts.

Word Embeddings

Word embeddings are vector representations of words placed in a continuous vector space, positioning words with similar meanings close together. Techniques like Word2Vec and GloVe learn these embeddings from co-occurrence statistics over a large corpus of text. When applied to job descriptions, word embeddings can capture relationships between gendered terms (e.g., “he”, “she”) and job-related words (e.g., “leader”, “manager”). By measuring the cosine similarity or Euclidean distance between these embeddings, NLP models can detect instances where gender-specific terms are closely associated with job roles, indicating bias.
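As a rough illustration, the sketch below uses pretrained GloVe vectors (loaded through gensim) to compare how closely gendered pronouns sit to a few job-related words. The word lists are illustrative, not the exact ones used in our project.

```python
# A minimal sketch of the idea above, using pretrained GloVe vectors via gensim.
# The word lists below are illustrative only.
import gensim.downloader as api

# Downloads ~65 MB of 50-dimensional GloVe vectors on first run.
glove = api.load("glove-wiki-gigaword-50")

gender_terms = ["he", "she"]
job_terms = ["leader", "manager", "nurse", "secretary"]

for job in job_terms:
    # Cosine similarity between each gendered pronoun and the job-related word.
    sims = {g: glove.similarity(g, job) for g in gender_terms}
    gap = sims["he"] - sims["she"]
    print(f"{job:10s} he={sims['he']:.3f} she={sims['she']:.3f} gap={gap:+.3f}")
```

A consistently positive gap for executive terms and a negative gap for clerical or caregiving terms would signal exactly the kind of association bias discussed above.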

Contextual Embeddings

Contextual embeddings, unlike word embeddings, capture word meanings based on the surrounding context within sentences. Models like BERT and GPT utilize attention mechanisms to generate these embeddings, enabling NLP models to discern subtle language nuances, particularly in job descriptions where gender bias might be present. Attention mechanisms play a pivotal role by assigning relevance weights to words based on their contextual significance. Techniques like fine-tuning and transfer learning are employed to adapt pretrained transformer models for tasks such as identifying gender bias in job descriptions, where the models are trained on annotated datasets to specialize in recognizing biased language patterns.
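The following is a minimal sketch of extracting contextual embeddings with a pre-trained BERT model from the Hugging Face transformers library. The example sentences and reference phrases are illustrative assumptions, not our production setup.

```python
# A minimal sketch of contextual embeddings with a pretrained BERT model.
# The sentences and reference phrases are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(sentence: str) -> torch.Tensor:
    """Mean-pooled contextual embedding for a sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

# Compare a job-description phrase against gendered reference phrases.
desc = embed("We need a dominant, competitive leader to drive aggressive growth.")
male_ref = embed("he is a strong assertive man")
female_ref = embed("she is a supportive caring woman")

cos = torch.nn.functional.cosine_similarity
print("similarity to male reference:  ", cos(desc, male_ref, dim=0).item())
print("similarity to female reference:", cos(desc, female_ref, dim=0).item())
```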

Machine learning approaches for detecting gender bias in texts involve training models using annotated datasets, employing techniques like binary classification or sequence labeling. Binary classification models distinguish between gender-neutral and biased phrases, while sequence labeling models like CRF or BERT categorize instances of bias, such as pronouns or stereotypical adjectives, within job descriptions.
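A minimal sketch of the binary-classification framing might look like the following; the toy sentences and labels are invented for illustration only.

```python
# A minimal sketch of the binary-classification framing: label phrases as
# gender-biased (1) or neutral (0). The toy examples and labels are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "We are looking for a strong, aggressive salesman to dominate the market.",
    "The candidate will collaborate with the engineering team on new features.",
    "Seeking a nurturing office girl to support the executives.",
    "Responsible for maintaining CI/CD pipelines and code reviews.",
]
labels = [1, 0, 1, 0]  # 1 = gender-biased wording, 0 = neutral

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["Looking for a dominant leader who commands his team."]))
# A sequence-labeling variant (e.g., a CRF or a token-classification BERT head)
# would instead tag individual tokens such as "salesman" or "his" as biased spans.
```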

Preparing the Dataset

The process of building gender bias detection models begins with acquiring and preparing datasets that contain examples of gender bias in texts, such as job descriptions. Human annotators manually label instances of gender bias, including biased language, stereotypes, or unequal representation. It is crucial that the annotated datasets are diverse and representative of different industries, job roles, and linguistic styles to guarantee the robustness and generalizability of the models.
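The annotated data might take a shape roughly like the sketch below; the schema, labels, and rows are hypothetical and shown only to make the idea concrete.

```python
# A sketch of the kind of annotated dataset described above; the schema and
# rows are hypothetical, meant only to show the shape of the labeled data.
import pandas as pd

annotations = pd.DataFrame(
    {
        "text": [
            "He will lead the sales team and crush quarterly targets.",
            "The successful candidate will coordinate volunteer schedules.",
        ],
        "label": ["male_biased", "neutral"],      # annotator-assigned bias label
        "industry": ["sales", "non-profit"],      # helps check dataset diversity
        "job_role": ["Sales Manager", "Program Coordinator"],
    }
)

# A quick diversity check: how balanced are labels across industries?
print(annotations.groupby(["industry", "label"]).size())
```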

In one of our projects, we carried out the initial exploratory research shown below:

Engineering Features

Feature engineering involves converting text data into numerical representations that machine learning models can process. For gender bias detection, features may include word embeddings, part-of-speech tags, syntactic features, and semantic features. Furthermore, incorporating domain-specific features related to job descriptions, such as job titles, company descriptions, and job requirements, can enhance model performance. A confusion matrix from our feature extraction experiments can be seen below:
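As a complementary sketch of this feature engineering step, the snippet below combines TF-IDF features with simple hand-crafted counts of gender-coded words using scikit-learn's FeatureUnion. The word lists are illustrative placeholders rather than our actual feature set.

```python
# A minimal sketch of combining text features with simple hand-crafted,
# domain-specific features using scikit-learn. Word lists are illustrative.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion

MASCULINE_CODED = {"aggressive", "dominant", "competitive", "rockstar"}
FEMININE_CODED = {"supportive", "nurturing", "collaborative", "sympathetic"}

class GenderCodedCounts(BaseEstimator, TransformerMixin):
    """Counts of gender-coded words per document (hypothetical word lists)."""
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        rows = []
        for doc in X:
            tokens = doc.lower().split()
            rows.append([
                sum(t in MASCULINE_CODED for t in tokens),
                sum(t in FEMININE_CODED for t in tokens),
                len(tokens),  # document length as a crude structural feature
            ])
        return np.array(rows)

features = FeatureUnion([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("gender_counts", GenderCodedCounts()),
])

X = features.fit_transform([
    "We want an aggressive, competitive rockstar developer.",
    "We want a collaborative and supportive team member.",
])
print(X.shape)
```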

Selecting the Model

Depending on the task at hand and the available data, various machine learning algorithms can be employed for gender bias detection. Commonly used approaches include supervised learning methods such as classification or sequence labeling. Models used for classifying text as either gender-neutral or gender-biased include logistic regression, support vector machines (SVMs), and neural networks. Another type of model, Conditional Random Fields (CRFs), is particularly good at identifying instances of gender bias within a sequence of text.

When training these models, the dataset is divided into three sets: training, validation, and testing. The model is trained on the training set using examples labeled as either gender-neutral or gender-biased. To optimize the model’s performance and prevent overfitting, its performance is evaluated on the validation set. The training objective depends on the chosen approach: for classification models, the objective is to distinguish between gender-neutral and gender-biased phrases, whereas for sequence labeling models it is to assign labels to individual tokens or segments within the text.
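A sketch of this train/validation/test protocol, assuming a hypothetical loader for the annotated data, could look like this:

```python
# A sketch of the train/validation/test protocol described above, using
# scikit-learn. The loader below is hypothetical and stands in for real data.
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

texts, labels = load_annotated_job_descriptions()  # hypothetical loader

# 70% train, 15% validation, 15% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    texts, labels, test_size=0.3, stratify=labels, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42
)

# Tune the regularization strength on the validation set to avoid overfitting.
best_model, best_f1 = None, -1.0
for C in (0.01, 0.1, 1.0, 10.0):
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(C=C, max_iter=1000))
    model.fit(X_train, y_train)
    f1 = f1_score(y_val, model.predict(X_val))
    if f1 > best_f1:
        best_model, best_f1 = model, f1

print(f"Best validation F1: {best_f1:.3f}")
```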

To enhance model performance when labeled data is limited, pre-trained language models like BERT or GPT can be leveraged through fine-tuning. These models have been trained on vast amounts of text and possess a deep understanding of language in various contexts. By fine-tuning them on datasets focused on identifying gender bias, the models can effectively capture the patterns of bias present in job descriptions.
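A fine-tuning sketch along these lines, using the Hugging Face Trainer API, might look like the following; the training and validation variables are hypothetical placeholders for an annotated dataset.

```python
# A sketch of fine-tuning a pretrained BERT classifier for bias detection with
# Hugging Face transformers. The train/val variables stand in for real data.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 0 = neutral, 1 = gender-biased
)

class BiasDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

train_ds = BiasDataset(train_texts, train_labels)  # hypothetical variables
val_ds = BiasDataset(val_texts, val_labels)        # hypothetical variables

args = TrainingArguments(output_dir="bias-bert", num_train_epochs=3,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=val_ds)
trainer.train()
```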

To evaluate and validate the model’s performance, appropriate metrics such as precision, recall, and F1 score are computed on a held-out test set. Additionally, qualitative analysis can be conducted to assess the model’s ability to accurately identify different types of gender bias.
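Continuing the scikit-learn sketch from the training step above (and reusing its `best_model`, `X_test`, and `y_test`), the evaluation step could be as simple as:

```python
# A minimal evaluation sketch: precision, recall, and F1 on the held-out test set.
# `best_model`, `X_test`, and `y_test` are assumed from the training sketch above.
from sklearn.metrics import classification_report, precision_recall_fscore_support

y_pred = best_model.predict(X_test)

precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary", pos_label=1
)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(classification_report(y_test, y_pred, target_names=["neutral", "biased"]))
```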

The model’s generalizability and robustness can be further evaluated through techniques like cross-validation and by testing its performance on unseen datasets. To incorporate diversity and inclusion metrics into job description evaluations, quantitative measures are employed. For instance, a Gender Bias Index analyzes the language used in texts by assigning scores based on the presence of gendered language. This involves preprocessing job descriptions to extract features, such as word frequencies and syntactic structures, which are then analyzed using machine learning algorithms. By measuring and comparing inclusivity levels across job descriptions, organizations can monitor progress over time and prioritize the revisions needed to minimize gender bias.
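An illustrative version of such an index, with simplified word lists and a simplified scoring formula that stand in for the real implementation, might look like this:

```python
# An illustrative scoring sketch in the spirit of the index described above.
# The word lists and formula are simplified assumptions for demonstration.
import re

MASCULINE_CODED = {"aggressive", "ambitious", "assertive", "competitive",
                   "decisive", "dominant", "driven", "independent"}
FEMININE_CODED = {"collaborative", "committed", "compassionate", "interpersonal",
                  "nurturing", "supportive", "sympathetic", "understanding"}

def gender_bias_score(text: str) -> float:
    """Score in [-1, 1]: negative = feminine-coded, positive = masculine-coded."""
    tokens = re.findall(r"[a-z]+", text.lower())
    masc = sum(t in MASCULINE_CODED for t in tokens)
    fem = sum(t in FEMININE_CODED for t in tokens)
    total = masc + fem
    return 0.0 if total == 0 else (masc - fem) / total

print(gender_bias_score(
    "We need a driven, competitive and assertive leader to dominate the market."
))  # -> 1.0 (strongly masculine-coded under these illustrative lists)
```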

Utilizing the Power of Human Oversight and Feedback Loops

While AI-driven methods play an important role in identifying gender bias in written content, human oversight is essential as well. Implementing feedback loops in which diverse teams review job descriptions can provide additional insights and perspectives. This collaborative process ensures that both conscious and unconscious biases are recognized and addressed thoroughly. Moreover, seeking input from candidates throughout the hiring process can further improve job descriptions, promoting transparency and accountability.

Our Tool

HR Job Description Gender Bias Assessment Tool Screenshot

We’ve created a demo application that demonstrates the model’s ability to detect gender biases and present them in a way that is easy to understand and act on. You can access it here.

Let’s test this with an example to see how it performs. We used a random job listing for a CTO position for a company based in Mumbai. Here’s what the model found:

Male Bias

As you can see, the model found that this job description was heavily biased towards male candidates. For a tech-oriented CXO-level position, this result is unsurprising given prevailing assumptions about who should be hired for such a role.

In order to mitigate this, it is important to have a more detailed understanding of the specific ways in which this job description is biased. The tool breaks down the job description into words, and analyzes the wording used. This is presented in a graphical format:

Gender Bias Distribution per relevant word

The tool also breaks down which word groups have the most impact when it comes to biases:

Word Weight Distribution

Finally, the tool splits the weight of each word according to how biased it is towards either gender:

Female and Male Bias by Keywords

Future Steps

Our tool explores and addresses one key area of the hiring process where bias is common – job listings. However, the quest to achieve an equitable work environment free of bias is a long one. Here are some more ways that AI can help companies mitigate their biases:

Inclusion of other types of biases

Though in this case study we explored gender biases, there are other forms of bias that also need addressing. This model can be adapted to detect biases related to race, economic class, religion, and more.

Business communication

Beyond job descriptions, companies communicate with their stakeholders in a myriad of ways. We can apply this model to allow businesses to make all their communication free of bias.

Internal Communication

Biases occur at a human level as well. In a work environment, it is important to make sure that all employees are communicating with each other with professionalism and respect. Our system can be used to moderate internal communications and achieve this goal.

Employee Satisfaction

Different people have different needs that they’d like their employer to fulfill. By analyzing which benefits and features appeal to which demographics, businesses can offer benefits that encourage employees to stay at the company longer and remain motivated.

Other Applications

Education

This tool can analyze textbooks, course materials, and educational content to identify and rectify gender biases, fostering more inclusive learning environments.

Media and Advertising

It can assess marketing campaigns, articles, and media content to promote gender equality and eliminate biased portrayals, contributing to more balanced representation in media.

Healthcare

The tool can analyze patient communication, medical literature, and healthcare policies to address gender disparities in healthcare access and treatment outcomes, ensuring more equitable healthcare delivery.

Legal and Justice Systems

It can review legal documents, court proceedings, and legislation to mitigate gender bias in legal practices and uphold equality before the law, promoting fairness and justice.

Want to work with us too?

Related Articles

Hot Topic Detection and Tracking on Social Media during AFCON 2021 using Topic Modeling Techniques
Yes, AI is Replicating Gender Bias: Here Are Three Ways Organizations Can Change This