Detecting Bias in Articles Through Natural Language Processing

Project completed!

Mavin is an impact-driven startup founded in response to the explosion of misinformation in online media on topics such as Covid-19, climate change, elections, the effect of dairy, smoking, car congestion, and more. In this two-month Omdena Challenge, 50 AI changemakers built Natural Language Processing (NLP) models that detect the trustworthiness, bias, and quality of an article.


The problem

On the internet, misinformation, bots, and trolls dominate online (political) discourse and have become a questionable new world standard. The 2020 US presidential campaign and the 2016 Brexit campaign are well-known examples, and in 2019 large-scale, intentional disinformation campaigns took place in 48 countries. Financial misinformation resulted in $17 billion in damages. Readers and organizations want to know what content to trust, while publishers want to ban bad bots and trolls and engage with high-profile users to drive discussion and traffic.

As the spread of misinformation around the coronavirus has shown, this problem needs to be solved urgently. During the epidemic, a flood of misinformation (an “infodemic”) spread over the internet like the virus itself. Misinformation is always hazardous, but during the coronavirus crisis it can even have deadly consequences. Mavin aims to contribute to a better-informed society by exposing and labeling fake news around the coronavirus as well as any other misinformation or trolling campaign.


The solution


The Mavin AI

Mavin is building an AI to detect the quality and trustworthiness of an article. The AI rating of an article is computed from multiple parameters, and the weight of each parameter in the formula is defined by the crowd. A user can contribute to this weighting if they have a minimum Mavin Reputation Score (MRS) of 65%. All individual preferences will be stored, anonymously, on the Mavin blockchain. Those same users can also suggest additional parameters for calculating the AI score. Only Mavin Reputation Token holders with the minimum MRS can vote on which parameters are selected.
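
As a rough illustration of how crowd-defined weights could feed into a single rating, here is a minimal sketch. The exact Mavin formula is not described here, so the weighted average, the parameter names, and the vote structure are all assumptions made for the example; only the 65% threshold comes from the text.

```python
from statistics import mean

MIN_MRS = 0.65  # minimum Mavin Reputation Score from the text (65%)


def crowd_weights(votes, min_mrs=MIN_MRS):
    """Average the weight preferences of eligible users and normalize them to sum to 1.

    `votes` is a hypothetical structure: a list of dicts with the user's
    reputation score ("mrs") and their preferred weights per parameter.
    """
    eligible = [v["weights"] for v in votes if v["mrs"] >= min_mrs]
    if not eligible:
        raise ValueError("no eligible voters")
    averaged = {p: mean(w[p] for w in eligible) for p in eligible[0]}
    total = sum(averaged.values())
    if total == 0:
        raise ValueError("all weights are zero")
    return {p: w / total for p, w in averaged.items()}


def article_score(parameter_scores, weights):
    """Weighted average of per-parameter scores (assumed combination rule)."""
    return sum(weights[p] * parameter_scores[p] for p in weights)


votes = [
    {"mrs": 0.80, "weights": {"bias": 3, "perplexity": 1}},
    {"mrs": 0.70, "weights": {"bias": 1, "perplexity": 1}},
    {"mrs": 0.40, "weights": {"bias": 0, "perplexity": 5}},  # below threshold, ignored
]
weights = crowd_weights(votes)
score = article_score({"bias": 0.9, "perplexity": 0.5}, weights)
```

Normalizing the averaged weights keeps the final score on the same scale as the individual parameter scores, whatever scale individual users vote on.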


The project outcomes

Bias Score

One of the parameters is the bias of an article. The bias score should detect biases such as Spin, Opinion Statements Presented as Fact, and Sensationalism/Emotionalism (see “Types of Media Bias and How to Spot It”, AllSides) and indicate how biased the article is. AllSides describes 11 types of bias, which we incorporate. We built a multi-label machine learning classifier that takes English-language news articles (text) as input. The output is a set of probabilities assigned to labels that describe the article (e.g. Spin, Sensationalism, Bias).
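
To make the multi-label output concrete, the sketch below turns raw model scores (logits) into one independent probability per label. It is not the project's classifier: the logits are made-up numbers, the three-label list is a shortened example, and the 0.5 threshold is an assumption. The point it shows is that in a multi-label setup each label gets its own sigmoid, so the probabilities do not have to sum to 1 and an article can carry several labels at once.

```python
import math

# Shortened, illustrative label set; the full AllSides list has 11 entries.
LABELS = ["Spin", "Opinion Statements Presented as Fact", "Sensationalism/Emotionalism"]


def sigmoid(x):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def label_probabilities(logits, labels=LABELS):
    """Apply an independent sigmoid per label (multi-label, not softmax)."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    return {label: sigmoid(z) for label, z in zip(labels, logits)}


def predicted_labels(probs, threshold=0.5):
    """Keep every label whose probability reaches the (assumed) threshold."""
    return [label for label, p in probs.items() if p >= threshold]


probs = label_probabilities([2.0, -1.0, 0.0])  # hypothetical logits
flagged = predicted_labels(probs)
```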

For the bias score, we set up the ML project and the annotation framework (Doccano). The team then worked on completing the annotations (roughly 1,000 or more articles) and on productionizing the ML model.

Step 1: Creating the training set

  • Given a selection of news datasets, select the labels that best generalize across the news articles (you can draw on existing research and use Doccano)
  • Figure out what the smallest annotatable unit is (paragraph, sentence, two paragraphs, etc.)
  • Annotate the article units and create the training set
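
The steps above can be sketched as follows, assuming the paragraph is chosen as the annotation unit. The helper names are hypothetical, and the record shape (a "text" field plus a "label" list, one JSON object per line) is an assumption modeled on a JSONL layout commonly used for text classification tools such as Doccano; check the import format of your Doccano version before relying on it.

```python
import json


def split_into_units(article_text):
    """Split an article into paragraphs (blank-line separated), the assumed unit."""
    return [p.strip() for p in article_text.split("\n\n") if p.strip()]


def to_jsonl(units, labels_per_unit):
    """Serialize annotated units as one JSON object per line."""
    if len(units) != len(labels_per_unit):
        raise ValueError("expected one label set per unit")
    lines = [
        json.dumps({"text": text, "label": sorted(labels)})
        for text, labels in zip(units, labels_per_unit)
    ]
    return "\n".join(lines)


article = (
    "Experts are stunned by the shocking new figures.\n\n"
    "The report was published on Monday."
)
units = split_into_units(article)
# Hypothetical annotations: the second paragraph carries no bias label.
training_jsonl = to_jsonl(units, [{"Sensationalism/Emotionalism"}, set()])
```

Keeping unlabeled units in the training set (with an empty label list) gives the classifier negative examples as well as biased ones.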


Step 2: Creating the model

  • Create a language model (e.g. BERT, GPT) which, given an article unit, determines the labels and their weights
  • Determine which quality metrics to use for scoring the model (Krippendorff’s alpha, kappa, RMSE, etc.) and compare its output with human judgment
  • Using the created model, create an API that, given an article, returns a set of labels and their associated weights
  • Set up a pipeline that allows easy retraining of the model
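
Two of the metrics named above can be computed in a few lines. The sketch below implements Cohen's kappa (one common reading of "kappa"; the text does not say which variant is meant) for agreement between human and model labels on a single bias type, and RMSE for comparing predicted weights against reference weights. All input values are invented for illustration.

```python
import math
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Agreement between two label sequences, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters always use the same single category
    return (observed - expected) / (1.0 - expected)


def rmse(predicted, reference):
    """Root mean squared error between predicted and reference scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


# Hypothetical data: 1 = unit labeled "Spin", 0 = not labeled.
human = [1, 0, 1, 1, 0, 0]
model = [1, 0, 1, 0, 0, 1]
kappa = cohen_kappa(human, model)
error = rmse([0.9, 0.2, 0.7], [1.0, 0.0, 1.0])
```

A kappa near 1 means the model agrees with the annotators far more often than chance would predict, while a value near 0 means the agreement is about what chance alone would give.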


Perplexity Score

In addition, we built a model that produces a perplexity score, which predicts whether the sentences comprising the article make sense (e.g. “a book is on my desk” makes more sense than “a book is in my desk”, and even more sense than “a desk is in my book”). The model uses the pre-trained BERT model (or any other suggested model) to calculate the average perplexity score of the article.
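
For readers unfamiliar with the metric: perplexity is the exponential of the average negative log-probability a language model assigns to the tokens of a sentence, so a lower value means the model finds the sentence more predictable. The sketch below shows only that arithmetic. The per-token probabilities are invented rather than produced by BERT, and averaging sentence scores over the article is our reading of "average perplexity score". With a masked model such as BERT, the per-token probabilities are typically obtained by masking each token in turn, which gives a pseudo-perplexity rather than a classical one.

```python
import math


def sentence_perplexity(token_log_probs):
    """exp of the mean negative log-probability of the sentence's tokens."""
    if not token_log_probs:
        raise ValueError("sentence has no tokens")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def article_perplexity(sentences_log_probs):
    """Average the per-sentence perplexities over the article (assumed aggregation)."""
    scores = [sentence_perplexity(s) for s in sentences_log_probs]
    return sum(scores) / len(scores)


# Hypothetical per-token probabilities, standing in for language model output.
plausible = [math.log(p) for p in (0.5, 0.5, 0.5, 0.5)]
implausible = [math.log(p) for p in (0.5, 0.1, 0.1, 0.5)]

article_score = article_perplexity([plausible, implausible])
```

Here the sentence with two unlikely tokens gets the higher perplexity, which is the signal the score is meant to capture.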

