Fighting Misinformation and Promoting Plurality by Detecting Fake News
The Newsroom is an impact-driven startup with the goal of slowing the spread of misinformation and promoting plurality online. In this two-month Omdena Challenge, 50 AI changemakers built a model that assigns trust scores to online news articles and claims. The model makes each score transparent and links to other articles to provide a balanced set of views on the topic.
The Problem
Over half a billion people in the world consume news online, a number that is increasing rapidly as more individuals gain access to the internet. While open access to information has been an incredible breakthrough of the digitized world, the democratization of content creation and distribution has also led to a rapid spread of false and highly biased information.
Trust in the news has plummeted, and with it, polarization has risen. Individuals who are unsure of what is reliable put their trust in content that simply confirms their prior beliefs. This confirmation bias is further amplified by algorithmic echo chambers, which are particularly prevalent on social media.
While this is an online phenomenon, instances such as the impact of misinformation campaigns in the 2016 U.S. election, and the spread of false health information in the wake of the Covid pandemic, have highlighted the dangerous impact these phenomena can have on millions of individuals offline.
The project goals
In this two-month challenge, the goal was to build a model that attaches a trust score to online news articles and claims and identifies related articles with opposing stances.
The scoring model is highly explainable, so users can understand what elements were taken into account for the given score. This was accomplished by not building one unique model, but a set of models, each addressing a specific piece of the puzzle.
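To illustrate how a set of models can yield an explainable score, here is a minimal sketch in Python. It assumes each sub-model emits a score between 0 and 1 and that the overall trust score is a weighted average. The component names, weights, and the `trust_score` helper are hypothetical and chosen for illustration only; the source does not describe how the team actually combined its models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentScore:
    """Output of one hypothetical sub-model, scaled to [0, 1]."""
    name: str
    score: float   # 1.0 = most trustworthy
    weight: float  # relative importance, illustrative values only


def trust_score(components):
    """Return (overall score, per-component contributions).

    The overall score is a weighted average, so each contribution
    shows how much one sub-model added to the final number.
    """
    total_weight = sum(c.weight for c in components)
    if total_weight <= 0:
        raise ValueError("at least one component needs a positive weight")
    contributions = {
        c.name: c.score * c.weight / total_weight for c in components
    }
    return sum(contributions.values()), contributions


# Hypothetical outputs for a single article.
components = [
    ComponentScore("source_reliability", 0.8, 2.0),
    ComponentScore("claim_verification", 0.5, 2.0),
    ComponentScore("sentiment_neutrality", 0.9, 1.0),
]
overall, breakdown = trust_score(components)
print(f"trust score: {overall:.2f}")
for name, part in breakdown.items():
    print(f"  {name}: {part:.2f}")
```

Because the contributions add up to the overall score, a reader can see which element raised or lowered the result, which is the kind of transparency described above.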
The challenge involved tasks such as claim extraction and matching, named entity recognition, document- and entity-level sentiment analysis, document classification, and stance classification.
Read more on how the team built the model and other methodologies in the article below.
The data
A combination of labeled and unlabeled data was made available. The Newsroom has collected millions of unlabeled texts from online news articles (in English), the majority of which are of a political nature or related to the COVID pandemic.
Additionally, several labeled open-source datasets were made available, containing: articles or individual claims classified as true/false; sources classified as reliable/unreliable; and article/claim pairs classified as agree/disagree/unrelated.
Part of this task was to assess whether existing open-source labeled datasets are effective in building supervised and/or semi-supervised classification models.
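As a rough illustration of how the agree/disagree/unrelated article/claim pairs could support a balanced set of views, the Python sketch below groups the articles related to a claim by stance. The `StancePair` record, the `balanced_views` helper, and the sample identifiers are hypothetical; they do not reflect the schema of any specific dataset used in the challenge.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    AGREE = "agree"
    DISAGREE = "disagree"
    UNRELATED = "unrelated"


@dataclass(frozen=True)
class StancePair:
    """One labeled article/claim pair (hypothetical record layout)."""
    claim: str
    article_id: str
    stance: Stance


def balanced_views(pairs, claim):
    """Group article ids related to a claim by stance, skipping unrelated ones."""
    views = {Stance.AGREE: [], Stance.DISAGREE: []}
    for pair in pairs:
        if pair.claim == claim and pair.stance in views:
            views[pair.stance].append(pair.article_id)
    return views


# Made-up sample data for illustration.
pairs = [
    StancePair("claim-1", "article-a", Stance.AGREE),
    StancePair("claim-1", "article-b", Stance.DISAGREE),
    StancePair("claim-1", "article-c", Stance.UNRELATED),
    StancePair("claim-2", "article-d", Stance.DISAGREE),
]
views = balanced_views(pairs, "claim-1")
print(views[Stance.AGREE], views[Stance.DISAGREE])
```

In a real pipeline the stance labels would come from a stance classification model rather than from hand-labeled pairs.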
The Newsroom on the AI Challenge results
Your benefits
Join a thriving AI community in 88 countries
Work with changemakers from around the world
Address a real-world problem with your skills
Build up your skill-set while setting the stage for a meaningful career
Requirements
Good English
A good or very good grasp of computer science and/or mathematics
Student, (aspiring) data scientist, (senior) ML engineer, data engineer, or domain expert (no need for AI expertise)
Programming experience with C/C++, C#, Java, Python, JavaScript, or similar
Understanding of NLP, ML, and deep learning algorithms
Application Form
Become an Omdena Collaborator