Fighting Misinformation and Promoting Plurality by Detecting Fake News
Project completed! Results attached!
The Newsroom is an impact-driven startup with the goal of slowing the spread of misinformation and promoting plurality online. In this two-month Omdena Challenge, 50 AI changemakers built a model that scores online news articles and claims for trustworthiness. The model makes each score transparent and links to other articles to provide a balanced set of views on the topic.
Over half a billion people in the world consume news online, a number that is increasing rapidly as more individuals gain access to the internet. While open access to information has been an incredible breakthrough of the digitized world, the democratization of content creation and distribution has also led to a rapid spread of false and highly biased information.
Trust in the news has plummeted, and polarization has risen with it. When unsure of what is reliable, individuals put their trust in content that simply confirms their prior beliefs. This confirmation bias is further amplified by algorithmic echo chambers, which are particularly prevalent on social media.
While this is an online phenomenon, events such as the misinformation campaigns in the 2016 U.S. election and the spread of false health information in the wake of the COVID pandemic have highlighted the dangerous effects it can have on millions of individuals offline.
The project goals
In this two-month challenge, the goal was to build a model that attaches a trust score to online news articles and claims and identifies related articles with opposing stances.
The scoring model is highly explainable, so users can understand which elements were taken into account for a given score. This was accomplished by building not one single model but a set of models, each addressing a specific piece of the puzzle.
The challenge involved tasks such as claim extraction and matching, named entity recognition, document- and entity-level sentiment analysis, document classification, and stance classification.
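To illustrate how a set of separate models can yield one explainable score, here is a minimal sketch. It is not the team's actual implementation: the component names, the weights, and the weighted-average rule are all assumptions chosen for illustration. It only shows how keeping each sub-model's contribution visible lets a user see what went into the final number.

```python
from dataclasses import dataclass


@dataclass
class ComponentScore:
    name: str      # hypothetical sub-model name
    score: float   # sub-model output in [0, 1]; higher means more trustworthy
    weight: float  # hypothetical relative importance of this sub-model


def trust_score(components):
    """Return (overall, breakdown): a weighted average of the component
    scores and each component's share of that average."""
    total_weight = sum(c.weight for c in components)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    breakdown = {c.name: c.score * c.weight / total_weight for c in components}
    overall = sum(breakdown.values())
    return overall, breakdown


# Illustrative values only; none of these numbers come from the project.
components = [
    ComponentScore("source_reliability", 0.9, 2.0),
    ComponentScore("claim_match", 0.5, 1.0),
    ComponentScore("sentiment_neutrality", 0.7, 1.0),
]
overall, breakdown = trust_score(components)
for name, contribution in breakdown.items():
    print(f"{name}: {contribution:.3f}")
print(f"overall: {overall:.3f}")
```

Because the breakdown sums to the overall score, each line of output can be read as that sub-model's contribution to the final result.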
Read more on how the team built the model and other methodologies in the article below.
A combination of labeled and unlabeled data will be made available. The Newsroom has collected millions of unlabeled texts from online news articles (in English), the majority of which are political in nature or related to the COVID pandemic.
Additionally, several labeled open-source datasets will be made available, containing: articles or individual claims classified as true/false; sources classified as reliable/unreliable; and article/claim pairs classified as agree/disagree/unrelated.
Part of this task was to assess whether existing open-source labeled datasets are effective in building supervised and/or semi-supervised classification models.
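As a rough illustration of what a semi-supervised setup can look like when a small labeled set is combined with a large unlabeled one, the sketch below uses self-training (pseudo-labeling). This is an assumption about one possible approach, not a description of what the team built. The word-count classifier, the confidence threshold, and the sample texts are all hypothetical and deliberately simplistic.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled):
    """Build per-label word counts from (text, label) pairs."""
    counts = {}
    for text, label in labeled:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts


def predict(counts, text):
    """Return (label, confidence), where confidence is the winning label's
    share of all word-count matches; (None, 0.0) if no word is known."""
    tokens = tokenize(text)
    scores = {label: sum(c[t] for t in tokens) for label, c in counts.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    label = max(scores, key=scores.get)
    return label, scores[label] / total


def self_train(labeled, unlabeled, threshold=0.8, rounds=3):
    """Repeatedly add confidently predicted unlabeled texts to the training set."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    for _ in range(rounds):
        counts = train(labeled)
        confident, still_unlabeled = [], []
        for text in remaining:
            label, confidence = predict(counts, text)
            if label is not None and confidence >= threshold:
                confident.append((text, label))
            else:
                still_unlabeled.append(text)
        if not confident:
            break  # nothing new was learned this round
        labeled.extend(confident)
        remaining = still_unlabeled
    return train(labeled), remaining


# Made-up examples for illustration only.
labeled = [
    ("vaccine causes magnetism hoax", "false"),
    ("vaccine trial results published", "true"),
]
unlabeled = ["hoax magnetism claim", "trial results announced", "weather today"]
model, leftover = self_train(labeled, unlabeled)
```

Texts that never reach the confidence threshold stay in `leftover`, which is one simple way to see how much of the unlabeled data a given labeled dataset actually helps to cover.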
The Newsroom on the AI Challenge results
From a broad scope and high complexity to high value via a very thoughtful development approach and technical solutions, which bring us many steps forward in our mission.