Fighting Misinformation and Fake News using NLP

Published: Jun 9, 2021

A complete pipeline using NLP to fight misinformation in news articles. In this two-month challenge, a group of 45+ collaborators prepared annotated news datasets, solved related classification problems, and built a browser extension to identify and summarize misinformation in news.

Authors: Mahfuzur Rahman, Ann Chia, and Wilmer Gonzalez

In our globalized, digitalized world, information from a variety of sources can be disseminated at unprecedented speeds to widespread audiences. While this has proven to be largely beneficial, the world has also experienced a sharp rise in the pervasiveness of fake news. This global phenomenon not only undermines the integrity of mainstream news media but could also cause societal instability.

Using examples from the 2016 US election, H. Allcott and M. Gentzkow, in their article Social Media and Fake News in the 2016 Election, suggested that ‘one fake news article was about as persuasive as one TV campaign ad’ and that fake news has the potential to impact close political battles. The ubiquity of social media only makes matters worse.

Social media is the biggest contributor to the spread of fake news. Source: H. Allcott, M. Gentzkow, J. Econ. Perspect. 31, 211 (2017)

In ‘The science of fake news’, published in Science, David M. J. Lazer et al. define fake news as “fabricated information that mimics news media content in form but not in organizational process or intent. Fake-news outlets, in turn, lack the news media’s editorial norms and processes for ensuring the accuracy and credibility of information. Fake news overlaps with other information disorders, such as misinformation (false or misleading information) and disinformation (false information that is purposely spread to deceive people).”

In this project, 45+ Omdena collaborators partnered with The Newsroom to take on a specific type of fake news: misinformation. The overarching goal of The Newsroom is to identify news articles and claims most likely to contain false or highly biased information, assign them a trust score, and summarize the results through a product, NewsScore, a browser extension. To assist us in this process, The Newsroom provided an unlabeled dataset containing ~240K scraped news articles and suggested several published labeled datasets.

Setting the Goals

Given a news article, our first goal is to assign it a trust score based on the extent and types of misinformation that it propagates. To keep the scoring process transparent, we decided to build a set of models, each addressing a specific attribute of news misinformation. After discussion, we shortlisted three attributes: hate speech, clickbait, and political bias. We further divided this goal into two parts:

  • In-house dataset preparation: Using the unlabeled dataset provided by The Newsroom, prepare three labeled datasets, each focusing on a specific attribute.
  • Transparency modeling: Prepare classification models for hate speech, clickbait, and political bias using open-source and in-house datasets.

Our second goal is to build models for claim detection and verification in news. Given what we could realistically complete in two months, we decided to tackle only the ‘claims detection’ problem.

Our final goal is to build a minimum viable product (MVP) that ties everything together. For this, we decided to build a Google Chrome extension.

The workflow of this project is visualized below:

Goals and deliverables (transparency and claims detection modeling are shown under the umbrella term ‘NLP modeling’). Source: Omdena

In-house dataset preparation

One of the primary goals of the project is to prepare in-house datasets from the unlabeled news articles provided by The Newsroom. The resulting datasets will be used to solve diverse misinformation-associated problems, for example, hate speech detection, political bias identification, clickbait detection, and claim detection and verification. The following subsections provide an overview of the in-house dataset generation process and a summary of the resulting datasets.

Dataset labeling process

We planned a generic approach to labeling and used it to label the hate speech, clickbait, and political bias datasets. The first step in the dataset labeling life cycle (Figure 1) is the choice of an appropriate labeling tool. To this end, we explored several tools, including HumanFirst, Labelbox, and Label Studio, and eventually selected HumanFirst for its speed and ease of use.

The second step in the labeling life cycle is to prepare a set of guidelines. These guidelines are problem-specific and include the definition of the problem with examples and the exact labels to assign, with the ultimate goal of keeping labels consistent across collaborators.

The first two steps in the labeling process are data-agnostic, meaning we don’t consult the actual news articles to complete them. The remaining steps are data-dependent, starting with The Newsroom’s unlabeled data and ending with the final labeled data.

The sheer volume of unlabeled data makes it impossible to label all of it. The third step in the life cycle makes the task manageable by producing a shortlisted ‘unlabeled dataset’. For each problem, we used a combination of supervised (based on published labeled datasets) and unsupervised techniques to subsample a small yet representative dataset that could be labeled in 1–2 weeks. All of these datasets are generated at the sentence level.

Figure 1: Dataset labeling life cycle. Source: Omdena

The fourth step is crowdsourcing (the actual labeling) to get independent labels from collaborators. We aimed for 3x labeling of each sentence; in practice, we were able to get 2x labeling for most of the shortlisted datasets.

Even though we spent a considerable amount of time preparing consistent guidelines, we still found many labeling mismatches (conflicts). The final step is to resolve the conflicts; we assigned additional people to specifically label those examples to reach a consensus. This gives us our final in-house dataset(s).
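The conflict-resolution step can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual tooling: the function name `resolve_labels`, the headline ids, and the strict-majority rule are assumptions made for the example.

```python
from collections import Counter


def resolve_labels(annotations):
    """Split items into agreed labels and conflicts needing another annotator.

    annotations: dict mapping an item id to the list of labels it received.
    """
    agreed, conflicts = {}, []
    for item_id, labels in annotations.items():
        top_label, top_count = Counter(labels).most_common(1)[0]
        # Assumed rule: at least two votes and a strict majority settle a label.
        if len(labels) >= 2 and top_count > len(labels) / 2:
            agreed[item_id] = top_label
        else:
            conflicts.append(item_id)
    return agreed, conflicts


# Hypothetical 2x-labeled headlines (ids and labels are made up).
annotations = {
    "h1": ["clickbait", "clickbait"],
    "h2": ["clickbait", "not_clickbait"],
    "h3": ["not_clickbait", "not_clickbait"],
}
first_agreed, first_conflicts = resolve_labels(annotations)

# An additional collaborator labels the conflicting item, then we re-resolve.
annotations["h2"].append("not_clickbait")
final_agreed, final_conflicts = resolve_labels(annotations)
print(first_conflicts, final_agreed)
```

With 2x labeling, any disagreement is a tie, which is why an extra label is needed before a consensus can emerge.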

Case Study: Clickbait

With HumanFirst already selected as the labeling tool, we started at step two: generating guidelines. We studied different articles to define ‘clickbait’ and used examples to clearly demonstrate each type. Table 1 shows different types of ‘clickbait’ with illustrative examples.

Table 1: Definition and examples of clickbait. Source: Omdena

In the next step, we parsed out all the headlines from The Newsroom articles. We then trained a Universal Sentence Encoder (USE) based model on an independent dataset and used that model to predict clickbait probability scores (0 to 1) for all The Newsroom headlines. We randomly sampled 10,000 headlines covering different ranges of clickbait scores, which gives a uniform representation of different types of headlines. Finally, we converted the sample to HumanFirst format, divided all of the sentences and articles into 5 different datasets, and uploaded them to HumanFirst for labeling. This is our shortlisted unlabeled dataset.
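One way to sample headlines across score ranges is to bin the predicted probabilities and draw from each bin. The sketch below only illustrates that idea: the number of bins, the per-bin quota, and the function name `sample_across_score_bins` are assumptions, and the scores are made up rather than produced by the USE-based model.

```python
import random


def sample_across_score_bins(scored, n_bins=5, per_bin=2, seed=0):
    """Draw up to `per_bin` headlines from each equal-width score bin in [0, 1]."""
    rng = random.Random(seed)
    bins = [[] for _ in range(n_bins)]
    for headline, score in scored:
        # A score of exactly 1.0 is placed in the last bin.
        index = min(int(score * n_bins), n_bins - 1)
        bins[index].append(headline)
    sample = []
    for bucket in bins:
        rng.shuffle(bucket)
        sample.extend(bucket[:per_bin])
    return sample


# Hypothetical (headline, clickbait probability) pairs.
scored = [(f"headline {i}", i / 20) for i in range(21)]
shortlist = sample_across_score_bins(scored)
print(len(shortlist))
```

Sampling per bin rather than from the whole pool keeps rare score ranges represented even when most headlines score near 0.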

The datasets are then independently 2x labeled by different collaborators using HumanFirst. We then exported all the datasets from HumanFirst, resolved the conflicts, and prepared the final dataset. The final ‘in-house’ labeled clickbait dataset contains 9,954 article headlines.

Summary of in-house datasets

In this section, we summarize all three independent datasets we prepared using the dataset labeling lifecycle.

The hate speech dataset is the most imbalanced of our labeled datasets (Figure 2: 1% hate vs 99% no-hate examples). One reason for this imbalance is that hate speech is heavily underrepresented in mainstream news articles. Another potential reason is that our approach to shortlisting sentences for hate speech relies on the presence of hate words and bi-grams from a previous study, which could be limited and outdated.
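A lexicon-based shortlist of this kind can be sketched as follows. The lexicon entries here are harmless placeholders rather than the word and bi-gram lists from the previous study, and the tokenization and the function name `shortlist_by_lexicon` are assumptions for illustration.

```python
import re


def shortlist_by_lexicon(sentences, words, bigrams):
    """Keep sentences that contain a lexicon word or a lexicon bi-gram."""
    word_set = {w.lower() for w in words}
    bigram_set = {tuple(b.lower().split()) for b in bigrams}
    selected = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        has_word = any(token in word_set for token in tokens)
        has_bigram = any(pair in bigram_set for pair in zip(tokens, tokens[1:]))
        if has_word or has_bigram:
            selected.append(sentence)
    return selected


# Placeholder lexicon and sentences (all made up for the example).
lexicon_words = ["slurword"]
lexicon_bigrams = ["those people"]
sentences = [
    "The council approved the budget on Monday.",
    "He called them a slurword during the rally.",
    "Those people should not be allowed here, the post read.",
]
shortlisted = shortlist_by_lexicon(sentences, lexicon_words, lexicon_bigrams)
print(shortlisted)
```

A filter like this can only surface sentences that match the lexicon, which illustrates why a limited or outdated list restricts what ends up in the shortlist.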

Figure 2: Summary of in-house datasets. Source: Omdena

The clickbait dataset is probably our best in-house dataset in terms of quality and representation. This is partly because clickbait detection is a relatively easy problem. For this dataset, we were able to consistently ensure 2x labeling.

The political bias dataset is the last one we labeled. We spent a good amount of time finding a suitable candidate unlabeled dataset, but most of the examples were labeled by only one collaborator. The quality of this dataset is therefore lower than that of the previous two. Coverage is also limited, as only half of the 10,000 examples were labeled.

We also prepared an in-house labeled dataset (1,000 examples only) for claim detection. This dataset did not follow the labeling life cycle, and due to limited capacity, only one collaborator was assigned to it. Exploring and extending it in more detail is left as future work.

Claims detection modeling

We defined a claim as “a statement about the world that can be verified”. The claims detection models perform binary classification, labeling input sentences as either Check-Worthy Factual Sentences (CFS) or Non-Factual Sentences (NFS). This labeling convention, as well as the model code we tested, stems from the open-source ClaimSpotter publication and GitHub repository.

That study proposes BiLSTM and SVM baseline models along with the transformer models BERT, DistilBERT, and RoBERTa. Of the two baselines, we tested only the BiLSTM model in this project, and it is the model integrated into the MVP.

Initially, the BiLSTM model achieved an F1-score of approximately 70%. After fine-tuning and tweaking the model parameters, this baseline reached an F1-score of ~74% for the positive class, Check-Worthy Factual Sentences (CFS).

We also tested the transformer models from the ClaimSpotter publication: BERT, DistilBERT, and RoBERTa. In addition, the authors added adversarial perturbations to each of the transformers, which in their study prevented overfitting and improved accuracy on unseen data. The models are published as open source on GitHub, and we tested all of them, both in the base version and with the added adversarial layers. Prioritizing a balance of detection accuracy and training time, we found that the BERT-based model (without adversarial perturbations) outperformed all other models (F1-score of 0.8338 for CFS).

Figure 3: Claims detection on the ‘Claimbuster’ dataset. Source: Omdena

Transparency Modeling

Transparency modeling covers the preparation of classifiers, based on published datasets, for hate speech, clickbait, and political bias classification. Collaborators built many independent models for these problems. We benchmarked the models and selected the best one(s) based on the F1-score of the positive class (hate, clickbait, or politically biased). Finally, we evaluated the models on the in-house datasets prepared earlier.
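For reference, the F1-score of the positive class is the harmonic mean of that class's precision and recall. The snippet below is a minimal pure-Python sketch with made-up labels; the helper name `positive_class_f1` is ours and not part of any library.

```python
def positive_class_f1(y_true, y_pred, positive):
    """F1-score computed only for the class named `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels and predictions for six sentences.
y_true = ["hate", "no-hate", "no-hate", "hate", "no-hate", "hate"]
y_pred = ["hate", "no-hate", "hate", "no-hate", "no-hate", "hate"]
f1 = positive_class_f1(y_true, y_pred, positive="hate")
print(round(f1, 3))
```

Scoring only the positive class matters on imbalanced data, where a model that always predicts the majority class can look accurate while never detecting hate speech, clickbait, or bias.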

Hate Speech Classification

Hate speech classification is a binary problem at the sentence level where each sentence is labeled as either ‘hate’ or ‘no-hate’. We used two openly available datasets for this classification problem: StormFront (based on a forum) and Crowdflower (based on tweets). Even though we prepared classification models on both datasets, only the StormFront dataset is binary by nature, and therefore we spent the majority of our time modeling StormFront data.

The full StormFront dataset is highly imbalanced, so we prepared two datasets from it: the full dataset and a balanced subsample. We built several classification models for each of these two datasets separately.
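A balanced subsample can be produced by randomly undersampling the majority class. The sketch below assumes that strategy for illustration; the function name `balanced_subsample` and the toy examples are hypothetical, and the real StormFront data is not used here.

```python
import random
from collections import Counter


def balanced_subsample(examples, seed=0):
    """Downsample every class to the size of the rarest class."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        # Sample without replacement so no example is duplicated.
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Hypothetical imbalanced data: 3 'hate' vs 27 'no-hate' sentences.
examples = [(f"sentence {i}", "hate") for i in range(3)]
examples += [(f"sentence {i}", "no-hate") for i in range(3, 30)]
balanced = balanced_subsample(examples)
label_counts = Counter(label for _, label in balanced)
print(label_counts)
```

Undersampling discards majority-class examples, which is one reason to also keep a model trained on the full dataset for comparison.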

On the balanced dataset, a BERT + CNN-based model achieved the best F1-score of 0.812. A USE-based model was a very close second. Several traditional machine learning algorithms (Naive Bayes, Random Forest, and Support Vector Classifier) performed comparably.

Figure 4: Hate speech classification on ‘StormFront’ dataset

Clickbait classification

Clickbait classification is also a binary problem: given an article headline, we try to predict whether it is clickbait or not. For this problem, we used a dataset from Kaggle, which is in fact a combination of two datasets. We call it the ‘combined’ dataset.

We built several classification models on the combined dataset, including xgboost, a BERT-based model, and a USE-based model. By F1-score, xgboost with a comprehensive set of features performed best, but at a cost in time and memory. Moreover, its gain over a comparable xgboost model with a smaller set of features was minimal (F1-score of 0.905 vs 0.902). We therefore think the simpler xgboost model is probably the best practical solution.

Figure 5: Clickbait classification on the ‘combined’ dataset. Source: Omdena

Political bias Classification

Classification of political bias can be framed as either a binary problem (biased or not biased) or a three-label problem (left/liberal, center/neutral, and right/conservative). We investigated both options, but due to poor performance on the three-label problem, we decided to treat political bias as a binary classification problem.

Additionally, the freely available datasets label political bias at either the full-article level or the individual-sentence level. Examples of article-level datasets are the DeepBlue and Baly et al. datasets, and an example of a sentence-level dataset is the IBC dataset. Here we report performance on the Baly et al. and IBC datasets.

For article-level classification with the Baly et al. dataset, we built the tree-based classifiers Random Forest and xgboost and the transformer-based classifiers RoBERTa and LongFormer; RoBERTa outperformed the other models (F1-score of 0.79). For sentence-level classification on the IBC dataset, we tried a Naive Bayes model and a USE-based model, and found the USE-based model to perform best (F1-score of 0.90). Our study suggests that classification at the article level is considerably more challenging than classification at the sentence level.

Figure 6: Political bias classification on ‘article’ level and ‘sentence’ level datasets. Source: Omdena

In-house data modeling

Once our in-house labeled data were ready, we evaluated those datasets on the three transparency modeling problems. We first applied some of the models trained on freely available datasets to the in-house datasets off the shelf, but the performance was poor. We therefore decided to model the in-house datasets separately.

Figure 7 summarizes the results of modeling on the in-house datasets. For hate speech, we built models separately for the original imbalanced dataset and a subsampled balanced dataset. Using a USE-based approach, we found that balancing improves the F1-score from 0.31 to 0.51; compared to the StormFront dataset, however, the performance is still inferior.

For the clickbait in-house dataset, we used an xgboost model and a USE-based model (similar to the combined dataset modeling) and found the USE-based model to outperform the xgboost model. However, both approaches performed substantially worse than before: our best F1-score dropped to 0.42 (from ~0.90 on the combined dataset).

As our political bias in-house dataset has three labels (left, center, and right), we first merged the left and center labels to prepare a binary dataset. We used a USE-based model on this dataset, but were only able to get an F1-score of 0.17.

Figure 7: Performance of in-house datasets. Source: Omdena

Overall, models performed comparatively poorly on all the in-house datasets, which indicates that there is room to improve these datasets.

Minimum Viable Product: NewsScore

To demonstrate how the delivered models can be used, and in line with The Newsroom’s vision of developing a browser extension, we developed a basic version of such an extension, named NewsScore. Figure 8 shows how the extension works in practice. When a user visits a news article, the extension takes that article as input and prepares a report in the back-end. When the user clicks on the extension, it visualizes the report and provides additional options to interact with the extension.

Figure 8: The web extension, NewsScore, in practice. Source: Omdena

Figure 9 shows the different components of the extension schematically and zooms in on the NewsScore report from Figure 8. NewsScore has the following features:

  • An initial report on the whole article regarding the presence of clickbait, bias, or hate speech, along with more detailed information in each section about why the article received the given score. Currently, the reliability-of-information section only prints the detected CFS; in the future, these CFS will be the input for new features such as claim verification and reporting of a specific claim. (This module is also available as an option for the user to highlight a sentence from the article and apply any of the available tools above to it. Currently, this feature is only used to add text to the list of detected claims for demonstration purposes.)
  • A section where the user can provide feedback on the score given in each section, so that the extension can improve over time with more sophisticated approaches such as active learning.
  • A (disabled) section for related articles, which may be populated in the future to show, for example, similar articles on the same topic with better scores.

Figure 9: A browser extension MVP. Source: Omdena

Some features were handed to The Newsroom team as is, with a few steps left before they can be used directly by an end user. In the future, it would be useful to integrate different modeling approaches in the MVP back-end and enhance the front-end with useful data visualizations. Once completed, the Chrome extension will provide an article summary that recapitulates the news article by providing an overall news score, transparency scores for hate speech, clickbait, and political bias, and a score for claim verification (reliable information).

Conclusion and Future Directions

In conclusion, the envisioned goals for this project were successfully achieved, with in-house labeled datasets generated for Political Bias, Hate Speech, and Clickbait. Transparency Models were also trained for the detection of the aforementioned three attributes, as well as for claims detection. Finally, an MVP was produced for front-end model deployment and display of the news article trust score.

Our team of 45+ collaborators achieved a considerable result in an 8-week time span. Nonetheless, this work could be extended further. Possible areas of future exploration are listed below:

  • Prepare higher-quality datasets by ensuring 3x labeling for all of the in-house datasets. Extensively explore model training and evaluation on the in-house datasets.
  • Use domain-specific approaches to model the data. One example is NewsBERT, a recent development, which we did not have time to explore.
  • Explore different approaches to generate an aggregated news (trust) score from the results of transparency and claim detection models.
  • Implement additional transparency models for other attributes of misinformation. For instance, detection of Machine-Generated Text was initially explored but eventually halted to prioritize the aforementioned classification models given the timeline of this project.
  • Design storage strategies so each news article is efficiently processed for the users by reusing previous visits to it.
  • Design a common feature representation so the models can reuse these features across each score generation.
  • Explore different approaches to model deployment on the MVP, particularly those of Transformer modeling.
  • Port the MVP extension code to more scalable and robust technologies like Vue for increased performance.
  • Apply each prediction directly to the news article as highlighted text so the user can spot every occurrence of a phenomenon (clickbait, bias, hate speech, and so on).
  • Further calibrate the sentence segmentation process, which is common to all the classification problems and could be optimized for news article sentences, for example, by considering the role of social media citations within the text.
  • Incorporate active learning practices so the feedback from the user can help modeling algorithms to improve their predictions.

Finally, we acknowledge all the collaborators for their hard work, our labeling partner HumanFirst for assistance in labeling, our client The Newsroom for their close cooperation and feedback, and Omdena for making this project possible!

email_subjectnone
email_body_afternone
add_css_classoff
add_css_loop_layoutoff
add_css_class_selectorbody
link_new_tabon
link_name_acfoff
link_name_acf_namenone
url_link_iconoff
image_sizefull
true_false_conditionoff
true_false_condition_css_selector.et_pb_button
true_false_text_trueTrue
true_false_text_falseFalse
is_audiooff
is_videooff
video_loopon
video_autoplayon
is_oembed_videooff
defer_videooff
defer_video_iconI||divi||400
video_icon_font_sizeoff
pretify_textoff
pretify_seperator,
number_decimal.
show_value_if_zerooff
text_imageoff
is_options_pageoff
is_repeater_loop_layoutoff
linked_post_stylecustom
link_post_seperator,
link_to_post_objecton
loop_layoutnone
columns4
columns_tablet2
columns_mobile1
repeater_dyn_btn_acfnone
button_alignmentcenter
text_before_positionsame_line
label_positionsame_line
vertical_alignmentmiddle
image_max_width_last_editedon|phone
admin_labelPhoto
_builder_version4.16
_module_presetdefault
title_css_font_size14px
title_css_letter_spacing0px
title_css_line_height1em
acf_label_css_font_size14px
acf_label_css_letter_spacing0px
acf_label_css_line_height1em
label_css_letter_spacing0px
text_before_css_font_size14px
text_before_css_letter_spacing0px
text_before_css_line_height1em
seperator_font_size14px
seperator_letter_spacing0px
seperator_line_height1em
relational_field_item_font_size14px
relational_field_item_letter_spacing0px
relational_field_item_line_height1em
background_enable_coloron
use_background_color_gradientoff
background_color_gradient_repeatoff
background_color_gradient_typelinear
background_color_gradient_direction180deg
background_color_gradient_direction_radialcenter
background_color_gradient_stops#2b87da 0%|#29c4a9 100%
background_color_gradient_unit%
background_color_gradient_overlays_imageoff
background_color_gradient_start#2b87da
background_color_gradient_start_position0%
background_color_gradient_end#29c4a9
background_color_gradient_end_position100%
background_enable_imageon
parallaxoff
parallax_methodon
background_sizecover
background_image_widthauto
background_image_heightauto
background_positioncenter
background_horizontal_offset0
background_vertical_offset0
background_repeatno-repeat
background_blendnormal
background_enable_video_mp4on
background_enable_video_webmon
allow_player_pauseoff
background_video_pause_outside_viewporton
background_enable_pattern_styleoff
background_pattern_stylepolka-dots
background_pattern_colorrgba(0,0,0,0.2)
background_pattern_sizeinitial
background_pattern_widthauto
background_pattern_heightauto
background_pattern_repeat_origintop_left
background_pattern_horizontal_offset0
background_pattern_vertical_offset0
background_pattern_repeatrepeat
background_pattern_blend_modenormal
background_enable_mask_styleoff
background_mask_stylelayer-blob
background_mask_color#ffffff
background_mask_aspect_ratiolandscape
background_mask_sizestretch
background_mask_widthauto
background_mask_heightauto
background_mask_positioncenter
background_mask_horizontal_offset0
background_mask_vertical_offset0
background_mask_blend_modenormal
custom_buttonoff
button_text_size20
button_bg_use_color_gradientoff
button_bg_color_gradient_repeatoff
button_bg_color_gradient_typelinear
button_bg_color_gradient_direction180deg
button_bg_color_gradient_direction_radialcenter
button_bg_color_gradient_stops#2b87da 0%|#29c4a9 100%
button_bg_color_gradient_unit%
button_bg_color_gradient_overlays_imageoff
button_bg_color_gradient_start#2b87da
button_bg_color_gradient_start_position0%
button_bg_color_gradient_end#29c4a9
button_bg_color_gradient_end_position100%
button_bg_enable_imageon
button_bg_parallaxoff
button_bg_parallax_methodon
button_bg_sizecover
button_bg_image_widthauto
button_bg_image_heightauto
button_bg_positioncenter
button_bg_horizontal_offset0
button_bg_vertical_offset0
button_bg_repeatno-repeat
button_bg_blendnormal
button_bg_enable_video_mp4on
button_bg_enable_video_webmon
button_bg_allow_player_pauseoff
button_bg_video_pause_outside_viewporton
button_use_iconon
button_icon_placementright
button_on_hoveron
positioningnone
position_origin_atop_left
position_origin_ftop_left
position_origin_rtop_left
width100%
max_widthnone
max_width_tablet25%
max_width_phone25%
max_width_last_editedon|tablet
module_alignmentcenter
min_heightauto
heightauto
max_heightnone
custom_margin_tablet||0px||false|false
custom_margin_phone||0px||false|false
custom_margin_last_editedon|phone
filter_hue_rotate0deg
filter_saturate100%
filter_brightness100%
filter_contrast100%
filter_invert0%
filter_sepia0%
filter_opacity100%
filter_blur0px
mix_blend_modenormal
animation_stylenone
animation_directioncenter
animation_duration1000ms
animation_delay0ms
animation_intensity_slide50%
animation_intensity_zoom50%
animation_intensity_flip50%
animation_intensity_fold50%
animation_intensity_roll50%
animation_starting_opacity0%
animation_speed_curveease-in-out
animation_repeatonce
hover_transition_duration300ms
hover_transition_delay0ms
hover_transition_speed_curveease
link_option_url_new_windowoff
sticky_positionnone
sticky_offset_top0px
sticky_offset_bottom0px
sticky_limit_topnone
sticky_limit_bottomnone
sticky_offset_surroundingon
sticky_transitionon
motion_trigger_startmiddle
hover_enabled0
title_css_text_shadow_stylenone
title_css_text_shadow_horizontal_length0em
title_css_text_shadow_vertical_length0em
title_css_text_shadow_blur_strength0em
title_css_text_shadow_colorrgba(0,0,0,0.4)
acf_label_css_text_shadow_stylenone
acf_label_css_text_shadow_horizontal_length0em
acf_label_css_text_shadow_vertical_length0em
acf_label_css_text_shadow_blur_strength0em
acf_label_css_text_shadow_colorrgba(0,0,0,0.4)
label_css_text_shadow_stylenone
label_css_text_shadow_horizontal_length0em
label_css_text_shadow_vertical_length0em
label_css_text_shadow_blur_strength0em
label_css_text_shadow_colorrgba(0,0,0,0.4)
text_before_css_text_shadow_stylenone
text_before_css_text_shadow_horizontal_length0em
text_before_css_text_shadow_vertical_length0em
text_before_css_text_shadow_blur_strength0em
text_before_css_text_shadow_colorrgba(0,0,0,0.4)
seperator_text_shadow_stylenone
seperator_text_shadow_horizontal_length0em
seperator_text_shadow_vertical_length0em
seperator_text_shadow_blur_strength0em
seperator_text_shadow_colorrgba(0,0,0,0.4)
relational_field_item_text_shadow_stylenone
relational_field_item_text_shadow_horizontal_length0em
relational_field_item_text_shadow_vertical_length0em
relational_field_item_text_shadow_blur_strength0em
relational_field_item_text_shadow_colorrgba(0,0,0,0.4)
border_radiion|100%|100%|100%|100%
border_radii_tableton||||
border_radii_phoneon|100%|100%|100%|100%
border_radii_last_editedon|phone
button_text_shadow_stylenone
button_text_shadow_horizontal_length0em
button_text_shadow_vertical_length0em
button_text_shadow_blur_strength0em
button_text_shadow_colorrgba(0,0,0,0.4)
box_shadow_stylenone
box_shadow_colorrgba(0,0,0,0.3)
box_shadow_positionouter
box_shadow_style_buttonnone
box_shadow_color_buttonrgba(0,0,0,0.3)
box_shadow_position_buttonouter
text_shadow_stylenone
text_shadow_horizontal_length0em
text_shadow_vertical_length0em
text_shadow_blur_strength0em
text_shadow_colorrgba(0,0,0,0.4)
disabledoff
global_colors_info{}
Favicon

Execution time: 0.0040 seconds

Omdena
