Hot Topic detection and tracking on AFCON 2021 in Cameroon

Local Project Cameroon Chapter

Coordinated by the Lead of Cameroon, Yuehgoh Foutse, Gaelle Patricia Talotsing, Ngnie Christian

Status: Completed

Project Duration: 01 Apr 2022 - 30 Apr 2022

Open Source resources available from this project

Project background.

To access the French version of the problem statement, please visit https://docs.google.com/document/d/1kvpDuY0L2D4C_DIAU9o7CstcVzROb56HP_hd1oD5f8M/edit?usp=sharing

Nowadays, many people express their feelings about events through social media, generating a vast corpus of conversations on a wide range of topics. Hot topic detection, an important knowledge discovery task on social data streams, is very useful in such a context: it helps determine the topics that are discussed the most on a social platform. One of the main challenges of this task is processing a very large social dataset in sequential order, efficiently and effectively.

The TotalEnergies 2021 Africa Cup of Nations (CAN TotalEnergies 2021) took the continent by storm for a good 29 days, and the recently concluded 33rd edition was acclaimed as the most widely followed in the history of the competition. #AFCON2021 engendered a surplus of feisty content, laden with agenda-setting, that brought the continent to fever pitch, with mainstream and social media across and beyond the continent awash with technical analyses of the sport. The event featured 24 countries sparring through 52 matches and ended with a Senegal-Egypt final, with the Teranga Lions emerging as first-time champions! Cameroon took 3rd place after a spectacular “Lion-montada” against Burkina Faso. With Senegal’s Sadio Mané and Edouard Mendy winning best player and best goalkeeper, respectively, and Cameroon’s Vincent Aboubakar finishing as top goalscorer with 8 goals, with Karl Toko Ekambi (5 goals) in tow, Cameroon’s “CAN sucrée” thrilled spectators to the very end. However, the murmuring ranged from debatable concerns about the refereeing and CAF COVID-19 regulations to improbable claims about accommodation, and it wasn’t long before it morphed into a berserk attack on the host country.

The problem.

The continuously growing volume of AFCON2021 content on social media makes it difficult to quickly gain insight into the themes that best describe the noise around the event. To facilitate such comprehension and provide a better understanding of that noise, this four-week challenge will propose a model to discover hot topics and the patterns of topic evolution in AFCON2021. Identifying these hot topics could give a better picture of fans’ main concerns as well as what they liked the most, helping to improve future organization and make AFCON even better in the coming years.

Project goals.

• Collect and pre-process historical interaction data from users of social platforms (Facebook, Twitter, …) for the months July 2021 to February 2022;

• Collect, pre-process and analyze the frequency of hot terms (events, words, publications, …) related to AFCON 2021: extract hot terms and assign them weights. Hot terms are extracted based on three features: the frequency of the terms in the document collection; the location of the terms within a document; and the breadth of the terms’ distribution across the document collection;

• Provide a Hot Term Recognition model with Timeline Analysis
• Multidimensional Sentence Modeling
• Evaluation and validation of the model
• Provide an interactive dashboard about results
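
As an illustration of the hot-term goal above, the three features (frequency, location, breadth) can be combined into a single weight per term, and a term can then be tracked over time. The following is only a minimal sketch under stated assumptions, not the model built in this project: the function names `tokenize`, `hot_term_scores` and `term_timeline`, the equal default weights, and the simple weighted average are all hypothetical choices made for the example, and no stop-word removal is applied.

```python
import re
from collections import Counter, defaultdict

# Assumption: a simple regex tokenizer that keeps hashtags such as #AFCON2021.
TOKEN_RE = re.compile(r"[#\w']+")


def tokenize(text):
    """Lowercase a post and split it into word/hashtag tokens."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def hot_term_scores(docs, w_freq=1.0, w_pos=1.0, w_breadth=1.0):
    """Score each term in [0, 1] from three features.

    frequency: occurrences of the term, relative to the most frequent term
    location:  how early the term first appears in a document (earlier = higher),
               averaged over the documents that contain it
    breadth:   share of documents that contain the term
    The weighted average used here is an illustrative assumption.
    """
    freq = Counter()
    pos_sum = defaultdict(float)
    doc_count = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        seen = set()
        for i, tok in enumerate(tokens):
            freq[tok] += 1
            if tok not in seen:  # only the first occurrence sets the location
                seen.add(tok)
                doc_count[tok] += 1
                pos_sum[tok] += 1.0 - i / len(tokens)
    if not freq:
        return {}
    max_freq = max(freq.values())
    total_weight = w_freq + w_pos + w_breadth
    scores = {}
    for term, count in freq.items():
        f = count / max_freq
        p = pos_sum[term] / doc_count[term]
        b = doc_count[term] / len(docs)
        scores[term] = (w_freq * f + w_pos * p + w_breadth * b) / total_weight
    return scores


def term_timeline(dated_docs, term):
    """Count, per date key, the documents that mention `term`."""
    counts = Counter()
    for date, text in dated_docs:
        if term in set(tokenize(text)):
            counts[date] += 1
    return dict(sorted(counts.items()))
```

Called on a list of post texts, `hot_term_scores` returns a dictionary mapping each term to its weight, so the top-ranked entries are the candidate hot terms; `term_timeline` expects `(date, text)` pairs and gives the per-date counts that a timeline chart on the dashboard could plot.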

Project plan.

  • Week 1

    Tools preparation for data extraction + Data pre-processing

    Output: Data

  • Week 2

    Analysis + Model construction and Training

    Output: Data visualization, Hot topics

  • Week 3

    Model construction and training + Dashboard

    Output: Hot topics, Dashboard

  • Week 4

    Preparation of the report and dashboard

    Output: Presentation of the report and dashboard

Learning outcomes.

1. Social Media Data Extraction
2. Data pre-processing
3. Exploratory Data Analysis
4. Transformers and Graph-based models
5. Interactive Dashboard

Summary and results.

**Output:** Presentation of the report and dashboard: <https://omdena.com/blog/hot-topic-detection-and-tracking-topic-modeling-techniques/>

This challenge allowed us to understand the highly engaging online behavior of media outlets and Facebook or Twitter users during the AFCON, and to identify the key topics using various supervised and unsupervised NLP techniques.
The analysis suffered from the inability to remove noise added by posts that were semantically irrelevant but contained some AFCON-related terms. In further research, the interest will be in understanding how similar communities (based on linguistic similarities of their posts/tweets) or echo chambers evolved.
