To access the French version of this problem statement, please visit https://docs.google.com/document/d/1kvpDuY0L2D4C_DIAU9o7CstcVzROb56HP_hd1oD5f8M/edit?usp=sharing
Nowadays, many people turn to social media to express their feelings about events, generating a vast corpus of conversations on a wide range of topics. Hot topic detection, an important knowledge discovery task on social data streams, is very useful in this context: it helps determine which topics are being discussed the most on a social platform. One of the main challenges of this task is processing a very large social dataset in sequential order, both efficiently and effectively.
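One common way to process a time-ordered stream in a single pass is a sliding-window term counter: each post is added once and evicted once, so memory stays bounded by the window size. The sketch below is a minimal illustration, assuming posts are already tokenized and arrive sorted by timestamp; the class name `SlidingWindowCounter` and the one-hour window in the usage lines are hypothetical choices, not part of the challenge specification.

```python
from collections import Counter, deque
from datetime import datetime, timedelta


class SlidingWindowCounter:
    """Term frequencies over the most recent `window` of a time-ordered stream.

    Hypothetical helper: assumes posts are tokenized and arrive in
    non-decreasing timestamp order. The window is half-open: (now - window, now].
    """

    def __init__(self, window: timedelta) -> None:
        self.window = window
        self.buffer: deque = deque()      # (timestamp, tokens) pairs still inside the window
        self.counts: Counter = Counter()  # term -> frequency inside the window

    def add(self, timestamp: datetime, tokens: list[str]) -> None:
        """Add one post, then evict posts that have fallen out of the window."""
        self.buffer.append((timestamp, tokens))
        self.counts.update(tokens)
        cutoff = timestamp - self.window
        # Each post is evicted at most once, so the amortized cost per post is O(len(tokens)).
        while self.buffer and self.buffer[0][0] <= cutoff:
            _, old_tokens = self.buffer.popleft()
            self.counts.subtract(old_tokens)
            for tok in set(old_tokens):
                if self.counts[tok] <= 0:
                    del self.counts[tok]

    def top(self, k: int) -> list[tuple[str, int]]:
        """Return the k most frequent terms in the current window."""
        return self.counts.most_common(k)


# Usage: a one-hour window over three posts.
counter = SlidingWindowCounter(timedelta(hours=1))
t0 = datetime(2022, 2, 6, 20, 0)
counter.add(t0, ["senegal", "egypt"])
counter.add(t0 + timedelta(minutes=30), ["senegal", "mane"])
counter.add(t0 + timedelta(minutes=90), ["mane", "mane"])
print(counter.top(1))  # [('mane', 2)]
```

A deque is used because posts leave the window in the same order they entered, which makes eviction a cheap pop from the left.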
The Total Energies 2021 Africa Cup of Nations (CAN Total Energies 2021) took the continent by storm for a good 29 days, with the recently concluded 33rd edition acclaimed as the most widely followed in the history of the competition. #AFCON2021 generated a profusion of feisty content, laden with agenda-setting, driving the continent to a fever pitch, while mainstream and social media across and beyond the continent were awash with technical analyses of the sport. This exciting event featured 24 countries sparring through 52 matches and ended with a Senegal-Egypt final, with the Teranga Lions emerging as first-time champions! Cameroon took 3rd place after a spectacular “Lion-montada” against Burkina Faso. Senegal’s Sadio Mané and Edouard Mendy won best player and best goalkeeper, respectively, while Cameroon’s Vincent Aboubakar finished as top goalscorer with 8 goals, with Karl Toko Ekambi (5 goals) in tow; Cameroon’s “CAN sucrée” thrilled spectators to the very end. However, the murmuring ranged from debatable concerns about the refereeing and CAF COVID-19 regulations to improbable claims about accommodation, and it wasn’t long before it morphed into a berserk attack on the host country; more can be read here.
The ever-growing volume of AFCON2021 content on social media makes it difficult to quickly gain insight into the themes that best describe the noise around the tournament. To facilitate such comprehension and provide a better understanding of that noise, this 4-week challenge aims to propose a model that discovers the hot topics around AFCON2021 and the patterns of how those topics evolve. Identifying these hot topics could give a better picture of fans’ main concerns as well as what they liked the most, thus helping improve future organization and making the AFCON even better in the coming years.
• Collect and pre-process historical interaction data from users of social platforms (Facebook, Twitter, …) for the period July 2021 to February 2022;
• Collect, pre-process and analyze the frequency of hot terms (events, words, publications, …) related to AFCON 2021: extract hot terms and assign them weights. Hot terms are extracted based on three features: the frequency of the term in the document collection; the location of the term within a document; and the breadth of the term’s distribution across the document collection;
• Provide a Hot Term Recognition model with Timeline Analysis
• Multidimensional Sentence Modeling
• Evaluation and validation of the model
• Provide an interactive dashboard presenting the results
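The three hot-term features listed above (frequency in the collection, location within a document, breadth of distribution across the collection) can be combined into a single weight per term. The sketch below assumes documents are already tokenized and uses a hypothetical linear combination with illustrative coefficients `alpha`, `beta` and `gamma`; the positional score (1.0 for a document's first token, decreasing linearly towards its last) is likewise an assumption, since the challenge does not fix a formula. The function name `hot_term_weights` is also hypothetical.

```python
from collections import Counter, defaultdict


def hot_term_weights(docs: list[list[str]], alpha: float = 0.4,
                     beta: float = 0.3, gamma: float = 0.3) -> dict[str, float]:
    """Score terms by frequency, location and breadth (hypothetical linear combination).

    frequency: occurrences in the whole collection, divided by the largest count;
    location:  mean positional score, 1.0 for a document's first token and
               decreasing linearly towards its last token;
    breadth:   fraction of documents that contain the term.
    Returns terms sorted from hottest to coolest.
    """
    n_docs = len(docs)
    freq: Counter = Counter()
    positions: defaultdict = defaultdict(list)
    doc_freq: Counter = Counter()
    for tokens in docs:
        n = len(tokens)
        for i, tok in enumerate(tokens):
            freq[tok] += 1
            positions[tok].append(1.0 - i / n)
        doc_freq.update(set(tokens))  # count each term once per document

    max_freq = max(freq.values(), default=1)
    weights = {}
    for term, count in freq.items():
        frequency = count / max_freq
        location = sum(positions[term]) / len(positions[term])
        breadth = doc_freq[term] / n_docs
        weights[term] = alpha * frequency + beta * location + gamma * breadth
    return dict(sorted(weights.items(), key=lambda kv: kv[1], reverse=True))


docs = [["afcon", "senegal", "champions"], ["senegal", "mane"], ["afcon", "referee"]]
for term, w in hot_term_weights(docs).items():
    print(f"{term}: {w:.2f}")
```

Running the function separately on each time slice (for example, one call per matchday) and comparing the rankings could serve as a rough starting point for the timeline analysis; the coefficients would need tuning against the evaluation step.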
Tool preparation for data extraction + data pre-processing
Preparation of the report and dashboard
Output: Presentation of the report and dashboard
1. Social Media Data Extraction
2. Data pre-processing
3. Exploratory Data Analysis
4. Transformers and Graph-based models
5. Interactive Dashboard
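For the data pre-processing step, a minimal sketch for cleaning raw posts might look like the following. It assumes English or French text, removes URLs, @-mentions, pure numbers and words from a small illustrative stop-word list, and keeps hashtag words without the `#`; the regular expressions, the `STOPWORDS` set and the `preprocess` function are hypothetical choices, not a prescribed pipeline.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"#?\w+")  # words, optionally prefixed by '#'

# Illustrative stop-word list only; a real pipeline would use full English/French lists.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "le", "la", "les", "de", "et"}


def preprocess(text: str) -> list[str]:
    """Lowercase a post, strip URLs and mentions, and return cleaned tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)      # remove URLs first, since they may contain '@'
    text = MENTION_RE.sub(" ", text)
    tokens = []
    for tok in TOKEN_RE.findall(text):
        word = tok.lstrip("#")        # keep hashtag content, drop the '#'
        if word in STOPWORDS or word.isdigit():
            continue
        tokens.append(word)
    return tokens


print(preprocess("Congrats @SadioMane! #AFCON2021 champions of the cup https://t.co/xyz 8 goals"))
```

Because Python's `\w` is Unicode-aware, accented names such as "Mané" survive tokenization intact.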