Project background.

Social media platforms are an invaluable source of real-world text for building NLP solutions, which have become easier to develop with advances in language models. However, in countries where local dialects are commonly used on social media, NLP engineers encounter numerous obstacles when developing language-based solutions. In such cases, customized models that can handle these dialects are needed.

The problem.

In the Algerian context, the development of NLP-based solutions is hampered by the widespread use of dialect in social media conversations. Thus, there is a need to develop customized NLP solutions that can handle dialect-based text content.

Project goals.

As the first NLP-related project for our local chapter, this challenge aims to build a model that classifies hate speech content in the Algerian dialect, extracted from social media. At the same time, it will allow participants to evaluate their knowledge of NLP, machine learning and deep learning methods, and language models, and to learn new skills throughout the project in a collaborative environment with their peers.

The objectives of the project are:

  • Extract data from social networks.
  • Annotate the data and build the hate speech dataset.
  • Explore the current state-of-the-art NLP techniques for the Algerian dialect.
  • Build content classifiers using different techniques: machine learning, deep learning, and transformers.
  • Evaluate and compare the performance of the different models.
  • Deploy the selected classifier.
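
As a rough illustration of the objective of evaluating and comparing the different models, the sketch below computes precision, recall, and F1 for the hate speech class and picks the model with the highest F1. It is only a minimal sketch under stated assumptions: the label names ("hate", "normal"), the model names, and the toy predictions are all hypothetical and are not taken from the project's dataset or results. The project may well use other metrics or an existing library instead.

```python
def binary_metrics(y_true, y_pred, positive="hate"):
    """Return precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold annotations and predictions from two hypothetical models.
gold = ["hate", "normal", "normal", "hate", "normal", "hate"]
predictions = {
    "model_a": ["hate", "normal", "hate", "hate", "normal", "normal"],
    "model_b": ["hate", "normal", "normal", "hate", "normal", "hate"],
}

# Score every model on the same gold labels, then select by F1.
scores = {name: binary_metrics(gold, pred) for name, pred in predictions.items()}
best = max(scores, key=lambda name: scores[name]["f1"])

for name, metrics in scores.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
print("selected:", best)
```

F1 on the hate speech class is used here as the selection criterion on the assumption that the classes may be imbalanced, in which case plain accuracy can be misleading. The same comparison works for any number of candidate classifiers as long as they are scored against the same held-out annotations.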

Project plan.

  • Week 1

    Planning and preparation

  • Week 2

    Data collection

  • Week 3

    1) Data collection
    2) Annotation / building the dataset

  • Week 4

    1) Annotation / building the dataset
    2) Explore the NLP state of the art for the Algerian dialect

  • Week 5

    1) Annotation / building the dataset
    2) Explore the NLP state of the art for the Algerian dialect
    3) Building classifiers using different techniques

  • Week 6

    1) Building classifiers using different techniques
    2) Evaluate and compare the performance of the different models

  • Week 7

    Model Deployment

  • Week 8

    Project wrap-up (project report and final presentation)

Learning outcomes.

NLP, the state of the art in text classification for the Algerian dialect, and practical experience with machine learning and deep learning.

