[French Chapter] Deploying an Accurate Classifier to Stop Online Violence Against Children Using NLP

Challenge Started!

This Omdena Local Chapter Challenge runs for 6 weeks and is a unique experience to try and grow your skills in a collaborative and safe environment with a diverse mix of people from all over the world.

You will work on solving a local problem, initiated by the Omdena Marseille, France Chapter.

The problem

The project is designed to reduce Online Sexual Exploitation and Abuse of Children (OSEAC). With a 15,000% rise in online Child Sexual Abuse Materials (CSAM) from 2005 to 2020, it is clear that online violence against children is growing exponentially. In 2021, the National Center for Missing and Exploited Children’s CyberTipline received 29.3 million reports of CSAM, making 2021 the worst year on record for online child sexual abuse.

A primary way that adults with a sexual interest in children, or those who wish to harm them in other ways, reach children is through online grooming. As described by Sørensen (2015) and Greijer et al. (2016),
“Grooming is a multidimensional phenomenon in which an adult aims to solicit a child into a seemingly voluntary interaction with the intention of sexually abusing that child.” In a study Save the Children published last year, Grooming in the Eyes of a Child (Juusola et al., 2021), we found that children who are the object of grooming often do not realize what is happening, so they do not recognize that they are in danger until they are being extorted into providing increasingly harmful imagery or even into meeting an online predator in person.

The project goals

Our goal is to stop online violence against children by deploying an accurate classifier to identify grooming behavior in online chats with children. Once suspicion of grooming reaches a threshold based on its similarity to the training data, it will trigger an action, which may differ depending on the platform it is deployed on and the objectives of the intervention. For example, we may warn the child through the chatbot without alerting the groomer, call a moderator, or shut down the chat entirely.
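The threshold logic described above can be sketched roughly as follows. This is only an illustration, assuming the classifier produces a risk score between 0 and 1. The names `Action`, `PLATFORM_ACTIONS`, and `choose_action`, the platform keys, and the default threshold of 0.8 are all hypothetical and are not taken from the project's code.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    WARN_CHILD = "warn_child"          # warn the child via the chatbot, without alerting the groomer
    CALL_MODERATOR = "call_moderator"  # escalate to a human moderator
    SHUT_DOWN_CHAT = "shut_down_chat"  # end the chat entirely


# Hypothetical mapping from deployment platform to intervention.
PLATFORM_ACTIONS = {
    "chatbot_platform": Action.WARN_CHILD,
    "moderated_forum": Action.CALL_MODERATOR,
    "strict_platform": Action.SHUT_DOWN_CHAT,
}


def choose_action(risk_score: float, platform: str, threshold: float = 0.8) -> Optional[Action]:
    """Return the intervention to trigger, or None if the score is below the threshold."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be in [0, 1]")
    if risk_score < threshold:
        return None
    # Assumption: unknown platforms fall back to calling a moderator.
    return PLATFORM_ACTIONS.get(platform, Action.CALL_MODERATOR)
```

In this sketch the action is looked up per platform, so the same classifier output can lead to a different intervention depending on where it is deployed.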

In 2020, Save the Children US collaborated with Omdena to address online violence (https://omdena.com/projects/children-violence/). Of the various products that were generated from the sprint, the most promising was a classifier algorithm using Natural Language Processing to identify online grooming, combined with a chatbot that can warn children that they may be chatting with a groomer. Since then, a team of three engineers associated with the original project has continued to refine the technology. The core team now wants to expand on the work to build an industry-usable solution at scale.

From the original challenge, we have a large dataset of more than 800,000 lines taken from the Perverted Justice project, a project from 2003 to 2019 that used online volunteers as decoys to entrap predators who sought to contact minors to obtain sexual images or videos from them or to meet them in person. During the challenge and afterward, we tagged much of the training data with labels, such as male or female, predator or victim, and level of risk of the conversation, but the data still requires extensive processing. In particular, we need to improve and systematize the way we judge and annotate the level of risk. In addition to the data we already have, we are actively attempting to obtain additional databases of online grooming chats from a variety of sources, such as law enforcement agencies.

1. Build on existing data and annotate additional sentences. The target is to reach 100,000 sentences annotated with risk levels (non-risky, potentially risky, or risky). If new data is made available by law enforcement, annotate that data.

2. Look for and scrape user data from online resources.

3. Create a language model [Classification] to detect grooming behavior by labeling it as non-risky, potentially risky, or risky.

4. Test the data on various models and provide ablation studies.

5. Deploy the system as an API.

6. Make the API available as a stand-alone Chrome extension that predicts labels on the fly as the user types [the way Grammarly works is the closest example of the final deliverable].
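Steps 3 and 5 above can be sketched together as a rough illustration: a function that maps a model's risk score to one of the three labels, and a request handler that an API endpoint could call. This is not the project's implementation. The function names `score_to_label` and `handle_request`, the JSON field names, and the cut-off values 0.4 and 0.7 are assumptions made for the example, and the model itself is passed in as `score_fn` rather than implemented here.

```python
import json
from typing import Callable

# The three risk levels named in the project goals.
LABELS = ("non-risky", "potentially risky", "risky")


def score_to_label(score: float, low: float = 0.4, high: float = 0.7) -> str:
    """Map a risk score in [0, 1] to a label. The cut-offs are illustrative only."""
    if score < low:
        return LABELS[0]
    if score < high:
        return LABELS[1]
    return LABELS[2]


def handle_request(body: str, score_fn: Callable[[str], float]) -> str:
    """Parse a JSON body with a "text" field and return a JSON response string."""
    payload = json.loads(body)
    text = payload["text"]
    score = score_fn(text)  # score_fn stands in for the trained classifier
    return json.dumps({"label": score_to_label(score), "score": score})
```

A browser extension, as in step 6, would send each chat message to such an endpoint and display the returned label. Keeping the model behind `score_fn` means different models from step 4 can be swapped in without changing the request format.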

Why join? The uniqueness of Omdena Local Chapter Challenges

Omdena Local Chapter Challenges are not a competition or hackathon but a real-world project that will grow your experience to a new level.

It is a unique learning experience with the potential to make an impact through the outcome of the project. You will go through an entire data science project lifecycle, covering problem scoping, data collection and preparation, and modeling for deployment.

And the best part is that you will join the global and collaborative community of Omdena with tons of benefits to accelerate your career.

Read more on how Omdena's Local Chapters work

First Omdena Local Chapter Challenge?

Beginner-friendly, but also welcomes experts

Education-focused

Open-source

Duration: 4 to 8 weeks

Your Benefits

Address a significant real-world problem with your skills

Build your project portfolio

Access paid projects (as an Omdena Top Talent)

Get hired at top organizations

Requirements

Good English

Suitable for AI/Data Science beginners but also more senior collaborators

Learning mindset


Application Form

Application Closed.