Deploying an Accurate Classifier to Stop Online Violence Against Children Using NLP

Challenge Completed!

The goal is to deploy an accurate classifier that identifies grooming behavior in online chats with children and helps stop online violence against them. In this 8-week challenge, you will join a collaborative team of 50 AI engineers.

The problem

The project is designed to reduce Online Sexual Exploitation and Abuse of Children (OSEAC). Online Child Sexual Abuse Material (CSAM) rose by 15,000% between 2005 and 2020, a clear sign that online violence against children is growing rapidly. In 2021, the National Center for Missing and Exploited Children’s CyberTipline received 29.3 million reports of CSAM, making 2021 the worst year on record for online child sexual abuse.

A primary way that adults with a sexual interest in children, or who wish to harm them in other ways, reach them is through online grooming. As defined by Sørensen (2015) and Greijer et al. (2016):

“Grooming is a multidimensional phenomenon in which an adult aims to solicit a child into a seemingly voluntary interaction with the intention of sexually abusing that child.” In Grooming in the Eyes of a Child (Juusola et al., 2021), a study Save the Children published last year, we found that children who are the object of grooming often do not realize what is happening, so they do not recognize they are in danger until they are being extorted into providing increasingly harmful imagery or even into meeting an online predator in person.

The project goals

The team goal is to stop online violence against children by deploying an accurate classifier that identifies grooming behavior in online chats with children. Once suspicion of grooming, based on the conversation's similarity to the training data, reaches a threshold, it triggers an action, which may differ depending on the platform the classifier is deployed on and the objectives of the intervention. For example, the system might warn the child through the chatbot without alerting the groomer, alert a moderator, or shut down the chat entirely.
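The threshold-to-action step described above can be sketched as follows. This is a minimal illustration, not the team's implementation: the score range (0 to 1), the threshold values, and the action names are hypothetical assumptions, and a real deployment would configure them per platform.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical interventions; which ones a platform enables is a deployment choice."""
    NONE = "none"
    WARN_CHILD = "warn_child"            # warn the child via the chatbot; groomer is not alerted
    ALERT_MODERATOR = "alert_moderator"  # escalate to a human moderator
    CLOSE_CHAT = "close_chat"            # shut down the chat entirely


# Assumed thresholds on a grooming-suspicion score in [0, 1] (illustrative values only).
DEFAULT_POLICY = [
    (0.95, Action.CLOSE_CHAT),
    (0.80, Action.ALERT_MODERATOR),
    (0.60, Action.WARN_CHILD),
]


def choose_action(score: float, policy=DEFAULT_POLICY) -> Action:
    """Return the action for the highest threshold that the score meets or exceeds."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    # Check thresholds from highest to lowest so the strongest applicable action wins.
    for threshold, action in sorted(policy, key=lambda p: p[0], reverse=True):
        if score >= threshold:
            return action
    return Action.NONE


print(choose_action(0.70))  # Action.WARN_CHILD
```

Passing a different `policy` list lets each platform choose its own interventions, for example one that only alerts moderators and never closes the chat.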

In 2020, Save the Children US collaborated with Omdena to address online violence. Of the various products generated during that sprint, the most promising was a classifier that uses Natural Language Processing (NLP) to identify online grooming, combined with a chatbot that can warn children that they may be chatting with a groomer. Since then, a team of three engineers associated with the original project has continued to refine the technology. The core team now wants to expand on this work to build an industry-ready solution that works at scale.

From the original challenge, the team has a large dataset of more than 800,000 lines taken from the Perverted Justice project, which ran from 2003 to 2019 and used online volunteers as decoys to entrap predators who sought to contact minors to obtain sexual images or videos from them or to meet them in person. During the original challenge and afterward, the team tagged much of the training data with labels such as male or female, predator or victim, and the level of risk of the conversation. However, the data still requires extensive processing; in particular, the team needs to improve and systematize the way it judges and annotates the level of risk. In addition to the data it already has, the team is actively trying to obtain additional databases of online grooming chats from a variety of sources, such as law enforcement agencies.
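One common way to make risk annotation more systematic is to have two annotators label the same sentences independently and measure how much they agree beyond chance. The sketch below computes Cohen's kappa for two annotators over the three risk levels; it is an illustrative assumption rather than the team's documented process, and the example labels are invented.

```python
from collections import Counter


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and equal in length")
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's label frequencies, summed over labels.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[label] * cb[label] for label in ca.keys() | cb.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout; treat as full agreement
    return (observed - expected) / (1 - expected)


# Invented example labels, NOT project data.
annotator_a = ["non-risky", "risky", "potentially risky", "risky", "non-risky"]
annotator_b = ["non-risky", "risky", "risky", "risky", "non-risky"]
print(round(cohens_kappa(annotator_a, annotator_b), 3))  # 0.667
```

Values near 1 indicate strong agreement; persistently low values for a given risk level would suggest its annotation guidelines need tightening before more data is labeled.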

The project milestones

Build on the existing data by annotating additional sentences, with a target of 100,000 sentences annotated with risk levels (non-risky, potentially risky, or risky). If law enforcement makes new data available, annotate that data as well.
Look for and scrape user data from online resources.
Create a language model (classification) that detects grooming behavior by labeling text as non-risky, potentially risky, or risky.
Evaluate various models on the data and provide ablation studies.
Deploy the system as an API.
Package the API as a stand-alone Chrome extension that predicts labels in real time as the user chats (Grammarly's in-browser checking is the closest analogy for the final deliverable).
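As a reference point for the classification milestone, a bag-of-words multinomial Naive Bayes model is one simple baseline for assigning the three risk labels. The sketch below is pure Python and hypothetical: the team's actual model is not specified here, the `NaiveBayesRisk` class and whitespace tokenizer are illustrative assumptions, and the training sentences are invented placeholders rather than project data.

```python
import math
from collections import Counter, defaultdict

# Label set taken from the milestone list.
LABELS = ("non-risky", "potentially risky", "risky")


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer (a simplification; real chat text needs more care)."""
    return text.lower().split()


class NaiveBayesRisk:
    """Hypothetical baseline: multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences: list[str], labels: list[str]) -> "NaiveBayesRisk":
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, sentence: str) -> str:
        n_sentences = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus the smoothed log likelihood of each known token.
            score = math.log(count / n_sentences)
            for tok in tokenize(sentence):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                freq = self.word_counts[label][tok]
                score += math.log((freq + 1) / (self.total_words[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented placeholder sentences, NOT project data.
train_sentences = [
    "what homework do you have today",
    "which game are you playing",
    "are you home alone tonight",
    "do your parents check your phone",
    "keep our chats secret from everyone",
    "do not tell anyone we talk",
]
train_labels = ["non-risky", "non-risky", "potentially risky",
                "potentially risky", "risky", "risky"]

model = NaiveBayesRisk().fit(train_sentences, train_labels)
print(model.predict("please keep this secret do not tell"))  # risky
```

A baseline like this mainly serves as a yardstick in ablation studies, against which the more capable language models evaluated in the milestones can be compared.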

Why join? The uniqueness of Omdena AI Challenges

A collaborative experience unlike any you have had in your working life! For the next eight weeks, you will not only build AI solutions that make a real-world impact but also go through an entire data science project lifecycle, from problem scoping through data collection and preparation to modeling and deployment.

And the best part is that you will join a global and collaborative team of changemakers. Omdena AI Challenges are not a competition or hackathon but a real-world project that will take your experience of what is possible through collaboration to a new level.


First Omdena Project?

Join the Omdena community to make a real-world impact and develop your career

Build a global network and get mentoring support

Earn money through paid gigs and access many more opportunities

Your Benefits

Address a significant real-world problem with your skills

Get hired at top companies by building your Omdena project portfolio (via certificates, references, etc.)

Access paid projects, speaking gigs, and writing opportunities

Requirements

Good English

A very good grasp of computer science and/or mathematics

Student, (aspiring) data scientist, (senior) ML engineer, data engineer, or domain expert (no need for AI expertise)

Programming experience with Python

Understanding of Machine Learning and NLP

This challenge is hosted with our friends at

Application Form

Application Closed.