Violence Detection Between Children and Caregivers Using Computer Vision
A team of 50 Omdena AI changemakers collaborated with Israel-based startup EyeKnow AI to build a deep learning computer vision model for violence detection. The model can help detect, and in the future also prevent, violent behavior by caregivers toward children.
The problem
Child maltreatment presents a substantial public health concern. Estimates using Child Protective Service (CPS) reports from the National Child Abuse and Neglect Data System (NCANDS) suggest that 678,810 youth were subjected to maltreatment in 2012, with 18% of these experiencing physical abuse (U.S. Department of Health and Human Services et al., 2013). Additionally, a large proportion of cases are undetected by CPS, suggesting that more youth are likely subjected to abusive or neglectful behavior (Fallon et al., 2010). Most seriously, maltreatment was responsible for an estimated 1,640 youth fatalities in 2012 (U.S. Department of Health and Human Services et al., 2013).
The project outcomes
The data
The team assembled two datasets. The first is a caregiver-to-senior violence dataset of 500 clips sourced entirely from YouTube. The second comprises 500 clips of caregiver-to-child aggression and violence, drawn from YouTube and from unique footage obtained through EyeKnow's partners.
The machine learning models
The contributors to the challenge defined several approaches for building a model to detect violent or otherwise relevant interactions between the entities of interest (caregivers, the elderly, and children). The first step was to locate these entities in each frame, which the team accomplished using object detection.
The team applied frame-level entity annotation to label the caregivers, children, and elderly. After this step, the collaborators trained an object detection model and implemented an ML pipeline. This pipeline ingests video recordings from CCTV or other sources and outputs frame-level counts and types of detected entities. In addition, the pipeline includes a bounding-box overlap analysis that flags frames potentially containing high-intensity (potentially violent) interaction.
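The overlap analysis can be illustrated with a short sketch. The box format, the intersection-over-union (IoU) metric, and the flagging threshold below are assumptions for illustration; the project's actual overlap criterion may differ.

```python
# Sketch of a bounding-box overlap check for flagging frames with
# potentially high-intensity interaction between detected entities.
# Boxes are assumed to be (x1, y1, x2, y2) tuples; the 0.15 IoU
# threshold is a hypothetical value, not the project's setting.

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def flag_frame(boxes, threshold=0.15):
    """Flag a frame when any pair of entity boxes overlaps strongly."""
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if iou(boxes[i], boxes[j]) >= threshold:
                return True
    return False
```

A frame whose caregiver and child boxes overlap heavily would be flagged for closer inspection by the downstream classifier, while frames with well-separated entities pass through unflagged.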
Next to this pipeline, the team applied video classification modeling utilizing deep neural networks. This approach combined pre-trained models for feature extraction with sequence modeling to capture temporal relationships.
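The two-stage structure described above can be sketched as follows. In the project, a pre-trained CNN would supply per-frame features; here a fixed random projection stands in for the backbone so the sketch stays self-contained, and a minimal vanilla RNN stands in for the sequence model. All dimensions are illustrative assumptions.

```python
import numpy as np

# Sketch of a two-stage video classifier: a per-frame feature extractor
# (a pre-trained CNN in practice; a random projection here) followed by
# a simple recurrent layer capturing temporal relationships.
# Frame size (32x32x3) and layer widths are arbitrary assumptions.

rng = np.random.default_rng(0)

FEAT_DIM, HIDDEN_DIM = 128, 32
W_backbone = rng.normal(size=(32 * 32 * 3, FEAT_DIM)) * 0.01  # CNN stand-in
W_in = rng.normal(size=(FEAT_DIM, HIDDEN_DIM)) * 0.1
W_rec = rng.normal(size=(HIDDEN_DIM, HIDDEN_DIM)) * 0.1
w_out = rng.normal(size=HIDDEN_DIM) * 0.1

def extract_features(frame):
    """Per-frame features; a pre-trained CNN would be used in practice."""
    return np.tanh(frame.reshape(-1) @ W_backbone)

def classify_clip(frames):
    """Run a vanilla RNN over frame features; return P(violent)."""
    h = np.zeros(HIDDEN_DIM)
    for frame in frames:
        h = np.tanh(extract_features(frame) @ W_in + h @ W_rec)
    logit = float(h @ w_out)
    return 1.0 / (1.0 + np.exp(-logit))
```

Feeding a clip as a list of frames yields a probability between 0 and 1; in a real implementation the backbone and recurrent weights would be learned, and an LSTM or GRU would typically replace the vanilla RNN.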
All the developed models and approaches run in a single Python application. The application is highly modular and serves multiple purposes: by modifying a configuration file (a parameters JSON file), the user can train the component models or run inference to process video files.