Violence Detection Between Children and Caregivers Using Computer Vision

Challenge Completed!

A team of 50 Omdena AI changemakers collaborated with Israel-based startup EyeKnow AI to build a deep learning computer vision model for violence detection. The model can help not only detect but, in the future, also prevent violent behaviour by caregivers toward children.

 

The problem

Child maltreatment presents a substantial public health concern. Estimates based on Child Protective Service (CPS) reports from the National Child Abuse and Neglect Data System (NCANDS) suggest that 678,810 youth were subjected to maltreatment in 2012, with 18% of these experiencing physical abuse. Additionally, a large proportion of cases go undetected by CPS, suggesting that even more youth are subjected to abusive or neglectful behavior. Most seriously, maltreatment was responsible for an estimated 1,640 youth fatalities in 2012.

 

The project outcomes 

The data

The team assembled two datasets. The first is a caregiver-to-senior violence dataset of 500 clips sourced entirely from YouTube. The second comprises 500 clips of caregiver-to-child aggression/violence, drawn from YouTube clips and unique data obtained through partnerships with EyeKnow's partners.

 

The machine learning models

The contributors of the challenge defined several approaches to building a model that detects violent or otherwise relevant interactions between the entities of interest (caregivers, elderly people, children). The first step in this approach was to locate the entities, which the team did using object detection.

The team applied frame-level entity annotation to label the caregivers, children, and elderly, then trained an object detection model and implemented an ML pipeline around it. This pipeline ingests video recordings from CCTV or other sources and outputs the number and type of entities present in each frame. In addition, the pipeline includes bounding-box overlap analysis, which flags frames that potentially contain high-intensity (potentially violent) interaction.
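The overlap analysis described above can be sketched as an intersection-over-union (IoU) check between detected boxes. This is a minimal illustration, not EyeKnow's actual implementation; the box format, label names, and threshold are assumptions.

```python
# Minimal sketch of bounding-box overlap flagging (illustrative only;
# box format (x1, y1, x2, y2), labels, and threshold are assumptions).

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def flag_frame(detections, iou_threshold=0.3):
    """Flag a frame if any caregiver box overlaps a child/elderly box."""
    caregivers = [d["box"] for d in detections if d["label"] == "caregiver"]
    others = [d["box"] for d in detections if d["label"] in ("child", "elderly")]
    return any(iou(a, b) >= iou_threshold for a in caregivers for b in others)

detections = [{"label": "caregiver", "box": (10, 10, 50, 80)},
              {"label": "child", "box": (20, 15, 60, 85)}]
flag_frame(detections)  # True: the boxes overlap substantially
```

Flagged frames can then be passed to a heavier classifier, so the expensive model only runs where an interaction is plausible.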

Alongside this pipeline, the team applied video classification modeling using deep neural networks. This approach combined pre-trained models for per-frame feature extraction with sequence modeling to capture temporal relationships between frames.

All the developed models and approaches run in a single Python application. The application is highly modular and serves multiple purposes: by modifying a configuration file (a parameters JSON file), the user can train component models or run inference and process video files.
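A configuration-driven application of this kind typically dispatches on a mode field in the parameters file. The JSON keys and return strings below are hypothetical, chosen only to illustrate the pattern, not EyeKnow's actual schema.

```python
import json

# Hypothetical parameters file; the real key names are project-internal.
PARAMS = json.loads("""
{
  "mode": "inference",
  "video_path": "input/clip.mp4",
  "model": "overlap_pipeline"
}
""")

def run(params):
    """Dispatch on the configured mode, mirroring the modular design."""
    if params["mode"] == "train":
        return f"training {params['model']}"
    if params["mode"] == "inference":
        return f"running {params['model']} on {params['video_path']}"
    raise ValueError(f"unknown mode: {params['mode']}")

run(PARAMS)  # -> "running overlap_pipeline on input/clip.mp4"
```

Keeping all behaviour behind one configuration file means the same codebase can be shipped for training and deployed for inference without code changes.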

 

This challenge has been hosted with our friends at