Increasing Drug Safety By Detecting Anomalies in Clinical Data Using Machine Learning
The goal of this Omdena AI Challenge is to build an online service that detects anomalies in clinical data. This helps democratize clinical data anomaly detection and make devices and drugs safer, more effective, and more accessible for patients and physicians. The challenge partner, Flaskdata, is an Israeli tech startup that specializes in helping life sciences companies complete their Phase 2 and Phase 3 clinical trials.
The world of real-world clinical data is exploding. The variety, volume, and velocity of this type of data are growing exponentially. Our clinical data is being generated all the time, by multiple sources at varying times. The result is an endless body of data, unsynchronized in place, time, and doctor visits. According to Flaskdata, this brings up the following questions:
Can we rely on this data to make decisions? Can we quantify the risk? Is there a more relevant question for a $1.5 trillion/year life science industry, in a world where over a million people have died from SARS-CoV-2, where the FDA is tightening guidance on safety follow-up for vaccine development, and where a major global pharma company discovered, just before FDA submission, that 20% of the subjects in its coronavirus vaccine trial were mis-dosed?
Regulatory agencies, the global life science industry, and the big tech players all understand the immense value of our real-world clinical data. Amazon, Google, Apple, and Microsoft are intensely engaged in healthcare data processing and delivery.
Can we rely on tech companies to use our clinical data for decisions that affect our lives without independent validation?
The team built an API to detect anomalies in structured clinical data (not free text or images) for two use cases: clinical trials and connected devices.
The algorithm(s) behind the API should readily work on high-dimensional data, be model-free, and scale well.
Use case 1: Clinical trial data
Data from clinical trials are timestamped and typically have a large number of dimensions (300 to over 3,000) compared with textbook use cases of anomaly detection in online commerce or factory process control. The data are often sparse across these dimensions because of missing values or data model items that are not relevant for every patient.
There are a number of unique challenges with clinical data vs. textbook problems:
Physiological data is not necessarily a stationary process
Humans (patients and investigators) do unexpected things
Each combination of therapeutic and medical indication is unique, unlike the textbook anomaly detection (AD) cases of revenue or machine anomalies
At the beginning of a clinical trial, there is no training set
Small data sets (350 patients on average across Phase 1-4 trials; 50,000 in a large Phase 3 trial)
Issues of bias
Use AD to assess the reliability of the data and the efficacy of the therapeutic, and to monitor patient safety.
We used an ensemble approach that combines the detection of outliers in multidimensional data points, the detection of drifts and spikes in time series, and therapeutic-user-specified rules. For example, in psychiatric studies, some patients may report suicidal tendencies on the QIDS form through a mobile app. Flagging such a report is an example of a therapeutic-user-specified rule.
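The ensemble idea can be illustrated with a minimal sketch. This is not the team's implementation: the detectors, the threshold of 3.5, the field names (`hr`, `sbp`, `qids_suicidal_ideation`), and the "flag if any detector fires" combination rule are all assumptions made for illustration. It uses a robust z-score (median and median absolute deviation) for multidimensional outliers and spikes, skips missing values, and treats a user-specified rule as a plain function. Drift detection is omitted for brevity.

```python
from statistics import median

def robust_z(values, x):
    """Robust z-score of x against values, using median and MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return 0.0 if x == med else float("inf")
    return 0.6745 * (x - med) / mad

def outlier_flag(cohort, record, threshold=3.5):
    """True if any observed field of record is a robust outlier vs. the cohort."""
    for field, x in record.items():
        if x is None:
            continue  # sparse data: skip missing values in the record
        observed = [r[field] for r in cohort if r.get(field) is not None]
        if len(observed) >= 3 and abs(robust_z(observed, x)) > threshold:
            return True
    return False

def spike_flag(series, threshold=3.5):
    """True if the latest reading is a spike relative to the earlier readings."""
    if len(series) < 4:
        return False
    return abs(robust_z(series[:-1], series[-1])) > threshold

def rule_flag(record, rules):
    """True if any user-specified rule fires for this record."""
    return any(rule(record) for rule in rules)

def ensemble(cohort, record, series, rules):
    """Combine the detectors; a record is anomalous if any detector fires."""
    flags = {
        "outlier": outlier_flag(cohort, record),
        "spike": spike_flag(series),
        "rule": rule_flag(record, rules),
    }
    flags["anomaly"] = any(flags.values())
    return flags

# Hypothetical rule: any non-zero score on a suicidal-ideation item is flagged.
rules = [lambda r: (r.get("qids_suicidal_ideation") or 0) >= 1]

cohort = [
    {"hr": 70, "sbp": 120}, {"hr": 72, "sbp": 118}, {"hr": 68, "sbp": None},
    {"hr": 71, "sbp": 122}, {"hr": 69, "sbp": 121},
]
result = ensemble(cohort, {"hr": 140, "sbp": 119}, [120, 121, 119, 120, 122, 160], rules)
print(result)
```

Keeping each detector as a separate function means a rule supplied by a clinician can fire even when the statistical detectors see nothing unusual, which is the point of including rules in the ensemble.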
Use case 2: Connected devices
Time-series data come from connected wearable devices, watches, and connected medical devices. Connected wearables may be standalone devices used by consumers or devices used in clinical trials to monitor efficacy, compliance, and patient safety.
Very large data sets, a small number of parameters
Current work on fall and stress detection does not address the full potential for clinical data AD with devices.
Use AD to assess the reliability of the data.
Use AD to assess the suitability of patients to participate in a clinical trial (so-called ‘pre-screening’).
Use AD to monitor clinical trial participants for adverse events and safety issues (for example, consistently rising or dropping blood pressure may indicate a developing serious adverse event).
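As an illustration of the blood pressure example, a consistently rising or dropping signal can be detected with a least-squares slope over a recent window, combined with a check that most step-to-step changes point in the same direction. The sketch below is an assumption about how such a check might look, not the project's method. The function names, the window of 7 readings, and the thresholds are hypothetical and have no clinical validity.

```python
def slope(values):
    """Least-squares slope of values against their index (units per reading)."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den

def sustained_trend(values, window=7, min_slope=1.0, min_consistency=0.8):
    """Return 'rising', 'dropping', or None for the last `window` readings.

    A trend is reported only if the fitted slope is steep enough AND most
    consecutive steps move in the same direction (illustrative thresholds).
    """
    if window < 2 or len(values) < window:
        return None
    recent = values[-window:]
    s = slope(recent)
    steps = [b - a for a, b in zip(recent, recent[1:])]
    up = sum(d > 0 for d in steps) / len(steps)
    down = sum(d < 0 for d in steps) / len(steps)
    if s >= min_slope and up >= min_consistency:
        return "rising"
    if s <= -min_slope and down >= min_consistency:
        return "dropping"
    return None

# Hypothetical systolic blood pressure readings (mmHg), one per day.
rising = [120, 122, 123, 125, 128, 130, 133]
stable = [120, 121, 119, 120, 122, 120, 121]
print(sustained_trend(rising), sustained_trend(stable))
```

Requiring both a minimum slope and directional consistency keeps a single large jump from being reported as a sustained trend; single jumps are better handled by a spike detector.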
The project outcomes
A RESTful API service for automated detection of anomalies in clinical data. This will help democratize clinical data anomaly detection and make devices and drugs safer, more effective, and accessible for patients and physicians.
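To make the shape of such a service concrete, the sketch below shows a request handler for a hypothetical `POST /detect` endpoint. The endpoint name, the JSON fields (`values`, `anomalies`, `error`), the status codes, and the simple robust z-score detector are all assumptions for illustration. The actual API contract is not described in this text. The handler is written as a pure function so that it can be attached to any web framework.

```python
import json
from statistics import median

def detect(values, threshold=3.5):
    """Return indices of values whose robust z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]

def handle_detect(body):
    """Handle the body of a POST to a hypothetical /detect endpoint.

    Returns a (status_code, json_string) pair.
    """
    try:
        payload = json.loads(body)
        values = [float(v) for v in payload["values"]]
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected a JSON object with a numeric 'values' list"})
    if len(values) < 3:
        return 422, json.dumps({"error": "need at least 3 values"})
    return 200, json.dumps({"anomalies": detect(values)})

status, response = handle_detect('{"values": [70, 72, 68, 71, 69, 140]}')
print(status, response)
```

Separating request parsing from detection lets the detector be swapped, for example for an ensemble, without changing the request and response format that clients depend on.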