
A Faster Way to Annotate Transcript Data in PTSD Therapy Sessions

May 11, 2020 | 4 min read | Updated May 30, 2024

Omdena


The Problem

This project was carried out with Christoph von Toggenburg, CEO of World Vision Switzerland, who developed Post-Traumatic Stress Disorder (PTSD) after an armed ambush in Africa. PTSD can develop when someone experiences a severe traumatic event and, instead of the trauma leveling off, it becomes a lasting mental health condition.

Symptoms include panic attacks, anxiety, uncontrollable thoughts, and more, which can be triggered whenever the person is reminded of the event.

“The difference between trauma and PTSD is that switch in your brain, and it becomes a part of your life. It is something you cannot reverse, but you can deal with the symptoms, and if treated properly, you can get much better” — Christoph

Christoph started BEATrauma, an initiative to help people with PTSD get therapy all around the world. His vision is a mobile app with a Risk Assessment chatbot that converses with users and assesses their risk of PTSD, based on Cognitive Behavioral Therapy (CBT) and powered by machine learning. That's where we come in!

The Data Problems — Not Annotated, Not Enough

Data is not always easy to find, especially when dealing with sensitive user information like therapy sessions. Through our community network, we were able to obtain around 1,700 therapy session transcripts, but only about 50 of them were PTSD sessions.

The Solution

From a traditional treatment standpoint, we found that CBT (Cognitive Behavioral Therapy) was the best-suited approach for PTSD therapy through a Risk Assessment Chatbot. In CBT, a therapist talks with the patient about their experiences and gradually “exposes” them to those experiences until they finally become comfortable with them. Knowing that we could implement this as an NLP conversational agent, we set our sights on the training data for the Risk Assessment Chatbot.

We split into two groups. One was in charge of risk assessment: a rule-based algorithm built in Rasa with sentiment analysis to converse with the user, backed by a classification model trained on the transcript data to determine whether the user had PTSD. The other focused on CBT, training a seq-to-seq chatbot for therapy!
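The split between a rule-based conversation layer and a separate classifier can be pictured with a small, framework-free sketch. It does not use Rasa or its API: the word lists, the -0.3/0.3 thresholds, and the follow-up questions are all hypothetical, and the toy lexicon only stands in for a real sentiment model. Each user message gets a sentiment score, a simple rule picks the next question, and the messages are collected so they could later be handed to a backend classifier.

```python
from typing import List

# HYPOTHETICAL toy sentiment lexicon; a real system would use a trained model.
NEGATIVE_WORDS = {"afraid", "scared", "panic", "nightmare", "nightmares", "alone", "hopeless"}
POSITIVE_WORDS = {"better", "calm", "good", "safe", "fine", "happy"}


def sentiment(text: str) -> float:
    """Score in [-1, 1]: (positive hits - negative hits) / total hits, 0 if no hits."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def next_prompt(user_message: str) -> str:
    """HYPOTHETICAL rules: choose the chatbot's next question from the sentiment."""
    score = sentiment(user_message)
    if score < -0.3:
        return "That sounds really hard. How often do these feelings come up?"
    if score > 0.3:
        return "I'm glad to hear that. What has been helping you lately?"
    return "Could you tell me a bit more about how you've been feeling?"


# Messages are kept so they could later be passed to a separate PTSD classifier.
transcript: List[str] = []
for message in ["I keep having nightmares and I feel scared.", "Today I feel calm."]:
    transcript.append(message)
    print(next_prompt(message))
```

Keeping the rules and the classifier separate means the conversation flow stays predictable and auditable, while the learned model only has to judge the collected conversation as a whole.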

This article describes the data annotation part. Since the transcripts came completely unlabelled, we had to give each one a score between 0 and 1 so that the model could learn which patients had PTSD and which didn't. One of our project collaborators, who had experience in statistics and psychology, guided our team of seven through reading and scoring the transcripts!

The Annotation Process

Understand each of the 6 criteria for PTSD, e.g., exposure to actual or threatened death, serious injury, or sexual violence; persistent avoidance of stimuli associated with the traumatic event(s); and more!
Keeping the criteria in mind, read an entire transcript (which can take 45 minutes to 1 hour).
Score each of the 6 criteria as 0, 0.5, or 1, where 0 means the symptom is not displayed at all, 0.5 means it is somewhat displayed, and 1 means it is clearly expressed.
Follow a formula that combines all 6 scores into a single number between 0 and 1: the PTSD risk assessment.
Rinse and repeat for the other 49 transcripts.
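The article does not spell out the formula used to combine the six criterion scores, so the sketch below is only an illustration under an assumption: it uses a (weighted) mean, which always lands between 0 and 1 when every input is 0, 0.5, or 1 and the weights are positive. The criterion labels A-F and the sample scores are hypothetical.

```python
from typing import Dict, Optional

# Per-criterion scores allowed by the annotation guidelines.
ALLOWED_SCORES = {0.0, 0.5, 1.0}


def ptsd_risk_score(scores: Dict[str, float],
                    weights: Optional[Dict[str, float]] = None) -> float:
    """Combine six criterion scores into one risk value in [0, 1].

    ASSUMPTION: a weighted mean stands in for the team's actual formula,
    which is not given in the article.
    """
    if len(scores) != 6:
        raise ValueError("expected scores for exactly 6 criteria")
    for criterion, value in scores.items():
        if value not in ALLOWED_SCORES:
            raise ValueError(f"{criterion}: score must be 0, 0.5, or 1")
    # Equal weights unless the caller supplies its own (hypothetical) positive weights.
    weights = weights or {criterion: 1.0 for criterion in scores}
    total_weight = sum(weights[c] for c in scores)
    return sum(scores[c] * weights[c] for c in scores) / total_weight


# HYPOTHETICAL annotation of one transcript, criteria labelled A-F for illustration.
example = {"A": 1.0, "B": 0.5, "C": 1.0, "D": 0.0, "E": 0.5, "F": 0.0}
print(ptsd_risk_score(example))
```

With equal weights, the example transcript sums to 3.0 across six criteria and gets a risk value of 0.5. Validating the inputs up front catches typos such as 0.3 before they silently skew a label.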

Figure: Criterion A’s description

We faced two problems in our annotation process. First, annotating all the data took far too long: between complications and other commitments, it took around two weeks and tons of hard work to finish. Second, the transcripts were often unclear and difficult to understand.

We brainstormed several solutions to the annotation problem:

Determine a bag of words and their embeddings for each criterion, then run LDA (Latent Dirichlet Allocation) on top of them to classify each criterion and fully automate the process
Use USE (Universal Sentence Encoder) to compute the cosine similarity between sentences and match sentences that belong to the same criterion
Use GPT-2 to summarize each transcript and capture its main idea, speeding up the annotations
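The cosine-similarity idea can be sketched in plain Python. This assumes the sentence and criterion embeddings have already been computed (in practice they might come from the Universal Sentence Encoder); here they are tiny hypothetical 3-dimensional vectors so the example runs on its own. Representing each criterion by a single reference embedding and using a 0.5 threshold are also assumptions for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_to_criteria(sentence_vecs: Dict[str, Sequence[float]],
                      criterion_vecs: Dict[str, Sequence[float]],
                      threshold: float = 0.5) -> Dict[str, Tuple[str, float]]:
    """Assign each sentence to its most similar criterion, if similar enough.

    ASSUMPTION: one reference embedding per criterion; threshold is arbitrary.
    """
    matches = {}
    for sentence, s_vec in sentence_vecs.items():
        best_criterion, best_score = max(
            ((c, cosine_similarity(s_vec, c_vec)) for c, c_vec in criterion_vecs.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            matches[sentence] = (best_criterion, best_score)
    return matches


# HYPOTHETICAL toy embeddings standing in for real sentence-encoder output.
criterion_vecs = {"exposure": [1.0, 0.0, 0.0], "avoidance": [0.0, 1.0, 0.0]}
sentence_vecs = {
    "I was ambushed on the road.": [0.9, 0.1, 0.0],
    "I never drive that route now.": [0.1, 0.8, 0.2],
    "The weather was nice.": [0.0, 0.0, 1.0],
}

matches = match_to_criteria(sentence_vecs, criterion_vecs)
for sentence, (criterion, score) in matches.items():
    print(f"{criterion:<10} {score:.3f}  {sentence}")
```

The first two sentences are matched to exposure and avoidance respectively, while the weather sentence falls below the threshold and is left unmatched, so an annotator would only need to review the sentences flagged for each criterion.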

Creating the Risk Assessment Chatbot

From there, we had to create a classification model that takes in user conversations and determines whether the user has PTSD. Another task group had a breakthrough with ULMFiT’s transfer learning technique, reaching 80% accuracy. That is a very good start, and the group is currently improving it further with data augmentation methods.

Ready to run the advanced models soon!
