Analyzing Mental Health and Youth Sentiment Through NLP and Social Media
September 5, 2020
We are living in an era where life passes so quickly that mental illness has become a pivotal issue, and perhaps a bridge to some other diseases.
As Ferris Bueller once said:
“Life moves pretty fast. If you don’t stop and look around once in a while, you could miss it.”
This fear of missing out has caused people of all ages to suffer from mental health issues like anxiety, depression, and even suicidal ideation. Contemporary psychology tells us that this is expected, simply because we live on an emotional roller coaster every day.
The way our society functions in the modern day can present us with a range of external contributing factors that impact our mental health — often beyond our control. The message here is not that the odds are hopelessly stacked against us, but that our vulnerability to anxiety and depression is not our fault. — Students Against Depression
According to the WHO, good mental health is “a state of well-being in which every individual realizes his or her own potential, can cope with the normal stresses of life, can work productively and fruitfully, and is able to make a contribution to her or his community”. WordNet Search, meanwhile, defines it as “the psychological state of someone who is functioning at a satisfactory level of emotional and behavioral adjustment”. This is far from a perfect definition, but it hints at which indicator to look for, e.g. “emotional and behavioral adjustment”.
It is estimated that this year (2020) around 1 in 4 people will experience mental health problems. Low-income countries in particular have an estimated treatment gap of 85%, whereas high-income countries have a treatment gap of 35% to 50%.
Every single day, tons of information is thrown into the wormhole that is the internet. Millions of young people absorb this information and see the world through the glass of online events and others’ opinions. Social media is a playground for all this information and has a deep impact on the way our youth interacts. Whether by contributing to a movement on Twitter or Facebook (#BlackLivesMatter), staying up to date with the latest news and discussions on Reddit (#COVID19), or engaging in campaigns simply for the greater good, the digital world is where the magic happens and makes worldwide interactions possible. The digital eco-not so friendly-system plays a crucial role and represents an excellent opportunity for analysts to understand what today’s youth think about their future tomorrow.
Take a look at the article by Fondation Botnar on young people’s aspirations.
The power of sentiment analysis
Sentiment analysis, a.k.a. opinion mining or emotion artificial intelligence (AI), uses text analysis and NLP to identify affective patterns present in data. A natural question, then, is: how do the polarities change over time?
Violin plots
Considering a data set scraped from Reddit and Twitter from 2016–2020, these “dynamic” polarity distributions could be expressed using violin plots.
On one hand, we see that as the years go by polarity tends to become more and more neutral. On the other hand, it’s difficult to understand which sentiment falls into which category, and what the model categorizes as positive versus negative sentiment for each year. Also, text sentiment analysis is subjective and does not really spot complex expressions like sarcasm.
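To make the "more and more neutral" observation concrete, one could compute the share of negative, neutral, and positive posts per year. The article does not say which sentiment model or thresholds were used, so this is only a minimal sketch: it assumes polarity scores in [-1, 1], and the neutral band of ±0.05 and the sample scores are made up for illustration.

```python
from collections import Counter, defaultdict

# Assumption: scores within [-0.05, 0.05] are treated as neutral.
# This band is hypothetical; the article does not state one.
NEUTRAL_BAND = 0.05


def bucket(polarity, band=NEUTRAL_BAND):
    """Map a polarity score in [-1, 1] to a coarse sentiment class."""
    if polarity > band:
        return "positive"
    if polarity < -band:
        return "negative"
    return "neutral"


def class_shares_by_year(records):
    """records: iterable of (year, polarity) -> {year: {class: share}}."""
    counts = defaultdict(Counter)
    for year, polarity in records:
        counts[year][bucket(polarity)] += 1
    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {
            name: counter[name] / total
            for name in ("negative", "neutral", "positive")
        }
    return shares


# Made-up scores, for illustration only.
sample = [
    (2016, 0.6), (2016, -0.4), (2016, 0.02), (2016, 0.3),
    (2020, 0.01), (2020, -0.03), (2020, 0.2), (2020, 0.0),
]
shares = class_shares_by_year(sample)
for year in sorted(shares):
    print(year, shares[year])
```

A growing neutral share from year to year would match what the violin plots suggest, although the result depends heavily on where the neutral band is drawn.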
Violin plots according to label
The next attempt was to look at polarities according to labels: anxiety, depression, self-harm, suicide, exasperation, loneliness, and bullying.
Even if we look at the polarities by label, we might end up with surface-level results instead of crisp insights. Take Self-harm: what would positive self-harm even mean? Yet it’s still there in the green plot.
We see that most of the polarities are distributed close to the limits of the neutral region, which is ambiguous since it can be viewed as either a lack of positiveness or a lack of accurate sentiment categorization. The question is — how do we gain better insights?
One option is to plot the mean (average) sentiment per year per label.
Notice that Depression was the only label whose mean sentiment decreased two years in a row, passing from positive (2017–2019) to neutral in 2020. Moreover, the Loneliness and Bullying classes are shown with only one mark each, because they appear only in the data scraped from January to June 2020.
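The per-year, per-label averages described above could be computed along these lines. This is a sketch under assumptions: each record is taken to be a (year, label, polarity) tuple, and the numbers are invented purely to show the grouping.

```python
from collections import defaultdict
from statistics import mean


def mean_sentiment(records):
    """records: iterable of (year, label, polarity).

    Returns {(label, year): mean polarity} for every group present.
    """
    groups = defaultdict(list)
    for year, label, polarity in records:
        groups[(label, year)].append(polarity)
    return {key: mean(values) for key, values in groups.items()}


# Invented values, for illustration only.
records = [
    (2018, "depression", 0.5), (2018, "depression", 0.25),
    (2019, "depression", 0.25), (2019, "depression", 0.0),
    (2020, "depression", 0.0),
    (2020, "loneliness", -0.25), (2020, "loneliness", -0.5),
]
means = mean_sentiment(records)
for (label, year), value in sorted(means.items()):
    print(label, year, round(value, 3))
```

A label that only appears in one year, like loneliness here, yields a single point, which is why such classes show up with just one mark in the plot.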
Depression-label word cloud
Before pressing on, let’s just take a look at the Depression-label word cloud. Here we can detect a lot of “emotions” besides the huge “depression” in green, e.g. “low”, “hopeless”, “financial”, “relationship”.
These are just the most frequent words associated with posts labeled as Depression, and they do not necessarily convey the feelings behind them. However, there is a huge “feel” there… Why? Simply because it is one of the most common words; in fact, it is the 6th most common word in the whole data set. In a more in-depth analysis aiming to find interconnections among topics, “feel” would certainly be used as one of the most prominent edges.
This knowledge graph shows all the nodes where “feel” is used as the edge connector. Very insightful, but not very legible.
In fact, there’s a much better approach that performs text analysis across lexical categories. So now the question is: “What sort of other feelings related to mental health issues should we be looking for?”
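The word frequencies that drive a word cloud, and the rank of a word such as “feel”, could be computed roughly as follows. This is a sketch, not the pipeline used for the figures: the tokenizer is a simple regular expression, no stop words are removed, and the posts are invented.

```python
import re
from collections import Counter


def word_counts(posts):
    """Count lowercase word occurrences across all posts."""
    counts = Counter()
    for post in posts:
        counts.update(re.findall(r"[a-z']+", post.lower()))
    return counts


def rank_of(word, counts):
    """1-based frequency rank of a word, or None if it never appears."""
    for position, (candidate, _) in enumerate(counts.most_common(), start=1):
        if candidate == word:
            return position
    return None


# Invented posts, for illustration only.
posts = [
    "I feel low and hopeless",
    "I feel depression creeping in",
    "feel so low",
]
counts = word_counts(posts)
print(counts.most_common(3))
print(rank_of("feel", counts))
```

In practice one would filter out stop words first, otherwise words like “I” and “and” crowd the top of the ranking.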
Empath analysis
The main objective of Empath analysis is to connect the text to a wide range of sentiments beyond just negative, neutral, and positive polarities. Now we’re able to go far beyond trying to detect subjective feelings. For example, look at the second and third lexicons, “sadness” and “suffering”. Empath uses similarity comparisons to map the vocabulary of the text (our data set is composed of Reddit and Twitter posts) across Empath’s 200 categories.
The Empath value for a category is calculated by counting how many times words from that category appear, and is normalized according to the total text emotions spotted and the total number of words analyzed. Now we’re able to go much deeper and truly connect the sentiment present in the text to some real emotion, rather than just counting the most frequent words and assuming whether they relate to something good or bad.
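The counting-and-normalizing idea can be sketched in a few lines. This is not the Empath library itself: the mini-lexicon below is hypothetical, and dividing by the number of words analyzed is only one reading of the normalization described above.

```python
import re

# Hypothetical mini-lexicon for illustration. Real Empath categories are
# much larger and are derived from word-similarity comparisons.
LEXICON = {
    "sadness": {"sad", "cry", "lonely", "hopeless"},
    "nervousness": {"nervous", "anxious", "worried"},
}


def category_scores(text, lexicon=LEXICON):
    """Count category hits and divide by the number of words analyzed."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {category: 0.0 for category in lexicon}
    return {
        category: sum(word in terms for word in words) / len(words)
        for category, terms in lexicon.items()
    }


scores = category_scores(
    "I feel so sad and lonely, and I am worried about tomorrow"
)
for category, value in scores.items():
    print(category, round(value, 3))
```

With scores like these per post, averaging by year gives the kind of per-emotion tracking discussed next.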
We chose five lexicons that might be more deeply associated with mental health issues, shown in the left plot: “nervousness”, “suffering”, “shame”, “sadness” and “hate”. We tracked these five emotions for each year analyzed. And guess what? Sadness skyrocketed in 2020.
Sentiment analysis in the post-COVID world
The year 2020 turned our lives upside down. From now on we will most likely have to rethink the way we eat, travel, have fun, work, connect… In short, we will have to rethink our entire lives.
There’s absolutely no question that the COVID-19 pandemic plays an essential role in mental health analysis. To take these impacts into account, and since COVID-19 began to spread worldwide in January, we selected all the data from January to June 2020 to perform the analysis. Take a look at the Word Cloud related to the COVID-19 analysis from May and June.
We can see words like help, anxiety, loneliness, health, depression, isolation. In this case, we can consider that it reflects the emotional state of people on social media. As said earlier, sentiment analysis based on polarity tracking isn’t that insightful, but we display the violin plots below for comparison.
Now we see a very different pattern from the previous one, and why is that? Well, now we’re filtering by the COVID-19 keywords and indeed the sentiment distribution now seems to make sense. Looking more closely at the distribution of the data, the following is observed.
Of the sample of texts from 2020, only 2.59% contain words related to COVID-19. The words we used are “corona”, “virus”, “COVID”, “covid19”, “COVID-19” and “coronavirus”. Furthermore, the number of texts decreases as the number of related words found in them increases, with these words most commonly appearing at most three times in the same text.
So far, we have presented the distribution of sentiments for specific words related to COVID-19. Nonetheless, questions about how these words relate to the general sentiment during the time period under analysis haven’t been answered yet.
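The keyword filter described above might look something like this. The keyword list comes from the article; the matching rules (case-insensitive, whole words, longer keywords tried first so that “coronavirus” is not counted again as “corona” and “virus”) are assumptions, and the sample texts are invented.

```python
import re
from collections import Counter

# Keywords from the article, ordered longest first so the regex prefers
# "coronavirus" over "corona" and "covid-19" over "covid".
KEYWORDS = ["coronavirus", "covid-19", "covid19", "covid", "corona", "virus"]
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def keyword_hits(text):
    """Number of COVID-19-related keywords found in one text."""
    return len(PATTERN.findall(text))


def covid_summary(texts):
    """Return (share of texts with a keyword, Counter of hits per text)."""
    hits = [keyword_hits(text) for text in texts]
    related = [h for h in hits if h > 0]
    share = len(related) / len(texts) if texts else 0.0
    return share, Counter(related)


# Invented texts, for illustration only.
texts = [
    "Lockdown again because of COVID-19, the virus is everywhere",
    "Feeling anxious about exams",
    "coronavirus news all day",
    "I miss my friends",
]
share, distribution = covid_summary(texts)
print(f"{share:.2%} of texts mention COVID-19")
print(sorted(distribution.items()))
```

The `distribution` counter maps "number of keywords in a text" to "number of such texts", which is the quantity whose frequency drops as the keyword count grows.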
The general sentiment has been deteriorating, i.e. becoming more negative, since the beginning of 2020. In particular, June is the month with the most negative sentiment, which coincides with the month with the most COVID-19 cases in the period considered, with a total number of 241 million cases. Comparing texts that contain COVID-19-related words with texts that contain none, the former show more negative sentiment in general.
Exasperation remains stable, with February standing out for its negativity compared to the other months. Likewise, self-harm is quite stable; the months that stand out for their negativity in this category are March and June. Contrary to self-harm, March isn’t a negative month for suicide. However, the remaining months between February and June are notably negative, and their sentiment worsens over time. June stands out for having both very positive and very negative sentiments (high polarities), which doesn’t happen in the other months. It has to be verified, but it could be that the number of suicides has been increasing in recent months. Regarding anxiety, a downward trend in sentiment is also observed between February and May. Finally, loneliness deserves attention, given the highly negative sentiment in May and June. Since Bullying only has data for June 2020, this label isn’t analyzed.
The next figure presents the time series corresponding to the sentiment between 2019/05 and 2020/06. A slight downward trend can be observed. This means that the general sentiment has become more negative. Additionally, there are days that present greater negativity, indicated by the troughs. Most of the troughs in the present year are found from April onward.
There are other major incidents, besides COVID-19, that have influenced the youth to call for help and to speak up in 2020. The recent murder of George Floyd was the turning point and sparked the #BlackLivesMatter movement. Have a look at the word cloud on the left, which shows the most frequent and insightful words.
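A downward trend and its troughs can be checked numerically rather than by eye. The sketch below assumes a list of daily mean sentiment values in chronological order; the least-squares slope and the "lower than both neighbours" definition of a trough are illustrative choices, and the values are made up.

```python
def trend_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def troughs(values):
    """Indices of local minima (strictly lower than both neighbours)."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] < values[i - 1] and values[i] < values[i + 1]
    ]


# Made-up daily sentiment values, for illustration only.
daily = [0.20, 0.18, 0.05, 0.17, 0.15, 0.02, 0.12, 0.10]
slope = trend_slope(daily)
low_points = troughs(daily)
print("slope per day:", round(slope, 4))
print("trough indices:", low_points)
```

A negative slope corresponds to sentiment becoming more negative over time, and the trough indices point to the individual days with unusually low sentiment.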
The youth gathered to protest against racism and call for equality and freedom worldwide. The Empath values related to Racism and Mental Health are displayed below.
The COVID-19 pandemic has pushed the world towards a global economic crisis: massive unemployment, lack of food, lack of medicines. Perhaps the big question is: “How will the pandemic affect the younger generations and the generations to come?” Unfortunately, there’s no answer to this question, except that the economic crisis we’re presently living through is definitely going to affect the younger generation, because they’re the ones who will study, go to college, and look for a job in the near future. The big picture tells us that unemployment is increasing on a daily basis and there are not enough resources for all of us. The Word Cloud in the opening of the article reflects some of the most frequent words related to the current economic crisis.
—
This article is written by Mateus Broilo and Andrea Posada Cardenas.