Analyzing Mental Health and Youth Sentiment Through NLP and Social Media

By Mateus Broilo and Andrea Posada Cardenas

 

We are living in an era where life passes so quickly that mental illness has become a pivotal issue, and perhaps a bridge to some other diseases.

As Ferris Bueller once said:

“Life moves pretty fast. If you don’t stop and look around once in awhile, you could miss it.”

This fear of missing out has left people of all ages suffering from mental health issues such as anxiety, depression, and even suicidal ideation. Contemporary psychology tells us that this is to be expected, simply because we live on an emotional roller coaster every day.

The way our society functions in the modern day can present us with a range of external contributing factors that impact our mental health — often beyond our control. The message here is not that the odds are hopelessly stacked against us, but that our vulnerability to anxiety and depression is not our fault. — Students Against Depression

According to the WHO, good mental health is “a state of well-being in which every individual realizes his or her own potential, can cope with the normal stresses of life, can work productively and fruitfully, and is able to make a contribution to her or his community.” WordNet Search, meanwhile, defines it as “the psychological state of someone who is functioning at a satisfactory level of emotional and behavioral adjustment”. This is far from a perfect definition, but it hints at which indicator to look for, namely “emotional and behavioral adjustment”.

It is estimated that this year (2020) around 1 in 4 people will experience mental health problems. Low-income countries in particular have an estimated treatment gap of 85%, whereas high-income countries have a treatment gap of 35% to 50%.

Every single day, tons of information is thrown into the wormhole that is the internet. Millions of young people absorb this information and see the world through the lens of online events and others’ opinions. Social media is a playground for all this information and has a deep impact on the way young people interact. Whether by contributing to a movement on Twitter or Facebook (#BlackLivesMatter), staying up to date with the latest news and discussions on Reddit (#COVID19), or engaging in campaigns simply for the greater good, the digital world is where the magic happens and makes worldwide interactions possible. This digital, not-so-friendly ecosystem plays a crucial role and represents an excellent opportunity for analysts to understand what today’s youth think about their future.

Take a look at the article written by Fondation Botnar on young people’s aspirations.

 

The power of sentiment analysis

Sentiment analysis, a.k.a. opinion mining or emotional artificial intelligence (AI), uses text analysis and NLP to identify affective patterns present in data. A natural question, then, is: how do the polarities change over time?

 

Top Mental Health keywords from Reddit and Twitter

 

 

Violin plots

Considering a data set scraped from Reddit and Twitter from 2016–2020, these “dynamic” polarity distributions could be expressed using violin plots.

 

 

Sentiment Violin-Plot hued by Year

Sentiment violin plots by year. Positive values refer to positive sentiments, whereas negative values indicate negative sentiments. The thicker parts of a violin mark values that occur with higher frequency, and the thinner parts mark values with lower frequency.

 

 

 

On one hand, we see that polarity tends to become more and more neutral as the years go by. On the other hand, it’s difficult to tell which sentiment falls into which category, and what the model categorizes as positive versus negative sentiment for each year. Also, text sentiment analysis is subjective and does not really capture complex phenomena like sarcasm.

 

Violin plots according to label

So now, the next attempt was to see polarities according to labels — anxiety, depression, self-harm, suicide, exasperation, loneliness, and bullying.

 

 

Sentiment Violin Plots by label

 

 

Even if we view the polarities by label, we might end up with surface-level results instead of crisp insights. Look at Self-harm: what would positive self-harm even mean? Yet it’s still there in the green plot.

We see that most of the polarities are distributed close to the limits of the neutral region, which is ambiguous since it can be viewed as either a lack of positiveness or a lack of accurate sentiment categorization. The question is — how do we gain better insights?

One option is to plot the mean (average) sentiment per year for each label.
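As a rough illustration of that aggregation, the sketch below groups polarity scores by year and label and averages them. The records, the `mean_sentiment` helper, and the polarity values are all hypothetical; the original analysis does not state which tooling was used for this step.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (year, label, polarity), with polarity in [-1, 1].
posts = [
    (2019, "depression", 0.20),
    (2019, "depression", 0.10),
    (2020, "depression", -0.05),
    (2020, "depression", 0.05),
    (2020, "loneliness", -0.30),
]


def mean_sentiment(records):
    """Return {(year, label): mean polarity} for the given records."""
    groups = defaultdict(list)
    for year, label, polarity in records:
        groups[(year, label)].append(polarity)
    return {key: mean(values) for key, values in groups.items()}


means = mean_sentiment(posts)
for (year, label), value in sorted(means.items()):
    print(year, label, round(value, 3))
```

Each (year, label) pair then becomes one point in a line plot, which is why a label with data for a single year shows up as a single mark.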

 

 

Mean Sentiment per Year hued by label

 

 

Notice that Depression was the only label whose mean sentiment decreased twice in a row, passing from positive (2017–2019) to neutral in 2020. Moreover, the Loneliness and Bullying classes are each depicted with only one mark, because they appear only in the data scraped from January to June 2020.

 

Depression-label word cloud

Before pressing on, let’s just take a look at the Depression-label word cloud. Here we can detect a lot of “emotions” besides the huge “depression” in green, e.g. “low”, “hopeless”, “financial”, “relationship”.

 

Keywords relating to mental health

Source: Omdena

 

 

These are just the most frequent words associated with posts labeled as Depression, and they do not necessarily convey the feelings behind the scenes. However, there is a huge “feel” there… Why? This is because “feel” is one of the most common words overall: it is the sixth most common word in the whole data set. In a more in-depth analysis aiming to find interconnections among topics, “feel” would certainly be one of the most prominent edges.

 

 

“Feel” Knowledge Graph

 

 

This knowledge graph shows all the nodes where “feel” is used as the edge connector. It is very insightful but hard to read.

In fact, there’s a much better approach that performs text analysis across lexical categories. So now the question is: “What sort of other feelings related to mental health issues should we be looking for?”.

 

 

 

Empath analysis

The main objective of Empath analysis is to connect the text to a wide range of sentiments beyond negative, neutral, and positive polarities. This lets us go far beyond trying to detect subjective feelings. For example, look at the second and third lexicons, “sadness” and “suffering”. Empath uses similarity comparisons to map the vocabulary of the text (our data set is composed of Reddit and Twitter posts) across Empath’s 200 categories.

 

 
Empathy Value VS Lexicon

 

 

 

The Empath value of a category is calculated by counting how many times words from that category appear, and it is normalized according to the total text emotions spotted and the total number of words analyzed. This lets us go much deeper and connect the sentiment present in the text to a real emotion, rather than just counting the most frequent words and assuming whether they relate to something good or bad.
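To make the counting-and-normalizing idea concrete, here is a minimal sketch that does not use the Empath library itself. The two-category `LEXICON` and the `category_scores` helper are hypothetical stand-ins, and dividing by the total word count is only one plausible reading of the normalization described above.

```python
import re
from collections import Counter

# Toy lexicon (hypothetical): the real Empath lexicon has about 200
# categories, each with many more words than shown here.
LEXICON = {
    "sadness": {"sad", "cry", "lonely", "hopeless"},
    "nervousness": {"nervous", "anxious", "worried"},
}


def category_scores(text, lexicon=LEXICON):
    """Count words per category and divide by the total number of words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, vocabulary in lexicon.items():
            if word in vocabulary:
                counts[category] += 1
    total = len(words) or 1  # avoid division by zero on empty text
    return {category: counts[category] / total for category in lexicon}


scores = category_scores("I feel sad and hopeless, and a bit anxious today")
print(scores)
```

Normalizing by text length keeps long posts from dominating short ones, so scores can be compared across posts and averaged per year.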

 

 

Empathy value vs Year

Emotion trends hued by lexicon

 

 

We chose five lexicons that might be more deeply associated with mental health issues, shown in the left plot: “nervousness”, “suffering”, “shame”, “sadness” and “hate”. We tracked these five emotions for each year analyzed. And guess what? Sadness skyrocketed in 2020.

 

 

Sentiment analysis in the post-COVID world

The year 2020 turned our lives upside down. From now on we will most likely have to rethink the way we eat, travel, have fun, work, connect… In short, we will have to rethink our entire lives.

There’s absolutely no question that the COVID-19 pandemic plays an essential role in mental health analysis. To take its impact into account, and since COVID-19 began to spread worldwide in January, we selected all the data from January to June 2020 for the analysis. Take a look at the word cloud for the COVID-19 analysis from May and June.

 

 

 

 

 

Covid 19 Top Keyword Analysis of Mental Health

 

 

 

We can see words like help, anxiety, loneliness, health, depression, and isolation. We can take these to reflect the emotional state of people on social media. As noted earlier, sentiment analysis based on polarity tracking isn’t that insightful, but we display the violin plots below for comparison.

 

 

Sentiment Violin-plot for COVID 19 Analysis by Months

 

 

Sentiment Violin-plot for COVID 19 Analysis by Label

 

 

Now we see a very different pattern from the previous one, and why is that? Well, now we’re filtering by the COVID-19 keywords and indeed the sentiment distribution now seems to make sense. Looking more closely at the distribution of the data, the following is observed.

 

 

Graph of number of relatable words vs count of words

 

 

In the sample of texts from 2020, only 2.59% contain words related to COVID-19. The words we searched for are “corona”, “virus”, “COVID”, “covid19”, “COVID-19” and “coronavirus”. Furthermore, the frequency of occurrence decreases as the number of related words found in a text increases; most matching texts contain at most three such words.
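The filtering step can be sketched as follows, assuming case-insensitive substring matching on the keywords listed above. The matching rule, the sample `texts`, and the `keyword_hits` helper are assumptions for illustration; the original matching procedure is not described.

```python
import re
from collections import Counter

# Keywords listed in the text; matching here is case-insensitive (assumption).
KEYWORDS = ["corona", "virus", "COVID", "covid19", "COVID-19", "coronavirus"]
# Longest alternatives first, so "coronavirus" counts once rather than
# as "corona" plus "virus".
PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(KEYWORDS, key=len, reverse=True)),
    re.IGNORECASE,
)


def keyword_hits(text):
    """Number of non-overlapping keyword matches in one text."""
    return len(PATTERN.findall(text))


# Hypothetical sample posts.
texts = [
    "Feeling anxious since the coronavirus lockdown started",
    "covid19 took my job, this virus is exhausting",
    "I just feel lonely today",
    "Exams got cancelled and I can't sleep",
]

hits = [keyword_hits(t) for t in texts]
share = sum(1 for h in hits if h > 0) / len(hits)
distribution = Counter(hits)  # how many texts contain 0, 1, 2, ... keywords

print(f"{share:.2%} of texts mention COVID-19")
print(sorted(distribution.items()))
```

The `distribution` counter corresponds to the histogram of related-word counts per text, and `share` to the percentage of texts with at least one match.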

So far, we have presented the distribution of sentiments for specific words related to COVID-19. Nonetheless, questions about how these words relate to the general sentiment during the period under analysis haven’t been answered yet.

The general sentiment has been deteriorating, i.e. becoming more negative, since the beginning of 2020. In particular, June is the month with the most negative sentiment, which coincides with the month with the most COVID-19 cases in the period considered, with a total number of 241 million cases. Comparing texts that contain COVID-19-related words with texts that are completely unrelated, the former show more negative sentiment in general.

 

 

Graph between sentiment vs months in 2020

Sentiment by label is examined again, this time from January 2020 to June 2020 only.

 

 

Violin Plot by label 2020

 

 

Exasperation remains stable, with February standing out for its negativity compared to the other months. Self-harm is likewise quite stable; the months that stand out for their negativity in this category are March and June. In contrast to self-harm, March is not a negative month for suicide. However, the remaining months between February and June not only show a decline in sentiment that worsens over time, but are also notably negative. June stands out for having both very positive and very negative sentiments (high polarities), which doesn’t happen in the other months. This needs to be verified, but it could be that the number of suicides has been increasing in recent months. Regarding anxiety, a downward trend in sentiment is also observed between February and May. Finally, loneliness deserves attention given the highly negative sentiment in May and June. Since Bullying has data only for June 2020, this label isn’t analyzed.

The next figure presents the time series of sentiment between 2019/05 and 2020/06. A slight downward trend can be observed, meaning that the general sentiment has become more negative. Additionally, some days show greater negativity, indicated by the troughs. Most of the troughs in 2020 occur from April onward.

 

 

Sentiment Analysis from 2019-05

Incidents that moved the youth

 

 

 

There are other major incidents, besides COVID-19, that have moved the youth to call for help and to speak up in 2020. The recent murder of George Floyd was a turning point that ignited the #BlackLivesMatter movement. Have a look at the word cloud on the left, with the most frequent and insightful words.

The youth gathered to protest against racism and call for equality and freedom worldwide. The Empath values related to Racism and Mental Health are displayed below.

 

Normalized empathy analysis

 

 

The COVID-19 pandemic has led the world towards a global economic crisis: massive unemployment, lack of food, lack of medicines. Perhaps the big question is: “How will the pandemic affect the younger generations and the generations to come?” Unfortunately, there’s no answer to this question yet, except that the economic crisis we’re presently living through is definitely going to affect the younger generation, because they’re the ones who will study, go to college, and look for a job in the near future. The big picture tells us that unemployment is increasing on a daily basis and that there are not enough resources for all of us. The word cloud at the opening of the article reflects some of the most frequent words related to the current economic crisis.

Student Debt Crisis: Analysing Sentiments, Repayment, and Wellbeing of Students through AI | ShapingEDU

by Nancy Rubin, Arizona State University ShapingEDU Innovator in Residence

 

The “Applying AI to the Student Debt Crisis” project is entering week 5 and we are moving along at a fast pace. With just 3 more weeks to go in our eight-week project, some areas of focus are emerging.

It is impressive to see how much can be accomplished in a short period of time. I credit the progress to a fully engaged community and offer appreciation to the Omdena project Shepherds who keep everyone moving in a positive direction.

Each week, project update calls get more exciting as work is done and ideas exchanged.

Sentiment analysis on social media sites, such as Twitter and Reddit, is leading to a deeper understanding of feelings about student debt. Mining news sites has surfaced sentiment about policies relating to student loans and student loan debt, which I hope to hear more about in this week’s update.

 

Figure 1: Initial Word Cloud of Sentiment about Student Debt

 

Several groups are looking at loan repayment data. One group is exploring demographic data to predict repayment capabilities so students can better understand loans and repayment obligations. Another group is gathering income levels and spending habit data of demographic groups in the United States so they can create a regression model to estimate, without bias, a comfortable percentage of income needed for college in times of student debt crisis.

As the participation in higher education of women, minorities and first-generation college students increases, it appears these groups are

  • less likely to get well-paid jobs;
  • more prone to other debt; and
  • subject to higher levels of stress, which impacts their wellbeing.

Recent job losses due to COVID-19 have affected these groups disproportionately. The impact of the coronavirus on the debt crisis can be seen in both sentiment analysis and the ability to repay loans. A group looking at this more closely will be able to share information that could be relevant for students considering loans after COVID-19 ends.

 

Figure 2: Will COVID-19 impact borrowers

 

Student loan debt is stressful even when there aren’t additional factors, such as a pandemic, adding to it. Through an exploratory data analysis of academic research, one group is assessing whether the student loan debt crisis has a statistically significant impact on mental health.

 

 

I have been working with team members to decide how best to surface the results of this project when it ends. If you have any suggestions, please share them with me. There will be informational outcomes to educate students about saving for college, educational materials about the implications of loans, and repayment guidance informed by demographic insights. Hopefully, specific policy recommendations based on real data will be available as well.

This project is hosted by Arizona State University ShapingEDU. ShapingEDU is a community of dreamers, doers, and drivers shaping the future of learning in the digital age.

 

More About Omdena

Omdena is an innovation platform for building AI solutions to real-world problems through global bottom-up collaboration.

 

 

NLP Clustering to Understand Social Barriers Towards Energy Transition | World Energy Council

We used NLP clustering to better understand the thoughts, concerns, and sentiments of citizens in the USA, UK, Nigeria, and India about energy transition and decarbonization of their economies. The following article shares observational results on how citizens of the world perceive their role within the energy transition, including the associated social risks, opportunities, and costs.

The findings are part of a two-month Omdena AI project with the World Energy Council (WEC). Given the complexity of the analysis scope, none of the findings are conclusive; they are observational.

 

The Project Goal

The aim was to find information that can help governments effectively involve people in the accelerating energy transition. The problem was quite complicated and no data was provided to us. Therefore, we had to create our own data set, analyze it, and provide WEC with insights. We started with a long list of open questions such as:

  • What should our output look like?
  • What search terms would be useful to scrape data for?
  • What countries should be considered as our main focus?
  • Should we consider non-English languages as well and analyze them?
  • How much data per country will be enough?
  • Etc.

In order to meet the deadline for the project, we decided to go with the English language only and come up with good working models.

 

The Solution

 

Getting data from Social Media

We scraped the following sources: Twitter, YouTube, Facebook, Reddit, and well-known newspapers specific to each country. The desired insights were meant to cover developed, developing, and under-developed countries, with particular emphasis on developing and under-developed countries.

The results discussed in this article were obtained from scraped tweet data for the USA, UK, India, and Nigeria, which together cover the three categories of developed, developing, and under-developed countries.

 

Our Approach: Trying different NLP techniques

We first gathered data by scraping tweets using several keywords that we found, using Google Trends, to be important for specific countries. I removed stop words, applied stemming, removed hashtags, punctuation, numbers, and mentions, and replaced URLs with _URL. I then used TF-IDF vectorization for feature extraction from the texts. I am going to walk you through the various steps taken to tackle the problem.
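The cleaning and vectorization steps might look roughly like the sketch below. It uses only the standard library, so the stop-word list is a tiny placeholder, stemming is left out (a stemmer from an NLP library would normally be applied after tokenizing), and the smoothed TF-IDF formula is one common variant rather than necessarily the one used in the project. The `clean_tweet` and `tfidf` helpers and the sample tweets are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a full list.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "for", "than", "too"}


def clean_tweet(text):
    """Replace URLs with _URL, drop mentions, hashtags, numbers, punctuation and stop words."""
    text = re.sub(r"https?://\S+", " _URL ", text)
    text = re.sub(r"[@#]\w+", " ", text)        # mentions and hashtags
    text = re.sub(r"[^A-Za-z_\s]", " ", text)   # numbers and punctuation
    tokens = [t if t == "_URL" else t.lower() for t in text.split()]
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document (tf times smoothed idf)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sample tweets.
tweets = [
    "No electricity for 3 days in Lagos!! @PowerCo #nolight https://example.com/x",
    "Solar power is cheaper than diesel #renewables",
    "Electricity bills are too high",
]
docs = [clean_tweet(t) for t in tweets]
vectors = tfidf(docs)
print(docs[0])
print(vectors[2])
```

Terms that appear in fewer tweets receive a higher weight, so in the third tweet "bills" outweighs "electricity", which also occurs in the first.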

 

Approach 1: Sentiment Analysis (Unsatisfactory)

Sentiment analysis of short tweets comes with its own challenges. Some of the important challenges we faced in this project were:

  • Tags mean different things in different countries. #nolight can be Canadians complaining about the winter sunset, or Nigerians having a power cut.
  • Tags take a side. For example, #renewables is pro-green and #climatehoax is not. So positive sentiment on #renewables might not really tell us much.
  • The classifier model built on #climatechange and related tags does not work at all on anti-green tags such as #climatemyth.
  • Some anti-green tweets are full of happy emojis, which makes the sentiments unreliable.
  • The major tweeting countries are overwhelmingly positive. In fact, the distribution of climate change-related tweets across the world is not uniform, and tweets from some countries are far more prevalent in the data set than tweets from others (Figure1) [1].
  • The interpretation of outputs. By just assigning labels to each tweet, we will not be able to derive insights on the barriers to the energy transition. Therefore, the interpretability of the model is very important.

Considering all the challenges discussed, the sentiment analysis of the tweets did not produce satisfactory results (Table1) and we decided to test other models.

 

 

Figure1: Number of climate change related tweets per country [1]

 

 

Table1: Classifier accuracy for sentiment analysis of tweets data (USA)

 

 

Approach 2: Topic Modeling (Unsatisfactory) 

Topic modeling is an NLP technique that provides a way to compare the strength of different topics and tells us which topics are more informative than others. Topic models are unsupervised models with no need for data labeling. Because tweets are short, it was really hard to differentiate between topics and to assign tweets to a specific topic using models such as LDA. Topic models tend to produce the best results when applied to texts that are not too short and that have a consistent structure.

 

1. Using a semi-supervised approach

We chose a semi-supervised topic modeling approach (CorEx) [2]. Since the data was very high dimensional, we applied dimensionality reduction in order to remove noise and interpret the data. A permutation test was used to determine the optimum number of principal components required for PCA [3,4]. In the explained variance ratio plot, the cumulative explained variance line is not perfectly linear, but it is very close to a straight line.

Through permutation tests, I noticed that the mean explained variance ratio of the permuted matrices did not really differ from the explained variance ratio of the non-permuted matrix, which suggested that applying PCA to the correlated topic model’s results was not helpful at all.

 

 

 

 

This means each of the principal components contributes to the variance explanation almost equally, and there’s not much point in reducing the dimensions based on PCA.

 

2. Identifying 20 important topics

The CorEx results showed that there are about 20 important topics, along with the important words per topic. But how should we interpret the results?

The data was very high dimensional and dimensionality reduction was not helpful at all. For example, if price, electricity, ticket, fuel, gas, and skepticism are the most important words for one topic, how do we understand the concerns of the people of that country? Is it the fuel price that concerns them? Or electricity prices, or ticket prices? Each topic could contain a combination of many different, possibly related words, and by just looking at the important words in each topic it would not be possible to find out the story behind the data that could help harness clean energy for a better future.

Besides, bigrams or trigrams with topic models did not help much either, because the keywords conveying the main focus of a tweet do not always appear together.

 

 

 

 

Approach 3: Clustering (Kmeans & Hierarchical)

Both Kmeans and hierarchical clustering models led to comparable results, showing clearly separated clusters. Because both models have comparable performance, we derived all results using hierarchical clustering, which better shows the hierarchy of the clusters. Tweet data was collected for four different countries, as discussed before, and the model was applied to the data of each country separately. For brevity, we only show the clustering results for India; the insights across all countries are shown at the end of the article.
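To illustrate what hierarchical (agglomerative) clustering does with TF-IDF rows, here is a small pure-Python sketch. The average linkage, the cosine distance, the toy `vectors`, and the `agglomerative` helper are all assumptions chosen for illustration; the article does not state which linkage or distance was used, and in practice a library implementation would be the usual choice.

```python
import math


def cosine_distance(u, v):
    """1 minus cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def average_linkage(cluster_a, cluster_b, vectors):
    """Mean pairwise distance between the members of two clusters."""
    pairs = [(i, j) for i in cluster_a for j in cluster_b]
    return sum(cosine_distance(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


def agglomerative(vectors, n_clusters):
    """Merge the two closest clusters until n_clusters remain; returns lists of row indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        a, b = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]], vectors),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical TF-IDF rows over the vocabulary [electricity, bills, solar, wind].
vectors = [
    [0.9, 0.4, 0.0, 0.0],
    [0.8, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.7],
    [0.1, 0.0, 0.9, 0.3],
]
print(agglomerative(vectors, 2))
```

Recording the order of merges, rather than only the final groups, is what yields the hierarchy that makes this method easier to inspect than Kmeans.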

 

 

 

 

Hierarchical Clustering Results

After finding clear clusters in the data, the next step was interpreting them by creating meaningful visualizations and insights. A combination of Scattertext, co-occurrence graphs, dispersion plots, collocated word clouds, and top trigrams resulted in very useful insights from the data that could help harness clean energy for a better future.

An important lesson here is to always rely on a combination of various plots for your interpretations instead of only one. Each type of plot helps us visualize one aspect of the data, and combining various plots helps to create a comprehensive, clear picture of the data.

 

 

1. Using Scattertext

Scattertext is an excellent exploratory text analysis tool that produces interactive scatter plots differentiating the terms used by different sets of documents.

Two types of plots were created, which were very helpful in interpreting the results.

1) Visualizing word embedding projections. This was explored using word associations with a specific keyword. The keywords include the following: [Access, Availability, Affordability, Bills, Prices]. Interested readers can try more keywords using the code provided with this study.

2) In another plot, the unigrams from the clustered tweets are selected and plotted using their dense-ranked category-specific frequencies. We used the difference in dense ranks as the scoring function.
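The dense-rank scoring in 2) can be sketched in a few lines. This is an unnormalized illustration, not Scattertext's own scorer, which may scale the ranks differently; the `dense_ranks` and `rank_difference` helpers and the toy clusters are hypothetical.

```python
from collections import Counter


def dense_ranks(counts):
    """Map each term to its dense rank: 1 for the lowest frequency, ties share a rank."""
    distinct = sorted(set(counts.values()))
    rank_of = {freq: rank for rank, freq in enumerate(distinct, start=1)}
    return {term: rank_of[freq] for term, freq in counts.items()}


def rank_difference(tokens_a, tokens_b):
    """Score each term as its dense rank in category A minus its dense rank in category B."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vocabulary = set(counts_a) | set(counts_b)
    # Give every term a count in both categories (0 if absent) before ranking.
    full_a = {t: counts_a[t] for t in vocabulary}
    full_b = {t: counts_b[t] for t in vocabulary}
    ranks_a, ranks_b = dense_ranks(full_a), dense_ranks(full_b)
    return {t: ranks_a[t] - ranks_b[t] for t in vocabulary}


# Hypothetical token lists for two clusters of tweets.
cluster_1 = "electricity bills electricity price price price".split()
cluster_2 = "solar wind solar price".split()
scores = rank_difference(cluster_1, cluster_2)
for term, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(term, score)
```

Positive scores mark terms more characteristic of the first cluster and negative scores mark terms more characteristic of the second, which is what the blue-to-red coloring in the plots encodes.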

All the interactive plots are stored in HTML files and are available in the GitHub repository. If you click on the interactive version, the list of tweets containing each specific term can be explored. Please note that hierarchical clustering is applied to the data first, and the clustered tweets are then given to Scattertext as input. You can gain further information by diving deep into these plots. The data used for creating these results can be found here and the notebook used to apply clustering and create these scatter plots can be found here.

The following shows the interactive versions of all plots for various countries:

 

1.1. Rank and frequencies across different categories (India)

 

 

Figure 8. An example Scattertext plot showing positions of terms based on the dense ranks of their frequencies, for clusters 1 and 2. The scores are the differences between the terms’ dense ranks. The bluer a term is, the higher its association score for cluster 1; the redder a term is, the higher its association score for cluster 2. See Cluster 1 vs 2 for an interactive version of this plot.

 

 

Figure 9. An example Scattertext plot showing positions of terms based on the dense ranks of their frequencies, for clusters 1 and 3. The scores are the differences between the terms’ dense ranks. The bluer a term is, the higher its association score for cluster 1; the redder a term is, the higher its association score for cluster 3. See Cluster 1 vs 3 for an interactive version of this plot.

 

 

1.2. Word embedding projection plots using Scattertext (India)

 

 

Figure 10. An example Scattertext plot showing word associations with the term “prices”, using spaCy’s pretrained embedding vectors. This is used to see the terms most associated with “prices”. At the top right corner, we see the words most commonly associated with “prices”, such as electricity. If you click on the interactive version, the list of tweets with the terms can be explored. See Word Embedding: Bills for an interactive version of this plot.

 

 

Figure 11. An example Scattertext plot showing word associations with the term “bills”, using spaCy’s pretrained embedding vectors. This is used to see the terms most associated with “bills”. At the top right corner, we see the words most commonly associated with “bills”, such as electricity, prices, energy, and power. If you click on the interactive version, the list of tweets with the terms can be explored. See Word Embedding: Prices for an interactive version of this plot.

 

 

2. Twitter Insights (Price & Energy Transition Concerns)

 

2.1. India
  • Solar and wind don’t necessarily mean cheaper prices, as they did not in Germany. When Germany went all in on renewables, energy prices and carbon emissions went up.
  • Electricity prices can drop for people who source power from government-owned renewable sources, because those prices do not vary with oil and natural gas.
  • Renewable energy policy can lead to much lower electricity prices, a stronger globally competitive economy, fewer fossil fuel imports, and as a result less pollution.
  • Putting a tax on coal and making open access a reality are two potential action areas to make renewable energy affordable.
  • Let oil prices increase and subsidies stop.
  • Many requests to replace fossil fuels with cleaner fuels such as stubble from farmers.
  • Cut oil imports and encourage renewable energies.
  • A lot of complaints regarding electricity shortages, lack of electricity for hours or days, electricity cuts, and electricity and water supply.
  • Fossil fuels are dirty, and nuclear power is dangerous. Therefore, we need to make renewable energy work and harness clean energy for a better future.

 

2.2. Nigeria
  • People complaining about the lack of constant electricity and the absence of business-friendly policy.
  • Enhancing the delivery of electricity in the country.
  • Electricity supply cut off for days whenever it rained, lack of electricity every weekend, daily, and overnight, and unstable electricity.
  • No water and no electricity.
  • The electricity sector is the third main consuming sector of oil.
  • Lots of worries and trouble regarding paying electricity bills.
  • Access to electricity is not for everyone.
  • Access to affordable sustainable renewable energy.
  • Renewable energy, water, and waste management are some of Nigeria’s major partnership areas with Ghana.
  • Harnessing tidal or offshore wind energy, which is a clean and renewable source.
  • Lots of positive experiences and low prices with the use of solar power systems.

 

2.3. UK

  • Bringing down the prices of electricity and gas.
  • Having stable prices for electricity.
  • People would rather see higher prices for gas than for electricity.
  • Need to think beyond electricity to effect the energy transition.
  • Renewables disrupt the electricity market, and politicians raising electricity prices to tackle the climate emergency is an awful policy.
  • Many requests for investment in renewable energy.
  • The transition to renewables is too slow.
  • Lots of discussions on whether it is good to replace the nuclear stations with renewables.
  • Whether the zero-carbon economy has any economic benefit for the UK.

 

2.4. USA

  • Slowing down climate change.
  • Market-based solutions for climate change.
  • Renewable energy infrastructure is lame and unreliable.
  • Renewables increase electricity prices and distort energy markets with favorable purchase agreements.
  • Many complaints regarding gas prices.
  • National security priorities should focus on renewable energy, investing in its infrastructure and jobs programs.
  • Figure out how to store renewable energy and get rid of excess CO₂ in the atmosphere.
  • Renewable energy represents a significant economic opportunity.

 

 

3. Weighing a word’s importance via Dispersion Plot

A word’s importance can be weighed by its dispersion in a corpus. Lexical dispersion is a measure of a word’s homogeneity across the parts of a corpus. The following plot shows how often, and where, a word occurs throughout the entire corpus for each of the four countries: India, Nigeria, the UK, and the USA.

According to the dispersion plot below, access to electricity is an important concern for Nigeria, while this is not the case for the other three countries. How do we know that this access relates to electricity? The answer lies in the Scattertext plots shown in the previous section: reading those plots together with the dispersion plot shows that the concern is electricity access.

Access to affordable renewable energy is a big concern in Nigeria, followed by India, while the affordability of renewable energy is not a problem for people in the UK and the USA. In Nigeria in particular, people have difficulty paying their electricity bills.

Energy, electricity, power, and renewables are also the topics of most discussions in all of these countries. But which aspects of each topic concern each country? The answer is given in the previous section, where we interpret the results of the Scattertext plots.

 

 


Figure 12. Lexical dispersion for various keywords across different countries
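As a rough illustration of what a dispersion plot encodes, the sketch below records the token offsets at which each keyword occurs in a corpus. The tokenizer, the function names, and the toy tweets are assumptions made for this example; they are not the project's actual pipeline or data.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def dispersion_offsets(tweets, keywords):
    """Return (offsets, n_tokens): for each keyword, the token positions
    at which it occurs when all tweets are read as one running corpus."""
    wanted = {k.lower() for k in keywords}
    offsets = defaultdict(list)
    position = 0
    for tweet in tweets:
        for token in tokenize(tweet):
            if token in wanted:
                offsets[token].append(position)
            position += 1
    return dict(offsets), position


# Hypothetical toy corpus standing in for one country's tweets.
nigeria_tweets = [
    "No water and no electricity again",
    "We need access to affordable renewable energy",
    "Access to electricity is not for everyone",
]

offsets, n_tokens = dispersion_offsets(
    nigeria_tweets, ["access", "electricity", "affordable"]
)
for word, positions in sorted(offsets.items()):
    print(word, positions, "of", n_tokens, "tokens")
```

Drawing each offset as a tick on one horizontal line per keyword yields a dispersion plot. NLTK's `Text.dispersion_plot` provides a ready-made version of this idea, although whether it was used for Figure 12 is not stated in the article.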

 

 

4. Top Trigrams for Different Countries

 

 


Figure 13. Top twenty trigrams for India

 

 

As can be seen from the top 20 trigrams for India, the top concerns are renewable energy, the renewable energy sector, renewable energy capacity, renewable energy sources, new renewable energy, and clean renewable energy. These concerns match the insights drawn from clustering in the previous section.

 

 


Figure 14. Top twenty trigrams for Nigeria

 

 

As can be seen from the top 20 trigrams for Nigeria, the top concerns are renewable energy, renewable energy training, electricity distribution companies, renewable energy sources, renewable energy solutions, solar renewable energy, the renewable energy sector, affordable prices, power supply, climate change and renewables, public-private sectors, the renewable energy industry, renewable energy policies, and access to renewable energy. These concerns match the insights drawn from clustering in the previous section.

 

 


Figure 15. Top twenty trigrams for UK

 

 

As can be seen from the top 20 trigrams for the United Kingdom, the top concerns are free renewable energy, renewable energy sources, using renewable energy, and new renewable energy. These concerns match the insights drawn from clustering in the previous section.

 

 


Figure 16. Top twenty trigrams for USA

 

 

As can be seen from the top 20 trigrams for the USA, the top concerns are clean renewable energy, renewable energy sources, supporting renewable energy, the renewable fuel standard, the transition to renewable energy, solar renewable energy, new renewable energy, using renewable energy, the need for quality products, and renewable energy jobs. These concerns match the insights drawn from clustering in the previous section.
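For readers who want to reproduce this kind of ranking, here is a minimal sketch of counting the most frequent trigrams in a set of tweets. The stopword list, the tokenizer, the function name `top_trigrams`, and the sample tweets are all assumptions for illustration; the article does not describe the exact preprocessing behind Figures 13 to 16.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to", "we"}


def top_trigrams(tweets, n=20):
    """Count three-word sequences within each tweet and return the n most common."""
    counts = Counter()
    for tweet in tweets:
        tokens = [
            t for t in re.findall(r"[a-z]+", tweet.lower()) if t not in STOPWORDS
        ]
        # zip over three shifted views of the token list yields the trigrams.
        counts.update(zip(tokens, tokens[1:], tokens[2:]))
    return counts.most_common(n)


# Hypothetical sample tweets.
usa_tweets = [
    "Clean renewable energy now",
    "We support clean renewable energy",
    "Renewable energy sources matter",
]

ranking = top_trigrams(usa_tweets, n=20)
for trigram, count in ranking[:3]:
    print(" ".join(trigram), count)
```

Counting within each tweet, rather than across the concatenated corpus, avoids trigrams that straddle two unrelated tweets.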

 

 

5. Collocated word clouds & Co-occurrence Network

The following plots display the networks of co-occurring words in tweets from the different countries. Here, we visualize the network of the 25 most frequent bigrams. In all cases, the connections between the words confirm the insights derived in the previous sections.

 

 


Figure 17. Collocate Clouds-India

 

 


Figure 18. Co-occurrence Network-India (First 25 Bigrams)

 

 


Figure 19. Collocate Clouds-Nigeria

 

 


Figure 20. Co-occurrence Network-Nigeria (First 25 Bigrams)

 

 


Figure 21. Collocate Clouds-UK

 

 


Figure 22. Co-occurrence Network-UK (First 25 Bigrams)

 

 


Figure 23. Collocate Clouds-USA

 

 


Figure 24. Co-occurrence Network-USA (First 25 Bigrams)
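The co-occurrence networks above can be approximated with a short sketch: count bigrams, keep the 25 most frequent, and treat each bigram as a weighted edge between two words. The function names, the tokenizer, and the sample tweets are assumptions for illustration, and the network is held in a plain dictionary rather than whichever graph library produced the figures.

```python
import re
from collections import Counter, defaultdict


def top_bigrams(tweets, n=25):
    """Return the n most frequent adjacent word pairs, counted within each tweet."""
    counts = Counter()
    for tweet in tweets:
        tokens = re.findall(r"[a-z]+", tweet.lower())
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)


def cooccurrence_network(bigram_counts):
    """Build an undirected weighted adjacency map from (bigram, count) pairs."""
    graph = defaultdict(lambda: defaultdict(int))
    for (first, second), count in bigram_counts:
        graph[first][second] += count
        if first != second:  # avoid double-counting a word paired with itself
            graph[second][first] += count
    return {word: dict(neighbours) for word, neighbours in graph.items()}


# Hypothetical sample tweets.
india_tweets = [
    "Renewable energy is clean energy",
    "Solar power and renewable energy",
]

network = cooccurrence_network(top_bigrams(india_tweets, n=25))
for word, neighbours in sorted(network.items()):
    print(word, "->", neighbours)
```

Each key is a node and each nested entry is an edge whose weight is the bigram frequency, so the structure can be handed to a graph-drawing library to render a plot similar in spirit to the figures above.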

 

 

 

 

 

 

More about Omdena

Omdena is an innovation platform for building AI solutions to real-world problems through the power of bottom-up collaboration.
