Understanding Youth Sentiments Through Artificial Intelligence
July 13, 2021
In this article, we implemented a data analysis pipeline to understand youth sentiment, analyzing the aspirations, fears, and thoughts of young people by scraping the web and youth-led media. We examine their sentiment during the pandemic and across topics such as human rights, politics, and education.
This analysis was part of Omdena’s challenge: Understanding the Sentiments, Thoughts, and Aspirations of Young People.
Youth-led media is any effort created, planned, implemented, and reflected upon by young people in the form of media, including websites, newspapers, television shows, and publications. Such platforms connect writers, artists, and photographers in the age range of 13–24 all around the globe and promote and defend a free youth press. Members of these platforms not only have the freedom to express their own opinions on various issues and topics but also represent various communities and let their voices be heard.
Hence, such platforms are a good source of data for understanding and analyzing youth aspirations across various parts of the globe. In the remaining sections, we explain our data collection methodology and present our results and the insights derived from analyzing various topics.
This section gives an overall picture of how the data was distributed across newspapers and articles. The insights and visualizations show how young people are doing and how their sentiments changed over time (2015–2020).
Why did we choose this topic?
This topic aims to analyze data from a different perspective, i.e. outside social media. That is why we chose to scrape and analyze data from sources outside social media and present our insights accordingly.
Objectives
- To scrape and process news articles from different sources and prepare them for sentiment analysis and topic modeling, in order to draw useful insights about youth sentiment.
- To conduct sentiment analysis to better understand youth sentiment.
- To collect the insights from all of these points and to visualize the results in a cogent manner for the audience.
Methodology
Data Collection
To collect articles, we scraped data from various media platforms (ref. Table 1) using a scraper we built with BeautifulSoup and the requests library in Python. A large number of articles, published between 1994 and 2020, were scraped and merged into the final dataset used for analysis. We also focused on extracting articles for certain categories, namely:
- Education
- Environment & Climate
- Human Rights
- COVID-19
- Politics
- Health and Leisure
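The scraping step can be sketched with BeautifulSoup and requests, the two libraries named above. This is a minimal sketch, not the scraper we used: the tag choices (an `<h1>` title and `<p>` paragraphs inside an `<article>` element) and the `User-Agent` string are assumptions, and each real platform needs its own selectors (and sometimes Selenium for JavaScript-rendered pages).

```python
from typing import Dict

from bs4 import BeautifulSoup


def parse_article(html: str) -> Dict[str, str]:
    """Extract the title and body text from one article page.

    Hypothetical selectors: <h1> for the title and <p> tags inside
    <article> (falling back to the whole page) for the body.
    """
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find("h1")
    container = soup.find("article") or soup
    paragraphs = [p.get_text(" ", strip=True) for p in container.find_all("p")]
    return {
        "title": title_tag.get_text(strip=True) if title_tag else "",
        "text": "\n".join(p for p in paragraphs if p),  # drop empty paragraphs
    }


def fetch_article(url: str, category: str) -> Dict[str, str]:
    """Download one page with requests, parse it, and tag it with its category."""
    import requests  # imported lazily so parse_article works offline

    response = requests.get(
        url, timeout=10, headers={"User-Agent": "youth-sentiment-sketch"}  # hypothetical UA
    )
    response.raise_for_status()
    record = parse_article(response.text)
    record.update({"url": url, "category": category})
    return record
```

In practice, `fetch_article` would be called over a list of article URLs for each category, and the resulting records collected into one table for analysis.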
Tools Used
For scraping data:
- Beautiful Soup
- Requests
- Selenium
For visualizing data:
- Matplotlib
- Seaborn
- Plotly (Python)
- Matplotlib animations
- Tableau
- Python word clouds (wordcloud)
For sentiment analysis:
- TextBlob
- Empath Analysis
- Region-Based Analysis
- Knowledge Graph
- Network Analysis
Data Preprocessing
With all the articles scraped, we next focused on preprocessing them. One of our major challenges was identifying and removing promotional content. To start with, we removed all URLs from the articles. Next, we identified the templates each platform used for advertisements or for promoting other articles, and used regular expressions to detect and remove them. We then passed the articles through a basic preprocessing pipeline to lowercase the text, stem, lemmatize, and remove special characters and common stopwords. We also identified certain redundant words, such as “journalism”, that did not add to the analysis and removed them from the dataset.
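The cleaning steps above can be sketched with Python's `re` module. This is an illustrative sketch under stated assumptions: the promotional patterns (`Read more:` and newsletter lines) are hypothetical stand-ins for the per-platform templates, the stopword list is a tiny hard-coded sample, and stemming and lemmatization are omitted to keep the example dependency-free (NLTK's `PorterStemmer` or `WordNetLemmatizer` could be applied to the returned tokens).

```python
import re
from typing import List

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Hypothetical promotional templates; each platform needs its own patterns.
PROMO_RES = [
    re.compile(r"read more:[^\n]*", re.IGNORECASE),
    re.compile(r"subscribe to our newsletter[^\n]*", re.IGNORECASE),
]
NON_ALPHA_RE = re.compile(r"[^a-z\s]")
# Tiny illustrative stopword sample; a full list (e.g. NLTK's) would be used in practice.
STOPWORDS = {
    "a", "an", "and", "are", "for", "in", "is", "it", "of",
    "on", "or", "that", "the", "this", "to", "was", "we", "were", "with",
}
# Domain words judged redundant for the analysis, e.g. "journalism".
REDUNDANT = {"journalism"}


def preprocess(text: str) -> List[str]:
    """Strip URLs and promo lines, lowercase, drop non-letters and stopwords."""
    text = URL_RE.sub(" ", text)  # 1. remove URLs
    for pattern in PROMO_RES:  # 2. remove promotional templates
        text = pattern.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text.lower())  # 3. lowercase, keep letters only
    return [  # 4. drop stopwords and redundant domain words
        tok for tok in text.split()
        if tok not in STOPWORDS and tok not in REDUNDANT
    ]
```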
Additionally, we performed a keyword analysis as a preprocessing step to ensure everything was ready before starting the analysis. Next, we used Stanford’s NER and Python’s geopy library to identify the locations associated with each article. Then, we used LDA and Empath-based analysis for topic modeling and identified the following nine topics:
- Environment (Climate Change)
- Leadership & Politics (Democracy, Leadership)
- Health
- COVID-19
- Education
- Technology
- Human Rights (LGBT, Black Lives Matter, Bullying)
- Terrorism and Violence
- Career and Employment
1. ENVIRONMENT (Author: Mr. Mateus Broilo)
There is no question that the environment is a key topic that concerns the whole of society, from youngsters to adults and the elderly. However, the youth of today are the future of tomorrow, and for this reason they are probably the part of society that will suffer the most in the years to come. The environment cannot be seen as a cultural movement, simply because it is not. Rather, it must be seen and dealt with as a political movement and as an economic trend that, most of the time, serves the will of powerful corporations.
The word cloud shows some of the most common and meaningful words in the Environment topic analysis. Words such as climate, change, people, and plastic, shown in the figure below, may reflect the basic concerns of young people. Not surprisingly, they are the most common words across the 380 articles analyzed. Clearly, “climate” and “change” are two halves of a bigram: the climate is changing, and that is a fact. “People” are part of the problem but can also be the solution, especially the youth. After all, youth aspirations are a heat map of where the world should actually be heading. And out of curiosity, have you ever found a “plastic” bottle on the beach? See the figures below for more detail.
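Word clouds are driven by word frequencies, and the observation that “climate” and “change” form a bigram comes down to counting adjacent word pairs. A minimal standard-library sketch, assuming each article has already been tokenized (for example by the preprocessing step):

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_bigrams(
    token_lists: Iterable[List[str]], k: int = 10
) -> List[Tuple[Tuple[str, str], int]]:
    """Count adjacent word pairs within each article and return the k most common."""
    counts: Counter = Counter()
    for tokens in token_lists:
        counts.update(zip(tokens, tokens[1:]))  # pairs never cross article boundaries
    return counts.most_common(k)
```

The resulting counts (keyed, for example, as `"climate change"`) could be passed to the `wordcloud` package's `WordCloud.generate_from_frequencies` to draw a cloud that keeps such pairs together.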
A final analysis looks at lexicons, that is, it performs text analysis across lexical categories. The main objective here is to connect the text with a broad range of sentiments beyond positive, negative, and neutral, as shown in Figure 3.6.15. Figure 3.6.16 shows the most common labels into which the articles can be categorized, and Figure 3.6.17 shows the Empath values associated with the most meaningful labels affecting the environmental movement.
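Empath's `Empath().analyze(text, normalize=True)` returns a score per lexical category, normalized by the number of tokens. The sketch below averages those scores per publication year; the `(year, text)` input format and the per-year averaging are our assumptions, and the `analyze` parameter exists only so that a stub can replace Empath in tests.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Optional, Tuple

Scores = Dict[str, float]


def empath_by_year(
    articles: Iterable[Tuple[int, str]],
    analyze: Optional[Callable[[str], Optional[Scores]]] = None,
) -> Dict[int, Scores]:
    """Average Empath category scores over the articles of each year."""
    if analyze is None:
        from empath import Empath  # pip install empath

        lexicon = Empath()

        def analyze(text: str) -> Optional[Scores]:
            # normalize=True divides counts by token count; returns None for empty text
            return lexicon.analyze(text, normalize=True)

    sums: Dict[int, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[int, int] = defaultdict(int)
    for year, text in articles:
        scores = analyze(text)
        if not scores:
            continue  # skip empty articles instead of counting them as zeros
        for category, value in scores.items():
            sums[year][category] += value
        counts[year] += 1
    return {
        year: {cat: total / counts[year] for cat, total in cats.items()}
        for year, cats in sorted(sums.items())
    }
```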
2. CAREER AND EDUCATION (Author: Mr. Mario Vasquez Arias, Ms. Adelore Similoluwa Gloria)
Education is one of the chosen categories and is fundamental to this study: any discussion of young people would be incomplete without it. Most young people are at some level of education, be it primary school, high school, or university. Many of them therefore spend so much time at school that it becomes a second home, directly affecting their lives. Because school is like a home, it reflects young people’s personalities, concerns, and feelings at that time, which makes it important to analyze.
In the word cloud, the two words that stand out most are “school” and “students”, which capture precisely what education represents, so these results are to be expected. The word “time” also stands out; we can infer that it refers to the time young people spend in school, a large part of the day, five days a week, for many years (these words are also visible in the graph). The term “high school” shows that the scraped articles on which the analyses were based focused on a younger population, which is precisely the target population. Another keyword is “work”, which implies that young people not only study but also work, probably because of economic conditions. The word “immigrant” also appears, reflecting a topic that has been prominent in recent years and from which education is not exempt. The phrase “home problem” appears in a smaller size but is also worth noting, as it reflects that students sometimes bring problems from home to school, affecting their grades and mental health.
Empath Analysis
The four Empath categories with the highest values are school (also the most frequent word), reading, social networks, and holidays. These are the themes young people emphasized most in their thoughts during this period: school, which, as noted above, is like a second home; reading, a habit that has been growing in both physical and digital formats; social networks, which boomed in society and which practically all young people know and use; and holidays, eagerly awaited dates on which young people enjoy free time, hobbies, and rest. Another category worth highlighting is technology, which scores lower but remains relevant for young people given rapid technological progress and the wide range of services and devices available to anyone, especially this young population.
We can observe the average youth sentiment for each year: the highest peak is in 2016 and the lowest point is in 2020. The latter could be due to the negative feelings caused by the coronavirus pandemic, such as anxiety, confinement due to quarantine, loneliness, and depression.
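A per-year average like this one can be sketched with TextBlob, whose `sentiment.polarity` is a score in [-1, 1]. The sketch assumes each article is a `(year, text)` pair and that a simple mean per year is the aggregate; the `polarity` parameter is a hypothetical hook for swapping in another scorer (or a stub in tests).

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def mean_polarity_by_year(
    articles: Iterable[Tuple[int, str]],
    polarity: Optional[Callable[[str], float]] = None,
) -> Dict[int, float]:
    """Average sentiment polarity (-1 negative .. +1 positive) per year."""
    if polarity is None:
        from textblob import TextBlob  # pip install textblob

        def polarity(text: str) -> float:
            return TextBlob(text).sentiment.polarity

    by_year: Dict[int, List[float]] = defaultdict(list)
    for year, text in articles:
        by_year[year].append(polarity(text))
    return {year: mean(scores) for year, scores in sorted(by_year.items())}
```

With the result in `means`, `max(means, key=means.get)` and `min(means, key=means.get)` give the peak and lowest years.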
3. TERRORISM AND VIOLENCE (Author: Ms. Shanya Sharma)
Youth Sentiment Trends over Time
The dip in youth sentiment in 2019 can be associated with the 2019 gun violence in Oakland. The same can be inferred from the keywords extracted from the 2019 terrorism articles.
Keywords such as “domestic violence” appear for 2020, which may be directly related to COVID-19 and the lockdowns.
Police brutality is also a frequent keyword in the 2020 data, indicating that police brutality, whether in enforcing lockdowns (e.g. in India) or surrounding George Floyd’s case, kept young people in fear.
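One way to extract per-year keywords like these is TF-IDF over yearly documents: all of a year's tokens are pooled, and terms frequent in that year but rare in other years rank highest. This is a standard-library sketch of that idea, not necessarily the method used in the analysis; the smoothing in the IDF term and the alphabetical tie-breaking are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def yearly_keywords(tokens_by_year: Dict[int, List[str]], k: int = 5) -> Dict[int, List[str]]:
    """Top-k TF-IDF terms per year, treating each year's pooled tokens as one document."""
    n_docs = len(tokens_by_year)
    doc_freq: Counter = Counter()
    for tokens in tokens_by_year.values():
        doc_freq.update(set(tokens))  # number of years in which each term appears

    keywords: Dict[int, List[str]] = {}
    for year, tokens in tokens_by_year.items():
        tf = Counter(tokens)
        total = sum(tf.values()) or 1
        scores = {
            # smoothed IDF: terms present in every year score 0
            term: (count / total) * math.log((1 + n_docs) / (1 + doc_freq[term]))
            for term, count in tf.items()
        }
        ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken alphabetically
        keywords[year] = ranked[:k]
    return keywords
```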
4. HUMAN RIGHTS (Author: Mr. Opeyemi Fabiyi)
Keywords for certain locations
Let’s Look at the Emotion Trends
- A gradually increasing trend in negative emotions with respect to human rights is concerning.
- Similarly, hate can be seen to be increasing gradually.
- A higher value for positive emotions may indicate that the articles are also hopeful about certain aspects.
- An analysis of the reason for the peak in 2018 showed that most articles written during that period were about how the writers wanted to fight the wrongdoings around them, and they expressed hope.
- Some common causes of concern were:
  - Racism
  - Violence
  - Poverty
  - Immigration
  - Homophobia
- Some concerning insights that emerged were:
  - Youth in India are worried about menstrual hygiene.
  - Sex trafficking is a cause of concern in developed nations such as the US.
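Calling a trend “gradually increasing” can be made precise with an ordinary least-squares slope of each emotion's yearly value against the year. A minimal sketch, assuming yearly scores per category (for example, the Empath averages) are already computed; the `min_slope` threshold is a hypothetical knob.

```python
from typing import Dict, Sequence


def trend_slope(years: Sequence[int], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of values against years (units per year)."""
    n = len(years)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (year, value) pairs of equal length")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


def rising_categories(
    series: Dict[str, Dict[int, float]], min_slope: float = 0.0
) -> Dict[str, float]:
    """Return the categories whose yearly values trend upward, with their slopes."""
    result = {}
    for category, by_year in series.items():
        years = sorted(by_year)
        slope = trend_slope(years, [by_year[y] for y in years])
        if slope > min_slope:
            result[category] = slope
    return result
```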
SEX TRAFFICKING
- Almost all articles were from the United States
- Most articles were written in 2019
5. POLITICS (Author: Ms. Kriti Rai Saini)
Let’s analyze how youth sentiment toward politics changed year over year across different lexical categories.
- A steep increase in disappointment due to the #MeToo movement, the announcement of the US presidential election results, and the Facebook scandal.
- A steep decrease in hate in 2018, possibly due to the royal wedding, and an increase in 2020 due to the #BLM movement.
- A steep increase in the “poor” category in 2020 due to the COVID pandemic.
- An increase in “death” in 2020, which can be attributed to the COVID pandemic.
- An increase in anger due to constant discontent with the political situation over the years.
- An increase in violence in 2020 due to the lockdowns and the #BLM movement, and a peak in 2017 due to the #MeToo movement and mass violence in the USA (Texas, Las Vegas).
- An increase in contentment and optimism in 2020, perhaps because the pandemic made people realize the importance of little things, and a peak in contentment in 2017 due to the royal wedding.
- A peak in the “love” lexicon in 2017 due to the royal wedding.
- A peak in the “strength” lexicon in 2017 due to the Women’s March, in which 1 million women stood up for women’s rights, and in 2020 due to the BLM movement.
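Flagging “steep” changes like these can be sketched as a relative year-over-year change compared against a threshold. The 50% default threshold is an assumption, not a value from the analysis, and each change is measured against the previous year present in the data.

```python
from typing import Dict, List, Tuple


def steep_changes(by_year: Dict[int, float], threshold: float = 0.5) -> List[Tuple[int, float]]:
    """Return (year, relative change vs. previous year) where |change| >= threshold."""
    years = sorted(by_year)
    flagged = []
    for prev, curr in zip(years, years[1:]):
        base = by_year[prev]
        if base == 0:
            continue  # relative change is undefined for a zero baseline
        change = (by_year[curr] - base) / abs(base)
        if abs(change) >= threshold:
            flagged.append((curr, change))
    return flagged
```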
6. COVID-19 (Author: Ms. Monalisa Panda)
In the figure above, we can see that there are only a few articles on COVID, all from the year 2020.
Mean sentiment by month, before vs. after lockdown
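A before/after lockdown comparison can be sketched by splitting dated sentiment scores at a cut-off and averaging per month. The cut-off date below is a hypothetical placeholder, since lockdown dates differed by country; a month that contains the cut-off contributes to both groups.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical cut-off date; actual lockdown dates varied by country.
LOCKDOWN_START = date(2020, 3, 15)


def monthly_means_split(
    scored: Iterable[Tuple[date, float]], cutoff: date = LOCKDOWN_START
) -> Dict[str, Dict[str, float]]:
    """Mean sentiment per month, separately for articles before and after the cut-off."""
    groups: Dict[str, Dict[str, List[float]]] = {
        "before": defaultdict(list),
        "after": defaultdict(list),
    }
    for published, score in scored:
        period = "before" if published < cutoff else "after"
        groups[period][published.strftime("%Y-%m")].append(score)
    return {
        period: {month: mean(vals) for month, vals in sorted(months.items())}
        for period, months in groups.items()
    }
```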
The following are the positive youth sentiments on the topic of COVID-19; the words most often detected are:
People, Time: most people got time to spend with their families and relatives.
In this word cloud, we can see that misinformation, racism, and discrimination are among the dominant negative keywords on the COVID topic.
Racism
The peak in negative emotions can be associated with the 2016 US presidential election.
Fear related to racism was gradually decreasing but saw a slight rise in 2020.
Region-based Positive and Negative youth sentiments on the topic of COVID
Positive Youth Sentiments Regions:
Negative Youth Sentiments Regions:
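Once articles carry a location (the analysis used Stanford’s NER and geopy for this; geopy’s `Nominatim(user_agent=...).geocode(name)` is one way to resolve a place name), region-level results can be sketched as a mean sentiment per region. The sign-based positive/negative split and the `min_articles` filter are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple


def split_regions_by_sentiment(
    scored: Iterable[Tuple[Optional[str], float]], min_articles: int = 1
) -> Tuple[List[str], List[str]]:
    """Return (positive_regions, negative_regions) by mean sentiment per region."""
    by_region: Dict[str, List[float]] = defaultdict(list)
    for region, score in scored:
        if region:  # articles with no resolved location are skipped
            by_region[region].append(score)
    positive, negative = [], []
    for region, scores in sorted(by_region.items()):
        if len(scores) < min_articles:
            continue  # too few articles to judge this region
        avg = mean(scores)
        if avg > 0:
            positive.append(region)
        elif avg < 0:
            negative.append(region)
    return positive, negative
```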
This concludes the analysis of all the topics listed at the top. Once again, I would like to thank Omdena for giving me this wonderful opportunity to work on this project.
—
This article is written by Monalisa Panda.
Ready to test your skills?
If you’re interested in collaborating, apply to join an Omdena project at: https://www.omdena.com/projects