Demo Day Insights | Accelerating the Clean Energy Transition | World Energy Council


By Rosana de Oliveira Gomes

Two Omdena teams with a total of 50 AI experts and data scientists from 25 countries collaborated with the World Energy Council and the Nigerian NGO RA365 in carrying out data-driven analyses and providing AI solutions to address the Global Transition to Clean Energy.

At a recent Omdena Demo Day, team members Amardeep Singh, Julia Wabant, and Simon Mackenzie shared the results and insights gained from these two projects.

 

The Topic: Energy Transition

One of the Sustainable Development Goals adopted by all United Nations Member States in 2015 aims to ensure access to affordable, reliable, sustainable, and modern clean energy for all by 2030.

Transitioning to a society with cleaner energy is crucial for fighting climate change. Different parts of the world are currently at different stages of the energy transition. This can be seen both in the implementation of solutions in specific regions and in how societies perceive the transition. Both topics are addressed in the following two Omdena use cases.

 

1. Use Case: AI for Renewable Energy in Nigeria

 


 

Nigeria is one of the countries in the world facing the most severe energy challenges. Over half of the country’s population — 100 million people — lack access to electricity. Some of the problems faced by Nigerians include precarious electricity systems, unstable electricity supply, and electricity available only in certain locations.

An alternative to these problems is investing in local and renewable power solutions. Renewable Africa RA365 is an NGO with the mission to end energy poverty in Nigeria by leveraging innovative clean energy solutions and focusing on providing solar energy to vulnerable populations. In this project, the Omdena team partnered up with RA365 with the goal of identifying communities where solar panels would add the most value.

The first task in this challenge was to define what these areas should be: groups of about 4000 people living within a radius of about 500 m, and that are located more than 15 km away from a power grid. Regions close to schools, healthcare centers, and water locations were considered to have a higher ranking of priority, as they can benefit even more from renewable energy implementation.
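The selection criteria above can be sketched as a simple filter. This is a minimal illustration rather than the team's code: the helper names `haversine_km` and `is_candidate`, the dictionary fields, and the reading of "about 4000 people" and "more than 15 km" as hard thresholds are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_candidate(settlement, grid_points, min_people=4000, min_grid_km=15.0):
    """True if a settlement is populous enough and far enough from every grid point.

    `settlement` is a hypothetical dict with "lat", "lon" and "population" keys;
    `grid_points` is a hypothetical list of (lat, lon) samples along power lines.
    """
    nearest_km = min(
        haversine_km(settlement["lat"], settlement["lon"], g_lat, g_lon)
        for g_lat, g_lon in grid_points
    )
    return settlement["population"] >= min_people and nearest_km > min_grid_km
```

Priority ranking by proximity to schools, healthcare centers, and water locations could reuse the same distance helper, but how the team weighted those factors is not described here.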

One of the biggest challenges in the project was the lack of population density data, which made it hard to identify where people need assistance. To find out how the population is distributed in Nigeria and determine who is without access to electricity, the team compared nighttime satellite imagery from NASA Black Marble VIIRS against the geographic location of the population using the Demographic and Health Surveys (DHS) program, ground surveys from WorldPop, and the GRID3 dataset. To locate the national grid, and therefore determine where people live in relation to existing power lines, the team applied machine learning techniques to satellite images of the high-voltage (HV) grid from Development Seed/World Bank.

 

Satellite data from two sources, combined and averaged over a large number of nights and seasons.

 


 

Finally, the team worked on finding which of these towns without electricity supply would meet the criteria established for implementing a local solar energy system. This was done by clustering groups of 4000 people within a 500 m radius using the DBSCAN clustering technique, leading to the identification of over a thousand high-potential regions.
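To make the clustering step concrete, here is a small pure-Python sketch of DBSCAN over geographic points. It is an illustration under stated assumptions, not the team's implementation: it assumes each point carries a population weight and treats a point as a "core" point when the population within 500 m reaches 4000. How the team actually configured DBSCAN is not stated, and a real project would normally use a library implementation instead of this brute-force version.

```python
import math


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    d_phi = phi2 - phi1
    d_lambda = math.radians(q[1] - p[1])
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def dbscan(points, weights, eps_m=500.0, min_weight=4000.0):
    """Weighted DBSCAN: returns one cluster label per point, with -1 marking noise."""
    n = len(points)
    # Brute-force neighbourhoods (O(n^2)); each point is its own neighbour.
    neighbours = [
        [j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]
        for i in range(n)
    ]
    # A core point has enough population inside its eps-neighbourhood.
    is_core = [sum(weights[j] for j in nb) >= min_weight for nb in neighbours]

    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            k = stack.pop()
            if not is_core[k]:
                continue  # border points join a cluster but do not expand it
            for j in neighbours[k]:
                if labels[j] == -1:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    return labels
```

With three nearby settlements of 1500 people each, the combined population passes the threshold and they form one cluster, while an isolated hamlet of 500 people is labelled as noise.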

 


 

Clusters of towns with populations of 4,000 to 15,000 people that are suitable for potential off-grid solar installations in the north of Nigeria.

The Omdena team's deliverable for this project: a prototype interactive map of the whole of Nigeria identifying the regions with a high demand for electricity and a high potential for solar.

The next steps for this project include a detailed survey for the top target areas in order to identify which locations are most suitable both in terms of infrastructure and cost for implementation of solar systems.

A detailed description of this project and its documentation are available in other Omdena publications. See more about the background of this project in this Omdena article.

 

The Impact

The initiative taken by Omdena and Renewable Africa RA365 has the potential of enabling data-driven investments and policy-making that can change the lives of many people in Nigeria and other African countries.

The data and prototype of this project have been shared with the Lagos State Government agency for solar systems, which is now willing to start the process of mass production as early as 2020.

“In order to get this job done, it is not all about providing solutions to these people. We want to make sure that the solutions get to the right people at the right places, and Omdena has really helped us to achieve that.”

Joseph Itopa, Machine Learning Engineer at Renewable Africa RA365

 

2. Use Case: Sentiment Analysis on Energy Transition

The transition away from dependency on CO2 to a more sustainable society dominates news headlines worldwide, exposing conflicting opinions and political measures driving towards a future with cleaner energy sources. Understanding the clean energy transition at a human level is crucial to the effectiveness of whatever steps are taken in the direction of a carbon-free society.

Commissioned by the World Energy Council, the world’s leading member-based global energy network, Omdena explored applications of AI in understanding how people in different regions of the world perceive the energy transition and their role in it.

Using natural language processing (NLP) techniques, the team created tools to scrape, collect, and analyze text about the clean energy transition from social media and news sources (Twitter, YouTube, Facebook, Reddit, and well-known newspapers). This text data was analyzed using several methods, such as sentiment analysis, topic modeling, and clustering, to reveal the challenges, reactions, and attitudes of citizens around the world.

 


Topic “Energy transition” for the USA on Reddit.

 

Visualizations of the results allow for comparisons of sentiments across nations and societies. The analysis first focused on English-speaking countries, as this provides a common basis for comparing text. The countries chosen to represent different continents and development backgrounds were: USA (America), UK (Europe), Nigeria (Africa), and India (Asia).

 


Data Analysis of Twitter data.

 

The word cloud representation of the results shows that among the 4 countries investigated, only Nigeria has prominent tweets about “electricity supply”. Similarly, “gas prices” are specific to the USA. However, “renewable energy” is present in all 4 countries.

Part of the analysis was also expanded to other countries and languages, gathering and analyzing tweets related to complaints about “renewable energy cost” in more than 20 countries. The results revealed how local conditions and culture can differ significantly from place to place. For example, “technology” was the most relevant concern in the complaint tweets in Brazil and France, whereas in Nigeria these tweets were focused solely on “policy”.

 


Complaints about Energy Transition

 

Other short and detailed discussions about this project can be found in Omdena publications.

 

The Impact

Though broad conclusions cannot be drawn from these isolated collections of data, the results point to models and data sets that are promising for further development. The analysis carried out by the Omdena team allowed for a better understanding of how natural language processing techniques can be used to capture the opinions and concerns of people worldwide about the clean energy transition.

“The Council has been interested in how public sentiment on energy issues might be tracked, or if this were even possible. That is where this project came in — the team at Omdena explored the broad brief and have proven that the conceptual idea is possible.”

Martin Young, Senior Director at the World Energy Council

 

The demo day recording

 

 

Collaborators from this project

We thank our partner organizations, Renewable Africa 365 and the World Energy Council, as well as all Omdena collaborators (listed below) who made the project a success.

 

Omdena team members on the Renewable Energy Nigeria project:

  • Anastasis Stamatis, Greece
  • Daniil Khodosko, Canada
  • Peace Bakare, Nigeria
  • John Wu, Australia
  • Siddharth Srivastava, India
  • Simon Mackenzie, UK
  • Hoa Nguyen, Vietnam
  • Takashi Daido, Japan
  • Jessica Alecci, Netherlands/Italy
  • Jack David, UK
  • Shubham Bindal, India
  • Deborah David, France
  • Qi Han, Singapore
  • Stefan Hrouda-Rasmussen, Denmark
  • Varun G P, India
  • Ifeoma Okoh (Ify), Nigeria
  • Suraiya Khan, Canada
  • Ivan Tzompov, Bulgaria
  • Henrique Mendonca, Switzerland
  • Himadri Mishra, India
  • Sai Praveen, India
  • Jaikanth J, India
  • Krithiga Ramadass, India

 

Omdena team members, on the Energy Transition Social Sentiment project:

  • Syed Hassan, UAE
  • Julia Jakubczak, Poland
  • Marek Cichy, Poland
  • Krithiga Ramadass, India
  • Abhishek Deshpande, India
  • Julia Wabant, France
  • Simon Mackenzie, UK
  • Alejandro Bautista Ramos, Mexico
  • Irune Lansorena Sanchez, Spain
  • Vishal Ramesh, India
  • Elizabeth Tishenko, Poland
  • Shashank Agrawal, India
  • Ilias Papadopoulos, Greece
  • Aqueel Jivan, USA
  • Nicholas Musau, Kenya
  • Matteo Bustreo, Italy
  • Mahzad Khoshlessan, USA
  • Yamuna Dulanjani, Sri Lanka
  • Fiona, USA
  • Murindanyi Sudi, Rwanda
  • Raghhuveer Jaikanth, India
  • Abhishek Gupta, USA
  • Aboli Marathe, India
  • Momodou B Jallow, China
  • Jordi Frank, USA
  • Amardeep Singh, Canada
  • Julie Maina, Kenya
 
 
 
 
 

More About Omdena

Omdena is an innovation platform for building AI solutions to real-world problems through the power of bottom-up collaboration.

Demo Day Insights | Matching Land Conflict Events to Government Policies via Machine Learning


By Laura Clark Murray, Joanne Burke, and Rishika Rupam

 

A team of AI experts and data scientists from 12 countries on 4 continents worked collaboratively with the World Resources Institute (WRI) to support efforts to resolve land conflicts and prevent land degradation.

The Problem: Land conflicts get in the way of land restoration

Among its many initiatives, WRI, a global research organization, is leading the way on land restoration — restoring land that has lost its natural productivity and is considered degraded. According to WRI, land degradation reduces the productivity of land, threatening the economy and people’s livelihoods. This can lead to reduced availability of food, water, and energy, and contribute to climate change.

Restoration can return vitality to the land, making it safe for humans, wildlife, and plant communities. While significant restoration efforts are underway around the world, local conflicts get in the way. According to John Brandt of WRI, “Land conflict, especially conflict over land tenure, is a really large barrier to the work that we do around implementing a sustainable land use agenda. Without having clear tenure or ownership of land, long-term solutions, such as forest and landscape restoration, often are not economically viable.”

 

Photo credit: India’s Ministry of Environment, Forest and Climate Change


 

And though governments have instituted policies to deal with land conflicts, knowing where conflicts are underway and how each might be addressed is not a simple task. Says Brandt, “Getting data on where these land conflicts, land degradation, and land grabs occur is often very difficult because they tend to happen in remote areas with very strong language barriers and strong barriers around scale. Events occur in a very distributed manner.” WRI turned to Omdena to use AI and natural language processing techniques to tackle this problem.

 

The Project Goal: Identify news articles about land conflicts and match them to relevant government policies

 

Impact

“We’re very excited that the results from this partnership were very accurate and very useful to us.

We’re currently scaling up the results to develop sub-national indices of environmental conflict for both Brazil and Indonesia, as well as validating the results in India with data collected in the field by our partner organizations. This data can help supply chain professionals mitigate risk in regards to product-sourcing. The data can also help policymakers who are engaged in active management to think about what works and where those things work.” — John Brandt, World Resources Institute.

 

The Use Case: Land Conflicts in India

In India, the government has committed 26 million hectares of land for restoration by the year 2030. India is home to a population of 1.35 billion people, has 28 states, 22 languages, and more than 1000 dialects. In a land as vast and varied as India, gathering and collating information about land conflicts is a monumental task.

The team looked to news stories, using a collection of 65,000 articles from India for the years 2017–2018, extracted by WRI from the GDELT Project (Global Database of Events, Language, and Tone).

 

Identifying news articles about land conflicts

Land conflicts around land ownership include those between the government and the public, as well as personal conflicts between landowners. Other types of conflicts include those between humans and animals, such as humans invading habitats of tigers, leopards, or elephants, and environmental conflicts, such as floods, droughts, and cyclones.

 

 

The team used natural language processing (NLP) techniques to classify each news article in the 65,000 article collection as pertaining to land conflict or not. While this problem can be tackled without the use of any automation tools, it would take human beings years to go through each article and study it, whereas, with the right machine or deep learning model, it would take mere seconds.

A subset of 1,600 newspaper articles from the collection was hand-labeled as “positive” or “negative” to serve as examples of proper classification. For example, an article about a tiger attack would be hand-labeled as “positive”, while an article about local elections would be labeled as “negative”.

To prepare the remaining 63,400 articles for an AI pipeline, each article was pre-processed to remove stop words, such as “the” and “in”, and to lemmatize words, returning them to their root form. Co-reference resolution was also applied during pre-processing to increase accuracy. A topic modeling approach was used to further categorize the “positive” articles by type of conflict, such as Land, Forest, Wildlife, Drought, Farming, Mining, and Water. With refinement, the classification model achieved an accuracy of 97%.
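The pre-processing described above can be illustrated with a toy example. This is only a sketch: the stop-word set and the lemma lookup table below are tiny, made-up stand-ins, and a real pipeline would rely on an NLP library's stop-word list and lemmatizer. Which tools the team used is not stated here.

```python
import re

# Hypothetical, deliberately tiny stand-ins for a real stop-word list and lemmatizer.
STOP_WORDS = {"the", "in", "a", "an", "of", "and", "to", "over", "is", "are", "was", "were"}
LEMMAS = {"farmers": "farmer", "protested": "protest", "forests": "forest", "lands": "land"}


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and map each token to its root form."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(token, token) for token in tokens if token not in STOP_WORDS]
```

For instance, `preprocess("The farmers protested in the forests")` keeps only the three content words, each reduced to its root form.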

 

 

With the subset of land conflict articles successfully identified, NLP models were built to identify four key components within each article: actors, quantities, events, and locations. To train the model, the team hand-labeled 147 articles with these components. Using an approach called Named Entity Recognition, the model processed the database of “positive” articles to flag these four components.

 

 

 

Matching land conflict articles to government policies

Numerous government policies exist to deal with land conflicts in India. The Policy Database was composed of 19 policy documents relevant to land conflicts in India, including policies such as the “Land Acquisition Act of 2013”, the “Indian Forest Act of 1927”, and the “Protection of Plant Varieties and Farmers’ Rights Act of 2001”.

 

 

A text similarity model was built to compare two text documents and determine how close they are in terms of context or meaning. The model made use of the “Cosine similarity” metric to measure the similarity of two documents irrespective of their size.
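As a rough illustration of the metric, the sketch below computes cosine similarity over raw word counts and picks the closest policy for an article. The function names and the bag-of-words representation are assumptions made for the example; the source does not say how the team vectorized the documents.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Bag-of-words representation: lowercase word -> number of occurrences."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    va, vb = term_counts(text_a), term_counts(text_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def best_policy(article_text, policies):
    """Return the name of the policy whose text is most similar to the article.

    `policies` is a hypothetical dict mapping policy name -> policy text.
    """
    return max(policies, key=lambda name: cosine_similarity(article_text, policies[name]))
```

Because the score depends only on the angle between the vectors, a short article and a long policy document that use the same words in the same proportions still score close to 1, which is what "irrespective of their size" refers to.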

The Omdena team built a visual dashboard to display the land conflict events and the matching government policies. In this example, the tool displays geo-located land conflict events across five regions of India in 2017 and 2018.

 

 

Underlying this dashboard are the NLP models that classify news articles related to land conflict and land degradation and match them to the appropriate government policy.

 

 

The results of this pilot project have been used by the World Resources Institute to inform their next stage of development.


Want to watch the full demo day?

Check out the entire recording (including a live demonstration of the tool).

 

Demo Day Insights | How COVID-19 Pandemic Policies Affected the Vulnerable Populations


By Devika Bhatia & Laura Clark Murray

 

A team of 28 AI experts and data scientists collaborated to gauge the impact of COVID-19 pandemic policies on vulnerable populations, looking for correlations and encouraging data-driven policymaking to lessen adversity for the most vulnerable populations around the world.

The entire data analysis including a live demonstration can be found in the demo day recording at the end of the article.

 

COVID-19 pandemic policy impacting the world’s vulnerable populations

At the onset of the pandemic in 2020, the World Health Organization urged governments to take “urgent and aggressive action” against COVID-19. Many governments reacted with strict measures such as closing borders and quarantining entire cities. Governments all over the world enacted these policies without fully analyzing the factors that impact their effectiveness. Nor did they consider how these policies might deepen the problems for vulnerable populations in different regions.

 

The project goal: Conduct data-driven impact-analyses on how various pandemic policies affect the well-being of vulnerable populations.

 

Defining “Vulnerability”

An important step of the project was to define “vulnerability” with respect to the particular context. The project focused on the factors of employment and wage loss, access to health, and domestic violence. To identify the vulnerable population for each of these categories, the team looked to the Inequality-adjusted Human Development Index and considered populations above 65 years of age and women.

 

Source: UNDP

 

 

 

 

Assessing policies and their effects

The team looked at 17 types of policies from the Oxford COVID-19 Government Response Tracker, across the categories of containment, economic response, and health systems. The policies explored included closing of public transportation, stay-at-home requirements, income support, COVID-19 testing policy, and emergency investment in healthcare.

To analyze the effects of these policies, three key aspects were considered:

  • Time of policy enactment: comparing the time of policy enactment with the effect on a target variable
  • Stringency metric: the degree of intensity of the policies enacted
  • Google Mobility Dataset: quantifies the movement of people in places (e.g. grocery stores vs. parks)

 

Domestic violence as a ‘Shadow pandemic’

The analysis indicates that domestic violence is a growing shadow pandemic: countries showed a relationship between a decrease in mobility and an increase in Google search rates for relevant topics, coupled with an increase in the number of domestic violence-related articles.

The number of news articles related to both COVID-19 and domestic violence started to increase a couple of weeks after the first lockdown measures were implemented in Europe (end of February).

 

Figure 1: Ratio of news articles by date.

 

The data shows a strong relationship between a decrease in mobility and an increase in Google search rates of domestic abuse topics in many countries. In the countries considered, other than Japan, the peak in search rates has doubled or even tripled, as seen in these graphs of the data from France and India.

 

Figure 2: Search trend and mobility change (%) by date for France, with two policy categories: school closing and workplace closing.

 

Figure 2: Search trend and mobility change (%) by date for India, with two policy categories: school closing and workplace closing.
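One simple way to quantify a relationship like the one between mobility and search rates is a Pearson correlation coefficient. The sketch below is illustrative only: the source does not say which statistic the team used, and the numbers in the usage note are invented to show the call, not taken from the project data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)
```

Called with a weekly mobility change series such as `[0, -10, -20, -30]` and a search interest series such as `[10, 20, 30, 40]` (both made up), the function returns a value near -1, meaning that search interest rises as mobility falls. A coefficient alone does not establish causation.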

 

The results indicate that the problem of domestic violence could be much bigger than indicated by news stories.

 

Access to healthcare

Another angle in this challenge was access to healthcare, specifically for non-COVID patients. The team sought to understand how pandemic-response policy measures affected it. The vulnerable population was defined based on age, existing chronic medical conditions, and physical access to care facilities. The analysis focused on England and Wales, where significant relevant data was available.

It was found that mortality among patients with non-COVID chronic diseases was higher during the pandemic than for the same group in previous years. The data shows a correlation among medical appointment status (whether an appointment was kept, changed, or canceled), the stringency of the pandemic policies enacted for the region, and the mobility of the population in that region. In other words, the stringency of pandemic policy and the resulting restrictions on mobility may cause the medically vulnerable to miss or avoid regular medical care, and this may be contributing to the increase in non-COVID deaths among this group.

 

 

 

The economic impact of pandemic policies

Closures, lockdowns, and decreased mobility have led to wage and employment loss. Though some governments have instituted income support policies, the timing of that aid correlates with employment loss. In countries where income support policies were put in place at roughly the same time as stringent lockdown policies such as workplace closings, the unemployment rate remained relatively flat. This was the case, for example, in Sweden and Belgium. In contrast, a delay in the implementation of income support policies correlates with an increased unemployment rate, as was seen in the United States.

Income support policies may affect individuals in the labor force differently. Many countries have undergone employment and wage loss in the informal economy, wherein enterprises, jobs, and workers are not protected by the state.

The team set out to identify the most economically vulnerable populations in this context. The analysis focused on those countries with stringent lockdowns that have implemented income support policies, and in which the population works in sectors highly-impacted by the pandemic policy, such as accommodation and food service, manufacturing, and retail trade.

Some of the results of this analysis are represented here by a mapping of countries according to the stringency of their pandemic policies and the share of their labor force in highly impacted sectors. Each country is represented as a circle, the color and size of which indicate the vulnerability of the workforce in terms of the share of the workforce involved in informal labor.

 

Vulnerability ranked by Informality Rate

 


Figure 3: The circle size denotes vulnerability, defined in terms of percentage of workers in high impact sectors and share of the workforce involved in informal labor.

 

This type of typology of the vulnerability of labor forces during the pandemic may be useful in indicating which groups to attend to with income support policies.

 

Conclusions

While government lockdown policies were designed to slow the spread of COVID-19, they had direct and indirect negative effects on their populations.

  • We found that non-COVID deaths of those with existing health conditions and considerations increased during the pandemic, for the population studied. For this medically vulnerable population, we found that the stringency of lockdown policy and the level of mobility within a locality are related to the delivery of non-COVID, potentially life-saving healthcare.
  • Domestic violence emerged as a growing “shadow pandemic”. We found a strong relationship between a decrease in mobility of a population and indicators of domestic violence.

 

To offset the economic impact of anti-contagion policies, many governments instituted income support policies.

  • We determined that the timing of income support policies mattered. For the locations studied, when income support policies were implemented at the same time as lockdown measures, unemployment rates stayed flat. In contrast, in countries where income support policies were delayed, unemployment rate curves remained steep even after policy implementation.
  • The team created economic vulnerability assessments of countries, by considering the stringency of lockdown policies and the share of the labor force involved in highly-impacted sectors and in the informal economy. Income support policies may be more effective when such vulnerability is considered.

 

Our objective with these results is to support policymakers in finding the most effective ways to minimize the suffering of those most vulnerable.

 

Find all insights in the demo day recording

 
All Collaborators from this project

We thank our partner organizations, AI for Peace, SH4P, and PWG, as well as all Omdena collaborators who made the project a success.


 


Student Debt Crisis: Analysing Sentiments, Repayment, and Wellbeing of Students through AI | ShapingEDU


by Nancy Rubin, Arizona State University ShapingEDU Innovator in Residence

 

The “Applying AI to the Student Debt Crisis” project is entering week 5 and we are moving along at a fast pace. With just 3 more weeks to go in our eight-week project, some areas of focus are emerging.

It is impressive to see how much can be accomplished in a short period of time. I credit the progress to a fully engaged community and offer appreciation to the Omdena project Shepherds who keep everyone moving in a positive direction.

Each week, project update calls get more exciting as work is done and ideas exchanged.

Sentiment analysis on social media sites, such as Twitter and Reddit, is leading to a deeper understanding of feelings about student debt. Mining news sites has revealed sentiment about policies relating to student loans and student loan debt, which I hope to hear more about in this week’s update.

 


Figure 1: Initial Word Cloud of Sentiment about Student Debt

 

Several groups are looking at loan repayment data. One group is exploring demographic data to predict repayment capabilities so students can better understand loans and repayment obligations. Another group is gathering income levels and spending habit data of demographic groups in the United States so they can create a regression model to estimate, without bias, a comfortable percentage of income needed for college in times of student debt crisis.

As the participation of women, minorities, and first-generation college students in higher education increases, it appears these groups are:

  • less likely to get well-paid jobs;
  • more prone to other debt; and
  • subject to higher levels of stress, which impacts their wellbeing.

Recent job losses due to COVID-19 have affected these groups disproportionately. The impacts of the coronavirus on the debt crisis can be seen in both sentiment analysis and the ability to repay loans. A group looking at this more closely will be able to share information that could be relevant for students considering loans after COVID-19 ends.

 


Figure 2: Will COVID-19 impact borrowers

 

Student loan debt is stressful even when there aren’t additional factors, such as a pandemic, adding to it. Through an exploratory data analysis of academic research, one group is assessing whether the student loan debt crisis has a statistically significant impact on mental health.

 

 

I have been working with team members to decide how best to surface the results of this project when it ends. If you have any suggestions, please share them with me. There will be informational outcomes to educate students about saving for college, educational materials about the implications of loans, and repayment guidance informed by demographic insights. Hopefully, specific policy recommendations based on real data will be available as well.

This project is hosted by Arizona State University ShapingEDU. ShapingEDU is a community of dreamers, doers, and drivers shaping the future of learning in the digital age.

 


 

 

Domestic Violence - The Shadow Pandemic of COVID-19


 

By Omdena Collaborator Elke Klaassen


 

 

The Problem: Effects of policy measures on the vulnerable population

 

To prevent the spread of COVID-19, many governments have taken strict measures such as closing borders, imposing nationwide lockdowns, and setting up quarantine facilities. While these measures may ensure that social distancing is taken seriously, they may have indirect effects on the economy and adverse effects on the well-being of people, especially vulnerable populations. To help governments make data-driven policy decisions on issues that arise during COVID-19, such as domestic violence, Omdena provided a platform for AI experts, data scientists, and domain experts to study the effects of COVID-19 policy measures on vulnerable populations. This article describes the results of one of the many facets of this challenge, which focused on the impact of COVID-19 on domestic violence.

The goal of this task was to get a better grip on domestic violence during COVID-19 and gauge the scale of the problem. To this end, several data sources were used, including news articles, policy data, mobility trends, and domestic violence search rates. The results indicate that the problem of domestic violence could be much bigger than indicated by some of the key figures mentioned in the news. Further, restrictions on movement and strict enforcement of lockdowns may have amplified the issue. Domestic violence can be described as a shadow pandemic, and it is essential to understand the gravity of the problem and to ensure redress and support for survivors and vulnerable populations.

 

 

Domestic violence: a growing shadow pandemic of COVID-19

UN Women recently labeled the increase in violence against women ‘a growing shadow pandemic’. As a consequence of COVID-19 lockdown measures, many victims find themselves confined in close proximity to their abusers. The world is witnessing a sharp rise in the number of helpline calls and domestic violence reports, as illustrated in the following infographic. This highlights the pressing need to reflect on the pre-existing and growing incidence of domestic violence and to sensitize organizations and communities at the grassroots level to provide help and support.

 

 

Infographic on Covid19 and domestic violence adapted from the UN Women.


 

 

 

The shadow pandemic’s size— news coverage

The news is replete with reports of domestic violence and its surge during the pandemic. In early March, the increase in domestic violence in China received news coverage. In Hubei Province, the number of reported cases in February had tripled compared to the same period the previous year. Weeks later, similar articles appeared from all over the world.

To get a first grip on the gravity and spread of this shadow pandemic, a dataset of about 80,000 Covid-19-related news articles was used. This dataset was created using GDELT to query relevant articles and news-please to extract their contents. The dataset has been used for different analyses in the Omdena AI pandemic challenges. To identify the news articles related to domestic violence, the corpus was filtered based on domestic violence-related keywords. In total, 1,500 articles were linked to both Covid-19 and domestic violence.
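The keyword filter described above can be sketched as follows. This is a minimal illustration only: the keyword list, the article dictionaries, and the names `DV_KEYWORDS` and `filter_articles` are assumptions, since the keywords actually used are not listed in this article.

```python
import re

# Hypothetical keyword list; the keywords actually used are not given in the article.
DV_KEYWORDS = ["domestic violence", "domestic abuse", "intimate partner violence"]

# One compiled, case-insensitive pattern that matches any of the keywords.
DV_PATTERN = re.compile("|".join(re.escape(k) for k in DV_KEYWORDS), re.IGNORECASE)


def filter_articles(articles):
    """Return the articles whose text mentions at least one keyword."""
    return [a for a in articles if DV_PATTERN.search(a["text"])]


# Toy corpus standing in for the Covid-19 news dataset.
corpus = [
    {"date": "2020-03-02", "text": "Domestic violence reports rise during lockdown."},
    {"date": "2020-03-02", "text": "Borders close as Covid-19 cases increase."},
    {"date": "2020-03-09", "text": "Helplines see more calls about domestic abuse."},
]
dv_articles = filter_articles(corpus)
print(len(dv_articles), "of", len(corpus), "articles mention domestic violence")
```

A plain keyword match like this is easy to audit but can miss paraphrases, which is one reason a topic model is a useful follow-up check on the filtered subset.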

 

 

Covid19 and domestic violence-related articles

To assess the relevance of the subset of domestic violence-related news articles, LDA topic modeling was performed using gensim. Three topics were modeled, and one of these clearly shows that the subset covers domestic violence. The word cloud of this topic is shown in the figure.

 

Graph between Number of news articles and Date

Number of both Covid19 and domestic violence-related news articles over time.

 

The absolute increase in domestic violence-related articles

The number of news articles related to both Covid-19 and domestic violence started to increase a couple of weeks after the first lockdown measures were implemented in Europe (end of February).

 

Relative increase

The increasing trend in domestic violence-related articles could simply reflect an overall increase in Covid-19-related articles. To study whether the topic of domestic violence has become more dominant in the discussion, the ratio of domestic violence-related articles to the total number of Covid-19-related articles is plotted. An increasing trend can be observed, indicating that the issue of domestic violence has become more prominent in the coverage since the onset of the pandemic.
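The ratio described above can be computed per date as in the sketch below. The dates and counts are toy values chosen for illustration, not figures from the study, and the function name `ratio_per_date` is an assumption.

```python
from collections import Counter


def ratio_per_date(dv_dates, all_dates):
    """Share of Covid-19 articles on each date that also cover domestic violence."""
    dv_counts = Counter(dv_dates)
    all_counts = Counter(all_dates)
    # Dates with no domestic violence article get a ratio of 0.0.
    return {date: dv_counts[date] / total for date, total in sorted(all_counts.items())}


# Toy data: one publication date per article.
all_dates = ["2020-03-01"] * 100 + ["2020-04-01"] * 200
dv_dates = ["2020-03-01"] * 1 + ["2020-04-01"] * 6

ratios = ratio_per_date(dv_dates, all_dates)
for date, ratio in ratios.items():
    print(date, f"{ratio:.2%}")
```

In this toy example the absolute count rises from 1 to 6 while the total only doubles, so the ratio rises as well; a flat ratio would instead suggest the growth is driven by overall Covid-19 coverage.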

 

 

Graph between Ratio of Domestic Violence News Articles and Date

Domestic violence-related news articles relative to Covid19-related news articles.

 

 

The shadow pandemic’s size — search rates

The data mentioned in the news is typically in summary form, similar to the key figures shown in the Infographic of UN Women. To get a more detailed grip on the extent and size of the shadow pandemic, different datasets were used:

  • Policy data:
    Oxford COVID-19 Government Response Tracker (OxCGRT), which covers the policy measures taken in 152 countries (accessed on May 8, 2020).
  • Mobility data:
    Google COVID-19 Community Mobility Reports, which indicate the percentage changes in mobility patterns in 132 countries (accessed on May 8, 2020). The data is relative (_rel) to the mobility patterns between January 7 and February 7, 2020. To limit stochasticity, a moving average (_ma) filter of 7 days (1 week) was applied.
  • Search data:
    Google Trends data, which indicates the search trend of a certain topic over time (accessed on May 8, 2020). To get the percentage change (_rel) in search rates, this data is made relative to a baseline period as well (Jan 3 to Feb 13). To remove stochasticity, a moving average filter (_ma) of 14 days (2 weeks) was applied to the Google Trends data.
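The two preprocessing steps named above, making a series relative to a baseline period (_rel) and smoothing it with a moving average (_ma), can be sketched as follows. This is an illustration under assumptions: the article does not say whether the window was trailing or centered, so a trailing window is used here, and the numbers are toy values.

```python
def relative_change(values, baseline):
    """Percentage change of each value relative to the mean of the baseline period."""
    base = sum(baseline) / len(baseline)
    return [100.0 * (v - base) / base for v in values]


def moving_average(values, window):
    """Trailing moving average; early points average over the data available so far."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Toy search-interest values: a baseline period followed by the study period.
baseline_period = [50, 50]
study_period = [50, 75, 100]

search_rel = relative_change(study_period, baseline_period)
search_rel_ma = moving_average(search_rel, window=2)
print(search_rel)     # percentage change versus the baseline mean
print(search_rel_ma)  # smoothed series
```

For the actual data, the window would be 7 days for the mobility series and 14 days for the Google Trends series, as stated in the list above.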

 

The data analysis focuses on countries that are present in all three datasets and that have sufficient Google Trends data available. Countries were required to have data available for at least 50% of the considered time period (Jan 3 to May 8). This resulted in a total of 53 countries.

The search trend data is relevant for studying the scale of the problem in situations where a person is searching for help, has access to the internet, and has a certain level of trust that societal organizations can offer help. The last two conditions are not met to the same degree in every country. This is reflected, among other things, in the Human Development Data, for example in the percentage of the (female) population that has access to the internet. Hence, the results should be read with these conditions and caveats in mind.

Further, the use of search rates has a clear advantage. A victim’s path to seeking and receiving help is expected to consist of several steps, and each succeeding step requires more courage. The most basic step might be to browse the web for ways to deal with and seek help for domestic violence. Hence, search rate data might reflect the scale of the real problem more accurately than the number of domestic violence reports, because a search query is probably the first step a victim takes in seeking assistance.

 

Correlation between policy measures, mobility, and domestic violence search rates

 

The first step in the analysis is to study correlations between the different features in the dataset. The correlation plot for France is shown below. A strong negative correlation (-0.95) between workplace mobility and domestic violence search rates can be observed. As expected, workplace mobility also correlates strongly with the workplace closing policy measure implemented by the government.
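As an illustration of the statistic behind such a correlation plot, the sketch below computes the Pearson correlation coefficient between two series. The article does not state which correlation measure was used, so Pearson is an assumption here, and the two series are toy values rather than the French data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Toy series: mobility falls (percent versus baseline) while search rates rise.
workplace_mobility = [0, -10, -20, -30, -40]
dv_search_rate = [0, 15, 25, 45, 55]

r = pearson(workplace_mobility, dv_search_rate)
print(round(r, 3))
```

A value close to -1 means the two series move in opposite directions almost linearly; it does not by itself establish that one causes the other.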

 

 

Correlation plot of the different features of the policy, mobility, and search rate dataset (France).


 

 

Graph between Search Trend and Mobility Change and Date

Policy measures, mobility, and search rate trends over time (France).

 

In the figure, the trend of workplace mobility and domestic violence search rates is visualized over time. The negative correlation between both variables is illustrated by the decrease in workplace mobility, while at the same time there is an increase in domestic violence search rates. Compared to the baseline, search rates almost doubled (100% increase). This indicates that the incidence of searching for information related to domestic violence increased with the decline in workplace mobility and as people found themselves stuck at home.

 

Regression models to quantify the effect of mobility on domestic violence search rates

Regression models were used to assess the size and significance of the relationship between workplace mobility and domestic violence search rates.

 

Regression model results of the impact of mobility on domestic violence search rates (France).


 

The straight line in the scatter plot shows the output of the regression model for the case study of France. The relationship between mobility and domestic violence search rates is significant, and the slope indicates that with every 1% decrease in mobility, domestic violence search rates increase by 1.4%.
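To show how such a slope is read, the sketch below fits a straight line by ordinary least squares. It assumes a simple one-variable linear model, which the article does not spell out, and it leaves out the significance test. The data points are constructed so that the slope comes out at -1.4; they are not the French data.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Constructed example: mobility change (%) versus search rate change (%).
mobility = [0.0, -10.0, -20.0, -30.0]
search_rate = [0.0, 14.0, 28.0, 42.0]

slope, intercept = fit_line(mobility, search_rate)
print(slope, intercept)
```

A slope of -1.4 means that each additional 1% drop in mobility goes together with a 1.4% rise in search rates, which matches the interpretation given for France above.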

The results of the models for the countries in the top 10 and bottom 10 are listed below. In the top 10 countries, decreasing mobility correlates with a steep increase in domestic violence search rates. In the bottom 10 countries, the opposite trend is observed: mobility and domestic violence search rates decrease at the same time. To further study and explain the results of the different models, the individual plots for six countries from each of the top 10 and bottom 10 are shown in the next sections.

 

 

Tabular format describing top 10 countries to bottom 10 countries defining Pvalues, Coefficient, Country, and Significance.

 

 

Countries illustrating a strong relationship between a decrease in mobility and an increase in domestic violence

The individual figures for the first six among the top 10 countries are shown. These countries have a strong relationship between mobility decrease and domestic violence increase.

 

Graph Between Search Trend and Mobility Change % vs date for 6 countries namely, Vietnam, Japan, South Africa, Germany, France, and Belgium.

 

  • With the exception of Japan, the peak in search rates has doubled or even tripled in each of the illustrated countries.
  • Although the coefficient in Japan is relatively high, the peak in search rate is ‘just’ 60%. This is due to a relatively limited decrease in mobility, likely due to less strict lockdown measures in this country.
  • Vietnam stands out with a peak in domestic violence search rates of more than triple the baseline. The issue of domestic violence in light of social distancing in Vietnam is stressed in this article as well, stating that the number of people who are in need of shelter has doubled compared to 2018 and 2019.
  • The figures for Germany, France, Belgium, and South Africa clearly illustrate the increasing trend in domestic violence search rates as mobility drops.

 

Countries not illustrating a relationship between a decrease in mobility and an increase in domestic violence

The individual figures for the final six countries among the bottom 10 countries are displayed below and show a positive relationship between mobility and domestic violence.

 

Graph Between Search Trend and Mobility Change % vs date for 6 countries namely, Australia, Thailand, South Korea, Jamaica, El Salvador, and Philippines.

 

  • First, the plot for Australia stands out: domestic violence search rates rose sharply towards the end of February. This sudden rise is assumed to be a consequence of the bushfires that occurred around this time. This relationship is also expressed in the article ‘The bushfires’ hidden aftermath: Surging risk of domestic abuse’.
  • In South Korea, lockdown measures could be considered to be more targeted instead of strict blanket measures, and this could explain the unique trend displayed for this country as compared to the others.
  • For the Philippines, Thailand, El Salvador, and Jamaica, a simultaneous drop in domestic violence search rates and mobility is visible. This does not mean that there have been fewer domestic violence incidents; various other factors can influence the observed search rate trends. For example, the peaks in search rates in these countries in late February and early March could be explained by the (media) attention given to domestic violence around International Women’s Day on March 8. There was a large turnout for the marches held that day, both in Asia and Latin America.
 

 

Action is needed to mitigate the increase in domestic violence

This article studies the impact of the Covid-19 global pandemic on domestic violence. The increase in domestic violence can be viewed as the ‘growing shadow pandemic’. This is stressed by the news as well — there is an increasing trend in the number of articles that cover the issue. Some of these articles give insight into the gravity and scale of the ‘growing shadow pandemic’ in summary form. For example, the Infographic of UN Women, shown at the beginning of this article, mentions that in France, Argentina, Cyprus and Singapore domestic violence emergency calls and reports have increased by more than 30%.

 

The results indicate that the problem of domestic violence could be much bigger than indicated by some of the key figures in the news

The data analysis of Google mobility and search rate trends shows that the effect of lockdown measures, such as the closing of workplaces, on domestic violence search rates can be much higher than 30%. In countries where the inverse relationship between the decrease in mobility and the increase in domestic violence search rates is strongest, search rates have doubled, and in some cases more than tripled. A search query could be considered the most accessible step in seeking help. This could explain why the results in this article indicate that the problem of domestic violence could be much bigger than the previously mentioned key figures suggest.

It is important to note that many other factors can influence the search rate results. The extent to which search rates accurately reflect the growing scale of the problem of domestic violence also depends on the situation in each country. As stated before, a victim is only expected to perform a search query if they have access to the internet and a certain level of trust that societal organizations can offer help. These assumptions could explain why a strong relationship is found in many European countries in this study.

The aim of this work is to help build awareness on the issue of domestic violence. Although some countries have adopted steps to mitigate problems, the results clearly indicate that the issues persist. In this light, the UN recently published a brief with ‘recommendations to be considered by all sectors of society, from governments to international organizations and to civil society organizations in order to prevent and respond to violence against women and girls, at the onset, during, and after the public health crisis with examples of actions already taken’.

 

 

 

More About Omdena

 

Omdena is an innovation platform for building AI solutions to real-world problems through the power of bottom-up collaboration.

 
 

Matching Land Conflict Events to Government Policies via Machine Learning | World Resources Institute


By Laura Clark Murray, Nikhel Gupta, Joanne Burke, Rishika Rupam, Zaheeda Tshankie

 


Project Overview

This project aimed to provide a proof-of-concept machine-learning-based methodology to identify land conflict events in a given geography and match those events to relevant government policies. The overall objective is to offer a platform where policymakers can be made aware of land conflicts as they unfold and identify existing policies that are relevant to the resolution of those conflicts.

Several Natural Language Processing (NLP) models were built to identify and categorize land conflict events in news articles and to match those land conflict events to relevant policies. A web-based tool that houses the models allows users to explore land conflict events spatially and through time, as well as explore all land conflict events by category across geography and time.

The geographic scope of the project was limited to India, which has the most environmental (land) conflicts of all countries on Earth.

 

Background

Degraded land is “land that has lost some degree of its productivity due to human-caused process”, according to the World Resources Institute. Land degradation affects 3.2 billion people and costs the global economy about 10 percent of global gross product each year. While dozens of countries have committed to restoring 350 million hectares of degraded land, land disputes are a major barrier to effective implementation. Without streamlined access to land use rights, landowners are not able to implement sustainable land-use practices. In India, where 21 million hectares of land have been committed to restoration, land conflicts affect more than 3 million people each year.

AI and machine learning offer tremendous potential not only to identify land-use conflict events but also to match them with suitable policies for their resolution.

 

Data Collection

All data used in this project is in the public domain.

  • News Article Corpus: Contained 65,000 candidate news articles from Indian and international newspapers from the years 2008, 2017, and 2018. The articles were obtained from the Global Database of Events Language and Tone Project (GDELT), “a platform that monitors the world’s news media from nearly every corner of every country in print, broadcast, and web formats, in over 100 languages.” All the text was either originally in English or translated to English by GDELT.
  • Annotated Corpus: Approximately 1,600 news articles from the full News Article Corpus were manually labeled and double-checked as Negative (no conflict news) and Positive (conflict news).
  • Gold Standard Corpus: An additional 200 annotated positive conflict news articles, provided by WRI.
  • Policy Database: Collection of 19 public policy documents related to land conflicts, provided by WRI.

 

Approach

 

Text Preparation

 

In this phase, the articles of the News Article Corpus and policy documents of the Policy Database were prepared for the natural language processing models.

The articles and policy documents were processed using spaCy, an open-source library for natural language processing, to achieve the following:

  • Tokenization: Segmenting text into words, punctuation marks, and other elements.
  • Part-of-speech (POS) tagging: Assigning word types to tokens, such as “verb” or “noun”.
  • Dependency parsing: Assigning syntactic dependency labels to describe the relations between individual tokens, such as “subject” or “object”.
  • Lemmatization: Assigning the base forms of words, regardless of tense or plurality.
  • Sentence Boundary Detection (SBD): Finding and segmenting individual sentences.
  • Named Entity Recognition (NER): Labelling named “real-world” objects, like persons, companies, or locations.

 

Coreference resolution was applied to the processed text data using Neuralcoref, which is based on an underlying neural net scoring model. With coreference resolution, all common expressions that refer to the same entity were located within the text. All pronominal words in the text, such as her, she, he, his, them, their, and us, were replaced with the nouns to which they referred.

 

For example, consider this sample text:

“Farmers were caught in a flood. They were tending to their field when a dam burst and swept them away.”

Neuralcoref recognizes “Farmers”, “they”, “their” and “them” as referring to the same entity. The processed sentence becomes:

“Farmers were caught in a flood. Farmers were tending to their field when a dam burst and swept farmers away.”

 

Coreference resolution of sample sentences

 

 

Document Classification

 

The objective of this phase was to build a model to categorize the articles in the News Article Corpus as either “Negative”, meaning they were not about conflict events, or “Positive”, meaning they were about conflict events.

After preparation of the articles in the News Article Corpus, as described in the previous section, the texts were then prepared for classification.

First, an Annotated Corpus was formed to train the classification model. A subset of approximately 1,600 articles from the News Article Corpus was manually labeled as “Negative” or “Positive”.

To prepare the articles in both the News Article Corpus and Annotated Corpus for classification, the previously pre-processed text data of the articles was represented as vectors using the Bag of Words approach. With this approach, the text is represented as a collection, or “bag”, of the words it contains along with the frequency with which each word appears. The order of words is ignored.

For example, consider a text article comprised of these two sentences:

Sentence 1: “Zahra is sick with a fever.”

Sentence 2: “Arun is happy he is not sick with a fever.”

This text contains a total of ten distinct words: “Zahra”, “is”, “sick”, “happy”, “with”, “a”, “fever”, “not”, “Arun”, “he”. Each sentence in the text is represented as a vector, where each index in the vector indicates how often one particular word appears in that sentence, as illustrated below.

 

 

 

With this technique, each sentence is represented by a vector, as follows:

“Zahra is sick with a fever.”

[1, 1, 1, 0, 1, 1, 1, 0, 0, 0]

“Arun is happy he is not sick with a fever.”

[0, 2, 1, 1, 1, 1, 1, 1, 1, 1]
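The two vectors above can be reproduced with a few lines of code. This is a minimal sketch of the Bag of Words idea using only the standard library. It is not the project’s actual vectorizer, which the article does not name, and the function name `bag_of_words` is an assumption. The vocabulary order is the one listed in the example.

```python
import re
from collections import Counter

# Vocabulary in the order used in the example above.
VOCAB = ["Zahra", "is", "sick", "happy", "with", "a", "fever", "not", "Arun", "he"]


def bag_of_words(sentence, vocab=VOCAB):
    """Count how often each vocabulary word occurs in the sentence (word order ignored)."""
    counts = Counter(re.findall(r"[A-Za-z]+", sentence))
    return [counts[word] for word in vocab]


print(bag_of_words("Zahra is sick with a fever."))
# [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]
print(bag_of_words("Arun is happy he is not sick with a fever."))
# [0, 2, 1, 1, 1, 1, 1, 1, 1, 1]
```

Note the 2 in the second vector: “is” appears twice in that sentence, so its count, not just its presence, is recorded.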

With the Annotated Corpus vectorized with this technique, the data was used to train a logistic regression classifier model. The trained model was then used with the vectorized data of the News Article Corpus, to classify each article into Positive and Negative conflict categories.

The accuracy of the classification model was measured by looking at the percentage of the following:

  • True Positive: Articles correctly classified as relating to land conflicts
  • False Positive: Articles incorrectly classified as relating to land conflicts
  • True Negative: Articles correctly classified as not being related to land conflicts
  • False Negative: Articles incorrectly classified as not being related to land conflicts

 

The “precision” of the model indicates how many of the articles classified as being about land conflict were actually about land conflict. The “recall” of the model indicates how many of the articles that were actually about land conflict were classified correctly. An f1-score was calculated from the precision and recall scores.
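These three metrics follow directly from the counts of true positives, false positives, and false negatives. The sketch below uses the standard definitions, with the f1-score as the harmonic mean of precision and recall. The counts are hypothetical and are not taken from the project’s classification report.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and f1-score from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts for illustration only.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

True negatives do not enter any of the three formulas, which is why these metrics stay informative when conflict articles are a small share of the corpus.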

The trained logistic regression model successfully classified the news articles with precision, recall, and f1-score of 98% or greater. This indicates that the model produced a low number of false positives and false negatives.

 

Classification report using a test dataset and logistic regression model

 

 

Categorize by Land Conflict Events

The objective of this phase was to build a model to identify the set of conflict events referred to in the collection of positive conflict articles and then to classify each positive conflict article accordingly.

A word cloud of the articles in the Gold Standard Corpus gives a sense of the content covered in the articles.

A topic model was built to discover the set of conflict topics that occur in the Positive conflict articles. To maximize the accuracy of the classification process, we chose CorEx (Correlation Explanation), a semi-supervised topic model that allows domain knowledge, specified as relevant keywords acting as “anchors”, to guide the topic analysis.

To align with the Land Conflicts Policies provided by WRI, seven relevant core land conflict topics were specified. For each topic, correlated keywords were specified as “anchors” for the topic.

 

 

 

The trained topic model provided 3 words for each of the seven topics:

  • Topic #1: land, resettlement, degradation
  • Topic #2: crops, farm, agriculture
  • Topic #3: mining, coal, sand
  • Topic #4: forest, trees, deforestation
  • Topic #5: animal, attacked, tiger
  • Topic #6: drought, climate change, rain
  • Topic #7: water, drinking, dams

The resulting topic model is 93% accurate. This scatter plot uses word representations to provide a visualization of the model’s classification of the Gold Standard Corpus and hand-labeled positive conflict articles.

 

Visualization of the topic classification of the Gold Standard Corpus and Positive Conflict Articles

 

 

Identify the Actors, Actions, Scale, Locations, and Dates

The objective of this phase was to build a model to identify the actors, actions, scale, locations, and dates in each positive conflict article.

Typically, names, places, and famous landmarks are identified through Named Entity Recognition (NER). Recognition of such standard entities is built into spaCy’s NER component, which our model used to detect the locations and dates in the positive conflict articles. The specialized content of the news articles required further training with “custom entities”, those particular to this context of land conflicts.

All the positive conflict articles in the Annotated Corpus were manually labeled for “custom entities”:

  • Actors: Such as “Government”, “Farmer”, “Police”, “Rains”, “Lion”
  • Actions: Such as “protest”, “attack”, “killed”
  • Numbers: Number of people affected by a conflict

This example shows how this labeling looks for some text in one article:

 

 

These labeled positive conflict articles were used to train our custom entity recognizer model. That model was then used to find and label the custom entities in the news articles in the News Article Corpus.

 

Match Conflicts to Relevant Policies

The objective of this phase was to build a model to match each processed positive conflict article to any relevant policies.

The Policy Database was composed of 19 policy documents relevant to land conflicts in India, including policies such as the “Land Acquisition Act of 2013”, the “Indian Forest Act of 1927”, and the “Protection of Plant Varieties and Farmers’ Rights Act of 2001”.

 

Excerpt of a 2001 policy document related to agriculture

 

 

A text similarity model was built to compare two text documents and determine how close they are in terms of context or meaning. The model made use of the “Cosine similarity” metric to measure the similarity of two documents irrespective of their size.

Cosine similarity calculates similarity by measuring the cosine of an angle between two vectors. Using the vectorized text of the articles and the policy documents that had been generated in the previous phases as described above, the model generated a collection of matches between articles and policies.
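A minimal sketch of this matching step is shown below. The cosine similarity formula is standard; the helper `best_match`, the policy names, and the toy vectors are hypothetical and only illustrate how an article vector could be compared against each policy vector.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(article_vec, policy_vecs):
    """Return the name of the policy whose vector is most similar to the article."""
    return max(policy_vecs, key=lambda name: cosine_similarity(article_vec, policy_vecs[name]))


# Toy word-count vectors over a shared four-word vocabulary.
article = [2, 1, 0, 0]
policies = {
    "policy_A": [4, 2, 0, 0],  # same word proportions as the article, but twice as long
    "policy_B": [0, 0, 3, 1],  # no words in common with the article
}

print(cosine_similarity(article, policies["policy_A"]))
print(best_match(article, policies))
```

Because only the angle between the vectors matters, a long policy document and a short article with the same word proportions score as highly similar, which is the size independence mentioned above.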

 

Visualization of Conflict Event and Policy Matching

The objective of this phase was to build a web-based tool for the visualization of the conflict event and policy matches.

An application was created using the Plotly Python Open Source Graphing Library. The web-based tool houses the models and allows users to explore land conflict events spatially and through time, as well as explore all land conflict events by category across geography and time.

The map displays land conflict events detected in the News Article Corpus for the selected years and regions of India.

Conflict events are displayed as color-coded dots on a map. The colors correspond to specific conflict categories, such as “Agriculture” and “Environmental”, and actors, such as “Government”, “Rebels”, and “Civilian”.

In this example, the tool displays geo-located land conflict events across five regions of India in 2017 and 2018.

 

 

 

By selecting a particular category from the right column, only those conflicts related to that category are displayed on the map. Here only the Agriculture-related subset of the events shown in the previous example is displayed.

 

 

News articles from the selected years and regions are displayed below the map. When a particular article is selected, the location of the event is shown on the map. The text of the article is displayed along with policies matched to the event by the underlying models, as seen in the example below of a 2018 agriculture-related conflict in the Andhra Pradesh region.

 

 

Here is a closer look at the article and matched policies in the example above.

 

 

 

Next Steps

This overview describes the results of a pilot project to use natural language processing techniques to identify land conflict events described in news articles and match them to relevant government policies. The project demonstrated that NLP techniques can be successfully deployed to meet this objective.

Potential improvements include refinement of the models and further development of the visualization tool. Opportunities to scale the project include building the library of news articles with those published from additional years and sources, adding to the database of policies, and expanding the geographic focus beyond India.

Opportunities to improve and scale the pilot project

 

Improvements
  • Refine models
  • Further development of visualization tool

 

Scale
  • Expand library of articles with content from additional years and sources
  • Expand the database of policies
  • Expand the geographic focus beyond India

 

 

About the Authors

  • Laura Clark Murray is the Chief Partnership & Strategy Officer at Omdena. Contact: laura@omdena.com
  • Nikhel Gupta is a physicist, a Postdoctoral Fellow at the University of Melbourne, and a machine learning engineer with Omdena.
  • Joanne Burke is a data scientist with MUFG and a machine learning engineer with Omdena.
  • Rishika Rupam is a Data and AI Researcher with Tilkal and a machine learning engineer with Omdena.
  • Zaheeda Tshankie is a Junior Data Scientist with Telkom and a machine learning engineer with Omdena.

 

Omdena Project Team

Kulsoom Abdullah, Joanne Burke, Antonia Calvi, Dennis Dondergoor, Tomasz Grzegorzek, Nikhel Gupta, Sai Tanya Kumbharageri, Michael Lerner, Irene Nanduttu, Kali Prasad, Jose Manuel Ramirez R., Rishika Rupam, Saurav Suresh, Shivam Swarnkar, Jyothsna sai Tagirisa, Elizabeth Tischenko, Carlos Arturo Pimentel Trujillo, Zaheeda Tshankie, Gabriela Urquieta

 

Partners

This project was done in collaboration with Kathleen Buckingham and John Brandt, our partners with the World Resources Institute (WRI).

 

 

About Omdena

Omdena is an innovation platform for building AI solutions to real-world problems through global bottom-up collaboration. Omdena is a partner of the United Nations AI for Good Global Summit 2020.
