Identifying Economic Incentives for Forest and Landscape Restoration
In this two-month project with the World Resources Institute, 50 technology changemakers applied Natural Language Processing (NLP) to identify economic and financial incentives for forest and landscape restoration. The project focused on Latin America (Mexico, Peru, Chile, Guatemala, and El Salvador).
Background and motivation
We are on the verge of the United Nations Decade on Ecosystem Restoration. The Decade starts in 2021 and ushers in a global effort to drive ecosystem restoration in support of climate mitigation and adaptation, water and food security, biodiversity conservation, and livelihood development. To prepare for the Decade, we must understand the current state of enabling policies. However, understanding these policies involves reading and analyzing thousands of pages of documentation across multiple sectors. Using NLP to mine policy documents would promote knowledge sharing between stakeholders and enable rapid identification of incentives, disincentives, perverse incentives, and misalignment between policies.
Where a lack of incentives, or the presence of disincentives, is discovered, this provides an opportunity to advocate for positive change. A systematic NLP-based analysis tool would enable a standardized approach to generating data that supports evidence-based action to protect the environment.
The project outcomes
The project goals were divided into the following areas:
Identifying which policies relate to forest and landscape restoration
Detecting financial and economic incentives in policies
Creating a heat map that determines the relevance of policies to forest and landscape restoration
We answered these questions through the following pipeline:
The web scraping process consisted of two approaches: scraping official policy databases, and Google scraping. This allowed us to retrieve virtually all official policy documents from the five listed countries, roughly spanning 2016 to 2020. The team then filtered the results by their relevance to landscape restoration. Thus, we were able to build a comprehensive database of policy documents for use further down the pipeline.
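The relevance filter can be sketched as a simple keyword screen. The keyword list, threshold, and helper name below are illustrative assumptions, not the team's actual criteria:

```python
import re

# Illustrative keyword list -- a real filter would be curated by domain experts.
KEYWORDS = {"restoration", "reforestation", "afforestation", "landscape", "forest"}

def is_relevant(text: str, min_hits: int = 2) -> bool:
    """Keep a document if it mentions at least `min_hits` distinct keywords."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return len(tokens & KEYWORDS) >= min_hits

docs = [
    "National plan for forest landscape restoration incentives",
    "Municipal traffic regulation update",
]
relevant = [d for d in docs if is_relevant(d)]  # keeps only the first document
```

In practice such a screen is only a first pass; borderline documents would still need manual review or a learned classifier.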
Collecting all available online policies in a country can result in a database of thousands of documents, and millions of text fragments, all contributing to the policy landscape of the country or region. Faced with thousands of documents, however, it is impractical to build word clouds for each one individually. This is where topic modeling comes in handy: topic modeling is a technique for extracting the hidden topics from large volumes of text, and Latent Dirichlet Allocation (LDA) is a popular algorithm for it.
LDA is a generative topic model introduced by David Blei, Andrew Ng, and Michael Jordan. The team applied the model to identify the topics present in the policy database.
Next, we generated the following heatmap visualization. The horizontal axis contains different topic labels, while the vertical axis lists three countries: Mexico, Peru, and Chile. The heat map gives us insights into the different levels of categorical policy present in the three countries; for instance, a territorial-related policy is widely prevalent in Mexico, but not adopted widely in Chile or Peru.
This allows policymakers to observe the decisions made by other countries and how they compare to their local administration. As a result, they can make better-informed choices in domestic policy, supported by data-driven evidence.
Example visualization: Heatmap displaying the frequency of policy topics by country
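The matrix behind such a heat map is a country-by-topic frequency table. A sketch with hypothetical topic assignments (not the project's real data):

```python
import pandas as pd

# Hypothetical per-document topic assignments (illustrative values only).
records = [
    {"country": "Mexico", "topic": "territorial"},
    {"country": "Mexico", "topic": "territorial"},
    {"country": "Mexico", "topic": "water"},
    {"country": "Peru",   "topic": "water"},
    {"country": "Chile",  "topic": "agriculture"},
]
df = pd.DataFrame(records)

# Country x topic frequency matrix -- the data behind the heat map.
heat = pd.crosstab(df["country"], df["topic"])

# To render: seaborn.heatmap(heat, annot=True)  (plotting omitted here)
print(heat)
```

Missing combinations (a topic a country never uses) appear as zeros, which is exactly what surfaces gaps like the territorial policies prevalent in Mexico but not in Chile or Peru.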
For the full results, please see the articles linked at the end of this page.