A predictive impact analytics model for quantifying the social, economic, and environmental impact of making an investment in integrating AI for Forest Landscape Restoration projects. This challenge was hosted with Trillion Tree Fund.
“If a tree falls in a forest and no one is around to hear it, does it make a sound?” is a famous philosophical question about perception and observation. We can ask a similar question: do the benefits of forest landscape restoration in tackling climate change make a sound to the world, or do they simply fall, with no one hearing, observing, or perceiving them?
According to the Trillion Tree Fund (1TFund), Climate change could cost the world ~$792 trillion in the next 80 years. Forest landscape restoration (FLR) helps mitigate those risks; for example, mangroves absorb 70-90% of storm surge. FLR could generate $7–$30 in economic benefits for every dollar invested. Yet these co-benefits are undervalued by markets. This poses a major impediment to financing FLR, which faces an annual investment gap of around $400 billion.
Why is investing in FLR projects so important?
The ability of trees to absorb carbon dioxide and other gases from the atmosphere has long made them a valuable weapon in the fight against rising temperatures. A single mature tree can absorb about 48 lbs of carbon a year and produces enough oxygen for four people to breathe fresh air.
Under the goals of the Paris Agreement, the rise in global temperature by the end of this century must be limited to 2 degrees Celsius, with aggressive efforts toward 1.5 degrees Celsius.
The Intergovernmental Panel on Climate Change (IPCC) has said that if the world wants to limit the rise to 1.5C by 2050, an extra 1bn hectares (2.4bn acres) of trees would be needed.
How do we make it sound?
After scoping the project, we quantified the damages from flooding disasters, with and without the benefits of an FLR project, for a particular region of the United States in USD. Datasets related to disasters, tree benefits, and satellite images were then consolidated.
The project followed a standard pipeline of data collection, pre-processing, consolidation, EDA, model development, and dashboard deployment.
Data cleaning and data consolidation
The raw data was processed by:
- omitting unwanted records;
- replacing missing or erroneous data;
- standardizing date and time formats;
- binning some continuous attributes;
- imputing missing categorical values with the most frequent category;
- imputing missing numerical values with the mean or median;
- imputing some values (under specific conditions) with unsupervised machine learning techniques.
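The imputation and binning steps above can be sketched with pandas; the frame and column names here are hypothetical stand-ins, not the project's actual schema.

```python
import pandas as pd

# Hypothetical disaster-records frame; column names are illustrative only.
df = pd.DataFrame({
    "disaster_type": ["Flood", "Storm", None, "Flood"],
    "total_damages_usd": [1.2e6, None, 3.4e6, 2.1e6],
})

# Impute missing categorical values with the most frequent category.
df["disaster_type"] = df["disaster_type"].fillna(df["disaster_type"].mode()[0])

# Impute missing numerical values with the median (robust to skewed damage costs).
df["total_damages_usd"] = df["total_damages_usd"].fillna(df["total_damages_usd"].median())

# Bin a continuous attribute into ordered categories.
df["damage_band"] = pd.cut(df["total_damages_usd"], bins=3, labels=["low", "mid", "high"])
```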
Once cleaning and consolidation were complete, the team prepared a summary document capturing the list of attributes and the rationale for running an EDA on the dataset.
Exploratory Data Analysis
Natural catastrophes, such as floods, landslides, storm surges, tsunamis, earthquakes, cyclonic winds, and wildfires, are becoming increasingly frequent and intense around the world, highlighting the need for a more holistic strategy for dealing with them. The International Emergency Disasters Database (EMDAT) documented an annual average of 363 disasters from 1990 to 2020, with floods and storms being the most common.
Average annual disaster fatalities over the period were 170,984 people, with earthquakes accountable for at least 1.24 million deaths globally. Storms and floods also account for almost 1 million deaths (the true number may be much higher due to nuances in data gathering).
Average annual economic losses total more than US$107 billion. Annually, 175 million people have been affected by disasters during 1990-2020. Riverine floods, tropical cyclones, and convective storms account for most of the damage to property.
The team also used auto-EDA libraries such as Sweetviz, D-Tale, Pandas Profiling, and AutoViz to gain quick insights into the datasets.
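The kind of overview those libraries automate (per-column dtypes, missing counts, cardinality, summary statistics) can be approximated with pandas alone; this is a minimal sketch on toy data, not the project's actual profiling code.

```python
import pandas as pd

# Toy disaster table standing in for the real consolidated dataset.
df = pd.DataFrame({
    "year": [2018, 2019, 2020, 2020],
    "deaths": [120, 85, None, 430],
    "disaster_type": ["Flood", "Storm", "Flood", "Storm"],
})

# Quick profile: dtype, missing count, and distinct values per column,
# similar in spirit to a Sweetviz or Pandas Profiling report.
overview = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "missing": df.isna().sum(),
    "unique": df.nunique(),
})
print(overview)
print(df.describe(include="all"))
```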
Initially, all the datasets related to forests, disasters, and tree benefits were collected. After applying the pre-processing algorithms such as log transformation and imputation, the datasets were used for modeling. In machine learning, depending on the final output, there are two types of algorithms – classification and regression models.
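The log transformation mentioned above is worth a one-line illustration: damage costs are heavily right-skewed, and `log1p` compresses the range while keeping zero-damage records defined. The values below are illustrative, not project data.

```python
import numpy as np

# Hypothetical damage costs in USD; log1p(0) == 0, so zero-damage rows survive.
damages = np.array([0.0, 1_000.0, 250_000.0, 12_000_000.0])
log_damages = np.log1p(damages)

# The transform is reversible with expm1 when predictions must be
# reported back in USD.
restored = np.expm1(log_damages)
```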
For the training data, we built a regression model. A regression model allows us to predict a continuous dependent variable (y) from the value of one or multiple independent/predictor variables (x). In our case, the dependent/predicted variable (y) is the flood damage cost in the US, while the independent/predictor variables (x) are the variables that influence flood damage.
The following models were created:
- Time series forecasts (ARIMA) to predict GDP, population, and inflation rate.
- Various regressors to predict numerical values, for example damages and the number of deaths.
- Various classifiers to predict categorical values, for example in the biomass-loss and wildfire datasets.
- Classification models on National Forest2_Bienville-short to find which types of trees are in abundance.
- Regression techniques on Time_series_US_1980-2021 to find the total damage cost.
- A U-Net model trained for tree cover loss, which achieved an IoU score of 0.81.
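The project's forecasts used full ARIMA models from a library; as a minimal sketch of just the autoregressive idea behind them, here is an AR(1) model, y_t = c + φ·y_{t-1}, fit by least squares on an invented inflation series (the numbers are illustrative only).

```python
import numpy as np

# Hypothetical annual inflation-rate series (percent); values are illustrative.
series = np.array([2.1, 2.4, 1.9, 2.3, 2.6, 2.2, 2.5, 2.8])

# Fit y_t = c + phi * y_{t-1} by ordinary least squares.
X = np.column_stack([np.ones(len(series) - 1), series[:-1]])
c, phi = np.linalg.lstsq(X, series[1:], rcond=None)[0]

# Roll the fitted model forward to forecast the next three years.
forecast = []
last = series[-1]
for _ in range(3):
    last = c + phi * last
    forecast.append(last)
```

A real ARIMA adds differencing and moving-average terms on top of this autoregressive core.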
The team also used PyCaret to create the two regression models: one quantifying the flood damage cost without the Trillion Tree Fund FLR project, built from the Xn predictor variables, and one quantifying the flood damage cost with the FLR project, built from the Xn predictor variables plus the Z tree-benefit variable.
Here is an example of the model's performance using the PyCaret library.
Performance of all Models
Here, we use PyCaret's compare_models function to find the best algorithm. The function ranks the models by their metrics. The best model for this dataset is an AdaBoost classifier; as the next course of action, we can take the AdaBoost classifier individually and perform hyperparameter tuning on it.
Create a Model
Tune the Model
Export the model as a pickle file to integrate it.
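PyCaret wraps scikit-learn estimators, so the create → tune → pickle sequence can be sketched directly in scikit-learn. This is a hedged stand-in, not the project's code: the synthetic data replaces the real flood-damage features, and the file name is hypothetical.

```python
import pickle
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the flood-damage training data; real predictors
# would be GDP, population, rainfall, tree-benefit variables, etc.
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=42)

# Create the model, then tune it over a small hyperparameter grid.
grid = GridSearchCV(
    AdaBoostRegressor(random_state=42),
    param_grid={"n_estimators": [50, 100], "learning_rate": [0.1, 1.0]},
    cv=3,
)
grid.fit(X, y)

# Export the tuned model as a pickle file for the dashboard to load.
with open("flood_damage_model.pkl", "wb") as f:
    pickle.dump(grid.best_estimator_, f)

# Later, the Streamlit app reloads the model to score new scenarios.
with open("flood_damage_model.pkl", "rb") as f:
    model = pickle.load(f)
```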
The following are examples of the pickle files that were created.
Apart from the tabular datasets, the team extracted satellite images related to tree cover from Google Earth Engine (GEE). Once the satellite images were preprocessed in the GEE platform, they were used to train a U-Net model, which is primarily used in computer vision for image segmentation. Tree cover images were extracted for the years 2003 to 2020. We then used image augmentation techniques to increase the number of images in the training dataset.
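The augmentation step can be sketched with simple geometric transforms; the tile below is a random stand-in for a real GEE export, and the specific transforms are illustrative (flips and rotations do not change land-cover semantics, so each one yields a valid extra training sample).

```python
import numpy as np

# Toy stand-in for a 128x128 tree-cover tile with 3 bands.
tile = np.random.rand(128, 128, 3)

# Simple geometric augmentations multiply the training set.
augmented = [
    tile,
    np.flip(tile, axis=0),             # vertical flip
    np.flip(tile, axis=1),             # horizontal flip
    np.rot90(tile, k=1, axes=(0, 1)),  # 90-degree rotation
]
```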
Deploying the Streamlit app to Heroku
Streamlit is a web app framework for building and serving machine learning apps in Python. Heroku is a cloud-based Platform as a Service (PaaS) for deploying modern apps to the internet, and it was used to deploy our Streamlit app.
The team has created onetfund_app.py using Streamlit. Now it’s time to deploy the app using Heroku.
Step 1: Run the Streamlit app locally.
To run the app locally with Streamlit, open a terminal/prompt, navigate to the directory where the onetfund_app.py Python file is saved, and run the following command.
# Running Streamlit locally
streamlit run onetfund_app.py
The window will open automatically in the browser.
Step 2: Create and fork the repository on GitHub
The team has created a repository called dashboard-Heroku for our app.
After creating the repository, click the “fork” button.
All the files that are needed to deploy on Heroku are provided in the repository.
The repository comprises the following important files.
– Readme – provides details about the app.
– Python files – apart from the main frontend landing page, six other pages were created to display the quantified social, economic, environmental, and financial impact of investing in a particular FLR project.
– Pickle files – all the models saved as pickle files.
Apart from the files above, which relate directly to the Streamlit app itself, the following files were created.
– Procfile – tells Heroku to run the setup.sh file and then the Streamlit web application.
– requirements.txt – lists every Python library our scripts use; it tells Heroku which libraries to install to run the application.
– setup.sh – takes care of server-side settings such as the port number; it is added to the configuration.
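The exact contents of these files vary by project; a typical pair for a Streamlit app on Heroku looks roughly like this (the `$PORT` environment variable is injected by Heroku at runtime, and `onetfund_app.py` is the app file named above).

```shell
# Procfile — tells Heroku how to start the web process
web: sh setup.sh && streamlit run onetfund_app.py

# setup.sh — writes Streamlit server settings before launch
mkdir -p ~/.streamlit
cat > ~/.streamlit/config.toml <<EOF
[server]
headless = true
port = $PORT
enableCORS = false
EOF
```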
Step 3: Connect to Heroku
Once we have all the required files, now it’s time to set up our app so it can interact with Heroku.
Head over to Heroku and create an account. From the Heroku dashboard, click on Create new app; there is also an option to select a region.
Next, under the deployment method, click on GitHub and connect your GitHub account with Heroku. Once connected, type the name of the repository where all the files are saved.
The team enabled automatic deployment, so whenever the web application files change on GitHub, the web application is automatically redeployed on Heroku.
We can watch it install all the required Python libraries and dependencies in real time. Once it is done, we will see the message “Your app was successfully deployed”; clicking the View button opens the app.
Forest landscape restoration (FLR) not only helps mitigate climate change risks like floods and wildfires but can also prove economically beneficial. Keeping in mind Trillion Tree Fund's mission of mobilizing conservation finance to restore 1.2 trillion trees and regenerate ecosystems, which would cancel out a decade of carbon emissions, generate jobs, and lessen the monetary and social impact of disasters, a team of collaborators built a predictive impact analytics dashboard for quantifying the social, economic, and environmental impact of investing in a particular Forest Landscape Restoration project.