Applying various data science tools and methods to visualize climate change impacts.
By Nishrin Kachwala, Debaditya Shome, and Oscar Chan
Day by day, we generate exponentially more data, and we have to sift through its complexity to make use of it. Filtering for relevance is essential to get to the gist of the data in front of us. It is often claimed that the human brain processes images 60,000 times faster than text, and that about 65% of people are visually inclined.
To tell the story behind climate-change data, we needed to go beyond analysis and investigation: we needed to show trends and support decision-making. Visualization is necessary for practical data science, whether to explore the data, preprocess it, tune a model to it, or ultimately gain insights to act on.
No data story is complete without the inclusion of great visuals.
Understanding the impact of Nature-based solutions on climate change
The World Resources Institute (WRI) sought to understand the regional and global landscape of Nature-based Solutions (NbS).
- How are some NbS platforms addressing climate hazards?
- What types of NbS are being adopted?
- What barriers and opportunities exist?
The work initially focused on three platforms, AFR100, Cities4Forests, and Initiative20x20, with plans to scale it to more platforms later.
More than 30 Omdena AI engineers worked on this NLP problem to derive actionable insights, develop a recommendation system and a knowledge-based Q&A system for querying data from the NbS platforms, and extract sentiment from the data to find potential gaps. Topic modeling was applied to derive the dominant topics in the data, while website network analysis of organizations and statistical analysis helped explore how ‘climate change impacts’, ‘interventions’, and ‘ecosystems’ were covered across the three platforms.
Using Streamlit, we built a highly interactive, shareable web application (dashboard) for zooming into the NLP results to draw actionable insights on Nature-based Solutions. The Streamlit app was deployed to the web using Heroku. A major advantage of Streamlit is that developers can build a sophisticated dashboard with multiple elements, such as Plotly graph objects, tables, and interactive controls, entirely in Python scripts, without writing additional HTML to define the layout. This makes it possible to bring multiple project outputs onto the same dashboard quickly and with minimal code.
Overview of the Dashboard
The dashboard presents the results in five major sections. Users move between sections with the navigation pull-down menu in the left sidebar and use the other sidebar controls to select the content they want to see. The components of each section are described below.
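The sidebar navigation can be sketched as a mapping from menu labels to functions that render each section. This is a minimal illustration, not the project's code: the section functions, widget labels, and the 2015 to 2019 slider range are hypothetical. Each function receives the `streamlit` module as `st`, which keeps the layout logic testable without a running app.

```python
from typing import Any, Callable, Dict, List

# Platform names from the article, used as a hypothetical sidebar filter.
PLATFORMS: List[str] = ["AFR100", "Cities4Forests", "Initiative20x20"]


def choropleth_section(st: Any) -> None:
    """Render the map section; a real app would draw a Plotly choropleth here."""
    st.header("Choropleth Map View")
    # Slider above the map selects which year's change to display (range is illustrative).
    year = st.slider("Year", min_value=2015, max_value=2019, value=2019)
    st.write(f"Change in {year} relative to the earliest year")


def heatmap_section(st: Any) -> None:
    """Render the heat-map section for the platform picked in the sidebar."""
    st.header("Heat Map View")
    platform = st.sidebar.selectbox("NbS platform", PLATFORMS)
    st.write(f"Document-frequency heat map for {platform}")


# Sidebar label -> function that renders that section.
SECTIONS: Dict[str, Callable[[Any], None]] = {
    "Choropleth Map View": choropleth_section,
    "Heat Map View": heatmap_section,
}


def build_app(st: Any) -> str:
    """Draw the navigation menu, render the chosen section, and return its label."""
    st.sidebar.title("Navigation")
    choice = st.sidebar.selectbox("Go to section", list(SECTIONS))
    SECTIONS[choice](st)
    return choice
```

To launch it, save the block as `app.py`, add `import streamlit as st` and a final call `build_app(st)`, then run `streamlit run app.py`. Adding a Plotly figure to a section is a matter of calling `st.plotly_chart(fig)` inside the corresponding function.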
Choropleth Map View
Choropleth maps shade each region according to the value of a variable. Here, a diverging color scale shows the magnitude and direction of change in each country's climate parameters over time.
The analysis uses yearly country-level climate and landscape parameters, such as land cover type, temperature, and soil moisture, for the countries participating in the major platforms. Deforestation was evaluated with the Hansen and MODIS Land Cover Type datasets, temperature change with the MODIS Land Surface Temperature dataset, and land degradation with the NASA-USDA SMAP Global Soil Moisture dataset. For each year, the change in each parameter is computed relative to the earliest year available in the data. These yearly changes are plotted on the choropleth maps using a predefined diverging color scale, and users can pick the year to view with the slider above the map on the dashboard.
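The per-year change and the map can be sketched with pandas and Plotly Express. This is a minimal sketch under assumptions the article does not state: a long-format table with hypothetical `country`, `iso3`, and `year` columns plus one column per parameter, with the earliest year taken per country. The symmetric color range keeps zero at the midpoint of the diverging `RdBu_r` scale, so increases show as red and decreases as blue.

```python
import pandas as pd


def change_from_baseline(df: pd.DataFrame, value_col: str) -> pd.DataFrame:
    """Add a `change` column: each row's value minus the country's earliest-year value."""
    out = df.sort_values(["country", "year"]).reset_index(drop=True)
    # After sorting, the first non-missing value in each country group is its earliest year with data.
    baseline = out.groupby("country")[value_col].transform("first")
    out["change"] = out[value_col] - baseline
    return out


def choropleth_for_year(df: pd.DataFrame, year: int):
    """Plot one year's changes on a diverging scale centered on zero."""
    import plotly.express as px  # imported here so the data step runs without Plotly

    limit = df["change"].abs().max()
    return px.choropleth(
        df[df["year"] == year],
        locations="iso3",  # ISO-3 country codes (Plotly's default location mode)
        color="change",
        color_continuous_scale="RdBu_r",  # reversed RdBu: blue = decrease, red = increase
        range_color=(-limit, limit),
    )
```

In the dashboard, the `year` argument would come from the Streamlit slider above the map.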
Take the change in temperature across participating countries as an example. The graph shows that in 2019 the average yearly temperature in most South American and Central-Eastern African countries was around 0.25 to 1.3 °C lower than in 2015. In contrast, participating countries in northern Africa and Mexico became warmer than in 2015. The diverging color scale makes this contrast easy to see: red represents an increase in temperature and blue a decrease.
Heat Map View
The heat maps show how much attention each NbS platform gives to different climate risks and how each climate risk is matched with NbS interventions. Two heat maps measure this attention: the first shows document frequency, and the second shows hazard-to-ecosystem match scores. Users choose what to visualize with the checkbox on the sidebar and the pull-down menu in the top-left corner, where they select the corresponding NbS platform.
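The document-frequency heat map amounts to a cross-tabulation. A minimal sketch, assuming a hypothetical tag table with one row per (`doc_id`, `hazard`, `intervention`) mention; how the project tagged documents is not described here, and the hazard-to-ecosystem match score is not reproduced.

```python
import pandas as pd


def document_frequency(tags: pd.DataFrame) -> pd.DataFrame:
    """Count distinct documents per (hazard, intervention) pair as a hazard x intervention matrix."""
    # A document that mentions the same pair several times is counted once.
    pairs = tags.drop_duplicates(["doc_id", "hazard", "intervention"])
    return pd.crosstab(pairs["hazard"], pairs["intervention"])
```

The resulting matrix can be drawn with `plotly.express.imshow(matrix, text_auto=True)`, and a per-platform view is obtained by filtering `tags` before counting.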
For example, the heat map above shows the number of documents and websites on the Initiative20x20 platform that relate each climate impact to a climate intervention strategy. Land degradation has received the most attention from the platform, and restoration, reforestation, restorative farming, and agroforestry are the main intervention strategies associated with it. The heat map also shows that solutions for some climate risks, such as wildfires, air and water pollution, disaster risk, bushfires, and coastal erosion, receive relatively little attention on Initiative20x20 compared with other risks.
Beyond the heat map itself, the dashboard design leaves room for linking to external resources based on the information it presents. Similar to the interactive tool in the Nature-based Solutions Evidence Platform by the University of Oxford, where users can open external cases by clicking on the heat map, users can browse the links and documents behind each document count with the pull-down menus below the heat map. For example, the attached figure shows the results when users select restoration efforts in response to land degradation on Initiative20x20: users can read a brief description of each page and its keywords, and open the external site through the hyperlink.
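Listing the documents behind a selected cell is a filter on a document table. A minimal sketch, assuming hypothetical `hazard`, `intervention`, `title`, `url`, and `keywords` columns; the returned Markdown lines could be rendered in the dashboard with `st.markdown`.

```python
from typing import List

import pandas as pd


def links_for_cell(docs: pd.DataFrame, hazard: str, intervention: str) -> List[str]:
    """Return one Markdown line (link plus keywords) per document behind a heat-map cell."""
    mask = (docs["hazard"] == hazard) & (docs["intervention"] == intervention)
    # The same page can be tagged more than once; list each URL a single time.
    hits = docs[mask].drop_duplicates("url")
    return [f"[{row.title}]({row.url}): {row.keywords}" for row in hits.itertuples(index=False)]
```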
Potential gap/solution identification
This section presents the results of our sentiment analysis models. The goal was to identify which projects, publications, and partners of the major NbS platforms address potential gaps or solutions for climate change. A gap is text with negative sentiment, indicating a negative impact on climate change; a solution is text with positive sentiment, indicating a positive impact. The output of this subtask was three hierarchical data frames, one each for the projects, publications, and partners of AFR100, Initiative20x20, and Cities4Forests. To present these large data frames compactly, we used treemap and sunburst plots. Treemaps visualize hierarchical data as nested rectangles, while sunburst plots show it as rings spanning outward radially from root to leaves. The hierarchy groups the data first by platform, then by the countries within each platform, and then by the projects associated with each country; clicking deeper shows the description and keywords of a project. The size of a rectangle or sector represents how confident the model is that the item is a potential gap or solution.
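Turning classifier output into a treemap can be sketched with Plotly Express. This assumes, hypothetically, one prediction per project with a `POSITIVE`/`NEGATIVE` label and a probability `score`, similar to the format returned by the Hugging Face `transformers` sentiment-analysis pipeline; the article does not name the model, and the column names are illustrative.

```python
import pandas as pd

# Assumed mapping from sentiment labels to the dashboard's categories.
LABEL_TO_CATEGORY = {"NEGATIVE": "Potential gap", "POSITIVE": "Potential solution"}


def prepare_hierarchy(preds: pd.DataFrame) -> pd.DataFrame:
    """Map labels to gap/solution and keep the columns that define the hierarchy."""
    out = preds.copy()
    out["category"] = out["label"].map(LABEL_TO_CATEGORY)
    out = out.rename(columns={"score": "confidence"})
    return out[["platform", "country", "project", "category", "confidence"]]


def draw_treemap(frame: pd.DataFrame):
    """Platform -> country -> project; each box is sized by model confidence."""
    import plotly.express as px  # imported here so the data step runs without Plotly

    return px.treemap(
        frame,
        path=["platform", "country", "project"],
        values="confidence",
        color="category",
    )
```

Replacing `px.treemap` with `px.sunburst`, which accepts the same `path` and `values` arguments, gives the radial view.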
Network Analysis and Knowledge Graphs
This tab of the pull-down menu contains the network analysis and the knowledge graphs. Knowledge graphs (KGs) represent raw information (in our case, text from the NbS platforms) in a structured form that captures the relationships between entities.
In network analysis, concepts (nodes) are identified from the words in the text, and the edges between nodes represent relations between concepts. The network gives a compact view of the overall structure of the underlying text and makes visible latent relations between concepts that are not explicit in the text. Visualizing texts as networks lets one focus on their important aspects without reading large amounts of text. The knowledge graph and network analysis visuals can be seen in the GIF above.
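One common way to build such a text network is to link words that co-occur in the same sentence and weight each edge by how often that happens. The sketch below assumes this sentence-level co-occurrence rule and a tiny hypothetical stop-word list; the project's exact method for identifying concepts is not described here.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "and", "of", "to", "in", "a", "for", "on", "is", "are", "with"}


def concept_network(text: str) -> Counter:
    """Map each (concept, concept) pair to the number of sentences in which both appear."""
    edges: Counter = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        # Concepts: distinct words of 3+ letters that are not stop words.
        concepts = {w for w in re.findall(r"[a-z]+", sentence) if len(w) > 2 and w not in STOPWORDS}
        # Link every pair once per sentence; sorting makes (a, b) and (b, a) the same edge.
        for a, b in combinations(sorted(concepts), 2):
            edges[(a, b)] += 1
    return edges
```

The counts can be loaded into NetworkX with `G.add_weighted_edges_from((a, b, w) for (a, b), w in edges.items())` on a `networkx.Graph()` and then laid out and drawn.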
Knowledge-based Question-Answer System
The knowledge-based question-answering NLP system answers questions in the context of text scraped from the NbS platforms and PDF documents available on the platforms' websites. The system is built on the open-source Deepset.ai Haystack framework, hosted on a virtual machine, and accessible through a REST API and the Streamlit dashboard.
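Haystack's extractive question-answering pipelines pair a retriever, which narrows the document collection to a few candidate passages, with a reader model that extracts the answer span from them. The sketch below illustrates only the retrieval idea, using plain TF-IDF scoring in standard-library Python; it is not Haystack's API, and the project's actual retriever and reader settings are not specified here.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_k_passages(query: str, passages: List[str], k: int = 3) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs for the k passages that best match the query."""
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)
    # Document frequency: in how many passages each term appears.
    doc_freq = Counter(term for doc in docs for term in doc)
    # Smoothed IDF: rare terms weigh more, and no term gets a zero or negative weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in doc_freq.items()}
    query_terms = set(tokenize(query))
    scores = [
        (i, sum(doc[t] * idf[t] for t in query_terms if t in doc))
        for i, doc in enumerate(docs)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

In a full pipeline, the top-ranked passages would then be handed to a reader model, which is omitted here.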
Read more about the Q&A NLP system in this article.
Recommendation System
The recommendation system offers two approaches: content-based filtering and collaborative filtering. Collaborative filtering uses the “wisdom of the crowd” to recommend items. Our collaborative recommendations are based on indicators from World Bank data and on keyword similarity computed with Facebook's StarSpace model. In the dashboard, one can select multiple indicators for a platform and see the platforms related to the selected one.
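The article does not say how the World Bank indicators are compared. As a minimal sketch, assume each entity (for example, a country or platform) is a row of selected indicator values: standardizing the columns and ranking by cosine similarity then returns the most similar entities. Names and values are hypothetical.

```python
from typing import List, Sequence, Tuple

import numpy as np


def rank_by_indicators(
    names: List[str], indicators: Sequence[Sequence[float]], target: str, top_n: int = 3
) -> List[Tuple[str, float]]:
    """Return the entities most similar to `target` by cosine similarity of z-scored indicators."""
    X = np.asarray(indicators, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # a constant indicator carries no information; avoid dividing by zero
    # Z-score each indicator so large-scale ones (e.g. GDP) do not dominate the comparison.
    Z = (X - X.mean(axis=0)) / std
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Z = Z / norms
    i = names.index(target)
    sims = Z @ Z[i]
    # Highest similarity first, excluding the target itself.
    order = [j for j in np.argsort(-sims) if j != i]
    return [(names[j], float(sims[j])) for j in order[:top_n]]
```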
Content-based filtering recommendation is based on the description of an item and a profile of the user’s preferences.
For a selected organization, content-based filtering finds similar organizations, projects, news articles, blog articles, publications, and so on. The StarSpace model was used to obtain word embeddings, and a similarity analysis then compared the description of the selected organization with the data of all the other organizations. Users can choose among projects, publications, news articles, and other options, and related organizations are recommended on that basis.
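The similarity step can be sketched as averaging word vectors into one vector per description and ranking organizations by cosine similarity. This assumes the trained StarSpace word vectors have already been loaded into a Python dictionary (loading is omitted), and the organization names are hypothetical; the project may have built document vectors differently.

```python
from typing import Dict, List, Tuple

import numpy as np


def embed(text: str, vectors: Dict[str, np.ndarray]) -> np.ndarray:
    """Mean of the vectors of known words; a zero vector if no word is known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        dim = len(next(iter(vectors.values())))
        return np.zeros(dim)
    return np.mean(known, axis=0)


def similar_organizations(
    target: str, descriptions: Dict[str, str], vectors: Dict[str, np.ndarray], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Rank the other organizations by cosine similarity to the target's description."""
    query = embed(descriptions[target], vectors)
    scores = []
    for name, text in descriptions.items():
        if name == target:
            continue
        doc = embed(text, vectors)
        denom = np.linalg.norm(query) * np.linalg.norm(doc)
        # Descriptions with no known words get similarity 0 instead of dividing by zero.
        scores.append((name, float(query @ doc / denom) if denom else 0.0))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]
```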
Keyword Analysis of Partner Organizations
This section includes an intuitive 3D t-SNE visualization of all keywords and topics in the 12,801 unique URLs across 34 partner organization websites. The goal of each organization, displayed in the hover label, was derived from topic modeling with Latent Dirichlet Allocation (LDA).
What is a t-SNE plot?
t-SNE, short for t-distributed Stochastic Neighbor Embedding, is a dimensionality-reduction algorithm well suited to visualizing high-dimensional data. The idea is to embed high-dimensional points in a low-dimensional space in a way that respects similarities between points: points that are close in the original space stay close in the embedding.
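For reference, the standard t-SNE formulation makes this precise. Here x_i are the high-dimensional embeddings, y_i their low-dimensional positions (3-D in our case), N the number of points, and each bandwidth σ_i is tuned so that the neighbor distribution of point i matches a user-chosen perplexity. Similarities in the original space use a Gaussian kernel, similarities in the embedding use a heavy-tailed Student-t kernel, and the positions y_i are found by minimizing the Kullback-Leibler divergence between the two.

```latex
p_{j \mid i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                    {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j \mid i} + p_{i \mid j}}{2N}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
\qquad
C = \mathrm{KL}(P \parallel Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```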
We computed an embedding of each URL's entire text with the widely known Sentence Transformers from Hugging Face. These high-dimensional embeddings were the input to the t-SNE model, which projected them into 3 dimensions. The projections are shown below in the interactive 3D visualization.
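The embedding-and-projection step can be sketched with the `sentence-transformers` and scikit-learn libraries. The model name `all-MiniLM-L6-v2` and the perplexity value are assumptions, since the article does not specify them; the perplexity is capped because scikit-learn requires it to be smaller than the number of samples.

```python
from typing import List

import numpy as np
from sklearn.manifold import TSNE


def embed_texts(texts: List[str]) -> np.ndarray:
    """Encode each URL's text into one vector (the model choice is an assumption)."""
    from sentence_transformers import SentenceTransformer  # imported here: needs a model download

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts)


def project_3d(embeddings: np.ndarray, perplexity: float = 30.0, seed: int = 0) -> np.ndarray:
    """Project high-dimensional embeddings (one row per URL) to 3-D coordinates with t-SNE."""
    n_samples = embeddings.shape[0]
    # Keep perplexity well below the number of samples, as scikit-learn requires.
    perplexity = min(perplexity, (n_samples - 1) / 3)
    tsne = TSNE(n_components=3, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(embeddings)
```

The three columns of the result can be passed to `plotly.express.scatter_3d` as `x`, `y`, and `z`, with the organization name as `hover_name`.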
What are the advantages of this visual?
There were 12,801 URLs under these 34 organizations. Going through all of them to figure out what each one discusses would take a huge amount of time, as some websites alone had nearly 1M words in their About sections. This visual helps anyone who wants to know what each organization discusses without manually reading through every URL's description.
Today, data visualization has become an essential part of the story: no longer a pleasant enhancement, it adds depth and perspective. In our case, geo-plots, heat maps, network diagrams, treemaps, drop-down and filter elements, and 3D interactive plots guide the reader step by step through the narrative.
We have explored only a few of the many visuals developed by the Omdena data science enthusiasts. With the visual dashboard, we hope to give viewers a stronger connection to critical insights about Nature-based Solutions and their adaptation. The dashboard is portable and can be shared with the climate change community, driving engagement and sparking new ideas.