NLP Clustering to Understand Social Barriers Towards Energy Transition | World Energy Council

This article shows how NLP clustering can be used to better understand the thoughts, concerns, and sentiments of citizens in the USA, UK, Nigeria, and India about the energy transition and the decarbonization of their economies. It shares observational results on how citizens perceive their role within the energy transition, including the associated social risks, opportunities, and costs.

The findings are part of a two-month Omdena AI project with the World Energy Council (WEC). Given the complexity of the analysis scope, none of the findings are conclusive; they should be read as observations.

 

The Project Goal

The aim was to find information that can help governments effectively involve people in the accelerating energy transition. The problem was quite complicated, and no data was provided to us. We therefore had to create our own dataset, analyze it, and provide WEC with insights. We started with a long list of open questions such as:

  • What should our output look like?
  • What search terms would be useful to scrape data for?
  • What countries should be considered as our main focus?
  • Should we consider non-English languages as well and analyze them?
  • How much data per country will be enough?
  • Etc.

In order to meet the deadline for the project, we decided to go with the English language only and come up with good working models.

 

The Solution

 

Getting data from Social Media

We scraped the following sources: Twitter, YouTube, Facebook, Reddit, and well-known newspapers specific to each country. The desired insights were to cover developed, developing, and under-developed countries, with particular emphasis on developing and under-developed countries.

The results discussed in this article were obtained from scraped tweet data for the USA, UK, India, and Nigeria, which together cover the three categories of developed, developing, and under-developed countries.

 

Our Approach: Trying different NLP techniques

We first gathered data by scraping tweets using keywords that Google Trends showed to be important for each country. I then removed stop words, applied stemming, removed hashtags, punctuation, numbers, and mentions, and replaced URLs with _URL. I used TF-IDF vectorization for feature extraction from the tweets. I am going to walk you through the steps taken to tackle the problem.
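The cleaning and vectorization steps can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the project's actual pipeline: the stop-word list, the regular expressions, and the function names `clean_tweet` and `tfidf` are assumptions, stemming is left out, and a real project would more likely rely on a library vectorizer.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "is", "in", "of", "to", "and", "for", "are"}


def clean_tweet(text):
    """Lowercase, replace URLs with _URL, drop mentions, hashtags, digits, punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " _URL ", text)   # URLs become a placeholder token
    text = re.sub(r"[@#]\w+", " ", text)             # mentions and hashtags
    text = re.sub(r"[^a-zA-Z_\s]", " ", text)        # numbers and punctuation
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


tweets = [
    "Electricity prices are too high! https://t.co/abc #energy",
    "@gov Solar power is cheap in 2020",
    "Electricity cut again, no power",
]
docs = [clean_tweet(t) for t in tweets]
vectors = tfidf(docs)
print(docs[0])
print(sorted(vectors[0].items(), key=lambda kv: -kv[1])[:3])
```

Terms that appear in only one tweet (such as "prices") receive a higher weight than terms shared across tweets (such as "electricity"), which is the property that makes TF-IDF useful as input for clustering.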

 

Approach 1: Sentiment Analysis (Unsatisfactory)

Sentiment analysis of short tweets comes with its own challenges. Some of the important ones we faced in this project were:

  • Tags mean different things in different countries. #nolight can be Canadians complaining about the winter sunset, or Nigerians having a power cut.
  • Tags take a side. For example, #renewables is pro-green and #climatehoax is not. So positive sentiment on #renewables might not really tell us much.
  • The classifier model built on #climatechange and related tags does not work at all on anti-green tags such as #climatemyth.
  • Some anti-green tweets are full of happy emojis, which makes the sentiment scores unreliable.
  • The major tweeting countries are overwhelmingly positive. In fact, the distribution of climate change-related tweets across the world is not uniform, and some countries contribute far more tweets to the dataset than others (Figure1) [1].
  • The interpretation of outputs. Simply assigning a label to each tweet does not let us derive insights on the barriers to the energy transition, so the interpretability of the model is very important.

Considering all the challenges discussed, the sentiment analysis of the tweets did not produce satisfactory results (Table1) and we decided to test other models.

 

 

Figure1: Number of climate change related tweets per country [1]

 

 

Table1: Classifier accuracy for sentiment analysis of tweets data (USA)

 

 

Approach 2: Topic Modeling (Unsatisfactory) 

Topic modeling is an NLP technique that provides a way to compare the strength of different topics and tells us which topics are more informative than others. Topic models are unsupervised and need no data labeling. Because tweets are short, it was really hard to differentiate between topics and to assign tweets to a specific topic using models such as LDA. Topic models tend to produce the best results when applied to texts that are not too short and that have a consistent structure.

 

1. Using a semi-supervised approach

We chose a semi-supervised topic modeling approach (CorEx) [2]. Since the data was very high dimensional, we applied dimensionality reduction in order to remove noise and interpret the data. A permutation test was used to determine the optimum number of principal components for PCA [3,4]. In the explained variance ratio plot, the cumulative explained variance line was not perfectly linear, but it was very close to a straight line.

Through permutation tests, I noticed that the mean explained variance ratio of the permuted matrices did not really differ from the explained variance ratio of the non-permuted matrix, which suggested that applying PCA to the correlated topic model's results was not helpful at all.

 

 

 

 

This means each of the principal components contributes to the variance explanation almost equally, and there’s not much point in reducing the dimensions based on PCA.
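To make the permutation-test idea concrete, here is a small dependency-free sketch. It is an illustration under stated assumptions, not the notebook's code: it only looks at the first principal component, finds its eigenvalue by power iteration instead of a linear algebra library, and all function names are hypothetical. Each column is shuffled independently to destroy correlations between features, and the explained variance ratio is recomputed on the shuffled data.

```python
import random


def covariance(rows):
    """Sample covariance matrix of a list of equally long rows."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    d = len(means)
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_pc_ratio(rows, iterations=200):
    """Explained variance ratio of the first principal component (power iteration)."""
    cov = covariance(rows)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))
    vec = [1.0 / (i + 1) for i in range(d)]  # asymmetric start vector
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(d))
                     for i in range(d))
    return eigenvalue / total


def permutation_test(rows, n_permutations=50, seed=0):
    """Compare the observed ratio with ratios from column-shuffled copies."""
    rng = random.Random(seed)
    observed = first_pc_ratio(rows)
    permuted = []
    for _ in range(n_permutations):
        columns = [list(col) for col in zip(*rows)]
        for col in columns:
            rng.shuffle(col)  # break the correlation between features
        permuted.append(first_pc_ratio([list(r) for r in zip(*columns)]))
    mean_permuted = sum(permuted) / len(permuted)
    p_value = (1 + sum(p >= observed for p in permuted)) / (1 + n_permutations)
    return observed, mean_permuted, p_value


# Toy data with three strongly correlated features.
rng = random.Random(1)
rows = []
for i in range(30):
    t = i / 10
    rows.append([t, t + 0.05 * rng.gauss(0, 1), t + 0.05 * rng.gauss(0, 1)])

observed, mean_permuted, p_value = permutation_test(rows)
print(f"observed={observed:.3f} permuted mean={mean_permuted:.3f} p={p_value:.3f}")
```

On this toy data the observed ratio sits far above the permuted mean, so PCA would be worthwhile. The situation described in this article is the opposite case: when the observed ratio is about the same as the permuted mean, the components capture no more structure than chance.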

 

2. Identifying 20 important topics

The CorEx results showed about 20 important topics, along with the important words per topic. But how should these results be interpreted?

The data was very high dimensional and dimensionality reduction was not helpful at all. For example, if price, electricity, ticket, fuel, gas, and skepticism are the most important words for one topic, how do we understand the concerns of the people of that country? Is it the fuel price that concerns them? Or electricity prices, or ticket prices? Each topic can combine many different, possibly related words, and just looking at the important words in each topic does not reveal the story behind the data.

Besides, using bigrams or trigrams with topic models did not help much either, because the keywords conveying the main focus of a tweet do not always appear together.

 

 

 

 

Approach 3: Clustering (Kmeans & Hierarchical)

Both Kmeans and Hierarchical clustering models led to comparable results, showing clearly separated clusters. Because both models performed comparably, we derived all results using Hierarchical clustering, which better shows the hierarchy of the clusters. Tweet data was collected for the four countries discussed above, and the model was applied to each country's data separately. For brevity, we only show the clustering results for India, but the insights for all countries are given at the end of the article.
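For readers who have not used hierarchical clustering before, the following is a minimal sketch of the bottom-up (agglomerative) variant on toy feature vectors. The cosine distance, the average linkage, and the function names are assumptions made for illustration; the project itself most likely used a library implementation on the TF-IDF features, with settings that are not stated in this article.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns a list of index lists."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        dists = [cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        pairs = [(linkage(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Two toy "topics": vectors 0 and 1 point one way, vectors 2 and 3 another.
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
clusters = agglomerative(vectors, n_clusters=2)
print(clusters)
```

Recording the order of the merges, rather than stopping at a fixed number of clusters, is what yields the hierarchy that makes this method easier to inspect than K-means.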

 

 

 

 

Hierarchical Clustering Results

After finding clear clusters in the data, the next step was to interpret them by creating meaningful visualizations and insights. A combination of Scattertext, co-occurrence graphs, dispersion plots, collocated word clouds, and top trigrams resulted in very useful insights from the data.

An important lesson to point out here is to always rely on a combination of various plots for your interpretations instead of only one. Each type of plot helps us visualize one aspect of data and combining various plots together helps to create a comprehensive clear picture from data.

 

 

1. Using Scattertext

Scattertext is an excellent exploratory text analysis tool that allows cool visualizations differentiating between the terms used by different documents using an interactive scatter plot.

Two types of plots were created, both of which were very helpful in interpreting the results.

1) Visualizing word embedding projections. This has been explored using word association with a specific keyword. The keywords include the following: [Access, Availability, Affordability, Bills, Prices]. If the reader is interested, they can try more keywords using the provided code in this study.

2) In another plot, the uni-grams from the clustered tweets are selected and plotted using their dense-ranked category-specific frequencies. We used this difference in dense ranks as the scoring function.
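The dense-rank scoring used in the second plot type can be illustrated without Scattertext itself. The sketch below is an approximation for intuition only: the function names are hypothetical, and normalizing each rank by the category's maximum rank is an assumption of this sketch rather than a documented detail of the plots shown here.

```python
from collections import Counter


def dense_ranks(counts):
    """Map each term to its dense rank: 1 for the lowest frequency, no gaps for ties."""
    distinct = sorted(set(counts.values()))
    rank_of = {freq: rank for rank, freq in enumerate(distinct, start=1)}
    return {term: rank_of[freq] for term, freq in counts.items()}


def rank_difference_scores(docs_a, docs_b):
    """Score each term by the difference of its normalized dense ranks in A and B."""
    counts_a = Counter(tok for doc in docs_a for tok in doc)
    counts_b = Counter(tok for doc in docs_b for tok in doc)
    vocab = set(counts_a) | set(counts_b)
    ranks_a = dense_ranks({t: counts_a[t] for t in vocab})
    ranks_b = dense_ranks({t: counts_b[t] for t in vocab})
    max_a, max_b = max(ranks_a.values()), max(ranks_b.values())
    return {t: ranks_a[t] / max_a - ranks_b[t] / max_b for t in vocab}


cluster_1 = [["power", "cut", "power"], ["power", "bill"]]
cluster_2 = [["solar", "cheap"], ["solar", "power"]]
scores = rank_difference_scores(cluster_1, cluster_2)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term:6s} {score:+.3f}")
```

Positive scores mark terms that rank higher in the first cluster, negative scores mark terms that rank higher in the second, which corresponds to the blue and red ends of the plots.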

All the interactive plots are stored as HTML files and are available in the GitHub repository. If you click on the interactive version, the list of tweets containing each specific term can be explored. Please note that hierarchical clustering is applied to the data first, and the clustered tweets are then given to Scattertext as input. You can gain further information by diving deep into these plots. The data used for creating these results can be found here, and the notebook that applies the clustering and creates these scatter plots can be found here.

The following shows the interactive versions of all plots for various countries:

 

1.1. Rank and frequencies across different categories (India)

 

 

Figure 8. An example Scattertext plot showing the positions of terms based on the dense ranks of their frequencies for clusters 1 and 2. The scores are the differences between the terms' dense ranks. The bluer a term, the higher its association score for cluster 1; the redder a term, the higher its association score for cluster 2. See Cluster 1 vs 2 for an interactive version of this plot.

 

 

Figure 9. An example Scattertext plot showing the positions of terms based on the dense ranks of their frequencies for clusters 1 and 3. The scores are the differences between the terms' dense ranks. The bluer a term, the higher its association score for cluster 1; the redder a term, the higher its association score for cluster 3. See Cluster 1 vs 3 for an interactive version of this plot.

 

 

1.2. Word embedding projection plots using Scattertext (India)

 

 

Figure 10. An example Scattertext plot showing word associations with the term prices, using spaCy's pre-trained embedding vectors. It shows the terms most associated with prices: the top right corner holds the most commonly associated words, such as electricity. If you click on the interactive version, the list of tweets with the terms can be explored. See Word Embedding: Prices for an interactive version of this plot.

 

 

Figure 11. An example Scattertext plot showing word associations with the term bills, using spaCy's pre-trained embedding vectors. It shows the terms most associated with bills: the top right corner holds the most commonly associated words, such as electricity, prices, energy, and power. If you click on the interactive version, the list of tweets with the terms can be explored. See Word Embedding: Bills for an interactive version of this plot.

 

 

2. Twitter Insights (Price & Energy Transition Concerns)

 

2.1. India
  • Solar and wind do not necessarily mean cheaper prices, as they did not in Germany. When Germany went all in on renewables, energy prices and carbon emissions went up.
  • Electricity prices can drop for people who source power from government-owned renewable sources, because those prices do not vary with oil and natural gas.
  • Renewable energy policy can lead to much lower electricity prices, a stronger globally competitive economy, less import of fossil fuels, and as a result less pollution.
  • Putting a tax on coal and making open access a reality are two potential action areas to make renewable energy affordable.
  • Let oil prices increase and stop subsidies.
  • Many requests to replace fossil fuels with cleaner fuels such as crop stubble from farmers.
  • Cut oil imports and encourage renewable energies.
  • Many complaints about electricity shortages, lack of electricity for hours or days, power cuts, and electricity and water supply.
  • Fossil fuels are dirty and nuclear power is dangerous. Therefore, we need to make renewable energy work.

 

2.2. Nigeria
  • People complain about the lack of constant electricity and the absence of business-friendly policy.
  • Enhancing the delivery of electricity in the country.
  • Whenever it rains, the electricity supply is cut off for days; electricity is missing on weekends, during the day, and overnight; and the supply is unstable.
  • No water and no electricity.
  • The electricity sector is the third main consuming sector of oil.
  • Lots of worries and trouble regarding paying electricity bills.
  • Access to electricity is not for everyone.
  • Access to affordable sustainable renewable energy.
  • Renewable energy, water, and waste management are some of Nigeria’s major partnership areas with Ghana.
  • Harnessing tidal or offshore wind energy, which is a clean and renewable source.
  • Lots of positive experiences and low prices with the usage of solar power systems.

 

2.3. UK

  • Bringing down the prices of electricity and gas.
  • Having stable prices for electricity.
  • People would rather see higher prices for gas than for electricity.
  • Need to think beyond electricity to effect the energy transition.
  • Renewables disrupt the electricity market, and politicians raising electricity prices to tackle the climate emergency is an awful policy.
  • Many requests for investment in renewable energy.
  • The transition to renewables is too slow.
  • Lots of discussions on whether it is good to replace the nuclear stations with renewables.
  • Whether the zero-carbon economy has any economic benefit for the UK.

 

2.4. USA

  • Slowing down climate change.
  • Market-based solutions for climate change.
  • Renewable energy infrastructure is lame and unreliable.
  • Renewables increase electricity prices and distort energy markets with favorable purchase agreements.
  • Many complaints regarding gas prices.
  • National security priorities should include renewable energy, with investment in its infrastructure and jobs programs.
  • Figure out how to store renewable energy and get rid of excess CO2 in the atmosphere.
  • Renewable energy represents a significant economic opportunity.

 

 

3. Weighing a word's importance via Dispersion Plot

A word’s importance can be weighed by its dispersion in a corpus. Lexical dispersion is a measure of a word’s homogeneity across the parts of a corpus. The following plot shows how often each keyword occurs throughout the entire corpus for each country: India, Nigeria, the UK, and the USA.
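As a rough illustration of what a dispersion plot encodes, the sketch below records the token offsets at which a keyword occurs and counts the occurrences per equally sized slice of the corpus. The function names and the toy corpus are hypothetical; plotting libraries typically draw one tick per offset.

```python
def word_offsets(tokens, word):
    """Positions in the token stream where `word` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def segment_counts(tokens, word, n_segments):
    """Count occurrences of `word` in each of n equal-width slices of the corpus."""
    counts = [0] * n_segments
    for offset in word_offsets(tokens, word):
        counts[offset * n_segments // len(tokens)] += 1
    return counts


tokens = "no electricity today access to electricity is limited we need access".split()
print(word_offsets(tokens, "access"))
print(segment_counts(tokens, "electricity", n_segments=2))
print(segment_counts(tokens, "access", n_segments=2))
```

A keyword whose counts are spread evenly over the slices is discussed throughout the corpus, while a keyword concentrated in one slice reflects a local burst of tweets.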

According to the following dispersion plot, access to electricity is an important concern for Nigeria, while this is not the case for the other three countries. How do we know that this access is related to electricity? The answer lies in the Scattertext plots shown in the previous section. Analyzing those plots together with the dispersion plot shows that the concern is electricity access.

Access to affordable renewable energy is a big concern in Nigeria, followed by India, while the affordability of renewable energy does not appear to be a problem for people in the UK and the USA. Affordability is a particular concern in Nigeria, where people have difficulty paying their electricity bills.

Energy, electricity, power, and renewables are also the topic of most of the discussions in all of these countries. But what aspects of each topic are of concern to each country? The answer is given in the previous section where we interpret the results of Scattertext plots.

 

 

Figure 12. Lexical dispersion for various keywords across different countries

 

 

4. Top Trigrams for Different Countries

 

 

Figure 13. Top twenty trigrams for India

 

 

As can be seen from the top 20 trigrams for India, the top concerns are Renewable energy, Renewable energy sector, Renewable energy capacity, Renewable energy sources, New renewable energy, and Clean renewable energy. These top concerns match the insights drawn from clustering in the previous section.

 

 

Figure 14. Top twenty trigrams for Nigeria

 

 

As can be seen from the top 20 trigrams for Nigeria, the top concerns are Renewable energy, Renewable energy training, Electricity distribution companies, Renewable energy sources, Renewable energy solutions, Solar renewable energy, Renewable energy sector, Affordable prices, Power supply, Climate change renewables, Public-private sectors, Renewable energy industry, Renewable energy policies, and Access to renewable energy. These top concerns match the insights drawn from clustering in the previous section.

 

 

Figure 15. Top twenty trigrams for UK

 

 

As can be seen from the top 20 trigrams for the United Kingdom, the top concerns are Free renewable energy, Renewable energy sources, Using renewable energy, and New renewable energy. These top concerns match the insights drawn from clustering in the previous section.

 

 

Figure 16. Top twenty trigrams for USA

 

 

As can be seen from the top 20 trigrams for the USA, the top concerns are Clean renewable energy, Renewable energy sources, Supporting renewable energy, Renewable fuel standard, Transition into renewable energy, Solar renewable energy, New renewable energy, Using renewable energy, Need for quality products, and Renewable energy jobs. These top concerns match the insights drawn from clustering in the previous section.
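The counts behind trigram charts like the ones in this section can be reproduced with a few lines of code. The sketch below is a minimal version assuming the tweets are already cleaned and tokenized; the function name `top_ngrams` and the toy tweets are hypothetical.

```python
from collections import Counter


def top_ngrams(token_lists, n=3, k=20):
    """Return the k most frequent n-grams, counted within each tweet separately."""
    counts = Counter()
    for tokens in token_lists:
        # zip over n shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)


tweets = [
    ["clean", "renewable", "energy", "jobs"],
    ["new", "clean", "renewable", "energy"],
    ["renewable", "energy", "sector"],
]
print(top_ngrams(tweets, n=3, k=3))
print(top_ngrams(tweets, n=2, k=1))
```

Counting per tweet keeps n-grams from spanning the boundary between two unrelated tweets. Setting `n=2` gives the bigram counts that a co-occurrence network is built from.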

 

 

5. Collocated word clouds & Co-occurrence Network

The following plots display the networks of co-occurring words in tweets from the different countries. Here, we visualize the network of the 25 most frequent bigrams. The connections between the words confirm the insights derived in the previous section in all cases.

 

 

Figure 17. Collocate Clouds-India

Figure 18. Co-occurrence Network-India (First 25 Bigrams)

Figure 19. Collocate Clouds-Nigeria

Figure 20. Co-occurrence Network-Nigeria (First 25 Bigrams)

Figure 21. Collocate Clouds-UK

Figure 22. Co-occurrence Network-UK (First 25 Bigrams)

Figure 23. Collocate Clouds-USA

Figure 24. Co-occurrence Network-USA (First 25 Bigrams)

 

 

 

 

 

 

More about Omdena

Omdena is an innovation platform for building AI solutions to real-world problems through the power of bottom-up collaboration.

Geographical Data Science to Identify the Most Impactful Areas for Solar Installation in Africa

Data-driven decision making and signal processing with Google Earth Engine to meet the electricity and water demand in Nigeria.

The Nigerian NGO Renewable Africa #RA365 has the mission to install off-grid solar containers to mitigate the lack of electricity access in the country, where only half of the population of 198 million has stable access to the power supply. We approached the problem of where to install them using geographical data science.

 

The demand – A known Problem

The Demographic and Health Surveys (DHS) provide a large amount of data on African and other developing countries.

 

Nigerian Electricity Supply 2015

Exploring DHS data on Nigerian Electricity Demand in 2015: Github

 

This dataset has been used by several researchers and plots similar to the above can be found throughout the internet and literature.

However, the dataset for Nigeria is based on a 2015 survey of only about 1,000 households per state, without specifying their precise geographic location within each state. Nevertheless, it shows the critical state of energy access in Nigeria. For example, of the 1,194 sampled households in Sokoto state, only 20% (239) had access to electricity in 2015.

 

Our approach — Nighttime images

We quickly came up with the idea of comparing nighttime satellite imagery against the geographic location of the population.

 

Night sky image in Google Earth Engine

 

Although the nightlights seem quite straightforward to use, we still needed to find where all the Nigerian houses are located, and then check if they are lit up at night or not (demand).

We initially thought of using a UNet-like model to detect or segment the house roofs in satellite images, which has already been done in several machine learning competitions. However, we then came across the population dataset from WorldPop, which is also available in Google Earth Engine and uses ground surveys and machine learning to fill the gaps.

GRID3 is another dataset from the same group, which has been validated during vaccination campaigns and provides much higher resolution and precision.

With both datasets in hand, the math seems easy: demand = population and no lights.

 
// GRID3 population data
var img_pop3 = ee.ImageCollection('users/henrique/GRID3_NGA_PopEst_v1_1_mean_float');
// Nigerian nightlights (1Y median)
// Note: `nigeria` (the country boundary), `pop_threshold`, and `light_threshold`
// are not defined in this snippet.
var nighttime = ee.ImageCollection('NOAA/VIIRS/DNB/MONTHLY_V1/VCMSLCFG')
                  .filter(ee.Filter.date('2018-09-01', '2019-09-30'))
                  .median()
                  .select('avg_rad')
                  .clipToCollection(nigeria);
// Demand layer: 1 where population is at or above its threshold
// and night lights are at or below theirs
var demand = img_pop3.gte(pop_threshold) // threshold population
                     .multiply(nighttime.lte(light_threshold)); // population without lights

Here is the code snippet link.
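The same "population and no lights" rule can be checked offline on small grids. The sketch below is a plain Python stand-in for the Earth Engine expression above, using nested lists instead of rasters; the function name, the grid values, and the thresholds are made up for illustration.

```python
def demand_mask(population, lights, pop_threshold, light_threshold):
    """Return 1 where a cell is populated (>= pop_threshold) and dark (<= light_threshold)."""
    return [
        [int(p >= pop_threshold and l <= light_threshold)
         for p, l in zip(pop_row, light_row)]
        for pop_row, light_row in zip(population, lights)
    ]


# Hypothetical 2x3 grids: people per cell and average night-time radiance.
population = [[0, 50, 200],
              [120, 5, 300]]
lights = [[0.0, 0.1, 5.0],
          [0.2, 0.0, 0.3]]

demand = demand_mask(population, lights, pop_threshold=100, light_threshold=0.5)
print(demand)
```

Only the two bottom cells with many people and little light are flagged; the densely populated but well-lit cell in the top right is not.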

 

Some challenges to overcome

However, we first have to take into account the noise present in each of the datasets. Second, we have to find the optimal places to install the solar containers within the immense sea of electricity demand in Nigeria.

We also used a few sample villages (where the electricity supply was known) to calibrate the thresholds for minimal population density and minimal light levels used in the algorithm.

 

The region around Omuo, Ekiti

 

Overlay with both NOAA datasets VIIRS (blue) and DMSP-OLS (orange) nighttime lights, smoothed by a Gaussian convolutional filter

 

Overlay with GRID3 population data (green)

 

Building the location heatmap

A large part of the container installation cost is due to the wiring and distribution of the electricity. This cost grows nonlinearly with the distance between the panel and the house to be supplied, so it is much cheaper to supply nearby houses.

For example, supplying a house 200 m away from the energy source should cost more than twice as much as supplying one 100 m away.

We assume the optimal solar panel location in relation to a household will approximately follow a Gaussian distribution due to the wiring cost. Therefore both noisy nightlights and the electricity demand itself can be smoothed out by applying Gaussian convolutional filters in order to find the best spots for the solar panel installation.
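To show what the Gaussian smoothing does to a demand layer, here is a small pure Python sketch on a toy grid. The kernel radius, the value of sigma, and the function names are assumptions for illustration; in the project the filtering was done inside Google Earth Engine.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalized 2D Gaussian weights on a (2*radius+1) square grid."""
    weights = [[math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
                for dx in range(-radius, radius + 1)]
               for dy in range(-radius, radius + 1)]
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]


def smooth(grid, radius=1, sigma=1.0):
    """Convolve `grid` with a Gaussian kernel, treating cells outside the grid as 0."""
    kernel = gaussian_kernel(radius, sigma)
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    rr, cc = r + dy, c + dx
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[r][c] += grid[rr][cc] * kernel[dy + radius][dx + radius]
    return out


# A single household with demand in the middle of a 5x5 area.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 1.0
heat = smooth(grid)
for row in heat:
    print(" ".join(f"{v:.3f}" for v in row))
```

The smoothed value is highest at the household and falls off with distance, so the peaks of the smoothed demand layer mark spots that are close to many unlit households at once. The Gaussian also mirrors the wiring-cost intuition: if sigma were 100 m, the weight at 200 m would be exp(-1.5), roughly 0.22 times the weight at 100 m, which is well below half.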

 

Demand heatmap of Omuo, Ekiti

 

Finally, we tried several image segmentation techniques to capture the clusters of demand. The best technique in GEE turned out to be the very simple connected components algorithm.

// GMeans Segmentation
var seg = ee.Algorithms.Image.Segmentation.GMeans(smooth.gt(demand_threshold), 3, 50, 10);
Map.addLayer(seg.randomVisualizer(), {opacity:0.5}, 'GMeans Segmentation');
// SNIC Segmentation
var snic = ee.Algorithms.Image.Segmentation.SNIC(smooth.gt(demand_threshold), 30, 0, 8, 300);
Map.addLayer(snic.randomVisualizer(), {opacity:0.5}, 'SNIC Segmentation');

// Uniquely label the patches and visualize.
var patchid = smooth.gt(demand_threshold)
                    .connectedComponents(ee.Kernel.plus(1), 256);
Map.addLayer(patchid.randomVisualizer(), {opacity:0.5}, 'Connected Patches');

 

Here is the code snippet link for GEE Algorithms for Image Segmentation

 

 

Additionally, we can sum the population density of each area to estimate the total population in each cluster.

// Make a suitable image for `reduceConnectedComponents()`
// by adding a label band to the `img_pop3` image.
img_pop3 = img_pop3.addBands(patchid.select('labels'));

// Calculate the total population in demand area
// defined by the previously added "labels" band
// and reproject to original scale
var patchPop = img_pop3.reduceConnectedComponents({
  reducer: ee.Reducer.sum(),
  labelBand: 'labels',
}).rename(['pop_total']).reproject(img_pop3.projection());

Here is the code snippet link.
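For intuition, the two Earth Engine steps above (labelling connected patches, then summing the population per patch) can be mimicked on a small grid. The sketch below uses 4-connectivity, which corresponds to the plus-shaped kernel in the GEE snippet; the function names and the toy grids are hypothetical.

```python
from collections import deque


def label_patches(mask):
    """4-connected component labelling; 0 is background, patches get 1, 2, ..."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of the current patch
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def population_per_patch(labels, population):
    """Sum the population of all cells that share a patch label."""
    totals = {}
    for label_row, pop_row in zip(labels, population):
        for label, pop in zip(label_row, pop_row):
            if label:
                totals[label] = totals.get(label, 0) + pop
    return totals


mask = [[1, 1, 0],
        [0, 0, 0],
        [0, 1, 1]]
population = [[10, 20, 5],
              [1, 1, 1],
              [7, 30, 40]]
labels = label_patches(mask)
print(labels)
print(population_per_patch(labels, population))
```

Each patch then becomes one candidate installation area, ranked by the number of people it would serve.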
 
 
 

GEE allows you to export the raster as a TIF, which can then be processed with GeoPandas to find the contours and centroids of the demand patches and link them back to Google Maps for further exploration.

 

Interactive map using Folium and leaflet.js on Jupyter (all potential locations with a population above 4000)

 

Conclusion

 

We showed how to combine satellite imagery and population data to create an interactive map and a list of the top Nigerian regions with high demand for electricity, which are candidate sites for installing solar containers.

The NGO Renewable Africa will use those tools to survey and validate the locations before installing the solar panels. This should have a real impact on the lives of thousands of people in need. Additionally, this report can also be used to show where the demand lies and help to pressure the local government into action.

We also hope that the initiative is followed by the neighboring and other developing countries, as all the methodology and code used here can be easily transferred to other locations.

 

The Code

Source code for both GEE and the Colab notebook is available here.

 

 

 

Increasing Solar Adoption in the Developing World through Machine Learning and Image Segmentation

 

The problem

How to Increase Solar Adoption in the developing world through Image Segmentation? Applied in India.

 

The solution

Step 1: Identification of the Algorithm: Image Segmentation

We initially started with the goal of increasing solar adoption using image segmentation algorithms from computer vision. The goal was to segment the image into roofs and non-roofs by identifying the edges of the roofs. Our first attempt used the Watershed image segmentation algorithm, which is especially useful for extracting touching or overlapping objects in images. The algorithm is very fast and computationally inexpensive. In our case, the average computing time for one image was 0.08 sec.

Below are the results from the Watershed algorithm.

 
 
 
Images of rooftops in Delhi, India

Original image(left). The output from the Watershed model(right)

 

As you can see the output was not very good. Next, we implemented Canny Edge Detection. Like Watershed, this algorithm is also widely used in computer vision and tries to extract useful structural information from different visual objects. In the traditional Canny edge detection algorithm, there are two fixed global threshold values to filter out the false edges. However, as the image gets complex, different local areas will need very different threshold values to accurately find the real edges. So there is a technique called auto canny, where the lower and upper bound are automatically set. Below is the Python function for auto canny:

 

 

Snippet of the code for Image Segmentation
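The original function is not reproduced in the text, so the following is only an assumed reconstruction of the idea: derive both thresholds from the median pixel intensity. The value sigma=0.33 and the function name are assumptions taken from a widely circulated formulation of auto canny, not from this project's code.

```python
import statistics


def auto_canny_thresholds(pixels, sigma=0.33):
    """Derive lower/upper Canny thresholds from the median pixel intensity.

    `pixels` is a flat sequence of grayscale values in 0..255.
    """
    median = statistics.median(pixels)
    lower = int(max(0, (1.0 - sigma) * median))
    upper = int(min(255, (1.0 + sigma) * median))
    return lower, upper


# Toy grayscale values; a real image would be flattened first.
lower, upper = auto_canny_thresholds([10, 100, 120, 130, 250])
print(lower, upper)

# With OpenCV installed, the thresholds would then be passed to the detector,
# for example: edges = cv2.Canny(gray_image, lower, upper)
```

Because the thresholds follow the median brightness, dark and bright images get different bounds without manual tuning. Note that this is still one pair of thresholds per image, so it does not fully solve the problem of local areas needing different values.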

 

 
 

The average time taken by a Canny edge detector on one image is approx. 0.1 sec, which is very good. And the results were better than the Watershed algorithm, but still, the accuracy is not enough for practical use.

 
 
 
 
Image of Delhi rooftops for understanding the algorithm

The output from the Canny Edge detection algorithm

 

Both of the above techniques work without understanding the context and content of the object we are trying to detect (i.e. rooftops). We may get better results when we train an algorithm on what the objects (i.e. rooftops) look like. Convolutional Neural Networks are state-of-the-art technology for understanding the context and content of an image, and are used here for the rooftop segmentation.

As mentioned earlier, we want to segment the image into two parts — a rooftop or not a rooftop. This is a Semantic segmentation problem. Semantic segmentation attempts to partition the image into semantically meaningful parts and to classify each part into one of the predetermined classes.

 
 
 
 
Explaining what segmentation is using a basic example

Semantic Segmentation (picture taken from https://www.jeremyjordan.me/semantic-segmentation/)

 

In our case, each pixel of the image needs to be labeled as a part of the rooftop or not.

 
 
 
 
Differentiating image into two segments, Roof and Non roof part

We want to segment the image into two segments — roof and not roof(left) for a given input image(right).

 

Step 2: Generating the Training Data

To train a CNN model, we need a dataset of rooftop satellite images of Indian buildings and their corresponding masks. No public dataset of Indian rooftop images with masks is available, so we had to create our own. A team of students tagged the images and created masked images (as below).

And here are the final outputs after masking.

Roof top satellite images converted into image segmented photo

Although the U-Net model is known to work with relatively few training images, we initially had only about 20 images in our training set, which is far too few for any model to give good results, even U-Net. One of the most popular techniques for dealing with limited data is data augmentation: generating additional images from the ones already in the dataset by applying a few basic alterations to the originals.

For example, in our case, any rooftop image that is rotated by a few degrees or flipped horizontally or vertically acts as a new rooftop image, provided the rotation or flip is applied in exactly the same way to both the roof image and its mask. We used the Keras Image Generator on the already tagged images to create more images.
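
The key constraint, that an image and its mask must receive the same transform, can be sketched in plain Python. This is not the Keras generator the team used; it is a simplified illustration on nested lists, using 90-degree rotation instead of small-angle rotation (which would need interpolation). All function names are hypothetical.

```python
def flip_horizontal(grid):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in grid]


def flip_vertical(grid):
    """Mirror the rows top-to-bottom."""
    return [list(row) for row in grid[::-1]]


def rotate_90(grid):
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def augment_pair(image, mask):
    """Apply each transform identically to the image and its mask."""
    transforms = (flip_horizontal, flip_vertical, rotate_90)
    return [(t(image), t(mask)) for t in transforms]


# Toy 2x2 image and mask: the pixel with value 2 is the only "roof" pixel.
image = [[1, 2],
         [3, 4]]
mask = [[0, 1],
        [0, 0]]
pairs = augment_pair(image, mask)
```

Because the same function is applied to both grids, the roof pixel stays aligned with its mask label in every augmented pair.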

Augmenting Data

Data Augmentation

 

Step 3: Preprocessing input images

We tried to sharpen these images. We used two different sharpening filters: low/soft sharpening and high/strong sharpening. After sharpening, we applied a bilateral filter to reduce the noise produced by sharpening. Below are some lines of Python code for sharpening:

Code for Low Sharpening Filter

Low sharpening filter

Code for high sharpening filter

High sharpening filter
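
The original filter code was published as images, so the exact kernels are not known. The sketch below shows the general idea with two commonly used 3x3 sharpening kernels, which are assumptions standing in for the team's soft and strong filters. It is written in plain Python for clarity; in practice the same kernels would typically be applied with OpenCV's `cv2.filter2D`, followed by `cv2.bilateralFilter` for noise reduction.

```python
# Assumed kernels: each sums to 1, so flat regions keep their brightness.
SOFT_KERNEL = [[0, -1, 0],
               [-1, 5, -1],
               [0, -1, 0]]
STRONG_KERNEL = [[-1, -1, -1],
                 [-1, 9, -1],
                 [-1, -1, -1]]


def sharpen(gray, kernel):
    """Apply a 3x3 kernel to a grayscale image given as a list of rows.

    Border pixels are handled by replicating the nearest edge pixel,
    and results are clipped to the 0..255 range.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += kernel[dy + 1][dx + 1] * gray[yy][xx]
            out[y][x] = min(max(total, 0), 255)
    return out


# Toy image with a vertical edge between dark (10) and bright (200) columns.
edge = [[10, 10, 200]] * 3
soft = sharpen(edge, SOFT_KERNEL)
```

On the toy edge, the dark side is pushed darker and the bright side brighter, which is the contrast boost that sharpening provides along rooftop boundaries.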

And below are the outputs.

Satellite view of buildings

Google Images

Step 4: Training and Validating the model

We generated training data of 445 images. Next, we chose to use the U-Net architecture. U-Net was initially used for biomedical image segmentation, but because of the good results it achieved, it is now applied to a variety of other tasks and is one of the best network architectures for image segmentation. In our first approach with the U-Net model, we used the RMSProp optimizer with a learning rate of 0.0001 and binary cross-entropy with Dice loss (implementation taken from here). We ran our training for 200 epochs; the average training Dice coefficient over the last 5 epochs was 0.6750 and the validation Dice coefficient was 0.7168.
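
For readers unfamiliar with the metric: the Dice coefficient measures the overlap between predicted and target masks as twice their intersection divided by the sum of their sizes, and the combined loss adds binary cross-entropy to one minus Dice. The following is a plain-Python sketch on flattened masks, not the linked implementation the team used; the `smooth` and `eps` terms are assumed stabilizers.

```python
import math


def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Overlap score in (0, 1]; 1.0 means the masks match exactly."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    return (2.0 * intersection + smooth) / (sum(y_true) + sum(y_pred) + smooth)


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean per-pixel cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


def bce_dice_loss(y_true, y_pred):
    """Binary cross-entropy plus Dice loss (1 - Dice coefficient)."""
    return binary_cross_entropy(y_true, y_pred) + (1.0 - dice_coefficient(y_true, y_pred))
```

A perfect prediction gives a Dice coefficient of 1.0 and a combined loss close to zero, so the reported validation value of about 0.72 indicates substantial but incomplete overlap.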

Here are the results of our first approach from the Validation set (40 images):

Predicted and Targeted Image

Predicted (left), Target (right)

Predicted (left), Target (right)

As you can see, the predicted images contain some 3D traces of building structure in the middle and corners of the predicted mask. We found that this is due to the Dice loss. Next, we used the Adam optimizer with a learning rate of 1e-4 and a decay rate of 1e-6 instead of RMSProp. We used IoU loss instead of BCE+Dice loss, together with the binary accuracy metric from Keras. Training was performed for 45 epochs. The average training accuracy over the last 5 epochs was 0.862 and the average validation accuracy was 0.793. Below are some of the predicted masks on the validation set from the second approach:
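
As a rough illustration of the second setup, here is a plain-Python sketch of an IoU (Jaccard) loss and a binary accuracy metric on flattened masks. It is not the team's Keras code; the `smooth` term and the 0.5 threshold are assumptions.

```python
def iou_loss(y_true, y_pred, smooth=1.0):
    """1 - intersection over union; 0.0 means the masks match exactly."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    union = sum(y_true) + sum(y_pred) - intersection
    return 1.0 - (intersection + smooth) / (union + smooth)


def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of pixels whose thresholded prediction matches the label."""
    correct = sum(
        1 for t, p in zip(y_true, y_pred) if (p > threshold) == (t > threshold)
    )
    return correct / len(y_true)
```

Note that binary accuracy counts background pixels as well as roof pixels, so it is not directly comparable with the Dice coefficients reported for the first approach.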

And here are the results from the test data:

Test data

Test data

More About Omdena

Omdena is an innovation platform for building AI solutions to real-world problems through the power of bottom-up collaboration.

Tackling Energy Poverty in Nigeria Through Artificial Intelligence

Can AI help to address energy poverty in Nigeria where more than 100m people lack stable access to electricity?

 

By Laura Clark Murray 


 

A staggering 1 billion people on Earth live in energy poverty

Without stable access to electricity, families can’t light their homes or cook their food. Hospitals and schools can’t dependably serve their communities. Businesses can’t stay open.

Energy poverty shapes and constrains nearly every aspect of life for those who are trapped in it. As the Global Commission to End Energy Poverty puts it, “we cannot end poverty without ending energy poverty.” In fact, energy poverty is considered to be one of humanity’s greatest challenges of this century.

In Nigeria, Africa’s most populous country, more than half of the 191 million citizens live in energy poverty. And though governments have been talking for years about extending national electricity grids to deliver energy to more people, they’ve made little progress.

 

With such a vast problem, what can be done?

Rather than focusing on the national electricity grid, Nigerian non-profit Renewable Africa 365, or RA365, is taking a different approach. RA365 is working with local governments to install mini solar power substations, known as renewable energy microgrids. Each microgrid can deliver electricity to serve small communities of 4,000 people. In this way, RA365 aims to address Nigerian energy poverty community-by-community with solar installations.

To be effective, RA365 needs to convince local policymakers of the potential impact of a microgrid in their community. For help they turned to Omdena. Omdena is a global platform where AI experts and data scientists from diverse backgrounds collaborate to build AI-based solutions to real-world problems. You can learn more here about Omdena’s innovative approach to building AI solutions through global collaboration.

 

Which communities need solar microgrids the most?

Omdena pulled together a global team of AI experts and data scientists. Working collaboratively from remote locations around the globe, the team set about identifying the regions in Nigeria where the energy poverty crisis is most dire and where solar power is likely to be effective. 

To determine which regions don’t have access to electricity, our team looked to satellite imagery for the areas of the country that go completely dark at night. Of those locations, they prioritized communities with large populations that include schools and hospitals. The collaborators also looked at the distance of those communities from the existing national electricity grid: if a community is physically far from the existing grid, it’s unlikely to be hooked up anytime soon. In this way, by analyzing the satellite data together with population data, the team identified the communities most in crisis.
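
The article does not describe the team's actual scoring method, so the following is only a toy sketch of the selection logic outlined above: keep communities that are dark at night and have a school or hospital, then rank them by population and by distance from the nearest grid point. The `Community` fields, the `dark_threshold` value, and the ranking order are all hypothetical.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Community:
    name: str
    lat: float
    lon: float
    population: int
    night_light: float  # 0.0 means completely dark at night
    has_school_or_hospital: bool


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def rank_candidates(communities, grid_points, dark_threshold=0.05):
    """Return (community, distance_km) pairs, highest priority first."""
    ranked = []
    for c in communities:
        # Skip communities that are lit at night or lack key facilities.
        if c.night_light > dark_threshold or not c.has_school_or_hospital:
            continue
        distance = min(
            haversine_km(c.lat, c.lon, g_lat, g_lon) for g_lat, g_lon in grid_points
        )
        ranked.append((c, distance))
    # Larger populations first; ties broken by greater distance from the grid.
    ranked.sort(key=lambda item: (item[0].population, item[1]), reverse=True)
    return ranked
```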

In any machine learning project, the quality and quantity of relevant data are critical. However, unlike projects done in the lab, the ideal data to solve a real-world problem rarely exists. In this case, available data on the Nigerian population was incomplete and inaccurate. There wasn’t data on access to the national electricity grid. Furthermore, the satellite data couldn’t be relied upon. Given this, the team had to get creative. You can read how our team addressed these data roadblocks in this article from collaborator Simon Mackenizie.

 

What’s the impact?

The team built an AI system that identifies regional clusters in Nigeria where renewable energy microgrids are both most viable and likely to have high impact on the community. In addition, an interactive map acts as an interface to the system.

AI in Nigeria

Heatmap with the most suitable spots for solar panel installations

RA365 now has the tools it needs to guide local policymakers towards data-driven decisions about solar power installation. What’s more, they’re sharing the project data with Nigeria Renewable Energy Agency, a major funding source for rural electrification projects across Nigeria. 

With this two-month challenge, the Omdena team delivered one of the first real-world machine learning solutions to be deployed in Nigeria. Importantly, our collaborators from around the globe join the growing community of technologists working to solve Nigeria’s toughest issues with AI.

Ademola Eric Adewumi, Founder of Renewable Africa 365, shares his experience working with the Omdena collaborators here. Says Adewumi, “We want to say that Omdena has changed the face of philanthropy by its support in helping people suffering from electrical energy poverty. With this great humanitarian help, RA365 hopes to make its mission a reality, bringing renewable energy to Africa.”

 

About Omdena

Building AI through global collaboration

Omdena is a global platform where changemakers build ethical and inclusive AI solutions to real-world problems through collaboration.
