Geographical Data Science to Identify the Most Impactful Areas for Solar Installation in Africa

Data-driven decision making and signal processing with Google Earth Engine to meet the electricity and water demand in Nigeria.

The Nigerian NGO Renewable Africa #RA365 has the mission to install off-grid solar containers to mitigate the lack of electricity access in the country, where only half of the population of 198 million has stable access to the power supply. Our solution uses geographical data science to find where these containers are needed most.


The demand – A known Problem

The Demographic and Health Surveys (DHS) provide a large amount of data on African and other developing countries.


Nigerian Electricity Supply 2015

Exploring DHS data on Nigerian Electricity Demand in 2015: Github


This dataset has been used by several researchers, and plots similar to the one above can be found online and throughout the literature.

However, the Nigerian dataset is based on a 2015 survey of only about 1,000 households per state, without their precise geographic location within each state. Nevertheless, it shows the critical state of energy access in Nigeria. For example, of the 1,194 households sampled in Sokoto state, only 20% (239) had access to electricity in 2015.


Our approach — Nighttime images

We quickly came up with the idea of comparing nighttime satellite imagery against the geographic location of the population.


Night sky image seen through Google Earth Engine

Night sky image in Google Earth Engine


Although the nightlights seem quite straightforward to use, we still needed to find where all the Nigerian houses are located and then check whether they are lit up at night. Inhabited areas that stay dark indicate unmet demand.

We initially thought of using a UNet-like model to detect or segment house roofs in satellite images. This has already been done in several machine learning competitions. However, we came across the population dataset from WorldPop, which is also available in Google Earth Engine and combines ground surveys with machine learning to fill the gaps.

GRID3 is another dataset from the same group, which has been validated during vaccination campaigns and provides much higher resolution and precision.

With both datasets in hand, the math seems easy: demand = population and no lights.

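// GRID3 population data: mosaic the collection into one image and keep
// its native projection (a mosaic otherwise defaults to WGS84 at 1 degree)
var pop_col = ee.ImageCollection('users/henrique/GRID3_NGA_PopEst_v1_1_mean_float');
var img_pop3 = pop_col.mosaic().setDefaultProjection(pop_col.first().projection());
// Nigerian nightlights (1Y median of the average-radiance band)
var nighttime = ee.ImageCollection('NOAA/VIIRS/DNB/MONTHLY_V1/VCMSLCFG')
                  .filterDate('2018-09-01', '2019-09-30')
                  .select('avg_rad')
                  .median();
// Demand layer; pop_threshold and light_threshold must be set beforehand
// (they are calibrated on sample villages, as described below)
var demand = img_pop3.gte(pop_threshold)                        // threshold population
                     .multiply(nighttime.lte(light_threshold)); // population without lights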

Here is the code snippet link.


Some challenges to overcome

However, we first have to take into account the noise present in each of the datasets. Second, we need to find the optimal places to install the solar containers within the immense sea of electricity demand in Nigeria.

We also used a few sample villages, where the electricity supply was known, to calibrate the population-density and light-level thresholds used by the algorithm.
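The calibration step can be sketched as a simple grid search: for each candidate pair of thresholds, classify the sample villages with the same rule as the demand layer and count how many match their known status. The village records, their values and the candidate grids below are hypothetical placeholders, not the project's data; in practice the values would be sampled from the GRID3 and VIIRS layers at each village.

```javascript
// Hypothetical sample villages: per-pixel population, median night radiance,
// and whether the site is known to be a populated area without electricity.
const villages = [
  { pop: 35, radiance: 0.2, knownDemand: true },
  { pop: 50, radiance: 0.4, knownDemand: true },
  { pop: 40, radiance: 3.1, knownDemand: false }, // has electricity
  { pop: 2, radiance: 0.1, knownDemand: false },  // nearly empty land
  { pop: 60, radiance: 5.0, knownDemand: false },
];

// Same rule as the GEE demand layer: enough people AND little light.
function isDemand(v, popThreshold, lightThreshold) {
  return v.pop >= popThreshold && v.radiance <= lightThreshold;
}

// Try every threshold pair and keep the one that agrees with the most villages.
function calibrate(samples, popCandidates, lightCandidates) {
  let best = { pop: null, light: null, correct: -1 };
  for (const p of popCandidates) {
    for (const l of lightCandidates) {
      const correct = samples.filter(v => isDemand(v, p, l) === v.knownDemand).length;
      if (correct > best.correct) best = { pop: p, light: l, correct };
    }
  }
  return best;
}

const best = calibrate(villages, [1, 5, 10, 20], [0.5, 1, 2, 4]);
console.log(best); // { pop: 5, light: 0.5, correct: 5 }
```

With only a handful of villages several threshold pairs can tie, so in practice more samples (or a tie-breaking rule) would be needed.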


Omuo, Ekiti region satellite image

The region around Omuo, Ekiti


Gaussian Convolutional Filter for nighttime lights over the region of Omuo, Ekiti

Overlay with both NOAA datasets VIIRS (blue) and DMSP-OLS (orange) nighttime lights, smoothed by a Gaussian convolutional filter


Overlay data using GRID3 population

Overlay with GRID3 population data (green)


Building the location heatmap

A large part of the container installation cost comes from wiring and distributing the electricity. This cost grows nonlinearly with the distance between the panel and the house being supplied, so it is much cheaper to supply nearby houses.

For example, supplying a house 200 m away from the energy source should cost more than twice as much as supplying one at 100 m.

Because of the wiring cost, we assume that the suitability of a solar panel location relative to a household falls off approximately as a Gaussian of the distance between them. Therefore, both the noisy nightlights and the electricity demand itself can be smoothed with Gaussian convolutional filters to find the best spots for installing the solar panels.
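A minimal sketch of the smoothing step in plain JavaScript: build a normalized 2D Gaussian kernel and convolve it with a small demand grid. In GEE the same operation is available through `image.convolve()` with an `ee.Kernel.gaussian` kernel; the grid, radius and sigma below are illustrative assumptions, not the values used in the project.

```javascript
// Normalized (2r+1)x(2r+1) Gaussian kernel: weights fall off with distance.
function gaussianKernel(radius, sigma) {
  const k = [];
  let sum = 0;
  for (let dy = -radius; dy <= radius; dy++) {
    const row = [];
    for (let dx = -radius; dx <= radius; dx++) {
      const w = Math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma));
      row.push(w);
      sum += w;
    }
    k.push(row);
  }
  return k.map(row => row.map(w => w / sum)); // weights sum to 1
}

// 2D convolution with zero padding outside the grid.
function convolve(grid, kernel) {
  const r = (kernel.length - 1) / 2;
  const h = grid.length, w = grid[0].length;
  const out = [];
  for (let y = 0; y < h; y++) {
    const row = [];
    for (let x = 0; x < w; x++) {
      let acc = 0;
      for (let dy = -r; dy <= r; dy++) {
        for (let dx = -r; dx <= r; dx++) {
          const yy = y + dy, xx = x + dx;
          if (yy >= 0 && yy < h && xx >= 0 && xx < w) {
            acc += grid[yy][xx] * kernel[dy + r][dx + r];
          }
        }
      }
      row.push(acc);
    }
    out.push(row);
  }
  return out;
}

// Hypothetical demand grid: two neighboring unlit houses and one isolated house.
const demandGrid = [
  [0, 0, 0, 0, 0, 0, 0],
  [0, 1, 1, 0, 0, 0, 0],
  [0, 0, 0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0, 1, 0],
  [0, 0, 0, 0, 0, 0, 0],
];
const heat = convolve(demandGrid, gaussianKernel(1, 1));
```

The two neighboring houses reinforce each other, so their pixels end up hotter than the isolated house, which is the behavior wanted when looking for spots that can serve many nearby households.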


heatmap of Omuo, Ekiti

Demand heatmap


Finally, we tried several image segmentation techniques to capture the clusters of demand. The best technique in GEE turned out to be the very simple connected components algorithm.

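// The segmentations below take the `demand` layer defined above as input.
// GMeans Segmentation
var seg = ee.Algorithms.Image.Segmentation.GMeans(demand, 3, 50, 10);
Map.addLayer(seg.randomVisualizer(), {opacity: 0.5}, 'GMeans Segmentation');
// SNIC Segmentation (visualize only the cluster-id band)
var snic = ee.Algorithms.Image.Segmentation.SNIC(demand, 30, 0, 8, 300);
Map.addLayer(snic.select('clusters').randomVisualizer(), {opacity: 0.5}, 'SNIC Segmentation');

// Uniquely label the patches and visualize.
// selfMask() hides non-demand pixels so only demand areas are labeled;
// ee.Kernel.plus(1) (4-connectivity) is an assumed connectedness kernel.
var patchid = demand.selfMask()
                    .connectedComponents(ee.Kernel.plus(1), 256);
Map.addLayer(patchid.select('labels').randomVisualizer(), {opacity: 0.5}, 'Connected Patches');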


Here is the code snippet link for GEE Algorithms for Image Segmentation
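For intuition, the labeling that `connectedComponents()` performs can be sketched in plain JavaScript as a breadth-first flood fill over a binary demand mask. This is a generic 4-connectivity version written for illustration, not GEE's internal implementation, and the mask is made up.

```javascript
// Label 4-connected groups of non-zero pixels; background stays 0.
function labelComponents(mask) {
  const h = mask.length, w = mask[0].length;
  const labels = mask.map(row => row.map(() => 0));
  let next = 0;
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      if (!mask[y][x] || labels[y][x]) continue;
      next += 1; // start a new patch
      labels[y][x] = next;
      const queue = [[y, x]];
      while (queue.length > 0) {
        const [cy, cx] = queue.shift();
        for (const [dy, dx] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
          const ny = cy + dy, nx = cx + dx;
          if (ny >= 0 && ny < h && nx >= 0 && nx < w && mask[ny][nx] && !labels[ny][nx]) {
            labels[ny][nx] = next;
            queue.push([ny, nx]);
          }
        }
      }
    }
  }
  return { labels, count: next };
}

const mask = [
  [1, 1, 0, 0],
  [0, 1, 0, 1],
  [0, 0, 0, 1],
  [1, 0, 0, 0],
];
const { labels, count } = labelComponents(mask);
// count is 3: the top-left L shape, the right-hand pair, and the lone corner pixel
```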



Additionally, we can sum the population density over each area to estimate the total population in each cluster.

// Make a suitable image for `reduceConnectedComponents()`
// by adding the "labels" band from `patchid` to the `img_pop3` image.
img_pop3 = img_pop3.addBands(patchid.select('labels'));

// Calculate the total population in each demand area
// defined by the previously added "labels" band
// and reproject to the population layer's original scale
var patchPop = img_pop3.reduceConnectedComponents({
  reducer: ee.Reducer.sum(),
  labelBand: 'labels',
  maxSize: 256
}).reproject(img_pop3.select(0).projection());

Here is the code snippet link.

GEE allows you to export the raster as a TIF file, which can then be processed with GeoPandas to find the contours and centroids of the clusters and link them back to Google Maps for further exploration.
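The post-processing can be sketched in plain JavaScript to show the logic: sum the population per cluster label, compute each cluster's centroid, keep clusters above 4,000 people, and build a Google Maps link for each. The grids and the `pixelToLatLng` helper (a simple affine transform with a made-up origin and pixel size) are hypothetical; the project itself worked on exported GeoTIFFs with GeoPandas.

```javascript
// Hypothetical georeferencing: top-left corner and pixel size in degrees.
function pixelToLatLng(row, col, origin = { lat: 7.8, lng: 5.0 }, size = 0.001) {
  return { lat: origin.lat - (row + 0.5) * size, lng: origin.lng + (col + 0.5) * size };
}

// Per-label population total and geometric centroid (mean pixel position).
function summarizeClusters(labels, population) {
  const stats = new Map();
  labels.forEach((row, r) => row.forEach((id, c) => {
    if (id === 0) return; // 0 = not part of any demand cluster
    const s = stats.get(id) || { pop: 0, rowSum: 0, colSum: 0, n: 0 };
    s.pop += population[r][c];
    s.rowSum += r; s.colSum += c; s.n += 1;
    stats.set(id, s);
  }));
  return [...stats].map(([id, s]) => ({
    id,
    pop: s.pop,
    centroid: pixelToLatLng(s.rowSum / s.n, s.colSum / s.n),
  }));
}

const clusterLabels = [
  [1, 1, 0],
  [0, 0, 2],
  [0, 0, 2],
];
const pop = [
  [2500, 2000, 0],
  [0, 0, 900],
  [0, 0, 1200],
];
// Keep clusters above 4,000 people and attach a Google Maps search link.
const candidates = summarizeClusters(clusterLabels, pop)
  .filter(c => c.pop > 4000)
  .map(c => ({
    ...c,
    url: `https://www.google.com/maps/search/?api=1&query=${c.centroid.lat.toFixed(5)},${c.centroid.lng.toFixed(5)}`,
  }));
```

Here only cluster 1 (4,500 people) survives the filter; cluster 2 (2,100 people) is dropped.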


Locations marked with blue markers for populations greater than 4000

Interactive map using Folium and leaflet.js on Jupyter (all potential locations with a population above 4000)




We showed how geographical data science can combine satellite imagery and population data into an interactive map and a list of the Nigerian regions with the highest demand for electricity, where installing solar containers would have the greatest impact.

The NGO Renewable Africa will use those tools to survey and validate the locations before installing the solar panels. This should have a real impact on the lives of thousands of people in need. Additionally, this report can also be used to show where the demand lies and help to pressure the local government into action.

We also hope that neighboring and other developing countries will follow this initiative, as all the methodology and code used here can easily be transferred to other locations.


The Code

Source code for both GEE and the Colab notebook is available here.



More about Omdena

Omdena is an innovation platform for building AI solutions to real-world problems through the power of bottom-up collaboration.


Satellite Image Analysis to Identify Trees and Prevent Fires

The project goal was to build a machine learning model that identifies trees in satellite images. The solution aims to prevent power outages and fires sparked by trees falling during storms. This can help save lives, reduce CO2 emissions, and improve infrastructure inspection. The project was hosted by the Swedish AI startup Spacept.


Four weeks ago, 35 AI experts and data scientists from 16 countries came together through the Omdena platform. The participants formed self-organized task groups, each of which took on either one part of the challenge or one approach to solving it.


Forming the Task Groups

Omdena’s platform is a self-organized learning environment, and after the first kick-off call the collaborators started to organize themselves into task groups. Below are screenshots of some of the discussions that took place in the first days of the project.



Task Group 1: Labeling

We labeled over 1000 images. Labeling with a large group of people is not only faster but also more accurate, thanks to our peer-to-peer review process.

Active collaborators: Leonardo Sanchez (Task Manager, Brazil), Arafat Bin Hossain (Bangladesh), Sim Keng Ying (Singapore), Alejandro Bautista Ramos (Mexico), Santosh Kumar Pydipalli (India), Gerardo Duran (Mexico), Annie Tran (USA), Steven Parra Giraldo (Colombia), Bishwa Karki (Nepal), Isaac Rodríguez Bribiesca (Mexico).


Labeled images

Task Group 2: Generating images through GANs

Given a training set, GANs can learn to generate new data with the same statistics as the training set.

Active Participants: Santiago Hincapie-Potes (Task Manager, Colombia), Amit Singh (Task Manager for DCGAN, India), Ramon Ontiveros (Mexico), Steven Parra Giraldo (Colombia), Isaac Rodríguez (Mexico), Rafael Villca (Bolivia), Bishwa Karki (Nepal).


Output from GAN

Task Group 3: Generating an elevation model

The task group is using a Digital Elevation Model (DEM) and a triangulated irregular network (TIN). Knowing the elevation of both the land and the trees will help us assess the risk each tree poses to overhead cables.
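A simplified sketch of how elevation data could feed a risk check. A TIN stores ground elevation at scattered points, so the ground height under a tree can be found by linear (barycentric) interpolation inside the enclosing triangle; subtracting it from the elevation of the tree top gives the tree height. The fall-risk rule (a tree can reach the line if it is at least as tall as its horizontal distance to the cable, ignoring cable height) and all coordinates are illustrative assumptions, not the group's actual method.

```javascript
// Linear interpolation of ground elevation inside one TIN triangle
// using barycentric weights. Each vertex is { x, y, z } (z in meters).
function interpolateTin([a, b, c], x, y) {
  const det = (b.y - c.y) * (a.x - c.x) + (c.x - b.x) * (a.y - c.y);
  const wa = ((b.y - c.y) * (x - c.x) + (c.x - b.x) * (y - c.y)) / det;
  const wb = ((c.y - a.y) * (x - c.x) + (a.x - c.x) * (y - c.y)) / det;
  const wc = 1 - wa - wb;
  return wa * a.z + wb * b.z + wc * c.z;
}

// Horizontal distance from a point to a straight cable segment p-q.
function distanceToSegment(pt, [p, q]) {
  const dx = q.x - p.x, dy = q.y - p.y;
  const t = Math.max(0, Math.min(1, ((pt.x - p.x) * dx + (pt.y - p.y) * dy) / (dx * dx + dy * dy)));
  return Math.hypot(pt.x - (p.x + t * dx), pt.y - (p.y + t * dy));
}

// Tree height = tree-top elevation minus interpolated ground elevation.
// Flag the tree if it could reach the cable when falling (height >= distance).
function fallRisk(tree, triangle, cable) {
  const ground = interpolateTin(triangle, tree.x, tree.y);
  const height = tree.topZ - ground;
  const distance = distanceToSegment(tree, cable);
  return { height, distance, atRisk: height >= distance };
}

const triangle = [{ x: 0, y: 0, z: 100 }, { x: 10, y: 0, z: 110 }, { x: 0, y: 10, z: 120 }];
const cable = [{ x: 0, y: 10 }, { x: 10, y: 10 }];
const tree = { x: 2, y: 3, topZ: 116 };
const risk = fallRisk(tree, triangle, cable);
// ground = 108, height = 8, distance to the cable = 7, so atRisk is true
```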

Active Participants: Gabriel Garcia Ojeda (Mexico)



Task Group 4: Sharpening the images

The group built a set of image-processing steps, tried different combinations of filters, and implemented a basic pipeline that automates testing those combinations. The goal is to preprocess the labeled images so that the AI models achieve more accurate results.
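One common way to sharpen an image, shown here as an illustration rather than the group's actual filter set, is to convolve it with a 3x3 sharpening kernel (center weight 5, its four neighbors -1). The sketch also shows the pipeline idea: filters are plain functions, and every ordered pair of them is applied so the combinations can be compared. The filter names and the tiny grayscale tile are hypothetical.

```javascript
// Apply a 3x3 kernel to a grayscale image (border pixels are left unchanged),
// clamping results to the valid 0-255 range.
function applyKernel3x3(img, k) {
  return img.map((row, y) => row.map((v, x) => {
    if (y === 0 || x === 0 || y === img.length - 1 || x === row.length - 1) return v;
    let acc = 0;
    for (let dy = -1; dy <= 1; dy++) {
      for (let dx = -1; dx <= 1; dx++) acc += img[y + dy][x + dx] * k[dy + 1][dx + 1];
    }
    return Math.min(255, Math.max(0, acc));
  }));
}

const filters = {
  sharpen: img => applyKernel3x3(img, [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]),
  boxBlur: img => applyKernel3x3(img, [[1, 1, 1], [1, 1, 1], [1, 1, 1]].map(r => r.map(v => v / 9))),
  // Min-max contrast stretch to the full 0-255 range.
  stretch: img => {
    const flat = img.flat();
    const lo = Math.min(...flat), hi = Math.max(...flat);
    return img.map(row => row.map(v => (hi === lo ? v : Math.round((v - lo) * 255 / (hi - lo)))));
  },
};

// Run every ordered pair of filters, as a pipeline would when testing combinations.
function allPairs(img) {
  const results = {};
  for (const a of Object.keys(filters)) {
    for (const b of Object.keys(filters)) {
      if (a !== b) results[`${a}+${b}`] = filters[b](filters[a](img));
    }
  }
  return results;
}

const tile = [
  [10, 10, 10],
  [10, 50, 10],
  [10, 10, 10],
];
const sharpened = filters.sharpen(tile); // center becomes 5*50 - 4*10 = 210
const combos = allPairs(tile);           // 6 ordered combinations
```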

Active Participants: Lukasz Kaczmarek (Task Manager, Poland), Cristian Vargas (Mexico), Rodolfo Ferro (Mexico), Ramon Ontiveros (Mexico).


Output after sharpening

Task Group 5: Detecting trees with the Mask R-CNN model

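Mask R-CNN was developed by Facebook AI Research. In a first step, the model proposes a set of bounding boxes that may contain trees. In a second step, it classifies each box and predicts a segmentation mask for it, and the detected trees are colored according to the model's confidence.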
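Detectors of the R-CNN family typically prune their candidate boxes with non-maximum suppression (NMS): boxes are sorted by confidence, and any box that overlaps a higher-scoring one too much, measured by intersection over union (IoU), is dropped. The sketch below is a generic NMS with hypothetical boxes and an assumed 0.5 IoU threshold, not the project's code.

```javascript
// Boxes are { x1, y1, x2, y2, score } with x1 < x2 and y1 < y2.
function iou(a, b) {
  const w = Math.max(0, Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1));
  const h = Math.max(0, Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1));
  const inter = w * h;
  const area = box => (box.x2 - box.x1) * (box.y2 - box.y1);
  return inter / (area(a) + area(b) - inter);
}

// Keep the most confident box of each group of heavily overlapping boxes.
function nms(boxes, iouThreshold = 0.5) {
  const sorted = [...boxes].sort((p, q) => q.score - p.score);
  const kept = [];
  for (const box of sorted) {
    if (kept.every(k => iou(k, box) < iouThreshold)) kept.push(box);
  }
  return kept;
}

const treeBoxes = [
  { x1: 0, y1: 0, x2: 10, y2: 10, score: 0.9 },
  { x1: 1, y1: 1, x2: 11, y2: 11, score: 0.8 },   // same tree, slightly shifted
  { x1: 20, y1: 20, x2: 30, y2: 30, score: 0.7 }, // a different tree
];
const detections = nms(treeBoxes); // keeps the 0.9 and 0.7 boxes
```

The first two boxes overlap with an IoU of 81/119 (about 0.68), so the less confident one is suppressed.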

Active Participants: Kathelyn Zelaya (Task Manager, USA), Annie Tran (USA), Shubhajit Das (India), Shafie Mukhre (USA).


Mask R-CNN output

Task Group 6: Detecting trees through U-Net and Deep U-Net model

U-Net was originally developed for biomedical image segmentation, but because of its good results it is now applied to a variety of other tasks. It is one of the best network architectures for image segmentation. We applied the same architecture to identifying trees and got very encouraging results, even when training on fewer than 50 images.

Active Participants: Pawel Pisarski (Task Manager, Canada), Arafat Bin Hossain (Bangladesh), Rodolfo Ferro (Mexico), Juan Manuel Ciro Torre (Colombia), Leonardo Sanchez (Brazil).

The U-Net consists of a contracting path and an expansive path, which give it its U shape. The contracting path is a typical convolutional network: repeated blocks of convolutions, each followed by a rectified linear unit (ReLU), with a max-pooling operation that downsamples the result after each block.
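To make the U shape concrete, this sketch traces feature-map sizes through a U-Net-style network: each contracting step halves the spatial size with 2x2 max pooling while the next block doubles the channels, and each expanding step upsamples and concatenates the matching contracting-side features. It assumes "same" padding (the original U-Net used unpadded convolutions, so its sizes shrink slightly at each step), 64 base channels and four pooling steps; the ReLU and 2x2 max-pooling helpers show the two operations named above on plain arrays.

```javascript
// ReLU and 2x2 max pooling on a single-channel feature map.
const relu = fmap => fmap.map(row => row.map(v => Math.max(0, v)));
function maxPool2x2(fmap) {
  const out = [];
  for (let y = 0; y + 1 < fmap.length; y += 2) {
    const row = [];
    for (let x = 0; x + 1 < fmap[0].length; x += 2) {
      row.push(Math.max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1]));
    }
    out.push(row);
  }
  return out;
}

// Trace (size, channels) down the contracting path and back up the expansive path.
function traceUNet(inputSize, baseChannels = 64, depth = 4) {
  const down = [];
  let size = inputSize, ch = baseChannels;
  for (let i = 0; i < depth; i++) {
    down.push({ size, ch }); // features kept for the skip connection
    size /= 2; ch *= 2;      // pooling halves size; the next block doubles channels
  }
  const bottleneck = { size, ch };
  const up = [];
  for (let i = depth - 1; i >= 0; i--) {
    size *= 2; ch /= 2;      // up-convolution doubles size and halves channels
    // concatenating the skip features gives the channel count fed to the next convs
    up.push({ size, concatCh: ch + down[i].ch, ch });
  }
  return { down, bottleneck, up };
}

const pooled = maxPool2x2(relu([[-1, 2, 0, 3], [4, -5, 1, 1], [0, 0, -2, -2], [7, 1, 1, 6]]));
// pooled is [[4, 3], [7, 6]]
const shapes = traceUNet(512); // bottleneck: 32 x 32 with 1024 channels
```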


U-Net architecture


One technique caught our attention: the Deep U-Net. Like U-Nets, Deep U-Nets have both a contracting and an expansive side and use U-connections to pass high-resolution features from the contracting side to the upsampled outputs. In addition, they use Plus connections to extract information better, with lower loss.
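The difference between the two kinds of connections can be shown on toy feature maps. A U-connection, as in U-Net, concatenates the contracting-side features with the upsampled ones along the channel axis, so the channel count grows; a Plus connection adds two feature maps element-wise, residual-style, so the shape stays the same. The feature maps are hypothetical arrays of channels; this is a conceptual sketch, not the Deep U-Net layer code.

```javascript
// A feature map is an array of channels; each channel is a 2D array.
const concatChannels = (a, b) => [...a, ...b]; // U-connection
const addMaps = (a, b) =>                        // Plus connection
  a.map((ch, c) => ch.map((row, y) => row.map((v, x) => v + b[c][y][x])));

const skip = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]];      // 2 channels from the contracting side
const upsampled = [[[1, 1], [1, 1]], [[2, 2], [2, 2]]]; // 2 channels after upsampling

const uOut = concatChannels(skip, upsampled); // 4 channels
const plusOut = addMaps(skip, upsampled);     // still 2 channels
```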


Deep-U-Net Architecture



Having discussed the architecture, we applied a basic Deep U-Net solution to the 144 unique labeled images, which were divided into 119 images and 119 masks for the training set, 22 images and 22 masks for the validation set, and 3 images and 3 masks for the test set. Because the images and masks were 1,000 x 1,000 pixels, each was cropped into four 512 x 512 tiles, giving 476 images and 476 masks for the training set, 88 images and 88 masks for the validation set, and 12 images and 12 masks for the test set. We trained the Deep U-Net for 10 epochs with a batch size of 4, using the Adam optimizer and a binary cross-entropy loss on a GeForce GTX 1060 GPU. The results were quite encouraging, reaching 94% accuracy on the validation set.
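The split and tile counts above can be checked with a few lines of arithmetic. Four 512 x 512 tiles do not fit in a 1,000 x 1,000 image without overlap (2 x 512 = 1,024), so this sketch assumes the tiles were taken at the four corners, overlapping by 24 pixels; the source does not say how the crops were placed.

```javascript
// Top-left offsets of corner crops along one axis (first and last possible position).
function cornerOffsets(imageSize, tileSize) {
  return [0, imageSize - tileSize];
}

// All (row, col) offsets of the tiles taken from one image.
function tileOffsets(imageSize = 1000, tileSize = 512) {
  const axis = cornerOffsets(imageSize, tileSize);
  return axis.flatMap(r => axis.map(c => [r, c]));
}

const tilesPerImage = tileOffsets().length; // 4
const split = { train: 119, validation: 22, test: 3 };
const tiles = Object.fromEntries(
  Object.entries(split).map(([name, n]) => [name, n * tilesPerImage])
);
// tiles is { train: 476, validation: 88, test: 12 }
```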


Model Accuracy and Loss



Believing that accuracy could be improved a bit further, we expanded the basic solution with data augmentation. Using rotations, we generated 8 augmented images per original image, for a total of 3,808 images and 3,808 masks in the training set and 704 images and 704 masks in the validation set.
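The source only says that the 8 variants came from rotations. One common way to get exactly 8 from a square tile, assumed here, is the four 90-degree rotations of the tile plus the four rotations of its mirror image (the original counts as one of them). The same transform must be applied to the image and its mask so that they stay aligned.

```javascript
// Rotate a square 2D array 90 degrees clockwise.
const rotate90 = m => m[0].map((_, c) => m.map(row => row[c]).reverse());
// Mirror left-right.
const flipH = m => m.map(row => [...row].reverse());

// The 8 symmetries of a square: 4 rotations of the tile and 4 of its mirror image.
function eightVariants(tile) {
  const out = [];
  for (const start of [tile, flipH(tile)]) {
    let m = start;
    for (let i = 0; i < 4; i++) {
      out.push(m);
      m = rotate90(m);
    }
  }
  return out;
}

// Augment an image/mask pair with identical transforms.
function augmentPair(image, mask) {
  const imgs = eightVariants(image), masks = eightVariants(mask);
  return imgs.map((img, i) => ({ image: img, mask: masks[i] }));
}

const pairs = augmentPair([[1, 2], [3, 4]], [[1, 0], [0, 0]]);
const trainCount = 476 * pairs.length;     // 3,808
const validationCount = 88 * pairs.length; // 704
```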

We then retrained the same model, keeping the base learning rate at 0.001 but adding a decay inversely proportional to the number of epochs, and increased the number of epochs to 100.
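One reading of "a decay inversely proportional to the number of epochs", assumed here, is the time-based decay of older Keras optimizers: decay = initial rate / epochs, applied as lr = lr0 / (1 + decay * t). Keras applied it per update step; the sketch shows both a per-epoch and a per-step reading, and the step count assumes the batch size stayed at 4.

```javascript
// Time-based decay: the rate shrinks as 1 / (1 + decay * t).
function timeBasedDecay(baseLr, epochs) {
  const decay = baseLr / epochs; // inversely proportional to the number of epochs
  return t => baseLr / (1 + decay * t);
}

const lrAt = timeBasedDecay(0.001, 100);
const perEpochEnd = lrAt(100);                 // about 0.000999: barely changes
const perStepEnd = lrAt(100 * (3808 / 4));     // 95,200 steps: about 0.000512
```

Applied per epoch the decay is negligible; applied per update step it roughly halves the learning rate by the end of training.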

With these changes we reached more than 95% accuracy, which exceeded our project partner's expectations.
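For reference, the loss and metric mentioned above can be written in a few lines: binary cross-entropy averages -[y log p + (1 - y) log(1 - p)] over pixels, and pixel accuracy is the share of pixels whose thresholded prediction matches the mask. The 0.5 threshold, the clipping epsilon and the toy values are assumptions; the source does not state exactly how accuracy was computed.

```javascript
// Mean binary cross-entropy over all pixels; eps avoids log(0).
function binaryCrossEntropy(masks, probs, eps = 1e-7) {
  const y = masks.flat(), p = probs.flat();
  let total = 0;
  for (let i = 0; i < y.length; i++) {
    const q = Math.min(1 - eps, Math.max(eps, p[i]));
    total += -(y[i] * Math.log(q) + (1 - y[i]) * Math.log(1 - q));
  }
  return total / y.length;
}

// Share of pixels where (prob >= threshold) agrees with the tree / not-tree mask.
function pixelAccuracy(masks, probs, threshold = 0.5) {
  const y = masks.flat(), p = probs.flat();
  const correct = y.filter((v, i) => (p[i] >= threshold ? 1 : 0) === v).length;
  return correct / y.length;
}

const mask = [[1, 1], [0, 0]];         // 1 = tree
const pred = [[0.9, 0.4], [0.2, 0.1]]; // model probabilities
const acc = pixelAccuracy(mask, pred); // 0.75: one tree pixel missed
const loss = binaryCrossEntropy(mask, pred); // about 0.3375
```

Pixel accuracy can look high when most pixels are background, which is why the loss is usually tracked alongside it.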

The Deep U-Net learned to distinguish trees in new images very well, even classifying shadows within forests as non-tree areas. It reproduced what we humans did during the labeling process, with even better performance.

A few results are shown below; they were generated from new images that the Deep U-Net had never seen before.


Lithuania image (the model was trained on Australia with a different landscape)

Predictions over the test set

