How to Increase Solar Adoption in the Developing World through Image Segmentation? Applied in India.
Machine learning can usher in a new era in the clean energy sector. This article describes how to identify rooftops in satellite images to find suitable places for installing solar panels.
The task is made harder by the low quality of satellite images of India (and of most of the developing world). Similar solutions, such as Google's Project Sunroof, work only on high-resolution images and are not usable in most of the developing world.
Step 1: Identification of the Algorithm: Image Segmentation
Our initial goal was to increase solar adoption using image segmentation algorithms from computer vision: segment each image into roofs and non-roofs by identifying the edges of the roofs. Our first attempt used the Watershed segmentation algorithm, which is especially useful for separating touching or overlapping objects in an image. The algorithm is fast and computationally inexpensive; in our case, the average computing time per image was 0.08 sec.
Below are the results from the Watershed algorithm.
Next, we tried the Canny edge detector. Its average time per image was approx. 0.1 sec, which is very good, and its results were better than those of the Watershed algorithm, but the accuracy was still not good enough for practical use.
Both of the above techniques are classical image-processing approaches to segmentation: they work on pixel intensities and gradients without understanding the context and content of the objects we are trying to detect (i.e. rooftops). We may get better results if we train an algorithm on what the objects (i.e. rooftops) look like. Convolutional Neural Networks (CNNs) are the state of the art for understanding the context and content of an image, so we used them here for rooftop segmentation in support of solar adoption awareness.
As mentioned earlier, we want to segment the image into two classes: rooftop and non-rooftop. This is a semantic segmentation problem. Semantic segmentation partitions an image into semantically meaningful parts and classifies each part into one of a set of predetermined classes.
In our case, every pixel of the image must be labeled as either part of a rooftop or not.
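Concretely, a segmentation network with a sigmoid output produces one rooftop probability per pixel, and the final mask comes from thresholding it. The sketch below assumes a threshold of 0.5; the helper names are hypothetical and only illustrate the per-pixel labeling.

```python
import numpy as np


def probabilities_to_mask(prob_map, threshold=0.5):
    """Label each pixel 1 (rooftop) if its predicted probability exceeds `threshold`."""
    prob_map = np.asarray(prob_map, dtype=np.float32)
    if prob_map.ndim == 3 and prob_map.shape[-1] == 1:
        prob_map = prob_map[..., 0]  # drop the single channel of a sigmoid output
    return (prob_map > threshold).astype(np.uint8)


def rooftop_fraction(mask):
    """Share of pixels labeled as rooftop."""
    return float(mask.mean())


probs = np.array([[0.9, 0.2], [0.6, 0.4]])
mask = probabilities_to_mask(probs)  # [[1, 0], [1, 0]]
print(rooftop_fraction(mask))        # 0.5
```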
Step 2: Generating the Training Data
To train a CNN, we need a dataset of satellite images of Indian buildings' rooftops along with their corresponding masks. Since no public dataset of Indian rooftop images with masks was available, we had to create our own. A team of students tagged the images and created masked images (as below).
And here are the final outputs after masking.
Although U-Net (the architecture described in Step 4) is known to work with relatively few training images, we started with only about 20 images in our training set, far too few for any model, even U-Net, to give useful results. One of the most popular techniques for dealing with limited data is data augmentation: generating additional training images by applying a few basic alterations to the ones already in the dataset.
For example, in our case, any rooftop image rotated by a few degrees or flipped horizontally or vertically acts as a new rooftop image, provided that exactly the same rotation or flip is applied to both the roof image and its mask. We used the Keras ImageDataGenerator on the already tagged images to create more images.
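The key constraint is that an image and its mask receive the identical transformation. With Keras's ImageDataGenerator, this is usually done by creating one generator for images and one for masks with the same arguments and passing the same seed to both. The framework-free sketch below (NumPy and SciPy, not the Keras API) makes the pairing explicit; the 15-degree rotation limit and the reflect border mode are assumptions.

```python
import numpy as np
from scipy import ndimage


def augment_pair(image, mask, rng, max_angle=15.0):
    """Apply the same random flips and small rotation to an (H, W, C) image and its (H, W) mask."""
    if rng.random() < 0.5:  # horizontal flip
        image, mask = np.fliplr(image), np.fliplr(mask)
    if rng.random() < 0.5:  # vertical flip
        image, mask = np.flipud(image), np.flipud(mask)
    angle = rng.uniform(-max_angle, max_angle)
    # Bilinear interpolation for the photo; nearest-neighbour (order=0) for the mask
    # so that its labels stay exactly 0 or 1
    image = ndimage.rotate(image, angle, axes=(0, 1), reshape=False, order=1, mode="reflect")
    mask = ndimage.rotate(mask, angle, axes=(0, 1), reshape=False, order=0, mode="reflect")
    return image, mask


rng = np.random.default_rng(42)
image = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
mask = (rng.random((64, 64)) > 0.5).astype(np.uint8)
aug_image, aug_mask = augment_pair(image, mask, rng)
```

Calling augment_pair repeatedly on each tagged pair yields as many new, consistently aligned training samples as needed.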
Step 3: Preprocessing input images
Because the satellite images are of low quality, we tried sharpening them. We used two different sharpening filters: low/soft sharpening and high/strong sharpening. Since sharpening also amplifies noise, we then applied a bilateral filter, which reduces noise while preserving edges. Below are some lines of Python code for sharpening.
And below are the outputs.
Step 4: Training and Validating the model
We generated a training set of 445 images. For the model, we chose the U-Net architecture. U-Net was originally developed for biomedical image segmentation, but because of its strong results it is now applied to a variety of other tasks and is one of the most widely used network architectures for image segmentation. In our first approach with U-Net, we used the RMSProp optimizer with a learning rate of 0.0001 and a combined binary cross-entropy and Dice loss (implementation taken from here). We trained for 200 epochs; averaged over the last 5 epochs, the training Dice coefficient was 0.6750 and the validation Dice coefficient was 0.7168.
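For reference, the Dice coefficient measures the overlap between a predicted mask P and the ground truth G as 2·|P ∩ G| / (|P| + |G|), so 1 means perfect overlap. The NumPy sketch below is one common formulation of the Dice coefficient and the combined BCE + Dice loss, not the linked implementation; the smoothing constant of 1 and the clipping epsilon are assumptions.

```python
import numpy as np

EPS = 1e-7  # assumed clipping value to keep log() finite


def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Soft Dice on probabilities; `smooth` avoids division by zero on empty masks."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)


def binary_cross_entropy(y_true, y_pred):
    """Mean per-pixel binary cross-entropy."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), EPS, 1.0 - EPS)
    return float(-np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred)))


def bce_dice_loss(y_true, y_pred):
    """BCE plus (1 - Dice): one common way of combining the two terms."""
    return binary_cross_entropy(y_true, y_pred) + 1.0 - dice_coefficient(y_true, y_pred)
```

BCE scores every pixel independently, while the Dice term scores the mask's overall overlap, which helps when rooftop pixels are a small fraction of the image.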
Here are the results of our first approach on the validation set (40 images):
As you can see, the predicted masks contain some 3D traces of the building structure in the middle and at the corners. We found that this was due to the Dice loss. For our second approach, we replaced RMSProp with the Adam optimizer (learning rate 1e-4, decay rate 1e-6), replaced the BCE + Dice loss with an IoU loss, and used the binary accuracy metric from Keras. Training ran for 45 epochs. Averaged over the last 5 epochs, the training accuracy was 0.862 and the validation accuracy was 0.793. Note that these are pixel accuracies, which are not directly comparable with the Dice coefficients reported for the first approach. Below are some of the predicted masks on the validation set from the second approach:
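An IoU (Jaccard) loss scores overlap as |P ∩ G| / |P ∪ G| instead of Dice's 2·|P ∩ G| / (|P| + |G|), and binary accuracy counts the pixels whose thresholded prediction matches the label. The NumPy sketch below illustrates both; the smoothing constant and the 0.5 threshold are assumptions, and in a Keras model these would be written with tensor operations rather than NumPy.

```python
import numpy as np


def soft_iou(y_true, y_pred, smooth=1.0):
    """Soft intersection-over-union on probabilities."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    union = np.sum(y_true) + np.sum(y_pred) - intersection
    return (intersection + smooth) / (union + smooth)


def iou_loss(y_true, y_pred):
    """0 for a perfect mask, approaching 1 as overlap vanishes."""
    return 1.0 - soft_iou(y_true, y_pred)


def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of pixels whose thresholded prediction equals the label."""
    predicted = np.asarray(y_pred) > threshold
    actual = np.asarray(y_true) > 0.5
    return float(np.mean(predicted == actual))


y = np.array([[1.0, 0.0], [0.0, 1.0]])
pred = np.array([[0.9, 0.2], [0.6, 0.4]])
print(binary_accuracy(y, pred))  # 0.5
```

Because most pixels in a satellite tile are not rooftop, binary accuracy can look high even for a weak mask, which is one reason it should not be compared directly with a Dice score.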
Finally, here are the results on the test data: