In this article, we walk through the process of classifying commercial rooftops using deep learning methods, as part of Omdena's challenge with an EnergyTech startup to accelerate the deployment of solar energy in North America.

We cover the use of labeling tools like CVAT for annotating satellite imagery, the difference between semantic and instance segmentation of rooftop pixels, and the training of several pre-trained deep learning models to get the best rooftop classification results.
Author: Margaux Masson-Forsythe
Solar energy hasn’t yet reached its full potential as a clean energy source for the United States. According to a 2016 National Renewable Energy Laboratory (NREL) analysis, there are over 8 billion square meters of rooftops on which solar panels could be installed in the United States, representing over 1 terawatt of potential solar capacity.
Therefore, significant work remains to accelerate the deployment of solar energy in the US by identifying potential locations and classifying rooftops.
The goal of this two-month challenge was to detect and classify rooftops in North America in order to identify the region's potential for solar installation on facilities. A country's solar rooftop potential is the number of rooftops that would be suitable for solar power, which depends on uncluttered surface area, shading, direction, and location.
The challenge partner is a Techstars Energy tech startup that provides a digital map of these facilities’ rooftops and energy profiles.
The final goal of this challenge is to detect rooftops and produce an understandable rooftop classification, and thus accelerate the growth of solar installations in the United States.
So, the first step was to gather, preprocess and label the data.
Data preparation and images labeling
For this project, we used satellite images of the rooftops at specific GPS coordinates. We used the GPS coordinates provided by the partner. All the images were downloaded from Mapbox. Then, we manually labeled a large amount of the images using the labeling tool CVAT (Computer Vision Annotation Tool). CVAT is a free, open-source, web-based image annotation tool. It is a very powerful and efficient tool that allows multiple collaborators to label images, and then review the labeling done.
One of the best features of CVAT, which was essential for this project, is that it supports labeling for object detection, image classification, and image segmentation. Thus, the annotations made on one image can be exported in multiple formats (e.g., bounding boxes, COCO format, segmentation masks).
A lot of work and time went into labeling the images. We chose five rooftop classes:
- Flat: Rooftops with a single flat surface with/without clutters
- Complex Rooftop: Rooftops with multiple surfaces at different heights
- Existing Solar: Rooftops with solar panels
- Heavy Industrial: Rooftops with pipes, and cluttered with machinery
- Slope: Rooftops with an inclined surface
Once our team was done with the labeling, we had 3895 Flat, 2278 Slope, 1050 Complex, 450 Existing Solar, and 450 Heavy Industrial instances labeled on 600×580 images.
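These counts show a clear class imbalance, which matters later when training. As an illustration (the weighting scheme here is an assumption, not necessarily what the project used), inverse-frequency class weights can be derived from the instance counts above:

```python
# Inverse-frequency class weights computed from the instance counts reported
# above. A common heuristic for imbalanced datasets; rare classes get larger
# weights so a weighted loss does not ignore them.
counts = {
    "flat": 3895,
    "slope": 2278,
    "complex_rooftop": 1050,
    "existing_solar": 450,
    "heavy_industrial": 450,
}

total = sum(counts.values())
weights = {cls: total / (len(counts) * n) for cls, n in counts.items()}

for cls, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{cls:17s} weight = {w:.2f}")
```

With these counts, Flat gets a weight around 0.42 while Existing Solar and Heavy Industrial get roughly 3.6, almost nine times more.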
We needed to try different segmentation methods to get the best results.
We started by segmenting rooftops using semantic segmentation models. Semantic segmentation is the process of attributing a class to each pixel of an image. So in our case, we want to classify every pixel as rooftop or non-rooftop: a binary semantic segmentation of rooftops.
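Concretely, the binary target can be obtained by collapsing a multi-class label mask into rooftop vs. background. A minimal sketch (the integer class encoding here is an assumption for illustration):

```python
import numpy as np

# Collapse a multi-class label mask (0 = background, 1..5 = rooftop classes)
# into the binary rooftop / non-rooftop target used for binary semantic
# segmentation training.
def to_binary_rooftop_mask(class_mask: np.ndarray) -> np.ndarray:
    return (class_mask > 0).astype(np.uint8)

mask = np.array([[0, 1, 1],
                 [0, 3, 0],
                 [5, 0, 0]])
print(to_binary_rooftop_mask(mask))
```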
We tried to use the AIRS (Aerial Imagery for Roof Segmentation) dataset for initial training and then fine-tune the model, but the results were not convincing. The main issue was that the zoom level differed considerably between the AIRS dataset and our data, and even after cropping the AIRS images we did not get good enough results. However, once we had enough data labeled, U-Net was able to segment the rooftops more accurately.
This training could be improved by running for longer and with more images, but we decided to switch to other models, as we wanted one that could accurately and efficiently detect and classify multiple classes. We tried U-Net for multi-class semantic segmentation, but the results were not good.
Indeed, as we can see in the results above, the net was not able to learn that one rooftop is a whole entity by itself, and predicted pixels of the same roof as different classes, which is not what we wanted for this project. For example, in the first two results, the net wrongly classified random pixels of the rooftop, and the predictions ended up being a mix of several classes for the same roof.
In the same vein, we used the DeepLab model. We wanted to classify all pixels as a specific type of rooftop, so this is not binary but multi-class semantic segmentation, as with U-Net, but tested with another model.
The goal here is to classify pixels as belonging to one of the five classes presented earlier (flat, slope, heavy industrial, existing solar, complex) to get a meaningful rooftop classification.
With this model, we were able to use the pre-trained version, trained on the AIRS dataset, and fine-tune it on our data. The overall IoU score we obtained was 0.48, which is not good enough.
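For reference, IoU (intersection over union) is the overlap between predicted and ground-truth masks divided by their union; a minimal sketch for the binary case:

```python
import numpy as np

# IoU (intersection over union) between a predicted and a ground-truth
# binary mask -- the metric behind the 0.48 score quoted above.
def iou(pred: np.ndarray, target: np.ndarray) -> float:
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: define IoU as perfect agreement
    return float(np.logical_and(pred, target).sum() / union)

pred = np.array([[1, 1], [0, 0]])
gt   = np.array([[1, 0], [1, 0]])
print(iou(pred, gt))  # 1 pixel overlap / 3 pixels union ≈ 0.33
```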
Here again, we see the same behavior as with the multi-class U-Net predictions.
At this point, we understood that semantic segmentation itself was too pixel-specific and that we needed a model that would converge faster and learn about the different types of rooftops as entities instead of only looking at the pixel level.
We, therefore, decided to focus on Instance Segmentation models instead.
Instance segmentation differs from semantic segmentation in that with semantic segmentation the model predicts:
“This pixel is a pixel belonging to a flat rooftop”
when with instance segmentation, the model predicts:
“This is a flat rooftop and here are all the pixels of the roof”.
So the instance segmentation model understands that the roof is an entity by itself.
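One way to make this distinction concrete: a semantic mask only says "these pixels are rooftop", and recovering individual roofs from it requires a post-processing step such as connected-component labelling. The sketch below (a rough stand-in, not the project's pipeline) groups connected rooftop pixels into separate entities, which is what an instance segmentation model predicts directly:

```python
import numpy as np

# 4-connected component labelling of a binary rooftop mask: each connected
# blob of rooftop pixels becomes its own instance id (1, 2, ...).
def label_instances(mask: np.ndarray) -> tuple[np.ndarray, int]:
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # pixel already assigned to an instance
        current += 1
        stack = [start]
        while stack:
            y, x = stack.pop()
            if not (0 <= y < mask.shape[0] and 0 <= x < mask.shape[1]):
                continue
            if mask[y, x] == 0 or labels[y, x]:
                continue
            labels[y, x] = current
            stack += [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
    return labels, current

semantic = np.array([[1, 1, 0, 0],
                     [1, 1, 0, 1],
                     [0, 0, 0, 1]])
instances, n_roofs = label_instances(semantic)
print(n_roofs)  # 2 -- two separate rooftop entities
```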
Here are some examples of the COCO-style labeled images, showing that each rooftop is its own object:
We tried two instance segmentation models: Mask R-CNN and Yolact.
Mask R-CNN with Detectron2
We used the Facebook AI Research library called Detectron2. This library provides state-of-the-art detection and segmentation algorithms such as Mask R-CNN. The documentation for this library can be found here.
We modified the original Detectron2 tutorial Google Colab notebook for our project with our custom rooftop dataset. Our dataset was created from CVAT where we exported the annotations as COCO format. We first had one single json file with all the annotations that we split into a train and a validation json file using the cocosplit tool.
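The split step can be sketched in a few lines; this is a minimal approximation of what the cocosplit tool does (file names and the 80/20 ratio are illustrative assumptions):

```python
import json
import random

# Split a single COCO-format annotation dict into train / validation dicts
# by image, carrying each image's annotations along with it.
def split_coco(coco: dict, train_frac: float = 0.8, seed: int = 0):
    images = list(coco["images"])
    random.Random(seed).shuffle(images)
    n_train = int(len(images) * train_frac)
    splits = {}
    for name, imgs in (("train", images[:n_train]), ("val", images[n_train:])):
        ids = {img["id"] for img in imgs}
        splits[name] = {
            "images": imgs,
            "annotations": [a for a in coco["annotations"] if a["image_id"] in ids],
            "categories": coco["categories"],
        }
    return splits["train"], splits["val"]

# Usage (hypothetical file name):
# train, val = split_coco(json.load(open("annotations.json")))
```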
Once everything was well set up and configured, we fine-tuned a COCO-pre-trained R50-FPN Mask R-CNN model on our rooftops dataset.
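The setup follows the Detectron2 tutorial pattern; the sketch below shows the general shape, with dataset names, paths, and solver values as illustrative assumptions (the project's exact hyperparameters are not recorded here):

```python
# Fine-tuning a COCO-pre-trained R50-FPN Mask R-CNN on a custom COCO-format
# rooftop dataset with Detectron2, following the official tutorial notebook.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

# Register the exported CVAT annotations (paths are hypothetical).
register_coco_instances("rooftops_train", {}, "train.json", "images/")
register_coco_instances("rooftops_val", {}, "val.json", "images/")

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # COCO pre-trained
cfg.DATASETS.TRAIN = ("rooftops_train",)
cfg.DATASETS.TEST = ("rooftops_val",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 5  # flat, slope, existing solar, complex, heavy industrial
cfg.SOLVER.IMS_PER_BATCH = 2         # illustrative solver values
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 3000

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```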
Our final metrics were (AP = Average Precision, in %):

- AP-flat: 34.47
- AP-slope: 9.34
- AP-existing_solar: 1.75
- AP-complex_rooftop: 14.34
- AP-heavy_industrial: 5.06
So the model is learning the Flat and Complex classes but does not seem to understand Existing Solar, Slope, and Heavy Industrial very well. These classes are under-represented in our dataset, which might explain the low performance.
From the results, we see that the net is doing a fairly good job at detecting and classifying flat rooftops.
Here are some examples of misclassified rooftops (mostly Slope and Existing Solar rooftops classified as Flat):
We also saw some wrong predictions for the class “Existing Solar” (second image below), and the model missed the real solar panels in the first image. However, we can see that the net is learning the Slope rooftops (first and third images below):
This imbalance in the results could be addressed by increasing the number of examples for under-represented classes such as Existing Solar. We could also improve the model by tuning the hyperparameters further.
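A simple way to approximate "more examples for the rare classes" is to oversample them at training time. The sketch below is a naive illustration (not the project's actual pipeline): images of each class are repeated until a minimum count is reached.

```python
import random

# Naive oversampling: repeat images of under-represented classes until each
# class contributes at least `min_count` entries to the training list.
def oversample(by_class: dict[str, list[str]], min_count: int, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    out = []
    for cls, imgs in by_class.items():
        reps = imgs[:]
        while len(reps) < min_count:
            reps.append(rng.choice(imgs))  # duplicate a random image of this class
        out.extend(reps)
    rng.shuffle(out)
    return out

# Hypothetical example: one rare class, min_count forces duplicates.
sample = oversample({"flat": ["a", "b", "c"], "existing_solar": ["d"]}, min_count=3)
print(sample)
```

In practice this would be combined with augmentation so the duplicates are not pixel-identical.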
But for this project, we chose the network that converged to satisfying results the fastest, which was the Yolact model.
However, we want to note that with further training, a higher learning rate, and more epochs, we were able to get better results with Mask R-CNN at the very end of the project, which shows that this network has some potential:
These results were much better than our first ones (for example, the results on the first image are now very good compared to those presented previously), but the network still had trouble generalizing to most of the images in the validation set.
Yolact

Yolact is a fully convolutional model for real-time instance segmentation. It also uses COCO-style object detection JSON annotations. Our final model configuration is yolact_im700_config, with 300 prototype masks and a ResNet-101 backbone. It was trained for a total of 63 epochs with a batch size of 8, other parameters left at their defaults, on a total of 3092 images. For another application of Yolact segmentation, such as weed detection, see the article “Detecting Weeds Through YolactEdge Instance Segmentation to Support Smart Farming“.
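Training in the Yolact repository is driven by its train.py script; with the configuration above, the invocation would look roughly like this (flags follow the repo's README, and the dataset paths are assumed to be set up in the repo's data/config.py beforehand):

```shell
# Fine-tune Yolact (im700 config: 700-px input, ResNet-101 backbone)
# on the custom rooftop dataset with the batch size used in our final run.
python train.py --config=yolact_im700_config --batch_size=8
```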
We used the deepest Yolact backbone, ResNet-101; other variants could be used for further improvement, along with more hyperparameter fine-tuning.
The Yolact git repository also provides a variant called Yolact++ that is supposed to yield better results. However, it requires additional installation and configuration, which we did not have enough time for.
Our final metrics on the full dataset for the Yolact ResNet-101 model were:

- Box Localization Loss: 0.70
- Confidence Loss: 1.6
- Mask Loss: 1.1
- Semantic Segmentation Loss: 0.40
This model had the best results and is the model we ended up using for our final deliverable. Here are some visualizations of the predictions on unannotated images:
We can see here that the results are pretty good: the rooftops with solar panels were classified as “Existing Solar”, and some “Slope” and “Complex” rooftops were correctly classified as well. These results are satisfying and show that it is not only the predominant “Flat” class that is correctly predicted, which is what we were looking for.
This two-month project was a lot of fun, and we learned a great deal. Working with satellite images is always a challenge, but I think the most challenging part of this project was the labeling effort, and the work the team put into it was incredible!
For more Omdena work in the energy sector, see “Machine Learning For Rooftop Detection and Solar Panel Installment” and “AI for Solar Energy Adoption in Sub-Saharan Africa“.
Big thank you to all collaborators who made this project a success: Alisson Damasceno, Amal Mathew, Amardeep Singh, Ampatishan Sivalingam, Ansh Motiani, Ayushi, Chebrolu Harika, Dhruvan Choubisa, Hadi Babaei, Javier Smith, Kayalvizhi Selvaraj, Maitreyi Nair, Mihir Godbole, Nishrin Kachwala, Ozan Ahmet Çetin, Parth Dandavate, Praful Mohanan, Qasim Hassan, S.Koushik, Sanjay Parajuli, Sara Faten Diaz, Sarang Nikhare, Sudarshan Paul, Syeda Iffat Naz, Tawanda Mutasa.