Deep Learning for Solar Rooftop Mapping and Faster Installation
A walkthrough of labeling satellite imagery, image segmentation, and using deep learning models to achieve the best rooftop classification results.

This article chronicles Omdena’s two-month challenge to develop a solar rooftop mapping tool using deep learning. The challenge partner, a Techstars EnergyTech startup, aims to provide a digital map of commercial facilities’ rooftops and energy profiles. By classifying rooftops across North America, the project seeks to accelerate solar deployment by pinpointing which roofs are suitable for photovoltaic arrays.
As an anchor for assessing potential, a 2016 analysis by the National Renewable Energy Laboratory (NREL) estimated that more than eight billion square metres of rooftops in the United States could host over one terawatt of solar capacity. Significant work remains to turn that potential into reality, and rooftop solar assessment is a crucial first step. Leveraging satellite imagery and deep-learning algorithms for solar rooftop mapping enables planners to visualise suitable rooftops at scale, but it also demands meticulous data preparation and thoughtful model selection.
Defining the Problem
The challenge focused on turning satellite imagery into reliable insights for rooftop solar suitability. Instead of manually inspecting each site, the goal was to automatically detect and classify rooftops so planners can quickly identify strong solar candidates.
To address this, the project needed to solve three core problems:
- Detect rooftops accurately from satellite images
- Classify rooftop types (flat, sloped, complex, industrial, or existing solar)
- Provide actionable outputs that indicate which rooftops are best suited for solar installation
By solving these issues at scale, the tool removes early uncertainty in solar planning and accelerates decision-making for large commercial areas.
Data Preparation and Image Labeling
The project began by gathering satellite images of rooftops at specific GPS coordinates. These images were obtained through Mapbox. A substantial portion of the effort involved manually labeling thousands of images using CVAT, a free, web‑based annotation tool that supports object detection, image classification and segmentation. CVAT’s collaborative features allowed multiple team members to label and review images efficiently.
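Fetching imagery at given GPS coordinates comes down to the standard Web Mercator ("slippy map") tiling math that services like Mapbox use. The sketch below converts latitude/longitude to tile indices; the zoom level and the URL template in the comment are illustrative assumptions, not the project's exact settings.

```python
import math

def latlon_to_tile(lat, lon, zoom):
    """Convert WGS84 coordinates to Web Mercator (slippy map) tile indices."""
    lat_rad = math.radians(lat)
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# Example: tile covering downtown Toronto at zoom 18 (enough for rooftop detail)
x, y = latlon_to_tile(43.6532, -79.3832, 18)
# The tile can then be requested from a satellite tiles endpoint, e.g. (illustrative):
# https://api.mapbox.com/v4/mapbox.satellite/{zoom}/{x}/{y}@2x.png?access_token=...
```

Higher zoom levels trade coverage per tile for the spatial resolution needed to see roof clutter and protrusions.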

Demonstration of labeling done on a rooftop in CVAT — source: Omdena
This example shows how annotators delineate roof boundaries using polygons, capturing features such as protrusions and obstacles. Such meticulous work ensures that the model learns accurate rooftop outlines and distinguishes roof surfaces from surrounding structures.
One of CVAT’s strengths is its ability to export annotations in various formats, including bounding boxes, COCO format and segmentation masks. These flexible export options were essential for experimenting with different deep‑learning models.

GIF showing CVAT export formats — source: Omdena
Different export formats enable interoperability with various machine‑learning frameworks. For example, segmentation masks are essential for training U‑Net models, while COCO annotations power instance segmentation models like Mask R‑CNN and Yolact.
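A COCO export is just a JSON file with `images`, `annotations` and `categories` sections, so a quick sanity check of the labels needs nothing beyond the standard library. The snippet below is a minimal illustration with made-up data, not the project's actual annotation file.

```python
import json
from collections import Counter

def class_counts(coco):
    """Count annotated instances per category in a COCO-format dataset."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in coco["annotations"])

# Tiny illustrative export; a real file from CVAT would be loaded with
# coco = json.load(open("rooftops_coco.json"))
coco = {
    "categories": [{"id": 1, "name": "Flat"}, {"id": 2, "name": "Slope"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "segmentation": [[0, 0, 50, 0, 50, 40, 0, 40]]},
        {"id": 2, "image_id": 1, "category_id": 2,
         "segmentation": [[60, 0, 90, 30, 60, 30]]},
        {"id": 3, "image_id": 2, "category_id": 1,
         "segmentation": [[10, 10, 80, 10, 80, 60, 10, 60]]},
    ],
}
print(class_counts(coco))  # Counter({'Flat': 2, 'Slope': 1})
```

Each `segmentation` entry is a flat list of polygon vertex coordinates, which is exactly what instance segmentation models consume.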
Labeling required careful categorisation of rooftops into five classes:
- Flat: a single flat surface with or without clutter
- Complex rooftop: multiple surfaces at different heights
- Existing solar: roofs with solar panels already installed
- Heavy industrial: roofs with pipes and machinery clutter
- Slope: roofs with an inclined surface

Examples of images and their labels — source: Omdena
These labelled examples illustrate the diversity of rooftop shapes and features—from simple flat warehouses to complex industrial structures and buildings already fitted with solar panels. Capturing such variety is essential for training robust models that generalise across different rooftop configurations.
By the end of the labeling phase, the team had annotated 3 895 flat roofs, 2 278 sloped roofs, 1 050 complex rooftops, 450 existing solar roofs and 450 heavy industrial roofs on 600 × 580‑pixel images. These labeled datasets formed the foundation for training deep-learning models. The same AI-powered methodology has been used in Omdena’s energy projects to boost solar reliability at grid scale — see this example of AI applied to solar and renewable energy forecasting.
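The counts above are noticeably skewed towards flat roofs, and class imbalance resurfaces later as a modelling problem. One standard mitigation (a sketch using the article's counts, not necessarily the remedy the team applied) is inverse-frequency class weighting for the loss function:

```python
# Instance counts from the labeling phase
counts = {"Flat": 3895, "Slope": 2278, "Complex rooftop": 1050,
          "Existing solar": 450, "Heavy industrial": 450}

total = sum(counts.values())
# Inverse-frequency weights: rarer classes get proportionally larger weights,
# normalised so the weighted instance total equals the raw total
weights = {cls: total / (len(counts) * n) for cls, n in counts.items()}

for cls, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{cls:18s} {w:.2f}")
```

With these weights, a misclassified "Existing solar" roof costs roughly eight times as much as a misclassified "Flat" roof, nudging the model to attend to the rare classes.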
Exploring Semantic Segmentation
The first modelling approach involved semantic segmentation, which assigns a class to each pixel. The team initially trained U‑Net models using the AIRS dataset and then fine‑tuned them on the new data. However, differences in zoom level and image resolution made it challenging to transfer learning from AIRS to the current task. Only after labeling enough images did U‑Net begin to segment rooftops with reasonable accuracy.

Example of mask used for labeling — source: Omdena
This mask highlights how each pixel of a rooftop is assigned a label during semantic segmentation. It serves as ground truth for teaching the model to distinguish roof areas from the background and is a fundamental component of supervised learning in computer vision.

U‑Net binary segmentation of rooftops — source: Omdena
The binary segmentation results demonstrate that, given sufficient training data, U‑Net can delineate rooftop boundaries. However, this approach does not differentiate between roof types, limiting its usefulness for assessing solar suitability.
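Binary segmentation quality like this is usually scored with overlap metrics such as Intersection over Union (IoU) and the Dice coefficient. A minimal NumPy sketch with toy masks (the article does not report these metrics, so the numbers below are purely illustrative):

```python
import numpy as np

def iou(pred, target):
    """Intersection over Union for binary rooftop masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(inter / union) if union else 1.0

def dice(pred, target):
    """Dice coefficient: 2*|A∩B| / (|A|+|B|)."""
    inter = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return float(2 * inter / denom) if denom else 1.0

pred = np.zeros((4, 4), dtype=bool)
pred[:2, :2] = True           # predicted roof: top-left 2x2 block
target = np.zeros((4, 4), dtype=bool)
target[:2, :] = True          # ground-truth roof: top two rows

print(iou(pred, target))      # 0.5
```

Dice weights the intersection more heavily than IoU, which makes it a bit more forgiving on small objects; both are common choices for rooftop masks.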
The team attempted multi‑class semantic segmentation with U‑Net but found that the network often misclassified pixels from the same roof into different categories. The following examples illustrate how the model failed to recognise each roof as a coherent entity.

Multi‑class semantic segmentation results with U‑Net — source: Omdena
Here we see the shortcomings of multi‑class U‑Net: the model confuses different roof types within the same structure, highlighting the limitations of pixel‑wise classification for this task. Without recognising each roof as a cohesive entity, the network cannot provide reliable class labels.
Seeking improvement, the team experimented with DeepLab via the Detectron2 framework, consulting its documentation along the way. Yet even with this model, the semantic approach struggled to achieve acceptable performance. The pixel‑level focus of semantic segmentation proved too fine‑grained for the task of classifying entire rooftops.

Deep‑Lab predictions — source: Omdena
DeepLab predictions exhibit similar issues, reinforcing the conclusion that pixel‑level semantic segmentation alone cannot capture the holistic shape and class of a rooftop. Even sophisticated semantic models struggle when buildings present diverse textures or when shadows obscure edges, which underscores the need for models that understand objects as unified entities.
At this stage, it became clear that a model capable of recognising individual rooftops as objects—rather than classifying pixels in isolation—would perform better. The team decided to shift from semantic to instance segmentation, a technique that treats each roof as its own object.
Transition to Instance Segmentation
Instance segmentation predicts both the class and the pixels belonging to each object. In this case, the model answers: “This is a flat rooftop, and here are all of its pixels.” To prepare for this approach, the team exported annotations from CVAT into COCO format.

The COCO format treats each rooftop as an individual object with bounding polygons and class labels. This representation is well suited to instance segmentation models that learn both the object’s extent and its category.
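Before training, the annotations also need a train/validation split, and the split should be made per image so that one image's rooftops never straddle both sets. A small sketch of that logic (tools like cocosplit do essentially this on real files; the dataset here is a made-up stand-in):

```python
import random

def split_coco(coco, val_fraction=0.2, seed=0):
    """Split a COCO-format dataset into train/val by image,
    keeping each image's annotations together."""
    rng = random.Random(seed)
    image_ids = [img["id"] for img in coco["images"]]
    rng.shuffle(image_ids)
    n_val = int(len(image_ids) * val_fraction)
    val_ids = set(image_ids[:n_val])

    def subset(keep_val):
        return {
            "categories": coco["categories"],
            "images": [i for i in coco["images"]
                       if (i["id"] in val_ids) == keep_val],
            "annotations": [a for a in coco["annotations"]
                            if (a["image_id"] in val_ids) == keep_val],
        }

    return subset(False), subset(True)

# Illustrative 10-image dataset
coco = {
    "categories": [{"id": 1, "name": "Flat"}],
    "images": [{"id": i, "file_name": f"roof_{i}.png"} for i in range(10)],
    "annotations": [{"id": i, "image_id": i, "category_id": 1} for i in range(10)],
}
train, val = split_coco(coco, val_fraction=0.2)
print(len(train["images"]), len(val["images"]))  # 8 2
```

Fixing the random seed keeps the split reproducible across training runs.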
Mask R‑CNN with Detectron2
Using Facebook AI Research’s Detectron2 library, the team adapted a tutorial Colab notebook to train a Mask R‑CNN model on the rooftop dataset. The annotations were split into training and validation sets using the cocosplit tool. After fine‑tuning a pre‑trained R50‑FPN Mask R‑CNN model, the team evaluated performance using Average Precision (AP) metrics:
- AP‑flat: 34.47 %
- AP‑slope: 9.34 %
- AP‑existing solar: 1.75 %
- AP‑complex rooftop: 14.34 %
- AP‑heavy industrial: 5.06 %
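The per-class scores above are often summarised as a single mean Average Precision (mAP), the unweighted mean over classes:

```python
# Per-class Average Precision (%) reported above
ap = {"Flat": 34.47, "Slope": 9.34, "Existing solar": 1.75,
      "Complex rooftop": 14.34, "Heavy industrial": 5.06}

mean_ap = sum(ap.values()) / len(ap)
print(f"mAP: {mean_ap:.2f}%")  # mAP: 12.99%
```

Because mAP averages over classes rather than instances, the rare, poorly detected categories drag the score down even though flat roofs dominate the dataset.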

Mask R‑CNN results — source: Omdena
The visualisation demonstrates that Mask R‑CNN identifies large, flat roofs reliably, yet misses or misclassifies smaller or more complex structures. The network’s performance highlights the importance of balanced training data across all roof types.
The model excelled at detecting flat and complex roofs but struggled to classify under‑represented categories such as existing solar, slope and heavy industrial rooftops. Class imbalance likely contributed to this challenge. Misclassifications often involved slope and existing solar roofs being labelled as flat.

Examples of the wrong classification with Mask R‑CNN — source: Omdena
Misclassifications often correspond to categories with fewer training examples. Increasing data diversity could help the model learn to distinguish between subtle differences in roof geometry and clutter.
Despite these issues, extended training improved the model’s performance. Running the network for 4 000 epochs at a learning rate of 0.001 yielded markedly better results, demonstrating that further tuning could enhance accuracy.
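In Detectron2, those training settings live on the solver section of the config object; note the framework counts training in iterations (`MAX_ITER`) rather than epochs. A minimal config sketch, assuming the standard R50‑FPN Mask R‑CNN baseline from the model zoo (the exact config the team used is not documented here):

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
# Start from the pre-trained R50-FPN Mask R-CNN baseline
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")

cfg.MODEL.ROI_HEADS.NUM_CLASSES = 5  # flat, slope, complex, existing solar, heavy industrial
cfg.SOLVER.BASE_LR = 0.001           # learning rate reported above
cfg.SOLVER.MAX_ITER = 4000           # Detectron2 counts iterations, not epochs
cfg.SOLVER.IMS_PER_BATCH = 2
```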

Predictions Mask R‑CNN — source: Omdena
With more epochs and a lower learning rate, the network begins to distinguish roof types more accurately, suggesting that there is room for improvement through hyperparameter tuning and data augmentation. However, training for thousands of epochs demands considerable computational resources, so future work must balance accuracy gains with practical training budgets.

Mask R‑CNN improved results with learning rate = 0.001 and 4000 epochs — source: Omdena
These improved results came at the cost of longer training times but demonstrate the network’s potential when given sufficient time and properly adjusted parameters. Extended training also highlights the diminishing returns one encounters beyond a certain point, a common trade‑off when tuning deep models.
Yolact Instance Segmentation
To achieve real‑time instance segmentation, the team turned to Yolact, training the yolact_im700_config variant with 300 prototype masks and a ResNet‑101 backbone. The model was trained for 63 epochs on 3 092 images with a batch size of eight. Other variants and evaluation details can be found in the Yolact documentation. The final metrics included:
- Box localization loss: 0.70
- Confidence loss: 1.60
- Mask loss: 1.10
- Semantic segmentation loss: 0.40

Yolact’s results for roof classification — source: Omdena
The improved classification performance across multiple roof types demonstrates the effectiveness of Yolact for our application. By delivering near real‑time predictions, Yolact enables interactive tools that respond quickly to user input. It balances accuracy with speed, making it suitable for practical rooftop assessment tools that must evaluate numerous images.
These results were encouraging. Yolact correctly classified roofs with solar panels as “Existing solar” and accurately identified slope and complex rooftops. The model’s ability to recognise multiple roof types beyond the predominant “Flat” class made it a practical choice for rooftop solar assessment.
Final Deliverable: Streamlit Application
The culmination of the project was a Streamlit application that runs Yolact’s pre‑trained weights to perform rooftop classification. Designed with non‑technical stakeholders in mind, the interface allows urban planners, facility managers and energy consultants to upload imagery and receive immediate feedback on roof classes and potential suitability for solar panels. Users can upload their own images or specify GPS coordinates to retrieve and analyse rooftops. The app draws on satellite imagery, runs the Yolact model in the background and displays results as annotated overlays with simple labels and colour codes. It summarises the number of roofs detected per class and provides a quick overview of where solar panels could be installed. The interface provides a clear visualisation of predicted classes, helping decision‑makers determine solar rooftop suitability.

Visual of the roof classification Streamlit application in the class prediction section of the application — source: Omdena
The interface clearly highlights each detected rooftop and assigns a colour‑coded label, making it easy for users to assess rooftop suitability for solar installation at a glance. By summarising results visually and numerically, it streamlines decision‑making for engineers and planners who need to prioritise sites for further evaluation.
A demo of the dashboard is available on YouTube. By integrating deep‑learning models with a user‑friendly interface, the application translates complex computer‑vision outputs into actionable insights for solar deployment. In essence, it automates rooftop solar assessment by converting raw imagery into structured information about solar rooftop suitability. The tool can be integrated into existing energy‑planning workflows, enabling rapid screening of thousands of roofs without the need for manual inspection.
Conclusion
This project demonstrated how deep learning can streamline rooftop solar assessment by automatically detecting and classifying roofs from satellite imagery. Despite challenges like class imbalance and time-intensive labeling, the team succeeded in creating a functional model and user-friendly application that turns raw images into meaningful insights for solar planning.
The results lay a strong foundation for scalable solar mapping. With additional data, further tuning, and potential integration into solar feasibility tools or GIS platforms, the system could significantly support planners, engineers, and policymakers in accelerating rooftop solar adoption across large regions.
Transform slow manual site screening into an automated solar suitability workflow. Partner with Omdena to deploy scalable rooftop mapping tools that support faster energy rollout and smarter investment planning.