Demand for electricity access is rising in households and communities, and solar energy is likely to cover 70% of that demand. This article describes how an Omdena team built two dashboards in an AI challenge with the impact startup NeedEnergy: one that predicts the long-term return on investment of installing a solar photovoltaic (PV) system, and another that alerts clients with existing PV systems using near-real-time predictions of short-term solar energy supply.

Authors: Ankur Shah and Juan David Chacon


 

“I have no doubt that we will be successful in harnessing the sun’s energy. … If sunbeams were weapons of war, we would have had solar energy centuries ago.”

George Porter

Solar energy is a promising and growing renewable energy source for a climate-friendly future that does not rely on limited fossil fuels. Parts of the world that do not have access to electricity can especially benefit from solar panel systems for better livelihoods.

Sub-Saharan Africa has over 600 million people without access to electricity, and its electricity demand is growing at 11% per year, the highest rate of any region worldwide (International Energy Agency, 2019).

Over the next two decades, electricity demand from households and commercial spaces is anticipated to grow to 390 TWh. Across the globe, solar panel costs have been falling rapidly in recent years, and solar power is expected to meet a large part of the growing demand. In fact, it is likely that 70% of this rising demand will be covered by solar energy!

This is why Omdena partnered with NeedEnergy, a business based in Zimbabwe focused on providing energy solutions to communities in Sub-Saharan Africa. In this challenge we created two dashboards: one for predicting the long-term return on investment of installing a PV system, and another for alerting clients with existing PV systems using near-real-time predictions of short-term solar energy supply.

You can view and explore the tools using this link. In this article, we will learn how these dashboards were created. To learn more about the features used and the best machine learning models implemented, please read “Estimating Electricity Demand of Sub-Saharan Africa Using AI”.

 

Construction of solar panel array structure (Source: Wikimedia Commons)


 

PV Sizing Tool — Long-term dashboard

The PV Sizing Tool, also called the long-term dashboard, is designed for NeedEnergy specifically for the city of Harare. It predicts approximate solar energy production and the associated costs over a long time horizon, generally 20 years, the average lifetime of solar panels. The dashboard is aimed at prospective clients who are interested in going off the grid with a solar panel system and want to calculate the potential return on investment and the optimal size of the solar array. PV sizing tools of this type exist for industrial applications and are common in developed countries. For Harare, Zimbabwe, however, this dashboard is novel owing to the lack of energy consumption data. Let’s learn how to implement it for your chosen location to increase the adoption of renewable energy! The following diagram represents the pipeline for this tool.

Pipeline for the PV Sizing Tool - Omdena


 

Step 1: Data Collection and Wrangling

The most important requirement for this dashboard is the data. Multiple datasets were used for the PV sizing tool, including:

  • Energy consumption from NeedEnergy’s clients via their proprietary API
  • Historical solar irradiance data for Harare from Solcast
  • Panel and inverter list from the PVLIB Python library

This dashboard uses a static file of solar irradiance data, which is an average of 14 years of historical irradiance data for Harare from 2007 until 2021. This generated file is titled ‘Mean Meteorological Year for Harare’. Assuming that irradiance will not deviate significantly from the 14-year average over the next two decades, the PVLIB library predicts solar energy generation based on a number of user inputs, which include the following:

  • Type of solar panel
  • Type of inverter
  • Price per installed watt
  • Time horizon
  • Number of panels
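The 14-year average described above could be built along the following lines. This is a minimal sketch under our own assumptions, not the exact procedure behind the ‘Mean Meteorological Year for Harare’ file: the synthetic two-year history, the `Ghi` column name, and the choice to average each (month, day, hour) slot across years are all illustrative.

```python
import pandas as pd

# Two years of synthetic hourly data standing in for the multi-year history.
index = pd.date_range("2019-01-01 00:00", "2020-12-31 23:00",
                      freq=pd.Timedelta(hours=1))
history = pd.DataFrame(
    {"Ghi": [100.0 if ts.year == 2019 else 300.0 for ts in index]},
    index=index,
)

# Label every row with its calendar slot, then average each slot across years.
slots = history.assign(month=index.month, day=index.day, hour=index.hour)
mean_year = slots.groupby(["month", "day", "hour"])["Ghi"].mean()

# 1 January, 00:00 is averaged over both years; 29 February exists only in 2020.
print(mean_year.loc[(1, 1, 0)])
print(mean_year.loc[(2, 29, 0)])
```

With real data, the same grouping would be applied to the DNI and DHI columns as well.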

The irradiance data has three primary input variables, all of them continuous measurements, which one must understand to work with the prediction of solar energy generation. They include:

  • DHI or Diffuse Horizontal Irradiance
  • DNI or Direct Normal Irradiance
  • GHI or Global Horizontal Irradiance

To learn more about variables directly relevant to the prediction of solar energy, we recommend reading this article by the National Renewable Energy Laboratory (NREL) on solar energy terms.

Then, the mean meteorological irradiance dataset was merged with the energy consumption dataset using the Pandas library. The column with the dates was converted to the index of the resulting dataframe.
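The merge described above can be sketched as follows. The values are made up, and we assume for illustration that both datasets share a timestamp column named `date`; the `Ghi`, `Dni`, `Dhi`, and `consumption` column names mirror those used in the code later in this article.

```python
import pandas as pd

hours = pd.date_range("2021-01-01 10:00", periods=3, freq=pd.Timedelta(hours=1))

# Illustrative irradiance rows (W/m^2) and consumption rows (W).
irradiance = pd.DataFrame({
    "date": hours,
    "Ghi": [650.0, 800.0, 870.0],
    "Dni": [700.0, 820.0, 850.0],
    "Dhi": [110.0, 120.0, 125.0],
})
consumption = pd.DataFrame({
    "date": hours[1:],               # consumption starts one hour later
    "consumption": [420.0, 390.0],
})

# Keep only timestamps present in both datasets, then index by date.
data_frame = pd.merge(irradiance, consumption, on="date", how="inner")
data_frame = data_frame.set_index("date")
print(data_frame)
```

An inner join drops hours that lack either irradiance or consumption, so the downstream calculations never see missing values from the merge itself.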

The figure below displays the user input half of the PV Sizing Tool dashboard. The user can choose between two objectives: estimating savings or estimating the size of the installation.

 

The User Input Display for the PV Sizing Tool (Source: Omdena)


 

Step 2: Modeling Solar Energy Production 

The PV sizing tool serves two primary functions:

  1. Calculating the net estimated savings on a solar installation for the given time horizon
  2. Estimating the approximate size of the PV system by calculating the number of panels required based on the energy consumption data

Based on the mean meteorological year of solar irradiance data, the panel type, and inverter type, the PVLIB library predicts the solar energy generation of a PV system. PVLIB Python is a community-supported tool that provides a set of functions and classes for simulating the performance of photovoltaic energy systems. PVLIB Python has been developed from a MATLAB toolbox of the same name at Sandia National Laboratories and it implements many of the models and methods developed at the Labs. More information on Sandia Labs PV performance modeling programs can be found at https://pvpmc.sandia.gov/. Based on the user inputs, PVLIB has functions to calculate the solar energy produced.
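To make the second function concrete, here is one simple way the number of panels could be estimated once the per-panel generation is known. This is a sketch under our own assumptions rather than the dashboard's actual sizing rule: the function name `panels_required` is hypothetical, and the rule only balances total energy over the period, ignoring the mismatch between when energy is generated and when it is consumed.

```python
import math

def panels_required(hourly_consumption_w, hourly_generation_per_panel_w):
    """Smallest panel count whose total generation covers total consumption."""
    total_consumption_wh = sum(hourly_consumption_w)
    generation_per_panel_wh = sum(hourly_generation_per_panel_w)
    if generation_per_panel_wh <= 0:
        raise ValueError("a single panel must generate some energy")
    return math.ceil(total_consumption_wh / generation_per_panel_wh)

# One illustrative day: a constant 500 W load and six sunny hours per panel.
consumption = [500.0] * 24
per_panel = [0.0] * 9 + [200.0] * 6 + [0.0] * 9
panel_count = panels_required(consumption, per_panel)
print(panel_count)
```

Because this rule ignores timing, a system sized this way would still need storage or a grid connection to cover demand at night.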

Owing to the lack of data in Harare, we had to make specific assumptions for our models which include the following:

  • Energy demand is considered to be periodic in nature.
  • Growth in demand is not yet modeled, so demand is assumed to stay constant for the time being.
  • The seasonality of energy consumption data is not fully captured since much of NeedEnergy’s API data is available for less than a year.

The first step in predicting the solar energy production of a PV system with PVLIB is to determine the installation characteristics. The following code retrieves characteristics such as the module type, inverter type, and temperature parameters. For the PV sizing tool, the surface azimuth angle is assumed to be 180 degrees.

 

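import pvlib

# Load the Sandia module and CEC inverter databases shipped with PVLIB
sandia_modules = pvlib.pvsystem.retrieve_sam('SandiaMod')
sapm_inverters = pvlib.pvsystem.retrieve_sam('cecinverter')

# module_name and inverter_name are the panel and inverter chosen by the user
module = sandia_modules[module_name]
inverter = sapm_inverters[inverter_name]
temperature_model_parameters = pvlib.temperature.TEMPERATURE_MODEL_PARAMETERS['sapm']['open_rack_glass_glass']

# The surface azimuth is assumed to be 180 degrees
system = {'module': module, 'inverter': inverter, 'surface_azimuth': 180}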

 

Secondly, the meteorological conditions for the PV installation are required, so that the approximate conditions at the right time and location can be obtained. Using PVLIB, the sun’s position can be computed for a specific location and time, and the air pressure can be estimated from a given altitude.

 

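altitude = 1490  # meters above sea level
latitude = -17.824858
longitude = 31.053028
times = data_frame.index  # timestamps of the merged dataframe

# The panel tilt is set equal to the latitude
system['surface_tilt'] = latitude

# Solar position, extraterrestrial radiation, and air mass for each timestamp
solpos = pvlib.solarposition.get_solarposition(times, latitude, longitude)
dni_extra = pvlib.irradiance.get_extra_radiation(times)
airmass = pvlib.atmosphere.get_relative_airmass(solpos['apparent_zenith'])

# Convert altitude to pressure, then relative air mass to absolute air mass
pressure = pvlib.atmosphere.alt2pres(altitude)
am_abs = pvlib.atmosphere.get_absolute_airmass(airmass, pressure)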
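To see what the altitude-to-pressure and absolute air mass steps do numerically, here is a small standalone sketch. It uses the standard-atmosphere relation that, to our understanding, underlies `alt2pres`, and the usual pressure scaling for absolute air mass. The constants and the function names `altitude_to_pressure` and `absolute_airmass` are our assumptions for illustration; PVLIB's own functions remain the reference.

```python
def altitude_to_pressure(altitude_m):
    """Approximate standard-atmosphere pressure in pascals at an altitude in meters."""
    return 100.0 * ((44331.514 - altitude_m) / 11880.516) ** (1.0 / 0.1902632)

def absolute_airmass(relative_airmass, pressure_pa=101325.0):
    """Scale relative air mass by the ratio of site pressure to sea-level pressure."""
    return relative_airmass * pressure_pa / 101325.0

harare_pressure = altitude_to_pressure(1490)
harare_airmass = absolute_airmass(1.5, harare_pressure)

print(round(harare_pressure))     # noticeably below sea-level pressure
print(round(harare_airmass, 3))   # smaller than the relative air mass of 1.5
```

At Harare's altitude the air is thinner, so sunlight passes through less atmosphere than it would at sea level for the same sun angle.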

 

Using the angles and the irradiance data, the effective irradiance is calculated. Finally, the effective irradiance, the approximate cell temperature, and the number of installed panels are required for determining the power generated by the solar panels.

 

import numpy as np

# Angle of incidence of the sun on the panel surface
aoi = pvlib.irradiance.aoi(system['surface_tilt'], system['surface_azimuth'],
                           solpos['apparent_zenith'], solpos['azimuth'])

# Plane-of-array irradiance from the DNI, GHI, and DHI columns
total_irrad = pvlib.irradiance.get_total_irradiance(system['surface_tilt'], system['surface_azimuth'],
                                                solpos['apparent_zenith'], solpos['azimuth'], data_frame['Dni'],
                                                data_frame['Ghi'], data_frame['Dhi'],
                                                dni_extra=dni_extra, model='haydavies')

# temp_air (°C) and wind_speed (m/s) must be defined beforehand
tcell = pvlib.temperature.sapm_cell(total_irrad['poa_global'], temp_air, wind_speed, **temperature_model_parameters)

effective_irradiance = pvlib.pvsystem.sapm_effective_irradiance(total_irrad['poa_direct'],
                                               total_irrad['poa_diffuse'], am_abs, aoi, module)

# DC output of a single module, then AC output scaled by the number of modules
dc = pvlib.pvsystem.sapm(effective_irradiance, tcell, module)

ac_power = np.maximum(number_modules * pvlib.inverter.sandia(dc['v_mp'], dc['p_mp'], inverter), 0)
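The angle-of-incidence step can be illustrated with the standard geometric formula, written here in plain Python. This is a sketch for intuition, not PVLIB's implementation; the function name `angle_of_incidence` and the sun position used are hypothetical. It also shows that, in this formula, a tilt equal to Harare's negative latitude with an azimuth of 180 degrees gives the same angle as a north-facing panel (azimuth 0) tilted by the absolute latitude.

```python
import math

def angle_of_incidence(surface_tilt, surface_azimuth, solar_zenith, solar_azimuth):
    """Angle in degrees between the sun's rays and the panel normal."""
    tilt = math.radians(surface_tilt)
    zenith = math.radians(solar_zenith)
    azimuth_diff = math.radians(solar_azimuth - surface_azimuth)
    projection = (math.cos(tilt) * math.cos(zenith)
                  + math.sin(tilt) * math.sin(zenith) * math.cos(azimuth_diff))
    projection = max(-1.0, min(1.0, projection))  # guard against rounding
    return math.degrees(math.acos(projection))

latitude = -17.824858
# Hypothetical sun position: 40 degrees from vertical, 30 degrees east of north.
as_configured = angle_of_incidence(latitude, 180, 40.0, 30.0)
north_facing = angle_of_incidence(abs(latitude), 0, 40.0, 30.0)
print(as_configured, north_facing)
```

A smaller angle of incidence means the sunlight strikes the panel more directly, which increases the direct component of the plane-of-array irradiance.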

 

Step 3: Calculating the net estimated savings of the PV installation

Ultimately, the net savings per PV installation is calculated by subtracting the approximate installation cost from the amount saved by not paying for the energy generated. The function to calculate the net estimated savings requires input parameters which include the merged dataframe of the irradiance and energy consumption data along with the user inputs of the time horizon, price per watt, number of panels, panel type, inverter type, and the price of energy consumed per kilowatt-hour. The code is given below.

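import numpy as np

# Power actually offset each hour: generation cannot offset more than is consumed
power_savings = np.minimum(data_frame['consumption'], panels_count * estimated_generation_by_unit)

# Average the hourly savings within each day, then across days
mean_hourly_power_savings_by_day = power_savings.groupby(power_savings.index.date).mean()
expected_hourly_power_savings = mean_hourly_power_savings_by_day.mean()

# Scale the mean hourly savings to the full time horizon (in years)
potential_energy_savings = (expected_hourly_power_savings * time_horizon * 365 * 24)

# The rated wattage is parsed from the panel name
watts_per_panel = int(panel_type.split('_')[3][0:3])
initial_investment = panels_count * watts_per_panel * price_per_watt

# price_per_kwh / 1000 converts the price per kWh to a price per Wh
total_invoice_reduction = potential_energy_savings * (price_per_kwh / 1000)
potential_savings = total_invoice_reduction - initial_investment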
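A small worked example may help to follow the arithmetic. The sketch below repeats the same calculation in plain Python on one made-up day of data; the function name `net_savings` and all prices, wattages, and load values are hypothetical and chosen only to keep the numbers easy to check.

```python
def net_savings(consumption_w, generation_per_panel_w, panels_count,
                watts_per_panel, price_per_watt, price_per_kwh,
                time_horizon_years):
    """Net savings: avoided energy cost minus the installation cost."""
    # Generation can only offset as much power as is actually consumed.
    hourly_savings_w = [min(c, panels_count * g)
                        for c, g in zip(consumption_w, generation_per_panel_w)]
    mean_savings_w = sum(hourly_savings_w) / len(hourly_savings_w)
    energy_savings_wh = mean_savings_w * time_horizon_years * 365 * 24
    invoice_reduction = energy_savings_wh * (price_per_kwh / 1000)
    initial_investment = panels_count * watts_per_panel * price_per_watt
    return invoice_reduction - initial_investment

consumption = [500.0] * 24                        # constant 500 W load
per_panel = [0.0] * 9 + [200.0] * 6 + [0.0] * 9   # six sunny hours at 200 W

result = net_savings(consumption, per_panel, panels_count=2,
                     watts_per_panel=300, price_per_watt=1.0,
                     price_per_kwh=0.10, time_horizon_years=20)
oversized = net_savings(consumption, per_panel, panels_count=10,
                        watts_per_panel=300, price_per_watt=1.0,
                        price_per_kwh=0.10, time_horizon_years=20)
print(result, oversized)
```

With two panels the installation pays for itself over 20 years, whereas ten panels generate far more than the load can absorb during sunny hours, so the extra investment is not recovered. This is why the tool caps the savings at the consumption.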

 

The figure below displays the interactive plot of the PV sizing tool. The graph in red denotes the estimated generation of solar energy for the chosen time horizon whereas the blue plot represents the energy consumption data.

Resulting plots of solar energy produced vs energy demand for a client in Harare (Source: Omdena)


 

The plot below displays the daily variation in energy demand as well as solar energy production. The Plotly library was used to create the plots for these dashboards. Since Plotly is interactive, the users can zoom in and out of the plot easily and observe trends on a micro and macro level.

 

Plot capturing the daily variation in energy demand and solar energy production (Source: Omdena)


 

Energy Alert Tool — Short-term dashboard

The short-term dashboard for NeedEnergy is designed specifically for the city of Harare, given the availability of energy demand data. This dashboard is used for modeling energy demand up to 36 hours in the future and plotting the potential solar energy generation in the same time frame. The end-users of this dashboard are clients who already have solar array systems installed and are interested in receiving alerts for their energy demand and production. This dashboard uses the LightGBM model to forecast energy demand using past consumption data. The solar irradiance values for 7 days in the future are derived from the Solcast API for Harare and the PVLIB library is used for predicting the solar energy generation for the same 36 hours as the forecasted energy demand. The figure below displays the pipeline of this tool.

 

Pipeline for the Energy Alert Tool (Source: Omdena)


 

Step 1: Data Collection

The most important requirement for this dashboard is the data. Multiple datasets were used for the short-term prediction tool, including:

  • Energy consumption from NeedEnergy’s clients via their proprietary API
  • Forecasted solar irradiance data for Harare from Solcast API
  • Panel and inverter list from the PVLIB Python library

 

This dashboard uses a 7-day forecast dataset of solar irradiance for Harare provided by the Solcast API. From this forecast, the PVLIB library predicts solar energy generation based on a number of user inputs, which include the following:

  • Type of panel
  • Type of inverter
  • Tilt angle
  • Azimuth angle
  • Number of panels

 

The figure below displays the user input section of the short-term dashboard.

Input section for the short term dashboard (Source: Omdena)


 

The primary objectives of this dashboard are the following:

  • Predict the energy demand of the users over the next 36 hours
  • Predict the solar energy produced in the same time period using forecasted irradiance data

 

Step 2: Energy Demand Forecasting with LightGBM

As mentioned previously, given the historic demand of a client, our aim is to forecast the demand during the next 36 hours.

To do this, we trained as many predictors as there are hours in our forecast horizon, i.e. 36 models, each potentially with a different set of input variables. To simplify the discussion, from now on we focus on predicting the energy demand ‘t’ hours ahead for a fixed value of ‘t’. Fitting a single model that forecasts for all clients at once sounds tempting but is unrealistic, as each client has different consumption patterns and such a model would be very complex. Instead, we fit models separately for each client.
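The idea of one predictor per forecast hour can be illustrated by how the training data for each horizon ‘t’ might be assembled. This is a minimal sketch under our own assumptions: the helper name `make_supervised` is hypothetical, the toy series stands in for a client's hourly demand, and only lagged demand values are used as inputs.

```python
def make_supervised(series, n_lags, horizon):
    """Pair each window of past values with the value `horizon` steps ahead."""
    rows, targets = [], []
    for i in range(n_lags - 1, len(series) - horizon):
        rows.append(series[i - n_lags + 1 : i + 1])   # the last n_lags values
        targets.append(series[i + horizon])           # the value t hours later
    return rows, targets

# Toy hourly demand; a real series would come from the client's history.
demand = list(range(10))

# One dataset per horizon, so that one model can be fitted per horizon.
datasets = {t: make_supervised(demand, n_lags=3, horizon=t) for t in (1, 2, 3)}

X2, y2 = datasets[2]
print(X2[0], y2[0])
```

In the setting described above, `n_lags` would be 72, the horizons would run from 1 to 36, and each `(rows, targets)` pair would be passed to its own estimator.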

We deliver a methodology for training forecast models rather than a single trained model. This allows users to trigger the training process and update the models periodically with new data by executing the training methodology explained below. The methodology relies on LightGBM to train several models and on SHAP values to prune the set of input variables and thereby reduce overfitting.

LightGBM is a widely used, fast, and highly efficient gradient boosting framework based on trees, with low memory usage, high accuracy, and the ability to handle large-scale datasets (find more details in the lightgbm package documentation). The SHAP values technique, in turn, aims to determine the contribution of a given input variable to the output of a trained model (please read “Predicting the Important Factors of a Successful Startup using SHAP Value” to learn more about this useful technique).

The base demand forecast models are endogenous and autoregressive, i.e. they only use historic values of the demand for a given client and no exogenous variables such as meteorological or any other sort of data.

 

Train and Prune Approach

To reduce the well-known tendency of models with a large number of input variables to overfit, we used an input variable pruning approach to reduce model complexity. It consists of two steps.

Step 1. Keeping the heavier input variables

First, the LightGBM estimator is trained on the initial set of features, and the importance or weight of each feature is obtained through the SHAP values. Then, we keep the twenty most important features. The implementation is shown below.

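import numpy as np

def keep_importants(cols, importances, size=20):
    # Indices of the `size` largest importances, in descending order
    important_index = np.argsort(importances)[::-1][:size]
    # cols must support integer-array indexing (e.g. a NumPy array or pandas Index)
    important_features = cols[important_index]

    return important_features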

 

 

Step 2. Keeping the input variables with weights surpassing a percentage threshold

From the current set of features, we recompute the importance using SHAP values, but this time we normalize the values with respect to the highest one, in an effort to make the contributions comparable. Finally, we only keep the features with at least 5% of normalized importance. An implementation of this normalization and pruning process is shown below.

def keep_by_percentage(cols, importances, percentage=0.05):
    # Normalize the importances with respect to the largest one
    largest_importance = np.max(importances)
    normalized_importance = importances / largest_importance
    # Keep only the features above the normalized threshold
    mask = normalized_importance > percentage
    important_features = cols[mask]

    return important_features

 

This recursive process of feature selection is not new; it is similar to recursive feature elimination (RFE).
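The two pruning steps can be chained as in the toy example below. The feature names and SHAP values are invented, and the importance score (mean absolute SHAP value per feature) is one common choice that we assume here for illustration. In the real pipeline the model is retrained between the two steps and the SHAP values are recomputed; this sketch simply reuses the first-round scores.

```python
import numpy as np

def keep_importants(cols, importances, size=20):
    order = np.argsort(importances)[::-1][:size]
    return cols[order]

def keep_by_percentage(cols, importances, percentage=0.05):
    normalized = importances / np.max(importances)
    return cols[normalized > percentage]

# Hypothetical features and SHAP values (rows = samples, columns = features).
cols = np.array(["lag_1", "lag_2", "lag_24", "hour", "weekday"])
shap_values = np.array([
    [0.5, -0.02, 0.3, 0.0, 0.1],
    [-0.7, 0.01, 0.2, 0.0, -0.1],
    [0.6, 0.03, -0.4, 0.0, 0.1],
])

# Assumed importance score: mean absolute SHAP value per feature.
importances = np.abs(shap_values).mean(axis=0)

# Step 1: keep the top features (4 instead of 20, since the toy set is small).
top_features = keep_importants(cols, importances, size=4)

# Step 2: reuse the first-round scores in place of a retrained model's scores.
score_by_name = dict(zip(cols, importances))
second_round = np.array([score_by_name[name] for name in top_features])
final_features = keep_by_percentage(top_features, second_round)

print(top_features.tolist())
print(final_features.tolist())
```

Here `hour` is dropped in the first step because it has the smallest score, and `lag_2` is dropped in the second because its score is below 5% of the largest one.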

 

Input Features After Each Pruning Step (Summary)

The initial set of input variables or features:

  • Previous 72 hours of energy consumption
  • Weekday name
  • Time of the day

 

First pruning step:

  • The 20 features with the largest SHAP values after the first training round

Second pruning step:

  • The remaining features with SHAP values of at least 5% of the maximum SHAP value after the second training round

 

A word of warning on pruning based on SHAP values

The SHAP values technique is not designed to deal with correlations between input variables. As a result, the pruning may remove major contributions of underlying effects that are present in several correlated variables. The impact of such an effect is split among all the correlated variables, so each of them receives a small SHAP value and is prone to being pruned, which removes the underlying effect from the model.

Energy Alert Tool Plots Snapshot


 

In the figure above, historical energy demand data is displayed in the blue line, forecasted energy demand for the next 36 hours is in purple, and the solar energy produced with the selected number of panels is displayed in red. The alerts warn the clients whether the solar energy produced will be sufficient for their demand or not. The alerts are based on specific percentage thresholds applied to the difference between solar energy production and energy demand.
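A threshold-based alert of this kind could look like the following sketch. The 10% threshold, the alert labels, and the function name `classify_alert` are hypothetical; the dashboard's actual thresholds are not specified in this article.

```python
def classify_alert(solar_wh, demand_wh, threshold=0.10):
    """Compare forecast solar energy with forecast demand for the same period."""
    if demand_wh <= 0:
        return "surplus" if solar_wh > 0 else "balanced"
    relative_gap = (solar_wh - demand_wh) / demand_wh
    if relative_gap < -threshold:
        return "shortfall"   # solar covers noticeably less than the demand
    if relative_gap > threshold:
        return "surplus"     # solar exceeds the demand by a clear margin
    return "balanced"

print(classify_alert(700.0, 1000.0))
print(classify_alert(1300.0, 1000.0))
```

Using a relative gap rather than an absolute one lets the same threshold work for clients with very different levels of consumption.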

 

Step 3: Solar Energy Production Forecast with PVLIB

This step uses the forecasted solar irradiance from Solcast. The data is retrieved from the Solcast API and has a 30-minute resolution. It contains the same parameters as those used in the PV sizing long-term dashboard above. To access the API, one needs to create an account and obtain an API key on the Solcast website. We use the solcast Python library to call the Solcast API and retrieve the data, as shown below. The latitude and longitude are fixed for Harare, Zimbabwe.

import pandas as pd
import solcast

latitude = -17.824858
longitude = 31.053028
API_KEY = 'YOUR_API_KEY'  # Place your API key here

# Request the radiation forecast for Harare and load it into a dataframe
data = solcast.get_radiation_forecasts(latitude, longitude, API_KEY)
seven_day_forecast = data.forecasts
data_df = pd.DataFrame(seven_day_forecast)

Then, we repeat the same methodology in Step 2: Modeling Solar Energy Production from the PV Sizing Tool section above for using the PVLIB library to model energy generation based on irradiance and temperature data. The time horizon for the modeled solar production matches the forecasted energy demand time horizon of 36 hours.
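Since the forecast arrives at a 30-minute resolution and covers seven days, it has to be aligned with the 36-hour demand forecast. One possible way to do this is sketched below with synthetic data. The hourly averaging, the `Ghi` column name, and the start date are our assumptions for illustration; the article does not state how the alignment was done.

```python
import pandas as pd

# Synthetic 7-day forecast at 30-minute resolution.
half_hours = pd.date_range("2021-06-01 00:00", periods=7 * 48,
                           freq=pd.Timedelta(minutes=30))
forecast = pd.DataFrame(
    {"Ghi": [float(i % 48) for i in range(len(half_hours))]},
    index=half_hours,
)

# Average the two half-hour values within each hour.
hourly = forecast.resample(pd.Timedelta(hours=1)).mean()

# Keep only the horizon covered by the demand forecast.
next_36_hours = hourly.iloc[:36]
print(next_36_hours.head())
```

The resulting hourly dataframe can then be passed through the same PVLIB steps as the long-term tool.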

The user interface for the dashboards displayed in this blog was created with Streamlit and deployed on Heroku. This article does not cover the specific details, as the usage of Streamlit is explained thoroughly in the excellent blog “Streamlit Tutorial: Deploying an AutoML Model Using Streamlit”. For deploying the dashboards on Heroku, the tutorial by Navid Mashinchi, titled “A quick tutorial on how to deploy your Streamlit app to Heroku”, does the trick. We followed the steps in both blogs to create and deploy our dashboards.

 

Conclusion

Machine learning and artificial intelligence have a strong use case in energy demand forecasting whereas solar energy production can be modeled for a given location using physics-based libraries such as PVLIB. These tools can become more commonplace for places around the world that desperately need renewable energy such as places in Sub-Saharan Africa. Once again, the tools shown in this article can be viewed here. Working with Omdena and NeedEnergy for a real-world challenge was an incredible learning experience and we are very grateful for this unique opportunity. Thank you so much for reading and happy coding! 
