How we developed (what we believe to be) the most comprehensive residential energy benchmarking analysis of its kind for Sub-Saharan Africa.
Electricity is a driving force behind our most fundamental services, from healthcare and education to refrigeration and lighting in our homes. Without the proliferation of electric power grids in the 20th century, our world would not be the fluorescent, hyper-connected place we know today. Yet although the AC power grid was invented in the 1880s, it still has not reached many people. In fact, almost 800 million people do not have access to electricity, with around 600 million of those residing in Sub-Saharan Africa.
Yep, that’s right, 600 million people do not have access to electricity in Sub-Saharan Africa.
For Africa to prosper and reap the benefits of 21st-century technological advancements, we must provide it with cheap and reliable electricity. Furthermore, it is in all of our interests to generate that electricity without the emission of greenhouse gases. Providing that electricity from fossil fuel sources would be myopic, not only from an environmental standpoint but also from an economic one. According to the International Energy Agency (IEA), solar photovoltaic (PV) power is now the cheapest form of electricity – yes, even cheaper than coal.
This opportunity has been seized by NeedEnergy, an energy-tech start-up that is attempting to leverage modern data science techniques to encourage the uptake of solar PV systems in Sub-Saharan Africa. In early 2021, NeedEnergy partnered with voluntary organization Omdena to gather a team of data scientists to take on this challenge.
In this blog post, I will share some insights from my journey as a Machine Learning Engineer, collaborating with a team of 40 talented changemakers from around the world, in my first project with Omdena.
Sizing a solar photovoltaic (PV) system
When designing any new PV system, we must size it according to the demands of the consumer in question. Too small a PV system would not generate enough energy for their needs; too big a system would be unnecessarily expensive. To size an appropriate PV solution, we must understand the electricity consumption of the user.
Ideally, when sizing a PV system for a consumer, we would have data on their electricity consumption down to an hourly granularity, so we can understand how their demand varies throughout the day. This poses a problem for NeedEnergy because electricity consumption data is not yet widely available in Sub-Saharan Africa. The adoption of smart meters in recent years is generating a wealth of highly granular demand data throughout the world, though these devices are less common in Africa. If electricity consumption data is not available for African consumers, how might we estimate their demands?
Could we use building parameters or socio-economic indicators to estimate the electricity demand?
Our approach – will socio-economic indicators work?
Using building parameters is a common method within the energy sector to estimate the demands of buildings. Energy benchmarks, usually provided in units of kWh/m²/year, are widely available for a variety of building types in Europe and North America. These benchmarks are used heavily in the industry; they are a simple, quick, and effective way of estimating building energy demand. Using socio-economic indicators, however, is not a method that I have come across in the industry, though it feels like a pretty reasonable thing to do, particularly for developing countries. One could assume that there is a relationship between the wealth and standard of living of a consumer, and the magnitude of their electricity demand.
Upon starting this project, I looked for energy benchmarks that might be applicable to this continent but was surprised to find that they were not widely available. A few estimations of typical household consumption exist, but I found no analysis that could break down residential demands by building size and/or economic status. But wouldn’t it be great if such an analysis existed? Wouldn’t it be awesome to have sophisticated benchmarks that could estimate demand based not only on building size but also on socio-economic indicators? Yes, it would, and that’s exactly what we built. Please read on to find out more.
The goldmine – a 5 GB dataset with 91M rows!
Energy benchmarks can only be developed once we have a dataset substantial enough for the average consumption values to be considered reliable. Whilst I stated in the previous section that data on energy consumption in Sub-Saharan Africa is not in abundance (which appears to be true), during this project we managed to obtain an absolute goldmine. In 2019, the Domestic Electrical Load (DEL) study datasets for South Africa were published, which are proclaimed to be the largest of their kind for Africa. The datasets must be obtained with permission from DataFirst, University of Cape Town.
When I saw the sheer quantity and type of information available within these datasets I couldn’t quite believe it. Between 1994 and 2014, about a year’s worth of hourly consumption data was collected for each of 12,000 residential households connected to the South African grid. The complete dataset had over 91 million rows and came to 5 GB – at this size, your standard laptop with 8 GB of RAM really starts to struggle! And we did not only have consumption data: we also had survey data for each household covering 50 further variables, including a household’s floor area, building materials, appliance use, and financial income.
A few of the key variables available in the survey dataset
| Floor area | Wall material | Number of adults |
| Monthly income | Water access | Number of children |
| Roof material | Years electrified | Number of unemployed |
On top of the abundance of data collected, a large proportion of households could be considered low income, and many had monthly incomes comparable to those in other countries in Sub-Saharan Africa. Less than half of the households used in the analysis had access to piped water in the home, more than two-thirds of the households had a corrugated iron or zinc roof, and about 1,100 of the dwellings were built with mud walls. Furthermore, the dataset covers a large number of newly electrified households, with 25% of the households having only had electricity for 4 years or less. These household characteristics have been plotted on the two heatmaps below. The number of households has been split by income band, and the darker colors represent a greater number of households.
Whilst South Africa is considerably more developed than the rest of Sub-Saharan Africa, it is a country with high inequality, and the demographics covered in this dataset show economic indicators that share similarities with the less developed parts of the continent. In the chart below, the 25th percentile and median monthly incomes in the dataset are compared to the mean and minimum wage monthly incomes of various other Sub-Saharan African countries. The 25th percentile income, at 230 US$ PPP, falls below many of the averages and comes close to some of the minimum wage incomes.
All incomes shown have been adjusted for inflation and converted to the 2016 value of US dollars at Purchasing Power Parity (PPP), which can be a better comparator of wages than standard US dollars as it takes into account the differing costs of goods and services within each country.
Annual energy demand
After wrangling and data cleaning to remove erroneously low and high energy per meter squared values, the number of households was whittled down to 5,500. The first metric investigated was the annual energy consumption (in kWh) of the households. In Figure 5 below, you will see the distribution of values described by a box and whisker plot. Note that the highest energy consumption values have been identified as outliers.
Previous work by McKinsey has identified that an urban household in Sub-Saharan Africa may consume 2,072kWh/year, whilst a rural household will consume just 480kWh/year. To provide some further context, an average European household may consume in the region of 3,700kWh/year. The median annual energy consumption in this dataset is 30% greater than McKinsey’s estimate for an urban household, which is not far off, though the rural estimate is well below the dataset’s 25th percentile.
Identifying energy benchmarks
1. Size of a building
The industry standard for building energy benchmarks is to normalize energy demand against the floor area of the building. A building’s size should be a good indicator of its consumption; larger buildings are more likely to have more lights, more appliances, more occupants, etc. Previous studies have found strong linear correlations between building energy consumption and floor area, with R² values as high as 0.9. Investigating this relationship in this dataset, however, did not yield such an encouraging result. A linear relationship can just about be seen in the chart below; the linear R² is very low at 0.1, though the relationship is still statistically significant.
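For readers who want to run this kind of check on their own data, here is a minimal sketch of an ordinary least-squares fit and its R², using only the Python standard library. The function name and the toy numbers are illustrative assumptions, not values from the DEL dataset or the code used in the project.

```python
from statistics import mean

def linear_fit_r2(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus its R²."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R² = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Made-up floor areas (m²) and annual consumption (kWh), for illustration only
floor_area_m2 = [30, 45, 60, 80, 100, 120]
annual_kwh = [2900, 1200, 4100, 1800, 5200, 2600]

slope, intercept, r2 = linear_fit_r2(floor_area_m2, annual_kwh)
print(f"slope = {slope:.1f} kWh per m², R² = {r2:.2f}")
```

An R² close to 1 means floor area explains most of the variation in consumption; a value near 0.1 means it explains very little on its own.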
Plotting a box and whisker plot highlights many outliers at the upper end. However, the mean and median values feel reasonable, at 60kWh/m²/year and 44kWh/m²/year respectively. A new build household in the UK (which has to adhere to certain energy efficiency standards) may have an electricity consumption in the range of 34-104kWh/m²/year.
There is clearly a huge spread in energy consumption values, with much of the upper end feeling unreasonably high. The median is probably a better indicator of typical energy performance, as the mean is being skewed by the high values. In Figure 8, we have plotted annual energy consumption against floor area again, but this time the median energy consumption has been calculated in bins of 25m² of floor area. We see a much nicer correlation: there is a clear positive relationship between floor area and median values of energy consumption in this dataset.
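The binning step can be sketched in a few lines. This is a minimal standard-library illustration under my own assumptions (the helper name `binned_medians` and the sample values are made up); the same idea applies to the 100 US$ PPP income bins used later in this post.

```python
from collections import defaultdict
from statistics import median

def binned_medians(x_values, y_values, bin_width):
    """Median of y within fixed-width bins of x.

    Returns {bin_lower_edge: median_y}, ordered by bin edge.
    """
    bins = defaultdict(list)
    for x, y in zip(x_values, y_values):
        lower_edge = int(x // bin_width) * bin_width
        bins[lower_edge].append(y)
    return {edge: median(ys) for edge, ys in sorted(bins.items())}

# Made-up households: floor area (m²) and annual consumption (kWh)
floor_area_m2 = [20, 24, 30, 40, 49, 55, 70]
annual_kwh = [900, 1500, 1100, 2000, 9000, 2400, 3100]

medians_by_bin = binned_medians(floor_area_m2, annual_kwh, bin_width=25)
print(medians_by_bin)
```

Note how the 9,000 kWh outlier in the 25-50m² bin does not drag the median upwards the way it would drag a mean.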
2. Number of occupants
One would expect that the number of occupants would impact the energy demands of a building. With increasing inhabitants, it would be reasonable to assume a greater use of electrical appliances. However, the data has not shown a correlation between the two variables, as indicated by the extremely weak R² value in Figure 9 below. The cause of this poor relationship is unknown, though it may be due to the varying ratios of adults to children within households, or it could be that we have a very heterogeneous population in terms of energy consumption, i.e. wealthier individuals may consume more energy than those that have less money.
3. Household income
As a household’s financial income increases it could be expected that their electricity consumption may increase, due to less concern over energy bills and a greater number of appliances, with those appliances potentially being more power-hungry than their less luxurious counterparts. Conversely, a higher-income household may be purchasing higher-end products that are more energy-efficient. For developing countries, the effects of increasing energy efficiency are thought to be less strong than the effects of increasing energy consumption with increasing income.
In the chart below, a relationship is not easily visible from the scatter points alone; the linear fit, however, shows a positive relationship. Interestingly, its R² value is actually better than the R² seen in the plot against the floor area.
Calculating the median annual energy consumption in bins of 100 US$ PPP of monthly income, we see a very strong correlation. It is clear that the higher-income households are consuming more electricity when looking at the median values.
4. Water Access
An interesting finding has been that a household’s access to water has a significant impact on its electricity consumption. Those with access to piped running water in their homes are consuming more electricity than those that collect water from a nearby river. The access to water is likely to be an indication of wealth, which we have seen to make an impact on energy consumption, but it is also thought that access to running water may increase the use of electric water heaters. Households that do not have running water are more likely to be in rural settings where access to infrastructure is less likely; such households may have a greater tendency to use biomass for water heating than urban dwellings.
From the box and whisker plots in Figure 12 below, it can be seen that households with a tap inside the house have the highest median energy consumption (highlighted by the orange line in the box). Median consumption then drops significantly for households that get running water from a tap in their yard or from a street tap, and for those that collect water from a nearby river, dam or borehole.
5. Demand models
The aim of this analysis was to develop energy benchmarks for households in Sub-Saharan Africa. The industry standard energy per meter squared metric has shown a huge range in values, which is surprising given that the dataset contained predominantly residential households. To attempt to improve on this standard benchmark, the two charts below display linear models which incorporate the effects of monthly income and water access on the median energy per meter squared, in monthly income bins of 100 US$ PPP.
The relationship with monthly income alone retains a strong positive correlation, with an R² of almost 0.8. Interestingly, when splitting for the water access type, the relationship with increasing income is not as strong, though there is still a weak positive trend (see Figure 14 below). The inside tap model includes households that were labeled as having “tap inside the house”; the outside model includes those households that were labeled as having “tap in the yard”, “block/street taps” or “nearby river/dam/borehole”.
6. Daily load profiles
When sizing a PV system, we not only want to understand a household’s annual or daily energy demand but also how that demand varies throughout the day. Solar PV generation will follow a similar pattern each day that coincides with the amount of solar irradiance hitting the earth’s surface. The sun comes up in the morning, reaches a peak around midday, then tails off in the evening. Ideally, we would consume electricity at the same time as we generate it from PV; otherwise, we will need to take electricity from the grid, or have some kind of backup storage such as batteries.
The hourly level of granularity in this South African dataset meant that we could analyze thousands of daily load profiles. An average weekday and weekend day profile was calculated for each of the 5,500 households. The profiles were then normalized by dividing each hour’s consumption by the total daily consumption, so that each hourly point is the fraction of daily energy consumed at that hour and the hourly points sum to one. This method allows us to compare the shape of consumption across different households equally, as the shape is independent of the magnitude of consumption.
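The normalization described above is simple enough to sketch directly. In this minimal example the function name and the two toy households are my own assumptions; it shows that a small and a large consumer with the same daily pattern end up with the same normalized shape.

```python
def normalise_profile(hourly_kwh):
    """Scale a daily profile so that its hourly values sum to one."""
    total = sum(hourly_kwh)
    if total == 0:
        raise ValueError("profile has no consumption to normalise")
    return [value / total for value in hourly_kwh]

# Toy 24-hour profile (kWh per hour): quiet night, morning bump, evening peak
small_household = [0.1] * 6 + [0.4] * 2 + [0.2] * 10 + [0.6] * 4 + [0.2] * 2
# Same pattern, five times the consumption
large_household = [value * 5 for value in small_household]

small_shape = normalise_profile(small_household)
large_shape = normalise_profile(large_household)
print(f"Share of daily energy used at 19:00: {small_shape[19]:.3f}")
```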
The plot below shows all 5,500 weekday profiles; it’s a bit of a messy hairball of a plot, though you can just about make out a dominant residential profile (the average profile is shown in black). The demand picks up in the morning, tails off during the day (when the inhabitants go out to work or school), then shoots up again in the evening as people return home. The biggest peak is typically in the evening, as lighting starts to be switched on and people relax in front of the TV after a hard day’s work.
The average weekend profile is a little bit smoother. It is less spiky in the morning, presumably due to fewer people rushing to get up for their daily business. The demand stays higher in the daytime than in the weekday profile, as more inhabitants will be at home rather than at work or school; the biggest peak still occurs in the evening.
Finding the signal in the noise
In recent years, there have been numerous studies using unsupervised machine learning techniques such as k-means clustering to identify dominant load profiles within large consumption datasets; Toussaint et al. attempted multiple clustering techniques on this very dataset. Let’s give scikit-learn’s k-means algorithm a go, and see what profiles we can pull out.
The plot below contains all 5,500 weekday profiles in grey, with the colored lines being the five clusters that k-means has managed to extract. Five clusters were chosen, partly informed by attempting the elbow method on the distortion present in the clusters, and partly just because five is a nice number. All five of the clusters follow a profile that is characteristically residential, except that some appear to be a bit “spikier” than others. Cluster three, identified in green, has a particularly large evening peak, with its daytime consumption being lower than the others.
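For anyone wanting to try this themselves, the sketch below shows the general shape of such a clustering run with scikit-learn’s `KMeans`, including the inertia values that the elbow method looks at. The profiles here are synthetic stand-ins generated on the spot (an assumption, since the DEL data needs permission to access), and the parameters are illustrative rather than those used in the project.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 300 normalised 24-hour profiles,
# each a random mix of a morning bump and an evening peak plus some noise.
hours = np.arange(24)
morning = np.exp(-0.5 * ((hours - 7) / 1.5) ** 2)
evening = np.exp(-0.5 * ((hours - 19) / 2.0) ** 2)
raw = np.vstack([
    weight * evening + (1 - weight) * morning + rng.uniform(0.05, 0.15, 24)
    for weight in rng.uniform(0.2, 0.9, size=300)
])
profiles = raw / raw.sum(axis=1, keepdims=True)  # each row sums to one

# Elbow method: record the inertia (within-cluster sum of squares) per k
inertias = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(profiles)
    inertias[k] = model.inertia_

# Final model with five clusters, as chosen in the text
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(profiles)
cluster_profiles = kmeans.cluster_centers_  # one 24-hour shape per cluster
labels = kmeans.labels_                     # cluster index for each profile

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.4f}")
```

Because every input profile sums to one and each cluster center is the mean of its members, the cluster profiles also sum to one and can be read as load shapes in their own right.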
Unfortunately, there is no easy way to link these different clusters back to any building parameter or socio-economic variable. We still wanted to see if we could connect distinct load profiles back to our findings on annual energy consumption, so average profile shapes were analyzed for increasing bins of 100 US$ PPP of monthly income. The load profiles did not appear to show any relationship with income. A household’s water access, however, did have an impact: the outside water access households had a higher peak in the evenings and a lower midday consumption than those with inside water access (see Figure 18 below).
Putting it all together
So how can we use all of this insight to help us in sizing a new PV system? The modeling of PV systems is commonly undertaken using a typical year’s worth of hourly electricity demand and solar irradiance data. We can now generate a year’s worth of electricity demand data using the insights gained above.
We developed the linear models for inside and outside water access (as shown in Figure 14) for both weekday and weekend demand (as greater consumption is seen on weekends). Then, using the load profiles determined in Figure 18, we can build a full year’s worth of hourly consumption data ready for modeling. This was all wrapped up into a nice, easy-to-use dashboard; check out Figure 19 below.
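The last step, expanding daily demand and load shapes into a full year of hourly values, can be sketched as follows. This is a minimal illustration, assuming the linear models have already been turned into a typical daily kWh figure for weekdays and for weekends; the function name, the flat placeholder shapes, and the kWh values are all made up.

```python
from datetime import date, timedelta

def build_hourly_year(year, weekday_kwh, weekend_kwh, weekday_shape, weekend_shape):
    """Expand daily totals and normalised 24-hour shapes into hourly kWh.

    Returns a list of (date, hour, kwh) tuples covering the whole year.
    """
    rows = []
    day = date(year, 1, 1)
    while day.year == year:
        is_weekend = day.weekday() >= 5  # Saturday = 5, Sunday = 6
        daily_kwh = weekend_kwh if is_weekend else weekday_kwh
        shape = weekend_shape if is_weekend else weekday_shape
        rows.extend((day, hour, daily_kwh * share) for hour, share in enumerate(shape))
        day += timedelta(days=1)
    return rows

# Placeholder shapes: flat profiles that sum to one (real shapes would
# come from the normalised load profiles discussed above)
flat_shape = [1 / 24] * 24

hourly_rows = build_hourly_year(
    2021, weekday_kwh=6.0, weekend_kwh=7.0,
    weekday_shape=flat_shape, weekend_shape=flat_shape,
)
annual_total = sum(kwh for _, _, kwh in hourly_rows)
print(f"{len(hourly_rows)} hourly values, {annual_total:.0f} kWh over the year")
```

The resulting hourly series is in the form that PV modeling typically expects, ready to be lined up against a year of hourly solar irradiance data.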
With a better understanding of electricity consumption in this region, we can better size new solar PV systems. In this study, we have been able to go beyond typical kWh/m² benchmarks and provide two linear models that can estimate residential electricity demand given a household’s monthly income and access to water. As someone who uses energy benchmarks in their day-to-day work, I’ve never had the luxury of these extra variables in estimating energy demand. How applicable the energy consumption values in this dataset are to countries outside of South Africa is debatable, and needs further research, though I sincerely hope that the insights gained here will be of use in estimating electricity demand in the countries that need it the most.
Special thanks go out to Arna Karick, Charlotte Savage, Teresa Scholz, and Zayd Vawda for their work on this analysis. My further gratitude goes out to all 40 volunteers who tirelessly devoted their own time to help improve clean energy access on this continent. Finally, thanks to Omdena and NeedEnergy for making this happen.
This has been my first Omdena project; I came here for two reasons: as a chance to use my energy sector knowledge for social good, but also as an opportunity to develop my skills in data science and machine learning. There really is no better way of developing your capabilities than to get stuck into a real problem. So if you are looking to build your skills, sign up for a project with Omdena, get real experience, and make the world a better place in the process.
-  IEA, SDG7: Data and Projections, 2020: https://www.iea.org/reports/sdg7-data-and-projections/access-to-electricity
-  Carbon Brief, Solar is now ‘cheapest electricity in history’, confirms IEA, 2020: https://www.carbonbrief.org/solar-is-now-cheapest-electricity-in-history-confirms-iea
-  Toussaint, Wiebke. Domestic Electrical Load Metering, Hourly Data 1994-2014 [dataset]. Version 1. Johannesburg: SANEDI [funders]. Cape Town: Energy Research Centre, UCT [producers], 2014. Cape Town: DataFirst [distributor], 2019. DOI: https://doi.org/10.25828/56nh-fw77
-  The World Bank Research Observer, Volume 32, Issue 1, February 2017, Pages 21–74: https://doi.org/10.1093/wbro/lkw007
-  McKinsey, Brighter Africa, The growth potential of the sub-Saharan electricity sector, 2015: https://www.mckinsey.com/~/media/McKinsey/dotcom/client_service/EPNG/PDFs/Brighter_Africa-The_growth_potential_of_the_sub-Saharan_electricity_sector.ashx
-  Odyssee Mure, EU – Electricity Consumption per Dwelling: https://www.odyssee-mure.eu/publications/efficiency-by-sector/households/electricity-consumption-dwelling.html
-  GBCSA Energy and Water Benchmark Methodology, 2012: https://gbcsa.org.za/wp-content/uploads/2017/12/Methodology-Report.pdf
-  UK Government, Energy consumption in new domestic buildings 2015 – 2017 (England and Wales): https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/853067/energy-consumption-new-domestic-buildings-2015-2017-england-wales.pdf
-  Toussaint, W. and Moodley, D. (2020). Clustering Residential Electricity Consumption Data to Create Archetypes that Capture Household Behaviour in South Africa. South African Computer Journal 32(2), 1–34: https://doi.org/10.18489/sacj.v32i2.845
-  Toussaint, W. 2019. Evaluation of clustering techniques for generating household energy consumption patterns in a developing country: https://open.uct.ac.za/handle/11427/30905