
Forecasting Electricity Prices for Optimal Usage of Renewables in Norway

September 27, 2022



Norwegian impact-driven company Think Outside hosted an Omdena AI Challenge in which the team developed solutions to forecast the water inflow from rivers and snowmelt into reservoir lakes and to forecast electricity prices. This article focuses on the latter.

Problem Statement

Since the Paris Agreement was signed at the UN in April 2016, many countries have made efforts to fight climate change. The agreement aims to limit global temperature rise by reducing emissions [1]. One of these countries is Norway, which, according to an IEA policy review, is a global pillar of energy security and helps lead the world in the transition to clean energy technologies [2]. Recently, Norway unveiled plans to transition from a producer of oil and gas into an exporter of renewable electricity [3].

This AI challenge, hosted by Omdena in collaboration with the Norwegian company ThinkOutside, aimed to predict water availability and electricity prices, the two main factors that drive hydroelectric power production in Norway, in order to optimize renewable energy production in Scandinavia. The project was divided into two sub-projects: water inflow prediction and electricity price prediction. This article focuses on electricity price prediction.

Approach

Countries belonging to the Nordpool market are divided into several different bidding zones for energy production and distribution. The bidding zones are linked together into a coupled energy market and they ensure that regional market conditions are reflected in the price. Bidding zones can have either a balance, deficit, or surplus of electricity. Electricity flows occur from areas with low demand and low price to areas with high demand and higher price [4,5]. Norway is divided into five bidding zones (Fig. 1):

  • NO1 – Oslo
  • NO2 – Kristiansand
  • NO3 – Molde, Trondheim
  • NO4 – Tromsø
  • NO5 – Bergen

Fig. 1. Norway’s five bidding zones

In our models we predicted the electricity prices for the following four zones:

  • NO1 – Oslo
  • NO3 – Molde
  • NO4 – Tromsø
  • NO5 – Bergen

Trondheim is not listed separately because it has been part of the same bidding zone as Molde since 2014. Kristiansand (NO2) was also excluded because the descriptive analysis showed that it has the same prices as Bergen (correlation of 1). Hence, for Trondheim and Kristiansand the prices of Molde and Bergen, respectively, can be used.
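The redundancy check described above can be sketched as follows. This is a minimal illustration on synthetic prices, not the project's data or code; the function name `perfectly_correlated` and the tolerance are assumptions made for the example.

```python
import numpy as np

def perfectly_correlated(a, b, tol=1e-6):
    """Return True when two price series have a Pearson correlation of ~1."""
    r = np.corrcoef(a, b)[0, 1]
    return bool(r >= 1.0 - tol)

# Synthetic daily prices, for illustration only (not project data).
bergen = np.array([30.1, 28.4, 35.0, 40.2, 38.7])
kristiansand = bergen.copy()  # identical prices -> correlation of 1
oslo = np.array([31.0, 27.0, 36.5, 39.0, 41.2])

redundant = perfectly_correlated(kristiansand, bergen)  # zone can be dropped
distinct = perfectly_correlated(oslo, bergen)           # zone needs its own model
```

A zone flagged as redundant can reuse the forecasts of its twin zone instead of getting its own model.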

The project pipeline, depicted below, follows the classical machine learning workflow. First, we studied the problem statement and the business use case behind it and researched the topic. Next, we examined possible data sources and wrote code to automatically pre-process the data and merge all required features into a single dataframe. We then imputed missing values and performed an exploratory data analysis (EDA), in which we checked for correlations and visualized the data to understand it better. In the modeling step we tested different models to predict our four target variables (the Elspot prices for each Norwegian region). Finally, we visualized our results.


Fig. 2. Project Pipeline (Source: Omdena)

Data

This section gives an overview of the data. The data spans the period from 2014 to the first months of 2022 and can be divided into five areas: Nordpool data, commodity prices, world market index, time-related data, and weather data. After a brief description of the first four areas, we take a closer look at the satellite weather data.

Nordpool data:

  • Elspot prices
  • Elspot volumes
  • Regulating prices
  • Elspot capacity incl. production imbalance prices (sales/purchases)
  • Elspot transmission capacities and Elbas initial transmission capacities
  • Consumption prognosis
  • Production prognosis
  • Elbas Volume

Commodity prices:

  • Gas
  • Brent oil
  • Coal

World market index:

  • MSCI World index ETF

Time-related data:

  • Holidays, weekends
  • Weekly, monthly, and yearly encoded features (transformed with cos/sin functions)

The Nordpool database was the main source for our dataset, as it offers a wide range of electricity market data. We included data not only for Norway but also for other Nordpool countries in order to account for their effects on electricity prices. Commodity prices such as oil, gas, and coal affect electricity production costs and therefore directly influence electricity prices, so we included commodity price data from finanzen.net in our models. In addition, we included the world market index ‘MSCI World index ETF’ to account for global geopolitical and economic events that are not reflected in oil prices. Our data comparison showed that oil prices, although quite closely connected to the index, do not always follow its trend, so it was beneficial to include the stock index. Furthermore, as stated above, we included various time-related features, including cyclically encoded features.
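The cyclical encoding mentioned above can be sketched as follows. This is a minimal example of the general sin/cos technique rather than the project's exact code; the function name `cyclical_encode` and the day-of-week convention are assumptions.

```python
import numpy as np

def cyclical_encode(values, period):
    """Map a periodic feature onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)

# Day of week: 0 = Monday ... 6 = Sunday (an assumed convention).
dow_sin, dow_cos = cyclical_encode(np.arange(7), period=7)

def circle_distance(i, j):
    """Euclidean distance between two encoded days."""
    return float(np.hypot(dow_sin[i] - dow_sin[j], dow_cos[i] - dow_cos[j]))

sunday_to_monday = circle_distance(6, 0)
monday_to_tuesday = circle_distance(0, 1)
```

With this encoding, Sunday and Monday are as close to each other as Monday and Tuesday, which a plain integer feature (6 versus 0) would not express. The same function applies to months (period 12) or days of the year.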

Weather data:

  • Precipitation
  • Average temperature
  • Wind speed
  • Relative and specific humidity
  • Short-wave radiation

Another factor that influences electricity prices is weather. Temperature, for example, affects electricity demand resulting in higher demand in winter months in cold countries like Norway, which in turn affects electricity prices [6]. Other weather factors influence electricity production from renewable energy sources like wind turbines and hydropower dams.


Fig. 3. Comparison of weather data resolution from NVE and GloH2O

The team chose to use weather data in the form of spatial grids, in which the map is divided into equal areas and each area is assigned a value for the weather variable. This type of data was preferred because it provides measurements for all areas uniformly, unlike in-situ measurements from dispersed weather stations, which are only available at certain locations and whose availability may vary from one region to another. At first, we collected data from the Norwegian Water Resources and Energy Directorate (NVE). NVE has its own API and provides weather data for each square kilometer in Norway. However, ThinkOutside requested that we use a global data source to allow them to extend this application to other countries.

Since NVE data is limited to Norway, we collected weather data from GloH2O, which provides weather data for the entire world at a resolution of 0.1 degree. This is equivalent to a rectangle with a height of about 11 km in the north-south direction and a width in the east-west direction that varies with latitude, because lines of longitude converge toward the poles. Fig. 3 compares the resolution of the two data sources (the region shown is the basin of a reservoir in Norway). The GloH2O dataset provides weather data as raster files (netCDF). A raster file is an array of pixels (the rectangles/squares in Fig. 3), where each pixel represents an area on the earth (in this case a 0.1×0.1 degree rectangle) defined by its longitude and latitude, and each pixel is associated with one or more values that represent the measured variable (temperature, precipitation, etc.) [7].

Due to time limitations, we only had the chance to use data averaged over entire bidding zones. This was done by extracting the pixels that fall within each bidding zone, averaging their values, and returning the averaged measurement (temperature, wind speed, etc.) for each zone (Fig. 4). The process in Fig. 4 was repeated once for each daily measurement of each variable.
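As a rough check of the 0.1 degree cell size quoted above, the following sketch estimates the height and width of a grid cell at a given latitude. It assumes a spherical Earth with about 111.2 km per degree of latitude; the function name `cell_size_km` and the example latitudes are illustrative assumptions.

```python
import math

KM_PER_DEGREE_LAT = 111.2  # approximate, assuming a spherical Earth

def cell_size_km(lat_deg, resolution_deg=0.1):
    """Approximate (height, width) in km of a grid cell centred at lat_deg."""
    height = KM_PER_DEGREE_LAT * resolution_deg
    # East-west spacing shrinks with the cosine of the latitude.
    width = height * math.cos(math.radians(lat_deg))
    return height, width

height_south, width_south = cell_size_km(60.0)  # roughly the latitude of Oslo
height_north, width_north = cell_size_km(69.7)  # roughly the latitude of Tromsø
```

At about 60°N the east-west width is roughly half the north-south height, and it keeps shrinking further north.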


Fig. 4. Weather data extraction from netCDF files
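The zone-averaging step can be sketched with plain numpy arrays as follows. This is a simplified illustration, not the project's code: the grid values are synthetic, the helper names are hypothetical, and a bounding box stands in for a real bidding zone outline, which would normally come from a polygon file. Reading the netCDF rasters themselves is left out.

```python
import numpy as np

def zone_mean(grid, lats, lons, in_zone):
    """Average the pixels of a (lat, lon) grid whose centres fall inside a zone."""
    lon_grid, lat_grid = np.meshgrid(lons, lats)
    mask = in_zone(lat_grid, lon_grid)
    return float(np.nanmean(grid[mask]))  # ignore missing (NaN) pixels

def toy_zone(lat, lon):
    """Hypothetical zone outline: a bounding box instead of a real polygon."""
    return (lat >= 59.0) & (lat <= 60.0) & (lon >= 10.0) & (lon <= 11.0)

# Synthetic pixel centres and temperatures, for illustration only.
lats = np.array([59.25, 59.75, 60.25])
lons = np.array([10.25, 10.75, 11.25])
day_1 = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])
daily_grids = np.stack([day_1, day_1 + 1.0])  # shape (day, lat, lon)

# One averaged value per day, repeated for each variable in practice.
zone_series = [zone_mean(day, lats, lons, toy_zone) for day in daily_grids]
```

Here four of the nine pixels fall inside the box, so each daily value is the mean of those four pixels.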

A better approach would have been to either manually divide each bidding zone into sub-zones or feed all pixel values directly to the model and let it extract the relevant features. It is reasonable to expect such an approach to improve prediction accuracy. For example, low temperatures in sparsely populated northern areas likely affect electricity demand less than low temperatures in major cities. Similarly, precipitation in hydropower dam basins influences water availability in reservoirs more than precipitation that is eventually carried away to the sea.

Machine Learning models:

Before modeling, we pre-processed and analyzed the data, i.e. we imputed missing values and performed the EDA. We tested two types of models: regression models (Ridge and Lasso) and SARIMAX. For both types we split the data into training and test sets, as is common practice in machine learning. The models are trained on the training data and then evaluated on the test data.
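The two ways of splitting the data used in this project, chronological for SARIMAX and shuffled for the regression models, can be sketched as follows. The function name `split_indices`, the 20% test fraction, and the seed are assumptions for illustration; the article does not state the proportion that was used.

```python
import numpy as np

def split_indices(n_samples, test_fraction=0.2, shuffle=False, seed=0):
    """Return (train_idx, test_idx) for a dataset with n_samples rows."""
    idx = np.arange(n_samples)
    if shuffle:
        # Random assignment: only valid for models that treat rows independently.
        idx = np.random.default_rng(seed).permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    n_train = n_samples - n_test
    return idx[:n_train], idx[n_train:]

# Chronological split: the test set is the most recent part of the series.
train_chrono, test_chrono = split_indices(10, shuffle=False)

# Shuffled split: days are assigned to train or test at random.
train_shuf, test_shuf = split_indices(10, shuffle=True)
```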

Regression models:

For the regression models we shuffled the dataset, i.e. each data point was randomly assigned to either the training or the test set according to the chosen proportion. We observed that the models performed better when the data was shuffled. Note that shuffling is only possible when the model has no memory, that is, when it treats each data point independently. Hence, we did not shuffle the data for the SARIMAX model. We tuned the hyperparameters with the optuna package, which searches the hyperparameter space without taking an excessively long time. The aim of the tuning was to find the model with the lowest root mean squared error (RMSE).
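The idea of tuning a regularized regression for the lowest RMSE can be sketched as follows. This is a simplified stand-in, not the project's code: it fits Ridge regression in closed form with numpy and sweeps a fixed list of candidate strengths, whereas the project used the optuna package for the search. The data is synthetic and all names are illustrative.

```python
import numpy as np

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression; the intercept is left unpenalised."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def search_alpha(X_train, y_train, X_val, y_val, alphas):
    """Return (best_alpha, best_rmse) over the candidate alphas."""
    scores = []
    for alpha in alphas:
        coef, intercept = fit_ridge(X_train, y_train, alpha)
        scores.append(rmse(y_val, X_val @ coef + intercept))
    best = int(np.argmin(scores))
    return float(alphas[best]), scores[best]

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.5, 0.5]) + rng.normal(scale=0.5, size=200)

# Shuffled split into training and validation rows.
order = rng.permutation(200)
train, val = order[:160], order[160:]

candidates = np.logspace(-3, 3, 13)
best_alpha, best_rmse = search_alpha(X[train], y[train], X[val], y[val], candidates)
```

A sampler-based search such as optuna's replaces the fixed list of candidates with values proposed trial by trial, which matters more once several hyperparameters and model types are tuned together.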

SARIMAX model:

Since SARIMAX models are computationally expensive and time-consuming to fit, only a few of the available exogenous features were used. The features were selected on the assumption that electricity (Elspot) prices depend on demand and supply (buy and sell Elspot volumes), weather, and commodity features. SARIMAX models require stationary input data to perform well. We therefore checked our data for stationarity using the Augmented Dickey-Fuller (ADF) test, which showed that our target variable is not stationary. To make it stationary, we took first differences. Finally, similar to the regression models, we created a function that selects the best model parameters using the Akaike information criterion (AIC).
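First differencing, and how to get back from differenced values to price levels, can be sketched as follows. The prices are synthetic and the function names are illustrative. The ADF test itself is not reproduced here; one commonly used implementation is `adfuller` in `statsmodels.tsa.stattools`, though the article does not say which library the team used.

```python
import numpy as np

def first_difference(series):
    """Return the day-to-day changes of a series (one element shorter)."""
    return np.diff(np.asarray(series, dtype=float))

def undo_first_difference(first_value, diffs):
    """Rebuild the original levels from the first value and the differences."""
    return np.concatenate(([first_value], first_value + np.cumsum(diffs)))

# Synthetic daily prices, for illustration only.
prices = np.array([30.0, 32.5, 31.0, 35.0, 36.5])

diffs = first_difference(prices)                     # [2.5, -1.5, 4.0, 1.5]
restored = undo_first_difference(prices[0], diffs)   # back to price levels
```

Forecasts made on the differenced series have to be converted back in the same way before they can be compared with actual prices.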

Results

The figures below show the electricity price predictions for the Oslo bidding zone for the regression model and the SARIMAX model, respectively. For Oslo, the hyperparameter tuning and grid search showed that the Ridge model performs best among the regression models, with an RMSE of 5.94 and an R-squared of 0.972. As Figure 5 shows, the actual and predicted points are quite close to each other.
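For reference, the two metrics reported here follow the standard definitions below, where y_i is the actual price, ŷ_i the predicted price, ȳ the mean of the actual prices, and n the number of test points. The symbols are introduced for this note and do not appear in the original analysis.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```

The RMSE is expressed in the same units as the price, while R-squared is unitless and equals 1 for a perfect fit.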


Fig. 5. Oslo Elspot prices prediction with Ridge (Source: Omdena)

Figure 6 shows the results for the SARIMAX model for Oslo which has a higher RMSE (12.18) than the Ridge regression.


Fig. 6. Oslo Elspot prices prediction with SARIMAX (Source: Omdena)

Overall, the RMSE for the four predicted bidding zones ranged from approximately 5 to 8 for the regression models and from approximately 7 to 12 for the SARIMAX models. Hence, the regression models performed better in general. However, the regression models treat each day's data as an independent data point and make the day-ahead prediction from that single day of data. In contrast, the predictions of the SARIMAX models are based on all the historical data before the point in time being predicted. As a consequence, regression models might adjust to changes more slowly than SARIMAX models, which needs to be considered when deploying the models to production. Our research also showed that the literature recommends other types of models for predicting electricity prices, for example deep learning models such as LSTM. Unfortunately, due to the time constraints of this project, we could not test such models.
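One way to read the day-ahead setup of the regression models is that each day's feature row is paired with the following day's price. The sketch below shows that pairing on synthetic values; it is an interpretation of the description above, and the function name `make_day_ahead_pairs` is hypothetical.

```python
import numpy as np

def make_day_ahead_pairs(features, prices):
    """Pair each day's feature row with the following day's price."""
    features = np.asarray(features, dtype=float)
    prices = np.asarray(prices, dtype=float)
    # The last day has no next-day price, so it is dropped from the inputs.
    return features[:-1], prices[1:]

# Synthetic values, for illustration only: 4 days, 2 features per day.
features = np.array([[1.0, 10.0],
                     [2.0, 11.0],
                     [3.0, 12.0],
                     [4.0, 13.0]])
prices = np.array([30.0, 31.0, 33.0, 32.0])

X, y = make_day_ahead_pairs(features, prices)
```

Because every row stands alone, such pairs can be shuffled freely, but the model sees no history beyond what is encoded in that single day's features.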

Conclusion and social impact

Forecasting electricity prices is important not only for companies but also for society. Recent developments in the gas and electricity markets caused by the war in Ukraine have shown how important and interdependent energy markets are. Hence, accurate energy price predictions that take several factors into account, as in this challenge with its very diverse exogenous features, are important for industry and governments so that they can react with adjustments or adequate policies.

Authors:

  • Evanthia Fasoula: All sections apart from data description and weather data.
  • Yasser Zouzou: Data description and weather data sections.

Collaborators:

Aakanksha Chouhan, Andrew Henry, Anne Losch, Deepali Bidwai, Devika Pace, Ekaterina Paerschke, Elena Barbulescu, Evanthia Fasoula, Hamzah Warsi, Huy-Thong Phan, Joan Vlasschaert, Jorge de Vivero, Kartikey Saini, Keerthana Perumal, Leon Hamnett, Miguel Sindreu, Mihaela Borta, Mircea-Margarit Nistor, Niklas Schlessmann, Noel Simonovici, Peter Rockwood, Suganthi Giridharan, Titilayo Amuwo, Torsten Walther, Yangyang Cai, Yasser Zouzou, Zaw Thu Htet, Zeyneb Chiha

Product Owner: Shrey Grover

References

This article is written by Evanthia Fasoula, Yasser Zouzou.
