JRC Smart Electricity Systems

Price forecasting for the balancing energy market using machine-learning regression




Price forecasting has gained attention over the last few years, with the growth of aggregators and the general opening of the European electricity markets. A participant's position in the market becomes more important closer to the delivery of energy, when electricity prices tend to be higher. Market participants manage a trade-off between bidding in a lower-price market with typically higher volume (the day-ahead market) and aiming for a lower-volume market with potentially higher returns (the balancing energy market). Companies try to forecast the extremes of revenues or prices in order to manage risk and opportunity and to allocate their assets optimally. Variables such as the marginal price of electricity production from different units and system demand have informed bidding strategies in these markets. Electricity markets are generally thought to follow quasi-deterministic principles rather than speculation, hence the desire to forecast the price from variables that can describe the outcome of the market. Many studies address this problem with a statistical approach or with multiple-variable regressions, but they very often focus only on time series analysis. The literature shows that hybrid solutions tend to deliver better accuracy, although this often depends on the dataset under analysis. In 2019, the Loss of Load Probability (LOLP) was made available in the UK for the first time. Taking this opportunity, this study focuses on five LOLP variables (with different time-ahead estimations) and other quasi-deterministic variables to explain price behavior in a multi-variable regression model. These include base production, system load, wind and solar generation, seasonality and imbalance volume contributions. Three machine learning algorithms were tested for performance: Gradient Boosting, Random Forest and XGBoost.
XGBoost performed best and was therefore implemented for real-time forecasting. The model has a mean absolute error (MAE) of 6.10, an R2 score of 83.6% and a mean squared error (MSE) of 72.92. The variables that contribute the most to the model are the Net Imbalance Volume, the LOLP (aggregated), the de-rated margins (aggregated) and the base production, with 30.5%, 29.4%, 13.4% and 6.1% of the feature importance respectively.
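
For readers less familiar with the error measures quoted above, the following minimal sketch shows how MAE, MSE and R2 are commonly computed from actual and forecast prices. It is not the study's code. The function name `regression_metrics` and the price values are hypothetical and chosen only for illustration. RMSE is added as a convenience and is not reported in the abstract.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R2 for two equal-length sequences of prices."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    # Mean absolute error: average size of the forecast error.
    mae = sum(abs(e) for e in errors) / n
    # Mean squared error: penalises large errors (price spikes) more heavily.
    mse = sum(e * e for e in errors) / n

    # R2: share of the variance in actual prices explained by the forecast.
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": r2}


# Hypothetical prices for illustration only (not data from the study).
actual = [40.0, 55.0, 35.0, 90.0, 60.0]
predicted = [42.0, 50.0, 38.0, 80.0, 63.0]

metrics = regression_metrics(actual, predicted)
print(metrics)
```

Because MSE squares each error, it is expressed in squared price units. Taking the square root of the reported MSE of 72.92 gives a root mean squared error of about 8.54, which can be compared more directly with the MAE of 6.10.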