Energy in Catalonia
The project provides a scalable machine-learning solution that improves electricity demand prediction, supporting resilient and sustainable energy networks in a changing climate.
In a nutshell
- The growing electrification of economic sectors and climate-induced extreme weather are straining electricity networks, necessitating better demand-supply prediction and adaptation.
- A machine-learning-powered system integrates diverse datasets to predict high-resolution electricity demand, accounting for weather and socioeconomic factors.
- This model aids in adapting energy systems to climate challenges, supporting distributed renewable generation and resilient electricity networks.
Technical Overview
Challenge
In the European Union (EU), households accounted for 27.4% of final energy consumption in 2021; including commercial and industrial buildings, the share rises to almost 70%. Many of these loads are electrically driven and contribute significantly to seasonal and daily peak demand. Electric vehicle (EV) penetration is also growing fast: over the 2016-2021 period, EV sales in Europe grew at a compound annual growth rate (CAGR) of 61%. As most economic sectors become increasingly electrified, electricity networks, and the energy sector in general, must increase clean energy generation to avoid power supply interruptions.
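The CAGR figure follows the standard definition CAGR = (V_end / V_start)^(1/n) - 1 over n years. The plain-Python sketch below (function names are illustrative, not taken from the project) computes it and its inverse. Assuming the 61% rate applies to annual EV sales over the five yearly steps from 2016 to 2021, it implies sales multiplying by roughly 10.8.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def growth_factor(rate: float, years: float) -> float:
    """Total multiplicative growth implied by a constant annual rate."""
    return (1.0 + rate) ** years


# A 61% CAGR sustained over the five annual steps from 2016 to 2021
factor = growth_factor(0.61, 5)
print(f"{factor:.1f}x")  # 10.8x
```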
Demand response prediction and control must also improve significantly to keep generation and consumption properly matched. Typically, about 20% of generation capacity is kept available to meet peak demand that occurs only about 5% of the time. Historically, matching electricity supply and demand was relatively straightforward: large, controllable power plants on one side and fairly predictable demand on the other. In recent years this has started to change, for several reasons: (a) smaller, variable and less predictable decentralised renewable generation increasingly affects the energy mix directly; (b) Europe is trying to decouple its demand from Russian gas and fuels (with a looming supply gap for the coming winter) while keeping up with its decarbonisation targets; (c) climate change and the resulting temperature changes (such as the recent European heat waves) affect the electricity sector on both the demand and the supply side; (d) the high penetration of renewable energy sources (RES) affects both the territory and the electricity networks, which calls for deeper and more sophisticated analysis and infrastructure design methods.
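The "20% of capacity for about 5% of the time" relationship can be checked on any demand series: take the peak and count how often demand enters the top 20% band that only the peak requires. The helper below is a hypothetical illustration on synthetic data, not the method behind the figure quoted above.

```python
from typing import Sequence


def share_of_time_in_peak_band(demand: Sequence[float], band: float = 0.2) -> float:
    """Fraction of time steps in which demand exceeds (1 - band) * peak demand.

    With band = 0.2, this is how often the top 20% of the capacity needed
    to serve the peak is actually called upon.
    """
    if not demand:
        raise ValueError("demand must not be empty")
    threshold = (1.0 - band) * max(demand)
    return sum(1 for d in demand if d > threshold) / len(demand)


# Synthetic example: 100 time steps, only 5 of them above 80% of the peak
demand = [50.0] * 95 + [85.0, 90.0, 95.0, 98.0, 100.0]
print(share_of_time_in_peak_band(demand))  # 0.05
```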
Under these complex scenarios, it is essential to better assess how extreme weather events driven by climate change affect electricity consumption and supply in a specific territory, such as Catalonia. This assessment should rest on a detailed, high-time-resolution characterisation of electricity demand and of the energy already supplied by existing RES power plants.
DestinE Solution
Approach
The solution consists of training an XGBoost machine learning model on a Dask cluster to take advantage of parallelization. Unlike classical time-series methods, which require a separate model for, for example, each postal code, XGBoost can fit a single model to the entire dataset, which makes the approach scale with data volume.
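To make the "single model for the entire dataset" idea concrete, the sketch below (plain Python; the data layout and names are assumptions) stacks per-postal-code series into one long table in which the postal code is simply another feature. A table like this, extended with weather and socioeconomic columns, is the kind of input a distributed trainer such as XGBoost's Dask interface (`xgboost.dask`) would consume; that wiring is not shown here.

```python
from typing import Dict, List, Tuple

# Hypothetical per-postal-code daily demand: {postal_code: [(day_of_year, kwh), ...]}
SeriesByCode = Dict[str, List[Tuple[int, float]]]


def to_global_table(series: SeriesByCode) -> Tuple[List[List[float]], List[float], Dict[str, int]]:
    """Stack every postal code into one long-format table for a single global model.

    Each row carries the postal code as an integer-encoded feature, so one
    gradient-boosted model can learn patterns shared across all postal codes
    instead of fitting one model per code.
    """
    code_index = {code: i for i, code in enumerate(sorted(series))}
    features: List[List[float]] = []
    target: List[float] = []
    for code, observations in series.items():
        for day_of_year, kwh in observations:
            features.append([float(code_index[code]), float(day_of_year)])
            target.append(kwh)
    return features, target, code_index
```

Integer-encoding the postal code is a simplification; XGBoost can also treat such a column as categorical.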
The model is trained at daily granularity. To improve performance, engineered features were added, such as heating degree days (HDD) and cooling degree days (CDD). These are derived from the difference between the average air temperature and a reference temperature, and quantify the heating and cooling energy demand of a given space.
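A minimal sketch of the degree-day features, assuming daily mean temperatures as input. The text does not state the reference temperature, so the 18 °C base below is an assumption. At daily granularity each day's own HDD and CDD values would serve as features; summing is shown for multi-day periods.

```python
from typing import Iterable, Tuple

# Assumed base temperature in °C; the source does not state which value was used.
BASE_TEMPERATURE_C = 18.0


def degree_days(daily_mean_temps_c: Iterable[float],
                base_c: float = BASE_TEMPERATURE_C) -> Tuple[float, float]:
    """Return (HDD, CDD) summed over the given daily mean temperatures.

    HDD accumulates how far each day falls below the base temperature (heating need);
    CDD accumulates how far each day rises above it (cooling need).
    """
    hdd = cdd = 0.0
    for t in daily_mean_temps_c:
        hdd += max(0.0, base_c - t)
        cdd += max(0.0, t - base_c)
    return hdd, cdd


print(degree_days([10.0, 18.0, 25.0]))  # (8.0, 7.0)
```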
Databases used
The model was trained on historical data, with weather inputs from ERA5 Land, using an 80/20 partition: 80% of the instances (843 950) for training and 20% (210 987) for testing.
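The two partition sizes add up to 1 054 937 instances, of which 843 950 is 80% after rounding. The sketch below reproduces that split. The seeded random shuffle is an assumption, since the text does not say whether the split was random or chronological; for time series, a chronological split avoids leaking future information into training.

```python
import random
from typing import List, Tuple


def train_test_split_indices(n: int, train_fraction: float = 0.8,
                             seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and split them into train and test sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * n)
    return indices[:n_train], indices[n_train:]


n_total = 843_950 + 210_987  # 1 054 937 instances in total
train_idx, test_idx = train_test_split_indices(n_total)
print(len(train_idx), len(test_idx))  # 843950 210987
```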
The analysis used data from the following platforms:
- Datadis [1]: Provides consumption data sourced from the energy distributors across Spain.
- INE [2]: Supplies socioeconomic data from the National Institute of Statistics of Spain.
- Esios [3]: Offers pricing data from the information system of the Spanish electricity system operator (Red Eléctrica).
- Cadaster [4]: Provides building information from the INSPIRE harmonized dataset.
- DEDL [5]: Supplies weather data: historical data from ERA5 Land and predictive data from the Digital Twins, with short-term predictions sourced from the Extreme DT and long-term predictions from the Climate DT.
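These sources come at different granularities, so they must be aligned before training. A minimal sketch, assuming every source has already been aggregated to postal code (and, where time-varying, to day); the field names and the inner-join policy are hypothetical, and mapping gridded ERA5 Land cells to postal codes is not shown.

```python
from datetime import date
from typing import Dict, List, Tuple

Key = Tuple[str, date]  # (postal_code, day)


def join_sources(consumption: Dict[Key, float],
                 weather: Dict[Key, Dict[str, float]],
                 socioeconomic: Dict[str, Dict[str, float]]) -> List[dict]:
    """Inner-join daily consumption with weather on (postal_code, day),
    then attach static socioeconomic attributes by postal code."""
    rows = []
    for (code, day), kwh in consumption.items():
        if (code, day) not in weather or code not in socioeconomic:
            continue  # drop rows that cannot be fully described
        rows.append({"postal_code": code, "day": day, "kwh": kwh,
                     **weather[(code, day)], **socioeconomic[code]})
    return rows
```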
The model was trained with 5-fold cross-validation to obtain a robust estimate of its performance across different data subsets.
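A sketch of how 5-fold cross-validation partitions the data: each fold serves once as the validation set while the other four are used for training. Contiguous, unshuffled folds are an assumption here (they match the default behaviour of scikit-learn's `KFold`); in practice `KFold` or `xgboost.cv` would typically be used instead of hand-written code.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists for k contiguous folds."""
    # The first n % k folds get one extra element so that all n rows are used.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validation
        start += size
```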
Figure 1: Ranking of the features trained in the model.
Without tuning, the model achieved a CVRMSE (coefficient of variation of the root mean square error) of 19.7%; hyperparameter optimization reduced it to 7.95%.
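CVRMSE normalises the RMSE by the mean of the observed values: CVRMSE = 100 · RMSE / mean(y). The sketch below divides by n. Some building-energy guidelines, such as ASHRAE Guideline 14, divide by n - p instead, and the text does not say which variant was used.

```python
import math
from typing import Sequence


def cvrmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of variation of the RMSE, in percent: 100 * RMSE / mean(actual)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
    return 100.0 * rmse / (sum(actual) / len(actual))


print(cvrmse([100.0] * 4, [110.0, 90.0, 110.0, 90.0]))  # 10.0
```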
Grid search yielded the following hyperparameters:
- Maximum Tree Depth: 12 (allowing the model to capture complex interactions in the data).
- Minimum Child Weight: 3 (controlling overfitting by requiring each child node to reach a minimum sum of instance weights, which for squared-error regression equals a minimum number of samples, before a split is accepted).
- Subsample by Tree: 1.0 (each tree is built on all training rows, with no row subsampling).
- Column Sample Rate by Tree: 0.8 (reducing overfitting by randomly sampling 80% of the columns for each tree).
- Learning Rate: 0.1 (balancing the model’s learning speed and convergence).
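An exhaustive grid search evaluates every combination of candidate values and keeps the best-scoring one. In the sketch below, only the reported best values come from the text; the other candidate values in `GRID` are hypothetical. The keys are the parameter names used by `xgboost.XGBRegressor`.

```python
import itertools
import math
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def grid_search(grid: Dict[str, List[float]],
                score: Callable[[Params], float]) -> Tuple[Params, float]:
    """Evaluate every combination in `grid` and return the one with the lowest score."""
    names = list(grid)
    best_params, best_score = None, math.inf
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s < best_score:
            best_params, best_score = params, s
    return best_params, best_score


# Hypothetical candidate grid (3 * 3 * 2 * 2 * 2 = 72 combinations).
GRID = {
    "max_depth": [6, 9, 12],
    "min_child_weight": [1, 3, 5],
    "subsample": [0.8, 1.0],
    "colsample_bytree": [0.8, 1.0],
    "learning_rate": [0.05, 0.1],
}

# The values reported in the text.
REPORTED_BEST = {"max_depth": 12, "min_child_weight": 3, "subsample": 1.0,
                 "colsample_bytree": 0.8, "learning_rate": 0.1}
```

In a real run, `score` would train `xgboost.XGBRegressor(**params)` on each cross-validation fold and return the mean CVRMSE. scikit-learn's `GridSearchCV` packages the same loop.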
Architecture
The architecture is integrated within the Islet service (PaaS). Three instances are deployed, using the dual-location setup of LUMI and Central. Each instance follows a “shared nothing” architecture: many independent machines are interconnected through the network, and each virtual machine manages its own disk, memory and processor. Because no resources are shared, each machine can operate independently while exchanging data with the others over the network, and the system can scale by adding machines.
Figure 2: Data-driven architecture of the use-case.
Figure 2 sketches the data management system, which comprises three stages, each shown in a dashed box and each with a specific purpose. First, Data Collection uploads raw data into object storage. Next, Data Processing parallelizes data harmonisation with Dask and Polars to efficiently clean, ingest, transform and load the data into persistent storage, ensuring both reliability and privacy. Finally, analytical tools integrated with the database support applications such as the XGBoost model for predictive analytics. The results can in turn lead to adjustments or enhancements in the data management strategy.
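The Data Processing stage can be pictured as partition-wise work in the shared-nothing style: each partition is harmonised without touching the others, and the results are concatenated before loading. The plain-Python sketch below uses a thread pool as a stand-in for Dask workers and Polars; the record fields and cleaning rules are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Record = Dict[str, object]


def clean_partition(records: List[Record]) -> List[Record]:
    """Harmonise one partition independently: drop missing or negative
    readings and normalise the postal code to a 5-character string."""
    cleaned = []
    for r in records:
        kwh = r.get("kwh")
        if kwh is None or kwh < 0:
            continue
        # Spanish postal codes read as integers lose their leading zero (8001 -> "08001").
        cleaned.append({**r, "postal_code": str(r["postal_code"]).zfill(5), "kwh": float(kwh)})
    return cleaned


def process(partitions: List[List[Record]], workers: int = 4) -> List[Record]:
    """Clean partitions in parallel, then concatenate them (in order) for loading."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [row for part in pool.map(clean_partition, partitions) for row in part]
```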
Visualization
The trained model was used to predict electricity consumption for the whole of 2026 using Climate DT scenarios. It is also designed to predict demand over the following five days using forecasts from the Extreme Weather DT; this short-term mode was validated over the period from 14 to 18 October 2024.
The visualization component is an interface for analysing the impact of weather on electricity consumption. A public website was developed so that anyone can access and explore the data, built with Plotly, Matplotlib, Streamlit and Docker. The pipeline in Figure 3 outlines how the web page is constructed.
Figure 3: Structure of the website.
This web application consists of three main pages: Energy, Weather, and Predictions.
Figure 4: Electricity consumption for 2026 based on the Climate Adaptation DT predictions.
Figure 5: Electricity consumption for the period 14th to 18th of October of 2024 using the predictions of the Extreme Weather DT.
Impact
The Energy in Catalonia use case combines multiple data sources to train a machine learning model that can predict the electricity consumption of an entire region at very high spatial and temporal resolution. This is an important step towards a complete digital twin of the electricity distribution network, which faces a paradigm shift from concentrated power plants to much more distributed power production based on renewable energy. It is therefore important to assess whether the distribution network is ready to cope with this shift, especially given the growing effects of climate change, which brings extreme weather events that in turn drive higher peak power demand.