Dive into the DestinE Data Lake and glimpse the future
The Destination Earth Data Lake will be a key element of DestinE, providing harmonised access to a very broad data portfolio from many diverse data spaces and putting the right data at hand to answer what-if questions for Europe. But how will this be possible?
In an exclusive interview, Lothar Wolf, Head of Digital Solutions and SAF Division at the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT), delves into the details of the organisation’s role within DestinE and the groundbreaking impact this collaboration is working to achieve.
Introducing EUMETSAT and DestinE
DestinE is implemented in a strategic partnership between the European Space Agency (ESA), the European Centre for Medium-Range Weather Forecasts (ECMWF) and the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT), in short the 3Es.
DestinE consists of three major subsystems interacting via well-defined interfaces. Each organisation is responsible for one DestinE subsystem. The DestinE Core Service Platform (DESP), under the responsibility of ESA, is the user entry point to DestinE. It will allow users to customise the services they use, integrate their own data, and develop and share their own applications or the information they generate.
ECMWF develops the Digital Twin (DT) on Weather-induced and Geophysical Extremes and the DT on Climate Change Adaptation, and uses its Digital Twin Engine (DTE) to facilitate access to the DT outputs.
EUMETSAT is responsible for the DestinE Data Lake (DEDL) component, which aims to bring to users’ fingertips data of unprecedented scale and diversity, as well as a dynamic and flexible portfolio of big data processing services and workflows executed near these vast volumes of data.
Unpacking the Concept: The Data Lake and its Three Pillars
So, what exactly is the Data Lake? EUMETSAT Digital Solutions and SAF Division Head Lothar Wolf guides us through the significance of the Destination Earth Data Lake (DEDL) and its three pillars. The DEDL is not purely an infrastructure service; it’s a dynamic solution that addresses the challenges of providing the right data services for DestinE as a whole. The three pillars are:
Pillar 1: Harmonisation of Data Access: The DestinE Data Lake handles a wide variety of data from diverse data spaces. Use-case implementations can benefit from a uniform interface that puts data at their fingertips to answer their questions. The DEDL provides access not only to the challenging volumes of Digital Twin outputs but also to federated data from various existing and evolving data spaces, beyond traditional Earth Observation. This is managed via a user-driven data portfolio and enabled by a harmonised data access solution that abstracts away the heterogeneity and complexity of the underlying data sources.
Pillar 2: Digital Twins Data output: The DEDL will enable the vast output data that the Digital Twins will produce to be managed and stored using the DEDL data warehouse reference architecture that has been jointly developed by EUMETSAT and ECMWF.
Pillar 3: Big Data and Near Data Processing: The Data Lake facilitates processing directly near the large volumes of data hosted, in particular the outputs produced by the DTs as mentioned above. This simplifies the development of artificial intelligence (AI) and machine learning applications and paves the way for data-driven insights and real-time decision-making.
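To make the idea behind Pillar 1 more concrete, the following is a minimal sketch of what "one uniform interface over heterogeneous sources" can look like in code. It is purely illustrative and is not the DEDL's actual interface: the class names (`Record`, `InMemorySource`, `HarmonisedAccess`), the collection names and the sample values are all hypothetical, chosen only to show a DT output store and a federated data space being queried through the same calls.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Record:
    """One harmonised data item, whatever its origin (hypothetical shape)."""
    collection: str
    source: str
    payload: dict


class DataSource(Protocol):
    """Anything that can list its collections and return records from them."""
    name: str

    def collections(self) -> set[str]: ...

    def fetch(self, collection: str) -> Iterable[Record]: ...


class InMemorySource:
    """Hypothetical stand-in for a DT output store or a federated data space."""

    def __init__(self, name: str, data: dict[str, list[dict]]):
        self.name = name
        self._data = data

    def collections(self) -> set[str]:
        return set(self._data)

    def fetch(self, collection: str) -> Iterable[Record]:
        for payload in self._data[collection]:
            yield Record(collection, self.name, payload)


class HarmonisedAccess:
    """Single entry point that hides which source holds which collection."""

    def __init__(self, sources: list[DataSource]):
        self._sources = sources

    def portfolio(self) -> list[str]:
        # The portfolio is the union of everything the sources expose.
        names: set[str] = set()
        for source in self._sources:
            names |= source.collections()
        return sorted(names)

    def search(self, collection: str) -> list[Record]:
        # Callers never need to know where a collection physically lives.
        results: list[Record] = []
        for source in self._sources:
            if collection in source.collections():
                results.extend(source.fetch(collection))
        return results


# Assumed example data: one DT-style collection and one federated collection.
dt_store = InMemorySource("dt-outputs", {"climate-dt": [{"t2m": 287.4}]})
federated = InMemorySource("federated-space", {"land-cover": [{"class": "forest"}]})
lake = HarmonisedAccess([dt_store, federated])
```

In this toy setup, `lake.portfolio()` lists both collections and `lake.search("climate-dt")` returns records tagged with their originating source, without the caller addressing either source directly.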
What sets the DEDL apart is the technical and service innovation it represents. While individual components and standards exist, the Data Lake combines these elements in a unique way to create a cohesive, user-focused ecosystem that draws on information from diverse data spaces while embracing state-of-the-art data services.
“The way we integrate existing components and the way we then expose those as a service to the user is truly new for Europe. That doesn’t exist in such form today!” said Lothar.
The Data Lake combines various cloud edges distributed across Europe and supports users through its big data processing framework with near-data processing capabilities, enabling them to derive information for their what-if questions.
Fishing for Data
So how does the data get into the Data Lake? And how will users be able to access that data? Lothar Wolf guides us through the population and integration process.
To answer the first question, it is useful to recall that all data the DEDL will provide is defined and governed by the user-driven DestinE Data Portfolio.
Conceptually, data in the DEDL data portfolio is divided into two categories. The first is the DT outputs, which will be obtained via the data warehouse reference architecture jointly developed by ECMWF and EUMETSAT. The second is data that resides with different data providers and data spaces, to which the Data Lake will implement and provide access via data federation.
This sets the basis for the core operations of the DEDL as a federated, interconnected and responsive service. Lothar highlighted, “The Data Lake will federate with the different Data Spaces over time and in view of implementing the user-driven Data Portfolio. It will be our responsibility to engage with the different Data Spaces.”
Speaking of users, those who wish to use the Data Lake’s data processing services will have two kinds of access:
– Indirect access is the more “traditional” one, as Lothar Wolf puts it: users will run their processing application on the Destination Earth Core Service Platform, and from there access the datasets they require.
– Direct access is where users will launch part of their data processing tasks via the DESP directly in the DEDL, and thereby truly interface with the resources and capabilities of the DEDL big data processing framework. This means that users will be able to execute algorithms directly near the data, which will significantly shorten the time needed to generate information and benefit their research and development processes.
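The difference between the two kinds of access can be illustrated with a toy example. The sketch below is an assumption-laden simplification, not a description of how the DESP or the DEDL work: `ToyLake`, `download` and `run_near_data` are hypothetical names, and "values transferred" is only a stand-in for the data volume that has to leave the Data Lake in each case.

```python
from typing import Callable


class ToyLake:
    """Hypothetical data holder that counts how many values leave it."""

    def __init__(self, values: list[float]):
        self._values = values
        self.values_transferred = 0

    def download(self) -> list[float]:
        """Indirect access: ship the whole dataset to the caller."""
        self.values_transferred += len(self._values)
        return list(self._values)

    def run_near_data(self, algorithm: Callable[[list[float]], float]) -> float:
        """Direct access: run the caller's algorithm here, ship only the result."""
        result = algorithm(self._values)
        self.values_transferred += 1
        return result


def mean(values: list[float]) -> float:
    """The user's algorithm; identical in both access modes."""
    return sum(values) / len(values)


# Assumed sample data, e.g. four temperature values in kelvin.
lake = ToyLake([280.0, 285.0, 290.0, 295.0])

# Indirect: the data travels to the user's application, which then computes.
indirect_result = mean(lake.download())
indirect_cost = lake.values_transferred

# Direct: the algorithm travels to the data, and only the answer comes back.
lake.values_transferred = 0
direct_result = lake.run_near_data(mean)
direct_cost = lake.values_transferred
```

Both paths give the same answer, but in the direct case only a single value leaves the toy lake rather than the full dataset. With real Digital Twin outputs the gap would be far larger, which is the motivation for near-data processing.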
Of course, some users might develop applications that turn them into data and service providers in their own right, feeding their outputs back into the DEDL for the wider benefit of the DestinE community, a concept which is emblematic of…
The circular nature of the Lake
From the above, it becomes clear that the innovative concepts of the DEDL go far beyond the conventional data infrastructures available elsewhere today, in which data is simply stored and accessed.
In fact, one of the major strengths of the DEDL lies in the prospective growth and evolution that is baked into its digital infrastructure: the more data is inserted, federated, harmonised, and processed, the more the Lake itself grows across all domains. Lothar Wolf refers to this as a “snowball effect”, explaining also that the more Machine Learning capabilities increase and are refined, the more fully the Data Lake’s potential can be exploited.
Lothar said, “I really do think that in a few years from now, as the user-driven data portfolio becomes bigger and bigger, it will become simpler and simpler for users to implement their use cases, and launching algorithms directly near the data will become much more prevalent.”
Conclusion: gazing into the Lake’s future
The Data Lake represents a turning point in the empowerment of research and innovation processes related to the environment. It is only with the combined effort of all three major components of Destination Earth that the initiative will be able to propel us forward into a future of environmental simulation, climate change adaptation, and sustainability.
Lothar concluded, “This is one of the most attractive opportunities in Europe not only because it combines science, data and technology in a unique way but also to finally find answers to the compelling ‘What-if’ questions, and this has the potential to be a huge game changer.”