PROJECT RESULTS

SOFTWARE PLATFORM

The EXTRACT project aims to develop a software platform that seamlessly integrates different environments (edge, cloud, and HPC). Its main objective is to enable the distribution and execution of tasks (for example, machine learning, data analysis, and simulations) across a heterogeneous infrastructure, facilitating collaboration and data sharing for the project's two identified use cases.

Below is the conceptual architecture of the platform, consisting of five distinct layers.

Putting It All Together

The EXTRACT platform is designed to integrate each of these layers into a unified software ecosystem capable of operating across edge, cloud, and HPC resources. In practice, an application begins by collecting or receiving data (be it logs from sensors on the edge, large datasets in the cloud, or HPC simulation outputs). This data is then organized, cataloged, and enriched in the Data Infrastructure layer, which provides consistent data access and moves the data between environments when needed.

Once data is properly staged, the Data Mining Framework takes over, allowing developers or domain experts to define analytical workflows. Here, frameworks such as COMPSs and Lithops help describe and parallelize tasks, with the end goal of exploiting the compute continuum infrastructure to the fullest.
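As a rough illustration of what "describing and parallelizing tasks" can look like, the sketch below defines one task and maps it over data partitions in parallel before combining the partial results. It deliberately uses only the Python standard library as a stand-in: it does not use the COMPSs or Lithops APIs, and the function names, the threshold, and the sample data are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def count_anomalies(readings, threshold=100.0):
    """Task: count the readings in one partition that exceed a threshold.

    The threshold value is an illustrative assumption.
    """
    return sum(1 for value in readings if value > threshold)


def run_workflow(partitions, max_workers=4):
    """Map the task over all partitions in parallel, then reduce the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_counts = list(pool.map(count_anomalies, partitions))
    return sum(partial_counts)


# Hypothetical sensor data, already split into three partitions.
partitions = [[12.0, 140.5, 99.9], [250.0, 101.0], [3.2]]
total = run_workflow(partitions)
print(total)  # prints 3
```

In a workflow framework, the local thread pool would be replaced by the framework's own executor, so the same task definition could run on remote edge, cloud, or HPC resources instead of local threads.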

Meanwhile, the Data-Driven Orchestration layer serves as the central “brain” of EXTRACT: once the workflows have been defined, it decides how and where their tasks should run. Factors like resource availability, latency needs, or cost objectives can guide this scheduling and deployment process. The orchestrator automatically provisions services or containers where needed and monitors their performance, so users gain a real-time view of their workflows and can adjust as conditions change.
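The placement decision described above can be pictured with a minimal sketch. It assumes a very simple policy: discard resources that lack free cores or exceed the task's latency bound, then pick the cheapest remaining option by a weighted score. The classes, field names, weights, and sample figures are hypothetical and are not taken from the EXTRACT orchestrator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Resource:
    name: str
    free_cores: int       # resource availability
    latency_ms: float     # expected latency to the data source
    cost_per_hour: float  # illustrative cost figure


@dataclass(frozen=True)
class Task:
    name: str
    cores: int
    max_latency_ms: float


def place(task: Task, resources: List[Resource],
          cost_weight: float = 1.0,
          latency_weight: float = 0.01) -> Optional[Resource]:
    """Return the feasible resource with the lowest weighted score, or None."""
    feasible = [
        r for r in resources
        if r.free_cores >= task.cores and r.latency_ms <= task.max_latency_ms
    ]
    if not feasible:
        return None
    return min(
        feasible,
        key=lambda r: cost_weight * r.cost_per_hour + latency_weight * r.latency_ms,
    )


# Hypothetical view of the continuum: one edge node, one cloud VM, one HPC partition.
resources = [
    Resource("edge-node", free_cores=4, latency_ms=5.0, cost_per_hour=0.10),
    Resource("cloud-vm", free_cores=32, latency_ms=40.0, cost_per_hour=0.80),
    Resource("hpc-partition", free_cores=512, latency_ms=200.0, cost_per_hour=2.50),
]

alert = Task("real-time-alert", cores=2, max_latency_ms=10.0)
simulation = Task("heavy-simulation", cores=128, max_latency_ms=1000.0)

print(place(alert, resources).name)       # prints edge-node
print(place(simulation, resources).name)  # prints hpc-partition
```

A real orchestrator would also react to monitoring data and re-evaluate placements as conditions change; this sketch covers only a single static decision.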

Finally, the Compute Continuum layer ensures that HPC clusters, cloud backends, and edge nodes are abstracted behind a single interoperability framework. As a result, applications can make use of different environments, for instance offloading certain tasks to edge nodes for real-time processing or using HPC for the heaviest calculations. Orthogonally to all of the above, the Full Stack Security layer spans these operations, securing data handling, model training, and overall service continuity against threats.

Although the EXTRACT use cases act as testbeds for validating and demonstrating the platform’s capabilities, the software platform itself is broadly applicable. Any scenario requiring distributed data processing, analytics pipelines, or heterogeneous resource orchestration can benefit from EXTRACT’s cohesive approach. By combining comprehensive data management, a robust data mining framework, sophisticated orchestration, an interoperable compute continuum, and strong security, the platform can address a wide variety of domains beyond its initial industrial pilots.