PROJECT RESULTS

SOFTWARE PLATFORM

The EXTRACT project aims to develop a software platform that seamlessly integrates different environments (edge, cloud, and HPC). Its main objective is to enable the distribution and execution of tasks (for example, machine learning, data analysis, and simulations) across a heterogeneous infrastructure, facilitating collaboration and data sharing for the two identified use cases.

Below is the conceptual architecture of the platform, consisting of five distinct layers.

Putting It All Together

The EXTRACT platform is designed to seamlessly integrate each of these layers to form a unified software ecosystem capable of operating across edge, cloud, and HPC resources. In practice, an application begins by collecting or receiving data (be it logs from sensors on the edge, large datasets in the cloud, or HPC simulation outputs). This information is then organized, catalogued, and enriched in the Data Infrastructure layer, which provides consistent data access and moves data between tiers when needed.
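As a minimal illustration of what such cataloguing provides, the sketch below registers a dataset under a logical name and resolves it to a physical location. The `DataCatalog` class, the `edge://` URI scheme, and all field names are hypothetical, not EXTRACT's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    """One catalogued dataset: a logical name, where it lives, and metadata."""
    name: str
    location: str          # e.g. "edge://sensor-01/logs" (hypothetical URI scheme)
    tags: dict = field(default_factory=dict)

class DataCatalog:
    """Minimal registry offering consistent, location-independent data access."""
    def __init__(self):
        self._entries = {}

    def register(self, entry: DatasetEntry):
        self._entries[entry.name] = entry

    def locate(self, name: str) -> str:
        """Resolve a logical dataset name to its current physical location."""
        return self._entries[name].location

catalog = DataCatalog()
catalog.register(DatasetEntry("sensor-logs", "edge://sensor-01/logs", {"format": "csv"}))
print(catalog.locate("sensor-logs"))  # edge://sensor-01/logs
```

The point of the indirection is that applications keep referring to "sensor-logs" even if the data layer later migrates the dataset from edge to cloud storage.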

Once data is properly staged, the Data Mining Framework takes over, allowing developers or domain experts to define analytical workflows. Here, frameworks such as COMPSs and Lithops help describe and parallelize tasks, with the end goal of exploiting the compute continuum infrastructure to the fullest.
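A plain-Python sketch of the pattern these frameworks automate: independent tasks run in parallel, and a downstream task waits on their results. COMPSs and Lithops each have their own APIs for this; the `clean` and `merge` tasks here are purely hypothetical stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor

def clean(data):
    """Stand-in preprocessing task: drop missing values from one partition."""
    return [x for x in data if x is not None]

def merge(a, b):
    """Stand-in downstream task that depends on both clean() results."""
    return sorted(a + b)

with ThreadPoolExecutor() as pool:
    f1 = pool.submit(clean, [3, None, 1])
    f2 = pool.submit(clean, [2, None, 4])     # runs concurrently with f1
    result = merge(f1.result(), f2.result())  # blocks until both dependencies finish

print(result)  # [1, 2, 3, 4]
```

Frameworks like COMPSs infer this dependency graph automatically from the task definitions, so the developer writes sequential-looking code and the runtime extracts the parallelism.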

Meanwhile, the Data-Driven Orchestration layer serves as the central “brain” of EXTRACT, deciding how and where these tasks should run once the workflows have been defined. Factors like resource availability, latency needs, or cost objectives can guide this scheduling and deployment process. The orchestrator automatically provisions services or containers where needed and monitors their performance, so users gain a real-time view of their workflows and can adjust as conditions change.
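To make the kind of trade-off concrete, the toy scheduler below scores candidate targets against a task's requirements. The scoring weights, field names, and targets are illustrative assumptions, not the EXTRACT orchestrator's actual policy:

```python
def place_task(task, targets):
    """Pick the target whose capabilities best satisfy the task's needs.
    Weights are illustrative only."""
    def score(t):
        if t["free_cores"] < task["cores"]:
            return float("-inf")            # target cannot host the task at all
        s = 0.0
        if task.get("realtime") and t["latency_ms"] <= 10:
            s += 10                         # favour low-latency (edge) nodes
        s -= t["cost_per_hour"]             # cheaper is better
        s += t["free_cores"] * 0.01        # mild preference for headroom
        return s
    return max(targets, key=score)["name"]

targets = [
    {"name": "edge-01", "latency_ms": 5,   "cost_per_hour": 0.0, "free_cores": 4},
    {"name": "cloud-a", "latency_ms": 60,  "cost_per_hour": 1.2, "free_cores": 64},
    {"name": "hpc-1",   "latency_ms": 120, "cost_per_hour": 3.0, "free_cores": 512},
]

print(place_task({"cores": 2,   "realtime": True},  targets))  # edge-01
print(place_task({"cores": 128, "realtime": False}, targets))  # hpc-1
```

A real-time task with modest resource needs lands on the edge, while a 128-core job can only be satisfied by the HPC cluster, mirroring the placement logic described above.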

Finally, the Compute Continuum layer ensures that HPC clusters, cloud backends, and edge nodes are abstracted behind a single interoperability framework. As a result, applications can make use of different environments, for instance offloading certain tasks to edge nodes for real-time processing or using HPC for the heaviest calculations. Orthogonally to all of this, the Full Stack Security layer binds these operations together, guaranteeing that data handling, model training, and overall service continuity are secured against threats.

Although the aforementioned Use Cases of EXTRACT act as testbeds for validating and demonstrating the platform’s capabilities, the EXTRACT software platform is broadly applicable. Any scenario requiring distributed data processing, analytics pipelines, or heterogeneous resource orchestration can benefit from EXTRACT’s cohesive approach. By merging comprehensive data management, a robust data mining framework, sophisticated orchestration, an interoperable compute continuum, and strong security, the platform stands ready to address a wide variety of domains far beyond its initial industrial pilots.


KEY EXPLOITABLE RESULTS

DESCRIPTION

BISETO Security Toolkit for Compute Continuum

The BISETO Security Toolkit is a consulting and training service designed to provide comprehensive security support for compute continuum projects. Leveraging the BNR Cybersecurity Toolkit for Compute Continuum Software, BISETO integrates BNR’s expertise in cybersecurity and compute continuum technologies (e.g., EXTRACT) to guide complex research and industrial deployments.

SkyStore Distributed Object Storage

SkyStore is a distributed object storage system used for saving large datasets. It provides scalable, secure storage that’s ideal for cloud or hybrid environments where we need to store and retrieve large amounts of data.

COMPSs (COMP Superscalar)

COMPSs is a programming model that boosts the performance of large-scale applications by automatically running tasks in parallel. It makes it easy to scale applications across different environments like cloud, edge, and HPC without needing manual intervention.

Lithops

Lithops is a serverless framework that splits large datasets and runs tasks in parallel across different cloud providers. It makes handling data-heavy workloads easier by automatically scaling resources based on need.
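The split-and-map pattern Lithops automates can be sketched in plain Python: partition a dataset, process each partition in parallel, and aggregate. Lithops performs this on serverless cloud functions through its own API; the stdlib thread pool, `chunk`, and `word_count` below are stand-ins for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(data, size):
    """Split a dataset into fixed-size partitions (what Lithops does automatically)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def word_count(lines):
    """Stand-in per-partition task."""
    return sum(len(line.split()) for line in lines)

corpus = ["the quick brown fox", "jumps over", "the lazy dog", "again and again"]
parts = chunk(corpus, 2)

with ThreadPoolExecutor() as pool:
    counts = list(pool.map(word_count, parts))   # one task per partition

total = sum(counts)
print(total)  # 12
```

In the serverless setting, each partition would be handled by an independently scaled cloud function rather than a local thread, which is where the automatic elasticity comes from.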

Dataplug

Dataplug facilitates seamless data integration between different systems and platforms. It acts as a bridge that connects disparate data sources, ensuring smooth data flow and compatibility. It is particularly useful in environments where multiple data formats, sources, or services need to interact, helping to automate the process of data extraction, transformation, and loading (ETL) while maintaining data integrity across the compute continuum.
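As a minimal sketch of the ETL pattern mentioned above (not Dataplug's actual API; the CSV source, field names, and JSON target are assumptions for illustration):

```python
import csv
import io
import json

def extract(raw_csv):
    """Extract: parse records out of a source format (CSV here)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows):
    """Transform: normalise field names and types into a common schema."""
    return [{"sensor": r["id"], "temp_c": float(r["temp"])} for r in rows]

def load(records):
    """Load: serialise into the target format (JSON here)."""
    return json.dumps(records)

raw = "id,temp\ns1,21.5\ns2,19.0\n"
print(load(transform(extract(raw))))
# [{"sensor": "s1", "temp_c": 21.5}, {"sensor": "s2", "temp_c": 19.0}]
```

The value of keeping the three stages separate is that sources and targets can be swapped independently, which is exactly the bridging role described above.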

Nuvla Data Catalogue

Nuvla.io is a cloud-native platform that allows remote management of edge devices and applications. It simplifies how we manage and deploy containerized applications across edge locations, making remote management and scaling more efficient. Nuvla.io and NuvlaEdge are used in the EXTRACT project for data management features, namely the data catalogue. In addition to this existing functionality, SIX is developing new features from scratch on top of the data catalogue to enhance edge-to-cloud data management.

IKERLAN Monitoring Platform 

The IKERLAN Monitoring Platform builds on Kubernetes, an open-source tool that automates the deployment, scaling, and management of containerized apps. Kubernetes handles tasks like scaling, load balancing, and ensuring apps run consistently across environments.

ObsParis TASKA Tools

The set of EXTRACT TASKA tools, developed by ObsParis, makes use of the EXTRACT infrastructure to optimise and orchestrate the data processing of the NenuFAR instrument, an SKA Pathfinder. It implements: (i) real-time detection of scientific events in the high-resolution raw data flow; (ii) a compute continuum workflow orchestrator for radio astronomy; and (iii) an advanced ML-based radio astronomy imager for dynamic radio sources.

Device Simulator

The simulator is the engine that drives the movement and actions of virtual people within the Urban Digital Twin. It replicates real-world behaviours to help us study and optimize urban scenarios. It is also helpful for simulating a large number of scenarios in order to train our multi-agent reinforcement learning (MARL) models.

Data Fusion

DataFusion merges data from multiple sources into one cohesive view. It combines, cleans, and integrates datasets, allowing us to make sense of diverse data streams for analysis and decision-making.
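To illustrate the merge-and-integrate step (a generic sketch, not DataFusion's actual implementation; the `id` key, device names, and fields are assumptions), records from several streams can be fused on a shared identifier:

```python
def fuse(*sources):
    """Merge records from several streams into one view keyed on `id`,
    filling in fields as they appear and skipping missing values."""
    fused = {}
    for source in sources:
        for record in source:
            entry = fused.setdefault(record["id"], {})
            for key, value in record.items():
                if value is not None:
                    entry[key] = value     # later sources refine earlier ones
    return list(fused.values())

# Two hypothetical streams: device positions and environmental readings.
gps = [{"id": "dev1", "lat": 48.85, "lon": 2.35}]
env = [{"id": "dev1", "pm25": 12.0}, {"id": "dev2", "pm25": 30.1}]

merged = fuse(gps, env)
print(merged)
# [{'id': 'dev1', 'lat': 48.85, 'lon': 2.35, 'pm25': 12.0}, {'id': 'dev2', 'pm25': 30.1}]
```

The result is a single cohesive record per entity, which is the "one view" over diverse streams that the description refers to.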

Digital Twin

The Urban Digital Twin (UDT), developed by LRI in the context of the PER use case, is a modular platform that creates a real-time and semantically enriched representation of people mobility in urban areas. It fuses multiple data sources—such as mobile device positions, environmental sensors, and satellite imagery—into a dynamic digital mirror of the city. The platform enables continuous monitoring, semantic reasoning, and visualization of city dynamics, supporting emergency response and long-term urban planning.