PROJECT RESULTS
SOFTWARE PLATFORM
The EXTRACT project aims to develop a software platform that seamlessly integrates different environments (edge, cloud, and HPC). Its main objective is to enable the distribution and execution of tasks (for example, machine learning, data analysis, or simulations) across a heterogeneous infrastructure, facilitating collaboration and data sharing for the two identified use cases:
- Use Case 1: Real-time monitoring and analysis for predictive maintenance
- Use Case 2: Automated process and resource management for operational efficiency
Below is the conceptual architecture of the platform, consisting of five distinct layers.
Putting It All Together
The EXTRACT platform is designed to seamlessly integrate each of these layers to form a unified software ecosystem capable of operating across edge, cloud, and HPC resources. In practice, an application begins by collecting or receiving data (be it logs from sensors on the edge, large datasets in the cloud, or HPC simulation outputs). This information is then organized, cataloged, and enriched in the Data Infrastructure layer, which provides consistent data access and moves it if needed.
Once data is properly staged, the Data Mining Framework takes over, allowing developers or domain experts to define analytical workflows. Here, frameworks such as COMPSs and Lithops help describe and parallelize tasks, with the end goal of exploiting the compute continuum infrastructure to its fullest.
Meanwhile, the Data-Driven Orchestration layer serves as the central “brain” of EXTRACT: once workflows have been defined, it decides how and where their tasks should run. Factors such as resource availability, latency requirements, or cost objectives guide this scheduling and deployment process. The orchestrator automatically provisions services or containers where needed and monitors their performance, so users gain a real-time view of their workflows and can adjust as conditions change.
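The kind of placement decision described above can be sketched as a simple scoring function over candidate sites. This is a hypothetical illustration, not EXTRACT's orchestrator API: the Site fields, the weights, and the place_task function are all invented for the example.

```python
# Hypothetical sketch of a data-driven placement decision: filter sites by
# hard constraints (capacity, latency), then score the rest on a weighted
# latency/cost trade-off. Names and weights are illustrative only.
from dataclasses import dataclass

@dataclass
class Site:
    name: str          # e.g. "edge-node-1", "cloud-eu", "hpc-cluster"
    latency_ms: float  # round-trip latency to the data source
    cost_per_hour: float
    free_cores: int

def place_task(sites, needed_cores, max_latency_ms, latency_weight=0.7):
    """Pick the site with the best weighted latency/cost score among
    those that satisfy the task's hard constraints."""
    candidates = [s for s in sites
                  if s.free_cores >= needed_cores
                  and s.latency_ms <= max_latency_ms]
    if not candidates:
        return None  # a real orchestrator would queue or relax constraints
    score = lambda s: (latency_weight * s.latency_ms
                       + (1 - latency_weight) * s.cost_per_hour)
    return min(candidates, key=score)

sites = [Site("edge-node-1", 5, 0.9, 4),
         Site("cloud-eu", 40, 0.3, 64),
         Site("hpc-cluster", 80, 1.5, 512)]
# A small, latency-sensitive task lands on the edge node.
print(place_task(sites, needed_cores=2, max_latency_ms=50).name)  # edge-node-1
```

The same scoring function could incorporate energy budgets or data-locality penalties; only the weighted sum would change, not the filter-then-rank structure.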
Finally, the Compute Continuum layer ensures that HPC clusters, cloud backends, and edge nodes are abstracted behind a single interoperability framework. As a result, applications can make use of different environments, for instance offloading certain tasks to edge nodes for real-time processing or using HPC for the heaviest calculations. Orthogonal to all of this, the Full Stack Security layer binds these operations together, guaranteeing that data handling, model training, and overall service continuity are secured against threats.
Although the aforementioned Use Cases of EXTRACT act as testbeds for validating and demonstrating the platform’s capabilities, the EXTRACT software platform is broadly applicable. Any scenario requiring distributed data processing, analytics pipelines, or heterogeneous resource orchestration can benefit from EXTRACT’s cohesive approach. By merging comprehensive data management, a robust data mining framework, sophisticated orchestration, an interoperable compute continuum, and strong security, the platform stands ready to address a wide variety of domains far beyond its initial industrial pilots.
KEY EXPLOITABLE RESULTS
BISETO Security Toolkit
The BISETO Security Toolkit is a consulting and training service designed to provide comprehensive security support for compute continuum projects. Leveraging the BNR Cybersecurity Toolkit for Compute Continuum Software, BISETO integrates BNR’s expertise in cybersecurity and compute continuum technologies (e.g., EXTRACT) to guide complex research and industrial deployments.
SkyStore Distributed Object Storage
SkyStore is a distributed object storage system for saving large datasets. It provides scalable, secure storage that is well suited to cloud or hybrid environments where large amounts of data must be stored and retrieved.
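To illustrate the object-storage model such a system exposes (bucket/key put-and-get semantics), here is a minimal in-memory sketch; it is not SkyStore's actual API, and the class and method names are invented for the example.

```python
# Toy in-memory object store illustrating bucket/key semantics:
# put() stores an immutable blob and returns a checksum, get() retrieves
# it, and list_keys() supports prefix listing as object stores commonly do.
import hashlib

class ObjectStore:
    def __init__(self):
        self._buckets = {}

    def put(self, bucket, key, data: bytes) -> str:
        self._buckets.setdefault(bucket, {})[key] = data
        return hashlib.md5(data).hexdigest()  # ETag-style checksum

    def get(self, bucket, key) -> bytes:
        return self._buckets[bucket][key]

    def list_keys(self, bucket, prefix=""):
        return sorted(k for k in self._buckets.get(bucket, {})
                      if k.startswith(prefix))

store = ObjectStore()
store.put("datasets", "run-01/part-000", b"sensor readings ...")
print(store.list_keys("datasets", prefix="run-01/"))  # ['run-01/part-000']
```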
COMPSs
COMPSs is a task-based programming model that eases the development and execution of parallel applications in heterogeneous environments. Within EXTRACT, it has been enhanced to execute efficiently across multiple Kubernetes clusters, abstracting the underlying compute continuum from users.
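The task-based style can be shown with a toy stand-in: functions are annotated as tasks, each call returns a future, and independent tasks may run in parallel. Real PyCOMPSs provides its own @task decorator and runtime; the decorator below only mimics the programming style using Python's standard thread pool.

```python
# Toy stand-in for a task-based programming model (NOT PyCOMPSs itself):
# annotated functions run asynchronously, and passing one task's future
# into another expresses a data dependency, as in a COMPSs task graph.
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)

def task(fn):
    """Submit each call asynchronously, returning a future."""
    def wrapper(*args):
        # Resolve any future arguments first (data dependencies).
        resolved = [a.result() if hasattr(a, "result") else a for a in args]
        return _pool.submit(fn, *resolved)
    return wrapper

@task
def preprocess(x):
    return x * 2

@task
def combine(a, b):
    return a + b

# preprocess(1) and preprocess(2) are independent and may run in
# parallel; combine depends on both of their results.
result = combine(preprocess(1), preprocess(2))
print(result.result())  # 6
```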
Lithops
Lithops is a serverless framework that splits large datasets and runs tasks in parallel across different cloud providers. It makes handling data-heavy workloads easier by automatically scaling resources based on need.
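The split-then-map pattern that Lithops automates can be sketched in plain Python, with a local thread pool standing in for the serverless backend; the split and word_count helpers are illustrative, not part of Lithops' API.

```python
# Sketch of the split-then-map pattern: partition a dataset into chunks,
# process the chunks in parallel, and aggregate the partial results.
# Lithops dispatches such chunks to serverless functions; here a local
# thread pool stands in for the cloud backend.
from concurrent.futures import ThreadPoolExecutor

def split(data, chunk_size):
    """Partition a dataset into fixed-size chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def word_count(chunk):
    """Per-chunk work: count the words in a list of text lines."""
    return sum(len(line.split()) for line in chunk)

lines = ["edge cloud hpc"] * 10 + ["data mining"] * 5
chunks = split(lines, chunk_size=4)
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(word_count, chunks))
print(sum(partials))  # 40
```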
Dataplug
Dataplug is a framework for analyzing unstructured data in the cloud that enables efficient parallel access through on-the-fly dynamic partitioning. It uses read-only, storage-aware indexing to avoid costly preprocessing, significantly reducing costs while improving scalability and performance for distributed scientific workloads in cloud environments.
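The idea behind on-the-fly partitioning can be sketched as follows: a lightweight, read-only index over the raw bytes is built once, and arbitrary partitions are then derived as byte ranges aligned to record boundaries, so each worker can fetch only its slice. This is a conceptual sketch, not Dataplug's real API; the helper names are invented for the example.

```python
# Conceptual sketch of storage-aware indexing: index record offsets once,
# then derive any number of partitions as byte ranges, with no rewriting
# of the dataset. Workers could fetch their range via HTTP range reads.
def build_line_index(blob: bytes):
    """Record the byte offset of each record (here: each line)."""
    offsets = [0]
    for i, b in enumerate(blob):
        if b == ord("\n") and i + 1 < len(blob):
            offsets.append(i + 1)
    offsets.append(len(blob))  # sentinel: end of data
    return offsets

def partition(offsets, n_parts):
    """Turn the index into byte ranges aligned to record boundaries."""
    records = len(offsets) - 1
    per_part = max(1, records // n_parts)
    ranges = []
    for start in range(0, records, per_part):
        end = min(start + per_part, records)
        ranges.append((offsets[start], offsets[end]))
    return ranges

blob = b"rec1\nrec2\nrec3\nrec4\n"
idx = build_line_index(blob)
for lo, hi in partition(idx, n_parts=2):
    print(blob[lo:hi])  # each worker reads only its own byte range
```

Note that repartitioning for a different degree of parallelism only recomputes the ranges; the index and the underlying data never change, which is what avoids the costly preprocessing step.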
Nuvla Data Catalogue
Nuvla.io is a cloud-native platform that allows remote management of edge devices and applications with the help of NuvlaEdge, an agent software running on edge devices. Nuvla not only simplifies the deployment of containerized applications across edge locations; it also features a metadata catalogue that ensures data collected at the edge is categorised, mapped, and easily retrievable by users.
A lightweight, extensible software component that enables intelligent and energy-aware observability across heterogeneous compute continuum infrastructures.
ObsParis TASKA Tools
The set of EXTRACT TASKA tools, developed by ObsParis, makes use of the EXTRACT infrastructure to optimise and orchestrate the data processing of the NenuFAR instrument, an SKA Pathfinder. It implements: (i) real-time detection of scientific events in the high-resolution raw data flow; (ii) a compute continuum workflow orchestrator for radio astronomy; and (iii) an advanced ML-based radio astronomy imager for dynamic radio sources.
Device Simulator
The simulator is the engine that drives the movement and actions of virtual people within the Urban Digital Twin. It replicates real-world behaviours to help us study and optimize urban scenarios, and it can generate a large number of scenarios for training our multi-agent reinforcement learning (MARL) models.
Data Fusion
DataFusion merges data from multiple sources into one cohesive view. It combines, cleans, and integrates datasets, allowing us to make sense of diverse data streams for analysis and decision-making.
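The fusion step can be sketched as aligning records from several streams on a shared key into one consolidated view. The field names and the fuse function below are illustrative, not the DataFusion component's actual schema or API.

```python
# Sketch of record-level fusion: merge dict records from multiple source
# streams, keyed on a shared timestamp, into one consolidated timeline.
from collections import defaultdict

def fuse(*streams):
    """Merge records from multiple sources, aligned on the 'ts' key."""
    merged = defaultdict(dict)
    for stream in streams:
        for record in stream:
            merged[record["ts"]].update(record)
    return [merged[ts] for ts in sorted(merged)]

# Two hypothetical streams: device positions and air-quality readings.
gps = [{"ts": 1, "lat": 48.85, "lon": 2.35}]
air = [{"ts": 1, "pm25": 12.0}, {"ts": 2, "pm25": 14.5}]
print(fuse(gps, air))
```

A production fusion step would also clean and validate each stream before merging (unit conversion, outlier removal, deduplication); the alignment-on-a-key structure stays the same.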
Digital Twin
The Urban Digital Twin (UDT), developed by LRI in the context of the PER use case, is a modular platform that creates a real-time and semantically enriched representation of people mobility in urban areas. It fuses multiple data sources—such as mobile device positions, environmental sensors, and satellite imagery—into a dynamic digital mirror of the city. The platform enables continuous monitoring, semantic reasoning, and visualization of city dynamics, supporting emergency response and long-term urban planning.