PROJECT RESULTS

SOFTWARE PLATFORM

The EXTRACT project aims at developing a software platform that seamlessly integrates different environments (edge, cloud, and HPC). Its main objective is to enable the distribution and execution of tasks (for example, machine learning, data analysis, and simulations) across a heterogeneous infrastructure, facilitating collaboration and data sharing for the project's two identified use cases, TASKA and PER.

Below is the conceptual architecture of the platform, consisting of five distinct layers.

Putting It All Together

The EXTRACT platform is designed to seamlessly integrate each of these layers to form a unified software ecosystem capable of operating across edge, cloud, and HPC resources. In practice, an application begins by collecting or receiving data (be it logs from sensors on the edge, large datasets in the cloud, or HPC simulation outputs). This information is then organized, cataloged, and enriched in the Data Infrastructure layer, which provides consistent data access and moves data between environments when needed.

Once data is properly staged, the Data Mining Framework takes over, allowing developers or domain experts to define analytical workflows. Here, frameworks such as COMPSs and Lithops help describe and parallelize tasks, with the end goal of exploiting the compute continuum infrastructure to the fullest.
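To make the scatter-map-gather pattern behind such workflows concrete, here is a minimal, self-contained Python sketch using the standard library. It only illustrates the pattern; COMPSs and Lithops apply the same idea at continuum scale with their own task annotations and executors, and the function and dataset below are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative analysis step; in COMPSs or Lithops this would be a
# decorated task dispatched to edge, cloud, or HPC workers.
def analyze_chunk(chunk):
    return sum(x * x for x in chunk)

def run_workflow(dataset, n_chunks=4):
    # Scatter: split the dataset into independent partitions.
    size = max(1, len(dataset) // n_chunks)
    chunks = [dataset[i:i + size] for i in range(0, len(dataset), size)]
    # Map: execute each partition in parallel (threads here for simplicity).
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(analyze_chunk, chunks))
    # Gather: reduce partial results into the final answer.
    return sum(partials)

print(run_workflow(list(range(10))))  # 285
```

The value of frameworks like COMPSs and Lithops is that the developer writes only the task logic, while partitioning, scheduling, and data movement across the continuum are handled by the runtime.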

Meanwhile, the Data-Driven Orchestration serves as the central “brain” of EXTRACT, deciding how and where these tasks should run once the workflows have been defined. Factors like resource availability, latency needs, or cost objectives can guide this scheduling and deployment process. The orchestrator automatically provisions services or containers where needed and monitors their performance, so users gain a real-time view of their workflows and can adjust as conditions change.
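A toy placement policy can illustrate the kind of decision the orchestrator makes. The tier names, capacities, and costs below are invented for illustration and do not reflect EXTRACT's actual scheduling logic:

```python
# Candidate execution tiers with illustrative (made-up) characteristics.
TIERS = {
    "edge":  {"latency_ms": 5,   "cost_per_task": 0.02, "max_flops": 1e9},
    "cloud": {"latency_ms": 60,  "cost_per_task": 0.05, "max_flops": 1e12},
    "hpc":   {"latency_ms": 200, "cost_per_task": 0.10, "max_flops": 1e15},
}

def place_task(required_flops, max_latency_ms):
    """Pick the cheapest tier that satisfies the task's compute and
    latency requirements (a deliberately simplified placement policy)."""
    feasible = [
        (spec["cost_per_task"], name)
        for name, spec in TIERS.items()
        if spec["max_flops"] >= required_flops
        and spec["latency_ms"] <= max_latency_ms
    ]
    if not feasible:
        raise ValueError("no tier satisfies the task constraints")
    return min(feasible)[1]

print(place_task(required_flops=1e8, max_latency_ms=10))    # edge
print(place_task(required_flops=1e14, max_latency_ms=500))  # hpc
```

A real orchestrator re-evaluates such decisions continuously as monitoring data arrives, which is why EXTRACT couples scheduling with live performance observation.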

Finally, the Compute Continuum layer ensures that HPC clusters, cloud backends, and edge nodes are abstracted behind a single interoperability framework. As a result, applications can make use of different environments, for instance offloading certain tasks to edge nodes for real-time processing or using HPC for the heaviest calculations. Orthogonal to all of this, the Full Stack Security layer binds these operations together, guaranteeing that data handling, model training, and overall service continuity are secured against threats.

Although the aforementioned Use Cases of EXTRACT act as testbeds for validating and demonstrating the platform’s capabilities, the EXTRACT software platform is broadly applicable. Any scenario requiring distributed data processing, analytics pipelines, or heterogeneous resource orchestration can benefit from EXTRACT’s cohesive approach. By merging comprehensive data management, a robust data mining framework, sophisticated orchestration, an interoperable compute continuum, and strong security, the platform stands ready to address a wide variety of domains far beyond its initial industrial pilots.

 

KEY EXPLOITABLE RESULTS

DESCRIPTION

BISETO Security Toolkit for Compute Continuum

The BISETO Security Toolkit is a consulting and training service designed to provide comprehensive security support for compute continuum projects. Leveraging the BNR Cybersecurity Toolkit for Compute Continuum Software, BISETO integrates BNR’s expertise in cybersecurity and compute continuum technologies (e.g., EXTRACT) to guide complex research and industrial deployments.

SkyStore Distributed Object Storage

SkyStore is a distributed object storage system used for saving large datasets. It provides scalable, secure storage that is well suited to cloud or hybrid environments where large amounts of data must be stored and retrieved.

COMPSs (COMP Superscalar)

COMPSs is a task-based programming model that eases the development and execution of parallel applications in heterogeneous environments. Within EXTRACT, it has been enhanced to execute efficiently across multiple Kubernetes clusters, abstracting the underlying compute continuum from users.

Lithops

Lithops is a serverless framework that splits large datasets and runs tasks in parallel across different cloud providers. It makes handling data-heavy workloads easier by automatically scaling resources based on need.

Dataplug

Dataplug is a framework for analyzing unstructured data in the cloud that enables efficient parallel access through on-the-fly dynamic partitioning. It uses read-only, storage-aware indexing to avoid costly preprocessing, significantly reducing costs while improving scalability and performance for distributed scientific workloads in cloud environments.
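The core idea of read-only, storage-aware indexing can be sketched in a few lines of plain Python. This is only an illustration of the concept, not Dataplug's actual API; the record format and function names are invented:

```python
# Sketch of on-the-fly partitioning over an indexed object, in the
# spirit of Dataplug (names and layout are illustrative, not its API).
def build_index(blob, delimiter=b"\n"):
    """One-time, read-only scan recording record boundaries as byte
    offsets; the raw object itself is never rewritten."""
    offsets, start = [], 0
    while True:
        end = blob.find(delimiter, start)
        if end == -1:
            break
        offsets.append((start, end))
        start = end + 1
    if start < len(blob):
        offsets.append((start, len(blob)))
    return offsets

def partitions(index, n_workers):
    """Split the index into chunks of byte ranges so each worker can
    issue an independent ranged read (e.g. an HTTP range GET)."""
    size = -(-len(index) // n_workers)  # ceiling division
    return [index[i:i + size] for i in range(0, len(index), size)]

blob = b"rec1\nrec2\nrec3\nrec4"
idx = build_index(blob)
parts = partitions(idx, 2)
# Each worker reads only its own byte ranges from the shared object.
worker0 = [blob[s:e] for s, e in parts[0]]
print(worker0)  # [b'rec1', b'rec2']
```

Because the index is built once and never mutates the source object, workers can repartition the same dataset for different degrees of parallelism without any preprocessing or data duplication.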

Nuvla Data Catalogue

Nuvla.io is a cloud-native platform that allows remote management of edge devices and applications with the help of NuvlaEdge, an agent software running on edge devices. Nuvla not only simplifies the deployment of containerized applications across edge locations, it also features a metadata catalogue to ensure that data collected at the edge is categorised, mapped, and easily retrievable by users.

IKERLAN Monitoring Platform

A lightweight, extensible software component that enables intelligent, energy-aware observability across heterogeneous compute continuum infrastructures.

ObsParis TASKA Tools

The set of EXTRACT TASKA tools, developed by ObsParis, makes use of the EXTRACT infrastructure for optimising and orchestrating the data processing of the NenuFAR instrument, an SKA Pathfinder. It implements: (i) real-time detection of scientific events in the high-resolution raw data flow; (ii) a compute continuum workflow orchestrator for radio astronomy; and (iii) an advanced ML-based radio astronomy imager for dynamic radio sources.

Device Simulator

The simulator is the engine that drives the movement and actions of virtual people within the Urban Digital Twin. It replicates real-world behaviours to help study and optimize urban scenarios. It also makes it possible to simulate a large number of scenarios in order to train the project's multi-agent reinforcement learning (MARL) models.

Data Fusion

DataFusion merges data from multiple sources into one cohesive view. It combines, cleans, and integrates datasets, allowing users to make sense of diverse data streams for analysis and decision-making.
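The combine-clean-integrate pattern can be illustrated with a short Python sketch. The sources, keys, and fields below are invented for illustration and are not DataFusion's actual schema or API:

```python
# Toy illustration of the combine-clean-integrate pattern that a data
# fusion component performs; sources and field names are made up.
def fuse(sources, key="sensor_id"):
    """Merge records from several streams into one view per key,
    dropping records that fail a basic validity check."""
    view = {}
    for source in sources:
        for record in source:
            if record.get(key) is None:      # clean: discard unkeyed rows
                continue
            merged = view.setdefault(record[key], {})
            merged.update(record)            # integrate: later sources win
    return view

positions = [{"sensor_id": "s1", "lat": 48.85, "lon": 2.35}]
readings = [{"sensor_id": "s1", "pm25": 12.0}, {"lat": 0.0}]  # 2nd row unkeyed
fused = fuse([positions, readings])
print(fused["s1"]["pm25"])  # 12.0
```

A production fusion component adds schema alignment, unit conversion, and conflict resolution on top of this basic merge, but the per-key consolidation shown here is the heart of producing "one cohesive view".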

Digital Twin

The Urban Digital Twin (UDT), developed by LRI in the context of the PER use case, is a modular platform that creates a real-time and semantically enriched representation of people mobility in urban areas. It fuses multiple data sources—such as mobile device positions, environmental sensors, and satellite imagery—into a dynamic digital mirror of the city. The platform enables continuous monitoring, semantic reasoning, and visualization of city dynamics, supporting emergency response and long-term urban planning.