
The EU-funded EXTRACT project has reached its conclusion, delivering a fully integrated distributed data-mining software platform capable of handling extreme data across edge, cloud and high-performance computing (HPC) environments. Over three years, the consortium coordinated by the Barcelona Supercomputing Center (BSC) has transformed an ambitious vision into a validated platform, demonstrated through two demanding real-world use cases in crisis management and astrophysics.
From Vision to Platform
When EXTRACT launched in January 2023, the challenge was clear: existing technologies could cope with only some data characteristics, each handled independently and in isolation. Three years on, the project has delivered a complete edge-cloud-HPC continuum platform that addresses the full data lifecycle, from ingestion and processing to knowledge extraction and model serving.

The platform integrates a rich stack of open-source and purpose-built components. A multi-layer data infrastructure handles bulk object storage via SkyStore (an S3-compatible multi-cluster solution), time-series data, and metadata management through the Nuvla Data Catalog. On top of this, the EXTRACT data-mining workflow framework brings together COMPSs for task-based parallel orchestration, Lithops for serverless parallel data processing, and KServe for scalable machine learning model serving, all running on multiple Kubernetes clusters.
Crucially, the platform does not simply facilitate the development and deployment of complex workflows across the continuum; it also exploits the performance capabilities of the different resources.
Orchestration Across the Continuum
A major achievement of the project has been the development of orchestration mechanisms across the continuum. On one hand, COMPSs, as the coarse-grained application orchestrator, dynamically schedules tasks to the different resources that form the compute continuum, respecting data locality to reduce unnecessary data movement and adapting in real time to resource availability. On the other hand, Lithops, combined with Dataplug and Flexecutor, complements this with fine-grained parallel exploitation, automatic reduction of data transfers and smart resource provisioning.
The multi-cluster architecture allows workflows to span edge, cloud and HPC resources seamlessly, selecting the most appropriate computing environment for each task based on latency, system health, and the extreme data characteristics specified by the user. Underpinning this is a comprehensive distributed monitoring framework built on Prometheus, an open-source systems monitoring and alerting toolkit capable of monitoring a wide range of performance metrics.
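To make the idea of locality-aware scheduling concrete, here is a minimal sketch of the general technique, not the COMPSs runtime or its API. The `Node` class, the `schedule` function and the dataset names are all hypothetical. Each task goes to the available node that already stores most of its inputs, and when that node is full the task falls back to whatever capacity remains, which is a simple stand-in for "adapting to resource availability".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical compute resource somewhere on the edge-cloud-HPC continuum."""
    name: str
    tier: str                 # "edge", "cloud" or "hpc"
    free_slots: int           # task slots currently available
    datasets: set[str] = field(default_factory=set)  # data already stored on this node


def schedule(task_inputs: list[str], nodes: list[Node]) -> Node | None:
    """Place one task on the available node that already holds the most of its inputs.

    Ties on locality are broken by preferring the node with more free slots.
    Returns None when no node has capacity, i.e. the task must wait.
    """
    wanted = set(task_inputs)
    available = [n for n in nodes if n.free_slots > 0]
    if not available:
        return None
    best = max(available, key=lambda n: (len(n.datasets & wanted), n.free_slots))
    best.free_slots -= 1  # reserve a slot on the chosen node
    return best


nodes = [
    Node("edge-1", "edge", free_slots=1, datasets={"sensor_feed"}),
    Node("hpc-1", "hpc", free_slots=4, datasets={"satellite_tiles", "city_graph"}),
    Node("cloud-1", "cloud", free_slots=8),
]
print(schedule(["satellite_tiles", "city_graph"], nodes).name)  # hpc-1: both inputs are local
print(schedule(["sensor_feed"], nodes).name)  # edge-1: its input is local
print(schedule(["sensor_feed"], nodes).name)  # cloud-1: edge-1 is full, so fall back
```

A production orchestrator would also weigh transfer cost, task dependencies and live monitoring data; the greedy rule above only illustrates why keeping computation next to the data avoids unnecessary movement.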
Security and Privacy by Design
The EXTRACT platform treats cybersecurity as a foundational requirement. The project has delivered protections at every data state: encryption solutions for data at rest, security-audited data-in-transit flows, and hash-based integrity mechanisms for data in use.
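The article does not describe how the hash-based integrity mechanisms are implemented. The following is a minimal sketch of the general technique using SHA-256 from Python's standard library; the function names are hypothetical. A digest is recorded when the data is produced and recomputed before the data is used, and any mismatch reveals modification. Note that a plain hash only helps if the reference digest itself is stored securely; a keyed construction such as HMAC is the usual choice when an attacker could replace both.

```python
import hashlib
import hmac
from collections.abc import Iterable


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Hash a stream of chunks incrementally, so large objects never need to fit in memory."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def is_intact(chunks: Iterable[bytes], expected_hex: str) -> bool:
    """Recompute the digest and compare it in constant time with the stored reference."""
    return hmac.compare_digest(sha256_hex(chunks), expected_hex)


reference = sha256_hex([b"ab", b"c"])  # digest recorded when the data was produced
print(reference == hashlib.sha256(b"abc").hexdigest())  # True: chunking does not change the digest
print(is_intact([b"abc"], reference))  # True
print(is_intact([b"abd"], reference))  # False: one byte changed
```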
At the machine learning level, EXTRACT evaluated Multi-party Computation and Homomorphic Encryption for deep learning inference, enabling model execution in encrypted form, so that the data owner and model owner can remain separate entities operating in isolated environments. This capability was tested both locally and on edge devices.
Two Use Cases, Proven Results
Personalized Evacuation Routing in Venice. The PER (Personalized Evacuation Routing) use case has demonstrated how EXTRACT technology can support real-time crisis management in a complex urban environment. Data Fusion technology merges data from multiple sources, and a Reinforcement Learning model was trained to generate personalized evacuation routes for citizens navigating Venice. The system integrates a semantic Urban Digital Twin (UDT), fed by Copernicus and Galileo satellite data, IoT sensors and 5G signals, which uses ontologies to maintain rich, queryable representations of urban conditions, while a Device Simulator drives the movement and actions of virtual people in the UDT to optimize urban scenarios. Differential privacy techniques protect citizen data throughout.
Transient Astrophysics with a Square Kilometre Array Pathfinder (TASKA). The astronomy use case tackled the extreme data volumes produced by the NenuFAR radio telescope. Through Lithops-powered parallel ingestion and the Dataplug dynamic partitioning tool, EXTRACT workflows achieve the targeted 100× reduction in raw data volume, producing high-quality, openly accessible datasets for the astronomy community via the EOSC portal.
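The article does not say which differential privacy mechanism the PER use case applies. As an illustration only, here is a minimal sketch of the textbook Laplace mechanism for a count query, such as a hypothetical count of people detected in one street segment. Noise with scale sensitivity/epsilon is added to the true count; smaller epsilon means stronger privacy and noisier answers. The function names and the example count are assumptions.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0  # adding or removing one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
# Hypothetical query: people detected in one street segment (true value 120).
print(round(private_count(120, epsilon=0.5, rng=rng), 1))
```

Individual releases are noisy, but the noise has zero mean, so aggregate statistics over many cells or time steps remain useful while no single person's presence can be confidently inferred.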
Looking Ahead
Eduardo Quiñones, EXTRACT coordinator and established researcher at the Barcelona Supercomputing Center, reflected on the project’s journey: “EXTRACT has shown that it is possible to build a platform that is simultaneously open, secure and capable of handling truly extreme data, across the full spectrum from constrained edge devices to large-scale HPC clusters. The technologies we have developed and validated do not just serve the specific challenges of Venice or radio-astronomy, they form a reusable foundation that European industry and academia can build upon to continue addressing the data-intensive challenges ahead.”
About EXTRACT
The EXTRACT project (A distributed data-mining software platform for extreme data across the compute continuum) was funded under Horizon Europe as a Research and Innovation Action (grant agreement No. 101093110). The project ran from 1 January 2023 to 31 March 2026. The consortium was coordinated by the Barcelona Supercomputing Center (BSC) and included: Ikerlan (Spain), Universitat Rovira i Virgili (Spain), Observatoire de Paris (France), Centre National de la Recherche Scientifique (France), Université Paris Cité (France), Logos Ricerca e Innovazione (Italy), City of Venice (Italy), Binare (Finland), Mathema srl (Italy), IBM Israel (Israel), and SixSq (Switzerland).
