Topology-Aware Task and Data Placement for Distributed Scientific Workflows
Key: KWSS26
Author: Sami Kharma, Tobias Wies, Björn Scheuermann, Florian Schintke
Date: May 2026
Kind: In proceedings
Abstract: Modern scientific data analysis workflows frequently run across diverse, distributed infrastructure. In such environments, the placement of tasks and the data transfers between them can significantly affect overall performance. While typical scheduling approaches focus on computational resource allocation, the impact of network-constrained data transfers is often ignored or simplified. In this work-in-progress paper, we propose an approach that explicitly models data transfers while considering the network topology. Integrating topology information with data size estimates and node capability insights, we develop a cost function that estimates the makespan of a specified task and data placement. We formulate the search for a good assignment as a minimization problem and outline a genetic-algorithm-based search strategy to approximate the best placement. The proposed framework aims to enable topology-aware scheduling decisions, or to refine schedules generated elsewhere, for distributed scientific workflows.
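
Illustration: the abstract does not give the cost function or the search procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy four-task workflow, three nodes, pairwise bandwidths standing in for the topology, one task at a time per node, and free transfers within a node. All names and numbers (WORK, EDGES, SPEED, BANDWIDTH, makespan, genetic_search) are hypothetical. The search is seeded with an existing placement and keeps the best individual each generation, so it can only match or improve on that placement.

```python
import random

# Hypothetical toy workflow: task -> compute work (arbitrary units).
WORK = {"a": 4.0, "b": 6.0, "c": 6.0, "d": 2.0}
# Data dependencies: (producer, consumer) -> estimated data size.
EDGES = {("a", "b"): 10.0, ("a", "c"): 10.0, ("b", "d"): 5.0, ("c", "d"): 5.0}
ORDER = ["a", "b", "c", "d"]  # a topological order of the tasks
SPEED = [1.0, 2.0, 1.0]  # assumed node capability: work units per time unit
# Assumed bandwidth per node pair, standing in for the network topology.
BANDWIDTH = {(0, 1): 10.0, (0, 2): 1.0, (1, 2): 5.0}


def transfer_time(size, src, dst):
    """Time to move `size` units of data; same-node transfers are free."""
    if src == dst:
        return 0.0
    return size / BANDWIDTH[(min(src, dst), max(src, dst))]


def makespan(placement):
    """Estimate the makespan of a placement (one node index per task in ORDER)."""
    node_of = dict(zip(ORDER, placement))
    node_free = [0.0] * len(SPEED)  # time at which each node becomes idle
    finish = {}
    for task in ORDER:
        node = node_of[task]
        # A task is ready once every input has been produced and transferred.
        ready = max(
            (finish[p] + transfer_time(size, node_of[p], node)
             for (p, c), size in EDGES.items() if c == task),
            default=0.0,
        )
        start = max(ready, node_free[node])
        finish[task] = start + WORK[task] / SPEED[node]
        node_free[node] = finish[task]
    return max(finish.values())


def genetic_search(seed_placement, generations=50, pop_size=20,
                   mutation_rate=0.2, rng_seed=0):
    """Approximate a low-makespan placement, starting from an existing one."""
    rng = random.Random(rng_seed)
    n_nodes = len(SPEED)
    population = [list(seed_placement)] + [
        [rng.randrange(n_nodes) for _ in ORDER] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=makespan)
        next_gen = [population[0][:]]  # elitism: carry over the best placement
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(population, 3), key=makespan)
            p2 = min(rng.sample(population, 3), key=makespan)
            # One-point crossover followed by per-gene mutation.
            cut = rng.randrange(1, len(ORDER))
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(n_nodes) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return min(population, key=makespan)
```

For example, genetic_search([0, 0, 0, 0]) starts from a schedule that puts every task on node 0 and returns a placement whose estimated makespan is no worse than that starting point.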

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.