Learned Parallel Stream Processing using Zero-Shot Cost Models

Supervisor: Pratyush Agnihotri
KOM-ID: KOM-M-0720
Link to the announcement

Motivation

In Distributed Stream Processing Systems (DSPS), queries are usually long-running. They typically deal with very high workloads of millions or even billions of events per second. In such scenarios, parallelism plays an important role in making a DSPS scalable. One of the core decisions is to determine the right degree of parallelism to process a high load of events and queries while meeting high-throughput and low-latency requirements. In this context, parallelization can be controlled by defining the degree of parallelism for each operator in the operator graph.
The main questions we want to investigate are:
 

  • What is the right degree of parallelism for query processing? (What is the performance of parallel stream processing in terms of end-to-end latency? To what extent can the model be used for seen and unseen behaviour? What are the limitations of the models? How far can they be extended?)
  • How can the processing of DSPS operators be parallelized?

 Prerequisites

 There are no hard guidelines for this topic. However, it is preferable if you have:

  • Good programming skills in Java and Python
  • Good understanding of the concepts of machine learning
  • Understanding of communication networks and big data processing engines, e.g., Apache Flink and Kafka

 Reference Literature

  • Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows.
  • A Comprehensive Survey on Parallelization and Elasticity in Stream Processing
  • Resource Management and Scheduling in Distributed Stream Processing Systems: A Taxonomy, Review, and Future Directions
  • One Model to Rule them All: Towards Zero-Shot Learning for Databases
  • Apache Flink documentation: https://nightlies.apache.org/flink/flink-docs-master/docs/learn-flink/overview/

Contact:

I am looking for motivated students interested in working on cutting-edge technologies in the area of IoT applications. If you are interested in writing a Bachelor's or Master's thesis, or in gaining experience in the MAKI project as a HiWi, please feel free to contact me. You can email me the following information:

  • Your CV or a short text about your courses and prior experience in this area,
  • A short note on your motivation for this topic, and
  • Your transcript or TUCaN grade list.