Leonhard Balduf, M.Sc.

Leonhard Balduf received his B.Sc. in Munich in 2016. He wrote his thesis, on distributing computational tasks for stochastic optical reconstruction microscopy (STORM), while visiting the Bewersdorf Lab at Yale University.

He completed his M.Sc. in computer science at HU Berlin in 2021, having spent multiple semesters abroad in the USA, Russia, and South Korea.

Leonhard Balduf is now a researcher at the Technical University of Darmstadt and the Weizenbaum Institute, in a research group focused on trust in distributed environments.

Research Interests

His research interests focus on peer-to-peer and decentralized systems, as well as efficient solutions to a broad range of problems in computer science.

In 2021 and 2022, his research focused mainly on the InterPlanetary File System (IPFS).

Open Theses

Supervisor: Leonhard Balduf
KOM-ID: KOM-M-0776
Link to the announcement


The InterPlanetary File System (IPFS) is a P2P data storage and retrieval network. Structurally, it exhibits features of both structured and unstructured networks: a Kademlia-based DHT stores lists of providing peers for each data item, while requests for data are first flooded locally in an unstructured fashion and, only on failure, resolved more precisely through the DHT.
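The two-phase lookup described above can be sketched as follows. This is a minimal illustration of the idea, not the actual IPFS implementation; the `Peer` and `Dht` classes and their methods are hypothetical stand-ins for Bitswap's broadcast and the DHT's provider records.

```python
# Hypothetical sketch of IPFS's two-phase content lookup. The classes
# below are illustrative stubs, not the real go-ipfs/Kubo APIs.

class Peer:
    """A connected peer holding a set of content blocks (by CID)."""
    def __init__(self, name, blocks):
        self.name = name
        self.blocks = set(blocks)

    def has(self, cid):
        return cid in self.blocks


class Dht:
    """Stand-in for the Kademlia DHT's provider records: CID -> providers."""
    def __init__(self, providers):
        self._providers = providers

    def get_providers(self, cid):
        return self._providers.get(cid, [])


def lookup(cid, connected_peers, dht):
    # Phase 1: unstructured local flood (Bitswap-style broadcast) to
    # all directly connected peers.
    for peer in connected_peers:
        if peer.has(cid):
            return peer
    # Phase 2: on failure, fall back to the structured network and ask
    # the DHT for provider records.
    providers = dht.get_providers(cid)
    return providers[0] if providers else None
```

In the real network, phase 1 succeeds surprisingly often for popular content, which is one reason measuring actual request traffic is interesting.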

We have multiple years of recorded requests from the IPFS network, including information about the clients and geolocation of the request origins. This is a large dataset that needs to be analyzed. Furthermore, data collection is an ongoing process and can be augmented, if necessary.


Goal

Derive useful information from the dataset.


Tasks

  1. Understand the dataset and how we capture it. Understand its limitations.
  2. Come up with interesting queries about the dataset.
  3. Engineer solutions that incrementally (i.e., batch-processing new data) derive answers to these queries from the dataset.
  4. Implement said solutions to automatically(!) derive insights into the dataset and visualize them.


Requirements

  • Proficiency with Linux and familiarity with the command line. Due to the volume of the data, analyses will run almost exclusively on remote servers.
  • Motivation for the topic. Don't pick this topic if all you need is a thesis project.
  • Knowledge of common data formats such as JSON and CSV.
  • Knowledge of Python (and ideally R) for data processing and visualization. Final visualizations should be produced in R, but it is possible to learn enough R in half a year to make this work.


Literature

  • https://arxiv.org/abs/2104.09202
  • https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9142764


Show all publications