GraphLeak: A Realistic Dataset for Analyzing Leaks in Water Distribution Systems

Authors

  • Lucas Roberto Tomazini, São Paulo State University (Unesp), Institute of Science and Technology, Sorocaba, SP, Brazil.
  • Rodrigo Pita Rolle, São Paulo State University (Unesp), Institute of Science and Technology, Sorocaba, SP, Brazil.
  • Eduardo Paciência Godoy, São Paulo State University (Unesp), Institute of Science and Technology, Sorocaba, SP, Brazil.
  • Esther Luna Colombini, State University of Campinas (Unicamp), Institute of Computing (IC), Campinas, SP, Brazil.
  • Alexandre da Silva Simões, São Paulo State University (Unesp), Institute of Science and Technology, Sorocaba, SP, Brazil.

Keywords:

Dataset, Water leak detection, EPANET simulation, Water distribution systems, Leakage diagnosis.

Abstract

Water is an indispensable resource for sustaining life. With the global population rising, ensuring access to safe drinking water is a growing concern. Internet of Things (IoT) technologies offer a viable way to monitor water resources, since these “smart devices” can process sensor data and transmit it over diverse wireless networks. As sensor data becomes available, Machine Learning (ML) techniques have been widely adopted to extract insights and identify anomalies. To support the development of ML techniques for water management, we present GraphLeak, a dataset built from simulated Water Distribution Networks (WDNs) whose sensor nodes report pressure, flow, and volume measurements. The dataset also contains samples with leaks at different locations, so that algorithms can be trained and evaluated on both detecting the presence of a leak and locating it. As a demonstrative case study, we applied a Multilayer Perceptron (MLP) to the GraphLeak dataset and evaluated its results. GraphLeak is generated under realistic hydraulic parameters, which makes it a reliable source of data for different WDN conditions, including situations that would be infeasible to reproduce in a real network solely for data collection (for example, breaking a water pipe). The workflow for creating the dataset can also be reproduced for other WDN models, since the source files are available on GitHub: https://github.com/gasiepgodoy/WDN-Models-and-Data-Sets.
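
To make the detection and localization task concrete, the following minimal Python sketch shows one way a labeled sample from such a dataset might be represented. The field names, node identifiers, and numeric values are assumptions chosen for illustration and are not the actual GraphLeak schema. The detector is a simple pressure-drop baseline, not the MLP used in the case study.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Sample:
    """One hypothetical network snapshot (not the real GraphLeak schema)."""
    pressure: Dict[str, float]   # sensor node id -> pressure reading
    flow: Dict[str, float]       # sensor node id -> flow reading
    volume: Dict[str, float]     # sensor node id -> accumulated volume
    leak_node: Optional[str]     # label: leak location, or None if leak-free


def locate_leak(sample: Sample, baseline: Dict[str, float],
                threshold: float) -> Optional[str]:
    """Return the node with the largest pressure drop below its leak-free
    baseline, or None if no drop exceeds the threshold."""
    drops = {node: baseline[node] - p for node, p in sample.pressure.items()}
    node, drop = max(drops.items(), key=lambda item: item[1])
    return node if drop > threshold else None


# Illustrative values only; units and magnitudes are made up.
baseline = {"N1": 50.0, "N2": 48.0, "N3": 45.0}
healthy = Sample(pressure={"N1": 49.8, "N2": 48.1, "N3": 44.9},
                 flow={}, volume={}, leak_node=None)
leaky = Sample(pressure={"N1": 49.0, "N2": 44.5, "N3": 43.8},
               flow={}, volume={}, leak_node="N2")

for s in (healthy, leaky):
    predicted = locate_leak(s, baseline, threshold=2.0)
    print(f"true={s.leak_node} predicted={predicted}")
```

A learned model such as an MLP would replace `locate_leak` with a classifier trained on many labeled samples; the baseline here only illustrates the input and label structure of the task.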

Published

2024-10-18

Section

Articles