Characterization of load curves in a real distribution system based on K-MEANS algorithm with time-series data

  • Hernan R. Ullón Universidade Estadual de Campinas
  • Luís F. Ugarte Universidade Estadual de Campinas
  • Eduardo Lacusta Jr. RGE Sul Distribuidora de Energia S.A.
  • Madson C. de Almeida Universidade Estadual de Campinas
Keywords: Data mining, Distance measurements, Distribution system, K-Means algorithm, Time-series data


The modernization of conventional distribution systems into smart grids brings new challenges in handling extremely large databases, commonly referred to as Big Data. The accuracy and volume of data have grown significantly with the introduction of Advanced Metering Infrastructure (AMI). The resulting flood of data feeds many power system applications and imposes a heavy computational burden, as is the case when working with a large database of load curves. Because the demand for active and reactive power in distribution systems follows patterns that repeat annually, load clustering methodologies are a natural way to condense these data. Based on historical load data, this paper presents a comprehensive approach that applies data mining, using the K-Means clustering method on time-series data, to characterize real load curves. In addition, a comparative analysis considering three different distance measurements is presented. This data mining process is a promising method for pattern recognition: it reduces large databases to a few characteristic curves, which lowers the computational burden in various power system applications. The clustering method is tested on a real database of distribution transformers at UNICAMP.
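
As a rough illustration of the idea described in the abstract, the following minimal sketch clusters daily load curves with K-Means using an interchangeable distance function. It is not the authors' implementation. The abstract does not name the three distance measurements, so the Euclidean, Manhattan, and Chebyshev distances used here are assumptions, as are the function names, the synthetic 24-point load profiles, and the choice of two clusters.

```python
import numpy as np

# Candidate distance measures (assumed; the abstract does not name the three it compares).
def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2, axis=-1))

def manhattan(a, b):
    return np.sum(np.abs(a - b), axis=-1)

def chebyshev(a, b):
    return np.max(np.abs(a - b), axis=-1)

def kmeans(curves, k, distance=euclidean, n_iter=100, seed=0):
    """Cluster load curves (rows of `curves`) into k groups.

    Centroids are updated with the arithmetic mean. That update is exact
    only for the squared Euclidean distance; for the other measures it is
    a common heuristic, not an optimal update.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct curves drawn at random.
    centroids = curves[rng.choice(len(curves), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Distance from every curve to every centroid, shape (n_curves, k).
        d = distance(curves[:, None, :], centroids[None, :, :])
        labels = np.argmin(d, axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = curves[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Synthetic hourly load profiles (hypothetical data, in per unit):
# an evening-peak shape and a midday-peak shape, each with small noise.
hours = np.arange(24)
evening_peak = 0.4 + 0.6 * np.exp(-((hours - 20) ** 2) / 8.0)
midday_peak = 0.3 + 0.7 * np.exp(-((hours - 13) ** 2) / 18.0)
noise = np.random.default_rng(42)
curves = np.vstack([
    evening_peak + 0.03 * noise.standard_normal((20, 24)),
    midday_peak + 0.03 * noise.standard_normal((20, 24)),
])

# Run the same clustering with each distance and report the cluster sizes.
results = {}
for name, dist in [("euclidean", euclidean), ("manhattan", manhattan), ("chebyshev", chebyshev)]:
    labels, centroids = kmeans(curves, k=2, distance=dist)
    results[name] = (labels, centroids)
    print(name, np.bincount(labels, minlength=2))
```

Each returned centroid is one characteristic curve, so a database of 40 daily profiles is summarized here by 2 representative profiles. On real measurements the number of clusters and the distance measure would need to be chosen by validation, since curves that differ mainly by a time shift can be grouped differently under each distance.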