1 Introduction

Fresh water is one of humanity’s most important natural resources, indispensable not only for consumption but also for industrial and agricultural production [28]. Unfortunately, all around the world, several urban Water Distribution Networks (WDNs) are struggling to meet people’s demand [15, 26], and prospects foresee a worsening scenario caused by urban population growth, climate change, and aging infrastructure (UNESCO [42]). In Brazil, for example, approximately 40% of the distributed treated water is lost, on average, on its way to consumers (SNIS [40]), and around 60% of that loss is essentially due to leakages in the network (GO-Associates/Trata Brasil Institute [13]).

Globally, it is estimated that 346 million cubic meters per day are lost, which represents a financial loss of about USD 39 billion per year. Hence, a possible way of mitigating the lack of water is to make the WDNs more efficient through investments in leak reduction, which may have a payback time of only 5 years [11].

To inspect possible leaks in an operating WDN, techniques that require minimal interference or downtime are preferred. Some methods currently available meet this requirement: non-vibroacoustic techniques, such as ground-penetrating radar (GPR) [17], satellite-based spectral analysis, and infrared thermography [38]; and the most commonly used, vibroacoustic techniques, such as listening sticks, electronic listening devices, noise correlators, and noise loggers [18].

In the water industry, the vibroacoustic techniques are the most promising ones [37]: they have higher sensitivity, better location accuracy, and a lower false alarm rate, and require less operator time than non-vibroacoustic techniques [24].

Electronic listening devices are very popular due to their relatively low cost and the variety of producers and vendors. They usually consist of geophones or accelerometers that measure the leak-induced vibration and convert it into sound, so an operator can listen and determine whether there is a leak [16].

This kind of device is handheld and very versatile. It can be easily transported, allowing different types of measurements, from coarse leak evaluation (placing the sensor at accessible fittings such as fire hydrants and valves, mainly to detect larger leaks) to more thorough ones, in which listening is performed at all pipe fittings, including curb-stops or stop-taps, and at the ground surface directly above the pipes at small intervals, so that smaller leaks can be detected. Most of these devices have embedded signal amplifiers and noise filters to suppress ambient noise and improve the signal-to-noise ratio, which implies that the effectiveness of the device depends heavily on the operator’s training and experience to select the appropriate filtering frequency range and amplification levels, and then to identify the characteristic leak sound [16].

Thus, while these devices are cheap, portable, and applicable at almost any location, their major drawback is that they are labor-intensive, time-consuming, and heavily dependent on the operator’s expertise and training. Cognitive bias also reduces their accuracy.

Noise correlators, in turn, require access to the surface of the pipeline to place sensors and need information about the speed at which the leak noise travels (the wave speed in the pipeline and in the soil), although they are more precise [3, 33]. Finally, acoustic or noise loggers may not achieve satisfactory results because of background noise interference [27]; each monitoring point requires one or more sensors installed at fixed locations, and a reliable internet connection is needed between the sensors and the nodes.

Given this context, this paper proposes a data-driven leak location approach for underground water pipelines. The present work aims to develop an algorithm that simplifies the search process performed in water networks when using electronic listening devices on the ground surface. The proposed method automates the location of the leak by providing the distance between the sensor and the epicenter of the leak, just as noise correlators do, and has greater potential for identifying the signature of a leak that would not be detected by the operator’s own sensitivity. Therefore, the model may allow, in the future, the development of commercial devices that do not require access to the underground pipe and reduce the number of measurements needed to find the leakage in water networks, as suggested in Fig. 1b, reducing the time for repair and, consequently, the amount of water lost.

Fig. 1
figure 1

a Schematic representation of the technique. b Imagined future prototype

There is a great diversity of classical methods for the location of leaks and other vibroacoustic sources in several media [30]. For these applications, Machine Learning Algorithms (MLA) have been used to obtain better results [5, 22]. In particular, Convolutional Neural Networks (CNN) have shown excellent results for the location of vibroacoustic sources [10, 19, 25].

Artificial Neural Networks (ANN), inspired by the functioning of neuronal cells, are universal function approximators, even for functions with strong nonlinearities. The use of ANNs does not require modeling the physical phenomenon [21], only its input and output data. CNNs combine convolution layers formed by filters with a Multi-Layer Perceptron (MLP), based on the functioning of the human visual neural system. They have the advantage of accepting two- or three-dimensional inputs, such as images or spectrograms, and have fewer trainable parameters than common MLPs [5].

For example, Fang et al. [8] proposed the detection of multiple leakage points in water distribution networks based on Convolutional Neural Networks. Nonetheless, their method analyzes changes in the internal pressure distribution of the network; thus, the water pressure sensors require direct access to the pipeline, and the method does not take into account the influence of different soils on the system.

Others [20, 39] have proposed similar approaches using CNNs, but they are limited to detecting whether or not there is a leak in a given pipeline segment. In this work, we propose a method intended to give a precise location of the leak, allowing a faster repair of the pipe.

To develop a new method that does not depend on access points to the buried pipeline and that adapts to different soil conditions (wave propagation velocity, pipe diameters, and pipe materials), we propose the application of Artificial Intelligence based on Machine Learning, using a CNN so that the water leak may be detected through the vibration signals measured at the ground surface, requiring only that the algorithm be trained for the various situations created. The technique requires only three sensors, and the CNN model is trained using the Power Spectral Density (PSD) of the collected and processed signals. Figure 1a presents a schematic representation of the proposed technique.

The present paper is organized as follows. Following this introduction, Sect. 2 presents the test setup and the experimental procedures developed for data preparation and processing. Section 3 describes the search for the best Hyper-parameters (HPs) for the CNN. The selection of the four best models was carried out in two stages: in the first, 227 models are trained with HPs randomly selected within predetermined intervals; then, the best models are evaluated and trained until convergence is achieved. A discussion of the experimental results and the limitations of the method is given in Sect. 4. Finally, some conclusions are presented in Sect. 5.

2 Test setup and experimental procedures

The experimental setup used in this work is presented in Fig. 2, and its preparation relied on technical information from water distribution companies in the region. The buried pipe is 2 m long, with a 50 mm diameter and a 3 mm wall thickness, and is made of PVC. According to Tsai et al. [36], the pipelines in urban areas nowadays are generally PVC pipes. For instance, Peres (2005) states that PVC is one of the materials most used by the main water and waste management company in Brazil.

Fig. 2
figure 2

Experimental setup

The pipeline was then buried at a depth of 0.4 m in a 1.6 m × 1.6 m × 0.8 m wooden box filled with soil, to better control the soil-related variables, such as degree of compaction, internal humidity, and homogeneity. In the field, the minimum laying depth of the pipes depends on the loads that will act on the ground. For these first exploratory tests, a depth of 0.4 m was used, as mentioned.

The soil was extracted from a deposit in the city of Ilha Solteira, São Paulo, Brazil. It was previously sieved to remove gravel and organic material, and then manually compacted in 10 cm layers using a 10 kg hand tamper dropped freely from a height of 80 cm, for greater uniformity and less spatial variability of the soil properties. This soil compaction stage is of great importance in the backfilling of supply network trenches, as it provides adequate support for the pipe, avoiding stress concentrations and bending moments in it. Some technical standards, such as NBR 17015 [2], provide guidelines regarding the installation of water supply network pipes.

A 15 cm-long capsule containing an electromechanical actuator (Fig. 3) simulates the vibration caused by a real leak, and a PCB 333B single-axis piezoelectric accelerometer next to it measures the excitation generated. Using a device to simulate the leak instead of a real water leak is advantageous for three main reasons: the soil properties are guaranteed not to change across measurements; a real leak requires a pump to maintain the water pressure, which would add unwanted vibration and noise to the measurements; and an electromechanical actuator allows the frequency content and intensity of the input signal to be changed easily.

Fig. 3
figure 3

a Capsule with electromechanical actuator attached to the pipe. b Schematic illustration of the pipe excitation system

Therefore, we performed measurements using a white noise input in the range [250–2500] Hz, generated in Python and amplified by a sound amplifier, with RMS values ranging from 0.1 to 2 m/s2. The lowest signal amplitude was chosen so that the Signal-to-Noise Ratio (SNR) was around 0 dB, and the highest amplitude was limited by the maximum input current supported by the electromechanical actuator.
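As a sketch, such a band-limited white-noise drive signal could be generated as follows (the Butterworth filter, its order, and the seed are assumptions; the text specifies only the frequency band and the RMS range):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_limited_noise(duration_s=60.0, fs=12800, band=(250.0, 2500.0),
                       target_rms=0.1, seed=0):
    """Generate band-limited white noise scaled to a target RMS value.

    Illustrative sketch of the [250-2500] Hz excitation described in the
    text; the 4th-order Butterworth filter and the fixed seed are assumed.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(int(duration_s * fs))
    # zero-phase band-pass filtering keeps the noise white within the band
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    y = sosfiltfilt(sos, x)
    # rescale so the RMS matches the desired excitation level
    return y * (target_rms / np.sqrt(np.mean(y ** 2)))

sig = band_limited_noise(duration_s=1.0, target_rms=0.5)
```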

According to Wang et al. [44], the leak is a continuous seismic micro-source that generates a random vibration with constant statistical parameters over time. Scussel et al. [34] show that the leak spectrum can be represented by a white noise if the ratio of high to low-frequency cut-off frequencies of the bandwidth is small (\(\le\) 10). Figure 3 presents the pipe excitation system.

The pipe was filled with water and pressurized to mimic the real conditions of the system. It was inserted centrally, 0.3 m above the base of the box, crossing two of its sides. The inner walls of the wooden box were covered with two 10-mm-thick layers of soft polyester blanket. Moreover, in the regions of contact with the wooden box, the pipe was also lined with two layers of geotextile blanket and a layer of 10-mm-thick EVA, avoiding vibrations and reflections in the box structure.

Iron bolts were inserted into the soil, so the accelerometers with a magnetic base could be mounted easily, as seen in Fig. 4a. A 7 × 8 grid was drawn with 0.2 m spacing between each measuring point. A Cartesian coordinate system was defined on the soil surface plane, centered at the point right above the leak simulator device. Out-of-plane acceleration signals were measured at a sampling frequency of 12.8 kHz with eleven PCB 333B single-axis piezoelectric accelerometers, the same model that was coupled to the leak simulator. All sensors were connected to a Siemens LMS Scadas XS data acquisition system.

Fig. 4
figure 4

Sensors positioning. a Accelerometers positioned above iron bolts. b Arrangement of the sensors on the ground surface

One-minute-long signals were measured for each input RMS value, for all points in the grid. However, due to the 12-channel limit of the data acquisition system, in each measurement the sensors were positioned in two rows of five, as shown in Fig. 4b, and moved until all points of the grid were covered. The last sensor was placed either at the center or at another arbitrary point.

As done by Wu and Lee [45], this work aims to explore the exponential decay of the vibration with distance from the source to locate the leak. However, as the objective is to locate the point on the surface right above the leak from measurements on the same plane, instead of training the model using energy bands within certain frequency intervals, we chose to extract the Power Spectral Density from the collected time signals, given that the amplitude at each frequency is proportional to the signal energy at that frequency.

Also, as shown by numerical studies using similar scenarios [7, 44], some frequencies of the signal are attenuated differently in the direction parallel to the pipe compared to the perpendicular direction. Therefore, for a Machine Learning model to detect the coordinates of the leak point on the surface plane from PSD, at least three sensors would be required, placed not co-linearly, as shown in Fig. 5. By following this strategy, the model could compare the amplitude levels for each coordinate and then estimate them. A reduced number of sensors would also reduce the price of a possible future device to locate leaks using the algorithm proposed in this paper.

Fig. 5
figure 5

Configuration of the sensors. a Possible arrangements. b Representation of surface positioning

Figure 5a shows four different configurations of three sensors. In each orientation, the position of each sensor relative to the others is always the same. For example, B is a 90-degree anti-clockwise rotation of A, and similarly for C and D. This is important to augment the data available for training and testing and to reduce bias related to the configuration of the sensors.

The data were processed using Pandas (The Pandas development team [35]) and NumPy [43], software libraries built on top of the Python programming language and used for data analysis and manipulation, and for numerical computation on large arrays and matrices, respectively. All collected signals were split into windows of 1 s, from which the PSD was calculated using Welch’s method with a Hanning window [29] and 50% overlap, resulting in a vector with 513 elements.
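The PSD extraction described above could be reproduced with SciPy as follows (the segment length of 1024 samples is an assumption, chosen because it yields the 513-element vector mentioned in the text):

```python
import numpy as np
from scipy.signal import welch

fs = 12800                       # sampling frequency used in the experiments (Hz)
window = np.random.randn(fs)     # placeholder for one 1-second signal window

# Welch's method with a Hann window and 50% overlap; nperseg = 1024 is an
# assumed value that yields the 513-element PSD vector mentioned in the text
# (1024 // 2 + 1 = 513 frequency bins).
freqs, psd = welch(window, fs=fs, window="hann", nperseg=1024, noverlap=512)
```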

Thus, the matrices given as input to the CNN model were defined by \(X_{k}\), shown in Eq. 1, where \(p_{s}(f_{n})\) is the amplitude value from the PSD of sensor s at frequency \(f_{n}\), for n = 1, …, N. The values are converted to decibels and scaled to the range − 1 to 1.

$$X_{k} \; = \;\begin{bmatrix} p_{1} \left( f_{1} \right) & p_{1} \left( f_{2} \right) & \cdots & p_{1} \left( f_{N} \right) \\ p_{2} \left( f_{1} \right) & p_{2} \left( f_{2} \right) & \cdots & p_{2} \left( f_{N} \right) \\ p_{3} \left( f_{1} \right) & p_{3} \left( f_{2} \right) & \cdots & p_{3} \left( f_{N} \right) \end{bmatrix}$$
(1)

Since each input matrix is composed of data from only three sensors, unlike the arrangement in which the signals were measured, a re-processing step was required so that the dataset could contain all possible positions of the configurations shown in Fig. 5 within the grid defined in Fig. 4.
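As an illustrative sketch, the assembly of one input matrix from three PSD vectors could look as follows (the joint min-max scaling over the whole matrix and the function name are assumptions; the text specifies only the dB conversion and the [−1, 1] range):

```python
import numpy as np

def build_input_matrix(psd_s1, psd_s2, psd_s3, eps=1e-12):
    """Stack three PSD vectors into the 3 x N input matrix X_k of Eq. (1).

    The dB conversion and the scaling to [-1, 1] follow the text; scaling
    jointly over the whole matrix (rather than per sensor) is an assumption.
    """
    X = np.vstack([psd_s1, psd_s2, psd_s3])
    X_db = 10.0 * np.log10(X + eps)               # convert power to decibels
    lo, hi = X_db.min(), X_db.max()
    return 2.0 * (X_db - lo) / (hi - lo) - 1.0    # rescale to [-1, 1]
```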

Figure 6 summarizes the procedures for preparing the collected data for formatting the input matrices.

Fig. 6
figure 6

Procedures for data preparation and processing

3 Hyper-parameter searching

Hyper-parameters (HP) are defined as the user-controlled variables that construct a Machine Learning Algorithm. This definition is needed to distinguish the parameters chosen by the programmer from the trainable parameters that are modified during the training phase by an optimization algorithm [14]. To find the hyper-parameters that may lead to a low error when locating a leakage, a search space was defined, inspired by models used for similar tasks [9, 10, 25] and listed in Table 1. Several models were built using the Python library TensorFlow [1], following a typical CNN architecture, as shown in Fig. 7. A random search was performed by sampling each HP from a uniform distribution within the defined range, a method shown to be more effective than a grid search over regular intervals of the HPs [4].
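A random search of this kind can be sketched as follows (the ranges shown are placeholders, not the actual intervals of Table 1; sampling learning-rate-like HPs log-uniformly is a common practice and an assumption here):

```python
import math
import random

# Illustrative hyper-parameter ranges only; the actual intervals are in Table 1.
SEARCH_SPACE = {
    "n_filters": (4, 32),             # integer range
    "neurons": (8, 128),              # integer range
    "learning_rate": (1e-4, 1e-2),    # sampled log-uniformly (assumption)
    "activation": ["relu", "swish", "tanh"],
    "loss": ["mae", "mse", "huber"],
}

def sample_hps(space):
    """Draw one random hyper-parameter combination for a random search."""
    hps = {}
    for name, spec in space.items():
        if isinstance(spec, list):
            hps[name] = random.choice(spec)           # categorical HP
        elif all(isinstance(v, int) for v in spec):
            hps[name] = random.randint(spec[0], spec[1])
        else:
            lo, hi = spec                             # log-uniform float HP
            hps[name] = 10 ** random.uniform(math.log10(lo), math.log10(hi))
    return hps
```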

Table 1 Initial hyper-parameter space
Fig. 7
figure 7

Representation of the configuration of the proposed models

For all tested models, a random deactivation of 0–30% of the neurons during training was applied between each pair of dense layers, a technique known as Dropout. Dropout deliberately perturbs the model during training, acting as a regularizer: during the testing phase, all neurons are active and performance improves, while overfitting is significantly reduced [12].
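Under assumed layer sizes, the architecture of Fig. 7 with inter-layer Dropout could be sketched in Keras as follows (the number of filters, kernel shape, neuron counts, and the use of "same" padding are illustrative, not the configurations of Tables 2–4):

```python
import tensorflow as tf

def build_model(n_freqs=400, n_filters=8, kernel=(8, 2), neurons=16,
                dropout_rate=0.1, lr=6.09e-4):
    """Sketch of the CNN of Fig. 7: one convolution layer over the 3 x N
    PSD matrix, an MLP layer with Dropout, and a 2-neuron (x, y) output.
    Layer sizes and padding mode are illustrative assumptions."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(3, n_freqs, 1)),
        tf.keras.layers.Conv2D(n_filters, kernel, activation="relu",
                               padding="same"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(neurons, activation="swish"),
        tf.keras.layers.Dropout(dropout_rate),  # random neuron deactivation
        tf.keras.layers.Dense(2),               # predicted (x, y) coordinates
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="mae")
    return model
```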

The PSD vectors used in the analysis of the models were truncated at 2000 Hz, since low energy levels were found above this frequency when analyzing the energy distribution of the signals collected on the ground surface.

The Mean Absolute Error (MAE), given by Eq. 2, was adopted as the loss function for some models, where yi is the (x, y) coordinate pair of Sensor 1 (S1), as shown in Fig. 4, \(\hat{y}_{i}\) is the prediction of the model for the position of the sensor, and n is the number of inputs given to the model for testing.

$${\text{MAE}}\; = \;\frac{1}{n}\sum\limits_{i = 1}^{n} \lvert{y_{i}} - {\hat{y}_{i}}\rvert $$
(2)

Although the goal of the model is to predict where the leak is, it predicts the position of the sensor relative to the leak. However, this is just a matter of reference: once the origin of the coordinate system is moved to the sensor, the model predicts the position of the leak. The other two loss functions, MSE (Eq. 3) and Huber (Eq. 4), were defined similarly.

$${\text{MSE}}\; = \;\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {y_{i} - \hat{y}_{i} } \right)}^{2}$$
(3)
$${\text{Huber}}\; = \;\begin{cases} \frac{1}{2}\left( y_{i} - \hat{y}_{i} \right)^{2} , & \text{for } \left| y_{i} - \hat{y}_{i} \right| \le 1 \\ \left| y_{i} - \hat{y}_{i} \right| - \frac{1}{2} , & \text{otherwise} \end{cases}$$
(4)
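For reference, the three loss functions of Eqs. (2)–(4) can be written compactly (a NumPy sketch; `delta = 1` reproduces the threshold of Eq. (4)):

```python
import numpy as np

def mae(y, y_hat):
    """Mean Absolute Error, Eq. (2)."""
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    """Mean Squared Error, Eq. (3)."""
    return np.mean((y - y_hat) ** 2)

def huber(y, y_hat, delta=1.0):
    """Huber loss, Eq. (4): quadratic for small errors, linear otherwise."""
    r = np.abs(y - y_hat)
    return np.mean(np.where(r <= delta, 0.5 * r ** 2,
                            delta * (r - 0.5 * delta)))
```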

The search for the best combination of HPs was performed in two steps. First, considering random combinations from Table 1, a total of 227 models were trained together with a function that monitored their loss metric. The procedure consisted of checking at the end of every epoch whether the loss function had stopped decreasing; once this held for five consecutive epochs (patience = 5), training was stopped.

For this approach, the present work used the EarlyStopping class available in the Keras library, a deep learning API written in Python that runs on top of the Machine Learning platform TensorFlow. 80% of the dataset was used for training and the rest for testing. This allowed the search space to be explored faster and the parameter ranges to be assessed.

In the second step, the proposed models were trained until convergence was achieved. For these models, k-fold cross-validation was performed, using k = 5. Cross-validation allows us to check the consistency of the performance of the model, avoiding biases that may favor a model due to the random partitioning of the data into training and test sets. This validation is performed by dividing the data into k folds and training the model k times, so that in each training (k − 1) parts are used for training and the remaining one for testing [41]. The four best results of this step are presented in the Results section.
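The k-fold partitioning described above can be sketched as follows (the shuffling with a fixed seed is an assumption):

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index splits for k-fold cross-validation.

    Each fold serves once as the test set while the remaining k - 1 folds
    form the training set, as described in the text.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)       # shuffle before splitting
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

splits = list(kfold_indices(100, k=5))
```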

For both steps, the Schwarz information criterion (SIC) [32] was used to select the best models. This criterion, shown in Eq. 5, evaluates the models quantitatively, taking into account the amount of training data n, the Mean Squared Error (MSE) of the model, and its number of trainable parameters k. Models with a good trade-off between the number of trainable parameters and precision have lower SIC values and a lower chance of overfitting.

$${\text{SIC}} = n\ln \left( {{\text{MSE}}} \right) + k\ln \left( n \right)$$
(5)
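Equation (5) translates directly into code; at equal MSE, the model with fewer trainable parameters obtains the lower (better) SIC:

```python
import numpy as np

def sic(mse, n, k):
    """Schwarz information criterion of Eq. (5).

    mse: mean squared error of the model
    n:   amount of training data
    k:   number of trainable parameters
    """
    return n * np.log(mse) + k * np.log(n)

# A smaller model with the same error is preferred (lower SIC),
# as is a more precise model of the same size.
penalized = sic(mse=0.5, n=1000, k=20)
compact = sic(mse=0.5, n=1000, k=10)
```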

4 Results and discussion

After the collection and processing of the data, a total of 119,500 entries were obtained for the CNN, of which 80% were used for the training stage and 20% for the test stage.

In the search process, 227 models were trained with randomly selected HPs within the intervals presented in Table 1. The tests were performed with networks with one and two convolution layers, with and without max pooling, and with up to three MLP layers. At the end of every epoch, the training loop checked whether the loss function had stopped decreasing; once this held for five consecutive epochs, training was stopped. Then, the best HP arrangements were selected for the next procedure.

Figure 8 illustrates the selection process of some HPs linked to the convolution layers. Each line represents a model, and each axis represents a hyper-parameter. Line colors are on a blue-red scale and relate to the error associated with the selection. In the same Figure, a is the activation function, L is the loss function, Otim. is the optimization algorithm, α is the learning rate, and N1 to N3 are the numbers of neurons in layers one to three, respectively.

Fig. 8
figure 8

HPs of the models with one convolution layer, trained in the first search step with a three b two and c one layer(s) of MLP

In the second stage of the search, the same proportion between training and test datasets was maintained to carry out the five-fold cross-validation. This second training stage took about thirty to ninety seconds per iteration. Tables 2, 3 and 4 show the configurations of the four models that obtained the best results after the first exploration of the hyper-parameters.

Table 2 Convolution layers
Table 3 MLP layers and learning rate
Table 4 Other hyper-parameters

Table 2 shows the size of the filters, with each pair (m, n) referring to a convolution layer. Next, the number of filters is listed, sequentially, for each layer. The activation function used is the same for all convolution layers.

Table 3 contains the HPs of the MLP layers: the number of neurons in each layer and the activation function used in all layers of the MLP. The last column lists the dropout rates applied at the end of each layer.

Table 4 contains the learning rates applied in the training phase of the models, the loss functions, the algorithms for optimizing the weights of the network, the batch size, and the number of trainable parameters of each model (Table 5).

Table 5 Iterations and MAE of the trained model until convergence

The four pre-selected networks were trained until convergence was achieved, and their results, with 95% confidence intervals, were obtained through five-fold cross-validation, as shown in Table 6. The convergence criterion adopted was ∆MAE < 0.01%.

Table 6 Iterations and MSE of the trained model until convergence

Figures 9, 10, 11, 12 present the prediction results obtained with the four selected models. Location predictions are performed on the test data in the last iteration. Subfigure a) shows the convergence curve in which the test MAEs are calculated for each iteration, considering 20% of the data partition. Subfigure b) provides a more visual perspective of the predictions of each CNN, with the yellow points being the real values of the coordinates and the blue points being the predictions. In addition, subfigures c) and d) show the comparison between the results predicted by the network and the real values for each axis of the ground surface plane, separately. It is also possible to observe the error of the network outputs through the boxplots.

Fig. 9
figure 9

Results of the Model A after 147 iterations. a Convergence of the model. b Predictions on the ground surface plane. c X-coordinate predictions. d Y-coordinate predictions

Fig. 10
figure 10

Results of the Model B after 199 iterations. a Convergence of the model. b Predictions on the ground surface plane. c X-coordinate predictions. d Y-coordinate predictions

Fig. 11
figure 11

Results of the Model C after 174 iterations. a Convergence of the model. b Predictions on the ground surface plane. c X-coordinate predictions. d Y-coordinate predictions

Fig. 12
figure 12

Results of the Model D after 201 iterations. a Convergence of the model. b Predictions on the ground surface plane. c X-coordinate predictions. d Y-coordinate predictions

The results showed that the signature of the leak can be recognized from the PSD. All four developed models tend to predict points closer to the center with greater precision. This was expected, since the farther the measuring points are from the source, the less signal energy reaches the sensors. Furthermore, the greater the propagation distance of the signal, the more distortions may occur due to soil heterogeneity (the effect of multiple layers or random irregularities in the subsurface [23]) and to external vibrations of the environment.

Although model C has the lowest MAE and MSE for training (0.69 ± 0.005 cm) and testing (0.51 ± 0.053 cm), its number of trainable parameters is 25.8% higher than that of model B. Thus, following the criterion of lowest SIC, model B stands out compared to the other models. Model B has one convolution layer with 8 filters of dimensions 8 × 2 and ReLU activation, an MLP intermediate layer with 16 neurons and Swish activation, with a dropout of 0.01, and an output layer with two neurons. The learning rate is 6.09 × 10−4, using the MAE as loss function, the Adam optimization algorithm, and a batch size of 16.

As seen, model B was able to indicate the position of the leak satisfactorily, regardless of the strength of the signal provided by the source, with a test MAE of 1.01 ± 0.070 cm at 95% confidence.

However, a certain limitation of the models is expected in terms of soil conditions. As is common in Machine Learning algorithms, in order to achieve greater prediction efficiency across the most diverse scenarios, the proposed network will likely need to be trained with data collected in other soils, with different compaction degrees and internal humidity; these are directions for future work. On the other hand, the results obtained so far indicate that the technique is promising and capable of providing good predictions, helping to facilitate and optimize the process of locating water leaks by acoustic methods.

Some current vibroacoustic methods achieve very good results in networks with metal pipes, but their performance does not extend to field settings with pipes made of PVC, which remains a commonly used material in WDNs today. Thus, a further highlight of the technique proposed in this work is that it presented good results for a system with an underground plastic pipe.

Furthermore, compared to another vibroacoustic technique proposed by Proença et al. [31], carried out on the same experimental bench, the Convolutional Neural Network was able to generate results with greater accuracy.

5 Final remarks

In this article, to improve current methods for detecting and locating leaks in buried plastic pipes, a data-driven approach was used to make the process less subjective, faster, and less dependent on convenient access points to the pipes and on the user’s experience. Convolutional Neural Network models were trained and validated on an experimental platform by analyzing the Power Spectral Density of three sensors on the ground surface.

The best Hyper-Parameter configurations of the models were selected using a two-step search. The best model selected has one convolution layer with 8 filters of dimensions 8 × 2 and ReLU as activation function, followed by an MLP hidden layer with 16 neurons and a Swish activation with 10% dropout, and an output layer with two neurons. The model was trained with a learning rate of 6.09 × 10−4, using the MAE as loss function, the Adam optimization algorithm, and a batch size equal to 16. It was able to estimate the position of the leak regardless of the signal strength provided by the source, with a test MAE of 1.01 ± 0.070 cm at 95% confidence.

The results obtained from the experiments are promising. They provide evidence that this method can indeed decrease the time and effort during the leak location process. Furthermore, it is worth highlighting that the method achieved good results for a system with a plastic pipe, which is already a difficulty for many existing vibroacoustic methods. As mentioned by Brennan et al. [6] and Tsai et al. [36], the leak signal in these cases decreases quickly due to the high damping factor of the plastic material, unlike a steel piping system.

Continuing this line of study, the model may allow the development of a commercial device that will not require access to the underground pipe and will reduce the number of measurements needed to find the leakage in the water networks. However, further validation is required to test the proposed approach for a realistic pipeline situation where more background noise and soil heterogeneities, like buried rocks and organic materials, may be present. The test of the model on different soils, pipeline materials, diameters, and combinations with various pipeline configurations is the next research direction following this paper. Efforts are currently underway to collect field data from a realistic system, and results from that study will be presented in the future.