
Pump Fault Prediction with AWS Lookout for Equipment

Published on 23 Aug, 2023

In industries such as water treatment, pumps play a vital role: their reliability is essential for productivity and for keeping maintenance costs down. Like all mechanical equipment, pumps are susceptible to failure. Fortunately, pump failures are often predictable: impending problems typically show up as warning patterns in sensor data such as vibration, temperature, flow rate, and suction and discharge pressure.

The challenge is to detect these warning patterns early and accurately enough to predict pump failures before they happen; this is where machine learning (ML), and specifically Amazon Lookout for Equipment, can provide a solution.

By analysing multiple measurements from a pump and its operating conditions, ML algorithms can learn to recognise the early signs of an impending failure, potentially providing a warning hours, days or even weeks in advance.

In this case study, we use the dataset of a pump in Turkey, hosted on Kaggle: pump_sensor_data

As the output of the data preprocessing, we will have over 90 days’ worth of data ready for Amazon Lookout for Equipment to train the model and about 60 days’ worth of data to evaluate the model.

In this blog post, we will walk through a case study using Amazon Lookout for Equipment to predict pump failures, evaluate its performance compared with a couple of conventional ML algorithms, and conclude with suggestions to improve the model performance.

Introduction to Amazon Lookout for Equipment

Amazon Lookout for Equipment is a fully managed AWS service for monitoring industrial equipment that uses ML algorithms to detect abnormal equipment behaviour. Lookout for Equipment automatically trains an ML model on data from the sensors on the equipment, and then uses this equipment-specific model to analyse incoming sensor data in real time and accurately identify early warning signs that could lead to equipment failure.

This automated anomaly detection capability makes Amazon Lookout for Equipment a powerful tool for maintaining the health of industrial equipment.

Note: Lookout for Equipment is suitable for stationary equipment that operates continuously and with limited variability in its operating conditions, e.g. rotating equipment such as pumps, compressors, motors and wind turbines, or assets such as heat exchangers, boilers and inverters. It is not intended for equipment that runs infrequently or has highly variable operating conditions.

An Amazon Lookout for Equipment model supports up to 300 sensors, and it can get smarter over time: retraining the model with new data helps it cope with previously unseen failure modes and with model drift.

In the following sections, we will dive into the specifics of how we used Amazon Lookout for Equipment for model training and evaluation.

Architecture

Below is the design of the training and evaluation of the ML model in Lookout for Equipment. Note that the automated data pipeline is out of scope for this blog post, but it is included here to give the big picture.

In a real-world scenario, the pump's sensor data can be sent to the AWS Cloud in real time or near real time, e.g. through an OPC UA server to a SiteWise Edge gateway hosted on Greengrass, which in turn ingests the data into SiteWise in the cloud. A data pipeline is then set up to extract, transform and load the sensor data from SiteWise to S3, and AWS Step Functions can be used to automate the ETL jobs. In this case, however, we preprocess the data manually in the AWS Console, using AWS-managed services such as Glue DataBrew, the Glue Catalog and Athena.

Data

Raw Data

The raw dataset is the sensor data of a pump in Turkey, hosted on Kaggle: pump_sensor_data

Another view of the data:

And we upload it to S3: s3://<an-s3-bucket-name>/l4e/pump/raw/sensor.csv
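For reference, here is a minimal boto3 sketch of that upload, with the bucket name kept as a placeholder:

```python
import boto3

s3 = boto3.client("s3")

# Upload the raw Kaggle CSV to the raw-data prefix used in this post
s3.upload_file(
    Filename="sensor.csv",
    Bucket="<an-s3-bucket-name>",  # placeholder, as above
    Key="l4e/pump/raw/sensor.csv",
)
```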

Data preprocessing

We use AWS Glue DataBrew to preprocess the data. Firstly, we need to create two datasets from the original dataset, as required by Lookout for Equipment.

Event dataset

The first dataset is the event dataset. In a DataBrew project, we write a recipe with three steps.
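Conceptually, what the recipe does is similar to the following pandas sketch; the column names come from the raw Kaggle CSV, and the output filename is illustrative:

```python
import pandas as pd

# Conceptual equivalent of the DataBrew recipe: keep only the timestamp
# and machine_status columns and write them out as a two-column CSV.
df = pd.read_csv("sensor.csv", usecols=["timestamp", "machine_status"])
df.to_csv("labelled-data.csv", index=False)
```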


Based on the recipe, we run a DataBrew job, and the output of the job is a two-column CSV file, which looks like this:


Then, in AWS Athena, we run the following SQL statement to list the periods when the pump was in a broken state, in the format required by the Lookout for Equipment label specification:


SELECT Timestamp AS start_timestamp, next_ts AS end_timestamp
FROM (
    SELECT *, LEAD(Timestamp, 1) OVER (ORDER BY Timestamp) next_ts
    FROM (
        SELECT *
        FROM (
            SELECT previous_status, Status, Timestamp
            FROM (
                SELECT LAG(Status, 1) OVER (ORDER BY Timestamp) previous_status, Status, Timestamp
                FROM (
                    SELECT date_format(date_parse(Timestamp, '"%Y-%m-%d %H:%i:%s.0"'), '%Y-%m-%dT%H:%i:%s.%f') Timestamp, Status
                    FROM (
                        SELECT col0 AS Timestamp, col1 AS Status
                        FROM "pump-predictive-maintenance"."pump_predictive_maintenace_labelled_data_28sep2022_1664380184451"
                        WHERE col0 != 'timestamp'
                        ORDER BY Timestamp ASC
                    )
                )
                ORDER BY Timestamp ASC
            )
            ORDER BY Timestamp ASC
        )
        WHERE (previous_status != Status AND (Status = 'BROKEN' OR Status = 'NORMAL'))
        ORDER BY Timestamp DESC
    )
    ORDER BY Timestamp ASC
)
WHERE Status = 'BROKEN'

The list looks like this:


And we store the list as a CSV file ready for Lookout for Equipment to use later on: s3://<an-s3-bucket-name>/l4e/pump/preprocessed/historical-labelled-data/pump-predictive-maintenance-labelled-data.csv
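For reference, the label file is a two-column CSV with one start/end timestamp pair per broken period. With purely illustrative timestamps (not actual values from the dataset), a couple of rows look like this:

```
2018-05-08T12:40:00.000000,2018-05-09T03:20:00.000000
2018-07-25T09:15:00.000000,2018-07-25T19:00:00.000000
```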

Sensor Data

The second dataset is the sensor dataset. In a DataBrew project, we review the data and write a recipe with the following two steps.

Note: we remove the non-sensor data columns (_c0 and machine_status), and sensor data columns that are empty or have too many missing values (sensor_15 and sensor_50). We also format the timestamp column per the Lookout for Equipment requirement.
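As a rough pandas equivalent of these recipe steps (illustrative, not the actual recipe; _c0 is the unnamed index column, which pandas names Unnamed: 0, and the exact target timestamp format is an assumption):

```python
import pandas as pd

df = pd.read_csv("sensor.csv")

# Drop the non-sensor columns and the empty/mostly-missing sensor columns.
# (_c0 is the unnamed index column, named "Unnamed: 0" by pandas.)
df = df.drop(columns=["Unnamed: 0", "machine_status", "sensor_15", "sensor_50"])

# Reformat the timestamp column, e.g. to an ISO-like format.
df["timestamp"] = pd.to_datetime(df["timestamp"]).dt.strftime("%Y-%m-%dT%H:%M:%S.%f")

df.to_csv("pump1.csv", index=False)
```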

Based on the recipe, we run a DataBrew job, and the output of the job is a CSV file, which looks like this:

(s3://<an-s3-bucket-name>/l4e/pump/preprocessed/pump-predictive-maintenance-sensors_28Sep2022_1664395819323/pump-predictive-maintenance-sensors_28Sep2022_1664395819323_part00000.csv, later moved to s3://<an-s3-bucket-name>/l4e/pump/preprocessed/pump1/pump1.csv as required by Lookout for Equipment.)

Altogether, there are 51 sensor data columns.
From the profiling of the sensor data, we can see that each sensor column still has missing values:

However, we don't have to deal with these explicitly, because Lookout for Equipment automatically fills in missing data (imputation) by forward-filling the previous sensor readings.
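Conceptually, this imputation behaves like a pandas forward fill:

```python
import pandas as pd

df = pd.read_csv("pump1.csv")
# Each missing reading is replaced by the most recent observed value,
# analogous to Lookout for Equipment's built-in imputation.
df = df.ffill()
```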

ML model training and evaluation with AWS Lookout for Equipment

We now have the two datasets required by Lookout for Equipment, so it is time to train the ML model with AWS Lookout for Equipment.
The workflow involves four steps, which we will follow one at a time.

Create a Project

First, create a project with a given name, such as pump-predictive-maintenance.
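The same can be done through the API; a minimal boto3 sketch, noting that a console project is backed by a dataset in the API:

```python
import boto3

l4e = boto3.client("lookoutequipment")

# A Lookout for Equipment console "project" corresponds to a dataset in the API.
l4e.create_dataset(DatasetName="pump-predictive-maintenance")
```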

Ingest Sensor Data

Then go ahead and add the dataset as below.

When you’ve specified the data source (s3://<an-s3-bucket-name>/l4e/pump/preprocessed/pump1/ in our case), you can start the ingestion of the sensor data. Once done, you’ll find the summary of the dataset details below:
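The ingestion can also be triggered through the API; a minimal boto3 sketch, with the account ID and role ARN as placeholders:

```python
import boto3

l4e = boto3.client("lookoutequipment")

# Ingest the preprocessed sensor CSV from S3 into the dataset.
l4e.start_data_ingestion_job(
    DatasetName="pump-predictive-maintenance",
    RoleArn="arn:aws:iam::<account-id>:role/<l4e-role>",  # needs read access to the bucket
    IngestionInputConfiguration={
        "S3InputConfiguration": {
            "Bucket": "<an-s3-bucket-name>",
            "Prefix": "l4e/pump/preprocessed/pump1/",
        }
    },
)
```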

Train the ML Model

Once the data has been ingested, we can select the sensors to include in training the ML model, specify the training and evaluation date ranges, the sample rate and the location of the event dataset prepared earlier, and then start the training.
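Programmatically, this step maps to the CreateModel API; a boto3 sketch, where the date ranges and the 1-minute sampling rate are illustrative assumptions rather than the exact values used here:

```python
from datetime import datetime, timezone
import boto3

l4e = boto3.client("lookoutequipment")

l4e.create_model(
    ModelName="pump-predictive-maintenance-model",
    DatasetName="pump-predictive-maintenance",
    # ~90 days for training, ~60 days for evaluation (illustrative dates)
    TrainingDataStartTime=datetime(2018, 4, 1, tzinfo=timezone.utc),
    TrainingDataEndTime=datetime(2018, 6, 30, tzinfo=timezone.utc),
    EvaluationDataStartTime=datetime(2018, 7, 1, tzinfo=timezone.utc),
    EvaluationDataEndTime=datetime(2018, 8, 31, tzinfo=timezone.utc),
    DataPreProcessingConfiguration={"TargetSamplingRate": "PT1M"},
    # The event (label) dataset prepared earlier
    LabelsInputConfiguration={
        "S3InputConfiguration": {
            "Bucket": "<an-s3-bucket-name>",
            "Prefix": "l4e/pump/preprocessed/historical-labelled-data/",
        }
    },
    RoleArn="arn:aws:iam::<account-id>:role/<l4e-role>",
)
```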

When the training of the model is finished, the status of the model will become ready for inference, as below.

Evaluation Results

Upon training completion, Amazon Lookout for Equipment automatically evaluates the ML model on the evaluation dataset against the historical event labels that record when the equipment behaved abnormally. You will find the result of the ML model evaluation in the AWS Console, as shown below.

Amazon Lookout for Equipment compares the detected events to the labelled events, which tells you the accuracy of the ML model. In our case, 2 out of 2 anomalous equipment behaviour events are detected within the label ranges, with an average forewarning time of 44 minutes, while 7 anomalous equipment behaviour events are detected outside the label ranges, with an average duration of 11 minutes. For each prediction, Lookout for Equipment gives a list of the top contributing sensors, which helps locate and diagnose the potential issue.

We can see that all anomalous equipment behaviour events have been detected, but with little forewarning time, and that a few equipment behaviour events are falsely reported as anomalous.

Model Comparison

With the same dataset, a separate project used conventional ML to detect impending pump failures. That project trained many KMeans and KNN models and selected the best of each. Below are the results from these two models.
Here is the result from the KMeans unsupervised model. Note: 0 represents the pump status BROKEN, 1 RECOVERING, and 2 NORMAL.

We can see that this model does not identify a single BROKEN event in the test data, even though we know there are two occurrences.
And here is the result from the KNN supervised model.

Like the KMeans model, this KNN model also fails to identify any BROKEN event. The primary reason these conventional ML models perform poorly is that the data is highly unbalanced: almost all of it is labelled NORMAL, some is RECOVERING, and very few instances are BROKEN. As a result, these models struggle to predict anything other than NORMAL.
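The imbalance is easy to verify from the raw data; a quick pandas check:

```python
import pandas as pd

df = pd.read_csv("sensor.csv")
# The vast majority of rows are NORMAL, so accuracy-driven models can score
# highly by always predicting NORMAL and never flagging BROKEN.
print(df["machine_status"].value_counts(normalize=True))
```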

For details on this project, please refer to its Kaggle notebook.

Comparing Amazon Lookout for Equipment with these two conventional ML algorithms, we can draw the following conclusions:

  • First, even without special data preprocessing, Amazon Lookout for Equipment handles unbalanced data well and so performs much better, with 100% recall compared to 0% in the Kaggle project: it detects both recorded anomaly events in the test data, while the two conventional ML models detect none.
  • Amazon Lookout for Equipment also pinpoints the sensors that contribute most to each anomaly, which helps locate and diagnose the potential problem.
  • Amazon Lookout for Equipment also provides a forewarning time, ideally long enough to carry out predictive maintenance before the equipment fails.

Conclusion

This blog post discusses how to train an ML model for predictive maintenance using AWS Lookout for Equipment. It provides a step-by-step guide on how to preprocess and format datasets using AWS Glue DataBrew, create a project, ingest the sensor dataset, and train the ML model using Lookout for Equipment.

We also compare the performance of Lookout for Equipment with conventional ML algorithms, KMeans and KNN, in detecting anomalous equipment behaviour. Lookout for Equipment outperforms the conventional models in accurately detecting all recorded anomaly events in the test data, providing a list of top contributing sensors, and forewarning time. However, the forewarning time was relatively short, at an average of 44 minutes, and there were seven false predictions.
We had just over three months of data to train the ML model, which only meets Lookout for Equipment's minimum requirement. We would expect that, with more data, the model's performance would improve further.
