BOLT FAILURE PREDICTION

Based on a synthetic dataset from UC Irvine, as a capstone project for

University of Calgary DAT-310 Applied Deep Learning, presented by Roderick Paulino.

This dataset and calculations are for educational purposes only.

Citation

AI4I 2020 Predictive Maintenance Dataset. (2020). UCI Machine Learning Repository. https://doi.org/10.24432/C5HS5C.

Introduction

This dataset was sourced from UCI solely for educational purposes, aiming to demonstrate the potential of machine learning algorithms in predicting the failure of bolts under various operating conditions. 

The analysis of the data does not rely on findings derived from engineering standards; instead, it is based on the author's domain knowledge and extensive field and design experience.

The dataset in question is synthetic, generated by a reputable educational institution and does not mirror real-world data from the field. It serves as a constructed input for machine learning applications. The dataset's parameters reflect some of the authentic factors that could influence the performance of the components under investigation. In an actual case study, additional variables such as manufacturing defects and elements within the engineering formulation would be incorporated to provide a more comprehensive analysis.

The purpose of this presentation is not to develop a new algorithm but to use fit-for-purpose available algorithms to solve and visualize the prediction using conventional machine learning and deep learning.

Exploratory data analysis

Considering the contents of the dataset, the study was carried out using samples that were extracted from components exposed to various operating conditions. These conditions encompassed factors like rotational speed, temperature, tool wear, and the level of torque applied to each individual sample.

Elastic Range: In this initial region, when stress is applied to the bolt and then removed, the bolt will return to its original shape. The material behaves elastically, meaning it's only temporarily deformed.

Yield Point: This is a critical point on the curve where the material starts to deform plastically. This means if the bolt is stressed beyond this point, it will not return to its original shape when the stress is removed. It signifies the limit of elastic behavior.

Plastic Range: Beyond the yield point, the material deforms plastically. Here, even after removing the stress, permanent deformation remains.

Proof Load (Typically 85-95% of Yield): This is the maximum load the bolt can hold without experiencing any permanent deformation. It's a safety measure, ensuring the bolt operates within safe limits.

Typical Clamp Load (75% of Proof Load): This value represents the load at which the bolt is typically clamped in practical applications. It's set lower than the proof load to ensure a margin of safety.

Ultimate Tensile Strength: This is the maximum stress the bolt can withstand while being stretched or pulled before failing or breaking.

Failure (Fracture Point or Tensile Point): This is where the bolt breaks or fractures. It's the stress point at which the bolt can no longer withstand the force applied to it and fails catastrophically.

The vertical axis of the graph represents "Stress," which can be understood as the internal forces within the bolt material resisting deformation. This stress is typically measured in units such as Pascals (Pa) or Megapascals (MPa). The horizontal axis represents "Strain," a measure of deformation representing the displacement between particles in the material body. 
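As a purely illustrative sketch of these definitions, the quantities above can be computed directly. Every number below is an assumption for demonstration, not a value taken from the dataset or any engineering standard:

```python
import math

# Illustrative only: all inputs are assumed values for demonstration.
force_n = 20_000.0     # applied tensile force [N]
diameter_m = 0.010     # nominal bolt diameter [m]
area_m2 = math.pi * (diameter_m / 2) ** 2

stress_pa = force_n / area_m2      # stress = force / cross-sectional area
stress_mpa = stress_pa / 1e6       # 1 MPa = 1e6 Pa

orig_len_m = 0.050     # original grip length [m]
delta_len_m = 0.0001   # elongation under load [m]
strain = delta_len_m / orig_len_m  # strain = change in length / original length

yield_mpa = 640.0                  # assumed yield strength (e.g. a class 8.8 bolt)
proof_mpa = 0.90 * yield_mpa       # proof load, taken here at 90% of yield
clamp_mpa = 0.75 * proof_mpa       # typical clamp load at 75% of proof

print(f"stress = {stress_mpa:.1f} MPa, strain = {strain:.4f}")
print(f"proof = {proof_mpa:.0f} MPa, clamp = {clamp_mpa:.0f} MPa")
```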

The dataset in question provides information about a particular component or part. Specifically, this dataset includes data on whether the part, when subjected to its typical or designated operating conditions, has experienced failure.

1. UDI: This column is a unique identifier for each record in the dataframe, with values ranging from 1 to 10,000 based on the shown records.

2. Product ID: This contains alphanumeric codes that represent individual product or part IDs.

3. Type: A categorical value representing the type of the product. In the displayed records, types such as "M" and "L" are visible, but there could be more types in the complete dataframe.

4. Air temperature [K]: Represents the air temperature in Kelvin. This refers to the ambient temperature when the product was tested or used.

5. Process temperature [K]: Similarly, this indicates the temperature of the process, again in Kelvin. This is the operational temperature when the product is in use.

6. Rotational speed [rpm]: This denotes the speed at which the product or component rotates, measured in revolutions per minute (rpm).

7. Torque [Nm]: Indicates the torque applied, measured in Newton-meters (Nm). Torque is a measure of force that can cause an object to rotate about an axis.

8. Tool wear [min]: Specifies the wear time of the tool in minutes. This could indicate how long the tool has been used or its wear and tear over time.

9. Target: This column contains binary values (0 or 1) indicating whether the sample experienced a failure; it is the outcome variable the models will predict.

10. Failure Type: Describes the result of some testing or usage, indicating if the product or part failed. In the shown records, the value is "No Failure" for all, suggesting these products did not experience any faults or breakdowns during the testing or operation.

Based on the displayed records, the dataframe is tracking the performance of products or components under certain conditions, with a focus on parameters like temperature, speed, torque, and tool wear.
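As a sketch, a dataframe with this schema could be constructed as follows. The sample values are illustrative stand-ins, not rows quoted from the dataset, which holds 10,000 records:

```python
import pandas as pd

# Minimal frame mirroring the dataset's schema; values are illustrative.
df = pd.DataFrame({
    "UDI": [1, 2, 3],
    "Product ID": ["M14860", "L47181", "L47182"],
    "Type": ["M", "L", "L"],
    "Air temperature [K]": [298.1, 298.2, 298.1],
    "Process temperature [K]": [308.6, 308.7, 308.5],
    "Rotational speed [rpm]": [1551, 1408, 1498],
    "Torque [Nm]": [42.8, 46.3, 49.4],
    "Tool wear [min]": [0, 3, 5],
    "Target": [0, 0, 0],
    "Failure Type": ["No Failure"] * 3,
})
print(df.shape)
```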

This line of code utilizes the pandas library's isnull() method on the dataframe 'df'. The method isnull() returns a dataframe of the same shape as 'df' but with boolean values indicating if a value is missing (True) or not (False). Chaining this with the sum() method gives a sum of the True values (missing data) for each column, effectively counting the number of missing values per column.
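A minimal sketch of this check, on a toy dataframe with one deliberately missing value:

```python
import numpy as np
import pandas as pd

# Toy frame with one deliberately missing torque value.
df = pd.DataFrame({
    "Torque [Nm]": [42.8, np.nan, 49.4],
    "Tool wear [min]": [0, 3, 5],
})

# isnull() marks each cell True/False; sum() counts the Trues per column.
missing = df.isnull().sum()
print(missing)
```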

Each column from the dataframe 'df' is listed with the corresponding count of missing values next to it. As we can observe, all columns have a count of 0, which indicates that there are no missing values in any of the columns.

This command is taking advantage of the 'describe' function, which gives us a detailed summary of our dataframe. The 'transpose' function is then used to make the display more reader-friendly by swapping rows and columns.
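The command can be sketched as follows, again on toy data:

```python
import pandas as pd

# Toy values; the real dataframe has 10,000 rows per column.
df = pd.DataFrame({
    "Air temperature [K]": [295.3, 300.0, 304.5],
    "Torque [Nm]": [3.8, 40.0, 76.6],
})

# describe() summarizes each column; transpose() puts one feature per row,
# which is easier to read when there are many columns.
summary = df.describe().transpose()
print(summary[["count", "mean", "min", "max"]])
```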

For 'Air temperature [K]', the mean is approximately 300 K with a standard deviation of about 2 K. The minimum and maximum values are 295.3 K and 304.5 K respectively.

What's notable is that all columns have 10,000 entries, ensuring consistency across the dataset. The various mean, standard deviation, and percentile values give us a clear understanding of the distribution and spread of data within each column.

This summary gives us a snapshot of our dataset, allowing us to understand its characteristics and distribution at a glance. 

We have an output from a Python code snippet that calculates the correlation of various columns with a specific column named 'Target' from a dataset 'df'. 

This command computes the pairwise correlation of columns in the dataframe with the 'Target' column, then sorts the correlation values for easier interpretation.
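The computation can be sketched as follows (toy values; the real dataframe has 10,000 rows):

```python
import pandas as pd

# Toy stand-in data for illustration only.
df = pd.DataFrame({
    "Torque [Nm]": [30, 45, 60, 75],
    "Rotational speed [rpm]": [1700, 1500, 1400, 1300],
    "Target": [0, 0, 1, 1],
})

# Correlation of every numeric column with 'Target', sorted ascending.
corr_with_target = df.corr()["Target"].sort_values()
print(corr_with_target)
```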

These correlation values provide insight into the linear relationship between each variable and our target. Positive values indicate a direct relationship, while negative values indicate an inverse relationship. The magnitude of these values—how close they are to -1 or 1—tells us about the strength of these relationships.

NOTE: This is the initial EDA of the raw data input of all numerical values to show how feature selection affects the prediction capability of the model. A more advanced algorithm for data processing and feature engineering will be used later to improve the evaluation and validation metrics.

Supplementing our earlier discussion on the correlation values, we have a bar chart visualizing these correlations with the 'Target'.

The chart titled "Correlation with Target" displays the correlation values on the y-axis, ranging from about -0.05 to 0.20, and the various features on the x-axis.

By glancing at this chart, it's immediately evident that 'Torque [Nm]' exhibits the strongest linear relationship with the 'Target', followed by 'Tool wear [min]'. The visual representation aids in quickly grasping the relative strengths of the correlations, supplementing the numerical data shared earlier.
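A chart of this kind can be sketched with matplotlib (toy data below; the actual chart covers all features of the dataset):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in data; the real chart is built from the full dataframe.
df = pd.DataFrame({
    "Torque [Nm]": [30, 45, 60, 75],
    "Rotational speed [rpm]": [1700, 1500, 1400, 1300],
    "Target": [0, 0, 1, 1],
})

# Correlation with 'Target', excluding 'Target' itself, one bar per feature.
corr = df.corr()["Target"].drop("Target").sort_values()

fig, ax = plt.subplots()
corr.plot.bar(ax=ax)
ax.set_title("Correlation with Target")
ax.set_ylabel("correlation")
fig.tight_layout()
fig.savefig("correlation_with_target.png")
```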

Here we have a heatmap that visualizes the correlation matrix among different parameters.

This heatmap is color-coded, ranging from deep purple, representing negative correlations, through white, indicating no correlation, to bright yellow for positive correlations. The color intensity is directly proportional to the strength of the correlation, and each cell in the heatmap provides the exact correlation coefficient between two parameters.

A couple of important observations to make:

1. The diagonal from the top left to bottom right is uniformly bright yellow, representing a perfect correlation of 1. This is expected, as any parameter would have a perfect correlation with itself.

2. When looking at the interactions with 'Torque [Nm]', it is evident that its strongest positive correlation is with 'Tool wear [min]' at 0.19, which stands out notably in a sea of otherwise muted colors.

3. The correlation coefficient between 'Air temperature [K]' and 'Process temperature [K]' is -0.88. This indicates a strong negative correlation between these two parameters. 

For further insights, the aforementioned correlation between 'Torque [Nm]' and 'Tool wear [min]' will be visualized to delve deeper into its nature and significance. This visualization will be pivotal in understanding how these two parameters interact and potentially influence one another.

This heatmap provides a comprehensive visual summary of the relationships between the various parameters, making it an invaluable tool in our data exploration and analysis.

While the heatmap highlights a strong negative correlation of -0.88 between 'Air temperature [K]' and 'Process temperature [K]', it's essential to pivot our attention to the context provided by the bar chart. The bar chart distinctly demonstrates that 'Torque [Nm]' has the highest correlation with our target variable. Thus, despite the compelling correlation between air and process temperatures, 'Torque [Nm]' remains the most significant parameter in terms of its relationship with the target.
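A heatmap along these lines can be sketched with matplotlib's imshow (seaborn's heatmap is a common alternative). The values below are illustrative and do not reproduce the dataset's actual coefficients:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative values only; they do not reproduce the dataset's coefficients.
df = pd.DataFrame({
    "Air temperature [K]": [295.3, 298.0, 300.1, 304.5],
    "Process temperature [K]": [305.7, 308.2, 310.0, 313.8],
    "Torque [Nm]": [30.0, 45.0, 60.0, 75.0],
    "Target": [0, 0, 1, 1],
})
corr = df.corr()

fig, ax = plt.subplots()
im = ax.imshow(corr, cmap="viridis", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr)))
ax.set_yticklabels(corr.columns)
# Annotate each cell with its exact coefficient, as in the heatmap described.
for i in range(len(corr)):
    for j in range(len(corr)):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
fig.colorbar(im, ax=ax)
fig.tight_layout()
```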

The visualization clearly illustrates that we have an imbalanced dataset. The majority of the samples fall into the '0' category, indicating that they didn't fail. In contrast, a significantly smaller portion of samples is labeled as '1', representing failures. 

This disparity highlights the imbalance between the two classes, with non-failures being predominant.

Such imbalance can lead to challenges in modeling as the models might be biased towards the majority class, potentially compromising their performance on the minority class.

Addressing this imbalance will be crucial for building reliable predictive models.
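One common remedy, sketched below on toy data, is to upsample the minority class with replacement. This is an illustration of the idea, not necessarily the technique applied later in the study:

```python
import pandas as pd

# Toy imbalanced data: 8 non-failures, 2 failures.
df = pd.DataFrame({"Torque [Nm]": range(10), "Target": [0] * 8 + [1] * 2})

counts = df["Target"].value_counts()
print(counts)  # the majority class 0 dominates

# Upsample the minority class with replacement until the classes match.
minority = df[df["Target"] == 1]
upsampled = pd.concat(
    [df, minority.sample(n=counts.loc[0] - counts.loc[1], replace=True,
                         random_state=42)],
    ignore_index=True,
)
print(upsampled["Target"].value_counts())  # now balanced
```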

This visualization presents a histogram of torque values in Newton-meters (Nm). The x-axis represents the torque values, ranging from approximately 0 to a little over 70 Nm. The y-axis displays the count or frequency of each torque value.

This visualization showcases a histogram representing the distribution of process temperatures measured in Kelvin (K). The x-axis represents the process temperature values, spanning from roughly 306 K to just above 313 K. The y-axis denotes the count or frequency of each temperature value.

The scatter plot visualizes the relationship between the process temperature (in Kelvin) and torque (in Nm) for different samples. The blue dots represent samples that succeeded, while the orange dots indicate those that failed.

From the scatter plot, we observe the following:

In general, the data suggests that, regardless of the process temperature, there are instances where failures occur at both high and low torque values.

The scatter plot depicts the relationship between tool wear (in minutes) and torque (in Nm). Here, blue dots represent successful samples, whereas orange dots indicate failures.

The key observations are as follows:

1. Most samples cluster predominantly in the middle torque range, primarily between 30 Nm to 60 Nm, across varying degrees of tool wear.

2. As the tool wear time increases, moving towards the right side of the graph, there's a noticeable concentration of failure points (orange dots). This is particularly evident in the region beyond 150 minutes of tool wear.

3. Importantly, in the latter stages of tool wear (approaching the end of life for the parts), there is a distinct spread of failures across various torque levels. Failures are observed from the high torque values down to the lower torque values, suggesting that as tools age and wear down, they become increasingly susceptible to failures across a broader range of torque applications.

4. This aligns with the notion that older parts, when subjected to both high and low torques, tend to fail more frequently.

The data emphasizes that as the tools approach their end of life (represented by higher wear times), there is a higher likelihood of failures across a spectrum of torques, with a prominent concentration of failures on the right side of the graph.
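A scatter plot of this kind can be sketched as follows, with toy values standing in for the dataset's columns:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd

# Toy values standing in for the dataset's columns.
df = pd.DataFrame({
    "Tool wear [min]": [10, 60, 120, 180, 200, 220],
    "Torque [Nm]": [45, 50, 40, 70, 25, 65],
    "Target": [0, 0, 0, 1, 1, 1],
})

fig, ax = plt.subplots()
# One scatter call per class: blue for successes, orange for failures.
for target, color, label in [(0, "tab:blue", "no failure"),
                             (1, "tab:orange", "failure")]:
    sub = df[df["Target"] == target]
    ax.scatter(sub["Tool wear [min]"], sub["Torque [Nm]"], c=color, label=label)
ax.set_xlabel("Tool wear [min]")
ax.set_ylabel("Torque [Nm]")
ax.legend()
fig.savefig("wear_vs_torque.png")
```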

The plot presented here appears to be a series of vertical box plots aligned sequentially according to the tool wear time (in minutes). The y-axis represents torque values in Nm. The color coding, with blue and orange, represents successful samples and failures, respectively.

Key observations from the plot are as follows:

1. Across most of the tool wear times, there is a clear middle region (corresponding to mid-range torque values) where blue dominates, indicating successful outcomes. This suggests that when torque is applied within this range, tools tend to perform without failure.

2. In contrast, at both the higher and lower extremes of the torque range, orange (indicative of failures) appears more frequently. This is consistent with the previous scatter plot, reinforcing the idea that applying either too high or too low torques tends to result in tool failure.

3. The presence of orange boxes in the mid-torque range, amidst predominantly successful blue samples, implies instances where parts failed despite being subjected to optimal torque values. Such occurrences may point towards inherent manufacturing defects or other factors not directly related to the torque that caused the part to fail.

4. Towards the right end of the plot, which likely corresponds to tools with higher wear times, there's an increased density and frequency of failures across all torque levels. This aligns with the previous observation that tools nearing their end of life are more susceptible to failures.

This visualization further supports the notion that tools are most vulnerable to failures when subjected to extreme torque values. However, exceptions exist, possibly due to manufacturing defects, where parts can fail even when optimal torque is applied. As tools age, their resilience diminishes, making them more prone to failure across a broader range of torque applications.

The scatter plot provides insights into the relationship between torque, process temperature, and the occurrence of failures across different types of parts. The x-axis represents process temperature in Kelvin, while the y-axis depicts torque in Nm. Data points are color-coded and shaped based on the type of part and whether it was a failure (Target 1) or success (Target 0).

Key observations and interpretations from the plot include:

1. Torque's Role in Failures: Across all temperature ranges, there's a notable concentration of failures (Target 1) in the higher torque region. This suggests that high torque values are a prominent factor in the occurrence of failures, especially when combined with elevated temperatures.

2. Temperature's Influence: Failures appear to be more frequent in the higher temperature range, especially when combined with high torque. This combination of high torque and temperature appears to be a critical zone where the likelihood of failure is enhanced.

3. Role of Part Type: The color and shape distribution of the data points indicate that failures and successes are spread fairly evenly across different types of parts (Type M, Type L, Type H). This observation underscores the claim that the type of part doesn't have a substantial impact on predicting failures. The failures are largely driven by torque and temperature rather than the inherent characteristics of the bolt types.

4. High Torque and Temperature Failures: Irrespective of the part type, it's evident that the combination of high torque values and elevated process temperatures is a significant risk factor for failures. This zone can be considered as a critical threshold beyond which the likelihood of failures surges.

To summarize, the plot reinforces the hypothesis that the primary determinants of failure are high torque values and elevated temperatures. In contrast, the type of part, whether it's Type M, L, or H, doesn't seem to have a pronounced influence on the occurrence of failures. As such, focusing on monitoring and regulating torque and temperature values during the manufacturing or assembly process might be pivotal in minimizing failures.

In line with our previous discussions, the scatter plot once again illustrates the relationship between rotational speed (rpm) and torque (Nm) with respect to the failure and success of parts. Different colors in the plot represent the status of the parts, where blue indicates no failure (Target 0) and orange indicates failure (Target 1).

Close examination of the plot reveals a clear concentration of failures at the extremes of the torque range:

The middle section of the torque spectrum, where torque values are moderate, predominantly features successful instances (blue dots), underscoring that parts in this range tend to be more resilient or reliable.

This visualization reinforces our understanding that parts exhibit susceptibility to failure at both high and low torque extremes. The failure at high torque values might be attributed to the excessive stress or strain imposed on the parts, while the failure at lower torques might be indicative of insufficient force or sub-optimal operational conditions, among other possible factors. 

This plot resonates with our prior observations, emphasizing the criticality of torque extremes in influencing part failures and underscoring the need for meticulous torque management to enhance part reliability and performance.

This scatter plot provides a detailed visualization focusing on the relationship between the type of parts ('M', 'L', 'H') and the applied torque values (Nm), coupled with the failure occurrences.

The visualization effectively captures the variability in failure occurrences across different part types and torque values, providing a nuanced understanding of the factors contributing to part reliability and failure.

The scatter plot illustrates the relationship between different types of failures and the torque values applied. On the x-axis, various failure types, including "No Failure," "Power Failure," "Tool Wear Failure," "Overstrain Failure," "Random Failures," and "Heat Dissipation Failure," are depicted. The y-axis represents the torque values in Newton-meters (Nm), ranging from low to high values.

Here’s a summarized observation of the data presented:

The scatter plot reveals that higher torque values are associated with specific types of failures such as Power Failure, Tool Wear Failure, Overstrain Failure, and Heat Dissipation Failure. It supports the understanding that torque plays a significant role in the occurrence of various failure types, underscoring the necessity to manage and optimize torque levels to mitigate the risk of failure.

Machine Failure Label: Indicates whether the machine has failed in any of five independent failure modes (Power Failure, Tool Wear Failure, Overstrain Failure, Random Failures, and Heat Dissipation Failure).

EFFECT OF FAILURE TYPE INTERACTION

Permutation shuffling

Using an available automated machine learning tool, the graph presents a correlation matrix that highlights significant correlations between the different types of failures and the operational parameters after applying a permutation shuffling algorithm; these correlations were not initially detected with the standard correlation function in pandas.

In this matrix, the correlation coefficients represent direct relationships between the types of failures and parameters such as air temperature, rotational speed, torque, and tool wear. For example, Heat Dissipation Failure shows a significant positive correlation with the target variable at 0.57, directly indicating an increase in the target variable with an increase in Heat Dissipation Failure. On the other hand, No Failure exhibits a strong negative correlation with the target variable at -0.96, indicating that the absence of failure corresponds to a lower value of the target variable.

The permutation shuffling has evidently revealed these strong correlations which the preliminary analysis may have missed. This technique enhances the reliability of the feature importance assessment, confirming that certain operational parameters are indeed significantly correlated with the failure types and hence hold predictive power.
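One way such failure-type correlations can surface is by one-hot encoding the categorical 'Failure Type' column so that each failure mode becomes a numeric 0/1 indicator before computing correlations. The sketch below assumes this encoding step, which the source does not spell out, and uses toy data:

```python
import pandas as pd

# Toy data; the real column has six failure categories over 10,000 rows.
df = pd.DataFrame({
    "Failure Type": ["No Failure", "No Failure", "Heat Dissipation Failure",
                     "Power Failure", "No Failure", "Heat Dissipation Failure"],
    "Target": [0, 0, 1, 1, 0, 1],
})

# One-hot encode the categorical column so each failure mode becomes a
# 0/1 indicator that df.corr() can work with.
dummies = pd.get_dummies(df["Failure Type"], dtype=int)
encoded = pd.concat([dummies, df["Target"]], axis=1)
corr = encoded.corr()["Target"].sort_values()
print(corr)
```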

The table displays the results of computing feature importance through permutation shuffling. This method involves randomly shuffling each predictor variable in the model and computing the change in the model's accuracy. The importance metric reflects the degree to which a predictive model relies on the feature.

The first column, 'importance', represents the average decrease in model performance (such as accuracy) when the feature's values are shuffled. The 'stddev' column indicates the standard deviation of the importance metric across multiple shuffling iterations, reflecting variability in the importance measure.

'Failure Type' emerges as the most important feature with a score of approximately 0.955, and its importance is consistent across iterations given the low standard deviation of 0.013. The p-value for 'Failure Type' is extremely low (about 4.93e-09), demonstrating that the importance of this feature is statistically significant.

The 'Tool wear [min]' and 'Torque [Nm]' features also show significant importance with scores around 0.029 and 0.025, respectively, and have low p-values (1.43e-04 for Tool wear and 1.04e-03 for Torque), indicating statistical significance.

'UDI', 'Air temperature [K]', and 'Process temperature [K]' have importance scores of roughly 0.019 to 0.020, with corresponding low p-values, suggesting their significance is not by chance.

'Rotational speed [rpm]' and 'Type' have the lowest importance scores at approximately 0.019 and 0.011, yet they still hold statistically significant p-values.

The 'n' column indicates the number of permutations, which is 5 for all features, implying that the importance score is based on 5 separate shuffling iterations.

The last two columns, 'p99_high' and 'p99_low', provide the 99th percentile confidence interval for the importance score, suggesting that the true importance of the features is highly likely to fall within these ranges.

In summary, 'Failure Type' is the most important feature by a substantial margin, with other operational parameters also demonstrating significant importance in the model after applying permutation shuffling. The low p-values across all features confirm the statistical significance of these findings.
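Permutation importance of this kind can be computed with scikit-learn's permutation_importance. The sketch below runs on synthetic stand-in data with labels driven mostly by torque and tool wear, echoing the EDA findings; the real study uses the dataset's own columns and model:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: failures driven mostly by torque and tool wear.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "Torque [Nm]": rng.normal(40, 10, n),
    "Tool wear [min]": rng.uniform(0, 250, n),
    "Rotational speed [rpm]": rng.normal(1500, 100, n),
})
y = ((X["Torque [Nm]"] > 55) | (X["Tool wear [min]"] > 200)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# n_repeats=5 mirrors the 'n' column in the table: five shuffling iterations.
result = permutation_importance(model, X_test, y_test, n_repeats=5,
                                random_state=0)
for name, mean, std in zip(X.columns, result.importances_mean,
                           result.importances_std):
    print(f"{name:25s} importance={mean:.3f} +/- {std:.3f}")
```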