RELIABILITY ANALYSIS

A continuing discussion of how the existing dataset is analyzed for predictive maintenance and used to assess the risk of failure.

PATH FORWARD

To proceed with creating a Cox Proportional Hazards Model and Kaplan-Meier estimators, we'll need to make some assumptions or decisions:


Time-to-event data: For survival analysis, we need a 'time' until an event (failure in this case) occurs. This could be inferred from the 'Tool wear [min]' column if we assume it reflects the operational time until the observation was made or until failure occurred. However, this assumption might not fully capture the actual 'time to failure' if the tool wear does not directly correlate with failure time or if failures can occur independently of tool wear.

Censoring: The 'Target' column can help us identify if an observation is censored (no failure observed) or if an event (failure) has occurred. A value of 0 in the 'Target' column indicates censoring, whereas 1 indicates an event.


Given these considerations, we'll proceed with the analysis by assuming 'Tool wear [min]' as the time variable and 'Target' to indicate event occurrence. We'll use the Cox Proportional Hazards Model to assess the impact of various factors on the likelihood of failure and the Kaplan-Meier estimator to estimate survival functions. Let's start with the Kaplan-Meier estimator to get a basic understanding of survival probabilities, and then move on to the Cox Proportional Hazards Model for a more detailed analysis.


ASSUMPTIONS

To compute the failure rate per failure type for each component type "L," "M," and "H" using reliability engineering principles, we count the failures of each type within each component type and normalize by the total observed operating time.

RISK ASSESSMENT OF FAILURE

Kaplan-Meier Estimator Output:

The plot at the bottom represents the Kaplan-Meier survival curve. This curve shows the probability of survival over time based on the provided dataset. The x-axis ('timeline') represents the time, which in this case could be the 'Tool wear [min]' from the dataset. The y-axis shows the survival probability. The survival probability starts at 1 (or 100%) and drops as time increases, indicating the occurrence of failures over time. The steps in the curve represent the points in time where events (failures) occur. The shaded area around the curve represents the confidence interval, providing a visual indication of the uncertainty around the survival estimates.


Cox Proportional Hazards Model Output:

The upper tables show the coefficients from the Cox model. Each row represents a covariate (predictor variable) included in the model, which in this case are 'Air temperature [K]', 'Process temperature [K]', 'Rotational speed [rpm]', and 'Torque [Nm]'.


coef: The estimated coefficient for each covariate. A positive coefficient suggests that as the covariate increases, the hazard (risk of failure) increases, whereas a negative coefficient suggests the opposite.


exp(coef): The exponentiation of the coefficient, which can be interpreted as the hazard ratio. For example, a hazard ratio of 1.05 for 'Air temperature [K]' would mean that for each one-unit increase in air temperature, the hazard of failure increases by 5%.


se(coef): The standard error of the coefficient estimate. It measures the statistical accuracy of the coefficient estimate.

coef lower 95% & upper 95%: The 95% confidence interval for the coefficient. If this interval does not include 0, it suggests that the covariate has a statistically significant effect on the hazard rate.


z: The z-statistic, which is the coefficient divided by its standard error. It is used to test the null hypothesis that the coefficient is equal to zero (no effect).


p: The p-value corresponding to the z-statistic. A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so you would reject the null hypothesis and infer that the covariate has a significant effect on the hazard rate.


-log2(p): This is the negative log base 2 of the p-value. It's another way to represent the p-value where higher values indicate stronger evidence against the null hypothesis.


From this output, we can conclude which factors are statistically significant predictors of the time to failure in this dataset. For example, if any of the p-values are below 0.05, those covariates would be considered significant predictors of failure. 


Based on the results above for the Cox Proportional Hazards Model, here's the interpretation of the results for the Torque [Nm] covariate:


coef (coefficient): The coefficient for Torque [Nm] is approximately 0.199823. This value indicates the change in the log-hazard for a one-unit increase in torque. Since the coefficient is positive, it suggests that higher torque is associated with an increased hazard of failure.


exp(coef) (hazard ratio): The hazard ratio for Torque [Nm] is about 1.221187. This means that for each additional Newton meter of torque, the hazard (or risk) of failure increases by approximately 22.12%. A hazard ratio greater than 1 indicates an increased hazard as the covariate increases.

se(coef) (standard error): The standard error of the coefficient for Torque [Nm] is 0.007193. This measures the variability or precision of the coefficient estimate; a smaller standard error suggests a more precise estimate.


95% confidence interval: The lower and upper bounds of the 95% confidence interval for the coefficient of Torque [Nm] are 0.185725 and 0.213921, respectively. Since this interval does not contain zero, we can conclude that the effect of torque on the hazard is statistically significant at the 5% significance level.


z (z-score): The z-score for Torque [Nm] is 27.779839. This is a measure of how many standard deviations the coefficient is from zero. A high absolute value of the z-score indicates that the result is statistically significant.


p (p-value): The p-value for Torque [Nm] is extremely small (7.601584e-170), which is effectively zero for all practical purposes. This indicates very strong evidence against the null hypothesis of no association; hence, we reject the null hypothesis and conclude that torque is significantly associated with the hazard rate.


-log2(p): The negative log base 2 of the p-value is a way to represent the p-value on a log scale, and for Torque [Nm] it is very large, reinforcing the finding of a statistically significant result.


Torque [Nm] is a statistically significant predictor of failure time in this model, with higher torque associated with an increased risk of failure.

FAILURE RATE PER 1000 MINUTES

In reliability analysis, the failure rate is a crucial metric that defines the frequency at which an engineered system or component fails, expressed in failures per time unit. In this context, the failure rates provided per 1000 minutes are a direct measure of each component type's reliability: the lower the failure rate, the higher the reliability.

Based on the failure rates provided, we can suggest a priority table where we prioritize addressing the failure types with the highest rates first, as these are the most common and thus present the greatest reliability concern. The prioritization is done for each component type "H," "L," and "M."
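The rate computation behind this prioritization can be sketched in pandas (a toy frame with assumed column names; rate = failures / total observed minutes × 1000, grouped per component type):

```python
import pandas as pd

# Toy records; column names mirror the dataset described above
df = pd.DataFrame({
    "Type": ["L", "L", "L", "M", "M", "H"],
    "Failure Type": ["Overstrain Failure", "No Failure", "Overstrain Failure",
                     "Power Failure", "No Failure", "Heat Dissipation Failure"],
    "Tool wear [min]": [200, 150, 180, 120, 90, 210],
})

def rate_per_1000(group):
    """Failures per 1000 minutes of observed operating time."""
    failures = (group["Failure Type"] != "No Failure").sum()
    return 1000.0 * failures / group["Tool wear [min]"].sum()

rates = df.groupby("Type").apply(rate_per_1000).sort_values(ascending=False)
print(rates)
```

Grouping by both "Type" and "Failure Type" instead yields the per-failure-type rates used in the priority tables.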

Suggested Priority Table for Component Type "H":

Suggested Priority Table for Component Type "L":

Suggested Priority Table for Component Type "M":

In the reliability analysis, these failure rates can inform maintenance schedules, predict the need for replacements, and guide design improvements. The components with the highest failure rates should receive the most attention in terms of preventive maintenance and design review. The goal is to reduce these rates over time through improvements in design, material selection, manufacturing processes, and maintenance practices.

The table ranks the failure types from the highest to the lowest rate of occurrence for each component type. This suggests that for component type "H," the most critical issue to address is Heat Dissipation Failure, while for type "L," Overstrain Failure is most critical, and for type "M," Power Failure is the top priority. Addressing these issues should lead to improved reliability and extended operational life for each component type.


SURVIVAL ANALYSIS PER FAILURE TYPE (COX-KM)

HEAT DISSIPATION FAILURE

fitted with 10000 total observations, 9888 right-censored observations

POWER FAILURE

fitted with 10000 total observations, 9905 right-censored observations

TOOL WEAR FAILURE

fitted with 10000 total observations, 9955 right-censored observations

OVERSTRAIN FAILURE

fitted with 10000 total observations, 9922 right-censored observations

SURVIVAL ANALYSIS

To see the pattern across all failure types, we plotted the survival functions of all samples regardless of machine type or failure type.

Plotting the Survival Probability with 95% confidence

Having these functions, we also want to compare the survival probabilities of the three machine types against time, each with its corresponding 95% confidence interval.

Restricted Mean Survival Time (RMST): the average survival time within a specified timeframe. A lower RMST for a group indicates more failures occurring within that timeframe. Here, we want to find the RMST of each failure type and estimate the approximate time before a failure occurs. Note that the control group is the baseline, while the exp group is the experimental group, which in our case is typically the failure type.


For Tool Wear Failure:

For Power Failure:

For Overstrain Failure:

For Heat Dissipation Failure:

For No Failure: We also want to examine the survival of all non-failing samples so that we can service them before they fail.

KAPLAN-MEIER: We then compare the RMST against the univariate prediction, whose slopes are nearly identical, and proceed to inspect the Cox proportional hazards model for a multivariate analysis.

In these plots, we want to see how each machine type survives over time, so we aggregated the samples per type and compared the survival functions. A label of 0 indicates samples belonging to 'No Failure', and 1 indicates samples of all types that experienced failure, including random failures.

RELIABILITY FUNCTION PER MACHINE TYPE (0.1 PENALIZER), COX PROPORTIONAL HAZARDS MODEL

[Three reliability plots, one per machine type; y-axis: 'Useful Life Probability']

These plots will show the coefficients for each covariate in both models, allowing for a visual comparison of how the inclusion of the interaction term (Rotational_speed_rpm * Torque_Nm) affects the magnitude and direction of the covariates' effects on the hazard rate. Note that interpreting the coefficients requires understanding that they represent the log hazard ratios: positive values indicate an increase in the hazard rate (decreased survival), and negative values indicate a decrease in the hazard rate (increased survival) for each unit increase in the covariate.

SAMPLE INTERACTION OF AIR-TEMPERATURE AND PROCESS TEMPERATURE

ACCELERATED FAILURE TIME MODELS

In the previous discussions, we noted that each 'Type' and 'Failure Type' has a different survival function with different covariates. However, we used only the Cox hazard model, in which we assumed that the effects of the covariates are proportional. Suppose we have two groups, 'Type H' and 'Type L', with different survival functions related by some acceleration factor. This can be interpreted as slowing down or speeding up movement along the survival function, just as presented in the plots above. AFT models accelerate or decelerate failure times depending on the subjects' covariates plus an unknown parameter. The model we use is the Weibull AFT, discussed in this section.

PREDICTION POWER BY AGGREGATION

All along, we have been discussing the survival functions of our samples and examining relationships between features to check for indications of correlation in our prediction. However, one more important question remains: does the 'Type' of machine influence the predictability of the outcome? Are there relationships we can deduce from our samples, both censored and uncensored? This matters in real-world scenarios: for example, if two types of machines are installed beside each other under the same operating conditions, we want to check whether the outcome for one machine type is isolated from the outcome of the other. The same applies when two bolts of different types are installed in a closed environment, both subjected to the same operating conditions but with different torque requirements. In essence, how do we measure the relevance of this aggregation by type? If we can determine that the types have no direct impact on one another, can we study the survival function independently by type? Can we improve the predictability of the model?

Initial test is to check if the samples are unique on each type:

Machine Type: Type_H

Total Samples: 21

All records are unique.


Machine Type: Type_M

Total Samples: 83

All records are unique.


Machine Type: Type_L

Total Samples: 235

All records are unique.


However, when we checked the whole data frame regardless of type, we found four samples out of 10,000 where the operating condition is the same at varying tool wear times. This 'Tool_wear_min' can be used to determine whether there is an indication of predictability by type at time (t), given that all the duplicate samples are censored.

Next, to further streamline our dataset, we use the variance inflation factor (VIF) to check for collinearity among the variables in our DataFrame. High collinearity means that one or more of our covariates are highly correlated with each other, and VIF is a common measure for quantifying it: a VIF of 1 indicates no correlation between a given variable and the others, values between 1 and 5 suggest moderate correlation, and values greater than 5 or 10 are often taken as indicators of high collinearity. We then drop 'Type' and 'Failure Type'; in this case we are dropping previously one-hot-encoded category columns, which exhibit perfect or near-perfect multicollinearity, known as the "dummy variable trap." The dropped category serves as the reference category against which the others are compared. From this result, we already suspect that the prediction might not be related to 'Type' and 'Failure Type', but we need to prove it statistically by using only the covariates in blue text against Tool_wear_min.


Air_temperature_K                         4.453922

Process_temperature_K                     4.381316

Rotational_speed_rpm                      5.173437

Torque_Nm                                 5.238022

Tool_wear_min                             1.035109


ENCODED VARIABLES    

Type_H                                         inf

Type_L                                         inf

Type_M                                         inf

Failure_Type_Heat Dissipation Failure          inf

Failure_Type_No Failure                        inf

Failure_Type_Overstrain Failure                inf

Failure_Type_Power Failure                     inf

Failure_Type_Random Failures                   inf

Failure_Type_Tool Wear Failure                 inf

COXPHFITTER: 

coef: The coefficient estimate, which measures the log hazard ratio associated with a one-unit increase in the covariate.

exp(coef): The exponential of the coefficient, providing the hazard ratio. A value above 1 indicates an increased hazard rate with each unit increase in the covariate, while a value below 1 indicates a decreased hazard rate.

se(coef): The standard error of the coefficient estimate, measuring the variability or uncertainty of the coefficient estimate.

coef lower 95% and coef upper 95%: The lower and upper bounds of the 95% confidence interval for the coefficient.

exp(coef) lower 95% and exp(coef) upper 95%: The lower and upper bounds of the 95% confidence interval for the hazard ratio.

z: The z-score, calculated as the coefficient divided by its standard error. A higher absolute value indicates a stronger relationship between the covariate and the hazard.

p: The p-value associated with the z-score, testing the null hypothesis that the coefficient is equal to zero (no effect). A p-value below a threshold (e.g., 0.05) indicates statistical significance.

-log2(p): The negative log base 2 of the p-value, providing another way to measure the strength of evidence against the null hypothesis. Higher values indicate stronger evidence.


Model Diagnostics:

Concordance: A measure of the model's predictive accuracy, ranging from 0 to 1, with higher values indicating better prediction. Here, 0.92 suggests excellent predictive ability.

Partial AIC: The Akaike Information Criterion, adjusted for partial likelihood, used for model comparison. Lower values indicate a better-fitting model.

Log-likelihood ratio test: Provides the test statistic and degrees of freedom (df) for testing the model against a null model without any covariates. Higher values indicate a more significant improvement over the null model.

-log2(p) of ll-ratio test: Measures the strength of evidence against the null model, similar to the -log2(p) for individual coefficients.

Based on the concordance index and the -log2(p) of the log-likelihood ratio test, we can partially conclude that the model can predict the outcome without using the covariates 'Type' and 'Failure Type'. But recall that our earlier initial deep learning model overfit when the dimensionality of 'Failure Type' was reduced, even though we used many resampling techniques to deal with the imbalanced data. So how do we determine which model correctly captures the predictive power of using the groups or clusters? Or are the covariates 'Type' and 'Failure Type' leaking data into the prediction power of our model?


In the next exercise, we need to recreate our dataframe and do a label encoding to check the prediction power of our clusters. Below are the results for 'Type' and 'Failure Type' clusters with high Concordance index value.


C-index: 0.9053428040681188

Stratified C-index: 0.906535995009586

Below are the results for 'Failure Type' cluster.


C-index: 0.9147839036262405

Stratified C-index: 0.8601209314340204


Combining Both Failure Type and Type clusters: which aimed at performing a kind of permutation test on our 'Type' and 'Failure Type' covariates. The basic idea is to assess if the C-index of our model is significantly different compared to C-indices obtained by fitting models on data where the relationship between those covariates and the outcome is randomized.


Original C-index: 0.9033732012628026

Average Randomized C-index: 0.9155159564327963

The values below were obtained after stratification of failure type. We can allow the covariate(strata=['Failure_Type_Cluster']) to still be included in the model without estimating its effect which affected our indices. Notice that there was a significant reduction of C-index which implies that the feature in question is of high importance in the predictability of the model.


C-index: 0.5331985232732872

Stratified C-index: 0.7038135568638192




The C-index, or concordance index, is a measure of the predictive accuracy of a survival model. It assesses the model's ability to correctly rank pairs of individuals in terms of their event times. A C-index of 0.5 suggests no better predictive ability than random chance, while a C-index of 1.0 indicates perfect predictive ability.

In our original model, we have a C-index of 0.9034, which suggests a high predictive accuracy. We randomized the relationship between 'Type' and 'Failure Type' covariates and the outcome to create a null distribution of C-indices against which we can compare our original C-index. The purpose of this is to determine whether the observed association between the covariates and the outcome in our model could have occurred by chance.

The average C-index obtained from the randomized data is 0.9155, which is unexpectedly higher than the original C-index. Typically, randomizing the covariates should lower the C-index on average, because the predictive information the covariates carry about the outcome is destroyed. We observed similar behavior in our deep learning model when one of the clusters was removed; analogous to that test, the random permutation here yields a higher C-index, which may indicate that the original model is overfitting, capturing noise rather than true signal. In that case, neither label encoding nor removing this feature is recommended for this dataset.

THE ISSUE WITH ONE-HOT ENCODING

Multicollinearity: If our categorical column has many levels, one-hot encoding creates numerous new binary columns (shown in EDA section). High correlation between these columns leads to multicollinearity, which can inflate standard errors and make coefficient estimates unstable in the Cox PH model. 

Perfect Separation: If a particular combination of categories is strongly associated with the event or non-event, it can lead to perfect separation or limited variability within specific subsets of data. This makes model fitting challenging.

In this case, we compared our results using the penalizer per type, as shown above in the section on the reliability function per machine type (0.1 penalizer, Cox proportional hazards model).
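A small sketch of the drop-one-category remedy for the dummy variable trap, using pandas' `drop_first` (the 'Type' values are illustrative):

```python
import pandas as pd

raw = pd.DataFrame({"Type": ["L", "M", "H", "L"]})

# drop_first=True removes one level ('H' here, alphabetically first);
# the dropped level becomes the reference category, so the remaining
# dummies are no longer perfectly collinear with the intercept
dummies = pd.get_dummies(raw["Type"], prefix="Type", drop_first=True)
print(dummies.columns.tolist())
```

Alternatively, keeping all dummy columns but fitting with a penalizer (e.g. `CoxPHFitter(penalizer=0.1)`, as in the section referenced above) shrinks the coefficients and stabilizes the fit despite the collinearity.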

MACHINE TYPE EFFECT

Now that we are confident that this particular covariate is important, we also want to investigate whether there is a relevant pattern to discover among the values contained in the categorical data.

Comparing Group 2('M') vs Group 3('L') using logrank_test

Test statistic value: 3.8390

Corrected p-value: 0.1502

--------------------------------------------------

Comparing Group 2('M') vs Group 1('H') using logrank_test

Test statistic value: 2.6336

Corrected p-value: 0.3139

--------------------------------------------------

Comparing Group 3('L') vs Group 1('H') using logrank_test

Test statistic value: 7.6583

Corrected p-value: 0.0170


Above are the comparisons of survival distributions between different groups (Group 1('H'), Group 2('M'), and Group 3('L')). The corrected p-values take into account the multiple comparisons being made, reducing the likelihood of a Type I error (falsely declaring a significant difference).


Here’s a brief interpretation of each comparison:


Group 2 vs Group 3:

Test statistic value: 3.8390

Corrected p-value: 0.1502

Interpretation: There is no statistically significant difference in the survival distributions between Group 2 and Group 3 after adjusting for multiple comparisons (p > 0.05).


Group 2 vs Group 1:

Test statistic value: 2.6336

Corrected p-value: 0.3139

Interpretation: There is no statistically significant difference in the survival distributions between Group 2 and Group 1 after adjusting for multiple comparisons (p > 0.05).


Group 3 vs Group 1:

Test statistic value: 7.6583

Corrected p-value: 0.0170


Interpretation: 

There is a statistically significant difference in the survival distributions between Group 3 and Group 1, even after adjusting for multiple comparisons (p < 0.05). This suggests that the survival experiences of these two groups are significantly different.


The key takeaway is that only the comparison between Group 3 and Group 1 showed a statistically significant difference in survival times, indicating that the conditions affecting these groups may have a different impact on their survival. This significant result warrants further investigation to understand the factors contributing to the difference in survival between these groups. 


In contrast, the comparisons involving Group 2 did not reveal any statistically significant differences in survival when compared to either of the other groups, suggesting that Group 2's survival experience is not significantly different from that of Group 1 or Group 3 under the conditions tested.

From the plot, we can observe the estimated survival functions for three different groups over time. The survival curves provide visual insight into the survival probabilities of the groups across the timeline of the study. It appears that Group 3 has a lower survival probability compared to Groups 1 and 2 after a certain point in time, which is consistent with the significant p-value indicating a difference in survival distributions between Group 3 and Group 1.


Group 2's curve is situated between Group 1 and Group 3, and its survival probabilities overlap with those of the other two groups during much of the study period. This is in line with the non-significant p-values when comparing Group 2 with Group 1 and Group 3, suggesting no statistical evidence to indicate differences in survival.

FAILURE TYPE EFFECT


Comparing Group 2 vs Group 4 using logrank_test

No Failure * Power Failure

Test statistic value: 8570.4348

Corrected p-value: 0.0000

--------------------------------------------------

Comparing Group 2 vs Group 6 using logrank_test

No Failure * Tool Wear Failure

Test statistic value: 314.4379

Corrected p-value: 0.0000

--------------------------------------------------

Comparing Group 2 vs Group 3 using logrank_test

No Failure * Overstrain Failure

Test statistic value: 664.5695

Corrected p-value: 0.0000

--------------------------------------------------

Comparing Group 2 vs Group 5 using logrank_test

No Failure * Random Failures

Test statistic value: 0.0217

Corrected p-value: 1.0000

--------------------------------------------------

Comparing Group 2 vs Group 1 using logrank_test

No Failure * Heat Dissipation Failure

Test statistic value: 7669.7316

Corrected p-value: 0.0000

--------------------------------------------------

Comparing Group 4 vs Group 6 using logrank_test

Power Failure * Tool Wear Failure

Test statistic value: 84.5671

Corrected p-value: 0.0000

--------------------------------------------------

Comparing Group 4 vs Group 3 using logrank_test

Power Failure * Overstrain Failure

Test statistic value: 104.7241

Corrected p-value: 0.0000

--------------------------------------------------

Comparing Group 4 vs Group 5 using logrank_test

Power Failure * Random Failures

Test statistic value: 22.3534

Corrected p-value: 0.0000

--------------------------------------------------

Comparing Group 4 vs Group 1 using logrank_test

Power Failure * Heat Dissipation Failure

Test statistic value: 0.4063

Corrected p-value: 1.0000

--------------------------------------------------

Comparing Group 6 vs Group 3 using logrank_test

Tool Wear Failure * Overstrain Failure

Test statistic value: 6.0442

Corrected p-value: 0.2093

--------------------------------------------------

Comparing Group 6 vs Group 5 using logrank_test

Tool Wear Failure * Random Failures

Test statistic value: 0.7622

Corrected p-value: 1.0000

--------------------------------------------------

Comparing Group 6 vs Group 1 using logrank_test

Tool Wear Failure * Heat Dissipation Failure

Test statistic value: 74.8500

Corrected p-value: 0.0000

--------------------------------------------------

Comparing Group 3 vs Group 5 using logrank_test

Overstrain Failure * Random Failures

Test statistic value: 1.3714

Corrected p-value: 1.0000

--------------------------------------------------

Comparing Group 3 vs Group 1 using logrank_test

Overstrain Failure * Heat Dissipation Failure

Test statistic value: 91.2936

Corrected p-value: 0.0000

--------------------------------------------------

Comparing Group 5 vs Group 1 using logrank_test

Random Failures * Heat Dissipation Failure

Test statistic value: 20.0082

Corrected p-value: 0.0001

--------------------------------------------------

Interpretation: 


Group 2 vs Group 4: An extremely large test statistic value with a p-value effectively at 0 indicates a profound difference in survival distributions between these groups.

Group 2 vs Group 6: Another large test statistic and a p-value of 0 suggest a significant difference in survival distributions as well.

Group 2 vs Group 3: The significant test statistic and p-value of 0 again show a difference in survival distributions.

Group 2 vs Group 5: The test statistic is very close to 0, and the corrected p-value is 1, indicating no evidence of a difference in survival distributions.

Group 2 vs Group 1: A very large test statistic and a p-value of 0 show a significant difference in survival distributions.

Group 4 vs Group 6: A significant test statistic and p-value of 0 indicate a difference in survival distributions.

Group 4 vs Group 3: Another significant test statistic and p-value of 0 suggest a notable difference.

Group 4 vs Group 5: A significant test statistic and a p-value of 0 show a difference in survival.

Group 4 vs Group 1: A small test statistic and a corrected p-value of 1 suggest no significant difference in survival distributions.

Group 6 vs Group 3: A modest test statistic with a non-significant p-value suggests no evidence of a survival distribution difference.

Group 6 vs Group 5: A small test statistic and a corrected p-value of 1 also indicate no significant difference.

Group 6 vs Group 1: A significant test statistic with a p-value of 0 indicates a difference in survival distributions.

Group 3 vs Group 5: A small test statistic and a corrected p-value of 1 suggest no significant difference in survival distributions.

Group 3 vs Group 1: A significant test statistic with a p-value of 0 indicates a difference in survival distributions.

Group 5 vs Group 1: A significant test statistic with a very low p-value suggests a difference in survival distributions.


These results, especially those with low p-values, indicate statistically significant differences in survival distributions between certain groups. The few comparisons with non-significant p-values suggest that, in some cases, the survival distributions are not significantly different.

PREDICTION OF FAILURE (COMPETING RISKS MODEL)

SAMPLE PREDICTION