BEST DATA FOR PREDICTION
Creating a Neural Network to predict what type of failure the machine will experience, using the Failure Type as the target feature of the prediction.
creating the model
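A minimal sketch of such a model is shown below, assuming a preprocessed feature matrix `X` and the raw 'Failure Type' labels `y` are already available (these names, the layer sizes, and the training settings are placeholders, not the original notebook code):

```python
# Minimal sketch (variable names, layer sizes and training settings are assumptions).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from tensorflow import keras

le = LabelEncoder()                      # maps each failure type string to an integer class
y_enc = le.fit_transform(y)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_train, X_val, y_train, y_val = train_test_split(
    X_scaled, y_enc, test_size=0.2, stratify=y_enc, random_state=42)

n_classes = len(np.unique(y_enc))
model = keras.Sequential([
    keras.layers.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(n_classes, activation="softmax"),   # one probability per failure type
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",      # integer-encoded multiclass target
              metrics=["accuracy"])
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50, batch_size=32)
```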
MULTIMODAL CLASSIFICATION OF FAILURE WITH VALIDATION SCORES
The adoption of multimodal classification and the redefinition of the target variable to focus on failure types represent strategic decisions aimed at improving the predictive capability and practical utility of the model in anticipating and managing machine component failures.
Objective of Solidifying Prediction: The primary goal is to enhance the accuracy and reliability of predictions related to machine component failures. This indicates a proactive approach to maintenance and problem mitigation, aiming to anticipate and address issues before they occur.
Utilization of Multimodal Classification: Multimodal classification involves analyzing and interpreting data from multiple sources or modes. In this context, it refers to integrating diverse types of data such as sensor readings, historical performance data, environmental factors, etc. By incorporating various data modalities, the predictive model can capture a more comprehensive understanding of the system's behavior and potential failure patterns.
Quantification of Failure Types: Instead of simply predicting whether a failure will occur or not (binary classification), the focus is on categorizing the predicted failures into different types. This implies a finer granularity in the prediction process, enabling stakeholders to anticipate specific failure modes (e.g., mechanical wear, electrical malfunction, corrosion) and tailor maintenance or intervention strategies accordingly.
Transformation of Target Variable: The conventional approach in predictive modeling involves setting the target variable as the outcome of interest (e.g., predicting whether a machine will fail). However, in this strategy, the target variable is redefined as the type of failure rather than the occurrence of failure itself. This shift in perspective allows the predictive model to address not only "if" but also "how" and "what type" of failure might manifest (see the sketch after this list).
Enhanced Predictive Capability: By aligning the target variable with the specific types of failures, the predictive model becomes more adept at identifying and classifying subtle patterns and indicators associated with each failure mode. This facilitates more accurate predictions and empowers stakeholders with actionable insights into potential failure scenarios.
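As a concrete illustration of the target transformation described above, a minimal sketch (the file name and column labels follow the common AI4I-style predictive-maintenance dataset and are assumptions):

```python
# Sketch only: assumed file name and AI4I-style columns 'Target' (binary) and 'Failure Type'.
import pandas as pd

df = pd.read_csv("predictive_maintenance.csv")
y_binary = df["Target"]        # old target: will the machine fail? (0/1)
y = df["Failure Type"]         # new target: which failure mode?
print(y.value_counts())        # class balance across failure types
```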
CONFUSION MATRIX & CLASSIFICATION REPORT PER CLASS
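A sketch of how these per-class diagnostics might be produced with scikit-learn, reusing the hypothetical `model`, `X_val`, `y_val` and `le` from the network sketch above:

```python
# Sketch: per-class confusion matrix and classification report (hypothetical model/X_val/y_val).
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_pred = np.argmax(model.predict(X_val), axis=1)   # class index with the highest softmax score
print(confusion_matrix(y_val, y_pred))             # rows = true class, columns = predicted class
print(classification_report(y_val, y_pred, target_names=le.classes_))
```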
MULTIMODAL CLASSIFICATION OF FAILURE WITH VALIDATION SCORES
predicting 4-samples
As determined in the previous section, the results below are the same when using deep learning model ensembles. However, the assumption remains the same: the samples are treated as univariate, which is a very simplistic approach to prediction for survival analysis.
predicting random sample
This method predicts the failure type of a particular sample by conducting a random sample analysis.
Random Sample Analysis: This involves randomly selecting samples from a population for analysis. It ensures that the data used for prediction is representative and unbiased, providing a fair assessment of the system's behavior.
Predicting Failure Type: The objective is to anticipate the type of failure that a specific sample may experience. This could include various failure modes such as mechanical wear, thermal degradation, corrosion, etc.
Keeping Other Features Constant: While conducting the analysis, other features or variables are held constant. This means that factors other than toolwear (presumably the variable of interest) are not altered or manipulated during the experiment. This helps isolate the effect of toolwear on the prediction of failure type.
Increasing Toolwear Value: Toolwear is systematically incremented or increased during the experiment. This allows us to observe how changes in toolwear affect the likelihood or type of failure experienced by the sample.
Approximate Useful Life Estimation: By monitoring the relationship between toolwear and the occurrence of failure types, we can estimate the approximate useful life of the sample. This refers to the duration or extent of operation before the sample is likely to fail, based on the observed patterns and trends. However, another section is dedicated to determining the RUL using the WTTE-RNN approach, where different Weibull distributions are used.
The results are analogous to the hazard model using the survival regression. The idea behind Cox’s proportional hazard model is that the log-hazard of an individual is a linear function of their covariates and a population-level baseline hazard that changes over time.
In essence, the approach involves systematically varying toolwear while keeping other variables constant to understand its impact on failure type prediction. By analyzing random samples in this manner, we can gain insights into the relationship between toolwear and useful life, facilitating more accurate predictions and informed decision-making regarding maintenance and replacement schedules.
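A hedged sketch of this sweep, reusing the hypothetical `model`, `scaler` and `le` from the earlier network sketch; the feature DataFrame `X`, the 'Tool wear [min]' column name, and the sweep range are assumptions:

```python
# Sketch of the tool-wear sweep: hold every other feature fixed, increase tool wear only.
# X, scaler, model, le and the 'Tool wear [min]' column name are assumptions.
import numpy as np

sample = X.sample(1, random_state=7).copy()              # one randomly selected sample
for wear in range(0, 260, 20):                           # sweep tool wear in minutes
    sample["Tool wear [min]"] = wear                     # only the tool-wear value changes
    probs = model.predict(scaler.transform(sample), verbose=0)[0]
    pred = le.inverse_transform([np.argmax(probs)])[0]   # most likely failure type
    print(f"tool wear = {wear:3d} min -> {pred} (p = {probs.max():.2f})")
```

The tool-wear value at which the predicted class switches to a failure mode gives the approximate useful life described above.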
SURVIVAL REGRESSION (COX)
In the graphs below, the X-axis is the tool wear in minutes and the Y-axis is the failure rate as a percentage, with 0.0 as the failure target for all sample types. These plots show the varying covariate values until failure occurs, based on the available survival regression model (Cox).
Survival regression is a powerful statistical method used when we're interested not just in modeling time to an event (like in traditional survival analysis) but also in understanding how various factors or covariates (operating conditions) influence the timing of the event. This technique allows us to account for the effects of several covariates on survival time, thereby providing a more nuanced understanding of the factors that might prolong or shorten the time to the event of interest.
The "event" in survival analysis typically refers to something that can happen at a particular point in time, such as failure of a machine. The most common form of survival regression is the Cox proportional hazards model, which assumes that covariates have a multiplicative effect on the hazard function of the survival time. This model doesn't assume a specific distribution for survival times but rather focuses on the ratio of hazards at any time point, which should be constant over time for the covariates.
Another approach is parametric survival models, such as the Weibull, exponential, or log-normal models, where the survival times are assumed to follow a specific distribution. These models can provide estimates of survival functions and hazard functions, and they allow for direct modeling of the survival time, depending on the covariates. However, we will incorporate the Weibull functions into our WTTE-RNN model, which is discussed in a different section.
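The Cox summary interpreted below can be produced with lifelines' CoxPHFitter; in this sketch the dataframe `df` and the duration/event column names are assumptions (tool wear in minutes as the duration, the binary failure flag as the event):

```python
# Sketch: Cox proportional hazards fit with lifelines; df and column names are assumptions.
from lifelines import CoxPHFitter

cols = ["Air_temperature_K", "Process_temperature_K",
        "Rotational_speed_rpm", "Torque_Nm",
        "Tool_wear_min", "Target"]            # duration = tool wear, event = failure flag
cph = CoxPHFitter()
cph.fit(df[cols], duration_col="Tool_wear_min", event_col="Target")
cph.print_summary()   # coef, exp(coef), 95% CI, p, partial AIC, log-likelihood ratio test
cph.plot()            # forest-style plot of log(HR) with 95% confidence intervals
```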
Coefficients and Hazard Ratios:
coef: The coefficient for each variable, indicating the log hazard ratio associated with a one-unit increase in that variable.
exp(coef): The hazard ratio (HR) for each variable, representing the change in hazard for each one-unit increase in the variable. An HR > 1 suggests an increase in hazard, while an HR < 1 suggests a decrease (see the worked example after this list).
Air_temperature_K: HR = 1.86, suggesting that each one-unit increase in air temperature is associated with an 86% increase in the hazard of failure.
Process_temperature_K: HR = 0.57, suggesting each one-unit increase in process temperature is associated with a 43% decrease in the hazard.
Rotational_speed_rpm: HR = 1.00, suggesting rotational speed does not appear to have a significant effect on the hazard.
Torque_Nm: HR = 1.09, indicating that each one-unit increase in torque is associated with a 9% increase in the hazard.
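As a quick worked example of the coef/exp(coef) relationship above, using the air temperature entry:

$$
\text{HR} = e^{\text{coef}} \quad\Rightarrow\quad \text{coef} = \ln(\text{HR}) \approx \ln(1.86) \approx 0.62
$$

i.e., a one-unit (1 K) increase in air temperature multiplies the hazard by about $e^{0.62} \approx 1.86$, the 86% increase quoted above.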
Confidence Intervals and Statistical Significance:
The 95% CI provides a range within which the true HR is likely to lie 95% of the time. For example, the true HR for air temperature is likely between 1.64 and 2.12.
The p-value (represented by p) for each variable tests the null hypothesis that the variable's coefficient is 0 (no effect). A small p-value (typically <0.05) suggests that we can reject the null hypothesis.
In our output, all variables except Rotational_speed_rpm have p-values indicating significant associations with the hazard of failure.
Model Test Statistics:
log-likelihood ratio test: A significant chi-squared statistic (667.4 on 4 degrees of freedom) with a very low p-value suggests that the model with all variables included fits significantly better than a null model with no predictors.
Partial AIC: A lower Akaike Information Criterion (AIC) suggests a better-fitting model when comparing models with a different number of predictors. This is a relative measure and is used to compare different models.
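For reference, the criterion is

$$
\text{AIC} = 2k - 2\ln\hat{L},
$$

where $k$ is the number of estimated parameters and $\hat{L}$ is the maximized (partial) likelihood, so adding predictors is only rewarded if they improve the likelihood enough to offset the $2k$ penalty.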
Forest Plot:
This plot visualizes the log(HR) for each variable along with its 95% confidence interval. A box represents the log(HR), and the lines represent the confidence intervals. The distance from the vertical line at 0 indicates the magnitude and direction of the effect (right for positive, left for negative). A confidence interval that does not cross the vertical line at 0 indicates statistical significance.
Positive values mean that the sample failed sooner than expected (according to our model); negative values mean that the sample stayed longer than expected (or was censored).
DATA FOR WEIBULL AFT FITTER
In this dataframe, we dropped the 'Type' and 'Failure Type' columns to check the predictability of our model, because these columns have large coefficients (high collinearity with the prediction). We will add 'Type' back later on to compare the results.
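A sketch of this fit with lifelines' WeibullAFTFitter (the dataframe and column names are assumptions, consistent with the Cox sketch above):

```python
# Sketch: Weibull AFT fit on the reduced dataframe; names are assumptions.
from lifelines import WeibullAFTFitter

df_aft = df.drop(columns=["Type", "Failure Type"])   # drop the high-collinearity columns
aft = WeibullAFTFitter()
aft.fit(df_aft, duration_col="Tool_wear_min", event_col="Target")
aft.print_summary()   # lambda_/rho_ coefficients, concordance, AIC, log-likelihood ratio test
```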
lambda_: This is the scale parameter of the Weibull distribution. The coefficients under this are related to the log-linear part of the model that affects the scale parameter.
Air_temperature_K and Rotational_speed_rpm: Negative coefficients mean that increases in these variables are associated with a decrease in survival time, i.e., higher temperatures and rotational speeds lead to faster tool wear.
Process_temperature_K and Torque_Nm: These variables show opposite effects. The positive coefficient for Process_temperature_K suggests that higher process temperatures are associated with longer survival times, while the negative coefficient for Torque_Nm suggests that higher torque is associated with shorter survival times.
Intercept: This is the baseline value of log(lambda_) when all covariates are zero.
rho_: This is the shape parameter of the Weibull distribution. A rho_ value greater than 1, which we have here, indicates that the hazard function is increasing over time (the risk of the event occurring increases as time passes).
Concordance: A concordance index of 0.920 is quite high and suggests that the model has good predictive power.
AIC: The Akaike Information Criterion value provides a means for model comparison, with lower values indicating a better fit relative to the complexity of the model.
log-likelihood ratio test: This suggests that our model is a significantly better fit than a null model with no predictors.
The results below have an identical Concordance value, but notice the inclusion of Type_OE as one of our predictors. It means that certain types of machine can fail earlier than other types.
From our initial exercise, the one-hot encoder gives us a problem with variance when used with the Weibull AFT fitter. The results below indicate the problem: the variance is too low even after introducing James-Stein Estimation and Ridge Regression with the Weibull AFT fitter. Although our Concordance Index reaches 97.3%, the -log2(p) of the LL-ratio test (a good fit) might indicate overfitting, which needs to be checked with further cross-validation. Also notable from these results is that 'Failure_Type_4' cannot be computed properly, nor can the encoded Type_25 and Type_75 features, which indicates a problem in identifying the 'Types' of our machine.
In these results, we use the summary encoder for the 'Type' feature and a one-hot encoder for the 'Failure Type' feature. This result matches a conventional machine learning algorithm with an R-squared of 0.97 and a Root Mean Squared Error of 0.03 for regression, and a promising F1-score for classification. Recall that this score was also achieved using our neural net deep learning model. We also cross-validated the model using Repeated K-Fold, with the following scores (a sketch of this validation follows the scores):
Optimal alpha: 0.001
R^2: Mean=0.97, SD=0.02
MAE: Mean=0.00, SD=0.00
RMSE: Mean=0.02, SD=0.02
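A sketch of what this Repeated K-Fold validation might look like; the ridge estimator, the alpha grid, and the `X`/`y` regression matrices are assumptions (the text reports an optimal alpha of 0.001):

```python
# Sketch: alpha selection plus Repeated K-Fold scoring of a ridge regression (assumed setup).
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import RepeatedKFold, cross_validate

best_alpha = RidgeCV(alphas=[0.0001, 0.001, 0.01, 0.1, 1.0]).fit(X, y).alpha_
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=42)
scores = cross_validate(Ridge(alpha=best_alpha), X, y, cv=cv,
                        scoring=("r2", "neg_mean_absolute_error", "neg_root_mean_squared_error"))
print(f"Optimal alpha: {best_alpha}")
print(f"R^2 : Mean={scores['test_r2'].mean():.2f}, SD={scores['test_r2'].std():.2f}")
print(f"MAE : Mean={-scores['test_neg_mean_absolute_error'].mean():.2f}")
print(f"RMSE: Mean={-scores['test_neg_root_mean_squared_error'].mean():.2f}")
```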
We also tested the model for its sensitivity to feature scaling:
Linear Regression (Original):
MSE: 0.00
R2: 0.97
Random Forest (Original):
MSE: 0.00
R2: 0.96
Linear Regression (Scaled):
MSE: 0.00
R2: 0.97
Random Forest (Scaled):
MSE: 0.00
R2: 0.96
model validation with scaling and variable smoothing
ONE-HOT Dataframe Fitted to JAMES-STEIN ESTIMATION & RIDGE REGRESSION
ONE-HOT DF WEIBULL AFT SUMMARY
LEAVE-ONE-OUT ENCODING WEIBULL AFT SUMMARY
From these results, after fitting our WeibullAFT model, we need to find the best model for survival analysis, as our current encoded model is not fitting well. We will iterate over several variables and do feature engineering to find the best model fit for prediction. We will measure the models by Concordance Index, AIC and the log-likelihood ratio test.
As an initial step, we will find the best encoding solution for these features, then measure the VIF (Variance Inflation Factor) and analyze the results across different encoding combinations for the 'Type' and 'Failure_Type' categorical variables. The objective is to minimize VIF scores to reduce multicollinearity among the features. The VIF measures how much the variance of a single variable's coefficient in a regression model is inflated by the presence of other correlated variables, which is analogous to how competing risks are handled in survival analysis.
Lower VIF values indicate less multicollinearity, making the model more stable and ensuring that the coefficients are reliable (see the sketch below). In the tabulated results below, we iterate over different encoding techniques to find the best score for Type and Failure_Type. The most promising combinations are highlighted in red boxes. However, these models did not converge during fitting with penalizers from 0.10 down to 0.001 at an L1 ratio of 0.5.
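A sketch of the VIF computation with statsmodels over an encoded feature frame (`X_encoded` is an assumed name):

```python
# Sketch: VIF per encoded feature; X_encoded is an assumed DataFrame of numeric features.
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

X_vif = add_constant(X_encoded)                    # include a constant term for a proper VIF
vif = pd.Series(
    [variance_inflation_factor(X_vif.values, i) for i in range(X_vif.shape[1])],
    index=X_vif.columns,
)
print(vif.drop("const").sort_values(ascending=False))   # large values flag multicollinearity
```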
CROSS-VALIDATION BEFORE FITTING
1. K-Fold Cross-Validation
Accuracy Scores: This method produced accuracy scores ranging from 0.960 to 0.977 across 10 folds. These scores represent the model's performance on each of the 10 separate data subsets.
Mean Accuracy: The average accuracy across all folds is 0.9701, indicating a high level of predictive performance.
Standard Deviation in Accuracy: The standard deviation is 0.00537, showing that the accuracy scores are quite close to the mean, suggesting consistency in the model's performance across different data subsets.
2. Stratified K-Fold Cross-Validation
Accuracy Scores: Similar to K-Fold CV, but this method ensures that each fold has the same proportion of class labels as the entire dataset. The scores are slightly lower but very consistent, ranging from 0.966 to 0.973.
Mean Accuracy: The mean accuracy is slightly lower than K-Fold CV at 0.9695, but still indicates a high level of performance.
Standard Deviation in Accuracy: With a standard deviation of 0.00280, the results are even more consistent than K-Fold CV, highlighting the effectiveness of maintaining class distribution across folds.
3. Repeated K-Fold Cross-Validation
Accuracy Scores: This method repeats K-Fold CV multiple times (three repeats, given there are 30 scores) and provides a broader range of accuracy scores, from 0.959 to 0.979, reflecting the variability in performance across different iterations and folds.
Mean Accuracy: The mean accuracy is very similar to the first method at 0.9701, affirming the model's high performance.
Standard Deviation in Accuracy: The standard deviation is 0.00519, which is close to K-Fold CV, indicating a consistent performance across multiple repeats and folds, albeit with slightly more variability than Stratified K-Fold CV.
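A minimal sketch of the three schemes described above with scikit-learn; the classifier and the `X`/`y` matrices are assumed stand-ins:

```python
# Sketch: the three cross-validation schemes compared on an assumed classifier.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, RepeatedKFold, StratifiedKFold, cross_val_score

clf = RandomForestClassifier(random_state=42)
schemes = [
    ("K-Fold",            KFold(n_splits=10, shuffle=True, random_state=42)),
    ("Stratified K-Fold", StratifiedKFold(n_splits=10, shuffle=True, random_state=42)),
    ("Repeated K-Fold",   RepeatedKFold(n_splits=10, n_repeats=3, random_state=42)),
]
for name, cv in schemes:
    acc = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    print(f"{name:18s} mean={acc.mean():.4f} sd={acc.std():.5f} (n={len(acc)})")
```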
The results below show the cross-validation of the selected encoding technique, but the initial fitting result is not desirable, especially for Failure_Type, which we are trying to predict.
cross-validation and fitting (label-cumulative)
ParametricAFTRegressionFitter.predict_survival_function
Predict the survival function for samples, given their covariates. This assumes that the sample
just entered the study (that is, we do not condition on how long it has already been installed for).
The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
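A usage sketch, reusing the hypothetical `aft` fitter and `df_aft` frame from the Weibull AFT sketch above:

```python
# Sketch: survival curves for a few samples from the fitted Weibull AFT model (assumed objects).
surv = aft.predict_survival_function(df_aft.iloc[:4])   # index: timeline from 0; one column per sample
print(surv.head())
surv.plot()   # one survival curve per sample, starting at S(0) = 1
```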
LEAVE-ONE-OUT VS LABEL-CUMULATIVE
comparison (what ENCODER to choose)
Label-Cumulative Encoder - shows good performance with mean accuracies around 0.97. However, there's some variability in the accuracies across folds, as indicated by the standard deviation values.
Ordinal Encoder - demonstrates excellent performance with mean accuracies very close to 1 (perfect accuracy), and extremely low variability. This dataset shows the highest accuracy scores among the three.
Leave-One-Out Encoder - also presents high mean accuracy, though slightly less than Ordinal Encoder, with a moderate amount of variability.
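For the two standard encoders, a sketch with the category_encoders package is shown below (the 'label-cumulative' encoder appears to be a custom transformation and is not reproduced here; column and target names are assumptions):

```python
# Sketch: ordinal vs. leave-one-out encoding of the categorical columns (assumed names).
import category_encoders as ce

features = df.drop(columns=["Target"])
X_ordinal = ce.OrdinalEncoder(cols=["Type", "Failure Type"]).fit_transform(features, df["Target"])
X_looe = ce.LeaveOneOutEncoder(cols=["Type", "Failure Type"]).fit_transform(features, df["Target"])
```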
Now we have two models to compare to find out the best model for our prediction. This table shows the performance of two different data frames ("df_looe_proc" and "dfw_ft_encode") across various configurations of penalization and L1 ratio, detailing their impact on model concordance, AIC, the log-likelihood ratio test, and the -log2(p) of the LL-ratio test. The infinite -log2(p) values suggest extremely significant effects or model improvements in certain configurations. The table highlights differences in performance metrics based on ancillary usage, penalizer type, and L1 ratio adjustments, with "df_looe_proc" showing consistently high concordance across configurations and "dfw_ft_encode" displaying a wide range of outcomes based on these settings. We will use the df_looe_proc dataframe for further analysis.
summary:
To determine the best ENCODED DATA for further study based on the provided results, we should consider several key metrics: Concordance, Akaike Information Criterion (AIC), Log-Likelihood Ratio Test, and the -log2(p) of the LL-Ratio Test. However, since the -log2(p) value for many entries is reported as "inf" (infinity), suggesting extremely significant results, this metric is not as useful for differentiating between the models. Thus, we'll focus on Concordance, AIC, and the Log-Likelihood Ratio Test as our primary criteria for comparison.
Criteria for Selection:
1. Concordance: Higher is better. This statistic measures the predictive capability of the model; a higher concordance indicates a model that is better at predicting the outcome.
2. AIC (Akaike Information Criterion): Lower is better. AIC measures the quality of a model relative to others; it penalizes complexity, thus balancing goodness of fit and model simplicity.
3. Log-Likelihood Ratio Test: Higher is better. This value indicates the model's goodness of fit compared to a simpler model. A higher value suggests a better fit to the data.
Summary of Top Performers by Criteria:
Concordance: The highest concordance scores are observed in the df_looe_proc DataFrame across different configurations, with values slightly above 0.9717, indicating strong predictive performance.
AIC: Lower AIC values are desirable. The df_looe_proc DataFrame with Ancillary = True, Penalizer = 0.0001, and L1 Ratio = 1.0 has the lowest AIC, suggesting a good balance between model fit and complexity.
Log-Likelihood Ratio Test: The highest values are again seen in the df_looe_proc DataFrame, indicating a strong model fit. The entry with Ancillary = True and Penalizer settings either as None or 0.0001 shows the highest Log-Likelihood Ratio, suggesting these configurations provide a better model fit compared to others.
Best DataFrame for Further Study:
The df_looe_proc DataFrame consistently shows high performance across all the evaluated criteria, making it the best choice for further study. Specifically, configurations with Ancillary = True and either a Penalizer of 0.0001 or None both appear to offer the best combination of predictive power, model fit, and simplicity. This focus can help in optimizing the model further for predictive tasks or in-depth analysis, ensuring a robust foundation for subsequent investigations.
CUMULATIVE HAZARD ON DIFFERENT MODELS USING THE ENCODED DATASET
SURVIVAL FUNCTIONS USING DIFFERENT MODELS
LEFT CENSORING USING DIFFERENT MODELS
This is our baseline check of how well standard models fit our dataset; based on the results below, none of these models is appropriate for our data under left censoring. However, Weibull and Log Normal are the nearest fits and need more customization of hyperparameters.
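A sketch of this baseline left-censoring check with lifelines' univariate parametric fitters (duration and event column names are assumptions; parametric fitters require strictly positive durations, so zero tool-wear rows may need filtering first):

```python
# Sketch: compare univariate parametric fits under left censoring (assumed column names).
import matplotlib.pyplot as plt
from lifelines import ExponentialFitter, LogLogisticFitter, LogNormalFitter, WeibullFitter

subset = df[df["Tool_wear_min"] > 0]                  # parametric fitters need positive durations
T, E = subset["Tool_wear_min"], subset["Target"]

ax = plt.subplot(111)
for fitter in (WeibullFitter(), LogNormalFitter(), LogLogisticFitter(), ExponentialFitter()):
    fitter.fit_left_censoring(T, E, label=type(fitter).__name__)
    fitter.plot_cumulative_density(ax=ax)             # visual check of each candidate fit
plt.show()
```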