Survival Analysis
Several survival analysis models are used to describe the survival probability and cumulative hazard of each sample.
check assumptions
When using the check_assumptions method in lifelines, it's normal to get 1 degree of freedom. Each covariate is tested separately with a chi-squared test on its scaled Schoenfeld residuals, and each of those tests has 1 degree of freedom. It is not a test of an intercept-only null model (a Cox model has no intercept).
In the check_assumptions output, each feature appears in two rows, labeled km and rank. These are the two time transforms applied to the event times before testing whether the residuals trend over time.
km stands for Kaplan-Meier, a non-parametric estimate of the survival function. In this context, the km row means the event times were transformed using the Kaplan-Meier estimate. It does not show a survival estimate for the feature.
rank means the event times were replaced by their ranks. It is not a ranking of p-values. A feature with a small p-value under either transform is a candidate for violating the proportional hazards assumption.
The test statistic in each row is calculated from the Schoenfeld residual test, which is used to test the proportional hazards assumption.
A higher value of the test statistic (in this case, 39.90) indicates stronger evidence against the proportional hazards assumption, meaning that the feature is more likely to be non-proportional.
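To see how a test statistic such as 39.90 translates into a p-value at 1 degree of freedom, the following is a minimal sketch using only the Python standard library. The helper name chi2_sf_1df is hypothetical and is not part of lifelines; it relies on the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def chi2_sf_1df(statistic: float) -> float:
    """Upper-tail probability P(X > statistic) for chi-squared with 1 degree of freedom."""
    if statistic < 0:
        raise ValueError("a chi-squared statistic cannot be negative")
    # If Z is standard normal, Z**2 is chi-squared with 1 df, so
    # P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Test statistic quoted in the notes above.
p_value = chi2_sf_1df(39.90)
print(f"statistic = 39.90, p = {p_value:.3g}")

# For comparison: the statistic that sits at the usual 5% threshold.
print(f"statistic = 3.84,  p = {chi2_sf_1df(3.84):.3f}")
```

Any statistic well above roughly 3.84 therefore corresponds to a p-value below 0.05 for that feature and transform.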
DYNAMIC time-dependent AUC-ROC (Base Model)
SURVIVAL FUNCTION
Plotted samples: the top 3 and lowest 3, plus additional samples per quantile.
The resulting prediction is too aggressive towards the end of the cycle, which is the opposite of the other base models.
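The sample selection described above can be sketched as follows. This is an illustration under assumptions, not the notebook's actual code: it assumes samples are ranked by a per-sample score (for example a predicted risk score), that "per quantile" means one sample at each quartile, and the function name pick_plot_samples and the score values are made up.

```python
def pick_plot_samples(scores, n_extreme=3, quantiles=(0.25, 0.5, 0.75)):
    """Return indices of the lowest- and highest-scoring samples plus one per quantile."""
    # Indices sorted by ascending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    lowest = order[:n_extreme]
    highest = order[max(len(order) - n_extreme, 0):]
    # One sample at each requested quantile position of the sorted order.
    middle = [order[round(q * (len(order) - 1))] for q in quantiles]
    # Drop duplicates while preserving order (they can occur for small inputs).
    return list(dict.fromkeys(lowest + middle + highest))


# Hypothetical scores for 12 samples.
scores = [0.91, 0.12, 0.55, 0.33, 0.78, 0.05, 0.67, 0.20, 0.49, 0.84, 0.40, 0.99]
selected = pick_plot_samples(scores)
print(selected)
```

The returned indices can then be used to pick which predicted curves to draw.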
CUMULATIVE HAZARD
Plotted samples: the top 3 and lowest 3, plus additional samples per quantile.
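For reading the survival and cumulative hazard plots side by side, it helps to recall the continuous-time relationship H(t) = -ln S(t). The sketch below applies it to made-up survival values; a fitted model may estimate the two functions separately, so its plotted curves need not match this identity exactly.

```python
import math


def cumulative_hazard_from_survival(survival):
    """Convert survival probabilities S(t) into cumulative hazards H(t) = -ln S(t)."""
    hazards = []
    for s in survival:
        if not 0.0 <= s <= 1.0:
            raise ValueError("survival probabilities must lie in [0, 1]")
        # S(t) = 0 corresponds to an unbounded cumulative hazard.
        hazards.append(math.inf if s == 0.0 else max(0.0, -math.log(s)))
    return hazards


# Made-up survival curve for illustration only.
survival_curve = [1.0, 0.95, 0.80, 0.50, 0.20]
hazard_curve = cumulative_hazard_from_survival(survival_curve)
for s, h in zip(survival_curve, hazard_curve):
    print(f"S = {s:.2f}  ->  H = {h:.3f}")
```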
Random Survival Forests (RSF), an extension of random forests to survival analysis, are used here. They handle non-linear relationships and interactions between covariates.
min leaf: 15
min split: 10
estimators: 1000
rsf score: 0.44823
TTE/RUL location
Green dots mark the RUL, that is, the point of observation. Red dots mark the TTE, the time-to-event (failure). The part of each curve beyond the dots is the projected survival probability. The point where a curve starts to flatten, not the end of the curve, indicates the predicted TTE from the time of observation.
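One way to locate the start of flattening programmatically is to find the last time step at which the curve still drops by more than a small tolerance. The heuristic below is only a sketch: the function name flattening_start, the tolerance, and the curve values are all assumptions, and the notebook may identify this point differently (for example by eye).

```python
def flattening_start(times, survival, tol=1e-3):
    """Return the time after which the curve never drops by more than `tol` per step."""
    if len(times) != len(survival) or not times:
        raise ValueError("times and survival must be non-empty and of equal length")
    start = times[0]  # stays here if the curve is flat from the beginning
    for k in range(1, len(survival)):
        if survival[k - 1] - survival[k] > tol:
            start = times[k]  # the curve was still falling at this step
    return start


# Made-up projected survival curve on a grid of cycles.
times = [0, 10, 20, 30, 40, 50]
survival = [1.0, 0.9, 0.6, 0.2, 0.2, 0.2]
print(flattening_start(times, survival))
```

The choice of tolerance matters for curves that decay slowly, so the result should be checked against the plot.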
GRADIENT BOOSTING SA
The number of estimators is tuned with a maximum depth of 1 and a learning rate of 0.01. The computed concordance indices for the last three iterations are as follows:
n_estimators: 140, score: 0.7638743376093853
n_estimators: 145, score: 0.7638939630846249
n_estimators: 150, score: 0.7640207374896021
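For reference, the concordance index reported above measures how often the model ranks a pair of samples in the right order. The following is a minimal pure-Python sketch of Harrell's concordance index on made-up data; it ignores tied event times and is not the library implementation that produced the scores above.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the earlier failure has the higher risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:  # sample i failed before sample j's last known time
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up data: time, event indicator (1 = failure observed), predicted risk.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risk)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so values near 0.76 indicate a moderately good ordering.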
The plots show the behaviour of survival function and cumulative hazard function using Gradient Boosting Survival Analysis (Base Model).
Aggressive probability of failure, while the cumulative hazard lags.
COXNET SA
L1 ratio = 0.90 at 100 epochs to find the alpha coefficient. The mean concordance index and its standard deviation are plotted against alpha to choose the coefficient.
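The selection step can be sketched as follows, assuming cross-validation has already produced one concordance index per fold for each candidate alpha. The alpha values, fold scores, and the function name summarize_alpha_scores are made up for illustration, and the population standard deviation is used here as an assumption about how the spread is computed.

```python
from statistics import mean, pstdev


def summarize_alpha_scores(cv_scores):
    """Return {alpha: (mean, std)} of per-fold scores and the alpha with the highest mean."""
    summary = {alpha: (mean(s), pstdev(s)) for alpha, s in cv_scores.items()}
    best_alpha = max(summary, key=lambda a: summary[a][0])
    return summary, best_alpha


# Hypothetical per-fold concordance indices for three candidate alphas.
cv_scores = {
    0.001: [0.70, 0.72, 0.71],
    0.01: [0.74, 0.76, 0.75],
    0.1: [0.68, 0.66, 0.67],
}
summary, best_alpha = summarize_alpha_scores(cv_scores)
for alpha, (m, s) in summary.items():
    print(f"alpha = {alpha}: mean = {m:.3f}, std = {s:.3f}")
print("best alpha:", best_alpha)
```

Plotting the mean with a band of one standard deviation on either side shows whether the best alpha is clearly better than its neighbours or within the noise.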
COXNET prediction
The first 10 elements show how the 'Physical_core_speed_rpm' feature affects the survival probability of the engine.