Survival Analysis

Several survival analysis models are used to describe the survival probability and cumulative hazard of each sample.

Check assumptions

When using the check_assumptions method in lifelines, it's normal to get 1 degree of freedom. The method runs a separate test for each covariate, checking whether that covariate's scaled Schoenfeld residuals have zero slope against (transformed) time. That is a single constraint per covariate, so each test statistic is compared with a chi-squared distribution with 1 degree of freedom.

When using the check_assumptions method in lifelines, the output includes two rows per covariate, labelled km and rank. These are the two time transforms under which the test is run.

km stands for Kaplan-Meier, which is a non-parametric estimate of the survival function. Under this transform, each event time t is replaced by 1 - KM(t) before the residuals are tested against time.

rank refers to the rank transform, in which each event time is replaced by its rank among the observed event times. A covariate with a small p-value under either transform shows evidence of violating the proportional hazards assumption, and the covariates with the smallest p-values show the strongest evidence of non-proportionality.

The test statistic reported for each covariate is calculated from the scaled Schoenfeld residuals, which are used to test the proportional hazards assumption.

A higher value of the test statistic (in this case, 39.90) indicates stronger evidence against the proportional hazards assumption, meaning that the feature is more likely to be non-proportional.

DYNAMIC time-dependent AUC-ROC (Base Model)

SURVIVAL FUNCTION

Curves are plotted for the top 3 and lowest 3 samples, with additional samples per quantile.

The resulting prediction is too aggressive towards the end of the cycle, which is the opposite of the other base models.

CUMULATIVE HAZARD

Curves are plotted for the top 3 and lowest 3 samples, with additional samples per quantile.
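In theory the survival function and the cumulative hazard are linked by H(t) = -ln S(t). The short sketch below converts a list of survival probabilities to cumulative hazard values; the input values are hypothetical. Ensemble models may estimate the two curves separately, so plotted curves need not satisfy this identity exactly.

```python
import math

def cumulative_hazard_from_survival(survival_probs):
    """Convert survival probabilities S(t) to cumulative hazard H(t) = -ln S(t)."""
    # S(t) = 0 corresponds to an unbounded cumulative hazard.
    return [-math.log(s) if s > 0 else math.inf for s in survival_probs]

# Hypothetical survival probabilities at increasing time points.
survival = [1.0, 0.9, 0.6, 0.3]
hazard = cumulative_hazard_from_survival(survival)

for s, h in zip(survival, hazard):
    print(f"S(t)={s:.2f}  H(t)={h:.4f}")
```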

This model uses Random Survival Forests (RSF), an extension of random forests to survival analysis that handles non-linear relationships and interactions between covariates.

min leaf: 15

min split: 10

estimators: 1000

rsf score: 0.44823

TTE/RUL location

Green dots mark the RUL (remaining useful life), that is, the point of observation. Red dots mark the TTE, the time-to-event (failure). The part of each curve beyond the dots is the projected survival probability. The point where a curve starts to flatten indicates the predicted TTE from the time of observation, not the end of the curve line.


GRADIENT BOOSTING SA

The number of estimators is searched with a maximum depth of 1 and a learning rate of 0.01. The concordance index values computed for the last three iterations are as follows:

n_estimators: 140, score: 0.7638743376093853

n_estimators: 145, score: 0.7638939630846249

n_estimators: 150, score: 0.7640207374896021

The plots show the behaviour of survival function and cumulative hazard function using Gradient Boosting Survival Analysis (Base Model).


Aggressive probability of failure, while the cumulative hazard lags.

COXNET SA

The L1 ratio is set to 0.90 with 100 epochs to find the alpha coefficient. The mean concordance index and its standard deviation are plotted against the candidate alpha values to select the alpha coefficient.

COXNET prediction

The first 10 elements show how the 'Physical_core_speed_rpm' feature affects the survival probability of the engine.