C-MAPSS Jet Engine Simulated Data

Commercial Modular Aero-Propulsion System Simulation

NASA’s C-MAPSS dataset is used to assess engine lifetime and engine degradation, based on this publication.

This dataset and calculations are for educational purposes only as analyzed by Roderick Paulino.


TEST SET FD001

Each data set consists of multiple multivariate time series and is further divided into training and test subsets. Each time series is from a different engine, i.e., the data can be considered to be from a fleet of engines of the same type. Each engine starts with a different degree of initial wear and manufacturing variation, which is unknown to the user. This wear and variation is considered normal, i.e., it is not considered a fault condition. There are three operational settings that have a substantial effect on engine performance. These settings are also included in the data. The data is contaminated with sensor noise.

The engine is operating normally at the start of each time series, and develops a fault at some point during the series. In the training set, the fault grows in magnitude until system failure. In the test set, the time series ends some time prior to system failure. The objective of the competition is to predict the number of remaining operational cycles before failure in the test set, i.e., the number of operational cycles after the last cycle that the engine will continue to operate. A vector of true Remaining Useful Life (RUL) values is also provided for the test data.
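Because every training series runs to failure, a per-row RUL label can be derived as the distance to the unit's last recorded cycle. The sketch below illustrates that arithmetic on a toy frame; the column names `unit` and `cycle` are assumptions, not names taken from the original notebook.

```python
import pandas as pd

def add_rul(train: pd.DataFrame) -> pd.DataFrame:
    """Append a 'rul' column: cycles left until each unit's final recorded cycle."""
    last_cycle = train.groupby("unit")["cycle"].transform("max")
    return train.assign(rul=last_cycle - train["cycle"])

# Hypothetical toy data: unit 1 fails at cycle 3, unit 2 fails at cycle 2.
toy_train = pd.DataFrame({"unit": [1, 1, 1, 2, 2], "cycle": [1, 2, 3, 1, 2]})
labelled = add_rul(toy_train)
print(labelled)
```

This labelling only applies to the training set. For the test set the final cycle is not a failure, so the supplied RUL vector has to be added to the last observed cycle instead.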

The data are provided as a zip-compressed text file with 26 columns of numbers, separated by spaces. Each row is a snapshot of data taken during a single operational cycle, each column is a different variable. The columns correspond to:


1) unit number

2) time, in cycles

3) operational setting 1

4) operational setting 2

5) operational setting 3

6) sensor measurement 1

7) sensor measurement 2

...

26) sensor measurement 21
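A minimal loading sketch for this layout, assuming pandas is available. The 26 columns are read as 5 leading columns (unit, cycle, three settings) followed by the sensor channels in columns 6 to 26, i.e. 21 channels. The column names are illustrative assumptions, and the two sample rows are synthetic.

```python
import io
import pandas as pd

# Assumed names: 5 leading columns plus 21 sensor channels = 26 columns.
COLUMNS = (
    ["unit", "cycle", "setting_1", "setting_2", "setting_3"]
    + [f"sensor_{i}" for i in range(1, 22)]
)

def load_cmapss(source) -> pd.DataFrame:
    """Read a space-separated C-MAPSS text file (path or file-like object)."""
    return pd.read_csv(source, sep=r"\s+", header=None, names=COLUMNS)

# Two synthetic rows standing in for a real file.
rows = [[1, c, 0.0, 0.0, 100.0] + [float(i) for i in range(21)] for c in (1, 2)]
sample = "\n".join(" ".join(str(v) for v in row) for row in rows)
df = load_cmapss(io.StringIO(sample))
print(df.shape)
```

Using a whitespace regex as the separator keeps the reader tolerant of repeated or trailing spaces in the text file.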


Data Set: FD001

Train trajectories: 100

Test trajectories: 100

Conditions: ONE (Sea Level)

Fault Modes: ONE (HPC Degradation)


Data Set: FD002

Train trajectories: 260

Test trajectories: 259

Conditions: SIX

Fault Modes: ONE (HPC Degradation)


Data Set: FD003

Train trajectories: 100

Test trajectories: 100

Conditions: ONE (Sea Level)

Fault Modes: TWO (HPC Degradation, Fan Degradation)


Data Set: FD004

Train trajectories: 248

Test trajectories: 249

Conditions: SIX

Fault Modes: TWO (HPC Degradation, Fan Degradation)

Reference: A. Saxena, K. Goebel, D. Simon, and N. Eklund, ‘Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation’, in the Proceedings of the 1st International Conference on Prognostics and Health Management (PHM08), Denver CO, Oct 2008.

The next step is to identify the censoring mechanism that applies to the dataset, so that the survival probability of each engine can be analyzed from the supplied RUL data. The statement "In the test set, the time series ends some time prior to system failure" means that the test observations are right-censored. Conversely, "In the training set, the fault grows in magnitude until system failure" means that each training series ends at the failure itself, so every training engine has an observed time-to-event.

This experimental setup presents a survival analysis scenario with a mix of right-censoring and complete failure events. The training data, where faults progress to complete system failure, provides instances of time-to-event observations. In contrast, the test data simulates right-censoring; time series are cut off before failure, indicating that the engine continued operating past the observation window.  To analyze the survival probability of each engine with this dataset, it's essential to account for this censoring mechanism. Appropriate survival analysis techniques will allow the model to leverage both the complete failure data from the training set and the right-censored data from the test set, offering a more comprehensive analysis of engine degradation patterns.
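The setup described above can be sketched as a per-engine survival table: one row per engine, with the last observed cycle as the duration and an event flag of 1 for training engines (failure observed) and 0 for test engines (right-censored). Column names and toy values are assumptions for illustration only.

```python
import pandas as pd

def survival_table(train: pd.DataFrame, test: pd.DataFrame) -> pd.DataFrame:
    """One row per engine: duration = last observed cycle, event = 1 if failure seen."""
    def summarise(df: pd.DataFrame, source: str, event: int) -> pd.DataFrame:
        out = df.groupby("unit", as_index=False)["cycle"].max()
        out = out.rename(columns={"cycle": "duration"})
        out["event"] = event      # 1 = failed (train), 0 = censored (test)
        out["source"] = source    # unit ids restart at 1 in each subset
        return out
    return pd.concat(
        [summarise(train, "train", 1), summarise(test, "test", 0)],
        ignore_index=True,
    )

# Hypothetical toy data.
toy_train = pd.DataFrame({"unit": [1, 1, 1, 2, 2], "cycle": [1, 2, 3, 1, 2]})
toy_test = pd.DataFrame({"unit": [1, 1], "cycle": [1, 2]})
table = survival_table(toy_train, toy_test)
print(table)
```

The `source` column is kept because unit numbers are reused across the two subsets, so `unit` alone does not identify an engine after merging.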

Density of Events: In the 240-250 time cycle range, the histogram shows fewer counts compared to the peak around 200 cycles. This suggests that fewer engines reach this stage, which might be indicative of an increasing rate of engine failures as the cycles increase.

Prediction of Reliability or Failure: Given the skew of the distribution toward lower cycle numbers, and considering the engine's operational life distribution, we can infer that engines operating within the 240-250 cycle range are likely nearing the end of their operational life. This implies a higher probability of failure.

TRAIN SET FD001

The training set has no explicit event indicator. Each row's time (cycle) is only the time of observation, so a single row does not tell us whether the engine has failed at that point.

Dropping all features with constant values, as outlined by profiling.

Removing all highly correlated features (absolute correlation above a 0.95 threshold) together with all features with constant values.
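One common way to implement these two filters is sketched below. It is an assumption about the procedure, not the notebook's exact code: constant columns are dropped first, then for each pair with absolute Pearson correlation above the threshold the later column is removed.

```python
import numpy as np
import pandas as pd

def drop_constant_and_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop constant columns, then one column of each highly correlated pair."""
    varying = df.loc[:, df.nunique() > 1]
    corr = varying.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return varying.drop(columns=to_drop)

# Hypothetical toy features: 'b' duplicates 'a' up to scale, 'c' is constant.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [7.0, 7.0, 7.0, 7.0, 7.0],
    "d": [5.0, 1.0, 4.0, 2.0, 3.0],
})
reduced = drop_constant_and_correlated(toy)
print(list(reduced.columns))
```

Which member of a correlated pair survives depends on column order, so the result should be checked against domain knowledge before the dropped sensors are discarded for good.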

Checking the distributions of the remaining features after feature engineering.

SURVIVAL FUNCTIONS

For FD001, the test and training sets are combined so that KaplanMeierFitter is fitted on both censored and uncensored data.
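To make clear what a Kaplan-Meier fit computes on such merged data, here is a hand-rolled product-limit estimator in NumPy. It is an illustrative stand-in, not the lifelines implementation, and the durations and event flags are made up.

```python
import numpy as np

def kaplan_meier(durations, events):
    """Return (event times, survival probabilities) of the product-limit estimator."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    times = np.unique(durations[events == 1])
    survival = []
    s = 1.0
    for t in times:
        at_risk = np.sum(durations >= t)                   # still under observation at t
        failed = np.sum((durations == t) & (events == 1))  # failures exactly at t
        s *= 1.0 - failed / at_risk
        survival.append(s)
    return times, np.array(survival)

# Hypothetical merged sample: 1 = failed (train), 0 = censored (test).
km_times, km_surv = kaplan_meier([3, 5, 5, 8, 10], [1, 0, 1, 1, 0])
print(km_times, km_surv)
```

Censored engines never trigger a drop in the curve, but they stay in the at-risk count until their last observed cycle, which is how the test set contributes information.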

data distribution (merged data)

This is the anticipated data distribution, in which censored and uncensored samples overlap. As predicted, the time cycles of the test set overlap with those of the training set, based on the survival probability curve.

LIFELINES COXPHFITTER

Fitting lifelines CoxPHFitter. The concordance index measures how well the model discriminates between subjects who experience the event sooner and those who experience it later. A concordance index of 0.67 indicates that the model has moderate discrimination.
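The definition can be made concrete with a small pure-Python version of Harrell's concordance index. This is a simplified sketch (pairs with tied durations are skipped) and the inputs are hypothetical. A higher risk score is taken to mean an earlier expected failure.

```python
from itertools import combinations

def concordance_index(durations, events, risk_scores):
    """Fraction of comparable pairs in which the earlier failure has the higher risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(durations)), 2):
        if durations[j] < durations[i]:
            i, j = j, i  # make i the subject with the shorter duration
        # A pair is comparable only if the shorter duration is an observed failure.
        if durations[i] == durations[j] or not events[i]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5
    return concordant / comparable

c_index = concordance_index(
    durations=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.4, 0.5, 0.1],
)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which 0.67 reads as moderate.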

EXAMPLE OF NOISE 

Checking the noise of "Highpressure_turbines_Cool_air_flow" where synthetic data was introduced.

TEST SET

eps value of 0.5 and minimum samples of 5. A Silhouette Score ranges from -1 to 1. A score of 0.585 is closer to 1, suggesting that, on average, each data point is closer to other points in its own cluster than to points in neighboring clusters. 

TRAIN SET

eps value of 0.5 and minimum samples of 5. Although the score isn't very close to 1, it still indicates moderate separation between clusters and cohesion within clusters. In other words, the clusters are more densely packed and separated from each other than they would be in a dataset with a lower Silhouette Score.
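For reference, the Silhouette Score of one point is (b - a) / max(a, b), where a is its mean distance to the other points of its own cluster and b is the smallest mean distance to any other cluster. The NumPy sketch below implements that definition on made-up points. It assumes at least two clusters and that any noise points (which DBSCAN implementations typically label -1) were removed beforehand.

```python
import numpy as np

def mean_silhouette(X, labels):
    """Average silhouette coefficient over all points (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                      # exclude the point itself
        if not same.any():
            scores.append(0.0)               # singleton cluster convention
            continue
        a = dist[i, same].mean()             # cohesion
        b = min(dist[i, labels == k].mean()  # separation from nearest other cluster
                for k in np.unique(labels) if k != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = mean_silhouette(points, [0, 0, 1, 1])  # tight, well separated clusters
bad = mean_silhouette(points, [0, 1, 0, 1])   # clusters that mix the two groups
print(good, bad)
```

Well separated clusters score close to 1, while a labelling that mixes the groups goes negative, which is the scale on which the scores reported here should be read.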

MERGE TRAIN & TEST

Silhouette Score:  0.6198522073414106

MERGE TRAIN & TEST

Silhouette Score:  0.753126

P-VALUES 

The variables with p-values below 0.05 are considered statistically significant in most contexts, implying that they have a statistically significant effect on the hazard rate.

COMPRESSOR SIDE

Static pressure on the compressor side tends to increase as it approaches end of service or failure. 

TURBINE SIDE

High Pressure on the turbine side tends to decrease as it approaches end of service or failure. 

SETTING 2

BLEED ENTHALPY

CORE SPEED

Core speed shows a seemingly exponential increase as it approaches end of service or failure. The steep increase may raise the hazard rate, which indicates degradation.

FAN SPEED

The fan speed seems to follow the normal wear and tear as it approaches its service life.

LPC OUTLET TEMPERATURE

LPT OUTLET TEMPERATURE

Bypass Ratio

Ratio of fuel flow to Ps30 pps

setting 1

Bleed Enthalpy at setting 1. Engine Cooling: Cooling turbine blades in some engines.

setting 2

Bleed Enthalpy at setting 2. Pneumatic Systems: Powering various aircraft components.

setting 3

Bleed Enthalpy at setting 3. Compressed air extracted from various stages within a gas turbine engine. 

RTA SYSTEM DESIGN 

Since health index was not defined, we were expected to infer it from the given sensed variables.

RTA (Real-Time Assurance) system design:

Performance: Maintain engine operation within performance boundaries for optimal efficiency and to prevent issues like a combustor blowout.

Specific Safety Limits

Fan speed (Nf): depending on the engine model

Core speed (Nc): depending on the engine model

High-pressure compressor discharge pressure (Ps3): depending on the engine model

Performance Boundaries

HPC stall margin: Greater than 15%

The stall margin represents the safety buffer between the actual operating conditions of the compressor and the conditions at which the compressor will stall. Compressor stall is a situation where the airflow reverses direction or becomes highly turbulent, which can lead to severe performance issues, engine damage, or failure.

LPC stall margin: Greater than 6% (15% for this dataset including fan)

The stall margin indicates the buffer between the operating point of the compressor and the point at which the compressor blades will stall. A stall in a compressor blade occurs when the angle of attack of the incoming air exceeds a critical angle, causing a sudden decrease in lift and an increase in drag, which disrupts the airflow and can lead to engine failure or severe performance degradation.

RU (Richness Unit) limit: Greater than 17% (to prevent lean combustor blowout). 

A "lean blowout" occurs when the fuel-to-air mixture in the combustion chamber of a gas turbine is too lean (i.e., not enough fuel relative to air) to sustain combustion, causing the flame to extinguish. This can lead to engine shutdown or failure, and is a critical safety and performance consideration in gas turbine operation. A value greater than 17% indicates a richer mixture, meaning there is relatively more fuel in the mixture than at the lean blowout threshold. Operating the engine with an RU greater than 17% ensures that the mixture has enough fuel to maintain stable combustion under varying operational conditions, such as changes in air temperature, pressure, and engine load. Note that for EGT (Exhaust Gas Temperature), the stall threshold is about 2%.
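The performance boundaries listed above can be encoded as a small lookup and checked per operating point. This is only a sketch: the margin names are hypothetical, the margins themselves are not columns of the dataset, and the LPC figure uses the 6% value (not the 15% variant mentioned for this dataset).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MarginLimit:
    name: str           # hypothetical identifier for the margin
    minimum_pct: float  # the margin must stay strictly above this value

# Thresholds as stated in the text: HPC > 15%, LPC > 6%, RU > 17%.
LIMITS = [
    MarginLimit("hpc_stall_margin", 15.0),
    MarginLimit("lpc_stall_margin", 6.0),
    MarginLimit("richness_unit", 17.0),
]

def violated_limits(margins: dict) -> list:
    """Return the names of all margins that are at or below their minimum."""
    return [lim.name for lim in LIMITS if margins[lim.name] <= lim.minimum_pct]

# Made-up operating point: only the LPC stall margin is too small.
operating_point = {"hpc_stall_margin": 18.0, "lpc_stall_margin": 5.0, "richness_unit": 20.0}
violations = violated_limits(operating_point)
print(violations)
```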

main\core fan speed

HPC PRESSURE

performance degradation parameters

Checking the variability of the features across time cycles for different units in the test and train sets, showing engine degradation towards failure or end of service life.

Fan Inlet Temperature: As fan efficiency decreases, the fan may work harder to maintain the same pressure, which could lead to an increase in the inlet temperature due to higher work input and less effective cooling.

LPC Outlet Temperature: Lower LPC efficiency typically means the compressor is not compressing the air as effectively, which could cause an increase in the temperature of the air at the LPC outlet due to less effective compression.

HPC Outlet Temperature: As with the LPC, reduced HPT efficiency could result in a higher HPC outlet temperature because the turbine is not extracting as much energy from the air, leading to higher exit temperatures.

LPT Outlet Temperature: A decrease in LPT efficiency could lead to an increase in LPT outlet temperature, again due to less energy being extracted by the turbine.

Fan Inlet Pressure: A decrease in Fan Flow may indicate a potential decrease in fan inlet pressure, assuming the fan is moving less air due to wear.

Bypass-Duct Pressure: This could potentially increase if fan flow decreases but the bypass ratio remains the same, as the same amount of air is being forced through a possibly more restrictive pathway due to wear.

HPC Outlet Pressure: Decreases in HPT efficiency and flow could result in a decrease in the HPC outlet pressure due to less effective work being performed by the turbine.

Physical Fan Speed and Corrected Fan Speed: These may either increase to compensate for lost efficiency or decrease if wear affects the fan's ability to rotate at the same speed.

Physical Core Speed and Corrected Core Speed : Similarly, these may increase to compensate for decreased efficiency or decrease if wear prevents the core from maintaining its speed.

Engine Pressure Ratio : The engine pressure ratio may decrease if both the pressure at the fan inlet and HPC outlet decrease.

HPC Outlet Static Pressure: This is likely to decrease if the HPC outlet pressure decreases.

Ratio of Fuel Flow to Ps30: If the engine is less efficient, it may require more fuel to maintain the same power output, so this ratio might increase.

Bypass Ratio: If the flow through the core decreases and the bypass duct pressure changes, the bypass ratio could change, although the direction would depend on the relative change in flow through the bypass versus the core.

Burner Fuel-Air Ratio: This might increase as the engine tries to maintain power output with decreased efficiency.

Bleed Enthalpy: Bleed air might become hotter if the engine components are less efficient at extracting energy from the air.

Required Fan Speed and Required Fan Conversion Speed : These might increase if the engine control is trying to compensate for lower efficiency to maintain performance.

High-Pressure Turbines Cool Air Flow : This may increase if more cooling is required due to higher operational temperatures from reduced efficiencies.

Low-Pressure Turbines Cool Air Flow : Similarly, this may also increase if more cooling is necessary.

time-cycle

Survival probability against cycle time, assuming the training set experienced failure at the end of the cycle.

PREDICTED RUL

Survival probability against RUL.

SURVIVAL ANALYSIS

Find the indices of probabilities that are zero or near zero. Review the corresponding data points. Analyze these points to understand why they might have such low IPCW probabilities.

Using max rul

Zero or negative probabilities have been detected in the IPCW calculations, despite the probabilities array. Since the RUL values are hypothetical predicted values, we will not rely on this feature to train our model but rather on the cycle time together with the sensor measurements. We will verify whether our 'Target' assumption is true based on the sensor readings and thermodynamic properties.

Training set: fault grows until system failure (TTE)

Test set: time series ends prior to system failure (Censored)

IPCW VALUES

Inverse Probability of Censoring Weights (IPCW) is a technique used to adjust for censoring in observational studies where some subjects' end outcomes are not observed within the study period. This method involves calculating weights for each individual, which are the inverse of their probability of remaining uncensored up to a given time. These weights are then used in further analyses to correct for any bias introduced by the censoring. So the assumption of using the maximum RUL as the time of event for the test set might not be accurate in this dataset.
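A compact sketch of the weight calculation, under stated assumptions: the censoring survival function G is estimated with a Kaplan-Meier fit in which censoring plays the role of the event, it is evaluated just before each subject's own time, and censored subjects receive weight 0. The data are invented and this is not the notebook's own routine.

```python
import numpy as np

def ipcw_weights(durations, events):
    """Weight = 1 / G(t-) for observed failures, 0 for censored subjects."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    censor_times = np.unique(durations[events == 0])
    weights = np.zeros(len(durations))
    for i, (t, e) in enumerate(zip(durations, events)):
        if e == 0:
            continue  # censored: outcome unknown, weight stays 0
        g = 1.0       # probability of remaining uncensored up to just before t
        for u in censor_times[censor_times < t]:
            at_risk = np.sum(durations >= u)
            censored = np.sum((durations == u) & (events == 0))
            g *= 1.0 - censored / at_risk
        weights[i] = 1.0 / g
    return weights

w = ipcw_weights(durations=[2, 3, 4, 5, 6], events=[1, 0, 1, 0, 1])
print(w)
```

Failures that occur late, after many subjects have already been censored, receive large weights because G is small there. Weights that blow up in this way are a signal that the censoring assumptions deserve a second look.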


health index

As quoted from NASA's dataset document, "the built-in control system consists of a fan-speed controller, and a set of regulators and limiters. The latter include three high-limit regulators that prevent the engine from exceeding its design limits for core speed, engine-pressure ratio, and High-Pressure Turbine (HPT) exit temperature; a limit regulator that prevents the static pressure at the High-Pressure Compressor (HPC) exit from going too low; and an acceleration and deceleration limiter for the core speed. A comprehensive logic structure integrates these control-system components in a manner similar to that used in real engine controllers such that integrator-windup problems are avoided. Furthermore, all of the gains for the fan-speed controller and the four limit regulators are scheduled such that the controller and regulators perform as intended over the full range of flight conditions and power levels. The engine diagram in Figure 1 shows the main elements of the engine model and the flow chart in Figure 2 shows how various subroutines are assembled in the simulation."

The task was to estimate remaining life of an unspecified system using historical data only, irrespective of the underlying physical process.

INITIAL WEAR

Table 3 shows the effects of engine wear on various components of an aircraft engine, specifically on their efficiency and flow, over time as measured by the number of cycles the engine has undergone.

The positive (+) and negative (-) signs denote the direction of change in the efficiency and flow characteristics due to engine wear:

A negative (-) sign indicates a decrease in efficiency or flow. This means that as the engine wears, these components become less efficient or have reduced flow rates. For instance, a negative change in "Fan_Efficiency" means that the fan is less efficient than it was initially.

A positive (+) sign indicates an increase in efficiency or flow, which can point to leakage that increases flow with wear. For example, an increase in "HPT_Flow" means there is more flow through the High-Pressure Turbine than initially.

sarimax (Seasonal Autoregressive Integrated Moving Average + exogenous variables)

SETTING 1

Physical Fan Speed: Has a negative and statistically significant coefficient. Suggests that as the Physical Fan Speed increases, 'setting_1' tends to decrease which was earlier indicated in our safety operating limit. 

Physical_fan_speed_rpm 0.016

Fan_inlet_temperature_R 0.061

Burner_fuelair_ratio 0.061

HPC_outlet_temperature_R 0.072

Physical_core_speed_rpm 0.076

SETTING 2

Lowpressure_turbines_Cool_air_flow 0.164

Highpressure_turbines_Cool_air_flow 0.163

Bleed_Enthalpy 0.137

SETTING 3 (constant variable)

Although it is a constant variable, it still has a direction of causality:

Fan_inlet_temperature_R: Positive coefficient. Suggests that when 'setting_3' increases, the fan inlet temperature is likely to increase as a response.

Engine_pressure_ratioP50P2:  Positive coefficient.  Similar to the above, an increase in 'setting_3' might lead to a higher engine pressure ratio.

Burner_fuelair_ratio:  Positive coefficient.  Indicates a potential need to adjust the fuel-to-air ratio in response to a change in 'setting_3'.

Required_fan_speed and Required_fan_conversion_speed: It's possible these variables are pre-calculated targets, not directly controlled by 'setting_3'. The model might be suggesting that 'Required_fan_speed' and 'Required_fan_conversion_speed' should be adjusted in line with shifts in 'setting_3'.

Fan_inlet_temperature_R 0.00

Engine_pressure_ratioP50P2 0.00

Burner_fuelair_ratio 0.00


UNSCENTED KALMAN FILTER

Using the UKF model, without adjusting the state transition and measurement functions out-of-the-box. Only fitting with high p-values. The three plots below are output of taking all the measurements including constant values for all three operational settings.

setting 1

The plot shows data points that generally follow the red line, which represents the expected quantiles if the data were normally distributed.

The lower end of the distribution deviates slightly from the line, which may suggest a minor skew in the data or the presence of some outliers.

setting 2

The points generally follow the expected line but deviate towards the ends, especially on the upper end.

The deviation at the higher end suggests the presence of outliers or heavy tails in the distribution of estimates.

This may indicate that the filter performs well for most of the data but struggles with certain conditions or timeframes that lead to these discrepancies.
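The comparison described for these plots (sample quantiles against the quantiles expected under normality) can be reproduced numerically with the standard library. This is an illustrative sketch assuming the plots are normal Q-Q plots; the plotting positions (i - 0.5) / n are one common convention and the sample values are placeholders.

```python
from statistics import NormalDist, mean, stdev

def qq_pairs(sample):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    ordered = sorted(sample)
    n = len(ordered)
    reference = NormalDist(mean(ordered), stdev(ordered))
    # Theoretical quantiles at plotting positions (i - 0.5) / n.
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Placeholder residuals; real input would be the filter's estimates or errors.
pairs = qq_pairs([1.0, 2.0, 3.0, 4.0, 5.0])
for expected, observed in pairs:
    print(f"{expected:7.3f} {observed:7.3f}")
```

Points whose observed value sits far from its theoretical partner at either end of the list correspond to the tail deviations discussed above.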

setting 1

UKF using all sensor measurements.

setting 2

UKF using all sensor measurements.

setting 3

UKF using all sensor measurements.

health indexing

Based on setting 1, as substantiated by the findings of our Unscented Kalman Filter model and SARIMAX, and on the results of DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and OPTICS (Ordering Points To Identify the Clustering Structure), we can now see the values (27 counts) from our sensors that might correspond to an "Outlier or Noise" or "Failure State" of the engine. Based on these results, we then check these values, which might denote the direction of change in the efficiency and flow characteristics due to engine wear. As a recap of our given datasets:

GROUND TRUTH

Based on linear cycle time health index. Sample of Health Index Zoning on Physical Core Speed. We also inspect the data distribution of our full dataset for censored data and samples which have failed. 0 - Censored. 1 - Failed. Note that the distributions confirm our safe zone limits which also reveals the possible obvious parameters to calculate the RUL of censored data.

PHYSICAL CORE SPEED RPM

PHYSICAL FAN SPEED RPM

HPC OUTLET TEMPERATURE R

labelling

From the plots above, we infer the following health matrix. Note that we also included parameters from censored samples that have not yet experienced failure. If either of the two features violates the operating range, the health index would be zero. If we want our prediction to be more conservative, we can adjust the limits (threshold) from 1% to 2% of these values, depending on the direction of flow and efficiency.
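The labelling rule can be sketched as follows. All limits below are hypothetical placeholders, not the values inferred from the plots, and the conservative margin is interpreted here as a fraction of each band's width, which is only one possible reading of the 1% to 2% adjustment.

```python
import pandas as pd

def health_index(df: pd.DataFrame, limits: dict, margin: float = 0.0) -> pd.DataFrame:
    """Add 'violations' (count of features out of range) and a binary 'health_index'."""
    violations = pd.Series(0, index=df.index)
    for column, (low, high) in limits.items():
        band = high - low
        tight_low = low + margin * band    # tighten both ends of the band
        tight_high = high - margin * band
        out_of_range = (df[column] < tight_low) | (df[column] > tight_high)
        violations += out_of_range.astype(int)
    return df.assign(violations=violations, health_index=(violations == 0).astype(int))

# Hypothetical operating ranges and readings.
limits = {"core_speed": (9000.0, 9100.0), "fan_speed": (2387.5, 2388.4)}
readings = pd.DataFrame({
    "core_speed": [9050.0, 9120.0, 9099.0],
    "fan_speed": [2388.0, 2388.1, 2388.5],
})
base = health_index(readings, limits)
strict = health_index(readings, limits, margin=0.02)
print(strict)
```

The `violations` count mirrors a legend that reports how many parameters were violated, while `health_index` drops to zero as soon as any single feature leaves its range.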

plot the health index

It is very interesting to see that after implementing the health index to test our hypothesis, we are able to identify data points where our health index was violated inside the safe operating zone around 200 cycles. Notice that most failed samples lie in the periphery of the outer curve of Physical Core/Fan Speed. From these plots, we can conclude that we need to create the performance curves instead.

The legend denotes how many parameters were violated when a sample exceeds the minimum and maximum values of our health index.

PERFORMANCE CURVES

The graph showcases the performance of various high-efficiency engines over time, measured in cycles, with respect to their High-Pressure Compressor (HPC) outlet temperature. The optimal performance curve, depicted in blue, serves as a benchmark, indicating the ideal temperature trajectory for the engines under normal operating conditions (i.e., the longest life span). Engines that closely follow this blue curve tend to demonstrate superior longevity and efficiency.

The failure curve, shown in red, marks the temperature path of an engine that fails prematurely compared to others. This curve starts to deviate significantly from the optimal curve at an early stage, rising above the normal temperature range. This deviation suggests that the engine in question might be experiencing issues such as increased friction, component wear, or other inefficiencies leading to overheating, which in turn accelerates the engine's degradation.

Engines that enter the normal operating zone before approximately 240 cycles show a tendency to have a longer lifespan. Notably, these engines maintain a gradual increase in temperature, avoiding sharp spikes which can be indicative of potential failures. This behavior emphasizes the importance of gradual thermal adaptation and stability in prolonging engine life.

The graph also depicts several other engines (indicated by various colors and styles of lines), each following unique trajectories. Some of these engines maintain temperatures close to the optimal curve throughout their operational life, suggesting stable and efficient performance. In contrast, others exhibit more volatile temperature changes, potentially signaling irregularities or sub-optimal conditions.
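One way such reference curves could be extracted is sketched below, assuming the optimal curve is taken from the longest-lived training engine and the failure curve from the shortest-lived one, each smoothed with a rolling mean. The column names, window size, and temperatures are illustrative assumptions.

```python
import pandas as pd

def reference_units(train: pd.DataFrame):
    """Return (longest-lived unit, shortest-lived unit) by final cycle count."""
    life = train.groupby("unit")["cycle"].max()
    return life.idxmax(), life.idxmin()

def smoothed_curve(train: pd.DataFrame, unit, column: str, window: int = 3) -> pd.Series:
    """Rolling-mean trajectory of one sensor for one unit, indexed by cycle."""
    series = train.loc[train["unit"] == unit].set_index("cycle")[column]
    return series.rolling(window, min_periods=1).mean()

# Hypothetical toy trajectories for two engines.
toy = pd.DataFrame({
    "unit": [1, 1, 1, 1, 2, 2],
    "cycle": [1, 2, 3, 4, 1, 2],
    "hpc_outlet_temp": [1580.0, 1582.0, 1584.0, 1586.0, 1590.0, 1600.0],
})
best_unit, worst_unit = reference_units(toy)
optimal_curve = smoothed_curve(toy, best_unit, "hpc_outlet_temp", window=2)
print(best_unit, worst_unit)
print(optimal_curve)
```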

failing engines

For the graphs below, we investigate the characteristics of engines with a short useful life. We check how long an engine stays outside the normal operating zone and how this behaviour affects its duration of service until failure.
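A possible way to quantify this is to count, per engine, the cycles spent outside a chosen zone and relate that count to total life. The zone bounds and column names below are hypothetical, and cycles are assumed to start at 1 without gaps.

```python
import pandas as pd

def time_outside_zone(df: pd.DataFrame, column: str, low: float, high: float) -> pd.DataFrame:
    """Per unit: total life, cycles outside [low, high], and their ratio."""
    outside = (df[column] < low) | (df[column] > high)
    summary = (
        df.assign(outside=outside)
        .groupby("unit")
        .agg(life=("cycle", "max"), cycles_outside=("outside", "sum"))
    )
    summary["fraction_outside"] = summary["cycles_outside"] / summary["life"]
    return summary

# Hypothetical toy data: unit 2 is short-lived and always outside the zone.
toy = pd.DataFrame({
    "unit": [1, 1, 1, 1, 2, 2],
    "cycle": [1, 2, 3, 4, 1, 2],
    "hpc_outlet_temp": [1585.0, 1590.0, 1600.0, 1605.0, 1598.0, 1602.0],
})
zone_summary = time_outside_zone(toy, "hpc_outlet_temp", low=1580.0, high=1595.0)
print(zone_summary)
```

Plotting `fraction_outside` against `life` gives a quick check of whether engines that spend more time outside the zone tend to fail earlier.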

The graph below is showing how the unit's pressure increased rapidly and aggressively before it failed. The data points on the graph are converging to a high pressure range, indicating a critical point or threshold. The dashed line represents the unit's behavior, and the graph is providing a visual representation of its characteristics, such as its pressure profile, leading up to failure.