C-MAPSS Jet Engine Simulated Data
Commercial Modular Aero-Propulsion System Simulation
NASA’s C-MAPSS dataset is used here to assess engine lifetime and engine degradation, based on the publication listed under Reference below.
This dataset and calculations are for educational purposes only as analyzed by Roderick Paulino.
TEST SET FD001
The data sets consist of multiple multivariate time series. Each data set is further divided into training and test subsets. Each time series is from a different engine, i.e., the data can be considered to be from a fleet of engines of the same type. Each engine starts with a different degree of initial wear and manufacturing variation, which is unknown to the user. This wear and variation is considered normal, i.e., it is not considered a fault condition. There are three operational settings that have a substantial effect on engine performance. These settings are also included in the data. The data is contaminated with sensor noise.
The engine is operating normally at the start of each time series, and develops a fault at some point during the series. In the training set, the fault grows in magnitude until system failure. In the test set, the time series ends some time prior to system failure. The objective of the competition is to predict the number of remaining operational cycles before failure in the test set, i.e., the number of operational cycles after the last cycle that the engine will continue to operate. A vector of true Remaining Useful Life (RUL) values is also provided for the test data.
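The RUL definition above can be sketched in a few lines. This is a minimal illustration, assuming that the last recorded cycle of a training engine is its failure cycle; the function names `add_training_rul` and `implied_failure_cycle` and all numbers are hypothetical and not taken from the dataset.

```python
def add_training_rul(rows):
    """rows: list of (unit, cycle) pairs. Returns (unit, cycle, rul) triples.

    Assumption: the last recorded cycle of each training unit is its failure
    cycle, so RUL = last cycle of that unit - current cycle.
    """
    last = {}
    for unit, cycle in rows:
        last[unit] = max(cycle, last.get(unit, 0))
    return [(unit, cycle, last[unit] - cycle) for unit, cycle in rows]


def implied_failure_cycle(last_observed_cycle, true_rul):
    """For a test unit, the failure cycle implied by the supplied RUL value."""
    return last_observed_cycle + true_rul


# Synthetic example: unit 1 runs for 3 cycles, unit 2 for 2 cycles.
labelled = add_training_rul([(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)])
print(labelled)
print(implied_failure_cycle(31, 112))  # → 143
```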
The data are provided as a zip-compressed text file with 26 columns of numbers, separated by spaces. Each row is a snapshot of data taken during a single operational cycle, each column is a different variable. The columns correspond to:
1) unit number
2) time, in cycles
3) operational setting 1
4) operational setting 2
5) operational setting 3
6) sensor measurement 1
7) sensor measurement 2
...
26) sensor measurement 21
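The column layout above can be read with a small parser. The sketch below assumes that the 26 columns are 2 identifiers, 3 operational settings and 21 sensor measurements; the names `COLUMNS` and `parse_cmapss` are hypothetical, and the sample rows are made up rather than taken from the data files.

```python
# Hypothetical column names for the 26 space-separated columns.
COLUMNS = (
    ["unit", "cycle"]
    + [f"setting_{i}" for i in range(1, 4)]
    + [f"sensor_{i}" for i in range(1, 22)]
)


def parse_cmapss(text):
    """Parse space-separated rows into dicts keyed by column name."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        row = dict(zip(COLUMNS, (float(v) for v in fields)))
        row["unit"] = int(row["unit"])
        row["cycle"] = int(row["cycle"])
        rows.append(row)
    return rows


# Synthetic two-row sample (made-up values, not taken from the dataset).
sample = "\n".join(
    " ".join(["1", str(cycle)] + ["0.0"] * 24) for cycle in (1, 2)
)
rows = parse_cmapss(sample)
print(len(COLUMNS), rows[1]["cycle"])  # → 26 2
```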
Data Set: FD001
Train trajectories: 100
Test trajectories: 100
Conditions: ONE (Sea Level)
Fault Modes: ONE (HPC Degradation)
Data Set: FD002
Train trajectories: 260
Test trajectories: 259
Conditions: SIX
Fault Modes: ONE (HPC Degradation)
Data Set: FD003
Train trajectories: 100
Test trajectories: 100
Conditions: ONE (Sea Level)
Fault Modes: TWO (HPC Degradation, Fan Degradation)
Data Set: FD004
Train trajectories: 248
Test trajectories: 249
Conditions: SIX
Fault Modes: TWO (HPC Degradation, Fan Degradation)
Reference: A. Saxena, K. Goebel, D. Simon, and N. Eklund, ‘Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation’, in the Proceedings of the 1st International Conference on Prognostics and Health Management (PHM08), Denver CO, Oct 2008.
Before fitting survival models, we identify the censoring mechanism that applies to the dataset, so that the survival probability of each engine can be analyzed from the supplied RUL data. The statement "In the test set, the time series ends some time prior to system failure" means that the test observations are right-censored. The statement "In the training set, the fault grows in magnitude until system failure" means that each training engine has failed by its last recorded cycle, so that cycle is an observed time-to-event.
This experimental setup presents a survival analysis scenario with a mix of right-censoring and complete failure events. The training data, where faults progress to complete system failure, provides instances of time-to-event observations. In contrast, the test data simulates right-censoring; time series are cut off before failure, indicating that the engine continued operating past the observation window. To analyze the survival probability of each engine with this dataset, it's essential to account for this censoring mechanism. Appropriate survival analysis techniques will allow the model to leverage both the complete failure data from the training set and the right-censored data from the test set, offering a more comprehensive analysis of engine degradation patterns.
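The censoring scheme described above can be sketched as one table with a duration and an event flag per engine. The function names and the tuple layout are assumptions made for illustration; the input rows are synthetic.

```python
def last_cycle_per_unit(rows):
    """Return {unit: last observed cycle} from (unit, cycle) pairs."""
    last = {}
    for unit, cycle in rows:
        last[unit] = max(cycle, last.get(unit, 0))
    return last


def survival_table(train_rows, test_rows):
    """Build (source, unit, duration, event) records.

    event = 1: failure observed at the last cycle (training set).
    event = 0: right-censored at the last cycle (test set).
    """
    table = []
    for unit, duration in sorted(last_cycle_per_unit(train_rows).items()):
        table.append(("train", unit, duration, 1))
    for unit, duration in sorted(last_cycle_per_unit(test_rows).items()):
        table.append(("test", unit, duration, 0))
    return table


# Synthetic (unit, cycle) rows.
train = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
test = [(1, 1), (1, 2)]
table = survival_table(train, test)
print(table)
```

Unit numbers restart in each subset, so the `source` field is kept to tell a training unit 1 from a test unit 1.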
Density of Events: In the 240-250 time cycle range, the histogram shows fewer counts compared to the peak around 200 cycles. This suggests that fewer engines reach this stage, which might be indicative of an increasing rate of engine failures as the cycles increase.
Prediction of Reliability or Failure: Given the skew of the distribution toward lower cycle numbers, and considering the engine's operational life distribution, we can infer that engines operating within the 240-250 cycle range are likely nearing the end of their operational life. This implies a higher probability of failure.
TRAIN SET FD001
The training set has no explicit event indicator; the time (cycle) is therefore only the time of observation, and the data alone do not say whether the engine failed at that point.
Dropping all features with constant values, as outlined by profiling.
Removing all highly correlated features (correlation threshold of 0.95) and all features with constant values.
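The two filtering steps can be sketched as follows. This is not the code used for the analysis: it assumes Pearson correlation on absolute values, keeps the first feature of each correlated pair, and uses made-up feature names and values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_features(columns, threshold=0.95):
    """columns: dict of name -> list of values. Returns the kept names."""
    # Step 1: drop constant features (zero variance).
    varying = {n: v for n, v in columns.items() if max(v) != min(v)}
    # Step 2: keep a feature only if it is not highly correlated
    # with a feature that has already been kept.
    kept = []
    for name, values in varying.items():
        if all(abs(pearson(values, varying[k])) < threshold for k in kept):
            kept.append(name)
    return kept


data = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],    # almost a multiple of "a"
    "c": [5.0, 5.0, 5.0, 5.0],    # constant
    "d": [1.0, -1.0, 1.0, -1.0],  # weakly correlated with "a"
}
kept = select_features(data)
print(kept)  # → ['a', 'd']
```

Which member of a correlated pair survives depends on column order, so the order should be fixed before comparing runs.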
Checking the distributions of the remaining features after feature engineering.
SURVIVAL FUNCTIONS
For FD001, the test and training sets are combined so that KaplanMeierFitter is fitted on both censored and uncensored data.
data distribution (merged data)
This is the anticipated data distribution, in which censored and uncensored samples overlap. As predicted, the time cycles of the test set overlap with those of the training set on the survival probability curve.
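For readers who want to see what the Kaplan-Meier estimate computes on merged censored and uncensored samples, here is a minimal pure-Python sketch. It is not the lifelines implementation; the function name `kaplan_meier` and the toy durations are assumptions for illustration.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] == 1 means a failure was observed at durations[i];
    events[i] == 0 means the sample was right-censored at durations[i].
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        failures = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        leaving = sum(1 for d in durations if d == t)
        if failures:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # failed and censored samples both leave the risk set
    return curve


# Toy data: one censored sample at t = 3.
durations = [2, 3, 3, 5]
events = [1, 0, 1, 1]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(t, round(s, 3))
```

The censored sample at t = 3 does not lower the curve by itself; it only shrinks the risk set for later times.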
LIFELINES COXPHFITTER
Fitting to lifelines CoxPHFitter. The concordance index is a measure of how well the model discriminates between subjects who experience the event sooner from those who experience it later. A concordance index of 0.67 indicates that the model has moderate discrimination.
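To make the concordance index concrete, the following is a minimal sketch of a Harrell-style pairwise computation. It is not the lifelines implementation: the function name `concordance_index_sketch`, the convention that a higher score means a higher risk, and the toy values are all assumptions.

```python
def concordance_index_sketch(durations, events, risk_scores):
    """Fraction of comparable pairs that the risk scores order correctly.

    A pair (i, j) is comparable when subject i has an observed event and a
    shorter duration than subject j. It is concordant when i also has the
    higher risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data: the third subject is censored.
durations = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index_sketch(durations, events, risk)
print(c_index)  # → 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why 0.67 reads as moderate discrimination.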
EXAMPLE OF NOISE
Checking the noise of "Highpressure_turbines_Cool_air_flow" where synthetic data was introduced.
TEST SET
Clustering uses an eps value of 0.5 and a minimum of 5 samples. A Silhouette Score ranges from -1 to 1. A score of 0.585 is closer to 1, suggesting that, on average, each data point is closer to other points in its own cluster than to points in neighboring clusters.
TRAIN SET
Clustering uses an eps value of 0.5 and a minimum of 5 samples. Although the score isn't very close to 1, it still indicates moderate separation between clusters and cohesion within clusters. In other words, the clusters are more densely packed and separated from each other than they would be in a dataset with a lower Silhouette Score.
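The Silhouette Score quoted in this section can be illustrated with a small pure-Python sketch. It assumes Euclidean distance and at least two clusters, scores singleton clusters as 0, and uses made-up points. With density-based clustering, noise points would normally be excluded before scoring, which this sketch leaves to the caller.

```python
import math


def silhouette_score_sketch(points, labels):
    """Mean silhouette coefficient (b - a) / max(a, b) over all points."""

    def mean_dist(p, others):
        return sum(math.dist(p, q) for q in others) / len(others)

    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for idx, (p, lab) in enumerate(zip(points, labels)):
        own = [q for k, q in enumerate(points) if labels[k] == lab and k != idx]
        if not own:
            scores.append(0.0)  # convention for a singleton cluster
            continue
        a = mean_dist(p, own)  # cohesion: mean distance within own cluster
        b = min(               # separation: nearest other cluster
            mean_dist(p, members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Two tight, well separated toy clusters.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
score = silhouette_score_sketch(points, labels)
print(round(score, 3))
```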
MERGE TRAIN & TEST
Silhouette Score: 0.6198522073414106
MERGE TRAIN & TEST
Silhouette Score: 0.753126
P-VALUES
The variables with p-values below 0.05 are considered statistically significant in most contexts, implying that they have a statistically significant effect on the hazard rate.
COMPRESSOR SIDE
Static pressure on the compressor side tends to increase as it approaches end of service or failure.
TURBINE SIDE
High Pressure on the turbine side tends to decrease as it approaches end of service or failure.
SETTING 2
BLEED ENTHALPY
CORE SPEED
Core speed shows a seemingly exponential increase as it approaches end of service or failure. The steep increase may raise the hazard rate, which indicates degradation.
FAN SPEED
The fan speed seems to follow the normal wear and tear as it approaches its service life.
LPC OUTLET TEMPERATURE
LPT OUTLET TEMPERATURE
Bypass Ratio
Ratio of fuel flow to Ps30 pps
setting 1
Bleed Enthalpy at setting 1. Engine Cooling: Cooling turbine blades in some engines.
setting 2
Bleed Enthalpy at setting 2. Pneumatic Systems: Powering various aircraft components.
setting 3
Bleed Enthalpy at setting 3. Compressed air extracted from various stages within a gas turbine engine.
RTA SYSTEM DESIGN
Since health index was not defined, we were expected to infer it from the given sensed variables.
RTA (Real-Time Assurance) system design:
Performance: Maintain engine operation within performance boundaries for optimal efficiency and to prevent issues like a combustor blowout.
Specific Safety Limits
Fan speed (Nf): depending on the engine model
Core speed (Nc): depending on the engine model
High-pressure compressor discharge pressure (Ps3): depending on the engine model
Performance Boundaries
HPC stall margin: Greater than 15%
The stall margin represents the safety buffer between the actual operating conditions of the compressor and the conditions at which the compressor will stall. Compressor stall is a situation where the airflow reverses direction or becomes highly turbulent, which can lead to severe performance issues, engine damage, or failure.
Operational Stability: A higher stall margin in the HPC provides a greater buffer against fluctuations in engine operating conditions, such as changes in air demand, pressure, temperature, and speed. This stability is crucial for maintaining continuous and safe engine operation, particularly under varying flight conditions.
Prevention of Surge and Stall: Compressor surge and stall can occur if the airflow angle across the compressor blades becomes too steep, disrupting the normal flow. This disruption can lead to a rapid decrease in engine thrust and possibly engine failure. A stall margin greater than 15% means that the compressor is less likely to reach these critical airflow angles under normal operational stresses.
Enhanced Engine Performance: Keeping the HPC operating safely away from its stall limit allows it to operate more efficiently, translating into better fuel economy, lower emissions, and increased engine performance. Efficient compression is vital for the overall thermodynamic efficiency of the engine.
Flexibility in Operating Conditions: A higher stall margin provides flexibility in how the engine can be operated, allowing for more aggressive throttle movements, rapid altitude changes, and other dynamic maneuvers without the risk of inducing compressor stall.
LPC stall margin: Greater than 6% (15% for this dataset including fan)
The stall margin indicates the buffer between the operating point of the compressor and the point at which the compressor blades will stall. A stall in a compressor blade occurs when the angle of attack of the incoming air exceeds a critical angle, causing a sudden decrease in lift and an increase in drag, which disrupts the airflow and can lead to engine failure or severe performance degradation.
Temperature and Air Density: The temperature of the air exiting the LPC affects its density. Higher temperatures result in lower air densities, which can alter the airflow characteristics through the compressor blades. If the air is less dense, it can affect the compressor’s ability to effectively compress the incoming air, potentially bringing the operation closer to the stall limit.
Compressor Efficiency: The efficiency of the compressor is partly dependent on the temperature of the air it is compressing. Higher outlet temperatures might indicate that the compressor is working harder to achieve the same pressure ratio, which could reduce its operational efficiency and safety margin against stall.
Impact on Stall Margin: The stall margin needs to account for variations in LPC outlet temperature because these temperature changes can influence the compressibility and stability of the airflow through the compressor stages. A higher outlet temperature might suggest that the compressor is operating closer to its performance limits, thereby reducing the stall margin.
RU (Richness Unit) limit: Greater than 17% (to prevent lean combustor blowout).
A "lean blowout" occurs when the fuel-to-air mixture in the combustion chamber of a gas turbine is too lean (i.e., not enough fuel relative to air) to sustain combustion, causing the flame to extinguish. This can lead to engine shutdown or failure, and is a critical safety and performance consideration in gas turbine operation. A value greater than 17% indicates a richer mixture, meaning there is relatively more fuel in the mixture than at the lean blowout threshold. Operating the engine with an RU greater than 17% ensures that the mixture has enough fuel to maintain stable combustion under varying operational conditions, such as changes in air temperature, pressure, and engine load. Note that for EGT (Exhaust Gas Temperature), the stall threshold is about 2%.
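The performance boundaries listed above can be expressed as a simple lower-limit check. This is only a sketch: the dictionary keys, the function name `violated_limits` and the sample state are hypothetical, and the LPC limit would be set to 15.0 for the fan-inclusive limit mentioned for this dataset.

```python
# Lower limits in percent, taken from the boundaries listed above.
LIMITS = {
    "hpc_stall_margin_pct": 15.0,
    "lpc_stall_margin_pct": 6.0,
    "ru_pct": 17.0,
}


def violated_limits(state, limits=LIMITS):
    """Return the names of quantities that are not strictly above their limit."""
    return [name for name, minimum in limits.items() if state[name] <= minimum]


# Made-up operating point for illustration.
state = {"hpc_stall_margin_pct": 18.2, "lpc_stall_margin_pct": 5.5, "ru_pct": 21.0}
print(violated_limits(state))  # → ['lpc_stall_margin_pct']
```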
main\core fan speed
HPC PRESSURE
performance degradation parameters
Checking the variability of the features across time cycles for different units in the test and train sets, showing engine degradation towards failure or end of service life.
Fan Inlet Temperature: As fan efficiency decreases, the fan may work harder to maintain the same pressure, which could lead to an increase in the inlet temperature due to higher work input and less effective cooling.
LPC Outlet Temperature: Lower LPC efficiency typically means the compressor is not compressing the air as effectively, which could cause an increase in the temperature of the air at the LPC outlet due to less effective compression.
HPC Outlet Temperature: As with the LPC, reduced HPT efficiency could result in a higher HPC outlet temperature because the turbine is not extracting as much energy from the air, leading to higher exit temperatures.
LPT Outlet Temperature: A decrease in LPT efficiency could lead to an increase in LPT outlet temperature, again due to less energy being extracted by the turbine.
Fan Inlet Pressure: A decrease in Fan Flow may indicate a potential decrease in fan inlet pressure, assuming the fan is moving less air due to wear.
Bypass-Duct Pressure: This could potentially increase if fan flow decreases but the bypass ratio remains the same, as the same amount of air is being forced through a possibly more restrictive pathway due to wear.
HPC Outlet Pressure: Decreases in HPT efficiency and flow could result in a decrease in the HPC outlet pressure due to less effective work being performed by the turbine.
Physical Fan Speed and Corrected Fan Speed: These may either increase to compensate for lost efficiency or decrease if wear affects the fan's ability to rotate at the same speed.
Physical Core Speed and Corrected Core Speed: Similarly, these may increase to compensate for decreased efficiency or decrease if wear prevents the core from maintaining its speed.
Engine Pressure Ratio: The engine pressure ratio may decrease if both the pressure at the fan inlet and HPC outlet decrease.
HPC Outlet Static Pressure: This is likely to decrease if the HPC outlet pressure decreases.
Ratio of Fuel Flow to Ps30: If the engine is less efficient, it may require more fuel to maintain the same power output, so this ratio might increase.
Bypass Ratio: If the flow through the core decreases and the bypass duct pressure changes, the bypass ratio could change, although the direction would depend on the relative change in flow through the bypass versus the core.
Burner Fuel-Air Ratio: This might increase as the engine tries to maintain power output with decreased efficiency.
Bleed Enthalpy: Bleed air might become hotter if the engine components are less efficient at extracting energy from the air.
Required Fan Speed and Required Fan Conversion Speed: These might increase if the engine control is trying to compensate for lower efficiency to maintain performance.
High-Pressure Turbines Cool Air Flow: This may increase if more cooling is required due to higher operational temperatures from reduced efficiencies.
Low-Pressure Turbines Cool Air Flow: Similarly, this may also increase if more cooling is necessary.
time-cycle
Survival probability against cycle time, assuming each engine in the training set experienced failure at the end of its last cycle.
PREDICTED RUL
Survival probability against RUL.
SURVIVAL ANALYSIS
Find the indices of probabilities that are zero or near zero. Review the corresponding data points. Analyze these points to understand why they might have such low IPCW probabilities.
Using max rul
Zero or negative probabilities have been detected in the IPCW calculations, despite the probabilities array. Since the RUL values are hypothetical predicted values, we will not rely on this feature to train our model, but rather on the cycle time together with the sensor measurements. We will verify whether our 'Target' assumption holds based on the sensor readings and thermodynamic properties.
Training set: fault grows until system failure (TTE)
Test set: time series ends prior to system failure (Censored)
IPCW VALUES
Inverse Probability of Censoring Weights (IPCW) is a technique used to adjust for censoring in observational studies where some subjects' end outcomes are not observed within the study period. This method involves calculating weights for each individual, which are the inverse of their probability of remaining uncensored up to a given time. These weights are then used in further analyses to correct for any bias introduced by the censoring. The assumption of using the maximum RUL as the time of event for the test set might therefore not be accurate for this dataset.
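The weighting idea can be sketched as follows: estimate the probability of remaining uncensored, G(t), with a Kaplan-Meier estimate in which censoring plays the role of the event, then weight each observed failure by 1 / G(t-). This is a simplified illustration, not the code used in the analysis. The function names are hypothetical, ties between failures and censorings are handled naively, and the toy data are made up.

```python
def censoring_survival(durations, events):
    """Kaplan-Meier estimate of G(t), the probability of being uncensored.

    Censoring (events[i] == 0) is treated as the event of interest.
    Returns a list of (time, G just after that time).
    """
    at_risk = len(durations)
    g = 1.0
    steps = []
    for t in sorted(set(durations)):
        censored = sum(1 for d, e in zip(durations, events) if d == t and e == 0)
        if censored:
            g *= 1.0 - censored / at_risk
        steps.append((t, g))
        at_risk -= sum(1 for d in durations if d == t)
    return steps


def ipcw_weights(durations, events):
    """Weight 1 / G(t-) for observed failures, 0 for censored samples."""
    steps = censoring_survival(durations, events)
    weights = []
    for d, e in zip(durations, events):
        g_before = 1.0
        for t, g in steps:
            if t < d:  # value of the step function strictly before d
                g_before = g
        weights.append(1.0 / g_before if e == 1 else 0.0)
    return weights


# Toy data: one sample censored at t = 3.
durations = [2, 3, 4, 5]
events = [1, 0, 1, 1]
weights = ipcw_weights(durations, events)
print([round(w, 3) for w in weights])
```

When most censored samples leave the risk set at early times, G(t) becomes small for later failures and the weights grow large, which is one way that near-zero probabilities can appear.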
health index
As quoted from NASA's dataset document, "the built-in control system consists of a fan-speed controller, and a set of regulators and limiters. The latter include three high-limit regulators that prevent the engine from exceeding its design limits for core speed, engine-pressure ratio, and High-Pressure Turbine (HPT) exit temperature; a limit regulator that prevents the static pressure at the High-Pressure Compressor (HPC) exit from going too low; and an acceleration and deceleration limiter for the core speed. A comprehensive logic structure integrates these control-system components in a manner similar to that used in real engine controllers such that integrator-windup problems are avoided. Furthermore, all of the gains for the fan-speed controller and the four limit regulators are scheduled such that the controller and regulators perform as intended over the full range of flight conditions and power levels. The engine diagram in Figure 1 shows the main elements of the engine model and the flow chart in Figure 2 shows how various subroutines are assembled in the simulation."
The task was to estimate remaining life of an unspecified system using historical data only, irrespective of the underlying physical process.
INITIAL WEAR
Table 3 shows the effects of engine wear on various components of an aircraft engine, specifically on their efficiency and flow, over time as measured by the number of cycles the engine has undergone.
The positive (+) and negative (-) signs denote the direction of change in the efficiency and flow characteristics due to engine wear:
A negative (-) sign indicates a decrease in efficiency or flow. This means that as the engine wears, these components become less efficient or have reduced flow rates. For instance, a negative change in "Fan_Efficiency" means that the fan is less efficient than it was initially.
A positive (+) sign indicates an increase in efficiency or flow. An increase in flow can indicate leakage that grows with wear. For example, an increase in "HPT_Flow" means there is more flow through the High-Pressure Turbine than initially.
SARIMAX (Seasonal Autoregressive Integrated Moving Average + exogenous variables)
SETTING 1
Physical Fan Speed: Has a negative and statistically significant coefficient. This suggests that as the Physical Fan Speed increases, 'setting_1' tends to decrease, which was indicated earlier in our safety operating limits.
Physical_fan_speed_rpm 0.016
Fan_inlet_temperature_R 0.061
Burner_fuelair_ratio 0.061
HPC_outlet_temperature_R 0.072
Physical_core_speed_rpm 0.076
SETTING 2
Lowpressure_turbines_Cool_air_flow 0.164
Highpressure_turbines_Cool_air_flow 0.163
Bleed_Enthalpy 0.137
SETTING 3 (constant variable)
Although it is a constant variable, it still has a direction of causality:
Fan_inlet_temperature_R: Positive coefficient. Suggests that when 'setting_3' increases, the fan inlet temperature is likely to increase as a response.
Engine_pressure_ratioP50P2: Positive coefficient. Similar to the above, an increase in 'setting_3' might lead to a higher engine pressure ratio.
Burner_fuelair_ratio: Positive coefficient. Indicates a potential need to adjust the fuel-to-air ratio in response to a change in 'setting_3'.
Required_fan_speed and Required_fan_conversion_speed: It's possible these variables are pre-calculated targets, not directly controlled by 'setting_3'. The model might be suggesting that 'Required_fan_speed' and 'Required_fan_conversion_speed' should be adjusted in line with shifts in 'setting_3'.
Fan_inlet_temperature_R 0.00
Engine_pressure_ratioP50P2 0.00
Burner_fuelair_ratio 0.00
UNSCENTED KALMAN FILTER
The UKF model is used out of the box, without adjusting the state transition and measurement functions. Only features with high p-values are fitted. The three plots below show the output when all measurements, including the constant values, are used for all three operational settings.
setting 1
The plot shows data points that generally follow the red line, which represents the expected quantiles if the data were normally distributed.
The lower end of the distribution deviates slightly from the line, which may suggest a minor skew in the data or the presence of some outliers.
setting 2
The points generally follow the expected line but deviate towards the ends, especially on the upper end.
The deviation at the higher end suggests the presence of outliers or heavy tails in the distribution of estimates.
This may indicate that the filter performs well for most of the data but struggles with certain conditions or timeframes that lead to these discrepancies.
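The quantile comparison described above can be reproduced numerically. The sketch below pairs each ordered, standardized value with the standard normal quantile at the same plotting position. The plotting positions (i - 0.5) / n, the function name `qq_pairs` and the sample values are assumptions, since the plotting code is not part of this write-up.

```python
from statistics import NormalDist, mean, stdev


def qq_pairs(sample):
    """Return (theoretical quantile, ordered standardized value) pairs."""
    n = len(sample)
    mu, sigma = mean(sample), stdev(sample)
    ordered = sorted((x - mu) / sigma for x in sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Made-up residuals; points far from the line y = x at either end
# correspond to the tail deviations discussed above.
pairs = qq_pairs([2.0, 4.0, 6.0, 8.0, 10.0])
for q, v in pairs:
    print(round(q, 3), round(v, 3))
```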
setting 1
UKF using all sensor measurements.
setting 2
UKF using all sensor measurements.
setting 3
UKF using all sensor measurements.
health indexing
Based on setting 1, as substantiated by the findings of our Unscented Kalman Filter and SARIMAX models and by the results of DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and OPTICS (Ordering Points To Identify the Clustering Structure), we can now see the sensor values (27 counts) that might correspond to an "Outlier or Noise" state or a "Failure State" of the engine. Based on these results, we will then check whether these values denote the direction of change in the efficiency and flow characteristics due to engine wear. As a recap of the given datasets:
Each dataset divided into training and test subsets
Each time series from a different engine (fleet of engines of the same type)
Initial wear and manufacturing variation present (unknown to user, considered normal)
Three operational settings affecting engine performance (included in data)
Sensor noise contamination
Engine operates normally at start, develops fault during series
Training set: fault grows until system failure
Test set: time series ends prior to system failure
GROUND TRUTH
The ground truth is based on a linear cycle-time health index. A sample of health index zoning is shown for Physical Core Speed. We also inspect the distribution of the full dataset for censored samples and for samples that have failed (0 = censored, 1 = failed). Note that the distributions confirm our safe zone limits and also reveal the parameters most likely to be useful for calculating the RUL of censored data.
PHYSICAL CORE SPEED RPM
PHYSICAL FAN SPEED RPM
HPC OUTLET TEMPERATURE R
labelling
From the plots above, we infer the following health matrix. Note that we also included parameters from censored samples that have not yet experienced failure. If either of the two features violates the operating range, the health index would be zero. If we want our prediction to be more conservative, we can adjust the limits (threshold) from 1% to 2% of these values, depending on the direction of flow and efficiency.
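The labelling rule above can be sketched as a range check with an optional conservative margin. The feature names, ranges and sample are made up for illustration and are not the limits inferred from the plots. Tightening both bounds by the same fraction is also an assumption, since the text adjusts limits depending on the direction of flow and efficiency.

```python
def health_index(sample, ranges, margin=0.0):
    """Return 1 if every monitored feature is inside its operating range, else 0.

    margin tightens each bound by a fraction of its magnitude,
    e.g. 0.01 for 1% or 0.02 for 2%.
    """
    for name, (low, high) in ranges.items():
        tight_low = low + abs(low) * margin
        tight_high = high - abs(high) * margin
        if not (tight_low <= sample[name] <= tight_high):
            return 0  # any single violation sets the health index to zero
    return 1


# Hypothetical operating ranges and one sample close to an upper limit.
ranges = {"feature_a": (100.0, 200.0), "feature_b": (10.0, 20.0)}
sample = {"feature_a": 199.0, "feature_b": 15.0}
print(health_index(sample, ranges))               # → 1
print(health_index(sample, ranges, margin=0.01))  # → 0
```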
plot the health index
It is very interesting to see that after implementing the health index to test our hypothesis, we are able to identify data points where our health index was violated inside the safe operating zone around 200 cycles. Notice that most failed samples lie in the periphery of the outer curve of Physical Core/Fan Speed. From these plots, we can conclude that we need to create the performance curves instead.
The legend denotes how many parameters were violated when a sample exceeds the minimum and maximum values of our health index.
PERFORMANCE CURVES
The graph showcases the performance of various high-efficiency engines over time, measured in cycles, with respect to their High-Pressure Compressor (HPC) outlet temperature. The optimal performance curve, depicted in blue, serves as a benchmark, indicating the ideal temperature trajectory for the engines under normal operating conditions (i.e., the longest life span). Engines that closely follow this blue curve tend to demonstrate superior longevity and efficiency.
The failure curve, shown in red, marks the temperature path of an engine that fails prematurely compared to others. This curve starts to deviate significantly from the optimal curve at an early stage, rising above the normal temperature range. This deviation suggests that the engine in question might be experiencing issues such as increased friction, component wear, or other inefficiencies leading to overheating, which in turn accelerates the engine's degradation.
Engines that enter the normal operating zone before approximately 240 cycles show a tendency to have a longer lifespan. Notably, these engines maintain a gradual increase in temperature, avoiding sharp spikes which can be indicative of potential failures. This behavior emphasizes the importance of gradual thermal adaptation and stability in prolonging engine life.
The graph also depicts several other engines (indicated by various colors and styles of lines), each following unique trajectories. Some of these engines maintain temperatures close to the optimal curve throughout their operational life, suggesting stable and efficient performance. In contrast, others exhibit more volatile temperature changes, potentially signaling irregularities or sub-optimal conditions.
failing engines
For the graphs below, we investigate the characteristics of engines with a short useful life. We check how long an engine stays outside the normal operating zone and how this behaviour affects its duration of service until failure.
The graph below is showing how the unit's pressure increased rapidly and aggressively before it failed. The data points on the graph are converging to a high pressure range, indicating a critical point or threshold. The dashed line represents the unit's behavior, and the graph is providing a visual representation of its characteristics, such as its pressure profile, leading up to failure.
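The time an engine spends outside the normal operating zone can be counted directly from a sensor series. The sketch below is one possible way to do this. The zone limits, the function name `cycles_outside_zone` and the series are hypothetical.

```python
def cycles_outside_zone(values, low, high):
    """Return (total cycles outside [low, high], longest consecutive run)."""
    total = 0
    longest = 0
    run = 0
    for v in values:
        if v < low or v > high:
            total += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0  # back inside the zone, the run is interrupted
    return total, longest


# Made-up readings, one per cycle, with a normal zone of [0, 10].
readings = [5, 11, 12, 7, 13, 14, 15]
total, longest = cycles_outside_zone(readings, low=0, high=10)
print(total, longest)  # → 5 3
```

Comparing these two numbers across units gives a simple way to relate excursions outside the zone to the length of service before failure.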