Predicting Aircraft Engine Failure
Using Random Forest on NASA’s CMAPSS Turbofan Dataset
1. The Problem
Turbofan jet engines degrade over time. In aviation maintenance, the key question is: how many operational cycles does an engine have left before it needs servicing?
Unplanned engine failures cost airlines millions — and more critically, they risk lives. Traditional maintenance operates on fixed schedules, which means engines are often serviced too early (wasting resources) or too late (risking failure).
This project builds a machine learning model that predicts the Remaining Useful Life (RUL) of turbofan engines, meaning how many cycles remain before maintenance is required, using simulated sensor data from NASA’s CMAPSS benchmark dataset.
2. The Dataset — NASA CMAPSS FD001
The Commercial Modular Aero-Propulsion System Simulation (CMAPSS) dataset was released by NASA as a benchmark for predictive maintenance research. We use the FD001 subset, which simulates engines degrading under a single operating condition until failure.
The training file contains 20,631 rows across 100 engines. Each row has 26 columns: unit number, time cycle, 3 operational settings, and 21 sensor readings.
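As a rough sketch of the loading step, the snippet below reads a whitespace-delimited file with no header row into a pandas DataFrame and assigns 26 column names. The names (`unit`, `cycle`, `setting_1`, `s_1`, and so on) and the helper `load_cmapss` are illustrative choices rather than anything defined by the dataset, and a two-row in-memory sample stands in for the real training file.

```python
import io

import pandas as pd

# Column layout described above: unit, cycle, 3 settings, 21 sensors = 26 columns.
# The names are illustrative; the raw files carry no header row.
COLUMNS = (
    ["unit", "cycle"]
    + [f"setting_{i}" for i in range(1, 4)]
    + [f"s_{i}" for i in range(1, 22)]
)


def load_cmapss(source):
    """Read a whitespace-delimited CMAPSS-style file (path or file-like object)."""
    return pd.read_csv(source, sep=r"\s+", header=None, names=COLUMNS)


# Synthetic stand-in for the training file: two rows of 26 numbers each.
sample = io.StringIO(
    " ".join(["1", "1"] + ["0.5"] * 24) + "\n"
    + " ".join(["1", "2"] + ["0.6"] * 24) + "\n"
)
demo = load_cmapss(sample)
print(demo.shape)
```

With the real data, `load_cmapss` would be called with the path to the FD001 training file instead of the in-memory sample.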
3. Pipeline Architecture
The project follows a clean, linear ML pipeline:
- Load raw sensor data → assign column names
- Inspect data quality (missing values, data types)
- Remove constant/dead sensors (no signal = no information)
- Calculate and clip RUL labels from training data
- EDA: lifespan distribution + sensor degradation + correlation
- Scale features with MinMaxScaler → train Random Forest
- Evaluate RMSE + R² → plot feature importance
4. Creating the RUL Target
There is no RUL column in the raw training data. We calculate it: for each engine, find its maximum life cycle, then subtract the current cycle from that maximum.
We then clip the RUL at 125 cycles. This is a deliberate design choice: early in an engine’s life the sensors show almost no degradation, so readings taken with 300 cycles remaining look much the same as readings taken with 200 cycles remaining. The model has no signal to separate those high-RUL states, so we label all of them as 125.
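The label construction described in this section can be sketched in a few lines of pandas. This is a minimal version on synthetic data; the column names `unit` and `cycle` and the helper `add_rul` are assumptions made for the example.

```python
import pandas as pd

RUL_CAP = 125  # clipping threshold stated in the text


def add_rul(df, cap=RUL_CAP):
    """Add a capped RUL column: (last cycle of the engine) - (current cycle)."""
    out = df.copy()
    # Broadcast each engine's final cycle back onto all of its rows.
    max_cycle = out.groupby("unit")["cycle"].transform("max")
    out["RUL"] = (max_cycle - out["cycle"]).clip(upper=cap)
    return out


# Synthetic example: engine 1 runs 200 cycles, engine 2 runs 3 cycles.
toy = pd.DataFrame({
    "unit": [1] * 200 + [2] * 3,
    "cycle": list(range(1, 201)) + [1, 2, 3],
})
labeled = add_rul(toy)
print(labeled.groupby("unit")["RUL"].agg(["max", "min"]))
```

For engine 1, the raw RUL at cycle 1 would be 199, so the clip turns the first part of its life into a flat 125 before the label starts counting down to 0.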
5. Feature Cleaning — Removing Dead Sensors
Not all 21 sensors carry useful information. Some sensors output a constant value throughout the entire dataset. These are effectively dead: a column that never changes cannot help the model separate a healthy engine from a failing one.
We identify any column with ≤1 unique value and drop it from both train and test sets.
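One way to express this filter is shown below. It is a sketch on toy data: the helper name `drop_constant_columns` is hypothetical, and the choice to detect constant columns on the training set only (so both sets end up with identical columns) is an assumption.

```python
import pandas as pd


def drop_constant_columns(train, test):
    """Drop columns that have at most one unique value in the training set."""
    constant = [col for col in train.columns if train[col].nunique() <= 1]
    return train.drop(columns=constant), test.drop(columns=constant), constant


# Toy frames: s_1 never changes, s_2 does.
train_toy = pd.DataFrame({
    "unit": [1, 1, 2],
    "s_1": [518.67, 518.67, 518.67],
    "s_2": [641.8, 642.1, 642.4],
})
test_toy = pd.DataFrame({
    "unit": [1, 2],
    "s_1": [518.67, 518.67],
    "s_2": [642.0, 642.3],
})

train_clean, test_clean, dropped = drop_constant_columns(train_toy, test_toy)
print(dropped)
```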
6. Exploratory Data Analysis
Engine Lifespan Distribution
How long do engines typically run before failure? We group by unit number and plot the max cycle for each engine.
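The grouping itself is a one-liner. The sketch below computes the per-engine lifespan on a tiny synthetic frame (column names `unit` and `cycle` are assumed) and leaves the plotting call as a comment so the snippet runs without a plotting backend.

```python
import pandas as pd

# Synthetic data: engines 1, 2 and 3 run for 3, 2 and 4 cycles.
toy = pd.DataFrame({
    "unit": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "cycle": [1, 2, 3, 1, 2, 1, 2, 3, 4],
})

# The last recorded cycle of each engine is its total lifespan.
lifespans = toy.groupby("unit")["cycle"].max()
summary = lifespans.describe()
print(summary)

# With matplotlib installed, lifespans.plot(kind="hist") draws the distribution.
```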
Sensor Degradation Over Time
The T50 sensor (total temperature at LPT outlet) shows a clear degradation trend as engines approach failure. We plot engine #1 to visualize this.
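A hedged sketch of how a single engine's trace might be extracted and smoothed before plotting follows. Several things here are assumptions: the column name `s_4` for T50 (based on the usual CMAPSS sensor ordering), the 10-cycle rolling window, and the synthetic upward-drifting signal used in place of real readings.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one engine: a slow upward drift plus noise.
rng = np.random.default_rng(0)
cycles = np.arange(1, 101)
toy = pd.DataFrame({
    "unit": 1,
    "cycle": cycles,
    "s_4": 1400 + 0.2 * cycles + rng.normal(scale=2.0, size=cycles.size),
})

# Select engine 1 and smooth the sensor with a rolling mean (window is an assumption).
engine_1 = toy[toy["unit"] == 1].sort_values("cycle")
smoothed = engine_1["s_4"].rolling(window=10, min_periods=1).mean()

# Compare the start and end of life to confirm the drift numerically.
early = smoothed.iloc[:20].mean()
late = smoothed.iloc[-20:].mean()
print(f"early mean: {early:.1f}, late mean: {late:.1f}")
```

Plotting `smoothed` against `engine_1["cycle"]` gives the degradation curve; the rolling mean is optional and only makes the trend easier to see through sensor noise.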
Correlation with RUL
Which sensors correlate most strongly with RUL? Positive correlation means the sensor value increases as RUL goes up; negative means the sensor rises as the engine degrades.
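The ranking can be computed with a Pearson correlation against the `RUL` column. The example below uses two made-up sensors, `s_rising` and `s_falling`, purely to show the sign convention described above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
rul = np.arange(99, -1, -1)  # RUL counts down from 99 to 0

# Hypothetical sensors: one rises as the engine degrades, one falls.
toy = pd.DataFrame({
    "RUL": rul,
    "s_rising": (100 - rul) + rng.normal(scale=3.0, size=rul.size),
    "s_falling": 0.5 * rul + rng.normal(scale=3.0, size=rul.size),
})

# Pearson correlation of every column with RUL, most negative first.
corr_with_rul = toy.corr()["RUL"].drop("RUL").sort_values()
print(corr_with_rul)
```

Here `s_rising` ends up with a strongly negative correlation (it climbs as RUL shrinks) and `s_falling` with a strongly positive one.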
7. Training the Random Forest
Random Forest is a strong baseline for this problem: it handles non-linearity well, is robust to outliers, and provides built-in feature importances. We scale features first using MinMaxScaler, then train with 200 trees.
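A minimal sketch of this step with scikit-learn is shown below. The data is synthetic, and `random_state=42` is an assumption added for reproducibility; only the scaler choice and the 200 trees come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import MinMaxScaler

# Synthetic features and a capped target standing in for the sensor data.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 5))
y_train = np.clip(100 - 30 * X_train[:, 0] + rng.normal(scale=2.0, size=300), 0, 125)
X_test = rng.normal(size=(50, 5))

# Fit the scaler on the training data only, then reuse it for the test data.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 200 trees as in the text; random_state is an assumption for repeatable runs.
model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train_scaled, y_train)
preds = model.predict(X_test_scaled)
print(preds[:5])
```

Fitting the scaler on the training set alone avoids leaking test-set ranges into the model inputs.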
8. Model Evaluation
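We evaluate on the test set using only the last recorded cycle of each engine. Each test trajectory stops at some point before failure, and the ground truth RUL file gives the number of cycles remaining at that final observed state, so we match each last-cycle prediction against it.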
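Selecting the last cycle per engine and lining it up with the ground truth might look like the sketch below. The frame is synthetic, the column names are assumed, and the alignment relies on the assumption that the RUL file lists one value per engine in unit order.

```python
import pandas as pd

# Synthetic test set: engine 1 observed for 3 cycles, engine 2 for 2 cycles.
test_toy = pd.DataFrame({
    "unit": [1, 1, 1, 2, 2],
    "cycle": [1, 2, 3, 1, 2],
    "s_4": [1400.0, 1401.5, 1403.0, 1398.0, 1399.2],
})
# Made-up ground truth, one value per engine in unit order.
true_rul = pd.Series([112, 98], name="RUL")

# Keep only the final observed row of each engine.
last_cycle = (
    test_toy.sort_values(["unit", "cycle"])
    .groupby("unit")
    .tail(1)
    .reset_index(drop=True)
)
last_cycle["true_RUL"] = true_rul.to_numpy()
print(last_cycle)
```

The feature columns of `last_cycle` are what gets scaled and passed to the model; `true_RUL` is what the predictions are scored against.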
An RMSE of 18.19 means a typical prediction error of about 18 engine cycles. RMSE squares the errors before averaging, so large misses weigh more heavily than they would in a plain average of absolute errors. An R² of 0.80 means the model explains 80% of the variance in remaining useful life, a strong result for a single-pass Random Forest with no hyperparameter tuning.
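To make the two metrics concrete, the snippet below computes them on five made-up predictions (these numbers are illustrative and are not the project's results). It also computes the mean absolute error for comparison, since RMSE is never smaller than MAE on the same errors.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Illustrative values only, not taken from the CMAPSS evaluation.
y_true = np.array([112.0, 98.0, 69.0, 82.0, 91.0])
y_pred = np.array([105.0, 101.0, 75.0, 80.0, 95.0])

# RMSE: square root of the mean squared error.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
# MAE: plain average of absolute errors, shown for comparison.
mae = mean_absolute_error(y_true, y_pred)
# R^2: 1 - (residual sum of squares) / (total sum of squares).
r2 = r2_score(y_true, y_pred)

print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  R2={r2:.2f}")
```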
9. What Worked and What Didn’t
✅ What worked
- RUL clipping at 125 cycles: prevented the model from over-fitting to high-RUL noise in the training set
- Removing 7 constant sensors: reduced feature count without losing any predictive signal
- MinMaxScaler normalization: put all sensors on a common 0 to 1 scale. Tree splits are insensitive to feature scale, so this matters mainly if the same pipeline later feeds a scale-sensitive model such as an LSTM
- Random Forest with 200 trees: gave a stable, robust baseline with good generalization
❌ What could be improved
- No hyperparameter tuning: GridSearchCV or Optuna could squeeze more performance
- We only used the last cycle for test predictions: a rolling-window approach might capture degradation trends better
- No cross-validation: training on a single split may hide variance in performance
- Deep learning (LSTM, Transformer) would likely outperform RF by capturing temporal sequences across cycles
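As one possible way to address the missing cross-validation, the sketch below scores a Random Forest with `GroupKFold`, which keeps all cycles of an engine in the same fold so the model is never validated on an engine it has already seen. This was not part of the original project; the data is synthetic and the fold count of 5 is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

# Synthetic data: 10 engines with 30 cycles each.
rng = np.random.default_rng(0)
n_engines, n_cycles = 10, 30
units = np.repeat(np.arange(n_engines), n_cycles)
X = rng.normal(size=(n_engines * n_cycles, 4))
y = 60 - 20 * X[:, 0] + rng.normal(scale=3.0, size=X.shape[0])

model = RandomForestRegressor(n_estimators=200, random_state=42)

# Each fold holds out whole engines, identified by the groups argument.
scores = cross_val_score(
    model, X, y,
    groups=units,
    cv=GroupKFold(n_splits=5),
    scoring="neg_root_mean_squared_error",
)
rmse_per_fold = -scores  # scikit-learn reports the negated RMSE
print(rmse_per_fold.round(2))
```

The spread of `rmse_per_fold` gives a sense of how much the reported error depends on which engines happen to be held out.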
10. Feature Importance
Random Forest tells us which sensors matter most for predicting RUL. This is valuable for domain experts: it shows which physical measurements (temperatures, pressures, speeds) are the strongest indicators of engine health.
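A short sketch of how the ranking can be read off a fitted model follows. The feature names and data are synthetic, with the target deliberately driven by one column so the ranking is easy to check. Note that scikit-learn's `feature_importances_` is impurity-based, so it reflects how much each feature reduced split error during training.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: the target depends almost entirely on the second feature.
rng = np.random.default_rng(0)
feature_names = ["s_2", "s_4", "s_11"]  # illustrative names
X = rng.normal(size=(300, 3))
y = 50 * X[:, 1] + rng.normal(scale=1.0, size=300)

model = RandomForestRegressor(n_estimators=200, random_state=42).fit(X, y)

# Pair each importance with its feature name and sort, most important first.
importances = pd.Series(
    model.feature_importances_, index=feature_names
).sort_values(ascending=False)
print(importances)
```

With matplotlib installed, `importances.plot(kind="barh")` turns the same Series into the usual bar chart.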
11. Key Takeaways
- Predictive maintenance is a real, high-impact ML problem. The CMAPSS dataset is a clean entry point into industrial time-series prediction.
- Feature engineering matters. In this project, removing dead sensors and clipping RUL appeared to matter more than the choice of model, although no alternative architectures were compared.
- Random Forest is a solid baseline. RMSE 18.19 and R² 0.80 without any deep learning is a result worth building on.
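- The next step is LSTM. Feeding the full time-series sequence into a recurrent model may capture degradation trends that a single-cycle Random Forest cannot see. An R² above 0.90 is the target, not yet a measured result.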