Machine Learning & Deep Learning Algorithms

Based on a synthetic dataset from UC Irvine; a capstone project for the University of Calgary DAT-310 Applied Deep Learning course

Introduction to Machine Learning & Deep Learning Algorithms for Predicting Bolt Failures

Machine Learning (ML) and Deep Learning (DL) have revolutionized various industrial sectors, and the realm of manufacturing and quality control is no exception. In the quest to ensure the utmost precision and reliability in production, ML and DL algorithms offer an advanced toolkit to comprehend intricate patterns, identify vulnerabilities, and forecast potential failures in manufacturing processes.

Based on the study provided, it becomes evident that the failure of bolts is influenced by multiple parameters such as torque and process temperature. Traditional analytical methods might struggle to capture the nuanced relationships between these parameters and the likelihood of bolt failure. However, with ML algorithms, we can effectively model these complex relationships, taking into account subtle interactions and non-linear dependencies.

Deep Learning, a subset of ML, takes this a step further. By leveraging neural networks with multiple layers, DL models can autonomously learn and abstract features from vast amounts of data. In the context of our study, a DL model might discern intricate patterns between torque, temperature, and bolt failures that might elude simpler models.

With the integration of ML and DL algorithms, the aim is not only to predict but also to preempt potential failures by identifying risk factors in real-time. By applying these advanced techniques to the data from the aforementioned study, we can develop a robust predictive framework that enhances the reliability of the bolt manufacturing process, mitigates risks, and ultimately leads to increased operational efficiency.

Handling Imbalanced Datasets

Handling imbalanced datasets is crucial in machine learning because models trained on such datasets can often have a bias towards the majority class, leading to sub-optimal predictions for the minority class. Various techniques have been developed to address this issue. Here are some of the most commonly used techniques:

1. Resampling Techniques:

2. Using Different Evaluation Metrics: 

3. Cost-sensitive Learning:

4. Algorithm-level Approaches:

Some machine learning algorithms allow for the incorporation of class weights, which impose a higher penalty for misclassifying the minority class.

5. Anomaly Detection:

6. Using Ensemble Methods:

7. Transfer Learning and Semi-Supervised Learning:

8. Combination of Over and Under Sampling:

Some of the techniques mentioned above are applied in this study, but it is often beneficial to experiment with a combination of these techniques to identify the most effective strategy for a particular dataset. A sketch of one such combination (technique 8) follows below.
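
To make technique 8 concrete, the following sketch shows a combined over- and under-sampling step using imbalanced-learn's SMOTETomek. The file name and column names are illustrative placeholders, not the study's actual identifiers.

    import pandas as pd
    from imblearn.combine import SMOTETomek
    from sklearn.model_selection import train_test_split

    # Placeholder file and column names; substitute the study's actual dataset.
    df = pd.read_csv("bolt_failures.csv")
    X = df.drop(columns=["failure"])
    y = df["failure"]

    # Hold out a stratified test set before any resampling.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )

    # SMOTETomek oversamples the minority class with SMOTE, then removes
    # ambiguous majority/minority pairs (Tomek links) from the result.
    resampler = SMOTETomek(random_state=42)
    X_res, y_res = resampler.fit_resample(X_train, y_train)

Resampling only the training split avoids leaking synthetic samples into the evaluation data.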

STOCHASTIC GRADIENT DESCENT WITH RESAMPLING

Stochastic Gradient Descent (SGD) remains a prominent optimization method in machine learning. This study presents the outcomes of integrating SGD with advanced resampling methodologies, specifically the Borderline Synthetic Minority Over-sampling Technique combined with Tomek Links (BorderlineSMOTE_TomekLinks).
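
A minimal sketch of this combination, assuming the imbalanced-learn implementation, could look as follows; the study's exact hyperparameters are not reproduced here.

    from imblearn.combine import SMOTETomek
    from imblearn.over_sampling import BorderlineSMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.linear_model import SGDClassifier
    from sklearn.preprocessing import StandardScaler

    pipeline = Pipeline([
        # BorderlineSMOTE oversamples minority points near the class boundary;
        # the Tomek-links step then removes overlapping majority/minority pairs.
        ("resample", SMOTETomek(smote=BorderlineSMOTE(random_state=42),
                                random_state=42)),
        ("scale", StandardScaler()),  # SGD is sensitive to feature scale
        ("sgd", SGDClassifier(loss="log_loss", random_state=42)),
    ])
    # pipeline.fit(X_train, y_train)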

Key Findings:

While SGD combined with BorderlineSMOTE_TomekLinks resampling offers promise in handling class imbalances, further refinement may be needed to enhance its proficiency for the minority class. Future work could delve into adjusting hyperparameters, experimenting with different penalties, or combining additional resampling techniques for better outcomes.


STOCHASTIC GRADIENT DESCENT VARIATION

The presented results primarily focus on the performance metrics of a Stochastic Gradient Descent (SGD) model when applied with resampling techniques. The graphical representation can be divided into three main sections:

Technical Parameters:

The model parameters are highlighted at the bottom of the image, indicating the use of the "ADASYN_TomekLinks" resampling method. Other key parameters, such as the learning rate set to "constant" and a random state fixed at 42, are also specified.
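
Assuming the name "ADASYN_TomekLinks" denotes ADASYN oversampling followed by Tomek-link cleaning, a sketch of this configuration might read as follows; the eta0 value is an assumption, as the study's exact setting is not shown.

    from imblearn.over_sampling import ADASYN
    from imblearn.under_sampling import TomekLinks
    from imblearn.pipeline import Pipeline
    from sklearn.linear_model import SGDClassifier

    pipeline = Pipeline([
        ("adasyn", ADASYN(random_state=42)),   # adaptive minority oversampling
        ("tomek", TomekLinks()),               # remove overlapping majority points
        # A constant learning rate requires an explicit eta0; 0.01 is illustrative.
        ("sgd", SGDClassifier(learning_rate="constant", eta0=0.01,
                              random_state=42)),
    ])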

Summary of Findings:

The SGD model, with its associated resampling method and parameters, produces promising results for class '0' with high precision, recall, and F1-score. However, the performance metrics for class '1' show room for improvement, especially in precision. This discrepancy suggests that while the model is adept at predicting the dominant class, it struggles with the minority class, a challenge often encountered in imbalanced datasets.


XGBOOST WITH RESAMPLING

This section illustrates the results of an XGBoost model trained with a resampling technique but without any scaling. The results are categorized into several main sections:

Summary of Findings:

The XGBoost model with resampling but without scaling shows promising results, especially for the dominant class '0'. It excels in precision, recall, and F1-score for this class in both validation and prediction. However, for the minority class '1', there is a noticeable gap between precision and recall, suggesting room for improvement in the context of imbalanced datasets. The learning curve and confusion matrix further affirm the model's robustness while also indicating slight areas for refinement.
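
For reference, a hedged sketch of such a setup follows; tree-based models split on feature thresholds, so omitting scaling is harmless for XGBoost. The hyperparameters shown are assumptions, not the study's exact values.

    from xgboost import XGBClassifier

    model = XGBClassifier(
        n_estimators=300,      # illustrative values, not the study's settings
        max_depth=6,
        learning_rate=0.1,
        eval_metric="logloss",
        random_state=42,
    )
    # X_res, y_res come from a resampling step like the ones sketched earlier.
    # model.fit(X_res, y_res)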

XGBOOST WITH ANOTHER RESAMPLING METHOD

The visual showcases the results of an XGBoost model, which employs the BorderlineSMOTE_TomekLinks resampling technique but doesn't involve any scaling. A breakdown of the main sections of the results is as follows:

Recommendations for Fine-Tuning:

While the model demonstrates solid performance for the dominant class '0', the metrics for the minority class '1', particularly precision, suggest there is room for improvement; one potential fine-tuning strategy is sketched below.

The model's performance with BorderlineSMOTE_TomekLinks resampling without scaling is commendable, especially for the major class. However, further fine-tuning can potentially bolster the model's predictions, especially for the minority class.
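
One commonly used lever, offered here as an illustration rather than the study's actual recipe, is to search over XGBoost's scale_pos_weight, which increases the penalty for misclassifying the minority class.

    from sklearn.model_selection import GridSearchCV
    from xgboost import XGBClassifier

    param_grid = {
        "scale_pos_weight": [1, 5, 10, 25],  # roughly the negative/positive ratio
        "max_depth": [4, 6, 8],
    }
    search = GridSearchCV(
        XGBClassifier(eval_metric="logloss", random_state=42),
        param_grid,
        scoring="f1",  # optimize the minority-sensitive F1 score
        cv=5,
    )
    # search.fit(X_train, y_train); print(search.best_params_)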

RANDOM FOREST WITH TOMEKLINKS

The presented results are for a Random Forest algorithm that uses TomekLinks for resampling. Here's a breakdown of the main sections:

Conclusions:

The Random Forest model with TomekLinks resampling technique exhibits strong performance, especially in distinguishing class '0'. However, the model faces challenges in predicting the minority class '1', as evidenced by the lower recall and F1-score for this class on both validation and test datasets.

For improvement, it might be worth considering a combination of oversampling and undersampling techniques, or further feature engineering and hyperparameter tuning; one such combination is sketched below. While the model performs exceptionally well for the majority class, efforts should focus on improving the recall for the minority class.
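
As a hedged sketch of that suggestion, oversampling and undersampling can be combined ahead of the Random Forest; the class_weight setting and hyperparameters below are assumptions rather than the study's configuration.

    from imblearn.combine import SMOTETomek
    from imblearn.pipeline import Pipeline
    from sklearn.ensemble import RandomForestClassifier

    pipeline = Pipeline([
        # SMOTE oversampling followed by Tomek-link cleaning in one step.
        ("resample", SMOTETomek(random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=200,
                                      class_weight="balanced",
                                      random_state=42)),
    ])
    # pipeline.fit(X_train, y_train)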

NEURAL NETWORK (USING PYTORCH nn.Module)

Deep Learning Algorithm

Conclusions & Recommendations:

The neural network model exhibits solid performance, especially in predicting the majority class '0'. However, for class '1', there is room for improvement in terms of recall and F1-score.

Given that this is a deep learning model, the potential for fine-tuning and optimization is vast. Implementing regularization techniques like dropout, or leveraging more sophisticated architectures, might enhance performance. The use of early stopping, paired with validation checkpoints, can help prevent overfitting and ensure the model generalizes better to unseen data. Also, experimenting with learning rates, optimizers, and batch sizes can lead to better convergence and learning stability.
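
A brief sketch of two of these refinements, dropout plus a simple early-stopping check, is shown below; the architecture, dropout rate, and patience value are assumptions rather than the study's configuration.

    import torch
    import torch.nn as nn

    class BoltNet(nn.Module):  # hypothetical model name
        def __init__(self, n_features):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_features, 128), nn.ReLU(), nn.Dropout(0.3),
                nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.3),
                nn.Linear(64, 1),  # single logit for binary classification
            )

        def forward(self, x):
            return self.net(x)

    # Early-stopping skeleton: track the best validation loss each epoch.
    best_loss, patience, wait = float("inf"), 5, 0
    # if val_loss < best_loss:
    #     best_loss, wait = val_loss, 0
    #     torch.save(model.state_dict(), "checkpoint.pt")  # validation checkpoint
    # else:
    #     wait += 1
    #     if wait >= patience:
    #         ...  # stop training early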

In summary, while the current performance is commendable, further fine-tuning and leveraging the power of deep learning can undoubtedly push the model's performance even higher.

MODEL PERFORMANCE AFTER PERMUTATION SHUFFLING

The displayed learning curve delineates the training process of a machine learning model, graphing the logarithmic loss values over 100 epochs for both training and validation datasets. The orange line represents the validation loss, while the blue line corresponds to the training loss. Both lines converge closely to zero loss as the number of epochs increases, indicating an effective learning process with minimal overfitting, as evidenced by the validation loss mirroring the training loss closely throughout the training process.

In the validation set performance metrics, the model achieves perfect precision and recall for class '0', with an F1-score of 1.00. For class '1', the model also demonstrates high precision and recall, at 0.97 and 0.98 respectively, culminating in an F1-score of 0.98. The accuracy of the model on the validation set is 1.00, with the macro average and weighted average for precision, recall, and F1-score also reflecting similarly high values.

These results suggest that the permutation shuffling algorithm has contributed to a model that performs exceptionally well on the validation set, with high scores across all evaluated metrics, indicating a robust predictive performance.

CONFUSION MATRIX AFTER PERMUTATION SHUFFLING

The matrix displays a classification report and a confusion matrix for a binary classification problem, evaluated on an x_test dataset after the application of a permutation shuffling algorithm.

The classification report shows precision, recall, F1-score, and support for the two classes, labeled '0' and '1'. Precision is 1.00 for both classes, meaning every prediction the model made for each class was correct. The recall for class '0' is 1.00, showing that every instance of class '0' was correctly identified. For class '1', the recall is 0.97, indicating that 97% of actual class '1' instances were identified. The F1-score, the harmonic mean of precision and recall, is 1.00 for class '0' and 0.98 for class '1', excellent for both classes. Support indicates the number of actual occurrences of each class in the dataset: 1939 instances of class '0' and 61 of class '1'.

The overall accuracy of the model rounds to 1.00, meaning the model correctly predicted the class of all but two of the 2,000 instances in the dataset.

The confusion matrix visualizes the performance of the classification algorithm. It shows that the model predicted class '0' correctly 1939 times and class '1' correctly 59 times. It also indicates that there were 2 instances where class '1' was incorrectly predicted as class '0', but there were no instances of class '0' being incorrectly predicted as class '1'.

This high level of performance suggests that the permutation shuffling algorithm has not negatively impacted the model's ability to accurately predict class labels in this test dataset.
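
For reference, a report and confusion matrix like those described can be generated with scikit-learn; the fitted model and the x_test/y_test arrays are assumed to already exist.

    from sklearn.metrics import classification_report, confusion_matrix

    # model, x_test, and y_test are assumed to exist from the training step.
    y_pred = model.predict(x_test)
    print(classification_report(y_test, y_pred, digits=2))
    print(confusion_matrix(y_test, y_pred))
    # With this section's numbers, the matrix reads (rows = true, cols = predicted):
    # [[1939    0]
    #  [   2   59]]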

METRICS COMPARISON

The remaining metrics for the best model, WeightedEnsemble_L2, are uniformly strong, with 'accuracy', 'balanced_accuracy', 'F1', 'roc_auc', 'average_precision', 'precision', and 'recall' all at high values.

Overall, the model shows exceptional performance across the board, with scores near or at the maximum value for most metrics. This suggests that the model is highly effective at making predictions for the given task. The anomaly in the log loss value should be investigated further, as it does not conform to the typical range for this metric.
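
Model names such as WeightedEnsemble_L2, NeuralNetFastAI, and NeuralNetTorch follow AutoGluon's conventions, so a comparison along these lines was presumably generated roughly as follows; the label column and data variables are placeholders.

    from autogluon.tabular import TabularPredictor

    # "failure" is a placeholder label column; train_data/test_data are assumed.
    predictor = TabularPredictor(label="failure", eval_metric="f1").fit(train_data)
    leaderboard = predictor.leaderboard(
        test_data,
        extra_metrics=["accuracy", "balanced_accuracy", "roc_auc",
                       "average_precision", "precision", "recall", "log_loss"],
    )
    print(leaderboard[["model", "score_test", "score_val"]])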

FINAL VALIDATION WITH DIFFERENT ALGORITHMS

The bar chart visualizes the F1 validation scores of various machine learning and deep learning models. The F1 score is the harmonic mean of precision and recall and is particularly useful for evaluating models on imbalanced datasets.

Performance improves markedly from traditional machine learning models such as KNeighbors (uniform and distance weighted, with lower F1 scores of 0.2927 and 0.3 respectively) to gradient boosting models and tree-based ensembles. LightGBM, RandomForest (both Gini and Entropy), and LightGBMLarge exhibit strong F1 scores of 0.9538, showcasing the effectiveness of ensemble methods in handling complex patterns.

The CatBoost, ExtraTrees (both Gini and Entropy), XGBoost, and the ensemble model WeightedEnsemble_L2 lead the performance with the highest F1 validation scores of 0.9688. These models are known for their robust performance across various types of data distributions and their ability to handle feature interactions.

Notably, the deep learning models 'NeuralNetFastAI' and 'NeuralNetTorch' achieve an F1 validation score of 0.9688, matching the performance of the top-performing machine learning models. This is an important observation: it indicates that with the right configuration, and despite early stopping (which halted training after no improvement since epoch 3), deep learning models can reach and potentially surpass the performance of traditional algorithms. This parity underscores the versatility and capability of deep learning methods, which, when combined with techniques like early stopping to prevent overfitting, can be highly effective for predictive tasks.

FINAL PERFORMANCE OF DEEP LEARNING ALGORITHM USING PYTORCH NN.MODULE

The graph above displays the learning curve of a deep learning model with four layers, starting at 128 neurons, trained for 10 epochs with no dropout or early stopping and no resampling; only scaling was applied to the final dataset. The learning curve shows two lines representing training loss and validation loss over the training epochs.

At epoch 0, the training loss starts at a high value but quickly decreases, indicating that the model is learning from the training data effectively. The rapid decrease and subsequent flattening out of the training loss line suggest that the model's performance on the training set improves significantly and then stabilizes as it learns.

The validation loss starts low and remains low throughout the training process, which is a positive indicator that the model is generalizing well to new, unseen data. This is a desirable outcome, as it suggests that the model is not overfitting to the training data. Overfitting would be indicated by a divergence of validation loss from training loss, where validation loss starts to increase or remains significantly higher than training loss.

Given that both training and validation loss lines are close together and low after the initial epochs, this is indicative of a well-fitted model. It suggests that the initial architecture of the network, with its four layers and 128 starting neurons, is effective for the task at hand. The model appears to be complex enough to capture the underlying patterns in the data, but not too complex as to overfit.

In summary, the depicted learning curve signifies a successful training process, with the model achieving a good balance between bias and variance, leading to a robust generalization capability.
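
For completeness, a minimal sketch of the architecture described above (four layers, starting at 128 neurons, no dropout, 10 epochs on scaled data) follows; the hidden sizes after 128, the feature count, and the optimizer are assumptions.

    import torch
    import torch.nn as nn

    class FinalNet(nn.Module):  # hypothetical class name
        def __init__(self, n_features):
            super().__init__()
            self.fc1 = nn.Linear(n_features, 128)  # first layer: 128 neurons
            self.fc2 = nn.Linear(128, 64)          # assumed hidden sizes
            self.fc3 = nn.Linear(64, 32)
            self.fc4 = nn.Linear(32, 1)            # single logit output

        def forward(self, x):
            x = torch.relu(self.fc1(x))
            x = torch.relu(self.fc2(x))
            x = torch.relu(self.fc3(x))
            return self.fc4(x)

    model = FinalNet(n_features=6)  # feature count is illustrative
    loss_fn = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # for epoch in range(10):  # 10 epochs, no dropout or early stopping
    #     ... standard train/validate loop, recording both losses ...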