Machine Learning & Deep Learning Algorithms
Based on a synthetic dataset from UC Irvine, developed as a capstone project for the University of Calgary DAT-310 Applied Deep Learning course
Introduction to Machine Learning & Deep Learning Algorithms for Predicting Bolt Failures
Machine Learning (ML) and Deep Learning (DL) have revolutionized various industrial sectors, and the realm of manufacturing and quality control is no exception. In the quest to ensure the utmost precision and reliability in production, ML and DL algorithms offer an advanced toolkit to comprehend intricate patterns, identify vulnerabilities, and forecast potential failures in manufacturing processes.
Based on the study provided, it becomes evident that the failure of bolts is influenced by multiple parameters such as torque and process temperature. Traditional analytical methods might struggle to capture the nuanced relationships between these parameters and the likelihood of bolt failure. However, with ML algorithms, we can effectively model these complex relationships, taking into account subtle interactions and non-linear dependencies.
Deep Learning, a subset of ML, takes this a step further. By leveraging neural networks with multiple layers, DL models can autonomously learn and abstract features from vast amounts of data. In the context of our study, a DL model might discern intricate patterns between torque, temperature, and bolt failures that might elude simpler models.
With the integration of ML and DL algorithms, the aim is not only to predict but also to preempt potential failures by identifying risk factors in real-time. By applying these advanced techniques to the data from the aforementioned study, we can develop a robust predictive framework that enhances the reliability of the bolt manufacturing process, mitigates risks, and ultimately leads to increased operational efficiency.
Handling Imbalanced Datasets
Handling imbalanced datasets is crucial in machine learning because models trained on such datasets can often have a bias towards the majority class, leading to sub-optimal predictions for the minority class. Various techniques have been developed to address this issue. Here are some of the most commonly used techniques:
1. Resampling Techniques (see the sketch after this list):
Upsampling (Over-sampling) the Minority Class: This involves creating copies of instances from the minority class or generating synthetic samples to balance the class distribution.
SMOTE (Synthetic Minority Over-sampling Technique): A popular method where synthetic samples are generated for the minority class.
ADASYN (Adaptive Synthetic Sampling): Similar to SMOTE, but adaptively generates more synthetic samples for minority instances that are harder to learn, as judged by a k-Nearest Neighbors classifier.
Downsampling (Under-sampling) the Majority Class: This involves removing instances from the majority class to balance the class distribution. This method might result in loss of data.
Tomek Links: Removes the majority class instances that are close to minority class instances.
Edited Nearest Neighbors: Removes majority class instances that are misclassified by nearest neighbors.
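The sketch below shows how these resampling techniques are typically invoked with the imbalanced-learn library; the synthetic data, class ratio, and seeds are placeholders rather than the study's actual settings.

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE, ADASYN
from imblearn.under_sampling import RandomUnderSampler, TomekLinks

# Synthetic stand-in for the bolt data: 95% class 0, 5% class 1.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
print("original:", Counter(y))

# Over-sampling: synthesize new minority-class samples.
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X, y)
print("SMOTE:", Counter(y_sm))

X_ad, y_ad = ADASYN(random_state=0).fit_resample(X, y)
print("ADASYN:", Counter(y_ad))

# Under-sampling: discard majority-class samples (risking loss of data).
X_ru, y_ru = RandomUnderSampler(random_state=0).fit_resample(X, y)
print("undersampled:", Counter(y_ru))

# Tomek links: remove ambiguous majority points near the class boundary.
X_tl, y_tl = TomekLinks().fit_resample(X, y)
print("TomekLinks:", Counter(y_tl))
```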
2. Using Different Evaluation Metrics:
Instead of accuracy, use metrics that provide better insight into performance on the minority class (see the sketch after this list):
Precision, Recall, F1-score
AUC-ROC (Area Under the Receiver Operating Characteristic Curve)
Matthews Correlation Coefficient (MCC)
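A minimal illustration of these metrics with scikit-learn; the toy labels and scores below are invented purely for demonstration.

```python
from sklearn.metrics import (classification_report, matthews_corrcoef,
                             roc_auc_score)

y_true  = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred  = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
y_score = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.6, 0.9, 0.4]

# Per-class precision, recall, and F1 (far more informative than accuracy here).
print(classification_report(y_true, y_pred))

# AUC-ROC is computed from the predicted scores, not the hard labels.
print("AUC-ROC:", roc_auc_score(y_true, y_score))

# MCC stays meaningful even when the classes differ greatly in size.
print("MCC:", matthews_corrcoef(y_true, y_pred))
```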
3. Cost-sensitive Learning:
Introduce different misclassification costs for false positives and false negatives.
Algorithms such as Decision Trees, Random Forest, and Support Vector Machines allow for the incorporation of these different costs.
4. Algorithm-level Approaches:
Some machine learning algorithms accept class weights, which impose a higher penalty for misclassifying the minority class (a minimal sketch follows this list). Examples include:
Weighted Random Forest
Support Vector Machine with weighted classes
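A minimal sketch covering points 3 and 4 with scikit-learn class weights; the 10:1 cost ratio and the synthetic data are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X_train, y_train = make_classification(n_samples=1000, weights=[0.95, 0.05],
                                       random_state=0)

# 'balanced' reweights classes inversely to their frequency; an explicit dict
# such as {0: 1, 1: 10} makes a minority-class mistake ten times as costly.
weighted_rf = RandomForestClassifier(class_weight="balanced", random_state=0)
weighted_svm = SVC(class_weight={0: 1, 1: 10}, random_state=0)

weighted_rf.fit(X_train, y_train)
weighted_svm.fit(X_train, y_train)
```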
5. Anomaly Detection:
Treat the problem as an anomaly (or outlier) detection task rather than a classification task, with the minority class regarded as the anomalies.
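A minimal sketch of this reframing with scikit-learn's IsolationForest, assuming the failure rate is roughly known; the contamination value and the data are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest

X, y = make_classification(n_samples=2000, weights=[0.97, 0.03], random_state=0)

# contamination ~ expected failure rate; the model is fit without labels.
iso = IsolationForest(contamination=0.03, random_state=0).fit(X)

# IsolationForest returns -1 for anomalies; map that to 1 = "failure".
y_pred = (iso.predict(X) == -1).astype(int)
```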
6. Using Ensemble Methods:
Bagging and boosting schemes that build resampling into the ensemble can often improve overall performance (see the sketch after this list).
Balanced Random Forest: Uses random under-sampling with a traditional random forest.
RUSBoost: Incorporates random under-sampling into the AdaBoost algorithm.
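Both ensembles ship with imbalanced-learn; a minimal sketch with illustrative data and tree counts.

```python
from sklearn.datasets import make_classification
from imblearn.ensemble import BalancedRandomForestClassifier, RUSBoostClassifier

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Random forest in which each tree sees a randomly under-sampled bootstrap.
brf = BalancedRandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# AdaBoost with random under-sampling applied at every boosting round.
rusboost = RUSBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
```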
7. Transfer Learning and Semi-Supervised Learning:
Use transfer learning to leverage pre-trained models or other related datasets to improve the classification of the minority class.
Utilize unlabeled data to improve the learning algorithm's understanding of the minority class.
8. Combination of Over- and Under-Sampling (see the sketch after this list):
SMOTE + Tomek Links
SMOTE + ENN (Edited Nearest Neighbors)
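imbalanced-learn packages both combinations directly; a minimal sketch with placeholder data.

```python
from sklearn.datasets import make_classification
from imblearn.combine import SMOTEENN, SMOTETomek

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# SMOTE over-sampling followed by Tomek-link cleaning of the boundary.
X_st, y_st = SMOTETomek(random_state=0).fit_resample(X, y)

# SMOTE over-sampling followed by Edited Nearest Neighbours cleaning.
X_se, y_se = SMOTEENN(random_state=0).fit_resample(X, y)
```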
Some of the techniques mentioned above are applied in this study, but it is often beneficial to experiment with a combination of them to identify the most effective strategy for a particular dataset.
STOCHASTIC GRADIENT DESCENT WITH RESAMPLING
Stochastic Gradient Descent (SGD) remains a prominent optimization method in machine learning. This study presents the outcomes of integrating SGD with advanced resampling methodologies, specifically the Borderline Synthetic Minority Over-sampling Technique combined with Tomek Links (BorderlineSMOTE_TomekLinks).
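A minimal sketch of how this pairing can be assembled, passing a BorderlineSMOTE over-sampler into imbalanced-learn's SMOTETomek and training scikit-learn's SGDClassifier on the result; the hyperparameters and data are assumptions, not the study's exact configuration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from imblearn.combine import SMOTETomek
from imblearn.over_sampling import BorderlineSMOTE

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# SMOTETomek accepts any SMOTE variant; plugging in BorderlineSMOTE yields
# the BorderlineSMOTE_TomekLinks combination described above.
resampler = SMOTETomek(smote=BorderlineSMOTE(random_state=0), random_state=0)
X_res, y_res = resampler.fit_resample(X, y)

# SGD is scale-sensitive, so standardize before fitting.
X_res = StandardScaler().fit_transform(X_res)
sgd = SGDClassifier(loss="log_loss", max_iter=1000, random_state=0)
sgd.fit(X_res, y_res)
```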
Key Findings:
Loss Convergence: The loss values, both raw and smoothed, consistently demonstrate a decline over increasing epochs, affirming the model's capacity to learn and adjust its parameters effectively.
Classification Proficiency: The classification report reveals that while the algorithm shows commendable precision, recall, and F1-score values for the majority class (Class 0), it faces challenges in precisely predicting the minority class (Class 1).
Confusion Matrix Insights: A majority of Class 0 instances are correctly predicted with a minimal rate of false positives. However, Class 1 prediction seems more challenging, with a considerable number of false negatives.
While SGD combined with BorderlineSMOTE_TomekLinks resampling offers promise in handling class imbalances, further refinement may be needed to enhance its proficiency for the minority class. Future work could delve into adjusting hyperparameters, experimenting with different penalties, or combining additional resampling techniques for better outcomes.
STOCHASTIC GRADIENT DESCENT VARIATION
The presented results primarily focus on the performance metrics of a Stochastic Gradient Descent (SGD) model applied with resampling techniques. The graphical representation can be divided into three main sections:
Loss Trending Graph: The left section depicts "Raw vs Smoothed Loss across Epochs." The y-axis denotes the logarithmic scale of the loss, and the x-axis signifies the number of epochs. Different curves represent raw training, validation, and test losses, juxtaposed against their smoothed versions. The loss values exhibit fluctuations, indicating the nature of SGD as it updates weights based on individual training instances.
Classification Report: Positioned towards the upper right of the image, this tabulated report provides essential metrics of model performance. Precision, recall, and F1-score values are displayed for the two classes (0 and 1). The table also showcases the accuracy, macro average, and weighted average scores, giving an encompassing view of the model's capability in differentiating between the classes.
Confusion Matrix: Located below the classification report, this matrix visually presents the true positives, true negatives, false positives, and false negatives. The color gradient aids in quickly discerning the values, with darker shades indicating higher counts.
Technical Parameters:
The model parameters are highlighted at the bottom of the image, indicating the use of the "ADASYN_TomekLinks" resampling method. Other key parameters, such as the learning-rate schedule set to "constant" and a random state fixed at 42, are also specified.
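A minimal sketch reproducing only the parameters named above (ADASYN plus Tomek links, a constant learning-rate schedule, random_state=42); everything else, including eta0 and the data, is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from imblearn.over_sampling import ADASYN
from imblearn.under_sampling import TomekLinks

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=42)

# ADASYN over-sampling followed by Tomek-link cleaning.
X_res, y_res = ADASYN(random_state=42).fit_resample(X, y)
X_res, y_res = TomekLinks().fit_resample(X_res, y_res)

# learning_rate="constant" requires an explicit base rate eta0.
sgd = SGDClassifier(loss="log_loss", learning_rate="constant", eta0=0.01,
                    random_state=42)
sgd.fit(X_res, y_res)
```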
Summary of Findings:
The SGD model, with its associated resampling method and parameters, produces promising results for class '0' with high precision, recall, and F1-score. However, the performance metrics for class '1' show room for improvement, especially in precision. This discrepancy suggests that while the model is adept at predicting the dominant class, it struggles with the minority class, a challenge often encountered in imbalanced datasets.
xgboost with resampling
This section illustrates the results of an XGBoost model trained with a resampling technique but without any scaling. The results are categorized into several main sections:
Mode Selection: The image starts with an option to choose a resampling mode among ADASYN, RandomOverSampler, and a combination of both. For this instance, "RandomOverSampler" with a random state of 48 has been applied.
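A minimal sketch of this mode selection; the make_resamplers helper is hypothetical, and only RandomOverSampler with random_state=48 reflects the run shown here.

```python
from sklearn.datasets import make_classification
from imblearn.over_sampling import ADASYN, RandomOverSampler
from xgboost import XGBClassifier

X_train, y_train = make_classification(n_samples=2000, weights=[0.95, 0.05],
                                       random_state=48)

def make_resamplers(mode, random_state=48):
    """Hypothetical helper mirroring the choice of ADASYN, RandomOverSampler, or both."""
    if mode == "adasyn":
        return [ADASYN(random_state=random_state)]
    if mode == "ros":
        return [RandomOverSampler(random_state=random_state)]
    return [ADASYN(random_state=random_state),
            RandomOverSampler(random_state=random_state)]

# This run: RandomOverSampler with random_state=48, no scaling.
X_res, y_res = X_train, y_train
for sampler in make_resamplers("ros"):
    X_res, y_res = sampler.fit_resample(X_res, y_res)

model = XGBClassifier(eval_metric="logloss").fit(X_res, y_res)
```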
Learning Curve: Situated on the left, the curve displays the training and validation loss across epochs. The graph illustrates that the training and validation losses converge steadily as the epochs increase, indicating a well-fitted model. The training loss starts high and gradually decreases, whereas the validation loss begins lower and rises slowly before both stabilize, suggesting a good balance between fit and generalization with minimal overfitting.
Performance Metrics: Toward the right, two tables outline the performance metrics. The first table showcases the "Validation Set Performance" while the second displays "Prediction Performance." Both tables report precision, recall, and F1-score for two classes (0 and 1) along with accuracy, macro average, and weighted average scores.
For the Validation Set: The model demonstrates high precision, recall, and F1-score for class '0'. For class '1', while the recall is commendable at 0.80, precision is moderately lower at 0.51, leading to an F1-score of 0.62.
For Prediction Performance: Similarly, the metrics for class '0' are excellent, but class '1' reveals room for improvement, especially in precision.
Confusion Matrix: Below the performance metrics, the matrix displays the true and false positives and negatives. A majority of the instances are accurately classified, but a few false negatives and false positives are still evident.
Summary of Findings:
The XGBoost model with resampling but without scaling shows promising results, especially for the dominant class '0'. It excels in terms of precision, recall, and F1-score for this class both in validation and prediction. However, for the minority class '1', there's a noticeable gap between precision and recall, suggesting potential areas to enhance, especially in the context of imbalanced datasets. The learning curve and confusion matrix further affirm the model's robustness while also indicating slight areas for refinement.
xgboost with another resampling method
The visual showcases the results of an XGBoost model, which employs the BorderlineSMOTE_TomekLinks resampling technique but doesn't involve any scaling. A breakdown of the main sections of the results is as follows:
Chosen Resampling Technique: BorderlineSMOTE is a variation of the Synthetic Minority Over-sampling Technique (SMOTE) that focuses on creating synthetic samples near the decision boundary. When combined with Tomek Links, the algorithm further refines the dataset by removing majority-class instances that form a Tomek link with a minority-class instance.
Learning Curve: Positioned on the left, this curve demonstrates how the training and validation losses evolve across epochs. Both the training and validation losses seem to converge as epochs increase, implying that the model finds a balance between fitting the data and generalizing to new examples. This indicates minimal overfitting.
Performance Metrics: Positioned to the right, two tables convey performance metrics. The first addresses the "Validation Set Performance" and the second pertains to "Prediction Performance." Precision, recall, and F1-score for two classes (0 and 1) are presented, along with accuracy, macro average, and weighted average scores.
For the Validation Set: Precision is outstanding for class '0' but is moderate for class '1' (0.53). The recall for both classes is admirable, leading to an F1-score of 0.99 for class '0' and 0.64 for class '1'.
For Prediction Performance: Class '0' maintains high metrics, but class '1' exhibits a lower precision (0.47) and a subsequent F1-score of 0.57, implying some potential prediction challenges with this class.
Confusion Matrix: Positioned below the performance metrics, the matrix illustrates the true classifications against the predicted ones. The model exhibits a few false positives and false negatives, though a substantial number of instances are rightly classified.
Recommendations for Fine-Tuning:
While the model demonstrates solid performance for the dominant class '0', the metrics for the minority class '1', particularly precision, suggest there's room for improvement. Here are some potential fine-tuning strategies:
Feature Engineering: Refining or adding new features can enhance the predictive power of the model.
Parameter Tuning: Adjusting hyperparameters, such as the learning rate, maximum depth, and number of trees, can optimize the XGBoost model (see the sketch after this list).
Alternative Resampling: Though BorderlineSMOTE_TomekLinks is employed here, considering other oversampling or undersampling techniques might further balance the data.
Ensemble Techniques: Combining predictions from multiple models can improve overall performance.
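As one concrete way to act on the parameter-tuning suggestion above, a minimal grid-search sketch over the hyperparameters named; the grid values and data are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

X_res, y_res = make_classification(n_samples=2000, weights=[0.9, 0.1],
                                   random_state=0)

param_grid = {
    "learning_rate": [0.05, 0.1, 0.3],
    "max_depth": [3, 5, 7],
    "n_estimators": [100, 300],
}

# Scoring on F1 targets the weak spot identified above: the minority class.
search = GridSearchCV(XGBClassifier(eval_metric="logloss"), param_grid,
                      scoring="f1", cv=5)
search.fit(X_res, y_res)
print(search.best_params_, search.best_score_)
```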
The model's performance with BorderlineSMOTE_TomekLinks resampling without scaling is commendable, especially for the major class. However, further fine-tuning can potentially bolster the model's predictions, especially for the minority class.
random forest with tomeklinks
The presented results are for a Random Forest algorithm that uses TomekLinks for resampling. Here's a breakdown of the main sections:
Chosen Resampling Technique: The selected method is TomekLinks. This is an undersampling method that identifies and removes ambiguous points from the majority class that are close to the minority class, improving the decision boundary between classes.
OOB Error Rate vs. Number of Trees: Positioned on the left is a graph showing the Out-of-Bag (OOB) error rate plotted against different values of n_estimators (number of trees in the forest). The OOB error rate decreases sharply with an increasing number of trees, reaching a plateau around 40 trees. This suggests that, after approximately 40 trees, adding more trees does not significantly improve the OOB error rate.
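A minimal sketch of how such an OOB-error curve can be generated after TomekLinks resampling; the tree counts and data are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from imblearn.under_sampling import TomekLinks

X, y = make_classification(n_samples=2000, weights=[0.97, 0.03], random_state=0)
X_res, y_res = TomekLinks().fit_resample(X, y)

oob_errors = []
for n in range(10, 101, 10):
    rf = RandomForestClassifier(n_estimators=n, oob_score=True,
                                random_state=0).fit(X_res, y_res)
    # OOB error = 1 - OOB accuracy; expect a plateau as trees are added.
    oob_errors.append((n, 1 - rf.oob_score_))
print(oob_errors)
```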
Performance Metrics:
Validation Metrics: The validation set results demonstrate excellent precision for class '0' (0.99) and decent precision for class '1' (0.83). However, the recall for class '1' is lower (0.57), resulting in an F1-score of 0.67 for class '1'. The overall accuracy is 0.98.
Test Metrics: For the test set, precision remains high for class '0' (0.98) but drops slightly for class '1' (0.74) compared to the validation set. Similarly, the recall for class '1' decreases further to 0.51, leading to an F1-score of 0.60 for class '1'. The overall accuracy remains at 0.98.
Confusion Matrices:
Validation Set: The matrix shows 1933 class '0' instances correctly classified, 7 class '0' instances misclassified as class '1', 26 class '1' instances misclassified as class '0', and 34 class '1' instances correctly classified.
Test Set: The matrix shows 1928 class '0' instances correctly classified, 11 class '0' instances misclassified as class '1', 30 class '1' instances misclassified as class '0', and 31 class '1' instances correctly classified.
Conclusions:
The Random Forest model with TomekLinks resampling technique exhibits strong performance, especially in distinguishing class '0'. However, the model faces challenges in predicting the minority class '1', as evidenced by the lower recall and F1-score for this class on both validation and test datasets.
For improvement, it might be worth considering a combination of oversampling and undersampling techniques or further feature engineering and hyperparameter tuning. While the model performs exceptionally well for the majority class, efforts should focus on improving the recall for the minority class.
Neural network (using PyTorch nn.Module)
Deep Learning Algorithm
Model Configuration and Settings: The neural network is trained without any resampling techniques but includes data scaling. This is essential for neural networks as they perform best when input data is normalized or standardized. The learning rate is set at 0.001, and the model is trained over 100 epochs. Notably, there is no dropout or early stopping implemented in the training process.
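A minimal sketch matching the stated configuration (scaled inputs, learning rate 0.001, 100 epochs, no dropout or early stopping); the layer widths, the Adam optimizer, the class name BoltFailureNet, and the data are assumptions.

```python
import torch
import torch.nn as nn
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training data.
X_train, y_train = make_classification(n_samples=2000, weights=[0.97, 0.03],
                                       random_state=0)

# Scaling is essential for neural networks, as noted above.
X_scaled = StandardScaler().fit_transform(X_train)
X_t = torch.tensor(X_scaled, dtype=torch.float32)
y_t = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)

class BoltFailureNet(nn.Module):  # hypothetical class name
    def __init__(self, n_features):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1),  # single logit for binary classification
        )

    def forward(self, x):
        return self.layers(x)

model = BoltFailureNet(X_t.shape[1])
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # lr from the text
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(100):  # 100 epochs, no dropout or early stopping
    optimizer.zero_grad()
    loss = loss_fn(model(X_t), y_t)
    loss.backward()
    optimizer.step()
```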
Learning Curve: The graph shows the learning curve of both training and validation loss over the epochs. It is clear that the training loss steadily decreases and appears to stabilize towards the latter epochs. The validation loss, however, exhibits more fluctuation, although it also trends downward over time. This suggests that the model is learning, but there might be room for regularization to make the learning more stable and potentially improve generalization.
Performance Metrics:
Classification Metrics: The model demonstrates excellent precision, recall, and F1-score for class '0'. For class '1', while the precision is reasonable at 0.74, the recall and F1-score are slightly lower at 0.69 and 0.71, respectively. The overall accuracy of the model stands at a commendable 0.98.
Confusion Matrix: The matrix indicates that the model made 15 false positives and 19 false negatives, suggesting a reasonable balance between Type I and Type II errors.
Matthews Correlation Coefficient (MCC): The MCC value is 0.7035, indicating a good quality binary classification.
Conclusions & Recommendations:
The neural network model exhibits solid performance, especially in predicting the majority class '0'. However, for class '1', there is room for improvement in terms of recall and F1-score.
Given that this is a deep learning model, the potential for fine-tuning and optimization is vast. Implementing regularization techniques like dropout, or leveraging more sophisticated architectures, might enhance performance. The use of early stopping, paired with validation checkpoints, can help prevent overfitting and ensure the model generalizes better to unseen data. Also, experimenting with learning rates, optimizers, and batch sizes can lead to better convergence and learning stability.
In summary, while the current performance is commendable, further fine-tuning and leveraging the power of deep learning can undoubtedly push the model's performance even higher.
MODEL PERFORMANCE AFTER PERMUTATION SHUFFLING
The displayed learning curve delineates the training process of a machine learning model, graphing the logarithmic loss values over 100 epochs for both training and validation datasets. The orange line represents the validation loss, while the blue line corresponds to the training loss. Both lines converge closely to zero loss as the number of epochs increases, indicating an effective learning process with minimal overfitting, as evidenced by the validation loss mirroring the training loss closely throughout the training process.
In the validation set performance metrics, the model achieves perfect precision and recall for class '0', with an F1-score of 1.00. For class '1', the model also demonstrates high precision and recall, at 0.97 and 0.98 respectively, culminating in an F1-score of 0.98. The accuracy of the model on the validation set is 1.00, with the macro average and weighted average for precision, recall, and F1-score also reflecting similarly high values.
These results suggest that the permutation shuffling algorithm has contributed to a model that performs exceptionally well on the validation set, with high scores across all evaluated metrics, indicating a robust predictive performance.
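A minimal sketch, assuming that "permutation shuffling" here means randomly permuting the rows of the final dataset before splitting; the study's exact procedure may differ, and the seed is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10000, weights=[0.97, 0.03], random_state=0)

# Randomly permute the rows so ordering artifacts cannot leak into the split.
rng = np.random.default_rng(42)
perm = rng.permutation(len(X))
X_shuf, y_shuf = X[perm], y[perm]

X_train, X_test, y_train, y_test = train_test_split(
    X_shuf, y_shuf, test_size=0.2, stratify=y_shuf, random_state=42)
```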
CONFUSION MATRIX AFTER PERMUTATION SHUFFLING
The figure displays a classification report and a confusion matrix for a binary classification problem, evaluated on the test set (x_test) after the application of the permutation shuffling algorithm.
The classification report shows precision, recall, F1-score, and support for the two classes, labeled '0' and '1'. Precision is 1.00 for both classes, meaning every prediction the model makes for either class is correct. The recall for class '0' is 1.00, showing that every instance of class '0' was correctly identified. For class '1', the recall is 0.97, indicating that 97% of actual class '1' instances were identified. The F1-score, which balances precision and recall, is 1.00 for class '0' and 0.98 for class '1', an excellent harmonic mean of precision and recall for both classes. Support indicates the number of actual occurrences of each class in the dataset: 1939 instances of class '0' and 61 of class '1'.
The overall accuracy of the model is reported as 1.00, which reflects rounding: as the confusion matrix shows, only 2 of the 2000 test instances are misclassified.
The confusion matrix visualizes the performance of the classification algorithm. It shows that the model predicted class '0' correctly 1939 times and class '1' correctly 59 times. It also indicates that there were 2 instances where class '1' was incorrectly predicted as class '0', but there were no instances of class '0' being incorrectly predicted as class '1'.
This high level of performance suggests that the permutation shuffling algorithm has not negatively impacted the model's ability to accurately predict class labels in this test dataset.
metrics comparison
The remaining metrics show strong performance for the best model, WeightedEnsemble_L2, across 'accuracy', 'balanced_accuracy', 'F1', 'roc_auc', 'average_precision', 'precision', and 'recall'.
Accuracy: The model's accuracy score is nearly perfect at 0.998, indicating that it correctly predicts the outcome 99.8% of the time.
Balanced Accuracy: The balanced accuracy score is very high at 0.9697, showing that the model performs very well across all classes, taking into account any imbalances in the dataset.
F1: The F1 score is 0.9688, reflecting a strong balance between precision and recall.
ROC-AUC: The ROC-AUC score is 0.9964, indicating an excellent ability to discriminate between the positive and negative classes.
Average Precision: The average precision score is 0.9625, which suggests that the model has a high precision across various threshold levels.
Precision: Precision is perfect at 1.0, meaning there are no false positives; every instance predicted as positive is truly positive.
Recall: The recall is also very high at 0.9394, indicating the model is able to identify a high percentage of all positive instances.
Overall, the model shows exceptional performance across the board, with scores near or at the maximum value for most metrics. This suggests that the model is highly effective at making predictions for the given task. The anomaly in the log loss value should be investigated further, as it does not conform to the typical range for this metric.
final validation with different algorithms
The bar chart visualizes the F1 validation scores of various machine learning and deep learning models. The F1 score is a harmonic mean of precision and recall and is particularly useful for evaluating models on imbalanced datasets.
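For reference, the F1 score combines precision P and recall R as their harmonic mean:

```latex
F_1 = 2 \cdot \frac{P \cdot R}{P + R} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}
```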
From traditional machine learning models like KNeighbors (both uniform and distance weighted) with lower F1 scores of 0.2927 and 0.3 respectively, the performance significantly improves with gradient boosting models and tree-based ensembles. LightGBM, RandomForest (both Gini and Entropy), and LightGBMLarge exhibit strong F1 scores of 0.9538, showcasing the effectiveness of ensemble methods in handling complex patterns.
The CatBoost, ExtraTrees (both Gini and Entropy), XGBoost, and the ensemble model WeightedEnsemble_L2 lead the performance with the highest F1 validation scores of 0.9688. These models are known for their robust performance across various types of data distributions and their ability to handle feature interactions.
Notably, the deep learning models 'NeuralNetFastAI' and 'NeuralNetTorch' achieve an F1 validation score of 0.9688, matching the top-performing machine learning models. This is an important observation: it indicates that with the right configuration, and despite early stopping (which halted training after no improvement since epoch 3), deep learning models can reach and potentially surpass the performance of traditional algorithms. This parity underscores the versatility of deep learning methods, which, combined with techniques like early stopping to prevent overfitting, can be highly effective for predictive tasks.
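The model names in this comparison match AutoGluon's default TabularPredictor model zoo; assuming that framework produced the leaderboard, a minimal sketch follows (the "failure" label column and the synthetic data are hypothetical).

```python
import pandas as pd
from sklearn.datasets import make_classification
from autogluon.tabular import TabularPredictor

# Synthetic stand-in for the final dataset; "failure" is a hypothetical label name.
X, y = make_classification(n_samples=2000, weights=[0.97, 0.03], random_state=0)
train_df = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])
train_df["failure"] = y

# Fit the default model zoo and rank every model by validation F1,
# including WeightedEnsemble_L2, NeuralNetTorch, and NeuralNetFastAI.
predictor = TabularPredictor(label="failure", eval_metric="f1").fit(train_df)
print(predictor.leaderboard())
```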
FINAL PERFORMANCE OF DEEP LEARNING ALGORITHM USING PYTORCH NN.MODULE
The graph above displays the learning curve of a deep learning model with four layers, starting with 128 neurons, trained for 10 epochs with no dropout, no early stopping, and no resampling, with scaling of the final dataset as the only preprocessing step. The learning curve shows two lines representing training loss and validation loss over the training epochs.
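A minimal sketch of such a four-layer network in PyTorch; only the 128-neuron first layer comes from the text, and the remaining widths are assumptions.

```python
import torch.nn as nn

n_features = 8  # placeholder; set to the final dataset's feature count

model = nn.Sequential(
    nn.Linear(n_features, 128), nn.ReLU(),  # layer 1: 128 neurons, per the text
    nn.Linear(128, 64), nn.ReLU(),          # layer 2 (width assumed)
    nn.Linear(64, 32), nn.ReLU(),           # layer 3 (width assumed)
    nn.Linear(32, 1),                       # layer 4: single logit, no dropout
)
```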
At epoch 0, the training loss starts at a high value but quickly decreases, indicating that the model is learning from the training data effectively. The rapid decrease and subsequent flattening out of the training loss line suggest that the model's performance on the training set improves significantly and then stabilizes as it learns.
The validation loss starts low and remains low throughout the training process, which is a positive indicator that the model is generalizing well to new, unseen data. This is a desirable outcome, as it suggests that the model is not overfitting to the training data. Overfitting would be indicated by a divergence of validation loss from training loss, where validation loss starts to increase or remains significantly higher than training loss.
Given that both training and validation loss lines are close together and low after the initial epochs, this is indicative of a well-fitted model. It suggests that the initial architecture of the network, with its four layers and 128 starting neurons, is effective for the task at hand. The model appears to be complex enough to capture the underlying patterns in the data, but not too complex as to overfit.
In summary, the depicted learning curve signifies a successful training process, with the model achieving a good balance between bias and variance, leading to a robust generalization capability.