
This project investigates whether machine learning can classify the severity of traffic delays caused by roadway accidents using only features available at the time of the incident. Traffic management and routing decisions depend on identifying high-impact events quickly, so the research question is how well accident-related delay severity can be predicted from real-time features, with a focus on minimizing false negatives for high-severity cases. The concepts applied include supervised classification, class balancing, feature engineering, and model validation. The analysis uses the US Accidents dataset, which contains over 7.7 million records; the data were cleaned, binarized, and balanced before being used to train four models. Histogram-Based Gradient Boosting (HGBoost) achieved the highest recall at 0.79, outperforming Random Forest, Logistic Regression, and Multilayer Perceptron, all three of which showed higher accuracy but lower sensitivity to severe cases. These results suggest that HGBoost is best suited to applications that prioritize identifying high-severity delays. It is the recommended model when recall is the primary objective and training efficiency also matters.
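To make the preprocessing and the recall-versus-accuracy trade-off concrete, here is a minimal standard-library sketch. It is not the project's actual pipeline. The helper names, the severity threshold of 3 on a 1 to 4 scale, and the use of random undersampling for balancing are all assumptions made for illustration.

```python
import random


def binarize(severity, threshold=3):
    """Map a severity level to 1 (high) or 0 (low).

    The threshold is an assumed cut-off, not one stated by the project.
    """
    return 1 if severity >= threshold else 0


def undersample(rows, labels, seed=0):
    """Balance two classes by randomly dropping majority-class rows.

    Undersampling is one possible balancing strategy, assumed here.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n = min(len(by_class[0]), len(by_class[1]))
    balanced = [(r, c) for c in (0, 1) for r in rng.sample(by_class[c], n)]
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [c for _, c in balanced]


def recall_and_accuracy(y_true, y_pred):
    """Return (recall on the positive class, overall accuracy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall, correct / len(pairs)


# Toy labels: 4 severe cases (1) and 6 non-severe cases (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical model A flags more events, catching more severe cases.
pred_a = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
# Hypothetical model B is more conservative.
pred_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

print(recall_and_accuracy(y_true, pred_a))
print(recall_and_accuracy(y_true, pred_b))
```

In this toy comparison, model B has the higher accuracy but misses half of the severe cases, while model A catches three of the four. This is the same kind of trade-off that leads the project to prefer the model with the highest recall.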