MY PROJECTS

Project 1: “ETA Decoded: Predicting Flight Delays: Leveraging Machine Learning to Enhance Airline Efficiency and Passenger Experience”

Overview:

Flight delays are a significant challenge in the aviation industry, affecting passenger satisfaction, airline efficiency, and operational costs. This project aimed to address two questions:

Will a flight be delayed?

If delayed, how severe will the delay be?

Leveraging machine learning models and a comprehensive dataset of over 6 million flights, we developed predictive solutions to help airlines optimize resource allocation and improve the passenger experience.

Techniques: Logistic regression, decision trees, random forests, ridge regression, feature engineering, and hyperparameter tuning.

Approach:

Data Analysis: Conducted exploratory data analysis (EDA) to identify trends such as higher delay rates during summer, weekends, and specific times (e.g., 3:00 AM and 7:00 PM flights).

Feature Engineering: Designed new variables to improve predictive performance, including airport congestion indicators, delay trends by aircraft, and seasonal labels.

Model Development: Built and optimized machine learning models, including logistic regression, decision trees, random forests, and ridge regression, to predict delays and their durations.

Strategic Insights: Delivered actionable recommendations for stakeholders, covering high-risk delay periods, schedule optimization, and proactive communication strategies.
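
The exploratory step in the approach above comes down to comparing delay rates across groups such as season or day of week. A minimal sketch of that comparison follows, using only the Python standard library. The records, the field names (month, day_of_week, delayed), and the season boundaries are assumptions made for illustration; they are not taken from the project's dataset or code.

```python
from collections import defaultdict

# Hypothetical flight records; field names and values are made up for illustration.
flights = [
    {"month": 7, "day_of_week": "Sun", "delayed": True},
    {"month": 7, "day_of_week": "Mon", "delayed": True},
    {"month": 7, "day_of_week": "Wed", "delayed": False},
    {"month": 1, "day_of_week": "Sun", "delayed": False},
    {"month": 1, "day_of_week": "Wed", "delayed": False},
    {"month": 10, "day_of_week": "Mon", "delayed": True},
]


def season_label(month):
    """Map a month number (1-12) to a season label (assumed meteorological seasons)."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "fall"


def delay_rate_by(records, key_fn):
    """Return {group: share of delayed flights} for the groups defined by key_fn."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for rec in records:
        group = key_fn(rec)
        totals[group] += 1
        delayed[group] += int(rec["delayed"])
    return {group: delayed[group] / totals[group] for group in totals}


rate_by_season = delay_rate_by(flights, lambda r: season_label(r["month"]))
rate_by_weekday = delay_rate_by(flights, lambda r: r["day_of_week"])
```

The same helper works for any grouping key (carrier, departure hour, origin airport), which is how a seasonal label can be used both as an EDA dimension and as a model feature.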

Key Insights:

Delay Patterns: Summer, Sundays, and Mondays exhibit the highest delay rates, with Allegiant Air and JetBlue Airways having the highest delay frequencies.

Flight Factors: Features such as aircraft-specific delay histories, airport congestion, and travel patterns significantly impact delay predictions.

Model Performance:

Binary prediction (will a flight be delayed): Random Forest achieved the highest recall at 79%.

Continuous prediction (how long will a flight be delayed): Polynomial Ridge Regression explained 7% of the variance in delay duration (R² = 0.07), with an RMSE of 16.87 minutes.
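
For readers less familiar with the metrics quoted above, here is a small sketch of how recall, RMSE, and R² are computed from predictions. The labels and delay values are toy numbers chosen for illustration, not project results, and the function names are our own.

```python
import math


def recall(y_true, y_pred):
    """Share of truly delayed flights (label 1) that were predicted as delayed."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return true_pos / (true_pos + false_neg)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target (minutes here)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Fraction of target variance explained: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy classification example: 4 delayed flights, 3 of them caught by the model.
labels = [1, 1, 1, 1, 0, 0]
predicted_labels = [1, 1, 1, 0, 0, 1]
toy_recall = recall(labels, predicted_labels)

# Toy regression example: delay durations in minutes.
minutes = [10, 20, 30, 40]
predicted_minutes = [20, 20, 30, 30]
toy_rmse = rmse(minutes, predicted_minutes)
toy_r2 = r_squared(minutes, predicted_minutes)
```

Because R² equals one minus the ratio of mean squared error to target variance, an R² near zero means the RMSE is close to the spread of the delays themselves when both are measured on the same evaluation set.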

Proposed Solutions:

Operational Efficiency: Airlines can use delay predictions to proactively adjust schedules, allocate resources, and minimize bottlenecks.

Passenger Communication: Predictive insights enable timely notifications, reducing passenger dissatisfaction and missed connections.

Real-Time Data Integration: Incorporating live data (e.g., weather, air traffic, maintenance) would further improve model accuracy and responsiveness.

Contributions:

Feature Engineering

Designed and integrated interaction variables like Distance_AvgDelays to capture relationships between flight distance and historical delays, improving model predictions.

Excluded features directly correlated with the target variable, such as DepDelayMinutes, to prevent data leakage and ensure the model’s generalizability.

Applied feature scaling with StandardScaler to standardize values across all features, improving model stability and predictive accuracy.
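
The three feature-engineering steps above can be sketched in plain Python as follows. Distance_AvgDelays and DepDelayMinutes are the names used in this write-up; Distance, AvgDelays, and ArrDelayMinutes are assumed column names, and defining the interaction as a simple product is also an assumption. The standardize helper mirrors what scikit-learn's StandardScaler computes (subtract the mean, divide by the population standard deviation) without requiring the library.

```python
import statistics

# Hypothetical rows; the values are made up for illustration.
rows = [
    {"Distance": 500.0, "AvgDelays": 10.0, "DepDelayMinutes": 5.0, "ArrDelayMinutes": 8.0},
    {"Distance": 1000.0, "AvgDelays": 20.0, "DepDelayMinutes": 30.0, "ArrDelayMinutes": 35.0},
    {"Distance": 1500.0, "AvgDelays": 30.0, "DepDelayMinutes": 0.0, "ArrDelayMinutes": 2.0},
]

LEAKY = {"DepDelayMinutes"}   # reveals the outcome, so it is kept out of the features
TARGET = "ArrDelayMinutes"    # assumed target column, also kept out of the features


def add_interaction(row):
    """Add Distance_AvgDelays, assumed here to be Distance * AvgDelays."""
    out = dict(row)
    out["Distance_AvgDelays"] = row["Distance"] * row["AvgDelays"]
    return out


def drop_columns(row, columns):
    """Return a copy of the row without the given columns."""
    return {name: value for name, value in row.items() if name not in columns}


def standardize(values):
    """Scale values to zero mean and unit population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    # A constant column has zero spread; map it to zeros instead of dividing by zero.
    return [(v - mean) / std if std > 0 else 0.0 for v in values]


features = [drop_columns(add_interaction(r), LEAKY | {TARGET}) for r in rows]
feature_names = list(features[0])
scaled = {name: standardize([f[name] for f in features]) for name in feature_names}
```

In a real pipeline the mean and standard deviation would normally be estimated on the training split only and then reused on validation and test data, so that scaling itself does not leak information.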

Model Development

Built and refined regression models achieving an RMSE of 16.87 minutes for continuous delay prediction.

Addressed data leakage by excluding features directly related to delay metrics and capped outliers to improve model generalizability.
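
The write-up does not say how outliers were capped, so the sketch below shows one common option, clipping values above a chosen upper percentile. The 90th-percentile threshold, the sample delays, and the function names are all assumptions for illustration.

```python
def percentile(values, q):
    """Percentile with linear interpolation between neighbors; q is in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def cap_outliers(values, upper_q=90):
    """Clip values above the upper_q percentile; return (capped values, cap)."""
    cap = percentile(values, upper_q)
    return [min(v, cap) for v in values], cap


# Hypothetical delay durations in minutes, with one extreme value.
delays = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 600]
capped_delays, cap_value = cap_outliers(delays, upper_q=90)
```

Capping keeps every row in the training set while limiting how much a handful of extreme delays can pull a squared-error loss, which is the usual motivation for this step in regression models.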

Stakeholder Communication

Delivered findings through:

A comprehensive presentation deck summarizing key insights, model performance, and actionable recommendations.

A detailed technical report documenting data preprocessing, feature engineering, modeling approaches, and evaluation results.

Visualizations and dashboards to communicate complex data and patterns effectively to both technical and non-technical audiences.

Artifacts:

Final Project Report

Presentation Deck