A Predictive Analytics Project Using Machine Learning
ISOM 835: Predictive Analytics and Machine Learning with Python
This project applies the complete predictive analytics workflow to predict whether a student will be selected for an internship. The analysis uses student-level data including academic performance, coding skills, interview scores, project experience, internships completed, GitHub activity, and other career-readiness indicators.
The goal is to identify the factors that most influence internship selection and build predictive models that can support students, recruiters, and university career services in making better data-driven decisions.
Internship selection is an important process for both students and employers. Students want to understand what skills and experiences improve their chances, while recruiters need efficient and consistent ways to evaluate candidates.
The dataset contains 10,000 student records and includes variables related to academic performance, technical skills, professional experience, and interview readiness.
Key features include CGPA, coding test score, interview score, skills score, projects count, internships completed, GitHub score, college tier, placement training, and extracurricular participation.
The exploratory analysis examined the structure of the dataset and relationships between features and internship selection outcomes.
Recommended visuals for this section include histograms, boxplots, a correlation heatmap, and bar charts comparing selected vs. non-selected students.
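The comparisons behind those visuals can be sketched numerically before any plotting. This is a minimal sketch that assumes hypothetical column names (`cgpa`, `coding_test_score`, `interview_score`, `college_tier`, `selected`) and uses randomly generated stand-in data, not the project dataset, so the printed numbers carry no meaning.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: column names and values are assumptions,
# not the actual project dataset.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "cgpa": rng.uniform(5.0, 10.0, n).round(2),
    "coding_test_score": rng.integers(0, 101, n),
    "interview_score": rng.integers(0, 101, n),
    "college_tier": rng.choice(["Tier 1", "Tier 2", "Tier 3"], n),
})

# Hypothetical target for illustration only: higher scores raise the chance
# of selection, plus random noise.
signal = (0.04 * df["coding_test_score"]
          + 0.04 * df["interview_score"]
          + 0.5 * df["cgpa"])
df["selected"] = (signal + rng.normal(0, 1.5, n) > signal.median()).astype(int)

numeric_cols = ["cgpa", "coding_test_score", "interview_score"]

# Mean of each numeric feature for selected vs. non-selected students
# (the quantity a grouped bar chart would display).
group_means = df.groupby("selected")[numeric_cols].mean()

# Correlation of each numeric feature with the target
# (one column of a correlation heatmap).
target_corr = df[numeric_cols + ["selected"]].corr()["selected"].drop("selected")

# Share of students selected within each college tier.
tier_rates = df.groupby("college_tier")["selected"].mean()

print(group_means.round(2))
print(target_corr.round(2))
print(tier_rates.round(2))
```

The same grouped frames can be passed to a plotting library to produce the bar charts and heatmap.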
Before modeling, the dataset was cleaned and prepared for machine learning. The student ID column was removed because it does not help predict selection. Categorical variables were encoded, numerical variables were scaled, and the dataset was split into training and testing sets.
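The preparation steps above could look roughly like the following with scikit-learn. This is a sketch under stated assumptions: the column names (`student_id`, `college_tier`, `placement_training`, `selected`, and so on), the 80/20 split, and the synthetic data are all illustrative and not taken from the project. Here the scaler and encoder are fitted on the training split only, which is one common way to keep test-set information out of preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are assumptions for illustration.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "student_id": np.arange(1, n + 1),
    "cgpa": rng.uniform(5.0, 10.0, n),
    "coding_test_score": rng.integers(0, 101, n),
    "college_tier": rng.choice(["Tier 1", "Tier 2", "Tier 3"], n),
    "placement_training": rng.choice(["Yes", "No"], n),
    "selected": rng.integers(0, 2, n),
})

# Drop the identifier and separate the target from the features.
X = df.drop(columns=["student_id", "selected"])
y = df["selected"]

numeric_cols = ["cgpa", "coding_test_score"]
categorical_cols = ["college_tier", "placement_training"]

# Assumed 80/20 split, stratified so both sets keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale numeric columns and one-hot encode categorical columns.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Fit on the training data only, then apply the same transform to the test data.
X_train_prepared = preprocessor.fit_transform(X_train)
X_test_prepared = preprocessor.transform(X_test)

print(X_train_prepared.shape, X_test_prepared.shape)
```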
Three classification models were evaluated: Logistic Regression, Random Forest, and Gradient Boosting.
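The models were compared using accuracy, precision, recall, F1 score, and ROC-AUC. Accuracy is the share of all predictions that are correct; precision is the share of predicted selections that are actually selected; recall is the share of actually selected students that the model identifies; F1 is the harmonic mean of precision and recall; and ROC-AUC measures how well the model ranks selected students above non-selected ones across all classification thresholds.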
| Model | Accuracy | F1 Score | ROC-AUC |
|---|---|---|---|
| Logistic Regression | Replace with your result | Replace with your result | Replace with your result |
| Random Forest | Replace with your result | Replace with your result | Replace with your result |
| Gradient Boosting | Replace with your result | Replace with your result | Replace with your result |
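A comparison table like the one above could be produced with a loop such as the following. This is a minimal sketch, not the project's actual training code: it uses `make_classification` to generate synthetic data, default-style hyperparameters, and an assumed 80/20 split, so the numbers it prints are not project results.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the prepared features.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hyperparameters here are assumptions, not tuned values from the project.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)              # hard class labels
    y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    rows.append({
        "Model": name,
        "Accuracy": accuracy_score(y_test, y_pred),
        "Precision": precision_score(y_test, y_pred),
        "Recall": recall_score(y_test, y_pred),
        "F1 Score": f1_score(y_test, y_pred),
        "ROC-AUC": roc_auc_score(y_test, y_prob),  # uses probabilities, not labels
    })

results = pd.DataFrame(rows).set_index("Model")
print(results.round(3))
```

ROC-AUC is computed from predicted probabilities rather than hard labels because it evaluates ranking across thresholds.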
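Gradient Boosting is expected to be the strongest model because it can capture nonlinear relationships and interactions between features that Logistic Regression cannot model directly. This expectation should be confirmed against the measured results in the comparison table.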
Because this project relates to internship selection, ethical considerations are important. Some variables, such as college tier or prior internship access, may reflect existing inequalities. If a model relies too heavily on these variables, it may reinforce unfair outcomes.
For this reason, predictive models should support human decision-making rather than replace it. Recruiters and universities should use model insights carefully and ensure that hiring decisions remain fair, transparent, and accountable.
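One simple way to check the fairness concern described above is to compare predicted selection rates across college tiers. The sketch below uses a small, hand-written set of hypothetical predictions (the column names `college_tier` and `predicted_selected` and all values are made up for illustration); a large gap between groups would be a signal to review the model by hand, not proof of unfairness on its own.

```python
import pandas as pd

# Hypothetical model predictions; values are invented for illustration
# and are not project results.
preds = pd.DataFrame({
    "college_tier": ["Tier 1"] * 4 + ["Tier 2"] * 4 + ["Tier 3"] * 4,
    "predicted_selected": [1, 1, 1, 0,   # Tier 1: 3 of 4 predicted selected
                           1, 1, 0, 0,   # Tier 2: 2 of 4
                           1, 0, 0, 0],  # Tier 3: 1 of 4
})

# Share of students predicted as selected within each tier.
rates = preds.groupby("college_tier")["predicted_selected"].mean()

# Gap between the most and least favored tier.
gap = rates.max() - rates.min()

print(rates)
print(f"Largest gap in predicted selection rate: {gap:.2f}")  # prints 0.50
```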