Internship Selection Prediction

A Predictive Analytics Project Using Machine Learning

ISOM 835: Predictive Analytics and Machine Learning with Python

Executive Summary

This project applies the complete predictive analytics workflow to predict whether a student will be selected for an internship. The analysis uses student-level data including academic performance, coding skills, interview scores, project experience, internships completed, GitHub activity, and other career-readiness indicators.

The goal is to identify the factors that most influence internship selection and build predictive models that can support students, recruiters, and university career services in making better data-driven decisions.

Business Context & Objectives

Internship selection is an important process for both students and employers. Students want to understand what skills and experiences improve their chances, while recruiters need efficient and consistent ways to evaluate candidates.

Project Objectives:
  • Explore and clean the internship selection dataset
  • Identify key factors that influence internship selection
  • Build and compare predictive machine learning models
  • Generate actionable insights for students and recruiters
  • Reflect on ethical and fairness concerns in hiring-related predictions

Dataset Description

The dataset contains 10,000 student records and includes variables related to academic performance, technical skills, professional experience, and interview readiness.

Key features include CGPA, coding test score, interview score, skills score, projects count, internships completed, GitHub score, college tier, placement training, and extracurricular participation.

Exploratory Data Analysis (EDA)

The exploratory analysis examined the structure of the dataset and relationships between features and internship selection outcomes.

Key EDA Findings:
  • Students with higher interview scores were more likely to be selected.
  • Coding test scores showed a strong relationship with selection outcomes.
  • Students with more projects and internship experience had better selection rates.
  • CGPA had a positive but moderate relationship with selection.
  • Categorical variables such as college tier and placement training also influenced outcomes.

Recommended visuals for this section include histograms, boxplots, a correlation heatmap, and bar charts comparing selected vs. non-selected students.
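The comparisons behind these findings can be sketched with a few pandas group-bys. The example below is a minimal illustration on synthetic stand-in data, not the project's actual dataset: the column names (`cgpa`, `coding_test_score`, `interview_score`, `college_tier`, `selected`) and the toy rule that generates the target are assumptions made for the sketch.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data. Column names are hypothetical, not the real dataset's.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "cgpa": rng.uniform(5.0, 10.0, n),
    "coding_test_score": rng.uniform(0, 100, n),
    "interview_score": rng.uniform(0, 100, n),
    "college_tier": rng.choice(["Tier 1", "Tier 2", "Tier 3"], n),
})

# Toy target: selection depends on interview and coding scores plus noise.
signal = (0.04 * df["interview_score"]
          + 0.03 * df["coding_test_score"]
          + rng.normal(0, 1, n))
df["selected"] = (signal > signal.median()).astype(int)

numeric_cols = ["cgpa", "coding_test_score", "interview_score"]

# Mean of each numeric feature for selected vs. non-selected students.
group_means = df.groupby("selected")[numeric_cols].mean()

# Selection rate within each college tier (a bar chart would plot this).
tier_rates = df.groupby("college_tier")["selected"].mean()

# Correlation of each numeric feature with the target (one heatmap column).
target_corr = df[numeric_cols + ["selected"]].corr()["selected"].drop("selected")

print(group_means)
print(tier_rates)
print(target_corr)
```

With the real data, the same three tables would feed the boxplots, bar charts, and correlation heatmap listed above.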

Methodology & Modeling

Before modeling, the dataset was cleaned and prepared for machine learning. The student ID column was removed because it is an identifier that carries no predictive information. Categorical variables were encoded, numerical variables were scaled, and the dataset was split into training and testing sets.
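One way these preparation steps could look in scikit-learn is sketched below. This is an illustration under assumptions, not the project's actual code: the data is synthetic, the column names (`student_id`, `cgpa`, `coding_test_score`, `college_tier`, `placement_training`, `selected`) are hypothetical, and one-hot encoding, standard scaling, and an 80/20 stratified split are choices made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data with hypothetical column names.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "student_id": np.arange(n),
    "cgpa": rng.uniform(5.0, 10.0, n),
    "coding_test_score": rng.uniform(0, 100, n),
    "college_tier": rng.choice(["Tier 1", "Tier 2", "Tier 3"], n),
    "placement_training": rng.choice(["Yes", "No"], n),
    "selected": rng.integers(0, 2, n),
})

# Drop the identifier and separate features from the target.
X = df.drop(columns=["student_id", "selected"])
y = df["selected"]

numeric_cols = ["cgpa", "coding_test_score"]
categorical_cols = ["college_tier", "placement_training"]

# Scale numeric columns and one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Stratified 80/20 split keeps the class balance similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the transformers on the training set only, then apply to both sets.
X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)

print(X_train_prepared.shape, X_test_prepared.shape)
```

The sketch fits the scaler and encoder on the training split only, so that no statistics from the test set leak into preprocessing.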

Three classification models were evaluated:
  • Logistic Regression
  • Random Forest
  • Gradient Boosting
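The three models named in the comparison table (Logistic Regression, Random Forest, and Gradient Boosting) could be trained side by side as in the sketch below. It assumes scikit-learn estimators with mostly default hyperparameters and uses `make_classification` to generate placeholder data, so neither the settings nor the scores reflect the project's actual runs.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the prepared internship features.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hyperparameters here are illustrative defaults, not tuned values.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Fit every model on the same training split so the comparison is fair.
fitted = {name: model.fit(X_train, y_train) for name, model in models.items()}

# Accuracy on the held-out test split, one value per model.
test_accuracy = {
    name: model.score(X_test, y_test) for name, model in fitted.items()
}

for name, acc in test_accuracy.items():
    print(f"{name}: {acc:.3f}")
```

Keeping the models in one dictionary makes it easy to loop over them again when computing further metrics.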

Results & Model Comparison

The models were compared using accuracy, precision, recall, F1 score, and ROC-AUC. These metrics help evaluate how well the models predict internship selection.
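As an illustration of how these five metrics can be computed, the sketch below defines a small helper, `evaluate` (a hypothetical name introduced here), and applies it to a logistic regression trained on placeholder data. It assumes scikit-learn and a binary target where 1 means "selected"; the numbers it prints are not the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test):
    """Return the five comparison metrics for a fitted binary classifier."""
    y_pred = model.predict(X_test)               # hard 0/1 predictions
    y_score = model.predict_proba(X_test)[:, 1]  # probability of class 1
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        # ROC-AUC is computed from probabilities, not hard predictions.
        "roc_auc": roc_auc_score(y_test, y_score),
    }


# Placeholder data standing in for the prepared internship features.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
metrics = evaluate(model, X_test, y_test)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision and recall are worth reporting alongside accuracy because they separate two different errors: recommending students who would not be selected, and missing students who would be.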

Model                 Accuracy                   F1 Score                   ROC-AUC
Logistic Regression   Replace with your result   Replace with your result   Replace with your result
Random Forest         Replace with your result   Replace with your result   Replace with your result
Gradient Boosting     Replace with your result   Replace with your result   Replace with your result

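Gradient Boosting is expected to be the strongest model because it can capture non-linear relationships and interactions between features. This expectation should be confirmed against the metrics in the comparison table once the results are filled in.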

Business Insights & Recommendations

Key Insights:
  • Interview performance is one of the strongest predictors of internship selection.
  • Technical skills, especially coding ability, strongly influence outcomes.
  • Practical experience through projects and internships improves selection chances.
  • Academic performance matters, but it is not the only deciding factor.
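Rankings like the ones above can be checked with a model-agnostic importance measure. The sketch below uses scikit-learn's `permutation_importance` on synthetic data, which is an assumption of this example rather than the method the project necessarily used. The feature names are hypothetical, the features are already standardized, and the toy target is built so that the interview score matters most.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic, already-standardized features with hypothetical names.
rng = np.random.default_rng(0)
n = 600
X = pd.DataFrame({
    "interview_score": rng.normal(0, 1, n),
    "coding_test_score": rng.normal(0, 1, n),
    "cgpa": rng.normal(0, 1, n),
})

# Toy target: interview score carries the largest weight by construction.
signal = (2.0 * X["interview_score"]
          + 1.0 * X["coding_test_score"]
          + 0.3 * X["cgpa"])
y = (signal + rng.normal(0, 0.5, n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
model = GradientBoostingClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# Shuffle one feature at a time on the test set and measure the score drop.
result = permutation_importance(
    model, X_test, y_test, n_repeats=5, random_state=42
)
ranking = pd.Series(result.importances_mean, index=X.columns)
ranking = ranking.sort_values(ascending=False)

print(ranking)
```

Measuring importance on held-out data shows which features the model relies on for predictions it has not seen, which is closer to what the insights above claim.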

Recommendations for Students

Recommendations for Recruiters and Universities

Ethics & Interpretability

Because this project relates to internship selection, ethical considerations are important. Some variables, such as college tier or prior internship access, may reflect existing inequalities. If a model relies too heavily on these variables, it may reinforce unfair outcomes.
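A simple first check for this kind of risk is to compare predicted selection rates across groups. The sketch below uses a tiny hand-made table of hypothetical predictions. The column names `college_tier` and `predicted_selected` are assumptions, and the two summary numbers are screening signals only, not a full fairness audit.

```python
import pandas as pd

# Hypothetical model predictions for twelve students, four per tier.
audit = pd.DataFrame({
    "college_tier": ["Tier 1"] * 4 + ["Tier 2"] * 4 + ["Tier 3"] * 4,
    "predicted_selected": [1, 1, 1, 0,
                           1, 1, 0, 0,
                           1, 0, 0, 0],
})

# Share of students predicted as selected within each tier.
rates = audit.groupby("college_tier")["predicted_selected"].mean()

# Gap between the most and least favored groups (0 means equal rates).
parity_gap = rates.max() - rates.min()

# Ratio of the lowest to the highest rate (1 means equal rates).
impact_ratio = rates.min() / rates.max()

print(rates)
print(f"parity gap: {parity_gap:.2f}, impact ratio: {impact_ratio:.2f}")
```

A large gap or a ratio well below 1 does not prove the model is unfair, but it is a reason to examine how heavily the model depends on that variable before its output informs any decision.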

For this reason, predictive models should support human decision-making rather than replace it. Recruiters and universities should use model insights carefully and ensure that hiring decisions remain fair, transparent, and accountable.

Project Deliverables