Internship Selection Prediction Project

Executive Summary

This project applies the complete predictive analytics workflow to predict whether a student will be selected for an internship. The analysis uses student-level data including academic performance, coding skills, interview scores, project experience, internships completed, GitHub activity, and other career-readiness indicators.

The goal is to identify the factors that most influence internship selection and build predictive models that can support students, recruiters, and university career services in making better data-driven decisions.

Business Context & Objectives

Internship selection is an important process for both students and employers. Students want to understand what skills and experiences improve their chances, while recruiters need efficient and consistent ways to evaluate candidates.

Project Objectives:

Explore and clean the internship selection dataset
Identify key factors that influence internship selection
Build and compare predictive machine learning models
Generate actionable insights for students and recruiters
Reflect on ethical and fairness concerns in hiring-related predictions

Dataset Description

The dataset contains 10,000 student records and includes variables related to academic performance, technical skills, professional experience, and interview readiness.

Target Variable: selected
1: Selected for internship
0: Not selected

Key features include CGPA, coding test score, interview score, skills score, projects count, internships completed, GitHub score, college tier, placement training, and extracurricular participation.
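
A first look at a dataset like this usually checks its shape, column types, missing values, and class balance of the target. The sketch below does this on a small synthetic stand-in, since the real file is not reproduced here. Apart from `selected`, every column name (`student_id`, `cgpa`, `coding_test_score`, and so on) is an assumed spelling of the features listed above, and the generated values are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset. In the notebook this would be
# replaced by pd.read_csv(...) on the actual file. All column names except
# "selected" are hypothetical spellings of the features described above.
rng = np.random.default_rng(42)
n = 1_000
df = pd.DataFrame({
    "student_id": np.arange(1, n + 1),
    "cgpa": rng.uniform(5.0, 10.0, n).round(2),
    "coding_test_score": rng.integers(0, 101, n),
    "interview_score": rng.integers(0, 101, n),
    "projects_count": rng.integers(0, 7, n),
    "college_tier": rng.choice(["Tier 1", "Tier 2", "Tier 3"], n),
    "placement_training": rng.choice(["Yes", "No"], n),
})

# Made-up relationship so the synthetic target is not pure noise.
signal = (
    0.04 * (df["interview_score"] - 50)
    + 0.03 * (df["coding_test_score"] - 50)
    + 0.30 * (df["cgpa"] - 7.5)
)
prob = 1 / (1 + np.exp(-signal))
df["selected"] = (rng.uniform(size=n) < prob).astype(int)

print(df.shape)         # rows and columns
print(df.dtypes)        # numeric vs. categorical columns
print(df.isna().sum())  # missing values per column

# Share of selected (1) vs. not selected (0) students.
class_balance = df["selected"].value_counts(normalize=True)
print(class_balance)
```

If the target turns out to be strongly imbalanced, accuracy alone becomes a weak summary, which is one reason to also report F1 and ROC-AUC.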

Exploratory Data Analysis (EDA)

The exploratory analysis examined the structure of the dataset and relationships between features and internship selection outcomes.

Key EDA Findings:

Students with higher interview scores were more likely to be selected.
Coding test scores showed a strong relationship with selection outcomes.
Students with more projects and internship experience had better selection rates.
CGPA had a positive but moderate relationship with selection.
Categorical variables such as college tier and placement training also influenced outcomes.
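
Comparisons of the kind listed above can be produced with a few grouped summaries. The following is a minimal sketch on synthetic data, not the project's actual analysis: the column names are assumed, and the relationship between scores and selection is invented so the output has something to show.

```python
import numpy as np
import pandas as pd

# Synthetic data with hypothetical column names; only "selected" is
# confirmed by the dataset description.
rng = np.random.default_rng(0)
n = 2_000
df = pd.DataFrame({
    "cgpa": rng.uniform(5.0, 10.0, n),
    "coding_test_score": rng.uniform(0, 100, n),
    "interview_score": rng.uniform(0, 100, n),
    "college_tier": rng.choice(["Tier 1", "Tier 2", "Tier 3"], n),
})
# Invented signal: selection depends on interview and coding scores only.
logit = 0.05 * (df["interview_score"] - 50) + 0.03 * (df["coding_test_score"] - 50)
df["selected"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

numeric_cols = ["cgpa", "coding_test_score", "interview_score"]

# Average of each numeric feature for selected vs. not selected students.
group_means = df.groupby("selected")[numeric_cols].mean()

# Pearson correlation of each numeric feature with the binary target.
target_corr = df[numeric_cols].corrwith(df["selected"]).sort_values(ascending=False)

# Selection rate within each category of a categorical feature.
tier_rates = df.groupby("college_tier")["selected"].mean()

print(group_means.round(2))
print(target_corr.round(3))
print(tier_rates.round(3))
```

The same three summaries map directly onto boxplots by outcome, a correlation heatmap, and bar charts of selection rate by category.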

Recommended visuals for this section include histograms, boxplots, a correlation heatmap, and bar charts comparing selected vs. non-selected students.

Methodology & Modeling

Before modeling, the dataset was cleaned and prepared for machine learning. The student ID column was removed because it is only an identifier and carries no information about selection. Categorical variables were encoded, numerical variables were scaled, and the dataset was split into training and testing sets.
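
One way to express these preparation steps with scikit-learn is shown below. It is a sketch under assumptions rather than the notebook's exact code: the column names, the 80/20 split, one-hot encoding, and standard scaling are all illustrative choices. The split is done first here so that the scaler and encoder are fitted on training rows only, which keeps test-set information out of the preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in; column names other than "selected" are hypothetical.
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "student_id": np.arange(n),
    "cgpa": rng.uniform(5, 10, n),
    "interview_score": rng.uniform(0, 100, n),
    "college_tier": rng.choice(["Tier 1", "Tier 2", "Tier 3"], n),
    "placement_training": rng.choice(["Yes", "No"], n),
    "selected": rng.integers(0, 2, n),
})

# Drop the identifier and separate features from the target.
X = df.drop(columns=["student_id", "selected"])
y = df["selected"]

# Stratify so both splits keep the same share of selected students.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

numeric_cols = ["cgpa", "interview_score"]
categorical_cols = ["college_tier", "placement_training"]

# Scale numeric columns and one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Fit on the training split only, then apply the same transform to the test split.
X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)

print(X_train_prepared.shape, X_test_prepared.shape)
```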

Three classification models were evaluated:

Logistic Regression: Used as a baseline model.
Random Forest: Used to capture complex relationships and feature importance.
Gradient Boosting: Used as an advanced model to improve predictive performance.
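
The three models can be set up side by side as in the sketch below. It uses `make_classification` to generate placeholder data, so the feature names (`feature_0`, `feature_1`, ...) and all hyperparameters are assumptions, not the settings used in the project. It also shows how Random Forest feature importances can be read once the model is fitted.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for the prepared internship dataset.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=5, random_state=42
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hyperparameters are illustrative defaults, not tuned values.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)

# Impurity-based importances from the fitted Random Forest (they sum to 1).
rf_importance = pd.Series(
    models["Random Forest"].feature_importances_, index=feature_names
).sort_values(ascending=False)
print(rf_importance.round(3))
```

Impurity-based importances rank features by how much they reduce impurity across the trees; they describe what the model relies on, not what causes selection.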

Results & Model Comparison

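The models were compared using accuracy, precision, recall, F1 score, and ROC-AUC. Accuracy measures the overall share of correct predictions. Precision is the share of students predicted as selected who actually were, and recall is the share of selected students the model found. F1 score balances precision and recall, and ROC-AUC measures how well the model ranks selected students above non-selected ones across all decision thresholds. The table below reports accuracy, F1 score, and ROC-AUC.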

Model	Accuracy	F1 Score	ROC-AUC
Logistic Regression	TBD	TBD	TBD
Random Forest	TBD	TBD	TBD
Gradient Boosting	TBD	TBD	TBD

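Gradient Boosting is expected to be the strongest model because it can capture non-linear patterns and feature interactions that Logistic Regression cannot. This is an expectation rather than a finding until the results are filled in, and the final model choice should follow the measured metrics.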
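
A comparison table covering all five metrics can be generated along the following lines. This is a sketch on placeholder data from `make_classification`, so the numbers it prints say nothing about the internship dataset. The model settings are assumed defaults.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Placeholder data; replace with the prepared training and testing sets.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)               # hard 0/1 predictions
    y_score = model.predict_proba(X_test)[:, 1]  # probability of class 1
    rows.append({
        "Model": name,
        "Accuracy": accuracy_score(y_test, y_pred),
        "Precision": precision_score(y_test, y_pred),
        "Recall": recall_score(y_test, y_pred),
        "F1 Score": f1_score(y_test, y_pred),
        "ROC-AUC": roc_auc_score(y_test, y_score),  # uses scores, not labels
    })

results = pd.DataFrame(rows).set_index("Model").round(3)
print(results)
```

ROC-AUC is computed from predicted probabilities rather than hard labels, because it evaluates the ranking of students across every possible threshold.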

Business Insights & Recommendations

Key Insights:

Interview performance is one of the strongest predictors of internship selection.
Technical skills, especially coding ability, strongly influence outcomes.
Practical experience through projects and internships improves selection chances.
Academic performance matters, but it is not the only deciding factor.

Recommendations for Students

Practice coding and technical problem-solving consistently.
Prepare for interviews and improve communication skills.
Build a strong project portfolio.
Gain internship or practical experience whenever possible.

Recommendations for Recruiters and Universities

Use structured evaluation criteria for hiring decisions.
Support students with interview preparation and technical training.
Use predictive analytics as a decision-support tool, not a final decision-maker.

Ethics & Interpretability

Because this project relates to internship selection, ethical considerations are important. Some variables, such as college tier or prior internship access, may reflect existing inequalities. If a model relies too heavily on these variables, it may reinforce unfair outcomes.

For this reason, predictive models should support human decision-making rather than replace it. Recruiters and universities should use model insights carefully and ensure that hiring decisions remain fair, transparent, and accountable.
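
A simple first check for this kind of risk is to compare how often a model predicts selection for each group. The sketch below uses invented data in which the predictions are deliberately skewed by college tier. The `college_tier` and `predicted` column names, and the skew itself, are assumptions for illustration and are not results from this project.

```python
import numpy as np
import pandas as pd

# Invented audit data: actual outcomes are random, while the hypothetical
# model predictions favour Tier 1 students.
rng = np.random.default_rng(7)
n = 3_000
audit = pd.DataFrame({
    "college_tier": rng.choice(["Tier 1", "Tier 2", "Tier 3"], n),
    "selected": rng.integers(0, 2, n),
})
tier_bias = audit["college_tier"].map(
    {"Tier 1": 0.65, "Tier 2": 0.50, "Tier 3": 0.35}
)
audit["predicted"] = (rng.uniform(size=n) < tier_bias).astype(int)

# Actual and predicted selection rates within each tier.
by_tier = audit.groupby("college_tier").agg(
    actual_rate=("selected", "mean"),
    predicted_rate=("predicted", "mean"),
)

# Ratio of the lowest to the highest predicted selection rate.
# A value near 1 means similar treatment; a value far below 1 calls for review.
impact_ratio = by_tier["predicted_rate"].min() / by_tier["predicted_rate"].max()

print(by_tier.round(3))
print(f"Lowest-to-highest predicted rate ratio: {impact_ratio:.2f}")
```

A gap between predicted rates does not prove unfairness on its own, but it identifies where human reviewers should look more closely before acting on model output.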

Project Deliverables

Google Colab notebook
Final business report
Dataset folder
Visualizations folder
README.md file
.gitignore file