Data Science Internship Interview Prep: Singapore Guide
Data science is one of the fastest-growing internship categories in Singapore. GovTech, Sea Group (Shopee), Grab, DBS, Standard Chartered, and dozens of funded startups hire data science interns. The interview covers four distinct areas: SQL, statistics, machine learning concepts, and case studies. This guide prepares you for all four.
The Singapore Data Science Interview Structure
| Stage | What Is Tested |
|---|---|
| Online assessment | SQL + probability + basic Python/pandas |
| Technical round 1 | SQL, statistics, ML concepts |
| Technical round 2 / case study | Open-ended data problem or A/B test design |
| Behavioural round | Communication, stakeholder management, past projects |
Section 1: SQL
SQL comes up at virtually every company, and you are expected to be proficient up to and including window functions.
Core skills to master:
- SELECT, WHERE, GROUP BY, HAVING, ORDER BY: foundational. These must be automatic.
- JOINs (INNER, LEFT, RIGHT, FULL OUTER): know when to use each and how NULLs behave in join keys and in unmatched rows.
- Subqueries and CTEs: CTEs (WITH clauses) are preferred for readability; whether a CTE or a subquery runs faster depends on the database engine and its optimiser.
- Window functions: ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD(), SUM() OVER (PARTITION BY ...). These are the most commonly tested advanced SQL concepts.
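Common question patterns:
"Find the top 3 products by revenue for each region." → Use ROW_NUMBER() with PARTITION BY region for exactly three rows per region, or RANK() / DENSE_RANK() if tied products should share a rank.
"Find users who made a purchase in January but not in February." → Anti-join: LEFT JOIN ... WHERE the February side IS NULL, or NOT EXISTS. NOT IN also works, but it returns no rows if the subquery contains a NULL.
"Calculate the 7-day rolling average of daily orders." → AVG() OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW). This assumes one row per day with no missing dates.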
Practice resources: LeetCode SQL section (Medium difficulty), Mode Analytics practice problems, StrataScratch.
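The three question patterns above can be tried out without installing a database. The following is a minimal sketch using Python's built-in sqlite3 module; the table names (sales, purchases, daily_orders), the column names, and every row of data are made up for illustration, and the window functions need SQLite 3.25 or newer.

```python
import sqlite3

# In-memory database with hypothetical tables and toy data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, product TEXT, revenue REAL);
INSERT INTO sales VALUES
  ('North', 'A', 500), ('North', 'B', 300), ('North', 'C', 300),
  ('North', 'D', 100), ('South', 'A', 50), ('South', 'B', 400);
CREATE TABLE purchases (user_id INTEGER, purchase_date TEXT);
INSERT INTO purchases VALUES
  (1, '2024-01-05'), (1, '2024-02-10'), (2, '2024-01-20'),
  (3, '2024-02-02'), (4, '2024-01-03'), (4, '2024-01-28');
CREATE TABLE daily_orders (order_date TEXT, orders INTEGER);
""")
conn.executemany(
    "INSERT INTO daily_orders VALUES (?, ?)",
    [(f"2024-03-{day:02d}", day * 10) for day in range(1, 11)],
)

# Pattern 1: top 3 products by revenue per region.
# ROW_NUMBER() returns exactly 3 rows per region; ties are broken by product name.
top3 = conn.execute("""
WITH product_revenue AS (
    SELECT region, product, SUM(revenue) AS total_revenue
    FROM sales
    GROUP BY region, product
),
ranked AS (
    SELECT region, product, total_revenue,
           ROW_NUMBER() OVER (
               PARTITION BY region
               ORDER BY total_revenue DESC, product
           ) AS rn
    FROM product_revenue
)
SELECT region, product, total_revenue
FROM ranked
WHERE rn <= 3
ORDER BY region, rn
""").fetchall()

# Pattern 2: users who purchased in January but not in February (anti-join).
jan_only = [row[0] for row in conn.execute("""
SELECT DISTINCT jan.user_id
FROM purchases AS jan
WHERE jan.purchase_date >= '2024-01-01' AND jan.purchase_date < '2024-02-01'
  AND NOT EXISTS (
      SELECT 1
      FROM purchases AS feb
      WHERE feb.user_id = jan.user_id
        AND feb.purchase_date >= '2024-02-01'
        AND feb.purchase_date < '2024-03-01'
  )
ORDER BY jan.user_id
""")]

# Pattern 3: 7-day rolling average (current row plus the 6 rows before it).
# Assumes exactly one row per day; the first 6 rows average over fewer days.
rolling = conn.execute("""
SELECT order_date,
       AVG(orders) OVER (
           ORDER BY order_date
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS rolling_avg_7d
FROM daily_orders
ORDER BY order_date
""").fetchall()

print(top3)
print(jan_only)
print(rolling[6])
```

In an interview it is worth saying out loud why you picked ROW_NUMBER() over RANK(), and what your rolling window does when a date is missing, since those are the usual follow-up questions.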
Section 2: Statistics
Probability fundamentals:
- Bayes' theorem and conditional probability
- Probability distributions: Normal, Bernoulli, Binomial, Poisson, Exponential
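- Central Limit Theorem: Why is it important? (For a large enough sample, the sampling distribution of the mean is approximately normal whatever the shape of the population, provided the variance is finite. This is what justifies confidence intervals and z/t-tests on sample means.)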
- Law of Large Numbers
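Bayes' theorem questions usually take the form "given a positive flag, how likely is the event really?". Here is a small worked sketch; the 1% base rate, 90% detection rate, and 5% false-positive rate are assumed numbers chosen for illustration, not figures from any real system.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(event | positive flag) via Bayes' theorem."""
    true_positive_mass = sensitivity * prior
    false_positive_mass = false_positive_rate * (1 - prior)
    return true_positive_mass / (true_positive_mass + false_positive_mass)

# Hypothetical fraud detector: 1% of transactions are fraudulent,
# 90% of fraud is flagged, 5% of legitimate transactions are flagged.
p_fraud_given_flag = posterior(prior=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(round(p_fraud_given_flag, 3))
```

Under these assumptions only about 15% of flagged transactions are actually fraudulent, because the rare event is swamped by false positives from the much larger legitimate group. Being able to explain that intuition matters more than reciting the formula.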
Hypothesis testing:
- Null hypothesis (H0) vs alternative hypothesis (H1)
- Type I error (false positive, α) and Type II error (false negative, β)
- p-value: The probability of observing results at least as extreme as the data, assuming H0 is true. Does NOT tell you the probability that H0 is true.
- t-test (comparing means), chi-square test (categorical variables), ANOVA (multiple groups)
Common interview question: "You run an A/B test and get p=0.04 with α=0.05. What do you conclude?" Answer: You reject H0 — there is statistically significant evidence of a difference. However, always add: "Statistical significance does not equal practical significance — I'd also look at the effect size and business impact."
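To make the distinction between statistical and practical significance concrete, the sketch below runs a pooled two-proportion z-test, one common way to analyse a conversion-rate A/B test. The conversion counts are hypothetical, and the function name is just a label for this example.

```python
import math
from statistics import NormalDist

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_b - rate_a, z, p_value

# Hypothetical test: control converts 1,000 of 10,000 users, variant 1,090 of 10,000.
lift, z, p_value = two_proportion_ztest(1000, 10_000, 1090, 10_000)
print(f"absolute lift={lift:.4f}, z={z:.2f}, p={p_value:.4f}")
```

With these made-up counts the p-value falls below 0.05, yet the absolute lift is under one percentage point. Whether that is worth shipping depends on the cost of the change and the revenue per conversion, which is exactly the follow-up an interviewer is listening for.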
A/B testing: Expect questions like: "How would you design an A/B test to evaluate a new feature on Grab's app?" Structure your answer: define the metrics (primary KPI plus guardrails), determine the sample size (power analysis needs the baseline rate, the minimum detectable effect, α, and power 1-β), choose the randomisation unit (user-level or session-level), set the test duration, and plan the analysis.
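The sample-size step can be sketched with a common normal-approximation formula for comparing two proportions: n per group ≈ (z_{1-α/2} + z_{1-β})² · [p1(1-p1) + p2(1-p2)] / (p2 - p1)². The 10% baseline and one-percentage-point minimum detectable effect below are assumptions for illustration; real teams may use a different formula or a dedicated tool.

```python
import math
from statistics import NormalDist

def sample_size_per_group(baseline, mde, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect an absolute lift of `mde`."""
    p1, p2 = baseline, baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # power = 1 - beta
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde ** 2)

# Hypothetical inputs: 10% baseline conversion, detect a 1 percentage point lift.
n_small_effect = sample_size_per_group(baseline=0.10, mde=0.01)
n_large_effect = sample_size_per_group(baseline=0.10, mde=0.02)
print(n_small_effect, n_large_effect)
```

The useful takeaway is the scaling: halving the minimum detectable effect roughly quadruples the required sample, which in turn drives how long the test has to run.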
Section 3: Machine Learning Concepts
For most internship interviews, conceptual understanding is more important than implementation code.
Supervised learning:
- Linear regression: assumptions (linearity, independent errors, homoscedasticity, and approximately normal residuals for inference), interpretation of coefficients, R²
- Logistic regression: logit function, odds ratio, binary classification
- Decision trees: splitting criteria (Gini impurity, entropy), overfitting, pruning
- Random forests: ensemble method, bagging, feature importance
- Gradient boosting (XGBoost, LightGBM): boosting vs bagging, overfitting control
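Interviewers sometimes ask you to compute a splitting criterion by hand. The sketch below shows how Gini impurity and entropy score a candidate split in a decision tree; the labels and the split itself are toy values invented for the example.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())

def weighted_impurity(left, right, measure):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)

# Toy node with 5 fraud and 5 legit examples, and one candidate split.
parent = ["fraud"] * 5 + ["legit"] * 5
left = ["fraud"] * 4 + ["legit"] * 1
right = ["fraud"] * 1 + ["legit"] * 4

gini_gain = gini(parent) - weighted_impurity(left, right, gini)
info_gain = entropy(parent) - weighted_impurity(left, right, entropy)
print(round(gini_gain, 3), round(info_gain, 3))
```

A tree picks the split with the largest impurity reduction at each node. Left unchecked, it keeps splitting until the leaves are pure, which is where overfitting and pruning enter the conversation.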
Unsupervised learning:
- K-means clustering: how it works, choice of K (elbow method, silhouette score)
- PCA: dimensionality reduction, explained variance ratio
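If you are asked how K-means works, it helps to describe the assign-then-update loop (Lloyd's algorithm). Here is a deliberately tiny one-dimensional sketch; the points and starting centroids are made up, and a real project would normally use a library implementation such as scikit-learn's instead.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))
            clusters[nearest].append(point)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Toy data with two obvious groups and deliberately poor starting centroids.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)
```

The result depends on the starting centroids and on K, which is why the elbow method and silhouette score come up as follow-up questions.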
Model evaluation:
- Accuracy, Precision, Recall, F1 Score: what each measures and when to prioritise each
- ROC curve and AUC: what they mean, and when AUC is misleading (heavy class imbalance, where a precision-recall curve is often more informative)
- Confusion matrix: be able to draw and interpret one
Common interview question: "Your model has 99% accuracy but your boss is unhappy. Why might this be?" Answer: Class imbalance. If 99% of transactions are legitimate and 1% are fraudulent, a model that predicts "legitimate" for everything achieves 99% accuracy while catching zero fraud.
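The arithmetic behind this answer is easy to verify on a toy dataset. The sketch below assumes 1,000 hypothetical transactions, 10 of them fraudulent, and a "model" that predicts legitimate for everything; it builds the confusion-matrix counts by hand rather than relying on any library.

```python
# 1 = fraud, 0 = legitimate. Hypothetical data: 10 fraud cases in 1,000 transactions.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000  # a model that always predicts "legitimate"

# Confusion-matrix cells.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if (tp + fn) else 0.0
# Precision is undefined when nothing is predicted positive; report 0.0 by convention here.
precision = tp / (tp + fp) if (tp + fp) else 0.0

print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, recall={recall:.2f}")
```

Accuracy looks excellent while recall on the fraud class is zero, so a good follow-up is to say which metric you would report instead and why the cost of a missed fraud case differs from the cost of a false alarm.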
Section 4: Case Study Questions
Data science case studies test your ability to approach a business problem with data thinking.
Example: "Shopee's conversion rate dropped 15% last week. How would you investigate?"
Strong answer:
- Clarify the metric: Is this across all platforms (iOS, Android, desktop)? All geographies? All product categories?
- Check for data issues: Is the data collection pipeline functioning correctly?
- Segment: Break down by device, geography, traffic source, user cohort (new vs returning). Where is the drop most concentrated?
- Hypotheses: UI change? Payment gateway issue? Price increase? Seasonal effect? Competitor promotion?
- Test: Which hypothesis can you test most quickly with available data?
- Action: What would you recommend based on the most likely cause?
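The segmentation step is where most of the diagnostic value comes from, and it is straightforward to sketch. All figures below are invented so that the overall drop is 15%; the segment names and the two-week comparison are assumptions for the example, not real Shopee data.

```python
# (segment, week) -> (sessions, orders). Every number here is hypothetical.
traffic = {
    ("android", "previous"): (10_000, 500),
    ("android", "current"): (10_000, 355),
    ("ios", "previous"): (8_000, 400),
    ("ios", "current"): (8_000, 396),
    ("desktop", "previous"): (2_000, 100),
    ("desktop", "current"): (2_000, 99),
}

def relative_change(previous, current):
    """Relative change in conversion rate, given (sessions, orders) pairs."""
    prev_rate = previous[1] / previous[0]
    curr_rate = current[1] / current[0]
    return (curr_rate - prev_rate) / prev_rate

# Overall change across all segments.
totals = {
    week: (
        sum(s for (_, w), (s, _) in traffic.items() if w == week),
        sum(o for (_, w), (_, o) in traffic.items() if w == week),
    )
    for week in ("previous", "current")
}
overall_change = relative_change(totals["previous"], totals["current"])

# Change per segment, to see where the drop is concentrated.
changes = {
    segment: relative_change(traffic[(segment, "previous")], traffic[(segment, "current")])
    for segment in sorted({seg for seg, _ in traffic})
}
worst_segment = min(changes, key=changes.get)

print(f"overall: {overall_change:.1%}")
for segment, change in changes.items():
    print(f"{segment}: {change:.1%}")
```

In this made-up data nearly all of the decline sits in one device segment, which would point the investigation towards a platform-specific release or bug rather than a market-wide cause such as seasonality.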
GovTech DS Interview Specifics
GovTech data science interviews emphasise:
- Python (pandas, scikit-learn) over R
- Data visualisation (seaborn, plotly, or Tableau)
- Public sector use cases: fraud detection, resource allocation, citizen service personalisation
- Data privacy awareness: PDPA compliance, data minimisation, anonymisation techniques
Grab / Sea DS Interview Specifics
Grab focuses on geospatial data analysis (ETA models, surge pricing, driver allocation) and recommendation systems. Sea/Shopee focuses on user behaviour modelling, GMV forecasting, and pricing optimisation.
For both, be ready to discuss a real personal project in Python or SQL. Candidates who can walk through a live notebook or GitHub project during a technical interview tend to stand out from those who can only answer abstract questions.