Many candidates have recently received the Capital One Data Scientist New Grad OA. Overall, the difficulty is moderate: the first two questions focus on core data handling and logic, while the last two lean heavily toward Simulation-style tasks that test structured thinking and attention to detail. Candidates who fail usually do so not because the problems are unsolvable, but because of workflow complexity and time pressure.
This guide walks through the question types, solution strategies, and common pitfalls so you can build a clear problem-solving framework before taking the assessment.
Question 1: Fundamental Data Analysis + CSV Output
Core skills tested: multi-file ingestion, data cleaning, aggregation, exporting results.
Task: Read drivers.csv along with multiple trip datasets (rides_1.csv through rides_4.csv), perform basic cleaning and merging, compute required metrics, and output a final CSV.
Approach:
- Load the driver dataset and compute metrics such as average rating and the proportion of bilingual drivers.
- Merge the four trip datasets and calculate the ride success rate (the share of rides that were completed).
- Construct the final DataFrame according to the requirements and export it.
This is usually treated as a “free points” question; just be careful with file paths, encoding, and null handling.
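A minimal pandas sketch of the Question 1 flow is shown below. The column names (`rating`, `is_bilingual`, `status`), the `"completed"` status value, the output column names, and the null-handling choices are all hypothetical; the real prompt defines its own schema and rules.

```python
from pathlib import Path

import pandas as pd


def load_inputs(data_dir="."):
    """Read drivers.csv and stack rides_1.csv ... rides_4.csv into one table."""
    data_dir = Path(data_dir)
    drivers = pd.read_csv(data_dir / "drivers.csv", encoding="utf-8")
    rides = pd.concat(
        [pd.read_csv(data_dir / f"rides_{i}.csv", encoding="utf-8") for i in range(1, 5)],
        ignore_index=True,
    )
    return drivers, rides


def summarize(drivers, rides):
    """Build a one-row summary table (hypothetical column names)."""
    # Series.mean() skips NaN by default, so missing ratings are ignored.
    avg_rating = drivers["rating"].mean()
    # Assumes is_bilingual is stored as 0/1; nulls are excluded from the share.
    bilingual_share = drivers["is_bilingual"].dropna().astype(bool).mean()
    # Normalize status strings before comparing; rides with no status are dropped.
    status = rides["status"].dropna().astype(str).str.strip().str.lower()
    success_rate = (status == "completed").mean()
    return pd.DataFrame(
        {
            "avg_rating": [avg_rating],
            "bilingual_share": [bilingual_share],
            "ride_success_rate": [success_rate],
        }
    )
```

Usage might look like `summarize(*load_inputs("data")).to_csv("output.csv", index=False)`, where `data` and `output.csv` are placeholder paths. Passing `index=False` stops pandas from writing the row index as an extra unnamed column, a common cause of format mismatches.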
Question 2: Time-Based Features + Extended Field Analysis
Core skills tested: temporal feature engineering, multi-table joins, missing value handling.
Task: Using a fixed reference date (2023-04-15), derive features such as driving tenure and vehicle inspection intervals. Merge datasets and aggregate likes per driver.
Approach:
- Compute tenure and inspection gap using the given baseline date.
- Aggregate total likes by driver_id from the trip data.
- Use the driver table as the primary table and left join the vehicle table and the aggregated trip table.
- Fill missing like counts with 0 and output columns in the specified order.
Important: always use the provided baseline date, never the runtime system date; otherwise your features change depending on when the code runs.
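The Question 2 steps can be sketched in pandas as follows. The table and column names (`join_date`, `last_inspection_date`, `likes`), measuring tenure and inspection gap in days, and the output column order are assumptions for illustration; only the 2023-04-15 baseline comes from the prompt.

```python
import pandas as pd

# Fixed reference date from the prompt; do not substitute today's date.
BASELINE = pd.Timestamp("2023-04-15")


def build_driver_features(drivers, vehicles, trips):
    """Tenure, inspection gap, and total likes per driver (hypothetical schema)."""
    d = drivers.copy()
    d["tenure_days"] = (BASELINE - pd.to_datetime(d["join_date"])).dt.days

    v = vehicles.copy()
    v["days_since_inspection"] = (
        BASELINE - pd.to_datetime(v["last_inspection_date"])
    ).dt.days

    # One row per driver with the summed likes from the trip table.
    likes = (
        trips.groupby("driver_id", as_index=False)["likes"]
        .sum()
        .rename(columns={"likes": "total_likes"})
    )

    # The driver table is the primary (left) table. Assumes one vehicle row per
    # driver; otherwise the left join would duplicate drivers.
    out = d.merge(v[["driver_id", "days_since_inspection"]], on="driver_id", how="left")
    out = out.merge(likes, on="driver_id", how="left")

    # Drivers with no trips get 0 likes instead of NaN.
    out["total_likes"] = out["total_likes"].fillna(0).astype(int)

    # Emit columns in a fixed order (the real order comes from the prompt).
    return out[["driver_id", "tenure_days", "days_since_inspection", "total_likes"]]
```

Keeping the baseline in a single module-level constant makes it easy to confirm that no step silently falls back to the system clock.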
Question 3: Driver Profiling / Performance Dataset Pipeline
Core skills tested: preprocessing pipelines, missing value strategies, normalization, categorical encoding.
This is a classic Simulation-style question: instead of calculating metrics, you must reproduce a preprocessing workflow exactly.
Task: Apply consistent preprocessing to both training and test datasets, including imputation, encoding, and scaling.
Approach:
- Fill missing age values using the training-set mean only, then round.
- Fit categorical encodings on the training set; map unseen test categories to -1.
- Standardize the tip net value using the training-set mean and standard deviation.
- Encode driver levels with fixed mappings, keep five decimal places, and export results.
The biggest risk here is train/test data leakage. Every statistical value — means, standard deviations, encoding maps — must come exclusively from the training data.
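One way to make leakage structurally hard is to split the pipeline into a "fit" step that reads only the training set and a "transform" step that applies those frozen statistics to any split. The sketch below assumes hypothetical columns `age`, `city` (the categorical feature), `tip_net`, and `driver_level`, a made-up `LEVEL_MAP`, and that "round" means rounding age to a whole number; the OA prompt specifies the real names and mappings.

```python
import pandas as pd

# Hypothetical fixed mapping; the OA prompt supplies the real one.
LEVEL_MAP = {"Bronze": 0, "Silver": 1, "Gold": 2}


def fit_preprocessor(train):
    """Collect every statistic from the training set only."""
    return {
        "age_mean": train["age"].mean(),
        # Sorted so the integer codes are reproducible between runs.
        "city_codes": {c: i for i, c in enumerate(sorted(train["city"].dropna().unique()))},
        "tip_mean": train["tip_net"].mean(),
        # pandas .std() uses ddof=1; sklearn's StandardScaler uses ddof=0.
        # Match whichever convention the prompt specifies.
        "tip_std": train["tip_net"].std(),
    }


def transform(df, stats):
    """Apply the train-fitted statistics to any split (train or test)."""
    out = df.copy()
    # Impute with the training mean, then round to whole years.
    # Note: pandas rounds exact .5 values to the nearest even number.
    out["age"] = out["age"].astype(float).fillna(stats["age_mean"]).round()
    # Categories unseen in training (and missing values) become -1.
    out["city"] = out["city"].map(stats["city_codes"]).fillna(-1).astype(int)
    # Standardize with training statistics and keep five decimal places.
    out["tip_net"] = ((out["tip_net"] - stats["tip_mean"]) / stats["tip_std"]).round(5)
    out["driver_level"] = out["driver_level"].map(LEVEL_MAP)
    return out
```

Typical usage is `stats = fit_preprocessor(train_df)` followed by `transform(train_df, stats)` and `transform(test_df, stats)`. Because `transform` never computes a statistic from the data it receives, the test set cannot leak into the means, standard deviation, or encoding map.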
Question 4: Machine Learning Classification Task
Core skills tested: classification modeling, class imbalance handling, metric prioritization (Precision vs. Recall).
Task: Predict driver_class (0/1) using the cleaned dataset from the previous step. Class B (1) is the positive class, and the objective is to maximize Recall while keeping Precision at an acceptable level.
Approach:
- Load training, validation, and test sets; remove irrelevant ID columns.
- Combine training and validation data, then split into features (X) and labels (y).
- Use a class-weighted model such as Random Forest to address imbalance.
- Generate predictions for the test set and output them in the required format.
This is a very typical Capital One setup: imbalanced classes with a preference for Recall. Default model settings (no class weighting, an effective 0.5 decision threshold) tend to favor the majority class and often underperform unless adjusted.
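A scikit-learn sketch of the Question 4 workflow follows. It uses `RandomForestClassifier(class_weight="balanced")` as the class-weighted model, and it adds one technique the prompt does not mention: tuning the decision threshold on the validation set (highest Recall subject to a Precision floor) before refitting on train plus validation. The `driver_id` ID column, the `min_precision=0.5` floor, the 0.5 fallback, and the output layout are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve


def pick_threshold(y_true, proba, min_precision=0.5):
    """Return the highest-Recall threshold whose Precision is >= min_precision."""
    precision, recall, thresholds = precision_recall_curve(y_true, proba)
    # precision and recall carry one extra trailing point with no threshold.
    ok = precision[:-1] >= min_precision
    if not ok.any():
        return 0.5  # fall back to the default cut-off
    return float(thresholds[np.argmax(np.where(ok, recall[:-1], -1.0))])


def new_model():
    # class_weight="balanced" weights classes inversely to their frequency.
    return RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=42)


def fit_and_predict(train, valid, test, label="driver_class",
                    id_cols=("driver_id",), min_precision=0.5):
    drop = [c for c in id_cols if c in train.columns]
    features = [c for c in train.columns if c not in drop and c != label]

    # 1) Tune the threshold on validation using a model fit on train only.
    model = new_model().fit(train[features], train[label])
    val_proba = model.predict_proba(valid[features])[:, 1]  # column 1 = class 1 (B)
    threshold = pick_threshold(valid[label], val_proba, min_precision)

    # 2) Refit on train + validation, as the prompt describes, then score test.
    full = pd.concat([train, valid], ignore_index=True)
    model = new_model().fit(full[features], full[label])
    proba = model.predict_proba(test[features])[:, 1]

    # Output: ID columns (if present) plus the 0/1 prediction.
    ids = [c for c in drop if c in test.columns]
    preds = test[ids].reset_index(drop=True).copy()
    preds[label] = (proba >= threshold).astype(int)
    return preds, threshold
```

Lowering the threshold trades Precision for Recall, so the Precision floor is what keeps the trade "acceptable"; the right floor depends on how the OA scores submissions.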
Want to Avoid Costly Mistakes in the Capital One DS OA?
If you’ve recently received the Capital One DS New Grad OA, Simulation-style questions should be your top priority. Many candidates struggle not because they lack coding ability, but because:
- The prompts are long and layered, making them hard to fully grasp in one pass.
- Preprocessing pipelines are strict — one incorrect step can invalidate the entire output.
- Time pressure leaves little room for debugging or verification.
We continuously compile real OA questions from top North American companies and have deep familiarity with Capital One’s testing patterns, scoring logic, and frequently used models. If you want to approach your OA with confidence, reduce avoidable mistakes, and prevent the assessment from becoming a bottleneck in your interview process, you can explore our support options here:
Learn more about our OA support
Many candidates have successfully advanced to the next interview stage with the right preparation — make sure you give yourself that advantage.