net programhelp

BCG X Data Scientist OA Review | 4 Questions Passed with a Stable Data Pipeline Approach

Just wrapped up a BCG X Data Scientist Online Assessment with one of our students, and the overall experience was surprisingly predictable. The question patterns have stayed almost unchanged for the past couple of years, and many candidates report seeing very similar problems repeated across different OA sessions.

The assessment included four questions in total, focused on Python data processing plus a light machine learning task. If you've practiced building a small data pipeline before (reading datasets, cleaning data, merging tables, and training a simple model), the implementation becomes very manageable during the exam.

Once the coding structure is clear, most of the work becomes straightforward pandas operations and basic ML steps.

Quick Overview of the Four Questions

Q1 Data Statistics (Basic Pandas)

The first task involves computing several metrics based on a driver table and multiple trip datasets.

Main objectives:

  • Calculate the average rating of drivers
  • Compute the percentage of drivers who speak a second language
  • Merge multiple trip files and calculate the success rate of trips

The main trick here is handling multiple input files correctly. After merging them into a single dataset, the remaining operations are mostly standard pandas workflows such as filtering, aggregation, and computing ratios.
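As a rough illustration of that flow, here is a minimal sketch. The column names (`rating`, `second_language`, `status`) and the in-memory frames standing in for the trip files are assumptions for the example, not the actual OA schema, and "merging" is assumed to mean stacking the files with `pd.concat`.

```python
import pandas as pd

# Hypothetical driver table; column names are assumptions for illustration.
drivers = pd.DataFrame({
    "driver_id": [1, 2, 3, 4],
    "rating": [4.5, 4.0, 5.0, 3.5],
    "second_language": ["Spanish", None, "French", None],
})

# Stand-ins for several trip files; in practice each would come from pd.read_csv.
trip_files = [
    pd.DataFrame({"trip_id": [10, 11], "driver_id": [1, 2],
                  "status": ["success", "failed"]}),
    pd.DataFrame({"trip_id": [12, 13], "driver_id": [3, 1],
                  "status": ["success", "success"]}),
]

def compute_metrics(drivers, trip_frames):
    # Stack all trip files into one table before computing anything.
    trips = pd.concat(trip_frames, ignore_index=True)
    return {
        "avg_rating": drivers["rating"].mean(),
        # Share of drivers with a non-missing second language, in percent.
        "pct_second_language": drivers["second_language"].notna().mean() * 100,
        # Share of trips marked as successful, in percent.
        "trip_success_rate": (trips["status"] == "success").mean() * 100,
    }

metrics = compute_metrics(drivers, trip_files)
print(metrics)
```

Taking the mean of a boolean Series is a compact way to get a ratio, which covers both percentage metrics here.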

Q2 Data Preprocessing (Standard Feature Engineering Pipeline)

The second question is essentially a classic machine learning preprocessing pipeline. The steps are quite typical for real-world ML projects.

The pipeline includes:

  • Fill missing age values using the mean age from the training set and round the result
  • Combine training and test datasets for consistent preprocessing
  • Apply ordinal encoding to vehicle type and second language features
  • Normalize tip amount and keep five decimal places
  • Convert driver level A/B into binary values (0/1)

After completing the transformations, the combined dataset is split again into processed training and testing sets before saving the outputs.
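A minimal sketch of this pipeline is shown here under several assumptions: the column names are hypothetical, "normalize" is taken to mean min-max scaling over the combined data, and ordinal codes are assigned in sorted category order. The real task statement may define any of these differently.

```python
import pandas as pd

def preprocess(train, test):
    train, test = train.copy(), test.copy()

    # Mean age comes from the training set only, rounded to a whole number.
    mean_age = int(round(float(train["age"].mean())))

    # Tag the rows so the combined frame can be split again later.
    train["is_train"] = True
    test["is_train"] = False
    combined = pd.concat([train, test], ignore_index=True)

    combined["age"] = combined["age"].fillna(mean_age)

    # Ordinal encoding: integer codes in sorted category order (an assumption).
    for col in ["vehicle_type", "second_language"]:
        combined[col] = combined[col].astype("category").cat.codes

    # Min-max normalization (an assumption), kept to five decimal places.
    tip = combined["tip_amount"]
    combined["tip_amount"] = ((tip - tip.min()) / (tip.max() - tip.min())).round(5)

    # Driver level A/B becomes 0/1.
    combined["driver_level"] = combined["driver_level"].map({"A": 0, "B": 1})

    is_train = combined["is_train"]
    train_out = combined[is_train].drop(columns="is_train").reset_index(drop=True)
    test_out = combined[~is_train].drop(columns="is_train").reset_index(drop=True)
    return train_out, test_out

# Hypothetical data for illustration.
train = pd.DataFrame({
    "age": [31, None, 42, 36],
    "vehicle_type": ["sedan", "suv", "sedan", "van"],
    "second_language": ["yes", "no", "no", "yes"],
    "tip_amount": [0.0, 5.0, 10.0, 2.5],
    "driver_level": ["A", "B", "A", "B"],
})
test = pd.DataFrame({
    "age": [None, 50],
    "vehicle_type": ["suv", "van"],
    "second_language": ["no", "yes"],
    "tip_amount": [7.5, 1.0],
    "driver_level": ["B", "A"],
})

train_out, test_out = preprocess(train, test)
```

Encoding on the combined frame keeps the category-to-code mapping identical for both splits, which is the usual reason for combining them first.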

Q3 Multi-Table Data Integration (Join + Aggregation)

This problem requires joining three different tables: driver, vehicle, and trip.

Key tasks include:

  • Compute the inspection interval for each vehicle in days
  • Calculate driving experience using: 2023 − starting driving year
  • Aggregate total likes for each driver
  • Fill missing values with 0 for drivers without trips

The solution mainly involves pandas merge, groupby, and aggregation operations to produce a clean summary dataset.
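The sketch below shows one way these joins could look. All column names are hypothetical, and the inspection interval is assumed to be the gap between two inspection dates stored on the vehicle table; the actual definition in the OA may differ.

```python
import pandas as pd

def build_driver_summary(drivers, vehicles, trips, reference_year=2023):
    vehicles = vehicles.copy()
    # Inspection interval in days (assumed: next date minus last date).
    vehicles["inspection_interval_days"] = (
        pd.to_datetime(vehicles["next_inspection_date"])
        - pd.to_datetime(vehicles["last_inspection_date"])
    ).dt.days

    # Total likes per driver across all of that driver's trips.
    likes = (
        trips.groupby("driver_id", as_index=False)["likes"].sum()
        .rename(columns={"likes": "total_likes"})
    )

    # Left joins keep every driver, including those without trips.
    summary = (
        drivers
        .merge(vehicles[["vehicle_id", "inspection_interval_days"]],
               on="vehicle_id", how="left")
        .merge(likes, on="driver_id", how="left")
    )
    summary["driving_experience"] = reference_year - summary["start_driving_year"]
    # Drivers with no trips get 0 likes instead of NaN.
    summary["total_likes"] = summary["total_likes"].fillna(0).astype(int)
    return summary

# Hypothetical data for illustration.
drivers = pd.DataFrame({
    "driver_id": [1, 2, 3],
    "vehicle_id": [100, 101, 102],
    "start_driving_year": [2010, 2018, 2021],
})
vehicles = pd.DataFrame({
    "vehicle_id": [100, 101, 102],
    "last_inspection_date": ["2022-01-01", "2022-06-01", "2023-01-01"],
    "next_inspection_date": ["2023-01-01", "2022-12-01", "2023-01-31"],
})
trips = pd.DataFrame({
    "trip_id": [1, 2, 3],
    "driver_id": [1, 1, 2],
    "likes": [3, 2, 5],
})

summary = build_driver_summary(drivers, vehicles, trips)
print(summary)
```

Aggregating trips before the join avoids duplicating driver rows, so the summary stays at one row per driver.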

Q4 Random Forest Modeling + Threshold Optimization

The final question introduces a classification model where threshold tuning plays an important role.

Typical modeling workflow:

  • Fill numerical features using median values
  • Fill categorical features using the mode
  • Train a Random Forest classifier
  • Assign higher weight to class B

During validation, the solution sweeps over multiple probability thresholds on the model's predicted probabilities. The objective is to maximize recall while still meeting a minimum precision requirement.

After identifying the best threshold, predictions are generated for the test dataset and exported as the final result.
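The threshold search itself needs nothing beyond plain Python. This is a minimal sketch under assumptions: the validation labels and scores are made up, the 0.75 precision floor and the 0.05 threshold grid are hypothetical, and label 1 stands for class B.

```python
def precision_recall(y_true, scores, threshold):
    # Predict positive (class B) when the score reaches the threshold.
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p)
    fp = sum(1 for y, p in zip(y_true, predicted) if y == 0 and p)
    fn = sum(1 for y, p in zip(y_true, predicted) if y == 1 and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def pick_threshold(y_true, scores, min_precision, candidates):
    # Keep the threshold with the highest recall among those meeting the floor.
    best = None  # (recall, threshold)
    for t in candidates:
        precision, recall = precision_recall(y_true, scores, t)
        if precision >= min_precision and (best is None or recall > best[0]):
            best = (recall, t)
    return None if best is None else best[1]

# Hypothetical validation labels (1 = class B) and predicted probabilities.
val_labels = [1, 1, 1, 0, 0, 1, 0, 0]
val_scores = [0.9, 0.8, 0.55, 0.6, 0.3, 0.45, 0.2, 0.5]

candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
best_threshold = pick_threshold(val_labels, val_scores,
                                min_precision=0.75, candidates=candidates)

# Apply the chosen threshold to hypothetical test-set probabilities.
test_scores = [0.7, 0.4, 0.55]
test_predictions = ["B" if s >= best_threshold else "A" for s in test_scores]
print(best_threshold, test_predictions)
```

`pick_threshold` returns `None` when no candidate meets the precision floor, so a real solution would need a fallback for that case.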

Preparation Takeaways

Consulting-company data science assessments like this tend to follow very repeatable patterns. Instead of grinding hundreds of new problems, it is much more effective to practice a complete pipeline workflow:

  • Data loading
  • Dataset merging and cleaning
  • Feature encoding
  • Aggregation and metric computation
  • Model training and threshold tuning

Once this workflow becomes familiar, most questions can be solved quickly during the assessment.

Real-Time OA Assistance

If you have a BCG or similar consulting-company data science OA scheduled soon and want to avoid getting stuck during the exam, many candidates choose to use our real-time OA assistance.

We provide live guidance during the assessment, help structure the solution pipeline, and offer code framework support when needed. This approach has helped many students complete their OA smoothly and move forward to the interview stage.
