<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jeremiah Oseremi Mayaki </title>
    <description>The latest articles on DEV Community by Jeremiah Oseremi Mayaki  (@omeiza_mayaki).</description>
    <link>https://dev.to/omeiza_mayaki</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3144750%2Fb9cf7e4c-2aca-4819-88f2-69cf1f83c3f7.jpg</url>
      <title>DEV Community: Jeremiah Oseremi Mayaki </title>
      <link>https://dev.to/omeiza_mayaki</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/omeiza_mayaki"/>
    <language>en</language>
    <item>
      <title>Spark Augmented Reality (AR) Filter Engagement Metrics</title>
      <dc:creator>Jeremiah Oseremi Mayaki </dc:creator>
      <pubDate>Mon, 19 May 2025 12:29:40 +0000</pubDate>
      <link>https://dev.to/omeiza_mayaki/spark-augmented-reality-ar-filter-engagement-metrics-1446</link>
      <guid>https://dev.to/omeiza_mayaki/spark-augmented-reality-ar-filter-engagement-metrics-1446</guid>
      <description>&lt;p&gt;I recently completed an SQL challenge on the interviewmaster.ai platform involving a scenario where I am the data analyst in the marketing analytics team at Meta and have been tasked with evaluating the performance of branded AR filters with the aim of identifying which filters are driving the highest user interactions and shares to inform future campaign strategies for brands using the Spark AR platform. By analyzing engagement data, my team aims to provide actionable insights that will enhance campaign effectiveness and audience targeting. &lt;br&gt;
I completed this challenge using SQLite.&lt;/p&gt;

&lt;p&gt;I was provided with 2 tables:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;ar_filters, containing the filter_id and filter_name fields&lt;/li&gt;
&lt;li&gt;ar_filter_engagements, containing the engagement_id, filter_id, interaction_count and engagement_date fields&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Challenge 1&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I was required to query the dataset to return the AR filters that generated at least one user interaction in July 2024, identified by their filter names.&lt;/p&gt;

&lt;p&gt;This challenge required me to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retrieve the filter names. &lt;/li&gt;
&lt;li&gt;Use the SUM() aggregate function.&lt;/li&gt;
&lt;li&gt;Join the two tables using the filter_id as the common field in both tables.&lt;/li&gt;
&lt;li&gt;Filter the result based on the required date using the strftime() function.&lt;/li&gt;
&lt;li&gt;Order the result by the total interaction count.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT f.filter_id, 
  f.filter_name,
  SUM(e.interaction_count) AS total_interaction_count
FROM ar_filters AS f
JOIN ar_filter_engagements AS e
  ON f.filter_id = e.filter_id
WHERE strftime('%Y', e.engagement_date) = '2024'
  AND strftime('%m', e.engagement_date) = '07'
GROUP BY f.filter_id, f.filter_name
ORDER BY total_interaction_count DESC
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
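
&lt;p&gt;Because the challenge runs on SQLite, the query can be exercised end to end with Python's built-in sqlite3 module. The tables below mirror the challenge schema, but the filter names and numbers are made up purely for illustration; only the query itself comes from the challenge.&lt;/p&gt;

```python
import sqlite3

# Toy tables mirroring the challenge schema (hypothetical data, not the real dataset)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ar_filters (filter_id INTEGER PRIMARY KEY, filter_name TEXT);
CREATE TABLE ar_filter_engagements (
    engagement_id INTEGER PRIMARY KEY,
    filter_id INTEGER,
    interaction_count INTEGER,
    engagement_date TEXT
);
INSERT INTO ar_filters VALUES (1, 'Sunset Glow'), (2, 'Neon Mask');
INSERT INTO ar_filter_engagements VALUES
    (1, 1, 120, '2024-07-05'),
    (2, 1, 80,  '2024-07-20'),
    (3, 2, 50,  '2024-07-11'),
    (4, 2, 999, '2024-08-01');  -- August row: must be excluded
""")

rows = conn.execute("""
SELECT f.filter_id, f.filter_name,
       SUM(e.interaction_count) AS total_interaction_count
FROM ar_filters AS f
JOIN ar_filter_engagements AS e ON f.filter_id = e.filter_id
WHERE strftime('%Y', e.engagement_date) = '2024'
  AND strftime('%m', e.engagement_date) = '07'
GROUP BY f.filter_id, f.filter_name
ORDER BY total_interaction_count DESC
""").fetchall()

print(rows)  # [(1, 'Sunset Glow', 200), (2, 'Neon Mask', 50)]
```

&lt;p&gt;The August row is deliberately included to confirm that the strftime() filter drops it.&lt;/p&gt;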


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn64vjmo1t3v1ieoqqahe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn64vjmo1t3v1ieoqqahe.png" alt="challenge 1 image" width="711" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenge 2&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I was required to calculate how many total interactions each AR filter received in August 2024, returning only the filter names with over 1000 interactions along with their respective interaction counts.&lt;/p&gt;

&lt;p&gt;This challenge required me to use the HAVING clause to keep only the AR filters with more than 1000 total interactions.&lt;/p&gt;

&lt;p&gt;Although this looks like a condition that could be applied with the WHERE clause, SQL does not allow an aggregate function such as &lt;em&gt;SUM(e.interaction_count)&lt;/em&gt; in WHERE, because WHERE filters individual rows before they are grouped. HAVING, which filters after grouping, is the right tool here.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; SELECT f.filter_id, 
  f.filter_name,
  SUM(e.interaction_count) AS total_interaction_count
FROM ar_filters AS f
JOIN ar_filter_engagements AS e
  ON f.filter_id = e.filter_id
WHERE strftime('%Y', e.engagement_date) = '2024'
  AND strftime('%m', e.engagement_date) = '08'
GROUP BY f.filter_id, f.filter_name
   HAVING SUM(e.interaction_count) &amp;gt; 1000
ORDER BY total_interaction_count DESC
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
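
&lt;p&gt;The WHERE-versus-HAVING point can be demonstrated with a tiny sqlite3 session (toy table and invented numbers, purely illustrative):&lt;/p&gt;

```python
import sqlite3

# Toy engagement table (invented numbers) to show why the aggregate
# condition must live in HAVING rather than WHERE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eng (filter_id INTEGER, interaction_count INTEGER)")
conn.executemany("INSERT INTO eng VALUES (?, ?)",
                 [(1, 700), (1, 600), (2, 300)])

# WHERE is evaluated row by row, before grouping, so SQLite rejects SUM() here:
try:
    conn.execute("""SELECT filter_id FROM eng
                    WHERE SUM(interaction_count) > 1000
                    GROUP BY filter_id""")
    msg = ""
except sqlite3.OperationalError as err:
    msg = str(err)
print(msg)  # SQLite reports a misuse-of-aggregate error

# HAVING is applied after GROUP BY, so the same condition works:
rows = conn.execute("""SELECT filter_id, SUM(interaction_count) FROM eng
                       GROUP BY filter_id
                       HAVING SUM(interaction_count) > 1000""").fetchall()
print(rows)  # only filter 1 (700 + 600 = 1300) clears the threshold
```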



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft9gkdr26pqyz1evxokj0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft9gkdr26pqyz1evxokj0.png" alt="challenge 2" width="800" height="237"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenge 3&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the third and last challenge, I was required to write a query that returns the top 3 AR filters with the highest number of interactions in September 2024 and show how many interactions each filter received.&lt;/p&gt;

&lt;p&gt;All I had to do was edit the query from the second task: I removed the HAVING clause (this challenge sets no interaction-count threshold) and added LIMIT 3 to keep only the top 3 filter names.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; SELECT f.filter_id, 
  f.filter_name,
  SUM(e.interaction_count) AS total_interaction_count
FROM ar_filters AS f
JOIN ar_filter_engagements AS e
  ON f.filter_id = e.filter_id
WHERE strftime('%Y', e.engagement_date) = '2024'
  AND strftime('%m', e.engagement_date) = '09'
GROUP BY f.filter_id, f.filter_name
ORDER BY total_interaction_count DESC
LIMIT 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqltny9hlp8n0wyem6m2f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqltny9hlp8n0wyem6m2f.png" alt="challenge 3" width="800" height="234"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Overall, it was a thrilling challenge that required some serious analytical thinking.&lt;/p&gt;

&lt;p&gt;What do you think about it? Recommendations are highly welcome.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>sqlite</category>
      <category>dataanalytics</category>
    </item>
    <item>
      <title>Taxi Drivers Efficiency Analysis with SQL &amp; Tableau</title>
      <dc:creator>Jeremiah Oseremi Mayaki </dc:creator>
      <pubDate>Sat, 10 May 2025 11:16:07 +0000</pubDate>
      <link>https://dev.to/omeiza_mayaki/driver-efficiency-analysis-with-sql-tableau-24ej</link>
      <guid>https://dev.to/omeiza_mayaki/driver-efficiency-analysis-with-sql-tableau-24ej</guid>
      <description>&lt;p&gt;I recently completed a hands-on SQL project using the Chicago Taxi Trips dataset on BigQuery, where I calculated taxi drivers' efficiency based on total fare earned per minute spent on trips for a single day.&lt;/p&gt;

&lt;p&gt;🧠 What I Did:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Queried the dataset with a Common Table Expression (CTE):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of the most useful SQL features I leveraged in this project was the Common Table Expression (CTE). &lt;/p&gt;

&lt;p&gt;A CTE lets you create a temporary, named result set that can be referenced within your main query. It made my SQL logic far more readable and manageable.&lt;/p&gt;

&lt;p&gt;Instead of cramming all calculations into one long block of code, I broke things down step by step—calculating total trip duration, number of trips, and total fare inside the CTE. This made the final SELECT query cleaner, easier to debug, and more efficient to run.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Creating a CTE
WITH table_mains AS (
  SELECT
    taxi_id,
    SUM (TIMESTAMP_DIFF(trip_end_timestamp, trip_start_timestamp, MINUTE)) AS total_trip_duration,
    COUNT (*) AS trip_count,
    SUM (fare) AS total_fare
  FROM
    `bigquery-public-data.chicago_taxi_trips.taxi_trips`
  WHERE
    DATE (trip_start_timestamp) = '2013-10-03'

# Filtering out rows that could have problems so as to get a clean result
    AND (fare) is NOT NULL
    AND (TIMESTAMP_DIFF(trip_end_timestamp, trip_start_timestamp, MINUTE)) &amp;gt; 0
  GROUP BY taxi_id
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
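
&lt;p&gt;The same WITH ... AS shape works in SQLite too, so here is a minimal runnable sketch of the pattern using Python's sqlite3 module and an invented trips table (the real query above runs only on BigQuery):&lt;/p&gt;

```python
import sqlite3

# Minimal runnable CTE, same shape as the BigQuery one: aggregate per driver
# in a named subquery, then select from it (toy trips, invented numbers).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (taxi_id TEXT, minutes INTEGER, fare REAL)")
conn.executemany("INSERT INTO trips VALUES (?, ?, ?)",
                 [("a", 10, 12.5), ("a", 20, 30.0), ("b", 15, 10.0)])

rows = conn.execute("""
WITH table_mains AS (
  SELECT taxi_id,
         SUM(minutes) AS total_trip_duration,
         COUNT(*)     AS trip_count,
         SUM(fare)    AS total_fare
  FROM trips
  GROUP BY taxi_id
)
SELECT * FROM table_mains ORDER BY taxi_id
""").fetchall()
print(rows)  # [('a', 30, 2, 42.5), ('b', 15, 1, 10.0)]
```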



&lt;p&gt;&lt;strong&gt;2. Filtered out incomplete or unrealistic data while creating the CTE:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To ensure accurate results and insights, I filtered out rows with missing fare values and non-positive trip durations. This step is crucial: it prevents skewed efficiency scores and keeps the analysis reliable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Filtering out rows that could have problems so as to get a clean result
    AND (fare) is NOT NULL
    AND (TIMESTAMP_DIFF(trip_end_timestamp, trip_start_timestamp, MINUTE)) &amp;gt; 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Calculated key metrics (total trip duration, total fare, trip count, and efficiency score) and ranked drivers by efficiency score:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I calculated key performance metrics like total trip duration, total fare, and trip count. &lt;/p&gt;

&lt;p&gt;Using SAFE_DIVIDE, I computed the efficiency score (fare per minute) to avoid errors from division by zero. Then, I applied the RANK() window function to rank drivers based on this score—making it easy to identify the most efficient drivers at a glance.&lt;/p&gt;

&lt;p&gt;Also, I used the WHERE clause to keep only drivers with at least 5 trips and ordered the results by efficiency score.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
  taxi_id,
  total_trip_duration,
  trip_count,
  total_fare,
  SAFE_DIVIDE (total_fare, trip_count) AS avg_trip_cost,
  SAFE_DIVIDE (total_fare, total_trip_duration) AS efficiency_score,
  RANK () OVER (
    ORDER BY SAFE_DIVIDE (total_fare, total_trip_duration) DESC
  ) AS efficiency_rank
FROM 
  table_mains

# Say we want to see the results for taxis that travelled more than 5 trips
WHERE
  trip_count &amp;gt;= 5
ORDER BY efficiency_score DESC
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
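
&lt;p&gt;SAFE_DIVIDE() is a BigQuery function; its contract (return NULL instead of raising when the divisor is zero) can be mimicked in plain Python to sanity-check the efficiency-score logic. The taxi names and figures below are invented:&lt;/p&gt;

```python
# A plain-Python stand-in for BigQuery's SAFE_DIVIDE, plus a RANK()-style
# ordering, using toy figures (minutes, fare) per hypothetical taxi.

def safe_divide(numerator, denominator):
    """Mimic BigQuery SAFE_DIVIDE: None (NULL) when the divisor is zero."""
    return None if denominator == 0 else numerator / denominator

drivers = {"taxi_a": (60, 90.0), "taxi_b": (45, 90.0), "taxi_c": (0, 5.0)}
scores = {taxi: safe_divide(fare, minutes)
          for taxi, (minutes, fare) in drivers.items()}
print(scores["taxi_c"])  # None, not a ZeroDivisionError

# Highest fare-per-minute first, with NULL-scored drivers filtered out
ranked = sorted((t for t in scores if scores[t] is not None),
                key=lambda t: scores[t], reverse=True)
print(ranked)  # ['taxi_b', 'taxi_a']
```

&lt;p&gt;Note that, unlike this simple sort, RANK() also assigns the same rank to ties.&lt;/p&gt;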



&lt;p&gt;&lt;strong&gt;📊 Results:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After running the query, I saved the result and imported it into Tableau to create a visualization and draw insights from it.&lt;/p&gt;

&lt;p&gt;The visualization brings the data to life, making it easy to compare driver performance at a glance through its interactivity. It also helps stakeholders quickly identify top performers and make data-driven decisions with clarity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5pquvpf6fmriksjkm1c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5pquvpf6fmriksjkm1c.png" alt="Query result" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2du9zfptwtb9ghee0oj5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2du9zfptwtb9ghee0oj5.png" alt="Tableau result" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🧪 Tools Used:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;SQL (Google BigQuery Sandbox)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tableau (for visualization)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;💭 Reflection:&lt;/strong&gt;&lt;br&gt;
This project helped me reinforce my understanding of functions like SAFE_DIVIDE() and the RANK() window function, and taught me how to turn raw data into actionable insights.&lt;/p&gt;

&lt;p&gt;What do you think of this approach? Feedback and ideas are welcome!&lt;/p&gt;

&lt;p&gt;Thank you!&lt;/p&gt;

</description>
      <category>sql</category>
      <category>tableau</category>
      <category>data</category>
      <category>bigquery</category>
    </item>
  </channel>
</rss>
