DEV Community

minddd64
minddd64

Posted on

Understanding Search Functionality

Questions before we create the function

  • Why does this function make the product better?
  • How can we measure the effect of the function quantitatively?
  • How long does it take to create the function
  • Does this function make the product too complicated?
  • How dangerous is the function?
  • How innovative is the function?
  • Is it aligned with what users want?

Hypothesis that we need to check

  • Do users use the search function a lot? Among the numerous functions used by users, how commonly is the search function?
  • How often do users use the search function in one session? (It might be a problem if they use it too often or too rarely)
  • How much do results clicks occur after searching?
  • where is the results that is clicked after a search?
  • The more users search, the more click they do?
  • What is the retention rate of the search function users?

Summary of problem

  • Explore whether the search function is necessary
  • Which part should be improved and how

Data Analysis

1. Percentage of search function usage

SELECT DATE_TRUNC('week', z.session_start) AS week,
       COUNT(*) AS sessions, 
       COUNT(CASE WHEN z.autocompletes > 0 THEN z.session ELSE NULL END) AS with_autocompletes,
       COUNT(CASE WHEN z.runs > 0 THEN z.session ELSE NULL END) AS with_runs
FROM (
SELECT x.session_start,
       x.session,
       x.user_id, 
       COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
       COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
       COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
       session.session,
       session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM(

SELECT bounds.*,
       CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
       WHEN last_event IS NULL THEN id 
       ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id, event_type, event_name, occurred_at, 
       occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
       LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
       ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR
      next_event >= INTERVAL '10 MINUTE' OR
      last_event IS NULL OR 
      next_event IS NULL) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id AND
   e.occurred_at >= session.session_start AND
   e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement') x
GROUP BY x.session_start, x.session, x.user_id) z
GROUP BY week
Enter fullscreen mode Exit fullscreen mode

Image description

Image description


2. The number of sessions with autocompletes

SELECT autocompletes, COUNT(*) AS sessions
FROM (
SELECT x.session_start,
       x.session,
       x.user_id, 
       COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
       COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
       COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
       session.session,
       session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM(

SELECT bounds.*,
       CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
       WHEN last_event IS NULL THEN id 
       ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id, event_type, event_name, occurred_at, 
       occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
       LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
       ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR
      next_event >= INTERVAL '10 MINUTE' OR
      last_event IS NULL OR 
      next_event IS NULL) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id AND
   e.occurred_at >= session.session_start AND
   e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement') x
GROUP BY x.session_start, x.session, x.user_id) z
GROUP BY autocompletes

Enter fullscreen mode Exit fullscreen mode

Image description

3. The number of sessions with full search

SELECT runs, COUNT(*) AS sessions
FROM (
SELECT x.session_start,
       x.session,
       x.user_id, 
       COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
       COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
       COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
       session.session,
       session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM(

SELECT bounds.*,
       CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
       WHEN last_event IS NULL THEN id 
       ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id, event_type, event_name, occurred_at, 
       occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
       LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
       ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR
      next_event >= INTERVAL '10 MINUTE' OR
      last_event IS NULL OR 
      next_event IS NULL) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id AND
   e.occurred_at >= session.session_start AND
   e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement') x
GROUP BY x.session_start, x.session, x.user_id) z
GROUP BY runs

Enter fullscreen mode Exit fullscreen mode

Image description

4. Distribution of the clickthrough after full search

SELECT clicks, COUNT(*)
FROM (
SELECT x.session_start,
       x.session,
       x.user_id, 
       COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
       COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
       COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
       session.session,
       session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM(

SELECT bounds.*,
       CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
       WHEN last_event IS NULL THEN id 
       ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id, event_type, event_name, occurred_at, 
       occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
       LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
       ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR
      next_event >= INTERVAL '10 MINUTE' OR
      last_event IS NULL OR 
      next_event IS NULL) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id AND
   e.occurred_at >= session.session_start AND
   e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement') x
GROUP BY x.session_start, x.session, x.user_id) z
WHERE runs > 0 
GROUP BY clicks

Enter fullscreen mode Exit fullscreen mode

Image description

  • After full search, zero clickthrough is more than half.
SELECT runs, AVG(clicks) ::FLOAT AS average_clicks
FROM (
SELECT x.session_start,
       x.session,
       x.user_id, 
       COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
       COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
       COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
       session.session,
       session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM(

SELECT bounds.*,
       CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
       WHEN last_event IS NULL THEN id 
       ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id, event_type, event_name, occurred_at, 
       occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
       LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
       ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR
      next_event >= INTERVAL '10 MINUTE' OR
      last_event IS NULL OR 
      next_event IS NULL) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id AND
   e.occurred_at >= session.session_start AND
   e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement') x
GROUP BY x.session_start, x.session, x.user_id) z
WHERE runs > 0 
GROUP BY runs

Enter fullscreen mode Exit fullscreen mode

Image description

  • The more users search, the more they click.
  • Above 13 runs, the amount of sessions decreased -> not accurate

5. The order of clicked contents

SELECT event_name, COUNT(user_id)
FROM tutorial.yammer_events
WHERE event_name LIKE 'search_click_result_%'
GROUP BY event_name
ORDER BY event_name

Enter fullscreen mode Exit fullscreen mode

Image description

  • The order of contents that users click is distributed.
  • In chart 4, users who click only one result is 2.45%. It implies that it's not the matter of ranking, but the matter of results itself.

6. Retention users

SELECT searches, COUNT(*) AS users
FROM (
SELECT user_id, COUNT(*) AS searches
FROM (
SELECT x.session_start,
       x.session,
       x.user_id, 
       COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
       COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
       COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
       session.session,
       session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM(

SELECT bounds.*,
       CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
       WHEN last_event IS NULL THEN id 
       ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id, event_type, event_name, occurred_at, 
       occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
       LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
       ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR
      next_event >= INTERVAL '10 MINUTE' OR
      last_event IS NULL OR 
      next_event IS NULL) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id AND
   e.occurred_at >= session.session_start AND
   e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement') x
GROUP BY x.session_start, x.session, x.user_id) z
WHERE z.runs > 0
GROUP BY user_id) z 
GROUP BY searches
ORDER BY searches
Enter fullscreen mode Exit fullscreen mode

Image description

Image description

  • Within a month after the first full search, 1,177 users used full search once more, and 428 users used it twice.
  • Within a month after the first autocomplete, 1,239 users used autocomplete once more, and 1,003 users used it twice.

Summary of analysis

  • Both autocomplete (222%) and full search(8%) are used in many sessions, so it's worthy to put more resources.
  • 88% of users search more than twice in full search.
  • More than half of users don't click any result.
  • After full search, people used to click several results not only one or two.
  • The order of contents which users click is distributed.
  • It implies that users can't get the results they want easily.

Review

  • If there are not enough data, it can distort the result (especially in ratio)

Top comments (0)