<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sheila Loekito</title>
    <description>The latest articles on DEV Community by Sheila Loekito (@sloekito).</description>
    <link>https://dev.to/sloekito</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1995384%2F729bd23b-ba7f-4ef7-9a5e-192e9c6f58ae.jpeg</url>
      <title>DEV Community: Sheila Loekito</title>
      <link>https://dev.to/sloekito</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sloekito"/>
    <language>en</language>
    <item>
      <title>How We Build User Search at Outside: A Case Study in Database Indexing</title>
      <dc:creator>Sheila Loekito</dc:creator>
      <pubDate>Tue, 03 Sep 2024 19:19:45 +0000</pubDate>
      <link>https://dev.to/sloekito/how-we-build-user-search-at-outside-a-case-study-in-database-indexing-35f7</link>
      <guid>https://dev.to/sloekito/how-we-build-user-search-at-outside-a-case-study-in-database-indexing-35f7</guid>
      <description>&lt;h3&gt;
  
  
  Intro
&lt;/h3&gt;

&lt;p&gt;In our effort to create user search functionality for &lt;a href="https://www.skimag.com/news/outside-activity-feed/" rel="noopener noreferrer"&gt;Outside Activity Feed&lt;/a&gt;, we needed to implement a robust and scalable search mechanism. Our user base, which totals over 11 million users, presented a few challenges for search performance and quality. &lt;/p&gt;

&lt;p&gt;My colleague &lt;a class="mentioned-user" href="https://dev.to/pjhoberman"&gt;@pjhoberman&lt;/a&gt; tipped me to experiment with PostgreSQL’s pg_trgm extension for trigram similarity scoring, which is instrumental in handling fuzzy text matching.&lt;/p&gt;

&lt;h3&gt;
  
  
  Exploring Trigram Similarity
&lt;/h3&gt;

&lt;p&gt;To understand how trigram similarity scoring works, particularly the &lt;code&gt;WORD_SIMILARITY&lt;/code&gt; function, I created a query to visualize the process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="k"&gt;values&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Blair Braverman'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;select&lt;/span&gt; 
    &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;show_trgm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'blai'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"query trigrams"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;show_trgm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"document trigrams"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="k"&gt;cardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;array_intersect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="n"&gt;show_trgm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'blai'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;show_trgm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"common trgms"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;cardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;show_trgm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'blai'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"query trigrams count"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;word_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'blai'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;   
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fodu6cyibtar2pb1d0ohr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fodu6cyibtar2pb1d0ohr.png" alt="Image description" width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This query generates and compares trigrams from both the query string and document to evaluate similarity. &lt;/p&gt;

&lt;p&gt;Your similarity threshold can be checked using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;pg_trgm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;word_similarity_threshold&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Records with a similarity score above this threshold will appear in search results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: Getting Trigram Similarity Search to Work
&lt;/h3&gt;

&lt;p&gt;In the first iteration, we implemented a trigram similarity search using the pg_trgm index on relevant fields. The &lt;code&gt;&amp;lt;%&lt;/code&gt; operator is used to compare the similarity between two text values, where a higher similarity score indicates that the values are more alike.&lt;/p&gt;

&lt;p&gt;The query looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
     &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="nv"&gt;"socialprofile"&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="s1"&gt;'Blai'&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="nv"&gt;"socialprofile"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"username"&lt;/span&gt;
    &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="s1"&gt;'Blai'&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="nv"&gt;"socialprofile"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"display_name"&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
    &lt;span class="n"&gt;WORD_SIMILARITY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Blai'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"socialprofile"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;WORD_SIMILARITY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Blai'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"socialprofile"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"display_name"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output of &lt;code&gt;EXPLAIN&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0geyywfgdd6uvjw09w0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0geyywfgdd6uvjw09w0.png" alt="Image description" width="800" height="272"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Indexes were hit, and performance was acceptable as a beta release.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 2: Addressing New Requirements
&lt;/h3&gt;

&lt;p&gt;As product requirements evolved, the search results needed to include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only users who have acknowledged the privacy policy.&lt;/li&gt;
&lt;li&gt;Displaying users' real first and last names to foster authentic engagement.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To meet these requirements, I created a partial index to optimize queries for users who acknowledged the privacy policy. The query also requires a JOIN from two other tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_profile_privacy_policy_acknowledged_at_not_null&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;privacy_policy_acknowledged_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;privacy_policy_acknowledged_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query leverages the new partial index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;    &lt;span class="k"&gt;SELECT&lt;/span&gt;
       &lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt;
        &lt;span class="n"&gt;socialprofile&lt;/span&gt;
    &lt;span class="k"&gt;JOIN&lt;/span&gt;
        &lt;span class="n"&gt;profile&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;socialprofile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;
        &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;privacy_policy_acknowledged_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;  
    &lt;span class="k"&gt;JOIN&lt;/span&gt;
        &lt;span class="n"&gt;auth_user&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;auth_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt;
        &lt;span class="s1"&gt;'Blai'&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="n"&gt;socialprofile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;
        &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="s1"&gt;'Blai'&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="n"&gt;auth_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;
        &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="s1"&gt;'Blai'&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="n"&gt;auth_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_name&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the partial index, the number of searchable users was reduced to around 2,000+, improving search latency to under 200ms. However, as we introduced Activity Feed to the world, our searchable user base quickly grew to 200,000+. It was apparent that the query does not scale.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17npkhqsnczj3y6so2lg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17npkhqsnczj3y6so2lg.png" alt="Image description" width="800" height="165"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Looking back at my query, I realized that while the partial index is being used, I had lost the trigram similarity index!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu78zro8pymb4tp7153oi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu78zro8pymb4tp7153oi.png" alt="Image description" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 3: Optimization using multi-column indexing and materialized view
&lt;/h3&gt;

&lt;p&gt;I stumbled across this &lt;a href="https://alexklibisz.com/2022/02/18/optimizing-postgres-trigram-search#a-blazing-fast-search-query" rel="noopener noreferrer"&gt;article&lt;/a&gt; where the author has run into a similar issue. Turns out doing multiple trigram similarity comparison on multiple columns is slow, and his solution was to build an index that is a concatenation of the columns that we want to search for. In this case we combine the author's username and first name/last name into the column to search against:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;blair-braverman Blair Braverman
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The author did not use materialized views in the end. However for us, since we query multiple tables, we cannot build the multi-column index without a &lt;a href="https://www.postgresql.org/docs/current/rules-materializedviews.html" rel="noopener noreferrer"&gt;materialized view&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So the materialized view and index creation looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Create the materialized view&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;MATERIALIZED&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;mv_search&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;ack_profiles&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;privacy_policy_acknowledged_at&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;user_profile&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;privacy_policy_acknowledged_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;sp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
    &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;au&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
    &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;au&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;search_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;ack_profiles&lt;/span&gt; &lt;span class="n"&gt;ap&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;socialprofile&lt;/span&gt; &lt;span class="n"&gt;sp&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;sp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;sp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;auth_user&lt;/span&gt; &lt;span class="n"&gt;au&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;ap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;au&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Add trigram similarity index on the search text&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_mv_search_text_trigram_idx&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;mv_search&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;gist&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_text&lt;/span&gt; &lt;span class="n"&gt;gist_trgm_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query itself is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
 &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;mv_search&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
    &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="n"&gt;search_text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One of the downsides of this solution is that search data is not live. This requires us to maintain an extra process to &lt;code&gt;REFRESH&lt;/code&gt; the materialized view. However this is a worthy trade off to make given the performance gains. Currently we are clocking less than 200ms latency. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1kto7o1btaos1umcnxb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1kto7o1btaos1umcnxb.png" alt="Image description" width="800" height="165"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What's next?
&lt;/h3&gt;

&lt;p&gt;To paraphrase a commenter on Stack Overflow, "if your index is slow, you're likely doing something wrong". So far I've tested the solution with about 11M records, and the &lt;code&gt;pg_trgm&lt;/code&gt; index does deliver on its promise. I have seen an &lt;a href="https://dba.stackexchange.com/questions/342011/why-does-order-by-significantly-slow-down-my-query-with-a-computed-score-of-trig" rel="noopener noreferrer"&gt;problem&lt;/a&gt; that we'll need to solve down the road that is caused by &lt;code&gt;ORDER BY&lt;/code&gt; and &lt;code&gt;LIMIT&lt;/code&gt; when the text being searched is too common, for example the name "John". The query seems to struggle with finding the top 100 "John" out of 100,000. I would also like to introduce caching, and play with the similarity search and ranking a little bit more. Thanks for reading and let me know your thoughts!&lt;/p&gt;

</description>
      <category>backenddevelopment</category>
      <category>postgres</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
