<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: matheus-b</title>
    <description>The latest articles on DEV Community by matheus-b (@matheus_b).</description>
    <link>https://dev.to/matheus_b</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F334533%2F5bb0d291-ece1-4971-b13c-df62fb27c395.jpeg</url>
      <title>DEV Community: matheus-b</title>
      <link>https://dev.to/matheus_b</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/matheus_b"/>
    <language>en</language>
    <item>
      <title>Full Rich Text Search with Slate and PostgreSQL 12</title>
      <dc:creator>matheus-b</dc:creator>
      <pubDate>Wed, 12 Feb 2020 10:14:47 +0000</pubDate>
      <link>https://dev.to/charperbonaroo/full-rich-text-search-with-slate-and-postgresql-12-1i69</link>
      <guid>https://dev.to/charperbonaroo/full-rich-text-search-with-slate-and-postgresql-12-1i69</guid>
      <description>&lt;p&gt;The following approach will work for all JSON content but was used here for full text search on content created with &lt;a href="https://slatejs.org"&gt;Slate&lt;/a&gt;. In the format Slate produces by default all plain text content is conveniently placed in fields with key &lt;code&gt;text&lt;/code&gt;. Slate can emit plain text but here all content will be stored in a &lt;a href="https://postgresql.org"&gt;PostgreSQL&lt;/a&gt; &lt;code&gt;jsonb&lt;/code&gt; column and we will be doing full texts search using a functional GIN index directly.&lt;/p&gt;

&lt;p&gt;Postgres can produce &lt;code&gt;tsvector&lt;/code&gt;s (the format postgres uses to store text search data) on actual json data with the &lt;code&gt;json(b)_to_tsvector&lt;/code&gt; function, but this will index &lt;em&gt;all&lt;/em&gt; values in the json document, so we need to do some filtering first. Postgres 12 introduced the SQL/JSON Path Language machinery which exactly allows for this.&lt;/p&gt;

&lt;p&gt;Let's consider the following &lt;code&gt;content&lt;/code&gt; stored in our &lt;code&gt;documents&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
    {
        "data": {
        },
        "type": "introduction",
        "nodes": [
            {
                "text": "This is a test document",
                "marks": [
                ],
                "object": "text"
            }
        ],
        "object": "block"
    },
    {
        "data": {
        },
        "type": "paragraph",
        "nodes": [
            {
                "text": "Just to show what content format we might have.",
                "marks": [
                ],
                "object": "text"
            }
        ],
        "object": "block"
    },
    {
        "data": {
        },
        "type": "conclusion",
        "nodes": [
            {
                "text": "I hope you get the general idea.",
                "marks": [
                ],
                "object": "text"
            }
        ],
        "object": "block"
    }
]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Jumping right to the good bit, &lt;code&gt;jsonb_path_query_array(content, 'strict $.**.text')&lt;/code&gt; will produce &lt;code&gt;jsonb&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;["This is a test document", "Just to show what content format we might have.", "I hope you get the general idea."].
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Note the &lt;code&gt;strict&lt;/code&gt; keyword here, leaving it out will produce:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;["This is a test document", "This is a test document", "Just to show what content format we might have.", "Just to show what content format we might have.", "I hope you get the general idea.", "I hope you get the general idea."].
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;According to the &lt;a href="https://www.postgresql.org/docs/12/functions-json.html%5D"&gt;documentation&lt;/a&gt; &lt;code&gt;strict&lt;/code&gt; mode should throw actual errors when a non-existing path is queried (which does not happen because of the explicit &lt;code&gt;**&lt;/code&gt; expansion (otherwise one might use &lt;code&gt;'strict $.** ? (exists(@.text)).text'&lt;/code&gt;)), but it seems to me &lt;code&gt;lax&lt;/code&gt; mode should not return double results. Anyway, whatever works.&lt;/p&gt;

&lt;p&gt;Note that&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select jsonb_path_query(content, 'strict $.** ? (exists(@.text))')-&amp;gt;&amp;gt;'text' from documents;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;produces nice rows of plain &lt;code&gt;text&lt;/code&gt; content (as opposed to &lt;code&gt;jsonb-text&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;We can now do a full text search of our &lt;code&gt;documents&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select * from documents where
    jsonb_to_tsvector('english',
                      jsonb_path_query_array(content, 'strict $.**.text'),
                      '["string"]')
    @@ to_tsquery('belgian &amp;amp; beer');
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Note that &lt;code&gt;json(b)_to_tsvector&lt;/code&gt; actually generates a &lt;code&gt;tsvector&lt;/code&gt; for every element in our json array and then concatenates these, this might not be what we want. Results do look a bit different, with our original content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;to_tsvector('english', jsonb_path_query_array(content, 'strict $.**.text')::text)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;gives&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'content':10 'document':5 'format':11 'general':20 'get':18 'hope':16 'idea':21 'might':13 'show':8 'test':4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;and&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;jsonb_to_tsvector('english', jsonb_path_query_array(content, 'strict $.**.text'), '["string"]')
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;gives&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'content':11 'document':5 'format':12 'general':22 'get':20 'hope':18 'idea':23 'might':14 'show':9 'test':4
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;A quick check does not show any difference in performance.&lt;/p&gt;

&lt;p&gt;To speed things up and allow for split second full rich text search on &lt;em&gt;millions&lt;/em&gt; of records create a functional index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE INDEX ON paper USING gin(jsonb_to_tsvector('english', jsonb_path_query_array(content, 'strict $.**.text'), '["string"]'));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Note that you need to exactly reproduce the expression used in your index in your query to make use of this.&lt;/p&gt;

&lt;p&gt;And that's that!&lt;/p&gt;

&lt;p&gt;If you want to know more or have any other questions, please get in touch with us via &lt;a href="mailto:info@bonaroo.nl"&gt;info@bonaroo.nl&lt;/a&gt;&lt;/p&gt;

</description>
      <category>psql</category>
      <category>postgres</category>
      <category>search</category>
    </item>
  </channel>
</rss>
