<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rooted</title>
    <description>The latest articles on DEV Community by Rooted (@rooted).</description>
    <link>https://dev.to/rooted</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F478122%2F3d651eab-82af-4794-bced-e118859e2367.png</url>
      <title>DEV Community: Rooted</title>
      <link>https://dev.to/rooted</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rooted"/>
    <language>en</language>
    <item>
      <title>Macros Explained for Java Developers</title>
      <dc:creator>Rooted</dc:creator>
      <pubDate>Sun, 29 Jun 2025 14:45:45 +0000</pubDate>
      <link>https://dev.to/rooted/macros-explained-for-java-developers-k70</link>
      <guid>https://dev.to/rooted/macros-explained-for-java-developers-k70</guid>
      <description>&lt;p&gt;If you’re a Java dev, you’ve probably used or heard of Project Lombok, Jakarta Bean Validation (JSR 380), AutoValue, MapStruct, or Immutables. They all help reduce boilerplate and add declarative magic to your code.&lt;br&gt;
And I’m sure you’ve come across the term “macro”, usually explained in some academic or cryptic way. But here’s the thing: these libraries are simulating macro-like behavior — just without true macro support.&lt;/p&gt;


&lt;h2&gt;
  
  
  What Are Macros Anyway?
&lt;/h2&gt;

&lt;p&gt;In languages like Lisp or Clojure, macros are compile-time programs that transform your code before it runs. They let you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rewrite or generate code&lt;/li&gt;
&lt;li&gt;Build new control structures&lt;/li&gt;
&lt;li&gt;Create entire domain-specific languages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They're basically code that writes code: a macro receives your source as data and returns the code that actually gets compiled.&lt;/p&gt;


&lt;h2&gt;
  
  
  Java’s “Macro” Workarounds
&lt;/h2&gt;

&lt;p&gt;Java doesn’t support macros. Instead, the ecosystem relies on annotation processors, code generation tools, and annotations interpreted at runtime:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lombok’s &lt;code&gt;@Data&lt;/code&gt; → generates constructors, getters, and equals()/hashCode()&lt;/li&gt;
&lt;li&gt;Jakarta Bean Validation (&lt;code&gt;@Min&lt;/code&gt;, &lt;code&gt;@NotBlank&lt;/code&gt;) → declarative validation&lt;/li&gt;
&lt;li&gt;AutoValue → immutable value types&lt;/li&gt;
&lt;li&gt;MapStruct → type-safe mappers (my personal favorite)&lt;/li&gt;
&lt;li&gt;Immutables → generates immutable types with builders&lt;/li&gt;
&lt;li&gt;Spring Validation → framework-driven validation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are powerful tools — but they can’t create new syntax or change how Java works at its core. They're still working within the language, not extending it.&lt;/p&gt;
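
&lt;p&gt;To make the boilerplate concrete, here is a hand-written sketch of roughly what Lombok's &lt;code&gt;@Data&lt;/code&gt; saves you from typing for a two-field class. The &lt;code&gt;User&lt;/code&gt; class and its fields are assumptions chosen for illustration, and the code Lombok really generates differs in detail (for example, it also emits setters for non-final fields).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.Objects;

// Hypothetical example class: with Lombok this could shrink to the two
// field declarations plus a single @Data annotation.
public final class User {
    private final String name;
    private final int age;

    public User(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }
    public int getAge() { return age; }

    // Two users are equal when both fields match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof User)) return false;
        User other = (User) o;
        if (age != other.age) return false;
        return Objects.equals(name, other.name);
    }

    // Must stay consistent with equals(): same fields, same hash.
    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }

    @Override
    public String toString() {
        return "User(name=" + name + ", age=" + age + ")";
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;None of this is hard to write, but every line is derived mechanically from the two field declarations, which is exactly the kind of work a code generator can take over.&lt;/p&gt;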


&lt;h2&gt;
  
  
  What Real Macros Look Like
&lt;/h2&gt;

&lt;p&gt;In Clojure, you can define a new data structure and its validator in a single macro:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lisp
(defmacro defvalidated
  [name fields validations]
  `(do
     (defrecord ~name ~fields)
     (defn ~(symbol (str "validate-" name)) [~'x]
       (let [errors# (atom [])]
         ~@(for [[field rule] validations]
             `(when-not (~rule (~field ~'x))
                (swap! errors# conj ~(str field " failed validation"))))
         @errors#))))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lisp
(defvalidated User
  [name age]
  {name not-empty
   age #(&amp;gt;= % 18)})

(validate-User (-&amp;gt;User "" 15))
;; =&amp;gt; ["name failed validation" "age failed validation"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No annotations. No libraries. No ceremony.&lt;br&gt;
Just your own language feature, built with a macro.&lt;/p&gt;
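
&lt;p&gt;For comparison, here is a plain-Java sketch of the same two rules written by hand. The class and method names are hypothetical and not part of any library. The point is that each new validated type needs another class like this one (or an annotation-based framework), whereas the macro produces the equivalent function from a declaration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written counterpart of the generated validate-User function.
public final class UserValidator {
    private UserValidator() {}

    // Returns one message per failed rule, in the order the rules are checked.
    public static List&amp;lt;String&amp;gt; validate(String name, int age) {
        List&amp;lt;String&amp;gt; errors = new ArrayList&amp;lt;&amp;gt;();
        // Mirrors the not-empty rule on name.
        if (name == null || name.isEmpty()) {
            errors.add("name failed validation");
        }
        // Mirrors the "at least 18" rule on age.
        if (age &amp;lt; 18) {
            errors.add("age failed validation");
        }
        return errors;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;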




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Java’s toolchain simulates macro-like behavior through annotations and codegen. But if you want to invent new language constructs, write less boilerplate, and build smarter abstractions, macros in languages like Clojure or Racket offer the real deal.&lt;/p&gt;

&lt;p&gt;Java gives you a powerful toolkit. Macros give you the power to build your own.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Inspired by Paul Graham's essay collection "Hackers &amp;amp; Painters"&lt;/em&gt;&lt;/p&gt;

</description>
      <category>java</category>
      <category>lisp</category>
      <category>macros</category>
      <category>lombok</category>
    </item>
    <item>
      <title>Simple Browser Tracking</title>
      <dc:creator>Rooted</dc:creator>
      <pubDate>Sat, 03 May 2025 13:00:33 +0000</pubDate>
      <link>https://dev.to/rooted/simple-browser-tracking-7kk</link>
      <guid>https://dev.to/rooted/simple-browser-tracking-7kk</guid>
      <description>&lt;p&gt;⚠️ Only the technical part is explained. If you care about legality, you're on your own. At the time of writing, this method works in Firefox and Chrome.&lt;/p&gt;

&lt;h2&gt;
  
  
  Browser Tracking with Fingerprints
&lt;/h2&gt;

&lt;p&gt;Tracking users is a touchy topic. Should you rely on screen size? Favicon loading hacks (&lt;a href="https://github.com/jonasstrehle/supercookie" rel="noopener noreferrer"&gt;like this one&lt;/a&gt;)? Or something more exotic?&lt;/p&gt;

&lt;p&gt;Honestly, it depends. What level of accuracy do you need? How much time are you willing to sink into it? There's a sea of FOSS libraries and SaaS platforms out there, but sometimes you don’t want the whole enterprise-grade circus—just a quick way to know if "User A" today is the same "User A" from last week. Ideally, it should also be low-maintenance and not break every time a browser sneezes.&lt;/p&gt;

&lt;p&gt;So here's a dead-simple way to track users using browser fingerprinting. It’s not perfect, but it’s light, easy to implement, and does the job for a lot of use cases.&lt;/p&gt;

&lt;p&gt;We're using &lt;a href="https://github.com/Rajesh-Royal/Broprint.js" rel="noopener noreferrer"&gt;Broprint.js&lt;/a&gt; — a tiny browser fingerprinting library that gives you a unique(ish) identifier based on a bunch of properties like canvas fingerprinting, user agent, timezone, etc.&lt;/p&gt;

&lt;h2&gt;
  
  
  🖥️ Client Side
&lt;/h2&gt;

&lt;p&gt;Add this snippet to your frontend to grab a fingerprint and send it off somewhere. We’re using a &lt;a href="//corsproxy.io"&gt;CORS proxy&lt;/a&gt; and a plain GET request for simplicity, mostly to sidestep the browser's cross-origin (CORS) restrictions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;getCurrentBrowserFingerPrint&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@rajesh896/broprint.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;


&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;sendFingerprintToServer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fingerprint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getCurrentBrowserFingerPrint&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;userAgent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userAgent&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;proxyUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://corsproxy.io/?&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;apiUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://script.google.com/macros/s/CHANGE_ME/exec&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;encodedUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;encodeURIComponent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;apiUrl&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;?id=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;fingerprint&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;amp;userAgent=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userAgent&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fullUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;proxyUrl&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;encodedUrl&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;xhr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;XMLHttpRequest&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="nx"&gt;xhr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fullUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;xhr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Error sending fingerprint to server:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You could expand this to send fingerprints on every click, scroll, or form submit if you're feeling fancy. Right now, we’re just capturing the browser fingerprint and user agent. If you start adding resolution, timezone, device memory, etc., you'll get better accuracy—especially if you later combine this with some ML to group behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  🗃️ Server Side (Google Sheets)
&lt;/h2&gt;

&lt;p&gt;Google Sheets — the poor man’s database that actually works. Here’s how to catch that fingerprint data and dump it into a spreadsheet using Google Apps Script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;ssID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;CHANGE_ME&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;sheetLog&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Log&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;doGet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Append the data to the spreadsheet&lt;/span&gt;
    &lt;span class="nx"&gt;SpreadsheetApp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;openById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ssID&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;getSheetByName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sheetLog&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;appendRow&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;parameter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;parameter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userAgent&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="c1"&gt;// Return a simple response&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;ContentService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createTextOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Success&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Log error details to the spreadsheet&lt;/span&gt;
    &lt;span class="nx"&gt;SpreadsheetApp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;openById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ssID&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;getSheetByName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sheetLog&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;appendRow&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stack&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;No stack&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;No message&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="c1"&gt;// Return an error response&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;ContentService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createTextOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a browser visits your site, you’ll log the timestamp, fingerprint ID, and user agent. If something goes wrong, the error gets logged too. Good enough for debugging.&lt;/p&gt;

&lt;h2&gt;
  
  
  📈 Basic Analysis
&lt;/h2&gt;

&lt;p&gt;The logs can later be analyzed in many ways. One simple analysis is the number of visits per month.&lt;br&gt;
For that, create another sheet and use the following formula to map each timestamp from the Log sheet to just its month and year. The semicolon argument separator depends on your spreadsheet locale; some locales use commas instead.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;=IF(Log!A1&amp;lt;&amp;gt;"";TEXT(Log!A1; "MM/YYYY"))&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg3mtik48p2qpn1eglk3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg3mtik48p2qpn1eglk3.png" alt="Visits per month" width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🤔 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This isn’t foolproof tracking. If someone switches browsers, disables JavaScript, or uses a hardened privacy setup (such as Brave), they’ll slip through. But for casual, low-friction tracking, it’s surprisingly effective, and you can set it up in under an hour without deploying a single server.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>privacy</category>
      <category>frontend</category>
    </item>
    <item>
      <title>Full-Text Search with Hibernate Search</title>
      <dc:creator>Rooted</dc:creator>
      <pubDate>Sat, 12 Apr 2025 17:58:14 +0000</pubDate>
      <link>https://dev.to/rooted/integrating-full-text-search-with-hibernate-search-in-a-java-application-36jh</link>
      <guid>https://dev.to/rooted/integrating-full-text-search-with-hibernate-search-in-a-java-application-36jh</guid>
      <description>&lt;p&gt;This article demonstrates full-text search integration using Hibernate Search in a &lt;strong&gt;Java 8+&lt;/strong&gt; application with &lt;strong&gt;Hibernate ORM&lt;/strong&gt; for relational database storage.&lt;/p&gt;

&lt;p&gt;The first section (&lt;strong&gt;Basics&lt;/strong&gt;) gives a high-level overview, and the second section (&lt;strong&gt;DEMO&lt;/strong&gt;) provides an example project and explains its crucial parts. The project covers a more complex use case, using custom &lt;strong&gt;analyzers&lt;/strong&gt;, &lt;strong&gt;edgeNgram&lt;/strong&gt;, and larger &lt;strong&gt;projections&lt;/strong&gt;. Simple examples are omitted, as they can be found in the official Hibernate documentation.&lt;/p&gt;

&lt;p&gt;⚠️ This is not a one-size-fits-all solution for every full-text search requirement. Hibernate Search is optimized for handling large datasets and high-throughput applications, and the setup shown here requires running an additional, resource-hungry search engine. For other use cases, alternatives like client-side search or PostgreSQL's full-text search capabilities might be more suitable. However, these approaches are beyond the scope of this article.&lt;/p&gt;

&lt;h1&gt;
  
  
  Basics
&lt;/h1&gt;

&lt;p&gt;This guide is based on Hibernate Search 6.1 documentation. For additional details, refer to the &lt;a href="https://docs.jboss.org/hibernate/search/6.1/reference/en-US/html_single/?v=6.1" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Hibernate Search?
&lt;/h2&gt;

&lt;p&gt;Implementing full-text search in an application can be challenging, but Hibernate Search simplifies the process by offering a built-in solution that requires minimal configuration. It seamlessly integrates with powerful search engines like &lt;a href="https://www.elastic.co/" rel="noopener noreferrer"&gt;Elasticsearch&lt;/a&gt; and &lt;a href="https://lucene.apache.org/" rel="noopener noreferrer"&gt;Lucene&lt;/a&gt;, enabling efficient and scalable search capabilities.&lt;/p&gt;

&lt;p&gt;In the diagram below, the blue section represents a typical application that uses Hibernate ORM to interact with a relational database. The red section highlights the additional infrastructure required to enable full-text search with Hibernate Search.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxldf9cm3rbn05aht6wr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxldf9cm3rbn05aht6wr.png" alt="Image of ORM integration with DB and full-text search infrastructure"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, introducing a search engine also means dealing with data synchronization between the database and the search index.&lt;/p&gt;

&lt;h2&gt;
  
  
  Synchronization Challenges &amp;amp; Solutions
&lt;/h2&gt;

&lt;p&gt;Since Hibernate Search maintains a separate index in Elasticsearch, the index must be kept in sync with the database. The default solution is automatic synchronization, which propagates entity changes made through Hibernate ORM to the search index as each transaction commits. However, for some use cases, automatic synchronization may not be optimal. Instead, batch synchronization (e.g., updating the index once a day) can be more efficient.&lt;/p&gt;

&lt;h2&gt;
  
  
  Querying the Data
&lt;/h2&gt;

&lt;p&gt;Once the data is indexed and synchronized, Hibernate Search provides two primary ways to execute search queries:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Query Elasticsearch to retrieve only the identifiers of the matching documents, then fetch the corresponding entities from the database.&lt;/li&gt;
&lt;li&gt;Query Elasticsearch to retrieve data directly, without additional database queries (using projections).&lt;/li&gt;
&lt;/ol&gt;
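
&lt;p&gt;The difference between the two approaches can be sketched with a toy in-memory model. This is not the Hibernate Search API: the class, its maps, and the method names are all assumptions made up for illustration, with one map standing in for the Elasticsearch index and another for the relational database.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: "index" plays the search engine, "database" the relational store.
public final class QueryStrategies {
    // Search index: search term mapped to the ids of matching documents.
    private final Map&amp;lt;String, List&amp;lt;Long&amp;gt;&amp;gt; index = new HashMap&amp;lt;&amp;gt;();
    // Field values copied into the index so they can be returned as projections.
    private final Map&amp;lt;Long, String&amp;gt; storedNames = new HashMap&amp;lt;&amp;gt;();
    // Relational database: id mapped to the full row.
    private final Map&amp;lt;Long, String&amp;gt; database = new HashMap&amp;lt;&amp;gt;();
    private int databaseReads = 0;

    // Stores the row in the database and registers it in the index under one term.
    public void save(long id, String name, String fullRow, String term) {
        database.put(id, fullRow);
        storedNames.put(id, name);
        index.computeIfAbsent(term, t -&amp;gt; new ArrayList&amp;lt;&amp;gt;()).add(id);
    }

    // Approach 1: the index returns ids, then every hit is loaded from the database.
    public List&amp;lt;String&amp;gt; searchEntities(String term) {
        List&amp;lt;String&amp;gt; rows = new ArrayList&amp;lt;&amp;gt;();
        for (Long id : index.getOrDefault(term, Collections.emptyList())) {
            databaseReads++;
            rows.add(database.get(id));
        }
        return rows;
    }

    // Approach 2: the index returns the projected field itself; the database is not touched.
    public List&amp;lt;String&amp;gt; searchProjections(String term) {
        List&amp;lt;String&amp;gt; names = new ArrayList&amp;lt;&amp;gt;();
        for (Long id : index.getOrDefault(term, Collections.emptyList())) {
            names.add(storedNames.get(id));
        }
        return names;
    }

    public int getDatabaseReads() { return databaseReads; }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The trade-off the model makes visible: projections avoid the database round trip, but only fields that were stored in the index are available, and they reflect the index's state rather than the database's.&lt;/p&gt;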

&lt;p&gt;For a more detailed explanation, check out the following presentation:&lt;br&gt;
  &lt;iframe src="https://www.youtube.com/embed/Q4PMC3QgYBw"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;




&lt;h1&gt;
  
  
  DEMO - Spring Boot Application with Full-Text Search
&lt;/h1&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/Netz00" rel="noopener noreferrer"&gt;
        Netz00
      &lt;/a&gt; / &lt;a href="https://github.com/Netz00/hibernate-search-6-example" rel="noopener noreferrer"&gt;
        hibernate-search-6-example
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Simple Spring Boot application demonstrating Hibernate Search 6 advanced usage
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/8f4297a8faca818e889aabe61cef80067d2a24da6e6bf0dcc511d6fb4d4fb627/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f537072696e67253230426f6f742d3644423333463f7374796c653d666f722d7468652d6261646765266c6f676f3d737072696e67626f6f74266c6f676f436f6c6f723d7768697465"&gt;&lt;img src="https://camo.githubusercontent.com/8f4297a8faca818e889aabe61cef80067d2a24da6e6bf0dcc511d6fb4d4fb627/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f537072696e67253230426f6f742d3644423333463f7374796c653d666f722d7468652d6261646765266c6f676f3d737072696e67626f6f74266c6f676f436f6c6f723d7768697465" alt="Spring Boot"&gt;&lt;/a&gt;
&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/e68ab02ee45c668c7ce5c4f6614045884761b475936c16525be396b3d24d757e/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f456c61737469637365617263682d3030353537313f7374796c653d666f722d7468652d6261646765266c6f676f3d656c6173746963736561726368266c6f676f436f6c6f723d7768697465"&gt;&lt;img src="https://camo.githubusercontent.com/e68ab02ee45c668c7ce5c4f6614045884761b475936c16525be396b3d24d757e/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f456c61737469637365617263682d3030353537313f7374796c653d666f722d7468652d6261646765266c6f676f3d656c6173746963736561726368266c6f676f436f6c6f723d7768697465" alt="Elasticsearch"&gt;&lt;/a&gt;
&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/85e3ff712bb08b8e5595b34ecddfd189a51b20f61988aa467a56c5da9a107dda/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f446f636b65722d3234393645443f7374796c653d666f722d7468652d6261646765266c6f676f3d646f636b6572266c6f676f436f6c6f723d7768697465"&gt;&lt;img src="https://camo.githubusercontent.com/85e3ff712bb08b8e5595b34ecddfd189a51b20f61988aa467a56c5da9a107dda/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f446f636b65722d3234393645443f7374796c653d666f722d7468652d6261646765266c6f676f3d646f636b6572266c6f676f436f6c6f723d7768697465" alt="Docker"&gt;&lt;/a&gt;
&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/9ea1f6a2f978fc1168b7b44509fd6cbd1812defbbd88a8bf6a036a3ae2acec6c/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f446f636b6572253230436f6d706f73652d3234393645443f7374796c653d666f722d7468652d6261646765266c6f676f3d646f636b6572266c6f676f436f6c6f723d7768697465"&gt;&lt;img src="https://camo.githubusercontent.com/9ea1f6a2f978fc1168b7b44509fd6cbd1812defbbd88a8bf6a036a3ae2acec6c/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f446f636b6572253230436f6d706f73652d3234393645443f7374796c653d666f722d7468652d6261646765266c6f676f3d646f636b6572266c6f676f436f6c6f723d7768697465" alt="Docker Compose"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Hibernate Search 6 Example&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;Simple Spring Boot application demonstrating Hibernate Search 6 usage with Elasticsearch.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/" rel="nofollow noopener noreferrer"&gt;Hibernate Search 6.1.7.Final: Reference Documentation&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Example app&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;ER diagram:&lt;/p&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/Netz00/hibernate-search-6-example/./documentation/ERD.influncers.drawio.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FNetz00%2Fhibernate-search-6-example%2F.%2Fdocumentation%2FERD.influncers.drawio.png" alt="ER diagram"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Full text search is available for Freelancer and Project entities.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Indexing Entities&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Netz00/hibernate-search-6-example/./backend/src/main/java/com/netz00/hibernatesearch6example/model/Project.java" rel="noopener noreferrer"&gt;Project&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Netz00/hibernate-search-6-example/./backend/src/main/java/com/netz00/hibernatesearch6example/model/Freelancer.java" rel="noopener noreferrer"&gt;Freelancer&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Netz00/hibernate-search-6-example/./backend/src/main/java/com/netz00/hibernatesearch6example/model/Category.java" rel="noopener noreferrer"&gt;Freelancer categories&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;3 search examples:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://github.com/Netz00/hibernate-search-6-example/./backend/src/main/java/com/netz00/hibernatesearch6example/services/ProjectServiceImpl.java" rel="noopener noreferrer"&gt;searchProjectsEntities&lt;/a&gt;
demonstrates basic full text search of projects by&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;project name&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://github.com/Netz00/hibernate-search-6-example/./backend/src/main/java/com/netz00/hibernatesearch6example/services/ProjectServiceImpl.java" rel="noopener noreferrer"&gt;searchProjects&lt;/a&gt;
demonstrates previous example with projections usage&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://github.com/Netz00/hibernate-search-6-example/./backend/src/main/java/com/netz00/hibernatesearch6example/services/FreelancerServiceImpl.java" rel="noopener noreferrer"&gt;searchFreelancers&lt;/a&gt;
demonstrates full text search of freelancers (with projections) by&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;username&lt;/li&gt;
&lt;li&gt;first name&lt;/li&gt;
&lt;li&gt;last name&lt;/li&gt;
&lt;li&gt;categories (M:N relationship)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Creating custom
&lt;a href="https://github.com/Netz00/hibernate-search-6-example/./backend/src/main/java/com/netz00/hibernatesearch6example/config/MyElasticsearchAnalysisConfigurer.java" rel="noopener noreferrer"&gt;edgeNgram analyser&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Running the Application&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;Before starting the Spring Boot application, ensure that the necessary Docker containers are running.&lt;/p&gt;
&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;docker compose -f deployment/docker-compose-dev.yaml up -d&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Running Tests:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;newman run ./backend/src/test/postman/Hibernate-search-6-example.postman_collection.json -e ./backend/src/test/postman/Test&lt;span class="pl-cce"&gt;\ &lt;/span&gt;Environment.postman_environment.json --reporters cli,json --reporter-json-export ./backend/src/test/postman/output/outputfile.json&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Import postman collection from &lt;a href="https://github.com/Netz00/hibernate-search-6-example/./backend/src/test/postman/Hibernate-search-6-example.postman_collection.json" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Elasticsearch browser extension: &lt;a href="https://elasticvue.com/" rel="nofollow noopener noreferrer"&gt;https://elasticvue.com/&lt;/a&gt;&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Extras&lt;/h2&gt;

&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/Netz00/hibernate-search-6-example/./backend/src/main/java/com/netz00/hibernatesearch6example/repository/FreelancerRepository.java" rel="noopener noreferrer"&gt;Paginated fetching of child entities over parent entity at unidirectional OneToMany relationship&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.com/a/46055857" rel="nofollow noopener noreferrer"&gt;CREDITS&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Netz00/hibernate-search-6-example/./backend/src/main/java/com/netz00/hibernatesearch6example/model/mapper/FreelancerMapper.java" rel="noopener noreferrer"&gt;Mapping only required fields with MapStruct by defining&lt;/a&gt;…&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/Netz00/hibernate-search-6-example" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;This example shows how Hibernate Search fits into a Spring Boot architecture, covering the full path from the controllers to the search engine and back, including real-life scenarios. The following sections explain the crucial parts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding Full-Text Search
&lt;/h2&gt;

&lt;p&gt;For development, we run Elasticsearch as a single-node cluster on the same server as the application, using a single backend configuration. Its RAM usage is capped to prevent excessive memory consumption. You can use &lt;a href="https://elasticvue.com/" rel="noopener noreferrer"&gt;ElasticVue&lt;/a&gt; to explore your data. It is also good practice to secure Elasticsearch by enabling security and setting a password for the default user “elastic”. &lt;strong&gt;Advanced security options are not included in the free Security functionality.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;ℹ️ The container configuration, Maven dependencies, and Hibernate configuration can be found in the repository.&lt;/p&gt;

&lt;h3&gt;
  
  
  Indexing Entities
&lt;/h3&gt;

&lt;p&gt;Which data should be indexed? Hibernate Search offers annotations that allow developers to control this behavior.&lt;/p&gt;

&lt;p&gt;To index an entity, annotate the class with &lt;code&gt;@Indexed(index = "index_name")&lt;/code&gt;. The annotation in the following example creates an (initially empty) index named &lt;code&gt;idx_comment&lt;/code&gt; in Elasticsearch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Entity&lt;/span&gt;
&lt;span class="nd"&gt;@NoArgsConstructor&lt;/span&gt;
&lt;span class="nd"&gt;@AllArgsConstructor&lt;/span&gt;
&lt;span class="nd"&gt;@ToString&lt;/span&gt;
&lt;span class="nd"&gt;@Getter&lt;/span&gt;
&lt;span class="nd"&gt;@Setter&lt;/span&gt;
&lt;span class="nd"&gt;@Table&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"comment"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@Indexed&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"idx_comment"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Comment&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@Id&lt;/span&gt;
    &lt;span class="nd"&gt;@GeneratedValue&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strategy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GenerationType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;SEQUENCE&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;generator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"sequenceGenerator"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="nd"&gt;@SequenceGenerator&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"sequenceGenerator"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="nd"&gt;@Column&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To map entity properties to index fields, the properties must be annotated as well. Multiple annotations on the same property are allowed. The following property annotations are explained below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;@FullTextField&lt;/code&gt; – For analyzed text fields (supports tokenization and filtering).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@KeywordField&lt;/code&gt; – For exact match searches and sorting.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@GenericField&lt;/code&gt; – For other data types like Long or Date.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@IndexedEmbedded&lt;/code&gt; – For nested objects (e.g., searching Students by Course name).&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  @FullTextField
&lt;/h4&gt;

&lt;p&gt;Works only with String properties and maps the field as text. Text is analyzed both at indexing time and at search time. An analyzer consists of a tokenizer and filters: the tokenizer splits the string into tokens, which the filters then process. For example, before indexing, the string "Thinking in Java" is tokenized to ["Thinking", "in", "Java"], and filters such as lowercasing or stop-word removal are then applied. At search time, the same steps are applied to the query, although different analyzers for indexing and searching can be set up through configuration. So if a user searches for "Learning Java", the query is tokenized to ["Learning", "Java"], the token "Java" matches the "Java" stored for "Thinking in Java", and “Thinking in Java” is returned as a result. Text fields can’t be sorted on; &lt;code&gt;@KeywordField&lt;/code&gt;, described next, solves that problem.&lt;/p&gt;
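&lt;p&gt;The analysis steps described above can be illustrated with a small, language-neutral sketch in plain Python. This is not Hibernate Search or Elasticsearch code: the &lt;code&gt;analyze&lt;/code&gt; and &lt;code&gt;matches&lt;/code&gt; functions and the stop-word list are assumptions made up for illustration, and a real engine scores and ranks matches rather than returning a boolean.&lt;/p&gt;

```python
# Hypothetical stop-word list; real analyzers ship their own.
STOP_WORDS = {"in", "the", "a"}


def analyze(text: str) -> list[str]:
    """Whitespace tokenizer followed by lowercase and stop-word filters."""
    tokens = text.split()                               # tokenizer
    tokens = [token.lower() for token in tokens]        # lowercase filter
    return [t for t in tokens if t not in STOP_WORDS]   # stop-word filter


def matches(indexed_text: str, query: str) -> bool:
    """A document matches when it shares at least one analyzed token with the query."""
    return bool(set(analyze(indexed_text)).intersection(analyze(query)))


print(analyze("Thinking in Java"))
print(matches("Thinking in Java", "Learning Java"))
```

&lt;p&gt;Because the same analysis runs on both the stored text and the query, "Learning Java" and "Thinking in Java" end up sharing the token "java".&lt;/p&gt;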

&lt;p&gt;Custom analyzers can be built by combining a specific tokenizer with filters. Besides the whitespace tokenizer and the lowercase filter, many others are available &lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.16/analysis-analyzers.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu95nh2wo92bdfwvxmb24.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu95nh2wo92bdfwvxmb24.png" alt="Analyzer flow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  @KeywordField
&lt;/h4&gt;

&lt;p&gt;Works only with String properties and maps the field as a keyword. Only normalizers, not analyzers, can be applied to keyword fields. A normalizer is similar to an analyzer but does not tokenize.&lt;br&gt;
That means the string “Thinking in Java” is only normalized before indexing and is stored as a single keyword. The search term is normalized the same way, so the query from the previous example ("Learning Java") would not match. Keyword fields are useful for sorting. A keyword field and a full-text field can also be declared on the same property.&lt;/p&gt;
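&lt;p&gt;The difference from an analyzed text field can be shown with another plain-Python sketch. It only mimics the idea of a lowercase normalizer and is not how Hibernate Search or Elasticsearch implement it; the function names are hypothetical.&lt;/p&gt;

```python
def normalize(text: str) -> str:
    """Lowercase normalizer: the whole value stays a single keyword, no tokenization."""
    return text.lower()


def keyword_matches(indexed_value: str, query: str) -> bool:
    """A keyword field matches only when the whole normalized values are equal."""
    return normalize(indexed_value) == normalize(query)


def sort_by_keyword(values: list[str]) -> list[str]:
    """Sorting works on keywords because each value is one comparable term."""
    return sorted(values, key=normalize)


print(keyword_matches("Thinking in Java", "thinking in java"))
print(keyword_matches("Thinking in Java", "Learning Java"))
print(sort_by_keyword(["Zebra", "apple", "Mango"]))
```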
&lt;h4&gt;
  
  
  @GenericField
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;A good default choice that will work for every property type with built-in support.&lt;br&gt;
In the example it is used for Date and Long (primary key).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4&gt;
  
  
  @IndexedEmbedded
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;@IndexedEmbedded&lt;/code&gt; annotation is used to include fields from associated entities in the search index of the owning entity. This enables searching across nested object fields.&lt;/p&gt;

&lt;p&gt;For example, consider an entity &lt;code&gt;Student&lt;/code&gt; with a &lt;code&gt;@ManyToMany&lt;/code&gt; association to a &lt;code&gt;Course&lt;/code&gt; entity. By using &lt;code&gt;@IndexedEmbedded&lt;/code&gt; on the &lt;code&gt;courses&lt;/code&gt; field, you can perform a search for &lt;code&gt;Student&lt;/code&gt; entities based on the name of the associated &lt;code&gt;Course&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This annotation works with various types of associations, including &lt;code&gt;@OneToOne&lt;/code&gt;, &lt;code&gt;@OneToMany&lt;/code&gt;, and &lt;code&gt;@ManyToMany&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It is not necessary to annotate the associated (nested) entity with &lt;code&gt;@Indexed&lt;/code&gt;, unless you also want to index it independently. The following example demonstrates this usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Indexed&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"idx_student"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Student&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
    &lt;span class="nd"&gt;@ManyToMany&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cascade&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CascadeType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ALL&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fetch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FetchType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;LAZY&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="nd"&gt;@JoinTable&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"student_courses"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;joinColumns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;@JoinColumn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"student_id"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; 
    &lt;span class="n"&gt;inverseJoinColumns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;@JoinColumn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"course_id"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="nd"&gt;@IndexedEmbedded&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"courses"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;includePaths&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="o"&gt;})&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Course&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;courses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HashSet&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;();&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Course&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
    &lt;span class="nd"&gt;@Column&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="nd"&gt;@KeywordField&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"lowercase"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;projectable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Projectable&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;YES&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hibernate Search tracks changes at the property level to decide whether an entity needs to be reindexed. For example, updating a property that is not indexed does not trigger reindexing, which avoids unnecessary indexing work.&lt;/p&gt;

&lt;p&gt;More annotations and explanations can be found &lt;a href="https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#mapper-orm-directfieldmapping" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Handling Sync Issues with MassIndexer
&lt;/h3&gt;

&lt;p&gt;In edge cases, such as an I/O failure after the data has been stored in the database, the database and the search index may go out of sync. One solution is to use &lt;a href="https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#search-batchindex-massindexer" rel="noopener noreferrer"&gt;MassIndexer&lt;/a&gt;, which reindexes all data.&lt;/p&gt;

&lt;p&gt;In the example project, this process is automated via a scheduled job, ensuring that data remains in sync.&lt;/p&gt;

&lt;h3&gt;
  
  
  Searching with Hibernate Search
&lt;/h3&gt;

&lt;p&gt;There are two main ways to fetch search results:&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Fetching Data Directly from Elasticsearch (Used in the Demo)
&lt;/h4&gt;

&lt;p&gt;This approach uses projections and skips the database, retrieving only indexed data. It requires setting the &lt;code&gt;projectable = Projectable.YES&lt;/code&gt; attribute on the field annotations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv7h6rbeoul4cq26g9fzj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv7h6rbeoul4cq26g9fzj.png" alt="Projections"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster search results.&lt;/li&gt;
&lt;li&gt;Reduces database load.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data stored in Elasticsearch must be structured properly.&lt;/li&gt;
&lt;li&gt;More complex implementation (requires extra mappings and domain objects).&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. Fetching IDs from the Index First, Then Retrieving Data from the DB
&lt;/h4&gt;

&lt;p&gt;In this method, only the entity IDs are retrieved from Elasticsearch, and the actual data is fetched from the database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnig0ra96pewb1g3qvkgy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnig0ra96pewb1g3qvkgy.png" alt="Classic search"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Results are loaded from the database, which remains the single source of truth.&lt;/li&gt;
&lt;li&gt;Simpler to implement.&lt;/li&gt;
&lt;li&gt;Indexing only the required fields and letting the database handle the rest can improve performance (search engines are optimized for searching, not for updating).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requires an additional database round-trip.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thank you for reading! 🚀&lt;/p&gt;

</description>
      <category>java</category>
      <category>hibernate</category>
      <category>elasticsearch</category>
      <category>springboot</category>
    </item>
    <item>
      <title>Web scraping: Silent and Maintainable</title>
      <dc:creator>Rooted</dc:creator>
      <pubDate>Fri, 28 Mar 2025 19:39:23 +0000</pubDate>
      <link>https://dev.to/rooted/web-scraper-tests-44na</link>
      <guid>https://dev.to/rooted/web-scraper-tests-44na</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;With size, complexity emerges.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Silent Scraping
&lt;/h2&gt;

&lt;p&gt;While writing the scraper, we will first hide behind a VPN or proxy. Then we are going to scrape the target a significant number of times until we are satisfied with the results. But in the meantime, we’ll get blocked, then try another IP, which doesn't work either... Then some sunbeam will hit the Lava Lamp in Cloudflare, and we’ll start receiving captchas... We end up solving a problem that may not even need solving. Why? Because during development, we’ll mostly abuse the targeted site, while in production, scraping might only run peacefully once a day. Also, our free proxy or VPN will throttle us, causing delays on every run.&lt;/p&gt;

&lt;p&gt;This issue can be easily solved by caching the site during development. A scraping framework such as Scrapy already includes HTTP caching out of the box. Otherwise, we could use Nginx to cache our requests.&lt;/p&gt;
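&lt;p&gt;As a minimal sketch, assuming a standard Scrapy project, the built-in HTTP cache can be switched on with a few settings. The values below are illustrative choices for development, not a recommendation for production.&lt;/p&gt;

```python
# settings.py (fragment): cache every response on disk during development
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 0   # 0 means cached responses never expire
HTTPCACHE_DIR = "httpcache"     # relative paths live under the project's .scrapy directory
```

&lt;p&gt;With this in place, repeated runs of the same spider are served from the local cache instead of hitting the target site again.&lt;/p&gt;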

&lt;p&gt;This is a straightforward way to develop your scraper without raising red flags with suspicious requests while adjusting headers to circumvent anti-scraping measures. Also, the site data is cached locally—no more network issues or delays.&lt;/p&gt;

&lt;h2&gt;
  
  
  Maintainable scraper
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Regression Tests
&lt;/h3&gt;

&lt;p&gt;Using a cached version brings another benefit: the site data becomes immutable, and it’s a lot easier to hit a static target. If it moves, we can patch the code in the next version, but during development it won’t change and won’t cause more bugs than we already have.&lt;/p&gt;

&lt;p&gt;This approach can be extended to the testing level. Let's store the current version of the scraper along with the site data it works with in VCS.&lt;/p&gt;

&lt;p&gt;That way, maintaining the scraper when the target site changes becomes easier. We simply diff (automated or manual) the stored version of the site against the live one. From the diff, we know what changed—and where to fix the scraper. Finally, we store the expected results, and we’ve got ourselves a regression test suite.&lt;/p&gt;
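&lt;p&gt;The diff step could look roughly like the sketch below, which uses only the Python standard library. The function name and the sample strings are hypothetical; in practice the two inputs would be the snapshot stored in VCS and a freshly downloaded copy of the page.&lt;/p&gt;

```python
import difflib


def snapshot_diff(stored: str, live: str) -> list[str]:
    """Return unified-diff lines between the stored snapshot and the live page."""
    return list(
        difflib.unified_diff(
            stored.splitlines(),
            live.splitlines(),
            fromfile="stored",
            tofile="live",
            lineterm="",
        )
    )


# Hypothetical page fragments, kept as plain text for brevity
stored_page = "title: Laptop\nprice: 999"
live_page = "title: Laptop\ncost: 999"

for line in snapshot_diff(stored_page, live_page):
    print(line)
```

&lt;p&gt;An empty result means the target has not changed; otherwise the changed lines point at the part of the scraper that needs attention.&lt;/p&gt;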

&lt;p&gt;Of course, this increases the complexity of the scraper and requires extra effort upfront. In the case of Scrapy, this can be done in such an elegant way that the added complexity is manageable. But ultimately, it depends on the context and the answers to key questions—such as the estimated lifetime of the scraper, number of targets, scraping frequency, and how often the targeted site changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mitigating pre-rendering JS
&lt;/h3&gt;

&lt;p&gt;However, in real life, we’re mostly dealing with fat clients using client-side rendering. Pre-rendered sites are either ancient relics or cutting-edge setups optimized for SEO (and scraping).&lt;/p&gt;

&lt;p&gt;The fail-safe—but also most expensive—approach is using headless browsers. But rendering that mess is slow, resource-hungry, and most importantly, often avoidable.&lt;/p&gt;

&lt;p&gt;We can often skip full JS rendering by simply fetching only the data we need. A basic analysis of the requests the site makes will quickly reveal the ones we're interested in scraping.&lt;/p&gt;

&lt;p&gt;It might take a few steps to get there—for example, we first extract IDs from &lt;code&gt;URI_1&lt;/code&gt;, then generate a list of endpoints like [&lt;code&gt;URI_2_ID_1&lt;/code&gt;, &lt;code&gt;URI_2_ID_2&lt;/code&gt;, ...] to fetch the actual data.&lt;/p&gt;
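&lt;p&gt;A minimal sketch of that two-step flow is shown below. The JSON payload, the &lt;code&gt;items&lt;/code&gt; and &lt;code&gt;id&lt;/code&gt; keys, and the &lt;code&gt;example.com&lt;/code&gt; URL template are all assumptions for illustration; a real site will have its own response shape and endpoints.&lt;/p&gt;

```python
import json

# Hypothetical response body of URI_1 (a listing endpoint)
LISTING_JSON = '{"items": [{"id": 101}, {"id": 102}]}'

# Hypothetical template for URI_2 (a detail endpoint)
DETAIL_URL_TEMPLATE = "https://example.com/api/items/{item_id}"


def detail_urls(listing_json: str) -> list[str]:
    """Extract the IDs from the listing and build one detail URL per ID."""
    ids = [item["id"] for item in json.loads(listing_json)["items"]]
    return [DETAIL_URL_TEMPLATE.format(item_id=item_id) for item_id in ids]


print(detail_urls(LISTING_JSON))
```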

&lt;p&gt;Some may argue this is more fragile than rendering the site and scraping the DOM. But I don’t see a strong reason why API endpoints would change more frequently than the HTML selectors in the rendered case. We're also closer to the actual data source, which means fewer moving parts and less that could break the scraper.&lt;/p&gt;

&lt;p&gt;Source - &lt;a href="https://docs.scrapy.org/en/latest/topics/dynamic-content.html" rel="noopener noreferrer"&gt;Mitigating pre-rendering JS&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scrapy Implementation Example
&lt;/h2&gt;

&lt;p&gt;One solution is to use a &lt;code&gt;downloader middleware&lt;/code&gt;. This way, our spiders don’t need to be aware of it. The spider requests &lt;code&gt;https://dev.to&lt;/code&gt;, and inside the &lt;code&gt;downloader middleware&lt;/code&gt;, we simply map that URL to a local file where the site is stored and forward the request.&lt;/p&gt;

&lt;p&gt;This setup can be extended by using an &lt;code&gt;env&lt;/code&gt; variable for &lt;code&gt;dev&lt;/code&gt; / &lt;code&gt;prod&lt;/code&gt; modes, allowing us to include the middleware conditionally in the settings.&lt;/p&gt;
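&lt;p&gt;One way to wire this into the settings is sketched below. The &lt;code&gt;SCRAPER_MODE&lt;/code&gt; variable name, the &lt;code&gt;myproject.middlewares&lt;/code&gt; module path, and the priority value are assumptions; only &lt;code&gt;DOWNLOADER_MIDDLEWARES&lt;/code&gt; is an actual Scrapy setting.&lt;/p&gt;

```python
import os


def downloader_middlewares(mode: str) -> dict:
    """Build the DOWNLOADER_MIDDLEWARES setting for the given mode."""
    middlewares = {}
    if mode == "dev":
        # Hypothetical module path; 543 is an arbitrary priority
        middlewares["myproject.middlewares.LocalFileDownloaderMiddleware"] = 543
    return middlewares


# settings.py: SCRAPER_MODE is a hypothetical variable, defaulting to production
DOWNLOADER_MIDDLEWARES = downloader_middlewares(os.environ.get("SCRAPER_MODE", "prod"))
```

&lt;p&gt;In production the dictionary stays empty, so requests go to the live site as usual.&lt;/p&gt;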

&lt;p&gt;Storing the site can be as simple as using CTRL + S, or handled through an extra mode like &lt;code&gt;init&lt;/code&gt;, which scrapes the sites and saves them with filenames mapped from their URLs.&lt;/p&gt;
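&lt;p&gt;The URL-to-file mapping and the middleware could be sketched as follows. This is an assumption-laden illustration rather than the code from the example project: the &lt;code&gt;snapshots&lt;/code&gt; directory, the hashing scheme, and the fallback to a real download are all choices made here. Scrapy is imported lazily so that the mapper can be used and tested without it.&lt;/p&gt;

```python
import hashlib
from pathlib import Path
from urllib.parse import urlsplit

SNAPSHOT_DIR = Path("snapshots")  # hypothetical directory kept in VCS


def url_to_path(url: str) -> Path:
    """Map a URL to a stable local file path, one folder per host."""
    parts = urlsplit(url)
    # Hash path and query so any URL becomes a safe, fixed-length file name
    key = f"{parts.path}?{parts.query}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:16]
    return SNAPSHOT_DIR / parts.netloc / f"{digest}.html"


class LocalFileDownloaderMiddleware:
    """Serve stored snapshots instead of downloading, when a snapshot exists."""

    def process_request(self, request, spider):
        from scrapy.http import HtmlResponse  # lazy import, needs Scrapy installed

        path = url_to_path(request.url)
        if not path.exists():
            return None  # no snapshot: let Scrapy download the page normally
        return HtmlResponse(
            url=request.url,
            body=path.read_bytes(),
            encoding="utf-8",
            request=request,
        )
```

&lt;p&gt;Returning a response from &lt;code&gt;process_request&lt;/code&gt; makes Scrapy skip the actual download, so the spider receives the stored page as if it came from the network.&lt;/p&gt;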

&lt;p&gt;Now, one could simply extend this with tests and the init mode if necessary, but explaining that wouldn't add much value at this point. Stopping here.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F11rbuesofhnr0kctfl99.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F11rbuesofhnr0kctfl99.png" alt="LocalFileDownloaderMiddleware" width="800" height="514"&gt;&lt;/a&gt;&lt;br&gt;Downloader Middleware example
  &lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foocjv83jse48ncsicsmc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foocjv83jse48ncsicsmc.png" alt="Mapper function" width="800" height="463"&gt;&lt;/a&gt;&lt;br&gt;Mapper function example
  &lt;/p&gt;




</description>
      <category>python</category>
      <category>webscraping</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
