<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Artur Daschevici</title>
    <description>The latest articles on DEV Community by Artur Daschevici (@adaschevici).</description>
    <link>https://dev.to/adaschevici</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F361785%2Fc49561cf-b2fc-457f-b21e-0ecae2f9aeef.png</url>
      <title>DEV Community: Artur Daschevici</title>
      <link>https://dev.to/adaschevici</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/adaschevici"/>
    <language>en</language>
    <item>
      <title>Rusty puppets, Websockets and Voyeurism (part II): Driving Chromium in Docker with a Window</title>
      <dc:creator>Artur Daschevici</dc:creator>
      <pubDate>Wed, 19 Nov 2025 09:05:00 +0000</pubDate>
      <link>https://dev.to/adaschevici/rusty-puppets-websockets-and-voyeurism-part-ii-driving-chromium-in-docker-with-a-window-30li</link>
      <guid>https://dev.to/adaschevici/rusty-puppets-websockets-and-voyeurism-part-ii-driving-chromium-in-docker-with-a-window-30li</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;You swapped Chrome → Chromium for better arm64 support, strapped on VNC + noVNC to watch the chaos, made Alpine optional to chase size gains, and pimped a Makefile so every variant is one command away.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Backstory
&lt;/h2&gt;

&lt;p&gt;In some of my previous experiments I used browser automation with Chrome to extract information from pages, usually via a wrapper around CDP (the Chrome DevTools Protocol). CDP has a neat and very large API for driving Chrome and Chromium-based browsers. Instead of using an existing wrapper, I built my own in Rust, using &lt;code&gt;tungstenite&lt;/code&gt; to talk to the browser over a WebSocket.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why?
&lt;/h2&gt;

&lt;p&gt;Ideally, I want a server-style primitive that speaks CDP over a WebSocket without my having to babysit a local browser instance. The usual way to run Chrome automation is to launch the browser process yourself. If you want some visibility into what is happening, you turn the &lt;code&gt;--headless&lt;/code&gt; flag off. That is not what I want here. The automation runs outside the Docker container, and the container only holds the browser with its remote debugging port exposed. Something else tripped me up: there were no Chrome repos with arm64 builds that I could run on my M1 Mac. So I switched to Chromium, which has better arm64 support.&lt;/p&gt;

&lt;p&gt;Yes, "headless" is efficient; no, I don't trust it until I can see it wiggle, that is why an extra feature that felt right was having VNC enabled on one of the container variants.&lt;/p&gt;

&lt;h2&gt;
  
  
  What?
&lt;/h2&gt;

&lt;p&gt;Components (LEGO-brick style):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chromium (not Chrome) – better support for arm64 builds, CDP at 9222.&lt;/li&gt;
&lt;li&gt;socat – forwards the CDP port to an externally reachable one, since binding Chromium's debugging port to 0.0.0.0 did not work well.&lt;/li&gt;
&lt;li&gt;Xvfb + lightweight WM (fluxbox) – fake display for the VNC stream.&lt;/li&gt;
&lt;li&gt;VNC server (x11vnc) – the eyeballs.&lt;/li&gt;
&lt;li&gt;noVNC + websockify – view it in the browser at :6080.&lt;/li&gt;
&lt;li&gt;supervisord – herd cats (multiple daemons).&lt;/li&gt;
&lt;li&gt;Alpine (optional) – minimal base; trade-offs: fonts, glibc shims, weird edges.&lt;/li&gt;
&lt;li&gt;Rust client – talks to CDP's HTTP endpoints (&lt;code&gt;/json&lt;/code&gt;, &lt;code&gt;/json/version&lt;/code&gt;) through the socat port the container exposes, then switches to the WebSocket.
&lt;/li&gt;
&lt;/ul&gt;
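&lt;p&gt;The discovery step the client performs before opening the WebSocket is small. Below is a minimal sketch of it, in Python to keep it short: fetch CDP's &lt;code&gt;/json/version&lt;/code&gt; and pull out the browser-level &lt;code&gt;webSocketDebuggerUrl&lt;/code&gt;. It assumes the socat port is published on the host as &lt;code&gt;127.0.0.1:9224&lt;/code&gt;, so adjust &lt;code&gt;port&lt;/code&gt; to whatever &lt;code&gt;CDP_HOST_PORT&lt;/code&gt; is in your &lt;code&gt;.env&lt;/code&gt;. The helper names are illustrative, not part of the project.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import urllib.request


def browser_ws_url(version_body):
    """Extract the browser-level WebSocket URL from a /json/version response body."""
    data = json.loads(version_body)
    return data["webSocketDebuggerUrl"]


def fetch_browser_ws_url(host="127.0.0.1", port=9224, timeout=3.0):
    """GET /json/version through the exposed socat port (port 9224 is an assumption)."""
    url = f"http://{host}:{port}/json/version"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return browser_ws_url(resp.read().decode("utf-8"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A WebSocket client such as &lt;code&gt;tungstenite&lt;/code&gt; connects to the returned &lt;code&gt;webSocketDebuggerUrl&lt;/code&gt; before sending any CDP commands.&lt;/p&gt;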

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+------------------ Docker Container --------------------+
|                                                        |
|  [Xvfb] --- [WM] --- [VNC Server] --- [noVNC]          |
|                         ^             (HTTP 6080)      |
|                         |                              |
|                    screen:0                            |
|                                                        |
|  [Chromium --headless=new --remote-debugging-port=9222]|
|                                         (WS:9222)      |
|                                ^                       |
|                                |                       |
|                            [Socat proxy]               |
+--------------------------------------------------------+

 Outside:
   Rust (chromiumoxide/others) --&amp;gt; http://host:&amp;lt;exposed-socat-port&amp;gt;
   Human -&amp;gt; http://host:6080 (noVNC)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
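&lt;p&gt;The socat box in the diagram is a plain TCP relay. It accepts connections on a port the container publishes and pipes each one to Chromium's debugging port on loopback. A socat invocation along the lines of &lt;code&gt;socat TCP-LISTEN:9224,fork,reuseaddr TCP:127.0.0.1:9222&lt;/code&gt; would do this. That is an assumption, since the actual supervisord entry is not shown. The sketch below reproduces the same idea in Python only to make the data flow explicit; it is not what the image runs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import socket
import socketserver
import threading


def _pipe(src, dst):
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    try:
        while True:
            chunk = src.recv(65536)
            if not chunk:
                break
            dst.sendall(chunk)
    except OSError:
        pass  # peer reset; fall through and signal EOF downstream
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def make_relay(listen_host, listen_port, target_host, target_port):
    """Build a threaded TCP server relaying each client to the target, like socat's fork mode."""

    class Relay(socketserver.BaseRequestHandler):
        def handle(self):
            # One upstream connection per client, e.g. to Chromium on 127.0.0.1:9222.
            # If the target is not up yet this raises; socketserver logs it and drops the client.
            with socket.create_connection((target_host, target_port)) as upstream:
                back = threading.Thread(target=_pipe, args=(upstream, self.request))
                back.start()
                _pipe(self.request, upstream)  # client to browser
                back.join()                    # wait for browser to client to finish

    class Server(socketserver.ThreadingTCPServer):
        allow_reuse_address = True  # comparable to socat's reuseaddr option
        daemon_threads = True

    return Server((listen_host, listen_port), Relay)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Inside the container this would be started as &lt;code&gt;make_relay("0.0.0.0", 9224, "127.0.0.1", 9222).serve_forever()&lt;/code&gt;. Because each direction gets its own pipe, long-lived bidirectional CDP WebSocket traffic passes through unchanged.&lt;/p&gt;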



&lt;h2&gt;
  
  
  How?
&lt;/h2&gt;

&lt;p&gt;I.  Image strategy (two by two):&lt;/p&gt;

&lt;p&gt;What I wanted was efficiency at both the container level and the browser level. In practice that meant &lt;code&gt;alpine&lt;/code&gt; for a smaller image and headless mode for cheaper runtime. However, running blind makes introspection hard. So I added two more images with VNC support, again in both &lt;code&gt;alpine&lt;/code&gt; and &lt;code&gt;debian&lt;/code&gt; flavors.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Debian/Ubuntu base: fewer papercuts, bigger image, fastest to "it works."&lt;/li&gt;
&lt;li&gt;Alpine base: smallest, but bring your own fonts, codecs, and glibc cuddles.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Pro tip: start with Debian for DX, and ship Alpine once you have tamed fonts/codecs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;II.  Dockerfile (there is a common base between &lt;code&gt;debian&lt;/code&gt; and &lt;code&gt;alpine&lt;/code&gt;)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;#....&lt;/span&gt;
&lt;span class="c"&gt;# Default ports&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; CHROME_PORT=9222&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; SOCAT_PORT=9224&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; VNC_PORT=5900&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NOVNC_PORT=6080&lt;/span&gt;

&lt;span class="c"&gt;#....packages common to both debian and alpine&lt;/span&gt;
  # want to edit stuff?
  vim \
  # duh
  chromium \
  # port forwarding for the 9222 remote debugging port
  socat \
  # healthcheck and manual testing
  curl \
  # netstat, useful for listing open ports
  net-tools \
  # ss, useful for listing open ports
  iproute2 \
  ca-certificates \
  procps \


#.... Install VNC and GUI components
  supervisor \
  xvfb \
  x11vnc \
  websockify \
  fluxbox \
  git \

#....

# Install noVNC from source (most reliable method)
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; /opt &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  git clone &lt;span class="nt"&gt;--depth&lt;/span&gt; 1 https://github.com/novnc/noVNC.git &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="nb"&gt;cd &lt;/span&gt;noVNC &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; vnc.html index.html

&lt;span class="c"&gt;#....&lt;/span&gt;

&lt;span class="c"&gt;# Create necessary directories with proper permissions&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /home/chrome/data &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /var/log/supervisor &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="nb"&gt;chown&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; chrome:chrome /home/chrome &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="nb"&gt;chown&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; chrome:chrome /var/log/supervisor


&lt;span class="c"&gt;# Copy startup script&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; STEALTH=basic&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; ${STEALTH}.sh /usr/local/bin/start-chrome.sh&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x /usr/local/bin/start-chrome.sh &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="nb"&gt;chown &lt;/span&gt;chrome:chrome /usr/local/bin/start-chrome.sh

&lt;span class="c"&gt;#....&lt;/span&gt;


&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; ${SOCAT_PORT} ${VNC_PORT} ${NOVNC_PORT}&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["supervisord", "-n", "-c", "/etc/supervisor/conf.d/supervisord.conf"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
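&lt;p&gt;The &lt;code&gt;${STEALTH}.sh&lt;/code&gt; startup scripts themselves are not shown. As a hypothetical illustration of the headless/GUI split, here is one way a launcher could assemble the Chromium command line. Headless mode adds &lt;code&gt;--headless=new&lt;/code&gt;. GUI mode drops it and points &lt;code&gt;DISPLAY&lt;/code&gt; at the Xvfb screen that x11vnc mirrors. The function name, the &lt;code&gt;:0&lt;/code&gt; display and the extra flags are assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

# Hypothetical flag set; the real start-chrome.sh variants may differ.
BASE_FLAGS = [
    "--remote-debugging-port=9222",       # CDP endpoint that socat forwards
    "--user-data-dir=/home/chrome/data",  # assumed: the profile dir created in the Dockerfile
    "--no-first-run",
]


def chromium_command(mode, binary="chromium"):
    """Return (argv, env) for starting Chromium in "headless" or "gui" mode."""
    env = dict(os.environ)
    argv = [binary, *BASE_FLAGS]
    if mode == "headless":
        argv.append("--headless=new")
    elif mode == "gui":
        env["DISPLAY"] = ":0"  # assumed Xvfb display number
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return argv, env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;subprocess.Popen(argv, env=env)&lt;/code&gt; would then start the browser. Keeping the mode switch in one place mirrors how the &lt;code&gt;MODE&lt;/code&gt; knob selects the image elsewhere in the setup.&lt;/p&gt;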



&lt;p&gt;III. Dockerfile (lightweight Debian to get a more frictionless experience):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; debian:bookworm-slim&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; DEBIAN_FRONTEND=noninteractive&lt;/span&gt;

&lt;span class="c"&gt;# Install base dependencies&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="c"&gt;#....&lt;/span&gt;
  fonts-liberation \
  fonts-noto-color-emoji \
  fonts-roboto \
  fonts-noto \
  libasound2 \
  libatk-bridge2.0-0 \
  libatk1.0-0 \
  libatspi2.0-0 \
  libcups2 \
  libdbus-1-3 \
  libdrm2 \
  libgbm1 \
  libgtk-3-0 \
  libnspr4 \
  libnss3 \
  libwayland-client0 \
  libxcomposite1 \
  libxdamage1 \
  libxfixes3 \
  libxkbcommon0 \
  libxrandr2 \
  xdg-utils \
  &amp;amp;&amp;amp; rm -rf /var/lib/apt/lists/*

# Install VNC and GUI components
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  gnupg &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="c"&gt;#....&lt;/span&gt;
  python3 \
  python3-numpy \
  &amp;amp;&amp;amp; rm -rf /var/lib/apt/lists/*

#....

# Create a non-root user
&lt;span class="k"&gt;RUN &lt;/span&gt;useradd &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; /bin/bash chrome

&lt;span class="c"&gt;#....&lt;/span&gt;

&lt;span class="c"&gt;# Copy supervisord config&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; supervisord.conf.debian /etc/supervisor/conf.d/supervisord.conf&lt;/span&gt;

&lt;span class="c"&gt;#....&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;IV. The &lt;code&gt;alpine&lt;/code&gt; version is similar but not identical. Some commands differ, and so do some package names and dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; alpine:3.19&lt;/span&gt;

&lt;span class="c"&gt;#....&lt;/span&gt;

&lt;span class="c"&gt;# Install base dependencies and Chromium&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apk add &lt;span class="nt"&gt;--no-cache&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="c"&gt;# Core utilities&lt;/span&gt;
  # using some custom scripts to launch the browser
  bash \
  &lt;span class="c"&gt;# ....&lt;/span&gt;
  # x11 utilities that are not included by default
  xdpyinfo \
  xauth \
  xprop \
  xwininfo \
  # Fonts
  font-liberation \
  font-noto \
  font-noto-emoji \
  font-noto-cjk \
  # Chromium dependencies
  libstdc++ \
  harfbuzz \
  nss \
  freetype \
  ttf-freefont \
  wqy-zenhei \
  # Audio/Video libraries
  alsa-lib \
  at-spi2-core \
  cups-libs \
  dbus-libs \
  libdrm \
  mesa-gbm \
  libxcomposite \
  libxdamage \
  libxfixes \
  libxkbcommon \
  libxrandr \
  wayland-libs-client \
  # X11 libraries
  libx11 \
  libxext \
  libxrender \
  libxtst \
  libxi

&lt;span class="c"&gt;# Install VNC and GUI components&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apk add &lt;span class="nt"&gt;--no-cache&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="c"&gt;#....&lt;/span&gt;
  python3 \
  py3-numpy \
  py3-pip

&lt;span class="c"&gt;#....&lt;/span&gt;

&lt;span class="c"&gt;# Create a non-root user (Alpine uses adduser instead of useradd)&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;adduser &lt;span class="nt"&gt;-D&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; /bin/sh chrome

&lt;span class="c"&gt;#....&lt;/span&gt;

&lt;span class="c"&gt;# Copy supervisord config&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; supervisord.conf.alpine /etc/supervisor/conf.d/supervisord.conf&lt;/span&gt;

&lt;span class="c"&gt;#....&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;V. To manage the multiple containers, &lt;code&gt;k8s&lt;/code&gt; would be overkill, so I used &lt;code&gt;docker-compose&lt;/code&gt; instead. I like it for simple experiments where I need quick, incremental iteration while testing my containers. There is less to clean up afterwards and less chance of making mistakes in hand-typed &lt;code&gt;docker run&lt;/code&gt; commands.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;chrome-cdp&lt;/span&gt;

&lt;span class="na"&gt;x-chrome-common&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nl"&gt;&amp;amp;chrome-common&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;chrome-cdp:${DISTRO:-debian}-${MODE:-headless}-stealth-${STEALTH:-basic}&lt;/span&gt;
  &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;chrome-${DISTRO:-debian}-${MODE:-headless}-stealth-${STEALTH:-basic}&lt;/span&gt;
  &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;cdpnet&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;shm_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${HEADLESS_SHM_SIZE}"&lt;/span&gt;
  &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
  &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;curl"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-fsS"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://0.0.0.0:${SOCAT_PORT}/json/version"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3s&lt;/span&gt;
    &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
    &lt;span class="na"&gt;start_period&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;15s&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;chrome-headless&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;*chrome-common&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./headless&lt;/span&gt;
      &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfile.${DISTRO:-debian}&lt;/span&gt;
      &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;STEALTH&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${STEALTH:-basic}&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${CDP_HOST_PORT}:${SOCAT_PORT}"&lt;/span&gt;

  &lt;span class="na"&gt;chrome-gui&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;*chrome-common&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./gui&lt;/span&gt;
      &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfile.${DISTRO:-debian}&lt;/span&gt;
      &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;STEALTH&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${STEALTH:-basic}&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${CDP_HOST_PORT}:${SOCAT_PORT}"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${VNC_PORT}:5900"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${NOVNC_PORT}:6080"&lt;/span&gt;

&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;cdpnet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bridge&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
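&lt;p&gt;The healthcheck only tells Docker whether the container is healthy. A client started right after &lt;code&gt;docker compose up&lt;/code&gt; still has to wait for CDP to answer. Below is a minimal sketch of a client-side wait loop that mirrors the &lt;code&gt;retries&lt;/code&gt;/&lt;code&gt;interval&lt;/code&gt;/&lt;code&gt;timeout&lt;/code&gt; knobs above. The function and its defaults are illustrative. The injectable &lt;code&gt;fetch&lt;/code&gt; parameter only exists to make it testable without a running container.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time
import urllib.request


def _http_get(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def wait_for_cdp(url, retries=5, interval=10.0, timeout=3.0, fetch=None):
    """Poll a CDP /json/version URL until it answers; return the attempt number that succeeded.

    Re-raises the last error if every attempt fails (URLError and socket timeouts are OSErrors).
    """
    get = fetch or (lambda u: _http_get(u, timeout))
    for attempt in range(1, retries + 1):
        try:
            get(url)
            return attempt
        except OSError:
            if attempt == retries:
                raise
            time.sleep(interval)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;wait_for_cdp("http://127.0.0.1:9224/json/version")&lt;/code&gt; waits on the headless container, assuming &lt;code&gt;CDP_HOST_PORT=9224&lt;/code&gt; in the &lt;code&gt;.env&lt;/code&gt; file.&lt;/p&gt;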



&lt;p&gt;As you can see, the &lt;code&gt;docker-compose&lt;/code&gt; file is streamlined with YAML anchors and aliases to avoid repetition between the two services. But that is not the most interesting DX improvement I added.&lt;/p&gt;
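&lt;p&gt;The &lt;code&gt;image&lt;/code&gt; and &lt;code&gt;container_name&lt;/code&gt; templates are what let one compose file fan out into every variant. The sketch below mimics only the two interpolation forms used above: &lt;code&gt;${NAME}&lt;/code&gt;, and &lt;code&gt;${NAME:-default}&lt;/code&gt;, where the default applies when the variable is unset or empty. It then lists the tags produced by the &lt;code&gt;DISTRO&lt;/code&gt; × &lt;code&gt;MODE&lt;/code&gt; × &lt;code&gt;STEALTH&lt;/code&gt; combinations. It illustrates Compose's behavior for these forms and is not a full implementation of its interpolation rules.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import itertools
import re

# Matches ${NAME} and ${NAME:-default}; other Compose forms are not handled.
_VAR = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

IMAGE = "chrome-cdp:${DISTRO:-debian}-${MODE:-headless}-stealth-${STEALTH:-basic}"


def interpolate(template, env):
    """Expand variables like Compose: the :- default is used when NAME is unset or empty."""
    def repl(match):
        value = env.get(match.group(1)) or ""
        return value or (match.group(2) or "")
    return _VAR.sub(repl, template)


def all_image_tags():
    """Every image tag reachable through the DISTRO x MODE x STEALTH knobs."""
    combos = itertools.product(["debian", "alpine"], ["headless", "gui"], ["basic", "advanced"])
    return [interpolate(IMAGE, {"DISTRO": d, "MODE": m, "STEALTH": s}) for d, m, s in combos]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;all_image_tags()&lt;/code&gt; yields the same eight tags, in the same order, that the &lt;code&gt;Makefile&lt;/code&gt; hard-codes in &lt;code&gt;CHROME_IMAGES&lt;/code&gt;. A helper like this could keep the two in sync.&lt;/p&gt;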

&lt;p&gt;VI. The pimped-out &lt;code&gt;Makefile&lt;/code&gt;. The big deal here is that it works almost like an extension of &lt;code&gt;make&lt;/code&gt; itself. Previously I would just create targets that wrap &lt;code&gt;docker-compose&lt;/code&gt;. Now every target takes parameters (&lt;code&gt;DISTRO&lt;/code&gt;, &lt;code&gt;MODE&lt;/code&gt;, &lt;code&gt;STEALTH&lt;/code&gt;) instead of needing a differently named target for each variant.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight make"&gt;&lt;code&gt;&lt;span class="nv"&gt;SHELL&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; /bin/bash
&lt;span class="nv"&gt;ENV_FILE&lt;/span&gt; &lt;span class="o"&gt;?=&lt;/span&gt; .env
&lt;span class="nv"&gt;STEALTH&lt;/span&gt; &lt;span class="o"&gt;?=&lt;/span&gt; basic
&lt;span class="nv"&gt;MODE&lt;/span&gt; &lt;span class="o"&gt;?=&lt;/span&gt; headless
&lt;span class="nv"&gt;DISTRO&lt;/span&gt; &lt;span class="o"&gt;?=&lt;/span&gt; debian

&lt;span class="c"&gt;# Container name (adjust to match your docker-compose service name)
&lt;/span&gt;&lt;span class="nv"&gt;CONTAINER_NAME&lt;/span&gt; &lt;span class="o"&gt;?=&lt;/span&gt; chromium
&lt;span class="nv"&gt;COMPOSE_FILE&lt;/span&gt; &lt;span class="o"&gt;?=&lt;/span&gt; docker-compose.yml

&lt;span class="nl"&gt;.PHONY&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;health list-containers ps stats stats-all stats-live top &lt;/span&gt;\
&lt;span class="nf"&gt;   ports ports-all ports-detailed logs logs-chrome logs-chrome-live &lt;/span&gt;\
&lt;span class="nf"&gt;   shell rebuild up down chrome-version chrome-tabs chrome-health verify-chrome-flags &lt;/span&gt;\
&lt;span class="nf"&gt;   restart-all stop-all wsurl&lt;/span&gt;

&lt;span class="c"&gt;# ============================================
# Configuration
# ============================================
&lt;/span&gt;&lt;span class="nv"&gt;CHROME_IMAGE_PREFIX&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; chrome-cdp
&lt;span class="nv"&gt;CHROME_IMAGES&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="p"&gt;$(&lt;/span&gt;CHROME_IMAGE_PREFIX&lt;span class="p"&gt;)&lt;/span&gt;:debian-headless-stealth-basic &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="p"&gt;$(&lt;/span&gt;CHROME_IMAGE_PREFIX&lt;span class="p"&gt;)&lt;/span&gt;:debian-headless-stealth-advanced &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="p"&gt;$(&lt;/span&gt;CHROME_IMAGE_PREFIX&lt;span class="p"&gt;)&lt;/span&gt;:debian-gui-stealth-basic &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="p"&gt;$(&lt;/span&gt;CHROME_IMAGE_PREFIX&lt;span class="p"&gt;)&lt;/span&gt;:debian-gui-stealth-advanced &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="p"&gt;$(&lt;/span&gt;CHROME_IMAGE_PREFIX&lt;span class="p"&gt;)&lt;/span&gt;:alpine-headless-stealth-basic &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="p"&gt;$(&lt;/span&gt;CHROME_IMAGE_PREFIX&lt;span class="p"&gt;)&lt;/span&gt;:alpine-headless-stealth-advanced &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="p"&gt;$(&lt;/span&gt;CHROME_IMAGE_PREFIX&lt;span class="p"&gt;)&lt;/span&gt;:alpine-gui-stealth-basic &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="p"&gt;$(&lt;/span&gt;CHROME_IMAGE_PREFIX&lt;span class="p"&gt;)&lt;/span&gt;:alpine-gui-stealth-advanced

&lt;span class="c"&gt;# Build docker filter arguments
&lt;/span&gt;&lt;span class="nv"&gt;DOCKER_FILTERS&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;$(&lt;/span&gt;foreach img,&lt;span class="p"&gt;$(&lt;/span&gt;CHROME_IMAGES&lt;span class="p"&gt;)&lt;/span&gt;,--filter &lt;span class="s2"&gt;"ancestor=&lt;/span&gt;&lt;span class="p"&gt;$(&lt;/span&gt;&lt;span class="s2"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Command to get container
&lt;/span&gt;&lt;span class="nv"&gt;GET_CONTAINER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; docker ps &lt;span class="p"&gt;$(&lt;/span&gt;DOCKER_FILTERS&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nt"&gt;-q&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt;

&lt;span class="c"&gt;# Command to get all containers
&lt;/span&gt;&lt;span class="nv"&gt;GET_ALL_CONTAINERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; docker ps &lt;span class="p"&gt;$(&lt;/span&gt;DOCKER_FILTERS&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nt"&gt;-q&lt;/span&gt;

&lt;span class="c"&gt;# ============================================
# Helper Functions
# ============================================
# Check if container exists and set CONTAINER variable
&lt;/span&gt;&lt;span class="k"&gt;define&lt;/span&gt; &lt;span class="nv"&gt;require_container&lt;/span&gt;
 &lt;span class="nl"&gt;$(eval CONTAINER &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="nf"&gt;= $(shell $(GET_CONTAINER)))&lt;/span&gt;
 &lt;span class="err"&gt;@if&lt;/span&gt; &lt;span class="err"&gt;[&lt;/span&gt; &lt;span class="err"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"$(CONTAINER)"&lt;/span&gt; &lt;span class="err"&gt;];&lt;/span&gt; &lt;span class="err"&gt;then&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;
  &lt;span class="err"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ No $(CHROME_IMAGE_PREFIX) container running"&lt;/span&gt;&lt;span class="err"&gt;;&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;
  &lt;span class="nl"&gt;echo "Available images&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;$(CHROME_IMAGES)"; &lt;/span&gt;\
&lt;span class="nf"&gt;  exit 1; &lt;/span&gt;\
&lt;span class="nf"&gt; fi&lt;/span&gt;
 &lt;span class="nl"&gt;@echo "✓ Using container&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;$(CONTAINER)"&lt;/span&gt;
&lt;span class="k"&gt;endef&lt;/span&gt;

&lt;span class="c"&gt;# ============================================
# Targets
# ============================================
&lt;/span&gt;&lt;span class="nl"&gt;help&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="err"&gt;@echo&lt;/span&gt; &lt;span class="s2"&gt;"Chrome Container Management"&lt;/span&gt;
 &lt;span class="err"&gt;@echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
 &lt;span class="nl"&gt;@echo "Available targets&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="nf"&gt;"&lt;/span&gt;
 &lt;span class="err"&gt;@echo&lt;/span&gt; &lt;span class="s2"&gt;"  make list-containers    - List all running chrome containers"&lt;/span&gt;
 &lt;span class="err"&gt;@echo&lt;/span&gt; &lt;span class="s2"&gt;"  make verify-chrome-flags - Verify stealth flags in Chrome"&lt;/span&gt;
 &lt;span class="err"&gt;@echo&lt;/span&gt; &lt;span class="s2"&gt;"  make stats              - Show container stats"&lt;/span&gt;
 &lt;span class="err"&gt;@echo&lt;/span&gt; &lt;span class="s2"&gt;"  make stats-all          - Show stats for all chrome containers"&lt;/span&gt;
 &lt;span class="err"&gt;@echo&lt;/span&gt; &lt;span class="s2"&gt;"  make logs               - Show container logs"&lt;/span&gt;
 &lt;span class="c"&gt;# @echo "  make exec CMD=&amp;lt;cmd&amp;gt;     - Execute command in container"
&lt;/span&gt; &lt;span class="err"&gt;@echo&lt;/span&gt; &lt;span class="s2"&gt;"  make shell              - Open shell in container"&lt;/span&gt;

&lt;span class="nl"&gt;list-containers&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="nl"&gt;@echo "Running chrome-cdp containers&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="nf"&gt;"&lt;/span&gt;
 &lt;span class="err"&gt;@docker&lt;/span&gt; &lt;span class="err"&gt;ps&lt;/span&gt; &lt;span class="err"&gt;$(DOCKER_FILTERS)&lt;/span&gt; &lt;span class="err"&gt;--format&lt;/span&gt; &lt;span class="s2"&gt;"table {{.ID}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}"&lt;/span&gt;

&lt;span class="nl"&gt;up&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="nl"&gt;@echo "Building and starting chromium container in $(MODE) mode with $(DISTRO) (stealth&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;$(STEALTH))"&lt;/span&gt;
 &lt;span class="nv"&gt;DISTRO&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$(&lt;/span&gt;DISTRO&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;MODE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$(&lt;/span&gt;MODE&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;STEALTH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$(&lt;/span&gt;STEALTH&lt;span class="p"&gt;)&lt;/span&gt; docker compose &lt;span class="nt"&gt;--env-file&lt;/span&gt; &lt;span class="p"&gt;$(&lt;/span&gt;ENV_FILE&lt;span class="p"&gt;)&lt;/span&gt; up chrome-&lt;span class="p"&gt;$(&lt;/span&gt;MODE&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--build&lt;/span&gt;

&lt;span class="nl"&gt;down&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="err"&gt;docker&lt;/span&gt; &lt;span class="err"&gt;compose&lt;/span&gt; &lt;span class="err"&gt;--env-file&lt;/span&gt; &lt;span class="err"&gt;$(ENV_FILE)&lt;/span&gt; &lt;span class="err"&gt;down&lt;/span&gt;

&lt;span class="nl"&gt;logs&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="nf"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;call&lt;/span&gt; require_container&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="err"&gt;@docker&lt;/span&gt; &lt;span class="err"&gt;logs&lt;/span&gt; &lt;span class="err"&gt;-f&lt;/span&gt; &lt;span class="err"&gt;$(CONTAINER)&lt;/span&gt;

&lt;span class="nl"&gt;shell&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="nf"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;call&lt;/span&gt; require_container&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="err"&gt;@docker&lt;/span&gt; &lt;span class="err"&gt;exec&lt;/span&gt; &lt;span class="err"&gt;-it&lt;/span&gt; &lt;span class="err"&gt;$(CONTAINER)&lt;/span&gt; &lt;span class="err"&gt;bash&lt;/span&gt;

&lt;span class="nl"&gt;rebuild&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="nv"&gt;DISTRO&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$(&lt;/span&gt;DISTRO&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;MODE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$(&lt;/span&gt;MODE&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;STEALTH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$(&lt;/span&gt;STEALTH&lt;span class="p"&gt;)&lt;/span&gt; docker compose &lt;span class="nt"&gt;--env-file&lt;/span&gt; &lt;span class="p"&gt;$(&lt;/span&gt;ENV_FILE&lt;span class="p"&gt;)&lt;/span&gt; build &lt;span class="nt"&gt;--no-cache&lt;/span&gt; chrome-&lt;span class="p"&gt;$(&lt;/span&gt;MODE&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nl"&gt;ps&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="err"&gt;docker&lt;/span&gt; &lt;span class="err"&gt;compose&lt;/span&gt; &lt;span class="err"&gt;--env-file&lt;/span&gt; &lt;span class="err"&gt;$(ENV_FILE)&lt;/span&gt; &lt;span class="err"&gt;ps&lt;/span&gt;

&lt;span class="nl"&gt;health&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="nl"&gt;@echo "Headless&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="nf"&gt;" &amp;amp;&amp;amp; curl -fsS http://127.0.0.1:$$(grep ^CDP_HOST_PORT $(ENV_FILE) | cut -d= -f2)/json/version | jq -r .Browser || true&lt;/span&gt;
 &lt;span class="nl"&gt;@echo "GUI&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="nf"&gt;" &amp;amp;&amp;amp; curl -fsS http://127.0.0.1:$$(grep ^CDP_PORT_GUI $(ENV_FILE) | cut -d= -f2)/json/version | jq -r .Browser || true&lt;/span&gt;

&lt;span class="c"&gt;# Quick helper to print a page websocketDebuggerUrl (requires jq)
&lt;/span&gt;&lt;span class="nl"&gt;wsurl&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="nl"&gt;@curl -s "http&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="nf"&gt;//127.0.0.1:$$(grep ^CDP_PORT_HEADLESS $(ENV_FILE) | cut -d= -f2)/json/new?about:blank" | jq -r .webSocketDebuggerUrl&lt;/span&gt;
 &lt;span class="nl"&gt;@curl -s "http&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="nf"&gt;//127.0.0.1:$$(grep ^CDP_PORT_GUI $(ENV_FILE) | cut -d= -f2)/json/new?about:blank" | jq -r .webSocketDebuggerUrl&lt;/span&gt;


&lt;span class="c"&gt;# ============================================
# Chrome-specific commands
# ============================================
&lt;/span&gt;&lt;span class="nl"&gt;chrome-version&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="nf"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;call&lt;/span&gt; require_container&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="err"&gt;@docker&lt;/span&gt; &lt;span class="err"&gt;exec&lt;/span&gt; &lt;span class="err"&gt;$(CONTAINER)&lt;/span&gt; &lt;span class="err"&gt;chromium&lt;/span&gt; &lt;span class="err"&gt;--version&lt;/span&gt;

&lt;span class="nl"&gt;chrome-tabs&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="nf"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;call&lt;/span&gt; require_container&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="nv"&gt;CHROME_PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$$(&lt;/span&gt;docker port &lt;span class="p"&gt;$(&lt;/span&gt;CONTAINER&lt;span class="p"&gt;)&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'9222|9223|9224'&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;&lt;span class="s1"&gt;' -&amp;gt; '&lt;/span&gt; &lt;span class="s1"&gt;'{print $$2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;CHROME_PORT"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ Chrome DevTools port not found"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;exit &lt;/span&gt;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;fi&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Fetching tabs from http://&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;CHROME_PORT/json"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"http://&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;CHROME_PORT/json"&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.[] | "\(.id): \(.title) - \(.url)"'&lt;/span&gt;

&lt;span class="nl"&gt;chrome-health&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="nf"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;call&lt;/span&gt; require_container&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="nv"&gt;CHROME_PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$$(&lt;/span&gt;docker port &lt;span class="p"&gt;$(&lt;/span&gt;CONTAINER&lt;span class="p"&gt;)&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'9222|9223|9224'&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;&lt;span class="s1"&gt;' -&amp;gt; '&lt;/span&gt; &lt;span class="s1"&gt;'{print $$2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;CHROME_PORT"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ Chrome DevTools port not found"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;exit &lt;/span&gt;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;fi&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Chrome DevTools Health Check ==="&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Endpoint: http://&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;CHROME_PORT"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;if &lt;/span&gt;curl &lt;span class="nt"&gt;-sf&lt;/span&gt; &lt;span class="s2"&gt;"http://&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;CHROME_PORT/json/version"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;/dev/null 2&amp;gt;&amp;amp;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ Chrome DevTools is responding"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"http://&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;CHROME_PORT/json/version"&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'"Browser: \(.Browser)\nProtocol Version: \(."Protocol-Version")\nUser Agent: \(."User-Agent")"'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ Chrome not responding"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;exit &lt;/span&gt;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="nl"&gt;verify-chrome-flags&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="nf"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;call&lt;/span&gt; require_container&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="err"&gt;@echo&lt;/span&gt; &lt;span class="s2"&gt;"Verifying Chrome flags in container $(CONTAINER)..."&lt;/span&gt;
 &lt;span class="err"&gt;@docker&lt;/span&gt; &lt;span class="err"&gt;exec&lt;/span&gt; &lt;span class="err"&gt;$(CONTAINER)&lt;/span&gt; &lt;span class="err"&gt;sh&lt;/span&gt; &lt;span class="err"&gt;-c&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"cat /proc/\$$(pgrep -o chromium)/cmdline | tr '\0' '\n' | grep -E 'disable-blink-features|user-agent'"&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;
  &lt;span class="err"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="err"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ Stealth flags detected"&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;
  &lt;span class="err"&gt;||&lt;/span&gt; &lt;span class="err"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ No stealth flags"&lt;/span&gt;

&lt;span class="nl"&gt;stats&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="nf"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;call&lt;/span&gt; require_container&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="nl"&gt;@echo "Container stats&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="nf"&gt;"&lt;/span&gt;
 &lt;span class="err"&gt;@docker&lt;/span&gt; &lt;span class="err"&gt;stats&lt;/span&gt; &lt;span class="err"&gt;--no-stream&lt;/span&gt; &lt;span class="err"&gt;--format&lt;/span&gt; &lt;span class="s2"&gt;"table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.NetIO}}"&lt;/span&gt; &lt;span class="err"&gt;$(CONTAINER)&lt;/span&gt;

&lt;span class="nl"&gt;stats-all&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="nv"&gt;CONTAINERS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$$($(&lt;/span&gt;GET_ALL_CONTAINERS&lt;span class="p"&gt;));&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;CONTAINERS"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ No &lt;/span&gt;&lt;span class="p"&gt;$(&lt;/span&gt;&lt;span class="s2"&gt;CHROME_IMAGE_PREFIX&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="s2"&gt; containers running"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;exit &lt;/span&gt;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;fi&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Stats for all chrome containers:"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 docker stats &lt;span class="nt"&gt;--no-stream&lt;/span&gt; &lt;span class="nt"&gt;--format&lt;/span&gt; &lt;span class="s2"&gt;"table {{.Container}}&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s2"&gt;{{.Image}}&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s2"&gt;{{.CPUPerc}}&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s2"&gt;{{.MemUsage}}&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s2"&gt;{{.MemPerc}}"&lt;/span&gt; &lt;span class="nv"&gt;$$&lt;/span&gt;CONTAINERS

&lt;span class="nl"&gt;stats-live&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="nv"&gt;CONTAINERS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$$($(&lt;/span&gt;GET_ALL_CONTAINERS&lt;span class="p"&gt;));&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;CONTAINERS"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ No &lt;/span&gt;&lt;span class="p"&gt;$(&lt;/span&gt;&lt;span class="s2"&gt;CHROME_IMAGE_PREFIX&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="s2"&gt; containers running"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;exit &lt;/span&gt;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;fi&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 docker stats &lt;span class="nv"&gt;$$&lt;/span&gt;CONTAINERS

&lt;span class="nl"&gt;top&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="nv"&gt;CONTAINER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$$($(&lt;/span&gt;GET_CONTAINER&lt;span class="p"&gt;))&lt;/span&gt;
 &lt;span class="err"&gt;@echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Top Processes in Container ==="&lt;/span&gt;
 &lt;span class="err"&gt;@docker&lt;/span&gt; &lt;span class="err"&gt;top&lt;/span&gt; &lt;span class="err"&gt;$$(docker-compose&lt;/span&gt; &lt;span class="err"&gt;-f&lt;/span&gt; &lt;span class="err"&gt;$(COMPOSE_FILE)&lt;/span&gt; &lt;span class="err"&gt;ps&lt;/span&gt; &lt;span class="err"&gt;-q&lt;/span&gt; &lt;span class="err"&gt;$$CONTAINER)&lt;/span&gt;

&lt;span class="c"&gt;# ============================================
# Bulk operations
# ============================================
&lt;/span&gt;&lt;span class="nl"&gt;restart-all&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="nv"&gt;CONTAINERS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$$($(&lt;/span&gt;GET_ALL_CONTAINERS&lt;span class="p"&gt;));&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;CONTAINERS"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ No containers to restart"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;exit &lt;/span&gt;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;fi&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Restarting all chrome containers..."&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 docker restart &lt;span class="nv"&gt;$$&lt;/span&gt;CONTAINERS

&lt;span class="nl"&gt;stop-all&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="nv"&gt;CONTAINERS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$$($(&lt;/span&gt;GET_ALL_CONTAINERS&lt;span class="p"&gt;));&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;CONTAINERS"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ No containers to stop"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;exit &lt;/span&gt;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;fi&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Stopping all chrome containers..."&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 docker stop &lt;span class="nv"&gt;$$&lt;/span&gt;CONTAINERS

&lt;span class="c"&gt;# Show port mappings for a single container
&lt;/span&gt;&lt;span class="nl"&gt;ports&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="nf"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;call&lt;/span&gt; require_container&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="err"&gt;@echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Port Mappings for Container $(CONTAINER) ==="&lt;/span&gt;
 &lt;span class="err"&gt;@echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
 &lt;span class="err"&gt;@docker&lt;/span&gt; &lt;span class="err"&gt;port&lt;/span&gt; &lt;span class="err"&gt;$(CONTAINER)&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="err"&gt;awk&lt;/span&gt; &lt;span class="err"&gt;-F&lt;/span&gt;&lt;span class="s1"&gt;' -&amp;gt; '&lt;/span&gt; &lt;span class="s1"&gt;'{print $$1 "\t→\t" $$2}'&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="err"&gt;column&lt;/span&gt; &lt;span class="err"&gt;-t&lt;/span&gt; &lt;span class="err"&gt;-s&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="s1"&gt;'\t'&lt;/span&gt;

&lt;span class="c"&gt;# Show port mappings for all chrome containers in a table
&lt;/span&gt;&lt;span class="nl"&gt;ports-all&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="nv"&gt;CONTAINERS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$$($(&lt;/span&gt;GET_ALL_CONTAINERS&lt;span class="p"&gt;));&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;CONTAINERS"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ No &lt;/span&gt;&lt;span class="p"&gt;$(&lt;/span&gt;&lt;span class="s2"&gt;CHROME_IMAGE_PREFIX&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="s2"&gt; containers running"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;exit &lt;/span&gt;1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;fi&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Port Mappings for All Chrome Containers ==="&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s2"&gt;"%-15s %-40s %-20s %-20s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"CONTAINER ID"&lt;/span&gt; &lt;span class="s2"&gt;"IMAGE"&lt;/span&gt; &lt;span class="s2"&gt;"CONTAINER PORT"&lt;/span&gt; &lt;span class="s2"&gt;"HOST BINDING"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s2"&gt;"%-15s %-40s %-20s %-20s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"---------------"&lt;/span&gt; &lt;span class="s2"&gt;"----------------------------------------"&lt;/span&gt; &lt;span class="s2"&gt;"--------------------"&lt;/span&gt; &lt;span class="s2"&gt;"--------------------"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;for &lt;/span&gt;container &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nv"&gt;$$&lt;/span&gt;CONTAINERS&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;IMAGE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$$(&lt;/span&gt;docker inspect &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{{.Config.Image&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;}'&lt;/span&gt; &lt;span class="nv"&gt;$$&lt;/span&gt;container&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;SHORT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$$&lt;/span&gt;container | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-c1-12&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  docker port &lt;span class="nv"&gt;$$&lt;/span&gt;container | &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nv"&gt;IFS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; line&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nv"&gt;CONTAINER_PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;line"&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;&lt;span class="s1"&gt;' -&amp;gt; '&lt;/span&gt; &lt;span class="s1"&gt;'{print $$1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nv"&gt;HOST_BINDING&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;line"&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;&lt;span class="s1"&gt;' -&amp;gt; '&lt;/span&gt; &lt;span class="s1"&gt;'{print $$2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s2"&gt;"%-15s %-40s %-20s %-20s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;SHORT_ID"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;IMAGE"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;CONTAINER_PORT"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;HOST_BINDING"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="k"&gt;done&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Detailed port information with service names
&lt;/span&gt;&lt;span class="nl"&gt;ports-detailed&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="nf"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;call&lt;/span&gt; require_container&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="err"&gt;@echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Detailed Port Information for Container $(CONTAINER) ==="&lt;/span&gt;&lt;span class="err"&gt;;&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;
 &lt;span class="err"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="err"&gt;;&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;
 &lt;span class="nv"&gt;IMAGE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$$(&lt;/span&gt;docker inspect &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{{.Config.Image&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;}'&lt;/span&gt; &lt;span class="p"&gt;$(&lt;/span&gt;CONTAINER&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Container: &lt;/span&gt;&lt;span class="p"&gt;$(&lt;/span&gt;&lt;span class="s2"&gt;CONTAINER&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Image: &lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;IMAGE"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s2"&gt;"%-35s %-20s %-25s %-15s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"SERVICE"&lt;/span&gt; &lt;span class="s2"&gt;"CONTAINER PORT"&lt;/span&gt; &lt;span class="s2"&gt;"HOST BINDING"&lt;/span&gt; &lt;span class="s2"&gt;"PROTOCOL"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s2"&gt;"%-35s %-20s %-25s %-15s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"----------------------------------"&lt;/span&gt; &lt;span class="s2"&gt;"--------------------"&lt;/span&gt; &lt;span class="s2"&gt;"-------------------------"&lt;/span&gt; &lt;span class="s2"&gt;"---------------"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 docker port &lt;span class="p"&gt;$(&lt;/span&gt;CONTAINER&lt;span class="p"&gt;)&lt;/span&gt; | &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nv"&gt;IFS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; line&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;CONTAINER_PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;line"&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;&lt;span class="s1"&gt;' -&amp;gt; '&lt;/span&gt; &lt;span class="s1"&gt;'{print $$1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;'/'&lt;/span&gt; &lt;span class="nt"&gt;-f1&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;PROTOCOL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;line"&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;&lt;span class="s1"&gt;' -&amp;gt; '&lt;/span&gt; &lt;span class="s1"&gt;'{print $$1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;'/'&lt;/span&gt; &lt;span class="nt"&gt;-f2&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;HOST_BINDING&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;line"&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;&lt;span class="s1"&gt;' -&amp;gt; '&lt;/span&gt; &lt;span class="s1"&gt;'{print $$2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nv"&gt;$$&lt;/span&gt;CONTAINER_PORT &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   9222|9223|9224&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Chrome DevTools Socat Proxy"&lt;/span&gt; &lt;span class="p"&gt;;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   5900&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"VNC Server"&lt;/span&gt; &lt;span class="p"&gt;;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   6080&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"noVNC Web"&lt;/span&gt; &lt;span class="p"&gt;;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Unknown"&lt;/span&gt; &lt;span class="p"&gt;;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="k"&gt;esac&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s2"&gt;"%-35s %-20s %-25s %-15s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;SERVICE"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;CONTAINER_PORT"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;HOST_BINDING"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="s2"&gt;PROTOCOL"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# View Chrome startup script logs
&lt;/span&gt;&lt;span class="nl"&gt;logs-chrome&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="nf"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;call&lt;/span&gt; require_container&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="err"&gt;@echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Supervisor Chrome Program Logs ==="&lt;/span&gt;
 &lt;span class="err"&gt;@docker&lt;/span&gt; &lt;span class="err"&gt;exec&lt;/span&gt; &lt;span class="err"&gt;$(CONTAINER)&lt;/span&gt; &lt;span class="err"&gt;cat&lt;/span&gt; &lt;span class="err"&gt;/var/log/supervisor/chrome.log&lt;/span&gt; &lt;span class="err"&gt;2&amp;gt;/dev/null&lt;/span&gt; &lt;span class="err"&gt;||&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;
  &lt;span class="err"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Chrome log file not found"&lt;/span&gt;

&lt;span class="c"&gt;# Live tail of Chrome logs
&lt;/span&gt;&lt;span class="nl"&gt;logs-chrome-live&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
 &lt;span class="nf"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;call&lt;/span&gt; require_container&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="err"&gt;@echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Live Chrome Logs (Ctrl+C to exit) ==="&lt;/span&gt;
 &lt;span class="err"&gt;@docker&lt;/span&gt; &lt;span class="err"&gt;logs&lt;/span&gt; &lt;span class="err"&gt;-f&lt;/span&gt; &lt;span class="err"&gt;$(CONTAINER)&lt;/span&gt; &lt;span class="err"&gt;2&amp;gt;&amp;amp;1&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="err"&gt;grep&lt;/span&gt; &lt;span class="err"&gt;--line-buffered&lt;/span&gt; &lt;span class="s2"&gt;"CHROMIUM\|chromium\|Chrome"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Performance &amp;amp; size notes
&lt;/h2&gt;

&lt;p&gt;With the current layout, the Alpine and Debian images end up roughly the same size, so whatever I am doing wrong needs more investigation. The list of installed packages can probably be trimmed further for some extra savings, but I don't expect them to be noticeable at small scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security notes
&lt;/h2&gt;

&lt;p&gt;I haven't added any specific security measures, so don't run this in production without:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gating access with a VPN, or binding the published ports to 127.0.0.1.&lt;/li&gt;
&lt;li&gt;Adding auth to noVNC/websockify, or putting it behind a reverse proxy with auth.&lt;/li&gt;
&lt;li&gt;Running as non-root (done above), and keeping &lt;code&gt;--no-sandbox&lt;/code&gt; only inside containers you control.&lt;/li&gt;
&lt;li&gt;Pinning package versions for reproducible builds.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  WTF moments (and fixes)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;I had a lot of pain with the &lt;code&gt;chromium&lt;/code&gt; remote debugging port: it would not bind to &lt;code&gt;0.0.0.0:2222&lt;/code&gt; in the container, and connecting to it from outside gave me a "connection reset by peer" error. It turns out chromium only binds the debugging port to localhost inside the container, so I added &lt;code&gt;socat&lt;/code&gt; to forward the port.&lt;/li&gt;
&lt;li&gt;When the browser did not show up inside the container, I thought I had messed up, but it turns out any &lt;code&gt;--headless&lt;/code&gt; flag should be omitted when running with VNC; otherwise the browser never appears in the VNC session.&lt;/li&gt;
&lt;li&gt;This is a hack that you can probably fix in a better way, but I am lazy: 2MB of overhead on Alpine comes from me adding bash, which I use for the wrapper script that starts the browser, adds the flags and all that jazz.&lt;/li&gt;
&lt;li&gt;In my first attempt I tried to use Chrome, but there were some silly issues with the arm64 builds, so I switched to Chromium, which has better arm64 support.&lt;/li&gt;
&lt;/ul&gt;
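&lt;p&gt;For the curious, what &lt;code&gt;socat&lt;/code&gt; does here is simple: it accepts connections on an address reachable from outside the container and relays bytes in both directions to Chromium's loopback-only debugging port. Below is a minimal std-only Rust sketch of that relay. The function names and addresses are hypothetical, error handling is deliberately thin, and it does not propagate half-closed connections the way socat does, so treat it as an illustration of the idea rather than a replacement.&lt;/p&gt;

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Copies bytes from `src` to `dst` until EOF or an I/O error,
/// returning how many bytes were relayed.
fn pump(mut src: impl Read, mut dst: impl Write) -> u64 {
    let mut buf = [0u8; 8192];
    let mut total = 0u64;
    loop {
        let n = match src.read(buf.as_mut_slice()) {
            Ok(0) | Err(_) => break, // EOF or broken connection ends this direction
            Ok(n) => n,
        };
        if dst.write_all(buf.split_at(n).0).is_err() {
            break;
        }
        total += n as u64;
    }
    let _ = dst.flush();
    total
}

/// Hypothetical socat stand-in: listen on `listen_addr` (e.g. "0.0.0.0:2222")
/// and relay every connection to `target_addr` (e.g. a loopback-only debug port).
#[allow(dead_code)]
fn forward(listen_addr: String, target_addr: String) {
    let listener = TcpListener::bind(listen_addr).expect("could not bind listen address");
    for client in listener.incoming().flatten() {
        let target = target_addr.clone();
        thread::spawn(move || {
            // Connect to the browser from inside the container, where loopback works.
            let Ok(upstream) = TcpStream::connect(target) else { return };
            let (Ok(client_rx), Ok(upstream_rx)) = (client.try_clone(), upstream.try_clone()) else {
                return;
            };
            // One thread per direction: client -> browser and browser -> client.
            let to_browser = thread::spawn(move || pump(client_rx, upstream));
            pump(upstream_rx, client);
            let _ = to_browser.join();
        });
    }
}
```

&lt;p&gt;Something like &lt;code&gt;forward("0.0.0.0:2222".to_string(), "127.0.0.1:9222".to_string())&lt;/code&gt; would mirror the socat setup, where 9222 only stands in for whatever port Chromium actually listens on inside the container.&lt;/p&gt;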

&lt;h2&gt;
  
  
  My conclusion
&lt;/h2&gt;

&lt;p&gt;I will use this setup to automate the browser via CDP, which decouples the browser from the language driving it. I know this is not ideal for a solo dev, but hey, it's fun, so I will give it a try.&lt;/p&gt;
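&lt;p&gt;To make the "decouples the browser from the language" point concrete: every CDP command is a JSON object with an &lt;code&gt;id&lt;/code&gt;, a &lt;code&gt;method&lt;/code&gt; and &lt;code&gt;params&lt;/code&gt;, sent as a text frame over the debugging WebSocket, so any language that can speak WebSocket can drive the browser. A minimal std-only sketch of building such a message follows. The helper name is hypothetical, params are passed in as already-serialized JSON, and a real client would use a JSON library plus a WebSocket crate such as &lt;code&gt;tungstenite&lt;/code&gt; to actually send it.&lt;/p&gt;

```rust
use std::fmt::Display;

/// Hypothetical helper: builds a CDP command frame such as
/// {"id":1,"method":"Page.navigate","params":{"url":"https://example.com"}}.
/// `params_json` must already be valid JSON; `method` is not escaped.
fn cdp_command(id: u64, method: impl Display, params_json: impl Display) -> String {
    format!(r#"{{"id":{},"method":"{}","params":{}}}"#, id, method, params_json)
}
```

&lt;p&gt;The browser replies with a JSON message carrying the same &lt;code&gt;id&lt;/code&gt;, which is how a client matches responses to the commands it sent.&lt;/p&gt;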

</description>
      <category>docker</category>
      <category>rust</category>
      <category>playwright</category>
    </item>
    <item>
      <title>How I optimized my blog images using Rust</title>
      <dc:creator>Artur Daschevici</dc:creator>
      <pubDate>Wed, 12 Feb 2025 10:15:00 +0000</pubDate>
      <link>https://dev.to/adaschevici/how-i-optimized-my-blog-images-using-rust-3008</link>
      <guid>https://dev.to/adaschevici/how-i-optimized-my-blog-images-using-rust-3008</guid>
      <description>&lt;h2&gt;
  
  
  Why?
&lt;/h2&gt;

&lt;p&gt;Ages ago I read a blog post about how optimizing images gives visitors a better experience because the site loads faster, yadda, yadda. Why would anyone not want this?&lt;/p&gt;

&lt;p&gt;My blog is built with &lt;a href="https://www.getzola.org/" rel="noopener noreferrer"&gt;zola&lt;/a&gt;, a static site generator written in &lt;code&gt;rust&lt;/code&gt;, because I am a bit of a &lt;code&gt;rust&lt;/code&gt; fanboy and want to use &lt;code&gt;rust&lt;/code&gt; whenever it makes sense and my novice skills can handle it. Not going to lie, I am not a &lt;code&gt;rust&lt;/code&gt; expert, and &lt;code&gt;ChatGPT&lt;/code&gt; is a useful tool for me since, more often than not, it points me in the right direction.&lt;/p&gt;

&lt;p&gt;But I digress. A few days ago I decided to resurrect my blog. Last year I tried to write more consistently and got a streak of a few articles going. I was feeling pretty good about it, but then my daughter was born, I was thrown into the gauntlet of figuring things out as a first-time dad, and the blog was left to rot. Since I am now a bit more web-marketing savvy, I decided to add some SEO to my blog, hoping to get more visitors and a sense of how popular it is.&lt;/p&gt;

&lt;p&gt;I try to be polite to the people who land on my blog and not track them, so I don't use cookies. I host my stuff on &lt;code&gt;cloudflare&lt;/code&gt; since that gives me the best bang for my buck. In other words, I want my blog to be performant and free to host.&lt;/p&gt;

&lt;p&gt;My blog uses the analytics available through &lt;a href="https://www.cloudflare.com/web-analytics/" rel="noopener noreferrer"&gt;&lt;code&gt;cloudflare&lt;/code&gt;&lt;/a&gt;, which respect user privacy in that they are &lt;code&gt;GDPR&lt;/code&gt; and &lt;code&gt;CCPA&lt;/code&gt; compliant. This saves me the hassle of adding a cookie consent form that disrupts navigation. I both like and dislike the analytics from &lt;code&gt;cloudflare&lt;/code&gt;, because the numbers I see are a bit weird: I only ever see a constant count.&lt;/p&gt;

&lt;p&gt;Having learned a bit more about &lt;code&gt;SEO&lt;/code&gt; and &lt;a href="https://search.google.com/search-console/about" rel="noopener noreferrer"&gt;&lt;code&gt;Google Search Console&lt;/code&gt;&lt;/a&gt;, I decided to check my blog's performance and see what I could improve. I submitted my sitemap and ran a performance check, and even though performance scored 100/100, I saw that the images were not optimized.&lt;/p&gt;

&lt;p&gt;My OCD kicked in and I had to find a way to address it, especially since I remembered reading &lt;a href="https://endler.dev/2020/perf" rel="noopener noreferrer"&gt;an article&lt;/a&gt; on the topic. I dug into it a bit and noticed that its author uses &lt;code&gt;ImageMagick&lt;/code&gt;, &lt;code&gt;cavif&lt;/code&gt; and &lt;code&gt;cwebp&lt;/code&gt; to optimize the images. I decided to go a different way, essentially reinventing the wheel: I built a &lt;code&gt;rust cli&lt;/code&gt; that converts larger &lt;code&gt;png&lt;/code&gt; and &lt;code&gt;jpeg&lt;/code&gt; images to &lt;code&gt;webp&lt;/code&gt; and &lt;code&gt;avif&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcn92fx817ulkee4tp7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcn92fx817ulkee4tp7z.png" alt="Image optimization for SEO benefits" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How?
&lt;/h2&gt;

&lt;p&gt;The step-by-step process looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Change the image shortcode to render the optimal format when the browser supports it&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write the rust tool that traverses the directory tree and converts images to &lt;code&gt;webp&lt;/code&gt; and &lt;code&gt;avif&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Integrate the tool into the GitHub Actions pipeline&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add caching to the GitHub workflow to avoid spending too many GitHub minutes on the conversion&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's in it for me?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Storage Cost Savings&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AVIF can be ~70% smaller than PNG&lt;/strong&gt; while maintaining similar or better quality.&lt;/li&gt;
&lt;li&gt;If you're storing images on &lt;strong&gt;AWS S3, Google Cloud Storage, DigitalOcean Spaces, or another cloud provider&lt;/strong&gt;, reducing storage by &lt;strong&gt;70%&lt;/strong&gt; directly cuts storage costs by the same percentage.
&lt;strong&gt;Example:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;100GB of PNGs → ~30GB of AVIF&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;If storage costs &lt;strong&gt;$0.023 per GB (AWS S3 Standard)&lt;/strong&gt;:&lt;/li&gt;
&lt;li&gt;PNG: &lt;strong&gt;$2.30/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;AVIF: &lt;strong&gt;$0.69/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Savings: ~$1.61 per 100GB/month (~70%)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Egress Bandwidth Cost Savings&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Most cloud providers charge for outbound bandwidth (data transferred to users).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Smaller AVIF files mean lower bandwidth usage, leading to significant savings.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assuming AVIF files are ~70% smaller, bandwidth usage drops by the same ~70% compared to PNG.&lt;/strong&gt;
&lt;strong&gt;Example with AWS CloudFront:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data transfer cost (to the internet):&lt;/strong&gt; &lt;strong&gt;$0.085 per GB&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;If you serve 1TB of PNGs per month:&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PNG: 1TB → $85/month&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AVIF (70% smaller): 0.3TB → $25.50/month&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Savings: ~$59.50 per TB/month (~70%)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. CDN Caching &amp;amp; Requests&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Many e-commerce sites use a &lt;strong&gt;CDN (Cloudflare, CloudFront, Fastly, etc.)&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Smaller images:

&lt;ul&gt;
&lt;li&gt;Improve &lt;strong&gt;cache hit ratio&lt;/strong&gt; (more images fit in CDN cache).&lt;/li&gt;
&lt;li&gt;Reduce &lt;strong&gt;origin fetch requests&lt;/strong&gt;, further lowering egress costs.&lt;/li&gt;
&lt;li&gt;Speed up load times, improving user experience.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Total Cost Savings Estimate&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Cost Factor&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;PNG&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;AVIF&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Savings&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage (100GB)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$2.30&lt;/td&gt;
&lt;td&gt;$0.69&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1.61 (70%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Egress (1TB/month)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$85.00&lt;/td&gt;
&lt;td&gt;$25.50&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$59.50 (70%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total (storage + egress)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$87.30/month&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$26.19/month&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$61.11/month (70%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
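&lt;p&gt;The numbers above are plain multiplications, so here is a small Rust sketch that reproduces them. The prices are the ones quoted in this section (S3 Standard at $0.023/GB, CloudFront at $0.085/GB), 1TB is treated as 1000GB as in the table, and the 0.3 size ratio is the assumed ~70% reduction rather than a measured value.&lt;/p&gt;

```rust
/// Monthly cost in USD for `gb` gigabytes at `usd_per_gb`.
fn monthly_cost(gb: f64, usd_per_gb: f64) -> f64 {
    gb * usd_per_gb
}

/// Returns (png_cost, avif_cost, savings) per month, assuming the AVIF
/// versions are `avif_ratio` times the size of the PNGs (0.3 = 70% smaller).
fn compare(png_gb: f64, avif_ratio: f64, usd_per_gb: f64) -> (f64, f64, f64) {
    let png = monthly_cost(png_gb, usd_per_gb);
    let avif = monthly_cost(png_gb * avif_ratio, usd_per_gb);
    (png, avif, png - avif)
}
```

&lt;p&gt;With these inputs, &lt;code&gt;compare(100.0, 0.3, 0.023)&lt;/code&gt; gives roughly (2.30, 0.69, 1.61) and &lt;code&gt;compare(1000.0, 0.3, 0.085)&lt;/code&gt; gives roughly (85.00, 25.50, 59.50); adding the two rows yields $87.30 vs $26.19 per month, i.e. about $61.11 saved.&lt;/p&gt;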

&lt;h2&gt;
  
  
  The Playbook
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Step 1: Shortcode
&lt;/h4&gt;

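&lt;p&gt;I avoided javascript for this since &lt;code&gt;html&lt;/code&gt; already provides a mechanism for rendering an image with fallbacks: inside a &lt;code&gt;picture&lt;/code&gt; element the browser uses the first &lt;code&gt;source&lt;/code&gt; whose &lt;code&gt;type&lt;/code&gt; it supports and otherwise falls back to the &lt;code&gt;img&lt;/code&gt;&lt;br&gt;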
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;picture&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;source&lt;/span&gt; &lt;span class="na"&gt;srcset=&lt;/span&gt;&lt;span class="s"&gt;"{{id}}.avif"&lt;/span&gt; &lt;span class="err"&gt;{%&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt; &lt;span class="na"&gt;alt&lt;/span&gt; &lt;span class="err"&gt;%}&lt;/span&gt;&lt;span class="na"&gt;alt=&lt;/span&gt;&lt;span class="s"&gt;"{{alt}}"&lt;/span&gt; &lt;span class="err"&gt;{%&lt;/span&gt; &lt;span class="na"&gt;endif&lt;/span&gt; &lt;span class="err"&gt;%}&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;source&lt;/span&gt; &lt;span class="na"&gt;srcset=&lt;/span&gt;&lt;span class="s"&gt;"{{id}}.webp"&lt;/span&gt; &lt;span class="err"&gt;{%&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt; &lt;span class="na"&gt;alt&lt;/span&gt; &lt;span class="err"&gt;%}&lt;/span&gt;&lt;span class="na"&gt;alt=&lt;/span&gt;&lt;span class="s"&gt;"{{alt}}"&lt;/span&gt; &lt;span class="err"&gt;{%&lt;/span&gt; &lt;span class="na"&gt;endif&lt;/span&gt; &lt;span class="err"&gt;%}&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;img&lt;/span&gt; &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"{{id}}.png"&lt;/span&gt; &lt;span class="err"&gt;{%&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt; &lt;span class="na"&gt;alt&lt;/span&gt; &lt;span class="err"&gt;%}&lt;/span&gt;&lt;span class="na"&gt;alt=&lt;/span&gt;&lt;span class="s"&gt;"{{alt}}"&lt;/span&gt; &lt;span class="err"&gt;{%&lt;/span&gt; &lt;span class="na"&gt;endif&lt;/span&gt; &lt;span class="err"&gt;%}&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/picture&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 2: Rust tool
&lt;/h4&gt;

&lt;p&gt;I structure my posts so that each one lives neatly in its own folder, along with its images and any other assets that add value to the content.&lt;/p&gt;

&lt;p&gt;So, from the content folder I grab all the &lt;code&gt;png&lt;/code&gt; images&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;input_paths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Params&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"content/**/*.png"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;
        &lt;span class="nf"&gt;.filter_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Params&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;should_recreate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="py"&gt;.recreate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="nn"&gt;Default&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The catch is that the theme also contains images that need converting. At the moment the only one is my logo, a jpg, which is also flagged for resizing&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;theme_image_paths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Params&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"themes/**/*.jpg"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;
        &lt;span class="nf"&gt;.filter_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Result&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Params&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;should_recreate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="py"&gt;.recreate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;should_resize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="nf"&gt;.collect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
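&lt;p&gt;Both snippets build a &lt;code&gt;Params&lt;/code&gt; value per image, and the first relies on struct update syntax (&lt;code&gt;..Default::default()&lt;/code&gt;) to fill in the fields it does not mention. The struct itself is not shown in the post, so the sketch below is a hypothetical reconstruction: the field names come from the snippets, while the types, the helper functions and the derived &lt;code&gt;Default&lt;/code&gt; are assumptions (some &lt;code&gt;Default&lt;/code&gt; impl is what makes &lt;code&gt;..Default::default()&lt;/code&gt; compile).&lt;/p&gt;

```rust
use std::path::PathBuf;

// Hypothetical reconstruction of `Params`: field names from the post,
// types and derives assumed.
#[derive(Debug, Default, Clone, PartialEq)]
struct Params {
    path: PathBuf,
    should_recreate: bool,
    should_resize: bool,
}

/// Post images: every field not listed falls back to its Default value,
/// so `should_resize` ends up false.
fn content_params(path: PathBuf, recreate: bool) -> Params {
    Params { path, should_recreate: recreate, ..Default::default() }
}

/// Theme images (currently just the logo) are flagged for resizing as well.
fn theme_params(path: PathBuf, recreate: bool) -> Params {
    Params { path, should_recreate: recreate, should_resize: true }
}
```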



&lt;p&gt;To save time, the tool skips images that were already converted: it checks whether the output path exists and, if it does, skips the conversion. Working with paths is surprisingly straightforward; I was previously quite afraid to write code in &lt;code&gt;rust&lt;/code&gt; because I feared the overhead.&lt;/p&gt;

&lt;p&gt;The actual code is stupid easy to understand and reason about, even for me&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;    &lt;span class="c1"&gt;//  webp file path&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;webp_file_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parent_dir&lt;/span&gt;&lt;span class="nf"&gt;.join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{}.webp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_stem&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.as_str&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt; &lt;span class="c1"&gt;// Convert to .webp as an example&lt;/span&gt;
    &lt;span class="c1"&gt;// was it already converted?&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;webp_file_path&lt;/span&gt;&lt;span class="nf"&gt;.exists&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
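&lt;p&gt;The skip check boils down to "is the output already on disk?". Here is a std-only sketch of the same idea. The helper names are hypothetical, &lt;code&gt;Path::with_extension&lt;/code&gt; is used as a shorthand for joining the parent directory with &lt;code&gt;{file_stem}.webp&lt;/code&gt;, and letting &lt;code&gt;should_recreate&lt;/code&gt; force a rebuild is an assumption based on the flag's name rather than something the snippet shows.&lt;/p&gt;

```rust
use std::path::PathBuf;

/// Output path next to the source image: content/post/cat.png becomes content/post/cat.webp.
fn webp_path_for(input: PathBuf) -> PathBuf {
    input.with_extension("webp")
}

/// Convert when the output is missing, or when a rebuild was requested
/// (assumed meaning of the `should_recreate` flag).
fn needs_conversion(output: PathBuf, should_recreate: bool) -> bool {
    should_recreate || !output.exists()
}
```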



&lt;p&gt;A big chunk of the code is the conversion itself, which also gave me the most brain pain.&lt;br&gt;
The &lt;code&gt;webp&lt;/code&gt; conversion uses the &lt;a href="https://docs.rs/webp/latest/webp/" rel="noopener noreferrer"&gt;webp crate&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;convert_to_webp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;DynamicImage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AnyResult&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;encoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;WebpEncoder&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;webp_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;encoder&lt;/span&gt;&lt;span class="nf"&gt;.encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;75.0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Quality 75&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;File&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="nf"&gt;.write_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;webp_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Saved WebP to {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then &lt;code&gt;avif&lt;/code&gt; using the &lt;a href="https://crates.io/crates/ravif" rel="noopener noreferrer"&gt;ravif crate&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;convert_to_avif&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;DynamicImage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AnyResult&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="nf"&gt;.dimensions&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;rgba&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="nf"&gt;.to_rgba8&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;encoded_avif&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Encoder&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.with_quality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;50.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.with_alpha_quality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;50.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.with_speed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.with_alpha_color_mode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;AlphaColorMode&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;UnassociatedClean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.with_num_threads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;avif_pixels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rgba&lt;/span&gt;
        &lt;span class="nf"&gt;.pixels&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Rgba&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="py"&gt;.collect&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Rgba&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;EncodedImage&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;avif_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;color_byte_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;alpha_byte_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;..&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;encoded_avif&lt;/span&gt;
        &lt;span class="nf"&gt;.encode_rgba&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Img&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;avif_pixels&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="nf"&gt;.try_into&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="nf"&gt;.try_into&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;File&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="nf"&gt;.write_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;avif_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Saved AVIF to {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Converting to &lt;code&gt;avif&lt;/code&gt; was a bit more convoluted because the encoder needs its pixels in &lt;code&gt;rgba&lt;/code&gt; format. I had to convert the image to &lt;code&gt;rgba&lt;/code&gt; and then map each pixel to the encoder's own &lt;code&gt;Rgba&lt;/code&gt; type (thank you, libs with different types). This was a bit of a pain, but I managed to get it working.&lt;/p&gt;

&lt;p&gt;I'm not an image processing expert, so the solution was the result of a long trial-and-error conversation with &lt;code&gt;ChatGPT&lt;/code&gt;. Then again, this is why I love &lt;code&gt;rust&lt;/code&gt; and how strict it is: it forces you to write code in such a way that if it compiles, it is most likely correct.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 3: GitHub Action
&lt;/h4&gt;

&lt;p&gt;The workflow installs &lt;code&gt;rust&lt;/code&gt; (the nightly toolchain) and adds cargo's bin directory to the &lt;code&gt;PATH&lt;/code&gt;, so the cli can be built and run&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install Rust&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y&lt;/span&gt;
    &lt;span class="s"&gt;rustup toolchain install nightly&lt;/span&gt;
    &lt;span class="s"&gt;rustup default nightly&lt;/span&gt;
    &lt;span class="s"&gt;echo "$HOME/.cargo/bin" &amp;gt;&amp;gt; $GITHUB_PATH&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Verify Rust Installation&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;rustup --version&lt;/span&gt;
    &lt;span class="s"&gt;rustc --version&lt;/span&gt;
    &lt;span class="s"&gt;cargo --version&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build CLI tool&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;cargo build --manifest-path ./helpers/image-optimizer/Cargo.toml --release --verbose&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Perform the optimization&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./helpers/image-optimizer/target/release/image-optimizer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One thing I particularly liked is that &lt;code&gt;cargo build&lt;/code&gt; accepts a relative path to &lt;code&gt;Cargo.toml&lt;/code&gt; via &lt;code&gt;--manifest-path&lt;/code&gt;, which means no mucking about with paths.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 4: Caching
&lt;/h4&gt;

&lt;p&gt;One thing about &lt;code&gt;rust&lt;/code&gt; that is a bit of a bummer is that builds take quite some time; I guess that is the price to pay for static memory analysis. I would love to do a deep dive on optimizing rust build times at some point, but that is a story for another time.&lt;/p&gt;

&lt;p&gt;The one thing &lt;code&gt;github&lt;/code&gt; holds you accountable for is the number of build minutes your workflows use, so having rust install itself, download dependencies and then build the tool on every run quickly adds up.&lt;/p&gt;

&lt;p&gt;The build-time optimization covers caching both the cargo dependencies and the built binary.&lt;/p&gt;

&lt;p&gt;I cached most of what I could, but I am getting mixed results when trying to cache apt packages; the naive approach simply does not seem to work as intended.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install OS Dependencies (if needed)&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;# Create a file listing your required packages, one per line.&lt;/span&gt;
    &lt;span class="s"&gt;cat &amp;gt; apt-packages.txt &amp;lt;&amp;lt; EOF&lt;/span&gt;
    &lt;span class="s"&gt;nasm&lt;/span&gt;
    &lt;span class="s"&gt;EOF&lt;/span&gt;
    &lt;span class="s"&gt;sudo apt-get update&lt;/span&gt;
    &lt;span class="s"&gt;sudo apt-get install -y --no-install-recommends $(cat apt-packages.txt)&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Cache Rust toolchain&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/cache@v3&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;~/.rustup&lt;/span&gt;
    &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ runner.os }}-rustup-${{ hashFiles('rust-toolchain') }}&lt;/span&gt;
    &lt;span class="na"&gt;restore-keys&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;${{ runner.os }}-rustup-&lt;/span&gt;

  &lt;span class="s"&gt;...&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Cache Cargo dependencies and target&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/cache@v3&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;~/.cargo/registry&lt;/span&gt;
      &lt;span class="s"&gt;~/.cargo/git&lt;/span&gt;
      &lt;span class="s"&gt;./helpers/image-optimizer/target&lt;/span&gt;
    &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ runner.os }}-manual-cargo-${{ hashFiles('**/Cargo.lock') }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
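
&lt;p&gt;The cache keys above follow a simple rule: each key embeds the runner OS and a hash of the lockfile, so a changed &lt;code&gt;Cargo.lock&lt;/code&gt; produces a new key (a cache miss), while an unchanged one reuses the cache; &lt;code&gt;restore-keys&lt;/code&gt; entries act as a fallback matched by prefix. The std-only sketch below illustrates the idea. The function names are hypothetical, and &lt;code&gt;DefaultHasher&lt;/code&gt; merely stands in for the SHA-256 that &lt;code&gt;hashFiles&lt;/code&gt; actually computes.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;
```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical mirror of `${{ runner.os }}-manual-cargo-${{ hashFiles('**/Cargo.lock') }}`.
/// DefaultHasher is a stand-in for the SHA-256 used by hashFiles.
fn cache_key(runner_os: String, lockfile_contents: String) -> String {
    let mut hasher = DefaultHasher::new();
    hasher.write(lockfile_contents.as_bytes());
    format!("{}-manual-cargo-{:016x}", runner_os, hasher.finish())
}

/// A restore key matches any saved key that starts with it (prefix match).
fn restore_key_matches(restore_key: String, saved_key: String) -> bool {
    saved_key.starts_with(restore_key.as_str())
}

fn main() {
    // Toy lockfile contents, not a real Cargo.lock.
    let before = cache_key("Linux".to_string(), "serde 1.0.200".to_string());
    let same = cache_key("Linux".to_string(), "serde 1.0.200".to_string());
    let after = cache_key("Linux".to_string(), "serde 1.0.201".to_string());

    // Unchanged lockfile: same key, so the cache is hit.
    assert_eq!(before, same);
    // Changed lockfile: new key, so the old entry is not an exact hit.
    assert_ne!(before, after);
    // A prefix-style restore key still matches the older entry.
    assert!(restore_key_matches("Linux-manual-cargo-".to_string(), before.clone()));

    println!("{}", before);
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;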



&lt;p&gt;The time savings from caching are substantial: the first run of the workflow took 15 minutes, while the run with cached dependencies took less than 1 minute. Even with a large number of images, this will most likely not be a bottleneck.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Optimizing images can decrease the load on the server by up to 70%&lt;/li&gt;
&lt;li&gt;... and it also improves performance, which is beneficial for SEO&lt;/li&gt;
&lt;li&gt;Optimizing GitHub workflows can save you a lot of wait time&lt;/li&gt;
&lt;li&gt;... and GitHub minutes&lt;/li&gt;
&lt;li&gt;While this was interesting to do, I optimized for something that did not move the needle at all; it just made the evaluation in the &lt;code&gt;Google Search Console&lt;/code&gt; a bit better.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I have my OCD to blame for this. I am happy with the result, and I learned quite a few things about &lt;code&gt;rust&lt;/code&gt; and &lt;code&gt;image&lt;/code&gt; processing. I am also happy that I managed to get the &lt;code&gt;avif&lt;/code&gt; conversion working, as it is a format that is not yet widely supported but is one of the most efficient formats out there.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>cdn</category>
      <category>seo</category>
      <category>avif</category>
    </item>
    <item>
      <title>Rustify some puppeteer code(part I)</title>
      <dc:creator>Artur Daschevici</dc:creator>
      <pubDate>Thu, 27 Jun 2024 15:30:00 +0000</pubDate>
      <link>https://dev.to/adaschevici/rustify-some-puppeteer-code-3n33</link>
      <guid>https://dev.to/adaschevici/rustify-some-puppeteer-code-3n33</guid>
      <description>&lt;h2&gt;
  
  
  Why?
&lt;/h2&gt;

&lt;p&gt;Rust is pretty amazing, but there are a few things that you might be wary about. There are &lt;a href="https://discord.com/blog/why-discord-is-switching-from-go-to-rust" rel="noopener noreferrer"&gt;few war stories&lt;/a&gt; of companies building their entire stack on &lt;code&gt;rust&lt;/code&gt; and then living happily ever after. Software is an ever-evolving organism, so in the &lt;a href="https://www.darwinproject.ac.uk/people/about-darwin/six-things-darwin-never-said/evolution-misquotation" rel="noopener noreferrer"&gt;darwinian sense the more adaptable the better&lt;/a&gt;. Enough of that, though: I am not here to advocate any particular language or framework. What I want is to share my experience writing a &lt;code&gt;rust&lt;/code&gt; equivalent of the scraper from &lt;a href="https://dev.to/adaschevici/gopherizing-some-puppeteer-code-29g4"&gt;my previous post&lt;/a&gt;, where I used &lt;code&gt;golang&lt;/code&gt; and &lt;code&gt;chromedp&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Using &lt;code&gt;go&lt;/code&gt; with &lt;code&gt;chromedp&lt;/code&gt; to automate Chrome was a pretty good experience, but it is not as powerful as what is available in &lt;code&gt;puppeteer&lt;/code&gt;, so I figured I would have a look at what the &lt;code&gt;rust&lt;/code&gt; landscape has to offer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fueockjszks2z31802f0l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fueockjszks2z31802f0l.png" alt="Rust Puppeteering" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;rust&lt;/code&gt; there are several libraries that deal with browser automation; the ones I have had a look at are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/jonhoo/fantoccini" rel="noopener noreferrer"&gt;fantoccini&lt;/a&gt; - a high-level API for programmatically interacting with web pages through WebDriver, but I wanted the Chrome DevTools Protocol instead.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/rust-headless-chrome/rust-headless-chrome" rel="noopener noreferrer"&gt;rust-headless-chrome&lt;/a&gt; - a Chrome DevTools Protocol client library in rust, not as actively developed as the crate I wound up using.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/mattsse/chromiumoxide" rel="noopener noreferrer"&gt;chromiumoxide&lt;/a&gt; - this is the one that seems to be the most actively developed, so it looked like a good choice at the time of writing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;As I was rereading one of my older posts that focuses on quasi-live coding, I realized it was boring as hell. If your attention span is that of a goldfish, like mine, it probably makes sense for me to just drop in a link to the &lt;a href="https://github.com/adaschevici/rustic-toy-chest/tree/main/rust-crawl-pupp" rel="noopener noreferrer"&gt;repo&lt;/a&gt; so that you can download the code and try it out yourself. The repo is a collection of rust prototypes that I have been building for fun and learning; unfortunately, I haven't yet had a compelling reason to use rust in production 😢.&lt;/p&gt;

&lt;h2&gt;
  
  
  How?
&lt;/h2&gt;

&lt;p&gt;To my surprise, the code was closer in structure to the &lt;a href="https://pptr.dev/" rel="noopener noreferrer"&gt;&lt;code&gt;puppeteer&lt;/code&gt;&lt;/a&gt; version than to the &lt;a href="https://github.com/chromedp/chromedp" rel="noopener noreferrer"&gt;&lt;code&gt;chromedp&lt;/code&gt;&lt;/a&gt; one. The &lt;code&gt;chromedp&lt;/code&gt; version uses nested context declarations to manage the browser and page runtimes, while the &lt;code&gt;rust&lt;/code&gt; version uses a more linear approach: you construct a browser instance and then interact with it as a user would. This suggests that the &lt;code&gt;chromiumoxide&lt;/code&gt; API is higher level.&lt;/p&gt;

&lt;p&gt;To keep your use cases separate, you can add &lt;a href="https://docs.rs/clap/latest/clap/" rel="noopener noreferrer"&gt;&lt;code&gt;clap&lt;/code&gt;&lt;/a&gt; to your project and use command line subcommands to select the use case you want to run.&lt;/p&gt;

&lt;p&gt;You will see that I have covered most cases, but not everything is transferable from &lt;code&gt;puppeteer&lt;/code&gt; or &lt;code&gt;chromedp&lt;/code&gt; to the &lt;code&gt;chromiumoxide&lt;/code&gt; version. I will not go through the setup of &lt;code&gt;rustup&lt;/code&gt;, the rust toolchain or &lt;code&gt;cargo&lt;/code&gt;, as this is a basic and well-documented process: search for &lt;code&gt;getting started with rust&lt;/code&gt; and you will find plenty of resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  Show me the code
&lt;/h2&gt;

&lt;h4&gt;
  
  
  1. Laying down the foundation
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;set up my project root&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo new rust-crawl-pupp
&lt;span class="nb"&gt;cd &lt;/span&gt;rust-crawl-pupp
cargo &lt;span class="nb"&gt;install &lt;/span&gt;cargo-edit &lt;span class="c"&gt;# provides cargo upgrade; cargo add is built into cargo since Rust 1.62&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;add dependencies via &lt;code&gt;cargo add&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="py"&gt;chromiumoxide&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.5.7"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"tokio"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"tokio-runtime"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="c"&gt;# this is the main dependency&lt;/span&gt;
&lt;span class="py"&gt;chromiumoxide_cdp&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.5.2"&lt;/span&gt; &lt;span class="c"&gt;# this is the devtools protocol&lt;/span&gt;
&lt;span class="py"&gt;clap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"4.5.7"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"derive"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"cargo"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="c"&gt;# this is for command line parsing&lt;/span&gt;
&lt;span class="py"&gt;futures&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.3.30"&lt;/span&gt; &lt;span class="c"&gt;# this is for async programming&lt;/span&gt;
&lt;span class="py"&gt;tokio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1.38.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"full"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="c"&gt;# this is the async runtime&lt;/span&gt;
&lt;span class="py"&gt;tracing&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.1.40"&lt;/span&gt; &lt;span class="c"&gt;# this is for logging&lt;/span&gt;
&lt;span class="py"&gt;tracing-subscriber&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.3.18"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"registry"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"env-filter"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="c"&gt;# this is for logging&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;add &lt;code&gt;clap&lt;/code&gt; command line parsing to the project so that each use case can be called via its own subcommand&lt;br&gt;
first, define your imports&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;clap&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;Parser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Subcommand&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;define your command structs for parsing the command line arguments; this allows each use case to be called with its own subcommand, like so: &lt;code&gt;cargo run -- first-project&lt;/code&gt;, &lt;code&gt;cargo run -- second-project&lt;/code&gt;, and so on.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[derive(Parser)]&lt;/span&gt;
&lt;span class="nd"&gt;#[command(&lt;/span&gt;
    &lt;span class="nd"&gt;name&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"OxideCrawler"&lt;/span&gt;&lt;span class="nd"&gt;,&lt;/span&gt;
    &lt;span class="nd"&gt;version&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.1"&lt;/span&gt;&lt;span class="nd"&gt;,&lt;/span&gt;
    &lt;span class="nd"&gt;author&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"artur"&lt;/span&gt;&lt;span class="nd"&gt;,&lt;/span&gt;
    &lt;span class="nd"&gt;about&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"An example application using clap"&lt;/span&gt;
&lt;span class="nd"&gt;)]&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Cli&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;#[command(subcommand)]&lt;/span&gt;
    &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Commands&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nd"&gt;#[derive(Subcommand,&lt;/span&gt; &lt;span class="nd"&gt;Debug)]&lt;/span&gt;
&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;Commands&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;FirstProject&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt;
    &lt;span class="n"&gt;SecondProject&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;the way you can hook this into the main function is via a &lt;code&gt;match&lt;/code&gt; statement that will call the appropriate function based on the subcommand that was passed in.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Cli&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="py"&gt;.command&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nn"&gt;Commands&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;FirstProject&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;user_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;spoof_user_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nd"&gt;info!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"User agent detected"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;
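
&lt;p&gt;The &lt;code&gt;cargo run -- first-project&lt;/code&gt; invocation works because clap's derive, by default, names each subcommand after its enum variant converted to kebab-case, so &lt;code&gt;FirstProject&lt;/code&gt; becomes &lt;code&gt;first-project&lt;/code&gt;. The std-only sketch below mimics that rule for simple CamelCase names; &lt;code&gt;to_kebab_case&lt;/code&gt; is a hypothetical helper, and clap's own conversion handles more edge cases (acronyms, for example).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;
```rust
/// Hypothetical re-implementation of the default naming rule for simple
/// CamelCase: every uppercase letter starts a new lowercase word joined by '-'.
fn to_kebab_case(name: String) -> String {
    let mut out = String::new();
    for (i, ch) in name.chars().enumerate() {
        if ch.is_ascii_uppercase() {
            // No leading dash before the very first word.
            if i > 0 {
                out.push('-');
            }
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}

fn main() {
    for variant in ["FirstProject", "SecondProject"] {
        println!("{}: cargo run -- {}", variant, to_kebab_case(variant.to_string()));
    }
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;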

&lt;h4&gt;
  
  
  2. Starting the browser and cleaning up
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;use the &lt;code&gt;launch&lt;/code&gt; method and its options to start the browser. If the viewport and window size are different, the browser will start in windowed mode, with the page area smaller than the window.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;chromiumoxide&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;Browser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BrowserConfig&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;StreamExt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// provides handler.next()&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Browser&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nn"&gt;BrowserConfig&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.with_head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;// start the browser with a visible window (not headless)&lt;/span&gt;
        &lt;span class="nf"&gt;.no_sandbox&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;// this will disable the sandbox&lt;/span&gt;
        &lt;span class="nf"&gt;.viewport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// disable viewport emulation so the page follows the window size&lt;/span&gt;
        &lt;span class="nf"&gt;.window_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// this will set the window size&lt;/span&gt;
        &lt;span class="nf"&gt;.build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;task&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;spawn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;loop&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="nf"&gt;.next&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nf"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="nb"&gt;None&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;the browser cleanup needs to be done correctly; if you miss anything, you will see one of two symptoms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the browser does not close and the program hangs at the end&lt;/li&gt;
&lt;li&gt;you might get a warning like the following:
&lt;/li&gt;
&lt;/ul&gt;

&lt;pre class="highlight shell"&gt;&lt;code&gt;  2024-06-26T08:40:01.418414Z  WARN chromiumoxide::browser: Browser was not closed manually, it will be killed automatically &lt;span class="k"&gt;in &lt;/span&gt;the background
&lt;/code&gt;&lt;/pre&gt;



&lt;p&gt;to clean up the browser instance correctly, call the following on every code path that closes the browser:&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="nf"&gt;.close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="nf"&gt;.wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;




&lt;/li&gt;

&lt;/ul&gt;
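
&lt;p&gt;The hang is an ordering problem: as the loop above is written, the spawned handler task only exits once the browser's event stream ends (or errors), so awaiting &lt;code&gt;handle&lt;/code&gt; while the browser is still open never returns. The std-only analogy below reproduces that ordering with a thread draining a channel: dropping the sender plays the role of &lt;code&gt;browser.close()&lt;/code&gt;, and joining the thread plays the role of &lt;code&gt;handle.await&lt;/code&gt;. It is an illustration only, not how &lt;code&gt;chromiumoxide&lt;/code&gt; works internally, and &lt;code&gt;run_event_loop&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;
```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in: sends `events` messages to a worker thread that
/// drains a channel (like the handler loop) and returns how many it handled.
fn run_event_loop(events: u32) -> u32 {
    let (tx, rx) = mpsc::channel();

    // The worker's `for` loop ends only when every sender has been dropped,
    // much like the handler loop ends only when the event stream ends.
    let handle = thread::spawn(move || {
        let mut handled = 0;
        for _event in rx {
            handled += 1;
        }
        handled
    });

    for i in 0..events {
        tx.send(i).unwrap();
    }

    // Analogous to `browser.close()`: end the stream first...
    drop(tx);
    // ...then join, analogous to `handle.await`. Joining before `drop(tx)`
    // would block forever, which is the "hangs at the end" symptom.
    handle.join().unwrap()
}

fn main() {
    println!("handled {} events", run_event_loop(3));
}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;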

&lt;h4&gt;
  
  
  3. Use cases
&lt;/h4&gt;

&lt;p&gt;In the &lt;a href="https://github.com/adaschevici/rustic-toy-chest/tree/main/rust-crawl-pupp" rel="noopener noreferrer"&gt;repo&lt;/a&gt;, each use case usually lives in its own module. Closely related use cases sometimes share a module, as in use case &lt;code&gt;c&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;a. Spoof your user agent:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The only way I have found to set the user agent is via the &lt;code&gt;set_user_agent&lt;/code&gt; method on the &lt;a href="https://docs.rs/chromiumoxide/latest/chromiumoxide/page/struct.Page.html#" rel="noopener noreferrer"&gt;&lt;code&gt;Page&lt;/code&gt;&lt;/a&gt; struct&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="nf"&gt;.new_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"about:blank"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="nf"&gt;.set_user_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s"&gt;"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) &lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s"&gt;
       Chrome/58.0.3029.110 Safari/537.36"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="nf"&gt;.goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"https://www.whatismybrowser.com/detect/what-is-my-user-agent"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;b. Grabbing the full content of the page&lt;/strong&gt; is pretty straightforward&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;
      &lt;span class="nf"&gt;.new_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"https://scrapingclub.com/exercise/list_infinite_scroll/"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="nf"&gt;.content&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;c. Grabbing elements via CSS selectors&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;  &lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StreamExt&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt; &lt;span class="c1"&gt;// stream::iter plus then/filter_map/collect&lt;/span&gt;

  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;elements_on_page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="nf"&gt;.find_elements&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;".post"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;elements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;iter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;elements_on_page&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="nf"&gt;.then&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="c1"&gt;// keep Some(text) only when inner_text() succeeded and returned text&lt;/span&gt;
          &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="nf"&gt;.inner_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="nf"&gt;.ok&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.flatten&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="nf"&gt;.filter_map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="py"&gt;.collect&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
      &lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;performing &lt;strong&gt;relative selection&lt;/strong&gt; from a specific node and mapping the content to &lt;code&gt;rust&lt;/code&gt; types&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;  &lt;span class="o"&gt;...&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;product_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="nf"&gt;.find_element&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"h4"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="nf"&gt;.inner_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;product_price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="nf"&gt;.find_element&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"h5"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="nf"&gt;.inner_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Product&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;product_price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="o"&gt;...&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;d. When the page has infinite scroll&lt;/strong&gt;, you have to scroll to the bottom of the page before you can collect all the elements you are interested in. To do that, inject a &lt;code&gt;javascript&lt;/code&gt; function into the page context and run it. The function below scrolls 300px every 500ms until it has scrolled past the page height, which keeps growing as new items load. The &lt;code&gt;chromiumoxide&lt;/code&gt; API has really decent support for this, and I faced much less resistance than I did with &lt;code&gt;chromedp&lt;/code&gt; and &lt;code&gt;go&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;js_script&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;r#"
      async () =&amp;gt; {
        await new Promise((resolve, reject) =&amp;gt; {
          var totalHeight = 0;
          var distance = 300; // should be less than or equal to window.innerHeight
          var timer = setInterval(() =&amp;gt; {
            var scrollHeight = document.body.scrollHeight;
            window.scrollBy(0, distance);
            totalHeight += distance;

            if (totalHeight &amp;gt;= scrollHeight) {
              clearInterval(timer);
              resolve();
            }
          }, 500);
        });
    }
  "#&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;
      &lt;span class="nf"&gt;.new_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"https://scrapingclub.com/exercise/list_infinite_scroll/"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="nf"&gt;.evaluate_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;js_script&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;e. When you need to wait for an element to load&lt;/strong&gt;, you are on your own: this was not exactly part of the &lt;code&gt;chromiumoxide&lt;/code&gt; API, so I had to hack it together. Given my limited Rust expertise there is probably a better way to do this, but this is what I came up with. An async block polls the DOM for the element every 100ms, and &lt;code&gt;tokio::time::timeout&lt;/code&gt; caps how long that block may run. If the block runs over the timeout, &lt;code&gt;element_result&lt;/code&gt; is an error; otherwise it holds the element.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;  &lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;time&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="o"&gt;...&lt;/span&gt;
  &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;element_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout_duration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;loop&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="nf"&gt;.find_element&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;selector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
              &lt;span class="c1"&gt;// Wait for a short interval before checking again&lt;/span&gt;
              &lt;span class="nf"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;time&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_millis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  4. Fixtures to replicate various scenarios
&lt;/h4&gt;

&lt;p&gt;Most websites load different parts of the page with some delay, so that one slow section does not block the entire page. To replicate this behavior, fixtures can inject nodes into the DOM after a delay. For the more edge-case scenarios I created fixtures that emulate those behaviors, so I don't have to find (and remember) a live website that behaves that way.&lt;/p&gt;

&lt;p&gt;The HTML is really basic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;  &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"container"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="c"&gt;&amp;lt;!-- New node will be appended here --&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;

  &lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"script.js"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;script.js&lt;/code&gt; file is slightly more involved, but still fairly straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;  &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEventListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;DOMContentLoaded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Function to create and append the new node&lt;/span&gt;
    &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createDelayedNode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Create a new div element&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newNode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;div&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="c1"&gt;// Add some content to the new node&lt;/span&gt;
      &lt;span class="nx"&gt;newNode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;This is a new node added after a delay.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="c1"&gt;// Add some styles to the new node&lt;/span&gt;
      &lt;span class="nx"&gt;newNode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;style&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;padding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;10px&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;newNode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;style&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;marginTop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;10px&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;newNode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;style&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;backgroundColor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#f0f0f0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;newNode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;style&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;border&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1px solid #ccc&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;newNode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;come-find-me&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="c1"&gt;// Append the new node to the container&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;container&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;container&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendChild&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newNode&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Set a delay (in milliseconds)&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// 3000ms = 3 seconds&lt;/span&gt;

    &lt;span class="c1"&gt;// Use setTimeout to create and append the node after the delay&lt;/span&gt;
    &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;createDelayedNode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



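&lt;p&gt;Once the DOM has loaded, the script waits 3 seconds. It then creates a new node with some text content and styles, gives it the id &lt;code&gt;come-find-me&lt;/code&gt;, and appends it to the container div. That makes it a handy target for the wait-for-element code above.&lt;/p&gt;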

&lt;h2&gt;
  
  
  Why to be continued?
&lt;/h2&gt;

&lt;p&gt;The only thing I hate more than a &lt;code&gt;to be continued&lt;/code&gt; in a TV show when the next episode isn't available yet is a blog post with code that looks reasonable and like it might work, but doesn't. So, going by the lesser-of-two-evils principle, I decided to make this a two-parter. That gives me time to write and test the other use cases and make sure everything works as expected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;This is one of the few times I have stuck with &lt;code&gt;rust&lt;/code&gt; through the pain, and I have to say it was a better experience than I had with &lt;code&gt;go&lt;/code&gt; and &lt;code&gt;chromedp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;writing the code was slightly faster, since there was less boilerplate&lt;/li&gt;
&lt;li&gt;messing around with wrappers and &lt;code&gt;unwrap()&lt;/code&gt; was challenging, but it probably gets easier with time&lt;/li&gt;
&lt;li&gt;the code in &lt;code&gt;rust&lt;/code&gt; looks more like &lt;code&gt;puppeteer&lt;/code&gt; than the &lt;code&gt;go&lt;/code&gt; version did&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  In Part II I will cover dealing with bot protection, handling frames, forms and more. Stay tuned!
&lt;/h4&gt;

</description>
      <category>rust</category>
      <category>scraping</category>
      <category>webcrawling</category>
    </item>
    <item>
      <title>OpenAI api RAG system with Qdrant</title>
      <dc:creator>Artur Daschevici</dc:creator>
      <pubDate>Wed, 19 Jun 2024 12:35:00 +0000</pubDate>
      <link>https://dev.to/adaschevici/openai-api-rag-system-with-qdrant-7km</link>
      <guid>https://dev.to/adaschevici/openai-api-rag-system-with-qdrant-7km</guid>
      <description>&lt;h2&gt;
  
  
  Why?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://openai.com/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt; has been making it easier and easier to build out &lt;a href="https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/" rel="noopener noreferrer"&gt;GPT agents&lt;/a&gt; that make use of your own data to improve the generated responses of the pretrained models.&lt;/p&gt;

&lt;p&gt;Agents give you a way to inject knowledge about your own proprietary data into the pipeline without having to retrain or fine-tune the model on it. You can also keep that data more current, which makes you less dependent on the model's training cycle.&lt;/p&gt;

&lt;p&gt;OpenAI has improved the DX, UX and APIs since GPT-3.5, making it easier to create &lt;code&gt;agents&lt;/code&gt; and embed your data into your own custom &lt;a href="https://openai.com/index/introducing-gpts/" rel="noopener noreferrer"&gt;&lt;code&gt;GPTs&lt;/code&gt;&lt;/a&gt;. They have lowered the barrier to entry, so virtually anyone can build an assistant that answers queries about their own data. This is perfect for experimenting with product ideas. IMO it is a very good way to enable product discovery for the masses.&lt;/p&gt;

&lt;p&gt;Most of the big AI contenders on the market give you a toolbox of high-level abstractions and low-code or no-code solutions. The quirk in how I learn is that I feel a bit helpless without some understanding of the first principles of the tech I'm using. So I figured that building my own &lt;code&gt;RAG&lt;/code&gt; (retrieval-augmented generation) system would be a good way to figure out the nuts and bolts.&lt;/p&gt;

&lt;h2&gt;
  
  
  What?
&lt;/h2&gt;

&lt;p&gt;I wanted a project that runs my own pipeline with somewhat interchangeable parts. Models can be swapped around, so you can make the most of the latest models available on &lt;a href="https://huggingface.co/" rel="noopener noreferrer"&gt;&lt;code&gt;Hugging Face&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://openai.com/" rel="noopener noreferrer"&gt;&lt;code&gt;OpenAI&lt;/code&gt;&lt;/a&gt; or wherever else.&lt;/p&gt;

&lt;p&gt;Model research is moving so fast that the top contenders overtake each other pretty much every day. A custom pipeline lets you iterate quickly and test new models as they come out. It also lets you roll back the experiment just as easily if it doesn't pan out.&lt;/p&gt;

&lt;p&gt;What I wound up building is a &lt;a href="https://streamlit.io/" rel="noopener noreferrer"&gt;&lt;code&gt;Streamlit&lt;/code&gt;&lt;/a&gt; app that uses &lt;a href="https://qdrant.com/" rel="noopener noreferrer"&gt;&lt;code&gt;qdrant&lt;/code&gt;&lt;/a&gt; to index and search data extracted from a collection of &lt;code&gt;pdf&lt;/code&gt; documents. The app is a simple chat interface where you can ask questions about the data and get answers that combine &lt;code&gt;GPT-4&lt;/code&gt; with the indexed data.&lt;/p&gt;
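&lt;p&gt;To make the moving parts of such an app concrete, here is a minimal, dependency-free sketch of the retrieve-then-prompt flow. It is an illustration, not the app's code. The helpers &lt;code&gt;chunk_text&lt;/code&gt;, &lt;code&gt;embed&lt;/code&gt;, &lt;code&gt;cosine&lt;/code&gt;, &lt;code&gt;retrieve&lt;/code&gt; and &lt;code&gt;build_prompt&lt;/code&gt; are hypothetical. A bag-of-words counter stands in for a real embedding model, and sorting by cosine similarity stands in for a &lt;code&gt;qdrant&lt;/code&gt; search.&lt;/p&gt;

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (the vector store's job)."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Stuff the retrieved chunks into the prompt that goes to the chat model."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


docs = [
    "Qdrant is a vector database used to store and search embeddings.",
    "Streamlit turns Python scripts into small web apps.",
    "Poetry manages Python dependencies through pyproject.toml.",
]

if __name__ == "__main__":
    question = "Which vector database stores embeddings?"
    print(build_prompt(question, retrieve(question, docs, k=1)))
```

&lt;p&gt;In the real pipeline, the chunks would come from the extracted PDF text and &lt;code&gt;embed&lt;/code&gt; would be a proper embedding model. The vectors would live in a &lt;code&gt;qdrant&lt;/code&gt; collection, and the prompt would go to &lt;code&gt;GPT-4&lt;/code&gt;. The shape of the flow stays the same, which is what makes the individual parts swappable.&lt;/p&gt;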

&lt;h2&gt;
  
  
  How?
&lt;/h2&gt;

&lt;h4&gt;
  
  
  1. Setting up the environment
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;use &lt;code&gt;pyenv&lt;/code&gt; to manage python versions
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# update versions&lt;/span&gt;
   pyenv update
   &lt;span class="c"&gt;# install any python version&lt;/span&gt;
   pyenv &lt;span class="nb"&gt;install &lt;/span&gt;3.12.3 &lt;span class="c"&gt;# as of writing this&lt;/span&gt;
   &lt;span class="c"&gt;# create a virtualenv&lt;/span&gt;
   ~/.pyenv/versions/3.12.3/bin/python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
   &lt;span class="c"&gt;# and then activate it&lt;/span&gt;
   &lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2. Install the dependencies
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# install poetry&lt;/span&gt;
   pip &lt;span class="nb"&gt;install &lt;/span&gt;poetry
   &lt;span class="c"&gt;# install the dependencies&lt;/span&gt;
   poetry &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dependencies section of the &lt;code&gt;pyproject.toml&lt;/code&gt; file should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;   &lt;span class="err"&gt;...&lt;/span&gt;
   &lt;span class="nn"&gt;[tool.poetry.dependencies]&lt;/span&gt;
    &lt;span class="py"&gt;python&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^3.12"&lt;/span&gt;
    &lt;span class="py"&gt;streamlit&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^1.32.1"&lt;/span&gt;
    &lt;span class="py"&gt;langchain&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^0.1.12"&lt;/span&gt;
    &lt;span class="py"&gt;python-dotenv&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^1.0.1"&lt;/span&gt;
    &lt;span class="py"&gt;qdrant-client&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^1.8.0"&lt;/span&gt;
    &lt;span class="py"&gt;openai&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^1.13.3"&lt;/span&gt;
    &lt;span class="py"&gt;huggingface-hub&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^0.21.4"&lt;/span&gt;
    &lt;span class="py"&gt;pydantic-settings&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^2.2.1"&lt;/span&gt;
    &lt;span class="py"&gt;pydantic&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^2.6.4"&lt;/span&gt;
    &lt;span class="py"&gt;pypdf2&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^3.0.1"&lt;/span&gt;
    &lt;span class="py"&gt;langchain-community&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^0.0.28"&lt;/span&gt;
    &lt;span class="py"&gt;langchain-core&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^0.1.31"&lt;/span&gt;
    &lt;span class="py"&gt;langchain-openai&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^0.0.8"&lt;/span&gt;
    &lt;span class="py"&gt;instructorembedding&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"^1.0.1"&lt;/span&gt;
    &lt;span class="py"&gt;sentence-transformers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"2.2.2"&lt;/span&gt;
   &lt;span class="err"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  3. Set up the loading of the variables from a config file
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;a nice way to manage settings is to use &lt;code&gt;pydantic&lt;/code&gt; and &lt;code&gt;pydantic-settings&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SecretStr&lt;/span&gt;
   &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_settings&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseSettings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SettingsConfigDict&lt;/span&gt;

   &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Settings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseSettings&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="n"&gt;model_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SettingsConfigDict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;env_file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config.env&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;env_file_encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="n"&gt;hf_access_token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SecretStr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HUGGINGFACEHUB_API_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="n"&gt;openai_api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SecretStr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way the settings are loaded from &lt;code&gt;config.env&lt;/code&gt;, but variables set in the environment take priority over the ones in the file.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a nice extra is that you also get type checking and validation from &lt;code&gt;pydantic&lt;/code&gt; including &lt;code&gt;SecretStr&lt;/code&gt; types for sensitive data.&lt;/li&gt;
&lt;/ul&gt;
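&lt;p&gt;A hand-rolled, stdlib-only sketch makes the precedence rule easy to picture. It resolves each setting from the process environment first and falls back to &lt;code&gt;config.env&lt;/code&gt;. This only illustrates the behavior described above. It is not how &lt;code&gt;pydantic-settings&lt;/code&gt; is implemented, and it skips the validation and &lt;code&gt;SecretStr&lt;/code&gt; masking you get from &lt;code&gt;pydantic&lt;/code&gt;. The names &lt;code&gt;read_env_file&lt;/code&gt;, &lt;code&gt;load_env_settings&lt;/code&gt; and &lt;code&gt;REQUIRED&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import os
from pathlib import Path

# Hypothetical field list mirroring the aliases on the Settings class above.
REQUIRED = ("HUGGINGFACEHUB_API_TOKEN", "OPENAI_API_KEY")


def read_env_file(env_file: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    path = Path(env_file)
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def load_env_settings(env_file: str = "config.env", fields: tuple[str, ...] = REQUIRED) -> dict[str, str]:
    """Resolve each field from the process environment first, then from the file."""
    from_file = read_env_file(env_file)
    resolved: dict[str, str] = {}
    for name in fields:
        if name in os.environ:
            resolved[name] = os.environ[name]  # the environment wins
        elif name in from_file:
            resolved[name] = from_file[name]  # fall back to config.env
        else:
            raise KeyError(f"missing required setting {name!r}")
    return resolved
```

&lt;p&gt;For example, suppose &lt;code&gt;config.env&lt;/code&gt; contains &lt;code&gt;OPENAI_API_KEY="from-file"&lt;/code&gt; and the shell exports &lt;code&gt;OPENAI_API_KEY=from-env&lt;/code&gt;. Then &lt;code&gt;load_env_settings()&lt;/code&gt; resolves the key to &lt;code&gt;from-env&lt;/code&gt;. A key found in neither place raises, much like a missing required field makes &lt;code&gt;Settings()&lt;/code&gt; fail validation.&lt;/p&gt;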

&lt;h4&gt;
  
  
  4. Set up the UI elements
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Streamlit makes it quite easy to strap together a layout for your app. The whole app is a single script that you run with the &lt;code&gt;streamlit&lt;/code&gt; binary:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   streamlit run app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://streamlit.io/components?category=all" rel="noopener noreferrer"&gt;The gallery&lt;/a&gt; has many examples of various integrations and components that you can use to build your app. You have smaller components like inputs and buttons but also more complex UI tables, charts, you even have &lt;a href="https://streamlit.io/components?category=llms" rel="noopener noreferrer"&gt;&lt;code&gt;ChatGPT&lt;/code&gt;&lt;/a&gt; style templates.&lt;/p&gt;

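&lt;p&gt;Our chat interface needs very few elements, and each one is created with a single &lt;code&gt;st&lt;/code&gt; call. Streamlit reruns the whole script from top to bottom on every interaction, which is why the answer can simply live inside an &lt;code&gt;if st.button(...)&lt;/code&gt; block:&lt;br&gt;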
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;streamlit&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;
   &lt;span class="bp"&gt;...&lt;/span&gt;
   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
       &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ChatGPT-4 Replica&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ask me anything about the data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ask me anything&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;button&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ask&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
           &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m thinking...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
           &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
           &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="bp"&gt;...&lt;/span&gt;
   &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The one thing I find a bit awkward is conditional display. Once a block has several elements that only show up under certain conditions, the nested &lt;code&gt;if&lt;/code&gt; and &lt;code&gt;with&lt;/code&gt; statements start to resemble the JavaScript pyramid of doom.&lt;/p&gt;

&lt;p&gt;Below is a simple example so you can see what I mean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please upload some PDFs to start chatting.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sidebar&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
           &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;button&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Process&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
               &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;spinner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processing...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                   &lt;span class="c1"&gt;# get raw content from pdf
&lt;/span&gt;                   &lt;span class="n"&gt;raw_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_text_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                   &lt;span class="n"&gt;text_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_text_chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                   &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vector_store&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                       &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                       &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_vector_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                       &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                       &lt;span class="c1"&gt;# create vector store for each chunk
&lt;/span&gt;                       &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Time taken to create vector store: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes me think that it is probably not designed for complex UIs but rather for quick prototyping and simple interfaces.&lt;/p&gt;
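
&lt;p&gt;One way to keep the nesting in check is to pull the innermost work into small helpers that use guard clauses. The sketch below is hypothetical: the helper name and its &lt;code&gt;build_store&lt;/code&gt; and &lt;code&gt;clock&lt;/code&gt; parameters are my own and are not part of Streamlit. It only assumes that &lt;code&gt;st.session_state&lt;/code&gt; supports &lt;code&gt;in&lt;/code&gt; checks and item assignment like a dict, so a plain dict can stand in for it when testing.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time


def build_vector_store_once(session_state, text_chunks, build_store, clock=time.time):
    """Build and cache the vector store unless it already exists.

    Hypothetical helper: returns the elapsed seconds, or None when the
    cached store was reused. The early return replaces one level of nesting.
    """
    # Guard clause: nothing to do if a previous run already cached the store
    if "vector_store" in session_state:
        return None
    start = clock()
    session_state["vector_store"] = build_store(text_chunks)
    return clock() - start


# Inside the "Process" handler the nested block could then shrink to:
# elapsed = build_vector_store_once(st.session_state, text_chunks, get_vector_store)
# if elapsed is not None:
#     st.write(f"Time taken to create vector store: {elapsed}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Passing the builder and the clock in as arguments keeps the helper free of Streamlit imports, which also makes it easy to test on its own.&lt;/p&gt;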

&lt;h4&gt;
  
  
  5. pdf data extraction
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;I used the &lt;code&gt;PyPDF2&lt;/code&gt; library to extract the text from the pdfs. The library is quite simple to use and you can extract the text from a pdf file with a few lines of code.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PyPDF2&lt;/span&gt;

   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_text_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_docs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="n"&gt;raw_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
       &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;pdf&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pdf_docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
           &lt;span class="n"&gt;pdf_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pdf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
           &lt;span class="n"&gt;pdf_reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PyPDF2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PdfFileReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
           &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;page_num&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;numPages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
               &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pdf_reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getPage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_num&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
               &lt;span class="n"&gt;raw_text&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;raw_text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The extracted text should be chunked into smaller pieces that can be used to create embeddings for the &lt;code&gt;qdrant&lt;/code&gt; index.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_text_chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="n"&gt;text_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
       &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_text&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
           &lt;span class="n"&gt;text_chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_text&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text_chunks&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
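
&lt;p&gt;The fixed 1000-character windows above can cut a sentence in half at every boundary. A common tweak is to let consecutive chunks overlap. This is swapped in here and is not what the code above does. Below is a minimal sketch. The function name and the &lt;code&gt;chunk_size&lt;/code&gt;/&lt;code&gt;overlap&lt;/code&gt; defaults are assumptions. With &lt;code&gt;overlap=0&lt;/code&gt; it produces the same chunks as &lt;code&gt;get_text_chunks&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def get_text_chunks_overlapping(raw_text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character windows that overlap.

    Hypothetical variant of get_text_chunks: each chunk starts with the
    last `overlap` characters of the previous one.
    """
    # range(chunk_size) holds 0 .. chunk_size - 1, so this also rejects chunk_size 0
    if overlap not in range(chunk_size):
        raise ValueError("overlap must be in [0, chunk_size)")
    if not raw_text:
        return []
    step = chunk_size - overlap
    # Stop once a chunk reaches the end of the text, so no trailing chunk
    # is just a copy of the previous chunk's tail
    stop = max(len(raw_text) - overlap, 1)
    return [raw_text[i:i + chunk_size] for i in range(0, stop, step)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Overlap costs some extra embeddings, because the overlapping text is embedded twice. In return, a passage that straddles a boundary has a better chance of landing whole in at least one chunk.&lt;/p&gt;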



&lt;h4&gt;
  
  
  6. Setting up the &lt;code&gt;qdrant&lt;/code&gt; server via &lt;code&gt;docker&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;The easiest way to run &lt;code&gt;qdrant&lt;/code&gt; locally is with docker, and &lt;code&gt;docker-compose&lt;/code&gt; is a nice way to keep track of the environment setup. You can set up the &lt;code&gt;qdrant&lt;/code&gt; server with a simple &lt;code&gt;docker-compose.yml&lt;/code&gt; file like the one below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.9'&lt;/span&gt;

   &lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;qdrant&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;qdrant/qdrant:latest&lt;/span&gt;
       &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;6333:6333"&lt;/span&gt; &lt;span class="c1"&gt;# Expose Qdrant on port 6333 of the host&lt;/span&gt;
       &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;qdrant_data:/qdrant/data&lt;/span&gt; &lt;span class="c1"&gt;# Persistent storage for Qdrant data&lt;/span&gt;
       &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;RUST_LOG&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;info"&lt;/span&gt; &lt;span class="c1"&gt;# Set logging level to info&lt;/span&gt;

   &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;qdrant_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;qdrant_data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
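
&lt;p&gt;Before indexing, it can help to check that the container is actually answering. Here is a minimal sketch using only the standard library. The helper name is hypothetical. It only assumes that the REST API is exposed on the mapped port 6333 and responds to a plain GET on its root URL.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import urllib.request


def qdrant_is_up(base_url="http://localhost:6333", timeout=2.0):
    """Return True if the server at base_url answers a GET with HTTP 200.

    Hypothetical helper; qdrant's REST API replies to GET / with a small
    JSON document that includes its version.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # URLError/HTTPError (refused, DNS, non-2xx) and timeouts are OSError
        # subclasses; ValueError covers malformed URLs
        return False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;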



&lt;h4&gt;
  
  
  7. Indexing the data
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;qdrant&lt;/code&gt; client can be used to index the embeddings and perform similarity search on the data. You can pick whichever embedding model works best for your data and swap it out if you find &lt;a href="https://huggingface.co/spaces/mteb/leaderboard" rel="noopener noreferrer"&gt;a better one&lt;/a&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_vector_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qdrant_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:6333&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HuggingFaceInstructEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;avsolatorio/GIST-Embedding-v0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;device&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
       &lt;span class="n"&gt;vector_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Qdrant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
           &lt;span class="n"&gt;text_chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;qdrant_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pdfs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;force_recreate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  8. sending the query
&lt;/h4&gt;

&lt;p&gt;To query &lt;code&gt;qdrant&lt;/code&gt;, you need to embed the question too, with the same model used at indexing time, so that the similarity search compares vectors from the same embedding space.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qdrant_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:6333&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HuggingFaceInstructEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;avsolatorio/GIST-Embedding-v0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;device&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="n"&gt;query_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;vector_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Qdrant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;qdrant_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pdfs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
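
&lt;p&gt;Conceptually, the similarity search compares the query vector with the stored chunk vectors and keeps the closest ones. The toy version below does this by brute force with cosine similarity, which is one of the distance metrics qdrant supports. The function names are hypothetical. For larger collections, qdrant relies on an approximate HNSW index instead of scanning every vector like this.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, indexed, k=1):
    """Brute-force search: rank (text, vector) pairs by cosine similarity to the query."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;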



&lt;h4&gt;
  
  
  9. Analysis
&lt;/h4&gt;

&lt;p&gt;You can swap out any of the components in this project for something else. You could use &lt;a href="https://github.com/facebookresearch/faiss" rel="noopener noreferrer"&gt;&lt;code&gt;Faiss&lt;/code&gt;&lt;/a&gt; instead of &lt;code&gt;qdrant&lt;/code&gt;, &lt;code&gt;OpenAI&lt;/code&gt; models for everything (embeddings and chat completion), or open models.&lt;/p&gt;

&lt;p&gt;You can also forgo the UI and simply use &lt;code&gt;fastapi&lt;/code&gt; to create an API for interacting with the PDF documents. I hope this gives you a sense of the possibilities available when building your own &lt;code&gt;RAG&lt;/code&gt; system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;you can build your own agent and have it respond to queries about your data quite easily&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;streamlit&lt;/code&gt; is great for prototyping and building out simple interfaces&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;qdrant&lt;/code&gt; is good for performing similarity search on your data&lt;/li&gt;
&lt;li&gt;when building &lt;code&gt;RAG&lt;/code&gt; systems you need to make use of embedding models to encode your data&lt;/li&gt;
&lt;li&gt;embedding models are the most taxing parts of the pipeline&lt;/li&gt;
&lt;li&gt;if you have pluggable parts in your pipeline you can swap them out easily to save costs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pydantic&lt;/code&gt; and &lt;code&gt;pydantic-settings&lt;/code&gt; are great for adding type checking and validation to your python code&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>openai</category>
      <category>opensource</category>
      <category>langchain</category>
      <category>rag</category>
    </item>
    <item>
      <title>Converging project boilerplates with copier</title>
      <dc:creator>Artur Daschevici</dc:creator>
      <pubDate>Tue, 11 Jun 2024 10:00:00 +0000</pubDate>
      <link>https://dev.to/adaschevici/converging-project-boilerplates-with-copier-f62</link>
      <guid>https://dev.to/adaschevici/converging-project-boilerplates-with-copier-f62</guid>
      <description>&lt;h2&gt;
  
  
  Why?
&lt;/h2&gt;

&lt;p&gt;Technically, when you start a new project, the best approach is to use the &lt;code&gt;CLI&lt;/code&gt; tool of the realm, such as &lt;code&gt;svelte-kit&lt;/code&gt;, &lt;code&gt;astro&lt;/code&gt;, &lt;code&gt;django-admin&lt;/code&gt;, etc.; you get the idea. The huge bonus of doing this is that you get the best practices baked in, and as new standards emerge the &lt;code&gt;CLI&lt;/code&gt; gets updated.&lt;/p&gt;

&lt;p&gt;So far the frontend has been a lot luckier with project generation tooling. Every major framework has come out with its own generator, and some have more than one, possibly due to multiple schools of thought.&lt;/p&gt;

&lt;p&gt;Some backend frameworks have project generation tools too, but so far it seems difficult to agree on a structure. The best you can do is find a structure that resembles what the majority uses and makes sense for you. I have been building spiders and crawlers for data ingestion pipelines, first in &lt;code&gt;python&lt;/code&gt; and then in &lt;code&gt;node&lt;/code&gt; and &lt;code&gt;go&lt;/code&gt;. More recently I have also been hacking some tweaks into a few of my &lt;code&gt;neovim&lt;/code&gt; plugins (which are written in lua).&lt;/p&gt;

&lt;p&gt;For example, neovim plugins have a pretty standard setup, with a folder layout something like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;scratcher.nvim/
├── README.md
├── lua
│   └── scratcher
│       └── init.lua
└── plugin
    └── scratcher.lua
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, naming tends to follow a few conventions, so setting up a new plugin is repetitive and easy to automate. Automating it will probably save you some time and willpower in the long run, provided you have some sense of what your final architecture needs to look like.&lt;/p&gt;
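
&lt;p&gt;To make the point concrete, here is a hypothetical plain-Python script that stamps out the layout above for a given plugin name. The file contents are placeholders I made up. It works, but every new project type means another hand-written script, which is exactly the itch a templating tool scratches.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pathlib import Path


def scaffold_nvim_plugin(root, name):
    """Create the minimal neovim plugin layout under root.

    Hypothetical helper: `name` is the plugin name without the .nvim
    suffix. Returns the path of the created plugin directory.
    """
    plugin_dir = Path(root) / f"{name}.nvim"
    # Placeholder contents; a real plugin would fill these in
    files = {
        "README.md": f"# {name}.nvim\n",
        f"lua/{name}/init.lua": "local M = {}\n\nreturn M\n",
        f"plugin/{name}.lua": f'require("{name}")\n',
    }
    for relative, content in files.items():
        path = plugin_dir / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
    return plugin_dir
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;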

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fra56uo9dojnhnd8z5ppx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fra56uo9dojnhnd8z5ppx.png" alt="Clones and copies" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How?
&lt;/h2&gt;

&lt;p&gt;If you are coming from &lt;code&gt;python&lt;/code&gt; like I am, you may already be familiar with &lt;a href="https://github.com/cookiecutter/cookiecutter" rel="noopener noreferrer"&gt;&lt;code&gt;cookiecutter&lt;/code&gt;&lt;/a&gt;. A few times I have been in situations where it might have made sense to use it, but every time it came down to balancing the timeline and trying to stay away from over-engineering.&lt;/p&gt;

&lt;p&gt;Lately, though, my work has been on the more experimental side, so I start new projects quite often and it makes more sense to have a prebaked architecture for specific project styles.&lt;/p&gt;

&lt;p&gt;Of the project templating libraries, I was aware of &lt;a href="https://github.com/cookiecutter/cookiecutter" rel="noopener noreferrer"&gt;&lt;code&gt;cookiecutter&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://copier.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;&lt;code&gt;copier&lt;/code&gt;&lt;/a&gt;. &lt;code&gt;cookiecutter&lt;/code&gt; uses &lt;code&gt;json&lt;/code&gt; to drive the generation, while &lt;code&gt;copier&lt;/code&gt; uses &lt;code&gt;yaml&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In the end I went with copier. I tend to favor &lt;code&gt;yaml&lt;/code&gt; because it allows comments in the config, which makes it easy to plop random pieces of info or even docs in there. The config file driving the wizard can become quite convoluted and harder to read than normal code, so being able to document the different options is a real plus.&lt;/p&gt;

&lt;p&gt;The library allows you to build a sort of setup wizard where you can set up your desired flow of questions, and you can use the choices supplied to drive what folders will be used in the final project boilerplate. This is pretty nifty as it gives you the ability to customize stuff all the way down to the build process.&lt;/p&gt;

&lt;p&gt;Another neat thing is that decent chunks of shareable code can live in the boilerplate itself, so every generated project starts out just the way you like it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cherry pick of features
&lt;/h2&gt;

&lt;p&gt;There are a few notable features that I would kick myself if I didn't mention:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;you can define choices for your options by using the &lt;code&gt;choices&lt;/code&gt; key in your &lt;code&gt;copier.yml&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;project_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;str&lt;/span&gt;
&lt;span class="na"&gt;help&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;What type of project are you creating?&lt;/span&gt;
&lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;neovim-plugin&lt;/span&gt;
&lt;span class="na"&gt;choices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;neovim-plugin&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;golang-cli&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;python-cli&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;you can include other YAML files from the root-level &lt;code&gt;copier.yml&lt;/code&gt;, breaking the wizard down into composable parts; for example, the CI/CD parts can be shared across projects&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="kt"&gt;!include&lt;/span&gt; &lt;span class="s"&gt;shared-conf/ci-cd.*.yml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;defaults are very powerful and can also make use of the current runtime context, such as answers to earlier questions, which is quite nice. Essentially, you get your very own project wizard, tweaked for every one of your needs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Cool use-cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;you can have a folder or file created conditionally, depending on the selected option, e.g.:&lt;/p&gt;

&lt;p&gt;You define your &lt;code&gt;copier.yml&lt;/code&gt; like this to offer a choice of project type:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;project_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;str&lt;/span&gt;
&lt;span class="na"&gt;help&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;What type of project are you creating?&lt;/span&gt;
&lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;neovim-plugin&lt;/span&gt;
&lt;span class="na"&gt;choices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;neovim-plugin&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;golang-cli&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;python-cli&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;You then create a conditionally rendered folder whose name follows &lt;code&gt;jinja&lt;/code&gt; templating rules, so it might look something like the following:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;{&lt;/span&gt;% &lt;span class="k"&gt;if &lt;/span&gt;project_type &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;'neovim-plugin'&lt;/span&gt; %&lt;span class="o"&gt;}{{&lt;/span&gt;project_name&lt;span class="o"&gt;}}&lt;/span&gt;.nvim&lt;span class="o"&gt;{&lt;/span&gt;% endif %&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;It looks a bit strange, it might not work on Windows, and it can seem daunting at first. Hopefully, though, you will only need to revisit the hierarchy when you update your project structure template, which I would not expect to happen very often.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;in more advanced use cases you may need to write some custom python code to transform or process entities in your &lt;code&gt;jinja&lt;/code&gt; templates or template strings.&lt;br&gt;
To hook this in, you need to enable &lt;code&gt;jinja&lt;/code&gt; template extensions and install a separate package, &lt;code&gt;copier-templates-extensions&lt;/code&gt;. Recent copier versions treat extensions as an unsafe feature, so you also have to pass &lt;code&gt;--trust&lt;/code&gt; when generating a project from such a template.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;_jinja_extensions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;copier_templates_extensions.TemplateExtensionLoader&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;extensions/context.py:ContextUpdater&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;This loads the listed extensions into the project generator runtime, where they can serve different purposes. The following snippet shows one way to update the context:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;copier_templates_extensions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ContextHook&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ContextUpdater&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextHook&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;new_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;new_context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;onboarding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;first steps with &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;project-type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;new_context&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;For example, you might want the generated project to contain a file called "first steps with neovim-plugin.txt". In template form, the file name would be &lt;code&gt;{{ onboarding }}.txt&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
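
&lt;p&gt;The conditional folder trick relies on copier skipping any file or folder whose templated name renders to an empty string (that is my reading of its documented behavior). Below is a rough plain-Python model of that rule for the folder name shown earlier. It is written without jinja so it runs on its own. The function names are hypothetical, and this only illustrates the behavior; it is not how copier is implemented.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def render_plugin_dir(project_type, project_name):
    """Plain-Python stand-in for the templated folder name:

    {% if project_type == 'neovim-plugin' %}{{project_name}}.nvim{% endif %}
    """
    if project_type == "neovim-plugin":
        return f"{project_name}.nvim"
    return ""  # what the jinja expression renders to for any other choice


def dirs_to_create(rendered_names):
    """Model of the skip rule: names that render empty are not created."""
    return [name for name in rendered_names if name]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;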

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;you can build hybrid boilerplates for your project and drive generating the folder hierarchy from a single repo&lt;/li&gt;
&lt;li&gt;the notation for templated folder names is a bit weird and might not work on Windows&lt;/li&gt;
&lt;li&gt;the templated naming is also very powerful allowing for conditional creation of folders&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>boilerplate</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Gopherizing some puppeteer code</title>
      <dc:creator>Artur Daschevici</dc:creator>
      <pubDate>Thu, 06 Jun 2024 22:00:00 +0000</pubDate>
      <link>https://dev.to/adaschevici/gopherizing-some-puppeteer-code-29g4</link>
      <guid>https://dev.to/adaschevici/gopherizing-some-puppeteer-code-29g4</guid>
      <description>&lt;h2&gt;
  
  
  Why?
&lt;/h2&gt;

&lt;p&gt;As developers we sometimes get a bad case of shiny new object syndrome. I hate to say it, but every time I start hacking on something new, the urge to throw in yet another new thing is quite overwhelming. It is really tough to stay interested in a project for a long time, and it becomes tedious the deeper you go into the weeds. I suppose this is why people list &lt;code&gt;growth&lt;/code&gt; as one of their top motivations.&lt;/p&gt;

&lt;p&gt;I consider anything new an opportunity for growth, while doing the same thing over and over quickly becomes tedious. I've been building various types of scrapers since 2011, and it all started because I wanted to automate a workflow and save myself some time. Automating it probably took more time than doing it by hand, but interacting over &lt;code&gt;http&lt;/code&gt; from code and crunching the data automatically was interesting.&lt;/p&gt;

&lt;p&gt;The amount of data on the web is pretty crazy: there are many sources and many types of data that can be combined in very interesting ways. Back in those days dropshipping was becoming huge and people were doing arbitrage across Amazon, eBay, local flea markets, etc. Tools that could run analytics across these shops were quite trendy and the market was less crowded, so building crawlers seemed like a good way to build up a customer base.&lt;/p&gt;

&lt;p&gt;Nowadays, thanks to &lt;code&gt;RAG&lt;/code&gt; systems, gathering data automatically, breaking it down, feeding it into embedding models and storing it in vector databases to augment what an &lt;code&gt;LLM&lt;/code&gt; knows has come back into the spotlight. Between then and now there have been a few changes in the way data is served up for consumption. Off the top of my head:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;single page apps have gained huge traction, with almost everyone shipping their content in a &lt;code&gt;js&lt;/code&gt; bundle and loading everything on the fly as the page loads&lt;/li&gt;
&lt;li&gt;websites have become fussy about having their data used by unknown parties, so they have been closing down access and have become very litigious (e.g. hiQ Labs v. LinkedIn over scraping public profiles, or The New York Times v. OpenAI over using its articles for training)&lt;/li&gt;
&lt;li&gt;bot detection and prevention: this one is funny because it works like a flywheel; it built two lucrative markets overnight, bot services and anti-bot protection&lt;/li&gt;
&lt;li&gt;TBH, it's difficult to predict where this is heading. It feels like people have been trying to move all their data into data centers, but since data is becoming so guarded... will they move back to paper?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ba7923qyfb4tlkhatim.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ba7923qyfb4tlkhatim.png" alt="All your data are belong to us" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Because of &lt;code&gt;SPAs&lt;/code&gt; and the wide adoption of &lt;code&gt;js&lt;/code&gt; on websites, it is much more convenient to use some sort of browser automation to crawl pages and extract information. It spares you from reverse engineering how the page loads its content and makes you less likely to badger the servers, so you will probably want to talk to the browser over either the &lt;a href="https://chromedevtools.github.io/devtools-protocol/" rel="noopener noreferrer"&gt;&lt;code&gt;chrome developer tools protocol&lt;/code&gt;&lt;/a&gt; or the &lt;a href="https://www.w3.org/TR/webdriver/" rel="noopener noreferrer"&gt;&lt;code&gt;webdriver&lt;/code&gt;&lt;/a&gt; protocol. Back in the day, IIRC, I also used the &lt;a href="https://www.riverbankcomputing.com/software/pyqt/intro" rel="noopener noreferrer"&gt;&lt;code&gt;PyQt&lt;/code&gt;&lt;/a&gt; bindings for accessing the &lt;code&gt;Qt&lt;/code&gt; browser component, but nowadays it's mostly straight-up browsers.&lt;/p&gt;

&lt;p&gt;These days my go-to is &lt;code&gt;puppeteer&lt;/code&gt;. It's a weird tool that can easily be used to scrape data from pages. I call it weird mainly because of its deceptive internals: essentially two &lt;code&gt;js&lt;/code&gt; engines (Node on one side, the browser on the other) that communicate via the &lt;code&gt;cdp&lt;/code&gt; protocol, which is a very dense beast and does not play nice with complex objects.&lt;/p&gt;

&lt;p&gt;Recently, strongly typed languages have become more appealing to me. This is probably because I have narrowed my experiments down to very small code samples that illustrate one thing at a time; I would go as far as to call it experiment-driven development. Duck typing is fun, as you can print pretty much anything you want. I considered &lt;code&gt;rust&lt;/code&gt;, but it has a very tough learning curve. Node is pretty nice with &lt;code&gt;mjs&lt;/code&gt;, but it's sometimes confusing when execution crosses over between the two event loops, and while it is good at talking &lt;code&gt;cdp&lt;/code&gt;, it is not really designed for sync code. &lt;code&gt;python&lt;/code&gt; is a bit boring for me, so I decided to look at &lt;code&gt;go&lt;/code&gt;. Since it is a Google language I expected it to have decent support for &lt;code&gt;cdp&lt;/code&gt;, and its learning curve is slightly gentler than that of &lt;code&gt;rust&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How?
&lt;/h2&gt;

&lt;p&gt;Looking at the alternatives, two stand out: &lt;a href="https://github.com/chromedp/chromedp" rel="noopener noreferrer"&gt;&lt;code&gt;chromedp&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://go-rod.github.io/#/" rel="noopener noreferrer"&gt;rod&lt;/a&gt;. Rod looks like the prodigal son of &lt;a href="https://behave.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;&lt;code&gt;behave&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://cucumber.io/" rel="noopener noreferrer"&gt;&lt;code&gt;cucumber&lt;/code&gt;&lt;/a&gt;, two well-established BDD frameworks. Personally I don't find the &lt;code&gt;MustYaddaYadda...&lt;/code&gt; style very readable, and combining it with other custom APIs would probably make the code inconsistent. It has a few nice touches, such as the way it abstracts &lt;code&gt;iframes&lt;/code&gt;, but I just can't get past the higher-level API.&lt;/p&gt;

&lt;p&gt;In the end I chose &lt;code&gt;chromedp&lt;/code&gt;. It works pretty well for most use cases; there are places where it doesn't quite cut it and I wish it did, but by now I have come to terms with the fact that there is no one technology to rule them all. Wouldn't it be nice if there were?&lt;/p&gt;

&lt;p&gt;You can install it via &lt;code&gt;go get -u github.com/chromedp/chromedp&lt;/code&gt; and then start using it in your code. It has quite a few submodules and related projects that you may want to use depending on your concrete use case.&lt;br&gt;
Generally, if your use case is only data extraction and you have no tricky actions to deal with (the page is &lt;em&gt;bot resistant&lt;/em&gt;, some elements load later, &lt;code&gt;iframe&lt;/code&gt; hell, etc.), the first group of imports is usually enough, while the other &lt;code&gt;cdproto&lt;/code&gt; subpackages cover the more advanced cases:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"context"&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;

    &lt;span class="s"&gt;"github.com/chromedp/chromedp"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/chromedp/cdproto/cdp"&lt;/span&gt;
    &lt;span class="c"&gt;// for slightly more advanced use cases&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/chromedp/cdproto/browser"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/chromedp/cdproto/dom"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/chromedp/cdproto/storage"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/chromedp/cdproto/network"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The pleasant surprises
&lt;/h2&gt;

&lt;p&gt;Well, calling them surprises is a bit of a stretch: I have been using &lt;code&gt;golang&lt;/code&gt; over the years, and I have to admit it is a pretty nice ecosystem and language. &lt;br&gt;
&lt;code&gt;chromedp&lt;/code&gt; automates Chrome or any binary that you are able to communicate with via &lt;a href="https://github.com/chromedp/chromedp/blob/ebf842c7bc28db77d0bf4d757f5948d769d0866f/allocate.go#L349" rel="noopener noreferrer"&gt;&lt;code&gt;cdp&lt;/code&gt;&lt;/a&gt;. The API is fairly intuitive; I haven't often found myself diving into its guts to figure out how things work. The good part is that once you extract the data from the nodes you are interested in, you can map it to Go structs and make use of Go's type system. &lt;/p&gt;

&lt;p&gt;For example you can grab a list of elements via selector:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;productNodes&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;cdp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Node&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="c"&gt;// visit the target page&lt;/span&gt;
        &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Navigate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"https://scrapingclub.com/exercise/list_infinite_scroll/"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;script&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WaitVisible&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;".post:nth-child(60)"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Nodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;`.post`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;productNodes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ByQueryAll&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Error while trying to grab product items."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;then map each element to a struct&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;productNodes&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;`h4`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ByQuery&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FromNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
            &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;`h5`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ByQuery&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FromNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Error while trying to grab product items."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Product&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



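&lt;p&gt;Another nice perk is that Go is built with concurrency in mind, so crunching the extracted data can be a lot faster than in puppeteer: goroutines can spread CPU-bound work across cores, whereas Node runs your JavaScript on a single thread unless you reach for worker threads.&lt;/p&gt;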
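
&lt;p&gt;As a rough sketch of that idea (not code from the original experiment), here is how scraped price strings could be parsed into numbers with one goroutine per item. The &lt;code&gt;Product&lt;/code&gt; struct with exported fields, the &lt;code&gt;parsePrices&lt;/code&gt; helper and the &lt;code&gt;"$24.99"&lt;/code&gt; price format are all assumptions for illustration. Each goroutine writes only to its own slice index, so a &lt;code&gt;sync.WaitGroup&lt;/code&gt; is the only synchronization needed.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
package main

import (
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// Product is a hypothetical, exported-field variant of the scraped struct;
// Price holds the raw scraped text, e.g. "$24.99" (assumed format).
type Product struct {
	Name  string
	Price string
}

// parsePrices converts scraped price strings to float64 values, one goroutine
// per product. Each goroutine writes only to its own index in prices and errs,
// so no mutex is needed; wg.Wait blocks until every goroutine has finished.
func parsePrices(products []Product) ([]float64, []error) {
	prices := make([]float64, len(products))
	errs := make([]error, len(products))
	var wg sync.WaitGroup
	for i, p := range products {
		wg.Add(1)
		go func(i int, raw string) {
			defer wg.Done()
			cleaned := strings.TrimPrefix(strings.TrimSpace(raw), "$")
			prices[i], errs[i] = strconv.ParseFloat(cleaned, 64)
		}(i, p.Price)
	}
	wg.Wait()
	return prices, errs
}

func main() {
	products := []Product{
		{Name: "item A", Price: "$24.99"},
		{Name: "item B", Price: "n/a"},
	}
	prices, errs := parsePrices(products)
	for i, p := range products {
		if errs[i] != nil {
			fmt.Println(p.Name, "error:", errs[i])
			continue
		}
		fmt.Printf("%s: %.2f\n", p.Name, prices[i])
	}
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Parsing a couple of strings is far too cheap to benefit from goroutines, so treat this as the shape of the pattern rather than a benchmark; for heavier per-item work you would probably cap concurrency with a worker pool sized to &lt;code&gt;runtime.NumCPU()&lt;/code&gt;.&lt;/p&gt;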

&lt;p&gt;Yet another pretty nifty thing is that you can ship a single binary, cross-compiled for multiple platforms, which is easy to distribute. This is a huge plus given that you may not know who the tool's end users will be.&lt;/p&gt;

&lt;h2&gt;
  
  
  The ugly parts
&lt;/h2&gt;

&lt;p&gt;Communication with the browser still goes through the &lt;code&gt;cdp&lt;/code&gt; protocol, so you can only pass objects that can be serialized.&lt;/p&gt;
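
&lt;p&gt;To make the serialization constraint concrete, here is a minimal sketch of what a &lt;code&gt;cdp&lt;/code&gt; command looks like on the wire: a JSON message with an &lt;code&gt;id&lt;/code&gt;, a &lt;code&gt;method&lt;/code&gt; and &lt;code&gt;params&lt;/code&gt;. &lt;code&gt;chromedp&lt;/code&gt; builds these messages for you; the hand-rolled &lt;code&gt;cdpMessage&lt;/code&gt; type is only for illustration. &lt;code&gt;Runtime.evaluate&lt;/code&gt; with &lt;code&gt;returnByValue&lt;/code&gt; asks the browser to send the result back as JSON, and anything without a JSON form, such as a Go &lt;code&gt;func&lt;/code&gt;, fails before it ever reaches the wire.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
package main

import (
	"encoding/json"
	"fmt"
)

// cdpMessage is a hand-rolled, illustrative version of the command envelope
// CDP expects over its WebSocket: an id, a method name and method params.
type cdpMessage struct {
	ID     int         `json:"id"`
	Method string      `json:"method"`
	Params interface{} `json:"params,omitempty"`
}

func main() {
	// returnByValue asks the browser to serialize the result as JSON instead
	// of returning a handle (objectId) to a remote object.
	msg := cdpMessage{
		ID:     1,
		Method: "Runtime.evaluate",
		Params: map[string]interface{}{
			"expression":    "document.title",
			"returnByValue": true,
		},
	}
	b, err := json.Marshal(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))

	// A Go func has no JSON representation, so it can never be sent.
	_, err = json.Marshal(cdpMessage{ID: 2, Method: "Runtime.evaluate", Params: func() {}})
	fmt.Println("marshalling a func:", err)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;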

&lt;p&gt;If you need to work with objects that can't be serialized you will need to inject &lt;code&gt;js&lt;/code&gt; into the page context and interact with it.&lt;/p&gt;

&lt;p&gt;When a page contains &lt;code&gt;iframes&lt;/code&gt;, triggering events on the elements inside them is problematic. You can extract data from them, but triggering events gets messy because you need &lt;code&gt;js&lt;/code&gt; for that.&lt;br&gt;
Extracting data from an &lt;code&gt;iframe&lt;/code&gt; might look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;iframes&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;cdp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Node&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Nodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;`iframe`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;iframes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ByQuery&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Nodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;`iframe`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;iframes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ByQuery&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FromNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iframes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])));&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"#second-nested-iframe"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ByQuery&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chromedp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FromNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iframes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])),&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Triggering events on elements inside the iframe is another story: you can't just use the &lt;code&gt;chromedp&lt;/code&gt; API, and since &lt;code&gt;chromedp.Evaluate&lt;/code&gt; does not take a &lt;code&gt;Node&lt;/code&gt; as context, you need to perform all the actions in &lt;code&gt;javascript&lt;/code&gt;, which turns the resulting code into a bit of a mishmash of &lt;code&gt;go&lt;/code&gt; and &lt;code&gt;js&lt;/code&gt;.&lt;/p&gt;
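
&lt;p&gt;One way to keep that mishmash manageable is to generate the &lt;code&gt;js&lt;/code&gt; from a small Go helper and hand the resulting string to &lt;code&gt;chromedp.Evaluate&lt;/code&gt;. This is a sketch under assumptions rather than the approach used in the original code: &lt;code&gt;clickInIframeJS&lt;/code&gt; and the selectors are hypothetical, and it only works for same-origin iframes, because &lt;code&gt;contentDocument&lt;/code&gt; is &lt;code&gt;null&lt;/code&gt; for cross-origin ones. The selectors are JSON-encoded so quotes inside them cannot break the script, and the script returns a plain boolean, which serializes cleanly over &lt;code&gt;cdp&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// clickInIframeJS (hypothetical helper) returns a JS expression that clicks
// targetSel inside the same-origin iframe matched by iframeSel. The script
// evaluates to true if the click happened and false otherwise.
func clickInIframeJS(iframeSel, targetSel string) (string, error) {
	if strings.TrimSpace(iframeSel) == "" || strings.TrimSpace(targetSel) == "" {
		return "", errors.New("selectors must not be empty")
	}
	// json.Marshal turns each selector into a safely quoted JS string literal.
	frame, err := json.Marshal(iframeSel)
	if err != nil {
		return "", err
	}
	target, err := json.Marshal(targetSel)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf(`(function () {
  const frame = document.querySelector(%s);
  // contentDocument is null for cross-origin iframes
  if (!frame || !frame.contentDocument) return false;
  const el = frame.contentDocument.querySelector(%s);
  if (!el) return false;
  el.click();
  return true;
})()`, frame, target), nil
}

func main() {
	js, err := clickInIframeJS("#checkout-frame", "button.submit")
	if err != nil {
		panic(err)
	}
	fmt.Println(js)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With &lt;code&gt;chromedp&lt;/code&gt;, you would pass the generated string to &lt;code&gt;chromedp.Evaluate&lt;/code&gt; together with a pointer to a &lt;code&gt;bool&lt;/code&gt; to find out whether the click happened.&lt;/p&gt;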

&lt;p&gt;&lt;code&gt;puppeteer&lt;/code&gt; also has some extra packages that can be used like &lt;code&gt;puppeteer-stealth&lt;/code&gt; but &lt;code&gt;chromedp&lt;/code&gt; does not seem to have an equivalent for that at this time. The &lt;code&gt;rod&lt;/code&gt; package has &lt;a href="https://github.com/go-rod/stealth" rel="noopener noreferrer"&gt;&lt;code&gt;rod stealth&lt;/code&gt;&lt;/a&gt; but I haven't tried it since the API is not to my liking.&lt;/p&gt;

&lt;p&gt;The other slightly disappointing missing feature is that in headless mode all the GPU features are disabled, because the browser runs in a &lt;a href="https://github.com/chromedp/docker-headless-shell" rel="noopener noreferrer"&gt;&lt;code&gt;headless-chrome&lt;/code&gt;&lt;/a&gt; container that has no display server. Puppeteer is able to run with GPU features enabled, which lets it pass the &lt;a href="http://bot.sannysoft.com/" rel="noopener noreferrer"&gt;&lt;code&gt;webgl fingerprinting&lt;/code&gt;&lt;/a&gt; tests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;in some ways puppeteer is still better than &lt;code&gt;chromedp&lt;/code&gt;; working with &lt;code&gt;iframes&lt;/code&gt; is where &lt;code&gt;chromedp&lt;/code&gt; falls short&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;rod&lt;/code&gt; is a nice alternative but its API looks like it was designed for testing, reminds me of &lt;code&gt;cucumber&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;chromedp&lt;/code&gt; is a nice alternative to &lt;code&gt;puppeteer&lt;/code&gt; if you are looking to build a binary that can be distributed easily &lt;/li&gt;
&lt;li&gt;it can be more performant than &lt;code&gt;puppeteer&lt;/code&gt; when crunching the extracted data, thanks to the concurrency model in &lt;code&gt;go&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>go</category>
      <category>scraping</category>
      <category>chrome</category>
      <category>automation</category>
    </item>
    <item>
      <title>Keep your estimates boring</title>
      <dc:creator>Artur Daschevici</dc:creator>
      <pubDate>Wed, 29 May 2024 09:12:00 +0000</pubDate>
      <link>https://dev.to/adaschevici/making-your-estimates-boring-33ki</link>
      <guid>https://dev.to/adaschevici/making-your-estimates-boring-33ki</guid>
      <description>&lt;h2&gt;
  
  
  Why?
&lt;/h2&gt;

&lt;p&gt;Most everyone I know has got shiny object syndrome. We all want to work on the latest and greatest, and I am very much part of that crowd myself. Whenever I start on a project, I never pin library versions, which adds anywhere between 0% and 50% on top of the project timeline. The most notable example is JavaScript with its myriad libraries. You would think everyone is familiar with this meme by now 😅.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe3roly01iyhuzumy5ssx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe3roly01iyhuzumy5ssx.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the first article in the series we talked about getting &lt;em&gt;anchors&lt;/em&gt; right and trying to stay away from the uniqueness bias when choosing a good anchor. In &lt;em&gt;How Big Things Get Done&lt;/em&gt; the authors also bring in a concept that was new to me: &lt;strong&gt;the reference class&lt;/strong&gt;.&lt;br&gt;
The phrase was originally coined in the 1970s by the psychologists &lt;a href="https://www.newyorker.com/books/page-turner/the-two-friends-who-changed-how-we-think-about-how-we-think" rel="noopener noreferrer"&gt;Daniel Kahneman and his colleague Amos Tversky&lt;/a&gt; and is regularly used in the context of &lt;em&gt;reference class forecasting&lt;/em&gt;.&lt;br&gt;
Kahneman and Tversky distinguish between two views when estimating a project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;inside view&lt;/em&gt;: the view from within the project while you are working on it, colored by your personal biases&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;outside view&lt;/em&gt;: the view from the outside, looking at the project as a whole and comparing it to similar projects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Enough with the theory. In practice, what we actually want is for our estimates to hold up and be as accurate as possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  How?
&lt;/h2&gt;

&lt;p&gt;Now that we have the lingo down, let's get into the nitty-gritty. We want to figure out how to calculate the estimates; you read that right, &lt;strong&gt;calculate&lt;/strong&gt;. The calculations rely on cutting your project down from a special and unique snowflake to a project that is similar to others. It's a combination of statistical and historical analysis of other projects as similar to yours as possible.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Going on a tangent here, wouldn't it be cool if we could have a database of software project estimates with numerical data, situational requirements and conditions, and perhaps how long the project took in the end?&lt;/em&gt; 🤔&lt;/p&gt;




&lt;p&gt;You want to reduce your project to something as generic as possible then look for data about other projects like it. As a software developer you may be tempted to think that it is special and unique, but finding the commonalities will help you get a better estimate.&lt;br&gt;
We could make use of both &lt;em&gt;inside view&lt;/em&gt; and &lt;em&gt;outside view&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;outside view&lt;/em&gt;: see how long similar projects took (take the median); here I am referring to the reduced version of your project, with any product differentiators cut out&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;inside view&lt;/em&gt;: see how long the differentiators will take (&lt;em&gt;this you can break down further as well into common tasks and unique tasks&lt;/em&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;... do you see where I am going with this? It's turtles all the way down 🐢. You can already see how things can be broken down further into smaller pieces and how similarities make estimating easier.&lt;/p&gt;
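
&lt;p&gt;The arithmetic behind combining the two views is simple enough to sketch. Assuming durations in weeks and made-up numbers, the outside view is the median duration of the reference class (similar projects with the differentiators stripped out), and the inside view adds your own estimates for the differentiators on top. This is one straightforward reading of the list above, not a formula taken from the book.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
package main

import (
	"fmt"
	"sort"
)

// median returns the median of xs (0 for an empty slice). It sorts a copy so
// the caller's slice is left untouched.
func median(xs []float64) float64 {
	n := len(xs)
	if n == 0 {
		return 0
	}
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// estimate combines the outside view (median duration of the reference
// class) with the inside view (your estimates for each differentiator).
func estimate(referenceClass, differentiators []float64) float64 {
	total := median(referenceClass)
	for _, d := range differentiators {
		total += d
	}
	return total
}

func main() {
	// Hypothetical data, in weeks.
	similarProjects := []float64{6, 9, 7, 12, 8}
	differentiators := []float64{1.5, 2}
	fmt.Printf("estimate: %.1f weeks\n", estimate(similarProjects, differentiators))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the sample data the median of the reference class is 8 weeks, so the combined estimate comes out at 11.5 weeks. The median is used rather than the mean so that a single runaway project in the reference class does not drag the anchor up.&lt;/p&gt;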

&lt;p&gt;The numbers show a 30% improvement in accuracy when using &lt;em&gt;reference class forecasting&lt;/em&gt; (that is, the &lt;em&gt;outside view&lt;/em&gt;), with 50% not being uncommon.&lt;/p&gt;

&lt;p&gt;What sets this apart from plain anchor-based estimates is that you choose an anchor based on the &lt;em&gt;reference class&lt;/em&gt;, which brings it closer to objective reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Into the future with AI
&lt;/h2&gt;

&lt;p&gt;For the last couple of years I have been working in the field of AI, and I have been an avid reader of papers and consumed a decent amount of tutorials and courses. Deep learning is an amazingly powerful tool: it can learn how much each feature of a project matters and classify the project accordingly.&lt;/p&gt;

&lt;p&gt;If we had data about projects, we could train a multi-class classifier to predict how long a project would take to complete (S/M/L/XL). This could make a great Trello plugin, for example.&lt;/p&gt;

&lt;p&gt;Linear regression would be a simpler approach for a numeric estimate of the project timeline. Now, this may feel like taking all the joy out of the agile SDLC, but remember it is only meant as a data-focused approach from the &lt;em&gt;outside view&lt;/em&gt;, i.e. looking objectively at the data, so no hard feelings 😉.&lt;/p&gt;
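
&lt;p&gt;For a feel of how little machinery the regression idea needs, here is a sketch of ordinary least squares with a single feature. Using the number of features as the predictor is purely an assumption, and the history is toy data chosen so the fit is exact; a real model would need many more projects and features.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
package main

import "fmt"

// fitLine fits y = a + b*x by ordinary least squares and returns (a, b).
// It assumes len(xs) == len(ys) and that the xs are not all equal.
func fitLine(xs, ys []float64) (a, b float64) {
	n := float64(len(xs))
	var sumX, sumY, sumXY, sumXX float64
	for i := range xs {
		sumX += xs[i]
		sumY += ys[i]
		sumXY += xs[i] * ys[i]
		sumXX += xs[i] * xs[i]
	}
	b = (n*sumXY - sumX*sumY) / (n*sumXX - sumX*sumX)
	a = (sumY - b*sumX) / n
	return a, b
}

func main() {
	// Hypothetical history: feature count vs. weeks it took to ship.
	features := []float64{2, 4, 6, 8}
	weeks := []float64{3, 5, 7, 9}
	a, b := fitLine(features, weeks)
	fmt.Printf("weeks = %.2f + %.2f * features\n", a, b)
	fmt.Printf("forecast for 5 features: %.1f weeks\n", a+b*5)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this toy data the fitted line is weeks = 1 + 1 * features, so a 5-feature project is forecast at 6 weeks.&lt;/p&gt;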

&lt;p&gt;Thinking a bit further, we could have an LLM + RAG system that looks at the database of projects we have broken down, does a similarity search and gives us a baseline estimate.&lt;/p&gt;
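
&lt;p&gt;The retrieval half of that idea boils down to a similarity search. This sketch swaps real embeddings for small hand-made feature vectors (an assumption; in a RAG setup they would come from an embedding model) and ranks hypothetical past projects by cosine similarity to a new one, so the closest matches can serve as the reference class.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
package main

import (
	"fmt"
	"math"
	"sort"
)

// Project is a hypothetical record of a past project; Features stands in
// for an embedding vector.
type Project struct {
	Name     string
	Features []float64
	Weeks    float64
}

// cosine returns the cosine similarity of two equal-length vectors
// (0 if either vector is all zeros).
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// mostSimilar returns up to k projects ranked by similarity to query,
// most similar first. k is assumed to be non-negative.
func mostSimilar(query []float64, db []Project, k int) []Project {
	ranked := append([]Project(nil), db...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return cosine(query, ranked[i].Features) > cosine(query, ranked[j].Features)
	})
	if k > len(ranked) {
		k = len(ranked)
	}
	return ranked[:k]
}

func main() {
	// Hypothetical past projects; features are (integrations, screens, team size).
	db := []Project{
		{Name: "crud-admin", Features: []float64{1, 0, 5}, Weeks: 6},
		{Name: "mobile-shop", Features: []float64{0, 3, 8}, Weeks: 14},
		{Name: "internal-tool", Features: []float64{1, 1, 4}, Weeks: 5},
	}
	for _, p := range mostSimilar([]float64{1, 0, 4}, db, 2) {
		fmt.Println(p.Name, p.Weeks)
	}
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;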

&lt;p&gt;The data would probably be the biggest challenge here. You would have to gather it from various sources and make sure it is clean and usable enough to train the models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stay boring&lt;/strong&gt;: don't get caught up thinking your project is special and unique&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use the reference class&lt;/strong&gt;: look at similar projects and see how long they took&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use both views&lt;/strong&gt;: &lt;em&gt;inside view&lt;/em&gt; and &lt;em&gt;outside view&lt;/em&gt; to get a better estimate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use data&lt;/strong&gt;: if you have it, use it to your advantage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ask for help&lt;/strong&gt;: if you are not sure, ask someone who has done it before; an outside perspective can add a layer of objectivity&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agile</category>
      <category>ai</category>
      <category>estimates</category>
    </item>
    <item>
      <title>Why I still struggle with estimates</title>
      <dc:creator>Artur Daschevici</dc:creator>
      <pubDate>Sun, 26 May 2024 09:20:00 +0000</pubDate>
      <link>https://dev.to/adaschevici/why-i-still-struggle-with-estimates-357k</link>
      <guid>https://dev.to/adaschevici/why-i-still-struggle-with-estimates-357k</guid>
      <description>&lt;h2&gt;
  
  
  Why?
&lt;/h2&gt;

&lt;p&gt;Most of my articles so far have leaned towards the technical side, which seems to be my forte. I find writing code and solving different problems really interesting.&lt;br&gt;
OFC, solving problems, especially while hacking, comes with a fair degree of uncertainty and frustration... when things go wrong or everything simply turns into a big ol' rabbit hole.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb9kymrwjl2a0jn6zgsgd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb9kymrwjl2a0jn6zgsgd.png" alt="The Matrix Morpheus - how deep does the rabbit hole go?" width="662" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I am not 100% certain whether the frustration comes from not being able to finish projects or from the perceived inability to stick to a self-imposed (albeit artificial) deadline.&lt;/p&gt;

&lt;p&gt;So this time you will be reading some of my musings about estimation, along with some interesting ideas from a &lt;a href="https://www.amazon.com/How-Big-Things-Get-Done/dp/0593239512" rel="noopener noreferrer"&gt;recent book&lt;/a&gt; I read that put things into perspective for me.&lt;/p&gt;

&lt;h2&gt;
  
  
  How?
&lt;/h2&gt;

&lt;p&gt;I have been trying for years to get close to delivering something on time. The bigger the project, the more likely it is to veer off track.&lt;br&gt;
The only way to get near-perfectly predictable timelines is if every piece of the project, and the way the pieces fit together in the finished product, can be estimated with razor-sharp precision.&lt;br&gt;
Every part needs to be standard, like an off-the-shelf component, and on top of that the components need to fit together well at the end.&lt;br&gt;
The book gives statistics on how often projects finish on time, on budget and on quality, and the number is dismally low across the projects analyzed: &lt;strong&gt;only 0.5%&lt;/strong&gt;.&lt;br&gt;
Some giant-scale projects have succeeded, like the Empire State Building and the Guggenheim in Bilbao. The approach that seems to work well for removing the guesswork is building the parts before the whole and using tried and tested solutions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where does it go wrong?
&lt;/h2&gt;

&lt;p&gt;In software engineering, mistakes are not as irrecoverable as building a physical structure wrong, although as the impact of software systems grows, the line becomes more and more blurred.&lt;br&gt;
You can imagine that a time may come when the resilience of software becomes as important as that of our homes... or maybe it already is and we just can't wrap our heads around it.&lt;br&gt;
Most of the projects described in the book were delayed for several reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the initial assumptions are vague and hide complexity while sounding very simple and clear&lt;/li&gt;
&lt;li&gt;the finished product looks slick and polished, catches the eye, but the architectural feasibility is not assessed beforehand&lt;/li&gt;
&lt;li&gt;optimistic estimates&lt;/li&gt;
&lt;li&gt;doing something that has never been done before&lt;/li&gt;
&lt;li&gt;the human factor&lt;/li&gt;
&lt;li&gt;inexperience&lt;/li&gt;
&lt;li&gt;unknown unknowns&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  My A-HA moment
&lt;/h2&gt;

&lt;p&gt;I had not been able to formalize this, and had not come across a term that resonated with me, until I read a case study in &lt;em&gt;"How Big Things Get Done"&lt;/em&gt; by Bent Flyvbjerg and Dan Gardner.&lt;br&gt;
The story is about a newspaper columnist who sets out to write a biography. His estimate was based on his experience of writing particularly long articles. This kind of reference point taken from prior experience is called an "anchor".&lt;br&gt;
In the story the writer estimates the biography of ~17 chapters at 9 months to a year, using as his anchor the fact that one long article takes 3 weeks to research and write. Needless to say, this estimate was off by a factor of about 7: in the end it took 7 years.&lt;br&gt;
The story does have a happy ending in his case; however, this is an outlier among the case studies.&lt;br&gt;
It turns out that anchors are a very common pattern we use for estimating how long something will take.&lt;br&gt;&lt;br&gt;
Typically we look for similarities within our prior experience. The catch with software projects is that technology evolves so quickly, and company/team culture is so unique, that anchors end up being a guesstimation rule of thumb rather than a rigorous framework.&lt;br&gt;
On top of the extrinsic aspects that change, there are of course the intrinsic goals, such as writing better code, designing better architectures and products, developing faster, or whatever motivates you.&lt;br&gt;
Now, given all these things that evolve over time, what do you think the probability is that your past experience with something translates into an accurate estimate for a similar project two years later, at a different company?&lt;br&gt;&lt;br&gt;
One thing that works is breaking the project down into decent-sized components, experimenting on building those components, and making the project about putting things together.&lt;br&gt;
You want to work with fairly large components, yet small enough that the experiments churn out fast. It's a balancing act, but in the end a puzzle with 1000 pieces is much harder to solve than one with 5.&lt;br&gt;
I think Agile came out of this need for faster iteration and predictability... but in the end the probability of an estimate being accurate to the minute is very low.&lt;/p&gt;
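&lt;p&gt;The columnist's anchor arithmetic, written out. This assumes each of the ~17 chapters is treated like one long article, and uses 52 weeks per year.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;
```javascript
// Anchor from the columnist's experience: one long article takes about 3 weeks.
const weeksPerArticle = 3;
// Assumption: each of the ~17 chapters is treated like one long article.
const chapters = 17;

const anchoredWeeks = chapters * weeksPerArticle; // 51 weeks
const anchoredYears = anchoredWeeks / 52; // just under a year
const actualYears = 7;

console.log(`anchored estimate: ${anchoredWeeks} weeks (${anchoredYears.toFixed(2)} years)`);
console.log(`off by a factor of ${(actualYears / anchoredYears).toFixed(1)}`);
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;It prints an anchored estimate of 51 weeks (0.98 years) and a factor of 7.1, which lines up with the story.&lt;/p&gt;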

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;projects are a sum of experiments&lt;/li&gt;
&lt;li&gt;projects are an aggregate of the experiences of the participants&lt;/li&gt;
&lt;li&gt;boring technology is easier to estimate&lt;/li&gt;
&lt;li&gt;avoiding employee churn helps with estimating projects&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agile</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Screenshots optimization on OpenAI tokens</title>
      <dc:creator>Artur Daschevici</dc:creator>
      <pubDate>Fri, 17 May 2024 06:30:28 +0000</pubDate>
      <link>https://dev.to/adaschevici/screenshots-optimization-on-openai-tokens-27pk</link>
      <guid>https://dev.to/adaschevici/screenshots-optimization-on-openai-tokens-27pk</guid>
      <description>&lt;h2&gt;
  
  
  Prologue
&lt;/h2&gt;

&lt;p&gt;In my previous &lt;a href="https://dev.to/adaschevici/hacking-out-an-ai-spider-with-node-1h31"&gt;post&lt;/a&gt; I extracted data from the &lt;code&gt;html&lt;/code&gt; of the page and, to keep token usage low, also cleaned it up. Here I will look at optimizing token usage further.&lt;/p&gt;

&lt;p&gt;Funnily enough, if we take a screenshot of the element we are looking for instead, the number of tokens drops drastically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why?
&lt;/h2&gt;

&lt;p&gt;When working with OpenAI APIs, things are not exactly free. They are cheap, but depending on how often you run certain operations, the costs can add up quite quickly. Fortunately, the API reports usage metrics with each response, so you don't get any surprises.&lt;/p&gt;

&lt;h2&gt;
  
  
  How?
&lt;/h2&gt;

&lt;p&gt;Pick a website at random; for the purpose of this exercise we will use &lt;a href="https://www.homegate.ch/rent/real-estate/zip-8002/matching-list?ac=2.5" rel="noopener noreferrer"&gt;homegate&lt;/a&gt;. It's a real estate listing site, the kind you might use if you wanted to find a place to rent in Zürich.&lt;/p&gt;

&lt;p&gt;Let's use &lt;code&gt;puppeteer&lt;/code&gt; as usual to load the search results page linked above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;puppeteer-extra&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;StealthPlugin&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;puppeteer-extra-plugin-stealth&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://www.homegate.ch/rent/real-estate/zip-8002/matching-list?ac=2.5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="nx"&gt;puppeteer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;StealthPlugin&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;grabSelectorScreenshot&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// usual browser startup:&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;newPage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setUserAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;waitUntil&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;networkidle0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
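    &lt;span class="c1"&gt;// next: scroll to the bottom of the page, then screenshot the list element (snippets below)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;grabSelectorScreenshot&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;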
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once we have loaded the listing search results you want to select the list element and take a screenshot of it.&lt;/p&gt;

&lt;p&gt;To capture a single element on the page, do the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// you need to grab the right selector for your use case&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;element&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.ResultListPage_resultListPage_iq_V2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// `hashed` is not defined in this snippet: any filename-safe id for the current url will do&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;designatedPathPng&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`./screenshots/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;hashed&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-list-ss.png`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;screenshot&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;designatedPathPng&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;png&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
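&lt;p&gt;The snippet relies on a &lt;code&gt;hashed&lt;/code&gt; variable that the post does not define. One hypothetical way to produce it is a truncated SHA-256 of the page URL, which gives a stable, filename-safe id per search page. This is a sketch, not the original code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;
```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper: a stable, filename-safe id derived from the page url,
// so re-running the script overwrites the same screenshot instead of adding new files.
function hashUrl(url) {
  // first 16 hex characters (64 bits) of the SHA-256 digest
  return createHash("sha256").update(url).digest("hex").slice(0, 16);
}

const url = "https://www.homegate.ch/rent/real-estate/zip-8002/matching-list?ac=2.5";
const hashed = hashUrl(url);
const designatedPathPng = `./screenshots/${hashed}-list-ss.png`;
console.log(designatedPathPng);
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A truncated hash keeps filenames short while still telling a handful of search pages apart. You may need to create the &lt;code&gt;screenshots&lt;/code&gt; directory before taking the screenshot.&lt;/p&gt;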



&lt;p&gt;This looks fairly straightforward and runs without errors; however, if you look at the screenshot you will notice that some parts have not rendered.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gotcha 😕
&lt;/h2&gt;

&lt;p&gt;So... it seems there is a small conundrum with our randomly chosen website: it lazily renders parts as they come into view, which means we have to scroll down and bring the bottom part into view before we can grab the snapshot. To be clear, this approach relies solely on the visual aspect of the page, so if elements are not rendered, there is no data to extract.&lt;/p&gt;

&lt;p&gt;To do this, we scroll all the way to the bottom of the page before grabbing the search results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;totalHeight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;distance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// should be less than or equal to window.innerHeight&lt;/span&gt;
        &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;timer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;setInterval&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;scrollHeight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;scrollHeight&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
          &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scrollBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;distance&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="nx"&gt;totalHeight&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;distance&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

          &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;totalHeight&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;scrollHeight&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nf"&gt;clearInterval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;timer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tiny piece of code scrolls to the bottom of the page, 300px every 500ms, and you can increase the delay so it doesn't stress the host. To be clear, you should always be ethical in your data collection and try not to annoy the site owners. It gets particularly dicey once you enter the realm of commercial applications, where your legal basis needs to be rock solid.&lt;/p&gt;
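&lt;p&gt;Since the loop scrolls a fixed distance per tick, the delay directly controls how long each page takes and how hard the host is hit. A small hypothetical helper to reason about that trade-off, assuming the page height stays constant while scrolling (lazy loading can make it grow):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;
```javascript
// Hypothetical helper mirroring the loop above: one scrollBy(0, distance) per tick,
// one tick every delayMs, stopping once the scrolled total reaches scrollHeight.
function scrollPlan(scrollHeight, distance = 300, delayMs = 500) {
  const steps = Math.ceil(scrollHeight / distance);
  return { steps, durationMs: steps * delayMs };
}

console.log(scrollPlan(6000)); // { steps: 20, durationMs: 10000 }
// tripling the delay is gentler on the host but triples the wait
console.log(scrollPlan(6000, 300, 1500)); // { steps: 20, durationMs: 30000 }
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;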

&lt;h2&gt;
  
  
  Finishing it up with a bit of AI
&lt;/h2&gt;

&lt;p&gt;Once we have scrolled all the way to the bottom of the page, we can go back to the element screenshot. It should now contain the full list of listings.&lt;/p&gt;

&lt;p&gt;You can then save the screenshot, send it to OpenAI and extract the info from the image.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;propertyInfoImage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;imageType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4-turbo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Can you extract the property prices out of this
            image and send me the results? Can you send me the output as JSON?`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;image_url&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="c1"&gt;// imageType is png/jpeg&lt;/span&gt;
              &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`data:image/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;imageType&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;;base64,&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;propertyInfoImage&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
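&lt;p&gt;&lt;code&gt;run&lt;/code&gt; expects the image as a base64 string. A minimal sketch of converting the screenshot bytes, assuming you keep the return value of &lt;code&gt;element.screenshot()&lt;/code&gt; (binary data: a &lt;code&gt;Buffer&lt;/code&gt; or a &lt;code&gt;Uint8Array&lt;/code&gt;, depending on the puppeteer version). Puppeteer's screenshot options also accept &lt;code&gt;encoding: "base64"&lt;/code&gt;, which skips this step entirely. The helper names are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;
```javascript
// Converts screenshot bytes (Buffer or Uint8Array) to the base64 string run() expects.
function toBase64(imageBytes) {
  return Buffer.from(imageBytes).toString("base64");
}

// Same data URL shape that run() builds for its image_url content part.
function toDataUrl(imageBytes, imageType) {
  return `data:image/${imageType};base64,${toBase64(imageBytes)}`;
}

// Usage sketch inside grabSelectorScreenshot(), after scrolling:
//   const bytes = await element.screenshot({ path: designatedPathPng, type: "png" });
//   await run(toBase64(bytes), "png");
console.log(toDataUrl(Buffer.from("abc"), "png")); // data:image/png;base64,YWJj
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;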



&lt;p&gt;There are a couple of interesting things about the images:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;different image file types (png/jpeg) might yield better results or better token usage metrics&lt;/li&gt;
&lt;li&gt;the number of tokens used is considerably lower than with text, which is pretty interesting, e.g.:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;{&lt;/span&gt; prompt_tokens: 799, completion_tokens: 289, total_tokens: 1088 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;it's very surprising that using images has a lower token usage than using text by several orders of magnitude&lt;/li&gt;
&lt;li&gt;chunking might be more challenging since you need to rely on visual marker detection which feels a bit more subjective&lt;/li&gt;
&lt;li&gt;it is pretty interesting what you can do with visuals, especially when it comes to extracting data. You could, for example, interpret charts automatically or extract other kinds of information encoded in images.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>openai</category>
      <category>javascript</category>
      <category>api</category>
      <category>gpt4</category>
    </item>
    <item>
      <title>Hacking out an AI spider with Node</title>
      <dc:creator>Artur Daschevici</dc:creator>
      <pubDate>Sat, 11 May 2024 09:06:38 +0000</pubDate>
      <link>https://dev.to/adaschevici/hacking-out-an-ai-spider-with-node-1h31</link>
      <guid>https://dev.to/adaschevici/hacking-out-an-ai-spider-with-node-1h31</guid>
      <description>&lt;h2&gt;
  
  
  Prologue
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Python&lt;/code&gt; is great for AI software, but I prefer &lt;code&gt;NodeJS&lt;/code&gt; for crawling with a headless browser. IMO &lt;code&gt;python&lt;/code&gt; has some good libraries for parsing content, but &lt;code&gt;puppeteer&lt;/code&gt; is slightly more powerful when it comes to headless browsers.&lt;br&gt;
That said, I wanted to run an experiment and pull some data from a page without going through the tedious process of inspecting every element of interest and discarding the fluff.&lt;br&gt;
Another nice thing about &lt;code&gt;puppeteer&lt;/code&gt; is that you don't need to reverse engineer the requests the page makes, figure out cookies and whatnot.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why?
&lt;/h2&gt;

&lt;p&gt;Web/data scraping is pretty cool: it gives you the power to combine different data sources and devise interesting ways to draw conclusions from them. It is even more interesting these days with ChatGPT, since you can simply dump some data and ask what it can do with it, or whether it can extract usable info from it. That is in fact quite nice.&lt;/p&gt;
&lt;h2&gt;
  
  
  What?
&lt;/h2&gt;

&lt;p&gt;We're going to look at a &lt;a href="https://www.immobiliare.it/vendita-case/verona/?criterio=rilevanza&amp;amp;prezzoMassimo=120000" rel="noopener noreferrer"&gt;listing site&lt;/a&gt; for properties for sale in Verona, Italy. I picked it at random for no real reason, mostly to see if the approach works.&lt;/p&gt;

&lt;p&gt;Let's plan this out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;grab the content&lt;/li&gt;
&lt;li&gt;identify the listings wrapper and grab its innerHTML&lt;/li&gt;
&lt;li&gt;pass in the HTML to &lt;code&gt;gpt-4-turbo&lt;/code&gt; via the API and construct a dialogue with it to extract the data we are looking for&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  How?
&lt;/h2&gt;

&lt;p&gt;We're going to install a few dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# we like speed so while npm is nice, pnpm is faster&lt;/span&gt;
npm i &lt;span class="nt"&gt;-g&lt;/span&gt; pnpm
pnpm i openai puppeteer puppeteer-extra dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then figure out the main listing element, which at the time of writing was conveniently simple: &lt;code&gt;.in-realEstateResults&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Some quick &lt;a href="https://pptr.dev/" rel="noopener noreferrer"&gt;docs fumbling&lt;/a&gt; and you get a working script for loading the page and grabbing the innerHTML of the listing element like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;newPage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setUserAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
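  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// load the listing page before querying it&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://www.immobiliare.it/vendita-case/verona/?criterio=rilevanza&amp;amp;prezzoMassimo=120000&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;waitUntil&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;networkidle0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;element&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.in-realEstateResults&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;innerHtml&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerHTML&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;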
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
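&lt;p&gt;Before sending the HTML over, it can be worth getting a ballpark of its size in tokens. A minimal sketch using the rough "~4 characters per token" rule of thumb OpenAI gives for English text; HTML tends to tokenize differently, so this is only an approximation, and a real tokenizer would give exact counts. gpt-4-turbo's context window is 128k tokens; the 120k budget (leaving headroom for the prompt and answer) and the helper names are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;
```javascript
// Rough heuristic: about 4 characters per token for English text.
// Markup tokenizes differently, so treat the result as a ballpark figure.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// gpt-4-turbo has a 128k-token context window; this hypothetical budget leaves headroom.
function fitsInContext(text, budget = 120000) {
  return budget >= estimateTokens(text);
}

// Stand-in for the scraped innerHtml: a hypothetical 200,000-character string.
const sampleHtml = "x".repeat(200000);
console.log(estimateTokens(sampleHtml)); // 50000
console.log(fitsInContext(sampleHtml)); // true
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;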



&lt;p&gt;The &lt;code&gt;openai&lt;/code&gt; library API is quite straightforward, and they offer &lt;a href="https://platform.openai.com/docs/quickstart?context=node" rel="noopener noreferrer"&gt;quite a few examples&lt;/a&gt; too. Alongside the model's response, the library also returns usage metrics (token counts) for your specific query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4-turbo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Can you extract the property prices out of this
            html and send me the results? The text is in italian so
            you should translate that. Can you send me the output as JSON?`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;propertyInfoHtml&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Another extremely cool thing is that you can simply tell the model to return the data as &lt;code&gt;json&lt;/code&gt;, which effectively turns an &lt;code&gt;html&lt;/code&gt; page into a &lt;code&gt;json&lt;/code&gt; API. Basically you have a magic box that you tell what to extract from some structure: one flexible piece of code that can pull data out of various &lt;code&gt;html&lt;/code&gt; pages with different selectors, so no more hassle figuring those out. The output is also quite well structured and needs minimal post-processing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;[
    {
        "price": "€ 115.000",
        "description": "Bilocale via Felice Casorati 33, Borgo Venezia, Verona",
        "rooms": "2 locali",
        "size": "60 m²",
        "bathroom": "1 bagno",
        "floor": "Piano 1",
        "elevator": "Ascensore",
        "balcony": "Balcone"
    },
    {
        "price": "€ 120.000",
        "description": "Trilocale via Arnaldo Da Brescia, 27, Porto San Pancrazio, Verona",
        "rooms": "3 locali",
        "size": "77 m²",
        "bathroom": "1 bagno",
        "floor": "Piano R",
        "elevator": "No Ascensore",
        "cellar": "Cantina"
    },
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
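&lt;p&gt;Even "minimal post-processing" usually means two steps: pulling the JSON out of the reply, then turning the Italian-formatted strings into numbers. Here is a small sketch of that step. It assumes the reply text is available as &lt;code&gt;response.choices[0].message.content&lt;/code&gt; and contains a JSON array like the one above. The helper name &lt;code&gt;parseListings&lt;/code&gt; and the &lt;code&gt;priceEur&lt;/code&gt; and &lt;code&gt;sizeM2&lt;/code&gt; fields are made up for illustration.&lt;/p&gt;

```javascript
// Hypothetical helper: content is the assistant's message text, e.g.
// response.choices[0].message.content from the openai call above.
function parseListings(content) {
  // the model may wrap the JSON in prose or a code fence, so keep only the array
  const start = content.indexOf("[");
  const end = content.lastIndexOf("]");
  if (start === -1 || end === -1) {
    throw new Error("no JSON array found in the model output");
  }
  const listings = JSON.parse(content.slice(start, end + 1));

  // Italian number format: "." groups thousands and "," is the decimal separator,
  // so drop everything except digits and commas, then turn the comma into a dot
  const toNumber = (text) =>
    typeof text === "string" ? Number(text.replace(/[^\d,]/g, "").replace(",", ".")) : null;

  return listings.map((listing) => ({
    ...listing,
    priceEur: toNumber(listing.price), // "€ 115.000" becomes 115000
    sizeM2: toNumber(listing.size), // "60 m²" becomes 60
  }));
}

const reply = 'Here is the data: [{"price": "€ 115.000", "size": "60 m²"}]';
console.log(parseListings(reply));
```

&lt;p&gt;Slicing from the first &lt;code&gt;[&lt;/code&gt; to the last &lt;code&gt;]&lt;/code&gt; is a deliberately naive way to tolerate extra prose around the JSON. If the reply is not valid JSON, &lt;code&gt;JSON.parse&lt;/code&gt; throws, and you can retry the request.&lt;/p&gt;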



&lt;h2&gt;
  
  
  How much though?
&lt;/h2&gt;

&lt;p&gt;Even with the low pricing, the cost of this processing can quickly get out of hand, so you want to cache results. You also need to pay attention to the context window of the model you are using. The last line of the snippet, which logs &lt;code&gt;response.usage&lt;/code&gt;, tells you how many tokens the query used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt; &lt;span class="o"&gt;{&lt;/span&gt; prompt_tokens: 30337, completion_tokens: 972, total_tokens: 31309 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
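&lt;p&gt;Here is a minimal caching sketch, assuming Node.js and an in-memory &lt;code&gt;Map&lt;/code&gt;. Results are keyed by a SHA-256 hash of the model name plus the page HTML, so the same page is only sent once per process. &lt;code&gt;withCache&lt;/code&gt; and the &lt;code&gt;extract&lt;/code&gt; callback are hypothetical names: &lt;code&gt;extract&lt;/code&gt; stands in for a function that wraps the &lt;code&gt;openai.chat.completions.create&lt;/code&gt; call shown earlier.&lt;/p&gt;

```javascript
const { createHash } = require("node:crypto");

// Hypothetical wrapper: extract is any function html => Promise, for example one
// that calls openai.chat.completions.create with the prompt shown earlier.
function withCache(extract, model = "gpt-4-turbo") {
  const cache = new Map(); // sha256(model + html) -> pending or settled Promise

  return function cachedExtract(html) {
    const key = createHash("sha256").update(model).update("\n").update(html).digest("hex");
    if (cache.has(key)) {
      return cache.get(key); // same page, same model: no new tokens spent
    }
    // the Promise executor runs synchronously, so concurrent callers share one request
    const pending = new Promise((resolve) => resolve(extract(html))).catch((err) => {
      cache.delete(key); // never cache failures, so a later call can retry
      throw err;
    });
    cache.set(key, pending);
    return pending;
  };
}
```

&lt;p&gt;Caching the promise rather than the resolved value means two concurrent requests for the same page share one API call. Because the key is a hash of the raw HTML, any volatile markup (timestamps, session tokens) busts the cache. For anything that should outlive a single process, you would persist the results instead, for example to disk or a key-value store.&lt;/p&gt;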



&lt;p&gt;A prompt of that size is no problem for &lt;code&gt;gpt-4-turbo&lt;/code&gt;, but smaller models, for example ones you can host locally, might not fit it into their context window. Some pages out there are also a lot larger than this one.&lt;/p&gt;
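&lt;p&gt;One rough way to catch oversized pages before they reach a smaller model is a pre-flight estimate. This sketch assumes the common rule of thumb of roughly 4 characters per token for English text. Markup-heavy HTML can tokenize quite differently, so for exact counts use the tokenizer that matches your model (for example tiktoken). &lt;code&gt;fitsContext&lt;/code&gt;, &lt;code&gt;contextWindow&lt;/code&gt; and &lt;code&gt;reservedForOutput&lt;/code&gt; are hypothetical names, and the limits you pass in should come from your model's documentation.&lt;/p&gt;

```javascript
// Rule-of-thumb ratio for English text; an estimate only, not a real tokenizer
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// contextWindow: the model's total token limit (prompt + completion)
// reservedForOutput: tokens kept free for the model's answer
function fitsContext(text, { contextWindow, reservedForOutput = 1000 }) {
  const estimated = estimateTokens(text);
  const budget = contextWindow - reservedForOutput;
  return { estimated, budget, fits: budget >= estimated };
}

const bigPage = "x".repeat(40000); // stand-in for ~40k characters of HTML
console.log(fitsContext(bigPage, { contextWindow: 8192 }));
// { estimated: 10000, budget: 7192, fits: false }
```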

&lt;h2&gt;
  
  
  Trimming the fat
&lt;/h2&gt;

&lt;p&gt;If you think about it, a big part of &lt;code&gt;html&lt;/code&gt; does not define structure. It is either display information (CSS) or interaction capabilities (events and such), and neither has any real value for the data, which is what we are after.&lt;br&gt;
What if we removed all of that? Crazy idea, right?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm i jsdom
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So let's add an intermediate step before sending the &lt;code&gt;html&lt;/code&gt; out to OpenAI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dom&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;JSDOM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;innerHtml&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;spinner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Cleaning up HTML&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;dom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;// strip every attribute (class, style, on* handlers, data-*): presentation and behaviour, not data&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;elements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelectorAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;*&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;elements&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// copy the live attributes collection first, since removing attributes mutates it&lt;/span&gt;
      &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;attribute&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;removeAttribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;attribute&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cleanedHtml&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;dom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;serialize&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And presto, the metrics of the trimmed down version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;{&lt;/span&gt; prompt_tokens: 7790, completion_tokens: 586, total_tokens: 8376 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pretty handy: the prompt went from 30,337 to 7,790 tokens, roughly a quarter of the original.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;the &lt;code&gt;gpt-4-turbo&lt;/code&gt; APIs are definitely worth exploring; 15 minutes of exploration does not do them justice&lt;/li&gt;
&lt;li&gt;you can trim the content and drop everything that is not important for your query&lt;/li&gt;
&lt;li&gt;caching might work in some cases depending on what data you want to pull&lt;/li&gt;
&lt;li&gt;slimming down the context actually seems to make quite a difference&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>openai</category>
      <category>javascript</category>
      <category>gpt3</category>
      <category>gpt4</category>
    </item>
    <item>
      <title>Adding search to static websites</title>
      <dc:creator>Artur Daschevici</dc:creator>
      <pubDate>Sat, 04 May 2024 22:37:59 +0000</pubDate>
      <link>https://dev.to/adaschevici/adding-search-to-static-websites-3del</link>
      <guid>https://dev.to/adaschevici/adding-search-to-static-websites-3del</guid>
      <description>&lt;h2&gt;
  
  
  In the beginning
&lt;/h2&gt;

&lt;p&gt;In a &lt;a href="https://dev.to/adaschevici/building-static-websites-1f0c"&gt;previous post&lt;/a&gt; I touched a bit on how lately building static websites has piqued my interest. I have been contemplating starting a blog and fell into my usual routine of digging myself into a rabbit hole. It's kinda annoying the amount of yak-shaving I volunteer my brain for 🤣.&lt;/p&gt;

&lt;p&gt;One thing I find limiting is that, due to the nature of static sites, you kinda need to flip the script a little: data normally comes in at build time. This in turn makes site search a bit weird to think about.&lt;/p&gt;

&lt;p&gt;Normally search would be backed by some kind of database, a search index, and a server-side ranking algorithm and filter API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why?
&lt;/h2&gt;

&lt;p&gt;In this particular instance I think the answer is self-explanatory: search is not really a luxury anymore. It allows for better discoverability, and even though most visitors arrive from various external posts via backlinks, without search the blog would look rudimentary.&lt;/p&gt;

&lt;h2&gt;
  
  
  How?
&lt;/h2&gt;

&lt;p&gt;While you usually wouldn't fetch data for pages in a static website, the best-performing search in terms of relevance requires a server-side implementation. One example that also works on static websites is &lt;a href="https://simplystatic.com/tutorials/algolia-with-wordpress/#Search-functionality-for-static-websites" rel="noopener noreferrer"&gt;Algolia&lt;/a&gt;, a hosted search service. It's pretty good, and it's great that it supports static websites.&lt;/p&gt;

&lt;p&gt;Algolia is paid, so take that into consideration if you are thinking about using it, especially if you are not some kind of social media influencer.&lt;/p&gt;

&lt;p&gt;I am not, so my first instinct is to look into free options. Luckily there are free options implemented fully on the client side.&lt;/p&gt;

&lt;p&gt;Architecturally, they implement a dumbed-down version of the server-side alternative. As mentioned before, I started digging into Rust and &lt;a href="https://www.getzola.org/themes/" rel="noopener noreferrer"&gt;Zola&lt;/a&gt; and came across this really amazing &lt;a href="https://endler.dev/2019/tinysearch/" rel="noopener noreferrer"&gt;blog post&lt;/a&gt; about implementing static search with WASM and Rust.&lt;/p&gt;

&lt;p&gt;Essentially, the dumbed-down version is an index that is embedded in the site itself, so the whole search experience runs in the browser and works on a static website. It is not designed for millions of pages, and performance tends to degrade as the page count grows.&lt;/p&gt;

&lt;p&gt;Once performance becomes a concern, you might want to optimize the index itself. There are multiple options: you could implement &lt;a href="https://sts10.github.io/2023/01/11/playing-with-binary-fuse-filters.html" rel="noopener noreferrer"&gt;binary fuse filters&lt;/a&gt;, &lt;a href="https://www.stavros.io/posts/bloom-filter-search-engine/" rel="noopener noreferrer"&gt;Bloom filters&lt;/a&gt;, or XOR filters like the ones suggested in the &lt;a href="https://endler.dev/2019/tinysearch/" rel="noopener noreferrer"&gt;blog post&lt;/a&gt;. For a further performance bump, server side is your best bet.&lt;/p&gt;
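&lt;p&gt;To make the filter idea less abstract, here is a toy Bloom filter index in JavaScript. It builds one filter per page at build time and queries them in the browser. This is a hypothetical sketch, not the implementation from the tinysearch post. The FNV-1a hash, the double-hashing scheme, the 256-bit size and the sample pages are arbitrary choices for illustration. A Bloom filter can return false positives but never false negatives, so any page it rules out really lacks the word.&lt;/p&gt;

```javascript
// 32-bit FNV-1a over the string's code points; the seed lets us derive two hashes
function fnv1a(str, seed) {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (const ch of str) {
    h ^= ch.codePointAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

class BloomFilter {
  constructor(size, hashCount) {
    this.size = size;
    this.hashCount = hashCount;
    this.bits = new Uint8Array(size); // one byte per bit keeps the sketch readable
  }

  // double hashing: position_i = (h1 + i * h2) mod size
  positions(word) {
    const h1 = fnv1a(word, 0);
    const h2 = fnv1a(word, 0x9e3779b9) || 1;
    return Array.from({ length: this.hashCount }, (_, i) => (h1 + i * h2) % this.size);
  }

  add(word) {
    for (const p of this.positions(word)) this.bits[p] = 1;
  }

  // false means "definitely absent", true means "probably present"
  mightContain(word) {
    return this.positions(word).every((p) => this.bits[p] === 1);
  }
}

const tokenize = (text) => text.toLowerCase().split(/\W+/).filter(Boolean);

// built once at site-build time and shipped as static data (sample pages are made up)
const pages = {
  "/posts/static-sites": "Static websites are prebuilt HTML pages",
  "/posts/search": "Client side search with fuse.js and Bloom filters",
};
const index = Object.entries(pages).map(([url, text]) => {
  const filter = new BloomFilter(256, 3);
  tokenize(text).forEach((word) => filter.add(word));
  return { url, filter };
});

// in the browser: keep the pages whose filter may contain every query word
function search(query) {
  const words = tokenize(query);
  return index
    .filter(({ filter }) => words.every((word) => filter.mightContain(word)))
    .map(({ url }) => url);
}

console.log(search("bloom filters"));
```

&lt;p&gt;Real implementations size each filter from the expected number of words and a target false-positive rate, instead of using a fixed 256 bits.&lt;/p&gt;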

&lt;h2&gt;
  
  
  Implementation quirks
&lt;/h2&gt;

&lt;p&gt;I chose &lt;a href="https://www.fusejs.io/" rel="noopener noreferrer"&gt;fuse.js&lt;/a&gt; for one of my projects. At a small scale it seems to perform just fine, and I haven't hit any performance bottleneck yet.&lt;/p&gt;

&lt;p&gt;When it comes to implementing search, you want to aim for keyword density, but also for &lt;a href="https://www.oxfordsemantic.tech/faqs/what-is-faceted-search" rel="noopener noreferrer"&gt;facets&lt;/a&gt; (filters on metadata such as author or category). This is where the purely static solution may feel a bit naive.&lt;/p&gt;

&lt;p&gt;Since we are talking static websites, a faceted search means filtering content on the metadata embedded in the pages before you pass it to the fuse.js search "engine".&lt;/p&gt;

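&lt;p&gt;You can do the filtering in plain &lt;code&gt;JS&lt;/code&gt;, or do an exact-match search on a single field with fuse.js. With &lt;code&gt;useExtendedSearch: true&lt;/code&gt;, a query prefixed with &lt;code&gt;=&lt;/code&gt; only returns exact matches.&lt;br&gt;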
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Fuse&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fuse.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;includeScore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;author&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;useExtendedSearch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;books&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;The Great Gatsby&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;F. Scott Fitzgerald&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;The Da Vinci Code&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Dan Brown&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;The Catcher in the Rye&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;J.D. Salinger&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fuse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Fuse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;books&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;="Dan Brown"&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fuse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
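&lt;p&gt;For the plain JS route, a sketch could look like the following. &lt;code&gt;filterByFacets&lt;/code&gt; is a hypothetical helper, not part of fuse.js. It keeps only the documents whose metadata matches every selected facet, and the survivors are what you would then hand to Fuse for fuzzy matching.&lt;/p&gt;

```javascript
// Hypothetical facet filter: keep documents whose metadata matches every selected
// facet. A facet value may be a single value or a list of accepted values, and a
// document field may itself be a single value or an array (e.g. tags).
function filterByFacets(docs, facets) {
  return docs.filter((doc) =>
    Object.entries(facets).every(([field, accepted]) => {
      const acceptedValues = [].concat(accepted);
      const docValues = [].concat(doc[field]);
      return docValues.some((value) => acceptedValues.includes(value));
    })
  );
}

const books = [
  { title: "The Great Gatsby", author: "F. Scott Fitzgerald" },
  { title: "The Da Vinci Code", author: "Dan Brown" },
  { title: "The Catcher in the Rye", author: "J.D. Salinger" },
];

const byDanBrown = filterByFacets(books, { author: "Dan Brown" });
console.log(byDanBrown.map((book) => book.title)); // [ 'The Da Vinci Code' ]

// the narrowed list is what you would then index for fuzzy matching, e.g.
// new Fuse(byDanBrown, { keys: ["title"] })
```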



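&lt;p&gt;Then, once you have the facet-filtered results, you can pass them on to the regular fuzzy search. Fuse's logical operators (&lt;code&gt;$and&lt;/code&gt;, &lt;code&gt;$or&lt;/code&gt;) also let you combine conditions on several fields in one query:&lt;br&gt;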
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;$and&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;$path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;author&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="na"&gt;$val&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cott&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;$path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;title&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;$val&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;The&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

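&lt;span class="c1"&gt;// logical queries only match indexed keys, so "title" has to be listed next to "author"&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fuse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Fuse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;books&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;author&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;title&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;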

&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fuse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a server-side implementation, the filtering and the fuzzy search would usually happen in a single operation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;you can do some of the search functionality without a backend server - think static&lt;/li&gt;
&lt;li&gt;while it may not scale, do you really have that as a hard requirement? &lt;a href="https://paulgraham.com/ds.html" rel="noopener noreferrer"&gt;Paul Graham's do things that don't scale&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;this approach is good for prototyping because it lets you gauge how much traction search actually gets&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>search</category>
      <category>javascript</category>
      <category>fusejs</category>
    </item>
    <item>
      <title>Building static websites</title>
      <dc:creator>Artur Daschevici</dc:creator>
      <pubDate>Tue, 30 Apr 2024 06:19:18 +0000</pubDate>
      <link>https://dev.to/adaschevici/building-static-websites-1f0c</link>
      <guid>https://dev.to/adaschevici/building-static-websites-1f0c</guid>
      <description>&lt;h2&gt;
  
  
  Static websites
&lt;/h2&gt;

&lt;p&gt;For anyone not yet familiar: a static website is deployed, hosted and served prebundled, as a collection of static HTML pages whose site navigation map is constructed at build time.&lt;/p&gt;

&lt;p&gt;You can include elements that emulate dynamic content (e.g. content that would normally be fetched from a database), but that content is still only updated when the site, or parts of it, are rebuilt.&lt;/p&gt;

&lt;p&gt;This approach has led to a proliferation of platforms that offer static hosting as a service (&lt;a href="https://www.netlify.com/" rel="noopener noreferrer"&gt;Netlify&lt;/a&gt;, &lt;a href="https://vercel.com/" rel="noopener noreferrer"&gt;Vercel&lt;/a&gt;, &lt;a href="https://www.cloudflare.com/developer-platform/pages/" rel="noopener noreferrer"&gt;Cloudflare&lt;/a&gt; etc.), and also of frameworks with different strengths and weaknesses (&lt;a href="https://developers.cloudflare.com/pages/framework-guides/" rel="noopener noreferrer"&gt;list of frameworks supported by Cloudflare&lt;/a&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  Why?
&lt;/h2&gt;

&lt;p&gt;One reason to opt for a static website rather than a dynamic one is that the content might not change very often, which makes hitting a backend application server on every request redundant.&lt;/p&gt;

&lt;p&gt;That said, you can still refresh the content by rebuilding on a &lt;code&gt;cron&lt;/code&gt; schedule (every minute, every twelve hours) or with a mix of on-demand and &lt;code&gt;cron&lt;/code&gt; builds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture and pitfalls
&lt;/h2&gt;

&lt;p&gt;The idea when working on a static website is that 90% of the website content is created when the site is built. It is not necessarily static, but it changes at a much slower rate than, say, a real-time application would. For example, a Google Maps style app for tracking your current route would not be a good candidate for a static website, whereas a parcel tracking website could be static, no problem, provided the updates are infrequent.&lt;/p&gt;

&lt;p&gt;One nice aspect of static websites is security. Access to the core is, for the most part, gated, so you can focus on the few truly dynamic endpoints and tighten security on those, in effect decreasing the attack surface by quite a bit.&lt;/p&gt;

&lt;p&gt;In the end it is a numbers game, where "infrequent" could mean updating the site only a few times each day. It is a numbers game because, if you set up your flow optimally, hosting costs can be extremely low, if not zero. That can be a great perk if you are unsure about product-market fit or your general vision.&lt;/p&gt;

&lt;p&gt;A standard flow in static websites is that data comes in from some third-party data source that can trigger builds on change (think headless CMS with webhooks, e.g. &lt;a href="https://www.contentful.com/products/" rel="noopener noreferrer"&gt;Contentful&lt;/a&gt;). Each change in the CMS should trigger a rebuild of the website.&lt;/p&gt;

&lt;p&gt;Most CI/CD platforms offer some kind of free tier for builds, usually measured in build minutes, so faster builds mean more builds and fresher content.&lt;/p&gt;
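&lt;p&gt;A quick back-of-the-envelope calculation shows why build speed matters on a minutes-based free tier. The 300-minute monthly budget and the 6- and 2-minute build times below are assumptions for illustration, not any provider's actual limits. The function name &lt;code&gt;buildBudget&lt;/code&gt; is hypothetical too.&lt;/p&gt;

```javascript
// Back-of-the-envelope build budget for a minutes-based CI free tier.
// All numbers passed in below are assumptions, not real provider limits.
function buildBudget({ freeMinutesPerMonth, minutesPerBuild, daysPerMonth = 30 }) {
  const buildsPerMonth = Math.floor(freeMinutesPerMonth / minutesPerBuild);
  const minutesInMonth = daysPerMonth * 24 * 60;
  // the tightest evenly spaced cron interval that stays inside the free tier
  const minIntervalHours = buildsPerMonth > 0 ? minutesInMonth / buildsPerMonth / 60 : Infinity;
  return { buildsPerMonth, minIntervalHours };
}

console.log(buildBudget({ freeMinutesPerMonth: 300, minutesPerBuild: 6 }));
// { buildsPerMonth: 50, minIntervalHours: 14.4 }
console.log(buildBudget({ freeMinutesPerMonth: 300, minutesPerBuild: 2 }));
// { buildsPerMonth: 150, minIntervalHours: 4.8 }
```

&lt;p&gt;Under these assumed numbers, cutting the build time to a third triples the number of builds. That takes you from fewer than two scheduled refreshes a day to five.&lt;/p&gt;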

&lt;p&gt;There is another big caveat to be aware of: the number of inbound requests. In the past couple of years there have been war stories where &lt;a href="https://news.ycombinator.com/item?id=39520776" rel="noopener noreferrer"&gt;people have been billed absurd amounts of money&lt;/a&gt; because they were hit by DDoS attacks.&lt;/p&gt;

&lt;p&gt;So I would advise using some kind of DDoS protection on your site, since inbound traffic comes at a cost too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Case studies
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Case study 1: Netlify, Gatsby and build times
&lt;/h4&gt;

&lt;p&gt;I first started building static websites when I discovered &lt;a href="https://www.gatsbyjs.com/" rel="noopener noreferrer"&gt;Gatsby&lt;/a&gt;. I built several projects with Gatsby and hosted them on the &lt;a href="https://www.netlify.com/" rel="noopener noreferrer"&gt;Netlify&lt;/a&gt; free tier. It felt like a really robust architecture, and I loved that it was free.&lt;/p&gt;

&lt;p&gt;I was working with React, so this model and Gatsby made a lot of sense. I built all my React workshops as static websites, particularly since Netlify has a free tier and Gatsby has many templates to &lt;a href="https://www.gatsbyjs.com/starters/" rel="noopener noreferrer"&gt;choose from&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;At one point, though, I realized I had a scaling problem with my &lt;a href="https://www.netlify.com/pricing/faq/" rel="noopener noreferrer"&gt;build minutes&lt;/a&gt;. I knew that Go has considerably faster builds, so in my case the easy fix was swapping over to &lt;a href="https://gohugo.io/" rel="noopener noreferrer"&gt;Hugo&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Case study 2: &lt;a href="https://gohugo.io/" rel="noopener noreferrer"&gt;Hugo&lt;/a&gt;
&lt;/h4&gt;

&lt;p&gt;Moving over to Hugo cut my build times to a third. It was pretty amazing, and given the number of builds I was running each month, it made the free minute allowance suffice.&lt;/p&gt;

&lt;p&gt;Hugo also has a nice &lt;a href="https://themes.gohugo.io/" rel="noopener noreferrer"&gt;template library&lt;/a&gt;, and one of the templates was &lt;a href="https://learn.netlify.app/en/" rel="noopener noreferrer"&gt;quite fitting&lt;/a&gt; for my use case. I think the people at &lt;a href="https://solo.awsworkshop.io/" rel="noopener noreferrer"&gt;AWSWorkshops&lt;/a&gt; use it for their workshops.&lt;/p&gt;

&lt;h4&gt;
  
  
  Case study 3: &lt;a href="https://www.getzola.org/" rel="noopener noreferrer"&gt;Zola&lt;/a&gt;
&lt;/h4&gt;

&lt;p&gt;Since then I have gotten more and more into Rust, but I haven't found many Rust static website builders with nearly as much traction as the others I have mentioned. I think it has to do with the size of the community. At one point, during a Rust codehunt, I asked about it and was left with the impression that Zola is somewhat deprecated. I was unable to find many resources to either confirm or reject this, so if anyone out there has some insights I would be happy to reframe my thinking.&lt;/p&gt;

&lt;p&gt;Zola comes with a list of &lt;a href="https://www.getzola.org/themes/" rel="noopener noreferrer"&gt;templates&lt;/a&gt; too, and some of them look quite nice.&lt;/p&gt;

&lt;p&gt;One thing worth highlighting with Zola is that, in my experience, its builds are not as fast as Hugo's. Go is optimized for very fast builds and has fewer compile-time checks, so it makes sense that it is faster to build.&lt;/p&gt;

&lt;h4&gt;
  
  
  Case study 4: &lt;a href="https://astro.build/" rel="noopener noreferrer"&gt;Astro&lt;/a&gt;
&lt;/h4&gt;

&lt;p&gt;Recently I started looking into other options and stumbled upon Astro. It's &lt;a href="https://github.com/withastro/astro" rel="noopener noreferrer"&gt;OSS&lt;/a&gt; and has very good &lt;a href="https://docs.astro.build/en/guides/deploy/vercel/" rel="noopener noreferrer"&gt;integration with Vercel&lt;/a&gt;, but it can be deployed on any static hosting platform. (Updated courtesy of &lt;a href="https://dev.to/iainsimmons"&gt;Iain Simmons&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;Astro is quite nice because it is very hybrid-friendly: it lets you integrate your framework of choice into your flow on top of a base of Astro templating. It's an elegant way of approaching micro frontends with a static twist.&lt;/p&gt;

&lt;p&gt;Not everything is easy, or even possible, in Astro, but you can use a mixture of Svelte, React, Angular and Vue in a single website if you want to.&lt;/p&gt;

&lt;p&gt;Astro also has a pretty extensive gallery of &lt;a href="https://astro.build/showcase/" rel="noopener noreferrer"&gt;themes&lt;/a&gt; that you could use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;building static websites is fun for me: I get to build cheap apps with clear rules for optimization&lt;/li&gt;
&lt;li&gt;build performance is key&lt;/li&gt;
&lt;li&gt;you get a lower attack surface out of the box&lt;/li&gt;
&lt;li&gt;if you are going to build a blog, static might be a really good way to go&lt;/li&gt;
&lt;li&gt;beware of how your hosting scales with the number of visitors&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>staticwebsites</category>
      <category>astro</category>
      <category>hugo</category>
      <category>gatsby</category>
    </item>
  </channel>
</rss>
