<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Roger Oriol</title>
    <description>The latest articles on DEV Community by Roger Oriol (@rogiia).</description>
    <link>https://dev.to/rogiia</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F905788%2F8cf647ea-956c-4ca9-aa00-2cd63540878f.png</url>
      <title>DEV Community: Roger Oriol</title>
      <link>https://dev.to/rogiia</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rogiia"/>
    <language>en</language>
    <item>
      <title>How to Write a Good index.html File</title>
      <dc:creator>Roger Oriol</dc:creator>
      <pubDate>Sat, 22 Mar 2025 22:02:36 +0000</pubDate>
      <link>https://dev.to/rogiia/how-to-write-a-good-indexhtml-file-1ajd</link>
      <guid>https://dev.to/rogiia/how-to-write-a-good-indexhtml-file-1ajd</guid>
      <description>&lt;p&gt;Every web developer has been there: you're starting a new project and staring at an empty file called &lt;code&gt;index.html&lt;/code&gt;. You try to remember: which tags go in the &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt; again? Which meta tags are best practice, and which ones are deprecated?&lt;/p&gt;

&lt;p&gt;Recently, I found myself in this exact situation. My first instinct was to copy the head section from a previous project, but as I reviewed the code, I realized some tags were outdated or simply didn't apply to my new project. What followed was a deep dive into HTML head tags – which ones are essential, which are optional, and which are just cluttering my code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're in a hurry and just want the template:&lt;/strong&gt; You can find my complete starter template on &lt;a href="https://github.com/rogiia/html-starter-template" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. The repository contains two main files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;index.html&lt;/code&gt;: A clean, minimalist template with just what you need and no unnecessary extras.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;index-commented.html&lt;/code&gt;: The same template but with detailed comments explaining how and why you should use each tag.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article is essentially a deep dive into the comments from the &lt;code&gt;index-commented.html&lt;/code&gt; file, providing more context and explanations for each decision made in the template.&lt;/p&gt;

&lt;p&gt;This template represents my opinionated approach after researching current best practices. It aims to be a solid foundation for most web projects while maintaining good performance, accessibility, and search engine optimization.&lt;/p&gt;

&lt;p&gt;Let's dive into the essential components of a well-structured HTML head.&lt;/p&gt;

&lt;h3&gt;The tags you must include&lt;/h3&gt;

&lt;p&gt;These tags should be present in virtually every HTML document you create. They're essential for proper rendering, SEO, and accessibility.&lt;/p&gt;

&lt;h4&gt;&lt;code&gt;&amp;lt;!DOCTYPE html&amp;gt;&lt;/code&gt; and &lt;code&gt;lang="en"&lt;/code&gt;: Setting the document type and language&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;!DOCTYPE html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;html&lt;/span&gt; &lt;span class="na"&gt;lang=&lt;/span&gt;&lt;span class="s"&gt;"en"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Always begin your HTML document with the doctype declaration. This tells browsers which version of HTML you're using (in this case, HTML5) and helps ensure consistent rendering. The &lt;code&gt;lang&lt;/code&gt; attribute on the &lt;code&gt;&amp;lt;html&amp;gt;&lt;/code&gt; tag specifies the language of your page - this is crucial for screen readers, search engines, and browsers. If your content is in a different language, change the code accordingly (e.g., &lt;code&gt;lang="es"&lt;/code&gt; for Spanish).&lt;/p&gt;

&lt;h4&gt;&lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt;: The page title&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;title&amp;gt;&lt;/span&gt;Hello world!&lt;span class="nt"&gt;&amp;lt;/title&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every HTML document must have a title tag. This text appears in browser tabs, bookmarks, and search engine results. Make your titles descriptive yet concise, ideally under 60 characters. A good title both informs users about your page content and includes relevant keywords for SEO.&lt;/p&gt;

&lt;h4&gt;&lt;code&gt;&amp;lt;meta name="viewport"&amp;gt;&lt;/code&gt;: Configuring viewport for responsive design&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"viewport"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;"viewport-fit=cover, width=device-width, initial-scale=1.0, minimum-scale=1.0, maximum-scale=5.0"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In today's mobile-first world, this meta tag is non-negotiable. It tells mobile browsers how to control the page's dimensions and scaling. Let's break down what each parameter does:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;viewport-fit=cover&lt;/code&gt;: Ensures content extends to the edge of the display (especially important for notched phones)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;width=device-width&lt;/code&gt;: Sets the width of the page to follow the screen width of the device&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;initial-scale=1.0&lt;/code&gt;: Sets the initial zoom level when the page is first loaded&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;minimum-scale=1.0&lt;/code&gt;: Prevents users from zooming out too much&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;maximum-scale=5.0&lt;/code&gt;: Allows users to zoom in up to 5x (limiting this completely would harm accessibility)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this tag, mobile devices will render pages at a typical desktop screen width and then scale them down, resulting in tiny, unreadable text and forcing users to zoom and pan.&lt;/p&gt;

&lt;h4&gt;&lt;code&gt;&amp;lt;meta name="description"&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;meta name="keywords"&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;meta name="author"&amp;gt;&lt;/code&gt;: Essential meta information&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"description"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"keywords"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"author"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These meta tags provide important information about your page:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;description&lt;/code&gt;: A concise summary of your page content (ideally 150-160 characters). This often appears in search engine results below your title.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;keywords&lt;/code&gt;: Relevant keywords for your page content. While less important for Google these days, other search engines and crawlers may still use this information.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;author&lt;/code&gt;: The name of the individual or organization that created the page.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While these tags don't directly affect page rendering, they're valuable for SEO and content categorization.&lt;/p&gt;

&lt;h4&gt;&lt;code&gt;&amp;lt;link rel="canonical"&amp;gt;&lt;/code&gt;: Avoiding duplicate indexation&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"canonical"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tag helps search engines avoid indexing the same content multiple times when it's accessible via different URLs. For example, if your page is accessible via multiple URLs (like &lt;code&gt;example.com/page&lt;/code&gt; and &lt;code&gt;example.com/page/index.html&lt;/code&gt;), the canonical tag tells search engines which URL is the "official" version to index, preventing duplicate indexation which can harm your search rankings.&lt;/p&gt;

&lt;p&gt;Fill in the &lt;code&gt;href&lt;/code&gt; attribute with the primary URL you want search engines to associate with this content.&lt;/p&gt;

&lt;h4&gt;CSS loading strategies: Critical inline CSS vs. external stylesheets&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;style&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;body&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;background&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;#fefefe&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;color&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;#222&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;font-family&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;'Roboto'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;sans-serif&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;padding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1rem&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;line-height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1.8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/style&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"preload"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"main.css"&lt;/span&gt; &lt;span class="na"&gt;as=&lt;/span&gt;&lt;span class="s"&gt;"style"&lt;/span&gt; &lt;span class="na"&gt;onload=&lt;/span&gt;&lt;span class="s"&gt;"this.onload=null;this.rel='stylesheet'"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;noscript&amp;gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"stylesheet"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"main.css"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/noscript&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This strategy for CSS loading optimizes performance in two ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Inline critical CSS&lt;/strong&gt;: By embedding essential styles directly in the HTML document, you avoid making an additional network request that would block rendering. This makes critical styles load blazingly fast and prevents the Flash of Unstyled Content (FOUC) that can occur when styles load after content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asynchronous loading for non-critical CSS&lt;/strong&gt;: The preload technique with the onload handler allows the main stylesheet to load without blocking rendering. This means your page can start displaying while the rest of the styles are still loading, creating a better user experience. The noscript tag provides a fallback for users with JavaScript disabled.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Alternatively, if your site doesn't have significant styling needs above the fold, you can use a simpler approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"main.css"&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"stylesheet"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is more straightforward but can slow initial rendering as the browser must download and parse the CSS before displaying content.&lt;/p&gt;

&lt;h4&gt;&lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt;: Script loading best practices&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"module"&lt;/span&gt; &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"app.js"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For JavaScript, the &lt;code&gt;type="module"&lt;/code&gt; attribute offers several advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deferred by default, so the script executes only after the document has been fully parsed&lt;/li&gt;
&lt;li&gt;Enables ECMAScript modules for better code organization&lt;/li&gt;
&lt;li&gt;Runs in strict mode by default&lt;/li&gt;
&lt;li&gt;Allows for cleaner dependency management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For scripts that don't depend on DOM elements and should run as soon as possible, consider adding the &lt;code&gt;async&lt;/code&gt; attribute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"module"&lt;/span&gt; &lt;span class="na"&gt;async&lt;/span&gt; &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"analytics.js"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Additionally, it's good practice to register a service worker for offline capabilities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script&amp;gt;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;serviceWorker&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEventListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;load&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;serviceWorker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/service-worker.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script is placed without &lt;code&gt;defer&lt;/code&gt; or &lt;code&gt;async&lt;/code&gt; attributes so it loads and executes as soon as possible, enabling offline capabilities early in the page load process. The service worker runs in the background, independent of your web page, allowing it to handle network requests and caching even when the user navigates away from your site.&lt;/p&gt;

&lt;h3&gt;The tags you maybe should include&lt;/h3&gt;

&lt;p&gt;These tags aren't necessary for every project but can be valuable in specific situations. Include them based on your project's needs.&lt;/p&gt;

&lt;h4&gt;&lt;code&gt;&amp;lt;meta charset="utf-8"&amp;gt;&lt;/code&gt;: Character encoding&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;charset=&lt;/span&gt;&lt;span class="s"&gt;"utf-8"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This meta tag specifies the character encoding for your HTML document. HTML5 expects documents to be UTF-8 encoded, and servers often declare the encoding in the &lt;code&gt;Content-Type&lt;/code&gt; header, so this tag isn't strictly necessary in many cases. However, including it explicitly ensures consistency across all browsers and prevents potential character rendering issues, especially with special characters or non-Latin alphabets.&lt;/p&gt;

&lt;h4&gt;&lt;code&gt;&amp;lt;base href="/"&amp;gt;&lt;/code&gt;: Defining a base URL&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;base&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"/"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The base tag specifies the base URL for all relative URLs in a document. If all your site's URLs are already relative to the root path ("/"), you don't need to include this tag. It's primarily useful when your site is hosted in a subdirectory but you want paths to be relative to the domain root, or when developing single-page applications with client-side routing.&lt;/p&gt;

&lt;h4&gt;&lt;code&gt;&amp;lt;meta name="application-name"&amp;gt;&lt;/code&gt;: Application details&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"application-name"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your Progressive Web App (PWA) should have a different name than what's specified in your title tag, use this meta tag. It defines the name that will appear when your web application is installed on a device or pinned to a user's start menu or taskbar.&lt;/p&gt;

&lt;h4&gt;&lt;code&gt;&amp;lt;meta name="theme-color"&amp;gt;&lt;/code&gt;: Browser UI theme color&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"theme-color"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;"#33d"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This meta tag defines the color used by the user agent in the UI elements around your page, such as the browser's address bar in mobile browsers or the title bar in some desktop browsers. Choose a color that reflects your brand identity to create a more integrated visual experience.&lt;/p&gt;

&lt;h4&gt;&lt;code&gt;&amp;lt;meta name="color-scheme"&amp;gt;&lt;/code&gt;: Light and dark mode support&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"color-scheme"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;"light dark"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tag informs the browser if your site supports light mode, dark mode, or both. The value &lt;code&gt;"light dark"&lt;/code&gt; means that both schemes are supported, with light being preferred. This helps browsers render form controls, scrollbars, and other UI elements in the appropriate color scheme, creating a better user experience that respects system preferences.&lt;/p&gt;

&lt;h4&gt;&lt;code&gt;&amp;lt;meta property="og:"&amp;gt;&lt;/code&gt;: Social media integration with Open Graph&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;property=&lt;/span&gt;&lt;span class="s"&gt;"og:title"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;property=&lt;/span&gt;&lt;span class="s"&gt;"og:type"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;"website"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;property=&lt;/span&gt;&lt;span class="s"&gt;"og:url"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;meta&lt;/span&gt; &lt;span class="na"&gt;property=&lt;/span&gt;&lt;span class="s"&gt;"og:image"&lt;/span&gt; &lt;span class="na"&gt;content=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open Graph meta tags optimize how your content appears when shared on social media platforms like Facebook, LinkedIn, and X (formerly Twitter). While not essential for basic functionality, they significantly improve the appearance and engagement of your content when shared.&lt;/p&gt;

&lt;p&gt;Key Open Graph tags include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;og:title&lt;/code&gt;: The title of your page/content (can differ from your HTML title)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;og:type&lt;/code&gt;: The type of content (website, article, product, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;og:url&lt;/code&gt;: The canonical URL of your page&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;og:image&lt;/code&gt;: The URL to an image representing your content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your site's content is likely to be shared on social platforms, filling these tags with appropriate values can dramatically improve click-through rates and engagement.&lt;/p&gt;

&lt;h4&gt;&lt;code&gt;&amp;lt;link rel="manifest"&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;link rel="icon"&amp;gt;&lt;/code&gt;: PWA support and favicons&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"manifest"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"manifest.json"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"icon"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"/favicon.ico"&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"image/x-icon"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Progressive Web Apps, use the manifest link to point to a JSON file that contains your app's metadata, including icons. The manifest should include your app's icons in various sizes for different devices and contexts.&lt;/p&gt;
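
&lt;p&gt;As an illustration, a minimal &lt;code&gt;manifest.json&lt;/code&gt; might look something like this (the name, colors and icon paths are placeholders to adapt to your project):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "name": "My App",
  "short_name": "App",
  "start_url": "/",
  "display": "standalone",
  "background_color": "#fefefe",
  "theme_color": "#33d",
  "icons": [
    { "src": "/icons/icon-192.png", "sizes": "192x192", "type": "image/png" },
    { "src": "/icons/icon-512.png", "sizes": "512x512", "type": "image/png" }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;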

&lt;p&gt;For regular websites that aren't PWAs, use the icon link to define your site's favicon. While technically optional, a favicon helps with brand recognition and user experience, so most sites should include one.&lt;/p&gt;

&lt;h4&gt;&lt;code&gt;&amp;lt;link rel="alternate"&amp;gt;&lt;/code&gt;: Alternate Content Types for your site&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"alternate"&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"application/rss+xml"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"/feed.xml"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"alternate"&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"text/markdown"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"/llms.txt"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This link tag serves multiple purposes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;It helps RSS readers and other feed aggregators discover your site's RSS feed. Include it if your site provides an RSS feed (common for blogs, news sites, or regularly updated content collections).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It can also be used to specify an &lt;code&gt;llms.txt&lt;/code&gt; file for your site as a &lt;code&gt;type="text/markdown"&lt;/code&gt; alternate link. This file provides your site's content in an easy-to-digest format for Large Language Models scanning your site. &lt;strong&gt;Note:&lt;/strong&gt; I stole that idea from &lt;a href="https://www.gilesthomas.com/2025/03/llmstxt" rel="noopener noreferrer"&gt;Giles Thomas&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If your site doesn't offer these features, you can safely omit these tags.&lt;/p&gt;

&lt;h4&gt;&lt;code&gt;&amp;lt;link rel="preload"&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;link rel="preconnect"&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;link rel="prefetch"&amp;gt;&lt;/code&gt;: Resource optimization&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"preload"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"https://fonts.googleapis.com/css2?family=Roboto:wght@400;700&amp;amp;display=swap"&lt;/span&gt; &lt;span class="na"&gt;as=&lt;/span&gt;&lt;span class="s"&gt;"style"&lt;/span&gt; &lt;span class="na"&gt;onload=&lt;/span&gt;&lt;span class="s"&gt;"this.onload=null;this.rel='stylesheet'"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"preconnect"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"https://fonts.gstatic.com"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"prefetch"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"/next-page.html"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These link tags help optimize resource loading:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;preload&lt;/code&gt;: Tells the browser to download and cache a resource as soon as possible. Useful for critical resources needed early in the rendering process, like fonts or important images.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;preconnect&lt;/code&gt;: Establishes an early connection to external domains from which you'll fetch resources later. This saves time by having the connection ready when needed.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;prefetch&lt;/code&gt;: Suggests to the browser that a resource might be needed for future navigation. The browser will download it when idle, making subsequent page loads faster.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use these selectively based on your performance needs. Over-using them can waste bandwidth, so focus on truly critical resources.&lt;/p&gt;

&lt;h3&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;This template gives you a solid starting point for any web project. Of course, the template is opinionated, and the best setup for your needs may vary. If I left out some common tags, it's probably because they aren't needed in most cases.&lt;/p&gt;

&lt;p&gt;I welcome your feedback! If you think I've missed something important, please open an issue or submit a pull request on the &lt;a href="https://github.com/rogiia/html-starter-template" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Also, if you want a starter template that goes further than just the index.html file, you might want to check out &lt;a href="https://html5boilerplate.com/" rel="noopener noreferrer"&gt;HTML5 Boilerplate&lt;/a&gt;. It's a great resource for getting up and running really fast when building a Progressive Web App.&lt;/p&gt;

&lt;h3&gt;Resources&lt;/h3&gt;

&lt;p&gt;If you want to read more about HTML head tags, I suggest starting here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developer.mozilla.org/en-US/docs/Learn_web_development/Core/Structuring_content/Webpage_metadata" rel="noopener noreferrer"&gt;Mozilla Web Documentation: Metadata in HTML&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://web.dev/learn/html/metadata" rel="noopener noreferrer"&gt;web.dev by Google&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ogp.me/" rel="noopener noreferrer"&gt;The Open Graph Protocol&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://w3c.github.io/manifest/" rel="noopener noreferrer"&gt;Web App Manifest specification&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And once more, you can find my index.html template on &lt;a href="https://github.com/rogiia/html-starter-template" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>css</category>
      <category>frontend</category>
    </item>
    <item>
      <title>About the Dead Internet Theory and AI</title>
      <dc:creator>Roger Oriol</dc:creator>
      <pubDate>Tue, 25 Feb 2025 20:43:36 +0000</pubDate>
      <link>https://dev.to/rogiia/about-the-dead-internet-theory-and-ai-35g</link>
      <guid>https://dev.to/rogiia/about-the-dead-internet-theory-and-ai-35g</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.ruxu.dev%2Fassets%2Fimages%2Fdead-internet-theory%2FInternet_dog.jpg" class="article-body-image-wrapper"&gt;&lt;img alt="Peter Steiner's 1993 cartoon, as published in The New Yorker" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.ruxu.dev%2Fassets%2Fimages%2Fdead-internet-theory%2FInternet_dog.jpg" width="299" height="334"&gt;&lt;/a&gt;&lt;/p&gt;
On the Internet nobody knows you are a dog... or an LLM



&lt;p&gt;The Dead Internet Theory is a thought that has gained a lot of traction recently. I have to admit, the first time it was explained to me, I felt an eerie realization. Like I had already been experiencing it, but hadn't paid much attention to it. In that first moment, I felt scared for the future and nostalgic for the past. But that's an old man's attitude, to be so resistant to change. After that moment, I started reflecting. How did we get here? And how can we prevent the Internet from continuing down this concerning path?&lt;/p&gt;

&lt;h3&gt;What Is The "Dead Internet Theory"?&lt;/h3&gt;

&lt;p&gt;The main premise of the Dead Internet Theory is an outrageous statement that challenges our view of the state of the Internet:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The Internet feels empty and devoid of people.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This theory originated in a &lt;a href="https://i.4pcdn.org/x/1587947548944.png" rel="noopener noreferrer"&gt;4chan post&lt;/a&gt; around 2019. This anonymous 4chan user describes his recent experience of the current state of the Internet. While his tone is pretty outrageous and paranoid (as is expected of a 4chan board), he raises some valid concerns that resonate with many other Internet users. This has led to the hypothesis gaining a lot of traction online, with others sharing their own experiences and thoughts on other Internet boards.&lt;/p&gt;

&lt;p&gt;So what do they mean by empty and devoid of people? There are plenty of people on the Internet, right? I believe there are two factors to it.&lt;/p&gt;

&lt;p&gt;First, we are &lt;strong&gt;interacting less and less directly with other humans&lt;/strong&gt;. Now we are talking to an "audience". But really we are talking to the recommendation algorithm, our god, so that it graces our posts with engagement. When we post on &lt;em&gt;social&lt;/em&gt; networks, we don't expect our friends and relatives to see and interact with it. We expect strangers we don't know or care about to like it. This impersonal relationship is making the Internet less social and more of a hustle.&lt;/p&gt;

&lt;p&gt;Second, an &lt;strong&gt;increasing number of Internet users are not people&lt;/strong&gt;. They are bots. Fake profiles that algorithmically play the viral content game to gain influence on social platforms. Advertising and scam emails, messages, and even calls, scaled out massively thanks to bots. &lt;a href="https://hallofdreams.org/posts/physicsforums/" rel="noopener noreferrer"&gt;Generative AI answering questions on behalf of actual people on PhysicsForums&lt;/a&gt; and other forums like StackOverflow.&lt;/p&gt;

&lt;p&gt;The Internet was supposed to connect people all around the world. And it did that, beautifully. But in recent years, it's been going in the opposite direction. It's driving us apart, isolating us from other people and keeping us content with experiencing an Internet massively filled with content, without the need to interact with anyone.&lt;/p&gt;

&lt;h3&gt;Is It Really That Bad?&lt;/h3&gt;

&lt;p&gt;To be fair, it's not like nobody uses the Internet to communicate anymore. Messaging apps are very prevalent and their users are mostly human. We talk with our friends, make plans, catch up with people who are not as present in our lives as they used to be. But this is a &lt;strong&gt;private&lt;/strong&gt; Internet, not an open Internet. A private chat is not meant to be shared or discovered by people interested in what you are talking about, and it doesn't let you build a little garden in a corner of the Internet that's just yours.&lt;/p&gt;

&lt;p&gt;Also, &lt;strong&gt;people are not interacting with the Internet like they used to&lt;/strong&gt;. I believe the rise of smartphones has contributed to that greatly. Smartphones have been heavily optimized for consuming content, and especially for &lt;strong&gt;infinite, mindless scrolling&lt;/strong&gt;. It's much more lucrative for an app to have its users trapped in an infinite scroll, consuming content (and ads) by the ton, than to have them create and share. And to achieve that, they provide you with (or rather, impose on you) an amazing recommendation algorithm that plays on your psychology to keep you engaged. But if people are only consuming, they are not interacting, discussing, or building. You see a lot of people on your feed, sure, but in such large numbers and in such an impersonal style that it doesn't feel like there's anyone on the other side.&lt;/p&gt;

&lt;p&gt;Paradoxically, recommendation algorithms also make us more likely to interact with content that arouses anger in us than with content we agree with, since anger drives more &lt;strong&gt;engagement&lt;/strong&gt;. &lt;a href="https://www.youtube.com/watch?v=fuFlMtZmvY0" rel="noopener noreferrer"&gt;This amazing video by Kurzgesagt&lt;/a&gt; talks about the psychology of social interactions on an open, algorithmic Internet. It comes to the conclusion that the small, year-2000 forum-like communities of the early Internet were the closest to the social mechanisms our brains are accustomed to and thrive in.&lt;/p&gt;

&lt;p&gt;And it has gotten worse recently, thanks to Generative AI and Large Language Models. Bots were already popping up everywhere, but with the power of recent Generative AI models, they have gotten so much better at &lt;strong&gt;deceiving people&lt;/strong&gt;. And some content posted by people is not actually thought through and written by those people. They just asked an LLM to write it for them and copy-pasted the output into their post. A lot of news, or articles like this one, are written almost instantly (and without any thought behind them) thanks to Generative AI. Yes, I have also used it to write, but I try to keep it as a proofreading/editing/brainstorming/idea-drafting aid. I don't post anything I haven't thought about deeply and that didn't originate from me.&lt;/p&gt;

&lt;p&gt;StackOverflow is a board for asking and answering programming-related questions. Since ChatGPT's release, it has seen a lot of answers generated by LLMs. I would argue that if you post a question there, you would like an answer from a person with experience in that topic. If you wanted an answer from ChatGPT, you would ask it yourself. And sometimes you might want to, but ChatGPT generally cannot solve problems it has never seen the way an expert would, yet it will still be happy to answer. This makes a lot of the answers confidently wrong and void of human expertise. This is why StackOverflow has decided to &lt;a href="https://meta.stackoverflow.com/questions/421831/policy-generative-ai-e-g-chatgpt-is-banned" rel="noopener noreferrer"&gt;ban all use of Generative AI in content posted to the platform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The wave of content generated by AI and unapologetically unreviewed by humans has been so strong that a term has been coined for it: &lt;strong&gt;&lt;a href="https://x.com/deepfates/status/1787472784106639418" rel="noopener noreferrer"&gt;AI Slop&lt;/a&gt;&lt;/strong&gt;. &lt;a href="https://benjamincongdon.me/blog/2025/01/25/AI-Slop-Suspicion-and-Writing-Back/" rel="noopener noreferrer"&gt;Slop is usually defined as the AI-generated equivalent of spam&lt;/a&gt;. I would define it as &lt;strong&gt;low-quality content created by a Generative AI model without any human review or even human thought behind it&lt;/strong&gt;. AI Slop threatens to flood the Internet with useless posts, algorithm-pleasing content, and mass AI-generated pictures and videos at an unprecedented pace, since no human being is needed to create them.&lt;/p&gt;

&lt;p&gt;As if this wasn't bad enough, &lt;strong&gt;Meta is now experimenting with not only allowing AI bots on the platform, but using those bots themselves to drive up engagement&lt;/strong&gt;. In an &lt;a href="https://www.ft.com/content/91183cbb-50f9-464a-9d2e-96063825bfcf" rel="noopener noreferrer"&gt;interview with the Financial Times&lt;/a&gt;, Meta's Connor Hayes said:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;They’ll have bios and profile pictures and be able to generate and share content powered by AI on the platform ... that’s where we see all of this going.&lt;/p&gt;

&lt;p&gt;[...] make our apps more entertaining and engaging.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Basically, if you post something on Instagram and nobody likes or comments on it, it might dissuade you from interacting with the app. But if your post suddenly got a lot of likes, comments and messages, even from AI profiles, you would spend more time on the app. Never actually interacting with anyone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If we continue down this path, soon 99% of all content on the Internet will be AI generated, and the Internet will have turned from a place to share and communicate with other people into a place to consume endless slop. It will be lonelier than ever: an Internet without people.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;My Final Thoughts + Possible Solutions&lt;/h3&gt;

&lt;p&gt;In my opinion, the epicenter of the problem is not AI as a whole but recommendation algorithms and infinite scrolling. The problem is a business model built on driving up engagement and keeping users on the platform at any cost. And it's a titanic problem to tackle, since the whole modern Internet is built around this concept.&lt;/p&gt;

&lt;p&gt;AI-generated content can also be a problem, but I believe it's more of a &lt;strong&gt;misuse problem&lt;/strong&gt; than a fundamental problem with the technology. Generative AI models should be our helpers and assistants, but not take over our personas.&lt;/p&gt;

&lt;p&gt;Some people advocate making AI-generated content easier to spot through &lt;strong&gt;watermarking&lt;/strong&gt;. I don't believe this is the way, not only because it's difficult to do for all the kinds of output LLMs can produce, but also because it's impossible to enforce.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://simonwillison.net/2023/Aug/27/wordcamp-llms/#personal-ai-ethics" rel="noopener noreferrer"&gt;Simon Willson has a great oath on Personal AI ethics&lt;/a&gt;. While he admits to using AI as his writing assistant, he promises to not post anything that takes longer to read than it took him to write. &lt;strong&gt;And I promise to do the same as well&lt;/strong&gt;. Because, just like Simon, I think it's rude to publish text that you haven't even read yourself.&lt;/p&gt;

&lt;p&gt;As for possible solutions to make the Internet &lt;em&gt;less dead&lt;/em&gt;, and taking into account that deleting the Internet and starting from scratch is not possible, I've come up with a few actions that I will be applying:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Build social circles in online social vehicles that don't feature recommendation algorithms / infinite scrolling and have a limited number of people.&lt;/strong&gt; Think Discord/Slack groups or online forums. The lack of recommendation algorithms will allow you to more directly explore the content and people you are interested in. The lack of infinite scrolling will keep you from endlessly consuming content and encourage you to interact / build more. And the limited number of people will allow for easier connection building with other members.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Also interact outside that circle&lt;/strong&gt; to get content and world views beyond your bubble, but be very critical of what you read and see. Keep in mind that it might very well be AI generated. Don't let this take up most of your time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Favor subscription feeds like RSS over algorithmic recommendation apps and web pages.&lt;/strong&gt; Subscribe to your favorite blogs (wink), newsletters and podcasts via an RSS reader or similar. Explore the personal pages of other people. Of all the content you consume, this is the content you are most likely to enjoy, and it won't keep you stuck infinitely scrolling. When you are done, you are done.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the end, the goal of making your Internet less dead is not to spend more time online, but to be better connected to the things that matter to you.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>socialmedia</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>The Rise Of Reasoner Models: Scaling Test-Time Compute</title>
      <dc:creator>Roger Oriol</dc:creator>
      <pubDate>Sun, 12 Jan 2025 13:52:30 +0000</pubDate>
      <link>https://dev.to/rogiia/the-rise-of-reasoner-models-scaling-test-time-compute-33e3</link>
      <guid>https://dev.to/rogiia/the-rise-of-reasoner-models-scaling-test-time-compute-33e3</guid>
      <description>&lt;h2&gt;Introduction&lt;/h2&gt;

&lt;p&gt;A new kind of LLM has recently been popping up everywhere: Reasoner models. Kickstarted by OpenAI's o1 and o3, these models are a bit different from the rest. They particularly shine when dealing with mathematical problems and coding challenges, where success depends on following precise, logical steps to reach a correct solution. On the other hand, they take much longer to answer than any conventional model.&lt;/p&gt;

&lt;p&gt;The approach to problem-solving used by these models mirrors a well-known distinction in human cognition: System 1 versus System 2 thinking. Traditional LLMs operate much like System 1 thinking - quick, intuitive, and based on pattern recognition. They generate responses rapidly based on their trained neural networks. In contrast, Reasoner models embody System 2 thinking - deliberate, methodical, and self-correcting. They can pause, reflect on their reasoning, and even backtrack when they detect potential errors in their logic.&lt;/p&gt;

&lt;p&gt;The key innovation enabling these capabilities isn't found in revolutionary architectures or training methods, but rather in a different approach to computation: scaling test-time compute.&lt;/p&gt;

&lt;h2&gt;What is Test-time Compute?&lt;/h2&gt;

&lt;p&gt;At its core, test-time compute represents a fundamental shift in how we allocate computational resources in AI systems. While traditional models focus on scaling training time and data (train-time compute) to improve, Reasoner models invest those computational resources during the actual problem-solving phase (test-time compute). This approach essentially gives the model more time to "think" about its answers.&lt;/p&gt;

&lt;p&gt;The concept of "thinking longer" might seem similar to existing techniques like Chain-of-Thought (CoT) prompting, but there's a crucial difference. CoT prompting encourages models to spell out its reasoning and train of thought, but the intermediate steps are not validated or weighted against alternatives. Therefore, even if most steps are spot on, any error in the intermediate reasoning steps will compound, leading to incorrect final answers. Test-time compute solves this by enabling models to actively verify and correct their reasoning process.&lt;/p&gt;

&lt;h2&gt;How Does Test-time Compute Work?&lt;/h2&gt;

&lt;p&gt;The simplest method to implement test-time compute is &lt;strong&gt;iterative self-refinement&lt;/strong&gt;. In this method, the model first outputs its reasoning and thoughts to solve a problem. Then, all the previous model outputs are passed through the same model again, making it review its reasoning and try to find and correct errors. After sufficient iterations, the quality of the response should be much higher than the first output. This is, however, a naïve approach that doesn’t allow the model to explore creative alternatives on its way to the right solution.&lt;/p&gt;
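
&lt;p&gt;As a rough illustration, here is a minimal sketch of an iterative self-refinement loop. This is not from any specific library: &lt;code&gt;callModel&lt;/code&gt; is a hypothetical wrapper around whatever LLM API you use, taking a prompt and resolving to the model’s text output.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical sketch: callModel(prompt) wraps an LLM API call and
// resolves to the model's text output.
async function selfRefine(problem, callModel, iterations = 3) {
  // First pass: ask the model to reason through the problem.
  let draft = await callModel("Solve step by step:\n" + problem);
  for (let i = 0; i &amp;lt; iterations; i++) {
    // Feed the previous output back in and ask the model to critique
    // its own reasoning and produce a corrected solution.
    draft = await callModel(
      "Problem:\n" + problem +
      "\n\nPrevious attempt:\n" + draft +
      "\n\nFind any errors in the reasoning above and write a corrected solution."
    );
  }
  return draft;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each pass spends additional inference compute on the same problem; the model itself never changes.&lt;/p&gt;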

&lt;p&gt;A better approach is &lt;strong&gt;Verifier-guided search&lt;/strong&gt;. In this approach, the model generates multiple different answers and a verifier selects the best one. A high temperature is often used to encourage the model to be creative and explore many different solutions. The model used to generate the answers can also act as the verifier; however, it’s usually better to train a smaller, separate model for that role.&lt;/p&gt;

&lt;p&gt;The verifier can score the correctness of the overall solution (&lt;strong&gt;ORM - Outcome Reward Model&lt;/strong&gt;) or score each step of the solution separately (&lt;strong&gt;PRM - Process Reward Model&lt;/strong&gt;). A PRM is much more expensive to run, since it evaluates every step, but it is the better approach: an ORM will mark a solution that has all steps correct except one as simply incorrect, while a PRM will mark the correct steps as correct and encourage the model to rethink only the incorrect step to get to the right solution.&lt;/p&gt;

&lt;p&gt;Since the PRM needs to evaluate many different approaches to the solution, which branch at each proposed step, we need efficient search strategies to find the path to the correct solution. The following are a few different search strategies for finding the correct solution path (a minimal code sketch follows the list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Best of N&lt;/strong&gt;: Generate N independent solutions and score each step using the reward model. Select the solution with the highest score in all its steps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best of N Weighted&lt;/strong&gt;: The implementation is the same as Best of N; however, identical responses are aggregated together, so the most common solutions end up with higher scores.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Beam Search&lt;/strong&gt;: Generate N candidate first steps. All candidates are scored, and the top N/M are selected for further exploration. For each of the selected candidates, M different possible next steps are generated, so N steps are again graded at the next level. This continues until reaching the final solution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DVTS (Diverse Verifier Tree Search)&lt;/strong&gt;: Very similar to Beam Search, but we start with N/M different subtrees and we select only the best step for each of them. Then, for each of the best steps, we generate M new next steps and we score and select the best of them until we get to the final answer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lookahead search&lt;/strong&gt;: Also similar to Beam Search, but in order to grade each step, we generate the next step for that path and we also grade that next step. We use the score from the next step to score the previous step. Using this information, we end up selecting the paths that will be further explored and the ones that will be discarded. This method is also similar to the Monte Carlo Tree Search algorithm.&lt;/li&gt;
&lt;/ul&gt;
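
&lt;p&gt;To make the verifier-guided idea concrete, here is a minimal sketch of Best of N Weighted. The &lt;code&gt;generate&lt;/code&gt; and &lt;code&gt;score&lt;/code&gt; functions are hypothetical: the former samples one candidate answer at high temperature, the latter returns a verifier (e.g., ORM) score for it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical sketch: generate(prompt) samples one candidate answer,
// score(prompt, answer) returns a verifier score for it.
async function bestOfNWeighted(prompt, n, generate, score) {
  const totals = new Map(); // answer text mapped to its aggregated score
  for (let i = 0; i &amp;lt; n; i++) {
    const answer = await generate(prompt); // independent, high-temperature sample
    const s = await score(prompt, answer); // verifier judgment
    // Identical answers pool their scores, so common solutions rank higher
    // (in practice, answers are canonicalized first, e.g. by extracting
    // the final numeric result).
    totals.set(answer, (totals.get(answer) || 0) + s);
  }
  // Return the answer with the highest aggregated score.
  let best = null;
  for (const [answer, total] of totals) {
    if (best === null || total &amp;gt; best.total) best = { answer, total };
  }
  return best.answer;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Beam Search and its derivatives follow the same generate-and-verify pattern, but score and prune with the PRM at every intermediate step rather than only on the final answer.&lt;/p&gt;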

&lt;p&gt;So, which of these strategies is the best? It depends. For simpler problems and lower compute budgets, Best of N Weighted seems to be best. But for harder problems and higher compute budgets, Beam Search and its derivatives are definitely better. To achieve the best performance across both simple and complex problems, a &lt;strong&gt;“compute-optimal”&lt;/strong&gt; strategy can be chosen: estimate the difficulty of the problem using a model and choose the search strategy accordingly.&lt;/p&gt;

&lt;p&gt;When correct solutions to hard problems are finally found through search, the reasoning traces of those solutions can then be used to further improve both the reasoning model and the reward model via reinforcement learning. This strengthens the model’s reasoning abilities and lets it reach the correct answer faster in subsequent iterations.&lt;/p&gt;

&lt;h2&gt;Performance Improvements Through Test-time Compute&lt;/h2&gt;

&lt;p&gt;When we evaluate models that use test-time compute on math and coding benchmarks, we find remarkable improvements. In an article from HuggingFace, &lt;em&gt;“Scaling Test-time Compute with Open Models”&lt;/em&gt;, Llama-3.2 3B using 256-iteration test-time compute was found to be better than Llama-3.1 70B, &lt;strong&gt;a model over 20 times larger&lt;/strong&gt;. The paper &lt;em&gt;“Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters”&lt;/em&gt; also finds that a small PaLM 2-S model can &lt;strong&gt;outperform a 14 times larger&lt;/strong&gt; model using test-time compute.&lt;/p&gt;

&lt;p&gt;These findings suggest that some problems, especially reasoning-heavy ones like math and coding, can be solved not by using bigger models, but by making the model “think longer”.&lt;/p&gt;

&lt;h2&gt;Limitations of Test-time Compute&lt;/h2&gt;

&lt;p&gt;Does the improvement that reasoning models show mean that, to get better models, we should always let them “think for longer” instead of pretraining bigger models with more data? The already mentioned paper &lt;em&gt;“Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters”&lt;/em&gt; shows that this does not seem to be the case. The authors tested which was more effective for solving harder problems, pretraining or scaling test-time compute, and their conclusion is the following:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Test-time and pretraining compute are not 1-to-1 “exchangeable”. On easy and medium questions, which are within a model’s capabilities, or in settings with small inference requirement, test-time compute can easily cover up for additional pretraining. However, on challenging questions which are outside a given base model’s capabilities or under higher inference requirement, pretraining is likely more effective for improving performance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Therefore, although scaling test-time compute can help models get to the correct answer even if they struggle at first, for test-time compute to work, the solution has to be within the model’s capabilities and knowledge to begin with. Scaling test-time compute is not a solution that can solve any problem just by being given more time.&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Reasoner models like o1 and o3 are indeed impressive, and this leads many people to wonder if they are close to achieving Artificial General Intelligence (AGI). Of course, reasoning through test-time compute is not AGI at all. Test-time compute, while powerful, is not sufficient on its own to bridge the gap to AGI and, as we have established, has many shortcomings. Plus, it only seems to work well on tasks whose steps must be objectively correct to reach the right solution, namely math and coding tasks.&lt;/p&gt;

&lt;p&gt;However, this step of achieving strong reasoning with smaller models does seem to fit OpenAI’s 5-step plan to AGI. Step 1 is conversational AI and step 2 is reasoning AI, which seems to have been achieved. The third step will be autonomous AI, which is already in the crosshairs of the many agentic models and agentic systems currently being developed.&lt;/p&gt;

&lt;p&gt;So, should you use Reasoner models instead of regular ones? The decision should depend on your specific use case. These models excel at tasks requiring careful reasoning and verification, particularly in domains like mathematics, coding, and logical problem-solving. However, for tasks that require quick responses or deal with more subjective matters, traditional LLMs might still be the better choice.&lt;/p&gt;

&lt;h2&gt;References&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Scaling Test-Time Compute with Open Models. Edward Beeching, Lewis Tunstall, Sasha Rush (HuggingFace). URL: &lt;a href="https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute" rel="noopener noreferrer"&gt;https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective. Zhiyuan Zeng, Qinyuan Cheng, Zhangyue Yin, Bo Wang, Shimin Li, Yunhua Zhou, Qipeng Guo, Xuanjing Huang, Xipeng Qiu. URL: &lt;a href="https://arxiv.org/abs/2412.14135" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2412.14135&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters. Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar. URL: &lt;a href="https://huggingface.co/papers/2408.03314" rel="noopener noreferrer"&gt;https://huggingface.co/papers/2408.03314&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>AI in 2024: Year in Review and Predictions for 2025</title>
      <dc:creator>Roger Oriol</dc:creator>
      <pubDate>Mon, 23 Dec 2024 19:25:27 +0000</pubDate>
      <link>https://dev.to/rogiia/ai-in-2024-year-in-review-and-predictions-for-2025-6h2</link>
      <guid>https://dev.to/rogiia/ai-in-2024-year-in-review-and-predictions-for-2025-6h2</guid>
      <description>&lt;p&gt;The past year has been transformative for artificial intelligence, marked by breakthrough innovations, emerging regulations, and a shift toward practical AI tools that enhance productivity. As we look ahead to 2025, let's review the major developments of 2024 and explore what the future might hold.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part I: 2024 Year in Review
&lt;/h2&gt;

&lt;h3&gt;
  
  
  OpenAI in the front
&lt;/h3&gt;

&lt;p&gt;OpenAI maintained its position at the forefront of AI innovation in 2024. The release of GPT-4o in May marked a significant milestone as the first frontier multimodal model, capable of understanding and generating content across different forms of media. This was followed by o1 in September, along with its lighter counterpart o1-mini, and the announcement of o3 in December.&lt;/p&gt;

&lt;p&gt;Perhaps most notably, OpenAI's Sora project revolutionized video generation. Announced in February and enhanced with Sora Turbo in December, this technology demonstrated unprecedented capabilities in creating realistic video content from text descriptions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agentic AI and AI Assistants
&lt;/h3&gt;

&lt;p&gt;Rather than replacing jobs, AI in 2024 focused on enhancing human productivity through innovative tools and assistants. Google introduced several groundbreaking projects: Astra, an AI assistant for phones and smart glasses, and Mariner, a Chrome extension that enables Gemini to interact with your browser. These tools represent a shift from simple chat interfaces to interactive agents that can understand and manipulate our digital environment.&lt;/p&gt;

&lt;p&gt;Anthropic joined this trend by enabling Claude to use computers the way humans do—viewing screens, moving cursors, and interacting with interfaces. This development opened new possibilities for AI assistance in everyday computer tasks.&lt;/p&gt;

&lt;p&gt;The developer community saw particularly exciting advances. GitHub Copilot and Cursor AI have been very well received by developers. On the other hand, autonomous coding agents like Devin and GitHub Workspace are still rough around the edges. These tools aren’t replacing developers; instead, they’re supercharging their productivity by handling routine tasks and suggesting improvements.&lt;/p&gt;

&lt;h3&gt;
  
  
  AI Regulation
&lt;/h3&gt;

&lt;p&gt;2024 saw significant progress in AI regulation, particularly with the EU AI Act. This groundbreaking legislation set the first comprehensive framework for AI regulation, establishing rules for AI system development and deployment while balancing innovation with safety and ethical concerns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part II: Looking Ahead to 2025
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Small, specialized models
&lt;/h3&gt;

&lt;p&gt;While decoder-only models like GPT and Claude have dominated headlines, in 2025 encoder-only models will remain an important piece of AI applications. Just one week before the end of the year, ModernBERT was released, representing a significant and long-overdue advance in this space. These models are particularly crucial for Retrieval-Augmented Generation (RAG) setups, where they excel at information retrieval and classification at a significantly lower running cost.&lt;/p&gt;

&lt;p&gt;In a similar fashion, the current trend of agentic systems will favor small, specialized models. These models will be fine-tuned to match frontier models at very specific tasks and, used together in an agentic workflow, will enable powerful and resource-efficient AI systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multimodal models and Test-time compute
&lt;/h3&gt;

&lt;p&gt;We can expect significant advances in multimodal capabilities, with AI systems becoming more natural at processing and responding to various forms of input—text, voice, images, and video—simultaneously. These improvements will make AI interactions feel more natural and contextually aware.&lt;/p&gt;

&lt;p&gt;The concept of "test-time compute"—giving models more time to think—emerged as a game-changer in 2024 and will likely become more prominent in 2025. Frontier models like OpenAI’s o3 are crushing reasoning, math and coding benchmarks. They are consistently beating humans at really complex, deep reasoning tasks, making us wonder how far Artificial General Intelligence really is.&lt;/p&gt;

&lt;p&gt;Test-time compute is not only showing remarkable results for frontier models. Applied to small edge models, it lets them surpass the performance of much larger models when given more processing time. For instance, Llama 3.2 3B outperformed Llama 3.1 70B on the Math-500 benchmark using a test-time compute strategy of 256 iterations.&lt;/p&gt;

&lt;h3&gt;
  
  
  The evolution of AI regulation
&lt;/h3&gt;

&lt;p&gt;Following the EU's lead, we're likely to see more regions implement AI regulations in 2025. Interestingly, major tech companies are actively calling for regulation, potentially as a strategy to shape the regulatory environment and raise barriers to entry for competitors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;2024 has been a year of remarkable progress in AI. We have seen many new ways to use AI in our day-to-day lives, as well as new methods to improve models that are making giant strides closer to AGI. As we move into 2025, the focus appears to be shifting toward more specialized and efficient AI systems, improved reasoning capabilities through techniques like test-time compute, and AI tools that boost productivity.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>discuss</category>
      <category>chatgpt</category>
    </item>
    <item>
      <title>Is the EU Falling Behind in the AI Race?</title>
      <dc:creator>Roger Oriol</dc:creator>
      <pubDate>Fri, 29 Nov 2024 16:47:23 +0000</pubDate>
      <link>https://dev.to/rogiia/is-the-eu-falling-behind-in-the-ai-race-3o5b</link>
      <guid>https://dev.to/rogiia/is-the-eu-falling-behind-in-the-ai-race-3o5b</guid>
      <description>&lt;p&gt;The recent announcement that Meta's &lt;strong&gt;Llama 3.2 Vision models&lt;/strong&gt; won't be available in the European Union has reignited discussions about the impact of EU regulations on AI innovation and accessibility. This development joins a growing list of AI technologies from major tech companies that are currently unavailable to EU users, including &lt;strong&gt;ChatGPT's Advanced Voice mode&lt;/strong&gt; and &lt;strong&gt;Apple Intelligence&lt;/strong&gt;, raising concerns about whether the EU might be falling behind in the global AI race.&lt;/p&gt;

&lt;h3&gt;
  
  
  The EU AI Act and Its Impact
&lt;/h3&gt;

&lt;p&gt;In April 2021, the European Commission proposed the &lt;strong&gt;EU AI Act&lt;/strong&gt;. This act classifies AI systems according to the risk they pose to users; higher risk levels entail stricter regulation. Furthermore, it sets some rules for &lt;strong&gt;General Purpose AI systems&lt;/strong&gt;, like Meta’s Llama or ChatGPT. Model developers must provide technical documentation and instructions, comply with the Copyright Directive, and publish a summary of the content used for training. If the model is open, like Llama, and presents a &lt;strong&gt;systemic risk&lt;/strong&gt;, which Llama does, its developers must also conduct model evaluations, perform adversarial testing, track and report serious incidents, and ensure cybersecurity protections.&lt;/p&gt;

&lt;p&gt;According to the study by Stanford University “Do Foundation Model Providers Comply with the Draft EU AI Act?”, as of June 2023, &lt;strong&gt;no foundation models fully comply with the EU AI Act's requirements&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Meta and EU regulation
&lt;/h3&gt;

&lt;p&gt;The situation becomes more complex when examining Meta's challenges with training their models in the EU. In June 2024, Meta faced a setback when the &lt;strong&gt;Irish Data Protection Commission&lt;/strong&gt; requested a delay in training their large language models using &lt;strong&gt;public content from adult Facebook and Instagram users in the EU&lt;/strong&gt;. Meta expressed disappointment with this decision, arguing it would hinder European innovation.&lt;/p&gt;

&lt;p&gt;It seems like these requirements influenced Meta's decision not to release Llama 3.2 Vision models in the EU. However, this also looks like a retaliation tactic to pressure the EU to let Meta use private user data to train its models.&lt;/p&gt;

&lt;p&gt;As a side note, according to Llama 3.2’s Use Policy, the restriction of use of these models specifically applies to companies and individuals based in the EU who wish to use and build services using these models directly - end users in Europe that use services built on these models are not affected.&lt;/p&gt;

&lt;h3&gt;
  
  
  EU Needs AI
&lt;/h3&gt;

&lt;p&gt;A significant countermovement has emerged in response to these regulatory challenges. The &lt;strong&gt;"EU Needs AI"&lt;/strong&gt; initiative, supported by prominent figures including &lt;strong&gt;Meta's Chief AI Scientist Yann LeCun&lt;/strong&gt;, argues that fragmented regulation threatens Europe's competitive position in AI development. Their position statement emphasizes that "Europe has become less competitive and less innovative compared to other regions and it now risks falling further behind in the AI era due to inconsistent regulatory decision making."&lt;/p&gt;

&lt;h3&gt;
  
  
  What Does the Future Hold for AI in the EU?
&lt;/h3&gt;

&lt;p&gt;In my opinion, the pushback against the regulation is probably overblown. I believe that, even with these regulations in place, the EU will still have access to some amazing models, and its citizens will enjoy more ethical AI practices.&lt;/p&gt;

&lt;p&gt;While companies from Europe will need to comply with more requirements than companies from the rest of the world, those requirements are not out of place or unnecessary. Many of the requirements the EU AI Act describes for General Purpose AI systems, systemic risk or not, reflect practices that are already widely adopted and considered hallmarks of reliable models. Also, citizens in the EU will enjoy more privacy and security than citizens anywhere else in the world. Their private data, including pictures, won’t be used for training models. Abusive AI systems like social scoring and manipulative AI won’t be a problem for them.&lt;/p&gt;

&lt;p&gt;In conclusion, the EU is not falling behind in the AI race. It will remain competitive, in part thanks to its more ethical practices. And AI regulation is not a blocker but a necessity; more countries should follow suit.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>news</category>
    </item>
    <item>
      <title>Build an Advanced RAG App: Query Routing</title>
      <dc:creator>Roger Oriol</dc:creator>
      <pubDate>Thu, 12 Sep 2024 11:07:55 +0000</pubDate>
      <link>https://dev.to/rogiia/build-an-advanced-rag-app-query-routing-cn1</link>
      <guid>https://dev.to/rogiia/build-an-advanced-rag-app-query-routing-cn1</guid>
      <description>&lt;p&gt;In previous articles, we built a basic RAG application. We also learned to introduce more advanced techniques to improve a RAG application. Today, we will explore how to tie those advanced techniques together. Those techniques might do different, sometimes opposite, things. Still, sometimes we need to use all of them, to cover all possibilities. So let's see how we can link different techniques together. In this article we will take a look at a technique called Query Routing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with Advanced RAG Applications
&lt;/h2&gt;

&lt;p&gt;When our Generative AI application receives a query, we have to decide what to do with it. For simple Generative AI applications, we send the query directly to the LLM. For simple RAG applications, we use the query to retrieve context from a single data source and then query the LLM. But if our case is more complex, we can have multiple data sources, or different queries may need different types of context. So do we build a one-size-fits-all solution, or do we make the application adapt and take different actions depending on the query?&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Query Routing?
&lt;/h2&gt;

&lt;p&gt;Query Routing is about giving our RAG app the power of decision-making. Query Routing is a technique that takes the query from the user and uses it to make a decision on the next action to take, from a list of predefined choices.&lt;/p&gt;

&lt;p&gt;Query Routing is a module in our Advanced RAG architecture. It is usually found after any query rewriting or guardrails. It analyzes the input query and decides the best tool to use from a list of predefined actions. The actions are usually retrieving context from one or many data sources. It could also decide to use a different index for a data source (like parent-child retrieval). Or it could even decide to search for context on the Internet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which are the choices for the Query Router?
&lt;/h2&gt;

&lt;p&gt;We have to define the choices that the Query Router can take beforehand. We must first implement each of the different strategies and accompany each one with a clear description. It is very important that the description explains in detail what each strategy does, since the router will base its decision on it.&lt;/p&gt;

&lt;p&gt;The choices a Query Router takes can be the following:&lt;/p&gt;

&lt;h3&gt;
  
  
  Retrieval from different data sources
&lt;/h3&gt;

&lt;p&gt;We can catalog multiple data sources that contain information on different topics. We might have a data source that contains information about a product the user has questions about, and another data source with information about our return policies, etc. Instead of looking for the answers to the user’s questions in all data sources, the query router can decide which data source to use based on the user query and the data source description.&lt;/p&gt;

&lt;p&gt;Data sources can be text stored in vector databases, regular databases, graph databases, etc.&lt;/p&gt;

&lt;h3&gt;
  
  
  Retrieval from different indexes
&lt;/h3&gt;

&lt;p&gt;Query Routers can also choose to use a different index for the same data source.&lt;/p&gt;

&lt;p&gt;For example, we could have an index for keyword based search and another for semantic search using vector embeddings. The Query Router can decide which of the two is best for getting the relevant context for answering the question, or maybe use both of them at the same time and combine the contexts from both.&lt;/p&gt;

&lt;p&gt;We could also have different indexes for different retrieval strategies. For example, we could have a retrieval strategy based on summaries, or a sentence window retrieval strategy, or a parent-child retrieval strategy. The Query Router can analyze the specificity of the question and decide which strategy is best to use to get the best context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Other data sources
&lt;/h3&gt;

&lt;p&gt;The decision that the Query Router takes is not limited to databases and indexes. It can also decide to use a tool to look for the information elsewhere. For example, it can decide to use a tool to look for the answer online using a search engine. Or it can also use an API from a specific service (for example, weather forecasting) to get the data it needs to get the relevant context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Types of Query Routers
&lt;/h2&gt;

&lt;p&gt;An important part of our Query Router is how it makes the decision to choose one path or another. The decision mechanism varies across the different types of Query Routers. The following are a few of the most used Query Router types:&lt;/p&gt;

&lt;h3&gt;
  
  
  LLM Selector Router
&lt;/h3&gt;

&lt;p&gt;This solution prompts an LLM with all the different choices, each with its description, together with the input query. The LLM completes the prompt with its selection of the right choice, and that response is parsed programmatically to decide which path to take.&lt;/p&gt;
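
&lt;p&gt;As a rough idea of what this looks like in code, here is a minimal sketch, assuming a generic &lt;code&gt;call_llm&lt;/code&gt; helper and made-up choice descriptions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of an LLM selector router. `call_llm` is a placeholder for
# whatever client you use to query your LLM; the choices are made up.
CHOICES = {
    "1": "Product documentation: questions about product features and usage.",
    "2": "Return policies: questions about returning or exchanging a product.",
    "3": "Web search: any other question.",
}

def route(query: str) -&gt; str:
    options = "\n".join(f"{key}. {desc}" for key, desc in CHOICES.items())
    prompt = (
        "Choose the best data source to answer the user query.\n"
        f"Options:\n{options}\n"
        f"Query: {query}\n"
        "Answer only with the option number."
    )
    # Parse the completion programmatically to decide the path.
    return call_llm(prompt).strip()[0]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;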

&lt;h3&gt;
  
  
  LLM Function Calling Router
&lt;/h3&gt;

&lt;p&gt;This solution leverages the function calling (or tool use) capabilities of LLMs. Some LLMs have been trained to decide to use tools to get to an answer when those tools are provided in the prompt. Using this capability, each of the different choices is phrased as a tool, prompting the LLM to choose which of the provided tools is best for retrieving the right context to answer the query.&lt;/p&gt;
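
&lt;p&gt;Here is a sketch of the idea using the OpenAI SDK's tool-calling interface; the tool names, descriptions and model are assumptions for illustration, not a fixed recipe:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of a function calling router: each routing choice is exposed
# as a tool, and the tool the model picks is the path we run.
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "product_docs",  # made-up tool name
            "description": "Retrieve context about product features and usage.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "return_policies",  # made-up tool name
            "description": "Retrieve context about returns and refunds.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any tool-calling model works here
    messages=[{"role": "user", "content": "How do I return a broken item?"}],
    tools=tools,
)
# Assumes the model picked a tool; its name tells us which path to run.
chosen_path = response.choices[0].message.tool_calls[0].function.name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;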

&lt;h3&gt;
  
  
  Semantic Router
&lt;/h3&gt;

&lt;p&gt;This solution uses similarity search on the vector embedding representation of the user query. For each choice, we write a few examples of queries that should be routed to that path. When a user query arrives, an embeddings model converts it to a vector representation, which is compared to the example queries for each router choice. The choice whose example is nearest to the user query in vector space is the path the router takes.&lt;/p&gt;
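
&lt;p&gt;A minimal sketch of a semantic router, assuming the &lt;code&gt;sentence-transformers&lt;/code&gt; library and made-up routes and example queries:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of a semantic router: the query embedding is compared against
# example queries for each route. Routes and examples are placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

ROUTES = {
    "product_docs": [
        "What does the pro plan include?",
        "How do I set up the device?",
    ],
    "return_policies": [
        "How do I return my order?",
        "Can I get a refund?",
    ],
}

def route(query: str) -&gt; str:
    query_emb = model.encode(query, convert_to_tensor=True)
    best_route, best_score = None, -1.0
    for name, examples in ROUTES.items():
        example_embs = model.encode(examples, convert_to_tensor=True)
        # Keep the route whose closest example is nearest to the query.
        score = util.cos_sim(query_emb, example_embs).max().item()
        if score &gt; best_score:
            best_route, best_score = name, score
    return best_route
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;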

&lt;h3&gt;
  
  
  Zero-shot classification Router
&lt;/h3&gt;

&lt;p&gt;For this type of router, a small LLM is selected to act as a router. This LLM is finetuned on a dataset of example user queries and the correct routing for each of them. The finetuned LLM’s sole purpose becomes classifying user queries. Small LLMs are more cost-effective and more than good enough for a simple classification task.&lt;/p&gt;

&lt;h3&gt;
  
  
  Language Classification Router
&lt;/h3&gt;

&lt;p&gt;In some cases, the purpose of the Query Router will be to redirect the query to a specific database or model depending on the language the user wrote the query in. Language can be detected in many ways, like using an ML classification model or a Generative AI LLM with a specific prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Keyword router
&lt;/h3&gt;

&lt;p&gt;Sometimes the use case is extremely simple. In that case, the solution could be to route one way or another depending on whether certain keywords are present in the user query. For example, if the query contains the word “return” we could use a data source with useful information about how to return a product. For this solution, a simple code implementation is enough, and therefore no expensive model is needed, as the sketch below shows.&lt;/p&gt;
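
&lt;p&gt;A minimal sketch, with made-up keywords and route names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# A keyword router is plain code; no model needed. The keywords and
# route names here are made-up examples.
KEYWORD_ROUTES = {
    "return": "return_policies",
    "refund": "return_policies",
    "recipe": "recipes_source",
}

def route(query: str, default: str = "web_search") -&gt; str:
    lowered = query.lower()
    for keyword, destination in KEYWORD_ROUTES.items():
        if keyword in lowered:
            return destination
    return default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;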

&lt;h2&gt;
  
  
  Single choice routing vs Multiple choice routing
&lt;/h2&gt;

&lt;p&gt;Depending on the use case, it will make sense for the router to just choose one path and run it. However, in some cases it can also make sense to use more than one choice to answer the same query. To answer a question that spans many topics, the application needs to retrieve information from many data sources, or the response might differ depending on the data source. In that case, we can use all of them to answer the question and consolidate the results into a single final answer.&lt;/p&gt;

&lt;p&gt;We have to design the router taking these possibilities into account.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example implementation of a Query Router
&lt;/h2&gt;

&lt;p&gt;Let’s get into the implementation of a Query Router within a RAG application. You can follow the implementation step by step and run it yourself in the &lt;a href="https://colab.research.google.com/drive/1B1rGvGriKIVe7PMClrMC0z3wMBbsLIYW?usp=sharing" rel="noopener noreferrer"&gt;Google Colab notebook&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For this example, we will showcase a RAG application with a query router. The application can decide to answer questions based on two documents. The first document is a paper about RAG and the second a recipe for chicken gyros. Also, the application can decide to answer based on a Google search. We will implement a single-source Query Router using an LLM function calling router.&lt;/p&gt;

&lt;h3&gt;
  
  
  Load the paper
&lt;/h3&gt;

&lt;p&gt;First, we will prepare the two documents for retrieval. Let's first load the paper about RAG:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd4ha0xazzpftro4jiol.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd4ha0xazzpftro4jiol.png" alt="Load the RAG paper" width="800" height="249"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Load the recipe
&lt;/h3&gt;

&lt;p&gt;We will also load the recipe for chicken gyros. This recipe from Mike Price is hosted on tasty.co. We will use a simple web page reader to read the page and store it as text.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgnojj5f9kcgf2ru224hz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgnojj5f9kcgf2ru224hz.png" alt="Load chicken gyros recipe" width="800" height="189"&gt;&lt;/a&gt;&lt;/p&gt;
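
&lt;p&gt;If you are reading along without the notebook, these two loading steps could look roughly like this with LlamaIndex; the file name and URL are stand-ins for the actual paper and recipe:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: load the RAG paper from a local PDF and the recipe from the web.
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.web import SimpleWebPageReader

rag_paper_docs = SimpleDirectoryReader(input_files=["rag_paper.pdf"]).load_data()
recipe_docs = SimpleWebPageReader(html_to_text=True).load_data(
    ["https://tasty.co/recipe/chicken-gyros"]  # stand-in URL
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;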

&lt;h3&gt;
  
  
  Save the documents in a vector store
&lt;/h3&gt;

&lt;p&gt;After getting the two documents we will use for our RAG application, we will split them into chunks and convert them to embeddings using BGE small, an open-source embeddings model. We will store those embeddings in two vector stores, ready to be queried.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqii1tl3bw585byl6z17a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqii1tl3bw585byl6z17a.png" alt="Create Vector Stores" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;
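
&lt;p&gt;A sketch of this indexing step, reusing the documents loaded above and LlamaIndex's default chunking settings:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: chunk both documents and index them with BGE-small embeddings.
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

rag_paper_index = VectorStoreIndex.from_documents(rag_paper_docs)
recipe_index = VectorStoreIndex.from_documents(recipe_docs)

rag_paper_engine = rag_paper_index.as_query_engine()
recipe_engine = recipe_index.as_query_engine()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;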

&lt;h3&gt;
  
  
  Search engine tool
&lt;/h3&gt;

&lt;p&gt;Besides the two documents, the third option for our router will be to search for information using a Google Search. For this example I have created my own Google Search API keys. If you want this part to work, you should use your own API keys.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flo7cagk0qfdfnifxd24r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flo7cagk0qfdfnifxd24r.png" alt="Define Google Search Tool" width="800" height="769"&gt;&lt;/a&gt;&lt;/p&gt;
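
&lt;p&gt;Roughly, this step amounts to the following sketch; the key and engine ID are placeholders for your own credentials:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: expose Google Search through LlamaIndex's tool spec.
# Replace the key and engine ID with your own credentials.
from llama_index.tools.google import GoogleSearchToolSpec

google_spec = GoogleSearchToolSpec(key="YOUR_API_KEY", engine="YOUR_ENGINE_ID")
search_tools = google_spec.to_tool_list()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;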

&lt;h3&gt;
  
  
  Create the Query Router
&lt;/h3&gt;

&lt;p&gt;Next, using the LlamaIndex library, we create a Query Engine Tool for each of the three options that the router will choose between. We provide a description for each of the tools, explaining what it is useful for. This description is very important, since it is the basis on which the Query Router decides which path to choose.&lt;/p&gt;

&lt;p&gt;Finally, we create a Router Query Engine, also with LlamaIndex, and give it the three query engine tools. We also define the selector, the component that makes the choice of which tool to use. For this example, we are using an LLM Selector. It is also a single selector, meaning it will choose only one tool, never more than one, to answer the query.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4uokrd0dnzjxddceyv0q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4uokrd0dnzjxddceyv0q.png" alt="Create the query router" width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;
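
&lt;p&gt;Put together, the router could be assembled roughly like this, reusing the query engines from the previous steps (&lt;code&gt;search_engine&lt;/code&gt; stands in for a query engine wrapping the Google Search tool):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of the router assembly with LlamaIndex. `rag_paper_engine`,
# `recipe_engine` and `search_engine` are the query engines built in
# the previous steps.
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool

tools = [
    QueryEngineTool.from_defaults(
        query_engine=rag_paper_engine,
        description="Useful for questions about RAG techniques.",
    ),
    QueryEngineTool.from_defaults(
        query_engine=recipe_engine,
        description="Useful for questions about the chicken gyros recipe.",
    ),
    QueryEngineTool.from_defaults(
        query_engine=search_engine,
        description="Useful for any other question, answered with Google Search.",
    ),
]

router = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),  # single choice, LLM-based
    query_engine_tools=tools,
)

response = router.query("What are the main components of a RAG pipeline?")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;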

&lt;h3&gt;
  
  
  Run our RAG application!
&lt;/h3&gt;

&lt;p&gt;Our Query Router is now ready. Let's test it with a question about RAG. We provided a vector store loaded with information from a paper on RAG techniques. The Query Router should choose to retrieve context from that vector store in order to answer the question. Let's see what happens:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftxqga818ykkxehsj9h1v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftxqga818ykkxehsj9h1v.png" alt="Question the app about RAG" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our RAG application answers correctly. Along with the answer, we can see that it provides the sources it got the information from. As we expected, it used the vector store with the RAG paper.&lt;/p&gt;

&lt;p&gt;We can also see an attribute "selector_result" in the result. In this attribute we can inspect which one of the tools the Query Router chose, as well as the reason that the LLM gave to choose that option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwwdhkf7e61oenxpxxnf1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwwdhkf7e61oenxpxxnf1.png" alt="Selector result for the RAG question" width="800" height="143"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now let's ask it a culinary question. The recipe used to create the second vector store is for chicken gyros. Our application should be able to answer which ingredients are needed for that recipe based on that source.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sapj6t5ekwkfcmsd0j3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sapj6t5ekwkfcmsd0j3.png" alt="Question the app about the recipe" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we can see, the chicken gyros recipe vector store was correctly chosen to answer that question.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffewfsx07u2o5gw3axmgr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffewfsx07u2o5gw3axmgr.png" alt="Selector result for the recipe question" width="800" height="106"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, let's ask it a question that can be answered with a Google Search.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm4f9h3w92402j4ot0xs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnm4f9h3w92402j4ot0xs.png" alt="Question for Google Search" width="800" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In conclusion, Query Routing is a great step towards a more advanced RAG application. It allows us to set up a base for a more complex system, where our app can better plan how to answer questions. Also, Query Routing can be the glue that ties together other advanced techniques for your RAG application and makes them work together as a whole system.&lt;/p&gt;

&lt;p&gt;However, the road to better RAG systems doesn't end with Query Routing. Query Routing is just the first stepping stone for orchestration within RAG applications. The next stepping stone toward RAG applications that can better reason, decide and take actions based on the needs of the users is Agents. In later articles, we will dive deeper into how Agents work within RAG and Generative AI applications in general.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>machinelearning</category>
      <category>learning</category>
    </item>
    <item>
      <title>Build an Advanced RAG App: Query Rewriting</title>
      <dc:creator>Roger Oriol</dc:creator>
      <pubDate>Sun, 30 Jun 2024 17:02:54 +0000</pubDate>
      <link>https://dev.to/rogiia/build-an-advanced-rag-app-query-rewriting-h3p</link>
      <guid>https://dev.to/rogiia/build-an-advanced-rag-app-query-rewriting-h3p</guid>
      <description>&lt;p&gt;In the last article, I established the basic architecture for a basic RAG app. In case you missed that, I recommend to first read that article over here. That will set the base from which we can improve our RAG system. Also in that last article, I listed some common pitfalls that RAG applications tend to fail on. We will be tackling some of them with some advanced techniques in this article.&lt;/p&gt;

&lt;p&gt;To recap, a basic RAG app uses a separate knowledge base that aids the LLM to answer the user’s questions by providing it with more context. This is also called a retrieve-then-read approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;To answer the user’s question, our RAG app will retrieve appropriate context based on the query itself. It will find chunks of text in the vector DB with content similar to whatever the user is asking. Other knowledge bases (search engines, etc.) work the same way. The problem is, the chunk of information where the answer lies might not be similar to what the user is asking. The question can be badly written, or expressed differently from what we expect. And if our RAG app can’t find the information needed to answer the question, it won’t answer correctly.&lt;/p&gt;

&lt;p&gt;There are many ways to solve this problem, but for this article, we will look at query rewriting.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Query Rewriting?
&lt;/h2&gt;

&lt;p&gt;Simply put, query rewriting means rewriting the user query into words that our RAG app knows best how to answer. Instead of just doing retrieve-then-read, our app will follow a rewrite-retrieve-read approach.&lt;/p&gt;

&lt;p&gt;We use a Generative AI model to rewrite the question. This can be a large model, like (or the same as) the one we use to answer the question in the final step, or a smaller model specially trained to perform this task.&lt;/p&gt;

&lt;p&gt;Also, query rewriting can take many different forms depending on the needs of the app. Most of the time, basic query rewriting will be enough. But, depending on the complexity of the questions we need to answer, we might need more advanced techniques like HyDE, multi-querying or step-back questions. More information on those in the following section.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why does it work?
&lt;/h2&gt;

&lt;p&gt;Query Rewriting usually gives better performance in any RAG app that is knowledge intensive. This is because RAG applications are sensitive to the phrasing and specific keywords of the query. Paraphrasing this query is helpful in the following scenarios:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It restructures oddly written questions so they can be better understood by our system.&lt;/li&gt;
&lt;li&gt;It erases context given by the user which is irrelevant to the query.&lt;/li&gt;
&lt;li&gt;It can introduce common keywords, which will give it a better chance of matching up with the correct context.&lt;/li&gt;
&lt;li&gt;It can split complex questions into different sub-questions, which can be answered more easily on their own, each with its corresponding context.&lt;/li&gt;
&lt;li&gt;It can tackle questions that require multiple levels of thinking by generating a step-back question, a higher-level question about the concept behind the user’s query. It then uses both the original and the step-back question to retrieve context.&lt;/li&gt;
&lt;li&gt;It can use more advanced query rewriting techniques like HyDE to generate hypothetical documents to answer the question. These hypothetical documents will better capture the intent of the question and match up with the embeddings that contain the answer in the vector DB.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How to implement Query Rewriting
&lt;/h2&gt;

&lt;p&gt;We have established that there are different Query Rewriting strategies depending on the complexity of the questions. We will briefly visit how to implement each of them. Afterwards, we will look at a real example comparing the results of a basic RAG app versus a RAG app with Query Rewriting. You can also follow all the examples in &lt;a href="https://colab.research.google.com/drive/1-NT0_mmyoSnaDQJ1Zuo0XX613TG5lzjZ?usp=sharing" rel="noopener noreferrer"&gt;the article’s Google Colab notebook&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Zero-shot Query Rewriting
&lt;/h3&gt;

&lt;p&gt;This is the simplest form of query rewriting. Zero-shot refers to the prompt engineering technique of giving the LLM examples of the task; in this case, we give none.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0i5am6hvebowyzecjgwc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0i5am6hvebowyzecjgwc.png" alt="Zero-shot Query Rewriting"&gt;&lt;/a&gt;&lt;/p&gt;
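
&lt;p&gt;Stripped down to its essence, a zero-shot rewrite can be as simple as this sketch, where &lt;code&gt;llm&lt;/code&gt; is a placeholder for any chat model call that returns a string:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Zero-shot rewrite: no examples, just the instruction. `llm` is a
# placeholder for any chat model call that returns a string.
def rewrite_query(query: str) -&gt; str:
    prompt = (
        "Rewrite the following question so it is clear, specific and "
        "well suited for retrieving relevant documents. Return only "
        "the rewritten question.\n"
        f"Question: {query}"
    )
    return llm(prompt)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;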

&lt;h3&gt;
  
  
  Few-shot Query Rewriting
&lt;/h3&gt;

&lt;p&gt;For a slightly better result at the cost of using a few more tokens per rewrite, we can give some examples of how we want the rewrite to be done.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fimekxtzo3q156vlhxlgw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fimekxtzo3q156vlhxlgw.png" alt="Few-shot Query Rewriting"&gt;&lt;/a&gt;&lt;/p&gt;
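
&lt;p&gt;A sketch of the few-shot variant, with made-up examples and the same placeholder &lt;code&gt;llm&lt;/code&gt; call:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Few-shot variant: the same idea, but with worked examples prepended.
# The examples here are made up for illustration.
FEW_SHOT_EXAMPLES = (
    "Question: stuff about returns??\n"
    "Rewritten: What is the process for returning a product?\n"
    "Question: app crash why\n"
    "Rewritten: What are common causes of the application crashing?\n"
)

def rewrite_query_few_shot(query: str) -&gt; str:
    prompt = (
        "Rewrite the question for document retrieval, following the "
        "examples.\n"
        f"{FEW_SHOT_EXAMPLES}"
        f"Question: {query}\n"
        "Rewritten:"
    )
    return llm(prompt)  # same placeholder LLM call as above
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;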

&lt;h3&gt;
  
  
  Trainable rewriter
&lt;/h3&gt;

&lt;p&gt;We can fine-tune a pre-trained model to perform the query rewriting task. Instead of relying on examples, we can teach it how query rewriting should be done to achieve the best results in context retrieval. Also, we can further train it with Reinforcement Learning so it learns to recognize problematic queries and avoid toxic and harmful phrases. Alternatively, we can use an open-source model that somebody else has already trained on the task of query rewriting.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sub-queries
&lt;/h3&gt;

&lt;p&gt;If the user query contains multiple questions, context retrieval can get tricky. Each question probably needs different information, and we are not going to get all of it using the full query as the basis for retrieval. To solve this problem, we can decompose the input into multiple sub-queries and perform retrieval for each of them (see the sketch after the diagram).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu9vd5qvm1stg4rbzsa9p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu9vd5qvm1stg4rbzsa9p.png" alt="Sub-queries Query Rewriting"&gt;&lt;/a&gt;&lt;/p&gt;
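
&lt;p&gt;As a sketch, the decomposition is just one extra LLM call before retrieval; &lt;code&gt;llm&lt;/code&gt; and &lt;code&gt;retrieve&lt;/code&gt; are placeholders here:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: split a compound query into sub-queries, then retrieve
# context for each one. `llm` and `retrieve` are placeholders.
def decompose(query: str) -&gt; list[str]:
    prompt = (
        "Split the following question into independent sub-questions, "
        "one per line:\n"
        f"{query}"
    )
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]

def retrieve_for_sub_queries(query: str) -&gt; list:
    contexts = []
    for sub_query in decompose(query):
        contexts.extend(retrieve(sub_query))  # one retrieval pass each
    return contexts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;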

&lt;h3&gt;
  
  
  Step-back prompt
&lt;/h3&gt;

&lt;p&gt;Many questions can be a bit too complex for the RAG pipeline’s retrieval to grasp the multiple levels of information needed to answer them. For these cases, it can be helpful to generate multiple additional queries to use for retrieval. These queries will be more generic than the original query. This will enable the RAG pipeline to retrieve relevant information on multiple levels.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5kyuh25pltxn7n1i7g8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5kyuh25pltxn7n1i7g8.png" alt="Step-back prompt Query Rewriting"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  HyDE
&lt;/h3&gt;

&lt;p&gt;Another method to improve how queries are matched with context chunks is Hypothetical Document Embeddings, or HyDE. Sometimes, questions and answers are not that semantically similar, which can cause the RAG pipeline to miss critical context chunks in the retrieval stage. However, even if the query is semantically different, a response to the query should be semantically similar to another response to the same query. The HyDE method consists of creating hypothetical context chunks that answer the query and using them to match the real context that will help the LLM answer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuq421s3my99dsao07yom.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuq421s3my99dsao07yom.png" alt="HyDE Query Rewriting"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuyli3bxsc9uhzsuclrx8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuyli3bxsc9uhzsuclrx8.png" alt="HyDE Query Rewriting comparison"&gt;&lt;/a&gt;&lt;/p&gt;
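
&lt;p&gt;A minimal HyDE sketch, assuming a vector store with a LangChain-style &lt;code&gt;similarity_search&lt;/code&gt; method and the same placeholder &lt;code&gt;llm&lt;/code&gt; call:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of HyDE: retrieve with a hypothetical answer instead of the
# raw query.
def hyde_retrieve(query: str, vector_store, k: int = 5):
    hypothetical_doc = llm(f"Write a short passage that answers: {query}")
    # The hypothetical answer should sit closer in embedding space to
    # the real answer chunks than the question itself does.
    return vector_store.similarity_search(hypothetical_doc, k=k)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;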

&lt;h2&gt;
  
  
  Example: RAG with vs without Query Rewriting
&lt;/h2&gt;

&lt;p&gt;Taking the RAG pipeline from the last article, “How to build a basic RAG app”, we will introduce Query Rewriting into it. We will ask it a slightly more advanced question than last time and observe whether the response improves with Query Rewriting compared to without it. First, let’s build the same RAG pipeline. This time, though, I’ll use only the top document returned from the vector database, to be less forgiving of missed documents.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02iuxjn1iwulbtx2q8c2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02iuxjn1iwulbtx2q8c2.png" alt="Example without Query Rewriting"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The response is good and based on the context, but the model got caught up in me asking about evaluation and missed that I was specifically asking for tools. As a result, the context used does have information on some benchmarks, but it misses the next chunk of information, which talks about tools.&lt;/p&gt;

&lt;p&gt;Now, let’s implement the same RAG pipeline, but this time with Query Rewriting. As well as the query rewriting prompts we have already seen in the previous examples, I’ll be using a Pydantic parser to extract and iterate over the generated alternative queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcwfgwuvqfdhtoxfpv1ca.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcwfgwuvqfdhtoxfpv1ca.png" alt="Example with Query Rewriting"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The new query now matches the chunk of information I wanted the answer to come from, giving the LLM a much better chance of generating a good response to my question.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We have taken our first step out of basic RAG pipelines and into Advanced RAG. Query Rewriting is a very simple Advanced RAG technique but a powerful one for improving the results of a RAG pipeline. We have gone over different ways to implement it depending on what kind of questions we need to improve. In future articles we will go over other Advanced RAG techniques that can tackle different RAG issues than those seen in this article.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to build a basic RAG app</title>
      <dc:creator>Roger Oriol</dc:creator>
      <pubDate>Wed, 29 May 2024 19:50:28 +0000</pubDate>
      <link>https://dev.to/rogiia/how-to-build-a-basic-rag-app-h9p</link>
      <guid>https://dev.to/rogiia/how-to-build-a-basic-rag-app-h9p</guid>
      <description>&lt;p&gt;The dawn of Generative AI makes possible new kinds of capabilities for the applications we build. LLMs can answer the user’s questions with an incredible skill. So, why not use them as part of our systems. If the user needs help getting around the app, we can put a chat function where the LLM will answer all the user’s questions. If our app has blog posts explaining important concepts, instead of making the user read all of them to get the knowledge it needs, it could just ask and get an immediate response.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why RAG?
&lt;/h2&gt;

&lt;p&gt;We decide to integrate an LLM into our app to bring these features to our users. However, we soon find that the model can’t answer the users’ questions. It doesn’t have any information about our application! If the information needed to answer is not in the LLM’s training data, it can’t answer. Even worse, if it doesn’t know the answer, it might hallucinate a completely wrong fact! This is bad, so how do we fix it? LLMs with the Transformer architecture have shown great in-context learning capabilities. So, we just have to pass all the facts the model needs in the prompt, together with the question! Uh oh, it will definitely be expensive to stuff all the data in every prompt. So, how do we do it?&lt;/p&gt;

&lt;h2&gt;
  
  
  What is RAG?
&lt;/h2&gt;

&lt;p&gt;RAG stands for &lt;strong&gt;Retrieval Augmented Generation&lt;/strong&gt;. RAG was born together with Transformers. Initially, it was used to augment the pre-training data of LLMs with additional facts. Once Transformers’ in-context learning capabilities became obvious, augmenting the prompt during inference also became common practice.&lt;/p&gt;

&lt;p&gt;A basic RAG pipeline consists of three steps: indexing, retrieval and generation. All the information that the LLM needs to answer is indexed in a vector database. When the user asks a question, we can retrieve the relevant parts of the information from that vector database. Finally, together with just the relevant information and the user’s question, we can prompt the LLM to give an answer based on the information we give it as a context. Let’s look in more detail how to achieve this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Indexing
&lt;/h3&gt;

&lt;p&gt;First, we extract the information that the model needs from wherever it is. Generative models work with plain text (some models can also work with images or other formats, which can also be indexed, but this is a topic for another time). If the information is already in plain text, we are in luck. But it might also be in PDF documents, Word documents, Excel, Markdown, etc. We must convert this data to plain text and clean it so it can be usable for the model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rbz60uizoiswi8lng0k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rbz60uizoiswi8lng0k.png" alt="Indexing process"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the information is in text format, we can store it in a vector database. The vector database will store the embeddings representation of that text. That will allow us to search for parts of the text that have a similar embedding representation as another text, therefore they are about a similar concept. We will divide the whole text into smaller parts or chunks, calculate the embeddings representation for each of them, and finally store them in the vector database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Retrieval
&lt;/h3&gt;

&lt;p&gt;When the user asks us a question, we can convert that question into a vector representation, using the same embeddings model we used to index the data. With that vector representation, we will calculate the similarity factor between the question and each of the chunks stored in the vector database. We will select the top K chunks most similar to the query; their contents are about the same concept as the question and therefore might contain the answer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fisms443c1ytjcazo5t17.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fisms443c1ytjcazo5t17.png" alt="Retrieval process"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Generation
&lt;/h3&gt;

&lt;p&gt;A prompt is built, putting together the user’s question and the relevant contexts to help the LLM answer. We might also include previous messages from the conversation between the user and the AI assistant. The LLM generates an answer for the user based on the context, instead of its previously learned pre-training data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx9ih4417ubpht7kyskav.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx9ih4417ubpht7kyskav.png" alt="Retrieval process"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;

&lt;p&gt;For this example, we will ingest a paper called “Retrieval-Augmented Generation for Large Language Models: A Survey”. We will query the LLM using the information contained in this paper, so it can answer the user’s questions on its contents. You can follow this example in &lt;a href="https://colab.research.google.com/drive/1mFmPN0GBHpS-kMDMuU8EDrWu1KENy69e?usp=sharing" rel="noopener noreferrer"&gt;the Google Colab notebook provided for this article&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;First, we will load the PDF document and parse it using LangChain’s PyPDF connector.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmggsh8vxc1i6aknze50x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmggsh8vxc1i6aknze50x.png" alt="Load document using pypdf"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once we have the text from the document, we have to split it into smaller chunks. We can use one of LangChain’s available splitters, like RecursiveCharacterTextSplitter in this case:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F92h7gf78bv699oup9xfc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F92h7gf78bv699oup9xfc.png" alt="Split document into chunks"&gt;&lt;/a&gt;&lt;/p&gt;
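
&lt;p&gt;In code, these two steps look roughly like this; the file name, chunk size and overlap are assumptions, not the notebook's exact values:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of the loading and splitting steps with LangChain.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = PyPDFLoader("rag_survey.pdf").load()  # stand-in file name
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;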

&lt;p&gt;We will be using BGE-small, an open-source embeddings model. We will download it from the HuggingFace Hub and run it on all chunks to calculate their vector representations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9qoe5p4b0t37gooh4ix.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9qoe5p4b0t37gooh4ix.png" alt="Calculate embeddings"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once we have the vector representations for all chunks, we can create an in-memory vector database and store all vectors in it. For this example, we will be using a FAISS database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkvw8o3f9hwtafr3olord.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkvw8o3f9hwtafr3olord.png" alt="Load embeddings into the vector db"&gt;&lt;/a&gt;&lt;/p&gt;
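
&lt;p&gt;A sketch of the embedding and indexing steps, assuming the &lt;code&gt;chunks&lt;/code&gt; produced by the splitter above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: embed every chunk with BGE-small and store the vectors in an
# in-memory FAISS index.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
vector_db = FAISS.from_documents(chunks, embeddings)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;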

&lt;p&gt;The database is now set up. Now, we will take queries from the user about this information. In this case, the user asks what the drawbacks of Naive RAG are. We encode this query using the same embeddings model as before. Then, we retrieve the top 5 chunks most similar to that query.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3euuftp1j1edlvj8oau.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3euuftp1j1edlvj8oau.png" alt="Retrieve documents similar to the query from the vectordb"&gt;&lt;/a&gt;&lt;/p&gt;
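
&lt;p&gt;Sketched out, the retrieval step is a single call on the vector store:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: the query is embedded with the same model and the 5 most
# similar chunks are returned.
query = "What are the drawbacks of Naive RAG?"
relevant_chunks = vector_db.similarity_search(query, k=5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;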

&lt;p&gt;After retrieving the relevant context, we build a prompt using this information and the user’s original query. We will use Claude 3 Haiku as the LLM for this example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdl7s7gownp37psg9084.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdl7s7gownp37psg9084.png" alt="Use the context and the query to generate an answer"&gt;&lt;/a&gt;&lt;/p&gt;
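
&lt;p&gt;And a sketch of this final generation step, with the prompt wording as an assumption:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: stuff the retrieved chunks into the prompt and ask Claude 3
# Haiku through LangChain.
from langchain_anthropic import ChatAnthropic

context = "\n\n".join(chunk.page_content for chunk in relevant_chunks)
prompt = (
    "Answer the question using only the context below.\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)
llm = ChatAnthropic(model="claude-3-haiku-20240307")
answer = llm.invoke(prompt).content
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;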

&lt;h2&gt;
  
  
  Common problems and pitfalls
&lt;/h2&gt;

&lt;p&gt;As the title implies, this solution is a basic or naïve RAG implementation. It will empower your application to make the most out of the LLM it’s using and your data. But it won’t work for all cases. These are just some of the most common problems with RAG:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Retrieve irrelevant information.&lt;/strong&gt; If the retriever gets data from the vector database that is not relevant to the question, it will confuse the model trying to answer it. This might lead to either not using the context to answer the question, or answering something different from what was asked.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Miss important information.&lt;/strong&gt; Maybe the information it needs to answer the question is not in the database. Maybe the retrieval mechanism fails to find the relevant chunks. We must find ways to help the retriever find the information it needs easily and more reliably.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate responses not supported by the context.&lt;/strong&gt; If the context has the information the model needs, but it doesn’t use it and instead relies on its own pre-training data, all this was for nothing. The information from the pre-training data might be outdated or wrong. We must push the model to always use the context to answer, or to answer “I don’t know” if the context doesn’t contain the answer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Irrelevant response to the query.&lt;/strong&gt; The LLM might use all the information you give it to generate a response, but that doesn’t mean it answers the user’s question. It’s important that the model sticks to the user’s original question, instead of getting lost in a ton of information.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redundant response caused by similar contexts.&lt;/strong&gt; When we ingest multiple documents with similar information, there’s a chance that the retriever will get multiple chunks of information that say almost the same thing. This might cause the LLM to repeat the same information more than once in its response.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to avoid these problems?
&lt;/h2&gt;

&lt;p&gt;To avoid these problems, a naïve RAG pipeline might not be enough; we will need to set up a more advanced RAG system. Proven techniques exist to solve the problems laid out above, and we can incorporate them into our pipeline to improve the application’s performance.&lt;/p&gt;

&lt;p&gt;Another important point to address is that, to improve your RAG application, you will need to be able to measure and evaluate the whole process. You can’t improve what you can’t measure. Plus, when you evaluate, you might find that a basic RAG setup is enough for your use case and you don’t need to overcomplicate it. After all, even a very basic RAG implementation can improve your LLM-powered application enormously.&lt;/p&gt;

&lt;p&gt;In future articles, I will explain in more detail the advanced RAG techniques that will help us avoid common problems and bring our RAG applications to the next level.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>rag</category>
    </item>
    <item>
      <title>How to use LLMs: Summarize long documents</title>
      <dc:creator>Roger Oriol</dc:creator>
      <pubDate>Wed, 01 May 2024 20:54:48 +0000</pubDate>
      <link>https://dev.to/rogiia/how-to-use-llms-summarize-long-documents-4ee1</link>
      <guid>https://dev.to/rogiia/how-to-use-llms-summarize-long-documents-4ee1</guid>
      <description>&lt;p&gt;One of the tasks Generative AI models are good at is summarizing texts. However, if you want to summarize a text, you need the whole text to fit in the context window, plus the command to summarize it. The following is an example prompt (you can test it yourself using the &lt;a href="https://colab.research.google.com/drive/1z5wwBLOag7TK5ylimSuosEQQdjkBswYu?usp=sharing" rel="noopener noreferrer"&gt;Google Colab notebook for this article&lt;/a&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

Write a concise summary of the following text:
The Turing test, originally called the imitation game by Alan Turing in 1950, is a test of a machine's ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human. Turing proposed that a human evaluator would judge natural language conversations between a human and a machine designed to generate human-like responses. The evaluator would be aware that one of the two partners in conversation was a machine, and all participants would be separated from one another. The conversation would be limited to a text-only channel, such as a computer keyboard and screen, so the result would not depend on the machine's ability to render words as speech. If the evaluator could not reliably tell the machine from the human, the machine would be said to have passed the test. The test results would not depend on the machine's ability to give correct answers to questions, only on how closely its answers resembled those a human would give. Since the Turing test is a test of indistinguishability in performance capacity, the verbal version generalizes naturally to all of human performance capacity, verbal as well as nonverbal (robotic).

Summary: 


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fakgsu22rygorbp8s2bud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fakgsu22rygorbp8s2bud.png" alt="Summarization with direct use"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This might get tricky if the text to summarize is too long. There are LLMs with very large context windows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-4 has a variant with a context window of 32k tokens.&lt;/li&gt;
&lt;li&gt;Claude 3 Opus has a context window of 200k tokens.&lt;/li&gt;
&lt;li&gt;Gemini 1.5 Pro can have a context window of up to 1 million tokens.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, sometimes using an LLM with such a large context window is off the table. There are many factors that can make this the case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Those models might be too expensive.&lt;/li&gt;
&lt;li&gt;Models sometimes have a hard time using all the information in very long prompts.&lt;/li&gt;
&lt;li&gt;Restrictions, like only being able to use open-source models, might rule out all the models with large context windows.&lt;/li&gt;
&lt;li&gt;Your text might be longer than the context window of any available model (yes, even the 1M-token ones).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Luckily, there exists a technique that can get an LLM to summarize a document longer than its context window size. The technique is called MapReduce. It’s based on dividing the text into a collection of smaller texts that do fit in the context window and then summarizing each part separately. The steps to perform a MapReduce summary are the following (again, you can execute and see for yourself the whole process in the &lt;a href="https://colab.research.google.com/drive/1z5wwBLOag7TK5ylimSuosEQQdjkBswYu?usp=sharing" rel="noopener noreferrer"&gt;Google Colab notebook for this article&lt;/a&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, the long document is divided into chunks using a text splitter. The strategy we use to divide the text can be different depending on the type of document.

&lt;ul&gt;
&lt;li&gt;If the text has sections and all sections are smaller than the context window, we could divide it by those sections.&lt;/li&gt;
&lt;li&gt;If the text has no clear sections or the sections are too large, the text can be divided into equal-sized chunks by character count. This approach has the problem of separating related sentences into different chunks. To avoid this, we can have an overlap between chunks: the last N characters of a chunk are repeated as the first N characters of the next chunk, so context is not lost (see the splitter sketch after the diagram below).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6isaw2usolr8imz4stnm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6isaw2usolr8imz4stnm.png" alt="Chunks"&gt;&lt;/a&gt;&lt;/p&gt;
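&lt;p&gt;As a minimal sketch of this splitting step (the chunk size and overlap values are illustrative assumptions):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def split_text(text, chunk_size=1000, overlap=100):
    """Split text into equal-sized character chunks with overlap."""
    chunks = []
    start = 0
    while start &lt; len(text):
        chunks.append(text[start:start + chunk_size])
        # The last `overlap` characters of this chunk are repeated at the
        # start of the next one, so related sentences keep their context
        start += chunk_size - overlap
    return chunks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;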

&lt;ul&gt;
&lt;li&gt;Second, all chunks are summarized separately using the LLM. Remember to tune your summarization prompt to help the model understand what kind of document it is and how to summarize it properly. For example, you might want the summary to be a bullet point list of the main points of the text, or just a few sentences (a sketch of this map step follows the diagram below).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8t72ugt7q8y60y89git5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8t72ugt7q8y60y89git5.png" alt="Map: Summarization of all chunks"&gt;&lt;/a&gt;&lt;/p&gt;
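&lt;p&gt;A sketch of the map step, where &lt;code&gt;call_llm&lt;/code&gt; is a placeholder for whichever model client you use (the linked notebook has a real implementation):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Tune this prompt to the kind of document and summary you want,
# e.g. a bullet point list vs. a few sentences
MAP_PROMPT = "Write a concise summary of the following text:\n{chunk}\n\nSummary:"

def map_summaries(chunks, call_llm):
    """Summarize every chunk independently."""
    return [call_llm(MAP_PROMPT.format(chunk=chunk)) for chunk in chunks]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;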

&lt;ul&gt;
&lt;li&gt;The third step is optional. If the combination of all summaries does not fit into the context window, we won’t be able to ask for a consolidated summary. Instead, we need to make the combination of all summaries smaller.

&lt;ul&gt;
&lt;li&gt;We will group the summaries in groups that fit into the context window.&lt;/li&gt;
&lt;li&gt;Then, we will write a prompt that combines all the summaries into a single summary with the key ideas.&lt;/li&gt;
&lt;li&gt;After reducing all the summary groups, if the resulting combined summaries still don't fit in the context window, this process is executed again.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F10t3iq6za6c4tvqz1p4z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F10t3iq6za6c4tvqz1p4z.png" alt="Combine: Consolidate summaries"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lastly, when all summaries fit into the context window, we can finally combine or reduce them into the final summary.

&lt;ul&gt;
&lt;li&gt;We will write a prompt that combines all the summaries into a single final summary with all the key ideas. This prompt is usually the same as the one in step 3, since the idea is basically the same: combine a list of summaries into one shorter summary. However, you might want the freedom to let the two prompts diverge to accommodate specific kinds of documents or summaries (a combined sketch of steps three and four follows the diagram below).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdde8ngi6r0qba4gjidqb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdde8ngi6r0qba4gjidqb.png" alt="Reduce: consolidate all summaries into a single final summary"&gt;&lt;/a&gt;&lt;/p&gt;
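&lt;p&gt;Putting steps three and four together, here is a minimal collapse-and-reduce sketch, where &lt;code&gt;fits&lt;/code&gt; is a placeholder that checks whether a list of summaries fits in the context window and &lt;code&gt;call_llm&lt;/code&gt; is your model client (both are assumptions for illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;REDUCE_PROMPT = (
    "Combine the following summaries into a single summary "
    "with the key ideas:\n{summaries}\n\nSummary:"
)

def reduce_summaries(summaries, call_llm, fits):
    """Repeatedly combine groups of summaries until one summary remains."""
    while len(summaries) &gt; 1:
        # Group summaries so that each group fits in the context window
        groups, current = [], []
        for summary in summaries:
            if current and not fits(current + [summary]):
                groups.append(current)
                current = []
            current.append(summary)
        groups.append(current)
        # Combine each group into one shorter summary; if several groups
        # remain, the loop runs again (the optional step three)
        summaries = [
            call_llm(REDUCE_PROMPT.format(summaries="\n".join(group)))
            for group in groups
        ]
    return summaries[0]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;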

&lt;p&gt;And that’s it! You now have a short summary of the most important points of a large document. But before you start processing your whole documentation, there are a few important notes to consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This MapReduce method might not be less expensive than using an LLM with a large context window, especially if your cost is per token (as it is in all Model-as-a-service LLMs). Using this method, you will still use all the tokens in the document, plus the tokens of the intermediate summaries (as output and then as input of intermediate prompts). I recommend you study first which method will be more costly for your use case.&lt;/li&gt;
&lt;li&gt;While better models will produce better summaries, the summarization capabilities of not-so-good models will be good enough for most cases. The result after many layers of summarization will be of similar quality, so you might be able to get away with using cheaper models if you decide to use this method.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Understanding LLMs: Mixture of Experts</title>
      <dc:creator>Roger Oriol</dc:creator>
      <pubDate>Mon, 01 Apr 2024 18:16:18 +0000</pubDate>
      <link>https://dev.to/rogiia/understanding-llms-mixture-of-experts-jbm</link>
      <guid>https://dev.to/rogiia/understanding-llms-mixture-of-experts-jbm</guid>
<description>&lt;p&gt;Unlike the Transformers architecture, Mixture of Experts is not a new idea. Still, it is the latest hot topic in Large Language Model architecture. This architecture has been rumored to power OpenAI's GPT-4 (and maybe GPT-3.5-turbo) and is the backbone of Mistral's Mixtral 8x7B, Grok-1 and Databricks' DBRX, which rival or even surpass GPT-3.5 with a relatively smaller size. Follow along to learn more about how this kind of architecture works and why it leads to such great results for LLMs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;A Mixture of Experts is a model with a sparse layer and a router. The experts reside in the sparse layer; they are independent models with no connections between them. Each expert specializes in a specific task. The router is a gating mechanism that learns and decides which expert is best equipped to deal with the input. The simplicity of this concept allows the architecture to work with any type of model. In this article we will focus on Transformers where the experts are feed-forward networks, but they might as well be RNNs, SVMs or even Linear Regression models. Another possibility is hierarchical experts, which use multiple routers at different levels.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hxxkoshlwhx6pmoytwr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hxxkoshlwhx6pmoytwr.png" alt="Mixture of Experts Architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The big advantage of this kind of architecture is conditional computation. Every single inference doesn’t need to use all the model’s weights. The gating mechanism is trained to choose the top k experts and route the input only to those. This choice also has a degree of random noise, which prevents overloading the most popular experts and ensures that other experts are also trained on all kinds of data.&lt;/p&gt;
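&lt;p&gt;As an illustration of conditional computation, here is a toy sketch of top-k routing in PyTorch (the sizes, noise scheme and dispatch loop are simplified assumptions, not any production model’s exact design):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    """Sparse layer: a linear gate routes each token to its top-k experts."""
    def __init__(self, d_model, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.gate(x)
        if self.training:
            # Random noise spreads the load so popular experts don't dominate
            logits = logits + torch.randn_like(logits)
        weights, indices = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the chosen experts run for each token: conditional computation
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;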

&lt;h2&gt;
  
  
  History
&lt;/h2&gt;

&lt;p&gt;The first sentence of this article stated that Mixture of Experts is not a recent idea. In fact, it was first proposed in 1991 in the paper Adaptive Mixture of Local Experts. In that paper, the authors proposed that when a model had to perform different tasks, it was beneficial to have different experts with decoupled weights, so that no expert was affected by the others fitting their weights to their own tasks.&lt;/p&gt;

&lt;p&gt;Even though the idea is old, the Mixture of Experts architecture benefits a lot from today’s computing power and horizontal scaling. MoE models can easily be distributed between multiple devices. Since not all weights of the model activate on each inference, each expert can be placed on a different device, which frees up the devices holding other experts to handle other inputs in parallel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tw1leb8zuyfk8vbxv36.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tw1leb8zuyfk8vbxv36.png" alt="Mixture of Experts Communication"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How many experts should a model have?
&lt;/h2&gt;

&lt;p&gt;When we train a Mixture of Experts model, we expect each expert to learn and become proficient at specific tasks. Experts do seem to specialize in handling specific inputs. For example, in a language model, experts tend to divide their expertise between handling nouns, verbs, punctuation, numbers and counting, etc. However, they don’t specialize along other lines that we would consider obvious. When we train a MoE model on a multilingual corpus, different experts don’t learn different languages; they all seem to try to learn all of them.&lt;/p&gt;

&lt;p&gt;A crucial decision when designing a Mixture of Experts model is the number of experts it will have. Normally, more experts mean more efficiency, since a smaller part of the whole model will need to be used for each inference. However, there are some caveats. The advantages of adding another expert diminish the more experts we have; 4 to 16 experts seem to be a sweet spot. Also, even though it doesn’t use all weights for every inference, reducing computing time, it still must always hold all the weights in VRAM. Looking at some popular models, DBRX has 16 experts (4 activate at any inference), while Mixtral and Grok have 8 (2 activate).&lt;/p&gt;

&lt;h2&gt;
  
  
  Fine-tuning MoE
&lt;/h2&gt;

&lt;p&gt;A particular problem with Mixture of Experts models is that they are hard to fine-tune. MoEs are very prone to overfitting: after fine-tuning, they are bad at reasoning tasks but still good at knowledge tasks. A way to mitigate this is to reduce the number of experts, as fewer experts lead to better fine-tuning. Also, a recent study has offered some hope for MoE fine-tuning. It had great success fine-tuning a Flan MoE, suggesting that MoEs might benefit from instruction fine-tuning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling MoE
&lt;/h2&gt;

&lt;p&gt;On the other hand, Mixture of Experts models are great for high-throughput scenarios, as opposed to dense models. MoEs can be scaled with many techniques.&lt;/p&gt;

&lt;p&gt;A paper by Google named GShard explored solving device underutilization to successfully scale a MoE horizontally across many devices. They replicated all non-MoE layers on all devices, but gave each device a different expert for the MoE layers. They also introduced the concept of expert capacity: the maximum number of tokens an expert can take before it is considered overflowed, after which the next expert in line takes over (see the sketch after the diagram below).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbd52icf3cghyhr44hlbp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbd52icf3cghyhr44hlbp.png" alt="Expert Capacity Example"&gt;&lt;/a&gt;&lt;/p&gt;
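&lt;p&gt;A toy sketch of expert capacity: the overflow-to-the-next-expert rule follows the description above, while dropping tokens once every expert is full is an assumption for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def assign_with_capacity(preferred_experts, n_experts, capacity):
    """Route each token to its preferred expert, spilling over when full."""
    load = [0] * n_experts
    assignment = []
    for expert in preferred_experts:  # preferred expert index per token
        e = expert
        tried = 0
        while load[e] &gt;= capacity and tried &lt; n_experts:
            e = (e + 1) % n_experts   # overflowed: next expert in line takes over
            tried += 1
        if tried == n_experts:        # every expert is full: drop the token
            assignment.append(None)
        else:
            load[e] += 1
            assignment.append(e)
    return assignment

# e.g. 6 tokens all preferring expert 0, with capacity 2 per expert
print(assign_with_capacity([0, 0, 0, 0, 0, 0], n_experts=2, capacity=2))
# -&gt; [0, 0, 1, 1, None, None]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;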

&lt;p&gt;Another paper, named Switch Transformers, looked at techniques to reduce communication costs between devices and reduce training instabilities. To optimize parallelism, they proposed routing each token to a single expert and reducing the capacity factor so that tokens are divided almost equally between the experts (with some small wiggle room for favoring a specific expert). Switch Transformers also proposed using bfloat16 precision only for the expert layers and full precision for the other layers. This stabilizes training, as layers like the router need higher precision due to an exponentiating function, while still reducing communication costs between experts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimizing MoE
&lt;/h2&gt;

&lt;p&gt;Mixture of Experts models can also be optimized through different means. Distillation of a sparse model into a dense model keeps 30% of the sparsity gains while being much smaller in total model size. Another technique is Aggregation of MoE, which merges the weights of all experts into one and still performs very well on all tasks. Also, QMoE is a quantization technique that can store 1.6 trillion parameters in less than 160GB (0.8 bits per parameter!).&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In conclusion, given today’s need for models that perform a multitude of different tasks for millions of people (think ChatGPT or similar products), MoE’s excellence in high-throughput, distributed scenarios shines. Being efficient at training and inference also means lower costs and faster innovation. Of course, not everything is great; there are some drawbacks. Being hard to fine-tune is a problem, as is needing a lot of VRAM to operate. What is certain is that in the future we will keep seeing better techniques to optimize sparse models, and that will lead to better LLMs.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>learning</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>What To Expect For AI in 2024?</title>
      <dc:creator>Roger Oriol</dc:creator>
      <pubDate>Sun, 28 Jan 2024 20:57:26 +0000</pubDate>
      <link>https://dev.to/rogiia/what-to-expect-for-ai-in-2024-4h45</link>
      <guid>https://dev.to/rogiia/what-to-expect-for-ai-in-2024-4h45</guid>
<description>&lt;p&gt;2023 was a great year for AI. Large Language Models were already in the spotlight for both users and businesses. ChatGPT had just been released in late 2022 and was taking the world by storm. Still, 2023 brought more rapid change in the field than we could have imagined. This last year we got the newest version of OpenAI’s model, GPT-4. We also got a ton of open-source models competing with OpenAI, like LLaMa, Falcon, and Mistral. Google didn’t want to miss the party and stepped up its game unveiling Gemini, the successor to Google’s earlier model, PaLM. Anthropic launched Claude, AI21 Labs launched Jurassic-2 and Amazon announced its Generative AI service, Bedrock, as well as its own LLM, Titan. Some LLMs gained a ton of new functionality by going multimodal, as well as embracing agents, which allow them to have up-to-date information and interact with the world around them.&lt;/p&gt;

&lt;p&gt;Also, regulators are beginning to catch up to the rapid evolution of this new technology. The United States of America passed an Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, and the European Union approved the EU AI Act, the first actual regulation on artificial intelligence.&lt;/p&gt;

&lt;p&gt;So, what’s next for 2024? Given the unpredictably fast rise of new AI technology in the last 14 months, any prediction that we make on its evolution in the future will probably be ridiculed by reality. However, an educated guess can be made given the current direction of AI. I invite you to hear about my take on AI in 2024 and of course, take it with a grain of salt.&lt;/p&gt;

&lt;h2&gt;
  
  
  AGI
&lt;/h2&gt;

&lt;p&gt;As new LLMs begin to approach the trillion-parameter mark, new architecture ideas improve on the transformer architecture, and AI’s reasoning abilities continue to improve, I predict that we will begin to hear some Artificial General Intelligence claims. Artificial General Intelligence is the highly anticipated milestone when AI will be able to perform any intellectual task that a human could do. While this kind of artificial intelligence is still very far from being a reality, it seems inevitable. With current AIs improving exponentially, it's just a matter of time until we begin to hear claims that a top LLM has reached AGI level. While this does not mean that we are there yet, it’s a clear sign that we are on the path to that very realistic future.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open-source models
&lt;/h2&gt;

&lt;p&gt;HuggingFace is a platform that makes it easy to share and contribute to models and datasets. It’s an incredible resource for anyone who wants to start using AI models right away, without having to invest time and money in training their own model from scratch. HuggingFace also provides a ton of resources for working with models and easily fine-tuning them to adapt any model to the user’s needs. In my opinion, these kinds of resources are priceless to both individuals and enterprises that want to get into using AI. And given the fact that every day more and more individuals and businesses introduce AI as a part of their workflows and products, it is likely that this kind of platform will explode further than it already has, providing tons of new open-source models and AI resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multimodality
&lt;/h2&gt;

&lt;p&gt;Some LLMs like ChatGPT and Google’s Gemini have already unveiled multimodal capabilities. Multimodal models are Large Language Models that can understand and interact with a combination of different types of inputs, like text, audio, images, and video. Since LLMs convert text and images to a vector representation to interpret them correctly, it's quite challenging to find a vector representation that works for both text and images, so this is an incredible feat. LLMs having multimodal capabilities broaden the list of tasks they can perform by a large margin. This year, I expect even more models to embrace multimodality and the capabilities of AI agents to increase significantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Responsible AI
&lt;/h2&gt;

&lt;p&gt;As regulators start to intensify legislation on artificial intelligence, Responsible AI will not just be a choice in developing new AI models, but a must. Responsible AI is an approach to training AI models with ethics and safety in mind. Generative AI models should be trained to be fair and unbiased, as well as to avoid harmful responses and avoid endangering human life, property, or the environment. I expect not only to see an improvement in Responsible AI frameworks and expectations but also to witness the first legal consequences for enterprises not following those new rulings.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI use in enterprises
&lt;/h2&gt;

&lt;p&gt;As using LLMs becomes easier and more democratized, the barrier of entry for tons of enterprises that are not primarily technologically focused will begin to fall. More and more kinds of companies will begin to use AI to aid in their daily operations. We will see more AI B2B products to address these new necessities. Some companies may just start using AI for the hype of it but will fail to implement it in a meaningful manner for their bottom line. On the other hand, companies that succeed in introducing AI in their daily operations will see an unprecedented boost in productivity. Sadly, this also might mean that we will see some companies lay off an important part of their workforce, as employees are quickly replaced by AI systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI race
&lt;/h2&gt;

&lt;p&gt;The potential of artificial intelligence is no longer a secret. Many countries around the world will not only begin to legislate for AI but also invest heavily in it, as being at the forefront of AI innovation can give them an edge in many areas, including military, scientific, economic, and informational. Even if one specific part of the world advocates for AI ethics and halts AI progress to ensure it does not become a threat to humanity, other parts of the world might take this opportunity to put themselves at the front of innovation, which would inevitably make the former’s efforts worthless. We might begin to witness a new AI “space race”, as countries put pressure on each other to never stop investing resources in developing new AI, under the threat of falling behind in the new future that AI is shaping.&lt;/p&gt;

&lt;p&gt;Even if my predictions don’t end up becoming a reality, what is certain is that AI will continue to be a major topic in 2024. I’m excited to see how AI will evolve during the following year and what new capabilities it will bring. Certainly, there are many concerns to be had with the rapid evolution of AI, but I’m hopeful that we will be responsible enough to address them before they do become a problem. As a personal note, this year I will be testing how to use AI in many of my daily activities to make them easier or even completely automate them.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How To Supercharge Your LLM with LangChain Agents</title>
      <dc:creator>Roger Oriol</dc:creator>
      <pubDate>Sun, 13 Aug 2023 11:48:19 +0000</pubDate>
      <link>https://dev.to/rogiia/how-to-supercharge-your-llm-with-langchain-agents-3fl6</link>
      <guid>https://dev.to/rogiia/how-to-supercharge-your-llm-with-langchain-agents-3fl6</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8s0bzr1t908e21a0zj9t.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8s0bzr1t908e21a0zj9t.jpg" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with LLMs
&lt;/h2&gt;

&lt;p&gt;LLMs are very capable of performing a lot of feats that seem incredible to us, but &lt;strong&gt;they are bound by the borders of what Generative AI was originally meant to do: generate text based on the data it has been trained on.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They cannot access any services on the web to give us more accurate and recent answers; instead, they must answer using the training data their creators gathered, possibly years ago. They cannot access tools as simple as a calculator, which would let them compute mathematical answers instead of having to memorize the result of every mathematical operation. They rely on us to perform the actions they suggest in the real world and to report back with the results.&lt;/p&gt;

&lt;p&gt;To give our LLMs the powers they are missing to be truly powerful for us, we can use &lt;strong&gt;Agents&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are Agents?
&lt;/h2&gt;

&lt;p&gt;An Agent is an LLM prompted to &lt;strong&gt;reason about the actions needed to complete a request&lt;/strong&gt;, using a set of &lt;strong&gt;tools&lt;/strong&gt; that it has been provided with. An agent can be used alongside &lt;strong&gt;any LLM&lt;/strong&gt;. It is only a layer on top of it that builds the prompts indicating to the Agent the &lt;strong&gt;context, its personality and the strategies it must use&lt;/strong&gt; to complete requests.&lt;/p&gt;

&lt;p&gt;Obviously, the Agent by itself is not enough. We also need to create or reuse tools for the Agent to work with. The &lt;strong&gt;Agent Executor&lt;/strong&gt; is the runtime that executes both the Agent and the tools that it uses. &lt;strong&gt;Tools&lt;/strong&gt; can be standalone functions or they can come in collections of tools called &lt;strong&gt;toolkits&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.ruxu.dev%2Fassets%2Fimages%2FLangChainAgents.svg" class="article-body-image-wrapper"&gt;&lt;img alt="Agent Executor, Agents and Tools diagram" src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.ruxu.dev%2Fassets%2Fimages%2FLangChainAgents.svg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tools and toolkits
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tools are functions&lt;/strong&gt; that perform actions on behalf of the LLM. An agent gets a list of tools and will request to use one, several, or none of them. The Agent Executor will execute the required tools and feed the results back to the Agent. An example of a tool is the Google Search function, which allows LLMs to look up information they don't have using a Google search. &lt;/p&gt;

&lt;p&gt;For the Agent to be able to choose correctly which tools it needs, &lt;strong&gt;those tools have to exist, be correctly implemented for the purpose that the Agent needs them for, and be in the list of tools for the Agent&lt;/strong&gt;. But more importantly, these tools &lt;strong&gt;must be thoroughly described&lt;/strong&gt; so Agents can easily decide if they need them and what they will be helpful for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Toolkits are just a set of tools&lt;/strong&gt; that are usually useful together. The tools might be used together to achieve a multi-step goal, or they might be grouped because they perform similar actions or actions in the same domain. An example of a toolkit is the Gmail Toolkit, which allows LLMs to read emails, draft new emails and delete them. &lt;/p&gt;

&lt;h2&gt;
  
  
  What strategies do Agents use?
&lt;/h2&gt;

&lt;p&gt;After a set of tools is provided to the Agent, how does it know which one to use? Agents can use a variety of prompt engineering strategies to make the LLM reason and decide about the actions it has to take. Some popular prompt engineering practices were discussed in the previous article: Prompt Engineering.&lt;/p&gt;

&lt;p&gt;The most popular strategy for agents is the &lt;strong&gt;ReAct&lt;/strong&gt; method. ReAct uses few-shot learning together with some Chain-of-Thought reasoning examples. These examples contain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Thoughts&lt;/strong&gt; transcribed from the reasoning strategies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actions&lt;/strong&gt; that let the LLM interact with its environment in a verbal manner.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observations&lt;/strong&gt; gained after taking the actions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The LLM then learns to act in this manner, interacting with its tools, and can apply the pattern to real interactions. An example of a ReAct prompt would be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Question: What is the current temperature at the city where John Cena was born in Celcius?
Thought: I need to search for the city John Cena was born in, then find the current temperature for that city, then convert the temperature to Celsius.
Action: Search[City where John Cena was born]
Observation: Cena was born in West Newbury, Massachusetts.
Thought: I have to find the current temperature in West Newbury, Massachusetts.
Action: WeatherLookup[West Newbury, Massachusetts]
Observation: 81°F. Sunshine to start, then a few afternoon clouds. High 81F. Winds WNW at 5 to 10 mph. 45% humidity.
Thought: I have to convert 81°F to Celsius.
Action: UnitConversion[FahrenheitToCelcius, 81]
Observation: 27.2
Thought: The answer is 27.2 degrees Celsius.
Action: Finish[27.2 degrees Celsius]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How to use Langchain Agents
&lt;/h2&gt;

&lt;p&gt;Langchain is a Python library (as well as JS/TS) that is very useful for rapidly getting started on integrating LLMs in your applications. It has many prompting strategies available out of the box and also many Agents.&lt;/p&gt;

&lt;p&gt;You can get started writing tools of your own using Langchain’s &lt;code&gt;tool&lt;/code&gt; decorator, like this (&lt;strong&gt;VERY IMPORTANT: give the function a docstring description; it will be used by the agent to choose which tool to use&lt;/strong&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;unit_conversion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Converts a given unit to another unit. To convert from Fahrenfeit to Celcius, give FahrenheitToCelcius as first parameter, then the value as second parameter&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
   &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;units&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FahrenheitToCelcius&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
     &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
     &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invalid type of unit conversion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Afterwards, use this tool along with your agent of choice, also with Langchain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.llms&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;initialize_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AgentType&lt;/span&gt;

&lt;span class="c1"&gt;# We will use OpenAI's GPT as the LLM
&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# We will create the Agent
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;initialize_agent&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;unit_conversion&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AgentType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ZERO_SHOT_REACT_DESCRIPTION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is 81 degrees Fahrenheit in Celsius?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can learn more about Langchain and its tools by visiting &lt;a href="https://python.langchain.com/docs/get_started" rel="noopener noreferrer"&gt;its documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It looks like Agents are going to bring a lot more capabilities to LLMs in the near future. As these new capabilities roll out to applications and tools, more outstanding things will become possible. I’m very curious and looking forward to the new possibilities.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>chatgpt</category>
      <category>langchain</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
