<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Arin Zingade</title>
    <description>The latest articles on DEV Community by Arin Zingade (@arinzingade).</description>
    <link>https://dev.to/arinzingade</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1898201%2F857a0546-e35b-49b0-a819-05a273cc8491.png</url>
      <title>DEV Community: Arin Zingade</title>
      <link>https://dev.to/arinzingade</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arinzingade"/>
    <language>en</language>
    <item>
      <title>AI Web Agents: The Future of Intelligent Automation</title>
      <dc:creator>Arin Zingade</dc:creator>
      <pubDate>Sat, 04 Jan 2025 15:27:18 +0000</pubDate>
      <link>https://dev.to/arinzingade/ai-web-agents-the-future-of-intelligent-automation-2odo</link>
      <guid>https://dev.to/arinzingade/ai-web-agents-the-future-of-intelligent-automation-2odo</guid>
      <description>&lt;h2&gt;
  
  
  Introduction: What Are AI Web Agents?
&lt;/h2&gt;

&lt;p&gt;AI web agents represent a powerful, emerging force within the digital landscape, fundamentally reshaping how organizations approach automation. As software tools capable of simulating human-like interactions, AI agents can understand, execute, and adapt to user requests. They are not merely passive systems responding to pre-defined commands but actively work to understand broader goals, learn from interactions, and dynamically refine responses.&lt;/p&gt;

&lt;p&gt;A recent Capgemini survey of large enterprises reveals that one in ten organizations is already deploying AI agents, with over half planning to explore these technologies within the coming year. Forrester Research also highlights AI web agents as one of the top 10 emerging technologies for 2024, with VP &lt;a href="https://www.forrester.com/blogs/author/brian_hopkins/?utm_source=pr&amp;amp;utm_medium=pr_pitch&amp;amp;utm_campaign=tech" rel="noopener noreferrer"&gt;Brian Hopkins&lt;/a&gt; calling them “perhaps the most exciting development” on this year’s &lt;a href="https://www.forrester.com/blogs/top-10-emerging-technologies-for-2024/?utm_source=pr&amp;amp;utm_medium=pr_pitch&amp;amp;utm_campaign=tech" rel="noopener noreferrer"&gt;list&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The concept of &lt;a href="https://www.geeksforgeeks.org/rabbit-ai-large-action-models-lams/" rel="noopener noreferrer"&gt;Large Action Models&lt;/a&gt; (LAMs) has become a focal point in discussions about AI agents. Rabbit AI, a pioneering player in this space, has introduced a product—a custom OS-equipped device supporting a trainable AI assistant capable of handling a wide range of actions. This assistant leverages LAMs to manage tasks such as making reservations, giving directions, ordering services, and adapting to user-specific prompts.&lt;/p&gt;

&lt;p&gt;Imagine a team of robotic coworkers, each able to support various business operations, be it customer service, data analysis, or scheduling tasks. These agents act as powerful extensions of human teams, handling operational tasks so human team members can focus on higher-level strategic work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Large Action Models: A Step Toward the Future
&lt;/h3&gt;

&lt;p&gt;As AI technology advances, an exciting new category emerges: Large Action Models (LAMs). LAMs have a more expansive role than traditional language models, which primarily generate text: they are built to perform actions, executing complex multi-step tasks from clear instructions. This progression brings us closer to artificial general intelligence (AGI), the idea of AI capable of performing virtually any intellectual task a human can. Although AGI remains a distant vision, the development of LAMs brings us a step nearer to it in practical, impactful ways.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are Large Action Models?
&lt;/h3&gt;

&lt;p&gt;Large action models combine multiple components, creating a system capable of interpreting instructions, understanding context, and performing diverse tasks. Think of them as supercharged LLMs that operate with multimodal capabilities, meaning they can handle not just text but also images, videos, and more. Additionally, they’re designed to interact with external tools and environments, empowering them to execute actions within complex workflows seamlessly.&lt;/p&gt;
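&lt;p&gt;To make the tool-interaction idea concrete, here is a minimal, illustrative Python sketch of how a LAM-style system might route model-proposed actions to external tools. The action format and tool names are invented for the example; real systems use richer schemas such as structured function calling.&lt;/p&gt;

```python
# Minimal sketch of a LAM-style action dispatcher (illustrative only):
# the model emits a structured action, and a registry maps it to a tool.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Action:
    name: str        # which tool to invoke
    argument: str    # a single argument, kept simple for the sketch

def book_table(restaurant: str) -> str:
    return f"Reserved a table at {restaurant}"

def get_directions(place: str) -> str:
    return f"Directions to {place}: head north, then east"

# Registry of tools the model is allowed to call.
TOOLS: Dict[str, Callable[[str], str]] = {
    "book_table": book_table,
    "get_directions": get_directions,
}

def execute(action: Action) -> str:
    """Route a model-proposed action to the matching tool."""
    tool = TOOLS.get(action.name)
    if tool is None:
        return f"Unknown action: {action.name}"
    return tool(action.argument)

print(execute(Action("book_table", "Blue Hill")))  # prints "Reserved a table at Blue Hill"
```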

&lt;h4&gt;
  
  
  Capabilities and Real-World Applications
&lt;/h4&gt;

&lt;p&gt;Large action models are reshaping how we think about automation. They go beyond handling complex queries; they adapt to a range of situations and user requirements. For example, MultiOn agents use websites and online services to perform a variety of tasks, all based on a simple prompt. With applications in areas like personalized marketing, these agents are positioned to change how people interact with digital services by simplifying processes, automating repetitive tasks, and handling entire workflows end to end.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero-shot Learning&lt;/strong&gt;: LAMs are designed to perform new tasks without explicit training, relying on the vast data they’re trained on. This enables them to take on unfamiliar tasks with minimal guidance, broadening their application scope.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Few-shot Learning&lt;/strong&gt;: LAMs can also handle custom tasks by learning from a few examples provided in the input. This lets us adapt them to specific needs or contexts, adding a level of flexibility that traditional automation tools often lack.&lt;/p&gt;
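&lt;p&gt;The difference is easiest to see in how the prompts are assembled. The following sketch is illustrative only; the model call itself is omitted, and only the prompt construction is shown:&lt;/p&gt;

```python
# Illustrative only: how zero-shot and few-shot prompts differ in structure.
def zero_shot_prompt(task: str) -> str:
    # No examples: the model must rely entirely on its pretraining.
    return f"Task: {task}\nAnswer:"

def few_shot_prompt(task: str, examples: list) -> str:
    # A handful of worked examples steers the model toward the desired format.
    shots = "\n".join(f"Task: {t}\nAnswer: {a}" for t, a in examples)
    return f"{shots}\nTask: {task}\nAnswer:"

examples = [("Extract the date from 'Meet on Jan 5'", "Jan 5")]
prompt = few_shot_prompt("Extract the date from 'Call on Mar 12'", examples)
print(prompt.count("Task:"))  # 2: the prompt holds both the example and the query
```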

&lt;h3&gt;
  
  
  Potential Limitations
&lt;/h3&gt;

&lt;p&gt;Despite their promise, LAMs face some hurdles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Latency Issues&lt;/strong&gt;: Efficiency is a core design goal for LAMs, yet complex, multi-step tasks can introduce delays. This can impact user experience, particularly in environments where real-time responses are crucial.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Experimental Phase&lt;/strong&gt;: Many LAMs are still in development, and while their capabilities are impressive, they may not be fully reliable in all real-world applications. Continued refinement and testing will be key to achieving consistency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Dependency&lt;/strong&gt;: Like any advanced AI, LAMs require extensive datasets to make accurate, informed decisions. In domains where data is scarce, their performance may be limited.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Complexity of Integration&lt;/strong&gt;: Integrating LAMs into existing systems requires sophisticated infrastructure and support for multimodal processing, which can be challenging and costly.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Understanding AI Web Agents
&lt;/h2&gt;

&lt;p&gt;AI web agents are revolutionizing how we approach automation. These agents gather information from their surroundings, process that data, and take actions to transform the environment—whether physical, digital, or a blend of both. As technology continues to advance, many AI agents are becoming increasingly capable of learning and adapting their behavior over time. They explore new solutions to challenges, continuously refining their approach until they achieve the desired outcome.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowaqp6xlw5p1h4ykrtaw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowaqp6xlw5p1h4ykrtaw.png" alt="Image description" width="800" height="659"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Pipeline of how AI Web Agent functions&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  AI Web Agents vs. Traditional Automation Tools
&lt;/h3&gt;

&lt;p&gt;For many years, businesses have relied on traditional automation tools to handle repetitive, rule-based tasks—things like data entry, email marketing, and scheduling. These tools were highly effective for straightforward, repetitive processes but often lacked flexibility and intelligence when faced with more complex scenarios. That’s where AI-powered web agents come in, offering a much more sophisticated approach to automation.&lt;/p&gt;

&lt;p&gt;Unlike traditional automation tools, which rely on fixed rules and processes, AI agents leverage advanced technologies such as machine learning, &lt;a href="https://www.ibm.com/think/topics/natural-language-processing" rel="noopener noreferrer"&gt;natural language processing&lt;/a&gt; (NLP), and &lt;a href="https://cloud.google.com/discover/what-is-cognitive-computing" rel="noopener noreferrer"&gt;cognitive computing&lt;/a&gt;. This allows them to perform tasks in a much more flexible manner, adapting to new information and evolving conditions in real time. With these capabilities, AI agents can learn from past experiences and make smarter decisions without being explicitly programmed for every possible scenario.&lt;/p&gt;

&lt;p&gt;In the past, web automation often required businesses to write custom scripts for each website, using techniques like DOM parsing and XPath-based interactions. However, these scripts could easily break if a website's layout or structure changed. AI agents, on the other hand, have evolved beyond such limitations, offering a more resilient and dynamic approach.&lt;/p&gt;
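&lt;p&gt;A toy example makes the brittleness concrete. Below, a page layout is modeled as a nested structure; a redesign adds one wrapper node, which silently breaks a hard-coded, XPath-style path while an attribute-based lookup keeps working:&lt;/p&gt;

```python
# Toy DOM trees as nested dicts: v2 adds a wrapper node, which breaks
# a fixed "path" lookup but not an attribute-based search.
page_v1 = {"tag": "body", "children": [
    {"tag": "div", "children": [
        {"tag": "span", "id": "price", "text": "42"}]}]}
page_v2 = {"tag": "body", "children": [
    {"tag": "main", "children": [          # new wrapper added in a redesign
        {"tag": "div", "children": [
            {"tag": "span", "id": "price", "text": "42"}]}]}]}

def by_fixed_path(node, path):
    """Follow a hard-coded list of child indices, like an absolute XPath."""
    for index in path:
        children = node.get("children", [])
        if index >= len(children):
            return None
        node = children[index]
    return node.get("text")

def by_attribute(node, wanted_id):
    """Search the whole tree for a node with a matching id."""
    if node.get("id") == wanted_id:
        return node.get("text")
    for child in node.get("children", []):
        found = by_attribute(child, wanted_id)
        if found is not None:
            return found
    return None

print(by_fixed_path(page_v1, [0, 0]))   # the old script finds the price
print(by_fixed_path(page_v2, [0, 0]))   # the layout change broke it (None)
print(by_attribute(page_v2, "price"))   # the resilient lookup still works
```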

&lt;h3&gt;
  
  
  Key Technologies Behind AI Web Agents
&lt;/h3&gt;

&lt;p&gt;AI web agents harness a suite of advanced technologies to bring a new level of automation and intelligence to digital workflows. At the core of these agents are systems that not only understand tasks but also adapt to dynamic online environments, recognize visual elements, interpret language, and extract meaningful data.&lt;/p&gt;

&lt;h4&gt;
  
  
  Large Language Models (LLMs)
&lt;/h4&gt;

&lt;p&gt;LLMs play a central role in AI web agents. They understand the task at hand, process the language, and generate the necessary steps to complete the objective. Whether it’s interacting with a website or gathering information, the LLM drives the decision-making process.&lt;/p&gt;

&lt;h4&gt;
  
  
  Natural Language Processing (NLP)
&lt;/h4&gt;

&lt;p&gt;NLP allows AI agents to interpret and understand human language. It helps the agent communicate with websites, forms, and other digital environments, enabling tasks like reading text, answering questions, or extracting key information.&lt;/p&gt;

&lt;h4&gt;
  
  
  Computer Vision
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Computer_vision" rel="noopener noreferrer"&gt;Computer vision&lt;/a&gt; enables AI agents to "see" and interact with visual elements on a webpage. By scanning for images, buttons, or other interactive items, AI agents can make informed decisions about how to engage with the environment.&lt;/p&gt;

&lt;h4&gt;
  
  
  Understanding Context
&lt;/h4&gt;

&lt;p&gt;Context is crucial for accurate decision-making. AI agents use context to adapt their behavior based on real-time data, past experiences, or user input. This ensures tasks are completed intelligently, even when conditions change.&lt;/p&gt;

&lt;h4&gt;
  
  
  Entity Recognition and Extraction
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://www.ibm.com/think/topics/named-entity-recognition" rel="noopener noreferrer"&gt;Entity recognition&lt;/a&gt; helps AI agents identify important pieces of information, like product names, dates, or locations, within text or data. This capability allows agents to make smarter decisions based on extracted entities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Standard Pipeline for Web AI Agents
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Understanding the Task&lt;/strong&gt;: The LLM interprets the task and identifies the objective.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Website Interaction&lt;/strong&gt;: The agent accesses the target website and uses computer vision to scan the content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Action Generation&lt;/strong&gt;: The agent generates an action plan as executable code, for example a Selenium or Playwright script, to interact with the website.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution&lt;/strong&gt;: The generated code is executed to perform the required actions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repeat&lt;/strong&gt;: The agent repeats the process until the task is completed.&lt;/li&gt;
&lt;/ol&gt;
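&lt;p&gt;The five steps above can be sketched as a loop. In this schematic, the scan, plan, and execute functions are trivial stand-ins for the vision model, the LLM, and the browser driver, so the block runs without any of those dependencies:&lt;/p&gt;

```python
# Schematic of the five-step pipeline. The scan/plan/execute functions
# are stand-ins; in a real agent they would call a vision model, an LLM,
# and a Selenium/Playwright driver respectively.
def scan_page(page: str) -> str:
    # Stand-in for computer vision / DOM scanning.
    return f"contents of {page}"

def generate_actions(objective: str, observation: str) -> list:
    # Stand-in for the LLM turning an objective into concrete actions.
    return [f"click element relevant to: {objective}"]

def execute_actions(plan: list, state: dict) -> dict:
    # Stand-in for executing generated browser-automation code.
    state["steps_taken"] = state.get("steps_taken", 0) + len(plan)
    state["done"] = state["steps_taken"] >= 2  # pretend the goal takes two actions
    return state

def run_agent(objective: str, max_steps: int = 5) -> dict:
    state = {"done": False, "page": "https://example.com"}
    for _ in range(max_steps):
        observation = scan_page(state["page"])            # step 2: scan the site
        plan = generate_actions(objective, observation)   # steps 1 and 3: interpret and plan
        state = execute_actions(plan, state)              # step 4: execute
        if state["done"]:                                 # step 5: repeat until complete
            break
    return state

print(run_agent("open the World Indices page")["done"])  # True after two iterations
```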

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxlczb0l123c6x0aqdy0t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxlczb0l123c6x0aqdy0t.png" alt="Image description" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Demonstrates the recursive approach AI Web Agents take&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Leading AI Web Agents
&lt;/h2&gt;

&lt;p&gt;A growing ecosystem of AI-driven web agent frameworks, many of them open-source, is paving the way for developers to build powerful, customized solutions. These frameworks offer robust foundations, allowing us to focus on tailoring and scaling AI agents for specific needs rather than building everything from scratch.&lt;/p&gt;

&lt;h3&gt;
  
  
  LaVague
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.lavague.ai/" rel="noopener noreferrer"&gt;LaVague&lt;/a&gt; is an open-source framework designed for developers seeking to build AI web agents that automate processes for their users. It provides a comprehensive solution for creating adaptable and effective AI agents.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key Features
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;World Model:&lt;/strong&gt; LaVague's World Model processes the current web page and the given objective to generate a set of instructions for the agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Action Engine:&lt;/strong&gt; The Action Engine compiles these instructions into executable automation code, for example Selenium or Playwright scripts, and then performs the required action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Supported Drivers&lt;/strong&gt;: LaVague supports three main driver options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Selenium WebDriver&lt;/li&gt;
&lt;li&gt;Playwright WebDriver&lt;/li&gt;
&lt;li&gt;Chrome Extension Driver&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Stagehand
&lt;/h3&gt;

&lt;p&gt;Stagehand, from &lt;a href="https://www.browserbase.com/" rel="noopener noreferrer"&gt;Browserbase&lt;/a&gt;, brings AI-driven automation to Browserbase's high-performance, serverless headless-browser platform. Together they let developers run, manage, and monitor web automation tasks at scale, offering a robust foundation for integrating AI web agents.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key Features
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Compatibility:&lt;/strong&gt; Native compatibility with popular automation tools like &lt;a href="https://docs.browserbase.com/quickstart/playwright" rel="noopener noreferrer"&gt;Playwright&lt;/a&gt;, &lt;a href="https://docs.browserbase.com/quickstart/puppeteer" rel="noopener noreferrer"&gt;Puppeteer&lt;/a&gt;, and &lt;a href="https://docs.browserbase.com/quickstart/selenium" rel="noopener noreferrer"&gt;Selenium&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration:&lt;/strong&gt; Seamless integration with AI frameworks like &lt;a href="https://docs.browserbase.com/integrations/crew-ai/introduction" rel="noopener noreferrer"&gt;crewAI&lt;/a&gt; and &lt;a href="https://docs.browserbase.com/integrations/langchain/introduction" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; Full observability with the &lt;a href="https://docs.browserbase.com/features/session-inspector" rel="noopener noreferrer"&gt;Session Inspector&lt;/a&gt;, which provides deep insights into agent interactions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stealth Mode:&lt;/strong&gt; Automatically solves captchas and uses residential proxies for improved anonymity and reliability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced Features:&lt;/strong&gt; Custom extensions, file downloads, long-running sessions, and an API for live views and session logs.&lt;/li&gt;
&lt;/ol&gt;
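&lt;p&gt;Because Browserbase exposes sessions over the Chrome DevTools Protocol, existing Playwright code can attach to a remote session. The endpoint format below is an assumption for illustration; consult the Browserbase quickstart for the authoritative connection string.&lt;/p&gt;

```python
# Hedged sketch: building a Browserbase-style CDP connection URL.
# The wss endpoint format here is assumed, not taken verbatim from docs.
import os

def browserbase_ws_url(api_key: str) -> str:
    # Assumed endpoint shape; treat as illustrative, not authoritative.
    return f"wss://connect.browserbase.com?apiKey={api_key}"

url = browserbase_ws_url(os.environ.get("BROWSERBASE_API_KEY", "demo-key"))
print(url.startswith("wss://"))  # True

# With Playwright installed, the remote session would be attached roughly as:
# browser = playwright.chromium.connect_over_cdp(url)
```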

&lt;p&gt;Stagehand offers a scalable, secure, and reliable infrastructure that supports the creation and deployment of powerful AI agents in the web automation space.&lt;/p&gt;

&lt;h3&gt;
  
  
  Skyvern
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.skyvern.com/" rel="noopener noreferrer"&gt;Skyvern&lt;/a&gt; is a cutting-edge solution that automates browser-based workflows using large language models (LLMs) and computer vision. It’s designed to replace traditional automation solutions with a more robust, adaptable system.&lt;/p&gt;

&lt;p&gt;Skyvern offers an intuitive user interface, allowing you to automate workflows with ease. Here’s how to get started with setting up Skyvern on your machine.&lt;/p&gt;

&lt;h4&gt;
  
  
  Steps to Set Up Skyvern
&lt;/h4&gt;

&lt;h5&gt;
  
  
  Prerequisites
&lt;/h5&gt;

&lt;p&gt;Before you begin, make sure Docker is installed on your system. Docker will allow Skyvern to run seamlessly across different environments.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Clone the Repository&lt;/strong&gt;
Start by cloning Skyvern's repository from GitHub:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   git clone https://github.com/Skyvern-AI/skyvern
   &lt;span class="nb"&gt;cd &lt;/span&gt;Skyvern
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Configure Your API Key&lt;/strong&gt;
Open the &lt;code&gt;docker-compose.yml&lt;/code&gt; file in a text editor:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   nano docker-compose.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace the placeholder with your OpenAI or Anthropic API key. This key enables Skyvern to access the AI functionalities needed for workflow automation.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Build and Run Skyvern&lt;/strong&gt;
Once you’ve added the API key, start Skyvern with Docker:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   docker-compose up &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;Open the Interface&lt;/strong&gt;
Skyvern will start running at &lt;a href="http://localhost:8080" rel="noopener noreferrer"&gt;http://localhost:8080&lt;/a&gt;.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;You can set up tasks and workflows using this user interface: visit &lt;a href="https://docs.skyvern.com/introduction" rel="noopener noreferrer"&gt;Skyvern Docs&lt;/a&gt; for more.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5lqnagwja43ivzwak9b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5lqnagwja43ivzwak9b.png" alt="Image description" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Key Features
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Adaptability&lt;/strong&gt;: Skyvern can operate on websites it has never encountered before, thanks to its ability to map visual elements to actions necessary for completing workflows, without relying on custom code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resilience&lt;/strong&gt;: Unlike traditional automation tools that depend on fixed XPath selectors, Skyvern can adapt to website layout changes, ensuring it remains functional even as websites evolve.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Skyvern is capable of applying a single workflow across a large number of websites, reasoning through interactions and automating complex tasks reliably.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Automating Web Tasks with AI
&lt;/h2&gt;

&lt;p&gt;Automation is no longer limited to simple, rule-based processes; with AI-driven agents, we can automate nuanced, multi-step operations that require adaptability and intelligence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Use Cases
&lt;/h3&gt;

&lt;p&gt;We’re seeing AI web agents transform various workflows. Here are some common use cases where they’re making a real difference:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Extraction and Web Scraping&lt;/strong&gt;: Collecting and structuring information from online sources, saving us the time and effort of manual data gathering.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automating Repetitive Tasks&lt;/strong&gt;: From logging data entries to filling forms, AI agents handle repetitive actions with precision, freeing us to focus on higher-level tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow Automation&lt;/strong&gt;: Our agents can coordinate multiple steps across platforms, streamlining workflows and reducing the need for human intervention.&lt;/li&gt;
&lt;/ul&gt;
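&lt;p&gt;The structuring half of data extraction is often the mundane work an agent automates. Here is a small sketch, with hypothetical scraped rows, of turning loosely formatted text into clean JSON records:&lt;/p&gt;

```python
# Sketch: turning loosely structured scraped rows into clean records,
# the "structuring" half of data extraction (the rows are hypothetical).
import json

raw_rows = [
    "ACME Widget | $19.99 | in stock",
    "Gizmo Pro | $42.00 | out of stock",
]

def parse_row(row: str) -> dict:
    name, price, stock = [part.strip() for part in row.split("|")]
    return {
        "name": name,
        "price": float(price.lstrip("$")),
        "in_stock": stock == "in stock",
    }

records = [parse_row(r) for r in raw_rows]
print(json.dumps(records, indent=2))
```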

&lt;h3&gt;
  
  
  Benefits of Automation
&lt;/h3&gt;

&lt;p&gt;By adopting AI automation, we’re not just saving time; we’re enhancing our work in meaningful ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time Savings and Efficiency&lt;/strong&gt;: AI agents allow us to focus on critical, creative aspects of our work, increasing our productivity and freeing up time for innovation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduction in Human Error&lt;/strong&gt;: With AI managing repetitive tasks, accuracy improves, errors decrease, and we benefit from more consistent, reliable results.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhv0j3ydblgo3t56qnuz3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhv0j3ydblgo3t56qnuz3.png" alt="Image description" width="800" height="574"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;How AI Agents can increase efficiency&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Demonstration
&lt;/h3&gt;

&lt;p&gt;In this demo, LaVague's Web Agent is used to automate the task of navigating from the &lt;a href="https://finance.yahoo.com/" rel="noopener noreferrer"&gt;Yahoo Finance&lt;/a&gt; homepage to the World Indices page. The process is broken down into a few simple steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Install the Required Libraries:&lt;/strong&gt; To get started, you'll first need to install LaVague and its dependencies:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  pip &lt;span class="nb"&gt;install &lt;/span&gt;lavague llama_index
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Set Up the Web Agent:&lt;/strong&gt; The agent uses the Selenium WebDriver to interact with the Yahoo Finance website. Here's the Python code that sets up the agent and directs it to the Yahoo Finance homepage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lavague.drivers.selenium&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SeleniumDriver&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lavague.core&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ActionEngine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WorldModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lavague.core.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WebAgent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lavague.core.navigation&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NavigationEngine&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lavague.core.retrievers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpsmSplitRetriever&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lavague.contexts.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenaiContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.llms.groq&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Groq&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;your_api_key&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;selenium_driver&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SeleniumDriver&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;action_engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ActionEngine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;selenium_driver&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;world_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WorldModel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WebAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;world_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;action_engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://finance.yahoo.com/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;instruction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Objective: Go to the World Indices Page
1. Click on &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Markets&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
2. Click on the &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;World Indices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; link in the &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Markets&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; dropdown menu
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objective&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;display&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the agent successfully completes the task, it automatically takes a screenshot of the current web page. LaVague stores these screenshots, which can later be fed to a Visual Language Model (VLM) to extract important information and build a workflow for further interactions.&lt;/p&gt;


&lt;h2&gt;
  
  
  Integrating AI Web Agents into Workflows
&lt;/h2&gt;

&lt;p&gt;Integrating AI web agents into workflows has become a transformative approach for businesses looking to enhance efficiency, automate repetitive tasks, and improve overall productivity. Here’s how organizations can practically implement AI agents in their operational frameworks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Define Use Cases&lt;/strong&gt;: Identify specific tasks or processes that can benefit from automation. Common use cases include customer service, order management, HR processes, and project management. For example, AI agents can automate customer inquiries, manage recruitment workflows, or optimize project task allocations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Select the Right Platform&lt;/strong&gt;: Choose an AI platform that supports the creation and management of AI agents. Platforms like Automation Anywhere's AI Agent Studio allow businesses to build custom agents tailored to their unique needs. These platforms often provide tools for integrating generative AI into existing workflows seamlessly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Design Agentic Workflows&lt;/strong&gt;: Implement agentic workflows that enable AI agents to operate independently while pursuing specific goals. Unlike traditional systems that react to commands, these workflows allow agents to analyze their environment and make proactive decisions based on real-time data.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;These tools are not just simplifying automation—they are evolving it, bringing unprecedented adaptability, intelligence, and efficiency to workflows across industries. With the transformative capabilities of LAMs, we’re seeing a clear shift toward AI agents that understand and actively respond to the world around them.&lt;/p&gt;

&lt;p&gt;In this article, we’ve explored the technologies, frameworks, and key features that make AI web agents a game-changer. From enhanced data extraction to seamless workflow automation, these agents provide us with new possibilities for maximizing efficiency and minimizing errors. LAMs, especially, represent a leap forward, empowering agents to perform a broader range of tasks with little or no additional training. As LAMs continue to evolve, they’re opening doors to more complex actions, bringing us closer to the vision of artificial general intelligence.&lt;/p&gt;

&lt;p&gt;As we move forward, we’re excited to continue integrating these innovations into our processes, harnessing the full potential of AI agents to create smarter, more autonomous workflows.&lt;/p&gt;

&lt;p&gt;Let’s step confidently into this future together, knowing that we’re building a more productive, efficient, and innovative digital landscape.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>automation</category>
      <category>webscraping</category>
    </item>
    <item>
      <title>ClickHouse Vs DuckDB</title>
      <dc:creator>Arin Zingade</dc:creator>
      <pubDate>Tue, 03 Dec 2024 06:13:24 +0000</pubDate>
      <link>https://dev.to/arinzingade/clickhouse-vs-duckdb-4o1l</link>
      <guid>https://dev.to/arinzingade/clickhouse-vs-duckdb-4o1l</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The rise of OLAP databases has been hard to ignore, with the relentless growth of data and demand for powerful, fast analytics tools. OLTP databases have been invaluable for transaction-heavy applications but often fall short when faced with the sheer complexity of modern analytical workloads. Enter OLAP databases, designed to make slicing through massive datasets feel nearly effortless.&lt;/p&gt;

&lt;p&gt;In exploring OLAP solutions, we found two that stood out: ClickHouse and DuckDB. While both are OLAP-focused, they’re fundamentally different tools, each with unique strengths. ClickHouse is a powerhouse designed for multi-node, distributed systems that scale up to petabytes of data. DuckDB, on the other hand, is more like the SQLite of OLAP—a nimble, desktop-friendly database that brings OLAP capabilities to local environments without the need for elaborate setup. Despite their differences, these databases share a versatility that makes them adaptable to a range of tasks: querying data in object storage, handling cross-database queries, and even parsing compressed files or semi-structured data. &lt;/p&gt;

&lt;p&gt;This article will mainly focus on the capabilities of DuckDB, while touching on ClickHouse and the key differences that make both projects great in their own niche. I have covered ClickHouse in much detail &lt;a href="https://www.cloudraft.io/blog/clickhouse-key-to-faster-insights#how-does-clickhouse-work" rel="noopener noreferrer"&gt;here&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4dysnanpdarykqw6734d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4dysnanpdarykqw6734d.png" alt="Image description" width="800" height="226"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  DuckDB: Speedy Analytics, Zero Setup
&lt;/h3&gt;

&lt;p&gt;DuckDB is a welcome solution for data analysts and scientists seeking efficient, local OLAP processing without the usual infrastructure demands. DuckDB has carved out a unique niche as a lightweight relational database, built to perform analytical tasks with impressive speed while remaining incredibly easy to use.&lt;/p&gt;

&lt;p&gt;For those of us accustomed to the high costs of platforms like Redshift, Databricks, Snowflake, or BigQuery, DuckDB offers a refreshing alternative. You can simply upload files to cloud storage and let teams run analytics using the compute on their existing laptops, bypassing the need for expensive and complex infrastructure for smaller tasks and analyses.&lt;/p&gt;

&lt;p&gt;We use DuckDB as a go-to solution for tasks that exceed the capacity of tools like Pandas or Polars. Its ability to load large CSVs directly into dataframes with speed and efficiency has streamlined our workflows. It also performs well for ETL tasks in Kubernetes environments, showcasing its adaptability and reliability across a wide range of data processing needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fafd4o26osfzhemsoxsyq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fafd4o26osfzhemsoxsyq.png" alt="Image description" width="800" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Versatility of DuckDB&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  ClickHouse: Analytics for Scale
&lt;/h2&gt;

&lt;p&gt;Then there's ClickHouse, a database built for scale, known for its incredible speed and efficiency when working with vast amounts of data. ClickHouse’s column-oriented architecture, coupled with unique table engines, enables it to process millions of rows per second. Companies like Cloudflare use it to cut down memory usage by over four times, highlighting its role in real-world, large-scale applications. In environments where data volumes are measured in terabytes or petabytes, ClickHouse really shines.&lt;/p&gt;

&lt;p&gt;One of the standout aspects of ClickHouse is its ability to leverage the full power of the underlying hardware, optimizing memory and CPU usage to handle massive, complex queries with ease. The distributed nature of ClickHouse allows it to scale horizontally across nodes, making it resilient and highly available for mission-critical applications. Its real-time query capabilities allow companies to power dashboards and interactive reports with minimal latency, providing instant insights for fast-paced decision-making. For anyone managing large-scale analytics, it delivers a robust set of features for everything from web analytics to detailed log analysis.&lt;/p&gt;




&lt;h2&gt;
  
  
  Similarities and Differences between ClickHouse and DuckDB
&lt;/h2&gt;

&lt;p&gt;While both ClickHouse and DuckDB excel in fast, efficient querying and share similar columnar architectures, their design philosophies and deployment models cater to unique needs. Together, these tools showcase the spectrum of options available for handling analytical workloads, from enterprise-scale, distributed systems to flexible, embedded analytics.&lt;/p&gt;

&lt;h4&gt;
  
  
  ClickHouse: Enterprise-Grade, Big Data Workloads
&lt;/h4&gt;

&lt;p&gt;ClickHouse is designed for large-scale analytics, widely adopted by enterprises handling vast datasets for real-time analytics, monitoring, and business intelligence. Built to handle multi-node deployments, ClickHouse scales effectively with its Massively Parallel Processing (MPP) architecture, making it a strong choice for multi-terabyte, distributed, cloud-first deployments.&lt;/p&gt;

&lt;h4&gt;
  
  
  DuckDB: Small-to-Medium Data and Data Science Workflows
&lt;/h4&gt;

&lt;p&gt;DuckDB is ideal for small-to-medium datasets and data science tasks. It’s lightweight, embedded, and designed to run directly on local machines with minimal configuration. This makes it perfect for data exploration and prototyping on tens-of-gigabyte datasets without needing a complex database setup.&lt;/p&gt;

&lt;h4&gt;
  
  
  Installation and Embedding: ClickHouse’s chDB and DuckDB’s In-Process Model
&lt;/h4&gt;

&lt;p&gt;DuckDB is completely embedded—no server setup is needed, making it easily deployable within the same process as the host application. ClickHouse offers similar ease with &lt;strong&gt;chDB&lt;/strong&gt;, a library that allows ClickHouse SQL queries to be run directly in Python environments, providing a streamlined setup for local analytics.&lt;/p&gt;

&lt;h4&gt;
  
  
  In-Memory and Serialization Capabilities
&lt;/h4&gt;

&lt;p&gt;Both databases support in-memory processing, though they differ in approach. DuckDB can operate in-memory by default for fast, temporary analyses, while ClickHouse also provides in-memory storage options through specific storage engines. For data serialization to flat files, ClickHouse is generally faster, thanks to its optimized storage architecture.&lt;/p&gt;

&lt;h4&gt;
  
  
  Performance on Complex Computations and DataFrame Integration
&lt;/h4&gt;

&lt;p&gt;DuckDB excels at handling complex relational data operations, often providing faster local analysis for structured datasets. Its seamless querying of &lt;strong&gt;Pandas&lt;/strong&gt;, &lt;strong&gt;Polars&lt;/strong&gt;, and &lt;strong&gt;Arrow&lt;/strong&gt; DataFrames within Python is a key feature that makes it highly useful for data scientists, serving as a powerful in-process SQL engine.&lt;/p&gt;

&lt;h4&gt;
  
  
  Distributed Scaling vs. Local, Serverless Execution
&lt;/h4&gt;

&lt;p&gt;ClickHouse’s MPP architecture allows for horizontal scaling across nodes, ideal for enterprise-grade, cloud-based analytics workloads. DuckDB, meanwhile, thrives in serverless, single-machine setups and is perfect for tasks like ETL, semi-structured data queries, and quick analyses on local storage.&lt;/p&gt;

&lt;p&gt;In summary:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Feature&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;ClickHouse&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;DuckDB&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database Type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Column-oriented, OLAP&lt;/td&gt;
&lt;td&gt;Column-oriented, OLAP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary Use Case&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enterprise-grade, big data workloads&lt;/td&gt;
&lt;td&gt;Small-to-medium data volumes; data science workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Adoption&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Widely used in large enterprises for high-speed analytics&lt;/td&gt;
&lt;td&gt;Popular among data analysts and scientists&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deployment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multi-node, distributed (supports MPP architecture)&lt;/td&gt;
&lt;td&gt;Single-machine, embedded (runs in-process)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Installation Requirement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Server software typically required (chDB for local SQL)&lt;/td&gt;
&lt;td&gt;No server installation needed; embedded within host&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Volume Scale&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Suited for multi-terabyte, large datasets&lt;/td&gt;
&lt;td&gt;Ideal for tens of GB-level datasets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;In-Memory Processing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supported through specific storage engines&lt;/td&gt;
&lt;td&gt;Supported with “:memory:” mode (default)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Serialization Performance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Faster for serializing flat file data&lt;/td&gt;
&lt;td&gt;Slower than ClickHouse for serialization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Complex Query Performance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strong for large-scale aggregation and distributed tasks&lt;/td&gt;
&lt;td&gt;Excels with complex computations on relational schema&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DataFrame Integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Available via chDB in Python&lt;/td&gt;
&lt;td&gt;Directly supports &lt;strong&gt;Pandas&lt;/strong&gt;, &lt;strong&gt;Polars&lt;/strong&gt;, and &lt;strong&gt;Arrow&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scalability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Highly scalable across multiple nodes&lt;/td&gt;
&lt;td&gt;Limited to single-machine; serverless, embedded use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best For&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Big data analytics, enterprise BI, web analytics&lt;/td&gt;
&lt;td&gt;Local data exploration, data prototyping, ETL tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Performance Comparison: ClickHouse vs. DuckDB
&lt;/h2&gt;

&lt;p&gt;When it comes to high-performance analytical databases, both ClickHouse and DuckDB have unique strengths and limitations. &lt;/p&gt;

&lt;h4&gt;
  
  
  General Performance Overview
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ClickHouse&lt;/strong&gt; generally outperforms &lt;strong&gt;DuckDB&lt;/strong&gt; for larger data volumes and relatively straightforward queries. This strength can be attributed to ClickHouse's columnar storage, distributed nature, and optimizations for large-scale data processing, which allow it to efficiently manage and retrieve massive datasets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DuckDB&lt;/strong&gt;, however, is highly optimized for &lt;strong&gt;in-memory data processing&lt;/strong&gt; and excels at handling &lt;strong&gt;complex analytical queries&lt;/strong&gt; on single-node setups. DuckDB's ability to work seamlessly in-memory allows it to execute queries quickly for moderate data sizes without needing the distributed setup that ClickHouse typically requires for peak performance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Factors Affecting Performance&lt;/strong&gt;: The structure and complexity of the data (like normalized vs. denormalized tables) and query complexity also impact performance for both databases. For instance, ClickHouse performs best with denormalized data, while DuckDB handles normalized data more effectively, especially in analytical tasks.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  DuckDB's Strengths and Limitations
&lt;/h4&gt;

&lt;p&gt;DuckDB’s limitations resemble those of other single-node query engines like &lt;strong&gt;Polars, DataFusion, and Pandas&lt;/strong&gt;. Although it’s often compared to Spark, it’s a bit of an “apples to oranges” comparison due to Spark’s multi-node, distributed setup. DuckDB, Polars, and similar engines are more suitable for fast, single-node analytics and don’t scale out to multiple nodes like Spark or ClickHouse.&lt;/p&gt;

&lt;p&gt;For example, in a recent benchmark on a large dataset, DuckDB excelled at in-memory querying. This left us impressed and somewhat surprised, given the common praise for ClickHouse’s speed and efficiency on large datasets. &lt;/p&gt;

&lt;h4&gt;
  
  
  Observations on ClickHouse’s Speed with Memory-Table Engine
&lt;/h4&gt;

&lt;p&gt;With its Memory table engine, ClickHouse avoids disk I/O, decompression, and deserialization altogether. This kind of performance is advantageous for high-speed requirements and straightforward query patterns.&lt;/p&gt;

&lt;p&gt;However, the &lt;strong&gt;complexity of queries&lt;/strong&gt;, especially analytical ones like the TPC-DS benchmarks, can challenge ClickHouse’s performance, as it relies heavily on denormalized data for speed. Our test seemed to amplify the impact of query complexity on ClickHouse’s performance. If anything, this reinforces the need to tailor the setup and data structure to the use case, particularly when running ClickHouse in scenarios it’s not fully optimized for.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Advantages of DuckDB
&lt;/h2&gt;

&lt;p&gt;As discussed earlier, DuckDB stands out for several key advantages, making it a valuable tool for data analysis.&lt;/p&gt;

&lt;h4&gt;
  
  
  Dependency-Free, Single Binary Deployment
&lt;/h4&gt;

&lt;p&gt;One of the biggest selling points of DuckDB is its minimalist approach to installation. Unlike other databases that require complex setups, DuckDB can be deployed as a single binary with no external dependencies. This makes it incredibly easy to get started with, as there’s no need to configure or maintain a separate database server. You can run it directly within your local environment or even integrate it into your existing workflows with minimal effort.&lt;/p&gt;

&lt;h4&gt;
  
  
  Querying Data Directly
&lt;/h4&gt;

&lt;p&gt;What sets DuckDB apart is its ability to query Pandas, Polars, and Arrow DataFrames directly using SQL. This means you can interact with your data in a familiar Python-based environment and use the full power of SQL for your analysis, without converting between formats or loading the entire dataset into memory. It is like running SQL queries directly on your existing data structures, which streamlines your workflows.&lt;/p&gt;

&lt;h4&gt;
  
  
  Filling the Gap Between Traditional Databases and Data Science Workflows
&lt;/h4&gt;

&lt;p&gt;DuckDB bridges the gap between traditional database management systems and the fast-paced, iterative work often done in data science. For many data scientists, working with large datasets typically means turning to complex and heavyweight systems like PostgreSQL or even Spark. DuckDB, however, offers a simpler, more lightweight alternative while still providing the power of SQL-based analytics. It enables analysts to perform complex queries directly on datasets, whether they're stored locally or in the cloud, without the overhead of setting up a full-fledged database system.&lt;/p&gt;

&lt;h4&gt;
  
  
  Cost-Effective Analytics: Using DuckDB with Parquet Files on GCS
&lt;/h4&gt;

&lt;p&gt;DuckDB’s cost-effectiveness is becoming a major selling point for companies looking to reduce their cloud analytics costs. For instance, many teams are turning to DuckDB to perform analytics on &lt;strong&gt;Parquet&lt;/strong&gt; files stored in &lt;strong&gt;Google Cloud Storage (GCS)&lt;/strong&gt;, rather than using more expensive solutions like &lt;strong&gt;BigQuery&lt;/strong&gt;. BigQuery’s costs can add up quickly with frequent analytical queries, whereas DuckDB enables a "bring-your-own-compute" model, allowing users to leverage their local machines to process cloud data without incurring heavy charges. This makes DuckDB an attractive alternative for data teams looking to cut down on operational costs while still performing powerful analytics.&lt;/p&gt;

&lt;h4&gt;
  
  
  Efficient Data Handling Without Full Data Loading
&lt;/h4&gt;

&lt;p&gt;One of the key benefits of DuckDB over tools like &lt;strong&gt;SQLite&lt;/strong&gt; or &lt;strong&gt;Pandas&lt;/strong&gt; is its ability to process data without loading the entire file into memory. While &lt;strong&gt;Pandas&lt;/strong&gt; requires the full dataset to be loaded before any analysis can be done, DuckDB allows you to copy compressed data directly into memory, bypassing the need to load everything at once. This not only saves memory but also makes DuckDB more efficient when dealing with large files or datasets.&lt;/p&gt;

&lt;h4&gt;
  
  
  Enhancing Data Science Workflows: DuckDB and Polars
&lt;/h4&gt;

&lt;p&gt;While &lt;strong&gt;Polars&lt;/strong&gt; is known for its performance in data manipulation, DuckDB offers a unique advantage by being a full-fledged database. DuckDB can read data from a &lt;strong&gt;Polars&lt;/strong&gt; DataFrame without any manual conversion, allowing you to work with data in both systems seamlessly. You can process data in Polars, then pass it to DuckDB for further SQL-based operations, and even save the results directly to the DuckDB database—all without the need for manual copying or reformatting. This smooth integration significantly enhances productivity and streamlines workflows for data scientists.&lt;/p&gt;

&lt;h4&gt;
  
  
  SQL Support with Advanced Features
&lt;/h4&gt;

&lt;p&gt;Another key advantage of DuckDB is its SQL dialect, which we find to be incredibly powerful. It supports advanced features like macros, which allow for more flexible and reusable queries. This is especially useful for data scientists who need to run complex queries and streamline their analysis. DuckDB also has a functional interface, which means you can work with data in a way similar to &lt;strong&gt;Spark&lt;/strong&gt; or &lt;strong&gt;Pandas&lt;/strong&gt;, but with the power of SQL under the hood. This hybrid approach allows you to transform and manipulate data efficiently, combining the best aspects of both worlds.&lt;/p&gt;

&lt;p&gt;The appeal of DuckDB becomes even clearer when considering the limitations that existed before it. Previously, working with smaller datasets locally was manageable with formats like CSV or Parquet, but as data size increased, the process grew challenging. Setting up traditional databases like MySQL or PostgreSQL for these mid-sized tasks was cumbersome, and distributed systems like Spark felt excessive for datasets that didn’t require that scale. DuckDB fills this gap, allowing small-to-medium datasets to be processed locally, without the need for complex database setups.&lt;/p&gt;

&lt;p&gt;In modern data analysis, data must often be combined from a wide variety of different sources. Data might sit in CSV files on your machine, in Parquet files in a data lake, or in an operational database. DuckDB has strong support for moving data between many different data sources.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4sn2mah9499nlxf6mlj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4sn2mah9499nlxf6mlj.png" alt="Image description" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Source: &lt;a href="https://practicaldataengineering.substack.com/p/duckdb-beyond-the-hype" rel="noopener noreferrer"&gt;DuckDB beyond the hype&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Use Cases and Applications
&lt;/h2&gt;

&lt;p&gt;Both &lt;strong&gt;ClickHouse&lt;/strong&gt; and &lt;strong&gt;DuckDB&lt;/strong&gt; serve unique purposes in data processing, offering complementary strengths for different tasks.&lt;/p&gt;

&lt;h4&gt;
  
  
  ClickHouse for Large-Scale, Distributed OLAP Workloads
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;ClickHouse&lt;/strong&gt; excels in handling large-scale, &lt;strong&gt;distributed OLAP&lt;/strong&gt; workloads. Its &lt;strong&gt;MPP architecture&lt;/strong&gt; scales horizontally, making it perfect for real-time analytics over multi-terabyte datasets. It's used in industries like telecom, finance, and e-commerce where fast query performance on large datasets is crucial. Companies like &lt;strong&gt;Yandex&lt;/strong&gt; and &lt;strong&gt;Uber&lt;/strong&gt; leverage ClickHouse for real-time analytics, making it a top choice for enterprise-scale applications.&lt;/p&gt;

&lt;h4&gt;
  
  
  DuckDB for Serverless Pipelines and Local Data Processing
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;DuckDB&lt;/strong&gt; is ideal for &lt;strong&gt;serverless pipelines&lt;/strong&gt; and local data processing, excelling with &lt;strong&gt;small-to-medium datasets&lt;/strong&gt;. It's great for temporary staging in &lt;strong&gt;ELT jobs&lt;/strong&gt; and data transformations, especially when dealing with &lt;strong&gt;Parquet&lt;/strong&gt; and other semi-structured formats.&lt;/p&gt;

&lt;p&gt;In embedded systems or sensor data applications, DuckDB’s &lt;strong&gt;columnar storage&lt;/strong&gt; and compression make it highly efficient, processing data in tight memory constraints.&lt;/p&gt;

&lt;h4&gt;
  
  
  Complementary Roles in the Data Ecosystem
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;ClickHouse&lt;/strong&gt; is for large, distributed workloads, while &lt;strong&gt;DuckDB&lt;/strong&gt; handles smaller, local processing tasks. They complement each other, with &lt;strong&gt;ClickHouse&lt;/strong&gt; powering big data and cloud-based analytics, and &lt;strong&gt;DuckDB&lt;/strong&gt; simplifying local, serverless data tasks. Together, they provide a flexible, efficient data pipeline for different analytics needs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In the evolving landscape of data analytics, both ClickHouse and DuckDB have carved out distinct yet complementary niches. ClickHouse has established itself as a powerhouse for large-scale, distributed OLAP workloads, making it the go-to choice for enterprise-grade deployments handling petabytes of data. DuckDB, meanwhile, has revolutionized local data analysis by offering a lightweight, embedded solution that seamlessly integrates with modern data science tools like Pandas, Polars, and Apache Arrow. &lt;/p&gt;

&lt;p&gt;While ClickHouse excels at handling massive distributed datasets with impressive performance, DuckDB shines in scenarios requiring quick, in-process analytics and complex queries on smaller datasets. The choice between these tools ultimately depends on specific use cases: ClickHouse for organizations requiring robust, distributed analytics at scale, and DuckDB for data scientists and analysts who need efficient, local data processing without the overhead of traditional database systems. &lt;/p&gt;

&lt;p&gt;As organizations continue to grapple with diverse data processing needs, having both tools in the modern data stack enables teams to choose the right tool for their specific analytical requirements, whether it's processing petabytes in the cloud or analyzing gigabytes on a local machine.&lt;/p&gt;

&lt;p&gt;At the end of the day, it’s not about which one is better—it’s about choosing the right tool for the job. &lt;strong&gt;ClickHouse&lt;/strong&gt; powers through big data at scale, while &lt;strong&gt;DuckDB&lt;/strong&gt; gives data scientists the flexibility to run powerful queries on their own machines. &lt;strong&gt;Together, they form the perfect duo&lt;/strong&gt;—the heavyweight and the lightweight—both designed to make data processing faster and more efficient. &lt;/p&gt;




</description>
      <category>database</category>
      <category>analytics</category>
      <category>opensource</category>
      <category>learning</category>
    </item>
    <item>
      <title>ClickHouse: The Key to Faster Insights</title>
      <dc:creator>Arin Zingade</dc:creator>
      <pubDate>Tue, 03 Dec 2024 06:08:32 +0000</pubDate>
      <link>https://dev.to/arinzingade/clickhouse-the-key-to-faster-insights-32me</link>
      <guid>https://dev.to/arinzingade/clickhouse-the-key-to-faster-insights-32me</guid>
      <description>&lt;p&gt;&lt;a href="https://clickhouse.com/" rel="noopener noreferrer"&gt;ClickHouse&lt;/a&gt; is rapidly gaining traction for its unmatched speed and efficiency in processing big data. Cloudflare, for example, uses ClickHouse to process millions of rows per second and reduce memory usage by over four times, making it a key player in large-scale analytics. With its advanced features and real-time query performance, ClickHouse is becoming a go-to choice for companies handling massive datasets.&lt;br&gt;
In this article, we'll explore why ClickHouse is increasingly favored for analytics, its key features, and how to deploy it on Kubernetes. We'll also cover some best practices for scaling ClickHouse to handle growing workloads and maximize performance.&lt;/p&gt;
&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;ClickHouse is a high-performance, column-oriented SQL database management system (DBMS) designed for online analytical processing (OLAP), excelling in handling large datasets with remarkable speed, particularly for filtering and aggregating data. By utilizing columnar storage, it enables rapid data access and efficient compression, making it ideal for industries that demand fast data retrieval and analysis. Its common use cases include web analytics, where it processes vast amounts of tracking data, business intelligence to power high-speed decision-making, and log analysis for large-scale monitoring and troubleshooting.&lt;/p&gt;
&lt;h3&gt;
  
  
  Key Features of ClickHouse
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Columnar Storage:&lt;/strong&gt; Enables fast data access and efficient compression, enhancing the speed of analytical queries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High Performance and Scalability:&lt;/strong&gt; Optimized for handling massive datasets and complex queries with unique table engines that determine how data is stored.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-Time Analytics:&lt;/strong&gt; Supports real-time data processing and analytics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maximizing Hardware Usage:&lt;/strong&gt; ClickHouse is designed to utilize all available resources of the system effectively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rich Functionality:&lt;/strong&gt; Offers a wide array of built-in functions that enhance data manipulation and analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How Does ClickHouse Work?
&lt;/h3&gt;

&lt;p&gt;ClickHouse is designed for speed and scalability, making it ideal for handling vast amounts of data. Its distributed nature allows for data replication across multiple nodes, ensuring both fault tolerance and high availability.&lt;/p&gt;
&lt;h4&gt;
  
  
  Architecture
&lt;/h4&gt;

&lt;p&gt;ClickHouse operates on a distributed architecture where data is partitioned and replicated across nodes. It employs a &lt;strong&gt;Shared Nothing Architecture&lt;/strong&gt;, moving towards a decoupled compute and storage model, facilitating parallel and vectorized execution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29beiw388dct3liq1wc9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29beiw388dct3liq1wc9.png" alt="Image description" width="800" height="387"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;An Example of Shared Nothing ClickHouse Cluster with 3 replica servers&lt;/em&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Storage Mechanism
&lt;/h4&gt;

&lt;p&gt;ClickHouse uses columnar storage which allows it to read and compress large amounts of data quickly. Organizations migrating from row-based systems like Postgres can benefit significantly in terms of performance.&lt;br&gt;
Tables utilize unique &lt;strong&gt;Table Engines&lt;/strong&gt;, notably the &lt;strong&gt;MergeTree&lt;/strong&gt; engine family, to store data effectively, leveraging ClickHouse’s strengths in analytical processing.&lt;/p&gt;
&lt;h4&gt;
  
  
  Query Execution
&lt;/h4&gt;

&lt;p&gt;ClickHouse utilizes a unique query engine optimized for high-speed data retrieval, leveraging Single Instruction, Multiple Data (SIMD) instructions to process multiple data points simultaneously. This parallel processing significantly enhances performance, especially for complex queries. As demonstrated in the video &lt;a href="https://www.youtube.com/watch?v=XpkFEj1rVXg&amp;amp;t=966s" rel="noopener noreferrer"&gt;A Day in the Life of a Query&lt;/a&gt;, ClickHouse efficiently breaks down and executes queries, focusing on answering specific questions rather than merely retrieving raw data.&lt;br&gt;
To further understand query execution, we can use the &lt;code&gt;EXPLAIN&lt;/code&gt; clause. The &lt;code&gt;EXPLAIN&lt;/code&gt; clause in SQL is used to display the execution plan of a query. When you run a query with &lt;code&gt;EXPLAIN&lt;/code&gt;, the database doesn't actually execute the query. Instead, it shows a detailed breakdown of how the query would be executed, including the steps the query optimizer will take.&lt;/p&gt;

&lt;p&gt;For ClickHouse, the query execution steps look like this: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvze4tzf7xmwxh0wmv0m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvze4tzf7xmwxh0wmv0m.png" alt="Image description" width="800" height="362"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Source: &lt;a href="https://www.youtube.com/watch?v=hP6G2Nlz_cA&amp;amp;t=366s" rel="noopener noreferrer"&gt;Performance introspection EXPLAIN clause&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EXPLAIN PLAN:&lt;/strong&gt; The query plan shows, in a generic way, the stages that need to be executed for the query. It does not show how ClickHouse executes the query using the available resources on the machine, but it is handy for checking the order in which the clauses are executed. Read the plan from bottom to top.&lt;/p&gt;

&lt;p&gt;For demonstration purposes, we will be using the &lt;a href="https://clickhouse.com/docs/en/getting-started/example-datasets/uk-price-paid" rel="noopener noreferrer"&gt;UK Property Prices dataset&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="n"&gt;PLAN&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;postcode1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;property_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_price&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;uk_price_paid&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;is_new&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2023-01-01'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;postcode1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;avg_price&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the above query, we get the following output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Expression &lt;span class="o"&gt;(&lt;/span&gt;Project names&lt;span class="o"&gt;)&lt;/span&gt;
Limit &lt;span class="o"&gt;(&lt;/span&gt;preliminary LIMIT &lt;span class="o"&gt;(&lt;/span&gt;without OFFSET&lt;span class="o"&gt;))&lt;/span&gt;
Sorting &lt;span class="o"&gt;(&lt;/span&gt;Sorting &lt;span class="k"&gt;for &lt;/span&gt;ORDER BY&lt;span class="o"&gt;)&lt;/span&gt;
Expression &lt;span class="o"&gt;((&lt;/span&gt;Before ORDER BY + Projection&lt;span class="o"&gt;))&lt;/span&gt;
Aggregating
Expression &lt;span class="o"&gt;(&lt;/span&gt;Before GROUP BY&lt;span class="o"&gt;)&lt;/span&gt;
Expression
ReadFromMergeTree &lt;span class="o"&gt;(&lt;/span&gt;default.uk_price_paid&lt;span class="o"&gt;)&lt;/span&gt;

Indexes:
    PrimaryKey
    Condition: &lt;span class="nb"&gt;true
    &lt;/span&gt;Parts: 1/1
    Granules: 3598/3598
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In analyzing the query execution plan, it's essential to interpret the steps from the bottom up (in this case from ReadFromMergeTree to Limit), as each layer represents a sequential operation performed on the data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EXPLAIN AST:&lt;/strong&gt; With this clause, we can explore the Abstract Syntax Tree; we can also visualize it via &lt;a href="https://graphviz.org/" rel="noopener noreferrer"&gt;Graphviz&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For the query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="n"&gt;AST&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;postcode1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;property_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_price&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;uk_price_paid&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;is_new&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2023-01-01'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;postcode1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;avg_price&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;we get the Abstract Syntax Tree:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp5djry4119qltjac83jb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp5djry4119qltjac83jb.png" alt="Image description" width="800" height="182"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EXPLAIN PIPELINE&lt;/strong&gt;: Introspecting the query pipeline can help you identify where the bottlenecks of the query are.&lt;/p&gt;

&lt;p&gt;For the query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="n"&gt;PIPELINE&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;postcode1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;property_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_price&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;uk_price_paid&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;is_new&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2023-01-01'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;postcode1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;avg_price&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;we get the following output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq1kgs6z0u0k04ne0sqa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq1kgs6z0u0k04ne0sqa.png" alt="Image description" width="800" height="298"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ClickHouse naturally parallelizes queries, with each step utilizing multiple threads by default. In this example, the stages are handled by 4 threads, meaning each thread processes roughly one-fourth of the data in parallel before combining the results. This approach speeds up execution significantly.&lt;br&gt;
Identifying stages that run in a &lt;strong&gt;single thread&lt;/strong&gt; is key to optimizing slow queries. By isolating these bottlenecks, we can target specific parts of the query for performance improvements, ensuring faster and more efficient execution overall.&lt;/p&gt;
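&lt;p&gt;The shape of this parallelism can be sketched in a few lines: split the data into chunks, aggregate each chunk on its own thread, then merge the partial results. This is only a conceptual sketch; ClickHouse's vectorized pipeline is far more sophisticated.&lt;/p&gt;

```python
# Minimal sketch of parallel partial aggregation: each of 4 threads
# aggregates roughly one-fourth of the data, then the partial results
# are merged, mirroring the structure of a parallel query pipeline.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    return sum(chunk)

data = list(range(1, 1001))  # 1..1000
n_threads = 4
chunk_size = len(data) // n_threads
chunks = [data[i * chunk_size:(i + 1) * chunk_size] for i in range(n_threads)]

with ThreadPoolExecutor(max_workers=n_threads) as pool:
    partials = list(pool.map(partial_sum, chunks))

total = sum(partials)  # merge step
assert total == sum(data) == 500500
print(total)
```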
&lt;h4&gt;
  
  
  Integration Capabilities
&lt;/h4&gt;

&lt;p&gt;ClickHouse is highly compatible with a wide range of data tools, including ETL/ELT processes and BI tools like &lt;a href="https://superset.apache.org/" rel="noopener noreferrer"&gt;Apache Superset&lt;/a&gt;. It supports virtually all common data formats, making integration seamless across diverse ecosystems.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why Choose ClickHouse and Migrate?
&lt;/h3&gt;

&lt;p&gt;Choosing ClickHouse offers significant advantages, particularly for organizations dealing with large-scale data analytics. Its unique combination of performance, cost-effectiveness, and community support makes it a compelling choice for migrating from traditional databases.&lt;/p&gt;
&lt;h4&gt;
  
  
  Performance Advantages
&lt;/h4&gt;

&lt;p&gt;ClickHouse is optimized for &lt;strong&gt;OLAP&lt;/strong&gt; workloads, delivering exceptional speed in both data ingestion and query execution, offering sub-second query performance even when processing billions of rows. This makes it ideal for real-time analytics and decision-making in data-intensive industries. &lt;br&gt;
The &lt;strong&gt;primary key&lt;/strong&gt; in ClickHouse plays a crucial role in determining how data is stored and searched. It's important to select columns that are frequently queried, as the primary key should optimize query execution, especially for the &lt;code&gt;WHERE&lt;/code&gt; clause. &lt;strong&gt;In ClickHouse, the primary key is not unique to each row.&lt;/strong&gt;&lt;/p&gt;
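&lt;p&gt;The reason the primary key matters so much is ClickHouse's sparse index: it stores one index entry per granule of rows (8192 rows by default), so a range condition on the primary key lets it skip entire granules. A toy sketch of that skipping logic, with a deliberately tiny granule size:&lt;/p&gt;

```python
# Sketch of sparse-index granule skipping. ClickHouse keeps one index
# entry (mark) per granule; a range condition on the primary key
# selects a contiguous run of granules and skips the rest.
import bisect

GRANULE = 4                      # tiny for the demo; ClickHouse defaults to 8192
keys = list(range(0, 32))        # table sorted by primary key
index_marks = keys[::GRANULE]    # first key of each granule: [0, 4, 8, ...]

def granules_for_range(lo, hi):
    """Return indices of granules that may contain keys in [lo, hi]."""
    first = max(bisect.bisect_right(index_marks, lo) - 1, 0)
    last = bisect.bisect_right(index_marks, hi) - 1
    return list(range(first, last + 1))

selected = granules_for_range(10, 13)
print(selected)  # granules 2 and 3 cover keys 8..15; the other 6 are skipped
assert selected == [2, 3]
```

The "Granules: 3598/3598" line in the earlier EXPLAIN output is the real-world counterpart: it reports how many granules survived this kind of pruning.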
&lt;h4&gt;
  
  
  Real-World Success Stories
&lt;/h4&gt;

&lt;p&gt;Many organizations have successfully migrated to ClickHouse, achieving substantial improvements in performance and cost savings. From e-commerce giants to financial companies, success stories highlight ClickHouse’s ability to transform data analytics capabilities at scale. For more details, refer to &lt;a href="https://clickhouse.com/docs/en/about-us/adopters" rel="noopener noreferrer"&gt;ClickHouse Adopters&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Running ClickHouse on Kubernetes
&lt;/h2&gt;

&lt;p&gt;In this guide, we’ll walk through the process of running ClickHouse on a Kubernetes cluster in 7 steps:&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: Install Kubectl
&lt;/h3&gt;

&lt;p&gt;First, we need to install &lt;code&gt;kubectl&lt;/code&gt;, the command-line tool for interacting with Kubernetes clusters. Run the following commands in your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; kubectl
&lt;span class="c"&gt;# Download Minikube&lt;/span&gt;
&lt;span class="c"&gt;# Please check your OS configuration and download from:&lt;/span&gt;
&lt;span class="c"&gt;# https://minikube.sigs.k8s.io/docs/start/?arch=%2Flinux%2Fx86-64%2Fstable%2Fbinary+download&lt;/span&gt;
&lt;span class="nb"&gt;sudo install &lt;/span&gt;minikube-linux-amd64 /usr/local/bin/minikube
minikube version
minikube start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, you have set up Kubernetes locally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Install Altinity ClickHouse Operator
&lt;/h3&gt;

&lt;p&gt;Next, we will download and install the Altinity ClickHouse operator to manage our ClickHouse deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/Altinity/clickhouse-operator/master/deploy/operator/clickhouse-operator-install-bundle.yaml

kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see the ClickHouse operator pod running, which indicates that the operator is successfully deployed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Install the ClickHouse Database
&lt;/h3&gt;

&lt;p&gt;Now we need to install the ClickHouse database itself. Follow these steps:&lt;/p&gt;

&lt;p&gt;A basic configuration example for our demo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clickhouse.altinity.com/v1"&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ClickHouseInstallation"&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;  name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-clickhouse&lt;/span&gt;
&lt;span class="na"&gt;  namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test-clickhouse-operator&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;  configuration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;    clusters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;      - name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cluster&lt;/span&gt;
&lt;span class="na"&gt;        layout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;          shardsCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;          replicasCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;  templates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;    podTemplates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;      - name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;clickhouse-pod-template&lt;/span&gt;
&lt;span class="na"&gt;        spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;          containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;            - name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;clickhouse&lt;/span&gt;
&lt;span class="na"&gt;              image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;clickhouse/clickhouse-server:latest&lt;/span&gt;
&lt;span class="na"&gt;              resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;                requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;                  cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;100m"&lt;/span&gt;
&lt;span class="na"&gt;                  memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1Gi"&lt;/span&gt;
&lt;span class="na"&gt;                limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;                  cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;
&lt;span class="na"&gt;                  memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2Gi"&lt;/span&gt;
&lt;span class="na"&gt;  defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;    templates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;      podTemplate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;clickhouse-pod-template&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now apply the configuration and check the status of the pods and services:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;clickhouse-install.yaml | kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; -
kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; test-clickhouse-operator
kubectl get services &lt;span class="nt"&gt;-n&lt;/span&gt; test-clickhouse-operator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see services running as defined in your installation configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Connect to ClickHouse Database
&lt;/h3&gt;

&lt;p&gt;To interact with the ClickHouse database, we need to install the ClickHouse client on our local machine. If you are using a different operating system, refer to the official &lt;a href="https://clickhouse.com/docs/en/install#quick-install" rel="noopener noreferrer"&gt;ClickHouse installation guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Run the following commands to install ClickHouse:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; apt-transport-https ca-certificates curl gnupg

curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; &lt;span class="s1"&gt;'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key'&lt;/span&gt; | &lt;span class="nb"&gt;sudo &lt;/span&gt;gpg &lt;span class="nt"&gt;--dearmor&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /usr/share/keyrings/clickhouse-keyring.gpg

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main"&lt;/span&gt; | &lt;span class="nb"&gt;sudo tee&lt;/span&gt; /etc/apt/sources.list.d/clickhouse.list

&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; clickhouse-server clickhouse-client
&lt;span class="nb"&gt;sudo &lt;/span&gt;clickhouse start

kubectl &lt;span class="nt"&gt;-n&lt;/span&gt; test-clickhouse-operator port-forward &amp;lt;pod_name&amp;gt; 9000:9000 &amp;amp;

clickhouse-client
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: Test Your Services
&lt;/h3&gt;

&lt;p&gt;To verify that everything is running correctly, execute the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; test-clickhouse-operator
kubectl get services &lt;span class="nt"&gt;-n&lt;/span&gt; test-clickhouse-operator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 6: Execute Queries
&lt;/h3&gt;

&lt;p&gt;Now, let’s create a table and execute some queries in ClickHouse:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;clickhouse-client
CREATE TABLE test_table &lt;span class="o"&gt;(&lt;/span&gt;
    id UInt32,
    name String
&lt;span class="o"&gt;)&lt;/span&gt; ENGINE &lt;span class="o"&gt;=&lt;/span&gt; MergeTree&lt;span class="o"&gt;()&lt;/span&gt;
ORDER BY &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

INSERT INTO test_table VALUES &lt;span class="o"&gt;(&lt;/span&gt;1, &lt;span class="s1"&gt;'CloudRaft'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;, &lt;span class="o"&gt;(&lt;/span&gt;2, &lt;span class="s1"&gt;'ClickHouse'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
SELECT &lt;span class="k"&gt;*&lt;/span&gt; FROM test_table&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see the results in the CLI, and the changed prompt indicates that you are interacting directly with the ClickHouse cluster.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 7: Load Testing
&lt;/h3&gt;

&lt;p&gt;To further evaluate the performance of your ClickHouse installation, consider using load testing tools like &lt;a href="https://jmeter.apache.org/" rel="noopener noreferrer"&gt;Apache JMeter&lt;/a&gt; or &lt;a href="https://k6.io/" rel="noopener noreferrer"&gt;k6&lt;/a&gt; to simulate increased query loads. Measure how query response times change as you add more nodes to the cluster.&lt;/p&gt;
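&lt;p&gt;If you prefer a quick script before reaching for a full load-testing tool, the measurement loop is simple. In this sketch &lt;code&gt;run_query&lt;/code&gt; is a stub; in a real test it would issue a query over HTTP or the native protocol.&lt;/p&gt;

```python
# Quick-and-dirty latency harness: time each query, sort the timings,
# and report a p95. run_query is a stand-in stub for a real client call.
import time

def run_query():
    time.sleep(0.001)  # pretend each query takes at least 1 ms

def measure(n_queries):
    timings = []
    for _ in range(n_queries):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    timings.sort()
    return timings[int(0.95 * len(timings)) - 1]  # 95th percentile

p95 = measure(100)
print(f"p95 latency: {p95 * 1000:.2f} ms")
assert max(p95, 0.001) == p95  # each stubbed query sleeps at least 1 ms
```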

&lt;h2&gt;
  
  
  Key Differences between PostgreSQL and ClickHouse
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.postgresql.org/" rel="noopener noreferrer"&gt;Postgres&lt;/a&gt; and ClickHouse serve different purposes, and a key distinction lies in how they handle &lt;strong&gt;replication&lt;/strong&gt; and &lt;strong&gt;sharding&lt;/strong&gt;. Postgres is primarily designed for transactional workloads (OLTP), where data consistency and durability are prioritized. ClickHouse, on the other hand, is tailored for analytical workloads (OLAP) and optimized for high-speed querying and large-scale data analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  Materialized Views
&lt;/h3&gt;

&lt;p&gt;In ClickHouse, &lt;a href="https://clickhouse.com/docs/en/materialized-view" rel="noopener noreferrer"&gt;Materialized Views&lt;/a&gt; are a powerful feature designed to improve query performance by pre-aggregating and storing data. Unlike regular views, which are calculated on-the-fly during query execution, materialized views physically store the results of a query, allowing for faster reads. These views can also leverage the efficient compression and fast access capabilities of the columnar storage model, further enhancing performance. &lt;/p&gt;

&lt;p&gt;Materialized views are particularly useful in environments where query performance is critical, as they provide pre-computed results that save time during execution.&lt;br&gt;
Postgres’s materialized views need to be refreshed manually, whereas ClickHouse updates them automatically at insert time, following its insert-and-optimize-later philosophy.&lt;/p&gt;
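&lt;p&gt;The difference between the two refresh models can be sketched in plain Python: an incrementally maintained aggregate that is updated on every insert (ClickHouse-style) versus a full recomputation on demand (like a manually refreshed Postgres materialized view).&lt;/p&gt;

```python
# Sketch of the two refresh models. mv_counts mimics a ClickHouse
# materialized view: each insert immediately updates the pre-aggregated
# state, so reads stay cheap. recompute() mimics a Postgres-style view
# that must be refreshed manually with a full rescan.
from collections import defaultdict

base_table = []
mv_counts = defaultdict(int)   # "materialized" property count per postcode

def insert(postcode, price):
    base_table.append((postcode, price))
    mv_counts[postcode] += 1   # updated at insert time, ClickHouse-style

def recompute():
    counts = defaultdict(int)  # full rescan, like REFRESH MATERIALIZED VIEW
    for postcode, _ in base_table:
        counts[postcode] += 1
    return counts

insert("SW1A", 500000)
insert("SW1A", 450000)
insert("E1", 350000)

assert mv_counts["SW1A"] == 2
assert dict(recompute()) == dict(mv_counts)  # both models agree on the answer
```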

&lt;h2&gt;
  
  
  Scaling ClickHouse
&lt;/h2&gt;

&lt;p&gt;In ClickHouse, scaling can be achieved through &lt;strong&gt;replication&lt;/strong&gt; and &lt;strong&gt;sharding&lt;/strong&gt; mechanisms. These help distribute data and queries across multiple nodes for performance and fault tolerance.&lt;/p&gt;

&lt;p&gt;ClickHouse traditionally relies on &lt;strong&gt;ZooKeeper&lt;/strong&gt;, a centralized service for coordinating distributed systems. ZooKeeper ensures that data replicas are in sync across nodes by maintaining metadata, managing locks, and handling failovers. It acts as a key component to keep the cluster’s state consistent, ensuring that replicas do not diverge and that read and write operations are properly distributed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Replication
&lt;/h3&gt;

&lt;p&gt;Replication ensures that copies of the same data are stored across multiple nodes to provide redundancy and improve fault tolerance. Replication in ClickHouse is at the Table Level.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ReplicatedMergeTree&lt;/strong&gt; is the engine used for replicated tables.&lt;/li&gt;
&lt;li&gt;Each table has a replica on multiple servers, and these replicas are kept in sync.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ClickHouse Keeper&lt;/strong&gt; (or ZooKeeper) manages the coordination between these replicas, ensuring consistency by managing locks, transactions, and metadata related to replication.&lt;/li&gt;
&lt;li&gt;In case one replica goes down, the system can still read from and write to the available replicas.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Replication Process Example&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Let’s assume there are two replicas, A and B. A write to Replica A will be logged and replicated to Replica B, ensuring that both have the same data. This happens asynchronously to avoid latency issues.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Sharding
&lt;/h3&gt;

&lt;p&gt;Sharding in ClickHouse is the process of dividing data horizontally into smaller parts and distributing it across different servers (shards). This allows ClickHouse to handle very large datasets by spreading the load.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Distributed Table&lt;/strong&gt;: ClickHouse uses a distributed table to achieve sharding. A distributed table is a logical table that sits on top of local tables (sharded across different nodes) and acts as a query router.&lt;/li&gt;
&lt;li&gt;When a query is executed on a distributed table, it is automatically routed to the relevant shard(s) based on the sharding key.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Sharding Process Example&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Suppose you have 3 nodes (Node 1, Node 2, Node 3), and data is sharded by a key such as user ID. A distributed table will split the data based on the user ID and store different users’ data on different nodes. Queries on user-specific data will be routed directly to the shard holding that user’s data, improving performance.&lt;/li&gt;
&lt;/ul&gt;
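&lt;p&gt;The routing step above can be sketched in a few lines. This is only an illustration of hash-based key routing; here &lt;code&gt;crc32&lt;/code&gt; stands in for whichever hash function the distributed table's sharding expression actually uses.&lt;/p&gt;

```python
# Sketch of sharding-key routing: hash the sharding key (user_id) and
# map it to exactly one shard. The same key always lands on the same
# shard, so a query filtered by user_id only touches that one node.
import zlib

SHARDS = ["node1", "node2", "node3"]

def shard_for(user_id):
    h = zlib.crc32(str(user_id).encode("utf-8"))
    return SHARDS[h % len(SHARDS)]

assert shard_for(42) == shard_for(42)  # routing is deterministic
placement = {uid: shard_for(uid) for uid in range(1, 7)}
print(placement)
```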

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In conclusion, ClickHouse offers a powerful solution for businesses seeking high-speed, large-scale analytics. With its columnar storage, real-time query performance, and scalability through replication and sharding, it serves as an excellent alternative for organizations transitioning from traditional row-based databases like Postgres. Particularly effective in industries such as web analytics, business intelligence, and log analysis, ClickHouse meets the demands for rapid data retrieval and analysis.&lt;/p&gt;

&lt;p&gt;However, while ClickHouse excels in query performance and scalability, it may introduce complexities in data insertion compared to traditional databases, and it’s not well-suited for OLTP use cases. Organizations considering migration to ClickHouse should weigh these trade-offs, especially if they require frequent real-time inserts or updates. Ultimately, its scalability, cost-effectiveness, and growing community support make ClickHouse a compelling choice for modern data-driven applications, transforming how businesses manage and analyze data.&lt;/p&gt;




</description>
      <category>database</category>
      <category>analytics</category>
      <category>opensource</category>
      <category>learning</category>
    </item>
    <item>
      <title>Decoding OCR: A Comprehensive Guide</title>
      <dc:creator>Arin Zingade</dc:creator>
      <pubDate>Wed, 07 Aug 2024 19:49:09 +0000</pubDate>
      <link>https://dev.to/arinzingade/decoding-ocr-a-comprehensive-guide-4n86</link>
      <guid>https://dev.to/arinzingade/decoding-ocr-a-comprehensive-guide-4n86</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Optical Character Recognition (OCR) stands as a fundamental technology that transforms visual text representations into machine-readable formats. This capability is essential for digitizing printed documents and optimizing data entry processes. Advancements in artificial intelligence (AI) and machine learning (ML) have brought significant improvements to traditional OCR systems. These technologies enhance OCR's ability to accurately interpret text from complex or low-quality images by learning from data variations.&lt;/p&gt;

&lt;p&gt;Looking to the future, OCR is on track for exciting developments. The technology is expected to integrate more seamlessly with other AI domains, such as natural language processing (NLP) and image recognition. This evolution will not only refine its core functionalities but also extend its utility across more sophisticated and holistic data processing solutions. Moreover, the scope of OCR applications is set to expand dramatically. Beyond mere text digitization, future applications may include real-time translation services, accessibility tools for the visually impaired, and interactive educational platforms. This broadening of scope will undoubtedly make OCR an even more vital component of our increasingly digital world.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Metrics for Evaluating OCR Systems
&lt;/h3&gt;

&lt;p&gt;Evaluating the performance of OCR systems is crucial to ensure they meet the required accuracy and efficiency standards. The key metrics are Character Error Rate (CER) and Word Error Rate (WER), both computed by applying the &lt;strong&gt;Levenshtein Distance&lt;/strong&gt;.&lt;br&gt;
An advanced metric known as ZoneMapAltCnt provides even more comprehensive insights into the performance of OCR systems.&lt;/p&gt;
&lt;h4&gt;
  
  
  Levenshtein Distance
&lt;/h4&gt;

&lt;p&gt;Levenshtein Distance is a measure of the difference between two sequences. In the context of OCR, it quantifies how many single-character edits (insertions, deletions, or substitutions) are necessary to change the recognized text into the ground truth text. &lt;/p&gt;
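&lt;p&gt;As a quick illustration (a minimal sketch, not tied to any OCR library), the Levenshtein Distance can be computed with the classic dynamic-programming recurrence:&lt;/p&gt;

```python
def levenshtein(a: str, b: str) -> int:
    # Dynamic programming over a single rolling row:
    # prev[j] holds the edit distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]
```

&lt;p&gt;For example, turning "kitten" into "sitting" requires three single-character edits, so the distance is 3.&lt;/p&gt;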
&lt;h4&gt;
  
  
  Character Error Rate (CER)
&lt;/h4&gt;

&lt;p&gt;Character Error Rate (CER) is a fundamental metric in OCR evaluation, representing the percentage of characters that were incorrectly recognized in a text document. It is calculated by comparing the recognized text to a ground-truth text and counting the number of insertions, deletions, and substitutions needed to make the recognized text identical to the ground truth. &lt;/p&gt;
&lt;h4&gt;
  
  
  Word Error Rate (WER)
&lt;/h4&gt;

&lt;p&gt;Word Error Rate (WER) measures the performance of OCR systems at the word level. It is similar to CER but evaluates errors in terms of whole words instead of individual characters. WER is calculated by the number of word insertions, deletions, and substitutions required to match the recognized text with the ground truth. &lt;/p&gt;
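&lt;p&gt;Both metrics reduce to an edit distance divided by the length of the ground truth; the only difference is whether the sequences being compared are characters (CER) or words (WER). A minimal sketch (the function names are illustrative, not taken from any OCR library):&lt;/p&gt;

```python
def edit_distance(ref, hyp):
    # Generic Levenshtein distance over any two sequences
    # (characters for CER, word lists for WER).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def cer(ground_truth: str, recognized: str) -> float:
    # Character Error Rate: edits / characters in the ground truth.
    return edit_distance(ground_truth, recognized) / len(ground_truth)

def wer(ground_truth: str, recognized: str) -> float:
    # Word Error Rate: the same computation over word sequences.
    truth_words = ground_truth.split()
    return edit_distance(truth_words, recognized.split()) / len(truth_words)
```

&lt;p&gt;Recognizing "hallo world" for the ground truth "hello world" yields a CER of 1/11 (one wrong character out of eleven) but a WER of 0.5 (one wrong word out of two), which is why the two metrics are usually reported together.&lt;/p&gt;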
&lt;h4&gt;
  
  
  ZoneMapAltCnt
&lt;/h4&gt;

&lt;p&gt;The ZoneMapAltCnt metric represents a more advanced approach to evaluating OCR systems. It assesses both the accuracy of text segmentation and the correctness of the recognized text within those segments: it evaluates the precision of detected text zones and measures character and word accuracy within them, handling segmentation errors effectively. For more details, refer to this &lt;a href="https://inria.hal.science/hal-01981731/file/Paper-Devashish.pdf"&gt;document&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;
  
  
  Factors Affecting OCR Accuracy
&lt;/h4&gt;

&lt;p&gt;Several factors influence the accuracy of OCR systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Document Condition:&lt;/strong&gt; Poor quality or damaged documents can significantly reduce OCR accuracy due to obscured or unreadable text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image Resolution:&lt;/strong&gt; Higher resolution images provide more detail, allowing for better character recognition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document Language:&lt;/strong&gt; OCR systems must be optimized for specific languages, as character sets and linguistic rules vary widely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preprocessing:&lt;/strong&gt; Techniques such as noise reduction, binarization, and normalization improve text readability and OCR accuracy.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Framework For an OCR Model
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fym5hfuuva8sts0sts0yo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fym5hfuuva8sts0sts0yo.png" alt="Image description" width="800" height="248"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Preprocessing Images for OCR
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Opening an Image
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;
&lt;span class="n"&gt;image_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PATH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Inverting Image
&lt;/h4&gt;

&lt;p&gt;Inverting an image in the context of OCR refers to reversing the color scheme of the image to enhance the text's readability and contrast for better recognition accuracy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;inverted_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bitwise_not&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temp/inverted.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inverted_image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, OpenCV handles the inversion for us, and we write the result to the "temp" folder for further analysis.&lt;/p&gt;

&lt;h4&gt;
  
  
  Binarization
&lt;/h4&gt;

&lt;p&gt;Binarization in the context of OCR is a crucial preprocessing step that involves converting a color or grayscale image into a binary image consisting of only two colors, typically black and white. This step is important because most OCR models are designed to work with this format.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;grayScale&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cvtColor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COLOR_BGR2GRAY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;gray_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;grayScale&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;thresh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;im_bw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gray_image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;230&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;THRESH_BINARY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temp/bw.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;im_bw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Noise Removal
&lt;/h4&gt;

&lt;p&gt;Noise removal is a critical preprocessing step in OCR because it enhances the quality of the input images, leading to more accurate text recognition.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;noiseRemoval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;kernal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dilate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kernal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;erode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kernal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;morphologyEx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MORPH_CLOSE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kernal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;medianBlur&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; 
&lt;span class="n"&gt;no_noise&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;noiseRemoval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;im_bw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temp/no_noise.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;no_noise&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Dilation and Erosion
&lt;/h4&gt;

&lt;p&gt;In OCR, the processes of dilation and erosion play pivotal roles in improving the readability and recognition accuracy of text. Dilation helps to enhance the visibility of characters by thickening them, thus aiding in better character recognition in low-quality or faint prints. Conversely, erosion is used to thin out characters, which prevents misinterpretations and enhances the separation of text from the background.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;#Erosion
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;thin_font&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bitwise_not&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;kernel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;erode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kernel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bitwise_not&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;

&lt;span class="c1"&gt;#Dilation
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;thick_font&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bitwise_not&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;kernel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uint8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dilate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kernel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bitwise_not&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt; &lt;/span&gt; &lt;span class="err"&gt; &lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;

&lt;span class="n"&gt;eroded_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;thin_font&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;no_noise&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dilate_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;thick_font&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;no_noise&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temp/eroded_image.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eroded_image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temp/dilate.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dilate_image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;The dilation and erosion processes work correctly only when the image is inverted: the background is black and the text is white.&lt;/em&gt;&lt;/p&gt;
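&lt;p&gt;Because both operations assume white text on a black background, a small polarity check can invert the image automatically when needed (a heuristic sketch assuming an 8-bit binarized image; the function name is illustrative):&lt;/p&gt;

```python
import numpy as np

def ensure_white_on_black(image: np.ndarray) -> np.ndarray:
    # In a binarized page, background pixels dominate. A bright mean
    # therefore implies a white background, so invert to obtain white
    # text on black before applying dilation or erosion.
    if image.mean() > 127:
        return 255 - image
    return image
```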

&lt;h3&gt;
  
  
  Best OCR Models for Different Use Cases
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Amazon Textract&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use Case&lt;/strong&gt;: Industry Level&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strength&lt;/strong&gt;: Amazon Textract is highly effective for industrial-scale document processing, capable of extracting text and data from virtually any type of document, including forms and tables. It integrates seamlessly with other AWS services, making it ideal for businesses looking to automate document workflows in cloud environments.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/textract/" rel="noopener noreferrer"&gt;https://aws.amazon.com/textract/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;SuryaOCR&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use Case&lt;/strong&gt;: Large Language Range Support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strength&lt;/strong&gt;: SuryaOCR stands out for its extensive language support, making it suitable for global applications where documents in multiple languages need to be processed. This makes it a valuable tool for international organizations and government agencies dealing with multilingual data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/VikParuchuri/surya" rel="noopener noreferrer"&gt;https://github.com/VikParuchuri/surya&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Tesseract&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use Case&lt;/strong&gt;: Customizable and Versatile&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strength&lt;/strong&gt;: Tesseract is an open-source OCR engine that offers flexibility and customization, which is perfect for developers looking to integrate OCR into their applications without significant investment. Its versatility makes it a popular choice for academic research, prototype development, and small business applications.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/tesseract-ocr" rel="noopener noreferrer"&gt;https://github.com/tesseract-ocr&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;EasyOCR&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use Case&lt;/strong&gt;: Good for Small and Simple Projects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strength&lt;/strong&gt;: EasyOCR is an accessible and straightforward tool for developers who need a quick and efficient solution for small-scale projects. It supports multiple languages and is easy to set up, making it ideal for startups and individual developers working on applications with less complex OCR requirements.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/JaidedAI/EasyOCR" rel="noopener noreferrer"&gt;https://github.com/JaidedAI/EasyOCR&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Different Use Cases Where OCR Can Be Used
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Automated Form Processing&lt;/li&gt;
&lt;li&gt;Digital Archiving&lt;/li&gt;
&lt;li&gt;License Plate Recognition&lt;/li&gt;
&lt;li&gt;Legal Document Analysis&lt;/li&gt;
&lt;li&gt;Educational Resources&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Case Study - A Demo on &lt;a href="https://github.com/VikParuchuri/surya" rel="noopener noreferrer"&gt;Surya-OCR&lt;/a&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Overview of Surya OCR
&lt;/h4&gt;

&lt;p&gt;Surya OCR is a comprehensive document OCR toolkit. This toolkit is designed to handle a wide range of document types and supports OCR in over 90 languages, benchmarking favorably against other leading cloud services.&lt;/p&gt;

&lt;p&gt;They also have a hosted API &lt;a href="https://www.datalab.to/" rel="noopener noreferrer"&gt;https://www.datalab.to/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multilingual Support:&lt;/strong&gt; Capable of performing OCR in more than 90 languages, making it highly versatile for global applications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced Text Detection:&lt;/strong&gt; Offers line-level text detection capabilities, which work effectively across any language.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sophisticated Layout Analysis:&lt;/strong&gt; Detects various layout elements such as tables, images, headers, etc., and determines their arrangement within the document.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reading Order Detection:&lt;/strong&gt; Identifies and follows the reading order in documents, which is crucial for understanding structured data like forms and articles.&lt;/li&gt;
&lt;li&gt;Surya also offers performance tips for optimizing GPU and CPU usage during OCR processing, ensuring efficient handling of resources, unlike Tesseract.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Usage:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Surya is particularly adept at handling complex OCR tasks such as processing scientific papers, textbooks, scanned documents, and even mixed-language content efficiently.&lt;/li&gt;
&lt;li&gt;The toolkit is available through a hosted API that supports PDFs, images, Word documents, and PowerPoint presentations, ensuring high reliability and consistent performance without latency spikes.&lt;/li&gt;
&lt;li&gt; Surya can be installed via pip and requires Python 3.9+ and PyTorch. The model weights download automatically upon the first run.&lt;/li&gt;
&lt;li&gt;It includes a user-friendly Streamlit app that allows for interactive testing of the OCR capabilities on images or PDF files.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this demonstration, we will explore how to perform &lt;strong&gt;Text Detection, OCR, Reading Layout, and Reading Order&lt;/strong&gt; using Surya. We will cover three methods: using the Streamlit GUI, through the Command Line Interface, and directly from Python code.&lt;/p&gt;

&lt;h4&gt;
  
  
  Surya OCR Through GUI
&lt;/h4&gt;

&lt;p&gt;To run Surya OCR GUI locally on your machine, you will need to open your Command Line Interface (CLI) and follow the given instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;streamlit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After successfully installing Streamlit, execute the snippet below in the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;surya_gui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32acal5mhr3ekdo03232.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32acal5mhr3ekdo03232.png" alt="Image description" width="800" height="119"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The app will start running on &lt;br&gt;
&lt;a href="http://localhost:8501" rel="noopener noreferrer"&gt;http://localhost:8501&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq22fslqessoy9i4habyp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq22fslqessoy9i4habyp.png" alt="Image description" width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dashboard above will be displayed if all the previous steps executed successfully. &lt;br&gt;
Now just follow the steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click on Browse File Button&lt;/li&gt;
&lt;li&gt;Select your desired file and language.&lt;/li&gt;
&lt;li&gt;Choose one between:

&lt;ol&gt;
&lt;li&gt;Run Text Detection&lt;/li&gt;
&lt;li&gt;Run OCR&lt;/li&gt;
&lt;li&gt;Run Layout Analysis&lt;/li&gt;
&lt;li&gt;Run Reading Order&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;All the images from this point are processed by Surya OCR&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Text Detection&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Identifies areas within an image or document where text is present. This step involves locating text blocks and distinguishing them from non-text elements like images and backgrounds.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjiqfep9fnv7ttksxb6zr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjiqfep9fnv7ttksxb6zr.png" alt="Image description" width="800" height="702"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fey2908nksglga3wlilx7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fey2908nksglga3wlilx7.png" alt="Image description" width="800" height="702"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;OCR (Optical Character Recognition)&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Transforms the detected text areas into machine-readable characters. This process involves analyzing the shapes of characters and converting them into corresponding text data.&lt;/li&gt;
&lt;li&gt;Note that here we used an inverted image as the uploaded file.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fncxhkz46onln3wszfwx8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fncxhkz46onln3wszfwx8.png" alt="Image description" width="800" height="870"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4i4qte7pqbw1128ew4hh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4i4qte7pqbw1128ew4hh.png" alt="Image description" width="800" height="869"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Layout Analysis and Reading Order&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Analyzes the physical structure of the document to understand how different elements are organized. This includes the detection of headers, footers, columns, tables, and images, helping to interpret the document as a whole rather than just isolated text blocks.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd6djjml3j0ocwaq32ycs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd6djjml3j0ocwaq32ycs.png" alt="Image description" width="679" height="960"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zh6gfdvpsmgyc4ovfwv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zh6gfdvpsmgyc4ovfwv.png" alt="Image description" width="679" height="960"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Using Surya OCR via Command Line
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Text Recognition
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open a Command Prompt&lt;/strong&gt;: Navigate to the folder containing the images you wish to process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Execute Surya OCR&lt;/strong&gt;: Type the following command to process your images using Surya OCR. Replace &lt;code&gt;DATA_PATH&lt;/code&gt; with the path to your images relative to your current directory. This command will output the recognized text in the "results" folder.&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight shell"&gt;&lt;code&gt;surya_ocr DATA_PATH &lt;span class="nt"&gt;--images&lt;/span&gt; &lt;span class="nt"&gt;--langs&lt;/span&gt; hi,en
&lt;/code&gt;&lt;/pre&gt;



&lt;p&gt;In the command above, &lt;code&gt;--langs hi,en&lt;/code&gt; specifies the languages for OCR: &lt;code&gt;hi&lt;/code&gt; is Hindi and &lt;code&gt;en&lt;/code&gt; is English. Surya OCR supports 90+ ISO language codes. For the complete list of supported languages, refer &lt;a href="https://github.com/VikParuchuri/surya/blob/master/surya/languages.py" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;


&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;Similarly,&lt;/p&gt;

&lt;h4&gt;
  
  
  Text Line Detection
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;surya_detect DATA_PATH &lt;span class="nt"&gt;--images&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Layout Analysis
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;surya_layout DATA_PATH &lt;span class="nt"&gt;--images&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Reading Order
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;surya_order DATA_PATH &lt;span class="nt"&gt;--images&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that the results obtained are the same as those presented earlier in this article.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using Surya OCR from Python
&lt;/h3&gt;

&lt;p&gt;Sometimes the images we want to run OCR on are not up to the mark and need some of the preprocessing techniques discussed earlier in this article. In that case, we can build a pipeline through which each image passes before recognition. This is easy to do in Python by chaining preprocessing functions and then applying Surya OCR to the processed document.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;surya-ocr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
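&lt;p&gt;As a sketch of such a pipeline (using only Pillow; the scale factor and the choice of steps here are illustrative assumptions, not Surya requirements), an image can be converted to grayscale, contrast-stretched, and upscaled before being handed to the OCR functions shown below:&lt;/p&gt;

```python
from PIL import Image, ImageOps

def preprocess(image: Image.Image, scale: int = 2) -> Image.Image:
    """Prepare a scan for OCR: grayscale, stretch contrast, upscale."""
    gray = ImageOps.grayscale(image)         # drop colour information
    stretched = ImageOps.autocontrast(gray)  # spread the intensity range
    w, h = stretched.size
    return stretched.resize((w * scale, h * scale), Image.LANCZOS)

# Demo on an in-memory image; in practice, open your scanned page instead
# and pass the result to the Surya OCR calls in the snippets below.
page = Image.new("RGB", (200, 100), "white")
processed = preprocess(page)
print(processed.mode, processed.size)  # L (400, 200)
```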



&lt;h4&gt;
  
  
  Text Recognition
&lt;/h4&gt;

&lt;p&gt;This Python script utilizes the &lt;code&gt;surya.ocr&lt;/code&gt; library to perform optical character recognition (OCR) on images. The script:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loads an image for OCR.&lt;/li&gt;
&lt;li&gt;Initializes necessary models and processors for text detection and recognition.&lt;/li&gt;
&lt;li&gt;Executes the OCR process on the image, returning text predictions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;To use the segformer model, pin Surya to version 0.4.14; in the latest release the required file is missing.&lt;/em&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;surya.ocr&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;run_ocr&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;surya.model.detection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;surya.model.recognition.model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_model&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;surya.model.recognition.processor&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_processor&lt;/span&gt;

&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IMAGE_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="n"&gt;langs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;det_processor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;det_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_processor&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;rec_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rec_processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nf"&gt;load_processor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  

&lt;span class="c1"&gt;# Perform OCR and get predictions
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_ocr&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;langs&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;det_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;det_processor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rec_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rec_processor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Line Detection
&lt;/h4&gt;

&lt;p&gt;This segment of the code focuses on detecting textual lines within an image:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It loads an image and uses the &lt;code&gt;surya.detection&lt;/code&gt; module.&lt;/li&gt;
&lt;li&gt;Applies a text detection model to find textual lines.&lt;/li&gt;
&lt;li&gt;Outputs a list of dictionaries containing detected text lines for further processing.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;surya.detection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;batch_text_detection&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;surya.model.detection.model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;load_processor&lt;/span&gt;

&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IMAGE_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nf"&gt;load_processor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  

&lt;span class="c1"&gt;# Get predictions of text lines
&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;batch_text_detection&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Layout Analysis
&lt;/h4&gt;

&lt;p&gt;This script analyzes the layout of the page within an image:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loads an image and initializes models for both line detection and layout analysis.&lt;/li&gt;
&lt;li&gt;First, detects text lines, then performs layout analysis based on these lines.&lt;/li&gt;
&lt;li&gt;Returns structured data indicating the layout of content in the image.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;surya.detection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;batch_text_detection&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;surya.layout&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;batch_layout_detection&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;surya.model.detection.model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;load_processor&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;surya.settings&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;

&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IMAGE_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;span class="n"&gt;det_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;det_processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nf"&gt;load_processor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;checkpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LAYOUT_MODEL_CHECKPOINT&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;load_processor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;checkpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LAYOUT_MODEL_CHECKPOINT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 

&lt;span class="c1"&gt;# First detect lines, then analyze layout
&lt;/span&gt;&lt;span class="n"&gt;line_predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;batch_text_detection&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;det_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;det_processor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;layout_predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;batch_layout_detection&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_predictions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Reading Order
&lt;/h4&gt;

&lt;p&gt;This code snippet establishes the reading order within a document:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loads an image and extracts bounding boxes (bboxes) of detected text elements.&lt;/li&gt;
&lt;li&gt;Utilizes the &lt;code&gt;surya.ordering&lt;/code&gt; module to determine the sequential order of text blocks.&lt;/li&gt;
&lt;li&gt;Outputs ordered text predictions to guide further content analysis or extraction.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;surya.ordering&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;batch_ordering&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;surya.model.ordering.processor&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_processor&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;surya.model.ordering.model&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_model&lt;/span&gt;

&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IMAGE_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;span class="n"&gt;bboxes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;bbox1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bbox2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...]&lt;/span&gt; 
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nf"&gt;load_processor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 

&lt;span class="c1"&gt;# Get ordered text predictions
&lt;/span&gt;&lt;span class="n"&gt;order_predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;batch_ordering&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;bboxes&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Like text detection, this function returns structured data: the coordinates of each block together with its position in the reading order, organized in a way that reflects the physical structure of the page.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For a deeper dive into Surya-OCR, an advanced OCR system, enthusiasts and developers can explore its extensive components on GitHub. This open-source project is readily accessible for those eager to understand its mechanics or contribute to its evolution. Visit &lt;a href="https://github.com/VikParuchuri/surya" rel="noopener noreferrer"&gt;Surya-OCR on GitHub&lt;/a&gt; to explore the documentation, source code, and more.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limitations of Surya-OCR &amp;amp; Scope of Improvement
&lt;/h3&gt;

&lt;p&gt;Surya-OCR stands out for its impressive multilingual support and specialization in digitizing printed documents. Despite its strengths, there are a few limitations users should be aware of. Primarily, Surya-OCR is optimized for printed text and can struggle with text on complex backgrounds or in handwritten formats, potentially leading to inaccuracies.&lt;/p&gt;

&lt;p&gt;Additionally, the toolkit requires substantial GPU resources for optimal performance, with recommendations like 16GB of VRAM for batch processing. This high demand may exclude users with limited hardware capabilities. Also, issues with the confidence levels in the model's text detection could affect its reliability, especially in critical applications where accuracy is paramount.&lt;/p&gt;

&lt;p&gt;Optical Character Recognition (OCR) technology has made significant strides, evolving from simple text digitization to becoming an integral part of complex AI-driven applications. This evolution can be further enhanced by the integration with multimodal Large Language Models (LLMs), which are capable of processing and understanding information from multiple data types, including text, images, and audio.&lt;/p&gt;

&lt;p&gt;Multimodal LLMs can complement traditional OCR systems in several ways. While OCR excels at extracting raw text from images, multimodal LLMs can interpret the context within which the text appears, understanding nuances and subtleties that OCR alone might miss. This synergy allows for a more nuanced understanding of documents in contexts where text is intertwined with visual elements, such as infographics, annotated diagrams, and mixed media documents.&lt;/p&gt;

&lt;p&gt;For example, in educational materials where diagrams are annotated with textual explanations, OCR can extract the text, and the multimodal LLM can provide insights into how the text relates to the graphical content. This could be invaluable for creating accessible educational tools, where both text and visuals need to be made comprehensible to users with different needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;In my opinion, OCR has transcended its traditional role, enhanced by advancements in AI and ML, to become a cornerstone technology in our digital era. As it integrates further with fields like NLP and image recognition, OCR is expanding into dynamic applications such as real-time translation and accessibility tools, transforming how we interact with information. But there is still much to be done in the field.&lt;/p&gt;

&lt;p&gt;Multimodal Large Language Models (LLMs) represent a promising evolution in OCR technology. By combining OCR with these models, we can extract not just text but understand the context of images, making digital content more accessible and interpretable. &lt;/p&gt;

&lt;p&gt;As we continue to refine these technologies, the potential for creating seamless and intuitive user interfaces that can interpret and respond to a complex blend of textual, visual, and auditory inputs is immense. This could revolutionize the way we interact with our devices, making technology an even more integral part of everyday life.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>computervision</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
