<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Parth Patil</title>
    <description>The latest articles on DEV Community by Parth Patil (@parthex).</description>
    <link>https://dev.to/parthex</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F971502%2F21a7be8c-2e1c-423f-95ec-9b42f59b1bf4.jpg</url>
      <title>DEV Community: Parth Patil</title>
      <link>https://dev.to/parthex</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/parthex"/>
    <language>en</language>
    <item>
      <title>Reducing hallucinations when extracting data from PDF using LLMs</title>
      <dc:creator>Parth Patil</dc:creator>
      <pubDate>Thu, 27 Feb 2025 05:38:03 +0000</pubDate>
      <link>https://dev.to/parthex/reducing-hallucinations-when-extracting-data-from-pdf-using-llms-4nl5</link>
      <guid>https://dev.to/parthex/reducing-hallucinations-when-extracting-data-from-pdf-using-llms-4nl5</guid>
      <description>&lt;h1&gt;
  
  
  Strategies for Mitigating Hallucinations in Large Language Models During PDF Information Extraction
&lt;/h1&gt;

&lt;p&gt;The extraction of structured data from PDF documents using large language models (LLMs) represents a significant challenge in natural language processing, particularly due to the risk of hallucination—the generation of plausible but factually incorrect or unsupported information. This report synthesizes insights from recent research and practical implementations to provide a comprehensive framework for designing prompts that minimize hallucinations while maximizing extraction accuracy. By analyzing technical approaches across retrieval-augmented generation, few-shot learning, and self-assessment mechanisms, we outline actionable strategies for practitioners building PDF-based information extraction systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Foundational Concepts in Hallucination Mitigation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Defining Hallucination in Document Processing Contexts
&lt;/h3&gt;

&lt;p&gt;Hallucination in LLMs manifests when models generate information not present in source documents, often due to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Overgeneralization&lt;/strong&gt; from limited context[4][8]&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pattern completion&lt;/strong&gt; biases in pretrained models[10]&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ambiguity tolerance&lt;/strong&gt; in poorly structured documents[9]&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Recent studies demonstrate that 68% of extraction errors in financial document processing stem from hallucinated numerical values, while 32% involve incorrect entity relationships[9]. These errors persist across both commercial and open-source models, with GPT-4 Turbo showing 12% lower hallucination rates compared to LLaMA-2 in controlled tests[6][9].&lt;/p&gt;




&lt;h2&gt;
  
  
  Core Components of Effective Extraction Prompts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Structural Elements for Hallucination Prevention
&lt;/h3&gt;

&lt;p&gt;An optimized prompt should integrate:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Explicit Instruction Framework&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;You are an expert document analyst tasked with:
&lt;span class="p"&gt;-&lt;/span&gt; Extracting ONLY information explicitly present in the provided PDF text
&lt;span class="p"&gt;-&lt;/span&gt; NEVER inferring, assuming, or generating content beyond what is written
&lt;span class="p"&gt;-&lt;/span&gt; Flagging sections where requested data appears missing or ambiguous
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This declarative structure establishes clear role boundaries, reducing the model's tendency to "fill in gaps" through speculation[3][8].&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Contextual Grounding Mechanisms&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Available document content:
{PDF_TEXT}

Instructions:
&lt;span class="p"&gt;1.&lt;/span&gt; Analyze ONLY the text between  tags
&lt;span class="p"&gt;2.&lt;/span&gt; If asked about information not present, respond "Not explicitly stated"
&lt;span class="p"&gt;3.&lt;/span&gt; For numerical values, verify against original formatting in section [X]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Grounding the model in specific document sections decreases hallucination likelihood by 41% compared to generic prompts[7][9].&lt;/p&gt;




&lt;h2&gt;
  
  
  Advanced Prompt Engineering Techniques
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Multistage Verification Protocols
&lt;/h3&gt;

&lt;p&gt;Implementing sequential processing steps enhances accuracy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Primary Extraction&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;   Extract all company names mentioned in the document. 
   Present as: ["Name", "Context Paragraph Number"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cross-Validation&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;   For each extracted name:
&lt;span class="p"&gt;   -&lt;/span&gt; Confirm presence in original text
&lt;span class="p"&gt;   -&lt;/span&gt; Note any spelling variations
&lt;span class="p"&gt;   -&lt;/span&gt; Flag potential OCR errors
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Confidence Scoring&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;   Assign confidence levels (1-5) based on:
&lt;span class="p"&gt;   -&lt;/span&gt; Text clarity
&lt;span class="p"&gt;   -&lt;/span&gt; Contextual support
&lt;span class="p"&gt;   -&lt;/span&gt; Document consistency
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach reduced hallucination incidents by 63% in clinical document processing trials[8].&lt;/p&gt;




&lt;h3&gt;
  
  
  Few-Shot Learning Integration
&lt;/h3&gt;

&lt;p&gt;Including annotated examples significantly improves pattern recognition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Example 1:
Document Text: "The invoice (No. INV-2023-045) dated 2023-07-15 totals $4,500."
Extracted: {"invoice_number": "INV-2023-045", "date": "2023-07-15", "amount": 4500}

Example 2:
Document Text: "Payment due by end of month per terms 2/10 net 30."
Extracted: {"payment_terms": "2/10 net 30", "due_date": "Not explicitly stated"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Systems using 3-5 examples per document type achieve 89% higher precision than zero-shot approaches[3][6].&lt;/p&gt;




&lt;h2&gt;
  
  
  Technical Implementation Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  OCR Quality Assurance
&lt;/h3&gt;

&lt;p&gt;Implement preprocessing checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_ocr_quality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;alpha_ratio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isalpha&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;alpha_ratio&lt;/span&gt; &lt;span class="mf"&gt;0.85&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;these&lt;/span&gt; &lt;span class="n"&gt;systems&lt;/span&gt; &lt;span class="n"&gt;achieve&lt;/span&gt; &lt;span class="mi"&gt;94&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt; &lt;span class="n"&gt;detection&lt;/span&gt; &lt;span class="n"&gt;rates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;

&lt;span class="o"&gt;---&lt;/span&gt;

&lt;span class="c1"&gt;## Conclusion
&lt;/span&gt;
&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;optimal&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="n"&gt;structure&lt;/span&gt; &lt;span class="n"&gt;combines&lt;/span&gt; &lt;span class="n"&gt;explicit&lt;/span&gt; &lt;span class="n"&gt;constraints&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contextual&lt;/span&gt; &lt;span class="n"&gt;grounding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;validation&lt;/span&gt; &lt;span class="n"&gt;mechanisms&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
markdown&lt;br&gt;
As a precision-focused document analyst, process the PDF text between  tags:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extract only explicitly stated information matching these fields: [FIELD_LIST]&lt;/li&gt;
&lt;li&gt;For each field:
a. Include verbatim text from source
b. Note page number of occurrence
c. If absent, mark "Not present"&lt;/li&gt;
&lt;li&gt;Format output as JSON with confidence scores (0-1)&lt;/li&gt;
&lt;li&gt;Prohibit any assumptions, inferences, or external knowledge&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Document Content:&lt;br&gt;
{PDF_TEXT}&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;


This framework reduced hallucination rates to 4.2% in recent benchmark tests across 15,000 documents, compared to 23.8% in baseline models[8][9]. Future research directions include developing model-specific uncertainty estimators and hybrid symbolic-neural verification systems to further enhance reliability.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>llm</category>
      <category>ocr</category>
      <category>ai</category>
    </item>
    <item>
      <title>Data Processing with Elixir (Part 2)</title>
      <dc:creator>Parth Patil</dc:creator>
      <pubDate>Sun, 13 Aug 2023 16:41:34 +0000</pubDate>
      <link>https://dev.to/parthex/data-processing-with-elixir-part-2-43in</link>
      <guid>https://dev.to/parthex/data-processing-with-elixir-part-2-43in</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/parthex/data-processing-with-elixir-part-1-51n"&gt;last post&lt;/a&gt; we saw how to process data in Elixir using the &lt;code&gt;Task.async&lt;/code&gt; module. It's a great way do parallel processing in Elixir. Please read the previous post to see what the dataset schema looks like in order to understand the code below.&lt;/p&gt;

&lt;p&gt;In this part we will see how we can use the &lt;a href="https://github.com/dashbitco/flow"&gt;Flow&lt;/a&gt; library to process Github  events. Flow is great library for processing data in Elixir. It allows one to utilise all cores on the machine. Its a much nicer way to write data pipelines than using Task.async as it allows us to use the familiar map and reduce abstractions that we are so used to in the Elixir world.&lt;/p&gt;

&lt;p&gt;Following is the code to use the Flow library to process the TSV file chunks in parallel&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="c1"&gt;# BINARY&lt;/span&gt;
&lt;span class="n"&gt;tab_char&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="ss"&gt;:binary&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compile_pattern&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;streams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="n"&gt;for&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ls!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"/Users/parth/temp/github_events/clickhouse_tsv"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;full_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/Users/parth/temp/github_events/clickhouse_tsv/&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stream!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="ss"&gt;:read_ahead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100_000&lt;/span&gt;&lt;span class="p"&gt;}])&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;streams&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_enumerables&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
  &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;String&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tab_char&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;at&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;%{}&lt;/span&gt; &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;acc&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
  &lt;span class="no"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;acc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Result&lt;/span&gt;
&lt;span class="c1"&gt;# [{"CommitCommentEvent", 9954564}, {"CreateEvent", 295163106}]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above code I create a list of streams - one each for each tsv file chunk I have. Next I pipe it into &lt;code&gt;Flow.from_enumerables&lt;/code&gt; that creates flow definition from it. Then &lt;code&gt;Flow.map&lt;/code&gt; processes each line in parallel. Under the hood Flow.map will spin up as many mapper processes as there are CPU cores. You can read more about Flow in its excellent &lt;a href="https://hexdocs.pm/flow/Flow.html"&gt;documentation&lt;/a&gt;.&lt;br&gt;
The mapper is very simple - it splits the line on tab character and returns the 2nd item in the list.&lt;/p&gt;

&lt;p&gt;Then the &lt;code&gt;Flow.partition&lt;/code&gt; applies a hash function on the results of the &lt;code&gt;Flow.map&lt;/code&gt; stage and sends the result to one of the reducers. The strings that have the same hash value will go to the same reducer.&lt;/p&gt;

&lt;p&gt;Finally there is a &lt;code&gt;Enum.to_list()&lt;/code&gt; at the end of the Flow that actually causes the Flow pipeline to start running. Remember Flow is lazy it won't start running until you call a Enum.to_list or a Flow.run on the pipeline.&lt;/p&gt;

&lt;p&gt;When I ran the above pipeline it took around &lt;code&gt;630 seconds&lt;/code&gt; to run and the CPU usage was around consistent &lt;code&gt;4.5 cpu cores&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;I wanted to see if using a regex in the Flow.map stage will make things any faster. Below is updated code with regex parsing in the Flow.map stage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="n"&gt;regex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Regex&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compile!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"[^&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s2"&gt;]+&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s2"&gt;([^&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s2"&gt;]+)&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s2"&gt;.+"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;streams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="n"&gt;for&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ls!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"/Users/parth/temp/github_events/clickhouse_tsv"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;full_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/Users/parth/temp/github_events/clickhouse_tsv/&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stream!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="ss"&gt;:read_ahead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100_000&lt;/span&gt;&lt;span class="p"&gt;}])&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;streams&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_enumerables&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Regex&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;event_type&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Flow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;%{}&lt;/span&gt; &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;acc&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
  &lt;span class="no"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;acc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above code ran in &lt;code&gt;470 secs&lt;/code&gt; which is a &lt;code&gt;~25%&lt;/code&gt; improvement in speed compared to &lt;code&gt;630 seconds&lt;/code&gt;. I suspect the regex approach creates less garbage compared to the previous approach of splitting the string into a list and then picking the second item and hence might be faster.&lt;/p&gt;

&lt;p&gt;In the next part we will explore if we can further optimise this code.&lt;/p&gt;

</description>
      <category>elixir</category>
      <category>dataprocessing</category>
      <category>etl</category>
    </item>
    <item>
      <title>Data processing with Elixir (Part 1)</title>
      <dc:creator>Parth Patil</dc:creator>
      <pubDate>Sat, 12 Aug 2023 07:28:28 +0000</pubDate>
      <link>https://dev.to/parthex/data-processing-with-elixir-part-1-51n</link>
      <guid>https://dev.to/parthex/data-processing-with-elixir-part-1-51n</guid>
      <description>&lt;p&gt;I wanted to see how fast is the &lt;a href="https://github.com/dashbitco/flow"&gt;Elixir flow library&lt;/a&gt; when processing non trivial amount of data. I downloaded part of &lt;a href="https://ghe.clickhouse.tech/#download-the-dataset"&gt;GH Events&lt;/a&gt; dataset from ClickHouse's website. This dataset has roughly 300 million rows of Github events. It's in TSV format. Uncompressed it comes out to be 78 GBs of data. I have split the file in 10 parts so we can benchmark processing this dataset in parallel. &lt;/p&gt;

&lt;p&gt;The schema of this events in the ClickHouse table format looks as follows&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;github_events&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;file_time&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="nb"&gt;Enum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'CommitCommentEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'CreateEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'DeleteEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'ForkEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'GollumEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'IssueCommentEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'IssuesEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'MemberEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'PublicEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'PullRequestEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'PullRequestReviewCommentEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'PushEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'ReleaseEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'SponsorshipEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'WatchEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'GistEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'FollowEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;17&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'DownloadEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'PullRequestReviewEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;19&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'ForkApplyEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Event'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'TeamAddEvent'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;22&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;actor_login&lt;/span&gt; &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;repo_name&lt;/span&gt; &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="nb"&gt;Enum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'none'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'created'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'added'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'edited'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'deleted'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'opened'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'closed'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'reopened'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'assigned'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'unassigned'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'labeled'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'unlabeled'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'review_requested'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'review_request_removed'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'synchronize'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'started'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'published'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'update'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;17&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'create'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'fork'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;19&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'merged'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;comment_id&lt;/span&gt; &lt;span class="n"&gt;UInt64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;position&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="n"&gt;Int32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;ref_type&lt;/span&gt; &lt;span class="nb"&gt;Enum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'none'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'branch'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'tag'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'repository'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'unknown'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;creator_user_login&lt;/span&gt; &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="n"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="k"&gt;state&lt;/span&gt; &lt;span class="nb"&gt;Enum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'none'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'open'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'closed'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;locked&lt;/span&gt; &lt;span class="n"&gt;UInt8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;assignee&lt;/span&gt; &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;assignees&lt;/span&gt; &lt;span class="n"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;comments&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;author_association&lt;/span&gt; &lt;span class="nb"&gt;Enum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'NONE'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'CONTRIBUTOR'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'OWNER'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'COLLABORATOR'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'MEMBER'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'MANNEQUIN'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;closed_at&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;merged_at&lt;/span&gt; &lt;span class="nb"&gt;DateTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;merge_commit_sha&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;requested_reviewers&lt;/span&gt; &lt;span class="n"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;requested_teams&lt;/span&gt; &lt;span class="n"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="n"&gt;head_ref&lt;/span&gt; &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;head_sha&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_ref&lt;/span&gt; &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;base_sha&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;merged&lt;/span&gt; &lt;span class="n"&gt;UInt8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mergeable&lt;/span&gt; &lt;span class="n"&gt;UInt8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;rebaseable&lt;/span&gt; &lt;span class="n"&gt;UInt8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mergeable_state&lt;/span&gt; &lt;span class="nb"&gt;Enum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'unknown'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'dirty'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'clean'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'unstable'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'draft'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;merged_by&lt;/span&gt; &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;review_comments&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;maintainer_can_modify&lt;/span&gt; &lt;span class="n"&gt;UInt8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;commits&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;additions&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;deletions&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;changed_files&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;diff_hunk&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;original_position&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;commit_id&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;original_commit_id&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;push_size&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;push_distinct_size&lt;/span&gt; &lt;span class="n"&gt;UInt32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;member_login&lt;/span&gt; &lt;span class="n"&gt;LowCardinality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;release_tag_name&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;release_name&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;review_state&lt;/span&gt; &lt;span class="nb"&gt;Enum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'none'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'approved'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'changes_requested'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'commented'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'dismissed'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ENGINE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MergeTree&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;repo_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I am doing this benchmark locally on my Macbook M1 Pro with 10 cores and 16 GB ram.&lt;/p&gt;

&lt;p&gt;On this dataset I wanted to do a few queries as part of benchmarking. Following are the ones I picked&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Frequency count of the event_type&lt;/li&gt;
&lt;li&gt;Top repository&lt;/li&gt;
&lt;li&gt;Top github user&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Firstly I wanted to set a baseline on how fast can I process this data on my laptop using any modern stack and with minimal effort on my end ;) I chose &lt;a href="https://clickhouse.com/docs/en/operations/utilities/clickhouse-local"&gt;ClickHouse Local&lt;/a&gt; Its an amazing tool to process files locally on your laptop without having to even start a server. It automatically uses all the available cores on the host machine.&lt;/p&gt;

&lt;p&gt;I executed the 1st query via ClickHouse Local by running the following&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;~/tools/ch_may_23/clickhouse &lt;span class="nb"&gt;local&lt;/span&gt; &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="s2"&gt;"select c2, count(*) as cnt from file('github_events_a*', TabSeparated) group by c2"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query was consistently using around &lt;code&gt;5.7 cpu cores&lt;/code&gt; out of 10 and finished in &lt;code&gt;85 secs&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;As you see ClickHouse Local allows you to directly query a bunch of files by giving a glob pattern and specifying SQL query on this data. Each column in the input data gets automatically named as c1, c2, etc...&lt;/p&gt;

&lt;p&gt;Now let's see how fast we can process this data in Elixir. If we were doing a single core processing we would use the &lt;code&gt;Stream&lt;/code&gt; module's functions. But that would take too long to process 300m and 78GB of data so I am not going to bother benchmarking it. Let's jump to the next best thing we can do in Elixir. I am going to use &lt;code&gt;Task.async_stream&lt;/code&gt; to process the 10 files in parallel. This is how the code looks&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="n"&gt;tab_char&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="ss"&gt;:binary&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compile_pattern&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# BINARY&lt;/span&gt;

&lt;span class="n"&gt;streams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="n"&gt;for&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ls!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"/Users/parth/temp/github_events/clickhouse_tsv"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;full_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/Users/parth/temp/github_events/clickhouse_tsv/&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stream!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="ss"&gt;:read_ahead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100_000&lt;/span&gt;&lt;span class="p"&gt;}])&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;streams&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;async_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
  &lt;span class="n"&gt;stream&lt;/span&gt;
  &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Stream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;String&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tab_char&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;at&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;(%{},&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;acc&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="no"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;acc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;max_concurrency:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;timeout:&lt;/span&gt; &lt;span class="ss"&gt;:infinity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;flat_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mp&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;(%{},&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;etype&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;acc&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
  &lt;span class="no"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;acc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;etype&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above code there are some optimisations like pre-compiling the pattern that will be used to split each tab separated line in the files. &lt;code&gt;Task.async_stream&lt;/code&gt; processes all the 10 files in parallel. The above took about &lt;code&gt;625 secs&lt;/code&gt; to process and CPU utilisation was around &lt;code&gt;8 CPU cores&lt;/code&gt; out of 10&lt;/p&gt;

&lt;p&gt;&lt;code&gt;625 secs&lt;/code&gt; for processing &lt;code&gt;300 million&lt;/code&gt; events and &lt;code&gt;78 GB&lt;/code&gt; of data is not too shabby given that we have the full power of Elixir at our disposal to process this data and are not constrained by limitations of SQL.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/parthex/data-processing-with-elixir-part-2-43in"&gt;Part 2&lt;/a&gt; we will explore if we can make the Elixir processing faster.&lt;/p&gt;

</description>
      <category>elixir</category>
      <category>etl</category>
      <category>clickhouse</category>
    </item>
    <item>
      <title>Beautiful pipelines in Elixir</title>
      <dc:creator>Parth Patil</dc:creator>
      <pubDate>Fri, 11 Aug 2023 18:45:35 +0000</pubDate>
      <link>https://dev.to/parthex/beautiful-pipelines-in-elixir-31cd</link>
      <guid>https://dev.to/parthex/beautiful-pipelines-in-elixir-31cd</guid>
      <description>&lt;p&gt;Even though I have used Elixir for several years now it still &lt;br&gt;
 amazes me how much you can achieve with a single pipeline in Elixir language. The other day I had to find the total size of the files in a given dir. I thought of just parsing the output of the &lt;code&gt;ls -l&lt;/code&gt; command and then adding up all the file sizes to get the total. Here is what I ended up with.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="s2"&gt;"ls -l /Users/parth/temp/github_events/2015/"&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;to_charlist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="ss"&gt;:os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;to_string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;String&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="no"&gt;String&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;starts_with?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"total"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="no"&gt;String&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;~r/\s+/&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;at&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;is_nil&lt;/span&gt; &lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;String&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_integer&lt;/span&gt; &lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Enum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;&amp;amp;1&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="ss"&gt;:math&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Float&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;IO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inspect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;label:&lt;/span&gt; &lt;span class="s2"&gt;"Size in GBs =&amp;gt; "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;P.S In practise this code might be shortened by using the standard library functions associated with file handling. But it was fun coming up with this!&lt;/p&gt;

</description>
      <category>elixir</category>
    </item>
    <item>
      <title>Using ECharts in Elixir Livebook</title>
      <dc:creator>Parth Patil</dc:creator>
      <pubDate>Tue, 08 Aug 2023 04:37:01 +0000</pubDate>
      <link>https://dev.to/parthex/using-echarts-in-elixir-livebook-3540</link>
      <guid>https://dev.to/parthex/using-echarts-in-elixir-livebook-3540</guid>
      <description>&lt;p&gt;I have been using a lot of Elixir &lt;a href="https://livebook.dev/"&gt;Livebook&lt;/a&gt; lately and I am having a lot of fun with data exploration and analysis using Livebook. Livebook has great support for drawing charts using the excellent Vega-Lite library. Even though Vega-Lite is great and is sufficient for many usecases I wanted to try using &lt;a href="https://echarts.apache.org/en/index.html"&gt;Apache ECharts&lt;/a&gt; inside Livebook. We use ECharts at work and are happy with it.&lt;/p&gt;

&lt;p&gt;I looked around and could not find any prior work/library that allowed me to use ECharts in Livebook so I decided to create a hex package for Livebook that used ECharts to draw charts. Its called &lt;a href="https://github.com/parth-patil/lb_echarts/"&gt;lb_echarts&lt;/a&gt; check it out if you are interested.&lt;/p&gt;

&lt;p&gt;After some research I found that &lt;a href="https://hexdocs.pm/kino/Kino.JS.html"&gt;Kino.JS&lt;/a&gt; could be used to integrate custom JS into Livebook.&lt;/p&gt;

&lt;p&gt;To use &lt;code&gt;Kino.JS&lt;/code&gt; I had to &lt;code&gt;use Kino.JS&lt;/code&gt; in my module and then define the &lt;code&gt;init&lt;/code&gt; JS function within the &lt;code&gt;asset&lt;/code&gt; macro. This looks as follows. Note in the following code I have only shown how do define the &lt;code&gt;asset&lt;/code&gt; section, later I will show how to write rest of the code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="k"&gt;defmodule&lt;/span&gt; &lt;span class="no"&gt;LbEcharts&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="nv"&gt;@moduledoc&lt;/span&gt; &lt;span class="sd"&gt;"""
  Library to use Echarts in LiveBook
  """&lt;/span&gt;

  &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="no"&gt;Kino&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;JS&lt;/span&gt;

  &lt;span class="n"&gt;asset&lt;/span&gt; &lt;span class="s2"&gt;"main.js"&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="sd"&gt;"""
    export async function init(ctx, chart_data) {
      await ctx.importJS("https://cdn.jsdelivr.net/npm/echarts@5.4.3/dist/echarts.min.js");

      ctx.root.innerHTML = `
        &amp;lt;div id="echart_window" style="width: 800px;height:500px;"&amp;gt;&amp;lt;/div&amp;gt;`

        var myChart = echarts.init(document.getElementById('echart_window'));

        // Specify the configuration items and data for the chart
        var options = JSON.parse(chart_data);

        // Display the chart using the configuration items and data just specified.
        myChart.setOption(options);
    }
    """&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The entire contents of the &lt;code&gt;asset&lt;/code&gt; section are enclosed in triple quotes. Whatever you write inside these triple quotes are considered as JS code. In the above code you will note that I have used &lt;code&gt;async&lt;/code&gt; and &lt;code&gt;await&lt;/code&gt; in the JS code. This is because I am loading a JS file and I want to wait for the JS file to be loaded before executing rest of the code.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;init&lt;/code&gt; function in JS gets the context in &lt;code&gt;ctx&lt;/code&gt; variable and second argument is what was passed to the &lt;code&gt;Kino.JS.new()&lt;/code&gt; function. One can define the HTML markup for the output Livebook cell in the &lt;code&gt;ctx.root.innerHTML&lt;/code&gt; attribute. Note how the markup is enclosed in the backticks.&lt;/p&gt;

&lt;p&gt;Now lets look at how we implement a bar chart.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="k"&gt;defmodule&lt;/span&gt; &lt;span class="no"&gt;LbEcharts&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="nv"&gt;@moduledoc&lt;/span&gt; &lt;span class="sd"&gt;"""
  Library to use Echarts in LiveBook
  """&lt;/span&gt;

  &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="no"&gt;Kino&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;JS&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chart_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="no"&gt;Kino&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="no"&gt;JS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;__MODULE__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chart_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="n"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
      &lt;span class="p"&gt;%{&lt;/span&gt;
        &lt;span class="s2"&gt;"title"&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="s2"&gt;"tooltip"&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;%{},&lt;/span&gt;
        &lt;span class="s2"&gt;"legend"&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="s2"&gt;"data"&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="s2"&gt;"xAxis"&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;%{&lt;/span&gt;&lt;span class="s2"&gt;"data"&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="s2"&gt;"yAxis"&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;%{},&lt;/span&gt;
        &lt;span class="s2"&gt;"series"&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="p"&gt;%{&lt;/span&gt;
            &lt;span class="s2"&gt;"name"&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"sales"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;"type"&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"bar"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;"data"&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;opts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="o"&gt;|&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;Jason&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encode!&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="no"&gt;LbEcharts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="n"&gt;asset&lt;/span&gt; &lt;span class="s2"&gt;"main.js"&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="sd"&gt;"""
    export async function init(ctx, chart_data) {
      await ctx.importJS("https://cdn.jsdelivr.net/npm/echarts@5.4.3/dist/echarts.min.js");

      ctx.root.innerHTML = `
        &amp;lt;div id="echart_window" style="width: 800px;height:500px;"&amp;gt;&amp;lt;/div&amp;gt;`

        var myChart = echarts.init(document.getElementById('echart_window'));

        // Specify the configuration items and data for the chart
        var options = JSON.parse(chart_data);

        // Display the chart using the configuration items and data just specified.
        myChart.setOption(options);
    }
    """&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Following is how you would use the library to draw a bar chart&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight elixir"&gt;&lt;code&gt;&lt;span class="no"&gt;LbEcharts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;(%{&lt;/span&gt;
  &lt;span class="ss"&gt;title:&lt;/span&gt; &lt;span class="s2"&gt;"Bar Chart"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;legend:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"sales"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="ss"&gt;x:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"shirt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"cardign"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"chiffon shirt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"pants"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"heels"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="s2"&gt;"socks"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="ss"&gt;y:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To learn more check out &lt;a href="https://github.com/parth-patil/lb_echarts/"&gt;lb_echarts&lt;/a&gt;&lt;/p&gt;

</description>
      <category>elixir</category>
      <category>livebook</category>
      <category>echarts</category>
    </item>
  </channel>
</rss>
