<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ppaanngggg</title>
    <description>The latest articles on DEV Community by ppaanngggg (@ppaanngggg).</description>
    <link>https://dev.to/ppaanngggg</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1350534%2F688b3652-01f1-4845-9173-61b268ca0d1e.jpeg</url>
      <title>DEV Community: ppaanngggg</title>
      <link>https://dev.to/ppaanngggg</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ppaanngggg"/>
    <language>en</language>
    <item>
      <title>YOLOv12: The Next Evolution in Document Layout Analysis</title>
      <dc:creator>ppaanngggg</dc:creator>
      <pubDate>Mon, 07 Apr 2025 14:47:33 +0000</pubDate>
      <link>https://dev.to/ppaanngggg/yolov12-the-next-evolution-in-document-layout-analysis-2o2a</link>
      <guid>https://dev.to/ppaanngggg/yolov12-the-next-evolution-in-document-layout-analysis-2o2a</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Building upon the success of &lt;a href="https://docs.ultralytics.com/models/yolo11/" rel="noopener noreferrer"&gt;YOLOv11&lt;/a&gt;, which demonstrated significant improvements over &lt;a href="https://docs.ultralytics.com/models/yolov8/" rel="noopener noreferrer"&gt;YOLOv8&lt;/a&gt; in document layout analysis, &lt;a href="https://docs.ultralytics.com/models/yolo12/#overview" rel="noopener noreferrer"&gt;YOLOv12&lt;/a&gt; introduces several architectural innovations and optimization techniques that further advance the field. I retrained the complete YOLOv12 series on the &lt;a href="https://github.com/DS4SD/DocLayNet" rel="noopener noreferrer"&gt;DocLayNet&lt;/a&gt; dataset.&lt;/p&gt;

&lt;p&gt;The project uses my codebase &lt;a href="https://github.com/ppaanngggg/yolo-doclaynet" rel="noopener noreferrer"&gt;yolo-doclaynet&lt;/a&gt;. You can find all free models on &lt;a href="https://huggingface.co/hantian/yolo-doclaynet" rel="noopener noreferrer"&gt;huggingface&lt;/a&gt;, while the largest model is available &lt;a href="https://buymeacoffee.com/ppaanngggg/e/384798" rel="noopener noreferrer"&gt;here&lt;/a&gt; (trained using rented GPU resources).&lt;/p&gt;

&lt;h1&gt;
  
  
  Key Improvements in YOLOv12
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Area Attention Mechanism&lt;/strong&gt;: This feature efficiently processes large receptive fields by dividing feature maps into equal-sized regions, typically 4. This approach maintains effectiveness while significantly reducing computational costs compared to standard self-attention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Residual Efficient Layer Aggregation Networks (R-ELAN)&lt;/strong&gt;: This improved feature aggregation module introduces block-level residual connections with scaling and a redesigned bottleneck-like structure for better optimization in large-scale attention models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimized Attention Architecture&lt;/strong&gt;: This streamlined attention mechanism incorporates multiple efficiency improvements, including FlashAttention, removed positional encoding, and adjusted MLP ratios. It also uses a 7x7 separable convolution as a "position perceiver" and leverages strategic convolution operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Experimental Results
&lt;/h1&gt;

&lt;p&gt;Using the DocLayNet dataset, we conducted comprehensive evaluations of all YOLOv12 variants compared to previous series such as YOLOv11 and YOLOv8. The results show that YOLOv12 significantly outperforms YOLOv8. Smaller YOLOv12 models (nano, small, and medium) also demonstrate substantial improvements over YOLOv11, though larger models (Large and Extra) show comparable performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance Metrics
&lt;/h2&gt;

&lt;p&gt;The figure shows the comparative performance metrics (model size and mAP scores) across different YOLO versions, from YOLOv8 to YOLOv12, demonstrating the evolution and improvements in model efficiency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1n17fv0r06hwjpfrxrwe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1n17fv0r06hwjpfrxrwe.png" alt="Image description" width="800" height="590"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The table above compares model sizes (in millions of parameters) and mAP scores across different YOLO versions from v8 to v12.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Size/Model&lt;/th&gt;
&lt;th&gt;YOLOv12&lt;/th&gt;
&lt;th&gt;YOLOv11&lt;/th&gt;
&lt;th&gt;YOLOv10&lt;/th&gt;
&lt;th&gt;YOLOv9&lt;/th&gt;
&lt;th&gt;YOLOv8&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Nano&lt;/td&gt;
&lt;td&gt;2.6M/0.756&lt;/td&gt;
&lt;td&gt;2.6M/0.735&lt;/td&gt;
&lt;td&gt;2.3M/0.730&lt;/td&gt;
&lt;td&gt;2.0M/0.737&lt;/td&gt;
&lt;td&gt;3.2M/0.718&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;9.3M/0.782&lt;/td&gt;
&lt;td&gt;9.4M/0.767&lt;/td&gt;
&lt;td&gt;7.2M/0.762&lt;/td&gt;
&lt;td&gt;7.2M/0.766&lt;/td&gt;
&lt;td&gt;11.2M/0.752&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;20.2M/0.788&lt;/td&gt;
&lt;td&gt;20.1M/0.781&lt;/td&gt;
&lt;td&gt;15.4M/0.780&lt;/td&gt;
&lt;td&gt;20.1M/0.775&lt;/td&gt;
&lt;td&gt;25.9M/0.775&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large&lt;/td&gt;
&lt;td&gt;26.4M/0.792&lt;/td&gt;
&lt;td&gt;25.3M/0.793&lt;/td&gt;
&lt;td&gt;24.4M/0.790&lt;/td&gt;
&lt;td&gt;25.5M/0.782&lt;/td&gt;
&lt;td&gt;43.7M/0.783&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extra&lt;/td&gt;
&lt;td&gt;59.1M/0.794&lt;/td&gt;
&lt;td&gt;56.9M/0.794&lt;/td&gt;
&lt;td&gt;29.5M/0.793&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;68.2M/0.787&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Key Findings
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Consistent Small Model Improvements:&lt;/strong&gt; YOLOv12 shows notable performance gains in nano through medium variants, with mAP improvements of up to 0.021 points compared to YOLOv11.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficient Area Attention:&lt;/strong&gt; The new Area Attention Mechanism successfully reduces computational complexity while maintaining high accuracy, particularly evident in the nano and small models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parameter Efficiency:&lt;/strong&gt; YOLOv12 achieves superior or comparable performance with significantly fewer parameters than YOLOv8, with the large model using only 26.4M parameters compared to YOLOv8's 43.7M.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Competitive Large Model Performance:&lt;/strong&gt; Larger variants (Large and Extra) maintain performance parity with YOLOv11, demonstrating mAP scores of 0.792 and 0.794 respectively.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;YOLOv12 represents a significant step forward in document layout analysis, offering improved accuracy and efficiency across all model sizes. The series demonstrates particular strength in handling complex document structures while maintaining real-time performance capabilities.&lt;/p&gt;

&lt;p&gt;The nano and small models show substantial improvements with minimal computational cost, making them ideal for mobile devices and other edge computing applications requiring document layout analysis.&lt;/p&gt;

</description>
      <category>computervision</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>Design0: Effortless Design for Everyone</title>
      <dc:creator>ppaanngggg</dc:creator>
      <pubDate>Wed, 06 Nov 2024 01:36:54 +0000</pubDate>
      <link>https://dev.to/ppaanngggg/design0-effortless-design-for-everyone-41c0</link>
      <guid>https://dev.to/ppaanngggg/design0-effortless-design-for-everyone-41c0</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/pgai"&gt;Open Source AI Challenge with pgai and Ollama &lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Cover image is designed by Desigin0!&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;As a non-professional designer seeking to create attractive posts and cover images for my blogs and projects, I envisioned a tool that would allow me to select a base image, highlight specific areas, and use natural language to instruct AI on desired edits, overcoming the unpredictability of current text-to-image generators.&lt;/p&gt;

&lt;p&gt;Enter Design0—an AI-powered design tool I built to simplify image editing with natural language commands. Using Design0, you can search an image database by description (I've included 5,000 free Unsplash images for this demo). Once you've found your image, simply drag and drop to mask areas for editing, then write prompts describing your desired changes. Click "Edit" and watch the magic unfold! 🎉&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Website
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://design0.ai" rel="noopener noreferrer"&gt;https://design0.ai&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Source Code
&lt;/h3&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.dev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/design0webapp" rel="noopener noreferrer"&gt;
        design0webapp
      &lt;/a&gt; / &lt;a href="https://github.com/design0webapp/design0" rel="noopener noreferrer"&gt;
        design0
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Effortless Design for Everyone 
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Design0&lt;/h1&gt;

&lt;/div&gt;
&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/pgai" rel="nofollow"&gt;Open Source AI Challenge with pgai and Ollama&lt;/a&gt;, read more about it &lt;a href="https://dev.to/ppaanngggg/design0-effortless-design-for-everyone-41c0" rel="nofollow"&gt; Design0: Effortless Design for Everyone &lt;/a&gt; .&lt;/em&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Website&lt;/h2&gt;

&lt;/div&gt;
&lt;p&gt;&lt;a href="https://design0.app" rel="nofollow noopener noreferrer"&gt;https://design0.app&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;What I Built&lt;/h2&gt;

&lt;/div&gt;
&lt;p&gt;As a non-professional designer seeking to create attractive posts and cover images for my blogs and projects, I envisioned a tool that would allow me to select a base image, highlight specific areas, and use natural language to instruct AI on desired edits, overcoming the unpredictability of current text-to-image generators.&lt;/p&gt;
&lt;p&gt;Enter Design0—an AI-powered design tool I built to simplify image editing with natural language commands. Using Design0, you can search an image database by description. Once you've found your image, simply drag and drop to mask areas for editing, then write prompts describing your desired changes. Click "Edit" and watch the magic unfold! 🎉&lt;/p&gt;
&lt;/div&gt;



&lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/design0webapp/design0" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;
&lt;br&gt;


&lt;h3&gt;
  
  
  Screenshots
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Search for the base image you want&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frrmrq3whkjlk1cqs2j95.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frrmrq3whkjlk1cqs2j95.png" alt="homepage" width="800" height="662"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Select the area you want to edit, then enter your prompt&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatmn95zim6j28h3zimjg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatmn95zim6j28h3zimjg.png" alt="edit and prompt" width="800" height="1309"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Click "Edit" and wait for the new image to generate&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F608rbu4n0k9dfzm7an5x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F608rbu4n0k9dfzm7an5x.png" alt="final image" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Tools Used
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;pgvector&lt;/code&gt;: I use pgvector to store image embeddings as vector data types. I also added an HNSW index on the embeddings to accelerate search performance.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;          &lt;span class="nb"&gt;varchar&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt;         &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;category&lt;/span&gt;    &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt;   &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;images_embedding_idx&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt;
    &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;hnsw&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;pgai&lt;/code&gt;: I use pgai to invoke Ollama's embed API and search for images based on the distance between embeddings.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT id,url,category,description,embedding&amp;lt;=&amp;gt;ai.ollama_embed(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;nomic-embed-text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, %s, host=&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ollama_host&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;) as distance FROM images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Ollama&lt;/code&gt;: I use Ollama to host the state-of-the-art, compact text embedding model &lt;code&gt;nomic-embed-text&lt;/code&gt;. This model embeds both image descriptions and user queries, allowing us to search for images that match users' requirements.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This demo showcases my 2D image design tool. I aim to develop it into a fully functional SaaS platform, empowering non-professional users to create stunning designs on their own.&lt;/p&gt;

&lt;p&gt;I use Ollama to host the embedding model, which qualifies me for the 'Open-source Models from Ollama' prize.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>pgaichallenge</category>
      <category>database</category>
      <category>ai</category>
    </item>
    <item>
      <title>YOLOv11: A New Breakthrough in Document Layout Analysis</title>
      <dc:creator>ppaanngggg</dc:creator>
      <pubDate>Wed, 30 Oct 2024 07:50:38 +0000</pubDate>
      <link>https://dev.to/ppaanngggg/yolov11-a-new-breakthrough-in-document-layout-analysis-2n2p</link>
      <guid>https://dev.to/ppaanngggg/yolov11-a-new-breakthrough-in-document-layout-analysis-2n2p</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;As mentioned in the &lt;a href="https://dev.to/ppaanngggg/how-to-analyze-document-layout-by-yolo-3jd"&gt;previous blog post&lt;/a&gt;, YOLOv8 performs exceptionally well in Document Layout Analysis. I trained all models from the YOLOv8 series by &lt;a href="https://github.com/DS4SD/DocLayNet" rel="noopener noreferrer"&gt;DocLayNet&lt;/a&gt; dataset and found that even the smallest model achieves an overall mAP50-95 of 71.8, while the largest model reaches an impressive 78.7.&lt;/p&gt;

&lt;p&gt;Recently, Ultralytics released &lt;a href="https://docs.ultralytics.com/models/yolo11/" rel="noopener noreferrer"&gt;YOLOv11&lt;/a&gt;, the latest iteration in their YOLO series of real-time object detectors. This new version brings significant improvements to both architecture and training methods.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcspt72krvspfpqmaqw0u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcspt72krvspfpqmaqw0u.png" alt="benchmark from Ultralytics" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🚀 The results look promising! I decided to train all YOLOv11 models on the DocLayNet dataset again and compare them with the previous YOLOv8 series.&lt;/p&gt;

&lt;h2&gt;
  
  
  Training Method
&lt;/h2&gt;

&lt;p&gt;For this experiment, I continued to use my repository &lt;a href="https://github.com/ppaanngggg/yolo-doclaynet" rel="noopener noreferrer"&gt;https://github.com/ppaanngggg/yolo-doclaynet&lt;/a&gt; to prepare the data and train the models using my custom scripts. This approach ensures consistency in the data preparation and training process, allowing for a fair comparison between YOLOv8 and YOLOv11 models.&lt;/p&gt;

&lt;p&gt;The training and evaluation process for YOLOv11 models is straightforward and can be executed with simple command-line instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# To train the model&lt;/span&gt;
python train.py &lt;span class="o"&gt;{&lt;/span&gt;base-model&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# To evaluate the model&lt;/span&gt;
python eval.py &lt;span class="o"&gt;{&lt;/span&gt;path-to-your-trained-model&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Comparing the Results
&lt;/h2&gt;

&lt;p&gt;Here is the detailed evaluation table comparing YOLOv8 models with YOLOv11:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;label&lt;/th&gt;
&lt;th&gt;boxes&lt;/th&gt;
&lt;th&gt;yolov8n&lt;/th&gt;
&lt;th&gt;yolov11n&lt;/th&gt;
&lt;th&gt;yolov8s&lt;/th&gt;
&lt;th&gt;yolov11s&lt;/th&gt;
&lt;th&gt;yolov8m&lt;/th&gt;
&lt;th&gt;yolov11m&lt;/th&gt;
&lt;th&gt;yolov8l&lt;/th&gt;
&lt;th&gt;yolov11l&lt;/th&gt;
&lt;th&gt;yolov8x&lt;/th&gt;
&lt;th&gt;yolov11x&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Params (M)&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;3.2&lt;/td&gt;
&lt;td&gt;2.6&lt;/td&gt;
&lt;td&gt;11.2&lt;/td&gt;
&lt;td&gt;9.4&lt;/td&gt;
&lt;td&gt;25.9&lt;/td&gt;
&lt;td&gt;20.1&lt;/td&gt;
&lt;td&gt;43.7&lt;/td&gt;
&lt;td&gt;25.3&lt;/td&gt;
&lt;td&gt;68.2&lt;/td&gt;
&lt;td&gt;56.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Caption&lt;/td&gt;
&lt;td&gt;1542&lt;/td&gt;
&lt;td&gt;0.682&lt;/td&gt;
&lt;td&gt;0.717&lt;/td&gt;
&lt;td&gt;0.721&lt;/td&gt;
&lt;td&gt;0.744&lt;/td&gt;
&lt;td&gt;0.746&lt;/td&gt;
&lt;td&gt;0.746&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;0.772&lt;/td&gt;
&lt;td&gt;0.753&lt;/td&gt;
&lt;td&gt;0.765&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Footnote&lt;/td&gt;
&lt;td&gt;387&lt;/td&gt;
&lt;td&gt;0.614&lt;/td&gt;
&lt;td&gt;0.634&lt;/td&gt;
&lt;td&gt;0.669&lt;/td&gt;
&lt;td&gt;0.683&lt;/td&gt;
&lt;td&gt;0.696&lt;/td&gt;
&lt;td&gt;0.701&lt;/td&gt;
&lt;td&gt;0.702&lt;/td&gt;
&lt;td&gt;0.715&lt;/td&gt;
&lt;td&gt;0.717&lt;/td&gt;
&lt;td&gt;0.71&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Formula&lt;/td&gt;
&lt;td&gt;1966&lt;/td&gt;
&lt;td&gt;0.655&lt;/td&gt;
&lt;td&gt;0.673&lt;/td&gt;
&lt;td&gt;0.695&lt;/td&gt;
&lt;td&gt;0.705&lt;/td&gt;
&lt;td&gt;0.723&lt;/td&gt;
&lt;td&gt;0.729&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;0.747&lt;/td&gt;
&lt;td&gt;0.765&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;List-item&lt;/td&gt;
&lt;td&gt;10521&lt;/td&gt;
&lt;td&gt;0.789&lt;/td&gt;
&lt;td&gt;0.81&lt;/td&gt;
&lt;td&gt;0.818&lt;/td&gt;
&lt;td&gt;0.836&lt;/td&gt;
&lt;td&gt;0.836&lt;/td&gt;
&lt;td&gt;0.843&lt;/td&gt;
&lt;td&gt;0.841&lt;/td&gt;
&lt;td&gt;0.847&lt;/td&gt;
&lt;td&gt;0.841&lt;/td&gt;
&lt;td&gt;0.845&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Page-footer&lt;/td&gt;
&lt;td&gt;3987&lt;/td&gt;
&lt;td&gt;0.588&lt;/td&gt;
&lt;td&gt;0.591&lt;/td&gt;
&lt;td&gt;0.61&lt;/td&gt;
&lt;td&gt;0.621&lt;/td&gt;
&lt;td&gt;0.64&lt;/td&gt;
&lt;td&gt;0.653&lt;/td&gt;
&lt;td&gt;0.641&lt;/td&gt;
&lt;td&gt;0.678&lt;/td&gt;
&lt;td&gt;0.655&lt;/td&gt;
&lt;td&gt;0.684&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Page-header&lt;/td&gt;
&lt;td&gt;3365&lt;/td&gt;
&lt;td&gt;0.707&lt;/td&gt;
&lt;td&gt;0.704&lt;/td&gt;
&lt;td&gt;0.754&lt;/td&gt;
&lt;td&gt;0.76&lt;/td&gt;
&lt;td&gt;0.769&lt;/td&gt;
&lt;td&gt;0.778&lt;/td&gt;
&lt;td&gt;0.776&lt;/td&gt;
&lt;td&gt;0.788&lt;/td&gt;
&lt;td&gt;0.784&lt;/td&gt;
&lt;td&gt;0.795&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Picture&lt;/td&gt;
&lt;td&gt;3497&lt;/td&gt;
&lt;td&gt;0.723&lt;/td&gt;
&lt;td&gt;0.758&lt;/td&gt;
&lt;td&gt;0.762&lt;/td&gt;
&lt;td&gt;0.783&lt;/td&gt;
&lt;td&gt;0.789&lt;/td&gt;
&lt;td&gt;0.8&lt;/td&gt;
&lt;td&gt;0.796&lt;/td&gt;
&lt;td&gt;0.805&lt;/td&gt;
&lt;td&gt;0.805&lt;/td&gt;
&lt;td&gt;0.802&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Section-header&lt;/td&gt;
&lt;td&gt;8544&lt;/td&gt;
&lt;td&gt;0.709&lt;/td&gt;
&lt;td&gt;0.713&lt;/td&gt;
&lt;td&gt;0.727&lt;/td&gt;
&lt;td&gt;0.745&lt;/td&gt;
&lt;td&gt;0.742&lt;/td&gt;
&lt;td&gt;0.753&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;0.748&lt;/td&gt;
&lt;td&gt;0.751&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Table&lt;/td&gt;
&lt;td&gt;2394&lt;/td&gt;
&lt;td&gt;0.82&lt;/td&gt;
&lt;td&gt;0.846&lt;/td&gt;
&lt;td&gt;0.854&lt;/td&gt;
&lt;td&gt;0.874&lt;/td&gt;
&lt;td&gt;0.88&lt;/td&gt;
&lt;td&gt;0.88&lt;/td&gt;
&lt;td&gt;0.885&lt;/td&gt;
&lt;td&gt;0.891&lt;/td&gt;
&lt;td&gt;0.886&lt;/td&gt;
&lt;td&gt;0.89&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text&lt;/td&gt;
&lt;td&gt;29917&lt;/td&gt;
&lt;td&gt;0.845&lt;/td&gt;
&lt;td&gt;0.851&lt;/td&gt;
&lt;td&gt;0.86&lt;/td&gt;
&lt;td&gt;0.869&lt;/td&gt;
&lt;td&gt;0.876&lt;/td&gt;
&lt;td&gt;0.878&lt;/td&gt;
&lt;td&gt;0.878&lt;/td&gt;
&lt;td&gt;0.88&lt;/td&gt;
&lt;td&gt;0.877&lt;/td&gt;
&lt;td&gt;0.883&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Title&lt;/td&gt;
&lt;td&gt;334&lt;/td&gt;
&lt;td&gt;0.762&lt;/td&gt;
&lt;td&gt;0.793&lt;/td&gt;
&lt;td&gt;0.806&lt;/td&gt;
&lt;td&gt;0.817&lt;/td&gt;
&lt;td&gt;0.83&lt;/td&gt;
&lt;td&gt;0.832&lt;/td&gt;
&lt;td&gt;0.846&lt;/td&gt;
&lt;td&gt;0.844&lt;/td&gt;
&lt;td&gt;0.84&lt;/td&gt;
&lt;td&gt;0.848&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;All&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;66454&lt;/td&gt;
&lt;td&gt;0.718&lt;/td&gt;
&lt;td&gt;0.735&lt;/td&gt;
&lt;td&gt;0.752&lt;/td&gt;
&lt;td&gt;0.767&lt;/td&gt;
&lt;td&gt;0.775&lt;/td&gt;
&lt;td&gt;0.781&lt;/td&gt;
&lt;td&gt;0.783&lt;/td&gt;
&lt;td&gt;0.793&lt;/td&gt;
&lt;td&gt;0.787&lt;/td&gt;
&lt;td&gt;0.794&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I've also created a plot to illustrate the relationship between model size and score for these two series:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6swsalv3et212eitnt8d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6swsalv3et212eitnt8d.png" alt="plot of YOLOv8 vs YOLOv11" width="687" height="486"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Based on the table and plot above, we can conclude &lt;/p&gt;

&lt;p&gt;Based on the table and plot above, we can conclude that YOLOv11 models consistently outperform their YOLOv8 counterparts across all sizes. The improvements are particularly noticeable in the smaller models, with YOLOv11n achieving a 1.7% increase in mAP50-95 compared to YOLOv8n. Furthermore, YOLOv11 models generally have fewer parameters than their YOLOv8 equivalents, indicating improved efficiency in addition to better performance.&lt;/p&gt;

&lt;p&gt;My favorite model is YOLOv11l. It's only about the same size as YOLOv8m, but it outperforms even YOLOv8x!&lt;/p&gt;

&lt;p&gt;However, YOLOv11x shows only a slight improvement over YOLOv11l despite having twice the model size.&lt;/p&gt;

&lt;h2&gt;
  
  
  More
&lt;/h2&gt;

&lt;p&gt;What are your thoughts on the YOLOv11 results? Have you had experience using YOLO models for document layout analysis? I'd love to hear your insights and experiences in the comments below!&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;YOLOv11 documentation: &lt;a href="https://docs.ultralytics.com/models/yolo11/" rel="noopener noreferrer"&gt;https://docs.ultralytics.com/models/yolo11/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;DocLayNet GitHub repository: &lt;a href="https://github.com/DS4SD/DocLayNet" rel="noopener noreferrer"&gt;https://github.com/DS4SD/DocLayNet&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;My YOLO-DocLayNet GitHub project: &lt;a href="https://github.com/ppaanngggg/yolo-doclaynet" rel="noopener noreferrer"&gt;https://github.com/ppaanngggg/yolo-doclaynet&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>computervision</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>Can AI really solve math from pictures?</title>
      <dc:creator>ppaanngggg</dc:creator>
      <pubDate>Fri, 12 Jul 2024 06:52:33 +0000</pubDate>
      <link>https://dev.to/ppaanngggg/can-ai-really-solve-math-from-pictures-3315</link>
      <guid>https://dev.to/ppaanngggg/can-ai-really-solve-math-from-pictures-3315</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff373gqdqjb3bhobiaqw3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff373gqdqjb3bhobiaqw3.png" alt="image of AI solve math"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Recently, I found the MathVista benchmark evaluates LLMs solving math problems from pictures. Remarkably, &lt;strong&gt;Claude 3.5 Sonnet, Gemini 1.5 Pro (May 2024),&lt;/strong&gt; and &lt;strong&gt;GPT-4o&lt;/strong&gt; have outperformed the average human. &lt;/p&gt;

&lt;p&gt;And the newly released &lt;strong&gt;Claude 3.5 Sonnet&lt;/strong&gt; reached a score of &lt;strong&gt;67&lt;/strong&gt;, much higher than &lt;strong&gt;Gemini 1.5 Pro (May 2024) (63.9)&lt;/strong&gt; and &lt;strong&gt;GPT-4o (63.8)&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is MathVista?
&lt;/h3&gt;

&lt;p&gt;Here is a quota introduction of MathVista from its &lt;a href="https://mathvista.github.io/" rel="noopener noreferrer"&gt;homepage&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To bridge this gap, we present MathVista, a benchmark designed to combine challenges from diverse mathematical and visual tasks. It consists of &lt;strong&gt;6,141 examples&lt;/strong&gt;, derived from &lt;strong&gt;28 existing multimodal datasets&lt;/strong&gt; involving mathematics and &lt;strong&gt;3 newly created datasets&lt;/strong&gt; (i.e., &lt;strong&gt;IQTest, FunctionQA, and PaperQA&lt;/strong&gt;). Completing these tasks requires fine-grained, deep visual understanding and compositional reasoning, which all state-of-the-art foundation &lt;br&gt;
models find challenging.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the &lt;a href="https://mathvista.github.io/#leaderboard" rel="noopener noreferrer"&gt;leaderboard&lt;/a&gt; published:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frny669r7hxbo2zyt5z0t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frny669r7hxbo2zyt5z0t.png" alt="latest mathvista leaderboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  My Experiments
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Data source
&lt;/h3&gt;

&lt;p&gt;I found a website called &lt;a href="https://www.math-exercises.com/" rel="noopener noreferrer"&gt;Math-Exercises&lt;/a&gt; that categorizes math problems. It offers pictures of the problems and their answers.&lt;/p&gt;

&lt;p&gt;I picked 5 problems from this page and tested them on &lt;strong&gt;Claude 3.5 Sonnet&lt;/strong&gt;, &lt;strong&gt;Gemini 1.5 Pro&lt;/strong&gt;, and &lt;strong&gt;GPT-4o&lt;/strong&gt;. Let me show you each problem and their answers. Let's start.&lt;/p&gt;

&lt;h3&gt;
  
  
  Round 1 - Set Operations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Find the intersection A∩B, union A∪B and differences A-B, B-A of sets A, B if :&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1o4pjp6mokmvkpkscxa2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1o4pjp6mokmvkpkscxa2.png" alt="set operation problem 1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude 3.5 Sonnet:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Facdcn162g7ddpw5j4dbq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Facdcn162g7ddpw5j4dbq.png" alt="Claude solution of set operation problem 1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini 1.5 Pro:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wlwjak33t8bxi19ckba.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wlwjak33t8bxi19ckba.png" alt="Gemini solution of set operation problem 1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-4o:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7xluokiyuz3adw5p0hs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7xluokiyuz3adw5p0hs.png" alt="GPT solution of set operation problem 1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Right Answer:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd1w1azvtxe2kcd36p1t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd1w1azvtxe2kcd36p1t.png" alt="Answer of set operation problem 1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;🌟 Awesome! All three models are correct! Maybe it’s too simple for AI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Round 2 - Set Operations Again!
&lt;/h3&gt;

&lt;p&gt;This time, I am using a slightly more abstract set problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Find the intersection A∩B, union A∪B and differences A-B, B-A of sets A, B if :&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frs7ql0vmg0yg0se368nc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frs7ql0vmg0yg0se368nc.png" alt="set operation problem 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude 3.5 Sonnet:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F387irmfxoafg6wxqki95.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F387irmfxoafg6wxqki95.png" alt="Claude solution of set operation problem 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini 1.5 Pro:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm499p47ackre4yumfjkc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm499p47ackre4yumfjkc.png" alt="Gemini solution of set operation problem 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-4o:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ggzuhmsjktgyae7380x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ggzuhmsjktgyae7380x.png" alt="GPT solution of set operation problem 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Right Answer:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jp4hb8228r4qh34l9po.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jp4hb8228r4qh34l9po.png" alt="Answer of set operation problem 2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;🌟 Incredible, they are all right again! Is AI really good at set operations?&lt;/p&gt;

&lt;h3&gt;
  
  
  Round 3 - &lt;strong&gt;Algebraic Expressions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I chose a different type of problem. This time, I want to test AI's algebraic skills.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By grouping the terms factor the polynomials and algebraic expressions :&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F589a3zzg3zd0a3c8ky2t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F589a3zzg3zd0a3c8ky2t.png" alt="Algebraic Expressions problem"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude 3.5 Sonnet:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4m9pnwkd9eou8nyawykm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4m9pnwkd9eou8nyawykm.png" alt="Claude solution of Algebraic Expressions problem"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini 1.5 Pro:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flpgg877z0od66mvkxjbf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flpgg877z0od66mvkxjbf.png" alt="Gemini solution of Algebraic Expressions problem"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-4o:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjf3esr4is1m7iqh5luc8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjf3esr4is1m7iqh5luc8.png" alt="GPT solution of Algebraic Expressions problem"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Right Answer:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsomsto2opoid2loj3l6o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsomsto2opoid2loj3l6o.png" alt="Answer of Algebraic Expressions problem"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;😱 What? They are all wrong this time. I can’t understand how they know how to solve this problem but still make mistakes at the end.&lt;/p&gt;

&lt;h3&gt;
  
  
  Round 4 - &lt;strong&gt;Linear Equations&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Solve the linear equations and check the solution :&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ap28ujpa35hnixs6755.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ap28ujpa35hnixs6755.png" alt="Linear Equations problem"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude 3.5 Sonnet:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9lnaob79593tzfao3ekw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9lnaob79593tzfao3ekw.png" alt="Claude solution of Linear Equations problem"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini 1.5 Pro:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffj221rh9cac23c1reixv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffj221rh9cac23c1reixv.png" alt="Gemini solution of Linear Equations problem"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-4o:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It prints out too much. I only screenshot the final answer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmx5vpmgzmfnew4j5u36.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmx5vpmgzmfnew4j5u36.png" alt="GPT solution of Linear Equations problem"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Right Answer:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F57xp612y2bpeu5ftymtg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F57xp612y2bpeu5ftymtg.png" alt="Answer of Linear Equations problem"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;🌟 They all found the right answer again! However, Claude failed in the answer check, even though the answer is correct. It's so odd.&lt;/p&gt;

&lt;h3&gt;
  
  
  Round 5 - &lt;strong&gt;Inequalities&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Maybe the Equations is too simple for AIs, this time I use inequalities!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Solve the linear inequalities with absolute value :&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkbwyw5n1hevbhs94n7r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkbwyw5n1hevbhs94n7r.png" alt="Inequalities problem"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude 3.5 Sonnet:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlrghu7khm0a68p6sj1k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlrghu7khm0a68p6sj1k.png" alt="Claude solution of Inequalities problem"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini 1.5 Pro:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It prints out too much. I only screenshot the final answer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbd210xkqbjnjn9kz0kwy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbd210xkqbjnjn9kz0kwy.png" alt="Gemini solution of Inequalities problem"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-4o:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It prints out too much. I only screenshot the final answer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4rt9co1jaklqqv6cod88.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4rt9co1jaklqqv6cod88.png" alt="GPT solution of Inequalities problem"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Right Answer:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcxzclsabh2a07afum6xy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcxzclsabh2a07afum6xy.png" alt="Answer of Inequalities problem"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;😢 Maybe it’s too difficult for AI. They are trying hard, but it's all wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;AI can solve many math problems. Although it may fail in some cases, it can still provide useful hints.&lt;/p&gt;

&lt;p&gt;And finally, &lt;strong&gt;Claude 3.5 Sonnet&lt;/strong&gt;, &lt;strong&gt;Gemini 1.5 Pro&lt;/strong&gt;, and &lt;strong&gt;GPT-4o&lt;/strong&gt; received the same scores in this math match! In conclusion, state-of-the-art AIs perform similarly at math.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://poe.com/" rel="noopener noreferrer"&gt;Poe&lt;/a&gt; is really awesome. I used Poe to finish this test. It offers many advanced models and community models for users.&lt;/li&gt;
&lt;li&gt;I am building a web application to help solve math problems. Feel free to try it out: &lt;a href="https://www.aimathsolve.com/" rel="noopener noreferrer"&gt;AI Math Solve&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>How to Use Ollama for Front-end with Streaming Output</title>
      <dc:creator>ppaanngggg</dc:creator>
      <pubDate>Mon, 17 Jun 2024 10:58:52 +0000</pubDate>
      <link>https://dev.to/ppaanngggg/how-to-use-ollama-for-front-end-with-streaming-output-5efj</link>
      <guid>https://dev.to/ppaanngggg/how-to-use-ollama-for-front-end-with-streaming-output-5efj</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;LLM applications are becoming increasingly popular. However, there are numerous LLM models, each with its differences. Handling streaming output can be complex, especially for new front-end developers.&lt;/p&gt;

&lt;p&gt;Thanks to the &lt;a href="https://sdk.vercel.ai/" rel="noopener noreferrer"&gt;AI SDK&lt;/a&gt; developed by Vercel, implementing LLM chat in next.js with streaming output has become incredibly easy. Next, I'll provide a step-by-step tutorial on how to integrate Ollama into your front-end project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install Ollama
&lt;/h2&gt;

&lt;p&gt;Ollama is the premier local LLM inferencer. It allows for direct model downloading and exports APIs for backend use. If you're seeking lower latency or improved privacy through local LLM deployment, Ollama is an excellent choice. For installation, if you're using Linux, simply run the following command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you're using a different OS, please follow this &lt;a href="https://ollama.com/download" rel="noopener noreferrer"&gt;link&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create a New Next.js Project
&lt;/h2&gt;

&lt;p&gt;To create a new Next.js project, enter the command &lt;code&gt;npx create-next-app@latest your-new-project&lt;/code&gt;. Make sure you choose App route mode. After that, run &lt;code&gt;npm dev&lt;/code&gt; and open &lt;code&gt;localhost:3000&lt;/code&gt; in your preferred browser to verify if the new project is set up correctly.&lt;/p&gt;

&lt;p&gt;Next, you need to install the AI SDK:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

npm &lt;span class="nb"&gt;install &lt;/span&gt;ai


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The AI SDK utilizes a sophisticated provider design, enabling you to implement your own LLM provider. At present, it is only necessary to install the Ollama provider offered by third-party support.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

npm &lt;span class="nb"&gt;install &lt;/span&gt;ollama-ai-provider


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Server-Side Code
&lt;/h2&gt;

&lt;p&gt;Now that you've gathered all the prerequisites for your LLM application, create a new file named &lt;code&gt;actions.ts&lt;/code&gt; in the &lt;code&gt;app&lt;/code&gt; folder:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ollama&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ollama-ai-provider&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;streamText&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createStreamableValue&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai/rsc&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;Message&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;assistant&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;continueConversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createStreamableValue&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;llama3:8b&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;textStream&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;streamText&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;textStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;done&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;})().&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;newMessage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Let me provide some explanation about this code.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;interface Message&lt;/code&gt; is a shared interface that establishes the structure of a message. It includes two properties: 'role' (which can be either 'user' or 'assistant') and 'content' (the actual text of the message).&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;continueConversation&lt;/code&gt; function is a server component that utilizes the conversation history to generate the assistant's response. This function interacts with the Ollama model (specifically &lt;code&gt;llama3:8b&lt;/code&gt;, but you can replace it with any model of your choice) to generate a continuous text output.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;streamText&lt;/code&gt; function is part of the AI SDK and it creates a text stream that will be updated with the assistant's response as it is generated.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Client-Side Code
&lt;/h2&gt;

&lt;p&gt;Next, replace the contents of &lt;code&gt;page.tsx&lt;/code&gt; with the new code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use client&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;useState&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;continueConversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Message&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./actions&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;readStreamableValue&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai/rsc&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;Home&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setConversation&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;useState&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;([]);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setInput&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;useState&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;role&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;: &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;

      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;input&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"text"&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
          &lt;span class="na"&gt;onChange&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nf"&gt;setInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt;
          &lt;span class="na"&gt;onClick&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;newMessage&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;continueConversation&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
              &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;

            &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delta&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nf"&gt;readStreamableValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newMessage&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

              &lt;span class="nf"&gt;setConversation&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;assistant&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
              &lt;span class="p"&gt;]);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
          Send Message
        &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This is a very simple UI you can continue talk with LLM model now. There are some important snips:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The &lt;code&gt;input&lt;/code&gt; field captures the user's input. It is controlled by a React state variable that gets updated every time the input changes.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;button&lt;/code&gt; has an &lt;code&gt;onClick&lt;/code&gt; event that triggers the &lt;code&gt;continueConversation&lt;/code&gt; function. This function takes the current conversation history, appends the user's new message, and waits for the assistant's response.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;conversation&lt;/code&gt; array holds the history of the conversation. Each message is displayed on the screen, and new messages are appended at the end. By using &lt;code&gt;readStreamableValue&lt;/code&gt; from the AI SDK, we're able to read the streaming output value from the server component function and update the conversation in real-time.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Let’s Test Now
&lt;/h2&gt;

&lt;p&gt;I type "who are you" into the input placeholder.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl60fvzva3w66mhfud00b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl60fvzva3w66mhfud00b.png" alt="ollama input"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the output of &lt;code&gt;llama:8b&lt;/code&gt; supported by Ollama. You'll notice that the output is printed in a streaming manner.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfkzsvt30nopxm2st7ey.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfkzsvt30nopxm2st7ey.png" alt="ollama output"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Documentation for the AI SDK: &lt;a href="https://sdk.vercel.ai/docs/introduction" rel="noopener noreferrer"&gt;https://sdk.vercel.ai/docs/introduction&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Ollama Github: &lt;a href="https://github.com/ollama/ollama" rel="noopener noreferrer"&gt;https://github.com/ollama/ollama&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Find more models supported oy Ollama: &lt;a href="https://ollama.com/library" rel="noopener noreferrer"&gt;https://ollama.com/library&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>webdev</category>
      <category>ollama</category>
      <category>llm</category>
      <category>nextjs</category>
    </item>
    <item>
      <title>How to analyze document layout by YOLO</title>
      <dc:creator>ppaanngggg</dc:creator>
      <pubDate>Thu, 13 Jun 2024 11:49:13 +0000</pubDate>
      <link>https://dev.to/ppaanngggg/how-to-analyze-document-layout-by-yolo-3jd</link>
      <guid>https://dev.to/ppaanngggg/how-to-analyze-document-layout-by-yolo-3jd</guid>
      <description>&lt;h2&gt;
  
  
  Why we need document layout analysis
&lt;/h2&gt;

&lt;p&gt;Analyzing document layout is critical because it aids in the proper interpretation and understanding of document content. It becomes more significant with the rise of RAG, which relies heavily on the ability to parse documents accurately.&lt;/p&gt;

&lt;p&gt;RAG systems frequently interact with a variety of documents. Scientific papers, for instance, typically have a complex layout that includes figures, tables, references, and structured sections. Proper parsing is important to avoid content disarray. If not done correctly, the LLM could fail due to the 'garbage in, garbage out' principle.&lt;/p&gt;

&lt;p&gt;However, due to the complexity of this problem, it's impossible to apply handcrafted rules as a solution. The best approach is to train a machine learning model.&lt;/p&gt;

&lt;h2&gt;
  
  
  My solution
&lt;/h2&gt;

&lt;p&gt;You can find my solution in &lt;a href="https://github.com/ppaanngggg/yolo-doclaynet" rel="noopener noreferrer"&gt;yolo-doclaynet&lt;/a&gt;. After examining several models and datasets, I've chosen &lt;code&gt;YOLO&lt;/code&gt; as the base model and &lt;code&gt;DocLayNet&lt;/code&gt; as the training data. Let's delve into more details.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;YOLO&lt;/code&gt; is the most advanced vision detection model. It is maintained by &lt;a href="https://github.com/ultralytics/ultralytics" rel="noopener noreferrer"&gt;Ultralytics&lt;/a&gt;, a leading computer vision team. The model is easy to train, evaluate, and deploy. Plus, its size is compact enough to run in a browser or on a smartphone.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;DocLayNet&lt;/code&gt; is a human-annotated document layout segmentation dataset, containing 80,863 pages from a wide variety of document sources. To the best of my knowledge, it is the highest-quality dataset for document layout analysis. You can download and find more information from this &lt;a href="https://github.com/DS4SD/DocLayNet" rel="noopener noreferrer"&gt;link&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Live Demo
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Download the pretrained model from &lt;a href="https://huggingface.co/hantian/yolo-doclaynet/tree/main" rel="noopener noreferrer"&gt;huggingface&lt;/a&gt;. Your options include &lt;code&gt;yolov8n-doclaynet&lt;/code&gt;, &lt;code&gt;yolov8s-doclaynet&lt;/code&gt;, and &lt;code&gt;yolov8m-doclaynet&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Install &lt;code&gt;Ultralytics&lt;/code&gt; by executing &lt;code&gt;pip install ultralytics&lt;/code&gt;. If you encounter any issues, please refer to &lt;a href="https://docs.ultralytics.com/quickstart/#install-ultralytics" rel="noopener noreferrer"&gt;https://docs.ultralytics.com/quickstart/#install-ultralytics&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Copy and modify this code snippet to output the detection result:&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ultralytics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;YOLO&lt;/span&gt;

&lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;your_image_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IMREAD_COLOR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;YOLO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;your_model_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Debugging the model can be challenging when you can only check the plain-text output. Fortunately, visualizing the results is simple:&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ultralytics.utils.plotting&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Annotator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Colors&lt;/span&gt;

&lt;span class="n"&gt;colors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Colors&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;annotator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Annotator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;line_width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;font_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;font_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;box&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;boxes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;boxes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xyxyn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;()):&lt;/span&gt;
    &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;annotator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;box_label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;names&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;colors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bgr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;annotator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;annotated-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;your_image_path&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Examine the annotated image. Here's an example:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fda19oyuolsj8l6r5r6y9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fda19oyuolsj8l6r5r6y9.png" alt="yolo-doclaynet output example"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark
&lt;/h2&gt;

&lt;p&gt;I also evaluate the &lt;code&gt;mAP50-95&lt;/code&gt; performance of the entire &lt;code&gt;yolov8&lt;/code&gt; series on the &lt;code&gt;DocLayNet&lt;/code&gt; test set.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;label&lt;/th&gt;
&lt;th&gt;images&lt;/th&gt;
&lt;th&gt;boxes&lt;/th&gt;
&lt;th&gt;yolov8n&lt;/th&gt;
&lt;th&gt;yolov8s&lt;/th&gt;
&lt;th&gt;yolov8m&lt;/th&gt;
&lt;th&gt;yolov8l&lt;/th&gt;
&lt;th&gt;yolov8x&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Caption&lt;/td&gt;
&lt;td&gt;4983&lt;/td&gt;
&lt;td&gt;1542&lt;/td&gt;
&lt;td&gt;0.682&lt;/td&gt;
&lt;td&gt;0.721&lt;/td&gt;
&lt;td&gt;0.746&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;0.753&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Footnote&lt;/td&gt;
&lt;td&gt;4983&lt;/td&gt;
&lt;td&gt;387&lt;/td&gt;
&lt;td&gt;0.614&lt;/td&gt;
&lt;td&gt;0.669&lt;/td&gt;
&lt;td&gt;0.696&lt;/td&gt;
&lt;td&gt;0.702&lt;/td&gt;
&lt;td&gt;0.717&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Formula&lt;/td&gt;
&lt;td&gt;4983&lt;/td&gt;
&lt;td&gt;1966&lt;/td&gt;
&lt;td&gt;0.655&lt;/td&gt;
&lt;td&gt;0.695&lt;/td&gt;
&lt;td&gt;0.723&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;0.747&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;List-item&lt;/td&gt;
&lt;td&gt;4983&lt;/td&gt;
&lt;td&gt;10521&lt;/td&gt;
&lt;td&gt;0.789&lt;/td&gt;
&lt;td&gt;0.818&lt;/td&gt;
&lt;td&gt;0.836&lt;/td&gt;
&lt;td&gt;0.841&lt;/td&gt;
&lt;td&gt;0.841&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Page-footer&lt;/td&gt;
&lt;td&gt;4983&lt;/td&gt;
&lt;td&gt;3987&lt;/td&gt;
&lt;td&gt;0.588&lt;/td&gt;
&lt;td&gt;0.61&lt;/td&gt;
&lt;td&gt;0.64&lt;/td&gt;
&lt;td&gt;0.641&lt;/td&gt;
&lt;td&gt;0.655&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Page-header&lt;/td&gt;
&lt;td&gt;4983&lt;/td&gt;
&lt;td&gt;3365&lt;/td&gt;
&lt;td&gt;0.707&lt;/td&gt;
&lt;td&gt;0.754&lt;/td&gt;
&lt;td&gt;0.769&lt;/td&gt;
&lt;td&gt;0.776&lt;/td&gt;
&lt;td&gt;0.784&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Picture&lt;/td&gt;
&lt;td&gt;4983&lt;/td&gt;
&lt;td&gt;3497&lt;/td&gt;
&lt;td&gt;0.723&lt;/td&gt;
&lt;td&gt;0.762&lt;/td&gt;
&lt;td&gt;0.789&lt;/td&gt;
&lt;td&gt;0.796&lt;/td&gt;
&lt;td&gt;0.805&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Section-header&lt;/td&gt;
&lt;td&gt;4983&lt;/td&gt;
&lt;td&gt;8544&lt;/td&gt;
&lt;td&gt;0.709&lt;/td&gt;
&lt;td&gt;0.727&lt;/td&gt;
&lt;td&gt;0.742&lt;/td&gt;
&lt;td&gt;0.75&lt;/td&gt;
&lt;td&gt;0.748&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Table&lt;/td&gt;
&lt;td&gt;4983&lt;/td&gt;
&lt;td&gt;2394&lt;/td&gt;
&lt;td&gt;0.82&lt;/td&gt;
&lt;td&gt;0.854&lt;/td&gt;
&lt;td&gt;0.88&lt;/td&gt;
&lt;td&gt;0.885&lt;/td&gt;
&lt;td&gt;0.886&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text&lt;/td&gt;
&lt;td&gt;4983&lt;/td&gt;
&lt;td&gt;29917&lt;/td&gt;
&lt;td&gt;0.845&lt;/td&gt;
&lt;td&gt;0.86&lt;/td&gt;
&lt;td&gt;0.876&lt;/td&gt;
&lt;td&gt;0.878&lt;/td&gt;
&lt;td&gt;0.877&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Title&lt;/td&gt;
&lt;td&gt;4983&lt;/td&gt;
&lt;td&gt;334&lt;/td&gt;
&lt;td&gt;0.762&lt;/td&gt;
&lt;td&gt;0.806&lt;/td&gt;
&lt;td&gt;0.83&lt;/td&gt;
&lt;td&gt;0.846&lt;/td&gt;
&lt;td&gt;0.84&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;td&gt;4983&lt;/td&gt;
&lt;td&gt;66454&lt;/td&gt;
&lt;td&gt;0.718&lt;/td&gt;
&lt;td&gt;0.752&lt;/td&gt;
&lt;td&gt;0.775&lt;/td&gt;
&lt;td&gt;0.783&lt;/td&gt;
&lt;td&gt;0.787&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here is an overview of the &lt;code&gt;mAP50-95&lt;/code&gt; performance with different model sizes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6w4644x9ipw8x5bvhmil.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6w4644x9ipw8x5bvhmil.png" alt="performance between yolov8 model sizes"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>computervision</category>
      <category>machinelearning</category>
      <category>python</category>
    </item>
    <item>
      <title>How to Upload Images to Google Gemini for Next.js</title>
      <dc:creator>ppaanngggg</dc:creator>
      <pubDate>Mon, 03 Jun 2024 13:33:09 +0000</pubDate>
      <link>https://dev.to/ppaanngggg/how-to-upload-images-to-google-gemini-for-nextjs-2c75</link>
      <guid>https://dev.to/ppaanngggg/how-to-upload-images-to-google-gemini-for-nextjs-2c75</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Google Gemini exhibits strong performance in multi-model tasks, particularly the latest Gemini 1.5 Flash and Gemini 1.5 Pro. There are two benchmarks for multi-model tasks: reasoning and math. As demonstrated, the Gemini 1.5 Pro performs on par with the latest GPT-4o in visual math tasks 🎉. &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Gemini 1.5 Flash&lt;/th&gt;
&lt;th&gt;Gemini 1.5 Pro&lt;/th&gt;
&lt;th&gt;GPT-4o&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MMMU&lt;/td&gt;
&lt;td&gt;Multi-discipline college-level reasoning problems&lt;/td&gt;
&lt;td&gt;56.1%&lt;/td&gt;
&lt;td&gt;62.2%&lt;/td&gt;
&lt;td&gt;69.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MathVista&lt;/td&gt;
&lt;td&gt;Mathematical reasoning in visual contexts&lt;/td&gt;
&lt;td&gt;58.4%&lt;/td&gt;
&lt;td&gt;63.9%&lt;/td&gt;
&lt;td&gt;63.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In this blog, I will guide you on how to unlock the vision capabilities of Google Gemini. Let's get started 🚀.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisite
&lt;/h2&gt;

&lt;p&gt;In my latest &lt;a href="https://dev.to/ppaanngggg/how-to-use-google-gemini-for-nextjs-with-streaming-output-352g"&gt;blog&lt;/a&gt;, I demonstrated how to use Google Gemini with Next.js for streaming output. While the previous guide focused on text input, this article will show you how to upload images to Google Gemini, using a simple demo. If you're unfamiliar with registering a Google AI API Key or using the Vercel AI SDK, I recommend reading the previous &lt;a href="https://dev.to/ppaanngggg/how-to-use-google-gemini-for-nextjs-with-streaming-output-352g"&gt;blog&lt;/a&gt; first.&lt;/p&gt;

&lt;h2&gt;
  
  
  Server-Side
&lt;/h2&gt;

&lt;p&gt;Here is the complete server-side function. I made a few modifications, namely removing the custom &lt;code&gt;Message&lt;/code&gt; and importing &lt;code&gt;CoreMessage&lt;/code&gt; instead.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;google&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@ai-sdk/google&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CoreMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;LanguageModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;streamText&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createStreamableValue&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai/rsc&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;continueConversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CoreMessage&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createStreamableValue&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;google&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;models/gemini-1.5-pro-latest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;textStream&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;streamText&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;textStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;done&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;})().&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;newMessage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;CoreMessage&lt;/code&gt; is a complex structure that can accept various types of data. &lt;code&gt;CoreUserMessage&lt;/code&gt; is a message sent by a user, it has a fixed role &lt;code&gt;user&lt;/code&gt; and flexible &lt;code&gt;content&lt;/code&gt;. The &lt;code&gt;UserContent&lt;/code&gt; can either be a plain string, a &lt;code&gt;TextPart&lt;/code&gt; object, or an &lt;code&gt;ImagePart&lt;/code&gt; object.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;CoreUserMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;UserContent&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;UserContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;TextPart$1&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nx"&gt;ImagePart&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;TextPart$1&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;ImagePart&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;image&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="cm"&gt;/**
  Image data. Can either be:

  - data: a base64-encoded string, a Uint8Array, an ArrayBuffer, or a Buffer
  - URL: a URL that points to the image
     */&lt;/span&gt;
    &lt;span class="nl"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DataContent&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nx"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="cm"&gt;/**
  Optional mime type of the image.
     */&lt;/span&gt;
    &lt;span class="nl"&gt;mimeType&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Delve deep into the &lt;code&gt;ImagePart&lt;/code&gt;. You can pass either base64-encoded image data or an image URL into the image field. In this instance, to simplify the system, we will pass base64-encoded image data into the message.&lt;/p&gt;

&lt;h2&gt;
  
  
  Client-Side
&lt;/h2&gt;

&lt;p&gt;This page requires key modifications. We need to upload an image, encode it into a base64 message, and preview the image within the message. The following are the complete codes for the page after the update. You can copy and paste this code, and I'll explain the key points afterward.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use client&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;useState&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;continueConversation&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./actions&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;readStreamableValue&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai/rsc&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CoreMessage&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;Home&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setConversation&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;useState&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;CoreMessage&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;([]);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;imageInput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setImageInput&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;useState&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;textInput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setTextInput&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;useState&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getBase64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;File&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;FileReader&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readAsDataURL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;role&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;:&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
            &lt;span class="si"&gt;{&lt;/span&gt;
              &lt;span class="c1"&gt;// if it's string, just show it, else if it is image, preview image, if it is text, show the text&lt;/span&gt;
              &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;
              &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;image&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;img&lt;/span&gt;
                  &lt;span class="na"&gt;alt&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;""&lt;/span&gt;
                  &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;
                    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;data:image;base64,&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
                  &lt;span class="si"&gt;}&lt;/span&gt;
                  &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
              &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;
              &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="dl"&gt;""&lt;/span&gt;
              &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="si"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;

      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;input&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"file"&lt;/span&gt;
          &lt;span class="na"&gt;onChange&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;files&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;files&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
              &lt;span class="nf"&gt;getBase64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nf"&gt;setImageInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
              &lt;span class="p"&gt;});&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="nf"&gt;setImageInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;input&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"text"&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;textInput&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
          &lt;span class="na"&gt;onChange&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nf"&gt;setTextInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt;
          &lt;span class="na"&gt;onClick&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// append user messages&lt;/span&gt;
            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;userMessages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CoreMessage&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
            &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;imageInput&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="c1"&gt;// remove data:*/*;base64 from result&lt;/span&gt;
              &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pureBase64&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;imageInput&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/^data:image&lt;/span&gt;&lt;span class="se"&gt;\/\w&lt;/span&gt;&lt;span class="sr"&gt;+;base64,/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
              &lt;span class="nx"&gt;userMessages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;image&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pureBase64&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
              &lt;span class="p"&gt;});&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;textInput&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="nx"&gt;userMessages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;textInput&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
              &lt;span class="p"&gt;});&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;newMessage&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;continueConversation&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
              &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;userMessages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;

            &lt;span class="c1"&gt;// collect assistant message&lt;/span&gt;
            &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delta&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nf"&gt;readStreamableValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newMessage&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

              &lt;span class="nf"&gt;setConversation&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                  &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;assistant&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
              &lt;span class="p"&gt;]);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
          Send Message
        &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Due to the complexity of &lt;code&gt;CoreMessage&lt;/code&gt;, I have added some conditional branches to handle message previews. This is particularly the case when using the &lt;code&gt;&amp;lt;img /&amp;gt;&lt;/code&gt; tag to display base64-encoded images.&lt;/li&gt;
&lt;li&gt;Add another &lt;code&gt;&amp;lt;input&amp;gt;&lt;/code&gt; with &lt;code&gt;type="file"&lt;/code&gt; to upload an image. When a change occurs, read the image file and convert it into a base64 string.&lt;/li&gt;
&lt;li&gt;Finally, when the send button is clicked, we need to convert the image and text inputs into an array of &lt;code&gt;CoreMessage&lt;/code&gt;. Please note that the base64 header should be discarded from the image input.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Body Size Config
&lt;/h2&gt;

&lt;p&gt;The default &lt;code&gt;bodySizeLimit&lt;/code&gt; for &lt;code&gt;Next.js&lt;/code&gt; is set to 1MB. If you wish to upload files larger than 1MB, you need to adjust the configuration as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nextConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;experimental&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;serverActions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;bodySizeLimit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;10mb&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Let’s Test Now
&lt;/h2&gt;

&lt;p&gt;I upload the cover image from the previous blog and ask, "What is this picture about?" Then, I click the send button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs17lcnqgjic66tzh1v8v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs17lcnqgjic66tzh1v8v.png" alt="input of image and text" width="725" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Examine the assistant's output; it's quite impressive 👏👏👏.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhq5u5l3o730mbz6bz8xv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhq5u5l3o730mbz6bz8xv.png" alt="output of google gemini" width="725" height="539"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Documentation for the AI SDK: &lt;a href="https://sdk.vercel.ai/docs/introduction" rel="noopener noreferrer"&gt;https://sdk.vercel.ai/docs/introduction&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Google AI Studio: &lt;a href="https://ai.google.dev/aistudio" rel="noopener noreferrer"&gt;https://ai.google.dev/aistudio&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this post, I've explored the key features and benefits of Google Gemini in front-end.&lt;/p&gt;

&lt;p&gt;If you're interested in seeing Google Gemini in action, check out these products that have successfully implemented it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;AI Math Solver - A webapp that help users to solve math problems.
Learn more: &lt;a href="https://www.aimathsolver.com" rel="noopener noreferrer"&gt;AIMathSolver&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Have you used Google Gemini in your projects? Share your experiences in the comments below!&lt;/p&gt;

</description>
      <category>nextjs</category>
      <category>llm</category>
      <category>gemini</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How to Use Google Gemini for Next.js with Streaming Output</title>
      <dc:creator>ppaanngggg</dc:creator>
      <pubDate>Thu, 30 May 2024 08:03:34 +0000</pubDate>
      <link>https://dev.to/ppaanngggg/how-to-use-google-gemini-for-nextjs-with-streaming-output-352g</link>
      <guid>https://dev.to/ppaanngggg/how-to-use-google-gemini-for-nextjs-with-streaming-output-352g</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;LLM applications are becoming increasingly popular. However, there are numerous LLM models, each with its differences. Handling streaming output can be complex, especially for new front-end developers.&lt;/p&gt;

&lt;p&gt;Thanks to the &lt;a href="https://sdk.vercel.ai/" rel="noopener noreferrer"&gt;AI SDK&lt;/a&gt; developed by Vercel, implementing LLM chat in next.js with streaming output has become incredibly easy. Next, I'll provide a step-by-step tutorial on how to integrate Google Gemini into your front-end project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create a Google AI Studio Account
&lt;/h2&gt;

&lt;p&gt;Head to &lt;a href="https://aistudio.google.com" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt; and signup, after you login, you can find the button “Get API Key” on the left, click it and create a API Key. This API Key will be used later.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsl63q9m0sautuvsmp4hb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsl63q9m0sautuvsmp4hb.png" alt="create google ai api key"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Create a New Next.js Project
&lt;/h2&gt;

&lt;p&gt;To create a new Next.js project, enter the command &lt;code&gt;npx create-next-app@latest your-new-project&lt;/code&gt;. Make sure you choose App route mode. After that, run &lt;code&gt;npm dev&lt;/code&gt; and open &lt;code&gt;localhost:3000&lt;/code&gt; in your preferred browser to verify if the new project is set up correctly.&lt;/p&gt;

&lt;p&gt;Next, you need to install the AI SDK:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

pnpm &lt;span class="nb"&gt;install &lt;/span&gt;ai


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The AI SDK uses an advanced provider design, allowing you to implement your own LLM provider. Currently, we only need to install the official Google Provider.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

pnpm &lt;span class="nb"&gt;install&lt;/span&gt; @ai-sdk/google


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Set Your API Key in Your Local Environment
&lt;/h2&gt;

&lt;p&gt;Next.js integrates well with environment variables. Simply create a file named &lt;code&gt;.env.local&lt;/code&gt; in the root folder of your project.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

GOOGLE_GENERATIVE_AI_API_KEY={your API Key}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Afterwards, the AI SDK will automatically load your key when you use Google AI to generate text.&lt;/p&gt;

&lt;h2&gt;
  
  
  Server-Side Code
&lt;/h2&gt;

&lt;p&gt;Now that you've gathered all the prerequisites for your LLM application, create a new file named &lt;code&gt;actions.ts&lt;/code&gt; in the &lt;code&gt;app&lt;/code&gt; folder:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;google&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@ai-sdk/google&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;streamText&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createStreamableValue&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai/rsc&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;Message&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;assistant&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;continueConversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createStreamableValue&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;google&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;models/gemini-1.5-pro-latest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;textStream&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;streamText&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;textStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;done&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;})().&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;newMessage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Let me provide some explanation about this code.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;interface Message&lt;/code&gt; is a shared interface that establishes the structure of a message. It includes two properties: 'role' (which can be either 'user' or 'assistant') and 'content' (the actual text of the message).&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;continueConversation&lt;/code&gt; function is a server component function which uses the history of the conversation to generate the assistant's response. The function communicates with Google's Gemini model to generate a streaming text output.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;streamText&lt;/code&gt; function is part of the AI SDK and it creates a text stream that will be updated with the assistant's response as it is generated.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Client-Side Code
&lt;/h2&gt;

&lt;p&gt;Next, replace the contents of &lt;code&gt;page.tsx&lt;/code&gt; with the new code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;

&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;use client&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;useState&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;continueConversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Message&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./actions&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;readStreamableValue&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai/rsc&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;Home&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setConversation&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;useState&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;([]);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setInput&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;useState&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;role&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;: &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;

      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;input&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"text"&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
          &lt;span class="na"&gt;onChange&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nf"&gt;setInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt;
          &lt;span class="na"&gt;onClick&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;newMessage&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;continueConversation&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
              &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;

            &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delta&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nf"&gt;readStreamableValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newMessage&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

              &lt;span class="nf"&gt;setConversation&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;assistant&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
              &lt;span class="p"&gt;]);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
          Send Message
        &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This is a very simple UI you can continue talk with LLM model now. There are some important snips:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The &lt;code&gt;input&lt;/code&gt; field captures the user's input. It is controlled by a React state variable that gets updated every time the input changes.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;button&lt;/code&gt; has an &lt;code&gt;onClick&lt;/code&gt; event that triggers the &lt;code&gt;continueConversation&lt;/code&gt; function. This function takes the current conversation history, appends the user's new message, and waits for the assistant's response.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;conversation&lt;/code&gt; array holds the history of the conversation. Each message is displayed on the screen, and new messages are appended at the end. By using &lt;code&gt;readStreamableValue&lt;/code&gt; from the AI SDK, we're able to read the streaming output value from the server component function and update the conversation in real-time.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Let's Test Now
&lt;/h2&gt;

&lt;p&gt;I type "who are you" into the input placeholder.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzom4vanr61ge3dmpeeec.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzom4vanr61ge3dmpeeec.png" alt="llm input"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the output of Google Gemini. You'll notice that the output is printed in a streaming manner.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feg4zc2rbncp2z9cixsb5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feg4zc2rbncp2z9cixsb5.png" alt="llm output"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Documentation for the AI SDK: &lt;a href="https://sdk.vercel.ai/docs/introduction" rel="noopener noreferrer"&gt;https://sdk.vercel.ai/docs/introduction&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Google AI Studio: &lt;a href="https://ai.google.dev/aistudio" rel="noopener noreferrer"&gt;https://ai.google.dev/aistudio&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this post, I've explored the key features and benefits of Google Gemini in front-end.&lt;/p&gt;

&lt;p&gt;If you're interested in seeing Google Gemini in action, check out these products that have successfully implemented it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;AI Math Solver - A webapp that help users to solve math problems.
Learn more: &lt;a href="https://www.aimathsolver.com" rel="noopener noreferrer"&gt;AIMathSolver&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Have you used Google Gemini in your projects? Share your experiences in the comments below!&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>nextjs</category>
      <category>gemini</category>
      <category>llm</category>
    </item>
    <item>
      <title>Step by Step to deploy Go API on AWS lambda and access by function URL</title>
      <dc:creator>ppaanngggg</dc:creator>
      <pubDate>Fri, 24 May 2024 16:22:56 +0000</pubDate>
      <link>https://dev.to/ppaanngggg/step-by-step-to-deploy-go-api-on-aws-lambda-and-access-by-function-url-4668</link>
      <guid>https://dev.to/ppaanngggg/step-by-step-to-deploy-go-api-on-aws-lambda-and-access-by-function-url-4668</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In today's world of cloud computing,  &lt;a href="https://aws.amazon.com/lambda"&gt;AWS Lambda&lt;/a&gt; is a serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers. You can trigger Lambda from over 200 AWS services and software as a service (SaaS) applications, and only pay for what you use.&lt;/p&gt;

&lt;p&gt;Go is a statically typed, compiled language known for its simplicity, efficiency, and ease of use. It's particularly well-suited for building scalable and efficient cloud services.&lt;/p&gt;

&lt;p&gt;In this guide, I will demonstrate how to deploy a Go API server on AWS Lambda step by step.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating a Lambda Function
&lt;/h2&gt;

&lt;p&gt;First, you need to create a new Lambda function. Log into your AWS console and navigate to the Lambda service. Click the 'Create function' button. You will see many options, but don't worry, we only need to adjust a few. Leaving the rest as they are will be sufficient to host our Go server.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enter your function name. For example, I used &lt;code&gt;go-api&lt;/code&gt;. You should use a name that is meaningful.&lt;/li&gt;
&lt;li&gt;Select the runtime. In this case, we'll choose &lt;code&gt;Amazon Linux 2023&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Optionally, if you're building a Go application for ARM, you should change the Architecture to arm64.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1uinhhkbycobnbnug37e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1uinhhkbycobnbnug37e.png" alt="create aws lambda" width="800" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click to open Advanced settings and enable the &lt;code&gt;function URL&lt;/code&gt;. This function is powerful and easy to use, and we will be using it.&lt;/li&gt;
&lt;li&gt;Setting the &lt;code&gt;Auth type&lt;/code&gt; to &lt;code&gt;NONE&lt;/code&gt; simplifies its use. For enhanced security, you could use &lt;code&gt;AWS_IAM&lt;/code&gt;, but that's a different topic that we won't discuss here.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbrxn9083vp7kpu40ajor.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbrxn9083vp7kpu40ajor.png" alt="setting function URL" width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Edit the Go Source Code
&lt;/h2&gt;

&lt;p&gt;Next, we need to create a minimal Go function for AWS Lambda. Here's a sample &lt;code&gt;main.go&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We import &lt;code&gt;github.com/aws/aws-lambda-go/lambda&lt;/code&gt;. This dependency is necessary for running Go as a Lambda function.&lt;/li&gt;
&lt;li&gt;In the &lt;code&gt;main&lt;/code&gt; function, we use &lt;code&gt;lambda.Start&lt;/code&gt; to start the handler.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"context"&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/aws/aws-lambda-go/lambda"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;RequestEvent&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;RawPath&lt;/span&gt;        &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"rawPath"`&lt;/span&gt;
    &lt;span class="n"&gt;RawQueryString&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"rawQueryString"`&lt;/span&gt;
    &lt;span class="n"&gt;Body&lt;/span&gt;           &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"body"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;HandleRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;RequestEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"received nil event"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"RawPath: %s, RawQueryString: %s, Body: %s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RawPath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RawQueryString&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;lambda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HandleRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we need to compile the main into a binary. Simply copy the following command line, ensuring the output filename is 'bootstrap'.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;GOOS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;linux &lt;span class="nv"&gt;GOARCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;amd64 go build &lt;span class="nt"&gt;-tags&lt;/span&gt; lambda.norpc &lt;span class="nt"&gt;-o&lt;/span&gt; bootstrap main.go
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we need to compress the bootstrap file into a zip format. Without doing this, we cannot upload it to AWS Lambda. Be sure to place the bootstrap file at the root of the zip file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;zip myFunction.zip bootstrap
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We've now set up the basic Go application. It's time to upload and conduct a test.&lt;/p&gt;

&lt;h2&gt;
  
  
  Upload to AWS lambda
&lt;/h2&gt;

&lt;p&gt;Click to enter the lambda function we created previously and scroll down. Find the &lt;code&gt;Code Source&lt;/code&gt; block within the code tab. To the right of this block, click the &lt;code&gt;Upload from&lt;/code&gt; button and select the &lt;code&gt;.zip file&lt;/code&gt; option. Locate your zip file and upload it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F028086lxxag0inige63x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F028086lxxag0inige63x.png" alt="upload zip file" width="800" height="98"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After uploading, you should find information similar to the details below the &lt;code&gt;Code Source&lt;/code&gt; block. This indicates that your upload was successful.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxgehpdj9zo3aitwd4bdo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxgehpdj9zo3aitwd4bdo.png" alt="code information" width="800" height="127"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Test the function
&lt;/h2&gt;

&lt;p&gt;Scroll up, and on the right of &lt;code&gt;Function overview&lt;/code&gt;, you'll find the &lt;code&gt;function URL&lt;/code&gt;. This is the advanced option we just set. The &lt;code&gt;function URL&lt;/code&gt; is a powerful tool that can convert your RESTful requests into lambda handler requests, and then encode the handler output into a RESTful response. It's particularly useful for building API servers based on lambda.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pkux99umirrbo1ozoxd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pkux99umirrbo1ozoxd.png" alt="copy function URL" width="368" height="204"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click the small copy button to duplicate the URL. You can now use &lt;code&gt;curl&lt;/code&gt; to test your lambda function. Input this command into your terminal and press enter. Remember, you need to replace the &lt;code&gt;function URL&lt;/code&gt; with your own.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-XPOST&lt;/span&gt; &lt;span class="s1"&gt;'https://{your_lambda_function_URL}/hello?
name=world'&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s1"&gt;'{"age": 16}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The magic happens, you should see the output like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RawPath: /hello, RawQueryString: name=world, Body: {"age": 16}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Review the codes again, and you'll find the structure. This structure is the second parameter of the handler. The magic here is that the lambda library decodes the path, query string, and body into our structure. While many other fields are available, these three are the most crucial.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;RequestEvent&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;RawPath&lt;/span&gt;        &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"rawPath"`&lt;/span&gt;
    &lt;span class="n"&gt;RawQueryString&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"rawQueryString"`&lt;/span&gt;
    &lt;span class="n"&gt;Body&lt;/span&gt;           &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"body"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Reference
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-golang.html"&gt;Building Lambda functions with Go&lt;/a&gt;&lt;/strong&gt; &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-urls.html"&gt;Lambda function URLs&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>go</category>
      <category>api</category>
      <category>aws</category>
    </item>
    <item>
      <title>How to count tokens in frontend for Popular LLM Models: GPT, Claude, and Llama</title>
      <dc:creator>ppaanngggg</dc:creator>
      <pubDate>Tue, 21 May 2024 15:44:17 +0000</pubDate>
      <link>https://dev.to/ppaanngggg/how-to-count-tokens-in-frontend-for-popular-llm-models-gpt-claude-and-llama-efm</link>
      <guid>https://dev.to/ppaanngggg/how-to-count-tokens-in-frontend-for-popular-llm-models-gpt-claude-and-llama-efm</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Today, apps using Language Learning Machines (LLM) are growing fast. People use LLMs a lot to solve tough problems. LLMs are important in many areas like education, money matters, health, and more. Seeing this, developers worldwide are making lots of new apps using LLM. These apps are changing how we live, work, and talk to each other.&lt;/p&gt;

&lt;p&gt;Counting tokens before sending prompts to the Language Learning Model (LLM) is important for two reasons. First, it helps users manage their budget. Knowing how many tokens a prompt uses can prevent surprise costs. Second, it helps the LLM work better. The total tokens in a prompt should be less than the model's maximum. If it's more, the model might not work as well or might even make mistakes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tokenizer in Backend vs Frontend
&lt;/h2&gt;

&lt;p&gt;In text processing, the calculation of prompt tokens is a crucial task and there are essentially two methods to accomplish this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Backend Implementation
&lt;/h3&gt;

&lt;p&gt;The first, and often most common, solution is to run a tokenizer in the backend system of the application. This approach involves exposing an Application Programming Interface (API) for the frontend to invoke when needed. This method is generally straightforward to implement, especially given the existence of Python libraries like &lt;code&gt;tiktoken&lt;/code&gt; and &lt;code&gt;tokenizers&lt;/code&gt; that are designed specifically for this purpose and are incredibly user-friendly.&lt;/p&gt;

&lt;p&gt;However, there are some drawbacks. Firstly, it's inefficient as it requires sending large volumes of text to the backend to receive a simple number. This can be particularly wasteful when handling exceptionally long text. Secondly, it misuses server CPU resources since the CPUs are constantly calculating tokens, which doesn't significantly contribute to the product's value. Lastly, notable latency occurs when a user is typing and waiting for the token count, leading to a poor user experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  Frontend Implementation
&lt;/h3&gt;

&lt;p&gt;Thanks to &lt;a href="https://github.com/xenova/transformers.js" rel="noopener noreferrer"&gt;transformers.js&lt;/a&gt;, we can run the tokenizer and model locally in the browser. Transformers.js is designed to be functionally equivalent to Hugging Face's &lt;a href="https://github.com/huggingface/transformers" rel="noopener noreferrer"&gt;transformers&lt;/a&gt; python library, meaning you can run the same pretrained models using a very similar API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;p&gt;To install via NPM, run:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

npm i @xenova/transformers


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To run &lt;code&gt;transformers&lt;/code&gt; on the client side of &lt;code&gt;next.js&lt;/code&gt;, you need to update the &lt;code&gt;next.config.js&lt;/code&gt; file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;

&lt;span class="cm"&gt;/** @type {import('next').NextConfig} */&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nextConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// (Optional) Export as a static site&lt;/span&gt;
    &lt;span class="c1"&gt;// See https://nextjs.org/docs/pages/building-your-application/deploying/static-exports#configuration&lt;/span&gt;
    &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;export&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Feel free to modify/remove this option&lt;/span&gt;

    &lt;span class="c1"&gt;// Override the default webpack configuration&lt;/span&gt;
    &lt;span class="na"&gt;webpack&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// See https://webpack.js.org/configuration/resolve/#resolvealias&lt;/span&gt;
        &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;alias&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sharp$&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;onnxruntime-node$&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exports&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;nextConfig&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Code Sample
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Firstly, you need to import AutoTokenizer from &lt;code&gt;@xenova/transformers&lt;/code&gt;:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;AutoTokenizer&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@xenova/transformers&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;You can create a tokenizer using the &lt;code&gt;AutoTokenizer.from_pretrained&lt;/code&gt; function, which requires the &lt;code&gt;pretrained_model_name_or_path&lt;/code&gt; parameter. &lt;code&gt;Xenova&lt;/code&gt; provides tokenizers designed for widely-used Language Learning Models (LLMs) like GPT-4, Claude-3, and Llama-3. To access these, visit the Hugging Face website, a hub for Machine Learning resources, at &lt;a href="https://huggingface.co/Xenova" rel="noopener noreferrer"&gt;huggingface.co/Xenova&lt;/a&gt;. The tokenizer configurations for the latest GPT-4o model are available at &lt;a href="https://huggingface.co/Xenova/gpt-4o" rel="noopener noreferrer"&gt;Xenova/gpt-4o&lt;/a&gt;. You can create a tokenizer for GPT-4o now:&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Xenova/gpt-4o&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;The usage of the tokenizer is very similar to the tokenizer library in Python. The &lt;code&gt;tokenizer.encode&lt;/code&gt; method can convert text into tokens.&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hello world&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// [24912, 2375]&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;As you can see, the tokenizer of transformers.js is extremely easy to use. Due to its core code's implementation in Rust, it can calculate tokens at an impressive speed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Using this pure browser technique, I created an all-in-one website to provide &lt;a href="https://token-counter.app/" rel="noopener noreferrer"&gt;token counters&lt;/a&gt; for all popular models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmec8mi1lcrvcn4uu08hm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmec8mi1lcrvcn4uu08hm.jpg" alt="All in one LLM Token counter"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can test &lt;a href="https://token-counter.app/openai/gpt-4o" rel="noopener noreferrer"&gt;tokenizer of GPT-4o&lt;/a&gt; there. There is a screenshot of page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdbgziz3ns1aa6eqpxjoq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdbgziz3ns1aa6eqpxjoq.png" alt="Tokenizer of GPT-4o"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>tokenizer</category>
      <category>webdev</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Simplifying PyTorch Installation: Introducing Install.PyTorch</title>
      <dc:creator>ppaanngggg</dc:creator>
      <pubDate>Mon, 01 Apr 2024 12:55:20 +0000</pubDate>
      <link>https://dev.to/ppaanngggg/easy-tool-to-find-the-right-pytorch-version-375l</link>
      <guid>https://dev.to/ppaanngggg/easy-tool-to-find-the-right-pytorch-version-375l</guid>
      <description>&lt;p&gt;&lt;strong&gt;Understanding the Complexity of ML Engineering:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Machine learning engineering involves working with diverse hardware devices and software dependencies. ML engineers often find themselves in a situation where they need to identify the ideal PyTorch version that supports their specific combination of devices and Python versions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Introducing Install.PyTorch:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Visiting the &lt;a href="https://install.pytorch.site/"&gt;Install.PyTorch&lt;/a&gt; website, ML engineers can easily determine the exact PyTorch version they need based on their specific requirements.&lt;/p&gt;

&lt;p&gt;Let’s consider an example to better illustrate how Install.PyTorch streamlines the PyTorch installation process. Suppose you need to install PyTorch with CUDA 12.1 and Python 3.8. By visiting &lt;a href="https://install.pytorch.site/?device=CUDA+12.1&amp;amp;python=Python+3.8"&gt;Install.PyTorch selected CUDA 12.1 and Python 3.8&lt;/a&gt; , you can effortlessly find the compatible PyTorch version. Another example you want to download only CPU PyTorch with Python 3.12, you can visit &lt;a href="https://install.pytorch.site/?device=CPU&amp;amp;python=Python+3.12"&gt;Install.PyTorch selected CPU and Python 3.12&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After selecting the appropriate PyTorch version using Install.PyTorch, you can download the PyTorch package directly through the provided link, or you can simply copy the pip install command line and execute it in their preferred environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;ML engineering can be a complex endeavor, especially when it comes to finding the right PyTorch version that satisfies specific device and Python requirements. Install.PyTorch emerges as a valuable tool, simplifying the process of identifying and installing the ideal PyTorch version for ML engineers.&lt;/p&gt;

</description>
      <category>pytorch</category>
      <category>cuda</category>
      <category>rocm</category>
      <category>python</category>
    </item>
  </channel>
</rss>
