<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: chuanman2707</title>
    <description>The latest articles on DEV Community by chuanman2707 (@chuanman2707).</description>
    <link>https://dev.to/chuanman2707</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F157955%2F8bf48c22-1c7c-4b55-a279-f1f782c1022d.png</url>
      <title>DEV Community: chuanman2707</title>
      <link>https://dev.to/chuanman2707</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/chuanman2707"/>
    <language>en</language>
    <item>
      <title>Building a Local-First Hotel Receptionist with Gemma 4, GGUF, and llama.cpp</title>
      <dc:creator>chuanman2707</dc:creator>
      <pubDate>Fri, 15 May 2026 08:58:22 +0000</pubDate>
      <link>https://dev.to/chuanman2707/building-a-local-first-hotel-receptionist-with-gemma-4-gguf-and-llamacpp-51a4</link>
      <guid>https://dev.to/chuanman2707/building-a-local-first-hotel-receptionist-with-gemma-4-gguf-and-llamacpp-51a4</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Write About Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;Why I Built This&lt;/h2&gt;

&lt;p&gt;I have been building CapyInn, a hotel management project for small hotels, guesthouses, and homestays in Vietnam.&lt;/p&gt;

&lt;p&gt;The original CapyInn project started before this challenge, but the new work I focused on for this Gemma 4 challenge was the AI receptionist layer: a local-first front-desk assistant powered by Gemma 4, converted to GGUF, and served through llama.cpp.&lt;/p&gt;

&lt;p&gt;The goal was not to build a general chatbot.&lt;/p&gt;

&lt;p&gt;The goal was to build a bounded receptionist copilot that can help hotel staff answer common guest questions, while safely deferring anything it cannot verify.&lt;/p&gt;

&lt;p&gt;For small hospitality businesses, this matters because late-night guest messages, check-in questions, room details, and policy questions often arrive when staff are busy or asleep. But at the same time, the assistant should not pretend to confirm payments, approve fake documents, or access private hotel systems.&lt;/p&gt;

&lt;h2&gt;Demo&lt;/h2&gt;

&lt;p&gt;&lt;iframe src="https://www.youtube.com/embed/uYGbkv2HfHQ"&gt;&lt;/iframe&gt;&lt;/p&gt;

&lt;h2&gt;What I Built&lt;/h2&gt;

&lt;p&gt;CapyInn Receptionist is a local AI front-desk copilot for small hotels in Vietnam.&lt;/p&gt;

&lt;p&gt;It can help with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;answering room and check-in questions&lt;/li&gt;
&lt;li&gt;drafting replies for late-night guest messages&lt;/li&gt;
&lt;li&gt;asking follow-up questions when booking information is incomplete&lt;/li&gt;
&lt;li&gt;explaining basic hotel policies&lt;/li&gt;
&lt;li&gt;refusing or deferring sensitive requests to hotel staff&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most important behavior is the boundary.&lt;/p&gt;

&lt;p&gt;If the assistant cannot verify something, it should not make it up. For example, it should not confirm a payment, accept suspicious guest documents, or expose system access. It should hand those cases back to a human.&lt;/p&gt;

&lt;h2&gt;How I Used Gemma 4&lt;/h2&gt;

&lt;p&gt;I fine-tuned and packaged a Gemma 4-based receptionist model for this hospitality workflow, then converted it into GGUF so it could run locally with llama.cpp.&lt;/p&gt;

&lt;p&gt;The local model file I used:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;capyinn-gemma-4-Q5_K_M.gguf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model runs locally on a Mac mini with Apple M4, 10-core CPU, 16 GB unified memory, using llama.cpp with Metal/BLAS.&lt;/p&gt;
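&lt;p&gt;As a minimal sketch of what this looks like in code, here is one way to query a local GGUF file like the one above, assuming the llama-cpp-python bindings; the system prompt and function names are illustrative, not CapyInn's actual implementation:&lt;/p&gt;

```python
# Sketch: querying the quantized receptionist model locally.
# Assumes the llama-cpp-python bindings (pip install llama-cpp-python);
# the system prompt and function names are illustrative only.

SYSTEM_PROMPT = (
    "You are the front-desk assistant for a small hotel. "
    "Answer room, check-in, and policy questions briefly. "
    "If you cannot verify something, do not confirm it; defer to staff."
)

def ask_receptionist(question, model_path="capyinn-gemma-4-Q5_K_M.gguf"):
    from llama_cpp import Llama  # imported lazily so the sketch loads without the model file
    llm = Llama(
        model_path=model_path,
        n_ctx=4096,        # 4K context kept RAM at about 6 GiB in my runs
        n_gpu_layers=-1,   # offload all layers to Metal on Apple silicon
        verbose=False,
    )
    reply = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        max_tokens=256,
    )
    return reply["choices"][0]["message"]["content"]
```

&lt;p&gt;The important design choice is that the refusal behavior is stated up front in the system prompt, so every reply is generated inside the boundary rather than filtered afterwards.&lt;/p&gt;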

&lt;p&gt;In my latest conservative benchmark run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;generation speed: about 29 tokens/second&lt;/li&gt;
&lt;li&gt;prompt/prefill speed: about 511 tokens/second&lt;/li&gt;
&lt;li&gt;cold CLI startup to first token: about 2.1 seconds&lt;/li&gt;
&lt;li&gt;short 64-token capped response from cold startup: about 3.7 seconds&lt;/li&gt;
&lt;li&gt;RAM allocation: about 6.0 GiB at 4K context&lt;/li&gt;
&lt;li&gt;RAM allocation: about 7.0 GiB at 128K context&lt;/li&gt;
&lt;li&gt;GGUF file size: 3.35 GiB&lt;/li&gt;
&lt;li&gt;metadata context window: 131,072 tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That was good enough for a practical front-desk assistant on a small local machine.&lt;/p&gt;
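&lt;p&gt;A quick back-of-envelope check shows what those throughput numbers mean for a guest-facing reply. The message sizes here are illustrative, not measured:&lt;/p&gt;

```python
# Rough latency estimate from the benchmark numbers above:
# 511 tok/s prefill, 29 tok/s generation (my measurements);
# the example message sizes are illustrative.

PREFILL_TPS = 511.0
GENERATION_TPS = 29.0

def estimated_latency_s(prompt_tokens, reply_tokens):
    return prompt_tokens / PREFILL_TPS + reply_tokens / GENERATION_TPS

# A typical exchange: ~400 prompt tokens (system prompt plus question)
# and a ~150-token reply.
print(round(estimated_latency_s(400, 150), 1))  # roughly 6 seconds
```

&lt;p&gt;Around six seconds for a full reply is well within what a guest expects from a human receptionist answering chat messages.&lt;/p&gt;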

&lt;h2&gt;Why Local AI Matters Here&lt;/h2&gt;

&lt;p&gt;A hotel receptionist assistant handles information that can be sensitive: guest names, booking details, arrival times, special requests, and sometimes payment-related questions.&lt;/p&gt;

&lt;p&gt;For a small hotel, sending every guest message to a remote API is not always ideal: it adds per-request cost and moves guest data off-site.&lt;/p&gt;

&lt;p&gt;A local Gemma 4 setup gives a few practical advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lower ongoing cost&lt;/li&gt;
&lt;li&gt;better privacy posture&lt;/li&gt;
&lt;li&gt;usable latency on consumer hardware&lt;/li&gt;
&lt;li&gt;no dependency on cloud availability for basic replies&lt;/li&gt;
&lt;li&gt;easier deployment for small businesses that already have an office computer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tradeoff is that the assistant must be carefully scoped. Local does not automatically mean safe. The model still needs clear task boundaries.&lt;/p&gt;

&lt;h2&gt;The Safety Rule I Used&lt;/h2&gt;

&lt;p&gt;The main rule is simple:&lt;/p&gt;

&lt;p&gt;If the assistant cannot verify it, it should not confirm it.&lt;/p&gt;

&lt;p&gt;That means the assistant can draft helpful replies, but it should defer sensitive actions such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;payment confirmation&lt;/li&gt;
&lt;li&gt;suspicious guest documents&lt;/li&gt;
&lt;li&gt;account or system access&lt;/li&gt;
&lt;li&gt;policy exceptions&lt;/li&gt;
&lt;li&gt;anything requiring staff approval&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This made the demo much more realistic. A hotel AI assistant should be helpful, but it should also know when to stop.&lt;/p&gt;
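&lt;p&gt;One way to sketch the "defer what you cannot verify" rule is a thin pre-filter in front of the model. The categories mirror the list above; the keywords and function names are illustrative, not the project's actual implementation:&lt;/p&gt;

```python
# Sketch of the deferral rule as a keyword pre-filter.
# Categories mirror the article's list; keywords are illustrative only.

DEFER_KEYWORDS = {
    "payment": ["payment", "paid", "refund", "deposit", "transfer"],
    "documents": ["passport", "id card", "visa", "document"],
    "system access": ["password", "login", "admin", "account access"],
    "policy exception": ["waive", "make an exception"],
}

def should_defer(message):
    """Return the matched category if the message needs staff, else None."""
    text = message.lower()
    for category, keywords in DEFER_KEYWORDS.items():
        if any(k in text for k in keywords):
            return category
    return None

print(should_defer("Can you confirm my payment went through?"))  # payment
print(should_defer("What time is check-in?"))  # None
```

&lt;p&gt;A keyword filter is obviously crude on its own; in practice it works as a belt-and-suspenders layer on top of the system-prompt boundary, guaranteeing that the riskiest categories are routed to staff even if the model's own judgment slips.&lt;/p&gt;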

&lt;h2&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;The biggest lesson was that model capability is only one part of the product.&lt;/p&gt;

&lt;p&gt;The harder part is designing the workflow around the model.&lt;/p&gt;

&lt;p&gt;For this use case, I cared less about making the assistant sound impressive and more about making it useful, bounded, and honest.&lt;/p&gt;

&lt;p&gt;Gemma 4 worked well for this because it was capable enough for conversational front-desk tasks, while still small enough to run locally after quantization.&lt;/p&gt;

&lt;p&gt;The final result is not a replacement for hotel staff. It is a copilot that can reduce repetitive work and help small hotels respond faster.&lt;/p&gt;

&lt;h2&gt;Links&lt;/h2&gt;

&lt;p&gt;GitHub:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/chuanman2707/CapyInn" rel="noopener noreferrer"&gt;https://github.com/chuanman2707/CapyInn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Model:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://huggingface.co/chuanman2707/capyinn-gemma-4-e2b-it-q5-k-m-gguf" rel="noopener noreferrer"&gt;https://huggingface.co/chuanman2707/capyinn-gemma-4-e2b-it-q5-k-m-gguf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Demo video:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/uYGbkv2HfHQ" rel="noopener noreferrer"&gt;https://youtu.be/uYGbkv2HfHQ&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
