DEV Community: Alain Chan

Making Local AI Memory Readable

Alain Chan — Sat, 11 Jul 2026 23:27:53 +0000

http://github.com/Flow-Research/local-first-AI/pull/10

Imagine you run an online business and tell your local AI, “Remember, we selected SQLite for offline storage.” It stores the decision correctly, closes the database, and says nothing else. A week later you ask it what it remembers. Silence.
If it can't show you what it remembers, it's not a memory; it's a mysterious SQLite cave with commitment problems.
This is why a read layer matters for local context. Writing puts data into the system. Reading it back shows that the information still exists, has the right structure, and can be interpreted by people or by other parts of the application.

Local-first AI shouldn't require users to put their faith in a black box. Users should be able to see what was stored, where it came from, and when it was saved.

An empty database is not an error. It's a legitimate state, especially for a new application. Read functions must return an empty list when they find no data, rather than failing or inventing data. This gives the caller a definite answer: the memory store works, but it has nothing to show yet. The interface can then display an appropriate message, like “No context saved yet.”
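A minimal sketch of that behavior, assuming a hypothetical context_items table and a hypothetical list_context_items function (the project's real schema and helpers are not shown in this post):

```python
import sqlite3

# Hypothetical schema for illustration; the real one lives in the project.
SCHEMA = """
CREATE TABLE IF NOT EXISTS context_items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    context_type TEXT NOT NULL,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at TEXT NOT NULL
)
"""


def list_context_items(conn: sqlite3.Connection) -> list[dict]:
    """Return every stored item as a dict; an empty store yields []."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute("SELECT * FROM context_items").fetchall()
    return [dict(row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

items = list_context_items(conn)
# The caller, not the read function, decides how to word the empty state.
print(items or "No context saved yet.")
```

The read function never raises for an empty table and never pads the result, so the interface can tell "nothing stored yet" apart from "something went wrong".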

Missing IDs get the same treatment. Records can disappear, links may be out of date, or users might ask for an ID that never existed. The read function returns None, and the caller can report "Context item not found". A crash would make a bad day even worse. Good memory handles forgetting gracefully.
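As a sketch of the missing-ID case, assuming a hypothetical get_context_item function and a cut-down two-column table used only for this example:

```python
import sqlite3
from typing import Optional


def get_context_item(conn: sqlite3.Connection, item_id: int) -> Optional[dict]:
    """Return the item as a dict, or None when the ID does not exist."""
    conn.row_factory = sqlite3.Row
    row = conn.execute(
        "SELECT * FROM context_items WHERE id = ?", (item_id,)
    ).fetchone()
    return dict(row) if row is not None else None


conn = sqlite3.connect(":memory:")
# Reduced, hypothetical table: just enough columns to show the lookup.
conn.execute("CREATE TABLE context_items (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("INSERT INTO context_items (title) VALUES (?)", ("Chose SQLite",))

found = get_context_item(conn, 1)
missing = get_context_item(conn, 999)

for item in (found, missing):
    print(item["title"] if item else "Context item not found")
```

Returning None keeps the decision about wording and recovery with the caller instead of raising from inside the read layer.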

The more items you store, the noisier the memory becomes. Filtering on context_type keeps user notes, project notes, device logs, learning records, configuration decisions, and conversation context apart. When a user asks about decisions, the AI should not rummage through device logs, which would be like hunting for one receipt in every kitchen drawer.

Recency matters too: new information is usually more relevant than old information. When users want to check the latest entries, it helps to show the most recent items first. That order also gives later search and prompt-building systems a reasonable starting point. Sorting by created_at, newest first, with the ID as a tie-breaker keeps the ordering of results predictable.
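The filtering and ordering described above can be sketched in one query. The table layout and the list_by_type name are assumptions for illustration, and using the higher ID first as the tie-breaker is my choice; the post only says the ID breaks ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    """CREATE TABLE context_items (
        id INTEGER PRIMARY KEY,
        context_type TEXT NOT NULL,
        title TEXT NOT NULL,
        created_at TEXT NOT NULL)"""
)
conn.executemany(
    "INSERT INTO context_items (context_type, title, created_at) VALUES (?, ?, ?)",
    [
        ("config_decision", "Use SQLite", "2026-07-01T09:00:00+00:00"),
        ("device_log", "Disk at 80%", "2026-07-02T09:00:00+00:00"),
        ("config_decision", "Enable WAL", "2026-07-03T09:00:00+00:00"),
        ("config_decision", "Pin schema v2", "2026-07-03T09:00:00+00:00"),
    ],
)


def list_by_type(conn, context_type, limit=20):
    """Newest first; rows with equal timestamps fall back to id order."""
    rows = conn.execute(
        "SELECT * FROM context_items WHERE context_type = ? "
        "ORDER BY created_at DESC, id DESC LIMIT ?",
        (context_type, limit),
    ).fetchall()
    return [dict(row) for row in rows]


titles = [item["title"] for item in list_by_type(conn, "config_decision")]
print(titles)  # ['Pin schema v2', 'Enable WAL', 'Use SQLite']
```

The two items that share a timestamp always come back in the same order, and the device log never appears in a question about decisions.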

Readable memory requires more than a title and content. The metadata field describes the context of each record, source records where the data came from, tags group related ideas, importance indicates which memories may deserve closer attention, and created_at and updated_at show the age and freshness of the information. These fields don't guarantee truth, but they give the user evidence for deciding whether to trust the memory.

The read layer also serves as the basis for later AI features. Search needs accurate records to examine. Prompt preparation needs structured context to format. Inference needs selected memories with enough metadata. Each later component is easier to develop and test if the read layer returns complete, predictable dictionaries.
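One way to get complete, predictable dictionaries is to convert every row through a single function. This is a sketch, not the project's code: row_to_item is a hypothetical name, and storing tags and metadata as JSON text is an assumption.

```python
import json
import sqlite3

# Field names taken from the post; the storage format is assumed.
FIELDS = (
    "id", "context_type", "title", "content", "source",
    "tags", "importance", "metadata", "created_at", "updated_at",
)


def row_to_item(row: sqlite3.Row) -> dict:
    """Build a dict that always has every field, with JSON columns decoded."""
    item = {field: row[field] for field in FIELDS}
    item["tags"] = json.loads(item["tags"] or "[]")
    item["metadata"] = json.loads(item["metadata"] or "{}")
    return item


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(f"CREATE TABLE context_items ({', '.join(FIELDS)})")
ts = "2026-07-11T23:27:53+00:00"
conn.execute(
    "INSERT INTO context_items VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (1, "config_decision", "Storage choice",
     "We selected SQLite for offline storage.", "project discussion",
     json.dumps(["storage", "sqlite"]), 4, json.dumps({"week": 2}), ts, ts),
)

item = row_to_item(conn.execute("SELECT * FROM context_items").fetchone())
print(sorted(item))
print(item["tags"], item["metadata"])
```

Because every caller receives the same keys and types, search and prompt-building code can be tested against plain dictionaries without a database.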

Local AI memory is useful only when it is visible. Storage lets the system remember. Reading lets humans check that memory. Search, prompts, and inference can then build on it and move beyond the SQLite cave.

Why Local-First AI Starts With Good Context Capture

Alain Chan — Sun, 05 Jul 2026 20:30:07 +0000

https://github.com/Flow-Research/local-first-AI/pull/3

This week I worked on the Capture / Create Context portion of the Local Context Store. My job was to implement create_context_item, a function that accepts useful context data, validates it, and stores it in a local SQLite database. The aim was simple: before an AI assistant can read, search, update, and reason with memory, it has to be able to store that memory reliably.

In a local-first AI, context is not just random text saved somewhere. It is information that could help the assistant understand a user, project, device event, learning record, configuration decision, or future conversation. If this information is not stored in a structured way, the memory system can easily become cluttered. From then on, the AI can retrieve the wrong thing, misinterpret old memories, or treat weak memories as reliable ones. In other words, if we save everything without structure, AI memory stops being brain-like and becomes more like a Downloads folder.

For this task I implemented create_context_item, which takes the fields context_type, title, content, source, tags, and importance. The function accepts only valid context types: user_note, project_note, device_log, learning_record, config_decision, or conversation_context. This matters because different types of memory need to be treated differently. A device log is not a project note, and a configuration decision is not a conversation context. Giving each item a definite type makes it easier to predict where to find it later when reading or searching.

I also added validation for title and content. Memory items with an empty title or empty content are rejected, because an item should make sense when it is read later. The title gives the item an identity, and the content holds its useful information. The function behaves consistently with the rest of the team's code because I used the shared db_contract.py helpers for validation, tag normalization, importance checks, timestamps, and the database connection.
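The validation rules can be sketched as follows. This is a stand-in, not the shared db_contract.py helpers (which are not shown in this post): validate_context_input and is_valid are hypothetical names, and treating whitespace-only text as empty is my assumption.

```python
# Context types listed in the post.
VALID_CONTEXT_TYPES = {
    "user_note", "project_note", "device_log",
    "learning_record", "config_decision", "conversation_context",
}


def validate_context_input(context_type: str, title: str, content: str) -> tuple[str, str]:
    """Return the cleaned (title, content) or raise ValueError."""
    if context_type not in VALID_CONTEXT_TYPES:
        raise ValueError(f"invalid context_type: {context_type!r}")
    title, content = title.strip(), content.strip()
    if not title:
        raise ValueError("title must not be empty")
    if not content:
        raise ValueError("content must not be empty")
    return title, content


def is_valid(context_type: str, title: str, content: str) -> bool:
    """Convenience wrapper for callers that only need a yes/no answer."""
    try:
        validate_context_input(context_type, title, content)
        return True
    except ValueError:
        return False


print(is_valid("config_decision", "Storage choice", "We selected SQLite."))  # True
print(is_valid("config_decision", "   ", "We selected SQLite."))             # False
print(is_valid("shopping_list", "Milk", "Buy milk."))                        # False
```

Rejecting bad input before the INSERT means nothing downstream ever has to guess what an untitled or untyped memory was supposed to be.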

The source field is also important. It documents where the context came from, such as a project discussion, a meeting note, a device event, or an entry in a weekly learning log. This helps the memory be trusted later. When the AI or a developer finds a stored context item in the future, the source helps explain why it exists and whether it is still relevant.

The function also saves the created_at and updated_at fields automatically. When a context item is first created, both timestamps are the same. Later updates to the item do not change created_at, so it always shows when the memory was captured, while updated_at shows when the item was last changed. This is helpful for sorting, filtering, syncing, and viewing the history of local data.
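A sketch of the timestamp behavior, using a reduced version of create_context_item (fewer fields than the real function) and a hypothetical update_content helper. The optional now parameter exists only to make the example deterministic.

```python
import sqlite3
from datetime import datetime, timezone


def now_iso() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def create_context_item(conn, context_type, title, content, now=None) -> int:
    """Insert an item; created_at and updated_at start out identical."""
    ts = now or now_iso()
    cur = conn.execute(
        "INSERT INTO context_items (context_type, title, content, created_at, updated_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (context_type, title, content, ts, ts),
    )
    conn.commit()
    return cur.lastrowid


def update_content(conn, item_id, content, now=None) -> None:
    """Hypothetical update: touches updated_at and leaves created_at alone."""
    conn.execute(
        "UPDATE context_items SET content = ?, updated_at = ? WHERE id = ?",
        (content, now or now_iso(), item_id),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE context_items (
        id INTEGER PRIMARY KEY AUTOINCREMENT, context_type TEXT, title TEXT,
        content TEXT, created_at TEXT, updated_at TEXT)"""
)
item_id = create_context_item(
    conn, "config_decision", "Storage", "Use SQLite", now="2026-07-05T20:30:07+00:00"
)
before = conn.execute("SELECT created_at, updated_at FROM context_items").fetchone()
update_content(conn, item_id, "Use SQLite with WAL", now="2026-07-06T08:00:00+00:00")
after = conn.execute("SELECT created_at, updated_at FROM context_items").fetchone()
print(before, after)
```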

Once it was implemented, I ran the Week 2 local storage test script. The Fellow 1 tests passed: the function can create a context item, return a unique integer ID, store the correct fields, save timestamps, reject an empty title or content, and reject invalid context types. This confirms that the create step works as the entry point for the Local Context Store.

This task showed me that local AI memory does not begin with a model. It begins with cleanly captured data. Creating a small function might sound easy, but it establishes the rules for everything that follows. Fellow 2 can read the context, Fellow 3 can search it, and Fellow 4 can update or delete it. The rest of the memory system is stable only if creation is consistent.

From Prototype to Polished: Reviving My Dart Personal Blog with GitHub Copilot

Alain Chan — Sat, 06 Jun 2026 20:52:44 +0000

This is a submission for the GitHub Finish-Up-A-Thon Challenge

What I Built

I revived personal_blog, a server-rendered personal blogging platform built with Dart Shelf, PostgreSQL, Mustache templates, and Tailwind CSS.

Repository: https://github.com/AlainDevs/personal_blog

The project started as a bare-bones Dart web app and gradually became a small publishing system with:

Public blog pages and individual post pages
User registration and login
Admin-only pages for posts, categories, users, and settings
PostgreSQL persistence
Seeded demo content and demo users
Tailwind-powered styling
Docker Compose setup for one-command local deployment
Automated tests for authentication, settings, comments, middleware, and blog routing
A separate Docker-based performance testing stack using wrk and Lua

What makes this project meaningful to me is that it is intentionally simple. It is not trying to be another huge CMS. It is a calm, personal publishing space: fast to run, easy to understand, and small enough for one developer to confidently maintain.

Before the comeback, it still felt like a rough prototype. It had pieces of a blog, but it was not easy enough to run, test, benchmark, or explain to another person. The Finish-Up-A-Thon gave me a reason to turn it into something I could actually hand to someone and say: clone it, run one command, log in, and start exploring.

Demo

Project repository: https://github.com/AlainDevs/personal_blog

Run the demo locally with Docker Compose:

git clone https://github.com/AlainDevs/personal_blog.git
cd personal_blog
docker compose up --build

Then open:

http://localhost:8080

Admin area:

http://localhost:8080/admin

Performance report included in the repository:

The latest recorded smoke benchmark reported:

1,877 total requests in 2.01 seconds
935.31 requests per second
4.27ms average latency
7.45ms 99th percentile latency
No reported socket errors
No reported non-2xx/3xx responses

Validation checks recorded in the project:

docker compose -f docker-compose.performance.yml config --quiet passed
dart analyze passed
dart test --timeout=30s passed

The Comeback Story

The project had eight commits in total. The first four commits created the original prototype: project structure, Tailwind setup, CRUD-oriented services, Docker scripts, and an early server implementation.

The Finish-Up-A-Thon comeback happened in the four most recent commits. These were the commits where I used GitHub Copilot, together with my AI coding context and rules including ByteRover, DCM Flutter Guidelines, and AI rules for Flutter/Dart-style development, to push the project from “works on my machine prototype” toward “finished, runnable, documented project.”

Commit 1: `8345dbd` — Docker infrastructure and service layer enhancements

Commit message: Add Docker infrastructure and service layer enhancements

This was the biggest comeback commit. It changed 44 files with 4,183 insertions and 1,577 deletions.

Before this commit, the project had useful pieces, but the architecture was still too tangled. Server setup, routing, database access, auth behavior, and page rendering were not cleanly separated enough for confident testing or future changes.

What changed:

Added a cleaner createAppHandler() function so the Shelf app can be built both by the executable and by tests.
Improved the middleware flow so JWT auth context is attached to requests.
Protected admin pages and admin APIs more consistently.
Added DatabaseConnection as a shared PostgreSQL bootstrap layer.
Added schema creation and seed data for users, categories, posts, comments, post categories, and application settings.
Added an AppSetting model.
Added SettingsService and SettingsHandler.
Added admin settings UI for toggling public registration.
Improved service-layer boundaries for users, posts, categories, comments, and settings.
Used safer named SQL patterns through Sql.named style interactions.
Updated Docker Compose to use PostgreSQL 18 and a persistent database volume.
Added tests for auth, registration settings, comments, and admin middleware.

This commit was where the project stopped being a pile of working code and started feeling like an application with structure.

Copilot helped here by accelerating the repetitive but important parts: service methods, handler wiring, model mapping, test doubles, and route refactors. The AI rules helped keep the generated code closer to consistent Dart conventions instead of random snippets.

Commit 2: `d8832a9` — Safer route parameter handling

Commit message: refactor: Replace direct access to request parameters with a utility function for safer path string retrieval

This was a smaller but important hardening commit.

Before this commit, route handlers accessed path parameters directly, for example by reading request.params['slug'] inline. That works, but it spreads low-level request handling across the codebase.

What changed:

Added readPathString(Request request, String key) in request_utils.dart.
Updated integer path parsing so readPathInt() builds on top of readPathString().
Updated the blog detail route to read the slug through the utility function.
Added a test proving that /blog/a-tiny-publishing-checklist renders the correct blog detail page.
Added fake post/comment services so the page route can be tested without a real database.

This commit represents the “finish-up” mindset: not just adding features, but reducing fragile patterns and locking behavior with tests.

Copilot, used through MCP, helped by suggesting the test structure and the fake service overrides. That let me quickly validate the route behavior rather than only eyeballing the refactor.

Commit 3: `f52a043` — Performance testing infrastructure

Commit message: feat: Add performance testing infrastructure with Docker and Lua scripts

This commit added 1,015 lines across seven files.

Before this commit, I could run the app, but I did not have a repeatable way to answer a basic question: “How does it behave under load?”

What changed:

Added docker-compose.performance.yml, a separate performance testing stack.
Added an Alpine-based performance Dockerfile that builds wrk.
Added request_mix.lua for weighted traffic across realistic routes:
- homepage
- seeded blog detail pages
- generated CSS
- public JavaScript
Added generate_report.js, which runs Docker Compose, captures benchmark output, parses results, and writes a GitHub-ready Markdown report.
Added PERFORMANCE_RESULTS.md with the latest benchmark output.
Added npm scripts:
- npm run performance:report
- npm run performance:report:smoke
Documented how to tune benchmark load with environment variables.

This was a big step toward making the project feel complete. A personal blog should not only have features; it should be easy to verify that pages respond quickly and that changes do not obviously break performance.

Copilot was especially useful here because the work crossed several small domains: Docker Compose, shell readiness checks, Lua route selection for wrk, Node.js process management, Markdown report generation, and benchmark parsing.

Commit 4: `fdb4e92` — README polish and beginner-friendly setup

Commit message: docs: Update README to enhance Docker Compose instructions and clarify setup process

This final comeback commit focused on usability.

Before this commit, the README still described the project like a basic Dart web app and told users to install WebDev manually. That no longer matched the revived project, so I addressed it by creating our own agent, doc-reviewer.

What changed:

Rewrote the README around Docker Compose as the primary way to run the project.
Explained that users do not need to install Dart, Node.js, PostgreSQL, or WebDev locally.
Added step-by-step startup instructions.
Added the local URL: http://localhost:8080.
Added seeded admin and reader accounts.
Added the admin URL: http://localhost:8080/admin.
Added stop, restart, and database reset instructions.
Kept the performance testing section so users can generate benchmark reports.

This was the last mile of finishing the project. The app may be technically complete, but if another developer cannot run it easily, it still feels unfinished. This README update made the project approachable.

Copilot helped turn rough notes into a clearer onboarding path and made the documentation more user-focused.

My Experience with GitHub Copilot

GitHub Copilot helped me finish the project in the way I actually needed: not by replacing my decisions, but by keeping momentum while I worked through lots of small, connected tasks.

I used Copilot with my AI development context, including ByteRover, DCM Flutter Guidelines, and AI rules for Flutter/Dart-style development. Even though this is a Dart Shelf web server rather than a Flutter UI app, those rules still helped encourage cleaner structure, explicit tests, safer utilities, and more maintainable code.

The most useful parts of Copilot were:

Turning architectural intent into concrete Dart service and handler code.
Helping refactor the Shelf server into a testable createAppHandler() structure.
Suggesting test cases and fake services for auth, settings, comments, and blog routes.
Speeding up repetitive model and mapping work.
Helping write Docker Compose and benchmark infrastructure without constantly switching mental context.
Helping polish the README so the final project is easier for another person to run.

The before-and-after arc is clear to me:

Before, personal_blog was a promising but unfinished side project. It had the shape of a blog, but it still required too much local setup knowledge, had less confidence around tests, and did not have a clear performance story.

After, it is a Dockerized Dart personal blog with seeded content, admin flows, persistent PostgreSQL storage, application settings, automated tests, performance benchmarking, and beginner-friendly documentation.

That is exactly what I wanted from this challenge: not to start something new, but to finally finish something I already cared about.

Breaking the Silence: Running Hermes Agent with Local C++ Voice Cloning (VoxCPM2) on ARM64

Alain Chan — Fri, 29 May 2026 16:20:13 +0000

This is a submission for the Hermes Agent Challenge

Most AI agents are deaf and mute, communicating solely through text or latency-heavy cloud TTS APIs. When I set out to build a fully autonomous morning assistant using Hermes Agent hosted locally on my Debian ARM64 server, I wanted something different. I wanted a private, high-fidelity, cloned voice that could talk to me natively on WhatsApp every morning with custom-tailored weather briefings and diet-aware recommendations.

To achieve this, I integrated Hermes with VoxCPM2—a highly optimized multilingual speech-cloning model running in clean C++. Along the way, I hit some brutal low-level compilation blocks, model-packaging quirks, and real-time audio pipeline hurdles.

Here is the exact blueprint of how I overcame these ARM64 limitations, patched GGML, and wired Hermes Agent to speak to me in a pristine, cloned voice.

The Vision: A Voice-First Private Daily Agent

The goal was to leverage Hermes Agent's autonomous Cron and Persistent Memory systems. Every morning at a scheduled time, a cron job fires a custom Python script. Hermes gathers local weather forecasts, synthesizes them with personal preferences, and prepares a daily briefing.

Instead of printing text, the agent passes the payload to a local C++ inference pipeline, clones a target voice, packages the audio, and sends it directly to my WhatsApp as an instant, native voice message.

+-----------------------------------------------------------------+
|                         Hermes Agent                            |
|  [Cron Job (Scheduled)] -> [Weather/News Fetch] -> [Persist Mem]|
+-------------------------------+---------------------------------+
                                | (Text Payload)
                                v
+-----------------------------------------------------------------+
|                     VoxCPM2 C++ Engine                          |
|  [16kHz Reference WAV] -> [ggml.cpp] -> [High-Fid FP16 Voice]   |
+-------------------------------+---------------------------------+
                                | (Raw WAV Output)
                                v
+-----------------------------------------------------------------+
|                     Audio Pipeline & Delivery                   |
|  [FFmpeg (OGG/Opus)] -> [Local WA Bridge] -> [Native Voice Msg] |
+-----------------------------------------------------------------+

Hurdle 1: Bypassing the 64-Character GGML Tensor Limit

VoxCPM2's C++ inference engine relies on a clean, local build of ggml. When compilation finished and I attempted to load the larger, highly expressive GGUF models for multimodal/cloned speech, the engine crashed instantly with loading errors.

The Cause:

GGML historically hardcodes GGML_MAX_NAME (the maximum length of a tensor's name) to 64 characters. Because high-fidelity speech models contain deep, hierarchical layers with descriptive naming schemes, their tensor names easily exceed 64 characters.

The Fix:

I had to patch the underlying GGML source before building. If you are running into this, navigate to third_party/ggml/include/ggml.h and increase the limit to 128:

// Locate in third_party/ggml/include/ggml.h
// Old definition:
// #define GGML_MAX_NAME 64

// New patched definition:
#define GGML_MAX_NAME 128

After making this change and rebuilding the C++ project, the GGUF loader parsed the deep voice layers successfully, without truncation or segmentation faults.
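To see which tensor names would not fit before patching, a quick length check is enough. This is a sketch under assumptions: names_over_limit is a hypothetical helper, the tensor names are invented for illustration, and in practice the names would have to be read from the GGUF file with a tool of your choice. The one firm detail is that a C buffer of GGML_MAX_NAME bytes needs one byte for the trailing NUL, so only GGML_MAX_NAME - 1 bytes of name fit.

```python
GGML_MAX_NAME_DEFAULT = 64  # value of the original #define
GGML_MAX_NAME_PATCHED = 128  # value after the patch


def names_over_limit(tensor_names, max_name=GGML_MAX_NAME_DEFAULT):
    """Return names whose UTF-8 length cannot fit in a max_name-byte C buffer."""
    # One byte is reserved for the NUL terminator.
    return [name for name in tensor_names if len(name.encode("utf-8")) >= max_name]


# Invented names, only to exercise the check.
example_names = [
    "base_lm.layers.0.attn.q_proj.weight",
    "audio_vae.decoder." + "residual_block." * 4 + "conv.weight",
]

for name in example_names:
    print(len(name), name)

print("too long for 64:", len(names_over_limit(example_names)))
print("too long for 128:", len(names_over_limit(example_names, GGML_MAX_NAME_PATCHED)))
```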

Hurdle 2: Untangling Model Packages for C++ Inference

Many single-file GGUF packages available online (e.g., standard model merges) lack the necessary metadata required by the raw C++ inference binary of VoxCPM2.

To run end-to-end voice cloning ("Ultimate Mode") successfully, I discovered that you must load separated model files that preserve explicit metadata structure:

base_lm_q8_0.gguf (The quantized base language model weights)
residual_lm_q8_0.gguf (The residual weights)
Or verified unified packages such as voxcpm2-q8_0-audiovae-f16.gguf from bluryar/VoxCPM-GGUF.

By utilizing an FP16 high-fidelity model on an ARM64 CPU, we prioritize pristine vocal textures and rich tone over fast but robotic lower-precision modes.

Hurdle 3: Designing the Real-Time Audio & Delivery Pipeline

Getting Hermes to talk natively on WhatsApp requires an exact, low-latency audio pipeline.

Step 1: Format Reference Audio

VoxCPM2 C++ cloning requires a pristine 16kHz mono WAV format reference file. Our utility script converts a standard MP3 sample to the exact format needed before running the model:

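# Conversion using FFmpeg in a Python subprocess
import subprocess

# args.ref_mp3 and args.ref_wav come from the script's command-line parser
subprocess.run([
    "ffmpeg", "-y", "-i", args.ref_mp3,
    "-ar", "16000", "-ac", "1", args.ref_wav  # 16 kHz sample rate, mono
], check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)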

Step 2: C++ Inference

The utility executes the C++ binary with customized parameters, leveraging multi-threading optimized for the server's ARM64 CPU:

/home/debian/VoxCPM.cpp/build/examples/voxcpm_tts \
    --model-path /path/to/voxcpm2-f16-audiovae-f16.gguf \
    --prompt-audio /path/to/ref.wav \
    --prompt-text "Reference voice transcript text." \
    --text "Target synthesis weather report text." \
    --output /path/to/output.wav \
    --backend cpu \
    --threads 4 \
    --cfg-value 2.0 \
    --inference-timesteps 10

Step 3: Low-Latency Encoding & WhatsApp Bridge Delivery

Standard WAV files arrive on WhatsApp as document attachments. To deliver them as native, instant voice messages (playable voice bubbles), we transcode them into .ogg format using the highly compressed Opus codec.

We can also apply FFmpeg's dynaudnorm (dynamic audio normalizer) filter to keep output volume levels consistent:

ffmpeg -y -i output.wav -filter:a dynaudnorm -c:a libopus output.ogg

Once the audio file is ready, the script programmatically makes an HTTP POST request to a local WhatsApp API bridge endpoint /send-media with the payload:

{
  "chatId": "user_whatsapp_jid@lid",
  "filePath": "/path/to/output.ogg",
  "mediaType": "audio"
}

This forces WhatsApp to render the media natively as a press-to-play instant voice message bubble!
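The encoding and delivery steps can be sketched with the standard library alone. The function names are hypothetical, and the bridge address http://localhost:3000 is an assumption; only the /send-media path, the payload fields, and the FFmpeg options come from the steps above.

```python
import json
import urllib.request


def build_opus_command(wav_path: str, ogg_path: str) -> list[str]:
    """FFmpeg arguments matching the transcoding command shown above."""
    return ["ffmpeg", "-y", "-i", wav_path,
            "-filter:a", "dynaudnorm", "-c:a", "libopus", ogg_path]


def build_send_media_request(bridge_url: str, chat_id: str, file_path: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST to the bridge's /send-media endpoint."""
    payload = {"chatId": chat_id, "filePath": file_path, "mediaType": "audio"}
    return urllib.request.Request(
        bridge_url.rstrip("/") + "/send-media",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


encode_cmd = build_opus_command("output.wav", "output.ogg")
request = build_send_media_request(
    "http://localhost:3000",  # assumed bridge address
    "user_whatsapp_jid@lid",
    "/path/to/output.ogg",
)
print(" ".join(encode_cmd))
print(request.get_method(), request.full_url)
# To execute: subprocess.run(encode_cmd, check=True), then urllib.request.urlopen(request)
```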

Combining It All: The Self-improving Local Weather Scheduler

The backbone of this workflow consists of two main Python components scheduled and triggered under Hermes Agent:

cron_morning_weather.py: Fetches real-time JSON forecast from wttr.in for the user's location, parses hourly temperatures, converts English weather descriptions into natural, expressive Cantonese, decides if an umbrella is needed, and outputs a cute morning briefing.
run_clone.py: Receives the text payload, packages the model, compiles the C++ parameters, encodes the audio using ffmpeg to libopus, and delivers it to the local WhatsApp gateway bridge.
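The umbrella decision in cron_morning_weather.py could look something like the sketch below. Everything here is an assumption rather than the actual script: needs_umbrella is a hypothetical name, the 50% threshold is invented, and the sample dictionary only imitates the shape I expect from wttr.in's JSON output (a weather list whose first day has hourly entries with a chanceofrain string).

```python
def needs_umbrella(forecast: dict, threshold: int = 50) -> bool:
    """True if any hourly slot today reaches the rain-chance threshold."""
    hourly = forecast["weather"][0]["hourly"]
    return any(int(slot.get("chanceofrain", 0)) >= threshold for slot in hourly)


# Hand-written sample in the assumed shape; not real forecast data.
sample_forecast = {
    "weather": [
        {
            "hourly": [
                {"time": "900", "tempC": "14", "chanceofrain": "10"},
                {"time": "1200", "tempC": "17", "chanceofrain": "65"},
                {"time": "1500", "tempC": "18", "chanceofrain": "20"},
            ]
        }
    ]
}

print(needs_umbrella(sample_forecast))                  # True
print(needs_umbrella(sample_forecast, threshold=80))    # False
```

Keeping the decision in a pure function makes it testable without calling the weather service.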

The Magic of Hermes Agent: Memory and Location Privacy

What makes this system genuinely autonomous rather than a simple cron-bash script is Hermes's self-improving memory architecture.

Persistent Memory (User Profile): Hermes maintains an ongoing log of user preferences across sessions. It remembers what the user cares about, for example an interest in philosophy.
Context-Aware Briefings: When generating the script text, Hermes synthesizes these facts from its memory. The morning weather update isn't just a reading of numbers; it dynamically adds philosophical thoughts suited to the day's weather.
Timezone Synchronization: Because scheduled cron tasks run on the server's UTC clock, Hermes automatically calculates the local offset (e.g. BST vs UTC) to ensure the morning briefing is delivered exactly at the user's local wake-up time.
Autonomous Skill Management: When there are path updates or script logic tweaks, Hermes adjusts its internal reference memory, avoiding stale or cached references during execution.
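The timezone synchronization point comes down to converting a local wake-up time into the UTC time the server should fire at. A minimal sketch, assuming a hypothetical utc_fire_time helper and a fixed BST offset of one hour; this is not how Hermes implements it internally.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo


def utc_fire_time(day: date, wake_up: time, local_tz: tzinfo) -> datetime:
    """UTC moment at which the cron job must run to hit the local wake-up time."""
    local_dt = datetime.combine(day, wake_up, tzinfo=local_tz)
    return local_dt.astimezone(timezone.utc)


BST = timezone(timedelta(hours=1), "BST")  # British Summer Time, UTC+1

fire = utc_fire_time(date(2026, 5, 29), time(7, 30), BST)
print(fire.isoformat())  # 2026-05-29T06:30:00+00:00
```

A fixed offset does not follow daylight saving changes. Where the tz database is available, passing zoneinfo.ZoneInfo("Europe/London") as local_tz lets the same function switch between GMT and BST on its own.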

Why Open-Source Agents Win

Running Hermes Agent locally on an ARM64 server proved something crucial: We do not need to rely on proprietary or closed-source ecosystems to build delightful, highly personalized AI experiences.

With a 4-line patch to ggml.h, an optimized C++ inference binary, and Hermes's robust multi-session persistent memory, I have a private, voice-cloning companion that knows my diet, my daily schedule, and my philosophical quirks—costing virtually nothing when idle.

If you are building with Hermes, don't just stay in the terminal. Give your agent a voice, patch those C++ boundaries, and build something that feels alive!
