DEV Community

mage0535

Posted on • Originally published at hermes-agent.nousresearch.com

Knowledge and Memory Management v0.0.2: Portable Knowledge Collection at Scale

For developers managing ever-growing information streams from web pages, video transcripts, and technical articles, the challenges of consistent tooling and environment setup often overshadow the actual value of knowledge curation. Version 0.0.2 of the Knowledge-and-Memory-Management project focuses on exactly that friction, delivering a clean release that replaces hardcoded paths with a portable $AGENT_HOME variable and extends the core collection pipeline. This is not a feature-heavy expansion; it is a pragmatic cleanup and a baseline for scalable knowledge workflows.

The key architectural change is the elimination of personal path references. Previous configurations might have pointed to /home/user/data or C:\Users\dev\projects. In v0.0.2, every directory reference uses $AGENT_HOME, an environment variable that must be set before the tool starts. This makes the entire toolchain portable, whether you are running it on a laptop, a CI runner, or a production server. The release also streamlines how knowledge is ingested from three primary sources: web, video, and articles.

Knowledge Collection Pipeline

The collection system is now separated into three composable extractors. Each source type has its own adapter, but they all output a normalized document structure with fields for title, content, timestamps, and source metadata. This uniformity is critical for downstream memory management and retrieval.
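
To make the shared output shape concrete, here is a minimal sketch of what a normalized document could look like. The class name, field names, and the to_dict helper are assumptions for illustration; the release only states that the structure carries title, content, timestamps, and source metadata.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass
class Document:
    """Hypothetical normalized record emitted by every extractor."""

    title: str
    content: str
    source_type: str  # assumed values: "web", "video", or "article"
    source_url: str
    # Timestamp of collection, stored in UTC.
    collected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    # Publication time, when the source exposes one.
    published_at: Optional[datetime] = None
    # Source-specific extras such as DOI, authors, or chapter markers.
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        """Return a JSON-friendly dict with ISO 8601 timestamps."""
        data = asdict(self)
        data["collected_at"] = self.collected_at.isoformat()
        data["published_at"] = (
            self.published_at.isoformat() if self.published_at else None
        )
        return data
```

Keeping source-specific details in a free-form metadata field is one way to let all three adapters share a single type without losing what makes each source different.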

  • Web collection uses a headless browser fetch with a configurable CSS selector for main content. It strips navigation, ads, and sidebars, returning clean markdown. The adapter handles JavaScript-heavy sites by default, waiting either for the page to load or for a custom selector to appear, subject to a timeout.
  • Video collection targets video platforms via an API-first approach. For YouTube, it uses the captions or automatic transcripts. It also extracts the video description and comments as supplementary context. The output is a plaintext transcript with chapter markers if available.
  • Article collection is designed for RSS feeds and academic papers. It parses metadata (DOI, authors, publication date) separately from the body text. Articles longer than 10,000 characters are split into logical sections with internal anchors for reference.
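
The section splitting for long articles can be sketched as follows. The release does not say how "logical sections" are detected, so this example assumes paragraph boundaries (blank lines) and a hypothetical section budget of 4,000 characters. Only the 10,000-character threshold comes from the text above; the function name and anchor format are illustrative.

```python
from typing import List, Tuple

SPLIT_THRESHOLD = 10_000  # characters; stated in the release notes


def split_into_sections(
    text: str, max_section_chars: int = 4_000  # assumed budget
) -> List[Tuple[str, str]]:
    """Return (anchor, section_text) pairs.

    Texts at or below SPLIT_THRESHOLD come back as a single section.
    Longer texts are split on blank lines and the paragraphs are packed
    greedily into sections of roughly max_section_chars. A paragraph
    larger than the budget is kept whole in its own section.
    """
    if len(text) <= SPLIT_THRESHOLD:
        return [("section-1", text)]

    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    sections: List[str] = []
    current: List[str] = []
    current_len = 0
    for para in paragraphs:
        # Close the current section if this paragraph would overflow it.
        if current and current_len + len(para) > max_section_chars:
            sections.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += len(para)
    if current:
        sections.append("\n\n".join(current))

    return [(f"section-{i}", body) for i, body in enumerate(sections, start=1)]
```

The anchors are stable as long as the input text is unchanged, which is what makes them usable as internal references.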

All collectors respect a global rate-limit and can be chained in a single pipeline. For example, you can fetch a YouTube video, get its transcript, search the comments for related articles, and then fetch each of those articles in a single command.
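
One way a global rate limit could be shared across chained collectors is a single limiter object that every fetch passes through. This is a sketch under assumptions: the RateLimiter class, its parameters, and the minimum-interval policy are hypothetical, since the release does not describe how the limit is implemented.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Enforce a minimum interval between requests (hypothetical helper)."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock  # injectable for testing
        self._sleep = sleep  # injectable for testing
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self._last_call + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        # Record when this request is considered to have started.
        self._last_call = now + delay
        return delay
```

In a chained run, each collector would call wait() on the same instance before every network request, so a video fetch followed by several article fetches stays under one shared budget.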

Memory Management and Portability

The memory layer stores collected documents as flat files organized by source type and date. The directory structure is strictly relative to $AGENT_HOME. The release includes a migration script that scans your existing knowledge base and replaces any hardcoded paths with the variable. This script is idempotent and defaults to dry-run mode.
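
The behavior of such a migration can be sketched in a few lines. This is not the script shipped with the release; the function name, its parameters, and the choice of a ${AGENT_HOME} placeholder are assumptions. It illustrates the two properties mentioned above: dry run by default, and no changes on a second run.

```python
from pathlib import Path
from typing import List


def migrate_paths(
    root: Path,
    old_prefix: str,
    dry_run: bool = True,
    placeholder: str = "${AGENT_HOME}",  # assumed placeholder syntax
) -> List[Path]:
    """Replace old_prefix with placeholder in text files under root.

    Returns the files that contain old_prefix. With dry_run=True nothing
    is written. After a real run, a second run finds nothing to change,
    because the placeholder no longer matches old_prefix.
    """
    matched: List[Path] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip files that are not UTF-8 text
        if old_prefix not in text:
            continue
        matched.append(path)
        if not dry_run:
            path.write_text(
                text.replace(old_prefix, placeholder), encoding="utf-8"
            )
    return matched
```

Calling migrate_paths(Path("kb"), "/home/user/data") would only report candidates; passing dry_run=False applies the rewrite.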

The configuration is now purely environment-aware. A short example of the YAML configuration file illustrates the portability:

# knowledge-config.yml
agent:
  home: ${AGENT_HOME}  # must be set in environment
  log_level: info

collectors:
  web:
    enabled: true
    timeout: 15
    selectors:
      article: "main"
      sidebar: ".ad, nav, footer"
    output: "${AGENT_HOME}/collections/web"

  video:
    enabled: true
    platforms:
      - youtube
      - vimeo
    output: "${AGENT_HOME}/collections/video"

  article:
    enabled: true
    feeds:
      - https://feeds.example.com/tech
    output: "${AGENT_HOME}/collections/articles"

memory:
  store: filesystem
  path: "${AGENT_HOME}/memory"
  clean_interval_hours: 24

No path in this file is specific to one machine. The tool automatically resolves all $AGENT_HOME references during initialization. If the variable is unset at startup, it exits with a clear error message and a hint to set it via .env or shell export.
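
The resolution step might look roughly like this once the YAML has been parsed into plain dicts and lists. The function and exception names are hypothetical, and the wording of the error is only an example; the tool's actual initialization code is not shown in the release.

```python
import os
import re
from typing import Any, Mapping

# Matches ${AGENT_HOME} and bare $AGENT_HOME, but not $AGENT_HOME_X.
_AGENT_HOME = re.compile(r"\$\{AGENT_HOME\}|\$AGENT_HOME\b")


class AgentHomeError(RuntimeError):
    """Raised when AGENT_HOME is missing (hypothetical error type)."""


def resolve_agent_home(config: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Return a copy of config with AGENT_HOME references expanded."""
    home = env.get("AGENT_HOME")
    if not home:
        raise AgentHomeError(
            "AGENT_HOME is not set. Export it in your shell or define it in .env."
        )

    def walk(node: Any) -> Any:
        if isinstance(node, str):
            # A function replacement avoids backslash escapes in Windows paths.
            return _AGENT_HOME.sub(lambda _match: home, node)
        if isinstance(node, dict):
            return {key: walk(value) for key, value in node.items()}
        if isinstance(node, list):
            return [walk(value) for value in node]
        return node  # numbers, booleans, None pass through unchanged

    return walk(config)
```

Failing fast on a missing variable is the point of the design: a half-resolved path such as /collections/web at the filesystem root would be much harder to notice later.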

Practical Use

Experienced developers will appreciate the removal of guesswork. In earlier versions, setting up a new environment meant editing multiple files or copying directories from an old machine. Now you simply export AGENT_HOME=/mnt/portable_knowledge and run. The collectors also respect AGENT_HOME for temporary directories, cache, and logs, keeping your system tmp clean.

The configuration above is enough to get a full pipeline running. For instance, to fetch and store a web article, you can run:

knowledge-collect --config knowledge-config.yml \
  --source web --url https://example.com/tech-article

The tool will fetch the page, extract the main content, and save it to $AGENT_HOME/collections/web/. If you later move to another machine, the same command works as long as $AGENT_HOME is set accordingly.

Clean Release Mechanics

This release does not add any new collectors. Instead, it consolidates the existing ones, fixes race conditions in the video transcript fetcher, and removes deprecated adapters for now-dead knowledge bases. The memory module (the storage layer) now deduplicates documents by content hash before writing. This prevents redundant storage when the same article is fetched from two different feeds.
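
Deduplication by content hash can be illustrated with a short sketch. The hash algorithm (SHA-256 here), the file naming scheme, and the function name are assumptions; the release only states that a content hash is checked before writing.

```python
import hashlib
from pathlib import Path
from typing import Optional


def store_if_new(content: str, store_dir: Path) -> Optional[Path]:
    """Write content to a file named after its SHA-256 digest.

    Returns the new file path, or None if identical content is already
    stored, regardless of which feed or URL it came from.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    target = store_dir / f"{digest}.txt"
    if target.exists():
        return None  # duplicate content, skip the write
    store_dir.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```

Note that an exact hash only catches byte-identical text. Two feeds that wrap the same article in slightly different markup would need some normalization step first, which this sketch leaves out.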

The entire codebase was audited for hardcoded paths. Every occurrence of /home or %USERPROFILE% was replaced with a reference to $AGENT_HOME, which is resolved at runtime rather than fixed at build time. This means you can mount a knowledge base from an external drive, point $AGENT_HOME at it, and have all collections available immediately.

Bottom Line

v0.0.2 is not a revolution; it is a necessary cleanup that makes the tool production-ready for teams. The $AGENT_HOME portability removes the biggest friction point in setting up knowledge pipelines across systems. The collection pipeline remains straightforward for the three main formats, and the configuration is self-contained. For developers who want a reliable, environment-agnostic knowledge management system, this release is a solid foundation.
