Knowledge-and-Memory-Management v0.0.2: Knowledge Collection & Memory Management

#ai #automation #opensource

Knowledge-and-Memory-Management v0.0.2 is a clean release that introduces structured knowledge collection from web, video, and article sources, alongside memory management enhancements. All hardcoded paths have been replaced with the portable $AGENT_HOME variable, making the system deployable across environments without manual configuration. This release targets developers building autonomous systems that require persistent, queryable knowledge bases.

The core addition in v0.0.2 is the Knowledge Collection module. It abstracts content ingestion into a unified pipeline with plugins for specific sources: web scraping (HTML and RSS), video transcript extraction (via YouTube API or local file processing), and article parsing (supporting PDF, EPUB, and Markdown). Each plugin normalizes content into a chunked, timestamped structure that is passed directly to memory storage—no intermediate files are written by default.
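The release does not document the normalized structure, so the following is only a minimal sketch of what "chunked, timestamped" could look like. The `Chunk` dataclass, its field names, and the word-count chunking rule are assumptions made for illustration, not the library's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Chunk:
    """One normalized piece of ingested content (hypothetical schema)."""
    source: str    # plugin that produced it, e.g. "web", "video", "article"
    text: str
    position: int  # index of the chunk within its document
    collected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def chunk_text(source: str, text: str, max_words: int = 50) -> list[Chunk]:
    """Split text into chunks of at most max_words whitespace-separated words."""
    words = text.split()
    chunks = []
    for start in range(0, len(words), max_words):
        piece = " ".join(words[start:start + max_words])
        chunks.append(Chunk(source=source, text=piece, position=len(chunks)))
    return chunks


chunks = chunk_text("web", "word " * 120, max_words=50)
print([len(c.text.split()) for c in chunks])  # [50, 50, 20]
```

A real plugin would more likely split on sentence or token boundaries. The point here is only that every source ends up in the same shape before it reaches memory.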

Memory Management in v0.0.2 uses a vector-based index with optional persistent backends (SQLite, PostgreSQL, or Redis). Ingested knowledge is automatically embedded using a configurable model (default: all-MiniLM-L6-v2) and stored with metadata tags. The system supports automatic deduplication via content hashing and offers a hybrid retrieval mechanism that combines vector similarity with keyword filters. A new forget API allows explicit removal of entries by ID or age, enabling control over memory capacity.
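To make these three behaviors concrete, here is a small in-memory sketch of hash-based deduplication, tag-filtered similarity search, and forgetting by ID or age. `TinyMemory` and all of its method signatures are hypothetical. The sketch uses plain Python lists as stand-in embeddings rather than a real model, and it treats the "keyword filter" as a tag match, which is an assumption.

```python
import hashlib
import math
import time


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyMemory:
    """Illustrative in-memory store; not the library's AgentMemory."""

    def __init__(self):
        self.entries = {}  # content hash -> entry dict

    def add(self, text, vector, tags, now=None):
        """Store an entry unless identical content is already present."""
        key = hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
        if key in self.entries:
            return key, False  # duplicate content: nothing stored
        self.entries[key] = {
            "text": text,
            "vector": vector,
            "tags": set(tags),
            "added": time.time() if now is None else now,
        }
        return key, True

    def query(self, vector, top_k=3, tags=None):
        """Keep entries sharing at least one tag, then rank by cosine similarity."""
        wanted = set(tags or [])
        candidates = [
            e for e in self.entries.values()
            if not wanted or (e["tags"] & wanted)
        ]
        candidates.sort(key=lambda e: cosine(vector, e["vector"]), reverse=True)
        return candidates[:top_k]

    def forget(self, entry_id=None, older_than=None, now=None):
        """Remove one entry by ID, or every entry older than older_than seconds."""
        now = time.time() if now is None else now
        doomed = [
            k for k, e in self.entries.items()
            if k == entry_id
            or (older_than is not None and now - e["added"] > older_than)
        ]
        for k in doomed:
            del self.entries[k]
        return len(doomed)


mem = TinyMemory()
mem.add("solar output rose", [1.0, 0.0], ["web"], now=0)
mem.add("solar output rose", [1.0, 0.0], ["web"], now=1)  # same content, skipped
mem.add("wind output fell", [0.0, 1.0], ["video"], now=100)
print(len(mem.entries))  # 2
```

Hashing the stripped text catches only exact duplicates. Near-duplicate detection would need a different technique and is not claimed by the release notes.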

The transition to $AGENT_HOME is the most impactful infrastructure change. Previously, the module hardcoded paths such as /home/user/.km or C:\Users\.km. Now all data directories (index files, plugin caches, config) are resolved at runtime relative to the KM_ROOT environment variable, which defaults to $AGENT_HOME/km. This simplifies containerized deployments and multi-user setups: give each agent instance its own AGENT_HOME (or KM_ROOT) and it works in a separate, isolated directory.
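The resolution order described above can be sketched in a few lines. The function names and the subdirectory names (`index`, `plugin_cache`, `config`) are assumptions for illustration. Only the KM_ROOT and AGENT_HOME variables and the `$AGENT_HOME/km` default come from the release notes.

```python
import os
from pathlib import Path


def resolve_km_root(env=None):
    """Return the data root: KM_ROOT if set, otherwise $AGENT_HOME/km."""
    env = os.environ if env is None else env
    if env.get("KM_ROOT"):
        return Path(env["KM_ROOT"])
    agent_home = env.get("AGENT_HOME")
    if not agent_home:
        raise RuntimeError("Set KM_ROOT or AGENT_HOME before starting the agent")
    return Path(agent_home) / "km"


def data_dirs(root):
    """Map illustrative directory names to paths under the data root."""
    return {name: root / name for name in ("index", "plugin_cache", "config")}


root = resolve_km_root({"AGENT_HOME": "/srv/agent-a"})
print(root.as_posix())  # /srv/agent-a/km
```

Passing the environment in as a dictionary keeps the function easy to test. In normal use it reads `os.environ`, so two containers with different AGENT_HOME values never share a directory.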

The following code example demonstrates a basic workflow in v0.0.2: configuring an agent, collecting content from two sources, and querying memory.

import os

from knowledge_memory import AgentMemory, KnowledgeCollector

# Read the agent home from $AGENT_HOME, falling back to /tmp/agent if it is unset
agent_home = os.environ.get("AGENT_HOME", "/tmp/agent")
km = AgentMemory(home=agent_home)

# Initialize collector with source-specific options
collector = KnowledgeCollector(memory=km)
collector.add_source("web", url="https://example.com/report", selector="article")
collector.add_source("video", url="https://youtube.com/watch?v=abc123", language="en")

# Run ingestion (extracts, chunks, and stores in memory)
collector.run()

# Query memory with vector + keyword filter
results = km.query("latest findings from report", top_k=3, tags=["web", "article"])
for r in results:
    print(f"[{r.metadata['source']}] {r.content[:100]}...")

Note that add_source accepts plugin-specific parameters (e.g., selector for HTML, language for video). The collector handles all retries and error logging internally.

For developers migrating from earlier versions, the main API changes are:

AgentMemory replaces the old MemoryStore class.
All file paths must now be relative to KM_ROOT (by default $AGENT_HOME/km). If your custom plugins used absolute paths, update them to use the agent_home parameter.
The knowledge collection plugins ship as separate PyPI extras (knowledge-memory[web], knowledge-memory[video], knowledge-memory[articles]); install only the ones you need.

Potential gotchas in v0.0.2:

Video collection requires yt-dlp and ffmpeg binaries in PATH.
Article plugin uses pandoc for EPUB conversion; if absent, it falls back to plain text extraction.
Memory index upgrades are not automatic between minor versions—run km-migrate index after upgrading.
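Because missing binaries tend to surface only in the middle of an ingestion run, a preflight check can help. The sketch below uses only the standard library's `shutil.which`. The helper name and the two lists are illustrative and not part of the package.

```python
import shutil


def missing_binaries(required, which=shutil.which):
    """Return the names in `required` that cannot be found on PATH."""
    return [name for name in required if which(name) is None]


# Binaries named in the gotchas above; pandoc is optional (plain-text fallback).
REQUIRED_FOR_VIDEO = ["yt-dlp", "ffmpeg"]
OPTIONAL_FOR_ARTICLES = ["pandoc"]

if __name__ == "__main__":
    for name in missing_binaries(REQUIRED_FOR_VIDEO):
        print(f"missing required binary: {name}")
    for name in missing_binaries(OPTIONAL_FOR_ARTICLES):
        print(f"optional binary not found: {name}")
```

Running this once at container build time or agent startup reports the problem before any video or EPUB source is queued.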

Looking ahead, the v0.1.0 roadmap includes multi-agent shared memory and temporal decay for entries. For now, v0.0.2 provides a solid foundation for applications that need to ingest web content, maintain a growing knowledge base, and retrieve it efficiently. The $AGENT_HOME shift ensures that this works equally well in a Docker container, on a Raspberry Pi, or in a cloud function.

Try it out: run pip install "knowledge-memory[web,video]" (the quotes keep shells such as zsh from interpreting the square brackets) and set AGENT_HOME to your working directory. The examples in the /plugins folder show how to extend the collector for custom content sources.
