when you are building a memory system for your agents you will likely go through the following abstractions during development. they are listed in order of the manual instrumentation required, going from fully deterministic actions to increasingly probabilistic ones.
a few definitions to make sure we are on the same page:
memory: any piece of data that is passed to an agent, a model, or the user
memory system: the data storage layer; could be any combination of sql, nosql, and blob storage
agents: a catch-all term for agents, workflows, and single api calls
why would we even want to build such a system?
when operating at a small scale you can keep shovelling the entire corpus for the model to chew on. think applications such as 'chat with pdf', or data analytics over a single spreadsheet. but as the data and complexity grow you need to be more careful with your context management, both to maintain accuracy and to reduce cost/token consumption. here are some scenarios where you might want a dedicated memory system.
due to the limits of a model's context window, it's better to provide specific information as part of the prompt instead of shovelling in the entire corpus.
you might be doing data analysis over a stream of unstructured data, in which case it's better to gather and provide specific entities instead of the entire blob.
you might be building a user preference profile, so that responses can be tailored to the user instead of being generic.
the abstractions
query represents any piece of code/statement that you may execute in order to read or update data, or to make changes to the storage system. it's a catch-all term for all the crud operations you may perform on a storage system at any level (schema, table, object)
let us now construct a pseudo memory interface
memory = new Memory()
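to make the rest of the walkthrough concrete, here is a minimal python sketch of what that interface could look like. the class, method names, and signatures are assumptions layered on the pseudocode above, not a prescribed design.

from typing import Any, Callable

class Memory:
    """pseudo memory interface; everything here is illustrative."""

    def __init__(self, storage: Any = None, llm: Callable[[str], str] | None = None):
        self.storage = storage  # e.g. a sql connection, a nosql client, or a blob store
        self.llm = llm          # any prompt-in / text-out callable, used from abstraction 2 onwards

    def execute_query(self, query: str) -> Any:
        """abstraction 1: run a fully specified query against the storage system."""
        ...

    def generate_query(self, requirement: str) -> list[str]:
        """abstraction 2: derive the query (or queries) needed to fulfil a requirement."""
        ...

    def generate_requirement(self, vague_intent: str) -> str:
        """abstraction 3: turn an ambiguous intent into a concrete requirement."""
        ...

    def execute(self, intent: str) -> Any:
        """minimal exposure: run the whole pipeline internally."""
        ...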
1. exact query known
in this scenario you know the exact query that needs to be run. it's completely deterministic.
memory.execute_query(_query_)
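as a concrete (and assumed) example, backing the memory system with a local sqlite file, abstraction 1 is just plain query execution:

import sqlite3

class SqliteMemory:
    """sketch of abstraction 1, assuming sqlite as the storage system."""

    def __init__(self, db_path: str = "memory.db"):
        self.conn = sqlite3.connect(db_path)

    def execute_query(self, query: str, params: tuple = ()):
        # the caller supplies the exact statement; nothing is derived or guessed
        cursor = self.conn.execute(query, params)
        self.conn.commit()
        return cursor.fetchall()

memory = SqliteMemory()
memory.execute_query("CREATE TABLE IF NOT EXISTS preferences (user_id TEXT, key TEXT, value TEXT)")
memory.execute_query("INSERT INTO preferences VALUES (?, ?, ?)", ("u1", "tone", "concise"))
memory.execute_query("SELECT key, value FROM preferences WHERE user_id = ?", ("u1",))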
2. user intent (read vs update) known
in this scenario you only know the requirement. a requirement here is ideally a combination of user input/intent and knowledge about the existing system. the exact query will have to be derived and then executed.
query = memory.generate_query(_requirement_)
memory.execute_query(_query_)
we may choose to expose only an 'execute' function, which would run both steps internally
memory.execute(_requirement_)
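one way this derive-then-execute flow could look, written as standalone functions for brevity. the llm callable, the schema hint, and the prompt wording are all assumptions; in practice these would live on the memory class:

from typing import Callable

LLM = Callable[[str], str]  # any prompt-in / text-out callable; the provider doesn't matter here
SCHEMA_HINT = "tables: preferences(user_id TEXT, key TEXT, value TEXT)"

def generate_query(requirement: str, llm: LLM, schema: str = SCHEMA_HINT) -> str:
    """derive the exact query from the requirement plus knowledge about the existing system."""
    prompt = (
        f"schema:\n{schema}\n"
        f"requirement: {requirement}\n"
        "return a single sql statement that fulfils the requirement, and nothing else."
    )
    return llm(prompt).strip()

def execute(requirement: str, memory, llm: LLM):
    """minimal exposure: derive the query internally, then run it."""
    return memory.execute_query(generate_query(requirement, llm))

# e.g. execute("fetch every stored preference for user u1", memory, llm=some_llm_callable)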
3. ambiguous intent known
in this scenario you have bits of data and some semblance of intent. you have knowledge about the existing system, but you are not sure about the user intent. it could be a read intent for existing data, or an intent involving data that does not exist yet; in the latter case the data will have to be generated.
requirement = memory.generate_requirement(_vague_intent_)
List<query> queries = memory.generate_query(_requirement_)
# because the intent is vague we may end up with a list of queries that'll have to be executed in order to fulfil the action
for query in queries:
    memory.execute_query(_query_)
we may choose to expose a minimal function called execute, which runs all the steps internally
memory.execute(_vague_intent_)
this minimal exposure is the ideal state, because the developer doesn't have to worry about anything. it is also the one most prone to error, because every step is guided by llms and the errors from those steps compound.
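a sketch of what that full pipeline could look like, reusing the same assumed llm callable and schema hint from the previous sketch; the prompts and the one-statement-per-line convention are illustrative only:

from typing import Callable

LLM = Callable[[str], str]
SCHEMA_HINT = "tables: preferences(user_id TEXT, key TEXT, value TEXT)"

def generate_requirement(vague_intent: str, llm: LLM, schema: str = SCHEMA_HINT) -> str:
    """firm up an ambiguous intent into a concrete requirement."""
    prompt = (
        f"schema:\n{schema}\n"
        f"vague intent: {vague_intent}\n"
        "restate this as a concrete requirement: what to read, update, or create."
    )
    return llm(prompt).strip()

def generate_queries(requirement: str, llm: LLM, schema: str = SCHEMA_HINT) -> list[str]:
    """a vague intent may fan out into several queries, e.g. create a table and then insert into it."""
    prompt = (
        f"schema:\n{schema}\n"
        f"requirement: {requirement}\n"
        "return one sql statement per line, in execution order, and nothing else."
    )
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]

def execute(vague_intent: str, memory, llm: LLM):
    """minimal exposure: every step here is llm-guided, so an error in any step compounds downstream."""
    requirement = generate_requirement(vague_intent, llm)
    for query in generate_queries(requirement, llm):
        memory.execute_query(query)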