<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jonhnson Nakano</title>
    <description>The latest articles on DEV Community by Jonhnson Nakano (@jnakano).</description>
    <link>https://dev.to/jnakano</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3944226%2Fed109216-03bf-49a7-b944-c02f3dd1eb98.png</url>
      <title>DEV Community: Jonhnson Nakano</title>
      <link>https://dev.to/jnakano</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jnakano"/>
    <language>en</language>
    <item>
      <title>Stop Passing Files Between Agents With Local Paths</title>
      <dc:creator>Jonhnson Nakano</dc:creator>
      <pubDate>Wed, 17 Jun 2026 16:16:18 +0000</pubDate>
      <link>https://dev.to/jnakano/stop-passing-files-between-agents-with-local-paths-oop</link>
      <guid>https://dev.to/jnakano/stop-passing-files-between-agents-with-local-paths-oop</guid>
      <description>&lt;p&gt;I got very used to running agents locally.&lt;/p&gt;

&lt;p&gt;The workflow was simple: run the agent, let it write outputs into my filesystem, then inspect everything in an ./outputs folder.&lt;/p&gt;

&lt;p&gt;Markdown reports, JSON files, screenshots, charts — whatever the agent produced, it was right there.&lt;/p&gt;

&lt;p&gt;Then I deployed it.&lt;/p&gt;

&lt;p&gt;Same agent, same logic. But now the "output" lived in a container filesystem that vanished the second the task finished. A retry wrote &lt;code&gt;report_20260313_103042.pdf&lt;/code&gt; next to &lt;code&gt;report_20260313_103041.pdf&lt;/code&gt;, so the same report now existed twice. And when I wanted to share it with someone, I had no link to send, only a path on a machine they could not reach.&lt;/p&gt;

&lt;p&gt;Nothing about the agent had changed.&lt;/p&gt;

&lt;p&gt;Everything about the environment had.&lt;/p&gt;

&lt;p&gt;If you build agents that produce files (reports, datasets, images, JSON dumps), you've probably hit this gap.&lt;/p&gt;

&lt;p&gt;Local development hides it.&lt;/p&gt;

&lt;p&gt;Production hands it to you on day one.&lt;/p&gt;




&lt;h2&gt;
  
  
  The local version
&lt;/h2&gt;

&lt;p&gt;On your machine, persisting agent output is trivial:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;out_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./outputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;
    &lt;span class="n"&gt;out_dir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mkdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;out_dir&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Write bytes, get a path, move on.&lt;/p&gt;

&lt;p&gt;You can list the directory, cat the file, open the PDF, or hand the path to the next script in your pipeline.&lt;/p&gt;

&lt;p&gt;For one person running one agent on one laptop, this is perfectly fine.&lt;/p&gt;

&lt;p&gt;The problem is not local development.&lt;/p&gt;

&lt;p&gt;The problem is mistaking “it works on my laptop” for “I have a storage layer.”&lt;/p&gt;

&lt;h2&gt;
  
  
  The production version
&lt;/h2&gt;

&lt;p&gt;Production agents do not get a reliable ./outputs/ folder.&lt;/p&gt;

&lt;p&gt;They run in environments where the filesystem is temporary, isolated, or both.&lt;/p&gt;

&lt;p&gt;Serverless functions may give you /tmp, but it is scoped to the execution environment and often limited in size. Containers lose local state when they restart. Background workers, queues, and orchestrators can run each task on a different machine.&lt;/p&gt;

&lt;p&gt;And retries are not an edge case. They are part of the system.&lt;/p&gt;

&lt;p&gt;Your orchestrator will eventually rerun a failed step, and now you have the same logical output produced twice.&lt;/p&gt;

&lt;p&gt;Then there is the human in the loop.&lt;/p&gt;

&lt;p&gt;Agents produce things people actually need to read: compliance PDFs, analysis summaries, generated slides, CSV exports, charts, screenshots, debug bundles.&lt;/p&gt;

&lt;p&gt;Those people do not have SSH access to your worker node.&lt;/p&gt;

&lt;p&gt;They need a link, not a filepath on a machine they will never see.&lt;/p&gt;

&lt;p&gt;So the production checklist starts looking very different from local dev: &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Local&lt;/th&gt;
&lt;th&gt;Production&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;path.write_bytes()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Upload to durable object storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;./outputs/run_42/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Queryable grouping by run/session&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"It's in the repo"&lt;/td&gt;
&lt;td&gt;Stable ID retrievable from any machine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You remember the filename&lt;/td&gt;
&lt;td&gt;Idempotent retries that don't duplicate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Files live forever&lt;/td&gt;
&lt;td&gt;TTL / lifecycle rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You Slack the file manually&lt;/td&gt;
&lt;td&gt;Shareable download URL with expiry&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I have talked to a few teams that hit the same wall.&lt;/p&gt;

&lt;p&gt;The agent logic is done.&lt;/p&gt;

&lt;p&gt;Now the artifact plumbing begins.&lt;/p&gt;




&lt;h2&gt;
  
  
  Files vs artifacts
&lt;/h2&gt;

&lt;p&gt;Here's the distinction that changed how I think about this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A file is bytes at a path.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;An artifact is a file plus context.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That context is what makes the output usable after the agent is done.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which run produced it: session_id, pipeline run, batch job&lt;/li&gt;
&lt;li&gt;which agent produced it: agent_id, stage, model version&lt;/li&gt;
&lt;li&gt;what it actually is: content type, size, custom metadata&lt;/li&gt;
&lt;li&gt;when it should expire: TTL or lifecycle rules&lt;/li&gt;
&lt;li&gt;how to retrieve it later without knowing where the bytes live: a stable ID&lt;/li&gt;
&lt;li&gt;how to share it with someone outside your infra: a temporary download link&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A PDF sitting on disk is a file.&lt;/p&gt;

&lt;p&gt;A PDF tagged with session_id=pipeline_run_42, agent_id=report-writer, model=claude-sonnet-4, retrievable as art_2xk9f7v3m1p0, and set to expire in 30 days?&lt;/p&gt;

&lt;p&gt;That is an artifact.&lt;/p&gt;

&lt;p&gt;Your agent may still produce files.&lt;/p&gt;

&lt;p&gt;But downstream agents, debug tools, production workflows, and the humans waiting in Slack all need artifacts.&lt;/p&gt;
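&lt;p&gt;To make the distinction concrete, here is a minimal sketch of what an artifact record could look like in Python. The field names mirror the list above, but the &lt;code&gt;Artifact&lt;/code&gt; class, the &lt;code&gt;storage_key&lt;/code&gt; field, and the example values are assumptions for illustration, not a fixed schema.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Artifact:
    """A file plus the context needed to find, trust, and expire it."""
    id: str                 # stable ID, independent of where the bytes live
    storage_key: str        # hypothetical pointer into object storage
    content_type: str
    size_bytes: int
    session_id: str         # which run produced it
    agent_id: str           # which agent produced it
    expires_at: datetime    # when it should be cleaned up
    metadata: dict = field(default_factory=dict)


now = datetime.now(timezone.utc)
report = Artifact(
    id="art_2xk9f7v3m1p0",
    storage_key="tenant-a/abc123/pdf",   # made-up key for the example
    content_type="application/pdf",
    size_bytes=48_213,
    session_id="pipeline_run_42",
    agent_id="report-writer",
    expires_at=now + timedelta(days=30),
    metadata={"model": "claude-sonnet-4"},
)
print(report.id, report.session_id, report.expires_at.date())
```

&lt;p&gt;Nothing in the record says where the bytes physically live beyond one opaque key, which is the point: consumers work with the ID and the context, not the path.&lt;/p&gt;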




&lt;h2&gt;
  
  
  The metadata layer you eventually build
&lt;/h2&gt;

&lt;p&gt;Most teams do not start by building an artifact store. They start with S3 (or R2, or GCS) and a slowly growing feeling that object keys aren't enough.&lt;/p&gt;

&lt;p&gt;The pattern I keep seeing, including in our own user research, goes like this.&lt;/p&gt;

&lt;p&gt;First, put the bytes in object storage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;BUCKET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-agent-outputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;local_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
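    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;local_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# close the handle even if the read fails&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;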
    &lt;span class="n"&gt;content_hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;ext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;local_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rsplit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;content_hash&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ext&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you realize the object key is not enough.&lt;/p&gt;

&lt;p&gt;You need to know which run produced the file, which agent created it, what kind of output it is, when it should expire, and how to find it later.&lt;/p&gt;

&lt;p&gt;So you add a metadata table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;artifacts&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt;            &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;tenant_id&lt;/span&gt;     &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;filename&lt;/span&gt;      &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;content_type&lt;/span&gt;  &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;size_bytes&lt;/span&gt;    &lt;span class="nb"&gt;bigint&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;content_hash&lt;/span&gt;  &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;session_id&lt;/span&gt;    &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;agent_id&lt;/span&gt;      &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;metadata&lt;/span&gt;      &lt;span class="n"&gt;jsonb&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'{}'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;expires_at&lt;/span&gt;    &lt;span class="n"&gt;timestamptz&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt;    &lt;span class="n"&gt;timestamptz&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="n"&gt;deleted_at&lt;/span&gt;    &lt;span class="n"&gt;timestamptz&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_artifacts_session&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;artifacts&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;deleted_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then wrap it in an API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
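&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mimetypes&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_artifact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;current_tenant&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;content_hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# key is "{tenant_id}/{content_hash}/{ext}"&lt;/span&gt;
    &lt;span class="n"&gt;content_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mimetypes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;guess_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/octet-stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;artifact_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;art_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;generate_id&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        INSERT INTO artifacts
          (id, tenant_id, filename, content_type, size_bytes, content_hash,
           session_id, agent_id, metadata, expires_at)
        VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, now() + interval &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;30 days&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;artifact_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getsize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;content_hash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{})),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;artifact_id&lt;/span&gt;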
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Congratulations, you're on your way to building an artifact store. &lt;/p&gt;

&lt;p&gt;Then the other 80% shows up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Idempotency keys for retry-safe uploads&lt;/li&gt;
&lt;li&gt;Content-hash deduplication so retries don't double your storage bill&lt;/li&gt;
&lt;li&gt;Presigned upload/download URLs with correct expiry semantics&lt;/li&gt;
&lt;li&gt;Session sealing so a zombie sub-agent can't append to a "finished" run&lt;/li&gt;
&lt;li&gt;Public download links for humans, with their own expiry model&lt;/li&gt;
&lt;li&gt;Soft delete plus garbage collection&lt;/li&gt;
&lt;li&gt;Usage metering and quota enforcement&lt;/li&gt;
&lt;li&gt;Metadata filtering that's actually queryable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I've watched engineers spend time building this type of wrapper and still not get dedup, TTL, or session semantics right.&lt;/p&gt;

&lt;p&gt;This is not a knock on those teams. It is necessary plumbing. But necessary plumbing is still plumbing, and most teams should be spending that time on their product, not rebuilding agent infrastructure.&lt;/p&gt;




&lt;h2&gt;
  
  
  What an artifact layer should handle
&lt;/h2&gt;

&lt;p&gt;If you are deciding whether to build this yourself or use a purpose-built layer, this is the basic checklist I would use.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run / session grouping
&lt;/h3&gt;

&lt;p&gt;You need to answer one question quickly:&lt;/p&gt;

&lt;p&gt;What did this pipeline run produce?&lt;/p&gt;

&lt;p&gt;Not grep logs.&lt;br&gt;
Not list an S3 prefix and hope the naming convention held.&lt;/p&gt;

&lt;p&gt;One query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;artifacta &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;--session&lt;/span&gt; pipeline_run_42
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A session should be whatever your orchestrator already uses: pipeline_run_42, daily_batch_20260313, customer_report_8841.&lt;/p&gt;

&lt;p&gt;It should not require a separate “create session” step just to group outputs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent provenance
&lt;/h3&gt;

&lt;p&gt;When a report looks wrong three weeks later, you need to know what produced it.&lt;/p&gt;

&lt;p&gt;Which agent?&lt;br&gt;
Which model?&lt;br&gt;
Which stage of the workflow?&lt;/p&gt;

&lt;p&gt;That means agent_id and metadata should be captured at upload time, not buried in logs you hope still exist.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analysis.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pipeline_run_42&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summarizer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Metadata
&lt;/h3&gt;

&lt;p&gt;Object storage metadata is not enough.&lt;/p&gt;

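&lt;p&gt;Headers are limited (S3 caps user-defined metadata at 2 KB per object), awkward to query (list calls do not return them, so filtering means a HEAD request per object), and easy to make inconsistent across a pipeline.&lt;/p&gt;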

&lt;p&gt;You want structured metadata stored with the artifact record and filterable when listing artifacts.&lt;/p&gt;
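&lt;p&gt;As a rough illustration of "filterable when listing", here is a small in-memory sketch. The &lt;code&gt;list_artifacts&lt;/code&gt; helper and the record shape are hypothetical; a real store would push the same filter down into its database query.&lt;/p&gt;

```python
def list_artifacts(records, session_id, **filters):
    """Return records from one session whose metadata matches every filter."""
    return [
        r for r in records
        if r["session_id"] == session_id
        and all(r["metadata"].get(k) == v for k, v in filters.items())
    ]


# Made-up records for the example.
records = [
    {"id": "art_1", "session_id": "pipeline_run_42",
     "metadata": {"stage": "draft", "model": "claude-sonnet-4"}},
    {"id": "art_2", "session_id": "pipeline_run_42",
     "metadata": {"stage": "final", "model": "claude-sonnet-4"}},
    {"id": "art_3", "session_id": "pipeline_run_43",
     "metadata": {"stage": "final", "model": "claude-sonnet-4"}},
]

finals = list_artifacts(records, "pipeline_run_42", stage="final")
print([r["id"] for r in finals])  # ['art_2']
```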

&lt;h3&gt;
  
  
  Deduplication
&lt;/h3&gt;

&lt;p&gt;Agent systems usually need two forms of deduplication:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Content-hash deduplication: same bytes, one stored blob.&lt;/li&gt;
&lt;li&gt;Idempotency keys: same retried operation, same response.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These solve different problems.&lt;/p&gt;

&lt;p&gt;Content hashing prevents duplicate storage. Idempotency prevents a retry from creating a second logical artifact.&lt;/p&gt;

&lt;p&gt;Conflating the two is a common bug in homegrown wrappers.&lt;/p&gt;
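&lt;p&gt;Here is a minimal in-memory sketch of how the two mechanisms can sit side by side. The &lt;code&gt;ArtifactStore&lt;/code&gt; class and its &lt;code&gt;push&lt;/code&gt; signature are assumptions made up for this example, not any particular library's API.&lt;/p&gt;

```python
import hashlib
import itertools


class ArtifactStore:
    """In-memory sketch: blobs are keyed by content hash, artifacts by ID."""

    def __init__(self):
        self.blobs = {}          # content_hash -> bytes (one copy per distinct content)
        self.artifacts = {}      # artifact_id -> record
        self.idempotency = {}    # idempotency_key -> artifact_id
        self._ids = itertools.count(1)

    def push(self, data, filename, idempotency_key=None):
        # Idempotency: a retried operation returns the artifact it already created.
        if idempotency_key is not None and idempotency_key in self.idempotency:
            return self.idempotency[idempotency_key]

        # Content-hash dedup: identical bytes are stored once.
        content_hash = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(content_hash, data)

        artifact_id = f"art_{next(self._ids)}"
        self.artifacts[artifact_id] = {"filename": filename, "content_hash": content_hash}
        if idempotency_key is not None:
            self.idempotency[idempotency_key] = artifact_id
        return artifact_id


store = ArtifactStore()
first = store.push(b"report body", "summary.pdf", idempotency_key="run42:summary")
retry = store.push(b"report body", "summary.pdf", idempotency_key="run42:summary")
copy = store.push(b"report body", "summary-copy.pdf")  # same bytes, new logical artifact

print(first == retry)        # True: the retry did not create a second artifact
print(len(store.artifacts))  # 2: the original and the deliberate copy
print(len(store.blobs))      # 1: both artifacts share one stored blob
```

&lt;p&gt;The deliberate copy is the case that a hash-only design gets wrong: the bytes are identical, but it is a distinct logical artifact and should get its own ID.&lt;/p&gt;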

&lt;h3&gt;
  
  
  TTL
&lt;/h3&gt;

&lt;p&gt;Artifacts should expire by default.&lt;/p&gt;

&lt;p&gt;An experiment, batch run, or debug file should not live forever because nobody remembered to clean it up.&lt;/p&gt;

&lt;p&gt;Storage lifecycle rules help, but they usually operate at the bucket or prefix level. They do not understand your artifact metadata, which makes per-artifact expiration harder than it should be.&lt;/p&gt;
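&lt;p&gt;Per-artifact expiry usually ends up as a periodic sweep over the metadata records. A minimal sketch, assuming records carry &lt;code&gt;expires_at&lt;/code&gt; and &lt;code&gt;deleted_at&lt;/code&gt; fields like the table earlier in this post; the &lt;code&gt;sweep_expired&lt;/code&gt; function is hypothetical, and removing the underlying bytes is left to a separate garbage-collection pass.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone


def sweep_expired(records, now):
    """Soft-delete every live record whose expires_at has passed; return their IDs."""
    swept = []
    for record in records:
        if record["deleted_at"] is None and now >= record["expires_at"]:
            record["deleted_at"] = now
            swept.append(record["id"])
    return swept


now = datetime(2026, 3, 13, tzinfo=timezone.utc)
records = [
    {"id": "art_debug", "expires_at": now - timedelta(days=1), "deleted_at": None},
    {"id": "art_report", "expires_at": now + timedelta(days=29), "deleted_at": None},
]

print(sweep_expired(records, now))  # ['art_debug']
print(sweep_expired(records, now))  # []: already soft-deleted
```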

&lt;h3&gt;
  
  
  Download links
&lt;/h3&gt;

&lt;p&gt;Humans need a link, not a file path.&lt;/p&gt;

&lt;p&gt;A good artifact layer should make it easy to create a stable download URL with configurable expiry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;https://dl.example.com/lnk_...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That link should be separate from your internal storage details and easy to share with a teammate, customer, or workflow step.&lt;/p&gt;
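&lt;p&gt;One way to get an expiring link without exposing storage details is to sign the artifact ID and an expiry time with HMAC. This is only a sketch under stated assumptions: the token layout, &lt;code&gt;SECRET&lt;/code&gt;, &lt;code&gt;make_link&lt;/code&gt;, and &lt;code&gt;resolve_link&lt;/code&gt; are all invented for illustration, and a real service might instead store an opaque link ID in a table.&lt;/p&gt;

```python
import base64
import hashlib
import hmac
import time

SECRET = b"replace-with-a-real-secret"   # hypothetical signing key
BASE_URL = "https://dl.example.com"      # placeholder host


def _sign(payload):
    digest = hmac.new(SECRET, payload.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode().rstrip("=")


def make_link(artifact_id, ttl_seconds, now=None):
    """Build a link whose token carries the artifact ID, an expiry, and a signature."""
    expires = int(now if now is not None else time.time()) + ttl_seconds
    payload = f"{artifact_id}.{expires}"
    return f"{BASE_URL}/lnk_{payload}.{_sign(payload)}"


def resolve_link(url, now=None):
    """Return the artifact ID if the link is authentic and unexpired, else None."""
    token = url.rsplit("/lnk_", 1)[-1]
    try:
        artifact_id, expires, signature = token.rsplit(".", 2)
    except ValueError:
        return None
    expected = _sign(f"{artifact_id}.{expires}")
    # Constant-time comparison; the signature is checked before the expiry is parsed.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    current = now if now is not None else time.time()
    if current >= int(expires):
        return None
    return artifact_id


link = make_link("art_2xk9f7v3m1p0", ttl_seconds=3600, now=1_000_000)
print(resolve_link(link, now=1_000_100))        # art_2xk9f7v3m1p0
print(resolve_link(link, now=1_003_600))        # None: expired
print(resolve_link(link + "x", now=1_000_100))  # None: signature mismatch
```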

&lt;h3&gt;
  
  
  Retrieval by ID / session
&lt;/h3&gt;

&lt;p&gt;Downstream agents should not coordinate through shared filesystem paths.&lt;/p&gt;

&lt;p&gt;Agent A pushes an artifact and gets an ID. Agent B pulls by ID, or lists the session and filters by metadata.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ARTIFACTA_SESSION_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"pipeline_run_42"&lt;/span&gt;

python extract.py     &lt;span class="c"&gt;# pushes CSV&lt;/span&gt;
python analyze.py     &lt;span class="c"&gt;# lists session, pulls CSV, pushes report&lt;/span&gt;
python notify.py      &lt;span class="c"&gt;# creates download link for the human&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Session sealing matters too.&lt;/p&gt;

&lt;p&gt;Once a run is finalized, late uploads should fail clearly instead of silently corrupting the run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;409 Session &lt;span class="s1"&gt;'pipeline_run_42'&lt;/span&gt; is sealed. No new artifacts can be added.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
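&lt;p&gt;A minimal sketch of sealing, assuming an in-memory registry: sessions are created implicitly on the first push, and any push after &lt;code&gt;seal&lt;/code&gt; raises an error that an API layer could map to a 409. The &lt;code&gt;SessionRegistry&lt;/code&gt; and &lt;code&gt;SessionSealedError&lt;/code&gt; names are hypothetical.&lt;/p&gt;

```python
class SessionSealedError(Exception):
    """Raised when an upload targets a finalized session."""


class SessionRegistry:
    def __init__(self):
        self._artifacts = {}   # session_id -> list of artifact names
        self._sealed = set()

    def push(self, session_id, name):
        if session_id in self._sealed:
            raise SessionSealedError(
                f"Session '{session_id}' is sealed. No new artifacts can be added."
            )
        # No separate "create session" step: the first push creates it.
        self._artifacts.setdefault(session_id, []).append(name)

    def seal(self, session_id):
        self._sealed.add(session_id)

    def ls(self, session_id):
        return list(self._artifacts.get(session_id, []))


registry = SessionRegistry()
registry.push("pipeline_run_42", "extract.csv")
registry.push("pipeline_run_42", "report.pdf")
registry.seal("pipeline_run_42")

try:
    registry.push("pipeline_run_42", "late-output.json")  # a late worker
except SessionSealedError as err:
    print(f"409 {err}")

print(registry.ls("pipeline_run_42"))  # ['extract.csv', 'report.pdf']
```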






&lt;h2&gt;
  
  
  A small tool I'm building for this
&lt;/h2&gt;

&lt;p&gt;I'm building &lt;a href="https://artifacta.io" rel="noopener noreferrer"&gt;Artifacta&lt;/a&gt;, an artifact store purpose-built for AI agents.&lt;/p&gt;

&lt;p&gt;It is not an orchestrator, search engine, or agent framework. It is the layer between your agent and object storage: session-aware, queryable artifact storage with a CLI, MCP, Python SDK, and REST API.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;artifacta-cli
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ARTIFACTA_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"ak_live_..."&lt;/span&gt;

artifacta push report.pdf &lt;span class="nt"&gt;--session&lt;/span&gt; earnings-q4-2025 &lt;span class="nt"&gt;--agent&lt;/span&gt; report-writer
artifacta &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;--session&lt;/span&gt; earnings-q4-2025
artifacta &lt;span class="nb"&gt;link &lt;/span&gt;art_2xk9f7v3m1p0   &lt;span class="c"&gt;# share with a human&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or from Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;artifacta&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;artifact&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;earnings-q4-2025&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;artifact&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# art_2xk9f7v3m1p0
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I’m sharing it because this is a problem I keep seeing in agent workflows, even if Artifacta is not the solution every team chooses.&lt;/p&gt;




&lt;h2&gt;
  
  
  Questions for other builders
&lt;/h2&gt;

&lt;p&gt;I’m curious how other teams handle this today:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Where do your agent outputs go in production?&lt;/li&gt;
&lt;li&gt;How do you group outputs by run?&lt;/li&gt;
&lt;li&gt;What broke first: retries, deduplication, human sharing, or cleanup?&lt;/li&gt;
&lt;li&gt;Do you finalize runs, or can late workers still write outputs?&lt;/li&gt;
&lt;li&gt;Is a dedicated artifact layer useful, or is “S3 + Postgres + a thin wrapper” the right tradeoff?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Drop your setup in the comments. I’m especially interested in approaches that are not just object storage plus glue code.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
