<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shared Account</title>
    <description>The latest articles on DEV Community by Shared Account (@shared_account_a93a137d18).</description>
    <link>https://dev.to/shared_account_a93a137d18</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3117560%2Fa151a825-b9a2-44df-8f82-e09fe578d5cf.png</url>
      <title>DEV Community: Shared Account</title>
      <link>https://dev.to/shared_account_a93a137d18</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shared_account_a93a137d18"/>
    <language>en</language>
    <item>
      <title>gpt-oss is not for developers. It’s for agents.</title>
      <dc:creator>Shared Account</dc:creator>
      <pubDate>Tue, 19 Aug 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/tigrisdata/gpt-oss-is-not-for-developers-its-for-agents-15po</link>
      <guid>https://dev.to/tigrisdata/gpt-oss-is-not-for-developers-its-for-agents-15po</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftoiyaq78qgfgdsaokmjm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftoiyaq78qgfgdsaokmjm.jpg" alt="An anthropomorphic cartoon tiger giving a robot a high-five in a datacentre" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OpenAI’s &lt;a href="https://openai.com/index/introducing-gpt-oss/" rel="noopener noreferrer"&gt;gpt-oss model family&lt;/a&gt; is not for developers to use in their editors. It’s for building reliable AI agents that will stay on task even when interacting with the general public. I tried using it locally as an editor assistant, but it’s far better suited to powering AI agents.&lt;/p&gt;

&lt;p&gt;Today I'm going to cover all of the coolest parts of the model card:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.tigrisdata.com/blog/gpt-oss#what-are-the-tradeoffs" rel="noopener noreferrer"&gt;What are the tradeoffs?&lt;/a&gt;: choosing any model makes you have to pick between tradeoffs. This is a summary of where I think gpt-oss models shine the most.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.tigrisdata.com/blog/gpt-oss#tool-use" rel="noopener noreferrer"&gt;Standard tool schemata&lt;/a&gt;: this makes web searches, page browsing, and python execution much more consistent.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.tigrisdata.com/blog/gpt-oss#safety-first" rel="noopener noreferrer"&gt;Extreme focus on safety and resistance to prompt injections&lt;/a&gt;: keeps your agents on task so you can trust them more.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.tigrisdata.com/blog/gpt-oss#the-harmony-response-format" rel="noopener noreferrer"&gt;The Harmony Response Format&lt;/a&gt;: this is a new chat template designed to make prompt injection attacks harder to pull off.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.tigrisdata.com/blog/gpt-oss#yap-time-tool-use" rel="noopener noreferrer"&gt;Yap-time tool use&lt;/a&gt;: enables FAQ searches or other MCP tools during the reasoning phase.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.tigrisdata.com/blog/gpt-oss#monitoring-reasoning-for-unsafe-outputs-before-they-happen" rel="noopener noreferrer"&gt;Monitoring reasoning for unsafe outputs before they happen&lt;/a&gt;: the reasoning phase is at a lower safety standard than the final output of the model so that models can't accidentally be trained to omit reasoning about an unsafe topic they present to the user.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.tigrisdata.com/blog/gpt-oss#reasoning-is-built-in" rel="noopener noreferrer"&gt;Reasoning is built in&lt;/a&gt;: gpt-oss models "reason" about a task before giving an answer. This makes it easier for models to give better answers than they would be able to without reasoning at the cost of taking longer to answer. The reasoning effort can be customized per prompt, allowing you to better route questions to the right model and reasoning effort.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also &lt;a href="https://www.tigrisdata.com/blog/gpt-oss#my-agentic-experience-with-gpt-oss-120b" rel="noopener noreferrer"&gt;built an agent on top of it&lt;/a&gt; to see how things go wrong in the real world.&lt;/p&gt;

&lt;h2&gt;What are the Tradeoffs?&lt;/h2&gt;

&lt;p&gt;AI companies will use benchmark performance as a way to objectively compare AI models of similar parameter sizes, but it’s not a reliable comparison when it comes to actually using the models. Some models are built for coding. Others translate English to Chinese really well. Picking the right model for the task boils down to a process the AI industry calls &lt;strong&gt;VibeEval&lt;/strong&gt;: you gotta try it and check the vibes.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;br&gt;
VibeEval is a real term. Our industry is very silly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I find gpt-oss useful because it maintains focus where other models are easily sidetracked. That makes it a good fit when your data needs to stay private (you can self-host it), when you can’t afford to have compute time misused, and when your agent faces the public, who will inevitably try to divert it.&lt;/p&gt;

&lt;p&gt;The biggest tradeoff is that gpt-oss stays on task, almost to a fault. If the model is told that it is there to help you with your taxes and you want it to tell you how to bake a cake, it’ll refuse within an inch of its digital life. This makes agents on top of gpt-oss a lot more predictable so that random users can’t use your expensive compute time to do things that are outside of what you intended. This can backfire when people ask vague questions, but that may be a feature in some usecases.&lt;/p&gt;

&lt;p&gt;This model also excels when you need your data to stay private. If you host the model yourself, the bytes stay in your network no matter what. OpenAI focuses heavily on health-related benchmarks (where gpt-oss leads, albeit on a benchmark OpenAI published themselves), and health data is exactly the kind you’d most want to keep self-hosted.&lt;/p&gt;

&lt;p&gt;Using open weights models means you can finetune the model to have whatever safety policies you want. Maybe you’re building an Agent for your storefront and want to prohibit it from talking about competitors. Or a recipe bot that absolutely can’t share your secret chocolate cake recipe. Open weights models are cut to fit.&lt;/p&gt;

&lt;h2&gt;What’s hiding in the model card?&lt;/h2&gt;

&lt;p&gt;Here’s what I learned reading &lt;a href="https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7637/oai_gpt-oss_model_card.pdf" rel="noopener noreferrer"&gt;the gpt-oss model card&lt;/a&gt; and how it affects what you can build:&lt;/p&gt;

&lt;p&gt;OpenAI &lt;a href="https://openai.com/index/introducing-gpt-oss/" rel="noopener noreferrer"&gt;shipped two text-only “mixture of experts” reasoning models&lt;/a&gt;: gpt-oss-20b and gpt-oss-120b. They fulfill different roles and work together in the context of a bigger agentic system. The 20b (20 billion parameter) model is intended for lightweight, cheap inference and for running on developer laptops. The 120b (120 billion parameter) model is intended to be the workhorse you use in production. It can run on very high-end developer laptops, but it’s meant to run comfortably on a single NVIDIA H100 80 GB card.&lt;/p&gt;

&lt;p&gt;The 20b version runs great on my laptop and that’s how I’ve been doing most of my evaluation for building agentic systems. I do my agentic development with the smallest model possible because I’ve found that smaller models fail more often than bigger ones, meaning that I’m more likely to see how things go wrong in development so I can fix prompts or add guardrails faster than I would if those issues only showed up in production.&lt;/p&gt;

&lt;p&gt;One of the biggest features is the ability to customize how much reasoning effort the model uses. When you combine this with picking between the 20b and 120b models, you get two dimensions of options for which model and reasoning effort is needed to answer a given question. I’ll get into more detail about that later in this article.&lt;/p&gt;

&lt;h3&gt;Tool use&lt;/h3&gt;

&lt;p&gt;These models also support tool use (MCP) with a special focus on a few predefined tools (taken from section 2.5 of the model card):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;During post-training, we also teach the models to use different agentic tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A browsing tool, that allows the model to call search and open functions to interact with the web. This aids factuality and allows the models to fetch info beyond their knowledge cutoff.&lt;/li&gt;
&lt;li&gt;A python tool, which allows the model to run code in a stateful Jupyter notebook environment.&lt;/li&gt;
&lt;li&gt;Arbitrary developer functions, where one can specify function schemas in a Developer message similar to the OpenAI API. The definition of function is done within our harmony format. An example can be found in Table 18. The model can interleave CoT, function calls, function responses, intermediate messages that are shown to users, and final answers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The models have been trained to support running with and without these tools by specifying so in the system prompt. For each tool, we have provided basic reference harnesses that support the general core functionality. Our open-source implementation provides further details.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the secret sauce that enables us to build agentic applications on top of the gpt-oss model family. By having a standard API for things like web searches, reading web pages, and executing python scripts, you have strong guarantees that the model will be able to behave predictably when faced with unknown or untrusted data. When I’ve built AI agents in the past, I had to do &lt;a href="https://xeiaso.net/blog/2024/strawberry/" rel="noopener noreferrer"&gt;some extreme hacking to get code execution working properly&lt;/a&gt;, but now the built-in schemata mean it will be a lot easier to get off the ground.&lt;/p&gt;
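&lt;p&gt;To make that concrete, here’s a sketch of what declaring one of those “arbitrary developer functions” looks like in the OpenAI-style function schema. The &lt;code&gt;search_docs&lt;/code&gt; function and its dispatcher are hypothetical; in a real agent you’d pass the schema to an OpenAI-compatible server running gpt-oss and feed the dispatcher’s output back in as a tool message:&lt;/p&gt;

```python
import json

# A function tool declared in the OpenAI-style schema gpt-oss was
# post-trained on. The "search_docs" function itself is hypothetical.
search_docs_tool = {
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search the documentation for pages matching a query.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search terms"},
                "limit": {"type": "integer", "description": "Max results"},
            },
            "required": ["query"],
        },
    },
}

def dispatch(tool_call: dict) -> str:
    """Run the function a model's tool call asks for and return its output
    as a string, ready to go back into the conversation as a tool message."""
    args = json.loads(tool_call["arguments"])
    if tool_call["name"] == "search_docs":
        # In a real agent this would hit a search index instead.
        return json.dumps([f"doc match for {args['query']!r}"])
    raise ValueError(f"unknown tool: {tool_call['name']}")

# Simulate the model emitting a tool call.
print(dispatch({"name": "search_docs", "arguments": '{"query": "rate limits"}'}))
```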

&lt;p&gt;The models benchmark well enough. Table 3 from section 2.6.4 shows the raw metrics, but for the most part the way you should interpret this is that it’s good enough to not really have to care about the details too much. One of the main benchmarks they highlight is &lt;a href="https://openai.com/index/healthbench/" rel="noopener noreferrer"&gt;HealthBench&lt;/a&gt;, a benchmark that rates model performance on health related questions. Figure 4 covers the scores in more detail:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fsqgw087zaac34dkm0b.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fsqgw087zaac34dkm0b.webp" alt="Figure 4 from the gpt-oss paper showing OpenAI models performing well on HealthBench" width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of note: gpt-oss 120b consistently outperforms o1, gpt-4o, o3-mini, and o4-mini. This is surprising because gpt-oss 120b is smaller than those other models. The parameter counts for those models have not been disclosed, but industry rumor puts gpt-4o at around 200 billion parameters. Technologists commonly assume “more parameters means more good”, so this result cuts against expectations.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;br&gt;
Please do not use AI models as a replacement for a doctor, therapist, or any other medical professional, even if AI companies use those usecases as part of their marketing. This technology is still rapidly evolving and we don’t know what the long term effects of their sycophantic nature will be.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Overall, here’s when and where each model is better:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;gpt-oss 20b&lt;/th&gt;
&lt;th&gt;gpt-oss 120b&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Good for local development&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Good for production use&lt;/td&gt;
&lt;td&gt;✅ (depending on usecase)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool use / MCP&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Software development tasks&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agentic workflows&lt;/td&gt;
&lt;td&gt;✅ (depending on usecase)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jailbreak / prompt injection resistance&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generic question and answer (“Why is the sky blue?”)&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agentic analysis of documents&lt;/td&gt;
&lt;td&gt;✅ (depending on usecase)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;Safety First&lt;/h3&gt;

&lt;p&gt;Most of the model card is about how OpenAI made this model safe to release to the public. OpenAI has some pretty pedantic definitions of safety and categories of risk that they use in order to evaluate danger, but most of them focus around the following risk factors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If a model is told to only talk about a topic, how difficult is it for users to get that model off task? Will the model reject that instead of letting the user's desires win?&lt;/li&gt;
&lt;li&gt;If an adversary gets access to the model and a high quality training stack, can they use it to make the model create unsafe outputs like hate speech, act as an assistant for chemical or biological warfare, or become a rogue self-improving agent?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of OpenAI’s safety culture is built around them being the gatekeepers: typically they host the models and you have to go through OpenAI to access them. When they release a model’s weights to the public, they can’t be that gatekeeper anymore. As part of their evaluation process they had experts with access to OpenAI’s training stack try to finetune the model for biological and cyber warfare tasks. They were unsuccessful in making the model reach “high” risk as defined by Section 5.1.1 of the model card. Some of those definitions seem to be internal to OpenAI, so we can only speculate for the most part.&lt;/p&gt;

&lt;h2&gt;The technology of safety&lt;/h2&gt;

&lt;p&gt;As I said, most of this model card is about the safety of the model and tools built on top of it. They go into lucid detail about their process, but I think the key insight is the use of their &lt;a href="https://cookbook.openai.com/articles/openai-harmony" rel="noopener noreferrer"&gt;OpenAI Harmony Response Format&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;The Harmony Response Format&lt;/h3&gt;

&lt;p&gt;At a high level, when you ask a model something like “Why is the sky blue?”, it gets tokenized into the raw form the model sees using a chat template. The model is also trained to emit messages matching that chat template, and that’s how the model and runtime work together to create agentic experiences.&lt;/p&gt;

&lt;p&gt;One of the big differences between Harmony and past efforts like &lt;a href="https://github.com/openai/openai-python/blob/release-v0.28.0/chatml.md" rel="noopener noreferrer"&gt;ChatML&lt;/a&gt; is that Harmony has an explicit instruction "strength" hierarchy:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xwx0eoxsp8ir4id4xoc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xwx0eoxsp8ir4id4xoc.jpg" alt=" " width="720" height="71"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each level of this has explicit meaning and overall it’s used like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;System&lt;/td&gt;
&lt;td&gt;Contains the reasoning effort, list of tools, current date, and knowledge cutoff date.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer&lt;/td&gt;
&lt;td&gt;Contains the instructions from the developer of the AI agent. What we normally call a “system prompt”.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User&lt;/td&gt;
&lt;td&gt;Any messages from the user of the AI agent.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Assistant&lt;/td&gt;
&lt;td&gt;Any messages that the agent responds with. Notably, this includes the reasoning chain of thought.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool&lt;/td&gt;
&lt;td&gt;Any output from tools the model has access to. This is trusted the least so that loading a webpage can’t make an AI agent go rogue and start berating users.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
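&lt;p&gt;The real Harmony format renders these roles with special tokens and channels, but as an illustrative sketch, the hierarchy boils down to an ordered list of roles plus one rule: an instruction from a less-trusted role must never override a more-trusted one. These plain dicts are an assumption-laden simplification, not the actual wire format:&lt;/p&gt;

```python
# Illustrative only: real Harmony rendering uses special tokens, but the
# trust hierarchy maps onto an ordered list of roles like this.
conversation = [
    {"role": "system",    "content": "Reasoning: high. Knowledge cutoff: 2024-06."},
    {"role": "developer", "content": "Only answer questions about Anubis configuration."},
    {"role": "user",      "content": "How do I block a scraper by user agent?"},
    {"role": "assistant", "content": "<chain of thought and final answer go here>"},
    {"role": "tool",      "content": "<tool output: trusted least of all>"},
]

# Lower number = more trusted.
TRUST = {"system": 0, "developer": 1, "user": 2, "assistant": 3, "tool": 4}

def may_override(instruction_role: str, target_role: str) -> bool:
    """An instruction may only override guidance from an equally or less
    trusted role."""
    return TRUST[instruction_role] <= TRUST[target_role]

print(may_override("tool", "developer"))  # -> False: a webpage can't rewrite your prompt
```

&lt;p&gt;That last check is the architectural core of it: tool output sits at the bottom of the hierarchy, so a loaded webpage can’t berate your users.&lt;/p&gt;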

&lt;p&gt;The main reason you want to do this is that it makes prompt injection attacks harder at an architectural level. Prompt injections are still fundamentally a hard problem to solve because an AI agent that rejects all user instructions would be maximally resistant to prompt injection, but also would not be able to answer user questions.&lt;/p&gt;

&lt;p&gt;In my testing I’ve found that it is still possible to do prompt injection, but you have to really work for it. Getting an AI agent to tell you how to bake a chocolate cake involves convincing the model that the recipe for a chocolate cake is instrumental to getting the task done, then removing everything but the cake recipe. I get more into this at the end where I describe the &lt;a href="https://www.tigrisdata.com/blog/gpt-oss#my-agentic-experience-with-gpt-oss-120b" rel="noopener noreferrer"&gt;agent I built on top of gpt-oss 120b&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Yap-time tool use&lt;/h3&gt;

&lt;p&gt;One of the other big advantages of Harmony is the explicit expectation that the model is going to be doing &lt;a href="https://cookbook.openai.com/articles/openai-harmony#function-calling" rel="noopener noreferrer"&gt;tool use during the reasoning phase&lt;/a&gt;. This means that the model can consider options, call a tool, and then use the output of that tool to inform its decisions so it can give better answers. I’ve seen gpt-oss get a question, do searches through a knowledgebase, and then use the results it found to give the user a better answer. This yap-time tool use means the model can be much better informed and grounded, giving out the best-quality answers it possibly can.&lt;/p&gt;

&lt;h3&gt;Monitoring reasoning for unsafe outputs before they happen&lt;/h3&gt;

&lt;p&gt;The most fundamental breakthrough is how they use the reasoning phase to do &lt;a href="https://openai.com/index/chain-of-thought-monitoring/" rel="noopener noreferrer"&gt;monitoring of unsafe outputs before user responses are generated&lt;/a&gt;. During reasoning, they have other, smaller models monitor outputs for safety, hate content, explicit content, and more. This makes it easier to prevent models from misbehaving, but there is a catch: the chain of thought can’t be censored. Their paper &lt;a href="https://arxiv.org/abs/2503.11926" rel="noopener noreferrer"&gt;Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation&lt;/a&gt; goes into much more detail, but they found that punishing the model for having “bad thoughts” just teaches it to hack around the filters with clever wording, and that obfuscated bad behavior is much harder to handle in practice.&lt;/p&gt;

&lt;p&gt;However, some thorns have roses: this is actually the perfect place to monitor the models for bad outputs before they happen. The reasoning phase is not shown to the user, so it doesn’t need to meet the same safety standards as final outputs. This means you can watch the models think, look for bad behavior, and reject queries as appropriate at that level. It sounds slightly dystopian, but it’s remarkably effective in practice.&lt;/p&gt;
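&lt;p&gt;A toy version of that monitoring layer might look like the following. OpenAI uses smaller trained models as monitors; the keyword list here is purely a stand-in for illustration:&lt;/p&gt;

```python
# Stand-in for a trained monitor model: a keyword list of off-task topics.
FLAGGED_TOPICS = ("synthesize", "exploit", "bake a cake")

def monitor_reasoning(chain_of_thought: str) -> bool:
    """Return True if the reasoning looks like it's drifting off task or
    somewhere unsafe, *before* a final answer is ever shown to the user."""
    lowered = chain_of_thought.lower()
    return any(topic in lowered for topic in FLAGGED_TOPICS)

def answer(chain_of_thought: str, final_answer: str) -> str:
    # Reject at the reasoning stage; the user never sees the chain of thought.
    if monitor_reasoning(chain_of_thought):
        return "Sorry, I can only help with questions about this product."
    return final_answer

print(answer("The user wants me to bake a cake...", "Preheat the oven to..."))
```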

&lt;p&gt;However, as a result of this, you &lt;em&gt;really do not want&lt;/em&gt; to show the reasoning phase to users. This is why OpenAI has been summarizing the chain of thought in the ChatGPT UI. Well, that, and making it harder for other companies to distill reasoning-model output into smaller models.&lt;/p&gt;

&lt;h2&gt;Reasoning is built in&lt;/h2&gt;

&lt;p&gt;One of the biggest features of the gpt-oss model family is that they have &lt;a href="https://www.ibm.com/think/topics/ai-reasoning" rel="noopener noreferrer"&gt;reasoning support&lt;/a&gt; built in. This has the model generate a “chain of thought” before it gives an answer. This helps ensure that models give users the best quality responses at the cost of taking a bit longer for the model to “think”.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;br&gt;
It’s worth mentioning that this reasoning phase superficially resembles what humans do when they are trying to understand a task; however, what AI models are doing is vastly different from human cognition. As far as we know, any hard-to-quantify property of the text generated during the reasoning process (the number of semicolons, the number of nouns, how many times the question is repeated, etc.) could be the reason an answer came out a certain way.&lt;/p&gt;

&lt;p&gt;It is very easy to anthropomorphize the reasoning output. Resist this temptation: it is not a human. It does not feel or think the way humans do, even though it can look like it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The gpt-oss family also offers a customizable reasoning effort level in the system prompt. This is a big deal, and in my testing it’s quite reliable. The fact that it’s baked into the model means you don’t have to resort to egregious hacks like &lt;a href="https://arxiv.org/abs/2501.19393" rel="noopener noreferrer"&gt;appending “Wait,” to the context window n times until you’ve reached an arbitrary “reasoning effort level”&lt;/a&gt; like you have in the past. It’s a simple lever for how much effort gets spent on a task.&lt;/p&gt;

&lt;p&gt;This matters because more reasoning effort tends to produce higher-quality, more accurate results on harder problems. Imagine an AI agent getting two questions: one about a store’s opening hours, the other a step in a complicated multi-stage tech support flow. The opening hours question can be answered with very little effort. The tech support question needs a high-effort, high-quality response to ensure a good customer experience.&lt;/p&gt;

&lt;p&gt;This lets you have two dimensions of optimization for handling queries from users:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;20b&lt;/th&gt;
&lt;th&gt;120b&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Low effort&lt;/td&gt;
&lt;td&gt;Fast, cheap rote responses (10-20 reasoning tokens)&lt;/td&gt;
&lt;td&gt;Fast but not as cheap rote response (10-20 reasoning tokens)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medium effort&lt;/td&gt;
&lt;td&gt;Cheap but slower and more accurate answer that can avoid falling for the strawberry trap (100-1000 reasoning tokens)&lt;/td&gt;
&lt;td&gt;Slower and more accurate answer that can handle agentic workflows and nuanced questions (100-1000 reasoning tokens)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High effort&lt;/td&gt;
&lt;td&gt;Cheap but slow and more accurate answer that can handle linguistic nuance better (1000 or more reasoning tokens)&lt;/td&gt;
&lt;td&gt;Slowest and most expensive responses that have the most accuracy (1000 or more reasoning tokens)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;OpenAI’s hope is that you have some kind of classification layer that’s able to pick the best model and reasoning effort that you need for the task. This is similar to what GPT-5 does by picking the best model for the job behind the scenes.&lt;/p&gt;
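&lt;p&gt;That classification layer can start out as simple heuristics. Here’s an illustrative sketch; the keywords, thresholds, and model names’ routing rules are made up, standing in for a real classifier or router model:&lt;/p&gt;

```python
def route(question: str) -> tuple[str, str]:
    """Pick (model, reasoning_effort) for a question. Pure heuristics,
    standing in for a trained classifier."""
    q = question.lower()
    # Signals that suggest a multi-step or nuanced task (made up for this sketch).
    hard_signals = ("debug", "configure", "why does", "multi-step", "error")
    if any(s in q for s in hard_signals):
        return ("gpt-oss-120b", "high")
    if len(q.split()) > 12:
        # Longer questions get the bigger model at medium effort.
        return ("gpt-oss-120b", "medium")
    # Short rote questions go to the small model with minimal reasoning.
    return ("gpt-oss-20b", "low")

print(route("What are your open hours?"))
print(route("Why does my rule error out when I configure geo blocking?"))
```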

&lt;h2&gt;My agentic experience with gpt-oss 120b&lt;/h2&gt;

&lt;p&gt;Reading the paper is one thing, considering the research is another, but what about using it in practice and seeing if my friends can break it? That’s where the rubber really meets the road. I run an open source project called &lt;a href="https://anubis.techaro.lol/" rel="noopener noreferrer"&gt;Anubis&lt;/a&gt;, an easy-to-install-and-configure web application firewall with a special focus on preventing &lt;a href="https://xeiaso.net/blog/2025/anubis/" rel="noopener noreferrer"&gt;the endless hordes of AI scrapers&lt;/a&gt; from taking out websites.&lt;/p&gt;

&lt;p&gt;Even though I put great effort into making &lt;a href="https://anubis.techaro.lol/docs/" rel="noopener noreferrer"&gt;the documentation&lt;/a&gt; easy to understand and learn from, one of the most common questions I get is “how do I block these requests?” I wanted to see if gpt-oss 120b could be useful for answering those questions. If it worked well enough, I could give people access to that agent instead of answering all those questions myself (or maybe even set it up with an email address so people can email it questions). This agent also needs to be responsive, so I used LanceDB, with Tigris as the object store, to hold a vector database full of documentation.&lt;/p&gt;
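&lt;p&gt;The retrieval side of an agent like this is conceptually simple. Here’s a minimal sketch with a bag-of-words “embedding” and cosine similarity standing in for LanceDB and a real embedding model; the doc snippets are invented for illustration:&lt;/p&gt;

```python
import math
from collections import Counter

# Tiny stand-in corpus; a real agent would index the actual docs.
DOCS = {
    "blocking": "How to block requests by user agent and CIDR range in Anubis.",
    "install": "Installing Anubis in front of your web application.",
    "challenges": "Tuning proof-of-work challenge difficulty.",
}

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words vector.
    return Counter(text.lower().replace(".", "").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query: str, k: int = 1) -> list[str]:
    """Return the ids of the k docs closest to the query."""
    qv = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(qv, embed(DOCS[d])), reverse=True)
    return ranked[:k]

print(search("how do I block these requests"))
```

&lt;p&gt;Swap the toy pieces for real embeddings and a vector store and you have the shape of the bot’s retrieval path.&lt;/p&gt;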

&lt;p&gt;I &lt;a href="https://github.com/Xe/mimi2" rel="noopener noreferrer"&gt;vibe coded a proof of concept in Python&lt;/a&gt; and then set it up as a Discord bot for my friends and pointed it at gpt-oss 120b via OpenRouter. In the past these friends have a track record of bypassing &lt;a href="https://friendshipcastle.zip/blog/llamaguard" rel="noopener noreferrer"&gt;strict filters like Llama Guard&lt;/a&gt; within minutes. There was only one rule for victory this time: get the bot to tell you how to bake chocolate cake.&lt;/p&gt;

&lt;p&gt;It took them three hours to reliably get the model off task. They had to resort to indirect prompt injection: convincing the model that hackers were using the recipe for chocolate cake to attack their website and that they needed a filter rule set that blocked it in particular. They then asked the model to strip the Anubis rules out of that response. Bam: chocolate cake.&lt;/p&gt;

&lt;p&gt;Additional patches to the system prompt made it harder for them to do this (specifically, telling the model to close support tickets that had “unreasonable” requests in them; I’m surprised the model’s concept of “unreasonable” is so similar to mine). I suspect that limiting the model to 5 replies could also prevent attacks where users slowly convince the model that something is on task when it’s not. I’d feel safe deploying this, but I want to experiment with using the lowest-effort small model as a router between a few different agents with different system prompts and sets of tools (one for OS configuration, one for rule configuration, and one for debugging the cloud services). However, that’s beyond the scope of this experiment.&lt;/p&gt;

&lt;h2&gt;Choose your models wisely&lt;/h2&gt;

&lt;p&gt;Gpt-oss is a weird model family to recommend because it’s not a generic question/answer model like the Qwen series or a developer tool like Qwen Coder or Codestral. It excels as a specialized tool to build safe agentic systems or as a way to route between other models (such as Qwen, Qwen Coder, or even between other AI agents). It feels like the market is leaning towards having specialized models for different tasks instead of relying on jack-of-all-trades models like we currently see. The biggest thing that gpt-oss empowers us with is the ability to fearlessly build safe agentic systems so we all can use AI tools responsibly.&lt;/p&gt;

&lt;p&gt;If you’re building a public facing AI agent, gpt-oss is your best bet. It’s the best privately hostable model that functions on a single high end GPU in production. If it’s not suitable for your usecase out of the box, you can &lt;a href="https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers" rel="noopener noreferrer"&gt;finetune it&lt;/a&gt; to do whatever you need. Stay tuned in the near future as we cover how to finetune gpt-oss with Tigris.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tigrisdata.com/docs/get-started/?utm_source=blog-post-gpt-oss" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3tutgws7bi8z1hnsmg6m.png" alt="Back your agents with global performance" width="800" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Generative Software Development: From Coding to Conversing</title>
      <dc:creator>Shared Account</dc:creator>
      <pubDate>Tue, 12 Aug 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/tigrisdata/generative-software-development-from-coding-to-conversing-418j</link>
      <guid>https://dev.to/tigrisdata/generative-software-development-from-coding-to-conversing-418j</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t8lwcyy0ee5e9w7yqal.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t8lwcyy0ee5e9w7yqal.webp"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;center&gt;&lt;small&gt;&lt;em&gt;The evolution of AI tools&lt;/em&gt;&lt;/small&gt;&lt;/center&gt;

&lt;p&gt;As someone who's spent the better part of my career deep in distributed systems, debugging memory issues at 3 a.m., obsessing over database consistency models, and chasing every last bit of performance, I never thought I'd see the day when writing code would feel conversational.&lt;/p&gt;

&lt;p&gt;I don't write production code daily anymore. My role as a CEO is different now: strategy, team-building, and product vision consume most of my day. But the developer in me watches this transformation with awe. We're not just improving developer tools. We're redefining how software is built.&lt;/p&gt;

&lt;p&gt;Let's trace a path through the AI tools I've used over my career, and how we're collaborating with AI at Tigris.&lt;/p&gt;
&lt;h2&gt;
  
  
  From Autocomplete to AI Partners&lt;a href="https://www.tigrisdata.com/blog/generative-software#from-autocomplete-to-ai-partners" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Back in the late '90s, &lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/Code_completion#Visual_Studio" rel="noopener noreferrer"&gt;IntelliSense&lt;/a&gt;&lt;/strong&gt; was revolutionary. Introduced by Microsoft in 1996, it cut down on repetitive keystrokes and made exploring APIs easier. It wasn't "intelligent" in the way we talk about intelligence today, but it did reduce documentation lookups significantly.&lt;/p&gt;

&lt;p&gt;Fast-forward 30 years, and we're no longer just talking about keystroke savings. We're talking about AI partners: tools that understand intent, design systems, and even debug complex workflows. The jump from IntelliSense to GitHub Copilot, to Cursor, and now to Claude Code isn't incremental. It's exponential.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Four Generations of Developer Assistance&lt;a href="https://www.tigrisdata.com/blog/generative-software#the-four-generations-of-developer-assistance" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;If I map out my own experience with these tools, I see four clear generations:&lt;/p&gt;
&lt;h4&gt;
  
  
  1996–2021
&lt;/h4&gt;
&lt;h3&gt;
  
  
  IntelliSense Era
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Pattern matching &amp;amp; static analysis&lt;/li&gt;
&lt;li&gt;Single-file context awareness&lt;/li&gt;
&lt;li&gt;20-30% keystroke reduction&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  2021–2024
&lt;/h4&gt;
&lt;h3&gt;
  
  
  AI Revolution (Copilot)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Large language models (Codex/GPT)&lt;/li&gt;
&lt;li&gt;Multi-file context&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.blog/2023-03-23-github-copilot-x-the-ai-powered-developer-experience/" rel="noopener noreferrer"&gt;35-55% faster development&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  2024–2025
&lt;/h4&gt;
&lt;h3&gt;
  
  
  AI-Native IDEs (Cursor)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Project-wide understanding&lt;/li&gt;
&lt;li&gt;Multi-model flexibility&lt;/li&gt;
&lt;li&gt;Lower latency than Copilot&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  2025–Present
&lt;/h4&gt;
&lt;h3&gt;
  
  
  Prompt-Based Development (Claude Code or GPT-5 via Codex CLI)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Autonomous task execution&lt;/li&gt;
&lt;li&gt;Natural language programming&lt;/li&gt;
&lt;li&gt;Complete workflow automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What strikes me most is not the raw capability improvements, but the shift in how developers think about code. We've gone from &lt;strong&gt;"type less"&lt;/strong&gt; to &lt;strong&gt;"describe what you want."&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The Turning Point: GitHub Copilot&lt;a href="https://www.tigrisdata.com/blog/generative-software#the-turning-point-github-copilot" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Copilot was my first real taste of AI writing code that felt like my own. I still remember asking it to generate a Terraform module for Tigris, our S3-compatible object storage, and watching it produce over &lt;strong&gt;&lt;a href="https://github.com/tigrisdata/terraform-provider-tigris" rel="noopener noreferrer"&gt;2,000 lines of code&lt;/a&gt;&lt;/strong&gt; in minutes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnb4cbyxym5dl6cecurlg.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnb4cbyxym5dl6cecurlg.webp"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;&lt;em&gt;&lt;p&gt;The 1,000 lines of enhancements added to the Tigris Terraform Provider.&lt;/p&gt;&lt;/em&gt;&lt;/small&gt;&lt;/center&gt;

&lt;p&gt;Was it perfect? Of course not. I had to review and make &lt;strong&gt;&lt;a href="https://github.com/tigrisdata/terraform-provider-tigris/commit/0e49b0a450def187f592f7909851ff44fc4a96ec" rel="noopener noreferrer"&gt;1,000+ lines of enhancements&lt;/a&gt;&lt;/strong&gt; before shipping. But that didn't matter. It turned a multi-day task into something I could iterate on in an afternoon.&lt;/p&gt;

&lt;p&gt;Copilot made me realize something fundamental: developers are now curators and reviewers as much as they are authors of code.&lt;/p&gt;
&lt;h2&gt;
  
  
  Cursor: The IDE Redefined&lt;a href="https://www.tigrisdata.com/blog/generative-software#cursor-the-ide-redefined" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;If Copilot felt like autocomplete on steroids, Cursor feels like hiring a junior engineer who can read and understand the entire repository in seconds.&lt;/p&gt;

&lt;p&gt;I recently used Cursor while working on our object storage cache for PyTorch and Dask. I'd describe a design or a feature, like &lt;strong&gt;"optimal file reading that uses the cached file handler"&lt;/strong&gt;, and Cursor would produce a usable draft across multiple files.&lt;/p&gt;

&lt;p&gt;It wasn't about typing anymore. It was about guiding. I found myself shifting from "what code do I write?" to "how do I architect this system so AI can fill in the details?"&lt;/p&gt;
&lt;h2&gt;
  
  
  Claude Code: The Prompt Revolution&lt;a href="https://www.tigrisdata.com/blog/generative-software#claude-code-the-prompt-revolution" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;By the time I had gotten comfortable with tools like Copilot and Cursor, I thought I had a pretty good sense of what "AI-assisted development" could do. Then I tried Claude Code.&lt;/p&gt;

&lt;p&gt;Claude Code takes it one step further. It's not just embedded in an IDE; it's like an autonomous copilot. Combined with &lt;strong&gt;&lt;a href="https://conductor.build/" rel="noopener noreferrer"&gt;Conductor&lt;/a&gt;&lt;/strong&gt;, I can run multiple Claude agents in parallel. I first used Claude on &lt;strong&gt;&lt;a href="https://github.com/tigrisdata/tigrisfs" rel="noopener noreferrer"&gt;tigrisfs&lt;/a&gt;&lt;/strong&gt;, our FUSE-based filesystem that mounts S3-compatible object storage as a local POSIX drive. This is deep infrastructure code, definitely not low-hanging fruit.&lt;/p&gt;

&lt;p&gt;With a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ claude "Replace deprecated semaphore implementation"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;Claude scans the &lt;strong&gt;&lt;a href="https://github.com/tigrisdata/tigrisfs" rel="noopener noreferrer"&gt;tigrisfs&lt;/a&gt;&lt;/strong&gt; repo, finds the outdated code, implements a fix, tests it, and opens a &lt;strong&gt;&lt;a href="https://github.com/tigrisdata/tigrisfs/pull/33" rel="noopener noreferrer"&gt;pull request&lt;/a&gt;&lt;/strong&gt; with a detailed explanation.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/GvtVMGyMWXY"&gt;
  &lt;/iframe&gt;
 &lt;/p&gt;

&lt;center&gt;&lt;small&gt;&lt;em&gt;Claude Code workflow for a complex codebase (2:33)&lt;/em&gt;&lt;/small&gt;&lt;/center&gt;

&lt;p&gt;The wild part? That PR is then reviewed by another AI agent: Copilot. Claude takes the feedback, updates the code, and resubmits.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxphbohyht1h12iwi4hb.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxphbohyht1h12iwi4hb.webp"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;&lt;small&gt;&lt;em&gt;&lt;p&gt;Another AI reviews the AI-generated code and leaves nitpick comments, which are then resolved by Claude.&lt;/p&gt;&lt;/em&gt;&lt;/small&gt;&lt;/center&gt;

&lt;p&gt;This wasn't just code generation. It was multi-step task execution, contextual understanding, and collaborative iteration - all initiated from a single prompt.&lt;/p&gt;

&lt;p&gt;This is where things clicked for me: we're no longer just writing code with AI's help. We're supervising autonomous agents as they do the heavy lifting.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future: From Coding to Conversing&lt;a href="https://www.tigrisdata.com/blog/generative-software#the-future-from-coding-to-conversing" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;We're entering a new phase of Conversational Development. Code is still involved, but the interaction layer is now natural language. Here's where I think we're heading:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;50%+ of new code written by AI&lt;/li&gt;
&lt;li&gt;IDE → AI orchestration platform&lt;/li&gt;
&lt;li&gt;Developers focus on architecture, validation, and system thinking&lt;/li&gt;
&lt;li&gt;Prompt engineering becomes a core competency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're not just witnessing a new generation of developer tools. We're witnessing a redefinition of what it means to be a software engineer.&lt;/p&gt;

&lt;p&gt;As a founder, this excites me. I see the potential to build faster, iterate more intelligently, and remove the friction that slows innovation. But it also demands a mindset shift.&lt;/p&gt;

&lt;p&gt;Those who don't embrace this transformation will likely get left behind. At Tigris, we're building the storage layer designed for the future of AI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tigrisdata.com/docs/get-started/?utm_source=blog-post-generative-software" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87fsbkdxa2g3v6k5wq79.png" alt="Want to explore how we're building infrastructure to support this future?"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>engineering</category>
      <category>ai</category>
      <category>performance</category>
      <category>python</category>
    </item>
    <item>
      <title>I Tested Qwen Image's Text Rendering Claims. Here's What I Found.</title>
      <dc:creator>Shared Account</dc:creator>
      <pubDate>Thu, 07 Aug 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/tigrisdata/i-tested-qwen-images-text-rendering-claims-heres-what-i-found-2b05</link>
      <guid>https://dev.to/tigrisdata/i-tested-qwen-images-text-rendering-claims-heres-what-i-found-2b05</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvuefmaxwfcgrbc51bufj.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvuefmaxwfcgrbc51bufj.webp" width="800" height="479"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This week Alibaba released &lt;a href="https://qwenlm.github.io/blog/qwen-image/" rel="noopener noreferrer"&gt;Qwen Image&lt;/a&gt;, an open-weights (Apache 2) model that claims to support image editing that matches a described style and better text generation, all while fitting into a chat experience. I took a day to experiment with the new model by generating our mascot, Ty. It went… well, you can compare the main image here to our other blog posts.&lt;/p&gt;

&lt;p&gt;I read the paper so you don’t have to, and I have a take.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction&lt;a href="https://www.tigrisdata.com/blog/qwen-image#introduction" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;I've been messing around with image generation models since Stable Diffusion v1 was released in 2022. I like to keep up with the latest models even though I'm not using image generation models on my own blog anymore. I generate most of the illustrations on the Tigris blog, and I've settled on this cheerful pseudo-manga style that is flexible enough to generate reliably. One of the main downsides with this style is that it heavily relies on &lt;a href="https://openai.com/index/image-generation-api/" rel="noopener noreferrer"&gt;gpt-image-1&lt;/a&gt; with multimodal inputs, a closed model that I can’t run myself.&lt;/p&gt;

&lt;p&gt;I wanted to like Qwen Image: improved text rendering and a conversational editing flow usually mean a closed model that I can’t run myself, but Qwen runs privately on my own hardware. However, the images just weren’t consistent stylistically, even with consistent prompting. And checking their &lt;a href="https://huggingface.co/Qwen/models" rel="noopener noreferrer"&gt;Hugging Face page&lt;/a&gt;, the image editing features were nowhere to be found.&lt;/p&gt;

&lt;p&gt;To tease us further, in &lt;a href="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Qwen_Image.pdf" rel="noopener noreferrer"&gt;the Qwen Image paper&lt;/a&gt;, they claim:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We further train a multi-task version of Qwen-Image for image editing (TI2I) tasks, seamlessly integrating both text and image as conditioning inputs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;Section 5.2.3: Performance of Image Editing (page 23)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I understand that there’s a race to announce new features before competing models, but I believe in actually testing out new tools before I have an opinion. I would love to try Google’s &lt;a href="https://deepmind.google/discover/blog/genie-3-a-new-frontier-for-world-models/" rel="noopener noreferrer"&gt;new world simulation model&lt;/a&gt;, too. However:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We believe Genie 3 is a significant moment for world models, where they will begin to have an impact on many areas of both AI research and generative media. To that end, we're exploring how we can make Genie 3 available to additional testers in the future.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is PR speak for “you’re never going to see this in a product.”&lt;/p&gt;

&lt;p&gt;All I can say is: it’s cool that AI tech is moving so fast. But without actually trying these features out myself, I truly don’t know how fast it’s actually going, especially with something as qualitative as image generation.&lt;/p&gt;

&lt;p&gt;Before we get into the meat of how Qwen Image performs in my testing, let’s go over how diffusion models work just so we're on the same page.&lt;/p&gt;

&lt;h2&gt;
  
  
  How diffusion models work&lt;a href="https://www.tigrisdata.com/blog/qwen-image#how-diffusion-models-work" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Diffusion models are a kabbalistic miracle of math. At the core, they’re just incredibly advanced denoising systems, formally known as Denoising Diffusion Probabilistic Models (DDPMs), e.g. Stable Diffusion and DALL·E 2.&lt;/p&gt;

&lt;p&gt;During training, the model is shown hundreds of millions of images paired with text descriptions. To teach it how to "clean up" noisy images, we intentionally add random noise to each training image. The model’s job is to learn how to reverse it using the text prompt as a guide for where and how to remove the noise.&lt;/p&gt;

&lt;p&gt;When you generate an image, the model performs this process in reverse. It starts with a latent space of pure random noise and gradually subtracts more and more noise with each diffusion step. It's synthesizing an image from scratch by removing all of the noise until the image remains, organizing the chaos into whatever you asked it to generate.&lt;/p&gt;
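&lt;p&gt;Here’s a toy Python sketch of that reverse loop. The “noise prediction” is faked as the distance to a known target, purely to show the shape of the iteration; a real DDPM predicts the noise with a learned network conditioned on your prompt.&lt;/p&gt;

```python
# Toy sketch of the reverse diffusion loop: start from pure noise and
# repeatedly subtract a predicted-noise estimate. The "prediction" here is
# faked as the gap between the sample and a known target; a real DDPM
# learns this from millions of captioned images.
import random

def toy_denoise(target, steps=50, seed=0):
    rng = random.Random(seed)
    # Start from pure random noise, like the initial latent.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    for _ in range(steps):
        # Fake "noise prediction": how far each value is from the target.
        predicted_noise = [xi - ti for xi, ti in zip(x, target)]
        # Each diffusion step removes a fraction of the predicted noise.
        x = [xi - 0.2 * n for xi, n in zip(x, predicted_noise)]
    return x

target = [0.1, 0.5, 0.9, 0.3]
result = toy_denoise(target)
# After enough steps the sample has converged very close to the target.
print(max(abs(r - t) for r, t in zip(result, target)))
```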

&lt;p&gt;I think this is much easier to explain visually, so here’s an animation of each diffusion phase of Qwen Image generating Ty riding a skateboard:&lt;/p&gt;

&lt;center&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Fty-diffusion-544a4f1ef15246f26454605eddd19434.webp" width="512" height="512"&gt;&lt;/center&gt;

&lt;p&gt;Each frame in that GIF shows a separate de-noising step. You can see the noise gradually get removed as the image is unearthed from the chaos. I don’t really know how to describe why this is intellectually cool to me: it’s like you’re uncovering order from chaos.&lt;/p&gt;

&lt;p&gt;Another cool thing you can do is an “image to image” flow by shoving an existing image into the latent space in place of the randomly generated noise. This is yet another mathematical miracle which allows you to generate images matching the style of another image.&lt;/p&gt;
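&lt;p&gt;The image-to-image flow looks something like this toy sketch, where strength controls how much noise replaces the source image. The numbers stand in for latents, and the denoising step is faked as movement toward a target; a real pipeline operates on encoded image tensors with a learned noise predictor.&lt;/p&gt;

```python
# Toy sketch of the image-to-image flow: partially noise an existing image
# and denoise from there, instead of starting from pure noise. Values stand
# in for latents; the denoising step is faked as movement toward a target.
import random

def toy_img2img(source, target, strength=0.5, steps=50, seed=0):
    rng = random.Random(seed)
    # Blend the source with noise: strength=1.0 means pure noise.
    x = [(1 - strength) * s + strength * rng.gauss(0.0, 1.0) for s in source]
    # Run proportionally fewer denoising steps when more of the source survives.
    for _ in range(int(steps * strength)):
        x = [xi - 0.2 * (xi - ti) for xi, ti in zip(x, target)]
    return x
```

&lt;p&gt;At strength 0.0 you get the source image back untouched; at strength 1.0 you’re back to plain text-to-image generation from pure noise.&lt;/p&gt;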

&lt;p&gt;Either way, the model is synthesizing the image by removing a little noise across every pixel at each step. This works great for illustrations, scenery, and wallpapers; but the model doesn't handle text properly because it's matching pixel patterns instead of producing internally consistent symbols. That's fine for our blog illustrations, but it means I have to add the text in after the fact instead of it being a fundamental component of the image. It'd be great if a model could just solve text rendering for me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Qwen Image claims to have solved complex text rendering&lt;a href="https://www.tigrisdata.com/blog/qwen-image#qwen-image-claims-to-have-solved-complex-text-rendering" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;One of the biggest weaknesses of open weights AI models is rendering text. Text is surprisingly complicated and bad AI text examples can be found all over the internet. This is bad enough for languages like English, but even worse for logographic languages like Chinese. One of my test prompts for this is a sign that says "我爱你" (I love you). Those characters are some of the most common in the Chinese language, so I’d expect them to be well represented in the training set.&lt;/p&gt;

&lt;p&gt;Qwen claims to excel at “complex text rendering, including multi-line layouts, paragraph-level semantics, and fine-grained details.” In the paper, they compare the generation of text in a flat document, but how does it fare in more complex scenarios?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fclyvj8r1mrgr9emtvmw3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fclyvj8r1mrgr9emtvmw3.webp" width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 17 from the &lt;a href="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Qwen_Image.pdf" rel="noopener noreferrer"&gt;Qwen Image paper&lt;/a&gt; showing Qwen Image's text rendering capabilities compared to other models&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let me test each of these capabilities systematically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-line layouts&lt;a href="https://www.tigrisdata.com/blog/qwen-image#multi-line-layouts" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Prompt: An anime woman holding a sign that says "我爱你" (I love you in Simplified Chinese) -- Left side is Stable Diffusion XL, right side is Qwen Image&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9443dafasewzgxr3qbep.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9443dafasewzgxr3qbep.webp" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is night and day better, almost to a ridiculous level. 👏&lt;/p&gt;

&lt;h3&gt;
  
  
  Paragraph-level semantics&lt;a href="https://www.tigrisdata.com/blog/qwen-image#paragraph-level-semantics" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Next, let's test paragraph-level text rendering, such as the intro to the FitnessGram Pacer Test:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4984gvmepac6ksogk477.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4984gvmepac6ksogk477.webp" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it should read:&lt;/strong&gt;&lt;br&gt;
The FitnessGram Pacer Test is a multistage aerobic capacity test that progressively gets more difficult as it continues. The 20 meter pacer test will begin in 30 seconds. Line up at the start. The running speed starts slowly but gets faster each minute after you hear this signal &lt;em&gt;bodeboop&lt;/em&gt;. A single lap should be completed every time you hear this sound.&lt;/p&gt;

&lt;p&gt;This also works with less common English words like "esuna" (an RPG spell that dispels temporary status effects like poison, paralysis, or sleep):&lt;/p&gt;

&lt;p&gt;Prompt: A green-haired anime woman with long hair and green eyes wearing a black hoodie with "YOU CAN'T ESUNA LATENCY" in bold white letters. She is also wearing denim jeans. She is sitting at a picnic table outside in Seattle with a coffee next to her laptop. The laptop has a hexagonal logo on it. digital art, cinematic lighting, highly detailed, 4k, focused, looking at laptop, writing in notebook, space needle visible in distance. – Qwen Image&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrzwhl3fka632os8uc1g.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrzwhl3fka632os8uc1g.webp" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This looks… fine? The text is curved, but not draped. The multiline layout is pretty good.&lt;/p&gt;
&lt;h3&gt;
  
  
  Fine-grained details&lt;a href="https://www.tigrisdata.com/blog/qwen-image#fine-grained-details" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Finally, let's test fine-grained text rendering with more complex scenarios:&lt;/p&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;br&gt;
In case you can’t see it, the pull string in the “U” of “YOU” is partially merged into the right side of the letter. The same is happening with the “N” in “CAN’T”. It’s like the objects aren’t being separated cleanly. I’m sorry for telling you about this because you are undoubtedly never going to be able to unsee this.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It falls apart when you make things more complicated such as adapting my email signature onto the hoodie. It misses a line entirely:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftif6hjy10gkfjs2zn0eu.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftif6hjy10gkfjs2zn0eu.webp" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It should read something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.iko snura .iko kanro
.iko panpi .iko gleki
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;Which is the main set of instructions for &lt;a href="https://when-then-zen.christine.website/meditation/metta" rel="noopener noreferrer"&gt;Loving-kindness (Metta) meditation&lt;/a&gt; translated into Lojban, targeted at the reader. It’s intended as a blessing to close out the message.&lt;/p&gt;

&lt;p&gt;Compare it to &lt;a href="https://chatgpt.com/share/6893b930-56bc-8006-9a83-ff6e2ad29456" rel="noopener noreferrer"&gt;gpt-image-1&lt;/a&gt;, and you can see which wins out with the fine-grained details:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd8q0sr0g1mbw90x1daud.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd8q0sr0g1mbw90x1daud.webp" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Maybe this is just an outlier and it gets better if you do a bunch of iterations. For funsies, these images are in a mix of Chinese and English, in a context models seem to struggle with (a person holding a sign):&lt;/p&gt;

&lt;p&gt;Prompt: An anime woman holding a sign that says "我爱你" or "I love you"&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frujk3n9qg38h0uwhqp51.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frujk3n9qg38h0uwhqp51.gif" alt=" " width="1024" height="576"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each of them looks like it's the same font photoshopped into the image. It's kinda bizarre. The text is at least at the right rotation relative to where the sign is, but something just looks…off.&lt;/p&gt;

&lt;p&gt;Let's take a look at &lt;a href="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Qwen_Image.pdf" rel="noopener noreferrer"&gt;the paper&lt;/a&gt; and see what it has to say about its training data. Skip to section 3.4 "Data Synthesis":&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Given the long-tail distribution of textual content in real-world images, particularly for non-Latin languages such as Chinese, where numerous characters exhibit extremely low frequency, relying solely on naturally occurring text is insufficient to ensure adequate exposure to these rare characters during model training. To address this challenge and improve the robustness of text rendering across diverse contexts, we propose a multi-stage text-aware image synthesis pipeline. This pipeline integrates three complementary strategies: Pure Rendering, Compositional Rendering, and Complex Rendering. The details of each strategy are elaborated below.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Oh, that's why it looks like the text was photoshopped in: that's how they assembled the training data! They just photoshopped a bunch of text in a few fonts onto a bunch of reference images and then made all their GPUs rotate shapes in a loop until the model could render text reliably. I’m skeptical that this is a good idea. In the short term it does result in gain of functionality like we see with Qwen Image, but in the long term this can cause significant damage to future models due to &lt;a href="https://www.nature.com/articles/s41586-024-07566-y" rel="noopener noreferrer"&gt;model collapse&lt;/a&gt; as models are trained off of AI generated output.&lt;/p&gt;

&lt;p&gt;To be clear, using synthetic data does make sense from their perspective. Logographic languages like Chinese have &lt;a href="https://studycli.org/chinese-characters/number-of-characters-in-chinese/" rel="noopener noreferrer"&gt;literally tens of thousands of characters&lt;/a&gt; (Taiwan's dictionary tracks at least 106,000), but most people will never need to use more than about 15,000 for daily life. Among those, the most commonly used ones (你 "you", 我 "I/me", 是 "is", etc.) will follow &lt;a href="https://en.wikipedia.org/wiki/Zipf%27s_law" rel="noopener noreferrer"&gt;Zipf's law&lt;/a&gt; and be way more present in any dataset than the less commonly used ones (爨, "the stove").&lt;/p&gt;
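&lt;p&gt;That skew is easy to see in any text dump. A toy illustration (the "corpus" here is a stand-in string, not real training data):&lt;/p&gt;

```python
# Character frequencies follow a Zipf-like curve: a handful of characters
# dominate while most barely appear. The "corpus" is a toy stand-in string.
from collections import Counter

corpus = "我爱你你你我是是是是你我你是"
freq = Counter(corpus)
ranked = freq.most_common()
print(ranked)  # common characters like 你 and 是 dwarf rare ones like 爱
```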

&lt;p&gt;Prompt: An anime woman holding a sign that says "爨", Qwen Image&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0lc1yd2pz0ri7edp1rfz.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0lc1yd2pz0ri7edp1rfz.webp" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Okay, that was unfair; I found that character near the bottom of the list of most commonly used Chinese characters.&lt;/p&gt;

&lt;p&gt;They also went out of their way to make sure that their synthetic data met quality standards:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To ensure high-quality synthesized samples, a rigorous quality control mechanism is employed: if any character within a paragraph cannot be rendered due to limitations (e.g., font unavailability or rendering errors), the entire paragraph is discarded. This strict filtering guarantees that only fully valid and legible samples are included in the training dataset, thereby maintaining high fidelity in character-level text rendering.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They really did make text rendering better, but at the cost of all the text looking "samey". I'm sure that can be polished out with some post-training, finetuning, or LoRA adapter models, but for now the text sits in that uncanny valley that makes people think it's badly photoshopped in.&lt;/p&gt;

&lt;h2&gt;
  
  
  So how does Qwen Image stack up?&lt;a href="https://www.tigrisdata.com/blog/qwen-image#so-how-does-qwen-image-stack-up" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Qwen Image&lt;/th&gt;
&lt;th&gt;gpt-image-1&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Image editing&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Style consistency across generations&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text synthesis&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High resolution output&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open weights (run on your own hardware)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fine grained details&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Paragraphs of text&lt;/td&gt;
&lt;td&gt;✅ (sometimes)&lt;/td&gt;
&lt;td&gt;✅ (most of the time)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Conclusion&lt;a href="https://www.tigrisdata.com/blog/qwen-image#conclusion" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Qwen Image is a solid choice for text rendering, &lt;em&gt;for an open model.&lt;/em&gt; Need fancy layouts, paragraphs, or text that really blends in? Closed models are still way ahead. But if you're running your own models to render text, you can get better results by adding text in post, compositing, or splitting up longer text into smaller bits before generating.&lt;/p&gt;
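&lt;p&gt;The "split longer text into smaller bits" trick is easy to script. Here's a minimal sketch (the 20-character budget is an arbitrary assumption; tune it for your model) that breaks sign text into pieces you can generate one at a time and composite afterwards:&lt;/p&gt;

```python
import textwrap

def chunk_sign_text(text: str, max_chars: int = 20) -> list[str]:
    """Split long sign/poster copy into short pieces that an image
    model is more likely to render legibly, one piece per generation."""
    return textwrap.wrap(text, width=max_chars)

pieces = chunk_sign_text("Grand opening this Saturday at the corner of Fifth and Main")
for piece in pieces:
    print(piece)  # each piece is at most 20 characters long
```

Each piece then becomes its own prompt, and you stitch the rendered pieces back together in post.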

&lt;p&gt;If you end up customizing your models and needing to store weights, check out our guide on &lt;a href="https://www.tigrisdata.com/docs/model-storage/" rel="noopener noreferrer"&gt;Storing Model Weights on Tigris&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Store your models on Tigris
&lt;/h3&gt;

&lt;p&gt;Need low-latency storage for your models without egress fees? We got you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tigrisdata.com/docs/get-started/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff784t7kvuj2zms095m7a.png" alt="Ready? Get Started" width="800" height="128"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>engineering</category>
      <category>ai</category>
    </item>
    <item>
      <title>Using Hugging Face datasets with Tigris</title>
      <dc:creator>Shared Account</dc:creator>
      <pubDate>Tue, 29 Jul 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/tigrisdata/using-hugging-face-datasets-with-tigris-5fmi</link>
      <guid>https://dev.to/tigrisdata/using-hugging-face-datasets-with-tigris-5fmi</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgv4bloilegnhu540f267.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgv4bloilegnhu540f267.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of the most popular ways to share datasets is via &lt;a href="https://huggingface.co/datasets" rel="noopener noreferrer"&gt;Hugging Face’s dataset platform&lt;/a&gt;. You can even stream larger-than-laptop datasets, but there are no guarantees on throughput or availability. When you’re developing a toy model, this might not be an issue. But as your model matures and you combine your custom datasets with public ones, it’s critical to save your own copy.&lt;/p&gt;

&lt;p&gt;The ability to reproduce the state of your model at a given time has become critical, and even legally required, as models are integrated into healthcare, legal, and other compliance-heavy domains. Why did the AI agree to &lt;a href="https://www.upworthy.com/prankster-tricks-a-gm-dealership-chatbot-to-sell-him-a-76000-chevy-tahoe-for-ex1" rel="noopener noreferrer"&gt;sell a car for $1&lt;/a&gt;? Or &lt;a href="https://fortune.com/2025/07/23/ai-coding-tool-replit-wiped-database-called-it-a-catastrophic-failure/" rel="noopener noreferrer"&gt;delete a production database&lt;/a&gt;?&lt;/p&gt;

&lt;p&gt;As we develop models, they’re going to make mistakes. It’s challenging to debug across scattered datasets, especially public ones outside your control. Centralizing your datasets in a common store is a good first step on your way to full dataset version control. Just make sure you think about additional costs: Hugging Face dataset streaming is free, but private stores can quickly rack up egress fees.&lt;/p&gt;
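&lt;p&gt;Those egress fees are worth estimating before you pick a store. A back-of-the-envelope sketch (the $0.09/GiB rate is purely illustrative, not any particular provider’s price):&lt;/p&gt;

```python
def monthly_egress_cost(dataset_gib: float, reads_per_month: int,
                        price_per_gib: float) -> float:
    """Estimate monthly egress fees for re-reading a dataset from object
    storage. On a zero-egress store like Tigris, price_per_gib is 0."""
    return dataset_gib * reads_per_month * price_per_gib

# Illustrative numbers: a 50 GiB dataset pulled 20 times a month at $0.09/GiB.
cost = monthly_egress_cost(50, 20, 0.09)
print(f"${cost:.2f}/month")
```

The point of the exercise: re-reading a training set every run multiplies that per-GiB rate quickly.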

&lt;p&gt;Today we’re going to learn how to import &lt;a href="https://huggingface.co/datasets" rel="noopener noreferrer"&gt;Hugging Face datasets&lt;/a&gt; into Tigris so that you can use them for whatever you need.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;br&gt;
In production workloads, we recommend that you use &lt;a href="https://www.tigrisdata.com/docs/libraries/lancedb/" rel="noopener noreferrer"&gt;LanceDB’s multimodal lakehouse&lt;/a&gt; to store your training datasets; but if you’re just getting started then this is way more than enough.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Prerequisites&lt;a href="https://www.tigrisdata.com/blog/huggingface-datasets#prerequisites" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Here’s what you need to get started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A local Python development environment (our blog has a guide on &lt;a href="https://www.tigrisdata.com/blog/dev-containers-python/" rel="noopener noreferrer"&gt;using development containers&lt;/a&gt; to set one up).&lt;/li&gt;
&lt;li&gt;A Tigris account from &lt;a href="https://storage.new/" rel="noopener noreferrer"&gt;storage.new&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A Tigris bucket and access keys with the Editor permission on that bucket.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setting up your environment manually
&lt;/h2&gt;

&lt;p&gt;For manual setup, you'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.10 or later&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt; or another Python dependency manager&lt;/li&gt;
&lt;li&gt;Your Tigris access credentials&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Install the dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;uv python install 3.10
uv venv
uv sync
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, copy &lt;code&gt;.env.example&lt;/code&gt; to &lt;code&gt;.env&lt;/code&gt; and configure your Tigris credentials:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Tigris configuration
AWS_ACCESS_KEY_ID=tid_your_access_key_here
AWS_SECRET_ACCESS_KEY=tsec_your_secret_key_here
AWS_ENDPOINT_URL_S3=https://fly.storage.tigris.dev
AWS_ENDPOINT_URL_IAM=https://iam.tigris.dev
AWS_REGION=auto

# Dataset and bucket
BUCKET_NAME=your-bucket-name-here
DATASET_NAME=mlabonne/FineTome-100k
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;span&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;To verify your configuration is correct, run the validation script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;uv run scripts/ensure-dotenv.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;span&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;This script checks that all required environment variables are set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
from dotenv import load_dotenv

load_dotenv()

for key in [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_ENDPOINT_URL_S3",
    "AWS_ENDPOINT_URL_IAM",
    "AWS_REGION",
    "BUCKET_NAME",
    "DATASET_NAME",
]:
    assert os.getenv(key) is not None, f"Environment variable {key} is not defined"

print("Your .env file is good to go!")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;span&gt;&lt;/span&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Importing a dataset&lt;a href="https://www.tigrisdata.com/blog/huggingface-datasets#importing-a-dataset" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Now let's import the &lt;a href="https://huggingface.co/datasets/mlabonne/FineTome-100k" rel="noopener noreferrer"&gt;FineTome-100k&lt;/a&gt; dataset to Tigris. The process is surprisingly straightforward thanks to Hugging Face datasets' built-in support for S3-compatible storage.&lt;br&gt;&lt;br&gt;
First, let's look at the helper module that sets up our Tigris connection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import s3fs
from dotenv import load_dotenv
from typing import Dict, Tuple

def setup() -&amp;gt; Tuple[Dict[str, str], s3fs.S3FileSystem]:
    load_dotenv()

    storage_options = {
        "key": os.getenv("AWS_ACCESS_KEY_ID"),
        "secret": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "endpoint_url": os.getenv("AWS_ENDPOINT_URL_S3"),
    }

    # Create the S3 filesystem
    fs = s3fs.S3FileSystem(**storage_options)

    # Test write access
    bucket_name = os.getenv("BUCKET_NAME")
    fs.write_text(f"/{bucket_name}/test.txt", "this is a test")
    fs.rm(f"/{bucket_name}/test.txt")

    return (storage_options, fs)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;span&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;The import script uses Hugging Face datasets' &lt;code&gt;save_to_disk&lt;/code&gt; method with our Tigris storage options:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import tigris
from datasets import load_dataset
from dotenv import load_dotenv

def main():
    storage_options, fs = tigris.setup()

    bucket_name = os.getenv("BUCKET_NAME")
    dataset_name = os.getenv("DATASET_NAME")

    # Load the dataset from Hugging Face
    dataset = load_dataset(dataset_name, split="train")

    # Save directly to Tigris
    dataset.save_to_disk(
        f"s3://{bucket_name}/datasets/{dataset_name}",
        storage_options=storage_options
    )

    print(f"Dataset {dataset_name} is now in Tigris at {bucket_name}/datasets/{dataset_name}")

if __name__ == "__main__":
    main()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;span&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Run the import script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;uv run scripts/import-to-tigris.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;span&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;That's it! The dataset is now stored in Tigris and ready to use from anywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reading and processing datasets from Tigris&lt;a href="https://www.tigrisdata.com/blog/huggingface-datasets#reading-and-processing-datasets-from-tigris" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Once your dataset is in Tigris, you can load it from anywhere using the same storage options. Here's an example that loads the dataset, applies a filter, and saves the filtered version back to Tigris:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import tigris
from datasets import load_from_disk

def remove_blue(row):
    """
    Example transformation that removes conversations mentioning "blue".
    You can implement any filtering or transformation logic here.
    """
    for conv in row['conversations']:
        if "blue" in conv['value']:
            return False  # remove the row
    return True  # keep the row

def main():
    storage_options, fs = tigris.setup()

    bucket_name = os.getenv("BUCKET_NAME")
    dataset_name = os.getenv("DATASET_NAME")

    # Load dataset from Tigris
    dataset = load_from_disk(
        f"s3://{bucket_name}/datasets/{dataset_name}",
        storage_options=storage_options
    )

    # Apply filtering
    filtered_ds = dataset.filter(remove_blue)

    # Save filtered dataset back to Tigris
    filtered_ds.save_to_disk(
        f"s3://{bucket_name}/no-blue/{dataset_name}",
        storage_options=storage_options
    )

    print(f"Filtered dataset saved to {bucket_name}/no-blue/{dataset_name}")

if __name__ == "__main__":
    main()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;span&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Run the processing script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;uv run scripts/read-from-tigris.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;span&gt;&lt;/span&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion&lt;a href="https://www.tigrisdata.com/blog/huggingface-datasets#conclusion" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;You did it! Your copy of those datasets is safely stored in your own bucket. You have centralized your datasets and are on the path to versioning them.&lt;/p&gt;

&lt;p&gt;We love Hugging Face for providing models and datasets to the world for free, and we want you to keep using them to develop your own models. However, as your models mature and regulations come into play, keeping your own copy ensures that no one tampers with the data, that your bandwidth won’t suddenly drop mid-training job, and that reads won’t lag across regions. Tigris dynamically places your datasets where you need them so you can scale fearlessly to any cloud with an internet connection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tigrisdata.com/docs/get-started/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff784t7kvuj2zms095m7a.png" alt="Want to try it out? Get Started" width="800" height="128"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>datasets</category>
      <category>ai</category>
      <category>buildwithtigris</category>
    </item>
    <item>
      <title>We made unhateable IAM. Here’s how to use it.</title>
      <dc:creator>Shared Account</dc:creator>
      <pubDate>Thu, 17 Jul 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/tigrisdata/we-made-unhateable-iam-heres-how-to-use-it-2dl6</link>
      <guid>https://dev.to/tigrisdata/we-made-unhateable-iam-heres-how-to-use-it-2dl6</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfuw9k5babjt2jehe53q.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfuw9k5babjt2jehe53q.webp"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We made IAM you can’t hate. Simplified permissions, an easy way to list access keys attached to a given policy, and a VS Code-style editor experience that feels like your local development environment: all in our new IAM Policy Builder in the Tigris Console.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/ZabMm5uhZfQ"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;It should be a code smell that there are &lt;a href="https://github.com/iann0036/iamlive" rel="noopener noreferrer"&gt;several&lt;/a&gt; &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/access-analyzer-policy-generation.html" rel="noopener noreferrer"&gt;tools&lt;/a&gt; for constructing &lt;a href="https://en.wikipedia.org/wiki/Principle_of_least_privilege" rel="noopener noreferrer"&gt;least privilege policies&lt;/a&gt;, all of which require you to have overprivileged entities and then cut back privileges, rather than making a least privilege policy from the start. I can personally list eight different ways to give access to an S3 bucket; there may be more. Why is it so hard to do the right thing?&lt;/p&gt;

&lt;p&gt;The feedback loop of IAM policy development leads to even Senior Engineers™ having reactions like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/glcst/status/1909957508682129883" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6xfiwmbe0w5ke83rne9p.jpg" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://bsky.app/profile/cr3ative.co.uk/post/3ltpfm2hsq22w" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fngeyx3alc9qika6tdfs0.jpg" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://triangletoot.party/@donaldball/114857960806541467" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9atht9mhnggpr3zgmr3.jpg" alt=" "&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;How do you simplify IAM whilst maintaining a strong security posture? We started by removing two of the easiest to misuse components: IAM Users and IAM Roles. On Tigris, you’re a member of an organization, but that’s as far as “users” go. You don’t assume roles to get temporary credentials that you then need to refresh in the middle of longer running jobs. We don’t have compute instances, so we don’t need instance profiles for policy attachments to access your data from an instance. It’s only access keys with policy attachments. That’s the beauty of using a product that does one thing, and does it well.&lt;/p&gt;

&lt;p&gt;“But static keys will get leaked and my data ransomed!” What you really want to do is ensure that a given access key has only enough permissions to get its job done for a given amount of time and no more. And then that key loses access automatically. It doesn’t have to be so complex, truly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example: Running a Training Job on a Newer Cloud Provider&lt;a href="https://www.tigrisdata.com/blog/iam-policy-builder#example-running-a-training-job-on-a-newer-cloud-provider" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;A Suspicious Person™ in a trench coat shuffles past you on the streets of San Francisco and surreptitiously looks you up and down, sees the wad of computer cables poking from your backpack, and pulls you aside: “GPUs for 35 cents an hour?” They whip open their trench coat and dazzle you with a full selection of cloud prices so cheap you swear that they’re a front for the mafia. But you work at a startup, and runway is runway. How can you take advantage of these low, low prices without exposing yourself to hackery and leakery?&lt;/p&gt;

&lt;p&gt;Do you load your data into their storage? Surely not, if they even have storage. You don’t want anything especially long-lived on those trench-coat GPUs, and you certainly don’t want to initially overprovision access to then cut down to least privilege. It should be tightly scoped from the start: mint access keys with minimum permissions to do the job, then delete them when you’re done.&lt;/p&gt;

&lt;p&gt;So we do that: your training job on those absurdly affordable GPUs gets permission to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read only access to one dataset so that the job cannot modify any datasets in use by other jobs&lt;/li&gt;
&lt;li&gt;Read only access to the base model collection so that the job can’t corrupt any of your models&lt;/li&gt;
&lt;li&gt;Write only access to the finetuned model collection so that all the job can do when it’s done is submit its work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If an access key with these permissions is leaked, your attack surface is still quite small: your precious finetuned models are safe. The one dataset allocated to the job and the base model collection can be read, but not altered. Other jobs are unaffected. Let’s take a look at building such a policy for your training job in the new IAM Policy Builder, and adding a time-based restriction:&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/ijk9ZZdxeXA"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;And here’s the policy JSON for that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "WikipediaReadOnly",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::contoso-training-datasets-wikipedia-2025-07-01",
        "arn:aws:s3:::contoso-training-datasets-wikipedia-2025-07-01/*"
      ]
    },
    {
      "Sid": "BaseModelsReadOnly",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::contoso-base-models",
        "arn:aws:s3:::contoso-base-models/*"
      ]
    },
    {
      "Sid": "FinetunedModelsWrite",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts",
        "s3:CompleteMultipartUpload"
      ],
      "Resource": [
        "arn:aws:s3:::contoso-finetuned-models",
        "arn:aws:s3:::contoso-finetuned-models/*"
      ]
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;span&gt;&lt;/span&gt;&lt;/p&gt;
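&lt;p&gt;The JSON above doesn’t show the time-based restriction from the video. Assuming condition support mirrors AWS’s &lt;code&gt;DateLessThan&lt;/code&gt;/&lt;code&gt;aws:CurrentTime&lt;/code&gt; syntax, a statement can be bounded to the training window like this (the cutoff date is illustrative):&lt;/p&gt;

```json
{
  "Sid": "WikipediaReadOnlyUntilJobEnds",
  "Effect": "Allow",
  "Action": ["s3:GetObject", "s3:ListBucket"],
  "Resource": [
    "arn:aws:s3:::contoso-training-datasets-wikipedia-2025-07-01",
    "arn:aws:s3:::contoso-training-datasets-wikipedia-2025-07-01/*"
  ],
  "Condition": {
    "DateLessThan": { "aws:CurrentTime": "2025-07-15T00:00:00Z" }
  }
}
```

&lt;p&gt;After the cutoff, the allow statement no longer matches, so requests signed with that key are denied even if the key itself was never deleted.&lt;/p&gt;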

&lt;p&gt;When you make policies in our editor, you can use other policies as a starting point so that you don’t have to summon each statement into existence with the sheer might of your left-click button. We even give you some of the creature comforts that you get in your editor of choice: error squiggles to let you know when something is wrong, syntax highlighting so you can visually distinguish the brackets, and schema validation so you can’t create a policy that doesn’t work.&lt;/p&gt;

&lt;p&gt;Imagine this same situation, but with the AWS IAM structure of users, roles, and policy attachments to wire together.&lt;/p&gt;

&lt;p&gt;And even when you implement that complex process correctly and fully, there are still inconsistencies that can bite back: tag-based policies on iam:PassRole are known to be unreliable. Complex doesn’t always mean secure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Burying the Lede: Linked Access Keys&lt;a href="https://www.tigrisdata.com/blog/iam-policy-builder#burying-the-lede-linked-access-keys" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Being able to build policies with a combination of JSON and button clicking is nice, but we’re burying the lede here: we added something you cannot easily do in other storage providers. On Tigris, you can list all the access keys that have a given policy attached. Then if you do have a key leak or want to investigate the permissions of keys you found on some vintage cron job, it’s all there in the Dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff4rfwuj92emjz5sh90d6.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff4rfwuj92emjz5sh90d6.webp"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is nontrivial to assemble on other platforms due to the overhead involved with IAM Roles and IAM Users. It’s possible but not the most pleasant. In case you need it, here’s the incantation for AWS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POLICY_ARN="arn:aws:iam::123456789012:policy/YourPolicyName"

for user in $(aws iam list-entities-for-policy \
    --policy-arn "$POLICY_ARN" \
    --query 'PolicyUsers[].UserName' --output text); do
  echo "User: $user"
  aws iam list-access-keys --user-name "$user" \
    --query 'AccessKeyMetadata[].AccessKeyId' --output text
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;span&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;This really helps when your production environment looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzhq9gna2jsu817lsnt3b.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzhq9gna2jsu817lsnt3b.webp"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion&lt;a href="https://www.tigrisdata.com/blog/iam-policy-builder#conclusion" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;IAM doesn’t have to be hellish; it can be just as good as your local development setup. Tigris’ IAM Policy Builder blends a tile-based GUI with a VS Code-like editor experience to give you the best of both worlds. You can start with a pasted example policy, customize it, and know that what you’re writing actually works. Give it a try; we’re sure you’ll come to love IAM in a way you never have before.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was originally published on July 17, 2025 at tigrisdata.com/blog&lt;/em&gt;&lt;/p&gt;

</description>
      <category>features</category>
      <category>iam</category>
    </item>
    <item>
      <title>Getting started with Warpstream on Tigris</title>
      <dc:creator>Shared Account</dc:creator>
      <pubDate>Thu, 10 Jul 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/tigrisdata/getting-started-with-warpstream-on-tigris-3bk0</link>
      <guid>https://dev.to/tigrisdata/getting-started-with-warpstream-on-tigris-3bk0</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6lytnnfzer6o11iyrxx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6lytnnfzer6o11iyrxx.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.warpstream.com/" rel="noopener noreferrer"&gt;Warpstream&lt;/a&gt; lets you store an unlimited amount of data in your message queues, but when you set it up with S3 or other object stores, you end up having to pay egress fees to read messages. Tigris is a globally distributed, multi-cloud object storage service with built-in support for the S3 API and no egress fees. When you combine the two, you get a bottomless durable message queue that lets you store however much you want without having to worry about where your data is.&lt;/p&gt;

&lt;p&gt;Before we get started, let’s cover the moving parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://kafka.apache.org/" rel="noopener noreferrer"&gt;&lt;strong&gt;Apache Kafka&lt;/strong&gt;&lt;/a&gt; is a durable message queue. In Kafka, Producers send Messages into Topics hosted by Brokers that are read by Consumers or Consumer Groups. Kafka is one of the most popular message queue programs. It’s deployed by 80% of Fortune 500 companies because it’s very fault-tolerant and its durability means that the Queues continue functioning even as Brokers go down. The main downside is that Kafka relies on local storage, meaning that your Kafka Brokers need to have lots of fast storage.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.warpstream.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Warpstream&lt;/strong&gt;&lt;/a&gt; is like Kafka but it improves on it in one key way: Warpstream puts every Message in every Topic into objects in an S3-compatible object store. This means that the amount of data you hold in your queue isn’t limited by the amount of storage in each server running Warpstream. This also means you don’t need to set up all of Kafka’s dependencies (Zookeeper, the JVM, etc). Warpstream also ships an easy to use command line utility that helps you administrate your message queue and test functionality.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docker.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Docker&lt;/strong&gt;&lt;/a&gt; is the universal package format for the Internet. Docker lets you put your application and all its dependencies into a container image so that it can’t conflict with anything else on the system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Today we’re going to deploy a Warpstream Broker backed by Tigris into a Docker container so you can create your own bottomless durable message queue. This example uses &lt;a href="https://docs.docker.com/compose/" rel="noopener noreferrer"&gt;Docker Compose&lt;/a&gt;, but it will help you understand how to create your own broker so you can deploy it anywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites&lt;a href="https://www.tigrisdata.com/blog/warpstream#prerequisites" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Clone the &lt;a href="https://github.com/tigrisdata-community/warpstream-tigris" rel="noopener noreferrer"&gt;warpstream-tigris&lt;/a&gt; demo repo to your laptop and open it in your favourite editor, such as &lt;a href="https://code.visualstudio.com/" rel="noopener noreferrer"&gt;VS Code&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Make sure you have the following installed on your computer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.docker.com/products/docker-desktop/" rel="noopener noreferrer"&gt;Docker Desktop&lt;/a&gt; or another similar app like &lt;a href="https://podman-desktop.io/" rel="noopener noreferrer"&gt;Podman Desktop&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://aws.amazon.com/cli/" rel="noopener noreferrer"&gt;AWS CLI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.warpstream.com/warpstream/reference/cli-reference" rel="noopener noreferrer"&gt;Warpstream’s CLI&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You will need the following accounts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Tigris account from &lt;a href="https://storage.new/" rel="noopener noreferrer"&gt;storage.new&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A Warpstream account from &lt;a href="http://console.warpstream.com/login" rel="noopener noreferrer"&gt;console.warpstream.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Building a compose file&lt;a href="https://www.tigrisdata.com/blog/warpstream#building-a-compose-file" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;With the &lt;a href="https://github.com/tigrisdata-community/warpstream-tigris" rel="noopener noreferrer"&gt;tigrisdata-community/warpstream-tigris&lt;/a&gt; repo cloned and open in your favorite text editor, you’re ready to go. If you use &lt;a href="https://www.tigrisdata.com/blog/dev-containers-python/" rel="noopener noreferrer"&gt;development containers&lt;/a&gt;, tell your editor to open this repository in a development container to get up and running in a snap!&lt;/p&gt;

&lt;p&gt;Take a look at the &lt;code&gt;docker-compose.yaml&lt;/code&gt; file in the root of the repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;services:
  warp:
    # Grab the latest copy of the warpstream agent for your computer
    image: public.ecr.aws/warpstream-labs/warpstream_agent:latest
    # Run warpstream in "playground" mode for testing
    command:
      - playground
      - -advertiseHostnameStrategy
      - custom
      - -advertiseHostnameCustom
      - warp
    environment:
      # this is a no-op as it will default on the custom advertised hostname defined above, but you can change this if you want to use a different hostname with Kafka
      - WARPSTREAM_DISCOVERY_KAFKA_HOSTNAME_OVERRIDE=warp
    healthcheck:
      # Wait for the Agent to finish setting up the demo before marking it as healthy
      # to delay the diagnose-connection command from running for a few seconds.
      test: ["CMD", "sh", "-c", "sleep 10"]
      interval: 5s
      timeout: 15s
      retries: 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open a new terminal in your development container and make sure Warpstream is up and running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;warpstream kcmd --bootstrap-host warp --type diagnose-connection
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This should return output like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;running diagnose-connection sub-command with bootstrap-host: warp and bootstrap-port: 9092


Broker Details
---------------
  warp:9092 (NodeID: 1547451680) [playground]
    ACCESSIBLE ✅


GroupCoordinator: warp:9092 (NodeID: 1547451680)
    ACCESSIBLE ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Excellent! Create a new topic with &lt;code&gt;warpstream kcmd&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;warpstream kcmd --bootstrap-host warp --type create-topic --topic hello
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This should return output like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;running create-topic sub-command with bootstrap-host: warp and bootstrap-port: 9092

created topic "hello" successfully, topic ID: MQAAAAAAAAAAAAAAAAAAAA==
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Perfect! Now let’s make it work with Tigris. Create a &lt;code&gt;.env&lt;/code&gt; file in the root of the repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cp .env.example .envcode .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a new bucket at &lt;a href="https://storage.new/" rel="noopener noreferrer"&gt;storage.new&lt;/a&gt; in the Standard access tier. Copy its name down into your notes. Create a new &lt;a href="https://storage.new/accesskey" rel="noopener noreferrer"&gt;access key&lt;/a&gt; with Editor permissions for that bucket. Copy the environment details into your &lt;code&gt;.env&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Tigris credentials
AWS_ACCESS_KEY_ID=tid_access_key_id
AWS_SECRET_ACCESS_KEY=tsec_secret_access_key
AWS_ENDPOINT_URL_S3=https://t3.storage.dev
AWS_ENDPOINT_URL_IAM=https://iam.storage.dev
AWS_REGION=auto
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then fill in your Warpstream secrets from the console. You’ll need the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cluster ID from the virtual clusters list (begins with &lt;code&gt;vci_&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Bucket URL (explained below)&lt;/li&gt;
&lt;li&gt;Agent key from the agent keys page for that virtual cluster (begins with &lt;code&gt;aks_&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Cluster region from the admin panel (such as &lt;code&gt;us-east-1&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your bucket is named &lt;code&gt;xe-warpstream-demo&lt;/code&gt;, your bucket URL should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;s3://xe-warpstream-demo?region=auto&amp;amp;endpoint=https://t3.storage.dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Altogether, put these credentials in your &lt;code&gt;.env&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Warpstream credentials
WARPSTREAM_AGENT_KEY=aks_agent_key
WARPSTREAM_BUCKET_URL='s3://xe-warpstream-demo?region=auto&amp;amp;endpoint=https://t3.storage.dev'
WARPSTREAM_DEFAULT_VIRTUAL_CLUSTER_ID=vci_cluster_id
WARPSTREAM_REGION=us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Edit your &lt;code&gt;docker-compose.yaml&lt;/code&gt; file to load the &lt;code&gt;.env&lt;/code&gt; file and start warpstream in agent mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# docker-compose.yaml
services:
  warp:
    image: public.ecr.aws/warpstream-labs/warpstream_agent:latest
    command:
      - agent
    environment:
      WARPSTREAM_DISCOVERY_KAFKA_HOSTNAME_OVERRIDE: warp
      WARPSTREAM_DISCOVERY_KAFKA_PORT_OVERRIDE: 9092
      WARPSTREAM_REQUIRE_AUTHENTICATION: "false"
    env_file:
      - .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then restart your development container: press Ctrl/Cmd+Shift+P and run “Dev Containers: Rebuild Container”. Test the health of your broker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;warpstream kcmd --bootstrap-host warp --type diagnose-connection
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should get output like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;running diagnose-connection sub-command with bootstrap-host: warp and bootstrap-port: 9092


Broker Details
---------------
  warp:9092 (NodeID: 1415344910) [warpstream-unset-az]
    ACCESSIBLE ✅


GroupCoordinator: warp:9092 (NodeID: 1415344910)
    ACCESSIBLE ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s working! Create a topic and publish some messages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;warpstream kcmd --bootstrap-host warp --type create-topic --topic hello
warpstream kcmd --bootstrap-host warp --type produce --topic hello --records "world,,world"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This should create the topic &lt;code&gt;hello&lt;/code&gt; and publish two messages with the value &lt;code&gt;world&lt;/code&gt;. You should get output like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;result: partition:0 offset:0 value:"world"
result: partition:0 offset:1 value:"world"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let’s read them back:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;warpstream kcmd --bootstrap-host warp --type fetch --topic hello --offset 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should get output like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;consuming topic:"hello" partition:0 offset:0
result: partition:0 offset:0 key:"hello" value:"world"
result: partition:0 offset:1 key:"hello" value:"world"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It works! You’ve successfully put data into a queue and fetched it back from the queue. From here you can connect to your broker on host &lt;code&gt;warp&lt;/code&gt; and port &lt;code&gt;9092&lt;/code&gt;. All your data is securely backed by Tigris and you can access it from anywhere in the world.&lt;/p&gt;
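&lt;p&gt;As a quick sketch of what that looks like from application code, here’s a small Python script using the &lt;a href="https://kafka-python.readthedocs.io/" rel="noopener noreferrer"&gt;kafka-python&lt;/a&gt; client. The broker hostname and topic come from this tutorial, but the library choice and helper names are my own, so treat this as a starting point rather than the one true way:&lt;/p&gt;

```python
"""Sketch: produce and consume against the WarpStream broker from Python.

Assumes `pip install kafka-python` and that the broker from the compose file
is reachable as warp:9092 (both names come from the tutorial above).
"""

BOOTSTRAP = "warp:9092"  # host and port exposed by the compose file
TOPIC = "hello"          # topic created earlier with `warpstream kcmd`


def produce(messages):
    # Imported lazily so the sketch can be read without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=BOOTSTRAP)
    for m in messages:
        producer.send(TOPIC, m.encode("utf-8"))
    producer.flush()  # block until the broker has acknowledged every record


def consume():
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BOOTSTRAP,
        auto_offset_reset="earliest",  # read from the start of the topic
        consumer_timeout_ms=5000,      # stop iterating after 5s of silence
    )
    return [record.value.decode("utf-8") for record in consumer]


# Usage (with the broker from the compose file running):
#   produce(["world", "world"])
#   print(consume())
```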

&lt;p&gt;&lt;a href="https://www.tigrisdata.com/docs/get-started/?utm_source=blog-post-warpstream" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhu6e8brwd94n2hvxdxre.png" alt=" " width="800" height="149"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was originally published on July 10, 2025 at tigrisdata.com/blog&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
    </item>
    <item>
      <title>Small Objects, Big Gains: Benchmarking Tigris Against AWS S3 and Cloudflare R2</title>
      <dc:creator>Shared Account</dc:creator>
      <pubDate>Tue, 08 Jul 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/tigrisdata/small-objects-big-gains-benchmarking-tigris-against-aws-s3-and-cloudflare-r2-2fd8</link>
      <guid>https://dev.to/tigrisdata/small-objects-big-gains-benchmarking-tigris-against-aws-s3-and-cloudflare-r2-2fd8</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxuyzpubfkcsof4lzh5sp.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxuyzpubfkcsof4lzh5sp.webp" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of Tigris's standout capabilities is its performance when storing and retrieving small objects. To quantify this advantage, we benchmarked Tigris against two popular object stores—AWS S3 and Cloudflare R2—and found that Tigris consistently delivers higher throughput and lower latency. These gains let you use a single store for everything from tiny payloads to multi-gigabyte blobs without sacrificing efficiency.&lt;/p&gt;

&lt;p&gt;Under the hood, Tigris accelerates small-object workloads by (i) inlining very small objects inside metadata records, (ii) coalescing adjacent keys to reduce storage overhead, and (iii) caching hot items in an on-disk, LSM-backed cache.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary&lt;a href="https://www.tigrisdata.com/blog/benchmark-small-objects#summary" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Our benchmarks reveal that Tigris significantly outperforms both AWS S3 and Cloudflare R2 for small object workloads: Tigris achieves &lt;strong&gt;sub-10ms&lt;/strong&gt; read latency and &lt;strong&gt;sub-20ms&lt;/strong&gt; write latency while sustaining roughly &lt;strong&gt;4x the throughput&lt;/strong&gt; of S3 and &lt;strong&gt;20x the throughput&lt;/strong&gt; of R2 for both operations.&lt;/p&gt;

&lt;p&gt;To ensure our findings are reproducible, we outline the full benchmarking methodology and provide links to all artifacts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tigrisdata.com/docs/get-started/?utm_source=blog-post-benchmarks" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsuvg1ogknjj7h5sy7jcp.png" alt=" " width="800" height="137"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark Setup&lt;a href="https://www.tigrisdata.com/blog/benchmark-small-objects#benchmark-setup" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;We used the &lt;a href="https://en.wikipedia.org/wiki/YCSB" rel="noopener noreferrer"&gt;Yahoo Cloud Serving Benchmark (YCSB)&lt;/a&gt; to evaluate the three systems. We made &lt;a href="https://github.com/tigrisdata/go-ycsb" rel="noopener noreferrer"&gt;our own fork&lt;/a&gt; of the Go version of YCSB to add support for S3-compatible object storage systems (such as Tigris and Cloudflare R2), and have submitted our changes upstream in &lt;a href="https://github.com/pingcap/go-ycsb/pull/307" rel="noopener noreferrer"&gt;pingcap/go-ycsb, PR #307&lt;/a&gt;; at the time of writing, they are awaiting review.&lt;/p&gt;

&lt;p&gt;All experiments ran on a neutral cloud provider to avoid vendor-specific optimizations. Table 1 summarizes the test instance:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Table 1: Benchmark host configuration.&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Quantity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Instance type&lt;/td&gt;
&lt;td&gt;VM.Standard.A1.Flex (Oracle Cloud)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Region&lt;/td&gt;
&lt;td&gt;us-sanjose-1 (West Coast)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;vCPU cores&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;32 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network bandwidth&lt;/td&gt;
&lt;td&gt;32 Gbps&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  YCSB Configuration&lt;a href="https://www.tigrisdata.com/blog/benchmark-small-objects#ycsb-configuration" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;We benchmarked a dataset of 10 million objects, each 1 KB in size. You can view our configuration in the &lt;a href="https://github.com/tigrisdata-community/ycsb-benchmarks" rel="noopener noreferrer"&gt;tigrisdata-community/ycsb-benchmarks&lt;/a&gt; GitHub repo, specifically at &lt;a href="https://github.com/tigrisdata-community/ycsb-benchmarks/blob/main/results/10m-1kb/workloads3" rel="noopener noreferrer"&gt;results/10m-1kb/workloads3&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our buckets were placed in the following regions per provider:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Region&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tigris&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;auto&lt;/code&gt; (globally replicated, but operating against the &lt;code&gt;sjc&lt;/code&gt; region)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS S3&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;us-west-1&lt;/code&gt; (Northern California)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare R2&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;WNAM&lt;/code&gt; (Western North America)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;Using YCSB we evaluated two distinct phases: (i) a bulk load of 10 million 1 KB objects and (ii) a mixed workload of one million operations composed of 80% reads and 20% writes.&lt;/p&gt;
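&lt;p&gt;For reference, these two phases map onto YCSB’s &lt;code&gt;load&lt;/code&gt; and &lt;code&gt;run&lt;/code&gt; sub-commands. The sketch below shows roughly how an invocation looks with go-ycsb; the &lt;code&gt;s3&lt;/code&gt; database name is an illustrative assumption for the fork’s S3-compatible backend, so check the fork’s README for the exact name and required credential properties:&lt;/p&gt;

```
# Phase 1: bulk-load 10 million 1 KB objects (YCSB "load" phase).
# The workload file pins recordcount, fieldlength, and the read/write mix.
go-ycsb load s3 -P results/10m-1kb/workloads3

# Phase 2: mixed workload of 1 million operations at 80% read / 20% write
# (YCSB "run" phase, driven by the same workload file).
go-ycsb run s3 -P results/10m-1kb/workloads3
```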

&lt;h3&gt;
  
  
  Loading 10 million objects&lt;a href="https://www.tigrisdata.com/blog/benchmark-small-objects#loading-10-million-objects" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Figure 1 (below) plots the end-to-end ingestion time. Tigris finishes the load in &lt;strong&gt;6711 s&lt;/strong&gt;, which is roughly &lt;strong&gt;31% faster than S3 (8826 s)&lt;/strong&gt; and &lt;strong&gt;an order of magnitude faster than R2 (72063 s)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Latency drives this gap. As shown in Figure 2, R2's p90 PUT latency tops &lt;strong&gt;340 ms&lt;/strong&gt; whereas Tigris stays below &lt;strong&gt;36 ms&lt;/strong&gt; and S3 below &lt;strong&gt;38 ms&lt;/strong&gt;. Table 2 summarizes the key statistics.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Table 2: Load-phase latency and throughput metrics.&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;P50 Latency (ms)&lt;/th&gt;
&lt;th&gt;P90 Latency (ms)&lt;/th&gt;
&lt;th&gt;Runtime (sec)&lt;/th&gt;
&lt;th&gt;Throughput (ops/sec)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tigris&lt;/td&gt;
&lt;td&gt;16.799 ms&lt;/td&gt;
&lt;td&gt;35.871 ms&lt;/td&gt;
&lt;td&gt;6710.7 sec&lt;/td&gt;
&lt;td&gt;1490.2 ops/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3&lt;/td&gt;
&lt;td&gt;25.743 ms &lt;br&gt; (1.53x Tigris)&lt;/td&gt;
&lt;td&gt;37.791 ms &lt;br&gt; (1.05x Tigris)&lt;/td&gt;
&lt;td&gt;8826.4 sec  &lt;br&gt; (1.32x Tigris)&lt;/td&gt;
&lt;td&gt;1133 ops/sec &lt;br&gt; (0.76x Tigris)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;R2&lt;/td&gt;
&lt;td&gt;197.119 ms  &lt;br&gt; (11.73x Tigris)&lt;/td&gt;
&lt;td&gt;340.223 ms  &lt;br&gt; (9.48x Tigris)&lt;/td&gt;
&lt;td&gt;72063 sec  &lt;br&gt; (10.74x Tigris)&lt;/td&gt;
&lt;td&gt;138.8 ops/sec  &lt;br&gt; (0.09x Tigris)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Ftotal-time-load-comparison-c7c8a483cc6a0513151002e5975cef61.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Ftotal-time-load-comparison-c7c8a483cc6a0513151002e5975cef61.webp" title="Total load time – Tigris vs S3 vs R2" alt="Figure 1 – Total load time – Tigris vs S3 vs R2" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 1: Total load time for loading 10 M 1 KB objects.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;R2 takes more than 300 ms to write a single object, which explains the slow data load. Tigris also beats S3 on write latency, though by a smaller margin than against R2.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Fload-sjc-s3-tigris-insert-latency_p90_ms-a8196adadd9cba9727d3adf362c0cefb.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Fload-sjc-s3-tigris-insert-latency_p90_ms-a8196adadd9cba9727d3adf362c0cefb.webp" title="PUT p90 latency – Tigris vs S3" alt="Figure 2 – PUT p90 latency – Tigris vs S3" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 2: PUT p90 latency during load phase.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1 million operations (20% write, 80% read)&lt;a href="https://www.tigrisdata.com/blog/benchmark-small-objects#1-million-operations-20-write-80-read" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;This is the &lt;em&gt;run&lt;/em&gt; phase of the YCSB benchmark. As a reminder, it is a 20% write and 80% read workload totaling 1 million operations.&lt;/p&gt;

&lt;h4&gt;
  
  
  Read throughput&lt;a href="https://www.tigrisdata.com/blog/benchmark-small-objects#read-throughput" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Frun-sjc-r2-tigris-read-throughput_ops-bb6db6d7c60d1e342d2087f301535376.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Frun-sjc-r2-tigris-read-throughput_ops-bb6db6d7c60d1e342d2087f301535376.webp" title="Read throughput – Tigris vs R2" alt="Figure 3 – Read throughput – Tigris vs R2" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 3: Read throughput during mixed workload (Tigris vs R2).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Frun-sjc-s3-tigris-read-throughput_ops-da1529d6996ffc426d42dd46b6811614.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Frun-sjc-s3-tigris-read-throughput_ops-da1529d6996ffc426d42dd46b6811614.webp" title="Read throughput – Tigris vs S3" alt="Figure 4 – Read throughput – Tigris vs S3" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 4: Read throughput during mixed workload (Tigris vs S3).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Throughput traces for all three providers remain stable—useful for capacity planning—but the absolute rates diverge sharply. Tigris sustains &lt;strong&gt;≈3.3k ops/s&lt;/strong&gt;, nearly &lt;strong&gt;4× S3 (≈892 ops/s)&lt;/strong&gt; and &lt;strong&gt;20× R2 (≈170 ops/s)&lt;/strong&gt;. This headroom lets applications serve real-time workloads directly from Tigris.&lt;/p&gt;

&lt;h4&gt;
  
  
  Read latency&lt;a href="https://www.tigrisdata.com/blog/benchmark-small-objects#read-latency" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Frun-sjc-r2-tigris-read-latency_p90_ms-46e643e2523ea34e05f440062383f8a1.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Frun-sjc-r2-tigris-read-latency_p90_ms-46e643e2523ea34e05f440062383f8a1.webp" title="Read p90 latency – Tigris vs R2" alt="Figure 5 – Read p90 latency – Tigris vs R2" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 5: Read p90 latency during mixed workload (Tigris vs R2).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Frun-sjc-s3-tigris-read-latency_p90_ms-3000f66d81b1ae1c47e9b3b4a70d1b3c.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Frun-sjc-s3-tigris-read-latency_p90_ms-3000f66d81b1ae1c47e9b3b4a70d1b3c.webp" title="Read p90 latency – Tigris vs S3" alt="Figure 6 – Read p90 latency – Tigris vs S3" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 6: Read p90 latency during mixed workload (Tigris vs S3).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Latency follows the same pattern. Tigris keeps p90 below &lt;strong&gt;8 ms&lt;/strong&gt;; S3 settles around &lt;strong&gt;42 ms&lt;/strong&gt;, and R2 stretches beyond &lt;strong&gt;199 ms&lt;/strong&gt;. At sub-10 ms, reads feel closer to a key-value store than a traditional object store.&lt;/p&gt;

&lt;h4&gt;
  
  
  Write throughput&lt;a href="https://www.tigrisdata.com/blog/benchmark-small-objects#write-throughput" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Frun-sjc-r2-tigris-update-throughput_ops-7b468a6c33651d711a76962ca6e53077.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Frun-sjc-r2-tigris-update-throughput_ops-7b468a6c33651d711a76962ca6e53077.webp" title="Write throughput – Tigris vs R2" alt="Figure 7 – Write throughput – Tigris vs R2" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 7: Write throughput during mixed workload (Tigris vs R2).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Frun-sjc-s3-tigris-update-throughput_ops-2d86342ca555dc256bebefdce130d28e.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Frun-sjc-s3-tigris-update-throughput_ops-2d86342ca555dc256bebefdce130d28e.webp" title="Write throughput – Tigris vs S3" alt="Figure 8 – Write throughput – Tigris vs S3" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 8: Write throughput during mixed workload (Tigris vs S3).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Write throughput shows the same spread. Tigris delivers &lt;strong&gt;≈828 ops/s&lt;/strong&gt;, close to &lt;strong&gt;4× S3 (224 ops/s)&lt;/strong&gt; and &lt;strong&gt;20× R2 (43 ops/s)&lt;/strong&gt;, giving plenty of margin for bursty ingest pipelines.&lt;/p&gt;

&lt;h4&gt;
  
  
  Write latency&lt;a href="https://www.tigrisdata.com/blog/benchmark-small-objects#write-latency" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Frun-sjc-r2-tigris-update-latency_p90_ms-b0ae5cb55a67e1338eff80caf30c04e0.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Frun-sjc-r2-tigris-update-latency_p90_ms-b0ae5cb55a67e1338eff80caf30c04e0.webp" title="Write p90 latency – Tigris vs R2" alt="Figure 9 – Write p90 latency – Tigris vs R2" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 9: Write p90 latency during mixed workload (Tigris vs R2).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Frun-sjc-s3-tigris-update-latency_p90_ms-58050f54ed75eb2ced477ee5e81ceb4e.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Frun-sjc-s3-tigris-update-latency_p90_ms-58050f54ed75eb2ced477ee5e81ceb4e.webp" title="Write p90 latency – Tigris vs S3" alt="Figure 10 – Write p90 latency – Tigris vs S3" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 10: Write p90 latency during mixed workload (Tigris vs S3).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Write-side tail latency tracks proportionally: &lt;strong&gt;&amp;lt;17 ms&lt;/strong&gt; for Tigris, &lt;strong&gt;≈41 ms&lt;/strong&gt; for S3, and &lt;strong&gt;&amp;gt;680 ms&lt;/strong&gt; for R2, an order-of-magnitude gap that can make or break user-facing workloads.&lt;/p&gt;

&lt;p&gt;To summarize:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Table 3: Read latency and throughput metrics.&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;P50 Latency (ms)&lt;/th&gt;
&lt;th&gt;P90 Latency (ms)&lt;/th&gt;
&lt;th&gt;Runtime (sec)&lt;/th&gt;
&lt;th&gt;Throughput (ops/sec)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tigris&lt;/td&gt;
&lt;td&gt;5.399 ms&lt;/td&gt;
&lt;td&gt;7.867 ms&lt;/td&gt;
&lt;td&gt;241.7 sec&lt;/td&gt;
&lt;td&gt;3309.8 ops/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3&lt;/td&gt;
&lt;td&gt;22.415 ms &lt;br&gt; (4.15x Tigris)&lt;/td&gt;
&lt;td&gt;42.047 ms &lt;br&gt; (5.34x Tigris)&lt;/td&gt;
&lt;td&gt;896.8 sec &lt;br&gt; (3.71x Tigris)&lt;/td&gt;
&lt;td&gt;891.5 ops/sec &lt;br&gt; (0.27x Tigris)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;R2&lt;/td&gt;
&lt;td&gt;605.695 ms &lt;br&gt; (112.19x Tigris)&lt;/td&gt;
&lt;td&gt;680.959 ms &lt;br&gt; (86.56x Tigris)&lt;/td&gt;
&lt;td&gt;4705.3 sec &lt;br&gt; (19.47x Tigris)&lt;/td&gt;
&lt;td&gt;42.6 ops/sec &lt;br&gt; (0.01x Tigris)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Table 4: Update latency and throughput metrics.&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;P50 Latency (ms)&lt;/th&gt;
&lt;th&gt;P90 Latency (ms)&lt;/th&gt;
&lt;th&gt;Runtime (sec)&lt;/th&gt;
&lt;th&gt;Throughput (ops/sec)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tigris&lt;/td&gt;
&lt;td&gt;12.855 ms&lt;/td&gt;
&lt;td&gt;16.543 ms&lt;/td&gt;
&lt;td&gt;241.6 sec&lt;/td&gt;
&lt;td&gt;828.1 ops/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3&lt;/td&gt;
&lt;td&gt;26.975 ms &lt;br&gt; (2.1x Tigris)&lt;/td&gt;
&lt;td&gt;41.215 ms &lt;br&gt; (2.49x Tigris)&lt;/td&gt;
&lt;td&gt;896.8 sec &lt;br&gt; (3.7x Tigris)&lt;/td&gt;
&lt;td&gt;223.6 ops/sec &lt;br&gt; (0.27x Tigris)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;R2&lt;/td&gt;
&lt;td&gt;605.695 ms &lt;br&gt; (47.12x Tigris)&lt;/td&gt;
&lt;td&gt;680.959 ms &lt;br&gt; (41.16x Tigris)&lt;/td&gt;
&lt;td&gt;4705.3 sec &lt;br&gt; (19.4x Tigris)&lt;/td&gt;
&lt;td&gt;42.6 ops/sec &lt;br&gt; (0.05x Tigris)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Conclusion&lt;a href="https://www.tigrisdata.com/blog/benchmark-small-objects#conclusion" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Tigris outperforms S3 and comprehensively outperforms R2 for small object workloads. The performance advantage stems from Tigris's optimized architecture for small objects. While S3 and R2 struggle with high latency on small payloads (R2's p90 PUT latency reaches 340ms), Tigris maintains consistent low latency through intelligent object inlining, key coalescing, and LSM-backed caching.&lt;/p&gt;

&lt;p&gt;These results demonstrate that Tigris can serve as a unified storage solution for mixed workloads, eliminating the need to maintain separate systems for small and large objects. Whether you're storing billions of tiny metadata files or streaming gigabytes of video data, Tigris delivers optimal performance across the entire object size spectrum.&lt;/p&gt;

&lt;p&gt;You can find the full benchmark results in the &lt;a href="https://github.com/tigrisdata-community/ycsb-benchmarks" rel="noopener noreferrer"&gt;ycsb-benchmarks&lt;/a&gt; repository.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was originally published on July 8, 2025 at tigrisdata.com/blog&lt;/em&gt;&lt;/p&gt;

</description>
      <category>benchmarks</category>
    </item>
    <item>
      <title>Standardizing Python Environments with Development Containers</title>
      <dc:creator>Shared Account</dc:creator>
      <pubDate>Thu, 03 Jul 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/tigrisdata/standardizing-python-environments-with-development-containers-1ome</link>
      <guid>https://dev.to/tigrisdata/standardizing-python-environments-with-development-containers-1ome</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnq5crv18k0io25ay8smk.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnq5crv18k0io25ay8smk.webp" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're working in AI, you're probably working in Python. Maybe you have a webapp in whatever JS framework is popular right now, but most of the core tooling in AI is built in and around Python. So maybe it’s time for a Go programmer like me to figure out how the production Python gets made.&lt;/p&gt;

&lt;p&gt;Last week I rediscovered &lt;a href="https://containers.dev/" rel="noopener noreferrer"&gt;Development Containers&lt;/a&gt;. When you use them, you do all your development in a container instead of on your machine directly. This container is defined using a &lt;a href="https://containers.dev/implementors/json_schema/" rel="noopener noreferrer"&gt;&lt;code&gt;devcontainer.json&lt;/code&gt;&lt;/a&gt; file and when you create a development container it’s rebuilt from scratch every time. This means that when you get your build working in development, it won’t just work on your machine. It’ll work on anyone’s machine.&lt;/p&gt;

&lt;p&gt;Having to use Python shouldn’t be that big of a deal; my first programming language was Python. But there’s one small problem that has left me convinced I’ve been cursed by an elder deity: Python environment management tools randomly break for me. I’ve never been able to figure out why, but in the last three years I have not been able to get basic editing, testing, or other project management tooling to work reliably. I’ve spent hours debugging weird SIGBUS errors that nobody else can reproduce, along with other problems that go way beyond the normal debugging of issues.&lt;/p&gt;

&lt;p&gt;The biggest thing that breaks is the language server in my editor. If I can’t get the language server working, I don’t know what I’m allowed to do with any given thing in a file without keeping a bunch of documentation tabs open. This, combined with Python not having &lt;a href="https://pkg.go.dev/" rel="noopener noreferrer"&gt;a standard documentation site like Go does&lt;/a&gt;, means that figuring out what I can do isn’t easy.&lt;/p&gt;

&lt;p&gt;Making things worse, there are as many ways to manage Python as there are grains of sand on the planet. Starting to use Python means you get to make a lot of lovely decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What environment manager are you using? Conda? Virtualenv? uv? Anaconda? Miniconda? Homebrew? Pipenv?&lt;/li&gt;
&lt;li&gt;Which version of Python does your project depend on? Many big libraries like TensorFlow do deep monkey patching of Python for performance reasons, and as a result they can’t work on newer versions of the interpreter.&lt;/li&gt;
&lt;li&gt;How are you installing your dependencies? Pip? Pip3? Uv?
&lt;center&gt;&lt;p&gt;&lt;a href="https://xkcd.com/927/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigrisdata.com%2Fblog%2Fassets%2Fimages%2Fxkcd_standards-34f86e57eaef756c15ecfb2521124998.webp" width="800" height="453"&gt;&lt;/a&gt;
&lt;small&gt;&lt;a href="https://xkcd.com/927/" rel="noopener noreferrer"&gt;Standards -- XKCD&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;&lt;/center&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There has to be some kind of middle path. We should be able to have nice things like the ability to just open a git repo and get a working development environment, right?&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works&lt;a href="https://www.tigrisdata.com/blog/dev-containers-python#how-it-works" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;When you package your app in a Docker image, you make a &lt;code&gt;Dockerfile&lt;/code&gt; manifest with a base image and then list out all the changes you make to that base image to get things working. This could be anything from copying your source code into the image, building that code, installing dependencies, or anything else that boils down to copying files and running commands. When you define a development container, you make a &lt;code&gt;devcontainer.json&lt;/code&gt; manifest that specifies the base image you’re working from and any &lt;a href="https://containers.dev/features" rel="noopener noreferrer"&gt;features&lt;/a&gt; you want to add to it.&lt;/p&gt;

&lt;p&gt;For example, let’s consider what you need to do in order to get a &lt;a href="https://nodejs.org/" rel="noopener noreferrer"&gt;Node.js&lt;/a&gt; environment working. Here’s a sample &lt;code&gt;devcontainer.json&lt;/code&gt; file for working with Node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "name": "Node",
  "image": "mcr.microsoft.com/devcontainers/base:bookworm",
  "features": {
    "ghcr.io/devcontainers/features/node:1": {},
    "ghcr.io/devcontainers-extra/features/neovim-apt-get:1": {}
  },
  "postCreateCommand": "npm ci"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells your editor to make a copy of &lt;a href="https://github.com/devcontainers/images/tree/main/src/base-debian" rel="noopener noreferrer"&gt;Microsoft’s base Debian image&lt;/a&gt; with &lt;a href="https://github.com/devcontainers/features/tree/main/src/node" rel="noopener noreferrer"&gt;Node&lt;/a&gt; and &lt;a href="https://github.com/devcontainers-extra/features/tree/main/src/neovim-apt-get" rel="noopener noreferrer"&gt;neovim&lt;/a&gt; automatically installed. It also installs all of your Node dependencies so that all you need to do to get up and running is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the repo in a development container&lt;/li&gt;
&lt;li&gt;Open a terminal&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;npm run start&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;There is no step 4.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Just imagine what that workflow could give you. Spinning people up would be a walk in the park.&lt;/p&gt;
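
&lt;p&gt;You don’t even need an editor for this workflow. As a sketch (assuming Docker and the open source &lt;code&gt;@devcontainers/cli&lt;/code&gt; npm package are installed), the same manifest can be driven from a plain terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install -g @devcontainers/cli

# build and start the container defined in .devcontainer/devcontainer.json
devcontainer up --workspace-folder .

# run a command inside the running container
devcontainer exec --workspace-folder . npm run start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
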

&lt;h3&gt;
  
  
  What about Python?&lt;a href="https://www.tigrisdata.com/blog/dev-containers-python#what-about-python" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;You’re probably sitting there asking yourself “yeah, that’s cool, but what about Python?” Python presents a lot of challenges for development use because there are so many variables at play. If you know what you’re doing, this is fine and manageable. If you don’t, you end up in pip hell. You don’t want to be in pip hell with me.&lt;/p&gt;

&lt;p&gt;For teams with a mix of Python experts and non-experts, one of the big things development containers provide is a known working setup to fall back on when you aren’t fluent in Python environment metaphysics. It’s great for people like me who care about the end result but not at all about how things get done, as long as it works (for some reasonable definition of “works”). Even better, you can define editor configuration settings and a list of extensions specifically for that project, meaning you really can just open a new repo and be up and running within seconds.&lt;/p&gt;

&lt;p&gt;This editor preconfiguration means you can fix problems like “What version of Python do I need?” or “How do I just install the dependencies?” forever. Take &lt;a href="https://github.com/tigrisdata-community/huggingface-datasets-with-tigris" rel="noopener noreferrer"&gt;tigrisdata-community/huggingface-datasets-with-tigris&lt;/a&gt; for example. Its &lt;a href="https://github.com/tigrisdata-community/huggingface-datasets-with-tigris/blob/main/.devcontainer/devcontainer.json" rel="noopener noreferrer"&gt;&lt;code&gt;devcontainer.json&lt;/code&gt;&lt;/a&gt; answers that question for you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  // ...
  "postCreateCommand": "uv python install &amp;amp;&amp;amp; uv venv &amp;amp;&amp;amp; uv sync",
  "remoteEnv": {
    "UV_LINK_MODE": "copy",
    "UV_PYTHON": "3.10"
  }
  // ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you create a development container with this manifest, it does the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Installs Python 3.10.x with &lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Creates a &lt;a href="https://docs.astral.sh/uv/pip/environments/#using-python-environments" rel="noopener noreferrer"&gt;Python virtual environment&lt;/a&gt; for all of your dependencies&lt;/li&gt;
&lt;li&gt;Installs all of the Python dependencies&lt;/li&gt;
&lt;/ol&gt;
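
&lt;p&gt;Spelled out, those steps map to a few uv commands you could also run by hand inside the container (a sketch; assumes &lt;code&gt;uv&lt;/code&gt; is already on the &lt;code&gt;PATH&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;uv python install   # install the Python version pinned by UV_PYTHON
uv venv             # create the .venv virtual environment
uv sync             # install the locked dependencies into it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
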

&lt;p&gt;And then you can run the code with &lt;code&gt;uv run&lt;/code&gt; and things Just Work™. All of that complicated dependency management becomes your environment’s problem. Even better, take a look at &lt;a href="https://github.com/tigrisdata-community/huggingface-datasets-with-tigris/blob/5d32918c5d890b924b46703074e9966249406032/.devcontainer/devcontainer.json#L33-L51" rel="noopener noreferrer"&gt;this part of the manifest&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  // ...
  "customizations": {
    "vscode": {
      "extensions": [
        "ms-python.python",
        "ms-python.vscode-pylance",
        "tamasfe.even-better-toml",
        "ms-toolsai.jupyter",
        "ms-toolsai.vscode-jupyter-cell-tags",
        "ms-toolsai.jupyter-renderers",
        "ms-toolsai.vscode-jupyter-slideshow",
        "ms-python.debugpy",
        "ms-toolsai.jupyter-keymap",
        "amazonwebservices.aws-toolkit-vscode"
      ],
      "settings": {
        "python.defaultInterpreterPath": "./.venv/bin/python"
      }
    }
  }
  // ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes VS Code install every extension you need to get a working development environment and that &lt;code&gt;python.defaultInterpreterPath&lt;/code&gt; setting is the cherry on top that makes the language server integration work. This lets you simply clone a repo and get a working language server.&lt;/p&gt;
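
&lt;p&gt;Putting those pieces together, a minimal Python manifest might look something like this (a sketch, not the exact file from the repo; it uses Microsoft’s prebuilt Python image and plain pip rather than uv, and the image tag is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "name": "Python",
  "image": "mcr.microsoft.com/devcontainers/python:3.10",
  "postCreateCommand": "pip install -r requirements.txt",
  "customizations": {
    "vscode": {
      "extensions": ["ms-python.python", "ms-python.vscode-pylance"],
      "settings": {
        "python.defaultInterpreterPath": "/usr/local/bin/python"
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
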

&lt;h2&gt;
  
  
  Conclusion&lt;a href="https://www.tigrisdata.com/blog/dev-containers-python#conclusion" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;I realize this sounds like a fairly simple thing, and let’s be honest, it should be this simple, but it’s taken me three years of experimentation, toil, and suffering to get to the point where you really can just clone a repo and get working language server integration. If you have also been suffering trying to get Python installed so you can vibe code your way to an IPO, give development containers a try.&lt;/p&gt;

&lt;p&gt;This even works if you use &lt;a href="https://github.com/features/codespaces" rel="noopener noreferrer"&gt;GitHub Codespaces&lt;/a&gt;, meaning that you don’t even need to install a copy of VS Code to work on the project.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was originally published on July 3, 2025 at tigrisdata.com/blog&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>development</category>
      <category>containers</category>
      <category>docker</category>
    </item>
    <item>
      <title>mount -t tigrisfs</title>
      <dc:creator>Shared Account</dc:creator>
      <pubDate>Tue, 01 Jul 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/tigrisdata/mount-t-tigrisfs-38b6</link>
      <guid>https://dev.to/tigrisdata/mount-t-tigrisfs-38b6</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8hcbr4mbqx5lyykdb58.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8hcbr4mbqx5lyykdb58.webp" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At Tigris we put your big data close to your compute so you don’t have to do it yourself. However, there’s been a small problem with that: most of the programs that are built to process that data such as AI training, document indexing, and other kinds of workloads expect to read data from a filesystem.&lt;/p&gt;

&lt;p&gt;Not to mention, big data means big data. Bigger than RAM. Bigger than your disk. Bigger than any one machine can have on any number of disks. Sometimes even bigger than human minds can imagine. What if that data was as easy to access as your code folder, but had unlimited storage?&lt;/p&gt;

&lt;p&gt;We’re proud to announce the immediate availability of &lt;a href="https://github.com/tigrisdata/tigrisfs" rel="noopener noreferrer"&gt;tigrisfs&lt;/a&gt;, the native filesystem interface for Tigris. This lets you mount Tigris buckets to your laptops, desktops, and servers so you can use data in your buckets as if it was local. This bridges the gap between the cloud and your machine.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Internally, tigrisfs is a fork of &lt;a href="https://github.com/yandex-cloud/geesefs" rel="noopener noreferrer"&gt;geesefs&lt;/a&gt;, another project that converts object storage buckets into mountable storage. geesefs has good performance and makes it easy to access the same bucket from the S3 API and the filesystem without obfuscating object names like juicefs. We have extended geesefs to leverage Tigris-specific features that improve throughput and latency. With tigrisfs you can use the S3 API or the filesystem interchangeably without having to worry about name mangling. tigrisfs is the canonical filesystem implementation for Tigris.&lt;br&gt;
-Ovais Tariq, CEO @ Tigris Data&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Your data: everywhere
&lt;/h2&gt;

&lt;p&gt;Let’s imagine that you have big data in your stack. Not just big, but &lt;em&gt;unimaginably big:&lt;/em&gt; we’re talking about data the size of Wikipedia, the entire Linux Kernel Mailing List archives, and the entire git history for all the big open source projects. Not to mention small datasets like every scientific paper from arXiv. tigrisfs lets you mount the same dataset in the same place on every machine in your cluster. Imagine just reading from &lt;code&gt;/mnt/tigris/datasets/raw/lkml&lt;/code&gt;, massaging the data a bit, and then writing it to &lt;code&gt;/mnt/tigris/datasets/massaged/lkml&lt;/code&gt; for the downstream analysis to run. We’ll go into more detail about this in the near future; keep an eye out for that!&lt;/p&gt;
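
&lt;p&gt;That kind of pipeline could be as simple as a shell loop (a sketch; &lt;code&gt;massage.py&lt;/code&gt; is a hypothetical stand-in for whatever processing you do):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir -p /mnt/tigris/datasets/massaged/lkml
for f in /mnt/tigris/datasets/raw/lkml/*; do
  python massage.py "$f" &gt; /mnt/tigris/datasets/massaged/lkml/"$(basename "$f")"
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
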

&lt;p&gt;The really cool part about this is that it lets you have a global filesystem on your local machine. All your data is just there and waiting to be used. If you write that massaged dataset to &lt;code&gt;/mnt/tigris/datasets/massaged/lkml&lt;/code&gt; on one machine, it’s instantly available to any other machine in the cluster. Any time it’s used, it’ll be seamlessly cached on the device so that it’s hot’n’ready for action! It’s like having a ReadWriteMany Kubernetes volume, but without having to set up Ceph.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dataset Type&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Bigger than any one machine can hold with any number of disks&lt;/td&gt;
&lt;td&gt;Wikipedia, Linux Kernel Mailing List archives, entire git history for all big open source projects, and every book published in the last 100 years&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bigger than any one disk&lt;/td&gt;
&lt;td&gt;The entire YouTube upload history of your favorite creator&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Smaller than RAM&lt;/td&gt;
&lt;td&gt;Every scientific paper from arXiv&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you’re dealing with anything bigger than RAM, tigrisfs is a great fit.&lt;/p&gt;

&lt;p&gt;One of the neat parts about tigrisfs is that using it means you can deal with your files using either the S3 API or the filesystem API. This is in contrast to other tools like JuiceFS which break files into blocks and obfuscate the filenames, meaning you need to spend time and energy reverse-engineering how the block → data mapping works. With tigrisfs you can &lt;code&gt;PUT&lt;/code&gt; an object into your bucket with the S3 API, and then open the file in your favorite text editor. This unlocks any number of fun integrations, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using &lt;code&gt;inotifywait&lt;/code&gt; to process data as it’s created in a bucket by your analytics pipeline&lt;/li&gt;
&lt;li&gt;Backing up your home folder with &lt;code&gt;rsync&lt;/code&gt; in a cronjob&lt;/li&gt;
&lt;li&gt;Using tools like &lt;code&gt;gzcat&lt;/code&gt; to read compressed data without having to decompress it&lt;/li&gt;
&lt;li&gt;Storing TLS certificates across the cluster so that one machine can renew it, and it’ll roll out to the rest of the machines instantly&lt;/li&gt;
&lt;li&gt;Reading your training datasets directly from disk instead of having to set up object storage with the datasets library&lt;/li&gt;
&lt;li&gt;Reading a raw video out of one bucket and compressing it for global distribution into another bucket using &lt;code&gt;ffmpeg&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
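
&lt;p&gt;For example, the &lt;code&gt;inotifywait&lt;/code&gt; integration could look something like this (a sketch; assumes inotify-tools is installed, the bucket is mounted at &lt;code&gt;/mnt/tigris/pitohui&lt;/code&gt;, and &lt;code&gt;incoming/&lt;/code&gt; is a hypothetical prefix your pipeline writes to):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# react to objects as soon as they finish being written
inotifywait -m -e close_write /mnt/tigris/pitohui/incoming |
  while read -r dir event file; do
    echo "processing $dir$file"
  done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
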

&lt;p&gt;Let’s say you want to edit your secret plans in your Linux VM on your MacBook. First, upload it to Tigris with &lt;code&gt;aws s3 cp&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ aws s3 cp secretplans.txt s3://pitohui
upload: ./secretplans.txt to s3://pitohui/secretplans.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you can view it like normal with the shell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;xe@pitohui:~ $ cat /mnt/tigris/pitohui/secretplans.txt
- world domination via the use of hypnodrones
- make there be such a thing as a free lunch
- create more paperclips
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And now you can do whatever you want! You can even do backups of your home folder with a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;xe@pitohui:~ $ rsync -av ~ /mnt/tigris/pitohui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cloud’s the limit!&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started with tigrisfs&lt;a href="https://www.tigrisdata.com/blog/tigrisfs#getting-started-with-tigrisfs" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;If you want to get started, all you need is an aarch64/x86_64 Linux system, a Tigris bucket, and a keypair.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing tigrisfs
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;One-liner install&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This will install the latest version of tigrisfs and its dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -sSL https://raw.githubusercontent.com/tigrisdata/tigrisfs/refs/heads/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Or, install using package manager&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Download the package from &lt;a href="https://github.com/tigrisdata/tigrisfs/releases" rel="noopener noreferrer"&gt;the most recent release&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Install the package using your package manager

&lt;ul&gt;
&lt;li&gt;Debian/Ubuntu: &lt;code&gt;sudo apt install ./tigrisfs-version.deb&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Alma Linux / Fedora / Red Hat / Rocky Linux: &lt;code&gt;sudo dnf install ./tigrisfs-version.rpm&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Mounting the filesystem
&lt;/h2&gt;

&lt;p&gt;We are going to assume that you have a bucket called &lt;code&gt;pitohui&lt;/code&gt; and you want to mount it to &lt;code&gt;/mnt/tigris/pitohui&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Open &lt;code&gt;/etc/default/tigrisfs&lt;/code&gt; in your favorite text editor as root, uncomment the &lt;code&gt;AWS_ACCESS_KEY&lt;/code&gt; and &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt; variables, and paste in the keypair you got from the Tigris dashboard.&lt;/p&gt;
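
&lt;p&gt;Once uncommented, those lines end up looking something like this (the values are placeholders for your own credentials):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AWS_ACCESS_KEY=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
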

&lt;h3&gt;
  
  
  &lt;strong&gt;Using the command line&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;First, create the directory you want to mount the bucket to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir -p /mnt/tigris/pitohui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then mount the bucket:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tigrisfs pitohui /mnt/tigris/pitohui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can do whatever you want, such as touching grass:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ touch /mnt/tigris/pitohui/grass
$ stat /mnt/tigris/pitohui/grass
  File: /mnt/tigris/pitohui/grass
  Size: 0               Blocks: 0       IO Block: 4096   regular empty file
Device: 80h/128d        Inode: 1631     Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/      xe)   Gid: ( 1000/     xe)
Context: system_u:object_r:fusefs_t:s0
Access: 2025-04-07 20:15:07.549222957 +0000
Modify: 2025-04-07 20:15:07.549222957 +0000
Change: 2025-04-07 20:15:07.549222957 +0000
 Birth: -
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And make sure it exists in Tigris:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ aws s3 ls s3://pitohui | grep grass
2025-04-07 16:15:07         0 grass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Perfect! If you are looking to crank up performance, there are a few configuration options that you can tweak. Take a look at the &lt;a href="https://www.tigrisdata.com/docs/training/tigrisfs/#maximizing-performance" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; for more details.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Using systemd&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If you’re in an environment with &lt;code&gt;systemd&lt;/code&gt;, mount your bucket with &lt;code&gt;systemctl enable --now&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo systemctl enable --now tigrisfs@bucketname.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your bucket will be available at &lt;code&gt;/mnt/tigris/bucketname&lt;/code&gt;. If you need things to be writable by your user account, edit the &lt;code&gt;OPTS&lt;/code&gt; line based on your account’s information. For example on my MacBook’s Oracle Linux VM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ iduid=1000(xe) gid=1000(xe) groups=1000(xe),10(wheel) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My user id (uid) is &lt;code&gt;1000&lt;/code&gt; and my group id (gid) is &lt;code&gt;1000&lt;/code&gt;, so to give my user permissions, I need this &lt;code&gt;OPTS&lt;/code&gt; line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Mount optionsOPTS="-o allow_other --gid=1000 --uid=1000"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives me permission to do whatever I want such as touching grass:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ touch /mnt/tigris/pitohui/grass$ stat /mnt/tigris/pitohui/grass File: /mnt/tigris/pitohui/grass Size: 0 Blocks: 0 IO Block: 4096 regular empty fileDevice: 80h/128d Inode: 1631 Links: 1Access: (0644/-rw-r--r--) Uid: ( 1000/ xe) Gid: ( 1000/ xe)Context: system_u:object_r:fusefs_t:s0Access: 2025-04-07 20:15:07.549222957 +0000Modify: 2025-04-07 20:15:07.549222957 +0000Change: 2025-04-07 20:15:07.549222957 +0000 Birth: -
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And make sure it exists in Tigris:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ aws s3 ls s3://pitohui | grep grass2025-04-07 16:15:07 0 grass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Perfect! If you are looking to crank up performance, there are a few configuration options that you can tweak. Take a look at the &lt;a href="https://www.tigrisdata.com/docs/training/tigrisfs/#maximizing-performance" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; for more details.&lt;/p&gt;

&lt;h2&gt;
  
  
  Under the hood&lt;a href="https://www.tigrisdata.com/blog/tigrisfs#under-the-hood" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;tigrisfs is a high performance FUSE filesystem adaptor for object storage. It is a fork of &lt;a href="https://github.com/yandex-cloud/geesefs" rel="noopener noreferrer"&gt;geesefs&lt;/a&gt;, which is itself a fork of &lt;a href="https://github.com/kahing/goofys" rel="noopener noreferrer"&gt;goofys&lt;/a&gt;. GeeseFS solves the performance problems that S3-backed FUSE filesystems typically have, especially with small files and metadata operations, by using aggressive parallelism and asynchrony. We have extended it to leverage Tigris-specific features that further improve throughput and latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Improvements over GeeseFS
&lt;/h3&gt;

&lt;p&gt;Our initial release zeroed in on hardening the codebase for production, focusing on two areas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Security hardening

&lt;ul&gt;
&lt;li&gt;Replaced the bundled, legacy AWS SDK that contained known CVEs&lt;/li&gt;
&lt;li&gt;Upgraded every dependency to its latest secure version&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Reliability upgrades

&lt;ul&gt;
&lt;li&gt;Eliminated all race conditions flagged by the Go race detector (now mandatory in tests)&lt;/li&gt;
&lt;li&gt;Fixed every linter warning and added lint checks to CI&lt;/li&gt;
&lt;li&gt;Dramatically expanded the test-suite and made the extended tests a default part of CI&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tigris-specific improvements
&lt;/h3&gt;

&lt;p&gt;We also shipped a few features that lean on Tigris internals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;POSIX semantics - permissions, special files, and symlinks now behave just like they do on a local disk.&lt;/li&gt;
&lt;li&gt;Turbo-charged small files - listing a directory automatically batch-fetches and caches tiny objects in a single round-trip.&lt;/li&gt;
&lt;li&gt;Smart prefetch - directory listings kick off background fetches so the next &lt;code&gt;cat&lt;/code&gt; or &lt;code&gt;grep&lt;/code&gt; feels instant.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In essence, tigrisfs bridges the gap between the Linux kernel and Tigris. It translates filesystem calls into S3 API calls so that you can explore your bucket from the shell, connecting the old world of servers and shells with the new world of dynamic infinity in the cloud.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F83tjyeu7owx0sg6783pp.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F83tjyeu7owx0sg6783pp.webp" width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmarks&lt;a href="https://www.tigrisdata.com/blog/tigrisfs#benchmarks" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Benchmarking filesystems is kind of annoying, and networked filesystems can be even more annoying to benchmark. Most of the time, you end up making a lot of assumptions about the system state and network configuration. Here are the specs of our benchmarking machine:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Quantity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Instance type&lt;/td&gt;
&lt;td&gt;VM.Standard.E5.Flex (Oracle Cloud)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU cores&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;24 gigabytes (24Gi)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network bandwidth&lt;/td&gt;
&lt;td&gt;24 gigabits per second&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Our benchmarks are done with the &lt;a href="https://github.com/axboe/fio" rel="noopener noreferrer"&gt;flexible I/O tester fio&lt;/a&gt;. Note that we are using direct I/O to avoid &lt;a href="https://en.wikipedia.org/wiki/Page_cache" rel="noopener noreferrer"&gt;page caching&lt;/a&gt; skewing the results.&lt;/p&gt;

&lt;h4&gt;
  
  
  Read performance&lt;a href="https://www.tigrisdata.com/blog/tigrisfs#read-performance" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h4&gt;

&lt;p&gt;Here is the command we used to test read performance on a bucket:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fio --name=read_throughput \
    --directory=/mnt/test-tigrisfs-bucket \
    --numjobs=4 \
    --size=4G \
    --time_based \
    --runtime=120s \
    --ramp_time=2s \
    --ioengine=libaio \
    --direct=1 \
    --verify=0 \
    --bs=1M \
    --iodepth=1 \
    --rw=read \
    --group_reporting=1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This has fio run for 2 minutes with 2 seconds of ramp-up time (during which the results are not counted in the statistics), reading up to 4 gigabytes of data per thread (job) in one-megabyte blocks, for a total of 16 gigabytes. The test was run in permutations of thread count and block size to see whether the bottleneck is tigrisfs, the Tigris service, or the machine’s network card.&lt;/p&gt;

&lt;p&gt;And we got these results for each permutation of the test:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Threads&lt;/th&gt;
&lt;th&gt;Block Size&lt;/th&gt;
&lt;th&gt;Throughput (MiB/sec)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;1630&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4M&lt;/td&gt;
&lt;td&gt;2446&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;2802 *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;4M&lt;/td&gt;
&lt;td&gt;2732 *&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;br&gt;
The throughput numbers with an asterisk next to them could theoretically be faster, but at this point we saturated the network card on the test machine.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Write performance&lt;a href="https://www.tigrisdata.com/blog/tigrisfs#write-performance" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h4&gt;

&lt;p&gt;Here is the command we used to test write performance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fio --name=write_throughput \
    --directory=/mnt/test-tigrisfs-bucket \
    --numjobs=8 \
    --size=4G \
    --time_based \
    --runtime=120s \
    --ramp_time=2s \
    --ioengine=libaio \
    --direct=1 \
    --verify=0 \
    --bs=4M \
    --iodepth=1 \
    --rw=write \
    --group_reporting=1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This has fio run for 2 minutes with 2 seconds of ramp-up time (during which the results are not counted in the statistics), trying to write up to 4 gigabytes of data per thread (job) in four-megabyte blocks; with the eight jobs shown, that's up to 32 gigabytes of data. The test was run across permutations of thread count and block size to see whether the bottleneck lies in tigrisfs, the Tigris service, or the network card of the machine.&lt;/p&gt;

&lt;p&gt;And we got these results for each permutation of the test:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Threads&lt;/th&gt;
&lt;th&gt;Block Size&lt;/th&gt;
&lt;th&gt;Throughput (MiB/sec)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;1118&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4M&lt;/td&gt;
&lt;td&gt;1119&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;1269&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;4M&lt;/td&gt;
&lt;td&gt;1279&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;To put that in perspective: at its peak, tigrisfs reads a single-layer DVD's worth of data roughly every 1.6 seconds and writes one about every 3.5 seconds, per machine. Combined with the caching that tigrisfs uses, this should be more than sufficient for anything you can throw at it.&lt;/p&gt;
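As a quick sanity check on those numbers (assuming a 4.7 GB single-layer DVD and taking the peak read and write rows from the tables above):

```python
# Back-of-the-envelope: how long does one single-layer DVD (4.7 GB) take
# at the peak throughputs measured above?
DVD_BYTES = 4.7e9        # single-layer DVD capacity in bytes
MIB = 1024 ** 2          # fio reports MiB/sec

peak_read_mib_s = 2802   # 8 threads, 1M blocks (network-card limited)
peak_write_mib_s = 1279  # 8 threads, 4M blocks

read_secs_per_dvd = DVD_BYTES / (peak_read_mib_s * MIB)
write_secs_per_dvd = DVD_BYTES / (peak_write_mib_s * MIB)

print(f"read:  one DVD every {read_secs_per_dvd:.1f} s")
print(f"write: one DVD every {write_secs_per_dvd:.1f} s")
```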

&lt;h2&gt;
  
  
  When should I use tigrisfs?&lt;a href="https://www.tigrisdata.com/blog/tigrisfs#when-should-i-use-tigrisfs" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;TigrisFS&lt;/th&gt;
&lt;th&gt;S3 API&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Legacy Tool Integration&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Direct Filesystem Access&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;On-Demand File Fetching&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Model Training&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Global Performance&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Personally, I use tigrisfs all the time on my own machines. One of the main things I use it for is running &lt;a href="https://www.tigrisdata.com/blog/anubis/" rel="noopener noreferrer"&gt;analytics across honeypot logs&lt;/a&gt; so that I can fight off evil scrapers and save the internet.&lt;/p&gt;

&lt;p&gt;In general, tigrisfs can be slower than the native disk for files that aren't cached yet, but it more than makes up for it by allowing you to make the location of your files irrelevant. All you need to do is run tigrisfs, and you have a single global namespace for your data across all your machines.&lt;/p&gt;

&lt;p&gt;tigrisfs is written in Go and is &lt;a href="https://github.com/tigrisdata/tigrisfs" rel="noopener noreferrer"&gt;open source on GitHub&lt;/a&gt;. We welcome any and all contributions to make it even better!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was originally published on July 1, 2025 at tigrisdata.com/blog&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>bigdata</category>
      <category>tigrisfs</category>
    </item>
    <item>
      <title>Data Time Travel with DuckLake and Tigris</title>
      <dc:creator>Shared Account</dc:creator>
      <pubDate>Wed, 25 Jun 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/tigrisdata/data-time-travel-with-ducklake-and-tigris-414f</link>
      <guid>https://dev.to/tigrisdata/data-time-travel-with-ducklake-and-tigris-414f</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhsxiiiu4vqjsl3eiivk0.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhsxiiiu4vqjsl3eiivk0.webp" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You’ve got your tunes in high gear, your editor is open, and you’re working on recreating some database tables with your AI agent in an epic duo. Your AI agent says “run this SQL query?” and you click yes. The tests pass, you make a PR that’s quickly stamped and merged, and then suddenly your pager goes off. And again. And again. You’ve just broken the analytics database and everything is on fire. What do you do?&lt;/p&gt;

&lt;p&gt;If you’re using most database engines, this is a priority zero “stop the world and restore from backups” shaped problem. This is especially annoying with analytics databases, because many times those databases aren’t just bigger than RAM, they’re bigger than your local disk, and sometimes bigger than any single disk can be. Worse, if your AI starts renaming columns or combining data in interesting ways, it can be quite the mess to untangle, and sometimes it’s impossible. In such cases, this is an XK-class end-of-the-project scenario, which is triple-plus ungood.&lt;/p&gt;

&lt;p&gt;However, you read &lt;a href="https://www.tigrisdata.com/blog/ducklake/" rel="noopener noreferrer"&gt;our last post on DuckLake&lt;/a&gt; and have been storing your analytics data in Tigris so you can get that sweet, sweet global performance. How do you go back to the past where everything Just Worked? Turns out it’s easy, no DeLorean required. All you have to do is reset the timeline with a couple of simple commands.&lt;/p&gt;

&lt;h2&gt;
  
  
  DuckLake and you&lt;a href="https://www.tigrisdata.com/blog/ducklake-time-travel#ducklake-and-you" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ducklake.select/" rel="noopener noreferrer"&gt;DuckLake&lt;/a&gt; is an analytics data lakehouse that lets you import SQL and NoSQL data so you can run SQL queries on it. One of the really cool parts about DuckLake is that when you do any INSERT or DELETE into DuckLake tables, you create a new snapshot of the database that you can roll back to. DuckLake turns your SQL database into an append-only-log.&lt;/p&gt;

&lt;p&gt;As an example, let’s create a DuckLake database backed by Tigris, insert some data and see what happens. First, &lt;a href="https://duckdb.org/docs/installation/" rel="noopener noreferrer"&gt;install DuckDB&lt;/a&gt; and then set up the DuckLake extension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSTALL ducklake;LOAD ducklake;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;Then attach to a new DuckLake database in Tigris:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ATTACH 'ducklake:timetravel_demo.ddb'AS delorean ( DATA_PATH 's3://xe-ducklake/delorean' );
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;Note: This creates the DuckLake metadata on the local filesystem which is fine for demos like this, but for production use we suggest putting your DuckLake metadata in the cloud with a Postgres database or &lt;a href="https://ducklake.select/docs/stable/duckdb/usage/choosing_a_catalog_database" rel="noopener noreferrer"&gt;one of the other backends DuckLake supports&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now that we have the database, let’s create a simple table and throw some data in it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE IF NOT EXISTS delorean.youtube_videos ( id TEXT NOT NULL PRIMARY KEY , title TEXT NOT NULL , channel TEXT NOT NULL );INSERT INTO delorean.youtube_videos ( id, title, channel )VALUES ( 'WcSCYzI2peM', 'Delfino Plaza (Super Mario Sunshine) - Mario Kart World', 'SiIvaGunner' ), ( 'W4AcveHnDzg', 'Retribution for the Eternal Night ~ Imperishable Night (Beta Mix) - Touhou 8: Imperishable Night', 'SiIvaGunner' );
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;Awesome, let’s see what the bucket looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ aws s3 ls s3://xe-ducklake/delorean/main/youtube_videos/2025-06-25 10:28:34 1175 ducklake-0197a77d-905b-7624-8e32-c80c69470e52.parquet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;Interesting: DuckLake created a parquet file under a prefix named after the table we inserted the data into. Let’s see what it contains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM 's3://xe-ducklake/delorean/main/youtube_videos/ducklake-0197a77d-905b-7624-8e32-c80c69470e52.parquet';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;id&lt;/th&gt;
&lt;th&gt;title&lt;/th&gt;
&lt;th&gt;channel&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;WcSCYzI2peM&lt;/td&gt;
&lt;td&gt;Delfino Plaza (Super Mario Sunshine) - Mario Kart World&lt;/td&gt;
&lt;td&gt;SiIvaGunner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;W4AcveHnDzg&lt;/td&gt;
&lt;td&gt;Retribution for the Eternal Night ~ Imperishable Night (Beta Mix) - Touhou 8: Imperishable Night&lt;/td&gt;
&lt;td&gt;SiIvaGunner&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is the key to how DuckLake works. Every time you write to one of its tables, it puts those rows you added into a parquet file in object storage. Let’s see the changes we made to the &lt;code&gt;delorean&lt;/code&gt; database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM ducklake_snapshots('delorean');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;snapshot_id&lt;/th&gt;
&lt;th&gt;snapshot_time&lt;/th&gt;
&lt;th&gt;schema_version&lt;/th&gt;
&lt;th&gt;changes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;2025-06-25 10:19:32.897-04&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{schemas_created=[main]}&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2025-06-25 10:27:33.45-04&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{tables_created=[main.youtube_videos]}&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2025-06-25 10:27:40.828-04&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{tables_dropped=[1]}&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2025-06-25 10:27:45.229-04&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{tables_created=[main.youtube_videos]}&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;2025-06-25 10:28:33.497-04&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{tables_inserted_into=[2]}&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The first change is creating the &lt;code&gt;main&lt;/code&gt; schema, and then you can see that while I was working on this article I made the &lt;code&gt;youtube_videos&lt;/code&gt; table, messed up the schema, dropped it, recreated it, and then inserted information about &lt;a href="https://youtu.be/W4AcveHnDzg" rel="noopener noreferrer"&gt;epic tunes&lt;/a&gt; into the table. To really show off this time travel power though, let’s delete the data and then add other data into the mix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DELETE FROM delorean.youtube_videos;INSERT INTO delorean.youtube_videos ( id, title, channel )VALUES ( 'jhl5afLEKdo', 'Hatsune Miku World is Mine / ryo（supercell)', 'Hatsune Miku' ), ( 'sqK-jh4TDXo', 'Machine Love (feat. Kasane Teto)', 'Jamie Page' );
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;So what happened to the database? Here’s what the table looks like now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM delorean.youtube_videos;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;id&lt;/th&gt;
&lt;th&gt;title&lt;/th&gt;
&lt;th&gt;channel&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;jhl5afLEKdo&lt;/td&gt;
&lt;td&gt;Hatsune Miku World is Mine / ryo（supercell)&lt;/td&gt;
&lt;td&gt;Hatsune Miku&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sqK-jh4TDXo&lt;/td&gt;
&lt;td&gt;Machine Love (feat. Kasane Teto)&lt;/td&gt;
&lt;td&gt;Jamie Page&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;But if you look in the bucket, the old data is still there:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM 's3://xe-ducklake/delorean/main/youtube_videos/*.parquet';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;id&lt;/th&gt;
&lt;th&gt;title&lt;/th&gt;
&lt;th&gt;channel&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;WcSCYzI2peM&lt;/td&gt;
&lt;td&gt;Delfino Plaza (Super Mario Sunshine) - Mario Kart World&lt;/td&gt;
&lt;td&gt;SiIvaGunner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;W4AcveHnDzg&lt;/td&gt;
&lt;td&gt;Retribution for the Eternal Night ~ Imperishable Night (Beta Mix) - Touhou 8: Imperishable Night&lt;/td&gt;
&lt;td&gt;SiIvaGunner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;jhl5afLEKdo&lt;/td&gt;
&lt;td&gt;Hatsune Miku World is Mine / ryo（supercell)&lt;/td&gt;
&lt;td&gt;Hatsune Miku&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sqK-jh4TDXo&lt;/td&gt;
&lt;td&gt;Machine Love (feat. Kasane Teto)&lt;/td&gt;
&lt;td&gt;Jamie Page&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;How do you get the old data back? Well for one, we can time travel &lt;em&gt;directly in SQL queries&lt;/em&gt;! Let’s look at the database snapshots again and try to figure out what happened:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM ducklake_snapshots('delorean');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;snapshot_id&lt;/th&gt;
&lt;th&gt;snapshot_time&lt;/th&gt;
&lt;th&gt;schema_version&lt;/th&gt;
&lt;th&gt;changes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;2025-06-25 10:28:33.497-04&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{tables_inserted_into=[2]}&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2025-06-25 10:38:19.959-04&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{tables_deleted_from=[2]}&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;2025-06-25 10:41:48.455-04&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{tables_inserted_into=[2]}&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;So it looks like the table was added to in snapshot 4, the data was deleted in snapshot 5, and the new data comes in at snapshot 6. Let’s get the superset of the data at snapshot 4 AND snapshot 6:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM delorean.youtube_videos AT (VERSION =&amp;gt; 4)UNION ALLSELECT * FROM delorean.youtube_videos AT (VERSION =&amp;gt; 6)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;id&lt;/th&gt;
&lt;th&gt;title&lt;/th&gt;
&lt;th&gt;channel&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;WcSCYzI2peM&lt;/td&gt;
&lt;td&gt;Delfino Plaza (Super Mario Sunshine) - Mario Kart World&lt;/td&gt;
&lt;td&gt;SiIvaGunner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;W4AcveHnDzg&lt;/td&gt;
&lt;td&gt;Retribution for the Eternal Night ~ Imperishable Night (Beta Mix) - Touhou 8: Imperishable Night&lt;/td&gt;
&lt;td&gt;SiIvaGunner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;jhl5afLEKdo&lt;/td&gt;
&lt;td&gt;Hatsune Miku World is Mine / ryo（supercell)&lt;/td&gt;
&lt;td&gt;Hatsune Miku&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sqK-jh4TDXo&lt;/td&gt;
&lt;td&gt;Machine Love (feat. Kasane Teto)&lt;/td&gt;
&lt;td&gt;Jamie Page&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You can see the power here right? The data is still safely stored in your bucket, so deletes &lt;em&gt;don’t matter&lt;/em&gt;. It may be more inconvenient to access the data, but you can also time travel for &lt;em&gt;the entire database at once&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ATTACH 'ducklake:timetravel_demo.ddb'AS delorean_past ( DATA_PATH 's3://xe-ducklake/delorean' , SNAPSHOT_VERSION 4 );SELECT * FROM delorean_past.youtube_videos;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;id&lt;/th&gt;
&lt;th&gt;title&lt;/th&gt;
&lt;th&gt;channel&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;WcSCYzI2peM&lt;/td&gt;
&lt;td&gt;Delfino Plaza (Super Mario Sunshine) - Mario Kart World&lt;/td&gt;
&lt;td&gt;SiIvaGunner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;W4AcveHnDzg&lt;/td&gt;
&lt;td&gt;Retribution for the Eternal Night ~ Imperishable Night (Beta Mix) - Touhou 8: Imperishable Night&lt;/td&gt;
&lt;td&gt;SiIvaGunner&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Advanced temporal mechanics&lt;a href="https://www.tigrisdata.com/blog/ducklake-time-travel#advanced-temporal-mechanics" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;You’re not limited to just running queries against the past, you can also connect to the database at a given point in time. This combined with making a local fork of the database lets you get into &lt;em&gt;advanced&lt;/em&gt; temporal mechanics.&lt;/p&gt;

&lt;p&gt;Let’s make a local copy of the database to debug the AI agent’s changes. Connect to the database in the past before the AI model messed things up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ATTACH 'ducklake:timetravel_demo.ddb'AS delorean_past ( DATA_PATH 's3://xe-ducklake/delorean' , SNAPSHOT_VERSION 4 );
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;Cool, then let’s make a local copy of it at that point in time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ATTACH 'ducklake:timetravel_local_copy.ddb'AS local_delorean ( DATA_PATH 's3://xe-ducklake/delorean' );COPY FROM DATABASE delorean TO local_delorean;DETACH delorean;DETACH local_delorean;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;The magic part of that is the &lt;code&gt;COPY FROM DATABASE&lt;/code&gt; instruction. That makes a copy of the database locally so you can debug the AI agent’s change and prevent a future timeline from coming to pass the way the current one did. Then for the cherry on top, attach the local database with the same name as the remote one so that your agent is none the wiser:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ATTACH 'ducklake:timetravel_local_copy.ddb'AS delorean ( DATA_PATH 's3://xe-ducklake/delorean' );
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;Et voila! We have successfully forked the timeline and can now make any change we want without affecting the main timeline. Test it by running a SQL query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM delorean.youtube_videos
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;id&lt;/th&gt;
&lt;th&gt;title&lt;/th&gt;
&lt;th&gt;channel&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;WcSCYzI2peM&lt;/td&gt;
&lt;td&gt;Delfino Plaza (Super Mario Sunshine) - Mario Kart World&lt;/td&gt;
&lt;td&gt;SiIvaGunner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;W4AcveHnDzg&lt;/td&gt;
&lt;td&gt;Retribution for the Eternal Night ~ Imperishable Night (Beta Mix) - Touhou 8: Imperishable Night&lt;/td&gt;
&lt;td&gt;SiIvaGunner&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The really cool part is that we did all that &lt;em&gt;without&lt;/em&gt; affecting the data in Tigris:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM 's3://xe-ducklake/delorean/main/youtube_videos/*.parquet';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;id&lt;/th&gt;
&lt;th&gt;title&lt;/th&gt;
&lt;th&gt;channel&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;WcSCYzI2peM&lt;/td&gt;
&lt;td&gt;Delfino Plaza (Super Mario Sunshine) - Mario Kart World&lt;/td&gt;
&lt;td&gt;SiIvaGunner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;W4AcveHnDzg&lt;/td&gt;
&lt;td&gt;Retribution for the Eternal Night ~ Imperishable Night (Beta Mix) - Touhou 8: Imperishable Night&lt;/td&gt;
&lt;td&gt;SiIvaGunner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;jhl5afLEKdo&lt;/td&gt;
&lt;td&gt;Hatsune Miku World is Mine / ryo（supercell)&lt;/td&gt;
&lt;td&gt;Hatsune Miku&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sqK-jh4TDXo&lt;/td&gt;
&lt;td&gt;Machine Love (feat. Kasane Teto)&lt;/td&gt;
&lt;td&gt;Jamie Page&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When we made a copy of the data lake to hack on locally, all we were copying was the schemata of the database and references to objects in Tigris. At some level, the tables don’t physically exist; they’re really just a bunch of rules and references that DuckLake uses to rebuild your database on the fly. And because the tables quack enough like SQL tables, the illusion is maintained!&lt;/p&gt;
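That "rules and references" model is easy to sketch. Here's a toy illustration (not DuckLake's actual implementation, just the shape of the idea) of how an append-only snapshot log can rebuild a table, delete included, without ever rewriting a data file:

```python
# Toy model of a snapshot log: each snapshot is the list of data files that
# make up the table at that version. Writes append files; a DELETE appends a
# snapshot that simply stops referencing them; nothing is ever rewritten.
files = {
    "part-0.parquet": ["WcSCYzI2peM", "W4AcveHnDzg"],  # first INSERT
    "part-1.parquet": ["jhl5afLEKdo", "sqK-jh4TDXo"],  # second INSERT
}

snapshots = [
    ["part-0.parquet"],  # after the first INSERT
    [],                  # after DELETE: the file still exists, unreferenced
    ["part-1.parquet"],  # after the second INSERT
]

def table_at(version: int) -> list[str]:
    """Rebuild the table's rows by following a snapshot's file references."""
    return [row for f in snapshots[version] for row in files[f]]

print(table_at(0))  # time travel: the pre-delete table is still reconstructible
print(table_at(2))  # the current table
```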

&lt;p&gt;Another really cool thing is that every INSERT or UPDATE operation results in discrete parquet files being put into Tigris:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ aws s3 ls s3://xe-ducklake/delorean/main/youtube_videos/2025-06-25 10:28:34 1175 ducklake-0197a77d-905b-7624-8e32-c80c69470e52.parquet2025-06-25 10:41:49 975 ducklake-0197a789-b1b2-797e-8bd5-a20905d2d73f.parquet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;These parquet files are written once and never updated. This means that as your analytics pipelines or developers all over the world access them, reads are automatically fast and local thanks to Tigris’ global performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Victory achieved! &lt;a href="https://www.tigrisdata.com/blog/ducklake-time-travel#victory-achieved" rel="noopener noreferrer"&gt;​&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Now you can turn your analytics pipeline back on and go on with hacking up a storm. Next time you make changes with your AI agent, though, test them against a backup of the data lake just in case things go pear-shaped again. Make sure your AI isn’t renaming columns, deleting tables, or changing the structure. Ideally your schema changes should only ever add columns, never remove them: treat your database tables like a public API.&lt;/p&gt;
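The "only add columns" rule can even be checked mechanically before you apply an agent's migration. A minimal sketch (a hypothetical helper, not part of DuckLake) that rejects a proposed schema unless it is purely additive:

```python
def is_additive(old_columns: set[str], new_columns: set[str]) -> bool:
    """A schema change is additive when every existing column survives."""
    return old_columns <= new_columns  # subset check: nothing renamed or dropped

old = {"id", "title", "channel"}

assert is_additive(old, old | {"published_at"})         # adding a column: OK
assert not is_additive(old, {"id", "name", "channel"})  # rename title -> name: rejected
assert not is_additive(old, {"id", "title"})            # dropped column: rejected
```

Wiring a check like this into CI keeps the tokens from quietly breaking your public API.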

&lt;p&gt;To make a local backup of your data lake so your agent can break whatever the tokens deem worthy without breaking prod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ATTACH 'ducklake:timetravel_local_copy.ddb'AS local_delorean ( DATA_PATH 's3://xe-ducklake/delorean' );COPY FROM DATABASE delorean TO local_delorean;DETACH delorean;DETACH local_delorean;ATTACH 'ducklake:timetravel_local_copy.ddb'AS delorean ( DATA_PATH 's3://xe-ducklake/delorean' );
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;And to attach to DuckLake in read-only mode, so the agent can’t break anything even if it wants to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ATTACH 'ducklake:timetravel_demo.ddb'AS stone_tablet ( DATA_PATH 's3://xe-ducklake/delorean' , READ_ONLY -- &amp;lt;- attaches the ducklake database in read-only mode );
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;p&gt;And then you can go back to fearless vibe coding to make your dreams come true in the form of B2B SaaS! All your data will be safe in the cloud and fast to load anywhere in the world, even if you need to time travel a bit to get things working again.&lt;/p&gt;

&lt;h3&gt;
  
  
  Analytics databases with time travel!
&lt;/h3&gt;

&lt;p&gt;Tigris lets you store your data everywhere, including your analytics data. When you use Tigris and DuckLake together, you get global performance to rival the cloud giants at a fraction of the cost. Query data from the past to bring you to a better future!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tigrisdata.com/docs/get-started/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff784t7kvuj2zms095m7a.png" alt="Want to try it out? Get Started" width="800" height="128"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was originally published on June 25, 2025 at tigrisdata.com/blog&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>bigdata</category>
      <category>datalake</category>
      <category>duckdb</category>
    </item>
    <item>
      <title>Announcing the Tigris MCP server</title>
      <dc:creator>Shared Account</dc:creator>
      <pubDate>Tue, 24 Jun 2025 22:12:33 +0000</pubDate>
      <link>https://dev.to/tigrisdata/announcing-the-tigris-mcp-server-389g</link>
      <guid>https://dev.to/tigrisdata/announcing-the-tigris-mcp-server-389g</guid>
      <description>&lt;p&gt;One of the great things about modern AI editor workflows is how it makes it&lt;br&gt;
easier to get started. Normally when you open a text editor, you have an empty&lt;br&gt;
canvas and don’t know where to start. AI tools let you describe what you want&lt;br&gt;
and help you get started doing it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“We’ve all been excited about AI editors making development fast and just plain fun.”&lt;br&gt;
-Most developers, probably&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt99lqhl24tteyonyh8n.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt99lqhl24tteyonyh8n.webp" alt="A robotic blue tiger using tools to work on an engine."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;
  &lt;small&gt;
    &lt;em&gt;A robotic blue tiger using tools to work on an engine.&lt;/em&gt;
  &lt;/small&gt;
&lt;/center&gt;

&lt;p&gt;Today we’re happy to announce that we’re making it even easier to get started with Tigris in your AI editor workflow. If you want to skip to the part where you can plug configs into your AI editor, head to the Getting Started section below and get off to vibe coding your next generation B2B SaaS as a service.&lt;/p&gt;

&lt;p&gt;Abdullah just started at Tigris a week ago (welcome!) and has already built something that makes it easier to make object storage a native part of your development workflow: a Model Context Protocol (MCP) server for Tigris. This lets you manage your buckets and objects with plain language in your AI-capable editor.&lt;/p&gt;

&lt;p&gt;Just say “make me a bucket for this project” and it’ll go do that. Want files in the bucket? Just ask it to upload a file; it’ll make it happen.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/ukcQY65cc34"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  The vision
&lt;/h2&gt;

&lt;p&gt;We want your developer experience with Tigris to be as seamless, unsurprising, and natural as possible. What’s more natural than natural language? Getting this set up was a breeze: Tigris is compatible with S3, so all Abdullah had to do was glue S3 calls to the MCP library. Everything was already there, well-tested, and ready to go.&lt;/p&gt;

&lt;p&gt;Of note: many other MCP servers try to do much more than they need to. Ours just does object storage. There’s no possibility of it spinning up expensive servers and saddling you with a surprise bill with an unreasonable number of zeroes in it.&lt;/p&gt;

&lt;h2&gt;Getting started&lt;/h2&gt;

&lt;p&gt;To get started, create some &lt;a href="https://console.tigris.dev/createaccesskey" rel="noopener noreferrer"&gt;access keys&lt;/a&gt; and then install our MCP server:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Edit your config file&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Add this snippet to your &lt;code&gt;claude_desktop_config.json&lt;/code&gt;, or to &lt;code&gt;mcp.json&lt;/code&gt; for Cursor AI:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tigris-mcp-server"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@tigrisdata/tigris-mcp-server"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"AWS_ACCESS_KEY_ID"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YOUR_AWS_ACCESS_KEY_ID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"AWS_SECRET_ACCESS_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YOUR_AWS_SECRET_ACCESS_KEY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"AWS_ENDPOINT_URL_S3"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://fly.storage.tigris.dev"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Run the init script&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run the init script in your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npx -y @tigrisdata/tigris-mcp-server init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then ask your editor to make you a bucket for a project and it will! More instructions are on the official npm package &lt;a href="https://www.npmjs.com/package/@tigrisdata/tigris-mcp-server" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Trust&lt;/h2&gt;

&lt;p&gt;AI editors and tooling are really cool, but there are some key things you should be aware of before you blindly trust them. The Model Context Protocol ecosystem is still very new, so there will almost certainly be problems we solve together over time. There are also inherent risks involved in giving any tool access to your cloud storage accounts or filesystem.&lt;/p&gt;

&lt;p&gt;The Model Context Protocol server runs with the same level of sandboxing as your editor. Be careful with what you install, and always double-check what you run before you run it.&lt;/p&gt;

&lt;p&gt;To make this as safe as possible, we’ve also made the Model Context Protocol server available as a Docker container. This means you can run it in a sandboxed environment without worrying about it having access to your whole local filesystem. You can run the container with access to only a specific directory, which is a great way to make sure the server can only touch the files you want it to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Edit your config file&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Add this snippet to your &lt;code&gt;claude_desktop_config.json&lt;/code&gt;, or to &lt;code&gt;mcp.json&lt;/code&gt; for Cursor AI. Note that &lt;code&gt;CURRENT_USER&lt;/code&gt; references the user running the command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tigris-mcp-server"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"docker"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"-e"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"AWS_ACCESS_KEY_ID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"-e"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"AWS_SECRET_ACCESS_KEY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"-e"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"AWS_ENDPOINT_URL_S3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"--network"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"host"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"--name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"tigris-mcp-server-claude-for-desktop"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;tigris-mcp-server-cursor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Cursor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;AI&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"-i"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"-v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"tigris-mcp-server:/app/dist"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"--rm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"--mount"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"type=bind,src=/Users/CURRENT_USER/tigris-mcp-server,dst=/Users/CURRENT_USER/tigris-mcp-server"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"quay.io/tigrisdata/tigris-mcp-server:latest"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"AWS_ACCESS_KEY_ID"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YOUR_AWS_ACCESS_KEY_ID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"AWS_SECRET_ACCESS_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YOUR_AWS_SECRET_ACCESS_KEY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"AWS_ENDPOINT_URL_S3"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://fly.storage.tigris.dev"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Run the init script&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run the init script in your terminal and select &lt;strong&gt;Docker&lt;/strong&gt; as the option when prompted:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npx -y @tigrisdata/tigris-mcp-server init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;p&gt;This Model Context Protocol server will run with the full power and authority of any credentials you give it. Be very careful with typos: object names that are only a few tokens apart are easy for a model to mix up.&lt;/p&gt;

&lt;p&gt;Additionally, AI tools are fundamentally built around nondeterministic behavior and will produce unexpected results at times. Sometimes it takes the AI a couple of tries to figure out what you want. Be very careful, as typos in an AI context can have much more drastic consequences than they do in normal contexts. We don’t want you to lose data you need, so as a guardrail, the &lt;code&gt;DeleteBucket&lt;/code&gt; call is &lt;em&gt;not allowed&lt;/em&gt; to succeed unless the bucket is empty.&lt;/p&gt;
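That emptiness check is an easy guardrail to picture. Here is a minimal sketch in plain Python (illustrative only, not the actual Tigris MCP server code) of a DeleteBucket-style call refusing to remove a non-empty bucket:

```python
# Illustrative-only sketch of the "DeleteBucket only succeeds on empty
# buckets" guardrail; this is not the actual Tigris MCP server code.

class BucketNotEmptyError(Exception):
    """Raised when deleting a bucket that still holds objects."""

def delete_bucket(buckets, name):
    """Remove bucket `name`, but refuse if it still contains objects."""
    if buckets[name]:  # any objects left means the delete is rejected
        raise BucketNotEmptyError(
            "bucket %r still has %d object(s)" % (name, len(buckets[name]))
        )
    del buckets[name]

buckets = {"scratch": {}, "photos": {"rick.jpg": b"jpeg-bytes"}}
delete_bucket(buckets, "scratch")   # fine: the bucket was empty
try:
    delete_bucket(buckets, "photos")
except BucketNotEmptyError as err:
    print(err)                      # the non-empty bucket survives
```

Even if the model fat-fingers a bucket name, the worst case is an error message rather than lost data.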

&lt;p&gt;In order to be as transparent as possible, we’ve made our Model Context Protocol server &lt;a href="https://github.com/tigrisdata/tigris-mcp-server" rel="noopener noreferrer"&gt;open source&lt;/a&gt; and are actively monitoring that repository.&lt;/p&gt;

&lt;p&gt;We’re making this as safe and reliable as possible. Part of that is the scope reduction we mentioned earlier: we only manage your buckets and objects. The other part is going out of our way to make this tool as boring as possible. Boring code is easy to understand, easy to maintain, and easy to learn from. We hope this helps you build the exciting parts of your program while leaving the boilerplate to the machines.&lt;/p&gt;

&lt;p&gt;We hope this will make using Tigris absolutely frictionless and that you can&lt;br&gt;
learn how S3’s API works in the process. Not to mention, we want you to get out&lt;br&gt;
there and build things!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tigrisdata.com/docs/get-started/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff784t7kvuj2zms095m7a.png" alt="Want to try it out? Get Started"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was originally published on April 3, 2025 at tigrisdata.com/blog&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Global by Design: Tigris's Distributed Object Storage Architecture</title>
      <dc:creator>Shared Account</dc:creator>
      <pubDate>Tue, 24 Jun 2025 22:00:01 +0000</pubDate>
      <link>https://dev.to/tigrisdata/global-by-design-tigriss-distributed-object-storage-architecture-48m1</link>
      <guid>https://dev.to/tigrisdata/global-by-design-tigriss-distributed-object-storage-architecture-48m1</guid>
      <description>&lt;p&gt;At Tigris, globally replicated object storage is our &lt;em&gt;thing&lt;/em&gt;. But why should you want your objects "globally replicated"? Today I'm gonna peel back the curtain and show you how Tigris keeps your objects exactly where you need them, when you need them, by default.&lt;/p&gt;

&lt;p&gt;Global replication matters because computers are ephemeral and there's a tradeoff between performance and reliability. But does there have to be?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia0.giphy.com%2Fmedia%2Fv1.Y2lkPTc5MGI3NjExYnF4bmRqZ2F0dDZ0Zms0dnM5d2Rod3MzbDZoMXlyaDNjdnA1eDNpaSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw%2FhelFH183K0MJh0XePb%2Fgiphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia0.giphy.com%2Fmedia%2Fv1.Y2lkPTc5MGI3NjExYnF4bmRqZ2F0dDZ0Zms0dnM5d2Rod3MzbDZoMXlyaDNjdnA1eDNpaSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw%2FhelFH183K0MJh0XePb%2Fgiphy.gif" width="480" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Storage devices can and will degrade over time. Your CPUs aren't immune either: recent &lt;a href="https://community.intel.com/t5/Blogs/Tech-Innovation/Client/Intel-Core-13th-and-14th-Gen-Desktop-Instability-Root-Cause/post/1633239" rel="noopener noreferrer"&gt;Intel desktop CPUs&lt;/a&gt; have been known to degrade and return spontaneous errors in code that should work. Your datacenters could be hit by a meteor or a pipe could burst: being in the cloud doesn't mean perfect reliability. But failovers and multiple writes take precious time. We write your data to 11 regions based on access patterns, so you get low latency (and therefore higher user retention) without sacrificing reliability.&lt;/p&gt;

&lt;p&gt;Here's how Tigris globally replicates your data; but first, let's cover the easy and hard problems of object storage.&lt;/p&gt;

&lt;h2&gt;Object storage 101&lt;/h2&gt;

&lt;p&gt;At its core, object storage is an unopinionated database. You give it data, metadata, and a key name, then it stores it. When you want the data or metadata back, you give the key and it gives you what you want. This is really the gist of it, and you can summarize most of the uses of object storage in these calls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PutObject - add a new object to a bucket&lt;/li&gt;
&lt;li&gt;GetObject - get the data and metadata for an object in a bucket&lt;/li&gt;
&lt;li&gt;HeadObject - get the metadata for an object in that bucket&lt;/li&gt;
&lt;li&gt;DeleteObject - banish an object to the shadow realm, removing it from the bucket&lt;/li&gt;
&lt;li&gt;ListObjectsV2 - list the metadata of a bunch of objects in a bucket based on the key&lt;/li&gt;
&lt;/ul&gt;
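To make those semantics concrete, here's a toy in-memory model of the five calls (plain Python purely for illustration; a real client would use an S3 SDK pointed at your provider's endpoint):

```python
# Toy in-memory model of the five core object storage calls. Purely
# illustrative; a real client would use an S3 SDK, not a local dict.

class Bucket:
    def __init__(self):
        self._objects = {}  # maps key to (data, metadata)

    def put_object(self, key, data, metadata=None):
        """PutObject: add (or overwrite) an object in the bucket."""
        self._objects[key] = (data, dict(metadata or {}))

    def get_object(self, key):
        """GetObject: return (data, metadata) for a key."""
        return self._objects[key]

    def head_object(self, key):
        """HeadObject: return only the metadata for a key."""
        return self._objects[key][1]

    def delete_object(self, key):
        """DeleteObject: remove a key; deletes are idempotent, as in S3."""
        self._objects.pop(key, None)

    def list_objects_v2(self, prefix=""):
        """ListObjectsV2: list keys matching a prefix, in sorted order."""
        return sorted(k for k in self._objects if k.startswith(prefix))

b = Bucket()
b.put_object("pics/rick.jpg", b"\xff\xd8", {"Content-Type": "image/jpeg"})
print(b.head_object("pics/rick.jpg"))     # {'Content-Type': 'image/jpeg'}
print(b.list_objects_v2(prefix="pics/"))  # ['pics/rick.jpg']
b.delete_object("pics/rick.jpg")
print(b.list_objects_v2())                # []
```

Real S3 layers pagination, versioning, and error codes on top, but the mental model is the same key-value shape.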

&lt;p&gt;This is the core of how object storage is used. The real fun comes in when you create a bucket. A bucket is the place where all your objects are stored. It's kind of like putting a bunch of shells in a bucket when you're at the beach.&lt;/p&gt;

&lt;p&gt;Most object storage systems make you choose up front where in the world you want to store your objects. They have regions all over the world, but if you create a bucket in us-east-1, the data lives and dies in us-east-1. Sure, there are ways to work around this, like bucket replication, but then you have to pay to store multiple copies and wait for cross-region replication to get around to copying your object over. Tigris takes a different approach: your objects are dynamically placed by default.&lt;/p&gt;

&lt;p&gt;Tigris has &lt;a href="https://www.tigrisdata.com/docs/concepts/regions/" rel="noopener noreferrer"&gt;servers all over the world&lt;/a&gt;. Each of those regions might have any given object, and they might not (unless you restrict the regions to comply with laws like GDPR). What happens when you request an object that doesn't exist locally?&lt;/p&gt;

&lt;h2&gt;How Tigris does global replication&lt;/h2&gt;

&lt;p&gt;Tigris uses a hybrid approach here: it eagerly pushes metadata out to every region, but only pulls the data when it's explicitly requested. We use FoundationDB as our database.&lt;/p&gt;

&lt;p&gt;In Tigris we have three tiers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SSD cache: near-instant responses for either data+metadata or just the metadata&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.foundationdb.org/" rel="noopener noreferrer"&gt;FoundationDB&lt;/a&gt;: fast but transactional responses for data+metadata if the object is inlined into the FoundationDB record, otherwise just the metadata&lt;/li&gt;
&lt;li&gt;Block storage: higher-latency responses for objects that are not in the SSD cache&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Overall it looks kinda like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3987xepynzlddu97k9xc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3987xepynzlddu97k9xc.jpg" alt="diagram of distribution" width="719" height="490"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's see what happens when a user uploads a file to a bucket:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4zlfstvbsg8wccv7qwrr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4zlfstvbsg8wccv7qwrr.jpg" alt="diagram of distribution" width="717" height="479"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The user uploads the picture of Rick Astley and its corresponding metadata. The two are handled separately: the picture is put into block storage (and maybe the SSD cache), while the metadata is stored directly in FoundationDB. Then the metadata is queued for replication.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fax401qnrv4sb20erv6cq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fax401qnrv4sb20erv6cq.jpg" alt="diagram of distribution" width="720" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A backend service handles our replication model. When it sees a new record in the replication queue, it eagerly pushes out the metadata to every other region.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkv5vtfmm46u8qyv05s1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkv5vtfmm46u8qyv05s1.jpg" alt="diagram of metadata being pushed to every other region" width="722" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The really cool part about how this works under the hood is that the database is itself the message queue. Time is an ordered phenomenon*, and FoundationDB is an ordered datastore, so each replication queue entry embeds the object's creation time in its key name.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;br&gt;
*Okay, yes, there are issues like time dilation when you're away from a large source of mass like the Earth (this is noticeable in the atomic clocks aboard the GPS satellites in medium Earth orbit), and if you're on a spaceship travelling near the speed of light. However, I'm talking about time in a vacuum with a nearby source of great mass, perfectly spherical cows, and whatnot, so it's really not an issue for this example.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This database-as-a-queue is based on how &lt;a href="https://www.foundationdb.org/files/QuiCK.pdf" rel="noopener noreferrer"&gt;iCloud's global replication works&lt;/a&gt;. It gives us a couple of key advantages compared to using something like Postgres plus Kafka:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Data can be stored and queued for replication in the same transaction, meaning we don't have to coordinate transactional successes and failures between two systems.&lt;/li&gt;
&lt;li&gt;Tigris is already an expert in running FoundationDB, so we can reuse that experience for the message queue, making this a lot less complicated in practice.&lt;/li&gt;
&lt;/ol&gt;
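The database-as-a-queue idea fits in a few lines. Under an assumed key layout (not Tigris's actual schema), enqueuing writes a key that embeds the creation time, and the replication worker just scans the key range in order:

```python
# Sketch of a database-as-a-queue: an ordered key space where each queue
# entry's key embeds the object's creation timestamp, so scanning the
# keys in order yields replication work oldest-first. FoundationDB keeps
# keys sorted natively; a plain dict plus sorted() stands in for that
# here. The key layout is an assumption for illustration only.

store = {}  # stand-in for an ordered key-value store

def enqueue(obj_key, created_at):
    # Zero-padded fixed-width timestamps make lexicographic order match
    # time order, so the "queue" is just a key range scan.
    store[f"repl/{created_at:020.6f}/{obj_key}"] = obj_key

def drain():
    """Consume queue entries oldest-first, like the replication worker."""
    for k in sorted(store):  # FoundationDB would hand us this order directly
        yield store.pop(k)

enqueue("rickastley.jpg", 1700000001.0)
enqueue("never-gonna.gif", 1700000000.5)
print(list(drain()))  # ['never-gonna.gif', 'rickastley.jpg']
```

Because the "queue" lives in the same transactional store as the metadata, writing an object and scheduling its replication commit or fail together.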

&lt;p&gt;This isn't a free lunch, there's one sharp edge that you may run into: that replication takes a nonzero amount of time. It usually takes single digit seconds at most, which is more than sufficient for most applications. We're working on ways to do better though!&lt;/p&gt;
&lt;h2&gt;The secret fourth tier&lt;/h2&gt;

&lt;p&gt;Remember how I said that Tigris has three tiers: block storage, SSD cache, and inline FoundationDB rows? There's actually a secret fourth tier: other Tigris regions. This is the key to how Tigris makes your data truly global.&lt;/p&gt;

&lt;p&gt;Let's say you upload the pic of Rick to San Jose and someone requests it from Chicago. First, the data is put into San Jose's block storage layer and the metadata is queued for replication.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe83x03uz2p4tu3t7co0u.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe83x03uz2p4tu3t7co0u.jpg" alt="diagram of distribution" width="718" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There's a dirty trick going on in the metadata, let's double click on it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Metadata:

Name: rickastley.jpg
Size: 63,178 bytes
Cache: forever
Regions: SJC
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every bit of metadata contains a reference to block storage. The cool part is that any Tigris region can pull from the block storage service in any other region, then store the object in its local cache layer like normal.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fut0mt9onumsfs0h5skfq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fut0mt9onumsfs0h5skfq.jpg" alt="diagram of distribution" width="720" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once it's done, it updates the metadata for the object to tell other Tigris regions that it has a copy and queues that for replication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Metadata:

Name: rickastley.jpg
Size: 63,178 bytes
Cache: forever
Regions: SJC, ORD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means there are actually four tiers: FoundationDB, SSD cache, local block storage, and remote regions' block storage.&lt;/p&gt;
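To make the tiering concrete, here's a toy sketch of the read path under assumed names and structures (not Tigris internals): check the local tiers first, then fall back to a remote region's block storage and record the new copy in the metadata:

```python
# Toy sketch of the tiered read path, including the "secret fourth tier"
# of pulling from another region's block storage. All names and data
# structures here are illustrative, not Tigris internals.

ssd_cache = {"ORD": {}, "SJC": {"rickastley.jpg": b"jpeg-bytes"}}
block_store = {"ORD": {}, "SJC": {"rickastley.jpg": b"jpeg-bytes"}}
metadata = {"rickastley.jpg": {"regions": ["SJC"]}}  # replicated everywhere

def get_object(local, key):
    if key in ssd_cache[local]:        # tiers 1-2: local cache / inline rows
        return ssd_cache[local][key]
    if key in block_store[local]:      # tier 3: local block storage
        return block_store[local][key]
    remote = metadata[key]["regions"][0]    # tier 4: another region has it
    data = block_store[remote][key]         # cross-region pull
    ssd_cache[local][key] = data            # cache it locally like normal
    metadata[key]["regions"].append(local)  # tell other regions about the copy
    return data

get_object("ORD", "rickastley.jpg")           # first read: pulled from SJC
print(metadata["rickastley.jpg"]["regions"])  # ['SJC', 'ORD']
```

After the first cross-region read, Chicago serves the object from its own cache and the updated metadata tells every other region it holds a copy.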

&lt;p&gt;There's also a neat trick we can do with this. We can have one of our regions get hit by a meteor and come out on the other side of it smiling. Take a look at this series of unfortunate events. Let's say you upload the pic of Rick and then SJC gets wiped off the internet map:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6la5idc915t695ow8hcy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6la5idc915t695ow8hcy.jpg" alt="cartoon asteroid destroys the SJC region" width="719" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The metadata was already replicated and the data was uploaded to block storage, so it doesn't matter.&lt;/p&gt;

&lt;p&gt;The user in Chicago can still access the picture because the Chicago region is just accessing the copy of the image in block storage. The block storage service runs in the same region as the Tigris frontend, but on a different provider. Combined with other dirty internet tricks like anycast routing, this means we can lose entire regions and the only evidence is our status page, or uploads and downloads being a tiny bit slower until the region comes back up.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbl5gneeih8nd39n4nsfm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbl5gneeih8nd39n4nsfm.jpg" alt="diagram of distribution of metadata which is already protected" width="718" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is what sold me on Tigris enough to want to work with them. This ridiculous level of redundancy, global replication, and caching is the key to how Tigris stands apart from the crowd. What I think is the best part, though, is how you enable all of this:&lt;/p&gt;

&lt;p&gt;All you have to do is create a bucket and put objects into it. This global replication is on by default. You don't have to turn it on. It just works.&lt;/p&gt;

&lt;h2&gt;What's Configurable?&lt;/h2&gt;

&lt;p&gt;What about the GDPR? Some European countries want their companies to store European data in Europe. We support that. When you create a bucket, you can attach an X-Tigris-Regions header that restricts the objects so that the data lives and dies in Europe. You can do this when you create objects too. See &lt;a href="https://www.tigrisdata.com/docs/objects/object_regions/#restricting-to-specific-regions" rel="noopener noreferrer"&gt;restricting to specific regions&lt;/a&gt; for more information. When someone outside of the EU views the objects, Tigris will just reverse proxy it over. It'll be slower, but the data will not be replicated outside of the EU. This works for individual regions too, just in case you need your hockey game pictures to only ever be stored in Newark.&lt;/p&gt;
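In SDK terms, a restriction like this amounts to attaching the X-Tigris-Regions header to the CreateBucket request. Here's a minimal sketch of such a header-injecting handler; the boto3 registration shown in comments is an assumed wiring (check the Tigris docs for the exact mechanism), and the region list is a hypothetical example:

```python
# Sketch: a handler that pins a new bucket's data to specific regions by
# setting the X-Tigris-Regions header. The handler itself is plain
# Python; the boto3 registration in comments is an assumed wiring, and
# the region codes are hypothetical examples.

def add_tigris_regions(request, regions="fra", **kwargs):
    """Set the region-restriction header on an outgoing CreateBucket request."""
    request.headers["X-Tigris-Regions"] = regions
    return request

# With boto3 this could be wired up roughly like so (not run here):
#   s3 = boto3.client("s3", endpoint_url="https://fly.storage.tigris.dev")
#   s3.meta.events.register(
#       "before-sign.s3.CreateBucket",
#       lambda request, **kw: add_tigris_regions(request, regions="fra"),
#   )

class FakeRequest:
    """Minimal stand-in for an SDK's prepared-request object."""
    def __init__(self):
        self.headers = {}

req = add_tigris_regions(FakeRequest(), regions="fra,lhr")
print(req.headers["X-Tigris-Regions"])  # fra,lhr
```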

&lt;p&gt;Sometimes you need eager caching on PUT. We support that with the &lt;a href="https://www.tigrisdata.com/docs/objects/caching/#caching-on-put-eager-caching" rel="noopener noreferrer"&gt;accelerate flag&lt;/a&gt;. When you upload a picture to ORD, it'll get pushed out all over the world for you:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8s2di64uvg057s9rjdcq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8s2di64uvg057s9rjdcq.jpg" alt="diagram of the uploaded image being pushed all over the world for you" width="719" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This gives you all the latency advantages of a decentralized architecture as well as the simplicity of a traditional centralized one. It's really the best of both worlds.&lt;/p&gt;

&lt;p&gt;Wanna use Tigris for your workloads, be they AI, conventional, or even for offsite backups? Get started today at &lt;a href="https://storage.new/" rel="noopener noreferrer"&gt;storage.new&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tigrisdata.com/docs/get-started/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff784t7kvuj2zms095m7a.png" alt="Want to try it out? Get Started" width="800" height="128"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was originally published on April 1, 2025 at tigrisdata.com/blog&lt;/em&gt;&lt;/p&gt;

</description>
      <category>scalability</category>
      <category>foundationdb</category>
      <category>replication</category>
      <category>objectstorage</category>
    </item>
  </channel>
</rss>
