<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ramin Najjarbashi</title>
    <description>The latest articles on DEV Community by Ramin Najjarbashi (@raminnietzsche).</description>
    <link>https://dev.to/raminnietzsche</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1053598%2Fdc1aca15-37d9-4d9b-b934-665b023a9465.jpg</url>
      <title>DEV Community: Ramin Najjarbashi</title>
      <link>https://dev.to/raminnietzsche</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/raminnietzsche"/>
    <language>en</language>
    <item>
      <title>The Great Orphan Object Hunt: How I Fixed Our Cloud Storage Woes</title>
      <dc:creator>Ramin Najjarbashi</dc:creator>
      <pubDate>Mon, 27 Mar 2023 21:27:41 +0000</pubDate>
      <link>https://dev.to/raminnietzsche/the-great-orphan-object-hunt-how-i-fixed-our-cloud-storage-woes-2gni</link>
      <guid>https://dev.to/raminnietzsche/the-great-orphan-object-hunt-how-i-fixed-our-cloud-storage-woes-2gni</guid>
      <description>&lt;p&gt;As an SRE working with a cloud provider and object storage, I've learned firsthand how tricky it can be to keep a Ceph cluster running smoothly. And while most of the time, everything hums along just fine, every once in a while, we encounter a problem that leaves us scratching our heads.&lt;/p&gt;

&lt;p&gt;Recently, we started getting calls from users who were confused about their usage metrics. The sum of all objects they could see in their buckets was far lower than what we were showing in our panel, and naturally, they were concerned. After some digging, we discovered orphan objects stuck in the cluster: objects that no client, such as s3cmd, boto3, or mc, would list.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EZb7ZmsL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/63cpzo9gd54cs5bbdyzx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EZb7ZmsL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/63cpzo9gd54cs5bbdyzx.jpg" alt="Oh shit! something wrong" width="563" height="665"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, with billions of objects in our cluster, finding and removing these orphan objects was no easy task. I tried using normal tools, but they just weren't up to the job. So, I had to get creative.&lt;/p&gt;

&lt;p&gt;I started by creating a bash script that would help me identify and remove these orphan objects. Here's the code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="c"&gt;#set -e&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"No bucket name provided"&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi
&lt;/span&gt;&lt;span class="nv"&gt;BUCKET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;

&lt;span class="nv"&gt;BUCKET_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;radosgw-admin bucket stats &lt;span class="nt"&gt;-b&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.id'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BUCKET_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Failed to get bucket ID for &lt;/span&gt;&lt;span class="nv"&gt;$BUCKET&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nv"&gt;INDEX_POOL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;radosgw-admin zone get | jq &lt;span class="s1"&gt;'.placement_pools[].val.index_pool'&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;DATA_POOL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;radosgw-admin zone get | jq &lt;span class="s1"&gt;'.placement_pools[].val.storage_classes[].data_pool'&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;ORPHANS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;radosgw-admin bucket check &lt;span class="nt"&gt;-b&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;".[]"&lt;/span&gt; &lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ORPHANS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"No orphan objects found in &lt;/span&gt;&lt;span class="nv"&gt;$BUCKET&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nv"&gt;UPLOAD_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;object &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nv"&gt;$ORPHANS&lt;/span&gt;
&lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;NEW_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$object&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s1"&gt;'.'&lt;/span&gt; &lt;span class="s1"&gt;'{print $(NF-1)}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nv"&gt;UPLOAD_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$UPLOAD_ID&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$NEW_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Remove these objects from the data pool&lt;/span&gt;
&lt;span class="nv"&gt;ORPHAN_OBJECTS_DATA&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;radosgw-admin bucket radoslist &lt;span class="nt"&gt;-b&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET&lt;/span&gt; | grep &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="nt"&gt;--text&lt;/span&gt;  &lt;span class="s2"&gt;"(&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$UPLOAD_ID&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="s1"&gt;'|'&lt;/span&gt; &lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;)"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ORPHAN_OBJECTS_DATA&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="k"&gt;then
    while &lt;/span&gt;&lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; object
    &lt;span class="k"&gt;do
      &lt;/span&gt;&lt;span class="nb"&gt;echo &lt;/span&gt;rados &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="nv"&gt;$DATA_POOL&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nv"&gt;$object&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true
    &lt;/span&gt;&lt;span class="k"&gt;done&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&amp;lt;&lt;/span&gt;  &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ORPHAN_OBJECTS_DATA&lt;/span&gt;&lt;span class="p"&gt;// /\\ &lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Remove these omap keys from the index pool&lt;/span&gt;
&lt;span class="nv"&gt;SHARDS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;radosgw-admin bucket stats &lt;span class="nt"&gt;-b&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;".num_shards"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;((&lt;/span&gt; &lt;span class="nv"&gt;COUNTER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0&lt;span class="p"&gt;;&lt;/span&gt; COUNTER&amp;lt;&lt;span class="nv"&gt;$SHARDS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; COUNTER+&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="o"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;ORPHAN_OMAP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;rados &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="nv"&gt;$INDEX_POOL&lt;/span&gt; listomapkeys .dir.&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BUCKET_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;.&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;COUNTER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; | grep &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="nt"&gt;--text&lt;/span&gt; &lt;span class="s2"&gt;"(&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$UPLOAD_ID&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="s1"&gt;'|'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;)"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$ORPHAN_OMAP&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;then
      while &lt;/span&gt;&lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; omap
      &lt;span class="k"&gt;do
        &lt;/span&gt;&lt;span class="nb"&gt;echo &lt;/span&gt;rados &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="nv"&gt;$INDEX_POOL&lt;/span&gt; rmomapkey .dir.&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BUCKET_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;.&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;COUNTER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nv"&gt;$omap&lt;/span&gt;
      &lt;span class="k"&gt;done&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ORPHAN_OMAP&lt;/span&gt;&lt;span class="p"&gt;// /\\ &lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;fi
done&lt;/span&gt;

&lt;span class="c"&gt;# Fix bucket&lt;/span&gt;
&lt;span class="nb"&gt;echo &lt;/span&gt;radosgw-admin bucket check &lt;span class="nt"&gt;-b&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--fix&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



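&lt;p&gt;This script uses radosgw-admin to check for orphan objects in a specified bucket, then finds the matching entries in both the data pool and the index pool. As written, it only prints the &lt;code&gt;rados&lt;/code&gt; and &lt;code&gt;radosgw-admin&lt;/code&gt; commands that would remove them (each one is prefixed with &lt;code&gt;echo&lt;/code&gt;), so you can review the output before running anything. It's not a perfect solution, but it worked for us, and it might just work for you too!&lt;/p&gt;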
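
&lt;p&gt;To make the matching step concrete, here is a minimal sketch of how the script turns entries into upload IDs and then into a single alternation pattern. The entry names are made up for illustration (real ones come from &lt;code&gt;radosgw-admin bucket check&lt;/code&gt;), and &lt;code&gt;extract_upload_id&lt;/code&gt; and &lt;code&gt;build_pattern&lt;/code&gt; are hypothetical helper names, not part of any Ceph tool.&lt;/p&gt;

```shell
#!/bin/bash

# Hypothetical helper: print the second-to-last dot-separated field of a
# name, mirroring the awk call in the script above.
extract_upload_id() {
  printf '%s\n' "$1" | awk -F '.' '{print $(NF-1)}'
}

# Hypothetical helper: join all arguments with "|" and wrap them in
# parentheses, ready for grep -E.
build_pattern() {
  local IFS='|'
  printf '(%s)\n' "$*"
}

# Made-up sample entries, for illustration only.
ids=()
for entry in "_multipart_video.2~AAA111.meta" "_multipart_backup.2~BBB222.meta"
do
  ids+=("$(extract_upload_id "$entry")")
done

pattern=$(build_pattern "${ids[@]}")
echo "$pattern"

# Only lines containing one of the upload IDs pass the filter.
printf '%s\n' "x__multipart_video.2~AAA111.1" "x_unrelated_object" | grep -E --text "$pattern"
```

&lt;p&gt;Building the ID list as an array avoids the leading space that the string concatenation in the original loop produces, but the resulting pattern is the same.&lt;/p&gt;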

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qvzfmgPO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kfnjwuokpmlu48t6uga9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qvzfmgPO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kfnjwuokpmlu48t6uga9.jpg" alt="Oh shit! something right" width="564" height="564"&gt;&lt;/a&gt;&lt;/p&gt;
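
&lt;p&gt;The script above prefixes its destructive commands with &lt;code&gt;echo&lt;/code&gt;, so it only prints them. One way to make that dry-run behaviour explicit is a small wrapper like the sketch below. &lt;code&gt;APPLY&lt;/code&gt; and &lt;code&gt;run&lt;/code&gt; are hypothetical names I'm introducing here, and the final command is a harmless stand-in rather than a real &lt;code&gt;rados&lt;/code&gt; call.&lt;/p&gt;

```shell
#!/bin/bash

# APPLY is a hypothetical switch: anything other than "1" means dry run.
APPLY="${APPLY:-0}"

# Print the command in dry-run mode, execute it when APPLY=1.
# printf %q quotes each argument so the printed line can be pasted back
# into a shell, similar to the space escaping in the original script.
run() {
  if [ "$APPLY" = "1" ]
  then
    "$@"
  else
    printf 'DRY-RUN:'
    printf ' %q' "$@"
    printf '\n'
  fi
}

# Harmless stand-in for a destructive call such as "rados -p POOL rm OBJECT".
run echo "would delete" "object with space"
```

&lt;p&gt;With a wrapper like this, the default stays safe, and the real deletion only happens when someone deliberately sets &lt;code&gt;APPLY=1&lt;/code&gt;.&lt;/p&gt;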

&lt;h2&gt;But why do orphan objects even happen in the first place?&lt;/h2&gt;

&lt;p&gt;That's a complicated question, and mostly beyond the scope of this post. In short, orphan objects can be caused by a number of factors, including bugs in the storage software, network issues, or even user error.&lt;/p&gt;

&lt;h2&gt;So, what can you do if you find yourself in a similar situation?&lt;/h2&gt;

&lt;p&gt;My advice would be to start by using a script like the one I've provided here to identify and remove orphan objects, one bucket at a time. But beyond that, there are a few best practices you can follow to help prevent orphan objects from occurring in the first place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep your software up-to-date: Orphan objects can sometimes be caused by bugs in older versions of your storage software. Make sure you're always running the latest stable release.&lt;/li&gt;
&lt;li&gt;Monitor your cluster closely: Regularly comparing the usage your cluster reports with what users can actually list in their buckets can help you catch orphan objects early, before they become a big problem.&lt;/li&gt;
&lt;li&gt;Educate your users: Make sure your users understand how to use your storage system correctly to minimize the risk of orphan objects caused by user error.&lt;/li&gt;
&lt;/ul&gt;
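
&lt;p&gt;As a rough sketch of the monitoring idea, the check below compares an object count reported by the cluster with the count a client can list, and flags the bucket when the gap is larger than a threshold. The helper names, the threshold, and the numbers are all made up for illustration. How you collect the two counts is up to you (for example, from &lt;code&gt;radosgw-admin bucket stats&lt;/code&gt; on one side and an S3 client listing on the other).&lt;/p&gt;

```shell
#!/bin/bash

# Hypothetical helper: print how many objects the cluster accounts for
# that clients cannot see. Both arguments are plain integers.
object_gap() {
  local reported="$1" listed="$2"
  echo $(( reported - listed ))
}

# Hypothetical helper: succeed (exit 0) when the gap exceeds the threshold
# given as the third argument.
gap_exceeds() {
  local gap
  gap=$(object_gap "$1" "$2")
  [ "$gap" -gt "$3" ]
}

# Made-up numbers for illustration.
if gap_exceeds 1200000 1150000 1000
then
  echo "possible orphans: $(object_gap 1200000 1150000) objects unaccounted for"
fi
```

&lt;p&gt;A small gap can be normal while uploads are in flight, so a threshold (and a second look before deleting anything) is safer than alerting on any difference.&lt;/p&gt;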

&lt;p&gt;In conclusion, the Great Orphan Object Hunt was a challenging but ultimately rewarding experience. By using a bit of creativity and some handy tools, we were able to fix our cloud storage woes and keep our users happy. And with a bit of luck, we'll be able to avoid orphan objects altogether in the future!&lt;/p&gt;

&lt;h2&gt;UPDATE:&lt;/h2&gt;

&lt;p&gt;Related Ceph tracker issue: &lt;a href="https://tracker.ceph.com/issues/16767"&gt;https://tracker.ceph.com/issues/16767&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cloudcomputing</category>
      <category>ceph</category>
      <category>sre</category>
      <category>objectstorage</category>
    </item>
  </channel>
</rss>
