<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Andrzej Górski</title>
    <description>The latest articles on DEV Community by Andrzej Górski (@andrzej3393).</description>
    <link>https://dev.to/andrzej3393</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F32726%2Fb743e684-782f-4bd2-8d44-5e5b3349b229.png</url>
      <title>DEV Community: Andrzej Górski</title>
      <link>https://dev.to/andrzej3393</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/andrzej3393"/>
    <language>en</language>
    <item>
      <title>How I (almost) rescued data from a failed ZFS pool</title>
      <dc:creator>Andrzej Górski</dc:creator>
      <pubDate>Sat, 22 Feb 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/andrzej3393/how-i-almost-rescued-data-from-a-failed-zfs-pool-2on8</link>
      <guid>https://dev.to/andrzej3393/how-i-almost-rescued-data-from-a-failed-zfs-pool-2on8</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Recently, the HDD storage pool in my homelab started acting “funny”. The funkiness showed up as the pool clogging whenever heavy traffic hit it. I lived with that for a month or so, restarting the machine occasionally, being busy with life.&lt;/p&gt;

&lt;p&gt;But one day I thought that maybe it was time to back up the data on that pool. Oh boy, was I wrong and right at the same time. Making backups was definitely the right thing to do, just not at that specific moment.&lt;/p&gt;

&lt;p&gt;After I started the backup process, the pool began to clog again. “No problem, another reboot and we’re alive”, I thought. But this time it was different: the pool wouldn’t import anymore.&lt;/p&gt;

&lt;p&gt;All disks in the pool had a clean SMART status before the malfunction, so I started looking into the &lt;code&gt;zpool import&lt;/code&gt; output. And it was not good:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# zpool import hdd
              cannot import 'hdd': I/O error
        Destroy and re-create the pool from
        a backup source.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One of the disks in the pool was dead. So dead, in fact, that it interfered with the controller’s operation. Moreover, the last reboot not only didn’t help, it (I guess) forcefully exported the pool and broke the metadata.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;p&gt;First, I tried to import the pool with the &lt;code&gt;-f&lt;/code&gt; flag, then with &lt;code&gt;-f -F&lt;/code&gt;, but neither helped. The pool was definitely dead.&lt;/p&gt;

&lt;p&gt;Normally, I would just swap the broken disk for a new one, replace it in ZFS and let it resilver. But the metadata was corrupted, a pool with corrupted metadata can’t be imported, and without importing the pool I can’t replace the disk.&lt;/p&gt;
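
&lt;p&gt;For reference, in a healthy situation that routine replacement would look something like this (the pool name is mine, the device paths are made up):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# replace the failed device with a new one; a resilver starts automatically
zpool replace hdd /dev/disk/by-id/ata-OLD_DISK /dev/disk/by-id/ata-NEW_DISK

# watch the resilver progress
zpool status hdd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;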

&lt;p&gt;After some Googling and more-or-less proper troubleshooting, I found a way to import a pool with broken metadata. These commands did the trick:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo 1 &amp;gt; /sys/module/zfs/parameters/zfs_max_missing_tvds
echo 0 &amp;gt; /sys/module/zfs/parameters/spa_load_verify_metadata
echo 0 &amp;gt; /sys/module/zfs/parameters/spa_load_verify_data

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first one makes it possible to import a pool with a missing disk. The other two disable metadata and data verification during import. &lt;strong&gt;It’s not recommended to use these settings in production&lt;/strong&gt;. But in my case, I had nothing to lose.&lt;/p&gt;

&lt;p&gt;After that, wiser with the knowledge gathered while Googling, I imported the pool in read-only mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;zpool import -f -o readonly=on hdd

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And it worked! The pool imported in read-only mode. I checked the data, and it was there. Mostly. Some files were corrupted, but most were intact.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data recovery
&lt;/h2&gt;

&lt;p&gt;After the pool was imported, I tried to copy the data to another disk, first with plain &lt;code&gt;cp&lt;/code&gt;, then with &lt;code&gt;rsync&lt;/code&gt;. Both hung on the broken files.&lt;/p&gt;

&lt;p&gt;Some more research later, I found a tool called &lt;code&gt;cpio&lt;/code&gt;. It’s a tool for copying files to and from archives, but with the right flags and a bit of pipe magic, I managed to use it to copy the data file by file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;find /hdd -depth -print0 | cpio -pdmv0 /target

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command walks &lt;code&gt;/hdd&lt;/code&gt; and copies every file to &lt;code&gt;/target&lt;/code&gt;: &lt;code&gt;-p&lt;/code&gt; is pass-through (copy) mode, &lt;code&gt;-d&lt;/code&gt; creates directories as needed, &lt;code&gt;-m&lt;/code&gt; preserves modification times, &lt;code&gt;-v&lt;/code&gt; is verbose, and &lt;code&gt;-0&lt;/code&gt; expects the NUL-separated file list produced by &lt;code&gt;find -print0&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After a few hours, the data was copied. I managed to rescue about 70% of the files intact. The rest were corrupted. But it was better than nothing :)&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Backups, people!
&lt;/h3&gt;

&lt;p&gt;Shame on me, because it was the second time I lost data to a broken disk, and the second time I didn’t have a proper, automated backup. I’ve learned my lesson and started backing up my data, and I recommend you do the same.&lt;/p&gt;

&lt;p&gt;The good news is, I had an old (manual) backup of the very same pool, so I managed to backfill some of the lost files from it.&lt;/p&gt;

&lt;h3&gt;
  
  
  RAID (or ZRAID) is not a backup
&lt;/h3&gt;

&lt;p&gt;I knew that, but I didn’t act on that knowledge. RAID is not a backup; it’s redundancy. It’s good to have, but it’s not enough. You should have a backup of your data.&lt;/p&gt;

&lt;h3&gt;
  
  
  SMART monitoring won’t help you sometimes
&lt;/h3&gt;

&lt;p&gt;All disks in the pool had a clean SMART status before the malfunction. It just happens that a disk can die without any SMART errors, so don’t rely on SMART monitoring alone.&lt;/p&gt;

&lt;p&gt;But it’s still better to monitor them than not. I’ve had a few disks that started showing SMART errors, and I was fast enough to replace them before they died.&lt;/p&gt;

</description>
      <category>zfs</category>
      <category>proxmox</category>
    </item>
    <item>
      <title>Backblaze B2 as a Terraform remote state storage</title>
      <dc:creator>Andrzej Górski</dc:creator>
      <pubDate>Fri, 21 Feb 2025 00:00:00 +0000</pubDate>
      <link>https://dev.to/andrzej3393/backblaze-b2-as-a-terraform-remote-state-storage-1ohp</link>
      <guid>https://dev.to/andrzej3393/backblaze-b2-as-a-terraform-remote-state-storage-1ohp</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Recently I started The Big Migration™ from Ansible to Terraform in my homelab. But as soon as I began writing my first Terraform manifests, I also started thinking about remote state storage.&lt;/p&gt;

&lt;p&gt;First, I thought about storing the state file on a self-hosted MinIO instance. The problem is that the MinIO instance would be managed by the very same Terraform manifests.&lt;/p&gt;

&lt;p&gt;Later on, I started researching what else I could manage with Terraform. It turns out that Backblaze B2, where I keep my backups anyway, is manageable with Terraform! That’s when it clicked: after all, B2 is S3-compatible, so I can use it as remote state storage for Terraform!&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Backblaze B2 account, with:

&lt;ol&gt;
&lt;li&gt;Bucket in which you want to store the state file&lt;/li&gt;
&lt;li&gt;Application key with permissions to manage this bucket&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;li&gt;Terraform installed on your machine&lt;/li&gt;

&lt;/ol&gt;

&lt;h2&gt;
  
  
  Terraform configuration
&lt;/h2&gt;

&lt;p&gt;The basic config for storing the state file in S3 would look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
  &lt;span class="nx"&gt;backend&lt;/span&gt; &lt;span class="s2"&gt;"s3"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-terraform-state-bucket"&lt;/span&gt;
    &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform.tfstate"&lt;/span&gt;
    &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One might think it’s enough to change the &lt;code&gt;region&lt;/code&gt; to its B2 equivalent, add the Backblaze &lt;code&gt;endpoint&lt;/code&gt;, and we’re good to go. But it’s not that simple: B2 is &lt;strong&gt;almost&lt;/strong&gt; S3-compatible, so we have to take some extra steps.&lt;/p&gt;

&lt;p&gt;What we need to do is skip some checks and validations that Backblaze B2 doesn’t support. And there are actually quite a lot of them.&lt;/p&gt;

&lt;p&gt;A fully working example of Terraform configuration for storing the state file in Backblaze B2 would look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;backend&lt;/span&gt; &lt;span class="s2"&gt;"s3"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-terraform-state-bucket"&lt;/span&gt;
    &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform.tfstate"&lt;/span&gt;
    &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-west-004"&lt;/span&gt;
    &lt;span class="nx"&gt;endpoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"https://s3.us-west-004.backblazeb2.com"&lt;/span&gt;

    &lt;span class="nx"&gt;skip_credentials_validation&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;skip_region_validation&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;skip_metadata_api_check&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;skip_requesting_account_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;skip_s3_checksum&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course, you’ll have to replace the &lt;code&gt;bucket&lt;/code&gt;, &lt;code&gt;region&lt;/code&gt;, and &lt;code&gt;endpoint&lt;/code&gt; values with the proper ones from your Backblaze B2 config.&lt;/p&gt;

&lt;p&gt;As you can see, there’s no &lt;code&gt;access_key&lt;/code&gt; or &lt;code&gt;secret_key&lt;/code&gt; provided. That’s because I supply them through environment variables (and you should too!). The B2 application key goes into the &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt; env var, and the key ID into &lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt;.&lt;/p&gt;
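
&lt;p&gt;For example (the values below are placeholders, not a real key pair):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# key ID and application key from your B2 account
export AWS_ACCESS_KEY_ID="your-b2-key-id"
export AWS_SECRET_ACCESS_KEY="your-b2-application-key"

# the backend picks them up automatically
terraform init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;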

&lt;h2&gt;
  
  
  Some security considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  State bucket
&lt;/h3&gt;

&lt;p&gt;Keep it private. It’s not a good idea to make your state file publicly available, as it might contain secrets.&lt;/p&gt;

&lt;p&gt;You might also want to enable versioning on this bucket. With versioning, you can easily revert to a previous state if something goes wrong. I’ve seen Terraform go bananas a few times, so it’s a good idea to have this feature enabled.&lt;/p&gt;

&lt;h3&gt;
  
  
  Application key
&lt;/h3&gt;

&lt;p&gt;Don’t use your master key for this. Create a separate application key with permissions to manage only this bucket. It’s good practice to have separate keys for different tasks.&lt;/p&gt;

&lt;p&gt;Don’t put your credentials in the Terraform code (or any code, really), especially if you’re ever going to publish that code, e.g. on GitHub. One “oops” too many and your keys are leaked. Use environment variables to provide them to Terraform. I personally load them into env vars from 1Password with the &lt;code&gt;op&lt;/code&gt; CLI tool.&lt;/p&gt;
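
&lt;p&gt;A sketch of how that can look with the 1Password CLI (the vault, item, and field names here are made up; adjust the &lt;code&gt;op://&lt;/code&gt; paths to your own setup):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pull the secrets straight from 1Password into the environment
export AWS_ACCESS_KEY_ID="$(op read "op://Homelab/b2-terraform-state/key-id")"
export AWS_SECRET_ACCESS_KEY="$(op read "op://Homelab/b2-terraform-state/application-key")"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;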

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;It seems that Backblaze B2 is S3-compatible enough to be used as remote state storage for Terraform. It’s good practice to keep your state file in remote storage anyway, versioned and not tied to your local machine. And if you already use B2 for backups, why not use it for the Terraform state file as well?&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>backblaze</category>
      <category>devops</category>
    </item>
    <item>
      <title>Compose Key</title>
      <dc:creator>Andrzej Górski</dc:creator>
      <pubDate>Tue, 13 Jun 2023 00:00:00 +0000</pubDate>
      <link>https://dev.to/andrzej3393/compose-key-2g8k</link>
      <guid>https://dev.to/andrzej3393/compose-key-2g8k</guid>
      <description>&lt;h2&gt;
  
  
  What is a Compose Key?
&lt;/h2&gt;

&lt;p&gt;The Compose Key is a special key on your keyboard that allows you to type special characters like &lt;code&gt;→&lt;/code&gt;, &lt;code&gt;°&lt;/code&gt;, &lt;code&gt;€&lt;/code&gt;, &lt;code&gt;ä&lt;/code&gt;, &lt;code&gt;™&lt;/code&gt;, &lt;code&gt;®&lt;/code&gt;, &lt;code&gt;¿&lt;/code&gt; and many more. It’s very useful if you write in a foreign language that uses special characters and don’t want to fiddle with your keyboard settings all the time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Examples
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Compose&lt;/code&gt; + &lt;code&gt;-&lt;/code&gt; + &lt;code&gt;&amp;gt;&lt;/code&gt; will produce a &lt;code&gt;→&lt;/code&gt; character&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Compose&lt;/code&gt; + &lt;code&gt;o&lt;/code&gt; + &lt;code&gt;o&lt;/code&gt; will produce a &lt;code&gt;°&lt;/code&gt; character&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Compose&lt;/code&gt; + &lt;code&gt;c&lt;/code&gt; + &lt;code&gt;=&lt;/code&gt; will produce a &lt;code&gt;€&lt;/code&gt; character&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Compose&lt;/code&gt; + &lt;code&gt;a&lt;/code&gt; + &lt;code&gt;"&lt;/code&gt; will produce a &lt;code&gt;ä&lt;/code&gt; character&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Compose&lt;/code&gt; + &lt;code&gt;t&lt;/code&gt; + &lt;code&gt;m&lt;/code&gt; will produce a &lt;code&gt;™&lt;/code&gt; character&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Compose&lt;/code&gt; + &lt;code&gt;r&lt;/code&gt; + &lt;code&gt;o&lt;/code&gt; will produce a &lt;code&gt;®&lt;/code&gt; character&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Compose&lt;/code&gt; + &lt;code&gt;?&lt;/code&gt; + &lt;code&gt;?&lt;/code&gt; will produce a &lt;code&gt;¿&lt;/code&gt; character&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As you can see in the examples above, the combinations are well thought out and easy to remember. Most of the time you don’t need to know them by heart; you just have to think about which two (or more) keys from the standard key set, combined, would give you the character you want.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to enable Compose Key?
&lt;/h2&gt;

&lt;p&gt;As I use GNOME 3, I’ll describe how to set it up in that environment. But don’t worry, it can be enabled in pretty much every WM/DE.&lt;/p&gt;

&lt;p&gt;First, go to &lt;code&gt;Settings&lt;/code&gt; and then &lt;code&gt;Keyboard&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SMDBEmGv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://andrzejgor.ski/posts/compose_key/gnome_keyboard_settings.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SMDBEmGv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://andrzejgor.ski/posts/compose_key/gnome_keyboard_settings.png" alt="Keyboard Settings screenshot" width="601" height="751"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then click the &lt;code&gt;Compose Key&lt;/code&gt; option in the &lt;code&gt;Special Character Entry&lt;/code&gt; section. There you can turn it on and choose which key to use as the Compose Key.&lt;/p&gt;

&lt;p&gt;As you can see in the screenshot, I use the &lt;code&gt;Caps Lock&lt;/code&gt; key, but choose whatever you like.&lt;/p&gt;
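
&lt;p&gt;Outside of GNOME, on X11 you can usually go one step further and define your own sequences in &lt;code&gt;~/.XCompose&lt;/code&gt;. A minimal sketch (the custom mapping is just an example of mine, not a standard one):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# start from the system-wide compose table for the current locale
include "%L"

# Compose + p + l produces "zł"
&amp;lt;Multi_key&amp;gt; &amp;lt;p&amp;gt; &amp;lt;l&amp;gt; : "zł"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;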

&lt;h2&gt;
  
  
  Lists of possible combinations
&lt;/h2&gt;

&lt;p&gt;These two lists are the best I’ve found on the internet:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://help.ubuntu.com/community/GtkComposeTable"&gt;GTKComposeTable&lt;/a&gt; - rather short, very readable list of most useful combinations.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cgit.freedesktop.org/xorg/lib/libX11/plain/nls/en_US.UTF-8/Compose.pre"&gt;libX11 documentation&lt;/a&gt; - very long, hard to read but very complete list of all possible combinations.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>linux</category>
      <category>gnome</category>
      <category>keyboard</category>
    </item>
    <item>
      <title>Big PostgreSQL Problems: ID exhaustion</title>
      <dc:creator>Andrzej Górski</dc:creator>
      <pubDate>Tue, 13 Jun 2023 00:00:00 +0000</pubDate>
      <link>https://dev.to/andrzej3393/big-postgresql-problems-id-exhaustion-2a43</link>
      <guid>https://dev.to/andrzej3393/big-postgresql-problems-id-exhaustion-2a43</guid>
      <description>&lt;p&gt;If you’re a PostgreSQL user, you may have encountered some issues with ID exhaustion already. This is a common problem, especially for databases that handle a large amount of data and/or have a high volume of insertions and deletions.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does this happen?
&lt;/h2&gt;

&lt;p&gt;In my case, it was a web app backend. There was a heavily used many-to-many relationship between two tables, and the application created and removed the associations between them quite often.&lt;/p&gt;

&lt;p&gt;One day I found that the app had crashed. In production! After a quick investigation, I found the cause of the problem: the IDs in the associative table had been exhausted. It turned out that the heavy associating and disassociating had consumed all the available IDs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other causes of ID exhaustion that I met
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Simply, tons of data.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;INSERT ... ON CONFLICT ...&lt;/code&gt; - so, upserts most of the time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Simply put, each insert, successful or not, bumps the ID sequence. You can end up with a very small number of rows in the table, like a hundred or so, and yet have the IDs exhausted.&lt;/p&gt;
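
&lt;p&gt;A sketch of how upserts burn through IDs (the table and column names are made up):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;CREATE TABLE counters (id serial PRIMARY KEY, name text UNIQUE);

INSERT INTO counters (name) VALUES ('a');                         -- gets id 1
INSERT INTO counters (name) VALUES ('a') ON CONFLICT DO NOTHING;  -- no row inserted,
                                                                  -- but the sequence advanced
INSERT INTO counters (name) VALUES ('b');                         -- gets id 3, not 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;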

&lt;h2&gt;
  
  
  But they said in the docs that the number of rows in the table is unlimited!
&lt;/h2&gt;

&lt;p&gt;As I mentioned in the &lt;a href="https://andrzejgor.ski/posts/big_postgresql_problems/introduction/" rel="noopener noreferrer"&gt;first post of this series&lt;/a&gt;, the maximum number of rows in a single PostgreSQL table is unlimited. So why did we end up with a broken app in the middle of the night?&lt;/p&gt;

&lt;h2&gt;
  
  
  Types!
&lt;/h2&gt;

&lt;p&gt;To be exact, the type of the primary key column. The “default” way of creating a new table, seen in many tutorials, looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="k"&gt;user&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;serial&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see in the highlighted line, the primary key column (&lt;code&gt;id&lt;/code&gt;) is going to be of type &lt;code&gt;serial&lt;/code&gt;. Also, as far as I know, most frameworks/ORMs choose &lt;code&gt;serial&lt;/code&gt; as the default type for the primary key column. But what is &lt;code&gt;serial&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;In reality, it is just a four-byte signed integer. Or rather, its positive part. So a table with a primary key column of type &lt;code&gt;serial&lt;/code&gt; can hold up to 2^31 - 1 (a bit more than two billion) rows.&lt;/p&gt;
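
&lt;p&gt;Under the hood, &lt;code&gt;serial&lt;/code&gt; is just shorthand. Per the PostgreSQL docs, a column declared as &lt;code&gt;id serial&lt;/code&gt; expands to roughly this (using a hypothetical table &lt;code&gt;tbl&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;CREATE SEQUENCE tbl_id_seq AS integer;

CREATE TABLE tbl (
  id integer NOT NULL DEFAULT nextval('tbl_id_seq')
);

-- the sequence is dropped together with the column
ALTER SEQUENCE tbl_id_seq OWNED BY tbl.id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;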

&lt;h2&gt;
  
  
  How to fix this?
&lt;/h2&gt;

&lt;p&gt;The easiest fix is to migrate &lt;strong&gt;both&lt;/strong&gt; the primary key column &lt;strong&gt;and&lt;/strong&gt; the sequence for that column to another, bigger type. If any tables have a relation to this column, they need to be migrated too. The most commonly used types bigger than &lt;code&gt;serial&lt;/code&gt; for primary key columns are &lt;code&gt;bigserial&lt;/code&gt; and &lt;code&gt;UUID&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bigserial
&lt;/h3&gt;

&lt;p&gt;As you probably guessed, the &lt;code&gt;bigserial&lt;/code&gt; type is in reality a &lt;code&gt;bigint&lt;/code&gt;: an eight-byte signed integer. A table with a primary key of that type can hold up to 2^63 - 1 rows, which is more than four billion &lt;strong&gt;times&lt;/strong&gt; more than &lt;code&gt;serial&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Migrating both the column and the sequence from &lt;code&gt;serial&lt;/code&gt; to &lt;code&gt;bigserial&lt;/code&gt; is rather easy and fast. But what if it’s still not enough?&lt;/p&gt;
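
&lt;p&gt;A minimal sketch of such a migration, assuming a table called &lt;code&gt;tbl&lt;/code&gt; with a &lt;code&gt;serial&lt;/code&gt; primary key &lt;code&gt;id&lt;/code&gt; (note that on a big table the &lt;code&gt;ALTER TABLE&lt;/code&gt; rewrites the table and takes a heavy lock, so plan a maintenance window):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- widen the column itself
ALTER TABLE tbl ALTER COLUMN id TYPE bigint;

-- and the sequence feeding it (PostgreSQL 10+)
ALTER SEQUENCE tbl_id_seq AS bigint;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;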

&lt;h3&gt;
  
  
  UUID
&lt;/h3&gt;

&lt;p&gt;UUID is an even bigger type: it’s 128 bits long! But it’s not numeric like the previous ones, which has certain consequences.&lt;/p&gt;

&lt;p&gt;The most important one, in this case, is that migrating the primary key column from &lt;code&gt;serial&lt;/code&gt; or &lt;code&gt;bigserial&lt;/code&gt; to &lt;code&gt;UUID&lt;/code&gt; is not that simple. The crux of the problem is that UUIDs aren’t generated in series, but randomly, and their representation also differs from the numeric types. Altogether, this means you’ll need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rewrite all the existing primary keys, both in the problematic table and in the tables that relate to it. This isn’t a cheap operation (in terms of execution time) at all.&lt;/li&gt;
&lt;li&gt;Change the way they are generated. As they’re no longer serial, you’ll need to generate them either on the app side or on the DB side. On the DB side, the UUID-OSSP module is recommended.&lt;/li&gt;
&lt;li&gt;Pay attention to collisions. As UUIDs are randomly generated, the more rows you have, the more likely collisions become.&lt;/li&gt;
&lt;/ul&gt;
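
&lt;p&gt;For DB-side generation, a sketch of a UUID-keyed table (the &lt;code&gt;uuid-ossp&lt;/code&gt; extension provides &lt;code&gt;uuid_generate_v4()&lt;/code&gt;; on PostgreSQL 13+ the built-in &lt;code&gt;gen_random_uuid()&lt;/code&gt; works without any extension):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE tbl (
  id uuid PRIMARY KEY DEFAULT uuid_generate_v4()
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;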

&lt;p&gt;But besides those downsides, there are some good sides to using UUID as the primary key:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Obviously, its size.&lt;/li&gt;
&lt;li&gt;As UUIDs aren’t serial, you get an extra layer of “security by obscurity” for free: nobody can simply guess the ID of the next or previous row in the table anymore.&lt;/li&gt;
&lt;li&gt;As they can be generated on the app side, it enables the development of the app in a more DDD way, as you can have a known entity ID even before it’s stored in DB and push it down through the layers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  But hey, they said in the docs that the number of rows in the table is unlimited!
&lt;/h2&gt;

&lt;p&gt;And obviously, that’s true, who would lie in the documentation? :) But as long as you have a primary key in the table, you’re limited by it. If you want to get rid of that limit, you’ll have to get rid of the primary key. That, too, is a fix for ID exhaustion.&lt;/p&gt;

&lt;h2&gt;
  
  
  Exercise!
&lt;/h2&gt;

&lt;p&gt;If you want to try it on your own, below you can find a very simple showcase of ID exhaustion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;tbl&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;smallserial&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;tbl&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32767&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;tbl&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In lines 1-3, you can see the preparation of the table we’ll test on. As you can see, I used a type even smaller than &lt;code&gt;serial&lt;/code&gt;: &lt;code&gt;smallserial&lt;/code&gt;, a two-byte signed integer.&lt;/p&gt;

&lt;p&gt;Then in line 5, the table is filled to the brim.&lt;/p&gt;

&lt;p&gt;And finally, in line 7, the last insert causes the following error: &lt;code&gt;error: nextval: reached maximum value of sequence "tbl_id_seq" (32767)&lt;/code&gt;, which means that the IDs were exhausted and this row (and any following rows) will not be inserted.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extra links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.postgresql.org/docs/current/datatype-numeric.html" rel="noopener noreferrer"&gt;More about numeric data types (including &lt;code&gt;serial&lt;/code&gt; and &lt;code&gt;bigserial&lt;/code&gt;) in PostgreSQL docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.postgresql.org/docs/current/datatype-uuid.html" rel="noopener noreferrer"&gt;Short writeup of &lt;code&gt;UUID&lt;/code&gt; type in PostgreSQL docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.postgresql.org/docs/current/uuid-ossp.html" rel="noopener noreferrer"&gt;UUID-OSSP module documentation in PostgreSQL docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dba.stackexchange.com/questions/214083/what-happens-when-an-automatically-generated-uuid-primary-key-collides-in-postgr" rel="noopener noreferrer"&gt;“What happens when an automatically generated UUID primary key collides in Postgres?” on DBA StackExchange&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  That’s it!
&lt;/h2&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>postgres</category>
    </item>
    <item>
      <title>Big PostgreSQL Problems: Introduction</title>
      <dc:creator>Andrzej Górski</dc:creator>
      <pubDate>Tue, 20 Sep 2022 00:00:00 +0000</pubDate>
      <link>https://dev.to/andrzej3393/big-postgresql-problems-introduction-3nd2</link>
      <guid>https://dev.to/andrzej3393/big-postgresql-problems-introduction-3nd2</guid>
      <description>&lt;p&gt;This is the first post in a series about problems that I encountered while working with rather big PostgreSQL databases. I will describe in it what are my assumptions about the database size and also some facts from PostgreSQL documentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  How big does a table or database have to be to be considered really “big”?
&lt;/h2&gt;

&lt;p&gt;There’s no hard definition here, so these are only my assumptions.&lt;/p&gt;

&lt;p&gt;For a single table, I’d say it can be considered big when it approaches 100 million rows. &lt;strong&gt;But&lt;/strong&gt; of course, it depends on the row size itself. Things will be completely different for a table with two or three simple integer columns than for one with, let’s say, twenty text columns loaded with heavy data.&lt;/p&gt;

&lt;p&gt;On the database side, in my opinion, 100 GB is a quite large database. &lt;strong&gt;But&lt;/strong&gt;, again, it depends on how many tables there are and how heavy single rows are.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are the PostgreSQL limits?
&lt;/h2&gt;

&lt;p&gt;Here we’re in a much better situation, as the limits are documented in the &lt;a href="https://wiki.postgresql.org/wiki/FAQ#What_is_the_maximum_size_for_a_row.2C_a_table.2C_and_a_database.3F" rel="noopener noreferrer"&gt;official PostgreSQL FAQ&lt;/a&gt;. So, per the FAQ:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Maximum size for a database?&lt;/strong&gt; Unlimited (32 TB databases exist)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maximum size for a table?&lt;/strong&gt; 32 TB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maximum size for a row?&lt;/strong&gt; 400 GB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maximum size for a field?&lt;/strong&gt; 1 GB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maximum number of rows in a table?&lt;/strong&gt; unlimited&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maximum number of columns in a table?&lt;/strong&gt; 250-1600, depending on column types&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maximum number of indexes on a table?&lt;/strong&gt; unlimited&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  That’s all for today
&lt;/h2&gt;

&lt;p&gt;It wasn’t a very long post, was it? But no worries, the next articles in the series will be more substantial (at least I hope so 🙂). This article is more of a common point of reference for subsequent posts.&lt;/p&gt;

</description>
      <category>postgres</category>
    </item>
  </channel>
</rss>
