
Julian Engel for The Codeero Group


Migrating Plausible Data Between Instances (Servers)

Overview

Over the weekend, our CapRover instance (an open-source Heroku alternative) crashed inexplicably. After lots of digging, we found that a corrupted file was preventing the swarm from starting. The error we got was: can't initialize raft node: irreparable WAL error: wal: max entry size limit exceeded. This is the equivalent of the Windows Blue Screen of Death... The Docker instance was beyond repair. We started to investigate and found that we had a 40 GB ClickHouse volume from Plausible.
It turned out that query logging had inflated our database size to 40 GB, whereas the actual data was just 1 GB.
Important Note: This was fixed by the Plausible team a long time ago; however, we have supported (sponsored) and used the project right from the start, so our instance still had the logging issue. To save our data, we needed to move to a new Plausible install on a new server (migrating to Hetzner along the way).

We determined that these were the needed steps: export data from ClickHouse and PostgreSQL, transfer it securely, and import it into a new environment.

Easier said than done... A weekend later, with lots of help from Stack Overflow and GPT-4, we managed to migrate without any data loss. Below is the tutorial we wrote for our internal knowledge base.

Step 1: Exporting Data

A. Export from ClickHouse

  1. Access ClickHouse Container:
   docker exec -it <clickhouse-container-id> clickhouse-client
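
If you don't know the container ID, one quick way to find it is to list the running containers and filter by image name. A minimal sketch (the grep filter is an assumption; adjust it to whatever image your setup uses):

   docker ps --format '{{.ID}}\t{{.Image}}\t{{.Names}}' | grep -i clickhouse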
  2. Check Database Size:

    • To list all databases and their sizes:
     SELECT database, formatReadableSize(sum(bytes_on_disk)) as total_size_on_disk FROM system.parts GROUP BY database ORDER BY sum(bytes_on_disk) DESC;
    
  • For detailed table sizes within a specific database:

     SELECT table AS table_name, formatReadableSize(sum(bytes_on_disk)) AS size_on_disk FROM system.parts WHERE database = 'your_database_name' GROUP BY table_name ORDER BY sum(bytes_on_disk) DESC;
    

Note: Here we found that the query logs in the ClickHouse database were the culprit, taking up over 30 GB.
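
We migrated rather than cleaning up in place, but if you only need to reclaim that space: as far as we understand, ClickHouse keeps its query logs in the system database, and recent versions let you truncate those tables. A hedged sketch, run from clickhouse-client:

   -- see which system log tables are taking up the space
   SELECT table, formatReadableSize(sum(bytes_on_disk)) AS size_on_disk FROM system.parts WHERE database = 'system' GROUP BY table ORDER BY sum(bytes_on_disk) DESC;

   -- clear the accumulated query logs; ClickHouse keeps writing new entries afterwards
   TRUNCATE TABLE IF EXISTS system.query_log;
   TRUNCATE TABLE IF EXISTS system.query_thread_log;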

  3. Export Individual Tables: For each table in your database, export it as a CSV file. Repeat this process for every table you wish to export:
   clickhouse-client --query="SELECT * FROM your_database_name.your_table_name FORMAT CSV" > your_table_name.csv
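
If there are many tables, a small loop saves the repetition. A rough sketch, assuming it runs wherever clickhouse-client can reach the server (e.g. inside the ClickHouse container) and that the database is named plausible_events_db, as in a default Plausible install:

   # export every table in plausible_events_db to its own CSV file
   for t in $(clickhouse-client --query="SHOW TABLES FROM plausible_events_db" --format=TSV); do
     clickhouse-client --query="SELECT * FROM plausible_events_db.$t FORMAT CSV" > "$t.csv"
   done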

Optionally, compress the CSV files into a single archive for convenience using tar:

   tar -czvf your_database_backup.tar.gz *.csv

B. Export from PostgreSQL

  1. Direct Export to Host: Use docker exec to run pg_dump within the PostgreSQL container, saving the output directly to the host machine:
   docker exec <postgres_container_id> pg_dump -U postgres your_database_name > /path/on/host/backupfile.sql

Step 2: Download via SSH

For us, the easiest way to grab the CSV files from ClickHouse and the SQL file was to copy them over using scp.

CSV Folder (Recursive):

   scp -r your_username@remote_host:/path/to/csv_folder /local/directory

Single File:

   scp your_username@remote_host:/path/to/backupfile.sql /local/directory

Step 3: Creating A New Plausible Instance

With our data safely stored on our local machine, it was time to spin up a new instance of Plausible. For this, I recommend the official Community Edition Guide from GitHub.
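
For reference, the install boils down to roughly the following; take the exact repository name and required .env values from the guide itself, since they change between releases:

   git clone https://github.com/plausible/community-edition plausible-ce
   cd plausible-ce
   # set BASE_URL, SECRET_KEY_BASE, etc. in .env as described in the guide, then:
   docker compose up -d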

Important Note: Do not create a new account. Once you're on the registration page, stop there; it's time to import your data instead.

Step 4: Importing Data

A. Stop Plausible Docker Service

It's a good idea to stop the main Plausible container here to avoid any kind of data corruption while the databases are being overwritten.
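
Something along these lines stops just the Plausible app while leaving the database containers running for the import (the name is a placeholder; find yours with docker ps):

   docker stop <plausible_container_name_or_id>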

B. Importing the Postgres DB

  1. Access Postgres: First, access the PostgreSQL command line interface within your Docker container. Replace <container_name_or_id> with the name or ID of your PostgreSQL container:
docker exec -it <container_name_or_id> psql -U postgres
  2. Drop the Existing Database. Warning: Dropping a database will permanently delete it and all the data it contains. Ensure you have backups if necessary.

From within the PostgreSQL CLI, run:

DROP DATABASE plausible_db;

If the database is being accessed by other users, you might encounter an error. To force the database to drop, you can disconnect all connected users by running:

SELECT pg_terminate_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = 'plausible_db';
DROP DATABASE plausible_db;

Then, exit the PostgreSQL CLI:

\q
  3. Create a New Database: From the host, use docker exec to create a new database with the same name:
docker exec -it <container_name_or_id> createdb -U postgres plausible_db
  4. Import the SQL File:

Now, let's import the SQL file into the newly created database. On your host machine, run:

cat /path/to/your/plausible_backup.sql | docker exec -i <container_name_or_id> psql -U postgres -d plausible_db

Replace /path/to/your/plausible_backup.sql with the actual path to your SQL file. This command streams the SQL file into the psql command running inside your Docker container, importing the data into your plausible_db database.

Additional Notes & Handling Issues:

  • Ensure that the SQL file contains the necessary commands to create tables and insert data. If it was generated by pg_dump, it should be fine.
  • If your SQL file is particularly large, the import process might take some time. Monitor the process and check for any errors in the output.

  • Role Does Not Exist Error: When importing into PostgreSQL, we faced a "role does not exist" error. This was because our old instance used the database user plausible, while the new one uses postgres. Modify the SQL dump file (in any text editor) to replace OWNER TO plausible with OWNER TO postgres; a one-liner for this is shown below.
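
Editing a large dump by hand is tedious; a one-liner like the following does the same substitution in place (assuming sed is available on the host, and keeping a .bak copy just in case):

   sed -i.bak 's/OWNER TO plausible/OWNER TO postgres/g' /path/to/your/plausible_backup.sql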

C. Check ClickHouse DB and Structure

  1. Access the ClickHouse Client

Initiate an interactive session with your ClickHouse container to access the ClickHouse client. Replace <container_name_or_id> with your container's actual name or ID:

docker exec -it <container_name_or_id> clickhouse-client
  2. Select Your Database: Switch to your target database so that subsequent commands apply to it:
USE plausible_events_db;
  3. List All Tables: Display all tables within your selected database:
SHOW TABLES;
  4. Examine Table Structure: To check that nothing has changed in a specific table's structure, use the DESCRIBE TABLE command:
DESCRIBE TABLE events;

Or more succinctly:

DESC events;
  5. Query Table Data (Optional): To query data from a particular table, execute a SELECT statement:
SELECT * FROM events LIMIT 10;

D. Importing CSV Data into ClickHouse

Importing a CSV into ClickHouse takes one simple command per file.

You need to repeat this for every CSV that you want to import. For us, these were the non-empty CSV files from the export:

  • events_v2
  • events
  • sessions_v2
  • sessions
  • ingest_counters
  • schema_migrations
docker exec -i <container_name_or_id> bash -c "clickhouse-client --query=\"INSERT INTO plausible_events_db.TABLENAME FORMAT CSV\" < /path/in/container/CSVFILE.csv"

Replace <container_name_or_id>, TABLENAME, and CSVFILE with your container's name or ID, the target table, and the CSV file's name.
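
Note that the command reads the CSV from a path inside the container, so copy the files in first (for example with docker cp). A rough sketch that copies and imports the whole set from the list above in one pass; the /tmp path and the shell variable are our own choices:

   # replace with your ClickHouse container name or ID
   CH_CONTAINER=<container_name_or_id>
   for t in events_v2 events sessions_v2 sessions ingest_counters schema_migrations; do
     docker cp "$t.csv" "$CH_CONTAINER:/tmp/$t.csv"
     docker exec -i "$CH_CONTAINER" bash -c "clickhouse-client --query=\"INSERT INTO plausible_events_db.$t FORMAT CSV\" < /tmp/$t.csv"
   done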

Verifying Data Import

Ensure your data was accurately imported by executing a few checks:

  1. Count Imported Rows: Verify the total row count in the events_v2 table:
   docker exec -it <container_name_or_id> clickhouse-client --query="SELECT COUNT(*) FROM plausible_events_db.events_v2;"
  2. Inspect Initial Rows: Look at the first few rows to confirm the data appears as expected:
   docker exec -it <container_name_or_id> clickhouse-client --query="SELECT * FROM plausible_events_db.events_v2 LIMIT 10;"
  3. Check for Specific Data: If you're looking for particular data, such as a specific event_id, tailor a query to verify its presence:
   docker exec -it <container_name_or_id> clickhouse-client --query="SELECT * FROM plausible_events_db.events_v2 WHERE event_id = 'expected_event_id' LIMIT 1;"

Final Steps

  • Restart Plausible Container: After completing the imports, restart the Plausible container to initiate database connections and migrations. This can take 10-15 seconds (a command sketch follows this list).
  • Verification: Log in with your previous credentials to verify that all data has been successfully migrated.
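
For a plain Docker setup, restarting is simply starting the container again (or docker compose up -d for a Compose-managed stack); the name below is a placeholder:

   docker start <plausible_container_name_or_id>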

Conclusion

I hope this guide provided a structured approach to exporting, transferring, and importing database data for Plausible Analytics, including troubleshooting common issues. Keep backups and verify data integrity at each step to ensure a smooth transition.
