Tarlan Isaev 🍓

Adapted Firestore data migration to Supabase

As I continue my data migration journey with my awesome dev team, I'm adapting another tutorial and providing some SQL snippets for reference. I've mentioned before that this task is quite difficult for me, but it's turning out to be an enjoyable learning experience: I'm picking up new and useful knowledge, and I hope it isn't too intimidating for others who are learning too. Data migration can be a complex process, but with dedication and perseverance it's possible to master. I'll keep sharing my experiences and insights along the way, in the hope that they help others working on their own data migration projects :)


Please note: before updating any columns, make sure to back up the table, and if the database is used in production, plan for the downtime as well.

Supabase provides several tools to convert data from a Firebase Firestore database to a Supabase PostgreSQL database. The process copies the entire contents of a single Firestore collection to a single PostgreSQL table.

The Firestore collection is "flattened" and converted to a table with basic columns of one of the following types: text, numeric, boolean, or jsonb. If your structure is more complex, you can write a program to split the newly created JSON file into multiple related tables before you import your JSON file(s) to Supabase.
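For instance, here is a minimal sketch of what a flattened users collection could look like as a table (the field names are invented for illustration; your actual columns depend on your documents):

-- Hypothetical sketch of a flattened "users" collection
CREATE TABLE users (
  firestore_id text,  -- the original Firestore document ID
  name text,
  score numeric,
  is_active boolean,
  metadata jsonb      -- nested objects fall back to jsonb
);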

Set up the migration tool
Clone the firebase-to-supabase repository:

git clone https://github.com/supabase-community/firebase-to-supabase.git

In the /firestore directory, create a file named supabase-service.json with the following contents:

{
  "host": "database.server.com",
  "password": "secretpassword",
  "user": "postgres",
  "database": "postgres",
  "port": 5432
}

Go to the Database settings for your project in the Supabase Dashboard.
Under Connection Info, copy the Host string and replace the entry in your supabase-service.json file.
Enter the password you used when you created your Supabase project in the password entry in the supabase-service.json file.

Generate a Firebase private key
Log in to your Firebase Console and open your project.
Click the gear icon next to Project Overview in the sidebar and select Project Settings.
Click Service Accounts and select Firebase Admin SDK.
Click Generate new private key.
Rename the downloaded file to firebase-service.json.
Command line options
List all Firestore collections
node collections.js

Dump Firestore collection to JSON file

node firestore2json.js <collectionName> [<batchSize>] [<limit>]

  • batchSize (optional) defaults to 1000
  • limit (optional) defaults to 0 (no limit)
  • the output filename is <collectionName>.json

In my case, I dumped the Users collection from Firestore:
node firestore2json.js Users 1000 0

Customize the JSON file with hooks

I did this step in VS Code. You can customize the way your JSON file is written using a custom hook. A common use for this is to "flatten" the JSON file, or to split nested data into separate, related database tables. For example, you could take a Firestore document that looks like this:

[{ "user": "mark", "score": 100, "items": ["hammer", "nail", "glue"] }]

And split it into two files (one table for users and one table for items):

[{ "user": "mark", "score": 100 }]

[
  { "user": "mark", "item": "hammer" },
  { "user": "mark", "item": "nail" },
  { "user": "mark", "item": "glue" }
]

Import JSON file to Supabase (PostgreSQL)
node json2supabase.js <path_to_json_file> [<primary_key_strategy>] [<primary_key_name>]

  • path_to_json_file: the full path of the file you created in the previous step (Dump Firestore collection to JSON file), such as ./my_collection.json.
  • primary_key_strategy (optional) is one of:
    • none (default): no primary key is added to the table.
    • smallserial: creates a key using (id SMALLSERIAL PRIMARY KEY) (auto-incrementing 2-byte integer).
    • serial: creates a key using (id SERIAL PRIMARY KEY) (auto-incrementing 4-byte integer).
    • bigserial: creates a key using (id BIGSERIAL PRIMARY KEY) (auto-incrementing 8-byte integer).
    • uuid: creates a key using (id UUID PRIMARY KEY DEFAULT uuid_generate_v4()) (randomly generated UUID); see the sketch after this list.
    • firestore_id: creates a key using (id TEXT PRIMARY KEY) (uses the existing firestore_id random text as the key).
  • primary_key_name (optional): name of the primary key. Defaults to "id".
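For example, here is a rough sketch of the kind of table the uuid strategy produces (the extra columns are assumptions; they depend on your collection's fields):

-- Needed for uuid_generate_v4() (available on Supabase)
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Sketch: a table created with the uuid primary key strategy
CREATE TABLE users (
  id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
  firestore_id text,
  email text
);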

I imported using this command:
node json2supabase.js ./Users.json uuid id

If you end up with duplicates in your table, you can use SQL snippets such as this one:

DELETE FROM users_duplicate s
USING (
  SELECT firestore_id, MAX(ctid) AS max_ctid
  FROM users_duplicate
  GROUP BY firestore_id, email
) t
WHERE s.ctid <> t.max_ctid
AND s.firestore_id = t.firestore_id;

Note that I renamed the 'users' table to lowercase and created a duplicate 'users_duplicate' table in case I break something and need to restore the original state. This way, you always keep an exact copy of the table.
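If you want to do the same, a minimal sketch of copying a table looks like this:

-- Copy structure and data into a new table
-- (indexes and constraints are not carried over)
CREATE TABLE users_duplicate AS TABLE users;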


When you're done with the migration, you can add a foreign key relation to the 'user' column, matching by email from the 'profiles' or 'users' table.
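As a sketch, the SQL for that could look something like this (the posts table and column names here are assumptions based on my schema; the referenced column must be unique first):

-- Hypothetical sketch: reference profiles by email
ALTER TABLE profiles
  ADD CONSTRAINT profiles_email_key UNIQUE (email);

ALTER TABLE posts
  ADD CONSTRAINT posts_user_fkey
  FOREIGN KEY ("user") REFERENCES profiles (email);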



Supabase allows you to write values as NULL. In the beginning I enabled the Allow Nullable option, and after filling in all users in the 'profiles' table I disabled it on the fly - a very handy option :)
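The SQL equivalent of toggling that option looks like this (the column name is just an example):

-- Disallow NULLs once every row has a value
ALTER TABLE profiles ALTER COLUMN email SET NOT NULL;

-- ...and re-allow them if needed
ALTER TABLE profiles ALTER COLUMN email DROP NOT NULL;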


Custom hooks

Hooks are used to customize the process of exporting a collection of Firestore documents to JSON. They can be used for:

Customizing or modifying keys
Calculating data
Flattening nested documents into related SQL tables

Write a custom hook

Create a .js file for your collection

If your Firestore collection is called users, create a file called users.js in the current folder.

Construct your .js file

The basic format of a hook file looks like this:

module.exports = (collectionName, doc, recordCounters, writeRecord) => {
  // modify the doc here
  return doc
}

Parameters

  • collectionName: The name of the collection you are processing.
  • doc: The current document (JSON object) being processed.
  • recordCounters: An internal object that keeps track of how many records have been processed in each collection.
  • writeRecord: This function automatically handles the process of writing data to other JSON files (useful for "flattening" your document into separate JSON files to be written to separate database tables). writeRecord takes the following parameters:
  1. name: Name of the JSON file to write to.
  2. doc: The document to write to the file.
  3. recordCounters: The same recordCounters object that was passed to this hook (just passes it on).

Examples

Add a new (unique) numeric key to a collection

module.exports = (collectionName, doc, recordCounters, writeRecord) => {
  doc.unique_key = recordCounters[collectionName] + 1
  return doc
}
Add a timestamp of when this record was dumped from Firestore
module.exports = (collectionName, doc, recordCounters, writeRecord) => {
  doc.dump_time = new Date().toISOString()
  return doc
}


Flatten JSON into separate files

Flatten the users collection into separate files:

[
  {
    "uid": "abc123",
    "name": "mark",
    "score": 100,
    "weapons": ["toothpick", "needle", "rock"]
  },
  {
    "uid": "xyz789",
    "name": "chuck",
    "score": 9999999,
    "weapons": ["hand", "foot", "head"]
  }
]

The users.js hook file:

module.exports = (collectionName, doc, recordCounters, writeRecord) => {
  for (let i = 0; i < doc.weapons.length; i++) {
    const weapon = {
      uid: doc.uid,
      weapon: doc.weapons[i],
    }
    writeRecord('weapons', weapon, recordCounters)
  }
  delete doc.weapons // moved to separate file
  return doc
}

The result is two separate JSON files:

[
  { "uid": "abc123", "name": "mark", "score": 100 },
  { "uid": "xyz789", "name": "chuck", "score": 9999999 }
]
[
  { "uid": "abc123", "weapon": "toothpick" },
  { "uid": "abc123", "weapon": "needle" },
  { "uid": "abc123", "weapon": "rock" },
  { "uid": "xyz789", "weapon": "hand" },
  { "uid": "xyz789", "weapon": "foot" },
  { "uid": "xyz789", "weapon": "head" }
]

I ran these SQL snippets in the Supabase SQL editor. You can combine multiple queries for the same table into a single snippet.
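For example, two of the fixes for the posts table from later in this post could run together as one snippet:

-- Two statements for the same table combined in one snippet
UPDATE posts SET post_views = 0 WHERE post_views IS NULL;
UPDATE posts SET five_star = false WHERE five_star IS NULL;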


In case you want to delete duplicates based on the values of multiple columns, here is the query template:

DELETE FROM users_duplicate
WHERE ctid IN (
  SELECT ctid
  FROM (
    SELECT ctid,
           ROW_NUMBER() OVER (
             PARTITION BY firestore_id, email
             ORDER BY firestore_id
           ) AS row_num
    FROM users_duplicate
  ) t
  WHERE t.row_num > 1
);

Find duplicates:

SELECT firestore_id, COUNT(*)
FROM users_duplicate
GROUP BY firestore_id
HAVING COUNT(*) > 1;

Identify each record in a table with PostgreSQL's built-in ctid column:

SELECT firestore_id, ctid
FROM users_duplicate;

List the rows to keep, i.e. one distinct row per firestore_id:

SELECT firestore_id, MAX(ctid) AS max_ctid
FROM users_duplicate
GROUP BY firestore_id;

We can delete the duplicate records with the query below:

DELETE FROM users_duplicate s
USING (
  SELECT firestore_id, MAX(ctid) AS max_ctid
  FROM users_duplicate
  GROUP BY firestore_id, email
) t
WHERE s.ctid <> t.max_ctid
AND s.firestore_id = t.firestore_id;

Get column names:

SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'users_duplicate';

Update one table from another based on an id match:

UPDATE profiles
SET email = (
  SELECT email
  FROM auth.users
  WHERE profiles.id = auth.users.id
  -- AND auth.users.id = 8
);

Update first_name in the profiles table from the users_duplicate table, matching by email:

UPDATE profiles
SET first_name = users_duplicate.first_name
FROM users_duplicate
WHERE profiles.email = users_duplicate.email;

Update several columns in the profiles table from the users_duplicate table, matching by email:

UPDATE profiles
SET first_name = users_duplicate.first_name,
    second_name = users_duplicate.second_name,
    username = users_duplicate.username,
    avatar_url = users_duplicate.avatar_url,
    phone_number = users_duplicate.phone_number,
    user_metro_station = users_duplicate.user_metro_station,
    birth_date = users_duplicate.birth_date,
    user_location = users_duplicate.user_location,
    liked_post = users_duplicate.liked_post,
    about_me = users_duplicate.about_me,
    user_address = users_duplicate.user_address,
    created_time = users_duplicate.created_time,
    firestore_id = users_duplicate.firestore_id
FROM users_duplicate
WHERE profiles.email = users_duplicate.email;

Update NULL values in the updated_at column of the admin table to the current timestamp:

UPDATE admin
SET updated_at = (now() AT TIME ZONE 'utc')
WHERE updated_at IS NULL;

Update post_views in the posts table from NULL to zero:

UPDATE posts
SET post_views = 0
WHERE post_views IS NULL;

Update NULL values in the five_star column of the posts table to boolean false:

UPDATE posts
SET five_star = false
WHERE five_star IS NULL;

Update the gif_url_2 column in the posts table from the gif_url column where the value is NULL:

UPDATE posts
SET gif_url_2 = gif_url
WHERE gif_url_2 IS NULL;

This query deletes all rows in the "chat_messages" table where the "text" column is not unique, keeping only the row with the minimum "id" for each group of duplicate rows:

DELETE FROM chat_messages
WHERE text IN (
  SELECT text
  FROM chat_messages
  GROUP BY text
  HAVING COUNT(*) > 1
)
AND id NOT IN (
  SELECT MIN(id)
  FROM chat_messages
  WHERE text IN (
    SELECT text
    FROM chat_messages
    GROUP BY text
    HAVING COUNT(*) > 1
  )
  GROUP BY text
);

Delete all rows in the "chat_messages" table where the "text" column is NULL, keeping only the row with the minimum "id" among those NULL-text rows:

DELETE FROM chat_messages
WHERE text IS NULL
AND id NOT IN (
  SELECT MIN(id)
  FROM chat_messages
  WHERE text IS NULL
  GROUP BY text
);

Working with timestamptz data format

To update the "last_message_timee" column in the "chats" table with the value of the "_seconds" key from the "last_message_time" column, converted to a timestamptz value, you can use the following query:

UPDATE chats
SET last_message_timee = to_timestamp(jsonb_extract_path_text(last_message_time, '_seconds')::bigint) AT TIME ZONE 'UTC';

This query uses the jsonb_extract_path_text() function to extract the value of the "_seconds" key from the "last_message_time" column, then uses to_timestamp() to convert the seconds to a timestamptz value. AT TIME ZONE 'UTC' sets the time zone to UTC, and the result is written to the "last_message_timee" column.
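Before running the UPDATE, it can be worth previewing the conversion with a quick SELECT (a sketch using the same table and columns as above):

-- Preview the conversion without modifying anything
SELECT last_message_time,
       to_timestamp(jsonb_extract_path_text(last_message_time, '_seconds')::bigint)
         AT TIME ZONE 'UTC' AS converted
FROM chats
LIMIT 5;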



Now we can delete 'last_message_time' and rename the 'last_message_timee' column holding the converted data to 'last_message_time'.


To convert jsonb data to timestamptz manually, we can use this awesome resource: unixtimestamp.com. For example, we have a last_message_time column with the jsonb data type:
{
  "_seconds": 1655911605,
  "_nanoseconds": 442000000
}

I don't need nanoseconds, so I'm simply converting the seconds value 1655911605 to timestamptz:

Format          Value
GMT             Wed Jun 22 2022 15:26:45 GMT+0000
Your Time Zone  Wed Jun 22 2022 18:26:45 GMT+0300 (Moscow Standard Time)
Relative        7 months ago
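The same conversion can also be done directly in Postgres without leaving the SQL editor:

-- Convert the seconds value straight to timestamptz
SELECT to_timestamp(1655911605);  -- 2022-06-22 15:26:45+00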


In conclusion, data migration can be a daunting task, but it is also an exciting opportunity to learn and grow as a data professional. By approaching the process piece by piece and using helpful tutorials and SQL snippets, I'm able to make steady progress, and even though there are challenges and obstacles along the way, the end result is worth the effort. I'm grateful for the opportunity to take on this project and look forward to continuing this journey. I encourage others facing similar challenges not to be discouraged and to keep pushing forward - a successfully completed data migration to Supabase is well worth the effort.

Thank you, guys :)
