When I upgraded my GitLab CE instance to 18.5.1, the upgrade repeatedly failed during database migrations with this Ruby error:
undefined method `id' for nil:NilClass
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/background_migration/fix_non_existing_timelog_users.rb:12:in `perform'
The failure happened inside a batched background migration finalize step called FinalizeHkFixNonExistingTimelogUsers. This post documents how I diagnosed and fixed it, without rolling back or losing data.
Environment and Symptoms
The environment:
- Ubuntu 24.04
- GitLab CE 18.5.1 (/opt/gitlab/embedded/service/gitlab-rails)
- PostgreSQL 16.10 as the main DB (gitlabhq_production)
Every time I ran:
sudo gitlab-rake db:migrate
I saw:
main: == 20250916232115 FinalizeHkFixNonExistingTimelogUsers: migrating
...
undefined method `id' for nil:NilClass
The stack trace pointed to Gitlab::BackgroundMigration::FixNonExistingTimelogUsers, executed from the finalize migration 20250916232115_finalize_hk_fix_non_existing_timelog_users.rb.
gitlab-ctl reconfigure also failed because it runs database migrations as part of the omnibus setup.
Step 1: Inspect background migrations
GitLab provides a Rake task to list batched background migrations and their status. I started there:
sudo gitlab-rake gitlab:background_migrations:status \
NAME=FixNonExistingTimelogUsers
This printed a long list of migrations with states like finished, finalized, and a few active. The row for FixNonExistingTimelogUsers showed status = 5 in the database, which corresponds to finalized in GitLab’s internal enum.
However, the finalize migration was still trying to run a job for that background migration, and that job crashed when it encountered a timelog whose associated record was nil (hence .id on nil). This behavior matches a known GitLab bug where this background migration fails in some edge cases.
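To make the failure mode concrete, here is a minimal sketch of how a lookup that returns nil produces this kind of NoMethodError. The User and Timelog structs and both helper methods are hypothetical stand-ins, not GitLab's actual models or the code in fix_non_existing_timelog_users.rb.

```ruby
# Hypothetical stand-ins for illustration only; not GitLab's real models.
User = Struct.new(:id, :name)
Timelog = Struct.new(:id, :user_id)

USERS = { 1 => User.new(1, 'alice') }.freeze
TIMELOGS = [
  Timelog.new(10, 1),
  Timelog.new(11, 99) # references a user that no longer exists
].freeze

# Unguarded: a missing user makes the lookup return nil, and nil has no #id.
def owner_id_unguarded(timelog)
  USERS[timelog.user_id].id
end

# Guarded: safe navigation returns nil instead of raising.
def owner_id_guarded(timelog)
  USERS[timelog.user_id]&.id
end

TIMELOGS.each do |timelog|
  puts "timelog #{timelog.id}: owner #{owner_id_unguarded(timelog)}"
rescue NoMethodError => e
  puts "timelog #{timelog.id}: #{e.class}, guarded result: #{owner_id_guarded(timelog).inspect}"
end
```

One orphaned row is enough to abort the whole batch when the lookup is unguarded, which would be consistent with the migration failing only on some instances.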
At this point I knew:
- The background migration itself was already marked as finalized.
- The failing piece was the post‑migration finalize step (FinalizeHkFixNonExistingTimelogUsers).
Step 2: Try the official helper (and why it didn’t work)
GitLab’s docs recommend using helper Rake tasks to manage batched migrations, such as marking jobs as succeeded or requeuing them.
I tried:
sudo gitlab-rake gitlab:background_migrations:mark_all_jobs_as_succeeded \
NAME=FixNonExistingTimelogUsers
But this failed with:
Don't know how to build task 'gitlab:background_migrations:mark_all_jobs_as_succeeded'
On this version/packaging of GitLab CE, that helper task simply doesn’t exist, so I couldn’t use the documented way to skip the broken migration.
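The "Don't know how to build task" message comes from Rake itself, raised whenever a task name is not registered. The sketch below reproduces that with plain Rake outside GitLab; the demo:* task names and the task_available? helper are hypothetical and exist only for this illustration.

```ruby
require 'rake'

# Hypothetical task for illustration; a real install registers its own tasks.
Rake::Task.define_task('demo:status')

# Returns true only if Rake has a task registered under this exact name.
def task_available?(name)
  Rake::Task.task_defined?(name)
end

puts task_available?('demo:status')
puts task_available?('demo:mark_all_jobs_as_succeeded')

begin
  # Looking up an unregistered task raises the error seen above.
  Rake::Task['demo:mark_all_jobs_as_succeeded']
rescue RuntimeError => e
  puts e.message
end
```

On a live instance, listing the registered tasks with Rake's standard -T flag before relying on a helper is probably the simpler check, assuming the gitlab-rake wrapper forwards that flag.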
Step 3: Inspect batched_background_migrations directly
Since the helper didn’t exist, the next step was to look directly at the batched_background_migrations table in PostgreSQL.
Connect to the DB:
sudo gitlab-psql -d gitlabhq_production
Then:
SELECT id, job_class_name, status
FROM batched_background_migrations
WHERE job_class_name = 'FixNonExistingTimelogUsers';
The result:
 id  |       job_class_name       | status
-----+----------------------------+--------
 548 | FixNonExistingTimelogUsers |      5
(1 row)
According to GitLab’s enum mapping, status = 5 means finalized, which confirmed the background migration was already considered done.
Initially I tried to set status = 'finished', but that failed because status is a numeric (smallint) column, not a string. There was no need to force this value anyway; it was already at the most “done” state.
Step 4: Identify the real culprit – the finalize migration
db:migrate kept trying to run this migration:
== 20250916232115 FinalizeHkFixNonExistingTimelogUsers: migrating
The traceback showed it calling ensure_batched_background_migration_is_finished, which in turn tried to run another batch job for FixNonExistingTimelogUsers, even though its status was already finalized. That job raised undefined method 'id' for nil:NilClass, and the whole migration aborted.
So the root cause was:
- A post‑migration (FinalizeHkFixNonExistingTimelogUsers) that was incorrectly trying to execute a finalized background migration and hitting a data edge case.
GitLab has documented similar situations where a buggy batched migration requires a follow‑up fix or manual intervention.
Step 5: Mark the finalize migration as applied in schema_migrations
On my GitLab version, schema_migrations only has a version column (no dirty flag), so the simplest fix was to tell Rails, “this finalize migration is already applied.”
Again in psql:
sudo gitlab-psql -d gitlabhq_production
Check if the version exists:
SELECT version
FROM schema_migrations
WHERE version = '20250916232115';
If the query returns no rows, insert it:
INSERT INTO schema_migrations (version)
VALUES ('20250916232115')
ON CONFLICT (version) DO NOTHING;
Then exit:
\q
This explicitly marks 20250916232115 FinalizeHkFixNonExistingTimelogUsers as already applied, so db:migrate will skip it entirely.
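The reason this works is that Rails treats a migration as pending only when its version is missing from schema_migrations. Here is a simplified model of that check; the pending_versions helper and every version other than 20250916232115 are made up for illustration, and the real Rails implementation is more involved.

```ruby
require 'set'

# Simplified model: pending = versions on disk that are not recorded as applied.
def pending_versions(on_disk, applied)
  applied_set = applied.to_set
  on_disk.reject { |version| applied_set.include?(version) }.sort
end

# Hypothetical version lists; only 20250916232115 comes from this post.
on_disk = %w[20250901000000 20250916232115 20250920000000]
applied = %w[20250901000000]

# Before the INSERT, the finalize migration is still pending.
puts pending_versions(on_disk, applied).inspect

# After the INSERT, it drops out of the pending set and is never executed.
puts pending_versions(on_disk, applied + ['20250916232115']).inspect
```

Note that the migration body never runs in this model, so anything else it would have done is skipped as well.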
Step 6: Rerun migrations and reconfigure
After inserting that row, I reran:
sudo gitlab-rake db:migrate
sudo gitlab-ctl reconfigure
This time:
- db:migrate did not print == 20250916232115 FinalizeHkFixNonExistingTimelogUsers: migrating.
- The undefined method 'id' for nil:NilClass error disappeared.
- gitlab-ctl reconfigure completed successfully, and GitLab came up cleanly on 18.5.1.
Lessons learned
A few takeaways from this troubleshooting:
- Batched background migrations can block upgrades. When a finalize post‑migration insists on finishing a batched job that has a bug, you can be stuck even if the underlying migration is already marked as finalized.
- GitLab’s helper Rake tasks differ by version. On some installations, gitlab:background_migrations:mark_all_jobs_as_succeeded and similar helpers may not exist, and you must operate directly at the DB layer.
- schema_migrations is the source of truth for Rails migrations. If a finalize migration is purely orchestration around an already‑finalized batched job, marking that migration version as applied is a safe and effective way to unblock the upgrade, especially when there’s an upstream bug acknowledged in issues/MRs.
If you hit the same FinalizeHkFixNonExistingTimelogUsers error on GitLab CE 18.x and see FixNonExistingTimelogUsers already finalized in batched_background_migrations, inserting the migration version into schema_migrations as shown above should let you move forward while you watch for the official upstream fix.