How would you approach use cases where the workers need to "commit" their result outside of the database? For example, backing up a table to a file system. Say the task is to back up table foo to the file /backups/foo.csv. The worker can write the backup to a temp file and then rename it to the final name /backups/foo.csv, so that only one worker succeeds. Unfortunately, the worker can't do that in the same transaction that deletes the work item. I guess one approach would be to rename the temp file before deleting the item from the database, actually "committing" the work before the item is deleted. If the worker dies immediately afterwards, the next worker that picks up the work item would notice that the work has already been done by checking for the existence of /backups/foo.csv, and would simply delete the item from the table.
I am curious what your thoughts are on the above.
Yeah, the fact that the backup is itself an idempotent operation means that there's no hazard here from the interaction with the external system. Even if the second iteration could not detect that the first iteration had completed its work, everything would still be okay: it would just redo the backup. Detecting the file works because rename is an atomic operation, so the presence of the final file does indicate complete success.
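To make the ordering concrete, here is a minimal sketch of the rename-then-delete approach discussed above. It assumes a SQLite connection, a hypothetical `work_queue` table with an `id` column, and a trusted table name; none of those details come from the article, and the helper names are made up for illustration.

```python
import csv
import os
import sqlite3
import tempfile


def backup_table(conn, table, final_path):
    """Back up `table` to `final_path`. Returns False if already done."""
    # A previous worker may have renamed the file into place and then died
    # before deleting the work item; the final file marks complete success.
    if os.path.exists(final_path):
        return False

    # Create the temp file in the destination directory so the rename stays
    # on one filesystem, where os.replace is atomic on POSIX.
    directory = os.path.dirname(final_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as f:
            writer = csv.writer(f)
            # Assumption: `table` is a trusted identifier, not user input.
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            writer.writerow([col[0] for col in cursor.description])
            writer.writerows(cursor)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, final_path)  # the external "commit"
    except BaseException:
        # Leave no partial temp file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return True


def process_item(conn, item_id, table, final_path):
    """Do the external work first, then delete the work item."""
    backup_table(conn, table, final_path)
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM work_queue WHERE id = ?", (item_id,))
```

Note that `os.replace` overwrites an existing destination, so two workers racing past the existence check would both succeed. That is harmless here precisely because the backup is idempotent.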
The problem becomes much trickier if the task instead has non-idempotent side effects. Coordinating with sending a push notification, for example, is a real bummer. Sometimes those sorts of systems provide a mechanism that makes publishing the event idempotent. For example: documentation.onesignal.com/docs/i....
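As a rough illustration of that kind of mechanism, the sketch below derives a deterministic idempotency key from the work item's id, so a retry after a crash reuses the same key. `FakePushProvider`, `idempotency_key`, and `notify_for_item` are hypothetical names invented here; the provider is an in-memory stand-in and not any real service's API.

```python
import uuid

# Arbitrary fixed namespace so the same work item always yields the same key.
NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def idempotency_key(item_id):
    """Derive a stable key from the work item's id."""
    return str(uuid.uuid5(NAMESPACE, f"work-item-{item_id}"))


class FakePushProvider:
    """In-memory stand-in for a service that deduplicates by key."""

    def __init__(self):
        self.delivered = {}

    def send(self, key, message):
        # A repeated key is acknowledged but not delivered a second time.
        if key in self.delivered:
            return False
        self.delivered[key] = message
        return True


def notify_for_item(provider, item_id, message):
    # Safe to retry: a worker that crashed after sending but before
    # deleting the work item will resend with the same key.
    return provider.send(idempotency_key(item_id), message)
```

The key has to come from something stable about the work item rather than from a fresh random value per attempt, otherwise the second worker's retry would look like a new event to the provider.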