Recently, I've been working on a side project that allows users to manage and schedule tasks. Initially, I thought that the database design would be pretty simple. But, I quickly realized that handling recurring events is much more complex than I initially anticipated. Recurrence is not easy for applications to deal with. It is a herculean task, especially when you consider all possible recurring scenarios - including creating bi-weekly or quarterly events or allowing the rescheduling of all future event instances. So, How should we handle the database storage for recurring events?
In fact, this problem is pretty similar to the system design interview question:
Consider Google Calendar's recurring events feature. In particular, the ability of a user to create an ad-hoc or recurring event.
In case of recurring events, edit either individual instances of the event or all the remaining instances of the event
How would you design a schema to store event data to support such a feature?
Requirements:
Users are allowed to create regular and recurring events.
Daily, weekly, bi-weekly, monthly, quarterly, bi-yearly, and yearly events can be created with no end date restrictions.
Users can reschedule or cancel an instance of an event or all future instances of an event.
Changes to schedules should not update past events
Naive approach
In the naive approach, the instinct was to store every single instance of a recurring event separately. This led to the creation of two tables, namely tasks and schedules, as illustrated in the schema below:
Here, a user could create a schedule with a set frequency, choosing from options like weekly, daily, or monthly. A background cron job would then generate associated tasks based on the schedule.
Pros of the Naive Approach:
- Straightforward implementation.
Cons of the Naive Approach:
- Significant space requirements, especially for a large user base.
- Cumbersome updating process, impacting application performance.
- Complex handling of exceptions, especially in cases of rescheduling.
- How do we handle recurring events with no end_date ?
Creating a database entry for each instance of a recurring task seems unacceptable, because we want infinite forward visibility, and because when a user edits the parameters of a recurring task (e.g. changes it from daily to weekly), we have to update all future tasks. This results in operations that are simply too computationally intensive.
Refined approach
This approach envisions infinite sequences of recurring events without cluttering the database. Here's how it works:
Schedule essentially defines infinite sequences of recurring events, which are not represented in the database, but are visible and editable via the UI
Exclusion Records for Deletions: Deleting a specific instance prompts the creation of an "exclusion" record for that day. This prevents unnecessary generation of instances in the recurrence sequence.
Handling Edits with Exclusion Records: Editing a specific instance triggers the creation of an "exclusion" record for that day. Simultaneously, a database record is generated for the edited instance, seamlessly transforming it into a regular task.
Historical Records for Completed Tasks: Completed or past-due tasks get recorded as historical records. This not only preserves a comprehensive history but also prevents miscalculations when editing the entire schedule in the future.
The refined approach avoids the computational intensity of creating a separate database entry for each instance of a recurring task. Instead, it ensures infinite forward visibility by generating instances programmatically based on recurring patterns. We could also improve on read performance by introducing a caching layer.
That's it! Let me know if you've any suggestions for this design, I'd love to hear and learn more!
Helpful Links
https://martinfowler.com/apsupp/recurring.pdf
Top comments (0)