DEV Community

Cover image for SQL Antipatterns: Jaywalking
Monir Hossain
Monir Hossain

Posted on

SQL Antipatterns: Jaywalking

Introduction

Antipatterns are techniques that are implemented to solve a specific problem but rather they create new problems.

Antipatterns are commonly used to solve common development problems. At first glance, they might seem like appropriate and clever solutions but instead, they create more unwanted coincidences than good ones. The term "antipattern" was first coined by Andrew Koenig in his paper "Patterns and Antipatterns" in 1995. The concept of design patterns inspired him. In his paper, he refers to antipatterns as

"An antipattern is just like a pattern, except that instead of a solution, it gives something that looks like a solution but isn’t one."

SQL Antipatterns refer to antipatterns that we use to solve database design problems.

Jaywalking

Assume You are working on a Ticketing system where each ticket is handled by one employee. when a Ticket is raised by the customer the customer care department assigns the ticket to the responsible Employee to resolve customer issues. The process is fairly simple however often customer raises a ticket where multiple people are needed to solve the issue. Since your system can only assign one employee to a ticket the work of the other employees is not reflected on their monthly KPI. So you are assigned to fix the issue. You always try to avoid joining since they can increase query time so you came up with a clever solution to use a comma-separated list of user identifiers instead of the single identifier it used before.

However, soon your boss came up to you complaining that they can't assign more than 5 users. If they try to add more. they get an error. After examining the error you understand the list of IDs has to fit in a string with maximum length, If the maximum length exceeds an exception arises.

This method of using a comma-separated list to avoid using an intersection table for a many-to-many relationship is called Jaywalking, because jaywalking is also an act of avoiding an intersection.

Objective

Initially, You have a Ticketing system where each ticket is assigned to one user. Storing the data of one user is fairly simple. You associate each ticket with a user using the UUID column in the Ticket table. So each user may handle many tickets but each ticket will be assigned to one user. As the product matures you understand that often it is necessary to assign tickets to multiple users.

Image description

Fig.1 - Initial Design

Antipattern: Format Comma Separate List

To minimize changes in the database and avoid joining you decide to alter the "assiged_user" field to a comma-separated list to store the list of assiged users.

Image description

Fig.2 - Antipattern Design

This might seem like a good solution since you avoiding adding new tables and columns but let's look at some of the performance and integrity issues this design will suffer from.

  • Querying Ticket for a Specific User: It is difficult to query Tickets for a specific user when foreign keys are combined into a single field. Straight Forward equality check will not work. You have to use pattern matching and will not get the benefit of indexing. Pattern matching expression is different for each vendor, so your query will not be vendor-natural.
  • Querying User For a Given Ticket: Likewise It is costly to join User and Ticket Table when User IDs are represented as a comma-separated list. You have to use awkward regular expressions to match rows in different tables.
  • Making Aggregate Query: Aggregate functions are designed to work over a list of rows, not on comma-separated list, so It will be tricky to use them in code. moreover, these tricks will make queries less clear, time-consuming to develop, and hard to debug.
  • Updating User for a Specific Ticket: To assign a ticket you can contact the ID end of the string but the list will not be in sorted order. To Delete a user from a ticket you need to perform two queries one to fetch the current comman-seperated ID list and another to update the row with a new list.
  • Validating User IDs: Since IDs do not refer to any field it is possible to add garbage data and lose data integrity.
  • Choosing a Separator Character: It is also necessery to choose separator cautiously since the separator can not appear in any IDs.
  • List Length Limitation: How many IDs can be added to the list is not fixed since It is based on ID size. If the maximum length is 30 and each ID consist of 2 characters you will manage to add boos10 User to a ticket but if the size of the ID is not fixed the number above will vary.

There are some legitimate uses of this pattern. like your system might need data in a comma-separated format and you don't need to access individual items in a list. Likewise, if your application receives data in comma-separated format you just need to store them as it is.

Solution

A better solution would be to create an intersection table "TicketAssignee" of the Ticket and User table that indicates the Many-to-Many relation between User and Ticket.

Image description

Fig.3 - Intersection Table Design

Let's see how the intersection table solves all the problems we see in Antipattern.

  • Querying Tickets by User and the Other Way Around: With intersection table query on Tickets and TicketAssigneee table is much straight forward. With indexing this query will perform better than the query on a common-separated string.
  • Making Aggregate Queries: Aggregate functions can be used to generate more sophisticated reports.
  • Updating Ticket Assignee for a Specific Ticket: Add and Delete Ticket Assignee can be possible by inserting and deleting rows in TicketAssignee Table.
  • Validating User IDs: Invalid user ID add is not possible since ForeignKey validates the value against valid entries.
  • Choosing a Separator Character: Choosing a separator character is not necessary since each entry is stored in separate rows.
  • List Length Limitations Since entries are stored in separate rows. The list is only limited by the maximum number of rows that can physically exist in one table.

Reference

Top comments (0)