DEV Community

Cover image for Arrays in SQL? Yay or Nay?
66np
66np

Posted on

Arrays in SQL? Yay or Nay?

Photo by Naim Benjelloun

Disclaimer: Coming from a dev perspective, NOT DBA.

I'm visiting this topic for an nth time, and decided to illustrate use cases (with and without), and come up with a pros/cons list.

Problem: In a database for a social networking platform, suppose I need to organize the data for a user's friends.

This can be done in 1 of 2 ways:

1. Within a singular table ('user profile')

| id | name | friends_id |
--------------------------
| 1  | jess | {3, 4}     |
| 2  | mike |            |
| 3  | raj  | {1}        |
| 4  | sam  | {1}        |
Enter fullscreen mode Exit fullscreen mode

2. Split into two tables ('user_profile', 'user_friends')

| id | name |
-------------
| 1  | jess |
| 2  | mike |
| 3  | raj  |
| 4  | sam  |

| id | user_id | friend_id |
----------------------------
| 1  | 1       | 3         |
| 2  | 1       | 4         |
| 3  | 3       | 1         |
| 4  | 4       | 1         |
Enter fullscreen mode Exit fullscreen mode

PROS:

  • Spatial locality allows us to minimize the number of query calls to the DB which can be a bit of performance booster (i.e. INSERT INTO user_profile (name, friends_id) VALUES ('jess', ARRAY[3, 4]); VS. INSERT INTO user_profile (name) VALUES ('jess'); INSERT INTO user_friends (user_id, friend_id) VALUES (1, 3); INSERT INTO user_friends (user_id, friend_id) VALUES (1, 4);)
  • Reduces data redundancy (i.e. the multiple records for user id 1 under table 'user_friends')
  • Can be especially useful when processing large amounts of data (i.e. PostGIS point data, ML datasets)

CONS:

  • Breaks database normalization rules like 1NF (First Normal Form) being a multi-valued attribute which introduces CRUD anomalies (i.e. UPDATE user_profile SET friends_id = array_append(friends_id, 2) WHERE id = 1; VS. INSERT user_friends (user_id, friend_id) VALUES (1, 2);)
  • Breaks database normalization rules like introducing null values where the data can be normalized (i.e. User with id 2 in the first example has 0 friends, consequently nulling the friends_id column. This could've been prevented by introducing the second table 'user_friends'.)
  • Does not allow for modularity or new information per entry within the array (What if I wanted to date when each friendship was initiated through the app? Order the data based on this new attribute?)
| id | name |
-------------
| 1  | jess |
| 2  | mike |
| 3  | raj  |
| 4  | sam  |

| id | user_id | friend_id | created_at |
-----------------------------------------
| 1  | 1       | 3         | 2020-09-02 |
| 2  | 1       | 4         | 2020-03-20 | 
| 3  | 3       | 1         | 2020-09-02 |
| 4  | 4       | 1         | 2020-03-20 |
Enter fullscreen mode Exit fullscreen mode
  • Not a lot of SQL Engines and ORMS are optimized to handle arrays and lists (i.e. from experience, the schema file generated by Diesel ORM for Rust tends to mess up on optional array fields, as well as reported issues on migrations to MySQL/SQLite)

What are your thoughts?

Top comments (1)

Collapse
 
darkain profile image
Vincent Milum Jr

Use JSON syntax rather than your own custom syntax for handling ARRAYs, as most SQL engines today can natively handle that. This also solves most of the issues, as they include methods to insert/remove items from a JSON object, rather than writing a whole new object. Some database system, like MySQL 8.0, can even index the individual elements of an ARRAY.