DEV Community

Cover image for SlimUUID: The Compact, Memory-Efficient Alternative to Standard UUIDs
DHRUV MEHTA
DHRUV MEHTA

Posted on • Edited on

5 1

SlimUUID: The Compact, Memory-Efficient Alternative to Standard UUIDs

Lets start with a Banger that is Benchmark testing : Fast , Best are slimUUID , Last one is Google UUID .

BenchMark Testing

Table of contents :

  1. Introduction
  2. Existing Google UUID
  3. Introducing SlimUUID
  4. Performance Testing
  5. Collison Probabilty
  6. Additional Advantages of SlimUUID

1. Introduction

See the title image for a quick comparison if you're looking for a better memory-efficient UUID alternative!

In today's data-driven world, unique identifiers are essential for managing records in databases. However, traditional UUIDs (Universally Unique Identifiers) can be bulky and inefficient. SlimUUID is a compact, memory-efficient alternative designed to reduce database storage costs and enhance performance. As you guys have lot to read so , I shall get to the point directly.

A few days back I was working on a Database Model which involved the below definition:

type Mcq struct {
        gorm.Model
    ID      string   `gorm:"type:uuid,primaryKey"`
    Answer      string   `gorm:"not null"`
    StudentID   string   `gorm:"type:uuid,index:idx,composite"`
    ContestID   string   `gorm:"type:uuid,index:idx,composite" 
    Student     Student     
    Contest     Contest     
}
Enter fullscreen mode Exit fullscreen mode

As you can see there are ID , ContestID , StudentID 3 UUIDs are present . and these all are indexed .

  • ID as primary Key (btree default )
  • ContestID , StudentID (composite can't be hashed in hash table !!)

Now while calculating and maintaining my DB's space I encountered a sad :(((( , no worse :(((( , actually evil fact that out of my table's total most of the space was taken by attributes which are just helper , which actually doesn't contain actual data .

single entry : ( 36 + 36 + 36 + 10 : base) + (64 + 64 + 64 : index)
             :  310 most of this is index 

index calculation : 36 (string) + 8 (pointer) + 16 (node) + 4 (pad)

Enter fullscreen mode Exit fullscreen mode

I had questions :

  • Isnt all this making my DB slow ?
  • Isnt this taking lot of space ?
  • Why do i need time stamps like created_at ?
  • Can't I do something ?

I've already decoded these riddles, so if you're stuck in the same loop of confusion, welcome to the club . This blog post guides you through how SlimUUID addresses these database inefficiencies by being a memory-efficient UUID generator.

2. Existing Google UUID

Here's a quick overview of standard UUIDs and their limitations , if you wanna read more about code and functions go to : Github

  1. UUID is an Unique Identifier used as a primary key or unique key for identifiaction .
  2. Have different versions v3,v4,v6 for various reasons .
  3. Has 32 chars with 4 hyphens that makes total of 36 char string .

Problems I encountered :

  1. Takes a lot of space , and if by chance , by chance (no i didn't make a mistake by rrewriting it.... did it intensionalyyy) made compositely indexed with other attribute then , This would significantly increases storage costs and slows down database performance .

Khatam Tata , GoodBYE !!!

  1. Do not accept hash types like type-pattern ops , cids , any available in pg admin .

3. Introducing SlimUUID

To combat inefficiencies, the SlimUUID was created!

Phase1 : Integer Hashes
What about Integer hashes ?
Isn't the query fastest in it ... So just make Time.Now() timestamp for milliseconds or nanoseconds till now simple , fast retrieval .
Problem in this approach :

  1. Clock Synchronization in Different systems
  2. Sql generates data in microseconds , so even if we consodered microseconds => 13-14 digits and increasing digits every microsecond , loose control of fix size primary key (for hashing)

Phase2 : MD5 Hashes
So why not take any MD5 hash or something that can help , unique always (Yes!!!!!!!!) Right ???
Problem :

  1. Not at all MD5 hashes and most of the cryptographic hashes are even larger than 32 characters so it would be needed to truncated leading to exponentially high collision risk .

The Final Solution :

Then I saw the created_at timestamp with my id in DB. Then I came with the ID why not just make something with mixing phase1 and phase2:

Then I thought of a solution :

Timetamp + MD5 hash

This is same the way how uuids by google are made 32 character have blocks of different time stamps and needs .
Problems :

  1. timestamp for 30 years or more already consume 13-15 digits .
  2. MD5 hashes takes time to build and also truncating it too .

Then at that moment slimUUID was made as I clicked two things for corresponding problem , each character in timestamp was taking '0-9' chracters potentially it could hold more ..... Huh...

_Eureka !!! Eureka !!!!! I found the answer

So, I changed it and created my own format (as shown in the code window). This new format directly reduced the 13 characters to 7-8 (depending on the precision needed). I had plenty of space left, so I generated another hash using MD5 from the remaining 12 characters.

However, MD5 was cryptographic, but after reading about it, I came across Murmur3 which is fast but not cryptographically secure backward (and I didn't need that security either Heheheheeh).

So murmur3 is used to generate has with two parts because Murmur3 offers a good balance of speed and low collision risk, making it ideal for this purpose , which are then combined with conversion into binary encoding , after these two timestamp and murmur3hash has been a combined a 19 character string is formed . This hash can not clash after a millisecond has passed . But in micro , nano second time line it can be clashed but that is mitigated byr Random hash generated by Murmur3's truncated part.

This brought down my generation time from about 50 microseconds to 20 microseconds. Then, with the increasing number of systems, what to do? So, I added a character for each machine, which now can handle up to 62 concurrent machines uniquely .

Character Set:

Base-62: 0-9 (10 chars) + a-z (26 chars) + A-Z (26 chars) = 62 total.
Structure:

Year (1 char, 62 possible values, starting from 2025).
Month (1 char, up to 12 possible values).
Day (1 char, up to 31 possible values).
Hour (1 char, up to 24 possible values).
Minute (1 char, up to 60 possible values).
Millisecond (2 chars, derived from current second + millisecond).
Optional Machine ID (1 char, up to 60 devices).
Murmur3 Hash (8-12 chars).

Total Characters:

19 characters (without machine ID).
20 characters (with optional machine ID).
Enter fullscreen mode Exit fullscreen mode

For whole code and testing u can use

import "github.com/theMitcondria/slimuuid"

func main(){
     id := slimuuid.Generate()
     // output example : 2mV5F7b6y0fHXJK9tz
}

// or visit github for the code base
Enter fullscreen mode Exit fullscreen mode

Quick Recap of SlimUUID features.

  • Reduced Storage Space
  • Significatly Impoved performace speed for data generation .

4. Performance Testing

I started writing the code , began testing on a code base to see how much better it was performing. I had a simple setup where there was just one ID and one foreign key, along with other attributes , without this adjustment of my hashed key the statistics were ,

70k entries took 2-3ms
for 70k entries resulting in => 1.5-2 ms

how much better it was performing with 7lakh entries (10 times ) with hashed 19 character slimuuid :

7lakh entries (10 times ) with hashed 19 character slimuuid

)

Last but not at all the Least : Memory-Efficiency

type Mcq struct {
    ID      string   `gorm:"type:uuid,primaryKey"`
    Answer      string   `gorm:"not null"`
    StudentID   string   `gorm:"type:uuid,index:idx,composite"`
    ContestID   string   `gorm:"type:uuid,index:idx,composite" 
    Student     Student     
    Contest     Contest     
}
// same model which neede 310 bytes now needs:
=> (19 + 19 + 19 + 10 Base) + (39 + 39 + 39 ) : 184 

// index size 
=> 19 + 8 (pointer) + 4(collision mitigation ) + 8 (hash value): 39

// Memory Efficiency 
=> 184/310 : 60%

Enter fullscreen mode Exit fullscreen mode

The adoption of SlimUUID has resulted in reducing memory efficiency by almost 60% .

5. Collison Probabilty

Then for surety I checked on various models for collision probabilty with birtthday formula :

Results :

Collison Overview

Huhhh !!! Stillllll , I thought I need an exact number ....

Collison Probability

6. Additional Advantages of SlimUUID

  1. Was able to implement hash with SlimUUID with pattern-ops which UUID with Postgres didn't allow.
  2. After changing to hashed, a sudden performance increase of 15% on average was seen in response time when calculated for searching. This enhancement underscores the potential of SlimUUID as a fast UUID alternative.
  3. SlimUUIDs embed timestamp information, eliminating the need for a separate 'created_at' column. Hence can do search with "character%" search in DB. This means the speed gets highly improved for searching Do check out : github.com/theMitocondria/slimuuid

If you're looking for UUID alternatives or a better space efficient alternatives for UUID, then slimUUID is your thing!

Any suggestions or pull requests are appreciated . And do read ReadME.md and use it once . https://github.com/theMitocondria/slimuuid

This is my first blog and I am an amateur in DBs , and optimization so if any of this can be rewritten or changes needs to be done i will be highly obliged to do so .

Thanks a lot for giving it a read 😊

Top comments (0)

Sentry image

See why 4M developers consider Sentry, “not bad.”

Fixing code doesn’t have to be the worst part of your day. Learn how Sentry can help.

Learn more