Lets start with a Banger that is Benchmark testing : Fast , Best are slimUUID , Last one is Google UUID .
Table of contents :
- Introduction
- Existing Google UUID
- Introducing SlimUUID
- Performance Testing
- Collison Probabilty
- Additional Advantages of SlimUUID
1. Introduction
See the title image for a quick comparison if you're looking for a better memory-efficient UUID alternative!
In today's data-driven world, unique identifiers are essential for managing records in databases. However, traditional UUIDs (Universally Unique Identifiers) can be bulky and inefficient. SlimUUID is a compact, memory-efficient alternative designed to reduce database storage costs and enhance performance. As you guys have lot to read so , I shall get to the point directly.
A few days back I was working on a Database Model which involved the below definition:
type Mcq struct {
gorm.Model
ID string `gorm:"type:uuid,primaryKey"`
Answer string `gorm:"not null"`
StudentID string `gorm:"type:uuid,index:idx,composite"`
ContestID string `gorm:"type:uuid,index:idx,composite"
Student Student
Contest Contest
}
As you can see there are ID , ContestID , StudentID 3 UUIDs are present . and these all are indexed .
- ID as primary Key (btree default )
- ContestID , StudentID (composite can't be hashed in hash table !!)
Now while calculating and maintaining my DB's space I encountered a sad :(((( , no worse :(((( , actually evil fact that out of my table's total most of the space was taken by attributes which are just helper , which actually doesn't contain actual data .
single entry : ( 36 + 36 + 36 + 10 : base) + (64 + 64 + 64 : index)
: 310 most of this is index
index calculation : 36 (string) + 8 (pointer) + 16 (node) + 4 (pad)
I had questions :
- Isnt all this making my DB slow ?
- Isnt this taking lot of space ?
- Why do i need time stamps like created_at ?
- Can't I do something ?
I've already decoded these riddles, so if you're stuck in the same loop of confusion, welcome to the club . This blog post guides you through how SlimUUID addresses these database inefficiencies by being a memory-efficient UUID generator.
2. Existing Google UUID
Here's a quick overview of standard UUIDs and their limitations , if you wanna read more about code and functions go to : Github
- UUID is an Unique Identifier used as a primary key or unique key for identifiaction .
- Have different versions v3,v4,v6 for various reasons .
- Has 32 chars with 4 hyphens that makes total of 36 char string .
Problems I encountered :
- Takes a lot of space , and if by chance , by chance (no i didn't make a mistake by rrewriting it.... did it intensionalyyy) made compositely indexed with other attribute then , This would significantly increases storage costs and slows down database performance .
Khatam Tata , GoodBYE !!!
- Do not accept hash types like type-pattern ops , cids , any available in pg admin .
3. Introducing SlimUUID
To combat inefficiencies, the SlimUUID was created!
Phase1 : Integer Hashes
What about Integer hashes ?
Isn't the query fastest in it ... So just make Time.Now() timestamp for milliseconds or nanoseconds till now simple , fast retrieval .
Problem in this approach :
- Clock Synchronization in Different systems
- Sql generates data in microseconds , so even if we consodered microseconds => 13-14 digits and increasing digits every microsecond , loose control of fix size primary key (for hashing)
Phase2 : MD5 Hashes
So why not take any MD5 hash or something that can help , unique always (Yes!!!!!!!!) Right ???
Problem :
- Not at all MD5 hashes and most of the cryptographic hashes are even larger than 32 characters so it would be needed to truncated leading to exponentially high collision risk .
The Final Solution :
Then I saw the created_at timestamp with my id in DB. Then I came with the ID why not just make something with mixing phase1 and phase2:
Then I thought of a solution :
Timetamp + MD5 hash
This is same the way how uuids by google are made 32 character have blocks of different time stamps and needs .
Problems :
- timestamp for 30 years or more already consume 13-15 digits .
- MD5 hashes takes time to build and also truncating it too .
Then at that moment slimUUID was made as I clicked two things for corresponding problem , each character in timestamp was taking '0-9' chracters potentially it could hold more ..... Huh...
_Eureka !!! Eureka !!!!! I found the answer
So, I changed it and created my own format (as shown in the code window). This new format directly reduced the 13 characters to 7-8 (depending on the precision needed). I had plenty of space left, so I generated another hash using MD5 from the remaining 12 characters.
However, MD5 was cryptographic, but after reading about it, I came across Murmur3 which is fast but not cryptographically secure backward (and I didn't need that security either Heheheheeh).
So murmur3 is used to generate has with two parts because Murmur3 offers a good balance of speed and low collision risk, making it ideal for this purpose , which are then combined with conversion into binary encoding , after these two timestamp and murmur3hash has been a combined a 19 character string is formed . This hash can not clash after a millisecond has passed . But in micro , nano second time line it can be clashed but that is mitigated byr Random hash generated by Murmur3's truncated part.
This brought down my generation time from about 50 microseconds to 20 microseconds. Then, with the increasing number of systems, what to do? So, I added a character for each machine, which now can handle up to 62 concurrent machines uniquely .
Character Set:
Base-62: 0-9 (10 chars) + a-z (26 chars) + A-Z (26 chars) = 62 total.
Structure:
Year (1 char, 62 possible values, starting from 2025).
Month (1 char, up to 12 possible values).
Day (1 char, up to 31 possible values).
Hour (1 char, up to 24 possible values).
Minute (1 char, up to 60 possible values).
Millisecond (2 chars, derived from current second + millisecond).
Optional Machine ID (1 char, up to 60 devices).
Murmur3 Hash (8-12 chars).
Total Characters:
19 characters (without machine ID).
20 characters (with optional machine ID).
For whole code and testing u can use
import "github.com/theMitcondria/slimuuid"
func main(){
id := slimuuid.Generate()
// output example : 2mV5F7b6y0fHXJK9tz
}
// or visit github for the code base
Quick Recap of SlimUUID features.
- Reduced Storage Space
- Significatly Impoved performace speed for data generation .
4. Performance Testing
I started writing the code , began testing on a code base to see how much better it was performing. I had a simple setup where there was just one ID and one foreign key, along with other attributes , without this adjustment of my hashed key the statistics were ,
how much better it was performing with 7lakh entries (10 times ) with hashed 19 character slimuuid :
)
Last but not at all the Least : Memory-Efficiency
type Mcq struct {
ID string `gorm:"type:uuid,primaryKey"`
Answer string `gorm:"not null"`
StudentID string `gorm:"type:uuid,index:idx,composite"`
ContestID string `gorm:"type:uuid,index:idx,composite"
Student Student
Contest Contest
}
// same model which neede 310 bytes now needs:
=> (19 + 19 + 19 + 10 Base) + (39 + 39 + 39 ) : 184
// index size
=> 19 + 8 (pointer) + 4(collision mitigation ) + 8 (hash value): 39
// Memory Efficiency
=> 184/310 : 60%
The adoption of SlimUUID has resulted in reducing memory efficiency by almost 60% .
5. Collison Probabilty
Then for surety I checked on various models for collision probabilty with birtthday formula :
Results :
Huhhh !!! Stillllll , I thought I need an exact number ....
6. Additional Advantages of SlimUUID
- Was able to implement hash with SlimUUID with pattern-ops which UUID with Postgres didn't allow.
- After changing to hashed, a sudden performance increase of 15% on average was seen in response time when calculated for searching. This enhancement underscores the potential of SlimUUID as a fast UUID alternative.
- SlimUUIDs embed timestamp information, eliminating the need for a separate 'created_at' column. Hence can do search with "character%" search in DB. This means the speed gets highly improved for searching Do check out : github.com/theMitocondria/slimuuid
If you're looking for UUID alternatives or a better space efficient alternatives for UUID, then slimUUID is your thing!
Any suggestions or pull requests are appreciated . And do read ReadME.md and use it once . https://github.com/theMitocondria/slimuuid
This is my first blog and I am an amateur in DBs , and optimization so if any of this can be rewritten or changes needs to be done i will be highly obliged to do so .
Thanks a lot for giving it a read 😊
Top comments (0)