DEV Community

ZeeshanAli-0704
ZeeshanAli-0704

Posted on • Updated on

How do you design a URL shortening service like TinyURL or bit.ly?

TinyURL or bit.ly takes a long URL and generates a new unique short URL. These systems are also capable of taking the shortened URL and returning the original full URL.

Step 1: Functional Requirement + Non Functional Requirement & Problems Consideration:

  • Generate a short URL having a length shorter than the original URL.
  • Store the original URL and map it to the shortened one.
  • Allow redirects in the shortened URLs.
  • Handle multiple requests at the same time.

Step 2: What are some of the Common Problems encountered?

  • What if two users input the same custom URL?
  • What happens if there are more user load than expected?
  • How do you regulate the database storage space?

Step 3:Non-Functional Requirements

  • Highly Available
  • Url should not be predictable
  • NoSQL databases for storing original URLs.

Step 4: Estimations

Read to write ratio : 100:1 => 100 Read(redirection) : 1 write

Lets say we have 1 Million DAU - 1,000,000 writes / per day (writes)
Request per sec = 1MDAU /10^5 = 10 RPS for writes

Reads per second
As ratio is 100:1 = 100X10 = 1000 Reads per second
Total 1000+10 = 1010 R/W request/seconds

STORAGE
let say size is 20KB
New data per day = 20KB _ 1M (writes) = 20 (10^3)b _ 1 (10^6) = 20x10^9 = 20GB/day

Retention Data =
5Years _ 400 Days (365 rounded off to 400) _ 20GB
= 5*400*20GB

= 2k*20GB
= 2x(10^3) * 20*(10^9)
= 2x20*(10^12)
= 40 TB

BANDWIDTH

Writes
we have => 10 writes/sec x 20KB = 200 KB/sec

Reads
we have => 1000 read/sec x 20KB = 20X1000X10^3 = 20x10^3x10^3 = 20 MB/sec

Step 5 : Encoding Method & Options

We can compute a unique hash (e.g., MD5 or SHA256, etc.) of the given URL.
The hash can then be encoded for displaying.

This encoding could be base36 ([a-z, 0-9]) or base62 ([A-Z, a-z, 0-9]) and if we add ‘-’ and ‘.’, we can use base64 encoding.

Example Base 62 Method
Using Base 62 Algo:(to generate 7 character unique str)

Will go with base 62 -> what is base 62 explained belowUsing 62 will give large no of unique combination

We will take
a-z => 26
A-Z => 26
0-9 => 10

Total 62 chars

let say for
1 we have 62 chars option to select
2 we have 62 chars * 62 chars ( 62 ways to select each char)

n 62^n ( 62 ways to select each)

62^n unique combination

In our case we have 7 chars so 62^7 3.5 Trillion combination

Why are we not using Base 10:
Reason for not using base 10
10^7 => 10 Millions unique combination only

Step 6 : Few Technique

technique 1 : Random Number Generator

generate 7 digit random number => pass this to base 64 =>Create tiny url with this => save to DB
Will have to check if that random number is already present, if not add
but different server will check random number & may be possible two req at same time generate same random & save it data goes corrupt
issue :
same random number possibility is high

technique 2 : Using MD5 but output is length & duplicate possible

MD5 Hashing
MD5 Encoding: MD5 also gives base62 output but the MD5 hash gives a lengthy output which is more than 7 characters. MD5 hash generates 128-bit long output so out of 128 bits we will take 43 bits to generate a tiny URL of 7 characters. MD5 can create a lot of collisions. For two or many different long URL inputs we may get the same unique id for a short URL and that could cause data corruption. So we need to perform some checks to ensure that this unique id doesn’t exist in the database already.
issue:
It can give same start bits of different URLS, result data correct

technique 3 : Using Counter but it can grow over time & big number

Using Counter
we can use unique counter value to generate random number.
easy for when small application
can use two counter can solve a bit but still when one counter goes down things will break

issue:
can use two counter can solve a bit but still when one counter goes down things will break
counter reset & range selection is different

technique 4 : Zookeeper Good option
Zookeper ( help in co-ordination between server & services)

  • we will store range in Zookeper associated with server
  • all server will take responsibility of providing unique nunber i.e counter number
  • keep some configuration info for each server / host

technique 5 : Key Generation Service (KGS)

We can have a standalone Key Generation Service (KGS) that generates random six letter strings beforehand and stores them in a database (let’s call it key-DB).
Whenever we want to shorten a URL, we will just take one of the already generated keys and use it.
This approach will make things quite simple and fast since we will not be encoding the URL or worrying about duplications or collisions.
KGS will make sure all the keys inserted in key-DB are unique.

Step 7 : API Methods

createURL(api_dev_key, original_url, expire_date=None);
getURL(apiKey, shortURL);
deleteURL(api_dev_key, url_key)

Enter fullscreen mode Exit fullscreen mode

api_dev_key (string): The API developer key of a registered account. This will be used to, among other things, throttle users based on their allocated quota.
original_url (string): Original URL to be shortened.
custom_alias (string): Optional custom key for the URL.
user_name (string): Optional user name to be used in encoding.
expire_date (string): Optional expiration date for the shortened URL.

Step 8 :Database Schema:

DataBase:
we will opt for No SQL - Eventual Consistency - Highly Available - Easy Scaling

Image description

What kind of database should we use ?
Since we are likely going to store billions of rows and we don’t need to use relationships between objects – a NoSQL key-value store like Dynamo or Cassandra is a better choice, which would also be easier to scale.
If we choose NoSQL, we cannot store UserID in the URL table (as there are no foreign keys in NoSQL), for that we would need a third table which will store the mapping between URL and the user.

Top comments (0)