Today's applications are used by people all over the world. They typically have the following characteristics:
- Have millions of users
- Store large amounts of data, from terabytes to exabytes (TB-EB)
- Require response times on the order of milliseconds (ms) to microseconds (μs)
- Handle millions of requests per second
Sizes in the TB-EB range can be hard to grasp, because numbers with that many digits rarely come up in daily life.
Software always needs hardware to run on, and that hardware must suit the characteristics of the software. Understanding data volumes helps you choose the right amount of memory and storage. The choice ranges from on-premises servers and virtual server instance types to PCs used at home.
Today, cloud computing blurs the boundary between application engineers and infrastructure engineers, and many application teams build their own infrastructure in the cloud. Understanding data volumes helps you choose the right hardware yourself.
This article gives you a feel for data sizes by showing how much data various familiar things take up.
Data size unit
The unit of data size is the byte. Today, 1 byte is defined as 8 bits.
Because data sizes involve many digits, a prefix is added to the unit to shorten the number. The International System of Units (SI) defines the following prefixes:
Symbol | Name | Factor | Power | English name |
k | kilo | 10^3 | 10^3 | thousand |
M | mega | 10^3k | 10^6 | million |
G | giga | 10^3M | 10^9 | billion |
T | tera | 10^3G | 10^12 | trillion |
P | peta | 10^3T | 10^15 | quadrillion |
E | exa | 10^3P | 10^18 | quintillion |
Data sizes, on the other hand, conventionally use the following prefixes:
Symbol | Name | Factor |
B | byte | 8 bits |
KB | kilobyte | 1024B |
MB | megabyte | 1024KB |
GB | gigabyte | 1024MB |
TB | terabyte | 1024GB |
PB | petabyte | 1024TB |
EB | exabyte | 1024PB |
Computers handle data in binary, so each prefix step is a factor of 1024, which is 2 to the 10th power, rather than 1000. The IEC notation (KiB, MiB, and so on) distinguishes these binary units from the SI prefixes, but this article uses KB.
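The 1024-based prefixes can be illustrated with a small conversion routine. This is a minimal sketch rather than a standard API: `format_size` and `binary_vs_decimal` are hypothetical helper names, and the unit list simply mirrors the table in this section.

```python
# Binary (1024-based) units as used in this article, smallest to largest.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def format_size(num_bytes: int) -> str:
    """Format a byte count with 1024-based prefixes (hypothetical helper)."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value is below 1024,
        # or at EB, the largest unit listed in the table.
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.2f}{unit}"
        size /= 1024


def binary_vs_decimal(power: int) -> float:
    """Ratio between the 1024-based and 1000-based prefix at a given step."""
    return 1024**power / 1000**power


print(format_size(1536))         # 1.50KB
print(format_size(5 * 1024**3))  # 5.00GB
print(binary_vs_decimal(4))      # ratio at the TB step, roughly 1.0995
```

The gap between the two conventions grows with each step: it is 2.4% at KB but about 10% at TB, which is why the distinction matters more for large data sizes.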
Website data size
According to Page Weight, the data size of a website component is as follows:
Name | Size |
Total | 1.96MB |
HTML | 31.4KB |
CSS | 68.9KB |
JavaScript | 452KB |
Font | 119KB |
Image | 956KB |
Video | 2.07MB |
The data was aggregated from January 2017 to January 2022 and covers mobile sites.
Note that these are total sizes per page, not sizes per file. The JavaScript figure may be larger than you expect, because it includes not only the site's own code but also the code of external packages such as frameworks and libraries.
File size
File sizes vary with the contents of the file, so treat the figures below as rough references. Sizes and formats were converted from the source as needed. These figures are also not meant as a ranking of file formats, because the appropriate format depends on the characteristics of the file.
Name | Size | cf. |
Image - small JPG (size 320 x 320) | 21.0KB | |
Image - small PNG (size 320 x 320) | 137KB | |
Image - small WebP (size 320 x 320) | 16KB | |
Image - large JPG (size 1036 x 1036) | 187KB | |
Image - large PNG (size 1036 x 1036) | 1.38MB | |
Image - large WebP (size 1036 x 1036) | 148KB | |
Audio - music MP3 (playback time 3:01) | 5.80MB | |
Movie - short MP4 720p (playback time 0:09) | 851KB | |
Movie - short WebM 720p (playback time 0:09) | 1.10MB | |
Movie - short GIF 720p (playback time 0:09) | 3.50MB | |
Document - PDF (4 pages) | 150KB | - |
Document - DOC (4 pages) | 100KB | - |
Document - XLSX (1000 rows) | 140KB | - |
Document - PPT (3 pages) | 248KB | - |
Application - Firefox 97.0.1 (Mac) | 364MB | |
Application - Discord 0.0.265 (Mac) | 193MB | |
Application - Zoom 5.1.1 (Mac) | 52.5MB | |
Application - Xcode 13.2.1 (Mac) | 32.1GB | |
Hardware capacity
The following table shows the memory and storage capacity of various hardware. Where there is no standard, a typical value is shown as a guide.
Name | Size |
Memory - AWS EC2 instance t2.micro | 1GB |
Memory - AWS EC2 instance T2 | 0.5GB ~ 32GB |
Memory - AWS EC2 instance M5 | 8GB ~ 384GB |
Memory - MacBook Pro 13 inch 2020 | 8GB ~ 16GB |
Memory - MacBook Pro 14 inch 2021 | 16GB ~ 64GB |
Memory - iPhone (1st generation) | 128MB |
Memory - iPhone (13 Pro max) | 6GB |
Storage - AWS EBS Provisioned HDD | 125GB ~ 16TB |
Storage - AWS EBS Provisioned IOPS SSD | 4GB ~ 16TB |
Storage - AWS RDS SSD | 20GB ~ 64TB |
Storage - MacBook Pro 13 inch 2020 SSD | 256GB ~ 2TB |
Storage - MacBook Pro 14 inch 2021 SSD | 512GB ~ 8TB |
Storage - iPhone (1st generation) | 4 ~ 16GB |
Storage - iPhone (13 Pro max) | 128GB ~ 1TB |
Storage - Floppy Disk | 720KB ~ 1.44MB |
Storage - Compact Disc | 650 ~ 700MB |
Storage - DVD | 4.7GB ~ 8.5GB |
Storage - Blu-ray | 25GB ~ 128GB |
Storage - USB memory | 32GB ~ 256GB |
Real application data volume
According to Data Never Sleeps, real applications generate the following amounts of data:
Name | Volume per minute | Volume per day | Volume per year |
Twitter tweet | 575K tweet/min | 828M tweet/day | 302G tweet/year |
Instagram photo | 65K photo/min | 93.6M photo/day | 34.2G photo/year |
Slack message | 148K message/min | 213M message/day | 77.8G message/year |
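The per-day and per-year columns follow from the per-minute rate by simple multiplication. The sketch below reproduces them, assuming a 365-day year; the function and variable names are hypothetical and only the per-minute rates come from the table.

```python
MINUTES_PER_DAY = 60 * 24
DAYS_PER_YEAR = 365  # assumption: leap years are ignored

# Per-minute rates from the table above.
per_minute = {
    "Twitter tweet": 575_000,
    "Instagram photo": 65_000,
    "Slack message": 148_000,
}


def per_day(rate_per_minute: int) -> int:
    """Scale a per-minute rate to a per-day count."""
    return rate_per_minute * MINUTES_PER_DAY


def per_year(rate_per_minute: int) -> int:
    """Scale a per-minute rate to a per-year count."""
    return per_day(rate_per_minute) * DAYS_PER_YEAR


for name, rate in per_minute.items():
    # M = 10**6 and G = 10**9, as in the SI prefix table.
    print(f"{name}: {per_day(rate) / 1e6:.1f}M/day, {per_year(rate) / 1e9:.1f}G/year")
```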
Next, let's look at this in bytes. Assume a tweet has 100 characters at 1 byte per character, so one tweet is 0.1KB. Assume one Instagram photo is 0.1MB. Assume a Slack message has 50 characters at 1 byte per character, so one message is 0.05KB. Under these assumptions, the data size per year is as follows:
Name | Size per year |
Twitter tweet | 30.2TB/year |
Instagram photo | 3.42PB/year |
Slack message | 3.89TB/year |
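The yearly sizes can be checked by multiplying the yearly item counts by the assumed item sizes. In the sketch below, the item sizes are the article's assumptions and the names are hypothetical. The figures in the table match the result when prefixes are read as powers of 1000; read as 1024-based units, the same byte counts come out somewhat smaller, so the code prints both.

```python
# Items per year, from the earlier table (G = 10**9).
items_per_year = {
    "Twitter tweet": 302_000_000_000,
    "Instagram photo": 34_200_000_000,
    "Slack message": 77_800_000_000,
}

# Assumed size of one item in bytes (0.1KB, 0.1MB, 0.05KB with 1KB = 1000 bytes).
bytes_per_item = {
    "Twitter tweet": 100,
    "Instagram photo": 100_000,
    "Slack message": 50,
}


def bytes_per_year(name: str) -> int:
    """Total bytes generated per year under the assumptions above."""
    return items_per_year[name] * bytes_per_item[name]


for name in items_per_year:
    total = bytes_per_year(name)
    print(
        f"{name}: {total / 1000**4:.2f}TB (1000-based), "
        f"{total / 1024**4:.2f}TB (1024-based)"
    )
```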
Operating such a service for one year accumulates terabytes of data or more. That much data is difficult to store on a single database server, so you need to scale out and spread it across multiple database servers. This is called a distributed database.
There are two ways to build a distributed database: the Master/Slave method and the partitioning method. The Master/Slave method addresses high traffic rather than large data volume, because every server still holds a full copy of the data. For large data volumes, use partitioning.
RDBs were not originally designed for partitioning, so maintaining a partitioned RDB is costly. If you want to build a partitioned distributed database, consider a database designed for partitioning, such as DynamoDB.
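To make the idea of partitioning concrete, here is a minimal sketch of hash-based partitioning, one common way to decide which server stores a given key. It is an illustration only, not how DynamoDB or any particular database implements it: the in-memory dictionaries stand in for separate servers, and `partition_for`, `put`, and `get` are hypothetical names.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash.

    hashlib is used instead of the built-in hash(), whose result for
    strings changes between Python processes.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Four in-memory dictionaries stand in for four database servers.
partitions = [dict() for _ in range(4)]


def put(key: str, value) -> None:
    """Store the value on the partition that owns the key."""
    partitions[partition_for(key, len(partitions))][key] = value


def get(key: str):
    """Read the value back from the partition that owns the key."""
    return partitions[partition_for(key, len(partitions))].get(key)


put("user:1", {"name": "alice"})
put("user:2", {"name": "bob"})
print(get("user:1"))
print([len(p) for p in partitions])  # number of keys held by each partition
```

Each partition holds only a fraction of the keys, so total capacity grows with the number of servers. A simple modulo scheme like this moves most keys when the partition count changes, which is one reason operating a partitioned database by hand is costly.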