Introduction
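Base64 is a binary-to-text encoding scheme, not an encryption algorithm: it uses no key, and anyone can decode it. It represents arbitrary binary data (including ordinary text) using only 64 printable ASCII characters.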
Process of conversion
We know that an ASCII character occupies 8 bits (one byte). Base64 converts the provided string into its binary representation and then regroups that continuous stream of bits into groups of 6 instead of 8. No bits are dropped, so the total number of bits stays the same. For example, if a string contains 6 ASCII characters, corresponding to 8*6 = 48 bits, Base64 regroups the binary values into 8 groups of 6 bits. When the number of bits is not a multiple of 6, the last group is filled up with zero bits, and the output is padded with = so that its length is a multiple of 4.
Each 6-bit group is then converted into its corresponding integer value (0-63). After that, each integer is replaced by the ASCII character at that index in the Base64 conversion chart (A-Z, a-z, 0-9, + and /). Decoding runs the same steps in reverse: each character is looked up in the chart to recover its 6-bit value, and the bits are regrouped into bytes to recover the original data.
Also, when using Base64 on images in Node.js, we need to use Buffer to convert the Base64 string back into the binary data of the image.
string => binary => binary in groups of 6 bits => base64 ASCII string => (decoding) => original string
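The steps above can be sketched by hand in a few lines of JavaScript. This is only an illustration of the bit regrouping, not how you would encode in practice (Node.js and browsers ship built-in encoders). The helper name `encodeBase64Manually` is made up for this example, and the sketch assumes the input contains only ASCII characters.

```javascript
// Base64 alphabet: index 0-63 maps to one printable ASCII character.
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper for illustration; assumes `text` is ASCII only.
function encodeBase64Manually(text) {
  // 1. Turn every character into its 8-bit binary form and join the bits.
  let bits = '';
  for (const ch of text) {
    bits += ch.charCodeAt(0).toString(2).padStart(8, '0');
  }

  // 2. Fill up with zero bits so the length is a multiple of 6.
  while (bits.length % 6 !== 0) bits += '0';

  // 3. Read each 6-bit group as a number (0-63) and look it up in the chart.
  let out = '';
  for (let i = 0; i < bits.length; i += 6) {
    out += ALPHABET[parseInt(bits.slice(i, i + 6), 2)];
  }

  // 4. Pad with '=' until the output length is a multiple of 4.
  while (out.length % 4 !== 0) out += '=';
  return out;
}

console.log(encodeBase64Manually('Ok.')); // T2su
console.log(encodeBase64Manually('Ok'));  // T2s=
```

The second call shows the padding case: two input bytes give 16 bits, which are filled up to 18 bits (three groups) and then padded with one = character.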
Where is it used
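- It is used to store and transfer binary content over media that only support ASCII text.
- It is used to keep data intact in transit, because systems that might alter or reject raw binary bytes pass plain ASCII characters through unchanged.
- It is also used in email, where attachments are encoded as Base64 text.
- It is used to encode binary data so it can be included in a URL, typically with the URL-safe variant that replaces + and / with - and _.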
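For the URL case, standard Base64 output can contain +, / and =, which all have special meaning inside URLs. A common workaround is the URL-safe alphabet, which swaps + for - and / for _ and usually drops the padding. The sketch below assumes Node.js (for the global `Buffer`); the helper names `toBase64Url` and `fromBase64Url` are made up for this example.

```javascript
// Hypothetical helper: encode bytes with the URL-safe Base64 alphabet.
function toBase64Url(bytes) {
  return Buffer.from(bytes)
    .toString('base64')
    .replace(/\+/g, '-')  // '+' is not safe in a URL
    .replace(/\//g, '_')  // '/' is a path separator
    .replace(/=+$/, '');  // padding is optional in this variant
}

// Hypothetical helper: decode URL-safe Base64 back into bytes.
function fromBase64Url(text) {
  const standard = text.replace(/-/g, '+').replace(/_/g, '/');
  return Buffer.from(standard, 'base64');
}

const raw = Buffer.from([0xfb, 0xff, 0xfe]);
console.log(raw.toString('base64')); // +//+
console.log(toBase64Url(raw));       // -__-
console.log(fromBase64Url('-__-').equals(raw)); // true
```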
Examples
- Suppose you want to send an image over a medium that only supports ASCII. You have to encode the image as ASCII text using Base64 and then send it.
Encoded size increase
When you encode a string using Base64, the encoded string is larger than the original string. This is because each Base64 character carries only 6 bits of data but is itself stored as an 8-bit character, so every 3 bytes of input become 4 characters of output. When you use Base64 on a string, the result is therefore about 133% of the original size (an increase of roughly 33%), and slightly more when padding is added.
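The size relationship can be checked with a little arithmetic: every group of up to 3 input bytes yields 4 output characters, so the padded output length is 4 * ceil(n / 3). The sketch below compares that formula with what Node.js actually produces; the helper name `base64Length` is made up for this example.

```javascript
// Hypothetical helper: predicted length of padded Base64 output for n input bytes.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

for (const n of [1, 2, 3, 6, 1000]) {
  // Encode n zero bytes and measure the real output length.
  const actual = Buffer.alloc(n).toString('base64').length;
  console.log(`${n} bytes -> predicted ${base64Length(n)}, actual ${actual}`);
}
```

For 6 input bytes this gives 8 characters, matching the 48-bit example earlier, and for large inputs the ratio approaches 4/3.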
Unicode Problem
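DOM strings are 16-bit (UTF-16) encoded strings, which poses a problem for btoa, the browser's Base64 function: it only accepts strings in which every character fits in a single byte (code points 0-255) and throws an error otherwise. You can solve this problem by first converting the string to UTF-8 bytes, and there are other methods to do the same.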
The code for overcoming this problem by converting the string to UTF-8 is as follows:
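function utf16_To_utf8(str) {
  // encodeURIComponent percent-encodes the UTF-8 bytes of str;
  // unescape (deprecated, but still widely supported) turns each %XX into one character
  let utf8 = unescape(encodeURIComponent(str));
  return utf8;
}
btoa(utf16_To_utf8("pog")); // "cG9n"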
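Because unescape is deprecated, one possible alternative is to produce the UTF-8 bytes with TextEncoder. The sketch below assumes an environment where `TextEncoder`, `TextDecoder`, `btoa` and `atob` exist as globals (modern browsers and recent Node.js versions); the helper names `encodeUnicode` and `decodeUnicode` are made up for this example.

```javascript
// Hypothetical helper: Base64-encode any JavaScript string via its UTF-8 bytes.
function encodeUnicode(str) {
  const bytes = new TextEncoder().encode(str); // UTF-8 bytes of the string
  let binary = '';
  // Build a string with one character (code point 0-255) per byte, as btoa expects.
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary);
}

// Hypothetical helper: reverse of encodeUnicode.
function decodeUnicode(b64) {
  const binary = atob(b64);
  // Turn each character back into its byte value, then decode the bytes as UTF-8.
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

console.log(encodeUnicode('pog')); // cG9n
console.log(decodeUnicode(encodeUnicode('héllo ✓'))); // héllo ✓
```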
Demonstration
Below is a working demonstration of Base64 in a real-life scenario: we transfer an image from a source to its destination using Base64, because we can only transfer ASCII data over the medium of transfer. The image is read as a Base64 string and then decoded back into a file. Note that writing the decoded bytes to a file named .png does not convert the image: the file still contains JPEG data, and only the extension changes.
const fs = require('fs');
// read the image file and encode its binary contents as a base64 string
const base64 = fs.readFileSync('./original.jpg', 'base64');
// decode the base64 string back into a buffer of raw bytes, which is what gets written to disk
const buffer = Buffer.from(base64, 'base64');
// write the buffer into a file; new.jpg holds the same bytes as original.jpg
fs.writeFileSync('new.jpg', buffer);
// this only changes the file extension: new.png still holds JPEG data, not a real PNG
fs.writeFileSync('new.png', buffer);
// the process
// image => binary => base64 string => buffer => image
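To see that the round trip leaves the data untouched, it can help to compare the bytes before and after. The sketch below avoids file access and uses a few stand-in bytes that begin with the JPEG signature (FF D8 FF) instead of a real image; the variable names are made up for this example.

```javascript
// Stand-in for image data: a few bytes starting with the JPEG signature FF D8 FF.
// A real program would read these bytes from a file instead.
const originalBytes = Buffer.from([0xff, 0xd8, 0xff, 0xe0, 0x00, 0x10, 0x4a, 0x46]);

// Encode to ASCII-only text, as if sending it over a text-only medium.
const asText = originalBytes.toString('base64');

// Decode on the receiving side.
const restored = Buffer.from(asText, 'base64');

console.log(restored.equals(originalBytes)); // true

// The decoded data still starts with the JPEG signature, whatever the file is named.
const looksLikeJpeg =
  restored[0] === 0xff && restored[1] === 0xd8 && restored[2] === 0xff;
console.log(looksLikeJpeg); // true
```

Since the decoded bytes are identical to the input, saving them under another extension cannot change the image format; an actual format conversion needs an image-processing library.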
Credits
- [Alex Lohr] for correcting a mistake and also for sharing useful information to be added to the blog.
Top comments (3)
Base64 is not an encryption. For an encryption, you would need a key to decrypt the data. It is an encoding.
The main reason for base64 is that some protocols dedicated 2 bits of each byte for error correction data, so in order to transmit anything, it would have to be encoded in base64 (the same reason applies for 1 bit and Uuenc).
If our data reads "Ok.", that would be '01001111 01101011 00101110' in base 2, better known as binary. To get base64, we split the data not into segments of 8, but of 6 bits: '010011 110110 101100 101110'. We convert them back to numbers from 0-63, which gives '19 54 44 46', and then use these as indices into the list A-Za-z0-9+/ (with the equal sign used for padding), which gives us 'T2su'.
Thanks Alex for correcting me and sharing the useful information. I'll edit the blog and give credits to you for sharing the information.
Good call! It's a little detail that sometimes leads people, especially beginners, to think they are "encrypting" data.