Unicode
- A character set (standard) that assigns every character a unique number called a code point; UTF-8, UTF-16, and UTF-32 are different encodings of those code points
UTF-8
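- a variable-length encoding that stores each code point in one to four bytes; ASCII characters take a single byte (8 bits), so plain ASCII text is already valid UTF-8
- vs. utf8mb4: in MySQL, utf8 is an alias of utf8mb3, which stores a maximum of three bytes per code point; utf8mb4 stores up to four and is the preferred choice. This means MySQL's utf8 (not UTF-8 itself) cannot store characters that need four bytes, such as emoji and some less common symbols and characters from other languages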
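To make the three-byte versus four-byte difference concrete, here is a small Python sketch that counts how many bytes UTF-8 needs for a few sample characters. The helper names `utf8_len` and `fits_in_utf8mb3` are made up for this illustration; the check only approximates what a utf8mb3 column can hold by testing the byte length of each character.

```python
# Sample characters that need 1, 2, 3, and 4 bytes in UTF-8.
samples = ["A", "é", "€", "😀"]


def utf8_len(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to encode one character."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """Hypothetical check: True if no character needs more than 3 bytes."""
    return all(utf8_len(ch) <= 3 for ch in text)


for ch in samples:
    # ord() gives the Unicode code point of the character.
    print(f"U+{ord(ch):04X} {ch!r}: {utf8_len(ch)} byte(s)")

print(fits_in_utf8mb3("héllo €"))  # every character fits in 3 bytes
print(fits_in_utf8mb3("hi 😀"))    # the emoji needs 4 bytes
```

Text for which `fits_in_utf8mb3` returns False is the kind of data that calls for utf8mb4.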
An example of a SQL file with the utf8mb4 charset
CREATE TABLE IF NOT EXISTS dbName.tableName
(
    `id` int NOT NULL AUTO_INCREMENT,
    `email` varchar(40) COLLATE utf8mb4_unicode_ci NOT NULL,
    `username` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL,
    `password` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci; -- table-wide default charset and collation
- Collation defines how the stored text is compared and sorted, including whether case and accents are significant
- You set it with COLLATE; the _ci suffix in utf8mb4_unicode_ci stands for 'case insensitive'
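As a rough illustration of what a case-insensitive, accent-insensitive comparison means, the Python sketch below builds a comparison key that ignores case and accents. This is not MySQL's algorithm (its Unicode collations are based on the Unicode Collation Algorithm and differ in detail); `ci_ai_key` is a hypothetical helper and the sample strings are invented.

```python
import unicodedata


def ci_ai_key(text: str) -> str:
    """Rough comparison key that ignores case and accents (illustration only)."""
    # NFD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks so only the base letters remain.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() meant for caseless matching.
    return stripped.casefold()


# Two spellings compare as equal when their keys match.
print(ci_ai_key("Alice@Example.com") == ci_ai_key("alice@example.com"))
print(ci_ai_key("résumé") == ci_ai_key("RESUME"))
```

With a _ci collation on the `email` column, a lookup such as WHERE email = 'ALICE@EXAMPLE.COM' would likewise match a row stored in lowercase.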