
DEV Community

David Hwang

Posted on Jun 2, 2021

6/1 TIL: Unicode utf8mb4, sql COLLATE

#todayilearned

Unicode

A standard that assigns every character a unique number, called a code point, and defines several encodings for storing those code points as bytes, such as UTF-8, UTF-16, and UTF-32.
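To make the code point vs. encoding distinction concrete, here is a small Python sketch. The helper name describe is just for illustration. It takes one character, reports its code point, and shows the bytes that UTF-8, UTF-16 and UTF-32 each produce for it. The little-endian variants are used only so Python does not add a byte-order mark.

```python
def describe(ch: str) -> dict:
    """Return a character's code point and its bytes under three encoding forms."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        # Little-endian variants are used so Python does not prepend a byte-order mark.
        "utf-8": ch.encode("utf-8"),
        "utf-16": ch.encode("utf-16-le"),
        "utf-32": ch.encode("utf-32-le"),
    }

info = describe("\u00e9")          # é
print(info["code_point"])          # U+00E9
print(info["utf-8"].hex(" "))      # c3 a9
print(info["utf-16"].hex(" "))     # e9 00
print(info["utf-32"].hex(" "))     # e9 00 00 00
```

The code point stays the same (U+00E9) in every case. Only the byte representation changes with the encoding.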

UTF-8

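UTF-8 is a variable-width encoding of Unicode. Each code point is stored in 1 to 4 bytes, and ASCII characters (U+0000 to U+007F) take a single byte that is identical to their ASCII encoding.
vs. utf8mb4: in MySQL, utf8 is an alias for utf8mb3, which stores at most 3 bytes per character. This means it cannot store code points above U+FFFF, such as emoji and some rarer CJK characters and symbols. utf8mb4 allows up to 4 bytes per character (full UTF-8), which is why it is the preferred choice.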
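The Python sketch below illustrates the byte widths and the 3-byte limit. The helpers utf8_width and fits_utf8mb3 are hypothetical names chosen for this example and are not part of MySQL. fits_utf8mb3 only approximates what a utf8mb3 column can hold by checking whether every code point is at or below U+FFFF.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))

def fits_utf8mb3(text: str) -> bool:
    """True if every character fits in MySQL's 3-byte utf8mb3 limit.

    A code point needs 4 UTF-8 bytes exactly when it is above U+FFFF,
    so this is equivalent to checking utf8_width(ch) <= 3 for each character.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)

samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, é, €, 😀
widths = {ch: utf8_width(ch) for ch in samples}
for ch, n in widths.items():
    print(f"U+{ord(ch):04X}: {n} byte(s)")
# A -> 1, é -> 2, € -> 3, 😀 -> 4 bytes

# ASCII text is byte-for-byte identical in UTF-8.
print("Hello".encode("utf-8") == "Hello".encode("ascii"))  # True

print(fits_utf8mb3("caf\u00e9 \u20ac5"))     # True
print(fits_utf8mb3("thumbs up \U0001F44D"))  # False: the emoji needs 4 bytes
```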

An example MySQL table definition that uses the utf8mb4 character set:


CREATE TABLE IF NOT EXISTS dbName.tableName (
  `id`       int NOT NULL AUTO_INCREMENT,
  `email`    varchar(40)  COLLATE utf8mb4_unicode_ci NOT NULL,
  `username` varchar(20)  COLLATE utf8mb4_unicode_ci NOT NULL,
  `password` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  -- no trailing comma after the last definition, or MySQL raises a syntax error
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
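One practical consequence of utf8mb4 is that VARCHAR(n) limits characters, not bytes, so the worst-case storage of each column above grows to 4 bytes per character. The Python sketch below estimates that worst case. varchar_max_bytes is a hypothetical helper. It assumes MySQL's documented VARCHAR layout of a 1-byte length prefix when values need at most 255 bytes and a 2-byte prefix otherwise.

```python
# Maximum bytes per character for MySQL's two UTF-8 character sets.
MAX_BYTES_PER_CHAR = {"utf8mb3": 3, "utf8mb4": 4}

def varchar_max_bytes(n_chars: int, charset: str = "utf8mb4") -> int:
    """Hypothetical estimate of worst-case storage for VARCHAR(n_chars)."""
    data = n_chars * MAX_BYTES_PER_CHAR[charset]
    # Length prefix: 1 byte if values can need at most 255 bytes, else 2.
    prefix = 1 if data <= 255 else 2
    return data + prefix

# Column lengths taken from the table definition above.
columns = {"email": 40, "username": 20, "password": 200}
for name, n in columns.items():
    print(name, varchar_max_bytes(n), varchar_max_bytes(n, "utf8mb3"))
# email 161 121
# username 81 61
# password 802 602
```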

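Collation: the set of rules a character set uses to compare and sort strings, including whether comparisons are case- and accent-sensitive.
You choose one with COLLATE, and the suffix describes its sensitivity: _ci stands for 'case insensitive' (_cs is case-sensitive, _bin compares binary code values). utf8mb4_unicode_ci is based on the Unicode Collation Algorithm, and because it has no explicit _ai/_as suffix, its _ci also implies accent-insensitive comparison.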
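To get a feel for what a case- and accent-insensitive collation does, here is a rough Python approximation. It is not MySQL's implementation of utf8mb4_unicode_ci. The hypothetical ci_ai_key helper decomposes each character (NFD), drops combining accent marks, and case-folds the result. Two strings are "equal" if their keys match, and sorting by the key ignores case.

```python
import unicodedata

def ci_ai_key(s: str) -> str:
    """Rough stand-in for a case- and accent-insensitive collation key.

    Not MySQL's algorithm: decompose characters (NFD), drop combining
    accent marks, then case-fold what remains.
    """
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

words = ["apple", "Banana", "cherry"]
print(ci_ai_key("Caf\u00e9") == ci_ai_key("CAFE"))  # True: case and accent ignored
print(sorted(words))                                # ['Banana', 'apple', 'cherry']
print(sorted(words, key=ci_ai_key))                 # ['apple', 'Banana', 'cherry']
```

A plain sort orders by code point, so uppercase letters come before lowercase ones. Sorting by the folded key gives the alphabetical order a _ci collation aims for.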
