David Hwang

6/1 TIL: Unicode, utf8mb4, SQL COLLATE

Unicode

  • A standard character set that assigns every character a unique number called a code point; encodings such as UTF-8, UTF-16, and UTF-32 define how those code points are stored as bytes
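To make the split between code points and encodings concrete, here is a small Python sketch. The helper name `describe` is made up for this note; it only uses Python's built-in `ord` and `str.encode`.

```python
def describe(ch: str) -> dict:
    """Show one character's code point and its bytes in three encodings."""
    return {
        "code_point": f"U+{ord(ch):04X}",         # the number Unicode assigns
        "utf8": ch.encode("utf-8").hex(),         # 1 to 4 bytes
        "utf16": ch.encode("utf-16-be").hex(),    # 2 or 4 bytes
        "utf32": ch.encode("utf-32-be").hex(),    # always 4 bytes
    }


# "A", e with acute accent, and the grinning-face emoji
for ch in ("A", "\u00e9", "\U0001F600"):
    print(describe(ch))
```

The same code point ends up as different byte sequences depending on the encoding, which is why Unicode (the numbering) and UTF-8 (one way to store it) are separate things.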

UTF-8

  • a variable-width encoding of Unicode that uses one to four bytes (8 to 32 bits) per code point; ASCII characters take a single byte
  • vs. utf8mb4: in MySQL, utf8mb4 is the preferred charset since it stores up to four bytes per code point instead of three (utf8 is an alias of utf8mb3), meaning MySQL's utf8 cannot store characters that need four bytes, such as emoji and some less common CJK characters
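A quick way to see the three-byte limit is to count UTF-8 bytes per character in Python. This is only a sketch: `utf8_len` and `fits_in_utf8mb3` are hypothetical helper names, and the real check happens inside the database server, not in application code.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 bytes in UTF-8,
    i.e. its code point is U+FFFF or below (assumed rule for utf8mb3)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# ASCII letter, accented Latin letter, Hangul syllable, emoji
for ch in ("A", "\u00e9", "\ud55c", "\U0001F600"):
    print(f"U+{ord(ch):04X} -> {utf8_len(ch)} byte(s)")

print(fits_in_utf8mb3("caf\u00e9"))        # True
print(fits_in_utf8mb3("hi \U0001F600"))    # False
```

Only the emoji needs four bytes, so it is the kind of character that a utf8 (utf8mb3) column cannot hold and a utf8mb4 column can.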

An example SQL file that uses the utf8mb4 charset:


-- Each text column names its collation; the table default is the same one
CREATE TABLE IF NOT EXISTS dbName.tableName
(
    `id`       int NOT NULL AUTO_INCREMENT,
    `email`    varchar(40)  COLLATE utf8mb4_unicode_ci NOT NULL,
    `username` varchar(20)  COLLATE utf8mb4_unicode_ci NOT NULL,
    `password` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

  • Collation provides the sorting and comparison rules, including case and accent sensitivity, for the data
  • You specify it with COLLATE; the ci suffix in utf8mb4_unicode_ci stands for 'case insensitive', so 'a' and 'A' compare as equal
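To get a feel for what a case-insensitive collation does to comparisons and sorting, here is a rough Python approximation. It is not MySQL's actual algorithm (which follows the Unicode Collation Algorithm); `ci_ai_key` is a hypothetical helper that folds case and strips accents, on the assumption that utf8mb4_unicode_ci ignores both.

```python
import unicodedata


def ci_ai_key(text: str) -> str:
    """Approximate a case- and accent-insensitive comparison key."""
    # Split accented letters into base letter + combining mark
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (the accents)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Fold case so "A" and "a" produce the same key
    return stripped.casefold()


names = ["banana", "apple", "Cherry"]

print(sorted(names))                  # ['Cherry', 'apple', 'banana']
print(sorted(names, key=ci_ai_key))   # ['apple', 'banana', 'Cherry']
print(ci_ai_key("R\u00e9sum\u00e9") == ci_ai_key("resume"))  # True
```

The plain sort puts uppercase letters first because it compares raw code points, while the keyed sort orders the words the way a ci collation would be expected to.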