DEV Community

kumaraish

Unicode, UCS & UTF-8

Unicode is a standard that, together with ISO/IEC 10646, defines the Universal Character Set (UCS): a single character repertoire intended to cover the characters needed to write practically all known languages.

Unicode assigns a Name and a Number (Character Code, or Code-Point) to each character in its repertoire.
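As a small illustration of the name/number pairing, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` is made up for this example and is not part of any library.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str]:
    """Return the code-point (in U+XXXX notation) and the Unicode name of one character."""
    # ord() gives the code-point as an integer; unicodedata.name() looks up the name.
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


print(describe("A"))  # ('U+0041', 'LATIN CAPITAL LETTER A')
print(describe("€"))  # ('U+20AC', 'EURO SIGN')
```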

UTF-8 is an encoding: a way to represent these characters as bytes in computer memory. It maps each code-point to a sequence of one to four octets (8-bit bytes).

For example:

UCS Character = 𤭢 (a Unicode Han character)

UCS code-point = U+24B62

UTF-8 encoding
(in hex) = F0 A4 AD A2
(in binary) = 11110000 10100100 10101101 10100010
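The bytes above can be reproduced by packing the bits of the code-point by hand. The sketch below is one way to do that in Python, not a reference implementation: the function name `utf8_encode` is made up for this example, and the result is checked against Python's built-in `str.encode`.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single Unicode code-point as UTF-8 by packing its bits manually."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code-point")
    if code_point < 0x80:      # 1 octet:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:     # 2 octets: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:   # 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 octets: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


encoded = utf8_encode(0x24B62)
print(" ".join(f"{b:02X}" for b in encoded))  # F0 A4 AD A2
print(" ".join(f"{b:08b}" for b in encoded))  # 11110000 10100100 10101101 10100010

# Cross-check against the built-in encoder.
assert encoded == chr(0x24B62).encode("utf-8")
```

The leading bits of the first octet (`11110`) say that the sequence is four octets long, and every following octet starts with `10`. The remaining 21 bits carry the code-point itself.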
