
Arjun

Character set and Character encoding

Machines understand only 0s and 1s. A character set maps human-readable characters to numeric codes so they can be stored as bits. For example, when the character a is stored in memory, it is encoded according to the character set in use; when the character is displayed, a decoder converts the bytes in memory back into the human-readable symbol.
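To make that round trip concrete, here is a minimal Python sketch (the character and the choice of ASCII are just illustrative): encoding a character into bytes for storage, then decoding those bytes back for display.

```python
# Storing a character = encoding it into bytes; displaying it = decoding the
# bytes back into a human-readable symbol.
text = "a"

stored = text.encode("ascii")       # bytes that would sit in memory / on disk
print(stored)                       # b'a' -> the single byte 0x61 (97)

displayed = stored.decode("ascii")  # the decoder turns bytes back into text
print(displayed)                    # a
```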

ASCII is one of the earliest character sets; it represents each character in 7 bits. Since it uses 7 bits, ASCII can represent 2^7 = 128 characters. It covers only the English alphabet, along with digits, punctuation, and a few control characters.

So with ASCII, the character a is stored as the binary pattern 01100001, which is 97 in decimal, the character code of a in ASCII.
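You can check the character code and its bit pattern directly in Python; `ord`, `format`, and `chr` below are just one way to inspect them:

```python
# 'a' has character code 97 in ASCII; in memory that is the bit pattern 01100001.
code = ord("a")
print(code)                 # 97
print(format(code, "08b"))  # 01100001 (97 written as 8 binary digits)
print(chr(code))            # a -> turning the code back into the character
```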

Unicode was created as a universal code for almost all characters across the world's languages, plus many commonly used symbols. Unicode itself only assigns a numeric code point to each character; to actually store those code points as bytes, different encodings are used, such as UTF-8, UTF-16, and UTF-32:

  • In UTF-8, each character is encoded as 1 to 4 bytes (the dominant encoding).
  • In UTF-16, each character is encoded as one or two 16-bit units.
  • In UTF-32, every character is encoded as a single 32-bit unit (the byte counts are compared in the sketch below).
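As a rough comparison, this Python sketch (with a few arbitrarily chosen characters) prints how many bytes each encoding needs per character; the little-endian variants are used only so the byte-order mark is not counted:

```python
# Byte lengths of the same characters under different Unicode encodings.
# The "-le" variants skip the byte-order mark, so only code units are counted.
for ch in ["a", "€", "😀"]:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")
    utf32 = ch.encode("utf-32-le")
    print(f"{ch!r}: UTF-8={len(utf8)}  UTF-16={len(utf16)}  UTF-32={len(utf32)} bytes")

# 'a': UTF-8=1  UTF-16=2  UTF-32=4 bytes
# '€': UTF-8=3  UTF-16=2  UTF-32=4 bytes
# '😀': UTF-8=4  UTF-16=4  UTF-32=4 bytes
```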
