DEV Community

Paweł bbkr Pabian

Unicode vs UTF

Hi, and welcome to a series that explains various aspects of UTF encoding.

Let's start with a common misconception: Unicode and UTF. Many people use these terms interchangeably and say that "This text has Unicode encoding". However, they are not synonyms.

Unicode is a consortium: a non-profit corporation devoted to developing, maintaining, and promoting software internationalization standards and data. Here is their logo:

[Image: Unicode Consortium logo]

They created and maintain the Unicode Standard, which catalogues characters used worldwide. At the time of writing, the current version, 15.0, contains 149 186 characters.
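To make "catalogues" a bit more concrete: the standard gives each character a number, called a code point (written U+XXXX), and an official name. The sketch below looks these up with Python's standard `unicodedata` module. The `describe` helper is a hypothetical name used only for illustration, and the data comes from whatever Unicode version your Python build bundles (reported by `unicodedata.unidata_version`), which may not be 15.0.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and official name the Unicode Standard assigns to ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Version of the Unicode Character Database bundled with this Python build.
print("Unicode data version:", unicodedata.unidata_version)

for ch in ["A", "€", "😀"]:
    print(describe(ch))
# U+0041 LATIN CAPITAL LETTER A
# U+20AC EURO SIGN
# U+1F600 GRINNING FACE
```

Note that nothing here mentions bytes: a code point is an abstract number, and turning it into bytes is the job of an encoding.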

UTF stands for Unicode Transformation Format, and it is the technical implementation of the Unicode Standard: it defines how to represent all those catalogued characters as bytes. It comes in UTF-8, UTF-16 and UTF-32 variants (which will be explained later). Less common encodings, such as BOCU and SCSU, implement the same standard as well, but they are binary-incompatible with UTF.
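As a rough illustration, here is the same two-character string encoded with the three UTF variants using Python's built-in codecs. The `encode_all` helper is hypothetical; the big-endian codecs (`utf-16-be`, `utf-32-be`) are chosen only so that Python does not prepend a byte order mark, which keeps the output easy to compare.

```python
def encode_all(text: str) -> dict[str, bytes]:
    """Encode text with each UTF variant; the -be codecs avoid a leading byte order mark."""
    return {enc: text.encode(enc) for enc in ("utf-8", "utf-16-be", "utf-32-be")}

for enc, data in encode_all("A€").items():
    # bytes.hex(" ") (Python 3.8+) prints the bytes separated by spaces
    print(f"{enc:<10} {data.hex(' ')}")
# utf-8      41 e2 82 ac
# utf-16-be  00 41 20 ac
# utf-32-be  00 00 00 41 00 00 20 ac
```

The code points are the same in every row (U+0041 and U+20AC); only their byte representation changes with the chosen UTF variant.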

So when you refer to the specific byte representation of a text (such as a document on disk or a variable in memory), you should say precisely: "This text has UTF-8 encoding".
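A small sketch of why "Unicode encoding" is not specific enough: the same text stored as UTF-8 and as UTF-16 yields different bytes, and decoding with the wrong UTF variant produces the wrong text (or, with Python's default strict error handler, raises `UnicodeDecodeError`). This uses only the built-in `str.encode` and `bytes.decode`; `errors="replace"` makes invalid sequences show up as U+FFFD instead of raising.

```python
text = "caf\u00e9"                    # "café"
utf8 = text.encode("utf-8")           # 63 61 66 c3 a9
utf16 = text.encode("utf-16-le")      # 63 00 61 00 66 00 e9 00

# Decoding with the encoding that was actually used restores the text.
assert utf8.decode("utf-8") == text
assert utf16.decode("utf-16-le") == text

# Decoding UTF-16 bytes as UTF-8 does not: each 0x00 becomes U+0000,
# and 0xE9 is not followed by a continuation byte, so it becomes U+FFFD.
print(repr(utf16.decode("utf-8", errors="replace")))
```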

Coming up next: Madness before UTF, a short history lesson about dark times.
