"Text" is an intuitive concept to humans, but it's fairly involved for a computer to "understand" text.
Computers natively understand only sequences of (small) numbers. Most computers treat all memory as a sequence of bytes (also called octets, meaning patterns of 8 bits). A byte has 256 distinct values, which we usually identify with the numbers 0, 1, 2, ..., 255.
To store text in a computer, we need to encode that text as a sequence of bytes. In a single-byte encoding (like ASCII), we break the text into characters and assign a byte value to each character.
For example, in ASCII, the text Hello is broken into the characters H, e, l, l, o, and each character is assigned a byte value:

[72, 101, 108, 108, 111]

So we call the sequence of bytes [72, 101, 108, 108, 111] the ASCII encoding of the string Hello.
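This round trip can be seen directly in Python (the language choice here is mine, as a minimal sketch):

```python
# Encode the string "Hello" into its ASCII byte values.
encoded = "Hello".encode("ascii")
print(list(encoded))  # [72, 101, 108, 108, 111]

# Decode the same bytes back into text.
decoded = bytes([72, 101, 108, 108, 111]).decode("ascii")
print(decoded)  # Hello
```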
ASCII is a character encoding, meaning it is a method for encoding text into bytes.
ASCII is special in a few important ways:

- Every character is encoded as exactly one byte.
- ASCII only uses the values 0 through 127, so only 7 of the 8 bits in each byte are needed; the top bit is always 0.
The first fact makes ASCII very easy for computers to work with. However, it also means you can only use a few hundred distinct symbols in your text -- so it's impossible to represent, for example, Chinese or Japanese text.
The second fact means that you can make "ASCII compatible" encodings by utilizing the extra unused bit. UTF-8, the most popular (and best) Unicode encoding, is "ASCII compatible", so text that is encoded as ASCII can be safely decoded as UTF-8 (the reverse is not true, however).
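A small Python sketch (my own illustration, not from the original post) makes this one-way compatibility concrete:

```python
# ASCII-encoded bytes are always valid UTF-8...
ascii_bytes = "Hello".encode("ascii")
print(ascii_bytes.decode("utf-8"))  # Hello

# ...but UTF-8 bytes for non-ASCII text are NOT valid ASCII.
utf8_bytes = "café".encode("utf-8")
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError:
    print("not valid ASCII")  # this branch runs
```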
If you are only using English, and no funny symbols, ASCII will be enough. However, if you want to work with the full set of available symbols and languages, you will want to use a Unicode encoding. The best Unicode encoding is UTF-8.
UTF-8 is different from ASCII in a few crucial ways:

- It can encode every Unicode character, not just a set of 128.
- It is a variable-length encoding: a single character takes between 1 and 4 bytes.
- ASCII characters keep their ASCII byte values, which is exactly what makes UTF-8 "ASCII compatible".
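The variable-length property is easy to observe in Python (again a sketch of my own, not code from the original):

```python
# Each character below needs a different number of bytes in UTF-8.
for ch in ["A", "é", "中", "🙂"]:
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), list(encoded))
# "A" takes 1 byte, "é" takes 2, "中" takes 3, "🙂" takes 4.
```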