Hi all, I am a newbie who is trying to learn Python 3. While solving an Edabit coding challenge, I came across the term "ASCII values", which I don't completely understand. Can somebody explain, please?
"Text" is an intuitive concept to humans, but it's fairly involved for a computer to "understand" text.
Computers natively understand only sequences of (small) numbers. Most computers treat all memory as a sequence of bytes (aka octets, meaning a pattern of 8 bits). A byte has 256 distinct values, which we usually identify with the numbers 0, 1, 2, ..., 255.

To store text in a computer, we need to encode that text as a sequence of bytes. In a single-byte encoding (like ASCII), we break text into characters and assign a byte value to each character.
For example, in ASCII, the text `Hello` is broken into the characters:

- `H` with value 72
- `e` with value 101
- `l` with value 108
- `l` with value 108
- `o` with value 111

So we call the sequence of bytes `[72, 101, 108, 108, 111]` the ASCII encoding of the string `Hello`.

ASCII is a character encoding, meaning it is a method for encoding text into bytes.
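In Python 3 you can check these numbers yourself. A minimal sketch using the built-in `ord()` (which returns a character's Unicode code point; for ASCII characters that is the same as the ASCII value) and `str.encode("ascii")`:

```python
text = "Hello"

# ord() gives the number assigned to a single character.
codes = [ord(ch) for ch in text]
print(codes)  # [72, 101, 108, 108, 111]

# str.encode("ascii") produces the actual bytes; list() shows them as ints.
encoded = text.encode("ascii")
print(list(encoded))  # [72, 101, 108, 108, 111]

# Decoding the bytes turns them back into the original string.
print(encoded.decode("ascii"))  # Hello
```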
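ASCII is special in a few important ways:

- Every character is encoded as exactly one byte.
- Only 128 byte values are used: ASCII characters take the values 0 to 127, so the highest bit of every byte is always 0, and the values 128 to 255 are never used.

The first fact makes it very easy for computers to use ASCII. However, it also means you can only use a small number of distinct symbols in your text (at most 256 for any single-byte encoding, and only 128 in ASCII itself). This makes it impossible to represent, for example, Chinese or Japanese text.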
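A quick way to see that limit in Python: encoding with the `"ascii"` codec fails as soon as the text contains a character outside 0 to 127. The helper name `ascii_bytes_or_none` is made up for this illustration; it just wraps `str.encode()` and catches the error.

```python
def ascii_bytes_or_none(text):
    """Hypothetical helper: text encoded as ASCII, or None if it has a non-ASCII character."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        # ASCII only defines the values 0-127, so any other character fails here.
        return None


print(ascii_bytes_or_none("Hello"))  # b'Hello'
print(ascii_bytes_or_none("日本語"))  # None

# All 128 ASCII characters encode fine, and the largest byte value is 127.
every_ascii_char = "".join(chr(n) for n in range(128))
print(max(every_ascii_char.encode("ascii")))  # 127
```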
The second fact means that you can make "ASCII-compatible" encodings that use the spare high bit (the byte values 128 to 255) for other characters. UTF-8, the most popular (and best) Unicode encoding, is ASCII-compatible, so text that is encoded as ASCII can be safely decoded as UTF-8 (the reverse is not true, however).
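Here is a small sketch of that one-way compatibility. `decodes_as` is a hypothetical helper written only for this example; it tries `bytes.decode()` and reports whether it succeeded.

```python
def decodes_as(data, encoding):
    """Hypothetical helper: True if data is valid text in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# ASCII bytes are also valid UTF-8 and decode to the same text.
ascii_data = "Hello".encode("ascii")
print(ascii_data.decode("utf-8"))  # Hello

# UTF-8 encodes "é" as two bytes >= 128, which the ASCII codec rejects.
utf8_data = "café".encode("utf-8")
print(list(utf8_data))  # [99, 97, 102, 195, 169]
print(decodes_as(utf8_data, "utf-8"))  # True
print(decodes_as(utf8_data, "ascii"))  # False
```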
If you are only using English, and no funny symbols, ASCII will be enough. However, if you want to work with the full set of available symbols and languages, you will want to use a Unicode encoding. The best Unicode encoding is UTF-8.
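To get a feel for what UTF-8 does with non-English text, this sketch encodes four sample characters (picked arbitrarily: Latin, accented Latin, Chinese, emoji) and shows the code point, the raw byte values and the byte count for each. It prints code points rather than the characters themselves, so it also works in a terminal that can't display them.

```python
# One sample each: Latin, accented Latin, Chinese, emoji (chosen arbitrarily).
samples = ["A", "é", "日", "😀"]

for ch in samples:
    data = ch.encode("utf-8")
    # code point, raw UTF-8 byte values, number of bytes
    print(ord(ch), list(data), len(data))

lengths = [len(ch.encode("utf-8")) for ch in samples]
print(lengths)  # [1, 2, 3, 4]

# The non-ASCII samples use only byte values from 128 to 255.
print(all(b >= 128 for ch in samples[1:] for b in ch.encode("utf-8")))  # True

# The largest Unicode code point is 1114111 (0x10FFFF).
print(ord(chr(0x10FFFF)))  # 1114111
```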
UTF-8 is different from ASCII in a few crucial ways:

- It is a variable-width encoding. ASCII characters are still encoded as a single byte with a value from 0 to 127; all the characters that aren't ASCII characters are between 2 and 4 bytes long, and are made up only of the bytes 128 to 255.
- It can represent every Unicode character. Unicode assigns each character a number (a code point) from 0 to 1114111. As mentioned, the code points 0 to 127 align with how ASCII assigns byte values to the characters in ASCII.

Computers work with numbers (binary encoded). To make computers work with letters, we came up with encodings: we agreed that, for example, 65 stands for A, 66 for B, etc. ASCII is one standard that describes such an encoding.
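In Python, `ord()` and `chr()` convert between a character and its number, which is usually what "ASCII value" challenges are about. A short sketch:

```python
# ord() maps a character to its number; chr() maps a number back to a character.
print(ord("A"), ord("B"))  # 65 66
print(chr(64), chr(65), chr(66))  # @ A B

# The uppercase letters are consecutive (65-90), so arithmetic on the values works.
alphabet = "".join(chr(n) for n in range(ord("A"), ord("Z") + 1))
print(alphabet)  # ABCDEFGHIJKLMNOPQRSTUVWXYZ
```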
ASCII stands for American Standard Code for Information Interchange. ASCII was originally designed for use with teletypes.
For historical reasons, ASCII outlived the original teletypes. Teletypes were used to interact with mainframes (very big old computers the size of a room); the standard was later adopted by other computers, and it has survived to this day. UTF-8 keeps these exact mappings for compatibility reasons.
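You can check that claim directly in Python. This sketch compares the ASCII and UTF-8 encodings of all 128 ASCII values, then shows that the first value past ASCII (128) already needs two bytes in UTF-8:

```python
# For every ASCII value 0-127, ASCII and UTF-8 produce the same single byte.
same_mapping = all(
    chr(n).encode("ascii") == chr(n).encode("utf-8") == bytes([n])
    for n in range(128)
)
print(same_mapping)  # True

# The first value past ASCII (128) already needs two bytes in UTF-8.
print(list(chr(128).encode("utf-8")))  # [194, 128]
```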
Read more: