UTF-8 VALIDATION
Given an integer array "data" in which each element represents one byte, return whether it is a valid UTF-8 encoding. That is, whether it translates to a sequence of valid UTF-8 encoded characters.
PROCEDURE
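Track pendingBytes, the number of continuation bytes still expected, starting at zero.
Iterate through the input data, viewing each value as an 8-bit binary string; for example, 1 becomes 00000001.
If pendingBytes equals zero (0) and the first bit of the byte (byte[0]) is also 0, the byte is a complete single-byte character, so continue to the next byte.
If pendingBytes is zero but the byte starts with 1, loop over the bits of the byte from the left: increment pendingBytes for each bit equal to 1, and break at the first bit that is not 1.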
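The bit-counting step above can be sketched in Python as follows. This is a minimal illustration, not the original solution: the function name leading_ones is hypothetical, and masking with 0xFF assumes that only the lowest 8 bits of each integer represent the byte.

```python
def leading_ones(value: int) -> int:
    """Count the consecutive 1 bits at the start of an 8-bit value."""
    # Assumption: only the lowest 8 bits of the integer are meaningful.
    bits = format(value & 0xFF, "08b")  # e.g. 1 -> "00000001"
    count = 0
    for bit in bits:
        if bit != "1":
            break  # stop at the first bit that is not 1
        count += 1
    return count
```

For a leading byte, this count is the total number of bytes in the character; a count of 0 means a single-byte character, and a count of exactly 1 means the byte is a continuation byte.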
NOTES
For a leading byte, the count of leading 1 bits must be greater than 1, because the prefix 10 is only used for the following (continuation) bytes.
Secondly, a count greater than 4 is also invalid.
Lastly, a character in UTF-8 can be from 1 to 4 bytes long.
Therefore, if pendingBytes is less than or equal to 1 or greater than 4 after counting, return false; otherwise decrease pendingBytes by one, since the leading byte itself has been consumed.
While bytes are still pending, the current byte must be a continuation byte: if its first bit equals "1" and its second bit equals "0", decrease pendingBytes and skip to the next iteration of the loop; otherwise return false.
At the end, the data is valid only if nothing is pending, so we return whether pendingBytes equals zero (0).
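Putting the procedure and the notes together, a possible Python sketch looks like this. The function name valid_utf8 and the snake_case variable pending_bytes are illustrative choices, and the 0xFF mask again assumes that only the lowest 8 bits of each integer count. It checks only the bit patterns described above, not further UTF-8 rules such as overlong encodings.

```python
from typing import List


def valid_utf8(data: List[int]) -> bool:
    """Return True if data follows the UTF-8 leading/continuation byte pattern."""
    pending_bytes = 0  # continuation bytes still expected
    for value in data:
        # Assumption: only the lowest 8 bits of each integer are used.
        byte = format(value & 0xFF, "08b")
        if pending_bytes == 0:
            if byte[0] == "0":
                continue  # single-byte character
            # Count leading 1 bits to get the character length in bytes.
            for bit in byte:
                if bit != "1":
                    break
                pending_bytes += 1
            # 1 means a stray continuation byte; more than 4 is too long.
            if pending_bytes <= 1 or pending_bytes > 4:
                return False
            pending_bytes -= 1  # the leading byte itself is consumed
        else:
            # A pending byte must start with the bits 10.
            if byte[0] == "1" and byte[1] == "0":
                pending_bytes -= 1
                continue
            return False
    return pending_bytes == 0
```

For example, valid_utf8([197, 130, 1]) returns True (a two-byte character followed by a one-byte character), while valid_utf8([235, 140, 4]) returns False because the third byte does not start with 10.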
OUTCOME