DEV Community

michael-2509


UTF-8 Validation

Another LeetCode challenge: UTF-8 Validation.

PROBLEM
Given an integer array data representing the data, return whether it is a valid UTF-8 encoding (i.e. it translates to a sequence of valid UTF-8 encoded characters). See the problem statement on LeetCode for more details.

SOLUTION
This was quite a challenge! Let's look at the steps to solve it.

  • Loop through the array.

  • Convert each integer to its binary representation using the built-in format function, which takes two arguments (the value and a format specification).

  • Loop through the bits of the first octet and count the leading 1s to determine whether the character is 1 to 4 bytes long; any other count is invalid.

  • Check that the two most significant bits of each following byte are 10, as the rules require.
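
The steps above can be sketched roughly as follows. This is a minimal illustration, not necessarily the original solution: the function name `valid_utf8`, the `"08b"` format specification, and the `& 0xFF` mask (which assumes only the lowest 8 bits of each integer carry data, as in the LeetCode statement) are all choices made for this example. It checks only the bit patterns from the rules and does not reject overlong encodings or out-of-range code points.

```python
from typing import List


def valid_utf8(data: List[int]) -> bool:
    """Return True if data follows the UTF-8 byte patterns (illustrative sketch)."""
    remaining = 0  # continuation bytes still expected for the current character

    for num in data:
        # Assumption: only the lowest 8 bits matter. Render them as a
        # zero-padded 8-character binary string, e.g. 197 -> "11000101".
        bits = format(num & 0xFF, "08b")

        if remaining == 0:
            # First byte of a character: count its leading 1 bits.
            ones = len(bits) - len(bits.lstrip("1"))
            if ones == 0:
                continue  # 0xxxxxxx is a complete 1-byte character
            if ones == 1 or ones > 4:
                return False  # 10xxxxxx cannot start a character; 5+ ones is too long
            remaining = ones - 1
        else:
            # Continuation byte: its two most significant bits must be 10.
            if not bits.startswith("10"):
                return False
            remaining -= 1

    # Valid only if the last character was not cut short.
    return remaining == 0


print(valid_utf8([197, 130, 1]))  # True
print(valid_utf8([235, 140, 4]))  # False
```

In the first call, 197 (11000101) announces a 2-byte character, 130 (10000010) is its continuation byte, and 1 (00000001) is a 1-byte character. In the second, 235 (11101011) announces a 3-byte character, but 4 (00000100) does not start with 10.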

Rules
A character in UTF-8 can be from 1 to 4 bytes long, subject to the following rules:

For a 1-byte character, the first bit is a 0, followed by its Unicode code.
For an n-byte character, the first n bits are all ones and bit n + 1 is 0; it is followed by n - 1 bytes whose 2 most significant bits are 10.

This is how the UTF-8 encoding would work:

     Number of Bytes   |        UTF-8 Octet Sequence
                       |              (binary)
   --------------------+-----------------------------------------
            1          |   0xxxxxxx
            2          |   110xxxxx 10xxxxxx
            3          |   1110xxxx 10xxxxxx 10xxxxxx
            4          |   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

x denotes a bit in the binary form of a byte that may be either 0 or 1.
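
To see these patterns on real characters, here is a small sketch that encodes one character of each length and prints the bits of every byte. The helper name `bit_patterns` and the sample characters are just choices for this illustration.

```python
from typing import List


def bit_patterns(text: str) -> List[str]:
    """Return the 8-bit binary string of each UTF-8 byte in text."""
    return [format(byte, "08b") for byte in text.encode("utf-8")]


# One sample character per encoded length: A, e-acute, euro sign, grinning face.
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}", " ".join(bit_patterns(ch)))

# U+0041 01000001
# U+00E9 11000011 10101001
# U+20AC 11100010 10000010 10101100
# U+1F600 11110000 10011111 10011000 10000000
```

Each first byte starts with 0, 110, 1110 or 11110, and every following byte starts with 10, matching the table.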
