Stefanos Kouroupis

Displaying all printable* utf-8 characters using Rust

Since I got my first-year badge, I decided to celebrate by listing my achievements and writing one of the most useless applications I've ever written.

achievements

  • > 20,000 views... pretty neat! That's nearly 60 per day
  • > 2,500 followers... again, nice! That's nearly 7 per day
  • 2.5 years on the same job.

application

This amazing application, as the title states, prints all UTF-8 characters that can be printed. The asterisk in the title is there because I limited the output to characters that encode to at most 3 bytes.

ENJOY

Our main function has 3 loops

  • one for the single-byte chars: 0000 - 007F
  • one for the two-byte chars: first byte 00C0 - 00DF | second byte 0080 - 00BF
  • one for the three-byte chars: first byte 00E0 - 00EF | second byte 0080 - 00BF | third byte 0080 - 00BF
use std::num::ParseIntError;
use std::str;

fn main() {
    let mut char_index = 0;

    // Single-byte chars: 0x00 - 0x7F. `..=` makes the upper bound inclusive.
    let one_byte = vec![0, 127];

    for i in one_byte[0]..=one_byte[1] {
        let mut first = format!("{:X}", i);
        first = make_even(first);
        char_index = output(first, char_index);
    }

    // Two-byte chars: first byte 0xC0 - 0xDF, second byte 0x80 - 0xBF.
    let two_bytes = vec![192, 223, 128, 191];

    for i in two_bytes[0]..=two_bytes[1] {
        for j in two_bytes[2]..=two_bytes[3] {
            let mut first = format!("{:X}", i);
            let mut second = format!("{:X}", j);

            first = make_even(first);
            second = make_even(second);

            char_index = output(first + &second, char_index);
        }
    }

    // Three-byte chars: first byte 0xE0 - 0xEF, then two bytes in 0x80 - 0xBF.
    let three_bytes = vec![224, 239, 128, 191, 128, 191];

    for i in three_bytes[0]..=three_bytes[1] {
        for j in three_bytes[2]..=three_bytes[3] {
            for k in three_bytes[4]..=three_bytes[5] {
                let mut first = format!("{:X}", i);
                let mut second = format!("{:X}", j);
                let mut third = format!("{:X}", k);

                first = make_even(first);
                second = make_even(second);
                third = make_even(third);

                char_index = output(first + &second + &third, char_index);
            }
        }
    }
}
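
As a cross-check on the byte ranges used by the three loops, here is a minimal sketch, separate from the application, that encodes the boundary code points with the standard library's `char::encode_utf8`. The helper name `utf8_bytes` is made up for this example.

```rust
// Hypothetical helper (not part of the application above):
// returns the UTF-8 bytes of a single char.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    // Last 1-byte char, first and last 2-byte chars, first and last 3-byte chars.
    for c in ['\u{7F}', '\u{80}', '\u{7FF}', '\u{800}', '\u{FFFF}'] {
        println!("U+{:04X} -> {:02X?}", c as u32, utf8_bytes(c));
    }
}
```

The first two-byte sequence is C2 80, so the C0 and C1 lead bytes covered by the second loop never form valid UTF-8. They are simply rejected by `str::from_utf8` later on.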

A hex string needs an even number of characters, so single-digit values get a leading zero.

pub fn make_even(mut s: String) -> String {
    // Pad odd-length hex strings with a leading zero, e.g. "A" becomes "0A".
    if s.len() % 2 == 1 {
        s.insert(0, '0');
    }
    s
}
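
The same padding can probably be left to the formatter. The sketch below is an alternative, not what the application does: it uses a width of two with zero fill in `format!`, and the helper name `hex_byte` is invented for illustration.

```rust
// Hypothetical alternative to make_even: format a byte as exactly
// two uppercase hex digits, zero-padded on the left.
fn hex_byte(b: u8) -> String {
    format!("{:02X}", b)
}

fn main() {
    // Values below 0x10 get a leading zero; larger values are unchanged.
    println!("{} {} {}", hex_byte(0x0A), hex_byte(0x7F), hex_byte(0xE2));
}
```

With this approach the hex string always has an even length, so no separate padding step would be needed.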

I got this function from here. It converts a hex string to a vector of u8 bytes, reading two hex digits per byte.

pub fn decode_hex(s: &str) -> Result<Vec<u8>, ParseIntError> {
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16))
        .collect()
}
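
A short usage sketch, with `decode_hex` copied from above so the example stands alone. The input "E282AC" is just an illustrative choice: it is the UTF-8 encoding of the euro sign.

```rust
use std::num::ParseIntError;
use std::str;

// Copied from the post so the example is self-contained.
pub fn decode_hex(s: &str) -> Result<Vec<u8>, ParseIntError> {
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16))
        .collect()
}

fn main() {
    // Three hex pairs become three bytes.
    let bytes = decode_hex("E282AC").expect("valid hex");
    println!("{:?}", bytes);

    // Those bytes are then interpreted as UTF-8 text.
    println!("{}", str::from_utf8(&bytes).expect("valid UTF-8"));

    // Characters that are not hex digits give an Err instead of a panic.
    println!("{}", decode_hex("ZZ").is_err());
}
```

An odd-length input is a different story: the `&s[i..i + 2]` slice would run past the end of the string and panic, which is why the hex strings are padded to an even length first.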

Nested matches for the win. The function checks:

  • if the hex string is valid
  • if the bytes are valid UTF-8
  • if the character has a printable representation (by looking at the length of its debug output)
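pub fn output(hex: String, mut i: i32) -> i32 {
    match &decode_hex(&hex) {
        // The hex string decoded into bytes.
        Ok(dh) => match str::from_utf8(dh) {
            // The bytes are valid UTF-8.
            Ok(v) => {
                // A non-printable char shows up in debug output as a long
                // escape such as "\u{80}", so a short string counts as printable.
                if format!("{:?}", v).len() < 7 {
                    // Break the line once every ten printed entries.
                    if i % 10 == 0 {
                        println!("{:?} {:?} \t", hex, v);
                    } else {
                        print!("{:?} {:?} \t", hex, v);
                    }
                    i += 1;
                }
            }
            // Invalid UTF-8: skip.
            _ => {}
        },
        // Invalid hex: skip.
        _ => {}
    }

    i
}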
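
To see why the length check works, the sketch below isolates it in a helper and prints the debug length of a few sample strings. The name `looks_printable` and the samples are assumptions made for this example only.

```rust
// Hypothetical helper mirroring the length check in `output`:
// the debug form includes the two surrounding quotes.
fn looks_printable(s: &str) -> bool {
    format!("{:?}", s).len() < 7
}

fn main() {
    // A letter, a 3-byte char, a C1 control char, and a newline.
    for s in ["a", "€", "\u{80}", "\n"] {
        println!(
            "{:?} -> debug length {}, passes check: {}",
            s,
            format!("{:?}", s).len(),
            looks_printable(s)
        );
    }
}
```

A printable character of up to 3 bytes plus two quotes is at most 5 bytes long, while an escape like \u{80} pushes the length to 8. Short escapes such as \n also pass the check, so the heuristic is not a strict printability test.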
