Valacor

Rust: A JSON parser

GitHub: Source file on GitHub

It's been a while since I wrote my last Rust code, so I thought: why not write a little JSON parser? So I fired up VS Code.

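JSON takes its types from JavaScript (the name stands for JavaScript Object Notation, after all). The parser in this post handles the five types listed below; JSON also defines null, which is left out here.

JSON types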

  1. boolean
  2. number
  3. string
  4. array
  5. object

Rust representation

So how can I represent these types in Rust? Enums have a very cool feature: every variant can carry data of one or more types.
Here we go:

use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum JsonType {
    Object(HashMap<String, JsonType>),
    Array(Vec<JsonType>),
    String(String),
    Number(i64),
    Decimal(f64),
    Boolean(bool)
}

The properties of an object are stored in a HashMap whose values are JsonType, so a property can hold any of the other variants of the enum, including another object. For arrays I use a Vec, a growable array type, which also holds JsonType values. Numbers need two variants because Rust has separate types for integers (i64) and floating-point numbers (f64). String and Boolean are represented by the corresponding built-in types.
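
To make the recursive shape of the enum more concrete, here is a small sketch that builds a JsonType value by hand and inspects it with match. The enum is copied from above; the functions describe and sample_document are hypothetical helpers for this illustration only and are not part of the parser.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum JsonType {
    Object(HashMap<String, JsonType>),
    Array(Vec<JsonType>),
    String(String),
    Number(i64),
    Decimal(f64),
    Boolean(bool),
}

// Hypothetical helper (not part of the parser): names the variant of a value.
fn describe(value: &JsonType) -> &'static str {
    match value {
        JsonType::Object(_) => "object",
        JsonType::Array(_) => "array",
        JsonType::String(_) => "string",
        JsonType::Number(_) => "integer",
        JsonType::Decimal(_) => "decimal",
        JsonType::Boolean(_) => "boolean",
    }
}

// Hypothetical helper: builds the value a parser would produce for
// {"name": "Ferris", "age": 7, "tags": ["crab", true]}
fn sample_document() -> JsonType {
    let mut fields = HashMap::new();
    fields.insert("name".to_string(), JsonType::String("Ferris".to_string()));
    fields.insert("age".to_string(), JsonType::Number(7));
    fields.insert(
        "tags".to_string(),
        JsonType::Array(vec![
            JsonType::String("crab".to_string()),
            JsonType::Boolean(true),
        ]),
    );
    JsonType::Object(fields)
}
```

Because Object and Array contain JsonType values themselves, nesting of any depth fits into this single type, and the compiler checks that every match covers all six variants.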

Parsing can fail because of unexpected or missing tokens, or because a feature is not supported (e.g. nested arrays). I represent these failures with the ParserError type:

#[derive(Debug, PartialEq)]
pub enum ParserError {
    UnexpectedToken(String),
    InvalidSyntax(String),
    MissingToken(String),
    EmptyInput,
    NotSupported(String)
}

Parsing logic

pub fn parse_json(input: &str) -> Result<JsonType, ParserError> {
    // Leading whitespace is insignificant, so skip it first
    let input = input.trim_start();

    // Decide what to parse by looking at the first character
    match input.chars().next() {
        // Nothing left after trimming: the input was empty or only whitespace
        None => Err(ParserError::EmptyInput),
        // Parse JSON object (the remaining slice is not needed at the top level)
        Some('{') => parse_object(input).map(|(obj, _rest)| JsonType::Object(obj)),
        // Parse JSON array
        Some('[') => parse_array(input).map(|(arr, _rest)| JsonType::Array(arr)),
        Some(c) => Err(ParserError::UnexpectedToken(format!("Unexpected token: {}", c)))
    }
}

What do we have here? This function is our starting point. It takes a string slice as a parameter and returns a Result<JsonType, ParserError>. A Result is either Ok, carrying the parsed value, or Err, carrying the error.

This parser expects the top-level value to be either an array or an object (the current JSON specification, RFC 8259, allows any value at the top level, but this parser does not). So we look at the first character and, depending on what it is, parse an array or an object, or return a ParserError.
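
As a rough illustration of that first-character dispatch and of how a caller can branch on the Result, here is a minimal, self-contained sketch. It is not the parser itself: top_level_kind and report are hypothetical helpers, and the ParserError enum is reduced to the two variants needed here.

```rust
// Reduced copy of the error type, only the variants this sketch needs.
#[derive(Debug, PartialEq)]
pub enum ParserError {
    UnexpectedToken(String),
    EmptyInput,
}

// Hypothetical helper: mirrors only the first-character check of parse_json
// and reports which kind of top-level value would be parsed.
fn top_level_kind(input: &str) -> Result<&'static str, ParserError> {
    match input.trim_start().chars().next() {
        None => Err(ParserError::EmptyInput),
        Some('{') => Ok("object"),
        Some('[') => Ok("array"),
        Some(c) => Err(ParserError::UnexpectedToken(format!("Unexpected token: {}", c))),
    }
}

// Hypothetical caller: turns every possible outcome into a message.
fn report(input: &str) -> String {
    match top_level_kind(input) {
        Ok(kind) => format!("top-level {}", kind),
        Err(ParserError::EmptyInput) => "nothing to parse".to_string(),
        Err(ParserError::UnexpectedToken(msg)) => format!("rejected: {}", msg),
    }
}
```

Using chars().next() returns an Option, so the empty-input case becomes one more match arm instead of a separate check followed by unwrap.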

Now the implementation of the object parsing function (the complete code is available via the link above):

fn parse_object(mut input: &str) -> Result<(HashMap<String, JsonType>, &str), ParserError> {
    let mut result = HashMap::new();

    if !input.starts_with('{') {
        return Err(ParserError::InvalidSyntax("Object must start with '{'".to_string()));
    }

    input = input[1..].trim_start();

    loop {
        match input.chars().next() {
            // Closing brace right away: empty object (or end after a trailing comma)
            Some('}') => return Ok((result, &input[1..])),
            // Running out of input is an error, not a panic
            None => return Err(ParserError::MissingToken("Expected '}' before end of input".to_string())),
            _ => {}
        }

        // Parse each key-value pair; the key is always a string
        let (key, rest) = parse_string(input)?;
        input = rest;

        // Expect a colon
        if !input.starts_with(':') {
            return Err(ParserError::MissingToken("Expected ':' after key".to_string()));
        }

        input = input[1..].trim_start();

        // Pick the value parser based on the first character of the value
        let value = match input.chars().next() {
            Some('{') => {
                let (obj, rest) = parse_object(input)?;
                input = rest;
                JsonType::Object(obj)
            },
            Some('[') => {
                let (arr, rest) = parse_array(input)?;
                input = rest;
                JsonType::Array(arr)
            },
            Some('"') => {
                let (s, rest) = parse_string(input)?;
                input = rest;
                JsonType::String(s)
            },
            Some('t') | Some('f') => {
                let (b, rest) = parse_boolean(input)?;
                input = rest;
                JsonType::Boolean(b)
            },
            Some(c) if c.is_ascii_digit() || c == '-' => {
                // parse_number already returns a JsonType (Number or Decimal)
                let (n, rest) = parse_number(input)?;
                input = rest;
                n
            },
            Some(c) => return Err(ParserError::UnexpectedToken(format!("Unexpected token in object value: {}", c))),
            None => return Err(ParserError::MissingToken("Expected a value before end of input".to_string()))
        };

        result.insert(key, value);
        input = input.trim_start();

        // Check for comma or end of object
        match input.chars().next() {
            // Move past the comma
            Some(',') => input = input[1..].trim_start(),
            // End of object
            Some('}') => return Ok((result, input[1..].trim_start())),
            Some(c) => return Err(ParserError::UnexpectedToken(format!("Expected ',' or '}}' in object, found: {}", c))),
            None => return Err(ParserError::MissingToken("Expected ',' or '}' before end of input".to_string()))
        }
    }
}

I want to explain the concept these functions follow: on success, each one returns a tuple containing the parsed element and a string slice.
The slice is the part of the JSON input that remains after the parsed element. The calling function uses it as the point where it continues parsing.
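
To illustrate the convention, here is a minimal sketch of how a boolean parser and a number parser could look when they return the parsed element together with the remaining slice. The signatures are inferred from how parse_object calls them; the bodies are assumptions made for this example, not the code from the repository, and the enums are reduced to the variants needed here. The number sketch is also more lenient than strict JSON (it would accept "1." or "+5", for instance).

```rust
#[derive(Debug, PartialEq)]
pub enum JsonType {
    Number(i64),
    Decimal(f64),
    Boolean(bool),
}

#[derive(Debug, PartialEq)]
pub enum ParserError {
    UnexpectedToken(String),
    InvalidSyntax(String),
}

// Assumed shape of the boolean parser: the value plus the unread rest.
fn parse_boolean(input: &str) -> Result<(bool, &str), ParserError> {
    if let Some(rest) = input.strip_prefix("true") {
        Ok((true, rest))
    } else if let Some(rest) = input.strip_prefix("false") {
        Ok((false, rest))
    } else {
        Err(ParserError::UnexpectedToken(format!(
            "Expected 'true' or 'false', found: {}",
            input
        )))
    }
}

// Assumed shape of the number parser: chooses Number or Decimal itself,
// which is why parse_object can use the returned value directly.
fn parse_number(input: &str) -> Result<(JsonType, &str), ParserError> {
    // The number ends at the first character that cannot belong to it.
    let end = input
        .find(|c: char| !(c.is_ascii_digit() || matches!(c, '-' | '+' | '.' | 'e' | 'E')))
        .unwrap_or(input.len());
    let (text, rest) = input.split_at(end);

    // Try the integer variant first, then fall back to floating point.
    if let Ok(n) = text.parse::<i64>() {
        return Ok((JsonType::Number(n), rest));
    }
    match text.parse::<f64>() {
        Ok(d) => Ok((JsonType::Decimal(d), rest)),
        Err(_) => Err(ParserError::InvalidSyntax(format!("Invalid number: {}", text))),
    }
}
```

With these definitions, parse_number("42, 7") yields JsonType::Number(42) together with the rest ", 7", so the caller knows it has to look for a comma or a closing bracket next.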

To read a JSON object we need a loop that keeps reading properties until it encounters a }. Every property consists of a name:value pair. The name is always a string, so we can reuse our string parsing function. If that succeeds, we read the : and then check the next character. Depending on that character we call the appropriate parsing function. If it succeeds, we continue with the remaining slice and wrap the result in the matching JsonType variant. The last step is adding the key-value pair to our HashMap.
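
The key-reading step described above can be sketched as follows. This is a simplified assumption of what a string parser with the same return convention might look like, not the code from the repository: it does not handle escape sequences such as \" and it skips whitespace after the closing quote. parse_key is a hypothetical helper that combines reading the name and the colon.

```rust
#[derive(Debug, PartialEq)]
pub enum ParserError {
    InvalidSyntax(String),
    MissingToken(String),
}

// Assumed shape of the string parser; escape sequences are NOT handled.
fn parse_string(input: &str) -> Result<(String, &str), ParserError> {
    let body = input
        .strip_prefix('"')
        .ok_or_else(|| ParserError::InvalidSyntax("String must start with '\"'".to_string()))?;
    // In this simplified version the string ends at the next double quote.
    let end = body
        .find('"')
        .ok_or_else(|| ParserError::MissingToken("Expected closing '\"'".to_string()))?;
    // Skip the closing quote and any whitespace before the next token.
    Ok((body[..end].to_string(), body[end + 1..].trim_start()))
}

// Hypothetical helper: reads `"key":` and returns the key plus the
// slice that starts at the value.
fn parse_key(input: &str) -> Result<(String, &str), ParserError> {
    let (key, rest) = parse_string(input)?;
    let rest = rest
        .strip_prefix(':')
        .ok_or_else(|| ParserError::MissingToken("Expected ':' after key".to_string()))?;
    Ok((key, rest.trim_start()))
}
```

The ? operator passes an error from parse_string straight up to the caller, which is the same behavior as the explicit Err(e) => return Err(e) arms, only shorter.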

Feel free to send us suggestions, bug reports and improvement ideas🙂