A Gemini Client in Rust - 05 Handling URLs

Nivethan
Originally published at nivethan.dev

Hello! Welcome back after the mess that was getting our connection set up! I'll hold my thoughts to the end, but I'll just say that it feels very much like there is a better way to do the connections to Gemini servers.

Anyway, let's get started with this chapter! What we need to do now is clean up our URLs. Currently we can do visit gemini.conman.org and that's it. We can't do something like visit gemini.conman.org/news.txt or visit gemini://gemini.conman.org.

The Gemini specification references RFC 3986 when talking about URLs. The key part we need to worry about is page 16 - Syntax.

https://tools.ietf.org/html/rfc3986#section-1.1.1

This page highlights what a URL is really made of and gives us the structure we need to parse the URL into.

So what we'll do is create a struct that contains the parts of our URL, and then we'll have it generate the various URLs we need as we need them.

Let's get started!

...
#[derive(Debug)]
struct Url {
    scheme: String,
    address: String,
    port: String,
    path: String,
    query: String,
    fragment: String,
}

impl Url {
    fn new(url: &str) -> Self {
        let mut url_string = url.to_string();

        let scheme: String;
        let scheme_end_index  = url_string.find("://").unwrap_or(0);

        if scheme_end_index == 0 {
            scheme = String::from("gemini");
        } else {
            scheme = url_string.drain(..scheme_end_index).collect();
        }
        url_string = url_string.replacen("://", "", 1);

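        // Only look for the port separator inside the authority, i.e. before the first '/'.
        // Otherwise a ':' later in the path or query would be mistaken for a port.
        let authority_end = url_string.find("/").unwrap_or(url_string.len());
        let mut address_end_index = url_string[..authority_end].find(":").unwrap_or(0);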

        let address: String;
        let port: String;

        if address_end_index != 0 {
            url_string = url_string.replacen(":", "", 1);
            address = url_string.drain(..address_end_index).collect();

            let port_end_index = url_string.find("/").unwrap_or(url_string.len());
            port = url_string.drain(..port_end_index).collect();
            url_string = url_string.replacen("/", "", 1);

        } else {
            address_end_index = url_string.find("/").unwrap_or(url_string.len());
            address = url_string.drain(..address_end_index).collect();
            url_string = url_string.replacen("/", "", 1);

            match scheme.as_str() {
                "gemini" => port = "1965".to_string(),
                "http" => port = "80".to_string(),
                "https" => port = "443".to_string(),
                _ => port = "".to_string()
            }
        }

        let path_end_index = url_string.find("?").unwrap_or(url_string.len());
        let path: String = url_string.drain(..path_end_index).collect();
        url_string = url_string.replacen("?", "", 1);


        let query_end_index = url_string.find("#").unwrap_or(url_string.len());
        let query: String = url_string.drain(..query_end_index).collect();
        url_string = url_string.replacen("#", "", 1);

        let fragment = url_string;

        Url {
            scheme, address, port, path, query, fragment
        }
    }

}
...

Based on the syntax in RFC 3986, we know we need to capture the following things: the scheme, the authority (which we will break into address and port), the path, the query, and lastly the fragment.

We first create a struct to match that.

Next what we need to do is create a constructor for our Url object that can take a wide variety of input and create a Url object out of it.

We do this by writing a new function for our Url. It looks complex, but really we are just following a simple series of steps. Each portion of a URL is delimited by some character. The first step is to locate that character; everything before that character will then correspond to a portion of the URL.
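To make the "find the delimiter, keep what comes before it" idea concrete, here is a minimal sketch of that single step using the standard library's split_once. The name take_until is made up for this illustration and is not part of our client.

```rust
/// Hypothetical helper (not part of the client): split `input` at the first
/// occurrence of `delimiter`. Returns the text before the delimiter and the
/// text after it; the delimiter itself is dropped. If the delimiter is
/// missing, everything counts as "before" and the remainder is empty.
fn take_until(input: &str, delimiter: char) -> (&str, &str) {
    match input.split_once(delimiter) {
        Some((before, after)) => (before, after),
        None => (input, ""),
    }
}
```

For example, take_until("news.txt?page=2", '?') gives back "news.txt" and "page=2". Our constructor does the same job with find, drain and replacen, because it keeps mutating one String as it goes.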

Let's go over how to parse out the scheme. We first find where :// occurs. If it doesn't occur we will default the value to 0. This is what our unwrap_or() does. Next we drain our url_string, starting from the beginning to our end point. This means that url_string will lose characters and those characters will go into our scheme variable.

If no scheme was found, there is nothing to drain. Lastly, we need to remove :// from our URL string. We do this with the replacen function, which removes just the first instance of the delimiter rather than all of them. We shouldn't see this text in the URL again anyway, but I feel safer replacing just the first instance.

Now if we don't find a scheme, then we will default to gemini. This is specified in the Gemini specification.
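If it helps to see the scheme step on its own, here is a small sketch that pulls it out into a standalone function. The name split_scheme is an assumption for illustration only; the logic mirrors what the constructor does with find, drain and replacen.

```rust
/// Hypothetical standalone version of the scheme step. Returns the scheme
/// and whatever is left of the URL once the scheme and "://" are removed.
fn split_scheme(url: &str) -> (String, String) {
    let mut rest = url.to_string();

    // Position of "://", or 0 when the URL has no scheme.
    let scheme_end = rest.find("://").unwrap_or(0);

    let scheme: String = if scheme_end == 0 {
        // No scheme given, so fall back to gemini.
        String::from("gemini")
    } else {
        // drain removes these characters from `rest` and hands them to us.
        rest.drain(..scheme_end).collect()
    };

    // Remove only the first "://", which is now at the front of `rest`.
    let rest = rest.replacen("://", "", 1);

    (scheme, rest)
}
```

Calling split_scheme("gemini://gemini.conman.org/news.txt") leaves us with the scheme "gemini" and the remainder "gemini.conman.org/news.txt", while a bare "gemini.conman.org" comes back unchanged with the default scheme.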

We then repeat these steps for each portion of the URL.

Each portion of the URL has a delimiter we need to find. We drain everything from the beginning up to the delimiter, then remove the delimiter from our URL string and move on to the next portion.

One thing to note here is that if we don't get a port number, we will try to default one based on the scheme. If we can't match against the scheme, we will leave the port number blank. (This will blow up, but if we don't recognize the scheme we shouldn't be connecting anyway!)
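As a possible variation on the port defaulting, here is a sketch that returns an Option instead of a blank string, so the "we don't know this scheme" case is explicit. The name default_port and the choice of u16 are assumptions for this example; our Url struct keeps the port as a String.

```rust
/// Hypothetical helper: the default port for a scheme, or None when the
/// scheme is not one we know how to connect to.
fn default_port(scheme: &str) -> Option<u16> {
    match scheme {
        "gemini" => Some(1965),
        "http" => Some(80),
        "https" => Some(443),
        _ => None,
    }
}
```

With this shape the caller has to decide what to do with None, rather than finding out later when the connection fails on an empty port.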

Now let's write our url formatter functions!

...
impl Url {
    fn new(url: &str) -> Self { ... }
    fn for_tcp(&self) -> String {
        format!("{address}:{port}", address=self.address, port=self.port)
    }
    fn for_dns(&self) -> String {
        self.address.clone()
    }
    fn request(&self) -> String {
        format!("{scheme}://{address}:{port}/{path}\r\n",
            scheme = self.scheme,
            address = self.address,
            port = self.port,
            path = self.path
        )
    }
}
...

Here we have some very simple helper functions. We create a string for our TcpStream, we create one for a DNS lookup, and lastly we create our full Gemini request.

Something to note here is that the request ends with \r\n. This is part of the Gemini spec.
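Here is a minimal sketch of that request format on its own, assuming we already have the full URL as a string. The name request_line is made up for illustration. It also checks the 1024 byte limit that the Gemini specification puts on the URL, which our request function does not do.

```rust
/// Hypothetical helper: build a Gemini request line from a full URL.
/// A request is just the URL followed by CR LF. Returns None when the URL
/// is longer than the 1024 bytes the Gemini specification allows.
fn request_line(url: &str) -> Option<String> {
    // str::len counts bytes, which is what the limit is measured in.
    if url.len() > 1024 {
        return None;
    }
    Some(format!("{}\r\n", url))
}
```

For "gemini://gemini.conman.org:1965/news.txt" this gives back the same string with \r\n appended, ready to be written to the stream as bytes.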

Now we can go ahead and update our visit function and main function to use our new Url object!

...
        match tokens[0] {
            "q" => break,
            "visit" => {
                if tokens.len() < 2 {
                    println!("Nowhere to visit.");
                    // Go back to the prompt instead of exiting the program.
                    continue;
                }

                let url = Url::new(tokens[1]);
                visit(url)
            },
            _ => println!("{:?}", tokens),

        }
...

We moved our length check from our visit function to inside our command processor. I wanted to have the URL passed into the visit function, so this logic needed to happen before we create the Url.

We use our new constructor to create a Url object and we then pass that to our visit function.

Now let's update our visit function!

...
fn visit(url: Url) {

    println!("Attempting to visit....{}", url.address);
...
    let dns_url = url.for_dns();
    let dns_name = webpki::DNSNameRef::try_from_ascii_str(&dns_url).unwrap();
...
    let mut socket = TcpStream::connect(url.for_tcp()).unwrap();
...
    // write_all keeps writing until the whole request has been sent.
    stream.write_all(url.request().as_bytes()).unwrap();
...

We have changed the parameter we pass into our function: we now pass in just our Url object. We have also replaced the various URLs we constructed with strings from our Url object!

With that we should be done!

Try visit gemini.conman.org/news.txt.

This should now work! We now have a much better version of our original visit function. Now that we have all the parts of our URL in a struct, we can manipulate them and use them in any way we like!

We have the core of our Gemini client done! We can use it now as is to browse Gemini space, but we do have some things to fix up. Currently we print out the status and response body at all times; however, that's not the only thing that can happen in Gemini! In the next chapter we will look at dealing with the various Gemini statuses!

See you soon!

PS. I'm not in love with the way we've done our parsing of the URL, let me know if you have better ideas. I may take a look at some URL parsing crates to see how they do things.
