DEV Community

Cover image for Introduction to URL router from scratch with Golang
Kenta Takeuchi
Kenta Takeuchi

Posted on • Updated on

Introduction to URL router from scratch with Golang

This article is the 13th day article of Makuake Advent Calendar 2020.

This article is based on the following article with some modifications.

Introduction

I will write about the process of creating a URL router in Golang.

Routing of logic, it will adopt the algorithm trie-tree as the base rather than the regular expression.

There are various router packages in Go.

ex.

Complex algorithms of the tree structure for optimization of performance in many of the library has been adopted.

This time, I would like to implement it in a form that reduces the difficulty of the algorithm by defining a data structure that seems to be relatively easy to understand.

What is a URL router?

Screen Shot 2019-12-14 at 23 05 02

URL router is an application for sorting processes based on URL.

It may not be necessary to explain, but I would like to confirm the definition of the word URL and router.

URL

URL represents the address of a page on the Internet and is an abbreviation for Uniform Resource Locator.

The format of the URL string is defined as follows.

 <scheme>:<scheme-specific-part>
Enter fullscreen mode Exit fullscreen mode

Protocol names such as http, https, and ftp are often used for the part, but schema names other than the protocol name are also defined.

cf. Uniform Resource Identifier (URI) Schemes

In the <sceme-specific-part> part, a string based on the schema is defined.

For example, in the case of http and https schemes, the rule is that the domain name and path name (or directory name) are defined.

See RFC1738 for detailed URL specifications.

cf. rfc1738 - Uniform Resource Locators (URL)

RFC1738 is positioned as the Internet Standard (STD1).

Router

As you know, Routing means routing control.

For example, a router that relays communication between networks uses a route table to perform routing that mediates the transfer of packets.

The URL router receives information called a URL from a browser request and acts as an intermediary between the URL and processing in order to process the data based on the received URL.

Just as routers in your network have a data structure called a routing table, URL routers need to have a data structure for routing.

Requirement definition

There are various multifunctional routers, but this time we will implement a router with the following functions.

  • Supporting static routing
    • The URL /foo/bar/ can be matched with the static routing definition /foo/bar/ and the process can be returned.
  • Supporting dynamic routing
    • The URL /foo/bar/123 can be matched with the routing definition that has dynamic path parameters such as /foo/bar /:id, and the process can be returned using the path parameter data.

Data flow

I think it's not particularly difficult, but I would like to confirm the data flow.

Screen Shot 2019-12-15 at 19 07 42

The router receives the path part of the URL as input data, determines the data that matches the path, and calls the process.

How to match the path and the process is the basis of the implementation.

Tree structure

The basic implementation that matches the path and the process implements the algorithm using a data structure called a tree structure.

Screen Shot 2019-12-15 at 20 55 26

A tree structure is a data structure that has roots, branches, nodes, and leaves (nodes at the end of Leaf).

There are various types of tree structures depending on the pattern of data storage method and search method.

Since we want to handle strings mainly in URL routing, we adopt a data structure called a trie tree that specializes in string search.

Trie tree structure

A trie tree is a type of tree structure that handles a set of strings.

Also called a prefix tree.

Each node has a single or multiple string or number, and expresses a word by searching from the root node toward the leaf and connecting the values.

Each node does not necessarily have to have a value.

Trie tree are used for IP address search in the network layer, http routing in the application layer, and morphological analysis in the context of machine learning.

The image of the algorithm is the following visualization tool
I think it's easy to grasp when you experience it.

Trie(Prefix Tree)

If you want to optimize the memory efficiency of the trie tree, you should consider a tree structure that efficiently stores the prefix of the string such as Radix Tree (Patricia Trie).

The computational complexity of the trie tree is the same for both search and insert, and when the length of the search key is m, the worst computational complexity is O(m).

I have prepared sample code so please take a look.

trie

How to handle trie trees?

Customize the trie tree to make it easier to use with your URL router.

Screen Shot 2019-12-15 at 21 39 38

As an example, the figure above defines a routing that only supports GET.

  • /foo/
  • /baz/
  • /foo/bar/
  • /foo/:name[string]/
  • /foo/bar/:id[int]

A node for HTTP methods is created directly under the root, and the path is stored for each HTTP method.

Nodes starting with : are nodes that store path parameters.

The DSL [int] is for mapping regular expressions to their parameters.

If the node that was searched and arrived has the processing data, the matching will be successful, and if it is not, the matching will fail.

Below is an implementation of this customized tri-tree.

[goblin/trie.go(https://github.com/bmf-san/goblin/blob/master/trie.go)

Implementation of router in Golang

In Golang, the standard package net/http has a routing function.

That feature is called multiplexer in Golang.

However, the standard package does not support path parameters, so if you want to use it, you need to prepare an external package or extend the standard multiplexer.

This time, I will create my own routing, so I need to extend the multiplexer.

Before learning about multiplexer extensions, it's a good idea to familiarize yourself with the http server specification.

http server code reading

Reference implementation

We will read it with reference to the following implementation.

package main

import (
    "net/http"
)

func main() {
    mux := http.NewServeMux()
    handler := new(HelloHandler)
    mux.Handle("/", handler)

    s := http.Server{
        Addr:    ":3000",
        Handler: mux,
    }
    s.ListenAndServe()
}

type HelloHandler struct{}

func (h *HelloHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("Hello World"))
}
Enter fullscreen mode Exit fullscreen mode

I'm going to read this redundantly written code line by line, simplifying the code.

ServeHttp(w ResponseWriter, r *Request)

I'm going to read this redundantly written code line by line, simplifying the code.

type HelloHandler struct{}

func (h *HelloHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("Hello World"))
}
Enter fullscreen mode Exit fullscreen mode

ServeHTTP (w ResponseWriter, r * Request) is an implementation of the Handler interface.

// url: https://golang.org/src/net/http/server.go?s=61586:61646#L1996
func (mux *ServeMux) ServeHTTP(w ResponseWriter, r *Request) {
    if r.RequestURI == "*" {
        if r.ProtoAtLeast(1, 1) {
            w.Header().Set("Connection", "close")
        }
        w.WriteHeader(StatusBadRequest)
        return
    }
    h, _ := mux.Handler(r)
    h.ServeHTTP(w, r)
}
Enter fullscreen mode Exit fullscreen mode
// url: https://golang.org/src/net/http/server.go?s=61586:61646#L79
type Handler interface {
    ServeHTTP(ResponseWriter, *Request)
}
Enter fullscreen mode Exit fullscreen mode

In the reference implementation, the HelloHandler structure is prepared for ServeHTTP (w ResponseWriter, r * Request),
you can rewrite it more concisely by using HandlerFunc.

// url: https://golang.org/src/net/http/server.go?s=61509:61556#L1993
type HandlerFunc func(ResponseWriter, *Request)

func (f HandlerFunc) ServeHTTP(w ResponseWriter, r *Request) {
    f(w, r)
}
Enter fullscreen mode Exit fullscreen mode

Rewriting the reference implementation looks like this.

package main

import (
    "net/http"
)

func main() {
    mux := http.NewServeMux()
    mux.Handle("/", http.HandlerFunc(hello))

    s := http.Server{
        Addr:    ":3000",
        Handler: mux,
    }
    s.ListenAndServe()
}

func hello(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("Hello World"))
}
Enter fullscreen mode Exit fullscreen mode

I was able to rewrite the part that used ServeHTTP (w ResponseWriter, r * Request).

By the way, the contents of mux.Handle are implemented like this.

// url: https://golang.org/src/net/http/server.go?s=75321:75365#L2390
func (mux *ServeMux) Handle(pattern string, handler Handler) {
    mux.mu.Lock()
    defer mux.mu.Unlock()

    if pattern == "" {
        panic("http: invalid pattern")
    }
    if handler == nil {
        panic("http: nil handler")
    }
    if _, exist := mux.m[pattern]; exist {
        panic("http: multiple registrations for " + pattern)
    }

    if mux.m == nil {
        mux.m = make(map[string]muxEntry)
    }
    e := muxEntry{h: handler, pattern: pattern}
    mux.m[pattern] = e
    if pattern[len(pattern)-1] == '/' {
        mux.es = appendSorted(mux.es, e)
    }

    if pattern[0] != '/' {
        mux.hosts = true
    }
}
Enter fullscreen mode Exit fullscreen mode

ServeMux

Let's take a closer look at the previous code.

    mux := http.NewServeMux()
    mux.Handle("/", http.HandlerFunc(hello))

    s := http.Server{
        Addr:    ":3000",
        Handler: mux,
    }
Enter fullscreen mode Exit fullscreen mode

Since the part of mux.Handle ("/", http.HandlerFunc (hello)) can be processed internally by using HandleFunc, you can write shorter.

url: https://golang.org/src/net/http/server.go?s=75575:75646#L2448
func HandleFunc(pattern string, handler func(ResponseWriter, *Request)) {
    DefaultServeMux.HandleFunc(pattern, handler)
}
Enter fullscreen mode Exit fullscreen mode
url: https://golang.org/src/net/http/server.go?s=75575:75646#L2435
func (mux *ServeMux) HandleFunc(pattern string, handler func(ResponseWriter, *Request)) {
    if handler == nil {
        panic("http: nil handler")
    }
    mux.Handle(pattern, HandlerFunc(handler))
}
Enter fullscreen mode Exit fullscreen mode

If you rewrite it in consideration of the above, it will look like this.

package main

import (
    "net/http"
)

func main() {
    http.HandleFunc("/", hello)

    s := http.Server{
        Addr:    ":3000",
        Handler: http.DefaultServeMux,
    }
    s.ListenAndServe()
}

func hello(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("Hello World"))
}
Enter fullscreen mode Exit fullscreen mode

DefaultServeMux is internally a variable that contains a pointer to the ServeMux structure.

HandleFunc is a method that allows you to register a url pattern match to DefaultServeMux.

url: https://golang.org/src/net/http/server.go?s=75575:75646#L2207
// DefaultServeMux is the default ServeMux used by Serve.
var DefaultServeMux = &defaultServeMux

var defaultServeMux ServeMux
Enter fullscreen mode Exit fullscreen mode
url: https://golang.org/src/net/http/server.go?s=68149:68351#L2182
type ServeMux struct {
    mu    sync.RWMutex
    m     map[string]muxEntry
    es    []muxEntry // slice of entries sorted from longest to shortest.
    hosts bool       // whether any patterns contain hostnames
}
Enter fullscreen mode Exit fullscreen mode

Server

The last thing to look at is this part.

    s := http.Server{
        Addr:    ":3000",
        Handler: http.DefaultServeMux,
    }
    s.ListenAndServe()
Enter fullscreen mode Exit fullscreen mode

The implementation of s.ListenAndServe () are here.

url: https://golang.org/src/net/http/server.go?s=68149:68351#L3093
func (srv *Server) ListenAndServe() error {
    if srv.shuttingDown() {
        return ErrServerClosed
    }
    addr := srv.Addr
    if addr == "" {
        addr = ":http"
    }
    ln, err := net.Listen("tcp", addr)
    if err != nil {
        return err
    }
    return srv.Serve(ln)
}
Enter fullscreen mode Exit fullscreen mode

If you don't need to give detailed settings to Server, you can write it short by using ListenAndServe ().

Please refer to golang.org --server.go for the setting value of Server.

url: https://golang.org/src/net/http/server.go?s=68149:68351#L3071
func ListenAndServe(addr string, handler Handler) error {
    server := &Server{Addr: addr, Handler: handler}
    return server.ListenAndServe()
}
Enter fullscreen mode Exit fullscreen mode

It looks like this in short.

package main

import (
    "net/http"
)

func main() {
    http.HandleFunc("/", hello)

    http.ListenAndServe(":3000", nil)
}

func hello(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("Hello World"))
}
Enter fullscreen mode Exit fullscreen mode

It looks like this when used with an anonymous function.

package main

import (
    "net/http"
)

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("Hello World"))
    })

    http.ListenAndServe(":3000", nil)
}
Enter fullscreen mode Exit fullscreen mode

Extend multiplexer

Here is the code for the part that extends the multiplexer.

goblin/router.go

Golang is easy to extend because the interface of the standard package is properly maintained.

Conclusion

This is the package actually developed in this way.

bmf-san/goblin

The amount of code is small, so please read it.

Of course, contributions are also welcome. :D

Appendix

goblin listed in awesome-go

I posted an article related to HTTP Router. Please read this too if you like.

Top comments (0)