DEV Community

Chris White


Python Networking: Servers

Note: I've added a table of contents to previous installments so they'll hopefully be easier to navigate. Thanks to derlin for the nifty TOC generation tool!

So far we've seen the basic communication pattern for networking and the three low-level protocols that make up the core of Internet communication. Now we're going to look at how servers operate. Servers are anything from the Bulletin Board Systems (BBS) of back in the day to modern web servers hosting millions of clients. This article starts with a simple Python server and slowly adds more functionality to how it serves data.

Security Note

The code here does not guard against malicious attacks done via manipulating how client data is sent. It's only meant to show the basics of how each type of server works. If you're working with a public facing service you should really have a reverse proxy and even a firewall in front of it to handle such attacks. I generally prefer doing it at that layer since it's easier to handle network hardening in easy to update software than trying to handle it across who knows how many codebases. So basically:

Don't use any of this in production

Basic Server

Most servers have a workflow of:

  1. Bind to a port
  2. Start listening for traffic
  3. Accept a connection
  4. Deal with the connection
  5. Close the connection

So we'll start with an echo server that simply replies back to the client with what it was sent. Here is some example code from the python documentation:

# Echo server program
import socket

HOST = ''                 # Symbolic name meaning all available interfaces
PORT = 50007              # Arbitrary non-privileged port
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind((HOST, PORT))
    s.listen(1)
    conn, addr = s.accept()
    with conn:
        print('Connected by', addr)
        while True:
            data = conn.recv(1024)
            if not data: break
            conn.sendall(data)

And the client:

# Echo client program
import socket

HOST = 'localhost'    # The remote host
PORT = 50007              # The same port as used by the server
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((HOST, PORT))
    s.sendall(b'Hello, world')
    data = s.recv(1024)
print('Received', repr(data))

The results are:

> python .\simple_server.py
Connected by ('127.0.0.1', 53811)
>

> python .\simple_client.py
Received b'Hello, world'
>

Before continuing with this I'd like to take a moment to discuss port bind permissions.

Permissions Dropping

One interesting thing to note is per the IANA well known ports listing there is actually a specific port number 7 which is designated for an echo server. If I try to bind this in Windows as a non-privileged user:

TCP    0.0.0.0:7              0.0.0.0:0              LISTENING       24664

It happily complies with the request (though you may need a one-time Windows Firewall exception). Linux on the other hand:

    s.bind((HOST, PORT))
PermissionError: [Errno 13] Permission denied

The port bind gets rejected. This will occur on almost any *NIX-like system, where ports below 1024 are privileged and by default can only be bound by root. Now we could just run it as root to solve the problem:

# python3 SimpleServer/simple_server.py
Server bound to port 7

But in general running services as root is not really desired since if someone manages to exploit the server they could potentially have full control over the system. To get around this we can utilize os.setuid and os.setgid. The code then becomes something like this:

# Echo server program
import socket, os, pwd, grp

HOST = ''                 # Symbolic name meaning all available interfaces
PORT = 7                  # Well known echo port

# https://stackoverflow.com/a/2699996
def drop_privileges(uid_name='nobody', gid_name='nogroup'):
    if os.getuid() != 0:
        # We're not root so, like, whatever dude
        return

    # Get the uid/gid from the name
    running_uid = pwd.getpwnam(uid_name).pw_uid
    running_gid = grp.getgrnam(gid_name).gr_gid

    # Remove group privileges
    os.setgroups([])

    # Try setting the new uid/gid
    os.setgid(running_gid)
    os.setuid(running_uid)

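    # Mask out all access for "other"; owner and group are unaffected.
    # Written in octal (0o007), the usual notation for permission bits.
    old_umask = os.umask(0o007)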

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind((HOST, PORT))
    print(f'Server bound to port {PORT}')
    drop_privileges()
    s.listen(1)
    conn, addr = s.accept()
    with conn:
        print('Connected by', addr)
        while True:
            data = conn.recv(1024)
            if not data: break
            conn.sendall(data)

This will drop permissions to a specific user and group, "nobody" and "nogroup" by default. The pwd.getpwnam call obtains the user's entry from the UNIX password database (usually /etc/passwd), and grp.getgrnam does the same for the UNIX group database (usually /etc/group). After running this we can see the port is bound, but the process is running as nobody:

# python3 SimpleServer/simple_server_drop_priv.py
Server bound to port 7
$ pgrep -a -u nobody
285691 python3 SimpleServer/simple_server_drop_priv.py
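
To see what the pwd.getpwnam and grp.getgrnam lookups return, here is a minimal sketch that resolves a user and group name to numeric IDs. The helper name resolve_ids is hypothetical, and this assumes a UNIX-like system, since the pwd and grp modules aren't available on Windows. Some distributions don't ship a group called "nogroup", so the sketch treats a missing name as a normal outcome rather than an error.

```python
import grp
import pwd


def resolve_ids(user_name, group_name):
    """Return (uid, gid) for the given names, or None if either is missing.

    resolve_ids is a hypothetical helper used only for illustration.
    """
    try:
        uid = pwd.getpwnam(user_name).pw_uid    # entry from the password database
        gid = grp.getgrnam(group_name).gr_gid   # entry from the group database
    except KeyError:
        # Both lookups raise KeyError when the name does not exist
        return None
    return uid, gid


print(resolve_ids('nobody', 'nogroup'))
```

Checking for None before calling os.setgid and os.setuid is one way to fail early with a clear message instead of an unhandled KeyError.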

umask controls the permissions of files and directories created by the process. The 007 mask I have set leaves user and group permissions as requested, while all other users are blocked from access. This means I could change the process group to something like "serveradmin" and users in that group would be able to interact with the server's files. Alex Juarez has a good article on permissions in general. This Stack Overflow answer also has an interesting look at the nuances of how umask operates.
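
As a rough illustration of the mask in action, the sketch below creates a file under a given umask and reports the resulting permission bits. The helper name mode_with_umask is hypothetical, and it assumes a UNIX-like system with ordinary file permissions (no default ACLs on the temporary directory).

```python
import os
import stat
import tempfile


def mode_with_umask(mask):
    """Create a file under the given umask and return its permission bits."""
    old_mask = os.umask(mask)          # os.umask returns the previous mask
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, 'example.txt')
            # Request rw for everyone (0o666); the bits in the umask are cleared
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old_mask)             # restore the original mask


print(oct(mode_with_umask(0o007)))     # 0o660: rw for user and group, nothing for others
```

The mask only removes bits from what the creating call asks for, so a regular file requested as 0o666 ends up readable and writable by user and group but never gains execute permission.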

Socket Server

Now the problem with the existing server is it exits right away and only handles one connection. This functionality is not practical for something like a web server which needs to constantly serve clients. Now we could make some modifications to have it continually serve connections:

# Echo server program
import socket

HOST = ''     # Symbolic name meaning all available interfaces
PORT = 9999   # Arbitrary non-privileged port
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind((HOST, PORT))
    s.listen()
    while True:
        conn, addr = s.accept()
        print('Connected by', addr)
        with conn:
            data = conn.recv(1024).strip()
            print("{} wrote:".format(addr))
            print(data)
            conn.sendall(data)

But it's fairly low level and not very extensible as-is. Thankfully Python has the socketserver module to provide an abstraction for setting up a server. The Python docs also have an example for a socket server:

import socketserver

class MyTCPHandler(socketserver.BaseRequestHandler):
    """
    The request handler class for our server.

    It is instantiated once per connection to the server, and must
    override the handle() method to implement communication to the
    client.
    """

    def handle(self):
        # self.request is the TCP socket connected to the client
        self.data = self.request.recv(1024).strip()
        print("{} wrote:".format(self.client_address[0]))
        print(self.data)
        # just send back the same data, but upper-cased
        self.request.sendall(self.data.upper())

if __name__ == "__main__":
    HOST, PORT = "localhost", 9999

    # Create the server, binding to localhost on port 9999
    with socketserver.TCPServer((HOST, PORT), MyTCPHandler) as server:
        # Activate the server; this will keep running until you
        # interrupt the program with Ctrl-C
        server.serve_forever()

While the server creation is still somewhat imperative in nature, client connections are now handled via a class which inherits from socketserver.BaseRequestHandler. The implementing class must define a handle() method, and for TCP self.request holds the socket for the connection. Now to show multiple connections working I'll utilize ab, the Apache HTTP server benchmarking tool. This is easily available in Ubuntu via sudo apt-get install apache2-utils:

$ ab -i -n 20 http://172.18.128.1:9999/

This will make several short HTTP requests to the server (I've adjusted the code to bind to the proper IP address). 20 requests will be executed to the server which we can see the result of here:

Animation showing real time traffic on the socketserver

Being a benchmarking tool we also get some nice statistics, most notably:

Requests per second:    1992.99 [#/sec] (mean)

This is compared to the simple server infinite loop version:

Requests per second:    1745.88 [#/sec] (mean)

Now while the code layout has improved there's still the issue of handling multiple clients at once. One interesting way to handle this is to separate the socket acceptance and the actual client handler out.
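
For a quick check without a benchmarking tool, a socketserver handler can also be exercised in-process. This is a minimal sketch, not the setup used for the numbers above: UpperHandler and round_trip are hypothetical names, and it binds to port 0 on 127.0.0.1 so the operating system picks a free port.

```python
import socket
import socketserver
import threading


class UpperHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler: replies with the received bytes upper-cased."""

    def handle(self):
        data = self.request.recv(1024).strip()
        self.request.sendall(data.upper())


def round_trip(message):
    """Start a TCPServer on a free port, send one message, return the reply."""
    # Port 0 asks the OS to pick any unused port
    with socketserver.TCPServer(('127.0.0.1', 0), UpperHandler) as server:
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            with socket.create_connection(server.server_address) as client:
                client.sendall(message)
                return client.recv(1024)
        finally:
            server.shutdown()   # asks serve_forever() to stop and waits for it
            thread.join()


print(round_trip(b'hello'))     # b'HELLO'
```

Running serve_forever() in a background thread keeps the main thread free to act as the client, and shutdown() has to be called from a different thread than the one running serve_forever().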

A Thread Story or GIL Steals Your Lunch Money

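Threads are one way of approaching this issue. The catch in Python (CPython specifically) is the Global Interpreter Lock (GIL), which lets only one thread execute Python bytecode at a time. The short story is that it makes threading less performant than in a language that uses native threads without such a lock. The long story is another full article. The socketserver module has a threaded server wrapper to help with this: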

import threading
import socketserver

class ThreadedTCPRequestHandler(socketserver.BaseRequestHandler):

    def handle(self):
        print("{} wrote:".format(self.client_address[0]))
        data = str(self.request.recv(1024), 'ascii')
        print(data)
        cur_thread = threading.current_thread()
        response = bytes("{}: {}".format(cur_thread.name, data), 'ascii')
        self.request.sendall(response)

if __name__ == "__main__":
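    # Arbitrary non-privileged port (port 0 would select an arbitrary unused port)
    HOST, PORT = "localhost", 50007

    # ThreadingTCPServer is socketserver's ready-made ThreadingMixIn + TCPServer class
    server = socketserver.ThreadingTCPServer((HOST, PORT), ThreadedTCPRequestHandler)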
    server.serve_forever()

While you do have to deal with the GIL, it's really not that bad for a basic sized server.
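
The threaded wrapper is a mixin: socketserver.ThreadingMixIn combined with TCPServer, which is also how the standard library defines socketserver.ThreadingTCPServer. The sketch below builds the same combination by hand and runs two requests against it. The names ThreadNameHandler, MixedInServer, ask, and threaded_demo are hypothetical, and the server binds to port 0 so a free port is chosen.

```python
import socket
import socketserver
import threading


class ThreadNameHandler(socketserver.BaseRequestHandler):
    def handle(self):
        data = self.request.recv(1024)
        # Each connection is handled in its own thread; report which one
        name = threading.current_thread().name.encode('ascii')
        self.request.sendall(name + b': ' + data)


class MixedInServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    daemon_threads = True   # handler threads won't keep the process alive


def ask(address, message):
    """Send one message to the server and return its reply."""
    with socket.create_connection(address) as client:
        client.sendall(message)
        return client.recv(1024)


def threaded_demo():
    with MixedInServer(('127.0.0.1', 0), ThreadNameHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            return [ask(server.server_address, b'one'),
                    ask(server.server_address, b'two')]
        finally:
            server.shutdown()


for reply in threaded_demo():
    print(reply)
```

Each reply is prefixed with the name of the worker thread that handled it rather than the main thread, which shows the accept loop and the handlers running separately.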

Multiprocessing

When you run a server it gets identified by the system as a process. A process can in turn run another process (the top of the chain is the init process for most operating systems). These are often known as child processes and the process that spawned them is the parent process. The multiprocessing module in the Python standard library is able to manage such child processes. Using this method a client is attached to a process for handling:

import multiprocessing as mp
import logging
import socket
import time

logger = mp.log_to_stderr(logging.DEBUG)

# https://stackoverflow.com/a/8545724
# With modifications for echo server
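def worker(sock):
    # Named sock so the parameter doesn't shadow the socket module.
    # Every worker blocks in accept() on the same shared listening socket.
    while True:
        client, address = sock.accept()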
        data = client.recv(1024)
        logger.debug("{u} connected".format(u=address))
        print(data)
        client.sendall(data)
        client.close()
if __name__ == '__main__':
    num_workers = 20

    serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    serversocket.bind(('',9999))
    serversocket.listen(20)

    workers = [mp.Process(target=worker, args=(serversocket,)) for i in
            range(num_workers)]

    for p in workers:
        p.daemon = True
        p.start()

    while True:
        try:
            time.sleep(10)
        except KeyboardInterrupt:
            break

So this will create 20 worker processes which will be listening for connections to the main server (yes, you can have multiple accept() calls). Looking at the processes:

21 processes shown in Windows task manager

Indeed we see that there are 21 python processes, the main parent process and the 20 worker processes. Now the issue here is that while we've split the workload up each worker process is still bound to finishing the client communication before moving on to the next. What if we could remove some of the barriers in waiting for client communication?

Blocking and Polling via Selector

It turns out that IO has a concept of blocking and non-blocking. Socket communication by default is blocking, meaning you have to wait for work like receiving data from a connection to be done before moving on. To get around this we can set socket communication to non-blocking via socket.setblocking. This means the usual socket methods will return right away. Unfortunately this has two inherent issues with a standard setup:

  • Not sending/receiving all the client data
  • High CPU usage on accept() loops due to continually calling it
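
To make the non-blocking behavior concrete, here is a minimal sketch of what accept() does on a non-blocking listening socket when no client is waiting. The function name try_accept_once is hypothetical, and the socket binds to port 0 on 127.0.0.1 so a free port is used.

```python
import socket


def try_accept_once():
    """Call accept() on a non-blocking listener with no client waiting."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(('127.0.0.1', 0))   # port 0: let the OS pick a free port
        listener.listen()
        listener.setblocking(False)
        try:
            listener.accept()
        except BlockingIOError:
            # Nothing to accept yet; a blocking socket would have waited here
            return 'would block'
        return 'accepted'


print(try_accept_once())   # would block
```

Retrying that call in a tight loop until a client shows up is exactly what drives the CPU usage up.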

To work around this, operating systems provide several calls that report when a connection is ready, and these are supported by the selectors module. By using DefaultSelector, the most efficient implementation for your operating system is chosen. As an example:

import socket
import selectors
import types

from io import BytesIO

host = "localhost"
port = 50007

def accept_wrapper(sock):
    conn, addr = sock.accept()
    print('accepted connection from', addr)
    conn.setblocking(False)
    data = types.SimpleNamespace(addr=addr, inb=b'', outb=BytesIO())
    sel.register(conn, selectors.EVENT_READ, data=data)

def service_connection(key, mask):
    sock = key.fileobj
    data = key.data
    if mask & selectors.EVENT_READ:
        try:
            recv_data = sock.recv(1024)
            if recv_data:
                data.outb.write(recv_data)
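            else:
                # Empty read: the client is done sending, so switch to write mode.
                # Note this must be the sock object, not the socket module.
                sel.modify(sock, selectors.EVENT_WRITE, data=data)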
        except:
            print('closing connection to', data.addr)
            sel.unregister(sock)
            sock.close()

    if mask & selectors.EVENT_WRITE:
        print('writing data to ', data.addr)
        sock.sendall(data.outb.getvalue())
        print('closing connection to', data.addr)
        sel.unregister(sock)
        sock.close()

sel = selectors.DefaultSelector()

lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
lsock.bind((host, port))
lsock.listen()
print('listening on', (host, port))
lsock.setblocking(False)
sel.register(lsock, selectors.EVENT_READ, data=None)

while True:
    events = sel.select(timeout=None)
    for key, mask in events:
        if key.data is None:
            accept_wrapper(key.fileobj)
        else:
            service_connection(key, mask)

Now behind the scenes selector has a few options available as to how it's doing things. In the end though the process is:

  1. Set sockets to non-blocking
  2. Block until a client connects
  3. The accept will immediately return the client socket
  4. Add that socket to the list of sockets to check
  5. Block again, this time the client socket we just accepted is in the list of sockets to be notified about along with the server socket
  6. A socket is ready
  7. Go to 3 if it's the server socket
  8. If it's a client connection, run a handler against it to deal with the data
  9. Mostly loop back to 5, except we're not adding any new sockets

Which is pretty much how the loop goes. The main mechanisms you'll generally deal with are select(), poll(), and epoll(), each of which has its own selector implementation in the selectors module. Using DefaultSelector picks the most efficient one available on your platform. In general, select() is the least performant of the three, and it is limited in how many sockets it can check (commonly 1024 on Linux), though it does work on Windows. poll() is an enhanced version without that fixed limit while still remaining fairly portable. Both select() and poll() essentially keep a list of sockets and walk through the whole list on every call. epoll(), on the other hand, is more reactive: the kernel tracks the registered sockets and reports only the ones that are ready, which lets it handle a large number of sockets more efficiently than select() and poll(). That said, it's only available on Linux, which limits portability (not a huge issue given how easy it is to get a Linux server these days). Handling a large number of connections efficiently is often referred to as the C10k problem (or some variant of k). Looking at the code now:

lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
lsock.bind((host, port))
lsock.listen()
print('listening on', (host, port))
lsock.setblocking(False)
sel.register(lsock, selectors.EVENT_READ, data=None)

Here we have a normal socket bind and listen for the server. The server's socket is set to be non-blocking and registered into the list of sockets we're interested in.

while True:
    events = sel.select(timeout=None)
    for key, mask in events:
        if key.data is None:
            accept_wrapper(key.fileobj)
        else:
            service_connection(key, mask)

Now we have a main event loop. For the server socket the data property is set to None. If this is the case we run the client socket accept handler. Otherwise we're dealing with an existing connection that needs to be handled.

def accept_wrapper(sock):
    conn, addr = sock.accept()
    print('accepted connection from', addr)
    conn.setblocking(False)
    data = types.SimpleNamespace(addr=addr, inb=b'', outb=BytesIO())
    sel.register(conn, selectors.EVENT_READ, data=data)

This will accept our connection and also set it to non-blocking. The next thing it does is set up a SimpleNamespace, which is nicely explained here. It will be attached to the socket's registration as a way to keep state when dealing with it. This allows for interaction between readers and writers. outb is set to a BytesIO object, which performs well for byte concatenation, which we'll be doing to keep track of data read in.
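
As a small illustration of that per-connection state, the sketch below builds the same kind of SimpleNamespace and appends two chunks to its BytesIO buffer. The address and the chunks are made-up sample values.

```python
import types
from io import BytesIO

# Per-connection state: the peer address plus a growable byte buffer
state = types.SimpleNamespace(addr=('127.0.0.1', 50007), outb=BytesIO())

# Each recv() result is appended rather than concatenated into a new bytes object
for chunk in (b'Hello, ', b'world'):
    state.outb.write(chunk)

print(state.addr, state.outb.getvalue())   # ('127.0.0.1', 50007) b'Hello, world'
```

Attributes on the namespace are read and written with plain dot access, which is what lets the read and write branches of the handler share the same buffer.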

def service_connection(key, mask):
    sock = key.fileobj
    data = key.data
    if mask & selectors.EVENT_READ:
        try:
            recv_data = sock.recv(1024)
            if recv_data:
                data.outb.write(recv_data)
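            else:
                # Empty read: the client is done sending, so switch to write mode.
                # Note this must be the sock object, not the socket module.
                sel.modify(sock, selectors.EVENT_WRITE, data=data)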
        except:
            print('closing connection to', data.addr)
            sel.unregister(sock)
            sock.close()

    if mask & selectors.EVENT_WRITE:
        print('writing data to ', data.addr)
        sock.sendall(data.outb.getvalue())
        print('closing connection to', data.addr)
        sel.unregister(sock)
        sock.close()

Now is the interesting part. The code will check if this is a read or write event. By default, the only thing that's being checked is whether a socket is ready for reading. Once recv() returns no more data we need to echo back, so we switch to writing mode. Then on the writing side we simply send back all the data we have gathered, close the socket, and remove it from the list of sockets we're interested in. The epoll() version gives a nice count on requests per second:

Requests per second:    6216.97 [#/sec] (mean)
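
The read branch only moves to write mode when recv() returns an empty bytes object, which happens once the client has stopped sending. Here is a minimal sketch of that handshake from both sides. It uses socket.socketpair() and blocking sockets for brevity instead of the selector loop, and the names echo_until_eof and half_close_demo are hypothetical.

```python
import socket
from io import BytesIO


def echo_until_eof(conn):
    """Read until the peer stops sending, then echo everything back."""
    buffer = BytesIO()
    while True:
        chunk = conn.recv(1024)
        if not chunk:            # b'' means the peer shut down its sending side
            break
        buffer.write(chunk)
    conn.sendall(buffer.getvalue())


def half_close_demo():
    client, server_side = socket.socketpair()
    with client, server_side:
        client.sendall(b'Hello, world')
        client.shutdown(socket.SHUT_WR)   # signal "no more data" but keep reading
        echo_until_eof(server_side)
        return client.recv(1024)


print(half_close_demo())   # b'Hello, world'
```

A client that sends and then waits for a reply without shutting down its write side never produces that empty read, so the server in this design would keep waiting for more data.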

You can force a specific selector by changing DefaultSelector to:

  • SelectSelector ( select() )
  • PollSelector ( poll() )
  • EpollSelector ( epoll() )
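
Since not every selector class exists on every platform (EpollSelector is only defined on Linux, for example), one hedged way to force a selector is to look it up by name and fall back to DefaultSelector. The helper name pick_selector is hypothetical.

```python
import selectors


def pick_selector(preferred='EpollSelector'):
    """Return an instance of the preferred selector, or the default if unavailable."""
    # The selectors module only defines the classes the platform supports
    selector_class = getattr(selectors, preferred, selectors.DefaultSelector)
    return selector_class()


sel = pick_selector()
print(type(sel).__name__)   # shows which implementation was actually chosen
sel.close()
```

Printing the class name is a quick way to confirm which mechanism a benchmark run was actually using.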

Conclusion

I will say that this article to me is mostly about showing different server types. If you're really in need of true performance it might be better to consider a language built for that (such as Go, especially since it has an emphasis on networking) or have dedicated software that deals with all the nuances of network communication. In fact, most of the time you won't need to deal with this much in the modern cloud computing world. Load balancers, containerized microservices, and many managed services handle much of this for you. If you really just want to work with one to test things out, the blocking threaded socketserver is good enough in my opinion. Now that we've seen different types of servers, the next installment will look at a specialized type of server: HTTP.
