DEV Community

Enmanuel Medina
Enmanuel Medina

Posted on • Updated on

IPv4 prefix lookup with binary search (Python)

This time, I want to show you how to perform an IPv4 prefix lookup using the binary search algorithm.

Motivation

diagram
Imagine that you are the network developer of a company and your boss requires you to write a script for validating if an IPv4 segment is inserted in the company's Edge Firewall.

The company has a FortiGate as an edge firewall, below is shown an extract of the target configuration.

!
! OUTPUT OMITTED FOR SIMPLICITY
!
config firewall addrgrp
    edit "Trusted_Clients"
        set member "200.30.40.0/23" "200.70.40.0/27" "100.30.40.0/28" "3.23.130.0/29" "5.53.20.0/22" "200.30.40.0/26"
    next
end
config firewall policy
    edit 1
        set srcintf "port1"
        set dstintf "port2"
        set srcaddr "Trusted_Clients"
        set dstaddr "Internal_WEB_SERVER"
        set action accept
        set status enable
        set schedule "always"
        set service "HTTPS"
    next
end
!
Enter fullscreen mode Exit fullscreen mode

Collecting data

In order to perform the task, you have to gather addresses configured in the trusted client's address group from the FortiGate device for further analysis.

For performing this action, you could use the fortiosapi module for retrieving address information from the FortiGate. Below is a script for consulting the information of a specific address group.

from fortiosapi import FortiOSAPI
from pprint import pprint
from ipaddress import IPv4Network
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

credentials = {
    'host': '192.168.149.110', #Fortigate Management IP Address
    'username': 'admin',
    'password': 'admin',
    'verify': False
}
fgt = FortiOSAPI()
fgt.https('off')
fgt.login(**credentials)
fortiResponse = fgt.get('firewall', 'addrgrp',
                 parameters='filter=name==Trusted_Clients')
pprint(fortiResponse)
Enter fullscreen mode Exit fullscreen mode

If you execute the above script, you will get the following information from the Fortigate.

{'build': 163,
 'http_method': 'GET',
 'http_status': 200,
 'name': 'addrgrp',
 'path': 'firewall',
 'results': [{'allow-routing': 'disable',
              'color': 0,
              'comment': '',
              'member': [{'name': '200.30.40.0/23',
                          'q_origin_key': '200.30.40.0/23'},
                         {'name': '200.70.40.0/27',
                          'q_origin_key': '200.70.40.0/27'},
                         {'name': '100.30.40.0/28',
                          'q_origin_key': '100.30.40.0/28'},
                         {'name': '3.23.130.0/29',
                          'q_origin_key': '3.23.130.0/29'},
                         {'name': '5.53.200.0/22',
                          'q_origin_key': '5.53.200.0/22'},
                         {'name': '200.30.40.0/26',
                          'q_origin_key': '200.30.40.0/26'}],
              'name': 'Trusted_Clients',
              'q_origin_key': 'Trusted_Clients',
              'tagging': [],
              'uuid': '8c410732-89b9-51eb-d2df-440818fc2646',
              'visibility': 'enable'}],
 'revision': '9.0.81.10657368498858893048.1616271400',
 'serial': '-',
 'status': 'success',
 'vdom': 'root',
 'version': 'v6.0.2'}
Enter fullscreen mode Exit fullscreen mode

Now you have to gather the IPv4 ranges from the retrieved FortiGate's address group. For making more simple the parsing process, assume that the name of each address object within the Trusted_Clients's address group is a valid IPv4 Network. For making this possible, add the following lines to the script:

rawAddrGrpInfo = [x['name'] for x in fortiResponse['results'][0]['member']]
print(rawAddrGrpInfo)
Enter fullscreen mode Exit fullscreen mode

With these changes, you have saved all IP address's member of the Trusted_Clients's address group in an array.

['200.30.40.0/23', '200.70.40.0/27', '100.30.40.0/28', '3.23.130.0/29', '5.53.200.0/22', '200.30.40.0/26']
Enter fullscreen mode Exit fullscreen mode

However, if you notice the data type of each IPv4 network, it's a string. This might not be useful to you for comparing other IPv4 addresses with this list. For making more useful these information, let's use the ipaddress's library that lets you convert a string into an IPv4Network object.

from ipaddress import IPv4Network
addrgGroupInfo = [IPv4Network(x) for x in rawAddrGrpInfo]
Enter fullscreen mode Exit fullscreen mode

Hey wait, what is the ipaddress module for?

As is stated its Python3's docs, ipaddress provides the capabilities to create, manipulate and operate on IPv4 and IPv6 addresses and networks. This module is available from Python3.3 and above.

This module provides the capability to check if an IPv4 network is a subnetwork of a greater IPv4 network, let's check this with an example:

Imagine that you have two IPv4 networks, the 192.168.1.64/27 and 192.168.1.0/25. You could validate if 192.168.1.64/27 is inside the 192.168.1.0/25 by performing the VLSM of the 192.168.1.0/25 to /27, below is shown the table with this process. (If you have doubt with this procedure please watch this tutorial).

IP address network mask
192.168.1.0 27
192.168.1.32 27
192.168.1.64 27
192.168.1.96 27

As you could evaluate in the table, the whole address 192.168.1.64/27 is inside the master block address 192.168.1.0/25.

In python, you could validate this by using the subnet_of method of the IPv4Network Object, below is shown an example:

>>> IP_A = IPv4Network('192.168.1.0/25')
>>> IP_B = IPv4Network('192.168.1.64/27')
>>> IP_B.subnet_of(IP_A)
 True
Enter fullscreen mode Exit fullscreen mode

In addition to this method, you could use operators like greater than > and less than < for checking if IPv4Network is greater or less to another.

>>> IP_A = IPv4Network('192.168.0.0/24')
>>> IP_B = IPv4Network('172.16.0.0/24')
>>> IP_A > IP_B
 True
>>> IP_A < IP_B
 False
Enter fullscreen mode Exit fullscreen mode

Need of binary search

At this point, you have to check if an IPv4 Network is included in the Edge_FW's address group, for performing this action the first approach would be to iterate through all the IPv4 Network of the addrgGroupInfo list, checking if the desired IPv4 Network is a subnet of the i-th element in the addrgGroupInfo.

from ipaddress import IPv4Network

rawAddrGrpInfo = ['200.30.40.0/23', '200.70.40.0/27',
                  '100.30.40.0/28', '3.23.130.0/29', '5.53.200.0/22', '200.30.40.0/26']
addrgGroupInfo = [IPv4Network(x) for x in rawAddrGrpInfo]
desiredAddrs = [
    IPv4Network('5.53.201.34/31'),
    IPv4Network('200.70.40.64/28'),
    IPv4Network('200.30.40.8/29')]
for dAddr in desiredAddrs:
    found = False
    for addr in addrgGroupInfo:
        if dAddr.subnet_of(addr):
            found = True
            break
    if found:
        print(f"{dAddr} is included in the address group, because is subnet of {dAddr}.")
    else:
        print(f"{dAddr} is not included in the address group.")

# output:
# 5.53.201.34/31 is included in the address group, because is subnet of 5.53.200.0/22.
# 200.70.40.64/28 is not included in the address group.
# 200.30.40.8/29 is included in the address group, because is subnet of 200.30.40.0/23.
Enter fullscreen mode Exit fullscreen mode

The above code will do the job, hence as the number of elements in the desiredAddrs and addrgGroupInfo lists increase, the runtime of the script will increase exponentially. That is because the big O notation of linear search for each q queries element in the worst of the cases is O(q * n).

For reducing the runtime complexity, you have to use a better approach, like binary search algorithm that is a search algorithm that finds the position of a target value within a sorted array by using the divide and conquer approach. Its big O notation in the worst cases is O(log(n)) and the big O notation for each q queries element is O(q * log(n)).

Comparasion of binary and linear search by realpython

The image was taken from realpython.com

Below is the implementation of binary search for looking up IPv4 Networks. Take into consideration that this implementation uses the subnet_of method of the IPv4Network object for checking if the desired IPv4Network is a subnet of the middle element itself.

def ipv4binary_search(arr, value):
    """
    :param list(IPv4Network) arr: the list of sorted IPv4Network that must be looked up.
    :param IPv4Network value: the desired IPv4Network to search. 

    :return int: return true if the desired IPv4Network was found as an subnet of any IPv4Network in the arr list
    """
    def binary_search(arr, low, high, x):

        # Check base case
        if high >= low:

            mid = (high + low) // 2

            # If element is present as subnet of the middle IPv4network itself
            if x.subnet_of(arr[mid]):
                return mid

            # If element is smaller than mid, then it can only
            # be present in left subarray
            elif arr[mid] > x:
                return binary_search(arr, low, mid - 1, x)

            # Else the element can only be present in right subarray
            else:
                return binary_search(arr, mid + 1, high, x)

        else:
            # Element is not present in the array
            return -1
    return binary_search(arr, 0, len(arr) - 1, value) != -1

Enter fullscreen mode Exit fullscreen mode

Now let's solve the requirement using the above function:

from ipaddress import IPv4Network

rawAddrGrpInfo = ['200.30.40.0/23', '200.70.40.0/27',
                  '100.30.40.0/28', '3.23.130.0/29', '5.53.200.0/22', '200.30.40.0/26']
addrgGroupInfo = [IPv4Network(x) for x in rawAddrGrpInfo]
addrgGroupInfo.sort()
desiredAddrs = [
    IPv4Network('5.53.201.34/31'),
    IPv4Network('200.70.40.64/28'),
    IPv4Network('200.30.40.8/29')]
for dAddr in desiredAddrs:
    found = ipv4binary_search(addrgGroupInfo, dAddr)
    if found:
        print(f"{dAddr} is included in the address group, because is subnet of {dAddr}.")
    else:
        print(f"{dAddr} is not included in the address group.")
# output:
# 5.53.201.34/31 is included in the address group, because is subnet of 5.53.200.0/22.
# 200.70.40.64/28 is not included in the address group.
# 200.30.40.8/29 is included in the address group, because is subnet of 200.30.40.0/23.
Enter fullscreen mode Exit fullscreen mode

Runtime Comparison

For comparing the runtime of each approach, let's use the timeit library for measuring the execution time.

import timeit
from ipaddress import IPv4Network

def with_binary(desiredAddrs, addrgGroupInfo):
    for dAddr in desiredAddrs:
        found = ipv4binary_search(addrgGroupInfo, dAddr)


def with_linear(desiredAddrs, addrgGroupInfo):

    for dAddr in desiredAddrs:
        found = False
        for addr in addrgGroupInfo:
            if dAddr.subnet_of(addr):
                found = True
                break


rawAddrGrpInfo = ['37.64.125.0/24', '254.6.178.0/24', '140.59.154.0/24', '216.172.137.0/24',
                  '165.108.169.0/24', '20.82.220.0/24', '35.66.143.0/24', '10.36.100.0/24',
                  '32.57.6.0/24', '200.2.153.0/24', '25.174.209.0/24', '21.228.19.0/24',
                  '215.229.157.0/24', '133.146.52.0/24', '108.212.12.0/24', '168.228.72.0/24',
                  '228.106.212.0/24', '194.75.185.0/24', '189.102.231.0/24', '237.107.131.0/24',
                  '200.30.40.0/23', '200.70.40.0/27', '100.30.40.0/28', '3.23.130.0/29',
                  '5.53.200.0/22', '200.30.40.0/26']
addrgGroupInfo = [IPv4Network(x) for x in rawAddrGrpInfo]
addrgGroupInfo.sort()
temp = [
    IPv4Network('5.53.201.34/31'),
    IPv4Network('200.70.40.64/28'),
    IPv4Network('200.30.40.8/29')]
desiredAddrs = list()
for _ in range(1000):
    desiredAddrs.extend(temp)


print(f"Linear search approach", timeit.timeit(
    lambda: with_linear(desiredAddrs, addrgGroupInfo),  number=200), "sec")
print(f"Binary Search approach", timeit.timeit(
    lambda: with_binary(desiredAddrs, addrgGroupInfo),  number=200), "sec")

# output:
# Linear search approach 8.6255804 sec
# Binary Search approach 4.6382135 sec
# 
Enter fullscreen mode Exit fullscreen mode

As shown above, as the number of elements is increased, the runtime of the binary search will be better than the runtime of the linear search.

Wrapping up

For the requirement presented at the beginning of the post, in production networks, you may find a policy with thousands of IP prefix entries not just for allowing or denying access, instead, it may be found in BGP policies, SPAM filters, routing redistributions, and so on. As a network developer, for those scenarios, if you have to validate the existance of a list of IPv4 networks within another given list in your application, you might consider implementing a binary search to solve the problem.

Latest comments (0)