DEV Community

Masayoshi Mizutani

Created a Go library to acquire and manage blacklists of IP addresses and domain names

As the title suggests, I have created a library called badman that obtains blacklists published on various sites so that they can be compared with traffic logs and similar data. badman stands for Blacklisted Address and Domain name Manager.

https://github.com/m-mizutani/badman

Until a while ago, if you wanted to monitor your network security, you would typically use a network-based IDS such as Snort. In recent years, however, most communication has moved to HTTPS, so network-based IDSs are no longer very effective. One alternative is to store network flow information (IP addresses, port numbers, protocol, amount of data exchanged, etc.) together with DNS query and response logs, and then check whether any communication with a malware command-and-control (C&C) server or a fraudulent site has occurred. With this method, you can detect suspicious communication even when web traffic is encrypted.

However, downloading the list from the blacklist provider every time you check your continuously generated logs puts a load on both the provider and your network. I therefore created this library so that blacklists can be downloaded once and then reused over the medium to long term.
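To make the matching idea concrete, here is a minimal sketch in plain Go that checks flow records and DNS query/response records against an in-memory blacklist. It does not use badman; the `FlowLog`, `DNSLog`, and `Blacklist` types and their field names are hypothetical stand-ins for whatever your log pipeline actually produces.

```go
package main

import (
	"fmt"
	"strings"
)

// FlowLog is a hypothetical, simplified network flow record
// (field names are illustrative, not taken from badman).
type FlowLog struct {
	SrcAddr, DstAddr string
	DstPort          int
	Protocol         string
	Bytes            int64
}

// DNSLog is a hypothetical record of a DNS query and its answers.
type DNSLog struct {
	QueryName string
	Answers   []string // resolved IP addresses
}

// Blacklist maps a blacklisted IP address or (lowercase) domain name
// to the reason it was listed.
type Blacklist map[string]string

// normalizeName lowercases a domain name and strips a trailing dot.
func normalizeName(name string) string {
	return strings.TrimSuffix(strings.ToLower(name), ".")
}

// CheckFlow reports the blacklisted addresses seen in a flow record.
func (bl Blacklist) CheckFlow(f FlowLog) []string {
	var hits []string
	for _, addr := range []string{f.SrcAddr, f.DstAddr} {
		if reason, ok := bl[addr]; ok {
			hits = append(hits, fmt.Sprintf("%s (%s)", addr, reason))
		}
	}
	return hits
}

// CheckDNS reports a blacklisted query name and any blacklisted answers,
// so a hit is found even if the traffic itself was later encrypted.
func (bl Blacklist) CheckDNS(d DNSLog) []string {
	var hits []string
	candidates := append([]string{normalizeName(d.QueryName)}, d.Answers...)
	for _, c := range candidates {
		if reason, ok := bl[c]; ok {
			hits = append(hits, fmt.Sprintf("%s (%s)", c, reason))
		}
	}
	return hits
}

func main() {
	bl := Blacklist{
		"198.51.100.7":     "C&C server",
		"evil.example.com": "phishing",
	}
	fmt.Println(bl.CheckFlow(FlowLog{SrcAddr: "10.0.0.5", DstAddr: "198.51.100.7", DstPort: 443, Protocol: "tcp"}))
	fmt.Println(bl.CheckDNS(DNSLog{QueryName: "Evil.Example.com.", Answers: []string{"203.0.113.9"}}))
}
```

Checking the DNS query name as well as the flow addresses matters because a malicious domain may resolve to an address that is not itself listed.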

I implemented it as a library because the way logs are ingested, and what they contain, is expected to differ greatly from one environment to another. Specific architecture examples are described later.

How to use

If you are familiar with Go, the following code should give you a good idea of how it works.

```go
package main

import (
    "bufio"
    "log"
    "os"

    "github.com/m-mizutani/badman"
    "github.com/m-mizutani/badman/source"
)

func main() {
    man := badman.New()

    // Download blacklists from the default set of providers
    if err := man.Download(source.DefaultSet); err != nil {
        log.Fatal("Failed to download: ", err)
    }

    // ipaddrs_in_traffic_logs.txt contains one IP address per line
    fd, err := os.Open("ipaddrs_in_traffic_logs.txt")
    if err != nil {
        log.Fatal("Failed to open a file: ", err)
    }
    defer fd.Close()

    scanner := bufio.NewScanner(fd)
    for scanner.Scan() {
        // Look up each address in the downloaded blacklists
        entities, err := man.Lookup(scanner.Text())
        if err != nil {
            log.Fatal("Failed to look up: ", err)
        }

        if len(entities) > 0 {
            log.Printf("Matched %s in %s list (reason: %s)\n",
                entities[0].Name, entities[0].Src, entities[0].Reason)
        }
    }

    // Report read errors that ended the loop early
    if err := scanner.Err(); err != nil {
        log.Fatal("Failed to read the file: ", err)
    }
}
```

Here's what this code does:

  1. Download multiple lists from sites offering blacklists and build your own blacklist repository
  2. Open ipaddrs_in_traffic_logs.txt, which contains a list of IP addresses, and read it one line (one IP address) at a time
  3. Check whether each IP address is included in the repository

In this sample, the blacklist download in step 1 runs every time, but the intended usage is to save the downloaded blacklist data locally once and reuse it. There are two ways to save it: A) write the serialized data to a file and read that file on subsequent runs, or B) use a persistent data store as the backend (currently only AWS DynamoDB is supported). See the README in the repository for detailed usage.

Architecture examples

Assuming deployment on AWS, here are two example architectures. In both, you write and run your own program that uses badman as a library.

Serverless architecture

[Figure: Serverless-based architecture]

In this case, we use two Lambda functions. The first (left) function periodically fetches the blacklists and stores the serialized blacklist data in S3. The second (right) function is invoked by the S3 ObjectCreated event when a traffic log file is uploaded to S3. It downloads both the serialized blacklist data and the log file, and checks whether any IP address in the traffic log is on the blacklist. If one is, the function notifies the administrator via a communication tool such as Slack.

The disadvantage of this architecture is latency. Because logs are batched to some extent before being uploaded to S3, processing is delayed by the buffering time, which depends on the log volume. As a rule of thumb, the delay ranges from a few minutes to a dozen or so minutes. If you need real-time processing, consider the server-based model described next.

Server based architecture

[Figure: Server-based architecture]

In this architecture, a long-running program runs on a host (AWS EC2 in this example). The main advantage is near-real-time processing through stream processing. As mentioned earlier, the serverless model has to batch logs before uploading them to S3, so several minutes pass between the generation of a log and the completion of its processing. A constantly running server program can instead process and flush data continuously as it arrives, minimizing that delay.

Here the program is assumed to act as a receiver for fluentd's forward protocol (see the specification at https://github.com/fluent/fluentd/wiki/Forward-Protocol-Specification-v1) and to receive traffic logs via fluentd. It then uses badman to check the traffic logs for blacklisted IP addresses. The blacklists are expected to be updated regularly. DynamoDB is used as the repository for the blacklist data so that it can be recovered if the host running the program (EC2 in this case) crashes.

The disadvantage of this architecture is that resource allocation is difficult. Because log volume is likely to be the bottleneck, you need to provision the host for the peak volume. You should also be aware that log traffic can increase significantly because of incidents or unusual events. To handle this, the architecture needs to be combined with a mechanism such as auto scaling.

Precautions for use

Each site that offers a blacklist has its own usage policy, so depending on your environment or organization, you may not meet a site's terms of use. To restrict which blacklist sites are used, you can choose the sources yourself. You can also import data from your own site or a private data store by implementing your own type that satisfies the interface provided by badman.

The policy of each site is also summarized in the README; please refer to it.
