Paweł Piwosz

Posted on Nov 14, 2022

SBOM with ScanCode.io

#sbom #cybersecurity #compliance #process

So, we know what an SBOM is and why generating one is becoming more and more important. It is time to look at some tools which can help us create and validate the reports.

I don't have very deep experience with SBOM tools. However, from what I have heard and learned so far, this area is still not fully covered. Standards and approaches aside, if we want an operational pipeline with all its elements, we need to play with the existing tools a little more than we would like.

The first tool we discuss here is ScanCode.io.

The tool

What is ScanCode?

Simply speaking, it is a scanner and code analyser which lets us scan a codebase for origins and licenses. In general, ScanCode collects information about components and their licenses during the Software Composition Analysis (SCA) process.

Installation

ScanCode has a few installation options; here we cover the container version. When we clone the repository

$ git clone https://github.com/nexB/scancode.io.git

we should create the .env file, using the prepared Makefile

$ make envfile

This will create the file and the secret. In my case it looks like this

$ cat .env
SECRET_KEY="Rxg+cZJQDOdinwXAPc/D2d2QyEODpl5xz4NJp5f/aSSDmf106a"
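If make is not available, a similar file can be produced by hand. The sketch below is only an illustration: it assumes the key is nothing more than a random 50-character string from the base64 alphabet, which is what the sample above looks like. The function names are mine, not part of ScanCode.io.

```python
import base64
import secrets


def generate_secret_key(length: int = 50) -> str:
    """Return `length` random characters from the base64 alphabet."""
    # base64 output is always longer than its input, and padding ('=')
    # sits only at the very end, so the slice never contains it.
    raw = secrets.token_bytes(length)
    return base64.b64encode(raw).decode("ascii")[:length]


def render_env_line(key: str) -> str:
    """Format the key the same way the .env file above shows it."""
    return f'SECRET_KEY="{key}"'


# Print the line; redirect it to .env if you want to use it.
print(render_env_line(generate_secret_key()))
```

The secrets module is used instead of random because the value is meant to stay secret.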

We are ready to build and run the tool.

In fact, we have a docker-compose template at our disposal. This template contains a few elements

db (postgres)
redis
web (app)
worker
nginx

Obviously, Nginx is our entry point. Behind it, the app works with the workers (where the actual scans are executed). At the end we have Redis and PostgreSQL to keep the data.

Build and run

For all of us who know Docker, this part is simple and clear. First, we need to build the containers

$ docker-compose build

In fact, only the app container is built in this step. However, the build process takes a lot of time, and the image is quite large

$ docker images
REPOSITORY          TAG             IMAGE ID       CREATED          SIZE
scancodeio_web      latest          a6a6380ef6f8   34 seconds ago   2.31GB
scancodeio_worker   latest          a6a6380ef6f8   34 seconds ago   2.31GB

When the build is finished, we are ready to run our stack

$ docker-compose up -d

And here we have an issue (well, maybe "issue" is too big a word). Compose exposes ports 80 and 443 for the Nginx service, but the Nginx server is configured for port 80 only.

GUI

When the stack starts, we can open the GUI console in the browser by entering http://localhost:80.

The console is simplistic but nice, clear and comfortable to work with.

Setup the project

It is time to set up our first project. To begin with, we will set up a simple scan. Many of us use the python container image, correct? Let's see what we can learn about python:latest!

After we click the "New project" button, we can select the project type on the right side.

These are predefined. We can create our own too.

As we wish to scan a Docker image, we select docker, and now it is time to configure the project. The configuration is very simple.

I entered three values:

Project's name
URL: docker://python:latest (this will connect to Docker Hub and fetch the proper image)
Pipeline: docker in this case.

And that's it!

Let's click Create.

Processing

On the main screen we can observe the progress of the project's execution. In my case I had to refresh the view manually, but that is not an issue.

As this task took a lot of time, I created a second project, this time for Python based on Alpine Linux. This execution was queued. Can we run these tasks simultaneously? Well, yes, but it requires an explicit configuration setting.

The first run took more than 1 hour on my machine. That is a lot.

So here we see a downside of this process - it can be very time-consuming. Therefore, if we want to make it part of a CI/CD pipeline, we need to be aware of the time the execution may need.
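One way to keep a long scan from blocking a pipeline forever is to give it an explicit time budget. Here is a minimal sketch of that idea. It is not ScanCode.io code: get_status stands for whatever call your pipeline uses to ask for the scan state, and the status labels "success" and "failure" are assumptions made for the example.

```python
import time
from typing import Callable


def wait_for_scan(
    get_status: Callable[[], str],
    timeout_s: float,
    poll_interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll `get_status` until it reports a final state or the budget runs out.

    Returns the final status, or "timeout" if the deadline passed first.
    """
    final_states = {"success", "failure"}  # hypothetical status labels
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in final_states:
            return status
        if clock() >= deadline:
            return "timeout"
        sleep(poll_interval_s)


# Usage with a fake status source that finishes on the third poll.
_fake_statuses = iter(["queued", "running", "success"])
print(wait_for_scan(lambda: next(_fake_statuses), timeout_s=5, poll_interval_s=0))
```

With a run of more than one hour, as in my case, the budget has to be generous, or the scan should live in a separate, asynchronous stage.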

Report

Now the juice. When I generated the report, I started to dig deeper and look around, and after a long time (and I mean it!) I realised "Hey! You write an article. Write the article, then!" Reports generated by ScanCode are simply great.

Ok, let's navigate through them, shall we?

Scan's summary

First, let's click on the green Success button in the row where our scan is. This report shows some, let's say, meta information about the scan process.

We see the status of the run, info about the task, execution time, dates, resources. Quite a useful summary.

Here we see more details about the execution itself: what steps were performed, how long these steps took, etc.

Scan's details

And here is the core of our report (it is not an SBOM yet!).

At the top of the screen we can see the UUID and work directory information. Below we have some numbers about the execution and buttons to download the report in different formats. After that we have information about the input artifact, in this case python:latest. Project data shows a lot of information about the Docker image, with layers, descriptions, commands etc. Very useful.

The next section shows a lot of visualised data

Information about packages.

Dependencies information. Here we can learn everything about dependencies discovered during the scan.

Finally, codebase resources. As the picture above presents, multiple scopes are available for us to analyze.

Let's go into some details now.

In codebase resources, in the HOLDER category, we see some of the holders, and there is Mr. Vinay Sajip. Let's see what his contribution is here.

Hover over the proper element and click

Here we have details about every finding.

Now things get even more interesting. Click on any Path element and...

We go into the file! Let's find out information about licenses. Click Licenses in the Detected values list

Ok, let's look at something else now!

Select Other in PACKAGE LICENSE EXPRESSION

We can check every individual package and learn what license type it uses.

Another example: go to PACKAGE TYPE and click pypi

As we can see, the information detected by ScanCode.io is very detailed. We took one of the most popular images out there and we are able to break it down into its smallest elements.

Download data

Finally, we can download reports in different formats.

Click one of the buttons from the picture below

And it will be simply downloaded :)

The report can be downloaded as a JSON or Excel file. Two more options format the report according to a standard - one for SPDX (Software Package Data Exchange) and the second for CycloneDX (and these are our SBOMs).
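Once downloaded, the CycloneDX file is plain JSON, so it is easy to process further. The sketch below lists components with their licenses. The sample document is a tiny hand-written stand-in, not real ScanCode.io output, and it relies only on the basic CycloneDX JSON layout (a components list, where each entry may carry a licenses list holding either a license object or an expression).

```python
import json

# Tiny hand-written stand-in for a downloaded CycloneDX JSON report.
SAMPLE_BOM = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.4",
  "components": [
    {"type": "library", "name": "example-lib", "version": "1.0.0",
     "licenses": [{"license": {"id": "MIT"}}]},
    {"type": "library", "name": "other-lib", "version": "2.3.1",
     "licenses": [{"expression": "Apache-2.0 OR MIT"}]},
    {"type": "library", "name": "no-license-lib", "version": "0.1"}
  ]
}
"""


def component_licenses(bom: dict) -> list[tuple[str, str, str]]:
    """Return (name, version, license) for every component in the BOM."""
    rows = []
    for component in bom.get("components", []):
        found = []
        for entry in component.get("licenses", []):
            if "expression" in entry:
                found.append(entry["expression"])
            else:
                lic = entry.get("license", {})
                found.append(lic.get("id") or lic.get("name") or "unknown")
        rows.append((
            component.get("name", "?"),
            component.get("version", "?"),
            ", ".join(found) or "none declared",
        ))
    return rows


for name, version, licence in component_licenses(json.loads(SAMPLE_BOM)):
    print(f"{name} {version}: {licence}")
```

To try it on a real report, replace SAMPLE_BOM with the content of the downloaded file.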

We have scanned Docker images so far. I also ran a test on a code bundle. I have a very old Python Alexa Skill script for AWS Lambda. It contains a few dependencies; let's take a look at the requirements.txt file

ask-sdk-core==1.9.0
Pillow

That's it. In the function code I import a few libraries

import logging
import json
import requests

from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.dispatch_components import AbstractExceptionHandler
from ask_sdk_core.utils import is_request_type, is_intent_name
from ask_sdk_core.handler_input import HandlerInput

from ask_sdk_model.ui import SimpleCard
from ask_sdk_model import Response
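Note that the requirements.txt above pins one dependency to an exact version and leaves the other one open, so the content of the bundle depends on when it was built. A small sketch, assuming simple one-requirement-per-line files like mine, that splits the entries into pinned and unpinned (the function name is made up for this example):

```python
import re

REQUIREMENTS_TXT = """\
ask-sdk-core==1.9.0
Pillow
"""


def split_pinned(requirements: str) -> tuple[dict[str, str], list[str]]:
    """Return ({name: version} for '==' pins, [names without an exact pin])."""
    pinned: dict[str, str] = {}
    unpinned: list[str] = []
    for line in requirements.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = re.fullmatch(r"([A-Za-z0-9._-]+)\s*==\s*([^\s;]+)", line)
        if match:
            pinned[match.group(1)] = match.group(2)
        else:
            # Anything else (bare name, >=, ~=, extras, ...) counts as
            # unpinned; keep only the leading package name when there is one.
            name = re.match(r"[A-Za-z0-9._-]+", line)
            unpinned.append(name.group(0) if name else line)
    return pinned, unpinned


pinned, unpinned = split_pinned(REQUIREMENTS_TXT)
print("pinned:", pinned)
print("unpinned:", unpinned)
```

For an unpinned entry, the version that actually landed in the bundle is known only from the scan.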

I created a bundle and scanned it with these settings

So, I put the zip directly into the GUI. Of course, I could send it to my artefact storage and scan it from there (similar to what we did with Docker images).

After the scan is finished, I have a very interesting report

Let's take a look at the dependencies

And packages

So, this was a quick review of ScanCode.io. The tool is very easy to use, very easy to maintain and, what is very important for teams, very easy to start with.

As this operation - creating an SBOM - might be obligatory very soon, it is a good idea to start preparing ourselves for it.

However, there is one thing which I wasn't able to run successfully: the vulnerability scan.

But we will handle it in the next episode.

It is worth mentioning that ScanCode provides an API. It means that it can serve in security pipelines and provide its functionality on demand, without the delays needed for provisioning. The API gives the flexibility and scalability needed to act as an important tool in the organization's governance and compliance.
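To give a feeling of how this could look, here is a rough sketch that prepares a request creating the same project as we created in the GUI. Treat the details as assumptions to verify against the ScanCode.io REST API documentation for your version: the /api/projects/ endpoint and the name, input_urls, pipeline and execute_now fields are how I understand that documentation, and the host is simply the local stack from this article. The code only builds the request; sending it is left commented out.

```python
import json
import urllib.request

# Assumption: the stack from this article is listening on localhost:80.
API_URL = "http://localhost/api/projects/"


def build_create_project_request(
    name: str, input_url: str, pipeline: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates and starts a project."""
    payload = {
        "name": name,
        "input_urls": input_url,  # assumed field name, check the API docs
        "pipeline": pipeline,     # "docker" is the pipeline used in the GUI above
        "execute_now": True,      # assumed flag to start the run immediately
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_project_request(
    "python-latest", "docker://python:latest", "docker"
)
print(request.get_method(), request.full_url)

# To actually send it against a running instance:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```

A CI job could send such a request after every image build and collect the report when the run finishes.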

Cover image by Suzy from Pixabay
