Policy Automation in Open Source Software
One of the biggest challenges in application security is the sheer amount of data that must be reasoned about. Nowhere is this more true than in triaging issues that stem from open source packages: teams can easily be inundated with issues and findings.
One of Phylum's key design goals is to help address this problem: give users the tools they need to proactively reduce the number of false positives they face, with better automation from start to finish. These tools are the building blocks of automation and, in turn, allow better integration into the overall development ecosystem.
Starting with Policy Management
While Phylum provides tools today to help proactively manage policy, they may not be as granular or robust as some organizations would desire. To that end, we will explore the development of a custom policy framework to enable automated triage of issues, as well as proactive screening of major engineering hygiene and compliance problems.
We will start here with a slightly more in-depth, custom version of the existing NPM shim extension - a tool that enforces default project policy when installing NPM packages. This custom extension will do some additional custom validation before allowing the installation process to continue.
First, per the API documentation, we will start with the manifest file for our new extension, PhylumExt.toml:
name = "npm-policy"
description = "Example custom policy extension"
entry_point = "main.ts"
[permissions]
read = ["./package.json", "./package-lock.json"]
write = ["./package.json", "./package-lock.json"]
run = ["npm"]
The Beginning of the Extension
Now we can start working on the custom policy extension itself. We will begin by creating a basic skeleton in main.ts that essentially emulates the behavior of the linked shim extension. In the current implementation of the extension API, the first two command line arguments (the binary path and the first value, which would usually be contained in arg[0] and arg[1]) are consumed before the extension itself is invoked. This means that once our extension is installed under the registered name npm-policy, invoking the CLI with phylum npm-policy install packageX results in Deno.args[0] containing install and Deno.args[1] containing packageX.
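As a minimal sketch of that argument layout, the hypothetical helper below (splitArgs and ParsedArgs are illustrative names, not part of the Phylum API) separates the npm subcommand from the remaining arguments. It takes a plain array so it can be exercised outside the extension runtime; inside the extension, Deno.args would be passed in.

```typescript
// Hypothetical helper, for illustration only: split the arguments the
// extension receives into the npm subcommand and everything after it.
interface ParsedArgs {
  subcmd: string | null;
  rest: string[];
}

function splitArgs(argv: string[]): ParsedArgs {
  const [subcmd, ...rest] = argv;
  // An empty argument list yields a null subcommand.
  return { subcmd: subcmd ?? null, rest };
}

// For `phylum npm-policy install packageX`, the extension would see
// ['install', 'packageX'], as described above.
const parsed = splitArgs(['install', 'packageX']);
console.log(parsed.subcmd, parsed.rest);
```

Keeping the parsing in a pure function like this makes it straightforward to check the behavior without installing the extension.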
Starting with a rough analog to the npm policy shim, we get something similar to the following:
import { PhylumApi } from 'phylum';
const LOCKFILE_NAME = "./package-lock.json";
const PACKAGEFILE_NAME = "./package.json";
type NullableString = string | null;
// This method will return a promise containing a null value if the file is
// missing, and the containing text otherwise.
async function readTextFileIfExists(path: string): Promise<NullableString> {
  try {
    // We will attempt to read the contents of an existing
    // lockfile, if one exists. This will let us restore
    // the data after we perform our analysis.
    return await Deno.readTextFile(path);
  } catch(e) {
    // Pass - we will just know that we won't need to _restore_ the lockfile after
    // the operation is complete if we have no file content.
    return null;
  }
}
// Method to attempt: generating a new lockfile, parsing said lockfile, and
// performing analysis on the contained package information.
async function preAnalyzeNpm(subcmd: string, args: string[]): Promise<Object> {
  try {
    // Attempt to leverage npm to generate a new lockfile based on the
    // provided arguments
    await Deno.run({
      cmd: ['npm', subcmd, '--package-lock-only', ...args],
      stdout: 'piped',
      stderr: 'piped',
    }).status();
    // Attempt to parse the new lockfile - if that fails, we will return
    // an error.
    const lockFile = await PhylumApi.parseLockFile(LOCKFILE_NAME, 'npm');
    if(!lockFile.packages.length)
      return {pass: false, message: 'error: no packages found in lockfile'};
    // Analyze the underlying packages, and get back the job information
    const jobId = await PhylumApi.analyze('npm', lockFile.packages);
    const results = await PhylumApi.getJobStatus(jobId);
    // Give back the results
    return results;
  } catch(e) {
    // An error occurred along the way - likely with command execution or similar
    return {pass: false, message: `error: analysis attempt failed! ${e}`};
  }
}
async function runNpmAnalysis(): Promise<Object> {
  // First, we will capture the existing text within both the
  // package and lockfiles. Both reads must be awaited, otherwise
  // we would be checking promises rather than file contents.
  const packageList = await readTextFileIfExists(PACKAGEFILE_NAME);
  const lockfile = await readTextFileIfExists(LOCKFILE_NAME);
  // We can't _really_ proceed without a package.json.
  if(!packageList)
    return {pass: false, message: "package.json is missing"};
  // Attempt the analysis, waiting for it to finish before restoring any files
  const data = await preAnalyzeNpm(Deno.args[0], Deno.args.slice(1));
  try {
    // We now need to restore the lockfile and package.json,
    // since we don't know if we want to allow the installation to
    // proceed just yet.
    await Deno.writeTextFile(PACKAGEFILE_NAME, packageList);
    if(lockfile)
      await Deno.writeTextFile(LOCKFILE_NAME, lockfile);
    else
      await Deno.remove(LOCKFILE_NAME);
  } catch(e) {
    console.error(`Error: failed to restore either package or lockfile! ${e}`);
  }
  // Give back the data to the user.
  return data;
}
// 'isntall' and 'udpate' are not typos here: npm itself accepts them as
// aliases for install and update, so we need to intercept them as well.
const args = new Set(['install', 'isntall', 'update', 'udpate']);
if(Deno.args.length >= 1 && args.has(Deno.args[0])) {
  // Attempt the analysis if we meet the criteria above
  const res = await runNpmAnalysis();
  // Now we will do some basic validation - this basically
  // gets us parity with the current extension:
  if(!res.pass) {
    if(res.message) {
      console.error(`An error occurred while attempting to check - ${res.message}`);
      Deno.exit(-1);
    }
    console.error("the installation attempt triggered a policy failure!");
    Deno.exit(-2);
  }
  if('complete' !== res.status) {
    console.warn("Scan is incomplete - please try again shortly!");
    Deno.exit(-3);
  }
  // TODO: Add more validation
}
console.log("[phylum] installation ok to proceed...");
const status = await Deno.run({cmd: ['npm', ...Deno.args]}).status();
// status is an object of the form {success, code}; exit with the numeric code.
Deno.exit(status.code);
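The gating logic at the end of the script can also be factored into a pure function, which makes it easier to test without running npm. The sketch below assumes a result shape with optional pass, message, and status fields, mirroring how the script reads them; the GateResult type and the gateExitCode name are hypothetical and not part of the Phylum API.

```typescript
// Assumed result shape, based only on the fields the script above reads.
interface GateResult {
  pass?: boolean;
  message?: string;
  status?: string;
}

// Returns 0 when installation may proceed, otherwise the same negative
// exit codes used by the script above.
function gateExitCode(res: GateResult): number {
  if (!res.pass) {
    // A message indicates an error during the check; no message means
    // the analysis itself reported a policy failure.
    return res.message ? -1 : -2;
  }
  if (res.status !== 'complete') {
    return -3;
  }
  return 0;
}

console.log(gateExitCode({ pass: false, message: 'package.json is missing' }));
console.log(gateExitCode({ pass: false }));
console.log(gateExitCode({ pass: true, status: 'pending' }));
console.log(gateExitCode({ pass: true, status: 'complete' }));
```

With this split, the top-level script only needs to call the function and pass a non-zero result to Deno.exit.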
At this point we can start to drill into specific issues, add logic to filter for certain license types, or even apply local environment checks. The best place to start is probably the Phylum API documentation for the analysis object, which shows the types of findings that may come back. We will begin by adding a set of new methods that we will ultimately plumb through to handle the extra validations, taking care to keep this API flexible and easy to extend. Something we can start with might be as follows:
// First, we will define our "filter" type to handle examination of each
// package analysis result that comes back from our request. We will return
// an error message to display to the user if the check fails, or null if
// the package is ok to use.
type Filter = (pkg: Object) => NullableString;
// Our validation method will take a list of packages (from our API response),
// and a list of "filter" methods (as defined above). It will return an error
// message to display to the user if the checks fail, or null if all pass.
const packageValidator = (pkgList: Object[], fltList: Filter[]): NullableString => {
  const errorMessages: string[] = [];
  // We will walk through the list of packages here
  for(const pkg of pkgList) {
    // For each package, we will apply each provided
    // filter method, and add the result to our list
    // of errors if a problem is found.
    for(const flt of fltList) {
      const res = flt(pkg);
      if(res)
        errorMessages.push(res);
    }
  }
  // Now we will return either a consolidated error list, or null.
  if(errorMessages.length)
    return errorMessages.join("\n");
  return null;
}
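To see how the validator composes filters, here is a small self-contained sketch. The Pkg interface is a simplified, assumed package shape used only for this example, and the two toy filters (requireLicense and requireStable) are made up for illustration rather than recommended policy.

```typescript
type NullableString = string | null;

// Assumed, simplified package shape for this example only.
interface Pkg {
  name: string;
  version: string;
  license?: string;
}

type Filter = (pkg: Pkg) => NullableString;

// Same logic as the validator above, typed against the example shape.
const packageValidator = (pkgList: Pkg[], fltList: Filter[]): NullableString => {
  const errorMessages: string[] = [];
  for (const pkg of pkgList) {
    for (const flt of fltList) {
      const res = flt(pkg);
      if (res) errorMessages.push(res);
    }
  }
  return errorMessages.length ? errorMessages.join("\n") : null;
};

// Toy filter: reject any package without a declared license.
const requireLicense: Filter = (pkg) =>
  pkg.license ? null : `no license declared for ${pkg.name}:${pkg.version}`;

// Toy filter: reject pre-1.0 versions.
const requireStable: Filter = (pkg) =>
  pkg.version.startsWith("0.") ? `unstable version ${pkg.name}:${pkg.version}` : null;

const samplePkgs: Pkg[] = [
  { name: "left-pad", version: "1.3.0", license: "MIT" },
  { name: "example-lib", version: "0.2.1" },
];

// Prints one line per failed check; here both failures come from example-lib.
console.log(packageValidator(samplePkgs, [requireLicense, requireStable]));
```

Because every filter runs against every package, the user sees all policy violations in a single pass instead of fixing them one at a time.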
So now we have a general framework we can apply rules to. If we take the operative block of code from the first example, where we left our // TODO: initially, we can start by slotting in our new filter method:
// ...
// TODO: populate with new filters!
const filterMethods: Filter[] = [];
if(Deno.args.length >= 1 && args.has(Deno.args[0])) {
  // Attempt the analysis if we meet the criteria above
  const res = await runNpmAnalysis();
  // Now we will do some basic validation - this basically
  // gets us parity with the current extension:
  if(!res.pass) {
    if(res.message) {
      console.error(`An error occurred while attempting to check - ${res.message}`);
      Deno.exit(-1);
    }
    console.error("the installation attempt triggered a policy failure!");
    Deno.exit(-2);
  }
  if('complete' !== res.status) {
    console.warn("Scan is incomplete - please try again shortly!");
    Deno.exit(-3);
  }
  // Now, we slot in our filter method, and provide the package list from
  // our previous analysis
  const filteredResult = packageValidator(res.packages, filterMethods);
  if(filteredResult) {
    // If we have landed here, that means at least one error was returned.
    // In that case, we will print the error message to the user and exit.
    console.error(filteredResult);
    Deno.exit(-4);
  }
  // Otherwise, our new filter methods have passed - this means we are free to
  // continue as before.
}
// ...
Now we are at the point where it makes sense to start adding filters. Each of our filters will be called once for each package structure returned by the API. That structure has a set of fields describing various attributes of the package and, optionally, a list of issues. To keep things simple, we will start by writing a rule that rejects packages with certain restrictive licenses (for example GPL or AGPL software), so that they can no longer be installed through the extension.
// List of validators to check licenses against
const disallowedLicenses = [/(A|L)?GPL-\d+.*/, /CC-BY-.*/];
const licenseValidator = (pkg: Object): NullableString => {
  // Now we will walk through the list of disallowed license regexes,
  // and if any match, we will return an error.
  for(const license of disallowedLicenses)
    if(license.test(pkg.license))
      return `disallowed license ${pkg.license} in ${pkg.name}:${pkg.version}`;
  return null;
}
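It can help to check what the two patterns actually match before relying on them. The sketch below runs the same regular expressions against a handful of SPDX-style identifiers; the isDisallowed helper and the sample list are assumptions made for illustration.

```typescript
// Same patterns as in the license filter above.
const disallowedLicenses: RegExp[] = [/(A|L)?GPL-\d+.*/, /CC-BY-.*/];

// Hypothetical helper: true if any disallowed pattern matches the license string.
const isDisallowed = (license: string): boolean =>
  disallowedLicenses.some((re) => re.test(license));

const samples = [
  "MIT",
  "Apache-2.0",
  "GPL-3.0-only",
  "LGPL-2.1",
  "AGPL-3.0",
  "CC-BY-4.0",
  "CC0-1.0",
];

for (const id of samples) {
  console.log(`${id}: ${isDisallowed(id) ? "blocked" : "allowed"}`);
}
```

Note that the patterns expect a hyphen followed by a digit after "GPL", so a license written in a non-SPDX form such as "GPLv3" would not be caught; additional patterns may be needed depending on how license strings appear in the package data.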
In addition to this, we may want filters that act on the actual issues returned by the API. To keep the example simple, we will reject malware issues with a severity of "medium" or higher, as well as any other issue marked "critical":
const disallowedMalware = new Set(['medium', 'high', 'critical']);
const issueValidator = (pkg: Object): NullableString => {
  // Issues are currently defined as:
  // {
  //   tag: "UNIQUE_ID",
  //   title: "issue title",
  //   description: "Issue description",
  //   severity: "low | medium | high | critical",
  //   domain: "malicious_code | author | engineering | license | vulnerability"
  // }
  // The issue list is optional, so fall back to an empty list if it is missing.
  for(const issue of pkg.issues ?? []) {
    // If we encounter _any_ critical issues, we will return now
    if('critical' === issue.severity)
      return `critical issue ${issue.title} found in ${pkg.name}:${pkg.version}`;
    // Similarly, we will check to see if the issue is both a malware finding,
    // and has a severity of medium or higher.
    if(issue.domain === 'malicious_code' && disallowedMalware.has(issue.severity))
      return `potential malware ${issue.title} found - ${pkg.name}:${pkg.version}`;
  }
  // If we made it here, none of the issues have created an error we need to report
  return null;
}
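A quick way to sanity-check this filter is to run it against a few made-up packages. In the sketch below, the Issue and Pkg interfaces are assumptions derived from the issue fields listed in the comment above, and makeIssue is a hypothetical helper that builds fake findings for the test cases.

```typescript
type NullableString = string | null;

// Assumed shapes, based on the issue fields described above.
interface Issue {
  tag: string;
  title: string;
  description: string;
  severity: "low" | "medium" | "high" | "critical";
  domain: "malicious_code" | "author" | "engineering" | "license" | "vulnerability";
}

interface Pkg {
  name: string;
  version: string;
  issues?: Issue[];
}

const disallowedMalware = new Set<string>(["medium", "high", "critical"]);

const issueValidator = (pkg: Pkg): NullableString => {
  for (const issue of pkg.issues ?? []) {
    if (issue.severity === "critical")
      return `critical issue ${issue.title} found in ${pkg.name}:${pkg.version}`;
    if (issue.domain === "malicious_code" && disallowedMalware.has(issue.severity))
      return `potential malware ${issue.title} found - ${pkg.name}:${pkg.version}`;
  }
  return null;
};

// Hypothetical helper that builds a fake issue for the cases below.
const makeIssue = (severity: Issue["severity"], domain: Issue["domain"]): Issue => ({
  tag: "EXAMPLE_TAG",
  title: "example finding",
  description: "made-up issue for illustration",
  severity,
  domain,
});

const cases: Pkg[] = [
  { name: "a", version: "1.0.0", issues: [makeIssue("high", "vulnerability")] },    // passes
  { name: "b", version: "1.0.0", issues: [makeIssue("medium", "malicious_code")] }, // malware
  { name: "c", version: "1.0.0", issues: [makeIssue("critical", "license")] },      // critical
  { name: "d", version: "1.0.0" },                                                  // no issue list
];

for (const pkg of cases) console.log(pkg.name, issueValidator(pkg));
```

Under this policy a high-severity vulnerability still passes, since only malware findings and critical issues are blocked; whether that is acceptable depends on the organization's risk tolerance.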
Now we can go back and slot in our two new filter methods, which gives us some validation during the installation process whenever this new extension is used:
// ...
const filterMethods: Filter[] = [licenseValidator, issueValidator];
if(Deno.args.length >= 1 && args.has(Deno.args[0])) {
  // ...
  // Now, with the update at the top to filterMethods, we will apply
  // our two new filters to each returned package.
  const filteredResult = packageValidator(res.packages, filterMethods);
  if(filteredResult) {
    // ...
This is certainly not a comprehensive solution, but it should provide enough to begin building stronger policy capabilities on top of Phylum's tooling and API.
You can read many more articles in the same vein on our blog here, and sign up to use Phylum and try out this example (among others).