
Masayoshi Mizutani


Goast: Generic static analysis for Go Abstract Syntax Tree by OPA/Rego

TL;DR

  • Go has many static analysis tools, but checking your own rules means building a new tool each time.
  • To cover more kinds of checks with a single tool, I created one that analyzes the Go AST (Abstract Syntax Tree) with Rego, a general-purpose policy language.

https://github.com/m-mizutani/goast

[Image: comment posted by goast] The rule "always take context.Context as the first argument", checked in CI.

Motivation

Many static analysis tools are available for Go, and they can check general best practices. For example, gosec checks Go code for security problems, and I use it myself. However, coding rules in software development are not based only on best practices; they can also be specific to a piece of software or a team. For example:

  • In a certain package, all functions must take context.Context as an argument.
  • In a certain package, all functions must call an audit logging function at least once.
  • The User structure must be initialized by the function NewUser().
  • A certain function may be called only from a specific package.

These rules can be checked by humans during review, but relying on humans alone is risky because people make mistakes. Review may work while there are only a few rules, but the more rules there are, the more likely a violation is to slip through. Checking rules by hand also makes it harder to focus on the essential parts of a review.

Go provides official tools and frameworks for static analysis, which make it easy to create your own tools. However, creating a separate analysis tool for each rule is costly to implement and maintain. So I thought it would be better to create one tool that can check code as generically as possible.

Separating "rule" and "implementation" with Rego

This is not limited to static analysis, but a key point in building a versatile checking tool is how users describe rules. Creating an original description language is too costly, and if rules are given as structured data such as YAML or JSON, expressive power is limited and versatility suffers.

A useful tool for this is the policy language Rego. Rego is a general-purpose language for describing policies over structured data, and it is evaluated by OPA (Open Policy Agent). Popular uses include checking the state of resources in cloud environments, checking Infrastructure as Code descriptions, and checking authorization for access to servers. Please see the OPA documentation for more details about Rego.

With Rego, the checking implementation and the rules can be completely separated. The implementation is responsible for reading files, reading policies, passing data for evaluation, and outputting the results, while the rules are written only in Rego. This gives a separation of concerns between those who implement the tool and those who write the rules.

Implementation

So I implemented goast, a generic static analysis tool for Go.

https://github.com/m-mizutani/goast

The tool reads Go code and evaluates its AST (Abstract Syntax Tree), an abstract representation of the code, against a policy written in Rego. The go/parser package is used to get the AST of the Go source code, which is then evaluated by the Rego policy. goast can either pass the AST of the whole file to the policy once, or evaluate the AST node by node.
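To make the "node by node" idea concrete, here is a minimal sketch using only the Go standard library. It is not goast's actual code: the function name nodeKinds and the file label example.go are made up for illustration. It parses a source string and records the kind of every node that ast.Inspect visits.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// nodeKinds (a hypothetical helper, not part of goast) parses src and
// returns the kind of every AST node in the order ast.Inspect visits them.
func nodeKinds(src string) ([]string, error) {
	fset := token.NewFileSet()
	// "example.go" is only a label for positions; the code comes from src.
	file, err := parser.ParseFile(fset, "example.go", src, parser.SkipObjectResolution)
	if err != nil {
		return nil, err
	}

	var kinds []string
	ast.Inspect(file, func(n ast.Node) bool {
		if n == nil {
			return false // ast.Inspect passes nil when it leaves a subtree.
		}
		// %T yields e.g. "*ast.ExprStmt"; keep only "ExprStmt".
		kinds = append(kinds, strings.TrimPrefix(fmt.Sprintf("%T", n), "*ast."))
		return true
	})
	return kinds, nil
}

func main() {
	src := "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"hello\")\n}\n"
	kinds, err := nodeKinds(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(kinds)
}
```

A node-by-node evaluation would hand each visited node to the policy instead of just collecting its kind.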

Let's look at the AST of Go code

I am a beginner with Go ASTs myself, and I can't picture the AST just by looking at the code. So I added a feature to goast that dumps the AST for inspection.

package main

import "fmt"

func main() {
        fmt.Println("hello")
}

goast can output an AST dump with the following command.

$ goast dump --line 6  examples/println/main.go | jq
{
  "Path": "examples/println/main.go",
  "Node": {
    "X": {
      "Fun": {
        "X": {
          "NamePos": 44,
          "Name": "fmt",
          "Obj": null
        },
        "Sel": {
          "NamePos": 48,
          "Name": "Println",
          "Obj": null
        }
      },
      "Lparen": 55,
      "Args": [
        {
          "ValuePos": 56,
          "Kind": 9,
          "Value": "\"hello\""
        }
      ],
      "Ellipsis": 0,
      "Rparen": 63
    }
  },
  "Kind": "ExprStmt"
}

AST data tends to be relatively large; even the 7 lines of code above become 1,408 characters of JSON. So, for ease of reading, only the sixth line of the code (fmt.Println("hello")) is output. Path is the path of the file that was read, Node is a dump of the ast.Node passed by ast.Inspect, and Kind is the type of Node.
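The following is a rough sketch of how such a dump could be produced with the standard library, assuming the dump is essentially the JSON encoding of the go/ast node. That is my reading of the output, not a confirmed description of goast internals, and the names dumped and dumpLine are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// dumped mirrors the three top-level keys of the dump (hypothetical type).
type dumped struct {
	Path string
	Node ast.Node
	Kind string
}

// JSON renders the dump as indented JSON.
func (d dumped) JSON() ([]byte, error) {
	return json.MarshalIndent(d, "", "  ")
}

// dumpLine returns the first AST node that starts on the given line.
func dumpLine(path, src string, line int) (dumped, error) {
	fset := token.NewFileSet()
	// SkipObjectResolution leaves Ident.Obj nil, which avoids pointer
	// cycles that encoding/json cannot serialize.
	file, err := parser.ParseFile(fset, path, src, parser.SkipObjectResolution)
	if err != nil {
		return dumped{}, err
	}

	var hit ast.Node
	ast.Inspect(file, func(n ast.Node) bool {
		if n == nil || hit != nil {
			return false
		}
		if fset.Position(n.Pos()).Line == line {
			hit = n
			return false
		}
		return true
	})
	if hit == nil {
		return dumped{}, fmt.Errorf("no node starts on line %d", line)
	}
	kind := strings.TrimPrefix(fmt.Sprintf("%T", hit), "*ast.")
	return dumped{Path: path, Node: hit, Kind: kind}, nil
}

func main() {
	src := "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"hello\")\n}\n"
	d, err := dumpLine("examples/println/main.go", src, 6)
	if err != nil {
		panic(err)
	}
	out, err := d.JSON()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Fields such as NamePos and Kind in the JSON are simply the exported fields of the go/ast structs, which is why the paths used in a rule follow the Go type definitions.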

In this dump, .Node.X.Fun holds information about the function being called, and .Node.X.Args holds the arguments. You could use this to describe a rule such as "prohibit calls to a particular function". You can also use conditions like the following as additional context:

  • Allow/Prohibit calls within a specific package
  • Allow/Prohibit certain arguments
  • Allow/Prohibit direct passing of literals
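For readers who know the go/ast package, the JSON paths in the dump map onto Go types. The sketch below shows that mapping as a chain of type assertions; isCall and countCalls are hypothetical helpers written for this explanation, not functions from goast.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// isCall reports whether n is an expression statement that calls pkg.name.
// Each assertion corresponds to one step of the JSON path in the dump.
func isCall(n ast.Node, pkg, name string) bool {
	stmt, ok := n.(*ast.ExprStmt) // Kind == "ExprStmt"
	if !ok {
		return false
	}
	call, ok := stmt.X.(*ast.CallExpr) // .Node.X
	if !ok {
		return false
	}
	sel, ok := call.Fun.(*ast.SelectorExpr) // .Node.X.Fun
	if !ok {
		return false
	}
	ident, ok := sel.X.(*ast.Ident) // .Node.X.Fun.X
	if !ok {
		return false
	}
	return ident.Name == pkg && sel.Sel.Name == name
}

// countCalls parses src and counts statements that call pkg.name.
func countCalls(src, pkg, name string) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "main.go", src, parser.SkipObjectResolution)
	if err != nil {
		return 0, err
	}
	count := 0
	ast.Inspect(file, func(n ast.Node) bool {
		if isCall(n, pkg, name) {
			count++
		}
		return true
	})
	return count, nil
}

func main() {
	src := "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"hello\")\n}\n"
	n, err := countCalls(src, "fmt", "Println")
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // prints 1
}
```

This check is purely syntactic: an import alias or a local variable named fmt would change what it matches, and the same caveat applies to a rule written only against these field names.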

Describe a rule

Now let's write Rego rules from the output AST. This time, let's describe a simple rule to forbid calling fmt.Println.

package goast

fail[res] {
    input.Kind == "ExprStmt"
    input.Node.X.Fun.X.Name == "fmt"
    input.Node.X.Fun.Sel.Name == "Println"

    res := {
        "msg": "do not use fmt.Println",
        "pos": input.Node.X.Fun.X.NamePos,
        "sev": "ERROR",
    }
}

The rule schema of goast is as follows.

  • package must be goast
  • Input: input has metadata such as Path and Kind, and Node as the actual AST.
  • Output: Put the following structured data into fail if a violation is detected
    • msg (string): Detail message of violation
    • pos (int): Number to indicate position in the source code file
    • sev (string): Severity, choose one from INFO, WARNING, and ERROR

The first three lines of the rule detect the fmt.Println call that was just dumped. The dumped data is passed to Rego as input as-is, so the rule inspects Kind, Node.X.Fun.X.Name, and Node.X.Fun.Sel.Name to determine that the node is a call to fmt.Println.

If you are not familiar with ASTs, it may be hard to see what pos means. Here it is a number that indicates a position in bytes from the beginning of the file, and it is stored in fields such as NamePos and ValuePos. When you put this number in the result, goast converts it to the line number where the violation occurred, and the final output shows that line number.
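In the standard library, this kind of conversion is done with token.FileSet. The sketch below shows the mechanism and is not taken from goast; locate is a hypothetical helper that finds the first identifier with a given name and returns its raw position together with the line and column.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// locate returns the token.Pos value, line, and column of the first
// identifier called name in src.
func locate(src, name string) (pos, line, col int, err error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "main.go", src, parser.SkipObjectResolution)
	if err != nil {
		return 0, 0, 0, err
	}

	found := token.NoPos
	ast.Inspect(file, func(n ast.Node) bool {
		if id, ok := n.(*ast.Ident); ok && id.Name == name && found == token.NoPos {
			found = id.NamePos
		}
		return found == token.NoPos
	})
	if found == token.NoPos {
		return 0, 0, 0, fmt.Errorf("identifier %q not found", name)
	}

	// The FileSet that parsed the file knows where each line starts,
	// so it can turn the compact position into line and column.
	p := fset.Position(found)
	return int(found), p.Line, p.Column, nil
}

func main() {
	src := "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"hello\")\n}\n"
	pos, line, col, err := locate(src, "fmt")
	if err != nil {
		panic(err)
	}
	fmt.Printf("pos=%d line=%d col=%d\n", pos, line, col) // prints pos=44 line=6 col=2
}
```

The identifier fmt starts at byte offset 43 of this source. A token.Pos is the offset plus the file's base in the FileSet, which is 1 for the first file, so the value is 44, the same NamePos that appears in the dump.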

You can detect violations by saving the Go code as main.go and the rule as policy.rego, then running:

$ goast eval -p policy.rego main.go
[main.go:6] - do not use fmt.Println

        Detected 1 violations


Also, goast supports JSON format output.

$ goast eval -f json -p policy.rego main.go
{
  "diagnostics": [
    {
      "message": "do not use fmt.Println",
      "location": {
        "path": "main.go",
        "range": {
          "start": {
            "line": 6,
            "column": 2
          }
        }
      }
    }
  ],
  "source": {
    "name": "goast",
    "url": "https://github.com/m-mizutani/goast"
  }
}
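If you want to post-process this JSON yourself, a small decoder is enough. The sketch below models only the fields that appear in the output above; the type name result and the function summarize are hypothetical, and the real schema may contain more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// result models only the fields visible in the sample output.
type result struct {
	Diagnostics []struct {
		Message  string `json:"message"`
		Location struct {
			Path  string `json:"path"`
			Range struct {
				Start struct {
					Line   int `json:"line"`
					Column int `json:"column"`
				} `json:"start"`
			} `json:"range"`
		} `json:"location"`
	} `json:"diagnostics"`
}

// summarize turns the JSON output into "path:line:column: message" lines.
func summarize(data []byte) ([]string, error) {
	var r result
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, err
	}
	lines := make([]string, 0, len(r.Diagnostics))
	for _, d := range r.Diagnostics {
		start := d.Location.Range.Start
		lines = append(lines, fmt.Sprintf("%s:%d:%d: %s",
			d.Location.Path, start.Line, start.Column, d.Message))
	}
	return lines, nil
}

func main() {
	data := []byte(`{"diagnostics":[{"message":"do not use fmt.Println",
		"location":{"path":"main.go","range":{"start":{"line":6,"column":2}}}}]}`)
	lines, err := summarize(data)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l) // prints main.go:6:2: do not use fmt.Println
	}
}
```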

Static analysis in CI

Static analysis should be run continuously in CI (Continuous Integration) to keep rule-violating code from being merged unintentionally. The JSON output schema is compatible with reviewdog and can be passed to reviewdog as is.

goast-action is also available for GitHub Actions, which lets you run static analysis on pull requests with the following workflow.

name: goast

on:
  pull_request:

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - name: checkout
        uses: actions/checkout@v2
      - uses: reviewdog/action-setup@v1
      - name: goast
        uses: m-mizutani/goast-action@main
        with:
          policy: ./policy  # Directory of rule files written in Rego
          format: json      # Output format, "text" or "json"
          output: fail.json # File name for output
          source: ./pkg     # Directory of Go source code to be checked
      - name: report
        env:
          REVIEWDOG_GITHUB_API_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: cat fail.json | reviewdog -reporter=github-pr-review -f rdjson

With this workflow, GitHub Actions posts a comment like the one in the following image.

[Image: comment posted by goast]

Conclusion

I believe that static analysis with the Go AST and Rego gives developers more flexible static analysis for Go. I hope it helps in developing more secure software.

However, I do not think every static check can be covered by an AST of the source code and a general-purpose policy language. For example, Rego is not good at rules that track changes in state, so it is not well suited to use cases such as checking how a variable is referenced or changed.

I have just started using it myself in practice, so I am still in the process of exploring various ways to use it. I welcome feature suggestions and discussion, so please feel free to comment or create an issue in the repository.

Top comments (3)

Marcelloh

I like the idea, but from the documentation it is unclear if one could write multiple rules in one Rego policy file. (There's only an example with one rule)

Masayoshi Mizutani

Definitely. Let me add more documentation to write rules

Ron Khera

Hi,
I am using the ast.ParseModuleWithOpts() function to parse a string containing Rego. The function returns an *ast.Module, but I cannot find a way to save the module to a file. I would like to do the same as the opa agent command below.
opa parse -format json some.rego
Do you have any suggestions, or can you point me in the right direction?
Thanks