Programmatically collecting statistics and quality metrics from Python packages

#python

While comparing a large number of machine learning models submitted to data science competitions run at Unearthed, we wanted to look beyond predictive performance and also compare statistics and quality metrics for the source code files of each model.

There are a number of tools out there for analysing packages, so the design goals were twofold: collect a wide spectrum of information, and be easy to invoke programmatically (rather than parsing output from a CLI) so that it could tie into our data science pipeline. With that in mind, several options were evaluated.

In the end, pylint and radon had the most promising and accessible feature sets.

Radon had documented APIs for programmatically accessing statistics in four categories: cyclomatic complexity, maintainability index, raw metrics and Halstead metrics. An example of programmatically collecting the maintainability index using the API is:

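from radon.cli import Config
from radon.cli.harvest import MIHarvester
from radon.complexity import SCORE  # radon's sort-by-complexity ordering


def measure_maintainability(source_dir):
    harvester = MIHarvester([source_dir], harvester_config())
    # Each result pairs a file path with a dict of maintainability statistics.
    for path, raw_maintainability_statistics in harvester.results:
        # Files radon could not analyse carry an 'error' key instead of 'mi'.
        if 'error' in raw_maintainability_statistics:
            continue
        print(raw_maintainability_statistics['mi'])


def harvester_config():
    return Config(
        exclude=None, ignore=None, order=SCORE, no_assert=False,
        show_closures=True, multi=4, by_function=False,
        min='A', max='F', include_ipynb=False,
    )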

Pylint lacked documentation on programmatically collecting information from a source directory, but examining the entrypoint to the CLI command led to the following code snippet:

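import io
import json
from contextlib import redirect_stdout

from pylint.lint import Run


def lint_directory(source_dir, disabled_checks):
    # disabled_checks is a list of pylint message names to skip.
    # Capture the JSON report that pylint writes to stdout.
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        try:
            Run(['--output-format=json', '--disable=' + ','.join(disabled_checks), source_dir])
        except SystemExit:
            # By default Run() calls sys.exit() when it finishes; ignore that here.
            pass
    lint_results = json.loads(buffer.getvalue())
    print(lint_results)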
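
The parsed `lint_results` is a list of message dictionaries, each carrying (among other keys) a `type` and a `symbol`. As a minimal sketch of how such a list could be reduced to per-type and per-symbol counts, the following uses only the standard library. The function name `summarise_lint_messages` and the sample messages are hypothetical and only illustrate the shape of the data; this is not necessarily how the original pipeline did it.

```python
from collections import Counter


def summarise_lint_messages(messages):
    """Count pylint messages by category ("type") and by check name ("symbol")."""
    return {
        "types": dict(Counter(m["type"] for m in messages)),
        "symbols": dict(Counter(m["symbol"] for m in messages)),
    }


# Hypothetical sample shaped like pylint's JSON output (other keys omitted).
sample_messages = [
    {"type": "convention", "symbol": "invalid-name", "path": "model.py", "line": 3},
    {"type": "convention", "symbol": "line-too-long", "path": "model.py", "line": 9},
    {"type": "warning", "symbol": "unused-import", "path": "train.py", "line": 1},
    {"type": "convention", "symbol": "invalid-name", "path": "train.py", "line": 7},
]

lint_summary = summarise_lint_messages(sample_messages)
print(lint_summary)
```

Grouping the same messages by their `path` key first would give the equivalent per-file breakdown.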

After collating each of the metrics, the response ended up looking like the following JSON payload:

{
  "aggregated_analysis": {
    "complexity": {
      "min": 1.0,
      "max": 5.0,
      "mean": 1.7619047619047619
    },
    "maintainability_index": {
      "min": 42.44120650814055,
      "max": 73.1183133154694,
      "mean": 61.16721652696456
    },
    "code_statistics": {
      "loc": 940,
      "lloc": 521,
      "sloc": 566,
      "comments": 183,
      "multi": 31,
      "blank": 186,
      "single_comments": 157
    },
    "halstead_metrics": {
      "h1": {
        "min": 1.0,
        "max": 15.0,
        "mean": 8.0
      },
      "h2": {
        "min": 3.0,
        "max": 137.0,
        "mean": 52.0
      },
      "N1": 110.0,
      "N2": 215.0,
      "vocabulary": {
        "min": 4.0,
        "max": 152.0,
        "mean": 60.0
      },
      "length": 325.0,
      "calculated_length": {
        "min": 4.754887502163469,
        "max": 1031.03375429972,
        "mean": 374.5962139339611
      },
      "volume": {
        "min": 12.0,
        "max": 2101.89897889864,
        "mean": 748.9542971398513
      },
      "difficulty": {
        "min": 0.6666666666666666,
        "max": 10.510948905109489,
        "mean": 5.309205190592052
      },
      "effort": {
        "min": 8.0,
        "max": 22092.952770905413,
        "mean": 7577.510451793251
      },
      "time": 1262.9184086322082,
      "bugs": 0.7489542971398513
    },
    "linting": {
      "types": {
        "convention": 206,
        "warning": 55,
        "refactor": 12
      },
      "symbols": {
        "invalid-name": 85,
        "line-too-long": 53,
        "trailing-whitespace": 33,
        // ...
      }
    }
  },
  "file_analysis": {
    // The same metrics, but for each file in the package.
  }
}
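
The payload above mixes two kinds of aggregate: min/max/mean summaries (for example the maintainability index) and plain totals (for example `loc`). A rough sketch of how per-file values might be rolled up into that shape is shown below. The helper names `summarise` and `total`, the `per_file` structure and its numbers are all assumptions made for illustration, not the original implementation.

```python
from statistics import mean


def summarise(values):
    """Reduce a series of per-file values to min, max and mean."""
    values = list(values)
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def total(per_file_counts):
    """Sum each counter (loc, sloc, ...) across all files."""
    totals = {}
    for counts in per_file_counts:
        for key, value in counts.items():
            totals[key] = totals.get(key, 0) + value
    return totals


# Hypothetical per-file metrics; a real run would fill these from radon.
per_file = {
    "model.py": {"mi": 50.0, "raw": {"loc": 100, "sloc": 70}},
    "train.py": {"mi": 70.0, "raw": {"loc": 40, "sloc": 30}},
}

aggregated = {
    "maintainability_index": summarise(f["mi"] for f in per_file.values()),
    "code_statistics": total(f["raw"] for f in per_file.values()),
}
print(aggregated)
```

Keeping the per-file dictionary around alongside the aggregate is what makes a `file_analysis` section like the one in the payload cheap to emit.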
