DEV Community

Sam

Programmatically collecting statistics and quality metrics from Python packages

While comparing a large number of machine learning models submitted to data science competitions run at Unearthed, we wanted to look beyond predictive performance and also compare statistics and quality metrics for the source code of each model.

Plenty of tools exist for analysing packages, so there were two design goals: collect a wide spectrum of information, and be easy to invoke programmatically (rather than by parsing output from a CLI) so it could tie into our data science pipeline. With that in mind, several options were evaluated.

In the end, pylint and radon had the most promising and accessible feature sets.

Radon has documented APIs for programmatically accessing statistics in four categories: cyclomatic complexity, maintainability index, raw metrics and Halstead metrics. An example of collecting the maintainability index through the API is:

from radon.cli import Config
from radon.cli.harvest import MIHarvester
from radon.complexity import SCORE


def measure_maintainability(source_dir):
    harvester = MIHarvester([source_dir], harvester_config())
    for path, raw_maintainability_statistics in harvester.results:
        # Files that could not be parsed carry an 'error' key instead of 'mi'.
        if 'mi' in raw_maintainability_statistics:
            print(raw_maintainability_statistics['mi'])


def harvester_config():
    # One Config object covering the options radon's harvesters read.
    return Config(exclude=None, ignore=None, order=SCORE, no_assert=False, show_closures=True,
                  multi=4, by_function=False, min='A', max='F', include_ipynb=False)

Pylint lacked documentation on programmatically collecting information from a source directory, but examining the entrypoint to the CLI command led to the following code snippet:

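import io
import json
from contextlib import redirect_stdout

from pylint.lint import Run


def lint_directory(source_dir, disabled_checks):
    # disabled_checks: list of pylint message symbols or IDs to skip.
    buffer = io.StringIO()
    # The JSON reporter writes to stdout, so capture it in a buffer.
    with redirect_stdout(buffer):
        try:
            Run(['--output-format=json', '--disable=' + ','.join(disabled_checks), source_dir])
        except SystemExit:
            # Run ends by calling sys.exit(); swallow it so the pipeline keeps going.
            pass
    lint_results = json.loads(buffer.getvalue())
    print(lint_results)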
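The JSON reporter emits a list of message objects, each of which includes a `type` and a `symbol` field. One way to reduce that list to per-type and per-symbol counts is sketched below; `summarise_lint_results` and the sample messages are hypothetical and only illustrate the idea.

```python
from collections import Counter


def summarise_lint_results(lint_results):
    """Count pylint messages by type and by symbol."""
    types = Counter(message['type'] for message in lint_results)
    symbols = Counter(message['symbol'] for message in lint_results)
    return {
        'types': dict(types),
        # most_common() orders the symbols from most to least frequent.
        'symbols': dict(symbols.most_common()),
    }


# Made-up messages carrying only the two fields the summary needs.
sample_results = [
    {'type': 'convention', 'symbol': 'invalid-name'},
    {'type': 'convention', 'symbol': 'line-too-long'},
    {'type': 'convention', 'symbol': 'invalid-name'},
    {'type': 'warning', 'symbol': 'unused-import'},
]

summary = summarise_lint_results(sample_results)
print(summary)
```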

After collating each of the metrics, the response ended up looking like the following JSON payload:

{
  "aggregated_analysis": {
    "complexity": {
      "min": 1.0,
      "max": 5.0,
      "mean": 1.7619047619047619
    },
    "maintainability_index": {
      "min": 42.44120650814055,
      "max": 73.1183133154694,
      "mean": 61.16721652696456
    },
    "code_statistics": {
      "loc": 940,
      "lloc": 521,
      "sloc": 566,
      "comments": 183,
      "multi": 31,
      "blank": 186,
      "single_comments": 157
    },
    "halstead_metrics": {
      "h1": {
        "min": 1.0,
        "max": 15.0,
        "mean": 8.0
      },
      "h2": {
        "min": 3.0,
        "max": 137.0,
        "mean": 52.0
      },
      "N1": 110.0,
      "N2": 215.0,
      "vocabulary": {
        "min": 4.0,
        "max": 152.0,
        "mean": 60.0
      },
      "length": 325.0,
      "calculated_length": {
        "min": 4.754887502163469,
        "max": 1031.03375429972,
        "mean": 374.5962139339611
      },
      "volume": {
        "min": 12.0,
        "max": 2101.89897889864,
        "mean": 748.9542971398513
      },
      "difficulty": {
        "min": 0.6666666666666666,
        "max": 10.510948905109489,
        "mean": 5.309205190592052
      },
      "effort": {
        "min": 8.0,
        "max": 22092.952770905413,
        "mean": 7577.510451793251
      },
      "time": 1262.9184086322082,
      "bugs": 0.7489542971398513
    },
    "linting": {
      "types": {
        "convention": 206,
        "warning": 55,
        "refactor": 12
      },
      "symbols": {
        "invalid-name": 85,
        "line-too-long": 53,
        "trailing-whitespace": 33,
        // ...
      }
    }
  },
  "file_analysis": {
    // The same metrics, but for each file in the package.
  }
}
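The aggregated section mixes two kinds of roll-up: min/max/mean summaries for scores such as the maintainability index, and plain totals for counts such as lines of code. The sketch below shows one way such a roll-up could be computed from per-file results. The per-file shape, the `aggregate` helper and the sample numbers are assumptions for illustration, since the post does not show the `file_analysis` contents.

```python
from collections import Counter
from statistics import mean


def summarise(values):
    """Reduce a list of per-file scores to min, max and mean."""
    values = [float(value) for value in values]
    return {'min': min(values), 'max': max(values), 'mean': mean(values)}


def aggregate(file_analysis):
    """Roll per-file metrics up into package-level figures."""
    totals = Counter()
    for metrics in file_analysis.values():
        # Counter.update adds the counts for matching keys.
        totals.update(metrics['code_statistics'])
    return {
        'maintainability_index': summarise(
            metrics['maintainability_index'] for metrics in file_analysis.values()
        ),
        'code_statistics': dict(totals),
    }


# Hypothetical per-file results, keyed by path.
sample_file_analysis = {
    'a.py': {'maintainability_index': 40.0, 'code_statistics': {'loc': 10, 'sloc': 8}},
    'b.py': {'maintainability_index': 60.0, 'code_statistics': {'loc': 30, 'sloc': 20}},
}

aggregated = aggregate(sample_file_analysis)
print(aggregated)
```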