Oliver Li

Posted on May 15

Java Code Change Impact Analysis

#opensource #testing #java #python

Background

As the business becomes increasingly complex, conducting full regression testing is becoming more difficult. In order to more accurately pinpoint the impact of changes in backend projects, and to define the scope of regression testing more precisely, thus improving testing efficiency, it is necessary to analyze the impact scope of Java code submissions.

Display Effect

Implementation

The basic principle is the same as the Find Usage feature in Idea, locating the impact of code changes and continuously traversing affected classes and methods until the top-level controller layer is found.

The code is mainly written in Python and involves two libraries:

javalang: a library for parsing Java file syntax
unidiff: a library for parsing git diff information

Using javalang for syntax parsing, it retrieves information from each Java file such as import, class, extends, implements, declarators, methods, etc. The results of the Java file parsing are stored using sqlite3, divided into several tables like project, class, import, field, methods, each storing corresponding information. Then, SQL queries are used to call methods.

unidiff is used to parse git diff information (diff file, added_line_num, removed_line_num). Then, based on the lines of code added or removed in the files, it determines which classes and methods are affected, continuously traversing affected classes and methods until the top-level controller layer is found.

SQLite3 Table Structure

The table structure is as follows:

CREATE TABLE project (
    project_id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
    project_name TEXT NOT NULL,
    git_url TEXT NOT NULL,
    branch TEXT NOT NULL,
    commit_or_branch_new TEXT NOT NULL,
    commit_or_branch_old TEXT,
    create_at TIMESTAMP NOT NULL DEFAULT (datetime('now','localtime'))
);

CREATE TABLE class (
    class_id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
    filepath TEXT,
    access_modifier TEXT,
    class_type TEXT NOT NULL,
    class_name TEXT NOT NULL,
    package_name TEXT NOT NULL,
    extends_class TEXT,
    project_id INTEGER NOT NULL,
    implements TEXT,
    annotations TEXT,
    documentation TEXT,
    is_controller REAL,
    controller_base_url TEXT,
    commit_or_branch TEXT,
    create_at TIMESTAMP NOT NULL DEFAULT (datetime('now','localtime'))
);

CREATE TABLE import (
    import_id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
    class_id INTEGER NOT NULL,
    project_id INTEGER NOT NULL,
    start_line INTEGER,
    end_line INTEGER,
    import_path TEXT,
    is_static REAL,
    is_wildcard REAL,
    create_at TIMESTAMP NOT NULL DEFAULT (datetime('now','localtime'))
);

CREATE TABLE field (
    field_id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
    class_id INTEGER,
    project_id INTEGER NOT NULL,
    annotations TEXT,
    access_modifier TEXT,
    field_type TEXT,
    field_name TEXT,
    is_static REAL,
    start_line INTEGER,
    end_line INTEGER,
    documentation TEXT,
    create_at TIMESTAMP NOT NULL DEFAULT (datetime('now','localtime'))
);


CREATE TABLE methods (
    method_id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
    class_id INTEGER NOT NULL,
    project_id INTEGER NOT NULL,
    annotations TEXT,
    access_modifier TEXT,
    return_type TEXT,
    method_name TEXT NOT NULL,
    parameters TEXT,
    body TEXT,
    method_invocation_map TEXT,
    is_static REAL,
    is_abstract REAL,
    is_api REAL,
    api_path TEXT,
    start_line INTEGER NOT NULL,
    end_line INTEGER NOT NULL,
    documentation TEXT,
    create_at TIMESTAMP NOT NULL DEFAULT (datetime('now','localtime'))
);

This section primarily introduces the method_invocation_map
field in the methods table, which stores the classes and methods used by the parsed method, facilitating subsequent queries on which methods used a certain class or method. An example of what is stored in the method_invocation_map field:

{
    "com.XXX.Account": {
        "entity": {
            "return_type": true
        }
    },
    "com.XXXX.AccountService": {
        "methods": {
            "functionAA(Map<String#Object>)": [303]
        },
        "fields": {
            "fieldBB": [132]
        }
    }
}

Analyzing Calls

This part of the main logic is changed to analyze which methods call the modified classes or methods, querying the results via SQL. Example code:

SELECT
    *
FROM
    methods
WHERE
    project_id = 1
    AND (json_extract(method_invocation_map,
    '$."com.xxxx.ProductUtil".methods."convertProductMap(QueryForCartResponseDTO)"') IS NOT NULL
    OR json_extract(method_invocation_map,
    '$."com.xxxx.ProductUtil".methods."convertProductMap(null)"') IS NOT NULL)

Display Method

Previously, the display method used tree diagrams and relational graphs. However, the tree diagrams did not clearly show the links, and the coordinates of the nodes in the relational graphs were not reasonable. This part has also been optimized. The node coordinates are calculated based on the node relationships. The longer the relationship link, the larger the horizontal coordinate, making the display clearer. Example code:

def max_relationship_length(relationships):
    if not relationships:
        return {}
    # Build adjacency list
    graph = {}
    for relationship in relationships:
        source = relationship['source']
        target = relationship['target']
        if source not in graph:
            graph[source] = []
        if target not in graph:
            graph[target] = []
        graph[source].append(target)

    # BFS Traverse and calculate the longest path length from each node to the starting point
    longest_paths = {node: 0 for node in graph.keys()}
    graph_keys = [node for node in graph.keys()]
    longest_paths[graph_keys[0]] = 0
    queue = deque([(graph_keys[0], 0)])
    while queue:
        node, path_length = queue.popleft()
        if not graph.get(node) and not queue and graph_keys.index(node) + 1 < len(graph_keys):
            next_node = graph_keys[graph_keys.index(node) + 1]
            next_node_path_length = longest_paths[next_node]
            queue.append((next_node, next_node_path_length))
            continue
        for neighbor in graph.get(node, []):
            if path_length + 1 > longest_paths[neighbor]:
                longest_paths[neighbor] = path_length + 1
                queue.append((neighbor, path_length + 1))
    return longest_paths

Three Analysis Methods

JCCI can analyze three different scenarios: comparing two commits on the same branch, analyzing a specified class, and analyzing features across two branches. Example:

from path.to.jcci.src.jcci.analyze import JCCI

# Comparison of different commits on the same branch
commit_analyze = JCCI('git@xxxx.git', 'username1')
commit_analyze.analyze_two_commit('master','commit_id1','commit_id2')

# Analyze the method impact of a class. The last parameter of the analyze_class_method method is the number of lines where the method is located. The number of lines of different methods is separated by commas. If left blank, the impact of the complete class will be analyzed.
class_analyze = JCCI('git@xxxx.git', 'username1')
class_analyze.analyze_class_method('master','commit_id1', 'package\src\main\java\ClassA.java', '20,81')

# Compare different branches
branch_analyze = JCCI('git@xxxx.git', 'username1')
branch_analyze.analyze_two_branch('branch_new','branch_old')

Flexible Configuration

You can configure the sqlite3 database storage path, project code storage path, and files to ignore during parsing in the config file. Example:

db_path = os.path.dirname(os.path.abspath(__file__))
project_path = os.path.dirname(os.path.abspath(__file__))
ignore_file = ['*/pom.xml', '*/test/*', '*.sh', '*.md', '*/checkstyle.xml', '*.yml', '.git/*']

Conclusion

Project URL: JCCI
Everyone is welcome to try it out and provide feedback.
Looking forward to your stars~
Contact information and more can be found in the GitHub project's readme.
Thank you~~

DEV Community

Java Code Change Impact Analysis

Background

Display Effect

Implementation

SQLite3 Table Structure

Analyzing Calls

Display Method

Three Analysis Methods

Flexible Configuration

Conclusion

Top comments (0)

Read next

Manage a multiple websites server with Docker, Traefik and auto SSL certificates

Mastering Spring Cloud Gateway Testing: Predicates (part 1)

Password Managers - The Future of Logins.

Hoppscotch vs Postman - Why choose Hoppscotch?