Zakaria Maaraki

Posted on Apr 30, 2023

How to develop an online code compiler using Java and Docker.

#docker #java #microservices #programming

Have you ever wondered how platforms such as Codeforces and LeetCode operate? How do they compile and run many users' code against test cases? How do they determine the efficiency of algorithms?
In this article, we will delve into the process of building a highly effective problem-solving platform.

The source code for this article can be found in my GitHub repository: Sourcecode

Specification

Functional

Our platform should be:

Able to support multiple programming languages.
Able to execute user code against multiple test cases.
Able to return a correct verdict after the execution. The possible verdicts are Accepted, Wrong Answer, Time Limit Exceeded, Memory Limit Exceeded, Runtime Error, and Compilation Error.
Able to return a detailed error to the user if the verdict is Time Limit Exceeded, Compilation Error, Runtime Error, or Memory Limit Exceeded.
Able to return the compilation duration.
Able to return the execution duration for each test case.

Non-Functional

Our platform should:

Execute multiple requests concurrently.
Isolate execution environments (malicious user code should not be able to access the host machine).
Stop the code from running once it exceeds the time limit.
Compile the user code once per request and execute it multiple times against the test cases.
Prevent the user from accessing the host file system.

Interface

Example of input:

{
    "testCases": {
      "test1": {
        "input": "<YOUR_INPUT>",
        "expectedOutput": "<YOUR_EXPECTED_OUTPUT>"
      },
      "test2": {
        "input": "<YOUR_INPUT>",
        "expectedOutput": "<YOUR_EXPECTED_OUTPUT>"
      },
      ...
    },
    "sourceCode": "<YOUR_SOURCE_CODE>",
    "language": "JAVA",
    "timeLimit": 15,  
    "memoryLimit": 500 
}

Examples of outputs:

{
    "verdict": "Accepted",
    "statusCode": 100,
    "error": "",
    "testCasesResult": {
      "test1": {
        "verdict": "Accepted",
        "verdictStatusCode": 100,
        "output": "0 1 2 3 4 5 6 7 8 9",
        "error": "", 
        "expectedOutput": "0 1 2 3 4 5 6 7 8 9",
        "executionDuration": 175
      },
      "test2": {
        "verdict": "Accepted",
        "verdictStatusCode": 100,
        "output": "9 8 7 1",
        "error": "" ,
        "expectedOutput": "9 8 7 1",
        "executionDuration": 273
      },
      ...
    },
    "compilationDuration": 328,
    "averageExecutionDuration": 183,
    "timeLimit": 1500,
    "memoryLimit": 500,
    "language": "JAVA",
    "dateTime": "2022-01-28T23:32:02.843465"
}

{
    "verdict": "Runtime Error",
    "statusCode": 600,
    "error": "panic: runtime error: integer divide by zero\n\ngoroutine 1 [running]:\nmain.main()\n\t/app/main.go:11 +0x9b\n",
    "testCasesResult": {
      "test1": {
        "verdict": "Accepted",
        "verdictStatusCode": 100,
        "output": "0 1 2 3 4 5 6 7 8 9",
        "error": "", 
        "expectedOutput": "0 1 2 3 4 5 6 7 8 9",
        "executionDuration": 175
      },
      "test2": {
        "verdict": "Runtime Error",
        "verdictStatusCode": 600,
        "output": "",
        "error": "panic: runtime error: integer divide by zero\n\ngoroutine 1 [running]:\nmain.main()\n\t/app/main.go:11 +0x9b\n" ,
        "expectedOutput": "9 8 7 1",
        "executionDuration": 0
      }
    },
    "compilationDuration": 328,
    "averageExecutionDuration": 175,
    "timeLimit": 1500,
    "memoryLimit": 500,
    "language": "GO",
    "dateTime": "2022-01-28T23:32:02.843465"
}

Implementation

Separate environments of executions

To separate execution environments, we can use containers. The idea is to take the user-provided source code, build a Docker image that includes everything about the execution (time limit, memory limit, source code, test cases, etc.), and run a container from that image for each test case. Depending on the container's exit code, we can determine the outcome of the execution (Accepted, Wrong Answer, Time Limit Exceeded, Memory Limit Exceeded, Runtime Error, Compilation Error).
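
The exit-code idea can be sketched as follows. This is a minimal illustration, not the repository's implementation: the `VerdictResolver` class and the `SketchVerdict` enum are hypothetical names. Exit code 124 is what GNU `timeout` returns when the time limit fires; treating 137 and 139 (128 plus the SIGKILL and SIGSEGV signal numbers) as memory failures is an assumption that depends on how the limit is enforced. Compilation Error is left out because it would be decided in the compilation step, before any test case runs.

```java
/** Hypothetical verdict names, taken from the list in the specification. */
enum SketchVerdict {
    ACCEPTED, WRONG_ANSWER, TIME_LIMIT_EXCEEDED, MEMORY_LIMIT_EXCEEDED, RUNTIME_ERROR
}

final class VerdictResolver {

    // GNU timeout exits with 124 when the command runs past the limit.
    static final int TIMEOUT_EXIT_CODE = 124;
    // Assumption: 128 + signal number for SIGKILL (9) and SIGSEGV (11).
    static final int KILLED_EXIT_CODE = 137;
    static final int SEGFAULT_EXIT_CODE = 139;

    private VerdictResolver() {}

    /** Maps a container exit code and its captured output to a verdict. */
    static SketchVerdict resolve(int exitCode, String output, String expectedOutput) {
        if (exitCode == TIMEOUT_EXIT_CODE) {
            return SketchVerdict.TIME_LIMIT_EXCEEDED;
        }
        if (exitCode == KILLED_EXIT_CODE || exitCode == SEGFAULT_EXIT_CODE) {
            return SketchVerdict.MEMORY_LIMIT_EXCEEDED;
        }
        if (exitCode != 0) {
            return SketchVerdict.RUNTIME_ERROR;
        }
        // The program ended normally: compare outputs, ignoring surrounding whitespace.
        return normalize(output).equals(normalize(expectedOutput))
                ? SketchVerdict.ACCEPTED
                : SketchVerdict.WRONG_ANSWER;
    }

    private static String normalize(String text) {
        return text == null ? "" : text.trim();
    }
}
```

For example, `VerdictResolver.resolve(0, "9 8 7 1\n", "9 8 7 1")` yields `ACCEPTED`, while any non-zero exit code other than the three special ones is reported as `RUNTIME_ERROR`.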

Some benefits of using containers:

Isolation: Containers provide a way to isolate applications from one another, as well as from the host system. This can help to prevent conflicts and improve security.
Portability: Containers package all of the dependencies required for an application to run, making it easy to move the application between different environments.
Consistency: Because containers package all of the dependencies required for an application to run, they help to ensure that the application behaves consistently across different environments.
Scalability: Containers can be easily scaled up or down to meet changing demand, making it easy to manage resources and ensure that applications are always running at optimal performance.
Cost-effectiveness: Using containers can help to reduce the cost of running and managing applications, as they are lightweight and require fewer resources than traditional virtual machines.
Flexibility: Containers can be deployed in a variety of environments, including on-premises, in the cloud, or in a hybrid environment, making them very flexible.

As shown in the image above, we need two types of containers: compilation containers and execution containers. Each request builds one image for each of these types, then creates one container instance from the compilation image and multiple instances (one per test case) from the execution image.

Compilation Containers

This type of container compiles the source code into a binary. These containers are special because they share a volume with the main service.

Example:

FROM openjdk:11.0.6-jdk-slim

WORKDIR /app

ENTRYPOINT ["/bin/sh", "-c", "javac -d $EXECUTION_PATH $EXECUTION_PATH/$SOURCE_CODE_FILE_NAME && rm $EXECUTION_PATH/$SOURCE_CODE_FILE_NAME"]
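
To make the shared volume concrete, here is a rough sketch of how the service might assemble the `docker run` command for the compilation container. The `CompilationCommand` class is hypothetical, and mounting the volume at `/app` is an assumption based on the `WORKDIR` above. The `EXECUTION_PATH` and `SOURCE_CODE_FILE_NAME` variables are the ones the entrypoint reads.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds the docker run command for a compilation container. */
final class CompilationCommand {

    private CompilationCommand() {}

    static List<String> build(String volume,
                              String imageName,
                              String containerName,
                              String executionPath,
                              String sourceCodeFileName) {
        List<String> command = new ArrayList<>();
        command.add("docker");
        command.add("run");
        command.add("--name");
        command.add(containerName);
        // Share the directory holding the source file with the container
        // (assumption: it is mounted at the image's working directory).
        command.add("-v");
        command.add(volume + ":/app");
        // Environment variables consumed by the ENTRYPOINT.
        command.add("-e");
        command.add("EXECUTION_PATH=" + executionPath);
        command.add("-e");
        command.add("SOURCE_CODE_FILE_NAME=" + sourceCodeFileName);
        command.add(imageName);
        return command;
    }
}
```

The resulting list can be handed to `new ProcessBuilder(command)`. The sketch deliberately omits `--rm` so the container can still be inspected after it exits and deleted afterwards.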

Execution Containers

This type of container holds all the information about the execution and is run once for each test case. It is isolated: it doesn't share a volume with any application or container.

Example:

FROM openjdk:11.0.6-jre-slim

WORKDIR /app

USER root

RUN groupadd -r user -g 111 && \
    useradd -u 111 -r -g user -s /sbin/nologin -c "Docker image user" user

ADD . .

RUN chmod a+x ./entrypoint-*.sh

USER user

ENTRYPOINT ["/bin/sh", "-c", "./entrypoint-$TEST_CASE_ID.sh"]

As the Dockerfile shows, the name of the container's entrypoint script ends with the TEST_CASE_ID (entrypoint-$TEST_CASE_ID.sh). The application generates one such script per test case from a template.

#!/usr/bin/env bash

ulimit -s [(${compiler.memoryLimit})]
timeout -s SIGTERM [(${compiler.timeLimit})] [(${compiler.executionCommand})]
exit $?

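The template contains the time limit and the memory limit allowed for each test case. The application substitutes them when it generates the script: timeout enforces the time limit, and ulimit -s applies the memory limit value as the maximum stack size.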
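
As an illustration of the generation step, the sketch below fills in the placeholders with plain string replacement. It is a simplified stand-in: the placeholder syntax resembles a template engine's inlined expressions, and the real application presumably uses such an engine. The `EntrypointRenderer` class and the sample execution command are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical renderer that turns the entrypoint template into a concrete script. */
final class EntrypointRenderer {

    static final String TEMPLATE =
            "#!/usr/bin/env bash\n"
          + "\n"
          + "ulimit -s [(${compiler.memoryLimit})]\n"
          + "timeout -s SIGTERM [(${compiler.timeLimit})] [(${compiler.executionCommand})]\n"
          + "exit $?\n";

    private EntrypointRenderer() {}

    static String render(int memoryLimit, int timeLimit, String executionCommand) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("compiler.memoryLimit", String.valueOf(memoryLimit));
        values.put("compiler.timeLimit", String.valueOf(timeLimit));
        values.put("compiler.executionCommand", executionCommand);

        String script = TEMPLATE;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            // String.replace performs a literal (non-regex) substitution.
            script = script.replace("[(${" + entry.getKey() + "})]", entry.getValue());
        }
        return script;
    }
}
```

Calling `EntrypointRenderer.render(500, 15, "java Main")` produces a script whose third line is `ulimit -s 500` and whose fourth line is `timeout -s SIGTERM 15 java Main`.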

Security policy

For security reasons, and to prevent the user from accessing the file system, we can use security policies.

In Java, security policies control access to system resources, such as files and network connections, for Java applications. The Java security manager is responsible for enforcing these policies. It can be configured to grant or deny permissions to specific code based on the code's origin, such as its location on the file system or its digital signature.

grant {
  permission java.io.FilePermission "/tmp/test.txt", "read,write";
  permission java.net.SocketPermission "www.example.com:80", "connect";
};

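The above policy can be set as a command-line argument when starting the JVM. A policy file is only enforced when a security manager is installed, so enable it as well:

java -Djava.security.manager -Djava.security.policy=mypolicy.policy MyApp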

User request

User input will look like this:

@Getter
@NoArgsConstructor
@EqualsAndHashCode
@AllArgsConstructor
public class Request {

    /**
     * The Source code.
     */
    @ApiModelProperty(notes = "The sourcecode")
    @NonNull
    @JsonProperty("sourcecode")
    protected String sourcecode;

    /**
     * The Language.
     */
    @ApiModelProperty(notes = "The programming language")
    @NonNull
    @JsonProperty("language")
    protected Language language;

    /**
     * The Time limit.
     */
    @ApiModelProperty(notes = "The time limit in sec")
    @NonNull
    @JsonProperty("timeLimit")
    protected int timeLimit;

    /**
     * The Memory limit.
     */
    @ApiModelProperty(notes = "The memory limit")
    @NonNull
    @JsonProperty("memoryLimit")
    protected int memoryLimit;

    /**
     * The Test cases.
     */
    @ApiModelProperty(notes = "The test cases")
    @NonNull
    @JsonProperty("testCases")
    protected LinkedHashMap<String, TestCase> testCases; // Note: test cases should be given in order
}

For each test case we'll have an input and an expected output:

@Getter
@AllArgsConstructor
@EqualsAndHashCode
public class TestCase {

    @ApiModelProperty(notes = "The input, can be null")
    @JsonProperty("input")
    private String input;

    @ApiModelProperty(notes = "The expected output, can not be null")
    @NonNull
    @JsonProperty("expectedOutput")
    private String expectedOutput;
}

Compilation Strategy

Here is a code snippet showing how compilation is done for compiled languages. We can use the strategy pattern to choose which algorithm should be used for compiled or interpreted languages.

    @Override
    public CompilationResponse compile(Execution execution) {

        // repository name must be lowercase
        String compilationImageName = IMAGE_PREFIX_NAME + execution.getLanguage().toString().toLowerCase();

        // If the app is running inside a container, we should share the same volume with the compilation container.
        final String volume = compilationContainerVolume.isEmpty()
                                    ? System.getProperty("user.dir")
                                    : compilationContainerVolume;

        String sourceCodeFileName = execution.getSourceCodeFile().getOriginalFilename();

        String containerName = COMPILATION_CONTAINER_NAME_PREFIX + execution.getImageName();

        var processOutput = new AtomicReference<ProcessOutput>();
        compilationTimer.record(() -> {
            processOutput.set(
                    compile(volume, compilationImageName, containerName, execution.getPath(), sourceCodeFileName)
            );
        });

        ProcessOutput compilationOutput = processOutput.get();

        int compilationDuration = compilationOutput.getExecutionDuration();

        ContainerInfo containerInfo = containerService.inspect(containerName);
        ContainerHelper.logContainerInfo(containerName, containerInfo);

        Verdict verdict = getVerdict(compilationOutput);

        compilationDuration = ContainerHelper.getExecutionDuration(
                                                    containerInfo == null ? null : containerInfo.getStartTime(),
                                                    containerInfo == null ? null : containerInfo.getEndTime(),
                                                    compilationDuration);

        ContainerHelper.deleteContainer(containerName, containerService, threadPool);

        return CompilationResponse
                .builder()
                .verdict(verdict)
                .error(compilationOutput.getStdErr())
                .compilationDuration(compilationDuration)
                .build();
    }
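
The strategy selection itself can be sketched in a few lines. Everything here is hypothetical: the `SketchExecutionType` values, the two strategy classes, and the assumption that interpreted languages simply skip the compilation step. It only shows the shape of the pattern, not the project's actual classes.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical execution types; the real ExecutionType values may differ. */
enum SketchExecutionType { COMPILED, INTERPRETED }

/** One algorithm per kind of language, behind a common interface. */
interface CompilationStrategy {
    String compile(String sourceCodeFileName);
}

final class CompiledLanguageStrategy implements CompilationStrategy {
    @Override
    public String compile(String sourceCodeFileName) {
        // A real implementation would start a compilation container here.
        return "compiled " + sourceCodeFileName;
    }
}

final class InterpretedLanguageStrategy implements CompilationStrategy {
    @Override
    public String compile(String sourceCodeFileName) {
        // Assumption: nothing to compile for interpreted languages.
        return "skipped " + sourceCodeFileName;
    }
}

final class CompilationStrategies {

    private static final Map<SketchExecutionType, CompilationStrategy> STRATEGIES =
            new EnumMap<>(SketchExecutionType.class);

    static {
        STRATEGIES.put(SketchExecutionType.COMPILED, new CompiledLanguageStrategy());
        STRATEGIES.put(SketchExecutionType.INTERPRETED, new InterpretedLanguageStrategy());
    }

    private CompilationStrategies() {}

    /** Picks the strategy for the given execution type. */
    static CompilationStrategy forType(SketchExecutionType type) {
        return STRATEGIES.get(type);
    }
}
```

The caller only depends on `CompilationStrategy`, so supporting a new kind of language means adding one class and one map entry.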

Execution Strategy

Here is a code snippet on how an execution is done.

    public ExecutionResponse run(Execution execution, boolean deleteImageAfterExecution) {

        buildContainerImage(execution);

        var testCasesResult = new LinkedHashMap<String, TestCaseResult>();
        Verdict verdict = null;
        String err = "";

        for (ConvertedTestCase testCase : execution.getTestCases()) {

            TestCaseResult testCaseResult = executeTestCase(execution, testCase);

            testCasesResult.put(testCase.getTestCaseId(), testCaseResult);

            verdict = testCaseResult.getVerdict();

            log.info("Status response for the test case {} is {}", testCase.getTestCaseId(), verdict.getStatusResponse());

            // Update metrics
            verdictsCounters.get(verdict.getStatusResponse()).increment();

            if (verdict != Verdict.ACCEPTED) {
                // Don't continue if the current test case failed
                log.info("Test case id: {} failed, abort executions", testCase.getTestCaseId());
                err = testCaseResult.getError();
                break;
            }
        }

        // Delete container image asynchronously
        if (deleteImageAfterExecution) {
            ContainerHelper.deleteImage(execution.getImageName(), containerService, threadPool);
        }

        return ExecutionResponse
                .builder()
                .verdict(verdict)
                .testCasesResult(testCasesResult)
                .error(err)
                .build();
    }

    private TestCaseResult executeTestCase(Execution execution,
                                           ConvertedTestCase testCase) {

        log.info("Start running test case id = {}", testCase.getTestCaseId());

        String expectedOutput = testCase.getExpectedOutput();

        // Free memory space
        testCase.freeMemorySpace();

        var result = new AtomicReference<TestCaseResult>();
        executionTimer.record(() -> {
            // Run the execution container
            result.set(runContainer(execution, testCase.getTestCaseId(), expectedOutput));
        });

        TestCaseResult testCaseResult = result.get();
        return testCaseResult;
    }

Each programming language has its own execution parameters and specific configuration. To abstract over this, we can apply the dependency inversion principle by creating Execution classes with the abstract factory pattern.

Abstract Factory

The Abstract Factory pattern is a design pattern that provides a way to create families of related or dependent objects without specifying their concrete classes. Client code depends only on the factory interface, so a new family (here, a new programming language) can be added without changing the code that requests the objects.

@FunctionalInterface
public interface AbstractExecutionFactory {

    /**
     * Create execution.
     *
     * @param sourceCode  the source code
     * @param testCases   the test cases
     * @param timeLimit   the time limit
     * @param memoryLimit the memory limit
     * @return the execution
     */
    Execution createExecution(MultipartFile sourceCode,
                              List<ConvertedTestCase> testCases,
                              int timeLimit,
                              int memoryLimit);
}

public abstract class ExecutionFactory {

    private static Map<Language, ExecutionType> registeredExecutionTypes = new EnumMap<>(Language.class);

    private static Map<Language, AbstractExecutionFactory> registeredFactories = new EnumMap<>(Language.class);

    private ExecutionFactory() {}

    /**
     * Register.
     *
     * @param language the language
     * @param factory  the factory
     */
    public static void registerExecution(Language language, AbstractExecutionFactory factory) {
        registeredFactories.putIfAbsent(language, factory);
    }

    /**
     * Register execution type.
     *
     * @param language      the language
     * @param executionType the execution type
     */
    public static void registerExecutionType(Language language, ExecutionType executionType) {
        registeredExecutionTypes.putIfAbsent(language, executionType);
    }

    /**
     * Gets execution type.
     *
     * @param language the language
     * @return the execution type
     */
    public static ExecutionType getExecutionType(Language language) {
        return registeredExecutionTypes.get(language);
    }

    /**
     * Gets registered factories.
     *
     * @return the registered factories
     */
    public static Set<Language> getRegisteredFactories() {
        return registeredFactories
                .keySet()
                .stream()
                .collect(Collectors.toSet());
    }

    /**
     * Gets execution.
     *
     * @param sourceCode  the source code
     * @param testCases   the test cases
     * @param timeLimit   the time limit
     * @param memoryLimit the memory limit
     * @param language    the language
     * @return the execution
     */
    public static Execution createExecution(MultipartFile sourceCode,
                                            List<ConvertedTestCase> testCases,
                                            int timeLimit,
                                            int memoryLimit,
                                            Language language) {
        AbstractExecutionFactory factory = registeredFactories.get(language);
        if (factory == null) {
            throw new FactoryNotFoundException("No ExecutionFactory registered for the language " + language);
        }

        return factory.createExecution(
                sourceCode,
                testCases,
                timeLimit,
                memoryLimit);
    }
}

All languages can be registered in a config class.

   private void configureLanguages() {
        // Register factories
        register(Language.JAVA,
                (sourceCode, testCases, timeLimit, memoryLimit) -> new JavaExecution(
                        sourceCode,
                        testCases,
                        timeLimit,
                        memoryLimit,
                        ExecutionFactory.getExecutionType(Language.JAVA)));

        register(Language.PYTHON,
                (sourceCode, testCases, timeLimit, memoryLimit) -> new PythonExecution(
                        sourceCode,
                        testCases,
                        timeLimit,
                        memoryLimit,
                        ExecutionFactory.getExecutionType(Language.PYTHON)));
...

For more information about the Execution class and how it creates an execution environment, see the Execution class in the repository.
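
To see the registration and lookup flow end to end, here is a trimmed-down, self-contained version of the same idea. All names (`SketchLanguage`, `SketchExecution`, `SketchExecutionFactory`, `SketchRegistry`) are hypothetical, and the factory signature is shortened to two parameters for brevity.

```java
import java.util.EnumMap;
import java.util.Map;

enum SketchLanguage { JAVA, PYTHON, GO }

/** Stand-in for the Execution class: only exposes the command it would run. */
interface SketchExecution {
    String runCommand();
}

@FunctionalInterface
interface SketchExecutionFactory {
    SketchExecution create(int timeLimit, int memoryLimit);
}

final class SketchRegistry {

    private static final Map<SketchLanguage, SketchExecutionFactory> FACTORIES =
            new EnumMap<>(SketchLanguage.class);

    private SketchRegistry() {}

    /** The first registration wins, mirroring putIfAbsent in the article's code. */
    static void register(SketchLanguage language, SketchExecutionFactory factory) {
        FACTORIES.putIfAbsent(language, factory);
    }

    static SketchExecution create(SketchLanguage language, int timeLimit, int memoryLimit) {
        SketchExecutionFactory factory = FACTORIES.get(language);
        if (factory == null) {
            throw new IllegalArgumentException("No factory registered for " + language);
        }
        return factory.create(timeLimit, memoryLimit);
    }
}
```

A caller might register `(timeLimit, memoryLimit) -> () -> "timeout " + timeLimit + " java Main"` for `SketchLanguage.JAVA` and then call `SketchRegistry.create(SketchLanguage.JAVA, 15, 500)`. Asking for a language that was never registered fails fast with an exception.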

How to compute the compilation and the execution durations

We can leverage the docker inspect command to get all the details about a container (creation date, start date, status, end date, exit status...).

Docker Inspect

You can use the docker inspect command by specifying the container or image ID, or the container or image name, as the argument.

For example, to inspect a container named "my_container", you would run the following command:

docker inspect my_container

You can also use the --format option to display only specific fields or to format the output in a specific way.

docker inspect --format='{{json .Config}}' my_container
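
For durations specifically, the relevant fields are `.State.StartedAt` and `.State.FinishedAt`, which `docker inspect --format='{{.State.StartedAt}} {{.State.FinishedAt}}' my_container` prints as RFC 3339 timestamps. The sketch below turns two such timestamps into milliseconds. The `ContainerDuration` class is hypothetical, and falling back to a separately measured value when the timestamps are unusable is an assumption about how one might handle missing data.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.format.DateTimeParseException;

/** Hypothetical helper that derives a duration from docker inspect timestamps. */
final class ContainerDuration {

    private ContainerDuration() {}

    /**
     * Returns the time between startedAt and finishedAt in milliseconds,
     * or fallbackMillis when the timestamps are missing, malformed, or
     * out of order (for example when the container has not finished yet).
     */
    static long millis(String startedAt, String finishedAt, long fallbackMillis) {
        if (startedAt == null || finishedAt == null) {
            return fallbackMillis;
        }
        try {
            long elapsed = Duration
                    .between(Instant.parse(startedAt), Instant.parse(finishedAt))
                    .toMillis();
            return elapsed >= 0 ? elapsed : fallbackMillis;
        } catch (DateTimeParseException e) {
            return fallbackMillis;
        }
    }
}
```

With a start of `2022-01-28T23:32:02.515465000Z` and an end of `2022-01-28T23:32:02.843465000Z`, the method returns 328.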

For more details, see the complete source code of the application.

Other things available in the codebase

Helm charts to deploy the service in Kubernetes.
Provisioning of the infrastructure on Azure using an ARM template.
Local execution using docker-compose, including communication between RabbitMQ and Apache Kafka.

Conclusion

Creating a problem-solving platform can be a challenging task, but containers make the process much more manageable. With benefits such as isolation, portability, consistency, scalability, and cost-effectiveness, it's easy to see why they are an excellent choice for building a powerful problem-solving platform. So, whether you're a coding enthusiast looking to sharpen your skills or a business looking to improve your operations, don't hesitate to give containers a try. You'll be glad you did! And remember, as the saying goes, "Containers are like Legos for your code", so have fun building your own problem-solving platform. The possibilities are endless!
