piko::tutorial

Posted on • Originally published at pikotutorial.com

A 40-line LLM-based bash command executor in Python

One interesting use case for local LLMs is a natural language interface for generating terminal commands. Instead of memorizing command syntax, flags and shell quirks, you can simply describe what you want to do, for example:

Find all PNG files larger than 5 MB and move them to archive

The tool not only uses an LLM to translate that request into a real shell command, but also executes it.

In this article, I will build a minimal implementation of such a tool in Python - a lightweight command-line assistant called piko::ai. The entire Python code fits in fewer than 40 lines, yet it provides:

  • natural language to bash conversion
  • configurable LLM backends
  • structured JSON responses
  • dangerous command detection
  • and automatic command execution

The implementation uses a local LLM through Ollama, but the architecture is flexible enough to support almost any provider.

Why build a tool like this?

Shell commands are incredibly powerful, but complex, chained or piped commands are difficult to remember and prone to human error. Even experienced developers regularly have to look up rarely used command switches.

Large language models are surprisingly good at translating intent into shell syntax, so the goal of this tool is to let the user think in actions instead of commands. For example:

find all jpg files modified in the last 7 days and compress them into images.tar.gz

should generate:

find . -type f -name "*.jpg" -mtime -7 | tar -czf images.tar.gz -T -

For simple commands there is no justification for writing long sentences, but as soon as you start searching the internet or asking an AI chatbot things like:

  • is it -mtime or -dtime?
  • was it -czf or -xvf for creating the archive?
  • should it be -T or --files-from, or are both fine?

it may actually be quicker to type the request directly in the terminal and get the result right there as well.

The architecture

The entire flow is extremely simple:

  1. User types a natural language request.
  2. Python inserts that request into a prompt template.
  3. The prompt is sent to an LLM.
  4. The LLM returns JSON containing a shell command.
  5. Python checks whether the command is potentially dangerous.
  6. If approved, the command is executed.

If you have never written anything like this, you may be surprised how little code is actually required.
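As a rough illustration of the six steps above, the flow can be sketched as a single function with the LLM call passed in as a parameter. The names `run_pipeline` and `fake_llm` are hypothetical and exist only for this sketch; the real script further down talks to the model over HTTP instead.

```python
import json
from typing import Callable


def run_pipeline(
    user_request: str,
    prompt_template: str,
    ask_llm: Callable[[str], str],
    dangerous_commands: list[str],
) -> tuple[str, bool]:
    """Return the generated command and whether it needs confirmation."""
    # Step 2: insert the request into the prompt template
    prompt = prompt_template.replace("@USER_REQUEST@", user_request)
    # Steps 3-4: the LLM answers with JSON containing a "command" field
    raw_answer = ask_llm(prompt)
    command = json.loads(raw_answer)["command"]
    # Step 5: flag the command if it contains any configured substring
    needs_confirmation = any(item in command for item in dangerous_commands)
    return command, needs_confirmation


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for the real HTTP call to the model
    return json.dumps({"command": "ls -la ."})


command, needs_confirmation = run_pipeline(
    "list all files here",
    "User request:\n@USER_REQUEST@",
    fake_llm,
    ["rm"],
)
print(command, needs_confirmation)
```

Step 6 (execution) is left out on purpose, so the sketch can be run without touching any files.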

The Python implementation

For the TL;DR readers, below is the complete implementation. If you want to test it out, you can clone the piko_ai GitHub repository. The sections below explain in detail what this code does:

import os
import sys
import requests
import json
import subprocess

MAIN_FILE_PATH: str = os.path.join(os.path.dirname(os.path.abspath(__file__)))
PIKO_AI_CONFIG_FILE_PATH: str = os.path.join(MAIN_FILE_PATH, "..", "config", "pai_config.json")
PIKO_AI_PROMPT_FILE_PATH: str = os.path.join(MAIN_FILE_PATH, "..", "config", "pai_prompt.txt")

with open(PIKO_AI_CONFIG_FILE_PATH) as file:
    config = json.load(file)
with open(PIKO_AI_PROMPT_FILE_PATH) as file:
    prompt_template: str = file.read()

user_request: str = " ".join(sys.argv[1:])
prompt: str = prompt_template.replace("@USER_REQUEST@", user_request)

request_payload: dict = config["llm_request"]
request_payload["prompt"] = prompt

response = requests.post(config["llm_provider_url"], json=request_payload, timeout=config["llm_request_timeout"])
response.raise_for_status()

command: str = json.loads(response.json()["response"])["command"]

print(f"$ {command}")

for dangerous_command in config["dangerous_commands"]:
    if dangerous_command in command:
        user_response: str = input(f"WARNING: dangerous command detected ({dangerous_command})! Are you sure you want to run it? (y/n)")

        if user_response == "y":
            break

        sys.exit(0)

subprocess.run(command, shell=True)

Let’s break it down piece by piece.


Step 0: setting up constants

import os          # for joining paths
import sys         # for command line arguments and exit
import requests    # for sending requests to LLM
import json        # for JSON parsing
import subprocess  # for executing generated commands

# directory containing this file, which serves as the reference for further paths
MAIN_FILE_PATH: str = os.path.join(os.path.dirname(os.path.abspath(__file__)))
# path to configuration file
PIKO_AI_CONFIG_FILE_PATH: str = os.path.join(MAIN_FILE_PATH, "..", "config", "pai_config.json")
# path to file with the prompt template
PIKO_AI_PROMPT_FILE_PATH: str = os.path.join(MAIN_FILE_PATH, "..", "config", "pai_prompt.txt")

Step 1: loading configuration and prompt files

with open(PIKO_AI_CONFIG_FILE_PATH) as file:
    config = json.load(file)

with open(PIKO_AI_PROMPT_FILE_PATH) as file:
    prompt_template: str = file.read()

Even in the smallest tools, I always like to separate:

  • application logic
  • configuration
  • prompts

This separation makes the tool flexible enough to support:

  • Ollama
  • OpenAI-compatible APIs
  • local inference servers
  • cloud providers
  • completely custom backends

For this example, however, I will only use a local Ollama model.
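Because the behavior lives in the configuration file, a missing key only shows up when the script reaches the line that reads it. A minimal sketch of an early check is shown below; `load_config` and `REQUIRED_KEYS` are hypothetical names that are not part of the original script, and the key list is simply what the code in this article reads from `config`.

```python
import json

# Keys that the script in this article reads from the configuration file
REQUIRED_KEYS = (
    "llm_request",
    "llm_provider_url",
    "llm_request_timeout",
    "dangerous_commands",
)


def load_config(text: str) -> dict:
    """Parse the configuration text and fail early if a key is missing."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing configuration keys: {', '.join(missing)}")
    return config


# Illustrative configuration with made-up minimal values
sample = json.dumps({
    "llm_request": {"model": "qwen2.5-coder:1.5b"},
    "llm_provider_url": "http://localhost:11434/api/generate",
    "llm_request_timeout": 60,
    "dangerous_commands": ["rm"],
})
config = load_config(sample)
print(sorted(config))
```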

Step 2: reading the user request

The request is simply reconstructed from the command-line arguments. For more complex command-line interfaces argparse could come in handy, but for this tool a one-liner is enough to concatenate all arguments into a single string, without forcing the user to wrap the request in quotes. Note that the shell still processes the arguments first, so a request containing characters such as *, > or | needs quoting anyway.

user_request: str = " ".join(sys.argv[1:])

So:

pai find all files larger than 55kB

becomes:

"find all files larger than 55kB"

That text is then injected into the prompt template, and the resulting prompt is assigned to the prompt field of request_payload.

prompt: str = prompt_template.replace("@USER_REQUEST@", user_request)

request_payload: dict = config["llm_request"]
request_payload["prompt"] = prompt
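One possible reason to prefer a plain `str.replace` with a marker like `@USER_REQUEST@` over `str.format` is that both prompts and shell requests tend to contain curly braces. The template text below is made up for this illustration and is not the article's real prompt.

```python
# Illustrative template and request, both containing curly braces
template = 'Answer as JSON like {"command": "..."}\nUser request:\n@USER_REQUEST@'
user_request = "delete every tmp file found by find using -exec rm {} +"

# Plain replacement leaves every brace untouched
prompt = template.replace("@USER_REQUEST@", user_request)
print(prompt)

# str.format would treat the braces in the template as replacement fields
try:
    template.format(user_request=user_request)
except (KeyError, ValueError, IndexError) as error:
    print(f"str.format failed: {type(error).__name__}")
```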

Step 3: prompt engineering

The prompt template looks like this:

You are a shell command generator.
User requests from you a bash command in a human readable language and your job is to convert this request into a bash command that the user can immediately invoke.

Requirements:
- If not stated differently, always assume that the command must be executed in the current working directory (.)
- The command must be syntactically valid
- The command must be fully executable
- Prefer a single grep command over pipelines when possible
- Command must be usable out of the box, so don't provide any mock values, but e.g. when the user says "here", it means "."

User request:
@USER_REQUEST@

This prompt emphasizes several important aspects.

Establishing assumptions

This line:

always assume that the command must be executed in the current working directory

greatly improves the usability of small models, because users naturally say things like:

find here all markdown files

instead of:

find all markdown files in /home/user/Documents/applications/private_projects/docs

Biasing command style

This instruction:

Prefer a single grep command over pipelines when possible

helps shape output quality. Small LLMs often overcomplicate shell commands and, because of their size, make mistakes in these overcomplicated commands. Prompt constraints can push them toward simpler solutions.

Preventing placeholders

Without instructions like:

don't provide any mock values

models often generate unusable outputs like:

grep -r "keyword" /path/to/directory

instead of:

grep -r "keyword" .

The prompt explicitly asks for executable commands, not suggestions.



Step 4: sending the request to the LLM

response = requests.post(
    config["llm_provider_url"],
    json=request_payload,
    timeout=config["llm_request_timeout"]
)

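The configuration file defines the request payload and the output schema. The script also reads a dangerous_commands list from the same file (used in Step 6), which is not shown in the excerpt below.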

{
    "llm_request": {
        "model": "qwen2.5-coder:1.5b",
        "format": {
            "type": "object",
            "properties": {
                "command": {
                    "type": "string"
                }
            },
            "required": ["command"]
        },
        "prompt": "TO BE REPLACED BY THE ACTUAL PROMPT",
        "options": {
            "temperature": 0.1,
            "seed": 42
        },
        "stream": false
    },
    "llm_provider_url": "http://localhost:11434/api/generate",
    "model_name": "qwen2.5-coder:1.5b",
    "llm_request_timeout": 60
}

Several details here are especially important.

Structured JSON output

This is one of the most important implementation details. Instead of asking the model for plain text, the tool requests structured JSON like the one below:

{
    "command": "grep -r \"TODO\" ."
}

Without structured output, models may generate additional explanations or markdown formatting. Structured generation constrains the model to machine-readable output.

Low temperature and a fixed seed

"temperature": 0.1,
"seed": 42

Command generation is not creative writing, so we want predictable and reproducible outputs. A low temperature and a fixed seed reduce randomness in the generated command, so the same request tends to produce the same command. This lets the user learn the tool: even if it misbehaves for some phrasings of a request and works well for others, the repeatable input-output relation makes it possible to consistently get better at using it. With unpredictable, non-reproducible outputs, the user would never be able to tune the inputs toward the expected outputs.

Why qwen2.5-coder:1.5b?

Such a tool is supposed to be a reasonable alternative to typing the command yourself or searching for it and then invoking it. If processing a request took several minutes, any other approach (googling, copying, asking an AI chatbot etc.) would end up being faster. I therefore needed a small model that runs fast locally. Fortunately, the task is highly specialized, so even small code-focused models can handle it.

Step 5: parsing the model response

command: str = json.loads(response.json()["response"])["command"]

The model output arrives as a string in the response field of the reply, so it is parsed a second time to extract the bash command from the generated JSON. Then the tool prints it:

print(f"$ {command}")

Users should always see exactly what will be executed.
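The one-liner above assumes the reply always has the expected shape. A slightly more defensive variant could look like the sketch below; `extract_command` is a hypothetical helper that is not part of the original script, and the sample body only mimics the `response` field the script reads.

```python
import json


def extract_command(body: dict) -> str:
    """Pull the shell command out of the decoded reply body."""
    try:
        # "response" holds the model output as a string, which is itself JSON
        command = json.loads(body["response"])["command"]
    except (KeyError, TypeError, json.JSONDecodeError) as error:
        raise ValueError(f"unexpected LLM response: {error!r}") from error
    if not isinstance(command, str) or not command.strip():
        raise ValueError("LLM returned an empty command")
    return command.strip()


# Hand-written sample body, not captured from a real model
body = {"response": '{"command": "grep -r \\"TODO\\" ."}'}
print(extract_command(body))
```

In the script, this would be called as `extract_command(response.json())`, and a `ValueError` could be turned into a short error message instead of a traceback.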

Step 6: dangerous command detection

This may actually be the most important part of the implementation. The tool scans the generated command for operations defined as dangerous in the config file:

for dangerous_command in config["dangerous_commands"]:
    if dangerous_command in command:
        user_response: str = input(f"WARNING: dangerous command detected ({dangerous_command})! Are you sure you want to run it? (y/n)")

        if user_response == "y":
            break

        sys.exit(0)

Once a dangerous operation is detected, executing the command requires explicit confirmation from the user. Here, by "dangerous" I mean any command that modifies user data.

Warning: this implementation intentionally favors simplicity over perfect security. The current check is only substring matching, so it may overreact, e.g. if you have a folder name containing the letters "rm" next to each other.
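One way to reduce such false positives, sketched here as an alternative and not as what the tool does, is to compare whole shell words instead of substrings. The function name `find_dangerous_commands` is hypothetical. It uses the standard `shlex` lexer with `punctuation_chars=True`, so that `;`, `|` and `&&` separate words, and it is still only a heuristic: backticks or multi-word entries such as "git push" are not handled.

```python
import os
import shlex


def find_dangerous_commands(command: str, dangerous_commands: list[str]) -> list[str]:
    """Return the configured entries that appear as whole words in the command."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        tokens = list(lexer)
    except ValueError:
        # Unbalanced quotes: fall back to plain whitespace splitting
        tokens = command.split()
    # Compare base names so that /bin/rm is treated like rm
    words = {os.path.basename(token) for token in tokens}
    return [item for item in dangerous_commands if item in words]


print(find_dangerous_commands("ls ./firmware", ["rm", "mv"]))
print(find_dangerous_commands("ls;rm -rf build", ["rm", "mv"]))
```

The first call reports nothing even though "firmware" contains the letters "rm", while the second one still catches `rm` after the semicolon.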

Step 7: executing the command

Finally:

subprocess.run(command, shell=True)

executes the generated shell command. This is where the program goes from an AI suggestion to an actual tool, because the generated command is not only displayed, but actually run.
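The call above discards the result of the generated command. If the assistant should behave like the command it runs, for example inside scripts, the exit code can be passed through. This is a small sketch that assumes a POSIX shell; `run_command` is a hypothetical name, not part of the original script.

```python
import subprocess


def run_command(command: str) -> int:
    """Run the command through the shell and return its exit code."""
    # shell=True lets pipes, globs and redirections in the generated command work
    completed = subprocess.run(command, shell=True)
    return completed.returncode


exit_code = run_command("exit 3")
print(exit_code)
```

In the script, the last line could then become `sys.exit(run_command(command))`.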

Example usage

Command:

pai find all files larger than 1kB

Output:

$ find . -type f -size +1k
./main.py

Command:

pai list all .py files with find, but filter out findings from ./venv folder

Output:

$ find . -name '*.py' ! -path './venv/*'
./src/main.py

Command:

pai grep for all usages of MAIN_FILE_PATH

Output:

$ grep -r 'MAIN_FILE_PATH' .
./src/main.py:MAIN_FILE_PATH: str = os.path.join(os.path.dirname(os.path.abspath(__file__)))
./src/main.py:PIKO_AI_CONFIG_FILE_PATH: str = os.path.join(MAIN_FILE_PATH, "..", "config", "pai_config.json")
./src/main.py:PIKO_AI_PROMPT_FILE_PATH: str = os.path.join(MAIN_FILE_PATH, "..", "config", "pai_prompt.txt")
./scripts/install.sh:MAIN_FILE_PATH="$PIKO_AI_DIR/src/main.py"
./scripts/install.sh:ALIAS_LINE="alias pai='$VENV_DIR/bin/python3 $MAIN_FILE_PATH'"
