Continuing the articles AI-Powered Repository Security Check with Antigravity Workflow and https://dev.to/gdg/how-to-build-a-custom-ai-quality-gate-on-cloud-run-from-zero-to-production-1odp, I decided to try outsourcing some checks to a local LLM.
This article describes my experiment and its outcomes. I will be glad to read your questions, proposals, opinions, or advice!
You can listen to a podcast generated from this publication (thanks, NotebookLM):
Intro
Recent changes in limit management for popular LLM APIs made me think about FinOps. Why should I spend expensive cloud tokens on simple tasks? I also had many conversations at recent security and AI events, which led me to start experimenting with local LLMs for code generation and code quality checks.
Hardware
The hardware for the experiments is a MacBook Air M5 with 24 GB of RAM. I bought it specifically for diving into ML topics, but it had been underused until now.
Pains
The first pain was the introduction of new limits for the Antigravity IDE. Together with changes to the list of available models, it made me think about optimizing my development and security flows, which were designed to use cheaper Antigravity tokens before falling back to more expensive Vertex AI tokens.
The second pain was FOMO about Machine Learning and MLOps itself.
Solution Track
After some iterations with Ollama and local models, I selected qwen2.5-coder:14b-instruct-q5_K_M as the base model and gave it a custom context window:
% cat Modelfile-qwen-32k
FROM qwen2.5-coder:14b-instruct-q5_K_M
PARAMETER num_ctx 32000
% ollama create qwen-coder-32k -f ./Modelfile-qwen-32k
...
% ollama list
NAME ID SIZE MODIFIED
qwen-coder-32k:latest dc3c4762d967 10 GB 2 hours ago
qwen-coder-64k:latest 42f060e717dd 10 GB 2 hours ago
qwen2.5-coder:14b-instruct-q5_K_M 05d16c5ac1c1 10 GB 2 hours ago
gemma4:e4b c6eb396dbd59 9.6 GB 25 hours ago
gemma4:e2b 7fbdbf8f5e45 7.2 GB 25 hours ago
The 32k window gave me reasonably quick execution and an acceptable trade-off between speed and laptop temperature. I expect to keep experimenting with this configuration in the near future.
Then I realized that I had to decompose the tasks and give my hardware some rest between requests. So the unified script was born:
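To compare context windows more systematically, the Modelfile variants could be generated instead of written by hand. This is only a minimal sketch: the `write_modelfile` helper, the `MODELFILE_DIR` scratch directory, and the list of context sizes are assumptions for illustration and are not part of my setup.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original article): writes a Modelfile
# that derives a model from a base model with the given context size.
write_modelfile() {
    local base_model="$1" num_ctx="$2" out_file="$3"
    printf 'FROM %s\nPARAMETER num_ctx %s\n' "$base_model" "$num_ctx" > "$out_file"
}

# Assumption: the variants are written to a scratch directory, and the
# context sizes below are only example values to compare.
MODELFILE_DIR="${MODELFILE_DIR:-$(mktemp -d)}"
for ctx in 16000 32000 64000; do
    write_modelfile "qwen2.5-coder:14b-instruct-q5_K_M" "$ctx" \
        "$MODELFILE_DIR/Modelfile-qwen-$((ctx / 1000))k"
done
ls "$MODELFILE_DIR"

# Each variant could then be registered the same way as above, for example:
#   ollama create qwen-coder-16k -f "$MODELFILE_DIR/Modelfile-qwen-16k"
```

Keeping one Modelfile per context size makes it easy to rerun the same review against each variant and compare elapsed time and laptop temperature.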
#!/bin/bash

# Default values
OUTPUT_DIR="."
MODEL_NAME="qwen-coder-32k"
COEFF=2
PROMPT_FILE=""

show_help() {
    echo "Usage: $0 -d <directory> -m <file_mask> -p <prompt_file> [OPTIONS]"
    echo ""
    echo "Required parameters:"
    echo "  -d  Directory for searching files"
    echo "  -m  File mask to check"
    echo "  -p  Path to a text file with system prompt (e.g., prompts/strict_table.txt)"
    echo ""
    echo "Optional parameters:"
    echo "  -o  Directory to save the final report (default: current directory)"
    echo "  -e  Exclude directories (comma-separated, e.g., venv,tests,migration)"
    echo "  -f  Exclude file masks (comma-separated, e.g., *test*,__init__.py)"
    echo "  -c  Cooldown delay multiplier (default: 2)"
    exit 1
}

# Argument parsing
while getopts "d:m:o:e:f:c:p:h" opt; do
    case "$opt" in
        d) SRC_DIR="$OPTARG" ;;
        m) FILE_MASK="$OPTARG" ;;
        o) OUTPUT_DIR="$OPTARG" ;;
        e) EXCLUDE_DIRS="$OPTARG" ;;
        f) EXCLUDE_FILES="$OPTARG" ;;
        c) COEFF="$OPTARG" ;;
        p) PROMPT_FILE="$OPTARG" ;;
        h) show_help ;;
        *) show_help ;;
    esac
done

# Check required parameters
if [ -z "$SRC_DIR" ] || [ -z "$FILE_MASK" ] || [ -z "$PROMPT_FILE" ]; then
    echo "Error: Required parameters -d, -m, or -p are missing."
    show_help
fi

# Check if prompt file exists
if [ ! -f "$PROMPT_FILE" ]; then
    echo "Error: Prompt file '$PROMPT_FILE' not found!"
    exit 1
fi

# Check Ollama: fail only if neither the process nor the local API is found
if ! pgrep -x "ollama" > /dev/null && ! curl -s http://localhost:11434 > /dev/null; then
    echo "Error: Ollama is not running!"
    exit 1
fi

# Check jq
if ! command -v jq &> /dev/null; then
    echo "Error: 'jq' utility is not installed. Run: brew install jq"
    exit 1
fi

# Initialize report directory
mkdir -p "$OUTPUT_DIR"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
REPORT_FILE="$OUTPUT_DIR/review_report_$TIMESTAMP.md"

# Write report header
{
    echo "# 🛡️ Review Report"
    echo "Generation date: $(date)"
    echo "Used prompt: \`$PROMPT_FILE\`"
    echo -e "\n---\n"
} > "$REPORT_FILE"

echo "=================================================================="
echo "🕵️‍♂️ Starting review..."
echo "Final report will be saved to: $REPORT_FILE"
echo "=================================================================="

# Build find command
FIND_CMD="find \"$SRC_DIR\" -type f -name \"$FILE_MASK\""
if [ -n "$EXCLUDE_DIRS" ]; then
    IFS=',' read -ra DIRS <<< "$EXCLUDE_DIRS"
    FOR_FIND=""
    for dir in "${DIRS[@]}"; do
        if [ -z "$FOR_FIND" ]; then
            FOR_FIND="-path '*/$dir/*'"
        else
            FOR_FIND="$FOR_FIND -o -path '*/$dir/*'"
        fi
    done
    FIND_CMD="find \"$SRC_DIR\" \( $FOR_FIND \) -prune -o -type f -name \"$FILE_MASK\" -print"
fi

# Start main file processing loop
# (IFS= keeps leading/trailing whitespace in file names intact)
eval "$FIND_CMD" | while IFS= read -r file; do
    if [ ! -f "$file" ]; then continue; fi

    # Check file exclusions
    if [ -n "$EXCLUDE_FILES" ]; then
        IFS=',' read -ra FILE_MASKS <<< "$EXCLUDE_FILES"
        skip_file=false
        for mask in "${FILE_MASKS[@]}"; do
            # $mask is left unquoted on purpose so it is matched as a glob pattern
            if [[ "$(basename "$file")" == $mask ]]; then
                skip_file=true
                break
            fi
        done
        if [ "$skip_file" = true ]; then
            echo "Skipping file (excluded by mask): $file"
            continue
        fi
    fi

    echo -n "⏳ Analyzing: $file ... "

    # Read code and strip '#' comments and empty lines
    CLEANED_CODE=$(sed -e 's/[[:space:]]*#.*//' -e '/^[[:space:]]*$/d' "$file")
    if [ -z "$CLEANED_CODE" ]; then
        echo "⚠️ Empty."
        continue
    fi

    # Write file section to report
    {
        echo "## File: $file"
        echo -e "\n### Analysis results:\n"
    } >> "$REPORT_FILE"

    # Read external prompt and combine with code.
    # printf is used because "\n" inside double quotes stays a literal
    # backslash-n in bash and would reach the model unexpanded.
    SYSTEM_PROMPT=$(cat "$PROMPT_FILE")
    FULL_PROMPT=$(printf '%s\n\n--- TARGET CODE ---\n%s' "$SYSTEM_PROMPT" "$CLEANED_CODE")
    JSON_PAYLOAD=$(jq -n --arg model "$MODEL_NAME" --arg prompt "$FULL_PROMPT" '{model: $model, prompt: $prompt, stream: false}')

    # Measure time and send API request
    START_TIME=$(date +%s)
    curl -s -X POST http://localhost:11434/api/generate -H "Content-Type: application/json" -d "$JSON_PAYLOAD" | jq -r '.response' >> "$REPORT_FILE"
    END_TIME=$(date +%s)

    ELAPSED=$((END_TIME - START_TIME))
    SLEEP_TIME=$((ELAPSED * COEFF))
    echo -e "\n\n---\n\n" >> "$REPORT_FILE"
    echo "✅ Elapsed: ${ELAPSED}s. Rest: ${SLEEP_TIME}s."

    # Let the hardware cool down before the next request
    if [ "$SLEEP_TIME" -gt 0 ]; then
        sleep "$SLEEP_TIME"
    fi
done

echo "=================================================================="
echo "Review successfully completed!"
echo "=================================================================="
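The script assembles its `find` command as a string and runs it through `eval`, which works but is sensitive to quotes and special characters in paths. As a possible alternative, here is a sketch that builds the same arguments as a bash array. The `list_files` function and its parameter order are hypothetical and not part of the script above.

```shell
#!/bin/bash
# Hypothetical alternative to the string + eval approach: build the find
# arguments as a bash array so spaces and quotes in paths stay intact.
list_files() {
    local src_dir="$1" file_mask="$2" exclude_dirs="$3"
    local -a args=("$src_dir")
    if [ -n "$exclude_dirs" ]; then
        local -a dirs prune=()
        local dir
        IFS=',' read -ra dirs <<< "$exclude_dirs"
        for dir in "${dirs[@]}"; do
            # Join the -path tests with -o, as the original script does.
            if [ "${#prune[@]}" -gt 0 ]; then prune+=(-o); fi
            prune+=(-path "*/$dir/*")
        done
        args+=( \( "${prune[@]}" \) -prune -o )
    fi
    args+=(-type f -name "$file_mask" -print)
    find "${args[@]}"
}

# Usage (same meaning as -d scripts -m 'setup*' -e venv,tests):
#   list_files scripts 'setup*' 'venv,tests'
```

The output could be piped into the same `while IFS= read -r file` loop, so the rest of the script would not need to change.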
The logic of the script:
- Get info about which files to check and where they are stored.
- Get the file with the prompt content.
- Get optional parameters for filtering, output location, and delays between requests.
- For each file:
  - Read the file and strip content that carries no meaning for the review, such as comments and empty lines.
  - Send the file content to the local LLM along with the prompt.
  - Receive the result and save it to the report.
  - Measure the processing time for the file and sleep for twice that time (by default) to let the hardware cool down.
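The cleaning step is easiest to understand on a small input. The sketch below wraps the same two `sed` expressions in a hypothetical `clean_code` function and runs them on a made-up sample file; the sample content is an assumption used only for this demonstration.

```shell
#!/bin/bash
# Same sed expressions as in the script, wrapped in a function for illustration.
clean_code() {
    sed -e 's/[[:space:]]*#.*//' -e '/^[[:space:]]*$/d' "$1"
}

# Hypothetical sample file used only for this demonstration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#!/bin/bash
# Create the bucket
gsutil mb "gs://${BUCKET_NAME}"   # trailing comment

echo "Items: ${#ITEMS[@]}"
EOF

clean_code "$sample"
# Prints:
#   gsutil mb "gs://${BUCKET_NAME}"
#   echo "Items: ${
rm -f "$sample"
```

The last output line shows a limitation of this simple approach: everything after the first `#` on a line is dropped, even when the `#` is part of the code, as in `${#ITEMS[@]}`. For languages that do not use `#` for comments, the expressions would need to be adapted.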
Outcomes
Execution Flow
(venv) % ./scripts/repo-check-1.sh -d scripts -m 'setup*' -p scripts/prompt-infrasec.txt
==================================================================
🕵️‍♂️ Starting review...
Final report will be saved to: ./review_report_20260521_121530.md
==================================================================
⏳ Analyzing: scripts/setup-quality-gate-iam.sh ... ✅ Elapsed: 6s. Rest: 12s.
⏳ Analyzing: scripts/setup-gcp-details.sh ... ✅ Elapsed: 95s. Rest: 190s.
⏳ Analyzing: scripts/setup-gcp.sh ... ✅ Elapsed: 128s. Rest: 256s.
==================================================================
Review successfully completed!
==================================================================
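The rest times in this run follow directly from the elapsed time multiplied by the cooldown coefficient. Here is a small sketch of that arithmetic; the `cooldown_seconds` helper is hypothetical and only mirrors the `SLEEP_TIME=$((ELAPSED * COEFF))` line of the script.

```shell
#!/bin/bash
# Hypothetical helper mirroring the script's SLEEP_TIME=$((ELAPSED * COEFF)).
cooldown_seconds() {
    local elapsed="$1" coeff="${2:-2}"
    echo $((elapsed * coeff))
}

cooldown_seconds 6 2     # prints 12
cooldown_seconds 95 2    # prints 190
cooldown_seconds 128 2   # prints 256

# Total wall-clock time for the run: every file costs elapsed + rest.
total=0
for elapsed in 6 95 128; do
    total=$((total + elapsed + $(cooldown_seconds "$elapsed" 2)))
done
echo "Total: ${total}s"  # prints "Total: 687s"
```

Because bash arithmetic is integer-only, a fractional coefficient such as `-c 1.5` would cause an arithmetic error, so the coefficient should be a whole number.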
Report
Analysis results:
| Finding / Vulnerability | Recommendation / Fix |
|---|---|
| Assigning public access (legacyObjectReader) to GCS bucket | Remove the line gsutil iam ch allUsers:legacyObjectReader "gs://${BUCKET_NAME}" to prevent making the bucket publicly accessible. Consider using more restrictive permissions based on your security requirements. |
| Hardcoded service account name in the script | Avoid hardcoding sensitive information like service account names. Instead, retrieve them from a secure source or use environment variables. |
| Missing encryption settings for GCS bucket | Ensure that the GCS bucket is encrypted by default. Add the --encryption flag to the gsutil mb command if you want to specify a specific encryption type, such as --encryption=DEFAULT. |
| No logging and monitoring configurations | Implement logging and monitoring for the resources created. Enable Cloud Logging and Monitoring to track access and usage of the secrets and GCS bucket. |
| Using automatic replication policy for secrets | Consider using a more controlled replication policy for secrets. Automatic replication might not be necessary for all use cases, and you should evaluate whether it aligns with your security and compliance requirements. |
| Lack of error handling for secret creation | Add proper error handling when creating the secret to ensure that any issues during the creation process are caught and addressed appropriately. |
| No version control for secrets | Ensure that secrets have a versioning strategy in place. This allows you to manage changes and roll back to previous versions if needed. |
| Potential for misconfiguration of IAM roles | Double-check the IAM roles being assigned to ensure they align with the principle of least privilege. Avoid assigning broader permissions than necessary for the service account. |
Conclusion
The results look very promising:
- The elapsed time is quite acceptable for me.
- The local LLM's answer is quite similar to what cloud LLMs produce, and it was achieved without prompt tuning or additional context manipulation.
Further steps planned:
- Experiment with models, context windows, prompts and additional contexts.
- Check whether it will work on some kind of local SOHO server for batch tasks.