matias yoon

Posted on May 24

Building a Terminal AI Agent (v7)

#ai #llm #developers #tutorial


An AI agent that runs in the terminal is a practical way for developers to speed up coding. This guide covers how to build a terminal AI agent on top of a local LLM and how to integrate it into a real development workflow.

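1. The CLI AI Agent Ecosystem

Several options currently exist for CLI AI agents:

Aider: A Python-based coding agent that is driven by natural language and integrates closely with Git.

Continue.dev: An open-source coding assistant. It is primarily an IDE extension (VS Code, JetBrains) rather than a terminal-only tool.

OpenCode: An open-source CLI tool that is compatible with a variety of LLMs.

Custom scripts: Python scripts you write yourself, which can be tailored to specific requirements.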

2. Setting Up a Local LLM API Endpoint

To set up an API endpoint for a local LLM, follow these steps:

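# 1. Install llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make  # newer releases build with CMake instead: cmake -B build && cmake --build build

# 2. Download the model (this file is already in GGUF format, so no conversion step is needed)
wget https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/resolve/main/mistral-7b-v0.1.Q4_K_M.gguf

# 3. Start the server (the binary is named llama-server in newer releases)
./server -m ./mistral-7b-v0.1.Q4_K_M.gguf -c 2048 --host 127.0.0.1 --port 8080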
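
Once the server is running, a quick request confirms that the endpoint responds. The following is a minimal sketch using only the Python standard library. It assumes the server exposes the OpenAI-compatible `/v1/chat/completions` route on the host and port used above; the helper names `build_chat_request` and `ask` are illustrative and not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: host and port match the server command above.
BASE_URL = "http://127.0.0.1:8080/v1"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": "local",  # placeholder model name, as used elsewhere in this guide
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """Send the prompt to the local server and return the reply text."""
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, calling `ask("Say hello in one sentence.")` should return a short reply; a connection error usually means the server is not listening on that port.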

3. Building a Python CLI Agent

Here is a simple example of a Python CLI agent:

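# ai_agent.py
import json
from typing import Dict, List

import openai

class TerminalAgent:
    def __init__(self):
        # The local server speaks the OpenAI API and does not check the key.
        self.client = openai.OpenAI(
            base_url="http://localhost:8080/v1",
            api_key="sk-no-key-required"
        )
        self.system_prompt = (
            "You are a helpful AI coding assistant. You can:\n"
            "1. Read and analyze code files\n"
            "2. Generate new code based on requirements\n"
            "3. Fix bugs in existing code\n"
            "4. Explain code concepts\n"
        )

    def run_function_call(self, functions: List[Dict], messages: List[Dict]) -> str:
        # The `functions`/`function_call` parameters are deprecated in the
        # OpenAI API, so wrap each definition in the `tools` format instead.
        # Whether tool calls are produced depends on server and model support.
        tools = [{"type": "function", "function": f} for f in functions]
        response = self.client.chat.completions.create(
            model="local",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        message = response.choices[0].message
        # When the model requests a tool, `content` is None, so return the
        # requested call as JSON rather than printing "None".
        if message.tool_calls:
            call = message.tool_calls[0].function
            return json.dumps({"name": call.name, "arguments": call.arguments})
        return message.content or ""

# Usage example
if __name__ == "__main__":
    agent = TerminalAgent()
    functions = [
        {
            "name": "read_file",
            "description": "Read content of a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "filename": {"type": "string"}
                },
                "required": ["filename"]
            }
        }
    ]

    messages = [
        {"role": "system", "content": agent.system_prompt},
        {"role": "user", "content": "Read my main.py file"}
    ]
    print(agent.run_function_call(functions, messages))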
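
The example above only asks the model which function to call; the script still has to run that function itself. A minimal sketch of that step follows, assuming the model returns the arguments as a JSON string, as the OpenAI-compatible API does. The `execute_read_file` helper, the project-root check, and the size limit are hypothetical additions, not part of the original agent.

```python
import json
from pathlib import Path

MAX_CHARS = 20_000  # assumption: cap how much file text is passed back to the model


def execute_read_file(arguments_json: str, root: str = ".") -> str:
    """Run a `read_file` tool call and return text suitable for a tool message."""
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return f"error: arguments are not valid JSON ({exc})"
    if not isinstance(args, dict):
        return "error: arguments must be a JSON object"

    filename = args.get("filename")
    if not isinstance(filename, str) or not filename:
        return "error: 'filename' is required"

    root_path = Path(root).resolve()
    target = (root_path / filename).resolve()
    # Refuse paths that escape the project root, such as "../../etc/passwd".
    if root_path != target and root_path not in target.parents:
        return "error: path is outside the project root"
    if not target.is_file():
        return f"error: {filename} not found"
    return target.read_text(encoding="utf-8", errors="replace")[:MAX_CHARS]
```

The returned string would then be appended to `messages` as a tool result so the model can answer using the file's content.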

4. Integrating with tmux

With tmux, you can run the agent in a split-pane terminal layout:

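# Create a detached tmux session
tmux new-session -s ai_agent -d

# Run the agent inside the session
tmux send-keys -t ai_agent "python ai_agent.py" Enter

# Split the window into left and right panes
tmux split-window -h -t ai_agent

# Split the active pane into top and bottom panes
tmux split-window -v -t ai_agent

# Attach to the session
tmux attach -t ai_agent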
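
The same layout can be scripted so that it is reproducible from Python. The sketch below simply wraps the tmux commands listed above with `subprocess`; the function names `tmux_layout_commands` and `start_layout` are illustrative assumptions.

```python
import shutil
import subprocess
from typing import List


def tmux_layout_commands(session: str, agent_cmd: str) -> List[List[str]]:
    """Return the tmux invocations from the listing above as argv lists."""
    return [
        ["tmux", "new-session", "-s", session, "-d"],
        ["tmux", "send-keys", "-t", session, agent_cmd, "Enter"],
        ["tmux", "split-window", "-h", "-t", session],
        ["tmux", "split-window", "-v", "-t", session],
    ]


def start_layout(session: str = "ai_agent", agent_cmd: str = "python ai_agent.py") -> None:
    """Create the detached session and its panes; attach afterwards with tmux attach."""
    if shutil.which("tmux") is None:
        raise RuntimeError("tmux is not installed or not on PATH")
    for argv in tmux_layout_commands(session, agent_cmd):
        # check=True stops at the first failing command, e.g. a duplicate session name.
        subprocess.run(argv, check=True)
```

Passing argv lists instead of a shell string avoids quoting problems when the agent command contains spaces.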

5. Developing Custom Tools

The following custom tools provide code search and Git integration:

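# custom_tools.py
import os
import subprocess

class CodeSearchTool:
    def __init__(self):
        self.root_path = os.getcwd()

    def search_code(self, pattern: str) -> str:
        try:
            result = subprocess.run(
                # -e keeps a pattern that starts with "-" from being parsed as an option.
                ["grep", "-r", "-e", pattern, self.root_path],
                capture_output=True,
                text=True,
                timeout=10
            )
        except subprocess.TimeoutExpired:
            return "Search timed out"
        # grep exits with 1 when nothing matches and with 2 or higher on errors.
        if result.returncode > 1:
            return f"Search failed: {result.stderr.strip()}"
        return result.stdout

    def git_status(self) -> str:
        result = subprocess.run(["git", "status", "--porcelain"],
                                capture_output=True, text=True, cwd=self.root_path)
        return result.stdout

# Git tool
class GitTool:
    @staticmethod
    def git_commit(message: str) -> str:
        # Note: -a stages modified and deleted tracked files only; new files need `git add`.
        try:
            subprocess.run(["git", "commit", "-am", message],
                           check=True, capture_output=True, text=True)
            return "Commit successful"
        except subprocess.CalledProcessError as e:
            # git prints "nothing to commit" on stdout and other errors on stderr.
            return f"Commit failed: {(e.stderr or e.stdout).strip()}"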
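
For the model to call these tools, each one needs a schema in the same shape as the `functions` list from section 3, plus a way to route a call by name. A minimal sketch follows. The schema descriptions, `TOOL_SCHEMAS`, and `dispatch` are illustrative assumptions rather than part of the original tools.

```python
import json
from typing import Callable, Dict

# Hypothetical schemas describing the two tools to the model.
TOOL_SCHEMAS = [
    {
        "name": "search_code",
        "description": "Search the project for a pattern",
        "parameters": {
            "type": "object",
            "properties": {"pattern": {"type": "string"}},
            "required": ["pattern"],
        },
    },
    {
        "name": "git_commit",
        "description": "Commit tracked changes with a message",
        "parameters": {
            "type": "object",
            "properties": {"message": {"type": "string"}},
            "required": ["message"],
        },
    },
]


def dispatch(name: str, arguments_json: str, registry: Dict[str, Callable[..., str]]) -> str:
    """Look up a tool by name, check its required arguments, and call it."""
    schema = next((s for s in TOOL_SCHEMAS if s["name"] == name), None)
    if schema is None or name not in registry:
        return f"error: unknown tool {name!r}"
    try:
        args = json.loads(arguments_json or "{}")
    except json.JSONDecodeError:
        return "error: arguments are not valid JSON"
    if not isinstance(args, dict):
        return "error: arguments must be a JSON object"
    missing = [key for key in schema["parameters"]["required"] if key not in args]
    if missing:
        return f"error: missing arguments: {', '.join(missing)}"
    # Drop any keys the schema does not declare before calling the tool.
    allowed = schema["parameters"]["properties"]
    return registry[name](**{k: v for k, v in args.items() if k in allowed})
```

In the agent, the registry would map `"search_code"` to `CodeSearchTool().search_code` and `"git_commit"` to `GitTool.git_commit`. Returning error strings instead of raising lets the model see what went wrong and retry.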

6. Managing the Context Window

How to manage the context window in a large codebase:

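# context_manager.py
import os
from typing import List, Optional

import tiktoken

class ContextManager:
    def __init__(self, max_tokens: int = 2048):
        self.max_tokens = max_tokens
        # tiktoken implements OpenAI tokenizers, so for a local model such as
        # Mistral these counts are only an approximation.
        self.encoding = tiktoken.encoding_for_model("gpt-4")

    def count_tokens(self, text: str) -> int:
        # disallowed_special=() treats strings like "<|endoftext|>" as plain
        # text instead of raising an error.
        return len(self.encoding.encode(text, disallowed_special=()))

    def truncate_context(self, context: str, max_tokens: Optional[int] = None) -> str:
        if max_tokens is None:
            max_tokens = self.max_tokens

        tokens = self.encoding.encode(context, disallowed_special=())
        if len(tokens) <= max_tokens:
            return context

        truncated_tokens = tokens[:max_tokens]
        return self.encoding.decode(truncated_tokens)

    def smart_context(self, files: List[str]) -> str:
        context = ""
        used_tokens = 0
        for file in files:
            if os.path.exists(file):
                with open(file, 'r', encoding='utf-8') as f:
                    content = f.read()
                block = f"\n=== {file} ===\n{content}\n"
                # Compare token counts, not character counts, against the budget.
                block_tokens = self.count_tokens(block)
                if used_tokens + block_tokens > self.max_tokens:
                    break
                context += block
                used_tokens += block_tokens
        return context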
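
When the files do not all fit, stopping at the first file that exceeds the budget can leave most of the budget unused. One alternative, sketched below, is to skip a file that is too large and keep filling the budget with the remaining ones. The `pack_files` helper and the whitespace-based counter are assumptions for illustration; in practice the counter would be the tokenizer's own, such as `len(encoding.encode(text))`.

```python
from typing import Callable, Dict, List, Tuple


def count_words(text: str) -> int:
    """Stand-in token counter (assumption): whitespace-separated words."""
    return len(text.split())


def pack_files(
    files: Dict[str, str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = count_words,
) -> Tuple[str, List[str]]:
    """Greedily pack file contents into a token budget.

    Returns the assembled context and the names of the files that were skipped.
    """
    parts: List[str] = []
    skipped: List[str] = []
    used = 0
    for name, content in files.items():
        block = f"\n=== {name} ===\n{content}\n"
        cost = count_tokens(block)
        if used + cost > max_tokens:
            # Too large for the remaining budget; try the next file instead of stopping.
            skipped.append(name)
            continue
        parts.append(block)
        used += cost
    return "".join(parts), skipped
```

Reporting the skipped files makes it possible to tell the model, or the user, which parts of the codebase it has not seen.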

7. Local vs. API Model Optimization

Ways to optimize performance and cost:

# 1. Quantize the model (the tool is named llama-quantize in newer releases)
./quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M

# 2. Run with GPU offload (--n-gpu-layers sets how many layers are placed on the GPU)
./server -m ./model-q4_k_m.gguf -c 2048 --n-gpu-layers 35 --host 127.0.0.1 --port 8080

# 3. Run on the CPU only
./server -m ./model-q4_k_m.gguf -c 2048 --n-gpu-layers 0 --host 127.0.0.1 --port 8080
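
A rough size estimate helps when choosing a quantization level and how many layers to offload. The sketch below uses the approximation weights ≈ parameters × bits per weight / 8. The helper names are illustrative, the equal-layer-size split is an assumption, and the figures ignore the KV cache and runtime overhead, so the results are best read as lower bounds.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def offloaded_gb(total_gb: float, n_gpu_layers: int, n_layers: int) -> float:
    """Approximate GPU share of the weights, assuming equally sized layers."""
    # Requesting more layers than the model has simply offloads all of them.
    fraction = min(max(n_gpu_layers, 0), n_layers) / n_layers
    return total_gb * fraction
```

For example, a 7-billion-parameter model comes to about 14 GB at 16 bits per weight and about 3.5 GB at 4 bits per weight. K-quant formats such as Q4_K_M use somewhat more than 4 bits per weight on average, so real files are larger than the 4-bit figure.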

8. A Real Workflow Example

Here is an example of a real development workflow:

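# 1. Developing a new feature
# - Start the agent in the terminal
tmux new-session -s dev -d
tmux send-keys -t dev "python ai_agent.py" Enter

# - Request code generation
# (the user types in the requirement)
# "Add a new API endpoint to this project"

# - Review the generated code and commit it
git add .
git commit -m "Add new API endpoint for user management"

# 2. Fixing a bug
# - The agent analyzes the code
# - It identifies where the bug is
# - It generates the corrected code

# 3. Documentation
# - Generate code explanations
# - Generate API documentation automatically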

Advanced Optimization Tips

# 1. Model cache directory
# Create ~/.cache/llm to hold cached models
mkdir -p ~/.cache/llm

# 2. Process monitoring
# Monitor GPU memory usage
watch -n 1 nvidia-smi

# 3. Automatic restart script
# restart_agent.sh
#!/bin/bash
if ! pgrep -f "server.*gguf" > /dev/null; then
    echo "Starting server..."
    ./server -m ./mistral-7b-v0.1.Q4_K_M.gguf -c 2048 --host 127.0.0.1 --port 8080 &
fi
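
The `pgrep` check only confirms that a process exists, not that it is accepting connections. A complementary check, sketched here in Python under the assumption that the server listens on the host and port used throughout this guide, is to try opening a TCP connection. `is_port_open` is a hypothetical helper name.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 8080, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

An agent script could call this before sending its first request and print a clear message when the server is down, instead of failing with a connection error later.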

📥 Get the full guide on Gumroad: https://gumroad.com/l/auto ($5)
