soy

Posted on • Originally published at media.patentllm.org

Giving a 'Brain' to Minecraft NPCs with a Local LLM — Nemotron + Mineflayer Implementation Notes

What We Want to Achieve

Traditional Minecraft bots have mostly relied on command-based operation; natural conversation with players and situation-aware decision-making remained out of reach. This article walks through a system in which a locally running LLM gives an NPC an automated situation awareness → decision-making → action loop. We run NVIDIA's Nemotron 9B model locally with vLLM, connect it to the Minecraft world via Mineflayer, and achieve flexible responses to player utterances.

System Architecture

This system consists of four layers.

```
Minecraft Server
  ↓
Mineflayer (Minecraft operation with Node.js)
  ↓ IPC (WebSocket/stdin)
brain.py (LLM integration with Python)
  ↓
vLLM (Local execution of Nemotron 9B)
```

Role of Each Component

  • Mineflayer: A Node.js library that connects to the Minecraft server and controls block operations and chat events.
  • brain.py: Collects situation awareness data (player position, inventory, etc.) and sends the context to the LLM.
  • vLLM: Performs high-speed inference for the Nemotron 9B model.
  • IPC Layer: Communication between Python and Node.js occurs via WebSocket or stdin/stdout. Since Python cannot directly call Mineflayer's API, inter-process communication is necessary.
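As a concrete illustration, the IPC messages can be plain JSON envelopes. The `type` field and the exact schema below are assumptions for this sketch, not part of Mineflayer's API:

```python
import json

# Hypothetical message envelope for the WebSocket/stdin channel.
# Node.js -> Python carries world state; Python -> Node.js carries a command.
def make_state_message(position, inventory):
    """Encode a world-state update as a JSON line (Node.js -> Python)."""
    return json.dumps({"type": "state", "position": position, "inventory": inventory})

def make_command_message(action, args):
    """Encode an action command as a JSON line (Python -> Node.js)."""
    return json.dumps({"type": "command", "action": action, "args": args})
```

Keeping both directions in one self-describing format makes the channel easy to log and replay while debugging.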

brain.py Implementation

Situation Awareness

Mineflayer on the Node.js side collects world state in JSON format and sends it to Python via WebSocket.

```python
import json
import websockets

async def receive_world_state(ws):
    """Receive world state from Node.js Mineflayer"""
    data = await ws.recv()
    state = json.loads(data)
    return state  # {"position": [x, y, z], "inventory": [...]}
```
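Incoming payloads are worth validating before they reach the LLM. A minimal checker (hypothetical, with `REQUIRED_KEYS` chosen for this sketch) might look like this:

```python
import json

# Keys the rest of brain.py depends on; reject payloads missing them early.
REQUIRED_KEYS = ("position", "inventory")

def parse_world_state(raw: str) -> dict:
    """Decode and sanity-check a world-state JSON payload from Node.js."""
    state = json.loads(raw)
    for key in REQUIRED_KEYS:
        if key not in state:
            raise ValueError(f"world state missing key: {key!r}")
    return state
```

Failing fast here keeps malformed messages from producing confusing prompts downstream.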

LLM Decision Making

The collected state is sent to vLLM to obtain a natural language decision. Since vLLM provides an OpenAI-compatible API, it can be connected using the openai library.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

def get_llm_decision(state, player_message):
    prompt = f"""Current Situation:
- Player Coordinates: {state['position']}
- Inventory: {state['inventory']}
The player said: '{player_message}'.
Please instruct the appropriate action in JSON format."""

    response = client.chat.completions.create(
        model="nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
        temperature=0.3
    )
    return response.choices[0].message.content
```

Action Execution

The LLM's decision result is received in JSON format and sent as a command to Mineflayer on the Node.js side via WebSocket.
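Model output is not guaranteed to be clean JSON; it may arrive wrapped in prose or a Markdown code fence. A defensive parser helps — this is a sketch, not the article's exact implementation:

```python
import json
import re

def extract_action(llm_output: str):
    """Pull the first JSON object out of the LLM's reply, tolerating
    surrounding prose or Markdown code fences; return None on failure."""
    match = re.search(r"\{.*\}", llm_output, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
```

Returning None instead of raising lets the bot fall back to a safe default action (for example, replying in chat) when the model output is unusable.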

Responding to Player Utterances

The processing flow when a player speaks is as follows:

  1. Mineflayer's chat event fires on the Node.js side.
  2. Situation awareness data is collected and sent to Python via WebSocket.
  3. brain.py sends "conversation context + player utterance" to vLLM.
  4. The response is returned to Node.js via WebSocket and posted to Minecraft chat.
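The steps above can be sketched as a single handler. Here `llm_fn` stands in for the vLLM call, so the flow can be exercised without a running model; the function and its signature are assumptions for this sketch:

```python
import json

def handle_chat(state: dict, player_message: str, llm_fn) -> str:
    """Build the conversation context, ask the LLM, and return the JSON
    command string to forward to Node.js. `llm_fn` is injected so any
    backend (vLLM, a stub for tests, ...) can be plugged in."""
    context = (
        f"Player at {state['position']}, inventory {state['inventory']}. "
        f"The player said: '{player_message}'."
    )
    decision = llm_fn(context)
    return json.dumps({"type": "command", "raw": decision})
```

Injecting the LLM call also makes it trivial to swap Nemotron for another model later without touching the chat plumbing.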

Advantages of Local LLM

No API Charges

Since vLLM runs locally, there are no per-token or monthly API fees; the only running cost is electricity.

How to Start vLLM

```bash
docker run --gpus all -p 8000:8000 \
  --env HF_TOKEN="your_huggingface_token" \
  vllm/vllm-openai:latest \
  --model nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.9
```
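Before wiring the bot up, it helps to confirm the endpoint is reachable. vLLM's OpenAI-compatible server exposes `GET /v1/models`; the helper below (a sketch using only the standard library) returns None instead of raising when the server is down:

```python
import json
import urllib.error
import urllib.request

def check_vllm(base_url: str = "http://localhost:8000/v1"):
    """Return the list of served model IDs, or None if the server
    is unreachable or times out."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
            payload = json.load(resp)
        return [m["id"] for m in payload.get("data", [])]
    except (urllib.error.URLError, OSError):
        return None
```

Calling this once at brain.py startup gives a clear error message instead of a failed first chat completion.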

Pitfalls

Mineflayer is a Node.js Library

Since Mineflayer is a Node.js library, its API cannot be called directly from Python. Its dependencies are managed in the Node.js project's package.json, and Python talks to the bot only through inter-process communication (WebSocket or stdin/stdout).

vLLM Authentication

When accessing models on Hugging Face Hub, set the HF_TOKEN environment variable.

Summary

Intelligent Minecraft NPCs powered by a local LLM are achievable, technical hurdles notwithstanding. With the Nemotron model's Japanese support and vLLM's fast inference, natural dialogue with players is within reach. The IPC design between Python and Node.js is the key to the implementation.

This article was generated by Nemotron-Nano-9B-v2-Japanese and formatted/verified by Gemini 2.5 Flash.
