Building a Local Monster Hunter Wilds RAG System: From Web Scraping to Prompt Engineering
Gaming wikis are treasure troves of detailed information, but finding the right answer to specific questions can be like hunting a Rathalos in a thunderstorm. What if you could have a personal Monster Hunter expert that knows every weapon combo, monster weakness, and crafting recipe? That's exactly what I built with my Monster Hunter Wilds RAG (Retrieval-Augmented Generation) system.
In this article, I'll walk you through building a complete RAG pipeline that scrapes gaming wiki content, vectorizes it for fast retrieval, and serves intelligent answers through a local web interface. Along the way, we'll explore why certain architectural decisions were made and how prompt engineering can dramatically improve system performance.
🏗️ System Architecture: Two-Part Approach
The system consists of two main components:
- Intelligent Web Scraper: Harvests and structures wiki content
- RAG Pipeline: Retrieves relevant content and generates contextual answers
Let's dive into each part.
Part 1: Building the Web Scraper with Scrapy
Why Scrapy Over Custom Solutions?
When building a web scraper, you have several options:
- Write a custom scraper with requests and BeautifulSoup
- Use browser automation tools like Selenium
- Leverage a professional scraping framework like Scrapy
I chose Scrapy for several compelling reasons:
1. Built-in Politeness: Scrapy respects robots.txt and implements automatic delays between requests, making it respectful to target servers.
2. Robust Crawling Features:
- Pause/resume functionality through JOBDIR settings
- Automatic duplicate detection and filtering
- Depth limiting to prevent infinite crawling
- Built-in retry mechanisms for failed requests
3. Scalability: Scrapy handles concurrent requests efficiently and can scale from small wikis to massive sites.
4. Extensibility: The pipeline architecture allows for easy data processing and storage customization.
Spider Implementation
Here's the core of my Fextralife spider:
import scrapy
from datetime import datetime

class MyFextralifeSpider(scrapy.Spider):
    name = "myfextralifespider"
    allowed_domains = ["monsterhunterwilds.wiki.fextralife.com"]
    start_urls = ["https://monsterhunterwilds.wiki.fextralife.com/Monster+Hunter+Wilds+Wiki"]

    custom_settings = {
        "JOBDIR": f'jobs/daily-fextralife-{datetime.today().strftime("%Y-%m-%d")}',
        "DEPTH_LIMIT": 6,
        "CLOSESPIDER_TIMEOUT": 3600,
        "ITEM_PIPELINES": {
            "wikiproject.pipelines.WikiprojectPipeline": 300,
        },
    }
The spider automatically (a simplified crawl callback is sketched after this list):
- Follows internal links within the domain
- Skips static assets (images, CSS, JS files)
- Limits crawl depth to prevent infinite loops
- Saves progress for pause/resume functionality
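For illustration, here's a simplified sketch of what such a crawl callback might look like. It is not my exact implementation: `build_item` is a hypothetical helper standing in for the content extraction shown in the next section, and Scrapy's offsite middleware (via `allowed_domains`), dupefilter, and `DEPTH_LIMIT` already handle domain restriction, duplicates, and depth.

```python
import scrapy

SKIP_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".css", ".js")

class MyFextralifeSpider(scrapy.Spider):
    # ...settings as shown above...

    def parse(self, response):
        # Emit the parsed page (title, breadcrumb, text, tables) to the item pipeline;
        # build_item is a hypothetical helper standing in for the parsing code below.
        yield self.build_item(response)

        # Follow links on the page; Scrapy filters out off-domain URLs, duplicates,
        # and overly deep pages, so the callback only needs to skip static assets.
        for href in response.css("a::attr(href)").getall():
            url = response.urljoin(href)
            if url.lower().endswith(SKIP_EXTENSIONS):
                continue
            yield scrapy.Request(url, callback=self.parse)
```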
Intelligent Content Extraction
The magic happens in the content parsing. Wiki pages contain both structured (tables) and unstructured (text) content:
def parse_wiki_content(self, sel):
    # Extract clean text content from the main wiki content block
    wikicontent = (" ".join([
        x.strip() for x in sel.xpath('//div[@id="wiki-content-block"]//text()').getall()
    ])).replace('\xa0', ' ')
    return wikicontent

def parse_wiki_tables(self, html):
    # Convert HTML tables to structured JSON
    # Handles nested tables, images with alt text, and complex structures
    # Returns normalized data ready for vectorization
    ...
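The full table parser is project-specific, but a minimal version built on BeautifulSoup gives the idea. This sketch only handles flat tables with a header row; the real implementation also deals with nested tables and image alt text.

```python
from bs4 import BeautifulSoup

def parse_wiki_tables_simple(html: str) -> list:
    """Minimal flat-table parser: returns a list of row dicts per table."""
    soup = BeautifulSoup(html, "html.parser")
    tables = []
    for table in soup.find_all("table"):
        rows = table.find_all("tr")
        if not rows:
            continue
        # Treat the first row as the header and map each later row onto it
        headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
        parsed_rows = []
        for row in rows[1:]:
            cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
            parsed_rows.append(dict(zip(headers, cells)))
        tables.append(parsed_rows)
    return tables
```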
The system extracts:
- Breadcrumb navigation for content categorization
- Clean text content from the main wiki areas
- Structured table data converted to JSON format
- URL references for source attribution
Each page is transformed into a structured document:
If the user's question is answered by information in this file, please direct them to {url}
URL: {url}
####################
Page Title: {title}
####################
Breadcrumb: {breadcrumb}
####################
Page Content:
{clean_text_content}
####################
Page Tables Stored as JSON:
{structured_table_data}
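Putting it together, each page could be serialized with a small helper like the one below. The field names are illustrative; the actual spider assembles this string inside its item pipeline.

```python
import json

def build_document(url, title, breadcrumb, clean_text, tables):
    """Assemble one scraped page into the plain-text document format shown above."""
    sep = "#" * 20
    return "\n".join([
        f"If the user's question is answered by information in this file, please direct them to {url}",
        f"URL: {url}",
        sep,
        f"Page Title: {title}",
        sep,
        f"Breadcrumb: {breadcrumb}",
        sep,
        "Page Content:",
        clean_text,
        sep,
        "Page Tables Stored as JSON:",
        json.dumps(tables, ensure_ascii=False),
    ])
```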
The Critical close_spider() Function
Here's where the scraped data gets vectorized and stored. In Scrapy's pipeline system, the close_spider method in pipelines.py is called when crawling finishes:
def close_spider(self, spider):
    # Deduplicate scraped content and build breadcrumb map
    breadcrumb_map, total_page_count = dedupe_and_build_breadcrumb_map()
    print(f"Total Pages Scraped: {total_page_count}")
    print("Data has been ingested into Chroma vector store")
The dedupe_and_build_breadcrumb_map() function handles the final deduplication and categorization; the cleaned records are then upserted into the Chroma vector store:
import chromadb
from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

def upsert_into_chroma(df):
    """Upserts DataFrame content into Chroma vector store."""
    print("Starting Chroma ingestion...")

    # Initialize embedding model (project-specific wrapper around Ollama embeddings)
    embed_model = FixedOllamaEmbedding(model_name="nomic-embed-text")

    # Create persistent Chroma client and collection
    chroma_client = chromadb.PersistentClient(path="../chroma_db")
    chroma_collection = chroma_client.get_or_create_collection("monsterhunter_fextralife_wiki")
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    # Convert to LlamaIndex Documents and create vector index
    documents = [Document(text=row["wiki_content"], metadata={"url": row["url"]})
                 for _, row in df.iterrows()]
    index = VectorStoreIndex.from_documents(
        documents, storage_context=storage_context, embed_model=embed_model
    )
    print(f"Successfully ingested {len(documents)} documents")
This approach ensures all scraped content is automatically vectorized and ready for semantic search.
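At query time, the persisted collection can be reopened and wrapped in a LlamaIndex index, which is roughly what the RAG side does before searching. A minimal sketch, assuming the stock OllamaEmbedding rather than my FixedOllamaEmbedding wrapper:

```python
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Reopen the persisted collection created by the scraper's pipeline
chroma_client = chromadb.PersistentClient(path="../chroma_db")
collection = chroma_client.get_or_create_collection("monsterhunter_fextralife_wiki")
vector_store = ChromaVectorStore(chroma_collection=collection)

# Build an index over the existing vectors (no re-embedding of the documents)
embed_model = OllamaEmbedding(model_name="nomic-embed-text")
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)

# Semantic search: retrieve the top 5 most relevant chunks for a query
retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve("What are Rathalos's elemental weaknesses?")
```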
Part 2: The RAG System with OpenWebUI
OpenWebUI + Pipelines Architecture
I chose OpenWebUI as the frontend because it provides:
- Familiar Chat Interface: ChatGPT-like experience for users
- Pipeline System: Custom processing between user input and LLM
- Local Hosting: Complete control over data and privacy
- Multiple Model Support: Works with Ollama's local models
The pipeline architecture works like this:
User Query → OpenWebUI → Custom Pipeline → Chroma Search → Context + Query → LLM → Response
Early Implementation: Simple Interception
Initially, the pipeline was quite basic:
async def on_message(self, body: dict, __user__: Optional[dict] = None) -> dict:
    # Simply intercept the message
    user_query = body.get("content", "")

    # Search Chroma for relevant content
    results = self.vector_store.search(user_query, top_k=5)

    # Naively combine the retrieved results with the query
    context = "\n\n".join([doc.text for doc in results])
    enhanced_query = f"Context: {context}\n\nQuery: {user_query}"

    # Pass to LLM
    body["content"] = enhanced_query
    return body
This worked, but responses were generic and often missed domain-specific nuances.
Part 3: Evaluation Framework
Before diving into improvements, I built a comprehensive evaluation system to measure performance objectively.
Evaluation Metrics
Following LlamaIndex best practices, I implemented both end-to-end and component-wise evaluation (each sketched below):
End-to-End Metrics:
- Faithfulness (0-1): Are responses faithful to retrieved context? (No hallucinations)
- Relevancy (0-1): Are responses relevant to the query?
- Correctness (0-1): Are responses factually correct?
- Semantic Similarity (0-1): How similar are responses to expected answers?
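A minimal version of the end-to-end scoring, assuming LlamaIndex's built-in evaluators and a local llama3:8b judge model (exact import paths depend on your LlamaIndex version), could look like this:

```python
from llama_index.core.evaluation import (
    CorrectnessEvaluator,
    FaithfulnessEvaluator,
    RelevancyEvaluator,
)
from llama_index.llms.ollama import Ollama

# Judge LLM for grading responses (assumed: the same local llama3:8b model)
judge_llm = Ollama(model="llama3:8b", request_timeout=300.0)
faithfulness = FaithfulnessEvaluator(llm=judge_llm)
relevancy = RelevancyEvaluator(llm=judge_llm)
correctness = CorrectnessEvaluator(llm=judge_llm)

def evaluate_query(query_engine, question, reference_answer):
    """Run one question end-to-end and return pass/fail flags for each metric."""
    response = query_engine.query(question)
    return {
        "faithfulness": faithfulness.evaluate_response(query=question, response=response).passing,
        "relevancy": relevancy.evaluate_response(query=question, response=response).passing,
        "correctness": correctness.evaluate(
            query=question, response=str(response), reference=reference_answer
        ).passing,
    }
```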
Component-Wise Metrics:
- Hit Rate: Percentage of queries where relevant documents are retrieved
- Mean Reciprocal Rank (MRR): Quality of retrieval ranking
- Response Time: Performance measurement
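Hit rate and MRR only need the ranked retrieval results per query. A small helper, assuming each query has a single known relevant document ID, is enough:

```python
def hit_rate_and_mrr(results):
    """results: list of (expected_doc_id, ranked_retrieved_ids) tuples, one per query."""
    hits = 0
    reciprocal_ranks = []
    for expected_id, retrieved_ids in results:
        if expected_id in retrieved_ids:
            hits += 1
            # Reciprocal rank: 1 / position of the first relevant document (1-indexed)
            reciprocal_ranks.append(1.0 / (retrieved_ids.index(expected_id) + 1))
        else:
            reciprocal_ranks.append(0.0)
    hit_rate = hits / len(results)
    mrr = sum(reciprocal_ranks) / len(results)
    return hit_rate, mrr

# Example: two queries, the second retrieved its target document at rank 2
print(hit_rate_and_mrr([("doc-rathalos", []), ("doc-greatsword", ["doc-lance", "doc-greatsword"])]))
# -> (0.5, 0.25)
```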
Dataset Generation
I created two approaches for generating evaluation data:
1. Automated Question Generation:
def generate_questions_from_vectorstore():
    # Sample random documents from Chroma
    # Use an LLM to generate realistic questions
    # Create diverse query types (factual, procedural, comparative)
    # Categorize by content type (weapons, monsters, crafting, etc.)
    ...
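A hedged sketch of that generator, assuming direct access to the Chroma collection and a local llama3:8b model (the real version also categorizes questions by content type):

```python
import random

import chromadb
from llama_index.llms.ollama import Ollama

def generate_questions(n_docs=50, questions_per_doc=3):
    """Sample random wiki pages from Chroma and ask a local LLM to write questions about them."""
    client = chromadb.PersistentClient(path="../chroma_db")
    collection = client.get_or_create_collection("monsterhunter_fextralife_wiki")
    llm = Ollama(model="llama3:8b")

    # Pick a random subset of stored documents to keep generation cheap
    all_ids = collection.get()["ids"]
    sampled_ids = random.sample(all_ids, min(n_docs, len(all_ids)))
    docs = collection.get(ids=sampled_ids)["documents"]

    questions = []
    for doc in docs:
        prompt = (
            f"Write {questions_per_doc} realistic Monster Hunter Wilds player questions "
            f"that can be answered using ONLY this wiki excerpt:\n\n{doc[:2000]}"
        )
        reply = llm.complete(prompt).text
        questions.extend(line.strip("-• ").strip() for line in reply.splitlines() if line.strip())
    return questions
```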
2. Manual Answer Annotation:
I built a separate annotation program (sketched after this list) that:
- Takes generated questions
- Retrieves potential answers from the RAG system
- Presents them to human reviewers for validation
- Builds high-quality ground truth datasets
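In spirit, the reviewer loop is just a small CLI. The helper below is an illustrative stand-in for the actual annotation program, not the program itself:

```python
import json

def annotate(questions, query_engine, out_path="ground_truth.jsonl"):
    """Show the RAG system's answer for each question and let a human accept, skip, or correct it."""
    with open(out_path, "a", encoding="utf-8") as f:
        for question in questions:
            proposed = str(query_engine.query(question))
            print(f"\nQ: {question}\nProposed answer: {proposed}")
            verdict = input("Accept as ground truth? [y/n/edit]: ").strip().lower()
            if verdict == "n":
                continue
            if verdict == "edit":
                proposed = input("Corrected answer: ").strip()
            f.write(json.dumps({"question": question, "reference_answer": proposed}) + "\n")
```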
This hybrid approach ensured both scale (144 auto-generated questions) and quality (15 carefully curated sample queries).
Part 4: The Power of Prompt Engineering
Pre-Prompt Engineering Results
Running evaluation on the basic system revealed significant issues:
| Dataset | Faithfulness | Relevancy | Correctness |
| --- | --- | --- | --- |
| Sample Queries (15) | 80.0% | 86.67% | 26.67% |
| Generated Questions (144) | 77.08% | 90.97% | 83.33% |
The correctness scores revealed a major problem: while the system could find relevant information, it struggled to provide accurate, domain-specific answers.
What is Prompt Engineering?
Prompt engineering is the practice of designing, optimizing, and refining the instructions given to language models to achieve better performance on specific tasks. It involves:
- Role Definition: Establishing the AI's persona and expertise
- Context Guidelines: Specifying how to use provided information
- Output Formatting: Defining response structure and style
- Error Handling: Instructions for edge cases and missing information
Custom Monster Hunter Prompts
I implemented domain-specific prompts that transformed the system:
from llama_index.core import PromptTemplate

mh_qa_template = PromptTemplate(
    template=(
        "You are an expert Monster Hunter guide and wiki assistant with deep knowledge "
        "of Monster Hunter: Wilds. Your role is to provide accurate, helpful information "
        "about weapons, monsters, gameplay mechanics, and strategies.\n\n"
        "IMPORTANT GUIDELINES:\n"
        "- Use ONLY the information provided in the context below\n"
        "- Use correct Monster Hunter terminology (e.g., 'Great Sword' not 'Greatsword')\n"
        "- If information is insufficient, clearly state what you cannot answer\n"
        "- Include relevant URLs when directing users to specific pages\n"
        "- Structure responses clearly with sections when appropriate\n\n"
        "Context Information:\n"
        "{context_str}\n\n"
        "User Question: {query_str}\n\n"
        "Provide a comprehensive answer based on the context above:"
    )
)
Key improvements included:
- Expert Persona: "You are an expert Monster Hunter guide"
- Terminology Enforcement: Specific language requirements
- Context Boundaries: "Use ONLY the information provided"
- Response Structure: Clear formatting guidelines
- Source Attribution: Including URLs for references
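Assuming the Chroma-backed index from earlier, wiring the template into a LlamaIndex query engine is a one-liner. The exact integration inside the OpenWebUI pipeline differs, but the idea is the same:

```python
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3:8b", request_timeout=120.0)

# Retrieval + generation using the domain-specific prompt defined above
query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=5,
    text_qa_template=mh_qa_template,
)
print(query_engine.query("What are the weaknesses of Rathalos?"))
```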
Results After Prompt Engineering
The impact was dramatic:
| Dataset | Metric | Before | After | Improvement |
| --- | --- | --- | --- | --- |
| Sample Queries | Correctness | 26.67% | 93.33% | +250% |
| Sample Queries | Faithfulness | 80.0% | 80.0% | Maintained |
| Generated Questions | Correctness | 83.33% | 91.67% | +10% |
| Generated Questions | Faithfulness | 77.08% | 86.11% | +12% |
Key Performance Highlights
Exceptional Correctness Improvement:
- Sample dataset correctness jumped from 26.67% to 93.33% - a 250% improvement
- Large dataset correctness increased from 83.33% to 91.67%
- Users now receive significantly more accurate responses
Enhanced Faithfulness:
- 12% improvement on large dataset (reduced hallucinations)
- Better adherence to source material
- Increased system reliability
Domain Expertise Integration:
- Proper Monster Hunter terminology usage
- Contextually appropriate responses
- Category-specific performance improvements
The system now provides accurate answers 9 out of 10 times, with responses that stay true to the source material while being highly relevant to user queries.
Part 5: Local Hosting and Hardware Considerations
Why Local Over Cloud?
I made the conscious decision to keep this system local rather than hosting it online for several reasons:
Cost Considerations:
- GPU Requirements: The system performs best with GPU acceleration for embeddings and LLM inference
- High Memory Usage: Running multiple large language models (embedding + chat model) requires significant RAM
- Storage Needs: Vector databases and model files consume substantial disk space
- Compute Costs: Cloud GPU instances are expensive for continuous operation
Privacy Benefits:
- Complete control over data
- No external API dependencies
- Gaming queries remain private
- Can customize without service restrictions
Hardware Requirements
The system runs smoothly on my RTX 3090 setup:
- GPU: RTX 3090 (24GB VRAM) - handles both embedding and LLM inference
- RAM: 32GB system RAM for vector operations
- Storage: SSD storage for fast vector database access
Performance with RTX 3090:
- Embedding Generation: ~2-3 seconds for query embedding
- Vector Search: Sub-second retrieval from Chroma
- LLM Inference: 8-15 seconds for complete responses
- Total Response Time: 10-20 seconds end-to-end
Automated Setup Scripts
To make the system accessible, I created comprehensive build and startup scripts:
Frontend Build Process:
# Windows
build.bat
# Linux/macOS
./build.sh
System Startup:
# Windows
start_windows.bat
# Linux/macOS
./start.sh
The scripts automatically:
- Install and configure Ollama server
- Download required AI models (llama3:8b, nomic-embed-text)
- Set up conda environments for different components
- Build the OpenWebUI frontend
- Launch all services in separate terminal windows
This automation transforms a complex multi-component system into a simple double-click experience.
Key Lessons Learned
1. Scrapy's Professional Features Matter
The built-in politeness, retry mechanisms, and pause/resume capabilities saved countless hours compared to custom solutions.
2. Data Quality Trumps Quantity
150 well-processed, structured documents outperformed thousands of poorly parsed pages.
3. Prompt Engineering is Critical
Generic prompts led to 26.67% correctness; domain-specific prompts achieved 93.33% - a game-changing difference.
4. Evaluation Drives Improvement
Without quantitative metrics, I would have never discovered the correctness issues or measured the dramatic improvements.
5. Local Hosting is Viable for Personal Projects
Modern consumer GPUs like the RTX 3090 make sophisticated AI systems accessible for personal use without ongoing cloud costs.
Future Enhancements
Several improvements could further enhance the system:
- Multi-game Support: Extend to other gaming wikis
- Advanced Context: Conversation history and user preferences
- Performance Optimization: Reduce response times while maintaining quality
- Mobile Interface: Responsive design for gaming on-the-go
- Community Features: Shared question libraries and answer validation
Conclusion
Building this Monster Hunter RAG system taught me that modern AI tools can transform how we interact with domain-specific knowledge. The combination of intelligent web scraping, vector search, and carefully engineered prompts creates an experience far superior to traditional wiki browsing.
The system went from providing correct answers 1 in 4 times to 9 in 10 times through prompt engineering alone. This demonstrates the critical importance of domain-specific customization in RAG systems.
For gaming enthusiasts, researchers, or anyone working with specialized knowledge domains, this architecture provides a blueprint for building your own intelligent information systems. The complete codebase, evaluation framework, and setup scripts make it accessible even for those new to RAG systems.
Want to build your own gaming RAG system? The complete project is open source and includes automated setup scripts, comprehensive evaluation tools, and detailed documentation to get you started.
Happy hunting! 🏹
Tech Stack Used:
- Web Scraping: Scrapy, BeautifulSoup
- Vector Database: ChromaDB
- LLM Framework: LlamaIndex
- Models: Ollama (Llama 3, Nomic Embed)
- Frontend: OpenWebUI (SvelteKit)
- Evaluation: Custom framework with automated metrics
- Languages: Python, JavaScript, Shell scripting
This project showcases the power of combining modern AI tools with careful engineering to create practical, high-performance systems for specialized domains.