Building a Local Monster Hunter Wilds RAG System: From Web Scraping to Prompt Engineering
Gaming wikis are treasure troves of detailed information, but finding the right answer to specific questions can be like hunting a Rathalos in a thunderstorm. What if you could have a personal Monster Hunter expert that knows every weapon combo, monster weakness, and crafting recipe? That's exactly what I built with my Monster Hunter Wilds RAG (Retrieval-Augmented Generation) system.
In this article, I'll walk you through building a complete RAG pipeline that scrapes gaming wiki content, vectorizes it for fast retrieval, and serves intelligent answers through a local web interface. Along the way, we'll explore why certain architectural decisions were made and how prompt engineering can dramatically improve system performance.
🏗️ System Architecture: Two-Part Approach
The system consists of two main components:
- Intelligent Web Scraper: Harvests and structures wiki content
- RAG Pipeline: Retrieves relevant content and generates contextual answers
Let's dive into each part.
Part 1: Building the Web Scraper with Scrapy
Why Scrapy Over Custom Solutions?
When building a web scraper, you have several options:
- Write a custom scraper with requests and BeautifulSoup
- Use browser automation tools like Selenium
- Leverage a professional scraping framework like Scrapy
I chose Scrapy for several compelling reasons:
1. Built-in Politeness: Scrapy respects robots.txt and implements automatic delays between requests, making it respectful to target servers.
2. Robust Crawling Features:
- Pause/resume functionality through JOBDIR settings
- Automatic duplicate detection and filtering
- Depth limiting to prevent infinite crawling
- Built-in retry mechanisms for failed requests
3. Scalability: Scrapy handles concurrent requests efficiently and can scale from small wikis to massive sites.
4. Extensibility: The pipeline architecture allows for easy data processing and storage customization.
Spider Implementation
Here's the core of my Fextralife spider:
import scrapy
from datetime import datetime

class MyFextralifeSpider(scrapy.Spider):
    name = "myfextralifespider"
    allowed_domains = ["monsterhunterwilds.wiki.fextralife.com"]
    start_urls = ["https://monsterhunterwilds.wiki.fextralife.com/Monster+Hunter+Wilds+Wiki"]

    custom_settings = {
        "JOBDIR": f'jobs/daily-fextralife-{datetime.today().strftime("%Y-%m-%d")}',
        "DEPTH_LIMIT": 6,
        "CLOSESPIDER_TIMEOUT": 3600,
        "ITEM_PIPELINES": {
            "wikiproject.pipelines.WikiprojectPipeline": 300,
        },
    }
The spider automatically (a simplified crawl callback is sketched after this list):
- Follows internal links within the domain
- Skips static assets (images, CSS, JS files)
- Limits crawl depth to prevent infinite loops
- Saves progress for pause/resume functionality
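For illustration, here's a simplified sketch of what such a crawl callback might look like. It is not my exact implementation: `build_item` is a hypothetical helper standing in for the content extraction shown in the next section, and Scrapy's offsite middleware (via `allowed_domains`), dupefilter, and `DEPTH_LIMIT` already handle domain restriction, duplicates, and depth.

```python
import scrapy

SKIP_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".css", ".js")

class MyFextralifeSpider(scrapy.Spider):
    # ...settings as shown above...

    def parse(self, response):
        # Emit the parsed page (title, breadcrumb, text, tables) to the item pipeline;
        # build_item is a hypothetical helper standing in for the parsing code below.
        yield self.build_item(response)

        # Follow links on the page; Scrapy filters out off-domain URLs, duplicates,
        # and overly deep pages, so the callback only needs to skip static assets.
        for href in response.css("a::attr(href)").getall():
            url = response.urljoin(href)
            if url.lower().endswith(SKIP_EXTENSIONS):
                continue
            yield scrapy.Request(url, callback=self.parse)
```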
Intelligent Content Extraction
The magic happens in the content parsing. Wiki pages contain both structured (tables) and unstructured (text) content:
def parse_wiki_content(self, sel):
    # Extract clean text content from the main wiki content block
    wikicontent = (" ".join([
        x.strip() for x in sel.xpath('//div[@id="wiki-content-block"]//text()').getall()
    ])).replace('\xa0', ' ')
    return wikicontent

def parse_wiki_tables(self, html):
    # Convert HTML tables to structured JSON
    # Handles nested tables, images with alt text, and complex structures
    # Returns normalized data ready for vectorization
    ...
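The full table parser is project-specific, but a minimal version built on BeautifulSoup gives the idea. This sketch only handles flat tables with a header row; the real implementation also deals with nested tables and image alt text.

```python
from bs4 import BeautifulSoup

def parse_wiki_tables_simple(html: str) -> list:
    """Minimal flat-table parser: returns a list of row dicts per table."""
    soup = BeautifulSoup(html, "html.parser")
    tables = []
    for table in soup.find_all("table"):
        rows = table.find_all("tr")
        if not rows:
            continue
        # Treat the first row as the header and map each later row onto it
        headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
        parsed_rows = []
        for row in rows[1:]:
            cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
            parsed_rows.append(dict(zip(headers, cells)))
        tables.append(parsed_rows)
    return tables
```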
The system extracts:
- Breadcrumb navigation for content categorization
- Clean text content from the main wiki areas
- Structured table data converted to JSON format
- URL references for source attribution
Each page is transformed into a structured document:
If the user's question is answered by information in this file, please direct them to {url}
URL: {url}
####################
Page Title: {title}
####################
Breadcrumb: {breadcrumb}
####################
Page Content:
{clean_text_content}
####################
Page Tables Stored as JSON:
{structured_table_data}
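Putting it together, each page could be serialized with a small helper like the one below. The field names are illustrative; the actual spider assembles this string inside its item pipeline.

```python
import json

def build_document(url, title, breadcrumb, clean_text, tables):
    """Assemble one scraped page into the plain-text document format shown above."""
    sep = "#" * 20
    return "\n".join([
        f"If the user's question is answered by information in this file, please direct them to {url}",
        f"URL: {url}",
        sep,
        f"Page Title: {title}",
        sep,
        f"Breadcrumb: {breadcrumb}",
        sep,
        "Page Content:",
        clean_text,
        sep,
        "Page Tables Stored as JSON:",
        json.dumps(tables, ensure_ascii=False),
    ])
```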
The Critical close_spider() Function
Here's where the scraped data gets vectorized and stored. In Scrapy's pipeline system, the close_spider method in pipelines.py is called when crawling finishes:
def close_spider(self, spider):
    # Deduplicate scraped content and build breadcrumb map
    breadcrumb_map, total_page_count = dedupe_and_build_breadcrumb_map()
    print(f"Total Pages Scraped: {total_page_count}")
    print("Data has been ingested into Chroma vector store")
The dedupe_and_build_breadcrumb_map() function handles the final deduplication and categorization; the cleaned records are then upserted into the Chroma vector store:
import chromadb
from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

def upsert_into_chroma(df):
    """Upserts DataFrame content into Chroma vector store."""
    print("Starting Chroma ingestion...")

    # Initialize embedding model (project-specific wrapper around Ollama embeddings)
    embed_model = FixedOllamaEmbedding(model_name="nomic-embed-text")

    # Create persistent Chroma client and collection
    chroma_client = chromadb.PersistentClient(path="../chroma_db")
    chroma_collection = chroma_client.get_or_create_collection("monsterhunter_fextralife_wiki")
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    # Convert to LlamaIndex Documents and create vector index
    documents = [Document(text=row["wiki_content"], metadata={"url": row["url"]})
                 for _, row in df.iterrows()]
    index = VectorStoreIndex.from_documents(
        documents, storage_context=storage_context, embed_model=embed_model
    )
    print(f"Successfully ingested {len(documents)} documents")
This approach ensures all scraped content is automatically vectorized and ready for semantic search.
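At query time, the persisted collection can be reopened and wrapped in a LlamaIndex index, which is roughly what the RAG side does before searching. A minimal sketch, assuming the stock OllamaEmbedding rather than my FixedOllamaEmbedding wrapper:

```python
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Reopen the persisted collection created by the scraper's pipeline
chroma_client = chromadb.PersistentClient(path="../chroma_db")
collection = chroma_client.get_or_create_collection("monsterhunter_fextralife_wiki")
vector_store = ChromaVectorStore(chroma_collection=collection)

# Build an index over the existing vectors (no re-embedding of the documents)
embed_model = OllamaEmbedding(model_name="nomic-embed-text")
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)

# Semantic search: retrieve the top 5 most relevant chunks for a query
retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve("What are Rathalos's elemental weaknesses?")
```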
Part 2: The RAG System with OpenWebUI
OpenWebUI + Pipelines Architecture
I chose OpenWebUI as the frontend because it provides:
- Familiar Chat Interface: ChatGPT-like experience for users
- Pipeline System: Custom processing between user input and LLM
- Local Hosting: Complete control over data and privacy
- Multiple Model Support: Works with Ollama's local models
The pipeline architecture works like this:
User Query → OpenWebUI → Custom Pipeline → Chroma Search → Context + Query → LLM → Response
Early Implementation: Simple Interception
Initially, the pipeline was quite basic:
async def on_message(self, body: dict, __user__: Optional[dict] = None) -> dict:
    # Simply intercept the message
    user_query = body.get("content", "")

    # Search Chroma for relevant content
    results = self.vector_store.search(user_query, top_k=5)

    # Naively combine the retrieved results with the query
    context = "\n\n".join([doc.text for doc in results])
    enhanced_query = f"Context: {context}\n\nQuery: {user_query}"

    # Pass to LLM
    body["content"] = enhanced_query
    return body
This worked, but responses were generic and often missed domain-specific nuances.
Part 3: Evaluation Framework
Before diving into improvements, I built a comprehensive evaluation system to measure performance objectively.
Evaluation Metrics
Following LlamaIndex best practices, I implemented both end-to-end and component-wise evaluation (each sketched below):
End-to-End Metrics:
- Faithfulness (0-1): Are responses faithful to retrieved context? (No hallucinations)
- Relevancy (0-1): Are responses relevant to the query?
- Correctness (0-1): Are responses factually correct?
- Semantic Similarity (0-1): How similar are responses to expected answers?
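A minimal version of the end-to-end scoring, assuming LlamaIndex's built-in evaluators and a local llama3:8b judge model (exact import paths depend on your LlamaIndex version), could look like this:

```python
from llama_index.core.evaluation import (
    CorrectnessEvaluator,
    FaithfulnessEvaluator,
    RelevancyEvaluator,
)
from llama_index.llms.ollama import Ollama

# Judge LLM for grading responses (assumed: the same local llama3:8b model)
judge_llm = Ollama(model="llama3:8b", request_timeout=300.0)
faithfulness = FaithfulnessEvaluator(llm=judge_llm)
relevancy = RelevancyEvaluator(llm=judge_llm)
correctness = CorrectnessEvaluator(llm=judge_llm)

def evaluate_query(query_engine, question, reference_answer):
    """Run one question end-to-end and return pass/fail flags for each metric."""
    response = query_engine.query(question)
    return {
        "faithfulness": faithfulness.evaluate_response(query=question, response=response).passing,
        "relevancy": relevancy.evaluate_response(query=question, response=response).passing,
        "correctness": correctness.evaluate(
            query=question, response=str(response), reference=reference_answer
        ).passing,
    }
```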
Component-Wise Metrics:
- Hit Rate: Percentage of queries where relevant documents are retrieved
- Mean Reciprocal Rank (MRR): Quality of retrieval ranking
- Response Time: Performance measurement
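Hit rate and MRR only need the ranked retrieval results per query. A small helper, assuming each query has a single known relevant document ID, is enough:

```python
def hit_rate_and_mrr(results):
    """results: list of (expected_doc_id, ranked_retrieved_ids) tuples, one per query."""
    hits = 0
    reciprocal_ranks = []
    for expected_id, retrieved_ids in results:
        if expected_id in retrieved_ids:
            hits += 1
            # Reciprocal rank: 1 / position of the first relevant document (1-indexed)
            reciprocal_ranks.append(1.0 / (retrieved_ids.index(expected_id) + 1))
        else:
            reciprocal_ranks.append(0.0)
    hit_rate = hits / len(results)
    mrr = sum(reciprocal_ranks) / len(results)
    return hit_rate, mrr

# Example: two queries, the second retrieved its target document at rank 2
print(hit_rate_and_mrr([("doc-rathalos", []), ("doc-greatsword", ["doc-lance", "doc-greatsword"])]))
# -> (0.5, 0.25)
```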
Dataset Generation
I created two approaches for generating evaluation data:
1. Automated Question Generation:
def generate_questions_from_vectorstore():
    # Sample random documents from Chroma
    # Use an LLM to generate realistic questions
    # Create diverse query types (factual, procedural, comparative)
    # Categorize by content type (weapons, monsters, crafting, etc.)
    ...
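A hedged sketch of that generator, assuming direct access to the Chroma collection and a local llama3:8b model (the real version also categorizes questions by content type):

```python
import random

import chromadb
from llama_index.llms.ollama import Ollama

def generate_questions(n_docs=50, questions_per_doc=3):
    """Sample random wiki pages from Chroma and ask a local LLM to write questions about them."""
    client = chromadb.PersistentClient(path="../chroma_db")
    collection = client.get_or_create_collection("monsterhunter_fextralife_wiki")
    llm = Ollama(model="llama3:8b")

    # Pick a random subset of stored documents to keep generation cheap
    all_ids = collection.get()["ids"]
    sampled_ids = random.sample(all_ids, min(n_docs, len(all_ids)))
    docs = collection.get(ids=sampled_ids)["documents"]

    questions = []
    for doc in docs:
        prompt = (
            f"Write {questions_per_doc} realistic Monster Hunter Wilds player questions "
            f"that can be answered using ONLY this wiki excerpt:\n\n{doc[:2000]}"
        )
        reply = llm.complete(prompt).text
        questions.extend(line.strip("-• ").strip() for line in reply.splitlines() if line.strip())
    return questions
```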
2. Manual Answer Annotation:
I built a separate annotation program (sketched after this list) that:
- Takes generated questions
- Retrieves potential answers from the RAG system
- Presents them to human reviewers for validation
- Builds high-quality ground truth datasets
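In spirit, the reviewer loop is just a small CLI. The helper below is an illustrative stand-in for the actual annotation program, not the program itself:

```python
import json

def annotate(questions, query_engine, out_path="ground_truth.jsonl"):
    """Show the RAG system's answer for each question and let a human accept, skip, or correct it."""
    with open(out_path, "a", encoding="utf-8") as f:
        for question in questions:
            proposed = str(query_engine.query(question))
            print(f"\nQ: {question}\nProposed answer: {proposed}")
            verdict = input("Accept as ground truth? [y/n/edit]: ").strip().lower()
            if verdict == "n":
                continue
            if verdict == "edit":
                proposed = input("Corrected answer: ").strip()
            f.write(json.dumps({"question": question, "reference_answer": proposed}) + "\n")
```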
This hybrid approach ensured both scale (144 auto-generated questions) and quality (15 carefully curated sample queries).
Part 4: The Power of Prompt Engineering
Pre-Prompt Engineering Results
Running evaluation on the basic system revealed significant issues:
| Dataset | Faithfulness | Relevancy | Correctness |
| --- | --- | --- | --- |
| Sample Queries (15) | 80.0% | 86.67% | 26.67% |
| Generated Questions (144) | 77.08% | 90.97% | 83.33% |
The correctness scores revealed a major problem: while the system could find relevant information, it struggled to provide accurate, domain-specific answers.
What is Prompt Engineering?
Prompt engineering is the practice of designing, optimizing, and refining the instructions given to language models to achieve better performance on specific tasks. It involves:
- Role Definition: Establishing the AI's persona and expertise
- Context Guidelines: Specifying how to use provided information
- Output Formatting: Defining response structure and style
- Error Handling: Instructions for edge cases and missing information
Custom Monster Hunter Prompts
I implemented domain-specific prompts that transformed the system:
from llama_index.core import PromptTemplate

mh_qa_template = PromptTemplate(
    template=(
        "You are an expert Monster Hunter guide and wiki assistant with deep knowledge "
        "of Monster Hunter: Wilds. Your role is to provide accurate, helpful information "
        "about weapons, monsters, gameplay mechanics, and strategies.\n\n"
        "IMPORTANT GUIDELINES:\n"
        "- Use ONLY the information provided in the context below\n"
        "- Use correct Monster Hunter terminology (e.g., 'Great Sword' not 'Greatsword')\n"
        "- If information is insufficient, clearly state what you cannot answer\n"
        "- Include relevant URLs when directing users to specific pages\n"
        "- Structure responses clearly with sections when appropriate\n\n"
        "Context Information:\n"
        "{context_str}\n\n"
        "User Question: {query_str}\n\n"
        "Provide a comprehensive answer based on the context above:"
    )
)
Key improvements included:
- Expert Persona: "You are an expert Monster Hunter guide"
- Terminology Enforcement: Specific language requirements
- Context Boundaries: "Use ONLY the information provided"
- Response Structure: Clear formatting guidelines
- Source Attribution: Including URLs for references
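Assuming the Chroma-backed index from earlier, wiring the template into a LlamaIndex query engine is a one-liner. The exact integration inside the OpenWebUI pipeline differs, but the idea is the same:

```python
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3:8b", request_timeout=120.0)

# Retrieval + generation using the domain-specific prompt defined above
query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=5,
    text_qa_template=mh_qa_template,
)
print(query_engine.query("What are the weaknesses of Rathalos?"))
```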
Results After Prompt Engineering
The impact was dramatic:
| Dataset | Metric | Before | After | Improvement |
| --- | --- | --- | --- | --- |
| Sample Queries | Correctness | 26.67% | 93.33% | +250% |
| Sample Queries | Faithfulness | 80.0% | 80.0% | Maintained |
| Generated Questions | Correctness | 83.33% | 91.67% | +10% |
| Generated Questions | Faithfulness | 77.08% | 86.11% | +12% |
Key Performance Highlights
Exceptional Correctness Improvement:
- Sample dataset correctness jumped from 26.67% to 93.33% - a 250% improvement
- Large dataset correctness increased from 83.33% to 91.67%
- Users now receive significantly more accurate responses
Enhanced Faithfulness:
- 12% improvement on large dataset (reduced hallucinations)
- Better adherence to source material
- Increased system reliability
Domain Expertise Integration:
- Proper Monster Hunter terminology usage
- Contextually appropriate responses
- Category-specific performance improvements
The system now provides accurate answers 9 out of 10 times, with responses that stay true to the source material while being highly relevant to user queries.
Part 5: Local Hosting and Hardware Considerations
Why Local Over Cloud?
I made the conscious decision to keep this system local rather than hosting it online for several reasons:
Cost Considerations:
- GPU Requirements: The system performs best with GPU acceleration for embeddings and LLM inference
- High Memory Usage: Running multiple large language models (embedding + chat model) requires significant RAM
- Storage Needs: Vector databases and model files consume substantial disk space
- Compute Costs: Cloud GPU instances are expensive for continuous operation
Privacy Benefits:
- Complete control over data
- No external API dependencies
- Gaming queries remain private
- Can customize without service restrictions
Hardware Requirements
The system runs smoothly on my RTX 3090 setup:
- GPU: RTX 3090 (24GB VRAM) - handles both embedding and LLM inference
- RAM: 32GB system RAM for vector operations
- Storage: SSD storage for fast vector database access
Performance with RTX 3090:
- Embedding Generation: ~2-3 seconds for query embedding
- Vector Search: Sub-second retrieval from Chroma
- LLM Inference: 8-15 seconds for complete responses
- Total Response Time: 10-20 seconds end-to-end
Automated Setup Scripts
To make the system accessible, I created comprehensive build and startup scripts:
Frontend Build Process:
# Windows
build.bat
# Linux/macOS
./build.sh
System Startup:
# Windows
start_windows.bat
# Linux/macOS
./start.sh
The scripts automatically:
- Install and configure Ollama server
- Download required AI models (llama3:8b, nomic-embed-text)
- Set up conda environments for different components
- Build the OpenWebUI frontend
- Launch all services in separate terminal windows
This automation transforms a complex multi-component system into a simple double-click experience.
Key Lessons Learned
1. Scrapy's Professional Features Matter
The built-in politeness, retry mechanisms, and pause/resume capabilities saved countless hours compared to custom solutions.
2. Data Quality Trumps Quantity
150 well-processed, structured documents outperformed thousands of poorly parsed pages.
3. Prompt Engineering is Critical
Generic prompts led to 26.67% correctness; domain-specific prompts achieved 93.33% - a game-changing difference.
4. Evaluation Drives Improvement
Without quantitative metrics, I would have never discovered the correctness issues or measured the dramatic improvements.
5. Local Hosting is Viable for Personal Projects
Modern consumer GPUs like the RTX 3090 make sophisticated AI systems accessible for personal use without ongoing cloud costs.
Future Enhancements
Several improvements could further enhance the system:
- Multi-game Support: Extend to other gaming wikis
- Advanced Context: Conversation history and user preferences
- Performance Optimization: Reduce response times while maintaining quality
- Mobile Interface: Responsive design for gaming on-the-go
- Community Features: Shared question libraries and answer validation
Conclusion
Building this Monster Hunter RAG system taught me that modern AI tools can transform how we interact with domain-specific knowledge. The combination of intelligent web scraping, vector search, and carefully engineered prompts creates an experience far superior to traditional wiki browsing.
The system went from providing correct answers 1 in 4 times to 9 in 10 times through prompt engineering alone. This demonstrates the critical importance of domain-specific customization in RAG systems.
For gaming enthusiasts, researchers, or anyone working with specialized knowledge domains, this architecture provides a blueprint for building your own intelligent information systems. The complete codebase, evaluation framework, and setup scripts make it accessible even for those new to RAG systems.
Want to build your own gaming RAG system? The complete project is open source and includes automated setup scripts, comprehensive evaluation tools, and detailed documentation to get you started.
Happy hunting! 🏹
Tech Stack Used:
- Web Scraping: Scrapy, BeautifulSoup
- Vector Database: ChromaDB
- LLM Framework: LlamaIndex
- Models: Ollama (Llama 3, Nomic Embed)
- Frontend: OpenWebUI (SvelteKit)
- Evaluation: Custom framework with automated metrics
- Languages: Python, JavaScript, Shell scripting
This project showcases the power of combining modern AI tools with careful engineering to create practical, high-performance systems for specialized domains.