DEV Community

Francesco Esposito


Stop Paying for APIs: Build a 100% Local AI Auditor with Python & Llama 3

If you have worked in tech or digital operations for a few years, you likely know Python as the "Swiss Army Knife" of daily tasks. For a long time, it has been the undisputed king of local automation. We’ve all used it (or seen it used) to merge dozens of CSV files in seconds, batch-rename image archives, scrape documents from the web, or manipulate Excel sheets without ever opening them.

These scripts are fantastic, but they have a hard limit: they are "blind." They execute rigid instructions. If a file structure changes, the script breaks. Most importantly, they don't understand the content they are processing.

Today, thanks to Generative AI, Python is enjoying a second life. We are no longer just moving data; we can now understand it.

Welcome to the world of RAG.

What is RAG and Why it Beats a Standard Chatbot

Imagine asking a standard AI (like ChatGPT or Claude) to analyze your specific website against your company's internal brand guidelines. You will likely get vague answers or "hallucinations" because the model doesn't know your real-time data or your specific private criteria.

This is where RAG (Retrieval-Augmented Generation) comes in.

RAG is a technique that combines the linguistic power of AI with the precision of a private library. Instead of relying solely on the model's pre-trained memory (which is often outdated), a RAG system:

  1. Retrieves relevant information from an external source you provide (a PDF, a database, or in this case, a live website).

  2. Augments the prompt sent to the AI with this fresh context.

  3. Generates an answer based exclusively on that data.

The advantage? Fewer hallucinations, up-to-date data, and the ability to apply your business rules to your data—all potentially running locally on your machine.
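The three steps can be sketched in a few lines of plain Python. This is a toy illustration, not a real pipeline: a naive word-overlap ranking stands in for vector search, and the final LLM call is left as a comment.

```python
def retrieve(query, documents, k=2):
    """Step 1: rank documents by naive word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(documents, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:k]

def augment(query, context_docs):
    """Step 2: prepend the retrieved context to the prompt."""
    context = "\n".join(context_docs)
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"

# Step 3 would be the LLM call, e.g. llm.invoke(prompt)
documents = [
    "Our brand voice is friendly and informal.",
    "Quarterly revenue grew 12% year over year.",
    "Every landing page needs one clear call to action.",
]
prompt = augment("What is our brand voice?", retrieve("What is our brand voice?", documents))
```

A real system replaces `retrieve` with a semantic search over embeddings, which is exactly what Chroma does in the project below.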

The Project: An AI Marketing Auditor

To demonstrate this power, I’ve written a compact Python script. The goal is to create a Marketing Auditor: a tool that reads a website, compares it against our internal marketing criteria (Tone of Voice, SEO, Call to Action), and provides a score with strategic advice.

We will use:

  • LangChain: The framework to orchestrate the workflow.

  • Ollama (Llama 3): To run the AI locally (total privacy, zero API costs).

  • ChromaDB: To store website data in vector format.

The Code

Here is the complete script, app.py. To run it, you need a local folder named criteria containing .txt files with your guidelines (e.g., seo_copywriting.txt).
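If you want to try the script without writing guidelines first, a small helper can scaffold that folder. The file names match the ones the script looks for; the guideline text itself is just placeholder content to replace with your real criteria.

```python
import os

# Placeholder guidelines -- replace with your real internal criteria.
PLACEHOLDER_CRITERIA = {
    "tone_of_voice.txt": "Use a friendly, confident tone. Avoid jargon and the passive voice.",
    "call_to_action.txt": "Every page needs one primary CTA above the fold, built around an action verb.",
    "seo_copywriting.txt": "Use one H1 per page; include the target keyword in the title and first paragraph.",
}

def scaffold_criteria(directory="./criteria"):
    """Create the criteria folder and write the placeholder .txt files."""
    os.makedirs(directory, exist_ok=True)
    for filename, text in PLACEHOLDER_CRITERIA.items():
        with open(os.path.join(directory, filename), "w", encoding="utf-8") as f:
            f.write(text)

if __name__ == "__main__":
    scaffold_criteria()
```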

(Note: The comments and prompts have been translated to English for this article).


import os
from tqdm import tqdm
from langchain_community.document_loaders import WebBaseLoader, TextLoader
from langchain_community.vectorstores import Chroma
from langchain_ollama import OllamaEmbeddings, ChatOllama
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

# --- CONFIGURATION ---
MODEL_NAME = "llama3"
CRITERIA_DIR = "./criteria" # Directory for your txt guidelines

def load_marketing_criteria(directory):
    """Loads the defined marketing txt files."""
    combined_content = ""
    # Example files you might have in your folder
    target_files = ["tone_of_voice.txt", "call_to_action.txt", "seo_copywriting.txt"]

    if not os.path.exists(directory):
        return None

    for filename in target_files:
        path = os.path.join(directory, filename)
        if os.path.exists(path):
            loader = TextLoader(path, encoding='utf-8')
            combined_content += f"\n--- CRITERIA: {filename.upper()} ---\n"
            combined_content += loader.load()[0].page_content
    return combined_content

def run_rag_analysis():
    # User Input
    print("--- RAG Marketing Analyzer ---")
    target_url = input("Enter the URL to analyze: ").strip()

    if not target_url.startswith("http"):
        print("Error: Please enter a valid URL (must start with http or https).")
        return

    # Progress Bar for visual feedback
    pbar = tqdm(total=100, desc="Analysis Progress", bar_format="{l_bar}{bar}| {n_fmt}/{total_fmt}%")

    try:
        # Step 1: Load Criteria (The "Manual")
        criteria = load_marketing_criteria(CRITERIA_DIR)
        if not criteria:
            pbar.close()
            print("Error: Ensure .txt files are in the 'criteria' folder.")
            return
        pbar.update(20)

        # Step 2: Site Scraping
        loader = WebBaseLoader(target_url)
        data = loader.load()
        pbar.update(20)

        # Step 3: Text Chunking
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
        all_splits = text_splitter.split_documents(data)
        pbar.update(20)

        # Step 4: Vector Store Creation (Embeddings)
        embeddings = OllamaEmbeddings(model=MODEL_NAME)
        vectorstore = Chroma.from_documents(documents=all_splits, embedding=embeddings)
        pbar.update(20)

        # Step 5: Llama 3 Analysis
        llm = ChatOllama(model=MODEL_NAME, temperature=0.2)

        prompt_str = f"""
        You are a Senior Marketing Auditor. Analyze the provided website using these criteria:
        {criteria}

        Provide a report containing:
        1. An analysis for each category with a score from 1 to 10.
        2. A final average score.
        3. A strategic recommendation.
        """

        qa_chain = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type="stuff",
            retriever=vectorstore.as_retriever()
        )

        result = qa_chain.invoke({"query": prompt_str})
        pbar.update(20)
        pbar.close()

        # Final Output
        print("\n" + "="*60)
        print(f"REPORT FOR: {target_url}")
        print("="*60)
        print(result["result"])
        print("="*60)

    except Exception as e:
        pbar.close()
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    run_rag_analysis()


How It Works "Under the Hood"

For those who want to build similar tools, here is the breakdown of the logic:

  1. Context Loading (load_marketing_criteria): Before looking at the website, the script loads the "rules of the game." It reads local text files containing your best practices. This is what differentiates a generic analysis from a custom one.

  2. Web Scraping (WebBaseLoader): LangChain downloads the HTML content of the target URL and converts it into raw text.

  3. Chunking & Embeddings: This is where the magic happens. The website text is split into small, digestible pieces ("chunks") and converted into numerical vectors using OllamaEmbeddings. These are stored in Chroma, a vector database that allows the AI to retrieve only the paragraphs semantically relevant to our query.

  4. Generation (ChatOllama): Finally, we pass everything to Llama 3. We give it the criteria, we give it the website content retrieved from the database, and we ask it to act as a "Senior Marketing Auditor."
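The chunk-then-retrieve part (steps 3 and 4 above) can be illustrated without Ollama or Chroma. The sketch below splits a text into overlapping character chunks and ranks them with a crude bag-of-words cosine similarity; a real embedding model captures meaning, not just shared words, but the mechanics are the same.

```python
import math
from collections import Counter

def split_text(text, chunk_size=80, overlap=20):
    """Split text into overlapping character chunks (what the splitter does, simplified)."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def embed(text):
    """Toy 'embedding': a bag-of-words vector. Real embeddings encode semantics."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_chunks(query, chunks, k=2):
    """Return the k chunks most similar to the query -- the retriever's job."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

Only these top-ranked chunks are passed to the model, which is why RAG stays fast and focused even on long pages.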

Conclusion: A Game Changer for Productivity

This script is just a prototype—a simple example of what can be achieved in under 100 lines of code. However, the implications are massive.

Imagine scaling this concept. It’s not just for analyzing a website; consider the possibilities for:

  • Consultancy: Analyzing financial statements or annual reports from 50 different companies and comparing them against your firm's specific investment criteria.

  • Management: Automatically summarizing weekly support tickets to identify recurring trends without reading them one by one.

  • HR: Comparing hundreds of CVs against a specific job description saved locally.

Coupling Python's automation capabilities with RAG models transforms the computer from a simple executor into an active collaborator. It drastically reduces the time spent on repetitive analysis, freeing up professionals, managers, and employees to focus on what truly matters: strategy and creativity.
