Applying LLMs to Humanities Research: A Primer

#learnai #oxlo #ai

I built a small CLI tool that ingests a primary source text and returns structured scholarly analysis as JSON. It is useful for historians and literary scholars who need to triage large archives or quickly surface themes and close-reading notes from a document. I wire it to Oxlo.ai, which charges per request rather than per token, so passing in a long speech or letter does not raise the cost.

What you'll need

Python 3.10 or newer
The OpenAI SDK: pip install openai
An Oxlo.ai API key from https://portal.oxlo.ai
A primary source text file. If you do not have one, create speech.txt and paste a short historical document.

Step 1: Initialize the Oxlo.ai client

First, I verify that the Oxlo.ai endpoint is reachable. I point the OpenAI SDK at Oxlo.ai's base URL and send a quick sanity prompt.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Confirm you are online."},
    ],
)

print(response.choices[0].message.content)
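
Hardcoding the key is fine for a sanity check, but it is easy to commit by accident. A minimal sketch of reading it from the environment instead follows. The variable name OXLO_API_KEY and the helper get_api_key are my own assumptions for illustration, not something Oxlo.ai prescribes.

```python
import os


def get_api_key(env_var: str = "OXLO_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    OXLO_API_KEY is a hypothetical name chosen for this sketch;
    use whatever variable name fits your setup.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With this in place, the client could be built as OpenAI(base_url="https://api.oxlo.ai/v1", api_key=get_api_key()).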

Step 2: Define the system prompt

Next, I write the system prompt. The prompt constrains the model to act as a rigorous humanities research assistant and mandates JSON output with specific scholarly fields.

SYSTEM_PROMPT = """You are a humanities research assistant. Analyze the provided primary source text and return valid JSON.

Rules:
- Identify historical context, including date, author, and audience if inferable from the text or your training data.
- Extract three to five major themes. For each theme, include a direct quote from the text as evidence.
- Write one close-reading paragraph that examines a single significant passage in detail.
- Suggest two connections to broader scholarly debates or secondary literature.
- Output only a JSON object with keys: historical_context, themes, close_reading, scholarly_connections.
- Do not use anachronistic language. Cite specific phrases."""
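
A prompt can request a shape, but it cannot guarantee one. The sketch below checks a parsed reply against the rules above: the four keys, three to five themes, and two scholarly connections. The function name validate_analysis and the wording of its messages are hypothetical.

```python
REQUIRED_KEYS = {
    "historical_context",
    "themes",
    "close_reading",
    "scholarly_connections",
}


def validate_analysis(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - result.keys())]

    # The prompt asks for three to five themes.
    if "themes" in result:
        themes = result["themes"]
        if not isinstance(themes, list) or not 3 <= len(themes) <= 5:
            problems.append("themes should be a list of three to five items")

    # The prompt asks for exactly two scholarly connections.
    if "scholarly_connections" in result:
        connections = result["scholarly_connections"]
        if not isinstance(connections, list) or len(connections) != 2:
            problems.append("scholarly_connections should be a list of two items")

    return problems
```

This only checks structure. Whether the analysis is any good still takes a human reader.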

Step 3: Load the primary source

I need a way to ingest the source. This script accepts a file path from the command line and loads the text into memory. I print the character count so I know how much context I am sending.

import argparse

def load_source(file_path):
    with open(file_path, "r", encoding="utf-8") as f:
        return f.read()

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Analyze a primary source.")
    parser.add_argument("file", help="Path to the text file")
    args = parser.parse_args()

    source_text = load_source(args.file)
    print(f"Loaded source: {len(source_text)} characters")
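
Character count is only a proxy for context size, since models measure input in tokens. A rough sketch of converting one to the other follows. Four characters per token is a common rule of thumb for English prose, but the real ratio depends on the model's tokenizer, so treat the result as an estimate. The helper name estimate_tokens is hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate the token count from the character count.

    chars_per_token=4.0 is a heuristic for English text, not a property
    of any particular model; adjust it for other languages or tokenizers.
    """
    return math.ceil(len(text) / chars_per_token)
```

Archaic spelling, OCR noise, and non-English sources usually tokenize less efficiently, so I would leave generous headroom below the model's context limit.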

Step 4: Assemble the JSON pipeline

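Now I combine the loader, the system prompt, and JSON mode (the response_format argument) into a single pipeline. The script below uses a condensed version of the Step 2 prompt; paste in the full version if you want the rule against anachronistic language enforced. I switch to kimi-k2.6 because its long context window and reasoning capabilities handle dense historical prose well. Because Oxlo.ai charges per request rather than per token, passing a full speech does not change the price. See https://oxlo.ai/pricing for current plans.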

import argparse
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

SYSTEM_PROMPT = """You are a humanities research assistant. Analyze the provided primary source text and return valid JSON.

Rules:
- Identify historical context, including date, author, and audience if inferable.
- Extract three to five major themes with direct quotes as evidence.
- Write one close-reading paragraph on a significant passage.
- Suggest two connections to broader scholarly debates.
- Output only JSON with keys: historical_context, themes, close_reading, scholarly_connections."""


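def analyze(file_path):
    # Read the whole source into memory as UTF-8 text.
    with open(file_path, "r", encoding="utf-8") as f:
        source_text = f.read()

    # JSON mode asks the API to reply with a single JSON object.
    response = client.chat.completions.create(
        model="kimi-k2.6",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Analyze this primary source:\n\n{source_text}"},
        ],
    )

    # Parse the reply into a dict; json.loads raises JSONDecodeError on malformed output.
    return json.loads(response.choices[0].message.content)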


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("file")
    args = parser.parse_args()

    result = analyze(args.file)
    print(json.dumps(result, indent=2))
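
The script calls json.loads directly on the reply, which raises an exception if the model wraps its answer in a Markdown code fence or returns malformed JSON. A slightly more defensive parser might look like the sketch below. The helper name parse_json_reply is hypothetical, and the fence-stripping step is a precaution rather than something I have observed this API to need.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker, built here to keep this listing readable


def parse_json_reply(raw: str) -> dict:
    """Parse a model reply into a dict, tolerating a surrounding code fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag)
        # and everything from the last closing fence onward.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model reply was not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("Expected a JSON object at the top level.")
    return parsed
```

In analyze, this would replace the final json.loads call, so a bad reply produces one clear error instead of a raw traceback.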

Run it

I saved the full script as humanities_agent.py, created a file named speech.txt containing an excerpt from Sojourner Truth's 1851 Akron address, and ran:

python humanities_agent.py speech.txt

The tool returned structured JSON in about three seconds. Representative output looks like this:

{
  "historical_context": "Delivered at the Women's Rights Convention in Akron, Ohio, in 1851. Sojourner Truth, a formerly enslaved Black woman, spoke to a predominantly white audience to challenge prevailing assumptions about gender and race.",
  "themes": [
    {
      "theme": "Racial and gender intersectionality",
      "evidence": "I could work as much and eat as much as a man, when I could get it, and bear the lash as well, and ain't I a woman?"
    },
    {
      "theme": "Religious authority",
      "evidence": "Where did your Christ come from? From God and a woman. Man had nothing to do with Him."
    },
    {
      "theme": "Labor and bodily autonomy",
      "evidence": "I have plowed and reaped and husked and chopped and mowed, and can any man do more than that?"
    }
  ],
  "close_reading": "The refrain 'ain't I a woman?' functions as both rhetorical question and demand for recognition. By repeating the phrase after cataloging her physical labor, Truth forces the audience to confront the dissonance between their ideological commitment to women's fragility and the material reality of her enslaved body. The grammatical variation, 'ain't,' signals vernacular resistance to middle-class linguistic norms.",
  "scholarly_connections": [
    "Resonates with Angela Davis's critique of the exclusion of Black women from nineteenth-century suffrage narratives.",
    "Connects to recent historiography on enslaved women's labor and the gendered economics of the antebellum North."
  ]
}
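
Language models can paraphrase a passage or draw a quotation from a different version of the text, so I would not cite any evidence field without checking it against the file. The sketch below flags quotes that do not appear verbatim in the source, ignoring differences in case, whitespace, and curly versus straight quotation marks. The names normalize and unverified_quotes are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse runs of whitespace."""
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def unverified_quotes(result: dict, source_text: str) -> list[str]:
    """Return every evidence quote that is empty or absent from the source."""
    haystack = normalize(source_text)
    missing = []
    for theme in result.get("themes", []):
        quote = theme.get("evidence", "")
        if not quote.strip() or normalize(quote) not in haystack:
            missing.append(quote)
    return missing
```

An empty list means every quote was found. Anything returned needs a manual look before it goes into your notes. The match is a plain substring test, so a quote the model has shortened or joined from two passages will be flagged as missing.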

Next steps

This pipeline is a starting point, not a finished product. Two concrete next steps follow. First, wire the JSON output into a Zotero note or Obsidian vault so your annotations live next to your bibliography. Second, batch the script over a directory of OCR text files to generate a searchable index of themes across an entire archival collection.
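
The batch step can be sketched as a function that walks a directory and maps each theme label to the files where it appears. The name build_theme_index is hypothetical. It takes the analysis function as an argument, so it works with analyze from Step 4 or with a stub while testing, and it assumes each theme is an object with a "theme" key, as in the representative output.

```python
from collections import defaultdict
from pathlib import Path
from typing import Callable


def build_theme_index(directory: str, analyze: Callable[[str], dict]) -> dict[str, list[str]]:
    """Map each lowercased theme label to the .txt files that mention it."""
    index = defaultdict(list)
    for path in sorted(Path(directory).glob("*.txt")):
        result = analyze(str(path))
        for theme in result.get("themes", []):
            if not isinstance(theme, dict):
                continue  # Skip entries that do not match the expected shape.
            label = theme.get("theme", "").strip().lower()
            if label:
                index[label].append(path.name)
    return dict(index)
```

The model words its theme labels freely, so the same idea may appear under several labels. Grouping them by exact string is only a first pass, and some manual merging of near-duplicates is likely needed.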