shashank ms

LLM Training for Corporate Applications

We are building a corporate policy question-answering pipeline on Oxlo.ai that turns raw HR documents into validated training examples and deploys them as a grounded support agent. This pattern helps any team bootstrap an internal knowledge base without standing up dedicated training infrastructure or worrying about per-token costs on long documents.

What you'll need

You will need Python 3.9 or later, the openai package (pip install openai), and an Oxlo.ai API key.

Step 1: Ingest raw policy text

I start with a plain-text expense policy. In production you would parse PDFs or Slack exports, but a hardcoded string keeps the tutorial reproducible.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

POLICY_TEXT = """
Expense Reimbursement Policy
Employees may expense up to $50 per day for meals while traveling.
Receipts are required for any single expense over $25.
Hotel bookings must be pre-approved by your direct manager.
"""

print(f"Loaded policy: {len(POLICY_TEXT)} characters")
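For anything beyond a demo, the hardcoded string could be swapped for files on disk. The following is a minimal sketch, assuming the policies live as .txt or .md files in one directory; the function name load_policy_text and the "# Source:" prefix are my own illustrative choices, and PDF parsing is deliberately left out.

```python
from pathlib import Path


def load_policy_text(directory: str) -> str:
    """Concatenate every .txt and .md file under `directory` into one string."""
    parts = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".txt", ".md"}:
            body = path.read_text(encoding="utf-8").strip()
            # Prefix each document with its file name so answers can be traced back.
            parts.append(f"# Source: {path.name}\n{body}")
    return "\n\n".join(parts)
```

Sorting the paths keeps the combined text stable between runs, which makes regenerated examples easier to compare.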

Step 2: Generate training Q&A pairs

I use DeepSeek V3.2 on Oxlo.ai to read the policy and emit structured question-answer pairs. Because Oxlo.ai charges per request rather than per token, I can pass the full document into the prompt without its length affecting cost, as long as it still fits in the model's context window.

from openai import OpenAI
import json

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

generation_prompt = f"""
Read the following corporate policy and generate exactly 3 question-answer pairs suitable for training a support agent.
Output ONLY a JSON array of objects with keys "question" and "answer". Do not use markdown code fences.

Policy:
{POLICY_TEXT}
"""

response = client.chat.completions.create(
    model="deepseek-v3.2",
    messages=[
        {"role": "system", "content": "You emit only valid JSON arrays."},
        {"role": "user", "content": generation_prompt},
    ],
)

raw = response.choices[0].message.content.strip()
if raw.startswith("```"):
    # Strip a markdown fence and an optional leading "json" tag (str.removeprefix needs Python 3.9+).
    raw = raw.split("```")[1].removeprefix("json").strip()

qa_pairs = json.loads(raw)
print(json.dumps(qa_pairs, indent=2))

Step 3: Validate the examples

Before I trust these examples, I run a validation pass with Qwen 3 32B. Any pair that adds unsourced details gets filtered out.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

def validate_pair(pair):
    validation_prompt = f"""
    A training example was extracted from the following policy.
    Reply with VALID if the answer is fully supported by the policy text.
    Reply with INVALID if the answer adds unsourced details or contradicts the text.

    Policy:
    {POLICY_TEXT}

    Question: {pair['question']}
    Answer: {pair['answer']}
    """

    response = client.chat.completions.create(
        model="qwen-3-32b",
        messages=[
            {"role": "system", "content": "You are a strict validation engine. Reply with exactly one word: VALID or INVALID."},
            {"role": "user", "content": validation_prompt},
        ],
    )
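    verdict = response.choices[0].message.content.strip().upper()
    # Tolerate casing and trailing punctuation; "INVALID" does not start with "VALID".
    return verdict.startswith("VALID")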

validated = [p for p in qa_pairs if validate_pair(p)]
print(f"Kept {len(validated)} of {len(qa_pairs)} examples.")
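A cheap deterministic check can run before the model-based validation to catch one common kind of unsourced detail: a dollar amount that never appears in the policy. The sketch below is an assumption of mine, not part of the original pipeline. The name unsupported_amounts is hypothetical, and the pattern only handles simple amounts such as $25 or $25.00, not values with thousands separators.

```python
import re

# Matches simple dollar amounts such as $25 or $25.00.
AMOUNT_PATTERN = re.compile(r"\$\d+(?:\.\d{2})?")


def unsupported_amounts(answer: str, policy_text: str) -> set[str]:
    """Return dollar amounts that appear in the answer but not in the policy."""
    in_answer = set(AMOUNT_PATTERN.findall(answer))
    in_policy = set(AMOUNT_PATTERN.findall(policy_text))
    return in_answer - in_policy
```

A pair whose answer yields a non-empty set could be dropped without spending a request on it.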

Step 4: Assemble the agent prompt

I inject the validated examples directly into the system prompt as few-shot context. This grounds the agent without requiring a separate fine-tuning job.

examples_block = "\n\n".join(
    f"Q: {p['question']}\nA: {p['answer']}" for p in validated
)

SYSTEM_PROMPT = f"""You are a corporate policy assistant.
Answer employee questions using ONLY the provided policy context.
If the answer is not in the context, say 'I do not have that information.'
Use the following verified examples to guide your tone and format.

{examples_block}

Policy context:
{POLICY_TEXT}
"""

Step 5: Deploy the QA agent

The final agent sends the employee question along with the grounded system prompt to Llama 3.3 70B on Oxlo.ai.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

def ask_agent(question):
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    question = "Do I need a receipt for a $30 dinner?"
    # Echo the question so the output shows what was asked.
    print(f"\n{question}")
    print(ask_agent(question))
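Before deploying, the placeholder key would normally move out of the source file. Here is a minimal sketch, assuming the key is stored in an environment variable; the variable name OXLO_API_KEY and the helpers get_api_key and make_client are hypothetical names of mine, not something Oxlo.ai prescribes.

```python
import os


def get_api_key(env_var: str = "OXLO_API_KEY") -> str:
    """Read the API key from the environment and fail early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running the agent")
    return key


def make_client():
    """Build the client used in this tutorial with the key from the environment."""
    # Imported here so get_api_key can be used without the openai package installed.
    from openai import OpenAI

    return OpenAI(base_url="https://api.oxlo.ai/v1", api_key=get_api_key())
```

With this in place, each step could call make_client() instead of repeating the key.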

Run it

Running the script end to end produces the synthetic training data, filters it, and answers the question.

Loaded policy: 212 characters
[
  {
    "question": "What is the daily meal expense limit while traveling?",
    "answer": "Employees may expense up to $50 per day for meals while traveling."
  },
  {
    "question": "When are receipts required for expenses?",
    "answer": "Receipts are required for any single expense over $25."
  },
  {
    "question": "Do hotel bookings need approval?",
    "answer": "Yes, hotel bookings must be pre-approved by your direct manager."
  }
]
Kept 3 of 3 examples.

Do I need a receipt for a $30 dinner?
Yes, receipts are required for any single expense over $25, so a $30 dinner requires a receipt.

Next steps

Replace the hardcoded POLICY_TEXT with a directory of Markdown or PDF files and regenerate the examples nightly. When you scale past a single document, add a vector store such as pgvector or Chroma so the agent retrieves only the relevant chunks.
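To illustrate the retrieval idea without any external service, here is a small stand-in that ranks chunks by word overlap with the question. It is not a vector store and uses no embeddings; pgvector or Chroma would replace it in practice. The helpers tokenize and top_chunks are hypothetical names introduced for this sketch.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word-like tokens."""
    return set(re.findall(r"[a-z0-9$]+", text.lower()))


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return up to k chunks sharing the most words with the question."""
    question_tokens = tokenize(question)
    scored = [
        (len(question_tokens & tokenize(chunk)), index, chunk)
        for index, chunk in enumerate(chunks)
    ]
    # Drop chunks with no overlap, then sort by score (desc) and original order.
    scored = [entry for entry in scored if entry[0] > 0]
    scored.sort(key=lambda entry: (-entry[0], entry[1]))
    return [chunk for _, _, chunk in scored[:k]]
```

Only the returned chunks would then go into the "Policy context" part of the system prompt, instead of the full document.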

For teams processing long employee handbooks or meeting transcripts, Oxlo.ai's request-based pricing stays flat regardless of prompt length. You can feed entire documents into the context window without the cost scaling typical of token-based providers. See https://oxlo.ai/pricing for details.
