q2408808

Posted on Mar 28

LLMs Don't Grade Essays Like Humans — But Here's What They're Actually Good At (API Tutorial)

#ai #python #machinelearning #api

arXiv Bombshell: LLMs Fail at Essay Grading

On March 24, 2026, researchers published a paper making waves in academic and developer circles: "LLMs Do Not Grade Essays Like Humans". The study evaluated GPT and Llama family models on automated essay scoring (AES) in out-of-the-box settings — no fine-tuning, no task-specific prompting.

The finding: agreement between LLM scores and human scores remains relatively weak. LLMs tend to assign higher scores to short or underdeveloped essays, while penalizing longer essays with minor grammatical errors. The models follow internally coherent patterns, but those patterns don't align with how human raters actually think.

What This Means for Developers

This doesn't mean LLMs are useless for education or writing tools. It means developers need to use them for the right tasks.

What LLMs ARE reliable for in writing contexts:

Essay generation and variation — Creating draft content, generating multiple versions, producing training data at scale
Writing assistance (not grading) — Suggesting improvements, identifying structural weaknesses, offering alternative phrasings
Summarization — Condensing long essays into key points reliably
Feedback drafting — Generating constructive comments that a human teacher can review and approve
Content automation at scale — Producing e-learning content, quiz questions, and study guides

The paper itself notes: "LLMs produce feedback that is consistent with their grading and that they can be reliably used in supporting essay scoring." Key word: supporting, not replacing.

3 Developer Use Cases That Actually Work

1. Generate Essay Drafts for Training Datasets

If you're building an AES system, you need training data. LLMs can generate thousands of essay variations at different quality levels for a fraction of human writer costs.

2. Build Writing Assistance Tools (Not Graders)

The research confirms LLMs are good at generating internally consistent feedback. Build tools that suggest improvements and flag weak arguments — frame it as "AI writing coach," not "AI grader."

3. Automated Content Generation for E-Learning

E-learning platforms need massive amounts of content: practice prompts, model answers, study guides. LLMs excel here. At NexaAPI's pricing, you can process thousands of content generation requests for dollars.

Build an AI Essay Assistant in 10 Lines of Code

Python — AI Writing Coach

from nexaapi import NexaAPI

client = NexaAPI(api_key='YOUR_API_KEY')

# Generate essay feedback — coaching, not grading
response = client.chat.completions.create(
    model='gpt-4o',  # Check nexa-api.com for latest models
    messages=[
        {
            'role': 'system',
            'content': 'You are a writing coach. Provide constructive feedback on essays, '  
                       'focusing on structure, clarity, and argument strength. '  
                       'Do not assign numeric grades.'
        },
        {
            'role': 'user',
            'content': 'Please review this essay and suggest improvements: [ESSAY TEXT HERE]'
        }
    ],
    max_tokens=500
)

print(response.choices[0].message.content)
# Cost: fraction of a cent per request via NexaAPI

JavaScript — AI Writing Feedback API

// Install: npm install nexaapi
import NexaAPI from 'nexaapi';

const client = new NexaAPI({ apiKey: 'YOUR_API_KEY' });

async function getEssayFeedback(essayText) {
  const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: 'You are a writing coach. Provide constructive feedback, do not assign grades.'
      },
      {
        role: 'user',
        content: `Please review this essay: ${essayText}`
      }
    ],
    maxTokens: 500
  });
  return response.choices[0].message.content;
}

getEssayFeedback('Your essay text here...').then(console.log);
// npm install nexaapi — cheapest LLM API on the market

Why NexaAPI for Ed-Tech

Provider	Cost	Free Tier	Models
NexaAPI	Cheapest available	✅ Yes	56+
OpenAI Direct	$2.50/1M tokens	❌ No	~15
Anthropic Direct	$3.00/1M tokens	❌ No	~8

NexaAPI is OpenAI-compatible — just change the base URL and API key. No code rewrite needed.

The Smart Developer's Response

The arXiv paper isn't a reason to avoid LLMs in education. It's a roadmap for using them correctly. Don't build AI graders. Build AI writing coaches, content generators, and feedback assistants.

🌐 nexa-api.com — Free API key, no credit card required
⚡ RapidAPI — Try on RapidAPI
🐍 pip install nexaapi — PyPI
📦 npm install nexaapi — npm

Reference: arXiv:2603.23714 — "LLMs Do Not Grade Essays Like Humans" (Barbosa et al., March 24, 2026)

DEV Community