1. Introduction
Today many people use AI to write text. AI can write blogs, homework, emails, stories, and almost anything. This is very cool, but it also brings a big problem: how do we know if a human wrote the text or if AI wrote the text?
So we need a tool. This tool is called an AI Checker.
An AI Checker reads text and tries to guess:
- “Is this text written by a human?”
- “Is this text written by AI?”
- “Is this text copied from somewhere?”
- “Is this text high quality?”
This blog will show you how to build your own AI Checker. We will use very simple English, but we will also use developer ideas, like code, models, APIs, and pipelines.
2. High-Level System Architecture
2.1 Core Components
- Ingestion Layer This part takes the text. The text can come from:
- copy & paste
- file upload
- API call
- a big batch of text
- Detection Engine This part is the “brain”. It reads the text and tries to predict:
- AI or human
- percent of AI text
- strange patterns
- Feature Analysis Module This part looks at the text like a small scientist. It checks:
- how random the words are
- how long each sentence is
- how hard or easy the text is
- many other small things
- Model Ensemble This part mixes predictions from different models. One model may look at grammar, one at word patterns, and one at AI style. When they join, the final result is stronger.
- Report Generator This part makes a simple result report. It shows scores and highlights.
- REST API Layer Apps can talk to your AI Checker through an API. Developers like APIs because they are easy to use.
- Frontend Dashboard A simple web page so users can upload text and see results.
2.2 Architecture Diagram (ASCII)
                +---------------------+
                |     Frontend UI     |
                +----------+----------+
                           |
                           v
                +----------+----------+
                |      REST API       |
                +----------+----------+
                           |
            +--------------+--------------+
            |                             |
            v                             v
    +-------+-------+            +--------+--------+
    |   Ingestion   |            | Report Generator|
    +-------+-------+            +--------+--------+
            |                             ^
            v                             |
    +-------+-------+            +--------+--------+
    |    Feature    |   ----->   | Model Ensemble  |
    |  Extraction   |            |  (AI Detector)  |
    +---------------+            +-----------------+
This diagram is simple, but it shows the idea.
2.3 Tech Stack Options
You can use many languages or tools, but here are easy ones:
- Python for machine learning
- FastAPI for API
- PyTorch for models
- Transformers library for embeddings
- Redis for caching
- PostgreSQL for storage
- Docker for deployment
This stack is simple but very strong.
3. Building the Detection Engine
Now let’s build the “brain”.
An AI Checker needs to “feel” the text.
It cannot just read the text like a human; it needs to measure things.
So we use three big ideas:
- Statistical signals
- Stylometric features
- Semantic patterns
Then we mix them together.
3.1 Input Preprocessing
Before we check the text, we clean it.
Steps:
- lower case
- remove extra spaces
- split into sentences
- split into tokens (words)
- maybe remove numbers
- maybe remove emoji
Simple Python example:
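import re
import nltk  # needs the Punkt data, e.g. nltk.download("punkt") ("punkt_tab" on newer NLTK versions)
def clean_text(text):
    # Collapse runs of whitespace into single spaces
    text = re.sub(r"\s+", " ", text).strip()
    # Split first, then lower-case: the sentence tokenizer uses capital letters as a hint
    sentences = nltk.sent_tokenize(text)
    return [s.lower() for s in sentences]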
This step makes the next steps easier.
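The step list also mentions splitting into tokens and maybe removing numbers and emoji, which clean_text does not do. Here is a minimal sketch of that part using only the standard library. The name tokenize_sentence and the drop_numbers flag are made up for this example, and it only understands the plain ASCII apostrophe:

```python
import re

# A "word" is a run of letters/digits, optionally joined by apostrophes ("don't").
# Emoji and punctuation are not word characters, so they are dropped.
WORD_RE = re.compile(r"\w+(?:'\w+)*")

def tokenize_sentence(sentence, drop_numbers=False):
    """Split one sentence into lower-case word tokens."""
    tokens = WORD_RE.findall(sentence.lower())
    if drop_numbers:
        # Remove tokens made only of digits, such as "3" or "2024"
        tokens = [t for t in tokens if not t.isdigit()]
    return tokens

print(tokenize_sentence("I don't like 3 rainy days 🙂."))
print(tokenize_sentence("I don't like 3 rainy days 🙂.", drop_numbers=True))
```

You could run this on each sentence that clean_text returns.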
3.2 Statistical Signals
AI text often has patterns.
It is usually very smooth, very clean, and too perfect.
Human text has more “noise”.
So we calculate:
- perplexity
- burstiness
- entropy
- token variance
Here is a very simple fake example of calculating “simple perplexity”:
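import math
def simple_perplexity(tokens):
    # Toy model: every token gets the same probability 1/N
    probs = []
    for t in tokens:
        p = 1.0 / len(tokens)
        probs.append(p)
    # Entropy in bits, then perplexity = 2 ** entropy
    entropy = -sum([p * math.log(p, 2) for p in probs])
    return 2 ** entropy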
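This is not real perplexity: every token gets the same probability, so the result is just the number of tokens. A real perplexity score uses token probabilities from a language model. But it shows the idea:
AI text often has lower perplexity (more predictable).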
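Two other signals from the list, entropy and burstiness, can be sketched with the standard library. This is only one possible definition. "Burstiness" has no single standard formula, so here it is assumed to be the standard deviation of sentence lengths divided by the mean. The function names are made up for this example:

```python
import math
from collections import Counter
from statistics import mean, pstdev

def token_entropy(tokens):
    """Shannon entropy (in bits) of the token frequency distribution."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    # p * log2(1/p) summed over all distinct tokens
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def burstiness(sentence_lengths):
    """Assumed definition: std of sentence lengths divided by their mean."""
    if not sentence_lengths:
        return 0.0
    avg = mean(sentence_lengths)
    return pstdev(sentence_lengths) / avg if avg else 0.0

print(token_entropy(["the", "cat", "the", "dog"]))
print(burstiness([5, 5, 5]), burstiness([2, 8]))
```

Text that repeats the same words has low entropy, and text where every sentence has the same length has burstiness 0.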
3.3 Stylometric Features
Stylometry means measuring “writing style”.
AI text often has a recognizable style:
- similar sentence lengths
- similar tone
- similar grammar
- often no strong emotion
Human writing is more messy.
We can check:
- average sentence length
- length variance
- POS tag distribution
- number of commas, periods, etc.
Example:
import numpy as np
def sentence_length_features(sentences):
    lengths = [len(s.split()) for s in sentences]
    return {
        "avg_len": np.mean(lengths),
        "var_len": np.var(lengths)
    }
This helps the model understand the “shape” of the writing.
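The punctuation check from the list can be sketched in the same way. This version counts ASCII punctuation only, and the feature names are assumptions for this example:

```python
import string
from collections import Counter

def punctuation_features(text):
    """Count ASCII punctuation marks and report their density."""
    counts = Counter(ch for ch in text if ch in string.punctuation)
    n_chars = max(len(text), 1)  # avoid dividing by zero on empty text
    return {
        "commas": counts[","],
        "periods": counts["."],
        "punct_per_100_chars": 100 * sum(counts.values()) / n_chars,
    }

print(punctuation_features("Hi, there. Ok!"))
```

These numbers can go into the same feature dictionary as avg_len and var_len.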
3.4 Semantic Pattern Detection
We use embeddings to understand meaning.
AI text often has:
- too-consistent tone
- very generic ideas
- repeating safe phrases
We can use SentenceTransformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
def get_vector(text):
    return model.encode([text])[0]
Then we can compare vectors or run clustering.
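One simple way to compare vectors is to measure how similar the sentences of one text are to each other. The sketch below is an assumption about how to turn "too-consistent tone" into a number, not a proven detector. It uses plain Python so it works with lists or with the arrays that get_vector returns:

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two vectors of the same length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all pairs of sentence vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0  # fewer than two sentences: nothing to compare
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

print(mean_pairwise_similarity([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
```

A value near 1.0 means all sentences point in almost the same direction, which may be a sign of very uniform text.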
3.5 Ensemble Model
After we get many features, we mix them.
Example of a tiny PyTorch classifier:
import torch
import torch.nn as nn
class SmallClassifier(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.fc = nn.Linear(input_size, 1)
    def forward(self, x):
        return torch.sigmoid(self.fc(x))
This classifier can learn from:
- perplexity
- burstiness
- style features
- embedding features
And produce a final score.
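If you already have several models that each give a score, a simpler way to mix them is a weighted average. This is a different technique from the learned classifier above, shown only as a sketch. The model names and weights are made up:

```python
def combine_scores(scores, weights=None):
    """Weighted average of per-model AI probabilities (each in 0.0..1.0)."""
    if not scores:
        raise ValueError("scores must not be empty")
    if weights is None:
        # Default: every model counts the same
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total

scores = {"grammar": 0.2, "word_patterns": 0.6, "ai_style": 0.8}
print(combine_scores(scores))
print(combine_scores(scores, {"grammar": 1.0, "word_patterns": 1.0, "ai_style": 2.0}))
```

The weights let you trust one model more than the others. A trained classifier learns these weights for you.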
4. Model Training Pipeline
To make the AI Checker smart, we need to train it.
Training a model is like teaching a child.
If you give the child good examples, the child learns.
If you give the child bad examples, the child becomes confused.
An AI Checker needs many, many samples.
We need both:
- Human text
- AI text
Then we show the model:
“This is human text.”
“This is AI text.”
“This is maybe AI text.”
“This is maybe human text.”
Over time, the model learns patterns.
4.1 Dataset Strategy
We need a large dataset.
Human text can come from:
- blogs
- books
- emails
- essays
- forums
- Wikipedia
- news sites
AI text can come from:
- OpenAI models (GPT-3.5 / GPT-4 / GPT-4o)
- Claude (Anthropic)
- Gemini
- LLaMA
- DeepSeek
- Mistral models
- Other generative systems
We should collect AI text in many styles:
- story style
- academic style
- SEO style
- short answer
- long answer
- technical writing
This helps the model understand many patterns.
4.2 Labeling the Dataset
This step is simple:
- If the text is written by a machine → label: AI
- If written by a human → label: HUMAN
Sometimes we also give a score:
- 0.0 = pure human
- 1.0 = pure AI
- 0.5 = mixed
This helps the model give percentage results.
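One way to store these labels is a small record with a score field. The sketch below is an assumption, not a fixed format. The names LabeledSample, ai_fraction, mixed_sample, and the extra MIXED label are invented for this example. For a mixed text, the score is the share of sentences that came from AI:

```python
from dataclasses import dataclass

@dataclass
class LabeledSample:
    text: str
    ai_fraction: float  # 0.0 = pure human, 1.0 = pure AI

    @property
    def label(self):
        if self.ai_fraction == 0.0:
            return "HUMAN"
        if self.ai_fraction == 1.0:
            return "AI"
        return "MIXED"  # hypothetical third label for partly-AI text

def mixed_sample(human_sentences, ai_sentences):
    """Join human and AI sentences and score the result by sentence share."""
    total = len(human_sentences) + len(ai_sentences)
    if total == 0:
        raise ValueError("need at least one sentence")
    text = " ".join(human_sentences + ai_sentences)
    return LabeledSample(text, len(ai_sentences) / total)

sample = mixed_sample(["I wrote this myself."], ["This sentence was generated."])
print(sample.label, sample.ai_fraction)
```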
4.3 Training Loop
Here is a simple PyTorch training loop:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
def train(model, dataset, epochs=3):
    # Assumes each dataset item is (features, label) with label 0 or 1
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.BCELoss()
    for epoch in range(epochs):
        for x, y in loader:
            y_pred = model(x).squeeze(1)  # (batch, 1) -> (batch,)
            # BCELoss needs float targets with the same shape as the predictions
            loss = loss_fn(y_pred, y.float())
            opt.zero_grad()
            loss.backward()
            opt.step()
This loop makes the model better each time.
4.4 Validation & Testing
We must check if our model is right.
We test:
- Accuracy
- Precision
- Recall
- False positives
- False negatives
There are two kinds of mistakes:
- A model that marks human text as AI (false positive) → bad
- A model that marks AI text as human (false negative) → bad
We need balance.
We often test on new text that the model has never seen before.
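These numbers are easy to compute by hand. Here is a minimal sketch with no extra libraries, assuming the labels are 1 for AI and 0 for human. The function name is made up for this example:

```python
def detection_metrics(y_true, y_pred):
    """Basic metrics for a binary AI detector. Labels: 1 = AI, 0 = human."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # AI caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # human marked as AI
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # AI marked as human
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # human passed

    def ratio(a, b):
        return a / b if b else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
    }

print(detection_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```

For an AI Checker, the false positive rate is worth watching closely, because it counts human writers who are wrongly flagged.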
4.5 Continuous Update Pipeline
The world changes fast.
New AI models appear every month.
So our AI Checker must also improve.
A good plan is:
- Every month: collect new AI text
- Every month: collect new human text
- Every month: retrain
- Every month: re-deploy new version
This is called continuous learning (some teams call it continuous training).
5. Building the REST API (FastAPI Example)
Now we make our model available to users.
A REST API lets:
- apps
- websites
- extensions
- scripts
talk to our AI Checker.
FastAPI is simple and fast.
5.1 Basic Structure
Here is a simple FastAPI app:
from fastapi import FastAPI
from pydantic import BaseModel
app = FastAPI()
class TextInput(BaseModel):
    text: str
@app.post("/detect")
def detect_ai(data: TextInput):
    # run_model is your own function that returns a score from 0.0 to 1.0
    score = run_model(data.text)
    return {"ai_score": score}
This is all you need to start.
5.2 /detect Endpoint
This endpoint takes text and returns AI probability.
Example result:
{
  "ai_score": 0.87
}
Meaning: the model estimates an 87% chance that this text was made by AI.
5.3 /quality-score Endpoint
You can also check text quality.
@app.post("/quality-score")
def check_quality(data: TextInput):
    # get_quality_score is your own function; "data" avoids shadowing the built-in input()
    q = get_quality_score(data.text)
    return {"quality": q}
Quality can be:
- clarity
- grammar
- sentence flow
- keyword strength
5.4 /plagiarism-check Endpoint
This endpoint checks if text is copied.
@app.post("/plagiarism-check")
def plagiarism(data: TextInput):
    # run_plagiarism is your own function that returns a similarity rate from 0.0 to 1.0
    result = run_plagiarism(data.text)
    return {"rate": result}
If rate = 0.45 → 45% similar
If rate = 0.95 → almost fully copied
5.5 /explain Endpoint
This endpoint returns a sentence-level heatmap: one AI probability for each sentence.
Example:
{
  "sentences": [
    {"text": "This is a test.", "ai_prob": 0.80},
    {"text": "I like pizza.", "ai_prob": 0.12}
  ]
}
Users love this feature because they want to see why the model thinks it's AI.
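The logic behind such a response can be sketched like this. The function explain and its score_sentence argument are hypothetical names. score_sentence stands for whatever model gives one sentence a score, and the regex split is a rough stand-in for a real sentence tokenizer:

```python
import re

def explain(text, score_sentence):
    """Score each sentence on its own and return a heatmap-style dict."""
    # Rough split: break after ., ! or ? when followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return {
        "sentences": [
            {"text": s, "ai_prob": round(float(score_sentence(s)), 2)}
            for s in sentences
        ]
    }

# Dummy scorer for the demo only: longer sentences get a higher score
demo = explain("This is a test. I like pizza.", lambda s: min(len(s) / 20, 1.0))
print(demo)
```

In the API, the endpoint would call this function with the real model and return the dict as JSON.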
6. Frontend Dashboard (Simple Version)
Now we make a simple UI so humans can use the tool.
A simple React component:
import { useState } from "react";
function Checker() {
  const [text, setText] = useState("");
  const [result, setResult] = useState(null);
  async function sendText() {
    const res = await fetch("/detect", {
      method: "POST",
      body: JSON.stringify({text}),
      headers: {"Content-Type": "application/json"}
    });
    setResult(await res.json());
  }
  return (
    <div>
      <textarea value={text} onChange={e => setText(e.target.value)} />
      <button onClick={sendText}>Check</button>
      {result && <p>AI Score: {result.ai_score}</p>}
    </div>
  );
}
Very basic, but it works.
7. Adding Plagiarism Detection
Plagiarism means “copying someone else’s words”.
A good AI Checker also needs this feature.
We use two kinds:
- String-based plagiarism
- Semantic plagiarism
7.1 String-Based Plagiarism
Very simple idea:
Compare the text to a big database of many documents.
You can use:
- n-grams
- shingling
- hashing
- cosine similarity
Example:
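def simple_similarity(a, b):
    set_a = set(a.split())
    set_b = set(b.split())
    union = set_a | set_b
    # Jaccard similarity: shared words / all words (0.0 if both texts are empty)
    return len(set_a & set_b) / len(union) if union else 0.0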
This is simple but not enough.
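One reason it is not enough: simple_similarity ignores word order. Shingling, from the list above, fixes part of this by comparing short runs of words instead of single words. Here is a minimal sketch. The function names and the choice of 3 words per shingle are assumptions:

```python
def shingles(text, n=3):
    """Return the set of n-word shingles (word n-grams) in the text."""
    words = text.lower().split()
    if len(words) < n:
        # Text shorter than one shingle: treat the whole text as one shingle
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shingle_similarity(a, b, n=3):
    """Jaccard similarity between the shingle sets of two texts."""
    sa, sb = shingles(a, n), shingles(b, n)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

original = "the cat sat on the mat"
shuffled = "the mat sat on the cat"
print(shingle_similarity(original, original))
print(shingle_similarity(original, shuffled))
```

The two sentences use exactly the same words, so a word-set comparison calls them identical. The shingle score is much lower because only one 3-word run is shared.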
7.2 Semantic Plagiarism
This checks meaning, not just words.
We use embeddings again.
If two texts have similar meaning, cosine similarity is high:
from sklearn.metrics.pairwise import cosine_similarity
def semantic_score(v1, v2):
    return cosine_similarity([v1], [v2])[0][0]
This can detect text that was rewritten with AI rewrite tools.
Many students use AI to rewrite copied text.
Semantic models can detect this.
7.3 Using Elasticsearch or Pinecone
For big databases, we must use vector search.
These tools:
- Elasticsearch
- Pinecone
- Weaviate
- Qdrant
They can store millions of vectors.
Then we search fast.
8. Multi-Language Support
To support many languages, like:
- Chinese
- Japanese
- Korean
- Spanish
- French
- German
we must change:
- tokenizers
- embedding models
- training datasets
Each language has different patterns.
For example:
- Chinese has no spaces between words
- English uses many small words
- Japanese uses kanji and kana
- Spanish often has longer sentences
We must fine-tune per language.
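The tokenizer problem is easy to see in code. Splitting on spaces works for English but returns one giant "word" for Chinese. The sketch below is a rough fallback that treats each Chinese character or kana as its own token. It is an assumption for illustration only. Real systems normally use a proper word segmenter for each language. The function names are made up:

```python
def is_cjk(ch):
    """True for Han ideographs and Japanese kana (a simplified range check)."""
    code = ord(ch)
    return (0x4E00 <= code <= 0x9FFF       # CJK Unified Ideographs
            or 0x3040 <= code <= 0x30FF)   # Hiragana and Katakana

def simple_tokenize(text):
    """Words for space-separated scripts, single characters for CJK."""
    tokens, word = [], []
    for ch in text:
        if is_cjk(ch):
            if word:
                tokens.append("".join(word))
                word = []
            tokens.append(ch)          # each CJK character is one token
        elif ch.isalnum():
            word.append(ch)            # keep building the current word
        elif word:
            tokens.append("".join(word))  # space or punctuation ends the word
            word = []
    if word:
        tokens.append("".join(word))
    return tokens

print(simple_tokenize("I like 寿司 a lot"))
```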
9. Performance Optimization
If many users use your API, it must be fast.
Here are tips:
- Cache model results
- Use GPU if possible
- Batch requests
- Preload model to memory
- Use Redis queue
- Use async FastAPI
Example async API:
@app.post("/detect")
async def detect_ai(data: TextInput):
    score = await run_async_model(data.text)
    return {"ai_score": score}
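The first tip, caching model results, can be sketched like this. It uses an in-process dict as a stand-in. With many servers you would put the same idea in a shared store such as Redis. The class name ScoreCache and its parameters are made up for this example:

```python
import hashlib

class ScoreCache:
    """Remember scores by text hash so the model runs once per unique text."""

    def __init__(self, score_fn, max_items=1024):
        self._score_fn = score_fn
        self._max_items = max_items
        self._store = {}  # dicts keep insertion order, oldest entry first

    def get(self, text):
        # Hash the text so the cache does not keep the raw user text as a key
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._store:
            return self._store[key]
        score = self._score_fn(text)
        if len(self._store) >= self._max_items:
            self._store.pop(next(iter(self._store)))  # evict the oldest entry
        self._store[key] = score
        return score

cache = ScoreCache(lambda text: 0.87)  # dummy model for the demo
print(cache.get("same text"), cache.get("same text"))
```

The second call returns the stored score without running the model again.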
10. Deployment
You can deploy with Docker:
Dockerfile
FROM python:3.10
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Then:
docker build -t ai-checker .
docker run -p 8000:8000 ai-checker
For cloud, use:
- AWS
- GCP
- Azure
- Railway
- Render
- Cloudflare Workers (for frontend)
11. Security & Privacy
Very important.
You must protect user data.
Rules:
- Do not store text, or store it only with user permission
- Remove logs
- No selling data
- Use HTTPS
- Add rate limit to block bots
Example (slowapi is one library that adds rate limits to FastAPI apps):
from slowapi import Limiter
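To show the idea behind a rate limit, here is a small sliding-window limiter written with the standard library only. This is not the slowapi API. It is a sketch of the logic, and the class and parameter names are made up:

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock                 # injectable so tests can fake time
        self._hits = defaultdict(deque)     # client id -> request timestamps

    def allow(self, client_id):
        now = self._clock()
        hits = self._hits[client_id]
        # Forget requests that are older than the window
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False                    # over the limit: reject
        hits.append(now)
        return True

limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60)
print([limiter.allow("1.2.3.4") for _ in range(3)])
```

In an API, client_id could be the caller's IP address or API key, and a rejected request would get an HTTP 429 response.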
12. Future Extensions
Future ideas:
- detect AI images
- detect AI audio
- detect AI video
- detect AI code
- browser extensions
- WordPress plugin
- API for LMS (school systems)
AI will grow fast.
AI detection must grow too.
13. Conclusion
We now have a full AI Checker:
- architecture
- models
- features
- API
- frontend
- training
- deployment
We used simple English, but real tech.
Now any developer can build an AI Checker.