DEV Community: Yiğit Erdoğan

Detecting Objects in Satellite Imagery with YOLOv8: xView + DOTA to YOLO in Practice

Yiğit Erdoğan — Tue, 28 Jul 2026 10:50:03 +0000

Satellite imagery breaks most of the assumptions object detectors are built on. Objects are tiny (a car is 10–15 px in a 3000×3000 tile), classes are wildly imbalanced, and the annotations don't come in anything close to YOLO format.

I spent a while building dl_xview_yolo — a YOLOv8 pipeline for detecting planes, ships, vehicles, bridges and storage tanks in aerial and satellite images, trained on xView and DOTA. This post is about the parts that actually took time: the data conversion, the training config, and the mistakes worth avoiding.

The real work is the data conversion

Nobody tells you that ~80% of a detection project is reshaping labels. The two datasets couldn't be more different:

xView ships a single giant .geojson file. Every object is a feature whose properties may hold bounds_imcoords, or bbox, or xmin/ymin/xmax/ymax, with inconsistent casing, and the images are 16-bit GeoTIFFs.
DOTA uses one .txt per image, with oriented boxes: 8 numbers (four corner points) plus a class name.

So convert_all_to_yolo.py ended up being a tolerant parser rather than a clean one:

def extract_bbox_from_props(props: dict):
    # 1) bounds_imcoords: "x1,y1,x2,y2"
    for key in props:
        lk = key.lower()
        if "bounds" in lk and "imcoords" in lk:
            bb = parse_bounds_string(props[key])
            if bb:
                return bb
    # 2) bbox as list or string
    # 3) xmin/ymin/xmax/ymax, x_min/..., left/top/right/bottom
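
The parse_bounds_string helper called above isn't shown in the post. A minimal sketch of what such a helper could look like, assuming the value is either an "x1,y1,x2,y2" string or a four-item list (the body here is my assumption, not the repo's code):

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]


def parse_bounds_string(value) -> Optional[Box]:
    """Parse 'x1,y1,x2,y2' (or a 4-item sequence) into (xmin, ymin, xmax, ymax).

    Hypothetical implementation: returns None instead of raising, so the
    caller can fall through to the next property format.
    """
    if isinstance(value, str):
        parts = value.split(",")
    elif isinstance(value, (list, tuple)):
        parts = list(value)
    else:
        return None
    if len(parts) != 4:
        return None
    try:
        x1, y1, x2, y2 = (float(p) for p in parts)
    except (TypeError, ValueError):
        return None
    # Tolerate swapped corners, reject zero-area boxes
    xmin, xmax = sorted((x1, x2))
    ymin, ymax = sorted((y1, y2))
    if xmax <= xmin or ymax <= ymin:
        return None
    return xmin, ymin, xmax, ymax
```

Returning None on any malformed input keeps the "try the next format" flow in extract_bbox_from_props simple.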

Two things I'd do the same way again:

Normalize the imagery before anything else. GeoTIFFs are frequently 16-bit or single-channel, and cv2.imread (when asked to preserve bit depth, e.g. with cv2.IMREAD_UNCHANGED) will happily hand you an array that OpenCV's JPEG writer chokes on:

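import cv2
import numpy as np

def ensure_bgr_uint8(im):
    # Stretch 16-bit (or float) data into 0-255 before casting to uint8
    if im.dtype != np.uint8:
        im = cv2.normalize(im, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    # Promote single-channel rasters to 3-channel BGR
    if im.ndim == 2:
        im = cv2.cvtColor(im, cv2.COLOR_GRAY2BGR)
    return im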

Convert oriented boxes to axis-aligned early. DOTA's polygons become horizontal boxes via min/max on the corner coordinates — a real information loss for rotated ships and bridges, but it lets you train plain yolov8m instead of the OBB head. I started on yolov8m-obb.pt and dropped back to detection; the pipeline got dramatically simpler and the mAP was good enough for the use case.

xs = list(map(float, parts[0:8:2]))
ys = list(map(float, parts[1:8:2]))
xmin, ymin, xmax, ymax = min(xs), min(ys), max(xs), max(ys)

Everything then lands in the standard layout (images/{train,val} + labels/{train,val} + data.yaml). Boxes whose normalized coordinates fall outside [0, 1] are dropped rather than clamped and kept, which quietly removes a class of degenerate labels.
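
To make the drop-instead-of-clamp rule concrete, here is a rough sketch of turning a pixel box into a YOLO label line. The function name and the exact out-of-range test are assumptions for illustration; the repo may apply the check differently.

```python
from typing import Optional


def to_yolo_line(cls_id: int, xmin: float, ymin: float, xmax: float, ymax: float,
                 img_w: int, img_h: int) -> Optional[str]:
    """Return 'cls xc yc w h' with values normalized to [0, 1], or None to drop."""
    # Drop (rather than clamp) boxes that leave the image or have no area.
    # Checking in pixel space is equivalent to checking the normalized
    # corners against [0, 1] and avoids floating-point edge cases.
    if xmin < 0 or ymin < 0 or xmax > img_w or ymax > img_h:
        return None
    if xmax <= xmin or ymax <= ymin:
        return None
    xc = (xmin + xmax) / 2 / img_w
    yc = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


print(to_yolo_line(0, 100, 200, 300, 400, 1000, 1000))
# 0 0.200000 0.300000 0.200000 0.200000
```

Clamping would keep a sliver of a box that is mostly outside the tile; dropping trades a few lost labels for cleaner ones.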

Training config that survived contact with a single GPU

The full detection run:

model.train(
    data=str(DATA_YAML),
    epochs=100,
    imgsz=1024,        # small objects need resolution, not more epochs
    batch=4,           # 1024px is expensive; batch stays tiny
    optimizer="AdamW",
    lr0=0.0001,
    cos_lr=True,
    dropout=0.05,
    hsv_h=0.015, hsv_s=0.75, hsv_v=0.45,
    translate=0.15, scale=0.55,
    mixup=0.15, copy_paste=0.15,
    shear=0.0, perspective=0.0,   # nadir imagery: no perspective to simulate
    close_mosaic=30,
    patience=30,
)

The reasoning behind the non-obvious ones:

imgsz=1024, not 640. This is the single highest-impact knob for aerial data. Downscaling to 640 makes small vehicles disappear into a couple of pixels. The cost is that batch collapses to 4.
shear=0, perspective=0. Satellite views are roughly nadir. Simulating perspective distortion generates images the model will never see at inference.
copy_paste and mixup on. Class imbalance in xView is brutal; pasting rare instances into other tiles is cheaper than resampling.
close_mosaic=30. Mosaic augmentation is great early and harmful at the end — the last 30 epochs train on real, unstitched tiles.
cos_lr=True with a low lr0. Fine-tuning from COCO weights onto a domain this different punishes aggressive learning rates.
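
The resolution argument is easy to check with arithmetic. A small sketch, assuming the tile's long side is simply scaled down to imgsz and taking a 12 px car in a 3000 px tile as the example:

```python
def object_px_after_resize(object_px: float, tile_px: int, imgsz: int) -> float:
    """Approximate object size once the tile's long side is resized to imgsz."""
    return object_px * imgsz / tile_px


for imgsz in (640, 1024):
    print(imgsz, round(object_px_after_resize(12, 3000, imgsz), 2))
# 640 2.56
# 1024 4.1
```

Even at 1024 the car is only about four pixels wide, which is why tiling the source image is the usual next step when resolution alone isn't enough.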

Results on the validation split:

Metric	Value
mAP@0.5	0.54
mAP@0.5:0.95	0.36
Precision	0.67
Recall	0.71

Not state of the art, but a usable baseline — and the gap between mAP@0.5 and mAP@0.5:0.95 is exactly what you'd expect when boxes are small: localization is approximate even when detection is correct.

Serving it: one script, CLI and UI

predict_yolo.py does double duty. With --no-ui it's a plain batch inference script; without it, it boots a FastAPI app serving a drag-and-drop page that posts an image and gets back an annotated PNG:

@app.post("/predict")
async def predict_endpoint(file: UploadFile = File(...)):
    img = Image.open(io.BytesIO(await file.read())).convert("RGB")
    results = model.predict(source=np.array(img), imgsz=imgsz, conf=conf, device=device)
    return StreamingResponse(io.BytesIO(ndarray_to_png_bytes(results[0].plot())),
                             media_type="image/png")

Small detail that saves real time: the weights are discovered automatically instead of hardcoded.

bests = sorted(glob.glob(str(RUNS / "train" / "**" / "weights" / "best.pt"), recursive=True))

After twenty training runs you stop remembering which exp folder holds the good checkpoint.
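
One caveat with sorted(): it orders paths lexicographically, so "exp10" sorts before "exp2". A possible variant, sketched here under the assumption that the newest checkpoint is the one you want, picks by modification time instead (find_latest_best is a hypothetical helper name):

```python
from pathlib import Path
from typing import Optional


def find_latest_best(runs_dir: Path) -> Optional[Path]:
    """Return the most recently modified best.pt under runs_dir/train, if any."""
    candidates = list(runs_dir.glob("train/**/weights/best.pt"))
    if not candidates:
        return None
    # Modification time tracks recency; folder names do not necessarily
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

"Newest" is not the same as "best", so for anything beyond a demo it may be worth passing the weights path explicitly.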

Three things I'd tell my past self

1. Make your train/val split deterministic — and verify it. I split by hashing the filename:

def is_val_split(name: str) -> bool:
    return (hash(name) % 10) < 2

Python's hash() for strings is salted per process unless PYTHONHASHSEED is fixed. Re-run the conversion and images silently move between train and val — which means a model evaluated after a re-conversion may have trained on its own validation set. Use a stable hash instead:

import hashlib

def is_val_split(name: str) -> bool:
    digest = hashlib.md5(name.encode()).hexdigest()
    return int(digest, 16) % 10 < 2
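
For the "verify it" half, one option is to fingerprint the validation set and compare it between conversions. A sketch along those lines; val_fraction and split_fingerprint are hypothetical helpers, not part of the repo:

```python
import hashlib


def is_val_split(name: str) -> bool:
    digest = hashlib.md5(name.encode()).hexdigest()
    return int(digest, 16) % 10 < 2


def val_fraction(names) -> float:
    """Share of files that land in the validation split (expected near 0.2)."""
    names = list(names)
    return sum(is_val_split(n) for n in names) / len(names)


def split_fingerprint(names) -> str:
    """Order-independent hash of the validation file names.

    Store it next to data.yaml; if it changes after a re-conversion,
    the split moved and old metrics are no longer comparable.
    """
    val_names = sorted(n for n in names if is_val_split(n))
    return hashlib.sha256("\n".join(val_names).encode()).hexdigest()
```

Because the fingerprint depends only on file names, it stays the same across processes and machines as long as the dataset itself doesn't change.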

2. Unify your class taxonomy before merging datasets. xView maps type_id - 1 into a 60-class space; DOTA maps 15 class names into indices 0–14. Write both into the same labels/ directory and DOTA's plane collides with whatever xView class index 0 happens to be. Decide on one taxonomy — or keep the datasets in separate runs — before you spend GPU hours.
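
A unified taxonomy can be as small as two lookup tables that both resolve to one class list. The sketch below assumes the five target classes named at the top of this post; the DOTA names are real DOTA class names, while the xView ids are deliberately fake placeholders that would need to be replaced from the official xView class list.

```python
from typing import Optional

# Single source of truth for class indices written to labels/
UNIFIED_CLASSES = ["plane", "ship", "vehicle", "bridge", "storage-tank"]

DOTA_TO_UNIFIED = {
    "plane": "plane",
    "ship": "ship",
    "small-vehicle": "vehicle",
    "large-vehicle": "vehicle",
    "bridge": "bridge",
    "storage-tank": "storage-tank",
}

# PLACEHOLDER ids, not real xView type_ids: fill in from the xView class list
XVIEW_TO_UNIFIED = {
    1001: "plane",
    1002: "ship",
}

_TABLES = {"dota": DOTA_TO_UNIFIED, "xview": XVIEW_TO_UNIFIED}


def unified_index(dataset: str, raw_label) -> Optional[int]:
    """Map a dataset-specific label to the shared index, or None to skip it."""
    name = _TABLES[dataset].get(raw_label)
    return UNIFIED_CLASSES.index(name) if name is not None else None
```

Labels that map to None (DOTA's "harbor", for example) are simply not written, so index 0 means the same thing no matter which dataset a file came from.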

3. Don't hardcode absolute paths. Mine started as C:\Users\Asus\Desktop\dl_xview and it's the first thing anyone cloning the repo has to fix. Read roots from a config file or an environment variable from day one.
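
A minimal sketch of the environment-variable approach. The variable name DL_XVIEW_ROOT is made up for this example, and falling back to the current working directory is just one reasonable default:

```python
import os
from pathlib import Path


def project_root(env_var: str = "DL_XVIEW_ROOT") -> Path:
    """Resolve the project root from an environment variable.

    DL_XVIEW_ROOT is a hypothetical name; when it is unset, the current
    working directory is used.
    """
    value = os.environ.get(env_var)
    root = Path(value).expanduser() if value else Path.cwd()
    return root.resolve()


ROOT = project_root()
DATA_YAML = ROOT / "data.yaml"
RUNS = ROOT / "runs"
```

Every other path in the scripts can then be derived from ROOT, so a fresh clone works without edits.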

Try it

git clone https://github.com/Yigtwxx/dl_xview_yolo.git
cd dl_xview_yolo
pip install ultralytics opencv-python pillow tqdm numpy torch torchvision fastapi uvicorn

python scripts/convert_all_to_yolo.py     # xView/DOTA → YOLO
python scripts/train_yolo.py --epochs 100 # train
python scripts/val_yolo.py                # evaluate
python scripts/predict_yolo.py            # FastAPI UI at 127.0.0.1:7860

Repo: github.com/Yigtwxx/dl_xview_yolo — MIT licensed, issues and PRs welcome. If you've trained detectors on aerial imagery, I'd genuinely like to hear how you handled the tiny-object problem: tiling, higher resolution, or a different architecture entirely.

Building an Offline Document Search Engine for My University

Yiğit Erdoğan — Sat, 04 Jul 2026 17:36:24 +0000

Hi everyone,

I'm Yiğit, a third-year software engineering student. If you have ever tried to find a specific rule about grading, attendance, or academic calendars in university regulation PDFs, you know how frustrating it can be. Information is scattered across dozens of poorly formatted documents, making it almost impossible to find quick answers.

To solve this, I built FiratUniversityChatbot—an open-source, completely offline Turkish question-answering and document search assistant tailored for Fırat University.

The Core Idea

Instead of relying on heavy cloud-based LLMs that might hallucinate academic rules, I wanted a fast, deterministic, and fully local system. The goal was simple: users ask a question, and the app instantly scans local PDFs to return the exact snippet and the source page number. If the answer isn't in the documents, it politely refuses to guess, ensuring zero hallucination.

The Tech Stack

To keep the application lightweight, secure, and robust without needing an internet connection, I went with a pure Information Retrieval approach:

Backend & API: Python and FastAPI for high performance.
Document Processing: pdfplumber to handle the nightmare of university PDFs. The app dynamically detects single or dual columns, filters out headers/footers, and accurately assembles text blocks.
Search Engine: A custom-built BM25 index tailored specifically for the Turkish language. I implemented ASCII normalization, tokenization, synonym expansion (e.g., treating "büt" and "bütünleme" as the exact same intent), and bigram matching.
Frontend: A minimal, dependency-free chat interface using HTML, CSS, and Jinja2 templates.
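
To give a feel for the search side, here is a compact sketch of ASCII normalization, synonym expansion and standard Okapi BM25 scoring with a refusal threshold. It is not the project's implementation: the synonym table, the k1/b values and the min_score cutoff are illustrative assumptions, and bigram matching is left out.

```python
import math
import re
from collections import Counter

# Map Turkish letters to ASCII before lowercasing ("İ".lower() is not plain "i")
_TR_TO_ASCII = str.maketrans("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU")
SYNONYMS = {"but": "butunleme"}  # illustrative single entry


def tokenize(text):
    ascii_text = text.translate(_TR_TO_ASCII).lower()
    tokens = re.findall(r"[a-z0-9]+", ascii_text)
    return [SYNONYMS.get(t, t) for t in tokens]


class BM25Index:
    def __init__(self, docs, k1=1.5, b=0.75):
        # Assumes at least one document
        self.k1, self.b = k1, b
        tokenized = [tokenize(d) for d in docs]
        self.doc_tf = [Counter(toks) for toks in tokenized]
        self.doc_len = [len(toks) for toks in tokenized]
        self.avg_len = sum(self.doc_len) / len(docs)
        df = Counter(t for tf in self.doc_tf for t in tf)
        n = len(docs)
        self.idf = {t: math.log((n - c + 0.5) / (c + 0.5) + 1.0) for t, c in df.items()}

    def score(self, query, i):
        tf, dl = self.doc_tf[i], self.doc_len[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avg_len)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def search(self, query, min_score=0.5):
        """Return the index of the best document, or None to refuse."""
        best_score, best_i = max((self.score(query, i), i) for i in range(len(self.doc_tf)))
        return best_i if best_score >= min_score else None


docs = [
    "Bütünleme sınavı dönem sonunda yapılır.",
    "Derslere devam zorunluluğu yüzde yetmiştir.",
    "Akademik takvim her yıl ilan edilir.",
]
index = BM25Index(docs)
print(index.search("büt ne zaman"))      # 0
print(index.search("yemekhane menüsü"))  # None
```

Returning None when nothing clears the threshold is what lets the app decline to answer instead of showing an irrelevant snippet.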

The Biggest Challenge

The hardest part was definitely the data extraction and text pipeline. University PDFs are notoriously messy. Building a fallback strategy that can accurately crop dual columns without mixing up paragraphs was a headache. Additionally, fine-tuning the BM25 ranking algorithm with domain-aware tweaks—like intent flags for "pass grade" or "appeals"—took a lot of trial and error to get right.
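
One possible heuristic for the column problem, sketched on the word dictionaries that pdfplumber's page.extract_words() returns (keys such as "x0", "x1", "top"). The gutter width and thresholds below are guesses for illustration, not the project's actual fallback strategy:

```python
def looks_two_column(words, page_width, gutter_ratio=0.04,
                     max_crossing=0.05, min_side_share=0.25):
    """Guess whether a page has two columns.

    A page counts as two-column when almost no word overlaps a narrow band
    around the page centre and both halves hold a fair share of the words.
    """
    if not words:
        return False
    mid = page_width / 2
    g = page_width * gutter_ratio / 2
    left = sum(1 for w in words if w["x1"] <= mid - g)
    right = sum(1 for w in words if w["x0"] >= mid + g)
    n = len(words)
    crossing = n - left - right
    return (crossing / n <= max_crossing
            and left / n >= min_side_share
            and right / n >= min_side_share)


def reading_order(words, page_width):
    """Sort words top-to-bottom, reading the left column before the right."""
    def key(w):
        return (round(w["top"]), w["x0"])

    if not looks_two_column(words, page_width):
        return sorted(words, key=key)
    mid = page_width / 2
    left = [w for w in words if (w["x0"] + w["x1"]) / 2 < mid]
    right = [w for w in words if (w["x0"] + w["x1"]) / 2 >= mid]
    return sorted(left, key=key) + sorted(right, key=key)
```

With real PDFs the input would come from something like page.extract_words() and page.width; headers, footers and full-width titles would still need separate handling.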

Open for Feedback

The project is completely open-source and ready to be tested. You can run it locally in a virtual environment or deploy it easily via Docker (it's currently live on Hugging Face Spaces too).

If you are interested in search engines, document parsing, or building fast Python applications, I would love for you to check it out.

Repository: FiratUniversityChatbot on GitHub

I’m very open to code reviews, architectural feedback, and pull requests. Have you ever built a local search tool for your school or company? Let's discuss in the comments!

Harnessing Mathematical Chaos: Building a Python PRNG Using the Collatz Conjecture

Yiğit Erdoğan — Sat, 20 Jun 2026 07:29:01 +0000

Hello DEV Community,

As developers, we frequently rely on standard libraries to handle fundamental computational tasks. When we need a random number, we simply import random or reach for the secrets module without delving into the underlying mechanics of entropy, seed generation, or the Mersenne Twister algorithm.

As a third-year Software Engineering student, I wanted to break through that abstraction layer. To truly understand how pseudo-randomness is computationally achieved, I decided to build a Pseudo-Random Number Generator (PRNG) entirely from scratch.

To make the experiment more challenging, I chose to base the entropy engine on one of the most famous unsolved problems in mathematics: the Collatz Conjecture.

I would like to introduce my recent project: BSG Random Number Generator.

The Theory: Why the Collatz Conjecture?

For those who might not be familiar, the Collatz Conjecture (also known as the 3n+1 problem) concerns the sequence you get by repeatedly applying two simple rules to any positive integer:

If the number is even, divide it by 2.
If the number is odd, multiply it by 3 and add 1.

The conjecture states that no matter what starting value you choose, the sequence will always eventually reach the 4-2-1 loop. However, the path it takes to get there—the "stopping time" and the peak values reached during the sequence—exhibits behavior that is incredibly unpredictable and highly sensitive to the initial state.

This unpredictable orbit behaves much like deterministic chaos: the rules are fixed, yet neighbouring starting values can follow very different paths. I wanted to investigate whether this behavior could be harvested and mathematically normalized to generate usable, uniformly distributed random numbers.

Architectural Overview and Implementation

The BSG Random Number Generator is written in pure Python with zero external dependencies. The core logic revolves around extracting specific metrics from the Collatz sequence to build an internal state mechanism.

Instead of relying on system time or OS-level entropy pools, the generator uses:

Orbital Lengths: The exact number of steps required for a specific seed to reach the 4-2-1 loop.
Peak Tracking: The maximum integer value achieved during the sequence path.
Parity Sequences: The alternating pattern of odd and even evaluations before termination.
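
The three metrics can be read off a single pass over the sequence. A minimal sketch, assuming the orbit is followed until it first reaches 1 (the name collatz_metrics and the return type are mine, not necessarily the repository's):

```python
from typing import List, NamedTuple


class CollatzMetrics(NamedTuple):
    steps: int         # orbital length: steps until the value first hits 1
    peak: int          # largest value seen along the way (including the start)
    parity: List[int]  # 1 for each odd value visited, 0 for each even one


def collatz_metrics(n: int) -> CollatzMetrics:
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps, peak, parity = 0, n, []
    while n != 1:
        parity.append(n & 1)
        n = 3 * n + 1 if n & 1 else n // 2
        peak = max(peak, n)
        steps += 1
    return CollatzMetrics(steps, peak, parity)


print(collatz_metrics(6))
# CollatzMetrics(steps=8, peak=16, parity=[0, 1, 0, 1, 0, 0, 0, 0])
```

The classic example is 27, which takes 111 steps and climbs to 9232 before coming back down.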

By dynamically updating the seed based on these extracted metrics and applying normalization techniques, the generator successfully outputs floating-point numbers between 0.0 and 1.0, as well as scalable integers. The resulting distribution is surprisingly uniform and demonstrates how mathematical anomalies can be structured into functional code.
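
As a rough illustration of the state-update idea, here is a toy generator that folds the orbit length and peak back into a 64-bit state. It is emphatically not the BSG implementation: the multiplier, the xor-shift step and the class name are arbitrary choices for the sketch, and it makes no claim about period or statistical quality.

```python
MASK64 = (1 << 64) - 1


def _orbit(n: int):
    """Return (steps, peak) for the Collatz orbit of n down to 1."""
    steps, peak = 0, n
    while n != 1:
        n = 3 * n + 1 if n & 1 else n >> 1
        if n > peak:
            peak = n
        steps += 1
    return steps, peak


class CollatzToyRNG:
    """Toy PRNG: deterministic, educational, not cryptographically secure."""

    def __init__(self, seed: int):
        self.state = (seed & MASK64) or 1  # the state must stay positive

    def _next_u64(self) -> int:
        steps, peak = _orbit(self.state)
        # Arbitrary mixing: multiply, add the orbit metrics, then xor-shift
        mixed = (self.state * 6364136223846793005 + peak + steps) & MASK64
        mixed ^= mixed >> 33
        self.state = mixed or 1
        return mixed

    def random(self) -> float:
        """Float in [0.0, 1.0) built from the top 53 bits."""
        return (self._next_u64() >> 11) / (1 << 53)

    def randint(self, lo: int, hi: int) -> int:
        """Integer in [lo, hi]; plain modulo, so slightly biased."""
        return lo + self._next_u64() % (hi - lo + 1)


rng = CollatzToyRNG(42)
sample = [rng.random() for _ in range(5)]
```

Keeping the state within 64 bits matters here: the conjecture has been checked computationally across that range, so every orbit is known to terminate.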

Security Limitations and Disclaimer

As a software engineering practice, it is crucial to clearly define the boundaries of experimental projects. This generator is strictly an educational and experimental tool.

While it produces distributions that look uniform enough for procedural generation, basic simulations, or non-critical randomized logic, it has not been subjected to rigorous statistical test suites like Diehard or NIST. Furthermore, it is deterministic by nature and is not a cryptographically secure PRNG (CSPRNG). It should never be used for generating cryptographic keys, tokens, or handling sensitive security operations.

Seeking Community Feedback

Building the BSG Random Number Generator was a comprehensive exercise in state management, algorithm optimization, and understanding computational entropy in Python.

I am sharing this project here because I highly value the technical insights of the DEV community. I would appreciate any feedback on the repository, specifically regarding:

The structural design and Pythonic efficiency of the code.
Potential mathematical strategies to extract even higher levels of entropy from the sequence orbits.
Suggestions for optimizing the state transition logic to prevent performance bottlenecks over millions of iterations.

Repository Link: Yigtwxx/bsg-random-number-generator

Have you ever tried rebuilding fundamental standard library tools from scratch to better understand their underlying architecture? I would love to hear your thoughts and experiences in the comments below.

50+ Essential Tools for Building Production RAG Systems

Yiğit Erdoğan — Thu, 08 Jan 2026 09:14:04 +0000

After researching and documenting the production RAG ecosystem, I've compiled a comprehensive list of 50+ battle-tested tools that actually matter when you're scaling Retrieval-Augmented Generation systems from prototype to production.

Why This List?

The gap between "Hello World" RAG tutorials and production-ready systems is massive. This curated collection focuses on the engineering side—real tools for real problems.

Quick Navigation

Frameworks & Orchestration
Vector Databases
Retrieval & Reranking
Evaluation & Benchmarking
Observability & Tracing
Deployment & Serving

Frameworks: Choose Your Stack

LlamaIndex

Best for: Data processing and advanced indexing strategies

Perfect when you need hierarchical retrieval, knowledge graphs, or complex query engines. The data-first approach makes ingestion pipelines cleaner.

LangChain

Best for: Rapid prototyping and maximum ecosystem compatibility

The largest community means tons of integrations, but watch out for abstraction overhead in production.

LangGraph

Best for: Agentic systems with complex workflows

When you need cyclic graphs, human-in-the-loop, or stateful multi-step reasoning. The graph-based approach is perfect for advanced agents.

Haystack

Best for: Enterprise pipelines requiring auditability

Type-safe, DAG-based architecture. If you need strict reproducibility and compliance, this is your choice.

Vector Databases: Scale Matters

Database	Sweet Spot	Key Advantage
Chroma	Local dev & mid-scale	Zero-config embedded mode
Pinecone	10M-100M vectors	Serverless, zero ops
Qdrant	<50M vectors	Best free tier + filtering
Milvus	Billions of vectors	Open source at massive scale
pgvector	PostgreSQL users	Leverage existing Postgres infra
Weaviate	Hybrid search	Native vector + keyword

Pro Tip: Start with Chroma locally, graduate to Qdrant for production, scale to Milvus only if you truly need billions of vectors.

Retrieval: Beyond Basic Search

The Hybrid Search Pattern

Dense vector search alone misses exact term matches. Sparse keyword search (BM25) alone misses semantics. Combine them.
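
One common way to combine the two result lists is Reciprocal Rank Fusion (RRF), which only needs ranks, not comparable scores. This is a sketch of that technique, not something prescribed by any tool in this list; k=60 is the conventional default and the document ids are made up:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Ties are broken by doc id so the output is deterministic
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["a", "b", "c"]   # hypothetical vector-search results
sparse = ["c", "a", "d"]  # hypothetical BM25 results
print(reciprocal_rank_fusion([dense, sparse]))
# ['a', 'c', 'b', 'd']
```

Documents that both retrievers agree on rise to the top, while single-list hits stay in as lower-ranked candidates for the reranker.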

Tools:

ColBERT (via RAGatouille): Token-level matching for superior recall
Cohere Rerank: API-based reranker, 10-20% precision boost
BGE-Reranker: Best open-source cross-encoder
FlashRank: Lightweight CPU-only reranking

Real-world pattern:

Retrieve top-100 with fast semantic search
Rerank to top-5 with cross-encoder
Feed to LLM

This 2-stage approach is standard at companies like Notion and Discord.

Evaluation: Measure What Matters

The RAG Triad

Context Relevance - Did we retrieve the right documents?
Groundedness - Is the answer faithful to the context?
Answer Relevance - Does it address the question?

Tools

Ragas: LLM-as-a-Judge evaluation without ground truth

DeepEval: The "Pytest for LLMs", integrates into CI/CD

Braintrust: Online eval for real user interactions

ARES: Stanford's automated eval with statistical confidence

Critical: Always validate your LLM judge against human labels on 100-200 samples. GPT-4 has ~85% agreement with humans, not 100%.
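
Validating a judge comes down to comparing its labels with human labels on the same samples. A small sketch that reports raw agreement plus Cohen's kappa (kappa is my addition here, since raw agreement can look good by chance when one label dominates); the label lists are made-up data:

```python
def agreement_and_kappa(human, judge):
    """Return (observed agreement, Cohen's kappa) for two equal-length label lists."""
    if len(human) != len(judge) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)
    observed = sum(h == j for h, j in zip(human, judge)) / n
    labels = set(human) | set(judge)
    # Agreement expected by chance, from each rater's label frequencies
    expected = sum((human.count(l) / n) * (judge.count(l) / n) for l in labels)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return observed, kappa


human = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]  # hypothetical human verdicts
judge = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]  # hypothetical LLM-judge verdicts
observed, kappa = agreement_and_kappa(human, judge)
print(f"agreement={observed:.2f} kappa={kappa:.2f}")
# agreement=0.80 kappa=0.60
```

If kappa is low even though raw agreement looks fine, the judge is probably just echoing the majority label.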

Observability: You Can't Fix What You Can't See

Must-Have Metrics

Latency percentiles (p50, p95, p99)
Token usage per request (cost tracking)
Retrieval quality (distance scores, reranker confidence)
Embedding drift (production vs training distribution)
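
Latency percentiles are simple enough to compute by hand when a tracing tool isn't wired up yet. A minimal sketch using the nearest-rank definition (other definitions interpolate and give slightly different numbers); the latency values are synthetic:

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


latencies_ms = list(range(1, 101))  # synthetic: 1 ms .. 100 ms
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p)} ms")
# p50: 50 ms
# p95: 95 ms
# p99: 99 ms
```

Tracking p95 and p99 alongside the median matters because retrieval and reranking tend to add latency in the tail, not the average.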

Tools

LangSmith: Gold standard for LangChain, instant trace replay

Langfuse: Open-source, prompt versioning decoupled from code

Arize Phoenix: Visualize embedding clusters, debug retrieval

OpenLIT: OpenTelemetry-native for existing Prometheus/Grafana stacks

Deployment: From Laptop to Production

Three Reference Architectures

Local Stack (Zero Cost)

LLM: Ollama (Llama 3, Mistral)
Vector DB: Chroma (embedded)
Eval: Ragas

When: Prototype validation, no API keys needed

Mid-Scale Stack (Speed to Market)

Vector DB: Qdrant Cloud
Reranker: Cohere Rerank API
Tracing: Langfuse
LLM: OpenAI GPT-4

When: 90% of production use cases

Enterprise Stack (The 1%)

Vector DB: Milvus (distributed)
Serving: vLLM (self-hosted)
Monitoring: OpenLIT + custom SLAs
Eval: DeepEval in CI/CD

When: Billions of vectors, data sovereignty, dedicated platform team

Security: Don't Skip This

Production RAG handles user data. Common threats:

Prompt Injection: Malicious instructions in user input or retrieved documents hijack the model's behavior
PII Leakage: Sensitive data in embeddings or responses
Jailbreaking: Bypassing system guardrails

Essential Tools:

Presidio: PII detection before embedding
NeMo Guardrails: Programmable topic constraints
LLM Guard: Input/output sanitization
PrivateGPT: 100% offline RAG for regulated industries

Real-World Case Studies

Notion AI

Stack: Pinecone + GPT-4 + custom embeddings
Key Insight: Hybrid search improved recall by 23%

Discord (19B messages)

Stack: ScaNN + custom Rust infra
Key Insight: 99.9% recall at 10ms latency with ANN

Shopify

Key Insight: Domain-specific fine-tuning reduced hallucinations from 18% → 4%

Pattern: Everyone uses hybrid search + reranking at scale.

The Full Resource

This article covers the highlights. For the complete list of 50+ tools, reference architectures, evaluation frameworks, and anti-patterns to avoid:

Awesome RAG Production on GitHub

Includes:

✅ Comparison tables for every category
✅ Decision trees for selecting tools
✅ RAG pitfalls and how to avoid them
✅ Datasets for benchmarking
✅ Curated books and blogs

Contributing

Found a tool that should be on the list? Spotted an outdated link? PRs welcome!

Star the repo to stay updated with new tools and best practices as the RAG ecosystem evolves.

What's your production RAG stack? Drop a comment below!