
Muhammed Yasin Yılmaz

Building a Full Stack AI Engine From Scratch: The Architecture Behind Cevahir AI

For the last 16 months, I’ve been building an open-source AI infrastructure project called Cevahir AI.

The goal wasn't simply to create another chatbot or wrap existing APIs in a new interface. I wanted to explore something much deeper:

What would it look like to build a modular AI engine architecture from the tokenizer layer all the way to reasoning orchestration?

Most AI projects today focus on a single layer of the stack:
inference APIs,
RAG pipelines,
agent wrappers,
fine-tuning systems,
or prompt engineering workflows.

Very few projects attempt to unify tokenizer training, neural architectures, training orchestration, model lifecycle management, reasoning systems, and local inference pipelines under a single engineering structure.

Cevahir AI was created to explore exactly that problem.

The project is fully open source and designed as a modular AI infrastructure system capable of running locally and offline. Instead of focusing only on model outputs, the architecture focuses on the entire lifecycle of AI systems:
how they tokenize,
how they train,
how they reason,
how they orchestrate decisions,
and how they evolve over time.

One of the most important engineering decisions behind the project was separating responsibilities aggressively across the system.

The architecture is divided into multiple independent layers:

  • Tokenizer Management
  • Data Loader Management
  • Neural Network
  • Model Management
  • Training System
  • Training Management
  • Cognitive Management
  • Unified Cevahir Core

Each module owns a specific responsibility while remaining connected through a shared orchestration layer.

The upper-level Cevahir module acts as the production-facing API layer responsible for inference, generation, routing, memory management, and cognitive orchestration.

This separation allows training systems and inference systems to evolve independently without turning the infrastructure into a monolithic codebase.
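
To make the separation concrete, here is a minimal sketch of how a layered facade like this can be wired together. The class and method names are hypothetical illustrations, not the project's actual API:

```python
from typing import Protocol

# Hypothetical interfaces for two of the independent layers;
# names are illustrative only, not Cevahir AI's real modules.
class Tokenizer(Protocol):
    def encode(self, text: str) -> list[int]: ...
    def decode(self, ids: list[int]) -> str: ...

class Generator(Protocol):
    def generate(self, ids: list[int]) -> list[int]: ...

class CevahirCore:
    """Production-facing orchestration layer: owns the routing between
    modules so the modules never depend on each other directly."""

    def __init__(self, tokenizer: Tokenizer, generator: Generator):
        self.tokenizer = tokenizer
        self.generator = generator

    def respond(self, prompt: str) -> str:
        ids = self.tokenizer.encode(prompt)     # Tokenizer Management layer
        out_ids = self.generator.generate(ids)  # Neural Network / inference layer
        return self.tokenizer.decode(out_ids)   # back through the tokenizer
```

Swapping the tokenizer or the model only requires satisfying the interface, never editing the core.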

Why I Focused on the Tokenizer Layer

One of the areas I spent the most time on was tokenizer infrastructure.

Turkish is a morphologically rich and agglutinative language. Traditional English-centric tokenization assumptions create serious fragmentation problems when applied directly to Turkish.

Instead of treating tokenization as a simple preprocessing step, I approached it as a language-aware infrastructure problem.

The tokenizer system extends traditional Byte Pair Encoding with Turkish-oriented preprocessing layers including:

  • Turkish lowercase normalization
  • Unicode NFC normalization
  • Morphological preprocessing
  • Syllable-aware fallback mechanisms
  • Root-suffix awareness
  • OOV recovery systems
  • Deterministic merge selection

The goal wasn’t only compression efficiency.

The real objective was reducing fragmentation while preserving semantic continuity across Turkish word structures.
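
As a concrete taste of that preprocessing stack, here is a minimal sketch of the first two steps, Turkish lowercase normalization and Unicode NFC normalization. This is my illustration of the general technique, not code from the repository. The key subtlety is that Turkish pairs dotted "İ" with "i" and dotless "I" with "ı", which a generic lowercase pass gets wrong:

```python
import unicodedata

# Turkish lowercases dotless "I" to "ı" and dotted "İ" to "i";
# a plain str.lower() would turn "I" into "i" and corrupt word roots.
_TURKISH_LOWER = str.maketrans({"I": "ı", "İ": "i"})

def normalize(text: str) -> str:
    """Sketch of a Turkish-aware normalization pass run before BPE."""
    text = unicodedata.normalize("NFC", text)  # Unicode NFC normalization
    text = text.translate(_TURKISH_LOWER)      # Turkish-specific casing first...
    return text.lower()                        # ...then the generic lowercase pass

assert normalize("ISPARTA") == "ısparta"
assert normalize("İstanbul") == "istanbul"
```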

Neural Architecture and Inference Design

The neural core of Cevahir AI is based on a decoder-only Transformer architecture.

The infrastructure currently supports modern LLM techniques such as:

  • RMSNorm
  • RoPE and YaRN scaling
  • SwiGLU
  • KV-Cache
  • Multi-Head Attention
  • Grouped Query Attention (GQA)
  • Flash Attention
  • Sliding Window Attention
  • QK-Norm
  • Optional Mixture of Experts (MoE)

The design philosophy here is balancing inference efficiency, scalability, VRAM optimization, and training stability without tightly coupling the system to a single architectural direction.

Rather than building a fixed model, the idea was to create an infrastructure capable of evolving over time.
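
For readers unfamiliar with the items above, here is a minimal PyTorch sketch of two of them, RMSNorm and SwiGLU, written as standard reference implementations rather than Cevahir's actual modules:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """RMSNorm: scale by the root mean square of the features, with a
    learned gain but no mean subtraction or bias (cheaper than LayerNorm)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: a SiLU-gated linear unit, as used in
    LLaMA-style decoder architectures."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```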

Cognitive Orchestration

One of the most experimental parts of the project is the Cognitive Management layer.

I became increasingly interested in a question:

What happens after text generation?

Most systems stop once the model produces a response.
I wanted to explore architectures where inference itself could become more reflective and adaptive.

The cognitive orchestration layer combines concepts inspired by:

  • Chain-of-Thought
  • Tree of Thoughts
  • Self-Consistency
  • ReAct
  • Self-Refine
  • Constitutional AI
  • Retrieval-Augmented Memory

The system can route reasoning strategies dynamically, apply refinement loops, integrate memory-aware reasoning, and evaluate outputs before finalizing responses.
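
As a rough sketch of what dynamic routing plus a refinement loop could look like (all names and heuristics below are hypothetical illustrations, not the project's implementation):

```python
from typing import Callable

# Hypothetical sketch of dynamic strategy routing with a refinement loop.
# `generate` and `score` stand in for the real inference and evaluation
# components; everything here is illustrative.
Strategy = Callable[[str], str]

def chain_of_thought(prompt: str) -> str:
    return f"{prompt}\n\nThink step by step before answering."

def self_refine(answer: str) -> str:
    return f"{answer}\n\nCritique the draft above, then revise it."

STRATEGIES: dict[str, Strategy] = {
    "reasoning": chain_of_thought,
    "writing": self_refine,
}

def respond(prompt: str, task: str,
            generate: Callable[[str], str],
            score: Callable[[str], float],
            max_rounds: int = 3,
            threshold: float = 0.8) -> str:
    strategy = STRATEGIES.get(task, chain_of_thought)  # route dynamically
    answer = generate(strategy(prompt))
    for _ in range(max_rounds - 1):                    # refinement loop
        if score(answer) >= threshold:                 # evaluate before finalizing
            break
        answer = generate(self_refine(answer))
    return answer
```

In a real pipeline, `generate` would call the local inference engine, and `score` could be a self-consistency or constitutional check applied before the response is finalized.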

The long-term philosophy is simple:

Inference should not only generate.
Inference should also think.

Long-Term Vision

Cevahir AI is currently focused primarily on text infrastructure.

However, the architecture was intentionally designed to remain extensible toward:

  • vision tokenizer systems
  • audio tokenizer infrastructures
  • multimodal reasoning
  • real-time sensor processing
  • embodied AI systems
  • real-time inference pipelines interacting with physical systems

A large part of the inspiration behind this direction comes from embodied AI research such as PaLM-E, RT-2, and SayCan.

The long-term objective is not merely generating text outputs.

The goal is building modular AI infrastructure capable of perceiving, interpreting, reasoning about, and eventually interacting with the real world.

Final Thoughts

Cevahir AI is not a finished product.

It’s an ongoing exploration of what a modular full-stack AI engine architecture could look like when tokenizer systems, neural architectures, reasoning layers, training orchestration, and inference pipelines are treated as parts of the same ecosystem instead of isolated tools.

The project is open source and still evolving rapidly.

GitHub:
Click for repository
