I’m currently working on a local server project with an unusual constraint: I can only use standard libraries. No external JSON packages, no helper parsing libraries, no frameworks doing the heavy lifting behind the scenes.
At first, I thought:
"JSON is simple. It's just key-value text."
Then I started thinking about what actually happens when a server receives:
{
  "user": {
    "name": "Amine",
    "age": 22,
    "skills": ["Go", "Java", "Networking"]
  }
}
How does a computer transform those characters into usable objects and values?
How does text become structure?
That question pushed me into three ideas I had seen before but never really understood:
- Lexing
- Parsing
- Abstract structures
This post isn't a walkthrough of my implementation, since I'm still working on the project. Instead, it covers the theoretical understanding I gained while building it.
The illusion of "simple text"
Humans see JSON and immediately understand structure.
We see:
{
  "name": "John",
  "age": 30
}
and our brains instantly think:
- object starts
- key = name
- value = John
- key = age
- value = 30
- object ends
Computers do not think this way.
To the machine, this initially looks closer to:
{
"
n
a
m
e
"
:
"
J
o
h
n
"
,
...
Just individual characters.
There is no "object."
There is no "key."
There is no "number."
There is only a stream of symbols.
The computer needs stages that gradually transform meaningless characters into meaningful structure.
Stage 1: Lexical analysis (tokenization)
The first major step is the lexer.
A lexer scans raw characters and groups them into meaningful pieces called tokens.
Think of it as reading letters and turning them into words.
Input:
{
  "name":"John",
  "age":30
}
Output tokens:
LEFT_BRACE
STRING(name)
COLON
STRING(John)
COMMA
STRING(age)
COLON
NUMBER(30)
RIGHT_BRACE
Notice something important:
The lexer does not care about relationships.
It doesn't know:
- whether "name" is a key
- whether 30 belongs to "age"
- whether the JSON structure is valid
Its only job is:
"I see characters. I convert them into recognizable pieces."
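To make the lexer stage concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not the project's code: the function name `tokenize` and the `(kind, value)` tuple shape are assumptions, and it deliberately skips escape sequences, fractions, exponents, and the literals true, false, and null.

```python
def tokenize(text):
    """Turn raw JSON text into a flat list of (kind, value) tokens."""
    punctuation = {
        "{": "LEFT_BRACE", "}": "RIGHT_BRACE",
        "[": "LEFT_BRACKET", "]": "RIGHT_BRACKET",
        ":": "COLON", ",": "COMMA",
    }
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in " \t\r\n":
            i += 1                                # whitespace produces no token
        elif ch in punctuation:
            tokens.append((punctuation[ch], ch))
            i += 1
        elif ch == '"':
            # Simplification: read up to the next quote, ignoring escapes.
            end = text.index('"', i + 1)
            tokens.append(("STRING", text[i + 1:end]))
            i = end + 1
        elif ch.isdigit() or ch == "-":
            # Simplification: integers only.
            start = i
            i += 1
            while i < len(text) and text[i].isdigit():
                i += 1
            tokens.append(("NUMBER", int(text[start:i])))
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens


for kind, value in tokenize('{"name":"John",\n"age":30}'):
    print(kind, value)
```

Nothing in this function knows that "name" is a key or that 30 belongs to "age". It only recognizes pieces, one after another.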
Why not skip the lexer?
Originally I wondered:
Why not directly parse characters?
Why introduce another stage?
I later realized that separating responsibilities makes everything simpler.
Without a lexer:
The parser would constantly need to think about:
- whitespace
- escaped characters
- number formats
- commas that sit inside strings versus commas that separate entries
- quotation marks
- special symbols
The parser would become messy.
Instead:
Lexer:
Raw characters → Tokens
Parser:
Tokens → Structure
Each component has a single responsibility.
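One way to see the benefit of the split is that two very differently formatted inputs collapse into the same token stream, so the parser never has to care about layout. The sketch below shows this with a regex-based tokenizer from Python's standard `re` module. The names `TOKEN_RE` and `tokenize` are hypothetical, the number pattern is looser than real JSON (it accepts leading zeros), and true, false, and null are left out.

```python
import re

# One alternative per token kind. Whitespace is matched so it can be discarded.
TOKEN_RE = re.compile(r'''
    (?P<WS>\s+)
  | (?P<STRING>"(?:[^"\\]|\\.)*")
  | (?P<NUMBER>-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)
  | (?P<PUNCT>[{}\[\]:,])
''', re.VERBOSE)


def tokenize(text):
    """Return (kind, raw_text) tokens, silently dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


compact = '{"name":"John","age":30}'
pretty = '{\n  "name" : "John",\n  "age"  : 30\n}'

# Layout differs, tokens do not.
print(tokenize(compact) == tokenize(pretty))

# An escaped quote stays inside a single STRING token.
print(tokenize(r'"say \"hi\""'))
```

Whitespace, escapes, and number formats are all settled here, before any grammar rule is applied.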
Stage 2: Parsing
Once tokens exist, the parser begins building meaning.
Suppose we have:
LEFT_BRACE
STRING(name)
COLON
STRING(John)
COMMA
STRING(age)
COLON
NUMBER(30)
RIGHT_BRACE
The parser now asks questions like:
- Did an object start?
- Is a string followed by a colon?
- Is there a value after the colon?
- Is a comma separating entries?
- Did the object end correctly?
The parser is essentially checking the tokens against rules, and building structure as it goes.
JSON has a grammar.
Simplified:
Object  = { Members }
Members = Pair (, Pair)*
Pair    = String : Value
Value   = String | Number | Object | Array | true | false | null
The parser walks through tokens trying to satisfy these rules.
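A common way to walk tokens against rules like these is recursive descent: one function per grammar rule, each consuming tokens and returning what it built plus the next position. The following is a sketch under assumptions, not the project's parser. The token shape `(kind, value)`, the helper names `peek` and `expect`, and the use of Python dicts and lists as the result are all choices made for illustration. It also accepts empty objects and arrays, which the simplified grammar above glosses over.

```python
def peek(tokens, pos):
    """Kind of the token at pos, or "END" when the input is exhausted."""
    return tokens[pos][0] if pos < len(tokens) else "END"


def expect(tokens, pos, kind):
    """Consume one token of the given kind or fail."""
    if peek(tokens, pos) != kind:
        raise SyntaxError(f"expected {kind}, but found {peek(tokens, pos)}")
    return pos + 1


def parse_value(tokens, pos):
    # Value = String | Number | Object | Array | true | false | null
    kind = peek(tokens, pos)
    if kind == "LEFT_BRACE":
        return parse_object(tokens, pos)
    if kind == "LEFT_BRACKET":
        return parse_array(tokens, pos)
    if kind in ("STRING", "NUMBER", "TRUE", "FALSE", "NULL"):
        return tokens[pos][1], pos + 1
    raise SyntaxError(f"expected a value, but found {kind}")


def parse_object(tokens, pos):
    # Object = { Members }, Members = Pair (, Pair)*, Pair = String : Value
    pos = expect(tokens, pos, "LEFT_BRACE")
    members = {}
    if peek(tokens, pos) == "RIGHT_BRACE":
        return members, pos + 1
    while True:
        key_pos = pos
        pos = expect(tokens, pos, "STRING")
        pos = expect(tokens, pos, "COLON")
        value, pos = parse_value(tokens, pos)
        members[tokens[key_pos][1]] = value
        if peek(tokens, pos) != "COMMA":
            break
        pos += 1
    return members, expect(tokens, pos, "RIGHT_BRACE")


def parse_array(tokens, pos):
    pos = expect(tokens, pos, "LEFT_BRACKET")
    items = []
    if peek(tokens, pos) == "RIGHT_BRACKET":
        return items, pos + 1
    while True:
        value, pos = parse_value(tokens, pos)
        items.append(value)
        if peek(tokens, pos) != "COMMA":
            break
        pos += 1
    return items, expect(tokens, pos, "RIGHT_BRACKET")


tokens = [
    ("LEFT_BRACE", "{"), ("STRING", "name"), ("COLON", ":"), ("STRING", "John"),
    ("COMMA", ","), ("STRING", "age"), ("COLON", ":"), ("NUMBER", 30),
    ("RIGHT_BRACE", "}"),
]
value, end = parse_value(tokens, 0)
print(value)
```

Each function mirrors one line of the grammar, which is why the code reads almost like the rules themselves.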
Parsing feels like reading a sentence
I started seeing parsing as similar to language.
Sentence:
The cat eats fish
You unconsciously understand:
- "The cat" → subject
- "eats" → action
- "fish" → object
You don't process individual letters.
Your brain applies grammatical rules.
Parsers do something similar.
JSON:
{
  "name":"John"
}
becomes:
Object
  Pair
    Key = name
    Value = John
Structure begins to emerge.
Nested objects changed my understanding completely
Simple JSON is easy.
Then nesting appears:
{
  "user":{
    "name":"John",
    "skills":[
      "Go",
      "Java"
    ]
  }
}
This suddenly becomes much more interesting.
The parser still reads left to right, but it can no longer forget what came before.
When it sees:
{
inside another object, it has to remember:
"I'm entering another level."
Then:
[
means:
"Now I'm entering an array inside that object."
Then eventually:
]
}
means:
"Exit those levels."
The parser is constantly entering and leaving contexts.
Almost like walking through rooms inside rooms.
House
└── Room
    └── Closet
        └── Box
Each opening symbol creates a new scope:
{
[
Each closing symbol exits one:
]
}
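The "rooms inside rooms" idea maps naturally onto a stack: push when a context opens, pop when it closes. A recursive parser gets this for free from its call stack. The sketch below makes the stack explicit instead. The function name `trace_nesting` and the event tuples are hypothetical, and it only tracks nesting, it does not validate the rest of the JSON.

```python
def trace_nesting(text):
    """List every point where the input enters or leaves an object or array."""
    openers = {"{": ("}", "object"), "[": ("]", "array")}
    stack = []       # one (closing symbol, kind) entry per context still open
    events = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            # Brackets inside a string are plain text, not structure.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in openers:
            stack.append(openers[ch])
            events.append(("enter", openers[ch][1], len(stack)))
        elif ch in "}]":
            if not stack or stack[-1][0] != ch:
                raise SyntaxError(f"unexpected {ch!r}")
            _, kind = stack.pop()
            events.append(("exit", kind, len(stack) + 1))
    if stack:
        raise SyntaxError(f"input ended while still inside an {stack[-1][1]}")
    return events


for action, kind, depth in trace_nesting('{"user":{"name":"John","skills":["Go","Java"]}}'):
    print("  " * (depth - 1) + f"{action} {kind}")
```

The top of the stack always answers the question "which room am I in right now?", and a closing symbol is only legal if it matches that room.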
Trees started appearing everywhere
While reading more, I kept finding the same idea:
Everything becomes a tree.
The JSON:
{
  "user":{
    "name":"John",
    "skills":[
      "Go",
      "Java"
    ]
  }
}
can be imagined as:
Object
|
+-- user
    |
    +-- Object
        |
        +-- name
        |   |
        |   +-- John
        |
        +-- skills
            |
            +-- Array
                |
                +-- Go
                |
                +-- Java
I found this interesting because suddenly many things I had heard before started making more sense:
- HTML parsers
- compilers
- programming languages
- SQL parsers
- interpreters
They all repeatedly convert:
Characters
↓
Tokens
↓
Structured representation
Different input.
Same idea.
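Once the structure exists, walking it as a tree takes only a few lines of recursion. In this sketch a plain Python dict stands in for whatever a parser would produce. That stand-in, the function name `render`, and the indentation style are assumptions made for illustration.

```python
def render(value, depth=0):
    """Return the lines of an indented tree view of a parsed JSON value."""
    pad = "  " * depth
    if isinstance(value, dict):
        lines = [pad + "Object"]
        for key, child in value.items():
            lines.append(pad + "  " + key)
            lines.extend(render(child, depth + 2))
        return lines
    if isinstance(value, list):
        lines = [pad + "Array"]
        for child in value:
            lines.extend(render(child, depth + 1))
        return lines
    return [pad + str(value)]          # a leaf: string, number, and so on


# Stand-in for parser output.
document = {"user": {"name": "John", "skills": ["Go", "Java"]}}
print("\n".join(render(document)))
```

The function calls itself for every nested object or array, so the shape of the recursion is the shape of the tree.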
I finally understood what an AST is
I had heard "AST" many times and thought it sounded complicated.
AST means:
Abstract Syntax Tree
The word "abstract" confused me initially.
The idea is simpler than I expected.
The tree keeps only meaningful information.
Not unnecessary syntax.
For example:
{
  "name":"John"
}
The parser may encounter:
- commas
- braces
- quotation marks
- colons
But the final structure cares about meaning:
Object
|
+-- name
    |
    +-- John
The syntax symbols helped build the structure, but they are not necessarily part of the final representation.
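A small sketch can show what "abstract" means in practice: the punctuation tokens steer the construction but never end up in the result. The node classes `ObjectNode` and `StringNode` and the function `build_object` are hypothetical names, and the function assumes a valid, flat object whose values are all strings. It does no validation.

```python
from dataclasses import dataclass


@dataclass
class StringNode:
    value: str


@dataclass
class ObjectNode:
    members: list   # (key, node) pairs in source order


def build_object(tokens):
    """Build an ObjectNode from the tokens of a flat, valid object."""
    members = []
    key = None
    for kind, value in tokens:
        if kind != "STRING":
            continue                    # braces, colons, commas: used, not kept
        if key is None:
            key = value                 # first string of a pair is the key
        else:
            members.append((key, StringNode(value)))
            key = None
    return ObjectNode(members)


tokens = [
    ("LEFT_BRACE", "{"), ("STRING", "name"), ("COLON", ":"),
    ("STRING", "John"), ("RIGHT_BRACE", "}"),
]
tree = build_object(tokens)
print(len(tokens), "tokens in,", len(tree.members), "member out")
print(tree)
```

Five tokens go in, and the tree that comes out holds only the key and its value.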
Error handling suddenly became less mysterious
Before learning this, parsing errors felt magical.
For example:
{
  "name":"John"
  "age":30
}
Missing comma.
Humans instantly notice it.
The parser notices because its grammar expected:
STRING
COLON
VALUE
COMMA (or RIGHT_BRACE)
but instead found:
STRING
COLON
VALUE
STRING
The sequence violated the rules.
So parsing errors are not random.
They're basically:
"I expected X, but found Y."
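The "expected X, but found Y" message falls out directly if every state knows which token kinds may come next. The sketch below uses a small state table rather than a full grammar-driven parser, so it only handles flat objects with string or number values. Nesting would need a stack or recursion. The names `TRANSITIONS` and `check_flat_object` are hypothetical.

```python
# For each state: the token kinds allowed next, and the state each leads to.
TRANSITIONS = {
    "start":        {"LEFT_BRACE": "key_or_end"},
    "key_or_end":   {"STRING": "colon", "RIGHT_BRACE": "done"},
    "key":          {"STRING": "colon"},
    "colon":        {"COLON": "value"},
    "value":        {"STRING": "comma_or_end", "NUMBER": "comma_or_end"},
    "comma_or_end": {"COMMA": "key", "RIGHT_BRACE": "done"},
    "done":         {"END": "accepted"},
}


def check_flat_object(kinds):
    """Return None if the token kinds form a flat object, else an error message."""
    state = "start"
    for index, kind in enumerate(list(kinds) + ["END"]):
        allowed = TRANSITIONS[state]
        if kind not in allowed:
            wanted = " or ".join(sorted(allowed))
            return f"token {index}: expected {wanted}, but found {kind}"
        state = allowed[kind]
    return None


# {"name":"John" "age":30} with the comma missing
missing_comma = [
    "LEFT_BRACE", "STRING", "COLON", "STRING",
    "STRING", "COLON", "NUMBER", "RIGHT_BRACE",
]
print(check_flat_object(missing_comma))
# prints: token 4: expected COMMA or RIGHT_BRACE, but found STRING
```

The error is not a special case anywhere in the code. It is simply what happens when the next token is not in the allowed set.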
What surprised me the most
I started this project trying to build a server.
I expected to learn:
- sockets
- requests
- networking
- concurrency
Instead, one of the biggest lessons became understanding how text becomes structured data.
Now whenever I use:
json.Unmarshal()
or:
JSON.parse()
I no longer see magic.
I imagine an invisible pipeline:
Raw text
↓
Characters
↓
Lexer
↓
Tokens
↓
Parser
↓
Tree/structure
↓
Objects and values
Before this project, JSON parsing felt like a built-in feature.
Now it feels more like a series of small, logical transformations.
And I think that's one of the most interesting things about building systems from scratch:
sometimes you start building one thing and end up understanding an entirely different layer of computing.
I'm still working on this local server project, and I'm still learning, but understanding the theory behind lexing and parsing changed how I look at data processing entirely.
Have you ever started a project for one reason and ended up learning something completely unexpected?