Hello, I'm Maneshwar. I'm working on git-lrc: a Git hook for Checking AI generated code.
Internal Datatypes and Record Format
So far, we’ve seen how the VM executes instructions, manages cursors, and moves through bytecode programs. Today we zoom into something even more fundamental:
What exactly is a value inside the VM?
Before a value is written to disk, before it is compared in a WHERE clause, before it is returned through sqlite3_step(), it exists inside the VM as a memory object.
Understanding that representation is key to understanding SQLite’s flexibility.
Internal Datatypes
The VM uses an arbitrary number of numbered memory locations (registers). Each register holds exactly one value at a time.
Every value processed by the VM must belong to one of five primitive storage types:
- INTEGER – a signed integer
- REAL – a signed floating-point number
- TEXT – a character string
- BLOB – a byte image
- NULL – the SQL NULL
There are no other primitive types inside SQLite’s execution engine.
Everything whether it originated from disk, from an expression, or from a bound parameter becomes one of these five.
The type determines physical representation:
- INTEGER → stored in integer form
- REAL → stored in IEEE floating format
- TEXT → stored with encoding metadata
- BLOB → raw bytes
- NULL → special marker
Interestingly, some values can have multiple representations cached at once.
For example, the value 123 may simultaneously exist as:
- INTEGER 123
- REAL 123.0
- TEXT "123"
The VM keeps these representations synchronized when necessary. This caching avoids repeated conversions during expression evaluation.
Two types cannot have alternative representations:
- BLOB
- NULL
These remain as they are.
If a conversion is needed for example, comparing TEXT with INTEGER the VM performs implicit type conversion on demand. SQLite’s dynamic typing model relies heavily on this flexibility.
You can inspect storage types using:
SELECT a, typeof(a) FROM t1;
At the C API level, sqlite3_column_type() reveals the storage class of a value returned by sqlite3_step().
The Mem Object — The VM’s Value Container
Internally, almost every value inside the VM is represented as a Mem object.
Each element of the Vdbe.aMem array is one such structure.
Here is its conceptual layout:
A Mem object can contain:
- The actual value
- Flags describing its current storage type
- Metadata about encoding
- Cached alternate representations
The critical invariant is this:
A value always has exactly one storage class, even if multiple representations are cached.
For example, a Mem may cache both integer and string forms, but its storage class flag will indicate which one is canonical.
This distinction between storage class and cached representation is subtle but extremely important.
It allows SQLite to implement:
- Manifest typing
- On-the-fly type coercion
- Efficient comparison semantics
All without rigid schema enforcement at runtime.
From Values to Records
Individual values in registers are not what get stored in B-trees. The VM must first compose them into records.
A record is a logically contiguous byte string that contains:
- A key
- An optional payload (value portion)
Although the tree module may physically split a record across pages or overflow areas, the VM sees it as one continuous byte sequence.
The VM alone is responsible for formatting records. The tree module does not interpret field structure it just stores and retrieves byte strings.
Record Formats — Fixed vs Variable Length
There are two conceptual ways to format records:
Fixed-Length Format
- Each column occupies a predetermined amount of space.
- Record size is constant.
- Padding may be required.
- Column sizes are known at table creation time.
This design simplifies offset calculation but wastes space when values are small.
Variable-Length Format
- Column sizes vary per record.
- No padding.
- Record size depends on actual data.
- Offsets must be computed dynamically.
SQLite uses a variant of variable-length formatting.
Why?
Because it yields several advantages:
- Smaller database files (no wasted padding).
- Faster I/O (fewer bytes moved between disk and memory).
- Supports dynamic, manifest typing.
This final point is especially important.
If SQLite enforced static typing, fixed-length storage might be practical. But SQLite allows:
- An INTEGER column to store TEXT.
- A TEXT column to store numbers.
- Any column to store NULL.
Variable-length formatting makes this possible without structural constraints.
Manifest Typing — A Preview
Traditional databases enforce type at schema level. SQLite enforces type at value level.
This is called manifest typing.
Each value carries its own storage class, independent of column declaration.
That’s why the Mem object stores type flags with the value itself. The type is not determined solely by the table schema — it is determined by the value inserted.
This flexibility is what allows:
INSERT INTO t1(c1) VALUES(123);
INSERT INTO t1(c1) VALUES('abc');
to coexist even if c1 was declared as TEXT.
In the next sections, we will explore how this manifest typing model interacts with:
- Record headers
- Serial types
- On-disk encoding
Because once you understand how SQLite encodes records at the byte level, the design of the VM, the Mem object, and the B-tree layer will all snap into alignment.
AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.
Any feedback or contributors are welcome! It’s online, source-available, and ready for anyone to use.
⭐ Star it on GitHub:
HexmosTech
/
git-lrc
Free, Unlimited AI Code Reviews That Run on Commit
AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.
See It In Action
See git-lrc catch serious security issues such as leaked credentials, expensive cloud operations, and sensitive material in log statements
git-lrc-intro-60s.mp4
Why
- 🤖 AI agents silently break things. Code removed. Logic changed. Edge cases gone. You won't notice until production.
- 🔍 Catch it before it ships. AI-powered inline comments show you exactly what changed and what looks wrong.
- 🔁 Build a habit, ship better code. Regular review → fewer bugs → more robust code → better results in your team.
- 🔗 Why git? Git is universal. Every editor, every IDE, every AI…



Top comments (0)