DEV Community

Unicorn Developer

Let's make a programming language. Lexer—Key points

This is the third part of our series of talks on creating a programming language. This episode focuses on building a lexer, the component of a language interpreter that breaks an input string down into atomic tokens.

About the speaker

Yuri Minaev is an experienced C++ developer, architect at PVS-Studio, and a recognized voice in the C++ community who has spoken at CppCast, C++ on Sea, and CppCon. Over the course of ten sessions, he'll guide you through each stage of building your own programming language.

The lexer in action

The speaker begins by revisiting grammars and describes how a lexer handles the lowest-level grammar elements: numbers, identifiers (such as variable names), operators, and keywords. Using simple binary expressions as examples, Yuri demonstrates how the lexer scans an input string character by character, groups characters into tokens, and ignores whitespace.
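To make the scanning step concrete, here is a minimal sketch of a tokenizer for a simple binary expression. It is not the code from the talk: the names Kind, Token, and tokenize are hypothetical and chosen only for illustration. It walks the input character by character, skips whitespace, and groups consecutive digits or letters into a single token.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token kinds for this sketch; the talk's real types may differ.
enum class Kind { Number, Identifier, Operator };

struct Token {
    Kind kind;
    std::string text;
};

// Scans the input left to right and groups characters into tokens.
std::vector<Token> tokenize(std::string_view input) {
    // <cctype> functions require a value representable as unsigned char.
    auto uc = [](char c) { return static_cast<unsigned char>(c); };

    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        if (std::isspace(uc(input[i]))) {  // whitespace separates tokens but is not one
            ++i;
            continue;
        }
        const std::size_t start = i;
        Kind kind;
        if (std::isdigit(uc(input[i]))) {
            // Consume the whole run of digits as one number.
            while (i < input.size() && std::isdigit(uc(input[i]))) ++i;
            kind = Kind::Number;
        } else if (std::isalpha(uc(input[i])) || input[i] == '_') {
            // Consume letters, digits, and underscores as one identifier.
            while (i < input.size() &&
                   (std::isalnum(uc(input[i])) || input[i] == '_')) ++i;
            kind = Kind::Identifier;
        } else {
            // Assumption: anything else is a single-character operator.
            ++i;
            kind = Kind::Operator;
        }
        tokens.push_back({kind, std::string(input.substr(start, i - start))});
    }
    return tokens;
}
```

With this sketch, tokenize("x1 + 42") yields three tokens: the identifier x1, the operator +, and the number 42. The spaces between them produce no tokens.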

How the lexer works in C++

Yuri explains the lexer's internal design in C++, including the token structure, iterator-based scanning, and "lazy tokenization". Why is it "lazy"? Instead of handling the entire input at once, the lexer generates tokens on demand and supports token previewing through a cached token mechanism. The implementation uses string_view to avoid unnecessary memory allocations and relies on helper functions to classify characters as digits, letters, separators, or operators.
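The on-demand design could look roughly like the sketch below. It is an illustration under assumptions, not the implementation shown in the talk: LazyLexer, LazyToken, TokenKind, peek, and next are hypothetical names. Each token is a std::string_view into the original source, so no text is copied, and peek stores the scanned token in a one-element cache that the following next call returns.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string_view>

// Hypothetical names for this sketch; the talk's lexer may be organized differently.
enum class TokenKind { Number, Identifier, Operator, End };

struct LazyToken {
    TokenKind kind;
    std::string_view text;  // view into the source buffer, no allocation
};

class LazyLexer {
public:
    // The source buffer must outlive the lexer and every token it returns.
    explicit LazyLexer(std::string_view source) : src_(source) {}

    // Looks at the next token without consuming it.
    const LazyToken& peek() {
        if (!cached_) cached_ = scan();
        return *cached_;
    }

    // Consumes and returns the next token, using the cached one if present.
    LazyToken next() {
        if (cached_) {
            LazyToken token = *cached_;
            cached_.reset();
            return token;
        }
        return scan();
    }

private:
    // Helper classifiers; the casts keep <cctype> calls well defined.
    static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
    static bool isLetter(char c) {
        return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
    }

    // Scans exactly one token starting at the current position.
    LazyToken scan() {
        assert(pos_ <= src_.size());
        while (pos_ < src_.size() && isSpace(src_[pos_])) ++pos_;
        if (pos_ == src_.size()) return {TokenKind::End, src_.substr(pos_, 0)};

        const std::size_t start = pos_;
        TokenKind kind;
        if (isDigit(src_[pos_])) {
            while (pos_ < src_.size() && isDigit(src_[pos_])) ++pos_;
            kind = TokenKind::Number;
        } else if (isLetter(src_[pos_])) {
            while (pos_ < src_.size() && (isLetter(src_[pos_]) || isDigit(src_[pos_]))) ++pos_;
            kind = TokenKind::Identifier;
        } else {
            ++pos_;  // assumption: any other character is a one-character operator
            kind = TokenKind::Operator;
        }
        return {kind, src_.substr(start, pos_ - start)};
    }

    std::string_view src_;
    std::size_t pos_ = 0;
    std::optional<LazyToken> cached_;  // the single previewed token, if any
};
```

Calling peek twice in a row returns the same token, and the input is only scanned as far as the caller has asked for. The trade-off of using std::string_view is a lifetime requirement: the source string has to stay alive for as long as any token is in use.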

The talk also covers handling integers, floating-point numbers, identifiers, keywords, single-character operators, and multi-character operators such as == and !=. The lexer follows a greedy strategy, always consuming the longest sequence of characters that forms a valid token, and it keeps scanning even after encountering invalid input, producing error tokens rather than stopping execution.
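The greedy rule and the error tokens can be sketched as follows. This is a simplified illustration, not the talk's code: Tk, Lexeme, and lexGreedy are hypothetical names, the operator set is an assumption, and identifiers and keywords are left out to keep the example short, so letters end up as error tokens here.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>
#include <vector>

// Hypothetical names for this sketch.
enum class Tk { Integer, Float, Operator, Error };

struct Lexeme {
    Tk kind;
    std::string_view text;  // view into the input; the input must outlive it
};

inline bool isDigitChar(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Tokenizes numbers and operators greedily, emitting Error tokens for
// characters it does not recognize instead of aborting.
std::vector<Lexeme> lexGreedy(std::string_view in) {
    std::vector<Lexeme> out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (std::isspace(static_cast<unsigned char>(in[i]))) {
            ++i;
            continue;
        }
        const std::size_t start = i;
        Tk kind;
        if (isDigitChar(in[i])) {
            while (i < in.size() && isDigitChar(in[i])) ++i;
            kind = Tk::Integer;
            // Greedy: extend to a float only if a digit follows the dot.
            if (i + 1 < in.size() && in[i] == '.' && isDigitChar(in[i + 1])) {
                ++i;
                while (i < in.size() && isDigitChar(in[i])) ++i;
                kind = Tk::Float;
            }
        } else if (in[i] == '=' || in[i] == '!' || in[i] == '<' || in[i] == '>') {
            ++i;
            // Greedy: prefer the two-character form (==, !=, <=, >=) when possible.
            if (i < in.size() && in[i] == '=') ++i;
            kind = Tk::Operator;
        } else if (std::string_view("+-*/()").find(in[i]) != std::string_view::npos) {
            ++i;
            kind = Tk::Operator;
        } else {
            // Unknown character: record it as an error token and keep going.
            ++i;
            kind = Tk::Error;
        }
        out.push_back({kind, in.substr(start, i - start)});
    }
    return out;
}
```

For the input "1 != 2.5 @ ==", this sketch produces five tokens: the integer 1, the operator !=, the float 2.5, an error token for @, and the operator ==. Scanning continues past the invalid character, so later stages can report every problem in the input rather than only the first one.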

Yuri wraps up the session by discussing ambiguous grammar problems in languages like C++, explains why the custom language avoids them, and outlines plans to implement a parser in the next part.

Want more?

If you want to watch other talks or see the whole episode, follow this link.

You can also sign up for our upcoming webinars, for example: Let's make a programming language. Parser.

If you'd like to learn more about the PVS-Studio analyzer, check out our website.

See ya!
