DEV Community: João Henrique Machado Silva

C++ or Blueprints: pros and cons, when to use one or the other?

João Henrique Machado Silva — Mon, 28 Nov 2022 10:18:18 +0000

Ever since my first contact with Unreal Engine I have been asked this question: When do you use C++ or Blueprints? Which one is better? Which one is faster?

Well, on this post I am going to try to answer all these questions and a couple more, but first I am going to set the foundation for it and talk a little about what they are and how they work behind the scenes. Hopefully by doing that the answer to the above questions will come easy.

One thing to be said and I think it should be said at the start, there is no silver bullet or one “right answer” that is going to fit every single project out there. But hopefully by the end of this article you will have more information to help you with your design choices in your projects.

Let’s talk a bit about C++

People are often very scared of using C++ in Unreal Engine and the reason is because pure C++ requires a wide range of low-level knowledge, having to take care of cleaning up memory, taking care of a build toolchain and a real understanding of what system programming is.

But the fact is that in Unreal Engine you don’t really use pure C++, the same way that in Unity you don’t really use pure C#.

C++ in Unreal Engine and C++ in the “real world” are two completely different beasts. — Me

And because of that writing C++ code in Unreal Engine is way easier than outside of it.

But it is important to say that it is still the same language, so when people say “Unreal C++” it’s not actually a modified version of C++ itself, but in fact an extensive game-specific API built on top of good old C++.

The Unreal Engine C++ API provides a number of low-level mechanisms to make your life as programmer much easier. I won’t go into too much detail on each one of them here otherwise this articles would turn into a book, but some of them are Reflection and Object Handling with Garbage Collection happening under the hood. These “features” work with everything that is marked as U-SOMETHING: UCLASS, UFUNCTION, UPROPERTY. Basically every object in the world is based on the UObject class, which is automatically garbage collected. Check Basic Class Structure.

Unreal C++ also provides libraries supporting math, vectors, strings (with support for text localization) and many other standard useful things. Just to make development easier, so game programmers do not have to reinvent the wheel every single game.

Unreal C++ API is commonly considered as “halfway to the simplicity of scripting languages” — in the perspective of full-time programmers. It’s still C++, the entry barrier is higher than any scripting language or C#.

What about Blueprints?

Blueprints were first introduced in Unreal Engine 4 and the short explanation is that Blueprints are a way to create a UCLASS without the need to actually write C++ code. When you create a new blueprint in the editor, you will have to extend it from some other class, that being a native C++ class or another blueprint.

But of course there are other basic design differences.

For one Blueprints nodes run in a Virtual Machine (VM) that enables the nodes to call native C++ functions. But that comes at a cost, relying on a VM to translate Blueprint node into native C++ can potentially slow your game performance’s. A lot of times this cost isn’t really noticeable and you can bear that cost, but there are times when you would want your game to run as fast as possible and you will have no choice but having at least parts of your project in native C++. And although this is not an exact science, you have tools to help figure it out what points to focus on like the Profiler Tool. And even tools to help convert some of your blueprints automatically to C++ code during packaging, like Blueprint Nativization.

Ok, but what are blueprints?

Well, blueprints are nothing more that a Visual Scripting Language, focus on Visual. With blueprints you basically use these nice looking nodes to do all your programming and logic. With blueprints you can add, edit and customize components, implement custom logic, create and respond to event and interactions, define custom variables, handle input and basically do pretty much the same things you would do with C++, with some limitations only. For instance there are something that as of the writing of this post you cannot do in Blueprint for example the setting up of the Gameplay Ability System.

C++ vs Blueprints

C++ Pros

Faster runtime performance: Generally C++ logic is significantly quicker than Blueprint logic, for reasons already discussed.
Explicit Design: When exposing variables or functions from C++ you have more control over exposing precisely what you want, so you can protect specific functions/variables and build a formal “API” for your class. This allows you to avoid creating overly large and hard to follow Blueprints.
Broader Access: Functions and variables defined in C++ (and exposed correctly) can be accessed from all other systems, making it perfect for passing information between different systems. Also, C++ has more engine functionality exposed to it than Blueprints, for example as already mentioned the Gameplay Ability System.
More Data Control: C++ has access to more specific functionality when it comes to loading and saving data. This allows you to handle version changes and serialization in a very custom way. One important note is that Enums and structs that are defined in C++ can be used both by C++ and Blueprints, but user structs/enums cannot be used in C++. So a good recommendation would be to implement critical enums and structs in C++.
Network Replication: Replication support in Blueprints is straightforward and is designed to be used in smaller games or for unique one-off Actors. If you need tight control over replication bandwidth or timing you will need to use C++. Hence why most large studios and project will do all their networking in C++.
Better version control: C++ code and data (as well as config and possibly custom solutions) is stored as text, which makes working with version control, diff and merges a lot easier.

C++ Cons

Some of the cons of C++ are also the pros of using Blueprints, so I’ll mention those below except for the fact that C++ have a much longer learning curve and although I still think that people should learn the basics of general programming and object oriented programming concepts before they go into Blueprints, that process in Blueprints is a lot faster.

Blueprints Pros

Faster Prototyping: Usually creating a new Blueprint class and adding variables and functions is faster than doing something similar in C++, so prototyping brand new systems is often faster in Blueprint.
Faster Iteration: It is much quicker to modify Blueprint logic and preview inside the editor than it is to go to your code editor, editing the code, recompile the game, and hoping the HOT RELOAD won’t screw things up. So blueprints are basically more tweakable most of the time.
Better Flow Visualization: Sometimes, although this one can be very personal, It can be complicated to visualize “game flow” in C++, so it is often better to implement that in Blueprints (or in custom systems like Behavior Trees that are designed for this). Delay and async nodes make it much easier to follow flow than using C++ delegates.
Designer and Artist friendly: Designers and artists without specific technical training can create and edit Blueprints, which makes Blueprints ideal for assets that need to be modified by more than just engineers.

Blueprints Cons

Yes, I did mention that Blueprints can be better for flow visualization, but if the developer is not careful and organized enough it can easily turn into very messy code.

Also as mentioned already, Enums and Structs created on Blueprint can only be accessed in Blueprints, so if you plan to move your gameplay logic to C++ over time it is recommended to implement at least critical Enums and Structs in C++, at least if more the one or two Blueprints use them.

Another point against Blueprints is that as of writing this article there are still a couple of things that you can only do it in C++, for example the setting up of the Gameplay Ability System.

And last but not least, there is performance issue that although is not always an issue and it can be circumvented by thing like Blueprint Nativization, I still have to mention it.

So, which one should I use?

Like I mentioned in the beginning of the article, there is no Silver Bullet. There are a lot of things to consider when choosing to work with Blueprints vs C++. From the expertise of the people working with it, how confortable is the team with C++ and Blueprints. What phase of the project you are in? What are the most important pillars of the project? What type of project plays a huge role in here also? What features and mechanics will your game have? Well, and the list goes on.

The fact is that Blueprints and C++ excel in different areas, and the one that should be used is the one that offers the most support and comfort to whatever needs to be accomplished.

What is my usual approach?

Again, it varies from project to project, but here are my two cents on the matter.

For the end product I always try to do at least all my base classes, implementations and systems in C++ so I can then extend it in blueprints. I fell like this is a very efficient workflow.
Related to above point, casting to Blueprints can sometimes be very expensive, so for this reason I always use native C++ base classes or minimal Blueprints base classes that define important functions, components and variables and then make your heavier Blueprints as child classes of those.
It is absolutely fine and recommended to prototype systems and features in Blueprint, since Blueprints are much faster to iterate.
I also try to do all my networking and event related logic in C++ and for the event surface them in Blueprints as needed.
Whatever you use, keep it well documented and commented.
For the rest, it has to be a case by case evaluation, bottom line is:

Use the right tool for the right job.

To wrap it up

On this article we started by taking a little bit of a dive into how things works behind the scenes for Blueprints and C++ so than we could talk about the pros and cons of each approach. To finalize I made sure to talk a bit about my take on the matter and my usual approach to when dealing with this topic.

I hope I was able to shine a bit of light on the subject and at least help you choosing your path on your next project.

Be sure to leave a thumbs up and of course comments are always welcome!

Cheers!

What would SQLite look like if written in Rust? — Part 3

João Henrique Machado Silva — Wed, 07 Apr 2021 13:56:10 +0000

Writing a SQLite clone from scratch in Rust

← Part 0 — Overview

← Part 1 — Understanding SQLite and Setting up CLI Application and REPL

← Part 2 — SQL Statement and Meta Commands Parser + Error Handling

View on Github (pull requests are more then welcome)

Alright, slowly we are getting closer and closer to our objective. Which is to have a working simple relational database modeled after SQLite, meaning among other things, embedded into a single file. This time on top of all the research I had already mentioned in the previous articles of the series, I spent a lot of time just researching everything SQLite related. E.g.: documentation, code base and talks.

I also spent a lot of my time since the last chapter of the series understanding the data structures used on relational databases at their fundamental principles, how they are used on SQLite and what would be best way to approach them on the SQLRite Project.

Data Structures are key to a good program.

Here’s a quote from Linus Torvalds in 2006:

I’m a huge proponent of designing your code around the data, rather than the other way around, and I think it’s one of the reasons git has been fairly successful… I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

If you need more, here is one from Rob Pike in 1989:

Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

I think those guys know what they are talking about when it comes to software, don’t you agree? Anyway, my take on what they mean by it is that good data structures makes coding easy to design and maintain, whereas the best code in the world cannot make up for poor data structures. I think this also ties to what I said on Part 1, “If you spend enough time on planning, coding is easy”.

So on this chapter of the series we will do a deep dive into the main data structures used on database design and hopefully we will get a better understanding on why they are used.

So let’s get started!

The first thing I would like to explore is a data structure called B-Tree which is a key piece of modern database design.

Lets start by talking about Binary Search Trees

Traditionally sorted maps have been the domain of Binary Search Trees (BSTs). There is no shortage of literature, implementations and promotions about BSTs in the the educational system. They are great to think about, have a great theoretical and practical applications, and have about a million different variations to satisfy your needs.

The basic idea of a BST is that every element in the tree gets a single node. Each node has two pointers or children, a left child node and a right child node. Nodes in the sub-tree to the left must contain elements that are smaller then the parent, and nodes to the right must contain elements that are larger. This makes search fairly straight-forward: start at the “root” of the tree and compare the element in that node to your search key. Then recursively search either the left or right tree accordingly (or stop if you find an exact match).

There are several “types” of BSTs. For example if you use a Red-Black-Tree, then every basic operation (search, insert, remove) will take logarithmic time worst-case, which is great!

But BST have some serious practical issues when it comes to how computers actually work. A lots of applications, like databases and file systems, are not bound by how fast the CPU can execute instructions, but actually by how fast it can access the data on disk. Bottom line is that modern CPUs now a days go so fast that the data that they’re working with has to be really fast and close to CPU to actually be used. Light only travels so fast! One other thing to take into account are the caches available to the CPUs that have a hierarchy that ranges from “very small and very fast” to “very large and very slow (relatively)”.

How do caches work? At a very high level…

Caches usually work based on a time-space locality assumption. Ideally you would like for the data you need to be as close to each other as possible. For example, if you’re working with some data at location A right now, ideally you would want next data near location A. A great example of this assumption is something like looping over an array: every piece of data you want next is literally right next to the last one. So we can start to see how choosing the right data structure can make all the difference.

This assumption is usually implemented roughly like this: when location A in memory is requested, the CPU will check if it is in the fastest cache. If it is, great (cache hit)! If it’s not, too bad (cache miss), check the next (slower, bigger) level of cache. In the worst-case this bottoms out into your computer’s RAM (or worst, the disk!). When you do find the data, it gets added to all the previous (smaller, faster) levels of cache, along with some of the data surrounding it. This operation then evicts some data that is determined to be unlikely to be needed. How exactly that works is out of the scope of this post, but the point is: cache hits are fast, so we want to access data in a space-time local way. To get an idea of scale, check out these numbers.

We started talking about BSTs and ended up talking about CPUs and Cache Hits and Misses. But how do BSTs accesses data? Basically randomly. Each node is allocated separately from every other node in the tree. So unlike an array, each node is in a separate location in memory and a search will amount to a series of random accesses. As a rough estimate, every time you go from a pointer to another you can expect a cache miss. Niet goed! (As the dutch would say!)

BSTs are actually very memory inefficient. Each node has two pointers for every single entry in the tree. So on a 64-bit that means you’ve got a 16-byte overhead for every element. To make it worst, half of those pointers are null. And that’s in the best case scenario. When you factor in issues like padding and any extra metadata that nodes need to store (such as in a red-black tree), they are just terrible.

To make things worst for BSTs, note that every insertion triggers an allocation. Allocations are generally viewed as a slow thing to do, so if we can avoid them, we would!

B-Trees to the rescue!

What’s a B-Tree? And why?

A B-Tree is a self-balancing tree data structure that maintains data sorted and allows for searches, sequential access, insertions and deletions in logarithmic time. But wait?! Isn’t that exactly what a BST is? So why do I need a B-Tree?

A B-Tree takes the idea of a BST, and roughly speaking, say “how about we put some arrays in there, arrays are easy and computers love arrays, so why not?”. So instead of each node consisting of a single element with two children, a B-Tree have an array of elements with an array of children. All ordered!

Basically, for some fixed constant B, each node contains between B-1 and 2B-1 elements in sorted order (root can have as few as one element). An internal node (one which has children) with k elements has k+1 children. In this way each element still has a “left” and “right” child, but for example the 2nd child contains elements strictly between the first and second element.

According to Knuth’s definition, a B-tree of order M is a tree which satisfies the following properties:

Every node has at most m children.
Every non-leaf node (except root) has at least ⌈M/2⌉ child nodes.
The root has at least two children if it is not a leaf node.
A non-leaf node with k children contains k − 1 keys.
All leaves appear in the same level and carry no information.

Each internal node’s keys act as separation values which divide its subtrees. For example, if an internal node has 3 child nodes (or subtrees) then it must have 2 keys: _a1 and _a2. All values in the leftmost subtree will be less than _a1, all values in the middle subtree will be between _a1 and _a2, and all values in the rightmost subtree will be greater than _a2.

Historically, ever since its invention in 1970, B-Trees have been very popular as a data structure stored on disk. This is because accessing disk can be a super slow operation, but with a B-Tree you get back big chunks at once, in contrast to a BST. It takes way less CPU time to search through your data than it does to read the data into memory from disk. So if you for example take B = 1000, then you can get a thousand entries in the tree at once and process them in RAM relatively fast. Your tree will also be really shallow (not deep), specially in comparison to a BST, meaning each search will maybe hit the disk by following a pointer only a couple times. Sound familiar? This is the cache-hit problem that we were talking about just above.

Just so you could have an idea, here are some real numbers:

A 100,000 row SQLite database has a B-Tree with a depth 3. So to fetch a node we only need to read 3 pages or jump 3 times between nodes. If we had a BST we would have needed to do log(100000) / log(2) = 16 seeks! That’s more than five times as many. We definitely do not want that!

That is all nice and very interesting, but enough theory about B-Trees and let’s take a look into how SQLite uses it. If you would like to more in depth about B-Tree the its implementation in different programming languages, I highly recommend taking a look at the Open Data Structures. For another great source of information that gives a lot of background on B-Trees, see Knuth, The Art of Computer Programming, Volume 3 “Sorting and Searching”, pages 471–479. Nice light reading…

SQLite uses B-Trees to represent both tables and indexes. But SQLite actually uses two variants of B-Trees. Traditional B-Tree is used to store indexes and it is referred as “Index B-Trees”. And to store tables, SQLite uses a variation called B+ Tree, that is referred to as “Table B-Trees”.

The main difference is that a “Table B-Tree” or B+Tree uses a 64-bit signed integer key to reference the data in the Internal Nodes and all the data is actually stored in the Leaf Nodes. This 64-bit signed integer is referred to as “ROWID”. Additionally, a leaf node may include a pointer to the next leaf node to speed sequential access. Now an “Index B-Tree” or just B-Tree uses arbitrary keys and does not store any data.

Here are some basic differences:

+-------------------------------+----------------+-----------------+  
|               #               |      B-tree    |       B+ tree   |  
+-------------------------------+----------------+-----------------+  
| Pronounced                    | “Bee Tree”     | “Bee Plus Tree” |      
| Used to store                 | Indexes        | Tables          |      
| Internal nodes store keys     | Yes            | Yes             |      
| Internal nodes store values   | Yes            | No              |      
| Number of children per node   | Less           | More            |      
| Internal nodes vs. leaf nodes | Same struct    | Different struct|   
+-------------------------------+----------------+-----------------+

Another important piece of information is that SQLite keeps one “Table B-Tree” in the database for each table in the database schema, including system tables such as sqlite_schema. SQLite also has a type of table called WITHOUT_ROWID, but I won’t get into this right now. SQLite also keeps one “Index B-Tree” in the database for each index in the schema, including implied indexes created by UNIQUE constraints.

Let’s try to visualize it!

Imagine that you have the following table stored in the database:

+-------+-------+-------+-----+  
| ROWID | Name  | Marks | Age |  
+-------+-------+-------+-----+  
|     6 |  Jone |     5 |  28 |  
|    15 | Alex  |    32 |  45 |  
|    12 | Tom   |    37 |  23 |  
|    53 | Ron   |    87 |  13 |  
|    24 | Mark  |    20 |  48 |  
|    25 | Bob   |    89 |  32 |  
+-------+-------+-------+-----+

First, the database creates a unique auto incrementing index (ROWID or uses the primary key) for each of the given record and convert relevant rows into a byte stream. Then it stores each of the key and record byte stream on a B+tree. Here, the ROWID is used as the key for indexing. The key and record byte stream altogether know as Payload. Resulting B+tree could be represented as follows.

B+tree on SQLite Database

As you can see, all records are stored on leaf nodes of the B+tree or “Table B-Tree” and index or ROWID are used as the key to creating a B+tree. There are no records stored on Internal nodes. And each of the leaf nodes have a reference to next record in the tree. This way the database can perform a binary search by using the index or sequential search by searching through every element and only traveling through the leaf nodes.

If no indexing used, then the database will read each of these records to find the given record. When indexing enabled, the database creates a B-Tree or “Index B-Tree” for each of the indexed columns in the table as follows. In this example we are indexing the “Name” column and the column value is used as a key in the B-tree for indexing. The index is the reference to the actual data record in the “Table B-Tree”.

Index B-Tree on SQLite Database

When indexing is available, first the database engine will search a given key in corresponding Index B-Tree and get the index in logarithmic time. Then it performs another search in Table B-Tree or B+Tree by using already found index also in logarithmic time and returns the record.

Since we didn’t really do any coding this time, here it is just a taste of how a Database is represented in the SQLite codebase, which by the way is really well commented and documented.

sqliteint.h

There is it, out beloved Btree, representing the beginning of our database. If we dig deep into these properties we will start to see a lot of what we have been talking about in the post. I won’t do this here, but I highly recommend that you take a look at the SQLite code, if you are curious.

Summary

I am very big fan of the First Principles Approach. When applying First Principles you first try to Identify the problem you are trying to solve, define some constraints and assumptions. The good thing about defining constraints at the start is that by defining constraints you are basically defining what you cannot do. And if you define what you cannot do, what you can do becomes a lot more clear. After this first step you breakdown the problem into smaller pieces or its fundamental principles. And last but not least, create a new solution from scratch.

This time we took a very important step towards understanding the problem, defining some constraints by defining some limitations of some data structures and even broke down the problem into smaller pieces like how to deal with tables and indexes.

Now we are ready to decide how we are going to build a solution to the problem from scratch, meaning, how are we actually going to build our data structures in Rust.

← Part 0 — Overview

← Part 1 — Understanding SQLite and Setting up CLI Application and REPL

← Part 2 — SQL Statement and Meta Commands Parser + Error Handling

View on Github (pull requests are more then welcome)

If you wanna follow this track don’t forget to follow me here on Medium and also give a couple of claps!

Additional sources:

What would SQLite look like if written in Rust? — Part 2

João Henrique Machado Silva — Tue, 23 Feb 2021 15:56:01 +0000

What would SQLite look like if written in Rust? — Part 2

Writing a SQLite clone from scratch in Rust

Part 0 — Overview

Part 1 — Understanding SQLite and Setting up CLI Application and REPL

Part 3 - Understanding the B-Tree and its role on database design

Alright! We are building a clone of SQLite, aka SQLRite, and last time all we did was a simple CLI application that will use for external commands and some accessory functions and also a simple REPL that would take a simple command to exit the application gracefully.

Now we are taking another couple of steps towards our goal. First we want to parse the input to be able to differentiate if the input is a MetaCommand or a SQLCommand . A MetaCommand start with a dot and take direct actions like .open , .help and .exit . And a SQLCommand is, well, you know.

The second step we want to take is to be able parse each of the command types and take the appropriate action. For now, we wont go too far on the database side, but we do want to be able to differentiate between different SQL Statements and have their components broken down into parts and ready to be executed. So next time we can focus on getting the parsed SQL Statement and execute it. Even against a simplified in-memory storage.

The frontend of SQLite parses the SQL statements, optimizes them, and as I mentioned on the previous post of the series, generates equivalent SQLite internal representation called bytecode

This bytecode is then passed to the virtual machine, on the backend side, which executes it.

Here is a diagram os the SQLite Architecture to refresh our minds.

Breaking the logic into steps like this has a couple advantages:

Reduces the complexity of each part (e.g. virtual machine does not worry about syntax errors).
Allows compiling common queries once and caching the bytecode for improved performance.

With this in mind, let’s get started!

First let’s look at some changes in our main.rs . Since this code can get quite extensive I won’t be commenting line by line, but instead focusing on the main points and design choices. For that reason I will try to add as many comments as possible and of course make the code as readable as I can. Nevertheless, please do not hesitate to start a discussion board, create an issue or contact me directly in case you have any questions.

Getting back to the project, as I mentioned above, the first thing I wanted to do is to be able to differentiate between a MetaCommand and a SQLCommand . You can see that I take care of that on line 64 by calling get_command_type(command: &String) that returns an enum of type repl::CommandType with the choices repl::CommanType::SQLCommand(String) and repl::CommanType::MetaCommand(String) . This way I can easily differentiate between the two types of inputs and take the appropriate action on each of them.

Also, as I mentioned on the previous post of the series, I like to keep the main.rs as clean as possible, and just like a point of entry to the application. The main.rs should have the least amount of clutter as possible.

extern crate clap;

mod meta_command;
mod repl;
mod sql;
mod error;

use repl::{REPLHelper, get_config, get_command_type, CommandType};
use meta_command::{handle_meta_command};
use sql::{process_command};

use rustyline::error::ReadlineError;
use rustyline::{Editor};

use clap::{App, crate_authors, crate_description, crate_version};

fn main() -> rustyline::Result<()> {
    env_logger::init();

    let _matches = App::new("Rust-SQLite")
                          .version(crate_version!())
                          .author(crate_authors!())
                          .about(crate_description!())
                          .get_matches();

    // Starting Rustyline with a default configuration
    let config = get_config();

    // Getting a new Rustyline Helper
    let helper = REPLHelper::default();

    // Initiatlizing Rustyline Editor with set config and setting helper
    let mut repl = Editor::with_config(config);
    repl.set_helper(Some(helper));

    // This method loads history file into memory
    // If it doesn't exist, creates one
    // TODO: Check history file size and if too big, clean it.
    if repl.load_history("history").is_err() {
        println!("No previous history.");
    }

    // Friendly intro message for the user
    println!("Rust-SQLite - {}\n{}{}{}{}",
    crate_version!(),
    "Enter .exit to quit.\n",
    "Enter .help for usage hints.\n",
    "Connected to a transient in-memory database.\n",
    "Use '.open FILENAME' to reopen on a persistent database.");

    loop {
        let p = format!("sqlrite> ");
        repl.helper_mut()
            .expect("No helper found")
            .colored_prompt = format!("\x1b[1;32m{}\x1b[0m", p);
        // Source for ANSI Color information: http://www.perpetualpc.net/6429_colors.html#color_list
        // http://bixense.com/clicolors/

        let readline = repl.readline(&p);
        match readline {
            Ok(command) => {
                repl.add_history_entry(command.as_str());
                // Parsing user's input and returning and enum of repl::CommandType
                match get_command_type(&command.trim().to_owned()) {
                    CommandType::SQLCommand(_cmd) => {
                        // process_command takes care of tokenizing, parsing and executing
                        // the SQL Statement and returning a Result<String, SQLRiteError>
                        let _ = match process_command(&command) {
                            Ok(response) => println!("{}",response),
                            Err(err) => println!("An error occured: {}", err),
                        };
                    }
                    CommandType::MetaCommand(cmd) => {
                        // handle_meta_command parses and executes the MetaCommand
                        // and returns a Result<String, SQLRiteError>
                        let _ = match handle_meta_command(cmd) {
                            Ok(response) => println!("{}",response),
                            Err(err) => println!("An error occured: {}", err),
                        };
                    }
                }
            }
            Err(ReadlineError::Interrupted) => {
                break;
            }
            Err(ReadlineError::Eof) => {
                break;
            }
            Err(err) => {
                println!("An error occured: {:?}", err);
                break;
            }
        }
    }
    repl.append_history("history").unwrap();

    Ok(())
}

Meta Commands

Next we will move to the meta_command module, which in the repository you will find it in src/meta_command/mod.rs . The idea here is to write a code that is scalable up to a point. I want to easily be able to add more MetaCommands in the future. There are four points on this module.

First the enum type definition, that to improve user experience I added an option Unknown to pick up any MetaCommands not yet defined. After that we have an impl block for the fmt::Display trait, that helps us configure how custom types would be printed out in case we want to use them in a println! macro for example. Then on line 25 you will see another impl block with a fn new method, acting as a constructor for our MetaCommand type. I say acting because Rust is not an Object Oriented language, so the fn new is not like a constructor would be in languages like Java , in fact you could call it anything you want instead of new .

And last but not least we have the pub fn handle_meta_command function, that is responsible for matching the inputed MetaCommand to it’s appropriate command and taking action. You will notice that is returns a Result<String, SQLRiteError> , so we can return a message to the user with ease.

use crate::error::{Result, SQLRiteError};

use std::fmt;

#[derive(Debug, PartialEq)]
pub enum MetaCommand {
    Exit,
    Help,
    Open(String),
    Unknown,
}

/// Trait responsible for translating type into a formated text.
impl fmt::Display for MetaCommand {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            MetaCommand::Exit => f.write_str(".exit"),
            MetaCommand::Help => f.write_str(".help"),
            MetaCommand::Open(_) => f.write_str(".open"),
            MetaCommand::Unknown => f.write_str("Unknown command"),
        }
    }
}

impl MetaCommand {
    pub fn new(command: String) -> MetaCommand {
        let args: Vec<&str> = command.split_whitespace().collect();
        let cmd = args[0].to_owned();
        match cmd.as_ref() {
            ".exit" => MetaCommand::Exit,
            ".help" => MetaCommand::Help,
            ".open" => MetaCommand::Open(command),
            _ => MetaCommand::Unknown,
        }
    }
}

pub fn handle_meta_command(command: MetaCommand) -> Result<String> {
    match command {
        MetaCommand::Exit => std::process::exit(0),
        MetaCommand::Help => {
            Ok(format!("{}{}{}{}{}","Special commands:\n",
                            ".help - Display this message\n",
                            ".open <FILENAME> - Reopens a persistent database.\n",
                            ".ast <QUERY> - Show the abstract syntax tree for QUERY.\n",
                            ".exit - Quits this application"))
        },
        MetaCommand::Open(args) => Ok(format!("To be implemented: {}", args)),
        MetaCommand::Unknown => Err(SQLRiteError::UnknownCommand(format!("Unknown command or invalid arguments. Enter '.help'"))),
    }
}

Alright people! We are finally going to do some database stuff! I bet everyone was like “Wasn’t this guy suppose to build a database?”, well yeah, but you gotta build a base first.

Like when you are building a house, laying a nice foundation is one of the most important things you can do for your software.

Database Stuff

This is our sql module and in the github repository you will find it in src/sql/mod.rs . This actually doesn’t look that different from our meta_command module, at least structure wise. We have an enum , defining the types of queries we plan to support at first. Then a impl block with a fn new method, again acting as a constructor .

And then a fn process_command function returning a Result<String, SQLRiteError>, that if you can remember is invoked from our main.rs . On this function is where the magic starts to happen. You will notice that right at the beginning of the fn process_command function we make use of the sqlparser-rs crate, that did a great job building a Extensible SQL Lexer and Parser for Rust with a number of different SQL dialects, including a SQLite dialect, so for the time being I decided to go with them instead of writing a completely new SQL Lexer . By calling Parser::parse_sql() I am getting it back a Result<Vec<Statement>, ParserError which I do some basic checking and pass it it to a match statement to determine which type of SQL Statement was inputed or if there was an error during the process, if so I just return the error. The Statement returned is a sqlparser::ast::Statement , which is an enum of all the possible statements, as you can see in the link I added from the sqlparser documentation.

For now, the only SQL Statement I managed to actually build the parser was CREATE TABLE , for the rest so far we are only identifying the type of SQL Statement and returning to the user. In the match statement block that matches with CREATE TABLE we call another module parser::create which contains all the logic for the CREATE TABLE . I have this one right after this block.

mod parser;
pub mod tokenizer;

use parser::create::{CreateQuery};

use sqlparser::ast::{Statement};
use sqlparser::dialect::SQLiteDialect;
use sqlparser::parser::{Parser, ParserError};

use crate::error::{SQLRiteError, Result};

#[derive(Debug,PartialEq)]
pub enum SQLCommand {
    Insert(String),
    Delete(String),
    Update(String),
    CreateTable(String),
    Select(String),
    Unknown(String),
}

impl SQLCommand {
    pub fn new(command: String) -> SQLCommand {
        let v = command.split(" ").collect::<Vec<&str>>();
        match v[0] {
            "insert" => SQLCommand::Insert(command),
            "update" => SQLCommand::Update(command),
            "delete" => SQLCommand::Delete(command),
            "create" => SQLCommand::CreateTable(command),
            "select" => SQLCommand::Select(command),
            _ => SQLCommand::Unknown(command),
        }
    }
}

/// Performs initial parsing of SQL Statement using sqlparser-rs
pub fn process_command(query: &str) -> Result<String> {
    let dialect = SQLiteDialect{};
    let message: String;
    let mut ast = Parser::parse_sql(&dialect, &query).map_err(SQLRiteError::from)?;

    if ast.len() > 1 {
        return Err(SQLRiteError::SqlError(ParserError::ParserError(format!(
            "Expected a single query statement, but there are {}",
            ast.len()
        ))));
    }

    let query = ast.pop().unwrap();

    // Initially only implementing some basic SQL Statements
    match query {
        Statement::CreateTable{..} => {
            let result = CreateQuery::new(&query);
            match result {
                Ok(payload) => {
                    println!("Table name: {}", payload.table_name);
                    for col in payload.columns {
                        println!("Column Name: {}, Column Type: {}", col.name, col.datatype);
                    }
                },
                Err(err) => return Err(err),
            }
            message = String::from("CREATE TABLE Statement executed.");
            // TODO: Push table to DB
        },
        Statement::Query(_query) => message = String::from("SELECT Statement executed."),
        Statement::Insert {..} => message = String::from("INSERT Statement executed."),
        Statement::Delete{..} => message = String::from("DELETE Statement executed."),
        _ => {
            return Err(SQLRiteError::NotImplemented(
                "SQL Statement not supported yet.".to_string()))
        }
    };

    Ok(message)
}

This is our sql::parser::create module. Here we have two struct types defined. The first one being ParsedColumn , well, representing a column in a table and the second one being CreateQuery , representing a table. As you can see the CreateQuery struct has a property called columns which is a vector of ParsedColumns . And our main method on this module, which is the fn new , returns a Result<CreateTable, SQLRiteError> , which will then be inserted into our database data structure that is still to be defined in the code, although I already have a pretty good idea of what is going to look like in my head and my design notes.

use sqlparser::ast::{ColumnOption, DataType, ObjectName, Statement};

use crate::error::{SQLRiteError, Result};

// Represents Columns in a table
#[derive(PartialEq, Debug)]
pub struct ParsedColumn {
    pub name: String,
    pub datatype: String,
    pub is_pk: bool,
    pub is_nullable: bool,
}

/// Represents a SQL Statement CREATE TABLE
#[derive(Debug)]
pub struct CreateQuery {
    pub table_name: String,         // table name
    pub columns: Vec<ParsedColumn>, // columns
}

impl CreateQuery {
    pub fn new(statement: &Statement) -> Result<CreateQuery> {
        match statement {
            // Confirming the Statement is sqlparser::ast:Statement::CreateTable
            Statement::CreateTable {
                name,
                columns,
                constraints: _constraints,
                with_options: _with_options,
                external: _external,
                file_format: _file_format,
                location: _location,
                ..
            } => {
                let table_name = name;
                let mut parsed_columns: Vec<ParsedColumn> = vec![];

                // Iterating over the columns returned form the Parser::parse:sql
                // in the mod sql
                for col in columns {
                    let name = col.name.to_string();
                    // TODO: Add datetime and timestamp here
                    // Parsing each column for it data type
                    // For now only accepting basic data types
                    let datatype = match &col.data_type {
                        DataType::SmallInt => "int",
                        DataType::Int => "int",
                        DataType::BigInt => "int",
                        DataType::Boolean => "bool",
                        DataType::Text => "string",
                        DataType::Varchar(_bytes) => "string",
                        DataType::Float(_precision) => "float",
                        DataType::Double => "float",
                        DataType::Decimal(_precision1, _precision2) => "float",
                        _ => {
                            println!("not matched on custom type");
                            "invalid"
                        }
                    };

                    let mut is_pk: bool = false;
                    for column_option in &col.options {
                        is_pk = match column_option.option {
                            ColumnOption::Unique { is_primary } => is_primary,
                            _ => false,
                        };
                    }

                    parsed_columns.push(ParsedColumn {
                        name,
                        datatype: datatype.to_string(),
                        is_pk,
                        is_nullable: false,
                    });
                }
                // TODO: Handle constraints,
                // Unique, Primary Key, Nullable, Default value and others.
                for constraint in _constraints {
                    println!("{:?}", constraint);
                }
                return Ok(CreateQuery {
                    table_name: table_name.to_string(),
                    columns: parsed_columns,
                });
            }

            _ => return Err(SQLRiteError::Internal("Error parsing query".to_string())),
        }
    }
}

Dealing with errors

You may have noticed that throughout the entire code I am making reference to a SQLRiteError type. That is an error type I defined as an enum using the thiserror crate, that is a super easy to use library that provides a convenient derive macro for the standard library’s std::error::Error trait. If you check the commits in the github repository, you may notice that I first wrote my own implementation of the std::error::Error trait. But then I bumped into this trait, that basically takes care of a lot of the boiler plate, and let’s face it, the code looks super clean! This is our error module so far, located in src/error.rs .

use thiserror::Error;

use std::{result};

use sqlparser::parser::ParserError;

pub type Result<T> = result::Result<T, SQLRiteError>;

#[derive(Error, Debug, PartialEq)]
pub enum SQLRiteError {
    #[error("Not Implemented error: {0}")]
    NotImplemented(String),
    #[error("General error: {0}")]
    General(String),
    #[error("Internal error: {0}")]
    Internal(String),
    #[error("Unknown command error: {0}")]
    UnknownCommand(String),
    #[error("SQL error: {0:?}")] 
    SqlError(#[from] ParserError),
}

/// Return SQLRite errors from String
pub fn sqlrite_error(message: &str) -> SQLRiteError {
    SQLRiteError::General(message.to_owned())
}

Summary

Alright! This time we managed to parse the user’s commands to differentiate between MetaCommand and SQLCommand . We also implemented a somewhat scalable MetaCommand module that makes it easy to add more commands in the future. We added the sql module, which by making use of the sqlparser-rs crate we are successfully parsing SQL Statements and being able to generate an ast from each SQL Statement. And we are already parsing and generating at least a simplified version of a bytecode from the CREATE TABLE SQL Statement, that is ready to go into the database (which we will do in the next chapter). And to finish it off, we also created an error module, so we have a standardized way of dealing with error throughout the application.

I would say that we are starting out with a nice base. What do you think?

I added a Project Progress and Roadmap sections to the github repository README.md , to add some visibility on where we are and where we are going.

Next time, we will finish parsing the basic SQL Statements we plan to be compatible with and start working on a In-Memory simplified version of out database .

View on Github (pull requests are more then welcome)

If you wanna follow this track don’t forget to follow me here on Dev.to and also give some love!

Part 0 — Overview

Part 1 — Understanding SQLite and Setting up CLI Application and REPL

Part 3 - Understanding the B-Tree and its role on database design

What would SQLite look like if written in Rust? — Part 1

João Henrique Machado Silva — Tue, 16 Feb 2021 08:48:00 +0000

What would SQLite look like if written in Rust? — Part 1

Writing a SQLite clone from scratch in Rust

← Part 0 - Overview

Part 2 — SQL Statement and Meta Commands Parser + Error Handling →

I think there isn't a developer out there that hasn't used at least one of the relational databases available out there, but do they really know how they work. I bet not. To be honest, a lot of what goes on behind the curtains with databases is still a black box to me, even after almost 2 decades working as a software developer.

Like I said in the previous post, here are some of the questions I want to be able to answer:

What format is data saved in? In our case, in memory and on disk.
When does it move from memory to disk? How would that work exactly?
Why we only have one primary key per table? Do you know?
How does rolling back a transaction work?
How are indexes formatted? And how do they work?
When and how does a full table scan happen? Does it happen every time we issue a query? Do the indexes help? How?
What format is a prepared statement saved in?
What extra features can we add to our DB, if any?

Basically, how a database really works?

To get there I want to write a database from scratch, in Rust and I will model it after the existing and popular SQLite. Why SQLite? Well, because I just don't have the time write something like MySQL or PostgreSQL, and possibly not the expertise either. Also, I want to develop a small database, contained in a single file, with the main goal of improving my understanding of it, and for that we don't need so many features as big distributed database.

Also I think it is important to say that even SQLite it self has a lot of features! You can see that even by typing sqlite3 --help . My intention here, at least at first, is not to deliver a full on replacement with all the features available.

The goal is to have a simple relational database contained in a single file and modeled after SQLite.

And for that my plan is to both, one take my time and second take one step at a time, with each post I plan to add one or two features and let's see how long this will take. I am not in a hurry, are you?

A former colleague of mine used to say something that really stuck to me and that was that:

If you spend enough time on planning, coding is easy.

As software developers I think it is in our nature to want to get our hands dirty and see things working fast. I know, it's fun to build things our of nothing and see them working. But a lot of times if we rush the process we probably will have to go back and redo the entire thing. Anyway, I am saying this because I do plan to take my time on each part and really dig in to the works of how SQLite was designed so I can try to replicate. If I really understand how it works and I have a step by step and a clear goal than the coding part will be piece of cake.

So, let's go!

Undestanding SQLite

Thankfully the documentation on their website is quite good, Architecture of SQLite. Plus I’ve got a copy of SQLite Database System: Design and Implementation, that was a great help understanding some concepts.

SQLite works by compiling SQL text into bytecode, then running that bytecode using a virtual machine.

The SQLite interfaces act as a compiler by converting SQL text into bytecode. This bytecode is basically a sqlite3_stmt object that implements a single SQL statement. The statement that has been compiled into binary form (bytecode) is ready to be evaluated and is passed on to the virtual machine, which in turn runs the program until it either completes, or forms a row of results to be returned, or hits a fatal error, or is interrupted.

Digging a little deeper…

A query on SQLite goes through a specific chain of components in order to interact with actual data.

SQLite is designed with a very modular architecture. The architecture consists of seven major components, partitioned into two parts.

Frontend Parsing System
Backend Engine

The frontend compiles each SQL statement and the backend executes the compiled statement.

The Frontend:

As I mentioned the frontend pre-processes SQL statements as well as SQLite commands that are then sent as inputs. It parses the statements (and commands), optimizes them, and as I mentioned before, generates equivalent SQLite internal bytecode so that the backend can execute.

Tokenizer: Is responsible for splitting the inputed SQL statements into separate tokens, by doing a lexical analysis on the input.
Parser: Is responsible for analyzing the structure of the SQL statement by analyzing the tokens produced by the Tokenizer, and generates a parse tree from the tokens.
Code Generator: It traverses the parse tree and generates the already mentioned so many times bytecode, that when executed by the backend, should be able to produce the result from SQL statement.

The Backend:

The backend is the engine that takes the bytecode generated by the frontend and executes it. This engine does actual thedatabase processing work and it is composed of 4 modules: Virtual Machine, B-Tree, Pager and Operating System Interface.

Virtual Machine: The virtual machine takes the bytecode generated by the frontend and it executes. This is the ultimate manipulator of data in the database. It can see a database as a collection of tables and indexes that are actually stored as a data structure called B-Tree. The VM is essentially a big switch statement on the type of bytecode instruction. Or is it? We will find out.
B-Tree: Is responsible for actually organizing the data into an ordered tree data structure. Each table and indexes get their own B-Tree's. The use of this structure helps the VM to search, insert, delete and update the data into trees. It also helps the VM to create or delete new tree if needed.
Pager: The B-Tree module requests information from the disk in fixed-size pages. The default page_size is 4096 bytes but can be any power of two between 512 and 65536 bytes. The Pager Module is responsible for reading, writing, and caching these pages. The page cache also provides the rollback and atomic commit abstraction and takes care of locking of the database file, implementing transactional ACID properties. The B-Tree driver requests particular pages from the Pager and notifies the Pager when it wants to modify pages or commit or rollback changes. The Pager handles all the messy details of making sure the requests are handled quickly, safely, and efficiently. Definitely a challenge this one.
Operating System Interface: This module is responsible for providing a uniform interface to different operating systems. It implements routines for file I/O, thread mutex, sleep, time, random number generation, etc. For the purposes of this project, I am going to support different platforms. At least not at first. But PR's are more than welcome.

A journey of a thousand miles begins with a single step.

I don't really know who said that, but it is true. So let's start with something simple. Let start by making a simple CLI executable with a REPL program that responds to basic commands and can exit gracefully.

Making a Simple CLI with a REPL in Rust

Well, since we are modeling after SQLite, we might as well try to mirror what SQLite looks like, to make it easier on users. When you run sqlite3 , SQLite starts a read-execute-print loop, aka REPL.

$ sqlite3  
SQLite version 3.30.0 2019-10-04 15:03:17  
Enter ".help" for usage hints.  
Connected to a transient in-memory database.  
Use ".open FILENAME" to reopen on a persistent database.  
sqlite> .databases  
main:  
sqlite> .blah  
Error: unknown command or invalid arguments:  "blah". Enter ".help" for help  
sqlite> .exit

In order to do that I had to make some design choices here. Since the focus of this project is to study and build a database, I decided to let most of my efforts focus on that, meaning I do not want to spend most of my time re-inventing the wheel and writing CLI interpreters or REPL logic for example. So for those I decided to make use of already developed and somewhat mature third party libraries to get it done. And who knows, maybe in the future if I have some spare time and I realize that the use of those third party libraries are really impacting the overall performance of the application, I can always come back and replace them.

The REPL logic is pretty straight forward, basically we will need an infinite loop that prints a prompt, gets an input line, validates and then processes that line. I decided to go with the crate rustyline, which is already pretty mature, memory efficient and already solved a lot of the issues we would have to deal with, even from the user experience side, for example, providing hints and auto-completion in real time, which is a great feature.

So before I jump into my code, which you can already find it on Github by the way, I am going to quickly explain, using a quick code snippet and some comments, how Rustyline works with a simple example.

First we need to add the dependency in your cargo.yaml file:

[dependencies]  
rustyline = "7.1.0"

And in your main.rs you can add this (maybe without my comments):

use rustyline::error::ReadlineError;
use rustyline::Editor;

fn main() {
    // This line creates an Editor with the default configuration options.
    let mut repl = Editor::<()>::new();
    // This if statement loads a file with the history of commands
    // If the file does not exists, it creates one.
    if repl.load_history("history.txt").is_err() {
        println!("No previous history.");
    }
    // This is our infinite loop. We will be here until the user terminates the program.
    loop {
        // This line asks the user to input a command. You can add whatever you want in here as a prefix.
        let readline = repl.readline(">> ");

        // The readline method returns an Result. Which we now use a match statement to filter the result.
        match readline {
            Ok(line) => {
                repl.add_history_entry(line.as_str());
                println!("Line: {}", line);
            },
            Err(ReadlineError::Interrupted) => {
                println!("CTRL-C");
                break
            },
            Err(ReadlineError::Eof) => {
                println!("CTRL-D");
                break
            },
            Err(err) => {
                println!("Error: {:?}", err);
                break
            }
        }
    }
    // Here we are saving the commands into the file. Until now they are stored in memory.
    repl.save_history("history.txt").unwrap();
}

And that is it. With that you would have a basic REPL program up and running. I hope my comments made it clear for you. If not, feel free to message me and ask.

Now let's get back to our Rust-SQLite program!

I assume that if you are trying to follow this and writing some code along with me you can manage to create an empty Rust project on your own. Just to be clear this is what I did to start: cargo new rust_sqlite --bin . But again, you can find all the code on Github.

You will notice that my implementation of rustyline is a bit different, but that is because I took advantage of the features like hints and auto-completion to improve the user experience from the start. But let's go.

First thing you need to do is to add the dependency to your Cargo.toml file.

[dependencies]
rustyline = "7.1.0"
log = "0.4.14"
env_logger = "0.8.3"
rustyline-derive = "0.4.0"
clap = "2.33.3"

Well, you may have noticed that I have a couple of extra dependencies there then you might have been expecting. Yes, that is true. Remember I said I was going to take advantage of some features that rustyline has? So yeah, that is why we have the rustyline-derive there, that crate is actually just some procedural macros that help us to implement the traits necessary to add things like validation, hints and completion to our REPL program with rustyline. In fact, rustyline-derive even lives within the rustyline repository on Github.

The clap crate is a command line parser also pretty mature and that takes care of a lot of the boilerplate code we would have to write to create our own CLI parser. So I decided to go with that. For now, is not actually doing much, and only showing some help information if we type for example rust-sqlite --help . But for now that is all we need.

The other two dependencies log and env_logger are there to make logging easier throughout our application, like for example, manage different log levels for us.

This is the state of our main.rs so far. I tried to add as many comments as possible but I'll still go through some lines and explain.

extern crate clap;
mod repl;

use repl::{REPLHelper, get_config};

use rustyline::error::ReadlineError;
use rustyline::{Editor};

use clap::{App, crate_version};

fn main() -> rustyline::Result<()> {
    env_logger::init();

    let _matches = App::new("Rust-SQLite")
                          .version("0.0.1")
                          .author("João Henrique Machado Silva <joaoh82@gmail.com>")
                          .about("Light version of SQLite developed with Rust")
                          .get_matches();

    // Starting Rustyline with a default configuration
    let config = get_config();

    // Getting a new Rustyline Helper
    let helper = REPLHelper::new();

    // Initiatlizing Rustyline Editor with set config and setting helper
    let mut repl = Editor::with_config(config);
    repl.set_helper(Some(helper));

    // This method loads history file into memory
    // If it doesn't exist, creates one
    // TODO: Check history file size and if too big, clean it.
    if repl.load_history("history").is_err() {
        println!("No previous history.");
    }
    // Counter is set to improve user experience and show user how many 
    // commands he has ran.
    let mut count = 1;
    loop {
        if count == 1 {
            // Friendly intro message for the user
            println!("{}{}{}{}{}",
            format!("Rust-SQLite - {}\n", crate_version!()),
            "Enter .exit to quit.\n",
            "Enter .help for usage hints.\n",
            "Connected to a transient in-memory database.\n",
            "Use '.open FILENAME' to reopen on a persistent database.");
            //TODO: Get info about application name and version dinamically.
        }

        let p = format!("rust-sqlite | {}> ", count);
        repl.helper_mut()
            .expect("No helper found")
            .colored_prompt = format!("\x1b[1;32m{}\x1b[0m", p);
        // Source for ANSI Color information: http://www.perpetualpc.net/6429_colors.html#color_list
        // http://bixense.com/clicolors/

        let readline = repl.readline(&p);
        match readline {
            Ok(command) => {
                repl.add_history_entry(command.as_str());
                // println!("Command: {}", line);
                if command.eq(".exit") {
                    break;
                }else{
                    println!("Error: unknown command or invalid arguments: '{}'. Enter '.help'", &command);
                }
            }
            Err(ReadlineError::Interrupted) => {
                break;
            }
            Err(ReadlineError::Eof) => {
                break;
            }
            Err(err) => {
                println!("Error: {:?}", err);
                break;
            }
        }
        count += 1;
    }
    repl.append_history("history").unwrap();

    Ok(())
}

For sake of keeping the code clean and readable, I like to keep my main file as clean as possible. And one thing you may have noticed is that I have a mod repl; being declared on line 2 . That is because I moved most of the business logic related to our REPL part of the program to a different module. This way our main.rs can remain clean and easy to read.

So you notice that on line 4 I am importing a struct and a function

use repl::{REPLHelper, get\_config};

that later on line 24 and line 27 I am using them to construct our helper that will be used on line 30 to construct our Editor .

The helper struct on rustyline is responsible to implementing all the behaviors we would like to have on our REPL program, for example we are making use of the Validator trait that can validate the input as the user types in real time. We are also making use of the Hinter trait, responsible for providing hints while the user is typing commands.

Maybe I got a bit carried away with those, but I think they will give a better feel to the user when using the program. Plus, sqlite3 has them, so why won't we?!

Like I mentioned all the code related to the implementation and behavior of the rustyline REPL is a separate module called repl . You can find the code below, which is pretty well commented and self explanatory I guess. To actually explain this code I would have to get into the works of Rust which not exactly the point of this post, but feel free to ask or comment anything.

use std::borrow::Cow::{self, Borrowed, Owned};

use rustyline_derive::{Helper, Completer};
use rustyline::error::ReadlineError;
use rustyline::config::OutputStreamType;
use rustyline::{CompletionType, Config, Context, EditMode};
use rustyline::validate::{MatchingBracketValidator, Validator};
use rustyline::validate::{ValidationContext, ValidationResult};
use rustyline::hint::{Hinter, HistoryHinter};
use rustyline::highlight::{Highlighter, MatchingBracketHighlighter};

// REPL Helper Struct with all functionalities
#[derive(Helper, Completer)]
pub struct REPLHelper {
    pub validator: MatchingBracketValidator,
    pub colored_prompt: String,
    pub hinter: HistoryHinter,
    pub highlighter: MatchingBracketHighlighter,
}

impl REPLHelper {
    // Default constructor
    pub fn new() -> Self {
        REPLHelper {
            // completer: FilenameCompleter::new(),
            highlighter: MatchingBracketHighlighter::new(),
            hinter: HistoryHinter {},
            colored_prompt: "".to_owned(),
            validator: MatchingBracketValidator::new(),
        }
    }
}

// Implementing trait responsible for providing hints
impl Hinter for REPLHelper {
    type Hint = String;

    // Takes the currently edited line with the cursor position and returns the string that should be 
    // displayed or None if no hint is available for the text the user currently typed
    fn hint(&self, line: &str, pos: usize, ctx: &Context<'_>) -> Option<String> {
        self.hinter.hint(line, pos, ctx)
    }
}

// Implementing trait responsible for determining whether the current input buffer is valid.
// Rustyline uses the method provided by this trait to decide whether hitting the enter key 
// will end the current editing session and return the current line buffer to the caller of 
// Editor::readline or variants.
impl Validator for REPLHelper {

    // Takes the currently edited input and returns a ValidationResult indicating whether it 
    // is valid or not along with an option message to display about the result.
    fn validate(&self, ctx: &mut ValidationContext) -> Result<ValidationResult, ReadlineError> {
        use ValidationResult::{Incomplete, /*Invalid,*/ Valid};
        let input = ctx.input();
        // let result = if !input.starts_with("SELECT") {
        //     Invalid(Some(" --< Expect: SELECT stmt".to_owned()))
        // } else 
        let result = if input.eq(".exit") {
            Valid(None)
        } else if !input.ends_with(';') {
            Incomplete
        } else {
            Valid(None)
        };
        Ok(result)
    }

    // Configure whether validation is performed while typing or only when user presses the Enter key.
    fn validate_while_typing(&self) -> bool {
        self.validator.validate_while_typing()
    }
}

// Implementing syntax highlighter with ANSI color.
impl Highlighter for REPLHelper {
    // Takes the prompt and returns the highlighted version (with ANSI color).
    fn highlight_prompt<'b, 's: 'b, 'p: 'b>(&'s self, prompt: &'p str, default: bool,) -> Cow<'b, str> {
        if default {
            Borrowed(&self.colored_prompt)
        } else {
            Borrowed(prompt)
        }
    }

    // Takes the hint and returns the highlighted version (with ANSI color).
    fn highlight_hint<'h>(&self, hint: &'h str) -> Cow<'h, str> {
        Owned("\x1b[1m".to_owned() + hint + "\x1b[m")
    }

    // Takes the currently edited line with the cursor position and returns the highlighted version (with ANSI color).
    fn highlight<'l>(&self, line: &'l str, pos: usize) -> Cow<'l, str> {
        self.highlighter.highlight(line, pos)
    }

    // Tells if line needs to be highlighted when a specific char is typed or when cursor is moved under a specific char.
    // Used to optimize refresh when a character is inserted or the cursor is moved.
    fn highlight_char(&self, line: &str, pos: usize) -> bool {
        self.highlighter.highlight_char(line, pos)
    }
}

// Returns a Config::builder with basic Editor configuration
pub fn get_config() -> Config {
    Config::builder()
        .history_ignore_space(true)
        .completion_type(CompletionType::List)
        .edit_mode(EditMode::Emacs)
        .output_stream(OutputStreamType::Stdout)
        .build()
}

Let's give it a try!!

First the CLI part.

$ ./rust\_sqlite --help  
Rust-SQLite 0.1.0  
João Henrique Machado Silva <[joaoh82@gmail.com](mailto:joaoh82@gmail.com)\>  
Light version of SQLite developed with RustUSAGE:  
    rust\_sqliteFLAGS:  
    -h, --help       Prints help information  
    -V, --version    Prints version information

Now our REPL part.

$ ./rust_sqlite
Rust-SQLite - 0.1.0
Enter .exit to quit.
Enter .help for usage hints.
Connected to a transient in-memory database.
Use '.open FILENAME' to reopen on a persistent database.
rust-sqlite | 1> random command;
Error: unknown command or invalid arguments: 'random command;'. Enter '.help'
rust-sqlite | 2> .exit
$

Alright, we set out to have a working simple CLI application with a working REPL that responds to simple commands and exits gracefully and we made it!

Next we see if we can make our app even better by start developing our command language and working with actual statements.

View on Github (pull requests are more then welcome)

If you wanna follow this track don’t forget to follow me here on Dev.to and also give some love!

What would SQLite look like if written in Rust? — Part 0

João Henrique Machado Silva — Tue, 16 Feb 2021 08:28:47 +0000

Writing a SQLite clone from scratch in Rust

Part 1 — Understanding SQLite and Setting up CLI Application and REPL

Part 2 — SQL Statement and Meta Commands Parser + Error Handling

Let's get the obvious out of the way, why am I doing this? Do I want to develop the next great embedded DB of the 21st century? Not really.

What I cannot create, I do not understand. — Richard Feynman

I am really doing it in order to understand better the works of it, and my goal is to document everything along the way. Wish me luck!

So how does a database work?
What format is data saved in? In our case, in memory and on disk.
When does it mode from memory to disk? How would that work exactly?
Why we only have one primary key per table? Do you know?
How does rolling back a transaction work?
How are indexes formatted? And how do they work?
When and how does a full table scan happen? Does it happen every time we issue a query? Do the indexes help? How?
What format is a prepared statement saved in?
What extra features can we add to our DB, if any?

Well, these are some of the questions I aim to answer along the way. And for that I will be relying heavily on the SQLite Architecture Documentation (https://www.sqlite.org/arch.html) and also here (https://www.sqlite.org/zipvfs/doc/trunk/www/howitworks.wiki).

View on Github (pull requests are welcome)

And hey, I would be lying if I said I got the idea out of the blue, no I didn't. I got the inspiration from this other great developer called Connor, that basically did the same thing, but in C, right here (https://cstack.github.io/db_tutorial/), if you wanna check it out!

If you wanna follow this track don't forget to follow me here on Dev.to and also give some love!

Continue to Part 1 →

Cheers!