loading...

Filesystem versus SQLite performance

yujiri8 profile image Ryan Westlund ・2 min read

I hate SQL. That's a known fact. A while back I was reading about werc, a "sane web anti-framework", and how its comment plugin stores comments in files and directories instead of in a database. Crazy, right? I thought it was crazy. But at the same time I thought it was the most romantic thing I could imagine - not needing an SQL database would be heaven on Earth!

The biggest drawback, I figured, would be performance. Surely the filesystem would be too slow compared to an SQL database. It would make applications impractical. But I've learned not to make judgements like that without testing. So I did.

I wrote a test that generates 100k random records with an integer primary key and two fields that are just string tokens. Insert them all into an SQLite database and into a directory hierarchy, and then search them for all for a small string that I knew would be in a few of them. I made sure to use the same records for the DB and the filesystem.

import sqlite3
import secrets, os, datetime

db = sqlite3.connect("test.db")

def create():
    with db:
        db.execute("CREATE TABLE things (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL, val TEXT)")

def populate():
    print("BULK INSERT")
    data = [
        (secrets.token_urlsafe(), secrets.token_urlsafe())
        for i in range(3)
    ]
    before = datetime.datetime.now()
    with db:
        db.executemany("INSERT INTO things (name, val) VALUES (?, ?)", data)
    print(f"db took {datetime.datetime.now() - before}")
    before = datetime.datetime.now()
    for i, thing in enumerate(data):
        os.mkdir('fs/'+str(i))
        with open(f'fs/{i}/name', 'w') as f:
            f.write(thing[0])
        with open(f'fs/{i}/val', 'w') as f:
            f.write(thing[1])
    print(f"fs took {datetime.datetime.now() - before}")

def query():
    query = 'abc'
    print(f"QUERY FOR {query}")
    before = datetime.datetime.now()
    with db:
        for result in db.execute("SELECT id FROM things WHERE name LIKE ? OR val LIKE ?", (f'%{query}%', f'%{query}%')):
            print(f"found result: {result}")
    print(f"db took {datetime.datetime.now() - before}")
    before = datetime.datetime.now()
    for thing in os.listdir('fs'):
        with open(f"fs/{thing}/name") as namef, open(f"fs/{thing}/val") as valf:
            if query in namef.read() or query in valf.read():
                print(f"found result: {thing}")
    print(f"fs took {datetime.datetime.now() - before}")

create()
populate()
query()
BULK INSERT
db took 0:00:00.177113
fs took 0:00:08.010363
QUERY FOR abc
...
db took 0:00:00.023509
...
fs took 0:00:02.868744

Sadly, I was right. The filesystem is tremendously slower than an SQL database.

I did consider that Python's slowness was making a big difference for the FS but not for the DB, since the SQLite library is a C extension. Just incase, I tried using Crystal for the FS query and it was about the same.

Dir.new("fs").children.each do |thing|
  if File.read("fs/#{thing}/name").includes?("abc") || File.read("fs/#{thing}/val").includes?("abc")
    puts thing
  end
end

I wonder if the truth is that werc is not reasonable for a website that needs to do a lot of searching, or that Plan 9's filesystem makes it reasonable.

Discussion

pic
Editor guide
Collapse
aleksandrhovhannisyan profile image
Aleksandr Hovhannisyan

Yup, file-based database systems are sort of a regression. They actually predate SQL and had many problems, not the least of which was speed. There are also the issues of:

  • Data redundancy and corruption (e.g., simultaneous read/write ops without proper locking mechanisms). This is enforced by RDBMSes and SQL engines.
  • Usability. With SQL, all you need is a library for your language of choice (e.g., Python) that interfaces with a particular vendor (e.g., Postgres). Plus you have a standard language for CRUD operations, and it has a sound mathematical basis in relational algebra.
  • Modern RDBMSes are much faster than file systems. They enforce keys for faster data lookups and use more efficient tree-based structures under the hood. Files are flat/plaintext.
  • Portability. File systems vary from one OS to another. SQL doesn't.
Collapse
patarapolw profile image
Pacharapol Withayasakpunt

I just have to re-question,

  • Should be binaries, or blobs, not just text files, inside a database?
  • Why can't file systems be modeled after a database structure, like B+tree?
  • How can file systems be different from key-value stores?
Collapse
jrbrtsn profile image
John Robertson

If you really hate SQL, and I do too, you can use Berkely DB. This library provides basic key-value storage and retrieval. It is very mature, robust, and efficient.
If you require transactional integrity, about the only practical approach is using an SQL database engine which supports transactions.
I've had the best luck setting up SQL tables to resemble C structs, and then writing a class for each table. I wish I had a better idea than SQL.

Collapse
hanpari profile image
Pavel Morava

SQL is fine.

I guess the filesystem database you mentioned is simply dedicated to projects with little traffic, similarly to SQLite in a sense.

Collapse
misobelica profile image
Mišo

I was always curious if this is true sqlite.org/fasterthanfs.html and was lazy to do a quick experiment. Now I believe it a little more 🙂👍