DEV Community

Discussion on: 5 ways to implement case-insensitive search in SQLite with full Unicode support

Collapse
 
mindplay profile image
Rasmus Schultz

Having spent more than a day on this, I came up with a 6th way πŸ˜…

  • Install sqlpkg
  • Run sqlpkg init (only once in a new project)
  • Run sqlpkg install nalgeon/text

Load the extension from .sqlpkg/nalgeon/text/text.so, and start using COLLATE TEXT_NOCASE in your SQL. πŸ™‚πŸ‘

So just to explain:

webinstall.dev is this wonderful initiative to provide simple, cross-platform installers, without requiring a package manager. You can learn more in this 4-minute video.

sqlpkg (sqlpkg.org) is a package manager for SQLite extensions. Note that this gets installed for the user, meaning one shared install for your projects.

The sqlpkg init command creates a .sqlpkg folder for installed SQLite extensions in your project folder - so make sure you run it from your project root. (This is optional - sqlpkg will install to your user folder by default, but you most likely want to control the installed extensions per project, and this makes it easier to locate the installed extension in code.)

The sqlpkg install nalgeon/text command installs the nalgeon/text extension, which includes a text_nocase collation, which supports Unicode.

I don't know if this is better or worse than the official icu extension from the SQLite team, but the docs for the official extension includes a list of disconcerting known bugs and caveats - and it's also not listed in the sqlpkg registry. this extension avoids replacing the like operator, or any standard functions, and instead provides a text_like function, text_upper and text_lower etc. so as not to conflict with SQL queries that rely on the standard behavior. this sounds like an overall better approach, and the developer is apparently an experienced author of many SQLite extensions.

I don't use Python, but I tested this with Deno and jsr:@db/sqlite, and it seems to work very nicely - for example, run SELECT ${"cΓ³mo estΓ‘s"} = ${"CΓ“MO ESTÁS"} COLLATE TEXT_NOCASE and you should get 1, meaning the Unicode comparison works. ✨

@shallowdepth I would love to hear your analysis of this approach? I would say we're good on simplicity, at least, and I'd assume the performance is good - based on those criteria, I think this may be the best solution I've seen so far? It does require a new type of dependency, but it is very easy to install, and doesn't require compilation. What do you think? πŸ™‚