
Grant


Complex Filtering in Laravel Scout with Typesense

tl;dr Passing an array as the second parameter in the query builder's where function allows you to use different comparison operators for your Typesense filters as [$operator, $value].

Laravel Scout is the first-party package for integrating full-text search engines such as Algolia, Meilisearch, or Typesense. Since it's a plug-and-play solution, its query builder API has to be one-size-fits-all across every driver.

In this post I'll talk about how I added support for more complex filtering for Typesense that was undocumented in Scout using the built-in where query builder helper.

I'll be talking in the context of Laravel Scout with Typesense only. These principles or ideas won't necessarily apply to the other search tools.

You probably don't need Scout

First, a bit of a warning.

In my experience, you probably don't actually need Scout. Scout adds technical burden (read: debt) in a few ways, and unless you absolutely need full-text searching capabilities, with built-in typo support or other unique features, a simple query will do the trick more often than not.

  • Configuration takes time. Setting up the fields, applying the configuration to each model, and ensuring the right data types can be time-consuming. There are some caveats; for example, you cannot use null for a string field.
  • You have to maintain the schema for the collection in Typesense. This is a deployment burden if/when the schema for your model changes.
    • If it changes often enough, then you'll be doing a lot of scout:flush and scout:import calls. If you're not careful, exceptions will be thrown before you're able to update it in production.
  • Keeping the collection up-to-date is sometimes tricky. Scout does its best to keep the data in Typesense current, but getting out of sync is still easy.

When you need it

When you do actually need the capabilities of Typesense, then it's absolutely powerful. If you can overcome the annoyances I've described above and are certain you need to leverage Typesense, you have a lot of power at your fingertips.

Reading all the search options available in Typesense, it's easy to see they do all the heavy lifting for filtering and searching collections. It's fast and efficient, as advertised.

There are things to keep in mind in a Laravel app with Scout, though.

  • You have to really understand the Scout query builder. Again, it's a unified query builder API for each vendor. This means it's simplified compared to what each vendor actually offers.
  • Laravel is searching in Typesense, then converting that back to your model instances using your own database. You have to have a good mental model of the Typesense collection schema and the one that it actually references.
  • Relationships aren't a thing in Typesense. You have to be creative in the way you build your Typesense collection schema so that you can filter based on relationships.

The Setup

grantholle / scout-typesense-example

An example of using Typesense with Scout using undocumented filtering

Laravel Scout with Typesense

  1. Clone the repo.
  2. Run composer install.
  3. Copy .env.example to .env.
  4. Set your Typesense server URL and API key in the .env file.
  5. php artisan key:generate
  6. php artisan migrate --seed
  7. php artisan scout:import "App\Models\Post"



I've created a sample repo that shows how this works.

It's a simple configuration with a Post model that has many Comment relationships. The Scout configuration for Typesense looks like this:

\App\Models\Post::class => [
    'collection-schema' => [
        'fields' => [
            [
                'name' => 'id',
                'type' => 'string',
            ],
            [
                'name' => 'title',
                'type' => 'string',
            ],
            [
                'name' => 'content',
                'type' => 'string',
            ],
            [
                'name' => 'comments',
                'type' => 'string[]',
            ],
            [
                'name' => 'created_at',
                'type' => 'int64',
            ],
        ],
        'default_sorting_field' => 'created_at',
    ],
    'search-parameters' => [
        'query_by' => 'title,content,comments',
    ],
],

When we call Post::search(), by default we want to search across the title, content and comment contents. We could adjust this if we wanted to, but this lets us leverage Typesense for something that is harder to do with a typical Eloquent query. Again, the use cases for reaching for Scout are less common than you might think. A blog search is a good example, though.

Here's how we're indexing our actual data:

public function makeSearchableUsing(Collection $models): Collection
{
    return $models->load('comments');
}

public function toSearchableArray(): array
{
    return [
        'id' => (string) $this->id,
        'title' => $this->title,
        'content' => $this->content,
        'comments' => $this->comments->pluck('content')->toArray(),
        'created_at' => $this->created_at->timestamp,
    ];
}

First, we're making indexing a bit more efficient by eager loading comments in makeSearchableUsing(), which avoids a separate comments query for every post being imported. Then we're populating the fields we configured in scout.php.

We want to search by comments as well, so we're creating an array of each comment's content, which is supported by the field's string[] type. Now Typesense knows about the comments and can search on them. Note: since we're indexing the Post model, it's not going to reindex when a comment changes. We'll have to re-trigger indexing manually on the comment's post whenever its comments change.

One caveat I'll point out here: if title was nullable in our database, we'd want to update toSearchableArray() in the title field to be 'title' => $this->title ?? '',. If we had null as a title value, indexing would throw an error: "Error importing document: Field title must be a string."

Searching

When we visit the endpoint, there's a simple controller to perform the searches for us.

public function __invoke(Request $request)
{
    $posts = Post::search($request->input('q', '*'))
        ->when(is_numeric($request->input('created')), function (Builder $query) use ($request) {
            $query->where('created_at', ['>=', now()->subDays($request->integer('created'))->timestamp]);
        })
        ->when(! empty($request->input('exclude')), function (Builder $query) use ($request) {
            $query->whereNotIn('id', explode(',', $request->input('exclude')));
        })
        ->when($request->input('not_title'), function (Builder $query, string $title) {
            $query->where('title', ['!', $title]);
        })
        ->get();

    return $posts->load('comments');
}

Here's the breakdown of the search options:

  • Passing a q query string variable is the whole point of Scout. This performs the "search" in Typesense, leveraging all of its magic. It searches across the fields we've set in query_by, it's typo-tolerant, etc.
  • Sending a created value will search for posts that have been created within the last X days. More on this later.
  • We can exclude post IDs by passing them as a comma-separated list in exclude.
  • Lastly, with not_title we can exclude posts whose titles contain the given value.

So, what's special?

Behind the scenes, Scout is using the query builder to retrieve results from Typesense and taking those results to subsequently query the actual database to retrieve the model values.

In this example, we could have just leveraged Scout for its searching capabilities and called it a day. However, that's not really a real-world example. Usually we're going to have multiple filtering options like I've implemented.

Since Scout is a one-size-fits-all and each driver will implement filtering differently, it's basic by nature out of the box. As the docs state:

Since a search index is not a relational database, more advanced "where" clauses are not currently supported.

We can hook into the query that's used to retrieve our models after performing the search in Typesense using the query hook. There's also a catch there:

Since this callback is invoked after the relevant models have already been retrieved from your application's search engine, the query method should not be used for "filtering" results. Instead, you should use Scout where clauses.

Here's an example. Let's say we're just performing our search and paginating those results with a page size of 5. If we use the query hook to filter further, we're only going to filter the 5 results that were returned. Each page would have an inconsistent number of results, or even no results if the filter doesn't match any of those 5.
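The effect is easy to reproduce without Scout at all. The following is a minimal plain-PHP sketch, not Scout code: the post data, the `published` flag, and both function names are made up for illustration. It compares filtering each page after it has been cut with filtering before paginating.

```php
<?php

// Hypothetical data: 12 posts, of which only IDs 4, 11 and 12 match our filter.
$posts = [];
for ($id = 1; $id <= 12; $id++) {
    $posts[] = ['id' => $id, 'published' => in_array($id, [4, 11, 12], true)];
}

// Filter AFTER paging: pages are cut first, then each page is filtered
// (this mimics filtering inside the query hook).
function pageSizesWhenFilteringAfter(array $posts, int $perPage): array
{
    $sizes = [];
    foreach (array_chunk($posts, $perPage) as $page) {
        $sizes[] = count(array_filter($page, fn (array $post) => $post['published']));
    }

    return $sizes;
}

// Filter BEFORE paging: the filter runs first, then the matches are paginated
// (this mimics letting the search engine apply the filter).
function pageSizesWhenFilteringBefore(array $posts, int $perPage): array
{
    $matching = array_filter($posts, fn (array $post) => $post['published']);

    return array_map('count', array_chunk($matching, $perPage));
}

print_r(pageSizesWhenFilteringAfter($posts, 5));  // page sizes: 1, 0, 2
print_r(pageSizesWhenFilteringBefore($posts, 5)); // page sizes: 3
```

Filtering after paging leaves the second page empty even though matching posts exist further along, which is exactly why the filter needs to happen in Typesense.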

Scout supports simple where clauses out of the box, but they're always a strict equality comparison. When the value is static, like "give me posts with a given tag ID" (tag = 1), it's straightforward. But what about other types of comparisons?

Spoiler: Typesense allows us to configure the comparison operators.

Behind the scenes

In our example, we're retrieving posts that were created within the last X days (for example, created_at >= 10 days ago). By default, Scout doesn't support this. The where method looks like this:

public function where($field, $value)
{
    $this->wheres[$field] = $value;

    return $this;
}

It only accepts a field key and its value. Unlike Eloquent's query builder, which accepts an $operator parameter, Scout's is simplified.
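One consequence of that method body is worth noticing: wheres is keyed by field name, so a second where() on the same field replaces the first. The sketch below uses a hypothetical FakeBuilder class that mirrors only the where() body shown above; it is not Scout's real builder, and the timestamps are arbitrary.

```php
<?php

// Hypothetical stand-in that copies only the where() body shown above.
final class FakeBuilder
{
    /** @var array<string, mixed> */
    public array $wheres = [];

    public function where(string $field, mixed $value): static
    {
        // Keyed by field name, so a later call on the same field overwrites the earlier one.
        $this->wheres[$field] = $value;

        return $this;
    }
}

$builder = (new FakeBuilder())
    ->where('created_at', ['>=', 1700000000])
    ->where('created_at', ['<=', 1710000000]);

print_r($builder->wheres); // only the '<=' clause for created_at remains
```

If that holds for the version of Scout you're running, two bounds on the same field can't both be expressed through where() alone.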

The good news is, since each engine (Algolia, Typesense, Meilisearch) implements the filtering themselves, we can dive into Typesense's engine to see what's possible.

The beginning of the hunt for the functionality comes from buildSearchParameters(). The key we're after for filtering is the filter_by key, which is set by the filters() function, which ultimately builds the filter clauses in parseWhereFilter():

protected function parseWhereFilter(array|string $value, string $key): string
{
    return is_array($value)
        ? sprintf('%s:%s', $key, implode('', $value))
        : sprintf('%s:=%s', $key, $value);
}

If the $value we pass in is an array, it's constructing the comparison using the two parameters. Otherwise, it's a strict "exactly equals" filter (:=).
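To get a feel for the strings this produces, here is a standalone copy of the same branching logic that can be run outside of Laravel. The int type in the signature is my addition so plain numbers can be passed directly, and the views field is hypothetical.

```php
<?php

// Standalone copy of the branching logic shown above, for experimenting outside Laravel.
// Accepting int is an addition for convenience; the engine's signature is array|string.
function parseWhereFilter(array|string|int $value, string $key): string
{
    return is_array($value)
        ? sprintf('%s:%s', $key, implode('', $value))   // [$operator, $value] glued together
        : sprintf('%s:=%s', $key, $value);              // plain value: exact match
}

echo parseWhereFilter(1, 'tag'), PHP_EOL;                         // tag:=1
echo parseWhereFilter(['>=', 1700000000], 'created_at'), PHP_EOL; // created_at:>=1700000000
echo parseWhereFilter(['<', 100], 'views'), PHP_EOL;              // views:<100
```

Because the array is simply imploded, whatever goes in the first slot ends up directly after the colon. It's up to us to pass an operator that Typesense actually understands.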

This is what explains our filter of created_at in the controller:

$query->where('created_at', ['>=', now()->subDays($request->integer('created'))->timestamp]);

We can do numeric filtering now (greater than, less than, etc.) by passing an array as the value.

$query->where($field, [$comparison, $value]);

Now we can retrieve consistent results for pagination without relying on Eloquent to filter further for us, which would give inconsistent page sizes. The same goes for the not_title query variable. We can use the :! Typesense operator (not exact contains) to exclude posts with a title that contains a given value.
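Putting it together, here's a rough sketch of the filter_by string the controller's three filters could end up producing. It rests on two assumptions I haven't shown from the engine source: that the clauses are joined with Typesense's && operator, and that whereNotIn is rendered with the :!=[...] list syntax. All function names here are hypothetical, and the values are made up.

```php
<?php

// Same branching as parseWhereFilter() in the engine, copied so this runs on its own.
function whereClause(string $field, array|string|int $value): string
{
    return is_array($value)
        ? sprintf('%s:%s', $field, implode('', $value))
        : sprintf('%s:=%s', $field, $value);
}

// Assumption: exclusion lists are rendered with Typesense's `:!=[a, b]` syntax.
function whereNotInClause(string $field, array $values): string
{
    return sprintf('%s:!=[%s]', $field, implode(', ', $values));
}

// Assumption: individual clauses are combined with Typesense's `&&` (AND) operator.
function buildFilterBy(array $wheres, array $whereNotIns): string
{
    $clauses = [];

    foreach ($wheres as $field => $value) {
        $clauses[] = whereClause($field, $value);
    }

    foreach ($whereNotIns as $field => $values) {
        $clauses[] = whereNotInClause($field, $values);
    }

    return implode(' && ', $clauses);
}

echo buildFilterBy(
    ['created_at' => ['>=', 1700000000], 'title' => ['!', 'Draft']],
    ['id' => ['3', '7']],
), PHP_EOL;
// created_at:>=1700000000 && title:!Draft && id:!=[3, 7]
```

Printing a string like this while debugging makes it much easier to compare what Scout sends against Typesense's filter_by documentation.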

Conclusion

Thankfully, the Scout Typesense engine was built to be flexible enough to support these filters out of the box. This allows us to leverage Typesense to its full potential. By passing a tuple of [$operator, $value], we can use Typesense's built-in filtering much more simply, giving us an excellent all-in-one search/filter solution for our data.

Extra credit: the Typesense engine also allows us to define a typesenseSearchParameters() function on the searchable model to include or overwrite other parameters that Typesense supports when searching.
