Sergey Nikolaev

Posted on May 31, 2024 • Edited on Nov 6, 2024 • Originally published at manticoresearch.com

Manticore Search 6.3.0

We're excited to announce the release of Manticore Search 6.3.0! This version brings a host of enhancements, new features, and updates, making your search engine even more powerful and user-friendly.

Vector Search

Float vector data type: We've introduced the float_vector data type, which allows you to store and query floating-point number arrays. This is particularly useful for applications that need to perform similarity searches using vector search.
Vector search capability: Coupled with the new data type, the vector search feature enables you to execute k-nearest neighbor (KNN) vector searches. This is ideal for building more intuitive and responsive search functionalities in apps. Read more in the blog post Vector Search in Manticore.
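
To make the idea concrete, here is a minimal brute-force sketch of what a KNN query computes: given a query vector, it returns the k stored vectors with the smallest distance to it. This is plain Python for illustration only, not Manticore's implementation or query syntax. The document IDs, the vectors, and the `knn` helper are hypothetical, and squared Euclidean distance is just one possible metric.

```python
import heapq

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn(docs, query, k):
    """Return the k (doc_id, distance) pairs closest to `query`.

    `docs` maps a document id to its float vector. Every vector is
    scanned, which is fine for a demo but does not scale to large tables.
    """
    scored = ((doc_id, squared_l2(vec, query)) for doc_id, vec in docs.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])

# Hypothetical 4-dimensional vectors, made up for this example.
docs = {
    1: [0.9, 0.1, 0.0, 0.3],
    2: [-0.1, 0.8, 0.1, -0.1],
    3: [0.8, 0.2, 0.1, 0.2],
}

nearest = knn(docs, query=[1.0, 0.0, 0.0, 0.3], k=2)
print(nearest)
```

A real engine would normally use an index structure instead of scanning every vector, but the result it approximates is the same ordered list of nearest neighbors.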

JOIN (beta)

JOIN support, although still in beta, is a significant enhancement to how you can query data and work with relationships between tables. Read more in the documentation.

Example:

SELECT * FROM purchases AS p LEFT JOIN articles AS a ON a.id = p.article_id;
+------+------------+-------------+------+-------+-------------+
| id   | article_id | customer_id | id   | title | @right_null |
+------+------------+-------------+------+-------+-------------+
|    1 |          1 |          10 |    1 | book  |           0 |
|    2 |          1 |          11 |    1 | book  |           0 |
|    3 |          3 |          10 |    0 |       |           1 |
+------+------------+-------------+------+-------+-------------+
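
The semantics of that result can be sketched in a few lines of Python. This is only an emulation of the LEFT JOIN logic, not how Manticore executes it. The contents of the articles table are an assumption inferred from the output above (a single article with id 1 and title "book"), and the `left_join` helper is hypothetical.

```python
purchases = [
    {"id": 1, "article_id": 1, "customer_id": 10},
    {"id": 2, "article_id": 1, "customer_id": 11},
    {"id": 3, "article_id": 3, "customer_id": 10},
]
# Assumed contents of the right-hand table, inferred from the result set.
articles = [{"id": 1, "title": "book"}]

def left_join(left_rows, right_rows):
    """Emulate `purchases AS p LEFT JOIN articles AS a ON a.id = p.article_id`."""
    by_id = {}
    for a in right_rows:
        by_id.setdefault(a["id"], []).append(a)

    result = []
    for p in left_rows:
        left_part = (p["id"], p["article_id"], p["customer_id"])
        matches = by_id.get(p["article_id"])
        if matches:
            for a in matches:
                # Match found: right-side columns are filled, flag is 0.
                result.append(left_part + (a["id"], a["title"], 0))
        else:
            # No match: the left row is kept, the right side gets
            # default values, and the last column flags the missing row.
            result.append(left_part + (0, "", 1))
    return result

rows = left_join(purchases, articles)
for row in rows:
    print(row)
```

The last element of each tuple plays the role of the @right_null column: it is 1 only for the purchase whose article_id has no matching article.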

REGEX

The new REGEX operator significantly improves how you can search for complex text patterns. This feature is especially important in areas that need very accurate search results, such as analyzing patents, reviewing contracts, and searching for trademarks.

For instance, in data analytics, the REGEX operator can help find specific error codes or programming patterns in log files or code. In academic research, it makes it easier to find articles that use certain citation styles. For trademark searches, this tool is excellent for spotting trademarks that are exactly the same or very similar. This enhancement makes Manticore Search much more powerful and precise for handling detailed and complex searches.
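
As a rough illustration of the log-file use case, the sketch below extracts error codes from a few lines with a regular expression. It uses Python's `re` module rather than Manticore's REGEX operator, so the pattern syntax may differ. The log lines and the "E plus four digits" code format are hypothetical.

```python
import re

# Hypothetical error-code format: "E" followed by exactly four digits.
ERROR_CODE = re.compile(r"\bE\d{4}\b")

log_lines = [
    "2024-05-31 10:00:01 worker-1 failed with E1042",
    "2024-05-31 10:00:02 worker-2 finished OK",
    "2024-05-31 10:00:03 worker-3 retry after E2001, then E1042",
]

def find_error_codes(lines):
    """Return every error code found, in order of appearance."""
    codes = []
    for line in lines:
        codes.extend(ERROR_CODE.findall(line))
    return codes

codes = find_error_codes(log_lines)
print(codes)
```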

RANGE() and HISTOGRAM()

The new RANGE function enhances aggregation, faceting, and grouping by categorizing values into specified intervals. These intervals are defined using range_from and range_to, which determine the boundaries within which values fall. This functionality allows for effective sorting and analysis of data based on user-defined ranges.

Example:

select * from test;
+---------------------+-----------+-------+
| id                  | data      | value |
+---------------------+-----------+-------+
| 8217240980223426563 | Product 1 |    12 |
| 8217240980223426564 | Product 2 |    15 |
| 8217240980223426565 | Product 3 |    23 |
| 8217240980223426566 | Product 4 |     3 |
+---------------------+-----------+-------+

SELECT COUNT(*), RANGE(value, {range_to=10},{range_from=10,range_to=25},{range_from=25}) price_range FROM test GROUP BY price_range ORDER BY price_range ASC;
+----------+-------------+
| count(*) | price_range |
+----------+-------------+
|        1 |           0 |
|        3 |           1 |
+----------+-------------+
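
The bucketing behind that result can be reproduced with a short Python sketch. It assumes that range_from is inclusive and range_to is exclusive, which is consistent with the output above but should be checked against the documentation. The `range_bucket` helper is hypothetical.

```python
from collections import Counter

def range_bucket(value, ranges):
    """Return the index of the first range containing `value`, or None.

    Each range is a dict with optional `range_from` and `range_to` keys.
    Assumption: `range_from` is inclusive and `range_to` is exclusive.
    """
    for index, r in enumerate(ranges):
        lower_ok = "range_from" not in r or value >= r["range_from"]
        upper_ok = "range_to" not in r or value < r["range_to"]
        if lower_ok and upper_ok:
            return index
    return None

# Same ranges and values as in the SQL example.
ranges = [{"range_to": 10}, {"range_from": 10, "range_to": 25}, {"range_from": 25}]
values = [12, 15, 23, 3]

counts = Counter(range_bucket(v, ranges) for v in values)
print(sorted(counts.items()))
```

Only the value 3 falls into the first range, while 12, 15, and 23 land in the second. No value reaches the third range, so it does not appear in the grouped result.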

The HISTOGRAM() function categorizes data into buckets of a fixed size. It returns the bucket key for each value, using the hist_interval and hist_offset parameters to determine the appropriate bucket. The key is derived from how far the value lies from the offset, rounded down to a whole number of intervals: with hist_interval=10 and no offset, as in the example below, the values 12 and 15 both get the key 10. This is especially useful for creating histograms, which group data into value ranges for easier analysis and visualization.

Example:

select count(*), histogram (value, {hist_interval=10}) as price_range from test GROUP BY price_range ORDER BY price_range ASC;
+----------+-------------+
| count(*) | price_range |
+----------+-------------+
|        1 |           0 |
|        2 |          10 |
|        1 |          20 |
+----------+-------------+
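
The keys in that result can be reproduced with a small Python sketch. The formula used here, rounding the offset-adjusted value down to a multiple of the interval, is an assumption that matches the output above when no offset is given. Consult the documentation for the exact definition, especially for non-zero hist_offset. The `histogram_key` helper is hypothetical.

```python
import math
from collections import Counter

def histogram_key(value, hist_interval, hist_offset=0):
    """Assumed bucket key: the lower bound of the bucket holding `value`."""
    steps = math.floor((value - hist_offset) / hist_interval)
    return steps * hist_interval + hist_offset

# Same values as in the SQL example, with hist_interval=10.
values = [12, 15, 23, 3]

counts = Counter(histogram_key(v, hist_interval=10) for v in values)
print(sorted(counts.items()))
```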

There are also date_range and date_histogram for similar aggregations with date/time data.

New commands to simplify data updates and schema management

ALTER TABLE ... type='distributed' lets you change a distributed table without having to delete it first.
CREATE TABLE ... LIKE ... WITH DATA makes it easy to copy a real-time table along with all its data.
Use REPLACE INTO ... SET for updating parts of records in a table.
Attaching one real-time table to another combines two tables into one.
Rename a real-time table with ALTER TABLE ... RENAME to keep your database organized.

Replication-related changes

We've made significant changes to replication to improve data transfer between nodes. A replication error that occurred when transferring large files has been fixed, a retry mechanism for command execution has been added, and network management during replication has been improved. Blocking issues during replication and attribute updates have also been resolved, and support for skipping replication update commands has been added for nodes joining the cluster. Together, these changes make replication more efficient and reliable across a range of usage scenarios.

For detailed information about the changes, see here.

License change and performance optimizations

We've changed the Manticore Search license to GPLv3-or-later. The new license offers better legal safety for users and is more compatible with other open-source licenses. This change reflects our commitment to the needs of the community and to keeping open-source software strong.

In version 6.3.0, we also added the Apache 2.0-licensed CCTZ library, which makes date/time functions much faster. Here is the improvement:

Before:

mysql> select count(*),year(time_local) y, month(time_local) m from logs10m where y>2010 and m<5;
+----------+------+------+
| count(*) | y    | m    |
+----------+------+------+
| 10365132 | 2019 |    1 |
+----------+------+------+
1 row in set (8.26 sec)

Now:

mysql> select count(*),year(time_local) y, month(time_local) m from logs10m where y>2010 and m<5;
+----------+------+------+
| count(*) | y    | m    |
+----------+------+------+
| 10365132 | 2019 |    1 |
+----------+------+------+
1 row in set (0.11 sec)

The query is now about 75 times faster (8.26 sec vs. 0.11 sec).

We have also improved how tables are compacted. Previously, when merging disk chunks, Manticore removed deleted documents from every chunk that contained them, which consumed a lot of resources. We no longer do this. Merging chunks is now governed solely by the progressive_merge setting, which makes the process simpler and less resource-intensive.