In previous articles, we talked about how to query with CouchDB Views and Mango Query. Both methods work very well and cover a lot of use cases.
Why Clouseau?
CouchDB Views and Mango Query are still very limited when it comes to search. Complex search requirements make view functions and Mango indexes more complicated and harder to build, while search performance still has to stay good. You can still build your own search engine from scratch with Mango and Views. However, it is very tough and takes a lot of resources to build a good one: there is a ton of work, such as text preprocessing, tokenization, algorithms, ranking, etc.😰
Thanks to Clouseau, CouchDB search has been brought to the next level🥳
Starting from CouchDB v3, CouchDB can build and query full-text search indexes using an external Java service that embeds Apache Lucene. If you are already familiar with Elasticsearch, it is very easy to catch up with CouchDB + Clouseau, as they use the same Lucene syntax.
Installation
To set up Clouseau to work together with CouchDB, you may refer to my tutorial post or the official docs here.
How to use?
It works like Mango Query: create a design document that holds the search index function, then search with that index.
Example Search Index Function:
function(document) {
  index("default", document._id);
  if (document.title) {
    index("title", document.title, {"store": true});
  }
  if (document.status) {
    index("status", document.status, { "store": false });
  }
}
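To see which fields a document would contribute, the index function can be exercised outside CouchDB. This is only a minimal sketch: `collectIndexCalls` and the stub `index` are hypothetical helpers made up for illustration, and the index function takes `index` as an explicit second argument here so it can run locally. It records calls and does not build a real Lucene index.

```javascript
// Local stand-in for the index() function that CouchDB search provides at
// indexing time. Hypothetical helper: it only records the calls.
function collectIndexCalls(indexFn, document) {
  const calls = [];
  const index = (name, value, options = {}) => {
    calls.push({ name, value, store: options.store === true });
  };
  indexFn(document, index);
  return calls;
}

// Same logic as the search index function above, with index passed in
// explicitly so the function can run outside CouchDB.
function searchIndex(document, index) {
  index("default", document._id);
  if (document.title) {
    index("title", document.title, { store: true });
  }
  if (document.status) {
    index("status", document.status, { store: false });
  }
}

const calls = collectIndexCalls(searchIndex, {
  _id: "c2ec3b79-d9ac-45a8-8c68-0f05cb3adfac",
  title: "Post One Title",
  status: "submitted",
});
console.log(calls);
```

A document without a title or status would only produce the "default" entry, which is why the `if` guards matter.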
Design Document in full view:
{
  "_id": "_design/search",
  "_rev": "1-15807c8c7e310b566c0a41997d79b7fd",
  "views": {},
  "language": "javascript",
  "indexes": {
    "posts": {
      "analyzer": "standard",
      "index": "function(doc) {\r\n index(\"default\", doc._id);\r\n if (doc.status) {\r\n index(\"status\", doc.status, { \"store\": false });\r\n }\r\n if (doc.title) {\r\n index(\"title\", doc.title, {\"store\": true});\r\n }\r\n}"
    }
  }
}
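Escaping the index function into a JSON string by hand is error-prone, so here is a rough sketch of generating the design document from a real function instead. `buildSearchDesignDoc` is a hypothetical helper, not part of CouchDB; it leaves out `_rev` and `views` on the assumption that the document is being created for the first time.

```javascript
// Hypothetical helper: builds a design document with one search index.
// CouchDB stores the index function as a string, so we serialize it
// with Function.prototype.toString().
function buildSearchDesignDoc(designName, indexName, indexFn, analyzer = "standard") {
  return {
    _id: `_design/${designName}`,
    language: "javascript",
    indexes: {
      [indexName]: {
        analyzer,
        index: indexFn.toString(),
      },
    },
  };
}

// index() is never called locally; it only exists inside CouchDB search.
const designDoc = buildSearchDesignDoc("search", "posts", function (doc) {
  index("default", doc._id);
  if (doc.status) {
    index("status", doc.status, { store: false });
  }
  if (doc.title) {
    index("title", doc.title, { store: true });
  }
});

console.log(JSON.stringify(designDoc, null, 2));
```

The printed JSON can then be saved as the design document, for example through Fauxton or an HTTP PUT.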
The search index function above lets us search by document ID, title, and status. A query without a field name searches the "default" field, which here holds the document ID. The "store" boolean passed in the third argument indicates whether the field's value should be returned in the search result; it defaults to false.
GET /YOUR_DATABASE_NAME/_design/search/_search/posts?q=ea885d7d-7af2-4858-b7bf-6fd01bcd4544
Result:
{
  "total_rows": 1,
  "bookmark": "g2wAAAABaANkABFjb3VjaGRiQDEyNy4wLjAuMWwAAAACYQBuBAD_____amgCRj_6gH-AAAAAYQFq",
  "rows": [
    {
      "id": "ea885d7d-7af2-4858-b7bf-6fd01bcd4544",
      "order": [
        1.6563715934753418,
        1
      ],
      "fields": {
        "title": "Post Two Title"
      }
    }
  ]
}
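When the query comes from user input, it helps to build the request URL in code so that special characters are encoded properly. The following is a small sketch, not an official client: `buildSearchUrl` is a hypothetical helper, and the base URL `http://localhost:5984` and database name `posts_db` are assumptions for a local setup.

```javascript
// Hypothetical helper: builds the _search URL for a given index.
function buildSearchUrl(baseUrl, db, designName, indexName, params) {
  const path = [db, "_design", designName, "_search", indexName]
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  // URLSearchParams takes care of percent-encoding the query values.
  const query = new URLSearchParams(params).toString();
  return `${baseUrl}/${path}?${query}`;
}

const idSearchUrl = buildSearchUrl(
  "http://localhost:5984", // assumed local CouchDB address
  "posts_db",              // assumed database name
  "search",
  "posts",
  { q: "ea885d7d-7af2-4858-b7bf-6fd01bcd4544" }
);
console.log(idSearchUrl);
```

The resulting URL can be passed to `fetch` or any HTTP client, together with whatever authentication your CouchDB requires.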
Now let us search by the post's status:
GET /YOUR_DATABASE_NAME/_design/search/_search/posts?q=status:submitted
Result:
{
  "total_rows": 2,
  "bookmark": "g2wAAAABaANkABFjb3VjaGRiQDEyNy4wLjAuMWwAAAACYQBuBAD_____amgCRj_0mliAAAAAYQJq",
  "rows": [
    {
      "id": "c2ec3b79-d9ac-45a8-8c68-0f05cb3adfac",
      "order": [
        1.287682056427002,
        0
      ],
      "fields": {
        "title": "Post One Title"
      }
    },
    {
      "id": "4a2348ca-f27c-427f-a490-e29f2a64fdf2",
      "order": [
        1.287682056427002,
        2
      ],
      "fields": {
        "title": "Post Three Title"
      }
    }
  ]
}
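Here is a rough sketch of how a result like the one above could be consumed in code. `summarizeRows` and `nextPageParams` are hypothetical helpers. The sketch assumes the default relevance sort, where the first entry of `order` is the score, and it relies on the `bookmark` value being sent back as the `bookmark` query parameter to fetch the next page.

```javascript
// Hypothetical helper: flattens search rows into id, score and stored fields.
// Assumes the default relevance sort, so order[0] is the relevance score.
function summarizeRows(response) {
  return response.rows.map((row) => ({
    id: row.id,
    score: row.order[0],
    fields: row.fields || {},
  }));
}

// Hypothetical helper: query parameters for requesting the next page.
function nextPageParams(query, response) {
  return { q: query, bookmark: response.bookmark };
}

// Sample data copied from the result above.
const sampleResponse = {
  total_rows: 2,
  bookmark: "g2wAAAABaANkABFjb3VjaGRiQDEyNy4wLjAuMWwAAAACYQBuBAD_____amgCRj_0mliAAAAAYQJq",
  rows: [
    { id: "c2ec3b79-d9ac-45a8-8c68-0f05cb3adfac", order: [1.287682056427002, 0], fields: { title: "Post One Title" } },
    { id: "4a2348ca-f27c-427f-a490-e29f2a64fdf2", order: [1.287682056427002, 2], fields: { title: "Post Three Title" } },
  ],
};

const summary = summarizeRows(sampleResponse);
const nextParams = nextPageParams("status:submitted", sampleResponse);
console.log(summary);
console.log(new URLSearchParams(nextParams).toString());
```

Only "title" shows up under `fields`, because "status" was indexed with "store" set to false.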
Analyzers📈
Analyzers are settings that define how to recognize terms within text. Analyzers can be helpful if you need to index multiple languages.
Six generic analyzers are supported by the search:
classic - The standard Lucene analyzer, circa release 3.1.
email - Like the standard analyzer, but tries harder to match an email address as a complete token.
keyword - Input is not tokenized at all.
simple - Divides text at non-letters.
standard - The default analyzer. It implements the Word Break rules from the Unicode Text Segmentation algorithm.
whitespace - Divides text at white space boundaries.
Pick the analyzer that best suits the use case of each search index.
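To get a feel for how the choice of analyzer changes the indexed terms, here is a very rough JavaScript approximation of three of them. These are not the Lucene implementations: `roughAnalyzers` is a made-up name, and the regular expressions only mimic the descriptions above (with lowercasing assumed for "simple").

```javascript
// Rough approximations only; the real analyzers live in Lucene.
const roughAnalyzers = {
  // keyword: the whole input is kept as a single term.
  keyword: (text) => [text],
  // whitespace: split at white space boundaries, keep case and punctuation.
  whitespace: (text) => text.split(/\s+/).filter(Boolean),
  // simple: split at non-letters; lowercasing is assumed here.
  simple: (text) => text.toLowerCase().split(/[^\p{L}]+/u).filter(Boolean),
};

const sampleText = "Post-Two title 2";
for (const [name, analyze] of Object.entries(roughAnalyzers)) {
  console.log(name, analyze(sampleText));
}
```

With "keyword", only an exact match on the whole value would hit, while "simple" would also match a query for just "two".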
Geographical Searches🗺
Besides that, you can also do geographical searches in CouchDB with Lucene's built-in geospatial capabilities.😍
Example geographical data:
{
  "name": "Aberdeen, Scotland",
  "lat": 57.15,
  "lon": -2.15,
  "type": "city"
}
Example search index for the geographic data:
function(doc) {
  if (doc.type && doc.type == 'city') {
    index('city', doc.name, {'store': true});
    index('lat', doc.lat, {'store': true});
    index('lon', doc.lon, {'store': true});
  }
}
HTTP Request:
GET /YOUR_DATABASE_NAME/_design/YOUR_DESIGN_DOC_NAME/_search/SEARCH_INDEX_NAME?q=lat:[0+TO+90]&sort="<distance,lon,lat,-74.0059,40.7127,km>"
Abbreviated Result:
{
  "total_rows": 205,
  "bookmark": "g1A...XIU",
  "rows": [
    {
      "id": "city180",
      "order": [
        8.530665755719783,
        18
      ],
      "fields": {
        "city": "New York, N.Y.",
        "lat": 40.78333333333333,
        "lon": -73.96666666666667
      }
    },
    {
      "id": "city177",
      "order": [
        13.756343205985946,
        17
      ],
      "fields": {
        "city": "Newark, N.J.",
        "lat": 40.733333333333334,
        "lon": -74.16666666666667
      }
    },
    {
      "id": "city178",
      "order": [
        113.53603438866077,
        26
      ],
      "fields": {
        "city": "New Haven, Conn.",
        "lat": 41.31666666666667,
        "lon": -72.91666666666667
      }
    }
  ]
}
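In the request above, the sort expression orders the cities by distance from longitude -74.0059, latitude 40.7127, so the first entry of each `order` array reads as a distance in kilometres rather than a relevance score. As a sanity check, here is a minimal sketch that rebuilds the sort expression and recomputes the distances with the haversine formula. `haversineKm` and `buildDistanceSort` are hypothetical helpers, and a spherical Earth with a 6371 km radius is assumed, so the numbers will not match Lucene's exactly.

```javascript
// Great-circle distance in kilometres using the haversine formula.
// Assumption: spherical Earth with a mean radius of 6371 km.
function haversineKm(lat1, lon1, lat2, lon2) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * 6371 * Math.asin(Math.sqrt(a));
}

// Hypothetical helper: builds the quoted sort expression used in the request.
function buildDistanceSort(lonField, latField, lon, lat, unit = "km") {
  return `"<distance,${lonField},${latField},${lon},${lat},${unit}>"`;
}

const origin = { lat: 40.7127, lon: -74.0059 };
// Stored fields copied from the abbreviated result, deliberately shuffled.
const cities = [
  { city: "New Haven, Conn.", lat: 41.31666666666667, lon: -72.91666666666667 },
  { city: "New York, N.Y.", lat: 40.78333333333333, lon: -73.96666666666667 },
  { city: "Newark, N.J.", lat: 40.733333333333334, lon: -74.16666666666667 },
];

const byDistance = cities
  .map((c) => ({ ...c, km: haversineKm(origin.lat, origin.lon, c.lat, c.lon) }))
  .sort((a, b) => a.km - b.km);

console.log(buildDistanceSort("lon", "lat", origin.lon, origin.lat));
console.log(byDistance.map((c) => `${c.city}: ${c.km.toFixed(1)} km`));
```

Sorting locally like this is only for checking a handful of rows; for real queries, let the search index do the sorting.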
Thank you for reading.
There is more you can do with CouchDB search. Do check out the official documentation here and also the Lucene syntax reference, as CouchDB search queries use Lucene syntax.😊