DEV Community

Chris Yang


Implementation of full text search on Android

Plan A: Use Lucene

The recommended version is Lucene 4.7 with Java 6.

Plan B: Use SQLite FTS3/FTS4

1. Introduction to FTS3 and FTS4
1.1. Differences between FTS3 and FTS4
1.2. Creating and Destroying FTS Tables
1.3. Populating FTS Tables
1.4. Simple FTS Queries
1.5. Summary
2. Compiling and Enabling FTS3 and FTS4
3. Full-text Index Queries
3.1. Set Operations Using The Enhanced Query Syntax
3.2. Set Operations Using The Standard Query Syntax
4. Auxiliary Functions - Snippet, Offsets and Matchinfo
4.1. The Offsets Function
4.2. The Snippet Function
4.3. The Matchinfo Function
5. Fts4aux - Direct Access to the Full-Text Index
6. FTS4 Options
6.1. The compress= and uncompress= options
6.2. The content= option
6.2.1. Contentless FTS4 Tables
6.2.2. External Content FTS4 Tables
6.3. The languageid= option
6.4. The matchinfo= option
6.5. The notindexed= option
6.6. The prefix= option
7. Special Commands For FTS3 and FTS4
7.1. The "optimize" command
7.2. The "rebuild" command
7.3. The "integrity-check" command
7.4. The "merge=X,Y" command
7.5. The "automerge=N" command
8. Tokenizers
8.1. Custom (Application Defined) Tokenizers
8.2. Querying Tokenizers
9. Data Structures
9.1. Shadow Tables
9.2. Variable Length Integer (varint) Format
9.3. Segment B-Tree Format
9.3.1. Segment B-Tree Leaf Nodes
9.3.2. Segment B-Tree Interior Nodes
9.4. Doclist Format
Appendix A: Search Application Tips
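
As a rough illustration of the first topics in that outline (creating an FTS table, populating it, and running simple queries), here is a minimal sketch. It uses Python's built-in `sqlite3` module rather than Android code so that it can run anywhere, and it assumes the bundled SQLite library was compiled with FTS3/FTS4 enabled. The table name `docs`, its columns, the `search` helper, and the sample rows are made up for the example.

```python
import sqlite3

# Assumption: this SQLite build has FTS3/FTS4 compiled in.
conn = sqlite3.connect(":memory:")

# Hypothetical FTS4 virtual table with two indexed text columns.
conn.execute("CREATE VIRTUAL TABLE docs USING fts4(title, body)")

# Populate it like an ordinary table.
conn.executemany(
    "INSERT INTO docs(title, body) VALUES (?, ?)",
    [
        ("Lucene on Android", "Lucene is a search library written in Java"),
        ("SQLite FTS", "FTS4 builds a full-text index inside SQLite"),
        ("Elasticsearch", "A search server built on top of Lucene"),
    ],
)


def search(query):
    """Return the titles of rows matching an FTS query, in insertion order."""
    rows = conn.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY docid", (query,)
    )
    return [title for (title,) in rows]


term_hits = search("lucene")              # single term, any column
phrase_hits = search('"search library"')  # exact phrase
prefix_hits = search("elastic*")          # prefix query
column_hits = search("title:sqlite")      # term restricted to one column

print(term_hits)    # ['Lucene on Android', 'Elasticsearch']
print(phrase_hits)  # ['Lucene on Android']
print(prefix_hits)  # ['Elasticsearch']
print(column_hits)  # ['SQLite FTS']
```

Matching is case-insensitive here because the default `simple` tokenizer folds ASCII letters to lowercase.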

2.1 Setting Up the Search Interface
Learn how to add a search interface to your app and how to configure an activity to handle search queries.

2.2 Storing and Searching for Data
Learn a simple way to store and search for data in a SQLite virtual database table.

2.3 Remaining Backward Compatible
Learn how to keep search features backward compatible with older devices.
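
The "store and search in a virtual table" idea from lesson 2.2 might look roughly like the sketch below. It is written in Python for brevity and is not the Android lesson's code: `DictionaryIndex`, its methods, and the `dictionary` table are hypothetical names, and it again assumes an SQLite build with FTS3 enabled.

```python
import re
import sqlite3


class DictionaryIndex:
    """Hypothetical helper that stores words and definitions in an FTS3 table."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE VIRTUAL TABLE IF NOT EXISTS dictionary "
            "USING fts3(word, definition)"
        )

    @staticmethod
    def _terms(text):
        # Keep only ASCII letters and digits, lowercased, so user input cannot
        # inject FTS syntax such as OR, NEAR, quotes, or column filters.
        return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

    def add(self, word, definition):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO dictionary(word, definition) VALUES (?, ?)",
                (word, definition),
            )

    def words_starting_with(self, text):
        """Prefix search restricted to the word column."""
        terms = self._terms(text)
        if not terms:
            return []
        rows = self.conn.execute(
            "SELECT word FROM dictionary WHERE word MATCH ? ORDER BY word",
            (terms[0] + "*",),
        )
        return [word for (word,) in rows]

    def search_definitions(self, text):
        """Return (word, highlighted snippet) for definitions with all terms."""
        terms = self._terms(text)
        if not terms:
            return []
        rows = self.conn.execute(
            "SELECT word, snippet(dictionary, '[', ']', '...') "
            "FROM dictionary WHERE definition MATCH ? ORDER BY word",
            (" ".join(terms),),
        )
        return list(rows)


index = DictionaryIndex()
index.add("lucene", "A search library written in Java")
index.add("lucid", "Clear and easy to understand")
index.add("sqlite", "An embedded database with full-text search modules")

prefix_matches = index.words_starting_with("luc")
definition_matches = index.search_definitions("search")
print(prefix_matches)  # ['lucene', 'lucid']
for word, fragment in definition_matches:
    print(word, "->", fragment)
```

Stripping the input down to plain lowercase terms is a deliberately blunt choice: it gives up the richer query syntax in exchange for never passing a malformed MATCH expression to SQLite.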

Top comments (7)

David Azevedo

I assume you rank Lucene as Plan A because you consider it the best option. Why do you think Lucene does a better job overall than fts3/fts4?
Also, have you tried implementing each of them? What were your findings?

Best Regards!

Chris Yang

Good comment.
Actually, I need a "Plan C", because Lucene is good but not good for Android, while fts3/fts4 is good but not friendly to a search developer, in my opinion.
This month we tried Lucene on Android, and everything seemed to work. Maybe we were lucky. But when we tried to improve the search results, including recall and ranking, we found it very hard, since Lucene is just a library, not a complete search solution.

David Azevedo

I'm actually investigating different solutions for offline full-text search on Android, and as of now my best options are exactly those two.
Lucene looks good for Android, so I don't understand your statement that it is "not good for Android".
I think Lucene is good, but it's the extra "weight" it adds to the app compared with SQLite FTS that I am not so sure about.
What do you mean by "fts3/fts4 is good but not friendly to a search developer"?

Chris Yang

We have a specific scenario: we are building a local search engine for the whole Android system, not only for one app.
For Lucene, which version should I use? That's an open question, and the only way to answer it is to test and test again. As for fts3/fts4, in my opinion the indexing and search API is not developer-friendly, or at least no better than Lucene's.

David Azevedo

I haven't looked into versions that much, but I'd go with the latest stable.
Yeah, fts3 and fts4 are the most basic; Lucene has a lot, and I mean a lot, of features. As your work is directly on Android and not just an app, I guess you should go for Lucene, given that it is the more complex option. For an app, I think it obviously depends on the purpose.
FTS keeps the code more native and lightweight while still giving good results. Lucene has many more features that could be interesting, such as ranked search, but at the same time it requires more disk space due to redundancy.

David Azevedo

I forgot to mention: what about Elastic? Did you consider it?

Chris Yang • Edited

A really good idea that's worth trying. Actually, as a search engineer I use Elasticsearch widely on the server side :)