Intro
My latest Rails side project is a search engine.
dato.rss is a simple, clean and fast RSS search engine with a RESTful API.
The project is divided into two parts:
Search Engine: Quickly search through the millions of available RSS feeds.
RESTful API: Turns feed data into an awesome API. The API simplifies how you handle RSS, Atom, or JSON feeds. You can add and keep track of your favourite feed data with a simple, fast and clean REST API. All entries are enriched by Machine Learning and Semantic engines.
Search
Search is implemented with PostgreSQL's built-in full-text search.
I used the pg_search gem, which can be used in two ways:
Multi Search: Search across multiple models and return a single array of results. Imagine having three models: Product, Brand, and Review. Using Multi Search we could search across all of them at the same time, seeing a single set of search results. This would be perfect for adding federated search functionality to your app.
Search Scope: Search within a single model, but with greater flexibility.
For the implementation, I found this article very useful: https://pganalyze.com/blog/full-text-search-ruby-rails-postgres. The following statement adds a weighted, generated tsvector column to the entries table:
    execute <<-SQL
      ALTER TABLE entries
      ADD COLUMN searchable tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('simple', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('simple', coalesce(body,'')), 'B') ||
        setweight(to_tsvector('simple', coalesce(url,'')), 'C')
      ) STORED;
    SQL
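To query that column from Rails, a pg_search search scope can point at it through the tsvector_column option. What follows is a minimal sketch, not the project's actual model code: the module name EntrySearchable, the scope name search and the SEARCH_OPTIONS constant are assumptions for illustration, and it is written as a plain module only so that it can be read on its own. The dictionary is set to simple to match the to_tsvector calls above.

```ruby
# Hypothetical mixin (the name is an assumption) that adds a pg_search scope
# backed by the generated `searchable` tsvector column.
module EntrySearchable
  # Options handed to pg_search_scope. With :tsvector_column set, the
  # tsearch feature reads the stored column instead of rebuilding a
  # tsvector from the :against columns on every query.
  SEARCH_OPTIONS = {
    against: %i[title body url],
    using: {
      tsearch: {
        dictionary: "simple",          # same dictionary as the ALTER TABLE
        tsvector_column: "searchable"  # column created by the ALTER TABLE
      }
    }
  }.freeze

  # Runs when the module is included in a model class.
  # PgSearch::Model is the mixin name used by pg_search 2.2 and later.
  def self.included(base)
    base.include(PgSearch::Model)
    base.pg_search_scope(:search, SEARCH_OPTIONS)
  end
end
```

In an app with the pg_search gem loaded, a model (assumed here to be called Entry) would include EntrySearchable and then call Entry.search("rails"). In practice the include and pg_search_scope lines would usually sit directly in the model.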
Search View
Feed Rank
Feed ranking is provided by OpenRank, a free root domain authority metric based on the Common Search PageRank dataset. The value is normalized with:
((Math.log10(domain_rank) / Math.log10(100)) * 100).round
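To make the arithmetic concrete, here is the same expression wrapped in a small Ruby method. The method name normalize_rank and the guard for non-positive input are assumptions added for illustration; they are not taken from the project. Since log10(100) is 2, the expression is equivalent to 50 * log10(domain_rank), rounded to the nearest integer.

```ruby
# Hypothetical helper: applies the normalization formula to a raw rank.
# Returns 0 for non-positive input, since log10 is undefined there
# (this guard is an assumption, not part of the original expression).
def normalize_rank(domain_rank)
  return 0 unless domain_rank.positive?

  ((Math.log10(domain_rank) / Math.log10(100)) * 100).round
end

puts normalize_rank(1)    # prints 0
puts normalize_rank(10)   # prints 50
puts normalize_rank(100)  # prints 100
```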
Machine Learning
Machine learning is provided by the Dandelion API, a semantic text analytics service that turns text into actionable data. It extracts meaning from unstructured text and puts it in context with a simple API.
RESTful API
All API documentation is in the Wiki section. Feel free to make it better, of course.
https://github.com/davidesantangelo/dato.rss/wiki
Some features, such as adding a new feed, require a token with write permission. Currently only I can enable one, so contact me if you need it.
GitHub
davidesantangelo / dato.rss
The best RSS Search experience you can find
DATO.RSS
A seamless RSS Search Engine experience with a hint of Machine Learning.
SEED
An SQL dump of the database, with over 3 million entries collected over more than a year, can be downloaded at https://davidesantangelo.gumroad.com/l/nkyymb
BETA
Dato.RSS is in beta, and will likely see many changes in the near future.
If you have comments or suggestions, please send them to us using the Issues tab.
Thanks for trying the beta!
Example
curl 'https://<yourhost>/api/searches?q=news' | json_pp
{
  "data": [
    {
      "id": "86b0f829-e300-4eef-82e1-82f34d03aff6… 
 
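The same request can be made from Ruby with the standard library. This is a minimal sketch: the helper names search_uri and search are assumptions, and only the /api/searches path, the q parameter and the top-level data key come from the curl example above.

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical helper: builds the search URL for a host and a query string.
# URI.encode_www_form takes care of escaping the query.
def search_uri(host, query)
  URI::HTTPS.build(
    host: host,
    path: "/api/searches",
    query: URI.encode_www_form(q: query)
  )
end

# Hypothetical helper: performs the GET request and returns the parsed JSON.
# Raises if the server does not answer with a 2xx status.
def search(host, query)
  response = Net::HTTP.get_response(search_uri(host, query))
  raise "Search failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body)
end

# Usage (the host is a placeholder, replace it with your own):
#   search("your-host.example", "news")["data"]
```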