Hello Devs, 
The team at Swirl has put together this guide for anyone who wants to extend Swirl by adding SearchProviders, Connectors, and Processors.
It should make it easy to contribute to Swirl and to get started with open source. We're also participating in Hacktoberfest and giving out swag worth up to $100 to contributors; please check the blog for more information.
Learn more about Swirl in the article below.
 
       
Creating an Open Source Search Platform: Search Engines with AI - Swirl
Published for SWIRL, Sep 11 '23
Give Swirl a star on GitHub.
Prerequisites
- Python 3.11 or later, installed locally
% python
Python 3.11.1 (v3.11.1:a7a450f84a, Dec  6 2022, 15:24:06) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
- Redis installed and running
- Swirl installed locally (not in Docker) and running
% python swirl.py status
__S_W_I_R_L__2_._6__________________________________________________________
Service: redis...RUNNING, pid:31012
Service: django...RUNNING, pid:31014
Service: celery-worker...RUNNING, pid:31018
  PID TTY           TIME CMD
31012 ttys000    0:20.11 redis-server *:6379 
31014 ttys000    0:12.04 /Library/Frameworks/Python.framework/Versions/3.11/Resources/Python.app/Contents/MacOS/Python /Library/Frameworks/Python.framework/Versions/3.11/bin/daphne -b 0.0.0.0 -p 8000 swirl_server.asgi:application
Background: Understanding the Swirl Search Workflow
In a nutshell:
- User creates a query - example: http://localhost:8000/swirl/search/?q=ai
- Pre-query processing - example: SpellcheckQueryProcessor
Each SearchProvider executes in parallel:
- Query processing - example: AdaptiveQueryProcessor
- Connector - example: RequestsGet
- Result processing - example: MappingResultProcessor
End parallel processing
- Post-result processing - example: CosineRelevancyPostResultProcessor
- Ranked results available via mixer - example: http://localhost:8000/swirl/results/?search_id=1
For more information, consult the Developer Guide Workflow Overview.
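The workflow above can be pictured with a toy pipeline. This is only an illustrative sketch and not Swirl's implementation: the function names (pre_query, run_provider, post_result, search) are hypothetical, and the "providers" are plain dicts standing in for SearchProviders.

```python
from concurrent.futures import ThreadPoolExecutor

def pre_query(query):
    # Pre-query processing: applied once, before any provider runs
    return query.strip()

def run_provider(provider, query):
    # Per-provider steps: query processing, connector call, result processing
    provider_query = f"{query} {provider['suffix']}".strip()    # query processing
    raw = [f"{provider['name']} hit for '{provider_query}'"]    # stand-in for a connector
    return [{"title": r, "provider": provider["name"]} for r in raw]  # result processing

def post_result(result_sets):
    # Post-result processing: sees the results of all providers together
    merged = [item for results in result_sets for item in results]
    return sorted(merged, key=lambda item: item["title"])

def search(query, providers):
    query = pre_query(query)
    with ThreadPoolExecutor() as pool:
        # Each provider executes in parallel
        result_sets = list(pool.map(lambda p: run_provider(p, query), providers))
    return post_result(result_sets)

providers = [{"name": "web", "suffix": ""}, {"name": "news", "suffix": "latest"}]
results = search("  ai ", providers)
```

The point of the sketch is the shape: one stage before the fan-out, three stages inside each provider, and one stage after all providers have finished.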
Creating a SearchProvider
A SearchProvider is a configuration of a Connector. So, to connect to a given source, first verify that it can be reached with a Connector you already have. (See the next tutorial for information on creating new Connectors.)
For example, if trying to query a website using a URL like https://host.com/?q=my+query+here that returns JSON or XML, create a new SearchProvider configuring the RequestsGet connector as follows:
- Copy any of the Google PSE SearchProviders
- Modify the url and query_template to construct the query URL. Using the above example:
{
        "url": "https://host.com/",
        "query_template": "{url}?q={query_string}"
}
To learn more about query and URL parameters, refer to the Developer Guide.
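To see how the two fields combine, here is a minimal sketch of the substitution. Swirl performs this step internally and may do more than this; build_query_url is a hypothetical helper used only for illustration.

```python
from urllib.parse import quote_plus

def build_query_url(url, query_template, query_string):
    # URL-encode the user's query, then fill in the template placeholders
    return query_template.format(url=url, query_string=quote_plus(query_string))

full_url = build_query_url("https://host.com/", "{url}?q={query_string}", "my query here")
print(full_url)  # https://host.com/?q=my+query+here
```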
- If the website offers the ability to page through results, or sort results by date (as well as relevancy), use the PAGE= and DATE_SORT query mappings to add support for these features through Swirl.
For more information, refer to the User Guide, Query Mappings section.
- Open the query URL in a browser and look through the JSON response.
If using Visual Studio Code, paste the JSON into a new file, then right-click on it and select Format Document to make it easier to read.
- Identify the results list and the number of results found and retrieved. Put these JSON paths in the response_mappings. Then identify the JSON paths needed to extract the Swirl default fields title, body, url, date_published and author from each item in the result list, and put them in the result_mappings, with the Swirl field name on the left and the source JSON path on the right.
For example:
        "response_mappings": "FOUND=searchInformation.totalResults,RETRIEVED=queries.request[0].count,RESULTS=items",
        "result_mappings": "url=link,body=snippet,author=displayLink,cacheId,pagemap.metatags[*].['og:type'],pagemap.metatags[*].['og:site_name'],pagemap.metatags[*].['og:description'],NO_PAYLOAD",
- Add credentials as required for the service.
The format to use depends on the type of credential. Details are here: User Guide Credentials Section
- Add a suitable tag that can be used to describe the source or what it knows about.
Spaces are not permitted; good tags are clear and obvious when used in a query, like company:tesla or news:openai.
For more about tags, see: Organizing SearchProviders
- Review the finished SearchProvider:
{
        "name": "My New SearchProvider",
        "connector": "RequestsGet",
        "url": "https://host.com/",
        "query_template": "{url}?q={query_string}",
        "query_processors": [
            "AdaptiveQueryProcessor"
        ],
        "query_mappings": "",
        "result_processors": [
            "MappingResultProcessor",
            "CosineRelevancyResultProcessor"
        ],
        "response_mappings": "FOUND=jsonpath.to.number.found,RETRIEVED=jsonpath.to.number.retrieved,RESULTS=jsonpath.to.result.list",
        "result_mappings": "url=link,body=snippet,author=displayLink,NO_PAYLOAD",
        "credentials": "bearer=your-bearer-token-here",
        "tags": [
            "MyTag"
        ]
    }
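The response_mappings and result_mappings strings in a SearchProvider can be pictured as small lookup tables applied to the JSON response. The sketch below is a simplified stand-in, not Swirl's mapping code: resolve and parse_mappings are hypothetical helpers, the sample response is invented, and only dotted keys and [n] indexes are handled (no [*] wildcard, no NO_PAYLOAD handling).

```python
import re

def resolve(data, path):
    # Walk a dotted path such as "queries.request[0].count" through nested dicts/lists
    for key, index in re.findall(r"([^.\[\]]+)|\[(\d+)\]", path):
        data = data[int(index)] if index else data[key]
    return data

def parse_mappings(mappings):
    # "A=x.y,B=z" -> {"A": "x.y", "B": "z"}; entries without "=" are skipped here
    pairs = (entry.split("=", 1) for entry in mappings.split(",") if "=" in entry)
    return {left.strip(): right.strip() for left, right in pairs}

# Invented sample response, shaped like the Google PSE example
response = {
    "searchInformation": {"totalResults": "2"},
    "queries": {"request": [{"count": 2}]},
    "items": [
        {"link": "https://example.com/a", "snippet": "First", "displayLink": "example.com"},
        {"link": "https://example.com/b", "snippet": "Second", "displayLink": "example.com"},
    ],
}

response_map = parse_mappings(
    "FOUND=searchInformation.totalResults,RETRIEVED=queries.request[0].count,RESULTS=items"
)
result_map = parse_mappings("url=link,body=snippet,author=displayLink,NO_PAYLOAD")

found = resolve(response, response_map["FOUND"])
retrieved = resolve(response, response_map["RETRIEVED"])
items = resolve(response, response_map["RESULTS"])
# Swirl field name on the left, source JSON path on the right
normalized = [{field: resolve(item, path) for field, path in result_map.items()} for item in items]
```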
- Go to Swirl - localhost:8000/swirl/searchproviders/, logging in if necessary. Put the form at the bottom of the page into RAW mode, and paste the SearchProvider in. Then hit POST. The SearchProvider will reload.
- Go to Galaxy - localhost:8000/galaxy/ and run a search using the tag you created earlier. Results should appear in roughly the same amount of time as before.
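Before pasting a SearchProvider into the form, it can help to check locally that it is valid JSON and that the tags contain no spaces. The following is a minimal sketch of such a check; check_search_provider is a hypothetical helper, and the REQUIRED_KEYS set is an assumption based on the example above, not a list taken from Swirl.

```python
import json

# Assumption: the keys every RequestsGet SearchProvider in this tutorial sets
REQUIRED_KEYS = {"name", "connector", "url", "query_template"}

def check_search_provider(raw_json):
    # json.loads raises json.JSONDecodeError on invalid JSON, e.g. trailing commas
    provider = json.loads(raw_json)
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(provider))]
    for tag in provider.get("tags", []):
        if " " in tag:
            problems.append(f"tag contains a space: {tag!r}")
    return problems

raw = '''
{
    "name": "My New SearchProvider",
    "connector": "RequestsGet",
    "url": "https://host.com/",
    "query_template": "{url}?q={query_string}",
    "tags": ["My Tag"]
}
'''
problems = check_search_provider(raw)
print(problems)  # ["tag contains a space: 'My Tag'"]
```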
Creating a Connector
In Swirl, Connectors are responsible for loading a SearchProvider, then constructing and transmitting queries to a particular type of service, then saving the response - typically a result list.
Note: Consider using your favorite coding AI to generate a Connector by passing it the Connector base classes and information about the API you are trying to query.
Note: If you are trying to send an HTTP/S request to an endpoint that returns JSON or XML, you don't need to create a Connector. Instead, create a SearchProvider that configures the RequestsGet connector included with Swirl.
To create a new Connector:
- Create a new file, e.g. swirl/connectors/my_connector.py
- Copy the style of the ChatGPT connector as a starting point, or the BigQuery connector if targeting a database.
 
class MyConnector(Connector):
    def __init__(self, provider_id, search_id, update, request_id=''):
        # system_guide and MODEL_DEFAULT_SYSTEM_GUIDE come from the ChatGPT connector;
        # replace them with whatever state your own service needs
        self.system_guide = MODEL_DEFAULT_SYSTEM_GUIDE
        super().__init__(provider_id, search_id, update, request_id)
In the __init__ method, load and persist anything that will be needed when connecting and querying the service. Use the ChatGPT Connector as a guide.
- Import the Python package(s) needed to connect to the service. The ChatGPT connector uses the openai package, for example:
import openai
- Modify the execute_search method to connect to the service.
The ChatGPT Connector first loads the OpenAI credentials, then constructs a prompt, sends the prompt via openai.ChatCompletion.create(), and finally stores the response.
    def execute_search(self, session=None):
        logger.debug(f"{self}: execute_search()")
        if self.provider.credentials:
            openai.api_key = self.provider.credentials
        else:
            if getattr(settings, 'OPENAI_API_KEY', None):
                openai.api_key = settings.OPENAI_API_KEY
            else:
                self.status = "ERR_NO_CREDENTIALS"
                return
        prompted_query = ""
        if self.query_to_provider.endswith('?'):
            prompted_query = self.query_to_provider
        else:
            if 'PROMPT' in self.query_mappings:
                prompted_query = self.query_mappings['PROMPT'].format(query_to_provider=self.query_to_provider)
            else:
                prompted_query = self.query_to_provider
                self.warning(f'PROMPT not found in query_mappings!')
        if 'CHAT_QUERY_REWRITE_GUIDE' in self.query_mappings:
            self.system_guide = self.query_mappings['CHAT_QUERY_REWRITE_GUIDE'].format(query_to_provider=self.query_to_provider)
        if not prompted_query:
            self.found = 0
            self.retrieved = 0
            self.response = []
            self.status = "ERR_PROMPT_FAILED"
            return
        logger.info(f'CGPT completion system guide:{self.system_guide} query to provider : {self.query_to_provider}')
        self.query_to_provider = prompted_query
        completions = openai.ChatCompletion.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": self.system_guide},
                {"role": "user", "content": self.query_to_provider},
            ],
            temperature=0,
        )
        message = completions['choices'][0]['message']['content'] # FROM API Doc
        self.found = 1
        self.retrieved = 1
        self.response = message.replace("\n\n", "")
        return
ChatGPT depends on the OpenAI API key, which is provided to Swirl via the .env file. To follow this pattern, create new values in .env then modify swirl_server/settings.py to load them as Django settings, and set a reasonable default.
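A minimal sketch of that pattern follows. MY_SERVICE_API_KEY and resolve_credentials are hypothetical names, and plain os.environ is used here only to keep the example self-contained; it sees only values already present in the process environment, and Swirl's settings.py may load .env values differently.

```python
import os

# In swirl_server/settings.py (sketch): read the value from the environment,
# falling back to an empty default. MY_SERVICE_API_KEY is a hypothetical name.
MY_SERVICE_API_KEY = os.environ.get("MY_SERVICE_API_KEY", "")

def resolve_credentials(provider_credentials, settings_value):
    # Mirrors the order used in execute_search above: SearchProvider credentials
    # first, then the Django setting, otherwise nothing
    return provider_credentials or settings_value or None
```

In a Connector, a None result would correspond to the ERR_NO_CREDENTIALS branch shown in execute_search.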
- Modify the normalize_response() method to store the raw response. This is literally no more (or less) than writing the result objects out as a Python list and storing that in self.results:
    def normalize_response(self):
        logger.debug(f"{self}: normalize_response()")
        self.results = [
            {
                'title': self.query_string_to_provider,
                'body': f'{self.response}',
                'author': 'CHATGPT',
                'date_published': str(datetime.now())
            }
        ]
        return
There's no need to do this if self.response is already a Python list.
- Add the new Connector to swirl/connectors/__init__.py
from swirl.connectors.my_connector import MyConnector
- Restart Swirl
% python swirl.py restart core
- Create a SearchProvider to configure the new Connector, then add it to the Swirl installation as described in the Creating a SearchProvider section above.
Don't forget a useful tag so you can easily target the new connector when ready to test.
To learn more about developing Connectors, refer to the Developer Guide.
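Putting the Connector steps together, the overall shape can be sketched as below. To keep the example runnable on its own, the Connector class here is a stand-in stub and not Swirl's base class; the greeting attribute and the ERR_NO_QUERY status are hypothetical, and the "service call" is simulated.

```python
from datetime import datetime

class Connector:
    # Stand-in for Swirl's Connector base class, reduced to the attributes used below
    def __init__(self, provider_id, search_id, update, request_id=''):
        self.provider_id, self.search_id = provider_id, search_id
        self.update, self.request_id = update, request_id
        self.query_to_provider = ''
        self.status = ''
        self.found = self.retrieved = 0
        self.response = None
        self.results = []

class MyConnector(Connector):
    def __init__(self, provider_id, search_id, update, request_id=''):
        super().__init__(provider_id, search_id, update, request_id)
        self.greeting = 'Echo'  # hypothetical state needed by the "service"

    def execute_search(self, session=None):
        if not self.query_to_provider:
            self.status = 'ERR_NO_QUERY'  # hypothetical status value
            return
        # A real Connector would call the remote service here
        self.response = f'{self.greeting}: {self.query_to_provider}'
        self.found = self.retrieved = 1

    def normalize_response(self):
        # Wrap the raw response in a list of result dicts
        self.results = [{
            'title': self.query_to_provider,
            'body': str(self.response),
            'author': 'MyConnector',
            'date_published': str(datetime.now()),
        }]

connector = MyConnector(provider_id=1, search_id=1, update=False)
connector.query_to_provider = 'ai'
connector.execute_search()
connector.normalize_response()
```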
Creating a QueryProcessor
A QueryProcessor is a stage executed either during Pre-Query or Query Processing. The difference between these is that Pre-Query processing is applied to all SearchProviders, while Query Processing is executed by each individual SearchProvider. In both cases, the goal is to modify the query sent to some or all SearchProviders.
Note: if you just want to rewrite the query using lookup tables or regular expressions, consider using QueryTransformations instead.
To create a new QueryProcessor:
- Create a new file, e.g. swirl/processors/my_query_processor.py
- Copy the GenericQueryProcessor class as a starting point, and rename it:
 
class MyQueryProcessor(QueryProcessor):
    type = 'MyQueryProcessor'
    def process(self):
        # TO DO: modify self.query_string, and return it 
        return self.query_string + ' modified'
Save the module.
- Add the new module to swirl/processors/__init__.py
from swirl.processors.my_query_processor import MyQueryProcessor
- Add the new module to the Search.pre_query_processing pipeline or at least one SearchProvider.query_processing pipeline:
SearchProvider:
        "query_processors": [
            "AdaptiveQueryProcessor",
            "MyQueryProcessor"
        ],
Search:
  {
        "query_string": "news:ai",
        "pre_query_processors": [
          "MyQueryProcessor"
        ]
  }
- Restart Swirl
% python swirl.py restart core
- Go to http://localhost:8000/swirl/search/?q=some+query
Run a search; if you added the QueryProcessor to a SearchProvider, be sure to target that SearchProvider. For example, if you added a QueryProcessor to the query_processors pipeline of a SearchProvider with tag "news", the query would be http://localhost:8000/swirl/search/?q=news:some+query instead.
Results should appear in just a few seconds. The messages block should contain a message indicating that the new QueryProcessor rewrote the query:
MyQueryProcessor rewrote Strategy Consulting - Google PSE's query to: <modified-query> 
To learn more about writing Processors, refer to the Developer Guide.
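As a slightly more realistic illustration than appending ' modified', here is a sketch of a QueryProcessor that drops stopwords. The QueryProcessor base class below is a stand-in with an assumed constructor so the example runs on its own; StopwordQueryProcessor and its STOPWORDS set are hypothetical.

```python
class QueryProcessor:
    # Stand-in for Swirl's QueryProcessor base class; this constructor is an assumption
    def __init__(self, query_string):
        self.query_string = query_string

class StopwordQueryProcessor(QueryProcessor):
    type = 'StopwordQueryProcessor'
    STOPWORDS = {'the', 'a', 'an', 'of'}

    def process(self):
        # Drop stopwords, but never return an empty query
        kept = [t for t in self.query_string.split() if t.lower() not in self.STOPWORDS]
        return ' '.join(kept) or self.query_string

modified = StopwordQueryProcessor('the history of ai').process()
print(modified)  # history ai
```

As in the template above, process() returns the rewritten query string and leaves the decision about where it runs (pre-query or per SearchProvider) to the pipeline configuration.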
Creating a ResultProcessor
A ResultProcessor is a stage executed by each SearchProvider, after the Connector has retrieved results. ResultProcessors operate on results and transform them as needed for downstream consumption or presentation.
The GenericResultProcessor and MappingResultProcessor stages are intended to normalize JSON results. GenericResultProcessor searches for exact matches to the Swirl schema (as noted in the SearchProvider example) and copies them over. MappingResultProcessor applies result_mappings to normalize the results, again as shown in the SearchProvider example above. In general adding stages after these is a good idea, unless the SearchProvider is expected to respond in a Swirl schema format.
To create a new ResultProcessor:
- Create a new file, e.g. swirl/processors/my_result_processor.py
- Copy the GenericResultProcessor class as a starting point, and rename it. Don't forget the __init__ method.
 
class MyResultProcessor(ResultProcessor):
    def __init__(self, results, provider, query_string, request_id='', **kwargs):
        super().__init__(results, provider, query_string, request_id=request_id, **kwargs)
- Implement the process() method. This is the only one required.
process() operates on self.results, which will contain all the results from a given SearchProvider, in Python list format. Modify items in the result list, and report the number updated.
    def process(self):
        if not self.results:
            return
        updated = 0
        for item in self.results:
            # TO DO: operate on each item and count number updated
            item['my_field1'] = 'test'
            updated = updated + 1
        # note: there is no need to call save() in this type of Processor;
        # just store the modified list in self.processed_results
        self.processed_results = self.results
        # save number of updated
        self.modified = updated
        return self.modified
Save the module.
- Add the new module to swirl/processors/__init__.py
from swirl.processors.my_result_processor import MyResultProcessor
- Add the new module to at least one SearchProvider.result_processing pipeline:
        "result_processors": [
            "MappingResultProcessor",
            "MyResultProcessor",
            "CosineRelevancyResultProcessor"
        ],
         ...etc...
- Restart Swirl
% python swirl.py restart core
- Go to http://localhost:8000/swirl/search/?q=some+query
Run a search; be sure to target at least one SearchProvider that has the new ResultProcessor.
For example, if you added a ResultProcessor to the result_processors pipeline of a SearchProvider with tag "news", the query would need to be http://localhost:8000/swirl/search/?q=news:some+query instead of the above.
Results should appear in just a few seconds. The messages block should contain a message indicating that the new ResultProcessor updated a number of results, and the content should be modified as expected.
MyResultProcessor updated 5 results from: MyConnector
To learn more about writing Processors, refer to the Developer Guide.
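For a concrete picture of a ResultProcessor, the sketch below truncates long body fields and counts how many items it changed. The ResultProcessor base class is a stand-in that only mirrors the constructor signature shown above; TruncateBodyResultProcessor and MAX_BODY are hypothetical. Unlike the template, this sketch returns 0 rather than None when there are no results.

```python
class ResultProcessor:
    # Stand-in base class; mirrors the constructor signature shown above
    def __init__(self, results, provider, query_string, request_id='', **kwargs):
        self.results = results
        self.provider = provider
        self.query_string = query_string
        self.request_id = request_id
        self.processed_results = None
        self.modified = 0

class TruncateBodyResultProcessor(ResultProcessor):
    type = 'TruncateBodyResultProcessor'
    MAX_BODY = 20  # hypothetical limit, in characters

    def process(self):
        if not self.results:
            return 0
        updated = 0
        for item in self.results:
            body = item.get('body', '')
            if len(body) > self.MAX_BODY:
                item['body'] = body[:self.MAX_BODY].rstrip() + '...'
                updated += 1
        # Store the modified list and the number of items changed
        self.processed_results = self.results
        self.modified = updated
        return self.modified

results = [{'title': 'A', 'body': 'short'}, {'title': 'B', 'body': 'x' * 30}]
processor = TruncateBodyResultProcessor(results, provider=None, query_string='ai')
count = processor.process()
```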
Creating a PostResultProcessor
A PostResultProcessor is a stage executed after all SearchProviders have returned results. It operates on all the results for a given query.
To create a new PostResultProcessor:
- Create a new file, e.g. swirl/processors/my_post_result_processor.py
- Copy the template below as a starting point, and rename it:
 
class MyPostResultProcessor(PostResultProcessor):
    type = 'MyPostResultProcessor'
    ############################################
    def __init__(self, search_id, request_id=''):
        super().__init__(search_id, request_id=request_id)
    ############################################
    def process(self):
        updated = 0
        for results in self.results:
            if not results.json_results:
                continue
            updated_in_set = 0
            for item in results.json_results:
                # TO DO: operate on each result item
                item['my_field2'] = "test"
                updated_in_set = updated_in_set + 1
            # end for
            # call results.save() only if this result set was modified
            if updated_in_set > 0:
                results.save()
            updated = updated + updated_in_set
        # end for
        ############################################
        self.results_updated = updated
        return self.results_updated
Modify the process() method, operating on the items and saving each result set as shown.
- Add the new module to swirl/processors/__init__.py
from swirl.processors.my_post_result_processor import MyPostResultProcessor
- Add the new module to the Search.post_result_processing pipeline:
  {
        "query_string": "news:ai",
        "post_result_processors": [
            "DedupeByFieldPostResultProcessor",
            "CosineRelevancyPostResultProcessor",
            "MyPostResultProcessor"
        ],
        ...etc...
    }
- Restart Swirl
% python swirl.py restart core
- Go to http://localhost:8000/swirl/search/?q=some+query
Run a search; be sure the Search includes the new PostResultProcessor in its post_result_processors list.
For example, to target SearchProviders with the tag "news", the query would need to be http://localhost:8000/swirl/search/?q=news:some+query instead of the above.
Results should appear in just a few seconds. The messages block should contain a message indicating that the new PostResultProcessor updated a number of results, and the content should be modified as expected.
MyPostResultProcessor updated 10 results from: MySearchProvider
To learn more about writing Processors, refer to the Developer Guide.
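Because a PostResultProcessor sees every result set for a search, it can compute values that span providers. The sketch below numbers all items across result sets. FakeResult and the PostResultProcessor base class are stand-ins so the example runs on its own (how Swirl populates self.results is not shown here), and RankPostResultProcessor and the overall_position field are hypothetical.

```python
class FakeResult:
    # Stand-in for one SearchProvider's stored result set
    def __init__(self, json_results):
        self.json_results = json_results
        self.saved = False

    def save(self):
        self.saved = True

class PostResultProcessor:
    # Stand-in base class; self.results is filled in by hand below
    def __init__(self, search_id, request_id=''):
        self.search_id = search_id
        self.request_id = request_id
        self.results = []
        self.results_updated = 0

class RankPostResultProcessor(PostResultProcessor):
    type = 'RankPostResultProcessor'

    def process(self):
        updated = 0
        for results in self.results:
            if not results.json_results:
                continue
            for item in results.json_results:
                updated += 1
                item['overall_position'] = updated  # hypothetical field
            # every item in a non-empty set was modified, so save the set
            results.save()
        self.results_updated = updated
        return self.results_updated

processor = RankPostResultProcessor(search_id=1)
processor.results = [
    FakeResult([{'title': 'A'}, {'title': 'B'}]),
    FakeResult([]),
    FakeResult([{'title': 'C'}]),
]
total = processor.process()
```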
Join the Community
- Email: support@swirl.today with issues, requests, questions, etc - we'd love to hear from you! 
Give Swirl a star on GitHub.