Thomas Hansen for AINIRO.IO

Originally published at ainiro.io

Integrating ChatGPT with DuckDuckGo

Earlier this morning OpenAI turned off Bing integration in ChatGPT. A couple of hours later we released DuckDuckGo support for our chatbots. You're welcome 😁

You can try it out by clicking our AI chatbot button in the bottom right corner of this page and writing something like the following: DuckDuckGo "Pixar"

The double quotes are important, because we use them to infer what you want to send to DuckDuckGo, as opposed to the prompt we use to semantically search our database and send to ChatGPT to transform your query into an answer of some sort. The recipe for searching is as follows:

DuckDuckGo "xyz search query here"

Currently it only returns the top 5 results, and the DuckDuckGo query is cached on the server for 5 minutes, but both of these can easily be modified.

How it works

When you create a training snippet for your semantic search database, you can add Hyperlambda code as an integrated part of your training snippet. This Hyperlambda code is not part of the training snippet when we create embeddings for it. When we generate vectors for the training snippet using text-embedding-ada-002, everything OpenAI sees is the following:

DuckDuckGo

The result is that if you start any query towards our VSS database with the word "DuckDuckGo", this snippet will bubble to the top. The training snippet looks as follows:

{{
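// Buffer node that will hold the DuckDuckGo search URL.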
.url
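// Splitting the prompt on double quotes to extract the search query the user wrapped in quotes.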
strings.split:x:@.arguments/*/prompt
   .:"\""
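// Constructing the DuckDuckGo HTML search URL from the extracted query.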
set-value:x:@.url
   strings.concat
      .:"https://html.duckduckgo.com/html/?q="
      strings.url-encode:x:@strings.split/1
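// Forward evaluating the [url] argument below, then invoking the slot that scrapes the page and returns its hyperlinks as Markdown.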
unwrap:x:+/*
signal:magic.http.html2markdown-links
   url:x:@.url
   class:result__a
   max:int:5
   query:uddg
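// Buffer for the Markdown result, assembled below and returned as context for ChatGPT.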
.result
set-value:x:@.result
   strings.concat
      .:"Below are the top 5 links I could find at DuckDuckGo for the query \""
      get-value:x:@strings.split/1
      .:"\":"
      .:"\r\n"
      .:"\r\n"
      get-value:x:@signal
return:x:@.result
}}

Notice, we could write any static text before and after the {{}} parts. This text would be used during vectorisation - however, for the DuckDuckGo use case, the above is sufficient. Once the above snippet is used as context for ChatGPT, it will invoke [strings.mixin] for the training snippet. To see the result, you can try executing the following code in your Hyperlambda Playground, assuming you've created the above training snippet for your foo model and vectorised it.

signal:magic.ai.get-context
   type:foo
   vector_model:text-embedding-ada-002
   prompt:@"DuckDuckGo ""Bill Gates"""
   threshold:float:0.6
   max_tokens:int:2000

This will find your DuckDuckGo training snippet and execute its Hyperlambda code, resulting in something like the following being returned:

DuckDuckGo

Below are the top 5 links I could find at DuckDuckGo for the query "Bill Gates":

* [Bill Gates | Biography, Microsoft, & Facts | Britannica](https://www.britannica.com/biography/Bill-Gates)
* [Bill Gates - Wikipedia](https://en.wikipedia.org/wiki/Bill_Gates)
* [Bill Gates - Forbes](https://www.forbes.com/profile/bill-gates/)
* [Bill Gates meets privately with Xi Jinping as US-China tensions rise ...](https://www.cnn.com/2023/06/16/business/bill-gates-china-xi-jinping-visit-intl-hnk/index.html)
* [Xi Jinping meets Bill Gates in China, calls him 'an old friend'](https://www.reuters.com/world/china/chinas-president-xi-meet-with-bill-gates-beijing-state-media-2023-06-16/)

This is then used as a "primer message" for ChatGPT, or as "context" to answer the question. And the question, of course, is:

DuckDuckGo "Bill Gates"

Using ANY website as an API

Our website scraping technology can use (almost) any website as an "API". You can see this from the arguments passed into the [magic.http.html2markdown-links] slot above. The arguments are as follows, and a standalone example of invoking the slot follows the list.

  • url - The URL to fetch. In the DuckDuckGo snippet above, this is dynamically constructed to retrieve HTML from DuckDuckGo containing search results matching your query.
  • class - This works as a filter, and will only return links with a CSS class name matching the specified class. This allows you to filter out irrelevant navigation links in our DuckDuckGo example.
  • max - This is the maximum number of hyperlinks to return.
  • query - This is for cases where the URL you're interested in is obfuscated inside some sort of redirect URL, which is the case for DuckDuckGo. DuckDuckGo doesn't expose the actual URL, but rather a "redirect URL", from whose query parameters we're able to intelligently parse out the original URL and return it directly. For weird reasons ChatGPT refuses to return URLs pointing to DuckDuckGo, probably because Microsoft paid them I assume ...? 😜
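
If you want to experiment with the slot in isolation, you can execute something like the following in your Hyperlambda Playground. This is only a sketch; it hardcodes the search query instead of extracting it from a prompt, but the slot and its arguments are exactly the ones used in the training snippet above.

.url
set-value:x:@.url
   strings.concat
      .:"https://html.duckduckgo.com/html/?q="
      strings.url-encode:"Bill Gates"
unwrap:x:+/*
signal:magic.http.html2markdown-links
   url:x:@.url
   class:result__a
   max:int:5
   query:uddg

After execution, the value of the [signal] node should contain the Markdown list of links, similar to the output shown earlier.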

The above slot, of course, turns any website into Markdown, which ChatGPT happens to understand perfectly, allowing you to mix ChatGPT with the search results returned from DuckDuckGo. The end result becomes as follows.

[Screenshot: DuckDuckGo and ChatGPT]

Notice, we don't actually retrieve the websites DuckDuckGo returns, even though we could in theory have done that too, and used them as part of our dynamic context to answer real-time questions. However, the ability to have ChatGPT work with live data implies we can have ChatGPT work with anything - including your own local database if you wish. Something we demonstrate below.

Integrate ANYTHING with ChatGPT

All in all, this allows us to literally integrate EVERYTHING with ChatGPT, as long as you can provide us with the data we need. And we can use (almost) any website URL as an "API" for read-only data, injecting it into the text stream we're sending to ChatGPT, having ChatGPT work on real-time data.
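
As a purely hypothetical illustration of the pattern, a training snippet for some other data source could reuse the exact same slot. The sketch below follows the structure of the DuckDuckGo snippet above, but the static text, the URL, and the CSS class name are made-up placeholders you would replace with values matching the site you want to scrape. The [query] argument is omitted, since as explained above it is only needed when the site hides the real URL behind a redirect URL.

Latest product news

{{
// Hypothetical example - returning the 5 most recent links from some (made up) news page as context.
.result
signal:magic.http.html2markdown-links
   url:"https://example.com/news"
   class:article-link
   max:int:5
set-value:x:@.result
   strings.concat
      .:"Below are the latest links from our news page:"
      .:"\r\n"
      .:"\r\n"
      get-value:x:@signal
return:x:@.result
}}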

You're welcome 😁

FYI: All the security concerns OpenAI used as arguments to shut off Bing support do not apply to our solution 😎

Top comments (7)

AriaNygard

This is such an amazing update, I barely even have words. And within just a day of releasing this, you were able to release an even better internet search than this... I'm speechless!

Thomas Hansen

Thx, yes - Now we've got "search the web, scrape results, and chat with content" - Which is 10x better than this ... ;)

AriaNygard

Exactly!

TageLeander

Genius to release this the moment Microsoft turned off Bing. Was this planned?

Thomas Hansen

Hehe, no, it was luck. Marc Mekki had a status update about it on LinkedIn, and then I realised that if I hustled, I could wrap it up in a couple of hours ... ^_^

johnblommers

But the usual privacy implications are still in effect, right? Our DuckDuckGo prompt data is sent to your website. So agents Smith and Johnson can get hold of it.

Thomas Hansen

The prompt is sent to our backend, where we extract everything between the double quotes (the "query"). Then we send that to DuckDuckGo. We never log IP addresses or any fingerprints allowing us to identify you or your location. Then we take the top 5 search results from DuckDuckGo and send them together with your entire prompt to OpenAI. Yet again, we never send or forward IP addresses or any browser fingerprints, neither to DuckDuckGo nor to OpenAI.

For a professional and/or an enterprise cloudlet you even get a private database and a private Kubernetes pod, where we won't even access anything related to your stuff, since it's (duuh!) yours.

I suspect it would be almost impossible to find something "more private" than this while still being able to leverage AI and GPT, short of installing a local open source GPT model on your own server (which I don't recommend) ...

To verify that everything I'm saying above is true, 100% of our platform is open source, in case you want to scrutinise it for yourself ^_^

Of course, you'd have to pay us at some point if you want this in your own private solution, at which point you'd have to pull out a debit card, identifying you - but we'll never access your data without your explicit consent in any way ...