ChatGPT and other OpenAI models are currently all the hype. But they are not the only solution available. Or are they? Let's try to figure it out today with general concepts, a few illustrative sketches, pain, and suffering.
Imagine that you need to create a chatbot that can answer user questions based on your own data: products in an online store, a knowledge base of a support service, marketing articles, etc. Or a list of cafes and coworking spaces, in my case.
Until recently, such dialogues came down to choosing from "Are you interested in this? - Click here. Interested in something else? - Click there. Didn't understand anything? - Wait for the operator". Not very friendly, but at least predictable and understandable; the catch is that building such complex logic takes a long time.
Ok, let's add some friendliness and create a "natural language interface" like "Show me 10 cafes with sockets close to me in London" (if marketers are to be believed, people will write even crazier things, just to find what they're looking for).
One of the first "human language recognizers" of this kind was Amazon Lex (along with Google Dialogflow and a dozen others).
Amazon Lex V2 is an AWS service for building conversational interfaces for applications using voice and text. Amazon Lex V2 provides the deep functionality and flexibility of natural language understanding (NLU) and automatic speech recognition (ASR) so you can build highly engaging user experiences with lifelike, conversational interactions, and create new categories of products.
Amazon Lex V2 enables any developer to build conversational bots quickly. With Amazon Lex V2, no deep learning expertise is necessary—to create a bot, you specify the basic conversation flow in the Amazon Lex V2 console. Amazon Lex V2 manages the dialog and dynamically adjusts the responses in the conversation. Using the console, you can build, test, and publish your text or voice chatbot. You can then add the conversational interfaces to bots on mobile devices, web applications, and chat platforms (for example, Facebook Messenger).
Sounds great: it can extract entities from plain text and pass them to a data API.
For my example above, I had to write an utterance "Show me {count} {type} with {sockets} close to me in {region}" and describe the entities. After that, the original phrase was turned into the JSON {count: 10, type: cafe, sockets: many, region: London}.
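For the curious, here is roughly what the fulfillment side of that looks like. This is a minimal sketch of a Lex V2 Lambda handler; the intent name, slot names, and the downstream API call are hypothetical, chosen to match my example utterance above.

```python
# Minimal sketch of a Lex V2 fulfillment Lambda. The intent name
# "FindPlaces", the slot names, and the downstream call are hypothetical.
def lambda_handler(event, context):
    slots = event["sessionState"]["intent"]["slots"]

    def slot_value(name):
        # Lex V2 nests the resolved value; unfilled slots come through as None
        slot = slots.get(name)
        return slot["value"]["interpretedValue"] if slot else None

    query = {
        "count": slot_value("count"),
        "type": slot_value("type"),
        "sockets": slot_value("sockets"),
        "region": slot_value("region"),
    }
    # ...pass `query` to the data API here...

    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {"name": "FindPlaces", "state": "Fulfilled"},
        },
        "messages": [{
            "contentType": "PlainText",
            "content": f"Looking for {query['type']} in {query['region']}...",
        }],
    }
```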
But here's the problem: for the similar phrase "Give me 10 coworking in Riga", a completely different utterance is needed, and the simplest query "5 workplaces nearby" needs a third one. 🤷‍♂️ I gave up after several hundred utterances of mindless word permutations. Running all the tests took about an hour.
Another pain point is dialogue: the user who just asked about coworking spaces may follow up with "What about cafes?". In Lex, context can be passed in three ways, but only a limited number of times and only through code (AWS Lambda functions).
Lex also supports another mode of use: training on a dataset of hundreds of thousands of question-and-answer pairs, with automatic responses afterwards. Probably suitable for call centers handling similar queries.
Well, let's move on to ChatGPT and the capabilities it's rumored to have.
Debunking myths: models available through the API are "dumber" than the chat web interface because they have no memory or context 🤦‍♂️
So, to work with a product catalog, you would need to transmit the entire catalog in each request, which won't even fit into the 32k-token limit of the most expensive gpt-4-32k model. And with each message, you need to re-send all previous requests and responses to maintain context.
This is how about 99.9(9)% of typical bots work, the kind that start charging users from, say, the 10th response. The curtain falls.
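To make the pain concrete, here is a sketch of that naive approach, assuming the 2023-era openai Python SDK; the catalog string, model choice, and helper function are illustrative, not a real product.

```python
# Sketch of the naive "stuff everything into the prompt" approach,
# using the 2023-era openai SDK (openai.ChatCompletion).
import openai

catalog_text = "...entire product catalog serialized as text..."  # easily beyond 32k tokens
history = []  # grows with every turn and is re-sent every time

def ask(user_message):
    history.append({"role": "user", "content": user_message})
    response = openai.ChatCompletion.create(
        model="gpt-4-32k",
        temperature=0,
        messages=[{"role": "system",
                   "content": f"Answer using this catalog:\n{catalog_text}"}] + history,
    )
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer
```

Every call re-sends the whole catalog plus the whole dialogue, so both cost and latency grow with every turn.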
In general, the second option for implementing the idea is to tokenize and vectorize the source texts, articles, or product catalogs, then tokenize and vectorize the user query and find the few closest vectors by cosine similarity.
Simplified, this is how semantic search works, and it doesn't require chat models. However, for the "friendliness" of the dialogue and responses, a chat model can be used, with the found vectors and their corresponding original texts transmitted along with the query.
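A minimal sketch of that pipeline, again assuming the openai SDK of the time; the sample documents are made up.

```python
# Minimal semantic-search sketch: embed documents once, embed the query,
# rank by cosine similarity. Uses text-embedding-ada-002 via the API.
import numpy as np
import openai

def embed(text):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

docs = ["Cozy cafe with many sockets in London",
        "Quiet coworking space in Riga, open 24/7"]
doc_vectors = [embed(d) for d in docs]  # done once, when the data is uploaded

def search(query, top_k=1):
    q = embed(query)
    # cosine similarity = dot product of the L2-normalized vectors
    sims = [np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))
            for v in doc_vectors]
    best = np.argsort(sims)[::-1][:top_k]
    return [docs[i] for i in best]

print(search("Where can I work near London with power outlets?"))
```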
Here are a few articles about it and a little more.
All of these steps can be performed with your own models, with OpenAI models (such as text-embedding-ada-002) via the API, or with any publicly available models, such as those on NLP Cloud. Vectors can even be stored in CSV files, but it's better to use a specialized vector database like Qdrant.
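Here is roughly what storing and querying those vectors in Qdrant looks like with the qdrant-client library; the collection name, payloads, and placeholder vectors are illustrative (ada-002 embeddings have 1536 dimensions).

```python
# Storing and searching vectors in Qdrant instead of a CSV file:
# a local sketch with the qdrant-client library.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # or url="http://localhost:6333" for a real server

client.recreate_collection(
    collection_name="places",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# In real use these vectors come from the embedding step above;
# here they are placeholders so the sketch is self-contained.
docs = ["Cozy cafe with many sockets in London",
        "Quiet coworking space in Riga, open 24/7"]
doc_vectors = [[0.1] * 1536, [0.2] * 1536]

client.upsert(
    collection_name="places",
    points=[PointStruct(id=i, vector=vec, payload={"text": doc})
            for i, (doc, vec) in enumerate(zip(docs, doc_vectors))],
)

query_vector = [0.1] * 1536  # embed the user query the same way
hits = client.search(collection_name="places", query_vector=query_vector, limit=3)
for hit in hits:
    print(hit.score, hit.payload["text"])
```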
Nowadays, this approach is becoming a de facto standard, and most search plugins for ChatGPT are based on it.
There are several advantages:
- low cost of vectorization (it's only performed when the source data is uploaded/modified) and storage, especially in a local database
- a limited ability to maintain context by carrying the entire dialogue along with each request to the chat model
There are also some disadvantages:
- the source data must be text-based, or more precisely, descriptive (my cafe catalog made of enum parameters in JSON format didn't work)
- and there must be a lot of it
Services have already appeared that simplify the entire stack, from data preparation to ready-made chat code for your website, for example Databerry and Spellbook. There are also good alternative models, such as Vicuna.
After the vector experiments, I switched to the third option: translating human requests into JSON using a chat model.
It turned out to be the easiest, cheapest, and fastest way to implement my initial idea 😎
The model is given instructions that are transmitted together with each user request:
Convert the question below to JSON data.
Mostly questions are related to cafes and coworkings with different amenities.
Use only the following parameters.
Skip unknown parameters and parameters that are not in the question.
Just output JSON data without explanation, notes or error messages!
Parameters
"""
- count: integer from 0 to 5
- type: one of "Cafe", "Coworking" and "Anticafe"
- region: any city
- sockets: one of "None", "Few" and "Many"
- noise: one of "Quiet", "Medium" and "Noisy"
- size: one of "Small", "Average" and "Big"
- busyness: one of "Low", "Average" and "High"
- view: one of "Street", "Roofs" and "Garden"
- cuisine: one of "Coffee & snacks" and "Full"
- roundclock: one of true and false
"""
In most cases, a perfectly normal JSON response like {count: 5, type: cafe, sockets: many, region: London} is returned, which can be passed on to a microservice API.
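For illustration, here is roughly how that request might be wired up, again assuming the 2023-era openai SDK; INSTRUCTIONS stands for the prompt quoted above, and the helper name is mine.

```python
# Sending the instruction prompt plus the user's question, "frozen"
# with temperature=0. json.loads will raise on the occasional
# malformed response, which the next section deals with.
import json
import openai

INSTRUCTIONS = """Convert the question below to JSON data.
... (the full instruction text and parameter list quoted above) ...
"""

def question_to_json(question):
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[{"role": "system", "content": INSTRUCTIONS},
                  {"role": "user", "content": question}],
    )
    return json.loads(resp.choices[0].message.content)

print(question_to_json("Show me 5 cafes with sockets close to me in London"))
# roughly: {'count': 5, 'type': 'Cafe', 'sockets': 'Many', 'region': 'London'}
```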
But a text generation model wouldn't be a text generation model if it always responded the same way (even when "frozen" with temperature=0). Approximately 10% of the time it gets stuck: it adds non-existent parameters or forgets to close the JSON, while the same request repeated is processed normally.
It's pointless to fight this, but you can strip non-existent parameters and invalid values by validating the response against a JSON Schema, and also suggest that the user ask the bot again.
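A sketch of that defensive step, using the jsonschema library; the schema simply mirrors the parameter list from the prompt.

```python
# Drop keys the model invented, then validate what's left against a
# JSON Schema that mirrors the prompt's parameter list.
from jsonschema import ValidationError, validate

SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "count": {"type": "integer", "minimum": 0, "maximum": 5},
        "type": {"enum": ["Cafe", "Coworking", "Anticafe"]},
        "region": {"type": "string"},
        "sockets": {"enum": ["None", "Few", "Many"]},
        "noise": {"enum": ["Quiet", "Medium", "Noisy"]},
        "size": {"enum": ["Small", "Average", "Big"]},
        "busyness": {"enum": ["Low", "Average", "High"]},
        "view": {"enum": ["Street", "Roofs", "Garden"]},
        "cuisine": {"enum": ["Coffee & snacks", "Full"]},
        "roundclock": {"type": "boolean"},
    },
}

def clean(data):
    # remove parameters the model made up, then validate the rest
    data = {k: v for k, v in data.items() if k in SCHEMA["properties"]}
    try:
        validate(instance=data, schema=SCHEMA)
        return data
    except ValidationError:
        return None  # suggest the user ask the bot again
```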
By the way, the latest gpt-4 model turned out to be "smarter" and more predictable (it doesn't add random parameters), but it costs 6 times more. We're waiting for gpt-4-turbo.
You can try the resulting bot here: @WorkplacesDigitalBot. Its budget is $10/month until it starts earning on its own, and it doesn't save context.