This November, I was very happy to submit two more PRs (286 and 302) to ChatCraft. Although it may still take some time for them to be merged, the good news is that the basic functionality is already working. I would like to record the process and the challenges I encountered.
PR Background
With an important update, the GPT-4 model can now take in images and answer questions about them. It would therefore be great if ChatCraft could also send images, and I am excited to have the opportunity to be involved in developing this feature.
Development process
Stage 1: An easy start
Since this new feature touches many components, I first submitted an implementation with minimal changes, on the theory that the fewer files I changed, the smaller the impact would be. Here is my first PR. In it, I changed only 3 files, but I was able to successfully call the API to send images and get answers to questions about them.
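To give a sense of what that minimal change does, here is a rough sketch of a vision request using the openai Node library. It is not the exact code from the PR; the model name, question, and imageDataUrl variable are placeholders.

```ts
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// imageDataUrl is a placeholder: a data: URL (or https URL) for the attached image
async function askAboutImage(imageDataUrl: string, question: string) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4-vision-preview",
    messages: [
      {
        role: "user",
        // A vision request mixes text and image_url parts in one user message
        content: [
          { type: "text", text: question },
          { type: "image_url", image_url: { url: imageDataUrl } },
        ],
      },
    ],
  });
  return completion.choices[0]?.message?.content;
}
```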
However, there were still many issues that needed to be resolved.
The major issues with this PR included: the images sent did not appear in the history, the returned messages were incomplete, and the user experience was not intuitive enough (I used a drag-and-drop method, which is less intuitive than a button). Professor Humphrey provided many detailed and valuable suggestions, which were a significant help for my subsequent development. Shortly thereafter, Taras Glek offered me the role of Collaborator, which allowed new updates to be deployed quickly. I'm deeply appreciative of this opportunity.
Stage 2: Investigate and solve issues
In the following days, I first attempted to resolve the issue of the response message being cut off, since it affects normal use and therefore had the higher priority.
The process of finding the bug
I used a small Python script to make the same request and inspect what the raw response looks like.
From the details of the response, I realized that the image was likely too large, causing the token count to hit the threshold. So I located the OpenAI.Chat.ChatCompletionCreateParams type for the token limit and set max_tokens in src/lib/ai.ts to a particularly high value to verify my hypothesis. As a result, I found the upper limit of tokens: 4096.
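As a rough sketch of what that change looks like (the wrapper function and variable names below are my own simplification, not the actual contents of src/lib/ai.ts), the fix is just an extra field on the create params:

```ts
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Hypothetical wrapper: the real src/lib/ai.ts is organized differently,
// but the essential change is passing max_tokens on ChatCompletionCreateParams.
async function chatWithImages(messages: OpenAI.Chat.ChatCompletionMessageParam[]) {
  const params: OpenAI.Chat.ChatCompletionCreateParams = {
    model: "gpt-4-vision-preview",
    messages,
    // Without this, long answers to image questions were cut off in my testing
    max_tokens: 4096,
  };
  return openai.chat.completions.create(params);
}
```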
It should be noted that this seems like it should be the default setting, as Zima said:
"By default, the number of tokens the model can return will be (4096 - prompt tokens)."
The official documentation describes the same default. However, even after updating to openai@v4.20.0, I still needed to set max_tokens explicitly.
Stage 3: Refinement and Enhancement
Next came continuous improvement. In this stage, I updated more files, 10 in total.
I completed the following updates:
1. Updating the model based on file type. I achieved this by using setSettings from src/hooks/use-settings.tsx:
```tsx
// Update model to the supported model when inputImages is not empty
useEffect(() => {
  if (inputImages && inputImages.length > 0) {
    const modelSupportsImages = models.find((model) => model.supportsImages);
    if (modelSupportsImages) {
      setSettings({ ...settings, model: modelSupportsImages });
    }
  }
  // Assuming models and setSettings are stable and don't need to be in the dependencies array
  // eslint-disable-next-line
}, [inputImages]);
```
2. Attaching files. I added a ClipIcon component to implement attaching, modeled on src/components/PromptForm/MicIcon.tsx. A rough sketch of the idea follows.
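The sketch below is not the actual ClipIcon from the PR; it only illustrates the pattern of a small icon button that triggers a hidden file input. The Chakra UI components and the FiPaperclip icon are assumptions on my part.

```tsx
import { useRef } from "react";
import { IconButton, Input } from "@chakra-ui/react";
import { FiPaperclip } from "react-icons/fi";

type ClipIconProps = {
  onFilesSelected: (files: File[]) => void;
};

// Hypothetical attach button, modeled loosely on MicIcon.tsx
export default function ClipIcon({ onFilesSelected }: ClipIconProps) {
  const fileInputRef = useRef<HTMLInputElement>(null);

  return (
    <>
      {/* Hidden file input; the visible button just forwards the click */}
      <Input
        ref={fileInputRef}
        type="file"
        accept="image/*"
        multiple
        display="none"
        onChange={(e) => onFilesSelected(Array.from(e.target.files ?? []))}
      />
      <IconButton
        aria-label="Attach a file"
        icon={<FiPaperclip />}
        variant="ghost"
        onClick={() => fileInputRef.current?.click()}
      />
    </>
  );
}
```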
3. Improved preview when the attached file is an image. Clicking the small thumbnail displays a larger version, centered in the window with a "close" button. I achieved this by using a Modal to display the larger version, as sketched below.
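Here is a minimal sketch of that idea, assuming Chakra UI's Modal and useDisclosure; the component name and props are placeholders rather than the exact code in the PR.

```tsx
import {
  Image,
  Modal,
  ModalBody,
  ModalCloseButton,
  ModalContent,
  ModalOverlay,
  useDisclosure,
} from "@chakra-ui/react";

// Hypothetical preview: a clickable thumbnail that opens a centered, enlarged view
export default function ImagePreview({ src }: { src: string }) {
  const { isOpen, onOpen, onClose } = useDisclosure();

  return (
    <>
      {/* Small thumbnail shown next to the prompt */}
      <Image src={src} boxSize="100px" objectFit="cover" cursor="pointer" onClick={onOpen} />
      <Modal isOpen={isOpen} onClose={onClose} isCentered size="xl">
        <ModalOverlay />
        <ModalContent>
          <ModalCloseButton />
          <ModalBody>
            {/* Larger version of the same image */}
            <Image src={src} width="100%" />
          </ModalBody>
        </ModalContent>
      </Modal>
    </>
  );
}
```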
4. Keeping images in the chat history. I achieved this by adding an image field to the DB and to ChatCraftMessage; see the simplified sketch below.
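As a rough illustration (the field name and class shape below are simplified assumptions, not the PR's exact schema), the message record simply gains a place to store the attached images:

```ts
// Simplified, hypothetical shape of a persisted message with images;
// ChatCraft's real ChatCraftMessage and DB schema are more involved.
type SerializedMessageWithImages = {
  id: string;
  type: "human" | "ai";
  text: string;
  // data: URLs (or storage keys) for any attached images
  images?: string[];
};

class MessageWithImages {
  constructor(
    public id: string,
    public type: "human" | "ai",
    public text: string,
    public images: string[] = []
  ) {}

  // Plain object written to the browser database
  serialize(): SerializedMessageWithImages {
    return { id: this.id, type: this.type, text: this.text, images: this.images };
  }

  static parse(data: SerializedMessageWithImages) {
    return new MessageWithImages(data.id, data.type, data.text, data.images ?? []);
  }
}
```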
Summary
This PR updated the desktop version's UI, giving users the ability to send images and ask GPT-4 questions about them. I made a series of updates based on the feedback I received. The scope of this PR is noticeably larger than my PRs in October, and it has been in progress for longer. I do feel the pressure of time, and I will continue making updates until it is merged.