Most artificial intelligence (AI) tools are either closed-source or require a server to function. However, with the help of Ollama, it is possible to create simple yet powerful local AI tools without any of these constraints. In this blog post, I will demonstrate how I built my own local AI tool using Ollama's user-friendly interface and flexible architecture. By doing so, I aim to showcase the ease with which one can leverage AI capabilities without relying on proprietary software or remote servers.
An AI image of a bird robot writing
I recently developed an AI-powered writing assistant using Ollama. To build this innovative tool, I leveraged Vue.js and the Ollama JavaScript package, both of which proved to be indispensable in the development process. By harnessing the power of these advanced technologies, I was able to create a user-friendly interface that streamlines the writing process and yields high-quality content with ease.
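For readers who want to follow along, the only setup this requires is a Vue project with the Ollama JavaScript client installed. Below is a minimal sketch, assuming npm and the standard create-vue scaffolding; the project name ai-writer is just a placeholder.

# Scaffold a Vue 3 project and add the Ollama JavaScript client
npm create vue@latest ai-writer
cd ai-writer
npm install ollama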
The solution only works on Linux, Windows Subsystem for Linux (WSL), and possibly macOS.
A screenshot of AI writer showing how the first draft of this blog was made.
Integrate into JavaScript
Ollama sends the message to the Llama 2 model, and the response can be returned as a stream, allowing for a seamless and interactive experience. By displaying the model's response in real time, users no longer have to stare at loading bars. Streaming is also simple to program, as the snippet below shows.
It uses a model named blog, which I create in the next section.
import { ref } from 'vue'
import ollama from 'ollama'

// Reactive output that the template renders while the model streams its answer
const output = ref("")

async function sendToModel(message: string) {
  // Empty the output
  output.value = ""
  // Send the message to Ollama
  // It will use the model named 'blog'
  const response = await ollama.chat({
    model: 'blog',
    messages: [{ role: 'user', content: message }],
    stream: true
  })
  // Append each part of the stream to the output in real time
  for await (const part of response) {
    output.value += part.message.content
  }
}
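To give a fuller picture, here is a minimal sketch of how this function could be wired into a Vue single-file component. The draft ref, the textarea, and the Rewrite button are my own illustration of a possible UI, not necessarily how the original app is built.

<script setup lang="ts">
// ...output and sendToModel from the snippet above...
// Hypothetical ref holding the user's rough draft
const draft = ref("")
</script>

<template>
  <!-- Type a rough idea, click Rewrite, and watch the streamed rewrite appear -->
  <textarea v-model="draft" placeholder="Write your rough idea here"></textarea>
  <button @click="sendToModel(draft)">Rewrite</button>
  <p>{{ output }}</p>
</template>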
Customizing the model
Ollama's default model is Llama 2, which is suitable for casual conversation. However, I wanted it to generate rewritten versions of my initial ideas instead of answering them. To achieve this, I created a custom Modelfile with two key adjustments. First, I lowered the temperature to 0.5, which produces less creative output that still conveys my intended message. Second, I changed the system message so the model rewrites the input text rather than responding to it. The resulting Modelfile is shown below.
FROM llama2
# set the temperature to 0.5 [higher is more creative, lower is more coherent]
PARAMETER temperature 0.5
# set the system message
SYSTEM """
Rewrite the following into a paragraph of a technical blog with simple and formal English.
"""
To use the new Modelfile, I executed the following command, which creates a model named "blog". With this model in place, I can call it from my JavaScript application.
ollama create blog -f ./Modelfile
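Before wiring the model into the app, it can be sanity-checked straight from the terminal with ollama run; the prompt below is just an example.

ollama run blog "ollama lets you run llms locally without sending your text to a server"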
Conclusion
In conclusion, Ollama provides an accessible and user-friendly way to use Large Language Models (LLMs) in local projects. It is easy to customize to individual needs and preferences, making it a valuable tool for anyone looking to leverage the power of LLMs locally and unlock their full potential.
Top comments (5)
Tom, this is a stellar walkthrough on setting up a local AI writing assistant with Ollama. It's refreshing to see AI tools being developed with privacy and local execution in mind.
Thank you, it still feels so weird that Meta is the one to help with privacy 😄
Meta is pretty good on the developer side. Their maintenance of React and GraphQL has been great.
True, but I did not see an LLM as a dev tool. Especially if you see what OpenAI gets in revenue from a chatbot 😯
Setting up local stuff has gotten so much easier lately. As a Java developer I tested setting up my own custom web app UI with Ollama, and it takes like 20 lines of UI code and 2 commands to get Mistral running locally in a Docker container. Now I can build local AI systems for anything! But my question is: is there a way to use GPUs in local containers to speed up LLMs?
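For reference, Ollama's official Docker image can use NVIDIA GPUs if the host has the NVIDIA Container Toolkit installed. A sketch based on the ollama/ollama Docker documentation (the container name and Mistral model are just examples):

# Start the Ollama server with GPU access (requires the NVIDIA Container Toolkit on the host)
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Pull and chat with a model inside the container
docker exec -it ollama ollama run mistral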