Ataur Rahman

From Terminal to UI: Building Your First Local AI Assistant with Node.js

Hi everyone! How's your journey with AI going? Each day feels more exciting than the last. We're living through a technological revolution, witnessing rapid innovation in AI like never before.

I won't spend time here trying to explain what AI is capable of - that's already clear. The real question is: how can we benefit from it? If I can complete 10 tasks in a day, but AI helps me get those done in half the time, I can spend the rest doing more meaningful work - or just resting. That's the magic of automation.

But let's be clear:

We shouldn't become addicted to AI. Instead, we should learn how to make the most of it. That means staying updated and understanding the fundamentals - what's happening behind the scenes. Once you grasp how current AI systems work, you'll find yourself ready to build and innovate with confidence.

Quick Note Before We Begin

Apologies for the delay - it's been 1.5 months since my last post. I've been under the weather, dealing with job pressure, and learning a lot of new things. But now I'm back, and the good news is: I've already finished testing demo apps for the next 5–6 posts! That means new content will be rolling out much faster - so stay tuned.

In my previous blog, I explained the essential tools and topics like Ollama, LangChain, and how local models work. I won't repeat those here - please check out that post if you haven't yet; all the details are there. In this post I'll only mention how I'm using them. Before jumping into the application, make sure you've read up on those topics.

Prerequisites

Make sure you have Ollama installed and running locally, along with the llama3.2:3b-instruct-q4_K_M model. You can pull and run the model using:

ollama run llama3.2:3b-instruct-q4_K_M

To check your setup, run the command above in your terminal. You should see the model response interface. That confirms everything is ready.

Testing in the terminal to check that everything is OK

You can see all the models available on your machine by running the ollama ls command. I have 5 models, as you can see in the screenshot. If this is your first time and you only followed the Prerequisites, your list will show just one.

Let's get started:

So, what are we going to do in this tutorial?

If you run the "ollama run llama3.2:3b-instruct-q4_K_M" command in your terminal, you should see the model's response interface, and you can have a conversation with the model. You can end the conversation by sending the "/bye" message.

Interacting with the model from the terminal

But what is our goal?
Building our own AI assistant with a lot of capabilities - which is not possible from the terminal alone. We need an application that interacts with the model and adds smart capabilities on top. Before talking about those capabilities, we need a basic application where we can interact with the model from a UI. In other words, what we are doing now from the terminal, our application should do for us - no more terminal needed. We will talk to the model through our application, and the application will manage how that interaction happens.

Basic AI assistant

How can we achieve that?
I will do everything in TypeScript (JS). So, I chose Next.js for the frontend and Node.js (with the Express.js framework) for the backend. A simple Express.js application can handle our basic needs.

First step: start with the backend.

Initialize a Node.js application and set up a basic Express app with the following commands:

npm init   # initialize the project

# install packages
npm install express @types/express cors @types/cors dotenv @langchain/core @langchain/community @langchain/ollama

After installing the packages, create an index.ts file with a basic /stream route:

import cors from "cors";
import dotenv from "dotenv";
import express from "express";

dotenv.config();

const PORT = process.env.PORT || 9000;

const app = express();
app.use(express.json());
app.use(cors());

/* 
    This is the endpoint that will be used to stream the response from the AI to the client.
*/
app.post("/stream", async (req, res) => {
    //we will do our business logic here step by step
});

app.listen(PORT, () => {
    console.log(`Server is running on port ${PORT}`);
});

In this route, we expect the user to send at least a message, like this:

{
  "message":"Hi"
}

And inside the /stream route, we will catch and process it like this:

    // Requires two more imports at the top of index.ts:
    // import { ChatOllama } from "@langchain/ollama";
    // import { HumanMessage } from "@langchain/core/messages";

    try {
        const { message } = req.body;
        if (!message) throw Error('Message is required');

        const model = new ChatOllama({
            model: "llama3.2:3b-instruct-q4_K_M",
            baseUrl: "http://localhost:11434",
            temperature: 0,
        })

        const formatedMessages = [
            new HumanMessage(message)
        ];

        const stream = await model.stream(formatedMessages);

        for await (const chunk of stream) {
            const content = chunk?.content;

            console.log(content);

        }

        res.end();
    } catch (err) {
        console.error("Stream error:", err);
        res.status(500).end("Stream error");
    }

Wait, wait, wait! We're not done yet. Before jumping to the next step, let's first understand what I did here.

We are configuring our model in code via LangChain. The ChatOllama class comes from the @langchain/ollama package, and this is where we set the model information. When we install Ollama, it exposes port 11434 by default, and we can communicate with Ollama through that port. Now the question is:

Since Ollama itself exposes a REST API we could talk to, why are we using LangChain? Why not Ollama directly?

If you have this question in your head, I will say: good catch. You are really curious about how AI works. Well, back to the point. To understand this better, you first need to know what Ollama and LangChain are. As I mentioned, I have a detailed post about those topics, which I highly recommend reading. But I will summarize the idea again here briefly.

We could communicate with Ollama through its default REST API, and it would work. But then our application would be tightly coupled to Ollama. In that case, we could not easily use other models like OpenAI, Claude, or Gemini in our system - actually we could, but we would need a different configuration for each provider. This is where LangChain comes in. In our current context, think of LangChain as a wrapper around all model configurations, so the configuration looks similar for every provider. Here we are using the Ollama provider, so LangChain offers us the ChatOllama class to communicate with our local model. You can find the other provider integrations in the LangChain docs.
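Just to make that concrete, here is a small sketch (not part of our app) of what switching providers looks like. I'm assuming the @langchain/openai package and an OPENAI_API_KEY for the second example - notice that only the model class changes, while the calling code stays the same:

import { ChatOllama } from "@langchain/ollama";
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";

// Local model served by Ollama
const localModel = new ChatOllama({
    model: "llama3.2:3b-instruct-q4_K_M",
    baseUrl: "http://localhost:11434",
});

// Hosted model (assumes @langchain/openai is installed and OPENAI_API_KEY is set)
const hostedModel = new ChatOpenAI({
    model: "gpt-4o-mini",
});

// Both classes share the same interface, so the surrounding code doesn't change
const messages = [new HumanMessage("Hello!")];
const localReply = await localModel.invoke(messages);
const hostedReply = await hostedModel.invoke(messages);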

Right now this config looks very simple, so you may be wondering what a more complex configuration would look like. I know. Keep searching for answers, and let me know your questions in the comment section too.

To configure the model here, we need to pass which model we want to use and what the endpoint is. In our case, we installed llama3.2:3b-instruct-q4_K_M earlier, and the base URL is the provider URL - our provider is Ollama, which exposes port 11434.

const model = new ChatOllama({
    model: "llama3.2:3b-instruct-q4_K_M",
    baseUrl: "http://localhost:11434",
    temperature: 0,
});

const formatedMessages = [
    new HumanMessage(message)
];

const stream = await model.stream(formatedMessages);

Why did I wrap the user message in the HumanMessage class, and why did I put it in an array?
Answer: Well, we could pass the user message directly, but using this class ensures the message is shaped the way the model expects, and the array lets us pass a whole conversation of messages later. You can also call it like this, but the first approach is recommended.

const stream = await model.stream("Hello, how are you?");
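The array also matters as soon as you want more than one role in the conversation. As a small sketch (we don't need it yet in this app), you could prepend a SystemMessage to steer how the assistant answers:

import { SystemMessage, HumanMessage } from "@langchain/core/messages";

// `model` and `message` are the same as in the /stream route above
const formatedMessages = [
    // System instructions shape how the model should answer
    new SystemMessage("You are a concise assistant. Answer in one short paragraph."),
    // The actual user input
    new HumanMessage(message),
];

const stream = await model.stream(formatedMessages);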

If you run your code and hit your endpoint, you will see some console logs in your terminal. If you remember, we put a console.log of the content inside the for loop.

terminal response

Yooo man, congratulations!!! You have successfully completed the first step. The output looks almost the same as when we communicate with the model directly. Now let's move on to streaming the response.
After the stream line, update your code with the following:

  const stream = await model.stream(formatedMessages);

        // await streamChunksToTextResponse(res, stream);
        res.setHeader("Content-Type", "text/plain");
        res.setHeader("Transfer-Encoding", "chunked");

        for await (const chunk of stream) {
            const content = chunk?.content;

            console.log(content);

            if (typeof content === "string") {
                res.write(content);
            } else if (Array.isArray(content)) {
                for (const part of content) {
                    if (typeof part === "object" && "text" in part && typeof part.text === "string") {
                        res.write(part.text);
                    }
                }
            }
        }

        res.end();
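Side note: the chunk-writing logic can make the route a bit noisy, so you may prefer to pull it into a small helper. Here is one way to sketch that - the name and shape are my own, not an official API:

import type { Response } from "express";

// Writes every text chunk from a LangChain stream into an Express response.
// A sketch only - adjust it to the chunk shapes your model actually returns.
async function writeChunksToResponse(res: Response, stream: AsyncIterable<{ content?: unknown }>) {
    res.setHeader("Content-Type", "text/plain");
    res.setHeader("Transfer-Encoding", "chunked");

    for await (const chunk of stream) {
        const content = chunk?.content;
        if (typeof content === "string") {
            res.write(content);
        } else if (Array.isArray(content)) {
            for (const part of content) {
                if (typeof part === "object" && part !== null && "text" in part && typeof part.text === "string") {
                    res.write(part.text);
                }
            }
        }
    }

    res.end();
}

// Usage inside the /stream route:
// await writeChunksToResponse(res, await model.stream(formatedMessages));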

After updating your code, if you hit your endpoint with a tool like Postman, you will see the streamed response.

Postman response
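If you prefer testing from code instead of Postman, a tiny script like this also works (assuming Node 18+, which ships with fetch; the URL matches the Express server above):

// test-stream.ts - prints the streamed response chunk by chunk
const res = await fetch("http://localhost:9000/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message: "Tell me a short joke" }),
});

if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);

const reader = res.body.getReader();
const decoder = new TextDecoder();

while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    process.stdout.write(decoder.decode(value));
}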

Wow! Your backend API is ready. Now you can use it from the frontend to show the response in a chat interface.
Actually, I am a Node.js developer, but I know a little frontend - minus the CSS 😁😁. So I am only sharing the request-handling part here, not how to set up the entire project. I hope you know how to create a Next.js application and can design a chat interface. Or you can use v0.dev for the design, like I did. Let's jump into the next part.

First, I create a Next.js API route that calls my backend API and forwards the response to my client (where I am calling this route):

// app/api/chat/route.ts
import { NextRequest } from 'next/server';

export const runtime = "edge";

export async function POST(req: NextRequest) {
  const { message } = await req.json()

  console.log({ message });

  const response = await fetch("http://localhost:9000/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      message
    }),
  });

  return new Response(response.body, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
    },
  })
}

On the client, I request my Next.js API like the following and handle the response there:


const res = await fetch('/api/chat', {
        method: 'POST',
        body: JSON.stringify({ message}),
        headers: { 'Content-Type': 'application/json' },
      })

      const reader = res.body?.getReader()
      const decoder = new TextDecoder('utf-8')

      if (!reader) throw new Error('No stream reader found.')

      while (true) {
        const { value, done } = await reader.read()
        if (done) break

        const chunk = decoder.decode(value)
        console.log({ chunk });
      }

If you run and use this function, you will see the same console logs that the backend printed while streaming. You can handle your chat history messages using state. In my application, I created a hook where I handle all the chat-message-related tasks.

"use client"

import { useState } from "react"
import type { Message } from "@/types"

export function useChat() {
  const [messages, setMessages] = useState<Message[]>([])
  const [isLoading, setIsLoading] = useState(false)
  const [isTyping, setIsTyping] = useState(false)

  const sendMessage = async (input: string) => {
    if (!input.trim() || isLoading) return

    // Add user message to chat
    const userMessage: Message = {
      id: Date.now().toString(),
      role: "user",
      content: input,
    }

    setIsLoading(true)
    setIsTyping(true)

    console.log({ messages });

    try {
      // Send message to API using server action
      setMessages((prev) => [...prev, userMessage])

      const res = await fetch('/api/chat', {
        method: 'POST',
        body: JSON.stringify({ message: input}),
        headers: { 'Content-Type': 'application/json' },
      })

      const reader = res.body?.getReader()
      const decoder = new TextDecoder('utf-8')

      if (!reader) throw new Error('No stream reader found.')

      // Add initial assistant message
      const assistantMessageId = Date.now().toString()
      setMessages((prev) => [...prev, {
        id: assistantMessageId,
        role: "assistant",
        content: ""
      }])

      while (true) {
        const { value, done } = await reader.read()
        if (done) break

        const chunk = decoder.decode(value)
        console.log({ chunk });

        // Update message in real-time
        setMessages((prev) => prev.map(msg =>
          msg.id === assistantMessageId
            ? { ...msg, content: msg.content + chunk }
            : msg
        ))
      }
    } catch (error) {
      console.error("Error sending message:", error)

      // Add error message
      const errorMessage: Message = {
        id: Date.now().toString(),
        role: "assistant",
        content: "Sorry, there was an error processing your request. Please try again.",
      }

      setMessages((prev) => [...prev, errorMessage])
    } finally {
      setIsLoading(false)
      setIsTyping(false)
    }
  }

  return {
    messages,
    isLoading,
    isTyping,
    sendMessage,
    setMessages,
  }
}

Here I am managing the chat messages state: receiving the stream and updating the state as each chunk arrives, so the UI feels like a real-time stream.
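For reference, a bare-bones component wired to this hook could look something like the sketch below - my own unstyled example, not the exact UI from my app (the import path for the hook is just an assumption):

"use client"

import { useState } from "react"
import { useChat } from "@/hooks/useChat" // adjust to wherever your hook lives

export default function Chat() {
  const { messages, isLoading, sendMessage } = useChat()
  const [input, setInput] = useState("")

  const handleSend = async () => {
    const text = input
    setInput("")
    await sendMessage(text)
  }

  return (
    <div>
      {/* Render the chat history from the hook's state */}
      {messages.map((msg) => (
        <p key={msg.id}>
          <b>{msg.role}:</b> {msg.content}
        </p>
      ))}

      {/* Input box and send button */}
      <input value={input} onChange={(e) => setInput(e.target.value)} disabled={isLoading} />
      <button onClick={handleSend} disabled={isLoading}>
        Send
      </button>
    </div>
  )
}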

✅ Congratulations!

You've just built the foundation of your own AI assistant! You can now send messages to a local LLM from a web UI using LangChain, Express, and Ollama. We've kicked off successfully, but there's still a long path to our goal. Our current application does not have chat memory yet - meaning if you give it some information in one message and ask about it in the next, it can't answer. We will fix these gaps one by one.

What's Next?
We've laid the groundwork. Next up, we'll refine this setup, add memory and RAG (retrieval-augmented generation), and enable tool usage. Stick around - it's about to get exciting.

💭 Final Thoughts
The goal of this series is not just to build something cool - but to help you understand how modern AI works under the hood. If you're a JavaScript developer, you don't need to feel left out of the AI world anymore. You've got the tools. Now let's build!

🔗 Follow me for updates, and thank you for joining our mission of building our own AI assistant!

👉 💬 Got questions or thoughts? Drop them in the comments - I'd love to hear what you're building.

👉 Stay tuned for the next post in this series!

💖 If you're finding value in my posts and want to help me continue creating, feel free to support me here [Buy me a Coffee]! Every contribution helps, and I truly appreciate it! Thank You. 🙌

Happy Coding! 🚀
