DEV Community

Adwik Gupta

PockIt: The Voice-Activated Personal Expense Tracker

Introduction

The MERN stack (MongoDB, Express, React, Node.js) is one of the most widely used full-stack development stacks, and for good reason: it uses JavaScript end to end, is easy to learn, and is powerful enough to build production-grade applications.
I built PockIt, a personal expense tracker, on the MERN stack, with one distinguishing feature: users can log expenses by voice instead of typing.
Manually entering expenses is tedious. Many people start tracking their spending but stop after a few days because the process becomes inconvenient.
PockIt lets users speak their expenses. An LLM, orchestrated with LangChain, converts the transcribed speech into structured transaction data, which is then stored in MongoDB.
The project has four main components:

  1. React – Frontend
  2. Firebase – Authentication
  3. Node.js – Backend
  4. MongoDB + LangChain – Database and AI processing

Let’s break down how each part works.


Project File Structure

Before diving into the code, here is a high-level look at how the frontend and backend are organized to keep the codebase modular and scalable.

📁 pockit/
├── 📁 backend/
│   ├── 📁 controllers/
│   │   └── 📄 transaction.js
│   ├── 📁 db/
│   │   └── 📄 connect.js
│   ├── 📁 middleware/
│   │   └── 📄 authMiddleware.js
│   ├── 📁 models/
│   │   ├── 📄 user.js
│   │   └── 📄 transaction.js
│   ├── 📁 routes/
│   │   ├── 📄 transactionsRouter.js
│   │   └── 📄 aiRouter.js
│   └── 📄 app.js
│
└── 📁 frontend/
    ├── 📁 src/
    │   ├── 📁 axios/
    │   │   └── 📄 api.jsx
    │   ├── 📁 components/
    │   │   ├── 📄 Header.jsx
    │   │   ├── 📄 Transaction.jsx
    │   │   ├── 📄 AddTransaction.jsx
    │   │   └── 📄 Profile.jsx
    │   ├── 📁 pages/
    │   │   ├── 📄 register.jsx
    │   │   ├── 📄 login.jsx
    │   │   ├── 📄 HomePage.jsx
    │   │   ├── 📄 Home.jsx
    │   │   └── 📄 Dashboard.jsx
    │   ├── 📄 App.jsx
    │   ├── 📄 firebase.js
    │   ├── 📄 index.css
    │   └── 📄 main.jsx
    ├── 📄 package.json
    └── 📄 vite.config.js

React – The Frontend

React is a component-based JavaScript library used for building interactive user interfaces.
React applications are typically Single Page Applications (SPAs). This means:

  • The browser loads only one HTML file
  • JavaScript handles navigation
  • The page does not reload when switching views

This results in a faster and smoother user experience.

How React Works Internally

React uses a Virtual DOM.
Flow:

  1. React keeps a lightweight representation of the UI, the Virtual DOM, in memory
  2. When state changes, React renders a new Virtual DOM tree
  3. React compares the old and new trees (reconciliation)
  4. Only the elements that changed are updated in the real DOM

Because real DOM operations are relatively expensive, touching only what changed keeps updates fast.
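To make the reconciliation step concrete, here is a toy sketch in plain JavaScript. It is a simplified illustration, not React's actual algorithm: it only compares flat lists of keyed nodes with text content and ignores element types, props, nesting, and reordering, all of which React handles. The node shape `{ key, text }` and the name `diffKeyedNodes` are hypothetical.

```javascript
// Toy reconciliation: compare two flat lists of keyed "virtual nodes" and
// return the patches needed to turn oldNodes into newNodes.
// Simplified illustration only; this is not React's real algorithm.
function diffKeyedNodes(oldNodes, newNodes) {
  const patches = [];
  const oldByKey = new Map(oldNodes.map((node) => [node.key, node]));
  const newKeys = new Set(newNodes.map((node) => node.key));

  // Nodes that disappeared must be removed from the real DOM.
  for (const node of oldNodes) {
    if (!newKeys.has(node.key)) patches.push({ op: "remove", key: node.key });
  }

  for (const node of newNodes) {
    const previous = oldByKey.get(node.key);
    if (!previous) {
      // Brand-new node: create it in the real DOM.
      patches.push({ op: "insert", key: node.key, text: node.text });
    } else if (previous.text !== node.text) {
      // Same node, changed content: update it in place.
      patches.push({ op: "update", key: node.key, text: node.text });
    }
    // Unchanged nodes produce no patch, so the real DOM is left alone.
  }
  return patches;
}

const before = [
  { key: "t1", text: "Auto: 800" },
  { key: "t2", text: "Lunch: 250" },
];
const after = [
  { key: "t1", text: "Auto: 800" },
  { key: "t2", text: "Lunch: 300" },
  { key: "t3", text: "Salary: 50000" },
];
// Logs an "update" patch for t2 and an "insert" patch for t3; t1 is untouched.
console.log(diffKeyedNodes(before, after));
```

Matching old and new nodes by `key` mirrors why React asks for a `key` prop on list items: stable keys let it tell which element is which between renders.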

Frontend Architecture in PockIt

The most important component in my frontend is <Transactions />.
It is responsible for:

  • Fetching transactions from the backend
  • Deleting transactions
  • Sorting and filtering transactions
  • Rendering <AddTransaction />
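The sorting and filtering responsibility can be sketched as a pure function. This is a hypothetical helper, not PockIt's actual code: the name `filterAndSortTransactions` and its options object are assumptions, while the field names (`type`, `category`, `amount`, `date`) match the Transaction schema shown later.

```javascript
// Hypothetical helper (not PockIt's actual code): filter transactions by type
// and/or category, then sort them by "date" or "amount".
function filterAndSortTransactions(
  transactions,
  { type, category, sortBy = "date", order = "desc" } = {}
) {
  const filtered = transactions.filter(
    (t) => (!type || t.type === type) && (!category || t.category === category)
  );
  // Turn the sort field into a number (dates become timestamps).
  const valueOf = (t) =>
    sortBy === "date" ? new Date(t.date).getTime() : Number(t[sortBy]);
  const direction = order === "asc" ? 1 : -1;
  // filter() already returned a new array, so sorting it does not mutate the
  // input (important when the input is React state).
  return filtered.sort((a, b) => (valueOf(a) - valueOf(b)) * direction);
}

// Sample data, made up for illustration.
const sample = [
  { title: "Auto", amount: 800, type: "expense", category: "Transport", date: "2026-02-23" },
  { title: "Salary", amount: 50000, type: "income", category: "Salary", date: "2026-02-01" },
  { title: "Lunch", amount: 250, type: "expense", category: "Food", date: "2026-02-24" },
];

// Newest expense first: ["Lunch", "Auto"]
console.log(filterAndSortTransactions(sample, { type: "expense" }).map((t) => t.title));
```

Inside a component, a derived list like this is often wrapped in `useMemo` so it is only recomputed when the transactions or the filter settings change.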

The <AddTransaction /> component handles:

  • Manual transaction entry
  • Voice-based transaction entry

API Communication using Axios Client

Instead of attaching the authentication token manually to every request, I created a custom API client with axios:

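import axios from 'axios';
import { auth } from '../firebase'; // assumes firebase.js exports the Firebase Auth instance

const apiClient = axios.create({
  baseURL: import.meta.env.VITE_API_URL,
});

// Attach the signed-in user's Firebase ID token to every outgoing request
apiClient.interceptors.request.use(async (config) => {
  const user = auth.currentUser;
  if (user) {
    const token = await user.getIdToken();
    config.headers.Authorization = `Bearer ${token}`;
  }
  return config;
});

export default apiClient;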

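Every request made through apiClient now carries the current user's Firebase ID token.
getIdToken() returns the cached token and refreshes it automatically when it has expired or is about to, so individual components never have to handle tokens.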


Firebase – Authentication

Instead of building authentication from scratch, I used Firebase Authentication.

Firebase handles:

  • User registration
  • Login
  • Session management
  • Token generation and validation

I used Email/Password authentication.

Authentication Flow

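Step 1: Registration and Login

New users register and returning users log in with the Firebase Auth SDK:
import { createUserWithEmailAndPassword, signInWithEmailAndPassword } from 'firebase/auth';
await createUserWithEmailAndPassword(auth, email, password); // register
await signInWithEmailAndPassword(auth, email, password); // log in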

Firebase verifies credentials securely.

Step 2: Token Generation

Firebase generates:

  • ID Token (JWT) – valid for 1 hour
  • Refresh Token – used to generate new ID tokens

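The ID token is a signed JWT. Its payload includes:

  • User ID (UID)
  • Email
  • Issue and expiry timestamps

Firebase's digital signature over the token lets the backend confirm that Firebase issued it and that it has not been modified.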
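To see what is inside an ID token, the payload can be decoded locally. Firebase ID tokens carry claims such as `sub` and `user_id` (both the UID), `email`, `iat` (issued at) and `exp` (expiry). The sketch below is an illustration, not part of PockIt: the helper names are hypothetical, it relies on Node's `Buffer` (a browser would need a different base64url decoder), and it uses a fake, unsigned token. Decoding is not verification; the backend must still call `verifyIdToken`.

```javascript
// Decode (but do NOT verify) the payload of a JWT such as a Firebase ID token.
// Decoding only reads the claims; trusting them still requires server-side
// verification with the Firebase Admin SDK's verifyIdToken.
function decodeJwtPayload(token) {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("Not a JWT: expected header.payload.signature");
  }
  // Each JWT segment is base64url-encoded; the payload is JSON.
  return JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
}

// `exp` is in seconds since the Unix epoch, while Date.now() is in milliseconds.
function isExpired(payload, nowMs = Date.now()) {
  return payload.exp * 1000 <= nowMs;
}

// Build a fake, unsigned token purely for demonstration (not a real Firebase token).
const encode = (obj) => Buffer.from(JSON.stringify(obj)).toString("base64url");
const fakeToken = [
  encode({ alg: "RS256", typ: "JWT" }),
  encode({ user_id: "abc123", email: "user@example.com", iat: 1700000000, exp: 1700003600 }),
  "fake-signature",
].join(".");

const claims = decodeJwtPayload(fakeToken);
console.log(claims.user_id, claims.email); // abc123 user@example.com
console.log(claims.exp - claims.iat); // 3600, i.e. the one-hour lifetime
```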

Step 3: Sending Token to Backend

The frontend sends the token in the Authorization header:
Authorization: Bearer <token>

Step 4: Backend Verification

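The backend verifies the token using the Firebase Admin SDK:

const decodedToken = await admin.auth().verifyIdToken(token);
verifyIdToken checks the token's signature, expiry, and intended Firebase project, and throws if any check fails, so only requests from signed-in users get through.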

On the frontend, Firebase is initialized in firebase.js.
On the backend, the Admin SDK is initialized with a service account key (serviceAccount.json) for secure token verification.
This removes the need to manage passwords or sessions manually.


Node.js – Backend Architecture

Node.js is a JavaScript runtime built on Chrome’s V8 engine.

Important clarification:

Node.js is single-threaded for JavaScript execution, but it uses libuv and the OS to handle asynchronous operations efficiently.

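This lets a single Node.js process serve many concurrent requests, as long as most of their time is spent waiting on I/O such as database queries or calls to Firebase and the LLM API.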
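The non-blocking behavior can be demonstrated with a small sketch. Timers stand in for real I/O here (libuv also manages timers), and `fakeQuery` is a hypothetical name; the point is that the single JavaScript thread keeps running while both "operations" wait.

```javascript
// Simulated I/O: each "query" finishes after `ms` milliseconds via a timer.
// While it waits, the single JavaScript thread is free to run other code.
function fakeQuery(name, ms, log) {
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push(`${name} done`);
      resolve(name);
    }, ms);
  });
}

const log = [];
log.push("request received");
// Start both operations without waiting for either one to finish.
const pending = Promise.all([
  fakeQuery("LLM call", 30, log),
  fakeQuery("DB query", 10, log),
]);
log.push("thread free for other requests");

pending.then((results) => {
  // Promise.all keeps results in input order, regardless of finish order.
  log.push(`responded with ${results.join(" + ")}`);
  console.log(log);
});
```

The synchronous line runs before either callback, and the faster "DB query" completes first; the total wait is roughly the longest operation, not the sum.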

Request Flow in PockIt

Take the /api/transactions endpoint as an example.
In app.js, the router is mounted with: app.use('/api/transactions', transactionsRouter);
In the router:

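// Module paths follow the project structure above; the export names are assumed.
const express = require('express');
const authMiddleware = require('../middleware/authMiddleware');
const { getTransactions, createTransaction, deleteTransaction } = require('../controllers/transaction');

const router = express.Router();

router.route('/')
  .get(authMiddleware, getTransactions)
  .post(authMiddleware, createTransaction);

router.route('/:id')
  .delete(authMiddleware, deleteTransaction);

module.exports = router;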

Flow:

  1. Request arrives at router
  2. authMiddleware verifies token
  3. Controller processes request
  4. Response sent back
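The flow above can be imitated in a few lines of plain JavaScript. This is a hypothetical, stripped-down sketch of how Express-style handlers pass control with `next()`; it is not Express's implementation, `runPipeline`, `fakeAuth`, and `fakeGetTransactions` are made-up names, and `res` is a plain object instead of Express's response API.

```javascript
// Minimal Express-style pipeline (not Express itself): each handler receives
// (req, res, next) and either responds or calls next() to pass control on.
function runPipeline(handlers, req, res) {
  let index = 0;
  const next = () => {
    const handler = handlers[index++];
    if (handler) handler(req, res, next);
  };
  next();
}

// Stand-in for authMiddleware: rejects requests without a Bearer token.
function fakeAuth(req, res, next) {
  const header = req.headers.authorization || "";
  if (!header.startsWith("Bearer ")) {
    res.status = 401;
    res.body = { message: "Unauthorized" };
    return; // next() is never called, so the controller never runs
  }
  req.user = { id: "user-1" };
  next();
}

// Stand-in for the getTransactions controller.
function fakeGetTransactions(req, res) {
  res.status = 200;
  res.body = { transactions: [], owner: req.user.id };
}

const ok = {};
runPipeline([fakeAuth, fakeGetTransactions], { headers: { authorization: "Bearer abc" } }, ok);
const denied = {};
runPipeline([fakeAuth, fakeGetTransactions], { headers: {} }, denied);
console.log(ok.status, denied.status); // 200 401
```

Because the middleware simply does not call `next()` on failure, an unauthenticated request never reaches the controller.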

Authentication Middleware

This is one of the most important parts of the backend:

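const admin = require('firebase-admin');
const User = require('../models/user');

const authMiddleware = async (req, res, next) => {
  // Expect "Authorization: Bearer <token>"
  const authHeader = req.headers.authorization || '';
  if (!authHeader.startsWith('Bearer ')) {
    return res.status(401).json({ message: 'No token provided' });
  }
  const token = authHeader.split(' ')[1];

  try {
    const decodedToken = await admin.auth().verifyIdToken(token);
    const { uid, email, name, phone_number } = decodedToken;

    // Find the matching user, or create one on first login
    let user = await User.findOne({ firebaseUid: uid });
    if (!user) {
      user = new User({
        firebaseUid: uid,
        email,
        displayName: name || '',
        ...(phone_number && { phone_number }),
      });
      await user.save();
    }

    req.user = user;
    next();
  } catch (error) {
    // Verification or user lookup failed
    return res.status(401).json({ message: 'Invalid or expired token' });
  }
};

module.exports = authMiddleware;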

This middleware:

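  • Verifies the Firebase token
  • Finds the user in the database
  • Creates the user on first login if they do not exist yet
  • Attaches the user to the request as req.user

Because every protected route passes through it, every transaction can be tied to a verified user.
Verifying the token on the server for every request, instead of trusting identity data sent by the client, is the standard pattern for APIs that use Firebase Authentication.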
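One subtlety of find-then-create: if a brand-new user's first two requests arrive at the same time (for example, two components fetching data in parallel right after sign-up), both can see "no user" before either one saves. With the unique index on firebaseUid, the second save would fail with a duplicate-key error instead of creating a duplicate, but that request still fails. An atomic upsert, such as Mongoose's `User.findOneAndUpdate({ firebaseUid: uid }, { $setOnInsert: { ... } }, { upsert: true, new: true })`, is a common way to avoid this. The sketch below simulates both approaches with a hypothetical in-memory store; none of these names come from PockIt.

```javascript
// Hypothetical in-memory stand-in for the users collection. It has no unique
// index, so a race shows up as two creates (MongoDB would raise a
// duplicate-key error instead).
function createUserStore() {
  const users = new Map();
  const tick = () => new Promise((resolve) => setTimeout(resolve, 5)); // simulated I/O latency
  return {
    creates: 0,
    async findOne(uid) {
      await tick();
      return users.get(uid) || null;
    },
    async insert(user) {
      await tick();
      this.creates += 1;
      users.set(user.firebaseUid, user);
      return user;
    },
    // Check and write happen with no await in between, like an atomic upsert.
    async upsert(uid, fields) {
      await tick();
      if (!users.has(uid)) {
        this.creates += 1;
        users.set(uid, { firebaseUid: uid, ...fields });
      }
      return users.get(uid);
    },
  };
}

// Check-then-create: the same shape as the middleware's findOne + save.
async function findOrCreate(store, uid, fields) {
  const existing = await store.findOne(uid);
  return existing || store.insert({ firebaseUid: uid, ...fields });
}

// Simulate two simultaneous first requests from the same new user.
async function simulateFirstLogin() {
  const racy = createUserStore();
  await Promise.all([findOrCreate(racy, "uid-1", {}), findOrCreate(racy, "uid-1", {})]);

  const atomic = createUserStore();
  await Promise.all([atomic.upsert("uid-1", {}), atomic.upsert("uid-1", {})]);

  return { racyCreates: racy.creates, atomicCreates: atomic.creates };
}

simulateFirstLogin().then((result) => console.log(result)); // { racyCreates: 2, atomicCreates: 1 }
```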


MongoDB – Database Design

MongoDB is a NoSQL document database.
It stores data as JSON-like (BSON) documents.
Advantages:

  • Flexible schema
  • Horizontal scaling through sharding
  • Fast development

The backend connects to MongoDB through Mongoose when the server starts.

User Schema

const mongoose = require('mongoose');

const UserSchema = new mongoose.Schema({
  firebaseUid: {
    type: String,
    required: true,
    unique: true,
    index: true,
  },
  email: {
    type: String,
    required: true,
    unique: true,
  },
  displayName: String,
  phone_number: {
    type: String,
    unique: true,
    sparse: true, // unique only among users that actually have a phone number
  },
}, { timestamps: true });

module.exports = mongoose.model('User', UserSchema);

Transaction Schema

const mongoose = require('mongoose');

const transactionSchema = new mongoose.Schema({
  title: String,
  amount: Number,
  type: {
    type: String,
    enum: ['income', 'expense'],
  },
  category: String,
  date: Date,

  // Owner of the transaction; indexed for fast per-user queries
  user: {
    type: mongoose.Schema.Types.ObjectId,
    ref: 'User',
    index: true,
  },
}, { timestamps: true });

module.exports = mongoose.model('Transaction', transactionSchema);

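Each transaction stores a reference to the user who owns it, and the index on user keeps per-user queries fast.
Isolation then depends on the controllers scoping every query to req.user._id, so that users can only read and modify their own transactions.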
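The scoping rule can be sketched with in-memory data. This is a hypothetical illustration, not PockIt's actual controllers; with Mongoose, the equivalent would be filters such as `Transaction.find({ user: req.user._id })` and `Transaction.findOneAndDelete({ _id: req.params.id, user: req.user._id })`. The function names below are made up.

```javascript
// Hypothetical in-memory sketch of user-scoped data access.
function listTransactionsForUser(transactions, userId) {
  return transactions.filter((t) => t.user === userId);
}

// Delete only when the transaction exists AND belongs to the requesting user.
function deleteTransactionForUser(transactions, id, userId) {
  const target = transactions.find((t) => t.id === id && t.user === userId);
  if (!target) {
    // Same result whether the id is unknown or owned by someone else,
    // so the API does not reveal which ids belong to other users.
    return { deleted: false, transactions };
  }
  return { deleted: true, transactions: transactions.filter((t) => t !== target) };
}

const all = [
  { id: "t1", title: "Auto", amount: 800, user: "alice" },
  { id: "t2", title: "Rent", amount: 15000, user: "bob" },
];
console.log(listTransactionsForUser(all, "alice").length); // 1
console.log(deleteTransactionForUser(all, "t2", "alice").deleted); // false
console.log(deleteTransactionForUser(all, "t2", "bob").deleted); // true
```

Putting the owner check into the query itself, rather than loading the document and comparing afterwards, means there is no code path that touches another user's data.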


LangChain – Voice to Structured Data

This is the core feature of PockIt.
For example, a user might say:

"Spent 800 rupees on auto yesterday" (an auto-rickshaw ride)

The input flows through: Voice → Text → LangChain → Structured JSON

Prompt Template

const { ChatPromptTemplate } = require('@langchain/core/prompts');

// incomeCategories and expenseCategories are arrays of category names defined elsewhere.
// {currentDate} and {inputText} are template variables filled in when the prompt is invoked.
const prompt = ChatPromptTemplate.fromMessages([
  ['system',
    'You are an expert at extracting transaction data from user text. ' +
    "Today's date is {currentDate}. " +
    'You must extract the amount, vendor, category, and date. ' +
    'If the user mentions spending, the type is "expense". ' +
    'If the user mentions "credit", "salary", or "received", the type is "income". ' +
    `If type is "income", the category MUST be one of: ${incomeCategories.join(', ')}. ` +
    `If type is "expense", the category MUST be one of: ${expenseCategories.join(', ')}. ` +
    'If no specific category is mentioned, use "Other". ' +
    'If the user says "today", use {currentDate}. If they say "yesterday", calculate and use the previous day\'s date in YYYY-MM-DD format. ' +
    'If no date is mentioned, the date field must be {currentDate} in YYYY-MM-DD format.'
  ],
  ['human', '{inputText}'],
]);

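Passing the current date in as a variable gives the model a fixed reference point for relative dates such as "yesterday", and restricting categories to fixed lists makes the extraction consistent across requests.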
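The currentDate value itself has to be computed in code. A common pitfall is `new Date().toISOString().slice(0, 10)`, which returns the UTC date: just after midnight in India (UTC+5:30), for example, it still returns the previous day. Below is a minimal sketch of formatting dates in the local time zone of whichever machine runs the code; `toLocalISODate` and `previousDay` are hypothetical helper names, and resolving "yesterday" in code like this, instead of relying on the model's arithmetic, is an optional alternative rather than what PockIt does.

```javascript
// Format a Date as YYYY-MM-DD using the local calendar date of the machine
// running this code (unlike toISOString(), which always uses UTC).
function toLocalISODate(date) {
  const y = date.getFullYear();
  const m = String(date.getMonth() + 1).padStart(2, "0");
  const d = String(date.getDate()).padStart(2, "0");
  return `${y}-${m}-${d}`;
}

// Return a new Date one calendar day earlier; setDate() rolls over month and
// year boundaries automatically.
function previousDay(date) {
  const copy = new Date(date);
  copy.setDate(copy.getDate() - 1);
  return copy;
}

const today = new Date(2026, 1, 24); // 24 Feb 2026, local time (months are 0-based)
console.log(toLocalISODate(today)); // "2026-02-24"
console.log(toLocalISODate(previousDay(today))); // "2026-02-23"
```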

Output Validation using Zod

A Zod schema defines the exact shape the extracted data must have:

const { z } = require('zod');

// allCategories: the income and expense category names (defined elsewhere)
const transactionSchema = z.object({
  type: z.enum(['expense', 'income']),
  amount: z.number(),
  category: z.enum(allCategories),
  vendor: z.string().nullable(),
  date: z.string().nullable(),
});

Output that does not match the schema, such as a missing amount or an unknown category, is rejected instead of being written to the database.

Example Output

{
  "type": "expense",
  "amount": 800,
  "category": "Transport",
  "vendor": "auto",
  "date": "2026-02-23"
}

This structured data is saved in MongoDB.
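The validated output does not line up one-to-one with the Mongoose schema: it has `vendor` instead of `title`, a string `date` instead of a `Date`, and no `user`. Some mapping step is therefore needed before saving. The sketch below is hypothetical; using the vendor as the title (falling back to the category) is an assumption, and in a real controller the result could be passed to Mongoose's `Transaction.create`.

```javascript
// Hypothetical mapping from validated LLM output to Transaction schema fields.
function toTransactionDoc(extracted, userId) {
  return {
    // Assumption: the vendor becomes the title, falling back to the category.
    title: extracted.vendor || extracted.category,
    amount: extracted.amount,
    type: extracted.type,
    category: extracted.category,
    // The schema stores a Date; date-only strings like "2026-02-23" are parsed
    // as UTC midnight. Fall back to "now" if the model returned null.
    date: extracted.date ? new Date(extracted.date) : new Date(),
    user: userId,
  };
}

const extracted = {
  type: "expense",
  amount: 800,
  category: "Transport",
  vendor: "auto",
  date: "2026-02-23",
};
const doc = toTransactionDoc(extracted, "user-123");
console.log(doc.title, doc.date.toISOString()); // auto 2026-02-23T00:00:00.000Z
```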


Complete System Flow

  1. User logs in with Firebase
  2. User speaks an expense
  3. The speech is converted to text
  4. The text is sent to the backend
  5. LangChain and the LLM extract structured data
  6. The backend validates and saves the transaction
  7. React updates the UI without a page reload

Key Engineering Learnings

Building PockIt helped me understand:

  • Secure authentication using Firebase tokens
  • Middleware-based backend architecture
  • Database schema design for multi-user systems
  • Safe integration of LLMs using validation
  • Clean separation between frontend, backend, and AI layers

Most importantly, I learned that reducing the effort of entering data, in this case with voice input, can significantly improve usability.


Final Thoughts

PockIt started as a simple MERN project but evolved into a full system combining:

  • Frontend engineering
  • Backend architecture
  • Authentication systems
  • Database design
  • AI integration

Voice-based expense tracking removes friction.
And removing friction is often the difference between a tool people try and a tool people actually use.
