
muhammad naveed

Posted on • Originally published at whyanalyst.vercel.app

I Built a Julius AI Alternative in Next.js — Here's What I Learned

Tags: nextjs ai buildinpublic startup


Six weeks ago I started building WhyAnalyst — an AI-powered data analysis tool where you upload a CSV or Excel file and ask questions in plain English. Think Julius AI, but free to start.

This post is about what actually happened when I built it: the technical decisions, the mistakes, the costs, and the things nobody tells you when you're building an AI SaaS as a solo developer.


The stack

Before I get into the lessons, here's what I built it with:

Frontend:  Next.js 14 (App Router)
Auth:      Firebase Authentication
Database:  Firestore
AI:        Google Gemini Flash (switched from GPT-4 — more on this)
Hosting:   Vercel (frontend) + Render (backend API)
Payments:  LemonSqueezy (coming soon)

Total monthly cost at zero users: ~$0. At 100 active free users: still roughly $0. The free tiers on all of these are genuinely generous.


The AI cost problem — and how I solved it

This is the thing that almost killed the project before it started.

My first implementation was naive: user uploads CSV → I send the entire file to the AI → AI answers the question. For a 500-row CSV with 10 columns, that's easily 5,000–10,000 tokens per query. At GPT-4 pricing, that adds up terrifyingly fast.
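To put numbers on it (rough math, using GPT-4's classic rate of about $0.03 per 1K input tokens): 7,500 tokens per query is roughly $0.22. A free user asking 10 questions costs about $2.25 in API fees; a thousand of those users is $2,250 a month before you've earned a cent.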

// ❌ What I started with — extremely expensive
import OpenAI from 'openai'
const openai = new OpenAI() // reads OPENAI_API_KEY from env

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{
    role: "user",
    // stuffing the entire file into the prompt: thousands of tokens per query
    content: `Here is my data: ${JSON.stringify(entireCSV)}\n\nQuestion: ${userQuestion}`
  }]
})

The fix was to stop sending raw data to the AI entirely. Instead, I send metadata about the data and let the AI generate analysis code, which runs locally:

// ✅ What I do now — much cheaper
import { GoogleGenerativeAI } from '@google/generative-ai'

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY)
const gemini = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' })

const dataContext = {
  columns: csvData.columns,           // column names only
  sample: csvData.rows.slice(0, 5),   // first 5 rows only
  rowCount: csvData.rows.length,      // total row count
  dtypes: inferColumnTypes(csvData),  // inferred data types
}

const response = await gemini.generateContent(`
  You are a data analyst. Given this dataset context:
  ${JSON.stringify(dataContext)}

  Generate JavaScript code to answer this question: "${userQuestion}"
  The full data array is available as the variable 'data'.
  Return only valid JSON: { code: string, chartType: string, title: string }
`)

// Parse the model's JSON reply, then run the generated code as a
// function body with the actual rows bound to 'data', client-side
const { code, chartType, title } = JSON.parse(response.response.text())
const result = new Function('data', code)(csvData.rows)
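A minimal sketch of what an inferColumnTypes helper along these lines could look like (illustrative only, not the exact implementation):

// Illustrative sketch of inferColumnTypes: a cheap type heuristic
// over a sample of rows (not the exact production version)
function inferColumnTypes(csvData) {
  const types = {}
  for (const col of csvData.columns) {
    const values = csvData.rows
      .slice(0, 50)                        // sample the first 50 rows
      .map((row) => row[col])
      .filter((v) => v !== '' && v != null)
    if (values.length && values.every((v) => !isNaN(Number(v)))) {
      types[col] = 'number'
    } else if (values.length && values.every((v) => !isNaN(Date.parse(v)))) {
      types[col] = 'date'
    } else {
      types[col] = 'string'
    }
  }
  return types
}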

I also switched from GPT-4 to Gemini 1.5 Flash, which has a generous free tier and is fast enough for this use case. For most CSV analysis questions, the output quality is indistinguishable.

Cost reduction: ~85%


Firebase Auth + Firestore for usage limits

One of the most important things for a freemium AI tool is tracking usage per user so you can enforce limits. Here's the pattern I use:

// Called on every analysis attempt
import { doc, runTransaction, increment, serverTimestamp } from 'firebase/firestore'

async function checkAndIncrementUsage(userId) {
  const userRef = doc(db, 'users', userId)

  return await runTransaction(db, async (transaction) => {
    const userDoc = await transaction.get(userRef)
    const { queriesUsed, queriesLimit, plan } = userDoc.data()

    if (queriesUsed >= queriesLimit) {
      throw new Error('LIMIT_REACHED')
    }

    transaction.update(userRef, {
      queriesUsed: increment(1),
      lastActiveAt: serverTimestamp()
    })

    return { allowed: true, remaining: queriesLimit - queriesUsed - 1 }
  })
}

I use a Firestore transaction here (not just an update) to avoid race conditions if someone somehow fires two requests simultaneously.
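For completeness, here's roughly how the analysis action consumes that result. runAnalysis, analyze, and showUpgradeModal are illustrative names for this sketch, not real exports from the codebase:

// Illustrative caller: gate the expensive AI call behind the limit check
async function runAnalysis(userId, question) {
  try {
    const { remaining } = await checkAndIncrementUsage(userId)
    console.log(`${remaining} analyses left this month`)
    return await analyze(question)   // hypothetical analysis entry point
  } catch (err) {
    if (err.message === 'LIMIT_REACHED') {
      showUpgradeModal()             // hypothetical upgrade prompt
      return null
    }
    throw err
  }
}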

On signup, I create the user document with defaults:

// Firebase Auth onAuthStateChanged → create user doc if new
import { doc, getDoc, setDoc, serverTimestamp } from 'firebase/firestore'

async function initializeNewUser(firebaseUser) {
  const userRef = doc(db, 'users', firebaseUser.uid)
  const existing = await getDoc(userRef)

  if (!existing.exists()) {
    await setDoc(userRef, {
      email: firebaseUser.email,
      plan: 'free',
      queriesUsed: 0,
      queriesLimit: 10,
      createdAt: serverTimestamp(),
      onboardingComplete: false
    })
  }
}
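Wired into the auth listener, it looks something like this (a sketch, assuming client-side Firebase):

import { getAuth, onAuthStateChanged } from 'firebase/auth'

// Fires on sign-in and on page load for returning users;
// initializeNewUser is a no-op when the doc already exists
onAuthStateChanged(getAuth(), (firebaseUser) => {
  if (firebaseUser) initializeNewUser(firebaseUser)
})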

Parsing CSV and Excel on the client

One mistake I made early: sending files to the server for parsing. It's slower, uses server resources, and creates privacy concerns for users with sensitive data. Everything now parses in the browser:

import Papa from 'papaparse'
import * as XLSX from 'xlsx'

async function parseFile(file) {
  const ext = file.name.split('.').pop().toLowerCase()

  if (ext === 'csv') {
    return new Promise((resolve, reject) => {
      Papa.parse(file, {
        header: true,
        skipEmptyLines: true,
        complete: (results) => resolve({
          columns: results.meta.fields,
          rows: results.data
        }),
        error: reject
      })
    })
  }

  if (ext === 'xlsx' || ext === 'xls') {
    const buffer = await file.arrayBuffer()
    const workbook = XLSX.read(buffer)
    const sheet = workbook.Sheets[workbook.SheetNames[0]]  // first sheet only
    const rows = XLSX.utils.sheet_to_json(sheet)
    return {
      columns: Object.keys(rows[0] || {}),
      rows
    }
  }

  throw new Error(`Unsupported file type: .${ext}`)
}
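Hooking it up is a one-liner from a file input's change handler (handleFile is just an illustrative name):

// In a client component: hand the selected file straight to the parser
async function handleFile(event) {
  const { columns, rows } = await parseFile(event.target.files[0])
  console.log(`Parsed ${rows.length} rows and ${columns.length} columns`)
}

// <input type="file" accept=".csv,.xlsx,.xls" onChange={handleFile} />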

This is effectively instant even for large files, and the raw data never leaves the user's browser; even when a question is asked, only the column metadata and a five-row sample are sent to the AI.


The biggest non-technical mistake I made

I built too many features before talking to any users.

Look at my sidebar right now: Workspace, Files, Databases, History, Mission Log, Custom Agents, Notebook Templates, Connect Data. Most of these are either empty or barely functional.

I was building what I imagined users wanted. The reality: every single person who tried the tool just wanted to upload a file and ask a question. That's it. The feature they asked for most often wasn't in any of my sidebar items — it was "can I download the chart as a PNG?"

Lesson: Build the smallest possible thing. Ship it. Watch what real people actually do. Then build the next thing.


What's actually working for user acquisition

Since I have zero marketing budget, I've been trying different channels:

  • Reddit posts in r/datascience and r/excel with a demo GIF → best ROI so far
  • Building in public on Twitter → slow but compounds over time
  • This kind of post → you're reading it, so it works at least a little
  • SEO pages targeting "julius ai alternative", "chatgpt data analysis alternative" → still building, too early to tell

What hasn't worked: posting in Facebook groups and cold DMs. Product Hunt is still ahead (launch prep is underway), so no verdict there yet.


Current status and what's next

WhyAnalyst is live at whyanalyst.vercel.app. Free tier gives you 10 analyses per month, no credit card required.

Things I'm working on next:

  • Chart export (PNG/PDF) — most requested feature
  • Persistent file storage so you don't have to re-upload every session
  • A Chrome extension that reads Google Sheets data directly
  • Payments via LemonSqueezy for the Pro tier ($9/month)

Would I do it again?

Yes, but I'd do two things differently:

  1. Talk to 10 potential users before writing a single line of code. I would have built a much simpler first version.
  2. Switch to Gemini Flash from day one. I wasted time and money on GPT-4 for a use case where Flash is genuinely good enough.

If you're building something similar — an AI wrapper, a SaaS tool, anything in this space — feel free to ask questions in the comments. Happy to share more about the technical side or the business side.

And if you want to try the tool (or roast the UI), here it is. Feedback welcome.


Building WhyAnalyst in public. Follow along if you're into that sort of thing.
