Nevan Tan

Posted on Sep 14

Real ChatGPT Conversations: Data Fetching and Query Optimization

#webdev #typescript #programming #ai

In this series, I share real conversations I have with ChatGPT during the normal course of my programming work. The goal is to document the benefits of incorporating AI into my workflow, reflect on what I learn from the conversations, and hopefully help other developers make more effective use of these tools - we can learn together!

This article will mostly take the form of going through the conversation with ChatGPT turn-by-turn while I add my thought process behind why I'm sending specific prompts, what I think of the output, and anything else I'm considering outside of the conversation itself.

Today's Topic: Processing the output of a database query and optimizing said query.

Conversation Link: https://chatgpt.com/share/68c6e283-c8a4-800e-bdef-986678a4267b

Background

I'm currently working on an API endpoint to trigger an LLM text completion in ParaProse. The generation is triggered for a specific chapter, from which it gathers a fair bit of context which includes the story title, description, tags, and a few paragraphs of content preceding the point of generation.

In my initial attempt to gather this context, I hacked together a query using my ORM (Drizzle), which returned data in a format that wasn't especially useful to me. The conversation starts with me asking ChatGPT to write some formatting code and leads to optimizing the queries and fixing a type error caused by my limited knowledge of how joins work.

Turn 1

Prompt

I'm currently getting this result back from my drizzle orm:

const result: {
    chapter: {
        id: string;
        storyId: string;
        title: string;
        description: string;
        position: number;
        createdAt: Date;
        updatedAt: Date;
    };
    story: {
        id: string;
        userId: string;
        title: string;
        description: string;
        createdAt: Date;
        updatedAt: Date;
    } | null;
    tag: {
        id: string;
        userId: string;
        value: string;
    } | null;
}[]

Could you write a typescript map or reduce that processes this array into a chapter array:

const chapters: {
    id: string;
    storyId: string;
    title: string;
    description: string;
    position: number;
    createdAt: Date;
    updatedAt: Date;
    story: {...story fields}
    tags: [...tag objects]
}[]

Response

Got it 👍 — you want to collapse the flat Drizzle result into one array of chapters, each with its story and an array of tags.
Here’s a clean way using reduce:

type Chapter = {
  id: string
  storyId: string
  title: string
  description: string
  position: number
  createdAt: Date
  updatedAt: Date
  story: {
    id: string
    userId: string
    title: string
    description: string
    createdAt: Date
    updatedAt: Date
  } | null
  tags: {
    id: string
    userId: string
    value: string
  }[]
}
const chapters: Chapter[] = result.reduce<Chapter[]>((acc, { chapter, story, tag }) => {
  // check if chapter already exists in accumulator
  let existing = acc.find(c => c.id === chapter.id)
  if (!existing) {
    existing = {
      ...chapter,
      story,
      tags: []
    }
    acc.push(existing)
  }
  if (tag) {
    // avoid duplicates if the same tag shows up multiple times
    if (!existing.tags.some(t => t.id === tag.id)) {
      existing.tags.push(tag)
    }
  }
  return acc
}, [])

✅ Each chapter is unique.
✅ Tags are grouped into an array.
✅ Story is included once per chapter.
Would you like me to also show a version using Map instead of find for better performance on large arrays?

Thoughts & Analysis

At this point, the main thing I was looking for was functional code that looked close enough to what I was going for. I've written many a map/reduce function in my day and I wasn't keen to do it yet again, especially when AI tends to be pretty decent at this particular kind of code.

Initially, I was going to ignore the Chapter type, but I did end up keeping it so I could have the chapters variable typed correctly. I didn't really want it cluttering up my (currently) single-file router with a bunch of types at the top; but we'll come back to that. I went ahead and pasted the code into the right spots and made sure all the types were lining up, when I realized I missed some context.

Turn 2

Prompt

Could you update that to also include paragraphs?

const result: {
    chapter: {
        id: string;
        storyId: string;
        title: string;
        description: string;
        position: number;
        createdAt: Date;
        updatedAt: Date;
    };
    story: {
        id: string;
        userId: string;
        title: string;
        description: string;
        createdAt: Date;
        updatedAt: Date;
    } | null;
    tag: {
        id: string;
        userId: string;
        value: string;
    } | null;
    paragraphs: {
        id: string;
        chapterId: string;
        content: string;
        position: number;
        source: "user" | "llm";
        createdAt: Date;
        updatedAt: Date;
    } | null;
}[]

Response

Here you go—now with paragraphs too. I switched to a Map for O(n) grouping and added simple de-duping.

type Story = {
  id: string
  userId: string
  title: string
  description: string
  createdAt: Date
  updatedAt: Date
}
type Tag = {
  id: string
  userId: string
  value: string
}
type Paragraph = {
  id: string
  chapterId: string
  content: string
  position: number
  source: "user" | "llm"
  createdAt: Date
  updatedAt: Date
}
type ChapterWithRelations = {
  id: string
  storyId: string
  title: string
  description: string
  position: number
  createdAt: Date
  updatedAt: Date
  story: Story | null
  tags: Tag[]
  paragraphs: Paragraph[]
}
const chapters: ChapterWithRelations[] = (() => {
  const map = new Map<string, ChapterWithRelations>()
  for (const row of result) {
    const { chapter, story, tag, paragraphs: paragraph } = row
    let entry = map.get(chapter.id)
    if (!entry) {
      entry = {
        ...chapter,
        story,
        tags: [],
        paragraphs: [],
      }
      map.set(chapter.id, entry)
    }
    if (tag && !entry.tags.some(t => t.id === tag.id)) {
      entry.tags.push(tag)
    }
    if (paragraph && !entry.paragraphs.some(p => p.id === paragraph.id)) {
      entry.paragraphs.push(paragraph)
    }
  }
  // Optional: keep things tidy
  for (const ch of map.values()) {
    ch.tags.sort((a, b) => a.value.localeCompare(b.value))
    ch.paragraphs.sort((a, b) => a.position - b.position)
  }
  return Array.from(map.values())
})()

If you want to preserve original tag order instead of sorting, just remove the ch.tags.sort(...) line.

Thoughts & Analysis

This was a very low-effort prompt; just a quick ask and pasting the updated type I was getting after I added another join to my query. Given it was so closely related to what we were already working on, I figured ChatGPT could handle it; and I was mostly right.

When it comes to the response, I immediately disliked the self-invoking function as it doesn't really fit my style. I didn't bother digging into the quality of the code; my gut instinct was telling me that this was too complicated and that there might be an easier way to do this. Drizzle is relatively new to me (coming most recently from TypeORM and Supabase), so I suspected I was probably doing something wrong with how I was fetching the data.

Turn 3

Prompt

Given the desired end state, is there a better way for me to write this query?

typescript
const result = await db
      .select({
        chapter: chaptersTable,
        story: storiesTable,
        tag: tagsTable,
        paragraphs: paragraphsTable,
      })
      .from(chaptersTable)
      .leftJoin(storiesTable, eq(chaptersTable.storyId, storiesTable.id))
      .leftJoin(storyTagsTable, eq(storiesTable.id, storyTagsTable.storyId))
      .leftJoin(tagsTable, eq(storyTagsTable.tagId, tagsTable.id))
      .leftJoin(
        paragraphsTable,
        eq(paragraphsTable.chapterId, chaptersTable.id)
      )
      .where(
        and(
          eq(storiesTable.userId, session.user.id),
          eq(chaptersTable.id, chapterId)
        )
      )

Response

Yes—your current joins create a “row explosion” (chapter × tags × paragraphs) that you then have to de-dupe in TS. Two better patterns:

1) Single-chapter: do 3 tight queries (simplest + efficient)

Since you filter by a single chapterId, split it up to avoid duplication entirely.

// 1) Chapter + story
const [chapterRow] = await db
  .select({
    chapter: chaptersTable,
    story: storiesTable,
  })
  .from(chaptersTable)
  .leftJoin(storiesTable, eq(chaptersTable.storyId, storiesTable.id))
  .where(and(eq(storiesTable.userId, session.user.id), eq(chaptersTable.id, chapterId)))
if (!chapterRow) throw new Error('Not found')
const { chapter, story } = chapterRow
// 2) Tags for the story
const tags = story
  ? await db
      .select({ id: tagsTable.id, userId: tagsTable.userId, value: tagsTable.value })
      .from(storyTagsTable)
      .innerJoin(tagsTable, eq(storyTagsTable.tagId, tagsTable.id))
      .where(eq(storyTagsTable.storyId, story.id))
  : []
// 3) Paragraphs for the chapter
const paragraphs = await db
  .select({
    id: paragraphsTable.id,
    chapterId: paragraphsTable.chapterId,
    content: paragraphsTable.content,
    position: paragraphsTable.position,
    source: paragraphsTable.source,
    createdAt: paragraphsTable.createdAt,
    updatedAt: paragraphsTable.updatedAt,
  })
  .from(paragraphsTable)
  .where(eq(paragraphsTable.chapterId, chapter.id))
  .orderBy(paragraphsTable.position)
const result = [{ ...chapter, story, tags, paragraphs }]

Pros: no row explosion, simplest code, predictable performance for a single chapter.

2) One round-trip with aggregated subqueries (good if you’ll batch many chapters)

Use JSON aggregates per relation and join those once. This keeps one row per chapter.

import { sql } from 'drizzle-orm'
// paragraphs aggregated per chapter
const paragraphsAgg = db
  .select({
    chapterId: paragraphsTable.chapterId,
    paragraphs: sql<{
      id: string; chapterId: string; content: string; position: number;
      source: 'user'|'llm'; createdAt: Date; updatedAt: Date;
    }[]>`
      COALESCE(
        JSON_AGG(
          JSON_BUILD_OBJECT(
            'id', ${paragraphsTable.id},
            'chapterId', ${paragraphsTable.chapterId},
            'content', ${paragraphsTable.content},
            'position', ${paragraphsTable.position},
            'source', ${paragraphsTable.source},
            'createdAt', ${paragraphsTable.createdAt},
            'updatedAt', ${paragraphsTable.updatedAt}
          )
          ORDER BY ${paragraphsTable.position}
        ) FILTER (WHERE ${paragraphsTable.id} IS NOT NULL),
        '[]'::json
      )
    `.as('paragraphs'),
  })
  .from(paragraphsTable)
  .groupBy(paragraphsTable.chapterId)
  .as('p_agg')
// tags aggregated per story
const tagsAgg = db
  .select({
    storyId: storyTagsTable.storyId,
    tags: sql<{ id: string; userId: string; value: string }[]>`
      COALESCE(
        JSON_AGG(
          DISTINCT JSONB_BUILD_OBJECT(
            'id', ${tagsTable.id},
            'userId', ${tagsTable.userId},
            'value', ${tagsTable.value}
          )
        ) FILTER (WHERE ${tagsTable.id} IS NOT NULL),
        '[]'::jsonb
      )
    `.as('tags'),
  })
  .from(storyTagsTable)
  .leftJoin(tagsTable, eq(storyTagsTable.tagId, tagsTable.id))
  .groupBy(storyTagsTable.storyId)
  .as('t_agg')
// final select: 1 row per chapter with arrays embedded
const rows = await db
  .select({
    chapter: chaptersTable,
    story: storiesTable,
    tags: tagsAgg.tags,
    paragraphs: paragraphsAgg.paragraphs,
  })
  .from(chaptersTable)
  .leftJoin(storiesTable, eq(chaptersTable.storyId, storiesTable.id))
  .leftJoin(tagsAgg, eq(tagsAgg.storyId, storiesTable.id))
  .leftJoin(paragraphsAgg, eq(paragraphsAgg.chapterId, chaptersTable.id))
  .where(and(eq(storiesTable.userId, session.user.id), eq(chaptersTable.id, chapterId)))
const chapters = rows.map(r => ({
  ...r.chapter,
  story: r.story,
  tags: r.tags ?? [],
  paragraphs: r.paragraphs ?? [],
}))

Pros: one DB round-trip, no de-dupe in JS, stable ordering by paragraph position.
Notes: uses Postgres JSON functions via sql\\; Drizzle types are annotated on the aggregates.

Which to pick?

Fetching a single chapter (your example): Option 1 (3 queries) is simplest and very efficient.
Fetching many chapters at once: Option 2 scales better—one query, one row per chapter, aggregated relations.

Thoughts & Analysis

This was a pretty hefty response, but glancing at the headings for the two options convinced me to focus entirely on the first option. I don't know whether it's the best way to do what I'm doing, but skimming through the code satisfied me that it passes the "good enough for now" check.

Considering this query was one of the first things I was writing in this handler, I don't even have anything to run yet. The goal at this stage is to just get something working as quickly as possible, and I can iterate on it from there - can't benchmark to check for performance if there's nothing to run the benchmarks against. It was easy enough to paste this over what I had already.

I wasn't able to run it to verify that it actually worked (yet), but checking on all the return types turned up a problem.

Turn 4

Prompt

I'm going with option 1; however, my story type is:
const story: {
    id: string;
    userId: string;
    title: string;
    description: string;
    createdAt: Date;
    updatedAt: Date;
} | null
Isn't this impossible? chaptersTable.storyId is non-nullable, so it must always be attached to a story; the story variable being null would indicate a broken database state, so why is it getting typed as | null on return from the query?

Response

Great question. The | null is coming from the join type, not from your schema constraints. In Drizzle, a leftJoin always types the right-hand side as possibly null because SQL could produce NULLs for that table—even if your FK is non-nullable. Drizzle doesn’t “prove” FK integrity at the type level.

Fix: use innerJoin

Since a chapter must have a story, use an inner join so Drizzle can type story as non-null.
const [row] = await db
  .select({
    chapter: chaptersTable,
    story: storiesTable,
  })
  .from(chaptersTable)
  .innerJoin(storiesTable, eq(chaptersTable.storyId, storiesTable.id)) // ← innerJoin
  .where(
    and(
      eq(storiesTable.userId, session.user.id),
      eq(chaptersTable.id, chapterId)
    )
  )
if (!row) throw new Error('Not found')
const { chapter, story } = row // story: Story (non-null)
Why not keep leftJoin?

With leftJoin, Drizzle must type story as Story | null because SQL allows unmatched rows (even if your DB never produces them in practice).

You could assert non-null with as NonNullable<typeof story>, but that’s a type assertion, not safety.

Edge case to be aware of

If you’re using RLS on stories, an innerJoin will filter the chapter out entirely when the story is invisible to the user (which is usually what you want). A leftJoin could still return the chapter row with a NULL story in raw SQL, but your WHERE stories.userId = ... would then eliminate it anyway. So innerJoin is both simpler and correctly typed.

Alternative API (optional)

If you define relations(...), Drizzle’s relational API can return nested objects with correct nullability:
const chapter = await db.query.chaptersTable.findFirst({
  where: eq(chaptersTable.id, chapterId),
  with: {
    story: true,          // non-null due to inner relation + FK
    paragraphs: true,
  },
})
But for your chosen Option 1 (separate queries), switching to innerJoin is the cleanest fix.

Thoughts & Analysis

This turn shows my lack of familiarity with the different kinds of SQL joins. The explanation in the response seems reasonable enough, and swapping the leftJoin for an innerJoin did result in the correct type for the story variable. However, this is one of those points that I don't want to blindly trust the model - where I don't have the knowledge to be confident in my judgement of the response. Plus, this is a good opportunity to shore up some weak points in what I know!

Off to Google we go for a bit of research:

In the case of joining the stories table to the chapters table, left join and inner join will, in fact work the same way but the latter will have better types. A left join would include all the chapters and only pull in stories if they have a matching chapter, while an inner join only includes chapters that have a matching story. However, because the chapters always have an associated story, it will result in the same set regardless of what join we use.

I did briefly look at the alternative API option provided at the end of that response; but I don't currently have relations set up in the database schema, and I figured swapping out the join was easier than trying to do that.

Conclusion

Overall, this conversation saved me a fair bit of time manually poking around and trying to get a functional query. I suspect, had I done this by hand, I would have ended up at the split queries from Turn 3 anyway; but it would have taken longer than the <5 minutes it took to reach that conclusion while talking to ChatGPT.

It also highlighted my knowledge gap around something I should really have already known (SQL joins) and pushed me to learn about it. I feel like this is an excellent way to pick up new skills - when you have a very specific problem in front of you to allow you to immediately put into practice what you learned. This approach does require you to be skeptical of what LLM chatbots are saying, however, as it can be actively detrimental to your educational progress if you blindly accept the answer in front of you.

I'm going to try to keep posting conversations like these when I find ones that were particularly helpful - either in furthering my work or when I learn something new as a result of the conversation. Writing this up helps me maximize what I get out of the conversation, but I also hope that it can inspire others in how they incorporate AI into their workflows, and that some of you have suggestions on how I can improve my workflow as well!

DEV Community

Real ChatGPT Conversations: Data Fetching and Query Optimization

Background

Turn 1

Prompt

Response

Thoughts & Analysis

Turn 2

Prompt

Response

Thoughts & Analysis

Turn 3

Prompt

Response

1) Single-chapter: do 3 tight queries (simplest + efficient)

2) One round-trip with aggregated subqueries (good if you’ll batch many chapters)

Which to pick?

Thoughts & Analysis

Turn 4

Prompt

Response

Fix: use `innerJoin`

Why not keep `leftJoin`?

Edge case to be aware of

Alternative API (optional)

Thoughts & Analysis

Conclusion

Top comments (0)

Background

Turn 1

Prompt

Response

Thoughts & Analysis

Turn 2

Prompt

Response

Thoughts & Analysis

Turn 3

Prompt

Response

1) Single-chapter: do 3 tight queries (simplest + efficient)

2) One round-trip with aggregated subqueries (good if you’ll batch many chapters)

Which to pick?

Thoughts & Analysis

Turn 4

Prompt

Response

Fix: use innerJoin

Why not keep leftJoin?

Edge case to be aware of

Alternative API (optional)

Thoughts & Analysis

Conclusion

Fix: use `innerJoin`

Why not keep `leftJoin`?