Anama Drum
GPT-6, As I See It — from talker to dependable teammate


Description: "My personal notes and bets on GPT-6: plan-first behavior, real tool orchestration, verifiable reasoning, controlled memory, real-time multimodality, and strict action boundaries."

tags: ai, machinelearning, agents, productivity

These are my personal notes and bets about the next generation of AI. No mysticism, no marketing — just how I think GPT-6 will actually feel in use.

A leap, not a patch

1) Plans first, words second.

GPT-6 should plan before it speaks: goal → sub-tasks → checks → execution. Expect fewer rambling paragraphs and more concrete steps I can run, verify, or delegate.

2) Real tool orchestration.

Beyond simple function calls: selection, permissioning, retries, rollbacks, and alternatives. Think a lightweight task OS that knows when to request access and when to fail fast.
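To make the "task OS" idea concrete, here is a minimal sketch of what orchestration with scopes, retries, and rollbacks could look like. Every name here (`ToolCall`, `run_plan`, the scope strings) is hypothetical, not a real API:

```python
# Sketch of a "lightweight task OS": each tool call carries a permission
# scope, a retry budget, and an optional rollback hook.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ToolCall:
    name: str
    action: Callable[[], str]                 # the actual side effect
    scope: str = "read"                       # "read" | "write" | "transact"
    retries: int = 2
    rollback: Optional[Callable[[], None]] = None

def run_plan(steps: list[ToolCall], granted_scopes: set[str]) -> list[str]:
    done: list[ToolCall] = []
    results: list[str] = []
    for step in steps:
        if step.scope not in granted_scopes:
            # Fail fast: request access explicitly instead of guessing.
            raise PermissionError(f"{step.name} needs scope '{step.scope}'")
        for attempt in range(step.retries + 1):
            try:
                results.append(step.action())
                done.append(step)
                break
            except Exception:
                if attempt == step.retries:
                    # Undo completed steps in reverse order, then surface the error.
                    for prior in reversed(done):
                        if prior.rollback:
                            prior.rollback()
                    raise
    return results
```

The point of the sketch is the shape, not the code: permissions are checked before execution, retries are bounded, and failure triggers an explicit rollback rather than a silent half-finished state.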

3) Built-in self-checks.

Not “trust me,” but logs, tests, and sources. Show how a number was derived, where claims came from, and what was validated along the way.

4) Memory with brakes.

Useful memory with obvious modes: session, short-term, and long-term — plus visible controls to forget or export what’s stored about me.
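What "memory with brakes" could mean in code: three visible tiers plus explicit forget and export controls. This is an illustrative sketch, not any vendor's API:

```python
# Hypothetical memory store with visible tiers and user-facing controls.
import json
from typing import Optional

class Memory:
    def __init__(self) -> None:
        self.tiers: dict[str, dict[str, str]] = {
            "session": {}, "short_term": {}, "long_term": {}
        }

    def remember(self, tier: str, key: str, value: str) -> None:
        self.tiers[tier][key] = value

    def forget(self, tier: str, key: Optional[str] = None) -> None:
        if key is None:
            self.tiers[tier].clear()          # wipe a whole tier
        else:
            self.tiers[tier].pop(key, None)   # forget one fact

    def export(self) -> str:
        # Everything stored about the user, inspectable at any time.
        return json.dumps(self.tiers, indent=2)
```

The design choice that matters is that `forget` and `export` exist at all: memory stops being a black box the moment the user can wipe a tier or dump its contents.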

5) Multimodal in real time.

Text, images, audio, video, tables — processed together. If the audio contradicts a slide, it notices.

6) Context that’s actually relevant.

Longer windows help, but smarter selection matters more. Adaptive retrieval that keeps the right 1% instead of “just add more context.”
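A toy version of "keep the right 1%": score candidate snippets against the query and keep only the top few, instead of stuffing everything into the window. Real systems use embeddings and rerankers; plain word overlap stands in for that here:

```python
# Illustrative context selection: rank snippets by word overlap with the
# query and keep the top k. A stand-in for real retrieval/reranking.
def select_context(query: str, snippets: list[str], k: int = 2) -> list[str]:
    query_words = set(query.lower().split())
    ranked = sorted(
        snippets,
        key=lambda s: len(query_words & set(s.lower().split())),
        reverse=True,
    )
    return ranked[:k]
```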


Less magic, more audit

  • Explicit scopes. Read vs write vs transact, each with distinct confirmation paths. Logged and reproducible actions.
  • Sandboxes by default. Code runs in isolation; external data is scanned and quarantined when needed.
  • Right to retrain and to un-train. I decide what sticks for long-term personalization — and what gets erased.
  • Explainability that matters. Short rationales tied to sources, tests, or checks — not a hollow “because I said so.”
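The first two bullets can be sketched together: each scope gets its own confirmation path, and every decision is appended to an audit log so actions stay reproducible. All names and policy values here are made up for illustration:

```python
# Sketch of scoped authorization with distinct confirmation paths and an
# append-only audit log.
from datetime import datetime, timezone

POLICY = {
    "read": "auto",             # no confirmation needed
    "write": "confirm",         # ask the user once
    "transact": "confirm_2fa",  # stronger confirmation for money / irreversible ops
}

audit_log: list[dict] = []

def authorize(action: str, scope: str, confirmed: bool = False) -> bool:
    path = POLICY[scope]
    allowed = path == "auto" or confirmed
    audit_log.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "scope": scope,
        "confirmation_path": path,
        "allowed": allowed,
    })
    return allowed
```

Reads pass automatically, anything with side effects waits for confirmation, and the log records every decision either way — which is exactly what makes an action reviewable a week later.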

What this changes for daily work

From answers to outcomes.

Instead of “here’s a long essay,” I expect “here’s a plan, drafts, a calendar, and PRs/briefs ready for review.”

Less micromanagement.

Requests like “collect, test, and deploy” become a single command. Missing access is requested once; then it proceeds.

Personalization with consent.

Use my tone and constraints when I allow it. Otherwise, default to neutral and safe.


If you build software

  • Type-safe interfaces. Contracts via JSON Schema / Protobuf / OpenAPI with linting and validation built in.
  • Testability from the start. Auto-generated unit/property tests for generated code, plus reproducible decision traces.
  • Agent composition. Specialized sub-agents (data, synthesis, QA, publishing) coordinated by a planner.
  • Hybrid compute. Low-latency/private work on-device; heavy lifting in the cloud.
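For the contracts bullet, a minimal stand-in: validate a tool call's arguments against a schema-like contract before the model may execute it. A real system would use a full validator (for example the `jsonschema` package against an actual JSON Schema); this sketch only checks required fields and primitive types:

```python
# Minimal contract check: required fields plus primitive types, as a
# stand-in for full JSON Schema validation of tool-call arguments.
SEARCH_CONTRACT = {
    "required": ["query", "limit"],
    "types": {"query": str, "limit": int},
}

def validate_call(args: dict, contract: dict = SEARCH_CONTRACT) -> list[str]:
    errors: list[str] = []
    for field in contract["required"]:
        if field not in args:
            errors.append(f"missing required field: {field}")
    for field, expected in contract["types"].items():
        if field in args and not isinstance(args[field], expected):
            errors.append(f"{field} must be {expected.__name__}")
    return errors
```

The payoff is that a malformed call is rejected with a machine-readable error list before anything runs, instead of failing halfway through a side effect.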

If you ship content and campaigns

  • One pass from research to draft. Gather sources → verify numbers → outline → draft → visual options → quick polish in my voice.
  • Workflow end-to-end. Editorial calendar, A/B subjects, drafts for email and landing pages, UTM tagging, and a post-campaign brief.
  • Metrics baked in. Clear attribution, client-ready tables, and a concise “what to do next.”
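The UTM step above is the kind of glue work that should be fully mechanical. A small helper using only the standard library; the parameter values are examples, not a real campaign:

```python
# Attach UTM parameters to a landing URL, preserving any existing query.
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def add_utm(url: str, source: str, medium: str, campaign: str) -> str:
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query.update({
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    })
    return urlunparse(parts._replace(query=urlencode(query)))
```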

The limits won’t disappear

  • Hallucinations shrink; they don’t disappear. The point is to catch more of them automatically — with tests, majority voting, or external checks.
  • Garbage in, garbage out. Without clean data and the right permissions, plans will be shallow.
  • Ethics & compliance matter. Some actions still need a human in the loop. That’s not a bug.

Questions I’m asking myself about GPT-6

  • Can it show its work without drowning me in logs?
  • Will it handle access and secrets responsibly, by design?
  • Does it know when to stop and ask?
  • Can I reproduce its decisions a week later?

If these become “yes,” we’re not just getting a smarter chatbot — we’re getting a dependable teammate.


Near-term scenarios I expect

  • Market scan in a day. Curated sources, verified numbers, a one-page memo, and a six-slide summary — ready to forward.
  • Docs that never rot. From repo and tickets to living technical docs with examples and tests.
  • Campaigns as pipelines. Content → landing → UTM → email series → summary report, with approvals at each gate.
  • Code assistance that argues its case. Not just “here’s a patch,” but “here’s why it’s correct,” plus failing tests if it’s not.

How I’m preparing now

  • Tidy the data: schemas, sources, permissions.
  • Write objectives as measurable outcomes — “Done” must be checkable.
  • Break work into stages with gates; each gate has a success criterion.
  • Define permission policies: who can read, write, ship, or spend — through the model.
  • Normalize review habits: logs, tests, and references as culture, not afterthought.
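The "stages with gates" item is the one I can already practice today. A minimal sketch, with hypothetical names: each stage produces an artifact, and a gate checks a success criterion before the pipeline moves on:

```python
# Run stages in order; each gate must approve the stage's artifact
# before the next stage starts.
from typing import Callable

Stage = tuple[str, Callable[[], str], Callable[[str], bool]]

def run_gated(stages: list[Stage]) -> dict[str, str]:
    artifacts: dict[str, str] = {}
    for name, work, gate in stages:
        output = work()
        if not gate(output):
            # Stop at the failing gate instead of shipping a bad artifact.
            raise RuntimeError(f"gate failed at stage '{name}'")
        artifacts[name] = output
    return artifacts
```

Usage is the point: "Done" becomes checkable because every gate is an explicit predicate, not a vibe.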

Day-one prompts I’ll try

  • “Collect 10 reliable sources on X, verify the key numbers, and give me a one-page brief and a 6-slide deck. Include links and what you tested.”
  • “Plan a 30-day email series for niche Y with A/B subjects, drafts, and UTMs. Do not send anything; queue for my review.”
  • “Audit repo Z, propose 3 PRs with tests, and explain the rollback plan.”
  • “Build a competitive matrix: pricing, features, reviews. Return a table and three prioritized recommendations.”

TL;DR

My bet: GPT-6 moves from a talker to a planner that proves itself — with tool orchestration, verifiable reasoning, controlled memory, real-time multimodality, and strict action boundaries. It won’t just write about work. It will help me get work done — and show how it did it.

If I’m wrong, I’ll update this post. If I’m right, we’ll spend fewer hours wrestling with glue work and more time deciding what actually matters.
