The traditional system design process is a familiar ritual. It involves long hours at the whiteboard, endless pages of design documents, and protracted debates over data models and service boundaries. While thorough, this process can be slow, sometimes taking weeks to move from a high-level concept to a concrete implementation plan. But what if we could compress that timeline, exploring more options and validating ideas with greater speed and confidence?
Recently, while laying the groundwork for a new engineering analytics platform, I decided to push the boundaries of AI-assisted development. Over the course of 11 distinct, long-running sessions and nearly 100 million tokens of interaction with a large language model (LLM), I moved beyond simple code completion. The AI became a true architectural co-pilot. Together, we explored data models, pressure-tested API contracts with generated shell commands, and scaffolded entire services. This wasn't about asking an AI to "write the code for me"; it was a collaborative dialogue that transformed a complex architectural challenge into a rapid, iterative design-and-build cycle.
This article shares the practical workflows and mindset shifts that allowed me to leverage an LLM as a partner in architectural design, drastically accelerating the journey from concept to code.
The AI Co-Pilot Mindset: From Dictation to Dialogue
The first and most crucial step is to change how you think about interacting with an LLM. We're conditioned to use it as a hyper-intelligent search engine or a code snippet generator. To unlock its architectural potential, you must treat it as a collaborative partner.
This means moving from a transactional model ("write a function that does X") to a conversational one ("let's design a system that achieves Y"). The power of modern LLMs lies in their vast context window. By keeping a design session within a single, continuous conversation, the AI builds a comprehensive understanding of your goals, constraints, and previous decisions. It becomes a tireless junior architect that you can guide, question, and brainstorm with.
Key workflows in this collaborative model include:
- Exploratory Prototyping: Rapidly generating and comparing different approaches. Instead of spending a day mocking up one data model, you can generate three distinct options (e.g., relational, document-based, graph-based) in under an hour and discuss the trade-offs with the AI.
- Conceptual Validation: Using the AI as a sounding board to challenge your own assumptions. Prompts like, "What are the potential failure modes of this API design?" or "How would this schema handle a massive increase in write volume?" can reveal blind spots you might have missed.
- Iterative Refinement: Feeding the AI's output back into the conversation with new constraints. The design process becomes a tight loop of generating, critiquing, and refining, all within the same contextual conversation.
Phase 1: Exploring the Solution Space with Data Modeling
Every great system is built on a solid data model. This is where the AI co-pilot shines first. For my analytics platform, I needed to model abstract concepts like "cycle time," "bus factor," and "developer contributions." Instead of starting with a blank slate, I started with a conversation.
My initial prompt was intentionally broad, aimed at exploring the solution space:
"I am designing a database schema for an engineering analytics platform. The core concepts are Projects, Developers, Commits, Pull Requests, and Deployments. I need to calculate metrics like 'PR cycle time' (broken down into stages like 'time to first review' and 'time to approval') and 'bus factor' for key parts of a codebase. Propose a relational schema using Python's Pydantic models for validation and SQLAlchemy for ORM mapping."
This prompt sets the stage, defines the core entities, and specifies the technology stack. The LLM's initial response provided a solid, normalized schema.
```python
# AI-Generated Foundational Models (First Pass)
import datetime
from typing import List, Optional

from pydantic import BaseModel, Field


class Developer(BaseModel):
    id: int
    username: str
    email: str


class Project(BaseModel):
    id: int
    name: str
    repository_url: str


class Commit(BaseModel):
    id: str = Field(..., description="Git commit hash")
    project_id: int
    author_id: int
    message: str
    committed_at: datetime.datetime


class PullRequest(BaseModel):
    id: int
    project_id: int
    author_id: int
    title: str
    created_at: datetime.datetime
    merged_at: Optional[datetime.datetime] = None
    closed_at: Optional[datetime.datetime] = None
```
This was a good start, but it was too simple for tracking detailed cycle time stages. The power is in the follow-up prompt. I didn't need to re-explain everything; I just needed to refine the existing context.
"This is a great starting point. Now, let's refine the `PullRequest` model to better support cycle time analysis. Introduce a new model, `PullRequestEvent`, that can track key timestamps like `review_requested`, `comment_added`, `approval_given`, and `merge_committed`. A Pull Request should have a list of these events."
This led to a much more robust and purpose-built design.
```python
# AI-Assisted Refined Models (Second Pass)
import datetime
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, Field

# ... (Developer, Project, Commit models remain the same)


class EventType(str, Enum):
    REVIEW_REQUESTED = "review_requested"
    COMMENT_ADDED = "comment_added"
    APPROVAL_GIVEN = "approval_given"
    MERGE_COMMITTED = "merge_committed"
    CLOSED = "closed"


class PullRequestEvent(BaseModel):
    id: int
    pull_request_id: int
    actor_id: int
    event_type: EventType
    created_at: datetime.datetime = Field(default_factory=datetime.datetime.utcnow)


class PullRequest(BaseModel):
    id: int
    project_id: int
    author_id: int
    title: str
    created_at: datetime.datetime
    events: List[PullRequestEvent] = []
```
In just two prompts, we went from a vague requirement to a sophisticated, event-sourced model for a key business metric. This iterative loop of generating and refining allowed me to evaluate design paths in minutes, a task that would have otherwise consumed hours of manual diagramming and documentation.
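As a quick sanity check that the event list really does support stage-level analysis, here is a small helper of the kind this model enables (my own illustrative sketch, not the article's production code; plain dataclasses stand in for the Pydantic models to keep it dependency-free):

```python
# Illustrative sketch: derive one cycle-time stage straight from the
# refined PullRequest's event list. Dataclasses stand in for Pydantic here.
import datetime
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class EventType(str, Enum):
    REVIEW_REQUESTED = "review_requested"
    APPROVAL_GIVEN = "approval_given"
    MERGE_COMMITTED = "merge_committed"


@dataclass
class PullRequestEvent:
    event_type: EventType
    created_at: datetime.datetime


@dataclass
class PullRequest:
    created_at: datetime.datetime
    events: List[PullRequestEvent] = field(default_factory=list)


def time_to_first_approval(pr: PullRequest) -> Optional[datetime.timedelta]:
    """Duration from PR creation to the earliest APPROVAL_GIVEN event."""
    approvals = [e.created_at for e in pr.events
                 if e.event_type is EventType.APPROVAL_GIVEN]
    if not approvals:
        return None  # never approved (yet)
    return min(approvals) - pr.created_at
```

The same pattern extends to every other stage: each metric is just a fold over the ordered event list, which is exactly what makes the event-based design flexible.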
Phase 2: Validating Architectural Concepts with Shell Commands
A design that looks good on paper can fall apart in practice. The next phase of my AI collaboration was to bridge the gap between abstract models and concrete interactions. How would other services consume this data? How would we run analytics queries? I used the LLM to generate shell commands to simulate these interactions, pressure-testing the design before writing a single line of application code.
My prompt built directly on our previous conversation:
"Given the refined Pydantic/SQLAlchemy models, assume they are mapped to PostgreSQL tables named `developers`, `projects`, `pull_requests`, and `pull_request_events`. Write a `psql` query to calculate the average 'time to first approval' for all pull requests in a project with `id = 42` that were created last month. 'Time to first approval' is the duration between the PR's `created_at` timestamp and the timestamp of the first `APPROVAL_GIVEN` event."
The AI produced a complex but correct SQL query immediately.
```sql
-- AI-Generated SQL for Concept Validation
-- Note: event_type values match the lowercase strings of the EventType enum.
WITH FirstApproval AS (
    SELECT
        pre.pull_request_id,
        MIN(pre.created_at) AS first_approval_time
    FROM
        pull_request_events pre
    WHERE
        pre.event_type = 'approval_given'
    GROUP BY
        pre.pull_request_id
)
SELECT
    AVG(fa.first_approval_time - pr.created_at) AS avg_time_to_first_approval
FROM
    pull_requests pr
JOIN
    FirstApproval fa ON pr.id = fa.pull_request_id
WHERE
    pr.project_id = 42
    AND pr.created_at >= date_trunc('month', current_date - interval '1 month')
    AND pr.created_at < date_trunc('month', current_date);
```
This is more than just a code snippet. This query validates the data model. It proves that the event-based structure is capable of answering the critical business questions I need it to. If the query had been convoluted or required multiple complex joins, it would have been a clear signal to rethink the schema.
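You can take this validation one step further and actually run the query against the schema before any application code exists. The smoke test below is my own sketch (not from the design sessions): it stands up the two tables in SQLite instead of PostgreSQL, so the average is computed in days via `julianday()` rather than Postgres timestamp subtraction.

```python
# Schema smoke test: create the tables in an in-memory SQLite database and
# check that the event-based model can answer "time to first approval".
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pull_requests (
        id INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL,
        created_at TEXT NOT NULL
    );
    CREATE TABLE pull_request_events (
        id INTEGER PRIMARY KEY,
        pull_request_id INTEGER NOT NULL REFERENCES pull_requests(id),
        event_type TEXT NOT NULL,
        created_at TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO pull_requests VALUES (?, ?, ?)",
    [(1, 42, "2024-05-01 09:00:00"), (2, 42, "2024-05-02 09:00:00")],
)
conn.executemany(
    "INSERT INTO pull_request_events VALUES (?, ?, ?, ?)",
    [
        (1, 1, "approval_given", "2024-05-01 15:00:00"),  # +6h
        (2, 1, "approval_given", "2024-05-01 18:00:00"),  # later approval, ignored
        (3, 2, "approval_given", "2024-05-02 11:00:00"),  # +2h
    ],
)


def avg_days_to_first_approval(project_id: int):
    """Average creation-to-first-approval gap in days, or None if no approvals."""
    row = conn.execute("""
        WITH first_approval AS (
            SELECT pull_request_id, MIN(created_at) AS first_approval_time
            FROM pull_request_events
            WHERE event_type = 'approval_given'
            GROUP BY pull_request_id
        )
        SELECT AVG(julianday(fa.first_approval_time) - julianday(pr.created_at))
        FROM pull_requests pr
        JOIN first_approval fa ON pr.id = fa.pull_request_id
        WHERE pr.project_id = ?
    """, (project_id,)).fetchone()
    return row[0]
```

Ten minutes of this kind of scripting catches schema problems that would otherwise surface weeks later in production code.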
I extended this concept to API interactions:
"Now, design a RESTful API endpoint in a FastAPI style to create a new `PullRequestEvent`. Generate a sample `curl` command to post a new 'APPROVAL_GIVEN' event for pull request `id = 101` by developer `id = 5`."
```bash
# AI-Generated curl command for API interaction test
# Hypothetical API endpoint definition (for context):
# @app.post("/api/v1/pull-requests/{pr_id}/events")
# Note: the event_type string matches the lowercase EventType enum values.
curl -X POST \
  https://api.example.com/v1/pull-requests/101/events \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <your_auth_token>' \
  -d '{
    "actor_id": 5,
    "event_type": "approval_given"
  }'
```
Generating these commands forced me to think through the practicalities of the API contract. What does the request body look like? How is authentication handled? What is the resource hierarchy? This simple validation step helps spot awkward or inefficient API designs long before implementation begins.
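To make that contract concrete, here is a framework-free sketch of the handler logic such an endpoint might run. All names here are hypothetical illustrations; the real service would hang this off a FastAPI route and persist events through SQLAlchemy rather than an in-memory list.

```python
# Hypothetical handler logic for POST /api/v1/pull-requests/{pr_id}/events.
import datetime
from enum import Enum
from itertools import count
from typing import Dict, Tuple


class EventType(str, Enum):
    REVIEW_REQUESTED = "review_requested"
    COMMENT_ADDED = "comment_added"
    APPROVAL_GIVEN = "approval_given"
    MERGE_COMMITTED = "merge_committed"
    CLOSED = "closed"


_event_ids = count(1)  # stand-in for a database sequence
_events = []           # stand-in for the pull_request_events table


def create_pr_event(pr_id: int, payload: Dict) -> Tuple[int, Dict]:
    """Validate the request body and record the event; returns (status, body)."""
    try:
        # Accept either case, normalizing to the lowercase enum values.
        event_type = EventType(payload["event_type"].lower())
        actor_id = int(payload["actor_id"])
    except (KeyError, ValueError, AttributeError):
        return 422, {"detail": "actor_id and a valid event_type are required"}
    event = {
        "id": next(_event_ids),
        "pull_request_id": pr_id,
        "actor_id": actor_id,
        "event_type": event_type.value,
        "created_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    _events.append(event)
    return 201, event
```

Even a throwaway sketch like this surfaces contract decisions — which fields are client-supplied versus server-generated, and what an invalid payload should return — before they harden into implementation.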
Phase 3: Accelerating Foundational Implementation
With a data model and API structure validated, it was time to build. This is where the AI transitions from architect to pair programmer. Because the LLM retained the full context of our design conversation, it could generate a remarkably complete and accurate codebase skeleton.
My prompt was a direct instruction to translate our work into code:
"Using our final Pydantic models for the analytics platform, generate a complete Python FastAPI application file. Include:
- SQLAlchemy ORM class definitions corresponding to the Pydantic models.
- A PostgreSQL database connection setup.
- A basic REST API with CRUD endpoints for Projects and Developers.
- The `POST /api/v1/pull-requests/{pr_id}/events` endpoint we designed.
- A GET endpoint that executes the 'time to first approval' query we validated."
The resulting output was a multi-page file containing a runnable FastAPI application. It included the database engine setup, SQLAlchemy models with relationships, Pydantic models for request/response validation, and API routers with the requested logic. It saved me hours of tedious boilerplate and setup, allowing me to focus immediately on the more nuanced business logic.
Practical Tips for AI-Powered Architecture
- Maintain a Long-Running Context: This is the most important rule. Use a single chat/thread for an entire feature or system design. The AI's memory of your previous decisions, constraints, and code is its most powerful feature.
- Be the Architect, Not the Stenographer: The AI generates options; you provide the direction and make the critical decisions. Constantly question its output. Ask for alternatives, probe for trade-offs, and correct its mistakes. You are the senior partner in this collaboration.
- Provide Clear Constraints: The quality of the output depends on the quality of the input. Be specific about your technology stack, design patterns, and constraints. Instead of "write an API," say "write a RESTful API using Python 3.11, FastAPI, and the repository pattern."
- Seed It with Existing Code: When working on an existing system, provide the AI with relevant code snippets. "Given this existing `UserService` class, how would you add a method for password resets that is secure against timing attacks?" This grounds the AI's suggestions in your current reality.
Conclusion
The era of using LLMs solely for generating isolated functions or explaining error messages is over. By shifting our mindset from delegation to collaboration, we can unlock their potential as a powerful tool in the architectural design process. My experience building a complex system from scratch proved that an AI co-pilot can dramatically accelerate the entire lifecycle: from exploring a wide range of data models, to validating them with practical interaction simulations, to scaffolding a robust and well-structured codebase.
This collaborative workflow—Explore, Validate, Implement—doesn't replace the need for skilled architects and developers. It augments them. It frees us from tedious boilerplate and allows us to focus on what truly matters: making sound design decisions, solving complex business problems, and building better software, faster.