Progress: 35% | Focus: Architecture & Tech Stack
Today marked a significant milestone — transitioning from concept to code. I officially started building the Marketing Research Multi-Format Generator as a standalone Python CLI tool, and the architecture decisions I made today will shape the entire project.
The Core Architecture Decision
After yesterday's planning, I settled on a modular design with three key components: Content Transformer, Output Manager, and Format-Specific Generators. Here's the structure I built:
```
marketing-research-tool/
├── main.py                  # Entry point and CLI interface
├── config/
│   └── output_config.yaml   # Format configuration
├── templates/
│   ├── theme1.html          # Professional HTML template
│   ├── chart.js             # Chart configurations
│   └── report_styles.css    # Modern CSS styling
├── outputs/                 # Format-specific output directories
│   ├── html/
│   ├── pdf/
│   ├── pptx/
│   ├── images/
│   └── notion/
└── temp/
    └── content.json         # Standardized content schema
```
The breakthrough was realizing I needed a Content Transformer that converts Claude's raw HTML output into a standardized schema, then an Output Manager that coordinates multiple format generators simultaneously. This means adding new formats (like social media images or Notion pages) doesn't break existing functionality.
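To make the extensibility claim concrete, here is a minimal sketch of an Output Manager built around a generator registry. The names `OutputManager`, `register`, `generate_all`, and `html_generator` are hypothetical and not taken from the actual project, and the generators run one after another here rather than truly in parallel.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical generator signature: take the standardized content dict and an
# output directory, write one file, and return its path.
Generator = Callable[[dict, Path], Path]


class OutputManager:
    """Runs every registered format generator against the same content."""

    def __init__(self, output_root: Path) -> None:
        self.output_root = output_root
        self._generators: Dict[str, Generator] = {}

    def register(self, fmt: str, generator: Generator) -> None:
        # Adding a format is one call; existing generators are untouched.
        self._generators[fmt] = generator

    def generate_all(self, content: dict) -> Dict[str, Path]:
        results = {}
        for fmt, generator in self._generators.items():
            out_dir = self.output_root / fmt  # e.g. outputs/html/
            out_dir.mkdir(parents=True, exist_ok=True)
            results[fmt] = generator(content, out_dir)
        return results


def html_generator(content: dict, out_dir: Path) -> Path:
    """Toy generator: writes the title and sections as bare HTML."""
    body = "".join(
        f"<h2>{s['title']}</h2><p>{s['content']}</p>" for s in content["sections"]
    )
    path = out_dir / "report.html"
    path.write_text(f"<h1>{content['title']}</h1>{body}", encoding="utf-8")
    return path
```

With this shape, supporting a new format such as Notion would mean writing one more function and calling `manager.register("notion", notion_generator)`, without touching the other generators.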
Python Over JavaScript: The Strategic Choice
I wrestled with this decision for hours. JavaScript would have meant faster prototyping and web integration, but Python won for several crucial reasons:
- Library ecosystem: `python-pptx`, `pdfkit`, and `Jinja2` are mature, well-documented libraries.
- AI API integration: Python's `requests` library and JSON handling feel more natural for API work.
- Data processing: If I need `pandas` for analytics later, Python's the obvious choice.
- CLI tooling: `Click` and `argparse` make building professional CLIs straightforward.
The trade-off? Slower initial development, but more robust long-term architecture.
Tech Stack Deep Dive
Core Components:
- Anthropic Claude API: For intelligent content generation and research analysis
- wkhtmltopdf: HTML-to-PDF conversion with professional styling
- YAML configuration: Clean, readable format control via `output_config.yaml`
- Jinja2-style templating: Professional HTML templates with modern CSS
- JSON schema: Standardized content structure in `temp/content.json`
The Smart Setup Decision:
Instead of complex CLI frameworks, I kept it simple: a single `main.py` with interactive prompts. Users just run:
```bash
python main.py
```
Enter their research topic, and get multiple professional formats automatically. The magic happens in the background with the Output Manager coordinating everything.
The Standardized Content Schema
I designed a JSON schema that captures everything needed across all output formats:
The schema, stored in `temp/content.json`:
```json
{
  "title": "str",
  "generation_date": "str",
  "sections": [{"title": "str", "content": "str"}],
  "metrics": {"kpis": [], "data_points": []},
  "chart_data": {},
  "images": [],
  "color_palette": ["#primary", "#secondary", "..."]
}
```
This schema is the secret sauce — generate once from Claude, transform to standardized format, then render across HTML, PDF, PowerPoint, social media images, and even Notion pages. All formats stay perfectly synchronized.
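As an illustration of the transform step, here is a rough sketch that maps simple HTML onto the schema using only the standard library. It assumes the raw HTML uses `<h1>` for the report title, `<h2>` for section titles, and `<p>` for section text. That layout, the ISO date format, and the names `SectionParser` and `transform` are assumptions for this example, not details from the project.

```python
from datetime import date
from html.parser import HTMLParser


class SectionParser(HTMLParser):
    """Collects the <h1> text as the title and each <h2> plus its <p> text as a section."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.sections = []
        self._tag = None  # the h1/h2/p element we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "p"):
            self._tag = tag
        if tag == "h2":
            # Every <h2> opens a new section.
            self.sections.append({"title": "", "content": ""})

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._tag is None:
            return
        if self._tag == "h1":
            self.title += text
        elif self._tag == "h2":
            self.sections[-1]["title"] += text
        elif self._tag == "p" and self.sections:
            # Paragraphs are appended to the most recent section.
            current = self.sections[-1]
            current["content"] = (current["content"] + " " + text).strip()


def transform(raw_html: str) -> dict:
    """Build a schema-shaped dict; fields not derivable from the HTML stay empty."""
    parser = SectionParser()
    parser.feed(raw_html)
    parser.close()
    return {
        "title": parser.title,
        "generation_date": date.today().isoformat(),  # assumed format
        "sections": parser.sections,
        "metrics": {"kpis": [], "data_points": []},
        "chart_data": {},
        "images": [],
        "color_palette": [],
    }
```

The returned dict contains only JSON-serializable values, so it can be written to `temp/content.json` with `json.dump` and handed to every generator unchanged.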
The AI API Learning Curve
Working with Claude's API introduced unexpected challenges:
Challenge 1: Response Parsing
Claude doesn't always return perfectly structured JSON. I had to build robust parsing with fallback strategies.
Challenge 2: Rate Limiting
I had to learn to implement exponential backoff and request queuing to stay within API limits.
Challenge 3: Prompt Engineering
I discovered that prompt structure dramatically affects output quality. Template-based prompts with clear formatting instructions work best.
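The first two challenges can be sketched with the standard library alone. The helper names `parse_json_with_fallbacks` and `call_with_backoff` are hypothetical, and so are the specific fallback order and delay values. The sketch retries on any exception for brevity; real code would more likely catch only the API client's rate-limit error. Request queuing is not shown.

```python
import json
import random
import re
import time

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable


def parse_json_with_fallbacks(text: str):
    """Try progressively looser strategies; return None if all of them fail."""
    candidates = [text]  # Strategy 1: the whole response is valid JSON.
    # Strategy 2: the contents of a Markdown code fence, optionally tagged json.
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        candidates.append(fenced.group(1))
    # Strategy 3: everything between the first "{" and the last "}".
    start, end = text.find("{"), text.rfind("}")
    if 0 <= start < end:
        candidates.append(text[start : end + 1])
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return None


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

The `sleep` parameter exists only so the delay can be swapped out in tests. In normal use, something like `call_with_backoff(lambda: send_request(prompt))` would be enough, where `send_request` stands in for whatever function performs the API call.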
The breakthrough moment came when I realized I could use Claude not just for content generation, but for content transformation — taking raw research data and converting it into presentation-ready insights.
Multi-Format Pipeline Reality
The actual pipeline I built is beautifully simple:
- User Input: Interactive prompt for research topic
- AI Generation: Single Claude API call for comprehensive research
- Content Transform: Raw HTML → Standardized JSON schema
- Format Distribution: JSON → Multiple generators (HTML, PDF, PowerPoint, etc.)
- Output Coordination: All formats saved with timestamps to organized directories
The genius is in `results_index.json`, which tracks every generated report, making it easy to find and reference past research. Users get a complete research suite from one simple command.
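One way such an index could be maintained is sketched below. The entry fields (`topic`, `generated_at`, `outputs`) and the function name `record_report` are assumptions for illustration; the real file layout may differ.

```python
import json
from datetime import datetime
from pathlib import Path


def record_report(index_path: Path, topic: str, outputs: dict) -> dict:
    """Append one entry to the index file, creating the file if it is missing."""
    entries = []
    if index_path.exists():
        entries = json.loads(index_path.read_text(encoding="utf-8"))
    entry = {
        "topic": topic,
        "generated_at": datetime.now().isoformat(timespec="seconds"),
        # Map each format name to the path of the file that was written.
        "outputs": {fmt: str(path) for fmt, path in outputs.items()},
    }
    entries.append(entry)
    index_path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entry
```

Calling this once per run, after all generators have finished, leaves a single list that can be searched by topic or date to find earlier reports.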
Today's Coding Wins ✅
- Complete project structure with organized directories
- Content Transformer working (HTML → JSON schema)
- Output Manager coordinating multiple formats
- Professional HTML template with modern CSS (`theme1.html`)
- YAML-based configuration system
- Claude API integration and content standardization
- Results tracking with `results_index.json`
The Unexpected Breakthrough
The biggest revelation wasn't technical — it was user experience. Instead of building a complex CLI with dozens of options, I created something dead simple:
```bash
python main.py
```
Enter your topic → wait 30 seconds → get professional research in 5 formats.
Sometimes the best architecture decision is making things disappear for the user.
Tomorrow's Focus
With the foundation solid, Day 6 will focus on implementing the first concrete output generator — probably PDF since it's the most straightforward. I'll also tackle template design and CSS styling for professional-looking reports.
The architecture feels right: clean, extensible, and future-proof. Sometimes the hardest part isn't writing code — it's designing systems that won't break when you scale them.
Current Status: Foundation complete, ready to build upward.
Building something meaningful, one commit at a time.