TL;DR: Comet is Perplexity's AI browser, built around real-time DOM awareness and agentic capabilities. Here's the architecture that makes that possible.
The Core Problem They Solved
Traditional browsers have no memory of what you're trying to do: every page load is isolated. Even with AI extensions, the assistant is usually blind to page structure; it sees rendered pixels, not semantic meaning.
Comet's innovation: a hybrid architecture where the browser understands what you're looking at in real time, maintains context across sessions, and can execute multi-step workflows autonomously.
Architecture Overview
┌─────────────────────────────────────┐
│ Presentation Layer (Chromium)       │
├─────────────────────────────────────┤
│ DOM Interpretation Engine           │
│ ├── Semantic Parser                 │
│ ├── Element Classifier              │
│ └── Action Mapper                   │
├─────────────────────────────────────┤
│ Context Management                  │
│ ├── Local Vector Store              │
│ ├── Session State                   │
│ └── Cross-Tab Memory                │
├─────────────────────────────────────┤
│ Agent Orchestration                 │
│ ├── Task Planning                   │
│ ├── Workflow Execution              │
│ └── Background Processes            │
└─────────────────────────────────────┘
1. DOM Interpretation Engine
Unlike screen scrapers that use pixel coordinates or XPath selectors, Comet builds a semantic graph of every page.
What it extracts:
- Element roles (button, input, link, container)
- Data relationships (form groups, table hierarchies)
- Interactive capabilities (clickable, editable, submittable)
- Contextual meaning (what this button does, not just where it is)
Example structure:
{
"element": "button",
"role": "submit",
"context": "flight_search_form",
"action": "execute_search",
"required_fields": ["origin", "destination", "date"]
}
This semantic understanding is why it adapts when sites change their CSS or layout. It isn't looking for #submit-btn; it's looking for "the button that submits this form."
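This is not Comet's code; it's a minimal, hypothetical Python sketch of the selector-free idea, using only the standard library's html.parser. It records which form each interactive element belongs to (keyed by the form's aria-label or name, an assumption) and assigns a coarse role from HTML defaults, so "the button that submits this form" is still found after every id and class changes. Element, SemanticParser, role_of and find_submit are invented names, not Comet APIs.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional


@dataclass
class Element:
    tag: str
    attrs: dict
    form: Optional[str]  # accessible name of the enclosing <form>, if any
    text: str = ""


class SemanticParser(HTMLParser):
    """Collects interactive elements and records which form encloses each one."""

    def __init__(self) -> None:
        super().__init__()
        self.elements = []
        self._forms = []  # stack of enclosing form names
        self._button: Optional[Element] = None  # <button> whose label is being read

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            # Prefer an accessible name over id/class, which churn across redesigns.
            self._forms.append(a.get("aria-label") or a.get("name") or "form")
        elif tag in ("button", "input", "a"):
            el = Element(tag, a, self._forms[-1] if self._forms else None)
            self.elements.append(el)
            if tag == "button":
                self._button = el

    def handle_endtag(self, tag):
        if tag == "form" and self._forms:
            self._forms.pop()
        elif tag == "button":
            self._button = None

    def handle_data(self, data):
        if self._button is not None:
            self._button.text += data.strip()


def role_of(el: Element) -> str:
    """Map an element to a coarse semantic role, following HTML defaults."""
    kind = (el.attrs.get("type") or "").lower()
    if el.tag == "button":
        # Inside a form, a <button> with no type attribute submits it.
        return "submit" if el.form and kind in ("", "submit") else "button"
    if el.tag == "input":
        return "submit" if kind == "submit" else "input"
    return "link"


def find_submit(html: str, form: str) -> Optional[Element]:
    """Return whatever submits the named form, regardless of its id or classes."""
    parser = SemanticParser()
    parser.feed(html)
    for el in parser.elements:
        if el.form == form and role_of(el) == "submit":
            return el
    return None


# The same form before and after a redesign that renamed every id and class.
OLD = ('<form aria-label="flight search"><input name="origin">'
       '<button id="submit-btn" class="btn">Search</button></form>')
NEW = ('<form aria-label="flight search"><div class="x9f"><input name="origin">'
       '<button class="a1b2">Search</button></div></form>')

for page in (OLD, NEW):
    print(find_submit(page, "flight search").text)  # prints "Search" both times
```

A real interpreter would also weigh ARIA roles, visible labels and layout; keying forms on aria-label alone is a deliberate simplification.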
2. Context Management System
Local Vector Store
Every interaction, page, and query gets embedded into a local vector database. This enables:
- Semantic search across your browsing history - "Find that React hook article I read last week"
- Cross-tab context - The browser knows you have 3 tabs about Rust lifetimes open
- Session persistence - Context generally survives browser restarts on the same device
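To make the vector-store idea concrete, here's a toy local store in Python. It's a sketch under loud assumptions: embed() is a bag-of-words stand-in for a real embedding model (so "hooks" won't match "hook"), search is brute-force cosine similarity, and persistence is a plain JSON file. None of these names reflect how Comet actually stores data.

```python
import json
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real system would use a neural model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LocalVectorStore:
    """In-memory (text, vector, metadata) store that can persist to a local file."""

    def __init__(self) -> None:
        self.items = []

    def add(self, text: str, **meta) -> None:
        self.items.append((text, embed(text), meta))

    def search(self, query: str, k: int = 3) -> list:
        """Return the k most similar entries as (score, text, metadata)."""
        q = embed(query)
        scored = [(cosine(q, vec), text, meta) for text, vec, meta in self.items]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:k]

    def save(self, path: str) -> None:
        # Persist only text + metadata; vectors are cheap to recompute on load.
        with open(path, "w", encoding="utf-8") as f:
            json.dump([{"text": t, "meta": m} for t, _, m in self.items], f)

    @classmethod
    def load(cls, path: str) -> "LocalVectorStore":
        store = cls()
        with open(path, encoding="utf-8") as f:
            for row in json.load(f):
                store.add(row["text"], **row["meta"])
        return store
```

For example, after store.add("Understanding the React useEffect hook", kind="page") and store.add("Rust lifetimes explained", kind="tab"), store.search("React hook article", k=1) returns the React page first. Saving text rather than vectors keeps the file small and lets you swap embedding models later.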
Stateful Conversations
Unlike traditional search, where every query is independent:
Query 1: "How does Rust handle memory?"
Query 2: "Show me an example" ← Knows "example" means a Rust memory example
Query 3: "What about lifetimes?" ← Maintains full conversation thread
The context generally persists within the same device session. Note: Context may be lost when clearing cache, switching devices, or in some edge cases. Cross-device sync requires account login.
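The difference between stateless search and a stateful thread is mostly what gets sent along with each query. Here's a minimal Python sketch, assuming a chat-style model that accepts a list of role/content messages (a common convention, not a confirmed detail of Comet). The Conversation class and its max_turns cap are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Keeps earlier turns so a follow-up is read against the whole thread."""

    turns: list = field(default_factory=list)
    max_turns: int = 20  # hypothetical cap that keeps the prompt bounded

    def _record(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})
        del self.turns[:-self.max_turns]  # drop the oldest turns beyond the cap

    def ask(self, query: str) -> list:
        """Record the question and return every turn a model would receive."""
        self._record("user", query)
        return list(self.turns)

    def answer(self, text: str) -> None:
        self._record("assistant", text)


chat = Conversation()
chat.ask("How does Rust handle memory?")
chat.answer("Through ownership and borrowing, checked at compile time.")
context = chat.ask("Show me an example")
# The model sees all three turns, so "example" resolves to a Rust memory example.
print(len(context))  # 3
```

A stateless search box would send only "Show me an example"; the whole trick is that the earlier turns ride along.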
3. Agent Orchestration Layer
This is where the "agentic" part happens: the browser can plan and execute multi-step workflows.
Task decomposition example:
User goal: "Find the cheapest flight LAX → Tokyo,
non-stop only, after 6 pm departure"
Execution plan:
1. Identify relevant airline sites
2. Open background tabs for each
3. Parallel extraction of flight data
4. Filter by constraints (non-stop, time)
5. Compare pricing
6. Build a comparison table
7. Pre-fill the booking form with the best option
Each step adapts based on what it finds. If United doesn't have non-stop flights, it skips to the next airline without breaking the workflow.
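Steps 2 through 5 of that plan map onto a fan-out, filter, reduce pipeline. The Python sketch below is hypothetical: each source is a plain function standing in for a background tab that extracts flights, the fixture data is made up, and Flight and plan_and_run are invented names. A source that raises is treated as empty, which mirrors the skip-and-continue behavior described above. Steps 6 and 7 (table and form pre-fill) are left out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import time
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Flight:
    airline: str
    price: float
    stops: int
    departs: time


def plan_and_run(sources: Dict[str, Callable[[], List[Flight]]],
                 min_departure: time = time(18, 0)) -> Optional[Flight]:
    """Fan out extraction, apply constraints, and return the cheapest match."""

    def safe_fetch(fetch: Callable[[], List[Flight]]) -> List[Flight]:
        try:
            return fetch()
        except Exception:
            return []  # site changed, timed out, etc.: skip it, keep the workflow alive

    # Steps 2-3: one worker per source ("background tab"), extracted in parallel.
    with ThreadPoolExecutor(max_workers=len(sources) or 1) as pool:
        results = list(pool.map(safe_fetch, sources.values()))

    # Step 4: filter by the user's constraints (non-stop, departs at or after 6 pm).
    candidates = [f for flights in results for f in flights
                  if f.stops == 0 and f.departs >= min_departure]

    # Step 5: compare pricing; None if nothing satisfied the constraints.
    return min(candidates, key=lambda f: f.price, default=None)


def unavailable() -> List[Flight]:
    raise TimeoutError("site did not respond")  # simulates a source that breaks


sources = {  # made-up fixtures standing in for airline sites
    "airline-a": lambda: [Flight("Airline A", 900.0, 1, time(19, 0))],
    "airline-b": lambda: [Flight("Airline B", 1100.0, 0, time(18, 30)),
                          Flight("Airline B", 950.0, 0, time(11, 0))],
    "airline-c": lambda: [Flight("Airline C", 1050.0, 0, time(20, 15))],
    "airline-d": unavailable,
}

best = plan_and_run(sources)
print(best.airline, best.price)  # Airline C 1050.0
```

The cheapest fare overall (Airline A) has a stop, and Airline B's cheaper non-stop leaves at 11:00, so both are filtered out before price is compared.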
4. Background Assistants
Unlike traditional browser automations that block the UI, Comet runs tasks asynchronously in isolated contexts.
Architecture:
Main Thread (User browsing)
│
├── Background Worker Pool
│ ├── Assistant 1: Price monitoring
│ ├── Assistant 2: Email drafting
│ └── Assistant 3: Tab organization
│
└── Shared Context Bus
Each assistant has access to the DOM interpreter and context store, but runs independently. You can keep working while assistants handle parallel tasks.
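A rough model of the worker pool and shared context bus, sketched with Python's asyncio. Caveats: asyncio gives cooperative concurrency on a single thread, whereas a browser would more likely isolate assistants in separate processes or workers; ContextBus and the assistant names are hypothetical; and asyncio.sleep stands in for real page loads and model calls. What it does show is the shape: assistants run independently, and the main loop hears from them only through the bus.

```python
import asyncio


class ContextBus:
    """Minimal pub/sub: assistants publish findings; each subscriber gets a queue."""

    def __init__(self) -> None:
        self._subscribers = []

    def subscribe(self) -> asyncio.Queue:
        q = asyncio.Queue()
        self._subscribers.append(q)
        return q

    async def publish(self, source: str, message: str) -> None:
        for q in self._subscribers:
            await q.put((source, message))


async def assistant(name: str, bus: ContextBus, steps: int) -> None:
    """A hypothetical background assistant that reports progress on the bus."""
    for i in range(steps):
        await asyncio.sleep(0.01)  # stand-in for page loads / model calls
        await bus.publish(name, f"step {i + 1}/{steps}")


async def main() -> list:
    bus = ContextBus()
    inbox = bus.subscribe()  # e.g. the browser UI listening for updates
    workers = [
        asyncio.create_task(assistant("price-monitor", bus, 2)),
        asyncio.create_task(assistant("email-drafter", bus, 1)),
        asyncio.create_task(assistant("tab-organizer", bus, 3)),
    ]
    # All three assistants make progress concurrently on one event loop.
    await asyncio.gather(*workers)
    received = []
    while not inbox.empty():
        received.append(inbox.get_nowait())
    return received


if __name__ == "__main__":
    for source, msg in asyncio.run(main()):
        print(source, msg)
```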
5. Privacy Architecture
Local-first approach:
- DOM parsing: Primarily local
- Vector embeddings: Stored locally
- Session context: Local database
- AI inference requires cloud calls when using agentic features
What typically stays on your machine:
- Browsing history
- Passwords and payment info
- Most tab state and sessions
What may be sent to servers:
- Queries and minimal page context for AI inference
- Some metadata and feature usage diagnostics
- Technical telemetry (even in privacy modes with agentic features enabled)
Note: Incognito mode prioritises local processing, but some minimal diagnostics may still be transmitted when using AI features. Always check current privacy settings for your use case.
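One way to enforce the "stays local" versus "may be sent" split is to build the cloud payload from an explicit allow-list instead of scrubbing data afterward. The Python sketch below is an assumption about how such a boundary could look, not Perplexity's actual payload format: PageContext and build_inference_payload are hypothetical, and the 500-character excerpt limit is arbitrary.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class PageContext:
    url: str
    title: str
    visible_text: str
    form_values: dict = field(default_factory=dict)  # passwords, card numbers: stay local


def build_inference_payload(query: str, page: PageContext, max_chars: int = 500) -> dict:
    """Assemble only what a cloud model needs; form values and full URLs never leave."""
    parts = urlsplit(page.url)
    return {
        "query": query,
        # Scheme + host only: paths and query strings can carry session tokens.
        "origin": f"{parts.scheme}://{parts.hostname or ''}",
        "title": page.title,
        "excerpt": page.visible_text[:max_chars],  # truncated "minimal page context"
    }


page = PageContext(
    url="https://shop.example.com/checkout?session=abc123",
    title="Checkout",
    visible_text="Order summary: 1 item. Shipping: standard.",
    form_values={"card_number": "4111 1111 1111 1111"},
)
print(build_inference_payload("Is there a cheaper shipping option?", page))
```

The visible-text excerpt can itself contain personal data, so an allow-list like this is a floor, not a full privacy guarantee.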
How It Handles Complex Workflows
Multi-site research example:
Task: Compare React vs Vue for a new project
Comet's execution:
1. Parse semantic intent (comparison task)
2. Identify relevant sources (docs, benchmarks, community)
3. Extract structured data from each:
- Performance metrics
- Bundle sizes
- Learning curves
- Ecosystem maturity
4. Synthesize the findings into a comparison table
5. Cite sources for each claim
No manual tab switching. No copy-pasting between pages. The agent does the research work.
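Steps 3 to 5 amount to pivoting extracted claims into a table while keeping provenance attached to every cell. A minimal Python sketch, assuming each extraction yields a (subject, metric, value, source) record; Claim and build_table are hypothetical names, and the sketch says nothing about how Comet formats its output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str  # e.g. "React" or "Vue"
    metric: str   # e.g. "Bundle size"
    value: str    # the value as extracted from the page
    source: str   # URL or title of the page it came from


def build_table(claims: list) -> str:
    """Pivot extracted claims into a plain-text table; every cell carries a citation."""
    sources = []   # citation order = first appearance
    subjects = []  # column order = first appearance
    cells = {}     # metric -> {subject: "value [n]"}
    for c in claims:
        if c.source not in sources:
            sources.append(c.source)
        if c.subject not in subjects:
            subjects.append(c.subject)
        ref = sources.index(c.source) + 1
        cells.setdefault(c.metric, {})[c.subject] = f"{c.value} [{ref}]"

    rows = [" | ".join(["Metric", *subjects])]
    for metric, by_subject in cells.items():
        rows.append(" | ".join([metric, *(by_subject.get(s, "n/a") for s in subjects)]))
    rows += [f"[{i}] {src}" for i, src in enumerate(sources, 1)]
    return "\n".join(rows)
```

Cells with no extracted value show "n/a" instead of being filled in, which keeps the table honest about what each source actually said.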
Why This Architecture Matters
Traditional browsers are document viewers with bolt-on features.
Comet is built around a different question: "What if the browser understood intent, not just clicks?"
The result is a system where you describe goals rather than steps, context persists naturally, and repetitive workflows become one-line commands.
Try It
Download at https://pplx.ai/comet/browser (Windows, Mac)
Can Perplexity beat Google? What's your take on Comet?