perfectcorp-tryon-concierge: Virtual Try-On via Chat

#hermeschallenge #ai #python #agents

The Checkout Line That Never Ends

My partner was looking for a lipstick shade online. She had about 15 tabs open. Each one had a "virtual try-on" button that opened a separate app, required a selfie, took 20 seconds to load, and then showed a result in a shade slightly different from the one she clicked. She did this for an hour.

The product UX was fine. The shopping UX was broken. Each try-on experience was isolated. You could not say "show me what I looked like in shade 3 but with the eye shadow from the second tab." You could not ask a question. You could not compare. You could only click around and hope.

That is the gap perfectcorp-tryon-concierge tries to close. You get a chat interface. You upload a photo once. Then you describe what you want in plain English, and an agent handles the calls to PerfectCorp's YouCam APIs and shows you the result inline. No tabs. No 15-step menus. Just describe the look.

Shape of the Fix

The core flow is straightforward. A user message comes in, the agent parses intent, maps it to a YouCam API endpoint and product ID, calls the API, and returns the result image.

from tryon_concierge import TryOnAgent

agent = TryOnAgent(api_key="YOUR_PERFECTCORP_KEY")

result = agent.run(
    user_message="Show me with a dark burgundy lip and natural brow",
    photo_path="selfie.jpg"
)

# result.image_url   -> rendered try-on image
# result.products    -> list of matched products with IDs
# result.raw_calls   -> list of API calls made, for debugging
print(result.products)
# [{"type": "lip_color", "product_id": "LPC-0042", "shade": "Bordeaux"},
#  {"type": "eyebrow", "product_id": "EB-0017", "style": "natural"}]

The intent parser maps natural language phrases to YouCam product categories. "Dark burgundy lip" hits the lip color endpoint. "Natural brow" hits the eyebrow shaping endpoint. If the user asks for something the API does not support, the agent says so instead of silently returning a bad result.

What It Does NOT Do

This is not a recommendation engine. It does not tell you what shade will look good on you based on your skin tone. It does not browse product catalogs. It does not pull real product names from any retailer's inventory.

What it does is let you describe a look and see it rendered using YouCam's existing try-on technology. The product IDs it works with are the ones documented in the YouCam API spec. If you want to try on a specific SKU from Sephora's catalog, you would need to build that mapping layer yourself.
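
If you do need that mapping layer, it can start as a plain lookup table. The sketch below is only an illustration: the SKU string and the `resolve_product_id` helper are hypothetical, and the product ID is borrowed from the example output earlier in this post.

```python
# Hypothetical mapping from a retailer's SKUs to try-on product IDs.
# You would populate this from your own catalog data.
SKU_TO_PRODUCT_ID = {
    "RETAILER-SKU-123": "LPC-0042",
}


def resolve_product_id(sku: str) -> str:
    """Translate a retailer SKU into a product ID, or fail loudly."""
    try:
        return SKU_TO_PRODUCT_ID[sku]
    except KeyError:
        raise LookupError(f"no try-on product mapped for SKU {sku!r}") from None
```

Failing loudly on an unmapped SKU matches the agent's general stance: say what is unsupported instead of rendering the wrong shade.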

It also does not handle video. YouCam supports real-time AR, but this concierge is photo-only. The agent accepts a still image and returns a still image.

Inside the Project

The design has three layers. The outer layer is Gradio, which handles the UI and the file upload widget. The middle layer is the agent harness, which handles each chat turn: it parses intent and orchestrates API calls. The inner layer is a thin API client that wraps the YouCam endpoints with typed request and response models.

The intent parser is rule-based, not another LLM call. Most beauty intents fall into a manageable set of categories: lip color, eyeshadow, blush, foundation, brow, hair color, and hair style. A classification layer maps user phrases to these categories using keyword matching and a small set of regex patterns. This keeps the agent fast and predictable. Adding an LLM call for intent parsing would slow things down and introduce nondeterminism into what should be a deterministic lookup.
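
Here is a minimal sketch of what keyword-and-regex classification can look like. The category names follow the list above, but the patterns, the `classify` function, and the `UnsupportedIntent` error are my illustrative assumptions, not the project's actual rules.

```python
import re

# Hypothetical patterns, one per category listed above. Not the project's real rules.
CATEGORY_PATTERNS = {
    "lip_color": re.compile(r"\b(lips?|lipstick)\b", re.IGNORECASE),
    "eyeshadow": re.compile(r"\b(eye\s?shadow|smoky eye)\b", re.IGNORECASE),
    "blush": re.compile(r"\bblush\b", re.IGNORECASE),
    "foundation": re.compile(r"\bfoundation\b", re.IGNORECASE),
    "eyebrow": re.compile(r"\b(eye)?brows?\b", re.IGNORECASE),
    "hair_color": re.compile(r"\bhair\s+colou?r\b", re.IGNORECASE),
    "hair_style": re.compile(r"\b(hair\s?style|haircut)\b", re.IGNORECASE),
}


class UnsupportedIntent(ValueError):
    """Raised when no category matches, so the agent can say so to the user."""


def classify(message: str) -> list[str]:
    """Return every category whose pattern matches, in table order."""
    matches = [
        name for name, pattern in CATEGORY_PATTERNS.items()
        if pattern.search(message)
    ]
    if not matches:
        raise UnsupportedIntent(f"no supported category in: {message!r}")
    return matches
```

The same input always produces the same categories, which is the point of keeping this step out of an LLM.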

Error recovery follows a simple ladder. If the primary product match fails, the agent tries the closest category fallback. If the API call itself fails, the agent reports the failure with the raw error rather than returning a placeholder image. The test suite has 41 tests covering the intent mapping, the API client, the error ladder, and the Gradio integration.
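
The ladder can be sketched in a few lines. Everything named here is hypothetical: the fallback table, `ApiCallError`, `TryOnOutcome`, and `run_with_ladder` stand in for whatever the harness actually uses, and the two callables are injected so the sketch runs without a network.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical fallback table: which category to try when the first match fails.
CATEGORY_FALLBACK = {"eyeliner": "eyeshadow", "lip_gloss": "lip_color"}


class ApiCallError(RuntimeError):
    """Carries the provider's raw error text up to the chat layer."""


@dataclass
class TryOnOutcome:
    image_url: Optional[str]
    category_used: Optional[str]
    error: Optional[str] = None


def run_with_ladder(
    category: str,
    find_product: Callable[[str], Optional[str]],
    call_api: Callable[[str], str],
) -> TryOnOutcome:
    # Rung 1: primary product match.
    product_id = find_product(category)
    used = category
    # Rung 2: closest category fallback, if one is defined.
    if product_id is None and category in CATEGORY_FALLBACK:
        used = CATEGORY_FALLBACK[category]
        product_id = find_product(used)
    if product_id is None:
        return TryOnOutcome(None, None, error=f"no product match for {category!r}")
    # Rung 3: surface the raw API error; never substitute a placeholder image.
    try:
        return TryOnOutcome(call_api(product_id), used)
    except ApiCallError as exc:
        return TryOnOutcome(None, used, error=str(exc))
```

Returning the error as data rather than raising lets the chat layer show the failure inline, next to the message that caused it.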

The project was built for PerfectCorp's developer challenge, which is why YouCam is the backend. But the agent harness is generic enough that you could swap in a different try-on provider by replacing the API client layer.
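
One way to picture that swap is a small protocol that the harness depends on, with typed request and response models on either side. This is a sketch under my own assumptions: `TryOnRequest`, `TryOnResponse`, `TryOnClient`, and `FakeProviderClient` are hypothetical names, not the project's classes.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TryOnRequest:
    photo_path: str
    category: str
    product_id: str


@dataclass(frozen=True)
class TryOnResponse:
    image_url: str


class TryOnClient(Protocol):
    """Anything with a matching apply() method can act as the provider."""

    def apply(self, request: TryOnRequest) -> TryOnResponse: ...


class FakeProviderClient:
    """Stand-in provider; a real client would make an HTTP call here."""

    def apply(self, request: TryOnRequest) -> TryOnResponse:
        url = f"https://example.test/{request.category}/{request.product_id}.jpg"
        return TryOnResponse(image_url=url)


def render(client: TryOnClient, request: TryOnRequest) -> str:
    # The harness only sees the protocol, never a concrete provider.
    return client.apply(request).image_url
```

Because `Protocol` uses structural typing, a new provider client does not need to inherit from anything. It only needs the same `apply` signature.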

When This Is Useful

This is useful if you are building a beauty e-commerce experience and want to add a conversational try-on layer. It is also useful as a reference implementation for wrapping any synchronous visual API in a chat interface.

It is not useful if you need real-time AR. It is not useful if your catalog is large enough that you need a search or recommendation step before the try-on call. And it is not useful if your users want to compare multiple looks side by side in a grid view rather than sequentially in a chat.

The 41 tests run fast because they stub the YouCam API responses rather than hitting the live endpoint. To run against the real API, set the LIVE=1 flag, which skips the stubs and uses your actual API key.
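
A stub-or-live switch like that can be as small as the sketch below. I am assuming here that LIVE is read as an environment variable alongside PERFECTCORP_API_KEY; `StubTransport`, `make_transport`, and the `live_factory` parameter are hypothetical names for illustration.

```python
import os

# Canned response the stub returns instead of calling the network.
STUB_RESPONSE = {"image_url": "https://example.test/stub.jpg"}


class StubTransport:
    """Returns a canned payload so tests never touch the live endpoint."""

    def post(self, path, payload):
        return dict(STUB_RESPONSE)


def use_live_api(environ=os.environ) -> bool:
    # Assumption: the flag is the environment variable LIVE set to "1".
    return environ.get("LIVE") == "1"


def make_transport(environ=os.environ, live_factory=None):
    """Pick the stub by default and the live transport only when asked."""
    if use_live_api(environ):
        if live_factory is None:
            raise RuntimeError("LIVE=1 is set but no live transport was provided")
        return live_factory(environ["PERFECTCORP_API_KEY"])
    return StubTransport()
```

Defaulting to the stub means a missing or mistyped flag can never spend real API quota by accident.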

The Gradio interface is a prototype surface. If you are shipping this to production, you would replace Gradio with your actual frontend and keep the agent harness as the backend service. The harness exposes a clean TryOnAgent.run() interface with no Gradio dependency, so the swap is straightforward.

Install or Try It

pip install perfectcorp-tryon-concierge

# Set your PerfectCorp API key
export PERFECTCORP_API_KEY="your_key_here"

# Launch the Gradio demo
python -m tryon_concierge.demo

# Or use the agent directly in Python
python -c "
from tryon_concierge import TryOnAgent
agent = TryOnAgent()
r = agent.run('red lip, smoky eye', photo_path='photo.jpg')
print(r.image_url)
"

The demo starts a local Gradio server at http://localhost:7860. Upload a photo, type a description, and the agent will call YouCam and render the result.

Related Libraries

Library	What it does	When to pair it
agentcast	Enforces structured output from LLM responses	Pair if you swap the rule-based intent parser for an LLM and want typed JSON back instead of raw text
agentvet	Validates tool call arguments before execution	Pair to guard the product ID lookup before hitting the YouCam API
agentguard	Egress allowlist for outbound HTTP calls	Pair to lock the agent to only YouCam endpoints, nothing else
tool-schema-from-fn	Generates tool schemas from Python function signatures	Pair when exposing try-on functions as LLM tool calls
agentsnap	Snapshots agent call traces for debugging	Pair when you need a replay-able record of which API calls fired

What Is Next

The intent parser handles single-category requests well. Multi-category requests like "dark lip, light blush, natural brow all at once" work, but they currently fire as three sequential API calls. A batched call path would reduce latency, and that is the next change.
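
To make the difference concrete, here is a rough sketch of the two paths. It assumes the backend can accept several effects in one request, which I have not confirmed against the YouCam API; `Effect`, the payload shape, and the injected `send` callable are all hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Effect:
    category: str
    product_id: str


def sequential_calls(effects, send):
    # Behaviour described above: one request per effect.
    return [send({"effects": [asdict(effect)]}) for effect in effects]


def batched_call(effects, send):
    # Proposed path: a single request that carries every effect.
    return send({"effects": [asdict(effect) for effect in effects]})
```

With three effects, the first function pays three round trips and the second pays one, provided the provider supports a combined request.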

The other gap is hair. YouCam has hair color and style APIs, and the client supports them, but the intent mappings for hair are less precise than for face products. More coverage there is on the backlog.

Longer term: the interesting problem is stateful sessions. Right now, each agent.run() call is independent. If a user says "make it a bit lighter" as a follow-up, the agent does not know what "it" refers to. Conversation memory with a rolling state object for the current look would make the experience feel much more natural. That is not in the current version.
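
A rolling state object could look something like the sketch below. None of this exists in the project: `LookState`, `apply_effect`, and `lighten_last` are hypothetical, and treating "lighter" as a lower intensity value is my assumption about how a follow-up might be resolved.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class LookState:
    # Maps category -> (product_id, intensity between 0.0 and 1.0).
    effects: dict = field(default_factory=dict)
    # The category the user touched most recently, i.e. what "it" refers to.
    last_category: Optional[str] = None


def apply_effect(state, category, product_id, intensity=1.0):
    """Return a new state with one effect added or replaced."""
    effects = {**state.effects, category: (product_id, intensity)}
    return LookState(effects=effects, last_category=category)


def lighten_last(state, step=0.2):
    """Resolve "make it a bit lighter" against the most recent effect."""
    if state.last_category is None:
        raise ValueError("nothing to adjust yet")
    product_id, intensity = state.effects[state.last_category]
    new_intensity = max(0.0, round(intensity - step, 2))
    return apply_effect(state, state.last_category, product_id, new_intensity)
```

Each turn returns a new state instead of mutating the old one, so earlier looks stay available if the user wants to go back.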

Source: MukundaKatta/perfectcorp-tryon-concierge