From Screen Recording to Agent Skill: A New Pipeline
The way we teach AI agents to perform tasks is fundamentally broken. We've been asking humans to write code that describes visual interactions, then wondering why the results are brittle and frustrating to maintain.
The Old Way: Write, Break, Fix, Repeat
Traditional browser automation follows this pattern:
- Inspect the DOM
- Write selectors
- Test the script
- Watch it break when the UI changes
- Repeat
This approach requires technical expertise, creates an ongoing maintenance burden, and fails outright when a redesign changes the markup the selectors depend on.
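To make that failure mode concrete, here is a small sketch that uses only Python's standard library. The two HTML snippets, the class names, and the helpers `find_buttons`, `by_class`, and `by_label` are invented for illustration; a real script would drive a browser, but the breakage pattern is the same. A lookup keyed to a CSS class stops matching after a redesign, while a lookup keyed to the visible label still finds the button.

```python
from html.parser import HTMLParser

# Two hypothetical versions of the same page: the redesign renames the class.
PAGE_V1 = '<form><button class="btn-primary">Book now</button></form>'
PAGE_V2 = '<form><button class="cta-large">Book now</button></form>'


class ButtonCollector(HTMLParser):
    """Collects (class attribute, visible text) for every <button>."""

    def __init__(self):
        super().__init__()
        self.buttons = []
        self._in_button = False
        self._current_class = ""
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            self._in_button = True
            self._current_class = dict(attrs).get("class") or ""
            self._text = []

    def handle_data(self, data):
        if self._in_button:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "button" and self._in_button:
            label = "".join(self._text).strip()
            self.buttons.append((self._current_class, label))
            self._in_button = False


def find_buttons(html):
    parser = ButtonCollector()
    parser.feed(html)
    parser.close()
    return parser.buttons


def by_class(html, class_name):
    # Selector-style lookup: depends on an implementation detail of the markup.
    return [b for b in find_buttons(html) if class_name in b[0].split()]


def by_label(html, label):
    # Intent-style lookup: depends on what the user actually sees.
    return [b for b in find_buttons(html) if b[1].lower() == label.lower()]


print(by_class(PAGE_V1, "btn-primary"))  # matches on the old page
print(by_class(PAGE_V2, "btn-primary"))  # no match after the redesign
print(by_label(PAGE_V2, "Book now"))     # still matches
```

A label-based lookup is not immune to change either (the button text can be reworded), so this is only meant to show why describing what the user sees tends to survive redesigns better than describing where it sits in the markup.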
A Better Approach: Demonstrate, Extract, Structure, Execute
What if we flipped the model? Instead of humans writing code for computers, let computers learn from humans.
Here's the pipeline that makes this possible:
Step 1: Demonstrate
Record your screen while performing the task naturally. Click buttons, fill forms, navigate pages—just do what you normally do. No special tools or coding required.
Step 2: Extract
AI analyzes the recording to identify:
- The goal you're trying to achieve
- Individual actions (clicks, inputs, navigation)
- UI elements you interacted with
- Decision points and branches
- Error states and recovery paths
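One way to picture the output of this step is as a small set of records. The sketch below is an assumption about what such a structure could look like, not a description of any particular tool's internals; the class names (`Action`, `DecisionPoint`, `ExtractedDemo`) and all sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Action:
    kind: str                    # "click", "input", or "navigate"
    target: str                  # human-readable description of the UI element
    value: Optional[str] = None  # typed text or destination, if any


@dataclass
class DecisionPoint:
    condition: str  # what the demonstrator appeared to check
    if_true: str
    if_false: str


@dataclass
class ExtractedDemo:
    goal: str
    actions: List[Action] = field(default_factory=list)
    decision_points: List[DecisionPoint] = field(default_factory=list)
    recovery_paths: List[str] = field(default_factory=list)

    def ui_elements(self) -> List[str]:
        """Unique UI elements touched, in first-seen order."""
        seen: List[str] = []
        for action in self.actions:
            if action.target not in seen:
                seen.append(action.target)
        return seen


# Made-up sample data for a booking demonstration.
demo = ExtractedDemo(
    goal="Book a meeting through the website booking form",
    actions=[
        Action("navigate", "schedule page", "/schedule"),
        Action("click", "first available slot in calendar widget"),
        Action("input", "name field", "Ada"),
        Action("input", "email field", "ada@example.com"),
        Action("click", "submit button"),
    ],
    decision_points=[
        DecisionPoint("no slots visible", "advance to next week", "pick first slot"),
    ],
    recovery_paths=["if a required field is flagged, refill it and resubmit"],
)

print(demo.ui_elements())
```

Keeping targets as plain descriptions rather than selectors is the design choice that matters here: the record captures what was interacted with, and leaves locating it to whichever agent runs the skill later.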
Step 3: Structure
The extracted information is written out as a SKILL.md file, a structured text format that is readable by humans and executable by agents. Unlike brittle selectors, SKILL.md describes intent:
## Goal
Book a meeting through the website booking form
## Workflow
1. Navigate to /schedule
2. Identify calendar widget with available time slots
3. Select first available slot
4. Fill required contact fields
5. Submit booking
## Success Criteria
- Confirmation message appears
- OR confirmation email received
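Because the format is plain text, the example above can be read back with very little code. The sketch below assumes only what the example shows: sections introduced by `## ` headings, numbered workflow steps, and bulleted criteria. The function name `parse_skill` is hypothetical, and a fuller SKILL.md specification may well contain more than this handles.

```python
import re

SKILL_MD = """\
## Goal
Book a meeting through the website booking form
## Workflow
1. Navigate to /schedule
2. Identify calendar widget with available time slots
3. Select first available slot
4. Fill required contact fields
5. Submit booking
## Success Criteria
- Confirmation message appears
- OR confirmation email received
"""


def parse_skill(text):
    """Split a SKILL.md-style document into {section title: [content lines]}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            # Drop list markers such as "1. " or "- " so only the content remains.
            sections[current].append(re.sub(r"^(\d+\.|-)\s+", "", line))
    return sections


skill = parse_skill(SKILL_MD)
for title, lines in skill.items():
    print(title, "->", lines)
```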
Step 4: Execute
Any compatible agent can now execute this skill. Because SKILL.md describes what to accomplish (not where to click), the same skill should carry over to other websites with similar functionality, as long as site-specific details such as the /schedule path are adjusted.
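How an agent actually carries out each step is up to the agent. As a rough sketch of the control flow only, the loop below hands each workflow step to a `perform` callback and then treats the success criteria as alternatives, matching the OR in the booking example. The function `run_skill` and both callbacks are hypothetical stand-ins; in a real agent they would be backed by vision and language models rather than lambdas.

```python
from typing import Callable, List


def run_skill(
    steps: List[str],
    success_criteria: List[str],
    perform: Callable[[str], bool],
    check: Callable[[str], bool],
) -> dict:
    """Run steps in order; succeed if any success criterion holds afterwards."""
    completed: List[str] = []
    for step in steps:
        if not perform(step):
            # Stop at the first step the agent could not carry out.
            return {"ok": False, "completed": completed, "failed_step": step, "met": []}
        completed.append(step)
    # The example skill joins its criteria with OR, so one match is enough.
    met = [c for c in success_criteria if check(c)]
    return {"ok": bool(met), "completed": completed, "failed_step": None, "met": met}


steps = [
    "Navigate to /schedule",
    "Identify calendar widget with available time slots",
    "Select first available slot",
    "Fill required contact fields",
    "Submit booking",
]
criteria = ["Confirmation message appears", "Confirmation email received"]

# Stand-in agent: every step "works", and only the on-page
# confirmation can be observed.
result = run_skill(
    steps,
    criteria,
    perform=lambda step: True,
    check=lambda criterion: criterion == "Confirmation message appears",
)
print(result["ok"], result["met"])
```

Returning the failed step alongside the completed ones gives a caller enough context to retry or fall back to a recovery path, which is where the error states captured during extraction would come in.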
Why This Pipeline Changes Everything
For Developers:
- No more maintaining brittle selectors
- Skills are portable across agent frameworks
- Version control friendly (text-based format)
For Domain Experts:
- Create automation without coding
- Capture institutional knowledge in executable form
- Share skills with team members
For Organizations:
- Build reusable skill libraries
- Reduce automation maintenance costs
- Democratize tool creation
The Technical Magic
Three technologies make this pipeline possible:
- Computer Vision: Identifies UI elements by appearance and context, not just DOM position
- LLM Understanding: Extracts intent and workflow from visual demonstrations
- Structured Format: SKILL.md provides a contract between humans and agents
Real-World Applications
- Customer Support: Record a ticket resolution once, then reuse it as a skill
- Sales Operations: Demo the CRM workflow, get a reusable skill
- HR Onboarding: Capture the account setup process once and rerun it for every new hire
- DevOps: Document deployment procedures in executable form
Try It Yourself
Want to see this pipeline in action?
🚀 Check out SkillForge — record your screen, get a SKILL.md file
🔥 Support our Product Hunt launch
What workflows would you turn into agent skills?