Jason Peterson

Building a Rubik's Cube Solver: A Primer on Claude Skills

Got a scrambled Rubik's Cube gathering dust on your desk? I did too, which prompted me to build a Claude Skill that solves it from six photos. Take pictures of each face, and Claude analyzes them, validates the cube state, and returns step-by-step solving instructions.

This project became a perfect way to understand how Claude Skills work—and discover some surprising aspects of building AI-powered workflows.

What Are Claude Skills?

Claude Skills are procedural workflows that extend Claude's capabilities by combining its natural strengths (vision, reasoning, conversation) with external tools like Python scripts, APIs, or command-line utilities.

The heart of a Claude Skill is a SKILL.md file—a markdown document containing step-by-step instructions that Claude follows. Think of it as a playbook that tells Claude how to orchestrate a complex task from start to finish.
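To make that concrete, here's a minimal sketch of how a SKILL.md might be laid out. The field names and wording below are illustrative, not copied from this project's actual file:

```markdown
---
name: rubiks-cube-solver
description: Solves a 3x3 Rubik's Cube from six photos of its faces
---

# Rubik's Cube Solver

## Workflow
1. Ask the user for six photos, one per face, in a fixed orientation.
2. Run scripts/solve_cube.py to validate each detected face.
3. Confirm the full cube state with the user, then return the solution.
```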

Why does this matter? Skills are repeatable, shareable, and specialized. You can build a workflow once, package it with supporting scripts, and anyone can use it conversationally through Claude. No API wrangling, no UI to build—just describe the procedure, and Claude handles the orchestration.

Anatomy of a Claude Skill (Using the Rubik's Solver)

Let's walk through how the Rubik's Cube solver works to see the components in action:

The SKILL.md file contains procedural instructions like:

### Step 1: Request Photos
Request exactly 6 photos from the user, one of each cube face
IN THIS SPECIFIC ORDER with correct orientation.

**Photo 1: White face (center) - Hold cube with BLUE on top**
**Photo 2: Orange face (center) - Hold cube with WHITE on top**
...

Python scripts handle the computational heavy lifting:

  • solve_cube.py validates color sequences, renders visualizations, and runs the Kociemba solving algorithm
  • Each face gets cached after validation
  • The solver concatenates all faces into a single cube string and computes a near-optimal solution (sketched just below)
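Here's a rough sketch of that last step. The helper function is illustrative (solve_cube.py has its own interface); the only part I'm treating as the real library API is kociemba.solve, which takes the 54 stickers as a single string in U, R, F, D, L, B face order:

```python
import kociemba

def solve_from_faces(faces: dict[str, str]) -> str:
    """Concatenate six 9-sticker faces and hand them to the Kociemba solver.

    `faces` maps face letters to 9-character sticker strings, where each
    sticker is named after the face whose center shares its color, e.g.
    {"U": "UUUUUUUUU", "R": "RRUFRRFDL", ...}. kociemba expects the six
    faces concatenated in U, R, F, D, L, B order (54 characters) and
    raises ValueError if the state is impossible.
    """
    cube_string = "".join(faces[face] for face in "URFDLB")
    return kociemba.solve(cube_string)  # e.g. "R U R' F2 D' L2 U ..."
```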

Claude's orchestration ties it all together:

  • Analyzes photos using vision capabilities to extract the 9-sticker color sequence for each face
  • Calls validation scripts with the detected colors
  • Renders an emoji-based cube visualization for user confirmation
  • Handles errors, corrections, and edge cases conversationally
  • Delivers the final step-by-step solving instructions

The workflow looks like this:

  1. Request 6 photos (with specific orientation instructions)
  2. Claude analyzes each photo → extracts 9-sticker color sequence
  3. Python validates and caches each face
  4. Renders emoji visualization for user confirmation (see the sketch after this list)
  5. Concatenates → Solves → Returns instructions
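The actual rendering lives in solve_cube.py and may look different; purely to illustrate step 4, mapping detected colors to emoji can be as simple as this sketch:

```python
# Illustrative only: solve_cube.py's real rendering may differ.
EMOJI = {"W": "⬜", "Y": "🟨", "R": "🟥", "O": "🟧", "B": "🟦", "G": "🟩"}

def render_face(stickers: str) -> str:
    """Render a 9-character color string (e.g. "WWOWWGRWB") as a 3x3 emoji grid."""
    rows = [stickers[i:i + 3] for i in range(0, 9, 3)]
    return "\n".join("".join(EMOJI[c] for c in row) for row in rows)

print(render_face("WWOWWGRWB"))
# ⬜⬜🟧
# ⬜⬜🟩
# 🟥⬜🟦
```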

The skill in action: Claude guides the user through photo capture, validation, and solving

Things That Might Surprise You as a Developer

Building this skill revealed some interesting characteristics of the Claude Skills environment:

Dependencies Are Casual

You just mention pip3 install kociemba in your SKILL.md file, and Claude installs it—every time you run the skill. There's no Docker image, no virtual environment, no persistent package state to manage. The configuration is completely stateless.

This feels weird at first. Where's my requirements.txt? What about version pinning? But it's also liberating: your skill is just instructions plus scripts. Claude handles the environment setup each time.
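Concretely, the dependency note inside a SKILL.md can be as plain as a sentence. The wording below is illustrative, not the project's exact text:

```markdown
## Setup
Before running any scripts, install the solving library:
pip3 install kociemba
```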

Everything Runs Remotely

Claude executes your scripts on a remote server, not your local machine. Your skill package is just a set of instructions and supporting files that Claude interprets and runs in its environment. You're not SSH-ing into a box or deploying to infrastructure—you're describing a workflow, and Claude makes it happen.

Iteration Takes Patience

Here's the practical friction: Claude Code can't directly access image files uploaded in chat during skill development. My workaround was zipping the skill contents (SKILL.md, scripts, etc.) and uploading the archive to Claude Desktop for testing.

Testing also means going through the full photo upload flow each iteration. It's more friction than local development with instant feedback, but manageable once you build a rhythm. Think of it as similar to testing a mobile app—you need to go through the actual user flow.

The Vision Challenge

One surprise: converting a photo into a nine-sticker color string isn't 100% reliable, even with clear, well-lit photos. Claude sometimes misreads sticker colors. That's why the skill includes a validation step, rendering an emoji visualization for the user to confirm.

This is still dramatically easier than the old way: coaxing OpenCV into reliable color detection with threshold tuning, color space conversions, lighting normalization, and endless edge case handling. I'll take "human confirms the visualization" over "debug CV pipeline for three hours" any day.

Skills Are Non-Deterministic

Here's what really impressed me: Skills aren't just rigid instruction-following. Claude doesn't execute your SKILL.md robotically.

When things go wrong—a misoriented photo, unclear lighting, user uploads faces out of order—Claude adapts. It will valiantly work to reach a valid solver state, handling scenarios I didn't anticipate when designing the workflow. The conversational nature means graceful degradation, not hard failures.

If a photo doesn't make sense, Claude asks clarifying questions. If the solver fails, it walks the user through corrections. This resilience comes "for free" from Claude's reasoning capabilities layered on top of your procedural instructions.

Following Claude's step-by-step instructions to solve the cube


Why Solutions Are Always ~20 Moves or Fewer

The Kociemba algorithm uses a two-phase approach that solves any valid cube state in roughly 20 moves, often fewer. (God's number, the proven maximum any position needs with optimal solving, is exactly 20, and Kociemba's two-phase solutions usually land close to that bound.) Unlike beginner methods that solve layer by layer, or advanced methods like CFOP that optimize for speed, Kociemba finds mathematically near-optimal solutions:

Phase 1: Getting the cube into a specific subset of positions (G1 group)
Phase 2: Solving from that subset to completion

This approach sacrifices execution speed (the moves aren't optimized for finger tricks) but guarantees remarkably short solutions. A beginner method might take 80-100 moves; CFOP averages 50-60. Kociemba typically delivers solutions in 18-22 moves, making it ideal for casual solving where you want minimal steps, not maximum speed.


Try It Yourself & Help Me Improve

The skill is open source on GitHub. It works with any standard 3×3 Rubik's Cube, and the only requirement is the kociemba library (which Claude auto-installs).

How to Install:

  1. Clone the repository:
   git clone https://github.com/JasonMakes801/rubiks-cube-solver.git
   cd rubiks-cube-solver
  2. Create a zip file of the skill:
   zip -r rubiks-cube-solver.zip SKILL.md scripts/ README.md
  3. Install in Claude Desktop:

    • Open Claude Desktop (or Claude Code)
    • Upload the rubiks-cube-solver.zip file to chat
    • Ask Claude to "extract this skill and help me solve my Rubik's Cube"
    • Claude will unpack the skill and begin the photo-guided workflow
  4. Usage:

    • Have your scrambled Rubik's Cube ready
    • Prepare to take 6 clear photos (one per face)
    • Follow Claude's orientation instructions carefully
    • Review the emoji visualization to confirm colors
    • Follow the step-by-step solving instructions

One Idea I'm Considering:

Can we eliminate the human-in-the-loop validation step?

Is there a way to build error correction into the scripts themselves—maybe cross-referencing impossible color combinations, using multiple validation passes, or applying constraint satisfaction logic—so the vision-to-string conversion becomes reliable enough to skip user confirmation?
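As a starting point (none of this exists in the repo yet), a couple of cheap consistency checks could flag obvious vision mistakes before the solver even runs:

```python
from collections import Counter

def basic_sanity_checks(faces: dict[str, str]) -> list[str]:
    """Cheap consistency checks on six detected 9-sticker color strings.

    Catches obvious vision mistakes (a color appearing ten times, two faces
    sharing a center color) before the state ever reaches the solver. Full
    validity still needs the solver's own permutation and parity checks.
    """
    errors = []
    counts = Counter("".join(faces.values()))
    for color, count in counts.items():
        if count != 9:
            errors.append(f"color {color} appears {count} times, expected 9")
    centers = [face[4] for face in faces.values()]  # sticker 4 is the center
    if len(set(centers)) != len(faces):
        errors.append("two faces share the same center color")
    return errors
```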

If you have ideas on this or other improvements, I'd love to hear them. Open an issue or PR, or just share your thoughts.

Takeaway

Claude Skills unlock a new pattern: conversational workflows orchestrating specialized tools. This approach works far beyond Rubik's Cubes—data analysis pipelines, image processing workflows, API integrations, code generation tasks.

The barrier to building custom AI workflows just got a lot lower. You don't need to build UIs, manage API keys, or handle state management. Just describe the procedure in markdown, provide the tools, and Claude handles the orchestration conversationally.

What will you build?
