Semantic Hand: Running Local AI in the Browser for Physics-Based Hand Control

#ai #llm #webdev #robotics

Semantic Hand is a small AI sandbox that runs in a web browser. It has a virtual hand with joint controls, physics, contact forces, touch feedback, and two balls. The user gives a natural-language prompt, the AI decides which low-level hand controls to call, and the simulation applies them to the hand.

General-purpose AI is already good at following instructions and using tools. What happens when the instruction has to survive contact with physics? How does it translate intent into spatial control? Can it follow through on a simple instruction like "lift the ball"?

What It Does:

Runs Nemotron 3 Nano 4B or Qwen 3.5 4B as a local in-browser model through Transformers.js and WebGPU.
Lets the model inspect scene and hand state with explicit tool calls.
Moves and rotates the hand.
Curls individual fingers.
Controls thumb opposition and the individual thumb joints.
Reports touch/contact/pressure data.
Provides manual controls and a dev harness for inspecting the same runtime contracts.
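
The explicit tool calls listed above imply some layer that checks a model-emitted call before it reaches the simulation. The following is a minimal sketch of such a check, not the project's actual code: the tool names, the argument lists, and the `{"name": ..., "arguments": ...}` JSON shape are all assumptions made for illustration.

```typescript
// Hypothetical tool registry: tool names and required arguments are
// illustrative, not the actual Semantic Hand tool set.
const TOOL_REGISTRY = new Map<string, string[]>([
  ["get_hand_state", []],
  ["get_distance", ["from", "to"]],
  ["move_hand", ["dx", "dy", "dz"]],
  ["curl_finger", ["finger", "amount"]],
]);

type ToolCall = { name: string; args: Record<string, unknown> };
type ValidationResult =
  | { ok: true; call: ToolCall }
  | { ok: false; error: string };

// Assumes the model emits calls as JSON: {"name": "...", "arguments": {...}}.
function validateToolCall(raw: string): ValidationResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: "tool call is not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, error: "tool call must be a JSON object" };
  }
  const { name, arguments: args } = parsed as { name?: unknown; arguments?: unknown };
  if (typeof name !== "string") {
    return { ok: false, error: "missing tool name" };
  }
  // Reject tools the model made up.
  const required = TOOL_REGISTRY.get(name);
  if (required === undefined) {
    return { ok: false, error: `unknown tool: ${name}` };
  }
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return { ok: false, error: `arguments for ${name} must be an object` };
  }
  // Reject calls in the wrong format (required arguments absent).
  const given = args as Record<string, unknown>;
  const missing = required.filter((key) => !(key in given));
  if (missing.length > 0) {
    return { ok: false, error: `${name} is missing: ${missing.join(", ")}` };
  }
  return { ok: true, call: { name, args: given } };
}
```

Returning the error string to the model as the tool result, instead of failing silently, gives it a chance to correct the call on the next turn.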

The control loop works: state comes in, commands go out, the physics engine changes the scene, contact data comes back, and the model continues from the new state.
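
The loop described above can be sketched as follows. This is a toy under stated assumptions: `Simulation`, `Policy`, `createToySim`, and the `move_hand` command are hypothetical names, the "physics" is a single clamped coordinate, and a plain function stands in for the language model.

```typescript
// Hypothetical types: none of these names come from the Semantic Hand source.
type HandState = { position: [number, number, number]; contacts: string[] };
type Command = { tool: string; args: Record<string, number> };

interface Simulation {
  getState(): HandState;
  apply(cmd: Command): void; // applies the command and updates contacts
}

// Stand-in for the model: state in, next command out, or null when finished.
type Policy = (state: HandState) => Command | null;

function runControlLoop(sim: Simulation, policy: Policy, maxSteps: number): number {
  let steps = 0;
  while (steps < maxSteps) {
    const state = sim.getState(); // state comes in
    const cmd = policy(state);    // the model picks a command
    if (cmd === null) break;
    sim.apply(cmd);               // the scene changes, contact data updates
    steps++;
  }
  return steps;
}

// Toy simulation: the hand moves along y and stops on top of a ball.
function createToySim(startY: number, ballTopY: number): Simulation {
  const state: HandState = { position: [0, startY, 0], contacts: [] };
  return {
    getState: () => ({
      position: [state.position[0], state.position[1], state.position[2]],
      contacts: [...state.contacts],
    }),
    apply(cmd) {
      if (cmd.tool === "move_hand") {
        // The ball surface blocks the hand instead of letting it pass through.
        state.position[1] = Math.max(ballTopY, state.position[1] + (cmd.args.dy ?? 0));
      }
      state.contacts = state.position[1] <= ballTopY ? ["ball"] : [];
    },
  };
}

// Lower the hand in small steps until touch feedback reports contact.
const lowerUntilContact: Policy = (state) =>
  state.contacts.length > 0 ? null : { tool: "move_hand", args: { dy: -0.5 } };

const sim = createToySim(2, 0.5);
const stepsTaken = runControlLoop(sim, lowerUntilContact, 10);
console.log(stepsTaken, sim.getState().contacts);
```

The `maxSteps` cap is a design choice worth keeping in any real version, since a model that never decides it is finished would otherwise loop forever.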

What Worked Well:

There are two balls in the scene. The AI correctly assumed that we wanted to interact with the ball that is closer to the hand and in view.
We wanted to see how a general-purpose AI behaves without being told how to achieve a specific task (e.g., "lift the ball"). It should be able to figure that out by itself. The system prompt stayed limited to very basic information about which tools are available and how to use them. The AI followed the system prompt and used the available tools to gather information about its environment.

What Was Challenging:

We ran thousands of iterations over the system prompt to keep it generic enough for any task, yet specific enough to prevent hallucinations such as calling tools that don't exist, calling them in the wrong format, or not using them at all. For example, there is a tool that tells the AI the distance between any two objects, yet the AI preferred to reason about the distance itself, or ignored the tool completely and tried to calculate the distances on its own.
Getting the AI models to consider the volume and shape of objects without too much "hand-holding". A ball has a centre point and a radius. A hand is a complex shape, or a group of shapes. Moving the hand over the ball was interpreted as moving one point in space above another point in space, which resulted in the AI trying to push the hand through the ball. The physics engine kicked in and shot the ball away.
With thinking and reasoning enabled, the AI questioned its every step excessively. Despite the tool calls providing exact coordinates, distances, and measurements, it tried to reason about them and often exhausted the token limit before taking any action. A lot of "Wait, but…".
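
The point-versus-volume problem described above comes down to simple arithmetic. The sketch below is illustrative rather than anything the sandbox does: it assumes a y-up scene and approximates the whole hand as a single bounding sphere, which is a deliberate simplification, and all function names and numbers are made up for the example.

```typescript
type Vec3 = [number, number, number];

// Distance between two centre points: the quantity a point-based plan uses.
function centreDistance(a: Vec3, b: Vec3): number {
  return Math.hypot(a[0] - b[0], a[1] - b[1], a[2] - b[2]);
}

// Gap between the two surfaces when both objects are treated as spheres.
// A negative value means the volumes overlap.
function surfaceClearance(
  handCentre: Vec3,
  handRadius: number,
  ballCentre: Vec3,
  ballRadius: number,
): number {
  return centreDistance(handCentre, ballCentre) - handRadius - ballRadius;
}

// Lowest hand-centre height (y-up assumed) that keeps a margin above the ball.
function minHoverHeight(
  ballCentre: Vec3,
  ballRadius: number,
  handRadius: number,
  margin: number,
): number {
  return ballCentre[1] + ballRadius + handRadius + margin;
}

// Illustrative numbers only, in arbitrary scene units.
const ball: Vec3 = [0, 0.25, 0];
const handTarget: Vec3 = [0, 0.5, 0]; // "above the ball" if both are points
const clearance = surfaceClearance(handTarget, 0.125, ball, 0.25);
const safeY = minHoverHeight(ball, 0.25, 0.125, 0.0625);
console.log({ clearance, safeY });
```

With these numbers the target looks fine as a point (it is above the ball's centre), yet the clearance is negative, so the hand would be commanded into the ball. That is the situation in which a physics engine pushes the ball away.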

Local AI gave us the freedom to iterate, but expect a ~2.6 GB model download from Hugging Face. The download is not required for the manual controls.

Sandbox:
👉 https://semantic-hand.pages.dev/