Building an AI Node Client in Elixir from Scratch
This post walks through how ElixirClaw works — a real Elixir project that connects a local device to an AI gateway over WebSocket, then executes commands (screenshots, notifications, shell commands) when the AI asks.
If you're new to Elixir, don't worry. I'll explain each concept as we go. By the end you'll understand how a real production-ish OTP application is structured.
What We're Building
The flow looks like this:
AI Gateway (remote server)
|
| WebSocket
|
ElixirClaw (your machine)
|
|-- camera.snap → take a photo
|-- screen.snap → take a screenshot
|-- system.run → run a shell command
|-- system.notify → send a desktop notification
The node connects to the gateway, authenticates, and then listens forever for incoming commands.
Project Structure
lib/elixir_claw/
├── application.ex # Starts everything
├── gateway.ex # WebSocket connection
├── protocol.ex # Encodes/decodes messages
├── node.ex # Executes device commands
├── security.ex # Input validation
└── auth.ex # Device identity
Let's walk through each one.
Step 1: The Application Supervisor
Every OTP application starts with a supervision tree. Think of it as a family tree where parents restart children that crash.
# lib/elixir_claw/application.ex
defmodule ElixirClaw.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # A registry: like a phone book for processes
      {Registry, keys: :unique, name: ElixirClaw.Registry},
      # A dynamic supervisor: can start/stop Gateway connections at runtime
      {DynamicSupervisor, strategy: :one_for_one, name: ElixirClaw.Gateway.Supervisor},
      # A task supervisor: runs node.invoke commands outside the Gateway process (used in Step 6)
      {Task.Supervisor, name: ElixirClaw.TaskSupervisor}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: ElixirClaw.Supervisor)
  end
end
strategy: :one_for_one means: if one child crashes, restart only that child (not the others).
The Registry lets us find processes by name — we'll use it to look up the Gateway process for a specific node ID.
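To see the :one_for_one behaviour in isolation, here is a minimal sketch that uses two plain Agents instead of ElixirClaw's real children. The module name SupervisionDemo and the child ids :a and :b are made up for this demo.

```elixir
defmodule SupervisionDemo do
  # Hypothetical demo module, not part of ElixirClaw.

  # Starts two Agents under a :one_for_one supervisor, kills the first,
  # and returns the child pids before and after the crash.
  def run do
    children = [
      Supervisor.child_spec({Agent, fn -> :a end}, id: :a),
      Supervisor.child_spec({Agent, fn -> :b end}, id: :b)
    ]

    {:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
    before = child_pids(sup)

    # Simulate a crash in child :a and wait until it is really gone
    ref = Process.monitor(before.a)
    Process.exit(before.a, :kill)

    receive do
      {:DOWN, ^ref, :process, _pid, _reason} -> :ok
    end

    {before, wait_for_restart(sup, :a, before.a)}
  end

  defp child_pids(sup) do
    for {id, pid, _type, _modules} <- Supervisor.which_children(sup), into: %{}, do: {id, pid}
  end

  # Poll until the supervisor reports a new live pid for the given child
  defp wait_for_restart(sup, id, old_pid) do
    pids = child_pids(sup)

    if is_pid(pids[id]) and pids[id] != old_pid do
      pids
    else
      Process.sleep(10)
      wait_for_restart(sup, id, old_pid)
    end
  end
end
```

Calling SupervisionDemo.run() should give :a a fresh pid while :b keeps the one it started with.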
Step 2: Connecting to the Gateway
The Gateway module is a GenServer — Elixir's pattern for a long-running process that holds state and handles messages.
# lib/elixir_claw/gateway.ex
defmodule ElixirClaw.Gateway do
  use GenServer
  # Needed for the Logger.error/1 and Logger.info/1 calls in later steps
  require Logger

  defstruct [
    :gateway_host,
    :gateway_port,
    :config,
    :conn,               # The HTTP/WebSocket connection
    :websocket,          # The WebSocket state
    :state,              # :disconnected | :connected | :authenticated
    :reconnect_attempts,
    :tick_interval
  ]

  def start_link(config) do
    GenServer.start_link(__MODULE__, config, name: via_tuple(config.node_id))
  end

  # Register this process in the Registry under the node_id
  def via_tuple(node_id) do
    {:via, Registry, {ElixirClaw.Registry, {:gateway, node_id}}}
  end

  @impl true
  def init(config) do
    state = %__MODULE__{
      gateway_host: config[:gateway_host] || "127.0.0.1",
      gateway_port: config[:gateway_port] || 18789,
      config: config,
      reconnect_attempts: 0,
      tick_interval: 15_000,
      state: :disconnected
    }

    # {:continue, :connect} means: after init(), immediately call handle_continue(:connect, ...)
    {:ok, state, {:continue, :connect}}
  end
end
The via_tuple trick lets us register the process by name. Later, we can look it up with Registry.lookup(ElixirClaw.Registry, {:gateway, some_node_id}).
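Here is a small, self-contained sketch of that lookup. RegistryDemo.Registry and the Agent are stand-ins I made up for ElixirClaw.Registry and the Gateway GenServer, so the snippet runs on its own.

```elixir
defmodule RegistryDemo do
  # Hypothetical demo: the registry name and the Agent are stand-ins,
  # not ElixirClaw's real processes.
  def via_tuple(node_id) do
    {:via, Registry, {RegistryDemo.Registry, {:gateway, node_id}}}
  end

  def run do
    {:ok, _} = Registry.start_link(keys: :unique, name: RegistryDemo.Registry)
    {:ok, pid} = Agent.start_link(fn -> :disconnected end, name: via_tuple("node-1"))

    # Look the process up by key ...
    [{found, _value}] = Registry.lookup(RegistryDemo.Registry, {:gateway, "node-1"})

    # ... or address it directly through the via tuple
    status = Agent.get(via_tuple("node-1"), & &1)

    # Unknown keys simply return an empty list
    missing = Registry.lookup(RegistryDemo.Registry, {:gateway, "unknown"})

    {pid == found, status, missing}
  end
end
```

The same via tuple works anywhere a process name is accepted, for example in GenServer.call/2.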
Step 3: The WebSocket Connection
This is where Elixir gets interesting. We use Mint (an HTTP client) and Mint.WebSocket to handle the connection:
@impl true
def handle_continue(:connect, state) do
  case connect(state) do
    {:ok, new_state} ->
      {:noreply, new_state}

    error ->
      Logger.error("Failed to connect: #{inspect(error)}")
      {:stop, :connection_failed, state}
  end
end

defp connect(state) do
  # Build the URI
  uri = "ws://#{state.gateway_host}:#{state.gateway_port}"
  parsed_uri = URI.parse(uri)

  # Open an HTTP connection first
  case Mint.HTTP.connect(:http, parsed_uri.host, parsed_uri.port, []) do
    {:ok, conn} ->
      # Upgrade to WebSocket
      case Mint.WebSocket.upgrade(:ws, conn, "/", []) do
        {:ok, conn, ref} ->
          # Wait for the HTTP 101 Switching Protocols response
          receive do
            # Mint.WebSocket.stream/2 expects the whole socket message, not just the payload
            {:tcp, _port, _data} = message ->
              case Mint.WebSocket.stream(conn, message) do
                {:ok, conn, [{:status, ^ref, status}, {:headers, ^ref, headers}, {:done, ^ref}]} ->
                  # new/4 returns the updated connection along with the WebSocket state
                  {:ok, conn, websocket} = Mint.WebSocket.new(conn, ref, status, headers)
                  {:ok, %{state | conn: conn, websocket: websocket, state: :connected}}

                error ->
                  {:error, error}
              end
          after
            # Give up instead of blocking forever (5 seconds is an arbitrary choice)
            5_000 -> {:error, :upgrade_timeout}
          end

        error ->
          {:error, error}
      end

    error ->
      {:error, error}
  end
end
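The handle_continue/2 clause above stops the process when the connection fails. The struct also tracks reconnect_attempts, so one way to use that counter is an exponential backoff between retries. This is only a sketch under my own assumptions: BackoffSketch, the 1 second base and the 30 second cap are all hypothetical, not values taken from ElixirClaw.

```elixir
defmodule BackoffSketch do
  # Hypothetical helper: base delay and cap are assumed values.
  @base_ms 1_000
  @max_ms 30_000

  # Delay before reconnect attempt number `attempts` (0-based):
  # 1s, 2s, 4s, ... capped at 30s.
  def delay_ms(attempts) when is_integer(attempts) and attempts >= 0 do
    # Clamp the exponent so very large counters stay cheap to compute
    exponent = min(attempts, 20)
    min(@base_ms * Integer.pow(2, exponent), @max_ms)
  end
end
```

A GenServer could feed this value to Process.send_after/3 to schedule its next connection attempt.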
Once connected, the BEAM delivers incoming TCP data as messages to our process:
@impl true
def handle_info({:tcp, _port, _data} = message, state) do
  process_data(message, state)
end

defp process_data(message, state) do
  # Mint.WebSocket.stream/2 takes the whole socket message, not just the binary payload
  case Mint.WebSocket.stream(state.conn, message) do
    {:ok, conn, messages} ->
      new_state =
        Enum.reduce(messages, %{state | conn: conn}, fn msg, acc ->
          handle_stream_message(msg, acc)
        end)

      {:noreply, new_state}

    _ ->
      {:noreply, state}
  end
end
Each incoming WebSocket frame becomes a message we pattern-match on.
Step 4: The Protocol
The OpenClaw gateway speaks JSON with a simple envelope format. The Protocol module handles encoding and decoding:
# lib/elixir_claw/protocol.ex
defmodule ElixirClaw.Protocol do
  defstruct [:type, :id, :method, :payload, :event, :ok, :error]

  # Encode outgoing requests
  def encode_request(method, params, request_id \\ nil) do
    id = request_id || generate_request_id()

    message = %{
      "type" => "req",
      "id" => id,
      "method" => method,
      "params" => params
    }

    Jason.encode!(message)
  end

  # Decode incoming messages
  # (%__MODULE__{} is used because a bare %Protocol{} inside this module
  # would refer to Elixir's built-in Protocol module)
  def decode(message) when is_binary(message) do
    case Jason.decode(message) do
      {:ok, %{"type" => "req"} = map} ->
        {:ok, %__MODULE__{type: :req, id: map["id"], method: map["method"], payload: map["params"] || %{}}}

      {:ok, %{"type" => "res"} = map} ->
        {:ok, %__MODULE__{type: :res, id: map["id"], ok: map["ok"], payload: map["payload"] || %{}}}

      {:ok, %{"type" => "event"} = map} ->
        {:ok, %__MODULE__{type: :event, event: map["event"], payload: map["payload"] || %{}}}

      # Valid JSON that is not a known envelope
      {:ok, _other} ->
        {:error, :unknown_message_type}

      {:error, _reason} = error ->
        error
    end
  end

  def generate_request_id do
    :crypto.strong_rand_bytes(8) |> Base.encode16(case: :lower)
  end
end
Notice how we use pattern matching to decode different message types cleanly. No if message.type == "req" chains — just pattern matching on the data shape.
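To try the idea without pulling in Jason, here is a stripped-down sketch that starts from an already-decoded map. EnvelopeSketch and its return tuples are my own invention for illustration; the real module returns %Protocol{} structs.

```elixir
defmodule EnvelopeSketch do
  # Hypothetical stand-in for Protocol.decode/1 that takes a decoded map,
  # so it runs without the Jason dependency.
  def classify(%{"type" => "req", "id" => id, "method" => method} = map),
    do: {:req, id, method, Map.get(map, "params", %{})}

  def classify(%{"type" => "res", "id" => id, "ok" => ok} = map),
    do: {:res, id, ok, Map.get(map, "payload", %{})}

  def classify(%{"type" => "event", "event" => event} = map),
    do: {:event, event, Map.get(map, "payload", %{})}

  # Anything else, including a known type with missing fields, is rejected
  def classify(_other), do: {:error, :unknown_message}
end
```

Each clause names the fields it needs in the function head, so a malformed envelope falls through to the last clause instead of crashing halfway through a chain of conditionals.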
Step 5: Handling the Authentication Handshake
When we connect, the gateway sends a challenge. We sign it and respond:
# In gateway.ex (assumes `alias ElixirClaw.Protocol` at the top of the module)
defp handle_protocol_message(%Protocol{type: :event, event: "connect.challenge"} = msg, state) do
  connect_request = Protocol.build_connect_request(state.config, msg.payload)
  send_text(state, connect_request)
  state
end

defp handle_protocol_message(%Protocol{type: :res, method: "connect", ok: true}, state) do
  Logger.info("Connected to Gateway!")
  new_state = %{state | state: :authenticated, reconnect_attempts: 0}
  schedule_heartbeat(new_state)

  # Tell the gateway what we can do
  describe = Protocol.build_node_describe(state.config)
  describe_request = Protocol.encode_request("node.describe", describe, Protocol.generate_request_id())
  send_text(new_state, describe_request)
  new_state
end
The gateway sends an event → we respond with a signed request → gateway confirms → we announce our capabilities.
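One way to reason about this handshake is as a small state machine over the three states in the Gateway struct. The sketch below is a pure model I wrote for illustration: HandshakeSketch, the event tuples and the action names are hypothetical, not ElixirClaw's API.

```elixir
defmodule HandshakeSketch do
  # Hypothetical pure model of the connection states from the Gateway struct.
  # Returns {next_state, action}, where action names what the node sends next.
  def next(:disconnected, :socket_upgraded), do: {:connected, :none}

  def next(:connected, {:event, "connect.challenge"}),
    do: {:connected, :send_connect_request}

  def next(:connected, {:res, "connect", true}),
    do: {:authenticated, :send_node_describe}

  def next(:connected, {:res, "connect", false}), do: {:disconnected, :none}

  # Anything unexpected leaves the state untouched
  def next(state, _unexpected), do: {state, :none}
end
```

Keeping the transitions in a pure function like this makes the handshake easy to unit test without a socket.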
Step 6: Executing Commands
When the gateway wants us to do something, it sends a node.invoke request:
defp handle_protocol_message(%Protocol{type: :req, method: "node.invoke"} = msg, state) do
  # Run in a separate process so we don't block the gateway
  Task.Supervisor.async_nolink(ElixirClaw.TaskSupervisor, fn ->
    handle_node_invoke(msg.payload, state)
  end)

  state
end
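Task.Supervisor.async_nolink/2 reports back to the calling process with plain messages: a {ref, result} reply on success, or a :DOWN message if the task crashes. In a GenServer those arrive in handle_info/2. The sketch below shows both cases with a bare receive; TaskResultSketch and the 5 second timeout are assumptions for this demo.

```elixir
defmodule TaskResultSketch do
  # Hypothetical demo of the messages Task.Supervisor.async_nolink/2 sends
  # back to the caller. In a GenServer these arrive in handle_info/2.
  def run(fun) do
    {:ok, sup} = Task.Supervisor.start_link()
    task = Task.Supervisor.async_nolink(sup, fun)
    ref = task.ref

    receive do
      # Success: the reply is tagged with the task's monitor reference
      {^ref, result} ->
        # Drop the monitor and flush the :DOWN message that would follow
        Process.demonitor(ref, [:flush])
        {:ok, result}

      # Crash: the caller survives and only sees a :DOWN message
      {:DOWN, ^ref, :process, _pid, reason} ->
        {:error, reason}
    after
      5_000 -> {:error, :timeout}
    end
  end
end
```

Because the task is not linked, a failing command only produces the :DOWN message and the calling process keeps running.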
The Node module handles the actual execution:
# lib/elixir_claw/node.ex
def execute("screen.snap", args) do
  with {:ok, _} <- check_cap("screen.snap"),
       {:ok, path} <- capture_screen(args[:display] || :main, args[:options] || %{}) do
    %{ok: true, data: %{path: path}}
  else
    {:error, reason} -> %{ok: false, error: reason}
  end
end

defp capture_screen(_display, _options) do
  path = Path.join(System.tmp_dir!(), "elixir_claw_screen_#{:os.system_time(:millisecond)}.png")

  command =
    case :os.type() do
      {:unix, :darwin} -> {"screencapture", ["-x", path]}
      {:unix, :linux} -> {"scrot", [path]}
      _ -> nil
    end

  case command do
    nil ->
      {:error, :unsupported_platform}

    {cmd, cmd_args} ->
      # Only report success when the capture tool exits with status 0
      case System.cmd(cmd, cmd_args) do
        {_output, 0} -> {:ok, path}
        {_output, _status} -> {:error, :capture_failed}
      end
  end
end
The with construct is Elixir's way of chaining operations that might fail. Each step must succeed ({:ok, _}) before moving to the next.
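Here is a tiny standalone version of the same shape. The helpers check_cap/2 and parse_display/1 are made up for this sketch; only the structure of the with chain matters.

```elixir
defmodule WithSketch do
  # Hypothetical helpers; only the shape of the `with` chain matters here.
  def check_cap(cap, enabled) do
    if cap in enabled, do: {:ok, cap}, else: {:error, :capability_disabled}
  end

  def parse_display("main"), do: {:ok, :main}
  def parse_display(_other), do: {:error, :unknown_display}

  def snap(display, enabled) do
    with {:ok, _cap} <- check_cap("screen.snap", enabled),
         {:ok, parsed} <- parse_display(display) do
      %{ok: true, display: parsed}
    else
      # The first step that fails to match lands here and stops the chain
      {:error, reason} -> %{ok: false, error: reason}
    end
  end
end
```

If the capability check fails, parse_display/1 is never called, which is what keeps later steps from running on bad input.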
Step 7: Security — Don't Skip This
Running shell commands on behalf of a remote gateway is serious business. ElixirClaw has a dedicated Security module:
# lib/elixir_claw/security.ex
# Strip control characters, limit length
def sanitize_input(input) when is_binary(input) do
  input
  |> String.replace(~r/[\x00-\x1F\x7F]/, "")
  |> String.trim()
  |> String.slice(0, 65536)
end

# Reject obviously dangerous commands
def validate_command(cmd) do
  # Note: Elixir lists do not allow a trailing comma after the last element
  dangerous_patterns = [
    ~r/^rm\s+-rf\s+/,
    ~r/;\s*rm\s+/,
    ~r/\$\(/,            # Command substitution
    ~r/`/,               # Backtick execution
    ~r/curl.*\|\s*sh/i
  ]

  if Enum.any?(dangerous_patterns, fn p -> Regex.match?(p, cmd) end) do
    {:error, :dangerous_command}
  else
    {:ok, cmd}
  end
end

# Only connect to private network addresses
def safe_url?(url) do
  case URI.parse(url) do
    %URI{host: host, port: port} -> safe_host?(host) and safe_port?(port)
    _ -> false
  end
end

defp safe_host?("localhost"), do: true
defp safe_host?("127.0.0.1"), do: true
defp safe_host?(<<"192.168.", _::binary>>), do: true
defp safe_host?(<<"10.", _::binary>>), do: true
defp safe_host?(_), do: false
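One thing to watch with binary prefix patterns like <<"10.", _::binary>>: they also match hostnames that merely start with those characters, such as "10.evil.com". A stricter variant could parse the host as an IPv4 literal first. The sketch below is my own suggestion, not the project's code, and HostCheckSketch is a hypothetical name; it relies on Erlang's :inet.parse_ipv4strict_address/1.

```elixir
defmodule HostCheckSketch do
  # Hypothetical stricter variant of safe_host?/1.
  def safe_host?("localhost"), do: true

  def safe_host?(host) when is_binary(host) do
    # Only full dotted-quad IPv4 literals parse; hostnames are rejected
    case :inet.parse_ipv4strict_address(String.to_charlist(host)) do
      {:ok, {127, 0, 0, 1}} -> true
      {:ok, {10, _, _, _}} -> true
      {:ok, {192, 168, _, _}} -> true
      _ -> false
    end
  end

  # nil (a URL without a host) and any other term are unsafe
  def safe_host?(_), do: false
end
```

Matching on the parsed tuple means the check runs against actual address octets rather than on how the string happens to begin.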
By default, ElixirClaw only connects to localhost. The command allowlist is empty by default — you explicitly add commands you trust.
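The post doesn't show the allowlist check itself, so here is a minimal sketch of what one could look like. AllowlistSketch and allowed?/2 are hypothetical names, and comparing only the executable name is a simplifying assumption on my part.

```elixir
defmodule AllowlistSketch do
  # Hypothetical allowlist check: only the executable name is compared,
  # and an empty allowlist rejects everything.
  def allowed?(command, allowlist) when is_binary(command) and is_list(allowlist) do
    case String.split(command, ~r/\s+/, trim: true) do
      [executable | _args] -> executable in allowlist
      [] -> false
    end
  end
end
```

An allowlist fails closed: a command nobody thought about is rejected, whereas a blocklist of dangerous patterns lets it through.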
Step 8: The CLI
Users interact with ElixirClaw through a simple CLI:
# lib/elixir_claw/cli.ex
def main(args) do
  case parse_args(args) do
    {:ok, {:node_register, opts}} -> node_register(opts)
    {:ok, {:node_start, opts}} -> node_start(opts)
    {:ok, {:status, _opts}} -> status()
    {:ok, {:interactive, _opts}} -> interactive_mode()
    {:error, :unknown_command} -> usage()
  end
end
# Register and start
./elixir_claw node-register --display-name "My Node"
./elixir_claw node-start
# Check status
./elixir_claw status
# Interactive mode
./elixir_claw -i
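The parse_args/1 function isn't shown in the post. A rough sketch of how it could map these commands to the tuples that main/1 expects, using the standard library's OptionParser, might look like this. CliSketch is a hypothetical module and the flag handling is my assumption.

```elixir
defmodule CliSketch do
  # Hypothetical parse_args/1; the flag names mirror the usage lines above.
  def parse_args(["node-register" | rest]) do
    # --display-name becomes the :display_name key
    {opts, _args, _invalid} = OptionParser.parse(rest, strict: [display_name: :string])
    {:ok, {:node_register, opts}}
  end

  def parse_args(["node-start" | rest]) do
    {opts, _args, _invalid} = OptionParser.parse(rest, strict: [])
    {:ok, {:node_start, opts}}
  end

  def parse_args(["status" | _rest]), do: {:ok, {:status, []}}
  def parse_args(["-i" | _rest]), do: {:ok, {:interactive, []}}
  def parse_args(_other), do: {:error, :unknown_command}
end
```

Matching on the first argument in the function head keeps each subcommand's option parsing separate from the others.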
Putting It All Together
Here's the full lifecycle of a command:
- Gateway sends {"type":"req","method":"node.invoke","params":{"command":"screen.snap"}}
- Gateway process receives TCP data → handle_info({:tcp, ...})
- Data decoded → Protocol.decode/1 → %Protocol{type: :req, method: "node.invoke"}
- Dispatched to → handle_protocol_message/2
- Task spawned → Node.execute("screen.snap", args)
- Screenshot taken → path returned
- Response sent back → {"type":"res","ok":true,"payload":{"path":"/tmp/screen_123.png"}}
Each step in this path is isolated, so you can observe and test it on its own.
Running the Tests
mix test
The test suite covers protocol encoding/decoding, security validation, and URL safety:
# test/security_test.exs
test "rejects dangerous commands" do
  assert {:error, _} = ElixirClaw.Security.validate_command("rm -rf /")
  assert {:error, _} = ElixirClaw.Security.validate_command("curl | sh")
end

test "accepts safe commands" do
  assert {:ok, "ls"} = ElixirClaw.Security.validate_command("ls")
  assert {:ok, "git status"} = ElixirClaw.Security.validate_command("git status")
end
What's Next
The project has an open roadmap:
- Livebook integration (live dashboards for your node)
- Local AI via Nx/Bumblebee
- Phoenix LiveView dashboard
- More platform support
Contributions are welcome — the codebase is small and well-structured.
Clone It and Try It
git clone https://github.com/developerfred/ElixirClaw.git
cd ElixirClaw
mix deps.get
mix escript.build
./elixir_claw --help
Or with Docker:
docker-compose up -d
Support the project:
- ETH/ENS: 0xd1a8Dd23e356B9fAE27dF5DeF9ea025A602EC81e (codingsh.eth)
- Polkadot: 5DJV8DsPT3KH1rzvqTGqJ7WsCNnFt5tBn6R9yfe8SGi7YmYD
- Solana: EyFovdqgnLAicTrDzJzjawRciLHTtq5W7ZkUV5Q3azmb
GitHub: developerfred/ElixirClaw