DEV Community

@codingsh

Building an AI Node Client in Elixir from Scratch

This post walks through how ElixirClaw works — a real Elixir project that connects a local device to an AI gateway over WebSocket, then executes commands (screenshots, notifications, shell commands) when the AI asks.

If you're new to Elixir, don't worry. I'll explain each concept as we go. By the end you'll understand how a real production-ish OTP application is structured.


What We're Building

The flow looks like this:

AI Gateway (remote server)
        |
        | WebSocket
        |
ElixirClaw (your machine)
        |
        |-- camera.snap  → take a photo
        |-- screen.snap  → take a screenshot
        |-- system.run   → run a shell command
        |-- system.notify → send a desktop notification

The node connects to the gateway, authenticates, and then listens forever for incoming commands.


Project Structure

lib/elixir_claw/
├── application.ex    # Starts everything
├── gateway.ex        # WebSocket connection
├── protocol.ex       # Encodes/decodes messages
├── node.ex           # Executes device commands
├── security.ex       # Input validation
└── auth.ex           # Device identity

Let's walk through each one.


Step 1: The Application Supervisor

Every OTP application starts with a supervision tree. Think of it as a family tree where parents restart children that crash.

# lib/elixir_claw/application.ex

defmodule ElixirClaw.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # A registry: like a phone book for processes
      {Registry, keys: :unique, name: ElixirClaw.Registry},

      # A task supervisor: Step 6 runs commands under this name
      {Task.Supervisor, name: ElixirClaw.TaskSupervisor},

      # A dynamic supervisor: can start/stop Gateway connections at runtime
      {DynamicSupervisor, strategy: :one_for_one, name: ElixirClaw.Gateway.Supervisor}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: ElixirClaw.Supervisor)
  end
end

strategy: :one_for_one means: if one child crashes, restart only that child (not the others).

The Registry lets us find processes by name — we'll use it to look up the Gateway process for a specific node ID.
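
To see :one_for_one in action, here is a small standalone sketch (not part of ElixirClaw). It supervises two throwaway Agent processes under the made-up ids :a and :b, kills one, and checks that only that one gets a new pid.

```elixir
# Two children under a :one_for_one supervisor (ids :a and :b are arbitrary).
children = [
  Supervisor.child_spec({Agent, fn -> :a end}, id: :a),
  Supervisor.child_spec({Agent, fn -> :b end}, id: :b)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Look up the current pid of a child by its id.
pid_of = fn id ->
  {^id, pid, _type, _modules} = List.keyfind(Supervisor.which_children(sup), id, 0)
  pid
end

a_before = pid_of.(:a)
b_before = pid_of.(:b)

# Crash child :a and wait until it is really gone.
ref = Process.monitor(a_before)
Process.exit(a_before, :kill)

receive do
  {:DOWN, ^ref, :process, _pid, _reason} -> :ok
after
  1_000 -> raise "child :a did not exit"
end

# Poll until the supervisor reports a fresh pid for :a.
wait_for_restart = fn retry, attempts_left ->
  case pid_of.(:a) do
    pid when is_pid(pid) and pid != a_before ->
      pid

    _not_yet when attempts_left > 0 ->
      Process.sleep(10)
      retry.(retry, attempts_left - 1)

    _not_yet ->
      raise "child :a was not restarted in time"
  end
end

a_after = wait_for_restart.(wait_for_restart, 100)
b_after = pid_of.(:b)

IO.inspect(a_after != a_before, label: "a was restarted")
IO.inspect(b_after == b_before, label: "b was left alone")
```

With :one_for_all, the same experiment would give :b a new pid as well.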


Step 2: Connecting to the Gateway

The Gateway module is a GenServer — Elixir's pattern for a long-running process that holds state and handles messages.

# lib/elixir_claw/gateway.ex

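defmodule ElixirClaw.Gateway do
  use GenServer
  require Logger

  # Lets the later clauses write %Protocol{} instead of %ElixirClaw.Protocol{}
  alias ElixirClaw.Protocol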

  defstruct [
    :gateway_host,
    :gateway_port,
    :config,
    :conn,          # The HTTP/WebSocket connection
    :websocket,     # The WebSocket state
    :state,         # :disconnected | :connected | :authenticated
    :reconnect_attempts,
    :tick_interval
  ]

  def start_link(config) do
    GenServer.start_link(__MODULE__, config, name: via_tuple(config.node_id))
  end

  # Register this process in the Registry under the node_id
  def via_tuple(node_id) do
    {:via, Registry, {ElixirClaw.Registry, {:gateway, node_id}}}
  end

  @impl true
  def init(config) do
    state = %__MODULE__{
      gateway_host: config[:gateway_host] || "127.0.0.1",
      gateway_port: config[:gateway_port] || 18789,
      config: config,
      reconnect_attempts: 0,
      tick_interval: 15_000,
      state: :disconnected
    }

    # {:continue, :connect} means: after init(), immediately call handle_continue(:connect, ...)
    {:ok, state, {:continue, :connect}}
  end
end

The via_tuple trick lets us register the process by name. Later, we can look it up with Registry.lookup(ElixirClaw.Registry, {:gateway, some_node_id}).
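
The two ideas from this step, registering through a via tuple and deferring work with {:continue, ...}, can be tried in isolation. The sketch below is a toy, not the real Gateway: DemoGateway and DemoRegistry are made-up names, and the "connection" is just a state change.

```elixir
defmodule DemoGateway do
  use GenServer

  def start_link(node_id) do
    GenServer.start_link(__MODULE__, node_id, name: via_tuple(node_id))
  end

  # Same shape as the via tuple in the post, pointing at a demo registry.
  def via_tuple(node_id), do: {:via, Registry, {DemoRegistry, {:gateway, node_id}}}

  def status(node_id), do: GenServer.call(via_tuple(node_id), :status)

  @impl true
  def init(node_id) do
    # Return quickly; the slow part happens in handle_continue/2.
    {:ok, %{node_id: node_id, state: :disconnected}, {:continue, :connect}}
  end

  @impl true
  def handle_continue(:connect, state) do
    # Stand-in for the real connection logic.
    {:noreply, %{state | state: :connected}}
  end

  @impl true
  def handle_call(:status, _from, state), do: {:reply, state.state, state}
end

{:ok, _registry} = Registry.start_link(keys: :unique, name: DemoRegistry)
{:ok, pid} = DemoGateway.start_link("node-1")

# The process can now be found by its key instead of its pid.
[{found_pid, _value}] = Registry.lookup(DemoRegistry, {:gateway, "node-1"})
status = DemoGateway.status("node-1")
missing = Registry.lookup(DemoRegistry, {:gateway, "node-2"})

IO.inspect({found_pid == pid, status, missing})
```

Because handle_continue/2 runs before any other message is processed, the first status call already sees the post-connect state.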


Step 3: The WebSocket Connection

This is where Elixir gets interesting. We use Mint (a low-level HTTP client) and Mint.WebSocket (from the separate mint_web_socket package) to handle the connection:

def handle_continue(:connect, state) do
  case connect(state) do
    {:ok, new_state} ->
      {:noreply, new_state}

    {:error, reason} ->
      Logger.error("Failed to connect: #{inspect(reason)}")
      {:stop, :connection_failed, state}
  end
end

defp connect(state) do
  # Open a plain HTTP connection first, then ask the server to upgrade it
  with {:ok, conn} <- Mint.HTTP.connect(:http, state.gateway_host, state.gateway_port, []),
       {:ok, conn, ref} <- Mint.WebSocket.upgrade(:ws, conn, "/", []) do
    # Wait for the HTTP 101 Switching Protocols response
    receive do
      {:tcp, _socket, _data} = message ->
        # stream/2 takes the whole socket message, not just the payload
        case Mint.WebSocket.stream(conn, message) do
          {:ok, conn, [{:status, ^ref, status}, {:headers, ^ref, headers}, {:done, ^ref}]} ->
            # new/4 returns the updated conn together with the WebSocket state
            case Mint.WebSocket.new(conn, ref, status, headers) do
              {:ok, conn, websocket} ->
                {:ok, %{state | conn: conn, websocket: websocket, state: :connected}}

              {:error, _conn, reason} ->
                {:error, reason}
            end

          other ->
            {:error, other}
        end
    after
      # Don't hang forever if the gateway never answers
      5_000 -> {:error, :upgrade_timeout}
    end
  else
    {:error, reason} -> {:error, reason}
    {:error, _conn, reason} -> {:error, reason}
  end
end

Once connected, the BEAM delivers incoming TCP data as messages to our process:

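def handle_info({:tcp, _socket, _data} = message, state) do
  process_data(message, state)
end

defp process_data(message, state) do
  # stream/2 expects the full socket message and returns a list of responses
  case Mint.WebSocket.stream(state.conn, message) do
    {:ok, conn, responses} ->
      new_state = Enum.reduce(responses, %{state | conn: conn}, &handle_stream_message/2)
      {:noreply, new_state}

    {:error, conn, reason, _responses} ->
      Logger.warning("Stream error: #{inspect(reason)}")
      {:noreply, %{state | conn: conn}}

    :unknown ->
      {:noreply, state}
  end
end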

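Mint.WebSocket.stream/2 returns a list of responses, and handle_stream_message/2 (not shown here) folds each one into the state, so every incoming WebSocket frame ends up as a message we pattern-match on.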
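
The post does not show handle_stream_message/2, so the following is only a guess at its shape. It uses hand-built tuples that look like Mint responses and collects raw data chunks; in the real module the {:data, ref, data} clause would pass the bytes to Mint.WebSocket.decode/2 to obtain frames. StreamSketch and its state fields are hypothetical.

```elixir
defmodule StreamSketch do
  # One clause per response shape; unknown shapes leave the state untouched.
  def handle_stream_message({:status, _ref, status}, state), do: %{state | status: status}
  def handle_stream_message({:headers, _ref, _headers}, state), do: state

  def handle_stream_message({:data, _ref, data}, state) do
    # The real code would decode WebSocket frames here.
    %{state | received: state.received ++ [data]}
  end

  def handle_stream_message({:done, _ref}, state), do: state
  def handle_stream_message(_other, state), do: state

  # Thread the state through every response, in order.
  def process(responses, state) do
    Enum.reduce(responses, state, &handle_stream_message/2)
  end
end

ref = make_ref()

responses = [
  {:status, ref, 101},
  {:headers, ref, []},
  {:data, ref, "frame-1"},
  {:data, ref, "frame-2"},
  {:done, ref}
]

final = StreamSketch.process(responses, %{status: nil, received: []})
IO.inspect(final)
```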


Step 4: The Protocol

The OpenClaw gateway speaks JSON with a simple envelope format. The Protocol module handles encoding and decoding:

# lib/elixir_claw/protocol.ex

defmodule ElixirClaw.Protocol do
  defstruct [:type, :id, :method, :payload, :event, :ok, :error]

  # Encode outgoing requests
  def encode_request(method, params, request_id \\ nil) do
    id = request_id || generate_request_id()
    message = %{
      "type" => "req",
      "id" => id,
      "method" => method,
      "params" => params
    }
    Jason.encode!(message)
  end

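  # Decode incoming messages
  def decode(message) when is_binary(message) do
    case Jason.decode(message) do
      # %__MODULE__{} is this module's struct; a bare %Protocol{} here would
      # point at Elixir's built-in Protocol module and fail to compile
      {:ok, %{"type" => "req"} = map} ->
        {:ok, %__MODULE__{type: :req, id: map["id"], method: map["method"], payload: map["params"] || %{}}}

      {:ok, %{"type" => "res"} = map} ->
        {:ok, %__MODULE__{type: :res, id: map["id"], ok: map["ok"], payload: map["payload"] || %{}}}

      {:ok, %{"type" => "event"} = map} ->
        {:ok, %__MODULE__{type: :event, event: map["event"], payload: map["payload"] || %{}}}

      # Valid JSON, but not an envelope we know
      {:ok, _other} ->
        {:error, :unknown_message_type}

      {:error, reason} ->
        {:error, reason}
    end
  end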

  def generate_request_id do
    :crypto.strong_rand_bytes(8) |> Base.encode16(case: :lower)
  end
end

Notice how we use pattern matching to decode different message types cleanly. No if message.type == "req" chains — just pattern matching on the data shape.
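
To try the pattern-matching idea without pulling in Jason, the sketch below works on maps that are already decoded (string keys, as a JSON decoder would produce). EnvelopeSketch and its return tuples are illustrative and not the project's API.

```elixir
defmodule EnvelopeSketch do
  # Each clause matches one envelope shape; the last one catches everything else.
  def classify(%{"type" => "req", "id" => id, "method" => method} = map) do
    {:req, id, method, Map.get(map, "params", %{})}
  end

  def classify(%{"type" => "res", "id" => id, "ok" => ok}) when is_boolean(ok) do
    {:res, id, ok}
  end

  def classify(%{"type" => "event", "event" => event} = map) do
    {:event, event, Map.get(map, "payload", %{})}
  end

  def classify(_other), do: {:error, :unknown_message}

  # Same recipe as generate_request_id/0 in the post: 8 random bytes as hex.
  def request_id, do: :crypto.strong_rand_bytes(8) |> Base.encode16(case: :lower)
end

request =
  EnvelopeSketch.classify(%{
    "type" => "req",
    "id" => "1",
    "method" => "node.invoke",
    "params" => %{"command" => "screen.snap"}
  })

event = EnvelopeSketch.classify(%{"type" => "event", "event" => "connect.challenge"})
unknown = EnvelopeSketch.classify(%{"type" => "something-else"})

IO.inspect({request, event, unknown, EnvelopeSketch.request_id()})
```

A message that is missing a required key simply falls through to the last clause, which is easier to audit than nested conditionals.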


Step 5: Handling the Authentication Handshake

When we connect, the gateway sends a challenge. We sign it and respond:

# In gateway.ex (assumes alias ElixirClaw.Protocol at the top of the module)

defp handle_protocol_message(%Protocol{type: :event, event: "connect.challenge"} = msg, state) do
  connect_request = Protocol.build_connect_request(state.config, msg.payload)
  send_text(state, connect_request)
  state
end

defp handle_protocol_message(%Protocol{type: :res, method: "connect", ok: true}, state) do
  Logger.info("Connected to Gateway!")
  new_state = %{state | state: :authenticated, reconnect_attempts: 0}
  schedule_heartbeat(new_state)

  # Tell the gateway what we can do
  describe = Protocol.build_node_describe(state.config)
  describe_request = Protocol.encode_request("node.describe", describe, Protocol.generate_request_id())
  send_text(new_state, describe_request)

  new_state
end

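The gateway sends an event → we respond with a signed request → gateway confirms → we announce our capabilities. One caveat: Protocol.decode/1 as shown never sets method on a response, so the method: "connect" clause only matches if the gateway process remembers which request id belongs to which method and fills the field in before dispatching. That bookkeeping is left out of the excerpt.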
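
Responses carry an id but no method, so something has to connect a response back to the request that caused it. One common approach, sketched here with plain maps, is to remember each outgoing id together with its method. PendingSketch is a hypothetical helper; the post does not say how ElixirClaw does this.

```elixir
defmodule PendingSketch do
  # Remember which method an outgoing request id belongs to.
  def track(pending, id, method), do: Map.put(pending, id, method)

  # When a response arrives, look its id up and attach the method.
  def resolve(pending, %{type: :res, id: id} = msg) do
    case Map.pop(pending, id) do
      {nil, pending} -> {:unknown, msg, pending}
      {method, pending} -> {:ok, Map.put(msg, :method, method), pending}
    end
  end
end

pending = PendingSketch.track(%{}, "abc123", "connect")

{:ok, resolved, pending_after} =
  PendingSketch.resolve(pending, %{type: :res, id: "abc123", ok: true})

stray = PendingSketch.resolve(%{}, %{type: :res, id: "zzz", ok: true})

IO.inspect({resolved, pending_after, stray})
```

In a GenServer the pending map would live in the process state next to conn and websocket.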


Step 6: Executing Commands

When the gateway wants us to do something, it sends a node.invoke request:

defp handle_protocol_message(%Protocol{type: :req, method: "node.invoke"} = msg, state) do
  # Run in a separate process so we don't block the gateway
  Task.Supervisor.async_nolink(ElixirClaw.TaskSupervisor, fn ->
    handle_node_invoke(msg.payload, state)
  end)
  state
end
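
One detail worth knowing about Task.Supervisor.async_nolink/2: the task's return value is sent back to the calling process as a {ref, result} message, and a crash shows up as a :DOWN message, so the GenServer needs handle_info/2 clauses for both. The sketch below shows that pattern in a toy server. InvokeSketch and its functions are made up for illustration and say nothing about how ElixirClaw handles results.

```elixir
defmodule InvokeSketch do
  use GenServer

  def start_link(task_sup), do: GenServer.start_link(__MODULE__, task_sup)
  def invoke(server, request_id, fun), do: GenServer.cast(server, {:invoke, request_id, fun})
  def results(server), do: GenServer.call(server, :results)

  @impl true
  def init(task_sup), do: {:ok, %{task_sup: task_sup, tasks: %{}, results: %{}}}

  @impl true
  def handle_cast({:invoke, request_id, fun}, state) do
    # Run the work elsewhere and remember which request the task belongs to.
    task = Task.Supervisor.async_nolink(state.task_sup, fun)
    {:noreply, %{state | tasks: Map.put(state.tasks, task.ref, request_id)}}
  end

  @impl true
  def handle_call(:results, _from, state), do: {:reply, state.results, state}

  @impl true
  def handle_info({ref, result}, state) when is_reference(ref) do
    # The task finished; drop the monitor so no :DOWN message follows.
    Process.demonitor(ref, [:flush])
    {request_id, tasks} = Map.pop(state.tasks, ref)
    {:noreply, %{state | tasks: tasks, results: Map.put(state.results, request_id, {:ok, result})}}
  end

  def handle_info({:DOWN, ref, :process, _pid, reason}, state) do
    # The task crashed; the server itself keeps running.
    {request_id, tasks} = Map.pop(state.tasks, ref)
    {:noreply, %{state | tasks: tasks, results: Map.put(state.results, request_id, {:error, reason})}}
  end
end

{:ok, task_sup} = Task.Supervisor.start_link()
{:ok, server} = InvokeSketch.start_link(task_sup)

InvokeSketch.invoke(server, "req-1", fn -> 1 + 1 end)
InvokeSketch.invoke(server, "req-2", fn -> raise "boom" end)

# Give both tasks a moment to finish (the crash is also logged).
Process.sleep(200)
results = InvokeSketch.results(server)
IO.inspect(results)
```

If the result is never needed, Task.Supervisor.start_child/2 avoids these messages entirely.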

The Node module handles the actual execution:

# lib/elixir_claw/node.ex

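def execute("screen.snap", args) do
  with {:ok, _} <- check_cap("screen.snap"),
       {:ok, path} <- capture_screen(args[:display] || :main, args[:options] || %{}) do
    %{ok: true, data: %{path: path}}
  else
    {:error, reason} -> %{ok: false, error: reason}
  end
end

defp capture_screen(_display, _options) do
  path = Path.join(System.tmp_dir!(), "elixir_claw_screen_#{:os.system_time(:millisecond)}.png")

  # Pick the screenshot tool for this platform
  command =
    case :os.type() do
      {:unix, :darwin} -> {"screencapture", ["-x", path]}
      {:unix, :linux} -> {"scrot", [path]}
      _ -> nil
    end

  case command do
    nil ->
      {:error, :unsupported_platform}

    {cmd, cmd_args} ->
      # Only report a path if the tool is installed and exited cleanly
      with exe when is_binary(exe) <- System.find_executable(cmd),
           {_output, 0} <- System.cmd(exe, cmd_args, stderr_to_stdout: true) do
        {:ok, path}
      else
        _ -> {:error, :capture_failed}
      end
  end
end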

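The with construct is Elixir's way of chaining operations that might fail. Each clause must match its pattern (here {:ok, _}) before the next one runs; the first value that does not match skips the rest and falls through to the else block.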
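
Here is the same with shape reduced to something you can paste into a script. The capability check and the capture function are stand-ins with hard-coded results; WithSketch and the example path are not taken from the project.

```elixir
defmodule WithSketch do
  # Stand-in capability check: allowed only if listed.
  def check_cap(cap, allowed) do
    if cap in allowed, do: {:ok, cap}, else: {:error, :capability_disabled}
  end

  # Stand-in capture: only the :main display "works".
  def capture(:main), do: {:ok, "/tmp/example.png"}
  def capture(_other), do: {:error, :unknown_display}

  def snap(display, allowed) do
    with {:ok, _cap} <- check_cap("screen.snap", allowed),
         {:ok, path} <- capture(display) do
      %{ok: true, data: %{path: path}}
    else
      # Whichever step failed first ends up here.
      {:error, reason} -> %{ok: false, error: reason}
    end
  end
end

IO.inspect(WithSketch.snap(:main, ["screen.snap"]))
IO.inspect(WithSketch.snap(:main, []))
IO.inspect(WithSketch.snap(:second, ["screen.snap"]))
```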


Step 7: Security — Don't Skip This

Running shell commands on behalf of a remote gateway is serious business. ElixirClaw has a dedicated Security module:

# lib/elixir_claw/security.ex

# Strip control characters, limit length
def sanitize_input(input) when is_binary(input) do
  input
  |> String.replace(~r/[\x00-\x1F\x7F]/, "")
  |> String.trim()
  |> String.slice(0, 65536)
end

# Reject obviously dangerous commands
def validate_command(cmd) do
  dangerous_patterns = [
    ~r/^rm\s+-rf\s+/,
    ~r/;\s*rm\s+/,
    ~r/\$\(/,       # Command substitution
    ~r/`/,          # Backtick execution
    ~r/curl.*\|\s*sh/i
  ]

  if Enum.any?(dangerous_patterns, fn p -> Regex.match?(p, cmd) end) do
    {:error, :dangerous_command}
  else
    {:ok, cmd}
  end
end

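# Only connect to loopback or private network addresses
def safe_url?(url) do
  %URI{host: host, port: port} = URI.parse(url)
  safe_host?(host) and safe_port?(port)
end

defp safe_host?("localhost"), do: true

defp safe_host?(host) when is_binary(host) do
  # Parse the host as an IPv4 literal. A plain prefix match on "10." or
  # "192.168." would also accept hostnames such as "10.example.com".
  case :inet.parse_ipv4strict_address(String.to_charlist(host)) do
    {:ok, {127, 0, 0, 1}} -> true
    {:ok, {10, _, _, _}} -> true
    {:ok, {192, 168, _, _}} -> true
    _ -> false
  end
end

defp safe_host?(_), do: false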

By default, ElixirClaw only connects to localhost. The command allowlist is empty by default — you explicitly add commands you trust.
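
The post does not show the allowlist itself, so here is one way such a check could look. Pattern denylists are easy to sidestep (for example, "rm -fr /" matches none of the patterns above), which is why an explicit allowlist plus running the executable directly, without a shell, is the stronger control. AllowlistSketch and its function names are hypothetical, and the naive whitespace split does not handle quoted arguments.

```elixir
defmodule AllowlistSketch do
  # Split into executable and arguments without involving a shell.
  def parse(command) when is_binary(command) do
    case String.split(command) do
      [] -> {:error, :empty_command}
      [exe | args] -> {:ok, exe, args}
    end
  end

  # Only executables that were explicitly listed are accepted.
  def authorize(command, allowlist) do
    with {:ok, exe, args} <- parse(command) do
      if exe in allowlist, do: {:ok, exe, args}, else: {:error, :not_allowed}
    end
  end

  # Arguments go to the program as a list, so "$(...)" or ";" stay literal text.
  def run(command, allowlist) do
    with {:ok, exe, args} <- authorize(command, allowlist) do
      {output, status} = System.cmd(exe, args, stderr_to_stdout: true)
      {:ok, output, status}
    end
  end
end

IO.inspect(AllowlistSketch.authorize("git status", ["git", "ls"]))
IO.inspect(AllowlistSketch.authorize("rm -fr /", ["git", "ls"]))
IO.inspect(AllowlistSketch.authorize("echo $(whoami)", ["echo"]))
IO.inspect(AllowlistSketch.authorize("   ", ["git"]))
```

With an empty allowlist, which the post says is the default, every command is rejected.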


Step 8: The CLI

Users interact with ElixirClaw through a simple CLI:

# lib/elixir_claw/cli.ex

def main(args) do
  case parse_args(args) do
    {:ok, {:node_register, opts}} -> node_register(opts)
    {:ok, {:node_start, opts}}    -> node_start(opts)
    {:ok, {:status, _opts}}       -> status()
    {:ok, {:interactive, _opts}}  -> interactive_mode()
    {:error, :unknown_command}    -> usage()
  end
end

Typical usage from the shell:

# Register and start
./elixir_claw node-register --display-name "My Node"
./elixir_claw node-start

# Check status
./elixir_claw status

# Interactive mode
./elixir_claw -i
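
parse_args/1 is not shown in the post. A minimal version that produces the tuples main/1 expects could be built on OptionParser from the standard library, as sketched below. The switch names are inferred from the shell examples above, and CliSketch is a made-up module.

```elixir
defmodule CliSketch do
  @switches [display_name: :string, interactive: :boolean]
  @aliases [i: :interactive]

  def parse_args(argv) do
    # strict: unknown switches land in the third element and are rejected below.
    case OptionParser.parse(argv, strict: @switches, aliases: @aliases) do
      {opts, ["node-register"], []} ->
        {:ok, {:node_register, opts}}

      {opts, ["node-start"], []} ->
        {:ok, {:node_start, opts}}

      {opts, ["status"], []} ->
        {:ok, {:status, opts}}

      {opts, [], []} ->
        if opts[:interactive], do: {:ok, {:interactive, opts}}, else: {:error, :unknown_command}

      _other ->
        {:error, :unknown_command}
    end
  end
end

IO.inspect(CliSketch.parse_args(["node-register", "--display-name", "My Node"]))
IO.inspect(CliSketch.parse_args(["-i"]))
IO.inspect(CliSketch.parse_args(["status", "--nope"]))
```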

Putting It All Together

Here's the full lifecycle of a command:

  1. Gateway sends {"type":"req","method":"node.invoke","params":{"command":"screen.snap"}}
  2. Gateway process receives TCP data → handle_info({:tcp, ...})
  3. Data decoded → Protocol.decode/1 → %Protocol{type: :req, method: "node.invoke"}
  4. Dispatched to → handle_protocol_message/2
  5. Task spawned → Node.execute("screen.snap", args)
  6. Screenshot taken → path returned
  7. Response sent back → {"type":"res","ok":true,"payload":{"path":"/tmp/screen_123.png"}}

Each step is isolated, so the whole path can be observed and tested one stage at a time.
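
The last hop, turning the map returned by Node.execute/2 into the response envelope from step 7, is not shown in the post. The sketch below is one plausible mapping. The "id" and "error" fields and the ResponseSketch module are assumptions; only "type", "ok" and "payload" appear in the example above.

```elixir
defmodule ResponseSketch do
  # Success: the result's data becomes the payload, with string keys for JSON.
  def to_response(request_id, %{ok: true, data: data}) do
    %{"type" => "res", "id" => request_id, "ok" => true, "payload" => stringify_keys(data)}
  end

  # Failure: carry the reason as text (the "error" field name is assumed).
  def to_response(request_id, %{ok: false, error: reason}) do
    %{"type" => "res", "id" => request_id, "ok" => false, "error" => format_reason(reason)}
  end

  defp stringify_keys(map), do: Map.new(map, fn {key, value} -> {to_string(key), value} end)

  defp format_reason(reason) when is_atom(reason), do: Atom.to_string(reason)
  defp format_reason(reason), do: inspect(reason)
end

ok = ResponseSketch.to_response("r1", %{ok: true, data: %{path: "/tmp/screen_123.png"}})
failed = ResponseSketch.to_response("r2", %{ok: false, error: :capability_disabled})

IO.inspect(ok)
IO.inspect(failed)
```

The resulting map can then be encoded and sent over the socket like any other message.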


Running the Tests

mix test

The test suite covers protocol encoding/decoding, security validation, and URL safety:

# test/security_test.exs

test "rejects dangerous commands" do
  assert {:error, _} = ElixirClaw.Security.validate_command("rm -rf /")
  assert {:error, _} = ElixirClaw.Security.validate_command("curl | sh")
end

test "accepts safe commands" do
  assert {:ok, "ls"} = ElixirClaw.Security.validate_command("ls")
  assert {:ok, "git status"} = ElixirClaw.Security.validate_command("git status")
end

What's Next

The project has an open roadmap:

  • Livebook integration (live dashboards for your node)
  • Local AI via Nx/Bumblebee
  • Phoenix LiveView dashboard
  • More platform support

Contributions are welcome — the codebase is small and well-structured.


Clone It and Try It

git clone https://github.com/developerfred/ElixirClaw.git
cd ElixirClaw
mix deps.get
mix escript.build
./elixir_claw --help

Or with Docker:

docker-compose up -d

Support the project:

  • ETH/ENS: 0xd1a8Dd23e356B9fAE27dF5DeF9ea025A602EC81e (codingsh.eth)
  • Polkadot: 5DJV8DsPT3KH1rzvqTGqJ7WsCNnFt5tBn6R9yfe8SGi7YmYD
  • Solana: EyFovdqgnLAicTrDzJzjawRciLHTtq5W7ZkUV5Q3azmb

GitHub: developerfred/ElixirClaw
