Operating modes

PhysiClaw can be driven two ways. Both use the same arm, camera, and tool set; only the brain differs. Either an external MCP client does the thinking and PhysiClaw is just the hands, or PhysiClaw runs its own built-in agent and needs no external client at all.

You already used the first mode in your first task, where Claude Desktop did the deciding. Here’s how that compares to letting PhysiClaw think for itself.

Bring your own agent (MCP)

An external client — Claude Desktop, OpenClaw, your own script — connects to http://localhost:8048/mcp and calls the tools. That client is the brain. PhysiClaw is a pure MCP server: it exposes peek, tap, swipe and the rest, and does exactly what it’s told. You drive the conversation.

Built-in agent runtime

PhysiClaw ships its own agent engine. Start the server and a runtime loop spawns alongside it, ready to wake on a trigger, plan, call the same tools, and act — no external client in the loop. The brain lives inside PhysiClaw.

The crucial thing: the tools are identical either way. The exact peek → tap(bbox) → peek loop from the trace you watched runs the same whether an outside client or the in-tree engine is choosing the boxes. In fact, the instructions PhysiClaw hands an external client at connect time and the doctrine its own engine loads come from the same file, so both brains reason about the phone the same way.
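As a rough illustration of that point, here is a minimal sketch of one peek → tap(bbox) → peek cycle with the brain passed in as a plain function. Everything in it is hypothetical: `FakePhone` stands in for the real tools and only records calls, the bounding-box layout is an assumption, and the two `choose_box` functions are placeholders for whatever an external client or the built-in engine would actually decide.

```python
from typing import Callable, List, Tuple

# Assumed layout for illustration: (left, top, right, bottom) in pixels.
BBox = Tuple[int, int, int, int]


class FakePhone:
    """Hypothetical stand-in for PhysiClaw's tools; records calls, moves nothing."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def peek(self) -> str:
        self.calls.append("peek")
        return f"frame-{len(self.calls)}"

    def tap(self, bbox: BBox) -> None:
        self.calls.append("tap")


def run_step(phone: FakePhone, choose_box: Callable[[str], BBox]) -> str:
    """One peek -> tap(bbox) -> peek cycle; `choose_box` is the brain."""
    before = phone.peek()
    phone.tap(choose_box(before))
    return phone.peek()


def external_client(frame: str) -> BBox:
    # Placeholder for a decision made by an outside MCP client.
    return (40, 900, 300, 980)


def builtin_engine(frame: str) -> BBox:
    # Placeholder for a decision made by the in-tree engine.
    return (500, 120, 700, 200)


driven_externally = FakePhone()
driven_internally = FakePhone()
run_step(driven_externally, external_client)
run_step(driven_internally, builtin_engine)

print(driven_externally.calls)  # ['peek', 'tap', 'peek']
print(driven_internally.calls)  # ['peek', 'tap', 'peek']
```

The two brains pick different boxes, but the sequence of tool calls is the same, which is the sense in which the modes differ only in who decides.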

| | Bring your own agent | Built-in runtime |
| --- | --- | --- |
| Who decides | external MCP client | PhysiClaw’s own engine |
| You need | an MCP client (Claude Desktop, etc.) | just physiclaw server |
| Starts on | you sending a message | a trigger firing (cron schedule, a poll) |
| Models | whatever the client runs | 6 providers, you pick the active one |
| Memory & skills | the client’s, if any | built in: persistent memory and a skill system |
| Best for | hands-on, interactive tasks | unattended, scheduled, recurring tasks |

Neither is “the real” mode — they’re two front ends on the same hardware. Use the external client when you want to sit and steer; use the built-in runtime when you want the phone to do something on its own at 8am.

Bring-your-own-agent mode is the MCP-server face of PhysiClaw. The server exposes its tools over a streamable-HTTP endpoint, and any MCP client connects and drives:

Claude Desktop / OpenClaw / your script    ← the brain
        │  MCP (http://localhost:8048/mcp)
        ▼
PhysiClaw                                  ← just the tools (peek, tap, swipe, …)
        │
        ▼
phone

Setup is exactly what you did on the previous page: point the client’s config at the endpoint and talk to it. The agent’s quality, memory, and personality are all the client’s; PhysiClaw contributes only the eyes and the finger. This is the mode to start with — nothing extra to configure beyond the client itself.
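For a sense of what travels over that endpoint, the sketch below builds the JSON-RPC 2.0 `tools/call` messages that MCP uses to invoke a tool. The tool names `peek` and `tap` and the URL come from this page; the argument shape for `tap` (a `bbox` list of four numbers) is an assumption, and the `tool_call` helper is hypothetical. Nothing is sent: a real client first performs the MCP initialize handshake, which an MCP client library normally handles for you.

```python
import itertools
import json

MCP_URL = "http://localhost:8048/mcp"  # endpoint named on this page

_request_ids = itertools.count(1)


def tool_call(name, arguments=None):
    """Build a JSON-RPC 2.0 `tools/call` request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Assumption: peek takes no arguments and tap takes a four-number bbox.
peek_request = tool_call("peek")
tap_request = tool_call("tap", {"bbox": [120, 340, 260, 400]})

print(f"would POST to {MCP_URL}:")
print(json.dumps(tap_request, indent=2))
```

In practice you rarely write these messages by hand; Claude Desktop or an MCP SDK produces them. Check the schema the server reports for each tool before relying on any argument name.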

PhysiClaw is also a complete agent on its own. When you run physiclaw server, it spawns a runtime loop as a sibling process. That loop polls a set of hooks on a timer; when a hook fires a trigger — a cron schedule coming due, or a poll noticing something changed — the engine wakes, plans, and calls the same MCP tools to get the task done. No external client is connected; the brain is inside.

cron / poll trigger fires
        │
        ▼
PhysiClaw runtime    ← the brain (its own engine, picks a provider)
        │  calls the same tools
        ▼
phone
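The polling pattern described above can be sketched in a few lines. This is not PhysiClaw's runtime code: the class names, the `check(now)` method, and the task strings are all hypothetical, and a fixed interval stands in for a real cron schedule. It shows only the shape of the idea, where a loop asks each hook on every tick whether it has a trigger to fire.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trigger:
    source: str  # "schedule" or "poll"
    task: str    # what the engine should wake up and do


class IntervalHook:
    """Fires every `every` seconds; a simplified stand-in for a cron schedule."""

    def __init__(self, every: float, task: str) -> None:
        self.every = every
        self.task = task
        self.next_due = every

    def check(self, now: float) -> Optional[Trigger]:
        if now >= self.next_due:
            self.next_due += self.every
            return Trigger("schedule", self.task)
        return None


class ChangeHook:
    """Fires when a polled value differs from the last value seen."""

    def __init__(self, read: Callable[[], object], task: str) -> None:
        self.read = read
        self.task = task
        self.last = read()

    def check(self, now: float) -> Optional[Trigger]:
        current = self.read()
        if current != self.last:
            self.last = current
            return Trigger("poll", self.task)
        return None


def poll_once(hooks, now: float) -> List[Trigger]:
    """Ask every hook whether it has a trigger to fire at time `now`."""
    fired = []
    for hook in hooks:
        trigger = hook.check(now)
        if trigger is not None:
            fired.append(trigger)
    return fired


# Simulated clock instead of time.sleep, so the example runs instantly.
inbox = {"unread": 0}
hooks = [
    IntervalHook(60, "run the scheduled task"),
    ChangeHook(lambda: inbox["unread"], "handle the new message"),
]
fired = []
for now in (0, 30, 60, 90):
    if now == 90:
        inbox["unread"] = 1  # something changed between polls
    for trigger in poll_once(hooks, now):
        fired.append((now, trigger.source))

print(fired)  # [(60, 'schedule'), (90, 'poll')]
```

In a real runtime each fired trigger would wake the engine, which then plans and calls the same tools an external client would.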

The runtime adds the pieces a bare MCP server doesn’t have:

  • Multi-provider engine: the same agent loop runs on any of six model providers (including Anthropic, OpenAI, Google, Moonshot, and Qwen), and you pick the active one. See Models.
  • Trigger-driven wake — it runs unattended, waking on schedules or polls instead of on your messages. See Autonomous tasks.
  • Memory & skills — persistent memory across sessions and a skill system it can draw on. See Memory and skills.

The introduction framed PhysiClaw as OpenClaw’s idea — let the agent use the interface a human uses — pushed onto real glass. These two modes are where that lands: PhysiClaw is both an MCP server and a complete agent, and which one you reach for is your call. Bring a brain, or use the one in the box.