System architecture

PhysiClaw has four layers: the agent (the brain), the MCP server (the translator), the sensors and actuators (cameras and the GRBL arm), and the phone (unchanged).

┌───────────────────────────────────────┐
│           AI Agent (brain)            │
│    Claude Desktop / any MCP client    │
│  sees screen → decides → calls tools  │
└───────────────────┬───────────────────┘
                    │ MCP protocol
┌───────────────────────────────────────┐
│     PhysiClaw MCP server (Python)     │
│    screenshot_top · screenshot_side   │
│       move · tap · swipe · park       │
└──────────┬─────────────────┬──────────┘
           │                 │
      USB cameras    USB serial (GRBL)
           │                 │
           ▼                 ▼
   ┌──────────────┐  ┌────────────────┐
   │ Top camera   │  │ GRBL board     │
   │ Side camera  │  │ X/Y gantry · Z │
   └──────────────┘  └───────┬────────┘
                             │ touch
                      ┌─────────────┐
                      │    Phone    │
                      │ (unlocked)  │
                      └─────────────┘

The agent can be any Model Context Protocol client. It receives screenshots as images and chooses high-level actions (a direction and distance, a tap, a swipe), never raw pixel coordinates.
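To make the agent's side of the exchange concrete, here is a minimal sketch of the JSON-RPC request an MCP client sends when it calls a tool. The `tools/call` method and the `name`/`arguments` fields come from the MCP specification, and `move` is one of the tools listed in the diagram. The argument names `direction` and `distance` are assumptions for illustration; the real schema of `move` is not shown here and may differ.

```python
import json


def tool_call(request_id, name, arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical argument names: the actual `move` tool schema may differ.
request = tool_call(1, "move", {"direction": "up", "distance": 10})
print(json.dumps(request, indent=2))
```

Note that the request carries an intent ("move up by some distance") rather than pixel coordinates, which matches the division of labour described above.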

The MCP server is a small Python process that exposes the hardware as tools. It owns the calibration math (pixels → motor steps), camera capture, and serial communication with the board.
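One way to picture the calibration math is a planar affine map from camera pixels to gantry coordinates, fitted from three known point pairs. The sketch below is an illustration under that assumption, not PhysiClaw's actual calibration code; `fit_affine` and `pixel_to_machine` are hypothetical names. It outputs millimetres, on the assumption that GRBL then converts millimetres to motor steps through its steps/mm settings.

```python
def fit_affine(pixels, machine):
    """Fit x = a*u + b*v + c and y = d*u + e*v + f from three point pairs.

    pixels:  three (u, v) pixel coordinates seen by the top camera
    machine: the three matching (x, y) gantry positions in millimetres
    """
    (u1, v1), (u2, v2), (u3, v3) = pixels
    det = u1 * (v2 - v3) - v1 * (u2 - u3) + (u2 * v3 - u3 * v2)
    if abs(det) < 1e-9:
        raise ValueError("calibration points are collinear")

    def solve(t1, t2, t3):
        # Cramer's rule for one output axis.
        a = (t1 * (v2 - v3) - v1 * (t2 - t3) + (t2 * v3 - t3 * v2)) / det
        b = (u1 * (t2 - t3) - t1 * (u2 - u3) + (u2 * t3 - u3 * t2)) / det
        c = (u1 * (v2 * t3 - v3 * t2) - v1 * (u2 * t3 - u3 * t2)
             + t1 * (u2 * v3 - u3 * v2)) / det
        return a, b, c

    xs = [m[0] for m in machine]
    ys = [m[1] for m in machine]
    return solve(*xs), solve(*ys)


def pixel_to_machine(transform, u, v):
    """Apply a fitted transform to one pixel coordinate."""
    (a, b, c), (d, e, f) = transform
    return a * u + b * v + c, d * u + e * v + f


# Made-up calibration data: 0.5 mm per pixel, origin offset (10, 20) mm.
transform = fit_affine(
    pixels=[(0, 0), (100, 0), (0, 100)],
    machine=[(10.0, 20.0), (60.0, 20.0), (10.0, 70.0)],
)
x_mm, y_mm = pixel_to_machine(transform, 50, 50)
```

Three points are the minimum for an affine fit. A real setup would likely use more points with a least-squares fit, or a full homography if the camera is not perfectly perpendicular to the screen.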

The sensors and actuators are two USB cameras and the GRBL-driven arm:

  • Top camera looks straight down and reads the screen.
  • Side camera views from ~45° and checks the stylus tip before a touch.
  • GRBL board drives an X/Y gantry and a Z axis that lowers the stylus.
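As an illustration of how a tap could translate into motion on the X/Y gantry and Z axis, the sketch below builds a short G-code sequence using standard commands that GRBL supports (`G21`, `G90`, `G0`, `G1`, `G4`). The function name `tap_gcode` and all numeric defaults (touch depth, hover height, feed rate, dwell time) are assumptions; in particular, which Z direction lowers the stylus depends on how the machine is built.

```python
def tap_gcode(x_mm, y_mm, touch_z=-5.0, hover_z=0.0, feed=600, dwell_s=0.1):
    """Return G-code lines for one tap at (x_mm, y_mm).

    All defaults are illustrative placeholders, not PhysiClaw's real values.
    """
    return [
        "G21",                                # units: millimetres
        "G90",                                # absolute positioning
        f"G0 X{x_mm:.3f} Y{y_mm:.3f}",        # rapid move above the target
        f"G1 Z{touch_z:.3f} F{feed:g}",       # lower the stylus at a set feed
        f"G4 P{dwell_s:.2f}",                 # dwell while touching (seconds in GRBL)
        f"G0 Z{hover_z:.3f}",                 # lift the stylus again
    ]


for line in tap_gcode(35.0, 45.0):
    print(line)
```

A swipe would follow the same pattern, with an extra `G1` move in X/Y between lowering and lifting the stylus.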

The phone is completely untouched: no app, no profile, no developer mode. It only ever sees a finger.

Hop               Transport     Payload
Agent ↔ server    MCP (stdio)   tool calls + image results
Server ↔ cameras  USB UVC       JPEG frames
Server ↔ board    USB serial    G-code (GRBL dialect)
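The server ↔ board hop can be sketched as a simple send-and-wait exchange: GRBL answers each G-code line with `ok` or `error:N`. The example below is a minimal sketch, not the project's actual driver. `send_line` and `FakePort` are hypothetical names, and `FakePort` stands in for a real serial port object (for example one opened with pyserial) so the snippet runs without hardware.

```python
class FakePort:
    """Stand-in for a serial port: records writes and replays canned replies."""

    def __init__(self, replies):
        self.written = []
        self._replies = list(replies)

    def write(self, data):
        self.written.append(data)
        return len(data)

    def readline(self):
        # Like a serial read with a timeout: empty bytes when nothing arrives.
        return self._replies.pop(0) if self._replies else b""


def send_line(port, line):
    """Send one G-code line and block until GRBL acknowledges it."""
    port.write((line.strip() + "\n").encode("ascii"))
    while True:
        raw = port.readline()
        if raw == b"":
            raise TimeoutError(f"no reply to {line!r}")
        reply = raw.decode("ascii", errors="replace").strip()
        if reply == "ok":
            return
        if reply.startswith("error"):
            raise RuntimeError(f"GRBL rejected {line!r}: {reply}")
        # Anything else (status reports, messages) is skipped in this sketch.


port = FakePort([b"ok\r\n"])
send_line(port, "G0 X35.000 Y45.000")
```

Waiting for each acknowledgement before sending the next line keeps the sketch simple; it trades throughput for predictability, which suits a loop that acts once and then looks at the screen again.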

Continue to The control loop to see how these hops cycle on every action.