System architecture
PhysiClaw has four layers: the agent (the brain), the MCP server (the translator), the sensors and actuators (cameras and the GRBL arm), and the phone (unchanged).
┌───────────────────────────────────────┐│ AI Agent (brain) ││ Claude Desktop / any MCP client ││ sees screen → decides → calls tools │└──────────────────┬────────────────────┘ │ MCP protocol ▼┌───────────────────────────────────────┐│ PhysiClaw MCP server (Python) ││ screenshot_top · screenshot_side ││ move · tap · swipe · park │└──────────┬──────────────────┬──────────┘ │ │ USB cameras USB serial (GRBL) │ │ ▼ ▼ ┌──────────────┐ ┌────────────────┐ │ Top camera │ │ GRBL board │ │ Side camera │ │ X/Y gantry · Z │ └──────────────┘ └──────┬─────────┘ │ touch ▼ ┌─────────────┐ │ Phone │ │ (unlocked) │ └─────────────┘The layers
Section titled “The layers”Any Model Context Protocol client. It receives screenshots as images and chooses high-level actions — a direction and distance, a tap, a swipe — never raw pixel coordinates.
MCP server
Section titled “MCP server”A small Python process that exposes hardware as tools. It owns the calibration math (pixels → motor steps), camera capture, and serial communication with the board.
Sensors & actuators
Section titled “Sensors & actuators”- Top camera looks straight down and reads the screen.
- Side camera views from ~45° and checks the stylus tip before a touch.
- GRBL board drives an X/Y gantry and a Z axis that lowers the stylus.
Completely untouched — no app, no profile, no developer mode. It only ever sees a finger.
Communication
Section titled “Communication”| Hop | Transport | Payload |
|---|---|---|
| Agent ↔ server | MCP (stdio) | tool calls + image results |
| Server ↔ cameras | USB UVC | JPEG frames |
| Server ↔ board | USB serial | G-code (GRBL dialect) |
Continue to The control loop to see how these hops cycle on every action.