What you'll build

Before you buy anything, here’s the whole picture: what the finished rig looks like, what it costs, and the handful of steps between a box of parts and an agent tapping your phone.

          ┌─ overhead camera (on a gooseneck, ~25 cm up)
          │  looks straight down at the screen
      ┌────────┐
      │ camera │
      └────────┘
          ╎ sees
┌───────────────────────┐        ┌──────────────┐
│ pen-plotter arm       │◄──USB──│ your Mac     │
│ X/Y gantry            │  12 V  │ physiclaw    │
│ Z = solenoid + stylus │◄──pwr──│ server       │
└─────────┬─────────────┘        └──────────────┘
          │ taps
   ┌──────────────┐
   │ the phone    │  seated face-up in a corner holder
   │ (unlocked)   │
   └──────────────┘

≈ $112

Off-the-shelf parts. The arm is a consumer pen plotter; the camera is an ordinary USB webcam.

≈ an afternoon

Assembly is mechanical — no soldering, no custom boards. Calibration is a guided wizard.

No special skills

If you can follow a wiring diagram and clamp a gooseneck, you can build one.

macOS today

macOS is the supported platform. Windows works for most of the flow, but serial ports are named differently there.

The rest of the docs are this build, in order. Each step hands off cleanly to the next:

  1. Build the hardware — order the parts, then assemble the arm, stylus, camera, and phone holder.

  2. Install the software — one command installs the physiclaw CLI; physiclaw doctor checks your machine.

  3. Prepare the phone — turn on AssistiveTouch and add the three iOS Shortcuts. Five minutes, once.

  4. Calibrate — a guided wizard teaches the arm and camera exactly where the screen is, so taps land where the agent aims.

  5. Run your first task — connect your agent and give it a goal in plain language. Watch it look, decide, and tap.
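
To make the calibration step concrete, here is a minimal sketch of the kind of mapping calibration has to produce: a function from a screen pixel to an arm position. It is not physiclaw's actual method. It assumes the stylus has been jogged over three screen corners and the arm positions recorded, and it interpolates between them (an affine map). The names `Point`, `ScreenCalibration`, and `to_arm`, and all the numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class ScreenCalibration:
    """Arm positions (mm) recorded with the stylus over three screen corners.

    Hypothetical structure for illustration, not physiclaw's real data model.
    """
    top_left: Point
    top_right: Point
    bottom_left: Point
    width_px: int   # screen width in pixels
    height_px: int  # screen height in pixels

    def to_arm(self, px: float, py: float) -> Point:
        """Map a screen pixel to an arm position by affine interpolation."""
        u = px / self.width_px   # 0.0 at the left edge, 1.0 at the right
        v = py / self.height_px  # 0.0 at the top edge, 1.0 at the bottom
        x = (self.top_left.x
             + u * (self.top_right.x - self.top_left.x)
             + v * (self.bottom_left.x - self.top_left.x))
        y = (self.top_left.y
             + u * (self.top_right.y - self.top_left.y)
             + v * (self.bottom_left.y - self.top_left.y))
        return Point(x, y)


# Made-up corner positions and screen size, for illustration only.
cal = ScreenCalibration(
    top_left=Point(40.0, 30.0),
    top_right=Point(110.0, 30.0),
    bottom_left=Point(40.0, 180.0),
    width_px=1170,
    height_px=2532,
)

center = cal.to_arm(585, 1266)  # the middle of the screen
print(center)
```

Three corners are enough to absorb offset, scale, and rotation of the phone in its holder. A real calibration would likely also use the camera to correct for lens distortion and perspective, which this sketch ignores.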