The control loop

Every action PhysiClaw takes is one turn of the same five-phase loop. Keeping the loop fixed is what makes the system reliable: each phase has one job and one check.

 Top camera ──→ AI agent ──→ 3-axis arm ──→ Side camera ──→ Aligned?
 (read screen)  (decide)     (move stylus)  (check tip)      │
      ▲                                                   yes │ no
      │                                                    │  │
      │     Touch phone ◄──────────────────────────────────┘  │
      │          │                                            │
      └──────────┘  (next action)          adjust & retry ◄───┘
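The loop in the diagram can be sketched as one Python function. This is a minimal sketch under stated assumptions. Each phase is passed in as a callback, and the names (run_loop, park_and_shot, is_aligned, adjust, max_adjustments) are hypothetical, not PhysiClaw's actual API. The cap on adjustments is also an assumption, added so that a bad calibration cannot nudge forever.

```python
from typing import Callable, Optional

# Hypothetical phase callbacks; these names are illustrative, not PhysiClaw's API.
ParkAndShot = Callable[[], bytes]          # retract the stylus, return a top-camera image
Decide = Callable[[bytes], Optional[str]]  # pick the next action, or None when the task is done
Move = Callable[[str], None]               # turn the action into motor coordinates, drive the arm
IsAligned = Callable[[], bool]             # side-camera check: is the tip on target?
Adjust = Callable[[], None]                # small corrective nudge of the arm
Touch = Callable[[], None]                 # drop the stylus, touch, retract


def run_loop(park_and_shot: ParkAndShot, decide: Decide, move: Move,
             is_aligned: IsAligned, adjust: Adjust, touch: Touch,
             max_adjustments: int = 5) -> int:
    """Run turns until decide() returns None; return how many touches were made."""
    touches = 0
    while True:
        screen = park_and_shot()           # 1. Park & screenshot
        action = decide(screen)            # 2. Decide
        if action is None:
            return touches
        move(action)                       # 3. Move
        adjustments = 0
        while not is_aligned():            # 4. Verify: check, nudge, check again
            if adjustments == max_adjustments:
                # Assumed safety cap: never touch with a misaligned stylus.
                raise RuntimeError("stylus did not align; aborting before the touch")
            adjust()
            adjustments += 1
        touch()                            # 5. Touch, then start the next turn
        touches += 1


# Toy run with fake hardware: three planned taps, each needing one nudge.
plan = ["tap A", "tap B", "tap C"]
state = {"aligned": False}


def fake_move(action: str) -> None:
    state["aligned"] = False               # every fresh move lands slightly off


def fake_nudge() -> None:
    state["aligned"] = True                # one nudge is enough in this toy


print(run_loop(lambda: b"frame",
               lambda _screen: plan.pop(0) if plan else None,
               fake_move,
               lambda: state["aligned"],
               fake_nudge,
               lambda: None))              # prints 3
```

Keeping every phase behind its own callback mirrors the "one job per phase" rule: the hardware can be swapped for fakes, as in the toy run, without touching the loop itself.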

The five phases

Park & screenshot. The stylus retracts out of frame so the top camera gets a clean, unobstructed view. The agent receives the screenshot.
Decide. The agent reads the screen and picks a high-level action: either a direction and distance, such as move("down-right", "large"), or a gesture such as a tap or swipe.
Move. The server converts the decision into motor coordinates and drives the arm.
Verify. The side camera checks the stylus tip against the target. If the tip is off, the agent nudges the arm and checks again, closing the gap before any touch happens.
Touch & repeat. The stylus drops onto the screen so the phone registers a touch, then retracts, and the loop starts over for the next action.
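The Move phase turns a coarse decision such as move("down-right", "large") into arm coordinates. A minimal sketch, assuming the screen is mapped to a millimetre grid with x to the right and y downward. The direction table, the millimetre size of each distance bucket, the steps-per-millimetre constant, and the names plan_move and to_motor_steps are all hypothetical placeholders for whatever calibration the real server uses. The vertical axis that drops the stylus is left out because it only matters in the Touch phase.

```python
import math

# Hypothetical vocabulary: screen-space unit directions (x right, y down).
DIRECTIONS = {
    "up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0),
    "up-left": (-1, -1), "up-right": (1, -1),
    "down-left": (-1, 1), "down-right": (1, 1),
}
# Assumed step sizes in millimetres; real values depend on the phone and arm.
DISTANCES_MM = {"small": 2.0, "medium": 8.0, "large": 20.0}
# Assumed motor calibration: stepper steps per millimetre of travel.
STEPS_PER_MM = 80


def plan_move(x_mm: float, y_mm: float, direction: str,
              distance: str) -> tuple[float, float]:
    """Return the new tip position (mm) after one coarse move."""
    dx, dy = DIRECTIONS[direction]
    length = math.hypot(dx, dy)  # normalise so diagonals travel the same distance
    step = DISTANCES_MM[distance]
    return x_mm + step * dx / length, y_mm + step * dy / length


def to_motor_steps(x_mm: float, y_mm: float) -> tuple[int, int]:
    """Convert a position in millimetres to absolute X/Y motor steps."""
    return round(x_mm * STEPS_PER_MM), round(y_mm * STEPS_PER_MM)


target = plan_move(10.0, 10.0, "right", "large")
print(target)                   # (30.0, 10.0)
print(to_motor_steps(*target))  # (2400, 800)
```

Normalising the diagonal keeps "large" the same length in every direction; whether PhysiClaw does this is not stated here, so treat it as a design choice of the sketch.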

Failure handling

Misalignment — caught at verify; the agent retries with a smaller move.
Missed touch — the next screenshot shows the screen didn’t change; the agent taps again.
Unexpected screen — a popup or ad appears; the agent simply treats it as the new state and decides again. There is no brittle script to fall out of.
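The three failure rules above are small enough to sketch directly. A minimal sketch, assuming screenshots arrive as equal-length lists of grayscale pixel values and that a missed touch shows up as a near-identical frame. The pixel-diff heuristic and its thresholds are a swapped-in stand-in for the agent's own reading of the screen, and smaller, screen_changed, and after_touch are hypothetical names. Any visible change, including a popup, sends the loop back to Decide, which is the point of the third rule.

```python
from typing import Optional, Sequence

# Hypothetical size ladder: each misalignment retry steps down one size.
SIZES = ["large", "medium", "small"]


def smaller(size: str) -> str:
    """Next smaller move after a misalignment; stays at 'small' once reached."""
    i = SIZES.index(size)
    return SIZES[min(i + 1, len(SIZES) - 1)]


def screen_changed(before: Sequence[int], after: Sequence[int],
                   pixel_tol: int = 8, min_fraction: float = 0.01) -> bool:
    """Crude frame diff (assumed heuristic): did enough grayscale pixels change?"""
    if len(before) != len(after):
        return True                     # a different frame size counts as a change
    changed = sum(1 for a, b in zip(before, after) if abs(a - b) > pixel_tol)
    return changed >= min_fraction * len(before)


def after_touch(before: Sequence[int], after: Sequence[int],
                last_action: str) -> Optional[str]:
    """Action to repeat after a missed touch, or None to decide again from the new screen."""
    if not screen_changed(before, after):
        return last_action              # missed touch: nothing changed, repeat it
    return None                         # expected result or a popup: re-decide either way


blank = [0] * 100
popup = [0] * 90 + [255] * 10
print(smaller("large"), smaller("small"))   # medium small
print(after_touch(blank, blank, "tap"))     # tap
print(after_touch(blank, popup, "tap"))     # None
```

Note that after_touch never tries to tell a popup from the intended result; both simply become the new state for the next Decide.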

This “observe the result, then decide again” design means PhysiClaw recovers from surprises the same way a person would — by looking and trying again.