You’re becoming
your agent’s QA.
Automate it, mate.
Iris gives your coding agent eyes inside your running app. Every change verified with evidence. No screenshots.
npm i -D @syrin/iris
0×
fewer tokens
2 min
to first verification
0
screenshots needed
01 · the problem
Sound familiar?
11:42 PM, again.
The agent can’t check its own work. Screenshots are expensive and blind. So you click through the same flows, after every edit.
the loop you’re stuck in
- 1. you: “build the checkout flow”
- 2. agent: “✅ Done!”
- 3. you: click around the app
- 4. you: it’s broken, paste the error
- 5. agent: “✅ Fixed!”
- ↻ goto 3, forever
✅ Done! Checkout works perfectly end-to-end.
POST /api/order returned 500
TypeError in the console, unread
“Order confirmed” dialog never opened
it didn’t lie on purpose. it just couldn’t see.
02 · the blind spots
What a screenshot
will never catch.
Your app already knows everything that happened. Iris exposes it to your agent over MCP, as evidence instead of pixels.
A failed API call
The page looks fine. The POST returned 500. Iris reports method, URL, status, and timing.
net · POST /api/order · 500
A button that quietly vanished
Baseline now, diff later. Iris tells you what silently went missing.
diff · "Export CSV" · missing
A webhook that never fired
Store commits, websockets, async jobs. One iris.signal() call surfaces them.
signal · order:paid · absent
A console error nobody read
Including the assertion teams forget: “no errors at all during this flow.”
console · level:error · absent
A dead button on page 7
Clicks that do nothing. Routes that never change. Iris observes the reaction, or the lack of one.
observe · click → no reaction
The exact file to fix
On React, Iris maps the broken element to its component and file:line. The agent goes straight there.
src · CheckoutForm.tsx:42
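The "baseline now, diff later" idea above can be pictured as a set difference over the interactive elements on a page. The following is a minimal sketch of that idea only. The `UiElement` type and the `missingSince` helper are hypothetical names made up for illustration, not part of Iris's API, and the real tool may compare far richer data.

```typescript
// Hypothetical shape for illustration only; not Iris's actual data model.
type UiElement = { role: string; name: string };

// Build a stable key so elements can be compared across snapshots.
const keyOf = (el: UiElement): string => `${el.role}:${el.name}`;

// Return the elements present in the baseline but absent from the current snapshot.
function missingSince(baseline: UiElement[], current: UiElement[]): UiElement[] {
  const present = new Set(current.map(keyOf));
  return baseline.filter((el) => !present.has(keyOf(el)));
}

// Sample data: the "Export CSV" button disappears after an edit.
const baseline: UiElement[] = [
  { role: "button", name: "Pay" },
  { role: "button", name: "Export CSV" },
];
const afterEdit: UiElement[] = [{ role: "button", name: "Pay" }];

const missing = missingSince(baseline, afterEdit);
console.log(missing.map(keyOf));
```

A diff like this names the missing element directly, where a screenshot would only show a page that still looks plausible.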
03 · how it works
The agent verifies like an engineer.
Four tiny tools over MCP. Dev-only SDK, nothing leaves your machine.
iris_query({ role: "button", name: "Pay" })
// → { found: true, ref: "btn-pay" }
// ~28 tokens
04 · the numbers
Cheap enough to run
on every edit.
0×
fewer tokens per verify step
0k → 2k
tokens on a 20-step flow
0%
deterministic verdicts, any LLM
tokens per step · measured, same page
Playwright MCP full snapshot
~0
Iris full-page (worst case)
~0
Iris interactive-only
~0
Iris verify loop
~0
Honest footnote: forced to dump the full tree, Iris is only ~1.6× smaller. The win is asking questions instead of dumping pages. node plan/vs-playwright.mjs
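As a rough sanity check on the "2k tokens on a 20-step flow" figure, the arithmetic can be sketched from the "~100 tokens per verify step" number quoted elsewhere on this page. This assumes every step costs about the same, which is a simplification. The function name is made up for illustration.

```typescript
// Figures taken from this page: ~100 tokens per verify step, 20-step flow.
const TOKENS_PER_VERIFY_STEP = 100;
const STEPS = 20;

// Assumes a flat per-step cost; real steps will vary.
function cumulativeTokens(steps: number, tokensPerStep: number): number {
  return steps * tokensPerStep;
}

const total = cumulativeTokens(STEPS, TOKENS_PER_VERIFY_STEP);
console.log(`${total} tokens`); // 2000 tokens, the "2k" end of the range
```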
cumulative tokens · 20-step verification flow
a typical qa pass on the same flow
You, clicking through the app
~10 min
Your agent with Iris, on every edit
~40 sec
And the agent never gets bored. It runs the checklist after every change, including the flows you stopped re-clicking weeks ago.
05 · see it run
Watch an agent prove its work.
The agent ships a change. Iris fails the assertion, with the evidence and the near-miss.
06 · human in the loop
The agent works.
You stay in charge.
A floating panel rides along in your app while the agent runs. Every look, act, and assert streams through it in real time.

07 · your checklist, automated
The test cases you never automated?
Your agent runs them now.
The QA checklist, the acceptance criteria, the “I just eyeball it” steps: each one maps almost 1:1 to an Iris check. The agent runs them on every edit.
“Login with valid creds lands on the dashboard”
net /api/login 200 + element tab "Dashboard" visible
“Deleting an item removes it from the list”
element { text, scope: list } absent
“Submitting shows a success toast”
text "Saved" visible
“No console errors on checkout”
console level:error absent
Your CI Playwright suite still gates releases. Iris is the checklist your agent runs while it codes, including the long tail nobody wrote automation for. Record a flow once and it self-heals as your UI drifts.
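One way to picture the checklist-to-check mapping above is as a small list of typed assertions evaluated against recorded evidence. This is an illustrative sketch only. The `Evidence` and `Check` types and the `passes` function are hypothetical and do not describe Iris's actual spec format. The sample data mirrors the failed checkout from earlier on this page.

```typescript
// Hypothetical types for illustration; Iris's real spec format may differ.
type Level = "log" | "warn" | "error";

type Evidence = {
  network: { method: string; url: string; status: number }[];
  console: { level: Level; message: string }[];
  visibleText: string[];
};

type Check =
  | { kind: "net"; url: string; status: number }
  | { kind: "text"; text: string; expect: "visible" | "absent" }
  | { kind: "console"; level: Level; expect: "absent" };

// Evaluate one check against the evidence collected during a flow.
function passes(check: Check, ev: Evidence): boolean {
  switch (check.kind) {
    case "net":
      return ev.network.some((r) => r.url === check.url && r.status === check.status);
    case "text": {
      const seen = ev.visibleText.includes(check.text);
      return check.expect === "visible" ? seen : !seen;
    }
    case "console":
      return !ev.console.some((e) => e.level === check.level);
  }
}

// Made-up sample evidence: the order request failed and an error was logged.
const evidence: Evidence = {
  network: [{ method: "POST", url: "/api/order", status: 500 }],
  console: [{ level: "error", message: "TypeError (sample message)" }],
  visibleText: ["Checkout"],
};

const checklist: Check[] = [
  { kind: "net", url: "/api/order", status: 200 },
  { kind: "text", text: "Order confirmed", expect: "visible" },
  { kind: "console", level: "error", expect: "absent" },
];

const results = checklist.map((c) => passes(c, evidence));
console.log(results);
```

Each checklist sentence becomes one entry in the list, so a failing run points at the specific expectation that was not met.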
08 · how it’s different
Not another browser driver.
Playwright drives browsers. Iris verifies apps, from the inside. They compose: drive with one, assert with Iris.
| | Playwright / Cypress | Playwright MCP / DevTools MCP | Iris |
|---|---|---|---|
| Agent verifies its own work, while coding | | | |
| Sees network, console, routes, signals | partial | partial | |
| Runs inside your real session & auth | | | |
| Points at the source file to fix | | | |
| ~100 tokens per verify step | | | |
| Scripted E2E suites that gate CI | composes | | |
| Cross-browser driving & automation | | | |
09 · quickstart
Two minutes to first verification.
Then ask your agent: “add a logout button and verify it works with Iris.”
Install one package
npm i -D @syrin/iris
SDK, React adapter, source mapping, spec runner, MCP server. One dev dependency.
Point your agent at the MCP server
// .mcp.json (Claude Code, Cursor, Windsurf…)
{
  "mcpServers": {
    "iris": { "command": "npx", "args": ["@syrin/iris"] }
  }
}
Works with any MCP-capable coding agent.
Embed the SDK (dev only)
import { iris } from '@syrin/iris';
if (import.meta.env.DEV) iris.connect({ session: 'my-app' });
Localhost-only, tree-shaken out of production, zero telemetry.
10 · what’s next
Iris verifies your app in dev, today.
Want your production site
agent-ready next?
AI agents already browse, buy, and book on websites. We’re building the production layer of Iris so they can see, act, and verify on your site reliably. Leave your email. A human will reach out, not a drip campaign.
11 · questions
Asked, answered,
with evidence.
Something else on your mind? Open an issue on GitHub. We answer fast.