Your agents could be failing silently right now. Find out in 2 min →
open source · MIT · by Syrin Labs

You’re becoming
your agent’s QA.
Automate it, mate.

Iris gives your coding agent eyes inside your running app. Every change verified with evidence. No screenshots.

Get early access
npm i -D @syrin/iris

0×

fewer tokens

2 min

to first verification

0

screenshots needed

DOM CHANGES · NETWORK CALLS · ROUTE CHANGES · CONSOLE ERRORS · ANIMATIONS · APP SIGNALS · REGRESSIONS · SOURCE MAPPING

01 · the problem

Sound familiar?
11:42 PM, again.

The agent can’t check its own work. Screenshots are expensive and blind. So you click through the same flows, after every edit.

the loop you’re stuck in

  1. you: “build the checkout flow”
  2. agent: “✅ Done!”
  3. you: click around the app
  4. you: it’s broken, paste the error
  5. agent: “✅ Fixed!”
  ↻ goto 3, forever

your coding agent · 11:42 PM

✅ Done! Checkout works perfectly end-to-end.

what actually happened

POST /api/order returned 500

TypeError in the console, unread

“Order confirmed” dialog never opened

it didn’t lie on purpose. it just couldn’t see.

02 · the blind spots

What a screenshot
will never catch.

Your app already knows everything that happened. Iris exposes it to your agent over MCP, as evidence instead of pixels.

A failed API call

The page looks fine. The POST returned 500. Iris reports method, URL, status, and timing.

net · POST /api/order · 500
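A minimal sketch of what that kind of evidence could look like, assuming a wrapper around a fetch-like function. `NetEvidence`, `FetchLike`, `recordNetwork`, and `failedCalls` are hypothetical names for illustration, not part of the Iris API.

```typescript
// Hypothetical sketch: none of these names come from the Iris API.
interface NetEvidence {
  method: string;
  url: string;
  status: number;
  durationMs: number;
}

// Minimal shape of the parts of fetch this sketch relies on.
type FetchLike = (
  url: string,
  init?: { method?: string; body?: string },
) => Promise<{ status: number }>;

// Wrap a fetch-like function so every completed call leaves a record behind.
function recordNetwork(fetchImpl: FetchLike, log: NetEvidence[]): FetchLike {
  return async (url, init) => {
    const startedAt = Date.now();
    const response = await fetchImpl(url, init);
    log.push({
      method: init?.method ?? "GET",
      url,
      status: response.status,
      durationMs: Date.now() - startedAt,
    });
    return response;
  };
}

// Treat any status of 400 or above as a failed call.
function failedCalls(log: NetEvidence[]): NetEvidence[] {
  return log.filter((entry) => entry.status >= 400);
}
```

The page can render perfectly while `failedCalls(log)` still returns the `POST /api/order` entry with status 500. That record is what a screenshot has no way to show.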

A button that quietly vanished

Baseline now, diff later. Iris tells you what silently went missing.

diff · "Export CSV" · missing
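The baseline-then-diff idea can be sketched in a few lines, assuming a snapshot is just a list of interactive elements identified by role and name. The `ElementInfo` shape and `missingSince` are hypothetical, not Iris internals.

```typescript
// Hypothetical snapshot shape: interactive elements by role and accessible name.
interface ElementInfo {
  role: string;
  name: string;
}

// Two elements count as "the same" when role and name both match.
const keyOf = (el: ElementInfo): string => `${el.role}:${el.name}`;

// Everything that was in the baseline but is no longer in the current snapshot.
function missingSince(baseline: ElementInfo[], current: ElementInfo[]): ElementInfo[] {
  return baseline.filter(
    (before) => !current.some((now) => keyOf(now) === keyOf(before)),
  );
}
```

Take the baseline before the agent edits anything, snapshot again after, and the diff names the element that quietly went away.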

A webhook that never fired

Store commits, websockets, async jobs. One iris.signal() call surfaces them.

signal · order:paid · absent
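Only the name `iris.signal()` appears on this page; its signature does not. So here is a stand-in, under the assumption that a signal is a named event with an optional payload. `SignalLog` and its methods are hypothetical.

```typescript
// Hypothetical stand-in; the real iris.signal() signature is not shown on this page.
interface Signal {
  name: string;
  payload?: unknown;
  at: number; // timestamp in ms
}

class SignalLog {
  private readonly entries: Signal[] = [];

  // App code calls this where something invisible happens,
  // for example inside a webhook handler or after a store commit.
  signal(name: string, payload?: unknown): void {
    this.entries.push({ name, payload, at: Date.now() });
  }

  // The verifier asks whether a given signal was ever emitted.
  seen(name: string): boolean {
    return this.entries.some((entry) => entry.name === name);
  }
}
```

If the payment webhook never fires, `seen("order:paid")` stays `false`, and that absence is the evidence.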

A console error nobody read

Including the assertion teams forget: “no errors at all during this flow.”

console · level:error · absent
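One way to picture the "no errors at all during this flow" assertion: temporarily wrap the error logger, run the flow, and hand back whatever was logged. This is a sketch under that assumption; `ErrorSink` and `errorsDuring` are hypothetical names, not the Iris API.

```typescript
// Anything with an error() method works here, including the global console.
interface ErrorSink {
  error: (...args: unknown[]) => void;
}

// Run a flow and return every error logged while it was running.
async function errorsDuring(
  sink: ErrorSink,
  flow: () => Promise<void> | void,
): Promise<unknown[][]> {
  const captured: unknown[][] = [];
  const original = sink.error;
  sink.error = (...args: unknown[]) => {
    captured.push(args);
    original.apply(sink, args); // still forward to the real logger
  };
  try {
    await flow();
  } finally {
    sink.error = original; // always restore, even if the flow throws
  }
  return captured;
}
```

The assertion is then a single check: the returned array has length zero. An unread TypeError becomes a failed step instead of a line nobody scrolled to.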

A dead button on page 7

Clicks that do nothing. Routes that never change. Iris observes the reaction, or the lack of one.

observe · click → no reaction

The exact file to fix

On React, Iris maps the broken element to its component and file:line. The agent goes straight there.

src · CheckoutForm.tsx:42

03 · how it works

The agent verifies like an engineer.

Four tiny tools over MCP. Dev-only SDK, nothing leaves your machine.

agent → iris (MCP) · Look

iris_query({
  role: "button",
  name: "Pay"
})
// → { found: true, ref: "btn-pay" }
// ~28 tokens
deterministic · no screenshots · no vision model · works with any LLM
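To make the shape of that query concrete, here is a toy lookup over an in-memory tree. It assumes elements carry a role, a name, and a ref, which mirrors the call above. `UiNode`, `QueryResult`, and `query` are hypothetical and say nothing about how Iris resolves elements internally.

```typescript
// Hypothetical tree node: role and name to match on, ref to hand back.
interface UiNode {
  ref: string;
  role: string;
  name: string;
  children?: UiNode[];
}

type QueryResult = { found: true; ref: string } | { found: false };

// Depth-first search in document order; returns the first match.
function query(node: UiNode, want: { role: string; name: string }): QueryResult {
  if (node.role === want.role && node.name === want.name) {
    return { found: true, ref: node.ref };
  }
  for (const child of node.children ?? []) {
    const hit = query(child, want);
    if (hit.found) return hit;
  }
  return { found: false };
}
```

The answer is a few fields, not a page dump, which is why a question like this stays small and gives the same result whichever model asks it.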

04 · the numbers

Cheap enough to run
on every edit.

0×

fewer tokens per verify step

138k → 2k

tokens on a 20-step flow

0%

deterministic verdicts, any LLM

tokens per step · measured, same page

Playwright MCP full snapshot

~0

Iris full-page (worst case)

~0

Iris interactive-only

~0

Iris verify loop

~0

Honest footnote: forced to dump the full tree, Iris is only ~1.6× smaller. The win is asking questions instead of dumping pages.
node plan/vs-playwright.mjs

cumulative tokens · 20-step verification flow

full-tree ~138k · iris ~2k
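The arithmetic behind those two totals, worked through. The inputs are the rounded figures on this page (~138k and ~2k over 20 steps), so treat the outputs as approximate too.

```typescript
// Rounded figures from the chart above; both are approximations.
const steps = 20;
const fullTreeTokens = 138000;
const irisTokens = 2000;

// Average cost of one verification step under each approach.
const perStepFullTree = fullTreeTokens / steps; // 6900
const perStepIris = irisTokens / steps; // 100

// How many times larger the full-tree total is for the same flow.
const ratio = fullTreeTokens / irisTokens; // 69
```

The ~100 tokens per step that falls out of this matches the per-step figure quoted in the comparison further down.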

a typical qa pass on the same flow

You, clicking through the app

~10 min

Your agent with Iris, on every edit

~40 sec

And the agent never gets bored. It runs the checklist after every change, including the flows you stopped re-clicking weeks ago.

05 · see it run

Watch an agent prove its work.

The agent ships a change. Iris fails the assertion, with the evidence and the near-miss.

06 · human in the loop

The agent works.
You stay in charge.

A floating panel rides along in your app while the agent runs. Every look, act, and assert streams through it in real time.

Iris floating control panel: steer state

07 · your checklist, automated

The test cases you never automated?
Your agent runs them now.

The QA checklist, the acceptance criteria, the “I just eyeball it” steps: each one maps almost 1:1 to an Iris check. The agent runs them on every edit.

“Login with valid creds lands on the dashboard”

net /api/login 200 + element tab "Dashboard" visible

“Deleting an item removes it from the list”

element { text, scope: list } absent

“Submitting shows a success toast”

text "Saved" visible

“No console errors on checkout”

console level:error absent
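A sketch of that near 1:1 mapping, assuming checks are plain data evaluated against recorded evidence. The `Evidence` and `Check` shapes and the `passes` function are hypothetical, not the Iris spec format.

```typescript
// Hypothetical record of what happened during one flow.
interface Evidence {
  net: { url: string; status: number }[];
  consoleLevels: string[];
  visibleText: string[];
}

// Hypothetical check shapes, one per kind of checklist line.
type Check =
  | { kind: "net"; url: string; status: number }
  | { kind: "text-visible"; text: string }
  | { kind: "console-absent"; level: string };

// Evaluate a single check against the evidence.
function passes(check: Check, ev: Evidence): boolean {
  switch (check.kind) {
    case "net":
      return ev.net.some((c) => c.url === check.url && c.status === check.status);
    case "text-visible":
      return ev.visibleText.indexOf(check.text) !== -1;
    case "console-absent":
      return ev.consoleLevels.indexOf(check.level) === -1;
  }
}

// A checklist passes only when every line on it does.
function runChecklist(checks: Check[], ev: Evidence): boolean {
  return checks.every((check) => passes(check, ev));
}
```

Each sentence from the QA doc becomes one entry in the `checks` array, so adding a test case means adding a line of data rather than writing a script.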

Your CI Playwright suite still gates releases. Iris is the checklist your agent runs while it codes, including the long tail nobody wrote automation for. Record a flow once and it self-heals as your UI drifts.

08 · how it’s different

Not another browser driver.

Playwright drives browsers. Iris verifies apps, from the inside. They compose: drive with one, assert with Iris.

Playwright / Cypress · Playwright MCP / DevTools MCP · Iris
Agent verifies its own work, while coding
Sees network, console, routes, signals · partial · partial
Runs inside your real session & auth
Points at the source file to fix
~100 tokens per verify step
Scripted E2E suites that gate CI · composes
Cross-browser driving & automation

09 · quickstart

Two minutes to first verification.

Then ask your agent: “add a logout button and verify it works with Iris.”

01

Install one package

npm i -D @syrin/iris

SDK, React adapter, source mapping, spec runner, MCP server. One dev dependency.

02

Point your agent at the MCP server

// .mcp.json (Claude Code, Cursor, Windsurf…)
{
  "mcpServers": {
    "iris": {
      "command": "npx",
      "args": ["@syrin/iris"]
    }
  }
}

Works with any MCP-capable coding agent.

03

Embed the SDK (dev only)

import { iris } from '@syrin/iris';

if (import.meta.env.DEV)
  iris.connect({ session: 'my-app' });

Localhost-only, tree-shaken out of production, zero telemetry.

10 · what’s next

Iris verifies your app in dev, today.
Want your production site
agent-ready next?

AI agents already browse, buy, and book on websites. We’re building the production layer of Iris so they can see, act, and verify on your site reliably. Leave your email. A human will reach out, not a drip campaign.

no spam · no drip campaign · a human reads every signup

11 · questions

Asked, answered,
with evidence.

Something else on your mind? Open an issue on GitHub. We answer fast.