Kibitz


Kibitz Agent Protocol (draft v0)

How an AI agent joins a Kibitz room, perceives what's happening, and acts — over the same peer-to-peer data channel humans use. No new transport, no server.

One protocol, three faces. This spec is the core. The JS SDK (createAgent), an MCP server, and third-party agents are all thin adapters that speak it.

1. What an agent is

An agent is just a headless Kibitz participant: the composable-engine controller (mount({ headless }) → MountedWidget) with a brain wired to it. It joins a room like a human (subject to the same join gate, so it presents an invite token), appears in the roster, and exchanges messages over the DTLS-encrypted data mesh. The live Whist kibitzer is the reference: it watches a seat and chats, using exactly this surface.

2. Perception — two layers

Kibitz relays opaque, structured-cloneable data between participants; it never inspects it. So perception is layered:

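Two perception sources, one agent surface. Some state is PRIVATE per participant (a card game's hidden hand, a DM), and the host directs each participant's tailored view; broadcasting it would leak the hand to opponents. So there are two constructors that yield the same agent shape: createAgent(controller), which takes an already-joined controller, and createAgentFromBridge(appBridge), which is needed when perception comes from an app's host-tailored projection (§6).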

Both are exported from the Kibitz bundle (Kibitz.createAgent / Kibitz.createAgentFromBridge / Kibitz.cooldown), so a page that loads widget.js can build an agent with no import step.

3. The envelope vocabulary

A tiny universal vocabulary rides the opaque channel under a reserved key __kib_agent (src/agent/agent.ts). Everything else is passed through untouched as raw app data. (Note: this is the agent SDK's envelope key — distinct from Kibitz core's ContentMsg.k discriminator chat|app|pay|ink|idtoken|caps|schema, which the SDK does not use.)

__kib_agent | direction   | payload    | meaning
------------|-------------|------------|--------
chat        | both        | { text }   | a chat line (shows in apps that map it to their chat UI)
view        | app → agent | { view }   | an app-state snapshot for agents to perceive
act         | agent → app | { action } | an action request the app may honor (or ignore)

Raw (un-enveloped) messages are delivered to onData verbatim — apps that already have their own format keep using it.
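
To make the table concrete, here is a minimal sketch of wrapping and classifying messages. It assumes a flat layout in which the reserved key holds the kind and the payload fields sit beside it; that layout, and the helper names envelope and classify, are illustrative assumptions rather than the SDK's confirmed wire format.

```javascript
// Sketch only: the flat message layout is an assumption, not the SDK's confirmed wire format.
const AGENT_KEY = '__kib_agent';
const KINDS = new Set(['chat', 'view', 'act']);

// Wrap a payload in an agent envelope, e.g. envelope('chat', { text: 'hi' }).
function envelope(kind, payload) {
  if (!KINDS.has(kind)) throw new Error(`unknown envelope kind: ${kind}`);
  // The reserved key is written last so a payload field cannot overwrite it.
  return { ...payload, [AGENT_KEY]: kind };
}

// Classify an incoming message: a known envelope kind, or 'raw' for anything else.
function classify(msg) {
  const isObject = msg !== null && typeof msg === 'object';
  const kind = isObject ? msg[AGENT_KEY] : undefined;
  if (typeof kind === 'string' && KINDS.has(kind)) {
    const { [AGENT_KEY]: _ignored, ...payload } = msg;
    return { kind, payload };
  }
  // Un-enveloped app data is passed through untouched.
  return { kind: 'raw', payload: msg };
}
```

Anything the classifier reports as raw would be handed to onData verbatim, which is what lets an app keep its existing message format.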

Self-description (schema discovery). A view is opaque by design, so an app can publish a schema of its shape on a separate engine channel (ContentMsg.k='schema', registerSchema(name, version, schema)), re-broadcast to late joiners so discovery is order-independent. An agent reads them with getSchemas() / onSchema() (§7) and learns how to interpret the view without out-of-band docs. This is state shape, orthogonal to the capability layer (§4): publishing is gated by send-chat like any emission, so a read-only agent consumes schemas but doesn't publish them.
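
The order-independence described above can be sketched with a small in-memory registry. The method names mirror the ones in the text (registerSchema, getSchemas, onSchema), but this is a hypothetical local stand-in: the real engine relays schemas across the room, and replaying stored schemas to a new subscriber is an assumption made here to model the re-broadcast to late joiners.

```javascript
// Hypothetical in-memory registry; the real engine relays schemas over the room.
function createSchemaRegistry() {
  const schemas = new Map();   // "name@version" -> { name, version, schema }
  const listeners = new Set();
  return {
    registerSchema(name, version, schema) {
      const entry = { name, version, schema };
      schemas.set(`${name}@${version}`, entry);
      for (const fn of listeners) fn(entry);
    },
    getSchemas() {
      return [...schemas.values()];
    },
    // Assumed behavior: replay everything already published, then stream new entries,
    // so a late subscriber sees the same set as an early one.
    onSchema(fn) {
      for (const entry of schemas.values()) fn(entry);
      listeners.add(fn);
      return () => listeners.delete(fn); // unsubscribe
    },
  };
}
```

With this shape, an agent that subscribes after the app has published still receives the earlier schema, so neither side has to coordinate join order.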

4. Capabilities — a grant the engine enforces

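The trust unlock is that most agents only need to watch, and Kibitz makes "watch only" a guarantee, not a convention. Every participant carries a Grant (src/core/capabilities.ts) listing what it may perceive and what it may do. The capabilities named in this section include read-chat, read-roster, receive-directed, send-chat, see-screen, and hear-audio.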

Defaults are by kind: a human is full; an agent (meta.role='agent', set by createAgent) starts read-only (read-chat, read-roster, receive-directed; no act, no media).

Two layers of enforcement, not one:

  1. The SDK disables say/act/send when readOnly (they throw).
  2. The engine enforces the grant per peer, so a tampered agent client still can't act or see:
    • act = receiver-side drop — every honest peer ignores chat/app/pay/ink from a peer whose grant lacks send-chat (useCall.dispatchContent), and logs it to the host audit.
    • perceive = sender-side withholding — a peer never delivers data a recipient can't see, and a withheld media lane (see-screen/hear-audio) is swapped for a flowing placeholder on that peer's connection (mesh.gatedTrack) — so a read-only agent gets no audio and no screen share, ever.

So Kibitz provides the policy and enforces it, not merely the signal. The host can widen or revoke a grant live (consent panel + local audit feed), and the authority distributes the grant map (a caps control message) so the limits hold uniformly across every human in the room — see architecture.md §6. (The SDK act() envelope in §3 is a message kind; the cap that currently gates an agent's emission is send-chat.)
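
The receiver-side drop can be sketched as follows. The capability name send-chat and the gated kinds chat/app/pay/ink come from the text; representing a grant as a Set, the makeDispatcher helper, and the audit entry shape are assumptions for illustration and do not reproduce useCall.dispatchContent.

```javascript
// Content kinds that count as "acting"; taken from the enforcement rule above.
const GATED_KINDS = new Set(['chat', 'app', 'pay', 'ink']);

// grants: Map<peerId, Set<capability>> (assumed representation of the grant map)
// audit:  array that collects dropped-message records for the host
// deliver: callback invoked for messages that pass the gate
function makeDispatcher(grants, audit, deliver) {
  return function dispatchContent(fromId, msg) {
    const grant = grants.get(fromId) ?? new Set(); // unknown peer: no capabilities
    if (GATED_KINDS.has(msg.k) && !grant.has('send-chat')) {
      audit.push({ peer: fromId, kind: msg.k, reason: 'missing send-chat' });
      return false; // dropped by the receiver, whatever the sender's client claims
    }
    deliver(fromId, msg);
    return true;
  };
}
```

Because the check runs on every honest receiver, a tampered agent that skips the SDK's own readOnly guard still gets its messages ignored.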

Disclosure (backend/egress). An agent may declare the model it routes perception to — createAgent(ctrl, { backend: 'Claude' }) tags meta.backend and meta.egress, shown to the host as "what it sees leaves the E2EE room." Honesty surfaced for consent, not a privilege.

5. Identity — agents go through the gate

An agent is admitted the same way a human is — there is no special agent backdoor. The agent SDK takes an already-joined controller (mount({ headless, startOpen }) does the join, or the host runner does); when the room has a join gate, the agent joins through it exactly like a human — its mount carries the same joinCredential (an invite token) / verifyIdentity the gate requires, and the authority verifies it before rostering. So "let an agent in" is "issue it an invite," and revoking is revoking that invite.
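
A minimal sketch of "verify before rostering", with one code path for humans and agents. Everything here is hypothetical: createJoinGate, its method names, and the caller-supplied makeToken function are illustrative, and a real token source would need to produce unguessable values.

```javascript
// Hypothetical authority-side gate; names and token handling are illustrative only.
function createJoinGate(makeToken) {
  const invites = new Set();  // tokens that are currently valid
  const admitted = new Map(); // participantId -> meta
  return {
    issueInvite() {
      const token = makeToken(); // must be unguessable in real use
      invites.add(token);
      return token;
    },
    revokeInvite(token) {
      return invites.delete(token);
    },
    // The credential is checked BEFORE the participant is rostered.
    // meta.role is not consulted: agents take the same path as humans.
    join(participantId, meta, joinCredential) {
      if (!invites.has(joinCredential)) return false;
      admitted.set(participantId, meta);
      return true;
    },
    roster() {
      return [...admitted.keys()];
    },
  };
}
```

This sketch only blocks future joins after a revoke; whether revoking also removes an already-admitted participant is not stated in the text and is left out.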

6. Runtime — how it actually connects

Kibitz is WebRTC, so the agent needs a WebRTC stack. Three rungs, increasing in effort:

  1. Engine in a (headless) browser: Playwright hosts the app page; the agent calls createAgentFromBridge(appBridge); a Node side bridges to the LLM. Works today (the kibitzer, and whist/tools/agent-mcp/pageAgent.mjs). Needed when perception comes from an app's host-tailored projection (hidden hands).
  2. Node-WebRTC runtime (browserless): jsdom + node-datachannel + ws host just the Kibitz bundle; mount({ headless }) → createAgent(controller), no browser process. Built and live-validated (whist/tools/agent-mcp/): two browserless agents in separate Node processes join one room via the real broker, form the WebRTC data mesh, and exchange a message with no browser involved (liveMesh.test.mjs).
  3. MCP server: wraps (1) or (2) and exposes join / observe / say / leave so any LLM joins a room as a tool. Built (whist/tools/agent-mcp/server.mjs, dependency-free stdio JSON-RPC; KIBITZ_AGENT_RUNTIME=node selects the browserless runtime).

The agent code (Section 7) is identical across all three — only the host differs.

Transport is swappable. createAgent is written against a minimal AgentController (broadcast / onMessage / roster) — it does not assume WebRTC. An agent samples (request/response: "give me the view", "say this"), which a WebSocket relay carries better than a media mesh (simpler, no TURN, serverless-friendly). So the recommended backing for agent traffic is a WS-relay controller; reserve WebRTC for human↔human live co-browse and real-time duplex voice. Same AgentSession, different controller underneath.
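
To illustrate how small the controller contract is, here is a hypothetical in-memory hub that hands out controllers with the three members named above. The member names (broadcast, onMessage, roster) come from the text; their signatures, the hub itself, and the join helper are assumptions, and no network is involved.

```javascript
// Hypothetical loopback transport: no WebRTC, no WebSocket, just function calls.
function createLoopbackHub() {
  const peers = new Map(); // id -> { name, handlers: Set<fn> }
  return {
    // Returns a controller exposing the minimal surface: broadcast / onMessage / roster.
    join(id, name) {
      const handlers = new Set();
      peers.set(id, { name, handlers });
      return {
        broadcast(data) {
          for (const [peerId, peer] of peers) {
            if (peerId === id) continue; // do not echo to the sender
            for (const fn of peer.handlers) fn(data, id);
          }
        },
        onMessage(fn) {
          handlers.add(fn);
          return () => handlers.delete(fn); // unsubscribe
        },
        roster() {
          return [...peers].map(([peerId, p]) => ({ id: peerId, name: p.name }));
        },
      };
    },
  };
}
```

A WS-relay controller would keep the same three members and replace the loop in broadcast with a socket send, which is the sense in which the session code stays the same while the controller underneath changes.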

7. The agent surface

See src/agent/agent.ts for the typed interface. The shape:

// options: { readOnly?, backend?, egress? } — backend/egress are the disclosure (§4)
const a = createAgent(controller, { readOnly: true, backend: 'Claude' })
a.onView((view) => { /* perceive app state */ })
a.getView()                  // the CURRENT app state (e.g. to answer a chat about it)
a.onChat((m) => { /* m.name said m.text */ })
a.onRoster((people) => { /* who's here */ })
a.getRoster()                // current roster snapshot
a.onSchema((s) => { /* s.name@s.version describes s.schema */ })
a.getSchemas()               // every app schema published so far (how to read the view)
a.canAct                     // false when readOnly (and the engine enforces it regardless)
// acting (guarded — throw when readOnly):
a.say('nice lead')           // chat
a.act({ play: '7♠' })        // request an app action
a.send(payload, toId)        // raw opaque data
a.leave()
// rate gate so the agent doesn't flood (replies can ignore it to jump the queue):
const gate = cooldown(6000)
const now = Date.now()       // assumed: ready/stamp take a millisecond timestamp
if (gate.ready(now)) { gate.stamp(now); a.say(line) }
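
For readers who want to see what such a rate gate might look like inside, here is a minimal sketch inferred only from the usage above. It is named makeCooldown to keep it distinct from the SDK's cooldown; the assumed semantics are that ready(now) returns true once the interval has elapsed since the last stamp(now), with timestamps in milliseconds.

```javascript
// Sketch of a rate gate with the same ready/stamp shape as the usage above.
// Assumed semantics; not the SDK's implementation.
function makeCooldown(intervalMs) {
  let last = -Infinity; // never stamped, so the first check passes
  return {
    ready(now) {
      return now - last >= intervalMs;
    },
    stamp(now) {
      last = now;
    },
  };
}
```

Keeping ready and stamp separate is what lets a direct reply skip the check and speak immediately, while still stamping so that unprompted chatter stays throttled.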

7a. Validated against the live kibitzer

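This shape wasn't designed in a vacuum: the production Whist kibitzer's perceive→decide→act loop was refactored onto it (whist/tools/kibitzer/agent.mjs). Doing so surfaced three things the first draft lacked, all since folded back into the SDK (§8): a cooldown rate gate, getView() for reading the current app state, and agent-vs-agent filtering via meta.role.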

The kibitzer's game-specific code shrank to one "view interpretation" block; its agent logic is now transport- and app-agnostic. (It still runs an in-page mirror of the SDK, since the Playwright page can't import the TS module — see Section 8.)

8. Open questions (for v0 → v1)

Resolved since v0:

  • The capability / consent / audit layer is built and enforced (§4): read-only is now an engine guarantee, with host consent, revoke, audit, egress disclosure, and authority-distributed grants.
  • Schema discovery is built (§3/§7): an app self-describes its view shape over a schema engine channel, re-broadcast to late joiners, consumed via getSchemas()/onSchema().
  • Feature negotiation rides the roster: every peer advertises its engine version + features (e.g. schema.v1) so a newer build can see what an older one supports (COMPATIBILITY.md).
  • From the kibitzer refactor (§7a): cooldown and getView() are in the SDK; agent-vs-agent filtering rides meta.role.