← The Kibitz Engine · deep dive
How an AI agent joins a Kibitz room, perceives what's happening, and acts — over the same peer-to-peer data channel humans use. No new transport, no server.
One protocol, three faces. This spec is the core. The JS SDK (createAgent), an MCP server, and third-party agents are all thin adapters that speak it.
An agent is just a headless Kibitz participant — the composable-engine controller
(mount({ headless }) → MountedWidget) with a brain wired to it. It joins a room like
a human (subject to the same join gate — it presents an invite token),
appears in the roster, and exchanges messages over the DTLS-encrypted data mesh. The live
Whist kibitzer is the reference: it watches a seat and chats, using exactly this surface.
Kibitz relays opaque, structured-cloneable data between participants; it never inspects it. So perception is layered:
view each turn; the kibitzer reads pub/myHand/chat out of it.)
Two perception sources, one agent surface. Some state is PRIVATE per participant (a card game's hidden hand, a DM) and the host directs each participant's tailored view; broadcasting it would leak the hand to opponents. So there are two constructors that yield the same agent shape:
- createAgent(controller): perceive over the generic broadcast/onMessage channel.
- createAgentFromBridge(appBridge): perceive an app's host-tailored, per-participant projection (Whist's onView/sendChat). The right choice for hidden-information apps.
Both are exported from the Kibitz bundle (Kibitz.createAgent / Kibitz.createAgentFromBridge
/ Kibitz.cooldown), so a page that loads widget.js can build an agent with no import step.
A tiny universal vocabulary rides the opaque channel under a reserved key __kib_agent
(src/agent/agent.ts). Everything else is passed through untouched as raw app data. (Note:
this is the agent SDK's envelope key — distinct from Kibitz core's ContentMsg.k
discriminator chat|app|pay|ink|idtoken|caps|schema, which the SDK does not use.)
| __kib_agent | direction | payload | meaning |
|---|---|---|---|
| chat | both | { text } | a chat line (shows in apps that map it to their chat UI) |
| view | app → agent | { view } | an app-state snapshot for agents to perceive |
| act | agent → app | { action } | an action request the app may honor (or ignore) |
Raw (un-enveloped) messages are delivered to onData verbatim — apps that already have
their own format keep using it.
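The envelope rule above can be sketched as a small classifier. This is a hedged illustration, not the code in src/agent/agent.ts: the `Classified` type, the `classify` helper, and the assumption that payload fields sit beside the `__kib_agent` key are all hypothetical.

```typescript
// Hypothetical sketch: names and field layout are assumptions, not the SDK's actual types.
const KEY = "__kib_agent";

type Classified =
  | { kind: "chat"; text: string }
  | { kind: "view"; view: unknown }
  | { kind: "act"; action: unknown }
  | { kind: "raw"; data: unknown };

// Sort an incoming message into one of the three envelope kinds, or pass it through as raw.
function classify(msg: unknown): Classified {
  if (typeof msg === "object" && msg !== null && KEY in msg) {
    const m = msg as Record<string, unknown>;
    const kind = m[KEY];
    const text = m["text"];
    if (kind === "chat" && typeof text === "string") {
      return { kind: "chat", text };
    }
    if (kind === "view") return { kind: "view", view: m["view"] };
    if (kind === "act") return { kind: "act", action: m["action"] };
  }
  // Anything un-enveloped (or, in this sketch, malformed) is handed over verbatim as app data.
  return { kind: "raw", data: msg };
}
```

In this sketch an app that already has its own message format never collides with the agent vocabulary unless it happens to use the reserved key.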
Self-description (schema discovery). A view is opaque by design, so an app can publish a
schema of its shape on a separate engine channel (ContentMsg.k='schema',
registerSchema(name, version, schema)), re-broadcast to late joiners so discovery is
order-independent. An agent reads them with getSchemas() / onSchema() (§7) and learns
how to interpret the view without out-of-band docs. This is state shape, orthogonal to the
capability layer (§4): publishing is gated by send-chat like any emission, so a read-only
agent consumes schemas but doesn't publish them.
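The order-independence described above can be modelled in a few lines. This is a minimal sketch under stated assumptions: `SchemaPublisher`, `addPeer`, the string-typed version, and the `whist.view` schema name are hypothetical and only illustrate the idea of replaying published schemas to late joiners.

```typescript
// Hypothetical model of schema discovery; SchemaPublisher and its methods are illustrative only.
interface SchemaEntry {
  name: string;
  version: string;
  schema: unknown;
}

type Peer = (entry: SchemaEntry) => void;

class SchemaPublisher {
  private entries = new Map<string, SchemaEntry>();
  private peers = new Set<Peer>();

  // Publish a schema: remember it, then send it to everyone currently connected.
  registerSchema(name: string, version: string, schema: unknown): void {
    const entry: SchemaEntry = { name, version, schema };
    this.entries.set(name, entry);
    this.peers.forEach((peer) => peer(entry));
  }

  // A late joiner is sent every schema published so far, so arrival order does not matter.
  addPeer(peer: Peer): void {
    this.peers.add(peer);
    this.entries.forEach((entry) => peer(entry));
  }
}

const publisher = new SchemaPublisher();
const seenByLateJoiner: string[] = [];

// The schema is published BEFORE the agent joins; the agent still learns it.
publisher.registerSchema("whist.view", "1", { fields: ["pub", "myHand", "chat"] });
publisher.addPeer((e) => seenByLateJoiner.push(`${e.name}@${e.version}`));
```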
The trust unlock is that most agents only need to watch — and Kibitz makes "watch only"
a guarantee, not a convention. Every participant carries a Grant
(src/core/capabilities.ts) of what it may perceive and how it may act:
- perceive: see-screen, hear-audio, read-chat, read-roster, receive-directed.
- act: send-chat, speak, act.
Defaults are by kind: a human is full; an agent (meta.role='agent', set by
createAgent) starts read-only: read-chat/read-roster/receive-directed, no act,
no media.
Two layers of enforcement, not one:
- SDK: the agent SDK guards say/act/send when readOnly (they throw).
- Engine: emission is gated on send-chat (useCall.dispatchContent), and the engine logs it to the host audit. Media a peer is not granted (see-screen/hear-audio) is swapped for a flowing placeholder on that peer's connection (mesh.gatedTrack), so a read-only agent gets no audio and no screen share, ever.
So Kibitz provides the policy and enforces it, not merely the signal. The host can widen
or revoke a grant live (consent panel + local audit feed), and the authority distributes
the grant map (a caps control message) so the limits hold uniformly across every human in
the room — see architecture.md §6. (The SDK
act() envelope in §3 is a message kind; the cap that currently gates an agent's emission
is send-chat.)
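A grant map of this kind could look roughly like the following. This is a sketch, not the contents of src/core/capabilities.ts: the capability names and the defaults come from the text above, while `defaultGrant`, `mayEmit`, and the participant ids are hypothetical.

```typescript
// Hypothetical sketch of the grant model; the real types live in src/core/capabilities.ts.
type Capability =
  | "see-screen" | "hear-audio" | "read-chat" | "read-roster" | "receive-directed"
  | "send-chat" | "speak" | "act";

const ALL_CAPS: readonly Capability[] = [
  "see-screen", "hear-audio", "read-chat", "read-roster", "receive-directed",
  "send-chat", "speak", "act",
];

type Grant = Set<Capability>;

// Defaults by kind: a human is full, an agent starts read-only with no media.
function defaultGrant(role: "human" | "agent"): Grant {
  return role === "human"
    ? new Set<Capability>(ALL_CAPS)
    : new Set<Capability>(["read-chat", "read-roster", "receive-directed"]);
}

// Check consulted before honoring a sender's emission (the gating cap is send-chat).
// Unknown senders have no grant and are refused.
function mayEmit(grants: Map<string, Grant>, senderId: string): boolean {
  return grants.get(senderId)?.has("send-chat") ?? false;
}

const grants = new Map<string, Grant>([
  ["alice", defaultGrant("human")],
  ["kibitzer", defaultGrant("agent")],
]);
```

Widening a grant live then amounts to mutating the map entry, for example `grants.get("kibitzer")?.add("send-chat")`, and redistributing the map to every peer.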
Disclosure (backend/egress). An agent may declare the model it routes perception to —
createAgent(ctrl, { backend: 'Claude' }) tags meta.backend and meta.egress, shown to the
host as "what it sees leaves the E2EE room." Honesty surfaced for consent, not a privilege.
An agent is admitted the same way a human is — there is no special agent backdoor. The
agent SDK takes an already-joined controller (mount({ headless, startOpen }) does the
join, or the host runner does); when the room has a join gate, the agent
joins through it exactly like a human — its mount carries the same joinCredential (an invite
token) / verifyIdentity the gate requires, and the authority verifies it before rostering.
So "let an agent in" is "issue it an invite," and revoking is revoking that invite.
Kibitz is WebRTC, so the agent needs a WebRTC stack. Three rungs, increasing in effort:
1. In a browser page: createAgentFromBridge(appBridge); a Node side bridges to the LLM. Works today (the kibitzer, and whist/tools/agent-mcp/pageAgent.mjs). Needed when perception comes from an app's host-tailored projection (hidden hands).
2. Browserless Node: node-datachannel + ws host just the Kibitz bundle; mount({headless}) → createAgent(controller), no browser process. Built + LIVE-VALIDATED (whist/tools/agent-mcp/): two browserless agents in separate Node processes join one room via the real broker, form the WebRTC data mesh, and exchange a message, with no browser (liveMesh.test.mjs).
3. MCP server: exposes join / observe / say / leave so any LLM joins a room as a tool. Built (whist/tools/agent-mcp/server.mjs, dependency-free stdio JSON-RPC; KIBITZ_AGENT_RUNTIME=node selects the browserless runtime).
The agent code (Section 7) is identical across all three; only the host differs.
Transport is swappable. createAgent is written against a minimal AgentController
(broadcast / onMessage / roster) — it does not assume WebRTC. An agent samples
(request/response: "give me the view", "say this"), which a WebSocket relay carries
better than a media mesh (simpler, no TURN, serverless-friendly). So the recommended
backing for agent traffic is a WS-relay controller; reserve WebRTC for human↔human live
co-browse and real-time duplex voice. Same AgentSession, different controller underneath.
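To illustrate why the controller is swappable, here is a minimal sketch of a non-WebRTC backing. The three method names come from the text; their exact signatures, the `LoopbackRelay` class, and the participant ids are assumptions, and the typed interface in src/agent/agent.ts may differ.

```typescript
// Hypothetical shape: the real AgentController is typed in src/agent/agent.ts and may differ.
interface AgentController {
  broadcast(data: unknown): void;
  onMessage(handler: (data: unknown, fromId: string) => void): void;
  roster(): string[];
}

type Handler = (data: unknown, fromId: string) => void;

// An in-memory relay standing in for a WebSocket server: it fans each message out to
// every other member. No WebRTC is involved.
class LoopbackRelay {
  private handlers = new Map<string, Handler[]>();

  join(id: string): AgentController {
    this.handlers.set(id, []);
    return {
      broadcast: (data) => {
        this.handlers.forEach((list, peerId) => {
          if (peerId !== id) list.forEach((h) => h(data, id)); // skip the sender
        });
      },
      onMessage: (handler) => {
        this.handlers.get(id)?.push(handler);
      },
      roster: () => Array.from(this.handlers.keys()),
    };
  }
}

const relay = new LoopbackRelay();
const hostCtrl = relay.join("host");
const agentCtrl = relay.join("agent-1");

const received: unknown[] = [];
agentCtrl.onMessage((data) => {
  received.push(data);
});
hostCtrl.broadcast({ __kib_agent: "view", view: { trick: 3 } });
```

Anything that satisfies the same three methods, whether a WebRTC mesh or a relay, could sit under the same agent code.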
See src/agent/agent.ts for the typed interface. The shape:
// options: { readOnly?, backend?, egress? } — backend/egress are the disclosure (§4)
const a = createAgent(controller, { readOnly: true, backend: 'Claude' })
a.onView((view) => { /* perceive app state */ })
a.getView() // the CURRENT app state (e.g. to answer a chat about it)
a.onChat((m) => { /* m.name said m.text */ })
a.onRoster((people) => { /* who's here */ })
a.getRoster() // current roster snapshot
a.onSchema((s) => { /* s.name@s.version describes s.schema */ })
a.getSchemas() // every app schema published so far (how to read the view)
a.canAct // false when readOnly (and the engine enforces it regardless)
// acting (guarded — throw when readOnly):
a.say('nice lead') // chat
a.act({ play: '7♠' }) // request an app action
a.send(payload, toId) // raw opaque data
a.leave()
// rate gate so the agent doesn't flood (replies can ignore it to jump the queue):
const gate = cooldown(6000); if (gate.ready(now)) { gate.stamp(now); a.say(line) }
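For readers who want to see what such a rate gate might do internally, here is a minimal sketch. It is a hypothetical re-implementation named `makeCooldown`, not the SDK's `cooldown`; only the `ready(now)` / `stamp(now)` pair is taken from the usage line above.

```typescript
// Hypothetical re-implementation for illustration; the SDK's own cooldown may differ.
interface Cooldown {
  ready(now: number): boolean;
  stamp(now: number): void;
}

function makeCooldown(ms: number): Cooldown {
  let last = Number.NEGATIVE_INFINITY; // never stamped yet, so the first check passes
  return {
    ready: (now) => now - last >= ms,
    stamp: (now) => {
      last = now;
    },
  };
}

const gate = makeCooldown(6000);
const spoken: string[] = [];

// Try to speak at three timestamps (ms); only lines outside the cooldown window get through.
const attempts: Array<[number, string]> = [
  [0, "nice lead"],
  [2000, "too soon"],
  [6000, "well played"],
];
for (const [now, line] of attempts) {
  if (gate.ready(now)) {
    gate.stamp(now);
    spoken.push(line);
  }
}
```

Passing `now` in explicitly, rather than reading the clock inside the gate, keeps the gate deterministic and easy to test.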
This shape isn't designed in a vacuum: the production Whist kibitzer's perceive→decide→act
loop was refactored onto it (whist/tools/kibitzer/agent.mjs). Doing so surfaced and
folded back three things the first draft lacked:
- getView(): an agent replying to a chat line needs the current state to answer.
- cooldown(ms): every agent needs a flood gate; it was hand-rolled, now it's in the SDK.
- meta.role: the kibitzer skipped other agents by a uid-prefix hack; the protocol does it cleanly off the role tag every agent sets.
The kibitzer's game-specific code shrank to one "view interpretation" block; its agent logic is now transport- and app-agnostic. (It still runs an in-page mirror of the SDK, since the Playwright page can't import the TS module; see Section 8.)
The SDK is exported from the Kibitz bundle (Kibitz.createAgent
/ createAgentFromBridge / cooldown), and the kibitzer prefers it (falling back to
a tiny inline clone only if the page's vendored widget.js predates the SDK). The last
step to delete the fallback entirely is re-vendoring the current Kibitz bundle into
Whist, deferred only because today's bundle also carries undeployed gate/identity work,
so it ships on the next clean Kibitz release.
view; add frames later. Note this is the inbound
question: gating an agent's media perception is already built, §4.)
Resolved since v0: the capability / consent / audit layer is built and enforced (§4);
read-only is now an engine guarantee, with host consent, revoke, audit, egress disclosure,
and authority-distributed grants. Schema discovery is built (§3/§7): an app self-describes
its view shape over a schema engine channel, re-broadcast to late joiners, consumed via
getSchemas()/onSchema(). Feature negotiation rides the roster — every peer advertises
its engine version + features (e.g. schema.v1) so a newer build can see what an older one
supports (COMPATIBILITY.md). From the kibitzer refactor (§7a): cooldown
and getView() are in the SDK; agent-vs-agent filtering rides meta.role.