The Harness Is the Product: Testing iii Against My Multi-Agent App

Every serious agent project eventually becomes a harness project.

At the beginning, you think you’re building an agent. You wire up a model, give it a tool, stream some tokens, maybe add a second agent because the demo looks better when one model argues with another model. Then the surface area starts expanding.

Who owns the turn loop? Where do tool calls get approved? What happens if the browser disconnects halfway through a run? Where does session state live? Can I cap spend? Can I see which agent burned the tokens? Can I swap a provider without rewriting half the code? Can I replay the run after something breaks?

That surrounding runtime is the agentic harness — the layer I’ve argued is the moat, not the model.

Frameworks and libraries all answer the harness question differently. Some give you a thin loop and say “bring your own production system.” Some give you a graph abstraction. Some give you a full platform. I wanted to feel the trade-off directly, so I built one version myself and then started testing a very different substrate: iii.¹

This post isn’t a benchmark or a vendor dunk. It’s a build note from the messy middle: my initial AI SDK harness in multi-agents-team,² the live playground at mat.umai-tech.com,³ what iii changes, where the migration feels good, and where it adds operational weight. (Full disclosure: I liked the engine contract enough to start contributing to iii’s Go SDK along the way, so read this as an interested party’s build notes.)

The short version

My in-app harness is the right shape for learning, demos, and fast iteration. iii gets interesting when the harness has to take on production jobs: durable execution, policy, budgets, server-side state, and observability. The trade-off is that you now operate a real engine, not just a Next.js app.

The harness problem

When people talk about agents, they often talk about models, prompts, tools, and memory. Those matter. But after the prototype stage, the painful questions usually aren’t “which model?” They’re:

Control flow: who decides the next step, and how do you stop loops?
Tool execution: which calls are allowed, denied, or human-approved?
Streaming: how does the UI see intermediate reasoning, tool calls, and final state?
State: what survives a refresh, a deploy, or a server restart?
Cost: where do token usage and budget limits actually get enforced?
Observability: can you inspect a run by session, message, agent, or tool call?
Provider boundaries: can you move from OpenAI to Anthropic, Mistral, or Fireworks without rewriting the app?

That list is the harness.

And once you see it, you start seeing harnesses everywhere. LangGraph is a harness with explicit graph control. AutoGen is a harness built around multi-agent conversations. Pi calls itself exactly what it is — a minimal agent harness you adapt to your workflows, not the other way around.⁴ OpenClaw is a harness for a personal assistant: one agent runtime wired into WhatsApp, Telegram, your inbox, and your calendar.⁵ The Vercel AI SDK gives you excellent model/tool/streaming primitives, but the harness around those primitives is still your job.⁶ Anthropic’s multi-agent research system is another harness shape: specialist agents, a lead agent, parallelism, and tool-heavy research coordination.⁷

I wanted a repo that made those trade-offs visible instead of theoretical.

What multi-agents-team is

MAT (short for multi-agents-team) is a Next.js playground for multi-agent coordination patterns.² The live version is at mat.umai-tech.com.³ The same user request can run through nine different architectures:

Nine architecture pages, one event contract

Each pattern has its own live architecture view on the playground. The point isn’t that one pattern wins; it’s that the same task can move through different coordination shapes and still stream back through the same UI.

v1 Orchestrated (central coordinator): plan → delegate → synthesize. A coordinator routes work to research, writer, and editor specialists.
v2 Choreographed (peer message bus): round-robin negotiation. Backend, frontend, and design peers coordinate through shared messages.
v3 Hierarchical (dynamic agent tree): lead → sub-agents → rollup. A lead spawns depth-capped sub-agents and synthesizes their results.
v4 Evaluator–Optimizer (critique loop): draft → score → revise. A generator improves a draft until a critic accepts the quality bar.
v5 Debate (adversarial panel): argue → rebut → judge. Opposing agents argue their case before a judge synthesizes the answer.
v6 Blackboard (shared workspace): select agent → write board. A controller chooses which specialist updates the shared board next.
v7 Market (auction board): post task → bid → award. Agents bid on work; the dispatcher awards tasks to the strongest fit.
v8 Self-Consistency (parallel sampling): sample N → select/merge. Several attempts run in parallel and a judge selects or merges the best.
v9 Swarm (shared scratchpad): many passes → convergence. Identical agents build on a shared scratchpad over capped rounds.

Why nine architectures?

Nine isn’t a magic number — it’s the smallest set that covers the coordination axes I cared about: who decides (a central coordinator, negotiating peers, or a market), how state is shared (handoffs, a blackboard, a common scratchpad), and how quality emerges (critique loops, debate, parallel sampling). Each pattern stresses the harness differently — handoffs, shared boards, bids, parallel samples — but every run has to emit the same AgentEvent stream. If the harness can carry all nine without special cases, the event contract is probably right.

The project is intentionally hands-on. Every mode has its own streaming API route. Every run emits a shared AgentEvent stream: workflow starts, iteration starts, agent steps, tool calls, handoffs, blackboard updates, bids, traces, samples, and final completion. The UI renders that stream live so you can watch the system reason, not just read the final answer.
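To make the shared event contract concrete, here is a minimal sketch of what such a stream type could look like. The variant names and fields are my assumptions based on the list above, not the repo's actual AgentEvent definition, and only a handful of the event kinds are shown.

```typescript
// Hypothetical shape of a shared event contract; the real AgentEvent
// type in multi-agents-team may use different names and fields.
type AgentEvent =
  | { type: "workflow_start"; runId: string; mode: string }
  | { type: "agent_step"; runId: string; agent: string; text: string }
  | { type: "tool_call"; runId: string; agent: string; tool: string; args: unknown }
  | { type: "handoff"; runId: string; from: string; to: string }
  | { type: "complete"; runId: string; summary: string };

// One renderer for every architecture. The switch is exhaustive, so adding
// a new event variant fails type-checking until the UI handles it.
function describe(event: AgentEvent): string {
  switch (event.type) {
    case "workflow_start":
      return `run ${event.runId} started in ${event.mode} mode`;
    case "agent_step":
      return `${event.agent}: ${event.text}`;
    case "tool_call":
      return `${event.agent} called ${event.tool}`;
    case "handoff":
      return `${event.from} -> ${event.to}`;
    case "complete":
      return `done: ${event.summary}`;
    default: {
      const unreachable: never = event;
      return unreachable;
    }
  }
}

// Serialize an event as one Server-Sent Events frame.
function toSseFrame(event: AgentEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}
```

The design point is that all nine runners would produce values of one union type, so the UI and any alternative backend only need to agree on this one shape.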

The first harness is deliberately simple:

Next.js API routes own the request lifecycle.
The Vercel AI SDK handles model calls, tool definitions, and streaming-friendly primitives.
A per-run conversation object owns the message bus.
Provider selection is request-scoped with AsyncLocalStorage, so a user’s API key doesn’t leak across concurrent runs.
Chat history lives in the browser.
Cost is estimated and displayed, but not enforced.
Human input exists, but the early version is in-memory and single-process.
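The request-scoped provider selection is the least obvious item in that list, so here is a small sketch of the technique using Node's AsyncLocalStorage. The ProviderContext fields and helper names are hypothetical; the repo's own implementation may be organised differently.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical per-request provider settings; field names are illustrative.
interface ProviderContext {
  provider: string;
  apiKey: string;
}

const providerStore = new AsyncLocalStorage<ProviderContext>();

// Run one request's work with its own provider context.
function withProvider<T>(ctx: ProviderContext, fn: () => Promise<T>): Promise<T> {
  return providerStore.run(ctx, fn);
}

// Called deep inside an agent or tool, with no context argument threaded through.
function currentProvider(): ProviderContext {
  const ctx = providerStore.getStore();
  if (!ctx) throw new Error("No provider configured for this request");
  return ctx;
}

// Simulated model call that yields to the event loop before reading the context.
async function fakeModelCall(delayMs: number): Promise<string> {
  await new Promise((resolve) => setTimeout(resolve, delayMs));
  return currentProvider().apiKey;
}

// Two overlapping runs, each of which should only ever see its own key.
async function demo(): Promise<string[]> {
  return Promise.all([
    withProvider({ provider: "openai", apiKey: "key-a" }, () => fakeModelCall(20)),
    withProvider({ provider: "anthropic", apiKey: "key-b" }, () => fakeModelCall(5)),
  ]);
}
```

Because the store follows the async call chain rather than living in a module-level variable, the two runs in demo() interleave on the event loop without reading each other's key.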

The whole first harness lived inside one app

[Diagram: original harness architecture. Blocks include the browser chat UI at mat.umai-tech.com and browser localStorage history.]

Why this works: everything shares one process boundary, so iteration is fast and the source is easy to read. Why it bends: durability, policy, budget enforcement, and server-side state all become custom app code.

Browser chat UI (responsibility: user-facing control plane): The public app owns the mode selector, provider settings, local chat history, live timeline, tree visualizations, and final rich summaries.

This was the right first move. It kept the coordination logic close to the app. If I wanted to understand why a debate run behaved differently from a blackboard run, I could open the runner file and read it. No distributed system. No platform ceremony. No second runtime.

Could I have skipped the hand-rolling?

Probably. openharness is an open-source SDK that builds exactly this kind of harness on top of the Vercel AI SDK — stateless agents, composable middleware, tool-permission callbacks, context compaction, subagent hierarchies.⁸ If I’d wanted a harness off the shelf, that’s where I’d have started. But the point of this repo was the opposite: build each harness job by hand first, so I’d understand what I was buying when I later reached for a substrate like iii.

But as soon as I started asking production questions, the harness got louder.

The limits of the hand-rolled harness

The in-app harness is honest about what it is: a readable local runtime for exploring coordination patterns. That’s a feature. It’s also the boundary.

For a public demo, browser-local history is fine. For a customer workflow, server-side sessions matter. For a toy web search tool, “the user clicked run” is enough permission. For a production agent touching internal systems, every tool call needs policy. For a blog-worthy demo, estimated cost is enough. For a real product, budget caps need to stop the run, not just decorate the UI.
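The difference between displaying cost and enforcing it fits in a few lines. This is a sketch under assumptions: the class names, the per-million-token prices, and the loop shape are all illustrative and are not taken from the repo or from iii.

```typescript
// Hypothetical budget guard; names and prices are illustrative only.
class BudgetExceededError extends Error {
  constructor(spentUsd: number, capUsd: number) {
    super(`Budget exceeded: spent $${spentUsd.toFixed(4)} of $${capUsd.toFixed(4)}`);
    this.name = "BudgetExceededError";
  }
}

class BudgetGuard {
  private spentUsd = 0;
  private readonly capUsd: number;

  constructor(capUsd: number) {
    this.capUsd = capUsd;
  }

  // Record the cost of one model call and throw once the cap is crossed,
  // so the turn loop stops instead of only updating a label in the UI.
  record(inputTokens: number, outputTokens: number, usdPerMTokIn: number, usdPerMTokOut: number): number {
    this.spentUsd += (inputTokens * usdPerMTokIn + outputTokens * usdPerMTokOut) / 1_000_000;
    if (this.spentUsd > this.capUsd) {
      throw new BudgetExceededError(this.spentUsd, this.capUsd);
    }
    return this.spentUsd;
  }
}

// A turn loop that checks the budget after every step.
function runSteps(guard: BudgetGuard, steps: Array<{ input: number; output: number }>): number {
  let completed = 0;
  for (const step of steps) {
    guard.record(step.input, step.output, 3, 15); // assumed prices per million tokens
    completed += 1;
  }
  return completed;
}
```

An estimate-only harness would compute the same number and keep going; the enforcement version turns the cap into control flow.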

The uncomfortable part is that none of these problems are exotic. They’re the normal jobs around an agent run:

One app, nine jobs

The original harness worked because all responsibilities were close to the product. That same closeness is also where production pressure starts.

01 Turn loop: Hand-rolled runners inside the Next.js app. Pressure point: readable, but the route owns too much lifecycle.
02 Events to UI: SSE from the route. Pressure point: simple until runs need to outlive the request.
03 Tools: AI SDK tool definitions with Zod schemas. Pressure point: good primitive; policy is still separate work.
04 Policy: None by default. Pressure point: fine for demos, risky for internal tools.
05 Budget: Estimated cost only. Pressure point: useful signal, not an enforcement layer.
06 Sessions: Browser localStorage. Pressure point: convenient, not durable server-side memory.
07 Human approval: In-memory request registry. Pressure point: easy to prototype, fragile across restarts.
08 Observability: Logs and UI events. Pressure point: great for demos; thin for incident debugging.
09 Deployment: One Next.js app. Pressure point: the strongest feature of the original harness.

That last row is the reason I still like the original harness. One app is hard to beat. But the middle rows are exactly why I became curious about iii.

What iii is

iii (source-available on GitHub⁹) is a Rust engine built by the team behind the Motia backend framework.¹⁰ The repo sits at roughly 17k stars, the engine is licensed under ELv2 with Apache-2.0 SDKs,¹¹ and the docs describe a system built from three primitives (workers, functions, and triggers) with state, streams, and queues shipped as built-in workers.¹² The important mental model comes from the founder’s own harness write-up:

Pick the workers. Write the missing ones. Compose. The harness is the composition.

— Mike Piccolo • How to Build Your Own Agent Harness, iii (May 2026)

In iii, workers connect to an engine over WebSocket and register functions and triggers. The engine routes calls, manages worker connections, exposes modules for HTTP/state/queues/streams, and gives the system a common substrate.¹³

Three primitives, one interface

The docs frame it the way Unix framed processes and React framed components: every category of software gets a single interface. Workers host the work, functions are the work, triggers start it, and the engine routes between them.

Worker (hosts the work): Anything that opens a WebSocket to the engine and registers functions and triggers. A queue, a scheduler, an HTTP edge, a browser tab, an agent, a sandbox: each is a worker. Workers run anywhere (a laptop, a container, a browser tab, a microVM) in any language that can hold a WebSocket. Examples: mat-iii-worker, iii-state, iii-http.

Function (is the work): A named handler inside a worker: payload in, result out. Function IDs follow a service::name convention, so they stay stable across worker restarts and language boundaries. Any worker can call any function through the engine, which is what makes a harness job swappable: register the same ID, replace the layer. Examples: math::add, policy::check_permissions.

Trigger (starts the work): What causes a function to run. A trigger has a type (HTTP, cron, queue message, state change, or another function calling trigger), a configuration, and the function ID it invokes. The same function can sit behind several triggers: an HTTP route for the app, a cron for batch runs, a queue for durability. Example: POST /run → turn::run.

Engine (routes between them): The engine is the coordinator: it accepts worker connections over WebSocket, keeps a live registry of every registered function and trigger, and routes each invocation to whichever worker currently provides that function ID. Nothing talks to anything directly; every arrow in the topology is a worker-to-engine connection.

That changes the harness shape. Instead of one application process owning every concern, the harness can be decomposed:

A provider worker streams model output.
A policy worker checks permissions.
A budget worker records and enforces spend.
A state layer owns sessions.
A queue keeps long-running work alive after the initial request.
A trace layer gives you OpenTelemetry spans across the run.
A custom worker can replace one layer by registering the same function IDs.
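The last item in that list is the one that makes the decomposition pay off, so here is a toy illustration of it. To be clear, this is not the iii SDK or its API: it is an in-process stand-in with hypothetical names (ToyEngine, register, trigger) that only shows why stable function IDs make a layer replaceable.

```typescript
// Toy in-process stand-in for an engine's function registry. This is NOT the
// iii SDK; it only illustrates "register the same ID, replace the layer".
type Handler = (payload: unknown) => Promise<unknown>;

class ToyEngine {
  private functions = new Map<string, Handler>();

  // A later registration under the same ID replaces the earlier one.
  register(functionId: string, handler: Handler): void {
    this.functions.set(functionId, handler);
  }

  // Callers only know the function ID, never which worker provides it.
  async trigger(functionId: string, payload: unknown): Promise<unknown> {
    const handler = this.functions.get(functionId);
    if (!handler) throw new Error(`No worker provides ${functionId}`);
    return handler(payload);
  }
}

const engine = new ToyEngine();

// Default policy worker: allow everything.
engine.register("policy::check_permissions", async () => ({ allowed: true }));

// Stricter replacement registered under the same function ID.
engine.register("policy::check_permissions", async (payload) => {
  const { tool } = payload as { tool: string };
  return { allowed: tool === "web_search" };
});
```

Code that calls engine.trigger("policy::check_permissions", ...) does not change when the policy layer is swapped, which is the property the list above depends on.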

The iii blog post frames this as “build your own agent harness.”¹⁴ I think the stronger interpretation is: don’t confuse the harness with the application framework.

The harness is the set of jobs that make an agent safe, durable, observable, and operable. iii’s bet is that those jobs should be workers on a shared bus, not hidden inside one framework object.

The primitive shift

The powerful idea isn’t “use this one agent framework.” It’s seeing queues, streams, state, model providers, policy gates, approval surfaces, browser tabs, and business services as the same kind of thing: workers. If a capability is missing, add or replace a worker. Don’t keep changing the core engine.

Contributor note

This is the Go SDK work I mentioned up top: it sits alongside the Node, Python, and Rust SDKs.¹⁵ It’s also why the primitive argument feels concrete to me — once the engine contract is stable, another language becomes an SDK layer, not a rewrite of the engine.

Testing iii inside my repo

The useful thing about my multi-agents-team setup is that the app now has two backend paths.

The default path is still the in-app harness. The chat UI sends a request to the relevant /api/agents-v* route, the route validates credentials, then the local runner streams events back to the browser.

The iii path keeps the same front-end contract but changes where the turn runs:

The UI sends the same mode, model, provider, message, history, and conversation ID.
The API route sees backend: "iii".
The app posts the turn to the iii engine’s HTTP trigger.
A worker runs the existing agent loop and emits the same AgentEvent shape.
The Next.js app forwards those events back to the same chat UI.

That last point matters. I didn’t want to rewrite the product around iii. I wanted to test whether the harness layer could move while the visible app stayed stable.
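The routing decision in those steps can be sketched roughly as follows. The environment variable names in the comments come from the deploy notes later in this post; everything else (dispatchTurn, the payload shape, the injected fetch) is a hypothetical simplification of the real adapter.

```typescript
// Hypothetical adapter sketch; function names and payload shape are assumptions.
interface TurnRequest {
  backend: "in-app" | "iii";
  mode: string;
  message: string;
  conversationId: string;
}

interface EngineConfig {
  baseUrl?: string; // e.g. from III_ENGINE_HTTP_URL
  token?: string; // e.g. from III_ENGINE_TOKEN
  runPath?: string; // e.g. from III_RUN_PATH
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

async function dispatchTurn(
  req: TurnRequest,
  cfg: EngineConfig,
  fetchImpl: FetchLike,
  runLocally: (req: TurnRequest) => Promise<string>,
): Promise<string> {
  // Default path: the in-app harness handles the turn.
  if (req.backend !== "iii") return runLocally(req);

  // Fail closed: without an engine URL and token, refuse rather than guess.
  if (!cfg.baseUrl || !cfg.token) {
    throw new Error("iii backend selected but no engine is configured; switch back to the in-app harness");
  }

  const url = new URL(cfg.runPath ?? "/run", cfg.baseUrl).toString();
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${cfg.token}` },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`iii engine returned ${res.status}`);
  return res.text();
}
```

Injecting fetchImpl and runLocally keeps the sketch testable without a running engine; in a real route handler the global fetch and the existing runner would take their place.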

Figure: Multi-Agent Team chat UI showing the execution backend selector with iii engine selected. The visible product stays the same: the user selects a model, a coordination pattern, and an execution backend. The iii engine path is a harness swap behind the chat UI, not a second product.

The adapter does three pragmatic things:

It fails closed when no iii engine is configured, with an actionable error and a path back to the in-app harness.
It accepts live SSE from the engine when the worker streams events over the HTTP response.
It also supports a queued path where a run gets a runId, keeps executing on the engine, and the app polls events until completion.
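The queued path is essentially a cursor-based polling loop. The sketch below assumes events carry a monotonically increasing seq and that a "complete" event ends the run; both are my assumptions for illustration, as are the pollRun and FetchEvents names.

```typescript
// Hypothetical polling loop for the queued path. The event shape, cursor
// parameter, and fetchEvents signature are assumptions for illustration.
interface PolledEvent {
  seq: number;
  type: string;
}

type FetchEvents = (runId: string, afterSeq: number) => Promise<PolledEvent[]>;

async function pollRun(
  runId: string,
  fetchEvents: FetchEvents,
  onEvent: (event: PolledEvent) => void,
  opts: { intervalMs: number; timeoutMs: number },
): Promise<void> {
  const deadline = Date.now() + opts.timeoutMs;
  let cursor = 0;

  while (Date.now() < deadline) {
    // Ask only for events newer than the last one already forwarded.
    const batch = await fetchEvents(runId, cursor);
    for (const event of batch) {
      onEvent(event);
      cursor = Math.max(cursor, event.seq);
      if (event.type === "complete") return;
    }
    await new Promise((resolve) => setTimeout(resolve, opts.intervalMs));
  }

  throw new Error(`Run ${runId} did not complete within ${opts.timeoutMs} ms`);
}
```

The cursor means a dropped poll costs latency rather than events, and the deadline plays the role that a turn timeout plays on the live SSE path.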

There is also a small policy bridge. My tools are still defined inline in the agent factories. I didn’t want to couple those tools directly to the iii SDK. Instead, the iii worker installs a request-scoped policy checker; tools call a local policyCheck() function; the in-app backend has no checker, so it behaves as before. On the iii path, the check can forward to policy::check_permissions.

iii’s own reference harness makes the same call at a different layer: it ships as roughly fourteen workers in a separate workers repo, and its policy gate fails closed — a five-second timeout counts as a deny.¹⁶

That’s the migration pattern I like: keep the app’s internal abstractions stable, move one harness concern at a time.
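A rough sketch of that policy bridge follows. The policyCheck name is from the text above; the checker signature, withPolicy, and denyAfter are hypothetical, and the five-second default simply mirrors the fail-closed timeout described for the reference harness.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical policy bridge; only the fail-closed behaviour mirrors the text.
type PolicyChecker = (tool: string, args: unknown) => Promise<boolean>;

const checkerStore = new AsyncLocalStorage<PolicyChecker>();

// Installed by the worker around a run; the in-app backend never calls this.
function withPolicy<T>(checker: PolicyChecker, fn: () => Promise<T>): Promise<T> {
  return checkerStore.run(checker, fn);
}

// A promise that resolves to "deny" after ms, plus a way to cancel the timer.
function denyAfter(ms: number): { promise: Promise<boolean>; cancel: () => void } {
  let cancel = () => {};
  const promise = new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), ms);
    cancel = () => clearTimeout(timer);
  });
  return { promise, cancel };
}

// Called by tools. No checker installed means the in-app path: allow as before.
// With a checker installed, an error or a timeout counts as a deny.
async function policyCheck(tool: string, args: unknown, timeoutMs = 5000): Promise<boolean> {
  const checker = checkerStore.getStore();
  if (!checker) return true;

  const timeout = denyAfter(timeoutMs);
  try {
    return await Promise.race([checker(tool, args), timeout.promise]);
  } catch {
    return false;
  } finally {
    timeout.cancel();
  }
}
```

On the iii path the installed checker would forward to policy::check_permissions; the tools themselves never import anything iii-specific.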

How I deployed the iii harness on Fly.io

The Next.js app can live happily on Vercel because the in-app harness is just route handlers. The iii backend is different. It’s a long-running engine plus a worker that needs to stay connected to that engine over WebSocket, so I deployed that part separately on Fly.io.¹⁷

The high-level shape is:

Vercel: serves the public mat.umai-tech.com Next.js app.
Fly.io: runs mat-iii-engine, a Docker image bundling the iii engine and my iii-worker.
Inside the Fly machine: the worker connects to the local engine bus on ws://localhost:49134.
Public edge: the engine exposes the worker’s POST /run and GET /health triggers over HTTPS on port 3111.
Shared secret: Vercel sends III_ENGINE_TOKEN as a bearer token; the Fly worker checks the same secret before accepting a run.
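For the shared-secret item, a worker-side check could look something like this. It is a sketch, not the worker's actual code: isAuthorized is a hypothetical helper, and hashing both values before comparing is my choice so that timingSafeEqual always receives equal-length buffers.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical bearer-token check for the worker's POST /run handler.
function isAuthorized(authorizationHeader: string | undefined, expectedToken: string): boolean {
  // Fail closed when the secret is unset on the worker side.
  if (!expectedToken) return false;

  const match = /^Bearer (.+)$/.exec(authorizationHeader ?? "");
  if (!match) return false;

  // Hash both sides so the buffers have equal length, then compare in
  // constant time to avoid leaking how many leading characters matched.
  const presented = createHash("sha256").update(match[1] ?? "").digest();
  const expected = createHash("sha256").update(expectedToken).digest();
  return timingSafeEqual(presented, expected);
}
```

In this setup the expected value would come from the III_ENGINE_TOKEN secret on the Fly side, with the Next.js app sending the same value in the Authorization header.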

The important deploy detail: I run the Fly app as one machine. In this version, queued run events and stream state live in that engine process. If Fly sends POST /run to one machine and GET /events to another, the app can lose the run’s event stream. So the Fly config keeps min_machines_running = 1, disables auto-stop, and the deploy should stay single-instance until the state/stream layer is externalized.

The setup is intentionally boring:

mat iii backend · deploy

# Create the Fly app without deploying yet.
fly launch --no-deploy

# Set the shared secret used by the Next.js app and the iii worker.
fly secrets set III_ENGINE_TOKEN=$(openssl rand -hex 32)

# Keep this single-machine for now: queued events/state are process-local.
fly deploy --ha=false

# If Fly ever scales it up, force it back to one machine.
fly scale count 1

# Next.js app side: only used when the iii backend is selected.
III_ENGINE_HTTP_URL=https://mat-iii-engine.fly.dev
III_ENGINE_TOKEN=<same-secret>
III_RUN_PATH=/run
III_TURN_TIMEOUT_MS=240000
NEXT_PUBLIC_III_BACKEND_ENABLED=true

# Terminal 1: run the iii engine locally.
iii --use-default-config

# Terminal 2: run the MAT worker against the local engine.
pnpm worker

# Next.js env for local testing.
III_ENGINE_HTTP_URL=http://localhost:3111
III_ENGINE_TOKEN=<same-secret>
NEXT_PUBLIC_III_BACKEND_ENABLED=true

That gives me the split I wanted: Vercel keeps serving the product UI, Fly runs the long-lived harness substrate, and the two meet at one explicit seam: POST /run.

This is a phased migration

I wouldn’t describe this as “the app is now an iii app.” The honest description is better: the app has a working in-app harness and an iii backend path for testing production harness concerns without throwing away the original runners.

My harness vs iii

Here is the comparison I wish I had when I started.

Thin app harness vs worker substrate

Same product surface, different runtime responsibilities. The useful comparison isn’t who’s “better”; it’s where each harness shape puts the work. The AI SDK harness is the current thin app runtime; the iii substrate is an engine plus composable workers.

Best use case
AI SDK harness: Learning, demos, local dev, fast iteration.
iii substrate: Production-ish runs that need durable state, policy, budgets, and traces.

Mental model
AI SDK harness: One app owns the route, loop, tools, events, and UI.
iii substrate: Engine plus workers; each harness concern can be a worker.

Streaming
AI SDK harness: Simple SSE from the Next.js route.
iii substrate: Worker emits events through engine/channel/stream paths; app forwards them.

Tool policy
AI SDK harness: Whatever I code inline. Initially: nothing.
iii substrate: A policy function can gate calls and fail closed.

Budget control
AI SDK harness: Estimate and display cost.
iii substrate: Budget can become a runtime enforcement concern.

Sessions
AI SDK harness: Browser-local history.
iii substrate: Server-side state keyed by conversation/session.

Human approval
AI SDK harness: Easy to prototype, fragile in one process.
iii substrate: Can be backed by durable state/queue mechanics.

Observability
AI SDK harness: UI timeline plus logs.
iii substrate: Cross-worker traces are part of the substrate.

Provider support
AI SDK harness: Great through the AI SDK, request-scoped in my app.
iii substrate: Provider workers make model access another replaceable layer.

Deployment
AI SDK harness: One Next.js app.
iii substrate: Next.js app plus engine/workers. More power, more ops.

The original harness wins on approachability. It’s easy to clone, run, read, and modify. It’s the harness I’d show someone who wants to learn how multi-agent patterns actually behave.

iii wins on separation of concerns. Policy shouldn’t be sprinkled through random tool functions. Budget enforcement shouldn’t be a UI label. Long-running agent state shouldn’t depend on a browser tab or a serverless request staying alive. If those are your problems, iii’s worker model starts to make sense.

But this isn’t free.

What feels better with iii

The biggest improvement is that the harness jobs become explicit.

In my first version, “the harness” was spread across routes, runners, event types, provider utilities, local storage, and some UI assumptions. It worked because the app was small and the author was me. That’s not a production architecture principle.

With iii, a policy layer isn’t an afterthought — it’s a function call. A budget layer isn’t a comment — it can be a worker. The approval path doesn’t have to be a React component — it can be another system that writes a decision into state. A Slack approval surface and a console approval surface can both call the same underlying function.

That composability is the interesting part. The claim isn’t “iii has the perfect default harness.” It’s that a harness is easier to evolve when its responsibilities are connected by stable function IDs instead of fused into one framework.

It also makes partial migration realistic. I can keep the Vercel AI SDK where it is useful. I can keep my nine runners. I can keep my event schema. Then I can move durability, policy, budget, and trace concerns into iii one by one.

That’s a good engineering shape — and it’s the old, proven one. Small primitives with one interface compose; big frameworks with many interfaces accrete. When the harness is a set of functions on a bus, “we need budget enforcement now” is a worker you write this week, not a feature request on someone else’s roadmap. That’s the real power of primitives: composability turns the harness from a product you adopt into an architecture you evolve.

What still hurts

The cost is operational complexity.

The in-app harness has one deployable. The iii path introduces another runtime: engine configuration, worker registration, auth token, health checks, event transport, timeout behavior, and failure modes between the app and engine.

That’s not a reason to reject it. It’s a reason to be honest about when you need it.

If I’m teaching multi-agent patterns, I don’t want a distributed system in the way. If I’m running a weekend experiment, I don’t need a budget worker. If the tool is just web search against a user’s own API key, a full policy engine may be more architecture than product.

But if the agent can call internal tools, mutate business state, spend real money, or run for minutes in the background, the “simple” harness starts hiding risk. At that point, the second runtime may be cheaper than the pile of bespoke code you were about to write badly.

The trade-off

A thin harness optimizes for momentum. A thick harness optimizes for control. The mistake is pretending one of those is always the grown-up answer.

Where iii sits in the harness landscape

iii isn’t entering an empty field — every serious agent stack already answers the harness question somewhere. LangGraph checkpoints graph state inside your process.¹⁸ Microsoft folded AutoGen and Semantic Kernel into one agent framework with an actor-style runtime.¹⁹ OpenAI’s Agents SDK keeps the loop thin and in-process,²⁰ while Temporal ships an official integration that turns that same loop into durable workflow code.²¹ Anthropic packages its own coding harness — loop, tools, permissions, subagents — as the Claude Agent SDK.²² Pi stakes out the same thin pole even more aggressively: a deliberately minimal harness you adapt to your workflows instead of the other way around.⁴ OpenHands wraps each coding session in a sandboxed per-session runtime.²³ Restate, Inngest, and Hatchet sell durable execution that agent builders increasingly borrow.²⁴²⁵²⁶

The differences aren’t feature lists. They’re answers to one question: which layer of your system owns the harness jobs?

Five answers to where the harness jobs live

In-process framework
Examples: LangGraph, Microsoft Agent Framework, OpenAI Agents SDK.
Where the jobs live: Inside your application process. Graphs, checkpointers, guardrails, and sessions are objects the framework owns.
Trade-off: Fast to adopt and easy to reason about locally; durability and policy are bounded by the framework and the process running it.

Thin SDK
Examples: Claude Agent SDK, openharness, Pi.
Where the jobs live: In a packaged loop you embed as a library: tools, permissions, hooks, compaction, and subagents.
Trade-off: Excellent loop ergonomics; durability, budgets, and multi-service orchestration are explicitly out of scope.

Durable-execution substrate
Examples: Temporal, Restate, Inngest, Hatchet.
Where the jobs live: In a workflow engine that makes the run itself durable: retries, queues, state, and replay come from the substrate.
Trade-off: The strongest durability story; agent-specific jobs like policy, approvals, and budgets are still yours to build on top.

Session platform
Examples: OpenHands.
Where the jobs live: In a per-session runtime: sandbox, event stream, and tools bundled around each agent session.
Trade-off: Batteries included for coding agents; less a substrate you compose, more a product you adopt.

Worker bus
Examples: iii (tested in this post).
Where the jobs live: In independent workers on a shared engine bus. Each harness job (policy, budget, state, provider, traces) is a function you can swap without touching the rest.
Trade-off: The most composable shape, and the one this post tests; the price is that you operate the engine and pay the early-adopter integration tax.

iii’s nearest neighbors are the durable-execution substrates. The difference is the unit of composition: Temporal and friends give you durable workflows you write; iii gives you a live function registry on a bus, where each harness job is a worker you can replace — including with one written by another team, in another language. That’s a stronger composition story and a younger ecosystem, and both halves of that sentence matter.

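What I would want from iii next

The primitives are strong; the defaults can get sharper

A first-class run contract
Browser-friendly event reads
Worker lifecycle guardrails
Production HA recipes
Governance starter packs
Replay and evals as primitives

The slider, not the religion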

At one end: a thin loop around an LLM call. Great for exploration. Low ceremony. Easy to debug. Almost no production guarantees.

At the other end: durable queues, server-side state, policy gates, human approvals, spend caps, worker-level traces, and provider abstraction. More moving parts. More control.

My in-app harness lives closer to the thin end. iii lets the system move toward the thick end without forcing every concern into the same application process.

That’s the part I find compelling. Not “replace your app with iii.” Not “frameworks are dead.” More practical:

Keep the agent logic where it is easiest to reason about. Move the cross-cutting harness jobs to a substrate when those jobs become real.

For multi-agents-team, that means the original harness remains the default. It’s the best path for trying the nine patterns and understanding coordination trade-offs. The iii backend is the experiment for what happens when those same patterns need production properties.

And honestly, that’s how most AI systems should evolve. Start with the thinnest harness that teaches you something. Don’t cargo-cult a platform on day one. But when the run starts touching real systems, stop pretending a route handler and a clever prompt are enough.

References

1. iii, iii homepage.
2. Marcus Elwin, multi-agents-team GitHub repo.
3. Marcus Elwin, multi-agents-team live demo.
4. Earendil, Pi: “Pi is a minimal agent harness. Adapt Pi to your workflows, not the other way around.”
5. OpenClaw, openclaw.ai: open-source personal AI assistant that runs on your own machine and connects to WhatsApp, Telegram, and other chat apps.
6. Vercel, AI SDK documentation.
7. Anthropic Engineering, “How we built our multi-agent research system”.
8. Max Gfeller, openharness: a composable agent-harness SDK built on the Vercel AI SDK.
9. iii HQ, iii GitHub repo.
10. iii documentation, “Migrating from Motia”.
11. iii HQ, iii GitHub repo: the engine is licensed under the Elastic License 2.0; the SDKs, CLI, and console are Apache-2.0.
12. iii, documentation.
13. iii documentation, “Engine” and “Channels”.
14. Mike Piccolo, “How to build your own agent harness”.
15. iii HQ, Go SDK package.
16. iii HQ, workers repo, including the reference harness workers.
17. Fly.io, documentation.
18. LangChain, LangGraph.
19. Microsoft, Agent Framework GitHub repo.
20. OpenAI, Agents SDK documentation.
21. Temporal, documentation.
22. Anthropic Engineering, “Building agents with the Claude Agent SDK”.
23. All Hands AI, OpenHands GitHub repo.
24. Restate, restate.dev.
25. Inngest, AgentKit documentation.
26. Hatchet, hatchet.run.
