Building Aura: An Agentic LLM Gateway in Rust
I'm a Python person. I built a Rust LLM gateway with Claude Code. Here's why I added another box to the LLM-gateway shelf, what makes Aura agentic-native instead of just OpenAI-compatible, how to use it from your existing framework, and what I learned about vibe engineering production infrastructure as a non-Rust developer.
Why I Built Another LLM Gateway
There are already good LLM gateways. LiteLLM is the one most teams reach for first. Portkey has guardrails and a polished managed plane. Helicone leads on observability. OpenRouter gives you 290+ models behind one OpenAI-compatible URL with passthrough billing [1]. Vercel AI Gateway ships model fallbacks and Fluid-compute observability for the Next.js crowd. Bifrost from Maxim AI claims 11 µs of overhead at 5k RPS — about 50× faster than LiteLLM [2]. Opper AI is the EU-sovereign managed gateway with 300+ models and LLM-as-a-judge scoring built in [3]. The shelf is full.
So why did I spend the last few months building Aura — a Rust LLM gateway I’m about to open-source (github.com/UmaiTech/aura-llm-gateway) — when I could have just used one of those?
Three reasons, and they’re all related:
- Most gateways treat agents as an afterthought. They speak chat/completions. They normalize to OpenAI’s older schema. Tool calls, reasoning items, and requires_action flags get flattened or dropped. The thing I actually need to build — agentic workflows that yield control back and forth between a model and my application — fits awkwardly on top.
- I wanted the latency budget of Rust, not a “fast enough” Python proxy. When the gateway sits on the request path of every LLM call your product makes, the overhead it adds is overhead your users feel.
- I’m a Python person. I wanted to know whether vibe engineering with Claude Code could carry me into a language I’d never shipped production code in — and what would break.
This post is the story of what Aura is, what it does that the existing gateways don’t, how to use it from your existing framework, and what I learned building it. It’s a companion piece to the talk I gave at Agentic Dev Days Stockholm 2026 — Vibe engineering taught me Rust.
Slides from the talk
I gave this as an 18-slide talk at Agentic Dev Days Stockholm 2026. If you want the deck — Vibe engineering taught me Rust: building Aura, an agentic LLM gateway, with Claude Code — grab the PDF here: Download the slides (PDF, ~5 MB). Most of this post mirrors the talk, but in more depth.
The Core Thesis
Aura is a Rust gateway built around the Open Responses API — the emerging open standard for agentic LLM workflows [4] — not a translation layer that flattens agents into chat completions. The model is one provider. The gateway is the runtime that makes agentic loops legible across providers.
The Existing Gateway Landscape
Before I justify why a new gateway, let’s be honest about what’s already there. I did this research myself before writing a line of Rust, and it shaped what Aura ended up being.
LLM gateways in 2026 — what they're good at, what's missing for agents
Survey of the gateways teams actually reach for when building agentic systems
LiteLLM was my baseline. It’s the gateway I’d actually used in prior work, and it does the job. But when I started prototyping agents seriously — tools yielding back to my code, multi-step reasoning, partial responses with requires_action: true — I kept writing the same translation layer twice: once to talk to LiteLLM, once to interpret what came back.
That translation layer is the Open Responses API. So I cut out the middle hop.
What the Open Responses API Actually Changes
The Open Responses API is a specification published by the openresponses.org working group, with adoption from Hugging Face, OpenRouter, Vercel, LM Studio, Ollama, and vLLM [4]. It’s based on OpenAI’s Responses API, but reframed as an open standard so the agentic primitives — items, tool calls, reasoning, status lifecycle — work the same way across providers.
The core primitives:
- Items — atomic units of a conversation. Not just messages, but function_call, function_call_output, reasoning, web_search, and so on. An agent’s “turn” is a list of items, not a string.
- Response — a container with a status lifecycle: in_progress → completed | failed | incomplete.
- Streaming as semantic events — not raw token deltas. You get response.output_item.added, response.output_text.delta, response.completed. Your UI knows what each event means.
- previous_response_id for conversation threading without resending history.
- Externally vs internally hosted tools — function-calling vs provider-hosted tools (file search, web search) are first-class concepts, not glued on.
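To make "streaming as semantic events" concrete, here is a minimal client-side sketch that groups Server-Sent Events lines into events and dispatches on their type. It assumes each event arrives as an "event:" line plus a JSON "data:" line, and that text deltas carry a "delta" field; the event names come from the list above, but those framing and field details are assumptions, not quoted from the spec.

```python
import json
from typing import Iterable, Iterator, List, Optional


def _to_event(name: Optional[str], data_lines: List[str]) -> dict:
    payload = json.loads("\n".join(data_lines))
    payload.setdefault("type", name)  # fall back to the "event:" line if the JSON has no type
    return payload


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Group raw SSE lines into events; a blank line ends each event."""
    name: Optional[str] = None
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                yield _to_event(name, data_lines)
            name, data_lines = None, []
        elif line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    if data_lines:  # trailing event with no final blank line
        yield _to_event(name, data_lines)


def render(events: Iterable[dict]) -> str:
    """Dispatch on the semantic event type instead of guessing from raw tokens."""
    text = []
    for ev in events:
        kind = ev.get("type")
        if kind == "response.output_text.delta":
            text.append(ev["delta"])  # assumed field holding the incremental text
        elif kind == "response.output_item.added":
            pass  # e.g. open a new message bubble or tool-call card in the UI
        elif kind == "response.completed":
            break  # terminal event: stop reading
    return "".join(text)


stream = [
    "event: response.output_item.added",
    'data: {"type": "response.output_item.added"}',
    "",
    "event: response.output_text.delta",
    'data: {"type": "response.output_text.delta", "delta": "Stock"}',
    "",
    "event: response.output_text.delta",
    'data: {"type": "response.output_text.delta", "delta": "holm"}',
    "",
    "event: response.completed",
    'data: {"type": "response.completed"}',
    "",
]
print(render(parse_sse(stream)))  # Stockholm
```

The point of the design: the UI branches on what an event means, so a tool-call card and a text delta never get confused, whichever provider produced them.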
If you’re building agents, this is the shape you want. The Chat Completions shape was designed for one-shot Q&A; the Responses shape was designed for loops.
Aura speaks this natively. One endpoint — POST /v1/responses — and every provider goes through the same item-based contract.
Meet Aura
Aura is a 4-crate Rust workspace. It’s small enough to read in an afternoon and structured so each piece has a single responsibility — and the stack underneath is deliberately boring.
Aura — 4-crate Cargo workspace
Each crate has one responsibility, with explicit dependency direction
What’s on the box
- Open Responses API — agentic-native spec, not “OpenAI-compatible adjacent.”
- 7 providers — OpenAI, Anthropic, Google Gemini, Mistral, Ollama, AWS Bedrock, HuggingFace.
- Agentic metadata on every response — provider, latency_ms, has_tool_calls, tools_used, requires_action, request_id. Same shape, every provider.
- Cost tracking — per-request USD on every response, with input/output/cached/reasoning broken out. Surfaced to users, not just logged.
- Multi-tenant model — org → team → project → end-user hierarchy. Per-user cost allocation lives in the data model, not in your billing service.
- AES-256-GCM envelope encryption for provider credentials at rest. A bring-your-own-key gateway shouldn’t leak keys.
- Rate limiting + response cache — Redis-backed token bucket + SHA256-keyed TTL cache. Optional, but if you’re routing real traffic you want both.
- Prompt compression — TOON, AISP, YAML-min, JSON-min. 40–60% token savings on uniform arrays via TOON, which adds up faster than people expect.
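Aura's rate limiter is Redis-backed; the sketch below shows only the token-bucket algorithm itself, in process, so the refill logic is easy to see. The capacity and refill rate are hypothetical values, and the shared-store part (keeping the token count and last-refill time per key somewhere like Redis) is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Token bucket: bursts up to `capacity`, refilled at `rate` tokens per second."""
    capacity: float
    rate: float
    tokens: float = field(init=False)
    last: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full so an idle client can burst

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=3, rate=1.0)  # hypothetical limits: burst of 3, 1 req/s
print([bucket.allow(now=0.0) for _ in range(4)])  # [True, True, True, False]
print(bucket.allow(now=1.0))  # True: one token refilled after 1 s
```

Timestamps are passed in explicitly so the behavior is deterministic; a gateway would pass a monotonic clock reading and keep one bucket per API key.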
Architecture at a glance
Two things in Aura’s architecture matter more than they look:
Provider resolution from the model name, no routing config. You send "model": "claude-sonnet-4-5" and Aura figures out it’s Anthropic. You send "gpt-5" and it goes to OpenAI. The provider: field comes back enriched on the response. You don’t maintain a YAML mapping; the registry owns that knowledge.
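Here is a minimal sketch of what "the registry owns that knowledge" can look like: a prefix table consulted with longest-match-wins. The table entries are hypothetical illustrations (Aura's real registry is Rust code in the repo and also handles pinned versions); the gpt-oss entry only exists to show why longest match matters.

```python
from typing import List, Tuple

# Hypothetical prefix table, illustrative only; not Aura's actual registry contents.
PREFIXES: List[Tuple[str, str]] = [
    ("claude-", "anthropic"),
    ("gpt-", "openai"),
    ("gpt-oss", "ollama"),  # hypothetical: open-weight models routed to a local runtime
    ("gemini-", "google"),
    ("mistral-", "mistral"),
]


def resolve_provider(model: str) -> str:
    """Pick the provider whose prefix is the longest match for the model name."""
    matches = [(prefix, provider) for prefix, provider in PREFIXES if model.startswith(prefix)]
    if not matches:
        raise ValueError(f"unknown model: {model!r}")
    return max(matches, key=lambda m: len(m[0]))[1]


print(resolve_provider("claude-sonnet-4-5"))  # anthropic
print(resolve_provider("gpt-5"))              # openai
```

Longest match lets a more specific entry override a general one without depending on list order.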
Response enrichment is non-negotiable. Every response — every provider — gets cost_usd, latency_ms, and an agentic{} block bolted on before it leaves the gateway. That’s the contract Aura adds on top of the provider’s native response. It’s also what makes the gateway useful rather than just a router.
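A sketch of that enrichment step, assuming the provider response has already been translated into Open Responses items. The field names are the ones listed in this post, but where each one lives is an assumption: cost_usd goes under usage here to match the Python SDK example later on. Treating any item type ending in "_call" as a tool invocation, and only function_call as "requires action", are also my assumptions.

```python
import time
import uuid
from typing import Any, Callable, Dict


def enrich(provider: str,
           call: Callable[[], Dict[str, Any]],
           price_usd: Callable[[Dict[str, Any]], float]) -> Dict[str, Any]:
    """Wrap one provider call and attach the same metadata block regardless of provider."""
    start = time.perf_counter()
    response = call()  # provider response, already mapped to Open Responses items
    latency_ms = round((time.perf_counter() - start) * 1000)

    items = response.get("output", [])
    # Assumption: tool invocations are items whose type ends in "_call"
    # (function_call for app-executed tools, e.g. web_search_call for hosted ones).
    calls = [i for i in items if i.get("type", "").endswith("_call")]
    function_calls = [i for i in calls if i["type"] == "function_call"]

    usage = response.setdefault("usage", {})
    usage["cost_usd"] = price_usd(usage)
    response["latency_ms"] = latency_ms
    response["agentic"] = {
        "provider": provider,
        "has_tool_calls": bool(calls),
        "tools_used": sorted({i.get("name", i["type"]) for i in calls}),
        # Only app-executed function calls hand control back to the caller.
        "requires_action": bool(function_calls),
        "request_id": str(uuid.uuid4()),
    }
    return response
```

Because the block is computed from items rather than from each vendor's raw payload, the same code covers every provider adapter.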
Supported Models
Aura ships with seven providers as of v0.9. Anthropic and Gemini have full streaming and tool-call support; the other five sit behind the same Provider trait.
Model families supported in v0.9
Resolve by family or by pinned version — Aura's registry handles both
Adding a new provider means implementing the Provider trait in one file. See crates/aura-core/src/provider/ for the full list.
A Live-Demo Request
Here’s what an Aura request looks like end to end. One endpoint, three providers behind it, full agentic metadata on the way back.
```shell
curl -X POST https://api.aura-llm.dev/v1/responses \
  -H "Authorization: Bearer $AURA_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "input": [{
      "role": "user",
      "content": "Search the web for the current price of GPT-5 input tokens."
    }],
    "tools": [{ "type": "web_search" }],
    "user": "customer_123"
  }'
```

Swap "claude-sonnet-4-5" for "gpt-5" and the shape of the response is identical. That’s the actual value proposition. Not “one URL”; one shape.
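For checking exactly what goes over the wire, here is the same request built with nothing but Python's standard library. The endpoint, headers, payload, and AURA_KEY variable mirror the curl call above; the request is only constructed, and the commented-out send assumes network access and a valid key. Reading an "agentic" key from the body is an assumption about the response layout.

```python
import json
import os
import urllib.request

payload = {
    "model": "claude-sonnet-4-5",  # swap for "gpt-5": same request, same response shape
    "input": [{"role": "user",
               "content": "Search the web for the current price of GPT-5 input tokens."}],
    "tools": [{"type": "web_search"}],
    "user": "customer_123",
}

req = urllib.request.Request(
    "https://api.aura-llm.dev/v1/responses",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('AURA_KEY', '')}",
        "Content-Type": "application/json",  # JSON body, so say so explicitly
    },
    method="POST",
)

# Uncomment to actually send it (needs network access and a valid key):
# with urllib.request.urlopen(req, timeout=60) as resp:
#     body = json.load(resp)
#     print(body.get("agentic"))
```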
Using Aura From Your Existing Framework
Aura is just an HTTP server speaking the Open Responses API. Locally it lives on localhost:8080; in production it’s https://api.aura-llm.dev. You can hit it with anything that speaks HTTP — or, if you don’t want to write any client code yet, with no client at all via playground.aura-llm.dev. The shortcuts:
Python — the official SDK
The first-party SDK ships as aura-llm on PyPI. Install with uv or pip:
```shell
uv add aura-llm
# or: pip install aura-llm
```

Then the same code shape works against any of the seven providers — sync, streaming, or async:
```python
from aura import AuraClient

client = AuraClient(
    api_key="your-api-key",               # or AURA_API_KEY env var
    base_url="https://api.aura-llm.dev",  # or http://localhost:8080 locally
)

# Non-streaming: any model in the registry
response = client.responses.create(
    model="claude-sonnet-4-5",
    input="What's the capital of Sweden?",
)
print(response.output_text)
print(f"cost: ${response.usage.cost_usd}")
```

OpenAI SDK — point and shoot
If you’re already on the OpenAI Python or TypeScript SDK, point base_url at Aura and most calls Just Work:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aura-llm.dev/v1",  # or http://localhost:8080/v1 locally
    api_key="your-aura-key",
)
response = client.responses.create(
    model="claude-sonnet-4-5",  # any Aura-supported model
    input="Hello from the OpenAI SDK",
)
print(response.output_text)
```

Aura’s /v1/responses accepts the OpenAI Responses payload shape, so the SDK doesn’t know it’s not talking to OpenAI. You still get Aura’s enrichment back — cost_usd, agentic{}, latency_ms — they just ride along on the response.
Agent frameworks
Same trick works for the major agent frameworks because they layer on top of the OpenAI / Responses shape:
- LangChain / LangGraph — set the openai_api_base of your ChatOpenAI to https://api.aura-llm.dev/v1 and use any of Aura’s seven providers as if it were an OpenAI model.
- LlamaIndex — pass api_base="https://api.aura-llm.dev/v1" to OpenAI(...) in llama_index.llms.openai.
- Mastra / LangGraph.js — same shape on the TypeScript side. Set the base URL and ship.
- DSPy — dspy.OpenAI(api_base="https://api.aura-llm.dev/v1", model="claude-sonnet-4-5") and you’ve got an Anthropic-backed Module without changing a line of your DSPy code.
The TypeScript SDK (@umai/aura) is in progress; until it lands, the OpenAI SDK is the path of least resistance on the Node side. The full integration docs live at docs.aura-llm.dev (landing soon).
Deploying Aura
Aura is a single static binary: no Python virtualenv, no node_modules, no runtime. There are four deployment shapes, ranked roughly by how serious the deployment is. The fastest loop runs straight from source:
```shell
# Fastest loop: clone, set env vars, run
git clone https://github.com/UmaiTech/aura-llm-gateway
cd aura-llm-gateway

# Required: at least one provider key + a master key for credential encryption
export AURA_MASTER_KEY=$(openssl rand -hex 32)  # 32 random bytes = a 256-bit key, hex-encoded
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

cargo run -p aura-proxy
# Listening on 0.0.0.0:8080
```

A few notes on the shapes:
- cargo run needs no Postgres or Redis for the happy path — both are optional. Skip them and Aura runs in stateless mode (no request logs, no API-key auth, no rate limits — fine for local agent experimentation).
- docker compose is what you want for full middleware locally — auth, rate limits, response cache, request logs all wired up.
- docker build is the production shape. The Dockerfile is a multi-stage build using cargo-chef for layer caching, producing a minimal Debian-slim image typically under 80 MB.
- systemd if you’d rather skip Docker. cargo build --release gives you ./target/release/aura-proxy — drop it on a VM, point the unit file at it.
The Hosted Version — Aura on api.aura-llm.dev
Self-hosting isn’t for everyone. The same Aura binary you can git clone runs as a hosted gateway at api.aura-llm.dev — same Open Responses contract, same agentic metadata, no Postgres or Redis to operate yourself.
When the hosted version makes sense:
- You’re prototyping and don’t want to think about credential rotation, schema migrations, or rate-limit infra yet.
- You’re a small team where one less service to babysit is worth more than the bring-your-own-key cost.
- You want EU residency without standing up your own EU VMs — api.aura-llm.dev is hosted in Stockholm with EU-only request logs.
- You want to try Aura’s agentic shape against your existing LangChain/DSPy/Mastra code before committing to a self-host migration.
When self-hosting wins:
- You already operate Postgres and Redis and want zero new SaaS in the request path.
- You need on-prem or air-gapped deployment — pull aura-llm-gateway:0.9 into your private registry and run.
- You’re sensitive to per-request markup — the hosted version takes a small fee on top of pass-through provider cost; self-hosted is free.
Pricing — to be finalized
Hosted Aura pricing is being finalized. The plan, roughly: a free tier for development with rate-limited usage, and a pay-as-you-go tier with passthrough provider pricing plus a small per-request fee that funds the open-source work. No subscription minimum. Full breakdown will live at aura-llm.dev/pricing.
The zero-install path: if you just want to see Aura’s agentic shape against a real prompt, open playground.aura-llm.dev in a browser. It’s the same apps/chat React app that ships in the repo, pointed at the hosted gateway: free tier with a daily message cap, frontier models gated to beta, and a multi-provider model picker. No signup, no API key, no curl.
When you’re ready to wire it into your own code, onboarding is three steps:
- Sign up at aura-llm.dev.
- Grab an API key from the dashboard — scoped to an org, team, and project from the start.
- Point your existing client at https://api.aura-llm.dev/v1 and use the same model names. Cost, latency, and agentic metadata land in the dashboard with no extra wiring.
The self-hosted code is what powers api.aura-llm.dev. There’s no “hosted-only” feature flag, no proprietary fork — when v0.10 ships, the hosted gateway upgrades from the same MIT-licensed binary you’d run yourself. That’s the deal: the OSS is the product, the hosted version is the convenience.
Load Test — Aura vs the Competition
The “Rust gateway sits in the single digits of overhead” claim deserves more than just an assertion. Below is the harness I’m running — 1,000 requests per scenario, 1 to 5 tool calls per request, six gateways behind the same provider (Anthropic Sonnet 4.5) — to see how each one holds up as agentic loops get heavier.
Each scenario reports four metrics: gateway overhead, p50 latency, p99 latency, and sustained throughput.
Gateway load test: 1,000 requests, 1–5 tool calls (Aura vs LiteLLM, Portkey, Helicone, OpenRouter, Bifrost)

| Gateway | Gateway overhead (ms) | p50 latency (ms) | p99 latency (ms) | Sustained throughput (RPS) |
|---|---|---|---|---|
| Aura | 4 | 312 | 612 | 1,450 |
| Bifrost | 3 | 308 | 605 | 1,520 |
| Helicone | 6 | 318 | 622 | 1,380 |
| Portkey | 22 | 345 | 690 | 920 |
| OpenRouter | 30 | 360 | 720 | 840 |
| LiteLLM | 58 | 395 | 810 | 540 |

- Gateway overhead (lower is better): pure gateway-added latency, provider round-trip subtracted.
- p50 latency (lower is better): median end-to-end request latency.
- p99 latency (lower is better): tail latency, the slowest 1% of requests.
- Sustained throughput (higher is better): requests per second under steady load.
A few honest notes on this:
- Numbers are directional estimates, not measurements yet. The shape is what I expect from each gateway’s architecture: Python interpreter overhead for LiteLLM, hosted-edge overhead for Portkey and OpenRouter, compiled-language speed for Aura and Helicone (Rust) and Bifrost (Go). The harness that produces the real numbers now lives at scripts/bench/ in the gateway repo: uv run python harness.py --smoke to sanity-check, --full for the headline run.
- Same provider, same prompt, same model. The interesting variable is the gateway, not the LLM. All six gateways front Anthropic Sonnet 4.5; all five scenarios use the same input shape, with the tool-call count as the only knob.
- Throughput drops as tool calls grow for everyone — that’s the loop unrolling, not the gateway choking. Aura’s curve stays flatter because the per-request overhead is small enough to disappear into the LLM round-trip.
- Bifrost and Helicone sit in the same speed tier as Aura; in these estimates Bifrost is marginally ahead. Inside that compiled-language tier the differentiator isn’t µs; it’s the agentic API shape and the multi-tenant model, which the next section makes the case for.
I’ll update this section with real numbers once the harness finishes its first end-to-end run against v0.9. The harness is reproducible — clone the repo, fill in .env with your gateway keys, and uv run python harness.py --full --runs 3 writes the same results.json shape this chart consumes.
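For anyone reading the numbers critically, here is one way the four metrics can be derived from raw samples. This is a sketch under my own assumptions (nearest-rank percentiles, and overhead defined as the median latency through the gateway minus the median of direct-to-provider calls); the harness in scripts/bench/ may compute them differently.

```python
import math
from statistics import median
from typing import List


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def gateway_overhead_ms(through_gateway: List[float], direct: List[float]) -> float:
    """Gateway-added latency: median via the gateway minus median direct to the provider."""
    return median(through_gateway) - median(direct)


def throughput_rps(completed: int, wall_clock_s: float) -> float:
    """Sustained throughput: completed requests divided by wall-clock seconds."""
    return completed / wall_clock_s
```

Subtracting medians rather than means keeps one slow provider response from inflating the gateway's apparent overhead.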
What’s New on the Table
Now the question I keep getting asked: what does Aura add that the existing gateways don’t? Here’s the honest list, sorted by how confident I am about it:
1. Open Responses API as the front door, not a translation
Every other gateway I evaluated treats the OpenAI Chat Completions schema as the canonical shape and translates up to anything more agentic. Aura inverts that. The Open Responses spec is the wire format; provider adapters translate down into whatever each vendor’s native API wants. Tool calls, reasoning items, and requires_action aren’t enrichment — they’re load-bearing.
OpenRouter has started adopting Open Responses as a partner [4]. LiteLLM hasn’t. Bifrost is OpenAI-shaped [2]. Aura ships with it from day one.
2. Agentic metadata as part of the response contract
has_tool_calls, tools_used, requires_action, latency_ms, cost_usd, request_id — every provider, every response, same shape. This sounds boring until you’ve written the third version of “did this response actually call a tool?” in your application code.
LiteLLM logs this in its observability layer. Portkey surfaces it in the dashboard. Helicone shows it in analytics. Aura puts it in the response body where your agent loop can branch on it.
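Here is what "branch on it" can look like in an agent loop. This is a sketch: send stands in for whatever client call you use, the tool registry is hypothetical, and it assumes function_call items carry name, call_id, and JSON-encoded arguments as in the Responses shape. It resends the growing item list for simplicity; previous_response_id would avoid that.

```python
import json
from typing import Any, Callable, Dict, List

Response = Dict[str, Any]


def run_agent(send: Callable[[List[dict]], Response],
              tools: Dict[str, Callable[..., str]],
              user_input: str,
              max_turns: int = 10) -> Response:
    """Loop until the gateway says nothing is left for the application to do."""
    items: List[dict] = [{"role": "user", "content": user_input}]
    for _ in range(max_turns):
        response = send(items)
        if not response["agentic"]["requires_action"]:
            return response  # final answer
        for item in response["output"]:
            if item.get("type") == "function_call":
                result = tools[item["name"]](**json.loads(item["arguments"]))
                items.append(item)
                items.append({"type": "function_call_output",
                              "call_id": item["call_id"],
                              "output": result})
    raise RuntimeError("agent did not finish within max_turns")
```

The loop never inspects provider-specific payloads; one boolean from the response contract decides whether control comes back to your code.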
3. Cost as a product feature, not a billing concern
cost_usd arrives on every response. It’s not an admin-panel report you check at the end of the month. You can show it to end users, gate features on per-user budgets, and let PMs reason about unit economics without a separate telemetry pipeline.
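A sketch of how a per-request cost can be computed from the input/output/cached/reasoning breakdown, plus the kind of per-user budget check it enables. The prices are made up for illustration. It also assumes cached tokens are a subset of input tokens and that reasoning tokens are reported separately and billed at the output rate; some providers already fold reasoning into output_tokens, in which case adding them again would double-count.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict


@dataclass(frozen=True)
class Price:
    """USD per 1M tokens; illustrative values only, not any provider's real rates."""
    input: Decimal
    cached_input: Decimal
    output: Decimal


def cost_usd(usage: Dict[str, int], price: Price) -> Decimal:
    cached = usage.get("cached_tokens", 0)
    fresh_input = usage.get("input_tokens", 0) - cached  # cached tokens are part of input
    billed_output = usage.get("output_tokens", 0) + usage.get("reasoning_tokens", 0)
    total = (fresh_input * price.input
             + cached * price.cached_input
             + billed_output * price.output)
    return total / Decimal(1_000_000)


def within_budget(spent: Decimal, next_cost: Decimal, budget: Decimal) -> bool:
    """Gate a feature on a per-user budget before sending the next request."""
    return spent + next_cost <= budget
```

Decimal avoids the float drift that creeps in when you sum many fractions of a cent per user.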
This was the single biggest unlock from running early prototypes through Aura: cost stopped being “something to investigate later” and became part of the response.
4. Multi-tenant hierarchy at the data model layer
org → team → project → end-user is baked into the schema, not bolted on with API key prefixes. If you’re building a SaaS that resells LLM access, this matters more than it looks: per-user cost allocation, per-project rate limits, scoped API keys all fall out of the model rather than needing a separate billing layer. Bifrost has governance and SSO; what Aura adds is the user-level cost allocation primitive [2].
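An in-memory sketch of why per-user allocation "falls out of the model": if every request is tagged with its full org → team → project → end-user scope, spend at any level is just a sum over a prefix of that scope. The Scope type and ledger format here are hypothetical; in Aura this lives in the Postgres schema, not in Python.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List, NamedTuple, Tuple


class Scope(NamedTuple):
    org: str
    team: str
    project: str
    end_user: str


def roll_up(ledger: List[Tuple[Scope, Decimal]]) -> Dict[Tuple[str, ...], Decimal]:
    """Add each request's cost to every ancestor level of its scope."""
    totals: Dict[Tuple[str, ...], Decimal] = defaultdict(Decimal)
    for scope, cost in ledger:
        for depth in range(1, len(scope) + 1):
            totals[tuple(scope[:depth])] += cost
    return dict(totals)


u1 = Scope("acme", "search", "support-bot", "user_1")
u2 = Scope("acme", "search", "support-bot", "user_2")
totals = roll_up([(u1, Decimal("0.01")), (u2, Decimal("0.02")), (u1, Decimal("0.03"))])
print(totals[("acme",)])  # 0.06
```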
5. Rust-level latency overhead — honestly compared
The talk slide says “under 10ms overhead.” That number is for the gateway itself — middleware, routing, enrichment — not including the provider round-trip you’d pay anyway. A Python proxy will sit at 30–80ms of pure overhead on a hot path; a Rust gateway built on Axum + Tokio sits in the single digits. On agentic loops that fire 5–20 requests per turn, that compounds.
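The compounding is simple arithmetic, assuming the calls in a turn run sequentially (each step waits for the previous one). The 30 and 80 ms figures are the Python range quoted above; 5 ms is an assumed stand-in for "single digits".

```python
def added_latency_ms(requests_per_turn: int, overhead_ms: float) -> float:
    """Sequential agentic calls: each one pays the gateway overhead once."""
    return requests_per_turn * overhead_ms


OVERHEADS_MS = {"Python proxy, low end": 30.0,
                "Python proxy, high end": 80.0,
                "Rust gateway (assumed 5 ms)": 5.0}

for n in (5, 20):
    for label, overhead in OVERHEADS_MS.items():
        print(f"{n:>2} requests/turn, {label}: +{added_latency_ms(n, overhead):.0f} ms")
```

At 20 requests per turn, that is 1.6 s of pure gateway time at the Python high end versus 0.1 s at 5 ms.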
To be fair: Bifrost, which is written in Go, reports 11 µs of overhead at 5k RPS, about 50× faster than LiteLLM [2]. Helicone’s gateway is written in Rust and edge-optimized. Aura is in the compiled-language tier, not the Python tier, and that’s the tier that matters. Within that tier, the differentiator isn’t raw µs; it’s the agentic API shape and the multi-tenant model.
6. Prompt compression in the middleware stack
TOON, AISP, YAML-min and JSON-min compression are first-class middleware, not a side library. For uniform-array payloads — think enriched product catalogs going into an agent — TOON gives 40–60% token savings, which translates roughly 1:1 into cost savings on the input side.
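To show where the savings on uniform arrays come from, here is a simplified sketch of TOON's tabular idea: declare the field names once in a header, then emit one comma-separated row per item. It is not a spec-complete TOON encoder (no quoting, escaping, or nesting) and not Aura's middleware. Character counts are only a rough proxy for tokens, so this toy doesn't reproduce the 40–60% figure.

```python
import json
from typing import Any, Dict, List


def to_tabular(name: str, rows: List[Dict[str, Any]]) -> str:
    """Encode a uniform array of flat objects as a header plus one row per item."""
    if not rows:
        return f"{name}[0]:"
    fields = list(rows[0])
    if any(list(r) != fields for r in rows):
        raise ValueError("rows must share the same keys in the same order")
    lines = [f"{name}[{len(rows)}]{{{','.join(fields)}}}:"]
    for r in rows:
        # Non-strings use JSON spelling (true, null, 9.99); strings go in bare.
        cells = [v if isinstance(v, str) else json.dumps(v) for v in (r[f] for f in fields)]
        if any("," in c or "\n" in c for c in cells):
            raise ValueError("this sketch does not implement quoting")
        lines.append("  " + ",".join(cells))
    return "\n".join(lines)


products = [{"id": 1, "name": "Widget", "price": 9.99},
            {"id": 2, "name": "Gadget", "price": 19.5}]
tabular = to_tabular("products", products)
compact_json = json.dumps({"products": products}, separators=(",", ":"))
print(tabular)
print(len(tabular), "chars vs", len(compact_json), "chars of compact JSON")
```

The saving grows with row count, because JSON repeats every key in every object while the tabular form names them once.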
I haven’t seen another gateway expose compression strategies as a configurable middleware step.
What Aura doesn’t do (yet)
I’d rather be honest than oversell. Aura is at v0.9 — pre-1.0, public APIs and schema can still shift between minor versions. The Python SDK ships; the TypeScript SDK doesn’t. The admin React dashboard landed in apps/admin/, but the browser playground at playground.aura-llm.dev is the more polished front door today. Guardrails and PII redaction — Portkey’s bread and butter — aren’t there yet. EU sovereignty as a first-class concept (Opper’s pitch [3]) isn’t there. If you need 1000+ provider breadth tomorrow, Bifrost still wins on coverage. If you need 300+ models with built-in LLM-as-a-judge, Opper is the managed answer.
What you do get today: a small, fast, agentic-native gateway with clean types and a roadmap I can actually keep up with as a one-person open-source project.
Building Rust as a Python Person
The other half of this story isn’t about the gateway. It’s about how it got built.
I’m primarily a Python and TypeScript developer. I’d dabbled with Rust before — read the book, wrote a CLI, abandoned it. The reason I shipped Aura in Rust is that I built it with Claude Code as a coding partner and applied what I’ve started calling vibe engineering: the discipline behind vibe coding. Same tools — same Claude — different rigor.
The split, roughly:
- Vibe coding is prompt-and-hope. One giant PR. No plan. Skip the tests. Trust the AI. Great for throwaway demos.
- Vibe engineering is PRD first, prompt second. Bite-sized commits. Architecture diagrams. Verify, don’t trust. Tests as a contract. Same tools, different discipline.
For Aura, vibe engineering meant: I wrote a PRD per crate before I wrote a prompt. I drew the request flow in Mermaid before I let Claude touch a file. Every PR was bite-sized — feat: add routing, test: cover fallback, docs: update PRD — not one giant “make me a gateway” mega-commit. I used Claude Code for the implementation, but I was the architect.
Rust, without being a Rust dev
- Compiler errors are a teacher, not a wall. With Claude Code reading the errors and explaining them in context, the borrow checker became the world's most patient tutor.
- Types catch provider schema drift early. LLM APIs drift. Strong types in aura-types caught two real schema changes during the build before they hit users.
- Single static binary, no runtime deps. cargo build --release produces one file. No virtualenv, no node_modules.
- Tokio handles SSE streaming cleanly. Server-sent events are an awkward middle ground in many runtimes. In Tokio they're idiomatic.
- Claude Code is fluent in idiomatic Rust, not just compilable Rust: Arc<T> vs Arc<Mutex<T>>, tokio::spawn for fire-and-forget, the right error-type idiom per crate.
- Refactors feel safe. When the compiler signs off, you can ship it. I'd never had that confidence in Python.
- Borrow checker + async = pain spikes. The interaction between lifetimes and async blocks is where Rust still hurts. Claude Code helped, but we both bounced off the same error for an hour several times.
- Lifetimes took weeks to internalize. The book teaches you the syntax. Shipping production code teaches you what they actually mean.
- The crate ecosystem is thinner for AI work. Python has every LLM library a week after a paper drops. Rust has some of them, eventually.
- No pip install shortcuts. Adding a dependency in Rust is a real decision: features, version pins, compile time. Healthier long-term, slower in the moment.
- Compile times break flow state. Incremental builds on this workspace hit 30+ seconds. You learn to batch.
- Debugging async traits is a trip. Errors from async_trait macro expansions can be 40 lines of generics referencing types you didn't write.
Claude Code didn't replace Rust knowledge. It made Rust knowledge reachable.
The honest takeaway
That’s the actual unlock — not “AI writes your code”, but “AI lets you ship in the right tool for the job, even when it isn’t the tool you already know.” A Python person shipped a production-grade Rust gateway. The discipline scaled; the language barrier didn’t.
Six Things I’d Tell Past-Me
If you’re considering building infrastructure like this — gateway, proxy, router, whatever sits in front of the LLM — these are the lessons that would have saved me weeks.
What I learned the hard way
What’s Next for Aura
The roadmap, in rough order of when I expect to land things:
- Multi-node load balancer — distribute across Aura instances, not just across providers within one instance.
- Automated pricing scraper (cron) — provider price changes shouldn’t require a hand-written PR. A scheduled job watches the pricing pages and opens a config-update PR.
- Webhooks & async callbacks — for long-running agentic tasks where the response doesn’t come back on the original HTTP connection.
- Admin dashboard (React UI) — for key management, org/team setup, cost reports.
- TypeScript SDK — the missing half of the SDK story.
- More providers via the trait system — Provider is a trait. New providers should be a single-file addition.
Aura lives in six places, depending on what you need:
- aura-llm.dev — landing page, overview, quickstart
- docs.aura-llm.dev — full documentation, SDK reference, integration guides
- playground.aura-llm.dev — browser chat playground, free-tier with a daily message cap, no install
- api.aura-llm.dev — the hosted gateway endpoint
- pypi.org/project/aura-llm — the official Python SDK
- github.com/UmaiTech/aura-llm-gateway — the repo, MIT-licensed
Issues, PRs, and “you should have looked at X” emails are all welcome. If you’re building agentic workflows and the gateway shape doesn’t fit, tell me — that’s the kind of feedback the v0.x series is for.
The Punchline
There are good LLM gateways. Aura isn’t trying to replace them. It’s trying to be the one I’d actually want to use to build agents: agentic-native API, cost on every response, types that catch provider drift, and a latency budget small enough to disappear. Built in Rust by a Python person, with Claude Code as a coding partner — proof that vibe engineering reaches further than the language you already know.
References
[1] TrueFoundry, Best LLM Gateways in 2026 (LiteLLM, Portkey, Helicone overview); Helicone, Top 5 LLM Gateways; OpenRouter pricing & routing docs; OpenRouter docs, Provider Routing. 2026.
[2] Bifrost (maximhq/bifrost) on GitHub; Maxim AI, Bifrost: A Drop-in LLM Proxy, 50× Faster Than LiteLLM. 11 µs overhead at 5k RPS, Apache 2.0, written in Go. 2026.
[3] Opper AI, LLM Gateway & AI Gateway: 300+ Models, One API; Opper AI, LLM Router Latency Benchmark 2026; Opper AI partnership with Infercom for sovereign LLM inference, May 2026.
[4] Open Responses Specification, openresponses.org; Hugging Face, Open Responses: What you need to know; InfoQ, Open Responses Specification Enables Unified Agentic LLM Workflows, February 2026.