Design System

The tokens and components behind umai-tech.com — Japanese craftsmanship (うまい) meets a terminal-flavored engineering aesthetic.

Color tokens

Defined in tailwind.config.mjs and used as text-umai-accent, bg-umai-gray-900, etc.

umai-primary #0a0a0a

Dark surfaces, primary text, dark-mode background

umai-secondary #f5f5f5

Light surfaces, inverted text on dark

umai-accent #8B5CF6

Brand purple — links, highlights, eyebrows, CTAs

umai-accent-dark #7C3AED

Hover states, gradients

umai-accent-darker #6D28D9

Pressed states, deep gradients

umai-accent-light #A78BFA

Dark-mode accents, soft highlights

Gray scale umai-gray-*

50 #FAFAFA
100 #F4F4F5
200 #E4E4E7
300 #D4D4D8
400 #A1A1AA
500 #71717A
600 #52525B
700 #3F3F46
800 #27272A
900 #18181B
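
As a rough sketch, the color tokens above could be declared as follows. The hex values come straight from the list; the nesting under a `umai` key and the names `umaiColors` and `config` are assumptions for illustration, not the site's actual tailwind.config.mjs. It is written as TypeScript so it can be type-checked; in the real config the object would be the default export.

```typescript
// Hypothetical layout of the color tokens. Only the hex values are from the source.
const umaiColors = {
  primary: "#0a0a0a",
  secondary: "#f5f5f5",
  accent: {
    DEFAULT: "#8B5CF6", // becomes text-umai-accent, bg-umai-accent, ...
    dark: "#7C3AED", // becomes bg-umai-accent-dark
    darker: "#6D28D9",
    light: "#A78BFA",
  },
  gray: {
    50: "#FAFAFA",
    100: "#F4F4F5",
    200: "#E4E4E7",
    300: "#D4D4D8",
    400: "#A1A1AA",
    500: "#71717A",
    600: "#52525B",
    700: "#3F3F46",
    800: "#27272A",
    900: "#18181B", // becomes bg-umai-gray-900
  },
};

// Tailwind flattens nested color objects with "-", so `umai.gray.900`
// yields utilities such as bg-umai-gray-900.
const config = {
  theme: { extend: { colors: { umai: umaiColors } } },
};
```

Nesting the shades under `accent` and `gray` keeps the class names in the form the page already uses (text-umai-accent, bg-umai-gray-900).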

Typography

Inter font-sans

The harness is the product. Build agents that ship.

Primary — headings, body, UI

JetBrains Mono font-mono

./marcus-cv --display --interactive

Code, terminals, eyebrows, data

Noto Sans JP font-japanese

うまい — skillful, delicious

Japanese brand accents
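
The three font utilities above map naturally onto Tailwind's `fontFamily` theme key. The sketch below is an assumption about how that mapping might look: the family names are from the list, while the fallback stacks and the names `umaiFonts` and `fontConfig` are hypothetical.

```typescript
// Hypothetical font tokens. Fallback stacks are illustrative, not from the source.
const umaiFonts = {
  sans: ["Inter", "system-ui", "sans-serif"], // font-sans
  mono: ["JetBrains Mono", "ui-monospace", "monospace"], // font-mono
  japanese: ["Noto Sans JP", "sans-serif"], // font-japanese
};

// Each key under fontFamily produces a font-<key> utility class.
const fontConfig = {
  theme: { extend: { fontFamily: umaiFonts } },
};
```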

Core patterns

The recurring shapes across pages and blog components.

eyebrow label

Hover-lift card

Rounded-2xl, subtle border, lifts on hover with an accent gradient bar that fades in along the top edge.

buttons

chip / badge

Pills for navigation-level actions, accent rectangles for in-content actions, mono-tracking chips for metadata.

Callout

Boxed emphasis in blog posts. Each type has a job: insight for thesis-level statements, success for principles and playbooks, warning for risks, info for context, and note for cross-references.

type="info"

One strong sentence the reader's eye should land on.

type="warning"

One strong sentence the reader's eye should land on.

type="error"

One strong sentence the reader's eye should land on.

type="success"

One strong sentence the reader's eye should land on.

type="note"

One strong sentence the reader's eye should land on.

type="insight"

One strong sentence the reader's eye should land on.
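
One way to keep the callout types honest is a small typed list with a guard. This is a sketch, not the Callout component's real prop definition: the six type names are the ones rendered above, the purposes are the five the description states, and `CALLOUT_TYPES`, `calloutPurpose` and `isCalloutType` are hypothetical names.

```typescript
// The six type values shown in the gallery above.
const CALLOUT_TYPES = ["info", "warning", "error", "success", "note", "insight"] as const;
type CalloutType = (typeof CALLOUT_TYPES)[number];

// Purposes as stated in the description. "error" has no stated purpose,
// so it is deliberately left out rather than guessed.
const calloutPurpose: Partial<Record<CalloutType, string>> = {
  insight: "thesis-level statements",
  success: "principles and playbooks",
  warning: "risks",
  info: "context",
  note: "cross-references",
};

// Runtime guard for values coming from untyped content.
function isCalloutType(value: string): value is CalloutType {
  return (CALLOUT_TYPES as readonly string[]).indexOf(value) !== -1;
}
```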

Citation

Attributed quotes. pullquote is reserved for a post's single marquee quote.

The model is the rented car — the harness is the road network.
— Marcus Elwin The Moat Is the Harness, Not the Model
A supporting quote uses the quieter quote treatment.
— Marcus Elwin umai-tech.com
Less prominent attributions use the reference treatment.
— Marcus Elwin umai-tech.com
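
The "single marquee quote" rule is easy to check mechanically. The sketch below assumes the three treatments are exposed as variant names `pullquote`, `quote` and `reference`; those identifiers and the helper names are hypothetical, not the Citation component's confirmed API.

```typescript
// Assumed variant names, mirroring the three treatments described above.
type CitationVariant = "pullquote" | "quote" | "reference";

// Count how many citations in one post use the marquee treatment.
function countPullquotes(variants: CitationVariant[]): number {
  return variants.filter((v) => v === "pullquote").length;
}

// A post respects the rule when it has at most one pullquote.
function respectsPullquoteRule(variants: CitationVariant[]): boolean {
  return countPullquotes(variants) <= 1;
}
```

A check like this could run over each post's citations at build time and fail the build when a second pullquote sneaks in.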

Code terminal

Dark terminal panel with mac dots, tab strip, and Shiki highlighting (GitHub Dark) — also how you consume the tokens.

umai-tech · tokens
<!-- Accent text + eyebrow pattern -->
<p class="text-xs font-bold uppercase tracking-[0.18em] text-umai-accent">
  eyebrow label
</p>

<!-- Card surface, light + dark -->
<div class="rounded-2xl border border-umai-gray-200 bg-white
            dark:border-umai-gray-700 dark:bg-umai-gray-800/80">
  ...
</div>

Component inventory

Every component in src/components/, scanned at build time and grouped by where it's actually used — this list can't go stale.

82 components · 76 in production · 19.7k lines of component code · 14 in the live gallery below
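
The grouping used in the inventory can be sketched as a pure function over usage data. The rules below are inferred from the groups shown on this page (site pages first, then components used by more than one post, then single-post components, then unused); the real build script may decide differently, and every name here is hypothetical.

```typescript
// Where a component is imported from. Both lists are assumed inputs from a build-time scan.
interface ComponentUsage {
  name: string;
  pages: string[]; // site pages or layouts, e.g. "index", "Layout"
  posts: string[]; // blog post slugs
}

// Inferred grouping rules; an assumption, not the site's actual logic.
function groupFor(c: ComponentUsage): string {
  if (c.pages.length > 0) return "Site & pages";
  if (c.posts.length > 1) return "Shared blog components";
  if (c.posts.length === 1) return `Post ${c.posts[0]}`;
  return "Retired / unused";
}

// Bucket component names by group, preserving input order.
function buildInventory(components: ComponentUsage[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const c of components) {
    const key = groupFor(c);
    const names = groups.get(key) ?? [];
    names.push(c.name);
    groups.set(key, names);
  }
  return groups;
}
```

Because the groups are derived from actual imports rather than a hand-maintained list, a component that loses its last usage simply falls through to the retired bucket on the next build.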

Site & pages ×14

AuthorCard 148 loc
[...slug]
Features 90 loc
index
Footer 157 loc
Layout
Header 366 loc
Layout
Hero 63 loc
index
InteractiveCV 1028 loc
about cv
PageViews 108 loc
[...slug] blog
PostReactions 241 loc
[...slug]
ProductCard 129 loc
products
ProjectCard 145 loc
projects
ReadingTime 25 loc
[...slug] blog
RecentPosts 125 loc
[...slug] blog
TableOfContents 220 loc
[...slug]
TeamSection 572 loc
about

Shared blog components ×4

Callout 161 loc
agentic-commerce-the-dawn-of… building-aura-an-agentic-llm… +9
Citation 151 loc
agentic-commerce-the-dawn-of… i-built-a-multi-agent-harnes… +2
CodeTerminal 208 loc
building-aura-an-agentic-llm… i-built-a-multi-agent-harnes…
PrettyTable 213 loc
agentic-commerce-the-dawn-of… lego-pieces-of-agentic-comme…

Post agentic-commerce-the-dawn-of-a-new-ai-driven-era-for-ecommerce ×12

CapabilityCard 59 loc
agentic-commerce-the-dawn-of…
ChallengeCategories 203 loc
agentic-commerce-the-dawn-of…
CompetitiveLandscape 738 loc
agentic-commerce-the-dawn-of…
ConvergenceForces 122 loc
agentic-commerce-the-dawn-of…
CustomIcon 75 loc
agentic-commerce-the-dawn-of…
FuturePredictions 280 loc
agentic-commerce-the-dawn-of…
InvestmentDashboard 192 loc
agentic-commerce-the-dawn-of…
MaturityLevels 256 loc
agentic-commerce-the-dawn-of…
PrettyList 52 loc
agentic-commerce-the-dawn-of…
ResearchInsights 57 loc
agentic-commerce-the-dawn-of…
Timeline 472 loc
agentic-commerce-the-dawn-of…
UpdateTracker 239 loc
agentic-commerce-the-dawn-of…

Post building-aura-an-agentic-llm-gateway-in-rust ×8

AuraArchitectureDiagram 278 loc
building-aura-an-agentic-llm…
CrateWorkspace 279 loc
building-aura-an-agentic-llm…
GatewayLandscapeMatrix 97 loc
building-aura-an-agentic-llm…
LessonsGrid 39 loc
building-aura-an-agentic-llm…
LoadTestChart 267 loc
building-aura-an-agentic-llm…
SupportedModelsGrid 70 loc
building-aura-an-agentic-llm…
WhatWorkedWhatHurt 69 loc
building-aura-an-agentic-llm…
ZoomableFrame 450 loc
building-aura-an-agentic-llm…

Post from-tweets-to-carts-twitter-ai-ecommerce ×4

DomainComparison 158 loc
from-tweets-to-carts-twitter…
PipelineOverview 188 loc
from-tweets-to-carts-twitter…
PipelineStage 273 loc
from-tweets-to-carts-twitter…
StrategyTable 176 loc
from-tweets-to-carts-twitter…

Post i-built-a-multi-agent-harness-then-tested-iii ×7

HarnessComparisonCards 112 loc
i-built-a-multi-agent-harnes…
HarnessImprovementCards 98 loc
i-built-a-multi-agent-harnes…
HarnessJobStack 92 loc
i-built-a-multi-agent-harnes…
HarnessLandscapeGrid 104 loc
i-built-a-multi-agent-harnes…
IiiPrimitivesGrid 93 loc
i-built-a-multi-agent-harnes…
InAppHarnessArchitecture 311 loc
i-built-a-multi-agent-harnes…
MultiAgentPatternGrid 110 loc
i-built-a-multi-agent-harnes…

Post lego-pieces-of-agentic-commerce-the-necessary-layering-of-multiple-protocols ×2

ImplementationExample 173 loc
lego-pieces-of-agentic-comme…
TextWallComparison 527 loc
lego-pieces-of-agentic-comme…

Post taste-still-matters-in-ai-software-engineering- ×10

CaseStudyComparison 347 loc
taste-still-matters-in-ai-so…
CodeTasteComparison 412 loc
taste-still-matters-in-ai-so…
MarketShareChart 664 loc
taste-still-matters-in-ai-so…
PlatformGrid 247 loc
taste-still-matters-in-ai-so…
ReviewNote 195 loc
taste-still-matters-in-ai-so…
SkillsTransition 216 loc
taste-still-matters-in-ai-so…
SoftwareEvolution 255 loc
taste-still-matters-in-ai-so…
TasteDevelopmentPath 213 loc
taste-still-matters-in-ai-so…
TastePillars 231 loc
taste-still-matters-in-ai-so…
WaveCard 276 loc
taste-still-matters-in-ai-so…

Post the-moat-is-the-harness-not-the-model ×9

FlywheelDiagram 493 loc
the-moat-is-the-harness-not-…
FoundationModelMoatCards 123 loc
the-moat-is-the-harness-not-…
HarnessDiagram 614 loc
the-moat-is-the-harness-not-…
MoatRadar 553 loc
the-moat-is-the-harness-not-…
MoatVersusTable 185 loc
the-moat-is-the-harness-not-…
ModelCommoditizationChart 416 loc
the-moat-is-the-harness-not-…
PlaybookCards 640 loc
the-moat-is-the-harness-not-…
SimplifiedAgentDiagram 139 loc
the-moat-is-the-harness-not-…
VerticalizationHeatmap 524 loc
the-moat-is-the-harness-not-…

Post the-third-path-player-coach-at-scale ×6

CareerFilter 94 loc
the-third-path-player-coach-…
CareerPathEvolution 187 loc
the-third-path-player-coach-…
LeverageFormula 154 loc
the-third-path-player-coach-…
ProfileComparison 335 loc
the-third-path-player-coach-…
SkillsRadar 362 loc
the-third-path-player-coach-…
ThreePathsComparison 249 loc
the-third-path-player-coach-…

Retired / unused ×6

CardCarousel 191 loc
CollapsibleSection 120 loc
Mermaid 53 loc
MoatComparison 155 loc
ServiceCard 87 loc
TrustMetrics 95 loc

Live gallery

Every self-contained component, rendered live — expand to inspect. Prop-driven components (MoatRadar, FlywheelDiagram, Timeline, …) need post data, so they live in the inventory above and render in their posts.

<SimplifiedAgentDiagram />

What do we mean by an agent + harness?

Agent +=
  • Instructions
  • Model
  • Tools
  • Memory

A simplified view

"Agentic Harness"

Everything around the model — the instructions it follows, the tools it can call, and the memory it keeps between turns.

<HarnessDiagram />
7 · Output on success
On failure → loop back to Cognitive Engine

Click any component to explore how it creates defensibility

<ModelCommoditizationChart />

The Model Commoditization Curve

Frontier model pricing collapse: November 2022 to April 2026

[Chart: Price Index by Release Date (Nov 2022 - Apr 2026), with Major Model Releases marked]

95.2% Price Drop · 40x/yr Cost Reduction Rate · 41 months (Nov '22 to Apr '26)

Chart shows output token pricing trends for frontier LLMs from November 2022 to April 2026.

Models tracked: OpenAI (GPT-3.5, GPT-4, GPT-4 Turbo, GPT-4o, GPT-5.2, GPT-5.5), Anthropic (Claude 2, Claude 3 Opus, Claude 3.5 Sonnet, Claude Opus 4, Claude Opus 4.7), Google (Gemini 1.0, Gemini 2.0 Pro, Gemini 3.1 Pro), and DeepSeek (R1).

Price index methodology: Normalized to GPT-4 March 2023 pricing as baseline (index = 100). Each data point represents the average output token price for comparable frontier-tier models at time of release, adjusted for relative performance on standard benchmarks (MMLU, HumanEval, GSM8K, SWE-bench).
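
The normalization step of that methodology can be sketched in a few lines. This is a simplified illustration only: it rescales a price against a baseline so the baseline equals 100, and it leaves out the benchmark-based performance adjustment, which the source does not specify. The function names and the sample prices in the usage note are hypothetical.

```typescript
// Rescale a price so that the baseline price maps to index 100.
// The performance adjustment from the methodology is NOT modeled here.
function priceIndex(priceUsd: number, baselinePriceUsd: number): number {
  if (baselinePriceUsd <= 0) throw new RangeError("baseline price must be positive");
  return (100 * priceUsd) / baselinePriceUsd;
}

// Percentage drop relative to the baseline for a given index value.
function percentDrop(index: number): number {
  return 100 - index;
}
```

For example, a hypothetical price of 3 against a baseline of 60 gives an index of 5, and an index of 4.8 corresponds to the 95.2% drop quoted in the chart summary.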

Key inflection points: DeepSeek R1 (Jan 2025) triggered a 43% single-quarter price drop by undercutting proprietary models by ~90%; Claude 3.5 Sonnet demonstrated mid-tier models matching flagship performance at lower cost; GPT-4 Turbo introduced tiered pricing; April 2026 saw intense competition with GPT-5.5 ($5/$30 per million tokens) and Claude Opus 4.7 ($5/$25) both achieving near-parity pricing at frontier performance levels.

Sources: OpenAI API Pricing (GPT-5.5 launch April 23, 2026), Anthropic API Pricing (Opus 4.7 launch April 16, 2026), Google AI Pricing, DeepSeek Pricing Docs, BenchLM.ai LLM Pricing Trends, Epoch AI Price Performance Analysis (2025-2026), Menlo Ventures Enterprise API Market Share Report (Mid-2025).

<VerticalizationHeatmap />

Platform Giants: Verticalization Risk Matrix

How OpenAI, Anthropic, and Google are entering vertical markets (April 2026)

Columns (platforms): OpenAI ($852B valuation) · Anthropic (Claude Opus 4.7) · Google (Gemini 3.1 Pro)
Rows (verticals): Legal · Design · Coding · Healthcare · Finance · Commerce
Risk Intensity: Critical (80+) · High (60-79) · Medium (40-59) · Low (20-39) · Minimal (<20)

Methodology: Risk scores (0-100) assess the threat level each platform play poses to incumbent vertical SaaS providers, based on: market reaction (stock drops, market cap impact), strategic investment size, product maturity, distribution advantage, and timing. Click any cell to view detailed sources and analysis. Scores reflect verticalization risk as of April 2026.
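
The legend's bands translate directly into a lookup. The thresholds below are the ones shown in the legend; the function and type names are hypothetical, and how the component treats fractional or out-of-range scores is an assumption.

```typescript
type RiskBand = "Critical" | "High" | "Medium" | "Low" | "Minimal";

// Map a 0-100 risk score to its legend band.
// Out-of-range and NaN inputs are rejected (an assumed policy).
function riskBand(score: number): RiskBand {
  if (!(score >= 0 && score <= 100)) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  if (score >= 80) return "Critical";
  if (score >= 60) return "High";
  if (score >= 40) return "Medium";
  if (score >= 20) return "Low";
  return "Minimal";
}
```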

<AuraArchitectureDiagram />
Aura Architecture
Click any box to see what it does.
AURA GATEWAY: Middleware · Core
POST /v1/responses (Open Responses API)
<LoadTestChart />

Gateway load test — 1,000 requests, 1–5 tool calls

Aura vs LiteLLM, Portkey, Helicone, OpenRouter, Bifrost

Scenario 1 / 5
Heads up: these numbers are directional placeholders pending a live benchmark run. They reflect the rough shape I'd expect from each gateway's architecture (Rust vs Python, agentic vs translation, etc.), not measured values. I'll update with real numbers once I've run the harness against all six.
Gateway overhead (lower = better)
Pure gateway-added latency, provider round-trip subtracted.
Aura 4 ms · Bifrost 3 ms · Helicone 6 ms · Portkey 22 ms · OpenRouter 30 ms · LiteLLM 58 ms

p50 latency (lower = better)
Median end-to-end request latency.
Aura 312 ms · Bifrost 308 ms · Helicone 318 ms · Portkey 345 ms · OpenRouter 360 ms · LiteLLM 395 ms

p99 latency (lower = better)
Tail latency: the slowest 1% of requests.
Aura 612 ms · Bifrost 605 ms · Helicone 622 ms · Portkey 690 ms · OpenRouter 720 ms · LiteLLM 810 ms

Sustained throughput (higher = better)
Requests per second under steady load.
Aura 1,450 RPS · Bifrost 1,520 RPS · Helicone 1,380 RPS · Portkey 920 RPS · OpenRouter 840 RPS · LiteLLM 540 RPS
Scenario: 1,000 requests · 1 tool call per request · same provider (Anthropic Sonnet 4.5) behind every gateway · warm-cache, post-JIT.
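
For readers who want to reproduce the p50 and p99 figures from raw samples, here is a minimal sketch using the nearest-rank percentile definition. That is one common definition and not necessarily the one the chart uses; the function names are hypothetical, and the sample values in the usage note are made up.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it. Expects 0 < p <= 100.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (!(p > 0 && p <= 100)) throw new RangeError("p must be in (0, 100]");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Gateway overhead as described above: end-to-end latency minus the
// provider round-trip measured for the same request.
function gatewayOverheadMs(endToEndMs: number, providerMs: number): number {
  return endToEndMs - providerMs;
}
```

With samples of 1 to 100 ms, `percentile(samples, 50)` returns 50 and `percentile(samples, 99)` returns 99.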
Legend: Aura · Best in scenario · Competitor
<MultiAgentPatternGrid />

multi-agents-team

Nine architecture pages, one event contract

Each card links to the live architecture view on the playground. The point isn’t that one pattern wins — it’s that the same task can move through different coordination shapes and still stream back through the same UI.

<InAppHarnessArchitecture />

original harness architecture

The whole first harness lived inside one app

Click a block to inspect the job it owns — the detail panel below the diagram updates in place.

browser localStorage history
Why this works: everything shares one process boundary, so iteration is fast and the source is easy to read. Why it bends: durability, policy, budget enforcement, and server-side state all become custom app code.

mat.umai-tech.com

Browser chat UI

The public app owns the mode selector, provider settings, local chat history, live timeline, tree visualizations, and final rich summaries.

Responsibility

User-facing control plane

<HarnessJobStack />
the first harness

One app, nine jobs

The original harness worked because all responsibilities were close to the product. That same closeness is also where production pressure starts.
thin harness
01 harness job · Turn loop
Hand-rolled runners inside the Next.js app.
Pressure point: Readable, but the route owns too much lifecycle.

02 harness job · Events to UI
SSE from the route.
Pressure point: Simple until runs need to outlive the request.

03 harness job · Tools
AI SDK tool definitions with Zod schemas.
Pressure point: Good primitive; policy is still separate work.

04 harness job · Policy
None by default.
Pressure point: Fine for demos, risky for internal tools.

05 harness job · Budget
Estimated cost only.
Pressure point: Useful signal, not an enforcement layer.

06 harness job · Sessions
Browser localStorage.
Pressure point: Convenient, not durable server-side memory.

07 harness job · Human approval
In-memory request registry.
Pressure point: Easy to prototype, fragile across restarts.

08 harness job · Observability
Logs and UI events.
Pressure point: Great for demos; thin for incident debugging.

09 harness job · Deployment
One Next.js app.
Pressure point: The strongest feature of the original harness.
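
To make the Budget job concrete: "estimated cost only" can be as small as the sketch below, which multiplies token counts by per-million prices and returns a number for display. Nothing here stops a run, which is why it is a signal rather than an enforcement layer. The interfaces, field names and prices are hypothetical, not taken from the harness source.

```typescript
// Token counts reported for one run (assumed shape).
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Prices in USD per million tokens (assumed shape).
interface Pricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

// Estimate only: returns a number to show in the UI, enforces nothing.
function estimateCostUsd(usage: TokenUsage, pricing: Pricing): number {
  const inputCost = (usage.inputTokens / 1e6) * pricing.inputPerMillionUsd;
  const outputCost = (usage.outputTokens / 1e6) * pricing.outputPerMillionUsd;
  return inputCost + outputCost;
}
```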
<HarnessComparisonCards />
migration ledger

Thin app harness vs worker substrate

Same product surface, different runtime responsibilities. The useful comparison isn’t who’s “better” — it’s where each harness shape puts the work.
Best use case
  • AI SDK harness: Learning, demos, local dev, fast iteration.
  • iii substrate: Production-ish runs that need durable state, policy, budgets, and traces.

Mental model
  • AI SDK harness: One app owns the route, loop, tools, events, and UI.
  • iii substrate: Engine plus workers; each harness concern can be a worker.

Streaming
  • AI SDK harness: Simple SSE from the Next.js route.
  • iii substrate: Worker emits events through engine/channel/stream paths; app forwards them.

Tool policy
  • AI SDK harness: Whatever I code inline. Initially: nothing.
  • iii substrate: A policy function can gate calls and fail closed.

Budget control
  • AI SDK harness: Estimate and display cost.
  • iii substrate: Budget can become a runtime enforcement concern.

Sessions
  • AI SDK harness: Browser-local history.
  • iii substrate: Server-side state keyed by conversation/session.

Human approval
  • AI SDK harness: Easy to prototype, fragile in one process.
  • iii substrate: Can be backed by durable state/queue mechanics.

Observability
  • AI SDK harness: UI timeline plus logs.
  • iii substrate: Cross-worker traces are part of the substrate.

Provider support
  • AI SDK harness: Great through the AI SDK, request-scoped in my app.
  • iii substrate: Provider workers make model access another replaceable layer.

Deployment
  • AI SDK harness: One Next.js app.
  • iii substrate: Next.js app plus engine/workers. More power, more ops.
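
"Gate calls and fail closed" in the Tool policy row means that anything other than an explicit allow is treated as a deny, including a missing policy or a policy that throws. The sketch below illustrates that idea in plain TypeScript; it is not iii's API, and `PolicyFn`, `Decision` and `gateToolCall` are hypothetical names.

```typescript
type Decision = { allowed: true } | { allowed: false; reason: string };

// A policy returns true to allow a tool call (assumed signature).
type PolicyFn = (toolName: string, args: unknown) => boolean;

// Fail closed: missing policy, a thrown error, or any non-true result denies.
function gateToolCall(policy: PolicyFn | undefined, toolName: string, args: unknown): Decision {
  if (!policy) return { allowed: false, reason: "no policy registered" };
  try {
    return policy(toolName, args) === true
      ? { allowed: true }
      : { allowed: false, reason: "denied by policy" };
  } catch {
    return { allowed: false, reason: "policy evaluation failed" };
  }
}
```

The contrast with the inline approach is the default: here a forgotten or broken policy blocks the call instead of letting it through.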
<HarnessLandscapeGrid />
the landscape

Five answers to where the harness jobs live

Every serious agent stack answers the harness question somewhere. The interesting comparison isn’t feature lists — it’s which layer of your system ends up owning durability, policy, budgets, and state.

In-process framework
Examples: LangGraph · Microsoft Agent Framework · OpenAI Agents SDK
Where the jobs live: Inside your application process. Graphs, checkpointers, guardrails, and sessions are objects the framework owns.
Trade-off: Fast to adopt and easy to reason about locally; durability and policy are bounded by the framework and the process running it.

Thin SDK
Examples: Claude Agent SDK · openharness · Pi
Where the jobs live: In a packaged loop you embed as a library: tools, permissions, hooks, compaction, and subagents.
Trade-off: Excellent loop ergonomics; durability, budgets, and multi-service orchestration are explicitly out of scope.

Durable-execution substrate
Examples: Temporal · Restate · Inngest · Hatchet
Where the jobs live: In a workflow engine that makes the run itself durable: retries, queues, state, and replay come from the substrate.
Trade-off: The strongest durability story; agent-specific jobs like policy, approvals, and budgets are still yours to build on top.

Session platform
Examples: OpenHands
Where the jobs live: In a per-session runtime: sandbox, event stream, and tools bundled around each agent session.
Trade-off: Batteries included for coding agents; less a substrate you compose, more a product you adopt.

Worker bus
Examples: iii (tested in this post)
Where the jobs live: In independent workers on a shared engine bus. Each harness job (policy, budget, state, provider, traces) is a function you can swap without touching the rest.
Trade-off: The most composable shape, and the one this post tests; the price is that you operate the engine and pay the early-adopter integration tax.
<HarnessImprovementCards />
iii next asks

The primitives are strong; the defaults can get sharper

These aren’t reasons to reject iii. They’re the gaps I’d productize first if the goal is making agent harness adoption feel boring for real teams.
01 product gap · A first-class run contract
Today: iii gives you workers, functions, triggers, state, streams, and channels.
What I would want next: A standard agent-run envelope with run IDs, session IDs, event versions, terminal states, cancellation, and retry semantics.
Why it matters: This would remove adapter glue where every app invents its own event translator.

02 product gap · Browser-friendly event reads
Today: Channels and streams are strong worker-to-worker primitives.
What I would want next: A documented HTTP/SSE or browser SDK path for reading queued run events without writing a proxy worker.
Why it matters: My queue path had to expose `GET /events` from the worker so the Next.js app could poll safely.

03 product gap · Worker lifecycle guardrails
Today: The engine tracks connected workers and removes functions when workers disconnect.
What I would want next: Replace-on-register, singleton leases, or an `iii doctor` check for duplicate workers and stale registrations.
Why it matters: Two MAT workers registering the same function can split runs and create random timeouts.

04 product gap · Production HA recipes
Today: The docs cover Docker and reverse proxies for production deployment.
What I would want next: Opinionated recipes for external state, stream, and queue backends on Fly.io, Kubernetes, and managed Redis/RabbitMQ.
Why it matters: Until those layers are externalized, my Fly deployment should stay single-machine.

05 product gap · Governance starter packs
Today: The harness model makes policy, budget, approval, credentials, and provider routing replaceable workers.
What I would want next: Tested templates for OPA/Cedar policy, Slack approvals, workspace budgets, provider key vaulting, and audit logs.
Why it matters: The primitives are there; teams still need safe defaults before agents touch internal systems.

06 product gap · Replay and evals as primitives
Today: Tracing and logs make runs inspectable after the fact.
What I would want next: A replay/eval worker that can rerun a turn against pinned inputs, compare traces, and catch harness regressions.
Why it matters: Agent harness work needs regression tests for behavior, not just unit tests for functions.
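
The run contract asked for in the first product gap could look roughly like the envelope below. This is a wish-list sketch, not an existing iii type: every field and helper name is hypothetical, chosen to cover the items the card lists (run IDs, session IDs, event versions, terminal states, cancellation).

```typescript
// Terminal states a run can end in; "cancelled" covers the cancellation case.
type TerminalState = "succeeded" | "failed" | "cancelled";

// Hypothetical envelope wrapped around every event a run emits.
interface RunEvent {
  runId: string;
  sessionId: string;
  eventVersion: number; // schema version of the envelope
  seq: number; // position of the event within the run
  type: string; // e.g. "tool_call", "text_delta"
  terminal?: TerminalState; // set only on the final event
  payload?: unknown;
}

// Returns the run's terminal state, or undefined while it is still in flight.
function terminalStateOf(events: RunEvent[]): TerminalState | undefined {
  for (let i = events.length - 1; i >= 0; i--) {
    const state = events[i].terminal;
    if (state !== undefined) return state;
  }
  return undefined;
}
```

With a shared envelope like this, a UI can decide whether a run is finished without knowing which worker produced the events.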
<LeverageFormula />

The Player-Coach Leverage Formula

Why context-rich leaders get disproportionate returns from AI tools

Deep Context: Historical Decisions · Political Landscape · Technical Debt Map · Strategic Priorities
+ AI Fluency: Code Generation · Pattern Recognition · Rapid Prototyping
= Player-Coach Leverage: Strategic altitude maintained · Direct output at scale · Judgment-directed AI

The Compounding Effect

Pure ICs have AI fluency but often lack strategic context. Product Engineers have context but delegate implementation. The Player-Coach keeps both—and the combination creates leverage that neither can match alone.

<CareerPathEvolution />

The Leverage Equation Changed

How AI transformed the traditional career fork into a viable dual path

Traditional Model: Zero-Sum Tradeoff
Senior Engineer → CHOOSE
  • Management: Scale via others · Stop Shipping
  • IC Track: Direct output · Limited Leverage

AI-Enabled Model: Both Paths Viable
Senior Engineer → BOTH PATHS
  • Scale via AI + Others: Strategic leverage · Keep Shipping
  • Direct Output via AI: 10x velocity · Strategic Leverage

The shift: AI compresses implementation time, freeing bandwidth for strategic work without sacrificing direct output.

The rule of thumb

Neutral zinc grays do the layout work, one purple does the talking, and the mono font marks anything machine-flavored. If a new component needs a second accent color, it's probably the wrong design.