
Design System

The tokens and components behind umai-tech.com — Japanese craftsmanship (うまい) meets a terminal-flavored engineering aesthetic.

Color tokens

Defined in tailwind.config.mjs and used as text-umai-accent, bg-umai-gray-900, etc.

umai-primary #0a0a0a

Dark surfaces, primary text, dark-mode background

umai-secondary #f5f5f5

Light surfaces, inverted text on dark

umai-accent #8B5CF6

Brand purple — links, highlights, eyebrows, CTAs

umai-accent-dark #7C3AED

Hover states, gradients

umai-accent-darker #6D28D9

Pressed states, deep gradients

umai-accent-light #A78BFA

Dark-mode accents, soft highlights

Gray scale umai-gray-*

50 #FAFAFA

100 #F4F4F5

200 #E4E4E7

300 #D4D4D8

400 #A1A1AA

500 #71717A

600 #52525B

700 #3F3F46

800 #27272A

900 #18181B
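As a rough sketch of how the accent roles listed above might combine on one element, the helper below returns Tailwind class strings for a link and a button: the base accent at rest, `umai-accent-dark` on hover, `umai-accent-darker` when pressed, and `umai-accent-light` in dark mode. The names `ACCENT_PATTERNS` and `accentClasses` and the exact class combinations are assumptions for illustration, not taken from the site's source.

```javascript
// Hypothetical lookup (not from the site's source): accent tokens mapped to
// interaction states with standard Tailwind variants.
const ACCENT_PATTERNS = {
  // Links: accent at rest, darker on hover, lighter accent in dark mode.
  link: "text-umai-accent hover:text-umai-accent-dark dark:text-umai-accent-light",
  // Buttons: accent surface, darker on hover, darkest while pressed.
  button: "bg-umai-accent text-white hover:bg-umai-accent-dark active:bg-umai-accent-darker",
};

function accentClasses(kind) {
  if (!Object.prototype.hasOwnProperty.call(ACCENT_PATTERNS, kind)) {
    throw new Error(`Unknown accent pattern: ${kind}`);
  }
  return ACCENT_PATTERNS[kind];
}
```

The class names are written out in full rather than assembled from fragments, because Tailwind only generates utilities it can find as complete strings in the scanned source.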

Typography

Inter font-sans

The harness is the product. Build agents that ship.

Primary — headings, body, UI

JetBrains Mono font-mono

./marcus-cv --display --interactive

Code, terminals, eyebrows, data

Noto Sans JP font-japanese

うまい — skillful, delicious

Japanese brand accents

Core patterns

The recurring shapes across pages and blog components.

eyebrow label

Hover-lift card

Rounded-2xl, subtle border, lifts on hover with an accent gradient bar that fades in along the top edge.
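One possible way to build this card, sketched as a small function that returns markup. The function name is hypothetical, and the utility classes are an assumption about how the effect could be built with stock Tailwind v3 utilities plus the umai tokens; the real component may differ.

```javascript
// Sketch only: a hover-lift card assembled from Tailwind utilities.
// `group` on the wrapper lets the accent bar react to hover on the whole card.
function hoverLiftCard(innerHtml) {
  return [
    '<div class="group relative overflow-hidden rounded-2xl border border-umai-gray-200 bg-white',
    '            transition-transform duration-300 hover:-translate-y-1',
    '            dark:border-umai-gray-700 dark:bg-umai-gray-800/80">',
    '  <!-- Accent bar: invisible at rest, fades in while the card is hovered -->',
    '  <div class="absolute inset-x-0 top-0 h-1 bg-gradient-to-r from-umai-accent to-umai-accent-dark',
    '              opacity-0 transition-opacity duration-300 group-hover:opacity-100"></div>',
    `  ${innerHtml}`,
    '</div>',
  ].join("\n");
}
```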

buttons

chip / badge

Pills for navigation-level actions, accent rectangles for in-content actions, mono-tracking chips for metadata.

Callout

Boxed emphasis in blog posts. Each type has a job: insight carries thesis-level statements, success carries principles and playbooks, warning flags risks, info adds context, and note holds cross-references.

type="info"

One strong sentence the reader's eye should land on.

type="warning"

One strong sentence the reader's eye should land on.

type="error"

One strong sentence the reader's eye should land on.

type="success"

One strong sentence the reader's eye should land on.

type="note"

One strong sentence the reader's eye should land on.

type="insight"

One strong sentence the reader's eye should land on.
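The type-to-purpose mapping above could be captured in a small lookup so that a misspelled type fails loudly. This is a sketch under assumptions: the names `CALLOUT_PURPOSES` and `resolveCalloutType` are hypothetical, and the purpose of `error` is left empty because the description above does not state it.

```javascript
// Purposes as described in the Callout section. "error" is a listed type
// whose purpose is not described there, so it is null rather than guessed.
const CALLOUT_PURPOSES = new Map([
  ["insight", "thesis-level statements"],
  ["success", "principles and playbooks"],
  ["warning", "risks"],
  ["info", "context"],
  ["note", "cross-references"],
  ["error", null],
]);

// Returns the type with its documented purpose, or throws on unknown types.
function resolveCalloutType(type) {
  if (!CALLOUT_PURPOSES.has(type)) {
    throw new Error(`Unknown callout type "${type}"`);
  }
  return { type, purpose: CALLOUT_PURPOSES.get(type) };
}
```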

Citation

Attributed quotes. pullquote is reserved for a post's single marquee quote.

The model is the rented car — the harness is the road network.

— Marcus Elwin • The Moat Is the Harness, Not the Model

A supporting quote uses the quieter quote treatment.

— Marcus Elwin • umai-tech.com

Less prominent attributions use the reference treatment.

— Marcus Elwin • umai-tech.com

Code terminal

Dark terminal panel with mac dots, tab strip, and Shiki highlighting (GitHub Dark) — also how you consume the tokens.

umai-tech · tokens

<!-- Accent text + eyebrow pattern -->
<p class="text-xs font-bold uppercase tracking-[0.18em] text-umai-accent">
  eyebrow label
</p>

<!-- Card surface, light + dark -->
<div class="rounded-2xl border border-umai-gray-200 bg-white
            dark:border-umai-gray-700 dark:bg-umai-gray-800/80">
  ...
</div>

// tailwind.config.mjs
colors: {
  'umai-primary': '#0a0a0a',
  'umai-secondary': '#f5f5f5',
  'umai-accent': '#8B5CF6',      // + dark / darker / light
  'umai-gray': { 50: '#FAFAFA', /* ... */ 900: '#18181B' },
},
fontFamily: {
  sans: ['Inter', 'system-ui', 'sans-serif'],
  mono: ['JetBrains Mono', 'monospace'],
  japanese: ['Noto Sans JP', 'sans-serif'],
}

Component inventory

Every component in src/components/, scanned at build time and grouped by where it's actually used — this list can't go stale.

82 components · 76 in production · 19.7k lines of component code · 14 in the live gallery below
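A minimal sketch of how scan results could be grouped into the buckets used in this inventory. The grouping rules are inferred from the headings below (pages, shared across posts, single post, unused) and are an assumption, as are the record shape `{ name, loc, pages, posts }`, the function names, and the placeholder slugs `post-a` and `post-b`.

```javascript
// Assumed rule order: page usage wins, then multi-post, then single-post,
// and anything unreferenced is treated as retired.
function groupComponents(components) {
  const groups = new Map();
  for (const component of components) {
    let key;
    if (component.pages.length > 0) key = "Site & pages";
    else if (component.posts.length > 1) key = "Shared blog components";
    else if (component.posts.length === 1) key = `Post ${component.posts[0]}`;
    else key = "Retired / unused";
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(component.name);
  }
  return groups;
}

// Headline numbers: total, in production (referenced somewhere), total lines.
function summarize(components) {
  const unused = components.filter(
    (c) => c.pages.length === 0 && c.posts.length === 0
  ).length;
  const loc = components.reduce((sum, c) => sum + c.loc, 0);
  return { total: components.length, inProduction: components.length - unused, loc };
}

// Four entries with names and line counts from the inventory; usage is simplified.
const sample = [
  { name: "Hero", loc: 63, pages: ["index"], posts: [] },
  { name: "Callout", loc: 161, pages: [], posts: ["post-a", "post-b"] },
  { name: "LessonsGrid", loc: 39, pages: [], posts: ["building-aura-an-agentic-llm-gateway-in-rust"] },
  { name: "Mermaid", loc: 53, pages: [], posts: [] },
];
```

Calling `groupComponents(sample)` places each of the four entries in a different bucket, and `summarize(sample)` counts three of the four as in production.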

Site & pages ×14

AuthorCard 148 loc

[...slug]

Features 90 loc

index

Footer 157 loc

Layout

Header 368 loc

Layout

Hero 63 loc

index

InteractiveCV 1028 loc

about cv

PageViews 108 loc

[...slug] blog

PostReactions 241 loc

[...slug]

ProductCard 129 loc

products

ProjectCard 145 loc

projects

ReadingTime 25 loc

[...slug] blog

RecentPosts 125 loc

[...slug] blog

TableOfContents 220 loc

[...slug]

TeamSection 572 loc

about

Shared blog components ×4

Callout 161 loc

agentic-commerce-the-dawn-of… building-aura-an-agentic-llm… +10

Citation 151 loc

agentic-commerce-the-dawn-of… i-built-a-multi-agent-harnes… +3

CodeTerminal 208 loc

building-aura-an-agentic-llm… i-built-a-multi-agent-harnes…

PrettyTable 213 loc

agentic-commerce-the-dawn-of… lego-pieces-of-agentic-comme… +1

Post agentic-commerce-the-dawn-of-a-new-ai-driven-era-for-ecommerce ×12

CapabilityCard 59 loc

agentic-commerce-the-dawn-of…

ChallengeCategories 203 loc

agentic-commerce-the-dawn-of…

CompetitiveLandscape 738 loc

agentic-commerce-the-dawn-of…

ConvergenceForces 122 loc

agentic-commerce-the-dawn-of…

CustomIcon 75 loc

agentic-commerce-the-dawn-of…

FuturePredictions 280 loc

agentic-commerce-the-dawn-of…

InvestmentDashboard 192 loc

agentic-commerce-the-dawn-of…

MaturityLevels 256 loc

agentic-commerce-the-dawn-of…

PrettyList 52 loc

agentic-commerce-the-dawn-of…

ResearchInsights 57 loc

agentic-commerce-the-dawn-of…

Timeline 472 loc

agentic-commerce-the-dawn-of…

UpdateTracker 239 loc

agentic-commerce-the-dawn-of…

Post building-aura-an-agentic-llm-gateway-in-rust ×8

AuraArchitectureDiagram 278 loc

building-aura-an-agentic-llm…

CrateWorkspace 279 loc

building-aura-an-agentic-llm…

GatewayLandscapeMatrix 97 loc

building-aura-an-agentic-llm…

LessonsGrid 39 loc

building-aura-an-agentic-llm…

LoadTestChart 267 loc

building-aura-an-agentic-llm…

SupportedModelsGrid 70 loc

building-aura-an-agentic-llm…

WhatWorkedWhatHurt 69 loc

building-aura-an-agentic-llm…

ZoomableFrame 450 loc

building-aura-an-agentic-llm…

Post from-tweets-to-carts-twitter-ai-ecommerce ×4

DomainComparison 158 loc

from-tweets-to-carts-twitter…

PipelineOverview 188 loc

from-tweets-to-carts-twitter…

PipelineStage 273 loc

from-tweets-to-carts-twitter…

StrategyTable 176 loc

from-tweets-to-carts-twitter…

Post i-built-a-multi-agent-harness-then-tested-iii ×7

HarnessComparisonCards 112 loc

i-built-a-multi-agent-harnes…

HarnessImprovementCards 98 loc

i-built-a-multi-agent-harnes…

HarnessJobStack 92 loc

i-built-a-multi-agent-harnes…

HarnessLandscapeGrid 104 loc

i-built-a-multi-agent-harnes…

IiiPrimitivesGrid 93 loc

i-built-a-multi-agent-harnes…

InAppHarnessArchitecture 311 loc

i-built-a-multi-agent-harnes…

MultiAgentPatternGrid 110 loc

i-built-a-multi-agent-harnes…

Post lego-pieces-of-agentic-commerce-the-necessary-layering-of-multiple-protocols ×2

ImplementationExample 173 loc

lego-pieces-of-agentic-comme…

TextWallComparison 527 loc

lego-pieces-of-agentic-comme…

Post taste-still-matters-in-ai-software-engineering- ×10

CaseStudyComparison 347 loc

taste-still-matters-in-ai-so…

CodeTasteComparison 412 loc

taste-still-matters-in-ai-so…

MarketShareChart 664 loc

taste-still-matters-in-ai-so…

PlatformGrid 247 loc

taste-still-matters-in-ai-so…

ReviewNote 195 loc

taste-still-matters-in-ai-so…

SkillsTransition 216 loc

taste-still-matters-in-ai-so…

SoftwareEvolution 255 loc

taste-still-matters-in-ai-so…

TasteDevelopmentPath 213 loc

taste-still-matters-in-ai-so…

TastePillars 231 loc

taste-still-matters-in-ai-so…

WaveCard 276 loc

taste-still-matters-in-ai-so…

Post the-moat-is-the-harness-not-the-model ×9

FlywheelDiagram 493 loc

the-moat-is-the-harness-not-…

FoundationModelMoatCards 123 loc

the-moat-is-the-harness-not-…

HarnessDiagram 614 loc

the-moat-is-the-harness-not-…

MoatRadar 553 loc

the-moat-is-the-harness-not-…

MoatVersusTable 185 loc

the-moat-is-the-harness-not-…

ModelCommoditizationChart 416 loc

the-moat-is-the-harness-not-…

PlaybookCards 640 loc

the-moat-is-the-harness-not-…

SimplifiedAgentDiagram 139 loc

the-moat-is-the-harness-not-…

VerticalizationHeatmap 524 loc

the-moat-is-the-harness-not-…

Post the-third-path-player-coach-at-scale ×6

CareerFilter 94 loc

the-third-path-player-coach-…

CareerPathEvolution 187 loc

the-third-path-player-coach-…

LeverageFormula 154 loc

the-third-path-player-coach-…

ProfileComparison 335 loc

the-third-path-player-coach-…

SkillsRadar 362 loc

the-third-path-player-coach-…

ThreePathsComparison 249 loc

the-third-path-player-coach-…

Retired / unused ×6

CardCarousel 191 loc

CollapsibleSection 120 loc

Mermaid 53 loc

MoatComparison 155 loc

ServiceCard 87 loc

TrustMetrics 95 loc

Live gallery

Every self-contained component, rendered live — expand to inspect. Prop-driven components (MoatRadar, FlywheelDiagram, Timeline, …) need post data, so they live in the inventory above and render in their posts.

<SimplifiedAgentDiagram /> the-moat-is-the-harness-not-the-model ▸

What do we mean by an agent + harness?

Agent +=

Instructions
Model
Tools
Memory

"Agentic Harness"

Everything around the model — the instructions it follows, the tools it can call, and the memory it keeps between turns.

A simplified view

<HarnessDiagram /> the-moat-is-the-harness-not-the-model ▸

Diagram labels: On failure → loop back to Cognitive Engine · 7 · Output on success · Supporting Layers (feeds into main flow)

Click any component to explore how it creates defensibility

<ModelCommoditizationChart /> the-moat-is-the-harness-not-the-model ▸

The Model Commoditization Curve

Frontier model pricing collapse: November 2022 to April 2026

Release Date (Nov 2022 - Apr 2026)

Price Index

Major Model Releases

95.2%

Price Drop

40x/yr

Cost Reduction Rate

41 months

Nov '22 to Apr '26

Chart shows output token pricing trends for frontier LLMs from November 2022 to April 2026.

Models tracked: OpenAI (GPT-3.5, GPT-4, GPT-4 Turbo, GPT-4o, GPT-5.2, GPT-5.5), Anthropic (Claude 2, Claude 3 Opus, Claude 3.5 Sonnet, Claude Opus 4, Claude Opus 4.7), Google (Gemini 1.0, Gemini 2.0 Pro, Gemini 3.1 Pro), and DeepSeek (R1).

Price index methodology: Normalized to GPT-4 March 2023 pricing as baseline (index = 100). Each data point represents the average output token price for comparable frontier-tier models at time of release, adjusted for relative performance on standard benchmarks (MMLU, HumanEval, GSM8K, SWE-bench).

Key inflection points: DeepSeek R1 (Jan 2025) triggered a 43% single-quarter price drop by undercutting proprietary models by ~90%; Claude 3.5 Sonnet demonstrated mid-tier models matching flagship performance at lower cost; GPT-4 Turbo introduced tiered pricing; April 2026 saw intense competition with GPT-5.5 ($5/$30 per million tokens) and Claude Opus 4.7 ($5/$25) both achieving near-parity pricing at frontier performance levels.

Sources: OpenAI API Pricing (GPT-5.5 launch April 23, 2026), Anthropic API Pricing (Opus 4.7 launch April 16, 2026), Google AI Pricing, DeepSeek Pricing Docs, BenchLM.ai LLM Pricing Trends, Epoch AI Price Performance Analysis (2025-2026), Menlo Ventures Enterprise API Market Share Report (Mid-2025).
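The index normalization described in the methodology can be sketched as below. This is an illustration under assumptions, not the chart's actual calculation: the function names are hypothetical, the prices in the usage note are placeholders, and dividing by a `relativePerformance` factor is only one plausible way to apply a performance adjustment, since the exact formula is not given.

```javascript
// Index a price against a baseline so the baseline itself scores 100.
// relativePerformance is an assumed adjustment (1 = same capability as the
// baseline model, 2 = twice as capable on whatever benchmark mix is used).
function priceIndex(price, baselinePrice, relativePerformance = 1) {
  if (baselinePrice <= 0 || relativePerformance <= 0) {
    throw new RangeError("baselinePrice and relativePerformance must be positive");
  }
  return (100 * price) / (baselinePrice * relativePerformance);
}

// Percentage drop between two index values.
function percentDrop(fromIndex, toIndex) {
  return 100 * (1 - toIndex / fromIndex);
}
```

With placeholder prices, `priceIndex(30, 60)` gives 50, and the same price at twice the relative performance, `priceIndex(30, 60, 2)`, gives 25. Read the other way, a 95.2% drop from a baseline of 100 corresponds to a final index of about 4.8.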

<VerticalizationHeatmap /> the-moat-is-the-harness-not-the-model ▸

Platform Giants: Verticalization Risk Matrix

How OpenAI, Anthropic, and Google are entering vertical markets (April 2026)

Columns: OpenAI ($852B valuation) · Anthropic (Claude Opus 4.7) · Google (Gemini 3.1 Pro)

Rows: Legal · Design · Coding · Healthcare · Finance · Commerce

Risk Intensity: Critical (80+) · High (60-79) · Medium (40-59) · Low (20-39) · Minimal (<20)

Methodology: Risk scores (0-100) assess the threat level each platform play poses to incumbent vertical SaaS providers, based on: market reaction (stock drops, market cap impact), strategic investment size, product maturity, distribution advantage, and timing. Click any cell to view detailed sources and analysis. Scores reflect verticalization risk as of April 2026.
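The legend's thresholds translate directly into a lookup. A minimal sketch, assuming scores are numbers from 0 to 100 and that fractional scores fall into the band below the next threshold; the function name `riskBand` is hypothetical.

```javascript
// Map a 0-100 risk score to the legend's intensity bands.
function riskBand(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError("score must be a number between 0 and 100");
  }
  if (score >= 80) return "Critical";
  if (score >= 60) return "High";
  if (score >= 40) return "Medium";
  if (score >= 20) return "Low";
  return "Minimal";
}
```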

<AuraArchitectureDiagram /> building-aura-an-agentic-llm-gateway-in-rust ▸

Aura Architecture

Click any box to see what it does

AURA GATEWAY

Axum router · Tokio async

Middleware

Core

POST /v1/responses Open Responses API

<LoadTestChart /> building-aura-an-agentic-llm-gateway-in-rust ▸

Gateway load test — 1,000 requests, 1–5 tool calls

Aura vs LiteLLM, Portkey, Helicone, OpenRouter, Bifrost

Scenario 1 / 5

Heads up: these numbers are directional placeholders pending a live benchmark run. They reflect the rough shape I'd expect from each gateway's architecture (Rust vs Python, agentic vs translation, etc.), not measured values. I'll update with real numbers once I've run the harness against all six.

Gateway overhead (lower = better): pure gateway-added latency, provider round-trip subtracted.

p50 latency (lower = better): median end-to-end request latency.

p99 latency (lower = better): tail latency, the slowest 1% of requests.

Sustained throughput (higher = better): requests per second under steady load.

Gateway	Gateway overhead	p50 latency	p99 latency	Sustained throughput
Aura	4 ms	312 ms	612 ms	1,450 RPS
Bifrost	3 ms	308 ms	605 ms	1,520 RPS
Helicone	6 ms	318 ms	622 ms	1,380 RPS
Portkey	22 ms	345 ms	690 ms	920 RPS
OpenRouter	30 ms	360 ms	720 ms	840 RPS
LiteLLM	58 ms	395 ms	810 ms	540 RPS

Scenario: 1,000 requests · 1 tool call per request · same provider (Anthropic Sonnet 4.5) behind every gateway · warm-cache, post-jit.
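The four metrics in this chart can be computed from raw per-request timings roughly as follows. This is a sketch, not the benchmark harness behind the chart: the function names are hypothetical, and nearest-rank is just one common way to pick a percentile.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Gateway overhead for one request: end-to-end latency minus the time spent
// waiting on the provider.
function gatewayOverhead(totalMs, providerMs) {
  return totalMs - providerMs;
}

// Sustained throughput: completed requests divided by elapsed seconds.
function throughputRps(requestCount, elapsedMs) {
  return requestCount / (elapsedMs / 1000);
}

// Made-up latencies in milliseconds, with one slow outlier.
const latencies = [300, 310, 305, 900, 320];
const p50 = percentile(latencies, 50); // median of the five samples
const p99 = percentile(latencies, 99); // dominated by the outlier
```

With only five samples the p99 is simply the slowest request, which is why tail percentiles need a large run, such as the 1,000 requests used here, to be meaningful.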

Legend: Aura · Best in scenario · Competitor

<MultiAgentPatternGrid /> i-built-a-multi-agent-harness-then-tested-iii ▸

multi-agents-team

Nine architecture pages, one event contract

Each card links to the live architecture view on the playground. The point isn’t that one pattern wins — it’s that the same task can move through different coordination shapes and still stream back through the same UI.

Orchestrated

Central coordinator

plan → delegate → synthesize

A coordinator routes work to research, writer, and editor specialists.

Choreographed

Peer message bus

round-robin negotiation

Backend, frontend, and design peers coordinate through shared messages.

Hierarchical

Dynamic agent tree

lead → sub-agents → rollup

A lead spawns depth-capped sub-agents and synthesizes their results.

Evaluator–Optimizer

Critique loop

draft → score → revise

A generator improves a draft until a critic accepts the quality bar.

Debate

Adversarial panel

argue → rebut → judge

Opposing agents argue their case before a judge synthesizes the answer.

Blackboard

Shared workspace

select agent → write board

A controller chooses which specialist updates the shared board next.

Market

Auction board

post task → bid → award

Agents bid on work; the dispatcher awards tasks to the strongest fit.

Self-Consistency

Parallel sampling

sample N → select/merge

Several attempts run in parallel and a judge selects or merges the best.

Swarm

Shared scratchpad

many passes → convergence

Identical agents build on a shared scratchpad over capped rounds.
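To make one of these coordination shapes concrete, here is a minimal sketch of the Evaluator–Optimizer loop (draft → score → revise). It is not the playground's implementation: `generate` and `critique` are hypothetical stand-ins for model calls, and they are synchronous here only to keep the example short.

```javascript
// Runs draft -> score -> revise until the critic accepts or rounds run out.
// generate(previousDraft, feedback) returns a new draft.
// critique(draft) returns { score, feedback }.
function evaluatorOptimizer({ generate, critique, threshold, maxRounds }) {
  let draft = generate(null, null); // first draft, no feedback yet
  for (let round = 1; round <= maxRounds; round++) {
    const review = critique(draft);
    if (review.score >= threshold) {
      return { draft, rounds: round, accepted: true };
    }
    // Revise only if another scoring round remains.
    if (round < maxRounds) {
      draft = generate(draft, review.feedback);
    }
  }
  return { draft, rounds: maxRounds, accepted: false };
}

// Toy stand-ins: the draft grows by one character per revision and the
// critic scores by length.
const toyGenerate = (previous) => (previous === null ? "a" : previous + "a");
const toyCritique = (draft) => ({ score: draft.length, feedback: "make it longer" });
```

The round cap matters as much as the quality bar: without it, a critic that never accepts would keep the loop running indefinitely.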

<InAppHarnessArchitecture /> i-built-a-multi-agent-harness-then-tested-iii ▸

original harness architecture

The whole first harness lived inside one app

Click a block to inspect the job it owns — the detail panel below the diagram updates in place.

Open live app ↗

browser localStorage history

Why this works: everything shares one process boundary, so iteration is fast and the source is easy to read. Why it bends: durability, policy, budget enforcement, and server-side state all become custom app code.

mat.umai-tech.com

Browser chat UI

The public app owns the mode selector, provider settings, local chat history, live timeline, tree visualizations, and final rich summaries.

Responsibility

User-facing control plane

<HarnessJobStack /> i-built-a-multi-agent-harness-then-tested-iii ▸

the first harness

One app, nine jobs

The original harness worked because all responsibilities were close to the product. That same closeness is also where production pressure starts.

thin harness

01 Turn loop
Hand-rolled runners inside the Next.js app.
Pressure point: Readable, but the route owns too much lifecycle.

02 Events to UI
SSE from the route.
Pressure point: Simple until runs need to outlive the request.

03 Tools
AI SDK tool definitions with Zod schemas.
Pressure point: Good primitive; policy is still separate work.

04 Policy
None by default.
Pressure point: Fine for demos, risky for internal tools.

05 Budget
Estimated cost only.
Pressure point: Useful signal, not an enforcement layer.

06 Sessions
Browser localStorage.
Pressure point: Convenient, not durable server-side memory.

07 Human approval
In-memory request registry.
Pressure point: Easy to prototype, fragile across restarts.

08 Observability
Logs and UI events.
Pressure point: Great for demos; thin for incident debugging.

09 Deployment
One Next.js app.
Pressure point: The strongest feature of the original harness.
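As an illustration of the "Events to UI" job, the sketch below formats one run event as a Server-Sent Events frame, the text format a route would write to the response stream. The function name, event name, and payload shape are hypothetical; only the `id:` / `event:` / `data:` framing comes from the SSE format itself.

```javascript
// Serialize one event as an SSE frame. JSON.stringify never emits raw
// newlines, so the payload always fits on a single "data:" line.
function toSseFrame(eventName, payload, id) {
  const lines = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  lines.push(`event: ${eventName}`);
  lines.push(`data: ${JSON.stringify(payload)}`);
  // A blank line terminates the frame.
  return lines.join("\n") + "\n\n";
}
```

Because each frame is written into the open response, the stream ends when the request does, which is the pressure point named above for runs that need to outlive it.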

<HarnessComparisonCards /> i-built-a-multi-agent-harness-then-tested-iii ▸

migration ledger

Thin app harness vs worker substrate

Same product surface, different runtime responsibilities. The useful comparison isn’t who’s “better” — it’s where each harness shape puts the work.

AI SDK harness (thin app runtime, current) vs iii substrate (engine + composable workers)

Best use case
AI SDK harness: Learning, demos, local dev, fast iteration.
iii substrate: Production-ish runs that need durable state, policy, budgets, and traces.

Mental model
AI SDK harness: One app owns the route, loop, tools, events, and UI.
iii substrate: Engine plus workers; each harness concern can be a worker.

Streaming
AI SDK harness: Simple SSE from the Next.js route.
iii substrate: Worker emits events through engine/channel/stream paths; app forwards them.

Tool policy
AI SDK harness: Whatever I code inline. Initially: nothing.
iii substrate: A policy function can gate calls and fail closed.

Budget control
AI SDK harness: Estimate and display cost.
iii substrate: Budget can become a runtime enforcement concern.

Sessions
AI SDK harness: Browser-local history.
iii substrate: Server-side state keyed by conversation/session.

Human approval
AI SDK harness: Easy to prototype, fragile in one process.
iii substrate: Can be backed by durable state/queue mechanics.

Observability
AI SDK harness: UI timeline plus logs.
iii substrate: Cross-worker traces are part of the substrate.

Provider support
AI SDK harness: Great through the AI SDK, request-scoped in my app.
iii substrate: Provider workers make model access another replaceable layer.

Deployment
AI SDK harness: One Next.js app.
iii substrate: Next.js app plus engine/workers. More power, more ops.
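The "gate calls and fail closed" idea from the Tool policy row can be sketched in a few lines. This is a generic illustration, not iii's API: `gateToolCall`, the `call` shape, and the allowlist policy are all hypothetical.

```javascript
// Fail-closed gate: a call runs only when the policy returns exactly true.
// A policy that throws, or returns anything else, denies the call.
function gateToolCall(policy, call, execute) {
  let allowed = false;
  try {
    allowed = policy(call) === true;
  } catch (error) {
    allowed = false; // a broken policy must not open the gate
  }
  if (!allowed) {
    return { ok: false, error: `Tool call "${call.tool}" denied by policy` };
  }
  return { ok: true, result: execute(call) };
}

// Example policy: only tools on an explicit allowlist may run.
const allowlist = new Set(["search", "read_file"]);
const allowlistPolicy = (call) => allowlist.has(call.tool);
```

The strict `=== true` check is the point of the design: an undefined return value or an exception counts as a denial rather than silently letting the call through.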

<HarnessLandscapeGrid /> i-built-a-multi-agent-harness-then-tested-iii ▸

the landscape

Five answers to where the harness jobs live

Every serious agent stack answers the harness question somewhere. The interesting comparison isn’t feature lists — it’s which layer of your system ends up owning durability, policy, budgets, and state.

In-process framework
LangGraph, Microsoft Agent Framework, OpenAI Agents SDK
Inside your application process. Graphs, checkpointers, guardrails, and sessions are objects the framework owns.
Trade-off: Fast to adopt and easy to reason about locally; durability and policy are bounded by the framework and the process running it.

Thin SDK
Claude Agent SDK, openharness, Pi
In a packaged loop you embed as a library: tools, permissions, hooks, compaction, and subagents.
Trade-off: Excellent loop ergonomics; durability, budgets, and multi-service orchestration are explicitly out of scope.

Durable-execution substrate
Temporal, Restate, Inngest, Hatchet
In a workflow engine that makes the run itself durable: retries, queues, state, and replay come from the substrate.
Trade-off: The strongest durability story; agent-specific jobs like policy, approvals, and budgets are still yours to build on top.

Session platform
OpenHands
In a per-session runtime: sandbox, event stream, and tools bundled around each agent session.
Trade-off: Batteries included for coding agents; less a substrate you compose, more a product you adopt.

Worker bus
iii (tested in this post)
In independent workers on a shared engine bus. Each harness job (policy, budget, state, provider, traces) is a function you can swap without touching the rest.
Trade-off: The most composable shape, and the one this post tests; the price is that you operate the engine and pay the early-adopter integration tax.

<HarnessImprovementCards /> i-built-a-multi-agent-harness-then-tested-iii ▸

iii next asks

The primitives are strong; the defaults can get sharper

These aren’t reasons to reject iii. They’re the gaps I’d productize first if the goal is making agent harness adoption feel boring for real teams.

01 product gap

A first-class run contract

today

iii gives you workers, functions, triggers, state, streams, and channels.

what I would want next

A standard agent-run envelope with run IDs, session IDs, event versions, terminal states, cancellation, and retry semantics.

Why it matters: This would remove adapter glue where every app invents its own event translator.

02 product gap

Browser-friendly event reads

today

Channels and streams are strong worker-to-worker primitives.

what I would want next

A documented HTTP/SSE or browser SDK path for reading queued run events without writing a proxy worker.

Why it matters: My queue path had to expose `GET /events` from the worker so the Next.js app could poll safely.

03 product gap

Worker lifecycle guardrails

today

The engine tracks connected workers and removes functions when workers disconnect.

what I would want next

Replace-on-register, singleton leases, or an `iii doctor` check for duplicate workers and stale registrations.

Why it matters: Two MAT workers registering the same function can split runs and create random timeouts.

04 product gap

Production HA recipes

today

The docs cover Docker and reverse proxies for production deployment.

what I would want next

Opinionated recipes for external state, stream, and queue backends on Fly.io, Kubernetes, and managed Redis/RabbitMQ.

Why it matters: Until those layers are externalized, my Fly deployment should stay single-machine.

05 product gap

Governance starter packs

today

The harness model makes policy, budget, approval, credentials, and provider routing replaceable workers.

what I would want next

Tested templates for OPA/Cedar policy, Slack approvals, workspace budgets, provider key vaulting, and audit logs.

Why it matters: The primitives are there; teams still need safe defaults before agents touch internal systems.

06 product gap

Replay and evals as primitives

today

Tracing and logs make runs inspectable after the fact.

what I would want next

A replay/eval worker that can rerun a turn against pinned inputs, compare traces, and catch harness regressions.

Why it matters: Agent harness work needs regression tests for behavior, not just unit tests for functions.
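To show what the first ask, a standard agent-run envelope, might look like in practice, here is a speculative sketch. Nothing here is an iii API: every field name, the state names, and both functions are hypothetical, and retry and cancellation semantics are reduced to a terminal-state check.

```javascript
// Hypothetical terminal states for a run.
const TERMINAL_STATES = new Set(["succeeded", "failed", "cancelled"]);

// Wrap a payload in a versioned envelope carrying run and session IDs.
function createRunEvent({ runId, sessionId, seq, type, payload = {} }) {
  if (!runId || !sessionId) throw new Error("runId and sessionId are required");
  return {
    version: 1, // event schema version, bumped on breaking changes
    runId,
    sessionId,
    seq,        // per-run sequence number, used for ordering
    type,
    payload,
    terminal: TERMINAL_STATES.has(type),
  };
}

// Derive the run status from its events: the first terminal event in
// sequence order wins, and anything after it is ignored.
function reduceRun(events) {
  const ordered = [...events].sort((a, b) => a.seq - b.seq);
  for (const event of ordered) {
    if (event.terminal) return event.type;
  }
  return "running";
}
```

With a shared shape like this, a UI could consume events from any worker without a per-app translator, which is the adapter glue the card describes.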

<LeverageFormula /> the-third-path-player-coach-at-scale ▸

The Player-Coach Leverage Formula

Why context-rich leaders get disproportionate returns from AI tools

Deep Context: Historical Decisions, Political Landscape, Technical Debt Map, Strategic Priorities

+

AI Fluency: Code Generation, Pattern Recognition, Rapid Prototyping

=

Player-Coach Leverage: Strategic altitude maintained, Direct output at scale, Judgment-directed AI

Context directs AI → AI amplifies context → Leverage compounds

The Compounding Effect

Pure ICs have AI fluency but often lack strategic context. Product Engineers have context but delegate implementation. The Player-Coach keeps both—and the combination creates leverage that neither can match alone.

<CareerPathEvolution /> the-third-path-player-coach-at-scale ▸

The Leverage Equation Changed

How AI transformed the traditional career fork into a viable dual path

Traditional Model

Zero-Sum Tradeoff

Senior Engineer

CHOOSE

Management

Scale via others

Stop Shipping

IC Track

Direct output

Limited Leverage

AI-Enabled Model

Both Paths Viable

Senior Engineer

BOTH PATHS

Scale via AI + Others

Strategic leverage

Keep Shipping

Direct Output via AI

10x velocity

Strategic Leverage

The shift: AI compresses implementation time, freeing bandwidth for strategic work without sacrificing direct output.

The rule of thumb

Neutral zinc grays do the layout work, one purple does the talking, and the mono font marks anything machine-flavored. If a new component needs a second accent color, it's probably the wrong design.