Design System
The tokens and components behind umai-tech.com — Japanese craftsmanship (うまい) meets a terminal-flavored engineering aesthetic.
Color tokens
Defined in tailwind.config.mjs and used as text-umai-accent, bg-umai-gray-900, etc.
Dark surfaces, primary text, dark-mode background
Light surfaces, inverted text on dark
Brand purple — links, highlights, eyebrows, CTAs
Hover states, gradients
Pressed states, deep gradients
Dark-mode accents, soft highlights
Gray scale umai-gray-*
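As a rough sketch of how nested color tokens could produce the class names above: only the token names (umai-accent and the umai-gray steps that appear on this page) come from the document. The hex value is a placeholder, and `flattenTokens` is a hypothetical helper that mirrors how Tailwind joins nested color keys with hyphens.

```typescript
// Placeholder only: the real palette values are not listed on this page.
const PLACEHOLDER_HEX = "#000000";

// Hypothetical shape of the `theme.extend.colors` fragment in tailwind.config.mjs.
const umaiColors = {
  umai: {
    accent: PLACEHOLDER_HEX, // brand purple (actual value unknown)
    gray: {
      200: PLACEHOLDER_HEX,
      700: PLACEHOLDER_HEX,
      800: PLACEHOLDER_HEX,
      900: PLACEHOLDER_HEX,
    },
  },
};

type TokenTree = { readonly [key: string]: string | TokenTree };

// Joins nested keys with "-" to get the suffix used after text-, bg-, border-.
function flattenTokens(tree: TokenTree, prefix = ""): string[] {
  const names: string[] = [];
  for (const key of Object.keys(tree)) {
    const value = tree[key];
    const name = prefix ? `${prefix}-${key}` : key;
    if (typeof value === "string") {
      names.push(name);
    } else if (value !== undefined) {
      names.push(...flattenTokens(value, name));
    }
  }
  return names;
}

const tokenNames = flattenTokens(umaiColors);
```

Each resulting name pairs with a utility prefix, so `umai-gray-900` becomes `bg-umai-gray-900` or `text-umai-gray-900`.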
Typography
The harness is the product. Build agents that ship.
Primary — headings, body, UI
./marcus-cv --display --interactive
Code, terminals, eyebrows, data
うまい — skillful, delicious
Japanese brand accents
Core patterns
The recurring shapes across pages and blog components.
eyebrow label
Hover-lift card
Rounded-2xl, subtle border, lifts on hover with an accent gradient bar that fades in along the top edge.
buttons
Pills for navigation-level actions, accent rectangles for in-content actions, mono-tracking chips for metadata.
Callout
Boxed emphasis in blog posts. insight carries thesis-level statements, success carries principles and playbooks, warning flags risks, info gives context, and note holds cross-references.
type="info"
type="warning"
type="error"
type="success"
type="note"
type="insight"
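One way to keep the six type values honest is a string-literal union with a purpose map. This is a minimal sketch, not the component's actual prop definition: the names `CalloutType`, `calloutPurpose`, and `isCalloutType` are hypothetical, and the purpose given for error is an assumption because the description above does not cover it.

```typescript
// The six values shown in the gallery above.
type CalloutType = "info" | "warning" | "error" | "success" | "note" | "insight";

// Purpose of each type, taken from the Callout description.
const calloutPurpose: Record<CalloutType, string> = {
  insight: "thesis-level statements",
  success: "principles and playbooks",
  warning: "risks",
  info: "context",
  note: "cross-references",
  error: "failures (assumed; not described on this page)",
};

// Runtime guard for values coming from untyped content such as frontmatter.
function isCalloutType(value: string): value is CalloutType {
  return Object.prototype.hasOwnProperty.call(calloutPurpose, value);
}
```

Because the map is typed as `Record<CalloutType, string>`, adding a new type to the union without documenting its purpose fails type checking.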
Citation
Attributed quotes. pullquote is reserved for a post's single marquee quote.
"The model is the rented car; the harness is the road network."
Marcus Elwin • The Moat Is the Harness, Not the Model
"A supporting quote uses the quieter quote treatment."
Marcus Elwin • umai-tech.com
"Less prominent attributions use the reference treatment."
Marcus Elwin • umai-tech.com
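The "single marquee quote" rule could be checked mechanically. The sketch below assumes a hypothetical `variant` field with the three treatments named above; the real component's props may differ.

```typescript
// Hypothetical model of a citation; field names are assumptions.
type CitationVariant = "pullquote" | "quote" | "reference";

interface Citation {
  variant: CitationVariant;
  author: string;
  source: string;
  text: string;
}

// Returns every pullquote after the first, i.e. the ones that break the rule.
function extraPullquotes(citations: Citation[]): Citation[] {
  return citations.filter((c) => c.variant === "pullquote").slice(1);
}

const postCitations: Citation[] = [
  { variant: "pullquote", author: "Marcus Elwin", source: "umai-tech.com", text: "First marquee quote." },
  { variant: "quote", author: "Marcus Elwin", source: "umai-tech.com", text: "Supporting quote." },
  { variant: "pullquote", author: "Marcus Elwin", source: "umai-tech.com", text: "Second marquee quote." },
];

const violations = extraPullquotes(postCitations);
```

An empty result means the post respects the rule; anything else lists the quotes to downgrade to the quieter treatment.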
Code terminal
Dark terminal panel with mac dots, tab strip, and Shiki highlighting (GitHub Dark) — also how you consume the tokens.
<!-- Accent text + eyebrow pattern -->
<p class="text-xs font-bold uppercase tracking-[0.18em] text-umai-accent">
eyebrow label
</p>
<!-- Card surface, light + dark -->
<div class="rounded-2xl border border-umai-gray-200 bg-white
dark:border-umai-gray-700 dark:bg-umai-gray-800/80">
...
</div>
Component inventory
Every component in src/components/, scanned at build time and grouped by where it's actually used — this list can't go stale.
Site & pages ×14
Shared blog components ×4
Post agentic-commerce-the-dawn-of-a-new-ai-driven-era-for-ecommerce ×12
Post building-aura-an-agentic-llm-gateway-in-rust ×8
Post from-tweets-to-carts-twitter-ai-ecommerce ×4
Post i-built-a-multi-agent-harness-then-tested-iii ×7
Post lego-pieces-of-agentic-commerce-the-necessary-layering-of-multiple-protocols ×2
Post taste-still-matters-in-ai-software-engineering- ×10
Post the-moat-is-the-harness-not-the-model ×9
Post the-third-path-player-coach-at-scale ×6
Retired / unused ×6
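The grouping above could be derived from a map of component name to importing files. This is a sketch under assumptions: the `src/content/blog/<slug>/` path layout, the bucketing rules, and the function names are all hypothetical, and the real build-time scan is not shown on this page.

```typescript
// Assumed layout: posts live under src/content/blog/<slug>/.
const POST_PATH = /^src\/content\/blog\/([^/]+)\//;

// Picks the inventory group for one component from the files that import it.
function bucketFor(importers: string[]): string {
  if (importers.length === 0) return "Retired / unused";
  const slugs = new Set<string>();
  let firstSlug = "";
  for (const path of importers) {
    const match = POST_PATH.exec(path);
    if (!match) return "Site & pages"; // any non-post importer makes it site-level
    const slug = match[1] ?? "";
    if (slugs.size === 0) firstSlug = slug;
    slugs.add(slug);
  }
  return slugs.size === 1 ? `Post ${firstSlug}` : "Shared blog components";
}

// Groups component names (sorted) under their bucket label.
function groupInventory(usage: Record<string, string[]>): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const name of Object.keys(usage).sort()) {
    const bucket = bucketFor(usage[name] ?? []);
    const list = groups[bucket] ?? [];
    list.push(name);
    groups[bucket] = list;
  }
  return groups;
}

// Illustrative input; the file paths are made up.
const inventory = groupInventory({
  HarnessDiagram: ["src/content/blog/the-moat-is-the-harness-not-the-model/index.mdx"],
  Callout: [
    "src/content/blog/the-moat-is-the-harness-not-the-model/index.mdx",
    "src/content/blog/the-third-path-player-coach-at-scale/index.mdx",
  ],
  Header: ["src/layouts/Base.astro"],
  OldWidget: [],
});
```

Deriving the buckets from import data at build time is what keeps a list like this from drifting away from the code.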
Live gallery
Every self-contained component, rendered live — expand to inspect. Prop-driven components (MoatRadar, FlywheelDiagram, Timeline, …) need post data, so they live in the inventory above and render in their posts.
<SimplifiedAgentDiagram /> the-moat-is-the-harness-not-the-model ▸
What do we mean by an agent + harness?
- Instructions
- Model
- Tools
- Memory
A simplified view
Everything around the model — the instructions it follows, the tools it can call, and the memory it keeps between turns.
<HarnessDiagram /> the-moat-is-the-harness-not-the-model ▸
Click any component to explore how it creates defensibility
<ModelCommoditizationChart /> the-moat-is-the-harness-not-the-model ▸
The Model Commoditization Curve
Frontier model pricing collapse: November 2022 to April 2026
Chart shows output token pricing trends for frontier LLMs from November 2022 to April 2026.
Models tracked: OpenAI (GPT-3.5, GPT-4, GPT-4 Turbo, GPT-4o, GPT-5.2, GPT-5.5), Anthropic (Claude 2, Claude 3 Opus, Claude 3.5 Sonnet, Claude Opus 4, Claude Opus 4.7), Google (Gemini 1.0, Gemini 2.0 Pro, Gemini 3.1 Pro), and DeepSeek (R1).
Price index methodology: Normalized to GPT-4 March 2023 pricing as baseline (index = 100). Each data point represents the average output token price for comparable frontier-tier models at time of release, adjusted for relative performance on standard benchmarks (MMLU, HumanEval, GSM8K, SWE-bench).
Key inflection points: DeepSeek R1 (Dec 2024) triggered a 43% single-quarter price drop by undercutting proprietary models by ~90%; Claude 3.5 Sonnet demonstrated mid-tier models matching flagship performance at lower cost; GPT-4 Turbo introduced tiered pricing; April 2026 saw intense competition with GPT-5.5 ($5/$30 per million tokens) and Claude Opus 4.7 ($5/$25) both achieving near-parity pricing at frontier performance levels.
Sources: OpenAI API Pricing (GPT-5.5 launch April 23, 2026), Anthropic API Pricing (Opus 4.7 launch April 16, 2026), Google AI Pricing, DeepSeek Pricing Docs, BenchLM.ai LLM Pricing Trends, Epoch AI Price Performance Analysis (2025-2026), Menlo Ventures Enterprise API Market Share Report (Mid-2025).
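The normalization step in the methodology reduces to one ratio. The sketch below covers only that step and leaves out the benchmark-based performance adjustment, so its results will not match the chart. The baseline of $60 per million output tokens is an assumed figure for illustration, and `priceIndex` is a hypothetical name.

```typescript
// Index = price relative to the baseline, scaled so the baseline equals 100.
function priceIndex(pricePerMTok: number, baselinePerMTok: number): number {
  if (baselinePerMTok <= 0) {
    throw new RangeError("baseline price must be positive");
  }
  return (pricePerMTok / baselinePerMTok) * 100;
}

// Assumption: baseline output price in USD per million tokens.
const ASSUMED_BASELINE = 60;

// Output prices quoted above: $30 (GPT-5.5) and $25 (Claude Opus 4.7).
const gpt55Index = priceIndex(30, ASSUMED_BASELINE); // 50
const opus47Index = priceIndex(25, ASSUMED_BASELINE);
```

Under that assumed baseline, a $30 output price lands at an unadjusted index of 50, half the baseline.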
<VerticalizationHeatmap /> the-moat-is-the-harness-not-the-model ▸
Platform Giants: Verticalization Risk Matrix
How OpenAI, Anthropic, and Google are entering vertical markets (April 2026)
| Vertical | OpenAI ($852B valuation) | Anthropic (Claude Opus 4.7) | Google (Gemini 3.1 Pro) |
|---|---|---|---|
| Legal | | | |
| Design | | | |
| Coding | | | |
| Healthcare | | | |
| Finance | | | |
| Commerce | | | |
Methodology: Risk scores (0-100) assess the threat level each platform play poses to incumbent vertical SaaS providers, based on: market reaction (stock drops, market cap impact), strategic investment size, product maturity, distribution advantage, and timing. Click any cell to view detailed sources and analysis. Scores reflect verticalization risk as of April 2026.
<AuraArchitectureDiagram /> building-aura-an-agentic-llm-gateway-in-rust ▸
<LoadTestChart /> building-aura-an-agentic-llm-gateway-in-rust ▸
Gateway load test — 1,000 requests, 1–5 tool calls
Aura vs LiteLLM, Portkey, Helicone, OpenRouter, Bifrost
Gateway overhead
lower = better. Pure gateway-added latency, with the provider round-trip subtracted.
- Aura 4 ms
- Bifrost 3 ms
- Helicone 6 ms
- Portkey 22 ms
- OpenRouter 30 ms
- LiteLLM 58 ms
p50 latency
lower = better. Median end-to-end request latency.
- Aura 312 ms
- Bifrost 308 ms
- Helicone 318 ms
- Portkey 345 ms
- OpenRouter 360 ms
- LiteLLM 395 ms
p99 latency
lower = better. Tail latency: the slowest 1% of requests.
- Aura 612 ms
- Bifrost 605 ms
- Helicone 622 ms
- Portkey 690 ms
- OpenRouter 720 ms
- LiteLLM 810 ms
Sustained throughput
higher = better. Requests per second under steady load.
- Aura 1,450 RPS
- Bifrost 1,520 RPS
- Helicone 1,380 RPS
- Portkey 920 RPS
- OpenRouter 840 RPS
- LiteLLM 540 RPS
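For readers who want to reproduce metrics like these from raw samples, here is a minimal sketch. It assumes each request records a total latency and the provider's round-trip time, and it uses the nearest-rank percentile method; the field and function names are hypothetical, and the actual load-test tooling is not described on this page.

```typescript
// One measured request; both fields are in milliseconds.
interface RequestSample {
  totalMs: number;
  providerMs: number;
}

// Gateway overhead: end-to-end latency minus the provider round-trip.
function gatewayOverheads(samples: RequestSample[]): number[] {
  return samples.map((s) => s.totalMs - s.providerMs);
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new RangeError("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1] as number;
}

// Sustained throughput: completed requests divided by wall-clock seconds.
function requestsPerSecond(completed: number, elapsedSeconds: number): number {
  if (elapsedSeconds <= 0) throw new RangeError("elapsed time must be positive");
  return completed / elapsedSeconds;
}
```

With this method, p50 of the values 1 through 10 is 5 and p99 is 10, which shows how a few slow requests dominate the tail figure.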
<MultiAgentPatternGrid /> i-built-a-multi-agent-harness-then-tested-iii ▸
multi-agents-team
Nine architecture pages, one event contract
Each card links to the live architecture view on the playground. The point isn’t that one pattern wins — it’s that the same task can move through different coordination shapes and still stream back through the same UI.
Orchestrated
Central coordinator
plan → delegate → synthesize
A coordinator routes work to research, writer, and editor specialists.
Choreographed
Peer message bus
round-robin negotiation
Backend, frontend, and design peers coordinate through shared messages.
Hierarchical
Dynamic agent tree
lead → sub-agents → rollup
A lead spawns depth-capped sub-agents and synthesizes their results.
Evaluator–Optimizer
Critique loop
draft → score → revise
A generator improves a draft until a critic accepts the quality bar.
Debate
Adversarial panel
argue → rebut → judge
Opposing agents argue their case before a judge synthesizes the answer.
Blackboard
Shared workspace
select agent → write board
A controller chooses which specialist updates the shared board next.
Market
Auction board
post task → bid → award
Agents bid on work; the dispatcher awards tasks to the strongest fit.
Self-Consistency
Parallel sampling
sample N → select/merge
Several attempts run in parallel and a judge selects or merges the best.
Swarm
Shared scratchpad
many passes → convergence
Identical agents build on a shared scratchpad over capped rounds.
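To make "one event contract" concrete, here is a sketch of the Evaluator–Optimizer loop (draft → score → revise) reporting through a single event type. The `RunEvent` shape and every name in it are assumptions for illustration, not the playground's actual contract, and the generator and critic are plain synchronous functions standing in for model calls.

```typescript
// Hypothetical shared event type that any pattern could stream to the UI.
type RunEvent =
  | { kind: "draft"; round: number; text: string }
  | { kind: "score"; round: number; score: number }
  | { kind: "done"; round: number; accepted: boolean };

interface LoopOptions {
  threshold: number; // critic score required to accept a draft
  maxRounds: number; // hard cap so the loop always terminates
}

// Revises a draft until the critic's score meets the threshold or rounds run out.
function evaluatorOptimizer(
  generate: (previous: string | null) => string,
  critique: (draft: string) => number,
  options: LoopOptions,
  emit: (event: RunEvent) => void,
): string {
  let draft: string | null = null;
  for (let round = 1; round <= options.maxRounds; round++) {
    draft = generate(draft);
    emit({ kind: "draft", round, text: draft });
    const score = critique(draft);
    emit({ kind: "score", round, score });
    if (score >= options.threshold) {
      emit({ kind: "done", round, accepted: true });
      return draft;
    }
  }
  emit({ kind: "done", round: options.maxRounds, accepted: false });
  return draft ?? "";
}

// Toy run: each revision adds a character and the critic scores by length.
const events: RunEvent[] = [];
const finalDraft = evaluatorOptimizer(
  (previous) => (previous ?? "") + "x",
  (draft) => draft.length,
  { threshold: 3, maxRounds: 5 },
  (event) => { events.push(event); },
);
```

The other eight patterns could emit the same event kinds from different control flow, which is what would let one UI render all of them.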
<InAppHarnessArchitecture /> i-built-a-multi-agent-harness-then-tested-iii ▸
original harness architecture
The whole first harness lived inside one app
Click a block to inspect the job it owns — the detail panel below the diagram updates in place.
mat.umai-tech.com
Browser chat UI
The public app owns the mode selector, provider settings, local chat history, live timeline, tree visualizations, and final rich summaries.
Responsibility
User-facing control plane
<HarnessJobStack /> i-built-a-multi-agent-harness-then-tested-iii ▸
One app, nine jobs
Events to UI
Tools
Policy
Budget
Sessions
Human approval
Observability
Deployment
<HarnessComparisonCards /> i-built-a-multi-agent-harness-then-tested-iii ▸
Thin app harness vs worker substrate
<HarnessLandscapeGrid /> i-built-a-multi-agent-harness-then-tested-iii ▸
Five answers to where the harness jobs live
In-process framework
Thin SDK
Durable-execution substrate
Session platform
Worker bus
<HarnessImprovementCards /> i-built-a-multi-agent-harness-then-tested-iii ▸
The primitives are strong; the defaults can get sharper
A first-class run contract
Browser-friendly event reads
Worker lifecycle guardrails
Production HA recipes
Governance starter packs
Replay and evals as primitives
<LeverageFormula /> the-third-path-player-coach-at-scale ▸
The Player-Coach Leverage Formula
Why context-rich leaders get disproportionate returns from AI tools
Deep Context
AI Fluency
Player-Coach Leverage
The Compounding Effect
Pure ICs have AI fluency but often lack strategic context. Product Engineers have context but delegate implementation. The Player-Coach keeps both—and the combination creates leverage that neither can match alone.
<CareerPathEvolution /> the-third-path-player-coach-at-scale ▸
The Leverage Equation Changed
How AI transformed the traditional career fork into a viable dual path
Traditional Model
Zero-Sum Tradeoff
AI-Enabled Model
Both Paths Viable
The shift: AI compresses implementation time, freeing bandwidth for strategic work without sacrificing direct output.