HAPPI/1.2 — Possibilities
Protocol, possibilities, and the path to AI as infrastructure.
What HAPPI Actually Is
HAPPI is a request/response protocol operating at the wire level. Every call is one JSON envelope; every response is a stream of NDJSON events.
Core Design
Input — the envelope:
{
"v": "happi/1.0",
"id": "req-001",
"cmd": "anthropic.messages.create",
"args": [{ "model": "claude-opus-4-7",
"messages": [{ "role": "user", "content": "Your question" }] }],
"flags": { "provider": "anthropic/claude-opus-4-7", "max_tokens": 4096 },
"auth": { "scheme": "apikey", "token": "env:ANTHROPIC_API_KEY" }
}
Output — the event stream:
{"type":"started", "request_id":"req-001"}
{"type":"delta", "request_id":"req-001", "text":"The answer "}
{"type":"delta", "request_id":"req-001", "text":"continues here..."}
{"type":"completed", "request_id":"req-001"}
The Seven Event Types — and Only Seven
| Event | Meaning |
|---|---|
| started | Inference began |
| delta | Token(s) streamed |
| tool_call | Model requested a tool invocation |
| tool_result | Tool execution result returned |
| sub_request | Model dispatching a child HAPPI envelope |
| completed | Response finished |
| error | Failure with classified cause |
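A minimal Python sketch of the client side of this exchange, assuming only the envelope fields and event shapes shown above. The helper names `build_envelope` and `collect_text` are illustrative and not part of any HAPPI runtime, and the sample stream is hard-coded so the example runs without one.

```python
import json

def build_envelope(request_id, cmd, args, flags=None, auth=None):
    """Serialise one request as a single-line JSON envelope."""
    envelope = {"v": "happi/1.0", "id": request_id, "cmd": cmd, "args": args}
    if flags:
        envelope["flags"] = flags
    if auth:
        envelope["auth"] = auth
    return json.dumps(envelope)

def collect_text(ndjson_lines):
    """Concatenate delta text until the completed event; raise on an error event."""
    parts = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event["type"] == "delta":
            parts.append(event["text"])
        elif event["type"] == "error":
            raise RuntimeError(f"request {event.get('request_id')} failed: {event}")
        elif event["type"] == "completed":
            break
    return "".join(parts)

# Hard-coded copy of the example stream, standing in for a runtime's stdout.
sample_stream = [
    '{"type":"started", "request_id":"req-001"}',
    '{"type":"delta", "request_id":"req-001", "text":"The answer "}',
    '{"type":"delta", "request_id":"req-001", "text":"continues here..."}',
    '{"type":"completed", "request_id":"req-001"}',
]
answer = collect_text(sample_stream)
print(answer)
```

Against a real runtime, `ndjson_lines` would be the stdout of the runtime process read line by line; the parsing loop stays the same.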
Auth Portability
The auth.token field accepts three forms. The auth field is automatically
scrubbed from all event stream logs — the same envelope file runs identically on a
developer's MacBook and in a GitHub Actions runner.
| Form | Source | When Used |
|---|---|---|
| env:VAR_NAME | Shell environment variable | CI/CD, GitHub Actions |
| keychain:SERVICE_NAME | macOS Keychain | Developer workstation |
| Literal string | Inline value | Scripted, short-lived |
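One way a runtime might resolve the three token forms and scrub credentials before logging, sketched in Python. The function names and the `"[scrubbed]"` placeholder are assumptions for illustration; the keychain branch shells out to the macOS `security` command-line tool and therefore only works on macOS.

```python
import copy
import os
import subprocess

def resolve_token(token):
    """Resolve an auth.token value to the secret it refers to."""
    if token.startswith("env:"):
        return os.environ[token[len("env:"):]]
    if token.startswith("keychain:"):
        service = token[len("keychain:"):]
        # macOS only: prints the stored password for the given service name.
        result = subprocess.run(
            ["security", "find-generic-password", "-s", service, "-w"],
            check=True, capture_output=True, text=True,
        )
        return result.stdout.strip()
    return token  # literal string, used as-is

def scrub_auth(envelope):
    """Return a copy of the envelope that is safe to write to logs."""
    safe = copy.deepcopy(envelope)
    if "auth" in safe:
        safe["auth"] = "[scrubbed]"  # placeholder format is an assumption
    return safe
```

Because resolution happens at dispatch time, the envelope file itself never needs to contain the secret when the `env:` or `keychain:` forms are used.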
Transport Layer
HAPPI is transport-agnostic. The reference transport is stdio. No transport
adds semantics — a client written for stdio works against HTTP/SSE without modification.
| Transport | Status | Notes |
|---|---|---|
| stdio | Canonical (v1.0) | Reference implementation |
| HTTP/SSE | Supported (v1.0) | Server-sent events |
| Unix socket | Supported (v1.0) | Low-latency local IPC |
| WebSocket | Planned (v1.1) | Bidirectional streaming |
| MCP | Planned (v1.1) | Claude tool integration |
The Meta-Move
A single .happi.md file is simultaneously valid Markdown, executable bash,
a HAPPI/1.0 envelope, and an OpenAPI 3.1 schema. This is the design property that
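changes everything. It is achieved with a bash heredoc trick:
#!/usr/bin/env bash
# Everything above the heredoc is bash. Everything inside is JSON.
# The delimiter is unquoted so that bash expands $1 into the prompt. The argument
# is not JSON-escaped: it must not contain double quotes, backslashes, or newlines.
cat <<HAPPI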
{
"v": "happi/1.0",
"id": "summarise-1",
"cmd": "anthropic.messages.create",
"args": [{ "model": "claude-opus-4-7",
"messages": [{ "role": "user", "content": "$1" }] }],
"auth": { "scheme": "subscription" }
}
HAPPI
# Run: bash summarise.happi.md "Summarise this contract clause: ..."
# | hal --happi-api
Version-Controlled AI Behaviour
git log prompts/summarise.happi.md
# commit abc123: switch to gemini-flash for latency
# commit def456: add brand-voice system prompt
# commit 789abc: cap max_tokens 2048 -> 4096 for long clauses
AI behaviour changes are reviewed, diffed, reverted, blamed, and bisected using the same tools as code. Prompt engineering becomes software engineering.
Documentation That Cannot Go Stale
A README can lie. An OpenAPI spec can drift from the code. A HAPPI .md file
cannot — it IS the running artifact. The Markdown is not a description of the system;
it is the system reading itself aloud.
Orchestration at the Wire Level
HAPPI's sub_request event type makes multi-model orchestration a wire-level
property, not an application-level concern.
Sequential Chains
Research, summarise, and critique across different providers:
bash research.happi.md | hal --happi-api \
| bash summarise.happi.md | hal --happi-api \
| bash critique.happi.md | hal --happi-api \
> verdict.ndjson
| Stage | Recommended Provider | Reason |
|---|---|---|
| Research | claude-opus-4-7 | High context, deep reasoning |
| Summarise | groq/llama-3.3-70b | Fast, cheap, sufficient |
| Critique | google/gemini-2.5-pro | Different vendor — anti-homogeneity |
| Verdict | Council deliberation (§3.3) | Stakes justify the cost |
Parallel Fan-Out
Same query dispatched to N providers simultaneously; responses merged:
parallel -j4 "hal --happi-api < {}" \
::: claude.json gemini.json llama.json grok.json \
| jq -s 'group_by(.request_id)'
- Quorum voting: majority verdict before acting on any output.
- Diversity sampling: one output per provider; a human picks the best.
- Latency tournament: first to complete wins; the others are cancelled.
- Benchmarking: cost per token, quality, and latency compared in one run.
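The merge step of the fan-out can be sketched in Python as a rough equivalent of the `jq -s 'group_by(.request_id)'` stage. The request ids and event contents below are made up for illustration, and the helper names are not part of HAPPI.

```python
import json
from collections import defaultdict

def group_by_request(ndjson_lines):
    """Group interleaved events by request_id, keeping arrival order within each group."""
    groups = defaultdict(list)
    for line in ndjson_lines:
        if line.strip():
            event = json.loads(line)
            groups[event["request_id"]].append(event)
    return dict(groups)

def text_per_request(groups):
    """Join the delta text of each request into one answer per provider call."""
    return {
        request_id: "".join(e["text"] for e in events if e["type"] == "delta")
        for request_id, events in groups.items()
    }

# Hypothetical interleaved output from two providers answering the same query.
interleaved = [
    '{"type":"started","request_id":"claude-1"}',
    '{"type":"started","request_id":"gemini-1"}',
    '{"type":"delta","request_id":"gemini-1","text":"Yes"}',
    '{"type":"delta","request_id":"claude-1","text":"Yes, "}',
    '{"type":"delta","request_id":"claude-1","text":"with caveats"}',
    '{"type":"completed","request_id":"gemini-1"}',
    '{"type":"completed","request_id":"claude-1"}',
]
answers = text_per_request(group_by_request(interleaved))
```

Each of the patterns listed above (voting, sampling, tournament, benchmarking) would then operate on the per-request answers rather than on the raw stream.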
Deliberation Councils (Quorum)
{
"v": "happi/1.0",
"id": "council-1",
"cmd": "happi.council",
"flags": {
"providers": ["claude-opus-4-7","gemini-2.5-pro","llama-3.3-70b","grok-3","deepseek"],
"quorum": 3,
"rule": "majority",
"security_override": true
},
"args": ["Review this PR for security vulnerabilities."]
}
Security issues block on any single provider's report. Other issues require quorum. This is a production deliberation pattern currently in use in the authors' internal review pipelines on every pull request.
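The decision rule described here can be sketched in a few lines of Python. The report shape (`provider`, `issue`) and the returned dictionary are assumptions made for the example; the source specifies only the `quorum` and `security_override` flags and the rule itself.

```python
def council_verdict(reports, quorum, security_override=True):
    """Apply the council rule to one report per provider.

    Each report is a dict like {"provider": "grok-3", "issue": None | "security" | "other"}.
    """
    security_reporters = [r["provider"] for r in reports if r["issue"] == "security"]
    if security_override and security_reporters:
        # A single security report is enough to block.
        return {"decision": "block", "reason": "security", "providers": security_reporters}

    flagged = [r["provider"] for r in reports if r["issue"] is not None]
    if len(flagged) >= quorum:
        return {"decision": "block", "reason": "quorum", "providers": flagged}
    return {"decision": "pass", "reason": "no quorum", "providers": flagged}

# Hypothetical reports from the five providers in the envelope above.
reports = [
    {"provider": "claude-opus-4-7", "issue": None},
    {"provider": "gemini-2.5-pro", "issue": "other"},
    {"provider": "llama-3.3-70b", "issue": None},
    {"provider": "grok-3", "issue": "security"},
    {"provider": "deepseek", "issue": None},
]
verdict = council_verdict(reports, quorum=3)
```

With `security_override` disabled, the same reports would pass, because only two of five providers flagged anything and the quorum is three.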
Red-Team / Devil's Advocate
# Stage 1: proposal
echo "Design a token refresh strategy for the auth module." \
| bash propose.happi.md | hal --happi-api > proposal.ndjson
# Stage 2: adversarial review (DIFFERENT vendor mandatory)
cat proposal.ndjson \
| bash falsify.happi.md | hal --happi-api > critique.ndjson
Cost-Quality Routing (Triage)
| Complexity Class | Provider | Approx. Cost |
|---|---|---|
| trivial | groq/llama-3.1-8b | ~$0.00005/query |
| moderate | gemini-2.5-flash | ~$0.001/query |
| complex | claude-opus-4-7 | ~$0.015/query |
| critical | Council (5 providers) | ~$0.05/query |
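To make the cost argument concrete, the following Python sketch routes by complexity class and compares a triaged workload with sending everything to the most capable single model. The routing table copies the figures above; the workload mix of 1,000 queries is hypothetical, and how a query gets its complexity class is left out because the source does not specify it.

```python
import math

# (provider, approximate cost per query in USD), copied from the triage table.
ROUTES = {
    "trivial": ("groq/llama-3.1-8b", 0.00005),
    "moderate": ("gemini-2.5-flash", 0.001),
    "complex": ("claude-opus-4-7", 0.015),
    "critical": ("council", 0.05),
}

def route(complexity_class):
    """Return the provider for a complexity class."""
    provider, _cost = ROUTES[complexity_class]
    return provider

def estimated_cost(mix):
    """Total cost in USD for a mapping of complexity class -> number of queries."""
    return sum(ROUTES[cls][1] * count for cls, count in mix.items())

# Hypothetical workload of 1,000 queries.
mix = {"trivial": 700, "moderate": 200, "complex": 90, "critical": 10}
triaged = estimated_cost(mix)
all_complex = ROUTES["complex"][1] * sum(mix.values())
print(f"triaged: ${triaged:.3f}  all-complex: ${all_complex:.2f}")
```

For this mix the triaged total is 0.035 + 0.2 + 1.35 + 0.5 = 2.085 USD, against 15 USD when every query goes to the complex-class provider.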
Provider Fallback (Sentinel)
{
"flags": {
"provider": "anthropic/claude-opus-4-7",
"fallback": ["google/gemini-2.5-pro", "groq/llama-3.3-70b"],
"fallback_triggers": ["rate_limit", "timeout", "5xx"]
}
}
The client sees a continuous stream with a single completed event. The provider swap
is invisible unless the client reads the provider field on each event.
ENTER Konsult's internal runtime (in development and evaluation) implements Sentinel in
~200 LOC. Every application using HAPPI inherits this for free.
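A compact Python sketch of the fallback behaviour, under stated assumptions: adapters are plain callables that return text chunks or raise a `ProviderError` carrying the trigger name, and the `cause` field on error events is invented for the example. It is not the Sentinel implementation, only an illustration of the contract that the client sees one stream and one completed event.

```python
class ProviderError(Exception):
    """Raised by an adapter; trigger is e.g. "rate_limit", "timeout", or "5xx"."""
    def __init__(self, trigger):
        super().__init__(trigger)
        self.trigger = trigger

def stream_with_fallback(request_id, flags, adapters):
    """Yield one continuous event stream, swapping providers on configured triggers."""
    providers = [flags["provider"], *flags.get("fallback", [])]
    triggers = set(flags.get("fallback_triggers", []))
    yield {"type": "started", "request_id": request_id}
    for provider in providers:
        try:
            # Buffer the provider's output so a failed provider never leaks partial
            # text to the client; a streaming runtime would need another strategy.
            chunks = list(adapters[provider](request_id))
        except ProviderError as exc:
            if exc.trigger in triggers:
                continue  # try the next provider silently
            yield {"type": "error", "request_id": request_id, "cause": exc.trigger}
            return
        for text in chunks:
            yield {"type": "delta", "request_id": request_id, "provider": provider, "text": text}
        yield {"type": "completed", "request_id": request_id, "provider": provider}
        return
    yield {"type": "error", "request_id": request_id, "cause": "all_providers_failed"}

# Hypothetical adapters: the primary is rate limited, the fallbacks are healthy.
def rate_limited(request_id):
    raise ProviderError("rate_limit")

def healthy(request_id):
    return ["Hello ", "world"]

flags = {
    "provider": "anthropic/claude-opus-4-7",
    "fallback": ["google/gemini-2.5-pro", "groq/llama-3.3-70b"],
    "fallback_triggers": ["rate_limit", "timeout", "5xx"],
}
adapters = {
    "anthropic/claude-opus-4-7": rate_limited,
    "google/gemini-2.5-pro": healthy,
    "groq/llama-3.3-70b": healthy,
}
events = list(stream_with_fallback("req-001", flags, adapters))
```

The only trace of the swap in `events` is the `provider` value on the delta and completed events.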
A Primitive Library
Small, single-purpose .happi.md files compose into complex pipelines.
Each primitive is under 50 lines. Composition is bash. The "agent framework" is
/bin/sh.
primitives/
  research.happi.md # deep investigation, high-context provider
  summarise.happi.md # compress to N bullets
  critique.happi.md # adversarial review
  extract.happi.md # structured data extraction
  classify.happi.md # route by category
  score.happi.md # numerical evaluation
  translate.en-za.happi.md # SA English localisation
  compose.happi.md # creative synthesis
Domain-Specific Overlays
base/
  research.happi.md # generic template
domains/
  legal/research.happi.md # + SA POPI/common law compliance prompt
  medical/research.happi.md # + HIPAA routing policy, local provider only
  financial/research.happi.md # + audit trail flags, SOX-compatible output
Full Multi-Turn Agent Workflow
#!/usr/bin/env bash
# Due-diligence pipeline: 5-stage cross-provider workflow
TARGET="$1"
# W1: Research (deep, expensive)
echo "Research $TARGET -- financials, litigation, leadership." \
| bash primitives/research.happi.md \
| hal --happi-api \
| tee /tmp/research.ndjson \
| bash primitives/extract.happi.md \
| hal --happi-api > /tmp/facts.ndjson
# W2: Red-team the research (different vendor)
cat /tmp/research.ndjson \
| bash primitives/critique.happi.md \
| hal --happi-api > /tmp/critique.ndjson
# W3: Legal clause scan
cat /tmp/facts.ndjson \
| bash domains/legal/clause-check.happi.md \
| hal --happi-api > /tmp/legal.ndjson
# W4: Council deliberation
cat /tmp/research.ndjson /tmp/critique.ndjson /tmp/legal.ndjson \
| bash council/deliberate.happi.md \
| hal --happi-api > /tmp/verdict.ndjson
# W5: Human-readable summary
cat /tmp/verdict.ndjson \
| bash primitives/summarise.happi.md \
| hal --happi-api
Capability Registry (The Marketplace Seed)
happi install @codetonight/legal-review-za # SA jurisdiction
happi install @openai-community/code-critique # any vendor's community
happi install @internal/brand-voice # your org's private primitives
happi run legal-review-za --args "contract.pdf"
Files are shared across organisations without SDK lock-in. An adopting organisation runs
any published .happi.md capability — for example, a shared
morning-brief.happi.md — via its own runtime against its own provider keys.
Zero publisher SDK dependency.
Native Agentic Recursion
sub_request is an event type in the response stream. The model — not the
application, not the framework — emits a child HAPPI envelope as part of its response.
The runtime dispatches it; child events inline into the parent stream.
{"type":"delta", "request_id":"main", "text":"I need to verify this claim..."}
{"type":"sub_request", "request_id":"main",
"envelope": {"v":"happi/1.0","cmd":"gemini.generate",
"args":["Cross-check: is it true that..."]}}
{"type":"started", "request_id":"sub-1"}
{"type":"delta", "request_id":"sub-1", "text":"Cross-model result:"}
{"type":"completed", "request_id":"sub-1"}
{"type":"delta", "request_id":"main", "text":"Confirmed. Therefore..."}
{"type":"completed", "request_id":"main"}
Why This Dissolves Orchestration Frameworks
Every "agent framework" that exists today (LangGraph, AutoGen, CrewAI, Agno,
PydanticAI) exists to answer one question: how does one LLM call trigger another
LLM call with shared state? sub_request answers that at the wire level.
| Framework | Recursion | Cross-language | Cross-provider |
|---|---|---|---|
| LangGraph | Python-only | No | Via adapters |
| AutoGen | Python-only | No | Via adapters |
| CrewAI | Python-only | No | Via adapters |
| HAPPI sub_request | Protocol-level | Yes | Native |
Recursive Agent Hierarchies
main-request
└── sub_request: research (claude-opus-4-7)
└── sub_request: fact-check (gemini-2.5-pro)
└── sub_request: citation-verify (llama-3.3-70b)
A depth cap prevents infinite recursion; the runtime enforces it in the envelope parser.
Tree-of-Thought is native: N parallel sub_requests at each branch; parent
aggregates verdicts. No library needed. The protocol carries it.
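How a runtime might inline child streams and enforce the depth cap, sketched in Python. The `dispatch` callable stands in for a provider adapter, `MAX_DEPTH` and the `cause` field are assumed values, and feeding the child's result back to the parent model is deliberately left out.

```python
MAX_DEPTH = 3  # hypothetical cap; the source does not state a number

def run(envelope, dispatch, depth=0):
    """Yield the event stream for an envelope, inlining any child streams."""
    if depth > MAX_DEPTH:
        yield {"type": "error", "request_id": envelope.get("id"), "cause": "depth_exceeded"}
        return
    for event in dispatch(envelope):
        yield event
        if event["type"] == "sub_request":
            # The child stream is emitted inline, right after the sub_request event.
            yield from run(event["envelope"], dispatch, depth + 1)

# Hypothetical adapter that replays the example stream shown earlier.
def fake_dispatch(envelope):
    if envelope["id"] == "main":
        child = {"v": "happi/1.0", "id": "sub-1", "cmd": "gemini.generate",
                 "args": ["Cross-check: is it true that..."]}
        return [
            {"type": "delta", "request_id": "main", "text": "I need to verify this claim..."},
            {"type": "sub_request", "request_id": "main", "envelope": child},
            {"type": "delta", "request_id": "main", "text": "Confirmed. Therefore..."},
            {"type": "completed", "request_id": "main"},
        ]
    return [
        {"type": "started", "request_id": "sub-1"},
        {"type": "delta", "request_id": "sub-1", "text": "Cross-model result:"},
        {"type": "completed", "request_id": "sub-1"},
    ]

trace = [(e["type"], e["request_id"]) for e in run({"id": "main"}, fake_dispatch)]
```

A Tree-of-Thought style fan-out would replace the single recursive call with several child dispatches per branch, which this sequential sketch does not attempt.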
Cross-Language Agent Communication
python3 research_agent.py |   # Python agent builds research envelope
  hal --happi-api |           # runtime dispatches
  go run summarise_agent.go | # Go agent parses NDJSON
  hal --happi-api |           # runtime again
  node critique_agent.js \
  > verdict.ndjson            # Node.js agent writes the final verdict
Each agent is in a different language. Each speaks HAPPI. No cross-language RPC. The wire format is the interface.
Legal, Medical, and Financial AI
Legal AI
legal-review/
  base.happi.md
  jurisdictions/
    za/clause-check.happi.md # SA POPI + common law
    eu/gdpr-check.happi.md # GDPR Article 44 residency
    us/ccpa-check.happi.md # California privacy law
  handlers/
    conflict-of-laws.happi.md
    citation-verify.happi.md
    council-review.happi.md # quorum deliberation for high-stakes
Audit trail by design. Envelope flag audit: true captures the full event
stream to the compliance store. Every tool_call is logged. Every provider response is
retained. Subpoena-ready without additional instrumentation.
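# Assumes pdfextract prints one clause per line. Reading line by line avoids the
# word-splitting that a for loop over $(...) would apply to each clause.
pdfextract contract.pdf --clauses | while IFS= read -r CLAUSE; do
  echo "$CLAUSE" \
  | bash legal-review/jurisdictions/za/clause-check.happi.md \
  | hal --happi-api \
  >> clause-review.ndjson
done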
Medical AI (HIPAA-Safe Routing)
{
"flags": {
"provider": "$ROUTE",
"data_class": "phi",
"routing_policy": "$HIPAA_POLICY"
},
"args": ["Analyse patient chart..."]
}
The routing policy maps data_class: phi to approved on-premises providers only.
Cloud providers are rejected at the envelope layer — no application bug can leak PHI.
The guardrail is in the protocol, not in application code that can be bypassed.
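A sketch of such an envelope-layer check in Python, assuming the routing policy resolves to a mapping from data class to an allow-list of providers. The policy contents, the exception name, and the function name are illustrative only.

```python
class PolicyViolation(Exception):
    """Raised when an envelope targets a provider its data class does not allow."""

# Hypothetical policy: PHI may only go to an on-premises provider.
HIPAA_POLICY = {"phi": {"ollama/gemma4:26b"}}

def check_routing(envelope, policy):
    """Reject the envelope before dispatch if its provider is not approved."""
    flags = envelope.get("flags", {})
    data_class = flags.get("data_class")
    allowed = policy.get(data_class)
    if allowed is not None and flags.get("provider") not in allowed:
        raise PolicyViolation(
            f"provider {flags.get('provider')!r} is not approved for data_class {data_class!r}"
        )
    return envelope

on_prem = {"flags": {"provider": "ollama/gemma4:26b", "data_class": "phi"},
           "args": ["Analyse patient chart..."]}
cloud = {"flags": {"provider": "anthropic/claude-opus-4-7", "data_class": "phi"},
         "args": ["Analyse patient chart..."]}

check_routing(on_prem, HIPAA_POLICY)  # passes
try:
    check_routing(cloud, HIPAA_POLICY)
    rejected = False
except PolicyViolation:
    rejected = True
```

Because the check runs on the envelope rather than inside application code, every client of the runtime is subject to the same rule.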
Financial AI (SOX-Compatible Audit)
The NDJSON event stream IS the audit log:
| Event | SOX-Relevant Field |
|---|---|
| started | Who queried, when, which provider |
| delta | Full token-level model output |
| tool_call | Every external call (price feed, order entry, risk API) |
| tool_result | Full payload of each call |
| completed | Final verdict, total cost, latency |
Protocol-Level Governance
Provider Failover Without Code Changes
{
"flags": {
"provider": "anthropic/claude-opus-4-7",
"fallback": ["google/gemini-2.5-pro", "groq/llama-3.3-70b"],
"fallback_triggers": ["rate_limit", "timeout", "5xx"]
}
}
Incident response: edit the config file and send SIGHUP to the runtime. No redeployment
is needed, and the application is untouched.
Data Residency
{ "flags": { "data_residency": "eu-west" } }
Runtime routes to providers with EU-only inference. Every inference logs provider region in the event stream. GDPR Article 44 compliance becomes a config rule, not an application audit.
Multi-Tenancy and Cost Governance
{
"auth": { "scheme": "keychain", "token": "keychain:tenant-acme-claude-key" },
"flags": {
"max_tokens": 2048,
"max_cost_usd": 0.05,
"budget_id": "team-legal-2026-q2"
}
}
Per-request credential routing at protocol level — Tenant A's keys never touch
Tenant B's requests. Runtime rejects if budget exceeded. Monthly reports aggregate
completed events across one policy file, all providers.
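Budget enforcement of this kind could look like the following Python sketch. The ledger class, the budget limit, and the `cost_usd` field on completed events are assumptions; the source states only that the runtime rejects requests once the budget is exceeded and that completed events carry total cost.

```python
from collections import defaultdict

class BudgetExceeded(Exception):
    """Raised when admitting a request could push a budget past its limit."""

class BudgetLedger:
    def __init__(self, limits_usd):
        self.limits_usd = limits_usd          # budget_id -> spending limit in USD
        self.spent_usd = defaultdict(float)   # budget_id -> recorded spend in USD

    def admit(self, envelope):
        """Reject the envelope if its worst-case cost would exceed the budget."""
        flags = envelope["flags"]
        budget_id = flags["budget_id"]
        worst_case = self.spent_usd[budget_id] + flags["max_cost_usd"]
        if worst_case > self.limits_usd[budget_id]:
            raise BudgetExceeded(f"{budget_id}: {worst_case:.2f} USD would exceed the limit")

    def record(self, budget_id, completed_event):
        """Add the actual cost reported on a completed event."""
        self.spent_usd[budget_id] += completed_event["cost_usd"]

# Hypothetical limit of 0.10 USD for the budget used in the envelope above.
ledger = BudgetLedger({"team-legal-2026-q2": 0.10})
envelope = {"flags": {"max_tokens": 2048, "max_cost_usd": 0.05,
                      "budget_id": "team-legal-2026-q2"}}

ledger.admit(envelope)
ledger.record("team-legal-2026-q2", {"type": "completed", "cost_usd": 0.04})
ledger.admit(envelope)
ledger.record("team-legal-2026-q2", {"type": "completed", "cost_usd": 0.04})
```

Admission uses the worst-case `max_cost_usd` rather than the eventual actual cost, so a request is refused before any provider is called.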
Minimum Viable Observability Stack
HAPPI event stream -> Kafka -> ClickHouse -> Grafana
Every runtime produces this for free. Zero instrumentation overhead. The structured event stream IS the telemetry.
Any Language Is an AI Application
If you can write and read JSON, you can speak HAPPI. A 1977 awk script is an AI client:
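BEGIN {
  envelope = "{\"v\":\"happi/1.0\",\"cmd\":\"anthropic.messages.create\"," \
             "\"args\":[{\"messages\":[{\"role\":\"user\",\"content\":\"Hello\"}]}]}"
  # awk pipes are one-way, so the shell feeds the envelope to the runtime and
  # awk reads the NDJSON event stream back through a single input pipe.
  cmd = "printf '%s\\n' '" envelope "' | hal --happi-api"
  while ((cmd | getline line) > 0) print line
  close(cmd)
}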
COBOL, Fortran, Lua, Tcl, Pascal — all become AI-capable without vendor involvement.
This is the syscall property: you do not need a provider to support your language; you
need read().
Testability — Mock the Wire, Not the SDK
def test_clause_review():
with mock_happi_stream([
{"type": "started", "request_id": "r1"},
{"type": "delta", "request_id": "r1", "text": "Ambiguous liability in §4.2."},
{"type": "completed", "request_id": "r1"}
]):
result = run_clause_review(test_clause)
assert "§4.2" in result.findings
No provider involved. No rate limits. No network. One mock format works for all providers.
Onboarding — One Command
cat spec.happi.md | hal --happi-api
The spec demonstrates itself. New engineer. Day one. Full system behaviour, live, in thirty seconds.
The Long Arc
Unix read(fd, buf, len) does not know whether it reads from SSD, NVMe, RAM,
network socket, or /dev/null. The application calls one primitive; the kernel
dispatches. HAPPI proposes AI inference as this primitive.
| Unix syscall | HAPPI equivalent |
|---|---|
| read(fd, buf, len) | hal --happi-api < envelope.json |
| File descriptor | Provider identifier in envelope |
| Buffer | NDJSON event stream |
| Kernel VFS layer | HAPPI runtime |
| Block device driver | Provider adapter |
| Physical disk | The actual model (Claude, Gemini, Llama) |
/dev/ai — The Concrete Proof-of-Concept
echo '{"v":"happi/1.0","cmd":"anthropic.messages.create","args":["Hello"]}' \
> /dev/ai
cat /dev/ai # NDJSON events stream out
Implementation: ~500-line FUSE module. Not a 10-year arc — this is a weekend project once HAPPI v1.0 runtime adoption reaches critical mass.
AI as a Unix Pipe Stage
git log --oneline -20 \
| hal chat --provider groq/llama-3.3-70b "Summarise these commits:" \
| tee weekly-update.md
AI reasoning takes its place alongside grep, awk, sed, and
jq as a standard Unix primitive. This claim is falsified if cost remains above $0.001/query
or latency above 500 ms for small queries in 2030.
IoT and Embedded Systems
An ESP32 microcontroller (roughly 520 KB of SRAM, no Python runtime) emits HAPPI envelopes over MQTT to a local HAPPI gateway. The ESP32 code is ~200 lines of C. It has no provider SDK. It speaks HAPPI. AI is now accessible from hardware that cannot run any existing AI SDK.
Protocol-Led Ecosystem
Runtime Proliferation
ENTER Konsult's internal runtime (in development and evaluation) is the reference. Community runtimes emerge the way HTTP clients emerged after RFC 2616:
| Language | Package | Estimated Availability |
|---|---|---|
| Python | hal-py | 3–6 months |
| Node.js / Deno / Bun | hal-js | 3–6 months |
| Go | hal-go | 6–12 months |
| Ruby | hal-rb | 6–12 months |
| Java / Kotlin | hal-jvm | 12–18 months |
| Swift | hal-swift | 12–18 months |
HAPPI Routers — The DNS Analogy
happi-resolve "legal-review-za"
# -> @codetonight/legal-review-za@1.2.3
# -> preferred-provider: ollama/gemma4:26b (on-premises, data residency)
# -> fallback: anthropic/claude-opus-4-7 (if local unavailable)
HAPPI routers are to AI capabilities what DNS is to IP addresses: capability-resolution infrastructure that the application never sees.
Precision Matters
| Category | Examples | Distinction from HAPPI |
|---|---|---|
| Application framework | LangChain, LlamaIndex, AutoGen | Live inside your application; language-specific |
| Provider SDK | OpenAI SDK, Anthropic SDK, LiteLLM | Language-specific library; normalise to one format |
| Proxy server | LiteLLM proxy, OpenRouter, Portkey | Translation layer in network path; not a protocol |
| Model | Claude, GPT-4, Gemini, Llama | The inference artifact; HAPPI is what speaks to it |
| Runtime | HAL (ENTER Konsult internal, in evaluation) | Reference implementation; not the protocol itself |
The closest analogy is requests (the Python HTTP library): LiteLLM is requests;
HAPPI is HTTP. requests is excellent software, and it did not make HTTP redundant.
They operate at different levels of the stack. LiteLLM could implement HAPPI as its wire format.
A HAPPI gateway could accept LiteLLM-proxy traffic via adapter. They are not competitors —
they occupy adjacent layers.
Where the Protocol Breaks Down
Long-running stateful sessions
HAPPI is request/response-oriented. Applications needing persistent multi-turn conversation must encode state in envelopes — which works but adds verbosity. Mitigation: session_id flag + runtime-level session store.
Bidirectional real-time streaming
Low-latency voice agents require WebSocket bidirectional streams. HAPPI v1.0 handles unidirectional well. Addressed in planned v1.1 WebSocket transport.
High-throughput batch
Processing millions of rows with one envelope per query carries per-request overhead, and provider batch APIs are typically priced at about a 50% discount. v1.1 may introduce a batch envelope type.
Provider-specific capabilities
Anthropic prompt caching, OpenAI structured outputs, Gemini context caching — HAPPI's common abstraction can hide these. Mitigation: flags.provider_specific passed opaquely to adapter.
Falsification of the TCP/IP Analogy
The central claim — that HAPPI will do for AI what TCP/IP did for networking — is falsified if:
- Provider-specific SDK market share is growing, not shrinking, by 2030
- Major cloud providers block protocol-level interop and succeed
- HAPPI adoption stalls at fewer than 5 production runtimes outside CodeTonight within 18 months
- Semantic incompatibility between providers proves irreducible
- Regulatory action mandates provider-specific attestation incompatible with protocol-level neutrality
Open Protocol Questions for v1.1
| Question | Council Vote | Notes |
|---|---|---|
| state as first-class event type? | 5 yes / 5 no / 2 neutral | Enables cross-runtime state; risks scope creep |
| ensemble_synthesis at protocol level? | 3 protocol / 7 application | Keep application-level until canonical algorithm exists |
| Dedicated batch envelope type? | 8 yes / 3 no | Strong signal |
| WebSocket elevated to normative? | 9 yes / 2 no | Voice agents demand it |
| provider.<id>.<key> convention? | 10 yes / 1 no | Near-consensus |
| Signing/provenance in envelope? | 4 yes / 7 ecosystem | Security-conscious ecosystem over baked-in complexity |
Five Phases to Primitive
| Phase | Timeline | Milestones |
|---|---|---|
| Seed | Now – 6 months | Reference runtime; happi.md domain live; 3 community runtime seeds (Python, Go, Node) |
| Ecosystem | 6–18 months | 20+ provider adapters; capability marketplace seed (100+ .happi.md files); first non-CodeTonight production adoption |
| Enterprise | 18–36 months | Commercial runtime with SLAs; HAPPI certification programme; HIPAA/GDPR/SOX compliance profiles |
| Standard | 3–5 years | Protocol-level new-project market share exceeds SDK market share; HAPPI-native IDEs, linters, CI integrations |
| Primitive | 5–10 years | /dev/ai proof-of-concept; AI reasoning as a Unix toolkit member |
MESH Council Synthesis
This paper documents the full possibility space of HAPPI/1.0 as established by a twelve-agent MESH deliberation council. A MESH deliberation council is a multi-agent review pattern in which each agent analyses the problem from a distinct specialist perspective, and the outputs are synthesised into a consolidated finding. The use of multiple agents reduces single-model blind spots and surfaces a broader range of considerations than any individual analysis would produce. All prospective claims carry explicit falsification conditions. Vote records are preserved in §12.
This paper is a living document. The protocol is stable; the possibilities are not.
Canonical reference: happi.md