Why KruxOS beats shell tools

01 · Token efficiency

Same operation. Half the tokens.

An agent calling a typed capability spends fewer tokens describing what it wants, gets a structured response back, and doesn't have to grep its own stdout. The numbers below are illustrative estimates, drawn from ranges observed in our v0.0.1 traces.

Read a JSON file

/workspace/config.json

⊘ Shell tool (Bash): parse, hope, retry

    # Agent emits:
    { "tool": "bash", "command": "cat /workspace/config.json" }
    # Agent receives (stdout, untyped):
    {"model":"claude-3-5","timeout":30,...}
    # Then must parse, detect JSON vs error,
    # handle "file not found" string by reading
    # stderr separately, etc.

Input: ~140 tok
Output: ~180 tok
Error recovery: ~220 tok

✓ KruxOS (typed capability): structured both ways

    # Agent emits:
    filesystem.read({ path: "/workspace/config.json" })
    # Agent receives (typed result):
    { content: {...}, bytes: 412, mtime: "2026-05-12T08:14:02Z" }
    # On miss: typed error {code, path, hint}

Input: ~38 tok
Output: ~80 tok
Error recovery: ~45 tok
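To illustrate why a typed error is cheaper to recover from, here is a minimal TypeScript sketch of a caller branching on the result. The field names mirror the example above (`content`, `bytes`, `mtime`, and an error with `code`, `path`, `hint`); the `ok` discriminant, the `ENOENT` code value, the sample hint text, and the `summarize` helper are assumptions made for illustration, not the KruxOS API.

```typescript
// Hypothetical result shapes modeled on the fields shown in the example above.
type ReadOk = { ok: true; content: unknown; bytes: number; mtime: string };
type ReadErr = { ok: false; error: { code: string; path: string; hint: string } };
type ReadResult = ReadOk | ReadErr;

// Branch on the discriminant instead of string-matching stderr.
function summarize(result: ReadResult): string {
  if (result.ok) {
    return `read ${result.bytes} bytes, modified ${result.mtime}`;
  }
  const { code, path, hint } = result.error;
  return `${code} at ${path}: ${hint}`;
}

// Sample values: one successful read and one miss.
const readHit: ReadResult = {
  ok: true,
  content: { timeout: 30 },
  bytes: 412,
  mtime: "2026-05-12T08:14:02Z",
};
const readMiss: ReadResult = {
  ok: false,
  error: { code: "ENOENT", path: "/workspace/config.json", hint: "check the path" },
};
```

Because the compiler narrows the union on `ok`, the error branch can read `code`, `path`, and `hint` directly, with no parsing step in between.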

HTTP POST with JSON response parsing

/api/agents/status

⊘ Shell tool (curl + jq): three tools, one job

    curl -X POST \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOK" \
      -d '{"agent_id":"a-42"}' \
      https://api.example.com/agents/status \
      | jq '.health'
    # Status code? Lost. Headers? Lost.
    # Bad JSON? jq crashes. Retry logic?
    # Agent writes it inline, every time.

Input: ~180 tok
Output: ~200 tok
Error recovery: ~280 tok

✓ KruxOS (typed capability): one call, full envelope

    network.http_request({
      method: "POST",
      url: "https://api.example.com/agents/status",
      json: { agent_id: "a-42" },
      secret_header: { Authorization: "Bearer ${api_token}" }
    })
    # Returns: {status, headers, json, latency_ms}

Input: ~52 tok
Output: ~95 tok
Error recovery: ~60 tok
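With the full envelope available, the retry decision can be taken from the status code rather than re-derived from piped output. The sketch below is one way that might look; the `HttpEnvelope` type copies the four field names listed above, while `nextStep`, the sample values, and the choice to treat 429 and 5xx as retryable are assumptions, not documented KruxOS behavior.

```typescript
// Hypothetical envelope type, modeled on the fields listed above.
type HttpEnvelope = {
  status: number;
  headers: Record<string, string>;
  json: unknown;
  latency_ms: number;
};

type NextStep = "use_body" | "retry" | "give_up";

// Decide what to do from the status code alone; no output parsing needed.
function nextStep(res: HttpEnvelope): NextStep {
  if (res.status >= 200 && res.status < 300) return "use_body";
  // Treating 429 and 5xx as transient is an assumption of this sketch.
  if (res.status === 429 || res.status >= 500) return "retry";
  return "give_up";
}

// Sample envelopes with made-up values.
const okResponse: HttpEnvelope = {
  status: 200,
  headers: { "content-type": "application/json" },
  json: { health: "green" },
  latency_ms: 42,
};
const throttled: HttpEnvelope = { ...okResponse, status: 429, json: null };
```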

Run a process with arguments

rm -rf /tmp/build

⊘ Shell tool (bash -c): string templating into rm

    bash -c "rm -rf /tmp/build"
    # Quoting? Whitespace in paths?
    # Injection if $path is user-derived?
    # Whose problem? The agent's prompt.
    # No structured exit code, only stdout/err
    # strings the agent must re-parse.

Input: ~110 tok
Output: ~70 tok
Error recovery: ~160 tok

✓ KruxOS (typed capability): argv array, no shell expansion

    process.run({ command: "rm", args: ["-rf", "/tmp/build"] })
    # Returns: {exit_code, stdout, stderr,
    #           duration_ms, killed}
    # Policy checks argv BEFORE fork: "rm -rf /"
    # would be blocked without burning a single syscall.

Input: ~32 tok
Output: ~70 tok
Error recovery: ~40 tok
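Because the arguments arrive as an array, a policy gate can inspect them as data before anything is spawned. The following is a rough sketch of such a check; `checkArgv`, the `Verdict` type, and the protected-path list are all hypothetical and far simpler than a real policy engine would need to be.

```typescript
// Hypothetical policy gate: inspects argv before any process would be spawned.
type Verdict = { allowed: true } | { allowed: false; reason: string };

// Paths that a recursive delete must never target (illustrative list).
const PROTECTED_PATHS = new Set(["/", "/etc", "/home", "/usr"]);

function checkArgv(command: string, args: string[]): Verdict {
  if (command !== "rm") return { allowed: true };
  // Treat --recursive or any short-flag cluster containing r or R as recursive.
  const recursive = args.some(
    (a) => a === "--recursive" || (/^-[A-Za-z]+$/.test(a) && /[rR]/.test(a)),
  );
  // Everything that is not a flag is a delete target; strip trailing slashes,
  // mapping an all-slash path such as "//" back to "/".
  const targets = args
    .filter((a) => !a.startsWith("-"))
    .map((a) => a.replace(/\/+$/, "") || "/");
  const blocked = targets.find((t) => PROTECTED_PATHS.has(t));
  if (recursive && blocked !== undefined) {
    return { allowed: false, reason: `recursive delete of protected path ${blocked}` };
  }
  return { allowed: true };
}
```

The check is plain string comparison on literal arguments, which is only sound because nothing downstream re-expands them through a shell.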

3.5× Avg input token reduction across all three examples.

1.8× Avg output token reduction. Typed responses don't carry framing.

4.5× Avg error-recovery token reduction. Structured errors don't need parsing.

0 LLM tokens spent on policy decisions. The gate runs deterministically.
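The averages can be rechecked from the per-example estimates. The sketch below assumes that "average" means the mean of the three per-example shell-to-typed ratios; that is a guess about the method, and a pooled ratio of totals would give slightly different values. The `examples` table and `meanReduction` helper are illustrative names.

```typescript
// Per-example token estimates copied from the three comparisons above.
type Row = { input: number; output: number; recovery: number };

const examples: { name: string; shell: Row; typed: Row }[] = [
  {
    name: "Read a JSON file",
    shell: { input: 140, output: 180, recovery: 220 },
    typed: { input: 38, output: 80, recovery: 45 },
  },
  {
    name: "HTTP POST with JSON response parsing",
    shell: { input: 180, output: 200, recovery: 280 },
    typed: { input: 52, output: 95, recovery: 60 },
  },
  {
    name: "Run a process with arguments",
    shell: { input: 110, output: 70, recovery: 160 },
    typed: { input: 32, output: 70, recovery: 40 },
  },
];

// Mean of the per-example ratios (shell tokens / typed tokens) for one column.
function meanReduction(key: keyof Row): number {
  const ratios = examples.map((e) => e.shell[key] / e.typed[key]);
  return ratios.reduce((sum, r) => sum + r, 0) / ratios.length;
}

const reductions = {
  input: meanReduction("input"),
  output: meanReduction("output"),
  recovery: meanReduction("recovery"),
};
```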

02 · Safety

Built-in safety vs. bolt-on.

Shell tools are powerful because they trust the caller. KruxOS capabilities are powerful because they don't have to.

Destructive operations

The classic "agent typed the wrong path" failure mode.

⊘ Shell

rm -rf /important/dir

Gone. No trash, no recovery, no audit. The agent doesn't know it broke anything until the next test fails.

✓ KruxOS

filesystem.delete({ path: "/important/dir" })

Soft-deletes to per-principal trash. 168h retention for User, 24h for Agent. Recover with filesystem.restore().
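The retention rule can be modeled in a few lines. This is an in-memory sketch only: the 168h and 24h windows come from the text above, while the `Trash` class, its method names, and the explicit `now` parameter (used to keep the example deterministic) are assumptions and say nothing about how KruxOS stores deleted data.

```typescript
// Hypothetical in-memory model of a per-principal trash with retention windows.
type Principal = "User" | "Agent";
const RETENTION_HOURS: Record<Principal, number> = { User: 168, Agent: 24 };
const HOUR_MS = 60 * 60 * 1000;

type TrashEntry = { path: string; principal: Principal; deletedAt: number };

class Trash {
  private entries = new Map<string, TrashEntry>();

  // Soft delete: record the entry instead of destroying the data.
  softDelete(path: string, principal: Principal, now: number): void {
    this.entries.set(path, { path, principal, deletedAt: now });
  }

  // Restore succeeds only inside the deleting principal's retention window.
  // Either way the entry leaves the trash: restored, or purged as expired.
  restore(path: string, now: number): boolean {
    const entry = this.entries.get(path);
    if (entry === undefined) return false;
    this.entries.delete(path);
    const expiresAt = entry.deletedAt + RETENTION_HOURS[entry.principal] * HOUR_MS;
    return now < expiresAt;
  }
}
```

In this model an agent's delete is recoverable for one day and a user's for seven, matching the windows quoted above.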

Bulk operations

Where one mistake becomes thousands.

⊘ Shell

cat ./contacts.csv | mailx -s "Update" all@

Fires immediately. No cancellation window. Hits the SMTP server before the operator has read the agent's plan.

✓ KruxOS

email.bulk_delete({ count: 1247 })

Falls under the approval_required tier by default. Sending to a list with email.send is gated as well: it is write-buffered with a 30s cancellation window.
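A cancellation window amounts to a queue that holds each operation for a fixed time before releasing it. The sketch below shows the idea with an explicit clock value so it stays deterministic; the 30s figure is from the text, and the `WriteBuffer` class and its methods are hypothetical rather than the KruxOS implementation.

```typescript
// Hypothetical write buffer: operations are queued and released only after
// the cancellation window has elapsed.
const CANCEL_WINDOW_MS = 30 * 1000;

type Pending = { id: number; payload: string; queuedAt: number };

class WriteBuffer {
  private pending: Pending[] = [];
  private nextId = 1;

  // Queue the operation; nothing leaves the buffer yet.
  enqueue(payload: string, now: number): number {
    const id = this.nextId++;
    this.pending.push({ id, payload, queuedAt: now });
    return id;
  }

  // Cancel works only while the operation is still buffered.
  cancel(id: number): boolean {
    const before = this.pending.length;
    this.pending = this.pending.filter((p) => p.id !== id);
    return this.pending.length < before;
  }

  // Release every operation whose window has elapsed and return its payload.
  flush(now: number): string[] {
    const due = this.pending.filter((p) => now - p.queuedAt >= CANCEL_WINDOW_MS);
    this.pending = this.pending.filter((p) => now - p.queuedAt < CANCEL_WINDOW_MS);
    return due.map((p) => p.payload);
  }
}
```

An operator who reads the plan within the window calls `cancel`; after `flush` has released the operation, `cancel` returns false.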

Secret handling

The most common way agents leak.

⊘ Shell

export API_KEY=... && curl -H "Authorization: $API_KEY"

The secret lives in the agent's env, its shell history, its logs, and its next prompt. A leak from any one of them exposes it.

✓ KruxOS

secrets.use({ secret_id: "api_token_v1" })

Vault-backed, AES-256-GCM, use-not-read. The raw value never crosses the agent boundary — the gateway substitutes it server-side.
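Use-not-read can be pictured as placeholder substitution that happens on the gateway side. In this sketch the agent only ever handles a template such as `Bearer ${api_token}`; the `vault` map, the sample secret value, and `resolveHeader` are stand-ins invented for illustration, and the encryption layer mentioned above is left out entirely.

```typescript
// Hypothetical gateway-side substitution. The map stands in for a real
// encrypted vault; the agent only ever supplies the placeholder template.
const vault = new Map<string, string>([["api_token", "s3cr3t-value"]]);

// Replace each ${name} placeholder with the vault value, or fail closed.
function resolveHeader(template: string): string {
  return template.replace(/\$\{(\w+)\}/g, (_match, name: string) => {
    const value = vault.get(name);
    if (value === undefined) throw new Error(`unknown secret: ${name}`);
    return value;
  });
}

// What the agent sends and can log: the template only (a plain string,
// not a template literal, so nothing is interpolated on the agent side).
const agentView = "Bearer ${api_token}";
// What the gateway puts on the wire, outside the agent's context.
const wireValue = resolveHeader(agentView);
```

Failing closed on an unknown name avoids sending a literal placeholder upstream.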

03 · Operator visibility

What the supervisor actually sees.

Audit isn't a feature you bolt on after the agent runs amok. It's the substrate.

⊘ Shell-tool agent

Whatever the agent decided to log.

No visibility into what the agent did unless logging was instrumented up-front.
Stdout/stderr only — if the agent suppressed output, it's gone.
Tampering surface = anything the agent could touch (which is everything).
No replay. No diff. No timeline.

✓ KruxOS

Every call. Tamper-evident.

Every capability call is hash-chained into the audit log (CBOR, append-only).
Per-principal actor: which agent, which user, which session, which policy tier.
Bounded in-memory ring buffer, with retry when the disk is full. Daily rotation, 90-day retention.
SIEM streaming on Enterprise. Replay tool ships in v0.0.2.
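Hash-chaining is what makes the log tamper-evident: each record commits to the hash of the one before it. Below is a minimal sketch using SHA-256 from Node's `crypto` module. The record fields, the all-zero genesis value, and the JSON encoding (in place of the CBOR named above) are assumptions chosen to keep the example dependency-free.

```typescript
import { createHash } from "crypto";

// Hypothetical record shape; JSON stands in for the CBOR encoding.
type AuditRecord = { actor: string; capability: string; prevHash: string; hash: string };

const GENESIS = "0".repeat(64);

function hashRecord(actor: string, capability: string, prevHash: string): string {
  return createHash("sha256")
    .update(JSON.stringify([actor, capability, prevHash]))
    .digest("hex");
}

// Append-only: each record commits to the hash of the one before it.
function append(log: AuditRecord[], actor: string, capability: string): void {
  const last = log[log.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  log.push({ actor, capability, prevHash, hash: hashRecord(actor, capability, prevHash) });
}

// Recompute the chain; an edited record, or one removed from the middle,
// breaks a link. Detecting truncation at the tail needs an external anchor.
function verify(log: AuditRecord[]): boolean {
  let prev = GENESIS;
  for (const rec of log) {
    if (rec.prevHash !== prev) return false;
    if (rec.hash !== hashRecord(rec.actor, rec.capability, rec.prevHash)) return false;
    prev = rec.hash;
  }
  return true;
}
```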

04 · When not to use KruxOS

When a shell is still the right answer.

A policy gate is overhead. Sometimes the overhead isn't worth it — and we'd rather you know than learn the hard way.

One-off human-driven scripts.

You're not running make or cargo build through a typed capability. The cost of writing a policy for a thing you'll never audit is higher than the risk it mitigates.

Hot-path sysadmin tooling.

Some bespoke shell pipelines have been tuned for years. A typed capability is wider, friendlier, and slower. If your tooling outperforms what we'd ship — keep it.

Anything humans do directly.

Use KruxOS for what an AI is doing. The supervision surface, the audit surface, the policy surface — they all assume the actor is an agent. If the actor is you, you don't need them.

Sized right, KruxOS doesn't replace your shell. It replaces the shell your agent would have used — and gives you the supervisor's chair.

Download v0.0.2 → Read the docs →
