Etherforce Observability
Every agent request that runs inside Ishvana goes through a component called Etherforce — the LLM engine that sits between the agents and the actual models. Etherforce picks which provider to use, picks which model to send the request to, counts the tokens, applies context compression if the request is too big, manages the tool registry, and tracks every call for cost accounting. Most of the time you don’t need to think about any of that. Etherforce’s job is to make the right decisions quietly and keep the bills reasonable. But when something does go wrong — a cost spike, a slow model, a bad routing decision, a broken tool registration — the Etherforce Observability panel is where you go to figure out why. The Engine section covers what Etherforce is conceptually. This page is about the diagnostic panel that surfaces its actual behavior.
The panel has eight tabs, each with a distinct job. Most authors only open two or three of them regularly (Cost Tracking, Decision Stream, Model Registry). The other five exist for deeper diagnostics when something is broken and you need to figure out what.
The eight tabs
Decision Stream
Every time Etherforce picks a model for a request, it logs the decision: which agent made the request, the task type, which candidate models were considered, how each one scored on quality, speed, and cost, and which one ultimately won. The Decision Stream tab is a filterable list of those decisions.
Filter by agent, by routing path, by time range. Click any decision to see the full audit trail — the scoring breakdown, the candidates that lost and why, the model that was actually used, the final token count.
The practical use of this tab is understanding why Etherforce made a choice you didn’t expect. “Why did the quick generation go to OpenRouter instead of Ollama? I have Ollama configured as quick.” Open the decision, look at the score breakdown, see that Ollama lost on latency for that request because it was cold-starting. Now you know whether to change your routing config or live with it.
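The investigation described above boils down to comparing the winner's scores with a loser's, dimension by dimension. The following is a minimal sketch of that comparison, not Etherforce's actual record format: the `Candidate` and `Decision` classes, the field names, the model names, and the 0-to-1 score scale are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    provider: str
    model: str
    quality: float  # hypothetical scale: 0..1, higher is better
    speed: float
    cost: float


@dataclass(frozen=True)
class Decision:
    agent: str
    task_type: str
    candidates: tuple[Candidate, ...]
    winner: str  # model name of the candidate that was actually used


def explain_loss(decision: Decision, loser_model: str) -> dict[str, float]:
    """Return the winner-minus-loser score gap for each dimension."""
    by_model = {c.model: c for c in decision.candidates}
    winner = by_model[decision.winner]
    loser = by_model[loser_model]
    return {
        dim: round(getattr(winner, dim) - getattr(loser, dim), 3)
        for dim in ("quality", "speed", "cost")
    }


# Illustrative data only: a cold-starting local model scores poorly on speed.
decision = Decision(
    agent="Hawken",
    task_type="quick",
    candidates=(
        Candidate("OpenRouter", "hosted-small", quality=0.70, speed=0.80, cost=0.60),
        Candidate("Ollama", "local-small", quality=0.70, speed=0.20, cost=1.00),
    ),
    winner="hosted-small",
)

gaps = explain_loss(decision, "local-small")
print(max(gaps, key=gaps.get))  # the dimension where the loser fell furthest behind
```

In this made-up case the largest gap is on speed, which matches the cold-start story: the local model was competitive on quality and better on cost, but lost on latency.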
Most authors don’t open Decision Stream frequently. When you do open it, it’s usually because you’re investigating something specific. The tab is structured for investigation, not for browsing.
Routing Configuration
This is where you tune how Etherforce picks models. The smart selection algorithm has three weights (quality, speed, and cost), and this tab is where you adjust them. You can also set route defaults (“quick tasks always go to this model,” “reasoning tasks always go to that model”) and per-agent overrides (“Hawken gets model X regardless of what the router would pick”).
The configurable knobs:
- Quality weight — how much the router cares about producing good output. Higher values push toward more capable (and more expensive) models.
- Speed weight — how much the router cares about latency. Higher values push toward faster (usually smaller) models.
- Cost weight — how much the router cares about not burning tokens. Higher values push toward cheaper models.
- Route defaults — per-task-type model assignments that bypass the scoring.
- Per-agent overrides — specific models tied to specific agents, regardless of task.
The three weights don’t have to sum to anything specific. They’re relative. Setting them all to 100 produces the same routing as setting them all to 10 — what matters is the ratio between them.
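To see why only the ratio matters, here is a small sketch that assumes the router uses a plain weighted sum of per-dimension scores. That formula, the `pick_model` function, the model names, and the score values are assumptions for illustration; the actual scoring inside Etherforce may differ.

```python
def pick_model(candidates: list[dict], quality_w: float, speed_w: float, cost_w: float) -> str:
    """Pick the candidate with the highest weighted score (assumed formula)."""
    total = quality_w + speed_w + cost_w
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    def score(c: dict) -> float:
        # Dividing by the total makes the result depend only on the ratios.
        return (quality_w * c["quality"] + speed_w * c["speed"] + cost_w * c["cost"]) / total

    return max(candidates, key=score)["model"]


# Hypothetical candidates, scored 0..1 per dimension (higher is better).
candidates = [
    {"model": "hosted-large", "quality": 0.95, "speed": 0.40, "cost": 0.30},
    {"model": "hosted-small", "quality": 0.70, "speed": 0.85, "cost": 0.90},
]

same_ratio_a = pick_model(candidates, 100, 100, 100)
same_ratio_b = pick_model(candidates, 10, 10, 10)
quality_heavy = pick_model(candidates, 10, 1, 1)

print(same_ratio_a, same_ratio_b, quality_heavy)
```

Scaling all three weights by the same factor leaves the winner unchanged, while shifting the ratio toward quality flips the choice to the more capable model.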
Model Registry
A full inventory of every provider you have configured and every model available from each one. Each entry shows the model name, context window size, pricing per thousand input and output tokens, any description metadata the provider ships with the model, and the current assignment (which agents or tasks use this model).
Refresh the registry to re-query every connected provider for their current model list. This is useful when a provider adds new models or retires old ones — hit refresh and Ishvana re-scans.
Three providers are supported:
- Anthropic — Direct Claude API access.
- OpenRouter — A routing service that gives you access to many models from different providers through one API.
- Ollama — Local LLM inference running on your own hardware.
Each provider has its own credentials configured in Settings → Models. The Model Registry tab is read-only — it shows what’s available and what’s assigned, but configuration changes happen in Settings.
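Because registry entries list prices per thousand tokens, you can estimate the cost of a single call from its token counts. The sketch below assumes a simple linear pricing model; the `RegistryEntry` class, its field names, the model names, and the prices are hypothetical and do not reflect any provider's real rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    provider: str
    model: str
    context_window: int         # maximum tokens per request
    input_price_per_1k: float   # USD per 1,000 input tokens
    output_price_per_1k: float  # USD per 1,000 output tokens


def call_cost(entry: RegistryEntry, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call from per-thousand-token prices."""
    return (
        (input_tokens / 1000) * entry.input_price_per_1k
        + (output_tokens / 1000) * entry.output_price_per_1k
    )


# Illustrative entries: a paid hosted model and a local model with no per-token charge.
hosted = RegistryEntry("Anthropic", "example-hosted-model", 200_000, 0.003, 0.015)
local = RegistryEntry("Ollama", "example-local-model", 8_192, 0.0, 0.0)

hosted_cost = call_cost(hosted, input_tokens=12_000, output_tokens=800)
local_cost = call_cost(local, input_tokens=12_000, output_tokens=800)
print(f"hosted: ${hosted_cost:.3f}, local: ${local_cost:.3f}")
```

With these example prices, 12,000 input tokens and 800 output tokens come to about $0.048 on the hosted model and nothing on the local one.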
Operational Stats
The macro-lens view. Route mix (what percentage of requests went to quick vs. reasoning vs. per-agent overrides), request volume over time, per-model request counts, average latency by model. This tab is for answering questions like “which model is carrying the most load this week?” or “has my request volume gone up since last month?”
Operational Stats doesn’t tell you about individual decisions — for that, use Decision Stream. It’s a summary view for spotting patterns across many requests at once.
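The summaries on this tab are simple aggregations over many request records. As a rough sketch of how route mix and average latency could be computed, assuming each request is logged with a route, a model, and a latency (the record shape and values here are invented for illustration):

```python
from collections import Counter, defaultdict

# Hypothetical request log entries.
requests = [
    {"route": "quick", "model": "local-small", "latency_ms": 400},
    {"route": "quick", "model": "local-small", "latency_ms": 600},
    {"route": "reasoning", "model": "hosted-large", "latency_ms": 2000},
    {"route": "override", "model": "hosted-large", "latency_ms": 3000},
]


def route_mix(records: list[dict]) -> dict[str, float]:
    """Percentage of requests that went down each route."""
    counts = Counter(r["route"] for r in records)
    total = sum(counts.values())
    return {route: 100 * n / total for route, n in counts.items()}


def average_latency(records: list[dict]) -> dict[str, float]:
    """Mean latency in milliseconds, grouped by model."""
    buckets: dict[str, list[int]] = defaultdict(list)
    for r in records:
        buckets[r["model"]].append(r["latency_ms"])
    return {model: sum(values) / len(values) for model, values in buckets.items()}


mix = route_mix(requests)
latency = average_latency(requests)
print(mix)
print(latency)
```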
Cost Tracking
This is probably the tab most authors open most often. Cost Tracking shows session spend broken down by model, agent, and provider, with configurable warning and hard-limit thresholds.
What it displays:
- Current session total. What you’ve spent on LLM calls since you opened the app.
- Warning threshold. A yellow line. Crossing it triggers a visible alert but doesn’t block anything.
- Hard limit. A red line. Crossing it blocks further LLM calls until you reset the session or raise the limit.
- Breakdown by model. Which model accumulated which portion of the total.
- Breakdown by agent. Which agent’s requests drove the cost.
- Breakdown by provider. How the cost splits across Anthropic, OpenRouter, Ollama.
- Per-call history. A scrollable list of individual calls with their input token count, output token count, cost, and latency. Useful for spotting unusually expensive calls.
- Cost rate projection. If you keep spending at the current rate, here’s what the day / week / month will cost.
- Cost accumulation sparkline. A tiny line chart showing how session cost has grown over time.
Warning and hard-limit thresholds are configurable per session and per project. Most authors set a session-level limit that would catch a runaway agent loop before it burned through their monthly budget. A reasonable starting point is a hard limit of $5 per session — enough for serious work, not enough to cause regret.
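The threshold behavior and the rate projection can be sketched in a few lines. This is an illustration under stated assumptions, not Ishvana's implementation: the function names are hypothetical, "crossing" is treated as reaching or exceeding the threshold, the $3 warning level is an invented example, and the projection is a naive linear extrapolation that assumes the current rate continues around the clock.

```python
from enum import Enum


class SpendStatus(Enum):
    OK = "ok"
    WARNING = "warning"  # visible alert, nothing blocked
    BLOCKED = "blocked"  # further LLM calls refused


def check_spend(session_total: float, warning_at: float, hard_limit: float) -> SpendStatus:
    """Classify the current session spend against the two thresholds."""
    if session_total >= hard_limit:
        return SpendStatus.BLOCKED
    if session_total >= warning_at:
        return SpendStatus.WARNING
    return SpendStatus.OK


def project_spend(session_total: float, elapsed_hours: float) -> dict[str, float]:
    """Linearly extrapolate the session's spend rate to a day, week, and month."""
    if elapsed_hours <= 0:
        raise ValueError("elapsed_hours must be positive")
    hourly = session_total / elapsed_hours
    return {"day": hourly * 24, "week": hourly * 24 * 7, "month": hourly * 24 * 30}


# Example values: $5 hard limit as suggested above, $3 warning as an assumption.
status = check_spend(session_total=3.50, warning_at=3.00, hard_limit=5.00)
projection = project_spend(session_total=1.50, elapsed_hours=2.0)
print(status, projection)
```

A projection like this overstates real spend if you only work a few hours a day, so treat it as an upper bound on what a sustained rate would cost.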
Tool Registry
Every tool every agent has registered, in one table. Grouped by agent, with columns for tool name, description, parameter count, and invocation count.
This tab exists for two reasons:
- Cache stability. Anthropic’s prompt caching only hits when the cached prefix, which includes the tool definitions, is identical from one request to the next, so the tool list needs a stable order. Etherforce keeps the list alphabetical, and the registry tab is where you can verify it’s working.
- Invocation observability. How often is each tool actually being called? If you see a tool with zero invocations, that’s a clue — either the tool is broken and agents can’t call it, or it’s genuinely unused and could be removed.
The Tool Registry tab is more of a developer diagnostic than a daily-use view. Most authors will never open it. When you do, it’s usually because you’re debugging an agent that isn’t using a tool you expect it to use.
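Both checks are easy to picture in code. The sketch below assumes tools are plain records with a name, a description, and an invocation count; the helper names and the tool names are hypothetical. It shows that sorting by name makes the serialized tool list identical regardless of registration order, and how zero-invocation tools can be flagged.

```python
import json


def stable_tool_order(tools: list[dict]) -> list[dict]:
    """Sort tools alphabetically by name so the order never depends on registration order."""
    return sorted(tools, key=lambda t: t["name"])


def serialize_tools(tools: list[dict]) -> str:
    """Serialize only the fields that would be sent to the model, in stable order."""
    payload = [{"name": t["name"], "description": t["description"]} for t in stable_tool_order(tools)]
    return json.dumps(payload, sort_keys=True)


def unused_tools(tools: list[dict]) -> list[str]:
    """Names of tools that have never been invoked."""
    return [t["name"] for t in stable_tool_order(tools) if t["invocations"] == 0]


# Hypothetical tools, registered in two different orders.
run_a = [
    {"name": "search_lore", "description": "Search project lore", "invocations": 12},
    {"name": "check_names", "description": "Validate character names", "invocations": 0},
    {"name": "draft_scene", "description": "Draft a scene", "invocations": 7},
]
run_b = list(reversed(run_a))

same_bytes = serialize_tools(run_a) == serialize_tools(run_b)
never_called = unused_tools(run_a)
print(same_bytes, never_called)
```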
Skills
The Skills tab shows reusable writing instructions you’ve defined for your project. Style guides, quality checklists, genre conventions, naming rules: any persistent rule you want the agents (mostly Hawken) to follow across every request. Ishvana ships with nine bundled skills, and you can create custom ones.
Skills aren’t technically an Etherforce feature — they’re a layer between agents and the LLM that injects persistent rules into every request. But they live on the Etherforce panel because they’re part of the LLM pipeline. Full documentation is on the Skills page.
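One way to picture that injection layer is a function that appends the enabled skills to whatever system prompt the agent already has. This is only a conceptual sketch: the `apply_skills` function, the skill record shape, and the prompt layout are assumptions, and the Skills page describes how Ishvana actually handles this.

```python
def apply_skills(system_prompt: str, skills: list[dict]) -> str:
    """Append every enabled skill's rule to the system prompt (hypothetical layout)."""
    enabled = [s for s in skills if s["enabled"]]
    if not enabled:
        return system_prompt
    rules = "\n".join(f"- {s['name']}: {s['rule']}" for s in enabled)
    return f"{system_prompt}\n\nPersistent rules:\n{rules}"


# Illustrative skills; names and rules are invented.
skills = [
    {"name": "Naming", "rule": "Keep place names consistent with the glossary.", "enabled": True},
    {"name": "Tense", "rule": "Write narration in past tense.", "enabled": False},
]

prompt = apply_skills("You are Hawken, a drafting agent.", skills)
print(prompt)
```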
Benchmark Workbench
The entry point for Legendry Bench, a mutation-based benchmarking workflow that tests models on your actual lore. It’s enough of its own thing that it gets its own page.
How the tabs relate to each other
Decision Stream tells you what Etherforce decided on individual requests. Routing Configuration tells you why, because the config drives the scoring. Cost Tracking tells you what it costs. Model Registry tells you what’s available. Operational Stats tells you how it all aggregates. Tool Registry tells you which tools are actually being used. Skills tells you what persistent rules are being applied. Benchmark Workbench tells you how well each model actually does on your project.
Eight tabs sounds like a lot, and it is. But each one answers a distinct question, and most of the time you only need one tab at a time. The panel is a toolbox, not a dashboard.