Agent Overview

Most tools want you to trust them without looking under the hood. They ship with a chat interface, something responds, and if the output looks fine you assume the system is working. The failure mode is that when the system isn’t working — when a handler regression silently drops GameMaster’s success rate from 92% to 71%, when Hawken’s latency has been creeping up for days, when Lagan is failing on research requests — you don’t find out until you feel it in the output quality. Agent Overview makes the agent pipeline’s own performance visible so you can catch regressions in the numbers before they show up in the prose.

Every agent tracks its own operations. Every operation records success or failure, duration, a short summary, and an error message if something went wrong. The data is persistent — across restarts, across sessions, across months — and the Agent Overview panel is the dashboard that reads it.
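As a rough picture of what one tracked operation might look like, here is a minimal sketch. The class name, field names, and validation rule are all hypothetical; the only thing taken from the description above is which pieces of information get recorded.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class OperationRecord:
    """One tracked operation. Every name here is an illustrative assumption."""
    agent: str                   # e.g. "Hawken"
    operation_type: str          # e.g. "scene_scan"
    success: bool                # success or failure
    duration_ms: float           # how long the operation took
    summary: str                 # short human-readable summary
    finished_at: datetime        # when the operation completed
    error: Optional[str] = None  # present only when something went wrong

    def __post_init__(self) -> None:
        # Keep the record consistent: errors belong to failures only.
        if self.success and self.error is not None:
            raise ValueError("a successful operation must not carry an error")
        if not self.success and not self.error:
            raise ValueError("a failed operation needs an error message")


record = OperationRecord(
    agent="Hawken",
    operation_type="scene_scan",
    success=False,
    duration_ms=30000.0,
    summary="Scene scan did not finish",
    finished_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    error="timeout",
)
```

Making the record immutable is one reasonable choice for data that is written once and only read afterwards by a dashboard.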

What the panel shows

The panel is one row per agent — Ishvana, Hawken, Lagan, GameMaster, WorldKnowledge, Lorekeeper — plus a few system-level entries for background services. Each row shows:

Agent name and status. “Awake” means the agent is loaded and ready to accept requests. “Asleep” means the agent has been initialized at some point but isn’t currently loaded.
Operation count. Total operations for the current session, the last 24 hours, the last 7 days, or the last 30 days; toggleable.
Success rate. Percentage of operations that completed without error.
Average duration. How long the average operation takes.
Last activity. Timestamp of the most recent operation.
Trend indicator. Improving, steady, or degrading: a short-term trend that compares the success rate of the last N operations with the success rate of the last N*5 operations.
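The numeric columns in a row can all be derived from the same list of operations. The sketch below shows one way to do that; `Op`, `RowSummary`, and `summarize` are hypothetical names, and the real panel may compute these values differently.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class Op:
    success: bool
    duration_ms: float
    finished_at: datetime


@dataclass(frozen=True)
class RowSummary:
    count: int
    success_rate: Optional[float]      # percent; None when there is no data
    avg_duration_ms: Optional[float]
    last_activity: Optional[datetime]


def summarize(ops: Sequence[Op]) -> RowSummary:
    """Collapse one agent's operations into the values shown on its row."""
    if not ops:
        # No operations yet: report an empty row instead of dividing by zero.
        return RowSummary(0, None, None, None)
    successes = sum(1 for op in ops if op.success)
    return RowSummary(
        count=len(ops),
        success_rate=100.0 * successes / len(ops),
        avg_duration_ms=sum(op.duration_ms for op in ops) / len(ops),
        last_activity=max(op.finished_at for op in ops),
    )


day = datetime(2024, 1, 1, tzinfo=timezone.utc)
row = summarize([
    Op(True, 100.0, day.replace(hour=9)),
    Op(True, 200.0, day.replace(hour=10)),
    Op(False, 600.0, day.replace(hour=11)),
    Op(True, 300.0, day.replace(hour=12)),
])
```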

Click any row and the detail view opens with a breakdown by operation type. Instead of “Hawken did 47 operations,” you see the specific handler-backed work Hawken performed: scene scans, chapter scans, manuscript scans, style analysis, and related analysis passes. Each operation type gets its own success rate, average duration, and last-executed timestamp.

Why operation type matters

A single agent can do multiple distinct jobs, and those jobs can have very different reliability profiles. Hawken, for example, can run scene, chapter, and manuscript analysis. Manuscript analysis might be at 95% success while a scene-level pass is at 70%. If you only look at Hawken’s overall success rate, you see a blended figure of around 82% and can’t tell which job is the problem.

The per-operation-type breakdown is the thing that makes the panel diagnostic rather than just informational. “Hawken’s success rate dropped” is vague. “Hawken’s scene-scan success rate dropped from 88% to 62% starting three days ago” is actionable — you can go look at what changed three days ago and figure out what broke.
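To see how a blended rate hides the weak spot, here is a small worked sketch. The operation counts are invented so the arithmetic lands on the figures used above (95%, 70%, and 82% overall), and the function name and operation-type labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def success_rates_by_type(ops: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Map each operation type to its success rate in percent."""
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for op_type, ok in ops:
        totals[op_type] += 1
        if ok:
            successes[op_type] += 1
    return {t: 100.0 * successes[t] / totals[t] for t in totals}


# Invented sample: 120 manuscript scans (114 succeed), 130 scene scans (91 succeed).
ops = (
    [("manuscript_scan", True)] * 114 + [("manuscript_scan", False)] * 6
    + [("scene_scan", True)] * 91 + [("scene_scan", False)] * 39
)

overall = 100.0 * sum(ok for _, ok in ops) / len(ops)
by_type = success_rates_by_type(ops)

print(f"overall: {overall:.1f}%")            # overall: 82.0%
for op_type, rate in sorted(by_type.items()):
    print(f"{op_type}: {rate:.1f}%")         # manuscript_scan: 95.0%, then scene_scan: 70.0%
```

The overall number is a weighted average (205 successes out of 250 operations), so it moves with the mix of work as well as with reliability. The per-type rates do not have that problem.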

Trend indicators and regression detection

Every agent row shows a short-term trend computed from recent operations versus the longer-term average. The indicator is qualitative (improving, steady, or degrading) and it’s derived from a simple comparison: the success rate of the last N operations versus the success rate of the last N*5 operations. If the recent rate is noticeably worse than the longer-term rate, the trend is degrading; if it is noticeably better, the trend is improving; otherwise it is steady.
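The comparison can be sketched in a few lines. The value of N and the cutoff for “noticeably” are not specified here, so the defaults below (`n=20` and a 5-point `threshold`) are assumptions, as is the choice to report “steady” when there are fewer than N operations.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of successful outcomes, between 0.0 and 1.0."""
    return sum(outcomes) / len(outcomes)


def trend(outcomes: Sequence[bool], n: int = 20, threshold: float = 0.05) -> str:
    """Classify the trend from outcomes ordered oldest to newest.

    Compares the last n operations against the last n*5 operations.
    n and threshold are assumed values, not documented settings.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if len(outcomes) < n:
        return "steady"  # assumption: too little data to call a trend
    recent = success_rate(outcomes[-n:])
    baseline = success_rate(outcomes[-n * 5:])  # uses what exists if history is shorter
    delta = recent - baseline
    if delta <= -threshold:
        return "degrading"
    if delta >= threshold:
        return "improving"
    return "steady"


# 80 successes followed by 20 failures: recent rate 0%, longer-term rate 80%.
history = [True] * 80 + [False] * 20
label = trend(history)
```

Because the longer window includes the recent operations, a sudden change moves both rates, but the recent rate moves five times as fast, which is what makes the gap visible.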

The trend matters because it’s the early warning system for handler regressions and configuration drift. If a GameMaster handler changed yesterday and today GameMaster’s trend is degrading, the two are probably related and you should check. The trend won’t tell you why — just that something changed and it’s worth investigating.

The failure detail view

When an operation fails, the error message is captured. Click any agent’s “recent failures” view to see the list of errors with their timestamps, operation types, short summaries, and full error messages. This is where you diagnose why things are breaking, not just that they’re breaking.
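A “recent failures” list is essentially a filter and a sort over the recorded operations. The sketch below assumes hypothetical names (`Op`, `recent_failures`) and an arbitrary default cap of 20 entries.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Op:
    operation_type: str
    success: bool
    summary: str
    finished_at: datetime
    error: Optional[str] = None


def recent_failures(ops: Sequence[Op], limit: int = 20) -> List[Op]:
    """Return failed operations, newest first, capped at `limit` (assumed default)."""
    failed = [op for op in ops if not op.success]
    failed.sort(key=lambda op: op.finished_at, reverse=True)
    return failed[:limit]


day = datetime(2024, 1, 1, tzinfo=timezone.utc)
ops = [
    Op("scene_scan", False, "Scene scan stopped early", day.replace(hour=9), "timeout"),
    Op("chapter_scan", True, "Chapter scanned", day.replace(hour=10)),
    Op("style_analysis", False, "Output could not be parsed", day.replace(hour=11), "parse failure"),
]
failures = recent_failures(ops)
```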

A few common things you’ll see in the failure view:

Timeouts. The agent was still working when the operation hit its timeout. This usually means the local handler had too much work for the configured limit. The fix is to reduce the scope or adjust the relevant performance setting.
Context too large. The request included more project context than the handler surface can use cleanly. The fix is to trim pins or narrow the scope.
Handler errors. The local handler returned a structured failure. Open the matching Etherforce dispatch for the handler id, scope, and error text.
Parse failures. A local parser or composer returned output that couldn’t be shaped into the expected structure. The error message should name the failing route.

Most failures are self-explanatory once you see the error message. The value of the failure view is that the messages are captured in one place instead of being buried in a log file somewhere.

Session vs. historical data

The panel can display data at four time scales:

Current session — everything since you opened the app. Good for spot-checking during a writing session.
Today — last 24 hours. Good for “how have the agents been doing today?”
Last 7 days — the short-term view. Good for catching trends that take a few days to develop.
Last 30 days — the long-term view. Good for month-over-month comparisons and catching slow drift.

Switching between time scales doesn’t re-query the backend — the data is already loaded for all four scales, and switching is instant. This matters because it lets you look at the same question at multiple zoom levels without waiting between each view.
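One way to get instant switching is to bucket the loaded operations into all four windows up front, so that changing the view is only a dictionary lookup. This is a sketch of that idea, not the panel’s actual implementation; the function name and scale keys are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Sequence


def preload_scales(
    timestamps: Sequence[datetime],
    session_start: datetime,
    now: datetime,
) -> Dict[str, List[datetime]]:
    """Bucket operation timestamps into the four time scales in one pass per scale."""
    cutoffs = {
        "session": session_start,
        "today": now - timedelta(hours=24),
        "7d": now - timedelta(days=7),
        "30d": now - timedelta(days=30),
    }
    return {
        scale: [t for t in timestamps if cutoff <= t <= now]
        for scale, cutoff in cutoffs.items()
    }


now = datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc)
session_start = now - timedelta(hours=2)
timestamps = [
    now - timedelta(hours=1),
    now - timedelta(hours=5),
    now - timedelta(days=3),
    now - timedelta(days=20),
    now - timedelta(days=40),
]

scales = preload_scales(timestamps, session_start, now)
counts = {scale: len(items) for scale, items in scales.items()}
# Switching views afterwards is a lookup such as scales["7d"], with no new query.
```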

What the panel is not

A few things Agent Overview deliberately doesn’t do:

It’s not a leaderboard. There’s no ranking of “best agent.” Each agent does a different job, and comparing them against each other doesn’t make sense. Hawken’s 94% success rate on manuscript analysis doesn’t make Hawken “better” than Lagan at 89% on research pipelines; the two numbers measure different things.
It’s not a billing view. Operation count doesn’t translate to dollars spent because Divinity Engine dispatch has no per-message provider cost.
It’s not real-time. The panel updates on a refresh interval (or on manual refresh), not live. An operation that just finished might take a few seconds to show up in the panel. This is deliberate — polling the metrics every second would be wasteful for a diagnostic view that’s usually only open for a few minutes at a time.
It’s not a replacement for real error logs. The failure view is a summary. If you need the full stack trace of a specific failure, that lives in Error Tracking.
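The refresh-interval behavior can be pictured as a cached snapshot that reloads only when it is stale or when you ask for it. The class below is a hypothetical sketch of that pattern; the class name, the interval value, and the injected clock are assumptions.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class RefreshingSnapshot(Generic[T]):
    """Serve a cached value and reload it only after `interval_s` seconds."""

    def __init__(
        self,
        load: Callable[[], T],
        interval_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._interval_s = interval_s
        self._clock = clock
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> T:
        """Return the cached value, reloading first if it is missing or stale."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._interval_s:
            self.refresh()
        return self._value  # type: ignore[return-value]

    def refresh(self) -> T:
        """Manual refresh: reload immediately and restart the interval."""
        self._value = self._load()
        self._loaded_at = self._clock()
        return self._value


# A fake clock and a counting loader make the behavior easy to follow.
fake_now = [0.0]
load_calls = []


def load_metrics() -> int:
    load_calls.append(1)
    return len(load_calls)


snapshot = RefreshingSnapshot(load_metrics, interval_s=5.0, clock=lambda: fake_now[0])
first = snapshot.get()    # first read triggers a load
fake_now[0] = 3.0
second = snapshot.get()   # still fresh, served from the cache
fake_now[0] = 5.0
third = snapshot.get()    # interval elapsed, reloads
```

The trade-off matches the one described above: a result can lag by up to one interval, in exchange for far fewer reads of the metrics store.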

Philosophy

This panel exists because Ishvana treats agent transparency as a product feature rather than a developer setting. You’re paying for a tool that runs agents on your behalf, and you deserve to know whether those agents are doing their job. The Agent Overview panel is how that promise gets kept in practice — not as marketing copy, but as real numbers you can check any time.

The more subtle benefit is that it changes how you talk about problems. “Something felt off today” is a vague complaint you can’t act on. “GameMaster’s stat generation success rate dropped 15 points after the handler changed yesterday” is a specific claim backed by data, and now you know exactly where to look. The panel turns vibes into metrics.

What’s next

Etherforce Observability: The panel tracking handler dispatch, cache behavior, errors, and latency.

Error Tracking: Full stack traces and diagnostic context for failures captured here.

Agents overview: Which agent does what, and how they fit together.