From 37266f3efc07b9831f51b9a381bf6f0ef38322ae Mon Sep 17 00:00:00 2001
From: Alan Wizemann <alan@wizemann.com>
Date: Tue, 5 May 2026 19:42:56 +0200
Subject: [PATCH] docs(perf): document v2.8 skeleton-then-hydrate + SSH
 cancellation patterns

---
 Architecture-Overview.md  |  2 +-
 Chat.md                   | 21 +++++++++++++++++++++
 Dashboard.md              |  6 +++++-
 Insights-and-Activity.md  |  9 +++++++++
 Performance-Monitoring.md | 18 ++++++++++++++++++
 Project-Templates.md      | 12 +++++++++++-
 Projects-and-Profiles.md  | 31 ++++++++++++++++++++-----------
 7 files changed, 85 insertions(+), 14 deletions(-)
diff --git a/Architecture-Overview.md b/Architecture-Overview.md
index 127268d..c2164c7 100644
--- a/Architecture-Overview.md
+++ b/Architecture-Overview.md
@@ -64,7 +64,7 @@ The Rich Chat surface speaks the Hermes Agent Client Protocol (ACP) — a JSON-R
 
 ## File watching
 
-`HermesFileWatcher` reacts to changes under `~/.hermes/` so the Dashboard, Sessions browser, Activity feed, and Memory viewer refresh without manual reload. Local windows use FSEvents (`DispatchSourceFileSystemObject`); remote windows use mtime polling tunneled over the SSH ControlMaster.
+`HermesFileWatcher` reacts to changes under `~/.hermes/` so the Dashboard, Sessions browser, Activity feed, and Memory viewer refresh without manual reload. Local windows use FSEvents (`DispatchSourceFileSystemObject`); remote windows use mtime polling tunneled over the SSH ControlMaster. **v2.7+** also watches each registered project's `<project>/.scarf/` directory (both local FSEvents and remote polling), so file-reading dashboard widgets (`markdown_file`, `log_tail`, `image`) refresh automatically when a cron job updates the file they read.
 
 ## Updates
 
diff --git a/Chat.md b/Chat.md
index 3426957..4d15c7e 100644
--- a/Chat.md
+++ b/Chat.md
@@ -89,6 +89,27 @@ ScarfGo now survives phone-sleep, network handoffs, and SSH socket drops without
 
 `HermesDataService.fetchMessages(sessionId:limit:before:)` paginates by id desc with centralized `HistoryPageSize` constants. `RichChatViewModel.loadEarlier()` walks back through long sessions via `oldestLoadedMessageID` + `hasMoreHistory`. Pre-fix the message fetch was unbounded — sessions with thousands of messages were doing a full-history load on every reconnect.
 
+## Skeleton-then-hydrate chat loader _(v2.8+, Mac)_
+
+Resuming a chat on a slow remote used to fetch every column the bubble might need (`content` + `tool_calls` JSON + `reasoning_content`) in one shot, which routinely tripped the 30s SSH timeout on chats with multi-page tool result blobs. v2.8 splits the load into two phases:
+
+1. **Skeleton.** `fetchSkeletonMessages` selects only user + assistant rows (skips `role='tool'`) with `tool_calls`/`reasoning`/`reasoning_content` hard-NULLed at the SQL level. The wire payload is bounded by conversational text alone (typically a few KB), so the chat appears in seconds.
+2. **Background hydrate.** `RichChatViewModel.startToolHydration()` pages through `hydrateAssistantToolCalls` in 5-id batches to splice tool-call JSON into the existing assistant messages. Tool-result CONTENT is opt-in (Settings → Display → "Load tool results in past chats", default off) — without it, tool call cards still render, and the inspector pane lazy-fetches per-result content via `fetchToolResult(callId:)` when you open it.
+
+The chat header surfaces "Loading tool details…" while hydration is in flight. If a 5-id batch trips the 30s timeout (an oversized `tool_calls` blob — long Edit args, big diffs), an L1 single-id retry isolates the whale so the rest of the batch still hydrates. The whale row stays bare; the assistant message is still readable.
+
+## Partial-result + chat error banners _(v2.8+, Mac)_
+
+When the skeleton fetch itself trips an SSH transport failure (rather than a clean empty result), the chat surfaces "Couldn't load full chat history — the connection to *server* timed out" through the existing `acpError` triplet so the user sees what happened instead of a silent empty transcript. A separate banner detects `model.default` / `model.provider` mismatches in `config.yaml` (e.g. `model.default: anthropic/...` with `model.provider: nous` after switching OAuth providers via Credential Pools) and offers a one-click fix in either direction. The ACP error classifier also recognizes `model_not_found` / `404 messages` / `model is not available` and surfaces "This session was created with a model the provider no longer offers — start a new chat" so the pinned-model failure mode has a clear recovery path.
+
+## Loading-state UX during session boot _(v2.8+, Mac)_
+
+The Mac chat sidebar greys out and disables row taps the moment a session-switch is initiated (synchronously, before `client.start()` returns), with a floating ProgressView showing the current phase: "Spawning hermes acp…" → "Authenticating…" → "Loading session…" → "Loading history…" → "Ready". Pre-fix the sidebar looked engageable while the 5-7 second SSH+ACP boot was still in flight, and the user could queue up a second session-switch behind the first. The new gating prevents that contention.
+
+## SSH cancellation propagation _(v2.8+, Mac)_
+
+Cancelling a Swift Task used to leave the underlying ssh subprocess running for the full 30s SSH timeout: `Task.detached` doesn't inherit cancellation from the awaiting parent, so `proc.terminate()` was never called. This pinned remote sqlite queries and ControlMaster sessions when the user navigated away mid-load. v2.8 wires `withTaskCancellationHandler` through `SSHScriptRunner.run` and `RemoteSQLiteBackend.query`; cancellation now reaches the `Process` within ~100ms (the poll-loop interval) and fires `ssh.cancelled` in ScarfMon traces. This fixes the "third chat hangs" / "dashboard spins after rapid switching" symptom.
+
 ## iOS keyboard dismissal _(v2.5.1+)_
 
 Pre-fix the chat composer's `TextField` had no keyboard dismissal at all — the keyboard would rise and stick, hiding the system tab bar (which iOS auto-hides while a keyboard is up) and trapping users in the Chat tab. v2.5.1 adds two redundant dismissal paths:
diff --git a/Dashboard.md b/Dashboard.md
index 6f36fad..d726880 100644
--- a/Dashboard.md
+++ b/Dashboard.md
@@ -19,6 +19,10 @@ A scrolling stack of cards, refreshed automatically when `~/.hermes/state.db`, `
 
 Local windows watch `~/.hermes/` with FSEvents (`DispatchSourceFileSystemObject`). Remote windows poll mtimes every 3 seconds over the SSH ControlMaster. Either way, the Dashboard updates without needing a manual refresh — open it on a second monitor and watch sessions tick by as Hermes works.
 
+### Project dashboards (v2.7+)
+
+Per-project dashboards (the **Projects** sidebar item, separate from this system Dashboard) refresh on a project-wide watch: any change anywhere under `<project>/.scarf/` triggers a reload, not just `dashboard.json` itself. So a `markdown_file` widget pointing at `reports/weekly.md` (placed under `.scarf/reports/`) refreshes automatically when the cron job rewrites it. The same coverage applies on remote SSH-attached projects via 3-second mtime polling on each project's `.scarf/` directory. _Limitation:_ in-place appends to an existing file (`>> file.log`) don't tick the watcher — write atomically (write-temp + rename) or `touch dashboard.json` after each cron run.
+
 ## Status pill in the toolbar
 
 Every Scarf window has a connection pill in the toolbar showing the bound server and its state:
@@ -49,4 +53,4 @@ A **Switch server** button in the iOS Dashboard's top-right corner (added v2.5)
 - [ScarfGo](ScarfGo) for the iOS Dashboard tour.
 
 ---
-_Last updated: 2026-04-25 — Scarf v2.5.0 (added iOS Dashboard cross-reference + Switch server button)_
+_Last updated: 2026-05-04 — Scarf v2.7 (project-wide auto-refresh on `.scarf/` directory)_
diff --git a/Insights-and-Activity.md b/Insights-and-Activity.md
index 0af54dd..595802c 100644
--- a/Insights-and-Activity.md
+++ b/Insights-and-Activity.md
@@ -41,6 +41,15 @@ The per-tool execution feed — what Hermes did, when, and with what arguments:
 - **Detail inspector** — pretty-printed arguments JSON, tool output (when available), `tool_call_id` for cross-referencing back to the Sessions message stream.
 - **Live refresh** — same `HermesFileWatcher`; the feed scrolls as Hermes works.
 
+### Skeleton-then-hydrate loader _(v2.8+)_
+
+Activity historically pulled the full message column set for the 200 most recent tool-call rows in one shot, which routinely tripped the 30s SSH timeout on remote contexts (the `tool_calls` JSON column for 200 rows = ~600KB-1MB on the wire). v2.8 splits the load:
+
+1. **Phase 1 — skeleton.** `fetchRecentToolCallSkeleton(limit: 50)` projects only `id` + `session_id` + `role` + `timestamp` (everything fat NULLed at the SQL level). Wire payload ≈ 3 KB. The day-grouped feed renders placeholder rows immediately.
+2. **Phase 2 — paged hydrate.** `hydrateAssistantToolCalls` runs in 5-id batches in the background via `startToolCallHydration()`. Each batch splices parsed `[HermesToolCall]` arrays into the existing skeleton; `filteredActivity` swaps the placeholder entry for the real per-call entries on the next observation tick. A "Loading tool details…" pill in the header surfaces hydration progress.
+
+When a 5-id batch trips the 30s timeout (an oversized `tool_calls` blob), an L1 single-id retry isolates the offending row so the rest of the batch still hydrates. Transport-layer failures during the skeleton fetch now surface an orange "Couldn't load activity" banner with a Retry button; before v2.8 they produced a silent empty state.
+
 ## Live data freshness
 
 All three views observe the file watcher and re-query when `state.db` changes. Remote windows pull a fresh atomic snapshot via `sqlite3 .backup` (deduped by `SnapshotCoordinator` so Dashboard + Insights + Sessions don't each spawn parallel backups). See [Transport Layer](Transport-Layer) for the snapshot mechanics.
diff --git a/Performance-Monitoring.md b/Performance-Monitoring.md
index 8d22e48..fe25746 100644
--- a/Performance-Monitoring.md
+++ b/Performance-Monitoring.md
@@ -64,12 +64,30 @@ For a full export, hit **Copy as JSON**. Each line is one sample with `category`
 | `chatStream` | `finalizeStreamingMessage` | Same | End-of-turn finalize cost (target: < 1 ms) |
 | `chatStream` | `ios.send` / `ios.startResuming` / `ios.acpEvent` / `ios.handleACPEvent` | iOS `ChatView.swift` | Same shape on iOS |
 | `sessionLoad` | `mac.startACPSession` / `ios.startResuming` | Both targets | Session boot cost |
+| `sessionLoad` | `mac.fetchSkeletonMessages` / `.rows` / `.transportError` | [HermesDataService](https://github.com/awizemann/scarf/tree/main/scarf/Packages/ScarfCore/Sources/ScarfCore/Services/HermesDataService.swift) | Phase 1 of the v2.8 two-phase chat loader: user+assistant rows only, a few KB on the wire regardless of `tool_calls` blob size |
+| `sessionLoad` | `mac.fetchToolCallSkeleton` / `.rows` / `.transportError` | Same | Phase 1 Activity skeleton fetch: metadata-only, ~3 KB for 50 rows |
+| `sessionLoad` | `mac.hydrateToolCalls` / `.rows` / `.cancelled` / `.pageTimeout` / `.singleTimeout` | Same | Phase 2a paged hydration. `pageTimeout` → batch fell back to single-id retry; `singleTimeout` → individual whale row skipped |
+| `sessionLoad` | `mac.hydrateToolResults` / `mac.hydrateTools.skippedToolResults` / `.dropped` / `.complete` | Same | Phase 2b tool-result content hydrate. `skippedToolResults` fires when the opt-in setting is off (default); `dropped` fires when the parent task cancelled mid-page |
+| `sessionLoad` | `mac.lazyToolResult.fetched` | Same | Inspector pane lazy-fetched a single tool result on user expand |
+| `sessionLoad` | `mac.fetchMessages.transportError` | Same | Skeleton fetch tripped the SSH timeout — chat surfaces the partial-result banner |
+| `sessionLoad` | `mac.loadRecentSessions.coalesced` | [Mac ChatViewModel](https://github.com/awizemann/scarf/tree/main/scarf/scarf/Features/Chat/ViewModels/ChatViewModel.swift) | A second caller awaited the in-flight load instead of spawning a parallel SSH round-trip |
 | `sqlite` | `sqlite.query` / `sqlite.queryBatch` | [RemoteSQLiteBackend](https://github.com/awizemann/scarf/tree/main/scarf/Packages/ScarfCore/Sources/ScarfCore/Services/Backends/RemoteSQLiteBackend.swift) | Per-call latency over SSH (carries row count + stdout bytes) |
 | `transport` | `ssh.streamScript` (iOS) / `ssh.run` (Mac) | [CitadelServerTransport](https://github.com/awizemann/scarf/tree/main/scarf/Packages/ScarfIOS/Sources/ScarfIOS/CitadelServerTransport.swift), [SSHScriptRunner](https://github.com/awizemann/scarf/tree/main/scarf/Packages/ScarfCore/Sources/ScarfCore/Transport/SSHScriptRunner.swift) | SSH round-trip time |
+| `transport` | `ssh.cancelled` | [SSHScriptRunner](https://github.com/awizemann/scarf/tree/main/scarf/Packages/ScarfCore/Sources/ScarfCore/Transport/SSHScriptRunner.swift) | Parent task cancellation reached the ssh subprocess (v2.8) — terminated within 100ms instead of running to its 30s deadline |
 | `diskIO` | `loadConfig` / `loadCronJobs` | [HermesFileService](https://github.com/awizemann/scarf/tree/main/scarf/scarf/Core/Services/HermesFileService.swift) | Hot disk reads. `loadConfig` also logs caller stack frames in Full mode |
 
 Adding a new measure point is two lines (see Developer Guide below).
 
+### v2.8 perf architecture: skeleton-then-hydrate + cancellation propagation
+
+Three patterns landed in v2.8 that anyone touching remote-context code should know about:
+
+**Skeleton-then-hydrate.** Heavy SSH fetches (`fetchMessages`, `fetchRecentToolCalls`) used to pull the FAT column set in one shot, which routinely tripped the 30s SSH timeout on chats with multi-page tool result blobs. The new pattern: a fast skeleton fetch projects only the columns needed to render placeholder rows (NULLs the heavy ones at the SQL level), then a paged background hydration fills the rest in. Used by chat-resume (`fetchSkeletonMessages` + `hydrateAssistantToolCalls`) and Activity (`fetchRecentToolCallSkeleton` + same hydrate). Pages run in 5-id batches; if a page times out, an L1 single-id retry isolates the whale so the rest of the batch still hydrates.
+
+**Cancellation propagation through SSH.** `Task.detached { … }` doesn't inherit cancellation from the awaiting parent, and `Task<…> { … }` (unstructured) also drops the signal. Without explicit bridging, cancelling a chat-load Task only unwound Swift state; the underlying ssh subprocess kept running for the full 30s, pinning a remote sqlite query and a ControlMaster session slot. v2.8 wires `withTaskCancellationHandler` through `SSHScriptRunner` and `RemoteSQLiteBackend.query` so parent cancellation reaches the `Process` and calls `proc.terminate()` within 100ms. The new `ssh.cancelled` event surfaces this.
+
+**In-flight coalescing.** `loadRecentSessions` (Mac chat sidebar) coalesces against an in-flight task. File-watcher deltas during streaming used to stack 2-3 parallel `loadRecentSessions` tasks; now subsequent callers await the active one. New `mac.loadRecentSessions.coalesced` event tracks how often the dedup fires.
+
 ## Capture recipe for a useful baseline
 
 1. Build + run the latest version.
diff --git a/Project-Templates.md b/Project-Templates.md
index 8499b4f..e940974 100644
--- a/Project-Templates.md
+++ b/Project-Templates.md
@@ -184,6 +184,16 @@ Templates can ship project-scoped slash commands by listing each name in `conten
 
 `schemaVersion` bumps to **3** only when a bundle ships slash commands. v1 and v2 bundles continue to install identically (the installer accepts schemaVersion 1, 2, and 3). See [Slash Commands](Slash-Commands) for the file format and substitution rules.
 
+### v2.7 widget vocabulary expansion (no schema bump)
+
+Scarf v2.7 added five new widget types — `markdown_file`, `log_tail`, `cron_status`, `image`, `status_grid` — plus a `sparkline` field on `stat` and a typed status enum on `list` items. **None of these require a `schemaVersion` bump.** They're additive within `dashboard.json` itself, so:
+
+- v1, v2, v3 bundles that use only the original 7 widget types keep working byte-identically on v2.7+.
+- Bundles that adopt new widget types still validate against the existing manifest schema — only the catalog validator's vocabulary list ([`tools/widget-schema.json`](https://github.com/awizemann/scarf/blob/main/tools/widget-schema.json)) was extended.
+- A v2.7-authored dashboard installed into a pre-v2.7 Scarf renders unknown widgets as a clearly-labeled error card (not a crash), so forward-incompatibility degrades gracefully.
+
+See [Projects](Projects-and-Profiles) for the full widget catalog and the typed status badge synonyms.
+
 ### `AGENTS.md` contract
 
 `AGENTS.md` is the single source of truth for what the project does and how to operate it. It must:
@@ -285,4 +295,4 @@ Use `{{PROJECT_DIR}}` in the cron prompt. Hermes doesn't set a CWD for cron runs
 - [Release Notes Index](Release-Notes-Index) — v2.2.0 for the full launch notes.
 
 ---
-_Last updated: 2026-04-29 — Scarf v2.5.2 (remote-aware parent-directory pick on remote server contexts)_
+_Last updated: 2026-05-04 — Scarf v2.7 (5 new widget types; no manifest schema bump required)_
diff --git a/Projects-and-Profiles.md b/Projects-and-Profiles.md
index 04136c9..1ccc9c5 100644
--- a/Projects-and-Profiles.md
+++ b/Projects-and-Profiles.md
@@ -6,17 +6,26 @@ Two distinct concepts in adjacent sidebar items. **Projects** are agent-generate
 
 A project is any directory you tell Scarf about — typically a code repo, but anything works. Each project gets a custom dashboard composed of widgets defined in `<project>/.scarf/dashboard.json`.
 
-**Widget types** (from [`ProjectDashboard.swift`](https://github.com/awizemann/scarf/blob/main/scarf/scarf/Core/Models/ProjectDashboard.swift)):
+**Widget types** (canonical vocabulary lives at [`tools/widget-schema.json`](https://github.com/awizemann/scarf/blob/main/tools/widget-schema.json); each type maps to a Swift view under [`scarf/scarf/Features/Projects/Views/Widgets/`](https://github.com/awizemann/scarf/tree/main/scarf/scarf/Features/Projects/Views/Widgets)):
 
-| Type | Purpose |
-|---|---|
-| `stat` | Single metric: value + label + optional icon and color. |
-| `progress` | Progress bar with label. |
-| `text` | Markdown / plain text block. |
-| `table` | Columns + rows. |
-| `chart` | Line / bar / area / pie with `ChartSeries[]` of `ChartDataPoint{x, y}`. |
-| `list` | Bulleted list with optional status badges. |
-| `webview` | Embedded web view (URL + height). |
+| Type | Since | Purpose |
+|---|---|---|
+| `stat` | v2.2 | Single metric: value + label + optional icon, color, and inline `sparkline: [Number]` trend (v2.7+). |
+| `progress` | v2.2 | Progress bar with label (0.0..1.0). |
+| `text` | v2.2 | Inline markdown / plain text block. |
+| `table` | v2.2 | Columns + rows of strings. |
+| `chart` | v2.2 | Line / bar / area / pie with `ChartSeries[]` of `ChartDataPoint{x, y}`. |
+| `list` | v2.2 | Bulleted list with optional **typed** status badges per item — see Status badges below. |
+| `webview` | v2.2 | Embedded web view (URL + height). Including any webview also exposes a Site tab. |
+| `markdown_file` | v2.7 | Renders a markdown file from `<project>/<path>`. Refreshes when any file under `.scarf/` changes. |
+| `log_tail` | v2.7 | Tails the last `lines` of a file (default 20), monospaced; ANSI codes stripped. |
+| `cron_status` | v2.7 | Last run / next run / state for a Hermes cron job by `jobId`, plus a small log tail. |
+| `image` | v2.7 | Local image (`path` relative to project root) or remote `url`. |
+| `status_grid` | v2.7 | Compact NxM grid of colored cells, one per service / item. Reuses the typed status enum. |
+
+**Status badges (typed in v2.7):** `list` items and `status_grid` cells accept a `status` field that maps to a colored badge. Canonical values are `success`, `warning`, `danger`, `info`, `pending`, `done`, `neutral`. Common synonyms also work (`ok` / `up` → success, `down` / `error` / `failed` → danger, `active` → info, `complete` → done). Unknown strings render as plain text — old dashboards that used ad-hoc statuses keep working byte-identically.
+
+**Auto-refresh (v2.7):** Scarf watches each project's entire `<project>/.scarf/` directory, not just `dashboard.json`. So when a cron job atomically writes `<project>/.scarf/reports/uptime.md` (write-temp + rename), the `markdown_file` widget pointing at it refreshes automatically. _Limitation:_ in-place appends to an existing file (`>> file.log`) don't tick the watcher — the cron job should write atomically, or `touch dashboard.json` after each run to force a refresh. Remote (SSH-attached) projects share the same coverage via 3-second mtime polling.
 
 **The Hermes pattern:** ask your agent to build and maintain the dashboard for you. "Update `.scarf/dashboard.json` to show test pass rate, lines of code, and the open PR list." Scarf renders the result; the agent maintains it.
 
@@ -58,4 +67,4 @@ Remote SSH contexts don't yet auto-resolve `active_profile` — `HermesPathSet.d
 - [Settings](Gateway-Cron-Health-Logs) — exposes "Backup & Restore" buttons (`hermes backup` / `hermes import`) at the profile level.
 
 ---
-_Last updated: 2026-04-29 — Scarf v2.5.2 (remote-aware profile import/export sheets)_
+_Last updated: 2026-05-04 — Scarf v2.7 (project-wide auto-refresh, 5 new widget types, typed status enum)_