383 Commits

Author SHA1 Message Date
Alan Wizemann fd80f4f95a Create FUNDING.yml 2026-05-07 12:55:53 +02:00
Alan Wizemann 9f240ae291 chore: Bump version to 2.7.1 v2.7.1 2026-05-07 12:46:11 +02:00
Alan Wizemann 9c149b288b fix(docs): restore Sonoma compatibility messaging in BUILDING.md + CONTRIBUTING.md
Scarf's `MACOSX_DEPLOYMENT_TARGET` is `14.6` (Sonoma) on the main
`scarf` target, set in 86762ea. Sonoma support is intentional —
several users dogfood on macOS 14.x and we want to keep them on the
release channel. Yesterday's BUILDING.md and the long-stale
CONTRIBUTING.md statement both claimed macOS/Xcode 26.x as minimums,
which would have steered Sonoma contributors and users away from a
build that actually runs on their box.

Correct values:

- Runtime min: **macOS 14.6 (Sonoma)** — matches the deployment target.
- Build min: **Xcode 16.0** — needed for Swift 6 strict-concurrency
  features the codebase uses.

Add a load-bearing callout to BUILDING.md so future doc edits don't
silently raise the floor again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:13:18 +02:00
Alan Wizemann 37afbdeffc feat(build): contributor-friendly local-build.sh + BUILDING.md
Adds `scripts/local-build.sh` for unsigned command-line Debug builds
so contributors without an Apple Developer account can clone, build,
and run without provisioning gymnastics. The script:

- Detects arm64 / x86_64
- Verifies xcode-select, xcrun, xcodebuild are present
- Probes the Metal toolchain and offers an interactive install (gated
  on `[[ -t 0 && -z "${CI:-}" ]]` — CI never gets prompted)
- Resolves Swift packages, builds Debug with signing disabled
- Optionally `ditto`s the result to /Applications/scarf.app on
  explicit y/N

`BUILDING.md` documents prerequisites alongside the script. The existing
canonical Release universal CLI build in README stays; `local-build.sh`
is an alternative for contributors, not a replacement for the
shipping build.
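The Metal-toolchain prompt is gated on `[[ -t 0 && -z "${CI:-}" ]]`. The sketch below restates that predicate in Swift as a pure function so its truth table can be checked without a terminal. `shouldPromptInteractively` is a hypothetical name; the script itself does this in bash.

```swift
import Foundation

/// Mirrors the shell gate `[[ -t 0 && -z "${CI:-}" ]]`:
/// prompt only when stdin is a TTY and CI is unset or empty.
func shouldPromptInteractively(stdinIsTTY: Bool, environment: [String: String]) -> Bool {
    // `${CI:-}` expands to "" when CI is unset, so unset and empty behave the same.
    let ci = environment["CI"] ?? ""
    return stdinIsTTY && ci.isEmpty
}

// Live check against the current process.
let liveAnswer = shouldPromptInteractively(
    stdinIsTTY: isatty(STDIN_FILENO) != 0,
    environment: ProcessInfo.processInfo.environment
)
print("would prompt:", liveAnswer)
```

Any non-empty `CI` value (for example `CI=true` on most hosted runners) suppresses the prompt, as does a non-TTY stdin such as a pipe.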

Cherry-picked from #76 with thanks to @unixwzrd. BUILDING.md's
prerequisites are corrected to match the actual deployment target
(macOS 26.2, Xcode 26.2+).

Co-Authored-By: M S <unixwzrd.register@mac.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:08:33 +02:00
Alan Wizemann bfd9bab9a0 fix(health): stop external dashboards by port, not pkill -f
`stopDashboard()` used to fall back to `pkill -f "hermes dashboard"`
when the running dashboard wasn't a Scarf-spawned subprocess. That's
broad enough to match shell history, log tails, README readers, and
this very source file — anything with the substring "hermes
dashboard" in its argv was a kill target.

Replace with a port-anchored lookup: `lsof -tiTCP:<port> -sTCP:LISTEN`
returns the PID actually bound to the dashboard port, then we
`SIGTERM` only that one process. Trusting the port is correct here:
Scarf owns the configured port and the user-visible intent is "stop
the thing on this port."

We deliberately omit `lsof -c hermes`. Hermes installs as a Python
shebang script (verified locally — `file ~/.local/bin/hermes` →
"a python3 script text executable"), so the kernel COMM is `python` /
`python3`, never `hermes`. A `-c hermes` filter would silently miss
every standard install.
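A minimal Swift sketch of the port-anchored stop, assuming `lsof` lives at `/usr/sbin/lsof` (its macOS location). `parseLsofPIDs` and `stopListener(onPort:)` are hypothetical helpers, not Scarf's actual `stopDashboard()`. Because `lsof -t` prints bare PIDs one per line, the parsing step is the part worth testing in isolation.

```swift
import Foundation

/// Parses `lsof -t` output (one PID per line), ignoring blank or non-numeric lines.
func parseLsofPIDs(_ output: String) -> [pid_t] {
    output.split(whereSeparator: \.isNewline)
        .compactMap { pid_t($0.trimmingCharacters(in: .whitespaces)) }
}

/// Hypothetical helper: SIGTERM only the process LISTENing on `port`.
/// Returns the PIDs it signalled.
func stopListener(onPort port: Int) throws -> [pid_t] {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/sbin/lsof")
    process.arguments = ["-tiTCP:\(port)", "-sTCP:LISTEN"]
    let pipe = Pipe()
    process.standardOutput = pipe
    try process.run()
    // Output is a few PIDs at most; read to EOF before waiting for exit.
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()
    let pids = parseLsofPIDs(String(decoding: data, as: UTF8.self))
    for pid in pids { kill(pid, SIGTERM) }  // no argv matching, so no collateral kills
    return pids
}
```

Nothing here inspects the process name, which is what lets it work when the kernel COMM is `python3` rather than `hermes`.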

Cherry-picked from #76 with thanks to @unixwzrd for the direction;
this version drops the `-c hermes` filter to actually fire on real
Hermes installs.

Co-Authored-By: M S <unixwzrd.register@mac.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:08:23 +02:00
Alan Wizemann 2e0eb63ea4 fix(health): tighten Hermes gateway pgrep so unrelated commands don't match
`hermesPIDResult()` was running `pgrep -f hermes`, which matched any
process with "hermes" anywhere in its argv — `hermes acp` chat
sessions Scarf itself spawns, `hermes -z` one-shots, log tails, even
this very file in an editor. The Dashboard "Hermes is running" badge
read true even when the gateway daemon was down.

Narrow the match to the gateway shape specifically. Two alternations
cover both invocation forms used in the wild:

- `python -m hermes_cli.main gateway run …` (the launchctl form)
- `/path/to/hermes gateway run …` (the script-path form)

Verified locally against an actual gateway PID:

    cmd=/Users/.../python -m hermes_cli.main gateway run --replace

The first alternation matches via the `-m hermes_cli.main gateway run`
boundary. All callers — `stopHermes()`, `DashboardViewModel`,
`HealthViewModel`, `SettingsViewModel`, `scarfApp` — semantically
want the gateway PID specifically, so the narrower match is the
right shape, not a behavior change.
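The commit does not reproduce the regex, so the pattern below is an illustrative assumption covering the two gateway shapes, written for `NSRegularExpression` (ICU syntax). `pgrep -f` uses POSIX extended regex, where `\s` and `\b` support varies, so this is a sketch of the matching rule rather than the exact string passed to `pgrep`.

```swift
import Foundation

// Hypothetical pattern: alternation 1 is the `python -m hermes_cli.main` form,
// alternation 2 is a `hermes` script at line start or after a path separator.
let gatewayPattern = #"(-m\s+hermes_cli\.main\s+gateway\s+run\b)|((^|/)hermes\s+gateway\s+run\b)"#
let gatewayRegex = try! NSRegularExpression(pattern: gatewayPattern)

/// True only for command lines that look like the Hermes gateway daemon.
func isGatewayCommandLine(_ cmd: String) -> Bool {
    let range = NSRange(cmd.startIndex..., in: cmd)
    return gatewayRegex.firstMatch(in: cmd, range: range) != nil
}
```

Chat sessions (`hermes acp`), one-shots (`hermes -z`), and log tails no longer match, because each alternation requires the literal `gateway run` subcommand after the entry point.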

Cherry-picked from #76 with thanks to @unixwzrd for the diagnosis
and the regex.

Co-Authored-By: M S <unixwzrd.register@mac.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:08:11 +02:00
Alan Wizemann 3a3c87e033 fix(skills): scope What's New pill to Installed tab + reword updated→changed
Issue #78 — The "What's New" pill at the top of the Skills page
announced "18 new, 3 updated since you last looked" while the Updates
sub-tab simultaneously said "No Updates / All skills are up to date."
Two surfaces measuring two different things both used the word
"update": the pill counts local file deltas since the user last
clicked "Mark as seen", while the Updates body runs `hermes skills
check` to find skills with newer upstream versions available. From
the user's seat the screen contradicted itself.

Two changes:

1. Render the pill only on the Installed sub-tab (Mac + ScarfGo).
   Local file deltas are contextually meaningful only on the tab
   that surfaces installed skills; showing them above Browse Hub or
   Updates was misleading.

2. Reword the pill: "X updated since you last looked" → "X changed
   since you last looked". Keeps `SkillSnapshotDiff.updatedCount` as
   the field name (it's still about file changes, not version bumps);
   only the user-visible string changes. Removes the vocabulary
   collision with the Updates tab's separate upstream-update check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 11:51:05 +02:00
Alan Wizemann f9e3cd38f5 fix(skills): client-side filter for All-Sources hub search
Issue #79 — Browse Hub clearly listed "honcho" but searching for
"honcho" with the source picker on "All Sources" returned nothing.
Root cause is on the Hermes side: `hermes skills search <query>`
without a `--source` flag routes through the centralized
`hermes-index` source and skips the external API sources
(skills-sh, github, clawhub, lobehub, well-known, claude-marketplace).
Browse aggregates those sources too, so any skill that lives only in
the API tier shows up in browse but disappears in search. Same picker,
same query, contradictory results.

Rather than chase Hermes's index gaps, redefine "All Sources" search
in Scarf to mean filter-what-you-see — the canonical type-to-filter
UX users already expect on a list. Source-specific searches keep the
CLI shell-out for full upstream search semantics on that registry.

Implementation:
- New `lastBrowseResults` cache populated on every successful
  `browseHub()`. Setter is `internal` so the test suite can seed
  without invoking the live CLI; out-of-module callers can still
  only read.
- `searchHub()` now branches on `hubSource`. The "all" branch filters
  the cache via `localizedCaseInsensitiveContains` against name,
  description, and identifier, runs synchronously on the calling
  actor (UI invocations are already on MainActor) so the user sees
  the narrowed list without a render-tick gap.
- If the cache is empty (search-before-browse), `browseHubThenFilter`
  performs one CLI fetch, populates the cache, then applies the
  filter — failure surfaces a "Search failed" banner instead of a
  silent empty state.
- Source-specific search still shells out to
  `hermes skills search <query> --source <s> --limit 40`.
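The "all" branch amounts to a case-insensitive substring filter over the cached browse list. A minimal Swift sketch, where `HubSkill` and `filterBrowseCache` are hypothetical stand-ins for Scarf's cache element and `searchHub()` branch:

```swift
import Foundation

/// Hypothetical stand-in for one cached browse result.
struct HubSkill: Equatable {
    let name: String
    let description: String
    let identifier: String
}

/// "All Sources" search as filter-what-you-see: match the query against
/// name, description, and identifier, ignoring case.
func filterBrowseCache(_ cache: [HubSkill], query: String) -> [HubSkill] {
    let q = query.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !q.isEmpty else { return cache }  // empty query falls through to the full browse list
    return cache.filter {
        $0.name.localizedCaseInsensitiveContains(q)
            || $0.description.localizedCaseInsensitiveContains(q)
            || $0.identifier.localizedCaseInsensitiveContains(q)
    }
}
```

Since the filter is synchronous and local, it can run on the calling actor with no CLI round-trip, which is the property the commit relies on to avoid a render-tick gap.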

Adds five regression tests covering name match, description match,
case-insensitive folding, no-match message state, and the empty-query
fallthrough to browse.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 11:50:52 +02:00
Alan Wizemann a6a8cae8ff fix(transport): drain ssh stdout/stderr concurrently to unwedge >64KB payloads
Issue #77 — Sessions screen rendered empty even though Dashboard
reported 161 sessions and Activity reported 116. Root cause was a
classic pipe-buffer deadlock in SSHScriptRunner: stdout was read via
`readToEnd()` AFTER the subprocess had exited. macOS pipes default to
a 16–64 KB kernel buffer; once the remote `sqlite3 -json` script wrote
more than that to its stdout, ssh back-pressured across the wire,
sshd back-pressured sqlite3, sqlite3 blocked, the script never
finished, the 30-second timeout fired, `streamScript` threw, and
`HermesDataService.sessionListSnapshot()` swallowed the failure into
an empty array. Empty Sessions list. Dashboard kept working because
its smaller LIMIT 5 payload fit under the threshold.

Why this was a v2.7 regression specifically: 20cc3a2 folded the
previously-separate sessions + previews queries into a single batched
round-trip (perf win for remote users). The new combined payload for
~150+ sessions crossed the buffer threshold for the first time.

Fix: drain stdout/stderr concurrently with the running process via
Foundation's `FileHandle.readabilityHandler`, accumulating chunks
into an NSLock-guarded `Data` buffer. The kernel pipe never fills,
the subprocess never blocks, the script returns the full payload.
Same change applied to both the SSH path (`runOverSSH`) and the
local path (`runLocally`) — they had identical bug shapes.
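A self-contained sketch of the concurrent-drain pattern, assuming synchronous use for simplicity (the real `SSHScriptRunner` is async and adds timeouts). `LockedBuffer` and `runDraining` are hypothetical names. The key points are that the readability handlers consume bytes while the child runs, and that the caller waits for EOF on both pipes after exit, so bytes written just before exit are not lost.

```swift
import Foundation

/// NSLock-guarded byte buffer shared with readabilityHandler callbacks.
final class LockedBuffer: @unchecked Sendable {
    private let lock = NSLock()
    private var data = Data()
    private var sawEOF = false

    func append(_ chunk: Data) {
        lock.lock(); defer { lock.unlock() }
        data.append(chunk)
    }

    /// Returns true exactly once, the first time EOF is reported.
    func markEOF() -> Bool {
        lock.lock(); defer { lock.unlock() }
        if sawEOF { return false }
        sawEOF = true
        return true
    }

    var bytes: Data {
        lock.lock(); defer { lock.unlock() }
        return data
    }
}

struct DrainedOutput {
    let stdout: Data
    let stderr: Data
    let exitStatus: Int32
}

/// Runs a subprocess while draining stdout and stderr as data arrives, so a
/// large payload never fills the kernel pipe buffer and stalls the child.
func runDraining(_ executable: String, _ arguments: [String]) throws -> DrainedOutput {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: executable)
    process.arguments = arguments

    let outPipe = Pipe(), errPipe = Pipe()
    process.standardOutput = outPipe
    process.standardError = errPipe

    let outBuffer = LockedBuffer(), errBuffer = LockedBuffer()
    let reachedEOF = DispatchGroup()

    for (pipe, buffer) in [(outPipe, outBuffer), (errPipe, errBuffer)] {
        reachedEOF.enter()
        pipe.fileHandleForReading.readabilityHandler = { handle in
            let chunk = handle.availableData
            if chunk.isEmpty {
                // An empty read means EOF: detach the handler and signal once.
                handle.readabilityHandler = nil
                if buffer.markEOF() { reachedEOF.leave() }
            } else {
                buffer.append(chunk)
            }
        }
    }

    do {
        try process.run()
    } catch {
        outPipe.fileHandleForReading.readabilityHandler = nil
        errPipe.fileHandleForReading.readabilityHandler = nil
        // Balance the group so it is not released while still entered.
        for buffer in [outBuffer, errBuffer] where buffer.markEOF() { reachedEOF.leave() }
        throw error
    }

    process.waitUntilExit()
    reachedEOF.wait()  // both pipes fully drained
    return DrainedOutput(stdout: outBuffer.bytes, stderr: errBuffer.bytes,
                         exitStatus: process.terminationStatus)
}
```

The 256 KB case mirrors the regression test described above: with a read-after-exit design the child would block on a full pipe, whereas here it completes.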

Adds SSHScriptRunnerTests with three regression checks: a 256 KB
synthetic payload that would have wedged pre-fix, a small-payload
sanity round-trip, and a non-zero exit propagation check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 11:50:34 +02:00
Alan Wizemann 6b66b1c96f perf(ios): wire v2.7 perf parity — instrument iOS-only VMs + surface hydration banner + opt-in toggle
Most of the v2.7 perf work was already covered on iOS via shared
code in ScarfCore — `RichChatViewModel.loadSessionHistory` (and
its skeleton-then-hydrate path), `hydrateAssistantToolCalls`,
`fetchSkeletonMessages`, `fetchRecentToolCallSkeleton`,
`ModelPreflight.detectMismatch`, and the `RemoteSQLiteBackend`
cancellation handler all flow through to the ScarfGo chat
unchanged. `CitadelServerTransport.streamScript` already
honors `Task.isCancelled` correctly via `withThrowingTaskGroup` +
`Task.checkCancellation()`, so the SSH-cancellation-on-nav-away
chain works on iOS without the Mac-side `SSHScriptRunner` fix.

Three iOS-specific gaps closed:

* IOSCronViewModel.load + IOSMemoryViewModel.load wrapped in
  `ScarfMon.measureAsync(.diskIO, "ios.cron.load")` /
  `"ios.memory.load"` — parity with the Mac `cron.load` /
  `memory.load` events. `ios.memory.load.bytes` records the
  payload size for the loaded file.
* iOS Settings → "Chat (Scarf)" section gains a toggle bound to
  `RichChatViewModel.loadHistoricalToolResultsKey` so iOS users
  can opt into Phase 2b bulk tool-result hydration, same as the
  Mac DisplayTab. The shared key means the gate inside
  `startToolHydration` reads the right value automatically — no
  extra plumbing needed.
* iOS ChatView surfaces `isHydratingTools` as a "Loading tool
  details…" connection banner (mirrors the Mac toolbar pill
  added in v2.7 perf work). Sits between the existing
  "Thinking…" banner and the empty-view fallback so chat status
  is always honest about what the agent and Scarf are doing.

Both Mac and iOS targets build clean; all 321 ScarfCore tests
pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:26:25 +02:00
Alan Wizemann 97ec4d2882 chore: Bump version to 2.7.0 v2.7.0 2026-05-05 20:41:39 +02:00
Alan Wizemann cd5bb32a21 release: prep v2.7.0 — consolidated notes + in-app Sparkle release notes
Rolls up everything since v2.6.5 (36 commits across remote-perf,
project wizard, dashboard widgets, OAuth resilience, ScarfMon
instrumentation, and the v2.7 skeleton-then-hydrate redesign) into
a single 2.7.0 release.

* releases/v2.7.0/RELEASE_NOTES.md — full consolidated notes,
  reorganized around the throughline (slow-remote performance) with
  thematic sections covering skeleton-then-hydrate loaders, SSH
  cancellation, project wizard + Keychain cron secrets, dashboard
  widgets, OAuth resilience, and ScarfMon. Replaces the previously
  drafted dashboard-only v2.7.0 stub and the separate v2.8 wizard
  stub (both unreleased).
* releases/v2.8/ — deleted; folded into v2.7.
* README.md — "What's New in 2.6" → "What's New in 2.7" with the
  five-section summary linking out to the full notes.

* tools/render-release-notes.py — stdlib-only Markdown → HTML
  renderer covering the subset of GitHub-flavored markdown that
  release notes use (## / ### headings, paragraphs, ul lists,
  fenced code, inline code/bold/italic/links, hr). Output includes
  a small <style> block tuned for Sparkle's update alert WebKit
  view (light + dark variants via prefers-color-scheme).
* scripts/release.sh — render the active RELEASE_NOTES.md and
  inject the result as <description><![CDATA[...]]></description>
  on the appcast item. Sparkle's standard updater renders this in
  the in-app update sheet so users see release-specific "what's
  new" alongside the version number, not just the bare version.
  Falls back to a "see GitHub release page" placeholder when the
  notes file is missing.
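The commit does not say how `release.sh` handles HTML containing a literal `]]>`, which would close the CDATA section early. A common approach is to split each occurrence across two adjacent CDATA sections. The sketch below shows that approach in Swift for consistency with the rest of the codebase; `appcastDescription(html:)` is a hypothetical helper, not part of the release tooling.

```swift
import Foundation

/// Wraps rendered HTML for an appcast <description> element.
/// Each "]]>" is emitted as "]]" + end-CDATA + start-CDATA + ">", so the
/// parsed text content still reads "]]>".
func appcastDescription(html: String) -> String {
    let escaped = html.replacingOccurrences(of: "]]>", with: "]]]]><![CDATA[>")
    return "<description><![CDATA[" + escaped + "]]></description>"
}
```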

User runs ./scripts/release.sh 2.7.0 to ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 20:31:27 +02:00
Alan Wizemann 5e23b59697 test(model-preflight): cover detect-mismatch + fix newline-trim bug
* New ModelPreflightTests suite (19 tests) covering both `check(_:)`
  and the v2.8 `detectMismatch(_:)` paths. Pins the dogfooding
  scenario (anthropic-prefixed model + nous active provider after
  Credential Pools OAuth swap), the case-insensitive prefix match,
  empty-prefix / empty-bare-model edge cases, and multi-slash model
  ids (OpenRouter style).

* Bug fix surfaced by the tests: `ModelPreflight` was using
  `trimmingCharacters(in: .whitespaces)` which doesn't strip
  newlines. A stray `\n` in a hand-edited config.yaml would either
  miss the missing-fields classifier OR false-positive the mismatch
  banner (showing "anthropic" vs "anthropic\n"). Switched both
  trims to `.whitespacesAndNewlines`.
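A minimal sketch of the mismatch check and the trimming fix, assuming only the first `/`-separated segment of `model.default` is treated as the provider prefix. `ProviderMismatch` and this `detectMismatch(modelDefault:provider:)` signature are illustrative and do not match `ModelPreflight`'s actual API.

```swift
import Foundation

/// Hypothetical result type; field names are illustrative.
struct ProviderMismatch: Equatable {
    let modelPrefix: String
    let activeProvider: String
}

/// Reports a mismatch when `model.default` has a `<provider>/...` prefix that
/// differs (case-insensitively) from `model.provider`.
func detectMismatch(modelDefault: String, provider: String) -> ProviderMismatch? {
    // .whitespacesAndNewlines also strips a stray "\n" from hand-edited YAML;
    // .whitespaces alone would leave it in place.
    let model = modelDefault.trimmingCharacters(in: .whitespacesAndNewlines)
    let active = provider.trimmingCharacters(in: .whitespacesAndNewlines)
    // Only the first segment counts as the prefix; multi-slash ids keep the rest.
    guard let slash = model.firstIndex(of: "/") else { return nil }
    let prefix = String(model[..<slash])
    let bare = model[model.index(after: slash)...]
    guard !prefix.isEmpty, !bare.isEmpty, !active.isEmpty else { return nil }
    if prefix.caseInsensitiveCompare(active) == .orderedSame { return nil }
    return ProviderMismatch(modelPrefix: prefix, activeProvider: active)
}
```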

perf(observability): instrument Tier C load paths + fetchSessionPreviews

No behavior change — adds ScarfMon coverage so future captures show
how often Memory/Skills/Cron/Curator/SessionPreviews load paths fire
and what they cost on remote (each is multiple sequential SFTP RTTs
that pre-fix were invisible). New events:

* `mac.fetchSessionPreviews` / `.rows` / `.transportError`
* `memory.load` / `.bytes`
* `cron.load` / `.jobs`
* `skills.load` / `.count`
* `curator.load` / `.bytes`

All 321 ScarfCore tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 20:03:35 +02:00
Alan Wizemann 09e33b2999 perf(chat,activity,transport): skeleton-then-hydrate loaders + SSH cancellation propagation
Major perf overhaul for slow-remote contexts. Chats and Activity now
render in <2s instead of timing out at 30s; abandoned SSH work is
killed within 100ms instead of pinning a ControlMaster session.

* Skeleton-then-hydrate chat loader. New `fetchSkeletonMessages`
  selects user+assistant rows only (skips role='tool', NULLs
  tool_calls + reasoning at the SQL level). Wire payload bounded by
  conversational text alone — sub-second on remote regardless of
  underlying tool result blob sizes. Background `startToolHydration`
  pages through `hydrateAssistantToolCalls` (5-id batches) to splice
  tool calls in. Tool-result CONTENT is opt-in via Settings → Display
  → "Load tool results in past chats" (default off); inspector pane
  lazy-fetches per-result via `fetchToolResult(callId:)` on expand.

* Skeleton-then-hydrate Activity loader. New
  `fetchRecentToolCallSkeleton` returns metadata-only rows in ~3 KB
  for 50 entries; placeholder ActivityRows render immediately, real
  per-call entries swap in as paged hydration completes. Loading
  pill in the page header, orange transport-error banner replaces
  the pre-fix silent empty state.

* SSH cancellation propagation. `Task.detached` and unstructured
  `Task<...> { ... }` don't inherit cancellation from awaiting
  parents — without bridging, killing a Swift Task left the ssh
  subprocess running for the full 30s deadline, pinning a remote
  sqlite query and a ControlMaster session. Wired
  `withTaskCancellationHandler` through `SSHScriptRunner.run` and
  `RemoteSQLiteBackend.query`; cancellation now reaches `Process`
  within ~100ms. New `ssh.cancelled` ScarfMon event.

* L1 single-id retry. When a 5-id `hydrateAssistantToolCalls` page
  trips the 30s timeout (one row carries an oversized tool_calls
  blob — long Edit args, big diffs), fall back to single-id queries
  to isolate the whale. Non-whale rows in the same batch hydrate
  normally; whale row stays bare. New `mac.hydrateToolCalls.singleTimeout`
  event tracks how often the recovery fires.

* L2 in-flight coalescing for `loadRecentSessions`. File-watcher
  deltas during streaming used to stack 2-3 parallel sessions-list
  reload tasks; subsequent callers now await the active one. New
  `mac.loadRecentSessions.coalesced` event tracks dedup hits.

* Loading-state UX hardening. New `isStartingSession` flag flips
  synchronously on user click so the chat sidebar greys + disables
  immediately instead of waiting for `client.start()` to return
  (5-7s on remote). Phase-typed status: "Spawning hermes acp…" →
  "Authenticating…" → "Loading session…" → "Loading history…" →
  "Ready". `ChatSessionListPane` overlays a ProgressView showing
  the current phase.

* Partial-result detection. `fetchMessagesOutcome` distinguishes a
  transport failure from a genuine empty result; `loadSessionHistory`
  surfaces "Couldn't load full chat history — connection timed out"
  through the existing acpError triplet so the user sees what
  happened instead of a silent empty transcript.

* Model/provider mismatch banner. `ModelPreflight.detectMismatch`
  recognizes when `model.default` carries a `<provider>/...` prefix
  that disagrees with `model.provider` (e.g. anthropic prefix +
  nous active provider after switching OAuth via Credential Pools).
  Banner offers one-click fix in either direction. Companion: ACP
  error classifier recognizes `model_not_found` / `404 messages`
  and surfaces "Hermes pins each session to its original model —
  start a new chat" so the pinned-model failure mode has a clear
  recovery path.

* OAuth-completion provider swap prompt. After successful OAuth in
  Credential Pools, if the just-authed provider differs from
  `model.provider` in config.yaml, surface "Switch active provider
  to <name>?" with [Switch] / [Keep current] instead of
  auto-dismissing.
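The paged hydration with L1 single-id retry can be sketched as below, assuming a synchronous, throwing `fetch` closure in place of the real remote query. `QueryTimeout` and `hydrateToolCalls(ids:batchSize:fetch:)` are hypothetical. A timed-out batch is retried one id at a time, so only the oversized row stays bare.

```swift
/// Thrown by the (hypothetical) fetch closure when a query hits its deadline.
struct QueryTimeout: Error {}

/// Fetches ids in batches; on a batch timeout, retries each id alone so one
/// oversized row can't block its neighbours. Other errors propagate.
func hydrateToolCalls(
    ids: [Int],
    batchSize: Int = 5,
    fetch: ([Int]) throws -> [Int: String]
) throws -> (hydrated: [Int: String], bare: [Int]) {
    var hydrated: [Int: String] = [:]
    var bare: [Int] = []
    var start = 0
    while start < ids.count {
        let batch = Array(ids[start..<min(start + batchSize, ids.count)])
        start += batchSize
        do {
            let rows = try fetch(batch)
            hydrated.merge(rows) { _, new in new }
        } catch is QueryTimeout {
            for id in batch {
                do {
                    let row = try fetch([id])
                    hydrated.merge(row) { _, new in new }
                } catch is QueryTimeout {
                    bare.append(id)  // the whale stays unhydrated
                }
            }
        }
    }
    return (hydrated, bare)
}
```

In the worst case a batch costs one timeout plus `batchSize` single-id queries, which is why tracking how often the recovery fires is useful.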

All 302 ScarfCore tests pass. New ScarfMon events documented in the
Performance-Monitoring wiki page.
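The cancellation bridge described in the SSH bullet can be sketched with `withTaskCancellationHandler` around a `Process`. `runCancellable` is a hypothetical helper, not `SSHScriptRunner.run`. The extra `Task.isCancelled` check covers cancellation that arrives before `run()`, when the handler would have found nothing to kill.

```swift
import Foundation

/// Runs a subprocess and terminates it if the awaiting task is cancelled.
/// Returns the termination status (the signal number if it was killed).
func runCancellable(_ executable: String, _ arguments: [String]) async throws -> Int32 {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: executable)
    process.arguments = arguments

    return try await withTaskCancellationHandler {
        try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<Int32, Error>) in
            process.terminationHandler = { finished in
                continuation.resume(returning: finished.terminationStatus)
            }
            do {
                try process.run()
            } catch {
                process.terminationHandler = nil
                continuation.resume(throwing: error)
                return
            }
            // Cancellation that landed before run() had nothing to terminate.
            if Task.isCancelled { process.terminate() }
        }
    } onCancel: {
        // May run on another thread as soon as the parent task is cancelled.
        if process.isRunning { process.terminate() }  // SIGTERM
    }
}
```

Without the bridge, cancelling the Swift task would leave the child running until its own deadline, which is the failure mode the commit describes.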

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 19:43:53 +02:00
Alan Wizemann 9f2e2ecfcd perf(chat): exclude reasoning_content from initial fetch + drop page size to 25
The 160-message thinking-model session still timed out at the 30s
ceiling even after dropping page size 200→50 in commit a193003.
ScarfMon trace:

  mac.fetchMessages    30,105,329,125 ns ← 30s timeout fired
  mac.hydrateMessages.rows  count=1     ← 1 partial row only

Root cause: `reasoning_content` is huge on thinking models (20+
KB per row). Even 50 rows × 30 KB = 1.5 MB JSON shipping over a
420ms-RTT remote SSH channel exceeds the budget. The chat
appeared empty AGAIN.

Three cuts:

1. **`messageColumnsLight`** — same as messageColumns but omits
   `reasoning_content`. Used by `fetchMessages` so the bulk
   wire payload is small. `messageFromRow` reads
   reasoning_content via `row.optionalString(at: 11)` which
   gracefully returns nil when the column isn't present, so the
   shape change is transparent.

2. **`fetchReasoningContent(for:)`** — single-row lazy fetch
   the inspector pane calls when the user expands a thinking
   disclosure. One small SSH round-trip per inspection vs. paying
   for ALL reasoning content on every session boot.

3. **`HistoryPageSize.initial` 50 → 25** — sized for the lite
   column shape with margin for sessions that include some heavy
   tool-call payloads. The "Load earlier" affordance still
   pages back through older messages.
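The transparent shape change depends on a bounds-checked column accessor. A minimal sketch, assuming rows are decoded positionally in SELECT order and that `reasoning_content` is the last column (index 11) of the full list. This `Row` type is hypothetical and not Scarf's row wrapper.

```swift
/// Hypothetical positional row: values in SELECT column order.
struct Row {
    let values: [String?]

    /// Returns nil for a NULL value or for an index past the last column,
    /// so a shorter "light" column list can share one decoder.
    func optionalString(at index: Int) -> String? {
        guard values.indices.contains(index) else { return nil }
        return values[index]
    }
}
```

This only works while the omitted column stays at the end of the list; dropping a column from the middle would shift every later index.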

Net effect on the user-reported case: 160-message session loads
the most-recent 25 messages in ~5-10s (one SSH round-trip ~420ms
plus ~3 KB × 25 = 75 KB wire). The remaining 135 are reachable
via Load earlier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:28:40 +02:00
Alan Wizemann 1eb5c92f6a fix(aux-tab): correct nested-YAML parser so unknown-task surface works on remote
Bug 1 — the previous parser collected every indented child under
`auxiliary:` as if it were a task name, including leaf fields
(provider, model, base_url, api_key, timeout). Result: bogus rows
on local where the parser happened to fire, plus pollution of
the unknown-tasks set with field names that subtractFrom-known
left orphaned.

Bug 2 — the flat-dot-path branch (`auxiliary.X.Y:`) was dead
code. config.yaml is always nested YAML; the dot-path form only
appears in interactive `hermes config get` output, never on
disk. Removing it.

User reported the unknown-tasks section showed on local but not
on remote. Most likely root cause: the buggy parser surfaced
junk on local (where their config has nested-form aux settings)
while the dead flat-path branch never fired on remote either,
so remote silently rendered nothing. With the parser fixed both
contexts now surface real unknown task names if any are
present.

Rewrite as a clean two-pass walker:
- First nested line inside the block locks taskIndent.
- Only collect at exactly taskIndent (skip leaf fields deeper).
- Tolerate CRLF line endings, blank lines, and YAML comments
  without resetting block state.
- Handles 2-space and 4-space indent equally.

Verified manually with four fixture shapes: 2-space, 4-space,
with-comments-and-blanks, no-aux-block. All correct.
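The walker rules above translate into a short line scanner. This Swift sketch is illustrative: `auxiliaryTaskNames(in:)` is a hypothetical name, and it only recognises bare `key:` lines at the locked indent, which is enough for the four fixture shapes listed.

```swift
import Foundation

/// Collects task keys directly under a top-level `auxiliary:` block.
/// The first nested line locks the task indent; deeper lines are leaf fields.
func auxiliaryTaskNames(in yaml: String) -> [String] {
    var tasks: [String] = []
    var inBlock = false
    var taskIndent: Int? = nil

    // Normalise CRLF first: in Swift "\r\n" is a single Character, so
    // splitting on "\n" alone would not separate CRLF lines.
    let normalized = yaml.replacingOccurrences(of: "\r\n", with: "\n")
    for rawLine in normalized.split(separator: "\n", omittingEmptySubsequences: false) {
        let line = String(rawLine)
        let trimmed = line.trimmingCharacters(in: .whitespaces)
        // Blank lines and comments never change block state.
        if trimmed.isEmpty || trimmed.hasPrefix("#") { continue }

        let indent = line.prefix(while: { $0 == " " }).count
        if indent == 0 {
            inBlock = (trimmed == "auxiliary:")
            taskIndent = nil
            continue
        }
        guard inBlock else { continue }
        if taskIndent == nil { taskIndent = indent }  // first nested line locks the indent
        guard indent == taskIndent, trimmed.hasSuffix(":") else { continue }
        tasks.append(String(trimmed.dropLast()))
    }
    return tasks
}
```

Because the indent is learned rather than hard-coded, 2-space and 4-space files go through the same path.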

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:12:55 +02:00
Alan Wizemann bccaba0742 feat(acp,aux): classify resolve_provider_client errors + surface unknown aux tasks
Two fixes for the user-reported "ACP -32603 Internal error" after
removing a Nous OAuth provider while config.yaml still referenced
nous for an auxiliary task. The actual stderr was clear:

  agent.auxiliary_client: resolve_provider_client: nous requested
    but Nous Portal not configured

But Scarf's chat banner showed only the bare JSON-RPC code and
the user had no actionable path through the UI.

**ACPErrorHint.classify** now pattern-matches the
`resolve_provider_client: <name> requested but` stderr line and
extracts the provider name. Surfaces:

  An auxiliary task is configured to use `<name>` but that
  provider isn't authenticated. Open Settings → Aux Models, or
  check ~/.hermes/config.yaml for auxiliary.<task>.provider: <name>
  and switch it to your active provider (or set it to `auto`).

Routed through the existing chat-banner pipeline that already
catches OAuth revocation and missing-credentials errors.
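Extracting the provider name from that stderr line can be sketched with `NSRegularExpression`. The pattern and the function name `unauthenticatedAuxProvider(inStderr:)` are assumptions for illustration, not the actual `ACPErrorHint.classify` code. `\s+` between words tolerates the line wrapping seen in the captured stderr.

```swift
import Foundation

/// Returns the provider named in a "resolve_provider_client: <name> requested but"
/// stderr line, or nil if the line is absent.
func unauthenticatedAuxProvider(inStderr stderr: String) -> String? {
    let pattern = #"resolve_provider_client:\s+(\S+)\s+requested\s+but"#
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: stderr, range: NSRange(stderr.startIndex..., in: stderr)),
          let range = Range(match.range(at: 1), in: stderr)
    else { return nil }
    return String(stderr[range])
}
```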

**AuxiliaryTab** gains an "Other tasks in config.yaml" section
that surfaces aux task keys present in YAML but not in Scarf's
typed list (vision, web_extract, compression, session_search,
skills_hub, approval, mcp, flush_memories, curator). Common
case: `auxiliary.summarization.provider: nous` left over from
older Hermes versions or hand-edited configs. Each unknown task
gets a one-click "Reset provider" button that writes
`auxiliary.<key>.provider: auto` — the most-actionable fix
for the OAuth-removal failure mode. Detection scans both
flat-dot-path and nested YAML shapes so it works regardless of
how Hermes dumped the file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:00:48 +02:00
Alan Wizemann 4684b9deed feat(credential-pools): OAuth remove button + auto-refresh on auth.json change
User reports the Nous OAuth provider still showed in the
credential pool after they 'removed' it, and Reload didn't help.
Two underlying bugs:

**Bug 1 — no UI path to remove OAuth providers.** The pool view
had a Re-authenticate button on each OAuth row but no remove.
Users who switched active provider thought that removed Nous;
the OAuth tokens stayed in auth.json and the row kept rendering.
Add a trash icon next to Re-authenticate that calls
`hermes auth logout <provider>` after a confirmation dialog.
ViewModel route is `removeOAuthProvider` mirroring
`removeCredential`.

**Bug 2 — view didn't refresh on external auth.json changes.**
Pool view subscribed only to .onAppear and sheet-dismiss. A
terminal `hermes auth logout` or another window's OAuth flow
left the view stale until manually re-entered. Wire up
`fileWatcher.lastChangeDate` so any auth.json mtime tick
triggers a reload (the file watcher already polls auth.json
on the remote SSH path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:46:41 +02:00
Alan Wizemann f6dc45b397 feat(scarfmon): track empty-assistant turns + document Nous quirk
User reports chats "dying" on Nous models — screenshot shows the
assistant bubble stuck with `(°□°) deliberating...` and a
1.7s turn-duration pill (turn DID complete; the content is the
problem). The literal placeholder string isn't in Scarf's source;
it's coming from Hermes or Nous itself when the model emits a
brief thought stream and then fails to produce any visible
output.

ScarfMon trace confirms the failure mode:
  mac.sendViaACP    →  firstThoughtByte (25 bytes)
  mac.handleACPEvent  ✓
  mac.sendPrompt     ✓ (1.7s, normal)
  finalizeStreamingMessage  ✓ (turn cleanly closed)

So Scarf sees no transport error — the turn finalized normally
with empty assistant text plus a small thought stream. The
visible "deliberating" text is content Hermes/Nous chose to
substitute for the missing response.

Adds `mac.emptyAssistantTurn` event (category .chatStream) that
fires whenever a turn finalizes with empty `streamingAssistantText`
and empty `streamingToolCalls`. Bytes carry the thinking-text
length so we can distinguish:
  - bytes=0: total empty turn (model produced nothing)
  - bytes>0: thoughts-only turn (model thought but didn't answer)

Both are user-visible failures. The fix is upstream — Hermes
should refuse to finalize a turn with no response and surface
an error, OR Nous should not return empty responses with the
placeholder string. Document this finding so a future capture
that shows multiple `mac.emptyAssistantTurn` events confirms
the rate / model-correlation.
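The firing rule for `mac.emptyAssistantTurn` and its two `bytes` cases can be expressed as a small classifier. `FinalizedTurn` and `classifyTurn` are hypothetical names used only to make the rule explicit; bytes are counted as the UTF-8 length of the thinking text.

```swift
/// Illustrative classification of a finalized turn; names are hypothetical.
enum FinalizedTurn: Equatable {
    case normal
    case totallyEmpty        // bytes = 0: model produced nothing
    case thoughtsOnly(Int)   // bytes > 0: model thought but did not answer
}

func classifyTurn(assistantText: String, toolCallCount: Int, thinkingText: String) -> FinalizedTurn {
    // Any visible text or any tool call makes the turn normal.
    guard assistantText.isEmpty, toolCallCount == 0 else { return .normal }
    let bytes = thinkingText.utf8.count
    return bytes == 0 ? .totallyEmpty : .thoughtsOnly(bytes)
}
```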

For now Scarf surfaces the same UX as before (no UI change in
this commit). A follow-on commit could intercept this case and
replace the bubble with a clearer "Model returned no response"
banner, but that requires a confident heuristic for which
empty-finalize cases are real failures vs. legitimate
no-response turns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:40:21 +02:00
Alan Wizemann f2ddcbbd60 feat(model-picker): add search filter to Nous overlay model list
Nous returned 402 models in the recent perf capture (~496 KB of
JSON). The picker's existing top-bar search field already filters
the catalog list (`filteredModels`) but the Nous overlay path
showed all 402 unfiltered, making it nearly unusable.

Add `filteredNousModels` mirroring the `filteredModels` shape:
filters `nousModels` by case-insensitive substring match against
both `id` and `owned_by`. Updates the empty-state overlay so
"no matches" surfaces a different message from "no models
loaded" — the user knows the catalog is fine, the search just
didn't match.

User feedback: "we need a search in the model picker, some of
these lists are large and unorganized."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:38:30 +02:00
Alan Wizemann a193003842 fix(chat): paginate session-load + race-guard against session switch
Two related bugs from remote-context perf captures.

**Bug 1 — 30s timeout fetching the 157-message session.** The
initial page size was 200 messages. For a session including
`reasoning_content` from a thinking model, that produces enough
JSON over `sqlite3 -json | ssh` to time out at exactly 30s on a
420ms-RTT remote, returning 0 rows. Bumping queryTimeout further
just trades latency for stalls.

Drop `HistoryPageSize.initial` from 200 → 50. Sized to fit
comfortably inside the 30s queryTimeout; the existing "Load
earlier" affordance pages back through older messages on demand.

**Bug 2 — session-switch race silently swaps transcripts.** When
the user picks a small chat while a slow fetch for a different
chat is still in flight, the slow fetch finishes second and its
`messages = …` assignment overwrites the small chat's transcript.
User sees the small chat "jump back" to the big one. ScarfMon
trace: parallel `mac.fetchMessages` events at t=641870 (small,
425ms, 2 rows) and t=643316 (big, 30,028ms timeout) — last
write won.

Add a `loadingForSession` capture and three guards: after the
DB refresh, after the primary fetch, after the ACP-fork fetch.
Each compares `self.sessionId` against the captured id; on
mismatch fire `mac.hydrateMessages.dropped` and return without
assigning. Race is silent in normal usage but visible in traces.
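The capture-and-compare guard can be shown without async plumbing by modelling each fetch as a completion closure. `TranscriptLoader` is hypothetical, and `droppedLoads` stands in for the `mac.hydrateMessages.dropped` event. The real code performs the same comparison after each `await` point.

```swift
/// Captures the session id when a load starts and drops the result if the
/// user has switched sessions by the time it lands.
final class TranscriptLoader {
    private(set) var sessionId: String?
    private(set) var messages: [String] = []
    private(set) var droppedLoads = 0

    /// Call on selection; invoke the returned closure when the fetch completes.
    func beginLoad(for id: String) -> ([String]) -> Void {
        sessionId = id
        let loadingForSession = id
        return { [weak self] fetched in
            guard let self else { return }
            guard self.sessionId == loadingForSession else {
                self.droppedLoads += 1  // stale result: do not overwrite
                return
            }
            self.messages = fetched
        }
    }
}
```

The test replays the traced ordering: the small chat's fetch finishes first, the big one lands afterwards, and the big one is dropped instead of winning the last write.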

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:38:19 +02:00
Alan Wizemann 93a64e3e82 fix(nous-picker): kill 120s beach-ball — dedupe readCache + 5s timeout
Two stacking bugs in the Nous-overlay branch of the model picker
caused a 120-second beach-ball on remote contexts.

**Bug 1 — duplicated readCache.** ModelPickerSheet.refreshNousModels
called `service.readCache()` directly (for instant first-paint),
then called `service.loadModels(forceRefresh: false)` which calls
`readCache()` AGAIN as its first step. Two SSH round-trips per
picker open. Drop the inline call; loadModels is already cache-first
on its happy path (returns `.cache(...)` when fresh). One read
per open.

**Bug 2 — 60s readFile timeout for a hint.** `readCache()` goes
through SSHTransport.readFile which has a 60s default timeout. On
a remote with a corrupted or oversized cache file, `cat` never
returns and we wait the full 60s — twice, due to bug 1, for a
total 120s picker stall. ScarfMon perf capture (commit 00a1bbd's
diagnostic split) localized this precisely:

  nous.readCache.fileExists  =   251 ms  ✓
  nous.readCache.readFile    = 60,011 ms   (60s timeout)

Cache is an optimization, not a requirement. Added
`readCacheWithTimeout(seconds: 5)` that races readCache against
a 5-second sleep via withTaskGroup. On timeout returns nil; caller
treats that as no-cache and falls through to the network fetch
(which succeeded in 2s in the offending capture, returning 402
models). The runaway `cat` keeps running on its own 60s transport
timeout but no longer blocks the picker.
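A minimal sketch of the race using `withTaskGroup`; `withTimeout(seconds:_:)` is a hypothetical helper, not `readCacheWithTimeout`. One caveat applies to any such helper: `withTaskGroup` does not return until every child task has finished, so it unblocks the caller early only if the losing operation returns promptly once cancelled. A read that ignores cancellation would hold the group open.

```swift
import Foundation

/// Races `operation` against a sleep; returns nil if the sleep finishes first.
func withTimeout<T: Sendable>(
    seconds: Double,
    _ operation: @escaping @Sendable () async -> T?
) async -> T? {
    await withTaskGroup(of: T?.self, returning: T?.self) { group in
        group.addTask { await operation() }
        group.addTask {
            try? await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            return nil  // timeout sentinel
        }
        // The first child to finish wins; cancel the other one.
        let first = await group.next()
        group.cancelAll()
        return first ?? nil
    }
}
```

A nil result is deliberately indistinguishable from "no cache", which matches the commit's treatment of the cache as an optional hint.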

New ScarfMon event: `nous.readCache.timeoutFired` surfaces hits
in traces so we can tell whether the timeout is being exercised
in the wild.

The underlying `cat` hang on the cache file is still unexplained;
the file size (~500KB) shouldn't take 60s on a 420ms-RTT SSH link.
For now: deleting the cache file (`rm ~/.hermes/scarf/nous_models_cache.json`
on the remote) is the workaround. The next picker open will rebuild it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:17:45 +02:00
Alan Wizemann 00a1bbd109 feat(scarfmon): split nous.readCache into fileExists/readFile/decode/bytes
The last perf capture showed nous.readCache as a single 60-second
interval, but the function does three things (transport.fileExists,
transport.readFile, JSONDecoder). Split the measure points so the
next capture localizes which step actually owns the wall-clock.

Adds:
- nous.readCache.fileExists (interval) — SSH `test -e` round-trip
- nous.readCache.readFile (interval) — SSH `cat` round-trip
- nous.readCache.bytes (event) — payload size of the cache file
- nous.readCache.decode (interval) — JSON parsing cost

If the next 60-second beach ball localizes to readFile, we know
the cache file is somehow huge or the SSH read is hung; if it's
fileExists, the path resolution is the issue; if decode, we have
malformed JSON. All of the new points sit inside the original outer
wrapper, so the existing nous.readCache total stays for trend comparison.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:07:43 +02:00
Alan Wizemann 20cc3a2985 perf(sessions): fold sessions+previews into one batched SSH round-trip
Audit Finding 1 — ChatViewModel.loadRecentSessions and
SessionsViewModel.load each fired two sequential `await
dataService.fetch*` calls (sessions + previews), paying the 420 ms
SSH RTT twice on every reload. Visible in ScarfMon traces as
back-to-back `ssh.run` intervals, totaling ~840 ms minimum
overhead per sidebar refresh.

Adds HermesDataService.sessionListSnapshot(limit:) — same shape
as the existing dashboardSnapshot, folds both queries into a
single backend.queryBatch() call. Both call sites switched.

Halves the SSH round-trips for every sidebar load. With Finding 5's
coalescing, redundant parallel reloads also become free. Together,
the 9× redundant queries-per-minute observed in baseline captures
should drop substantially.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:07:31 +02:00
Alan Wizemann 432d5b0b52 fix(remote-sqlite): bump query timeout 15s→30s + add in-flight coalescing
Two issues from the perf capture:

1. fetchMessages on a 157-message session timed out at exactly 15.06 s
   (`mac.fetchMessages` interval = 15,062,646,042 ns), then silently
   returned 0 rows. The chat appeared empty but the session had data;
   the timeout was firing before sqlite3 -json could ship the ~50KB
   payload over a 420 ms-RTT SSH link. Bumped queryTimeout to 30 s.
   The streamScript transport-level timeout still fires on truly
   wedged hosts.

2. mac.loadRecentSessions fired twice in parallel at t=960450 +
   t=960584, finishing 134 ms apart — two independent watcher ticks
   each spawning a full 3-query SSH load for the same data. Added
   in-flight request coalescing keyed on the inlined SQL text:
   when a query with the exact same SQL is already pending, the second
   caller awaits the first task instead of spawning a new
   subprocess. New ScarfMon event `sqlite.query.coalesced`
   surfaces hits in traces.

Coalescing is surgical: it applies to single `query` calls only, not
`queryBatch` (different timeout scaling, and concurrent identical
batches are rare). This avoids serializing independent work.
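A sketch of in-flight coalescing keyed on the SQL text, assuming rows
come back as `[String]` and that `run` stands in for the SSH `sqlite3`
call. `QueryCoalescer`, `executions`, and `coalesced` are hypothetical.
Actor reentrancy is what makes it work: while the first caller is
suspended on the task, a second caller can enter, find the pending
task, and await the same result.

```swift
/// Hypothetical coalescer: callers issuing identical SQL share one execution.
actor QueryCoalescer {
    private var inFlight: [String: Task<[String], Never>] = [:]
    private(set) var executions = 0   // times `run` actually ran
    private(set) var coalesced = 0    // callers that piggybacked on a pending task

    func query(
        _ sql: String,
        run: @escaping @Sendable (String) async -> [String]
    ) async -> [String] {
        if let pending = inFlight[sql] {
            coalesced += 1
            return await pending.value
        }
        executions += 1
        let task = Task { await run(sql) }
        inFlight[sql] = task          // visible to callers that arrive while we await
        let rows = await task.value
        inFlight[sql] = nil           // next identical query starts fresh
        return rows
    }
}
```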

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:07:19 +02:00
Alan Wizemann 12e152bfea perf(ssh): replace Thread.sleep spin with kernel-wait for runLocal timeout
Audit Finding 3 — every SSH operation funnels through SSHTransport.runLocal,
which used a 100ms Thread.sleep loop while waiting for the timeout. Each
call held one cooperative-pool thread for the full timeout duration with
spin-poll overhead, AND had 100ms granularity on the deadline.

Replace with proc.terminationHandler + DispatchGroup wait — kernel-wakeup
when the process exits (or the deadline fires), no spin. Same one-thread
blocking footprint, but eliminates the per-operation spin work that
inflated query latency 60-70% under concurrent SSH load (visible in
ScarfMon as 7-second mac.loadRecentSessions outliers when sidebar reload +
chat finalize + watcher poll all fired together).

Minimum-touch fix; full async migration of runLocal documented for
follow-up. The bigger refactor would let cooperative-pool threads
park on a true async suspension during the wait, but requires
propagating async through every ServerTransport caller.
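A sketch of the wait pattern with Foundation's `Process`: the
`terminationHandler` leaves a `DispatchGroup`, and the caller blocks on
`wait(timeout:)` until exit or deadline instead of sleeping in 100 ms
steps. `runWithDeadline` is a hypothetical helper, not
`SSHTransport.runLocal`; it still blocks one thread, mirroring the
"same one-thread blocking footprint" trade-off described above.

```swift
import Foundation

/// Hypothetical helper (not SSHTransport.runLocal): runs a process and waits
/// without polling. Returns the exit status, or nil if the deadline fired first.
func runWithDeadline(
    _ executable: String,
    _ arguments: [String],
    timeout: TimeInterval
) throws -> Int32? {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: executable)
    process.arguments = arguments

    let exited = DispatchGroup()
    exited.enter()
    // Set before run() so an immediate exit cannot be missed.
    process.terminationHandler = { _ in exited.leave() }
    try process.run()

    // Sleeps until the handler fires or the deadline passes; no spin loop.
    if exited.wait(timeout: .now() + timeout) == .timedOut {
        process.terminate()                       // SIGTERM
        _ = exited.wait(timeout: .now() + 1)      // brief grace period, then give up
        return nil
    }
    return process.terminationStatus
}
```

If the child ignores SIGTERM, the sketch stops waiting after one more
second and returns anyway.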

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:06:58 +02:00
Alan Wizemann 099d73dde8 feat(scarfmon): instrument Nous model catalog + subscription path (beach-ball investigation)
User reported a remote-context beach-ball when opening the model
picker with Nous as the active provider. Existing measure points
showed loadProviders + loadModels at ~315ms each (fast). The
beach-ball must be in the uninstrumented Nous-overlay branch the
picker fires when nous is selected.

Adds four measure points covering every blocking call in that path:

- nous.subscription.loadState (interval, .diskIO) — auth.json read
  via NousSubscriptionService.loadState. Already known to do an SSH
  read; now precisely measurable.
- nous.readCache (interval, .diskIO) — nous_models cache read,
  TWO sequential SSH ops (fileExists + readFile).
- nous.bearerToken (interval, .diskIO) — auth.json read AGAIN inside
  fetchModels. **This is a duplicate read** — loadState already
  parsed the same file moments earlier. Comment-flagged as a
  caching candidate.
- nous.fetchModels (interval, .transport) + .bytes (event) — HTTP
  GET against the Nous /v1/models endpoint with the body byte count
  attached. The most likely beach-ball culprit if the endpoint is
  slow or hung.

After the next capture we'll know which of the four owns the user's
wall-clock; if `nous.bearerToken` shows up alongside
`nous.subscription.loadState` with similar duration, the duplicate
read is also a real cost worth fixing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:50:51 +02:00
Alan Wizemann 4efd84c119 feat(projects,cron): new project wizard + keychain env mirror + #75 fix
Three coordinated additions to the project surface:

1. New Project from Scratch wizard. Toolbar entry that scaffolds a
   Scarf-standard project skeleton (`<project>/.scarf/dashboard.json`
   placeholder + `AGENTS.md` marker block), registers it, opens an ACP
   chat session in the project's cwd, and auto-sends a kickoff prompt
   that activates the bundled `scarf-template-author` skill. The skill
   drives the substantive setup conversationally — widgets, optional
   config schema, optional cron, AGENTS.md content.

2. Keychain secrets mirror into ~/.hermes/.env. Cron jobs can now
   reference Keychain-backed config values via env vars named
   `SCARF_<UPPER_SLUG>_<UPPER_FIELDKEY>`. Hermes reloads .env per cron
   tick (cron/scheduler.py:897-903), so credential rotation is free.
   Source of truth stays in the Keychain — config.json keeps
   `keychain://` URIs unchanged. Mirror runs at install, post-install
   Configuration save, uninstall, "Remove from List", and on app
   launch (reconcileAll). Mode 0600 on `.env` enforced by
   LocalTransport's existing `.env` heuristic.

3. Configuration form layout recursion fix (issue #75). Per-stage
   frame sizes on `ConfigEditorSheet` triggered
   `_NSDetectedLayoutRecursion` for projects with manifest.json.
   Stabilized the outer frame at the editing stage's intrinsic size so
   transitions only swap content, never resize the container.
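A sketch of how the `SCARF_<UPPER_SLUG>_<UPPER_FIELDKEY>` name could be
derived. The normalization rule (uppercase, then replace anything that
is not an ASCII letter or digit with `_`) is an assumption; the commit
only gives the pattern. `scarfEnvVarName` is hypothetical.

```swift
/// Hypothetical helper: builds SCARF_<UPPER_SLUG>_<UPPER_FIELDKEY>.
/// Assumption: non-alphanumeric characters become "_".
func scarfEnvVarName(slug: String, fieldKey: String) -> String {
    func normalize(_ s: String) -> String {
        String(s.uppercased().map { ch -> Character in
            (ch.isASCII && (ch.isLetter || ch.isNumber)) ? ch : "_"
        })
    }
    return "SCARF_\(normalize(slug))_\(normalize(fieldKey))"
}
```

Under that assumption, a cron prompt for project `my-project` would
read its `api_token` field from `$SCARF_MY_PROJECT_API_TOKEN`.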

New services:
- `ProjectScaffolder` (Mac) — bare-shell project + AGENTS.md marker
- `SkillBootstrapService` (Mac) — copies bundled skills into ~/.hermes/skills/
- `KeychainEnvMirror` (Mac) — splice/unmirror/reconcileAll over ~/.hermes/.env
- `SecretsEnvBlock` (ScarfCore) — pure marker-block helpers

Bundled skill `scarf-template-author` v1.1.0 ships in
`Resources/BuiltinSkills.bundle/`; SkillBootstrapService copies it
into `~/.hermes/skills/scarf-template-author/` on launch (idempotent +
version-gated). The skill grew a "Using secrets in cron prompts"
section documenting the env-var convention.

Migration: launch reconciler auto-populates .env on first v2.8 launch.
Users with cron prompts authored against the old (broken) pattern need
to update them to use $SCARF_… references — see release notes.

Tests:
- SecretsEnvBlockTests: 24/24 (`swift test --filter SecretsEnvBlock`)
- KeychainEnvMirrorTests: 11/11 (`xcodebuild ... -only-testing:scarfTests/KeychainEnvMirror`)

The idempotent-mirror test caught a real bug: applyBlock's replace
path consumed the trailing newline from blockRange but didn't restore
it, breaking the no-op-when-unchanged contract that the launch
reconciler relies on. Fixed.
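A line-based sketch of an idempotent marker-block splice, illustrating
the no-op-when-unchanged contract the launch reconciler relies on. The
marker strings and the `applyBlock(_:to:)` signature are hypothetical;
the real `SecretsEnvBlock` helpers may differ. Working in whole lines
and re-adding exactly the one trailing newline that was stripped
sidesteps the consumed-newline bug described above.

```swift
import Foundation

// Hypothetical marker lines; the real SecretsEnvBlock markers may differ.
enum SecretsMarkers {
    static let begin = "# >>> scarf secrets >>>"
    static let end = "# <<< scarf secrets <<<"
}

/// Replaces the marked block if present, appends it otherwise. Strips one
/// trailing newline on the way in and adds one on the way out, so applying
/// the same body twice yields byte-identical text.
func applyBlock(_ body: [String], to text: String) -> String {
    var lines = text.isEmpty ? [] : text.components(separatedBy: "\n")
    if lines.last == "" { lines.removeLast() }   // trailing "\n" leaves an empty element
    let block = [SecretsMarkers.begin] + body + [SecretsMarkers.end]
    if let start = lines.firstIndex(of: SecretsMarkers.begin),
       let end = lines[start...].firstIndex(of: SecretsMarkers.end) {
        lines.replaceSubrange(start...end, with: block)
    } else {
        lines += block
    }
    return lines.joined(separator: "\n") + "\n"
}
```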

v2.8 RELEASE_NOTES.md committed but no release cut yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:44:23 +02:00
Alan Wizemann bd9bacb8b3 feat(scarfmon): B2 + B3 + iOS dashboard — file watcher, message hydration, dashboard load
Three areas instrumented in this batch. Both targets build clean.

B2 — Mac HermesFileWatcher (FSEvents + remote SSH poll)
- mac.fileWatcher.localFire (event) — every FSEvents change on a
  watched core or project path. High counts during streaming chats
  are normal (state.db-wal ticks per persisted message); high counts
  during idle suggest a runaway watcher install.
- mac.fileWatcher.remoteRestart (event, bytes=path-count) — fires
  once per SSH poller restart, with the union path count attached.
  Frequent restarts mean the project-list update path is churning.
- mac.fileWatcher.remoteDelta (event) — fires per non-empty change
  detected on the SSH poll. Pair with `ssh.streamScript` cadence to
  see actual poll latency.

B3 — Chat session boot + message hydration
- mac.fetchMessages (interval) + .rows (event) — bounded SQL
  fetch from HermesDataService. Catches slow paginated scrolls
  back through long sessions.
- mac.refreshSessionFromDB (interval) — RichChatViewModel's
  post-promptComplete refresh that picks up cost/token data.
- mac.hydrateMessages (interval) + .rows (event) — full session-boot
  hydration in RichChatViewModel.loadSessionHistory. Was the suspected
  trigger of the 22-bubble session-start storms in the Phase 3a
  baseline; now precisely measurable.

iOS Dashboard (resolves the original "out of sync" mystery)
- ios.loadDashboard (interval) — wraps the four dataService.fetch*
  Citadel SFTP round-trips in IOSDashboardViewModel.load().
- ios.allSessions.count (event) — sidebar list size after each
  load, correlates load latency with list growth.
- ios.dashboardRefresh.trigger (event) — fires only on
  pull-to-refresh, separates that entry path from initial appear.

**Architectural finding:** the original v2.6.0 user feedback
("chat out of sync iOS↔Mac on fast LAN") is now firmly attributable
to this — iOS does NOT subscribe to a file watcher. The dashboard
refresh path is appear-time + pull-to-refresh only.
`CitadelServerTransport.watchPaths()` is effectively dead code on
iOS today; nobody calls it. The earlier A1 instrumentation (commit
9df7142) put measure points inside it; since it never runs, captures
showed zero `ios.fileWatcher.tick` events. Future work: either add a
foregrounded poll loop to iOS, or thread the file watcher into
the dashboard subscription. Documented in the ScarfMon roadmap
memory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 23:52:11 +02:00
Alan Wizemann 96af545e66 feat(scarfmon): Tier A2/A3/B1/B4 — sessions, model catalog, dashboard widgets, image encoder
Four parallel instrumentation drops orchestrated by the perf roadmap.
All adds; no logic changes; both targets build clean.

A2 — Mac sessions list reload
- mac.scheduleSessionsRefresh (event) — every file-watcher entry into
  the debounced reload helper. Pair with mac.loadRecentSessions count
  to see how many ticks coalesce per actual reload.
- mac.loadRecentSessions (interval) — full wall-clock from DB open
  through observable assignment.
- mac.recentSessions.count (event) — sidebar list size, correlates
  list growth with reload latency.

A3 — ModelCatalogService loads
- modelCatalog.loadProviders (interval) + .providers.count (event).
- modelCatalog.loadModels (interval) + .models.count (event).
- modelCatalog.validateModel (interval) — covers loadCatalog ->
  transport.readFile, hits disk on every call.
Sync wrap (not measureAsync): the inner Task.detached body is
synchronous; the detached hop is the async boundary.

B1 — Dashboard render
- mac.dashboard.body (event) — ProjectsView body re-eval count.
- dashboard.loadRegistry (interval) — projects.json read + decode.
- widget.markdown_file.load / widget.log_tail.load /
  widget.image.load / widget.cron_status.load (intervals) —
  one per v2.7 file-reading widget. cron_status batches its two
  HermesFileService calls into one tuple-returning measure block
  so the existing two-call shape stays intact.

B4 — Image encoder
- imageEncoder.input.bytes (event) — raw input size.
- imageEncoder.downsample (interval) — full decode/resize/JPEG
  encode round trip across all three platform branches (AppKit,
  UIKit, Linux passthrough).
- imageEncoder.bytes (event) — final encoded JPEG size, lets us
  spot blowup cases.
Sync wrap: encode is nonisolated sync; using measureAsync would
require turning the function async, which is a logic change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 23:38:50 +02:00
Alan Wizemann 9df7142f49 feat(scarfmon): A1 — instrument iOS file-watcher polling cadence
Adds three measure points to CitadelServerTransport.watchPaths:

- ios.fileWatcher.tick (interval) — full poll cycle latency including
  the SSH stat round-trips. > 1500ms here is what 'out of sync' feels
  like — the channel is congested or the host is slow.
- ios.fileWatcher.delta (event) — fires only when the signature
  actually changed. Low delta/tick ratio means we can safely drop
  the 3-second cadence; high ratio means we'd just burn bandwidth.
- ios.fileWatcher.paths (event, bytes=count) — number of paths watched
  per cycle. Explains slow ticks as the project list grows.

Surgical addition; existing 3-second cadence + signature-diff logic
unchanged. With Full mode on, a few minutes of usage on LAN will
tell us empirically whether the cadence can drop to 1s — the
original v2.6.0 user feedback complained 'chat is out of sync'
between iOS and Mac on a fast LAN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 23:33:30 +02:00
Alan Wizemann 9ff9a018e7 feat(scarfmon,chat): Phase 3b — dampen finalize bursts + Thinking… status + wider loadConfig stack
Three targeted fixes from the Phase 3a baseline.

Bubble-burst dampening (Phase 3b-1):
- RichChatViewModel.finalizeStreamingMessage wraps both the
  streaming-id rewrite and the empty-finalize remove() in a
  no-animation Transaction. The id flip from 0 → permanent value
  was the load-bearing trigger of the 5–8 RichMessageBubble.body
  fires we were seeing 1–2 ms after every `finalizeStreamingMessage`
  interval; SwiftUI ran an animated diff against neighbors and
  re-evaluated their bodies. The new message is content-equal to
  the streaming one — there is no animation worth running.

Thinking… status promotion (Phase 3b-2):
- RichChatViewModel exposes `isStreamingThoughtsOnly` — true while
  a turn is in flight, has emitted thought-stream bytes, and has not
  yet produced any visible assistant text. The Phase 3a baseline
  showed this is where most of the user-perceived "feels slow" lives:
  reasoning models commonly take 3–8 s before producing visible
  output, and Scarf surfaced no specific signal during that window.
- Mac ChatView.displayedStatus promotes the toolbar pill to
  "Thinking…" when the flag is true.
- iOS connectionBanner gains a transient "Thinking…" strip with
  spinner, same trigger condition.

Phase 3a fix-up:
- HermesFileService.loadConfig stack-trace logging widened from
  one frame to a 10-frame window prefixed with "#N", so the actual
  caller is visible past inlined ScarfMon wrappers (the prior log
  surfaced ScarfMon.measure itself, not the loadConfig caller).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 23:14:03 +02:00
Alan Wizemann 0a4f8de492 feat(scarfmon): Phase 3a — diagnostic measure points for chat-render bursts
Adds four targeted measure points so the next baseline capture can
attribute the bubble-re-render storm and the slow sendPrompt to a
specific cause:

- mac.RichChatMessageList.body — distinguishes "the parent is
  re-issuing the ForEach" from "the bubbles are re-rendering on their
  own". If list.body fires once and bubble.body fires N times, churn
  is in the bubbles; if list.body fires N times, the ForEach itself
  is being rebuilt.
- finalizeStreamingMessage (interval) — pinpoints the end-of-stream
  burst trigger. The 20-bubble re-eval burst we saw at the close of
  each turn lines up with this call; measuring it surfaces whether
  it's the streaming-id rewrite, the turn-duration assignment, or
  something downstream.
- firstByte / firstThoughtByte (event) — fires once per turn on the
  first chunk after currentTurnStart is set. Splits user-tap → first
  byte (network + Hermes thinking, the dominant component of the 7-11s
  sendPrompt) from first byte → turn end (Scarf streaming render).
- loadConfig caller hint via os.Logger — when ScarfMon is in Full mode,
  logs the first stack frame above each loadConfig call to the
  com.scarf.mon subsystem so mystery callers (the read at t=264282
  with no apparent trigger in the prior baseline) become traceable
  via `log stream`. Symbol-only, no PII, free outside Full mode.

All four are pure additions — no behavior change, same zero-cost
default-off semantics as Phase 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:47:29 +02:00
Alan Wizemann 3126c34561 feat(scarfmon): chat + transport + sqlite measure points (Phase 2)
Wires ScarfMon measure points into the chat hot path on both targets,
plus the underlying SSH transport and remote-SQLite backend. All
callsites are surgical adds — no behavior change. Cost when ScarfMon
is in `.signpostOnly` (default) is one os_signpost emit per call,
elided by the runtime outside an Instruments session. In `.full` mode
the same callsites also push samples into the in-memory ring buffer.

Render counters (event):
- mac.ChatView.body / ios.ChatView.body — full transcript pane re-evals
- mac.RichMessageBubble.body / ios.MessageBubble.body — per-bubble re-evals

Stream + session (event + interval):
- mac.sendViaACP, mac.sendPrompt — user tap → first-byte
- mac.acpEvent, mac.handleACPEvent — per-event delivery + handle cost
- mac.startACPSession — session boot
- ios.send, ios.startResuming — same shape on iOS
- ios.acpEvent, ios.handleACPEvent — same per-event split on iOS

Transport + SQLite (interval, with byte counts on rows):
- ssh.streamScript (Citadel iOS) — SSH round-trip
- ssh.run (SSHScriptRunner Mac) — SSH round-trip
- sqlite.query, sqlite.queryBatch — Remote SQLite per-call
- sqlite.query.rows — row count + stdout bytes per query

Disk I/O (interval):
- diskIO.loadConfig — config.yaml read + parse
- diskIO.loadCronJobs — cron jobs.json decode

Body counters use the `let _: Void = ScarfMon.event(...)` pattern at
the top of `body` — works inside `@ViewBuilder` and fires on every
re-eval, which is exactly the signal we want.

To use:
  Mac: Settings → Advanced → Performance Diagnostics → Full
  iOS: Settings → Diagnostics → Performance → Full
Both panels auto-aggregate by (category, name), surface top 20 by
p95, and offer Copy as JSON for sharing in feedback threads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:18:06 +02:00
Alan Wizemann 6cf59c8a44 feat(scarfmon): perf instrumentation plumbing for iOS + Mac (Phase 1)
ScarfMon lands the always-on perf instrumentation harness. Phase 1 ships
the plumbing only; Phase 2 wires the chat measure points.

Core (ScarfCore/Diagnostics/):
- ScarfMon — public API: measure / measureAsync / event with @inline(__always)
  short-circuit when the backend set is empty so the off path is one
  branch + return. Categories are an enum, names are StaticString so
  user content cannot leak through metric tags.
- ScarfMonRingBuffer — fixed-capacity (4096) lock-protected ring; one
  os_unfair_lock per record; summary() aggregates by (category, name)
  with nearest-rank p50/p95; exportJSON() emits a one-line-per-sample
  dump for the Copy as JSON button.
- ScarfMonSignpostBackend — emits os_signpost into a dedicated
  com.scarf.mon subsystem so Instruments → Points of Interest shows
  Scarf's own measure points without a debug build.
- ScarfMonLoggerBackend — Logger(.debug) sink for users running
  `log stream --level debug --predicate 'subsystem == "com.scarf.mon"'`.
- ScarfMonBoot — three modes (off / signpostOnly / full); persists the
  user's choice in UserDefaults under ScarfMonMode; configure() is
  idempotent and replaces the active backend set atomically.
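A sketch of the two ring-buffer ideas: a fixed-capacity buffer that
overwrites its oldest sample, and nearest-rank percentiles (the value
at 1-based rank ceil(p/100 * n) in sorted order). `RingBuffer` and
`nearestRank` are hypothetical; locking is left out here, whereas the
real buffer takes an `os_unfair_lock` per record.

```swift
/// Hypothetical fixed-capacity ring; unsynchronized for brevity.
struct RingBuffer<Element> {
    private var storage: [Element?]
    private var head = 0              // next write slot
    private(set) var count = 0

    init(capacity: Int) {
        precondition(capacity > 0)
        storage = Array(repeating: nil, count: capacity)
    }

    mutating func append(_ element: Element) {
        storage[head] = element       // overwrites the oldest once full
        head = (head + 1) % storage.count
        count = min(count + 1, storage.count)
    }

    /// Oldest-to-newest snapshot.
    var ordered: [Element] {
        let start = (head - count + storage.count) % storage.count
        return (0..<count).compactMap { storage[(start + $0) % storage.count] }
    }
}

/// Nearest-rank percentile: smallest sample with at least p% of samples <= it.
func nearestRank(_ samples: [Double], percentile p: Double) -> Double? {
    guard !samples.isEmpty, p > 0, p <= 100 else { return nil }
    let sorted = samples.sorted()
    let rank = Int((p * Double(sorted.count) / 100).rounded(.up))   // 1-based
    return sorted[rank - 1]
}
```

With the 20 samples 1 through 20, p95 lands on rank 19 and p50 on
rank 10.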

Tests: 11 cases covering ring ordering / wrap / reset, summary
aggregation, p95 percentiles, event vs interval semantics, install /
isActive, measure + measureAsync (including the throw path), boot
mode transitions, and JSON export round-trip. @Suite(.serialized)
because the suite mutates process-wide backend state.

App wiring:
- ScarfIOSApp.init + ScarfApp.init call ScarfMonBoot.configure(mode:)
  with the persisted mode (default .signpostOnly).
- iOS Settings → Diagnostics → Performance row leads to a list-style
  panel with the segmented mode picker, top-20 stat rows by p95, Copy
  as JSON, and Reset.
- Mac Settings → Advanced gains a ScarfMonDiagnosticsSection with the
  same shape (NSPasteboard for copy).

Open-source by design — no remote upload, no analytics. The ring buffer
never leaves the device unless the user explicitly taps Copy as JSON.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:08:21 +02:00
Alan Wizemann 272da6a915 fix(transport,widgets): code-review fixes for v2.7 + iOS Citadel transport
- CronStatusWidgetView: include jobId + lineCount in `.task(id:)` so widget reload fires when dashboard.json changes either field, not only when the file watcher ticks
- CitadelServerTransport.runScript: enforce the timeout via withThrowingTaskGroup race; propagate transport-level Citadel errors as TransportError.other (so RemoteSQLiteBackend.query maps them to BackendError.transport instead of misclassifying as BackendError.sqlite via a fake -1 exit code); throw TransportError.timeout on the deadline branch with partial stdout preserved
- SSHScriptRunner: close fileHandleForReading on stdout/stderr Pipes in the timeout branch (success path already did); check Task.isCancelled inside the busy-wait so a cancelled parent task terminates the subprocess early instead of waiting out the full timeout. Both runOverSSH and runLocally fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 21:40:07 +02:00
Alan Wizemann c7bcfd8655 feat(dashboards): v2.7 widget catalog — file-reading widgets, sparkline, typed status, project-wide watch
Major project-dashboard release. Five new widget types (markdown_file, log_tail,
cron_status, image, status_grid), inline sparkline on stat, typed status enum
shared by list + status_grid, structured WidgetErrorCard, and a project-wide
.scarf/ directory watch that picks up files cron jobs write next to dashboard.json.

- ProjectDashboard: extend DashboardWidget with path/lines/jobId/cells/gridColumns/sparkline; add StatusGridCell + ListItemStatus (lenient parse with synonyms)
- HermesFileWatcher: watch each project's .scarf/ dir alongside dashboard.json (local FSEvents + remote SSH mtime poll); updateProjectWatches signature now takes dashboardPaths + scarfDirs
- New widget views: CronStatus, Image, LogTail, MarkdownFile, StatusGrid, plus WidgetErrorCard for structured failure messaging; legacy "Unknown" placeholder replaced everywhere
- WidgetPathResolver: project-root-anchored path resolution that rejects absolute paths + ".." escapes pre and post canonicalization
- Stat widget gains optional inline sparkline (pure SwiftUI Path, no Charts dep); list widget rows route through typed status with semantic icons + ScarfColor tints
- iOS list widget + unsupported card adopt typed status + warning-toned error card (parity with Mac error styling); new widget types remain Mac-only
- Site mirror: widgets.js renders all five new types (file-reading widgets show annotated catalog placeholders), sparkline SVG, status-grid grid; styles.css adds typed-status palette + error-card + sparkline + grid styles
- Catalog validator: tools/widget-schema.json is the single source of truth; build-catalog.py loads it and enforces per-type required fields. 8 new test cases in test_build_catalog.py covering schema load, v2.7 additions, and missing-required rejection
- Template-author skill (SKILL.md) gains v2.7 Widget Catalog section + canonical status guidance; CONTRIBUTING.md points authors at widget-schema.json; template-author bundle rebuilt
- Localizable.xcstrings picks up auto-extracted strings for the previously-shipped OAuth keepalive feature
- Release notes drafted at releases/v2.7.0/RELEASE_NOTES.md

Backwards compatible — existing dashboard.json renders byte-identically, status synonyms (ok/up/down/active/etc.) keep working.
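A sketch of project-root-anchored resolution with checks both before
and after canonicalization, using Foundation's `URL` APIs.
`resolveWidgetPath` and `PathError` are hypothetical, and treating a
leading `~` as absolute is an assumption. The post-canonicalization
check is what catches a symlink inside the project that points
outside it.

```swift
import Foundation

enum PathError: Error, Equatable { case absolute, escapesRoot }

/// Hypothetical resolver: returns an absolute path under `projectRoot` or throws.
func resolveWidgetPath(_ relative: String, projectRoot: String) throws -> String {
    // Pre-canonicalization: refuse absolute paths and literal ".." components.
    if relative.hasPrefix("/") || relative.hasPrefix("~") { throw PathError.absolute }
    if relative.split(separator: "/").contains("..") { throw PathError.escapesRoot }

    let root = URL(fileURLWithPath: projectRoot).standardizedFileURL.resolvingSymlinksInPath()
    let candidate = root.appendingPathComponent(relative)
        .standardizedFileURL
        .resolvingSymlinksInPath()

    // Post-canonicalization: the resolved path must still sit under the root.
    let rootPrefix = root.path.hasSuffix("/") ? root.path : root.path + "/"
    guard candidate.path.hasPrefix(rootPrefix) else { throw PathError.escapesRoot }
    return candidate.path
}
```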

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 21:16:29 +02:00
Alan Wizemann 9d945150e0 fix(chat): suppress 'stop' badge in metadata footer for normal turn ends
Every text-bearing assistant turn finalizes with `finishReason="stop"`
(set by `RichChatViewModel.finalizeStreamingMessage` line 881 — the
standard end-of-turn signal Hermes/ACP/OpenAI all emit). The
`metadataFooter` in `RichMessageBubble` was rendering it
unconditionally, so every assistant bubble carried a `· stop · TIME`
footer. Combined with terse model output (e.g. deepseek-v4-flash
emitting only a brief status line before ending the turn), the
badge created a misleading "the agent gave up" impression — there
was no warning, error, or actual failure.

Match the convention used by ChatGPT, Claude.ai, Cursor, etc.:
suppress the badge for normal end-of-turn (`stop` / `end_turn`),
reserve it for abnormal terminations the user actually wants to
see (`max_tokens`, `length`, `error`, `refusal`, `content_filter`,
…). When it does render, color it with severity tone — warning
yellow for "response cut short" cases, danger red for failures
and refusals, muted otherwise.
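A sketch of that badge policy as a pure function. `BadgeTone` and
`badgeTone(for:)` are hypothetical; the reason strings are the ones
listed above, and lowercasing the input first is an assumption.

```swift
/// Hypothetical tones for the metadata-footer badge.
enum BadgeTone: Equatable { case hidden, warning, danger, muted }

/// Mirrors the reasons named in the commit; lowercasing is an assumption.
func badgeTone(for finishReason: String?) -> BadgeTone {
    guard let reason = finishReason?.lowercased() else { return .hidden }
    switch reason {
    case "stop", "end_turn":
        return .hidden                    // normal end of turn: no badge
    case "max_tokens", "length":
        return .warning                   // response cut short
    case "error", "refusal", "content_filter":
        return .danger                    // failures and refusals
    default:
        return .muted                     // anything else still shows, quietly
    }
}
```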

The existing `handlePromptComplete` system-message-injection path
(lines 725-751) for non-`end_turn` stops still surfaces those cases
explicitly at the top of the chat — this change only trims the
always-on badge from the per-message footer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:40:31 +02:00
Alan Wizemann fa15634381 fix(oauth-keepalive): drop unsupported --silent flag from cron create
`hermes cron create` only accepts --name, --deliver, --repeat,
--skill, --script, --workdir. The `silent: Bool?` field on
HermesCronJob exists in the JSON model but isn't exposed through
the CLI's create verb today, so argparse rejected the unknown flag,
the command exited non-zero, and the toggle failed with the generic CLI hint.

Drops the flag; the keepalive runs with Hermes's default delivery.
Token-refresh side effect during session boot is unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:33:25 +02:00
Alan Wizemann 3271391506 fix(chat): debounce sidebar reloads so sessions list doesn't flicker mid-stream
ChatView's `.onChange(of: fileWatcher.lastChangeDate)` fired an
unconditional `Task { await viewModel.loadRecentSessions() }` on
every file-watcher tick. During an ACP message stream the watcher
fires 5–10 times per second (every message Hermes persists bumps
`state.db-wal`'s mtime), and each spawned task re-fetched sessions +
previews + project attribution and reassigned `recentSessions` even
though the data was identical. Each reassignment triggered an
@Observable re-render of the chat sidebar; the user saw the chats
list visibly disappear and reappear several times while typing the
first message in a new chat.

Two changes:

* Add `scheduleSessionsRefresh()` to ChatViewModel — coalesces rapid
  ticks into one trailing `loadRecentSessions()` ~500 ms after the
  last tick. ChatView's onChange now calls this instead. The 500 ms
  window is short enough that idle external changes (a session
  created from another `hermes` invocation, a rename from a
  different window) still appear "soon", and long enough to absorb
  a streaming-response burst.
* Add an explicit `await loadRecentSessions()` to
  `autoStartACPAndSend` after the new session id resolves — the
  debounce would otherwise delay the just-created chat from
  appearing in the sidebar by 500 ms after first send. Mirrors what
  `startACPSession` already does at line 619 for the explicit New /
  Resume paths.
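A sketch of trailing-edge debouncing with Swift concurrency: each
`schedule()` cancels the previous sleeping task, so only the last tick
in a burst runs the action. `TrailingDebouncer` is a hypothetical
stand-in for `scheduleSessionsRefresh()`; the 500 ms default matches
the window described above.

```swift
/// Hypothetical debouncer: coalesces bursts of schedule() into one trailing run.
@MainActor
final class TrailingDebouncer {
    private let delayNanos: UInt64
    private let action: @MainActor () async -> Void
    private var pending: Task<Void, Never>?

    init(delayMillis: UInt64 = 500, action: @escaping @MainActor () async -> Void) {
        self.delayNanos = delayMillis * 1_000_000
        self.action = action
    }

    /// Each call restarts the timer; only the last tick in a burst fires.
    func schedule() {
        pending?.cancel()
        pending = Task { [delayNanos, action] in
            try? await Task.sleep(nanoseconds: delayNanos)   // throws early if cancelled
            guard !Task.isCancelled else { return }
            await action()
        }
    }
}
```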

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 14:56:59 +02:00
Alan Wizemann 5afd391838 feat(sidebar): promote Projects to first section + move profile chip under server name
Two small UX tweaks to the macOS sidebar:

* Reorder sections so Projects is the top section above Monitor.
  Reflects how users actually start sessions in Scarf — they pick a
  project first, then drill into chat / sessions / etc. The previous
  order put the read-mostly Dashboard at the top, which made
  Projects feel like a secondary surface.
* Move the active-profile chip out of the top header HStack (where
  it competed for horizontal space with the server-name pill) and
  drop it into a second row right-aligned under the server name.
  Top row stays clean: `[icon] Scarf       <server>`. Second row:
  `                              profile: <name>` only on local
  contexts. Same click target, same .help, just better-anchored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 14:37:29 +02:00
Alan Wizemann 2a368a04f7 feat(window): persist window size + position across app launches
SwiftUI's WindowGroup exposes `.defaultSize` and `.windowResizability`
but no built-in autosave for window frame across launches. The
documented escape hatch is AppKit's
`NSWindow.setFrameAutosaveName(_:)`, which writes the frame to
UserDefaults on resize/move and restores it on next open.

Add a small `WindowFrameAutosave` NSViewRepresentable that finds its
hosting NSWindow on first appear and stamps the autosave name. Apply
it to `ContextBoundRoot` keyed off `context.id` so each open server
window remembers its own geometry. New servers fall back to the
WindowGroup's `.defaultSize(width: 1100, height: 700)` until the user resizes once.

A previous WIP attempt (dd4a61f) tried to use a fictional
`.windowFrameAutosaveName(...)` SwiftUI modifier that doesn't exist —
which is why it was never merged. This works because we go through
AppKit directly.

Also picks up Xcode's auto-extracted cron-related Localizable.xcstrings
entries that had been pending.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 14:34:08 +02:00
Alan Wizemann 9aa901a286 fix(credential-pools): refresh view after OAuth sheet dismiss
The sheet auto-closes 0.8s after `oauthFlow.succeeded` flips, but
the parent view didn't reload — so the expiry badge stayed red and
the `tokenTail` stayed stale until the user hit Reload. Hook
`viewModel.load()` + `probeKeepalive()` into the sheet's
`onDismiss` so the freshly-written `auth.json` lands on screen
immediately. Runs on every dismiss (success or cancel) — `load()`
is cheap and idempotent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 14:33:22 +02:00
Alan Wizemann 111fe9bb67 feat(oauth): unblock remote re-auth + daily keepalive to prevent expiry
Two related fixes for OAuth subscriptions (Nous Portal, Anthropic
Claude OAuth, etc.):

- **Remote re-auth stall**: Both `NousAuthFlow` and
  `OAuthFlowController` set `PYTHONUNBUFFERED=1` only on local
  contexts. On remote, setting `proc.environment` only affects the
  local-side ssh process — not the remote python interpreter. ssh
  doesn't forward arbitrary env vars without `SendEnv` configured on
  both sides, so remote hermes ran with default block-buffered stdout
  and the device-code prompt never reached Scarf — the sheet hung at
  "Contacting Nous Portal" forever. Fix: when remote, wrap the
  command in `env PYTHONUNBUFFERED=1 …` to inject the var on the
  remote side regardless of ssh config.
- **Daily keepalive**: Hermes refreshes OAuth access tokens on agent
  startup but never proactively. If the user goes longer than the
  refresh-token lifetime (~30 days for Nous) without starting a
  session, the refresh token itself expires and full re-auth is
  required. New `OAuthKeepaliveCronService` registers a Scarf-owned
  daily cron job (`[scarf:oauth-keepalive] OAuth token refresh`) at
  4am that runs a minimal one-token prompt — booting the session is
  what triggers `resolve_nous_runtime_credentials()`. Wired as an
  opt-in toggle in the OAuth providers section of CredentialPoolsView.
  When `hermes auth refresh <provider>` lands upstream we'll swap the
  prompt for that verb; the surrounding wiring stays unchanged.
- **Stale-refresh nudge**: `NousSubscriptionState` gains
  `daysSinceLastRefresh()` + `hasStaleRefresh` (>= 14 days, half of
  Nous's 30-day refresh-token window). The keepalive section
  surfaces an inline orange warning when stale and the toggle is
  off — points the user at the toggle that would have prevented the
  problem.
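
The first and third bullets reduce to small pure functions. The sketch below is illustrative only. `LaunchPlan`, `planUnbufferedLaunch`, and `RefreshStaleness` are hypothetical names, and days are counted as fixed 24-hour intervals. The real `NousSubscriptionState` may compute them differently.

```swift
import Foundation

/// Hypothetical launch description: what to exec, plus the env vars for the
/// *local* Process.
struct LaunchPlan: Equatable {
    var argv: [String]
    var localEnvironment: [String: String]
}

func planUnbufferedLaunch(_ argv: [String], isRemote: Bool,
                          baseEnvironment: [String: String]) -> LaunchPlan {
    if isRemote {
        // proc.environment only reaches the local ssh process, and ssh won't
        // forward it without SendEnv (client) and AcceptEnv (server), so the
        // var is injected on the remote side via `env`.
        return LaunchPlan(argv: ["env", "PYTHONUNBUFFERED=1"] + argv,
                          localEnvironment: baseEnvironment)
    }
    var env = baseEnvironment
    env["PYTHONUNBUFFERED"] = "1"
    return LaunchPlan(argv: argv, localEnvironment: env)
}

/// Hypothetical mirror of the stale-refresh check.
struct RefreshStaleness {
    static let staleAfterDays = 14   // half of Nous's ~30-day refresh-token window
    var lastRefresh: Date?

    func daysSinceLastRefresh(now: Date = Date()) -> Int? {
        guard let lastRefresh = lastRefresh else { return nil }
        // Whole days, counted as fixed 24h intervals.
        return Int(now.timeIntervalSince(lastRefresh) / 86_400)
    }

    func hasStaleRefresh(now: Date = Date()) -> Bool {
        guard let days = daysSinceLastRefresh(now: now) else { return false }
        return days >= Self.staleAfterDays
    }

    /// The inline warning appears only when the keepalive that would fix it is off.
    func shouldShowNudge(keepaliveEnabled: Bool, now: Date = Date()) -> Bool {
        hasStaleRefresh(now: now) && !keepaliveEnabled
    }
}
```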

Verification: scarfCore 263/263; Mac app builds clean. Manual repro
of the remote stall against a Digital Ocean droplet is pending user test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 14:32:06 +02:00
Alan Wizemann 6191c9f19f fix(remote-backend): pre-expand ~/ in Swift via resolvedUserHome
The previous fix (b8b426e) rewrote `~/.hermes/state.db` to
`"$HOME/.hermes/state.db"` and relied on the remote shell to expand
$HOME. That works on Mac SSHTransport (login shell with $HOME set in
the environment) but not reliably through Citadel's exec channel +
base64-decode + inner-/bin/sh pipeline on iOS — the user reports
"unable to open database \"~/.hermes/state.db\"" connecting from
ScarfGo (iOS Simulator) to 127.0.0.1, meaning the literal `~`
character reached sqlite3 untouched.

Switch to client-side expansion: probe remote $HOME once at
RemoteSQLiteBackend.open() via the existing
ServerContext.resolvedUserHome() helper (which uses transport.runProcess
to `echo $HOME` — same code path Hermes CLI calls already exercise
successfully on iOS). Cache the result. quoteForRemoteShell then
substitutes `~/` with the absolute path in Swift before single-
quoting, so sqlite3 receives `/Users/alan/.hermes/state.db` directly
— no nested-shell expansion required.

Falls back to the previous "$HOME/..."-quoted form when the home
probe fails (rare; covers the case where runProcess can't reach the
remote but the user happens to have a working streamScript path).
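
Here is a rough sketch of the probe-once, expand-in-Swift approach, including the fallback. `RemotePathQuoter` and its `probeHome` closure are hypothetical stand-ins for `RemoteSQLiteBackend` and `resolvedUserHome()`. For brevity, the fallback here writes the path as `"$HOME"'/rest'`, which the shell joins into a single word. The commit's actual fallback is the fully double-quoted form from b8b426e.

```swift
import Foundation

/// Hypothetical sketch: probe the remote home once, cache it, and expand `~`
/// in Swift so the remote command never relies on shell tilde/$HOME expansion.
final class RemotePathQuoter {
    private let probeHome: () -> String?   // stand-in for running `echo $HOME`
    private var cachedHome: String?
    private var didProbe = false

    init(probeHome: @escaping () -> String?) {
        self.probeHome = probeHome
    }

    private func home() -> String? {
        if !didProbe {
            didProbe = true
            // `echo` output ends in a newline; strip it before use.
            let trimmed = probeHome()?.trimmingCharacters(in: .whitespacesAndNewlines)
            cachedHome = (trimmed?.isEmpty == false) ? trimmed : nil
        }
        return cachedHome
    }

    private func singleQuote(_ s: String) -> String {
        "'" + s.replacingOccurrences(of: "'", with: "'\\''") + "'"
    }

    func quote(_ path: String) -> String {
        guard path == "~" || path.hasPrefix("~/") else { return singleQuote(path) }
        let rest = String(path.dropFirst())   // "" for "~", "/x" for "~/x"
        if let home = home() {
            // "~/x" -> "/Users/alan/x": sqlite3 receives an absolute path.
            return singleQuote(home + rest)
        }
        // Probe failed: let the remote shell expand $HOME; the rest stays literal.
        return rest.isEmpty ? "\"$HOME\"" : "\"$HOME\"" + singleQuote(rest)
    }
}
```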

Mirrors how RemoteBackupService.expandTilde already handles the same
problem upstream.

Refs #74

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 13:40:33 +02:00
Alan Wizemann b8b426ed75 fix(remote-backend): expand ~/ to $HOME so sqlite3 finds the DB
Default-config remotes (Hetzner, Digital Ocean, anything where the
user hasn't overridden remoteHome on the SSHConfig) have
`paths.stateDB == "~/.hermes/state.db"`. The streaming backend was
single-quoting that path, which suppresses tilde expansion, and
sqlite3 itself doesn't expand `~` (that's a shell affordance). Result:
"Error: unable to open database \"~/.hermes/state.db\": unable to open
database file" — the path was reaching sqlite3 with a literal `~`
that it tried to interpret as a directory name.

Replace the single-quote-only `escape(_:)` with `quoteForRemoteShell(_:)`,
which mirrors `SSHTransport.remotePathArg`'s pattern: a leading `~/`
becomes `"$HOME/..."` (double-quoted so the shell expands `$HOME`, with
any embedded `\`, `"`, `$`, or `` ` `` backslash-escaped to keep the
rest literal), a bare `~` becomes `"$HOME"`, and absolute paths get the
standard single-quote-with-`'\''`-escape treatment.
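
The three rules can be sketched as a standalone function. This is a re-creation from the description above, not the shipped `quoteForRemoteShell(_:)`. It relies only on POSIX sh semantics, where `\`, `"`, `$`, and `` ` `` are the characters that stay special inside double quotes.

```swift
import Foundation

/// Sketch of the quoting rules described above (not the shipped code).
func quoteForRemoteShell(_ path: String) -> String {
    // Inside double quotes, POSIX sh still treats \ " $ and ` specially.
    func escapeForDoubleQuotes(_ s: String) -> String {
        var out = ""
        for ch in s {
            if ch == "\\" || ch == "\"" || ch == "$" || ch == "`" { out.append("\\") }
            out.append(ch)
        }
        return out
    }
    if path == "~" {
        return "\"$HOME\""
    }
    if path.hasPrefix("~/") {
        // Double quotes let the shell expand $HOME but nothing in the suffix.
        return "\"$HOME/" + escapeForDoubleQuotes(String(path.dropFirst(2))) + "\""
    }
    // Everything else: single-quote, closing and reopening around embedded quotes.
    return "'" + path.replacingOccurrences(of: "'", with: "'\\''") + "'"
}
```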

Adds a regression test (`openWithDefaultTildeHomeExpands`) that
exercises the tilde-rewrite end-to-end against a real /bin/sh: places
a fixture state.db at `~/.hermes/state.db` (backing up the user's
real DB if present) and verifies open() + a query both succeed
through the streaming path.

Refs #74

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 13:34:20 +02:00
Alan Wizemann 593b4e62cb feat(remote): replace SQLite snapshot pipeline with SSH query streaming
The remote-DB pipeline pulled the entire state.db down via scp on
every refresh tick. For the issue #74 user (4.87 GB DB) that meant
~7-min per-snapshot wall time even with the size-aware-timeout fix,
~30 GB/hour upload, and data permanently 5–10 minutes stale. This
isn't a bug to patch — it's the wrong architecture for any non-trivial
remote DB.

Replace it with per-query streaming over SSH. Each SQL statement
becomes one ssh round-trip running `sqlite3 -readonly -json` against
the live remote DB. ControlMaster keeps the channel warm at ~5 ms
overhead; sqlite3 cold-start adds ~30–50 ms; total ~50–100 ms per
query vs. the old multi-minute snapshot. Bandwidth scales with query
result size, not DB size.
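
For illustration, a single streamed query might be issued as one remote command, as built below. This assumes the SQL is passed as a command-line argument and the DB path has already been shell-quoted. `remoteQueryCommand` is hypothetical, and the actual backend may feed SQL through `streamScript` instead.

```swift
import Foundation

/// Hypothetical: the remote command line for one streamed query.
/// `quotedDBPath` must already be shell-quoted; the SQL is single-quoted here.
func remoteQueryCommand(quotedDBPath: String, sql: String) -> String {
    let quotedSQL = "'" + sql.replacingOccurrences(of: "'", with: "'\\''") + "'"
    // -readonly: never write to the live DB; -json: one JSON array of row objects.
    return "sqlite3 -readonly -json \(quotedDBPath) \(quotedSQL)"
}
```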

What changed:

* New `HermesQueryBackend` protocol and two implementations:
  `LocalSQLiteBackend` (libsqlite3 in-process — local performance
  unchanged) and `RemoteSQLiteBackend` (sqlite3 over SSH per query
  with batched-statement support for multi-query view loads).
* `SQLValue` and `Row` types as the typed boundary between backends
  and the row parsers. `SQLValueInliner` substitutes `?` placeholders
  with SQLite-escaped literals for the remote-CLI codepath (local
  backend keeps real `sqlite3_bind_*`).
* `ServerTransport` swaps `snapshotSQLite` + `cachedSnapshotPath` for
  `streamScript(_:timeout:)`. SSHTransport delegates to the existing
  `SSHScriptRunner`; CitadelServerTransport (iOS) base64-encodes the
  script + decodes remotely via Citadel's exec channel since stdin
  pipes aren't supported there yet.
* `HermesDataService` becomes a thin facade — every fetch* method
  routes through `backend.query(...)`. Public API is unchanged for
  view-model callers; `lastSnapshotMtime`/`isUsingStaleSnapshot`/
  `staleAge` removed (had zero UI consumers).
* New `dashboardSnapshot()` and `insightsSnapshot(since:)` batched
  calls turn Dashboard's 4-query and Insights' 5-query view loads
  into one SSH round-trip each (~80–100 ms total instead of ~280 ms
  naive). DashboardViewModel and InsightsViewModel updated to use
  them.
* One-time launch migration in `scarfApp` wipes the orphaned
  `~/Library/Caches/scarf/snapshots/` directory (could be 5 GB+ for
  the issue #74 user).
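
The following sketch shows the typed boundary and `?` inlining under stated assumptions. The case names of `SQLValue`, the behavior of `Row.string(at:)`, the protocol's method shape, and `inlineParameters` are all guesses at the shape described above, not the real `SQLValueInliner`. Text values use SQLite's doubled-quote escaping, and blobs use `X'hex'` literals.

```swift
import Foundation

enum SQLValue: Equatable {
    case null
    case integer(Int64)
    case real(Double)
    case text(String)
    case blob([UInt8])
}

/// One result row with positional access, as the row parsers expect.
struct Row {
    var values: [SQLValue]
    func string(at i: Int) -> String? {
        guard values.indices.contains(i), case .text(let s) = values[i] else { return nil }
        return s
    }
}

/// Hypothetical shape of the backend boundary.
protocol HermesQueryBackend {
    func query(_ sql: String, _ args: [SQLValue]) throws -> [Row]
}

enum InlineError: Error { case argumentCountMismatch }

/// Render a value as an SQLite literal.
func sqliteLiteral(_ value: SQLValue) -> String {
    switch value {
    case .null: return "NULL"
    case .integer(let i): return String(i)
    case .real(let d): return String(d)   // finite values only in this sketch
    case .text(let s): return "'" + s.replacingOccurrences(of: "'", with: "''") + "'"
    case .blob(let bytes):
        let digits = Array("0123456789ABCDEF")
        return "X'" + bytes.map { "\(digits[Int($0 >> 4)])\(digits[Int($0 & 0x0F)])" }.joined() + "'"
    }
}

/// Replace each bare `?` with the next argument's literal. A `?` inside a
/// single-quoted SQL string is left alone; `?NNN`/named parameters and
/// double-quoted identifiers are not handled in this sketch.
func inlineParameters(_ sql: String, _ args: [SQLValue]) throws -> String {
    var out = ""
    var next = 0
    var inString = false
    for ch in sql {
        if ch == "'" {
            inString.toggle()   // an escaped '' toggles twice: net no change
            out.append(ch)
        } else if ch == "?" && !inString {
            guard next < args.count else { throw InlineError.argumentCountMismatch }
            out += sqliteLiteral(args[next])
            next += 1
        } else {
            out.append(ch)
        }
    }
    guard next == args.count else { throw InlineError.argumentCountMismatch }
    return out
}
```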

JSON parsing detail: sqlite3 -json preserves SELECT column order in
the raw bytes, but `[String: Any]` from NSJSONSerialization doesn't.
The remote backend extracts column ordering by walking the first
object's literal bytes — without this, every positional row read
(`row.string(at: 0)`) would silently return wrong columns.
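
One way to recover that order is a small byte scanner over the first object, sketched below. It assumes sqlite3's `-json` output shape (an array of objects). Each key is decoded by wrapping its raw bytes in a JSON array, so `JSONDecoder` handles any escapes. `firstObjectColumnOrder` is a hypothetical name, not the backend's actual function.

```swift
import Foundation

/// Column names of the first object in sqlite3 `-json` output, in the order
/// they appear in the raw bytes (i.e. the SELECT order).
func firstObjectColumnOrder(_ data: Data) -> [String] {
    let bytes = [UInt8](data)
    let quote = UInt8(ascii: "\"")
    let backslash = UInt8(ascii: "\\")
    guard var i = bytes.firstIndex(of: UInt8(ascii: "{")) else { return [] }
    i += 1
    var keys: [String] = []
    var expectingKey = true   // true right after '{' or a top-level ','
    var depth = 0             // nesting of arrays/objects inside a value
    while i < bytes.count {
        switch bytes[i] {
        case quote:
            // Find the closing quote, stepping over escaped characters.
            let start = i
            i += 1
            while i < bytes.count && bytes[i] != quote {
                i += (bytes[i] == backslash) ? 2 : 1
            }
            if depth == 0 && expectingKey {
                // Decode the key (escapes and all) by wrapping it as a JSON array.
                var raw: [UInt8] = [UInt8(ascii: "[")]
                raw.append(contentsOf: bytes[start...min(i, bytes.count - 1)])
                raw.append(UInt8(ascii: "]"))
                if let key = try? JSONDecoder().decode([String].self, from: Data(raw)).first {
                    keys.append(key)
                }
                expectingKey = false
            }
        case UInt8(ascii: ","):
            if depth == 0 { expectingKey = true }
        case UInt8(ascii: "{"), UInt8(ascii: "["):
            depth += 1
        case UInt8(ascii: "}"), UInt8(ascii: "]"):
            if depth == 0 { return keys }   // end of the first object
            depth -= 1
        default:
            break
        }
        i += 1
    }
    return keys
}
```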

Tests: 41 new across `SQLValueInlinerTests`, `HermesDataServiceBackendTests`
(mock backend) and `RemoteSQLiteBackendTests` (integration via local
sqlite3 binary). Full suite 262/262 passing.

Builds clean on Mac and iOS. Ships as part of v2.7.

Refs #74

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 13:09:06 +02:00
Alan Wizemann de36411a8d fix(remote): size-aware snapshot timeouts and partial-file cleanup (#74)
The remote-DB snapshot pipeline was hardcoded to a 120s scp timeout and
a 60s remote-backup timeout. For users with a multi-GB state.db (the
report cites 4.87 GB), 120s is wildly insufficient: at typical home
upload speeds (5-50 Mbps) a 5 GB transfer takes roughly 13 minutes to
over two hours. scp gets killed mid-transfer, leaves a partially-written
.db at the cache path, and every subsequent attempt opens that corrupt
file with `sqlite3_open` returning garbage. Symptom: SSH connects, all
diagnostics pass, but Dashboard / Sessions / Memory show no data.

Changes to SSHTransport.snapshotSQLite:

* Probe `stat` on the remote DB before starting. Drives both the
  timeout budget and a local-disk-space pre-flight (refuses to start
  if local Caches volume can't hold size + 500MB margin).
* Adaptive timeouts based on remote size:
  - backup: 60s base + 1s per 100MB, capped at 600s.
  - scp:    300s base + 0.5s per MB (≈2 MB/s minimum throughput),
            capped at 3600s.
  Defaults of 60s/300s when stat fails (still up from 120s on scp).
* Add `-C` to scp args. SQLite DBs have lots of zero-padded empty
  pages and typically compress 30-50% in transit.
* On any failure path, remove the partial local snapshot file so the
  next attempt starts fresh instead of opening a corrupt DB.
* Rewrite the generic "Command timed out after Ns" error into a
  specific "Snapshot transfer timed out after Ns pulling X.X GB
  state.db from <host>" so users on slow links know what hit the
  wall instead of seeing a meaningless number.
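
Here is the budget arithmetic above as a sketch. It treats "MB" as 10^6 bytes (an assumption), and all function and type names are hypothetical.

```swift
/// Adaptive timeout budgets for one snapshot attempt.
struct SnapshotBudget: Equatable {
    var backupSeconds: Double
    var scpSeconds: Double
}

func snapshotBudget(remoteBytes: Int64?) -> SnapshotBudget {
    guard let bytes = remoteBytes else {
        return SnapshotBudget(backupSeconds: 60, scpSeconds: 300)   // stat failed
    }
    let mb = Double(bytes) / 1_000_000
    return SnapshotBudget(
        backupSeconds: min(60 + mb / 100, 600),   // 60s + 1s per 100 MB, cap 600s
        scpSeconds: min(300 + mb * 0.5, 3600)     // 300s + 0.5s per MB (~2 MB/s floor), cap 1h
    )
}

/// Local pre-flight: refuse to start unless the cache volume fits size + 500 MB.
func hasRoomForSnapshot(remoteBytes: Int64, localFreeBytes: Int64) -> Bool {
    localFreeBytes >= remoteBytes + 500_000_000
}
```

For the reported 4.87 GB DB, this gives roughly 109 s for the backup and 2,735 s (about 46 minutes) for scp.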

Cannot reproduce locally (no 5GB state.db on hand), but the failure
mode is unambiguous from code reading: hardcoded 120s vs. real-world
multi-GB transfer durations.

Closes #74

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 11:25:38 +02:00
Alan Wizemann 6a7ac21ebe chore: Bump version to 2.6.5 v2.6.5 2026-05-03 22:15:05 +02:00
Alan Wizemann 5be67282d8 test(layer-b): full Install → Configure → Open → Uninstall journey XCUITest (#73)
Closes the deferred Layer B install-drive that v2.7's smoke test
left as future work. The new test
(`testFullCatalogToInstallToDashboardJourney`) drives the full
install/uninstall pipeline end-to-end and validates 9 assertion
points along the way:

- Window surfaces under `--scarf-test-mode`
- Sidebar navigation to Projects
- Install sheet appears (URL handoff via launch arg)
- Parent-dir field accepts custom path + Continue
- Configure sheet renders + commit clicks
- Confirm Install runs the install pipeline
- Open Project advances to success view
- Project row appears in sidebar with uniquified name
- Right-click Uninstall + confirm Remove + Done removes the row

Runs green in ~30s on the dev Mac.

## What needed wiring up

**SwiftUI Menu / NSToolbarItem accessibility-bridging.** macOS
toolbar Menus don't propagate `.accessibilityIdentifier` through to
XCUITest — neither the menu trigger NOR the popup contents are
queryable by ID. Verified by tree-dump diagnostics. The test
sidesteps this entirely by routing the install URL through a new
`--scarf-test-install-url <https-url>` launch arg that calls
`TemplateURLRouter.shared.handle(scarf://install?url=...)` at App
init, gated on `TestModeFlags.shared.isTestMode`. Production
launches (no flag) untouched.
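
The launch-arg handoff could be parsed along these lines. `testModeInstallLink` is hypothetical, and rejecting non-https sources is an assumption based on the `<https-url>` placeholder. Only the flag name, the `scarf://install?url=` shape, and the test-mode gate come from this commit.

```swift
import Foundation

/// Hypothetical: turn `--scarf-test-install-url <https-url>` into the
/// `scarf://install?url=...` deep link. Returns nil outside test mode.
func testModeInstallLink(arguments: [String], isTestMode: Bool) -> URL? {
    guard isTestMode,
          let flag = arguments.firstIndex(of: "--scarf-test-install-url"),
          flag + 1 < arguments.count,
          let source = URL(string: arguments[flag + 1]),
          source.scheme == "https" else { return nil }
    var components = URLComponents()
    components.scheme = "scarf"
    components.host = "install"
    // URLComponents handles any percent-encoding the query value needs.
    components.queryItems = [URLQueryItem(name: "url", value: source.absoluteString)]
    return components.url
}
```

At App init this might read `if let link = testModeInstallLink(arguments: CommandLine.arguments, isTestMode: TestModeFlags.shared.isTestMode) { TemplateURLRouter.shared.handle(link) }`, assuming `handle` accepts a `URL`.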

**Accessibility IDs added** on the new install/uninstall path:
- `templateConfig.commitButton`, `templateConfig.cancelButton`
- `projects.row.<name>`, `sidebar.section.<rawValue>`
- `projects.contextMenu.uninstallTemplate`
- `templateUninstall.confirmRemove`
- `templateInstall.success.openProject`
- `templateUninstall.success.done`

**Sandboxed-runner caveat.** The XCUITest runner's `/tmp` is
sandbox-protected (createDirectory throws EPERM); we use
`NSTemporaryDirectory()` which resolves to the runner's container
tmp (`~/Library/Containers/com.scarfUITests.xctrunner/Data/tmp/`),
which the unsandboxed Scarf app can read since it has full disk
access.
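
Here is a minimal sketch of creating a fixture directory this way. `makeFixtureDirectory` and the UUID suffix are hypothetical.

```swift
import Foundation

/// Hypothetical helper: a unique fixture directory under the process's own
/// temp dir. In a sandboxed runner this resolves inside its container,
/// whereas a hard-coded /tmp path fails with EPERM.
func makeFixtureDirectory(prefix: String) throws -> URL {
    let base = URL(fileURLWithPath: NSTemporaryDirectory(), isDirectory: true)
    let dir = base.appendingPathComponent("\(prefix)-\(UUID().uuidString)", isDirectory: true)
    try FileManager.default.createDirectory(at: dir, withIntermediateDirectories: true)
    return dir
}
```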

## Known cohabitation hazard (pre-existing uninstaller bug)

If the dev Mac already has a project from the same template
installed, the install pipeline uniquifies the new project's name
("HackerNews Daily Digest 2") but BOTH projects' cron jobs get
registered under the same `[tmpl:awizemann/hackernews-digest] Daily
HN digest` name. `ProjectTemplateUninstaller.loadUninstallPlan`
resolves cron jobs to remove by NAME and can target the wrong
project's job. The Layer B test surfaces this — manifests as: test
passes, the dev's real project's cron job disappears.

**Fix (separate work):** store cron-job IDs in
`<project>/.scarf/template.lock.json` at install time and resolve
by ID at uninstall time. Until then, the test docstring warns
about cohabitation; recovery is `hermes cron create` to recreate
the lost job.
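
The proposed fix can be sketched as follows. `CronJob`, `TemplateLock`, and the `cronJobIDs` field are hypothetical shapes for `template.lock.json`, not its actual schema.

```swift
import Foundation

/// Hypothetical cron job and lock-file shapes for the proposed fix.
struct CronJob: Equatable {
    let id: String
    let name: String
}

struct TemplateLock: Codable, Equatable {
    var templateID: String
    var cronJobIDs: [String]   // recorded at install time
}

/// Current behavior as described: match by name, which collides when two
/// projects come from the same template.
func jobsMatchingName(_ jobs: [CronJob], name: String) -> [CronJob] {
    jobs.filter { $0.name == name }
}

/// Proposed behavior: remove only the jobs this project's lock file recorded.
func jobsToRemove(_ jobs: [CronJob], lock: TemplateLock) -> [CronJob] {
    let ids = Set(lock.cronJobIDs)
    return jobs.filter { ids.contains($0.id) }
}
```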

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 22:09:50 +02:00