Commit Graph

408 Commits

Author SHA1 Message Date
Alan Wizemann a8cdb3e663 feat(ios): v0.13 read-only catch-up — goal pill, queue chip, Kanban diagnostics, Curator archived, Platforms (WS-9)
Mirrors the v0.13 surfaces from WS-2 (Persistent Goals + ACP /queue),
WS-3 (Kanban diagnostics + hallucination gate), WS-4 (Curator archive),
and WS-5 (Google Chat platform + cross-platform allowlists + behavior
toggles) onto ScarfGo. Per Phase H precedent, every iOS surface is
strictly read-only — write verbs (Verify / Reject, /goal --clear, queue
send, allowlist editing, archive Restore / Prune) live on Mac in v2.8.0
and are deferred to v2.8.x.

Five iOS additions, all capability-gated so pre-v0.13 hosts see the
v2.7.5 layout unchanged:

1. Chat — goal pill ("Goal: <text>") and queue chip ("N queued") render
   inside `projectContextBar` whenever a project, goal, or queue is
   present. The bar is no longer project-only; goal/queue chips render
   even outside a project chat. Goal text scales with Dynamic Type
   (semantic `.subheadline`); the full untruncated text rides VoiceOver
   via the chip's accessibility label.
2. Kanban — `ScarfGoKanbanDetailSheet` gains a `retries: N` chip in the
   header `FlowLayout`, a yellow "Worker-created — verify on Mac" badge
   for `pending` hallucination state, a red "Auto-blocked" banner with
   the server-supplied `auto_blocked_reason`, and tappable diagnostics
   chip-lists (task-level + per-run) that present a new
   `DiagnosticDetailSheet` with kind / severity / message / timestamp.
   No Verify or Reject buttons; the badge copy points users to the Mac
   app.
3. Curator — `CuratorView` appends a read-only "Archived" section that
   loads via `viewModel.loadArchive()` on appear and pull-to-refresh.
   Per-row name + category badge + reason + archived-at + size; footer
   signposts users to the Mac app for Restore / Prune.
4. Settings → Platforms — adds a Google Chat status row (configured /
   not configured), busy-ack and restart-notification rows summarized
   across `gatewayPlatforms` (yes / no / mixed (N platforms)), and
   collapsed DisclosureGroups for allowed channels / chats / rooms with
   monospaced "platform: id" entries when expanded. No editor.
5. Settings — green "v0.13 features active" `ScarfBadge` above the
   quick-edits section when `caps.isV013OrLater`. Tap presents a new
   `V013FeaturesSheet` listing the six v0.13 surfaces with one-sentence
   summaries; the section footer is explicit that editing lives on Mac.

Implements WS-9 of Scarf v2.8.0 (Hermes v0.13.0 catch-up).
Plan: scarf/docs/v2.8/WS-9-ios-v0.13-plan.md (on coordination/v2.8.0-plans).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 19:25:16 +02:00
Alan Wizemann 441d11404f Merge remote-tracking branch 'origin/ws-8-ux-v0.13' into integration/v2.8.0
# Conflicts:
#	scarf/scarf/Features/Chat/Views/ChatTranscriptPane.swift
#	scarf/scarf/Features/Chat/Views/SessionInfoBar.swift
2026-05-09 19:12:15 +02:00
Alan Wizemann 6e8480411a Merge remote-tracking branch 'origin/ws-7-settings-v0.13' into integration/v2.8.0
# Conflicts:
#	scarf/Packages/ScarfCore/Sources/ScarfCore/Models/HermesConfig.swift
#	scarf/Packages/ScarfCore/Sources/ScarfCore/Parsing/HermesConfig+YAML.swift
2026-05-09 19:11:29 +02:00
Alan Wizemann 3a764e81e0 Merge remote-tracking branch 'origin/ws-6-providers-v0.13' into integration/v2.8.0
# Conflicts:
#	scarf/Packages/ScarfCore/Sources/ScarfCore/Models/HermesConfig.swift
#	scarf/Packages/ScarfCore/Sources/ScarfCore/Parsing/HermesConfig+YAML.swift
2026-05-09 19:10:43 +02:00
Alan Wizemann 6e90741a17 Merge remote-tracking branch 'origin/ws-5-gateway-v0.13' into integration/v2.8.0 2026-05-09 19:09:38 +02:00
Alan Wizemann 93a3b40a67 Merge remote-tracking branch 'origin/ws-4-curator-archive' into integration/v2.8.0 2026-05-09 19:09:38 +02:00
Alan Wizemann 52f0ddb36c Merge remote-tracking branch 'origin/ws-3-kanban-v0.13' into integration/v2.8.0 2026-05-09 19:09:38 +02:00
Alan Wizemann cedee04f2a feat(kanban): v0.13 diagnostics + recovery UX (WS-3)
Layers Hermes v0.13's reliability + recovery affordances on top of the
v2.7.5 Kanban v3 board. New surface — gated end-to-end on
`HermesCapabilities.hasKanbanDiagnostics` (>= v0.13.0):

- **Hallucination gate.** Worker-created cards land in `pending` until
  the user verifies the underlying work exists. Inspector renders a
  yellow Verify / Reject banner above the body; cards dim to 0.6 with
  a question-mark glyph. Verify is optimistic — banner clears
  immediately, polling confirms. Reject routes through
  `comment` + `archive` so there's an audit trail.
- **Generic diagnostics engine.** `HermesKanbanDiagnostic` (new model +
  typed-mirror enum `KanbanDiagnosticKind`) renders cross-run signals
  on the inspector header and per-run signals under each Runs row.
  Card footer gains a stethoscope dot when any signal is attached.
- **`max_retries` create-time field + inspector chip.** Toggle-gated
  Stepper in the create sheet sends `--max-retries N`; chip on the
  inspector header reads it back read-only with a tooltip explaining
  there's no update verb.
- **Multi-line title input.** Create sheet's title becomes a
  `TextField(axis: .vertical, lineLimit: 1...4)`. Newlines are stripped
  client-side on pre-v0.13 hosts (which truncate at the first `\n`).
- **Auto-blocked reason banner.** When `task.auto_blocked_reason` is
  set, replaces the generic "Last run: blocked" with a red banner
  rendering the server reason verbatim. Card footer shows a 1-line
  truncated copy in red.
- **Tolerant decode contract.** Every new field is `Optional` with
  `decodeIfPresent`; diagnostics arrays use `try?` so a single
  malformed entry doesn't poison the row. v0.12 hosts decode unchanged.

Implements WS-3 of Scarf v2.8.0 (Hermes v0.13.0 catch-up).
Plan: scarf/docs/v2.8/WS-3-kanban-v0.13-plan.md (on
coordination/v2.8.0-plans).

TODOs marked inline pending integration against a live v0.13 binary:
WS-3-Q1 (verify verb name), WS-3-Q2 (diagnostics envelope vs task),
WS-3-Q4 (failure_count placement), WS-3-Q5 (darwin-zombie kind
string), WS-3-Q6 (max_retries default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 19:06:38 +02:00
Alan Wizemann b4482e5ee7 feat(gateway): Google Chat platform + cross-platform allowlists + behavior toggles (WS-5)
Catches the Mac Messaging Gateway and Platforms surfaces up to Hermes
v0.13.0. Adds Google Chat as the 20th platform under Settings → Platforms,
gated on `hasGoogleChatPlatform`. Adds a per-platform "Gateway behavior"
subsection to the six platforms Hermes added v0.13 allowlist support to
(Slack, Mattermost, Google Chat, Telegram, WhatsApp, Matrix) — each
exposes the `allowed_channels` / `allowed_chats` / `allowed_rooms` editor
plus three new toggles (`busy_ack_enabled`, `gateway_restart_notification`,
`slash_command_notice_ttl_seconds`). The Messaging Gateway page header
gains a one-line cross-profile digest sourced from `hermes gateway list
--json`. SkillsView surfaces an informational row on skills whose body
contains the v0.13 `[[as_document]]` directive.

New ScarfCore types: `GatewayAllowlistKind` (channels/chats/rooms +
platform mapping), `GatewayPlatformSettings` (per-platform v0.13 bundle),
`GatewayConfigWriter` (pure YAML list-block editor — `hermes config set`
can't write lists; tested with 15 cases incl. round-trip + idempotence +
quoting + scalar-sibling preservation), `HermesGatewayListService`
(`hermes gateway list --json` parser tolerant of unknown keys + alt
field names; 13 tests), `HermesConfig.gatewayPlatforms` field. Mac VM
renamed to `MessagingGatewayViewModel` (single-feature local rename;
CLAUDE.md "the SidebarSection.gateway enum case stays" invariant
upheld). All 22 new tests pass; full ScarfCore suite green except 3
pre-existing `RemoteSQLiteBackendTests` failures unrelated to WS-5.

Capability-gated end-to-end. Pre-v0.13 hosts see no Google Chat row,
no cross-profile digest, no v0.13 toggles, and no `[[as_document]]`
info row — the v2.7.5 surface is byte-for-byte unchanged. Q1-Q3 wire-
shape unknowns (Google Chat identifier, YAML key path,
`gateway list --json` shape) are marked with `// TODO(WS-5-Q<N>)` and
defended by tolerant parsers + dual-spelling lookups.

Implements WS-5 of Scarf v2.8.0 (Hermes v0.13.0 catch-up).
Plan: scarf/docs/v2.8/WS-5-gateway-v0.13-plan.md (on coordination/v2.8.0-plans).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 19:05:55 +02:00
Alan Wizemann 4757b5ae49 feat(curator): archive + prune + list-archived (WS-4)
Catches the Curator surface up to Hermes v0.13's new write-side verbs
(`archive <skill>`, `prune`, `list-archived`, synchronous `run`). Adds
a new `CuratorService` actor in ScarfCore mirroring `KanbanService`'s
pattern (Sendable, pure I/O, `Task.detached(priority: .utility)` per
verb), tolerantly-decoded `HermesCuratorArchivedSkill` /
`CuratorPruneSummary` models, and `CuratorError` for inline-banner
surfacing.

Mac UX gains an "Archived" section between the leaderboards and the
last-report block (per-row Restore button), an "archivebox" button on
every active-skill leaderboard row to manually archive, a destructive
"Prune Archived…" confirm sheet enumerating each skill (template-
uninstall pattern — Cancel owns `.defaultAction`, Prune is on the red
`ScarfDestructiveButton`), and a synchronous-with-progress "Run Now"
on v0.13+ hosts (600s timeout, `ProgressView` while in-flight).
Failure path routes through a yellow inline error banner instead of a
modal alert. The legacy `CuratorRestoreSheet` stays accessible from
the overflow menu but only on pre-v0.13 hosts; on v0.13+ the per-row
Restore in the new Archived section replaces it.

All new surfaces gate on `HermesCapabilities.hasCuratorArchive` —
pre-v0.13 hosts see the v2.7.x layout unchanged. iOS picks up the new
`runNow(synchronous:)` signature with the v0.13 capability flag; the
read-only Archived section + WS-9 marker is left for the next stream.
14 new parser tests in `HermesCuratorParserTests` cover the JSON
happy path, the `{"archived": [...]}` envelope, the text fallback
(`--json` not supported), `"no archived skills"` sentinel folding,
prune-dry-run with both wrapper + bare-array shapes, and zero-skill
prune. All 369 ScarfCore tests pass; `xcodebuild` for the `scarf`
scheme succeeds.

Wire-shape unknowns (CLI flag presence on real v0.13) carry
`// TODO(WS-4-Q<N>)` markers in `CuratorService` and fall back
defensively when a flag isn't recognized. Implements WS-4 of Scarf
v2.8.0 (Hermes v0.13.0 catch-up). Plan:
scarf/docs/v2.8/WS-4-curator-archive-plan.md (on
coordination/v2.8.0-plans).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 19:03:13 +02:00
Alan Wizemann 0070441243 feat(profiles): add --no-skills toggle to create-profile sheet
Adds an "Empty profile (no skills)" toggle to the Mac create-profile
sheet, gated on `hasProfileNoSkills` (v0.13+). When ON, the create
flow appends `--no-skills` to `hermes profile create`. The toggle is
disabled (greyed out) when "Full copy of active profile" is on, per
WS-7 plan Decision H — a full clone copies skills wholesale, so
`--no-skills` would be a contradiction at the UX layer. The wire
itself stays permissive: a user can stack `--clone --no-skills` to
clone config but skip skills, which is a plausible workflow.

Defensive write-strip: even though the toggle is hidden on pre-v0.13
hosts, the call site reads `createNoSkills` through the capability
gate so a stale state value can't sneak `--no-skills` past argparse
on a CLI that doesn't know it.

iOS Profiles is read-only (per CLAUDE.md "v0.12 iOS catch-up
Phase H") so no toggle there.

TODO marker (WS-7-Q8) flags the assumed `--clone-all` interaction —
verify Hermes's behaviour with both flags during integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 19:03:06 +02:00
Alan Wizemann 57a6340985 feat(providers): catalog refresh + image_gen.model + OpenRouter caching (WS-6)
Surfaces the v0.13 provider catalog work in Scarf v2.8.0. Five new model IDs
(deepseek/deepseek-v4-pro, x-ai/grok-4.3, openrouter/owl-alpha,
tencent/hy3-preview, arcee/trinity-large-thinking) flow through
models_dev_cache.json on next refresh — no manual catalog entries
needed; the picker reaches them automatically. The grok-4.20-beta →
grok-4.20 rename is handled via a new ModelCatalogService.modelAliases
map plus resolveModelAlias() helper, called from validateModel(),
model(_:_:), and provider(for:) at read time. Lossless: stored configs
are never rewritten.

Vercel AI Gateway is demoted to the bottom of the picker via a new
demotedProviders set + sort-comparator axis (between subscription-gated
and alphabetical). Always-on, no capability gate — sort-order
consistency across Hermes versions.

image_gen.model (top-level v0.13 YAML key) and
openrouter.response_cache.enabled (provisional key shape per
TODO(WS-6-Q1)) are surfaced as new SettingsSection rows in
AuxiliaryTab, capability-gated on hasImageGenModel +
hasOpenRouterResponseCache so pre-v0.13 hosts hide them. Image-gen
picker has a curated 7-entry allowlist (HermesImageGenModel) plus
free-form Custom model ID entry.

CLAUDE.md gains two schema-drift bullets next to the existing
overlayOnlyProviders requirement (modelAliases + demotedProviders
mirror with hermes_cli/providers.py).

Tests: 4 new M0cServicesTests (sort axis, alias resolution + cross-
provider isolation, image-gen allowlist, demoted-set sentinel) and 2
new M6ConfigCronTests (YAML round-trip + empty-default).

Implements WS-6 of Scarf v2.8.0 (Hermes v0.13.0 catch-up).
Plan: scarf/docs/v2.8/WS-6-providers-v0.13-plan.md
(on coordination/v2.8.0-plans).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 19:02:45 +02:00
Alan Wizemann 0f78856e6e feat(settings): v0.13 polish — redaction hint, display.language picker, xAI cloning badge (WS-8)
Three Settings-tab surfaces tracking v0.13 release notes:

- **Redaction default-flip awareness** (Advanced → Caching & Redaction):
  inline hint below the existing toggle whose copy depends on
  `HermesCapabilities.isV013OrLater`. v0.13 flipped the server-side
  default from OFF (v0.12) to ON, but Scarf's parser still treats
  "absent key" as `false`. Hint disambiguates so users on v0.13 hosts
  understand redaction is on server-side even when the toggle reads OFF.

- **`display.language` picker** (General → Locale): 8-option enum (`""`
  default + en/zh/ja/de/es/fr/uk/tr) capability-gated on
  `hasDisplayLanguage`. Persists via `hermes config set
  display.language <code>`. Empty string preserves "no key" semantics
  (Hermes-default English); explicit `en` pins it. Required a small
  `optionLabel:` overload on `PickerRow` so non-English labels
  (中文 / 日本語 / etc.) render alongside their codes.

- **xAI Custom Voices badge** (Voice → Text-to-Speech): adds `xai`
  to the TTS provider picker (un-gated — xAI TTS shipped earlier),
  exposes Voice ID + Model fields, and renders a "Cloning supported"
  ScarfBadge gated on `hasXAIVoiceCloning`. Hint copy points at
  `hermes voice` for cloned-voice management since Scarf has no
  in-app surface for that yet (out-of-scope for v2.8).

Capability gates: `isV013OrLater` (hint discriminator),
`hasDisplayLanguage` (picker), `hasXAIVoiceCloning` (badge). Pre-v0.13
hosts see the v2.7.5 layout unchanged.

`TODO(WS-8-Q2)` flags the assumed xAI YAML keys (`tts.xai.voice_id` /
`tts.xai.model` mirroring elevenlabs) for grep-verify against
`~/.hermes/hermes-agent/hermes_cli/voice/tts.py`.

iOS deferred to v2.9 (Q4): `Scarf iOS` Settings is read-mostly and
doesn't have a write surface for either the language picker or the
xAI fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 18:59:38 +02:00
Alan Wizemann 5877bf6519 feat(updater): forward-compat HermesUpdaterCommandBuilder for hermes update --yes (WS-8)
Pure-function helper that builds argv arrays for `hermes update`,
gated on `HermesCapabilities`. Pre-v0.12 → bare `update`; v0.12+
honors `--check`; v0.13+ honors `--yes` for unattended runs.

No in-app "Update Hermes" affordance ships in v2.7.5 — Sparkle handles
Scarf-self-update and `hermes update` is invoked by users in their
terminal. This is forward-compat plumbing so the eventual UI surface
shares flag selection across Mac / iOS / remote without re-deriving
from scratch.

Test matrix in `M0eUpdaterTests` covers all six combinations
(pre-v0.12, v0.12 ± unattended ± check, v0.13 ± unattended ± check)
plus an empty-capabilities fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 18:59:12 +02:00
Alan Wizemann f19f19cd56 feat(chat): surface v0.13 compression count + bracket-aware slash hint (WS-8)
Two small chat-surface additions tracking Hermes v0.13:

- Plumb a `compressionCount` field through `ACPPromptResult` and
  `RichChatViewModel.acpCompressionCount` so `SessionInfoBar` can render
  a `🗜 ×N` chip next to the token counter when the agent has performed
  context compactions. Capability-gated on
  `HermesCapabilities.hasContextCompressionCount` and `count > 0` so
  pre-v0.13 hosts (which always emit 0) and fresh sessions never see
  the chip. Wire decode tolerates camelCase + snake_case;
  `TODO(WS-8-Q1)` flags the assumption that the field rides on
  `usage` — if v0.13 emits via a separate `session/update` notification
  the bigger fix is described in the WS-8 plan.

- Slash-menu argument hint is now bracket-aware: hints starting with
  `<` or `[` pass through verbatim, others wrap as `<hint>`. v0.13's
  `/new [name]` ships through unchanged without rendering as
  `<[name]>`. No flag check at the renderer — agent payload is the
  source of truth.

Coordination with WS-2: both WSes touch `SessionInfoBar`. WS-2 owns
the queue chip on the left half; this WS owns the compression chip on
the right half. The added `capabilities` parameter is shared — kept
additive so WS-2's later merge produces no file-level conflict.

Tests: extends `M0dViewModelsTests` (compression count tracking +
reset semantics) and `ScarfCoreSmokeTests` (decode default + explicit
v0.13 init path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 18:58:58 +02:00
Alan Wizemann 6c96fcfa43 feat(settings): add Web Tools tab with v0.13 search/extract split
Adds a new "Web Tools" Settings tab (between Browser and Voice) with
two distinct shapes that share the same chrome:

- Pre-v0.13: a single "Backend" picker writing the legacy
  `web_tools.backend` key (so v0.12 users still configure web tools).
- v0.13+: two pickers — Search backend writes
  `web_tools.search.backend` (SearXNG appears here only — Hermes
  registers it as a search-only dispatch), Extract backend writes
  `web_tools.extract.backend`.

Capability gate: `hasWebToolsBackendSplit` chooses which shape
renders. The tab itself is always visible — pre-v0.13 users would
otherwise lose access to the legacy combined-backend picker.

Model layer:
- `HermesConfig.webToolsBackend` / `webToolsSearchBackend` /
  `webToolsExtractBackend` — three fields, each round-tripping its
  own YAML key. Defaults: `duckduckgo` / `duckduckgo` / `reader`.
- YAML parser reads all three keys via the existing `str(...)`
  helper. Pre-v0.13 hosts populate only `webToolsBackend`; the
  split keys default to the same backend so the picker shows the
  same value the user already had.

TODO markers (WS-7-Q6/Q7) flag the inline backend lists + legacy
fallback semantics — verify against `~/.hermes/hermes-agent/
hermes_cli/web_tools.py` during integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 18:56:08 +02:00
Alan Wizemann edac142d08 feat(chat): add /goal and /queue slash commands (WS-2)
Adds Hermes v0.13's Persistent Goals and ACP /queue surfaces to the
rich-chat composer. /goal <text> locks the agent on a target across
turns (rendered as an info-tinted "Goal locked" pill in the chat
header, with a context-menu Clear action that dispatches /goal --clear);
/queue <text> queues a prompt to run after the current turn (rendered
as a warning-tinted chip with a popover listing queued prompts +
relative timestamps). Both ride .acpNonInterruptive so the chat keeps
"Agent working…" off, and both surface a 4-second transient toast
mirroring /steer's existing UX.

Capability-gated end-to-end: the rich-chat slash menu reads through
RichChatViewModel.capabilitiesGate (a new @ObservationIgnored field
fed by ChatViewModel.attachCapabilitiesStore on Mac and a parallel
.task(id:) on iOS), so pre-v0.13 hosts never see /goal or /queue.
/steer is greyed-out on idle sessions when hasACPSteerOnIdle is off
(pre-v0.13 hosts only). The "Clear all" queue-popover button is
intentionally absent in v2.8.0 — Hermes' wire-shape for /queue --clear
isn't verified yet, so a button that lies about server-side state is
worse than no button (per WS-2 plan Q2 decision).

Optimistic-only: there is no authoritative read-back path for the
active goal in v2.8.0. The pill paints synchronously off the
optimistic write the moment the user sends /goal …; cross-session
resume won't re-paint it until the user types /goal again. A
TODO(WS-2-Q1) marker in RichChatViewModel.recordActiveGoal points at
the read-back hook for v2.8.1; TODO(WS-2-Q5) flags the verbatim
/queue argument shape for coordinator wire-verification; TODO(WS-2-Q7)
flags the /goal non-interruptive classification. TODO(v2.8.1) in
handlePromptComplete is the deferred "auto-resumed from checkpoint"
indicator (WS-2 plan Q3 decision).

iOS surfaces no UI yet (deferred to WS-9), but the iOS controller's
_sendImpl mirrors the dispatch so the shared RichChatViewModel state
stays aligned across platforms — otherwise an iOS user who ran /goal
then opened the same session on Mac would see an empty pill.

Tests: extends M9SlashCommandTests with 13 new cases covering the
non-interruptive list contents, capability-gated availableCommands
filtering on v0.12 vs v0.13, parseGoalArgument variants, optimistic
mutators (recordActiveGoal / recordQueuedPrompt / popQueuedPrompt),
isNonInterruptiveSlash recognition, and reset() drainage.

Implements WS-2 of Scarf v2.8.0 (Hermes v0.13.0 catch-up).
Plan: scarf/docs/v2.8/WS-2-goals-and-queue-plan.md (on coordination/v2.8.0-plans).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 18:55:54 +02:00
Alan Wizemann fd33b714e3 feat(cron): add --no-agent watchdog toggle gated on hasCronNoAgent
Adds a "Run script only (no agent call)" toggle to the cron job
editor. When ON, the prompt + skills sections dim + disable
visually but stay rendered (no layout shift mid-edit), the
script field stays fully active, and the form passes
`noAgent: true` to `createJob`/`updateJob`. The toggle is hidden
on pre-v0.13 hosts via `supportsNoAgent: hasCronNoAgent` and
defensively stripped at the call site (`hasCronNoAgent ?
form.noAgent : false` on create, `: nil` on edit) — same shape
as the v0.12 `workdir` strip.

Read-side: `HermesCronJob.noAgent: Bool?` is decoded via
`decodeIfPresent` so pre-v0.13 jobs.json files round-trip
unchanged. The display rule `job.noAgent == true` treats
`nil` and `false` identically — a script-only job must opt in.

Write-side:
- `createJob` appends `--no-agent` and passes an empty positional
  prompt (per WS-7-Q5) to keep argparse happy when the prompt is
  the trailing positional.
- `updateJob` sends `--no-agent` / `--agent` to flip the flag in
  edit mode (per WS-7-Q4 — verify the toggle-off spelling on
  integration; if Hermes is one-way, disable the toggle in edit
  mode with a tooltip).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 18:43:03 +02:00
Alan Wizemann c81a8a56e8 feat(mcp): add SSE transport support gated on hasMCPSSETransport
Extends MCPTransport with a third .sse case (alongside stdio + http),
plumbed through the YAML parser, add-server form, list view, detail
view, and editor. The add-server form filters .sse out of the segmented
picker on pre-v0.13 hosts (capability-gated on hasMCPSSETransport) so
Hermes never sees a transport flag it can't parse. The editor renders
a third numeric "SSE read timeout" field only for .sse servers.

YAML layer:
- HermesMCPServer.sseReadTimeout: Int? — defaulted in init, decoded
  from `sse_read_timeout` scalar.
- parseMCPServersBlock: 3-way transport discriminator — `transport: sse`
  scalar wins, then url-bearing entries default to .http (v0.12 shape),
  command-bearing to .stdio. Pre-v0.13 entries are byte-for-byte
  unaffected.
- HermesFileService.addMCPServerSSE writes via `hermes mcp add --url
  <u> --transport sse [--sse-read-timeout <t>]`.
- HermesFileService.setMCPServerSSETimeout patches the scalar via the
  same surgical patcher used by setMCPServerTimeouts.

TODO markers (WS-7-Q1/Q2/Q3) flag the wire-format unknowns the plan
called out — verify against a v0.13 Hermes install during integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 18:34:27 +02:00
Alan Wizemann 3e470c7155 Merge pull request #80 from awizemann/ws-1-capabilities-v0.13
feat(capabilities): add Hermes v0.13 capability flags + version bump (WS-1)
2026-05-09 18:17:51 +02:00
Alan Wizemann 963d0e1a5c feat(capabilities): add isV013OrLater convenience predicate
Surfaces a v0.12 → v0.13 boundary check that doesn't proxy through any
specific feature flag. Used by WS-8 (redaction default-state hint copy,
"v0.13 features active" Settings badge in iOS WS-9) where the call site
isn't actually about a specific feature — it's about whether the host is
on the v0.13 line.

Equivalent to any individual v0.13 flag (e.g. `hasGoals`); both resolve
to the same `>= 0.13.0` threshold. Convenience exists to keep call sites
honest: `caps.isV013OrLater` reads better than `caps.hasGoals` when the
context isn't goal-related.

Tests: 4 new fixtures covering v0.13 host (true), v0.12 host (false),
empty/undetected (false), and v0.14 host (true). 19 total tests in the
suite, all passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 18:08:14 +02:00
Alan Wizemann 52c802676f feat(capabilities): add Hermes v0.13 capability flags + version bump
Adds 22 new capability flags grouped under a v0.13 (v2026.5.7) MARK
section in HermesCapabilities, covering Persistent Goals, ACP /queue
+ /steer-on-idle, Kanban diagnostics + recovery UX, Curator archive
+ prune, Google Chat (20th platform), cross-platform allowlists,
MCP SSE transport, Cron --no-agent, Web Tools backend split, Profiles
--no-skills, context compression count, /new <name>, OpenRouter cache,
image_gen.model, display.language, xAI voice cloning, video_analyze,
and the transform_llm_output plugin hook.

Each flag gates on >= 0.13.0 so v0.13 patch releases (0.13.4 etc.)
still light up every flag. Existing v0.12 flags unchanged. Test suite
extends with v0.13.0/2026.5.7 fixtures, a v0.13.4 patch-release case,
explicit "v0.13 flags off on v0.12 host" coverage, and updates the
future-version test to v0.14.0.

CLAUDE.md target line bumps to v2026.5.7 (v0.13.0); a new v2026.5.7
section mirrors the v0.12 / v0.11 scaffolding describing the Scarf-
relevant subset. The v0.12 + v0.11 historical sections remain intact
since pre-v0.13 hosts still consume those flags.

Foundation for the v2.8.0 Scarf release — every subsequent work-stream
(WS-2 through WS-9) consumes flags added here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 17:31:51 +02:00
Alan Wizemann 5d8873d305 chore: Bump version to 2.7.5 v2.7.5 2026-05-08 13:59:21 +02:00
Alan Wizemann 49bc4efe83 fix(kanban): enrich LocalTransport subprocess env so kanban dispatcher can spawn workers
GUI-launched Scarf inherits macOS's launch-services PATH
(`/usr/bin:/bin:/usr/sbin:/sbin`). Scarf itself finds `hermes` via
absolute-path resolution in `HermesPathSet.hermesBinaryCandidates`,
but when the kanban dispatcher (a child of Scarf) tries to spawn a
worker, the worker inherits the same stripped PATH and Hermes's spawn
machinery prints `\`hermes\` executable not found on PATH. Install
Hermes Agent or activate its venv before running the kanban
dispatcher.` — recording `outcome=spawn_failed` on the run.

`LocalTransport` now mirrors `SSHTransport.environmentEnricher`:
adds an `environmentEnricher: (() -> [String: String])?` static, and
applies it to every subprocess. `scarfApp.swift` wires it at launch
to the same `HermesFileService.enrichedEnvironment()` login-shell
probe (`zsh -l -i` → `zsh -l` fallback) the SSH transport already
uses, so subprocesses see `~/.local/bin`, `/opt/homebrew/bin`, and
the user's credential env vars.

Defense-in-depth: `subprocessEnvironment(forExecutable:)` always
prepends the executable's own directory to PATH if missing — covers
early-startup paths and test harnesses where the enricher hasn't
been wired yet.

Two new tests in `KanbanModelsTests` lock in:
1. The fallback (no enricher → executable's dir lands on PATH)
2. The enricher win for PATH + the empty-string-aware copy semantics
   for credential env vars (process env happens to set
   `ANTHROPIC_API_KEY=""` as an empty string in some environments;
   the enricher's non-empty value must still take effect)

Release notes for v2.7.5 updated to document the fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:59:21 +02:00
Alan Wizemann adcc984091 feat(kanban): full read/write board with per-project tenants
Lifts Scarf's Kanban surface from the v2.6 read-only list to a
drag-and-drop board with the complete Hermes v0.12 mutation surface
wired up, plus per-project boards bound to a Scarf-minted tenant slug
and a read-only board on iOS.

Why now: the v2.6 list was a placeholder shipped while upstream Kanban
collab was still mid-rework. v0.12 stabilized the 27-verb CLI; this
release makes Scarf a real GUI client for it. Driving real tasks
end-to-end exposed and closed a connected bug pattern (claim vs
dispatch, silent skipped_unassigned, integer-vs-ISO timestamps,
parser-leaked "(no" sentinel) that would have shipped as latent UX
papercuts otherwise.

ScarfCore: KanbanService actor (Sendable, pure I/O) wrapping every
verb; KanbanTenantReader cross-platform manifest projection; eight
new model types (TaskDetail, Comment, Event, Run, Stats, Assignee,
CreateRequest, Filters); KanbanError; pure transition planner that
maps drag-drop column changes to verb sequences, tested against
canonical Hermes JSON fixtures.

Mac: KanbanBoardView orchestrator with five-column drag-drop layout,
optimistic-merge state, KanbanInspectorPane side-pane (Comments /
Events / Runs / Log tabs, Log streams worker stdout every 2s while
running), inline assignee picker, health banner for unassigned and
last-failed-run states. New Task sheet defaults to active profile
and auto-fires kanban dispatch on submit. Sidebar moved Kanban from
Manage to Monitor. Read-only KanbanListView preserved as Board|List
toggle for narrow windows / accessibility.

Per-project: DashboardTab.kanban tab on every project gated on
hasKanban; KanbanTenantResolver mints scarf:<slug> tenants on first
interaction and persists to .scarf/manifest.json (immutable across
rename); ProjectAgentContextService surfaces the tenant in the
AGENTS.md scarf-managed block so agents pass --tenant <slug> on
kanban create. New kanban_summary dashboard widget; vocabulary
mirrored in tools/widget-schema.json and site/widgets.js.

iOS: read-only board on the project tab via paged single-column
Picker, modal detail sheet with Comments / Events / Runs. Mutations
+ drag-drop deferred to v2.8.

Tests: 19 new pure-logic tests covering decoding, planner verb
mapping, argv assembly, glance string formatting, and parser
rejection of the kanban assignees empty-state sentinel. All 348
ScarfCore tests pass.

Constraints documented in CLAUDE.md: no within-column reorder
(Hermes has no update --priority verb); no live watch streaming
yet (5s polling for board, 2s for log); no bulk re-tag for legacy
NULL-tenant tasks. Pre-v0.12 Hermes hosts gracefully hide the
surface end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:59:21 +02:00
Alan Wizemann fd80f4f95a Create FUNDING.yml 2026-05-07 12:55:53 +02:00
Alan Wizemann 9f240ae291 chore: Bump version to 2.7.1 v2.7.1 2026-05-07 12:46:11 +02:00
Alan Wizemann 9c149b288b fix(docs): restore Sonoma compatibility messaging in BUILDING.md + CONTRIBUTING.md
Scarf's `MACOSX_DEPLOYMENT_TARGET` is `14.6` (Sonoma) on the main
`scarf` target, set in 86762ea. Sonoma support is intentional —
several users dogfood on macOS 14.x and we want to keep them on the
release channel. Yesterday's BUILDING.md and the long-stale
CONTRIBUTING.md statement both claimed macOS/Xcode 26.x as minimums,
which would have steered Sonoma contributors and users away from a
build that actually runs on their box.

Correct values:

- Runtime min: **macOS 14.6 (Sonoma)** — matches the deployment target.
- Build min: **Xcode 16.0** — needed for Swift 6 strict-concurrency
  features the codebase uses.

Add a load-bearing-callout to BUILDING.md so future doc edits don't
silently raise the floor again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:13:18 +02:00
Alan Wizemann 37afbdeffc feat(build): contributor-friendly local-build.sh + BUILDING.md
Adds `scripts/local-build.sh` for unsigned command-line Debug builds
so contributors without an Apple Developer account can clone, build,
and run without provisioning gymnastics. The script:

- Detects arm64 / x86_64
- Verifies xcode-select, xcrun, xcodebuild are present
- Probes the Metal toolchain and offers an interactive install (gated
  on `[[ -t 0 && -z "${CI:-}" ]]` — CI never gets prompted)
- Resolves Swift packages, builds Debug with signing disabled
- Optionally `ditto`s the result to /Applications/scarf.app on
  explicit y/N

`BUILDING.md` documents prerequisites alongside the script. Existing
canonical Release universal CLI in README stays — `local-build.sh`
is an alternative for contributors, not a replacement for the
shipping build.

Cherry-picked from #76 with thanks to @unixwzrd. BUILDING.md's
prerequisites are corrected to match the actual deployment target
(macOS 26.2, Xcode 26.2+).

Co-Authored-By: M S <unixwzrd.register@mac.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:08:33 +02:00
Alan Wizemann bfd9bab9a0 fix(health): stop external dashboards by port, not pkill -f
`stopDashboard()` used to fall back to `pkill -f "hermes dashboard"`
when the running dashboard wasn't a Scarf-spawned subprocess. That's
broad enough to match shell history, log tails, README readers, and
this very source file — anything with the substring "hermes
dashboard" in its argv was a kill target.

Replace with a port-anchored lookup: `lsof -tiTCP:<port> -sTCP:LISTEN`
returns the PID actually bound to the dashboard port, then we
`SIGTERM` only that one process. Trusting the port is correct here:
Scarf owns the configured port and the user-visible intent is "stop
the thing on this port."

We deliberately omit `lsof -c hermes`. Hermes installs as a Python
shebang script (verified locally — `file ~/.local/bin/hermes` →
"a python3 script text executable"), so the kernel COMM is `python` /
`python3`, never `hermes`. A `-c hermes` filter would silently miss
every standard install.

Cherry-picked from #76 with thanks to @unixwzrd for the direction;
this version drops the `-c hermes` filter to actually fire on real
Hermes installs.

Co-Authored-By: M S <unixwzrd.register@mac.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:08:23 +02:00
Alan Wizemann 2e0eb63ea4 fix(health): tighten Hermes gateway pgrep so unrelated commands don't match
`hermesPIDResult()` was running `pgrep -f hermes`, which matched any
process with "hermes" anywhere in its argv — `hermes acp` chat
sessions Scarf itself spawns, `hermes -z` one-shots, log tails, even
this very file in an editor. The Dashboard "Hermes is running" badge
read true even when the gateway daemon was down.

Narrow the match to the gateway shape specifically. Two alternations
cover both invocation forms used in the wild:

- `python -m hermes_cli.main gateway run …` (the launchctl form)
- `/path/to/hermes gateway run …` (the script-path form)

Verified locally against an actual gateway PID:

    cmd=/Users/.../python -m hermes_cli.main gateway run --replace

The first alternation matches via the `-m hermes_cli.main gateway run`
boundary. All callers — `stopHermes()`, `DashboardViewModel`,
`HealthViewModel`, `SettingsViewModel`, `scarfApp` — semantically
want the gateway PID specifically, so the narrower match is the
right shape, not a behavior change.

Cherry-picked from #76 with thanks to @unixwzrd for the diagnosis
and the regex.

Co-Authored-By: M S <unixwzrd.register@mac.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:08:11 +02:00
Alan Wizemann 3a3c87e033 fix(skills): scope What's New pill to Installed tab + reword updated→changed
Issue #78 — The "What's New" pill at the top of the Skills page
announced "18 new, 3 updated since you last looked" while the Updates
sub-tab simultaneously said "No Updates / All skills are up to date."
Two surfaces measuring two different things both used the word
"update": the pill counts local file deltas since the user last
clicked "Mark as seen", while the Updates body runs `hermes skills
check` to find skills with newer upstream versions available. From
the user's seat the screen contradicted itself.

Two changes:

1. Render the pill only on the Installed sub-tab (Mac + ScarfGo).
   Local file deltas are contextually meaningful only on the tab
   that surfaces installed skills; showing them above Browse Hub or
   Updates was misleading.

2. Reword the pill: "X updated since you last looked" → "X changed
   since you last looked". Keeps `SkillSnapshotDiff.updatedCount` as
   the field name (it's still about file changes, not version bumps);
   only the user-visible string changes. Removes the vocabulary
   collision with the Updates tab's separate upstream-update check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 11:51:05 +02:00
Alan Wizemann f9e3cd38f5 fix(skills): client-side filter for All-Sources hub search
Issue #79 — Browse Hub clearly listed "honcho" but searching for
"honcho" with the source picker on "All Sources" returned nothing.
Root cause is on the Hermes side: `hermes skills search <query>`
without a `--source` flag routes through the centralized
`hermes-index` source and skips the external API sources
(skills-sh, github, clawhub, lobehub, well-known, claude-marketplace).
Browse aggregates those sources too, so any skill that lives only in
the API tier shows up in browse but disappears in search. Same picker,
same query, contradictory results.

Rather than chase Hermes's index gaps, redefine "All Sources" search
in Scarf to mean filter-what-you-see — the canonical type-to-filter
UX users already expect on a list. Source-specific searches keep the
CLI shell-out for full upstream search semantics on that registry.

Implementation:
- New `lastBrowseResults` cache populated on every successful
  `browseHub()`. Setter is `internal` so the test suite can seed
  without invoking the live CLI; out-of-module callers can still
  only read.
- `searchHub()` now branches on `hubSource`. The "all" branch filters
  the cache via `localizedCaseInsensitiveContains` against name,
  description, and identifier, runs synchronously on the calling
  actor (UI invocations are already on MainActor) so the user sees
  the narrowed list without a render-tick gap.
- If the cache is empty (search-before-browse), `browseHubThenFilter`
  performs one CLI fetch, populates the cache, then applies the
  filter — failure surfaces a "Search failed" banner instead of a
  silent empty state.
- Source-specific search still shells out to
  `hermes skills search <query> --source <s> --limit 40`.

Adds five regression tests covering name match, description match,
case-insensitive folding, no-match message state, and the empty-query
fallthrough to browse.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 11:50:52 +02:00
Alan Wizemann a6a8cae8ff fix(transport): drain ssh stdout/stderr concurrently to unwedge >64KB payloads
Issue #77 — Sessions screen rendered empty even though Dashboard
reported 161 sessions and Activity reported 116. Root cause was a
classic pipe-buffer deadlock in SSHScriptRunner: stdout was read via
`readToEnd()` AFTER the subprocess had exited. macOS pipes default to
a 16–64 KB kernel buffer; once the remote `sqlite3 -json` script wrote
more than that to its stdout, ssh back-pressured across the wire,
sshd back-pressured sqlite3, sqlite3 blocked, the script never
finished, the 30-second timeout fired, `streamScript` threw, and
`HermesDataService.sessionListSnapshot()` swallowed the failure into
an empty array. Empty Sessions list. Dashboard kept working because
its smaller LIMIT 5 payload fit under the threshold.

Why this was a v2.7 regression specifically: 20cc3a2 folded the
previously-separate sessions + previews queries into a single batched
round-trip (perf win for remote users). The new combined payload for
~150+ sessions crossed the buffer threshold for the first time.

Fix: drain stdout/stderr concurrently with the running process via
Foundation's `FileHandle.readabilityHandler`, accumulating chunks
into an NSLock-guarded `Data` buffer. The kernel pipe never fills,
the subprocess never blocks, the script returns the full payload.
Same change applied to both the SSH path (`runOverSSH`) and the
local path (`runLocally`) — they had identical bug shapes.

Adds SSHScriptRunnerTests with three regression checks: a 256 KB
synthetic payload that would have wedged pre-fix, a small-payload
sanity round-trip, and a non-zero exit propagation check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 11:50:34 +02:00
Alan Wizemann 6b66b1c96f perf(ios): wire v2.7 perf parity — instrument iOS-only VMs + surface hydration banner + opt-in toggle
Most of the v2.7 perf work was already covered on iOS via shared
code in ScarfCore — `RichChatViewModel.loadSessionHistory` (and
its skeleton-then-hydrate path), `hydrateAssistantToolCalls`,
`fetchSkeletonMessages`, `fetchRecentToolCallSkeleton`,
`ModelPreflight.detectMismatch`, and the `RemoteSQLiteBackend`
cancellation handler all flow through to the ScarfGo chat
unchanged. `CitadelServerTransport.streamScript` already
honors `Task.isCancelled` correctly via `withThrowingTaskGroup` +
`Task.checkCancellation()`, so the SSH-cancellation-on-nav-away
chain works on iOS without the Mac-side `SSHScriptRunner` fix.

Three iOS-specific gaps closed:

* IOSCronViewModel.load + IOSMemoryViewModel.load wrapped in
  `ScarfMon.measureAsync(.diskIO, "ios.cron.load")` /
  `"ios.memory.load"` — parity with the Mac `cron.load` /
  `memory.load` events. `ios.memory.load.bytes` records the
  payload size for the loaded file.
* iOS Settings → "Chat (Scarf)" section gains a toggle bound to
  `RichChatViewModel.loadHistoricalToolResultsKey` so iOS users
  can opt into Phase 2b bulk tool-result hydration, same as the
  Mac DisplayTab. The shared key means the gate inside
  `startToolHydration` reads the right value automatically — no
  extra plumbing needed.
* iOS ChatView surfaces `isHydratingTools` as a "Loading tool
  details…" connection banner (mirrors the Mac toolbar pill
  added in v2.7 perf work). Sits between the existing
  "Thinking…" banner and the empty-view fallback so chat status
  is always honest about what the agent and Scarf are doing.

Both Mac and iOS targets build clean; all 321 ScarfCore tests
pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:26:25 +02:00
Alan Wizemann 97ec4d2882 chore: Bump version to 2.7.0 v2.7.0 2026-05-05 20:41:39 +02:00
Alan Wizemann cd5bb32a21 release: prep v2.7.0 — consolidated notes + in-app Sparkle release notes
Rolls up everything since v2.6.5 (36 commits across remote-perf,
project wizard, dashboard widgets, OAuth resilience, ScarfMon
instrumentation, and the v2.7 skeleton-then-hydrate redesign) into
a single 2.7.0 release.

* releases/v2.7.0/RELEASE_NOTES.md — full consolidated notes,
  reorganized around the throughline (slow-remote performance) with
  five thematic sections: skeleton-then-hydrate loaders, SSH
  cancellation, project wizard + Keychain cron secrets, dashboard
  widgets, OAuth resilience, and ScarfMon. Replaces the previously-
  drafted dashboard-only v2.7.0 stub and the separate v2.8 wizard
  stub (both unreleased).
* releases/v2.8/ — deleted; folded into v2.7.
* README.md — "What's New in 2.6" → "What's New in 2.7" with the
  five-section summary linking out to the full notes.

* tools/render-release-notes.py — stdlib-only Markdown → HTML
  renderer covering the subset of GitHub-flavored markdown that
  release notes use (## / ### headings, paragraphs, ul lists,
  fenced code, inline code/bold/italic/links, hr). Output includes
  a small <style> block tuned for Sparkle's update alert WebKit
  view (light + dark variants via prefers-color-scheme).
* scripts/release.sh — render the active RELEASE_NOTES.md and
  inject the result as <description><![CDATA[...]]></description>
  on the appcast item. Sparkle's standard updater renders this in
  the in-app update sheet so users see release-specific "what's
  new" alongside the version number, not just the bare version.
  Falls back to a "see GitHub release page" placeholder when the
  notes file is missing.

User runs ./scripts/release.sh 2.7.0 to ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 20:31:27 +02:00
Alan Wizemann 5e23b59697 test(model-preflight): cover detect-mismatch + fix newline-trim bug
* New ModelPreflightTests suite (19 tests) covering both `check(_:)`
  and the v2.8 `detectMismatch(_:)` paths. Pins the dogfooding
  scenario (anthropic-prefixed model + nous active provider after
  Credential Pools OAuth swap), the case-insensitive prefix match,
  empty-prefix / empty-bare-model edge cases, and multi-slash model
  ids (OpenRouter style).

* Bug fix surfaced by the tests: `ModelPreflight` was using
  `trimmingCharacters(in: .whitespaces)` which doesn't strip
  newlines. A stray `\n` in a hand-edited config.yaml would either
  miss the missing-fields classifier OR false-positive the mismatch
  banner (showing "anthropic" vs "anthropic\n"). Switched both
  trims to `.whitespacesAndNewlines`.

perf(observability): instrument Tier C load paths + fetchSessionPreviews

No behavior change — adds ScarfMon coverage so future captures show
how often Memory/Skills/Cron/Curator/SessionPreviews load paths fire
and what they cost on remote (each is multiple sequential SFTP RTTs
that pre-fix were invisible). New events:

* `mac.fetchSessionPreviews` / `.rows` / `.transportError`
* `memory.load` / `.bytes`
* `cron.load` / `.jobs`
* `skills.load` / `.count`
* `curator.load` / `.bytes`

All 321 ScarfCore tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 20:03:35 +02:00
Alan Wizemann 09e33b2999 perf(chat,activity,transport): skeleton-then-hydrate loaders + SSH cancellation propagation
Major perf overhaul for slow-remote contexts. Chats and Activity now
render in <2s instead of timing out at 30s; abandoned SSH work is
killed within 100ms instead of pinning a ControlMaster session.

* Skeleton-then-hydrate chat loader. New `fetchSkeletonMessages`
  selects user+assistant rows only (skips role='tool', NULLs
  tool_calls + reasoning at the SQL level). Wire payload bounded by
  conversational text alone — sub-second on remote regardless of
  underlying tool result blob sizes. Background `startToolHydration`
  pages through `hydrateAssistantToolCalls` (5-id batches) to splice
  tool calls in. Tool-result CONTENT is opt-in via Settings → Display
  → "Load tool results in past chats" (default off); inspector pane
  lazy-fetches per-result via `fetchToolResult(callId:)` on expand.

* Skeleton-then-hydrate Activity loader. New
  `fetchRecentToolCallSkeleton` returns metadata-only rows in ~3 KB
  for 50 entries; placeholder ActivityRows render immediately, real
  per-call entries swap in as paged hydration completes. Loading
  pill in the page header, orange transport-error banner replaces
  the pre-fix silent empty state.

* SSH cancellation propagation. `Task.detached` and unstructured
  `Task<...> { ... }` don't inherit cancellation from awaiting
  parents — without bridging, killing a Swift Task left the ssh
  subprocess running for the full 30s deadline, pinning a remote
  sqlite query and a ControlMaster session. Wired
  `withTaskCancellationHandler` through `SSHScriptRunner.run` and
  `RemoteSQLiteBackend.query`; cancellation now reaches `Process`
  within ~100ms. New `ssh.cancelled` ScarfMon event.

* L1 single-id retry. When a 5-id `hydrateAssistantToolCalls` page
  trips the 30s timeout (one row carries an oversized tool_calls
  blob — long Edit args, big diffs), fall back to single-id queries
  to isolate the whale. Non-whale rows in the same batch hydrate
  normally; whale row stays bare. New `mac.hydrateToolCalls.singleTimeout`
  event tracks how often the recovery fires.

* L2 in-flight coalescing for `loadRecentSessions`. File-watcher
  deltas during streaming used to stack 2-3 parallel sessions-list
  reload tasks; subsequent callers now await the active one. New
  `mac.loadRecentSessions.coalesced` event tracks dedup hits.

* Loading-state UX hardening. New `isStartingSession` flag flips
  synchronously on user click so the chat sidebar greys + disables
  immediately instead of waiting for `client.start()` to return
  (5-7s on remote). Phase-typed status: "Spawning hermes acp…" →
  "Authenticating…" → "Loading session…" → "Loading history…" →
  "Ready". `ChatSessionListPane` overlays a ProgressView showing
  the current phase.

* Partial-result detection. `fetchMessagesOutcome` distinguishes a
  transport failure from a genuine empty result; `loadSessionHistory`
  surfaces "Couldn't load full chat history — connection timed out"
  through the existing acpError triplet so the user sees what
  happened instead of a silent empty transcript.

* Model/provider mismatch banner. `ModelPreflight.detectMismatch`
  recognizes when `model.default` carries a `<provider>/...` prefix
  that disagrees with `model.provider` (e.g. anthropic prefix +
  nous active provider after switching OAuth via Credential Pools).
  Banner offers one-click fix in either direction. Companion: ACP
  error classifier recognizes `model_not_found` / `404 messages`
  and surfaces "Hermes pins each session to its original model —
  start a new chat" so the pinned-model failure mode has a clear
  recovery path.

* OAuth-completion provider swap prompt. After successful OAuth in
  Credential Pools, if the just-authed provider differs from
  `model.provider` in config.yaml, surface "Switch active provider
  to <name>?" with [Switch] / [Keep current] instead of
  auto-dismissing.

All 302 ScarfCore tests pass. New ScarfMon events documented in the
Performance-Monitoring wiki page.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 19:43:53 +02:00
Alan Wizemann 9f2e2ecfcd perf(chat): exclude reasoning_content from initial fetch + drop page size to 25
The 160-message thinking-model session still timed out at the 30s
ceiling even after dropping page size 200→50 in commit a193003.
ScarfMon trace:

  mac.fetchMessages    30,105,329,125 ns ← 30s timeout fired
  mac.hydrateMessages.rows  count=1     ← 1 partial row only

Root cause: `reasoning_content` is huge on thinking models (20+
KB per row). Even 50 rows × 30 KB = 1.5 MB JSON shipping over a
420ms-RTT remote SSH channel exceeds the budget. The chat
appeared empty AGAIN.

Two cuts:

1. **`messageColumnsLight`** — same as messageColumns but omits
   `reasoning_content`. Used by `fetchMessages` so the bulk
   wire payload is small. `messageFromRow` reads
   reasoning_content via `row.optionalString(at: 11)` which
   gracefully returns nil when the column isn't present, so the
   shape change is transparent.

2. **`fetchReasoningContent(for:)`** — single-row lazy fetch
   the inspector pane calls when the user expands a thinking
   disclosure. One small SSH round-trip per inspection vs. paying
   for ALL reasoning content on every session boot.

3. **`HistoryPageSize.initial` 50 → 25** — sized for the lite
   column shape with margin for sessions that include some heavy
   tool-call payloads. The "Load earlier" affordance still
   pages back through older messages.

Net effect on the user-reported case: 160-message session loads
the most-recent 25 messages in ~5-10s (one SSH round-trip ~420ms
plus ~3 KB × 25 = 75 KB wire). The remaining 135 are reachable
via Load earlier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:28:40 +02:00
Alan Wizemann 1eb5c92f6a fix(aux-tab): correct nested-YAML parser so unknown-task surface works on remote
Bug 1 — the previous parser collected every indented child under
`auxiliary:` as if it were a task name, including leaf fields
(provider, model, base_url, api_key, timeout). Result: bogus rows
on local where the parser happened to fire, plus pollution of
the unknown-tasks set with field names that subtractFrom-known
left orphaned.

Bug 2 — the flat-dot-path branch (`auxiliary.X.Y:`) was dead
code. config.yaml is always nested YAML; the dot-path form only
appears in interactive `hermes config get` output, never on
disk. Removing it.

User reported the unknown-tasks section showed on local but not
on remote. Most likely root cause: the buggy parser surfaced
junk on local (where their config has nested-form aux settings)
while the dead flat-path branch never fired on remote either,
so remote silently rendered nothing. With the parser fixed both
contexts now surface real unknown task names if any are
present.

Rewrite as a clean two-pass walker:
- First nested line inside the block locks taskIndent.
- Only collect at exactly taskIndent (skip leaf fields deeper).
- Tolerate CRLF line endings, blank lines, and YAML comments
  without resetting block state.
- Handles 2-space and 4-space indent equally.

Verified manually with four fixture shapes: 2-space, 4-space,
with-comments-and-blanks, no-aux-block. All correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:12:55 +02:00
Alan Wizemann bccaba0742 feat(acp,aux): classify resolve_provider_client errors + surface unknown aux tasks
Two fixes for the user-reported "ACP -32603 Internal error" after
removing a Nous OAuth provider while config.yaml still referenced
nous for an auxiliary task. The actual stderr was clear:

  agent.auxiliary_client: resolve_provider_client: nous requested
    but Nous Portal not configured

But Scarf's chat banner showed only the bare JSON-RPC code and
the user had no actionable path through the UI.

**ACPErrorHint.classify** now pattern-matches the
`resolve_provider_client: <name> requested but` stderr line and
extracts the provider name. Surfaces:

  An auxiliary task is configured to use `<name>` but that
  provider isn't authenticated. Open Settings → Aux Models, or
  check ~/.hermes/config.yaml for auxiliary.<task>.provider: <name>
  and switch it to your active provider (or set it to `auto`).

Routed through the existing chat-banner pipeline that already
catches OAuth revocation and missing-credentials errors.

**AuxiliaryTab** gains an "Other tasks in config.yaml" section
that surfaces aux task keys present in YAML but not in Scarf's
typed list (vision, web_extract, compression, session_search,
skills_hub, approval, mcp, flush_memories, curator). Common
case: `auxiliary.summarization.provider: nous` left over from
older Hermes versions or hand-edited configs. Each unknown task
gets a one-click "Reset provider" button that writes
`auxiliary.<key>.provider: auto` — the most-actionable fix
for the OAuth-removal failure mode. Detection scans both
flat-dot-path and nested YAML shapes so it works regardless of
how Hermes dumped the file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:00:48 +02:00
Alan Wizemann 4684b9deed feat(credential-pools): OAuth remove button + auto-refresh on auth.json change
User reports the Nous OAuth provider still showed in the
credential pool after they 'removed' it, and Reload didn't help.
Two underlying bugs:

**Bug 1 — no UI path to remove OAuth providers.** The pool view
had a Re-authenticate button on each OAuth row but no remove.
Users who switched active provider thought that removed Nous;
the OAuth tokens stayed in auth.json and the row kept rendering.
Add a trash icon next to Re-authenticate that calls
`hermes auth logout <provider>` after a confirmation dialog.
ViewModel route is `removeOAuthProvider` mirroring
`removeCredential`.

**Bug 2 — view didn't refresh on external auth.json changes.**
Pool view subscribed only to .onAppear and sheet-dismiss. A
terminal `hermes auth logout` or another window's OAuth flow
left the view stale until manually re-entered. Wire up
`fileWatcher.lastChangeDate` so any auth.json mtime tick
triggers a reload (the file watcher already polls auth.json
on the remote SSH path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:46:41 +02:00
Alan Wizemann f6dc45b397 feat(scarfmon): track empty-assistant turns + document Nous quirk
User reports chats "dying" on Nous models — screenshot shows the
assistant bubble stuck with `(°□°) deliberating...` and a
1.7s turn-duration pill (turn DID complete; the content is the
problem). The literal placeholder string isn't in Scarf's source;
it's coming from Hermes or Nous itself when the model emits a
brief thought stream and then fails to produce any visible
output.

ScarfMon trace confirms the failure mode:
  mac.sendViaACP    →  firstThoughtByte (25 bytes)
  mac.handleACPEvent  ✓
  mac.sendPrompt     ✓ (1.7s, normal)
  finalizeStreamingMessage  ✓ (turn cleanly closed)

So Scarf sees no transport error — the turn finalized normally
with empty assistant text plus a small thought stream. The
visible "deliberating" text is content Hermes/Nous chose to
substitute for the missing response.

Adds `mac.emptyAssistantTurn` event (category .chatStream) that
fires whenever a turn finalizes with empty `streamingAssistantText`
and empty `streamingToolCalls`. Bytes carry the thinking-text
length so we can distinguish:
  - bytes=0: total empty turn (model produced nothing)
  - bytes>0: thoughts-only turn (model thought but didn't answer)

Both are user-visible failures. The fix is upstream — Hermes
should refuse to finalize a turn with no response and surface
an error, OR Nous should not return empty responses with the
placeholder string. Document this finding so a future capture
that shows multiple `mac.emptyAssistantTurn` events confirms
the rate / model-correlation.

For now Scarf surfaces the same UX as before (no UI change in
this commit). A follow-on commit could intercept this case and
replace the bubble with a clearer "Model returned no response"
banner, but that requires a confident heuristic for which
empty-finalize cases are real failures vs. legitimate
no-response turns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:40:21 +02:00
Alan Wizemann f2ddcbbd60 feat(model-picker): add search filter to Nous overlay model list
Nous returned 402 models in the recent perf capture (~496 KB of
JSON). The picker's existing top-bar search field already filters
the catalog list (`filteredModels`) but the Nous overlay path
showed all 402 unfiltered, making it nearly unusable.

Add `filteredNousModels` mirroring the `filteredModels` shape:
filters `nousModels` by case-insensitive substring match against
both `id` and `owned_by`. Updates the empty-state overlay so
"no matches" surfaces a different message from "no models
loaded" — the user knows the catalog is fine, the search just
didn't match.

User feedback: "we need a search in the model picker, some of
these lists are large and unorganized."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:38:30 +02:00
Alan Wizemann a193003842 fix(chat): paginate session-load + race-guard against session switch
Two related bugs from remote-context perf captures.

**Bug 1 — 30s timeout fetching the 157-message session.** The
initial page size was 200 messages. For a session including
`reasoning_content` from a thinking model, that produces enough
JSON over `sqlite3 -json | ssh` to time out at exactly 30s on a
420ms-RTT remote, returning 0 rows. Bumping queryTimeout further
just trades latency for stalls.

Drop `HistoryPageSize.initial` from 200 → 50. Sized to fit
comfortably inside the 30s queryTimeout; the existing "Load
earlier" affordance pages back through older messages on demand.

**Bug 2 — session-switch race silently swaps transcripts.** When
the user picks a small chat while a slow fetch for a different
chat is still in flight, the slow fetch finishes second and its
`messages = …` assignment overwrites the small chat's transcript.
User sees the small chat "jump back" to the big one. ScarfMon
trace: parallel `mac.fetchMessages` events at t=641870 (small,
425ms, 2 rows) and t=643316 (big, 30,028ms timeout) — last
write won.

Add a `loadingForSession` capture and three guards: after the
DB refresh, after the primary fetch, after the ACP-fork fetch.
Each compares `self.sessionId` against the captured id; on
mismatch fire `mac.hydrateMessages.dropped` and return without
assigning. Race is silent in normal usage but visible in traces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:38:19 +02:00
Alan Wizemann 93a64e3e82 fix(nous-picker): kill 120s beach-ball — dedupe readCache + 5s timeout
Two stacking bugs in the Nous-overlay branch of the model picker
caused a 120-second beach-ball on remote contexts.

**Bug 1 — duplicated readCache.** ModelPickerSheet.refreshNousModels
called `service.readCache()` directly (for instant first-paint),
then called `service.loadModels(forceRefresh: false)` which calls
`readCache()` AGAIN as its first step. Two SSH round-trips per
picker open. Drop the inline call; loadModels is already cache-first
on its happy path (returns `.cache(...)` when fresh). One read
per open.

**Bug 2 — 60s readFile timeout for a hint.** `readCache()` goes
through SSHTransport.readFile which has a 60s default timeout. On
a remote with a corrupted or oversized cache file, `cat` never
returns and we wait the full 60s — twice, due to bug 1, for a
total 120s picker stall. ScarfMon perf capture (commit 00a1bbd's
diagnostic split) localized this precisely:

  nous.readCache.fileExists  =   251 ms  ✓
  nous.readCache.readFile    = 60,011 ms   (60s timeout)

Cache is an optimization, not a requirement. Added
`readCacheWithTimeout(seconds: 5)` that races readCache against
a 5-second sleep via withTaskGroup. On timeout returns nil; caller
treats that as no-cache and falls through to the network fetch
(which succeeded in 2s in the offending capture, returning 402
models). The runaway `cat` keeps running on its own 60s transport
timeout but no longer blocks the picker.

New ScarfMon event: `nous.readCache.timeoutFired` surfaces hits
in traces so we can tell whether the timeout is being exercised
in the wild.

The underlying `cat` hang on the cache file is still unexplained;
the file size (~500KB) shouldn't take 60s on a 420ms-RTT SSH link.
For now: deleting the cache file (`rm ~/.hermes/scarf/nous_models_cache.json`
on the remote) is the workaround. The next picker open will rebuild it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:17:45 +02:00
Alan Wizemann 00a1bbd109 feat(scarfmon): split nous.readCache into fileExists/readFile/decode/bytes
Last perf capture showed nous.readCache as a single 60-second
interval — but the function does three things (transport.fileExists,
transport.readFile, JSONDecoder). Splitting the measure points so
the next capture localizes which step actually owns the wall-clock.

Adds:
- nous.readCache.fileExists (interval) — SSH `test -e` round-trip
- nous.readCache.readFile (interval) — SSH `cat` round-trip
- nous.readCache.bytes (event) — payload size of the cache file
- nous.readCache.decode (interval) — JSON parsing cost

If the next 60-second beach ball localizes to readFile, we know
the cache file is somehow huge or the SSH read is hung; if it's
fileExists, the path resolution is the issue; if decode, we have
malformed JSON. All three wear the same outer wrapper so the
existing nous.readCache total stays for trend comparison.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:07:43 +02:00
Alan Wizemann 20cc3a2985 perf(sessions): fold sessions+previews into one batched SSH round-trip
Audit Finding 1 — ChatViewModel.loadRecentSessions and
SessionsViewModel.load each fired two sequential `await
dataService.fetch*` calls (sessions + previews), paying the 420 ms
SSH RTT twice on every reload. Visible in ScarfMon traces as
back-to-back `ssh.run` intervals, totaling ~840 ms minimum
overhead per sidebar refresh.

Adds HermesDataService.sessionListSnapshot(limit:) — same shape
as the existing dashboardSnapshot, folds both queries into a
single backend.queryBatch() call. Both call sites switched.

Halves the SSH round-trips for every sidebar load. With Finding 5's
coalescing, redundant parallel reloads also become free. Together,
the 9× redundant queries-per-minute observed in baseline captures
should drop substantially.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:07:31 +02:00
Alan Wizemann 432d5b0b52 fix(remote-sqlite): bump query timeout 15s→30s + add in-flight coalescing
Two issues from the perf capture:

1. fetchMessages on a 157-message session timed out at exactly 15.06 s
   (`mac.fetchMessages` interval = 15,062,646,042 ns), then silently
   returned 0 rows. The chat appeared empty but the session had data;
   the timeout was firing before sqlite3 -json could ship the ~50KB
   payload over a 420 ms-RTT SSH link. Bumped queryTimeout to 30 s.
   The streamScript transport-level timeout still fires on truly
   wedged hosts.

2. mac.loadRecentSessions fired twice in parallel at t=960450 +
   t=960584, finishing 134 ms apart — two independent watcher ticks
   each spawning a full 3-query SSH load for the same data. Added
   in-flight request coalescing keyed on the inlined SQL text:
   when a query with the exact same SQL is already pending, second
   caller awaits the first task instead of spawning a new
   subprocess. New ScarfMon event `sqlite.query.coalesced`
   surfaces hits in traces.

Coalescing is surgical — applies to single `query` calls only,
not `queryBatch` (different timeout scaling, concurrent-same-batch
is rare). Avoids serializing independent work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 12:07:19 +02:00