scarf

mirror of https://github.com/awizemann/scarf.git synced 2026-05-10 10:36:35 +00:00

Author	SHA1	Message	Date
Alan Wizemann	3a764e81e0	Merge remote-tracking branch 'origin/ws-6-providers-v0.13' into integration/v2.8.0 # Conflicts: # scarf/Packages/ScarfCore/Sources/ScarfCore/Models/HermesConfig.swift # scarf/Packages/ScarfCore/Sources/ScarfCore/Parsing/HermesConfig+YAML.swift	2026-05-09 19:10:43 +02:00
Alan Wizemann	6e90741a17	Merge remote-tracking branch 'origin/ws-5-gateway-v0.13' into integration/v2.8.0	2026-05-09 19:09:38 +02:00
Alan Wizemann	93a3b40a67	Merge remote-tracking branch 'origin/ws-4-curator-archive' into integration/v2.8.0	2026-05-09 19:09:38 +02:00
Alan Wizemann	52f0ddb36c	Merge remote-tracking branch 'origin/ws-3-kanban-v0.13' into integration/v2.8.0	2026-05-09 19:09:38 +02:00
Alan Wizemann	cedee04f2a	feat(kanban): v0.13 diagnostics + recovery UX (WS-3) Layers Hermes v0.13's reliability + recovery affordances on top of the v2.7.5 Kanban v3 board. New surface — gated end-to-end on `HermesCapabilities.hasKanbanDiagnostics` (>= v0.13.0): - Hallucination gate. Worker-created cards land in `pending` until the user verifies the underlying work exists. Inspector renders a yellow Verify / Reject banner above the body; cards dim to 0.6 with a question-mark glyph. Verify is optimistic — banner clears immediately, polling confirms. Reject routes through `comment` + `archive` so there's an audit trail. - Generic diagnostics engine. `HermesKanbanDiagnostic` (new model + typed-mirror enum `KanbanDiagnosticKind`) renders cross-run signals on the inspector header and per-run signals under each Runs row. Card footer gains a stethoscope dot when any signal is attached. - `max_retries` create-time field + inspector chip. Toggle-gated Stepper in the create sheet sends `--max-retries N`; chip on the inspector header reads it back read-only with a tooltip explaining there's no update verb. - Multi-line title input. Create sheet's title becomes a `TextField(axis: .vertical, lineLimit: 1...4)`. Newlines are stripped client-side on pre-v0.13 hosts (which truncate at the first `\n`). - Auto-blocked reason banner. When `task.auto_blocked_reason` is set, replaces the generic "Last run: blocked" with a red banner rendering the server reason verbatim. Card footer shows a 1-line truncated copy in red. - Tolerant decode contract. Every new field is `Optional` with `decodeIfPresent`; diagnostics arrays use `try?` so a single malformed entry doesn't poison the row. v0.12 hosts decode unchanged. Implements WS-3 of Scarf v2.8.0 (Hermes v0.13.0 catch-up). Plan: scarf/docs/v2.8/WS-3-kanban-v0.13-plan.md (on coordination/v2.8.0-plans). 
TODOs marked inline pending integration against a live v0.13 binary: WS-3-Q1 (verify verb name), WS-3-Q2 (diagnostics envelope vs task), WS-3-Q4 (failure_count placement), WS-3-Q5 (darwin-zombie kind string), WS-3-Q6 (max_retries default). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 19:06:38 +02:00
Alan Wizemann	b4482e5ee7	feat(gateway): Google Chat platform + cross-platform allowlists + behavior toggles (WS-5) Catches the Mac Messaging Gateway and Platforms surfaces up to Hermes v0.13.0. Adds Google Chat as the 20th platform under Settings → Platforms, gated on `hasGoogleChatPlatform`. Adds a per-platform "Gateway behavior" subsection to the six platforms Hermes added v0.13 allowlist support to (Slack, Mattermost, Google Chat, Telegram, WhatsApp, Matrix) — each exposes the `allowed_channels` / `allowed_chats` / `allowed_rooms` editor plus three new toggles (`busy_ack_enabled`, `gateway_restart_notification`, `slash_command_notice_ttl_seconds`). The Messaging Gateway page header gains a one-line cross-profile digest sourced from `hermes gateway list --json`. SkillsView surfaces an informational row on skills whose body contains the v0.13 `[[as_document]]` directive. New ScarfCore types: `GatewayAllowlistKind` (channels/chats/rooms + platform mapping), `GatewayPlatformSettings` (per-platform v0.13 bundle), `GatewayConfigWriter` (pure YAML list-block editor — `hermes config set` can't write lists; tested with 15 cases incl. round-trip + idempotence + quoting + scalar-sibling preservation), `HermesGatewayListService` (`hermes gateway list --json` parser tolerant of unknown keys + alt field names; 13 tests), `HermesConfig.gatewayPlatforms` field. Mac VM renamed to `MessagingGatewayViewModel` (single-feature local rename; CLAUDE.md "the SidebarSection.gateway enum case stays" invariant upheld). All 22 new tests pass; full ScarfCore suite green except 3 pre-existing `RemoteSQLiteBackendTests` failures unrelated to WS-5. Capability-gated end-to-end. Pre-v0.13 hosts see no Google Chat row, no cross-profile digest, no v0.13 toggles, and no `[[as_document]]` info row — the v2.7.5 surface is byte-for-byte unchanged. 
Q1-Q3 wire-shape unknowns (Google Chat identifier, YAML key path, `gateway list --json` shape) are marked with `// TODO(WS-5-Q<N>)` and defended by tolerant parsers + dual-spelling lookups. Implements WS-5 of Scarf v2.8.0 (Hermes v0.13.0 catch-up). Plan: scarf/docs/v2.8/WS-5-gateway-v0.13-plan.md (on coordination/v2.8.0-plans). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 19:05:55 +02:00
Alan Wizemann	4757b5ae49	feat(curator): archive + prune + list-archived (WS-4) Catches the Curator surface up to Hermes v0.13's new write-side verbs (`archive <skill>`, `prune`, `list-archived`, synchronous `run`). Adds a new `CuratorService` actor in ScarfCore mirroring `KanbanService`'s pattern (Sendable, pure I/O, `Task.detached(priority: .utility)` per verb), tolerantly-decoded `HermesCuratorArchivedSkill` / `CuratorPruneSummary` models, and `CuratorError` for inline-banner surfacing. Mac UX gains an "Archived" section between the leaderboards and the last-report block (per-row Restore button), an "archivebox" button on every active-skill leaderboard row to manually archive, a destructive "Prune Archived…" confirm sheet enumerating each skill (template-uninstall pattern — Cancel owns `.defaultAction`, Prune is on the red `ScarfDestructiveButton`), and a synchronous-with-progress "Run Now" on v0.13+ hosts (600s timeout, `ProgressView` while in-flight). Failure path routes through a yellow inline error banner instead of a modal alert. The legacy `CuratorRestoreSheet` stays accessible from the overflow menu but only on pre-v0.13 hosts; on v0.13+ the per-row Restore in the new Archived section replaces it. All new surfaces gate on `HermesCapabilities.hasCuratorArchive` — pre-v0.13 hosts see the v2.7.x layout unchanged. iOS picks up the new `runNow(synchronous:)` signature with the v0.13 capability flag; the read-only Archived section + WS-9 marker is left for the next stream. 14 new parser tests in `HermesCuratorParserTests` cover the JSON happy path, the `{"archived": [...]}` envelope, the text fallback (`--json` not supported), `"no archived skills"` sentinel folding, prune-dry-run with both wrapper + bare-array shapes, and zero-skill prune. All 369 ScarfCore tests pass; `xcodebuild` for the `scarf` scheme succeeds. 
Wire-shape unknowns (CLI flag presence on real v0.13) carry `// TODO(WS-4-Q<N>)` markers in `CuratorService` and fall back defensively when a flag isn't recognized. Implements WS-4 of Scarf v2.8.0 (Hermes v0.13.0 catch-up). Plan: scarf/docs/v2.8/WS-4-curator-archive-plan.md (on coordination/v2.8.0-plans). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 19:03:13 +02:00
Alan Wizemann	57a6340985	feat(providers): catalog refresh + image_gen.model + OpenRouter caching (WS-6) Surfaces the v0.13 provider catalog work in Scarf v2.8.0. Five new model IDs (deepseek/deepseek-v4-pro, x-ai/grok-4.3, openrouter/owl-alpha, tencent/hy3-preview, arcee/trinity-large-thinking) flow through models_dev_cache.json on next refresh — no manual catalog entries needed; the picker reaches them automatically. The grok-4.20-beta → grok-4.20 rename is handled via a new ModelCatalogService.modelAliases map plus resolveModelAlias() helper, called from validateModel(), model(_:_:), and provider(for:) at read time. Lossless: stored configs are never rewritten. Vercel AI Gateway is demoted to the bottom of the picker via a new demotedProviders set + sort-comparator axis (between subscription-gated and alphabetical). Always-on, no capability gate — sort-order consistency across Hermes versions. image_gen.model (top-level v0.13 YAML key) and openrouter.response_cache.enabled (provisional key shape per TODO(WS-6-Q1)) are surfaced as new SettingsSection rows in AuxiliaryTab, capability-gated on hasImageGenModel + hasOpenRouterResponseCache so pre-v0.13 hosts hide them. Image-gen picker has a curated 7-entry allowlist (HermesImageGenModel) plus free-form Custom model ID entry. CLAUDE.md gains two schema-drift bullets next to the existing overlayOnlyProviders requirement (modelAliases + demotedProviders mirror with hermes_cli/providers.py). Tests: 4 new M0cServicesTests (sort axis, alias resolution + cross-provider isolation, image-gen allowlist, demoted-set sentinel) and 2 new M6ConfigCronTests (YAML round-trip + empty-default). Implements WS-6 of Scarf v2.8.0 (Hermes v0.13.0 catch-up). Plan: scarf/docs/v2.8/WS-6-providers-v0.13-plan.md (on coordination/v2.8.0-plans). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 19:02:45 +02:00
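The read-time alias resolution described in commit 57a6340985 can be illustrated with a minimal Python sketch. This is not the Swift `ModelCatalogService` code; the names `MODEL_ALIASES` and `resolve_model_alias` are hypothetical stand-ins, and only the grok-4.20-beta to grok-4.20 rename is taken from the commit message.

```python
# Hypothetical alias table: old model ID -> current model ID.
# Only this single rename is documented in the commit message.
MODEL_ALIASES = {"grok-4.20-beta": "grok-4.20"}


def resolve_model_alias(model_id: str) -> str:
    """Return the current name for a renamed model, or the ID unchanged."""
    return MODEL_ALIASES.get(model_id, model_id)


# The stored configuration keeps the old spelling; resolution happens
# only when the value is read, so nothing on disk is rewritten.
stored_config = {"model": "grok-4.20-beta"}
effective_model = resolve_model_alias(stored_config["model"])
```

Resolving on read rather than migrating the file is what makes the approach lossless: the user's config still contains the value they wrote.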
Alan Wizemann	edac142d08	feat(chat): add /goal and /queue slash commands (WS-2) Adds Hermes v0.13's Persistent Goals and ACP /queue surfaces to the rich-chat composer. /goal <text> locks the agent on a target across turns (rendered as an info-tinted "Goal locked" pill in the chat header, with a context-menu Clear action that dispatches /goal --clear); /queue <text> queues a prompt to run after the current turn (rendered as a warning-tinted chip with a popover listing queued prompts + relative timestamps). Both ride .acpNonInterruptive so the chat keeps "Agent working…" off, and both surface a 4-second transient toast mirroring /steer's existing UX. Capability-gated end-to-end: the rich-chat slash menu reads through RichChatViewModel.capabilitiesGate (a new @ObservationIgnored field fed by ChatViewModel.attachCapabilitiesStore on Mac and a parallel .task(id:) on iOS), so pre-v0.13 hosts never see /goal or /queue. /steer is greyed-out on idle sessions when hasACPSteerOnIdle is off (pre-v0.13 hosts only). The "Clear all" queue-popover button is intentionally absent in v2.8.0 — Hermes' wire-shape for /queue --clear isn't verified yet, so a button that lies about server-side state is worse than no button (per WS-2 plan Q2 decision). Optimistic-only: there is no authoritative read-back path for the active goal in v2.8.0. The pill paints synchronously off the optimistic write the moment the user sends /goal …; cross-session resume won't re-paint it until the user types /goal again. A TODO(WS-2-Q1) marker in RichChatViewModel.recordActiveGoal points at the read-back hook for v2.8.1; TODO(WS-2-Q5) flags the verbatim /queue argument shape for coordinator wire-verification; TODO(WS-2-Q7) flags the /goal non-interruptive classification. TODO(v2.8.1) in handlePromptComplete is the deferred "auto-resumed from checkpoint" indicator (WS-2 plan Q3 decision). 
iOS surfaces no UI yet (deferred to WS-9), but the iOS controller's _sendImpl mirrors the dispatch so the shared RichChatViewModel state stays aligned across platforms — otherwise an iOS user who ran /goal then opened the same session on Mac would see an empty pill. Tests: extends M9SlashCommandTests with 13 new cases covering the non-interruptive list contents, capability-gated availableCommands filtering on v0.12 vs v0.13, parseGoalArgument variants, optimistic mutators (recordActiveGoal / recordQueuedPrompt / popQueuedPrompt), isNonInterruptiveSlash recognition, and reset() drainage. Implements WS-2 of Scarf v2.8.0 (Hermes v0.13.0 catch-up). Plan: scarf/docs/v2.8/WS-2-goals-and-queue-plan.md (on coordination/v2.8.0-plans). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 18:55:54 +02:00
Alan Wizemann	3e470c7155	Merge pull request #80 from awizemann/ws-1-capabilities-v0.13 feat(capabilities): add Hermes v0.13 capability flags + version bump (WS-1)	2026-05-09 18:17:51 +02:00
Alan Wizemann	963d0e1a5c	feat(capabilities): add isV013OrLater convenience predicate Surfaces a v0.12 → v0.13 boundary check that doesn't proxy through any specific feature flag. Used by WS-8 (redaction default-state hint copy, "v0.13 features active" Settings badge in iOS WS-9) where the call site isn't actually about a specific feature — it's about whether the host is on the v0.13 line. Equivalent to any individual v0.13 flag (e.g. `hasGoals`); both resolve to the same `>= 0.13.0` threshold. Convenience exists to keep call sites honest: `caps.isV013OrLater` reads better than `caps.hasGoals` when the context isn't goal-related. Tests: 4 new fixtures covering v0.13 host (true), v0.12 host (false), empty/undetected (false), and v0.14 host (true). 19 total tests in the suite, all passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 18:08:14 +02:00
Alan Wizemann	52c802676f	feat(capabilities): add Hermes v0.13 capability flags + version bump Adds 22 new capability flags grouped under a v0.13 (v2026.5.7) MARK section in HermesCapabilities, covering Persistent Goals, ACP /queue + /steer-on-idle, Kanban diagnostics + recovery UX, Curator archive + prune, Google Chat (20th platform), cross-platform allowlists, MCP SSE transport, Cron --no-agent, Web Tools backend split, Profiles --no-skills, context compression count, /new <name>, OpenRouter cache, image_gen.model, display.language, xAI voice cloning, video_analyze, and the transform_llm_output plugin hook. Each flag gates on >= 0.13.0 so v0.13 patch releases (0.13.4 etc.) still light up every flag. Existing v0.12 flags unchanged. Test suite extends with v0.13.0/2026.5.7 fixtures, a v0.13.4 patch-release case, explicit "v0.13 flags off on v0.12 host" coverage, and updates the future-version test to v0.14.0. CLAUDE.md target line bumps to v2026.5.7 (v0.13.0); a new v2026.5.7 section mirrors the v0.12 / v0.11 scaffolding describing the Scarf-relevant subset. The v0.12 + v0.11 historical sections remain intact since pre-v0.13 hosts still consume those flags. Foundation for the v2.8.0 Scarf release — every subsequent work-stream (WS-2 through WS-9) consumes flags added here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 17:31:51 +02:00
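The `>= 0.13.0` gating rule from commit 52c802676f (and the `isV013OrLater` predicate in 963d0e1a5c) can be sketched as a version-tuple comparison. The Python below is an illustrative assumption, not the Swift `HermesCapabilities` code: `parse_version` and `is_v013_or_later` are hypothetical names, and the sketch handles only semantic-style strings such as `0.13.4`, not the date-style `v2026.5.7` spelling.

```python
import re

V013 = (0, 13, 0)  # threshold named in the commit message


def parse_version(text):
    """Extract (major, minor, patch) from a string, or None if absent."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def is_v013_or_later(version_text):
    """True when the host reports 0.13.0 or newer; False if undetected."""
    version = parse_version(version_text)
    return version is not None and version >= V013
```

Comparing tuples rather than strings is what lets a patch release such as 0.13.4 and a future 0.14.0 both pass, while an empty or undetected version fails closed.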
Alan Wizemann	5d8873d305	chore: Bump version to 2.7.5 v2.7.5	2026-05-08 13:59:21 +02:00
Alan Wizemann	49bc4efe83	fix(kanban): enrich LocalTransport subprocess env so kanban dispatcher can spawn workers GUI-launched Scarf inherits macOS's launch-services PATH (`/usr/bin:/bin:/usr/sbin:/sbin`). Scarf itself finds `hermes` via absolute-path resolution in `HermesPathSet.hermesBinaryCandidates`, but when the kanban dispatcher (a child of Scarf) tries to spawn a worker, the worker inherits the same stripped PATH and Hermes's spawn machinery prints `\`hermes\` executable not found on PATH. Install Hermes Agent or activate its venv before running the kanban dispatcher.` — recording `outcome=spawn_failed` on the run. `LocalTransport` now mirrors `SSHTransport.environmentEnricher`: adds an `environmentEnricher: (() -> [String: String])?` static, and applies it to every subprocess. `scarfApp.swift` wires it at launch to the same `HermesFileService.enrichedEnvironment()` login-shell probe (`zsh -l -i` → `zsh -l` fallback) the SSH transport already uses, so subprocesses see `~/.local/bin`, `/opt/homebrew/bin`, and the user's credential env vars. Defense-in-depth: `subprocessEnvironment(forExecutable:)` always prepends the executable's own directory to PATH if missing — covers early-startup paths and test harnesses where the enricher hasn't been wired yet. Two new tests in `KanbanModelsTests` lock in: 1. The fallback (no enricher → executable's dir lands on PATH) 2. The enricher win for PATH + the empty-string-aware copy semantics for credential env vars (process env happens to set `ANTHROPIC_API_KEY=""` as an empty string in some environments; the enricher's non-empty value must still take effect) Release notes for v2.7.5 updated to document the fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 13:59:21 +02:00
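The environment handling described in commit 49bc4efe83 can be approximated as follows. This is a Python sketch under stated assumptions, not the Swift `LocalTransport` code: `subprocess_environment` is a hypothetical helper, and the rule that a non-empty enricher value always overrides the inherited value is one reading of the "empty-string-aware copy semantics" the commit mentions.

```python
import os


def subprocess_environment(executable, base_env, enriched=None):
    """Build a child environment with enriched values and a usable PATH."""
    env = dict(base_env)

    # Assumed semantics: a non-empty enriched value wins, even when the
    # inherited environment defines the same key as an empty string.
    for key, value in (enriched or {}).items():
        if value:
            env[key] = value

    # Defense in depth: make sure the executable's own directory is on
    # PATH so a child process can find sibling binaries.
    exe_dir = os.path.dirname(executable)
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if exe_dir and exe_dir not in parts:
        parts.insert(0, exe_dir)
    env["PATH"] = os.pathsep.join(parts)
    return env
```

With no enricher supplied, only the fallback applies, which corresponds to the early-startup and test-harness case the commit describes.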
Alan Wizemann	adcc984091	feat(kanban): full read/write board with per-project tenants Lifts Scarf's Kanban surface from the v2.6 read-only list to a drag-and-drop board with the complete Hermes v0.12 mutation surface wired up, plus per-project boards bound to a Scarf-minted tenant slug and a read-only board on iOS. Why now: the v2.6 list was a placeholder shipped while upstream Kanban collab was still mid-rework. v0.12 stabilized the 27-verb CLI; this release makes Scarf a real GUI client for it. Driving real tasks end-to-end exposed and closed a connected bug pattern (claim vs dispatch, silent skipped_unassigned, integer-vs-ISO timestamps, parser-leaked "(no" sentinel) that would have shipped as latent UX papercuts otherwise. ScarfCore: KanbanService actor (Sendable, pure I/O) wrapping every verb; KanbanTenantReader cross-platform manifest projection; eight new model types (TaskDetail, Comment, Event, Run, Stats, Assignee, CreateRequest, Filters); KanbanError; pure transition planner that maps drag-drop column changes to verb sequences, tested against canonical Hermes JSON fixtures. Mac: KanbanBoardView orchestrator with five-column drag-drop layout, optimistic-merge state, KanbanInspectorPane side-pane (Comments / Events / Runs / Log tabs, Log streams worker stdout every 2s while running), inline assignee picker, health banner for unassigned and last-failed-run states. New Task sheet defaults to active profile and auto-fires kanban dispatch on submit. Sidebar moved Kanban from Manage to Monitor. Read-only KanbanListView preserved as Board\|List toggle for narrow windows / accessibility. Per-project: DashboardTab.kanban tab on every project gated on hasKanban; KanbanTenantResolver mints scarf:<slug> tenants on first interaction and persists to .scarf/manifest.json (immutable across rename); ProjectAgentContextService surfaces the tenant in the AGENTS.md scarf-managed block so agents pass --tenant <slug> on kanban create. 
New kanban_summary dashboard widget; vocabulary mirrored in tools/widget-schema.json and site/widgets.js. iOS: read-only board on the project tab via paged single-column Picker, modal detail sheet with Comments / Events / Runs. Mutations + drag-drop deferred to v2.8. Tests: 19 new pure-logic tests covering decoding, planner verb mapping, argv assembly, glance string formatting, and parser rejection of the kanban assignees empty-state sentinel. All 348 ScarfCore tests pass. Constraints documented in CLAUDE.md: no within-column reorder (Hermes has no update --priority verb); no live watch streaming yet (5s polling for board, 2s for log); no bulk re-tag for legacy NULL-tenant tasks. Pre-v0.12 Hermes hosts gracefully hide the surface end-to-end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 13:59:21 +02:00
Alan Wizemann	fd80f4f95a	Create FUNDING.yml	2026-05-07 12:55:53 +02:00
Alan Wizemann	9f240ae291	chore: Bump version to 2.7.1 v2.7.1	2026-05-07 12:46:11 +02:00
Alan Wizemann	9c149b288b	fix(docs): restore Sonoma compatibility messaging in BUILDING.md + CONTRIBUTING.md Scarf's `MACOSX_DEPLOYMENT_TARGET` is `14.6` (Sonoma) on the main `scarf` target, set in `86762ea`. Sonoma support is intentional — several users dogfood on macOS 14.x and we want to keep them on the release channel. Yesterday's BUILDING.md and the long-stale CONTRIBUTING.md statement both claimed macOS/Xcode 26.x as minimums, which would have steered Sonoma contributors and users away from a build that actually runs on their box. Correct values: - Runtime min: macOS 14.6 (Sonoma) — matches the deployment target. - Build min: Xcode 16.0 — needed for Swift 6 strict-concurrency features the codebase uses. Add a load-bearing-callout to BUILDING.md so future doc edits don't silently raise the floor again. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 12:13:18 +02:00
Alan Wizemann	37afbdeffc	feat(build): contributor-friendly local-build.sh + BUILDING.md Adds `scripts/local-build.sh` for unsigned command-line Debug builds so contributors without an Apple Developer account can clone, build, and run without provisioning gymnastics. The script: - Detects arm64 / x86_64 - Verifies xcode-select, xcrun, xcodebuild are present - Probes the Metal toolchain and offers an interactive install (gated on `[[ -t 0 && -z "${CI:-}" ]]` — CI never gets prompted) - Resolves Swift packages, builds Debug with signing disabled - Optionally `ditto`s the result to /Applications/scarf.app on explicit y/N `BUILDING.md` documents prerequisites alongside the script. Existing canonical Release universal CLI in README stays — `local-build.sh` is an alternative for contributors, not a replacement for the shipping build. Cherry-picked from #76 with thanks to @unixwzrd. BUILDING.md's prerequisites are corrected to match the actual deployment target (macOS 26.2, Xcode 26.2+). Co-Authored-By: M S <unixwzrd.register@mac.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 12:08:33 +02:00
Alan Wizemann	bfd9bab9a0	fix(health): stop external dashboards by port, not pkill -f `stopDashboard()` used to fall back to `pkill -f "hermes dashboard"` when the running dashboard wasn't a Scarf-spawned subprocess. That's broad enough to match shell history, log tails, README readers, and this very source file — anything with the substring "hermes dashboard" in its argv was a kill target. Replace with a port-anchored lookup: `lsof -tiTCP:<port> -sTCP:LISTEN` returns the PID actually bound to the dashboard port, then we `SIGTERM` only that one process. Trusting the port is correct here: Scarf owns the configured port and the user-visible intent is "stop the thing on this port." We deliberately omit `lsof -c hermes`. Hermes installs as a Python shebang script (verified locally — `file ~/.local/bin/hermes` → "a python3 script text executable"), so the kernel COMM is `python` / `python3`, never `hermes`. A `-c hermes` filter would silently miss every standard install. Cherry-picked from #76 with thanks to @unixwzrd for the direction; this version drops the `-c hermes` filter to actually fire on real Hermes installs. Co-Authored-By: M S <unixwzrd.register@mac.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 12:08:23 +02:00
Alan Wizemann	2e0eb63ea4	fix(health): tighten Hermes gateway pgrep so unrelated commands don't match `hermesPIDResult()` was running `pgrep -f hermes`, which matched any process with "hermes" anywhere in its argv — `hermes acp` chat sessions Scarf itself spawns, `hermes -z` one-shots, log tails, even this very file in an editor. The Dashboard "Hermes is running" badge read true even when the gateway daemon was down. Narrow the match to the gateway shape specifically. Two alternations cover both invocation forms used in the wild: - `python -m hermes_cli.main gateway run …` (the launchctl form) - `/path/to/hermes gateway run …` (the script-path form) Verified locally against an actual gateway PID: cmd=/Users/.../python -m hermes_cli.main gateway run --replace The first alternation matches via the `-m hermes_cli.main gateway run` boundary. All callers — `stopHermes()`, `DashboardViewModel`, `HealthViewModel`, `SettingsViewModel`, `scarfApp` — semantically want the gateway PID specifically, so the narrower match is the right shape, not a behavior change. Cherry-picked from #76 with thanks to @unixwzrd for the diagnosis and the regex. Co-Authored-By: M S <unixwzrd.register@mac.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 12:08:11 +02:00
Alan Wizemann	3a3c87e033	fix(skills): scope What's New pill to Installed tab + reword updated→changed Issue #78 — The "What's New" pill at the top of the Skills page announced "18 new, 3 updated since you last looked" while the Updates sub-tab simultaneously said "No Updates / All skills are up to date." Two surfaces measuring two different things both used the word "update": the pill counts local file deltas since the user last clicked "Mark as seen", while the Updates body runs `hermes skills check` to find skills with newer upstream versions available. From the user's seat the screen contradicted itself. Two changes: 1. Render the pill only on the Installed sub-tab (Mac + ScarfGo). Local file deltas are contextually meaningful only on the tab that surfaces installed skills; showing them above Browse Hub or Updates was misleading. 2. Reword the pill: "X updated since you last looked" → "X changed since you last looked". Keeps `SkillSnapshotDiff.updatedCount` as the field name (it's still about file changes, not version bumps); only the user-visible string changes. Removes the vocabulary collision with the Updates tab's separate upstream-update check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 11:51:05 +02:00
Alan Wizemann	f9e3cd38f5	fix(skills): client-side filter for All-Sources hub search Issue #79 — Browse Hub clearly listed "honcho" but searching for "honcho" with the source picker on "All Sources" returned nothing. Root cause is on the Hermes side: `hermes skills search <query>` without a `--source` flag routes through the centralized `hermes-index` source and skips the external API sources (skills-sh, github, clawhub, lobehub, well-known, claude-marketplace). Browse aggregates those sources too, so any skill that lives only in the API tier shows up in browse but disappears in search. Same picker, same query, contradictory results. Rather than chase Hermes's index gaps, redefine "All Sources" search in Scarf to mean filter-what-you-see — the canonical type-to-filter UX users already expect on a list. Source-specific searches keep the CLI shell-out for full upstream search semantics on that registry. Implementation: - New `lastBrowseResults` cache populated on every successful `browseHub()`. Setter is `internal` so the test suite can seed without invoking the live CLI; out-of-module callers can still only read. - `searchHub()` now branches on `hubSource`. The "all" branch filters the cache via `localizedCaseInsensitiveContains` against name, description, and identifier, runs synchronously on the calling actor (UI invocations are already on MainActor) so the user sees the narrowed list without a render-tick gap. - If the cache is empty (search-before-browse), `browseHubThenFilter` performs one CLI fetch, populates the cache, then applies the filter — failure surfaces a "Search failed" banner instead of a silent empty state. - Source-specific search still shells out to `hermes skills search <query> --source <s> --limit 40`. Adds five regression tests covering name match, description match, case-insensitive folding, no-match message state, and the empty-query fallthrough to browse. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 11:50:52 +02:00
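The "All Sources" filter from commit f9e3cd38f5 amounts to a case-insensitive substring match over cached browse results. The Python below is a hedged sketch of that idea, not the Swift implementation: `filter_skills` and the dictionary layout are hypothetical, and `casefold()` stands in for Swift's `localizedCaseInsensitiveContains`, which is locale-aware in ways this sketch is not.

```python
def filter_skills(cached_results, query):
    """Return cached skills whose name, description, or identifier match."""
    needle = query.strip().casefold()
    if not needle:
        # Empty query: show the full browse list unchanged.
        return list(cached_results)
    fields = ("name", "description", "identifier")
    return [
        skill
        for skill in cached_results
        if any(needle in skill.get(field, "").casefold() for field in fields)
    ]


# Hypothetical cache contents for illustration only.
cache = [
    {"name": "honcho", "description": "Memory layer", "identifier": "hub/honcho"},
    {"name": "notes", "description": "Plain text notes", "identifier": "hub/notes"},
]
matches = filter_skills(cache, "HONCHO")
```

Because the filter runs against what browse already returned, a skill that is visible in the list can no longer vanish when the user searches for it.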
Alan Wizemann	a6a8cae8ff	fix(transport): drain ssh stdout/stderr concurrently to unwedge >64KB payloads Issue #77 — Sessions screen rendered empty even though Dashboard reported 161 sessions and Activity reported 116. Root cause was a classic pipe-buffer deadlock in SSHScriptRunner: stdout was read via `readToEnd()` AFTER the subprocess had exited. macOS pipes default to a 16–64 KB kernel buffer; once the remote `sqlite3 -json` script wrote more than that to its stdout, ssh back-pressured across the wire, sshd back-pressured sqlite3, sqlite3 blocked, the script never finished, the 30-second timeout fired, `streamScript` threw, and `HermesDataService.sessionListSnapshot()` swallowed the failure into an empty array. Empty Sessions list. Dashboard kept working because its smaller LIMIT 5 payload fit under the threshold. Why this was a v2.7 regression specifically: `20cc3a2` folded the previously-separate sessions + previews queries into a single batched round-trip (perf win for remote users). The new combined payload for ~150+ sessions crossed the buffer threshold for the first time. Fix: drain stdout/stderr concurrently with the running process via Foundation's `FileHandle.readabilityHandler`, accumulating chunks into an NSLock-guarded `Data` buffer. The kernel pipe never fills, the subprocess never blocks, the script returns the full payload. Same change applied to both the SSH path (`runOverSSH`) and the local path (`runLocally`) — they had identical bug shapes. Adds SSHScriptRunnerTests with three regression checks: a 256 KB synthetic payload that would have wedged pre-fix, a small-payload sanity round-trip, and a non-zero exit propagation check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 11:50:34 +02:00
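The pipe-buffer deadlock fixed in commit a6a8cae8ff is language-independent: if the parent reads a child's stdout only after waiting for exit, a payload larger than the kernel pipe buffer blocks the child forever. The Python sketch below shows the concurrent-drain pattern with one reader thread per stream; it is an analogue of the Swift `readabilityHandler` fix, not a port of it, and `run_draining` is a hypothetical name. Python's own `Popen.communicate()` already drains both pipes internally; the threads are spelled out here only to make the mechanism visible.

```python
import subprocess
import sys
import threading


def run_draining(argv, timeout=30.0):
    """Run argv, draining stdout and stderr while the process is alive."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    chunks = {"out": [], "err": []}

    def drain(stream, key):
        # Read until EOF so the kernel pipe never fills up.
        for chunk in iter(lambda: stream.read(65536), b""):
            chunks[key].append(chunk)
        stream.close()

    readers = [
        threading.Thread(target=drain, args=(proc.stdout, "out")),
        threading.Thread(target=drain, args=(proc.stderr, "err")),
    ]
    for reader in readers:
        reader.start()
    try:
        code = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        raise
    finally:
        # Pipes reach EOF once the child exits, so the readers finish.
        for reader in readers:
            reader.join()
    return code, b"".join(chunks["out"]), b"".join(chunks["err"])
```

Each reader appends only to its own list, so no lock is needed in this sketch; a shared buffer, as in the commit's NSLock-guarded `Data`, would need one.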
Alan Wizemann	6b66b1c96f	perf(ios): wire v2.7 perf parity — instrument iOS-only VMs + surface hydration banner + opt-in toggle Most of the v2.7 perf work was already covered on iOS via shared code in ScarfCore — `RichChatViewModel.loadSessionHistory` (and its skeleton-then-hydrate path), `hydrateAssistantToolCalls`, `fetchSkeletonMessages`, `fetchRecentToolCallSkeleton`, `ModelPreflight.detectMismatch`, and the `RemoteSQLiteBackend` cancellation handler all flow through to the ScarfGo chat unchanged. `CitadelServerTransport.streamScript` already honors `Task.isCancelled` correctly via `withThrowingTaskGroup` + `Task.checkCancellation()`, so the SSH-cancellation-on-nav-away chain works on iOS without the Mac-side `SSHScriptRunner` fix. Three iOS-specific gaps closed: * IOSCronViewModel.load + IOSMemoryViewModel.load wrapped in `ScarfMon.measureAsync(.diskIO, "ios.cron.load")` / `"ios.memory.load"` — parity with the Mac `cron.load` / `memory.load` events. `ios.memory.load.bytes` records the payload size for the loaded file. * iOS Settings → "Chat (Scarf)" section gains a toggle bound to `RichChatViewModel.loadHistoricalToolResultsKey` so iOS users can opt into Phase 2b bulk tool-result hydration, same as the Mac DisplayTab. The shared key means the gate inside `startToolHydration` reads the right value automatically — no extra plumbing needed. * iOS ChatView surfaces `isHydratingTools` as a "Loading tool details…" connection banner (mirrors the Mac toolbar pill added in v2.7 perf work). Sits between the existing "Thinking…" banner and the empty-view fallback so chat status is always honest about what the agent and Scarf are doing. Both Mac and iOS targets build clean; all 321 ScarfCore tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 21:26:25 +02:00
Alan Wizemann	97ec4d2882	chore: Bump version to 2.7.0 v2.7.0	2026-05-05 20:41:39 +02:00
Alan Wizemann	cd5bb32a21	release: prep v2.7.0 — consolidated notes + in-app Sparkle release notes Rolls up everything since v2.6.5 (36 commits across remote-perf, project wizard, dashboard widgets, OAuth resilience, ScarfMon instrumentation, and the v2.7 skeleton-then-hydrate redesign) into a single 2.7.0 release. * releases/v2.7.0/RELEASE_NOTES.md — full consolidated notes, reorganized around the throughline (slow-remote performance) with five thematic sections: skeleton-then-hydrate loaders, SSH cancellation, project wizard + Keychain cron secrets, dashboard widgets, OAuth resilience, and ScarfMon. Replaces the previously-drafted dashboard-only v2.7.0 stub and the separate v2.8 wizard stub (both unreleased). * releases/v2.8/ — deleted; folded into v2.7. * README.md — "What's New in 2.6" → "What's New in 2.7" with the five-section summary linking out to the full notes. * tools/render-release-notes.py — stdlib-only Markdown → HTML renderer covering the subset of GitHub-flavored markdown that release notes use (## / ### headings, paragraphs, ul lists, fenced code, inline code/bold/italic/links, hr). Output includes a small <style> block tuned for Sparkle's update alert WebKit view (light + dark variants via prefers-color-scheme). * scripts/release.sh — render the active RELEASE_NOTES.md and inject the result as <description><![CDATA[...]]></description> on the appcast item. Sparkle's standard updater renders this in the in-app update sheet so users see release-specific "what's new" alongside the version number, not just the bare version. Falls back to a "see GitHub release page" placeholder when the notes file is missing. User runs ./scripts/release.sh 2.7.0 to ship. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 20:31:27 +02:00
Alan Wizemann	5e23b59697	test(model-preflight): cover detect-mismatch + fix newline-trim bug * New ModelPreflightTests suite (19 tests) covering both `check(_:)` and the v2.8 `detectMismatch(_:)` paths. Pins the dogfooding scenario (anthropic-prefixed model + nous active provider after Credential Pools OAuth swap), the case-insensitive prefix match, empty-prefix / empty-bare-model edge cases, and multi-slash model ids (OpenRouter style). * Bug fix surfaced by the tests: `ModelPreflight` was using `trimmingCharacters(in: .whitespaces)` which doesn't strip newlines. A stray `\n` in a hand-edited config.yaml would either miss the missing-fields classifier OR false-positive the mismatch banner (showing "anthropic" vs "anthropic\n"). Switched both trims to `.whitespacesAndNewlines`. perf(observability): instrument Tier C load paths + fetchSessionPreviews No behavior change — adds ScarfMon coverage so future captures show how often Memory/Skills/Cron/Curator/SessionPreviews load paths fire and what they cost on remote (each is multiple sequential SFTP RTTs that pre-fix were invisible). New events: * `mac.fetchSessionPreviews` / `.rows` / `.transportError` * `memory.load` / `.bytes` * `cron.load` / `.jobs` * `skills.load` / `.count` * `curator.load` / `.bytes` All 321 ScarfCore tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 20:03:35 +02:00
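The newline-trim bug fixed in 5e23b59697 can be reproduced in isolation. The sketch below is a hypothetical stand-in, not the real `ModelPreflight` code: `sketchDetectMismatch` and its tuple return type are invented for illustration, and treating everything before the first `/` as the provider prefix is an assumption.

```swift
import Foundation

// `.whitespaces` covers spaces and tabs only, so a trailing "\n" survives it.
let raw = "anthropic\n"
let spacesOnly = raw.trimmingCharacters(in: .whitespaces)           // still "anthropic\n"
let cleaned = raw.trimmingCharacters(in: .whitespacesAndNewlines)   // "anthropic"

// Hypothetical mismatch check: returns the disagreeing pair, or nil when
// there is no prefix, nothing to compare, or the two sides agree.
func sketchDetectMismatch(modelDefault: String,
                          provider: String) -> (prefix: String, provider: String)? {
    let model = modelDefault.trimmingCharacters(in: .whitespacesAndNewlines)
    let active = provider.trimmingCharacters(in: .whitespacesAndNewlines)
    // Assumption: only the first "/" separates provider from model id, so
    // multi-slash ids keep their remaining slashes in the bare model name.
    guard let slash = model.firstIndex(of: "/") else { return nil }
    let prefix = String(model[..<slash])
    let bare = model[model.index(after: slash)...]
    guard !prefix.isEmpty, !bare.isEmpty, !active.isEmpty else { return nil }
    // Case-insensitive comparison, as described in the commit message.
    if prefix.caseInsensitiveCompare(active) == .orderedSame { return nil }
    return (prefix, active)
}
```

With both trims using `.whitespacesAndNewlines`, a hand-edited `provider: anthropic` line with a stray newline no longer reads as a mismatch against an `anthropic/...` model id.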
Alan Wizemann	09e33b2999	perf(chat,activity,transport): skeleton-then-hydrate loaders + SSH cancellation propagation Major perf overhaul for slow-remote contexts. Chats and Activity now render in <2s instead of timing out at 30s; abandoned SSH work is killed within 100ms instead of pinning a ControlMaster session. * Skeleton-then-hydrate chat loader. New `fetchSkeletonMessages` selects user+assistant rows only (skips role='tool', NULLs tool_calls + reasoning at the SQL level). Wire payload bounded by conversational text alone — sub-second on remote regardless of underlying tool result blob sizes. Background `startToolHydration` pages through `hydrateAssistantToolCalls` (5-id batches) to splice tool calls in. Tool-result CONTENT is opt-in via Settings → Display → "Load tool results in past chats" (default off); inspector pane lazy-fetches per-result via `fetchToolResult(callId:)` on expand. * Skeleton-then-hydrate Activity loader. New `fetchRecentToolCallSkeleton` returns metadata-only rows in ~3 KB for 50 entries; placeholder ActivityRows render immediately, real per-call entries swap in as paged hydration completes. Loading pill in the page header, orange transport-error banner replaces the pre-fix silent empty state. * SSH cancellation propagation. `Task.detached` and unstructured `Task<...> { ... }` don't inherit cancellation from awaiting parents — without bridging, killing a Swift Task left the ssh subprocess running for the full 30s deadline, pinning a remote sqlite query and a ControlMaster session. Wired `withTaskCancellationHandler` through `SSHScriptRunner.run` and `RemoteSQLiteBackend.query`; cancellation now reaches `Process` within ~100ms. New `ssh.cancelled` ScarfMon event. * L1 single-id retry. When a 5-id `hydrateAssistantToolCalls` page trips the 30s timeout (one row carries an oversized tool_calls blob — long Edit args, big diffs), fall back to single-id queries to isolate the whale. 
Non-whale rows in the same batch hydrate normally; whale row stays bare. New `mac.hydrateToolCalls.singleTimeout` event tracks how often the recovery fires. * L2 in-flight coalescing for `loadRecentSessions`. File-watcher deltas during streaming used to stack 2-3 parallel sessions-list reload tasks; subsequent callers now await the active one. New `mac.loadRecentSessions.coalesced` event tracks dedup hits. * Loading-state UX hardening. New `isStartingSession` flag flips synchronously on user click so the chat sidebar greys + disables immediately instead of waiting for `client.start()` to return (5-7s on remote). Phase-typed status: "Spawning hermes acp…" → "Authenticating…" → "Loading session…" → "Loading history…" → "Ready". `ChatSessionListPane` overlays a ProgressView showing the current phase. * Partial-result detection. `fetchMessagesOutcome` distinguishes a transport failure from a genuine empty result; `loadSessionHistory` surfaces "Couldn't load full chat history — connection timed out" through the existing acpError triplet so the user sees what happened instead of a silent empty transcript. * Model/provider mismatch banner. `ModelPreflight.detectMismatch` recognizes when `model.default` carries a `<provider>/...` prefix that disagrees with `model.provider` (e.g. anthropic prefix + nous active provider after switching OAuth via Credential Pools). Banner offers one-click fix in either direction. Companion: ACP error classifier recognizes `model_not_found` / `404 messages` and surfaces "Hermes pins each session to its original model — start a new chat" so the pinned-model failure mode has a clear recovery path. * OAuth-completion provider swap prompt. After successful OAuth in Credential Pools, if the just-authed provider differs from `model.provider` in config.yaml, surface "Switch active provider to <name>?" with [Switch] / [Keep current] instead of auto-dismissing. All 302 ScarfCore tests pass. 
New ScarfMon events documented in the Performance-Monitoring wiki page. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 19:43:53 +02:00
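The cancellation-bridging idea from 09e33b2999 can be sketched without any SSH machinery. The real change wires `withTaskCancellationHandler` through `SSHScriptRunner.run` and `RemoteSQLiteBackend.query` down to a `Process`; the hypothetical `bridged` helper below swaps in a plain detached task to show only the bridging step.

```swift
// A detached task does not inherit cancellation from the task that awaits
// it. The onCancel handler forwards the parent's cancellation explicitly.
func bridged<T: Sendable>(
    _ work: @escaping @Sendable () async throws -> T
) async throws -> T {
    let inner = Task.detached { try await work() }
    return try await withTaskCancellationHandler {
        try await inner.value
    } onCancel: {
        // Runs as soon as the awaiting task is cancelled (or immediately,
        // if it was already cancelled when the handler was installed).
        inner.cancel()
    }
}
```

Without the `onCancel` hop, cancelling the awaiting task would leave `inner` running to completion, which is the failure mode the commit describes for the ssh subprocess.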
Alan Wizemann	9f2e2ecfcd	perf(chat): exclude reasoning_content from initial fetch + drop page size to 25 The 160-message thinking-model session still timed out at the 30s ceiling even after dropping page size 200→50 in commit `a193003`. ScarfMon trace: mac.fetchMessages 30,105,329,125 ns ← 30s timeout fired mac.hydrateMessages.rows count=1 ← 1 partial row only Root cause: `reasoning_content` is huge on thinking models (20+ KB per row). Even 50 rows × 30 KB = 1.5 MB JSON shipping over a 420ms-RTT remote SSH channel exceeds the budget. The chat appeared empty AGAIN. Three cuts: 1. `messageColumnsLight` — same as messageColumns but omits `reasoning_content`. Used by `fetchMessages` so the bulk wire payload is small. `messageFromRow` reads reasoning_content via `row.optionalString(at: 11)` which gracefully returns nil when the column isn't present, so the shape change is transparent. 2. `fetchReasoningContent(for:)` — single-row lazy fetch the inspector pane calls when the user expands a thinking disclosure. One small SSH round-trip per inspection vs. paying for ALL reasoning content on every session boot. 3. `HistoryPageSize.initial` 50 → 25 — sized for the lite column shape with margin for sessions that include some heavy tool-call payloads. The "Load earlier" affordance still pages back through older messages. Net effect on the user-reported case: 160-message session loads the most-recent 25 messages in ~5-10s (one SSH round-trip ~420ms plus ~3 KB × 25 = 75 KB wire). The remaining 135 are reachable via Load earlier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:28:40 +02:00
Alan Wizemann	1eb5c92f6a	fix(aux-tab): correct nested-YAML parser so unknown-task surface works on remote Bug 1 — the previous parser collected every indented child under `auxiliary:` as if it were a task name, including leaf fields (provider, model, base_url, api_key, timeout). Result: bogus rows on local where the parser happened to fire, plus pollution of the unknown-tasks set with field names that subtractFrom-known left orphaned. Bug 2 — the flat-dot-path branch (`auxiliary.X.Y:`) was dead code. config.yaml is always nested YAML; the dot-path form only appears in interactive `hermes config get` output, never on disk. Removing it. User reported the unknown-tasks section showed on local but not on remote. Most likely root cause: the buggy parser surfaced junk on local (where their config has nested-form aux settings) while the dead flat-path branch never fired on remote either, so remote silently rendered nothing. With the parser fixed both contexts now surface real unknown task names if any are present. Rewrite as a clean two-pass walker: - First nested line inside the block locks taskIndent. - Only collect at exactly taskIndent (skip leaf fields deeper). - Tolerate CRLF line endings, blank lines, and YAML comments without resetting block state. - Handles 2-space and 4-space indent equally. Verified manually with four fixture shapes: 2-space, 4-space, with-comments-and-blanks, no-aux-block. All correct. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:12:55 +02:00
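The indent-locking rule described in 1eb5c92f6a can be sketched as follows. This is a simplified single-pass illustration rather than the two-pass walker the commit mentions; the function name `auxiliaryTaskNames` is hypothetical, and it assumes space-only indentation and a top-level `auxiliary:` key.

```swift
import Foundation

// Collects the direct children of a top-level `auxiliary:` block and
// skips leaf fields (provider, model, ...) nested deeper than that.
func auxiliaryTaskNames(in yaml: String) -> [String] {
    var names: [String] = []
    var inBlock = false
    var taskIndent: Int? = nil

    // Splitting on Character.isNewline handles "\n" and "\r\n" alike and
    // drops blank lines, so neither resets the block state.
    for piece in yaml.split(whereSeparator: \.isNewline) {
        let line = String(piece)
        let trimmed = line.trimmingCharacters(in: .whitespaces)
        if trimmed.isEmpty || trimmed.hasPrefix("#") { continue }

        let indent = line.prefix(while: { $0 == " " }).count
        if indent == 0 {
            // Any top-level key either opens or closes the block.
            inBlock = trimmed.hasPrefix("auxiliary:")
            taskIndent = nil
            continue
        }
        guard inBlock else { continue }

        // The first nested line locks the task indent (2 or 4 spaces).
        if taskIndent == nil { taskIndent = indent }
        guard indent == taskIndent,
              let colon = trimmed.firstIndex(of: ":") else { continue }
        names.append(String(trimmed[..<colon]))
    }
    return names
}
```

Because only lines at exactly the locked indent are collected, field names such as `provider` or `model` never leak into the task list, which was the first bug the commit describes.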
Alan Wizemann	bccaba0742	feat(acp,aux): classify resolve_provider_client errors + surface unknown aux tasks Two fixes for the user-reported "ACP -32603 Internal error" after removing a Nous OAuth provider while config.yaml still referenced nous for an auxiliary task. The actual stderr was clear: agent.auxiliary_client: resolve_provider_client: nous requested but Nous Portal not configured But Scarf's chat banner showed only the bare JSON-RPC code and the user had no actionable path through the UI. ACPErrorHint.classify now pattern-matches the `resolve_provider_client: <name> requested but` stderr line and extracts the provider name. Surfaces: An auxiliary task is configured to use `<name>` but that provider isn't authenticated. Open Settings → Aux Models, or check ~/.hermes/config.yaml for auxiliary.<task>.provider: <name> and switch it to your active provider (or set it to `auto`). Routed through the existing chat-banner pipeline that already catches OAuth revocation and missing-credentials errors. AuxiliaryTab gains an "Other tasks in config.yaml" section that surfaces aux task keys present in YAML but not in Scarf's typed list (vision, web_extract, compression, session_search, skills_hub, approval, mcp, flush_memories, curator). Common case: `auxiliary.summarization.provider: nous` left over from older Hermes versions or hand-edited configs. Each unknown task gets a one-click "Reset provider" button that writes `auxiliary.<key>.provider: auto` — the most-actionable fix for the OAuth-removal failure mode. Detection scans both flat-dot-path and nested YAML shapes so it works regardless of how Hermes dumped the file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:00:48 +02:00
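The stderr pattern match from bccaba0742 can be illustrated with plain string searching. The function name `unconfiguredProvider` is hypothetical, and the real `ACPErrorHint.classify` may be structured quite differently; only the stderr line format is taken from the commit message.

```swift
import Foundation

// Extracts <name> from "resolve_provider_client: <name> requested but ...".
// Returns nil when the line does not match that shape.
func unconfiguredProvider(inStderrLine line: String) -> String? {
    let marker = "resolve_provider_client: "
    let tail = " requested but"
    guard let start = line.range(of: marker),
          let end = line.range(of: tail, range: start.upperBound..<line.endIndex)
    else { return nil }
    let name = line[start.upperBound..<end.lowerBound]
    return name.isEmpty ? nil : String(name)
}
```

A caller could then interpolate the extracted name into the hint text quoted in the commit message instead of showing the bare JSON-RPC code.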
Alan Wizemann	4684b9deed	feat(credential-pools): OAuth remove button + auto-refresh on auth.json change User reports the Nous OAuth provider still showed in the credential pool after they 'removed' it, and Reload didn't help. Two underlying bugs: Bug 1 — no UI path to remove OAuth providers. The pool view had a Re-authenticate button on each OAuth row but no remove. Users who switched active provider thought that removed Nous; the OAuth tokens stayed in auth.json and the row kept rendering. Add a trash icon next to Re-authenticate that calls `hermes auth logout <provider>` after a confirmation dialog. ViewModel route is `removeOAuthProvider` mirroring `removeCredential`. Bug 2 — view didn't refresh on external auth.json changes. Pool view subscribed only to .onAppear and sheet-dismiss. A terminal `hermes auth logout` or another window's OAuth flow left the view stale until manually re-entered. Wire up `fileWatcher.lastChangeDate` so any auth.json mtime tick triggers a reload (the file watcher already polls auth.json on the remote SSH path). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 12:46:41 +02:00
Alan Wizemann	f6dc45b397	feat(scarfmon): track empty-assistant turns + document Nous quirk User reports chats "dying" on Nous models — screenshot shows the assistant bubble stuck with `(°□°) deliberating...` and a 1.7s turn-duration pill (turn DID complete; the content is the problem). The literal placeholder string isn't in Scarf's source; it's coming from Hermes or Nous itself when the model emits a brief thought stream and then fails to produce any visible output. ScarfMon trace confirms the failure mode: mac.sendViaACP → firstThoughtByte (25 bytes) mac.handleACPEvent ✓ mac.sendPrompt ✓ (1.7s, normal) finalizeStreamingMessage ✓ (turn cleanly closed) So Scarf sees no transport error — the turn finalized normally with empty assistant text plus a small thought stream. The visible "deliberating" text is content Hermes/Nous chose to substitute for the missing response. Adds `mac.emptyAssistantTurn` event (category .chatStream) that fires whenever a turn finalizes with empty `streamingAssistantText` and empty `streamingToolCalls`. Bytes carry the thinking-text length so we can distinguish: - bytes=0: total empty turn (model produced nothing) - bytes>0: thoughts-only turn (model thought but didn't answer) Both are user-visible failures. The fix is upstream — Hermes should refuse to finalize a turn with no response and surface an error, OR Nous should not return empty responses with the placeholder string. Document this finding so a future capture that shows multiple `mac.emptyAssistantTurn` events confirms the rate / model-correlation. For now Scarf surfaces the same UX as before (no UI change in this commit). A follow-on commit could intercept this case and replace the bubble with a clearer "Model returned no response" banner, but that requires a confident heuristic for which empty-finalize cases are real failures vs. legitimate no-response turns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 12:40:21 +02:00
Alan Wizemann	f2ddcbbd60	feat(model-picker): add search filter to Nous overlay model list Nous returned 402 models in the recent perf capture (~496 KB of JSON). The picker's existing top-bar search field already filters the catalog list (`filteredModels`) but the Nous overlay path showed all 402 unfiltered, making it nearly unusable. Add `filteredNousModels` mirroring the `filteredModels` shape: filters `nousModels` by case-insensitive substring match against both `id` and `owned_by`. Updates the empty-state overlay so "no matches" surfaces a different message from "no models loaded" — the user knows the catalog is fine, the search just didn't match. User feedback: "we need a search in the model picker, some of these lists are large and unorganized." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 12:38:30 +02:00
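The filter added in f2ddcbbd60 amounts to a case-insensitive substring match over two fields. In the sketch below the type `NousModelSketch`, the function `filterModels`, and the `EmptyState` enum are hypothetical names, with `ownedBy` standing in for the `owned_by` field.

```swift
import Foundation

struct NousModelSketch: Equatable {
    let id: String
    let ownedBy: String
}

// An empty query returns the full list; otherwise a model is kept when
// either field contains the query, ignoring case.
func filterModels(_ models: [NousModelSketch], query: String) -> [NousModelSketch] {
    let needle = query.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !needle.isEmpty else { return models }
    return models.filter {
        $0.id.range(of: needle, options: .caseInsensitive) != nil
            || $0.ownedBy.range(of: needle, options: .caseInsensitive) != nil
    }
}

// Distinguishes "nothing loaded" from "search matched nothing" so the
// overlay can show a different message for each.
enum EmptyState { case noModelsLoaded, noMatches }

func emptyState(all: [NousModelSketch], filtered: [NousModelSketch]) -> EmptyState? {
    if all.isEmpty { return .noModelsLoaded }
    return filtered.isEmpty ? .noMatches : nil
}
```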
Alan Wizemann	a193003842	fix(chat): paginate session-load + race-guard against session switch Two related bugs from remote-context perf captures. Bug 1 — 30s timeout fetching the 157-message session. The initial page size was 200 messages. For a session including `reasoning_content` from a thinking model, that produces enough JSON over `sqlite3 -json | ssh` to time out at exactly 30s on a 420ms-RTT remote, returning 0 rows. Bumping queryTimeout further just trades latency for stalls. Drop `HistoryPageSize.initial` from 200 → 50. Sized to fit comfortably inside the 30s queryTimeout; the existing "Load earlier" affordance pages back through older messages on demand. Bug 2 — session-switch race silently swaps transcripts. When the user picks a small chat while a slow fetch for a different chat is still in flight, the slow fetch finishes second and its `messages = …` assignment overwrites the small chat's transcript. User sees the small chat "jump back" to the big one. ScarfMon trace: parallel `mac.fetchMessages` events at t=641870 (small, 425ms, 2 rows) and t=643316 (big, 30,028ms timeout) — last write won. Add a `loadingForSession` capture and three guards: after the DB refresh, after the primary fetch, after the ACP-fork fetch. Each compares `self.sessionId` against the captured id; on mismatch fire `mac.hydrateMessages.dropped` and return without assigning. Race is silent in normal usage but visible in traces. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 12:38:19 +02:00
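The session-switch guard from a193003842 follows a capture-then-compare pattern. The sketch below uses a hypothetical `TranscriptStore` actor with a single guard. The real view model has three guards and fires `mac.hydrateMessages.dropped`, for which the `droppedLoads` counter is only a stand-in.

```swift
actor TranscriptStore {
    private(set) var sessionId: String
    private(set) var messages: [String] = []
    private(set) var droppedLoads = 0

    init(sessionId: String) { self.sessionId = sessionId }

    func select(_ id: String) { sessionId = id }

    func load(fetch: @Sendable (String) async -> [String]) async {
        // Capture which session this load belongs to before suspending.
        let loadingForSession = sessionId
        let rows = await fetch(loadingForSession)
        // The actor is re-entrant across the await, so the selection may
        // have changed. A stale result is dropped instead of assigned.
        guard sessionId == loadingForSession else {
            droppedLoads += 1
            return
        }
        messages = rows
    }
}
```

If `select` runs while a slow `fetch` is suspended, the guard fails when the fetch returns and the newer transcript is left untouched, which avoids the last-write-wins behavior seen in the trace.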
Alan Wizemann	93a64e3e82	fix(nous-picker): kill 120s beach-ball — dedupe readCache + 5s timeout Two stacking bugs in the Nous-overlay branch of the model picker caused a 120-second beach-ball on remote contexts. Bug 1 — duplicated readCache. ModelPickerSheet.refreshNousModels called `service.readCache()` directly (for instant first-paint), then called `service.loadModels(forceRefresh: false)` which calls `readCache()` AGAIN as its first step. Two SSH round-trips per picker open. Drop the inline call; loadModels is already cache-first on its happy path (returns `.cache(...)` when fresh). One read per open. Bug 2 — 60s readFile timeout for a hint. `readCache()` goes through SSHTransport.readFile which has a 60s default timeout. On a remote with a corrupted or oversized cache file, `cat` never returns and we wait the full 60s — twice, due to bug 1, for a total 120s picker stall. ScarfMon perf capture (commit 00a1bbd's diagnostic split) localized this precisely: nous.readCache.fileExists = 251 ms ✓ nous.readCache.readFile = 60,011 ms ❌ (60s timeout) Cache is an optimization, not a requirement. Added `readCacheWithTimeout(seconds: 5)` that races readCache against a 5-second sleep via withTaskGroup. On timeout returns nil; caller treats that as no-cache and falls through to the network fetch (which succeeded in 2s in the offending capture, returning 402 models). The runaway `cat` keeps running on its own 60s transport timeout but no longer blocks the picker. New ScarfMon event: `nous.readCache.timeoutFired` surfaces hits in traces so we can tell whether the timeout is being exercised in the wild. The underlying `cat` hang on the cache file is still unexplained; the file size (~500KB) shouldn't take 60s on a 420ms-RTT SSH link. For now: deleting the cache file (`rm ~/.hermes/scarf/nous_models_cache.json` on the remote) is the workaround. The next picker open will rebuild it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 12:17:45 +02:00
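Racing a read against a sleep with a task group, as 93a64e3e82 describes for `readCacheWithTimeout(seconds: 5)`, might look like the sketch below. The helper name `firstResult(within:_:)` is hypothetical. One caveat applies to this sketch: a task group waits for all of its children before returning, so the early return only works when the operation responds to cancellation, and a blocking read would need its own escape hatch.

```swift
// Returns the operation's value, or nil if the timer finishes first.
func firstResult<T: Sendable>(
    within seconds: Double,
    _ operation: @escaping @Sendable () async -> T?
) async -> T? {
    await withTaskGroup(of: T?.self) { group -> T? in
        group.addTask { await operation() }
        group.addTask {
            // The timer child: yields nil when the deadline passes.
            try? await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            return nil
        }
        // Whichever child finishes first decides the result.
        var first: T? = nil
        if let winner = await group.next() { first = winner }
        // Cancel the loser so the group can finish promptly.
        group.cancelAll()
        return first
    }
}
```

A caller can treat a `nil` result as "no cache" and fall through to the network fetch, matching the commit's framing of the cache as an optimization, not a requirement.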
Alan Wizemann	00a1bbd109	feat(scarfmon): split nous.readCache into fileExists/readFile/decode/bytes Last perf capture showed nous.readCache as a single 60-second interval — but the function does three things (transport.fileExists, transport.readFile, JSONDecoder). Splitting the measure points so the next capture localizes which step actually owns the wall-clock. Adds: - nous.readCache.fileExists (interval) — SSH `test -e` round-trip - nous.readCache.readFile (interval) — SSH `cat` round-trip - nous.readCache.bytes (event) — payload size of the cache file - nous.readCache.decode (interval) — JSON parsing cost If the next 60-second beach ball localizes to readFile, we know the cache file is somehow huge or the SSH read is hung; if it's fileExists, the path resolution is the issue; if decode, we have malformed JSON. All three wear the same outer wrapper so the existing nous.readCache total stays for trend comparison. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 12:07:43 +02:00
Alan Wizemann	20cc3a2985	perf(sessions): fold sessions+previews into one batched SSH round-trip Audit Finding 1 — ChatViewModel.loadRecentSessions and SessionsViewModel.load each fired two sequential `await dataService.fetch*` calls (sessions + previews), paying the 420 ms SSH RTT twice on every reload. Visible in ScarfMon traces as back-to-back `ssh.run` intervals, totaling ~840 ms minimum overhead per sidebar refresh. Adds HermesDataService.sessionListSnapshot(limit:) — same shape as the existing dashboardSnapshot, folds both queries into a single backend.queryBatch() call. Both call sites switched. Halves the SSH round-trips for every sidebar load. With Finding 5's coalescing, redundant parallel reloads also become free. Together, the 9× redundant queries-per-minute observed in baseline captures should drop substantially. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 12:07:31 +02:00
Alan Wizemann	432d5b0b52	fix(remote-sqlite): bump query timeout 15s→30s + add in-flight coalescing Two issues from the perf capture: 1. fetchMessages on a 157-message session timed out at exactly 15.06 s (`mac.fetchMessages` interval = 15,062,646,042 ns), then silently returned 0 rows. The chat appeared empty but the session had data; the timeout was firing before sqlite3 -json could ship the ~50KB payload over a 420 ms-RTT SSH link. Bumped queryTimeout to 30 s. The streamScript transport-level timeout still fires on truly wedged hosts. 2. mac.loadRecentSessions fired twice in parallel at t=960450 + t=960584, finishing 134 ms apart — two independent watcher ticks each spawning a full 3-query SSH load for the same data. Added in-flight request coalescing keyed on the inlined SQL text: when a query with the exact same SQL is already pending, second caller awaits the first task instead of spawning a new subprocess. New ScarfMon event `sqlite.query.coalesced` surfaces hits in traces. Coalescing is surgical — applies to single `query` calls only, not `queryBatch` (different timeout scaling, concurrent-same-batch is rare). Avoids serializing independent work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 12:07:19 +02:00
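In-flight coalescing keyed on the SQL text, as described in 432d5b0b52, can be sketched with an actor holding a dictionary of pending tasks. `QueryCoalescer` and its `[String]` row type are hypothetical simplifications, and the `coalescedHits` counter stands in for the `sqlite.query.coalesced` event.

```swift
actor QueryCoalescer {
    private var inFlight: [String: Task<[String], Never>] = [:]
    private(set) var coalescedHits = 0

    func run(
        sql: String,
        execute: @escaping @Sendable (String) async -> [String]
    ) async -> [String] {
        // An identical query is already pending: await it, do not respawn.
        if let pending = inFlight[sql] {
            coalescedHits += 1
            return await pending.value
        }
        let task = Task { await execute(sql) }
        inFlight[sql] = task
        let rows = await task.value
        // Clear the slot so later callers trigger a fresh query.
        inFlight[sql] = nil
        return rows
    }
}
```

Because the key is the exact SQL string, two different queries never wait on each other, which matches the commit's point about not serializing independent work.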
Alan Wizemann	12e152bfea	perf(ssh): replace Thread.sleep spin with kernel-wait for runLocal timeout Audit Finding 3 — every SSH operation funnels through SSHTransport.runLocal, which used a 100ms Thread.sleep loop while waiting for the timeout. Each call held one cooperative-pool thread for the full timeout duration with spin-poll overhead, AND had 100ms granularity on the deadline. Replace with proc.terminationHandler + DispatchGroup wait — kernel-wakeup when the process exits (or the deadline fires), no spin. Same one-thread blocking footprint, but eliminates the per-operation spin work that inflated query latency 60-70% under concurrent SSH load (visible in ScarfMon as 7-second mac.loadRecentSessions outliers when sidebar reload + chat finalize + watcher poll all fired together). Minimum-touch fix; full async migration of runLocal documented for follow-up. The bigger refactor would let cooperative-pool threads park on a true async suspension during the wait, but requires propagating async through every ServerTransport caller. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 12:06:58 +02:00
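The kernel-wait approach from 12e152bfea, a `terminationHandler` combined with a `DispatchGroup` wait, can be sketched as below. `runLocalSketch` is a hypothetical stand-in for `SSHTransport.runLocal`. It returns the exit status, or nil on timeout, and leaves out output capture entirely.

```swift
import Foundation

func runLocalSketch(_ executable: String,
                    _ arguments: [String],
                    timeout: TimeInterval) throws -> Int32? {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: executable)
    process.arguments = arguments

    let done = DispatchGroup()
    done.enter()
    // Fires once when the child exits; no polling loop is needed.
    process.terminationHandler = { _ in done.leave() }

    do {
        try process.run()
    } catch {
        done.leave()   // keep enter/leave balanced if launch fails
        throw error
    }

    // Blocks this thread until the child exits or the deadline passes.
    if done.wait(timeout: .now() + timeout) == .timedOut {
        process.terminate()
        done.wait()    // wait for the termination handler to run
        return nil
    }
    return process.terminationStatus
}
```

The calling thread is still blocked for the duration, as the commit notes, but it sleeps in the kernel rather than waking every 100ms, and the deadline is no longer quantized to the poll interval.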
Alan Wizemann	099d73dde8	feat(scarfmon): instrument Nous model catalog + subscription path (beach-ball investigation) User reported a remote-context beach-ball when opening the model picker with Nous as the active provider. Existing measure points showed loadProviders + loadModels at ~315ms each (fast). The beach-ball must be in the uninstrumented Nous-overlay branch the picker fires when nous is selected. Adds four measure points covering every blocking call in that path: - nous.subscription.loadState (interval, .diskIO) — auth.json read via NousSubscriptionService.loadState. Already known to do an SSH read; now precisely measurable. - nous.readCache (interval, .diskIO) — nous_models cache read, TWO sequential SSH ops (fileExists + readFile). - nous.bearerToken (interval, .diskIO) — auth.json read AGAIN inside fetchModels. This is a duplicate read — loadState already parsed the same file moments earlier. Comment-flagged as a caching candidate. - nous.fetchModels (interval, .transport) + .bytes (event) — HTTP GET against the Nous /v1/models endpoint with the body byte count attached. The most likely beach-ball culprit if the endpoint is slow or hung. After the next capture we'll know which of the four owns the user's wall-clock; if `nous.bearerToken` shows up alongside `nous.subscription.loadState` with similar duration, the duplicate read is also a real cost worth fixing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 11:50:51 +02:00
Alan Wizemann	4efd84c119	feat(projects,cron): new project wizard + keychain env mirror + #75 fix Three coordinated additions to the project surface: 1. New Project from Scratch wizard. Toolbar entry that scaffolds a Scarf-standard project skeleton (`<project>/.scarf/dashboard.json` placeholder + `AGENTS.md` marker block), registers it, opens an ACP chat session in the project's cwd, and auto-sends a kickoff prompt that activates the bundled `scarf-template-author` skill. The skill drives the substantive setup conversationally — widgets, optional config schema, optional cron, AGENTS.md content. 2. Keychain secrets mirror into ~/.hermes/.env. Cron jobs can now reference Keychain-backed config values via env vars named `SCARF_<UPPER_SLUG>_<UPPER_FIELDKEY>`. Hermes reloads .env per cron tick (cron/scheduler.py:897-903), so credential rotation is free. Source of truth stays in the Keychain — config.json keeps `keychain://` URIs unchanged. Mirror runs at install, post-install Configuration save, uninstall, "Remove from List", and on app launch (reconcileAll). Mode 0600 on `.env` enforced by LocalTransport's existing `.env` heuristic. 3. Configuration form layout recursion fix (issue #75). Per-stage frame sizes on `ConfigEditorSheet` triggered `_NSDetectedLayoutRecursion` for projects with manifest.json. Stabilized the outer frame at the editing stage's intrinsic size so transitions only swap content, never resize the container. New services: - `ProjectScaffolder` (Mac) — bare-shell project + AGENTS.md marker - `SkillBootstrapService` (Mac) — copies bundled skills into ~/.hermes/skills/ - `KeychainEnvMirror` (Mac) — splice/unmirror/reconcileAll over ~/.hermes/.env - `SecretsEnvBlock` (ScarfCore) — pure marker-block helpers Bundled skill `scarf-template-author` v1.1.0 ships in `Resources/BuiltinSkills.bundle/`; SkillBootstrapService copies it into `~/.hermes/skills/scarf-template-author/` on launch (idempotent + version-gated). 
The skill grew a "Using secrets in cron prompts" section documenting the env-var convention. Migration: launch reconciler auto-populates .env on first v2.8 launch. Users with cron prompts authored against the old (broken) pattern need to update them to use $SCARF_… references — see release notes. Tests: - SecretsEnvBlockTests: 24/24 (`swift test --filter SecretsEnvBlock`) - KeychainEnvMirrorTests: 11/11 (`xcodebuild ... -only-testing:scarfTests/KeychainEnvMirror`) The idempotent-mirror test caught a real bug: applyBlock's replace path consumed the trailing newline from blockRange but didn't restore it, breaking the no-op-when-unchanged contract that the launch reconciler relies on. Fixed. v2.8 RELEASE_NOTES.md committed but no release cut yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 11:44:23 +02:00
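The env-var convention and the idempotent marker-block splice from 4efd84c119 can be illustrated as follows. Everything named here is hypothetical: the marker strings, the `envName` normalization rule (uppercase, with non-alphanumerics mapped to `_`), and this `applyBlock` signature are assumptions, not the real `SecretsEnvBlock` API.

```swift
import Foundation

// Assumed marker lines; the real block markers are not shown in the log.
let blockBegin = "# >>> scarf secrets (sketch marker) >>>"
let blockEnd = "# <<< scarf secrets (sketch marker) <<<"

// Builds SCARF_<UPPER_SLUG>_<UPPER_FIELDKEY> under the assumed rule.
func envName(slug: String, fieldKey: String) -> String {
    func upperSnake(_ text: String) -> String {
        String(text.uppercased().map { $0.isLetter || $0.isNumber ? $0 : "_" })
    }
    return "SCARF_\(upperSnake(slug))_\(upperSnake(fieldKey))"
}

// Replaces the marker block if present, otherwise appends it.
func applyBlock(_ entries: [String], to file: String) -> String {
    let block = ([blockBegin] + entries + [blockEnd]).joined(separator: "\n") + "\n"
    if let start = file.range(of: blockBegin),
       let stop = file.range(of: blockEnd, range: start.upperBound..<file.endIndex) {
        // Consume the newline after the end marker as part of the old
        // block; the new block brings its own, so re-applying is a no-op.
        var upper = stop.upperBound
        if upper < file.endIndex, file[upper] == "\n" {
            upper = file.index(after: upper)
        }
        return file.replacingCharacters(in: start.lowerBound..<upper, with: block)
    }
    let separator = (file.isEmpty || file.hasSuffix("\n")) ? "" : "\n"
    return file + separator + block
}
```

The trailing-newline handling is the step the commit's idempotent-mirror test exercised: whatever newline the replace path consumes has to come back, or an unchanged block would still rewrite the file.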
Alan Wizemann	bd9bacb8b3	feat(scarfmon): B2 + B3 + iOS dashboard — file watcher, message hydration, dashboard load Three areas instrumented in this batch. Both targets build clean. B2 — Mac HermesFileWatcher (FSEvents + remote SSH poll) - mac.fileWatcher.localFire (event) — every FSEvents change on a watched core or project path. High counts during streaming chats are normal (state.db-wal ticks per persisted message); high counts during idle suggest a runaway watcher install. - mac.fileWatcher.remoteRestart (event, bytes=path-count) — fires once per SSH poller restart, with the union path count attached. Frequent restarts mean the project-list update path is churning. - mac.fileWatcher.remoteDelta (event) — fires per non-empty change detected on the SSH poll. Pair with `ssh.streamScript` cadence to see actual poll latency. B3 — Chat session boot + message hydration - mac.fetchMessages (interval) + .rows (event) — bounded SQL fetch from HermesDataService. Catches slow paginated scrolls back through long sessions. - mac.refreshSessionFromDB (interval) — RichChatViewModel's post-promptComplete refresh that picks up cost/token data. - mac.hydrateMessages (interval) + .rows (event) — full session-boot hydration in RichChatViewModel.loadSessionHistory. Was the suspected trigger of the 22-bubble session-start storms in the Phase 3a baseline; now precisely measurable. iOS Dashboard (resolves the original "out of sync" mystery) - ios.loadDashboard (interval) — wraps the four dataService.fetch* Citadel SFTP round-trips in IOSDashboardViewModel.load(). - ios.allSessions.count (event) — sidebar list size after each load, correlates load latency with list growth. - ios.dashboardRefresh.trigger (event) — fires only on pull-to-refresh, separates that entry path from initial appear. Architectural finding: the original v2.6.0 user feedback ("chat out of sync iOS↔Mac on fast LAN") is now firmly attributable to this — iOS does NOT subscribe to a file watcher. 
The dashboard refresh path is appear-time + pull-to-refresh only. `CitadelServerTransport.watchPaths()` is effectively dead code on iOS today; nobody calls it. Earlier A1 instrumentation (commit `9df7142`) put measure points on it, which is why captures showed zero `ios.fileWatcher.tick` events. Future work: either add a foregrounded poll loop to iOS, or thread the file watcher into the dashboard subscription. Documented in the ScarfMon roadmap memory. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 23:52:11 +02:00
Alan Wizemann	96af545e66	feat(scarfmon): Tier A2/A3/B1/B4 — sessions, model catalog, dashboard widgets, image encoder Four parallel instrumentation drops orchestrated by the perf roadmap. All adds; no logic changes; both targets build clean. A2 — Mac sessions list reload - mac.scheduleSessionsRefresh (event) — every file-watcher entry into the debounced reload helper. Pair with mac.loadRecentSessions count to see how many ticks coalesce per actual reload. - mac.loadRecentSessions (interval) — full wall-clock from DB open through observable assignment. - mac.recentSessions.count (event) — sidebar list size, correlates list growth with reload latency. A3 — ModelCatalogService loads - modelCatalog.loadProviders (interval) + .providers.count (event). - modelCatalog.loadModels (interval) + .models.count (event). - modelCatalog.validateModel (interval) — covers loadCatalog -> transport.readFile, hits disk on every call. Sync wrap (not measureAsync): the inner Task.detached body is synchronous; the detached hop is the async boundary. B1 — Dashboard render - mac.dashboard.body (event) — ProjectsView body re-eval count. - dashboard.loadRegistry (interval) — projects.json read + decode. - widget.markdown_file.load / widget.log_tail.load / widget.image.load / widget.cron_status.load (intervals) — one per v2.7 file-reading widget. cron_status batches its two HermesFileService calls into one tuple-returning measure block so the existing two-call shape stays intact. B4 — Image encoder - imageEncoder.input.bytes (event) — raw input size. - imageEncoder.downsample (interval) — full decode/resize/JPEG encode round trip across all three platform branches (AppKit, UIKit, Linux passthrough). - imageEncoder.bytes (event) — final encoded JPEG size, lets us spot blowup cases. Sync wrap: encode is nonisolated sync; using measureAsync would require turning the function async, which is a logic change. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 23:38:50 +02:00
Alan Wizemann	9df7142f49	feat(scarfmon): A1 — instrument iOS file-watcher polling cadence Adds three measure points to CitadelServerTransport.watchPaths: - ios.fileWatcher.tick (interval) — full poll cycle latency including the SSH stat round-trips. > 1500ms here is what 'out of sync' feels like — the channel is congested or the host is slow. - ios.fileWatcher.delta (event) — fires only when the signature actually changed. Low delta/tick ratio means we can safely drop the 3-second cadence; high ratio means we'd just burn bandwidth. - ios.fileWatcher.paths (event, bytes=count) — number of paths watched per cycle. Explains slow ticks as the project list grows. Surgical addition; existing 3-second cadence + signature-diff logic unchanged. With Full mode on, a few minutes of usage on LAN will tell us empirically whether the cadence can drop to 1s — the original v2.6.0 user feedback complained 'chat is out of sync' between iOS and Mac on a fast LAN. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 23:33:30 +02:00
Alan Wizemann	9ff9a018e7	feat(scarfmon,chat): Phase 3b — dampen finalize bursts + Thinking… status + wider loadConfig stack Three targeted fixes from the Phase 3a baseline. Bubble-burst dampening (Phase 3b-1): - RichChatViewModel.finalizeStreamingMessage wraps both the streaming-id rewrite and the empty-finalize remove() in a no-animation Transaction. The id flip from 0 → permanent value was the load-bearing trigger of the 5–8 RichMessageBubble.body fires we were seeing 1–2 ms after every `finalizeStreamingMessage` interval; SwiftUI ran an animated diff against neighbors and re-evaluated their bodies. The new message is content-equal to the streaming one — there is no animation worth running. Thinking… status promotion (Phase 3b-2): - RichChatViewModel exposes `isStreamingThoughtsOnly` — true while a turn is in flight, has emitted thought-stream bytes, and has not yet produced any visible assistant text. The Phase 3a baseline showed this is where most of the user-perceived "feels slow" lives: reasoning models commonly take 3–8 s before producing visible output, and Scarf surfaced no specific signal during that window. - Mac ChatView.displayedStatus promotes the toolbar pill to "Thinking…" when the flag is true. - iOS connectionBanner gains a transient "Thinking…" strip with spinner, same trigger condition. Phase 3a fix-up: - HermesFileService.loadConfig stack-trace logging widened from one frame to a 10-frame window prefixed with "#N", so the actual caller is visible past inlined ScarfMon wrappers (the prior log surfaced ScarfMon.measure itself, not the loadConfig caller). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 23:14:03 +02:00
Alan Wizemann	0a4f8de492	feat(scarfmon): Phase 3a — diagnostic measure points for chat-render bursts Adds four targeted measure points so the next baseline capture can attribute the bubble-re-render storm and the slow sendPrompt to a specific cause: - mac.RichChatMessageList.body — distinguishes "the parent is re-issuing the ForEach" from "the bubbles are re-rendering on their own". If list.body fires once and bubble.body fires N times, churn is in the bubbles; if list.body fires N times, the ForEach itself is being rebuilt. - finalizeStreamingMessage (interval) — pinpoints the end-of-stream burst trigger. The 20-bubble re-eval burst we saw at the close of each turn lines up with this call; measuring it surfaces whether it's the streaming-id rewrite, the turn-duration assignment, or something downstream. - firstByte / firstThoughtByte (event) — fires once per turn on the first chunk after currentTurnStart is set. Splits user-tap → first byte (network + Hermes thinking, the dominant component of the 7-11s sendPrompt) from first byte → turn end (Scarf streaming render). - loadConfig caller hint via os.Logger — when ScarfMon is in Full mode, logs the first stack frame above each loadConfig call to the com.scarf.mon subsystem so mystery callers (the read at t=264282 with no apparent trigger in the prior baseline) become traceable via `log stream`. Symbol-only, no PII, free outside Full mode. All four are pure additions — no behavior change, same zero-cost default-off semantics as Phase 2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 22:47:29 +02:00
Alan Wizemann	3126c34561	feat(scarfmon): chat + transport + sqlite measure points (Phase 2) Wires ScarfMon measure points into the chat hot path on both targets, plus the underlying SSH transport and remote-SQLite backend. All callsites are surgical adds — no behavior change. Cost when ScarfMon is in `.signpostOnly` (default) is one os_signpost emit per call, elided by the runtime outside an Instruments session. In `.full` mode the same callsites also push samples into the in-memory ring buffer. Render counters (event): - mac.ChatView.body / ios.ChatView.body — full transcript pane re-evals - mac.RichMessageBubble.body / ios.MessageBubble.body — per-bubble re-evals Stream + session (event + interval): - mac.sendViaACP, mac.sendPrompt — user tap → first-byte - mac.acpEvent, mac.handleACPEvent — per-event delivery + handle cost - mac.startACPSession — session boot - ios.send, ios.startResuming — same shape on iOS - ios.acpEvent, ios.handleACPEvent — same per-event split on iOS Transport + SQLite (interval, with byte counts on rows): - ssh.streamScript (Citadel iOS) — SSH round-trip - ssh.run (SSHScriptRunner Mac) — SSH round-trip - sqlite.query, sqlite.queryBatch — Remote SQLite per-call - sqlite.query.rows — row count + stdout bytes per query Disk I/O (interval): - diskIO.loadConfig — config.yaml read + parse - diskIO.loadCronJobs — cron jobs.json decode Body counters use the `let _: Void = ScarfMon.event(...)` pattern at the top of `body` — works inside `@ViewBuilder` and fires on every re-eval, which is exactly the signal we want. To use: Mac: Settings → Advanced → Performance Diagnostics → Full iOS: Settings → Diagnostics → Performance → Full Both panels auto-aggregate by (category, name), surface top 20 by p95, and offer Copy as JSON for sharing in feedback threads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 22:18:06 +02:00
Alan Wizemann	6cf59c8a44	feat(scarfmon): perf instrumentation plumbing for iOS + Mac (Phase 1) ScarfMon lands the always-on perf instrumentation harness. Phase 1 ships the plumbing only; Phase 2 wires the chat measure points. Core (ScarfCore/Diagnostics/): - ScarfMon — public API: measure / measureAsync / event with @inline(__always) short-circuit when the backend set is empty so the off path is one branch + return. Categories are an enum, names are StaticString so user content cannot leak through metric tags. - ScarfMonRingBuffer — fixed-capacity (4096) lock-protected ring; one os_unfair_lock per record; summary() aggregates by (category, name) with nearest-rank p50/p95; exportJSON() emits a one-line-per-sample dump for the Copy as JSON button. - ScarfMonSignpostBackend — emits os_signpost into a dedicated com.scarf.mon subsystem so Instruments → Points of Interest shows Scarf's own measure points without a debug build. - ScarfMonLoggerBackend — Logger(.debug) sink for users running `log stream --predicate 'subsystem == \"com.scarf.mon\"'`. - ScarfMonBoot — three modes (off / signpostOnly / full); persists the user's choice in UserDefaults under ScarfMonMode; configure() is idempotent and replaces the active backend set atomically. Tests: 11 cases covering ring ordering / wrap / reset, summary aggregation, p95 percentiles, event vs interval semantics, install / isActive, measure + measureAsync (including the throw path), boot mode transitions, and JSON export round-trip. @Suite(.serialized) because the suite mutates process-wide backend state. App wiring: - ScarfIOSApp.init + ScarfApp.init call ScarfMonBoot.configure(mode:) with the persisted mode (default .signpostOnly). - iOS Settings → Diagnostics → Performance row leads to a list-style panel with the segmented mode picker, top-20 stat rows by p95, Copy as JSON, and Reset. - Mac Settings → Advanced gains a ScarfMonDiagnosticsSection with the same shape (NSPasteboard for copy). 
Open-source by design — no remote upload, no analytics. The ring buffer never leaves the device unless the user explicitly taps Copy as JSON. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 22:08:21 +02:00

398 Commits