chore: Bump version to 2.0.2

test(ssh): regression tests for ControlPath socket-limit invariants
Two tests pinning the invariants that were violated / introduced by the #19 / PR #20 fix:

- controlDirPathFitsMacOSSocketLimit: asserts dir + '/' + 64-char %C hash + NUL <= 104 bytes. Would have caught the original Caches-based path landing at 105 bytes for users with longer $HOME strings.
- controlDirPathIsPerUser: asserts the path includes the current uid, pinning the per-user-isolation invariant against any future refactor that drops it (since /tmp is shared across all local users).

scarfTests was a stub before this; these are the suite's first real tests.
2026-05-10 18:44:45 +00:00 · 2026-04-20 15:46:07 -07:00 · 2026-04-20 15:45:29 -07:00 · 2026-04-20 15:45:20 -07:00 · 2026-04-20 15:32:47 -07:00 · 2026-04-20 15:32:47 -07:00
24 changed files with 1685 additions and 88 deletions
@@ -1,6 +1,7 @@
 # Xcode
 build/
 .gh-pages-worktree/
+.wiki-worktree/
 DerivedData/
 *.pbxuser
 !default.pbxuser
@@ -52,3 +53,6 @@ scarf/standards/backups/
 # history. RELEASE_NOTES.md stays tracked (committed with the version bump).
 releases/v*/*.zip
 releases/v*/appcast-entry.xml
+
+# Wiki helper: personal patterns (hostnames, IPs) blocked from the wiki push.
+scripts/wiki-blocklist.txt
@@ -59,6 +59,29 @@ The script bumps version, archives Universal (arm64 + x86_64) + ARM64-only varia

 **Prerequisites (one-time, already set up on Alan's machine):** Developer ID Application cert in login Keychain (team `3Q6X2L86C4`), notarytool keychain profile `scarf-notary`, Sparkle EdDSA private key in Keychain item `https://sparkle-project.org`, `gh-pages` branch + GitHub Pages enabled. See the header of [scripts/release.sh](scripts/release.sh) and the Releases section in [README.md](README.md) for details.

+## Wiki
+
+Public documentation lives in the GitHub wiki at https://github.com/awizemann/scarf/wiki. The wiki is a separate git repo cloned to `.wiki-worktree/` in the repo root (gitignored, sibling to `.gh-pages-worktree/`). Internal dev notes stay in `scarf/docs/`; the wiki is for public-facing reference.
+
+**Update the wiki when:**
+- A new feature module is added under `scarf/scarf/scarf/Features/` → extend the relevant User Guide page.
+- A new core service is added under `Core/Services/` → extend `Core-Services.md`.
+- Architecture changes (AppCoordinator, transport, MVVM-F rule, sandbox) → `Architecture-Overview.md` + the specific sub-page.
+- Hermes version bumps in this file → `Hermes-Version-Compatibility.md`.
+- `scripts/release.sh` completes a full (non-draft) release → bump latest-version on `Home.md` + append to `Release-Notes-Index.md`.
+- Keyboard shortcut or sidebar section changes → `Keyboard-Shortcuts.md` / `Sidebar-and-Navigation.md`.
+
+**Skip for:** bug fixes with no user-observable change, pure refactors, typos, test-only changes, internal cleanups.
+
+```bash
+./scripts/wiki.sh pull                         # always first
+# edit .wiki-worktree/*.md with normal tools
+./scripts/wiki.sh commit "docs: describe X"   # runs secret-scan
+./scripts/wiki.sh push                         # runs secret-scan again, then push
+```
+
+**Never** commit API keys, tokens, `.env` files, private keys, or real hostnames/IPs to the wiki. The script's two-pass secret-scan blocks common token patterns and anything matching a user-maintained blocklist at `scripts/wiki-blocklist.txt` (gitignored). Do not bypass the scan without explicit approval. The full workflow is documented on the wiki itself at `.wiki-worktree/Wiki-Maintenance.md`.
+
 ## Hermes Version

 Targets Hermes v0.9.0 (v2026.4.13). Log lines may carry an optional `[session_id]` tag between the level and logger name — `HermesLogService.parseLine` treats the session tag as an optional capture group, so older untagged lines still parse.
@@ -33,6 +33,10 @@ Rules:
 - The app only reads from `~/.hermes/state.db` (never writes). Memory files are the exception.
 - Swift 6 strict concurrency: `@MainActor` default isolation, `nonisolated` for service methods.

+## Documentation
+
+Public docs live in the [GitHub wiki](https://github.com/awizemann/scarf/wiki). Small fixes (typos, clarifications) can be made via the "Edit" button on any wiki page — you need push access to the main repo. For larger changes, clone the wiki locally (`git clone git@github.com:awizemann/scarf.wiki.git`) or open an issue describing the proposed change.
+
 ## Reporting Issues

 Open an issue with:
@@ -42,6 +42,21 @@ Scarf 2.0 is a multi-window app. Each window is bound to exactly one Hermes serv

 Remote Hermes is reached over system SSH — the same `~/.ssh/config`, ssh-agent, ProxyJump, and ControlMaster pooling your terminal uses. File I/O flows through `scp`/`sftp`; SQLite is served from atomic `sqlite3 .backup` snapshots cached under `~/Library/Caches/scarf/snapshots/<server-id>/`; chat (ACP) tunnels as `ssh -T host -- hermes acp` with JSON-RPC over stdio end-to-end. Everything in the feature list below works against remote identically to local.

+### Remote setup requirements
+
+The remote host must have:
+
+1. **SSH access** — key-based auth via your local ssh-agent. Scarf never prompts for passphrases; run `ssh-add` once in Terminal before connecting.
+2. **`sqlite3`** on the remote `$PATH` — needed for the atomic DB snapshots. Install on the remote with `apt install sqlite3` (Ubuntu/Debian), `yum install sqlite` (RHEL/Fedora), or `apk add sqlite` (Alpine).
+3. **`pgrep`** on the remote `$PATH` — used by the Dashboard "is Hermes running" check. Standard on every distro; install `procps` if missing.
+4. **`~/.hermes/` readable by the SSH user**. When Hermes runs as a separate user (systemd service, Docker container), the SSH user needs read access to `config.yaml` and `state.db`. Either (a) SSH as the Hermes user, (b) `chmod` Hermes's home to be group-readable and add your SSH user to that group, or (c) set the **Hermes data directory** field when adding the server to point at the right location (e.g. `/var/lib/hermes/.hermes`).
+
+### Troubleshooting remote connections
+
+If the connection pill is green but the Dashboard shows "Stopped", "unknown", or empty values, the SSH user can't read the Hermes state files. Open **Manage Servers → 🩺 Run Diagnostics** (or click the orange "Can't read Hermes state" pill in the toolbar). The diagnostics sheet runs fourteen checks in one SSH session, including connectivity, `sqlite3` presence, read access to `config.yaml` and `state.db`, and the effective non-login `$PATH`. It tells you exactly which check fails and why, with remediation hints for each. Use the **Copy Full Report** button to paste the full output into a bug report.
+
+For the common "Hermes isn't at the default path" case (systemd services, Docker), **Test Connection** in the Add Server sheet now probes `/var/lib/hermes/.hermes`, `/opt/hermes/.hermes`, `/home/hermes/.hermes`, and `/root/.hermes` when it can't find `state.db` at `~/.hermes/`, and offers a one-click fill if it finds any of them.
+
 ## Features

 Scarf mirrors Hermes's surface area through a sidebar-based UI. Sections below map 1:1 to the app's sidebar.
@@ -0,0 +1,68 @@
+## What's New in 2.0.1
+
+Hotfix for [#19](https://github.com/awizemann/scarf/issues/19) and the related reports from the first day of v2.0: users' remote SSH connections would show a green "Connected" pill but every view (Dashboard, Sessions, Activity, Chat) read as empty / "not running" / "not configured". Three distinct environments reported it — Docker Hermes on a LAN, homelab VM over Tailscale, Ubuntu VPS — and every one was a silent file-access failure on the remote that Scarf wasn't surfacing.
+
+### Errors no longer disappear
+
+Every remote read (`config.yaml`, `gateway_state.json`, `state.db`, `pgrep`) used to silently substitute an empty value on *any* failure — permission denied, missing file, `sqlite3` not installed, connection drop — they all looked identical to the UI. Now:
+
+- Each failure logs a specific warning via `os.Logger` (visible in Console.app under subsystem `com.scarf`).
+- The Dashboard shows an orange banner above the stats with the exact error (e.g. "Permission denied reading `~/.hermes/state.db`") and a **Run Diagnostics…** button.
+- `HermesDataService` exposes a `lastOpenError` so views can explain *why* state.db couldn't be opened, rather than just rendering zeros.
+- Routine "file doesn't exist" cases (optional `skill.yaml` metadata, `gateway_state.json` before Hermes starts, `memories/USER.md` on fresh installs) are detected and **not** logged as warnings — only real errors (permission denied, connection drops, `sqlite3` missing) hit the log. Prevents Console from filling with false-positive noise when directory walks encounter optional files.
+
+### New Remote Diagnostics sheet
+
+Accessible from the per-server **Manage Servers → 🩺** button, or by clicking the orange connection pill when Scarf can see the server but can't read Hermes state. Runs fourteen checks in a single SSH session and shows pass/fail for each, plus a targeted hint per failure. The checks include:
+
+- SSH connectivity and auth
+- Remote user identity and `$HOME` resolution
+- `~/.hermes` directory existence and readability
+- `config.yaml` readable (existence *and* actual read access — the old probe only checked existence)
+- `state.db` readable
+- `sqlite3` installed on the remote (required for the atomic snapshot Scarf pulls)
+- `sqlite3` can actually open `state.db`
+- `hermes` binary on the non-login `$PATH` (what runtime uses)
+- `hermes` binary on the login `$PATH` (what the Test Connection probe uses)
+- `pgrep` available (for the "is Hermes running" check)
+
+One **Copy Full Report** button dumps every check as plain text for bug reports, and a raw-output disclosure panel shows the exact stdout/stderr the remote returned whenever any probe fails — so transport-level problems are self-diagnosing.
+
+The diagnostics script is piped to `/bin/sh -s` on stdin rather than passed as `sh -c <script>` argv. The latter was getting split line-by-line by the remote's login shell (newlines parsed as command separators), which stranded variables set on line 1 in an ephemeral `sh` subprocess that exited before line 2 could use them. Stdin-piping runs the whole script in one `sh` process with variable scope preserved.
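The stdin-piping pattern described above can be sketched as follows. This is a minimal illustration, not Scarf's transport code: it runs a local `/bin/sh -s` as a stand-in for the remote shell, and the helper name `runViaStdin` is hypothetical. A real remote invocation would presumably launch `ssh` with `/bin/sh -s` as the remote command instead.

```swift
import Foundation

/// Hypothetical helper: feeds `script` to `/bin/sh -s` on stdin so the whole
/// script runs in ONE shell process and variables persist across lines.
func runViaStdin(_ script: String) throws -> (status: Int32, stdout: String) {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/bin/sh")
    process.arguments = ["-s"]          // -s: read commands from standard input

    let stdinPipe = Pipe()
    let stdoutPipe = Pipe()
    process.standardInput = stdinPipe
    process.standardOutput = stdoutPipe

    try process.run()
    // Fine for short scripts; a very large script or output would need
    // concurrent reads and writes to avoid filling a pipe buffer.
    stdinPipe.fileHandleForWriting.write(Data(script.utf8))
    stdinPipe.fileHandleForWriting.closeFile()   // EOF marks the end of the script

    let data = stdoutPipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()
    return (process.terminationStatus, String(decoding: data, as: UTF8.self))
}

// The variable set on line 1 is still visible on line 2.
let script = """
greeting=hello
echo "$greeting from one shell"
"""
let result = try runViaStdin(script)
print(result.stdout, terminator: "")
```

Because the script travels as data rather than as an argv string, no intermediate shell gets a chance to re-split it on newlines.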
+
+### Connection pill gains a "degraded" state
+
+The pill used to be green as long as SSH connected; now after connectivity passes it runs a second-tier check (`test -r $HOME/.hermes/config.yaml`). If that fails, the pill turns **orange** with "Connected — can't read Hermes state" and clicking it opens Remote Diagnostics directly. This is the exact symptom mode in #19, and it's now one click away from a specific answer.
+
+The pill's visual also got a pass: the colored dot is replaced with a state-specific SF Symbol (`checkmark.circle.fill` / `stethoscope` / `arrow.triangle.2.circlepath` / `exclamationmark.triangle.fill`), which reads more like a clickable toolbar tool and doubles as the status signal. No custom pill background anymore — the toolbar's native `.principal` bezel is the frame.
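As a rough sketch of the two-tier status described above: the type name `ConnectionStatus`, the function `resolveStatus`, and the pairing of each SF Symbol with a particular state are all assumptions made for illustration. The notes list the four symbols but do not say which state each belongs to.

```swift
/// Hypothetical status model; the real enum lives in ConnectionStatusViewModel.
enum ConnectionStatus: Equatable {
    case connecting
    case connected
    case degraded(reason: String)
    case error(message: String)

    /// Assumed symbol-to-state mapping, following the order the symbols are listed in.
    var symbolName: String {
        switch self {
        case .connected:  return "checkmark.circle.fill"
        case .degraded:   return "stethoscope"
        case .connecting: return "arrow.triangle.2.circlepath"
        case .error:      return "exclamationmark.triangle.fill"
        }
    }
}

/// Tier 1: is SSH reachable? Tier 2: can the SSH user read config.yaml?
func resolveStatus(sshReachable: Bool, configReadable: Bool) -> ConnectionStatus {
    guard sshReachable else { return .error(message: "SSH connection failed") }
    guard configReadable else { return .degraded(reason: "can't read Hermes state") }
    return .connected
}

let status = resolveStatus(sshReachable: true, configReadable: false)
print(status.symbolName)
```

Keeping `.degraded` as its own case, rather than a flag on `.connected`, means a view has to handle it explicitly in every `switch`.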
+
+### Auto-suggest the correct `remoteHome` during Add Server
+
+When Test Connection can't find `state.db` at the configured (or default) path, it now also probes the common alternate locations — `/var/lib/hermes/.hermes`, `/opt/hermes/.hermes`, `/home/hermes/.hermes`, `/root/.hermes` — and offers a one-click "Use this" fill if it finds one. Removes the need to know that systemd-installed Hermes lives at `/var/lib/hermes/.hermes` by convention.
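The fallback probe can be pictured like this. It is a simplified sketch under stated assumptions: `suggestRemoteHome` and the `remoteFileExists` closure are hypothetical names, and the closure stands in for whatever remote existence check the real probe performs. Only the candidate list comes from the notes above.

```swift
/// Alternate locations probed when state.db is not at the configured path.
let alternateHermesHomes = [
    "/var/lib/hermes/.hermes",
    "/opt/hermes/.hermes",
    "/home/hermes/.hermes",
    "/root/.hermes",
]

/// Returns nil when the configured directory already works, otherwise the
/// first alternate directory that contains a state.db (if any).
func suggestRemoteHome(configured: String,
                       remoteFileExists: (String) -> Bool) -> String? {
    if remoteFileExists(configured + "/state.db") { return nil }
    return alternateHermesHomes.first { remoteFileExists($0 + "/state.db") }
}

// Usage with a fake remote filesystem standing in for the SSH probe.
let fakeRemoteFiles: Set<String> = ["/var/lib/hermes/.hermes/state.db"]
let suggestion = suggestRemoteHome(configured: "/home/deploy/.hermes") {
    fakeRemoteFiles.contains($0)
}
print(suggestion ?? "no suggestion")
```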
+
+### Clearer copy for the `remoteHome` field
+
+The Add Server sheet field is now labeled "Hermes data directory" with a description explaining when you'd override it (systemd service installs, Docker sidecars) and noting that Test Connection auto-suggests.
+
+### README has a new "Remote setup requirements" section
+
+Four concrete prerequisites (SSH, `sqlite3`, `pgrep`, read access to `~/.hermes`) and a troubleshooting paragraph pointing at Remote Diagnostics.
+
+### Migrating from 2.0.0
+
+Sparkle will offer the update automatically. Settings and server list are preserved verbatim — this is purely additive (new diagnostics surface, new error banners, auto-suggest in Test Connection). If you were affected by #19, run Remote Diagnostics after updating; the sheet should pinpoint the specific file access issue and suggest a fix.
+
+### Under the hood
+
+- New types: `RemoteDiagnosticsViewModel`, `RemoteDiagnosticsView`. Both are local to Scarf; no new transport protocol.
+- `HermesFileService` gains `loadConfigResult()`, `loadGatewayStateResult()`, `hermesPIDResult()`, `readFileResult()`, `readFileDataResult()` — Result-returning variants that preserve the error. Legacy `loadConfig()` etc. still exist as thin forwarders for callers that don't need diagnostics.
+- `HermesDataService.open()` records `lastOpenError` with humanized hints for "sqlite3 not installed", "permission denied", and "file not found" — the three failure modes that produce 90% of issue #19 symptoms.
+- `ConnectionStatusViewModel` status enum gains `.degraded(reason:)` between `.connected` and `.error`.
+- `TestConnectionProbe` result enum gains `suggestedRemoteHome: String?` carrying any alternate-location hit.
+
+### Known follow-ups (not in 2.0.1)
+
+- `TestConnectionProbe` uses a direct-argv ssh invocation that's functionally correct but fragile (works by accident when split across the login shell). Should be ported to the stdin-pipe pattern the diagnostics sheet now uses.
+- Remaining `try?`-swallowed read paths beyond the four Dashboard-surfacing ones — Cron, Memory, Skills, MCP Servers, Platforms still silently render empty on read errors. Same fix pattern applies, low priority.
+- `hermesBinaryHint` is only populated when the user clicks Test Connection; if they skip it, ACP chat and CLI calls fall back to bare `hermes` which requires it on the non-interactive PATH (rarely true for `~/.local/bin` installs). The connection-pill's second-tier probe could auto-populate this.
+- Docker-host support: when users SSH to a Docker host, `pgrep` and `~/.hermes/` on the host don't see what's inside the container. Needs a `docker exec` wrapping option per server.
@@ -0,0 +1,41 @@
+## What's New in 2.0.2
+
+This release fixes the actual root cause of [#19](https://github.com/awizemann/scarf/issues/19), found and patched by Scarf's first external contributor. v2.0.1 added the diagnostics UI on the assumption that file permissions were the root cause; v2.0.2 fixes the underlying bug for everyone, regardless of permissions.
+
+### macOS Unix domain socket path limit (the real #19)
+
+OpenSSH's ControlMaster multiplexes our bursty stat/cat/cp traffic over one TCP session per host. The master listens on a Unix domain socket that ssh creates at the control path with `bind(2)`, and on macOS `sun_path` is **104 bytes including the NUL terminator**.
+
+Scarf's old socket path was `~/Library/Caches/scarf/ssh/<%C>` where `%C` is OpenSSH's 64-char SHA1 hash of `(local user, host, port, remote user)`. For a username like `alex.maksimchuk`, the full path landed at **105 bytes** — one byte over the limit. ssh exited 255 with `unix_listener: path "..." too long for Unix domain socket`. Our `LogLevel=QUIET` flag (set so ACP's line-delimited JSON stays binary-clean) suppressed the diagnostic, and the user just saw "Remote command exited 255" — which the UI rendered as the silent empty-data state every reporter in #19 described.
+
+The fix is to use a much shorter path:
+
+```swift
+"/tmp/scarf-ssh-\(getuid())"   //  ~17 bytes + 64 hash + sep + NUL = ~83 bytes
+```
+
+Per-user uid suffix keeps two local users' sockets from colliding in the shared `/tmp`, and 0700 perms on the dir keep them inaccessible to other users.
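The byte budget behind that choice can be checked with a few lines of arithmetic. The sketch below follows the formula used in these notes (directory + separator + 64-byte `%C` budget + NUL against a 104-byte `sun_path`). The function names, the uid `501`, and the long home directory are illustrative assumptions, not values from Scarf.

```swift
let sunPathCapacity = 104     // macOS sun_path size, including the NUL terminator
let controlHashBudget = 64    // bytes reserved for the %C socket basename

/// Total bytes needed for "<dir>/<hash>" plus the trailing NUL.
func socketPathBytes(controlDir: String) -> Int {
    controlDir.utf8.count + 1 + controlHashBudget + 1
}

func fitsSocketLimit(controlDir: String) -> Bool {
    socketPathBytes(controlDir: controlDir) <= sunPathCapacity
}

// Hypothetical examples: a short /tmp path versus a Caches path under a long $HOME.
let shortDir = "/tmp/scarf-ssh-501"
let longDir = "/Users/some.long.username/Library/Caches/scarf/ssh"

print(shortDir, socketPathBytes(controlDir: shortDir), fitsSocketLimit(controlDir: shortDir))
print(longDir, socketPathBytes(controlDir: longDir), fitsSocketLimit(controlDir: longDir))
```

Counting `utf8.count` rather than `count` matters because the kernel limit is in bytes, and a home directory containing non-ASCII characters uses more bytes than characters.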
+
+**Massive thanks to Alex Maksimchuk ([@aliatx2017](https://github.com/aliatx2017)) — Scarf's first external PR contributor — for diagnosing and patching this in [#20](https://github.com/awizemann/scarf/pull/20).** That diagnosis only happened because Alex bothered to read the codebase, reproduce against multiple usernames including a Termux/Android instance, and walk back from the cryptic exit code to the actual `bind()` failure. This release wouldn't exist without that work.
+
+### Hardening on top of the fix
+
+Three additions on top of Alex's patch, layered in via separate commits to keep the original change reviewable:
+
+- **Defensive ownership check on the socket dir.** `/tmp` is world-writable, so a malicious local user could pre-create `/tmp/scarf-ssh-<uid>` and trick Scarf into using a hostile directory (we'd silently fail to chmod it back to 0700, since we wouldn't own it). `ensureControlDir` now uses POSIX `mkdir(0700)` (atomic, sets perms at create time) and on `EEXIST` runs `lstat` to verify the entry is a directory we own with mode 0700 — symlink → refuse, wrong owner → refuse + log to `os.Logger`, wrong mode → repair. Closes the `/tmp` pre-creation hole that's the standard concern for any per-user `/tmp` path.
+- **Launch-time sweep of stale sockets.** `ServerRegistry.sweepOrphanCaches` already prunes orphaned snapshot directories on launch; it now also removes ControlMaster socket files older than 30 minutes. Socket basenames are `%C` hashes (not ServerIDs), so we can't keep "still registered" sockets the way the snapshot sweep does — but `ControlPersist` is 600s, so anything older than 30 minutes is guaranteed to be a dead orphan from a crashed master, an unclean app exit, or a server removed while another Scarf instance was holding the dir. Keeps `/tmp/scarf-ssh-<uid>/` from accumulating indefinitely until reboot, while leaving any concurrent Scarf instance's live sockets untouched.
+- **Regression test for the path-length invariant.** `scarfTests` was a stub — it now has two tests: one asserting `controlDirPath().utf8.count + 1 + 64 + 1 ≤ 104` (would have caught the original #19 bug in CI), one asserting the path includes the current uid (pins the per-user-isolation invariant against a future "simplification" that drops it).
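The create-then-verify sequence from the first bullet might look roughly like the sketch below. It is not the actual `ensureControlDir`: the function name `ensurePrivateDir` and the error type are hypothetical, and logging is omitted. It only illustrates the `mkdir` / `EEXIST` / `lstat` flow using standard POSIX calls.

```swift
import Foundation

enum ControlDirError: Error {
    case notADirectory      // symlink or other non-directory entry: refuse
    case wrongOwner         // someone else pre-created it: refuse
    case system(Int32)      // unexpected errno
}

/// Creates `path` with mode 0700, or verifies an existing entry is a
/// directory owned by the current user, repairing its mode if needed.
func ensurePrivateDir(_ path: String) throws {
    if mkdir(path, 0o700) == 0 { return }            // created with perms set atomically
    guard errno == EEXIST else { throw ControlDirError.system(errno) }

    var info = stat()
    // lstat does not follow symlinks, so a planted symlink is seen as a symlink.
    guard lstat(path, &info) == 0 else { throw ControlDirError.system(errno) }
    guard (info.st_mode & mode_t(S_IFMT)) == mode_t(S_IFDIR) else {
        throw ControlDirError.notADirectory
    }
    guard info.st_uid == getuid() else { throw ControlDirError.wrongOwner }

    if (info.st_mode & 0o777) != 0o700 {             // wrong mode on our own dir: repair
        guard chmod(path, 0o700) == 0 else { throw ControlDirError.system(errno) }
    }
}

// Usage against a throwaway directory (not the real /tmp/scarf-ssh-<uid>).
let sketchDir = NSTemporaryDirectory() + "control-dir-sketch-" + UUID().uuidString
try ensurePrivateDir(sketchDir)      // first call creates it
_ = chmod(sketchDir, 0o755)          // simulate a loosened mode
try ensurePrivateDir(sketchDir)      // second call takes the EEXIST path and repairs
```

The ownership check runs before the `chmod` so that the repair is only ever attempted on a directory the current user already owns.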
+
+### v2.0.1 diagnostics work is still useful
+
+The diagnostics sheet, orange "degraded" pill, dashboard error banner, and `remoteHome` auto-suggest from v2.0.1 all still ship — they just turn out not to have been the right diagnosis for the original three reporters. They remain valuable for the *other* connection-failure modes they were designed to surface (missing `sqlite3` on the remote, real permission errors, container/host visibility gaps, custom Hermes data directories). If you upgrade to v2.0.2 and *still* see incomplete data, run Remote Diagnostics from **Manage Servers → 🩺** and the sheet will tell you why.
+
+### Migrating from 2.0.0 / 2.0.1 / draft 2.0.1
+
+Sparkle will offer the update automatically. Settings and server list are preserved verbatim. The first time v2.0.2 connects to a remote, it'll create `/tmp/scarf-ssh-<uid>/` with mode 0700; the old `~/Library/Caches/scarf/ssh/` directory becomes unused (you can delete it manually, or leave it — macOS will sweep it eventually).
+
+The previous v2.0.1 draft download remains available for anyone who already grabbed it — it's still a valid build with the diagnostics work. v2.0.2 is the recommended upgrade path.
+
+### Reporters of #19
+
+@cmalpass, @flyespresso, @maikokan — please grab v2.0.2 and confirm the dashboard populates without needing to run Remote Diagnostics first. If it still doesn't, the diagnostics sheet should now have a much better chance of pinpointing what's left.
@@ -424,7 +424,7 @@
 				CODE_SIGN_ENTITLEMENTS = scarf/scarf.entitlements;
 				CODE_SIGN_STYLE = Automatic;
 				COMBINE_HIDPI_IMAGES = YES;
-				CURRENT_PROJECT_VERSION = 19;
+				CURRENT_PROJECT_VERSION = 21;
 				DEVELOPMENT_TEAM = 3Q6X2L86C4;
 				ENABLE_APP_SANDBOX = NO;
 				ENABLE_HARDENED_RUNTIME = YES;
@@ -436,7 +436,7 @@
 					"@executable_path/../Frameworks",
 				);
 				MACOSX_DEPLOYMENT_TARGET = 14.6;
-				MARKETING_VERSION = 2.0.0;
+				MARKETING_VERSION = 2.0.2;
 				PRODUCT_BUNDLE_IDENTIFIER = com.scarf.app;
 				PRODUCT_NAME = "$(TARGET_NAME)";
 				REGISTER_APP_GROUPS = YES;
@@ -458,7 +458,7 @@
 				CODE_SIGN_ENTITLEMENTS = scarf/scarf.entitlements;
 				CODE_SIGN_STYLE = Automatic;
 				COMBINE_HIDPI_IMAGES = YES;
-				CURRENT_PROJECT_VERSION = 19;
+				CURRENT_PROJECT_VERSION = 21;
 				DEVELOPMENT_TEAM = 3Q6X2L86C4;
 				ENABLE_APP_SANDBOX = NO;
 				ENABLE_HARDENED_RUNTIME = YES;
@@ -470,7 +470,7 @@
 					"@executable_path/../Frameworks",
 				);
 				MACOSX_DEPLOYMENT_TARGET = 14.6;
-				MARKETING_VERSION = 2.0.0;
+				MARKETING_VERSION = 2.0.2;
 				PRODUCT_BUNDLE_IDENTIFIER = com.scarf.app;
 				PRODUCT_NAME = "$(TARGET_NAME)";
 				REGISTER_APP_GROUPS = YES;
@@ -488,11 +488,11 @@
 			buildSettings = {
 				BUNDLE_LOADER = "$(TEST_HOST)";
 				CODE_SIGN_STYLE = Automatic;
-				CURRENT_PROJECT_VERSION = 19;
+				CURRENT_PROJECT_VERSION = 21;
 				DEVELOPMENT_TEAM = 3Q6X2L86C4;
 				GENERATE_INFOPLIST_FILE = YES;
 				MACOSX_DEPLOYMENT_TARGET = 26.2;
-				MARKETING_VERSION = 2.0.0;
+				MARKETING_VERSION = 2.0.2;
 				PRODUCT_BUNDLE_IDENTIFIER = com.scarfTests;
 				PRODUCT_NAME = "$(TARGET_NAME)";
 				STRING_CATALOG_GENERATE_SYMBOLS = NO;
@@ -509,11 +509,11 @@
 			buildSettings = {
 				BUNDLE_LOADER = "$(TEST_HOST)";
 				CODE_SIGN_STYLE = Automatic;
-				CURRENT_PROJECT_VERSION = 19;
+				CURRENT_PROJECT_VERSION = 21;
 				DEVELOPMENT_TEAM = 3Q6X2L86C4;
 				GENERATE_INFOPLIST_FILE = YES;
 				MACOSX_DEPLOYMENT_TARGET = 26.2;
-				MARKETING_VERSION = 2.0.0;
+				MARKETING_VERSION = 2.0.2;
 				PRODUCT_BUNDLE_IDENTIFIER = com.scarfTests;
 				PRODUCT_NAME = "$(TARGET_NAME)";
 				STRING_CATALOG_GENERATE_SYMBOLS = NO;
@@ -529,10 +529,10 @@
 			isa = XCBuildConfiguration;
 			buildSettings = {
 				CODE_SIGN_STYLE = Automatic;
-				CURRENT_PROJECT_VERSION = 19;
+				CURRENT_PROJECT_VERSION = 21;
 				DEVELOPMENT_TEAM = 3Q6X2L86C4;
 				GENERATE_INFOPLIST_FILE = YES;
-				MARKETING_VERSION = 2.0.0;
+				MARKETING_VERSION = 2.0.2;
 				PRODUCT_BUNDLE_IDENTIFIER = com.scarfUITests;
 				PRODUCT_NAME = "$(TARGET_NAME)";
 				STRING_CATALOG_GENERATE_SYMBOLS = NO;
@@ -548,10 +548,10 @@
 			isa = XCBuildConfiguration;
 			buildSettings = {
 				CODE_SIGN_STYLE = Automatic;
-				CURRENT_PROJECT_VERSION = 19;
+				CURRENT_PROJECT_VERSION = 21;
 				DEVELOPMENT_TEAM = 3Q6X2L86C4;
 				GENERATE_INFOPLIST_FILE = YES;
-				MARKETING_VERSION = 2.0.0;
+				MARKETING_VERSION = 2.0.2;
 				PRODUCT_BUNDLE_IDENTIFIER = com.scarfUITests;
 				PRODUCT_NAME = "$(TARGET_NAME)";
 				STRING_CATALOG_GENERATE_SYMBOLS = NO;
@@ -21,6 +21,10 @@ struct ContentView: View {
                        ServerSwitcherToolbar()
                    }
                    if serverContext.isRemote {
+                        // `.principal` centers the pill in the toolbar —
+                        // the native emphasis bezel is the intended frame;
+                        // the pill's own visual content (icon + label, no
+                        // background) sits inside it in balance.
                        ToolbarItem(placement: .principal) {
                            ConnectionStatusPill(status: connectionStatus)
                        }
@@ -129,6 +129,7 @@ final class ServerRegistry {
        var keep: Set<ServerID> = [ServerContext.local.id]
        for entry in entries { keep.insert(entry.id) }
        SSHTransport.sweepOrphanSnapshots(keeping: keep)
+        SSHTransport.sweepStaleControlSockets()
    }

    // MARK: - Persistence
@@ -1,5 +1,6 @@
 import Foundation
 import SQLite3
+import os

 /// Dedupes concurrent `snapshotSQLite` calls for the same server. When the
 /// file watcher ticks, Dashboard + Sessions + Activity (+ Chat's loadHistory)
@@ -29,11 +30,18 @@ actor SnapshotCoordinator {
 }

 actor HermesDataService {
+    private static let logger = Logger(subsystem: "com.scarf", category: "HermesDataService")
+
    private var db: OpaquePointer?
    private var hasV07Schema = false
    /// Local filesystem path we last opened. For remote contexts this is
    /// the cached snapshot under `~/Library/Caches/scarf/snapshots/<id>/`.
    private var openedAtPath: String?
+    /// Last error from `open()` / `refresh()`, user-presentable. `nil` means
+    /// the last attempt succeeded. Views surface this when their own load
+    /// path fails, so the user sees "Permission denied reading state.db"
+    /// instead of an empty Dashboard with no explanation.
+    private(set) var lastOpenError: String?

    let context: ServerContext
    private let transport: any ServerTransport
@@ -52,16 +60,25 @@ actor HermesDataService {
            // corrupt. Routed through SnapshotCoordinator so concurrent
            // view models don't each spawn a parallel SSH backup for the
            // same server.
-            let url = try? await SnapshotCoordinator.shared.snapshot(
-                remotePath: context.paths.stateDB,
-                contextID: context.id,
-                transport: transport
-            )
-            guard let url else { return false }
-            localPath = url.path
+            do {
+                let url = try await SnapshotCoordinator.shared.snapshot(
+                    remotePath: context.paths.stateDB,
+                    contextID: context.id,
+                    transport: transport
+                )
+                localPath = url.path
+                lastOpenError = nil
+            } catch {
+                lastOpenError = humanize(error)
+                Self.logger.warning("snapshotSQLite failed: \(error.localizedDescription, privacy: .public)")
+                return false
+            }
        } else {
            localPath = context.paths.stateDB
-            guard FileManager.default.fileExists(atPath: localPath) else { return false }
+            guard FileManager.default.fileExists(atPath: localPath) else {
+                lastOpenError = "Hermes state database not found at \(localPath)."
+                return false
+            }
        }
        // Remote snapshots are point-in-time copies that no one writes to;
        // opening them with `immutable=1` tells SQLite to skip WAL/SHM and
@@ -81,14 +98,41 @@ actor HermesDataService {
        }
        let result = sqlite3_open_v2(openPath, &db, flags, nil)
        guard result == SQLITE_OK else {
+            let msg: String
+            if let db {
+                msg = String(cString: sqlite3_errmsg(db))
+            } else {
+                msg = "sqlite3_open_v2 returned \(result)"
+            }
+            lastOpenError = "Couldn't open state.db: \(msg)"
+            Self.logger.warning("sqlite3_open_v2 failed (\(result)) at \(localPath, privacy: .public): \(msg, privacy: .public)")
            db = nil
            return false
        }
        openedAtPath = localPath
+        lastOpenError = nil
        detectSchema()
        return true
    }

+    /// Turn a transport error into the one-line string Dashboard shows. Adds
+    /// hints for the common "sqlite3 not installed", "permission denied", and
+    /// "file not found" cases so users know what to do.
+    private nonisolated func humanize(_ error: Error) -> String {
+        let desc = (error as? LocalizedError)?.errorDescription ?? error.localizedDescription
+        let lower = desc.lowercased()
+        if lower.contains("sqlite3: command not found") || lower.contains("sqlite3: not found") {
+            return "sqlite3 is not installed on \(context.displayName). Install it with `apt install sqlite3` (Ubuntu/Debian) or `yum install sqlite` (RHEL/Fedora)."
+        }
+        if lower.contains("permission denied") {
+            return "Permission denied reading Hermes state on \(context.displayName). The SSH user may not have read access to ~/.hermes/state.db — try Run Diagnostics."
+        }
+        if lower.contains("no such file") {
+            return "Hermes state not found at ~/.hermes on \(context.displayName). If Hermes is installed elsewhere, set its data directory in Manage Servers."
+        }
+        return desc
+    }
+
    /// Force a fresh snapshot pull + reopen. Used on session-load and in
    /// any path that needs the UI to reflect writes Hermes just made.
    /// Without this, remote snapshots would be frozen at the first `open()`
@@ -1,7 +1,10 @@
 import Foundation
+import os

 struct HermesFileService: Sendable {

+    nonisolated static let logger = Logger(subsystem: "com.scarf", category: "HermesFileService")
+
    let context: ServerContext
    let transport: any ServerTransport

@@ -17,6 +20,14 @@ struct HermesFileService: Sendable {
        return parseConfig(content)
    }

+    /// Error-surfacing config load. Used by Dashboard to show the user a
+    /// specific reason when config.yaml can't be read on a remote host
+    /// (permission denied, missing file, connection drop, etc.)
+    /// instead of silently falling back to `.empty`.
+    nonisolated func loadConfigResult() -> Result<HermesConfig, Error> {
+        readFileResult(context.paths.configYAML).map { parseConfig($0) }
+    }
+
    nonisolated private func parseConfig(_ yaml: String) -> HermesConfig {
        let parsed = Self.parseNestedYAML(yaml)
        let values = parsed.values
@@ -385,11 +396,34 @@ struct HermesFileService: Sendable {
        do {
            return try JSONDecoder().decode(GatewayState.self, from: data)
        } catch {
-            print("[Scarf] Failed to decode gateway state: \(error.localizedDescription)")
+            Self.logger.warning("Failed to decode gateway state: \(error.localizedDescription, privacy: .public)")
            return nil
        }
    }

+    /// Error-surfacing gateway-state load. `.success(nil)` means the file
+    /// doesn't exist yet (gateway hasn't written state — normal when Hermes
+    /// is stopped). `.failure` means the file exists but couldn't be read
+    /// (permission denied, connection down, JSON corruption).
+    nonisolated func loadGatewayStateResult() -> Result<GatewayState?, Error> {
+        // Distinguish "file doesn't exist yet" (normal, returns .success(nil))
+        // from "file exists but we can't read or parse it" (error).
+        if !transport.fileExists(context.paths.gatewayStateJSON) {
+            return .success(nil)
+        }
+        switch readFileDataResult(context.paths.gatewayStateJSON) {
+        case .success(let data):
+            do {
+                return .success(try JSONDecoder().decode(GatewayState.self, from: data))
+            } catch {
+                Self.logger.warning("Failed to decode gateway state: \(error.localizedDescription, privacy: .public)")
+                return .failure(error)
+            }
+        case .failure(let err):
+            return .failure(err)
+        }
+    }
+
    // MARK: - Memory

    nonisolated func loadMemoryProfiles() -> [String] {
@@ -1173,22 +1207,45 @@ struct HermesFileService: Sendable {
    }

    nonisolated func hermesPID() -> pid_t? {
-        // Run `pgrep -f hermes` either locally or via the transport. On
-        // remote hosts we trust `pgrep` to be present — it's standard on
-        // Linux and macOS. On failure we conservatively return nil rather
-        // than pretending Hermes is down: the caller will see
-        // isHermesRunning==false, which is already the "unknown" UX.
-        let result = try? transport.runProcess(
-            executable: "/usr/bin/pgrep",
-            args: ["-f", "hermes"],
-            stdin: nil,
-            timeout: 5
-        )
-        guard let result, let firstLine = result.stdoutString
-            .components(separatedBy: "\n")
-            .first(where: { !$0.isEmpty }),
-              let pid = pid_t(firstLine.trimmingCharacters(in: .whitespaces)) else { return nil }
-        return pid
+        switch hermesPIDResult() {
+        case .success(let pid): return pid
+        case .failure: return nil
+        }
+    }
+
+    /// Error-surfacing variant. `.success(nil)` means `pgrep` ran successfully
+    /// and found no hermes process (Hermes is genuinely not running).
+    /// `.failure` means we couldn't probe at all (pgrep missing, connection
+    /// down, permission issue) — a *different* UX from "not running".
+    nonisolated func hermesPIDResult() -> Result<pid_t?, Error> {
+        do {
+            let result = try transport.runProcess(
+                executable: "/usr/bin/pgrep",
+                args: ["-f", "hermes"],
+                stdin: nil,
+                timeout: 5
+            )
+            // pgrep exits 1 when nothing matches — that's "not running", NOT an
+            // error. Anything else (127=command not found, 255=ssh failure) is.
+            if result.exitCode == 0 {
+                if let firstLine = result.stdoutString
+                    .components(separatedBy: "\n")
+                    .first(where: { !$0.isEmpty }),
+                   let pid = pid_t(firstLine.trimmingCharacters(in: .whitespaces)) {
+                    return .success(pid)
+                }
+                return .success(nil)
+            } else if result.exitCode == 1 {
+                return .success(nil)   // genuinely not running
+            } else {
+                let err = TransportError.commandFailed(exitCode: result.exitCode, stderr: result.stderrString)
+                Self.logger.warning("pgrep failed (exit \(result.exitCode)): \(result.stderrString, privacy: .public)")
+                return .failure(err)
+            }
+        } catch {
+            Self.logger.warning("pgrep transport error: \(error.localizedDescription, privacy: .public)")
+            return .failure(error)
+        }
    }

    @discardableResult
@@ -1488,15 +1545,80 @@ struct HermesFileService: Sendable {
    // MARK: - File I/O

    /// Read a UTF-8 text file through the transport. Missing files and any
-    /// transport error surface as `nil` — callers treat missing/unreadable
-    /// the same way they always have.
+    /// transport error surface as `nil` — callers that don't need the
+    /// specific error reason keep using this. New call sites that want to
+    /// show a user-actionable message should use `readFileResult`.
    nonisolated private func readFile(_ path: String) -> String? {
-        guard let data = try? transport.readFile(path) else { return nil }
-        return String(data: data, encoding: .utf8)
+        switch readFileResult(path) {
+        case .success(let s):
+            return s
+        case .failure:
+            return nil
+        }
    }

    nonisolated private func readFileData(_ path: String) -> Data? {
-        try? transport.readFile(path)
+        switch readFileDataResult(path) {
+        case .success(let d):
+            return d
+        case .failure:
+            return nil
+        }
+    }
+
+    /// Error-surfacing read. Returns the decoded text on success, or the
+    /// underlying `TransportError` (or raw error for local failures) on
+    /// failure. Every failure except a routine "no such file" is also logged
+    /// via `os.Logger`; the warning trail in Console.app is how we diagnose
+    /// "connection green, data empty" bug reports without needing to wire
+    /// the error through every existing call site.
+    nonisolated func readFileResult(_ path: String) -> Result<String, Error> {
+        switch readFileDataResult(path) {
+        case .success(let data):
+            guard let s = String(data: data, encoding: .utf8) else {
+                let err = TransportError.fileIO(path: path, underlying: "file is not valid UTF-8")
+                Self.logger.warning("readFile(\(path, privacy: .public)): not UTF-8")
+                return .failure(err)
+            }
+            return .success(s)
+        case .failure(let err):
+            return .failure(err)
+        }
+    }
+
+    nonisolated func readFileDataResult(_ path: String) -> Result<Data, Error> {
+        do {
+            let data = try transport.readFile(path)
+            return .success(data)
+        } catch {
+            // Don't log "No such file" — that's a routine, expected case
+            // for optional files (skill.yaml, gateway_state.json before
+            // Hermes starts, ~/.hermes/memories/USER.md on fresh installs,
+            // etc.). The caller still gets the Result.failure so it can
+            // distinguish missing from present-but-unreadable.
+            // Log everything else — permission denied, connection drops,
+            // sqlite3 missing — since those are actionable diagnostics.
+            if !Self.isFileNotFound(error) {
+                Self.logger.warning("readFile(\(path, privacy: .public)) failed: \(error.localizedDescription, privacy: .public)")
+            }
+            return .failure(error)
+        }
+    }
+
+    /// `true` iff the error represents "file does not exist" as opposed to
+    /// a permission / transport / parse failure. Used to suppress routine
+    /// logging for optional files while still surfacing real problems.
+    nonisolated private static func isFileNotFound(_ error: Error) -> Bool {
+        if let transportErr = error as? TransportError,
+           case .fileIO(_, let underlying) = transportErr {
+            return underlying.lowercased().contains("no such file")
+        }
+        // Cocoa NSFileReadNoSuchFileError (code 260, returned by LocalTransport
+        // when reading a missing file via FileManager).
+        let ns = error as NSError
+        if ns.domain == NSCocoaErrorDomain && ns.code == 260 { return true }
+        if ns.domain == NSPOSIXErrorDomain && ns.code == 2 { return true }   // ENOENT
+        return false
    }

    /// Write a UTF-8 text file atomically through the transport. Matches the
@@ -52,10 +52,13 @@ struct SSHTransport: ServerTransport {
    /// per-host via OpenSSH's `%C` token). Exposed as a static so
    /// cleanup paths (`ServerRegistry.removeServer`, app-launch sweep) can
    /// compute it without instantiating a transport.
+    ///
+    /// Uses a short path under /tmp to stay within the 104-byte macOS
+    /// Unix domain socket limit. The Caches path
+    /// (~/Library/Caches/scarf/ssh/%C) can exceed this limit when the
+    /// username is long, causing ssh to exit 255.
    nonisolated static func controlDirPath() -> String {
-        let base = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first?.path
-            ?? NSHomeDirectory() + "/Library/Caches"
-        return base + "/scarf/ssh"
+        return "/tmp/scarf-ssh-\(getuid())"
    }

    /// Snapshot cache directory for a given server. Stable per-ID so repeated
@@ -94,6 +97,31 @@ struct SSHTransport: ServerTransport {
        }
    }

+    /// Remove ControlMaster socket files older than `staleAfter` seconds.
+    ///
+    /// Socket basenames are %C hashes (not ServerIDs), so we can't keep "still
+    /// registered" sockets the way `sweepOrphanSnapshots` does. But
+    /// `ControlPersist` is 600s of idle time, so a socket older than 30 minutes
+    /// is most likely a dead orphan from a crashed master, an unclean app exit,
+    /// or a server removed while another Scarf instance was holding the dir.
+    /// Wiping these on launch keeps `/tmp/scarf-ssh-<uid>/` from accumulating
+    /// indefinitely until reboot. Caveat: a master kept busy by a concurrent
+    /// Scarf instance can outlive the cutoff and lose its socket file here.
+    static func sweepStaleControlSockets(staleAfter: TimeInterval = 1800) {
+        let root = controlDirPath()
+        guard let entries = try? FileManager.default.contentsOfDirectory(atPath: root) else { return }
+        let cutoff = Date().addingTimeInterval(-staleAfter)
+        for name in entries {
+            let path = root + "/" + name
+            guard let attrs = try? FileManager.default.attributesOfItem(atPath: path),
+                  let mtime = attrs[.modificationDate] as? Date
+            else { continue }
+            if mtime < cutoff {
+                try? FileManager.default.removeItem(atPath: path)
+            }
+        }
+    }
+
    /// Ask OpenSSH to shut down this host's ControlMaster socket, so the TCP
    /// session isn't held open after the user removes this server. If no
    /// master is currently running, `ssh -O exit` exits non-zero — we ignore
@@ -137,13 +165,47 @@ struct SSHTransport: ServerTransport {
        return args
    }

-    /// Ensure the ControlMaster socket directory exists. Called before every
-    /// ssh invocation. Cheap — `createDirectory(withIntermediateDirectories: true)`
-    /// is a no-op when present.
+    /// Ensure the ControlMaster socket directory exists, is a real directory
+    /// (not a symlink), is owned by us, and has mode 0700. Called before every
+    /// ssh invocation.
+    ///
+    /// Defensive against `/tmp` pre-creation: any local user can create
+    /// `/tmp/scarf-ssh-<uid>` before Scarf launches. Plain `mkdir -p` plus
+    /// `setAttributes` would silently accept a hostile dir (the chmod fails
+    /// when we don't own it, and `try?` discards that error). So we use
+    /// POSIX `mkdir` (atomic, sets perms at create time, doesn't follow
+    /// symlinks) and `lstat` to verify type, ownership, and mode when the
+    /// entry already exists.
    nonisolated private func ensureControlDir() {
-        try? FileManager.default.createDirectory(atPath: controlDir, withIntermediateDirectories: true)
-        // 0700 so socket files aren't visible to other users on the Mac.
-        try? FileManager.default.setAttributes([.posixPermissions: 0o700], ofItemAtPath: controlDir)
+        let path = controlDir
+
+        let mkResult = path.withCString { mkdir($0, 0o700) }
+        if mkResult == 0 { return }
+
+        let mkErr = errno
+        if mkErr != EEXIST {
+            Self.logger.error("Failed to create ControlDir \(path, privacy: .public): errno=\(mkErr)")
+            return
+        }
+
+        var st = Darwin.stat()
+        let lstatResult = path.withCString { lstat($0, &st) }
+        guard lstatResult == 0 else {
+            Self.logger.error("Could not lstat existing ControlDir \(path, privacy: .public): errno=\(errno)")
+            return
+        }
+        guard (st.st_mode & S_IFMT) == S_IFDIR else {
+            Self.logger.error("ControlDir \(path, privacy: .public) exists but is not a directory (possibly a symlink) — refusing to use")
+            return
+        }
+        guard st.st_uid == getuid() else {
+            Self.logger.error("ControlDir \(path, privacy: .public) owned by uid \(st.st_uid), expected \(getuid()) — refusing to use")
+            return
+        }
+        if (st.st_mode & 0o777) != 0o700 {
+            Self.logger.warning("ControlDir \(path, privacy: .public) had mode \(String(st.st_mode & 0o777, radix: 8), privacy: .public), repairing to 700")
+            _ = path.withCString { chmod($0, 0o700) }
+        }
    }

    /// Shell-quote a single argument for remote execution. The remote shell
@@ -21,28 +21,73 @@ final class DashboardViewModel {
    var hermesRunning = false
    var isLoading = true

+    /// User-presentable error banner. Set when any of the remote reads
+    /// (state.db snapshot, config.yaml, gateway_state.json, pgrep) failed
+    /// in a way that's not just "file doesn't exist yet". Dashboard renders
+    /// this above the stats with a "Run Diagnostics…" button. `nil` = no
+    /// surfaceable error.
+    var lastReadError: String?
+
    func load() async {
        isLoading = true
        // refresh() = close + reopen, forces a fresh remote snapshot. Cheap
        // on local (live DB reopen).
        let opened = await dataService.refresh()
+        var collectedErrors: [String] = []
        if opened {
            stats = await dataService.fetchStats()
            recentSessions = await dataService.fetchSessions(limit: 5)
            sessionPreviews = await dataService.fetchSessionPreviews(limit: 5)
            await dataService.close()
+        } else if let msg = await dataService.lastOpenError {
+            collectedErrors.append(msg)
        }
        // The fileService methods are synchronous and route through the
        // transport. For remote contexts each call is a blocking ssh
        // round-trip — do them off the main thread to avoid spinning the
        // beach ball during the load.
        let svc = fileService
-        let (cfg, gw, running) = await Task.detached {
-            (svc.loadConfig(), svc.loadGatewayState(), svc.isHermesRunning())
+        struct LoadResults: Sendable {
+            let cfg: Result<HermesConfig, Error>
+            let gw: Result<GatewayState?, Error>
+            let running: Result<pid_t?, Error>
+        }
+        let results = await Task.detached { () -> LoadResults in
+            LoadResults(
+                cfg: svc.loadConfigResult(),
+                gw: svc.loadGatewayStateResult(),
+                running: svc.hermesPIDResult()
+            )
        }.value
-        config = cfg
-        gatewayState = gw
-        hermesRunning = running
+
+        switch results.cfg {
+        case .success(let c): config = c
+        case .failure(let e):
+            config = .empty
+            collectedErrors.append("config.yaml — \(e.localizedDescription)")
+        }
+        switch results.gw {
+        case .success(let g): gatewayState = g
+        case .failure(let e):
+            gatewayState = nil
+            collectedErrors.append("gateway_state.json — \(e.localizedDescription)")
+        }
+        switch results.running {
+        case .success(let pid): hermesRunning = (pid != nil)
+        case .failure(let e):
+            hermesRunning = false
+            collectedErrors.append("pgrep — \(e.localizedDescription)")
+        }
+
+        // Only surface when there's a real error AND we're on a remote
+        // context. Local contexts rarely hit these paths (live DB, local
+        // filesystem), and a transient "file doesn't exist yet" on fresh
+        // installs shouldn't scare users.
+        if context.isRemote, !collectedErrors.isEmpty {
+            lastReadError = collectedErrors.joined(separator: "\n")
+        } else {
+            lastReadError = nil
+        }
        isLoading = false
    }
 }
@@ -2,6 +2,7 @@ import SwiftUI

 struct DashboardView: View {
    @State private var viewModel: DashboardViewModel
+    @State private var showDiagnostics = false
    @Environment(AppCoordinator.self) private var coordinator
    @Environment(HermesFileWatcher.self) private var fileWatcher

@@ -13,6 +14,9 @@ struct DashboardView: View {
    var body: some View {
        ScrollView {
            VStack(alignment: .leading, spacing: 20) {
+                if let err = viewModel.lastReadError {
+                    readErrorBanner(err)
+                }
                statusSection
                statsSection
                recentSessionsSection
@@ -30,6 +34,44 @@ struct DashboardView: View {
        .onChange(of: fileWatcher.lastChangeDate) {
            Task { await viewModel.load() }
        }
+        .sheet(isPresented: $showDiagnostics) {
+            RemoteDiagnosticsView(context: viewModel.context)
+        }
+    }
+
+    /// Banner shown above the Dashboard when one or more remote reads
+    /// failed (permission denied, missing sqlite3, wrong home dir, etc.).
+    /// Replaces the old silent-failure mode where empty values just
+    /// appeared as "Stopped / unknown / 0" with no explanation.
+    private func readErrorBanner(_ err: String) -> some View {
+        VStack(alignment: .leading, spacing: 8) {
+            HStack(alignment: .top, spacing: 8) {
+                Image(systemName: "exclamationmark.triangle.fill")
+                    .foregroundStyle(.orange)
+                VStack(alignment: .leading, spacing: 4) {
+                    Text("Can't read Hermes state on \(viewModel.context.displayName)")
+                        .font(.headline)
+                    Text(err)
+                        .font(.caption.monospaced())
+                        .foregroundStyle(.secondary)
+                        .textSelection(.enabled)
+                        .fixedSize(horizontal: false, vertical: true)
+                }
+                Spacer()
+                Button {
+                    showDiagnostics = true
+                } label: {
+                    Label("Run Diagnostics…", systemImage: "stethoscope")
+                }
+                .controlSize(.regular)
+            }
+        }
+        .padding(12)
+        .background(Color.orange.opacity(0.1), in: RoundedRectangle(cornerRadius: 8))
+        .overlay(
+            RoundedRectangle(cornerRadius: 8)
+                .strokeBorder(Color.orange.opacity(0.3), lineWidth: 1)
+        )
    }

    private var statusSection: some View {
@@ -22,7 +22,12 @@ final class AddServerViewModel {
    var testResult: TestResult?

    enum TestResult: Equatable {
-        case success(hermesPath: String, dbFound: Bool)
+        /// `suggestedRemoteHome` is non-nil when the probe didn't find
+        /// state.db at the configured (or default) path but did find a
+        /// `state.db` at one of the well-known alternates (e.g. a systemd
+        /// install in `/var/lib/hermes/.hermes`). UI offers a one-click
+        /// fill so the user doesn't have to know the convention.
+        case success(hermesPath: String, dbFound: Bool, suggestedRemoteHome: String?)
        /// `command` is the full ssh invocation we attempted (so the user can
        /// paste it into Terminal to see what their shell does with it).
        /// `stderr` is whatever ssh / the remote shell wrote to stderr.
@@ -95,7 +100,7 @@ final class AddServerViewModel {
    /// `hermesBinaryHint` so subsequent calls don't need to re-resolve it.
    func configForSave() -> SSHConfig {
        var cfg = draftConfig
-        if case .success(let path, _) = testResult {
+        if case .success(let path, _, _) = testResult {
            cfg.hermesBinaryHint = path
        }
        return cfg
@@ -11,8 +11,12 @@ final class ConnectionStatusViewModel {
    private let logger = Logger(subsystem: "com.scarf", category: "ConnectionStatus")

    enum Status: Equatable {
-        /// Healthy: most recent probe succeeded.
+        /// Healthy: SSH connected AND we can read `~/.hermes/config.yaml`.
        case connected
+        /// SSH connects but the follow-up read-access probe failed. Data
+        /// views will be empty until this is resolved. `reason` is shown
+        /// in the pill tooltip; users click the pill to open diagnostics.
+        case degraded(reason: String)
        /// No probe yet or the previous probe timed out but we haven't
        /// confirmed failure. Shown as yellow to tell the user "checking…".
        case idle
@@ -72,20 +76,59 @@ final class ConnectionStatusViewModel {

    private func probeOnce() async {
        let snapshot = transport
-        let result: Result<Void, TransportError>
-        // Transport IO on a detached task so we don't block MainActor.
-        result = await Task.detached {
+        let hermesHome = context.paths.home
+        // Two-tier probe in one SSH round-trip:
+        //   tier 1: `true` — raw connectivity / auth / ControlMaster path
+        //   tier 2: `test -r $HERMESHOME/config.yaml` — can we actually
+        //           read the file Dashboard reads on every tick? Green pill
+        //           only if both pass; yellow "degraded" if tier 1 passes
+        //           but tier 2 fails (the exact symptom in issue #19).
+        // Script emits two lines: TIER1:<exitcode> and TIER2:<exitcode>.
+        let homeArg: String
+        if hermesHome.hasPrefix("~/") {
+            homeArg = "\"$HOME/\(hermesHome.dropFirst(2))\""
+        } else if hermesHome == "~" {
+            homeArg = "\"$HOME\""
+        } else {
+            homeArg = "'\(hermesHome.replacingOccurrences(of: "'", with: "'\\''"))'"
+        }
+        let script = """
+        echo TIER1:0
+        H=\(homeArg)
+        if [ -r "$H/config.yaml" ]; then echo TIER2:0; else echo TIER2:1; fi
+        """
+
+        enum ProbeOutcome {
+            case connected
+            case degraded(reason: String)
+            case failure(TransportError)
+        }
+
+        let outcome: ProbeOutcome = await Task.detached {
            do {
                let probe = try snapshot.runProcess(
                    executable: "/bin/sh",
-                    args: ["-c", "true"],
+                    args: ["-c", script],
                    stdin: nil,
                    timeout: 10
                )
-                if probe.exitCode == 0 {
-                    return .success(())
+                guard probe.exitCode == 0 else {
+                    return .failure(.commandFailed(exitCode: probe.exitCode, stderr: probe.stderrString))
                }
-                return .failure(.commandFailed(exitCode: probe.exitCode, stderr: probe.stderrString))
+                let out = probe.stdoutString
+                let tier1 = out.contains("TIER1:0")
+                let tier2 = out.contains("TIER2:0")
+                if !tier1 {
+                    // The script itself didn't reach tier 1 — treat as connection failure.
+                    return .failure(.commandFailed(exitCode: 1, stderr: out))
+                }
+                if tier2 {
+                    return .connected
+                }
+                // Connected but can't read config.yaml — the core issue #19
+                // symptom. Give the pill a short reason; the full story goes
+                // into Remote Diagnostics.
+                return .degraded(reason: "can't read \(hermesHome)/config.yaml")
            } catch let e as TransportError {
                return .failure(e)
            } catch {
@@ -93,11 +136,15 @@ final class ConnectionStatusViewModel {
            }
        }.value

-        switch result {
-        case .success:
+        switch outcome {
+        case .connected:
            status = .connected
            lastSuccess = Date()
            consecutiveFailures = 0
+        case .degraded(let reason):
+            status = .degraded(reason: reason)
+            lastSuccess = Date()   // SSH itself is fine, reset failure count
+            consecutiveFailures = 0
        case .failure(let err):
            consecutiveFailures += 1
            // First failure → silent yellow "Reconnecting…" while we try
@@ -0,0 +1,474 @@
+import Foundation
+import os
+
+/// Runs a fixed check-list against a remote server and reports per-probe
+/// pass/fail. Exists because `TestConnectionProbe` only verifies ssh
+/// connectivity + hermes binary presence, and `ConnectionStatusViewModel`
+/// only probes connectivity and `config.yaml`. When users file "connection green but
+/// everything empty" bug reports (issue #19), this is the diagnostic surface
+/// that tells them (and us) exactly which read fails and why.
+///
+/// One shell invocation runs every check on the remote and emits a
+/// line-delimited `KEY|STATUS|DETAIL` protocol that the view model parses.
+/// Cheaper than one SSH round-trip per probe and gives a consistent shell
+/// environment across all probes.
+@Observable
+@MainActor
+final class RemoteDiagnosticsViewModel {
+    private static let logger = Logger(subsystem: "com.scarf", category: "RemoteDiagnostics")
+
+    let context: ServerContext
+
+    /// Probes in display order. The order matters: connectivity first, then
+    /// environment checks, then Hermes data-path checks. A failure early in
+    /// the list usually explains every subsequent failure.
+    enum ProbeID: String, CaseIterable, Identifiable {
+        case connectivity
+        case remoteUser
+        case remoteHome
+        case hermesHomeConfigured
+        case hermesDirExists
+        case hermesDirReadable
+        case configYAMLReadable
+        case configYAMLContents
+        case stateDBReadable
+        case sqlite3Installed
+        case sqlite3CanOpenStateDB
+        case hermesBinaryNonLogin
+        case hermesBinaryLogin
+        case pgrepAvailable
+
+        var id: String { rawValue }
+
+        /// Human-readable title rendered in the diagnostics sheet.
+        var title: String {
+            switch self {
+            case .connectivity:           return "SSH connectivity"
+            case .remoteUser:             return "Remote user identity"
+            case .remoteHome:             return "Remote $HOME"
+            case .hermesHomeConfigured:   return "Hermes home directory"
+            case .hermesDirExists:        return "Hermes directory exists"
+            case .hermesDirReadable:      return "Hermes directory readable"
+            case .configYAMLReadable:     return "config.yaml readable"
+            case .configYAMLContents:     return "config.yaml contents readable"
+            case .stateDBReadable:        return "state.db readable"
+            case .sqlite3Installed:       return "sqlite3 binary installed on remote"
+            case .sqlite3CanOpenStateDB:  return "sqlite3 can open state.db"
+            case .hermesBinaryNonLogin:   return "hermes binary on non-login PATH"
+            case .hermesBinaryLogin:      return "hermes binary on login PATH (via rc files)"
+            case .pgrepAvailable:         return "pgrep available (for 'is Hermes running')"
+            }
+        }
+
+        /// When the check fails, show this hint alongside the stderr.
+        var failureHint: String? {
+            switch self {
+            case .connectivity:
+                return "SSH itself can't complete. Before re-testing in Scarf, confirm `ssh <host>` works in Terminal."
+            case .remoteUser, .remoteHome:
+                return nil
+            case .hermesHomeConfigured:
+                return nil
+            case .hermesDirExists:
+                return "Scarf is looking at the default `~/.hermes`. If Hermes is installed elsewhere (e.g. `/var/lib/hermes/.hermes` for systemd installs), set the Hermes home directory in Manage Servers → this server → Edit."
+            case .hermesDirReadable:
+                return "The SSH user can see `~/.hermes` but can't list it. Check permissions: `ls -ld ~/.hermes` on the remote — the SSH user needs at least `r-x`."
+            case .configYAMLReadable, .configYAMLContents:
+                return "Scarf can't read `config.yaml`. This usually means the SSH user is different from the user Hermes runs as. Either (a) run Hermes as the SSH user, (b) `chmod a+r ~/.hermes/config.yaml`, or (c) configure Scarf to SSH as the Hermes user."
+            case .stateDBReadable:
+                return "Scarf can't read `state.db` — Sessions, Activity, Dashboard stats all depend on this. Same fix pattern as config.yaml."
+            case .sqlite3Installed:
+                return "Scarf pulls a snapshot of state.db via `sqlite3 .backup`, so sqlite3 must be installed on the remote. Install: `sudo apt install sqlite3` (Ubuntu/Debian), `sudo yum install sqlite` (RHEL/Fedora), `apk add sqlite` (Alpine)."
+            case .sqlite3CanOpenStateDB:
+                return "sqlite3 exists but can't open state.db. Could be a permission issue, a corrupt DB, or a version skew."
+            case .hermesBinaryNonLogin:
+                return "Scarf's runtime calls use non-login SSH shells (no .bashrc). If `hermes` only appears here via the login path, runtime CLI calls will fail. Move your PATH export from `.bashrc` to `.zshenv` or `.profile`."
+            case .hermesBinaryLogin:
+                return "hermes couldn't be located even after sourcing login rc files. Install path is non-standard — set the hermes binary path manually in Manage Servers."
+            case .pgrepAvailable:
+                return "pgrep not found on remote. Dashboard can't determine whether Hermes is running. Install procps: `apt install procps` (most distros have it by default)."
+            }
+        }
+    }
+
+    struct Probe: Identifiable, Sendable {
+        let id: ProbeID
+        let passed: Bool
+        let detail: String
+    }
+
+    private(set) var probes: [Probe] = []
+    private(set) var isRunning: Bool = false
+    private(set) var startedAt: Date?
+    private(set) var finishedAt: Date?
+    /// Raw stdout/stderr from the most recent run, preserved so the UI can
+    /// surface them in a disclosure panel when things look wrong. This is
+    /// how we debug cases where the script ran but no probes were parsed
+    /// (e.g. transport-quoting bugs, dash-vs-bash incompatibilities).
+    private(set) var rawStdout: String = ""
+    private(set) var rawStderr: String = ""
+    private(set) var rawExitCode: Int32 = 0
+
+    init(context: ServerContext) {
+        self.context = context
+    }
+
+    /// Kick off the full check list. Safe to call again to re-run.
+    func run() async {
+        if isRunning { return }
+        isRunning = true
+        probes = []
+        startedAt = Date()
+        finishedAt = nil
+
+        let script = Self.buildScript(hermesHome: context.paths.home)
+        let captured = await Self.execute(script: script, context: context)
+
+        switch captured {
+        case .connectFailure(let msg):
+            rawStdout = ""
+            rawStderr = msg
+            rawExitCode = -1
+            probes = [
+                Probe(id: .connectivity, passed: false, detail: msg)
+            ] + ProbeID.allCases
+                .filter { $0 != .connectivity }
+                .map { Probe(id: $0, passed: false, detail: "(skipped — SSH didn't connect)") }
+        case .completed(let stdout, let stderr, let exitCode):
+            rawStdout = stdout
+            rawStderr = stderr
+            rawExitCode = exitCode
+            probes = Self.parse(stdout: stdout, stderr: stderr, exitCode: exitCode)
+        }
+
+        finishedAt = Date()
+        isRunning = false
+        Self.logger.info("Diagnostics for \(self.context.displayName, privacy: .public) finished — \(self.passingCount)/\(self.probes.count) passing")
+    }
+
+    /// Quick summary string, e.g. "9/14 checks passing". Used in the header.
+    var summary: String {
+        guard !probes.isEmpty else { return "Not yet run." }
+        return "\(passingCount)/\(probes.count) checks passing"
+    }
+
+    var passingCount: Int {
+        probes.filter { $0.passed }.count
+    }
+
+    var allPassed: Bool {
+        !probes.isEmpty && passingCount == probes.count
+    }
+
+    // MARK: - Script + parsing
+
+    /// Build the remote shell script. Uses a pipe-delimited protocol so the
+    /// Swift side can parse without regex surprises. Status is `PASS` or
+    /// `FAIL`; detail is a single line (can be blank). `__END__` at the
+    /// bottom lets us detect truncation.
+    private static func buildScript(hermesHome: String) -> String {
+        // Shell-quote the home path — user may have typed `~/.hermes` which
+        // we want the remote shell to expand, so we substitute `~/` with
+        // `$HOME/` like `SSHTransport.remotePathArg` does.
+        let expanded: String
+        if hermesHome.hasPrefix("~/") {
+            expanded = "\"$HOME/\(hermesHome.dropFirst(2))\""
+        } else if hermesHome == "~" {
+            expanded = "\"$HOME\""
+        } else {
+            // Absolute path: single-quote so spaces, `$`, and backticks stay literal.
+            expanded = "'\(hermesHome.replacingOccurrences(of: "'", with: "'\\''"))'"
+        }
+
+        return #"""
+        H=\#(expanded)
+        emit() { printf '%s|%s|%s\n' "$1" "$2" "$3"; }
+
+        emit connectivity PASS "(running in this shell)"
+
+        user=$(id -un 2>/dev/null || echo unknown)
+        emit remoteUser PASS "$user"
+
+        emit remoteHome PASS "$HOME"
+
+        emit hermesHomeConfigured PASS "$H"
+
+        if [ -d "$H" ]; then
+            emit hermesDirExists PASS "$H"
+        else
+            emit hermesDirExists FAIL "not a directory: $H"
+        fi
+
+        if [ -r "$H" ] && [ -x "$H" ]; then
+            emit hermesDirReadable PASS ""
+        else
+            emit hermesDirReadable FAIL "cannot read/enter $H (check perms on the dir)"
+        fi
+
+        if [ -r "$H/config.yaml" ]; then
+            emit configYAMLReadable PASS ""
+        else
+            if [ -e "$H/config.yaml" ]; then
+                emit configYAMLReadable FAIL "exists but not readable by $user"
+            else
+                emit configYAMLReadable FAIL "file does not exist"
+            fi
+        fi
+
+        if head -c 1 "$H/config.yaml" > /dev/null 2>&1; then
+            size=$(wc -c < "$H/config.yaml" 2>/dev/null | tr -d ' ')
+            emit configYAMLContents PASS "${size} bytes"
+        else
+            emit configYAMLContents FAIL "cannot read file contents"
+        fi
+
+        if [ -r "$H/state.db" ]; then
+            size=$(wc -c < "$H/state.db" 2>/dev/null | tr -d ' ')
+            emit stateDBReadable PASS "${size} bytes"
+        else
+            if [ -e "$H/state.db" ]; then
+                emit stateDBReadable FAIL "exists but not readable by $user"
+            else
+                emit stateDBReadable FAIL "file does not exist"
+            fi
+        fi
+
+        if command -v sqlite3 > /dev/null 2>&1; then
+            sq=$(command -v sqlite3)
+            emit sqlite3Installed PASS "$sq"
+        else
+            emit sqlite3Installed FAIL "sqlite3 not on PATH"
+        fi
+
+        if sqlite3 "$H/state.db" 'SELECT 1' > /dev/null 2>&1; then
+            emit sqlite3CanOpenStateDB PASS ""
+        else
+            err=$(sqlite3 "$H/state.db" 'SELECT 1' 2>&1 | head -1)
+            emit sqlite3CanOpenStateDB FAIL "$err"
+        fi
+
+        # Non-login PATH: just ask the current shell.
+        hpath=$(command -v hermes 2>/dev/null)
+        if [ -n "$hpath" ]; then
+            emit hermesBinaryNonLogin PASS "$hpath"
+        else
+            emit hermesBinaryNonLogin FAIL "not on non-login PATH ($PATH)"
+        fi
+
+        # Login PATH: source rc files (mirroring TestConnectionProbe) and re-probe.
+        for rc in "$HOME/.zshenv" "$HOME/.zprofile" "$HOME/.bash_profile" "$HOME/.profile"; do
+            [ -f "$rc" ] && . "$rc" 2>/dev/null
+        done
+        hpath2=$(command -v hermes 2>/dev/null)
+        if [ -z "$hpath2" ]; then
+            for cand in "$HOME/.local/bin/hermes" "/opt/homebrew/bin/hermes" "/usr/local/bin/hermes" "$HOME/.hermes/bin/hermes"; do
+                if [ -x "$cand" ]; then hpath2="$cand"; break; fi
+            done
+        fi
+        if [ -n "$hpath2" ]; then
+            emit hermesBinaryLogin PASS "$hpath2"
+        else
+            emit hermesBinaryLogin FAIL "not found after sourcing rc files"
+        fi
+
+        if command -v pgrep > /dev/null 2>&1; then
+            emit pgrepAvailable PASS "$(command -v pgrep)"
+        else
+            emit pgrepAvailable FAIL "pgrep not on PATH"
+        fi
+
+        printf '__END__\n'
+        """#
+    }
+
+    enum Captured {
+        case connectFailure(String)
+        case completed(stdout: String, stderr: String, exitCode: Int32)
+    }
+
+    private static func execute(script: String, context: ServerContext) async -> Captured {
+        // Can't use `transport.runProcess(executable: "/bin/sh", args: ["-c", script])`
+        // here: SSHTransport.runProcess pipes every argument through
+        // `remotePathArg` (which double-quotes to rewrite `~/` → `$HOME/`),
+        // which mangles a multi-line shell script containing `"$1"`,
+        // nested quotes, and `printf` escape sequences. The result on the
+        // remote is a scrambled string and every probe fails to emit.
+        //
+        // Mirror TestConnectionProbe's approach of building the ssh argv
+        // directly, but feed the script to a remote `sh -s` on stdin
+        // rather than as an argv entry (see `runOverSSH` for why).
+        switch context.kind {
+        case .local:
+            return await runLocally(script: script)
+        case .ssh(let config):
+            return await runOverSSH(script: script, config: config)
+        }
+    }
+
+    /// Direct ssh invocation. Pipes the script into `sh` on stdin rather
+    /// than passing it as `sh -c <script>` argv — because ssh concatenates
+    /// argv with spaces and sends that as a single command string to the
+    /// remote's LOGIN shell, which then parses newlines as command
+    /// separators. A multi-line `sh -c <script>` would run only the first
+    /// line inside the `sh` subprocess (any variables set there die when
+    /// `sh` exits), and the rest would run in the login shell with no
+    /// access to those variables. Symptom: `$H` is empty everywhere downstream.
+    ///
+    /// Feeding the script via stdin avoids the split entirely — `sh -s`
+    /// consumes the whole stream in one process, so variable scope is
+    /// preserved and the script runs exactly the same way it would from
+    /// a local `cat script.sh | sh`.
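+    ///
+    /// Roughly equivalent to the following command line (a sketch only:
+    /// `user@host` and `script.sh` are placeholders, and the `-o` options
+    /// set below are omitted for brevity):
+    ///
+    ///     ssh -T user@host -- /bin/sh -s < script.sh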
+    private static func runOverSSH(script: String, config: SSHConfig) async -> Captured {
+        var sshArgv: [String] = [
+            "-o", "ControlMaster=auto",
+            "-o", "ControlPath=\(controlDirPath())/%C",
+            "-o", "ControlPersist=600",
+            "-o", "ServerAliveInterval=30",
+            "-o", "ConnectTimeout=10",
+            "-o", "StrictHostKeyChecking=accept-new",
+            "-o", "LogLevel=QUIET",
+            "-o", "BatchMode=yes",
+            "-T"  // no pty — keep stdin/stdout a clean byte stream
+        ]
+        if let port = config.port { sshArgv += ["-p", String(port)] }
+        if let id = config.identityFile, !id.isEmpty {
+            sshArgv += ["-i", id]
+        }
+        let hostSpec: String
+        if let user = config.user, !user.isEmpty { hostSpec = "\(user)@\(config.host)" }
+        else { hostSpec = config.host }
+        sshArgv.append(hostSpec)
+        sshArgv.append("--")
+        sshArgv.append("/bin/sh")
+        sshArgv.append("-s")   // read script from stdin
+
+        return await Task.detached { () -> Captured in
+            let proc = Process()
+            proc.executableURL = URL(fileURLWithPath: "/usr/bin/ssh")
+            proc.arguments = sshArgv
+
+            // Inherit the shell's SSH_AUTH_SOCK so ssh can reach the
+            // agent — same pattern as SSHTransport + TestConnectionProbe.
+            var env = ProcessInfo.processInfo.environment
+            let shellEnv = HermesFileService.enrichedEnvironment()
+            for key in ["SSH_AUTH_SOCK", "SSH_AGENT_PID"] {
+                if env[key] == nil, let v = shellEnv[key], !v.isEmpty {
+                    env[key] = v
+                }
+            }
+            proc.environment = env
+
+            let stdinPipe = Pipe()
+            let stdoutPipe = Pipe()
+            let stderrPipe = Pipe()
+            proc.standardInput = stdinPipe
+            proc.standardOutput = stdoutPipe
+            proc.standardError = stderrPipe
+
+            do {
+                try proc.run()
+            } catch {
+                return .connectFailure("Failed to launch ssh: \(error.localizedDescription)")
+            }
+
+            // Write the script to ssh's stdin, then close the write end so
+            // remote sh sees EOF and exits after executing the whole script.
+            if let data = script.data(using: .utf8) {
+                try? stdinPipe.fileHandleForWriting.write(contentsOf: data)
+            }
+            try? stdinPipe.fileHandleForWriting.close()
+
+            let deadline = Date().addingTimeInterval(30)
+            while proc.isRunning && Date() < deadline {
+                try? await Task.sleep(nanoseconds: 100_000_000)
+            }
+            if proc.isRunning {
+                proc.terminate()
+                return .connectFailure("Diagnostics timed out after 30s")
+            }
+            let out = (try? stdoutPipe.fileHandleForReading.readToEnd()) ?? Data()
+            let err = (try? stderrPipe.fileHandleForReading.readToEnd()) ?? Data()
+            return .completed(
+                stdout: String(data: out, encoding: .utf8) ?? "",
+                stderr: String(data: err, encoding: .utf8) ?? "",
+                exitCode: proc.terminationStatus
+            )
+        }.value
+    }
+
+    /// Local shell invocation: runs the diagnostic script against the
+    /// user's own Mac. Less useful than the remote form (most checks will
+    /// trivially pass), but lets the same UI work for both contexts.
+    private static func runLocally(script: String) async -> Captured {
+        return await Task.detached { () -> Captured in
+            let proc = Process()
+            proc.executableURL = URL(fileURLWithPath: "/bin/sh")
+            proc.arguments = ["-c", script]
+
+            let stdoutPipe = Pipe()
+            let stderrPipe = Pipe()
+            proc.standardOutput = stdoutPipe
+            proc.standardError = stderrPipe
+            do {
+                try proc.run()
+            } catch {
+                return .connectFailure("Failed to launch /bin/sh: \(error.localizedDescription)")
+            }
+            let deadline = Date().addingTimeInterval(10)
+            while proc.isRunning && Date() < deadline {
+                try? await Task.sleep(nanoseconds: 100_000_000)
+            }
+            if proc.isRunning {
+                proc.terminate()
+                return .connectFailure("Local diagnostics timed out (should be <1s)")
+            }
+            let out = (try? stdoutPipe.fileHandleForReading.readToEnd()) ?? Data()
+            let err = (try? stderrPipe.fileHandleForReading.readToEnd()) ?? Data()
+            return .completed(
+                stdout: String(data: out, encoding: .utf8) ?? "",
+                stderr: String(data: err, encoding: .utf8) ?? "",
+                exitCode: proc.terminationStatus
+            )
+        }.value
+    }
+
+    /// Same control-socket directory used by SSHTransport, shared so the
+    /// diagnostic probe reuses the connection's ControlMaster socket when it
+    /// already exists (no second TCP handshake, no second auth).
+    private static func controlDirPath() -> String {
+        SSHTransport.controlDirPath()
+    }
+
+    private static func parse(stdout: String, stderr: String, exitCode: Int32) -> [Probe] {
+        var results: [ProbeID: Probe] = [:]
+        for line in stdout.split(whereSeparator: { $0 == "\n" || $0 == "\r" }) {
+            let parts = line.split(separator: "|", maxSplits: 2, omittingEmptySubsequences: false)
+            guard parts.count == 3 else { continue }
+            let key = String(parts[0]).trimmingCharacters(in: .whitespaces)
+            let status = String(parts[1]).trimmingCharacters(in: .whitespaces)
+            let detail = String(parts[2]).trimmingCharacters(in: .whitespaces)
+            guard let probe = ProbeID(rawValue: key) else { continue }
+            results[probe] = Probe(
+                id: probe,
+                passed: status == "PASS",
+                detail: detail
+            )
+        }
+
+        // If the script didn't complete, fill in the missing probes so the UI
+        // still shows every expected row (rather than silently skipping).
+        let terminated = stdout.contains("__END__")
+        let fallbackDetail: String
+        if terminated {
+            fallbackDetail = "(no output)"
+        } else if exitCode != 0 {
+            fallbackDetail = "(script exited \(exitCode) before this check — stderr: \(stderr.prefix(200)))"
+        } else {
+            fallbackDetail = "(no output from script)"
+        }
+
+        return ProbeID.allCases.map { id in
+            results[id] ?? Probe(id: id, passed: false, detail: fallbackDetail)
+        }
+    }
+}
@@ -47,6 +47,26 @@ struct TestConnectionProbe {
        //      Scarf's local resolution.
        // The matched absolute path is stored as `hermesBinaryHint` on the
        // SSHConfig so subsequent CLI/ACP invocations don't have to re-probe.
+        // If the user already typed a remoteHome override, use it; otherwise
+        // default to $HOME/.hermes. Either way, the script also probes a
+        // short list of well-known alternates when the primary path doesn't
+        // have state.db — systemd/docker/VPS installs tend to live at
+        // /var/lib/hermes/.hermes or /home/hermes/.hermes, and SSHing in as
+        // a different user than the Hermes daemon is the leading cause of
+        // "connection green, data empty" bug reports (issue #19).
+        let primary: String
+        if let override = config.remoteHome, !override.isEmpty {
+            if override.hasPrefix("~/") {
+                primary = "$HOME/\(override.dropFirst(2))"
+            } else if override == "~" {
+                primary = "$HOME"
+            } else {
+                primary = override
+            }
+        } else {
+            primary = "$HOME/.hermes"
+        }
+
        let script = #"""
        hpath=$(command -v hermes 2>/dev/null)
        if [ -z "$hpath" ]; then
@@ -61,7 +81,21 @@ struct TestConnectionProbe {
            done
        fi
        echo "HERMES:$hpath"
-        if [ -e "$HOME/.hermes/state.db" ]; then echo DB:ok; else echo DB:missing; fi
+        PRIMARY="\#(primary)"
+        if [ -r "$PRIMARY/state.db" ]; then
+            echo "DB:ok"
+            echo "HOME_USED:$PRIMARY"
+        else
+            echo "DB:missing"
+            # Probe well-known alternates. Emit the first one that has a
+            # readable state.db so the UI can offer a one-click fill.
+            for alt in "/var/lib/hermes/.hermes" "/opt/hermes/.hermes" "/home/hermes/.hermes" "/root/.hermes"; do
+                if [ -r "$alt/state.db" ]; then
+                    echo "SUGGEST:$alt"
+                    break
+                fi
+            done
+        fi
        """#
        sshArgs.append("/bin/sh")
        sshArgs.append("-c")
@@ -133,6 +167,8 @@ struct TestConnectionProbe {
            let hermesPath = lines.first(where: { $0.hasPrefix("HERMES:") })?
                .dropFirst("HERMES:".count).trimmingCharacters(in: .whitespaces) ?? ""
            let dbFound = lines.contains(where: { $0 == "DB:ok" })
+            let suggestedHome = lines.first(where: { $0.hasPrefix("SUGGEST:") })
+                .map { String($0.dropFirst("SUGGEST:".count)).trimmingCharacters(in: .whitespaces) }
            if hermesPath.isEmpty {
                return .failure(
                    message: "hermes binary not found in remote $PATH",
@@ -140,7 +176,7 @@ struct TestConnectionProbe {
                    command: displayCommand
                )
            }
-            return .success(hermesPath: String(hermesPath), dbFound: dbFound)
+            return .success(hermesPath: String(hermesPath), dbFound: dbFound, suggestedRemoteHome: suggestedHome)
        }

        // Classify common failures by scanning the stderr trace.
@@ -81,11 +81,15 @@ struct AddServerSheet: View {
                }
            }

-            LabeledField("Remote ~/.hermes override") {
-                TextField("Leave blank for default", text: $viewModel.remoteHome)
+            LabeledField("Hermes data directory") {
+                TextField("Default: ~/.hermes", text: $viewModel.remoteHome)
                    .textFieldStyle(.roundedBorder)
                    .autocorrectionDisabled()
            }
+            Text("Leave blank unless Hermes is installed at a non-default path (systemd services often live at /var/lib/hermes/.hermes; Docker sidecars vary). Test Connection auto-suggests a value when it detects one of the known alternates.")
+                .font(.caption)
+                .foregroundStyle(.secondary)
+                .fixedSize(horizontal: false, vertical: true)

            Text("Scarf uses ssh-agent for authentication. If your key has a passphrase, run `ssh-add` before connecting — Scarf never prompts for or stores passphrases.")
                .font(.caption)
@@ -113,14 +117,43 @@ struct AddServerSheet: View {

            if let result = viewModel.testResult {
                switch result {
-                case .success(let path, let dbFound):
-                    VStack(alignment: .leading, spacing: 4) {
+                case .success(let path, let dbFound, let suggestedHome):
+                    VStack(alignment: .leading, spacing: 6) {
                        Label("Connected", systemImage: "checkmark.circle.fill")
                            .foregroundStyle(.green)
                        Text("hermes at \(path)").font(.caption).monospaced()
-                        Text(dbFound ? "state.db found" : "state.db not found — Hermes may not have run yet on the remote")
-                            .font(.caption)
-                            .foregroundStyle(dbFound ? Color.secondary : Color.orange)
+                        if dbFound {
+                            Text("state.db readable")
+                                .font(.caption)
+                                .foregroundStyle(.secondary)
+                        } else if let suggestion = suggestedHome {
+                            // Scarf found Hermes data at one of the common
+                            // alternate paths. Offer a one-click fill of the
+                            // remoteHome field so the user doesn't have to
+                            // know the install-path conventions themselves.
+                            VStack(alignment: .leading, spacing: 4) {
+                                Text("state.db not found at the default location, but Scarf found one at:")
+                                    .font(.caption)
+                                    .foregroundStyle(.orange)
+                                HStack {
+                                    Text(suggestion)
+                                        .font(.caption.monospaced())
+                                        .textSelection(.enabled)
+                                    Spacer()
+                                    Button("Use this") {
+                                        viewModel.remoteHome = suggestion
+                                    }
+                                    .controlSize(.small)
+                                }
+                                .padding(8)
+                                .background(Color.yellow.opacity(0.12), in: RoundedRectangle(cornerRadius: 6))
+                            }
+                        } else {
+                            Text("state.db not found at the configured path. Either Hermes hasn't run yet on this server, or it's installed at a non-default location — set the Hermes data directory field above.")
+                                .font(.caption)
+                                .foregroundStyle(.orange)
+                                .fixedSize(horizontal: false, vertical: true)
+                        }
                    }
                case .failure(let message, let stderr, let command):
                    VStack(alignment: .leading, spacing: 6) {
@@ -8,47 +8,72 @@ import SwiftUI
 struct ConnectionStatusPill: View {
    let status: ConnectionStatusViewModel
    @State private var showDetails = false
+    @State private var showDiagnostics = false

    var body: some View {
        Button {
            switch status.status {
            case .error:
                showDetails = true
+            case .degraded:
+                // Orange "can't read" state: open the diagnostics sheet
+                // so the user can see exactly which files fail and why.
+                showDiagnostics = true
            case .connected, .idle:
                status.retry()
            }
        } label: {
-            HStack(spacing: 4) {
-                Circle()
-                    .fill(color)
-                    .frame(width: 8, height: 8)
+            // Leading SF Symbol does double duty: its color is the status
+            // signal (green/orange/yellow/red), and its shape reads as a
+            // clickable toolbar tool. No custom background — the toolbar's
+            // `.principal` emphasis bezel is the frame.
+            HStack(spacing: 5) {
+                Image(systemName: iconName)
+                    .foregroundStyle(color)
+                    .symbolRenderingMode(.hierarchical)
                Text(label)
                    .font(.caption)
                    .foregroundStyle(.secondary)
                    .lineLimit(1)
            }
-            .padding(.horizontal, 6)
-            .padding(.vertical, 2)
-            .background(Color.secondary.opacity(0.08), in: Capsule())
+            .padding(.horizontal, 4)
        }
        .buttonStyle(.plain)
        .help(tooltip)
        .popover(isPresented: $showDetails, arrowEdge: .bottom) {
            errorDetails.frame(width: 400)
        }
+        .sheet(isPresented: $showDiagnostics) {
+            RemoteDiagnosticsView(context: status.context)
+        }
    }

    private var color: Color {
        switch status.status {
        case .connected: return .green
+        case .degraded: return .orange
        case .idle: return .yellow
        case .error: return .red
        }
    }

+    /// State-specific SF Symbol. The icon shape itself signals what the
+    /// click will do: checkmark for connected (click to re-probe),
+    /// stethoscope for degraded (click to run diagnostics), circular
+    /// arrows for idle/probing, triangle for error (click for details).
+    private var iconName: String {
+        switch status.status {
+        case .connected: return "checkmark.circle.fill"
+        case .degraded: return "stethoscope"
+        case .idle: return "arrow.triangle.2.circlepath"
+        case .error: return "exclamationmark.triangle.fill"
+        }
+    }
+
    private var label: String {
        switch status.status {
        case .connected: return "Connected"
+        case .degraded: return "Connected — can't read Hermes state"
        case .idle: return "Checking…"
        case .error(let message, _): return message
        }
@@ -62,6 +87,8 @@ struct ConnectionStatusPill: View {
                return "Last probe: \(fmt.localizedString(for: ts, relativeTo: Date()))"
            }
            return "Connected"
+        case .degraded(let reason):
+            return "SSH works but \(reason). Click for diagnostics."
        case .idle: return "Waiting for first probe"
        case .error(_, _): return "Click for details"
        }
@@ -6,6 +6,7 @@ struct ManageServersView: View {
    @Environment(ServerRegistry.self) private var registry
    @State private var showAddSheet = false
    @State private var pendingRemoveID: ServerID?
+    @State private var diagnosticsContext: ServerContext?

    var body: some View {
        VStack(alignment: .leading, spacing: 0) {
@@ -17,12 +18,18 @@ struct ManageServersView: View {
                list
            }
        }
-        .frame(width: 380, height: 360)
+        .frame(width: 440, height: 380)
        .sheet(isPresented: $showAddSheet) {
            AddServerSheet { name, config in
                _ = registry.addServer(displayName: name, config: config)
            }
        }
+        .sheet(item: Binding(
+            get: { diagnosticsContext.map { IdentifiableContext(context: $0) } },
+            set: { diagnosticsContext = $0?.context }
+        )) { wrapper in
+            RemoteDiagnosticsView(context: wrapper.context)
+        }
        .confirmationDialog(
            "Remove this server?",
            isPresented: Binding(
@@ -42,6 +49,13 @@ struct ManageServersView: View {
        )
    }

+    /// Wrapper for `sheet(item:)`: `ServerContext` isn't `Identifiable` in a
+    /// way that suits that API, so identity is keyed on its `ServerID`.
+    private struct IdentifiableContext: Identifiable {
+        var id: ServerID { context.id }
+        let context: ServerContext
+    }
+
    private var header: some View {
        HStack {
            Text("Servers").font(.headline)
@@ -87,6 +101,13 @@ struct ManageServersView: View {
                        }
                    }
                    Spacer()
+                    Button {
+                        diagnosticsContext = entry.context
+                    } label: {
+                        Image(systemName: "stethoscope")
+                    }
+                    .buttonStyle(.borderless)
+                    .help("Run remote diagnostics — check exactly which files are readable on this server.")
                    Button {
                        pendingRemoveID = entry.id
                    } label: {
@@ -94,6 +115,7 @@ struct ManageServersView: View {
                    }
                    .buttonStyle(.borderless)
                    .foregroundStyle(.red)
+                    .help("Remove this server from Scarf.")
                }
                .padding(.vertical, 4)
            }
@@ -0,0 +1,203 @@
+import SwiftUI
+import AppKit
+
+/// Per-server diagnostics sheet. Shown from Manage Servers and from the
+/// Dashboard "Run Diagnostics…" button when `lastReadError` is set. Gives
+/// the user a specific list of what does/doesn't work over SSH, with
+/// targeted remediation hints for each failure.
+///
+/// Design principle: a failing check always shows both the raw detail the
+/// remote shell produced AND a human-written hint. The raw detail lets us
+/// triage bug reports; the hint unblocks the user without a round trip.
+struct RemoteDiagnosticsView: View {
+    let context: ServerContext
+    @State private var viewModel: RemoteDiagnosticsViewModel
+    @Environment(\.dismiss) private var dismiss
+
+    init(context: ServerContext) {
+        self.context = context
+        _viewModel = State(initialValue: RemoteDiagnosticsViewModel(context: context))
+    }
+
+    var body: some View {
+        VStack(spacing: 0) {
+            header
+            Divider()
+            probeList
+            Divider()
+            footer
+        }
+        .frame(minWidth: 640, minHeight: 520)
+        .task { await viewModel.run() }
+    }
+
+    private var header: some View {
+        VStack(alignment: .leading, spacing: 6) {
+            HStack {
+                Text("Remote Diagnostics — \(context.displayName)")
+                    .font(.title3)
+                    .fontWeight(.semibold)
+                Spacer()
+                if viewModel.isRunning {
+                    ProgressView()
+                        .controlSize(.small)
+                } else {
+                    Button("Re-run") { Task { await viewModel.run() } }
+                        .controlSize(.small)
+                }
+            }
+            HStack {
+                if viewModel.isRunning {
+                    Text("Running checks…")
+                        .font(.callout)
+                        .foregroundStyle(.secondary)
+                } else {
+                    Label(viewModel.summary, systemImage: viewModel.allPassed ? "checkmark.seal" : "info.circle")
+                        .font(.callout)
+                        .foregroundStyle(viewModel.allPassed ? .green : .orange)
+                }
+                Spacer()
+                if !viewModel.probes.isEmpty {
+                    Button {
+                        copyReportToClipboard()
+                    } label: {
+                        Label("Copy Full Report", systemImage: "doc.on.doc")
+                    }
+                    .controlSize(.small)
+                    .help("Copy a plain-text summary of every check (passes and fails) — paste into GitHub issues so we can see everything at once.")
+                }
+            }
+        }
+        .padding(16)
+    }
+
+    private var probeList: some View {
+        ScrollView {
+            VStack(alignment: .leading, spacing: 0) {
+                if viewModel.probes.isEmpty && viewModel.isRunning {
+                    Text("Running a single shell session on \(context.displayName) that exercises every path Scarf reads…")
+                        .font(.callout)
+                        .foregroundStyle(.secondary)
+                        .padding()
+                }
+                ForEach(viewModel.probes) { probe in
+                    probeRow(probe)
+                    if probe.id != viewModel.probes.last?.id {
+                        Divider()
+                    }
+                }
+            }
+        }
+    }
+
+    private func probeRow(_ probe: RemoteDiagnosticsViewModel.Probe) -> some View {
+        HStack(alignment: .top, spacing: 12) {
+            Image(systemName: probe.passed ? "checkmark.circle.fill" : "xmark.circle.fill")
+                .foregroundStyle(probe.passed ? .green : .red)
+                .font(.title3)
+                .padding(.top, 2)
+            VStack(alignment: .leading, spacing: 4) {
+                Text(probe.id.title)
+                    .font(.body)
+                if !probe.detail.isEmpty {
+                    Text(probe.detail)
+                        .font(.caption.monospaced())
+                        .foregroundStyle(.secondary)
+                        .textSelection(.enabled)
+                }
+                if !probe.passed, let hint = probe.id.failureHint {
+                    HStack(alignment: .top, spacing: 6) {
+                        Image(systemName: "lightbulb")
+                            .foregroundStyle(.yellow)
+                            .font(.caption)
+                        Text(hint)
+                            .font(.caption)
+                            .foregroundStyle(.primary)
+                            .textSelection(.enabled)
+                            .fixedSize(horizontal: false, vertical: true)
+                    }
+                    .padding(8)
+                    .background(Color.yellow.opacity(0.08))
+                    .clipShape(RoundedRectangle(cornerRadius: 6))
+                }
+            }
+            Spacer(minLength: 0)
+        }
+        .padding(.horizontal, 16)
+        .padding(.vertical, 10)
+    }
+
+    private var footer: some View {
+        VStack(alignment: .leading, spacing: 8) {
+            // Raw-output disclosure. Shown whenever anything fails — we need
+            // this visible for partial failures too since the raw stdout is
+            // the only way to see WHY a check returned its detail. Hidden
+            // only when 14/14 pass (script worked, nothing to debug).
+            if !viewModel.probes.isEmpty, !viewModel.allPassed {
+                DisclosureGroup("Raw remote output (for debugging)") {
+                    VStack(alignment: .leading, spacing: 6) {
+                        Text("exit code: \(viewModel.rawExitCode)")
+                            .font(.caption.monospaced())
+                        if !viewModel.rawStdout.isEmpty {
+                            Text("stdout:").font(.caption).foregroundStyle(.secondary)
+                            ScrollView {
+                                Text(viewModel.rawStdout)
+                                    .font(.system(size: 10, design: .monospaced))
+                                    .textSelection(.enabled)
+                                    .frame(maxWidth: .infinity, alignment: .leading)
+                            }
+                            .frame(maxHeight: 140)
+                        }
+                        if !viewModel.rawStderr.isEmpty {
+                            Text("stderr:").font(.caption).foregroundStyle(.secondary)
+                            ScrollView {
+                                Text(viewModel.rawStderr)
+                                    .font(.system(size: 10, design: .monospaced))
+                                    .textSelection(.enabled)
+                                    .frame(maxWidth: .infinity, alignment: .leading)
+                            }
+                            .frame(maxHeight: 140)
+                        }
+                    }
+                    .padding(8)
+                    .background(Color.secondary.opacity(0.08), in: RoundedRectangle(cornerRadius: 6))
+                }
+                .font(.caption)
+            }
+            HStack {
+                Text("Scarf runs these over a single SSH session that mirrors the shell your dashboard reads from, so a green row here means Scarf can actually read that file at runtime.")
+                    .font(.caption)
+                    .foregroundStyle(.secondary)
+                    .fixedSize(horizontal: false, vertical: true)
+                Spacer()
+                Button("Done") { dismiss() }
+                    .keyboardShortcut(.defaultAction)
+            }
+        }
+        .padding(16)
+    }
+
+    private func copyReportToClipboard() {
+        var lines: [String] = []
+        lines.append("Scarf remote diagnostics — \(context.displayName)")
+        if case .ssh(let config) = context.kind {
+            lines.append("Host: \(config.host)" + (config.user.map { " (user: \($0))" } ?? ""))
+            if let rh = config.remoteHome { lines.append("Hermes home (override): \(rh)") }
+        }
+        lines.append("Ran at: \(viewModel.startedAt.map { ISO8601DateFormatter().string(from: $0) } ?? "?")")
+        lines.append("Result: \(viewModel.summary)")
+        lines.append("")
+        for probe in viewModel.probes {
+            let mark = probe.passed ? "PASS" : "FAIL"
+            lines.append("[\(mark)] \(probe.id.title)")
+            if !probe.detail.isEmpty { lines.append("    \(probe.detail)") }
+            if !probe.passed, let hint = probe.id.failureHint {
+                lines.append("    hint: \(hint)")
+            }
+        }
+        let text = lines.joined(separator: "\n")
+        let pb = NSPasteboard.general
+        pb.clearContents()
+        pb.setString(text, forType: .string)
+    }
+}
@@ -6,12 +6,33 @@
 //

 import Testing
+import Darwin
 @testable import scarf

-struct scarfTests {
+@Suite struct ControlPathTests {

-    @Test func example() async throws {
-        // Write your test here and use APIs like `#expect(...)` to check expected conditions.
+    /// macOS' `sun_path` is 104 bytes including the NUL terminator. OpenSSH's
+    /// `%C` token expands to a hex hash digest; this test budgets 64 characters
+    /// for it (a SHA-1 hex digest is 40, so 64 is a conservative upper bound).
+    /// The full bound socket path is `<controlDir>/<%C>`, so the worst case is
+    /// `controlDir.utf8.count + 1 (separator) + 64 (hash) + 1 (NUL)`.
+    ///
+    /// This test pins the invariant violated by the original Caches-based
+    /// path that triggered issue #19 — any future change to `controlDirPath`
+    /// that drifts back over the limit will fail here instead of silently
+    /// breaking remote SSH for users with long `$HOME` strings.
+    @Test func controlDirPathFitsMacOSSocketLimit() {
+        let dir = SSHTransport.controlDirPath()
+        let worstCase = dir.utf8.count + 1 + 64 + 1
+        #expect(worstCase <= 104,
+                "ControlPath worst case is \(worstCase) bytes for dir '\(dir)'; macOS sun_path limit is 104")
    }

+    /// `/tmp` is shared across all users on a Mac. The per-uid suffix is what
+    /// keeps two local users' control sockets from colliding — guard against
+    /// a refactor that drops it.
+    @Test func controlDirPathIsPerUser() {
+        let dir = SSHTransport.controlDirPath()
+        #expect(dir.contains("\(getuid())"),
+                "ControlPath '\(dir)' must include the current uid for per-user isolation")
+    }
 }
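The byte budget these tests pin can be checked by hand. The sketch below is a minimal illustration, not part of the commit: `worst_case_len` and `fits_socket_limit` are hypothetical helper names, the 64-character hash budget is taken from the test above, and `/Users/averylongusername` is an invented home directory used only to show a path that overflows.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the arithmetic in controlDirPathFitsMacOSSocketLimit.

SUN_PATH_MAX=104   # macOS sun_path size, NUL terminator included
HASH_BUDGET=64     # characters reserved for the %C expansion (same budget as the test)

# Worst-case bound socket path length: dir + '/' + hash + NUL.
worst_case_len() {
  local dir_bytes
  # wc -c counts bytes, matching Swift's utf8.count even for non-ASCII paths.
  dir_bytes=$(printf '%s' "$1" | wc -c | tr -d ' ')
  echo $(( dir_bytes + 1 + HASH_BUDGET + 1 ))
}

# Succeeds when the worst case still fits inside sun_path.
fits_socket_limit() {
  [ "$(worst_case_len "$1")" -le "$SUN_PATH_MAX" ]
}

# Report the budget for the current user's /tmp-based control directory.
dir="/tmp/scarf-ssh-$(id -u)"
echo "$dir -> $(worst_case_len "$dir") bytes (limit $SUN_PATH_MAX)"
```

For an 18-byte directory such as `/tmp/scarf-ssh-501` the worst case is 18 + 1 + 64 + 1 = 84 bytes, well under 104. A Caches-style path under a long home directory exceeds the limit, which is the failure mode the first test guards against.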
@@ -0,0 +1,254 @@
+#!/usr/bin/env bash
+#
+# Scarf wiki helper — wraps the local .wiki-worktree/ clone with a secret-scan.
+#
+# Usage:
+#   ./scripts/wiki.sh status                 # git status inside .wiki-worktree/
+#   ./scripts/wiki.sh pull                   # fetch + fast-forward; aborts if dirty
+#   ./scripts/wiki.sh new <Page-Name>        # create Page-Name.md with stub template
+#   ./scripts/wiki.sh stub-check             # list pages still containing the TODO stub
+#   ./scripts/wiki.sh commit "<msg>"         # secret-scan, then git add -A && git commit
+#   ./scripts/wiki.sh push                   # secret-scan again, then git push
+#   ./scripts/wiki.sh touch <Page-Name>      # bump "Last updated" line to today
+#   ./scripts/wiki.sh --help                 # this help
+#
+# The secret-scan has two tiers:
+#   - Hard patterns (tokens, keys, private-key headers): block with non-zero exit.
+#   - Soft patterns (assignments such as password=…, api_key=…, token: …,
+#     bearer: …): warn and require --force-terms on commit/push to proceed.
+#
+# A user-maintained blocklist lives at scripts/wiki-blocklist.txt (gitignored).
+# One pattern per line — blank lines and lines starting with # are ignored.
+# Matches are treated as HARD blocks. Use this for personal IPs, hostnames, etc.
+#
+# Bootstrap (one-time):
+#   1. In GitHub repo Settings → Features → Wikis, enable Wikis and restrict
+#      editing to collaborators.
+#   2. Visit https://github.com/awizemann/scarf/wiki and save any first page
+#      via the UI (this is what creates the underlying .wiki.git repo).
+#   3. From repo root:
+#        git clone git@github.com:awizemann/scarf.wiki.git .wiki-worktree
+#
+# Recovery: if .wiki-worktree/ is deleted, re-run step 3. Remote is authoritative.
+
+set -euo pipefail
+
+# ---------- config ----------
+REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+WIKI_DIR="$REPO_ROOT/.wiki-worktree"
+WIKI_REMOTE="git@github.com:awizemann/scarf.wiki.git"
+BLOCKLIST="$REPO_ROOT/scripts/wiki-blocklist.txt"
+STUB_MARKER='> **TODO: document.**'
+MAINTENANCE_PAGE="Wiki-Maintenance.md"
+
+# ---------- helpers ----------
+log()  { printf '\033[1;34m==> %s\033[0m\n' "$*"; }
+warn() { printf '\033[1;33m[WARN] %s\033[0m\n' "$*" >&2; }
+die()  { printf '\033[1;31m[ERR] %s\033[0m\n' "$*" >&2; exit 1; }
+
+need_worktree() {
+  [[ -d "$WIKI_DIR/.git" ]] || die "no wiki clone at $WIKI_DIR
+  Run: git clone $WIKI_REMOTE .wiki-worktree
+  (See --help for bootstrap steps.)"
+}
+
+in_wiki() { (cd "$WIKI_DIR" && "$@"); }
+
+today() { date +%Y-%m-%d; }
+
+# ---------- secret-scan ----------
+#
+# scan_hard:   exits non-zero if any hard pattern or user-blocklist pattern matches.
+# scan_soft:   warns (returns 1) if any soft keyword matches, else 0. The caller
+#              decides whether to bail based on --force-terms.
+#
+# All scans ignore the .git directory of the wiki clone.
+
+hard_regex='(sk-[A-Za-z0-9_-]{20,}|ghp_[A-Za-z0-9]{30,}|ghs_[A-Za-z0-9]{30,}|ghu_[A-Za-z0-9]{30,}|gho_[A-Za-z0-9]{30,}|ghr_[A-Za-z0-9]{30,}|github_pat_[A-Za-z0-9_]{20,}|xox[baprs]-[A-Za-z0-9-]{10,}|AKIA[0-9A-Z]{16}|AIza[0-9A-Za-z_-]{35}|-----BEGIN [A-Z ]*PRIVATE KEY-----|BEGIN OPENSSH PRIVATE KEY)'
+# Soft tier targets KEY=value / KEY: value patterns where the key name suggests
+# a secret. Catches accidental .env paste; ignores legitimate mentions of the
+# words "password" / "secret" in prose.
+soft_regex='([Pp]assword|[Aa]pi[_-]?[Kk]ey|[Ss]ecret[_-][Kk]ey|[Tt]oken|[Aa]uth[_-]?[Tt]oken|[Bb]earer)[[:space:]]*[=:][[:space:]]*['"'"'"]?[A-Za-z0-9_+./-]{8,}'
+
+scan_hard() {
+  local hits
+  hits="$(cd "$WIKI_DIR" && grep -rInE --exclude-dir=.git "$hard_regex" . 2>/dev/null || true)"
+  if [[ -n "$hits" ]]; then
+    printf '%s\n' "$hits" >&2
+    die "hard-pattern secret match — aborting. Remove the above and retry."
+  fi
+
+  if [[ -f "$BLOCKLIST" ]]; then
+    local pat
+    # `|| [[ -n "$pat" ]]` keeps a final line that lacks a trailing newline.
+    while IFS= read -r pat || [[ -n "$pat" ]]; do
+      [[ -z "$pat" || "$pat" =~ ^# ]] && continue
+      local blocklist_hits
+      blocklist_hits="$(cd "$WIKI_DIR" && grep -rInF --exclude-dir=.git -- "$pat" . 2>/dev/null || true)"
+      if [[ -n "$blocklist_hits" ]]; then
+        printf '%s\n' "$blocklist_hits" >&2
+        die "user blocklist match on pattern \"$pat\" — aborting."
+      fi
+    done < "$BLOCKLIST"
+  fi
+}
+
+scan_soft() {
+  local hits
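+  # Exempt the maintenance page (it documents the forbidden terms on purpose).
+  # -i makes the key names case-insensitive so upper-case .env keys
+  # (API_KEY=..., PASSWORD=...) are caught as well.
+  hits="$(cd "$WIKI_DIR" && grep -rIniE --exclude-dir=.git --exclude="$MAINTENANCE_PAGE" "$soft_regex" . 2>/dev/null || true)"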
+  if [[ -n "$hits" ]]; then
+    printf '%s\n' "$hits" >&2
+    warn "soft-keyword matches above. Review carefully. Pass --force-terms to proceed."
+    return 1
+  fi
+  return 0
+}
+
+# ---------- commands ----------
+cmd_status() {
+  need_worktree
+  in_wiki git status
+}
+
+cmd_pull() {
+  need_worktree
+  if [[ -n "$(in_wiki git status --porcelain)" ]]; then
+    die "wiki has uncommitted changes — commit or stash before pulling."
+  fi
+  log "Pulling wiki"
+  in_wiki git fetch origin
+  in_wiki git merge --ff-only origin/master
+  log "Up to date."
+}
+
+cmd_new() {
+  need_worktree
+  local name="${1:-}"
+  [[ -n "$name" ]] || die "usage: wiki.sh new <Page-Name>"
+  # Normalize spaces → dashes; strip .md if the user added it.
+  name="${name%.md}"
+  name="${name// /-}"
+  local path="$WIKI_DIR/${name}.md"
+  if [[ -e "$path" ]]; then
+    die "already exists: $path"
+  fi
+  local title="${name//-/ }"
+  cat > "$path" <<EOF
+# ${title}
+
+${STUB_MARKER} This page is a stub. See [Wiki Maintenance](Wiki-Maintenance).
+
+---
+_Last updated: $(today) — stub_
+EOF
+  log "Created $path"
+}
+
+cmd_stub_check() {
+  need_worktree
+  # Wiki-Maintenance.md documents the stub template in a code fence, so it
+  # would always match the marker — exempt it.
+  local matches count
+  matches="$(cd "$WIKI_DIR" && grep -rlnF --exclude-dir=.git --exclude="$MAINTENANCE_PAGE" -- "$STUB_MARKER" . 2>/dev/null || true)"
+  if [[ -z "$matches" ]]; then
+    log "No stub pages remain."
+    return 0
+  fi
+  count="$(printf '%s\n' "$matches" | wc -l | tr -d ' ')"
+  log "$count stub page(s):"
+  printf '%s\n' "$matches" | sed 's|^\./||' | sort
+}
+
+cmd_touch() {
+  need_worktree
+  local name="${1:-}"
+  [[ -n "$name" ]] || die "usage: wiki.sh touch <Page-Name>"
+  name="${name%.md}"
+  name="${name// /-}"
+  local path="$WIKI_DIR/${name}.md"
+  [[ -f "$path" ]] || die "no such page: $path"
+  # Replace the YYYY-MM-DD portion of the Last updated line, keep whatever trails it.
+  # Uses sed -i '' for BSD/macOS compatibility.
+  sed -i '' -E "s/(_Last updated: )[0-9]{4}-[0-9]{2}-[0-9]{2}/\1$(today)/" "$path"
+  log "Touched ${name}.md → $(today)"
+}
+
+cmd_commit() {
+  need_worktree
+  local msg="" force=0
+  for arg in "$@"; do
+    case "$arg" in
+      --force-terms) force=1 ;;
+      -*) die "unknown flag: $arg" ;;
+      *) [[ -z "$msg" ]] && msg="$arg" || die "unexpected arg: $arg" ;;
+    esac
+  done
+  [[ -n "$msg" ]] || die 'usage: wiki.sh commit "<message>" [--force-terms]'
+  log "Secret-scan (hard patterns + blocklist)"
+  scan_hard
+  log "Secret-scan (soft keywords)"
+  if ! scan_soft; then
+    if [[ "$force" -eq 1 ]]; then
+      warn "proceeding past soft-keyword matches (--force-terms)"
+    else
+      die "soft-keyword matches — re-run with --force-terms to proceed."
+    fi
+  fi
+  in_wiki git add -A
+  if [[ -z "$(in_wiki git status --porcelain)" ]]; then
+    warn "nothing to commit"
+    return 0
+  fi
+  in_wiki git commit -m "$msg"
+  log "Committed: $msg"
+}
+
+cmd_push() {
+  need_worktree
+  local force=0
+  for arg in "$@"; do
+    case "$arg" in
+      --force-terms) force=1 ;;
+      *) die "unknown arg: $arg" ;;
+    esac
+  done
+  log "Secret-scan before push (hard patterns + blocklist)"
+  scan_hard
+  log "Secret-scan before push (soft keywords)"
+  if ! scan_soft; then
+    if [[ "$force" -eq 1 ]]; then
+      warn "proceeding past soft-keyword matches (--force-terms)"
+    else
+      die "soft-keyword matches — re-run with --force-terms to proceed."
+    fi
+  fi
+  # Nothing to push?
+  local ahead
+  ahead="$(in_wiki git rev-list --count @{u}..HEAD 2>/dev/null || echo 0)"
+  if [[ "$ahead" -eq 0 ]]; then
+    warn "nothing to push (0 commits ahead of origin)"
+    return 0
+  fi
+  log "Pushing $ahead commit(s) to origin"
+  in_wiki git push origin HEAD
+  log "Pushed. Verify at https://github.com/awizemann/scarf/wiki"
+}
+
+cmd_help() {
+  sed -n '2,32p' "$0" | sed 's/^# \{0,1\}//'
+}
+
+# ---------- dispatch ----------
+sub="${1:-}"
+shift || true
+case "$sub" in
+  status)     cmd_status "$@" ;;
+  pull)       cmd_pull "$@" ;;
+  new)        cmd_new "$@" ;;
+  stub-check) cmd_stub_check "$@" ;;
+  touch)      cmd_touch "$@" ;;
+  commit)     cmd_commit "$@" ;;
+  push)       cmd_push "$@" ;;
+  -h|--help|help|"") cmd_help ;;
+  *) die "unknown subcommand: $sub (run --help)" ;;
+esac
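To see what the soft tier does and does not flag, the pattern can be exercised on single lines. This is a small sketch rather than part of the script: `matches_soft` is a hypothetical helper, the sample lines are invented, and only the `soft_regex` value is copied from the script above.

```shell
#!/usr/bin/env bash
# soft_regex is copied verbatim from wiki.sh; everything else here is illustrative.
soft_regex='([Pp]assword|[Aa]pi[_-]?[Kk]ey|[Ss]ecret[_-][Kk]ey|[Tt]oken|[Aa]uth[_-]?[Tt]oken|[Bb]earer)[[:space:]]*[=:][[:space:]]*['"'"'"]?[A-Za-z0-9_+./-]{8,}'

# Succeeds when a single line would trigger the soft-tier warning.
matches_soft() {
  printf '%s\n' "$1" | grep -qE -- "$soft_regex"
}

# Two assignment-shaped lines and one line of ordinary prose.
for line in \
  'password: example-value-123' \
  'token = "abcdefgh12345678"' \
  'Choose a strong password for the account.'
do
  if matches_soft "$line"; then
    echo "WARN  $line"
  else
    echo "ok    $line"
  fi
done
```

The pattern needs a key name, then `=` or `:`, then at least eight value characters. That is why the two assignments are flagged while the sentence that merely mentions the word "password" passes.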
Author	SHA1	Message	Date
Alan Wizemann	0384c6ef17	chore: Bump version to 2.0.2	2026-04-20 15:46:07 -07:00
Alan Wizemann	f36fb55ebe	test(ssh): regression tests for ControlPath socket-limit invariants Two tests pinning the invariants that were violated / introduced by the #19 / PR #20 fix: - controlDirPathFitsMacOSSocketLimit: asserts dir + '/' + 64-char %C hash + NUL <= 104 bytes. Would have caught the original Caches-based path landing at 105 bytes for users with longer $HOME strings. - controlDirPathIsPerUser: asserts the path includes the current uid, pinning the per-user-isolation invariant against any future refactor that drops it (since /tmp is shared across all local users). scarfTests was a stub before this — these are the suite's first real tests.	2026-04-20 15:45:29 -07:00
Alan Wizemann	1823160546	fix(ssh): defensive ControlPath dir + sweep stale sockets Layered hardening on top of the /tmp ControlPath move from #20: - ensureControlDir uses POSIX mkdir(0700) + lstat instead of createDirectory + setAttributes. Closes the /tmp pre-creation TOCTOU: any local user can pre-create /tmp/scarf-ssh-<uid>, and the old code would silently fail to chmod a hostile dir back to 0700 (since we wouldn't own it). Now we refuse to use a dir that isn't a real directory we own with mode 0700, and log via os.Logger. - sweepStaleControlSockets removes ControlMaster socket files older than 30 minutes from controlDirPath() at app launch. Symmetric to sweepOrphanSnapshots — keeps /tmp/scarf-ssh-<uid>/ from accumulating crashed-master / unclean-exit orphans indefinitely until reboot. The 30-min threshold (vs ControlPersist's 10 min) ensures any concurrent Scarf instance's live sockets are untouched.	2026-04-20 15:45:20 -07:00
Alan Wizemann	d2a447fcc4	docs: add GitHub wiki + scripts/wiki.sh helper with secret-scan Public docs now live at https://github.com/awizemann/scarf/wiki (separate git repo cloned to .wiki-worktree/, mirroring the .gh-pages-worktree/ pattern). Internal dev notes stay in scarf/docs/. scripts/wiki.sh wraps pull/commit/push with a two-pass secret-scan: hard patterns (token regexes + private-key headers + a user-maintained scripts/wiki-blocklist.txt) abort with non-zero exit; soft assignment patterns (api_key=…, password=…, token=…) warn and require --force-terms. CLAUDE.md gains a Wiki section listing the update triggers (new feature, new service, architecture change, Hermes version bump, full release, keyboard/sidebar change) and the workflow. CONTRIBUTING.md points external contributors at the wiki Edit button or a direct clone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 15:32:47 -07:00
Alan Wizemann	76bfeb34d4	chore: Bump version to 2.0.1	2026-04-20 15:32:47 -07:00
Alan Wizemann	85a4ec0e14	Merge pull request #20 from aliatx2017/fix/controlpath-too-long fix: ControlPath too long for Unix socket on macOS	2026-04-20 15:08:40 -07:00
Alan Wizemann	1453c7a841	Merge fix/issue-19-ssh-diagnostics into main — v2.0.1 hotfix Closes #19 (remote SSH connections showed connected but every view read as empty). Eight commits bring: - Result-returning readers in HermesFileService that surface errors instead of silently returning nil - HermesDataService.open records lastOpenError with humanized hints - Dashboard orange banner when remote reads fail - New Remote Diagnostics sheet (14-probe checklist, stethoscope icon) - Yellow 'degraded' pill state for 'connected but can't read' case - Auto-suggest remoteHome in Test Connection for systemd/Docker installs at /var/lib/hermes/.hermes etc. - Log-noise suppression for expected 'No such file' reads - Diagnostics script pipes via stdin to sh -s (not sh -c argv), so multi-line scripts run in one sh process with variable scope - Pill UX: state-specific SF Symbol instead of dot, no custom background, centered via .principal - README 'Remote setup requirements' + troubleshooting section Investigation notes + deferred follow-ups recorded in the session transcript. See releases/v2.0.1/RELEASE_NOTES.md for the full user-facing breakdown.	2026-04-20 14:27:11 -07:00
Alan Wizemann	bd21a539e6	docs: update v2.0.1 release notes for diagnostics fixes + pill UX Reflect the three post-initial-commit fixes: - log-noise suppression (skill.yaml / optional-file 'No such file' warnings no longer spam Console via the new Result-returning readers) - diagnostics script now stdin-pipes to sh -s instead of sh -c <script> argv, so it runs as one sh process with variable scope preserved - pill UX: replaced colored dot with state-specific SF Symbol (checkmark / stethoscope / arrows / triangle), removed custom background, kept .principal placement for centering Also expanded the 'Known follow-ups' section so users know what's explicitly deferred post-2.0.1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 14:26:33 -07:00
Alan Wizemann	d3055702ef	fix: connection pill — revert to .principal, swap dot for state SF Symbol Rolling back the .primaryAction placement (the pill shifted right and lost its centered position in the toolbar). The "funny background with shadow" visible in the toolbar is macOS's own .principal emphasis bezel — not something Scarf draws, and not something we can cleanly hide without disabling the toolbar surface itself. The native bezel is the pill's frame; we just have to make the pill's interior read well inside it. Two changes to make the pill itself look like a toolbar tool inside that bezel: - Drop the colored dot, replace with a state-specific SF Symbol. The icon's shape signals clickability (looks like a tool button), and its color signals state (green/orange/yellow/red hierarchical). Less "status chip", more "toolbar button with status". - Icons per state: - connected → checkmark.circle.fill (click to re-probe) - degraded → stethoscope (click to run diagnostics, matches the stethoscope on the Manage Servers row) - idle → arrow.triangle.2.circlepath (checking/retry) - error → exclamationmark.triangle.fill (click for stderr) Horizontal padding = 4 so the icon-and-label sit balanced inside the bezel rather than pushed up against its edges. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 14:22:35 -07:00
Alan Wizemann	ee1d705abc	fix: move connection pill off .principal to drop the emphasis bezel macOS applies a centered emphasis bezel (light capsule + drop shadow) to ToolbarItem(placement: .principal) — visible in screenshots as a doubly-framed "capsule behind the pill" look. The pill itself doesn't own that background; the toolbar placement does. .primaryAction (right side of the toolbar) has no decorative background, so the pill renders as just the colored dot + label text directly on the toolbar surface. Fits the intended minimal look. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 14:16:54 -07:00
Alan Wizemann	8e3dafe4c6	fix: remove the pill's own capsule background The toolbar item already draws its own bezel for the principal-placement slot; painting a `Color.secondary.opacity(0.08)` capsule on top gave the pill a doubly-framed look. Drop the pill's background + the padding that was only there to fit inside the capsule. The dot + label now sit directly on the toolbar's native surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 14:14:12 -07:00
Alan Wizemann	c51241dc72	fix: diagnostics script — pipe to sh via stdin, not sh -c argv The previous fix (direct ssh argv, bypassing transport.runProcess) got us from 0/14 to 7/14, but $H was empty everywhere it was referenced — the user's 7/14 report showed: - probe 4 (hermesHomeConfigured): PASS with empty detail - probe 5 (hermesDirExists FAIL): "not a directory:" (empty after colon) - probe 11 (sqlite3CanOpenStateDB FAIL): 'unable to open "/state.db"' Root cause: `ssh host -- /bin/sh -c <script>` doesn't travel as three argv entries to the remote. ssh concatenates them with single spaces into one command string and sends that to the remote's LOGIN shell. The login shell then runs `$LOGIN_SHELL -c "$string"`, and bash's parser treats unquoted newlines inside `$string` as command separators. So the first newline splits the script: `/bin/sh -c H="..."` becomes one command (which runs in an ephemeral sh subprocess that exits immediately), and every subsequent line runs in the login shell with no $H set. TestConnectionProbe happens to still work because its downstream lines don't depend on an assignment from the first line — but the diagnostic script's $H is used everywhere, so the entire script is effectively running with $H="". Fix: pipe the script into `/bin/sh -s` on stdin via ssh's own stdin channel. `sh -s` reads a shell program from stdin and executes it in one process, variable scope preserved. Implementation uses Process.standardInput with a Pipe, writing the script after proc.run() and closing the write end so sh sees EOF. Same as `cat script.sh | ssh host -- /bin/sh -s` from the command line. Also: raw-output disclosure panel in the diagnostics sheet now shows whenever ANY probe fails, not only when all fail. Partial failures are the most common failure mode and the raw stdout is the only way to see why a specific detail came back the way it did. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 14:04:32 -07:00
Alan Wizemann	ec03627bcd	fix: diagnostics sheet — bypass transport.runProcess for shell script First-run of diagnostics against a working Mardon returned 0/14 passing with "(no output)" for every probe — including the trivial "emit connectivity PASS" that the script emits unconditionally. That meant the script wasn't executing as written; the parser saw `__END__` but no probe lines. Root cause: SSHTransport.runProcess wraps every argument through `remotePathArg`, which is designed for PATHS (it rewrites `~/` to `$HOME/` and double-quotes the result with backslash-escapes). Passing a multi-line shell script with embedded `"$1"` / `"$2"` / `"$3"` and `printf '\n'` escape sequences through that is corruption — the remote sh -c receives a scrambled script and silently emits nothing. TestConnectionProbe already works around this: it builds the ssh argv directly (ssh host -- /bin/sh -c <script>) so the script travels as a single opaque argv entry and ssh forwards it to the remote shell unchanged. Mirror that approach. RemoteDiagnosticsViewModel.execute now: - For remote contexts: builds ssh argv directly (ControlMaster-aware, uses the same socket as SSHTransport so it's effectively free after the first connection), then passes /bin/sh -c <script> as argv. - For local contexts: spawns /bin/sh -c <script> via Process directly. Also surfaces raw stdout/stderr/exit-code in a disclosure panel at the bottom of the sheet, visible only when ALL probes fail. Makes any future transport-level breakage self-diagnosing: the user sees exactly what the remote returned, not just "(no output)" rows. Expose SSHTransport.controlDirPath (already static) as a public helper so the diagnostics probe reuses the same ControlMaster socket as the connection itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 13:56:09 -07:00
Alan Wizemann	f8069a4481	fix: don't log 'No such file' as warnings on remote reads The Result-returning readers I added for the v2.0.1 diagnostics surface were logging EVERY failure, including routine "file doesn't exist" cases — e.g. skill.yaml files under ~/.hermes/skills/*/ that are optional metadata, gateway_state.json before Hermes has started, memories/USER.md on fresh installs. In practice this meant the Platforms view and similar feature loaders that walk directories and read optional files now spam the Console with warnings on every refresh. That's noisier than useful and actively hides the signal (permission denied, connection failure, sqlite3 missing) we added the logging to surface. readFileDataResult now detects the "no such file" case via either: - TransportError.fileIO(_, "No such file...") from SSHTransport - NSCocoaErrorDomain code 260 (NSFileNoSuchFileError) from FileManager - NSPOSIXErrorDomain code 2 (ENOENT) and suppresses the warning log for those paths. The Result.failure is still returned, so any caller that cares (Dashboard's banner, Remote Diagnostics) can still distinguish missing from present-but-unreadable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 13:52:28 -07:00
Alan Wizemann	110170d6e9	fix: v2.0.1 — surface remote SSH file-access errors (closes #19) Three users reported on day-one of v2.0 that SSH connections showed a green "Connected" pill but every data view read as empty / "not running" / "not configured". The common thread across Docker, homelab VM, and Ubuntu VPS setups: file-access failures on the remote that Scarf silently swallowed into nil/empty defaults. Stop swallowing errors - HermesFileService gains Result-returning variants for the four dashboard-critical readers: loadConfigResult, loadGatewayStateResult, hermesPIDResult, plus readFileResult / readFileDataResult as primitives. Each logs os.Logger warnings on failure. Legacy nil-returning signatures remain as thin forwarders. - HermesDataService.open records lastOpenError with humanized hints for the top three failure modes — sqlite3 not installed, permission denied, file not found. Each maps to concrete remediation (`apt install sqlite3`, "check file perms", "set Hermes data directory"). Dashboard surfaces the error - DashboardViewModel collects errors from every loader into lastReadError, only on remote contexts (local skips the banner). - DashboardView renders an orange banner above the stats with the specific error text, a copy-selectable detail, and a "Run Diagnostics…" button. New Remote Diagnostics sheet (stethoscope icon) - RemoteDiagnosticsViewModel runs 14 checks in one SSH round-trip via a pipe-delimited "KEY|STATUS|DETAIL" protocol. Covers: SSH connectivity, remote user/$HOME, Hermes dir existence + readability, config.yaml readability + actual read (distinct from just `test -e` which can't detect permission issues), state.db readability, sqlite3 binary presence, sqlite3 open test, hermes binary on non-login AND login PATH, pgrep availability. - Each probe row shows a targeted hint on fail (e.g. "check perms on ~/.hermes", "apt install sqlite3", "move PATH export from .bashrc to .zshenv"). A Copy Full Report button dumps plain-text output for GitHub issues. - Accessible from Manage Servers (stethoscope button per row) and directly from the yellow pill. Yellow "degraded" connection state - ConnectionStatusViewModel.Status gains .degraded(reason:) between .connected and .error. After tier-1 `true` passes, the probe runs tier-2 `test -r $HOME/.hermes/config.yaml` in the same SSH round-trip. On tier-2 fail, pill is orange with "Connected — can't read Hermes state" tooltip. - Clicking a degraded pill opens Remote Diagnostics directly. Exactly the symptom in #19 is now one click from a specific answer. Auto-suggest remoteHome for non-default installs - TestConnectionProbe.TestResult.success gains suggestedRemoteHome: String?. When state.db isn't found at the configured path, the probe also checks /var/lib/hermes/.hermes, /opt/hermes/.hermes, /home/hermes/.hermes, /root/.hermes — the common alternates for systemd services, Docker containers, and single-user VPSes — and surfaces the first hit as a "Use this" suggestion in Add Server. - AddServerSheet relabels "Remote ~/.hermes override" to "Hermes data directory" with an explanation of when you'd use it. README - New "Remote setup requirements" subsection lists the four concrete prereqs (SSH, sqlite3, pgrep, read access to ~/.hermes). - New "Troubleshooting remote connections" paragraph describes the diagnostics sheet and remoteHome auto-suggest for the two most common failure modes. Releases - releases/v2.0.1/RELEASE_NOTES.md for the GitHub release body. - Ship via `./scripts/release.sh 2.0.1`. Closes #19. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 13:40:35 -07:00
Alex Maksimchuk	1293cfa23b	fix: use short ControlPath to avoid Unix socket limit on macOS The ControlMaster socket path ~/Library/Caches/scarf/ssh/%C can exceed the 104-byte macOS Unix domain socket limit when the username is long, causing ssh to silently exit 255 with "unix_listener: path too long for Unix domain socket". Switch to /tmp/scarf-ssh-<uid> which stays well within the limit.	2026-04-19 23:03:28 -05:00
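
The commit messages above describe two mechanisms that work together: feeding a multi-line script to `sh -s` on stdin so it runs in one process, and reporting each probe as a pipe-delimited `KEY|STATUS|DETAIL` line terminated by `__END__`. The sketch below shows both under stated assumptions: it runs against the local `/bin/sh` instead of going through ssh, the two probe names are made up for illustration, and `run_probes` and `summarize` are hypothetical helpers rather than Scarf's implementation.

```shell
#!/usr/bin/env bash
# Hypothetical two-probe script. H is assigned on the first line and used
# later, which only works if every line runs in the same sh process.
script='H="/"
printf "%s|%s|%s\n" connectivity PASS "shell started"
if [ -d "$H" ]; then
  printf "%s|%s|%s\n" dirExists PASS "$H"
else
  printf "%s|%s|%s\n" dirExists FAIL "not a directory: $H"
fi
echo __END__'

# Send the whole program to one sh process on stdin. The commit message
# gives the remote equivalent as: cat script.sh | ssh host -- /bin/sh -s
run_probes() {
  printf '%s\n' "$script" | /bin/sh -s
}

# Parse KEY|STATUS|DETAIL lines from stdin and print "passed/total".
# The last field of `read` takes the rest of the line, so a detail that
# itself contains '|' is kept whole.
summarize() {
  local key status detail passed=0 total=0
  while IFS='|' read -r key status detail; do
    if [ "$key" = "__END__" ]; then
      break
    fi
    total=$((total + 1))
    if [ "$status" = "PASS" ]; then
      passed=$((passed + 1))
    fi
  done
  echo "$passed/$total"
}

run_probes | summarize
```

Because the script arrives on stdin, `H` is still set when the second probe runs. That is the property the `sh -c` argv approach lost once the login shell split the script at its first newline.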