Merge fix/issue-19-ssh-diagnostics into main — v2.0.1 hotfix

Closes #19 (remote SSH connections showed connected but every view read as empty). Eight commits bring: - Result-returning readers in HermesFileService that surface errors instead of silently returning nil - HermesDataService.open records lastOpenError with humanized hints - Dashboard orange banner when remote reads fail - New Remote Diagnostics sheet (14-probe checklist, stethoscope icon) - Yellow 'degraded' pill state for 'connected but can't read' case - Auto-suggest remoteHome in Test Connection for systemd/Docker installs at /var/lib/hermes/.hermes etc. - Log-noise suppression for expected 'No such file' reads - Diagnostics script pipes via stdin to sh -s (not sh -c argv), so multi-line scripts run in one sh process with variable scope - Pill UX: state-specific SF Symbol instead of dot, no custom background, centered via .principal - README 'Remote setup requirements' + troubleshooting section Investigation notes + deferred follow-ups recorded in the session transcript. See releases/v2.0.1/RELEASE_NOTES.md for the full user-facing breakdown.
2026-05-10 10:36:35 +00:00 · 2026-04-20 14:27:11 -07:00
parent ba8bf14ff0 bd21a539e6
commit 1453c7a841
15 changed files with 1251 additions and 64 deletions
@@ -42,6 +42,21 @@ Scarf 2.0 is a multi-window app. Each window is bound to exactly one Hermes serv

 Remote Hermes is reached over system SSH — the same `~/.ssh/config`, ssh-agent, ProxyJump, and ControlMaster pooling your terminal uses. File I/O flows through `scp`/`sftp`; SQLite is served from atomic `sqlite3 .backup` snapshots cached under `~/Library/Caches/scarf/snapshots/<server-id>/`; chat (ACP) tunnels as `ssh -T host -- hermes acp` with JSON-RPC over stdio end-to-end. Everything in the feature list below works against remote identically to local.

+### Remote setup requirements
+
+The remote host must have:
+
+1. **SSH access** — key-based auth via your local ssh-agent. Scarf never prompts for passphrases; run `ssh-add` once in Terminal before connecting.
+2. **`sqlite3`** on the remote `$PATH` — needed for the atomic DB snapshots. Install on the remote with `apt install sqlite3` (Ubuntu/Debian), `yum install sqlite` (RHEL/Fedora), or `apk add sqlite` (Alpine).
+3. **`pgrep`** on the remote `$PATH` — used by the Dashboard "is Hermes running" check. Standard on every distro; install `procps` if missing.
+4. **`~/.hermes/` readable by the SSH user**. When Hermes runs as a separate user (systemd service, Docker container), the SSH user needs read access to `config.yaml` and `state.db`. Either (a) SSH as the Hermes user, (b) `chmod` Hermes's home to be group-readable and add your SSH user to that group, or (c) set the **Hermes data directory** field when adding the server to point at the right location (e.g. `/var/lib/hermes/.hermes`).
+
+### Troubleshooting remote connections
+
+If the connection pill is green but the Dashboard shows "Stopped", "unknown", or empty values, the SSH user can't read the Hermes state files. Open **Manage Servers → 🩺 Run Diagnostics** (or click the yellow "Can't read Hermes state" pill in the toolbar). The diagnostics sheet runs fourteen checks in one SSH session — connectivity, `sqlite3` presence, read access to `config.yaml` and `state.db`, the effective non-login `$PATH` — and tells you exactly which one fails and why, with remediation hints for each. Use the **Copy Full Report** button to paste the full output into a bug report.
+
+For the common "Hermes isn't at the default path" case (systemd services, Docker), **Test Connection** in the Add Server sheet now probes `/var/lib/hermes/.hermes`, `/opt/hermes/.hermes`, `/home/hermes/.hermes`, and `/root/.hermes` when it can't find `state.db` at `~/.hermes/`, and offers a one-click fill if it finds any of them.
+
 ## Features

 Scarf mirrors Hermes's surface area through a sidebar-based UI. Sections below map 1:1 to the app's sidebar.
@@ -0,0 +1,68 @@
+## What's New in 2.0.1
+
+Hotfix for [#19](https://github.com/awizemann/scarf/issues/19) and the related reports from the first day of v2.0: users' remote SSH connections would show a green "Connected" pill but every view (Dashboard, Sessions, Activity, Chat) read as empty / "not running" / "not configured". Three distinct environments reported it — Docker Hermes on a LAN, homelab VM over Tailscale, Ubuntu VPS — and every one was a silent file-access failure on the remote that Scarf wasn't surfacing.
+
+### Errors no longer disappear
+
+Every remote read (`config.yaml`, `gateway_state.json`, `state.db`, `pgrep`) used to silently substitute an empty value on *any* failure — permission denied, missing file, `sqlite3` not installed, connection drop — they all looked identical to the UI. Now:
+
+- Each failure logs a specific warning via `os.Logger` (visible in Console.app under subsystem `com.scarf`).
+- The Dashboard shows an orange banner above the stats with the exact error (e.g. "Permission denied reading `~/.hermes/state.db`") and a **Run Diagnostics…** button.
+- `HermesDataService` exposes a `lastOpenError` so views can explain *why* state.db couldn't be opened, rather than just rendering zeros.
+- Routine "file doesn't exist" cases (optional `skill.yaml` metadata, `gateway_state.json` before Hermes starts, `memories/USER.md` on fresh installs) are detected and **not** logged as warnings — only real errors (permission denied, connection drops, `sqlite3` missing) hit the log. Prevents Console from filling with false-positive noise when directory walks encounter optional files.
+
+### New Remote Diagnostics sheet
+
+Accessible from **Manage Servers → 🩺** per-server button, or by clicking the orange connection pill when Scarf can see the server but can't read Hermes state. Runs fourteen checks in a single SSH session and shows pass/fail for each, plus a targeted hint per failure:
+
+- SSH connectivity and auth
+- Remote user identity and `$HOME` resolution
+- `~/.hermes` directory existence and readability
+- `config.yaml` readable (existence *and* actual read access — the old probe only checked existence)
+- `state.db` readable
+- `sqlite3` installed on the remote (required for the atomic snapshot Scarf pulls)
+- `sqlite3` can actually open `state.db`
+- `hermes` binary on the non-login `$PATH` (what runtime uses)
+- `hermes` binary on the login `$PATH` (what the Test Connection probe uses)
+- `pgrep` available (for the "is Hermes running" check)
+
+One **Copy Full Report** button dumps every check as plain text for bug reports, and a raw-output disclosure panel shows the exact stdout/stderr the remote returned whenever any probe fails — so transport-level problems are self-diagnosing.
+
+The diagnostics script is piped to `/bin/sh -s` on stdin rather than passed as `sh -c <script>` argv. The latter was getting split line-by-line by the remote's login shell (newlines parsed as command separators), which stranded variables set on line 1 in an ephemeral `sh` subprocess that exited before line 2 could use them. Stdin-piping runs the whole script in one `sh` process with variable scope preserved.
+
+### Connection pill gains a "degraded" state
+
+The pill used to be green as long as SSH connected; now after connectivity passes it runs a second-tier check (`test -r $HOME/.hermes/config.yaml`). If that fails, the pill turns **orange** with "Connected — can't read Hermes state" and clicking it opens Remote Diagnostics directly. This is the exact symptom mode in #19, and it's now one click away from a specific answer.
+
+The pill's visual also got a pass: the colored dot is replaced with a state-specific SF Symbol (`checkmark.circle.fill` / `stethoscope` / `arrow.triangle.2.circlepath` / `exclamationmark.triangle.fill`), which reads more like a clickable toolbar tool and doubles as the status signal. No custom pill background anymore — the toolbar's native `.principal` bezel is the frame.
+
+### Auto-suggest the correct `remoteHome` during Add Server
+
+When Test Connection can't find `state.db` at the configured (or default) path, it now also probes the common alternate locations — `/var/lib/hermes/.hermes`, `/opt/hermes/.hermes`, `/home/hermes/.hermes`, `/root/.hermes` — and offers a one-click "Use this" fill if it finds one. Removes the need to know that systemd-installed Hermes lives at `/var/lib/hermes/.hermes` by convention.
+
+### Clearer copy for the `remoteHome` field
+
+The Add Server sheet field is now labeled "Hermes data directory" with a description explaining when you'd override it (systemd service installs, Docker sidecars) and noting that Test Connection auto-suggests.
+
+### README has a new "Remote setup requirements" section
+
+Four concrete prerequisites (SSH, `sqlite3`, `pgrep`, read access to `~/.hermes`) and a troubleshooting paragraph pointing at Remote Diagnostics.
+
+### Migrating from 2.0.0
+
+Sparkle will offer the update automatically. Settings and server list are preserved verbatim — this is purely additive (new diagnostics surface, new error banners, auto-suggest in Test Connection). If you were affected by #19, run Remote Diagnostics after updating; the sheet should pinpoint the specific file access issue and suggest a fix.
+
+### Under the hood
+
+- New types: `RemoteDiagnosticsViewModel`, `RemoteDiagnosticsView`. Both are local to Scarf; no new transport protocol.
+- `HermesFileService` gains `loadConfigResult()`, `loadGatewayStateResult()`, `hermesPIDResult()`, `readFileResult()`, `readFileDataResult()` — Result-returning variants that preserve the error. Legacy `loadConfig()` etc. still exist as thin forwarders for callers that don't need diagnostics.
+- `HermesDataService.open()` records `lastOpenError` with humanized hints for "sqlite3 not installed", "permission denied", and "file not found" — the three failure modes that produce 90% of issue #19 symptoms.
+- `ConnectionStatusViewModel` status enum gains `.degraded(reason:)` between `.connected` and `.error`.
+- `TestConnectionProbe` result enum gains `suggestedRemoteHome: String?` carrying any alternate-location hit.
+
+### Known follow-ups (not in 2.0.1)
+
+- `TestConnectionProbe` uses a direct-argv ssh invocation that's functionally correct but fragile (works by accident when split across the login shell). Should be ported to the stdin-pipe pattern the diagnostics sheet now uses.
+- Remaining `try?`-swallowed read paths beyond the four Dashboard-surfacing ones — Cron, Memory, Skills, MCP Servers, Platforms still silently render empty on read errors. Same fix pattern applies, low priority.
+- `hermesBinaryHint` is only populated when the user clicks Test Connection; if they skip it, ACP chat and CLI calls fall back to bare `hermes` which requires it on the non-interactive PATH (rarely true for `~/.local/bin` installs). The connection-pill's second-tier probe could auto-populate this.
+- Docker-host support: when users SSH to a Docker host, `pgrep` and `~/.hermes/` on the host don't see what's inside the container. Needs a `docker exec` wrapping option per server.
@@ -21,6 +21,10 @@ struct ContentView: View {
                        ServerSwitcherToolbar()
                    }
                    if serverContext.isRemote {
+                        // `.principal` centers the pill in the toolbar —
+                        // the native emphasis bezel is the intended frame;
+                        // the pill's own visual content (icon + label, no
+                        // background) sits inside it in balance.
                        ToolbarItem(placement: .principal) {
                            ConnectionStatusPill(status: connectionStatus)
                        }
@@ -1,5 +1,6 @@
 import Foundation
 import SQLite3
+import os

 /// Dedupes concurrent `snapshotSQLite` calls for the same server. When the
 /// file watcher ticks, Dashboard + Sessions + Activity (+ Chat's loadHistory)
@@ -29,11 +30,18 @@ actor SnapshotCoordinator {
 }

 actor HermesDataService {
+    private static let logger = Logger(subsystem: "com.scarf", category: "HermesDataService")
+
    private var db: OpaquePointer?
    private var hasV07Schema = false
    /// Local filesystem path we last opened. For remote contexts this is
    /// the cached snapshot under `~/Library/Caches/scarf/snapshots/<id>/`.
    private var openedAtPath: String?
+    /// Last error from `open()` / `refresh()`, user-presentable. `nil` means
+    /// the last attempt succeeded. Views surface this when their own load
+    /// path fails, so the user sees "Permission denied reading state.db"
+    /// instead of an empty Dashboard with no explanation.
+    private(set) var lastOpenError: String?

    let context: ServerContext
    private let transport: any ServerTransport
@@ -52,16 +60,25 @@ actor HermesDataService {
            // corrupt. Routed through SnapshotCoordinator so concurrent
            // view models don't each spawn a parallel SSH backup for the
            // same server.
-            let url = try? await SnapshotCoordinator.shared.snapshot(
-                remotePath: context.paths.stateDB,
-                contextID: context.id,
-                transport: transport
-            )
-            guard let url else { return false }
-            localPath = url.path
+            do {
+                let url = try await SnapshotCoordinator.shared.snapshot(
+                    remotePath: context.paths.stateDB,
+                    contextID: context.id,
+                    transport: transport
+                )
+                localPath = url.path
+                lastOpenError = nil
+            } catch {
+                lastOpenError = humanize(error)
+                Self.logger.warning("snapshotSQLite failed: \(error.localizedDescription, privacy: .public)")
+                return false
+            }
        } else {
            localPath = context.paths.stateDB
-            guard FileManager.default.fileExists(atPath: localPath) else { return false }
+            guard FileManager.default.fileExists(atPath: localPath) else {
+                lastOpenError = "Hermes state database not found at \(localPath)."
+                return false
+            }
        }
        // Remote snapshots are point-in-time copies that no one writes to;
        // opening them with `immutable=1` tells SQLite to skip WAL/SHM and
@@ -81,14 +98,41 @@ actor HermesDataService {
        }
        let result = sqlite3_open_v2(openPath, &db, flags, nil)
        guard result == SQLITE_OK else {
+            let msg: String
+            if let db {
+                msg = String(cString: sqlite3_errmsg(db))
+            } else {
+                msg = "sqlite3_open_v2 returned \(result)"
+            }
+            lastOpenError = "Couldn't open state.db: \(msg)"
+            Self.logger.warning("sqlite3_open_v2 failed (\(result)) at \(localPath, privacy: .public): \(msg, privacy: .public)")
            db = nil
            return false
        }
        openedAtPath = localPath
+        lastOpenError = nil
        detectSchema()
        return true
    }

+    /// Turn a transport error into the one-line string Dashboard shows. Adds
+    /// hints for the common "sqlite3 not installed" and "permission denied"
+    /// cases so users know what to do.
+    private nonisolated func humanize(_ error: Error) -> String {
+        let desc = (error as? LocalizedError)?.errorDescription ?? error.localizedDescription
+        let lower = desc.lowercased()
+        if lower.contains("sqlite3: command not found") || lower.contains("sqlite3: not found") {
+            return "sqlite3 is not installed on \(context.displayName). Install it with `apt install sqlite3` (Ubuntu/Debian) or `yum install sqlite` (RHEL/Fedora)."
+        }
+        if lower.contains("permission denied") {
+            return "Permission denied reading Hermes state on \(context.displayName). The SSH user may not have read access to ~/.hermes/state.db — try Run Diagnostics."
+        }
+        if lower.contains("no such file") {
+            return "Hermes state not found at ~/.hermes on \(context.displayName). If Hermes is installed elsewhere, set its data directory in Manage Servers."
+        }
+        return desc
+    }
+
    /// Force a fresh snapshot pull + reopen. Used on session-load and in
    /// any path that needs the UI to reflect writes Hermes just made.
    /// Without this, remote snapshots would be frozen at the first `open()`
@@ -1,7 +1,10 @@
 import Foundation
+import os

 struct HermesFileService: Sendable {

+    nonisolated static let logger = Logger(subsystem: "com.scarf", category: "HermesFileService")
+
    let context: ServerContext
    let transport: any ServerTransport

@@ -17,6 +20,14 @@ struct HermesFileService: Sendable {
        return parseConfig(content)
    }

+    /// Error-surfacing config load. Used by Dashboard to show the user a
+    /// specific reason when config.yaml can't be read on a remote host
+    /// (permission denied, missing file, sqlite3 not installed, etc.)
+    /// instead of silently falling back to `.empty`.
+    nonisolated func loadConfigResult() -> Result<HermesConfig, Error> {
+        readFileResult(context.paths.configYAML).map { parseConfig($0) }
+    }
+
    nonisolated private func parseConfig(_ yaml: String) -> HermesConfig {
        let parsed = Self.parseNestedYAML(yaml)
        let values = parsed.values
@@ -385,11 +396,34 @@ struct HermesFileService: Sendable {
        do {
            return try JSONDecoder().decode(GatewayState.self, from: data)
        } catch {
-            print("[Scarf] Failed to decode gateway state: \(error.localizedDescription)")
+            Self.logger.warning("Failed to decode gateway state: \(error.localizedDescription, privacy: .public)")
            return nil
        }
    }

+    /// Error-surfacing gateway-state load. `.success(nil)` means the file
+    /// doesn't exist yet (gateway hasn't written state — normal when Hermes
+    /// is stopped). `.failure` means the file exists but couldn't be read
+    /// (permission denied, connection down, JSON corruption).
+    nonisolated func loadGatewayStateResult() -> Result<GatewayState?, Error> {
+        // Distinguish "file doesn't exist yet" (normal, returns .success(nil))
+        // from "file exists but we can't read or parse it" (error).
+        if !transport.fileExists(context.paths.gatewayStateJSON) {
+            return .success(nil)
+        }
+        switch readFileDataResult(context.paths.gatewayStateJSON) {
+        case .success(let data):
+            do {
+                return .success(try JSONDecoder().decode(GatewayState.self, from: data))
+            } catch {
+                Self.logger.warning("Failed to decode gateway state: \(error.localizedDescription, privacy: .public)")
+                return .failure(error)
+            }
+        case .failure(let err):
+            return .failure(err)
+        }
+    }
+
    // MARK: - Memory

    nonisolated func loadMemoryProfiles() -> [String] {
@@ -1173,22 +1207,45 @@ struct HermesFileService: Sendable {
    }

    nonisolated func hermesPID() -> pid_t? {
-        // Run `pgrep -f hermes` either locally or via the transport. On
-        // remote hosts we trust `pgrep` to be present — it's standard on
-        // Linux and macOS. On failure we conservatively return nil rather
-        // than pretending Hermes is down: the caller will see
-        // isHermesRunning==false, which is already the "unknown" UX.
-        let result = try? transport.runProcess(
-            executable: "/usr/bin/pgrep",
-            args: ["-f", "hermes"],
-            stdin: nil,
-            timeout: 5
-        )
-        guard let result, let firstLine = result.stdoutString
-            .components(separatedBy: "\n")
-            .first(where: { !$0.isEmpty }),
-              let pid = pid_t(firstLine.trimmingCharacters(in: .whitespaces)) else { return nil }
-        return pid
+        switch hermesPIDResult() {
+        case .success(let pid): return pid
+        case .failure: return nil
+        }
+    }
+
+    /// Error-surfacing variant. `.success(nil)` means `pgrep` ran successfully
+    /// and found no hermes process (Hermes is genuinely not running).
+    /// `.failure` means we couldn't probe at all (pgrep missing, connection
+    /// down, permission issue) — a *different* UX from "not running".
+    nonisolated func hermesPIDResult() -> Result<pid_t?, Error> {
+        do {
+            let result = try transport.runProcess(
+                executable: "/usr/bin/pgrep",
+                args: ["-f", "hermes"],
+                stdin: nil,
+                timeout: 5
+            )
+            // pgrep exits 1 when nothing matches — that's "not running", NOT an
+            // error. Anything else (127=command not found, 255=ssh failure) is.
+            if result.exitCode == 0 {
+                if let firstLine = result.stdoutString
+                    .components(separatedBy: "\n")
+                    .first(where: { !$0.isEmpty }),
+                   let pid = pid_t(firstLine.trimmingCharacters(in: .whitespaces)) {
+                    return .success(pid)
+                }
+                return .success(nil)
+            } else if result.exitCode == 1 {
+                return .success(nil)   // genuinely not running
+            } else {
+                let err = TransportError.commandFailed(exitCode: result.exitCode, stderr: result.stderrString)
+                Self.logger.warning("pgrep failed (exit \(result.exitCode)): \(result.stderrString, privacy: .public)")
+                return .failure(err)
+            }
+        } catch {
+            Self.logger.warning("pgrep transport error: \(error.localizedDescription, privacy: .public)")
+            return .failure(error)
+        }
    }

    @discardableResult
@@ -1488,15 +1545,80 @@ struct HermesFileService: Sendable {
    // MARK: - File I/O

    /// Read a UTF-8 text file through the transport. Missing files and any
-    /// transport error surface as `nil` — callers treat missing/unreadable
-    /// the same way they always have.
+    /// transport error surface as `nil` — callers that don't need the
+    /// specific error reason keep using this. New call sites that want to
+    /// show a user-actionable message should use `readFileResult`.
    nonisolated private func readFile(_ path: String) -> String? {
-        guard let data = try? transport.readFile(path) else { return nil }
-        return String(data: data, encoding: .utf8)
+        switch readFileResult(path) {
+        case .success(let s):
+            return s
+        case .failure:
+            return nil
+        }
    }

    nonisolated private func readFileData(_ path: String) -> Data? {
-        try? transport.readFile(path)
+        switch readFileDataResult(path) {
+        case .success(let d):
+            return d
+        case .failure:
+            return nil
+        }
+    }
+
+    /// Error-surfacing read. Returns the decoded text on success, or the
+    /// underlying `TransportError` (or raw error for local failures) on
+    /// failure. Every failure is also logged via `os.Logger` — the warning
+    /// trail in Console.app is how we diagnose "connection green, data
+    /// empty" bug reports without needing to wire the error through every
+    /// existing call site.
+    nonisolated func readFileResult(_ path: String) -> Result<String, Error> {
+        switch readFileDataResult(path) {
+        case .success(let data):
+            guard let s = String(data: data, encoding: .utf8) else {
+                let err = TransportError.fileIO(path: path, underlying: "file is not valid UTF-8")
+                Self.logger.warning("readFile(\(path, privacy: .public)): not UTF-8")
+                return .failure(err)
+            }
+            return .success(s)
+        case .failure(let err):
+            return .failure(err)
+        }
+    }
+
+    nonisolated func readFileDataResult(_ path: String) -> Result<Data, Error> {
+        do {
+            let data = try transport.readFile(path)
+            return .success(data)
+        } catch {
+            // Don't log "No such file" — that's a routine, expected case
+            // for optional files (skill.yaml, gateway_state.json before
+            // Hermes starts, ~/.hermes/memories/USER.md on fresh installs,
+            // etc.). The caller still gets the Result.failure so it can
+            // distinguish missing from present-but-unreadable.
+            // Log everything else — permission denied, connection drops,
+            // sqlite3 missing — since those are actionable diagnostics.
+            if !Self.isFileNotFound(error) {
+                Self.logger.warning("readFile(\(path, privacy: .public)) failed: \(error.localizedDescription, privacy: .public)")
+            }
+            return .failure(error)
+        }
+    }
+
+    /// `true` iff the error represents "file does not exist" as opposed to
+    /// a permission / transport / parse failure. Used to suppress routine
+    /// logging for optional files while still surfacing real problems.
+    nonisolated private static func isFileNotFound(_ error: Error) -> Bool {
+        if let transportErr = error as? TransportError,
+           case .fileIO(_, let underlying) = transportErr {
+            return underlying.lowercased().contains("no such file")
+        }
+        // Cocoa NSFileNoSuchFileError (returned by LocalTransport when
+        // reading a missing file via FileManager).
+        let ns = error as NSError
+        if ns.domain == NSCocoaErrorDomain && ns.code == 260 { return true }
+        if ns.domain == NSPOSIXErrorDomain && ns.code == 2 { return true }   // ENOENT
+        return false
    }

    /// Write a UTF-8 text file atomically through the transport. Matches the
@@ -21,28 +21,73 @@ final class DashboardViewModel {
    var hermesRunning = false
    var isLoading = true

+    /// User-presentable error banner. Set when any of the remote reads
+    /// (state.db snapshot, config.yaml, gateway_state.json, pgrep) failed
+    /// in a way that's not just "file doesn't exist yet". Dashboard renders
+    /// this above the stats with a "Run Diagnostics…" button. `nil` = no
+    /// surfaceable error.
+    var lastReadError: String?
+
    func load() async {
        isLoading = true
        // refresh() = close + reopen, forces a fresh remote snapshot. Cheap
        // on local (live DB reopen).
        let opened = await dataService.refresh()
+        var collectedErrors: [String] = []
        if opened {
            stats = await dataService.fetchStats()
            recentSessions = await dataService.fetchSessions(limit: 5)
            sessionPreviews = await dataService.fetchSessionPreviews(limit: 5)
            await dataService.close()
+        } else if let msg = await dataService.lastOpenError {
+            collectedErrors.append(msg)
        }
        // The fileService methods are synchronous and route through the
        // transport. For remote contexts each call is a blocking ssh
        // round-trip — do them off the main thread to avoid spinning the
        // beach ball during the load.
        let svc = fileService
-        let (cfg, gw, running) = await Task.detached {
-            (svc.loadConfig(), svc.loadGatewayState(), svc.isHermesRunning())
+        struct LoadResults: Sendable {
+            let cfg: Result<HermesConfig, Error>
+            let gw: Result<GatewayState?, Error>
+            let running: Result<pid_t?, Error>
+        }
+        let results = await Task.detached { () -> LoadResults in
+            LoadResults(
+                cfg: svc.loadConfigResult(),
+                gw: svc.loadGatewayStateResult(),
+                running: svc.hermesPIDResult()
+            )
        }.value
-        config = cfg
-        gatewayState = gw
-        hermesRunning = running
+
+        switch results.cfg {
+        case .success(let c): config = c
+        case .failure(let e):
+            config = .empty
+            collectedErrors.append("config.yaml — \(e.localizedDescription)")
+        }
+        switch results.gw {
+        case .success(let g): gatewayState = g
+        case .failure(let e):
+            gatewayState = nil
+            collectedErrors.append("gateway_state.json — \(e.localizedDescription)")
+        }
+        switch results.running {
+        case .success(let pid): hermesRunning = (pid != nil)
+        case .failure(let e):
+            hermesRunning = false
+            collectedErrors.append("pgrep — \(e.localizedDescription)")
+        }
+
+        // Only surface when there's a real error AND we're on a remote
+        // context. Local contexts rarely hit these paths (live DB, local
+        // filesystem), and a transient "file doesn't exist yet" on fresh
+        // installs shouldn't scare users.
+        if context.isRemote, !collectedErrors.isEmpty {
+            lastReadError = collectedErrors.joined(separator: "\n")
+        } else {
+            lastReadError = nil
+        }
        isLoading = false
    }
 }
@@ -2,6 +2,7 @@ import SwiftUI

 struct DashboardView: View {
    @State private var viewModel: DashboardViewModel
+    @State private var showDiagnostics = false
    @Environment(AppCoordinator.self) private var coordinator
    @Environment(HermesFileWatcher.self) private var fileWatcher

@@ -13,6 +14,9 @@ struct DashboardView: View {
    var body: some View {
        ScrollView {
            VStack(alignment: .leading, spacing: 20) {
+                if let err = viewModel.lastReadError {
+                    readErrorBanner(err)
+                }
                statusSection
                statsSection
                recentSessionsSection
@@ -30,6 +34,44 @@ struct DashboardView: View {
        .onChange(of: fileWatcher.lastChangeDate) {
            Task { await viewModel.load() }
        }
+        .sheet(isPresented: $showDiagnostics) {
+            RemoteDiagnosticsView(context: viewModel.context)
+        }
+    }
+
+    /// Banner shown above the Dashboard when one or more remote reads
+    /// failed (permission denied, missing sqlite3, wrong home dir, etc.).
+    /// Replaces the old silent-failure mode where empty values just
+    /// appeared as "Stopped / unknown / 0" with no explanation.
+    private func readErrorBanner(_ err: String) -> some View {
+        VStack(alignment: .leading, spacing: 8) {
+            HStack(alignment: .top, spacing: 8) {
+                Image(systemName: "exclamationmark.triangle.fill")
+                    .foregroundStyle(.orange)
+                VStack(alignment: .leading, spacing: 4) {
+                    Text("Can't read Hermes state on \(viewModel.context.displayName)")
+                        .font(.headline)
+                    Text(err)
+                        .font(.caption.monospaced())
+                        .foregroundStyle(.secondary)
+                        .textSelection(.enabled)
+                        .fixedSize(horizontal: false, vertical: true)
+                }
+                Spacer()
+                Button {
+                    showDiagnostics = true
+                } label: {
+                    Label("Run Diagnostics…", systemImage: "stethoscope")
+                }
+                .controlSize(.regular)
+            }
+        }
+        .padding(12)
+        .background(Color.orange.opacity(0.1), in: RoundedRectangle(cornerRadius: 8))
+        .overlay(
+            RoundedRectangle(cornerRadius: 8)
+                .strokeBorder(Color.orange.opacity(0.3), lineWidth: 1)
+        )
    }

    private var statusSection: some View {
@@ -22,7 +22,12 @@ final class AddServerViewModel {
    var testResult: TestResult?

    enum TestResult: Equatable {
-        case success(hermesPath: String, dbFound: Bool)
+        /// `suggestedRemoteHome` is non-nil when the probe didn't find
+        /// state.db at the configured (or default) path but did find a
+        /// `state.db` at one of the well-known alternates (e.g. a systemd
+        /// install in `/var/lib/hermes/.hermes`). UI offers a one-click
+        /// fill so the user doesn't have to know the convention.
+        case success(hermesPath: String, dbFound: Bool, suggestedRemoteHome: String?)
        /// `command` is the full ssh invocation we attempted (so the user can
        /// paste it into Terminal to see what their shell does with it).
        /// `stderr` is whatever ssh / the remote shell wrote to stderr.
@@ -95,7 +100,7 @@ final class AddServerViewModel {
    /// `hermesBinaryHint` so subsequent calls don't need to re-resolve it.
    func configForSave() -> SSHConfig {
        var cfg = draftConfig
-        if case .success(let path, _) = testResult {
+        if case .success(let path, _, _) = testResult {
            cfg.hermesBinaryHint = path
        }
        return cfg
@@ -11,8 +11,12 @@ final class ConnectionStatusViewModel {
    private let logger = Logger(subsystem: "com.scarf", category: "ConnectionStatus")

    enum Status: Equatable {
-        /// Healthy: most recent probe succeeded.
+        /// Healthy: SSH connected AND we can read `~/.hermes/config.yaml`.
        case connected
+        /// SSH connects but the follow-up read-access probe failed. Data
+        /// views will be empty until this is resolved. `reason` is shown
+        /// in the pill tooltip; users click the pill to open diagnostics.
+        case degraded(reason: String)
        /// No probe yet or the previous probe timed out but we haven't
        /// confirmed failure. Shown as yellow to tell the user "checking…".
        case idle
@@ -72,20 +76,59 @@ final class ConnectionStatusViewModel {

    private func probeOnce() async {
        let snapshot = transport
-        let result: Result<Void, TransportError>
-        // Transport IO on a detached task so we don't block MainActor.
-        result = await Task.detached {
+        let hermesHome = context.paths.home
+        // Two-tier probe in one SSH round-trip:
+        //   tier 1: `true` — raw connectivity / auth / ControlMaster path
+        //   tier 2: `test -r $HERMESHOME/config.yaml` — can we actually
+        //           read the file Dashboard reads on every tick? Green pill
+        //           only if both pass; yellow "degraded" if tier 1 passes
+        //           but tier 2 fails (the exact symptom in issue #19).
+        // Script emits two lines: TIER1:<exitcode> and TIER2:<exitcode>.
+        let homeArg: String
+        if hermesHome.hasPrefix("~/") {
+            homeArg = "\"$HOME/\(hermesHome.dropFirst(2))\""
+        } else if hermesHome == "~" {
+            homeArg = "\"$HOME\""
+        } else {
+            homeArg = "\"\(hermesHome.replacingOccurrences(of: "\"", with: "\\\""))\""
+        }
+        let script = """
+        echo TIER1:0
+        H=\(homeArg)
+        if [ -r "$H/config.yaml" ]; then echo TIER2:0; else echo TIER2:1; fi
+        """
+
+        enum ProbeOutcome {
+            case connected
+            case degraded(reason: String)
+            case failure(TransportError)
+        }
+
+        let outcome: ProbeOutcome = await Task.detached {
            do {
                let probe = try snapshot.runProcess(
                    executable: "/bin/sh",
-                    args: ["-c", "true"],
+                    args: ["-c", script],
                    stdin: nil,
                    timeout: 10
                )
-                if probe.exitCode == 0 {
-                    return .success(())
+                guard probe.exitCode == 0 else {
+                    return .failure(.commandFailed(exitCode: probe.exitCode, stderr: probe.stderrString))
                }
-                return .failure(.commandFailed(exitCode: probe.exitCode, stderr: probe.stderrString))
+                let out = probe.stdoutString
+                let tier1 = out.contains("TIER1:0")
+                let tier2 = out.contains("TIER2:0")
+                if !tier1 {
+                    // The script itself didn't reach tier 1 — treat as connection failure.
+                    return .failure(.commandFailed(exitCode: 1, stderr: out))
+                }
+                if tier2 {
+                    return .connected
+                }
+                // Connected but can't read config.yaml — the core issue #19
+                // symptom. Give the pill a short reason; the full story goes
+                // into Remote Diagnostics.
+                return .degraded(reason: "can't read ~/.hermes/config.yaml")
            } catch let e as TransportError {
                return .failure(e)
            } catch {
@@ -93,11 +136,15 @@ final class ConnectionStatusViewModel {
            }
        }.value

-        switch result {
-        case .success:
+        switch outcome {
+        case .connected:
            status = .connected
            lastSuccess = Date()
            consecutiveFailures = 0
+        case .degraded(let reason):
+            status = .degraded(reason: reason)
+            lastSuccess = Date()   // SSH itself is fine, reset failure count
+            consecutiveFailures = 0
        case .failure(let err):
            consecutiveFailures += 1
            // First failure → silent yellow "Reconnecting…" while we try
@@ -0,0 +1,474 @@
+import Foundation
+import os
+
+/// Runs a fixed check-list against a remote server and reports per-probe
+/// pass/fail. Exists because `TestConnectionProbe` only verifies ssh
+/// connectivity + hermes binary presence, and `ConnectionStatusViewModel`
+/// only pings `/bin/sh -c true`. When users file "connection green but
+/// everything empty" bug reports (issue #19), this is the diagnostic surface
+/// that tells them (and us) exactly which read fails and why.
+///
+/// One shell invocation runs every check on the remote and emits a
+/// line-delimited `KEY|STATUS|DETAIL` protocol that the view model parses.
+/// Cheaper than one SSH round-trip per probe and gives a consistent shell
+/// environment across all probes.
+@Observable
+@MainActor
+final class RemoteDiagnosticsViewModel {
+    private static let logger = Logger(subsystem: "com.scarf", category: "RemoteDiagnostics")
+
+    let context: ServerContext
+
+    /// Probes in display order. The order matters: connectivity first, then
+    /// environment checks, then Hermes data-path checks. A failure early in
+    /// the list usually explains every subsequent failure.
+    enum ProbeID: String, CaseIterable, Identifiable {
+        case connectivity
+        case remoteUser
+        case remoteHome
+        case hermesHomeConfigured
+        case hermesDirExists
+        case hermesDirReadable
+        case configYAMLReadable
+        case configYAMLContents
+        case stateDBReadable
+        case sqlite3Installed
+        case sqlite3CanOpenStateDB
+        case hermesBinaryNonLogin
+        case hermesBinaryLogin
+        case pgrepAvailable
+
+        var id: String { rawValue }
+
+        /// Human-readable title rendered in the diagnostics sheet.
+        var title: String {
+            switch self {
+            case .connectivity:           return "SSH connectivity"
+            case .remoteUser:             return "Remote user identity"
+            case .remoteHome:             return "Remote $HOME"
+            case .hermesHomeConfigured:   return "Hermes home directory"
+            case .hermesDirExists:        return "Hermes directory exists"
+            case .hermesDirReadable:      return "Hermes directory readable"
+            case .configYAMLReadable:     return "config.yaml readable"
+            case .configYAMLContents:     return "config.yaml actually readable (content)"
+            case .stateDBReadable:        return "state.db readable"
+            case .sqlite3Installed:       return "sqlite3 binary installed on remote"
+            case .sqlite3CanOpenStateDB:  return "sqlite3 can open state.db"
+            case .hermesBinaryNonLogin:   return "hermes binary on non-login PATH"
+            case .hermesBinaryLogin:      return "hermes binary on login PATH (via rc files)"
+            case .pgrepAvailable:         return "pgrep available (for 'is Hermes running')"
+            }
+        }
+
+        /// When the check fails, show this hint alongside the stderr.
+        var failureHint: String? {
+            switch self {
+            case .connectivity:
+                return "SSH itself can't complete. Before re-testing in Scarf, confirm `ssh <host>` works in Terminal."
+            case .remoteUser, .remoteHome:
+                return nil
+            case .hermesHomeConfigured:
+                return nil
+            case .hermesDirExists:
+                return "Scarf is looking at the default `~/.hermes`. If Hermes is installed elsewhere (e.g. `/var/lib/hermes/.hermes` for systemd installs), set the Hermes home directory in Manage Servers → this server → Edit."
+            case .hermesDirReadable:
+                return "The SSH user can see `~/.hermes` but can't list it. Check permissions: `ls -ld ~/.hermes` on the remote — the SSH user needs at least `r-x`."
+            case .configYAMLReadable, .configYAMLContents:
+                return "Scarf can't read `config.yaml`. This usually means the SSH user is different from the user Hermes runs as. Either (a) run Hermes as the SSH user, (b) `chmod a+r ~/.hermes/config.yaml`, or (c) configure Scarf to SSH as the Hermes user."
+            case .stateDBReadable:
+                return "Scarf can't read `state.db` — Sessions, Activity, Dashboard stats all depend on this. Same fix pattern as config.yaml."
+            case .sqlite3Installed:
+                return "Scarf pulls a snapshot of state.db via `sqlite3 .backup`, so sqlite3 must be installed on the remote. Install: `sudo apt install sqlite3` (Ubuntu/Debian), `sudo yum install sqlite` (RHEL/Fedora), `apk add sqlite` (Alpine)."
+            case .sqlite3CanOpenStateDB:
+                return "sqlite3 exists but can't open state.db. Could be a permission issue, a corrupt DB, or a version skew."
+            case .hermesBinaryNonLogin:
+                return "Scarf's runtime calls use non-login SSH shells (no .bashrc). If `hermes` only appears here via the login path, runtime CLI calls will fail. Move your PATH export from `.bashrc` to `.zshenv` or `.profile`."
+            case .hermesBinaryLogin:
+                return "hermes couldn't be located even after sourcing login rc files. Install path is non-standard — set the hermes binary path manually in Manage Servers."
+            case .pgrepAvailable:
+                return "pgrep not found on remote. Dashboard can't determine whether Hermes is running. Install procps: `apt install procps` (most distros have it by default)."
+            }
+        }
+    }
+
+    struct Probe: Identifiable, Sendable {
+        let id: ProbeID
+        let passed: Bool
+        let detail: String
+    }
+
+    private(set) var probes: [Probe] = []
+    private(set) var isRunning: Bool = false
+    private(set) var startedAt: Date?
+    private(set) var finishedAt: Date?
+    /// Raw stdout/stderr from the most recent run, preserved so the UI can
+    /// surface them in a disclosure panel when things look wrong. This is
+    /// how we debug cases where the script ran but no probes were parsed
+    /// (e.g. transport-quoting bugs, dash-vs-bash incompatibilities).
+    private(set) var rawStdout: String = ""
+    private(set) var rawStderr: String = ""
+    private(set) var rawExitCode: Int32 = 0
+
+    init(context: ServerContext) {
+        self.context = context
+    }
+
+    /// Kick off the full check list. Safe to call again to re-run.
+    func run() async {
+        if isRunning { return }
+        isRunning = true
+        probes = []
+        startedAt = Date()
+        finishedAt = nil
+
+        let script = Self.buildScript(hermesHome: context.paths.home)
+        let captured = await Self.execute(script: script, context: context)
+
+        switch captured {
+        case .connectFailure(let msg):
+            rawStdout = ""
+            rawStderr = msg
+            rawExitCode = -1
+            probes = [
+                Probe(id: .connectivity, passed: false, detail: msg)
+            ] + ProbeID.allCases
+                .filter { $0 != .connectivity }
+                .map { Probe(id: $0, passed: false, detail: "(skipped — SSH didn't connect)") }
+        case .completed(let stdout, let stderr, let exitCode):
+            rawStdout = stdout
+            rawStderr = stderr
+            rawExitCode = exitCode
+            probes = Self.parse(stdout: stdout, stderr: stderr, exitCode: exitCode)
+        }
+
+        finishedAt = Date()
+        isRunning = false
+        Self.logger.info("Diagnostics for \(self.context.displayName, privacy: .public) finished — \(self.passingCount)/\(self.probes.count) passing")
+    }
+
+    /// Quick summary string, e.g. "9/14 passing". Used in the header.
+    var summary: String {
+        guard !probes.isEmpty else { return "Not yet run." }
+        return "\(passingCount)/\(probes.count) checks passing"
+    }
+
+    var passingCount: Int {
+        probes.filter { $0.passed }.count
+    }
+
+    var allPassed: Bool {
+        !probes.isEmpty && passingCount == probes.count
+    }
+
+    // MARK: - Script + parsing
+
+    /// Build the remote shell script. Uses a pipe-delimited protocol so the
+    /// Swift side can parse without regex surprises. Status is `PASS` or
+    /// `FAIL`; detail is a single line (can be blank). `__END__` at the
+    /// bottom lets us detect truncation.
+    private static func buildScript(hermesHome: String) -> String {
+        // Shell-quote the home path — user may have typed `~/.hermes` which
+        // we want the remote shell to expand, so we substitute `~/` with
+        // `$HOME/` like `SSHTransport.remotePathArg` does.
+        let expanded: String
+        if hermesHome.hasPrefix("~/") {
+            expanded = "\"$HOME/\(hermesHome.dropFirst(2))\""
+        } else if hermesHome == "~" {
+            expanded = "\"$HOME\""
+        } else {
+            // Absolute path — still quote in case of spaces.
+            expanded = "\"\(hermesHome.replacingOccurrences(of: "\"", with: "\\\""))\""
+        }
+
+        return #"""
+        H=\#(expanded)
+        emit() { printf '%s|%s|%s\n' "$1" "$2" "$3"; }
+
+        emit connectivity PASS "(running in this shell)"
+
+        user=$(id -un 2>/dev/null || echo unknown)
+        emit remoteUser PASS "$user"
+
+        emit remoteHome PASS "$HOME"
+
+        emit hermesHomeConfigured PASS "$H"
+
+        if [ -d "$H" ]; then
+            emit hermesDirExists PASS "$H"
+        else
+            emit hermesDirExists FAIL "not a directory: $H"
+        fi
+
+        if [ -r "$H" ] && [ -x "$H" ]; then
+            emit hermesDirReadable PASS ""
+        else
+            emit hermesDirReadable FAIL "cannot read/enter $H (check perms on the dir)"
+        fi
+
+        if [ -r "$H/config.yaml" ]; then
+            emit configYAMLReadable PASS ""
+        else
+            if [ -e "$H/config.yaml" ]; then
+                emit configYAMLReadable FAIL "exists but not readable by $user"
+            else
+                emit configYAMLReadable FAIL "file does not exist"
+            fi
+        fi
+
+        if head -c 1 "$H/config.yaml" > /dev/null 2>&1; then
+            size=$(wc -c < "$H/config.yaml" 2>/dev/null | tr -d ' ')
+            emit configYAMLContents PASS "${size} bytes"
+        else
+            emit configYAMLContents FAIL "cannot read file contents"
+        fi
+
+        if [ -r "$H/state.db" ]; then
+            size=$(wc -c < "$H/state.db" 2>/dev/null | tr -d ' ')
+            emit stateDBReadable PASS "${size} bytes"
+        else
+            if [ -e "$H/state.db" ]; then
+                emit stateDBReadable FAIL "exists but not readable by $user"
+            else
+                emit stateDBReadable FAIL "file does not exist"
+            fi
+        fi
+
+        if command -v sqlite3 > /dev/null 2>&1; then
+            sq=$(command -v sqlite3)
+            emit sqlite3Installed PASS "$sq"
+        else
+            emit sqlite3Installed FAIL "sqlite3 not on PATH"
+        fi
+
+        if sqlite3 "$H/state.db" 'SELECT 1' > /dev/null 2>&1; then
+            emit sqlite3CanOpenStateDB PASS ""
+        else
+            err=$(sqlite3 "$H/state.db" 'SELECT 1' 2>&1 | head -1)
+            emit sqlite3CanOpenStateDB FAIL "$err"
+        fi
+
+        # Non-login PATH: just ask the current shell.
+        hpath=$(command -v hermes 2>/dev/null)
+        if [ -n "$hpath" ]; then
+            emit hermesBinaryNonLogin PASS "$hpath"
+        else
+            emit hermesBinaryNonLogin FAIL "not on non-login PATH ($PATH)"
+        fi
+
+        # Login PATH: source rc files (mirroring TestConnectionProbe) and re-probe.
+        for rc in "$HOME/.zshenv" "$HOME/.zprofile" "$HOME/.bash_profile" "$HOME/.profile"; do
+            [ -f "$rc" ] && . "$rc" 2>/dev/null
+        done
+        hpath2=$(command -v hermes 2>/dev/null)
+        if [ -z "$hpath2" ]; then
+            for cand in "$HOME/.local/bin/hermes" "/opt/homebrew/bin/hermes" "/usr/local/bin/hermes" "$HOME/.hermes/bin/hermes"; do
+                if [ -x "$cand" ]; then hpath2="$cand"; break; fi
+            done
+        fi
+        if [ -n "$hpath2" ]; then
+            emit hermesBinaryLogin PASS "$hpath2"
+        else
+            emit hermesBinaryLogin FAIL "not found after sourcing rc files"
+        fi
+
+        if command -v pgrep > /dev/null 2>&1; then
+            emit pgrepAvailable PASS "$(command -v pgrep)"
+        else
+            emit pgrepAvailable FAIL "pgrep not on PATH"
+        fi
+
+        printf '__END__\n'
+        """#
+    }
+
+    enum Captured {
+        case connectFailure(String)
+        case completed(stdout: String, stderr: String, exitCode: Int32)
+    }
+
+    private static func execute(script: String, context: ServerContext) async -> Captured {
+        // Can't use `transport.runProcess(executable: "/bin/sh", args: ["-c", script])`
+        // here: SSHTransport.runProcess pipes every argument through
+        // `remotePathArg` (which double-quotes to rewrite `~/` → `$HOME/`),
+        // which mangles a multi-line shell script containing `"$1"`,
+        // nested quotes, and `printf` escape sequences. The result on the
+        // remote is a scrambled string and every probe fails to emit.
+        //
+        // Mirror TestConnectionProbe's approach: build the ssh argv
+        // directly so the script travels as a single opaque argv entry
+        // that ssh forwards to the remote shell unchanged.
+        switch context.kind {
+        case .local:
+            return await runLocally(script: script)
+        case .ssh(let config):
+            return await runOverSSH(script: script, config: config)
+        }
+    }
+
+    /// Direct ssh invocation. Pipes the script into `sh` on stdin rather
+    /// than passing it as `sh -c <script>` argv — because ssh concatenates
+    /// argv with spaces and sends that as a single command string to the
+    /// remote's LOGIN shell, which then parses newlines as command
+    /// separators. A multi-line `sh -c <script>` would run only the first
+    /// line inside the `sh` subprocess (any variables set there die when
+    /// `sh` exits), and the rest would run in the login shell with no
+    /// access to those variables. Symptom: `$H=""` everywhere downstream.
+    ///
+    /// Feeding the script via stdin avoids the split entirely — `sh -s`
+    /// consumes the whole stream in one process, so variable scope is
+    /// preserved and the script runs exactly the same way it would from
+    /// a local `cat script.sh | sh`.
+    private static func runOverSSH(script: String, config: SSHConfig) async -> Captured {
+        var sshArgv: [String] = [
+            "-o", "ControlMaster=auto",
+            "-o", "ControlPath=\(controlDirPath())/%C",
+            "-o", "ControlPersist=600",
+            "-o", "ServerAliveInterval=30",
+            "-o", "ConnectTimeout=10",
+            "-o", "StrictHostKeyChecking=accept-new",
+            "-o", "LogLevel=QUIET",
+            "-o", "BatchMode=yes",
+            "-T"  // no pty — keep stdin/stdout a clean byte stream
+        ]
+        if let port = config.port { sshArgv += ["-p", String(port)] }
+        if let id = config.identityFile, !id.isEmpty {
+            sshArgv += ["-i", id]
+        }
+        let hostSpec: String
+        if let user = config.user, !user.isEmpty { hostSpec = "\(user)@\(config.host)" }
+        else { hostSpec = config.host }
+        sshArgv.append(hostSpec)
+        sshArgv.append("--")
+        sshArgv.append("/bin/sh")
+        sshArgv.append("-s")   // read script from stdin
+
+        return await Task.detached { () -> Captured in
+            let proc = Process()
+            proc.executableURL = URL(fileURLWithPath: "/usr/bin/ssh")
+            proc.arguments = sshArgv
+
+            // Inherit the shell's SSH_AUTH_SOCK so ssh can reach the
+            // agent — same pattern as SSHTransport + TestConnectionProbe.
+            var env = ProcessInfo.processInfo.environment
+            let shellEnv = HermesFileService.enrichedEnvironment()
+            for key in ["SSH_AUTH_SOCK", "SSH_AGENT_PID"] {
+                if env[key] == nil, let v = shellEnv[key], !v.isEmpty {
+                    env[key] = v
+                }
+            }
+            proc.environment = env
+
+            let stdinPipe = Pipe()
+            let stdoutPipe = Pipe()
+            let stderrPipe = Pipe()
+            proc.standardInput = stdinPipe
+            proc.standardOutput = stdoutPipe
+            proc.standardError = stderrPipe
+
+            do {
+                try proc.run()
+            } catch {
+                return .connectFailure("Failed to launch ssh: \(error.localizedDescription)")
+            }
+
+            // Write the script to ssh's stdin, then close the write end so
+            // remote sh sees EOF and exits after executing the whole script.
+            if let data = script.data(using: .utf8) {
+                try? stdinPipe.fileHandleForWriting.write(contentsOf: data)
+            }
+            try? stdinPipe.fileHandleForWriting.close()
+
+            let deadline = Date().addingTimeInterval(30)
+            while proc.isRunning && Date() < deadline {
+                try? await Task.sleep(nanoseconds: 100_000_000)
+            }
+            if proc.isRunning {
+                proc.terminate()
+                return .connectFailure("Diagnostics timed out after 30s")
+            }
+            let out = (try? stdoutPipe.fileHandleForReading.readToEnd()) ?? Data()
+            let err = (try? stderrPipe.fileHandleForReading.readToEnd()) ?? Data()
+            return .completed(
+                stdout: String(data: out, encoding: .utf8) ?? "",
+                stderr: String(data: err, encoding: .utf8) ?? "",
+                exitCode: proc.terminationStatus
+            )
+        }.value
+    }
+
+    /// Local Shell invocation — runs the diagnostic script against the
+    /// user's own Mac. Less useful than the remote form (most checks will
+    /// trivially pass), but lets the same UI work for both contexts.
+    private static func runLocally(script: String) async -> Captured {
+        return await Task.detached { () -> Captured in
+            let proc = Process()
+            proc.executableURL = URL(fileURLWithPath: "/bin/sh")
+            proc.arguments = ["-c", script]
+
+            let stdoutPipe = Pipe()
+            let stderrPipe = Pipe()
+            proc.standardOutput = stdoutPipe
+            proc.standardError = stderrPipe
+            do {
+                try proc.run()
+            } catch {
+                return .connectFailure("Failed to launch /bin/sh: \(error.localizedDescription)")
+            }
+            let deadline = Date().addingTimeInterval(10)
+            while proc.isRunning && Date() < deadline {
+                try? await Task.sleep(nanoseconds: 100_000_000)
+            }
+            if proc.isRunning {
+                proc.terminate()
+                return .connectFailure("Local diagnostics timed out (should be <1s)")
+            }
+            let out = (try? stdoutPipe.fileHandleForReading.readToEnd()) ?? Data()
+            let err = (try? stderrPipe.fileHandleForReading.readToEnd()) ?? Data()
+            return .completed(
+                stdout: String(data: out, encoding: .utf8) ?? "",
+                stderr: String(data: err, encoding: .utf8) ?? "",
+                exitCode: proc.terminationStatus
+            )
+        }.value
+    }
+
+    /// Same cache directory used by SSHTransport — shared so the diagnostic
+    /// probe reuses the connection's ControlMaster socket when it already
+    /// exists (no second TCP handshake, no second auth).
+    private static func controlDirPath() -> String {
+        SSHTransport.controlDirPath()
+    }
+
+    private static func parse(stdout: String, stderr: String, exitCode: Int32) -> [Probe] {
+        var results: [ProbeID: Probe] = [:]
+        for line in stdout.split(whereSeparator: { $0 == "\n" || $0 == "\r" }) {
+            let parts = line.split(separator: "|", maxSplits: 2, omittingEmptySubsequences: false)
+            guard parts.count == 3 else { continue }
+            let key = String(parts[0]).trimmingCharacters(in: .whitespaces)
+            let status = String(parts[1]).trimmingCharacters(in: .whitespaces)
+            let detail = String(parts[2]).trimmingCharacters(in: .whitespaces)
+            guard let probe = ProbeID(rawValue: key) else { continue }
+            results[probe] = Probe(
+                id: probe,
+                passed: status == "PASS",
+                detail: detail
+            )
+        }
+
+        // If the script didn't complete, fill in the missing probes so the UI
+        // still shows every expected row (rather than silently skipping).
+        let terminated = stdout.contains("__END__")
+        let fallbackDetail: String
+        if terminated {
+            fallbackDetail = "(no output)"
+        } else if exitCode != 0 {
+            fallbackDetail = "(script exited \(exitCode) before this check — stderr: \(stderr.prefix(200)))"
+        } else {
+            fallbackDetail = "(no output from script)"
+        }
+
+        return ProbeID.allCases.map { id in
+            results[id] ?? Probe(id: id, passed: false, detail: fallbackDetail)
+        }
+    }
+}
@@ -47,6 +47,26 @@ struct TestConnectionProbe {
        //      Scarf's local resolution.
        // The matched absolute path is stored as `hermesBinaryHint` on the
        // SSHConfig so subsequent CLI/ACP invocations don't have to re-probe.
+        // If the user already typed a remoteHome override, use it; otherwise
+        // default to $HOME/.hermes. Either way, the script also probes a
+        // short list of well-known alternates when the primary path doesn't
+        // have state.db — systemd/docker/VPS installs tend to live at
+        // /var/lib/hermes/.hermes or /home/hermes/.hermes, and SSHing in as
+        // a different user than the Hermes daemon is the leading cause of
+        // "connection green, data empty" bug reports (issue #19).
+        let primary: String
+        if let override = config.remoteHome, !override.isEmpty {
+            if override.hasPrefix("~/") {
+                primary = "$HOME/\(override.dropFirst(2))"
+            } else if override == "~" {
+                primary = "$HOME"
+            } else {
+                primary = override
+            }
+        } else {
+            primary = "$HOME/.hermes"
+        }
+
        let script = #"""
        hpath=$(command -v hermes 2>/dev/null)
        if [ -z "$hpath" ]; then
@@ -61,7 +81,21 @@ struct TestConnectionProbe {
            done
        fi
        echo "HERMES:$hpath"
-        if [ -e "$HOME/.hermes/state.db" ]; then echo DB:ok; else echo DB:missing; fi
+        PRIMARY="\#(primary)"
+        if [ -r "$PRIMARY/state.db" ]; then
+            echo "DB:ok"
+            echo "HOME_USED:$PRIMARY"
+        else
+            echo "DB:missing"
+            # Probe well-known alternates. Emit the first one that has a
+            # readable state.db so the UI can offer a one-click fill.
+            for alt in "/var/lib/hermes/.hermes" "/opt/hermes/.hermes" "/home/hermes/.hermes" "/root/.hermes"; do
+                if [ -r "$alt/state.db" ]; then
+                    echo "SUGGEST:$alt"
+                    break
+                fi
+            done
+        fi
        """#
        sshArgs.append("/bin/sh")
        sshArgs.append("-c")
@@ -133,6 +167,8 @@ struct TestConnectionProbe {
            let hermesPath = lines.first(where: { $0.hasPrefix("HERMES:") })?
                .dropFirst("HERMES:".count).trimmingCharacters(in: .whitespaces) ?? ""
            let dbFound = lines.contains(where: { $0 == "DB:ok" })
+            let suggestedHome = lines.first(where: { $0.hasPrefix("SUGGEST:") })
+                .map { String($0.dropFirst("SUGGEST:".count)).trimmingCharacters(in: .whitespaces) }
            if hermesPath.isEmpty {
                return .failure(
                    message: "hermes binary not found in remote $PATH",
@@ -140,7 +176,7 @@ struct TestConnectionProbe {
                    command: displayCommand
                )
            }
-            return .success(hermesPath: String(hermesPath), dbFound: dbFound)
+            return .success(hermesPath: String(hermesPath), dbFound: dbFound, suggestedRemoteHome: suggestedHome)
        }

        // Classify common failures by scanning the stderr trace.
@@ -81,11 +81,15 @@ struct AddServerSheet: View {
                }
            }

-            LabeledField("Remote ~/.hermes override") {
-                TextField("Leave blank for default", text: $viewModel.remoteHome)
+            LabeledField("Hermes data directory") {
+                TextField("Default: ~/.hermes", text: $viewModel.remoteHome)
                    .textFieldStyle(.roundedBorder)
                    .autocorrectionDisabled()
            }
+            Text("Leave blank unless Hermes is installed at a non-default path (systemd services often live at /var/lib/hermes/.hermes; Docker sidecars vary). Test Connection auto-suggests a value when it detects one of the known alternates.")
+                .font(.caption)
+                .foregroundStyle(.secondary)
+                .fixedSize(horizontal: false, vertical: true)

            Text("Scarf uses ssh-agent for authentication. If your key has a passphrase, run `ssh-add` before connecting — Scarf never prompts for or stores passphrases.")
                .font(.caption)
@@ -113,14 +117,43 @@ struct AddServerSheet: View {

            if let result = viewModel.testResult {
                switch result {
-                case .success(let path, let dbFound):
-                    VStack(alignment: .leading, spacing: 4) {
+                case .success(let path, let dbFound, let suggestedHome):
+                    VStack(alignment: .leading, spacing: 6) {
                        Label("Connected", systemImage: "checkmark.circle.fill")
                            .foregroundStyle(.green)
                        Text("hermes at \(path)").font(.caption).monospaced()
-                        Text(dbFound ? "state.db found" : "state.db not found — Hermes may not have run yet on the remote")
-                            .font(.caption)
-                            .foregroundStyle(dbFound ? Color.secondary : Color.orange)
+                        if dbFound {
+                            Text("state.db readable")
+                                .font(.caption)
+                                .foregroundStyle(.secondary)
+                        } else if let suggestion = suggestedHome {
+                            // Scarf found Hermes data at one of the common
+                            // alternate paths. One-click fill the
+                            // remoteHome field so the user doesn't have to
+                            // know this is a convention thing.
+                            VStack(alignment: .leading, spacing: 4) {
+                                Text("state.db not found at the default location, but Scarf found one at:")
+                                    .font(.caption)
+                                    .foregroundStyle(.orange)
+                                HStack {
+                                    Text(suggestion)
+                                        .font(.caption.monospaced())
+                                        .textSelection(.enabled)
+                                    Spacer()
+                                    Button("Use this") {
+                                        viewModel.remoteHome = suggestion
+                                    }
+                                    .controlSize(.small)
+                                }
+                                .padding(8)
+                                .background(Color.yellow.opacity(0.12), in: RoundedRectangle(cornerRadius: 6))
+                            }
+                        } else {
+                            Text("state.db not found at the configured path. Either Hermes hasn't run yet on this server, or it's installed at a non-default location — set the Hermes data directory field above.")
+                                .font(.caption)
+                                .foregroundStyle(.orange)
+                                .fixedSize(horizontal: false, vertical: true)
+                        }
                    }
                case .failure(let message, let stderr, let command):
                    VStack(alignment: .leading, spacing: 6) {
@@ -8,47 +8,72 @@ import SwiftUI
 struct ConnectionStatusPill: View {
    let status: ConnectionStatusViewModel
    @State private var showDetails = false
+    @State private var showDiagnostics = false

    var body: some View {
        Button {
            switch status.status {
            case .error:
                showDetails = true
+            case .degraded:
+                // Yellow "can't read" state — open the diagnostics sheet
+                // so the user can see exactly which files fail and why.
+                showDiagnostics = true
            case .connected, .idle:
                status.retry()
            }
        } label: {
-            HStack(spacing: 4) {
-                Circle()
-                    .fill(color)
-                    .frame(width: 8, height: 8)
+            // Leading SF Symbol does double duty: its color is the status
+            // signal (green/orange/yellow/red), and its shape reads as a
+            // clickable toolbar tool. No custom background — the toolbar's
+            // `.principal` emphasis bezel is the frame.
+            HStack(spacing: 5) {
+                Image(systemName: iconName)
+                    .foregroundStyle(color)
+                    .symbolRenderingMode(.hierarchical)
                Text(label)
                    .font(.caption)
                    .foregroundStyle(.secondary)
                    .lineLimit(1)
            }
-            .padding(.horizontal, 6)
-            .padding(.vertical, 2)
-            .background(Color.secondary.opacity(0.08), in: Capsule())
+            .padding(.horizontal, 4)
        }
        .buttonStyle(.plain)
        .help(tooltip)
        .popover(isPresented: $showDetails, arrowEdge: .bottom) {
            errorDetails.frame(width: 400)
        }
+        .sheet(isPresented: $showDiagnostics) {
+            RemoteDiagnosticsView(context: status.context)
+        }
    }

    private var color: Color {
        switch status.status {
        case .connected: return .green
+        case .degraded: return .orange
        case .idle: return .yellow
        case .error: return .red
        }
    }

+    /// State-specific SF Symbol. The icon shape itself signals what the
+    /// click will do: checkmark for connected (click to re-probe),
+    /// stethoscope for degraded (click to run diagnostics), spinning
+    /// arrows for probing, triangle for error.
+    private var iconName: String {
+        switch status.status {
+        case .connected: return "checkmark.circle.fill"
+        case .degraded: return "stethoscope"
+        case .idle: return "arrow.triangle.2.circlepath"
+        case .error: return "exclamationmark.triangle.fill"
+        }
+    }
+
    private var label: String {
        switch status.status {
        case .connected: return "Connected"
+        case .degraded: return "Connected — can't read Hermes state"
        case .idle: return "Checking…"
        case .error(let message, _): return message
        }
@@ -62,6 +87,8 @@ struct ConnectionStatusPill: View {
                return "Last probe: \(fmt.localizedString(for: ts, relativeTo: Date()))"
            }
            return "Connected"
+        case .degraded(let reason):
+            return "SSH works but \(reason). Click for diagnostics."
        case .idle: return "Waiting for first probe"
        case .error(_, _): return "Click for details"
        }
@@ -6,6 +6,7 @@ struct ManageServersView: View {
    @Environment(ServerRegistry.self) private var registry
    @State private var showAddSheet = false
    @State private var pendingRemoveID: ServerID?
+    @State private var diagnosticsContext: ServerContext?

    var body: some View {
        VStack(alignment: .leading, spacing: 0) {
@@ -17,12 +18,18 @@ struct ManageServersView: View {
                list
            }
        }
-        .frame(width: 380, height: 360)
+        .frame(width: 440, height: 380)
        .sheet(isPresented: $showAddSheet) {
            AddServerSheet { name, config in
                _ = registry.addServer(displayName: name, config: config)
            }
        }
+        .sheet(item: Binding(
+            get: { diagnosticsContext.map { IdentifiableContext(context: $0) } },
+            set: { diagnosticsContext = $0?.context }
+        )) { wrapper in
+            RemoteDiagnosticsView(context: wrapper.context)
+        }
        .confirmationDialog(
            "Remove this server?",
            isPresented: Binding(
@@ -42,6 +49,13 @@ struct ManageServersView: View {
        )
    }

+    /// Wrapper because `ServerContext` isn't `Identifiable` against the sheet
+    /// item API in a way that preserves display-ordering stability.
+    private struct IdentifiableContext: Identifiable {
+        var id: ServerID { context.id }
+        let context: ServerContext
+    }
+
    private var header: some View {
        HStack {
            Text("Servers").font(.headline)
@@ -87,6 +101,13 @@ struct ManageServersView: View {
                        }
                    }
                    Spacer()
+                    Button {
+                        diagnosticsContext = entry.context
+                    } label: {
+                        Image(systemName: "stethoscope")
+                    }
+                    .buttonStyle(.borderless)
+                    .help("Run remote diagnostics — check exactly which files are readable on this server.")
                    Button {
                        pendingRemoveID = entry.id
                    } label: {
@@ -94,6 +115,7 @@ struct ManageServersView: View {
                    }
                    .buttonStyle(.borderless)
                    .foregroundStyle(.red)
+                    .help("Remove this server from Scarf.")
                }
                .padding(.vertical, 4)
            }
@@ -0,0 +1,203 @@
+import SwiftUI
+import AppKit
+
+/// Per-server diagnostics sheet. Shown from Manage Servers and from the
+/// Dashboard "Run Diagnostics…" button when `lastReadError` is set. Gives
+/// the user a specific list of what does/doesn't work over SSH, with
+/// targeted remediation hints for each failure.
+///
+/// Design principle: a failing check always shows both the raw detail the
+/// remote shell produced AND a human-written hint. The raw detail lets us
+/// triage bug reports; the hint unblocks the user without a round trip.
+struct RemoteDiagnosticsView: View {
+    let context: ServerContext
+    @State private var viewModel: RemoteDiagnosticsViewModel
+    @Environment(\.dismiss) private var dismiss
+
+    init(context: ServerContext) {
+        self.context = context
+        _viewModel = State(initialValue: RemoteDiagnosticsViewModel(context: context))
+    }
+
+    var body: some View {
+        VStack(spacing: 0) {
+            header
+            Divider()
+            probeList
+            Divider()
+            footer
+        }
+        .frame(minWidth: 640, minHeight: 520)
+        .task { await viewModel.run() }
+    }
+
+    private var header: some View {
+        VStack(alignment: .leading, spacing: 6) {
+            HStack {
+                Text("Remote Diagnostics — \(context.displayName)")
+                    .font(.title3)
+                    .fontWeight(.semibold)
+                Spacer()
+                if viewModel.isRunning {
+                    ProgressView()
+                        .controlSize(.small)
+                } else {
+                    Button("Re-run") { Task { await viewModel.run() } }
+                        .controlSize(.small)
+                }
+            }
+            HStack {
+                if viewModel.isRunning {
+                    Text("Running checks…")
+                        .font(.callout)
+                        .foregroundStyle(.secondary)
+                } else {
+                    Label(viewModel.summary, systemImage: viewModel.allPassed ? "checkmark.seal" : "info.circle")
+                        .font(.callout)
+                        .foregroundStyle(viewModel.allPassed ? .green : .orange)
+                }
+                Spacer()
+                if !viewModel.probes.isEmpty {
+                    Button {
+                        copyReportToClipboard()
+                    } label: {
+                        Label("Copy Full Report", systemImage: "doc.on.doc")
+                    }
+                    .controlSize(.small)
+                    .help("Copy a plain-text summary of every check (passes and fails) — paste into GitHub issues so we can see everything at once.")
+                }
+            }
+        }
+        .padding(16)
+    }
+
+    private var probeList: some View {
+        ScrollView {
+            VStack(alignment: .leading, spacing: 0) {
+                if viewModel.probes.isEmpty && viewModel.isRunning {
+                    Text("Running a single shell session on \(context.displayName) that exercises every path Scarf reads…")
+                        .font(.callout)
+                        .foregroundStyle(.secondary)
+                        .padding()
+                }
+                ForEach(viewModel.probes) { probe in
+                    probeRow(probe)
+                    if probe.id != viewModel.probes.last?.id {
+                        Divider()
+                    }
+                }
+            }
+        }
+    }
+
+    private func probeRow(_ probe: RemoteDiagnosticsViewModel.Probe) -> some View {
+        HStack(alignment: .top, spacing: 12) {
+            Image(systemName: probe.passed ? "checkmark.circle.fill" : "xmark.circle.fill")
+                .foregroundStyle(probe.passed ? .green : .red)
+                .font(.title3)
+                .padding(.top, 2)
+            VStack(alignment: .leading, spacing: 4) {
+                Text(probe.id.title)
+                    .font(.body)
+                if !probe.detail.isEmpty {
+                    Text(probe.detail)
+                        .font(.caption.monospaced())
+                        .foregroundStyle(.secondary)
+                        .textSelection(.enabled)
+                }
+                if !probe.passed, let hint = probe.id.failureHint {
+                    HStack(alignment: .top, spacing: 6) {
+                        Image(systemName: "lightbulb")
+                            .foregroundStyle(.yellow)
+                            .font(.caption)
+                        Text(hint)
+                            .font(.caption)
+                            .foregroundStyle(.primary)
+                            .textSelection(.enabled)
+                            .fixedSize(horizontal: false, vertical: true)
+                    }
+                    .padding(8)
+                    .background(Color.yellow.opacity(0.08))
+                    .clipShape(RoundedRectangle(cornerRadius: 6))
+                }
+            }
+            Spacer(minLength: 0)
+        }
+        .padding(.horizontal, 16)
+        .padding(.vertical, 10)
+    }
+
+    private var footer: some View {
+        VStack(alignment: .leading, spacing: 8) {
+            // Raw-output disclosure. Shown whenever anything fails — we need
+            // this visible for partial failures too since the raw stdout is
+            // the only way to see WHY a check returned its detail. Hidden
+            // only when 14/14 pass (script worked, nothing to debug).
+            if !viewModel.probes.isEmpty, !viewModel.allPassed {
+                DisclosureGroup("Raw remote output (for debugging)") {
+                    VStack(alignment: .leading, spacing: 6) {
+                        Text("exit code: \(viewModel.rawExitCode)")
+                            .font(.caption.monospaced())
+                        if !viewModel.rawStdout.isEmpty {
+                            Text("stdout:").font(.caption).foregroundStyle(.secondary)
+                            ScrollView {
+                                Text(viewModel.rawStdout)
+                                    .font(.system(size: 10, design: .monospaced))
+                                    .textSelection(.enabled)
+                                    .frame(maxWidth: .infinity, alignment: .leading)
+                            }
+                            .frame(maxHeight: 140)
+                        }
+                        if !viewModel.rawStderr.isEmpty {
+                            Text("stderr:").font(.caption).foregroundStyle(.secondary)
+                            ScrollView {
+                                Text(viewModel.rawStderr)
+                                    .font(.system(size: 10, design: .monospaced))
+                                    .textSelection(.enabled)
+                                    .frame(maxWidth: .infinity, alignment: .leading)
+                            }
+                            .frame(maxHeight: 140)
+                        }
+                    }
+                    .padding(8)
+                    .background(Color.secondary.opacity(0.08), in: RoundedRectangle(cornerRadius: 6))
+                }
+                .font(.caption)
+            }
+            HStack {
+                Text("Scarf runs these over a single SSH session that mirrors the shell your dashboard reads from, so a green row here means Scarf can actually read that file at runtime.")
+                    .font(.caption)
+                    .foregroundStyle(.secondary)
+                    .fixedSize(horizontal: false, vertical: true)
+                Spacer()
+                Button("Done") { dismiss() }
+                    .keyboardShortcut(.defaultAction)
+            }
+        }
+        .padding(16)
+    }
+
+    private func copyReportToClipboard() {
+        var lines: [String] = []
+        lines.append("Scarf remote diagnostics — \(context.displayName)")
+        if case .ssh(let config) = context.kind {
+            lines.append("Host: \(config.host)" + (config.user.map { " (user: \($0))" } ?? ""))
+            if let rh = config.remoteHome { lines.append("Hermes home (override): \(rh)") }
+        }
+        lines.append("Ran at: \(viewModel.startedAt.map { ISO8601DateFormatter().string(from: $0) } ?? "?")")
+        lines.append("Result: \(viewModel.summary)")
+        lines.append("")
+        for probe in viewModel.probes {
+            let mark = probe.passed ? "PASS" : "FAIL"
+            lines.append("[\(mark)] \(probe.id.title)")
+            if !probe.detail.isEmpty { lines.append("    \(probe.detail)") }
+            if !probe.passed, let hint = probe.id.failureHint {
+                lines.append("    hint: \(hint)")
+            }
+        }
+        let text = lines.joined(separator: "\n")
+        let pb = NSPasteboard.general
+        pb.clearContents()
+        pb.setString(text, forType: .string)
+    }
+}