From 110170d6e9e6232892bfd7bce1690f8dbc643ae8 Mon Sep 17 00:00:00 2001 From: Alan Wizemann Date: Mon, 20 Apr 2026 13:40:35 -0700 Subject: [PATCH 1/8] =?UTF-8?q?fix:=20v2.0.1=20=E2=80=94=20surface=20remot?= =?UTF-8?q?e=20SSH=20file-access=20errors=20(closes=20#19)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Three users reported on day-one of v2.0 that SSH connections showed a green "Connected" pill but every data view read as empty / "not running" / "not configured". The common thread across Docker, homelab VM, and Ubuntu VPS setups: file-access failures on the remote that Scarf silently swallowed into nil/empty defaults. Stop swallowing errors - HermesFileService gains Result-returning variants for the four dashboard-critical readers: loadConfigResult, loadGatewayStateResult, hermesPIDResult, plus readFileResult / readFileDataResult as primitives. Each logs os.Logger warnings on failure. Legacy nil- returning signatures remain as thin forwarders. - HermesDataService.open records lastOpenError with humanized hints for the top three failure modes — sqlite3 not installed, permission denied, file not found. Each maps to concrete remediation (`apt install sqlite3`, "check file perms", "set Hermes data directory"). Dashboard surfaces the error - DashboardViewModel collects errors from every loader into lastReadError, only on remote contexts (local skips the banner). - DashboardView renders an orange banner above the stats with the specific error text, a copy-selectable detail, and a "Run Diagnostics…" button. New Remote Diagnostics sheet (stethoscope icon) - RemoteDiagnosticsViewModel runs 14 checks in one SSH round-trip via a pipe-delimited "KEY|STATUS|DETAIL" protocol. Covers: SSH connectivity, remote user/$HOME, Hermes dir existence + readability, config.yaml readability + actual read (distinct from just `test -e` which can't detect permission issues), state.db readability, sqlite3 binary presence, sqlite3 open test, hermes binary on non-login AND login PATH, pgrep availability. - Each probe row shows a targeted hint on fail (e.g. "check perms on ~/.hermes", "apt install sqlite3", "move PATH export from .bashrc to .zshenv"). A Copy Full Report button dumps plain-text output for GitHub issues. - Accessible from Manage Servers (stethoscope button per row) and directly from the yellow pill. Yellow "degraded" connection state - ConnectionStatusViewModel.Status gains .degraded(reason:) between .connected and .error. After tier-1 `true` passes, the probe runs tier-2 `test -r $HOME/.hermes/config.yaml` in the same SSH round- trip. On tier-2 fail, pill is orange with "Connected — can't read Hermes state" tooltip. - Clicking a degraded pill opens Remote Diagnostics directly. Exactly the symptom in #19 is now one click from a specific answer. Auto-suggest remoteHome for non-default installs - TestConnectionProbe.TestResult.success gains suggestedRemoteHome: String?. When state.db isn't found at the configured path, the probe also checks /var/lib/hermes/.hermes, /opt/hermes/.hermes, /home/hermes/.hermes, /root/.hermes — the common alternates for systemd services, Docker containers, and single-user VPSes — and surfaces the first hit as a "Use this" suggestion in Add Server. - AddServerSheet relabels "Remote ~/.hermes override" to "Hermes data directory" with an explanation of when you'd use it. README - New "Remote setup requirements" subsection lists the four concrete prereqs (SSH, sqlite3, pgrep, read access to ~/.hermes). - New "Troubleshooting remote connections" paragraph describes the diagnostics sheet and remoteHome auto-suggest for the two most common failure modes. Releases - releases/v2.0.1/RELEASE_NOTES.md for the GitHub release body. - Ship via `./scripts/release.sh 2.0.1`. Closes #19. Co-Authored-By: Claude Opus 4.7 (1M context) --- README.md | 15 + releases/v2.0.1/RELEASE_NOTES.md | 46 +++ .../Core/Services/HermesDataService.swift | 60 +++- .../Core/Services/HermesFileService.swift | 141 ++++++-- .../ViewModels/DashboardViewModel.swift | 55 ++- .../Dashboard/Views/DashboardView.swift | 42 +++ .../ViewModels/AddServerViewModel.swift | 9 +- .../ConnectionStatusViewModel.swift | 67 +++- .../RemoteDiagnosticsViewModel.swift | 330 ++++++++++++++++++ .../ViewModels/TestConnectionProbe.swift | 40 ++- .../Servers/Views/AddServerSheet.swift | 47 ++- .../Servers/Views/ConnectionStatusPill.swift | 12 + .../Servers/Views/ManageServersView.swift | 24 +- .../Servers/Views/RemoteDiagnostics.swift | 166 +++++++++ 14 files changed, 997 insertions(+), 57 deletions(-) create mode 100644 releases/v2.0.1/RELEASE_NOTES.md create mode 100644 scarf/scarf/Features/Servers/ViewModels/RemoteDiagnosticsViewModel.swift create mode 100644 scarf/scarf/Features/Servers/Views/RemoteDiagnostics.swift diff --git a/README.md b/README.md index e27e830..f4fc394 100644 --- a/README.md +++ b/README.md @@ -42,6 +42,21 @@ Scarf 2.0 is a multi-window app. Each window is bound to exactly one Hermes serv Remote Hermes is reached over system SSH — the same `~/.ssh/config`, ssh-agent, ProxyJump, and ControlMaster pooling your terminal uses. File I/O flows through `scp`/`sftp`; SQLite is served from atomic `sqlite3 .backup` snapshots cached under `~/Library/Caches/scarf/snapshots//`; chat (ACP) tunnels as `ssh -T host -- hermes acp` with JSON-RPC over stdio end-to-end. Everything in the feature list below works against remote identically to local. +### Remote setup requirements + +The remote host must have: + +1. **SSH access** — key-based auth via your local ssh-agent. Scarf never prompts for passphrases; run `ssh-add` once in Terminal before connecting. +2. **`sqlite3`** on the remote `$PATH` — needed for the atomic DB snapshots. Install on the remote with `apt install sqlite3` (Ubuntu/Debian), `yum install sqlite` (RHEL/Fedora), or `apk add sqlite` (Alpine). +3. **`pgrep`** on the remote `$PATH` — used by the Dashboard "is Hermes running" check. Standard on every distro; install `procps` if missing. +4. **`~/.hermes/` readable by the SSH user**. When Hermes runs as a separate user (systemd service, Docker container), the SSH user needs read access to `config.yaml` and `state.db`. Either (a) SSH as the Hermes user, (b) `chmod` Hermes's home to be group-readable and add your SSH user to that group, or (c) set the **Hermes data directory** field when adding the server to point at the right location (e.g. `/var/lib/hermes/.hermes`). + +### Troubleshooting remote connections + +If the connection pill is green but the Dashboard shows "Stopped", "unknown", or empty values, the SSH user can't read the Hermes state files. Open **Manage Servers → 🩺 Run Diagnostics** (or click the yellow "Can't read Hermes state" pill in the toolbar). The diagnostics sheet runs fourteen checks in one SSH session — connectivity, `sqlite3` presence, read access to `config.yaml` and `state.db`, the effective non-login `$PATH` — and tells you exactly which one fails and why, with remediation hints for each. Use the **Copy Full Report** button to paste the full output into a bug report. + +For the common "Hermes isn't at the default path" case (systemd services, Docker), **Test Connection** in the Add Server sheet now probes `/var/lib/hermes/.hermes`, `/opt/hermes/.hermes`, `/home/hermes/.hermes`, and `/root/.hermes` when it can't find `state.db` at `~/.hermes/`, and offers a one-click fill if it finds any of them. + ## Features Scarf mirrors Hermes's surface area through a sidebar-based UI. Sections below map 1:1 to the app's sidebar. diff --git a/releases/v2.0.1/RELEASE_NOTES.md b/releases/v2.0.1/RELEASE_NOTES.md new file mode 100644 index 0000000..6de355d --- /dev/null +++ b/releases/v2.0.1/RELEASE_NOTES.md @@ -0,0 +1,46 @@ +## What's New in 2.0.1 + +Hotfix for [#19](https://github.com/awizemann/scarf/issues/19) and the related reports from the first day of v2.0: users' remote SSH connections would show a green "Connected" pill but every view (Dashboard, Sessions, Activity, Chat) read as empty / "not running" / "not configured". Three distinct environments reported it — Docker Hermes on a LAN, homelab VM over Tailscale, Ubuntu VPS — and every one was a silent file-access failure on the remote that Scarf wasn't surfacing. + +### What changed + +**Errors no longer disappear.** Every remote read (`config.yaml`, `gateway_state.json`, `state.db`, `pgrep`) used to silently substitute an empty value on *any* failure — permission denied, missing file, `sqlite3` not installed, connection drop — they all looked identical to the UI. Now: + +- Each failure logs a specific warning via `os.Logger` (visible in Console.app under subsystem `com.scarf`). +- The Dashboard shows an orange banner above the stats with the exact error (e.g. "Permission denied reading `~/.hermes/state.db`") and a **Run Diagnostics…** button. +- `HermesDataService` exposes a `lastOpenError` so views can explain *why* state.db couldn't be opened, rather than just rendering zeros. + +**New Remote Diagnostics sheet.** Accessible from **Manage Servers → 🩺** or directly by clicking the toolbar pill when it's yellow. Runs fourteen checks in a single SSH session and shows pass/fail for each, plus a targeted hint per failure: + +- SSH connectivity and auth +- Remote user identity and `$HOME` resolution +- `~/.hermes` directory existence and readability +- `config.yaml` readable (existence *and* actual read access — the old probe only checked existence) +- `state.db` readable +- `sqlite3` installed on the remote (required for the atomic snapshot Scarf pulls) +- `sqlite3` can actually open `state.db` +- `hermes` binary on the non-login `$PATH` (what runtime uses) +- `hermes` binary on the login `$PATH` (what the Test Connection probe uses) +- `pgrep` available (for the "is Hermes running" check) + +One **Copy Full Report** button dumps every check as plain text for bug reports. + +**Connection pill gains a yellow "degraded" state.** The pill used to be green as long as SSH connected; now after connectivity passes it runs a second-tier check (`test -r ~/.hermes/config.yaml`). If that fails, the pill turns **yellow** with "Connected — can't read Hermes state" and clicking it opens Remote Diagnostics directly. This is the exact symptom mode in issue #19, and it's now one click away from a specific answer. + +**Auto-suggest the correct `remoteHome` during Add Server.** When Test Connection can't find `state.db` at the configured (or default) path, it now also probes the common alternate locations — `/var/lib/hermes/.hermes`, `/opt/hermes/.hermes`, `/home/hermes/.hermes`, `/root/.hermes` — and offers a one-click "Use this" fill if it finds one. Removes the need to know that systemd-installed Hermes lives at `/var/lib/hermes/.hermes` by convention. + +**Clearer copy for the `remoteHome` field.** The Add Server sheet field is now labeled "Hermes data directory" with a description explaining when you'd override it (systemd service installs, Docker sidecars) and noting that Test Connection auto-suggests. + +**README has a new "Remote setup requirements" section** with the four concrete prerequisites (SSH, `sqlite3`, `pgrep`, read access to `~/.hermes`) and a troubleshooting paragraph pointing at Remote Diagnostics. + +### Migrating from 2.0.0 + +Sparkle will offer the update automatically. Settings and server list are preserved verbatim — this is purely additive (new diagnostics surface, new error banners, auto-suggest in Test Connection). If you were affected by #19, run Remote Diagnostics after updating; the sheet should pinpoint the specific file access issue and suggest a fix. + +### Under the hood + +- New types: `RemoteDiagnosticsViewModel`, `RemoteDiagnosticsView`. Both are local to Scarf; no new transport protocol. +- `HermesFileService` gains `loadConfigResult()`, `loadGatewayStateResult()`, `hermesPIDResult()`, `readFileResult()`, `readFileDataResult()` — Result-returning variants that preserve the error. Legacy `loadConfig()` etc. still exist as thin forwarders for callers that don't need diagnostics. +- `HermesDataService.open()` records `lastOpenError` with humanized hints for "sqlite3 not installed", "permission denied", and "file not found" — the three failure modes that produce 90% of issue #19 symptoms. +- `ConnectionStatusViewModel` status enum gains `.degraded(reason:)` between `.connected` and `.error`. +- `TestConnectionProbe` result enum gains `suggestedRemoteHome: String?` carrying any alternate-location hit. diff --git a/scarf/scarf/Core/Services/HermesDataService.swift b/scarf/scarf/Core/Services/HermesDataService.swift index 6011c0c..c75fbaf 100644 --- a/scarf/scarf/Core/Services/HermesDataService.swift +++ b/scarf/scarf/Core/Services/HermesDataService.swift @@ -1,5 +1,6 @@ import Foundation import SQLite3 +import os /// Dedupes concurrent `snapshotSQLite` calls for the same server. When the /// file watcher ticks, Dashboard + Sessions + Activity (+ Chat's loadHistory) @@ -29,11 +30,18 @@ actor SnapshotCoordinator { } actor HermesDataService { + private static let logger = Logger(subsystem: "com.scarf", category: "HermesDataService") + private var db: OpaquePointer? private var hasV07Schema = false /// Local filesystem path we last opened. For remote contexts this is /// the cached snapshot under `~/Library/Caches/scarf/snapshots//`. private var openedAtPath: String? + /// Last error from `open()` / `refresh()`, user-presentable. `nil` means + /// the last attempt succeeded. Views surface this when their own load + /// path fails, so the user sees "Permission denied reading state.db" + /// instead of an empty Dashboard with no explanation. + private(set) var lastOpenError: String? let context: ServerContext private let transport: any ServerTransport @@ -52,16 +60,25 @@ actor HermesDataService { // corrupt. Routed through SnapshotCoordinator so concurrent // view models don't each spawn a parallel SSH backup for the // same server. - let url = try? await SnapshotCoordinator.shared.snapshot( - remotePath: context.paths.stateDB, - contextID: context.id, - transport: transport - ) - guard let url else { return false } - localPath = url.path + do { + let url = try await SnapshotCoordinator.shared.snapshot( + remotePath: context.paths.stateDB, + contextID: context.id, + transport: transport + ) + localPath = url.path + lastOpenError = nil + } catch { + lastOpenError = humanize(error) + Self.logger.warning("snapshotSQLite failed: \(error.localizedDescription, privacy: .public)") + return false + } } else { localPath = context.paths.stateDB - guard FileManager.default.fileExists(atPath: localPath) else { return false } + guard FileManager.default.fileExists(atPath: localPath) else { + lastOpenError = "Hermes state database not found at \(localPath)." + return false + } } // Remote snapshots are point-in-time copies that no one writes to; // opening them with `immutable=1` tells SQLite to skip WAL/SHM and @@ -81,14 +98,41 @@ actor HermesDataService { } let result = sqlite3_open_v2(openPath, &db, flags, nil) guard result == SQLITE_OK else { + let msg: String + if let db { + msg = String(cString: sqlite3_errmsg(db)) + } else { + msg = "sqlite3_open_v2 returned \(result)" + } + lastOpenError = "Couldn't open state.db: \(msg)" + Self.logger.warning("sqlite3_open_v2 failed (\(result)) at \(localPath, privacy: .public): \(msg, privacy: .public)") db = nil return false } openedAtPath = localPath + lastOpenError = nil detectSchema() return true } + /// Turn a transport error into the one-line string Dashboard shows. Adds + /// hints for the common "sqlite3 not installed" and "permission denied" + /// cases so users know what to do. + private nonisolated func humanize(_ error: Error) -> String { + let desc = (error as? LocalizedError)?.errorDescription ?? error.localizedDescription + let lower = desc.lowercased() + if lower.contains("sqlite3: command not found") || lower.contains("sqlite3: not found") { + return "sqlite3 is not installed on \(context.displayName). Install it with `apt install sqlite3` (Ubuntu/Debian) or `yum install sqlite` (RHEL/Fedora)." + } + if lower.contains("permission denied") { + return "Permission denied reading Hermes state on \(context.displayName). The SSH user may not have read access to ~/.hermes/state.db — try Run Diagnostics." + } + if lower.contains("no such file") { + return "Hermes state not found at ~/.hermes on \(context.displayName). If Hermes is installed elsewhere, set its data directory in Manage Servers." + } + return desc + } + /// Force a fresh snapshot pull + reopen. Used on session-load and in /// any path that needs the UI to reflect writes Hermes just made. /// Without this, remote snapshots would be frozen at the first `open()` diff --git a/scarf/scarf/Core/Services/HermesFileService.swift b/scarf/scarf/Core/Services/HermesFileService.swift index 1c756d0..2e3df54 100644 --- a/scarf/scarf/Core/Services/HermesFileService.swift +++ b/scarf/scarf/Core/Services/HermesFileService.swift @@ -1,7 +1,10 @@ import Foundation +import os struct HermesFileService: Sendable { + nonisolated static let logger = Logger(subsystem: "com.scarf", category: "HermesFileService") + let context: ServerContext let transport: any ServerTransport @@ -17,6 +20,14 @@ struct HermesFileService: Sendable { return parseConfig(content) } + /// Error-surfacing config load. Used by Dashboard to show the user a + /// specific reason when config.yaml can't be read on a remote host + /// (permission denied, missing file, sqlite3 not installed, etc.) + /// instead of silently falling back to `.empty`. + nonisolated func loadConfigResult() -> Result { + readFileResult(context.paths.configYAML).map { parseConfig($0) } + } + nonisolated private func parseConfig(_ yaml: String) -> HermesConfig { let parsed = Self.parseNestedYAML(yaml) let values = parsed.values @@ -385,11 +396,34 @@ struct HermesFileService: Sendable { do { return try JSONDecoder().decode(GatewayState.self, from: data) } catch { - print("[Scarf] Failed to decode gateway state: \(error.localizedDescription)") + Self.logger.warning("Failed to decode gateway state: \(error.localizedDescription, privacy: .public)") return nil } } + /// Error-surfacing gateway-state load. `.success(nil)` means the file + /// doesn't exist yet (gateway hasn't written state — normal when Hermes + /// is stopped). `.failure` means the file exists but couldn't be read + /// (permission denied, connection down, JSON corruption). + nonisolated func loadGatewayStateResult() -> Result { + // Distinguish "file doesn't exist yet" (normal, returns .success(nil)) + // from "file exists but we can't read or parse it" (error). + if !transport.fileExists(context.paths.gatewayStateJSON) { + return .success(nil) + } + switch readFileDataResult(context.paths.gatewayStateJSON) { + case .success(let data): + do { + return .success(try JSONDecoder().decode(GatewayState.self, from: data)) + } catch { + Self.logger.warning("Failed to decode gateway state: \(error.localizedDescription, privacy: .public)") + return .failure(error) + } + case .failure(let err): + return .failure(err) + } + } + // MARK: - Memory nonisolated func loadMemoryProfiles() -> [String] { @@ -1173,22 +1207,45 @@ struct HermesFileService: Sendable { } nonisolated func hermesPID() -> pid_t? { - // Run `pgrep -f hermes` either locally or via the transport. On - // remote hosts we trust `pgrep` to be present — it's standard on - // Linux and macOS. On failure we conservatively return nil rather - // than pretending Hermes is down: the caller will see - // isHermesRunning==false, which is already the "unknown" UX. - let result = try? transport.runProcess( - executable: "/usr/bin/pgrep", - args: ["-f", "hermes"], - stdin: nil, - timeout: 5 - ) - guard let result, let firstLine = result.stdoutString - .components(separatedBy: "\n") - .first(where: { !$0.isEmpty }), - let pid = pid_t(firstLine.trimmingCharacters(in: .whitespaces)) else { return nil } - return pid + switch hermesPIDResult() { + case .success(let pid): return pid + case .failure: return nil + } + } + + /// Error-surfacing variant. `.success(nil)` means `pgrep` ran successfully + /// and found no hermes process (Hermes is genuinely not running). + /// `.failure` means we couldn't probe at all (pgrep missing, connection + /// down, permission issue) — a *different* UX from "not running". + nonisolated func hermesPIDResult() -> Result { + do { + let result = try transport.runProcess( + executable: "/usr/bin/pgrep", + args: ["-f", "hermes"], + stdin: nil, + timeout: 5 + ) + // pgrep exits 1 when nothing matches — that's "not running", NOT an + // error. Anything else (127=command not found, 255=ssh failure) is. + if result.exitCode == 0 { + if let firstLine = result.stdoutString + .components(separatedBy: "\n") + .first(where: { !$0.isEmpty }), + let pid = pid_t(firstLine.trimmingCharacters(in: .whitespaces)) { + return .success(pid) + } + return .success(nil) + } else if result.exitCode == 1 { + return .success(nil) // genuinely not running + } else { + let err = TransportError.commandFailed(exitCode: result.exitCode, stderr: result.stderrString) + Self.logger.warning("pgrep failed (exit \(result.exitCode)): \(result.stderrString, privacy: .public)") + return .failure(err) + } + } catch { + Self.logger.warning("pgrep transport error: \(error.localizedDescription, privacy: .public)") + return .failure(error) + } } @discardableResult @@ -1488,15 +1545,55 @@ struct HermesFileService: Sendable { // MARK: - File I/O /// Read a UTF-8 text file through the transport. Missing files and any - /// transport error surface as `nil` — callers treat missing/unreadable - /// the same way they always have. + /// transport error surface as `nil` — callers that don't need the + /// specific error reason keep using this. New call sites that want to + /// show a user-actionable message should use `readFileResult`. nonisolated private func readFile(_ path: String) -> String? { - guard let data = try? transport.readFile(path) else { return nil } - return String(data: data, encoding: .utf8) + switch readFileResult(path) { + case .success(let s): + return s + case .failure: + return nil + } } nonisolated private func readFileData(_ path: String) -> Data? { - try? transport.readFile(path) + switch readFileDataResult(path) { + case .success(let d): + return d + case .failure: + return nil + } + } + + /// Error-surfacing read. Returns the decoded text on success, or the + /// underlying `TransportError` (or raw error for local failures) on + /// failure. Every failure is also logged via `os.Logger` — the warning + /// trail in Console.app is how we diagnose "connection green, data + /// empty" bug reports without needing to wire the error through every + /// existing call site. + nonisolated func readFileResult(_ path: String) -> Result { + switch readFileDataResult(path) { + case .success(let data): + guard let s = String(data: data, encoding: .utf8) else { + let err = TransportError.fileIO(path: path, underlying: "file is not valid UTF-8") + Self.logger.warning("readFile(\(path, privacy: .public)): not UTF-8") + return .failure(err) + } + return .success(s) + case .failure(let err): + return .failure(err) + } + } + + nonisolated func readFileDataResult(_ path: String) -> Result { + do { + let data = try transport.readFile(path) + return .success(data) + } catch { + Self.logger.warning("readFile(\(path, privacy: .public)) failed: \(error.localizedDescription, privacy: .public)") + return .failure(error) + } } /// Write a UTF-8 text file atomically through the transport. Matches the diff --git a/scarf/scarf/Features/Dashboard/ViewModels/DashboardViewModel.swift b/scarf/scarf/Features/Dashboard/ViewModels/DashboardViewModel.swift index d2b970e..a1956da 100644 --- a/scarf/scarf/Features/Dashboard/ViewModels/DashboardViewModel.swift +++ b/scarf/scarf/Features/Dashboard/ViewModels/DashboardViewModel.swift @@ -21,28 +21,73 @@ final class DashboardViewModel { var hermesRunning = false var isLoading = true + /// User-presentable error banner. Set when any of the remote reads + /// (state.db snapshot, config.yaml, gateway_state.json, pgrep) failed + /// in a way that's not just "file doesn't exist yet". Dashboard renders + /// this above the stats with a "Run Diagnostics…" button. `nil` = no + /// surfaceable error. + var lastReadError: String? + func load() async { isLoading = true // refresh() = close + reopen, forces a fresh remote snapshot. Cheap // on local (live DB reopen). let opened = await dataService.refresh() + var collectedErrors: [String] = [] if opened { stats = await dataService.fetchStats() recentSessions = await dataService.fetchSessions(limit: 5) sessionPreviews = await dataService.fetchSessionPreviews(limit: 5) await dataService.close() + } else if let msg = await dataService.lastOpenError { + collectedErrors.append(msg) } // The fileService methods are synchronous and route through the // transport. For remote contexts each call is a blocking ssh // round-trip — do them off the main thread to avoid spinning the // beach ball during the load. let svc = fileService - let (cfg, gw, running) = await Task.detached { - (svc.loadConfig(), svc.loadGatewayState(), svc.isHermesRunning()) + struct LoadResults: Sendable { + let cfg: Result + let gw: Result + let running: Result + } + let results = await Task.detached { () -> LoadResults in + LoadResults( + cfg: svc.loadConfigResult(), + gw: svc.loadGatewayStateResult(), + running: svc.hermesPIDResult() + ) }.value - config = cfg - gatewayState = gw - hermesRunning = running + + switch results.cfg { + case .success(let c): config = c + case .failure(let e): + config = .empty + collectedErrors.append("config.yaml — \(e.localizedDescription)") + } + switch results.gw { + case .success(let g): gatewayState = g + case .failure(let e): + gatewayState = nil + collectedErrors.append("gateway_state.json — \(e.localizedDescription)") + } + switch results.running { + case .success(let pid): hermesRunning = (pid != nil) + case .failure(let e): + hermesRunning = false + collectedErrors.append("pgrep — \(e.localizedDescription)") + } + + // Only surface when there's a real error AND we're on a remote + // context. Local contexts rarely hit these paths (live DB, local + // filesystem), and a transient "file doesn't exist yet" on fresh + // installs shouldn't scare users. + if context.isRemote, !collectedErrors.isEmpty { + lastReadError = collectedErrors.joined(separator: "\n") + } else { + lastReadError = nil + } isLoading = false } } diff --git a/scarf/scarf/Features/Dashboard/Views/DashboardView.swift b/scarf/scarf/Features/Dashboard/Views/DashboardView.swift index 2f041ba..dd2da22 100644 --- a/scarf/scarf/Features/Dashboard/Views/DashboardView.swift +++ b/scarf/scarf/Features/Dashboard/Views/DashboardView.swift @@ -2,6 +2,7 @@ import SwiftUI struct DashboardView: View { @State private var viewModel: DashboardViewModel + @State private var showDiagnostics = false @Environment(AppCoordinator.self) private var coordinator @Environment(HermesFileWatcher.self) private var fileWatcher @@ -13,6 +14,9 @@ struct DashboardView: View { var body: some View { ScrollView { VStack(alignment: .leading, spacing: 20) { + if let err = viewModel.lastReadError { + readErrorBanner(err) + } statusSection statsSection recentSessionsSection @@ -30,6 +34,44 @@ struct DashboardView: View { .onChange(of: fileWatcher.lastChangeDate) { Task { await viewModel.load() } } + .sheet(isPresented: $showDiagnostics) { + RemoteDiagnosticsView(context: viewModel.context) + } + } + + /// Banner shown above the Dashboard when one or more remote reads + /// failed (permission denied, missing sqlite3, wrong home dir, etc.). + /// Replaces the old silent-failure mode where empty values just + /// appeared as "Stopped / unknown / 0" with no explanation. + private func readErrorBanner(_ err: String) -> some View { + VStack(alignment: .leading, spacing: 8) { + HStack(alignment: .top, spacing: 8) { + Image(systemName: "exclamationmark.triangle.fill") + .foregroundStyle(.orange) + VStack(alignment: .leading, spacing: 4) { + Text("Can't read Hermes state on \(viewModel.context.displayName)") + .font(.headline) + Text(err) + .font(.caption.monospaced()) + .foregroundStyle(.secondary) + .textSelection(.enabled) + .fixedSize(horizontal: false, vertical: true) + } + Spacer() + Button { + showDiagnostics = true + } label: { + Label("Run Diagnostics…", systemImage: "stethoscope") + } + .controlSize(.regular) + } + } + .padding(12) + .background(Color.orange.opacity(0.1), in: RoundedRectangle(cornerRadius: 8)) + .overlay( + RoundedRectangle(cornerRadius: 8) + .strokeBorder(Color.orange.opacity(0.3), lineWidth: 1) + ) } private var statusSection: some View { diff --git a/scarf/scarf/Features/Servers/ViewModels/AddServerViewModel.swift b/scarf/scarf/Features/Servers/ViewModels/AddServerViewModel.swift index 8be7769..526def4 100644 --- a/scarf/scarf/Features/Servers/ViewModels/AddServerViewModel.swift +++ b/scarf/scarf/Features/Servers/ViewModels/AddServerViewModel.swift @@ -22,7 +22,12 @@ final class AddServerViewModel { var testResult: TestResult? enum TestResult: Equatable { - case success(hermesPath: String, dbFound: Bool) + /// `suggestedRemoteHome` is non-nil when the probe didn't find + /// state.db at the configured (or default) path but did find a + /// `state.db` at one of the well-known alternates (e.g. a systemd + /// install in `/var/lib/hermes/.hermes`). UI offers a one-click + /// fill so the user doesn't have to know the convention. + case success(hermesPath: String, dbFound: Bool, suggestedRemoteHome: String?) /// `command` is the full ssh invocation we attempted (so the user can /// paste it into Terminal to see what their shell does with it). /// `stderr` is whatever ssh / the remote shell wrote to stderr. @@ -95,7 +100,7 @@ final class AddServerViewModel { /// `hermesBinaryHint` so subsequent calls don't need to re-resolve it. func configForSave() -> SSHConfig { var cfg = draftConfig - if case .success(let path, _) = testResult { + if case .success(let path, _, _) = testResult { cfg.hermesBinaryHint = path } return cfg diff --git a/scarf/scarf/Features/Servers/ViewModels/ConnectionStatusViewModel.swift b/scarf/scarf/Features/Servers/ViewModels/ConnectionStatusViewModel.swift index 35ce655..5410a2c 100644 --- a/scarf/scarf/Features/Servers/ViewModels/ConnectionStatusViewModel.swift +++ b/scarf/scarf/Features/Servers/ViewModels/ConnectionStatusViewModel.swift @@ -11,8 +11,12 @@ final class ConnectionStatusViewModel { private let logger = Logger(subsystem: "com.scarf", category: "ConnectionStatus") enum Status: Equatable { - /// Healthy: most recent probe succeeded. + /// Healthy: SSH connected AND we can read `~/.hermes/config.yaml`. case connected + /// SSH connects but the follow-up read-access probe failed. Data + /// views will be empty until this is resolved. `reason` is shown + /// in the pill tooltip; users click the pill to open diagnostics. + case degraded(reason: String) /// No probe yet or the previous probe timed out but we haven't /// confirmed failure. Shown as yellow to tell the user "checking…". case idle @@ -72,20 +76,59 @@ final class ConnectionStatusViewModel { private func probeOnce() async { let snapshot = transport - let result: Result - // Transport IO on a detached task so we don't block MainActor. - result = await Task.detached { + let hermesHome = context.paths.home + // Two-tier probe in one SSH round-trip: + // tier 1: `true` — raw connectivity / auth / ControlMaster path + // tier 2: `test -r $HERMESHOME/config.yaml` — can we actually + // read the file Dashboard reads on every tick? Green pill + // only if both pass; yellow "degraded" if tier 1 passes + // but tier 2 fails (the exact symptom in issue #19). + // Script emits two lines: TIER1: and TIER2:. + let homeArg: String + if hermesHome.hasPrefix("~/") { + homeArg = "\"$HOME/\(hermesHome.dropFirst(2))\"" + } else if hermesHome == "~" { + homeArg = "\"$HOME\"" + } else { + homeArg = "\"\(hermesHome.replacingOccurrences(of: "\"", with: "\\\""))\"" + } + let script = """ + echo TIER1:0 + H=\(homeArg) + if [ -r "$H/config.yaml" ]; then echo TIER2:0; else echo TIER2:1; fi + """ + + enum ProbeOutcome { + case connected + case degraded(reason: String) + case failure(TransportError) + } + + let outcome: ProbeOutcome = await Task.detached { do { let probe = try snapshot.runProcess( executable: "/bin/sh", - args: ["-c", "true"], + args: ["-c", script], stdin: nil, timeout: 10 ) - if probe.exitCode == 0 { - return .success(()) + guard probe.exitCode == 0 else { + return .failure(.commandFailed(exitCode: probe.exitCode, stderr: probe.stderrString)) } - return .failure(.commandFailed(exitCode: probe.exitCode, stderr: probe.stderrString)) + let out = probe.stdoutString + let tier1 = out.contains("TIER1:0") + let tier2 = out.contains("TIER2:0") + if !tier1 { + // The script itself didn't reach tier 1 — treat as connection failure. + return .failure(.commandFailed(exitCode: 1, stderr: out)) + } + if tier2 { + return .connected + } + // Connected but can't read config.yaml — the core issue #19 + // symptom. Give the pill a short reason; the full story goes + // into Remote Diagnostics. + return .degraded(reason: "can't read ~/.hermes/config.yaml") } catch let e as TransportError { return .failure(e) } catch { @@ -93,11 +136,15 @@ final class ConnectionStatusViewModel { } }.value - switch result { - case .success: + switch outcome { + case .connected: status = .connected lastSuccess = Date() consecutiveFailures = 0 + case .degraded(let reason): + status = .degraded(reason: reason) + lastSuccess = Date() // SSH itself is fine, reset failure count + consecutiveFailures = 0 case .failure(let err): consecutiveFailures += 1 // First failure → silent yellow "Reconnecting…" while we try diff --git a/scarf/scarf/Features/Servers/ViewModels/RemoteDiagnosticsViewModel.swift b/scarf/scarf/Features/Servers/ViewModels/RemoteDiagnosticsViewModel.swift new file mode 100644 index 0000000..434ff90 --- /dev/null +++ b/scarf/scarf/Features/Servers/ViewModels/RemoteDiagnosticsViewModel.swift @@ -0,0 +1,330 @@ +import Foundation +import os + +/// Runs a fixed check-list against a remote server and reports per-probe +/// pass/fail. Exists because `TestConnectionProbe` only verifies ssh +/// connectivity + hermes binary presence, and `ConnectionStatusViewModel` +/// only pings `/bin/sh -c true`. When users file "connection green but +/// everything empty" bug reports (issue #19), this is the diagnostic surface +/// that tells them (and us) exactly which read fails and why. +/// +/// One shell invocation runs every check on the remote and emits a +/// line-delimited `KEY|STATUS|DETAIL` protocol that the view model parses. +/// Cheaper than one SSH round-trip per probe and gives a consistent shell +/// environment across all probes. +@Observable +@MainActor +final class RemoteDiagnosticsViewModel { + private static let logger = Logger(subsystem: "com.scarf", category: "RemoteDiagnostics") + + let context: ServerContext + + /// Probes in display order. The order matters: connectivity first, then + /// environment checks, then Hermes data-path checks. A failure early in + /// the list usually explains every subsequent failure. + enum ProbeID: String, CaseIterable, Identifiable { + case connectivity + case remoteUser + case remoteHome + case hermesHomeConfigured + case hermesDirExists + case hermesDirReadable + case configYAMLReadable + case configYAMLContents + case stateDBReadable + case sqlite3Installed + case sqlite3CanOpenStateDB + case hermesBinaryNonLogin + case hermesBinaryLogin + case pgrepAvailable + + var id: String { rawValue } + + /// Human-readable title rendered in the diagnostics sheet. + var title: String { + switch self { + case .connectivity: return "SSH connectivity" + case .remoteUser: return "Remote user identity" + case .remoteHome: return "Remote $HOME" + case .hermesHomeConfigured: return "Hermes home directory" + case .hermesDirExists: return "Hermes directory exists" + case .hermesDirReadable: return "Hermes directory readable" + case .configYAMLReadable: return "config.yaml readable" + case .configYAMLContents: return "config.yaml actually readable (content)" + case .stateDBReadable: return "state.db readable" + case .sqlite3Installed: return "sqlite3 binary installed on remote" + case .sqlite3CanOpenStateDB: return "sqlite3 can open state.db" + case .hermesBinaryNonLogin: return "hermes binary on non-login PATH" + case .hermesBinaryLogin: return "hermes binary on login PATH (via rc files)" + case .pgrepAvailable: return "pgrep available (for 'is Hermes running')" + } + } + + /// When the check fails, show this hint alongside the stderr. + var failureHint: String? { + switch self { + case .connectivity: + return "SSH itself can't complete. Before re-testing in Scarf, confirm `ssh ` works in Terminal." + case .remoteUser, .remoteHome: + return nil + case .hermesHomeConfigured: + return nil + case .hermesDirExists: + return "Scarf is looking at the default `~/.hermes`. If Hermes is installed elsewhere (e.g. `/var/lib/hermes/.hermes` for systemd installs), set the Hermes home directory in Manage Servers → this server → Edit." + case .hermesDirReadable: + return "The SSH user can see `~/.hermes` but can't list it. Check permissions: `ls -ld ~/.hermes` on the remote — the SSH user needs at least `r-x`." + case .configYAMLReadable, .configYAMLContents: + return "Scarf can't read `config.yaml`. This usually means the SSH user is different from the user Hermes runs as. Either (a) run Hermes as the SSH user, (b) `chmod a+r ~/.hermes/config.yaml`, or (c) configure Scarf to SSH as the Hermes user." + case .stateDBReadable: + return "Scarf can't read `state.db` — Sessions, Activity, Dashboard stats all depend on this. Same fix pattern as config.yaml." + case .sqlite3Installed: + return "Scarf pulls a snapshot of state.db via `sqlite3 .backup`, so sqlite3 must be installed on the remote. Install: `sudo apt install sqlite3` (Ubuntu/Debian), `sudo yum install sqlite` (RHEL/Fedora), `apk add sqlite` (Alpine)." + case .sqlite3CanOpenStateDB: + return "sqlite3 exists but can't open state.db. Could be a permission issue, a corrupt DB, or a version skew." + case .hermesBinaryNonLogin: + return "Scarf's runtime calls use non-login SSH shells (no .bashrc). If `hermes` only appears here via the login path, runtime CLI calls will fail. Move your PATH export from `.bashrc` to `.zshenv` or `.profile`." + case .hermesBinaryLogin: + return "hermes couldn't be located even after sourcing login rc files. Install path is non-standard — set the hermes binary path manually in Manage Servers." + case .pgrepAvailable: + return "pgrep not found on remote. Dashboard can't determine whether Hermes is running. Install procps: `apt install procps` (most distros have it by default)." + } + } + } + + struct Probe: Identifiable, Sendable { + let id: ProbeID + let passed: Bool + let detail: String + } + + private(set) var probes: [Probe] = [] + private(set) var isRunning: Bool = false + private(set) var startedAt: Date? + private(set) var finishedAt: Date? + + init(context: ServerContext) { + self.context = context + } + + /// Kick off the full check list. Safe to call again to re-run. + func run() async { + if isRunning { return } + isRunning = true + probes = [] + startedAt = Date() + finishedAt = nil + + let script = Self.buildScript(hermesHome: context.paths.home) + let captured = await Self.execute(script: script, context: context) + + switch captured { + case .connectFailure(let msg): + probes = [ + Probe(id: .connectivity, passed: false, detail: msg) + ] + ProbeID.allCases + .filter { $0 != .connectivity } + .map { Probe(id: $0, passed: false, detail: "(skipped — SSH didn't connect)") } + case .completed(let stdout, let stderr, let exitCode): + probes = Self.parse(stdout: stdout, stderr: stderr, exitCode: exitCode) + } + + finishedAt = Date() + isRunning = false + Self.logger.info("Diagnostics for \(self.context.displayName, privacy: .public) finished — \(self.passingCount)/\(self.probes.count) passing") + } + + /// Quick summary string, e.g. "9/14 passing". Used in the header. + var summary: String { + guard !probes.isEmpty else { return "Not yet run." } + return "\(passingCount)/\(probes.count) checks passing" + } + + var passingCount: Int { + probes.filter { $0.passed }.count + } + + var allPassed: Bool { + !probes.isEmpty && passingCount == probes.count + } + + // MARK: - Script + parsing + + /// Build the remote shell script. Uses a pipe-delimited protocol so the + /// Swift side can parse without regex surprises. Status is `PASS` or + /// `FAIL`; detail is a single line (can be blank). `__END__` at the + /// bottom lets us detect truncation. + private static func buildScript(hermesHome: String) -> String { + // Shell-quote the home path — user may have typed `~/.hermes` which + // we want the remote shell to expand, so we substitute `~/` with + // `$HOME/` like `SSHTransport.remotePathArg` does. + let expanded: String + if hermesHome.hasPrefix("~/") { + expanded = "\"$HOME/\(hermesHome.dropFirst(2))\"" + } else if hermesHome == "~" { + expanded = "\"$HOME\"" + } else { + // Absolute path — still quote in case of spaces. + expanded = "\"\(hermesHome.replacingOccurrences(of: "\"", with: "\\\""))\"" + } + + return #""" + H=\#(expanded) + emit() { printf '%s|%s|%s\n' "$1" "$2" "$3"; } + + emit connectivity PASS "(running in this shell)" + + user=$(id -un 2>/dev/null || echo unknown) + emit remoteUser PASS "$user" + + emit remoteHome PASS "$HOME" + + emit hermesHomeConfigured PASS "$H" + + if [ -d "$H" ]; then + emit hermesDirExists PASS "$H" + else + emit hermesDirExists FAIL "not a directory: $H" + fi + + if [ -r "$H" ] && [ -x "$H" ]; then + emit hermesDirReadable PASS "" + else + emit hermesDirReadable FAIL "cannot read/enter $H (check perms on the dir)" + fi + + if [ -r "$H/config.yaml" ]; then + emit configYAMLReadable PASS "" + else + if [ -e "$H/config.yaml" ]; then + emit configYAMLReadable FAIL "exists but not readable by $user" + else + emit configYAMLReadable FAIL "file does not exist" + fi + fi + + if head -c 1 "$H/config.yaml" > /dev/null 2>&1; then + size=$(wc -c < "$H/config.yaml" 2>/dev/null | tr -d ' ') + emit configYAMLContents PASS "${size} bytes" + else + emit configYAMLContents FAIL "cannot read file contents" + fi + + if [ -r "$H/state.db" ]; then + size=$(wc -c < "$H/state.db" 2>/dev/null | tr -d ' ') + emit stateDBReadable PASS "${size} bytes" + else + if [ -e "$H/state.db" ]; then + emit stateDBReadable FAIL "exists but not readable by $user" + else + emit stateDBReadable FAIL "file does not exist" + fi + fi + + if command -v sqlite3 > /dev/null 2>&1; then + sq=$(command -v sqlite3) + emit sqlite3Installed PASS "$sq" + else + emit sqlite3Installed FAIL "sqlite3 not on PATH" + fi + + if sqlite3 "$H/state.db" 'SELECT 1' > /dev/null 2>&1; then + emit sqlite3CanOpenStateDB PASS "" + else + err=$(sqlite3 "$H/state.db" 'SELECT 1' 2>&1 | head -1) + emit sqlite3CanOpenStateDB FAIL "$err" + fi + + # Non-login PATH: just ask the current shell. + hpath=$(command -v hermes 2>/dev/null) + if [ -n "$hpath" ]; then + emit hermesBinaryNonLogin PASS "$hpath" + else + emit hermesBinaryNonLogin FAIL "not on non-login PATH ($PATH)" + fi + + # Login PATH: source rc files (mirroring TestConnectionProbe) and re-probe. + for rc in "$HOME/.zshenv" "$HOME/.zprofile" "$HOME/.bash_profile" "$HOME/.profile"; do + [ -f "$rc" ] && . "$rc" 2>/dev/null + done + hpath2=$(command -v hermes 2>/dev/null) + if [ -z "$hpath2" ]; then + for cand in "$HOME/.local/bin/hermes" "/opt/homebrew/bin/hermes" "/usr/local/bin/hermes" "$HOME/.hermes/bin/hermes"; do + if [ -x "$cand" ]; then hpath2="$cand"; break; fi + done + fi + if [ -n "$hpath2" ]; then + emit hermesBinaryLogin PASS "$hpath2" + else + emit hermesBinaryLogin FAIL "not found after sourcing rc files" + fi + + if command -v pgrep > /dev/null 2>&1; then + emit pgrepAvailable PASS "$(command -v pgrep)" + else + emit pgrepAvailable FAIL "pgrep not on PATH" + fi + + printf '__END__\n' + """# + } + + enum Captured { + case connectFailure(String) + case completed(stdout: String, stderr: String, exitCode: Int32) + } + + private static func execute(script: String, context: ServerContext) async -> Captured { + let transport = context.makeTransport() + return await Task.detached { () -> Captured in + do { + let result = try transport.runProcess( + executable: "/bin/sh", + args: ["-c", script], + stdin: nil, + timeout: 30 + ) + return .completed( + stdout: result.stdoutString, + stderr: result.stderrString, + exitCode: result.exitCode + ) + } catch { + let desc = (error as? LocalizedError)?.errorDescription ?? error.localizedDescription + return .connectFailure(desc) + } + }.value + } + + private static func parse(stdout: String, stderr: String, exitCode: Int32) -> [Probe] { + var results: [ProbeID: Probe] = [:] + for line in stdout.split(whereSeparator: { $0 == "\n" || $0 == "\r" }) { + let parts = line.split(separator: "|", maxSplits: 2, omittingEmptySubsequences: false) + guard parts.count == 3 else { continue } + let key = String(parts[0]).trimmingCharacters(in: .whitespaces) + let status = String(parts[1]).trimmingCharacters(in: .whitespaces) + let detail = String(parts[2]).trimmingCharacters(in: .whitespaces) + guard let probe = ProbeID(rawValue: key) else { continue } + results[probe] = Probe( + id: probe, + passed: status == "PASS", + detail: detail + ) + } + + // If the script didn't complete, fill in the missing probes so the UI + // still shows every expected row (rather than silently skipping). + let terminated = stdout.contains("__END__") + let fallbackDetail: String + if terminated { + fallbackDetail = "(no output)" + } else if exitCode != 0 { + fallbackDetail = "(script exited \(exitCode) before this check — stderr: \(stderr.prefix(200)))" + } else { + fallbackDetail = "(no output from script)" + } + + return ProbeID.allCases.map { id in + results[id] ?? Probe(id: id, passed: false, detail: fallbackDetail) + } + } +} diff --git a/scarf/scarf/Features/Servers/ViewModels/TestConnectionProbe.swift b/scarf/scarf/Features/Servers/ViewModels/TestConnectionProbe.swift index b2aeb34..0e3ba44 100644 --- a/scarf/scarf/Features/Servers/ViewModels/TestConnectionProbe.swift +++ b/scarf/scarf/Features/Servers/ViewModels/TestConnectionProbe.swift @@ -47,6 +47,26 @@ struct TestConnectionProbe { // Scarf's local resolution. // The matched absolute path is stored as `hermesBinaryHint` on the // SSHConfig so subsequent CLI/ACP invocations don't have to re-probe. + // If the user already typed a remoteHome override, use it; otherwise + // default to $HOME/.hermes. Either way, the script also probes a + // short list of well-known alternates when the primary path doesn't + // have state.db — systemd/docker/VPS installs tend to live at + // /var/lib/hermes/.hermes or /home/hermes/.hermes, and SSHing in as + // a different user than the Hermes daemon is the leading cause of + // "connection green, data empty" bug reports (issue #19). + let primary: String + if let override = config.remoteHome, !override.isEmpty { + if override.hasPrefix("~/") { + primary = "$HOME/\(override.dropFirst(2))" + } else if override == "~" { + primary = "$HOME" + } else { + primary = override + } + } else { + primary = "$HOME/.hermes" + } + let script = #""" hpath=$(command -v hermes 2>/dev/null) if [ -z "$hpath" ]; then @@ -61,7 +81,21 @@ struct TestConnectionProbe { done fi echo "HERMES:$hpath" - if [ -e "$HOME/.hermes/state.db" ]; then echo DB:ok; else echo DB:missing; fi + PRIMARY="\#(primary)" + if [ -r "$PRIMARY/state.db" ]; then + echo "DB:ok" + echo "HOME_USED:$PRIMARY" + else + echo "DB:missing" + # Probe well-known alternates. Emit the first one that has a + # readable state.db so the UI can offer a one-click fill. + for alt in "/var/lib/hermes/.hermes" "/opt/hermes/.hermes" "/home/hermes/.hermes" "/root/.hermes"; do + if [ -r "$alt/state.db" ]; then + echo "SUGGEST:$alt" + break + fi + done + fi """# sshArgs.append("/bin/sh") sshArgs.append("-c") @@ -133,6 +167,8 @@ struct TestConnectionProbe { let hermesPath = lines.first(where: { $0.hasPrefix("HERMES:") })? .dropFirst("HERMES:".count).trimmingCharacters(in: .whitespaces) ?? "" let dbFound = lines.contains(where: { $0 == "DB:ok" }) + let suggestedHome = lines.first(where: { $0.hasPrefix("SUGGEST:") }) + .map { String($0.dropFirst("SUGGEST:".count)).trimmingCharacters(in: .whitespaces) } if hermesPath.isEmpty { return .failure( message: "hermes binary not found in remote $PATH", @@ -140,7 +176,7 @@ struct TestConnectionProbe { command: displayCommand ) } - return .success(hermesPath: String(hermesPath), dbFound: dbFound) + return .success(hermesPath: String(hermesPath), dbFound: dbFound, suggestedRemoteHome: suggestedHome) } // Classify common failures by scanning the stderr trace. diff --git a/scarf/scarf/Features/Servers/Views/AddServerSheet.swift b/scarf/scarf/Features/Servers/Views/AddServerSheet.swift index 2cfbdc0..2ee63e5 100644 --- a/scarf/scarf/Features/Servers/Views/AddServerSheet.swift +++ b/scarf/scarf/Features/Servers/Views/AddServerSheet.swift @@ -81,11 +81,15 @@ struct AddServerSheet: View { } } - LabeledField("Remote ~/.hermes override") { - TextField("Leave blank for default", text: $viewModel.remoteHome) + LabeledField("Hermes data directory") { + TextField("Default: ~/.hermes", text: $viewModel.remoteHome) .textFieldStyle(.roundedBorder) .autocorrectionDisabled() } + Text("Leave blank unless Hermes is installed at a non-default path (systemd services often live at /var/lib/hermes/.hermes; Docker sidecars vary). Test Connection auto-suggests a value when it detects one of the known alternates.") + .font(.caption) + .foregroundStyle(.secondary) + .fixedSize(horizontal: false, vertical: true) Text("Scarf uses ssh-agent for authentication. If your key has a passphrase, run `ssh-add` before connecting — Scarf never prompts for or stores passphrases.") .font(.caption) @@ -113,14 +117,43 @@ struct AddServerSheet: View { if let result = viewModel.testResult { switch result { - case .success(let path, let dbFound): - VStack(alignment: .leading, spacing: 4) { + case .success(let path, let dbFound, let suggestedHome): + VStack(alignment: .leading, spacing: 6) { Label("Connected", systemImage: "checkmark.circle.fill") .foregroundStyle(.green) Text("hermes at \(path)").font(.caption).monospaced() - Text(dbFound ? "state.db found" : "state.db not found — Hermes may not have run yet on the remote") - .font(.caption) - .foregroundStyle(dbFound ? Color.secondary : Color.orange) + if dbFound { + Text("state.db readable") + .font(.caption) + .foregroundStyle(.secondary) + } else if let suggestion = suggestedHome { + // Scarf found Hermes data at one of the common + // alternate paths. One-click fill the + // remoteHome field so the user doesn't have to + // know this is a convention thing. + VStack(alignment: .leading, spacing: 4) { + Text("state.db not found at the default location, but Scarf found one at:") + .font(.caption) + .foregroundStyle(.orange) + HStack { + Text(suggestion) + .font(.caption.monospaced()) + .textSelection(.enabled) + Spacer() + Button("Use this") { + viewModel.remoteHome = suggestion + } + .controlSize(.small) + } + .padding(8) + .background(Color.yellow.opacity(0.12), in: RoundedRectangle(cornerRadius: 6)) + } + } else { + Text("state.db not found at the configured path. Either Hermes hasn't run yet on this server, or it's installed at a non-default location — set the Hermes data directory field above.") + .font(.caption) + .foregroundStyle(.orange) + .fixedSize(horizontal: false, vertical: true) + } } case .failure(let message, let stderr, let command): VStack(alignment: .leading, spacing: 6) { diff --git a/scarf/scarf/Features/Servers/Views/ConnectionStatusPill.swift b/scarf/scarf/Features/Servers/Views/ConnectionStatusPill.swift index 3533585..1e3e468 100644 --- a/scarf/scarf/Features/Servers/Views/ConnectionStatusPill.swift +++ b/scarf/scarf/Features/Servers/Views/ConnectionStatusPill.swift @@ -8,12 +8,17 @@ import SwiftUI struct ConnectionStatusPill: View { let status: ConnectionStatusViewModel @State private var showDetails = false + @State private var showDiagnostics = false var body: some View { Button { switch status.status { case .error: showDetails = true + case .degraded: + // Yellow "can't read" state — open the diagnostics sheet + // so the user can see exactly which files fail and why. + showDiagnostics = true case .connected, .idle: status.retry() } @@ -36,11 +41,15 @@ struct ConnectionStatusPill: View { .popover(isPresented: $showDetails, arrowEdge: .bottom) { errorDetails.frame(width: 400) } + .sheet(isPresented: $showDiagnostics) { + RemoteDiagnosticsView(context: status.context) + } } private var color: Color { switch status.status { case .connected: return .green + case .degraded: return .orange case .idle: return .yellow case .error: return .red } @@ -49,6 +58,7 @@ struct ConnectionStatusPill: View { private var label: String { switch status.status { case .connected: return "Connected" + case .degraded: return "Connected — can't read Hermes state" case .idle: return "Checking…" case .error(let message, _): return message } @@ -62,6 +72,8 @@ struct ConnectionStatusPill: View { return "Last probe: \(fmt.localizedString(for: ts, relativeTo: Date()))" } return "Connected" + case .degraded(let reason): + return "SSH works but \(reason). Click for diagnostics." case .idle: return "Waiting for first probe" case .error(_, _): return "Click for details" } diff --git a/scarf/scarf/Features/Servers/Views/ManageServersView.swift b/scarf/scarf/Features/Servers/Views/ManageServersView.swift index 994c789..ffaa929 100644 --- a/scarf/scarf/Features/Servers/Views/ManageServersView.swift +++ b/scarf/scarf/Features/Servers/Views/ManageServersView.swift @@ -6,6 +6,7 @@ struct ManageServersView: View { @Environment(ServerRegistry.self) private var registry @State private var showAddSheet = false @State private var pendingRemoveID: ServerID? + @State private var diagnosticsContext: ServerContext? var body: some View { VStack(alignment: .leading, spacing: 0) { @@ -17,12 +18,18 @@ struct ManageServersView: View { list } } - .frame(width: 380, height: 360) + .frame(width: 440, height: 380) .sheet(isPresented: $showAddSheet) { AddServerSheet { name, config in _ = registry.addServer(displayName: name, config: config) } } + .sheet(item: Binding( + get: { diagnosticsContext.map { IdentifiableContext(context: $0) } }, + set: { diagnosticsContext = $0?.context } + )) { wrapper in + RemoteDiagnosticsView(context: wrapper.context) + } .confirmationDialog( "Remove this server?", isPresented: Binding( @@ -42,6 +49,13 @@ struct ManageServersView: View { ) } + /// Wrapper because `ServerContext` isn't `Identifiable` against the sheet + /// item API in a way that preserves display-ordering stability. + private struct IdentifiableContext: Identifiable { + var id: ServerID { context.id } + let context: ServerContext + } + private var header: some View { HStack { Text("Servers").font(.headline) @@ -87,6 +101,13 @@ struct ManageServersView: View { } } Spacer() + Button { + diagnosticsContext = entry.context + } label: { + Image(systemName: "stethoscope") + } + .buttonStyle(.borderless) + .help("Run remote diagnostics — check exactly which files are readable on this server.") Button { pendingRemoveID = entry.id } label: { @@ -94,6 +115,7 @@ struct ManageServersView: View { } .buttonStyle(.borderless) .foregroundStyle(.red) + .help("Remove this server from Scarf.") } .padding(.vertical, 4) } diff --git a/scarf/scarf/Features/Servers/Views/RemoteDiagnostics.swift b/scarf/scarf/Features/Servers/Views/RemoteDiagnostics.swift new file mode 100644 index 0000000..26cc440 --- /dev/null +++ b/scarf/scarf/Features/Servers/Views/RemoteDiagnostics.swift @@ -0,0 +1,166 @@ +import SwiftUI +import AppKit + +/// Per-server diagnostics sheet. Shown from Manage Servers and from the +/// Dashboard "Run Diagnostics…" button when `lastReadError` is set. Gives +/// the user a specific list of what does/doesn't work over SSH, with +/// targeted remediation hints for each failure. +/// +/// Design principle: a failing check always shows both the raw detail the +/// remote shell produced AND a human-written hint. The raw detail lets us +/// triage bug reports; the hint unblocks the user without a round trip. +struct RemoteDiagnosticsView: View { + let context: ServerContext + @State private var viewModel: RemoteDiagnosticsViewModel + @Environment(\.dismiss) private var dismiss + + init(context: ServerContext) { + self.context = context + _viewModel = State(initialValue: RemoteDiagnosticsViewModel(context: context)) + } + + var body: some View { + VStack(spacing: 0) { + header + Divider() + probeList + Divider() + footer + } + .frame(minWidth: 640, minHeight: 520) + .task { await viewModel.run() } + } + + private var header: some View { + VStack(alignment: .leading, spacing: 6) { + HStack { + Text("Remote Diagnostics — \(context.displayName)") + .font(.title3) + .fontWeight(.semibold) + Spacer() + if viewModel.isRunning { + ProgressView() + .controlSize(.small) + } else { + Button("Re-run") { Task { await viewModel.run() } } + .controlSize(.small) + } + } + HStack { + if viewModel.isRunning { + Text("Running checks…") + .font(.callout) + .foregroundStyle(.secondary) + } else { + Label(viewModel.summary, systemImage: viewModel.allPassed ? "checkmark.seal" : "info.circle") + .font(.callout) + .foregroundStyle(viewModel.allPassed ? .green : .orange) + } + Spacer() + if !viewModel.probes.isEmpty { + Button { + copyReportToClipboard() + } label: { + Label("Copy Full Report", systemImage: "doc.on.doc") + } + .controlSize(.small) + .help("Copy a plain-text summary of every check (passes and fails) — paste into GitHub issues so we can see everything at once.") + } + } + } + .padding(16) + } + + private var probeList: some View { + ScrollView { + VStack(alignment: .leading, spacing: 0) { + if viewModel.probes.isEmpty && viewModel.isRunning { + Text("Running a single shell session on \(context.displayName) that exercises every path Scarf reads…") + .font(.callout) + .foregroundStyle(.secondary) + .padding() + } + ForEach(viewModel.probes) { probe in + probeRow(probe) + if probe.id != viewModel.probes.last?.id { + Divider() + } + } + } + } + } + + private func probeRow(_ probe: RemoteDiagnosticsViewModel.Probe) -> some View { + HStack(alignment: .top, spacing: 12) { + Image(systemName: probe.passed ? "checkmark.circle.fill" : "xmark.circle.fill") + .foregroundStyle(probe.passed ? .green : .red) + .font(.title3) + .padding(.top, 2) + VStack(alignment: .leading, spacing: 4) { + Text(probe.id.title) + .font(.body) + if !probe.detail.isEmpty { + Text(probe.detail) + .font(.caption.monospaced()) + .foregroundStyle(.secondary) + .textSelection(.enabled) + } + if !probe.passed, let hint = probe.id.failureHint { + HStack(alignment: .top, spacing: 6) { + Image(systemName: "lightbulb") + .foregroundStyle(.yellow) + .font(.caption) + Text(hint) + .font(.caption) + .foregroundStyle(.primary) + .textSelection(.enabled) + .fixedSize(horizontal: false, vertical: true) + } + .padding(8) + .background(Color.yellow.opacity(0.08)) + .clipShape(RoundedRectangle(cornerRadius: 6)) + } + } + Spacer(minLength: 0) + } + .padding(.horizontal, 16) + .padding(.vertical, 10) + } + + private var footer: some View { + HStack { + Text("Scarf runs these over a single SSH session that mirrors the shell your dashboard reads from, so a green row here means Scarf can actually read that file at runtime.") + .font(.caption) + .foregroundStyle(.secondary) + .fixedSize(horizontal: false, vertical: true) + Spacer() + Button("Done") { dismiss() } + .keyboardShortcut(.defaultAction) + } + .padding(16) + } + + private func copyReportToClipboard() { + var lines: [String] = [] + lines.append("Scarf remote diagnostics — \(context.displayName)") + if case .ssh(let config) = context.kind { + lines.append("Host: \(config.host)" + (config.user.map { " (user: \($0))" } ?? "")) + if let rh = config.remoteHome { lines.append("Hermes home (override): \(rh)") } + } + lines.append("Ran at: \(viewModel.startedAt.map { ISO8601DateFormatter().string(from: $0) } ?? "?")") + lines.append("Result: \(viewModel.summary)") + lines.append("") + for probe in viewModel.probes { + let mark = probe.passed ? "PASS" : "FAIL" + lines.append("[\(mark)] \(probe.id.title)") + if !probe.detail.isEmpty { lines.append(" \(probe.detail)") } + if !probe.passed, let hint = probe.id.failureHint { + lines.append(" hint: \(hint)") + } + } + let text = lines.joined(separator: "\n") + let pb = NSPasteboard.general + pb.clearContents() + pb.setString(text, forType: .string) + } +} From f8069a44810a0a0887a4746693ae21f9013c5cfc Mon Sep 17 00:00:00 2001 From: Alan Wizemann Date: Mon, 20 Apr 2026 13:52:28 -0700 Subject: [PATCH 2/8] fix: don't log 'No such file' as warnings on remote reads MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The Result-returning readers I added for the v2.0.1 diagnostics surface were logging EVERY failure, including routine "file doesn't exist" cases — e.g. skill.yaml files under ~/.hermes/skills/*/ that are optional metadata, gateway_state.json before Hermes has started, memories/USER.md on fresh installs. In practice this meant the Platforms view and similar feature loaders that walk directories and read optional files now spam the Console with warnings on every refresh. That's noisier than useful and actively hides the signal (permission denied, connection failure, sqlite3 missing) we added the logging to surface. readFileDataResult now detects the "no such file" case via either: - TransportError.fileIO(_, "No such file...") from SSHTransport - NSCocoaErrorDomain code 260 (NSFileNoSuchFileError) from FileManager - NSPOSIXErrorDomain code 2 (ENOENT) and suppresses the warning log for those paths. The Result.failure is still returned, so any caller that cares (Dashboard's banner, Remote Diagnostics) can still distinguish missing from present-but-unreadable. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../Core/Services/HermesFileService.swift | 27 ++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/scarf/scarf/Core/Services/HermesFileService.swift b/scarf/scarf/Core/Services/HermesFileService.swift index 2e3df54..cf5e980 100644 --- a/scarf/scarf/Core/Services/HermesFileService.swift +++ b/scarf/scarf/Core/Services/HermesFileService.swift @@ -1591,11 +1591,36 @@ struct HermesFileService: Sendable { let data = try transport.readFile(path) return .success(data) } catch { - Self.logger.warning("readFile(\(path, privacy: .public)) failed: \(error.localizedDescription, privacy: .public)") + // Don't log "No such file" — that's a routine, expected case + // for optional files (skill.yaml, gateway_state.json before + // Hermes starts, ~/.hermes/memories/USER.md on fresh installs, + // etc.). The caller still gets the Result.failure so it can + // distinguish missing from present-but-unreadable. + // Log everything else — permission denied, connection drops, + // sqlite3 missing — since those are actionable diagnostics. + if !Self.isFileNotFound(error) { + Self.logger.warning("readFile(\(path, privacy: .public)) failed: \(error.localizedDescription, privacy: .public)") + } return .failure(error) } } + /// `true` iff the error represents "file does not exist" as opposed to + /// a permission / transport / parse failure. Used to suppress routine + /// logging for optional files while still surfacing real problems. + nonisolated private static func isFileNotFound(_ error: Error) -> Bool { + if let transportErr = error as? TransportError, + case .fileIO(_, let underlying) = transportErr { + return underlying.lowercased().contains("no such file") + } + // Cocoa NSFileNoSuchFileError (returned by LocalTransport when + // reading a missing file via FileManager). + let ns = error as NSError + if ns.domain == NSCocoaErrorDomain && ns.code == 260 { return true } + if ns.domain == NSPOSIXErrorDomain && ns.code == 2 { return true } // ENOENT + return false + } + /// Write a UTF-8 text file atomically through the transport. Matches the /// old pre-transport behavior (print + swallow on error) because the /// callers don't have a UI path for surfacing I/O failures — that's From ec03627bcd3f09372f52a24583a5299a390f41cb Mon Sep 17 00:00:00 2001 From: Alan Wizemann Date: Mon, 20 Apr 2026 13:56:09 -0700 Subject: [PATCH 3/8] =?UTF-8?q?fix:=20diagnostics=20sheet=20=E2=80=94=20by?= =?UTF-8?q?pass=20transport.runProcess=20for=20shell=20script?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit First-run of diagnostics against a working Mardon returned 0/14 passing with "(no output)" for every probe — including the trivial "emit connectivity PASS" that the script emits unconditionally. That meant the script wasn't executing as written; the parser saw `__END__` but no probe lines. Root cause: SSHTransport.runProcess wraps every argument through `remotePathArg`, which is designed for PATHS (it rewrites `~/` to `$HOME/` and double-quotes the result with backslash-escapes). Passing a multi-line shell script with embedded `"$1"` / `"$2"` / `"$3"` and `printf '\n'` escape sequences through that is corruption — the remote sh -c receives a scrambled script and silently emits nothing. TestConnectionProbe already works around this: it builds the ssh argv directly (ssh host -- /bin/sh -c