Compare commits


16 Commits

Author SHA1 Message Date
Alan Wizemann 0384c6ef17 chore: Bump version to 2.0.2 2026-04-20 15:46:07 -07:00
Alan Wizemann f36fb55ebe test(ssh): regression tests for ControlPath socket-limit invariants
Two tests pinning the invariants that were violated / introduced
by the #19 / PR #20 fix:

- controlDirPathFitsMacOSSocketLimit: asserts dir + '/' + 64-char
  %C hash + NUL <= 104 bytes. Would have caught the original
  Caches-based path landing at 105 bytes for users with longer
  $HOME strings.

- controlDirPathIsPerUser: asserts the path includes the current
  uid, pinning the per-user-isolation invariant against any future
  refactor that drops it (since /tmp is shared across all local
  users).

scarfTests was a stub before this — these are the suite's first
real tests.
2026-04-20 15:45:29 -07:00
Alan Wizemann 1823160546 fix(ssh): defensive ControlPath dir + sweep stale sockets
Layered hardening on top of the /tmp ControlPath move from #20:

- ensureControlDir uses POSIX mkdir(0700) + lstat instead of
  createDirectory + setAttributes. Closes the /tmp pre-creation
  TOCTOU: any local user can pre-create /tmp/scarf-ssh-<uid>, and
  the old code would silently fail to chmod a hostile dir back to
  0700 (since we wouldn't own it). Now we refuse to use a dir that
  isn't a real directory we own with mode 0700, and log via
  os.Logger.

- sweepStaleControlSockets removes ControlMaster socket files
  older than 30 minutes from controlDirPath() at app launch.
  Symmetric to sweepOrphanSnapshots — keeps /tmp/scarf-ssh-<uid>/
  from accumulating crashed-master / unclean-exit orphans
  indefinitely until reboot. The 30-min threshold (vs ControlPersist's
  10 min) ensures any concurrent Scarf instance's live sockets
  are untouched.
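
A minimal sketch of the age-based sweep described above, not the shipped `sweepStaleControlSockets`. The function name, the `socketsOnly` switch, and the use of modification time as the "age" are assumptions made for illustration.

```swift
import Foundation

/// Removes entries under `dir` last modified more than `maxAge` seconds ago
/// and returns the removed names. With `socketsOnly`, anything that is not a
/// Unix socket is left alone. (Hypothetical helper; "age" here means mtime.)
@discardableResult
func sweepStaleSockets(in dir: String,
                       olderThan maxAge: TimeInterval = 30 * 60,
                       socketsOnly: Bool = true,
                       now: Date = Date()) -> [String] {
    let fm = FileManager.default
    guard let names = try? fm.contentsOfDirectory(atPath: dir) else { return [] }
    var removed: [String] = []
    for name in names.sorted() {
        let path = dir + "/" + name
        // attributesOfItem does not follow symlinks, so a link is judged as a link.
        guard let attrs = try? fm.attributesOfItem(atPath: path),
              let modified = attrs[.modificationDate] as? Date,
              now.timeIntervalSince(modified) > maxAge else { continue }
        let type = attrs[.type] as? FileAttributeType
        if socketsOnly && type != FileAttributeType.typeSocket { continue }
        do {
            try fm.removeItem(atPath: path)
            removed.append(name)
        } catch {
            continue   // best effort: a concurrent instance may have removed it first
        }
    }
    return removed
}
```

Keeping the threshold (30 min) well above ControlPersist (10 min) is what makes the sweep safe to run while another instance is alive.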
2026-04-20 15:45:20 -07:00
Alan Wizemann d2a447fcc4 docs: add GitHub wiki + scripts/wiki.sh helper with secret-scan
Public docs now live at https://github.com/awizemann/scarf/wiki (separate
git repo cloned to .wiki-worktree/, mirroring the .gh-pages-worktree/
pattern). Internal dev notes stay in scarf/docs/.

scripts/wiki.sh wraps pull/commit/push with a two-pass secret-scan: hard
patterns (token regexes + private-key headers + a user-maintained
scripts/wiki-blocklist.txt) abort with non-zero exit; soft assignment
patterns (api_key=…, password=…, token=…) warn and require --force-terms.

CLAUDE.md gains a Wiki section listing the update triggers (new feature,
new service, architecture change, Hermes version bump, full release,
keyboard/sidebar change) and the workflow. CONTRIBUTING.md points
external contributors at the wiki Edit button or a direct clone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 15:32:47 -07:00
Alan Wizemann 76bfeb34d4 chore: Bump version to 2.0.1 2026-04-20 15:32:47 -07:00
Alan Wizemann 85a4ec0e14 Merge pull request #20 from aliatx2017/fix/controlpath-too-long
fix: ControlPath too long for Unix socket on macOS
2026-04-20 15:08:40 -07:00
Alan Wizemann 1453c7a841 Merge fix/issue-19-ssh-diagnostics into main — v2.0.1 hotfix
Closes #19 (remote SSH connections showed connected but every view
read as empty). Eight commits bring:
- Result-returning readers in HermesFileService that surface errors
  instead of silently returning nil
- HermesDataService.open records lastOpenError with humanized hints
- Dashboard orange banner when remote reads fail
- New Remote Diagnostics sheet (14-probe checklist, stethoscope icon)
- Yellow 'degraded' pill state for 'connected but can't read' case
- Auto-suggest remoteHome in Test Connection for systemd/Docker
  installs at /var/lib/hermes/.hermes etc.
- Log-noise suppression for expected 'No such file' reads
- Diagnostics script pipes via stdin to sh -s (not sh -c argv), so
  multi-line scripts run in one sh process with variable scope
- Pill UX: state-specific SF Symbol instead of dot, no custom
  background, centered via .principal
- README 'Remote setup requirements' + troubleshooting section

Investigation notes + deferred follow-ups recorded in the session
transcript. See releases/v2.0.1/RELEASE_NOTES.md for the full
user-facing breakdown.
2026-04-20 14:27:11 -07:00
Alan Wizemann bd21a539e6 docs: update v2.0.1 release notes for diagnostics fixes + pill UX
Reflect the three post-initial-commit fixes:
- log-noise suppression (skill.yaml / optional-file 'No such file'
  warnings no longer spam Console via the new Result-returning readers)
- diagnostics script now stdin-pipes to sh -s instead of sh -c <script>
  argv, so it runs as one sh process with variable scope preserved
- pill UX: replaced colored dot with state-specific SF Symbol
  (checkmark / stethoscope / arrows / triangle), removed custom
  background, kept .principal placement for centering

Also expanded the 'Known follow-ups' section so users know what's
explicitly deferred post-2.0.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:26:33 -07:00
Alan Wizemann d3055702ef fix: connection pill — revert to .principal, swap dot for state SF Symbol
Rolling back the .primaryAction placement (the pill shifted right and
lost its centered position in the toolbar). The "funny background with
shadow" visible in the toolbar is macOS's own .principal emphasis bezel
— not something Scarf draws, and not something we can cleanly hide
without disabling the toolbar surface itself. The native bezel is the
pill's frame; we just have to make the pill's interior read well inside
it.

Two changes to make the pill itself look like a toolbar tool inside
that bezel:

- Drop the colored dot, replace with a state-specific SF Symbol. The
  icon's shape signals clickability (looks like a tool button), and its
  color signals state (green/orange/yellow/red hierarchical). Less
  "status chip", more "toolbar button with status".
- Icons per state:
  - connected  → checkmark.circle.fill (click to re-probe)
  - degraded   → stethoscope (click to run diagnostics, matches the
                 stethoscope on the Manage Servers row)
  - idle       → arrow.triangle.2.circlepath (checking/retry)
  - error      → exclamationmark.triangle.fill (click for stderr)

Horizontal padding = 4 so the icon-and-label sit balanced inside the
bezel rather than pushed up against its edges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:22:35 -07:00
Alan Wizemann ee1d705abc fix: move connection pill off .principal to drop the emphasis bezel
macOS applies a centered emphasis bezel (light capsule + drop shadow)
to ToolbarItem(placement: .principal) — visible in screenshots as a
doubly-framed "capsule behind the pill" look. The pill itself doesn't
own that background; the toolbar placement does.

.primaryAction (right side of the toolbar) has no decorative
background, so the pill renders as just the colored dot + label text
directly on the toolbar surface. Fits the intended minimal look.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:16:54 -07:00
Alan Wizemann 8e3dafe4c6 fix: remove the pill's own capsule background
The toolbar item already draws its own bezel for the principal-placement
slot; painting a `Color.secondary.opacity(0.08)` capsule on top gave the
pill a doubly-framed look. Drop the pill's background + the padding that
was only there to fit inside the capsule. The dot + label now sit
directly on the toolbar's native surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:14:12 -07:00
Alan Wizemann c51241dc72 fix: diagnostics script — pipe to sh via stdin, not sh -c argv
The previous fix (direct ssh argv, bypassing transport.runProcess) got
us from 0/14 to 7/14, but $H was empty everywhere it was referenced.
The user's 7/14 report showed:
- probe 4 (hermesHomeConfigured): PASS with empty detail
- probe 5 (hermesDirExists FAIL): "not a directory:" (empty after colon)
- probe 11 (sqlite3CanOpenStateDB FAIL): 'unable to open "/state.db"'

Root cause: `ssh host -- /bin/sh -c <script>` doesn't travel as three
argv entries to the remote. ssh concatenates them with single spaces
into one command string and sends that to the remote's LOGIN shell.
The login shell then runs `$LOGIN_SHELL -c "$string"`, and bash's
parser treats unquoted newlines inside `$string` as command separators.
So the first newline splits the script: `/bin/sh -c H="..."` becomes
one command (which runs in an ephemeral sh subprocess that exits
immediately), and every subsequent line runs in the login shell with
no $H set.

TestConnectionProbe happens to still work because its downstream lines
don't depend on an assignment from the first line — but the diagnostic
script's $H is used everywhere, so the entire script is effectively
running with $H="".

Fix: pipe the script into `/bin/sh -s` on stdin via ssh's own stdin
channel. `sh -s` reads a shell program from stdin and executes it in
one process, variable scope preserved. Implementation uses
Process.standardInput with a Pipe, writing the script after proc.run()
and closing the write end so sh sees EOF. Same as
`cat script.sh | ssh host -- /bin/sh -s` from the command line.

Also: raw-output disclosure panel in the diagnostics sheet now shows
whenever ANY probe fails, not only when all fail. Partial failures are
the most common failure mode and the raw stdout is the only way to see
why a specific detail came back the way it did.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:04:32 -07:00
Alan Wizemann ec03627bcd fix: diagnostics sheet — bypass transport.runProcess for shell script
First-run of diagnostics against a working Mardon returned 0/14 passing
with "(no output)" for every probe — including the trivial "emit
connectivity PASS" that the script emits unconditionally. That meant the
script wasn't executing as written; the parser saw `__END__` but no
probe lines.

Root cause: SSHTransport.runProcess wraps every argument through
`remotePathArg`, which is designed for PATHS (it rewrites `~/` to
`$HOME/` and double-quotes the result with backslash-escapes). Passing
a multi-line shell script with embedded `"$1"` / `"$2"` / `"$3"` and
`printf '\n'` escape sequences through it corrupts the script: the remote
sh -c receives a scrambled script and silently emits nothing.

TestConnectionProbe already works around this: it builds the ssh argv
directly (ssh host -- /bin/sh -c <script>) so the script travels as a
single opaque argv entry and ssh forwards it to the remote shell
unchanged.

Mirror that approach. RemoteDiagnosticsViewModel.execute now:
- For remote contexts: builds ssh argv directly (ControlMaster-aware,
  uses the same socket as SSHTransport so it's effectively free after
  the first connection), then passes /bin/sh -c <script> as argv.
- For local contexts: spawns /bin/sh -c <script> via Process directly.

Also surfaces raw stdout/stderr/exit-code in a disclosure panel at the
bottom of the sheet, visible only when ALL probes fail. Makes any
future transport-level breakage self-diagnosing: the user sees exactly
what the remote returned, not just "(no output)" rows.

Expose SSHTransport.controlDirPath (already static) as a public helper
so the diagnostics probe reuses the same ControlMaster socket as the
connection itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:56:09 -07:00
Alan Wizemann f8069a4481 fix: don't log 'No such file' as warnings on remote reads
The Result-returning readers I added for the v2.0.1 diagnostics surface
were logging EVERY failure, including routine "file doesn't exist" cases
— e.g. skill.yaml files under ~/.hermes/skills/*/ that are optional
metadata, gateway_state.json before Hermes has started, memories/USER.md
on fresh installs.

In practice this meant the Platforms view and similar feature loaders
that walk directories and read optional files now spam the Console with
warnings on every refresh. That's noisier than useful and actively hides
the signal (permission denied, connection failure, sqlite3 missing) we
added the logging to surface.

readFileDataResult now detects the "no such file" case via any of:
- TransportError.fileIO(_, "No such file...") from SSHTransport
- NSCocoaErrorDomain code 260 (NSFileReadNoSuchFileError) from FileManager
- NSPOSIXErrorDomain code 2 (ENOENT)

and suppresses the warning log for those paths. The Result.failure is
still returned, so any caller that cares (Dashboard's banner, Remote
Diagnostics) can still distinguish missing from present-but-unreadable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:52:28 -07:00
Alan Wizemann 110170d6e9 fix: v2.0.1 — surface remote SSH file-access errors (closes #19)
Three users reported on day one of v2.0 that SSH connections showed a
green "Connected" pill but every data view read as empty / "not running"
/ "not configured". The common thread across Docker, homelab VM, and
Ubuntu VPS setups: file-access failures on the remote that Scarf
silently swallowed into nil/empty defaults.

Stop swallowing errors
- HermesFileService gains Result-returning variants for the four
  dashboard-critical readers: loadConfigResult, loadGatewayStateResult,
  hermesPIDResult, plus readFileResult / readFileDataResult as
  primitives. Each logs os.Logger warnings on failure. Legacy nil-
  returning signatures remain as thin forwarders.
- HermesDataService.open records lastOpenError with humanized hints
  for the top three failure modes — sqlite3 not installed, permission
  denied, file not found. Each maps to concrete remediation (`apt
  install sqlite3`, "check file perms", "set Hermes data directory").

Dashboard surfaces the error
- DashboardViewModel collects errors from every loader into
  lastReadError, only on remote contexts (local skips the banner).
- DashboardView renders an orange banner above the stats with the
  specific error text, a copy-selectable detail, and a "Run
  Diagnostics…" button.

New Remote Diagnostics sheet (stethoscope icon)
- RemoteDiagnosticsViewModel runs 14 checks in one SSH round-trip via
  a pipe-delimited "KEY|STATUS|DETAIL" protocol. Covers: SSH
  connectivity, remote user/$HOME, Hermes dir existence + readability,
  config.yaml readability + actual read (distinct from just `test -e`
  which can't detect permission issues), state.db readability, sqlite3
  binary presence, sqlite3 open test, hermes binary on non-login AND
  login PATH, pgrep availability.
- Each probe row shows a targeted hint on fail (e.g. "check perms on
  ~/.hermes", "apt install sqlite3", "move PATH export from .bashrc
  to .zshenv"). A Copy Full Report button dumps plain-text output
  for GitHub issues.
- Accessible from Manage Servers (stethoscope button per row) and
  directly from the yellow pill.

Yellow "degraded" connection state
- ConnectionStatusViewModel.Status gains .degraded(reason:) between
  .connected and .error. After tier-1 `true` passes, the probe runs
  tier-2 `test -r $HOME/.hermes/config.yaml` in the same SSH round-
  trip. On tier-2 fail, pill is orange with "Connected — can't read
  Hermes state" tooltip.
- Clicking a degraded pill opens Remote Diagnostics directly. Exactly
  the symptom in #19 is now one click from a specific answer.

Auto-suggest remoteHome for non-default installs
- TestConnectionProbe.TestResult.success gains suggestedRemoteHome:
  String?. When state.db isn't found at the configured path, the
  probe also checks /var/lib/hermes/.hermes, /opt/hermes/.hermes,
  /home/hermes/.hermes, /root/.hermes — the common alternates for
  systemd services, Docker containers, and single-user VPSes — and
  surfaces the first hit as a "Use this" suggestion in Add Server.
- AddServerSheet relabels "Remote ~/.hermes override" to "Hermes data
  directory" with an explanation of when you'd use it.

README
- New "Remote setup requirements" subsection lists the four concrete
  prereqs (SSH, sqlite3, pgrep, read access to ~/.hermes).
- New "Troubleshooting remote connections" paragraph describes the
  diagnostics sheet and remoteHome auto-suggest for the two most
  common failure modes.

Releases
- releases/v2.0.1/RELEASE_NOTES.md for the GitHub release body.
- Ship via `./scripts/release.sh 2.0.1`.

Closes #19.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:40:35 -07:00
Alex Maksimchuk 1293cfa23b fix: use short ControlPath to avoid Unix socket limit on macOS
The ControlMaster socket path ~/Library/Caches/scarf/ssh/%C can
exceed the 104-byte macOS Unix domain socket limit when the
username is long, causing ssh to silently exit 255 with
"unix_listener: path too long for Unix domain socket".

Switch to /tmp/scarf-ssh-<uid> which stays well within the limit.
2026-04-19 23:03:28 -05:00
24 changed files with 1685 additions and 88 deletions
+4
@@ -1,6 +1,7 @@
# Xcode
build/
.gh-pages-worktree/
.wiki-worktree/
DerivedData/
*.pbxuser
!default.pbxuser
@@ -52,3 +53,6 @@ scarf/standards/backups/
# history. RELEASE_NOTES.md stays tracked (committed with the version bump).
releases/v*/*.zip
releases/v*/appcast-entry.xml
# Wiki helper: personal patterns (hostnames, IPs) blocked from the wiki push.
scripts/wiki-blocklist.txt
+23
@@ -59,6 +59,29 @@ The script bumps version, archives Universal (arm64 + x86_64) + ARM64-only varia
**Prerequisites (one-time, already set up on Alan's machine):** Developer ID Application cert in login Keychain (team `3Q6X2L86C4`), notarytool keychain profile `scarf-notary`, Sparkle EdDSA private key in Keychain item `https://sparkle-project.org`, `gh-pages` branch + GitHub Pages enabled. See the header of [scripts/release.sh](scripts/release.sh) and the Releases section in [README.md](README.md) for details.
## Wiki
Public documentation lives in the GitHub wiki at https://github.com/awizemann/scarf/wiki. The wiki is a separate git repo cloned to `.wiki-worktree/` in the repo root (gitignored, sibling to `.gh-pages-worktree/`). Internal dev notes stay in `scarf/docs/`; the wiki is for public-facing reference.
**Update the wiki when:**
- A new feature module is added under `scarf/scarf/scarf/Features/` → extend the relevant User Guide page.
- A new core service is added under `Core/Services/` → extend `Core-Services.md`.
- Architecture changes (AppCoordinator, transport, MVVM-F rule, sandbox) → `Architecture-Overview.md` + the specific sub-page.
- Hermes version bumps in this file → `Hermes-Version-Compatibility.md`.
- `scripts/release.sh` completes a full (non-draft) release → bump latest-version on `Home.md` + append to `Release-Notes-Index.md`.
- Keyboard shortcut or sidebar section changes → `Keyboard-Shortcuts.md` / `Sidebar-and-Navigation.md`.
**Skip for:** bug fixes with no user-observable change, pure refactors, typos, test-only changes, internal cleanups.
```bash
./scripts/wiki.sh pull # always first
# edit .wiki-worktree/*.md with normal tools
./scripts/wiki.sh commit "docs: describe X" # runs secret-scan
./scripts/wiki.sh push # runs secret-scan again, then push
```
**Never** commit API keys, tokens, `.env` files, private keys, or real hostnames/IPs to the wiki. The script's two-pass secret-scan blocks common token patterns and a user-maintained blocklist at `scripts/wiki-blocklist.txt` (gitignored). Do not bypass without explicit approval. Full workflow on the wiki itself at `.wiki-worktree/Wiki-Maintenance.md`.
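
To illustrate the two-pass idea (hard patterns block, soft patterns only warn), here is a rough sketch in Swift. It is not the logic of `scripts/wiki.sh`: the pattern lists, the `ScanVerdict` type, and the `scan` function are all hypothetical stand-ins.

```swift
import Foundation

enum ScanVerdict: Equatable {
    case clean
    case warn([String])    // soft hits: would require --force-terms
    case block([String])   // hard hits: abort with non-zero exit
}

// Hypothetical patterns; the real script's lists are not shown in this document.
let hardPatterns = ["-----BEGIN [A-Z ]*PRIVATE KEY-----", "ghp_[A-Za-z0-9]{36}"]
let softPatterns = ["(?i)\\b(api_key|password|token)\\s*="]

func scan(_ text: String, blocklist: [String] = []) -> ScanVerdict {
    func hits(_ patterns: [String]) -> [String] {
        patterns.filter { text.range(of: $0, options: .regularExpression) != nil }
    }
    // Pass 1: token regexes, private-key headers, and user blocklist terms.
    let hard = hits(hardPatterns) + blocklist.filter { !$0.isEmpty && text.contains($0) }
    if !hard.isEmpty { return .block(hard) }
    // Pass 2: assignment-style patterns that merely look suspicious.
    let soft = hits(softPatterns)
    return soft.isEmpty ? .clean : .warn(soft)
}
```

Checking the hard pass first means a page that trips both lists is always reported as blocked, never downgraded to a warning.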
## Hermes Version
Targets Hermes v0.9.0 (v2026.4.13). Log lines may carry an optional `[session_id]` tag between the level and logger name — `HermesLogService.parseLine` treats the session tag as an optional capture group, so older untagged lines still parse.
+4
@@ -33,6 +33,10 @@ Rules:
- The app only reads from `~/.hermes/state.db` (never writes). Memory files are the exception.
- Swift 6 strict concurrency: `@MainActor` default isolation, `nonisolated` for service methods.
## Documentation
Public docs live in the [GitHub wiki](https://github.com/awizemann/scarf/wiki). Small fixes (typos, clarifications) can be made via the "Edit" button on any wiki page — you need push access to the main repo. For larger changes, clone the wiki locally (`git clone git@github.com:awizemann/scarf.wiki.git`) or open an issue describing the proposed change.
## Reporting Issues
Open an issue with:
+15
@@ -42,6 +42,21 @@ Scarf 2.0 is a multi-window app. Each window is bound to exactly one Hermes serv
Remote Hermes is reached over system SSH — the same `~/.ssh/config`, ssh-agent, ProxyJump, and ControlMaster pooling your terminal uses. File I/O flows through `scp`/`sftp`; SQLite is served from atomic `sqlite3 .backup` snapshots cached under `~/Library/Caches/scarf/snapshots/<server-id>/`; chat (ACP) tunnels as `ssh -T host -- hermes acp` with JSON-RPC over stdio end-to-end. Everything in the feature list below works against remote identically to local.
### Remote setup requirements
The remote host must have:
1. **SSH access** — key-based auth via your local ssh-agent. Scarf never prompts for passphrases; run `ssh-add` once in Terminal before connecting.
2. **`sqlite3`** on the remote `$PATH` — needed for the atomic DB snapshots. Install on the remote with `apt install sqlite3` (Ubuntu/Debian), `yum install sqlite` (RHEL/Fedora), or `apk add sqlite` (Alpine).
3. **`pgrep`** on the remote `$PATH` — used by the Dashboard "is Hermes running" check. Standard on every distro; install `procps` if missing.
4. **`~/.hermes/` readable by the SSH user**. When Hermes runs as a separate user (systemd service, Docker container), the SSH user needs read access to `config.yaml` and `state.db`. Either (a) SSH as the Hermes user, (b) `chmod` Hermes's home to be group-readable and add your SSH user to that group, or (c) set the **Hermes data directory** field when adding the server to point at the right location (e.g. `/var/lib/hermes/.hermes`).
### Troubleshooting remote connections
If the connection pill is green but the Dashboard shows "Stopped", "unknown", or empty values, the SSH user can't read the Hermes state files. Open **Manage Servers → 🩺 Run Diagnostics** (or click the yellow "Can't read Hermes state" pill in the toolbar). The diagnostics sheet runs fourteen checks in one SSH session — connectivity, `sqlite3` presence, read access to `config.yaml` and `state.db`, the effective non-login `$PATH` — and tells you exactly which one fails and why, with remediation hints for each. Use the **Copy Full Report** button to paste the full output into a bug report.
For the common "Hermes isn't at the default path" case (systemd services, Docker), **Test Connection** in the Add Server sheet now probes `/var/lib/hermes/.hermes`, `/opt/hermes/.hermes`, `/home/hermes/.hermes`, and `/root/.hermes` when it can't find `state.db` at `~/.hermes/`, and offers a one-click fill if it finds any of them.
## Features
Scarf mirrors Hermes's surface area through a sidebar-based UI. Sections below map 1:1 to the app's sidebar.
+68
@@ -0,0 +1,68 @@
## What's New in 2.0.1
Hotfix for [#19](https://github.com/awizemann/scarf/issues/19) and the related reports from the first day of v2.0: users' remote SSH connections would show a green "Connected" pill but every view (Dashboard, Sessions, Activity, Chat) read as empty / "not running" / "not configured". Three distinct environments reported it — Docker Hermes on a LAN, homelab VM over Tailscale, Ubuntu VPS — and every one was a silent file-access failure on the remote that Scarf wasn't surfacing.
### Errors no longer disappear
Every remote read (`config.yaml`, `gateway_state.json`, `state.db`, `pgrep`) used to silently substitute an empty value on *any* failure. Permission denied, missing file, `sqlite3` not installed, connection drop: they all looked identical to the UI. Now:
- Each failure logs a specific warning via `os.Logger` (visible in Console.app under subsystem `com.scarf`).
- The Dashboard shows an orange banner above the stats with the exact error (e.g. "Permission denied reading `~/.hermes/state.db`") and a **Run Diagnostics…** button.
- `HermesDataService` exposes a `lastOpenError` so views can explain *why* state.db couldn't be opened, rather than just rendering zeros.
- Routine "file doesn't exist" cases (optional `skill.yaml` metadata, `gateway_state.json` before Hermes starts, `memories/USER.md` on fresh installs) are detected and **not** logged as warnings — only real errors (permission denied, connection drops, `sqlite3` missing) hit the log. Prevents Console from filling with false-positive noise when directory walks encounter optional files.
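
The behaviour in the list above (keep the error, but stay quiet about missing optional files) can be sketched roughly as follows. This is a standalone approximation, not the `HermesFileService` code: the free functions, the `log` closure, and the local-file read are assumptions, and the SSH-side `TransportError` case is left out.

```swift
import Foundation

/// True when `error` means "the file isn't there" rather than a real fault.
func isNoSuchFile(_ error: Error) -> Bool {
    let ns = error as NSError
    // NSFileReadNoSuchFileError (260) from Foundation file reads.
    if ns.domain == NSCocoaErrorDomain && ns.code == CocoaError.Code.fileReadNoSuchFile.rawValue {
        return true
    }
    // ENOENT (2) surfaced as a POSIX error.
    if ns.domain == NSPOSIXErrorDomain && ns.code == Int(ENOENT) { return true }
    if let underlying = ns.userInfo[NSUnderlyingErrorKey] as? NSError {
        return isNoSuchFile(underlying)
    }
    return false
}

/// Hypothetical local stand-in for a Result-returning reader.
func readFileDataResult(atPath path: String,
                        log: (String) -> Void = { print("warning: \($0)") }) -> Result<Data, Error> {
    do {
        return .success(try Data(contentsOf: URL(fileURLWithPath: path)))
    } catch {
        // Missing files are routine; only real faults reach the log.
        if !isNoSuchFile(error) { log("read failed for \(path): \(error)") }
        return .failure(error)   // callers still see the failure either way
    }
}
```

Because the failure is always returned, a caller such as a banner or diagnostics view can still tell "missing" from "present but unreadable" by inspecting the error.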
### New Remote Diagnostics sheet
Accessible from the per-server **Manage Servers → 🩺** button, or by clicking the orange connection pill when Scarf can see the server but can't read Hermes state. Runs fourteen checks in a single SSH session and shows pass/fail for each, plus a targeted hint per failure:
- SSH connectivity and auth
- Remote user identity and `$HOME` resolution
- `~/.hermes` directory existence and readability
- `config.yaml` readable (existence *and* actual read access — the old probe only checked existence)
- `state.db` readable
- `sqlite3` installed on the remote (required for the atomic snapshot Scarf pulls)
- `sqlite3` can actually open `state.db`
- `hermes` binary on the non-login `$PATH` (what the runtime uses)
- `hermes` binary on the login `$PATH` (what the Test Connection probe uses)
- `pgrep` available (for the "is Hermes running" check)
One **Copy Full Report** button dumps every check as plain text for bug reports, and a raw-output disclosure panel shows the exact stdout/stderr the remote returned whenever any probe fails — so transport-level problems are self-diagnosing.
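
The v2.0.1 commit describes the probes reporting over a pipe-delimited `KEY|STATUS|DETAIL` protocol, with an `__END__` marker in the output. A parser for that shape might look like the sketch below; the `ProbeResult` type and the treatment of malformed lines are assumptions, not the app's code.

```swift
struct ProbeResult: Equatable {
    let key: String
    let passed: Bool
    let detail: String
}

/// Parses "KEY|STATUS|DETAIL" lines up to an "__END__" sentinel.
/// Lines without three fields are skipped (an assumption for this sketch).
func parseProbeOutput(_ stdout: String) -> [ProbeResult] {
    var results: [ProbeResult] = []
    for line in stdout.split(separator: "\n") {
        if line == "__END__" { break }
        // maxSplits: 2 keeps any further "|" characters inside DETAIL.
        let fields = line.split(separator: "|", maxSplits: 2, omittingEmptySubsequences: false)
        guard fields.count == 3 else { continue }
        results.append(ProbeResult(key: String(fields[0]),
                                   passed: fields[1] == "PASS",
                                   detail: String(fields[2])))
    }
    return results
}
```

Keeping empty fields matters here: a probe that passes with no detail still produces three fields, so it is not mistaken for a malformed line.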
The diagnostics script is piped to `/bin/sh -s` on stdin rather than passed as `sh -c <script>` argv. The latter was getting split line-by-line by the remote's login shell (newlines parsed as command separators), which stranded variables set on line 1 in an ephemeral `sh` subprocess that exited before line 2 could use them. Stdin-piping runs the whole script in one `sh` process with variable scope preserved.
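
A small sketch of the stdin-pipe technique, assuming a local `/bin/sh` so it can run anywhere. For the remote case the executable would be ssh and the arguments something like `[host, "--", "/bin/sh", "-s"]`; the function name and return shape are illustrative only.

```swift
import Foundation

/// Runs `script` in a single shell process by feeding it on stdin (`sh -s`).
/// Suitable for short scripts with modest output; larger I/O would need
/// asynchronous reads to avoid filling a pipe buffer.
func runScriptViaStdin(_ script: String,
                       executable: String = "/bin/sh",
                       arguments: [String] = ["-s"]) throws -> (status: Int32, stdout: String) {
    let proc = Process()
    proc.executableURL = URL(fileURLWithPath: executable)
    proc.arguments = arguments

    let stdinPipe = Pipe()
    let stdoutPipe = Pipe()
    proc.standardInput = stdinPipe
    proc.standardOutput = stdoutPipe

    try proc.run()
    // Write the whole script, then close the write end so sh sees EOF.
    stdinPipe.fileHandleForWriting.write(Data(script.utf8))
    stdinPipe.fileHandleForWriting.closeFile()

    let output = stdoutPipe.fileHandleForReading.readDataToEndOfFile()
    proc.waitUntilExit()
    return (proc.terminationStatus, String(decoding: output, as: UTF8.self))
}
```

Since the script never passes through an argv string, a variable assigned on the first line is still in scope on the last one.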
### Connection pill gains a "degraded" state
The pill used to be green as long as SSH connected; now after connectivity passes it runs a second-tier check (`test -r $HOME/.hermes/config.yaml`). If that fails, the pill turns **orange** with "Connected — can't read Hermes state" and clicking it opens Remote Diagnostics directly. This is the exact symptom mode in #19, and it's now one click away from a specific answer.
The pill's visual also got a pass: the colored dot is replaced with a state-specific SF Symbol (`checkmark.circle.fill` / `stethoscope` / `arrow.triangle.2.circlepath` / `exclamationmark.triangle.fill`), which reads more like a clickable toolbar tool and doubles as the status signal. No custom pill background anymore — the toolbar's native `.principal` bezel is the frame.
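
As a rough model of the states and symbols described here, the sketch below uses a hypothetical `ConnectionStatus` enum and a `status(sshReachable:configReadable:)` helper; the real `ConnectionStatusViewModel.Status` may be shaped differently.

```swift
enum ConnectionStatus: Equatable {
    case idle
    case connected
    case degraded(reason: String)
    case error(String)

    /// SF Symbol name per state, as listed in the pill commit message.
    var symbolName: String {
        switch self {
        case .connected: return "checkmark.circle.fill"
        case .degraded:  return "stethoscope"
        case .idle:      return "arrow.triangle.2.circlepath"
        case .error:     return "exclamationmark.triangle.fill"
        }
    }
}

/// Maps the two probe tiers onto a status: tier 1 is plain SSH reachability,
/// tier 2 is whether config.yaml is readable.
func status(sshReachable: Bool, configReadable: Bool, stderr: String = "") -> ConnectionStatus {
    guard sshReachable else { return .error(stderr) }
    return configReadable
        ? .connected
        : .degraded(reason: "Connected, can't read Hermes state")
}
```

Carrying the reason as an associated value lets the tooltip and the diagnostics entry point share one source of text.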
### Auto-suggest the correct `remoteHome` during Add Server
When Test Connection can't find `state.db` at the configured (or default) path, it now also probes the common alternate locations — `/var/lib/hermes/.hermes`, `/opt/hermes/.hermes`, `/home/hermes/.hermes`, `/root/.hermes` — and offers a one-click "Use this" fill if it finds one. Removes the need to know that systemd-installed Hermes lives at `/var/lib/hermes/.hermes` by convention.
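
The fallback order can be sketched as a pure function. The `exists` closure stands in for whatever remote check the probe performs (an assumption), and `suggestedRemoteHome(configured:candidates:exists:)` is a hypothetical name.

```swift
// Alternate locations listed in the release notes.
let alternateHermesHomes = [
    "/var/lib/hermes/.hermes",
    "/opt/hermes/.hermes",
    "/home/hermes/.hermes",
    "/root/.hermes",
]

/// Returns the first alternate home containing state.db, or nil when the
/// configured home already has one (no suggestion needed) or nothing matches.
func suggestedRemoteHome(configured: String,
                         candidates: [String] = alternateHermesHomes,
                         exists: (String) -> Bool) -> String? {
    if exists(configured + "/state.db") { return nil }
    return candidates.first { exists($0 + "/state.db") }
}
```

Injecting `exists` keeps the ordering logic testable without a remote host.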
### Clearer copy for the `remoteHome` field
The Add Server sheet field is now labeled "Hermes data directory" with a description explaining when you'd override it (systemd service installs, Docker sidecars) and noting that Test Connection auto-suggests.
### README has a new "Remote setup requirements" section
Four concrete prerequisites (SSH, `sqlite3`, `pgrep`, read access to `~/.hermes`) and a troubleshooting paragraph pointing at Remote Diagnostics.
### Migrating from 2.0.0
Sparkle will offer the update automatically. Settings and server list are preserved verbatim — this is purely additive (new diagnostics surface, new error banners, auto-suggest in Test Connection). If you were affected by #19, run Remote Diagnostics after updating; the sheet should pinpoint the specific file access issue and suggest a fix.
### Under the hood
- New types: `RemoteDiagnosticsViewModel`, `RemoteDiagnosticsView`. Both are local to Scarf; no new transport protocol.
- `HermesFileService` gains `loadConfigResult()`, `loadGatewayStateResult()`, `hermesPIDResult()`, `readFileResult()`, `readFileDataResult()` — Result-returning variants that preserve the error. Legacy `loadConfig()` etc. still exist as thin forwarders for callers that don't need diagnostics.
- `HermesDataService.open()` records `lastOpenError` with humanized hints for "sqlite3 not installed", "permission denied", and "file not found" — the three failure modes that produce 90% of issue #19 symptoms.
- `ConnectionStatusViewModel` status enum gains `.degraded(reason:)` between `.connected` and `.error`.
- `TestConnectionProbe` result enum gains `suggestedRemoteHome: String?` carrying any alternate-location hit.
### Known follow-ups (not in 2.0.1)
- `TestConnectionProbe` uses a direct-argv ssh invocation that's functionally correct but fragile (works by accident when split across the login shell). Should be ported to the stdin-pipe pattern the diagnostics sheet now uses.
- Remaining `try?`-swallowed read paths beyond the four Dashboard-surfacing ones — Cron, Memory, Skills, MCP Servers, Platforms still silently render empty on read errors. Same fix pattern applies, low priority.
- `hermesBinaryHint` is only populated when the user clicks Test Connection; if they skip it, ACP chat and CLI calls fall back to bare `hermes` which requires it on the non-interactive PATH (rarely true for `~/.local/bin` installs). The connection-pill's second-tier probe could auto-populate this.
- Docker-host support: when users SSH to a Docker host, `pgrep` and `~/.hermes/` on the host don't see what's inside the container. Needs a `docker exec` wrapping option per server.
+41
@@ -0,0 +1,41 @@
## What's New in 2.0.2
The actual root cause of [#19](https://github.com/awizemann/scarf/issues/19), found and patched by Scarf's first external contributor. v2.0.1 added the diagnostics UI on the assumption that file permissions were the root cause; v2.0.2 fixes the underlying bug for everyone, regardless of permissions.
### macOS Unix domain socket path limit (the real #19)
OpenSSH's ControlMaster multiplexes our bursty stat/cat/cp traffic over one TCP session per host. The master listens on a Unix domain socket that ssh creates with `bind(2)`, and macOS caps `sun_path` at **104 bytes including the NUL terminator**.
Scarf's old socket path was `~/Library/Caches/scarf/ssh/<%C>` where `%C` is OpenSSH's 64-char SHA1 hash of `(local hostname, remote host, port, remote user)`. For a username like `alex.maksimchuk`, the full path landed at **105 bytes**, one byte over the limit. ssh exited 255 with `unix_listener: path "..." too long for Unix domain socket`. Our `LogLevel=QUIET` flag (set so ACP's line-delimited JSON stays binary-clean) suppressed the diagnostic, so the user just saw "Remote command exited 255", which the UI rendered as the silent empty-data state every reporter in #19 described.
The fix is to use a much shorter path:
```swift
"/tmp/scarf-ssh-\(getuid())" // ~17 bytes + 64 hash + sep + NUL = ~83 bytes
```
Per-user uid suffix keeps two local users' sockets from colliding in the shared `/tmp`, and 0700 perms on the dir keep them inaccessible to other users.
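
The length budget can be checked with a few lines of arithmetic. The sketch below takes the 64-character hash length from these notes as given; `controlSocketPathBytes`, `fitsMacOSSocketLimit`, and `shortControlDir` are hypothetical helpers, not Scarf's API.

```swift
import Foundation

/// macOS limit for sockaddr_un.sun_path, including the trailing NUL.
let macOSSunPathLimit = 104
/// Hash length assumed by the release notes for the expanded %C token.
let assumedHashLength = 64

/// Bytes needed for "<dir>/<hash>" plus the NUL terminator.
func controlSocketPathBytes(dir: String, hashLength: Int = assumedHashLength) -> Int {
    dir.utf8.count + 1 + hashLength + 1
}

func fitsMacOSSocketLimit(dir: String) -> Bool {
    controlSocketPathBytes(dir: dir) <= macOSSunPathLimit
}

/// Hypothetical stand-in for the per-user control directory.
func shortControlDir(uid: UInt32 = getuid()) -> String {
    "/tmp/scarf-ssh-\(uid)"
}
```

Counting UTF-8 bytes rather than characters matters because the kernel limit is in bytes, and a home directory can contain non-ASCII names.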
**Massive thanks to Alex Maksimchuk ([@aliatx2017](https://github.com/aliatx2017)) — Scarf's first external PR contributor — for diagnosing and patching this in [#20](https://github.com/awizemann/scarf/pull/20).** That diagnosis only happened because Alex bothered to read the codebase, reproduce against multiple usernames including a Termux/Android instance, and walk back from the cryptic exit code to the actual `bind()` failure. This release wouldn't exist without that work.
### Hardening on top of the fix
Three additions on top of Alex's patch, layered in via separate commits to keep the original change reviewable:
- **Defensive ownership check on the socket dir.** `/tmp` is world-writable, so a malicious local user could pre-create `/tmp/scarf-ssh-<uid>` and trick Scarf into using a hostile directory (we'd silently fail to chmod it back to 0700, since we wouldn't own it). `ensureControlDir` now uses POSIX `mkdir(0700)` (atomic, sets perms at create time) and on `EEXIST` runs `lstat` to verify the entry is a directory we own with mode 0700 — symlink → refuse, wrong owner → refuse + log to `os.Logger`, wrong mode → repair. Closes the `/tmp` pre-creation hole that's the standard concern for any per-user `/tmp` path.
- **Launch-time sweep of stale sockets.** `ServerRegistry.sweepOrphanCaches` already prunes orphaned snapshot directories on launch; it now also removes ControlMaster socket files older than 30 minutes. Socket basenames are `%C` hashes (not ServerIDs), so we can't keep "still registered" sockets the way the snapshot sweep does — but `ControlPersist` is 600s, so anything older than 30 minutes is guaranteed to be a dead orphan from a crashed master, an unclean app exit, or a server removed while another Scarf instance was holding the dir. Keeps `/tmp/scarf-ssh-<uid>/` from accumulating indefinitely until reboot, while leaving any concurrent Scarf instance's live sockets untouched.
- **Regression tests for the socket-path invariants.** `scarfTests` was a stub; it now has two tests. One asserts `controlDirPath().utf8.count + 1 + 64 + 1 ≤ 104` (would have caught the original #19 bug in CI); the other asserts the path includes the current uid (pins the per-user-isolation invariant against a future "simplification" that drops it).
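The two invariants in the last bullet can be sketched as plain predicates. This is an illustration under stated assumptions, not the test file from the commit: `controlDirPath()` here is a local stand-in for `SSHTransport.controlDirPath()`, and the two function names simply echo the test names mentioned in the commit message.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Local stand-in for `SSHTransport.controlDirPath()` as described in these notes.
func controlDirPath() -> String {
    "/tmp/scarf-ssh-\(getuid())"
}

/// Invariant 1: dir + "/" + 64-char hash + NUL must fit in 104 bytes.
func controlDirPathFitsSocketLimit() -> Bool {
    controlDirPath().utf8.count + 1 + 64 + 1 <= 104
}

/// Invariant 2: the path must embed the current uid, so that two local
/// users never share a directory under the world-writable /tmp.
func controlDirPathIsPerUser() -> Bool {
    controlDirPath().contains(String(getuid()))
}

print(controlDirPathFitsSocketLimit(), controlDirPathIsPerUser())
```

In a real suite these would be assertions inside the project's test framework; plain functions are used here so the sketch runs on its own.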
### v2.0.1 diagnostics work is still useful
The diagnostics sheet, orange "degraded" pill, dashboard error banner, and `remoteHome` auto-suggest from v2.0.1 all still ship — they just turn out not to have been the right diagnosis for the original three reporters. They remain valuable for the *other* connection-failure modes they were designed to surface (missing `sqlite3` on the remote, real permission errors, container/host visibility gaps, custom Hermes data directories). If you upgrade to v2.0.2 and *still* see incomplete data, run Remote Diagnostics from **Manage Servers → 🩺** and the sheet will tell you why.
### Migrating from 2.0.0 / 2.0.1 / draft 2.0.1
Sparkle will offer the update automatically. Settings and server list are preserved verbatim. The first time v2.0.2 connects to a remote, it'll create `/tmp/scarf-ssh-<uid>/` with mode 0700; the old `~/Library/Caches/scarf/ssh/` directory becomes unused (you can delete it manually, or leave it — macOS will sweep it eventually).
The previous v2.0.1 draft download remains available for anyone who already grabbed it — it's still a valid build with the diagnostics work. v2.0.2 is the recommended upgrade path.
### Reporters of #19
@cmalpass, @flyespresso, @maikokan — please grab v2.0.2 and confirm the dashboard populates without needing to run Remote Diagnostics first. If it still doesn't, the diagnostics sheet should now have a much better chance of pinpointing what's left.
+12 -12
@@ -424,7 +424,7 @@
CODE_SIGN_ENTITLEMENTS = scarf/scarf.entitlements;
CODE_SIGN_STYLE = Automatic;
COMBINE_HIDPI_IMAGES = YES;
CURRENT_PROJECT_VERSION = 19;
CURRENT_PROJECT_VERSION = 21;
DEVELOPMENT_TEAM = 3Q6X2L86C4;
ENABLE_APP_SANDBOX = NO;
ENABLE_HARDENED_RUNTIME = YES;
@@ -436,7 +436,7 @@
"@executable_path/../Frameworks",
);
MACOSX_DEPLOYMENT_TARGET = 14.6;
MARKETING_VERSION = 2.0.0;
MARKETING_VERSION = 2.0.2;
PRODUCT_BUNDLE_IDENTIFIER = com.scarf.app;
PRODUCT_NAME = "$(TARGET_NAME)";
REGISTER_APP_GROUPS = YES;
@@ -458,7 +458,7 @@
CODE_SIGN_ENTITLEMENTS = scarf/scarf.entitlements;
CODE_SIGN_STYLE = Automatic;
COMBINE_HIDPI_IMAGES = YES;
CURRENT_PROJECT_VERSION = 19;
CURRENT_PROJECT_VERSION = 21;
DEVELOPMENT_TEAM = 3Q6X2L86C4;
ENABLE_APP_SANDBOX = NO;
ENABLE_HARDENED_RUNTIME = YES;
@@ -470,7 +470,7 @@
"@executable_path/../Frameworks",
);
MACOSX_DEPLOYMENT_TARGET = 14.6;
MARKETING_VERSION = 2.0.0;
MARKETING_VERSION = 2.0.2;
PRODUCT_BUNDLE_IDENTIFIER = com.scarf.app;
PRODUCT_NAME = "$(TARGET_NAME)";
REGISTER_APP_GROUPS = YES;
@@ -488,11 +488,11 @@
buildSettings = {
BUNDLE_LOADER = "$(TEST_HOST)";
CODE_SIGN_STYLE = Automatic;
CURRENT_PROJECT_VERSION = 19;
CURRENT_PROJECT_VERSION = 21;
DEVELOPMENT_TEAM = 3Q6X2L86C4;
GENERATE_INFOPLIST_FILE = YES;
MACOSX_DEPLOYMENT_TARGET = 26.2;
MARKETING_VERSION = 2.0.0;
MARKETING_VERSION = 2.0.2;
PRODUCT_BUNDLE_IDENTIFIER = com.scarfTests;
PRODUCT_NAME = "$(TARGET_NAME)";
STRING_CATALOG_GENERATE_SYMBOLS = NO;
@@ -509,11 +509,11 @@
buildSettings = {
BUNDLE_LOADER = "$(TEST_HOST)";
CODE_SIGN_STYLE = Automatic;
CURRENT_PROJECT_VERSION = 19;
CURRENT_PROJECT_VERSION = 21;
DEVELOPMENT_TEAM = 3Q6X2L86C4;
GENERATE_INFOPLIST_FILE = YES;
MACOSX_DEPLOYMENT_TARGET = 26.2;
MARKETING_VERSION = 2.0.0;
MARKETING_VERSION = 2.0.2;
PRODUCT_BUNDLE_IDENTIFIER = com.scarfTests;
PRODUCT_NAME = "$(TARGET_NAME)";
STRING_CATALOG_GENERATE_SYMBOLS = NO;
@@ -529,10 +529,10 @@
isa = XCBuildConfiguration;
buildSettings = {
CODE_SIGN_STYLE = Automatic;
CURRENT_PROJECT_VERSION = 19;
CURRENT_PROJECT_VERSION = 21;
DEVELOPMENT_TEAM = 3Q6X2L86C4;
GENERATE_INFOPLIST_FILE = YES;
MARKETING_VERSION = 2.0.0;
MARKETING_VERSION = 2.0.2;
PRODUCT_BUNDLE_IDENTIFIER = com.scarfUITests;
PRODUCT_NAME = "$(TARGET_NAME)";
STRING_CATALOG_GENERATE_SYMBOLS = NO;
@@ -548,10 +548,10 @@
isa = XCBuildConfiguration;
buildSettings = {
CODE_SIGN_STYLE = Automatic;
CURRENT_PROJECT_VERSION = 19;
CURRENT_PROJECT_VERSION = 21;
DEVELOPMENT_TEAM = 3Q6X2L86C4;
GENERATE_INFOPLIST_FILE = YES;
MARKETING_VERSION = 2.0.0;
MARKETING_VERSION = 2.0.2;
PRODUCT_BUNDLE_IDENTIFIER = com.scarfUITests;
PRODUCT_NAME = "$(TARGET_NAME)";
STRING_CATALOG_GENERATE_SYMBOLS = NO;
+4
@@ -21,6 +21,10 @@ struct ContentView: View {
ServerSwitcherToolbar()
}
if serverContext.isRemote {
// `.principal` centers the pill in the toolbar.
// The native emphasis bezel is the intended frame;
// the pill's own visual content (icon + label, no
// background) sits inside it in balance.
ToolbarItem(placement: .principal) {
ConnectionStatusPill(status: connectionStatus)
}
@@ -129,6 +129,7 @@ final class ServerRegistry {
var keep: Set<ServerID> = [ServerContext.local.id]
for entry in entries { keep.insert(entry.id) }
SSHTransport.sweepOrphanSnapshots(keeping: keep)
SSHTransport.sweepStaleControlSockets()
}
// MARK: - Persistence
@@ -1,5 +1,6 @@
import Foundation
import SQLite3
import os
/// Dedupes concurrent `snapshotSQLite` calls for the same server. When the
/// file watcher ticks, Dashboard + Sessions + Activity (+ Chat's loadHistory)
@@ -29,11 +30,18 @@ actor SnapshotCoordinator {
}
actor HermesDataService {
private static let logger = Logger(subsystem: "com.scarf", category: "HermesDataService")
private var db: OpaquePointer?
private var hasV07Schema = false
/// Local filesystem path we last opened. For remote contexts this is
/// the cached snapshot under `~/Library/Caches/scarf/snapshots/<id>/`.
private var openedAtPath: String?
/// Last error from `open()` / `refresh()`, user-presentable. `nil` means
/// the last attempt succeeded. Views surface this when their own load
/// path fails, so the user sees "Permission denied reading state.db"
/// instead of an empty Dashboard with no explanation.
private(set) var lastOpenError: String?
let context: ServerContext
private let transport: any ServerTransport
@@ -52,16 +60,25 @@ actor HermesDataService {
// corrupt. Routed through SnapshotCoordinator so concurrent
// view models don't each spawn a parallel SSH backup for the
// same server.
let url = try? await SnapshotCoordinator.shared.snapshot(
remotePath: context.paths.stateDB,
contextID: context.id,
transport: transport
)
guard let url else { return false }
localPath = url.path
do {
let url = try await SnapshotCoordinator.shared.snapshot(
remotePath: context.paths.stateDB,
contextID: context.id,
transport: transport
)
localPath = url.path
lastOpenError = nil
} catch {
lastOpenError = humanize(error)
Self.logger.warning("snapshotSQLite failed: \(error.localizedDescription, privacy: .public)")
return false
}
} else {
localPath = context.paths.stateDB
guard FileManager.default.fileExists(atPath: localPath) else { return false }
guard FileManager.default.fileExists(atPath: localPath) else {
lastOpenError = "Hermes state database not found at \(localPath)."
return false
}
}
// Remote snapshots are point-in-time copies that no one writes to;
// opening them with `immutable=1` tells SQLite to skip WAL/SHM and
@@ -81,14 +98,41 @@ actor HermesDataService {
}
let result = sqlite3_open_v2(openPath, &db, flags, nil)
guard result == SQLITE_OK else {
let msg: String
if let db {
msg = String(cString: sqlite3_errmsg(db))
} else {
msg = "sqlite3_open_v2 returned \(result)"
}
lastOpenError = "Couldn't open state.db: \(msg)"
Self.logger.warning("sqlite3_open_v2 failed (\(result)) at \(localPath, privacy: .public): \(msg, privacy: .public)")
db = nil
return false
}
openedAtPath = localPath
lastOpenError = nil
detectSchema()
return true
}
/// Turn a transport error into the one-line string Dashboard shows. Adds
/// hints for the common "sqlite3 not installed" and "permission denied"
/// cases so users know what to do.
private nonisolated func humanize(_ error: Error) -> String {
let desc = (error as? LocalizedError)?.errorDescription ?? error.localizedDescription
let lower = desc.lowercased()
if lower.contains("sqlite3: command not found") || lower.contains("sqlite3: not found") {
return "sqlite3 is not installed on \(context.displayName). Install it with `apt install sqlite3` (Ubuntu/Debian) or `yum install sqlite` (RHEL/Fedora)."
}
if lower.contains("permission denied") {
return "Permission denied reading Hermes state on \(context.displayName). The SSH user may not have read access to ~/.hermes/state.db — try Run Diagnostics."
}
if lower.contains("no such file") {
return "Hermes state not found at ~/.hermes on \(context.displayName). If Hermes is installed elsewhere, set its data directory in Manage Servers."
}
return desc
}
/// Force a fresh snapshot pull + reopen. Used on session-load and in
/// any path that needs the UI to reflect writes Hermes just made.
/// Without this, remote snapshots would be frozen at the first `open()`
+144 -22
@@ -1,7 +1,10 @@
import Foundation
import os
struct HermesFileService: Sendable {
nonisolated static let logger = Logger(subsystem: "com.scarf", category: "HermesFileService")
let context: ServerContext
let transport: any ServerTransport
@@ -17,6 +20,14 @@ struct HermesFileService: Sendable {
return parseConfig(content)
}
/// Error-surfacing config load. Used by Dashboard to show the user a
/// specific reason when config.yaml can't be read on a remote host
/// (permission denied, missing file, sqlite3 not installed, etc.)
/// instead of silently falling back to `.empty`.
nonisolated func loadConfigResult() -> Result<HermesConfig, Error> {
readFileResult(context.paths.configYAML).map { parseConfig($0) }
}
nonisolated private func parseConfig(_ yaml: String) -> HermesConfig {
let parsed = Self.parseNestedYAML(yaml)
let values = parsed.values
@@ -385,11 +396,34 @@ struct HermesFileService: Sendable {
do {
return try JSONDecoder().decode(GatewayState.self, from: data)
} catch {
print("[Scarf] Failed to decode gateway state: \(error.localizedDescription)")
Self.logger.warning("Failed to decode gateway state: \(error.localizedDescription, privacy: .public)")
return nil
}
}
/// Error-surfacing gateway-state load. `.success(nil)` means the file
/// doesn't exist yet (gateway hasn't written state, which is normal when Hermes
/// is stopped). `.failure` means the file exists but couldn't be read
/// (permission denied, connection down, JSON corruption).
nonisolated func loadGatewayStateResult() -> Result<GatewayState?, Error> {
// Distinguish "file doesn't exist yet" (normal, returns .success(nil))
// from "file exists but we can't read or parse it" (error).
if !transport.fileExists(context.paths.gatewayStateJSON) {
return .success(nil)
}
switch readFileDataResult(context.paths.gatewayStateJSON) {
case .success(let data):
do {
return .success(try JSONDecoder().decode(GatewayState.self, from: data))
} catch {
Self.logger.warning("Failed to decode gateway state: \(error.localizedDescription, privacy: .public)")
return .failure(error)
}
case .failure(let err):
return .failure(err)
}
}
// MARK: - Memory
nonisolated func loadMemoryProfiles() -> [String] {
@@ -1173,22 +1207,45 @@ struct HermesFileService: Sendable {
}
nonisolated func hermesPID() -> pid_t? {
// Run `pgrep -f hermes` either locally or via the transport. On
// remote hosts we trust `pgrep` to be present; it's standard on
// Linux and macOS. On failure we conservatively return nil rather
// than pretending Hermes is down: the caller will see
// isHermesRunning==false, which is already the "unknown" UX.
let result = try? transport.runProcess(
executable: "/usr/bin/pgrep",
args: ["-f", "hermes"],
stdin: nil,
timeout: 5
)
guard let result, let firstLine = result.stdoutString
.components(separatedBy: "\n")
.first(where: { !$0.isEmpty }),
let pid = pid_t(firstLine.trimmingCharacters(in: .whitespaces)) else { return nil }
return pid
switch hermesPIDResult() {
case .success(let pid): return pid
case .failure: return nil
}
}
/// Error-surfacing variant. `.success(nil)` means `pgrep` ran successfully
/// and found no hermes process (Hermes is genuinely not running).
/// `.failure` means we couldn't probe at all (pgrep missing, connection
/// down, permission issue), a *different* UX from "not running".
nonisolated func hermesPIDResult() -> Result<pid_t?, Error> {
do {
let result = try transport.runProcess(
executable: "/usr/bin/pgrep",
args: ["-f", "hermes"],
stdin: nil,
timeout: 5
)
// pgrep exits 1 when nothing matches; that's "not running", NOT an
// error. Anything else (127=command not found, 255=ssh failure) is.
if result.exitCode == 0 {
if let firstLine = result.stdoutString
.components(separatedBy: "\n")
.first(where: { !$0.isEmpty }),
let pid = pid_t(firstLine.trimmingCharacters(in: .whitespaces)) {
return .success(pid)
}
return .success(nil)
} else if result.exitCode == 1 {
return .success(nil) // genuinely not running
} else {
let err = TransportError.commandFailed(exitCode: result.exitCode, stderr: result.stderrString)
Self.logger.warning("pgrep failed (exit \(result.exitCode)): \(result.stderrString, privacy: .public)")
return .failure(err)
}
} catch {
Self.logger.warning("pgrep transport error: \(error.localizedDescription, privacy: .public)")
return .failure(error)
}
}
@discardableResult
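The exit-code handling in `hermesPIDResult()` above separates three outcomes: a PID was found, nothing matched, or the probe itself failed. A minimal standalone sketch of that classification follows. `ProbeFailure` and `classifyPgrep` are hypothetical names for illustration; the diff itself reports failures as `TransportError.commandFailed`.

```swift
import Foundation

/// Hypothetical error type for this sketch only.
struct ProbeFailure: Error, Equatable {
    let exitCode: Int32
}

/// Maps a finished `pgrep` run onto the three outcomes described above.
func classifyPgrep(exitCode: Int32, stdout: String) -> Result<Int32?, ProbeFailure> {
    switch exitCode {
    case 0:
        // `split` drops empty lines, so `first` is the first matching PID line.
        let firstLine = stdout.split(separator: "\n").first
        let pid = firstLine.flatMap { Int32($0.trimmingCharacters(in: .whitespaces)) }
        return .success(pid)
    case 1:
        // pgrep ran fine and matched nothing: the process is not running.
        return .success(nil)
    default:
        // e.g. 127 (command not found) or 255 (ssh failure): the probe failed.
        return .failure(ProbeFailure(exitCode: exitCode))
    }
}

print(classifyPgrep(exitCode: 0, stdout: "4242\n4300\n"))
print(classifyPgrep(exitCode: 1, stdout: ""))
print(classifyPgrep(exitCode: 127, stdout: ""))
```

Keeping "not running" inside `.success` lets callers reserve `.failure` for cases that deserve an error banner.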
@@ -1488,15 +1545,80 @@ struct HermesFileService: Sendable {
// MARK: - File I/O
/// Read a UTF-8 text file through the transport. Missing files and any
/// transport error surface as `nil`; callers treat missing/unreadable
/// the same way they always have.
/// transport error surface as `nil`; callers that don't need the
/// specific error reason keep using this. New call sites that want to
/// show a user-actionable message should use `readFileResult`.
nonisolated private func readFile(_ path: String) -> String? {
guard let data = try? transport.readFile(path) else { return nil }
return String(data: data, encoding: .utf8)
switch readFileResult(path) {
case .success(let s):
return s
case .failure:
return nil
}
}
nonisolated private func readFileData(_ path: String) -> Data? {
try? transport.readFile(path)
switch readFileDataResult(path) {
case .success(let d):
return d
case .failure:
return nil
}
}
/// Error-surfacing read. Returns the decoded text on success, or the
/// underlying `TransportError` (or raw error for local failures) on
/// failure. Every failure is also logged via `os.Logger`; the warning
/// trail in Console.app is how we diagnose "connection green, data
/// empty" bug reports without needing to wire the error through every
/// existing call site.
nonisolated func readFileResult(_ path: String) -> Result<String, Error> {
switch readFileDataResult(path) {
case .success(let data):
guard let s = String(data: data, encoding: .utf8) else {
let err = TransportError.fileIO(path: path, underlying: "file is not valid UTF-8")
Self.logger.warning("readFile(\(path, privacy: .public)): not UTF-8")
return .failure(err)
}
return .success(s)
case .failure(let err):
return .failure(err)
}
}
nonisolated func readFileDataResult(_ path: String) -> Result<Data, Error> {
do {
let data = try transport.readFile(path)
return .success(data)
} catch {
// Don't log "No such file"; that's a routine, expected case
// for optional files (skill.yaml, gateway_state.json before
// Hermes starts, ~/.hermes/memories/USER.md on fresh installs,
// etc.). The caller still gets the Result.failure so it can
// distinguish missing from present-but-unreadable.
// Log everything else (permission denied, connection drops,
// sqlite3 missing), since those are actionable diagnostics.
if !Self.isFileNotFound(error) {
Self.logger.warning("readFile(\(path, privacy: .public)) failed: \(error.localizedDescription, privacy: .public)")
}
return .failure(error)
}
}
/// `true` iff the error represents "file does not exist" as opposed to
/// a permission / transport / parse failure. Used to suppress routine
/// logging for optional files while still surfacing real problems.
nonisolated private static func isFileNotFound(_ error: Error) -> Bool {
if let transportErr = error as? TransportError,
case .fileIO(_, let underlying) = transportErr {
return underlying.lowercased().contains("no such file")
}
// Cocoa NSFileNoSuchFileError (returned by LocalTransport when
// reading a missing file via FileManager).
let ns = error as NSError
if ns.domain == NSCocoaErrorDomain && ns.code == 260 { return true }
if ns.domain == NSPOSIXErrorDomain && ns.code == 2 { return true } // ENOENT
return false
}
/// Write a UTF-8 text file atomically through the transport. Matches the
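The file-I/O hunk above keeps the old optional-returning readers as thin wrappers over new `Result`-returning ones. The pattern can be sketched in isolation as follows. Everything here (`ReadError`, `InMemoryStore`, the method names) is a hypothetical stand-in for the transport-backed service, and `try? ... get()` is used as a compact equivalent of the diff's `switch`.

```swift
import Foundation

/// Hypothetical error type for this sketch.
enum ReadError: Error {
    case notFound(String)
    case notUTF8(String)
}

/// In-memory stand-in for a transport-backed file service.
struct InMemoryStore {
    var files: [String: Data]

    /// Error-surfacing read of raw bytes.
    func readDataResult(_ path: String) -> Result<Data, ReadError> {
        guard let data = files[path] else { return .failure(.notFound(path)) }
        return .success(data)
    }

    /// Error-surfacing read of UTF-8 text, layered on the byte-level read.
    func readTextResult(_ path: String) -> Result<String, ReadError> {
        readDataResult(path).flatMap { data -> Result<String, ReadError> in
            guard let text = String(data: data, encoding: .utf8) else {
                return .failure(.notUTF8(path))
            }
            return .success(text)
        }
    }

    /// Legacy-style wrapper: collapses every failure to nil for callers
    /// that do not need the reason.
    func readText(_ path: String) -> String? {
        try? readTextResult(path).get()
    }
}

let store = InMemoryStore(files: ["a.txt": Data("hi".utf8), "b.bin": Data([0xFF])])
print(store.readText("a.txt") ?? "nil", store.readText("b.bin") ?? "nil")
```

With this layering, existing call sites stay unchanged while new ones can branch on the specific failure.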
+71 -9
@@ -52,10 +52,13 @@ struct SSHTransport: ServerTransport {
/// per-host via OpenSSH's `%C` token). Exposed as a static so
/// cleanup paths (`ServerRegistry.removeServer`, app-launch sweep) can
/// compute it without instantiating a transport.
///
/// Uses a short path under /tmp to stay within the 104-byte macOS
/// Unix domain socket limit. The Caches path
/// (~/Library/Caches/scarf/ssh/%C) can exceed this limit when the
/// username is long, causing ssh to exit 255.
nonisolated static func controlDirPath() -> String {
let base = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first?.path
?? NSHomeDirectory() + "/Library/Caches"
return base + "/scarf/ssh"
return "/tmp/scarf-ssh-\(getuid())"
}
/// Snapshot cache directory for a given server. Stable per-ID so repeated
@@ -94,6 +97,31 @@ struct SSHTransport: ServerTransport {
}
}
/// Remove ControlMaster socket files older than `staleAfter` seconds.
///
/// Socket basenames are %C hashes (not ServerIDs), so we can't keep "still
/// registered" sockets the way `sweepOrphanSnapshots` does. But
/// `ControlPersist` is 600s, so anything older than 30 minutes is guaranteed
/// to be a dead orphan from a crashed master, an unclean app exit, or a
/// server removed while another Scarf instance was holding the dir.
/// Wiping these on launch keeps `/tmp/scarf-ssh-<uid>/` from accumulating
/// indefinitely until reboot, while leaving any concurrent Scarf
/// instance's live sockets (always <600s old) untouched.
static func sweepStaleControlSockets(staleAfter: TimeInterval = 1800) {
let root = controlDirPath()
guard let entries = try? FileManager.default.contentsOfDirectory(atPath: root) else { return }
let cutoff = Date().addingTimeInterval(-staleAfter)
for name in entries {
let path = root + "/" + name
guard let attrs = try? FileManager.default.attributesOfItem(atPath: path),
let mtime = attrs[.modificationDate] as? Date
else { continue }
if mtime < cutoff {
try? FileManager.default.removeItem(atPath: path)
}
}
}
/// Ask OpenSSH to shut down this host's ControlMaster socket, so the TCP
/// session isn't held open after the user removes this server. If no
/// master is currently running, `ssh -O exit` exits non-zero; we ignore
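The mtime-based sweep in `sweepStaleControlSockets` above can be exercised against a scratch directory. The sketch below is an assumption-laden illustration: `sweepStaleEntries` is a hypothetical generalization that takes the directory and the current time as parameters, and regular files stand in for ControlMaster sockets.

```swift
import Foundation

/// Removes entries in `root` whose modification date is older than
/// `staleAfter` seconds before `now`. Returns the removed names, sorted.
func sweepStaleEntries(in root: String, staleAfter: TimeInterval, now: Date = Date()) -> [String] {
    let fm = FileManager.default
    guard let entries = try? fm.contentsOfDirectory(atPath: root) else { return [] }
    let cutoff = now.addingTimeInterval(-staleAfter)
    var removed: [String] = []
    for name in entries {
        let path = root + "/" + name
        guard let attrs = try? fm.attributesOfItem(atPath: path),
              let mtime = attrs[.modificationDate] as? Date,
              mtime < cutoff else { continue }
        do {
            try fm.removeItem(atPath: path)
            removed.append(name)
        } catch {
            continue // leave entries we cannot remove
        }
    }
    return removed.sorted()
}

// Demo in a throwaway directory: one fresh file, one backdated by an hour.
let fm = FileManager.default
let root = fm.temporaryDirectory.appendingPathComponent("sweep-demo-\(UUID().uuidString)").path
try fm.createDirectory(atPath: root, withIntermediateDirectories: true)
_ = fm.createFile(atPath: root + "/fresh.sock", contents: Data())
_ = fm.createFile(atPath: root + "/old.sock", contents: Data())
try fm.setAttributes([.modificationDate: Date(timeIntervalSinceNow: -3600)],
                     ofItemAtPath: root + "/old.sock")

let removed = sweepStaleEntries(in: root, staleAfter: 1800)
let freshSurvived = fm.fileExists(atPath: root + "/fresh.sock")
print(removed, freshSurvived)
try? fm.removeItem(atPath: root) // clean up the scratch directory
```

Passing `now` explicitly makes the cutoff deterministic in tests without having to backdate files.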
@@ -137,13 +165,47 @@ struct SSHTransport: ServerTransport {
return args
}
/// Ensure the ControlMaster socket directory exists. Called before every
/// ssh invocation. Cheap: `createDirectory(withIntermediateDirectories: true)`
/// is a no-op when present.
/// Ensure the ControlMaster socket directory exists, is a real directory
/// (not a symlink), is owned by us, and has mode 0700. Called before every
/// ssh invocation.
///
/// Defensive against `/tmp` pre-creation: any local user can create
/// `/tmp/scarf-ssh-<uid>` before Scarf launches. Plain `mkdir -p` plus
/// `setAttributes` would silently accept a hostile dir (since the chmod
/// fails when we don't own it, and the Foundation API swallows that). So
/// we use POSIX `mkdir` (atomic, sets perms at create time, doesn't
/// follow symlinks) and `lstat` to verify ownership when the entry
/// already exists.
nonisolated private func ensureControlDir() {
try? FileManager.default.createDirectory(atPath: controlDir, withIntermediateDirectories: true)
// 0700 so socket files aren't visible to other users on the Mac.
try? FileManager.default.setAttributes([.posixPermissions: 0o700], ofItemAtPath: controlDir)
let path = controlDir
let mkResult = path.withCString { mkdir($0, 0o700) }
if mkResult == 0 { return }
let mkErr = errno
if mkErr != EEXIST {
Self.logger.error("Failed to create ControlDir \(path, privacy: .public): errno=\(mkErr)")
return
}
var st = Darwin.stat()
let lstatResult = path.withCString { lstat($0, &st) }
guard lstatResult == 0 else {
Self.logger.error("Could not lstat existing ControlDir \(path, privacy: .public): errno=\(errno)")
return
}
guard (st.st_mode & S_IFMT) == S_IFDIR else {
Self.logger.error("ControlDir \(path, privacy: .public) exists but is not a directory (possibly a symlink) — refusing to use")
return
}
guard st.st_uid == getuid() else {
Self.logger.error("ControlDir \(path, privacy: .public) owned by uid \(st.st_uid), expected \(getuid()) — refusing to use")
return
}
if (st.st_mode & 0o777) != 0o700 {
Self.logger.warning("ControlDir \(path, privacy: .public) had mode \(String(st.st_mode & 0o777, radix: 8), privacy: .public), repairing to 700")
_ = path.withCString { chmod($0, 0o700) }
}
}
/// Shell-quote a single argument for remote execution. The remote shell
@@ -21,28 +21,73 @@ final class DashboardViewModel {
var hermesRunning = false
var isLoading = true
/// User-presentable error banner. Set when any of the remote reads
/// (state.db snapshot, config.yaml, gateway_state.json, pgrep) failed
/// in a way that's not just "file doesn't exist yet". Dashboard renders
/// this above the stats with a "Run Diagnostics" button. `nil` = no
/// surfaceable error.
var lastReadError: String?
func load() async {
isLoading = true
// refresh() = close + reopen, forces a fresh remote snapshot. Cheap
// on local (live DB reopen).
let opened = await dataService.refresh()
var collectedErrors: [String] = []
if opened {
stats = await dataService.fetchStats()
recentSessions = await dataService.fetchSessions(limit: 5)
sessionPreviews = await dataService.fetchSessionPreviews(limit: 5)
await dataService.close()
} else if let msg = await dataService.lastOpenError {
collectedErrors.append(msg)
}
// The fileService methods are synchronous and route through the
// transport. For remote contexts each call is a blocking ssh
// round-trip, so do them off the main thread to avoid spinning the
// beach ball during the load.
let svc = fileService
let (cfg, gw, running) = await Task.detached {
(svc.loadConfig(), svc.loadGatewayState(), svc.isHermesRunning())
struct LoadResults: Sendable {
let cfg: Result<HermesConfig, Error>
let gw: Result<GatewayState?, Error>
let running: Result<pid_t?, Error>
}
let results = await Task.detached { () -> LoadResults in
LoadResults(
cfg: svc.loadConfigResult(),
gw: svc.loadGatewayStateResult(),
running: svc.hermesPIDResult()
)
}.value
config = cfg
gatewayState = gw
hermesRunning = running
switch results.cfg {
case .success(let c): config = c
case .failure(let e):
config = .empty
collectedErrors.append("config.yaml — \(e.localizedDescription)")
}
switch results.gw {
case .success(let g): gatewayState = g
case .failure(let e):
gatewayState = nil
collectedErrors.append("gateway_state.json — \(e.localizedDescription)")
}
switch results.running {
case .success(let pid): hermesRunning = (pid != nil)
case .failure(let e):
hermesRunning = false
collectedErrors.append("pgrep — \(e.localizedDescription)")
}
// Only surface when there's a real error AND we're on a remote
// context. Local contexts rarely hit these paths (live DB, local
// filesystem), and a transient "file doesn't exist yet" on fresh
// installs shouldn't scare users.
if context.isRemote, !collectedErrors.isEmpty {
lastReadError = collectedErrors.joined(separator: "\n")
} else {
lastReadError = nil
}
isLoading = false
}
}
@@ -2,6 +2,7 @@ import SwiftUI
struct DashboardView: View {
@State private var viewModel: DashboardViewModel
@State private var showDiagnostics = false
@Environment(AppCoordinator.self) private var coordinator
@Environment(HermesFileWatcher.self) private var fileWatcher
@@ -13,6 +14,9 @@ struct DashboardView: View {
var body: some View {
ScrollView {
VStack(alignment: .leading, spacing: 20) {
if let err = viewModel.lastReadError {
readErrorBanner(err)
}
statusSection
statsSection
recentSessionsSection
@@ -30,6 +34,44 @@ struct DashboardView: View {
.onChange(of: fileWatcher.lastChangeDate) {
Task { await viewModel.load() }
}
.sheet(isPresented: $showDiagnostics) {
RemoteDiagnosticsView(context: viewModel.context)
}
}
/// Banner shown above the Dashboard when one or more remote reads
/// failed (permission denied, missing sqlite3, wrong home dir, etc.).
/// Replaces the old silent-failure mode where empty values just
/// appeared as "Stopped / unknown / 0" with no explanation.
private func readErrorBanner(_ err: String) -> some View {
VStack(alignment: .leading, spacing: 8) {
HStack(alignment: .top, spacing: 8) {
Image(systemName: "exclamationmark.triangle.fill")
.foregroundStyle(.orange)
VStack(alignment: .leading, spacing: 4) {
Text("Can't read Hermes state on \(viewModel.context.displayName)")
.font(.headline)
Text(err)
.font(.caption.monospaced())
.foregroundStyle(.secondary)
.textSelection(.enabled)
.fixedSize(horizontal: false, vertical: true)
}
Spacer()
Button {
showDiagnostics = true
} label: {
Label("Run Diagnostics…", systemImage: "stethoscope")
}
.controlSize(.regular)
}
}
.padding(12)
.background(Color.orange.opacity(0.1), in: RoundedRectangle(cornerRadius: 8))
.overlay(
RoundedRectangle(cornerRadius: 8)
.strokeBorder(Color.orange.opacity(0.3), lineWidth: 1)
)
}
private var statusSection: some View {
@@ -22,7 +22,12 @@ final class AddServerViewModel {
var testResult: TestResult?
enum TestResult: Equatable {
case success(hermesPath: String, dbFound: Bool)
/// `suggestedRemoteHome` is non-nil when the probe didn't find
/// state.db at the configured (or default) path but did find a
/// `state.db` at one of the well-known alternates (e.g. a systemd
/// install in `/var/lib/hermes/.hermes`). UI offers a one-click
/// fill so the user doesn't have to know the convention.
case success(hermesPath: String, dbFound: Bool, suggestedRemoteHome: String?)
/// `command` is the full ssh invocation we attempted (so the user can
/// paste it into Terminal to see what their shell does with it).
/// `stderr` is whatever ssh / the remote shell wrote to stderr.
@@ -95,7 +100,7 @@ final class AddServerViewModel {
/// `hermesBinaryHint` so subsequent calls don't need to re-resolve it.
func configForSave() -> SSHConfig {
var cfg = draftConfig
if case .success(let path, _) = testResult {
if case .success(let path, _, _) = testResult {
cfg.hermesBinaryHint = path
}
return cfg
@@ -11,8 +11,12 @@ final class ConnectionStatusViewModel {
private let logger = Logger(subsystem: "com.scarf", category: "ConnectionStatus")
enum Status: Equatable {
/// Healthy: most recent probe succeeded.
/// Healthy: SSH connected AND we can read `~/.hermes/config.yaml`.
case connected
/// SSH connects but the follow-up read-access probe failed. Data
/// views will be empty until this is resolved. `reason` is shown
/// in the pill tooltip; users click the pill to open diagnostics.
case degraded(reason: String)
/// No probe yet or the previous probe timed out but we haven't
/// confirmed failure. Shown as yellow to tell the user "checking".
case idle
@@ -72,20 +76,59 @@ final class ConnectionStatusViewModel {
private func probeOnce() async {
let snapshot = transport
let result: Result<Void, TransportError>
// Transport IO on a detached task so we don't block MainActor.
result = await Task.detached {
let hermesHome = context.paths.home
// Two-tier probe in one SSH round-trip:
// tier 1, `true`: raw connectivity / auth / ControlMaster path
// tier 2, `test -r $HERMESHOME/config.yaml`: can we actually
// read the file Dashboard reads on every tick? Green pill
// only if both pass; yellow "degraded" if tier 1 passes
// but tier 2 fails (the exact symptom in issue #19).
// Script emits two lines: TIER1:<exitcode> and TIER2:<exitcode>.
let homeArg: String
if hermesHome.hasPrefix("~/") {
homeArg = "\"$HOME/\(hermesHome.dropFirst(2))\""
} else if hermesHome == "~" {
homeArg = "\"$HOME\""
} else {
homeArg = "\"\(hermesHome.replacingOccurrences(of: "\"", with: "\\\""))\""
}
let script = """
echo TIER1:0
H=\(homeArg)
if [ -r "$H/config.yaml" ]; then echo TIER2:0; else echo TIER2:1; fi
"""
enum ProbeOutcome {
case connected
case degraded(reason: String)
case failure(TransportError)
}
let outcome: ProbeOutcome = await Task.detached {
do {
let probe = try snapshot.runProcess(
executable: "/bin/sh",
args: ["-c", "true"],
args: ["-c", script],
stdin: nil,
timeout: 10
)
if probe.exitCode == 0 {
return .success(())
guard probe.exitCode == 0 else {
return .failure(.commandFailed(exitCode: probe.exitCode, stderr: probe.stderrString))
}
return .failure(.commandFailed(exitCode: probe.exitCode, stderr: probe.stderrString))
let out = probe.stdoutString
let tier1 = out.contains("TIER1:0")
let tier2 = out.contains("TIER2:0")
if !tier1 {
// The script itself didn't reach tier 1; treat as connection failure.
return .failure(.commandFailed(exitCode: 1, stderr: out))
}
if tier2 {
return .connected
}
// Connected but can't read config.yaml, the core issue #19
// symptom. Give the pill a short reason; the full story goes
// into Remote Diagnostics.
return .degraded(reason: "can't read ~/.hermes/config.yaml")
} catch let e as TransportError {
return .failure(e)
} catch {
@@ -93,11 +136,15 @@ final class ConnectionStatusViewModel {
}
}.value
switch result {
case .success:
switch outcome {
case .connected:
status = .connected
lastSuccess = Date()
consecutiveFailures = 0
case .degraded(let reason):
status = .degraded(reason: reason)
lastSuccess = Date() // SSH itself is fine, reset failure count
consecutiveFailures = 0
case .failure(let err):
consecutiveFailures += 1
// First failure: silent yellow "Reconnecting" while we try
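The two-tier probe above reduces to parsing two marker lines from stdout. A minimal sketch of that mapping is shown below, with hypothetical names (`ProbeOutcome` here is a simplified copy without the `TransportError` payload, and `parseProbe` is not in the diff). As a deliberate variation, it compares whole lines rather than using the substring `contains` check from the diff.

```swift
/// Simplified outcome type for this sketch (no error payload).
enum ProbeOutcome: Equatable {
    case connected
    case degraded(reason: String)
    case failed
}

/// Interprets the `TIER1:<code>` / `TIER2:<code>` lines emitted by the probe script.
func parseProbe(stdout: String) -> ProbeOutcome {
    // Whole-line comparison, so "TIER1:0" cannot match inside "TIER1:01".
    let lines = Set(stdout.split(whereSeparator: \.isNewline).map(String.init))
    guard lines.contains("TIER1:0") else { return .failed }
    return lines.contains("TIER2:0")
        ? .connected
        : .degraded(reason: "can't read config.yaml")
}

print(parseProbe(stdout: "TIER1:0\nTIER2:0\n"))
print(parseProbe(stdout: "TIER1:0\nTIER2:1\n"))
print(parseProbe(stdout: ""))
```

Treating missing tier-1 output as a failure mirrors the diff's handling of a script that never ran.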
@@ -0,0 +1,474 @@
import Foundation
import os
/// Runs a fixed check-list against a remote server and reports per-probe
/// pass/fail. Exists because `TestConnectionProbe` only verifies ssh
/// connectivity + hermes binary presence, and `ConnectionStatusViewModel`
/// only pings `/bin/sh -c true`. When users file "connection green but
/// everything empty" bug reports (issue #19), this is the diagnostic surface
/// that tells them (and us) exactly which read fails and why.
///
/// One shell invocation runs every check on the remote and emits a
/// line-delimited `KEY|STATUS|DETAIL` protocol that the view model parses.
/// Cheaper than one SSH round-trip per probe and gives a consistent shell
/// environment across all probes.
@Observable
@MainActor
final class RemoteDiagnosticsViewModel {
private static let logger = Logger(subsystem: "com.scarf", category: "RemoteDiagnostics")
let context: ServerContext
/// Probes in display order. The order matters: connectivity first, then
/// environment checks, then Hermes data-path checks. A failure early in
/// the list usually explains every subsequent failure.
enum ProbeID: String, CaseIterable, Identifiable {
case connectivity
case remoteUser
case remoteHome
case hermesHomeConfigured
case hermesDirExists
case hermesDirReadable
case configYAMLReadable
case configYAMLContents
case stateDBReadable
case sqlite3Installed
case sqlite3CanOpenStateDB
case hermesBinaryNonLogin
case hermesBinaryLogin
case pgrepAvailable
var id: String { rawValue }
/// Human-readable title rendered in the diagnostics sheet.
var title: String {
switch self {
case .connectivity: return "SSH connectivity"
case .remoteUser: return "Remote user identity"
case .remoteHome: return "Remote $HOME"
case .hermesHomeConfigured: return "Hermes home directory"
case .hermesDirExists: return "Hermes directory exists"
case .hermesDirReadable: return "Hermes directory readable"
case .configYAMLReadable: return "config.yaml readable"
case .configYAMLContents: return "config.yaml actually readable (content)"
case .stateDBReadable: return "state.db readable"
case .sqlite3Installed: return "sqlite3 binary installed on remote"
case .sqlite3CanOpenStateDB: return "sqlite3 can open state.db"
case .hermesBinaryNonLogin: return "hermes binary on non-login PATH"
case .hermesBinaryLogin: return "hermes binary on login PATH (via rc files)"
case .pgrepAvailable: return "pgrep available (for 'is Hermes running')"
}
}
/// When the check fails, show this hint alongside the stderr.
var failureHint: String? {
switch self {
case .connectivity:
return "SSH itself can't complete. Before re-testing in Scarf, confirm `ssh <host>` works in Terminal."
case .remoteUser, .remoteHome:
return nil
case .hermesHomeConfigured:
return nil
case .hermesDirExists:
return "Scarf is looking at the default `~/.hermes`. If Hermes is installed elsewhere (e.g. `/var/lib/hermes/.hermes` for systemd installs), set the Hermes home directory in Manage Servers → this server → Edit."
case .hermesDirReadable:
return "The SSH user can see `~/.hermes` but can't list it. Check permissions: `ls -ld ~/.hermes` on the remote — the SSH user needs at least `r-x`."
case .configYAMLReadable, .configYAMLContents:
return "Scarf can't read `config.yaml`. This usually means the SSH user is different from the user Hermes runs as. Either (a) run Hermes as the SSH user, (b) `chmod a+r ~/.hermes/config.yaml`, or (c) configure Scarf to SSH as the Hermes user."
case .stateDBReadable:
return "Scarf can't read `state.db` — Sessions, Activity, Dashboard stats all depend on this. Same fix pattern as config.yaml."
case .sqlite3Installed:
return "Scarf pulls a snapshot of state.db via `sqlite3 .backup`, so sqlite3 must be installed on the remote. Install: `sudo apt install sqlite3` (Ubuntu/Debian), `sudo yum install sqlite` (RHEL/Fedora), `apk add sqlite` (Alpine)."
case .sqlite3CanOpenStateDB:
return "sqlite3 exists but can't open state.db. Could be a permission issue, a corrupt DB, or a version skew."
case .hermesBinaryNonLogin:
return "Scarf's runtime calls use non-login SSH shells (no .bashrc). If `hermes` only appears here via the login path, runtime CLI calls will fail. Move your PATH export from `.bashrc` to `.zshenv` or `.profile`."
case .hermesBinaryLogin:
return "hermes couldn't be located even after sourcing login rc files. Install path is non-standard — set the hermes binary path manually in Manage Servers."
case .pgrepAvailable:
return "pgrep not found on remote. Dashboard can't determine whether Hermes is running. Install procps: `apt install procps` (most distros have it by default)."
}
}
}
struct Probe: Identifiable, Sendable {
let id: ProbeID
let passed: Bool
let detail: String
}
private(set) var probes: [Probe] = []
private(set) var isRunning: Bool = false
private(set) var startedAt: Date?
private(set) var finishedAt: Date?
/// Raw stdout/stderr from the most recent run, preserved so the UI can
/// surface them in a disclosure panel when things look wrong. This is
/// how we debug cases where the script ran but no probes were parsed
/// (e.g. transport-quoting bugs, dash-vs-bash incompatibilities).
private(set) var rawStdout: String = ""
private(set) var rawStderr: String = ""
private(set) var rawExitCode: Int32 = 0
init(context: ServerContext) {
self.context = context
}
/// Kick off the full check list. Safe to call again to re-run.
func run() async {
if isRunning { return }
isRunning = true
probes = []
startedAt = Date()
finishedAt = nil
let script = Self.buildScript(hermesHome: context.paths.home)
let captured = await Self.execute(script: script, context: context)
switch captured {
case .connectFailure(let msg):
rawStdout = ""
rawStderr = msg
rawExitCode = -1
probes = [
Probe(id: .connectivity, passed: false, detail: msg)
] + ProbeID.allCases
.filter { $0 != .connectivity }
.map { Probe(id: $0, passed: false, detail: "(skipped — SSH didn't connect)") }
case .completed(let stdout, let stderr, let exitCode):
rawStdout = stdout
rawStderr = stderr
rawExitCode = exitCode
probes = Self.parse(stdout: stdout, stderr: stderr, exitCode: exitCode)
}
finishedAt = Date()
isRunning = false
Self.logger.info("Diagnostics for \(self.context.displayName, privacy: .public) finished — \(self.passingCount)/\(self.probes.count) passing")
}
/// Quick summary string, e.g. "9/14 checks passing". Used in the header.
var summary: String {
guard !probes.isEmpty else { return "Not yet run." }
return "\(passingCount)/\(probes.count) checks passing"
}
var passingCount: Int {
probes.filter { $0.passed }.count
}
var allPassed: Bool {
!probes.isEmpty && passingCount == probes.count
}
// MARK: - Script + parsing
/// Build the remote shell script. Uses a pipe-delimited protocol so the
/// Swift side can parse without regex surprises. Status is `PASS` or
/// `FAIL`; detail is a single line (can be blank). `__END__` at the
/// bottom lets us detect truncation.
private static func buildScript(hermesHome: String) -> String {
// Shell-quote the home path. The user may have typed `~/.hermes`, which
// we want the remote shell to expand, so we substitute `~/` with
// `$HOME/` like `SSHTransport.remotePathArg` does.
let expanded: String
if hermesHome.hasPrefix("~/") {
expanded = "\"$HOME/\(hermesHome.dropFirst(2))\""
} else if hermesHome == "~" {
expanded = "\"$HOME\""
} else {
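// Absolute path: still quote in case of spaces. Only `"` is escaped
// here, so `$`, backticks, and backslashes in the path would still be
// interpreted by the remote shell.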
expanded = "\"\(hermesHome.replacingOccurrences(of: "\"", with: "\\\""))\""
}
return #"""
H=\#(expanded)
emit() { printf '%s|%s|%s\n' "$1" "$2" "$3"; }
emit connectivity PASS "(running in this shell)"
user=$(id -un 2>/dev/null || echo unknown)
emit remoteUser PASS "$user"
emit remoteHome PASS "$HOME"
emit hermesHomeConfigured PASS "$H"
if [ -d "$H" ]; then
emit hermesDirExists PASS "$H"
else
emit hermesDirExists FAIL "not a directory: $H"
fi
if [ -r "$H" ] && [ -x "$H" ]; then
emit hermesDirReadable PASS ""
else
emit hermesDirReadable FAIL "cannot read/enter $H (check perms on the dir)"
fi
if [ -r "$H/config.yaml" ]; then
emit configYAMLReadable PASS ""
else
if [ -e "$H/config.yaml" ]; then
emit configYAMLReadable FAIL "exists but not readable by $user"
else
emit configYAMLReadable FAIL "file does not exist"
fi
fi
if head -c 1 "$H/config.yaml" > /dev/null 2>&1; then
size=$(wc -c < "$H/config.yaml" 2>/dev/null | tr -d ' ')
emit configYAMLContents PASS "${size} bytes"
else
emit configYAMLContents FAIL "cannot read file contents"
fi
if [ -r "$H/state.db" ]; then
size=$(wc -c < "$H/state.db" 2>/dev/null | tr -d ' ')
emit stateDBReadable PASS "${size} bytes"
else
if [ -e "$H/state.db" ]; then
emit stateDBReadable FAIL "exists but not readable by $user"
else
emit stateDBReadable FAIL "file does not exist"
fi
fi
if command -v sqlite3 > /dev/null 2>&1; then
sq=$(command -v sqlite3)
emit sqlite3Installed PASS "$sq"
else
emit sqlite3Installed FAIL "sqlite3 not on PATH"
fi
if sqlite3 "$H/state.db" 'SELECT 1' > /dev/null 2>&1; then
emit sqlite3CanOpenStateDB PASS ""
else
err=$(sqlite3 "$H/state.db" 'SELECT 1' 2>&1 | head -1)
emit sqlite3CanOpenStateDB FAIL "$err"
fi
# Non-login PATH: just ask the current shell.
hpath=$(command -v hermes 2>/dev/null)
if [ -n "$hpath" ]; then
emit hermesBinaryNonLogin PASS "$hpath"
else
emit hermesBinaryNonLogin FAIL "not on non-login PATH ($PATH)"
fi
# Login PATH: source rc files (mirroring TestConnectionProbe) and re-probe.
for rc in "$HOME/.zshenv" "$HOME/.zprofile" "$HOME/.bash_profile" "$HOME/.profile"; do
[ -f "$rc" ] && . "$rc" 2>/dev/null
done
hpath2=$(command -v hermes 2>/dev/null)
if [ -z "$hpath2" ]; then
for cand in "$HOME/.local/bin/hermes" "/opt/homebrew/bin/hermes" "/usr/local/bin/hermes" "$HOME/.hermes/bin/hermes"; do
if [ -x "$cand" ]; then hpath2="$cand"; break; fi
done
fi
if [ -n "$hpath2" ]; then
emit hermesBinaryLogin PASS "$hpath2"
else
emit hermesBinaryLogin FAIL "not found after sourcing rc files"
fi
if command -v pgrep > /dev/null 2>&1; then
emit pgrepAvailable PASS "$(command -v pgrep)"
else
emit pgrepAvailable FAIL "pgrep not on PATH"
fi
printf '__END__\n'
"""#
}
enum Captured {
case connectFailure(String)
case completed(stdout: String, stderr: String, exitCode: Int32)
}
private static func execute(script: String, context: ServerContext) async -> Captured {
// Can't use `transport.runProcess(executable: "/bin/sh", args: ["-c", script])`
// here: SSHTransport.runProcess pipes every argument through
// `remotePathArg` (which double-quotes to rewrite `~/` to `$HOME/`),
// which mangles a multi-line shell script containing `"$1"`,
// nested quotes, and `printf` escape sequences. The result on the
// remote is a scrambled string and every probe fails to emit.
//
// Mirror TestConnectionProbe's approach: build the ssh argv
// directly so the script travels as a single opaque argv entry
// that ssh forwards to the remote shell unchanged.
switch context.kind {
case .local:
return await runLocally(script: script)
case .ssh(let config):
return await runOverSSH(script: script, config: config)
}
}
/// Direct ssh invocation. Pipes the script into `sh` on stdin rather
/// than passing it as `sh -c <script>` argv because ssh concatenates
/// argv with spaces and sends that as a single command string to the
/// remote's LOGIN shell, which then parses newlines as command
/// separators. A multi-line `sh -c <script>` would run only the first
/// line inside the `sh` subprocess (any variables set there die when
/// `sh` exits), and the rest would run in the login shell with no
/// access to those variables. Symptom: `$H=""` everywhere downstream.
///
/// Feeding the script via stdin avoids the split entirely: `sh -s`
/// consumes the whole stream in one process, so variable scope is
/// preserved and the script runs exactly the same way it would from
/// a local `cat script.sh | sh`.
private static func runOverSSH(script: String, config: SSHConfig) async -> Captured {
var sshArgv: [String] = [
"-o", "ControlMaster=auto",
"-o", "ControlPath=\(controlDirPath())/%C",
"-o", "ControlPersist=600",
"-o", "ServerAliveInterval=30",
"-o", "ConnectTimeout=10",
"-o", "StrictHostKeyChecking=accept-new",
"-o", "LogLevel=QUIET",
"-o", "BatchMode=yes",
"-T" // no pty; keep stdin/stdout a clean byte stream
]
if let port = config.port { sshArgv += ["-p", String(port)] }
if let id = config.identityFile, !id.isEmpty {
sshArgv += ["-i", id]
}
let hostSpec: String
if let user = config.user, !user.isEmpty { hostSpec = "\(user)@\(config.host)" }
else { hostSpec = config.host }
sshArgv.append(hostSpec)
sshArgv.append("--")
sshArgv.append("/bin/sh")
sshArgv.append("-s") // read script from stdin
return await Task.detached { () -> Captured in
let proc = Process()
proc.executableURL = URL(fileURLWithPath: "/usr/bin/ssh")
proc.arguments = sshArgv
// Inherit the shell's SSH_AUTH_SOCK so ssh can reach the
// agent; same pattern as SSHTransport + TestConnectionProbe.
var env = ProcessInfo.processInfo.environment
let shellEnv = HermesFileService.enrichedEnvironment()
for key in ["SSH_AUTH_SOCK", "SSH_AGENT_PID"] {
if env[key] == nil, let v = shellEnv[key], !v.isEmpty {
env[key] = v
}
}
proc.environment = env
let stdinPipe = Pipe()
let stdoutPipe = Pipe()
let stderrPipe = Pipe()
proc.standardInput = stdinPipe
proc.standardOutput = stdoutPipe
proc.standardError = stderrPipe
do {
try proc.run()
} catch {
return .connectFailure("Failed to launch ssh: \(error.localizedDescription)")
}
// Write the script to ssh's stdin, then close the write end so
// remote sh sees EOF and exits after executing the whole script.
if let data = script.data(using: .utf8) {
try? stdinPipe.fileHandleForWriting.write(contentsOf: data)
}
try? stdinPipe.fileHandleForWriting.close()
let deadline = Date().addingTimeInterval(30)
while proc.isRunning && Date() < deadline {
try? await Task.sleep(nanoseconds: 100_000_000)
}
if proc.isRunning {
proc.terminate()
return .connectFailure("Diagnostics timed out after 30s")
}
let out = (try? stdoutPipe.fileHandleForReading.readToEnd()) ?? Data()
let err = (try? stderrPipe.fileHandleForReading.readToEnd()) ?? Data()
return .completed(
stdout: String(data: out, encoding: .utf8) ?? "",
stderr: String(data: err, encoding: .utf8) ?? "",
exitCode: proc.terminationStatus
)
}.value
}
/// Local shell invocation: runs the diagnostic script against the
/// user's own Mac. Less useful than the remote form (most checks will
/// trivially pass), but lets the same UI work for both contexts.
private static func runLocally(script: String) async -> Captured {
return await Task.detached { () -> Captured in
let proc = Process()
proc.executableURL = URL(fileURLWithPath: "/bin/sh")
proc.arguments = ["-c", script]
let stdoutPipe = Pipe()
let stderrPipe = Pipe()
proc.standardOutput = stdoutPipe
proc.standardError = stderrPipe
do {
try proc.run()
} catch {
return .connectFailure("Failed to launch /bin/sh: \(error.localizedDescription)")
}
let deadline = Date().addingTimeInterval(10)
while proc.isRunning && Date() < deadline {
try? await Task.sleep(nanoseconds: 100_000_000)
}
if proc.isRunning {
proc.terminate()
return .connectFailure("Local diagnostics timed out (should be <1s)")
}
let out = (try? stdoutPipe.fileHandleForReading.readToEnd()) ?? Data()
let err = (try? stderrPipe.fileHandleForReading.readToEnd()) ?? Data()
return .completed(
stdout: String(data: out, encoding: .utf8) ?? "",
stderr: String(data: err, encoding: .utf8) ?? "",
exitCode: proc.terminationStatus
)
}.value
}
/// Same cache directory used by SSHTransport, shared so the diagnostic
/// probe reuses the connection's ControlMaster socket when it already
/// exists (no second TCP handshake, no second auth).
private static func controlDirPath() -> String {
SSHTransport.controlDirPath()
}
private static func parse(stdout: String, stderr: String, exitCode: Int32) -> [Probe] {
var results: [ProbeID: Probe] = [:]
for line in stdout.split(whereSeparator: { $0 == "\n" || $0 == "\r" }) {
let parts = line.split(separator: "|", maxSplits: 2, omittingEmptySubsequences: false)
guard parts.count == 3 else { continue }
let key = String(parts[0]).trimmingCharacters(in: .whitespaces)
let status = String(parts[1]).trimmingCharacters(in: .whitespaces)
let detail = String(parts[2]).trimmingCharacters(in: .whitespaces)
guard let probe = ProbeID(rawValue: key) else { continue }
results[probe] = Probe(
id: probe,
passed: status == "PASS",
detail: detail
)
}
// If the script didn't complete, fill in the missing probes so the UI
// still shows every expected row (rather than silently skipping).
let terminated = stdout.contains("__END__")
let fallbackDetail: String
if terminated {
fallbackDetail = "(no output)"
} else if exitCode != 0 {
fallbackDetail = "(script exited \(exitCode) before this check — stderr: \(stderr.prefix(200)))"
} else {
fallbackDetail = "(no output from script)"
}
return ProbeID.allCases.map { id in
results[id] ?? Probe(id: id, passed: false, detail: fallbackDetail)
}
}
}
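`buildScript` above wraps the Hermes home path in double quotes, which still lets the remote shell expand `$` and backticks. One alternative is single-quote escaping. The sketch below is an assumption about how that could look, not the project's code; `shellSingleQuoted` and `remoteHomeExpression` are hypothetical helpers.

```swift
import Foundation

/// Wrap `value` in single quotes for a POSIX shell. An embedded single quote
/// becomes '\'' (close the quote, emit an escaped quote, reopen the quote).
func shellSingleQuoted(_ value: String) -> String {
    "'" + value.replacingOccurrences(of: "'", with: "'\\''") + "'"
}

/// Hypothetical variant of the `~` handling: keep `$HOME` expandable,
/// but pass every user-typed character through literally.
func remoteHomeExpression(_ hermesHome: String) -> String {
    if hermesHome == "~" { return "\"$HOME\"" }
    if hermesHome.hasPrefix("~/") {
        // Adjacent quoted words concatenate in sh: "$HOME"/'.hermes'
        return "\"$HOME\"/" + shellSingleQuoted(String(hermesHome.dropFirst(2)))
    }
    return shellSingleQuoted(hermesHome)
}
```

With this shape, a path such as `/srv/$data/.hermes` would reach the remote shell as a literal string rather than triggering variable expansion.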
@@ -47,6 +47,26 @@ struct TestConnectionProbe {
// Scarf's local resolution.
// The matched absolute path is stored as `hermesBinaryHint` on the
// SSHConfig so subsequent CLI/ACP invocations don't have to re-probe.
// If the user already typed a remoteHome override, use it; otherwise
// default to $HOME/.hermes. Either way, the script also probes a
// short list of well-known alternates when the primary path doesn't
// have state.db; systemd/docker/VPS installs tend to live at
// /var/lib/hermes/.hermes or /home/hermes/.hermes, and SSHing in as
// a different user than the Hermes daemon is the leading cause of
// "connection green, data empty" bug reports (issue #19).
let primary: String
if let override = config.remoteHome, !override.isEmpty {
if override.hasPrefix("~/") {
primary = "$HOME/\(override.dropFirst(2))"
} else if override == "~" {
primary = "$HOME"
} else {
primary = override
}
} else {
primary = "$HOME/.hermes"
}
let script = #"""
hpath=$(command -v hermes 2>/dev/null)
if [ -z "$hpath" ]; then
@@ -61,7 +81,21 @@ struct TestConnectionProbe {
done
fi
echo "HERMES:$hpath"
if [ -e "$HOME/.hermes/state.db" ]; then echo DB:ok; else echo DB:missing; fi
PRIMARY="\#(primary)"
if [ -r "$PRIMARY/state.db" ]; then
echo "DB:ok"
echo "HOME_USED:$PRIMARY"
else
echo "DB:missing"
# Probe well-known alternates. Emit the first one that has a
# readable state.db so the UI can offer a one-click fill.
for alt in "/var/lib/hermes/.hermes" "/opt/hermes/.hermes" "/home/hermes/.hermes" "/root/.hermes"; do
if [ -r "$alt/state.db" ]; then
echo "SUGGEST:$alt"
break
fi
done
fi
"""#
sshArgs.append("/bin/sh")
sshArgs.append("-c")
@@ -133,6 +167,8 @@ struct TestConnectionProbe {
let hermesPath = lines.first(where: { $0.hasPrefix("HERMES:") })?
.dropFirst("HERMES:".count).trimmingCharacters(in: .whitespaces) ?? ""
let dbFound = lines.contains(where: { $0 == "DB:ok" })
let suggestedHome = lines.first(where: { $0.hasPrefix("SUGGEST:") })
.map { String($0.dropFirst("SUGGEST:".count)).trimmingCharacters(in: .whitespaces) }
if hermesPath.isEmpty {
return .failure(
message: "hermes binary not found in remote $PATH",
@@ -140,7 +176,7 @@ struct TestConnectionProbe {
command: displayCommand
)
}
return .success(hermesPath: String(hermesPath), dbFound: dbFound)
return .success(hermesPath: String(hermesPath), dbFound: dbFound, suggestedRemoteHome: suggestedHome)
}
// Classify common failures by scanning the stderr trace.
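The `HERMES:` / `DB:` / `SUGGEST:` line protocol used by the probe script above can be parsed with a small prefix lookup. The following is a standalone sketch under that assumption; `ProbeSummary`, `value(for:in:)`, and `summarize(stdout:)` are hypothetical names introduced for illustration.

```swift
import Foundation

// Hypothetical container for the three fields the probe reports.
struct ProbeSummary: Equatable {
    var hermesPath: String
    var dbFound: Bool
    var suggestedHome: String?
}

/// Return the trimmed text after `prefix` on the first line that carries it.
func value(for prefix: String, in lines: [String]) -> String? {
    guard let line = lines.first(where: { $0.hasPrefix(prefix) }) else { return nil }
    return String(line.dropFirst(prefix.count)).trimmingCharacters(in: .whitespaces)
}

/// Split stdout into lines and pull out each tagged value.
func summarize(stdout: String) -> ProbeSummary {
    let lines = stdout.split(whereSeparator: \.isNewline).map(String.init)
    return ProbeSummary(
        hermesPath: value(for: "HERMES:", in: lines) ?? "",
        dbFound: lines.contains("DB:ok"),
        suggestedHome: value(for: "SUGGEST:", in: lines)
    )
}
```

A missing `SUGGEST:` line simply yields `nil`, which matches the optional `suggestedRemoteHome` carried by the success case in the diff.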
@@ -81,11 +81,15 @@ struct AddServerSheet: View {
}
}
LabeledField("Remote ~/.hermes override") {
TextField("Leave blank for default", text: $viewModel.remoteHome)
LabeledField("Hermes data directory") {
TextField("Default: ~/.hermes", text: $viewModel.remoteHome)
.textFieldStyle(.roundedBorder)
.autocorrectionDisabled()
}
Text("Leave blank unless Hermes is installed at a non-default path (systemd services often live at /var/lib/hermes/.hermes; Docker sidecars vary). Test Connection auto-suggests a value when it detects one of the known alternates.")
.font(.caption)
.foregroundStyle(.secondary)
.fixedSize(horizontal: false, vertical: true)
Text("Scarf uses ssh-agent for authentication. If your key has a passphrase, run `ssh-add` before connecting — Scarf never prompts for or stores passphrases.")
.font(.caption)
@@ -113,14 +117,43 @@ struct AddServerSheet: View {
if let result = viewModel.testResult {
switch result {
case .success(let path, let dbFound):
VStack(alignment: .leading, spacing: 4) {
case .success(let path, let dbFound, let suggestedHome):
VStack(alignment: .leading, spacing: 6) {
Label("Connected", systemImage: "checkmark.circle.fill")
.foregroundStyle(.green)
Text("hermes at \(path)").font(.caption).monospaced()
Text(dbFound ? "state.db found" : "state.db not found — Hermes may not have run yet on the remote")
.font(.caption)
.foregroundStyle(dbFound ? Color.secondary : Color.orange)
if dbFound {
Text("state.db readable")
.font(.caption)
.foregroundStyle(.secondary)
} else if let suggestion = suggestedHome {
// Scarf found Hermes data at one of the common
// alternate paths. One-click fill the
// remoteHome field so the user doesn't have to
// know this is a convention thing.
VStack(alignment: .leading, spacing: 4) {
Text("state.db not found at the default location, but Scarf found one at:")
.font(.caption)
.foregroundStyle(.orange)
HStack {
Text(suggestion)
.font(.caption.monospaced())
.textSelection(.enabled)
Spacer()
Button("Use this") {
viewModel.remoteHome = suggestion
}
.controlSize(.small)
}
.padding(8)
.background(Color.yellow.opacity(0.12), in: RoundedRectangle(cornerRadius: 6))
}
} else {
Text("state.db not found at the configured path. Either Hermes hasn't run yet on this server, or it's installed at a non-default location — set the Hermes data directory field above.")
.font(.caption)
.foregroundStyle(.orange)
.fixedSize(horizontal: false, vertical: true)
}
}
case .failure(let message, let stderr, let command):
VStack(alignment: .leading, spacing: 6) {
@@ -8,47 +8,72 @@ import SwiftUI
struct ConnectionStatusPill: View {
let status: ConnectionStatusViewModel
@State private var showDetails = false
@State private var showDiagnostics = false
var body: some View {
Button {
switch status.status {
case .error:
showDetails = true
case .degraded:
// Yellow "can't read" state: open the diagnostics sheet
// so the user can see exactly which files fail and why.
showDiagnostics = true
case .connected, .idle:
status.retry()
}
} label: {
HStack(spacing: 4) {
Circle()
.fill(color)
.frame(width: 8, height: 8)
// Leading SF Symbol does double duty: its color is the status
// signal (green/orange/yellow/red), and its shape reads as a
// clickable toolbar tool. No custom background; the toolbar's
// `.principal` emphasis bezel is the frame.
HStack(spacing: 5) {
Image(systemName: iconName)
.foregroundStyle(color)
.symbolRenderingMode(.hierarchical)
Text(label)
.font(.caption)
.foregroundStyle(.secondary)
.lineLimit(1)
}
.padding(.horizontal, 6)
.padding(.vertical, 2)
.background(Color.secondary.opacity(0.08), in: Capsule())
.padding(.horizontal, 4)
}
.buttonStyle(.plain)
.help(tooltip)
.popover(isPresented: $showDetails, arrowEdge: .bottom) {
errorDetails.frame(width: 400)
}
.sheet(isPresented: $showDiagnostics) {
RemoteDiagnosticsView(context: status.context)
}
}
private var color: Color {
switch status.status {
case .connected: return .green
case .degraded: return .orange
case .idle: return .yellow
case .error: return .red
}
}
/// State-specific SF Symbol. The icon shape itself signals what the
/// click will do: checkmark for connected (click to re-probe),
/// stethoscope for degraded (click to run diagnostics), spinning
/// arrows for probing, triangle for error.
private var iconName: String {
switch status.status {
case .connected: return "checkmark.circle.fill"
case .degraded: return "stethoscope"
case .idle: return "arrow.triangle.2.circlepath"
case .error: return "exclamationmark.triangle.fill"
}
}
private var label: String {
switch status.status {
case .connected: return "Connected"
case .degraded: return "Connected — can't read Hermes state"
case .idle: return "Checking…"
case .error(let message, _): return message
}
@@ -62,6 +87,8 @@ struct ConnectionStatusPill: View {
return "Last probe: \(fmt.localizedString(for: ts, relativeTo: Date()))"
}
return "Connected"
case .degraded(let reason):
return "SSH works but \(reason). Click for diagnostics."
case .idle: return "Waiting for first probe"
case .error(_, _): return "Click for details"
}
@@ -6,6 +6,7 @@ struct ManageServersView: View {
@Environment(ServerRegistry.self) private var registry
@State private var showAddSheet = false
@State private var pendingRemoveID: ServerID?
@State private var diagnosticsContext: ServerContext?
var body: some View {
VStack(alignment: .leading, spacing: 0) {
@@ -17,12 +18,18 @@ struct ManageServersView: View {
list
}
}
.frame(width: 380, height: 360)
.frame(width: 440, height: 380)
.sheet(isPresented: $showAddSheet) {
AddServerSheet { name, config in
_ = registry.addServer(displayName: name, config: config)
}
}
.sheet(item: Binding(
get: { diagnosticsContext.map { IdentifiableContext(context: $0) } },
set: { diagnosticsContext = $0?.context }
)) { wrapper in
RemoteDiagnosticsView(context: wrapper.context)
}
.confirmationDialog(
"Remove this server?",
isPresented: Binding(
@@ -42,6 +49,13 @@ struct ManageServersView: View {
)
}
/// Wrapper because `ServerContext` isn't `Identifiable` against the sheet
/// item API in a way that preserves display-ordering stability.
private struct IdentifiableContext: Identifiable {
var id: ServerID { context.id }
let context: ServerContext
}
private var header: some View {
HStack {
Text("Servers").font(.headline)
@@ -87,6 +101,13 @@ struct ManageServersView: View {
}
}
Spacer()
Button {
diagnosticsContext = entry.context
} label: {
Image(systemName: "stethoscope")
}
.buttonStyle(.borderless)
.help("Run remote diagnostics — check exactly which files are readable on this server.")
Button {
pendingRemoveID = entry.id
} label: {
@@ -94,6 +115,7 @@ struct ManageServersView: View {
}
.buttonStyle(.borderless)
.foregroundStyle(.red)
.help("Remove this server from Scarf.")
}
.padding(.vertical, 4)
}
@@ -0,0 +1,203 @@
import SwiftUI
import AppKit
/// Per-server diagnostics sheet. Shown from Manage Servers and from the
/// Dashboard "Run Diagnostics" button when `lastReadError` is set. Gives
/// the user a specific list of what does/doesn't work over SSH, with
/// targeted remediation hints for each failure.
///
/// Design principle: a failing check always shows both the raw detail the
/// remote shell produced AND a human-written hint. The raw detail lets us
/// triage bug reports; the hint unblocks the user without a round trip.
struct RemoteDiagnosticsView: View {
let context: ServerContext
@State private var viewModel: RemoteDiagnosticsViewModel
@Environment(\.dismiss) private var dismiss
init(context: ServerContext) {
self.context = context
_viewModel = State(initialValue: RemoteDiagnosticsViewModel(context: context))
}
var body: some View {
VStack(spacing: 0) {
header
Divider()
probeList
Divider()
footer
}
.frame(minWidth: 640, minHeight: 520)
.task { await viewModel.run() }
}
private var header: some View {
VStack(alignment: .leading, spacing: 6) {
HStack {
Text("Remote Diagnostics — \(context.displayName)")
.font(.title3)
.fontWeight(.semibold)
Spacer()
if viewModel.isRunning {
ProgressView()
.controlSize(.small)
} else {
Button("Re-run") { Task { await viewModel.run() } }
.controlSize(.small)
}
}
HStack {
if viewModel.isRunning {
Text("Running checks…")
.font(.callout)
.foregroundStyle(.secondary)
} else {
Label(viewModel.summary, systemImage: viewModel.allPassed ? "checkmark.seal" : "info.circle")
.font(.callout)
.foregroundStyle(viewModel.allPassed ? .green : .orange)
}
Spacer()
if !viewModel.probes.isEmpty {
Button {
copyReportToClipboard()
} label: {
Label("Copy Full Report", systemImage: "doc.on.doc")
}
.controlSize(.small)
.help("Copy a plain-text summary of every check (passes and fails) — paste into GitHub issues so we can see everything at once.")
}
}
}
.padding(16)
}
private var probeList: some View {
ScrollView {
VStack(alignment: .leading, spacing: 0) {
if viewModel.probes.isEmpty && viewModel.isRunning {
Text("Running a single shell session on \(context.displayName) that exercises every path Scarf reads…")
.font(.callout)
.foregroundStyle(.secondary)
.padding()
}
ForEach(viewModel.probes) { probe in
probeRow(probe)
if probe.id != viewModel.probes.last?.id {
Divider()
}
}
}
}
}
private func probeRow(_ probe: RemoteDiagnosticsViewModel.Probe) -> some View {
HStack(alignment: .top, spacing: 12) {
Image(systemName: probe.passed ? "checkmark.circle.fill" : "xmark.circle.fill")
.foregroundStyle(probe.passed ? .green : .red)
.font(.title3)
.padding(.top, 2)
VStack(alignment: .leading, spacing: 4) {
Text(probe.id.title)
.font(.body)
if !probe.detail.isEmpty {
Text(probe.detail)
.font(.caption.monospaced())
.foregroundStyle(.secondary)
.textSelection(.enabled)
}
if !probe.passed, let hint = probe.id.failureHint {
HStack(alignment: .top, spacing: 6) {
Image(systemName: "lightbulb")
.foregroundStyle(.yellow)
.font(.caption)
Text(hint)
.font(.caption)
.foregroundStyle(.primary)
.textSelection(.enabled)
.fixedSize(horizontal: false, vertical: true)
}
.padding(8)
.background(Color.yellow.opacity(0.08))
.clipShape(RoundedRectangle(cornerRadius: 6))
}
}
Spacer(minLength: 0)
}
.padding(.horizontal, 16)
.padding(.vertical, 10)
}
private var footer: some View {
VStack(alignment: .leading, spacing: 8) {
// Raw-output disclosure. Shown whenever anything fails; we need
// this visible for partial failures too since the raw stdout is
// the only way to see WHY a check returned its detail. Hidden
// only when 14/14 pass (script worked, nothing to debug).
if !viewModel.probes.isEmpty, !viewModel.allPassed {
DisclosureGroup("Raw remote output (for debugging)") {
VStack(alignment: .leading, spacing: 6) {
Text("exit code: \(viewModel.rawExitCode)")
.font(.caption.monospaced())
if !viewModel.rawStdout.isEmpty {
Text("stdout:").font(.caption).foregroundStyle(.secondary)
ScrollView {
Text(viewModel.rawStdout)
.font(.system(size: 10, design: .monospaced))
.textSelection(.enabled)
.frame(maxWidth: .infinity, alignment: .leading)
}
.frame(maxHeight: 140)
}
if !viewModel.rawStderr.isEmpty {
Text("stderr:").font(.caption).foregroundStyle(.secondary)
ScrollView {
Text(viewModel.rawStderr)
.font(.system(size: 10, design: .monospaced))
.textSelection(.enabled)
.frame(maxWidth: .infinity, alignment: .leading)
}
.frame(maxHeight: 140)
}
}
.padding(8)
.background(Color.secondary.opacity(0.08), in: RoundedRectangle(cornerRadius: 6))
}
.font(.caption)
}
HStack {
Text("Scarf runs these over a single SSH session that mirrors the shell your dashboard reads from, so a green row here means Scarf can actually read that file at runtime.")
.font(.caption)
.foregroundStyle(.secondary)
.fixedSize(horizontal: false, vertical: true)
Spacer()
Button("Done") { dismiss() }
.keyboardShortcut(.defaultAction)
}
}
.padding(16)
}
private func copyReportToClipboard() {
var lines: [String] = []
lines.append("Scarf remote diagnostics — \(context.displayName)")
if case .ssh(let config) = context.kind {
lines.append("Host: \(config.host)" + (config.user.map { " (user: \($0))" } ?? ""))
if let rh = config.remoteHome { lines.append("Hermes home (override): \(rh)") }
}
lines.append("Ran at: \(viewModel.startedAt.map { ISO8601DateFormatter().string(from: $0) } ?? "?")")
lines.append("Result: \(viewModel.summary)")
lines.append("")
for probe in viewModel.probes {
let mark = probe.passed ? "PASS" : "FAIL"
lines.append("[\(mark)] \(probe.id.title)")
if !probe.detail.isEmpty { lines.append(" \(probe.detail)") }
if !probe.passed, let hint = probe.id.failureHint {
lines.append(" hint: \(hint)")
}
}
let text = lines.joined(separator: "\n")
let pb = NSPasteboard.general
pb.clearContents()
pb.setString(text, forType: .string)
}
}
+24 -3
@@ -6,12 +6,33 @@
//
import Testing
import Darwin
@testable import scarf
struct scarfTests {
@Suite struct ControlPathTests {
@Test func example() async throws {
// Write your test here and use APIs like `#expect(...)` to check expected conditions.
/// macOS's `sun_path` is 104 bytes including the NUL terminator. This test
/// budgets 64 characters for OpenSSH's `%C` hash token (a SHA-1 hex digest
/// is 40 characters, so 64 leaves headroom). The full bound socket
/// path is `<controlDir>/<%C>`, so the worst case is
/// `controlDir.utf8.count + 1 (separator) + 64 (hash) + 1 (NUL)`.
///
/// This test pins the invariant violated by the original Caches-based
/// path that triggered issue #19; any future change to `controlDirPath`
/// that drifts back over the limit will fail here instead of silently
/// breaking remote SSH for users with long `$HOME` strings.
@Test func controlDirPathFitsMacOSSocketLimit() {
let dir = SSHTransport.controlDirPath()
let worstCase = dir.utf8.count + 1 + 64 + 1
#expect(worstCase <= 104,
"ControlPath worst case is \(worstCase) bytes for dir '\(dir)'; macOS sun_path limit is 104")
}
/// `/tmp` is shared across all users on a Mac. The per-uid suffix is what
/// keeps two local users' control sockets from colliding. This test guards
/// against a refactor that drops it.
@Test func controlDirPathIsPerUser() {
let dir = SSHTransport.controlDirPath()
#expect(dir.contains("\(getuid())"),
"ControlPath '\(dir)' must include the current uid for per-user isolation")
}
}
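The byte budget that the Swift test above pins can also be checked by hand from a terminal. The following is a minimal sketch, not part of Scarf: `control_path_worst_case` and `fits_socket_limit` are hypothetical helper names, the 104-byte limit and the 64-byte hash budget are taken from the test's doc comment, and `/tmp/scarf-ssh-501` is only an illustrative directory for a user with uid 501.

```shell
# Sketch only: these helpers are hypothetical and do not exist in the Scarf repo.
SUN_PATH_MAX=104   # macOS sun_path size in bytes, NUL included (per the test above)
HASH_BUDGET=64     # bytes reserved for the %C expansion, same budget as the test

# Print the worst-case byte length of "<dir>/<hash>" plus the NUL terminator.
control_path_worst_case() {
  local dir_bytes
  # wc -c counts bytes, which is what sun_path is measured in (not characters).
  dir_bytes="$(printf '%s' "$1" | wc -c | tr -d ' ')"
  echo $(( dir_bytes + 1 + HASH_BUDGET + 1 ))
}

# Succeed only when the directory leaves enough room for the socket name.
fits_socket_limit() {
  [ "$(control_path_worst_case "$1")" -le "$SUN_PATH_MAX" ]
}

control_path_worst_case "/tmp/scarf-ssh-501"   # prints 84
```

With these numbers, any control directory longer than 38 bytes exceeds the limit, which is why a path under a long `$HOME` can fail while a short `/tmp` path does not.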
+254
@@ -0,0 +1,254 @@
#!/usr/bin/env bash
#
# Scarf wiki helper — wraps the local .wiki-worktree/ clone with a secret-scan.
#
# Usage:
# ./scripts/wiki.sh status # git status inside .wiki-worktree/
# ./scripts/wiki.sh pull # fetch + fast-forward; aborts if dirty
# ./scripts/wiki.sh new <Page-Name> # create Page-Name.md with stub template
# ./scripts/wiki.sh stub-check # list pages still containing the TODO stub
# ./scripts/wiki.sh commit "<msg>" [--force-terms] # secret-scan, then git add -A && git commit
# ./scripts/wiki.sh push [--force-terms] # secret-scan again, then git push
# ./scripts/wiki.sh touch <Page-Name> # bump "Last updated" line to today
# ./scripts/wiki.sh --help # this help
#
# The secret-scan has two tiers:
# - Hard patterns (tokens, keys, private-key headers): block with non-zero exit.
# - Soft keywords (password, api_key, secret_key, token, auth_token, bearer, each followed by = or : and a value):
# warn and require --force-terms on commit/push to proceed.
#
# A user-maintained blocklist lives at scripts/wiki-blocklist.txt (gitignored).
# One pattern per line — blank lines and lines starting with # are ignored.
# Matches are treated as HARD blocks. Use this for personal IPs, hostnames, etc.
#
# Bootstrap (one-time):
# 1. In GitHub repo Settings → Features → Wikis, enable Wikis and restrict
# editing to collaborators.
# 2. Visit https://github.com/awizemann/scarf/wiki and save any first page
# via the UI (this is what creates the underlying .wiki.git repo).
# 3. From repo root:
# git clone git@github.com:awizemann/scarf.wiki.git .wiki-worktree
#
# Recovery: if .wiki-worktree/ is deleted, re-run step 3. Remote is authoritative.
set -euo pipefail
# ---------- config ----------
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
WIKI_DIR="$REPO_ROOT/.wiki-worktree"
WIKI_REMOTE="git@github.com:awizemann/scarf.wiki.git"
BLOCKLIST="$REPO_ROOT/scripts/wiki-blocklist.txt"
STUB_MARKER='> **TODO: document.**'
MAINTENANCE_PAGE="Wiki-Maintenance.md"
# ---------- helpers ----------
log() { printf '\033[1;34m==> %s\033[0m\n' "$*"; }
warn() { printf '\033[1;33m[WARN] %s\033[0m\n' "$*" >&2; }
die() { printf '\033[1;31m[ERR] %s\033[0m\n' "$*" >&2; exit 1; }
need_worktree() {
[[ -d "$WIKI_DIR/.git" ]] || die "no wiki clone at $WIKI_DIR
Run: git clone $WIKI_REMOTE .wiki-worktree
(See --help for bootstrap steps.)"
}
in_wiki() { (cd "$WIKI_DIR" && "$@"); }
today() { date +%Y-%m-%d; }
# ---------- secret-scan ----------
#
# scan_hard: exits non-zero if any hard pattern or user-blocklist pattern matches.
# scan_soft: warns (returns 1) if any soft keyword matches, else 0. The caller
# decides whether to bail based on --force-terms.
#
# All scans ignore the .git directory of the wiki clone.
hard_regex='(sk-[A-Za-z0-9_-]{20,}|ghp_[A-Za-z0-9]{30,}|ghs_[A-Za-z0-9]{30,}|ghu_[A-Za-z0-9]{30,}|gho_[A-Za-z0-9]{30,}|ghr_[A-Za-z0-9]{30,}|github_pat_[A-Za-z0-9_]{20,}|xox[baprs]-[A-Za-z0-9-]{10,}|AKIA[0-9A-Z]{16}|AIza[0-9A-Za-z_-]{35}|-----BEGIN [A-Z ]*PRIVATE KEY-----|BEGIN OPENSSH PRIVATE KEY)'
# Soft tier targets KEY=value / KEY: value patterns where the key name suggests
# a secret. Catches accidental .env paste; ignores legitimate mentions of the
# words "password" / "secret" in prose.
soft_regex='([Pp]assword|[Aa]pi[_-]?[Kk]ey|[Ss]ecret[_-][Kk]ey|[Tt]oken|[Aa]uth[_-]?[Tt]oken|[Bb]earer)[[:space:]]*[=:][[:space:]]*['"'"'"]?[A-Za-z0-9_+./-]{8,}'
scan_hard() {
local hits
hits="$(cd "$WIKI_DIR" && grep -rInE --exclude-dir=.git "$hard_regex" . 2>/dev/null || true)"
if [[ -n "$hits" ]]; then
printf '%s\n' "$hits" >&2
die "hard-pattern secret match — aborting. Remove the above and retry."
fi
if [[ -f "$BLOCKLIST" ]]; then
local pat
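# The `|| [[ -n "$pat" ]]` clause keeps a final line that lacks a trailing newline.
while IFS= read -r pat || [[ -n "$pat" ]]; do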
[[ -z "$pat" || "$pat" =~ ^# ]] && continue
local blocklist_hits
blocklist_hits="$(cd "$WIKI_DIR" && grep -rInF --exclude-dir=.git -- "$pat" . 2>/dev/null || true)"
if [[ -n "$blocklist_hits" ]]; then
printf '%s\n' "$blocklist_hits" >&2
die "user blocklist match on pattern \"$pat\" — aborting."
fi
done < "$BLOCKLIST"
fi
}
scan_soft() {
local hits
# Exempt the maintenance page (it documents the forbidden terms on purpose).
hits="$(cd "$WIKI_DIR" && grep -rInE --exclude-dir=.git --exclude="$MAINTENANCE_PAGE" "$soft_regex" . 2>/dev/null || true)"
if [[ -n "$hits" ]]; then
printf '%s\n' "$hits" >&2
warn "soft-keyword matches above. Review carefully. Pass --force-terms to proceed."
return 1
fi
return 0
}
# ---------- commands ----------
cmd_status() {
need_worktree
in_wiki git status
}
cmd_pull() {
need_worktree
if [[ -n "$(in_wiki git status --porcelain)" ]]; then
die "wiki has uncommitted changes — commit or stash before pulling."
fi
log "Pulling wiki"
in_wiki git fetch origin
in_wiki git merge --ff-only origin/master
log "Up to date."
}
cmd_new() {
need_worktree
local name="${1:-}"
[[ -n "$name" ]] || die "usage: wiki.sh new <Page-Name>"
# Normalize spaces → dashes; strip .md if the user added it.
name="${name%.md}"
name="${name// /-}"
local path="$WIKI_DIR/${name}.md"
if [[ -e "$path" ]]; then
die "already exists: $path"
fi
local title="${name//-/ }"
cat > "$path" <<EOF
# ${title}
${STUB_MARKER} This page is a stub. See [Wiki Maintenance](Wiki-Maintenance).
---
_Last updated: $(today) — stub_
EOF
log "Created $path"
}
cmd_stub_check() {
need_worktree
# Wiki-Maintenance.md documents the stub template in a code fence, so it
# would always match the marker — exempt it.
local matches count
matches="$(cd "$WIKI_DIR" && grep -rlnF --exclude-dir=.git --exclude="$MAINTENANCE_PAGE" -- "$STUB_MARKER" . 2>/dev/null || true)"
if [[ -z "$matches" ]]; then
log "No stub pages remain."
return 0
fi
count="$(printf '%s\n' "$matches" | wc -l | tr -d ' ')"
log "$count stub page(s):"
printf '%s\n' "$matches" | sed 's|^\./||' | sort
}
cmd_touch() {
need_worktree
local name="${1:-}"
[[ -n "$name" ]] || die "usage: wiki.sh touch <Page-Name>"
name="${name%.md}"
name="${name// /-}"
local path="$WIKI_DIR/${name}.md"
[[ -f "$path" ]] || die "no such page: $path"
# Replace the YYYY-MM-DD portion of the Last updated line, keep whatever trails it.
# Uses sed -i '' for BSD/macOS compatibility.
sed -i '' -E "s/(_Last updated: )[0-9]{4}-[0-9]{2}-[0-9]{2}/\1$(today)/" "$path"
log "Touched ${name}.md → $(today)"
}
cmd_commit() {
need_worktree
local msg="" force=0
for arg in "$@"; do
case "$arg" in
--force-terms) force=1 ;;
-*) die "unknown flag: $arg" ;;
*) [[ -z "$msg" ]] && msg="$arg" || die "unexpected arg: $arg" ;;
esac
done
[[ -n "$msg" ]] || die 'usage: wiki.sh commit "<message>" [--force-terms]'
log "Secret-scan (hard patterns + blocklist)"
scan_hard
log "Secret-scan (soft keywords)"
if ! scan_soft; then
if [[ "$force" -eq 1 ]]; then
warn "proceeding past soft-keyword matches (--force-terms)"
else
die "soft-keyword matches — re-run with --force-terms to proceed."
fi
fi
in_wiki git add -A
if [[ -z "$(in_wiki git status --porcelain)" ]]; then
warn "nothing to commit"
return 0
fi
in_wiki git commit -m "$msg"
log "Committed: $msg"
}
cmd_push() {
need_worktree
local force=0
for arg in "$@"; do
case "$arg" in
--force-terms) force=1 ;;
*) die "unknown arg: $arg" ;;
esac
done
log "Secret-scan before push (hard patterns + blocklist)"
scan_hard
log "Secret-scan before push (soft keywords)"
if ! scan_soft; then
if [[ "$force" -eq 1 ]]; then
warn "proceeding past soft-keyword matches (--force-terms)"
else
die "soft-keyword matches — re-run with --force-terms to proceed."
fi
fi
# Nothing to push?
local ahead
ahead="$(in_wiki git rev-list --count @{u}..HEAD 2>/dev/null || echo 0)"
if [[ "$ahead" -eq 0 ]]; then
warn "nothing to push (0 commits ahead of origin)"
return 0
fi
log "Pushing $ahead commit(s) to origin"
in_wiki git push origin HEAD
log "Pushed. Verify at https://github.com/awizemann/scarf/wiki"
}
cmd_help() {
sed -n '2,32p' "$0" | sed 's/^# \{0,1\}//'
}
# ---------- dispatch ----------
sub="${1:-}"
shift || true
case "$sub" in
status) cmd_status "$@" ;;
pull) cmd_pull "$@" ;;
new) cmd_new "$@" ;;
stub-check) cmd_stub_check "$@" ;;
touch) cmd_touch "$@" ;;
commit) cmd_commit "$@" ;;
push) cmd_push "$@" ;;
-h|--help|help|"") cmd_help ;;
*) die "unknown subcommand: $sub (run --help)" ;;
esac
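One property of the scan functions above is worth noting: because each `grep` call ends in `2>/dev/null || true`, a `grep` failure (exit status 2, for example an unreadable tree) looks the same as "no matches" (exit status 1), so the scan passes. The sketch below shows one possible way to tell the two apart. It is an illustration only: `scan_tree` is a hypothetical helper, not a function in `wiki.sh`, and its return codes are an assumption made for this example.

```shell
# Sketch only: scan_tree is hypothetical and is not the script's scan_hard.
# Usage: scan_tree <dir> <extended-regex>
# Returns 0 when the tree is clean, 1 when a match is found,
# and 2 when grep itself failed (so the caller can fail closed).
scan_tree() {
  local dir="$1" regex="$2" hits status=0
  # Capture grep's own exit status instead of masking it with `|| true`.
  hits="$(grep -rInE --exclude-dir=.git -e "$regex" "$dir" 2>/dev/null)" || status=$?
  case "$status" in
    0) printf '%s\n' "$hits" >&2; return 1 ;;                  # grep found a match
    1) return 0 ;;                                             # grep found nothing
    *) echo "scan error (grep exit $status)" >&2; return 2 ;;  # grep failed
  esac
}
```

A caller could then abort on any non-zero result, treating a scan that could not run the same as a scan that found a secret.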