docs: in-app Template Catalog + dogfooding template-ideas/test-harness pages

Alan Wizemann
2026-05-03 20:56:37 +02:00
parent 2ec14de64a
commit 81f114a7d6
4 changed files with 268 additions and 0 deletions
+51
@@ -0,0 +1,51 @@
# Template Catalog (v2.8+)
The in-app **Template Catalog** browses every official `.scarftemplate` published at [awizemann.github.io/scarf/templates/](https://awizemann.github.io/scarf/templates/) without leaving Scarf. One click installs a template; the existing [Project Templates](Project-Templates) flow takes over from there.
## Opening the catalog
Sidebar → **Projects** → toolbar **Templates** menu → **Browse Catalog…**.
The sheet fetches the catalog from awizemann.github.io when you open it with no cache or a cache older than 24 hours, and again whenever you click the refresh icon (top-right of the sheet). Within that 24-hour window it serves the cached copy from `~/.hermes/scarf/catalog_cache.json`, so opening the sheet repeatedly doesn't re-hit the network. If both the network and the cache are unavailable, Scarf falls back to a hardcoded list of the official templates so the sheet is never empty on a fresh install.
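The network → cache → hardcoded-list fallback chain can be sketched as follows. This is a minimal Python illustration, not Scarf's Swift `CatalogService`. The cache envelope (`fetched_at` plus `catalog`), the `fetch` callable, and `BUILTIN_CATALOG` are all hypothetical stand-ins.

```python
import json
import time
from pathlib import Path

CACHE_TTL_SECONDS = 24 * 60 * 60  # the 24-hour refresh window

# Hypothetical stand-in for the hardcoded list of official templates.
BUILTIN_CATALOG = {"templates": [{"id": "awizemann/site-status-checker"}]}


def read_cache(cache_path):
    """Return the cached envelope dict, or None if missing or unreadable."""
    try:
        data = json.loads(Path(cache_path).read_text())
    except (OSError, ValueError):
        return None
    return data if isinstance(data, dict) else None


def load_catalog(cache_path, fetch, now=None, force_refresh=False):
    """Return (catalog, source).

    `fetch` is any callable that returns the decoded catalog.json or raises
    OSError (urllib's URLError is an OSError subclass) when offline.
    """
    now = time.time() if now is None else now
    cached = read_cache(cache_path)
    is_fresh = cached is not None and now - cached.get("fetched_at", 0) < CACHE_TTL_SECONDS
    if is_fresh and not force_refresh:
        return cached["catalog"], "cache"  # inside the window: no network hit
    try:
        catalog = fetch()
    except OSError:
        if cached is not None:
            return cached["catalog"], "stale-cache"  # offline: serve what we have
        return BUILTIN_CATALOG, "builtin"  # neither network nor cache
    Path(cache_path).parent.mkdir(parents=True, exist_ok=True)
    Path(cache_path).write_text(json.dumps({"fetched_at": now, "catalog": catalog}))
    return catalog, "network"
```

Passing `force_refresh=True` models the refresh icon. Every other open goes through the freshness check first.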
## Filtering
- **Search** — matches your text against template name, description, and tags (case-insensitive). Empty = no filter.
- **Category** — the picker restricts results to a single category (`monitoring`, `news`, `dev`, `ops`, `personal`, `finance`, …). It lists exactly the categories present in the loaded catalog, so you'll never see a category that has no matching template.
- **Sort** — official `awizemann/...` templates rank first; community templates follow alphabetically by name.
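The three filtering rules above can be sketched in Python as follows. The `Entry` shape is hypothetical. The order among official templates isn't specified above, so this sketch also sorts them by name.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    id: str  # "<author>/<slug>", e.g. "awizemann/site-status-checker"
    name: str
    description: str = ""
    category: str = ""
    tags: list = field(default_factory=list)


def filter_and_sort(entries, query="", category=None):
    q = query.strip().lower()

    def matches(e):
        if category is not None and e.category != category:
            return False
        if not q:
            return True  # empty search text means no filter
        # Case-insensitive match against name, description, and tags.
        return any(q in s.lower() for s in [e.name, e.description, *e.tags])

    # Official awizemann/... templates first, then alphabetical by name.
    return sorted(
        (e for e in entries if matches(e)),
        key=lambda e: (not e.id.startswith("awizemann/"), e.name.lower()),
    )


def picker_categories(entries):
    """Only categories that at least one loaded template actually uses."""
    return sorted({e.category for e in entries if e.category})
```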
## Install state
Each row shows a badge indicating how the template relates to your local install:
- *(no badge)* — not installed.
- **Installed v1.0.0** (green) — you have this template installed at exactly this version.
- **Update v1.1.0** (amber) — you have an older version installed; a new one is available in the catalog.
Update detection compares the catalog's `version` against `<project>/.scarf/template.lock.json`'s `template_version` for every project in `~/.hermes/scarf/projects.json`. The actual update flow is "uninstall the old version, install the new one" — Scarf doesn't yet ship a single-click in-place upgrade. Your config values survive the round-trip because they live in `<project>/.scarf/config.json` separately from the lock file.
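The version comparison can be sketched in Python as below. Two assumptions apply: each lock file carries a `template_id` key alongside `template_version`, which this page doesn't document, and versions are plain dotted integers. The sketch takes project directories directly because the `projects.json` schema isn't described here.

```python
import json
from pathlib import Path


def parse_version(v):
    """Naive dotted-integer parse: "1.10.0" -> (1, 10, 0). No pre-release tags."""
    return tuple(int(part) for part in v.split("."))


def install_state(catalog_entry, project_dirs):
    """Return None, ("installed", version) or ("update", version) for one row."""
    installed = []
    for project in project_dirs:
        lock = Path(project) / ".scarf" / "template.lock.json"
        try:
            data = json.loads(lock.read_text())
        except (OSError, ValueError):
            continue  # not a template install, or an unreadable lock file
        if data.get("template_id") != catalog_entry["id"]:  # assumed key
            continue
        installed.append(parse_version(data["template_version"]))
    if not installed:
        return None  # no badge
    latest = parse_version(catalog_entry["version"])
    if any(v < latest for v in installed):
        return ("update", catalog_entry["version"])  # amber badge
    return ("installed", catalog_entry["version"])  # green badge
```

Tuple comparison keeps `1.10.0` above `1.9.3`, which a plain string comparison would get wrong.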
## Detail page
Tap a row to push a detail page. The detail page surfaces only what the catalog already carries — no separate README fetch, no remote markdown — so the sheet stays snappy and works offline. You'll see:
- Description and tags (from the template's `template.json`).
- **What's inside** — a checklist of the bundle's contents (dashboard, AGENTS.md, cron jobs, configuration fields, memory appendix, skills).
- **Configuration** — every field the install form will ask you for, with the same description text the form will show. Secret fields are flagged with a lock icon.
- **Recommended model** — if the template author specified one. Scarf does NOT auto-switch your active model; this is informational only.
The big primary button at the bottom hands the template's HTTPS install URL straight to the existing [install sheet](Project-Templates#installing-a-template). From there the flow is identical to "Install from URL…" — pick a parent directory, fill in the configuration form, confirm.
## Where the catalog lives
The catalog is a single JSON file regenerated by `tools/build-catalog.py` on every release that adds or updates a template. The file format is documented in [`templates/CONTRIBUTING.md`](https://github.com/awizemann/scarf/blob/main/templates/CONTRIBUTING.md). Scarf reads it from a hardcoded URL — there's no Settings field to change the source today (deliberate; one canonical catalog keeps the namespace honest). If you want to test against a forked or local catalog, point your dev Scarf build at it by editing `CatalogService.baseURL` in source.
## Privacy
Opening the sheet hits `https://awizemann.github.io/scarf/templates/catalog.json` once per refresh window (24 hours). The request is anonymous — no token, no identifier. Same shape as Sparkle's update check. See [Privacy Policy](Privacy-Policy) for the full picture of what Scarf accesses.
## Limitations (v2.8)
- **No screenshots / hero images.** The catalog ships text-only metadata. We may add image support once the template count justifies the visual differentiation work.
- **No favorites / starring.** Browse, install, uninstall — no per-user preferences carried with you.
- **No background refresh.** The catalog refreshes only when you open the sheet (with a stale cache) or click refresh. No on-launch network call, no timer.
- **No README rendering.** The detail page uses the description from `catalog.json` only. Full READMEs ship inside each `.scarftemplate` and become visible when you start the install flow.
+97
@@ -0,0 +1,97 @@
# Template Ideas
We track candidate `.scarftemplate` ideas here. Each Scarf release cycle we ship one official template from this list as part of the pre-release dogfooding pass — see [Test Harness](Test-Harness) for how that works in practice. Once shipped, templates surface in the in-app [Template Catalog](Template-Catalog).
If you have an idea, [open an issue](https://github.com/awizemann/scarf/issues/new) or jump straight into [`templates/CONTRIBUTING.md`](https://github.com/awizemann/scarf/blob/main/templates/CONTRIBUTING.md). Community templates are merged via PR after `tools/build-catalog.py --check` passes.
## Status legend
- 🟢 **Shipped** — published in the catalog at [awizemann.github.io/scarf/templates/](https://awizemann.github.io/scarf/templates/).
- 🟡 **In progress** — being built this release cycle.
- **Idea** — fits today's primitives (config + cron + dashboard + optional webview-to-public-URL).
- 🔵 **Blocked** — needs a core Scarf feature that doesn't exist yet (uploads, local server, OAuth provider plug-in, …).
## Shipped
| Template | Author | Version | Highlights |
|----------|--------|---------|------------|
| 🟢 [Site Status Checker](https://awizemann.github.io/scarf/templates/awizemann/site-status-checker/) | awizemann | 1.1.0 | Configurable URL list, daily uptime check, Site tab webview |
| 🟢 [Template Author](https://awizemann.github.io/scarf/templates/awizemann/template-author/) | awizemann | 1.0.0 | Meta-template — scaffolds new Scarf templates conversationally |
| 🟡 [HackerNews Daily Digest](https://github.com/awizemann/scarf/tree/dogfooding-templates/templates/awizemann/hackernews-digest) | awizemann | 1.0.0 | First template built under the dogfooding-templates pass — no secrets, public Firebase API |
## Backlog
All entries below fit today's capabilities (config + cron + dashboard, no uploads, no local server). The "Secrets" column flags whether the template needs a Keychain-backed config field. Templates with no secrets are easier first builds because the test harness doesn't need to fake-Keychain anything.
| # | Idea | Category | Secrets | Notes |
|---|------|----------|---------|-------|
| 1 | GitHub Repo Health Tracker | dev | GitHub PAT | Open issues / stale PRs / last-release-age across a list of repos. High dogfood value — every Hermes user has GitHub repos. |
| 2 | Crypto / Stock Portfolio | finance | optional API key | List of holdings → live price + P/L. Showcases the `chart` widget (value over time, persisted to a status log). |
| 3 | RSS / Newsletter Watcher | news | none | List of feed URLs → new-since-last-check → digest. Builds on Hermes's good summarization. |
| 4 | Domain + TLS Cert Expiry | ops | none | `whois` + `openssl s_client` per domain → days-till-expiry stats. Pure shell, exercises Hermes's shell-tool. |
| 5 | Local Disk + Folder Health | ops | none | Watch path sizes / disk free / largest-files-changed. All local I/O. Useful for the dev mac. |
| 6 | Weather + AQI Board | personal | none | List of `(label, lat, lon)` → daily forecast + air quality via open-meteo / openaq. |
| 7 | Spotify Listening Stats | personal | OAuth via Hermes | Reuses `hermes auth spotify` (added in v0.11). Recently-played + top-tracks digest. Tokens never leave Hermes. |
| 8 | CI Status Board | dev | GitHub PAT | GitHub Actions runs across configured repos. Failure-focused; chart of failure rate over time. |
| 9 | Obsidian Vault Stats | personal | none | Note count, daily-note streak, tag distribution from a configured vault path. Filesystem-only. |
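Backlog item 4 computes days-till-expiry per domain with `openssl s_client`. The sketch below swaps that shell pipeline for Python's standard `ssl` module to show the arithmetic. The function names are hypothetical, and `fetch_not_after` needs network access.

```python
import socket
import ssl
import time


def days_until(not_after, now=None):
    """Whole days from `now` until a certificate's notAfter string.

    `not_after` uses the C-locale GMT format that getpeercert() returns,
    e.g. "Jan  1 00:00:00 2030 GMT". Negative means already expired.
    """
    now = time.time() if now is None else now
    return int((ssl.cert_time_to_seconds(not_after) - now) // 86400)


def fetch_not_after(host, port=443, timeout=10.0):
    """Read the leaf certificate's notAfter over a verified TLS handshake.

    A verified handshake raises ssl.SSLCertVerificationError for an already
    expired certificate, which a template could report as "expired".
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Usage: `days_until(fetch_not_after("example.com"))` yields the value a dashboard stat widget would display.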
## v3 Epic — Project Site as Living Surface
🔵 **Blocked on core features.** The eBay Listings Manager idea (drop photos + condition notes → agent prices via eBay's APIs → drafts a listing the user reviews + publishes from Scarf) is the motivating use case for a broader v3 capability: **make project sites two-way.** Today's `webview` widget renders a single URL; v3 turns it into an interactive surface backed by a per-project loopback HTTP server.
### Components
| Component | Description |
|-----------|-------------|
| **Per-project loopback HTTP server** | New `ProjectSiteServer` service wraps `Network.NWListener`, binds 127.0.0.1 on a random port written into `<projectDir>/.scarf/site.runtime.json` (gitignored). One process-wide instance multiplexes by project ID. Dies on app quit. |
| **Bearer-auth + CSP** | Server requires `Authorization: Bearer <token>` on every request; token generated at first project open, stored in Keychain (`com.scarf.project.<slug>:site-token`), injected into the served page via a templated header. Default CSP forbids any non-loopback origin. |
| **Static + dynamic routes** | `GET /<path>` serves files from `<projectDir>/site/`; `POST /api/<endpoint>` is dispatched to handlers the template registers in `template.json` under a new `site.endpoints[]` block: `{path, method, handler: "append-json-line" \| "save-upload" \| "trigger-hermes-prompt"}`. v3 starts with these three handlers — no arbitrary code execution. |
| **Upload handler** | `multipart/form-data` parsed; files land at `<projectDir>/uploads/<uuid>/<original-name>`; manifest entry appended to `<projectDir>/uploads/index.json`. |
| **Trigger handler** | Posts a JSON-RPC message to Hermes via the existing webhook gateway, with a templated prompt + the upload UUID. The agent picks up the work and writes back to the dashboard. Reuses Hermes's existing webhook infra rather than inventing a new IPC channel. |
| **OAuth provider plug-in** | Generalizes the existing `SpotifyAuthFlow` shape: a template can declare `auth.providers: ["ebay"]`, and Scarf renders a "Connect eBay" button that runs `hermes auth ebay` (Hermes side adds the provider). Tokens never leave Hermes; the template addresses the connection by provider name only. |
| **Webview ↔ server bridge** | The existing `webview` widget gains a `siteRoot: true` option that points at the loopback server instead of an external URL. The page can call `fetch('/api/...')` because it is served from the same origin as the API (the bound loopback port). |
| **Manifest changes** | `template.json` `schemaVersion` bumps to **4** (gated; pre-v4 hosts refuse to install). New `site` and `auth` blocks. `tools/build-catalog.py` mirrors the new schema. |
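Two rules in the component table lend themselves to a pure-function sketch: every request must carry the bearer token, and only the three named handlers may be registered. The Python below is a hypothetical model of those rules, not `ProjectSiteServer`. It assumes each `site.endpoints[]` entry's `path` is the full `/api/...` path, and it omits the static-file serving and path sanitizing a real server would need.

```python
import hmac

ALLOWED_HANDLERS = {"append-json-line", "save-upload", "trigger-hermes-prompt"}


def validate_endpoints(site_block):
    """Build a (method, path) -> handler table, rejecting anything unnamed."""
    routes = {}
    for ep in site_block.get("endpoints", []):
        handler = ep.get("handler")
        if handler not in ALLOWED_HANDLERS:
            raise ValueError(f"unsupported handler: {handler!r}")
        path = ep.get("path", "")
        if not path.startswith("/api/"):
            raise ValueError(f"endpoint path must live under /api/: {path!r}")
        routes[(ep.get("method", "POST").upper(), path)] = handler
    return routes


def authorize(headers, expected_token):
    """Check `Authorization: Bearer <token>` with a constant-time compare."""
    supplied = headers.get("Authorization", "")
    prefix = "Bearer "
    if not supplied.startswith(prefix):
        return False
    return hmac.compare_digest(supplied[len(prefix):].encode(), expected_token.encode())


def route(method, path, headers, routes, expected_token):
    """Return (status, handler_or_None) for one request."""
    if not authorize(headers, expected_token):
        return 401, None  # every request needs the token, static ones included
    if path.startswith("/api/"):
        handler = routes.get((method.upper(), path))
        return (200, handler) if handler else (404, None)
    if method.upper() == "GET":
        return 200, "static"  # would be served from <projectDir>/site/
    return 405, None
```

Rejecting unknown handler names at install time, rather than per request, keeps templates from smuggling in a code-execution route.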
### Risks
Serving HTML in a per-project sandbox, processing user uploads, and holding OAuth tokens are each independent security boundaries. The epic explicitly calls out:
- **Threat-model review** before merge. Loopback only is necessary but not sufficient — local CORS, malicious tabs in the user's browser, and other co-resident processes are all attack surfaces.
- **Feature flag** in `config.yaml` (`scarf.experimental.project-site-server: true`). Off by default; users opt in.
- **No arbitrary handlers in v3.** The three named handlers (`append-json-line`, `save-upload`, `trigger-hermes-prompt`) cover ~90% of legitimate use cases without giving templates a code-execution channel.
### Canonical use case: eBay Listings Manager
User flow once v3 ships:
1. Drop a photo into the Site tab. The template's HTML form `POST`s to `/api/uploads`, and the `save-upload` handler stores it under `<projectDir>/uploads/`.
2. Add a one-line condition / notes field; `POST /api/items` adds an entry to `<projectDir>/items/<uuid>.json` via `append-json-line`.
3. The form's "Suggest pricing" button hits `POST /api/research` which `trigger-hermes-prompt` translates into a Hermes prompt: *"For the item described in `<projectDir>/items/<uuid>.json`, fetch comparable completed-listing prices from eBay's Browse API and write a price recommendation back to the item file."* The agent runs, writes the result, dashboard refreshes.
4. User reviews price + photos in the Site tab; clicks "Create draft listing." Another `trigger-hermes-prompt` calls eBay's Trading API `AddItem` with `DraftFlag=true`, capturing the draft ID.
5. Listing tracking: a daily cron polls `GetMyeBaySelling` for status + `GetMessages` for buyer questions, populating dashboard widgets (chart of sell-through rate, list of buyer messages).
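Steps 1 and 2 rely on the `save-upload` and `append-json-line` handlers. Their on-disk effects can be sketched in Python as follows. The function names are hypothetical. The sketch also assumes `uploads/index.json` is a JSON array and that "append-json-line" means JSON Lines (one object per line); neither detail is specified above.

```python
import json
import uuid
from pathlib import Path


def save_upload(project_dir, original_name, data):
    """Store bytes at <projectDir>/uploads/<uuid>/<original-name> and index them."""
    safe_name = Path(original_name).name  # drop client-supplied directory parts
    if safe_name in ("", ".", ".."):
        raise ValueError("invalid file name")
    upload_id = str(uuid.uuid4())
    uploads = Path(project_dir) / "uploads"
    target_dir = uploads / upload_id
    target_dir.mkdir(parents=True)
    (target_dir / safe_name).write_bytes(data)
    index_path = uploads / "index.json"
    index = json.loads(index_path.read_text()) if index_path.exists() else []
    index.append({"id": upload_id, "name": safe_name, "size": len(data)})
    index_path.write_text(json.dumps(index, indent=2))
    return upload_id


def append_json_line(path, record):
    """Append one JSON object as a single line."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```

Stripping the name to its last component is what keeps an upload inside the project directory, which is the sandboxing the out-of-scope list below depends on.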
### Out of v3 scope
Even with the epic done:
- Arbitrary JS execution in the served page (handlers are named + bounded).
- Cross-project navigation (each project's loopback server is sandboxed to its own dir).
- Served pages reading other projects' data.
- Anything that talks to non-loopback addresses.
- Auto-publishing listings without user confirmation (drafts only; user always reviews + clicks Publish in eBay's own UI or via a future `respond-to-buyer` named handler).
### Other v3+ ideas (one-paragraph treatments)
- 🔵 **Gmail Triage / Calendar Today** — needs OAuth + token refresh model. Pattern overlaps with Spotify's existing flow but Gmail's token churn is harder.
- 🔵 **Linear / Jira / Notion mirrors** — needs OAuth provider plug-ins (same v3 epic) plus a webhooks-back-into-Scarf flow we don't have.
## Submitting an idea
Open an issue in [awizemann/scarf](https://github.com/awizemann/scarf) tagged `template-idea`. Include:
- One-line summary.
- Category (dev / ops / news / personal / finance).
- Whether it needs secret config (and if so, what kind — API key, OAuth, etc.).
- Whether it fits today's capabilities or needs a v3 core feature (the table above is a good template).
If you're ready to build, see [`templates/CONTRIBUTING.md`](https://github.com/awizemann/scarf/blob/main/templates/CONTRIBUTING.md). The catalog validator (`tools/build-catalog.py`) is the same one CI runs on PRs.
+117
@@ -0,0 +1,117 @@
# Test Harness
Each pre-release cycle Scarf runs an end-to-end harness that drives the app against an isolated Hermes home, using a freshly-shipped `.scarftemplate` from the [Template Ideas backlog](Template-Ideas) as fixture data. The act of authoring + installing + exercising a new template *is* the regression test.
This page is for contributors extending the harness. Users wanting to install templates should read [Project Templates](Project-Templates).
## Goal
A pre-release run is a single `xcodebuild test` invocation against the dev Mac. Green = the install / configure / dashboard journey works end-to-end against the new template. Red = a screenshot+log bundle lands under `derivedData/Logs/Test/` for triage.
The harness explicitly does NOT cover:
- **Chat / ACP streaming.** Non-deterministic and depends on a real model API key. Stays a manual smoke step.
- **Real Keychain prompts.** XCUITest can't accept Keychain dialogs. Templates with `secret`-typed config fields are exercised via Layer A only until we add a fake-Keychain shim.
- **Sparkle update checks.** The `--scarf-test-mode` launch arg suppresses them.
- **NSOpenPanel folder picking.** The install-sheet's parent-dir input is a `TextField`; tests type into it directly rather than driving the panel.
- **OAuth device flows.** Same reason — no programmatic accept.
## Architecture (two layers)
### Layer A — programmatic (Swift Testing)
Lives in [`scarf/scarfTests/TemplateE2ETests.swift`](https://github.com/awizemann/scarf/blob/main/scarf/scarfTests/TemplateE2ETests.swift).
Two suites:
1. **`HackerNewsDigestTemplateE2ETests`** — unpacks the shipped `.scarftemplate` from `templates/awizemann/hackernews-digest/`, validates manifest + config schema + cron prompt + dashboard against the same `ProjectTemplateService` the app uses at install time, and asserts the resulting `TemplateInstallPlan` would write the right files in the right places. This is what catches "a contributor changed the dashboard widget vocabulary and didn't update the template."
2. **`ScarfHermesHomeOverrideE2ETests`** — proves the `SCARF_HERMES_HOME` env-var override actually steers `ServerContext.local.paths`. It exists because every other test depends on that override; if it silently regressed, future UI tests would start writing to the user's real `~/.hermes`. Catching that costs one tiny suite that runs in milliseconds.
The pattern for a third template: copy `HackerNewsDigestTemplateE2ETests`, change the bundle name, swap the assertions for the new template's shape. Five-ish minutes per added template.
### Layer B — UI driving (XCUITest)
Lives in [`scarf/scarfUITests/TemplateInstallUITests.swift`](https://github.com/awizemann/scarf/blob/main/scarf/scarfUITests/TemplateInstallUITests.swift).
**Layer B drives the dev Mac's real `~/.hermes/` installation, not an isolated tmpdir.** The XCUITest runner is sandboxed (it can read `~/.hermes/` but not write to it), which kills the snapshot-and-restore-from-the-runner pattern. Instead, the harness:
- Reads `~/.hermes/scarf/projects.json` and `~/.hermes/cron/jobs.json` for assertions.
- Drives the install/uninstall through the app's UI — which runs unsandboxed and has full disk access.
- Uses a tagged cron-job-name prefix (`[tmpl:awizemann/hackernews-digest]`) and a tmpdir parent (`/tmp/scarf-uitest-…`) so cleanup-on-crash is bounded: at worst, an orphan project dir under `/tmp` (auto-reaped) and one tagged cron job (`hermes cron remove …` recovers).
- Skips entirely (XCTSkip) on a Mac without `hermes` at `~/.local/bin/hermes`.
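Because cleanup keys off the name prefix, finding leftovers after a crashed run is a simple scan. The Python below sketches that scan. It assumes a `jobs.json` layout (either a bare list of jobs or `{"jobs": [...]}`, each with a `name`) that this page doesn't document. It only lists the matches; removal still goes through `hermes cron remove …`.

```python
import json
from pathlib import Path

TAG_PREFIX = "[tmpl:awizemann/hackernews-digest]"


def tagged_jobs(jobs_path, prefix=TAG_PREFIX):
    """Return cron-job entries whose name starts with the harness tag."""
    try:
        data = json.loads(Path(jobs_path).expanduser().read_text())
    except FileNotFoundError:
        return []
    # Assumed layout: a bare list of jobs, or {"jobs": [...]}.
    jobs = data.get("jobs", []) if isinstance(data, dict) else data
    return [job for job in jobs if str(job.get("name", "")).startswith(prefix)]
```

Usage: `tagged_jobs("~/.hermes/cron/jobs.json")` lists the orphans a developer would then remove by hand.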
Each test launches the Mac app with:
```
launchArguments: --scarf-test-mode
```
The app activates, then the harness sends ⌘1 (the "Open Server → Local" menu shortcut) to surface a window. That step is needed because SwiftUI's `WindowGroup(for: ServerID.self)` doesn't auto-open a window under `XCUIApplication.launch()`, and the keyboard shortcut takes the same code path real users hit with a Dock click. The install/configure/dashboard journey then runs via accessibility identifiers, using the v2.8+ in-app [Template Catalog](Template-Catalog) as the entry surface:
```
1. Click the Templates toolbar menu (templates.toolbar.menu)
2. Click "Browse Catalog…" (templates.browseCatalog)
3. (optional) Type into the catalog search field (catalog.searchField)
4. Tap the row matching the template under test (catalog.row.<detailSlug>)
5. Click Install on the detail page (catalogDetail.installButton)
6. Type a tmpdir path into the parent directory (templateInstall.parentDir.field)
7. Click Continue (templateInstall.parentDir.continue)
8. Configure step: leave defaults, click Continue (config sheet — TODO IDs)
9. Click Install in the confirm sheet (templateInstall.confirmInstall)
10. Wait for project to surface; verify dashboard rendered
```
Today (v2.7) the suite only covers steps 1–2 of that journey, via a smoke test that proves the harness can launch the app and surface a window. v2.8 lands the catalog browser + IDs so steps 3–5 become drivable; v2.9 adds the configure-step IDs to close the gap. Layer A's `TemplateE2ETests`/`CatalogServiceTests`/`CatalogViewModelTests` already cover the non-UI install path end-to-end, so the regression value of the half-driven Layer B today is bounded: launch + window-surface checks only.
Two alternate Layer B entry points (still using the v2.7 IDs) bypass the catalog and go through the older install paths — useful for tests targeting the install pipeline itself rather than the catalog discovery surface:
- **Install from URL** — `templates.toolbar.menu` → `templates.installFromURL` → `templates.installURL.field` → `templates.installURL.confirm`.
- **Install from File** — `templates.toolbar.menu` → `templates.installFromFile` → (NSOpenPanel — not drivable from XCUITest, so this path is for manual smoke tests only).
A single screenshot is captured on success; on failure the test framework auto-attaches the rendered tree, console log, and a screenshot for triage.
## Adding a template to the harness
1. **Create the bundle.** Mirror `templates/awizemann/hackernews-digest/staging/` for shape: `template.json`, `README.md`, `AGENTS.md`, `dashboard.json`, optional `cron/jobs.json`. Run `python3 tools/build-catalog.py --check` until green.
2. **Add Layer A coverage.** In `TemplateE2ETests.swift`, add a new `@Test` function that calls the same parse + plan + assert pattern as `hackernewsDigestParsesAndPlans()`. Adjust assertions to match your template's manifest, config schema, and dashboard shape. ~30 minutes.
3. **Add accessibility IDs only for new widget shapes.** Most templates reuse the dashboard widget vocabulary (`stat`, `list`, `text`, etc.) — those already have IDs. If your template introduces a new install-time UX (e.g. the first secret field, the first OAuth provider), add IDs for the new controls and centralize them in `UITestIdentifiers.swift`.
4. **Extend Layer B only when warranted.** Every shipped template inherits the existing install / configure / dashboard XCUITest. You only need to write new XCUITest code if the install flow itself changes — e.g. a template that introduces a secret field would justify a new test that exercises the fake-Keychain shim once we add it.
5. **Run locally.** `xcodebuild test -scheme scarf -destination 'platform=macOS'` should be green before you open the PR.
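Before reaching for `tools/build-catalog.py --check`, which remains the authoritative validator, a quick shape check on the staging directory can catch missing files early. The Python below is a hypothetical pre-check built only from the file list in step 1. It verifies that the required files exist and that the JSON files parse, and nothing more.

```python
import json
from pathlib import Path

REQUIRED = ["template.json", "README.md", "AGENTS.md", "dashboard.json"]
OPTIONAL_JSON = ["cron/jobs.json"]


def precheck_staging(staging_dir):
    """Return a list of problems; an empty list means the basic shape looks right."""
    root = Path(staging_dir)
    problems = [f"missing {name}" for name in REQUIRED if not (root / name).is_file()]
    for name in ["template.json", "dashboard.json", *OPTIONAL_JSON]:
        path = root / name
        if path.is_file():
            try:
                json.loads(path.read_text())
            except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
                problems.append(f"{name}: invalid JSON ({exc})")
    return problems
```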
## Launch arguments + env vars
| Variable / Argument | Where read | Effect |
|---------------------|------------|--------|
| `SCARF_HERMES_HOME` (env var) | `HermesProfileResolver.resolveLocalHome()` | Pins the resolver to the supplied absolute path. Bypasses both the cache and the `active_profile` lookup. Must be absolute — relative paths are rejected with a warning. **Layer A only** — Layer B drives the real `~/.hermes` because the sandboxed XCUITest runner can't write to a tmpdir-style `SCARF_HERMES_HOME` either. |
| `--scarf-test-mode` (launch arg) | `TestModeFlags.shared.isTestMode` at app init | Currently gates: `UpdaterService` initializes Sparkle inert (no on-launch update prompt). Future gating sites: Hermes capability live-probe, first-run walkthrough — added incrementally as the harness exercises them. |
| `--scarf-test-fake-keychain` (launch arg) | (planned, lands with the first secret-bearing template) `ProjectConfigKeychain` | Routes Keychain reads/writes through a process-local in-memory store so XCUITest can drive secret-field templates without the OS prompting. |
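The `SCARF_HERMES_HOME` precedence rule reads most clearly as code. The Python below is a minimal model of that rule, not the Swift `HermesProfileResolver.resolveLocalHome()`. The warning text and the `default` parameter are hypothetical; `default` stands in for the cache and `active_profile` lookup the override bypasses.

```python
import os
import sys
from pathlib import Path


def resolve_local_home(environ=os.environ, default=None):
    """An absolute SCARF_HERMES_HOME wins outright; a relative one is ignored."""
    override = environ.get("SCARF_HERMES_HOME")
    if override:
        if os.path.isabs(override):
            return Path(override)  # bypasses cache and active_profile entirely
        print(f"warning: ignoring relative SCARF_HERMES_HOME={override!r}", file=sys.stderr)
    # Placeholder for the normal resolution path (cache, then active_profile).
    return default if default is not None else Path.home() / ".hermes"
```

Rejecting relative paths matters because a test process's working directory isn't stable, so a relative override could silently point somewhere unexpected.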
## Failure triage
When an XCUITest fails:
1. Open the latest `.xcresult` bundle under `derivedData/Logs/Test/` in Xcode.
2. The Failures tab shows the assertion + screenshot at the moment of failure.
3. The Activity log under each test step shows the full XCUITest event trace — which accessibility ID was queried, what was found, what the framework saw on the screen.
4. To reproduce interactively: open the test in Xcode, set a breakpoint at the failing step, hit Run. The app launches with the same env vars + launch args, and you can step through the click sequence by hand.
When a Layer A test fails:
1. The error message names the assertion that failed (e.g. `expected 3 fields in config schema, got 2`).
2. Read the new template's `template.json` against the assertion. Almost always the test caught real drift between the bundle and the test's expectations — fix one or the other.
## Why this shape, not other shapes
A few alternatives we considered + rejected:
- **Mocking `hermes` binary.** Too brittle — the binary's behavior changes per release, and we'd be maintaining a parallel implementation to mock. The harness uses the *real* Hermes binary on the dev Mac and assumes a working installation.
- **Isolated `SCARF_HERMES_HOME` for Layer B.** Tried and rejected: XCUITest runners on macOS are sandboxed even when the app under test isn't, so the runner can't populate a fixture tmpdir before launch. The seam still exists for Layer A's unit-level isolation; Layer B drives the real `~/.hermes/` with a tagged-cron-job cleanup story instead.
- **Visual diffing.** Image-diff infrastructure is its own multi-week project. Screenshots get captured for human review only; no pixel asserts.
- **Single-suite unit-test only.** Fast and hermetic but doesn't prove the UI actually renders. The two-layer split keeps the bulk of regression value in fast Layer A while still having a thin "did the install button do what the user thinks?" gate in Layer B.
- **Automating chat / ACP.** Non-deterministic; would block every release on flaky output. Stays a manual smoke step.
## Status
The harness is being built on the [`dogfooding-templates` branch](https://github.com/awizemann/scarf/tree/dogfooding-templates) under the issue tracking the v2.7 cycle. Initial scope is Layer A + the env-var seam + the HN Digest fixture. Layer B + accessibility IDs land in the same release.
When this page reads "the harness is in production," you can run the entire pre-release sweep with `xcodebuild test -scheme scarf -destination 'platform=macOS'` and have confidence equivalent to a manual click-through.
+3
@@ -17,6 +17,8 @@
- [Memory & Skills](Memory-and-Skills)
- [Projects & Profiles](Projects-and-Profiles)
- [Project Templates](Project-Templates)
- [Template Catalog](Template-Catalog)
- [Template Ideas](Template-Ideas)
- [Platforms / Personalities / Quick Commands](Platforms-Personalities-QuickCommands)
- [Servers & Remote](Servers-and-Remote)
- [MCP, Plugins, Webhooks, Tools](MCP-Servers-Plugins-Webhooks-Tools)
@@ -37,6 +39,7 @@
- [Adding a Feature Module](Adding-a-Feature-Module)
- [Adding a Service](Adding-a-Service)
- [Testing](Testing)
- [Test Harness](Test-Harness)
- [Release Process](Release-Process)
**Reference**