docs: Performance Monitoring (ScarfMon) reference page

Documents the ScarfMon perf instrumentation harness shipped in v2.7.

Covers:
- User capture recipe (Settings → Diagnostics → Performance, Reset, run scenario, Copy as JSON)
- Mode tradeoffs (off / signpostOnly default / full)
- Privacy stance: no remote upload, StaticString names, symbol-only stack frames
- Catalog of every measure point shipping today (chat render, chat stream, session load, sqlite, transport, disk I/O)
- Developer guide for adding new measure points
- Architecture diagram + ring-buffer sizing
- When to use Instruments instead

Linked from Home and Core-Services so it's discoverable.
Alan Wizemann
2026-05-04 23:29:25 +02:00
parent 81f114a7d6
commit 53e4c8efc0
3 changed files with 226 additions and 0 deletions
+4
@@ -76,6 +76,10 @@ In v2.5 most service code moved out of the Mac target into the shared **ScarfCor
See [ScarfCore Package](ScarfCore-Package) for the package architecture and how to add a new shared service.
## Performance instrumentation (ScarfMon, v2.7+)
A separate harness lives at [`ScarfCore/Diagnostics/`](https://github.com/awizemann/scarf/tree/main/scarf/Packages/ScarfCore/Sources/ScarfCore/Diagnostics) — `ScarfMon.measure` / `measureAsync` / `event` wrap hot call sites in the chat path, transport, SQLite backend, and disk I/O. Three modes (`off`, `signpostOnly` (default), `full`) controlled from the in-app Diagnostics → Performance panel; the default is effectively free outside an Instruments session. See [Performance Monitoring](Performance-Monitoring) for the full reference, including the user capture recipe and the developer guide for adding new measure points.
## Patterns shared across the layer
- **`ServerContext` parameterizes all I/O.** Services receive the context at init; routing local vs. SSH happens through `context.transport`. See [Transport Layer](Transport-Layer).
+1
@@ -18,6 +18,7 @@ A native macOS companion app for the [Hermes AI agent](https://github.com/hermes
- **[Slash Commands](Slash-Commands)** — author project-scoped slash commands (v2.5+)
- **[Design System](Design-System)** — ScarfColor / ScarfFont / components reference
- [Architecture Overview](Architecture-Overview) — MVVM-F, services, transport, ScarfCore
- [Performance Monitoring](Performance-Monitoring) — ScarfMon: opt-in perf instrumentation, how to capture a baseline
- [Servers & Remote](Servers-and-Remote) — adding remote Hermes hosts over SSH
- [Localization](Localization) — supported languages + how to contribute a new one
- [Release Notes Index](Release-Notes-Index) — every version's notes
+221
@@ -0,0 +1,221 @@
# Performance Monitoring (ScarfMon)
Scarf ships a built-in performance instrumentation harness called **ScarfMon**: lightweight signposts are on by default, and in-app recording is opt-in. It records timing samples and event counts at known hot spots so users hitting "feels slow" can capture a baseline, share it with maintainers, and have a concrete signal to act on instead of a vague report.
This page is for both groups:
- **Users** who want to help diagnose perceived slowness.
- **Developers** who want to add measure points to their own code or extend the harness.
## TL;DR
- **It's effectively free by default.** The default mode (`signpostOnly`) emits Apple `os_signpost` events, which the runtime elides outside an Instruments session; `off` reduces each call to one branch and a return.
- **It's privacy-respecting.** Sample names are `StaticString` (compile-time literals), so user content can't leak through metric tags. Nothing leaves the device unless the user explicitly hits **Copy as JSON**.
- **It's open source.** All the plumbing lives in [`ScarfCore/Diagnostics/`](https://github.com/awizemann/scarf/tree/main/scarf/Packages/ScarfCore/Sources/ScarfCore/Diagnostics).
## How to turn it on
### Mac
`Settings → Advanced → Performance Diagnostics`. The picker has three modes:
| Mode | What it records | Cost |
|---|---|---|
| **Off** | Nothing | One branch + return per call |
| **Signpost only** (default) | `os_signpost` events to the `com.scarf.mon` subsystem | Zero outside Instruments |
| **Full** | Signposts + a 4096-entry in-memory ring + `os.Logger` debug stream | One ring write per call |
Switching to **Full** unlocks the in-app summary table (top 20 buckets by p95) and the **Copy as JSON** button.
### iOS
`Settings → Diagnostics → Performance`. Same three-mode picker, same panel layout, same `Copy as JSON` action. The mode is persisted across launches in `UserDefaults` under key `ScarfMonMode`.
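As a minimal sketch of how a three-mode setting can round-trip through `UserDefaults`: only the key `ScarfMonMode` and the three mode names come from this page. The `MonMode` enum, its raw values, and both helper functions are hypothetical, and the real implementation may store the value differently.

```swift
import Foundation

// Hypothetical mirror of the three modes; the raw values are assumptions.
enum MonMode: String {
    case off, signpostOnly, full
}

let modeKey = "ScarfMonMode" // key name taken from this page

// Read the persisted mode, falling back to the documented default.
func loadMode(from defaults: UserDefaults) -> MonMode {
    guard let raw = defaults.string(forKey: modeKey),
          let mode = MonMode(rawValue: raw) else {
        return .signpostOnly
    }
    return mode
}

// Persist the mode so it survives the next launch.
func saveMode(_ mode: MonMode, to defaults: UserDefaults) {
    defaults.set(mode.rawValue, forKey: modeKey)
}
```

Falling back to `signpostOnly` when the key is missing or unreadable keeps a fresh install on the documented default.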
### From a Terminal (any mode)
```bash
log stream --predicate 'subsystem == "com.scarf.mon"' --info --debug
```
Streams every signpost / log line live. Useful for catching events that happen before you can flip the panel toggle.
## How to read the data
The in-app panel groups samples by `(category, name)` and shows for each:
- **count** — total samples of that name in the buffer
- **p50 / p95 / max** — for `interval` samples, percentile durations
- **bytes** — running total when the call site reported a payload size
For a full export, hit **Copy as JSON**. The export is a compact, valid JSON array in which each element is one sample with `category`, `name`, `kind` (`event` or `interval`), `timestampMs`, `durationNanos`, `count`, and optional `bytes`. Pipe it through `jq` or paste it into a feedback thread.
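If you would rather inspect a dump in Swift than in `jq`, a sketch along these lines may help. The field names follow the export description above, but the Swift types chosen for each field are assumptions, `MonSample` and `slowestInterval` are hypothetical names, and the sample data is hand-written rather than a real capture.

```swift
import Foundation

// Field names follow the export description; the Swift types are assumptions.
struct MonSample: Decodable {
    let category: String
    let name: String
    let kind: String          // "event" or "interval"
    let timestampMs: Double
    let durationNanos: UInt64
    let count: Int
    let bytes: Int?           // present only when the call site reported a size
}

// Decode a pasted dump and return the slowest interval sample, if any.
func slowestInterval(in json: String) throws -> MonSample? {
    let samples = try JSONDecoder().decode([MonSample].self, from: Data(json.utf8))
    return samples
        .filter { $0.kind == "interval" }
        .max { $0.durationNanos < $1.durationNanos }
}

// Hand-written example data, not a real capture.
let dump = """
[{"category":"sqlite","name":"sqlite.query","kind":"interval","timestampMs":10,"durationNanos":4200000,"count":1,"bytes":512},
 {"category":"chatStream","name":"firstByte","kind":"event","timestampMs":12,"durationNanos":0,"count":1}]
"""
```

Declaring `bytes` as an optional lets the decoder accept samples that omit the field.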
## What's measured today (v2.7+)
| Category | Name | Where | What it tells you |
|---|---|---|---|
| `chatRender` | `mac.ChatView.body` | [Mac ChatView](https://github.com/awizemann/scarf/tree/main/scarf/scarf/Features/Chat/Views/ChatView.swift) | Full chat tab body re-eval count |
| `chatRender` | `mac.RichChatMessageList.body` | [RichChatMessageList](https://github.com/awizemann/scarf/tree/main/scarf/scarf/Features/Chat/Views/RichChatMessageList.swift) | Whether the message-list `ForEach` is re-issuing |
| `chatRender` | `mac.RichMessageBubble.body` | [RichMessageBubble](https://github.com/awizemann/scarf/tree/main/scarf/scarf/Features/Chat/Views/RichMessageBubble.swift) | Per-bubble re-evals — divide by `acpEvent` count to spot wasted re-renders |
| `chatRender` | `ios.ChatView.body` / `ios.MessageBubble.body` | iOS `ChatView.swift` | Same signal on iOS |
| `chatStream` | `mac.sendViaACP` | [Mac ChatViewModel](https://github.com/awizemann/scarf/tree/main/scarf/scarf/Features/Chat/ViewModels/ChatViewModel.swift) | User tap → first prompt write (carries prompt byte count) |
| `chatStream` | `mac.sendPrompt` | Same | User tap → response complete (interval) |
| `chatStream` | `mac.acpEvent` / `mac.handleACPEvent` | Same | Per-event arrival + handle cost |
| `chatStream` | `firstByte` / `firstThoughtByte` | [RichChatViewModel](https://github.com/awizemann/scarf/tree/main/scarf/Packages/ScarfCore/Sources/ScarfCore/ViewModels/RichChatViewModel.swift) | Time-to-first-token. Splits Hermes "thinking" from streaming render |
| `chatStream` | `finalizeStreamingMessage` | Same | End-of-turn finalize cost (target: < 1 ms) |
| `chatStream` | `ios.send` / `ios.startResuming` / `ios.acpEvent` / `ios.handleACPEvent` | iOS `ChatView.swift` | Same shape on iOS |
| `sessionLoad` | `mac.startACPSession` / `ios.startResuming` | Both targets | Session boot cost |
| `sqlite` | `sqlite.query` / `sqlite.queryBatch` | [RemoteSQLiteBackend](https://github.com/awizemann/scarf/tree/main/scarf/Packages/ScarfCore/Sources/ScarfCore/Services/Backends/RemoteSQLiteBackend.swift) | Per-call latency over SSH (carries row count + stdout bytes) |
| `transport` | `ssh.streamScript` (iOS) / `ssh.run` (Mac) | [CitadelServerTransport](https://github.com/awizemann/scarf/tree/main/scarf/Packages/ScarfIOS/Sources/ScarfIOS/CitadelServerTransport.swift), [SSHScriptRunner](https://github.com/awizemann/scarf/tree/main/scarf/Packages/ScarfCore/Sources/ScarfCore/Transport/SSHScriptRunner.swift) | SSH round-trip time |
| `diskIO` | `loadConfig` / `loadCronJobs` | [HermesFileService](https://github.com/awizemann/scarf/tree/main/scarf/scarf/Core/Services/HermesFileService.swift) | Hot disk reads. `loadConfig` also logs caller stack frames in Full mode |
Adding a new measure point is two lines (see Developer Guide below).
## Capture recipe for a useful baseline
1. Build + run the latest version.
2. Flip the panel to **Full**.
3. Optionally, in another terminal: `log stream --predicate 'subsystem == "com.scarf.mon"'`.
4. Hit **Reset** in the panel.
5. Run the specific scenario you want to measure (one chat turn, one session boot, one specific click).
6. Hit **Copy as JSON** *before* doing anything else — the ring is FIFO with a 4096-entry capacity, so a long idle session will eventually overwrite earlier samples.
7. Paste the JSON in a [GitHub issue](https://github.com/awizemann/scarf/issues) or feedback thread, with one sentence describing what you were doing.
The maintainers will use the same JSON shape to grep for outliers.
## Privacy
ScarfMon is privacy-conscious by construction:
- **No remote upload.** The ring buffer never leaves the device. The `Copy as JSON` button puts the dump on the system clipboard; the user decides whether to paste it anywhere.
- **No content.** Sample names are `StaticString` (compile-time literals), so prompt text, response text, file paths, etc. cannot accidentally end up in a metric.
- **No PII.** Optional `bytes` field tracks payload *size*, never payload *contents*.
- **Symbol-only stack traces.** When `Full` mode logs caller stack frames (e.g. for the `loadConfig` mystery-caller hint), they are mangled Swift symbols + offsets. No memory addresses, no file paths.
- **Subsystem isolation.** All output uses subsystem `com.scarf.mon`, so users can grep / filter / disable independently of Scarf's general logs.
## Developer guide — adding a measure point
The public API has three primitives:
```swift
// Synchronous interval: duration is recorded
ScarfMon.measure(.diskIO, "loadX") {
    // your work
}

// Async interval: same shape
try await ScarfMon.measureAsync(.sqlite, "query") {
    try await actualQuery()
}

// One-shot event: count + optional payload size
ScarfMon.event(.chatStream, "firstByte", count: 1, bytes: chunk.utf8.count)
```
### Picking a category
Categories are a fixed enum in [`ScarfMon.swift`](https://github.com/awizemann/scarf/tree/main/scarf/Packages/ScarfCore/Sources/ScarfCore/Diagnostics/ScarfMon.swift):
| Category | When to use |
|---|---|
| `chatRender` | View-body re-evals, scroll, layout work |
| `chatStream` | ACP events, prompt sends, finalize |
| `sessionLoad` | Session boot / resume / load |
| `transport` | SSH round-trips, network |
| `sqlite` | DB queries, snapshot pipeline |
| `diskIO` | File reads / writes |
| `render` | Other rendering (dashboards, sidebars) |
| `other` | Catch-all — promote to a real category if it grows |
Adding a new category is a one-line case in `ScarfMon.Category` plus a row on this page.
### Picking a name
- Names must be `StaticString` (compile-time literal) — the type system enforces this.
- Conventionally prefixed with the platform (`mac.`, `ios.`) when the same logical operation has both shapes; bare names are fine for cross-platform code.
- Use `dot.notation` to group related events (`sqlite.query`, `sqlite.queryBatch`, `sqlite.query.rows`).
- Keep names stable: a rename is a breaking change for any saved JSON dumps users have shared.
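To illustrate the `StaticString` rule, here is a small sketch. `recordEvent` is a hypothetical stand-in, not a ScarfMon function; the only detail taken from this page is that the name parameter is a `StaticString`.

```swift
// Hypothetical stand-in for a ScarfMon-style entry point; only the
// StaticString parameter type is taken from this page.
func recordEvent(_ name: StaticString) -> String {
    // StaticString is CustomStringConvertible, so a backend can render it when needed.
    return name.description
}

let label = recordEvent("sqlite.query")   // OK: a compile-time literal

// let userText = readLine() ?? ""
// recordEvent(userText)                  // does not compile: String is not StaticString
```

Because a runtime `String` cannot be passed where a `StaticString` is expected, user-derived text is rejected at compile time rather than by a runtime check.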
### Body counters in SwiftUI views
Inside `var body: some View`, the idiom is:
```swift
var body: some View {
    let _: Void = ScarfMon.event(.chatRender, "mac.ChatView.body")
    return VStack { }
}
```
The `let _: Void = …` works inside `@ViewBuilder` (it's a local declaration, not a view-producing expression) and fires every time SwiftUI re-evaluates the body. Use sparingly — these are sentinel events for diagnosing render storms, not for routine operation.
### Verifying the cost
Default `signpostOnly` mode is effectively free. To prove it locally:
```swift
let n = 1_000_000
let t = ContinuousClock().measure {
    for _ in 0..<n {
        ScarfMon.measure(.other, "noop") { 1 + 1 }
    }
}
// t should be under ~50 ms total, i.e. about 50 ns per call when off
```
When the backend set is empty, the wrapper is `@inline(__always)` and short-circuits without taking a clock reading.
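The short-circuit idea can be sketched in miniature as below. This is not ScarfMon's source: `MiniMon`, its `sinks` array, and the closure-based backend shape are all hypothetical, and the sketch assumes a toolchain and OS where `ContinuousClock` is available.

```swift
// Hypothetical miniature of a measure wrapper; not ScarfMon's actual source.
final class MiniMon {
    // Stand-in for the backend set; empty means nothing is recording.
    var sinks: [(StaticString, Duration) -> Void] = []

    @inline(__always)
    func measure<T>(_ name: StaticString, _ body: () throws -> T) rethrows -> T {
        // Fast path: no backends, so skip the clock entirely.
        if sinks.isEmpty { return try body() }
        let clock = ContinuousClock()
        let start = clock.now
        defer {
            // Runs on both the return path and the throw path.
            let elapsed = start.duration(to: clock.now)
            for sink in sinks { sink(name, elapsed) }
        }
        return try body()
    }
}

let mon = MiniMon()
let a = mon.measure("noop") { 1 + 1 }        // fast path, nothing recorded
var recorded: [String] = []
mon.sinks.append { name, _ in recorded.append(name.description) }
let b = mon.measure("noop") { 1 + 1 }        // slow path, one sample recorded
```

Putting the emptiness check before the first clock read is what keeps the disabled path down to a branch and the call to `body`.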
## Architecture
Three pluggable backends behind one dispatcher:
```
┌─────────────────────────────────────────────────┐
│ ScarfMon.measure / measureAsync / event │
│ ↓ (one os_unfair_lock per emit, no allocs) │
├─────────────────────────────────────────────────┤
│ ScarfMonSignpostBackend (always on) │
│ ScarfMonRingBuffer (Full mode only) │
│ ScarfMonLoggerBackend (Full mode only) │
└─────────────────────────────────────────────────┘
```
Boot wiring lives in [`ScarfMonBoot.swift`](https://github.com/awizemann/scarf/tree/main/scarf/Packages/ScarfCore/Sources/ScarfCore/Diagnostics/ScarfMonBoot.swift) — `ScarfMonBoot.configure(mode:)` is called once at app launch from both `ScarfApp.init` (Mac) and `ScarfIOSApp.init` (iOS).
The ring buffer is fixed at 4096 entries × ~80 bytes per sample = ~320 KB resident. At 200 samples/s that holds roughly the last 20 seconds of streaming-chat activity (4096 ÷ 200 ≈ 20.5 s); several minutes of context fit only at lower rates of around 10 to 30 samples/s. When the buffer is full, the oldest sample is overwritten.
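A fixed-capacity ring that overwrites its oldest entry can be sketched as follows. `Ring` and `retentionSeconds` are hypothetical names for illustration; ScarfMon's real buffer may be laid out differently. The helper simply divides capacity by sample rate to estimate how much history survives.

```swift
// Hypothetical fixed-capacity ring; ScarfMon's real buffer may differ.
struct Ring<Element> {
    private var storage: [Element] = []
    private var next = 0            // slot the next write lands in
    let capacity: Int

    init(capacity: Int) {
        precondition(capacity > 0)
        self.capacity = capacity
        storage.reserveCapacity(capacity)
    }

    mutating func append(_ element: Element) {
        if storage.count < capacity {
            storage.append(element)
        } else {
            storage[next] = element   // overwrite the oldest entry
        }
        next = (next + 1) % capacity
    }

    // Contents ordered oldest to newest.
    var snapshot: [Element] {
        storage.count < capacity
            ? storage
            : Array(storage[next...] + storage[..<next])
    }
}

// Seconds of history a ring retains at a steady sample rate.
func retentionSeconds(capacity: Int, samplesPerSecond: Double) -> Double {
    Double(capacity) / samplesPerSecond
}

var ring = Ring<Int>(capacity: 4)
for i in 1...6 { ring.append(i) }
// ring.snapshot is [3, 4, 5, 6]: 1 and 2 were overwritten
let window = retentionSeconds(capacity: 4096, samplesPerSecond: 200) // 20.48
```

Preallocating the storage once means steady-state appends never grow the array, which fits the "no allocs" note in the diagram above.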
Tests live at [`ScarfMonTests.swift`](https://github.com/awizemann/scarf/tree/main/scarf/Packages/ScarfCore/Tests/ScarfCoreTests/ScarfMonTests.swift) — 11 cases covering ring ordering / wrap / reset, summary aggregation, percentiles, install / isActive transitions, the throw path on `measureAsync`, boot mode switches, and JSON export round-trips.
## Reading the in-app summary table
Each row is one `(category, name)` bucket. Columns:
- **category** — the enum case
- **name** — the call site name
- **count** — total samples
- **p50 / p95 / max** — for `interval` samples, percentile durations in ms
- **bytes** — running total of any reported payload sizes
Default sort is **p95 descending** so the slowest hot path is at the top. The table caps at 20 rows; for the full picture, **Copy as JSON**.
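The aggregation behind such a table can be approximated as below. This page does not say which percentile method ScarfMon uses, so the nearest-rank rule here is an assumption, and `Interval`, `Row`, `percentile`, `summarize`, and the `category/name` key format are all hypothetical.

```swift
// Hypothetical sample shape; only category, name, and duration matter here.
struct Interval {
    let category: String
    let name: String
    let nanos: UInt64
}

struct Row {
    let key: String
    let count: Int
    let p50: UInt64
    let p95: UInt64
    let max: UInt64
}

// Nearest-rank percentile over an ascending, non-empty array (p in 1...100).
func percentile(_ sorted: [UInt64], _ p: Int) -> UInt64 {
    let rank = (p * sorted.count + 99) / 100   // integer ceiling of p% of count
    return sorted[max(rank, 1) - 1]
}

// Group by (category, name), compute the columns, sort by p95 descending, cap the rows.
func summarize(_ samples: [Interval], limit: Int = 20) -> [Row] {
    let buckets = Dictionary(grouping: samples) { "\($0.category)/\($0.name)" }
    let rows = buckets.map { key, group -> Row in
        let d = group.map(\.nanos).sorted()
        return Row(key: key, count: d.count,
                   p50: percentile(d, 50), p95: percentile(d, 95), max: d[d.count - 1])
    }
    return Array(rows.sorted { $0.p95 > $1.p95 }.prefix(limit))
}

// Hand-written durations, not a real capture.
let rows = summarize([
    Interval(category: "sqlite", name: "sqlite.query", nanos: 1),
    Interval(category: "sqlite", name: "sqlite.query", nanos: 2),
    Interval(category: "sqlite", name: "sqlite.query", nanos: 3),
    Interval(category: "sqlite", name: "sqlite.query", nanos: 4),
    Interval(category: "diskIO", name: "loadConfig", nanos: 10),
])
```

Integer arithmetic for the rank avoids floating-point rounding at bucket boundaries such as the 95th percentile of 20 samples.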
## When to use Instruments instead
The in-app panel is for "I want a quick number to share." For deep work, attach Instruments and pick the **Time Profiler** template: every `ScarfMon.measure(...)` call shows up as a signpost event in the **Points of Interest** track. Filter the track by `subsystem: com.scarf.mon`. Because the signposts and the Time Profiler samples share one trace, you can see exactly which Swift functions are hot at the moment ScarfMon recorded a slow `sqlite.query`.
## Roadmap
ScarfMon currently covers the chat hot path, transport, SQLite, and a slice of disk I/O. Next planned coverage areas:
- **Sessions list / Dashboard** — render counts, snapshot loads.
- **Project switcher / file watcher** — debounce + tick rates.
- **Image encoding** — vision uploads.
- **Memory / Skills surfaces** — reads + parse.
- **Cron / Curator panes** — read paths during polling.
Adding any of these is a one- or two-line measure point at the call site. See the developer guide above.
## Related pages
- [Architecture Overview](Architecture-Overview) — where ScarfCore + Diagnostics fits in the stack
- [Core Services](Core-Services) — the services ScarfMon instruments today
- [Hermes Paths](Hermes-Paths) — the file paths `loadConfig` / `loadCronJobs` reach for
- [Privacy Policy](Privacy-Policy) — what data the apps access