Dashboard

The Overview page (the default route after sign-in) is your roll-up of every endpoint in scope. It’s built around six KPI cards on top, a stack of time-series charts in the middle, and per-endpoint rankings at the bottom, all driven by the same time-window picker so the whole page tells a consistent story.

Filter bar

Sticky at the top of the page:

Control	Behaviour
Endpoints	Multi-select with text search. Empty selection means “fleet-wide” (every endpoint).
Range	`24h` · `7d` · `30d` · `90d`. Drives every chart and KPI on the page.
Refresh	Forces a refetch immediately. The dashboard polls; this is the bypass.

The header next to the title shows “X of Y endpoints in scope”, the active range as “Metrics over <range> rolling · status reflects now”, and an “Updated Xs ago” freshness counter.

How fresh the data is

The dashboard polls; it doesn’t subscribe to realtime changes. Three mechanisms govern freshness:

Time-series refetch: fires when the range or endpoint selection changes.
In-window memos: recompute every minute so the right edge of every chart slides forward without a refetch.
Freshness label: ticks every 5 seconds so the “Updated Xs ago” counter is current.
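
As a rough illustration of the last two items, here is a minimal Python sketch. It assumes the fetched points are `(timestamp, value)` tuples already held in memory; `RANGES`, `in_window`, and `freshness_label` are hypothetical names, not the dashboard’s actual code.

```python
from datetime import datetime, timedelta, timezone

# Range picker values from the filter bar, mapped to window lengths.
RANGES = {
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
    "90d": timedelta(days=90),
}

def in_window(points, range_key, now):
    """Keep only points inside [now - range, now]; no refetch needed."""
    start = now - RANGES[range_key]
    return [p for p in points if start <= p[0] <= now]

def freshness_label(last_fetch, now):
    """Render the counter shown in the header (whole seconds, never negative)."""
    seconds = max(0, int((now - last_fetch).total_seconds()))
    return f"Updated {seconds}s ago"

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
points = [(now - timedelta(hours=25), 120.0), (now - timedelta(hours=1), 95.0)]
print(in_window(points, "24h", now))   # only the point from one hour ago
print(freshness_label(now - timedelta(seconds=12), now))
```

Calling `in_window` again a minute later with a newer `now` is what makes the right edge slide forward between refetches.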

For a 24h range the dashboard pulls hourly rollups plus a synthetic in-progress bucket from raw check rows, so the right edge is genuinely live. For 7d / 30d / 90d it pulls daily summaries plus today’s hourlies as fill-in.
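
The mapping from range to data source can be sketched as below. The source names (`hourly_rollups`, `daily_summaries`, `raw_checks`) and the `FetchPlan` type are assumptions made for illustration; the real table or API names are not documented here.

```python
from dataclasses import dataclass

# Hypothetical source identifiers, not the product's actual names.
HOURLY = "hourly_rollups"
DAILY = "daily_summaries"
RAW = "raw_checks"

@dataclass(frozen=True)
class FetchPlan:
    base: str        # main series for the window
    fill_in: str     # source used to complete the right edge
    fill_scope: str  # what the fill-in covers

def plan_for_range(range_key: str) -> FetchPlan:
    """Pick the sources described above for a given range picker value."""
    if range_key == "24h":
        # Hourly rollups plus one synthetic bucket built from raw check rows.
        return FetchPlan(HOURLY, RAW, "in-progress hour")
    if range_key in ("7d", "30d", "90d"):
        # Daily summaries plus today's hourlies as fill-in.
        return FetchPlan(DAILY, HOURLY, "today")
    raise ValueError(f"unsupported range: {range_key!r}")

print(plan_for_range("24h"))
print(plan_for_range("30d"))
```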

Fleet hero

The top strip of counters:

Healthy / Degraded / Down / Inconclusive / Paused: current status snapshot from each endpoint’s `last_status`.
Active incidents: count of incidents in the active state.
Notifications sent: count of dispatches in the selected window.
Incidents in window: count of incidents opened within the selected range.
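
The status counters amount to a tally over `last_status`. A minimal sketch, assuming each endpoint is a dict and that status values are the lowercase strings below (both are assumptions):

```python
from collections import Counter

# Assumed spellings of the five statuses shown in the strip.
STATUSES = ("healthy", "degraded", "down", "inconclusive", "paused")

def status_counts(endpoints):
    """Count endpoints per last_status, reporting zero for unseen statuses."""
    seen = Counter(e["last_status"] for e in endpoints)
    return {status: seen.get(status, 0) for status in STATUSES}

fleet = [
    {"name": "api", "last_status": "healthy"},
    {"name": "web", "last_status": "down"},
    {"name": "auth", "last_status": "healthy"},
]
print(status_counts(fleet))
```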

KPI cards

Six cards across, each with a number + a small inline sparkline drawn from the same buckets the charts use:

KPI	What it shows
Fleet uptime	Percentage of buckets that were healthy across the selection.
Global P95	95th-percentile response time across all included endpoints.
Error rate	Share of buckets that recorded any non-healthy status.
Avg response	Average response time, with an early-half vs late-half trend % on the back.
Checks ran	Total probes executed in the window.
Incidents	Total incidents with an open / resolved breakdown.
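
To make the definitions concrete, here is a hedged sketch of how four of these numbers could be derived from buckets. It assumes each bucket is a dict of per-status check counts, that a bucket counts as healthy only when it recorded no degraded or down checks, and that P95 uses the nearest-rank method; the dashboard’s actual rules (for example, how inconclusive checks are treated) may differ.

```python
from statistics import mean

NON_HEALTHY = ("degraded", "down")  # assumption: inconclusive is not counted

def is_healthy(bucket):
    return all(bucket.get(status, 0) == 0 for status in NON_HEALTHY)

def fleet_uptime_pct(buckets):
    """Percentage of buckets that were healthy."""
    if not buckets:
        return 0.0
    return 100.0 * sum(is_healthy(b) for b in buckets) / len(buckets)

def error_rate_pct(buckets):
    """Share of buckets that recorded any non-healthy status."""
    if not buckets:
        return 0.0
    return 100.0 * sum(not is_healthy(b) for b in buckets) / len(buckets)

def p95(values):
    """Nearest-rank 95th percentile (one of several common definitions)."""
    if not values:
        raise ValueError("p95 of an empty series is undefined")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]

def half_trend_pct(series):
    """Percent change from the early-half mean to the late-half mean."""
    if len(series) < 2:
        return 0.0
    mid = len(series) // 2
    early, late = mean(series[:mid]), mean(series[mid:])
    return 0.0 if early == 0 else (late - early) / early * 100

buckets = [{"healthy": 5}, {"healthy": 5}, {"healthy": 4, "down": 1}, {"healthy": 5}]
print(fleet_uptime_pct(buckets), error_rate_pct(buckets))
print(p95(list(range(1, 101))), half_trend_pct([100, 100, 150, 150]))
```

Under this bucket model, uptime and error rate are complements of each other; that follows from the assumption above, not from anything the product guarantees.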

Error budget banner

A gradient bar shows `budgetRemaining` against your SLO target (set under Settings → Check defaults and SLO). When you’re well inside budget the bar reads green; as you spend, it shifts amber and then red.
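
One common way to compute an error budget is shown below as a sketch. The formula is the standard SLO convention, not a confirmed description of how `budgetRemaining` is calculated, and the colour thresholds (50% and 20%) are invented for illustration.

```python
def budget_remaining(uptime_pct: float, slo_target_pct: float) -> float:
    """Fraction of the error budget left, clamped to [0, 1].

    allowed = 100 - SLO target, spent = 100 - observed uptime.
    """
    allowed = 100.0 - slo_target_pct
    spent = 100.0 - uptime_pct
    if allowed <= 0:
        # A 100% target leaves no budget; any downtime exhausts it.
        return 1.0 if spent <= 0 else 0.0
    return min(1.0, max(0.0, 1.0 - spent / allowed))

def budget_colour(remaining: float) -> str:
    """Map remaining budget to a band. Thresholds are hypothetical."""
    if remaining >= 0.5:
        return "green"
    if remaining >= 0.2:
        return "amber"
    return "red"

left = budget_remaining(uptime_pct=99.95, slo_target_pct=99.9)
print(round(left, 3), budget_colour(left))
```

With a 99.9% target, 99.95% uptime has spent about half the budget, while 99.0% uptime has overspent it and clamps to zero.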

Charts

Stacked one above the other, all sharing the time window:

Response Time Percentiles: P50 / P95 / P99 lines.
Uptime % vs SLO: uptime line plus your SLO target, with a toggleable downtime series for context.
Error Rate: non-healthy share over time.
Status Bar Chart: stacked healthy / degraded / down per bucket, useful for spotting clustered failures.

Endpoint heatmap

Every endpoint × every bucket as a coloured cell. Lets you scan an entire fleet in seconds and spot an endpoint that’s been silently degraded for hours.
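
The grid behind a heatmap like this can be sketched as a pivot from flat rows. The `(endpoint, bucket, status)` row shape and the function names are assumptions; `longest_run` shows one way to flag an endpoint that has stayed degraded across consecutive buckets.

```python
def build_heatmap(rows, endpoints, buckets):
    """Pivot (endpoint, bucket, status) rows into an endpoint x bucket grid.

    Cells with no data are None so gaps stay visible.
    """
    cells = {(endpoint, bucket): status for endpoint, bucket, status in rows}
    return [[cells.get((e, b)) for b in buckets] for e in endpoints]

def longest_run(row, status):
    """Length of the longest consecutive streak of `status` in one grid row."""
    best = current = 0
    for cell in row:
        current = current + 1 if cell == status else 0
        best = max(best, current)
    return best

rows = [
    ("api", "h1", "healthy"),
    ("api", "h2", "degraded"),
    ("api", "h3", "degraded"),
    ("web", "h1", "healthy"),
]
grid = build_heatmap(rows, ["api", "web"], ["h1", "h2", "h3"])
print(grid)
print(longest_run(grid[0], "degraded"))
```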

Per-endpoint chart

A response-time area chart of the top 8 endpoints by average latency in the window. Use it to find the small set of endpoints driving your fleet P95.

Active incidents + live activity

Two side-by-side panels at the bottom of the chart stack:

Active incidents: every currently-active incident with its endpoint, cause, and time-since-open.
Live activity feed: the most recent incident state changes (opens and resolves).

SLO compliance

A per-endpoint list of window-uptime against your SLO target. Endpoints below target sort first.
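
The ordering can be sketched as a two-part sort key. Putting endpoints below target first comes from the description above; breaking ties by lowest uptime is an assumption, as is the input shape.

```python
def slo_compliance(uptime_by_endpoint, slo_target_pct):
    """Return (name, uptime_pct, meets_target) rows, below-target first."""
    rows = [
        (name, uptime, uptime >= slo_target_pct)
        for name, uptime in uptime_by_endpoint.items()
    ]
    # False sorts before True, so misses lead; then worst uptime first.
    return sorted(rows, key=lambda row: (row[2], row[1]))

window_uptime = {"api": 99.99, "billing": 98.0, "search": 99.5}
for row in slo_compliance(window_uptime, slo_target_pct=99.9):
    print(row)
```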

Rank cards

Three “top offenders” cards on the bottom row:

Slowest: top endpoints by P95 in the window.
Flakiest: top endpoints by incident count.
Highest error rate: top endpoints by share of non-healthy buckets.
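
All three cards are the same operation with a different sort key. A minimal sketch, assuming per-endpoint stats arrive as dicts with the hypothetical fields `p95_ms`, `incident_count`, and `error_share`; the number of entries per card (`n`) is also an assumption.

```python
def rank_cards(stats, n=5):
    """Build the three top-offender lists from per-endpoint window stats."""
    def top(field):
        ranked = sorted(stats, key=lambda s: s[field], reverse=True)
        return [s["name"] for s in ranked[:n]]

    return {
        "slowest": top("p95_ms"),
        "flakiest": top("incident_count"),
        "highest_error_rate": top("error_share"),
    }

stats = [
    {"name": "api", "p95_ms": 820, "incident_count": 1, "error_share": 0.02},
    {"name": "web", "p95_ms": 310, "incident_count": 4, "error_share": 0.01},
    {"name": "auth", "p95_ms": 450, "incident_count": 0, "error_share": 0.08},
]
print(rank_cards(stats, n=2))
```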

Empty state

If you haven’t added any endpoints yet, the page shows a centred icon, the line “No endpoints yet”, and an Add Endpoint button. Every other widget is hidden until you have at least one endpoint with at least one check on it.

What’s next

Endpoints
Incidents
Settings → Check defaults
Catalogue
Troubleshooting