Concepts
WatchDeck has a small vocabulary. Five nouns cover almost everything you’ll see in the UI. Learn them once and the rest of the product reads cleanly.
Endpoint
The thing you’re monitoring — and its configuration. A single row in mx_endpoints holds:
- Identity — name, optional description, type (http or port).
- Target — url for HTTP, or host + port for TCP.
- Probe config — method, expected status codes, custom headers, assertions.
- Cadence & thresholds — check interval, timeout, latency threshold, SSL warning days, failure / recovery thresholds.
- Routing — which notification channels to use, plus an optional escalation channel and delay.
- Live state — last status, last check time, last response time, consecutive failure / healthy streaks, and the current open incident (if any).
There’s no separate “check config” record — adding an endpoint is configuring its check.
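As a rough illustration of how one row bundles identity, target, thresholds, and live state, here is a minimal Python sketch. Only `failure_threshold`, `recovery_threshold`, `consecutive_failures`, `consecutive_healthy`, and `current_incident_id` are names that appear in this page; every other field name, the default values, and the `validate_target` helper are assumptions made for the example, not the actual mx_endpoints schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    """Illustrative subset of an endpoint row (field names partly assumed)."""

    # Identity
    name: str
    type: str  # "http" or "port"
    description: Optional[str] = None
    # Target: url for HTTP, host + port for TCP
    url: Optional[str] = None
    host: Optional[str] = None
    port: Optional[int] = None
    # Cadence and thresholds (defaults are placeholders, not WatchDeck's)
    interval_seconds: int = 60
    failure_threshold: int = 3
    recovery_threshold: int = 2
    # Lifecycle status and live state
    status: str = "active"
    consecutive_failures: int = 0
    consecutive_healthy: int = 0
    current_incident_id: Optional[str] = None

    def validate_target(self) -> None:
        """Raise ValueError if the target fields do not match the type."""
        if self.type == "http":
            if not self.url:
                raise ValueError("http endpoints need a url")
        elif self.type == "port":
            if not self.host or self.port is None:
                raise ValueError("port endpoints need host and port")
        else:
            raise ValueError(f"unknown endpoint type: {self.type!r}")
```

Because config and live state sit on the same object, creating the endpoint is all it takes to define its check.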
Lifecycle status
An endpoint’s status is one of:
| Status | Behaviour |
|---|---|
| active | Default. The scheduler runs probes on the configured interval. |
| paused | Scheduler skips it. No probes, no incidents, no notifications. History kept. |
| archived | Hidden from the list view. Reserved for future use. |
Check
A single recorded run of an endpoint’s probe. Each row in mx_checks captures one verdict:
- status — healthy, degraded, down, or inconclusive.
- status_reason — short human string (e.g. "HTTP 502 — expected 200").
- response_time, status_code, ssl_days_remaining, port_open, body_bytes.
- assertion_result — per-rule pass/fail breakdown.
Checks are append-only. Once written, a check row never changes.
In conversation people use “check” loosely — sometimes the probe configuration, sometimes the run. The database treats them as distinct: the endpoint row holds the config; each mx_checks row holds one run.
Run statuses
| Status | When it happens |
|---|---|
| healthy | Probe succeeded and every assertion passed. |
| degraded | Probe succeeded but a degraded-severity rule fired — latency over budget, SSL inside the warning window, soft-fail assertion. |
| down | Status code mismatch, network or connection error, port refused, or a down-severity assertion failed. |
| inconclusive | The run couldn’t reach a verdict (rare — typically a scheduling or storage error). Doesn’t move the streak counters. |
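The table above can be read as a precedence order: any down condition wins, then any degraded condition, otherwise the run is healthy. The sketch below shows that ordering for an HTTP probe. The `ProbeResult` shape, the `classify` function, and its parameter names are hypothetical; it does not model inconclusive runs or TCP port checks, and WatchDeck's real evaluation order may differ.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ProbeResult:
    """Hypothetical raw outcome of one HTTP probe."""

    error: Optional[str] = None  # network or connection error, if any
    status_code: Optional[int] = None
    response_time_ms: Optional[float] = None
    ssl_days_remaining: Optional[int] = None
    failed_down_assertions: int = 0
    failed_degraded_assertions: int = 0


def classify(
    result: ProbeResult,
    expected_codes: Sequence[int],
    latency_threshold_ms: float,
    ssl_warning_days: int,
) -> str:
    """Return "down", "degraded", or "healthy" for one probe result."""
    # Down conditions are checked first: they outrank any degraded rule.
    if result.error is not None:
        return "down"
    if result.status_code not in expected_codes:
        return "down"
    if result.failed_down_assertions > 0:
        return "down"

    # Degraded conditions: the probe succeeded but a softer rule fired.
    if (
        result.response_time_ms is not None
        and result.response_time_ms > latency_threshold_ms
    ):
        return "degraded"
    if (
        result.ssl_days_remaining is not None
        and result.ssl_days_remaining <= ssl_warning_days
    ):
        return "degraded"
    if result.failed_degraded_assertions > 0:
        return "degraded"

    return "healthy"
```

For example, a 200 response that takes 900 ms against a 500 ms latency threshold classifies as degraded, while a 502 against an expected 200 classifies as down regardless of latency.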
Incident
A grouped failure window. When an endpoint’s consecutive_failures reaches its failure_threshold, an incident opens. When consecutive_healthy reaches recovery_threshold, it resolves. The thresholds keep a single transient blip from paging anyone.
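The open and resolve rule is a small state machine over the two streak counters. The sketch below is one way to express it, under some assumptions: that both down and degraded checks count toward the failure streak, that a healthy check resets the failure streak (and vice versa), and that the counters are updated in place. `EndpointState` and `apply_check` are hypothetical names for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndpointState:
    failure_threshold: int
    recovery_threshold: int
    consecutive_failures: int = 0
    consecutive_healthy: int = 0
    current_incident_id: Optional[int] = None


def apply_check(
    state: EndpointState, check_status: str, next_incident_id: int
) -> Optional[str]:
    """Update streaks for one check; return "opened", "resolved", or None."""
    # Inconclusive runs do not move the streak counters.
    if check_status == "inconclusive":
        return None

    if check_status == "healthy":
        state.consecutive_healthy += 1
        state.consecutive_failures = 0
        if (
            state.current_incident_id is not None
            and state.consecutive_healthy >= state.recovery_threshold
        ):
            state.current_incident_id = None
            return "resolved"
        return None

    # Assumption: "down" and "degraded" both count as failures.
    state.consecutive_failures += 1
    state.consecutive_healthy = 0
    if (
        state.current_incident_id is None
        and state.consecutive_failures >= state.failure_threshold
    ):
        state.current_incident_id = next_incident_id
        return "opened"
    return None
```

With `failure_threshold=3` and `recovery_threshold=2`, the sequence down, healthy, down, down, down, healthy, healthy opens an incident on the fifth check and resolves it on the seventh. The lone failure at the start never pages anyone, because the healthy check that follows resets the streak.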
Incident status
Incidents have just two states:
| Status | Meaning |
|---|---|
| active | Open and still firing. The endpoint’s current_incident_id points at it. |
| resolved | Closed — the recovery streak met its threshold. resolved_at and duration_seconds are populated. |
There’s no acknowledged state: incidents go straight from active to resolved once consecutive_healthy reaches recovery_threshold.
Each incident carries a cause (endpoint_down or endpoint_degraded), a timeline JSONB array of every check that contributed to it, and a notifications-sent counter. See Incidents for the full lifecycle.
Channel
A destination for notifications. Channels live independently of endpoints — you create them once under Notifications and attach them to as many endpoints as you like.
| Type | Notes |
|---|---|
| email | One or more recipients per channel. |
| slack | Webhook into a Slack channel. |
| discord | Webhook into a Discord channel. |
| webhook | Generic JSON POST to any URL. |
Each channel carries its own filters (severity, event type), delivery priority, quiet hours, and rate limit — see Notifications for how those compose with per-endpoint routing.
Mute
A scoped silence. Mutes suppress channel delivery without changing endpoint state — incidents still open and resolve, but no message goes out.
| Scope | Effect |
|---|---|
| endpoint | Silences notifications for one endpoint. |
| channel | Silences one channel across all endpoints it serves. |
| global | Silences everything. |
Each mute has an optional expires_at and a free-text reason — useful for planned maintenance windows.
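To make the three scopes concrete, here is a minimal sketch of a delivery-time check that decides whether a notification for a given endpoint and channel should be suppressed. The `Mute` shape, the `target_id` field, and the `is_muted` function are assumptions for illustration; only the scope names, `expires_at`, and the reason come from this page.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class Mute:
    scope: str  # "endpoint", "channel", or "global"
    target_id: Optional[str] = None  # hypothetical: endpoint or channel id
    expires_at: Optional[datetime] = None  # None means no expiry
    reason: str = ""


def is_muted(
    mutes: Iterable[Mute], endpoint_id: str, channel_id: str, now: datetime
) -> bool:
    """Return True if any unexpired mute covers this endpoint/channel pair."""
    for mute in mutes:
        # Skip mutes whose expiry has already passed.
        if mute.expires_at is not None and mute.expires_at <= now:
            continue
        if mute.scope == "global":
            return True
        if mute.scope == "endpoint" and mute.target_id == endpoint_id:
            return True
        if mute.scope == "channel" and mute.target_id == channel_id:
            return True
    return False
```

The check runs only when a message is about to be delivered, which matches the behaviour described above: endpoint state and incidents are untouched, and only the outgoing message is dropped.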
How they fit together
Endpoint ──── runs ────► Check (1:many)
   │                     (one row per probe)
   │
   ├──── may open ────► Incident (1:many)
   │                    (grouped failure window)
   │
   └──── routes to ────► Channel (many:many)
                            │
                            └──── may be silenced by ──► Mute

Everything else in the product — the dashboard tiles, the catalogue, the notification log — is built on top of these five nouns.