Concepts

WatchDeck has a small vocabulary. Five nouns (endpoint, check, incident, channel, and mute) cover almost everything you’ll see in the UI. Learn them once and the rest of the product reads cleanly.

Endpoint

The thing you’re monitoring — and its configuration. A single row in mx_endpoints holds:

Identity — name, optional description, type (http or port).
Target — url for HTTP, or host + port for TCP.
Probe config — method, expected status codes, custom headers, assertions.
Cadence & thresholds — check interval, timeout, latency threshold, SSL warning days, failure / recovery thresholds.
Routing — which notification channels to use, plus an optional escalation channel and delay.
Live state — last status, last check time, last response time, consecutive failure / healthy streaks, and the current open incident (if any).

There’s no separate “check config” record — adding an endpoint is configuring its check.
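As a rough illustration of how much lives on one endpoint record, the sketch below models a subset of the fields listed above as a Python dataclass. The attribute names `check_interval_seconds`, the default values, and the `validate` helper are assumptions made for the example; only `type`, `url`, `host`, `port`, `failure_threshold`, `recovery_threshold`, `consecutive_failures`, `consecutive_healthy`, and `current_incident_id` are named in this page, and the real mx_endpoints schema may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    # Identity
    name: str
    type: str  # "http" or "port"
    description: Optional[str] = None
    # Target: url for HTTP, host + port for port checks
    url: Optional[str] = None
    host: Optional[str] = None
    port: Optional[int] = None
    # Cadence and thresholds (name of the interval field and all defaults are assumed)
    check_interval_seconds: int = 60
    failure_threshold: int = 3
    recovery_threshold: int = 2
    # Lifecycle status and live state
    status: str = "active"  # "active", "paused", or "archived"
    consecutive_failures: int = 0
    consecutive_healthy: int = 0
    current_incident_id: Optional[int] = None

    def validate(self) -> None:
        """Hypothetical helper: check that the target matches the endpoint type."""
        if self.type == "http":
            if not self.url:
                raise ValueError("http endpoints need a url")
        elif self.type == "port":
            if not self.host or self.port is None:
                raise ValueError("port endpoints need a host and a port")
        else:
            raise ValueError(f"unknown endpoint type: {self.type!r}")
```

Because the probe settings and the live state sit on the same record, creating an `Endpoint` is all it takes to describe a check; there is nothing else to configure.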

Lifecycle status

An endpoint’s status is one of:

Status	Behaviour
`active`	Default. The scheduler runs probes on the configured interval.
`paused`	Scheduler skips it. No probes, no incidents, no notifications. History kept.
`archived`	Hidden from the list view. Reserved for future use.

Check

A single recorded run of an endpoint’s probe. Each row in mx_checks captures one verdict:

status — healthy, degraded, down, or inconclusive.
status_reason — short human string (e.g. "HTTP 502 — expected 200").
response_time, status_code, ssl_days_remaining, port_open, body_bytes.
assertion_result — per-rule pass/fail breakdown.

Checks are append-only. Once written, a check row never changes.

In conversation people use “check” loosely — sometimes the probe configuration, sometimes the run. The database treats them as distinct: the endpoint row holds the config; each mx_checks row holds one run.

Run statuses

Status	When it happens
`healthy`	Probe succeeded and every assertion passed.
`degraded`	Probe succeeded but a degraded-severity rule fired: latency over the threshold, SSL certificate inside the warning window, or a soft-fail assertion.
`down`	Status code mismatch, network or connection error, port refused, or a down-severity assertion failed.
`inconclusive`	The run couldn’t reach a verdict (rare — typically a scheduling or storage error). Doesn’t move the streak counters.
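The table above can be read as a decision order. The following is a minimal sketch of that order, not WatchDeck's actual code: the function `classify_run` and its four boolean inputs are hypothetical, and the precedence of down over degraded when both kinds of rule fire is an assumption.

```python
def classify_run(
    completed: bool,
    probe_succeeded: bool,
    down_rule_failed: bool,
    degraded_rule_fired: bool,
) -> str:
    """Map the outcome of one probe run to a run status (illustrative only)."""
    if not completed:
        # The run never reached a verdict, e.g. a scheduling or storage error.
        return "inconclusive"
    if not probe_succeeded or down_rule_failed:
        # Status code mismatch, network error, port refused,
        # or a down-severity assertion failed.
        return "down"
    if degraded_rule_fired:
        # Latency, SSL warning window, or soft-fail assertion.
        return "degraded"
    return "healthy"
```

For example, `classify_run(True, True, False, True)` yields `"degraded"`, while any run with `completed=False` yields `"inconclusive"` regardless of the other flags.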

Incident

A grouped failure window. When an endpoint’s consecutive_failures reaches its failure_threshold, an incident opens. When consecutive_healthy reaches recovery_threshold, it resolves. The thresholds keep a single transient blip from paging anyone.
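The threshold logic can be sketched as a small state machine. This is an illustration under stated assumptions, not the product's implementation: the `Streaks` class and `apply_run` function are hypothetical, each streak is assumed to reset when a run of the opposite kind arrives, and degraded runs are assumed to count toward the failure streak (the text only says that incidents can have an endpoint_degraded cause).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Streaks:
    failure_threshold: int
    recovery_threshold: int
    consecutive_failures: int = 0
    consecutive_healthy: int = 0
    incident_open: bool = False


def apply_run(s: Streaks, status: str) -> Optional[str]:
    """Update the streak counters for one run; return "opened", "resolved", or None."""
    if status == "inconclusive":
        # Inconclusive runs do not move the streak counters.
        return None
    if status == "healthy":
        s.consecutive_healthy += 1
        s.consecutive_failures = 0
        if s.incident_open and s.consecutive_healthy >= s.recovery_threshold:
            s.incident_open = False
            return "resolved"
        return None
    # Assumption: both "down" and "degraded" count as failures.
    s.consecutive_failures += 1
    s.consecutive_healthy = 0
    if not s.incident_open and s.consecutive_failures >= s.failure_threshold:
        s.incident_open = True
        return "opened"
    return None
```

With `failure_threshold=3`, a single down run followed by a healthy one never opens an incident, which is the blip-suppression behaviour described above.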

Incident status

Incidents have just two states:

Status	Meaning
`active`	Open and still firing. The endpoint’s `current_incident_id` points at it.
`resolved`	Closed — the recovery streak met its threshold. `resolved_at` and `duration_seconds` are populated.

There’s no acknowledged state: an incident goes straight from active to resolved once the recovery streak meets its threshold.

Each incident carries a cause (endpoint_down or endpoint_degraded), a timeline JSONB array of every check that contributed to it, and a notifications-sent counter. See Incidents for the full lifecycle.

Channel

A destination for notifications. Channels live independently of endpoints — you create them once under Notifications and attach them to as many endpoints as you like.

Type	Notes
`email`	One or more recipients per channel.
`slack`	Webhook into a Slack channel.
`discord`	Webhook into a Discord channel.
`webhook`	Generic JSON POST to any URL.

Each channel carries its own filters (severity, event type), delivery priority, quiet hours, and rate limit — see Notifications for how those compose with per-endpoint routing.

Mute

A scoped silence. Mutes suppress channel delivery without changing endpoint state — incidents still open and resolve, but no message goes out.

Scope	Effect
`endpoint`	Silences notifications for one endpoint.
`channel`	Silences one channel across all endpoints it serves.
`global`	Silences everything.

Each mute has an optional expires_at and a free-text reason — useful for planned maintenance windows.
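To make the three scopes concrete, here is a hedged sketch of how a delivery step might decide whether a notification is silenced. The `Mute` class, its `target_id` field, and the `is_silenced` function are hypothetical names for this example; only the scope values, expires_at, and reason come from this page. Treating a mute as inactive once `expires_at` has passed is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass
class Mute:
    scope: str  # "endpoint", "channel", or "global"
    target_id: Optional[int] = None  # hypothetical: endpoint or channel id; unused for global
    expires_at: Optional[datetime] = None  # None means the mute does not expire
    reason: str = ""


def is_silenced(
    mutes: Iterable[Mute], endpoint_id: int, channel_id: int, now: datetime
) -> bool:
    """Return True if any unexpired mute covers this endpoint/channel pair."""
    for m in mutes:
        if m.expires_at is not None and m.expires_at <= now:
            continue  # assumed: expired mutes no longer apply
        if m.scope == "global":
            return True
        if m.scope == "endpoint" and m.target_id == endpoint_id:
            return True
        if m.scope == "channel" and m.target_id == channel_id:
            return True
    return False


# Usage: a maintenance-window mute on endpoint 7 that expires at 03:00 UTC.
window = Mute(
    scope="endpoint",
    target_id=7,
    expires_at=datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc),
    reason="planned maintenance",
)
```

Note that the check only gates delivery. In line with the description above, incident state is untouched whether or not a mute matches.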

How they fit together


  Endpoint  ──── runs ────►  Check                 (1:many)
     │                       (one row per probe)
     │
     ├──── may open ────►   Incident               (1:many)
     │                      (grouped failure window)
     │
     └──── routes to ────►  Channel                (many:many)
                                │
                                └──── may be silenced by ──►  Mute

Everything else in the product — the dashboard tiles, the catalogue, the notification log — is built on top of these five nouns.