
Alert Configurations

An alert configuration is a rule that tells Stratora when to fire an alert — which metric to watch, what condition to check, and what thresholds constitute a warning or critical state.

Stratora has two kinds of alert configurations, built-in and custom, which are managed together in a single unified view.


Built-In Configurations

Stratora ships with a set of built-in alert configurations that cover the most common monitoring scenarios. These are created automatically during installation and apply globally to all nodes.

| Name | Type | Metric | Condition | Warning | Critical | Duration |
|---|---|---|---|---|---|---|
| Service Stopped | Service | | equals | | | Immediate |
| Node Unreachable | Reachability | | equals | | | 60 s |
| High CPU Usage | Metric | cpu_usage_percent | > | 80% | 95% | 5 min |
| High Memory Usage | Metric | memory_usage_percent | > | 85% | 95% | 5 min |
| Low Disk Space | Metric | disk_usage_percent | > | 80% | 95% | Immediate |
| High Disk Latency | Metric | disk_latency_ms | > | 20 ms | 50 ms | 5 min |
| Interface Down | Interface | interface_status | equals | | | Immediate |
| High Interface Errors | Metric | interface_errors_rate | > | 10/s | 100/s | 5 min |

Built-in configurations:

  • Can be enabled or disabled but not deleted
  • Apply globally (all nodes)
  • Can be overridden per-node, per-group, or per-site using custom configurations

Custom Configurations

Custom alert configurations let you create your own rules or override the thresholds of a built-in configuration for a specific scope.

Navigate to Alerting → Alert Configurations and click Add Configuration.

Configuration Fields

| Field | Required | Description |
|---|---|---|
| Name | Yes | Display name for the configuration |
| Alert Type | Yes | metric, service, reachability, or interface |
| Metric | Conditional | The metric to evaluate (required for the metric type) |
| Condition | Yes | Comparison operator; see Conditions below |
| Warning Threshold | No | Value that triggers a warning alert |
| Critical Threshold | No | Value that triggers a critical alert |
| Duration | No | How long the condition must persist before the alert fires (default: immediate) |
| Scope | Yes | Where this configuration applies; see Scoping below |
| Escalation Team | No | Which escalation team handles notifications |
| Enabled | Yes | Whether the configuration is active (default: yes) |
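As an illustration of how these fields fit together, the Python sketch below models a configuration record and checks the rules the table states (for example, that a metric-type configuration must name a metric). The class, its field names, and the lowercase value sets are hypothetical, not Stratora's actual schema or API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical value sets mirroring the tables on this page; not Stratora's schema.
ALERT_TYPES = {"metric", "service", "reachability", "interface"}
CONDITIONS = {">", "<", "=", "!="}
SCOPES = {"global", "site", "group", "node"}


@dataclass
class AlertConfiguration:
    name: str
    alert_type: str
    condition: str
    scope: str
    metric: Optional[str] = None               # required only for the metric type
    warning_threshold: Optional[float] = None
    critical_threshold: Optional[float] = None
    duration_seconds: int = 0                  # 0 means "immediate"
    escalation_team: Optional[str] = None      # None means no notifications
    enabled: bool = True

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the record is valid."""
        problems = []
        if not self.name:
            problems.append("name is required")
        if self.alert_type not in ALERT_TYPES:
            problems.append(f"unknown alert type: {self.alert_type!r}")
        if self.alert_type == "metric" and not self.metric:
            problems.append("metric is required for metric-type configurations")
        if self.condition not in CONDITIONS:
            problems.append(f"unknown condition: {self.condition!r}")
        if self.scope not in SCOPES:
            problems.append(f"unknown scope: {self.scope!r}")
        if self.duration_seconds < 0:
            problems.append("duration cannot be negative")
        return problems


cfg = AlertConfiguration(name="High CPU (DB servers)", alert_type="metric",
                         condition=">", scope="group")
print(cfg.validate())  # ['metric is required for metric-type configurations']
```

Returning a list of problems rather than raising on the first one lets a form show every missing or invalid field at once.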

Alert Types

| Type | What It Monitors |
|---|---|
| Metric | A numeric metric value against a threshold (CPU, memory, disk, latency, etc.) |
| Service | Whether a Windows or Linux service is running or stopped |
| Reachability | Whether the node is reachable via ping or agent heartbeat |
| Interface | Whether a network interface is up or down |

Conditions

| Condition | Operator | Example |
|---|---|---|
| Greater than | > | CPU usage > 90% |
| Less than | < | Free disk space < 10 GB |
| Equals | = | Service state = stopped |
| Not equals | != | Interface status != up |
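The four conditions map directly onto ordinary comparison operators. This minimal sketch assumes the evaluator compares a current value against a threshold; the `CONDITION_OPERATORS` table and `condition_holds` function are hypothetical names, and the examples mirror the rows of the table.

```python
import operator

# Hypothetical mapping from the condition symbols to comparison functions.
CONDITION_OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
    "!=": operator.ne,
}


def condition_holds(value, condition, threshold):
    """Return True when `value <condition> threshold` is satisfied."""
    compare = CONDITION_OPERATORS.get(condition)
    if compare is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    return compare(value, threshold)


print(condition_holds(92.5, ">", 90))              # True: CPU usage > 90%
print(condition_holds(14, "<", 10))                # False: 14 GB free is not < 10 GB
print(condition_holds("stopped", "=", "stopped"))  # True: service state = stopped
print(condition_holds("down", "!=", "up"))         # True: interface status != up
```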

Duration

The duration field controls how long a condition must persist before an alert fires. This prevents transient spikes from generating noise.

  • Immediate (0 seconds): fires as soon as the condition is detected
  • 5 minutes: the metric, averaged over the trailing 5-minute window, must be above (or below, for a less-than condition) the threshold

The evaluator uses a rolling average over the duration window to smooth out momentary fluctuations.
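To see why averaging suppresses spikes, here is a minimal sketch of a rolling-average window, assuming one sample every 10 seconds and a window that ends at the newest sample. `RollingWindow` is a hypothetical helper, not part of Stratora.

```python
from collections import deque


class RollingWindow:
    """Average of the samples in the trailing window (newest_ts - duration_s, newest_ts].

    Hypothetical helper for illustration; duration_s = 0 keeps only the newest sample.
    """

    def __init__(self, duration_s):
        self.duration_s = duration_s
        self.samples = deque()  # (timestamp_seconds, value), oldest first

    def add(self, ts, value):
        # Evict samples at or before the window's start, then record the new one.
        cutoff = ts - self.duration_s
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()
        self.samples.append((ts, value))

    def average(self):
        return sum(v for _, v in self.samples) / len(self.samples)


spiky = [70] * 29 + [100]      # 29 normal CPU readings, then one 100% spike
five_min = RollingWindow(300)
immediate = RollingWindow(0)
for i, value in enumerate(spiky):
    ts = i * 10                # 10-second samples: t = 0, 10, ..., 290
    five_min.add(ts, value)
    immediate.add(ts, value)

print(five_min.average())      # 71.0: below an 80% warning threshold
print(immediate.average())     # 100.0: an immediate rule would see the spike
```

With 30 samples in the 5-minute window, a single 100% reading only lifts the average to 71%, so a rule with an 80% warning threshold and a 5-minute duration stays quiet, while an immediate rule reacts to the spike directly.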


Scoping

Every custom configuration has a scope that determines which nodes it applies to.

| Scope | Applies To |
|---|---|
| Global | All nodes in the system |
| Site | All nodes in a specific site |
| Node Group | All nodes in a specific node group |
| Node | A single node |

When multiple configurations match the same metric on a node (e.g., a global built-in and a node-level custom override), the most specific scope wins. A node-level configuration takes precedence over a group-level one, which takes precedence over a site-level one, which takes precedence over a global built-in.
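The precedence rule amounts to picking the matching configuration with the highest scope rank. In this sketch the rank values, class names, and ID strings are hypothetical. Ties between equally specific configurations (for example, a node in two groups that both have overrides) are resolved arbitrarily here, because this page does not specify that behavior.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical encoding of the precedence rule: a higher rank is more specific.
SCOPE_RANK = {"global": 0, "site": 1, "group": 2, "node": 3}


@dataclass
class ScopedConfig:
    name: str
    metric: str
    scope: str
    target: Optional[str] = None  # site, group, or node ID; None for global


@dataclass
class Node:
    node_id: str
    site: str
    groups: Tuple[str, ...] = ()


def applies_to(cfg: ScopedConfig, node: Node) -> bool:
    """Check whether a configuration's scope covers the given node."""
    if cfg.scope == "global":
        return True
    if cfg.scope == "site":
        return cfg.target == node.site
    if cfg.scope == "group":
        return cfg.target in node.groups
    if cfg.scope == "node":
        return cfg.target == node.node_id
    return False


def effective_config(configs: Sequence[ScopedConfig], node: Node,
                     metric: str) -> Optional[ScopedConfig]:
    """Pick the most specific configuration for this node and metric."""
    candidates = [c for c in configs if c.metric == metric and applies_to(c, node)]
    if not candidates:
        return None
    # max() keeps the first of equally ranked candidates (tie handling is unspecified).
    return max(candidates, key=lambda c: SCOPE_RANK[c.scope])


configs = [
    ScopedConfig("High CPU Usage (built-in)", "cpu_usage_percent", "global"),
    ScopedConfig("CPU - Site A", "cpu_usage_percent", "site", "site-a"),
    ScopedConfig("CPU - DB servers", "cpu_usage_percent", "group", "db-servers"),
]
db01 = Node("db01", site="site-a", groups=("db-servers",))
web01 = Node("web01", site="site-b")
print(effective_config(configs, db01, "cpu_usage_percent").name)   # CPU - DB servers
print(effective_config(configs, web01, "cpu_usage_percent").name)  # High CPU Usage (built-in)
```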


Template-Generated Configurations

Device templates can include their own alert rules. For example, the VMware vCenter template ships with rules for:

  • vCenter unreachable (100% packet loss for 3 minutes)
  • ESXi host high CPU (> 90% for 10 minutes)
  • ESXi host critical CPU (> 95% for 5 minutes)
  • ESXi host high memory (> 90% for 10 minutes)
  • VM high CPU / memory / disk latency
  • Datastore usage warnings

These template rules are applied automatically when a node uses the template — no manual configuration needed.


Evaluation

The alert evaluator runs on a 10-second cycle, checking every enabled configuration against current metric data.

For metric-type configurations, the evaluator:

  1. Queries the metric from VictoriaMetrics using a PromQL-compatible query
  2. Applies duration averaging if configured (e.g., avg_over_time over 5 minutes)
  3. For multi-instance metrics (disk volumes, network interfaces), evaluates the worst-value instance
  4. Compares the result against warning and critical thresholds
  5. Creates, updates, or resolves alerts based on the result
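Steps 1 through 4 can be sketched as follows. The query shapes (a bare selector, or avg_over_time over the duration window) are standard PromQL that VictoriaMetrics accepts, but the exact queries Stratora issues, the node label name, and the helper functions are assumptions. The sketch also assumes that "worst value" means the highest reading for a > condition and the lowest for a < condition.

```python
import operator

# Only ordering conditions make sense for numeric threshold checks.
_COMPARE = {">": operator.gt, "<": operator.lt}


def build_query(metric, duration_s, labels=None):
    """Build a PromQL query; wrap it in avg_over_time when a duration is set."""
    selector = metric
    if labels:
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        selector += "{" + pairs + "}"
    if duration_s > 0:
        return f"avg_over_time({selector}[{duration_s}s])"
    return selector


def worst_instance(values, condition):
    """Return the (instance, value) pair furthest into alert territory."""
    if condition not in _COMPARE:
        raise ValueError(f"unsupported condition for metric checks: {condition!r}")
    pick = max if condition == ">" else min
    instance = pick(values, key=values.get)
    return instance, values[instance]


def classify(value, condition, warning=None, critical=None):
    """Check the critical threshold first, then warning; otherwise report ok."""
    compare = _COMPARE[condition]
    if critical is not None and compare(value, critical):
        return "critical"
    if warning is not None and compare(value, warning):
        return "warning"
    return "ok"


print(build_query("disk_usage_percent", 0))
# disk_usage_percent
print(build_query("cpu_usage_percent", 300, {"node": "db01"}))
# avg_over_time(cpu_usage_percent{node="db01"}[300s])

volumes = {"C:": 62.0, "D:": 88.5, "E:": 41.0}  # per-volume disk usage, %
instance, value = worst_instance(volumes, ">")
print(instance, value, classify(value, ">", warning=80, critical=95))
# D: 88.5 warning
```

Checking the critical threshold before the warning threshold ensures that a value past both is reported at the higher severity.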

For service and interface configurations, the evaluator checks the current state directly.

For reachability configurations, the evaluator requires 3 consecutive failed checks (30 seconds) before firing to avoid false positives from momentary network blips.
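The consecutive-failure rule can be sketched as a counter that resets on any successful check. `ReachabilityTracker` is a hypothetical name; at a 10-second cycle, three failures correspond to the 30 seconds described above.

```python
class ReachabilityTracker:
    """Signal an alert only after `required_failures` consecutive failed checks."""

    def __init__(self, required_failures=3):
        self.required_failures = required_failures
        self.consecutive_failures = 0

    def record(self, reachable):
        """Record one check result; return True once the failure streak is long enough."""
        if reachable:
            self.consecutive_failures = 0  # any success breaks the streak
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.required_failures


tracker = ReachabilityTracker()
results = [False, False, True, False, False, False]  # one check per 10-second cycle
print([tracker.record(r) for r in results])
# [False, False, False, False, False, True]
```

The single successful check in the middle resets the count, so the two earlier failures never contribute to an alert.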

info

A 60-second resolution grace period applies to all configurations. Once a condition clears, the evaluator waits for 60 seconds of sustained normal readings before resolving the alert. This prevents alerts from rapidly flapping between active and resolved states.
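A minimal sketch of the grace period, assuming readings every 10 seconds and that the 60-second timer starts at the first normal reading after the condition clears (this page does not say exactly when the timer starts). `ResolutionGrace` is a hypothetical name.

```python
class ResolutionGrace:
    """Keep an alert active until readings stay normal for `grace_s` seconds."""

    def __init__(self, grace_s=60):
        self.grace_s = grace_s
        self.active = False
        self.normal_since = None  # timestamp of the first normal reading in a run

    def update(self, ts, condition_met):
        """Feed one reading; return whether the alert is still active."""
        if condition_met:
            self.active = True
            self.normal_since = None  # any bad reading restarts the grace timer
        elif self.active:
            if self.normal_since is None:
                self.normal_since = ts
            elif ts - self.normal_since >= self.grace_s:
                self.active = False
                self.normal_since = None
        return self.active


grace = ResolutionGrace()
readings = [(0, True), (10, False), (20, True)] + [(t, False) for t in range(30, 100, 10)]
print([ts for ts, met in readings if not grace.update(ts, met)])
# [90]
```

The normal reading at t = 10 does not resolve the alert because the condition returns at t = 20; the alert resolves only at t = 90, 60 seconds after readings turned normal at t = 30.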


Escalation Team Assignment

Each configuration can optionally be linked to an escalation team. When an alert fires from that configuration, the escalation team's notification steps are triggered automatically.

If no escalation team is assigned, the alert is still created and visible in the UI — it just won't generate notifications.
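The difference between a linked and an unlinked configuration can be sketched as follows. `AlertRouter`, `Alert`, and the team name are hypothetical; the list append stands in for whatever notification steps the escalation team defines.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Alert:
    config_name: str
    severity: str


@dataclass
class AlertRouter:
    """Every alert is recorded; only alerts with a linked team produce notifications."""
    visible: List[Alert] = field(default_factory=list)
    notifications: List[Tuple[str, Alert]] = field(default_factory=list)

    def fire(self, alert: Alert, escalation_team: Optional[str] = None) -> None:
        self.visible.append(alert)  # always shown in the UI
        if escalation_team is not None:
            # Stand-in for triggering the team's notification steps.
            self.notifications.append((escalation_team, alert))


router = AlertRouter()
router.fire(Alert("High CPU Usage", "critical"), escalation_team="Infra On-Call")
router.fire(Alert("Low Disk Space", "warning"))  # no team linked
print(len(router.visible), len(router.notifications))  # 2 1
```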


Unified View

The Alert Configurations page shows both built-in and custom configurations in a single list. Each entry displays:

  • Source — whether the configuration is built-in (by Stratora) or custom (by a user)
  • Author — "Stratora" for built-in, or the username who created it for custom
  • Scope — global, site, group, or node
  • Thresholds — warning and critical values
  • Status — enabled or disabled
  • Escalation team — the linked team, if any