Alert Response Times

Stratora's alerting pipeline has well-defined latency characteristics at each stage. This page documents expected detection, notification delivery, and action-response times so operators can set accurate expectations for their environment.

Node Unreachable (Fast-Path)

The Node Unreachable alert detects total connectivity loss within as little as one evaluation cycle: a single 100% packet-loss sample fires the alert.

| Event | Expected Time |
| --- | --- |
| Detection | 20–30 seconds |
| Recovery (Stratora contribution) | ~40 seconds after node is genuinely reachable again |
| Recovery (Total, depending on boot speed) | 45–100 seconds |

The 20–30s range reflects timing alignment. In the worst case, a node goes down immediately after an evaluation cycle completes, which adds a full 10s wait for the next cycle plus up to 10s for Telegraf to collect and flush the ping data.

Reachability fast-path alerts (Node Unreachable, Agent Heartbeat Lost, Collector Offline) skip the standard 20-second resolution grace period because they already require a multi-cycle recovery streak before considering the node back. See Alert Configurations — Evaluation for details.
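
The fire-fast, recover-slowly behavior described above can be sketched as a small state machine. This is an illustrative sketch only, not Stratora's implementation: the class name `FastPathReachability` and the field `recovery_streak_required` are hypothetical, the streak length of 3 is an assumed value (the source only says "multi-cycle"), and treating any sample below 100% loss as healthy is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class FastPathReachability:
    """Tracks one node's fast-path alert state across evaluation cycles."""

    # Assumed value: the documentation says "multi-cycle" but gives no count.
    recovery_streak_required: int = 3
    firing: bool = False
    healthy_streak: int = 0

    def evaluate(self, packet_loss_pct: float) -> bool:
        """Feed one ping sample per evaluation cycle; return True while firing."""
        if packet_loss_pct >= 100.0:
            # A single total-loss sample fires the alert and resets any streak.
            self.firing = True
            self.healthy_streak = 0
        elif self.firing:
            # Recovery needs consecutive healthy samples; no extra grace period.
            self.healthy_streak += 1
            if self.healthy_streak >= self.recovery_streak_required:
                self.firing = False
                self.healthy_streak = 0
        return self.firing


state = FastPathReachability()
samples = [0.0, 100.0, 0.0, 0.0, 100.0, 0.0, 0.0, 0.0]
history = [state.evaluate(s) for s in samples]
print(history)
```

In this run the alert fires on the first 100% sample, a second total-loss sample resets the recovery streak, and the alert clears only after three consecutive healthy cycles.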

High Packet Loss (Threshold Alert)

The High Packet Loss alert uses a rolling-average packet-loss percentage over a sliding window. It is a secondary signal for partial degradation — the Node Unreachable alert above is the primary indicator for complete connectivity loss.

| Event | Expected Time |
| --- | --- |
| Detection (sustained packet loss) | ~60 seconds |
| Recovery detection (from stable ping) | 80–110 seconds |
| Recovery detection (from power-on) | ~120–150 seconds (boot + window decay + grace alignment) |

Detection requires sustained loss. A single dropped packet will not trigger an alert. Loss must exceed the configured threshold (default: 5% warning, 20% critical) over the full evaluation window (default: 60 seconds).
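
To illustrate why a single dropped packet does not trigger the alert, here is a minimal sketch of a rolling-average check over a sliding window. The defaults (60-second window, 5% warning, 20% critical) come from this page; the class name `PacketLossWindow`, the 10-second sample spacing, and the rule that no result is returned until a full window of data exists are assumptions made for the example, not a description of Stratora's internals.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class PacketLossWindow:
    """Rolling-average packet loss over a sliding time window (hypothetical helper)."""

    def __init__(self, window_s: float = 60.0,
                 warning_pct: float = 5.0, critical_pct: float = 20.0) -> None:
        self.window_s = window_s
        self.warning_pct = warning_pct
        self.critical_pct = critical_pct
        self.samples: Deque[Tuple[float, float]] = deque()
        self.first_seen: Optional[float] = None

    def add(self, ts: float, loss_pct: float) -> Optional[str]:
        """Record one sample; return 'warning', 'critical', or None."""
        if self.first_seen is None:
            self.first_seen = ts
        self.samples.append((ts, loss_pct))
        # Drop samples that have aged out of the window.
        while self.samples and self.samples[0][0] <= ts - self.window_s:
            self.samples.popleft()
        # Assumption: stay silent until a full window of history exists.
        if ts - self.first_seen < self.window_s:
            return None
        avg = sum(loss for _, loss in self.samples) / len(self.samples)
        if avg > self.critical_pct:
            return "critical"
        if avg > self.warning_pct:
            return "warning"
        return None


# One transient 20% sample among healthy ones averages below the warning level.
spike = PacketLossWindow()
spike_results = [spike.add(t, 20.0 if t == 30 else 0.0) for t in range(0, 70, 10)]

# Sustained 10% loss crosses the warning threshold once the window is full.
sustained = PacketLossWindow()
sustained_results = [sustained.add(t, 10.0) for t in range(0, 70, 10)]
print(spike_results[-1], sustained_results[-1])
```

A shorter `window_s` would surface sustained loss sooner, at the cost of reacting to brief spikes.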

The 20-second resolution grace period applies to packet-loss threshold alerts and most other configurations.

Alert Evaluation

Alert rules are evaluated on a configurable interval (default: 10 seconds). After a node transitions to offline/degraded, the alert rule evaluator must complete its next cycle before the alert fires and enters the notification pipeline.

Notification Delivery

Notification delivery time depends on the channel:

| Channel | Typical Delivery |
| --- | --- |
| Email | 5-30 seconds (dependent on mail relay) |
| Slack | 1-5 seconds |
| Microsoft Teams | 1-5 seconds |
| Webhook | 1-3 seconds |
| SMS (Twilio - bidirectional) | 5-15 seconds |
| SMS (Twilio - polling mode) | 15-45 seconds (dependent on poll interval) |
| Voice (Twilio) | 10-30 seconds (call setup) |

ACK / Escalate Response (SMS Polling Mode)

In air-gapped or outbound-only deployments, Stratora uses Twilio Sync polling to receive ACK and ESCALATE replies sent via SMS. The poll interval is configurable (default: 15 seconds).

| Action | Response Latency (polling mode) |
| --- | --- |
| ACK via SMS reply | Up to 1x poll interval (default: 15s or less) |
| ESCALATE via SMS reply | Up to 1x poll interval (default: 15s or less) |

In bidirectional mode (internet-accessible deployments), Twilio delivers inbound SMS replies directly to Stratora via webhook. ACK/Escalate response latency drops to approximately 1-3 seconds.
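
The "up to 1x poll interval" bound in polling mode follows from simple arithmetic: a reply waits only until the next poll. The sketch below assumes polls run at exact multiples of the interval starting from t = 0 and ignores processing time. The function name `polling_ack_latency` is hypothetical.

```python
import math


def polling_ack_latency(reply_at_s: float, poll_interval_s: float = 15.0) -> float:
    """Seconds a reply waits before the next poll picks it up.

    Assumes polls fire at 0, poll_interval_s, 2 * poll_interval_s, and so on.
    """
    next_poll_s = math.ceil(reply_at_s / poll_interval_s) * poll_interval_s
    return next_poll_s - reply_at_s


# A reply just after a poll waits almost the full interval;
# a reply that lands exactly on a poll is picked up immediately.
just_after_poll = polling_ack_latency(1.0)
on_poll = polling_ack_latency(15.0)
print(just_after_poll, on_poll)
```

Under these assumptions the latency never exceeds the configured poll interval, so lowering the interval directly lowers the worst-case ACK/Escalate delay.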

End-to-End Example

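For a node that goes offline and is detected through the 60-second loss evaluation window (the Node Unreachable fast-path detects total loss sooner, in 20–30 seconds):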

  1. 0s - Node stops responding to ICMP ping
  2. ~50s - Loss threshold exceeded; node marked offline
  3. ~60s - Alert rule fires on next evaluation cycle
  4. ~61-65s - Slack/Teams notification delivered
  5. ~70-90s - Email delivered (mail relay dependent)
  6. ~75-80s - SMS delivered (Twilio bidirectional)
  7. ~80-95s - SMS delivered (Twilio polling mode)

Note: The end-to-end times above reflect typical conditions. High-latency mail relays, Twilio rate limits, or large escalation team fan-out may add several seconds.
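
As a rough cross-check, end-to-end bounds can be estimated by adding each channel's delivery range from the Notification Delivery table to the time the alert fires (~60s in the example above). This is a sketch with hypothetical names (`ALERT_FIRES_AT_S`, `DELIVERY_S`, `end_to_end`). Because it uses the full table ranges, the results are outer bounds and will not match the typical figures in the timeline exactly.

```python
from typing import Dict, Tuple

# Time at which the alert rule fires in the example timeline (step 3).
ALERT_FIRES_AT_S = 60

# (min, max) delivery seconds per channel, copied from the delivery table.
DELIVERY_S: Dict[str, Tuple[int, int]] = {
    "Slack": (1, 5),
    "Microsoft Teams": (1, 5),
    "Webhook": (1, 3),
    "Email": (5, 30),
    "SMS (bidirectional)": (5, 15),
    "SMS (polling mode)": (15, 45),
    "Voice": (10, 30),
}


def end_to_end(alert_fires_at_s: int,
               delivery: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, int]]:
    """Return (earliest, latest) seconds from node failure to delivery per channel."""
    return {
        channel: (alert_fires_at_s + lo, alert_fires_at_s + hi)
        for channel, (lo, hi) in delivery.items()
    }


timeline = end_to_end(ALERT_FIRES_AT_S, DELIVERY_S)
for channel, (lo, hi) in timeline.items():
    print(f"{channel}: ~{lo}-{hi}s")
```

Swapping `ALERT_FIRES_AT_S` for a fast-path detection time gives the corresponding bounds for a Node Unreachable alert.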

Tuning

The ping detection window and thresholds are configurable. Defaults are tuned to balance detection speed against the risk of spurious alerts:

| Parameter | Default | Notes |
| --- | --- | --- |
| Loss evaluation window | 60 seconds | Shorter = faster detection, more sensitive to transient loss |
| Alert threshold (warning) | 5% | |
| Alert threshold (critical) | 20% | |
| Twilio poll interval | 15 seconds | Polling mode only |