Alert Response Times
Stratora's alerting pipeline has well-defined latency characteristics at each stage. This page documents expected detection, notification delivery, and action-response times so operators can set accurate expectations for their environment.
Node Unreachable (Fast-Path)
The Node Unreachable alert can detect total connectivity loss within a single evaluation cycle: one sample showing 100% packet loss is enough to fire the alert.
| Event | Expected Time |
|---|---|
| Detection | 20–30 seconds |
| Recovery (Stratora contribution) | ~40 seconds after the node is genuinely reachable again |
| Recovery (total, depending on node boot speed) | 45–100 seconds |
The 20–30s range reflects timing alignment. In the worst case, the node goes down immediately after an evaluation cycle completes, so detection waits a full 10s for the next cycle plus up to 10s for Telegraf to collect and flush the ping data.
Reachability fast-path alerts (Node Unreachable, Agent Heartbeat Lost, Collector Offline) skip the standard 20-second resolution grace period because they already require a multi-cycle recovery streak before treating the node as recovered. See Alert Configurations — Evaluation for details.
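The fire-fast, recover-slow behaviour described above can be sketched as a small state machine. This is a minimal illustration, not Stratora's implementation: the class name `FastPathReachability` is hypothetical, the streak length of 4 cycles is an assumption (this page does not state it), and any sample below 100% loss is assumed to count as a reachable cycle.

```python
from dataclasses import dataclass

# ASSUMPTION: the page only says "multi-cycle recovery streak";
# 4 consecutive reachable cycles is a hypothetical value.
RECOVERY_STREAK = 4


@dataclass
class FastPathReachability:
    """Hypothetical sketch of a fast-path reachability alert."""
    firing: bool = False
    ok_streak: int = 0

    def evaluate(self, loss_pct: float) -> bool:
        """Feed one evaluation cycle's packet-loss sample; return firing state."""
        if loss_pct >= 100.0:
            # A single 100% loss sample fires (or keeps firing) immediately
            # and resets any recovery progress.
            self.firing = True
            self.ok_streak = 0
        elif self.firing:
            # ASSUMPTION: any sample below 100% loss counts as reachable.
            # No separate grace period: the streak itself guards recovery.
            self.ok_streak += 1
            if self.ok_streak >= RECOVERY_STREAK:
                self.firing = False
                self.ok_streak = 0
        return self.firing
```

Feeding the samples `[0, 100, 0, 0, 0, 0]` yields `[False, True, True, True, True, False]`: one bad sample fires the alert, and it clears only after four consecutive reachable cycles.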
High Packet Loss (Threshold Alert)
The High Packet Loss alert uses a rolling-average packet-loss percentage over a sliding window. It is a secondary signal for partial degradation — the Node Unreachable alert above is the primary indicator for complete connectivity loss.
| Event | Expected Time |
|---|---|
| Detection (sustained packet loss) | ~60 seconds |
| Recovery detection (from stable ping) | 80–110 seconds |
| Recovery detection (from power-on) | ~120–150 seconds (boot + window decay + grace alignment) |
Detection requires sustained loss. A single dropped packet will not trigger an alert. Loss must exceed the configured threshold (default: 5% warning, 20% critical) over the full evaluation window (default: 60 seconds).
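The rolling-average check can be sketched as follows. This is an illustrative model, not Stratora's code: the class name is hypothetical, and it assumes one loss sample per 10-second evaluation cycle and that no verdict is reached until a full window of history exists. The thresholds and window use the defaults stated above, and "exceed" is read as strictly greater than.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class RollingLossWindow:
    """Hypothetical sketch: rolling-average packet loss over a sliding window."""

    def __init__(self, window_s: float = 60.0,
                 warning_pct: float = 5.0, critical_pct: float = 20.0) -> None:
        self.window_s = window_s
        self.warning_pct = warning_pct
        self.critical_pct = critical_pct
        self.samples: Deque[Tuple[float, float]] = deque()  # (timestamp_s, loss_pct)
        self.first_ts: Optional[float] = None

    def add(self, ts: float, loss_pct: float) -> None:
        if self.first_ts is None:
            self.first_ts = ts
        self.samples.append((ts, loss_pct))
        # Evict samples that have slid out of the window; the sample just
        # appended is always inside it, so the deque never empties here.
        while self.samples[0][0] <= ts - self.window_s:
            self.samples.popleft()

    def severity(self, now: float) -> Optional[str]:
        # ASSUMPTION: require a full window of history before judging loss.
        if self.first_ts is None or now - self.first_ts < self.window_s:
            return None
        avg = sum(loss for _, loss in self.samples) / len(self.samples)
        if avg > self.critical_pct:
            return "critical"
        if avg > self.warning_pct:
            return "warning"
        return None


def run(losses):
    """Feed one sample every 10 s and return the verdict at the last sample."""
    w = RollingLossWindow()
    for i, loss in enumerate(losses):
        w.add(i * 10.0, loss)
    return w.severity((len(losses) - 1) * 10.0)
```

A single lossy sample (`run([0, 0, 10, 0, 0, 0, 0])`) averages to under 2% across the window and returns `None`, while sustained 25% loss for the whole window returns `"critical"`.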
The 20-second resolution grace period applies to packet-loss threshold alerts and to most other alert configurations.
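A resolution grace period can be modelled as a timer that starts when the alert condition first clears and restarts if the condition returns. The sketch below is a hypothetical illustration; the class name and the exact semantics (clear continuously for 20 seconds, timed from the first clear evaluation) are assumptions, not documented Stratora behaviour.

```python
from typing import Optional

GRACE_S = 20.0  # resolution grace period stated on this page


class GracefulResolver:
    """Hypothetical sketch: keep an alert open until its condition stays clear for GRACE_S."""

    def __init__(self, grace_s: float = GRACE_S) -> None:
        self.grace_s = grace_s
        self.firing = False
        self.clear_since: Optional[float] = None

    def update(self, now: float, condition_met: bool) -> bool:
        if condition_met:
            self.firing = True
            self.clear_since = None  # any relapse restarts the grace timer
        elif self.firing:
            if self.clear_since is None:
                self.clear_since = now  # first clear evaluation starts the timer
            if now - self.clear_since >= self.grace_s:
                self.firing = False
                self.clear_since = None
        return self.firing
```

With 10-second evaluations, a condition that clears at t=10s resolves at t=30s under this model; a relapse at t=20s pushes resolution out to t=50s.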
Alert Evaluation
Alert rules are evaluated on a configurable interval (default: 10 seconds). After a node transitions to offline or degraded, the alert fires and enters the notification pipeline only when the evaluator completes its next cycle, so this step can add up to one full interval (10 seconds at the default) of latency.
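The evaluation-alignment delay is simple arithmetic: the alert fires at the first cycle after the state change. The sketch assumes, hypothetically, that cycles run at t = 0, interval, 2 × interval, and so on, and that a cycle landing exactly on the transition does not yet see it.

```python
def next_evaluation(transition_s: float, interval_s: float = 10.0) -> float:
    """Time of the first evaluation cycle strictly after a state change.

    ASSUMPTION: cycles are aligned to multiples of interval_s starting at 0.
    """
    return (transition_s // interval_s + 1) * interval_s


def evaluation_delay(transition_s: float, interval_s: float = 10.0) -> float:
    """Extra latency added by waiting for the next cycle: in (0, interval_s]."""
    return next_evaluation(transition_s, interval_s) - transition_s
```

A transition at 9.5s is picked up at 10s (0.5s wait), while one at 10.1s waits until 20s (9.9s), which is the worst-case alignment described above.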
Notification Delivery
Notification delivery time depends on the channel:
| Channel | Typical Delivery |
|---|---|
| Email | 5-30 seconds (dependent on mail relay) |
| Slack | 1-5 seconds |
| Microsoft Teams | 1-5 seconds |
| Webhook | 1-3 seconds |
| SMS (Twilio - bidirectional) | 5-15 seconds |
| SMS (Twilio - polling mode) | 15-45 seconds (dependent on poll interval) |
| Voice (Twilio) | 10-30 seconds (call setup) |
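To estimate when a notification reaches someone, add the channel's typical delivery range to the time the alert fires. The sketch below copies the ranges from the table; the channel keys are hypothetical labels, not Stratora identifiers, and real delivery can fall outside these typical ranges.

```python
from typing import Dict, Tuple

# Typical delivery ranges in seconds, taken from the table above.
# Keys are hypothetical labels chosen for this sketch.
DELIVERY_S: Dict[str, Tuple[int, int]] = {
    "email": (5, 30),
    "slack": (1, 5),
    "teams": (1, 5),
    "webhook": (1, 3),
    "sms_bidirectional": (5, 15),
    "sms_polling": (15, 45),
    "voice": (10, 30),  # call setup
}


def arrival_window(fire_s: float, channel: str) -> Tuple[float, float]:
    """Earliest and latest typical arrival time for a channel, given the fire time."""
    lo, hi = DELIVERY_S[channel]
    return fire_s + lo, fire_s + hi
```

For an alert that fires at 60s, `arrival_window(60, "slack")` returns `(61, 65)`, matching the Slack/Teams entry in the End-to-End Example.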
ACK / Escalate Response (SMS Polling Mode)
In air-gapped or outbound-only deployments, Stratora uses Twilio Sync polling to receive ACK and ESCALATE replies sent via SMS. The poll interval is configurable (default: 15 seconds).
| Action | Response Latency (polling mode) |
|---|---|
| ACK via SMS reply | Up to one poll interval (at most 15s at the default) |
| ESCALATE via SMS reply | Up to one poll interval (at most 15s at the default) |
In bidirectional mode (internet-accessible deployments), Twilio delivers inbound SMS replies directly to Stratora via webhook. ACK/Escalate response latency drops to approximately 1-3 seconds.
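In polling mode, a reply waits until the next poll picks it up, which is why the worst-case latency equals the poll interval. The sketch below models only that timing; it assumes, hypothetically, that polls run at t = 0, T, 2T, and so on, and it does not call any Twilio API.

```python
import math


def processed_at(reply_s: float, poll_interval_s: float = 15.0) -> float:
    """Time of the first poll at or after a reply lands.

    ASSUMPTION: polls run at multiples of poll_interval_s starting at 0.
    """
    return math.ceil(reply_s / poll_interval_s) * poll_interval_s


def reply_latency(reply_s: float, poll_interval_s: float = 15.0) -> float:
    """Delay between an SMS reply arriving and Stratora seeing it (this model only)."""
    return processed_at(reply_s, poll_interval_s) - reply_s
```

An ACK that arrives 1s after a poll waits 14s for the next one; shortening the poll interval lowers this bound proportionally, at the cost of more frequent polling.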
End-to-End Example
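For a node that goes offline, timed through the packet-loss threshold path (the Node Unreachable fast path typically fires within 20–30 seconds):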
- 0s - Node stops responding to ICMP ping
- ~50s - Loss threshold exceeded; node marked offline
- ~60s - Alert rule fires on next evaluation cycle
- ~61-65s - Slack/Teams notification delivered
- ~70-90s - Email delivered (mail relay dependent)
- ~75-80s - SMS delivered (Twilio bidirectional)
- ~80-95s - SMS delivered (Twilio polling mode)
Note: The end-to-end times above reflect typical conditions. High-latency mail relays, Twilio rate limits, or fan-out to a large escalation team can add further delay.
Tuning
The ping detection window and thresholds are configurable. The defaults balance detection sensitivity against suppression of spurious alerts:
| Parameter | Default | Notes |
|---|---|---|
| Loss evaluation window | 60 seconds | Shorter = faster detection, more sensitive to transient loss |
| Alert threshold (warning) | 5% | |
| Alert threshold (critical) | 20% | |
| Twilio poll interval | 15 seconds | Polling mode only |
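The tunables above can be grouped into a single validated settings object. This is a hypothetical sketch: the class and field names are illustrative rather than Stratora's configuration keys, and the sanity checks (positive intervals, warning below critical) are assumptions about sensible values, not documented constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PingAlertTuning:
    """Hypothetical settings container; defaults taken from the table above."""
    loss_window_s: int = 60          # shorter = faster detection, noisier
    warning_pct: float = 5.0
    critical_pct: float = 20.0
    twilio_poll_interval_s: int = 15  # used only in SMS polling mode

    def __post_init__(self) -> None:
        # ASSUMPTION: these sanity checks are illustrative, not documented rules.
        if self.loss_window_s <= 0:
            raise ValueError("loss_window_s must be positive")
        if not 0 < self.warning_pct < self.critical_pct <= 100:
            raise ValueError("expected 0 < warning_pct < critical_pct <= 100")
        if self.twilio_poll_interval_s <= 0:
            raise ValueError("twilio_poll_interval_s must be positive")
```

For example, `PingAlertTuning(loss_window_s=30)` trades faster detection for more sensitivity to transient loss, while `PingAlertTuning(warning_pct=30.0)` is rejected because the warning threshold would sit above the critical one.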