
Stratora Architecture

Overview

Stratora is a distributed infrastructure monitoring platform designed for manufacturing IT environments and MSPs. It follows a central-server architecture with remote collectors and agents, deployed on-premises to support OT/IT network segmentation.


Component Diagram


Communication Protocols

| Flow | Protocol | Endpoint | Interval | Auth |
|------|----------|----------|----------|------|
| Collector config pull | HTTPS GET | /api/v1/remote-collectors/{id}/config | 10s | X-API-Key (sk_stra_) |
| Collector status report | HTTPS POST | /api/v1/remote-collectors/{id}/report | On config change | X-API-Key |
| Agent heartbeat | HTTPS POST | /api/v1/agents/heartbeat/{node_id} | 10s | X-API-Key |
| Metric ingest | HTTPS POST | /api/v1/ingest/influx (InfluxDB line protocol) | 10s flush | X-API-Key |
| Component enrollment | HTTPS POST | /api/v1/components/register | Once (at install) | Enrollment token |
| Discovery scans | Internal (server goroutine) | N/A | On demand | N/A |
| Frontend API | HTTPS | /api/v1/* | On demand | Session cookie / JWT |
| PromQL queries | Internal HTTP | VictoriaMetrics :8428/api/v1/query* | On demand | N/A (localhost) |
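
To make the table concrete, here is a minimal sketch of how a collector or agent might build two of these requests in Go. The function names, the example server URL, and the empty heartbeat body are assumptions for illustration; the source specifies only the method, path, and X-API-Key header.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newConfigPullRequest builds the collector's periodic config pull (HTTPS GET).
// The function name and parameters are hypothetical.
func newConfigPullRequest(serverURL, collectorID, apiKey string) (*http.Request, error) {
	endpoint := strings.TrimRight(serverURL, "/") +
		"/api/v1/remote-collectors/" + url.PathEscape(collectorID) + "/config"
	req, err := http.NewRequest(http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-API-Key", apiKey)
	return req, nil
}

// newHeartbeatRequest builds the agent heartbeat (HTTPS POST).
// The heartbeat payload is not described in the source, so the body is left empty here.
func newHeartbeatRequest(serverURL, nodeID, apiKey string) (*http.Request, error) {
	endpoint := strings.TrimRight(serverURL, "/") +
		"/api/v1/agents/heartbeat/" + url.PathEscape(nodeID)
	req, err := http.NewRequest(http.MethodPost, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-API-Key", apiKey)
	return req, nil
}

func main() {
	req, err := newConfigPullRequest("https://stratora.example.local", "c-123", "sk_stra_example")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

A real client would send these with an http.Client on the 10s interval listed above and handle non-2xx responses.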

Enrollment Flow

Key properties:

  • Enrollment token is multi-use — same token deploys unlimited collectors/agents
  • Re-enrollment rotates the API key without creating duplicates
  • Each component (collector or agent) gets its own independent sk_stra_ API key
  • Enrollment tokens are SHA-256 hashed; component API keys are bcrypt-hashed
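
The hashing and key properties above can be sketched with the Go standard library. This is an illustration under assumptions, not the backend's actual code: the function names, the 32-byte key length, and the hex encoding are hypothetical. The bcrypt step for component API keys is omitted because bcrypt lives outside the standard library (golang.org/x/crypto/bcrypt).

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

const apiKeyPrefix = "sk_stra_"

// hashEnrollmentToken returns the hex SHA-256 digest that would be stored
// instead of the plaintext enrollment token.
func hashEnrollmentToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// verifyEnrollmentToken compares a presented token against a stored digest
// in constant time.
func verifyEnrollmentToken(presented, storedHash string) bool {
	got := hashEnrollmentToken(presented)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}

// newAPIKey generates a fresh per-component key with the sk_stra_ prefix.
// Key length and encoding are assumptions for this sketch.
func newAPIKey() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return apiKeyPrefix + hex.EncodeToString(buf), nil
}

func main() {
	stored := hashEnrollmentToken("example-enrollment-token")
	fmt.Println(verifyEnrollmentToken("example-enrollment-token", stored))
	key, err := newAPIKey()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(key))
}
```

Because the token is multi-use, the same stored digest would verify every enrollment, while each call to newAPIKey yields an independent key per component.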

Config Delivery Flow

Dirty flag triggers:

  • Node created, updated, deleted, or reassigned
  • Credential updated, attached, or detached
  • Device template reloaded
  • Server startup (all online collectors marked dirty)

Debounce: 5-second window batches rapid changes into a single config regeneration.
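
A minimal sketch of the dirty flag and debounce logic follows. It assumes a trailing debounce, where each new change restarts the 5-second window; the source does not say whether the window is measured from the first or the last change. The type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

const debounceWindow = 5 * time.Second

// dirtyFlag mirrors the idea of a per-collector dirty marker with a timestamp.
type dirtyFlag struct {
	dirty      bool
	lastChange time.Time
}

// Mark records a change (node edit, credential change, template reload, ...).
func (f *dirtyFlag) Mark(now time.Time) {
	f.dirty = true
	f.lastChange = now
}

// Due reports whether the window has elapsed since the most recent change.
func (f *dirtyFlag) Due(now time.Time) bool {
	return f.dirty && now.Sub(f.lastChange) >= debounceWindow
}

// Clear resets the flag after the config has been regenerated.
func (f *dirtyFlag) Clear() { f.dirty = false }

// dueAt replays changes at the given second offsets and reports whether
// a regeneration would be due at checkSec.
func dueAt(changeSecs []int, checkSec int) bool {
	base := time.Unix(0, 0)
	var f dirtyFlag
	for _, s := range changeSecs {
		f.Mark(base.Add(time.Duration(s) * time.Second))
	}
	return f.Due(base.Add(time.Duration(checkSec) * time.Second))
}

func main() {
	fmt.Println(dueAt([]int{0, 1, 2}, 6)) // false: only 4s since the last change
	fmt.Println(dueAt([]int{0, 1, 2}, 7)) // true: 5s since the last change
}
```

Three changes within two seconds therefore produce one regeneration rather than three.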


Polling Intervals

| Target | Interval (default or exception) | Reason |
|--------|---------------------------------|--------|
| All collection (SNMP, ping, system) | 10s | Balance between freshness and load |
| QNAP NAS SNMP | 60s | Heavy MIB walks |
| Synology NAS SNMP | 60s | Heavy MIB walks |
| vCenter vSphere API | 300s | VMware API rate limits |
| SSL certificate checks | 300s | Certificates don't change frequently |
| Config debounce window | 5s | Batch rapid changes |
| Node unreachable threshold | 300s | Industry standard (5 min) |
| Alert evaluator cycle | 10s | Real-time alerting |
| Frontend refetch interval | 10s | Dashboard freshness |
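
One way to express the default-plus-exceptions rule for collection intervals is a lookup with a fallback. The target keys below are hypothetical identifiers chosen for this sketch; only the durations come from the table.

```go
package main

import (
	"fmt"
	"time"
)

// defaultInterval applies to all collection unless an exception is listed.
const defaultInterval = 10 * time.Second

// intervalExceptions holds the slower targets from the table.
// The map keys are illustrative names, not identifiers from the product.
var intervalExceptions = map[string]time.Duration{
	"qnap_nas_snmp":     60 * time.Second,
	"synology_nas_snmp": 60 * time.Second,
	"vcenter_vsphere":   300 * time.Second,
	"ssl_certificate":   300 * time.Second,
}

// pollInterval returns the exception for a target if one exists,
// otherwise the 10s default.
func pollInterval(target string) time.Duration {
	if d, ok := intervalExceptions[target]; ok {
		return d
	}
	return defaultInterval
}

func main() {
	fmt.Println(pollInterval("cisco_switch_snmp"))
	fmt.Println(pollInterval("vcenter_vsphere"))
}
```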

Windows Services

| Service Name | Binary | Purpose | Runs On |
|--------------|--------|---------|---------|
| StratoraBackend | backend.exe | Go/Gin API server | Server only |
| StratoraNginx | nginx.exe | HTTPS reverse proxy + static files | Server only |
| StratoraVictoriaMetrics | victoria-metrics.exe | Time-series database | Server only |
| postgresql-x64-17 | pg_ctl.exe | PostgreSQL metadata database | Server only |
| StratoraCollector | collector.exe | Manages collector Telegraf lifecycle | Server + remote collectors |
| stratora-collector-telegraf | telegraf.exe | SNMP/ICMP/vSphere polling | Server + remote collectors |
| StratoraAgent | stratora-agent.exe | Manages agent Telegraf lifecycle | All monitored Windows servers |
| stratora-agent-telegraf | telegraf.exe | Local system metrics (WMI) | All monitored Windows servers |

Directory Structure

| Path | Purpose |
|------|---------|
| C:\Program Files\Stratora\Collector\ | Collector binary + Telegraf binary |
| C:\Program Files\Stratora\Collector\telegraf\ | Telegraf executable for collector |
| C:\ProgramData\Stratora\Collector\config.json | Collector credentials (component_id, api_key, server_url) |
| C:\ProgramData\Stratora\Collector\telegraf\ | Server-generated Telegraf config (telegraf.conf) |
| C:\ProgramData\Stratora\Collector\logs\ | Collector + Telegraf logs (with rotation) |
| C:\Program Files\Stratora\Agent\ | Agent binary + Telegraf binary |
| C:\Program Files\Stratora\Agent\config.json | Agent credentials (component_id, api_key, server_url) |
| C:\ProgramData\Stratora\Agent\logs\ | Agent + Telegraf logs |

Health Status Model

| Status | Meaning | Set By | Trigger |
|--------|---------|--------|---------|
| Discovering | Node recently added or reassigned, awaiting first data | Node creation / collector reassignment | Automatic on create/reassign |
| Healthy | Node reachable, no threshold breaches | Alert evaluator | Successful data collection |
| Warning | Node reachable, metric threshold warning | Alert evaluator | Warning-level alert active |
| Critical | Node reachable, metric threshold critical | Alert evaluator | Critical-level metric alert active |
| Offline | Node unreachable for 5+ minutes | Alert evaluator | Reachability alert (node_unreachable, agent_heartbeat) |
| Maintenance | Planned maintenance window | Admin | Manual or scheduled |
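
The status model can be read as a precedence rule over a node's current conditions. The sketch below assumes one plausible ordering (Maintenance, then Offline, then Discovering, then Critical, Warning, Healthy); the source does not state the precedence, and the nodeState type and its fields are hypothetical.

```go
package main

import "fmt"

// nodeState is an illustrative summary of the inputs the alert evaluator
// and admin actions would provide.
type nodeState struct {
	inMaintenance bool // admin set a maintenance window
	unreachable   bool // reachability alert active (node_unreachable, agent_heartbeat)
	hasData       bool // first data received since create or reassign
	criticalAlert bool // critical-level metric alert active
	warningAlert  bool // warning-level alert active
}

// deriveStatus maps conditions to a single status using the assumed precedence.
func deriveStatus(s nodeState) string {
	switch {
	case s.inMaintenance:
		return "Maintenance"
	case s.unreachable:
		return "Offline"
	case !s.hasData:
		return "Discovering"
	case s.criticalAlert:
		return "Critical"
	case s.warningAlert:
		return "Warning"
	default:
		return "Healthy"
	}
}

func main() {
	fmt.Println(deriveStatus(nodeState{hasData: true, warningAlert: true, criticalAlert: true}))
	fmt.Println(deriveStatus(nodeState{}))
}
```

Under this ordering a node with both warning and critical alerts reports Critical, and a freshly created node with no data reports Discovering.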

Discovery Scans

Discovery scans run server-side only as goroutines using the gosnmp library. The scan pipeline:

  1. Admin creates a discovery job (target CIDR, SNMP community, scan options)
  2. Server spawns a background goroutine (runScan)
  3. Phase 1: ICMP/TCP ping sweep (optional)
  4. Phase 2: SNMP probe — queries sysDescr, sysObjectID, sysName, sysLocation, sysContact
  5. Phase 3: Template fingerprinting — matches sysObjectID against device_template_fingerprints
  6. Phase 4: DNS lookup (optional)
  7. Results stored in discovered_devices table
  8. Admin imports discovered devices into monitoring (auto-assigns first approved collector)

Current limitation: Remote collectors have no discovery capability, so subnets reachable only from a remote collector's network cannot be scanned. A planned enhancement would have the collector poll a discovery job queue and execute scans locally.
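
Phase 3 of the pipeline (template fingerprinting) can be illustrated with a longest-prefix match on sysObjectID. This is a sketch under assumptions: the source does not describe how device_template_fingerprints stores its patterns, so OID-prefix matching, the fingerprint type, and the template names are all hypothetical. The enterprise numbers 9 (Cisco) and 6574 (Synology) are real IANA assignments.

```go
package main

import (
	"fmt"
	"strings"
)

// fingerprint pairs an OID prefix with a template name (both illustrative).
type fingerprint struct {
	oidPrefix string
	template  string
}

var sampleFingerprints = []fingerprint{
	{"1.3.6.1.4.1.9", "cisco-generic"},
	{"1.3.6.1.4.1.9.1.1208", "cisco-switch-example"},
	{"1.3.6.1.4.1.6574", "synology-nas"},
}

// normalizeOID trims whitespace and the optional leading dot SNMP tools emit.
func normalizeOID(oid string) string {
	return strings.TrimPrefix(strings.TrimSpace(oid), ".")
}

// matchTemplate returns the template whose prefix is the longest match on a
// whole-arc boundary, so "…1.9" does not match "…1.99".
func matchTemplate(sysObjectID string, fps []fingerprint) (string, bool) {
	oid := normalizeOID(sysObjectID)
	best, bestLen := "", -1
	for _, fp := range fps {
		p := normalizeOID(fp.oidPrefix)
		if oid == p || strings.HasPrefix(oid, p+".") {
			if len(p) > bestLen {
				best, bestLen = fp.template, len(p)
			}
		}
	}
	return best, bestLen >= 0
}

func main() {
	tmpl, ok := matchTemplate(".1.3.6.1.4.1.6574.1", sampleFingerprints)
	fmt.Println(tmpl, ok)
}
```

Devices that match no fingerprint would simply be stored in discovered_devices without a suggested template.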


Metric Ingest Path

All metrics flow through the authenticated backend ingest proxy. VictoriaMetrics is bound to 127.0.0.1:8428 and is not directly accessible from remote collectors.
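
For illustration, the payload sent to /api/v1/ingest/influx is InfluxDB line protocol: a measurement, optional comma-separated tags, one or more fields, and a nanosecond timestamp. The sketch below formats a single-field line and joins lines into a batch body. The function names and the sample measurement are hypothetical, and escaping of spaces, commas, and equals signs is deliberately omitted.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// formatLine renders one line-protocol record with a single float field.
// Tags are sorted by key for stable output. No escaping is performed.
func formatLine(measurement string, tags map[string]string, field string, value float64, tsNanos int64) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	b.WriteString(measurement)
	for _, k := range keys {
		b.WriteString("," + k + "=" + tags[k])
	}
	b.WriteString(" " + field + "=" + strconv.FormatFloat(value, 'f', -1, 64))
	b.WriteString(" " + strconv.FormatInt(tsNanos, 10))
	return b.String()
}

// buildBatch joins records into a newline-terminated request body,
// matching the idea of a periodic flush.
func buildBatch(lines []string) string {
	return strings.Join(lines, "\n") + "\n"
}

func main() {
	tags := map[string]string{"host": "srv01", "site": "plant1"}
	line := formatLine("cpu", tags, "usage_percent", 12.5, 1700000000000000000)
	fmt.Print(buildBatch([]string{line}))
}
```

The resulting body would be POSTed to the backend with the X-API-Key header; the backend, not the collector, forwards it to VictoriaMetrics on localhost.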


Key Database Tables

| Table | Purpose |
|-------|---------|
| nodes | All monitored devices (network, server, storage, etc.) |
| remote_components | Registered collectors and agents (enrollment, API keys, heartbeats) |
| collector_targets | Which nodes are assigned to which collector |
| collector_config_dirty | Dirty flag for config regeneration with debounce timestamp |
| collector_configs | Cached generated Telegraf configs per collector (hash-based) |
| alerts | Active and historical alerts with state tracking |
| alert_definitions | Built-in and custom alert rules with PromQL conditions |
| credentials | SNMP communities, WMI credentials (AES-256-GCM encrypted) |
| device_templates | Telegraf config templates per device type |
| device_template_fingerprints | sysObjectID patterns for auto-detection during discovery |
| discovery_jobs | Discovery scan jobs (target CIDR, status, results count) |
| discovered_devices | Devices found during discovery scans |

Authentication

User Authentication

  1. User submits credentials to POST /api/v1/auth/login
  2. Backend validates against local database or LDAP/Active Directory
  3. Session token stored in HTTP-only cookie
  4. Subsequent requests validated via middleware

Component Authentication (Collectors/Agents)

  1. X-API-Key: sk_stra_... header on every request
  2. Backend validates against remote_components.api_key_hash (bcrypt)
  3. Approval gate: unapproved collectors receive 403 Forbidden
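
The three steps above can be sketched as a small net/http middleware. Several things here are assumptions: the backend is described as Gin-based, but plain net/http is used to keep the example dependency-free; a SHA-256 digest stands in for the bcrypt comparison; 401 for a missing or unknown key is a guess, since the source only specifies 403 for unapproved collectors; and all type and function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"strings"
)

// component is an illustrative stand-in for a remote_components row.
type component struct {
	approved bool
}

// componentStore maps a key digest to its component.
// SHA-256 is used here only as a stdlib stand-in for bcrypt.
type componentStore map[string]component

func digest(apiKey string) string {
	sum := sha256.Sum256([]byte(apiKey))
	return hex.EncodeToString(sum[:])
}

// authorize returns the HTTP status the request should receive.
func authorize(store componentStore, apiKey string) int {
	if !strings.HasPrefix(apiKey, "sk_stra_") {
		return http.StatusUnauthorized
	}
	c, ok := store[digest(apiKey)]
	if !ok {
		return http.StatusUnauthorized
	}
	if !c.approved {
		return http.StatusForbidden // approval gate
	}
	return http.StatusOK
}

// requireComponent wraps a handler with the X-API-Key check.
func requireComponent(store componentStore, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if code := authorize(store, r.Header.Get("X-API-Key")); code != http.StatusOK {
			http.Error(w, http.StatusText(code), code)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	store := componentStore{digest("sk_stra_pending"): {approved: false}}
	fmt.Println(authorize(store, "sk_stra_pending"))
}
```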

Role-Based Access Control

| Role | Capabilities |
|------|--------------|
| Admin | Full access, user management, settings, credential management |
| Operator | Manage nodes, dashboards, alerts, attach credentials |
| Viewer | Read-only access to dashboards, nodes, and masked credentials |
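
A role check along these lines could be expressed as a permission lookup. The permission strings below are invented for this sketch, and the assumption that Operators also hold the Viewer's read permissions is not stated in the source.

```go
package main

import "fmt"

type role string

const (
	roleAdmin    role = "admin"
	roleOperator role = "operator"
	roleViewer   role = "viewer"
)

// rolePermissions lists illustrative permissions for the non-admin roles.
// Admin is handled separately because it has full access.
var rolePermissions = map[role]map[string]bool{
	roleOperator: {
		"nodes:read": true, "nodes:manage": true,
		"dashboards:read": true, "dashboards:manage": true,
		"alerts:manage":           true,
		"credentials:read_masked": true, "credentials:attach": true,
	},
	roleViewer: {
		"nodes:read":              true,
		"dashboards:read":         true,
		"credentials:read_masked": true,
	},
}

// can reports whether a role may perform an action.
func can(r role, action string) bool {
	if r == roleAdmin {
		return true
	}
	return rolePermissions[r][action]
}

func main() {
	fmt.Println(can(roleOperator, "credentials:attach"))
	fmt.Println(can(roleViewer, "nodes:manage"))
}
```

Unknown roles and unknown actions both resolve to false, since indexing a missing map key yields the zero value.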

Deployment Patterns

Single Server

All components on one machine. Suitable for small environments (< 100 nodes), development, and proof of concept.

Distributed Collectors

Backend centralized, collectors deployed at remote sites. Suitable for multi-site deployments, IT/OT network segmentation, and scale-out metric collection. Verified in production with DJN-DC-DEV01 (local) + DJN-DC-DEV04 (remote).

High Availability (Planned)

  • Backend clustering with shared PostgreSQL
  • VictoriaMetrics cluster mode
  • Load balancer for frontend

Technology Stack

| Layer | Technology | Why |
|-------|------------|-----|
| Backend | Go 1.24, Gin | Performance, concurrency, single binary |
| Frontend | React 19, TypeScript, Tailwind, Vite | Component model, ecosystem, React Query |
| Metrics DB | VictoriaMetrics | PromQL, compression, performance |
| Metadata DB | PostgreSQL | Reliability, JSONB support, migrations |
| Collection | Telegraf | Plugin ecosystem, SNMP support, stability |
| Reverse Proxy | NGINX | TLS termination, static file serving |
| Visualization | React Flow, Recharts | Topology maps, charts |