Monitoring and Alerting

A production validator needs continuous monitoring, not just one-off diagnosis. This page covers the ongoing metrics-and-alerts setup; for one-time profiling of a slow node, see Performance and Profiling.

Prometheus metrics

CometBFT can export Prometheus metrics, but the exporter is disabled by default. Enable it in config.toml:

[instrumentation]
prometheus = true
prometheus_listen_addr = ":26660"

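The node then serves metrics over HTTP on that address for a Prometheus server to scrape. Prometheus stores the time series and evaluates your alerting rules against them, Alertmanager (or your alerting stack) routes the resulting notifications, and Grafana dashboards visualize the same data.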
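To see what the node actually exports, or to feed a small custom checker, you can read the scrape endpoint directly. The following is a minimal Python sketch under some assumptions: it uses the conventional /metrics path on the port configured above, and the metric names in the usage example (cometbft_consensus_height, cometbft_p2p_peers) assume the default cometbft namespace and may differ between CometBFT versions, so confirm them against your node's output. The parser handles only plain "name{labels} value" lines of the Prometheus text format and skips comments. It does not cover every edge case of the format.

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, whitespace, value.
# Any trailing timestamp is ignored. Simplified: label values containing
# "}" are not handled.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_metrics(text):
    """Return {metric_name: [(labels, value), ...]} from Prometheus text output."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = _SAMPLE.match(line)
        if m is None:
            continue
        name, labels, value = m.group(1), m.group(2) or "", m.group(3)
        samples.setdefault(name, []).append((labels, float(value)))
    return samples


def first_value(samples, name):
    """Value of the first sample for `name`, or None if the metric is absent."""
    values = samples.get(name)
    return values[0][1] if values else None


def fetch_metrics(url="http://localhost:26660/metrics", timeout=5.0):
    # Assumed URL: the port from prometheus_listen_addr plus the conventional
    # /metrics path. Adjust host and port for your node.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For example, first_value(fetch_metrics(), "cometbft_consensus_height") would return the node's current height, provided that metric name matches your version.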

What to alert on

The signals that matter most for a validator:

Missed blocks / not signing. The earliest warning that something is wrong; missing blocks over a sustained period leads to downtime slashing.
Block height stalled or falling behind peers. The node is stuck or out of sync.
Peer count dropping toward zero. Networking or connectivity failure.
Disk filling up. A pruned node still grows; running out of disk halts the node.
Sentry/validator connectivity. If you run a sentry architecture, a validator that loses its connections to its sentries is cut off from the network; see Validator Security.
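In a real deployment these conditions would normally be written as Prometheus alerting rules. The following Python sketch only makes the decision logic for the signals above concrete. The thresholds, the Snapshot structure, and its field names are hypothetical, so tune or replace them for your chain and hardware. missed_blocks is assumed to be the number of blocks missed since the previous poll, which you would compute as a difference between two polls of a cumulative counter. The stall check also assumes the poll interval is longer than a few block times. Sentry connectivity is left out because it depends on your topology.

```python
import shutil
from dataclasses import dataclass

# Hypothetical thresholds; tune them for your chain and hardware.
MAX_MISSED_BLOCKS = 5          # blocks missed between two polls
MAX_HEIGHT_LAG = 10            # blocks behind the best peer height
MIN_PEERS = 3
MIN_FREE_DISK_FRACTION = 0.10  # alert below 10% free space


@dataclass
class Snapshot:
    """One poll of the validator's health signals (hypothetical structure)."""
    missed_blocks: int          # blocks missed since the previous poll
    local_height: int           # this node's latest height
    prev_local_height: int      # this node's height at the previous poll
    peer_max_height: int        # highest height seen from peers or a reference node
    peers: int                  # current peer count
    disk_free_fraction: float   # free bytes / total bytes on the data volume


def evaluate(s):
    """Return the names of the alerts that would fire for this snapshot."""
    alerts = []
    if s.missed_blocks > MAX_MISSED_BLOCKS:
        alerts.append("missed_blocks")
    if s.local_height <= s.prev_local_height:
        alerts.append("height_stalled")
    if s.peer_max_height - s.local_height > MAX_HEIGHT_LAG:
        alerts.append("behind_peers")
    if s.peers < MIN_PEERS:
        alerts.append("low_peer_count")
    if s.disk_free_fraction < MIN_FREE_DISK_FRACTION:
        alerts.append("disk_filling")
    return alerts


def disk_free_fraction(path="/"):
    """Free-space fraction of the filesystem holding `path` (point it at the node's data dir)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

Keeping each check independent mirrors how separate alerting rules behave: a stalled node and a full disk fire as two distinct alerts rather than one combined status.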

Reference

One-time profiling and diagnosis: Performance and Profiling.
CometBFT instrumentation: the [instrumentation] section of config.toml.
