Caution
You're viewing documentation for an unstable version of Scylla Manager. Switch to the latest stable version.
Health Check¶
ScyllaDB Manager automatically adds three health check tasks to a cluster when the cluster is added to ScyllaDB Manager, and to existing clusters during the upgrade procedure. You can see these tasks by running the sctool tasks command.
For example:
sctool tasks -c prod-cluster
returns:
╭────────────────────────┬──────────────┬────────┬──────────────────┬─────────┬───────┬──────────────┬────────────┬─────────┬────────────────╮
│ Task │ Schedule │ Window │ Timezone │ Success │ Error │ Last Success │ Last Error │ Status │ Next │
├────────────────────────┼──────────────┼────────┼──────────────────┼─────────┼───────┼──────────────┼────────────┼─────────┼────────────────┤
│ healthcheck/cql │ @every 15s │ │ America/New_York │ 4 │ 0 │ 1s ago │ │ DONE │ in 13s │
│ healthcheck/alternator │ @every 15s │ │ America/New_York │ 3 │ 0 │ 14s ago │ │ RUNNING │ │
│ healthcheck/rest │ @every 1m0s │ │ America/New_York │ 1 │ 0 │ 1s ago │ │ DONE │ in 58s │
│ repair/all-weekly │ 0 23 * * SAT │ │ America/New_York │ 0 │ 0 │ │ │ NEW │ in 2d13h30m55s │
╰────────────────────────┴──────────────┴────────┴──────────────────┴─────────┴───────┴──────────────┴────────────┴─────────┴────────────────╯
The output lists three health check tasks:
Healthcheck (healthcheck/cql) - checks the ScyllaDB CQL interface every 15 seconds.
Healthcheck Alternator (healthcheck/alternator) - checks the ScyllaDB Alternator API every 15 seconds.
Healthcheck REST (healthcheck/rest) - checks the ScyllaDB REST API every minute.
ScyllaDB Health Check¶
The ScyllaDB health check task ensures that the CQL native port is accessible on all nodes. ScyllaDB Manager reads the CQL IP address and port from the node configuration and can automatically detect a TLS/SSL connection. There are two types of CQL health check: the credentials agnostic health check and the CQL query health check.
The results are available using the sctool status command.
For example:
sctool status -c prod-cluster
Datacenter: eu-west
╭────┬────────────┬───────────┬───────────┬───────────────┬──────────┬──────┬──────────┬────────┬──────────┬──────────────────────────────────────╮
│ │ Alternator │ CQL │ REST │ Address │ Uptime │ CPUs │ Memory │ Scylla │ Agent │ Host ID │
├────┼────────────┼───────────┼───────────┼───────────────┼──────────┼──────┼──────────┼────────┼──────────┼──────────────────────────────────────┤
│ UN │ UP (4ms) │ UP (3ms) │ UP (2ms) │ 34.203.122.52 │ 237h2m1s │ 4 │ 15.43GiB │ 4.1.0 │ 2.2.0 │ 8bfd18f1-ac3b-4694-bcba-30bc272554df │
│ UN │ UP (15ms) │ UP (11ms) │ UP (12ms) │ 10.0.138.46 │ 237h2m1s │ 4 │ 15.43GiB │ 4.1.0 │ 2.2.0 │ 238acd01-813c-4c55-bd65-5219bb19bc20 │
│ UN │ UP (17ms) │ UP (5ms) │ UP (7ms) │ 10.0.196.204 │ 237h2m1s │ 4 │ 15.43GiB │ 4.1.0 │ 2.2.0 │ bde4581a-b25e-49fc-8cd9-1651d7683f80 │
│ UN │ UP (10ms) │ UP (4ms) │ UP (5ms) │ 10.0.66.115 │ 237h2m1s │ 4 │ 15.43GiB │ 4.1.0 │ 2.2.0 │ 918a52aa-cc42-43a4-a499-f7b1ccb53b18 │
╰────┴────────────┴───────────┴───────────┴───────────────┴──────────┴──────┴──────────┴────────┴──────────┴──────────────────────────────────────╯
The status information is also available as a metric in the ScyllaDB Manager dashboard of ScyllaDB Monitoring.
The healthcheck task checks nodes every 15 seconds; the interval can be changed using the task-update command.
The CQL column shows the CQL status, an SSL indicator if SSL is enabled on the node, and the time the check took.
Available statuses are:
UP - Situation normal
DOWN - Failed to connect to host or CQL error
ERROR - Precondition failure, no request was sent
UNAUTHORISED - Wrong username or password (only if username is specified for the cluster)
TIMEOUT - Timeout
The REST column shows the status of communication between ScyllaDB Manager Server and the ScyllaDB API, and the time the check took.
Available statuses are:
UP - Situation normal
DOWN - Failed to connect to host
ERROR - Precondition failure, no request was sent
HTTP XXX - HTTP failure and its status code
UNAUTHORISED - Missing or incorrect authentication token
TIMEOUT - Timeout
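When scripting around sctool status, the table rows can be reduced to per-node check results. The following is a minimal Python sketch, assuming the output format shown above (cells separated by │, check cells shaped like STATUS (Nms)). The function name parse_status_rows is hypothetical and not part of sctool, and the exact rendering of the SSL indicator is not covered here.

```python
import re

# Matches cells such as "UP (4ms)" or "DOWN (0ms)".
# The status part may contain spaces, e.g. "HTTP 502".
CELL_RE = re.compile(r"^(?P<status>.+?)\s*\((?P<ms>\d+)ms\)$")


def parse_status_rows(output):
    """Return one dict per node row found in `sctool status` output."""
    nodes = []
    header = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("╭"):
            header = None  # a new table starts (one table per datacenter)
            continue
        if not line.startswith("│"):
            continue  # skip borders, "Datacenter:" lines, and error text
        cells = [cell.strip() for cell in line.strip("│").split("│")]
        if header is None:
            header = cells  # the first │-row of a table is the column header
            continue
        row = dict(zip(header, cells))
        checks = {}
        for name in ("Alternator", "CQL", "REST"):
            match = CELL_RE.match(row.get(name, ""))
            if match:
                checks[name] = (match.group("status"), int(match.group("ms")))
        nodes.append({"address": row.get("Address"), "checks": checks})
    return nodes
```

A caller could then list the nodes that need attention by keeping only rows where some check status differs from UP.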
Error information¶
Added in ScyllaDB Manager version 2.5.
If a check fails (status ERROR or DOWN), an additional Errors section below the table describes the errors.
sctool status -c test-cluster
Datacenter: eu-west
╭────┬────────────┬────────────┬──────────┬────────────────┬──────────┬──────┬──────────┬────────┬──────────┬──────────────────────────────────────╮
│ │ Alternator │ CQL │ REST │ Address │ Uptime │ CPUs │ Memory │ Scylla │ Agent │ Host ID │
├────┼────────────┼────────────┼──────────┼────────────────┼──────────┼──────┼──────────┼────────┼──────────┼──────────────────────────────────────┤
│ UN │ UP (12ms) │ DOWN (0ms) │ UP (3ms) │ 192.168.100.11 │ 1h32m35s │ 4 │ 31.11GiB │ 4.2.1 │ 2.5.0 │ 1edbfd5b-4b1c-4bb0-afab-d69fd25db6af │
│ UN │ UP (8ms) │ UP (3ms) │ UP (5ms) │ 192.168.100.12 │ 1h32m35s │ 4 │ 31.11GiB │ 4.2.1 │ 2.5.0 │ 0c0999a2-c879-4e69-9924-1641c8487bd5 │
│ UN │ UP (10ms) │ UP (8ms) │ UP (1ms) │ 192.168.100.13 │ 1h32m35s │ 4 │ 31.11GiB │ 4.2.1 │ 2.5.0 │ 73e9818e-ed8d-4ea8-89e4-cf485dfd4ebe │
╰────┴────────────┴────────────┴──────────┴────────────────┴──────────┴──────┴──────────┴────────┴──────────┴──────────────────────────────────────╯
Errors:
- 192.168.100.11 CQL: dial tcp 192.168.100.11:9042: connect: connection refused
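For alerting or log collection, the Errors section can be split into structured fields. This is a small Python sketch, assuming each error line has the shape shown above ("- ADDRESS CHECK: message"). The helper name parse_errors is hypothetical, and other line shapes are ignored rather than guessed at.

```python
import re

# "- <address> <check>: <message>", as in the example output above.
ERROR_RE = re.compile(
    r"^-\s+(?P<address>\S+)\s+(?P<check>[A-Za-z]+):\s+(?P<message>.*)$"
)


def parse_errors(output):
    """Return (address, check, message) tuples from the Errors section."""
    errors = []
    in_errors = False
    for line in output.splitlines():
        line = line.strip()
        if line == "Errors:":
            in_errors = True
            continue
        if not in_errors:
            continue
        match = ERROR_RE.match(line)
        if match:
            errors.append(
                (match.group("address"), match.group("check"), match.group("message"))
            )
        elif line:
            in_errors = False  # a non-error line ends the section
    return errors
```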
Node information¶
Added in ScyllaDB Manager version 2.2.
The node status check also provides additional columns that show properties of the available nodes:
CPUs - Total OS CPU count
Memory - Total OS memory available
Uptime - How long the system has been running without restarts
Scylla - Version of ScyllaDB server running on the node
Agent - Version of ScyllaDB Manager Agent running on the node
Host ID - UUID of the node
Address - IP address of the node
ScyllaDB Monitoring¶
If you have enabled the ScyllaDB Monitoring stack, the ScyllaDB Manager dashboard includes the same cluster status report. In addition, Prometheus Alertmanager has an alert that reports when a ScyllaDB node health check fails.
Credentials agnostic health check¶
ScyllaDB Manager does not require database credentials to work. The CQL health check sends a CQL OPTIONS frame and does not start a CQL session. This is simple and effective, but it does not test CQL all the way down. For that, consider upgrading to the CQL query health check.
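To illustrate what an OPTIONS-based probe involves, here is a minimal Python sketch. It assumes CQL native protocol v4 framing (a 9-byte header, OPTIONS opcode 0x05, SUPPORTED opcode 0x06) and a plain TCP connection without TLS. The function names are hypothetical, and this is not ScyllaDB Manager's actual implementation.

```python
import socket
import struct

PROTOCOL_VERSION = 0x04   # assumption: CQL native protocol v4
OPCODE_OPTIONS = 0x05     # request: ask the server which options it supports
OPCODE_SUPPORTED = 0x06   # response to OPTIONS
# Header layout: version, flags, stream id, opcode, body length (big-endian).
HEADER = struct.Struct(">BBhBi")


def build_options_frame(stream_id=0):
    """Build an OPTIONS request; it has an empty body, so length is 0."""
    return HEADER.pack(PROTOCOL_VERSION, 0x00, stream_id, OPCODE_OPTIONS, 0)


def is_supported_response(header_bytes):
    """True if the header is a response frame carrying a SUPPORTED message."""
    version, _flags, _stream, opcode, _length = HEADER.unpack(header_bytes)
    return bool(version & 0x80) and opcode == OPCODE_SUPPORTED


def recv_exact(sock, size):
    """Read exactly `size` bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before a full header arrived")
        buf += chunk
    return buf


def options_ping(host, port=9042, timeout=1.0):
    """Send OPTIONS and check the reply header; no session, no credentials."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_options_frame())
        header = recv_exact(sock, HEADER.size)
    return is_supported_response(header)
```

Because the exchange stops after the SUPPORTED header, the probe confirms that the port accepts connections and speaks CQL, but it never authenticates or runs a query, which matches the limitation described above.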
CQL query health check¶
Added in ScyllaDB Manager version 2.2.
You can specify the CQL username and password flags when adding a cluster to ScyllaDB Manager with the sctool cluster add command.
You can also add or change the credentials with the sctool cluster update command.
Once ScyllaDB Manager has CQL credentials for the cluster, each health check connects to every node and executes the query SELECT now() FROM system.local WHERE key='local'.
ScyllaDB Alternator Health Check¶
Added in ScyllaDB Manager version 2.2.
If Alternator is enabled, ScyllaDB Manager checks ScyllaDB Alternator API connectivity for all nodes in parallel. With ScyllaDB 4.0, it uses a simplified ping that checks whether the socket is open and responding. With ScyllaDB 4.1 and later, it queries the system table.
Please check the ScyllaDB Manager Config to adjust timeouts for your cluster.
ScyllaDB REST API Health Check¶
Checks ScyllaDB REST API connectivity by performing a single HTTP request-response cycle between ScyllaDB Manager Server and each ScyllaDB node, with all nodes checked in parallel.
Please check the ScyllaDB Manager Config to adjust timeouts for your cluster.
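The idea of one request-response cycle per node, run in parallel, can be sketched in Python as follows. This is an illustration only: the URLs are supplied by the caller and are not the endpoints ScyllaDB Manager uses, the names probe and check_all are hypothetical, and the mapping from failures to the status labels above (401 to UNAUTHORISED, a malformed URL to ERROR) is an assumption.

```python
import socket
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def probe(url, timeout=1.0):
    """Perform one HTTP GET and return (status, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = "UP" if resp.status == 200 else f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        # Assumed mapping: 401 -> UNAUTHORISED, other codes -> "HTTP XXX".
        status = "UNAUTHORISED" if err.code == 401 else f"HTTP {err.code}"
    except urllib.error.URLError as err:
        # Connect-phase failures arrive wrapped in URLError.
        status = "TIMEOUT" if isinstance(err.reason, socket.timeout) else "DOWN"
    except socket.timeout:
        status = "TIMEOUT"  # timed out while reading the response
    except OSError:
        status = "DOWN"
    except ValueError:
        status = "ERROR"  # malformed URL: no request was sent
    return status, time.monotonic() - start


def check_all(urls, probe_fn=probe, max_workers=16):
    """Probe every node concurrently; `urls` maps node address -> URL."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {addr: pool.submit(probe_fn, url) for addr, url in urls.items()}
        return {addr: future.result() for addr, future in futures.items()}
```

Passing the probe as a parameter keeps the parallel fan-out separate from the per-node check, so the same check_all helper could drive a different probe with its own timeout.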