The repair commands allow you to create and update repair tasks (ad-hoc or scheduled) and to change selected parameters while a repair is running.
sctool repair <subcommand> [global flags] [parameters]
Subcommands
Command | Usage
---|---
repair | Schedule a repair (ad-hoc or scheduled).
repair control | Change parameters while a repair is running.
repair update | Modify properties of an existing repair task.
The repair command allows you to schedule or run an ad-hoc cluster repair.
sctool repair --cluster <id|name> [--dc <list of glob patterns>] [--dry-run]
[--fail-fast] [--interval <time between task runs>] [--host <node IP>]
[--intensity <float>] [--keyspace <list of glob patterns>] [--parallel <integer>]
[--start-date <now+duration|RFC3339>]
[global flags]
In addition to Global flags, repair takes the following parameters:
-c, --cluster <id|name>
The cluster name. This is the name you assigned to the cluster when you created it with cluster add. You can see the cluster name and ID by running the sctool cluster list command.
--dc <list of glob patterns>
List of data centers to be repaired, separated by a comma. This can also include glob patterns.
The following syntax is supported:
* - matches any number of any characters, including none
? - matches any single character
[abc] - matches one character given in the bracket
[a-z] - matches one character from the range given in the bracket
Patterns are evaluated from left to right. If a pattern starts with !, it unselects items that were selected by previous patterns, e.g. a?,!aa selects ab but not aa.
Example
Given the following data centers: us-east-1, us-east-2, us-west-1, us-west-2.
Parameter | Selects
---|---
--dc 'us-east-1,us-west-2' | us-east-1, us-west-2
--dc 'us-east*' | us-east-1, us-east-2
--dc '*,!us-east*' | us-west-1, us-west-2
Default: everything - all data centers
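For example, using the prod-cluster cluster name from the examples further down this page, the following command limits the repair to the data centers matching the us-east* pattern:
sctool repair -c prod-cluster --dc 'us-east*'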
--dry-run
Validates and displays repair information without actually scheduling the repair. This allows you to see what would happen if the repair ran with the parameters you set.
Example
Given the following keyspaces:
system_auth
system_distributed
system_traces
test_keyspace_dc1_rf2, test_keyspace_dc1_rf3
test_keyspace_dc2_rf2, test_keyspace_dc2_rf3
test_keyspace_rf2, test_keyspace_rf3
The following command will run a repair on all keyspaces except for test_keyspace_dc1_rf2 in dry-run mode.
sctool repair --dry-run -K '*,!test_keyspace_dc1_rf2'
NOTICE: dry run mode, repair is not scheduled
Data Centers:
- dc1
- dc2
Keyspace: system_auth
(all tables)
Keyspace: system_distributed
(all tables)
Keyspace: system_traces
(all tables)
Keyspace: test_keyspace_dc1_rf3
(all tables)
Keyspace: test_keyspace_dc2_rf2
(all tables)
Keyspace: test_keyspace_dc2_rf3
(all tables)
Keyspace: test_keyspace_rf2
(all tables)
Keyspace: test_keyspace_rf3
(all tables)
Example with error
sctool repair -K 'system*.bla' --dry-run -c bla
NOTICE: dry run mode, repair is not scheduled
Error: API error (status 400)
{
"message": "no matching units found for filters, ks=[system*.*bla*]",
"trace_id": "b_mSOUoOSyqSnDtk9EANyg"
}
--fail-fast
Stops the repair process on the first error.
Default: False
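For example, to schedule a repair that stops as soon as the first error occurs:
sctool repair -c prod-cluster --fail-fast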
--host <node IP>
Address of the node to repair; you can use either an IPv4 or IPv6 address. Specifying the --host flag limits the repair to token ranges replicated by the given node. It can be used in conjunction with the --dc flag, in which case the node must belong to one of the specified datacenters.
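For example, to repair only the token ranges replicated by a single node (here the example node 34.203.122.52 used later on this page):
sctool repair -c prod-cluster --host 34.203.122.52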
--intensity <float>
How many token ranges per shard to repair in a single Scylla node at the same time. By default this is 1.
It can be a decimal between 0 and 1; in that case, the number of token ranges is that fraction of the number of shards. For Scylla clusters that do not support row-level repair (Scylla 2019 and earlier), it specifies the percentage of shards that can be repaired in parallel on a repair master node.
If you set it to 0, the number of token ranges is adjusted to the maximum supported by the node (see max_repair_ranges_in_parallel in the Scylla logs). Changing the intensity impacts repair granularity if you need to resume the repair; the higher the value, the more work is repeated on resume.
Default: 1
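For example, to reduce the repair load by repairing fewer token ranges at a time:
sctool repair -c prod-cluster --intensity 0.5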
--parallel <integer>
The maximum number of Scylla repair jobs that can run at the same time (on different token ranges and replicas). Each node can take part in at most one repair at any given moment. By default, the maximum possible parallelism is used. The effective parallelism depends on the keyspace replication factor (RF) and the number of nodes; it is calculated as: number of nodes / RF, e.g. for a 6 node cluster with RF=3 the maximum parallelism is 2.
Default: 0
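For example, to allow at most two Scylla repair jobs to run at the same time:
sctool repair -c prod-cluster --parallel 2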
-K, --keyspace <list of glob patterns>
A comma-separated list of glob patterns. The patterns match keyspaces and tables; when you write a pattern, separate the keyspace name from the table name with a dot (KEYSPACE.TABLE).
The following syntax is supported:
* - matches any number of any characters, including none
? - matches any single character
[abc] - matches one character given in the bracket
[a-z] - matches one character from the range given in the bracket
Patterns are evaluated from left to right. If a pattern starts with !, it unselects items that were selected by previous patterns, e.g. a?,!aa selects ab but not aa.
Example
Given the following tables:
shopping_cart.cart
orders.orders_by_date_2018_11_01
orders.orders_by_date_2018_11_15
orders.orders_by_date_2018_11_29
orders.orders_by_date_2018_12_06
Parameter | Selects
---|---
-K '*' | everything - all tables in all keyspaces
-K 'shopping_cart.*' | shopping_cart.cart
-K '*.cart' | shopping_cart.cart
-K 'orders.*2018_11*' | orders.orders_by_date_2018_11_01, orders.orders_by_date_2018_11_15, orders.orders_by_date_2018_11_29
-K 'orders.*2018_11_29' | orders.orders_by_date_2018_11_29
-K 'orders.*2018_11*,!orders.*2018_11_15' | orders.orders_by_date_2018_11_01, orders.orders_by_date_2018_11_29
Default: everything - all tables in all keyspaces
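For example, given the tables above, the following command repairs all orders tables except orders_by_date_2018_11_15:
sctool repair -c prod-cluster -K 'orders.*,!orders.*2018_11_15'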
--interval <time between task runs>
Amount of time after which a successfully completed task is run again. Supported time units are:
d - days
h - hours
m - minutes
s - seconds
Default: 0 (no interval)
Note
The task run date is aligned with the --start-date value. For example, if you select --interval 7d, the task runs weekly at the --start-date time.
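For example, the following command starts a repair of prod-cluster in an hour and repeats it every three days at that time:
sctool repair -c prod-cluster -s now+1h --interval 3d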
-s, --start-date <now+duration|RFC3339>
The date can be expressed relative to now or as an RFC3339 formatted string.
To run the task in 2 hours, use now+2h; supported units are:
h - hours
m - minutes
s - seconds
ms - milliseconds
If you want the task to start at a specific date, use an RFC3339 formatted string, e.g. 2018-01-02T15:04:05-07:00.
If you want the repair to start immediately, use the value now or skip this flag.
Default: now (start immediately)
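For example, to schedule a one-off repair that starts in two hours:
sctool repair -c prod-cluster -s now+2h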
-r, --num-retries <times to rerun a failed task>
Number of times a task is rerun after a failure. The task is rerun 10 minutes after a failure.
If the task still fails after all retries have been used, it will not run again until its next run scheduled according to the --interval parameter.
Note
If this is an ad-hoc repair, the task will not run again.
Default: 3
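For example, to allow a failed repair task to be retried up to five times:
sctool repair -c prod-cluster -r 5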
Repairs can be scheduled to run on selected keyspaces/tables, nodes, or datacenters. A scheduled repair runs at the time you set; if no time is given, it runs immediately. Repairs can run once, or repeatedly at a set interval.
In this example, you create a repair task for a cluster named prod-cluster. The task begins on May 2, 2019 at 3:04 PM. It repeats every week at this time. As there are no datacenters or keyspaces listed, all datacenters and all data in the specified cluster are repaired.
sctool repair -c prod-cluster -s 2019-05-02T15:04:05-07:00 --interval 7d
The system replies with a repair task ID. You can use this ID to change the start time, stop the repair, or cancel the repair.
repair/3208ff15-6e8f-48b2-875c-d3c73f545410
This example repairs all datacenters whose names start with dc-asia-, such as dc-asia-1.
The repair begins on September 15, 2018 at 7:00 PM (UTC-7) and runs every week.
sctool repair -c prod-cluster --dc 'dc-asia-*' -s 2018-09-15T19:00:05-07:00 --interval 7d
Using glob patterns gives you additional flexibility in selecting both keyspaces and tables. This example repairs all tables in the orders keyspace whose names start with the 2018_12_ prefix. The repair is scheduled to run on December 4, 2018 at 8:00 AM and runs every week after that.
sctool repair -c prod-cluster -K 'orders.2018_12_*' -s 2018-12-04T08:00:05-07:00 --interval 7d
You can limit the scope of a repair to token ranges replicated by a specific node by specifying the --host flag.
This is equivalent to running nodetool repair -full on that node.
If you want to recreate a node in a multi-DC cluster, you can repair within the local datacenter only.
To do that, specify the --dc flag pointing to the datacenter the node belongs to.
This example repairs the node with IP 34.203.122.52, which belongs to the eu-west datacenter, within that datacenter only.
sctool repair -c prod-cluster --host 34.203.122.52 --dc eu-west
The repair control command allows you to change repair parameters while a repair is running.
sctool repair control --cluster <id|name> [--intensity <float>] [--parallel <integer>]
In addition to Global flags, repair control takes the following parameters:
--intensity <float>
How many token ranges per shard to repair in a single Scylla node at the same time. By default this is 1.
It can be a decimal between 0 and 1; in that case, the number of token ranges is that fraction of the number of shards. For Scylla clusters that do not support row-level repair (Scylla 2019 and earlier), it specifies the percentage of shards that can be repaired in parallel on a repair master node.
If you set it to 0, the number of token ranges is adjusted to the maximum supported by the node (see max_repair_ranges_in_parallel in the Scylla logs). Changing the intensity impacts repair granularity if you need to resume the repair; the higher the value, the more work is repeated on resume.
Default: 1
--parallel <integer>
The maximum number of Scylla repair jobs that can run at the same time (on different token ranges and replicas). Each node can take part in at most one repair at any given moment. By default, the maximum possible parallelism is used. The effective parallelism depends on the keyspace replication factor (RF) and the number of nodes; it is calculated as: number of nodes / RF, e.g. for a 6 node cluster with RF=3 the maximum parallelism is 2.
Default: 0
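For example, to slow down a repair that is currently running on prod-cluster by lowering its intensity:
sctool repair control -c prod-cluster --intensity 0.5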
The repair update command allows you to modify the properties of an existing repair task.
sctool repair update <task_type/task_id> --cluster <id|name> [--dc <list of glob patterns>] [--dry-run]
[--fail-fast] [--interval <time between task runs>]
[--intensity <float>] [--keyspace <list of glob patterns>] [--parallel <integer>]
[--start-date <now+duration|RFC3339>]
[global flags]
In addition to Global flags, repair update takes the same parameters as the repair command.
This example updates the repair task 143d160f-e53c-4890-a9e7-149561376cfd, setting the intensity to 0 (the maximum supported by the nodes) to speed up the repair.
sctool repair update -c prod-cluster repair/143d160f-e53c-4890-a9e7-149561376cfd --intensity 0