ceph.rados.core_workflows module

core_workflows is a RADOS-layer configuration module for Ceph clusters. It supports various day-1 and day-2 operations, such as:

1. Creating, modifying, setting, getting, writing, scrubbing, and reading various pools, such as EC and replicated pools
2. Increasing and decreasing PG counts, and enabling, disabling, and configuring the modules that do this
3. Enabling logging to file, setting and resetting config params, and running cluster checks
4. Setting up email alerts and other cluster operations

More operations will be added as needed.

class ceph.rados.core_workflows.RadosOrchestrator(node: CephAdmin)

Bases: object

RadosOrchestrator contains methods that perform various day-1 and day-2 operations on the cluster.

Usage: the class is initialized with the CephAdmin object used for the various operations.

autoscaler_pool_settings(**kwargs)

Sets various PG autoscaler options on pools.

Parameters: kwargs – various kwargs to be sent

Supported kwargs:
1. pg_autoscale_mode: PG autoscaler mode for the individual pool. Values: on, warn, off. (str)
2. target_size_ratio: ratio of the cluster that the pool will utilize. Values: 0 - 1. (float)
3. target_size_bytes: size the pool is assumed to utilize, e.g. 10T. (str)
4. pg_num_min: minimum number of PGs for a pool. (int)

Returns:
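
The supported kwargs listed above can be checked before they are handed to autoscaler_pool_settings(). The following is a minimal sketch, not part of the module: build_autoscaler_kwargs is a hypothetical helper, and the documentation above does not say how the target pool is named, so that argument is deliberately left out.

```python
from typing import Any, Dict

# Values documented for pg_autoscale_mode.
ALLOWED_MODES = {"on", "warn", "off"}
# Option names documented for autoscaler_pool_settings().
SUPPORTED_OPTIONS = {
    "pg_autoscale_mode",
    "target_size_ratio",
    "target_size_bytes",
    "pg_num_min",
}


def build_autoscaler_kwargs(**options: Any) -> Dict[str, Any]:
    """Hypothetical helper: validate documented options and return them as a dict."""
    unknown = set(options) - SUPPORTED_OPTIONS
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")
    mode = options.get("pg_autoscale_mode")
    if mode is not None and mode not in ALLOWED_MODES:
        raise ValueError(f"pg_autoscale_mode must be one of {sorted(ALLOWED_MODES)}")
    ratio = options.get("target_size_ratio")
    if ratio is not None and not 0 <= ratio <= 1:
        raise ValueError("target_size_ratio must be between 0 and 1")
    return dict(options)
```

The resulting dict could then be unpacked into the call, for example rados_obj.autoscaler_pool_settings(**kwargs), where rados_obj is assumed to be a RadosOrchestrator instance.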

bench_read(pool_name: str, **kwargs) → bool

Triggers read operations via the rados bench tool.

Parameters:
pool_name – pool on which the operation will be performed
kwargs – any other params that need to be passed
rados_read_duration – duration of the read operation (int)

Returns: True -> pass, False -> fail

bench_write(pool_name: str, **kwargs) → bool

Triggers write operations via the rados bench tool.

Parameters:
pool_name – pool on which the operation will be performed
kwargs – any other params that need to be passed
1. rados_write_duration – duration of the write operation (int)
2. byte_size – size of the objects to be written, e.g. 10KB, 4096 (str)

Returns: True -> pass, False -> fail
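
The two bench methods can be chained into a simple write-then-read check. This is a sketch under stated assumptions: write_then_read is a hypothetical helper, rados_obj is any object exposing bench_write() and bench_read() as documented above, and whether the written objects remain available for the read pass depends on how bench_write invokes the tool, which is not stated here.

```python
def write_then_read(rados_obj, pool_name: str, duration: int = 30,
                    byte_size: str = "4096") -> bool:
    """Hypothetical helper: run a bench write, then a bench read, on one pool.

    Returns True only if both documented methods report success.
    """
    wrote = rados_obj.bench_write(
        pool_name=pool_name,
        rados_write_duration=duration,
        byte_size=byte_size,
    )
    if not wrote:
        # Skip the read pass when the write pass already failed.
        return False
    return bool(rados_obj.bench_read(pool_name=pool_name,
                                     rados_read_duration=duration))
```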

change_osd_state(action: str, target: int) → bool

Changes the state of the OSD daemons according to the action provided.

Parameters:
action – operation to be performed on the service, i.e. start, stop, restart
target – ID of the target OSD

Returns: Pass -> True, Fail -> False

change_recover_threads(config: dict, action: str)

Increases or decreases the recovery threads based on the action sent.

Parameters:
config – config from the suite file for the run
action – whether to set or remove the increase in backfill / recovery threads

Values:
“set” -> set the threads to the specified value
“rm” -> remove the config changes made
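
Because "set" and "rm" form a pair, one way to make sure the change is always undone is a context manager. This is a minimal sketch: boosted_recovery is a hypothetical name, and rados_obj is assumed to be an object exposing change_recover_threads(config, action) as documented above.

```python
from contextlib import contextmanager


@contextmanager
def boosted_recovery(rados_obj, config: dict):
    """Hypothetical helper: raise recovery threads for the duration of a block."""
    rados_obj.change_recover_threads(config=config, action="set")
    try:
        yield
    finally:
        # Always remove the config changes, even if the block raised.
        rados_obj.change_recover_threads(config=config, action="rm")
```

A test step that triggers backfill could then be wrapped in `with boosted_recovery(rados_obj, config):` so the cluster returns to its previous settings afterwards.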

check_compression_size(pool_name: str, **kwargs) → bool

Checks the given pool size against “compression_required_ratio” and verifies that data is compressed in accordance with the ratio provided.

Parameters:
pool_name – name of the pool
kwargs – additional params needed

Allowed values:
compression_required_ratio: ratio set on the pool for compression

Returns: True -> pass, False -> fail

collect_osd_daemon_ids(osd_node) → dict

Collects the various OSD daemons present on a particular node.

Parameters: osd_node – name of the OSD node on which OSD daemon details are collected (ceph.ceph.CephNode)

Returns: list of OSD IDs

configure_pg_autoscaler(**kwargs) → bool

Configures the PG autoscaler as a global parameter and on pools.

Parameters: kwargs – any other params that need to be set

mon_target_pg_per_osd -> sets the target number of PGs per OSD

pool_config -> config to be changed on the given pool (dict)
For supported args, see the autoscaler_pool_settings() documentation.

pg_autoscale_value -> mode of PG autoscaling to be set, if a pool name is provided (str)
The allowed values are:
1. off -> turns off the PG autoscaler on the given pool
2. warn -> displays warnings in ceph status, but does not trigger autoscaling
3. on -> automatically autoscales based on the PG count in the pool

default_mode -> default mode to be set for all newly created pools on the cluster (str)
The allowed values are:
1. off -> turns off the PG autoscaler on the given pool
2. warn -> displays warnings in ceph status, but does not trigger autoscaling
3. on -> automatically autoscales based on the PG count in the pool

Returns: True -> pass, False -> fail

create_erasure_pool(name: str, **kwargs) → bool

Creates an erasure code profile and then creates a pool with that profile.

References: https://docs.ceph.com/en/latest/rados/operations/erasure-code/

Parameters:
name – name of the profile to create
kwargs – any other params that need to be set in the EC profile

k -> the number of data chunks (int)

m -> the number of coding chunks (int)

l -> Group the coding and data chunks into sets of size locality.

crush-failure-domain -> CRUSH object to be used to store replica sets (str)

plugin -> plugin to be set (str)
supported plugins: 1. jerasure (default) 2. isa 3. lrc 4. shec 5. clay

pool_name -> pool name to create and associate with the EC profile being created

Returns: True -> pass, False -> fail
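
One practical detail of the kwargs above is that crush-failure-domain contains hyphens, so it cannot be written as a Python keyword argument and has to be passed through dict unpacking. The sketch below illustrates this; ec_profile_kwargs and storage_overhead are hypothetical helpers, and the "host" failure domain is only an example value. The overhead formula (k + m) / k is the usual raw-to-usable ratio for a profile with k data chunks and m coding chunks.

```python
from typing import Any, Dict

# Plugins listed as supported in the documentation above.
SUPPORTED_PLUGINS = ("jerasure", "isa", "lrc", "shec", "clay")


def ec_profile_kwargs(pool_name: str, k: int, m: int,
                      failure_domain: str = "host",
                      plugin: str = "jerasure") -> Dict[str, Any]:
    """Hypothetical helper: assemble kwargs for create_erasure_pool()."""
    if k < 1 or m < 1:
        raise ValueError("k and m must be positive integers")
    if plugin not in SUPPORTED_PLUGINS:
        raise ValueError(f"plugin must be one of {SUPPORTED_PLUGINS}")
    return {
        "pool_name": pool_name,
        "k": k,
        "m": m,
        "plugin": plugin,
        # Hyphenated key: only reachable via ** unpacking.
        "crush-failure-domain": failure_domain,
    }


def storage_overhead(k: int, m: int) -> float:
    """Raw capacity consumed per unit of user data for a k+m profile."""
    return (k + m) / k
```

With a RadosOrchestrator instance named rados_obj (an assumption), the call would look like rados_obj.create_erasure_pool(name="ec_profile", **ec_profile_kwargs("ec_pool", 4, 2)).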

create_pool(pool_name: str, **kwargs) → bool

Create a pool named from the pool_name parameter.

Args:

pool_name: name of the pool being created
kwargs: any other args that need to be passed

pg_num -> number of PGs and PGPs

ec_profile_name -> name of the EC profile, if the pool being created is an EC pool

min_size -> minimum replication size for the pool to serve data

size -> minimum replication size for the pool to write data

erasure_code_use_overwrites -> allows overwrites in an erasure coded pool

allow_ec_overwrites -> This lets RBD and CephFS store their data in an erasure coded pool

disable_pg_autoscale -> sets auto-scale mode off on the pool

crush_rule -> custom crush rule for the pool

pool_quota -> limit the maximum number of objects or the maximum number of bytes stored

Returns: True -> pass, False -> fail

detete_pool(pool: str) → bool

Deletes the given pool from the cluster.

Parameters: pool – name of the pool to be deleted

Returns: True -> pass, False -> fail
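
Pool creation and deletion pair naturally for short-lived test pools. The following sketch assumes an object rados_obj exposing create_pool() and detete_pool() with the signatures documented above; temporary_pool is a hypothetical helper and not part of the module.

```python
from contextlib import contextmanager


@contextmanager
def temporary_pool(rados_obj, pool_name: str, **pool_kwargs):
    """Hypothetical helper: create a pool, yield its name, delete it on exit."""
    if not rados_obj.create_pool(pool_name=pool_name, **pool_kwargs):
        raise RuntimeError(f"could not create pool {pool_name}")
    try:
        yield pool_name
    finally:
        # The method name is spelled detete_pool in the module.
        rados_obj.detete_pool(pool=pool_name)
```

For example, `with temporary_pool(rados_obj, "scratch", pg_num=16) as pool:` would run the body against the pool and remove it afterwards, even if the body fails.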

disable_configuration_checks(configs: list) → bool

Disables checks for the configs provided.

Note: once the module is enabled, all the config checks are enabled by default.

Parameters: configs – list of config checks that need to be disabled (list)

Returns: True -> Pass, False -> fail

enable_balancer(**kwargs) → bool

Enables the balancer module with the given mode.

Parameters: kwargs – any other args that need to be passed

Supported kwargs:

balancer_mode: the balancer mode to use (str). There are currently two supported balancer modes:
-> crush-compat
-> upmap (default)

target_max_misplaced_ratio: the percentage of PGs that the balancer is allowed to misplace (float)
target_max_misplaced_ratio = .07

sleep_interval: number of seconds to sleep in between runs (int)
sleep_interval = 60

Returns: True -> pass, False -> fail

enable_configuration_checks(configs: list) → bool

Enables checks for the configs provided.

Note: once the module is enabled, all the config checks are enabled by default.

Parameters: configs – list of config checks that need to be enabled (list)

Returns: True -> Pass, False -> fail

enable_email_alerts(**kwargs) → bool

Enables the email alerts module and configures alerts to be sent.

References: https://docs.ceph.com/en/latest/mgr/alerts/

Parameters: kwargs – any other params that need to be set

The args that can be passed are:
1. smtp_host
2. smtp_sender
3. smtp_ssl
4. smtp_port
5. interval
6. smtp_from_name
7. smtp_destination

Returns: True -> pass, False -> fail

enable_file_logging() → bool: Enables the cluster logging into files at var/log/ceph and checks file permissions Returns: True -> pass, False -> fail

fetch_host_node(daemon_type: str, daemon_id: Optional[str] = None)

Provides the Ceph cluster object for the given daemon.

Parameters:
daemon_type – type of daemon
Allowed values: alertmanager, crash, mds, mgr, mon, osd, rgw, prometheus, grafana, node-exporter
daemon_id – name of the daemon, ID in case of OSDs

Returns: ceph object for the node

get_cluster_date()

Used to get the osd parameter value.

Parameters: cmd – command that needs to be run on the container

Returns : string value

get_pg_acting_set(**kwargs) → list

Fetches the PG details of the given pool and returns the acting set of OSDs from a sample PG of the pool.

Parameters: kwargs – args that can be passed to fetch the acting set

pool_name: name of the pool for which one acting set of OSDs is needed
pg_num: PG whose acting set needs to be fetched
None: collects the acting set of the pool with ID 1

Returns: list of OSDs that are part of the acting set, e.g. [3,15,20]
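
The returned list can feed directly into change_osd_state(). The sketch below restarts the first OSD of a pool's acting set; restart_first_acting_osd is a hypothetical helper, and rados_obj is assumed to expose both methods with the signatures documented in this module.

```python
def restart_first_acting_osd(rados_obj, pool_name: str) -> bool:
    """Hypothetical helper: restart the first OSD in a pool's sample acting set."""
    acting_set = rados_obj.get_pg_acting_set(pool_name=pool_name)
    if not acting_set:
        # Nothing to act on if no acting set was returned.
        return False
    return bool(rados_obj.change_osd_state(action="restart",
                                           target=acting_set[0]))
```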

get_pool_property(pool, props)

Returns: key-value pair for the requested property

Note: trying to fetch the value of a property that has not been set will result in an error.

list_pools() → list: Collect the list of pools present on the cluster Returns: list of pool names

pool_inline_compression(pool_name: str, **kwargs) → bool

BlueStore supports inline compression using snappy, zlib, or lz4. This method sets the compression mode and other related configs.

Parameters:
pool_name – name of the pool on which compression needs to be enabled and configured
kwargs – various args that can be passed:

compression_mode: whether data in BlueStore is compressed is determined by the compression mode.
The modes are:
none: Never compress data.
passive: Do not compress data unless the write operation has a compressible hint set.
aggressive: Compress data unless the write operation has an incompressible hint set.
force: Try to compress data no matter what.

compression_algorithm: compression algorithm to be used.
Supported:
<empty string>
snappy
zlib
zstd
lz4

compression_required_ratio: the ratio of the size of the data chunk after compression.
e.g. 0.7

compression_min_blob_size: chunks smaller than this are never compressed.
e.g. 10B

compression_max_blob_size: chunks larger than this value are broken into smaller blobs.
e.g. 10G

Returns: Pass -> True, Fail -> False
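
The modes and algorithms listed above can be validated before calling pool_inline_compression(). This is a minimal sketch with a hypothetical helper name, compression_kwargs; it only checks the values documented here and passes the ratio through unchanged.

```python
from typing import Any, Dict, Optional

# Values taken from the documentation above.
COMPRESSION_MODES = ("none", "passive", "aggressive", "force")
COMPRESSION_ALGORITHMS = ("", "snappy", "zlib", "zstd", "lz4")


def compression_kwargs(mode: str, algorithm: str,
                       required_ratio: Optional[float] = None) -> Dict[str, Any]:
    """Hypothetical helper: assemble kwargs for pool_inline_compression()."""
    if mode not in COMPRESSION_MODES:
        raise ValueError(f"compression_mode must be one of {COMPRESSION_MODES}")
    if algorithm not in COMPRESSION_ALGORITHMS:
        raise ValueError(
            f"compression_algorithm must be one of {COMPRESSION_ALGORITHMS}"
        )
    kwargs: Dict[str, Any] = {
        "compression_mode": mode,
        "compression_algorithm": algorithm,
    }
    if required_ratio is not None:
        kwargs["compression_required_ratio"] = required_ratio
    return kwargs
```

Assuming a RadosOrchestrator instance named rados_obj, the call would be rados_obj.pool_inline_compression(pool_name="test_pool", **compression_kwargs("force", "snappy", 0.7)).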

reweight_crush_items(**kwargs) → bool

Performs a re-weight of various CRUSH items, based on the key-value pairs sent.

Parameters: kwargs – arguments for the commands

Returns: True -> pass, False -> fail

run_ceph_command(cmd: str, timeout: int = 300)

Runs the ceph command with the JSON format tag for the action specified; otherwise treats the action as the command and returns formatted output.

Parameters:
cmd – command that needs to be run
timeout – maximum time allowed for execution

Returns: dictionary of the output

run_deep_scrub(**kwargs)

Runs a deep scrub on the given OSD or on all OSDs.

Args:
kwargs:
1. osd: if an OSD ID is passed, the deep scrub is triggered on that OSD

eg: obj.run_deep_scrub(osd=3)

Returns: True -> pass, False -> fail

run_scrub(**kwargs)

Runs a scrub on the given OSD or on all OSDs.

Args:
kwargs:
1. osd: if an OSD ID is passed, the scrub is triggered on that OSD

eg: obj.run_scrub(osd=3)

Returns: True -> pass, False -> fail
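
When several specific OSDs need scrubbing, the two scrub methods can be driven from a small loop. This is a sketch only: scrub_osds is a hypothetical helper, and rados_obj is assumed to expose run_scrub() and run_deep_scrub() accepting the osd kwarg shown in the examples above.

```python
from typing import Dict, Iterable


def scrub_osds(rados_obj, osd_ids: Iterable[int], deep: bool = False) -> Dict[int, bool]:
    """Hypothetical helper: trigger a (deep) scrub per OSD and collect the results."""
    runner = rados_obj.run_deep_scrub if deep else rados_obj.run_scrub
    # Map each OSD ID to the pass/fail value reported by the scrub method.
    return {osd: bool(runner(osd=osd)) for osd in osd_ids}
```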

set_cluster_configuration_checks(**kwargs) → bool

Sets up Cephadm to periodically scan each of the hosts in the cluster and to understand the state of the OS, disks, NICs, etc.

Reference: https://docs.ceph.com/en/latest/cephadm/operations/#cluster-configuration-checks

Parameters: kwargs – any other params that need to be passed

The allowed configuration values that can be sent are:
1. disable_check_list – list of config checks that need to be disabled (list)
2. enable_check_list – list of config checks that need to be enabled (list)

The supported config checks are:
1. kernel_security – checks that SELINUX/AppArmor profiles are consistent across cluster hosts
2. os_subscription – checks that subscription states are consistent for all cluster hosts
3. public_network – checks that all hosts have a NIC on the Ceph public_network
4. osd_mtu_size – checks that OSD hosts share a common MTU setting
5. osd_linkspeed – checks that OSD hosts share a common linkspeed
6. network_missing – checks that the cluster/public networks defined exist on the Ceph hosts
7. ceph_release – checks for Ceph version consistency; ceph daemons should be on the same release
8. kernel_version – checks that the MAJ.MIN of the kernel on Ceph hosts is consistent

Returns: True -> pass, False -> fail
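
Since the check names form a fixed set, the enable and disable lists can be validated before the call. The sketch below is an illustration with a hypothetical helper, config_check_kwargs; it rejects unknown check names and names that appear in both lists.

```python
from typing import Dict, Iterable, List

# Check names taken from the documentation above.
KNOWN_CHECKS = frozenset({
    "kernel_security", "os_subscription", "public_network", "osd_mtu_size",
    "osd_linkspeed", "network_missing", "ceph_release", "kernel_version",
})


def config_check_kwargs(enable: Iterable[str] = (),
                        disable: Iterable[str] = ()) -> Dict[str, List[str]]:
    """Hypothetical helper: assemble kwargs for set_cluster_configuration_checks()."""
    enable, disable = list(enable), list(disable)
    unknown = (set(enable) | set(disable)) - KNOWN_CHECKS
    if unknown:
        raise ValueError(f"unknown config checks: {sorted(unknown)}")
    overlap = set(enable) & set(disable)
    if overlap:
        raise ValueError(f"checks both enabled and disabled: {sorted(overlap)}")
    kwargs: Dict[str, List[str]] = {}
    if enable:
        kwargs["enable_check_list"] = enable
    if disable:
        kwargs["disable_check_list"] = disable
    return kwargs
```

With a RadosOrchestrator instance named rados_obj (an assumption), this would be used as rados_obj.set_cluster_configuration_checks(**config_check_kwargs(disable=["osd_mtu_size"])).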

set_pool_property(pool, props, value)

Used to set a given property on the pool.

Parameters:
pool – name of the pool
props – property to be set on the pool

Allowed values : size|min_size|pg_num|pgp_num|crush_rule|hashpspool|nodelete|nopgchange|nosizechange| write_fadvise_dontneed|noscrub|nodeep-scrub|hit_set_type|hit_set_period|hit_set_count| hit_set_fpp|use_gmt_hitset|target_max_objects|target_max_bytes|cache_target_dirty_ratio| cache_target_dirty_high_ratio|cache_target_full_ratio|cache_min_flush_age|cache_min_evict_age| erasure_code_profile|min_read_recency_for_promote|all|min_write_recency_for_promote|fast_read| hit_set_grade_decay_rate|hit_set_search_last_n|scrub_min_interval|scrub_max_interval| deep_scrub_interval|recovery_priority|recovery_op_priority|scrub_priority|compression_mode| compression_algorithm|compression_required_ratio|compression_max_blob_size| compression_min_blob_size|csum_type|csum_min_block|csum_max_block|allow_ec_overwrites| fingerprint_algorithm|pg_autoscale_mode|pg_autoscale_bias|pg_num_min|target_size_bytes| target_size_ratio|dedup_tier|dedup_chunk_algorithm|dedup_cdc_chunk_size

Parameters: value – value to be set for the property

Returns: Pass -> True, Fail -> False
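
A common pattern is to set a property and then read it back with get_pool_property(). The sketch below makes one assumption that the documentation does not confirm: that the key-value pair returned by get_pool_property() is a dict keyed by the property name. set_and_verify is a hypothetical helper.

```python
def set_and_verify(rados_obj, pool: str, prop: str, value) -> bool:
    """Hypothetical helper: set a pool property and confirm the stored value."""
    if not rados_obj.set_pool_property(pool=pool, props=prop, value=value):
        return False
    # Assumption: the result is a dict such as {prop: stored_value}.
    current = rados_obj.get_pool_property(pool=pool, props=prop)
    # Compare as strings so that 2 and "2" are treated as equal.
    return str(current.get(prop)) == str(value)
```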

verify_ec_overwrites(**kwargs) → bool

Creates an RBD image on an EC pool with overwrites enabled and a replicated metadata pool.

Parameters: kwargs – various kwargs to be sent

Supported kwargs:

image_name : name of the RBD image

image_size : size of the RBD image

Returns: True -> pass, False -> fail

verify_reweight(affected_osds: list, osd_info: list) → bool

Verifies whether the re-weight of various CRUSH items reduced the data on the re-weighted OSDs.

Parameters:
affected_osds – OSDs whose weights were changed
osd_info – OSD details before the re-weight was performed

Returns: Pass -> True, Fail -> False