Databend Metrics

Metrics are crucial for monitoring the performance and health of the system. Databend collects and stores two types of metrics in Prometheus format: Meta Metrics and Query Metrics. Meta Metrics are used for real-time monitoring and debugging of the Metasrv component, while Query Metrics are used for monitoring the performance of the Databend-query component.

You can access the metrics through a web browser using the following URLs:

Meta Metrics: http://<admin_api_address>/v1/metrics. Defaults to 0.0.0.0:28002/v1/metrics.
Query Metrics: http://<metric_api_address>/metrics. Defaults to 0.0.0.0:7070/metrics.
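Because both endpoints serve the Prometheus text format, they can also be read from a script. The following is a minimal sketch, not an official client: the function names are hypothetical, the parser assumes samples carry no trailing timestamp, and the address in the usage comment assumes a local deployment on the default port.

```python
from urllib.request import urlopen


def fetch_metrics_text(url: str, timeout: float = 5.0) -> str:
    """Download the raw Prometheus text exposition from a metrics endpoint."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text: str) -> dict:
    """Map each sample's 'name{labels}' string to its float value.

    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    Assumption: each sample line ends with its value (no timestamp).
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated field on the line.
        key, _, value = line.rpartition(" ")
        samples[key] = float(value)
    return samples


# Small inline sample so the parser can be tried without a running server.
SAMPLE = """\
# HELP metasrv_server_is_leader Whether or not this node is current leader.
# TYPE metasrv_server_is_leader gauge
metasrv_server_is_leader 1
metasrv_server_current_leader_id 2
"""

parsed = parse_samples(SAMPLE)

# Against a live node (assumed local address and default port):
#   parsed = parse_samples(fetch_metrics_text("http://127.0.0.1:28002/v1/metrics"))
```

The metric names actually exposed by a given Databend version may differ slightly from this sample, so check the endpoint output before relying on specific keys.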

tip

Alternatively, you can visualize the metrics using third-party tools. For information about supported tools and integration tutorials, refer to Monitor > Using 3rd-party Tools. When employing the Prometheus & Grafana solution, you can create dashboards using our provided dashboard templates, available here. For more details, check out the Prometheus & Grafana guide.

Meta Metrics

Here's a list of Meta metrics captured by Databend.

Server

These metrics describe the status of the metasrv. All these metrics are prefixed with metasrv_server_.

Name	Description	Type
current_leader_id	Current leader id of cluster, 0 means no leader.	Gauge
is_leader	Whether or not this node is current leader.	Gauge
node_is_health	Whether or not this node is healthy.	Gauge
leader_changes	Number of leader changes seen.	Counter
applying_snapshot	Whether or not statemachine is applying snapshot.	Gauge
proposals_applied	Total number of consensus proposals applied.	Gauge
last_log_index	Index of the last log entry.	Gauge
current_term	Current term.	Gauge
proposals_pending	Total number of pending proposals.	Gauge
proposals_failed	Total number of failed proposals.	Counter
watchers	Total number of active watchers.	Gauge

current_leader_id indicates the id of the current cluster leader; 0 means there is no leader. A cluster without a leader is unavailable.

is_leader indicates whether this metasrv is currently the leader of the cluster, and leader_changes shows the total number of leader changes since start. Frequent leader changes degrade the performance of metasrv and signal that the cluster is unstable.

node_is_health is 1 if and only if the node state is Follower or Leader; otherwise it is 0.

proposals_applied records the total number of applied write requests.

last_log_index records the index of the last entry appended to this Raft node's log, and current_term records the current term of the Raft node.

proposals_pending indicates how many proposals are currently queued to commit. A rising number of pending proposals suggests that client load is high or that the member cannot commit proposals.

proposals_failed shows the total number of failed write requests. Failures are normally related to one of two issues: temporary failures during a leader election, or longer downtime caused by a loss of quorum in the cluster.

watchers shows the current number of active watchers.
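The rules described above can be turned into a simple check over two consecutive scrapes. This is only an illustrative sketch: the function name, the dictionary layout (metric names with the metasrv_server_ prefix stripped), and the max_leader_changes threshold are assumptions, not part of Databend.

```python
from typing import Optional


def check_meta_node(now: dict, before: Optional[dict] = None,
                    max_leader_changes: int = 3) -> list:
    """Return human-readable warnings for one metasrv node.

    `now` and `before` map server metric names (prefix stripped) to the
    values seen in the current and the previous scrape.
    """
    warnings = []
    # 0 means no leader, and a cluster without a leader is unavailable.
    if now.get("current_leader_id", 0) == 0:
        warnings.append("no leader: cluster is unavailable")
    # node_is_health is 1 only when the node is Follower or Leader.
    if now.get("node_is_health", 0) != 1:
        warnings.append("node is neither Follower nor Leader")
    if before is not None:
        changes = now.get("leader_changes", 0) - before.get("leader_changes", 0)
        if changes > max_leader_changes:  # hypothetical threshold
            warnings.append("frequent leader changes: cluster may be unstable")
        if now.get("proposals_pending", 0) > before.get("proposals_pending", 0):
            warnings.append("pending proposals rising: high load or stalled commits")
        if now.get("proposals_failed", 0) > before.get("proposals_failed", 0):
            warnings.append("new failed proposals: check for election or quorum loss")
    return warnings


previous = {"current_leader_id": 1, "node_is_health": 1, "leader_changes": 2,
            "proposals_pending": 0, "proposals_failed": 0}
current = {"current_leader_id": 0, "node_is_health": 0, "leader_changes": 9,
           "proposals_pending": 5, "proposals_failed": 1}

for message in check_meta_node(current, previous):
    print(message)
```

In practice the same conditions are usually expressed as alerting rules in the monitoring system; the script form is mainly useful for ad hoc debugging.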

Raft Network

These metrics describe the network status of raft nodes in the metasrv. All these metrics are prefixed with metasrv_raft_network_.

Name	Description	Labels	Type
active_peers	Current number of active connections to peers.	id(node id),address(peer address)	Gauge
fail_connect_to_peer	Total number of failed connections to peers.	id(node id),address(peer address)	Counter
sent_bytes	Total number of sent bytes to peers.	to(node id)	Counter
recv_bytes	Total number of received bytes from peers.	from(remote address)	Counter
sent_failures	Total number of send failures to peers.	to(node id)	Counter
snapshot_send_success	Total number of successful snapshot sends.	to(node id)	Counter
snapshot_send_failures	Total number of snapshot send failures.	to(node id)	Counter
snapshot_send_inflights	Total number of inflight snapshot sends.	to(node id)	Gauge
snapshot_sent_seconds	Latency distribution of snapshot sends.	to(node id)	Histogram
snapshot_recv_success	Total number of successful snapshot receives.	from(remote address)	Counter
snapshot_recv_failures	Total number of snapshot receive failures.	from(remote address)	Counter
snapshot_recv_inflights	Total number of inflight snapshot receives.	from(remote address)	Gauge
snapshot_recv_seconds	Latency distribution of snapshot receives.	from(remote address)	Histogram

active_peers indicates the number of active connections between cluster members, and fail_connect_to_peer indicates the number of failed connections to peers. Both carry the labels id (node id) and address (peer address).

sent_bytes and recv_bytes record the bytes sent to and received from peers, and sent_failures records the number of failed sends to peers.

snapshot_send_success and snapshot_send_failures indicate the number of successful and failed snapshot sends. snapshot_send_inflights indicates the number of snapshot sends in flight: it is incremented by one each time a snapshot send starts and decremented by one when the send finishes.

snapshot_sent_seconds indicates the latency distribution of snapshot sends.

snapshot_recv_success and snapshot_recv_failures indicate the number of successful and failed snapshot receives. snapshot_recv_inflights indicates the number of snapshot receives in flight: it is incremented by one each time a snapshot receive starts and decremented by one when the receive finishes.

snapshot_recv_seconds indicates the latency distribution of snapshot receives.
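As a rough illustration of how these counters and histograms can be combined, the sketch below derives a snapshot failure ratio and a mean snapshot latency. The helper names are hypothetical; the mean relies on the standard Prometheus convention that a histogram exposes a _sum and a _count series alongside its buckets.

```python
from typing import Optional


def failure_ratio(success: float, failures: float) -> float:
    """Fraction of snapshot transfers that failed (0.0 when none were attempted)."""
    total = success + failures
    return failures / total if total else 0.0


def histogram_mean(sum_value: float, count_value: float) -> Optional[float]:
    """Mean observation of a Prometheus histogram, from its _sum and _count series."""
    return sum_value / count_value if count_value else None


# Example values as they might be read for one peer (made-up numbers):
#   snapshot_send_success = 9, snapshot_send_failures = 1
#   snapshot_sent_seconds_sum = 12.0, snapshot_sent_seconds_count = 4
send_failure_ratio = failure_ratio(9, 1)
mean_send_seconds = histogram_mean(12.0, 4)
```

Both values are cumulative since process start; computing them over the difference between two scrapes gives a view of recent behavior instead.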

Raft Storage

These metrics describe the storage status of raft nodes in the metasrv. All these metrics are prefixed with metasrv_raft_storage_.

Name	Description	Labels	Type
raft_store_write_failed	Total number of raft store write failures.	func(function name)	Counter
raft_store_read_failed	Total number of raft store read failures.	func(function name)	Counter

raft_store_write_failed and raft_store_read_failed indicate the total number of raft store write and read failures.

Meta Network

These metrics describe the network status of meta service in the metasrv. All these metrics are prefixed with metasrv_meta_network_.

Name	Description	Type
sent_bytes	Total number of bytes sent to meta gRPC clients.	Counter
recv_bytes	Total number of bytes received from meta gRPC clients.	Counter
inflights	Total number of inflight meta gRPC requests.	Gauge
req_success	Total number of successful requests from meta gRPC clients.	Counter
req_failed	Total number of failed requests from meta gRPC clients.	Counter
rpc_delay_seconds	Latency distribution of meta-service API calls, in seconds.	Histogram

Query Metrics

Here's a list of Query metrics captured by Databend.

Name	Type	Description	Labels
databend_cache_access_count	Counter	Number of cache accesses.	cache_name
databend_cache_hit_count	Counter	Counts the number of cache hits for different cache types.	cache_name
databend_cache_miss_count	Counter	Number of cache misses.	cache_name
databend_cache_miss_load_millisecond	Histogram	Distribution of cache miss load times.	cache_name
databend_cluster_discovered_node	Gauge	Reports information about discovered nodes exposed externally.	local_id, cluster_id, tenant_id, flight_address
databend_compact_hook_compaction_ms	Histogram	Histogram of the time spent on compaction operations.	operation
databend_compact_hook_execution_ms	Histogram	Distribution of execution time for compact hook operations.	operation: MergeInto, Insert
databend_fuse_block_index_read_bytes	Counter	Number of bytes read for block index.
databend_fuse_block_index_write_bytes_total	Counter	Total number of bytes written for index blocks.
databend_fuse_block_index_write_milliseconds	Histogram	Distribution of the time taken to write index blocks.
databend_fuse_block_index_write_nums_total	Counter	Total number of index blocks written.
databend_fuse_block_write_bytes	Counter	Total number of bytes written.
databend_fuse_block_write_millioseconds	Histogram	Distribution of time taken to write blocks.
databend_fuse_block_write_nums	Counter	Total number of blocks written.
databend_fuse_blocks_bloom_pruning_after	Counter	Number of blocks after executing block-level bloom pruning.
databend_fuse_blocks_bloom_pruning_before	Counter	Number of blocks before executing block-level bloom pruning.
databend_fuse_blocks_range_pruning_after	Counter	Number of blocks after executing block-level range pruning.
databend_fuse_blocks_range_pruning_before	Counter	Number of blocks before executing block-level range pruning.
databend_fuse_bytes_block_bloom_pruning_after	Counter	Data size in bytes after executing block-level bloom pruning.
databend_fuse_bytes_block_bloom_pruning_before	Counter	Data size in bytes before executing block-level bloom pruning.
databend_fuse_bytes_segment_range_pruning_after	Counter	Data size in bytes after executing segment-level range pruning.
databend_fuse_bytes_segment_range_pruning_before	Counter	Data size in bytes before executing segment-level range pruning.
databend_fuse_commit_aborts	Counter	Number of times commit aborted due to errors.
databend_fuse_commit_copied_files	Counter	Total number of files copied during commit operations.
databend_fuse_commit_milliseconds	Counter	Total time taken for commit mutations.
databend_fuse_commit_mutation_modified_segment_exists_in_latest	Counter	Counts the existence of modified segments in the latest commit mutation.
databend_fuse_commit_mutation_success	Counter	Number of successful mutations committed.
databend_fuse_commit_mutation_unresolvable_conflict	Counter	Number of times unresolvable commit conflicts occurred.
databend_fuse_compact_block_build_lazy_part_milliseconds	Histogram	Distribution of the time spent building the lazy part during compaction.
databend_fuse_compact_block_build_task_milliseconds	Histogram	Distribution of the time spent building the compact block.
databend_fuse_compact_block_read_bytes	Counter	Cumulative size of blocks read during compaction, in bytes.
databend_fuse_compact_block_read_milliseconds	Histogram	Histogram of time spent reading blocks during compaction.
databend_fuse_compact_block_read_nums	Counter	Counts the number of blocks read during compaction.
databend_fuse_pruning_milliseconds	Histogram	Time spent on pruning segments.
databend_fuse_remote_io_deserialize_milliseconds	Histogram	Time spent on decompressing and deserializing raw data into DataBlocks.
databend_fuse_remote_io_read_bytes	Counter	Cumulative number of bytes read from object storage.
databend_fuse_remote_io_read_bytes_after_merged	Counter	Cumulative number of bytes read from object storage after merging.
databend_fuse_remote_io_read_milliseconds	Histogram	Histogram of time spent reading from S3.
databend_fuse_remote_io_read_parts	Counter	Cumulative count of partitioned table data blocks read from object storage.
databend_fuse_remote_io_seeks	Counter	Cumulative count of independent IO operations during reads from object storage.
databend_fuse_remote_io_seeks_after_merged	Counter	Cumulative count of IO merges during reads from object storage.
databend_fuse_segments_range_pruning_after	Counter	Number of segments after executing segment-level range pruning.
databend_fuse_segments_range_pruning_before	Counter	Number of segments before executing segment-level range pruning.
databend_merge_into_accumulate_milliseconds	Histogram	Overall time distribution for merge operations.
databend_merge_into_append_blocks_counter	Counter	Total number of blocks written in merge into.
databend_merge_into_append_blocks_rows_counter	Counter	Total number of rows written in merge into.
databend_merge_into_apply_milliseconds	Histogram	Time distribution for merge into operations.
databend_merge_into_matched_operation_milliseconds	Histogram	Time distribution for matched operations in merge operations.
databend_merge_into_matched_rows	Counter	Total number of matched rows in merge operations.
databend_merge_into_not_matched_operation_milliseconds	Histogram	Time distribution for 'not matched' part of merge into operations.
databend_merge_into_replace_blocks_counter	Counter	Number of replacement blocks generated by merge operations.
databend_merge_into_replace_blocks_rows_counter	Counter	Number of rows replaced by merge operations.
databend_merge_into_split_milliseconds	Histogram	Time taken for splitting merge operations.
databend_merge_into_unmatched_rows	Counter	Total number of rows unmatched in merge into.
databend_meta_grpc_client_request_duration_ms	Histogram	Distribution of request durations for different types of requests (Upsert, Txn, StreamList, StreamMGet, GetClientInfo) made to the meta leader.	endpoint, request
databend_meta_grpc_client_request_inflight	Gauge	Current number of queries connecting to the meta.
databend_meta_grpc_client_request_success	Counter	Number of successful requests to the meta.	endpoint, request
databend_opendal_bytes	Counter	Total number of bytes read and written by the OpenDAL endpoint.	scheme (the scheme used for the operation, e.g., "s3"), op (the type of operation, e.g., "read" or "write")
databend_opendal_bytes_histogram	Histogram	Distribution of response times and counts by operation.	scheme (the scheme used for the operation, e.g., "s3"), op (the type of operation, e.g., "write")
databend_opendal_errors	Counter	Number of errors and their types encountered in OpenDAL operations.	scheme (the scheme used for the operation, e.g., "s3"), op (the type of operation, e.g., "read"), err (the type of error encountered, e.g., "NotFound")
databend_opendal_request_duration_seconds	Histogram	Duration of OpenDAL requests to object storage.	scheme (the scheme used for the operation, e.g., "s3"), op (the type of operation, e.g., "read")
databend_opendal_requests	Counter	Number of various types of requests made using OpenDAL.	scheme (the scheme used for the request, e.g., "s3"), op (the operation type, e.g., "batch", "list", "presign", "read", "write", "delete", "stat")
databend_process_cpu_seconds_total	Counter	Total CPU time (seconds) used by users and system.
databend_process_max_fds	Gauge	Maximum number of open file descriptors.
databend_process_open_fds	Gauge	Number of open file descriptors.
databend_process_resident_memory_bytes	Gauge	Resident memory size in bytes.
databend_process_start_time_seconds	Gauge	Start time of the process since Unix epoch in seconds.
databend_process_threads	Gauge	Number of OS threads in use.
databend_process_virtual_memory_bytes	Gauge	Virtual memory size in bytes.
databend_query_duration_ms	Histogram	Tracks the distribution of execution times for different types of queries initiated by various handlers.	handler, kind, tenant, cluster
databend_query_error	Counter	Total number of query errors.	handler (e.g., "HTTPQuery"), kind (e.g., "Other"), tenant, cluster
databend_query_failed	Counter	Total number of failed requests.
databend_query_http_requests_count	Counter	Number of HTTP requests, categorized by method, API endpoint, and status code.	method, api, status
databend_query_http_response_duration_seconds	Histogram	Query response time distribution, categorized by HTTP method and API endpoint.	method, api, le, sum, count
databend_query_http_response_errors_count	Counter	Counts and types of request errors.	code, err
databend_query_result_bytes	Counter	Total number of bytes in the data returned by each query.	handler, kind, tenant, cluster
databend_query_result_rows	Counter	Total number of data rows returned by each query.	handler, kind, tenant, cluster
databend_query_scan_bytes	Counter	Total size of data scanned by queries in bytes.	handler, kind, tenant, cluster
databend_query_scan_io_bytes	Counter	Total size of data scanned and transferred during queries, in bytes.	handler, kind, tenant, cluster
databend_query_scan_io_bytes_cost_ms	Histogram	Distribution of IO scan time during queries.	handler, kind, tenant, cluster
databend_query_scan_partitions	Counter	Total number of partitions (blocks) scanned by queries.	handler, kind, tenant, cluster
databend_query_scan_rows	Counter	Total number of data rows scanned by queries.	handler, kind, tenant, cluster
databend_query_start	Counter	Tracks the number of query executions initiated by different handlers. It categorizes queries into various kinds such as SELECT, UPDATE, INSERT, and others.	handler, kind, tenant, cluster
databend_query_success	Counter	Number of successful queries by type.	handler, kind, tenant, cluster
databend_query_total_partitions	Counter	Total number of partitions (blocks) involved in the query.	handler, kind, tenant, cluster
databend_query_write_bytes	Counter	Cumulative number of bytes written by queries.	handler, kind, tenant, cluster
databend_query_write_io_bytes	Counter	Total size of data written and transmitted by queries.	handler, kind, tenant, cluster
databend_query_write_io_bytes_cost_ms	Histogram	Time cost of writing IO bytes for queries.	handler, kind, tenant, cluster
databend_query_write_rows	Counter	Cumulative number of rows written by queries.	handler, kind, tenant, cluster
databend_session_close_numbers	Counter	Number of session closures.
databend_session_connect_numbers	Counter	Records the cumulative total number of connections made to the nodes since the system started.
databend_session_connections	Gauge	Measures the current number of active connections to the nodes.
databend_session_queue_acquire_duration_ms	Histogram	Distribution of waiting queue acquisition time.
databend_session_queued_queries	Gauge	Number of SQL queries currently in the query queue.
databend_session_running_acquired_queries	Gauge	Current number of acquired queries in the running session.
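Several of the counters above are most useful as ratios. The following sketch, with hypothetical helper names and made-up input values, shows two such derived figures: a cache hit ratio from databend_cache_hit_count and databend_cache_access_count, and the fraction of blocks eliminated by range pruning from the databend_fuse_blocks_range_pruning_before and databend_fuse_blocks_range_pruning_after counters.

```python
from typing import Optional


def cache_hit_ratio(hits: float, accesses: float) -> Optional[float]:
    """Share of cache accesses served from cache; None if there were no accesses."""
    return hits / accesses if accesses else None


def pruned_fraction(before: float, after: float) -> Optional[float]:
    """Share of blocks eliminated by pruning; None if nothing was considered."""
    return (before - after) / before if before else None


# Made-up counter values for one cache_name and one pruning stage:
hit_ratio = cache_hit_ratio(hits=75, accesses=100)
range_pruned = pruned_fraction(before=200, after=50)
```

Since counters only grow, feeding in the difference between two scrapes yields the ratio for that interval, which is usually more informative than the lifetime value.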
