Summary

Adding backward compatibility for config will allow us to iterate quickly without breaking users' existing deployments.

Motivation

Now that early adopters are starting to deploy databend by themselves, it's time for us to establish some contracts with users. We should allow users to upgrade their deployments without breaking backward compatibility. In this RFC, we will focus on config.

The config mentioned here includes:

  • config files that are read by databend-query and databend-meta
  • config environment variables that are read by databend-query and databend-meta
  • application args that are accepted by databend-query and databend-meta
  • protobuf messages that are generated by databend-query (stored inside databend-meta)

Out of scope:

  • Tools like fuzz and metactl are not covered by this RFC.
  • Command-line UX of databend-query and databend-meta is another topic. We will not cover it in this RFC.
  • Config input/output via SQL or the HTTP REST API is not covered (for example, the output of table system.config).

For convenience, I will use databend to refer to databend-query and databend-meta.

With this RFC, our users will be able to upgrade their deployments without breakage. Old configs should always work with new implementations.

Guide-level explanation

Users do not need to take any action while upgrading their deployments. They upgrade databend by replacing binaries and images directly.

Sometimes, they will get DEPRECATED warnings for some config fields. It's up to users to decide whether to migrate them. Until we formally introduce the versioned config, no config field will be removed, and all config fields will work as before.

Reference-level explanation

Inside databend, we will split config into inner and outer:

inner

Config instances used inside databend. All logic SHOULD be implemented towards the inner config.

outer

Config instances that serve as the user-facing front end of databend. They are converted into the inner config. Other modules SHOULD NOT depend on the outer config.

Take query for example:

The inner config of the query will be like this:

#[derive(Clone, Default, Debug, PartialEq, Serialize, Deserialize)]
#[serde(default)]
pub struct Config {
    pub query: QueryConfig,
    pub log: LogConfig,
    pub meta: MetaConfig,
    pub storage: StorageConfig,
    pub catalog: HiveCatalogConfig,
}

The outer config of the query will be like this:

#[derive(Clone, Default, Debug, PartialEq, Serialize, Deserialize, Parser)]
#[clap(about, version, author)]
#[serde(default)]
pub struct ConfigV0 {
    #[clap(long, short = 'c', default_value_t)]
    pub config_file: String,

    #[clap(flatten)]
    pub query: QueryConfigV0,

    #[clap(flatten)]
    pub log: LogConfigV0,

    #[clap(flatten)]
    pub meta: MetaConfigV0,

    #[clap(flatten)]
    pub storage: StorageConfigV0,

    #[clap(flatten)]
    pub catalog: HiveCatalogConfigV0,
}

Whoever uses an inner config is responsible for maintaining the corresponding outer config.

For example: common-io should provide inner config StorageConfig. If query wants to include StorageConfig inside QueryConfig, query needs to:

  • Implement a versioned outer config for StorageConfig called StorageConfigV0.
  • Implement Into<StorageConfig> for StorageConfigV0.
  • Reference StorageConfig in QueryConfig.
  • Reference StorageConfigV0 in QueryConfigV0.
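
The four steps above can be sketched as follows. This is a minimal illustration using only the standard library, not the actual databend code: the field names (storage_type, data_path, disk_data_path, tenant_id) are assumptions made for the example, and the serde and clap derives are left out so the snippet compiles on its own. Implementing From<StorageConfigV0> for StorageConfig is used here because the standard library then provides Into<StorageConfig> for StorageConfigV0 automatically.

```rust
// Inner config used by internal logic. Field names are hypothetical.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct StorageConfig {
    pub storage_type: String,
    pub data_path: String,
}

// Versioned outer config: the shape that users actually write.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct StorageConfigV0 {
    pub storage_type: String,
    pub disk_data_path: String,
}

// `From` here also gives `Into<StorageConfig>` for `StorageConfigV0`.
impl From<StorageConfigV0> for StorageConfig {
    fn from(outer: StorageConfigV0) -> Self {
        StorageConfig {
            storage_type: outer.storage_type,
            // The outer name can differ from the inner one.
            data_path: outer.disk_data_path,
        }
    }
}

// The inner query config refers to the inner storage config.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct QueryConfig {
    pub tenant_id: String,
    pub storage: StorageConfig,
}

// The outer query config refers to the outer storage config.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct QueryConfigV0 {
    pub tenant_id: String,
    pub storage: StorageConfigV0,
}

impl From<QueryConfigV0> for QueryConfig {
    fn from(outer: QueryConfigV0) -> Self {
        QueryConfig {
            tenant_id: outer.tenant_id,
            storage: outer.storage.into(),
        }
    }
}

fn main() {
    let outer = QueryConfigV0 {
        tenant_id: "tenant-a".to_string(),
        storage: StorageConfigV0 {
            storage_type: "disk".to_string(),
            disk_data_path: "/var/lib/databend".to_string(),
        },
    };
    // Only the inner config is handed to the rest of the program.
    let inner: QueryConfig = outer.into();
    println!("{:?}", inner);
}
```

With this layout, a rename in the outer config only touches the From implementation, while code that depends on the inner config stays unchanged.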

Config Maintenance

All maintenance notices SHOULD be applied to the outer config struct.

  • Add config: adding a field with a default value is compatible; adding a field without a default is forbidden.
  • Remove config: removing a field is not allowed. Mark it as DEPRECATED instead.
  • Change config: changing a field's type or structure is not allowed.
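
As an illustration of these rules, the sketch below keeps a deprecated field in the outer config next to its replacement and resolves both when converting to the inner config. Everything in it is an assumption made for the example: the LogConfigV0 fields (log_dir, dir), the fallback value, the warning text, and the rule that the new field wins when both are set. It is not taken from the databend code base.

```rust
// Outer config. Both fields are optional so that old and new config
// files keep loading. Field names are hypothetical.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct LogConfigV0 {
    // DEPRECATED: kept so that old configs still load.
    pub log_dir: Option<String>,
    // Added later with a default, which makes the addition compatible.
    pub dir: Option<String>,
}

// Inner config: only knows the current field.
#[derive(Clone, Debug, PartialEq)]
pub struct LogConfig {
    pub dir: String,
}

// Convert outer to inner and collect DEPRECATED warnings for the caller
// to print. Assumption: the new field takes precedence over the old one.
pub fn convert(outer: LogConfigV0) -> (LogConfig, Vec<String>) {
    let mut warnings = Vec::new();
    let dir = match (outer.dir, outer.log_dir) {
        (Some(new), Some(_)) => {
            warnings.push("DEPRECATED: `log_dir` is ignored because `dir` is set".to_string());
            new
        }
        (Some(new), None) => new,
        (None, Some(old)) => {
            warnings.push("DEPRECATED: `log_dir` is deprecated, use `dir` instead".to_string());
            old
        }
        // Neither field set: fall back to an assumed default.
        (None, None) => "./_logs".to_string(),
    };
    (LogConfig { dir }, warnings)
}

fn main() {
    // An old config that only sets the deprecated field still works.
    let old = LogConfigV0 {
        log_dir: Some("/var/log/databend".to_string()),
        dir: None,
    };
    let (inner, warnings) = convert(old);
    for w in &warnings {
        eprintln!("{}", w);
    }
    println!("{:?}", inner);
}
```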

Drawbacks

Maintenance burden

Introducing an outer config will increase the complexity of the config handler.

Rationale and alternatives

The core idea of this RFC is to split config into inner and outer instances: keep inner as simple as possible and leave user-facing interaction to outer.

serde doesn't work in this way.

How to work with protobuf used by meta?

As described in the reference, the config used by protobuf is another outer config. It should handle versions by itself. Based on the current status of databend common-proto-conv, we will keep all fields until we decide to increase OLDEST_COMPATIBLE_VER.

Prior art

None. This RFC is the first attempt at backward-compatible config.

Unresolved questions

None.

Future possibilities

Introduce versioned config

We can introduce a versioned config to allow users to specify the config versions:

  • config file: version=42
  • config env: export CONFIG_VERSION=42
  • args: --config-version=42

Suppose a compatible change happens, such as a new config entry being added. databend will make sure that the entry has a default value.

Suppose an incompatible change happens, such as a config entry being removed, renamed, or changed. databend increases the config version. An older config will still be loaded using its specified version and converted to the latest config internally. A DEPRECATED warning will also be printed for removed config fields, so users can decide whether to migrate them.
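
One possible shape for this is an enum with one variant per config version, each of which converts to the latest inner config. The sketch below is only an assumption about how it could look: the version numbers 23 and 42 are borrowed from the examples in this RFC, while the types (ConfigV23, ConfigV42, VersionedConfig), the rename from a to b, and the warning text are hypothetical.

```rust
// Latest inner config. The field name is hypothetical.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub b: String,
}

// Outer config as written against version 23.
#[derive(Debug, Default)]
pub struct ConfigV23 {
    pub a: String,
}

// Outer config as written against version 42, where `a` became `b`.
#[derive(Debug, Default)]
pub struct ConfigV42 {
    pub b: String,
}

// The loader picks a variant based on the version the user specified.
#[derive(Debug)]
pub enum VersionedConfig {
    V23(ConfigV23),
    V42(ConfigV42),
}

impl VersionedConfig {
    // Convert any supported version to the latest inner config and
    // return DEPRECATED warnings for the caller to print.
    pub fn into_latest(self) -> (Config, Vec<String>) {
        match self {
            VersionedConfig::V23(old) => (
                Config { b: old.a },
                vec!["DEPRECATED: `a` was renamed to `b` in config version 42".to_string()],
            ),
            VersionedConfig::V42(cur) => (Config { b: cur.b }, Vec::new()),
        }
    }
}

fn main() {
    let loaded = VersionedConfig::V23(ConfigV23 { a: "value".to_string() });
    let (config, warnings) = loaded.into_latest();
    for w in &warnings {
        eprintln!("{}", w);
    }
    println!("{:?}", config);
}
```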

Load different versions from config files and envs

It's possible to load different versions from config files and envs.

For example:

Old version from config files:

version = 23

a = "Version 23"

New version from env:

export CONFIG_VERSION=42
export QUERY_B="Version 42"

In the best case, we can load the env using version 42 and then load the config file using version 23.
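
A rough sketch of how the two sources could be combined: each source is parsed with its own version into a partial config, and the partial configs are then merged. The PartialConfig type, the two loader functions, and the choice that env values take precedence over file values are all assumptions made for this example; the RFC does not fix a precedence order.

```rust
// A partially filled config: every field is optional until merged.
#[derive(Debug, Default, PartialEq)]
pub struct PartialConfig {
    pub a: Option<String>,
    pub b: Option<String>,
}

impl PartialConfig {
    // Fields set in `self` win; gaps are filled from `fallback`.
    pub fn or(self, fallback: PartialConfig) -> PartialConfig {
        PartialConfig {
            a: self.a.or(fallback.a),
            b: self.b.or(fallback.b),
        }
    }
}

// Hypothetical loader for a version 23 config file, which only knows `a`.
pub fn from_file_v23(a: &str) -> PartialConfig {
    PartialConfig {
        a: Some(a.to_string()),
        b: None,
    }
}

// Hypothetical loader for version 42 env variables, which only set `b`.
pub fn from_env_v42(b: &str) -> PartialConfig {
    PartialConfig {
        a: None,
        b: Some(b.to_string()),
    }
}

fn main() {
    let file = from_file_v23("Version 23");
    let env = from_env_v42("Version 42");
    // Assumed precedence: env first, then the config file.
    let merged = env.or(file);
    println!("{:?}", merged);
}
```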