- RFC PR: datafuselabs/databend#5324
- Tracking Issue: datafuselabs/databend#5297
Summary
Adding config backward compatibility will allow us to iterate quickly while avoiding breaking the environment.
Motivation
While early birds are starting to deploy databend by themselves, it's time for us to establish some contracts with our users. We should allow users to upgrade their deployments without breaking backward compatibility. In this RFC, we will focus on config.
The config mentioned here includes:
- config files that are read by databend-query and databend-meta
- config envs that are read by databend-query and databend-meta
- application args that are accepted by databend-query and databend-meta
- protobuf messages that are generated by databend-query (stored inside databend-meta)
Out of scope:
- Tools like fuzz and metactl are not covered by this RFC.
- Command-line UX of databend-query and databend-meta is another topic. We will not cover it in this RFC.
- Config input/output by SQL/HTTP REST API is not covered (for example, the output of table system.config).
For convenience, I will use databend to refer to databend-query and databend-meta.
With this RFC, our users will be able to upgrade their deployments without breakage. Old configs should always keep working with new implementations.
Guide-level explanation
No action is needed for users to take while upgrading their deployments. They upgrade databend by replacing binaries and images directly.
Sometimes, they will get DEPRECATED warnings for some config fields. It's up to users to decide whether to migrate them. Before we formally introduce the versioned config, no config will be removed, and all config fields will work as before.
Reference-level explanation
Inside databend, we will split config into inner and outer:
inner
Config instances used inside databend. All logic SHOULD be implemented against the inner config.
outer
Config instances used as the front office of databend. They will be transformed into an inner config. Other modules SHOULD NOT depend on the outer config.
Take query for example:
The inner config of the query will be like this:
#[derive(Clone, Default, Debug, PartialEq, Serialize, Deserialize)]
#[serde(default)]
pub struct Config {
    pub query: QueryConfig,
    pub log: LogConfig,
    pub meta: MetaConfig,
    pub storage: StorageConfig,
    pub catalog: HiveCatalogConfig,
}
The outer config of the query will be like this:
#[derive(Clone, Default, Debug, PartialEq, Serialize, Deserialize, Parser)]
#[clap(about, version, author)]
#[serde(default)]
pub struct ConfigV0 {
    #[clap(long, short = 'c', default_value_t)]
    pub config_file: String,

    #[clap(flatten)]
    pub query: QueryConfigV0,

    #[clap(flatten)]
    pub log: LogConfigV0,

    #[clap(flatten)]
    pub meta: MetaConfigV0,

    #[clap(flatten)]
    pub storage: StorageConfigV0,

    #[clap(flatten)]
    pub catalog: HiveCatalogConfigV0,
}
The users of an inner config have to maintain the corresponding outer config.
For example: common-io should provide the inner config StorageConfig. If query wants to include StorageConfig inside QueryConfig, query needs to:
- Implement a versioned outer config for StorageConfig called StorageConfigV0.
- Implement Into<StorageConfig> for StorageConfigV0.
- Refer to StorageConfig in QueryConfig.
- Refer to StorageConfigV0 in QueryConfigV0.
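The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual databend code: the field names (storage_type, data_path, tenant_id) are hypothetical, and the serde/clap attributes that a real outer config would carry are left out so the snippet only depends on the standard library. It implements From rather than Into directly, since the standard library then provides the matching Into for free.

```rust
// Inner config: assumed to be provided by a shared crate such as common-io.
// Field names here are hypothetical.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct StorageConfig {
    pub storage_type: String,
    pub data_path: String,
}

// Outer config, version 0: owned by query. In a real implementation this is
// where serde/clap attributes would live.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct StorageConfigV0 {
    pub storage_type: String,
    pub data_path: String,
}

// Implementing From<StorageConfigV0> also yields Into<StorageConfig>.
impl From<StorageConfigV0> for StorageConfig {
    fn from(outer: StorageConfigV0) -> Self {
        StorageConfig {
            storage_type: outer.storage_type,
            data_path: outer.data_path,
        }
    }
}

// Inner query config refers to the inner StorageConfig.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct QueryConfig {
    pub tenant_id: String,
    pub storage: StorageConfig,
}

// Outer query config refers to the outer StorageConfigV0.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct QueryConfigV0 {
    pub tenant_id: String,
    pub storage: StorageConfigV0,
}

impl From<QueryConfigV0> for QueryConfig {
    fn from(outer: QueryConfigV0) -> Self {
        QueryConfig {
            tenant_id: outer.tenant_id,
            // The nested outer config is converted by its own impl.
            storage: outer.storage.into(),
        }
    }
}
```

With this layout, the rest of the code base only ever sees QueryConfig and StorageConfig; a call such as `let inner: QueryConfig = outer.into();` is the single point where the outer types are consumed.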
Config Maintenance
All maintenance notices SHOULD be applied to the outer config struct.
- Add config: adding a field with a new default is compatible; otherwise it is forbidden.
- Remove config: removing a field is not allowed. Mark it as DEPRECATED instead.
- Change config: changing a config field's type or structure is not allowed.
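As an illustration of how these rules might play out on an outer struct, the sketch below keeps a deprecated field next to its replacement. Everything in it is an assumption made for the example: the field names log_dir and dir, the DEFAULT_LOG_DIR value, the warning text, and the choice that the new field wins when both are set. None of these are taken from the databend code base.

```rust
// Hypothetical default for the newly added field.
pub const DEFAULT_LOG_DIR: &str = "./_logs";

// Inner config: only knows about the current field.
#[derive(Clone, Debug, PartialEq)]
pub struct LogConfig {
    pub level: String,
    pub dir: String,
}

// Outer config: fields are never removed and never change type.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct LogConfigV0 {
    pub level: String,
    // DEPRECATED (hypothetical): kept so that old configs still load.
    pub log_dir: Option<String>,
    // Added later; when absent it falls back to a default, so the addition
    // stays compatible with old configs.
    pub dir: Option<String>,
}

impl LogConfigV0 {
    /// Convert into the inner config and collect DEPRECATED warnings.
    /// Assumed precedence: `dir`, then `log_dir`, then the default.
    pub fn into_inner(self) -> (LogConfig, Vec<String>) {
        let mut warnings = Vec::new();
        if self.log_dir.is_some() {
            warnings.push("DEPRECATED: `log_dir` is deprecated, use `dir` instead".to_string());
        }
        let dir = self
            .dir
            .or(self.log_dir)
            .unwrap_or_else(|| DEFAULT_LOG_DIR.to_string());
        (LogConfig { level: self.level, dir }, warnings)
    }
}
```

An old config that only sets log_dir keeps working and produces one warning, while a config that sets neither field falls back to the default without any warning.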
Drawbacks
Maintenance burden
Introducing an outer config will increase the complexity of the config handler.
Rationale and alternatives
Why not use serde and related tools?
The most important thing is that this RFC intends to split inner and outer config instances: keep inner as simple as possible and leave the user-facing work for outer to handle.
serde doesn't work in this way.
How to work with protobuf used by meta?
As described in the reference, the config used by protobuf is another outer config. It should handle versions by itself. Based on the current status of databend common-proto-conv, we will keep all fields until we decide to increase OLDEST_COMPATIBLE_VER.
Prior art
None. This RFC is the first attempt at backward config compatibility.
Unresolved questions
None.
Future possibilities
Introduce versioned config
We can introduce a versioned config to allow users to specify the config version:
- config file: version=42
- config env: export CONFIG_VERSION=42
- args: --config-version=42
Suppose a compatible change happens, such as a new config entry being added. databend will make sure that the entry has a default value.
Suppose an incompatible change happens, such as a config being removed, renamed, or changed. databend increases the config version. An older config will still be loaded according to its specified version and converted to the latest config internally. A DEPRECATED warning will also be printed for removed config fields, so users can decide whether to migrate them.
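One possible shape for this, sketched under assumptions: each config version gets its own outer struct, an enum selects among them, and every variant converts into the same inner config. The version numbers, the renamed field (num_cpus to max_threads), and the warning text are all hypothetical and only serve to show an incompatible change being absorbed by the outer layer.

```rust
// Inner config: always reflects the latest layout.
#[derive(Clone, Debug, PartialEq)]
pub struct Config {
    pub max_threads: u64,
}

// Outer config for hypothetical version 1, which used an older field name.
#[derive(Clone, Debug, PartialEq)]
pub struct ConfigV1 {
    pub num_cpus: u64,
}

// Outer config for hypothetical version 2, after the field was renamed.
#[derive(Clone, Debug, PartialEq)]
pub struct ConfigV2 {
    pub max_threads: u64,
}

// The version given by the user decides which outer struct is parsed.
#[derive(Clone, Debug, PartialEq)]
pub enum VersionedConfig {
    V1(ConfigV1),
    V2(ConfigV2),
}

impl VersionedConfig {
    /// Convert any supported version into the inner config, returning
    /// DEPRECATED warnings for fields that no longer exist in the latest version.
    pub fn into_inner(self) -> (Config, Vec<String>) {
        match self {
            VersionedConfig::V1(c) => (
                Config { max_threads: c.num_cpus },
                vec!["DEPRECATED: `num_cpus` was renamed to `max_threads`".to_string()],
            ),
            VersionedConfig::V2(c) => (Config { max_threads: c.max_threads }, Vec::new()),
        }
    }
}
```

Both versions end up as the same inner Config, so code that consumes the config does not need to know which version the user wrote.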
Load different versions from config files and envs
It's possible to load different versions from config files and envs.
For example:
Old version from config files:
version = 23
a = "Version 23"
New version from env:
export CONFIG_VERSION=42
export QUERY_B="Version 42"
In the best case, we can load the env using version 42 and then load the config file using version 23.