MARKOV_GENERATE

Generates anonymized strings using a model trained by MARKOV_TRAIN.

Syntax

MARKOV_GENERATE( <model>, <params>, <seed>, <determinator> )

Arguments

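| Arguments | Description |
|---|---|
| model | The model returned by MARKOV_TRAIN. |
| params | JSON string, for example {"order": 5, "sliding_window_size": 8}. order: the order of the Markov model used to generate strings. sliding_window_size: the size of a sliding window over the source string; the hash of the window is used as the seed for the RNG in the Markov model. |
| seed | Seed for generation; a fixed seed keeps the output reproducible. |
| determinator | The source string. |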
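To make the roles of order, sliding_window_size, seed, and determinator more concrete, the following is a conceptual Python sketch of a character-level Markov generator whose random choices are driven by a hash of a window over the source string. It is not Databend's implementation: the helper names (train, window_seed, generate), the use of SHA-256, the way the window is positioned, and the max_len cap are all assumptions made for illustration.

```python
import hashlib
import random
from collections import defaultdict

END = "\0"  # hypothetical end-of-string marker used only in this sketch


def train(samples, order):
    """Record which character follows each context of up to `order` characters."""
    table = defaultdict(list)
    for text in samples:
        padded = text + END
        for i, ch in enumerate(padded):
            context = padded[max(0, i - order):i]
            table[context].append(ch)
    return table


def window_seed(determinator, pos, window, seed):
    """Hash a window of the source string together with the seed (assumed scheme)."""
    start = pos % max(1, len(determinator))
    chunk = determinator[start:start + window]
    digest = hashlib.sha256(f"{seed}:{chunk}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def generate(table, order, window, seed, determinator, max_len=40):
    """Build a string one character at a time from the trained table."""
    out = ""
    while len(out) < max_len:
        context = out[max(0, len(out) - order):]
        choices = table.get(context)
        if not choices:
            break
        # The RNG is re-seeded from the source-string window at every step,
        # so the result depends only on (seed, determinator), not on call order.
        rng = random.Random(window_seed(determinator, len(out), window, seed))
        ch = rng.choice(choices)
        if ch == END:
            break
        out += ch
    return out


names = ["Alice Johnson", "Bob Smith", "Carol Davis", "David Miller", "Emma Wilson"]
name_table = train(names, order=3)
print(generate(name_table, order=3, window=12, seed=3030, determinator="orig_0"))
```

In this sketch, every random draw depends only on the seed and the source string, so the same determinator and seed always yield the same generated value, while a different determinator generally yields a different one.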

Return Type

String.

Examples

Generate multiple PII-like columns (name + email) from small seed sets:

-- 1) Train separate models on names and emails (PII text)
CREATE TABLE markov_name_model AS
SELECT markov_train(name) AS model
FROM (
    VALUES ('Alice Johnson'),('Bob Smith'),('Carol Davis'),('David Miller'),('Emma Wilson'),
           ('Frank Brown'),('Grace Lee'),('Henry Clark'),('Irene Torres'),('Jack White')
) AS t(name);

CREATE TABLE markov_email_model AS
SELECT markov_train(email) AS model
FROM (
    VALUES ('alice.johnson@gmail.com'),('bob.smith@yahoo.com'),('carol.davis@outlook.com'),
           ('david.miller@example.com'),('emma.wilson@example.com'),('frank.brown@gmail.com'),
           ('grace.lee@example.com'),('henry.clark@example.com'),('irene.torres@example.com'),
           ('jack.white@example.com')
) AS t(email);

-- 2) Generate synthetic name + email pairs; seed keeps it reproducible
SELECT
    markov_generate(n.model, '{"order":3,"sliding_window_size":12}', 3030, CONCAT('orig_', number)) AS fake_name,
    markov_generate(e.model, '{"order":3,"sliding_window_size":12}', 3030, CONCAT('orig_', number, '@example.com')) AS fake_email
FROM numbers(6)
JOIN markov_name_model n
JOIN markov_email_model e
LIMIT 6;
-- Sample output
+----------------+-------------------------+
| fake_name      | fake_email              |
+----------------+-------------------------+
| Frank Brown    | henry.clark@example     |
| Grace Johnso   | quinn.foster@example    |
| Rachel         | paul.adams@example      |
| Carol David    | olivia.baker@example    |
| Jack White     | frank.brown@gmail.com   |
| Noah Harris    | race.johnson@example    |
+----------------+-------------------------+