Data Purge and Recycle
Overview
In Databend, data is not immediately deleted when you run DROP
, TRUNCATE
, or DELETE
commands. This enables Databend's time travel feature, allowing you to access previous states of your data. However, this approach means that storage space is not automatically freed up after these operations.
Before DELETE: After DELETE: After VACUUM:
+----------------+ +----------------+ +----------------+
| Current Data | | New Version | | Current Data |
| | | (After DELETE) | | (After DELETE) |
+----------------+ +----------------+ +----------------+
| Historical Data| | Historical Data| | |
| (Time Travel) | | (Original Data)| | |
+----------------+ +----------------+ +----------------+
Storage not freed Storage freed
Types of Data to Clean
In Databend, there are four main types of data that may need cleaning:
- Dropped Table Data: Data files from tables that have been dropped using the DROP TABLE command
- Table History Data: Historical versions of tables, including snapshots created through UPDATE, DELETE, and other operations
- Orphan Files: Snapshots, segments, and blocks that are no longer associated with any table
- Spill Temporary Files: Temporary files created when memory usage exceeds available limits during query execution (for joins, aggregates, sorts, etc.). Databend automatically cleans up these files when queries complete normally. Manual cleanup is only needed in rare cases when Databend crashes or shuts down unexpectedly during query execution.
Using VACUUM Commands
The VACUUM command family is the primary method for cleaning data in Databend (Enterprise Edition Feature). Different VACUUM subcommands are used depending on the type of data you need to clean.
VACUUM Commands:
+------------------------+ +------------------------+ +------------------------+
| VACUUM DROP TABLE | | VACUUM TABLE | | VACUUM TEMPORARY FILES |
+------------------------+ +------------------------+ +------------------------+
| Cleans dropped tables | | Cleans table history | | Cleans spill files |
| and their data files | | and orphan files | | (rarely needed) |
+------------------------+ +------------------------+ +------------------------+
VACUUM DROP TABLE
This command permanently deletes data files of dropped tables, freeing up storage space.
VACUUM DROP TABLE [FROM <database_name>] [DRY RUN [SUMMARY]] [LIMIT <file_count>];
Options:
FROM <database_name>
: Restrict to a specific databaseDRY RUN [SUMMARY]
: Preview files to be removed without actually deleting themLIMIT <file_count>
: Limit the number of files to be vacuumed
Examples:
-- Preview files that would be removed
VACUUM DROP TABLE DRY RUN;
-- Preview summary of files that would be removed
VACUUM DROP TABLE DRY RUN SUMMARY;
-- Remove dropped tables from the "default" database
VACUUM DROP TABLE FROM default;
-- Remove up to 1000 files from dropped tables
VACUUM DROP TABLE LIMIT 1000;
VACUUM TABLE
This command removes historical data for a specified table, clearing old versions and freeing storage.
VACUUM TABLE <table_name> [DRY RUN [SUMMARY]];
Options:
DRY RUN [SUMMARY]
: Preview files to be removed without actually deleting them
Examples:
-- Preview files that would be removed
VACUUM TABLE my_table DRY RUN;
-- Preview summary of files that would be removed
VACUUM TABLE my_table DRY RUN SUMMARY;
-- Remove historical data from my_table
VACUUM TABLE my_table;
VACUUM TEMPORARY FILES
This command clears temporary spilled files used for joins, aggregates, and sorts, freeing up storage space.
VACUUM TEMPORARY FILES;
Note: While this command is provided as a manual method for cleaning up temporary files, it's rarely needed during normal operation since Databend automatically handles cleanup in most cases.
Adjusting Data Retention Time
The VACUUM commands remove data files older than the DATA_RETENTION_TIME_IN_DAYS
setting. By default, Databend retains historical data for 1 day (24 hours). You can adjust this setting:
-- Change retention period to 2 days
SET GLOBAL DATA_RETENTION_TIME_IN_DAYS = 2;
-- Check current retention setting
SHOW SETTINGS LIKE 'DATA_RETENTION_TIME_IN_DAYS';
Edition | Default Retention | Maximum Retention |
---|---|---|
Databend Community & Enterprise Editions | 1 day (24 hours) | 90 days |
Databend Cloud (Personal) | 1 day (24 hours) | 1 day (24 hours) |
Databend Cloud (Business) | 1 day (24 hours) | 90 days |