Column compression is the application of a standard compression algorithm to a persisted column's data.
Column data will be compressed on disk for faster loading into memory at the cost of the decompression processing time on that loaded data. Once in memory, the data will remain uncompressed until it is no longer needed. Any inserting or updating of data in memory would result in column data being compressed and written to disk depending on which of these conditions is reached first:
- Memory data chunk being written to becomes full
- System compression timeout (disk_auto_optimize_timeout) is reached
Column compression can be specified at three different levels, and the compression applied to an individual column (if any) will be determined by which of the following configuration items is found first, in this order:
- Compression explicitly defined on the column (compress column property)
- Compression explicitly defined on the column's table at the time of its creation, applying to any columns on that table with no explicit compression definition of their own (compression_codec table property)
- Global default column compression (compression_codec)
Available column compression algorithms are as follows. Those with an optional compression parameter are shown with the minimum & maximum values for that parameter.
Name | Arg. Min. | Arg. Max. | Notes |
---|---|---|---|
none | Apply no compression to this column, overriding any higher-level defaults. | ||
lz4 | 0 | 12 | Default is 0. Values of 9 and above enable LZ4_HC. |
snappy | |||
zstd | -131072 | 22 | Default is 3. Negative values prioritize speed over compression ratio. |
For example, to apply the following column compression during table creation, in Python:
- LZ4 compression on the first_name
- zstd compression on the last_name
- Snappy compression on all other columns
|
|
To add a new column with column compression to a table, in Python:
|
|
Limitations and Cautions
- Columns with any of the following characteristics are not eligible for
column compression:
- Member of a filtered view or join view
- Member of a logical external table or SQL view
- The column compression of an existing column cannot be modified directly.