Compression can be applied to an individual column of any data type to reduce its memory footprint. By default, columns are stored uncompressed in memory. Once compression is applied, the column stays compressed until its data is used: on retrieval, a temporary uncompressed copy is created for processing and discarded once processing is finished. When new data is added or existing data is updated, the affected data segment is uncompressed, modified, and then immediately recompressed. Compression persists through database restarts.
No functionality is disabled on compressed columns, but insert and update operations will be slower, since each one must uncompress and recompress the affected segment. Data retrieval may also be slower because of the decompression step, but should still perform well overall.
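The read and write behavior described above can be illustrated with a toy model. This is a hypothetical sketch, not Kinetica's implementation: it uses Python's standard `zlib` module as a stand-in for snappy/lz4, and the `CompressedColumn` class, its JSON segment encoding, and its `segment_size` of 4 are all assumptions made for illustration.

```python
import json
import zlib


class CompressedColumn:
    """Toy column whose values are stored as zlib-compressed segments.

    Hypothetical illustration only: zlib stands in for snappy/lz4.
    """

    def __init__(self, values, segment_size=4):
        # segment_size is an arbitrary assumption for this sketch.
        self.segment_size = segment_size
        self.segments = [
            self._compress(values[i:i + segment_size])
            for i in range(0, len(values), segment_size)
        ]

    @staticmethod
    def _compress(values):
        # Serialize the segment as JSON, then compress the bytes.
        return zlib.compress(json.dumps(values).encode("utf-8"))

    @staticmethod
    def _decompress(blob):
        return json.loads(zlib.decompress(blob).decode("utf-8"))

    def read(self, index):
        # Decompress a temporary copy of one segment; the copy is discarded
        # when this method returns, so the stored segment stays compressed.
        seg, offset = divmod(index, self.segment_size)
        return self._decompress(self.segments[seg])[offset]

    def update(self, index, value):
        # Uncompress the affected segment, modify it, recompress immediately.
        seg, offset = divmod(index, self.segment_size)
        values = self._decompress(self.segments[seg])
        values[offset] = value
        self.segments[seg] = self._compress(values)

    def append(self, value):
        # New data goes into the last segment, or a fresh one if it is full.
        if self.segments:
            last = self._decompress(self.segments[-1])
            if len(last) < self.segment_size:
                last.append(value)
                self.segments[-1] = self._compress(last)
                return
        self.segments.append(self._compress([value]))


col = CompressedColumn(["a", "b", "c", "d", "e"])
col.update(4, "E")
col.append("f")
print(col.read(4), col.read(5))  # E f
```

Reads decompress only the one segment they need and throw the copy away, while updates and appends pay for a full decompress/recompress cycle on the segment they touch, which is why writes are the slower path.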
A column can be compressed via the /alter/table endpoint. You can also set column compression in GAdmin.
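For clients calling the REST endpoint directly, a request can be sketched with Python's standard library. This assumes /alter/table accepts a JSON body whose fields mirror the `alter_table` parameters used in the Python examples in this section (`table_name`, `action`, `value`, `options`). The helper name `build_alter_table_request` and the `http://localhost:9191` base URL are hypothetical, and authentication is omitted.

```python
import json
import urllib.request

# The four compression settings documented in this section.
COMPRESSION_TYPES = {"none", "snappy", "lz4", "lz4hc"}


def build_alter_table_request(base_url, table_name, column_name, compression_type):
    """Build (but do not send) a POST request to the /alter/table endpoint.

    Assumption: the endpoint takes a JSON body with the same field names
    as the Python alter_table call. Authentication is not handled here.
    """
    if compression_type not in COMPRESSION_TYPES:
        raise ValueError(f"unsupported compression_type: {compression_type!r}")
    payload = {
        "table_name": table_name,
        "action": "set_column_compression",
        "value": column_name,
        "options": {"compression_type": compression_type},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/alter/table",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server address; nothing is sent until urlopen() is called.
req = build_alter_table_request(
    "http://localhost:9191", "taxi_data", "pickup_datetime", "snappy"
)
print(req.full_url)  # http://localhost:9191/alter/table
```

Validating `compression_type` on the client side catches typos before a round trip; to actually send the request, pass it to `urllib.request.urlopen(req)` against a running server.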
The compression setting determines which compression algorithm, if any, is used. The algorithms differ in compression ratio (uncompressed size divided by compressed size) and in how fast data is compressed and uncompressed. There are four compression settings available:
none -- no compression (the default)
snappy -- high compression/decompression speed (minimum 250-500 MB/s per core), large compression ratio (~2.091)
lz4 -- high compression speed (minimum 400 MB/s per core) and higher decompression speed, large compression ratio (~2.101)
lz4hc -- slower compression speed but higher decompression speed than lz4, higher compression ratio than lz4 (~2.720)
For example, to apply Snappy compression to a column in Python:
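# Set Snappy compression on the pickup_datetime column of taxi_data.
# 'gpudb' is assumed to be an open GPUdb connection object, not the module.
gpudb.alter_table(
    table_name="taxi_data",
    action="set_column_compression",
    value="pickup_datetime",
    options={"compression_type": "snappy"}
)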
To turn off column compression in Python:
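# Remove compression by setting compression_type to "none".
# 'gpudb' is assumed to be an open GPUdb connection object, not the module.
gpudb.alter_table(
    table_name="taxi_data",
    action="set_column_compression",
    value="pickup_datetime",
    options={"compression_type": "none"}
)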
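Compression ratio, as defined above, is uncompressed size divided by compressed size, so lz4hc's ~2.720 means compressed data occupies roughly 1/2.72, or about 37%, of its original size. The sketch below measures the ratio on sample data to make the metric concrete. It uses `zlib` as a stand-in (zlib is not one of the column compression settings), and the timestamp sample is made up, so the result will not match the figures listed for snappy, lz4, or lz4hc.

```python
import zlib


def compression_ratio(raw: bytes, level: int = 6) -> float:
    """Return uncompressed size / compressed size of raw, using zlib.

    zlib is a stand-in; snappy, lz4, and lz4hc give different ratios.
    """
    compressed = zlib.compress(raw, level)
    return len(raw) / len(compressed)


# Hypothetical sample: a column of repetitive timestamp strings.
sample = "\n".join(f"2015-04-01 12:{m:02d}:00" for m in range(60)).encode("utf-8")
print(f"ratio: {compression_ratio(sample):.3f}")
```

Highly repetitive columns such as timestamps or low-cardinality strings tend to compress well, while already-random data can yield a ratio close to (or even below) 1.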
Columns with any of the following characteristics are not eligible for compression: