Dictionary encoding is a data compression technique that can be applied to individual columns of the following effective types:
- char1 - char256
It stores each unique value of a column in memory once and associates each record with its corresponding unique value. This eliminates the storage of duplicate values in the column, reducing the overall memory and disk space required to hold the data.
Dictionary encoding is most effective on columns with low cardinality: the fewer unique values a column contains, the greater the reduction in memory usage. Queries against the encoded column will generally be faster as well.
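The general idea can be illustrated with a small, self-contained sketch. This is not the database's internal implementation, only a plain-Python model of the technique; the function names `dict_encode` and `dict_decode` and the sample data are hypothetical.

```python
def dict_encode(values):
    """Return (dictionary, codes): unique values in first-seen order,
    plus one integer index per record pointing into that dictionary."""
    dictionary = []   # each unique value, stored once
    index_of = {}     # value -> position in dictionary
    codes = []        # one small integer per record
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index_of[v])
    return dictionary, codes


def dict_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[c] for c in codes]


# A low-cardinality column: 6 records, only 3 unique values
cities = ["Paris", "Tokyo", "Paris", "Lima", "Tokyo", "Paris"]
dictionary, codes = dict_encode(cities)

print(dictionary)  # ['Paris', 'Tokyo', 'Lima']
print(codes)       # [0, 1, 0, 2, 1, 0]
```

Each string is stored once and every record holds only a small integer, so the savings grow as the ratio of records to unique values grows.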
A column can be created with dictionary encoding in effect by applying the dict data handling property to the column during type creation (using /create/type). An existing column can be converted to use dictionary encoding by modifying the column and applying the dict property (using /alter/table).
For example, to apply dictionary encoding to a column during table creation, in Python:
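A minimal sketch, assuming the Python API accepts column definitions as lists of the form `[name, type, property, ...]`. The helper `dict_column`, the table name `example.employee`, and the column names are hypothetical, and the commented-out connection and `GPUdbTable` call reflect an assumed usage that should be checked against the API reference for the installed version.

```python
def dict_column(name, base_type, *properties):
    """Build a [name, type, property, ...] column definition with the
    'dict' data handling property appended (hypothetical helper)."""
    return [name, base_type, *properties, "dict"]


# Column definitions for a hypothetical employee table;
# dept_name is a char32 column with dictionary encoding applied
columns = [
    ["id", "int"],
    dict_column("dept_name", "string", "char32"),
]

print(columns[1])  # ['dept_name', 'string', 'char32', 'dict']

# With a live connection (assumed usage, not executed here):
#   import gpudb
#   h_db = gpudb.GPUdb(host="http://localhost:9191")
#   gpudb.GPUdbTable(columns, "example.employee", db=h_db)
```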
To apply dictionary encoding to a column after table creation, in Python:
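A minimal sketch that builds the arguments for an /alter/table call. The field names (`table_name`, `action`, `value`, `options`), the `change_column` action, and the `column_type` option are assumptions about the endpoint; only `column_properties` is named on this page. The helper `add_dict_request` and the table and column names are hypothetical.

```python
def add_dict_request(table_name, column_name, column_type, properties):
    """Build keyword arguments that re-declare a column with its current
    properties plus 'dict' (hypothetical helper; field names assumed)."""
    # Drop any existing 'dict' entry so it is not listed twice
    props = [p.strip() for p in properties if p.strip() != "dict"]
    props.append("dict")
    return {
        "table_name": table_name,
        "action": "change_column",
        "value": column_name,
        "options": {
            "column_type": column_type,
            # Comma-separated, with no spaces between properties
            "column_properties": ",".join(props),
        },
    }


request = add_dict_request("example.employee", "dept_name", "string", ["char32"])
print(request["options"]["column_properties"])  # char32,dict

# With a live connection object h_db (assumed usage, not executed here):
#   h_db.alter_table(**request)
```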
To remove dictionary encoding from a column, in Python, alter the column, specifying all of the non-dictionary-encoding properties that the column currently has:
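A minimal sketch of the removal case: the column is re-declared with every current property except `dict`. As before, the request field names and the `change_column` action are assumptions about /alter/table, and the helper `remove_dict_request`, the table name, and the column names are hypothetical.

```python
def remove_dict_request(table_name, column_name, column_type, current_properties):
    """Build keyword arguments that re-declare a column with every
    current property except 'dict' (hypothetical helper; field names assumed)."""
    kept = [p.strip() for p in current_properties if p.strip() != "dict"]
    return {
        "table_name": table_name,
        "action": "change_column",
        "value": column_name,
        "options": {
            "column_type": column_type,
            # Joined with bare commas so no spaces separate the properties
            "column_properties": ",".join(kept),
        },
    }


# The column currently has char32, dict and nullable; only dict is dropped
request = remove_dict_request(
    "example.employee", "dept_name", "string", ["char32", "dict", "nullable"]
)
print(request["options"]["column_properties"])  # char32,nullable

# With a live connection object h_db (assumed usage, not executed here):
#   h_db.alter_table(**request)
```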
Don't leave any spaces between properties in an /alter/table command's column_properties option.
Limitations and Cautions
Columns with any of the following characteristics are not eligible for dictionary encoding: