Dictionary encoding is a data compression technique that can be applied to individual columns of the following effective types:
- int
- long
- date
- char1 - char256
Dictionary encoding stores each unique value of a column once, in memory, and associates each record with its corresponding unique value. This eliminates the storage of duplicate values in a column, reducing the overall memory and disk space required to hold the data.
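The idea can be illustrated with a minimal, conceptual sketch in plain Python. It does not reflect the database's internal implementation; the function names and the sample column are hypothetical and exist only for illustration.

```python
def dictionary_encode(values):
    """Return (dictionary, codes).

    dictionary: the unique values, in first-seen order
    codes:      one integer per record, indexing into dictionary
    """
    dictionary = []
    index_of = {}   # value -> position in dictionary
    codes = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


# Hypothetical column with many repeated values
column = ["NY", "CA", "NY", "TX", "CA", "NY"]
dictionary, codes = dictionary_encode(column)
```

Each distinct string is held once in `dictionary`, and every record carries only a small integer code.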
Dictionary encoding is most effective on columns with low cardinality; the fewer unique values a column contains, the greater the reduction in memory usage. Queries against the encoded column will generally be faster.
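The effect of cardinality can be approximated with some back-of-the-envelope arithmetic. The sketch below assumes a simplified storage model (one fixed-size value per record when unencoded; one copy of each unique value plus a fixed-size code per record when encoded). The 4-byte code size and the row counts are illustrative assumptions, not documented figures.

```python
def estimated_bytes(num_rows, num_unique, value_size, code_size=4):
    """Rough size estimate under a simplified model (assumption, not actual internals).

    Returns (raw_bytes, encoded_bytes).
    """
    raw = num_rows * value_size
    encoded = num_unique * value_size + num_rows * code_size
    return raw, encoded


# 1,000,000 records of a 32-byte value
raw, low_cardinality = estimated_bytes(1_000_000, 50, value_size=32)
_, high_cardinality = estimated_bytes(1_000_000, 900_000, value_size=32)
```

Under this model, the column with 50 unique values shrinks to roughly an eighth of its raw size, while the column with 900,000 unique values ends up slightly larger than the raw data.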
A column can be created with dictionary encoding in effect by applying the dict data handling property to the column during type creation (using /create/type). An existing column can be converted to use dictionary encoding by modifying the column and applying the dict property (using /alter/table).
For example, to apply dictionary encoding to a column during table creation, in Python:
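A minimal sketch of such a request follows. It assumes the type definition is a JSON record schema and that column properties are passed as a map of column name to property list, as /create/type expects. The helper `build_type_request`, the type name, and the column names are hypothetical. The commented-out calls assume a connected `gpudb.GPUdb` handle named `h_db`.

```python
import json


def build_type_request(columns, type_name="example_type"):
    """Build the type definition and properties map for a create-type request.

    columns: list of (name, base_type, properties) tuples (hypothetical helper input)
    """
    type_definition = json.dumps({
        "type": "record",
        "name": type_name,
        "fields": [{"name": name, "type": base_type} for name, base_type, _ in columns],
    })
    # Only columns that carry properties appear in the properties map
    properties = {name: list(props) for name, _, props in columns if props}
    return type_definition, properties


type_definition, properties = build_type_request([
    ("id", "int", []),
    ("dept_id", "int", ["dict"]),             # dictionary-encoded int column
    ("state", "string", ["char2", "dict"]),   # dictionary-encoded char2 column
])

# Assumed usage with a live connection (not executed here):
# response = h_db.create_type(type_definition=type_definition,
#                             label="example_type", properties=properties)
# h_db.create_table(table_name="example_table", type_id=response["type_id"])
```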
To apply dictionary encoding to a column after table creation, in Python:
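A minimal sketch of the arguments for such an /alter/table request follows. It assumes the `change_column` action with `column_type` and `column_properties` options; the helper name, table name, and column name are hypothetical. The commented-out call assumes a connected `gpudb.GPUdb` handle named `h_db`.

```python
def build_dict_alter_args(table_name, column_name, column_type, current_properties):
    """Build alter-table arguments that add the dict property to a column."""
    # Keep the column's existing properties and append dict exactly once
    properties = [p.strip() for p in current_properties if p.strip() != "dict"]
    properties.append("dict")
    return {
        "table_name": table_name,
        "action": "change_column",
        "value": column_name,
        "options": {
            "column_type": column_type,
            # Comma-separated, with no spaces between properties
            "column_properties": ",".join(properties),
        },
    }


args = build_dict_alter_args("employee", "state", "string", ["char2"])

# Assumed usage with a live connection (not executed here):
# h_db.alter_table(**args)
```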
To remove dictionary encoding from a column, in Python, alter the column and specify every property the column currently has except dict:
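A minimal sketch of the removal request follows, under the same assumptions about /alter/table (the `change_column` action with `column_type` and `column_properties` options). The helper name, table name, and column name are hypothetical, and `h_db` is an assumed `gpudb.GPUdb` handle.

```python
def build_dict_removal_args(table_name, column_name, column_type, current_properties):
    """Build alter-table arguments that re-specify a column without dict."""
    # Carry over every existing property except dict
    remaining = [p.strip() for p in current_properties if p.strip() != "dict"]
    return {
        "table_name": table_name,
        "action": "change_column",
        "value": column_name,
        "options": {
            "column_type": column_type,
            # Comma-separated, with no spaces between properties
            "column_properties": ",".join(remaining),
        },
    }


args = build_dict_removal_args("employee", "state", "string", ["char2", "dict"])

# Assumed usage with a live connection (not executed here):
# h_db.alter_table(**args)
```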
Important
Don't leave any spaces between properties in an /alter/table command's column_properties option.
Limitations and Cautions
Columns with any of the following characteristics are not eligible for dictionary encoding:
- Any effective type other than:
  - int
  - long
  - date
  - char1 - char256
- Store-only data handling
- Member of a filtered view or join view
- Member of an existing primary key or shard key; i.e., dictionary encoding cannot be applied to primary key or shard key columns after table creation
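The eligibility rules above can be summarized as a small pre-check. This is an illustrative sketch, not part of any API: the function and flag names are hypothetical, and it assumes the char types come in power-of-two widths from char1 to char256.

```python
# Assumption: char widths are powers of two (char1, char2, char4, ..., char256)
ELIGIBLE_TYPES = {"int", "long", "date"} | {f"char{2 ** i}" for i in range(9)}


def dict_ineligibility_reasons(effective_type, store_only=False,
                               in_view=False, is_existing_key=False):
    """Return the reasons a column cannot be dictionary encoded (empty if eligible)."""
    reasons = []
    if effective_type not in ELIGIBLE_TYPES:
        reasons.append(f"unsupported effective type: {effective_type}")
    if store_only:
        reasons.append("column uses store-only data handling")
    if in_view:
        reasons.append("column is a member of a filtered view or join view")
    if is_existing_key:
        reasons.append("column is part of an existing primary key or shard key")
    return reasons
```

For example, `dict_ineligibility_reasons("char32")` returns an empty list, while a `double` column or a store-only `int` column yields at least one reason.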