A type is analogous to a traditional database definition for a table. Before
data can be stored in Kinetica, a type must be specified for that data. For
every type, Kinetica assigns a unique GUID. Kinetica will use the same
GUID for all types with identical characteristics.
Every type in Kinetica consists of the following:
For details on nullability and working with nulls, see Nulls.
Type Label
A type label serves as a tagging mechanism for the type. The type label
can be any text string specified by the client. The type label serves two
purposes. First, it identifies tables with similar data. Second, it helps
determine a type’s uniqueness.
Type Definition
A type definition (sometimes referred to as type schema) consists of a set
of column names, their respective base data types, and their nullability.
Column Types
Each column in the database will have a specific type that is the
combination of a base type and an optional set of
column properties, listed below.
A given column in a type definition must be of one of the following
base types. If no data type column properties are
specified to modify the base type, the column will take on the corresponding
default specific type.
For example, a column with an int base type and no
column properties that modify the data type will have a specific type of
int, which is a 32-bit signed integer. A column with an int base type
and a data type column property of int16 will have a specific type of
int16, which is a 16-bit signed integer.
The following lists the specific type within each base type, the
number of bytes in memory an instance of the specific type
occupies, and the minimum and maximum values the specific type can
be assigned.
int
| Specific Type | Bytes | Minimum Value | Maximum Value |
|---|
int (default) | 4 | -2147483648 | 2147483647 |
int16 | 2 | -32768 | 32767 |
int8 | 1 | -128 | 127 |
boolean | 1 | 0 | 1 |
long
| Specific Type | Bytes | Minimum Value | Maximum Value |
|---|
long (default) | 8 | -9223372036854775808 | 9223372036854775807 |
timestamp | 8 | -30610224000000 (1/1/1000 00:00:00.000) | 29379542399999 (12/31/2900 23:59:59.999) |
float
| Specific Type | Bytes | Minimum Value | Maximum Value |
|---|
float | 4 | -3.40282 * 1038 | 3.40282 * 1038 |
double
| Specific Type | Bytes | Minimum Value | Maximum Value |
|---|
double | 8 | -1.79769313486231 * 10308 | 1.79769313486231 * 10308 |
string
| Specific Type | Bytes | Minimum Value | Maximum Value |
|---|
string (default) | 32 + length | (empty string) | (100,000,000,000 bytes) |
array(boolean[,n]) | varies | (empty array) | (100,000,000,000 bytes) |
array(int[,n]) | varies | (empty array) | (100,000,000,000 bytes) |
array(long[,n]) | varies | (empty array) | (100,000,000,000 bytes) |
array(ulong[,n]) | varies | (empty array) | (100,000,000,000 bytes) |
array(float[,n]) | varies | (empty array) | (100,000,000,000 bytes) |
array(double[,n]) | varies | (empty array) | (100,000,000,000 bytes) |
array(string[,n]) | varies | (empty array) | (100,000,000,000 bytes) |
char1 | 1 | (empty string) | (1 byte) |
char2 | 2 | (empty string) | (2 bytes) |
char4 | 4 | (empty string) | (4 bytes) |
char8 | 8 | (empty string) | (8 bytes) |
char16 | 16 | (empty string) | (16 bytes) |
char32 | 32 | (empty string) | (32 bytes) |
char64 | 64 | (empty string) | (64 bytes) |
char128 | 128 | (empty string) | (128 bytes) |
char256 | 256 | (empty string) | (256 bytes) |
decimal(p<=18,s) | 8 | -999999999999999999 | 999999999999999999 |
decimal(p>=19,s) | 12 | -39614081257132168796771975168 | 39614081257132168796771975167 |
date | 4 | 1000-01-01 | 2900-12-31 |
datetime | 8 | 1000-01-01 00:00:00.000 | 2900-12-31 23:59:59.999 |
ipv4 | 4 | 0.0.0.0 | 255.255.255.255 |
json | varies | (empty string) | (100,000,000,000 bytes) |
time | 4 | 00:00:00.000 | 23:59:59.999 |
ulong | 8 | 0 | 18446744073709551615 |
uuid | 16 | (empty string) | (36 bytes) |
wkt | varies | (empty geometry) | (100,000,000,000 bytes) |
bytes
| Specific Type | Bytes | Minimum Value | Maximum Value |
|---|
bytes (default) | N/A | (empty array) | (100,000,000,000 bytes) |
vector(n) | 4 * n | (empty vector) | (100,000,000,000 bytes) |
wkt | varies | (empty geometry) | (100,000,000,000 bytes) |
- For more information on complex column types, see: Arrays,
JSON, and Vector Type.
- Character types use UTF-8 encoding and are defined using byte-designated
sizes; the use of wide (multi-byte) characters will decrease the total
number of characters that can be stored in a given fixed-width string type.
- The decimal type should be used instead of float or double
when exact values of decimal places need to be represented. This is often
used with currency.
- The decimal type will be stored in a byte size appropriate for the
p
value (precision) specified; p values 18 and under will be stored in 8
bytes, while p values 19 and over will be stored in 12 bytes. The
maximum p value is 27 and the maximum s value (scale) is 18.
- Any date/time within the year
9999 can be used to denote an unknown or
out-of-range date. No dates/times in between 2900 and 9999 will be
recognized.
- Adding nullability to a column requires an additional byte per value;
e.g., a nullable integer requires 5 bytes in memory instead of 4.
- A valid UUID can be either 32 characters (without hyphens) or 36 characters
(with hyphens), e.g.,
1bd0b4cc0cbc11eb97be02420a000046 or
1bd0b4cc-0cbc-11eb-97be-02420a000046
- Numeric literals may be specified in decimal,
hexadecimal with a leading
0x, octal with a leading 0o, or binary
with a leading 0b. For example, the number 42 may be represented in any
of the following ways: 42, 0x2a, 0o52, and 0b101010.
Hexadecimal, octal and binary literals may also be specified as: x'2a',
o'52', and b'101010', respectively.
Column Properties
Kinetica has an additional layer of semantic regarding column data type. At the
time of creation, a column can be given one or more of the supported properties,
which give the column special meaning or handling. The properties can refine
the data type, direct special handling, or define the keyed nature of the
column. These modifiers can impact the number of records that can be stored in
memory, the performance of queries, and the types of operations that can be
performed on the data.
Data Types
The following properties can be used to modify the allowable set of values for
the corresponding base type. Only one of the following properties may be
applied to a given column, and the column must be of the stated base type. The
exception to this is nullability, which can be applied to any column in
addition to the data type specifiers listed here. However, nullable is not
used at type creation, but will be returned in a /show/table call.
Denoting a column’s nullability is API-dependent, and, with the exception of
the Java API, does not involve the direct use of the nullable column
property.
| Property | Base Type | Description |
|---|
array(boolean[,n]) | string | Boolean array of any length |
array(int[,n]) | string | Integer array of any length |
array(long[,n]) | string | Long array of any length |
array(ulong[,n]) | string | Unsigned long array of any length |
array(float[,n]) | string | Float array of any length |
array(double[,n]) | string | Double array of any length |
array(string[,n]) | string | String array of any length |
boolean | int | Numbers limited to 0 & 1; optimizes memory and query performance |
char1 | string | Text of up to 1 character; optimizes memory, disk, and query performance |
char2 | string | Text of up to 2 characters; optimizes memory, disk, and query performance |
char4 | string | Text of up to 4 characters; optimizes memory, disk, and query performance |
char8 | string | Text of up to 8 characters; optimizes memory, disk, and query performance |
char16 | string | Text of up to 16 characters; optimizes memory, disk, and query performance |
char32 | string | Text of up to 32 characters; optimizes memory, disk, and query performance |
char64 | string | Text of up to 64 characters; optimizes memory, disk, and query performance |
char128 | string | Text of up to 128 characters; optimizes memory, disk, and query performance |
char256 | string | Text of up to 256 characters; optimizes memory, disk, and query performance |
date | string | Interprets a string field as a date of the form YYYY-[M]M-[D]D |
datetime | string | Interprets a string field as a combination of date and time in the form of YYYY-[M]M-[D]D [H]H24:MI:SS[.mmm] |
decimal | string | Interprets a string field as a decimal number, with up to 27 digits of precision and 18 digits of scale |
int8 | int | Numbers limited to 8-bit signed integers; optimizes memory and query performance |
int16 | int | Numbers limited to 16-bit signed integers; optimizes memory and query performance |
ipv4 | string | Dotted decimal IPv4 addresses of the form: A.B.C.D where A, B, C and D are between 0 and 255, inclusive (e.g. 127.0.0.1); optimizes memory, disk, and query performance |
json | string | Interprets a string field as a JSON object |
nullable | <any> | Values can be set to null See Nulls for working with null types & values in the API |
time | string | Interprets a string field as a time of the form [H]H24:MI:SS[.mmm] |
timestamp | long | Timestamps in milliseconds since the Unix epoch: Jan 1 1970 00:00:00 |
ulong | string | Interprets a string field as a 64-bit unsigned long |
uuid | string | Interprets a string field as a 32 character (without hyphens) or 36 character (with hyphens) universally unique identifier (UUID) |
vector(n) | bytes | Vector of n float values, generally used in vector search |
wkt | string or bytes | Indicates that the column has WKT (or WKB) strings that should be handled as geometry objects. |
Data Handling
One or more of the following properties can be assigned to a column to alter the
way the data is stored & handled. Valid combinations are detailed in the
descriptions.
| Property | Description |
|---|
compress(type) | This property will compress the values of the associated column using the specified compression type, reducing disk usage and I/O load time. Queries against the column may also be faster. For column compression details, see Column Compression. |
data | Directs that the column’s data should be stored in memory, making it available for use in query expressions. Default property for all columns. |
dict | This property will dictionary encode the associated column, reducing memory and disk usage. Queries against the column will also be faster. For Dictionary Encoding details, see Dictionary Encoding. |
normalize | For vector column type, automatically normalizes each vector to have a magnitude (L2 norm) of 1; see Vector Type |
text_search | Enables full text search for string columns. For full text search details & limitations, see Full Text Search. |
Data Keys
One or both of these keyed attributes can be assigned to one or more columns.
For more information, see the sections on
primary keys &
shard keys.
Foreign keys cannot be assigned as column
properties.
| Property | Description |
|---|
primary_key | Makes this column part of (or the entire) primary key |
shard_key | Makes this column part of (or the entire) shard key |
Data Replacement
The following property can be assigned to a column to replace certain kinds of
data.
| Property | Description |
|---|
init_with_now | For date, time, datetime, and timestamp columns, this property will replace empty strings and invalid timestamp values with NOW() when inserting data. |
init_with_uuid | For uuid columns, this property will replace empty strings with a universally unique identifier (UUID). |
update_with_now | For date, datetime, time, and timestamp columns, this property will replace empty strings and invalid timestamp values with NOW() when updating data. |