Arrays

Array Type

The array data type provides the capability to store a list of primitive data type values in a single column.

All arrays in Kinetica are one-dimensional. They are also unbounded, though the number of items in the array can be specified upon table/column creation for compatibility with other databases.

Arrays can be lists of any of the following data types:

  • boolean
  • int
  • long
  • ulong
  • float
  • double
  • string

Create Table

SQL
1
2
3
4
5
CREATE OR REPLACE TABLE example.arr
(
    ai INTEGER[] NOT NULL,
    av3 VARCHAR[3] NOT NULL
)
Python
1
2
3
4
5
6
7
table_name = 'example.arr'
columns = [
    ['ai', 'string', 'array(int)'],
    ['as3', 'string', 'array(string,3)']
]

gpudb.GPUdbTable(columns, table_name, db = kinetica)

Insert Data

SQL
1
2
INSERT INTO example.arr
VALUES ('[123456789, -987654321, 0, 0]','["ABCDEFG", "HIJKLMNOPQRSTUVWXYZ", ""]')
Python (as strings)
1
2
3
4
5
6
7
8
data = [[
        '[123456789, -987654321, 0, 0]',
        '["ABCDEFG", "HIJKLMNOPQRSTUVWXYZ", ""]'
]]

t = gpudb.GPUdbTable(name = 'example.arr', db = kinetica)

t.insert_records(data)
Python (as native types)
1
2
3
4
5
6
7
8
data = [[
        [123456789, -987654321, 0, 0],
        ['ABCDEFG', 'HIJKLMNOPQRSTUVWXYZ', '']
]]

t = gpudb.GPUdbTable(name = 'example.arr', db = kinetica)

t.insert_records(data)

Retrieve Data

To retrieve the 2nd item of an integer array and the entirety of a string array:

SQL
1
2
SELECT ai[2] AS ai_item_2, av3
FROM example.arr
Python
1
2
3
4
5
6
7
t = gpudb.GPUdbTable(name = 'example.arr', db = kinetica)

records = t.get_records_by_column(["ARRAY_ITEM(ai, 2) AS ai_item_2", "as3"])

print('%-10s %s' % ('ai_item_2', 'as3'))
for record in zip(records['ai_item_2'], records['as3']):
    print('%-10d %s' % (record[0], record[1]))
Python (w/ native type conversions)
1
2
3
4
5
6
7
t = gpudb.GPUdbTable(name = 'example.arr', db = kinetica, convert_special_types_on_retrieval = True)

records = t.get_records_by_column(["ARRAY_ITEM(ai, 2) AS ai_item_2", "as3"], encoding = 'json')

print('%-10s %s' % ('ai_item_2', 'as3'))
for record in zip(records['ai_item_2'], records['as3']):
    print('%-10d %s' % (record[0], record[1]))

Array Functions

Scalar Functions

These functions can be applied to individual array column values.

FunctionDescription
ARRAY_APPEND(array, value)Returns an array consisting of array with value appended to the end.
ARRAY_CONCAT(arr1, arr2)Returns an array consisting of all the elements of array arr1 followed by all of the elements of array arr2.
ARRAY_CONTAINS(array, value)Returns true if value is present in array.
ARRAY_CONTAINS_ALL(arr1, arr2)Returns true if every value in array arr2 is present in array arr1; otherwise returns false.
ARRAY_CONTAINS_ANY(arr1, arr2)Returns true if any value in array arr2 is present in array arr; otherwise returns false.
ARRAY_DISTINCT(array)Returns an array consisting of array values with duplicates removed.
ARRAY_EMPTY(array)Returns true if array is empty; otherwise returns false.
ARRAY_EXCEPT(arr1, arr2)Returns an array consisting of all of the elements of array arr1 with the ones that also appear in array arr2 removed.
ARRAY_INTERSECT(arr1, arr2)Returns an array consisting of all of the elements of array arr1 that also appear in array arr2.
ARRAY_ITEM(array, pos)

Returns the item in array at the given 1-based pos position.

See examples.

ARRAY_LENGTH(array)

Returns the number of items in the given array at the 1st dimension.

See examples.

ARRAY_LOWER(array, dim)Returns the lowest index of array in the given dim dimension; since all arrays are 1-dimensional, only specifying a dimension of 1 will return a value, and since array indices are 1-based, that value will always be 1.
ARRAY_NDIMS(array)Returns the number of dimensions in the given array; since all arrays are 1-dimensional, that number will always be 1.
ARRAY_NOT_EMPTY(array)Returns true if array is not empty; otherwise returns false.
ARRAY_SLICE(array, from, to)Returns an array consisting of the values from array between the from index (inclusive) up to the to index (exclusive) using 0-based indexing. Negative indexes are treated as offsets from the end of the array.
ARRAY_TO_STRING(array, delim)

Converts the items of array into a string delimited by delim.

See examples.

ARRAY_UPPER(array, dim)Returns the highest index of array in the given dim dimension; since all arrays are 1-dimensional, only specifying a dimension of 1 will return a value, and since array indices are 1-based, that value will always be equivalent to ARRAY_LENGTH.
MAKE_ARRAY(value)Returns an array consisting of the single value element.
STRING_TO_ARRAY(str, delim)

Converts the values in str, delimited by delim, into an array.

See examples.


Examples

ARRAY_ITEM
SQL
1
2
SELECT ARRAY_ITEM(ai, 2) AS ai_item_2, ARRAY_ITEM(av3, 2) AS av3_item_2
FROM example.arr
Python
1
2
3
t = gpudb.GPUdbTable(name = 'example.arr', db = kinetica)

records = t.get_records_by_column(["ARRAY_ITEM(ai, 2) AS ai_item_2", "as3"])

+--------------+-----------------------+
|    ai_item_2 | av3_item_2            |
+--------------+-----------------------+
|   -987654321 | HIJKLMNOPQRSTUVWXYZ   |
+--------------+-----------------------+
ARRAY_LENGTH
SQL
1
2
SELECT ARRAY_LENGTH(ai) AS ai_len, ARRAY_LENGTH(av3) AS av3_len
FROM example.arr
Python
1
2
3
t = gpudb.GPUdbTable(name = 'example.arr', db = kinetica)

records = t.get_records_by_column(["ARRAY_LENGTH(ai) AS ai_len", "ARRAY_LENGTH(as3) AS as3_len"])

+----------+-----------+
|   ai_len |   av3_len |
+----------+-----------+
|        4 |         3 |
+----------+-----------+
ARRAY_TO_STRING
SQL
1
2
SELECT ARRAY_TO_STRING(ai, ',') AS ai_str, ARRAY_TO_STRING(av3, '~') AS av3_str
FROM example.arr
Python
1
2
3
t = gpudb.GPUdbTable(name = 'example.arr', db = kinetica)

records = t.get_records_by_column(["ARRAY_TO_STRING(ai, ',') AS ai_str", "ARRAY_TO_STRING(as3, '~') AS as3_str"])

+----------------------------+--------------------------------+
| ai_str                     | av3_str                        |
+----------------------------+--------------------------------+
| 123456789,-987654321,0,0   | ABCDEFG~HIJKLMNOPQRSTUVWXYZ~   |
+----------------------------+--------------------------------+
STRING_TO_ARRAY
SQL
1
2
3
SELECT
    STRING_TO_ARRAY('123456789,-987654321,0,0', ',') AS ai_str,
    STRING_TO_ARRAY('ABCDEFG~HIJKLMNOPQRSTUVWXYZ~', '~') AS av3_str
Python
1
2
3
4
5
6
t = gpudb.GPUdbTable(name = 'SYSTEM.ITER', db = kinetica)

records = t.get_records_by_column([
    "STRING_TO_ARRAY('123456789,-987654321,0,0', ',') AS stri_as",
    "STRING_TO_ARRAY('ABCDEFG~HIJKLMNOPQRSTUVWXYZ~', '~') AS strs_as"
])

+--------------------------------------+------------------------------------------+
| ai_str                               | av3_str                                  |
+--------------------------------------+------------------------------------------+
| ["123456789","-987654321","0","0"]   | ["ABCDEFG","HIJKLMNOPQRSTUVWXYZ",null]   |
+--------------------------------------+------------------------------------------+

Aggregation/Transposition Functions

These functions can be used to convert primitive column values into arrays of those values (aggregation) or to convert array columns into columns of primitive values (unnest).

FunctionDescription
ARRAY_AGG(expr)

Combines column values within a group into a single array of those values.

See examples.

ARRAY_AGG_DISTINCT(expr)

Combines column values within a group into a single array of those values, removing any duplicates.

See examples.

UNNEST_JSON_ARRAY

Transposes array values into columnar data.

See UNNEST_JSON_ARRAY.


Examples

ARRAY_AGG
Input
+------+---------+
|   id | name    |
+------+---------+
|    1 | Joe     |
|    2 | Jane    |
|    3 | Jeff    |
|    4 | Joyce   |
+------+---------+
SQL
1
2
SELECT ARRAY_AGG(id) AS id_list, ARRAY_AGG(name) AS name_list
FROM example.arr_agg
Python
1
2
3
4
5
6
t = gpudb.GPUdbTable(name = 'example.arr_agg', db = kinetica)

records = t.aggregate_group_by([
    "ARRAY_AGG(id) AS id_list",
    "ARRAY_AGG(name) AS name_list"
])['records']

+-------------+---------------------------------+
| id_list     | name_list                       |
+-------------+---------------------------------+
| [1,2,3,4]   | ["Joe","Jane","Jeff","Joyce"]   |
+-------------+---------------------------------+
ARRAY_AGG_DISTINCT
Input
+------+---------+
|   id | name    |
+------+---------+
|    1 | Joe     |
|    1 | Joyce   |
|    2 | Jane    |
|    2 | Jeff    |
|    3 | Jane    |
|    3 | Jeff    |
|    4 | Joe     |
|    4 | Joyce   |
+------+---------+
SQL
1
2
SELECT ARRAY_AGG_DISTINCT(id) AS id_list, ARRAY_AGG_DISTINCT(name) AS name_list
FROM example.arr_agg
Python
1
2
3
4
5
6
t = gpudb.GPUdbTable(name = 'example.arr_agg', db = kinetica)

records = t.aggregate_group_by([
    "ARRAY_AGG_DISTINCT(id) AS id_list",
    "ARRAY_AGG_DISTINCT(name) AS name_list"
])['records']

+-------------+---------------------------------+
| id_list     | name_list                       |
+-------------+---------------------------------+
| [1,2,3,4]   | ["Jane","Jeff","Joe","Joyce"]   |
+-------------+---------------------------------+

UNNEST_JSON_ARRAY

The UNNEST_JSON_ARRAY function transposes array values into columnar data.

The two forms of the UNNEST_JSON_ARRAY function follow, one for cases where only one column needs to be unnested and one for cases where more than one column needs to be unnested. See examples.

UNNEST_JSON_ARRAY for 1 Column Syntax
1
2
SELECT <column alias> [, ordinality]
FROM UNNEST_JSON_ARRAY(SELECT <column name> FROM <table name>) [WITH ORDINALITY] [<column alias>]
ParametersDescription
<column alias>Column alias to use for the resulting unnested data column; if no alias specified, use * to select the column.
<table name>Name of the table containing the array column.
<column name>Name of the array column to unnest.
WITH ORDINALITYAlso return the 0-based index in the source arrays for each transposed array value; the index value will be returned as the column alias ordinality.

UNNEST_JSON_ARRAY for Multiple Columns Syntax
1
2
SELECT <column alias list> [, <ordinality alias>]
FROM <table name> <table alias>, UNNEST_JSON_ARRAY(<column list>) [WITH ORDINALITY] <table/columns aliases>
ParametersDescription
<column alias list>Column aliases given to the resulting unnested data columns, specified in <table/columns aliases>.
<ordinality alias>Column alias given to the resulting ordinality column in <table/columns aliases>, if WITH ORDINALITY is specified.
<table name>Name of the table containing the array column(s).
<table alias>Alias to use for the table containing the array column(s).
<column list>Comma-separated list of array columns to unnest.
<table/columns aliases>

Alias to use for the virtual table and columns being used to return the unnested data (including any ordinality alias, if specified), using the form:

<table alias>(<column alias list> [,<ordinality alias>])

For example, to return two columns of unnested array data as uai & uav, respectively, and the corresponding ordinality index as index, all within the virtual table unn use:

unn(uai, uav, index)
WITH ORDINALITYAlso return the 0-based index in the source arrays for each transposed array value; the index value will be returned in an alias specified in <table/columns aliases>.

Examples

The following examples use this data set as a baseline:

+-----------+-----------------+
| ai        | av              |
+-----------+-----------------+
| [1,2,3]   | ["A","B","C"]   |
| [4,5,6]   | ["D","E","F"]   |
| [7,8,9]   | ["D","E","F"]   |
| [1,2,3]   | ["G","H","I"]   |
+-----------+-----------------+

To turn the strings in the av array in the example.arr_unn table into a column of values aliased as uav:

1
2
SELECT uav
FROM UNNEST_JSON_ARRAY(SELECT av FROM example.arr_unn) uav
+-------+
| uav   |
+-------+
| A     |
| B     |
| C     |
| D     |
| D     |
| E     |
| E     |
| F     |
| F     |
| G     |
| H     |
| I     |
+-------+

To turn the integers in the ai array in the example.arr_unn table into a column of values aliased as uai and the corresponding index in the original array for each as a column of indices named ordinality:

1
2
SELECT uai, ordinality
FROM UNNEST_JSON_ARRAY(SELECT ai FROM example.arr_unn) WITH ORDINALITY uai
+-------+--------------+
|   uai |   ordinality |
+-------+--------------+
|     1 |            0 |
|     1 |            0 |
|     2 |            1 |
|     2 |            1 |
|     3 |            2 |
|     3 |            2 |
|     4 |            0 |
|     5 |            1 |
|     6 |            2 |
|     7 |            0 |
|     8 |            1 |
|     9 |            2 |
+-------+--------------+

To turn the integers & strings in the ai & av arrays in the example.arr_unn table into two columns of values aliased as uai & uav in a virtual table with alias t, respectively:

1
2
SELECT uai, uav
FROM example.arr_unn eau, UNNEST_JSON_ARRAY(eau.ai, eau.av) t(uai, uav)
+-------+-------+
|   uai | uav   |
+-------+-------+
|     1 | A     |
|     1 | G     |
|     2 | B     |
|     2 | H     |
|     3 | C     |
|     3 | I     |
|     4 | D     |
|     5 | E     |
|     6 | F     |
|     7 | D     |
|     8 | E     |
|     9 | F     |
+-------+-------+

To turn the integers & strings in the ai & av arrays in the example.arr_unn table into two columns of values aliased as uai & uav in a virtual table with alias t, respectively, and adding the corresponding index in the original array for each as a column of indices aliased as index:

1
2
SELECT uai, uav, index
FROM example.arr_unn eau, UNNEST_JSON_ARRAY(eau.ai, eau.av) WITH ORDINALITY t(uai, uav, index)
+-------+-------+---------+
|   uai | uav   |   index |
+-------+-------+---------+
|     1 | A     |       0 |
|     1 | G     |       0 |
|     2 | B     |       1 |
|     2 | H     |       1 |
|     3 | C     |       2 |
|     3 | I     |       2 |
|     4 | D     |       0 |
|     5 | E     |       1 |
|     6 | F     |       2 |
|     7 | D     |       0 |
|     8 | E     |       1 |
|     9 | F     |       2 |
+-------+-------+---------+

To turn the integers in the ai array in the example.arr_unn table into a column of values aliased as uai, removing any duplicates across all arrays:

1
2
SELECT DISTINCT uai
FROM UNNEST_JSON_ARRAY(SELECT ai FROM example.arr_unn) uai
+-------+
|   uai |
+-------+
|     1 |
|     2 |
|     3 |
|     4 |
|     5 |
|     6 |
|     7 |
|     8 |
|     9 |
+-------+