Arrays

Array Type

The array data type provides the capability to store a list of primitive data type values in a single column.

All arrays in Kinetica are one-dimensional. They are also unbounded, though the number of items in the array can be specified upon table/column creation for compatibility with other databases.

Arrays can be lists of any of the following data types:

  • boolean
  • int
  • long
  • float
  • double
  • string

Create Table

SQL
1
2
3
4
5
CREATE OR REPLACE TABLE example.arr
(
    ai INTEGER[] NOT NULL,
    av3 VARCHAR[3] NOT NULL
)
Python
1
2
3
4
5
6
7
table_name = 'example.arr'
columns = [
    ['ai', 'string', 'array(int)'],
    ['as3', 'string', 'array(string,3)']
]

gpudb.GPUdbTable(columns, table_name, db = kinetica)

Insert Data

SQL
1
2
INSERT INTO example.arr
VALUES ('[123456789, -987654321, 0, 0]','["ABCDEFG", "HIJKLMNOPQRSTUVWXYZ", ""]')
Python (as strings)
1
2
3
4
5
6
7
8
data = [[
        '[123456789, -987654321, 0, 0]',
        '["ABCDEFG", "HIJKLMNOPQRSTUVWXYZ", ""]'
]]

t = gpudb.GPUdbTable(name = 'example.arr', db = kinetica)

t.insert_records(data)
Python (as native types)
1
2
3
4
5
6
7
8
data = [[
        [123456789, -987654321, 0, 0],
        ['ABCDEFG', 'HIJKLMNOPQRSTUVWXYZ', '']
]]

t = gpudb.GPUdbTable(name = 'example.arr', db = kinetica)

t.insert_records(data)

Retrieve Data

To retrieve the 2nd item of an integer array and the entirety of a string array:

SQL
1
2
SELECT ai[2] AS ai_item_2, av3
FROM example.arr
Python
1
2
3
4
5
6
7
t = gpudb.GPUdbTable(name = 'example.arr', db = kinetica)

records = t.get_records_by_column(["ARRAY_ITEM(ai, 2) AS ai_item_2", "as3"])

print('%-10s %s' % ('ai_item_2', 'as3'))
for record in zip(records['ai_item_2'], records['as3']):
    print('%-10d %s' % (record[0], record[1]))
Python (w/ native type conversions)
1
2
3
4
5
6
7
t = gpudb.GPUdbTable(name = 'example.arr', db = kinetica, convert_special_types_on_retrieval = True)

records = t.get_records_by_column(["ARRAY_ITEM(ai, 2) AS ai_item_2", "as3"], encoding = 'json')

print('%-10s %s' % ('ai_item_2', 'as3'))
for record in zip(records['ai_item_2'], records['as3']):
    print('%-10d %s' % (record[0], record[1]))

Array Functions

Scalar Functions

These functions can be applied to individual array column values.

FunctionDescription
ARRAY_ITEM(array, pos)

Returns the item in array at the given 1-based pos position

See examples.

ARRAY_LENGTH(array)

Returns the number of items in the given array at the 1st dimension

See examples.

ARRAY_LOWER(array, dim)Returns the lowest index of array in the given dim dimension; since all arrays are 1-dimensional, only specifying a dimension of 1 will return a value, and since array indices are 1-based, that value will always be 1
ARRAY_NDIMS(array)Returns the number of dimensions in the given array; since all arrays are 1-dimensional, that number will always be 1
ARRAY_TO_STRING(array, delim)

Converts the items of array into a string delimited by delim

See examples.

ARRAY_UPPER(array, dim)Returns the highest index of array in the given dim dimension; since all arrays are 1-dimensional, only specifying a dimension of 1 will return a value, and since array indices are 1-based, that value will always be equivalent to ARRAY_LENGTH
STRING_TO_ARRAY(str, delim)

Converts the values in str, delimited by delim, into an array

See examples.


Examples

ARRAY_ITEM
SQL
1
2
SELECT ARRAY_ITEM(ai, 2) AS ai_item_2, ARRAY_ITEM(av3, 2) AS av3_item_2
FROM example.arr
Python
1
2
3
t = gpudb.GPUdbTable(name = 'example.arr', db = kinetica)

records = t.get_records_by_column(["ARRAY_ITEM(ai, 2) AS ai_item_2", "as3"])

+--------------+-----------------------+
|    ai_item_2 | av3_item_2            |
+--------------+-----------------------+
|   -987654321 | HIJKLMNOPQRSTUVWXYZ   |
+--------------+-----------------------+
ARRAY_LENGTH
SQL
1
2
SELECT ARRAY_LENGTH(ai) AS ai_len, ARRAY_LENGTH(av3) AS av3_len
FROM example.arr
Python
1
2
3
t = gpudb.GPUdbTable(name = 'example.arr', db = kinetica)

records = t.get_records_by_column(["ARRAY_LENGTH(ai) AS ai_len", "ARRAY_LENGTH(as3) AS as3_len"])

+----------+-----------+
|   ai_len |   av3_len |
+----------+-----------+
|        4 |         3 |
+----------+-----------+
ARRAY_TO_STRING
SQL
1
2
SELECT ARRAY_TO_STRING(ai, ',') AS ai_str, ARRAY_TO_STRING(av3, '~') AS av3_str
FROM example.arr
Python
1
2
3
t = gpudb.GPUdbTable(name = 'example.arr', db = kinetica)

records = t.get_records_by_column(["ARRAY_TO_STRING(ai, ',') AS ai_str", "ARRAY_TO_STRING(as3, '~') AS as3_str"])

+----------------------------+--------------------------------+
| ai_str                     | av3_str                        |
+----------------------------+--------------------------------+
| 123456789,-987654321,0,0   | ABCDEFG~HIJKLMNOPQRSTUVWXYZ~   |
+----------------------------+--------------------------------+
STRING_TO_ARRAY
SQL
1
2
3
SELECT
    STRING_TO_ARRAY('123456789,-987654321,0,0', ',') AS ai_str,
    STRING_TO_ARRAY('ABCDEFG~HIJKLMNOPQRSTUVWXYZ~', '~') AS av3_str
Python
1
2
3
4
5
6
t = gpudb.GPUdbTable(name = 'SYSTEM.ITER', db = kinetica)

records = t.get_records_by_column([
    "STRING_TO_ARRAY('123456789,-987654321,0,0', ',') AS stri_as",
    "STRING_TO_ARRAY('ABCDEFG~HIJKLMNOPQRSTUVWXYZ~', '~') AS strs_as"
])

+--------------------------------------+------------------------------------------+
| ai_str                               | av3_str                                  |
+--------------------------------------+------------------------------------------+
| ["123456789","-987654321","0","0"]   | ["ABCDEFG","HIJKLMNOPQRSTUVWXYZ",null]   |
+--------------------------------------+------------------------------------------+

Aggregation Functions

These functions can be used across multiple primitive column values within grouping queries.

FunctionDescription
ARRAY_AGG(expr)

Combines column values within a group into a single array of those values

See examples.

ARRAY_AGG_DISTINCT(expr)

Combines column values within a group into a single array of those values, removing any duplicates

See examples.


Examples

ARRAY_AGG
Input
+------+---------+
|   id | name    |
+------+---------+
|    1 | Joe     |
|    2 | Jane    |
|    3 | Jeff    |
|    4 | Joyce   |
+------+---------+
SQL
1
2
SELECT ARRAY_AGG(id) AS id_list, ARRAY_AGG(name) AS name_list
FROM example.arr_agg
Python
1
2
3
4
5
6
t = gpudb.GPUdbTable(name = 'example.arr_agg', db = kinetica)

records = t.aggregate_group_by([
    "ARRAY_AGG(id) AS id_list",
    "ARRAY_AGG(name) AS name_list"
])['records']

+-------------+---------------------------------+
| id_list     | name_list                       |
+-------------+---------------------------------+
| [1,3,4,2]   | ["Joe","Jeff","Joyce","Jane"]   |
+-------------+---------------------------------+
ARRAY_AGG_DISTINCT
Input
+------+---------+
|   id | name    |
+------+---------+
|    1 | Joe     |
|    1 | Joyce   |
|    2 | Jane    |
|    2 | Jeff    |
|    3 | Jane    |
|    3 | Jeff    |
|    4 | Joe     |
|    4 | Joyce   |
+------+---------+
SQL
1
2
SELECT ARRAY_AGG_DISTINCT(id) AS id_list, ARRAY_AGG_DISTINCT(name) AS name_list
FROM example.arr_agg
Python
1
2
3
4
5
6
t = gpudb.GPUdbTable(name = 'example.arr_agg', db = kinetica)

records = t.aggregate_group_by([
    "ARRAY_AGG_DISTINCT(id) AS id_list",
    "ARRAY_AGG_DISTINCT(name) AS name_list"
])['records']

+-------------+---------------------------------+
| id_list     | name_list                       |
+-------------+---------------------------------+
| [1,2,3,4]   | ["Jane","Jeff","Joe","Joyce"]   |
+-------------+---------------------------------+

UNNEST_JSON_ARRAY

The UNNEST_JSON_ARRAY function transposes array values into columnar data.

The two forms of the UNNEST_JSON_ARRAY function follow, one for cases where only one column needs to be unnested and one for cases where more than one column needs to be unnested. See examples.

UNNEST_JSON_ARRAY for 1 Column Syntax
1
2
SELECT <column alias> [, ordinality]
FROM UNNEST_JSON_ARRAY(SELECT <column name> FROM <table name>) [WITH ORDINALITY] [<column alias>]
ParametersDescription
<column alias>Column alias to use for the resulting unnested data column; if no alias specified, use * to select the column
<table name>Name of the table containing the array column
<column name>Name of the array column to unnest
WITH ORDINALITYAlso return the 0-based index in the source arrays for each transposed array value; the index value will be returned as the column alias ordinality

UNNEST_JSON_ARRAY for Multiple Columns Syntax
1
2
SELECT <column alias list> [, <ordinality alias>]
FROM <table name> <table alias>, UNNEST_JSON_ARRAY(<column list>) [WITH ORDINALITY] <table/columns aliases>
ParametersDescription
<column alias list>Column aliases given to the resulting unnested data columns, specified in <table/columns aliases>
<ordinality alias>Column alias given to the resulting ordinality column in <table/columns aliases>, if WITH ORDINALITY is specified
<table name>Name of the table containing the array column(s)
<table alias>Alias to use for the table containing the array column(s)
<column list>Comma-separated list of array columns to unnest
<table/columns aliases>

Alias to use for the virtual table and columns being used to return the unnested data (including any ordinality alias, if specified), using the form:

<table alias>(<column alias list> [,<ordinality alias>])

For example, to return two columns of unnested array data as uai & uav, respectively, and the corresponding ordinality index as index, all within the virtual table unn use:

unn(uai, uav, index)
WITH ORDINALITYAlso return the 0-based index in the source arrays for each transposed array value; the index value will be returned in an alias specified in <table/columns aliases>

Examples

The following examples use this data set as a baseline:

+-----------+-----------------+
| ai        | av              |
+-----------+-----------------+
| [1,2,3]   | ["A","B","C"]   |
| [4,5,6]   | ["D","E","F"]   |
| [7,8,9]   | ["D","E","F"]   |
| [1,2,3]   | ["G","H","I"]   |
+-----------+-----------------+

To turn the strings in the av array in the example.arr_unn table into a column of values aliased as uav:

1
2
SELECT uav
FROM UNNEST_JSON_ARRAY(SELECT av FROM example.arr_unn) uav
+-------+
| uav   |
+-------+
| A     |
| B     |
| C     |
| D     |
| D     |
| E     |
| E     |
| F     |
| F     |
| G     |
| H     |
| I     |
+-------+

To turn the integers in the ai array in the example.arr_unn table into a column of values aliased as uai and the corresponding index in the original array for each as a column of indices named ordinality:

1
2
SELECT uai, ordinality
FROM UNNEST_JSON_ARRAY(SELECT ai FROM example.arr_unn) WITH ORDINALITY uai
+-------+--------------+
|   uai |   ordinality |
+-------+--------------+
|     1 |            0 |
|     1 |            0 |
|     2 |            1 |
|     2 |            1 |
|     3 |            2 |
|     3 |            2 |
|     4 |            0 |
|     5 |            1 |
|     6 |            2 |
|     7 |            0 |
|     8 |            1 |
|     9 |            2 |
+-------+--------------+

To turn the integers & strings in the ai & av arrays in the example.arr_unn table into two columns of values aliased as uai & uav in a virtual table with alias t, respectively:

1
2
SELECT uai, uav
FROM example.arr_unn eau, UNNEST_JSON_ARRAY(eau.ai, eau.av) t(uai, uav)
+-------+-------+
|   uai | uav   |
+-------+-------+
|     1 | A     |
|     1 | G     |
|     2 | B     |
|     2 | H     |
|     3 | C     |
|     3 | I     |
|     4 | D     |
|     5 | E     |
|     6 | F     |
|     7 | D     |
|     8 | E     |
|     9 | F     |
+-------+-------+

To turn the integers & strings in the ai & av arrays in the example.arr_unn table into two columns of values aliased as uai & uav in a virtual table with alias t, respectively, and adding the corresponding index in the original array for each as a column of indices aliased as index:

1
2
SELECT uai, uav, index
FROM example.arr_unn eau, UNNEST_JSON_ARRAY(eau.ai, eau.av) WITH ORDINALITY t(uai, uav, index)
+-------+-------+---------+
|   uai | uav   |   index |
+-------+-------+---------+
|     1 | A     |       0 |
|     1 | G     |       0 |
|     2 | B     |       1 |
|     2 | H     |       1 |
|     3 | C     |       2 |
|     3 | I     |       2 |
|     4 | D     |       0 |
|     5 | E     |       1 |
|     6 | F     |       2 |
|     7 | D     |       0 |
|     8 | E     |       1 |
|     9 | F     |       2 |
+-------+-------+---------+

To turn the integers in the ai array in the example.arr_unn table into a column of values aliased as uai, removing any duplicates across all arrays:

1
2
SELECT DISTINCT uai
FROM UNNEST_JSON_ARRAY(SELECT ai FROM example.arr_unn) uai
+-------+
|   uai |
+-------+
|     1 |
|     2 |
|     3 |
|     4 |
|     5 |
|     6 |
|     7 |
|     8 |
|     9 |
+-------+