User Defined Functions (UDFs)

Kinetica provides support for User-Defined Function (UDF) creation and management in SQL. Unlike conventional UDFs, Kinetica UDFs are external programs that can be managed via SQL and may run in distributed fashion across the cluster.

A UDF that is defined to return a specific table type is known as a User-Defined Table Function (UDTF), which can be used within a SELECT statement to return the output of the function as a query result set.

UDF/UDTF features accessible via SQL include:

For UDF/UDTF execute permission management, see:


CREATE FUNCTION

Creates a new UDF/UDTF with the given options. A UDTF is simply a UDF defined to return a single table, using the RETURNS TABLE clause. Only users with the SYSTEM ADMIN permission can create UDFs & UDTFs.

CREATE FUNCTION Syntax
1
2
3
4
5
6
7
CREATE [OR REPLACE] FUNCTION <function name>
[RETURNS TABLE (<column definition list>)]
[MODE = '<execution mode>']
[RUN_COMMAND = '<command>']
[RUN_COMMAND_ARGS = '<command args>']
FILE PATHS '<file paths>'
[WITH OPTIONS (<option name> = '<option value>'[,...])]
Parameters Description
OR REPLACE Any existing UDF with the same name will be dropped before creating this one
<function name> Name of the UDF, which can be referenced in subsequent commands
RETURNS TABLE

Specifies that the UDF returns a table after execution and defines the column names and types to which the function's output data is expected to conform

Important

This defines the UDF as a User-Defined Table Function (UDTF), which allows the function to be called within a SQL query; see Executing Functions for details.

MODE

Name of the execution mode for the UDF; the following modes are available:

Constant Description
DISTRIBUTED The function's processing will be distributed amongst the nodes in the cluster
NON-DISTRIBUTED The function's processing will be completed on a single node in the cluster
RUN_COMMAND The command used to execute the files associated with the UDF; e.g., python, java
RUN_COMMAND_ARGS The arguments supplied alongside the given RUN_COMMAND; e.g., -jar udf.jar, --username jdoe --password 'Password123!'
FILE PATHS A comma-separated list of single-quoted file paths from which the function and any additional files will be loaded; the file paths should be KiFS locations and exist prior to the UDF/UDTF creation.
WITH OPTIONS

Optional indicator that a comma-delimited list of option/value assignments will follow

Option Description
max_concurrency_per_node The maximum number of concurrent instances of the function that will be executed per node. Setting to 0 allows unlimited concurrency. The default value is 0.

Before creating a UDF/UDTF, upload the function file, udf_sos_py_proc.py to a new directory, udf, in KiFS:

Create UDF Directory Example
1
CREATE DIRECTORY 'udf'
UPLOAD UDF FILE Example
1
2
UPLOAD FILE '/tmp/proc/udf_sos_py_proc.py'
INTO 'udf'

To create a distributed Python UDF_SOS_PY_PROC UDF that uses that file and returns a table featuring two columns:

CREATE FUNCTION (UDF) Example
1
2
3
4
5
CREATE OR REPLACE FUNCTION UDF_SOS_PY_PROC
MODE = 'distributed'
RUN_COMMAND = 'python'
RUN_COMMAND_ARGS = 'udf/udf_sos_py_proc.py'
FILE PATHS 'kifs://udf/udf_sos_py_proc.py'

To create the same function as a UDTF instead:

CREATE FUNCTION (UDTF) Example
1
2
3
4
5
6
CREATE OR REPLACE FUNCTION UDTF_SOS_PY_PROC
RETURNS TABLE (id SMALLINT NOT NULL, y FLOAT)
MODE = 'distributed'
RUN_COMMAND = 'python'
RUN_COMMAND_ARGS = 'udf/udf_sos_py_proc.py'
FILE PATHS 'kifs://udf/udf_sos_py_proc.py'

The Python script, udf_sos_py_proc.py , employed in these examples is shown below:

UDF Function Python Script
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
################################################################################
#                                                                              #
# Kinetica UDF Sum of Squares Example UDF                                      #
# ---------------------------------------------------------------------------- #
# This UDF takes pairs of input & output tables, computing the sum of the      #
# squares of all the columns for each input table and saving the resulting     #
# sums to the first column of the corresponding output table.                  #
#                                                                              #
################################################################################

import sys
import math
from itertools import islice
from kinetica_proc import ProcData

# Instantiate a handle to the ProcData() class
proc_data = ProcData()

# For each pair of input & output tables, calculate the sum of squares of input
#    columns and save results to first output table column
for in_table, out_table in zip(proc_data.input_data, proc_data.output_data):

    # Extend the output table's record capacity by the number of records in the input table
    out_table.size = in_table.size

    # Grab a handle to the second column in the output table (the sum column)
    y = out_table[1];

    # For every record in the table...
    for i in range(0, in_table.size):
        # Copy the input IDs in the first column of the input table
        # to the first column of the output table for later association
        out_table[0][i] = in_table[0][i]

    # Loop through the remaining input table columns
    for in_column in islice(in_table, 1, None):
        # For every record value in the column...
        for calc_num in range(0, in_table.size):
            # Add the square of that value to the corresponding output column
            y[calc_num] += in_column[calc_num] ** 2

proc_data.complete()

Executing Functions

An existing UDF/UDTF can be executed using the EXECUTE FUNCTION command. Additionally, a UDTF can be executed as a table function within a SELECT statement. Any user with the SYSTEM ADMIN permission or the EXECUTE FUNCTION permission on a specific UDF/UDTF (or across all UDFs/UDTFs) is allowed to execute it.

Execute UDF/UDTF Syntax
1
2
3
4
5
6
EXECUTE FUNCTION <function name>
(
   <function parameter 1> => <function value 1>,
   ...
   <function parameter N> => <function value N>
)
Execute UDTF Table Function Syntax
1
2
3
4
5
6
7
8
9
SELECT * FROM TABLE
(
   <function name>
   (
      <function parameter 1> => <function value 1>,
      ...
      <function parameter N> => <function value N>
   )
)

The <function name> should be the name of an existing UDF.

Function Parameters

The following are the available function parameters that may be passed to a UDF in an EXECUTE FUNCTION or table function call, along with their corresponding function values.

Parameters Description
INPUT_TABLE_NAMES

Optional set of "tables" that will be used as input to the UDF/UDTF, specified as a set of queries defining the source data for each "table". The value of this parameter is either the INPUT_TABLE or INPUT_TABLES function, depending on the number of input "tables".

Value Description
INPUT_TABLE(<query>)

If only one input table is needed, it can be passed by query or by name to the INPUT_TABLE function; e.g.:

INPUT_TABLE(customer)

INPUT_TABLE(SELECT id, name FROM customer)
INPUT_TABLES(<queries>)

If more than one input table is needed, each can be passed via query as a list to the INPUT_TABLES function; e.g.:

INPUT_TABLES
(
    (SELECT id, name FROM customer),
    (SELECT id, customer_id FROM order)
)
OUTPUT_TABLE_NAMES

Optional set of tables that will be used as output, specified as a set of names passed as a comma-delimited list of strings to the OUTPUT_TABLES function; e.g.:

OUTPUT_TABLES('sales_summary', 'customer_profile')
PARAMS

Optional list of parameters to pass to the function, specified as a set of key/value pairs passed as a comma-delimited list of <key> = '<value>' assignments to the KV_PAIRS function; e.g.:

KV_PAIRS(customer_id = '1', report_type = 'yearly')
OPTIONS

Optional list of function execution options, specified as a set of key/value pairs passed as a comma-delimited list of <key> = '<value>' assignments to the KV_PAIRS function; e.g.:

KV_PAIRS(max_concurrency_per_node = '2')

For example, to execute the UDF_SOS_PY_PROC UDF, specifying an input and output table:

Execute UDF/UDTF Example
1
2
3
4
5
EXECUTE FUNCTION UDF_SOS_PY_PROC
(
    INPUT_TABLE_NAMES => INPUT_TABLE(SELECT id, x1, x2 FROM example.udf_sos_in_table),
    OUTPUT_TABLE_NAMES => OUTPUT_TABLES('example.udf_sos_out_table')
)

To execute the UDTF_SOS_PY_PROC UDTF, specifying an input table and returning the output as the result set of a SELECT statement:

Execute UDTF Table Function Example
1
2
3
4
5
6
7
8
SELECT * FROM TABLE
(
    UDTF_SOS_PY_PROC
    (
        INPUT_TABLE_NAMES => INPUT_TABLE(SELECT id, x1, x2 FROM example.udf_sos_in_table)
    )
)
ORDER BY id

DROP FUNCTION

Removes an existing UDF/UDTF. Only users with the SYSTEM ADMIN permission can remove UDFs.

DROP FUNCTION Syntax
1
DROP FUNCTION <function name>
Parameters Description
<function name> Name of the UDF to remove

To drop the UDF_SOS_PY_PROC UDF:

DROP FUNCTION Example
1
DROP FUNCTION UDF_SOS_PY_PROC

SHOW FUNCTION

Outputs the configuration of an existing UDF/UDTF. Users with SYSTEM ADMIN permission will see the entire UDF configuration, while users with proc level permissions will only see the name of the UDF & its execution mode.

SHOW FUNCTION Syntax
1
SHOW FUNCTION < <function name> | * >
Parameters Description
<function name> Name of the UDF whose configuration will be output. Use * instead to output the DDL of all UDFs.

Note

The response to SHOW FUNCTION is a single-column result set with the DDL statement as the value in the DDL column.

To show the UDF_SOS_PY_PROC UDF:

SHOW FUNCTION Example
1
SHOW FUNCTION UDF_SOS_PY_PROC

DESCRIBE FUNCTION

Displays the attributes of the given UDF/UDTF in tabular format. Users with SYSTEM ADMIN permission or proc-level permissions on the function (or across all functions) will be allowed to see the function attributes.

DESCRIBE FUNCTION Syntax
1
DESCRIBE FUNCTION < <function name> | * >
Parameters Description
<function name> Name of the UDF whose attributes will be output. Use * instead to output the attributes of all UDFs.

Note

The response to DESCRIBE FUNCTION is a six-column result set:

  • FUNCTION_NAME - name of the UDF/UDTF
  • MODE - execution mode of the function; distributed or non-distributed
  • RUN_COMMAND - command used to run the function
  • RUN_COMMAND_ARGS - command arguments used to run the function
  • FILE_PATHS - list of files involved in the execution of the function
  • WITH_OPTIONS - list of key/value pairs of options used to create the function

To describe the UDF_SOS_PY_PROC UDF:

DROP FUNCTION Example
1
DESCRIBE FUNCTION UDF_SOS_PY_PROC