User Defined Functions (UDFs)

Kinetica provides support for User-Defined Function (UDF) creation and management in SQL. Unlike conventional UDFs, Kinetica UDFs are external programs that can be managed via SQL and may run in distributed fashion across the cluster.

A UDF that is defined to return a specific table type is known as a User-Defined Table Function, which can be used within a SELECT statement to return the output of the function as a query result set.

UDF/UDTF features accessible via SQL include:

For UDF/UDTF execute permission management, see:


CREATE FUNCTION

Creates a new UDF/UDTF with the given options. A UDTF is simply a UDF defined to return a single table, using the RETURNS TABLE clause. Only users with the SYSTEM ADMIN permission can create UDFs & UDTFs.

CREATE FUNCTION Syntax
1
2
3
4
5
6
7
CREATE [OR REPLACE] FUNCTION <function name>
[RETURNS TABLE (<column definition list>)]
[MODE = '<execution mode>']
[RUN_COMMAND = '<command>']
[RUN_COMMAND_ARGS = '<command args>']
FILE PATHS '<file paths>'
[WITH OPTIONS (<option name> = '<option value>'[,...])]
ParametersDescription
OR REPLACEAny existing UDF with the same name will be dropped before creating this one
<function name>Name of the UDF, which can be referenced in subsequent commands
RETURNS TABLE

Specifies that the UDF returns a table after execution and defines the column names and types to which the function's output data is expected to conform

Important

This defines the UDF as a User-Defined Table Function (UDTF), which allows the function to be called within a SQL query; see Executing Functions for details.

MODE

Name of the execution mode for the UDF; the following modes are available:

ConstantDescription
DISTRIBUTEDThe function's processing will be distributed amongst the nodes in the cluster
NON-DISTRIBUTEDThe function's processing will be completed on a single node in the cluster
RUN_COMMANDThe command used to execute the files associated with the UDF; e.g., python, java
RUN_COMMAND_ARGSThe arguments supplied alongside the given RUN_COMMAND; e.g., -jar udf.jar, --username jdoe --password 'Password123!'
FILE PATHS

A comma-separated list of single-quoted file paths from which the function and any additional files will be loaded; the file paths can refer to one of two locations:

  • OS-level paths under the external files directory that are either:
    • accessible to all nodes in the Kinetica cluster
    • mirrored across each node in the Kinetica cluster
  • KiFS paths

The files must exist prior to the UDF/UDTF creation.

WITH OPTIONS

Optional indicator that a comma-delimited list of option/value assignments will follow

OptionDescription
max_concurrency_per_nodeThe maximum number of concurrent instances of the function that will be executed per node. Setting to 0 allows unlimited concurrency. The default value is 0.

For example, to create a distributed Python UDF_SOS_PY_PROC UDF that returns a table featuring two columns (noting that the udf_sos_py_proc.py file was uploaded prior to running the example):

CREATE FUNCTION (UDF/local) Example
1
2
3
4
5
CREATE OR REPLACE FUNCTION UDF_SOS_PY_PROC
MODE = 'distributed'
RUN_COMMAND = 'python'
RUN_COMMAND_ARGS = '/opt/gpudb/persist/data/udf_sos_py_proc.py'
FILE PATHS '/opt/gpudb/persist/data/udf_sos_py_proc.py'

To create the same function as a UDTF with files loaded from KiFS instead:

CREATE FUNCTION (UDTF/KiFS) Example
1
2
3
4
5
6
CREATE OR REPLACE FUNCTION UDTF_SOS_PY_PROC
RETURNS TABLE (id SMALLINT NOT NULL, y FLOAT)
MODE = 'distributed'
RUN_COMMAND = 'python'
RUN_COMMAND_ARGS = 'udf/udf_sos_py_proc.py'
FILE PATHS 'kifs://udf/udf_sos_py_proc.py'

Executing Functions

An existing UDF/UDTF can be executed using the EXECUTE FUNCTION command. Additionally, a UDTF can be executed as a table function within a SELECT statement. Any user with the SYSTEM ADMIN permission or the EXECUTE FUNCTION permission on a specific UDF/UDTF (or across all UDFs/UDTFs) is allowed to execute it.

Execute UDF/UDTF Syntax
1
2
3
4
5
6
EXECUTE FUNCTION <function name>
(
   <function parameter 1> => <function value 1>,
   ...
   <function parameter N> => <function value N>
)
Execute UDTF Table Function Syntax
1
2
3
4
5
6
7
8
9
SELECT * FROM TABLE
(
   <function name>
   (
      <function parameter 1> => <function value 1>,
      ...
      <function parameter N> => <function value N>
   )
)

The <function name> should be the name of an existing UDF.

Function Parameters

The following are the available function parameters that may be passed to a UDF in an EXECUTE FUNCTION or table function call, along with their corresponding function values.

ParametersDescription
INPUT_TABLE_NAMES

Optional set of "tables" that will be used as input to the UDF/UDTF, specified as a set of queries defining the source data for each "table". The value of this parameter is either the INPUT_TABLE or INPUT_TABLES function, depending on the number of input "tables".

ValueDescription
INPUT_TABLE(<query>)

If only one input table is needed, it can be passed by query or by name to the INPUT_TABLE function; e.g.:

INPUT_TABLE(customer)

INPUT_TABLE(SELECT id, name FROM customer)
INPUT_TABLES(<queries>)

If more than one input table is needed, each can be passed via query as a list to the INPUT_TABLES function; e.g.:

INPUT_TABLES
(
    (SELECT id, name FROM customer),
    (SELECT id, customer_id FROM order)
)
OUTPUT_TABLE_NAMES

Optional set of tables that will be used as output, specified as a set of names passed as a comma-delimited list of strings to the OUTPUT_TABLES function; e.g.:

OUTPUT_TABLES('sales_summary', 'customer_profile')
PARAMS

Optional list of parameters to pass to the function, specified as a set of key/value pairs passed as a comma-delimited list of <key> = '<value>' assignments to the KV_PAIRS function; e.g.:

KV_PAIRS(customer_id = '1', report_type = 'yearly')
OPTIONS

Optional list of function execution options, specified as a set of key/value pairs passed as a comma-delimited list of <key> = '<value>' assignments to the KV_PAIRS function; e.g.:

KV_PAIRS(max_concurrency_per_node = '2')

For example, to execute the UDF_SOS_PY_PROC UDF, specifying an input and output table:

Execute UDF/UDTF Example
1
2
3
4
5
EXECUTE FUNCTION UDF_SOS_PY_PROC
(
    INPUT_TABLE_NAMES => INPUT_TABLE(SELECT id, x1, x2 FROM example.udf_sos_in_table),
    OUTPUT_TABLE_NAMES => OUTPUT_TABLES('example.udf_sos_out_table')
)

To execute the UDTF_SOS_PY_PROC UDTF, specifying an input table and returning the output as the result set of a SELECT statement:

Execute UDTF Table Function Example
1
2
3
4
5
6
7
SELECT * FROM TABLE
(
    UDTF_SOS_PY_PROC
    (
        INPUT_TABLE_NAMES => INPUT_TABLE(SELECT id, x1, x2 FROM example.udf_sos_in_table)
    )
)

DROP FUNCTION

Removes an existing UDF/UDTF. Only users with the SYSTEM ADMIN permission can remove UDFs.

DROP FUNCTION Syntax
1
DROP FUNCTION <function name>
ParametersDescription
<function name>Name of the UDF to remove

To drop the UDF_SOS_PY_PROC UDF:

DROP FUNCTION Example
1
DESCRIBE FUNCTION UDF_SOS_PY_PROC

SHOW FUNCTION

Outputs the configuration of an existing UDF/UDTF. Users with SYSTEM ADMIN permission will see the entire UDF configuration, while users with proc level permissions will only see the name of the UDF & its execution mode.

SHOW FUNCTION Syntax
1
SHOW FUNCTION < <function name> | * >
ParametersDescription
<function name>Name of the UDF whose configuration will be output. Use * instead to output the DDL of all credentials.

Note

The response to SHOW FUNCTION is a single-column result set with the DDL statement as the value in the DDL column.

To show the UDF_SOS_PY_PROC UDF:

SHOW FUNCTION Example
1
SHOW FUNCTION UDF_SOS_PY_PROC

DESCRIBE FUNCTION

Displays the attributes of the given UDF/UDTF in tabular format. Users with SYSTEM ADMIN permission or proc-level permissions on the function (or across all functions) will be allowed to see the function attributes.

DESCRIBE FUNCTION Syntax
1
DESCRIBE FUNCTION <function name>
ParametersDescription
<function name>Name of the UDF whose attributes will be output

Note

The response to DESCRIBE FUNCTION is a six-column result set:

  • FUNCTION_NAME - name of the UDF/UDTF
  • MODE - execution mode of the function; distributed or non-distributed
  • RUN_COMMAND - command used to run the function
  • RUN_COMMAND_ARGS - command arguments used to run the function
  • FILE_PATHS - list of files involved in the execution of the function
  • WITH_OPTIONS - list of key/value pairs of options used to create the function

To describe the UDF_SOS_PY_PROC UDF:

DROP FUNCTION Example
1
DESCRIBE FUNCTION UDF_SOS_PY_PROC