User Defined Functions (UDFs)

Kinetica provides support for User-Defined Function (UDF) creation and management in SQL. Unlike conventional UDFs, Kinetica UDFs are external programs that can be managed via SQL and may run in distributed fashion across the cluster.

A UDF that is defined to return a specific table type is known as a User-Defined Table Function (UDTF), which can be used within a SELECT statement to return the output of the function as a query result set.

Package management for Python UDFs is available via the UDF environment feature.

UDF/UDTF features accessible via SQL include:

For UDF/UDTF execute permission management, see:

UDF/UDTF environment management features accessible via SQL include:


CREATE FUNCTION

Creates a new UDF/UDTF with the given options. A UDTF is simply a UDF defined to return a single table, using the RETURNS TABLE clause.

CREATE FUNCTION Syntax
1
2
3
4
5
6
7
CREATE [OR REPLACE] FUNCTION <function name>
[RETURNS TABLE (<column definition list>)]
[MODE = '<execution mode>']
[RUN_COMMAND = '<command>']
[RUN_COMMAND_ARGS = '<command args>']
FILE PATHS '<file paths>'
[WITH OPTIONS (<option name> = '<option value>'[,...])]
Parameters Description
OR REPLACE Any existing UDF with the same name will be dropped before creating this one
<function name> Name of the UDF, which can be referenced in subsequent commands
RETURNS TABLE

Specifies that the UDF returns a table after execution and defines the column names and types to which the function's output data is expected to conform

Important

This defines the UDF as a User-Defined Table Function (UDTF), which allows the function to be called within a SQL query; see Executing Functions for details.

MODE

Name of the execution mode for the UDF; the following modes are available:

Constant Description
DISTRIBUTED The function's processing will be distributed among the nodes in the cluster
NONDISTRIBUTED The function's processing will be completed on a single node in the cluster
RUN_COMMAND The command used to execute the files associated with the UDF; e.g., python
RUN_COMMAND_ARGS

The arguments supplied alongside the given RUN_COMMAND; e.g.:

'udf.py','--username','jdoe','--password','jdoe123'
FILE PATHS A comma-separated list of single-quoted file paths from which the function and any additional files will be loaded; the file paths should be KiFS locations and exist prior to the UDF/UDTF creation.
WITH OPTIONS

Optional indicator that a comma-delimited list of option/value assignments will follow

Option Description
max_concurrency_per_node The maximum number of concurrent instances of the function that will be executed per node. Setting to 0 allows unlimited concurrency. The default value is 0.
set_environment The Python UDF environment to use for this Python UDF.

Before creating a UDF/UDTF, upload the function file, udf_st.py to a new directory, udf, in KiFS:

Create UDF Directory Example
1
CREATE DIRECTORY 'udf' WITH OPTIONS (no_error_if_exists = true);
UPLOAD UDF FILE Example
1
2
UPLOAD FILE 'udf_st.py'
INTO 'udf'

To create a distributed Python udf_st UDF that uses that file:

CREATE FUNCTION (UDF) Example
1
2
3
4
5
6
CREATE OR REPLACE FUNCTION udf_st
MODE = 'distributed'
RUN_COMMAND = 'python'
RUN_COMMAND_ARGS = 'udf/udf_st.py'
FILE PATHS 'kifs://udf/udf_st.py'
WITH OPTIONS (SET_ENVIRONMENT = 'udfe_st')

To create the same function as a UDTF that returns a table featuring three columns instead:

CREATE FUNCTION (UDTF) Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
CREATE OR REPLACE FUNCTION UDTF_ST
RETURNS TABLE
(
      id SMALLINT NOT NULL,
      sentence VARCHAR,
      score REAL
)
MODE = 'distributed'
RUN_COMMAND = 'python'
RUN_COMMAND_ARGS = 'udf/udf_st.py'
FILE PATHS 'kifs://udf/udf_st.py'
WITH OPTIONS (SET_ENVIRONMENT = 'udfe_st')

Executing Functions

An existing UDF/UDTF can be executed using the EXECUTE FUNCTION command. Additionally, a UDTF can be executed as a table function within a SELECT statement.

Execute UDF/UDTF Syntax
1
2
3
4
5
6
EXECUTE FUNCTION <function name>
(
   <function parameter 1> => <function value 1>,
   ...
   <function parameter N> => <function value N>
)
Execute UDTF Table Function Syntax
1
2
3
4
5
6
7
8
9
SELECT * FROM TABLE
(
   <function name>
   (
      <function parameter 1> => <function value 1>,
      ...
      <function parameter N> => <function value N>
   )
)

The <function name> should be the name of an existing UDF.

Tip

To preserve the runtime statistics of a UDF/UDTF, use the KI_HINT_SAVE_UDF_STATS hint in the call to it; see below for an example.


Function Parameters

The following are the available function parameters that may be passed to a UDF in an EXECUTE FUNCTION or table function call, along with their corresponding function values.

Parameters Description
INPUT_TABLE_NAMES

Optional set of "tables" that will be used as input to the UDF/UDTF, specified as a set of queries defining the source data for each "table". The value of this parameter is either the INPUT_TABLE or INPUT_TABLES function, depending on the number of input "tables".

Value Description
INPUT_TABLE(<query>)

If only one input table is needed, it can be passed by query or by name to the INPUT_TABLE function; e.g.:

INPUT_TABLE(customer)

INPUT_TABLE(SELECT id, name FROM customer)
INPUT_TABLES(<queries>)

If more than one input table is needed, each can be passed via query as a list to the INPUT_TABLES function; e.g.:

INPUT_TABLES
(
    (SELECT id, name FROM customer),
    (SELECT id, customer_id FROM order)
)
OUTPUT_TABLE_NAMES

Optional set of tables that will be used as output, specified as a set of names passed as a comma-delimited list of strings to the OUTPUT_TABLES function; e.g.:

OUTPUT_TABLES('sales_summary', 'customer_profile')
PARAMS

Optional list of parameters to pass to the function, specified as a set of key/value pairs passed as a comma-delimited list of <key> = '<value>' assignments to the KV_PAIRS function; e.g.:

KV_PAIRS(customer_id = '1', report_type = 'yearly')
OPTIONS

Optional list of function execution options, specified as a set of key/value pairs passed as a comma-delimited list of <key> = '<value>' assignments to the KV_PAIRS function; e.g.:

KV_PAIRS(run_tag = 'my-udf')

Examples

To execute the udf_st UDF, specifying an input and output table and the hint to save runtime statistics:

Execute UDF Example
1
2
3
4
5
6
EXECUTE FUNCTION udf_st /* KI_HINT_SAVE_UDF_STATS */
(
    INPUT_TABLE_NAMES => INPUT_TABLE(example.udf_st_in_table),
    OUTPUT_TABLE_NAMES => OUTPUT_TABLES('example.udf_st_out_table'),
    PARAMS => KV_PAIRS(sentence = 'I live in a neighborhood in a small town.')
)

To execute the UDTF_ST UDTF, specifying an input table and returning the output as the result set of a SELECT statement:

Execute UDTF Table Function Example
1
2
3
4
5
6
7
8
9
SELECT TOP 5 score, sentence FROM TABLE
(
    UDTF_ST
    (
        INPUT_TABLE_NAMES => INPUT_TABLE(example.udf_st_in_table),
        PARAMS => KV_PAIRS(sentence = 'I live in a neighborhood in a small town.')
    )
)
ORDER BY score DESC

DROP FUNCTION

Removes an existing UDF/UDTF.

DROP FUNCTION Syntax
1
DROP FUNCTION <function name>
Parameters Description
<function name> Name of the UDF to remove

To drop the UDF_SOS_PY_PROC UDF:

DROP FUNCTION Example
1
DROP FUNCTION UDF_SOS_PY_PROC

SHOW FUNCTION

Outputs the configuration of one or more existing UDFs/UDTFs. Users with SYSTEM ADMIN or ADMIN permission on the UDF (or all UDFs) will see the entire UDF configuration, while users with EXECUTE permissions will only see the name of the UDF & its execution mode.

SHOW FUNCTION Syntax
1
SHOW FUNCTION < <function name> | * >
Parameters Description
<function name> Name of the UDF whose configuration will be output. Use * instead to output the DDL of all UDFs.

Note

The response to SHOW FUNCTION is a single-column result set with the DDL statement as the value in the DDL column.

To show the UDF_SOS_PY_PROC UDF:

SHOW FUNCTION Example
1
SHOW FUNCTION UDF_SOS_PY_PROC

SHOW FUNCTION STATUS

Outputs the status of one or more running, or previously running, UDFs/UDTFs. Users with SYSTEM ADMIN or ADMIN permission on all UDFs will see all current & historical UDF runs, while users with EXECUTE or ADMIN permissions on individual UDFs will only see the current & historical runs for UDFs/UDTFs that they have executed themselves. Historical run data is preserved when a UDF/UDTF is executed with the KI_HINT_SAVE_UDF_STATS hint.

SHOW FUNCTION STATUS Syntax
1
SHOW FUNCTION STATUS [VERBOSE] [FOR '<run ID>']
Parameters Description
VERBOSE Specifies verbose output mode. This modifies the response as noted below.
FOR '<run ID>' Specifies that only the status of UDF/UDTF execution with run ID <run ID> should be output.

Note

The response to SHOW FUNCTION STATUS depends on whether the VERBOSE option was specified.

  • No VERBOSE - output is a three-column result set:
    • FUNCTION_NAME - name of the UDF/UDTF
    • RUN_ID - ID associated with a given execution of the UDF/UDTF
    • OVERALL_STATUS - completion status of UDF/UDTF
  • VERBOSE - output is a single-column, single-row result set with a list of all UDF/UDTF statuses, each as a formatted block of information in the FUNCTION_STATUSES column

To show the status of all UDF/UDTF runs:

SHOW FUNCTION STATUS Example
1
SHOW FUNCTION STATUS

To show the verbose status of a UDF/UDTF with run ID 1000:

SHOW FUNCTION STATUS By Run ID Example
1
SHOW FUNCTION STATUS VERBOSE FOR '1000'

DESCRIBE FUNCTION

Displays the attributes of the given UDF/UDTF in tabular format. Users with SYSTEM ADMIN or ADMIN permission on the UDF (or all UDFs) will see the entire UDF configuration, while users with EXECUTE permissions will only see the name of the UDF & its execution mode.

DESCRIBE FUNCTION Syntax
1
DESC[RIBE] FUNCTION < <function name> | * >
Parameters Description
<function name> Name of the UDF whose attributes will be output. Use * instead to output the attributes of all UDFs.

Note

The response to DESCRIBE FUNCTION is a six-column result set:

  • FUNCTION_NAME - name of the UDF/UDTF
  • MODE - execution mode of the function; distributed or nondistributed
  • RUN_COMMAND - command used to run the function
  • RUN_COMMAND_ARGS - command arguments used to run the function
  • FILE_PATHS - list of files involved in the execution of the function
  • WITH_OPTIONS - list of key/value pairs of options used to create the function

To describe the UDF_SOS_PY_PROC UDF:

DROP FUNCTION Example
1
DESCRIBE FUNCTION UDF_SOS_PY_PROC

CREATE FUNCTION ENVIRONMENT

Creates a new function environment for UDFs/UDTFs written using the Python UDF API, where any packages required by the Python UDF can be installed and used. Packages can be installed with the ALTER FUNCTION ENVIRONMENT command.

CREATE FUNCTION ENVIRONMENT Syntax
1
CREATE [OR REPLACE] FUNCTION ENVIRONMENT <environment name>

Note

UDFs running with no function environment specified run in the default function environment, kinetica-default-environment. See Pre-installed Libraries on the Cluster for the list of pre-installed Python packages.

Parameters Description
OR REPLACE Any existing UDF environment with the same name will be dropped before creating this one
<environment name> Name of the UDF environment, which can be referenced in subsequent commands

To create the Python UDF environment udfe_st:

CREATE FUNCTION ENVIRONMENT Example
1
CREATE OR REPLACE FUNCTION ENVIRONMENT udfe_st;

ALTER FUNCTION ENVIRONMENT

Manages the installed Python packages associated with the given Python UDF/UDTF environment. Packages are given as a space-separated list of names; and for installations, an optional version number per package can be specified.

ALTER FUNCTION ENVIRONMENT (Install Package) Syntax
1
2
ALTER FUNCTION ENVIRONMENT <environment name>
INSTALL PYTHON PACKAGE '<package name>[==<package version>] [...]>'
ALTER FUNCTION ENVIRONMENT (Uninstall Package) Syntax
1
2
ALTER FUNCTION ENVIRONMENT <environment name>
UNINSTALL PYTHON PACKAGE '<package name> [...]>'
Parameters Description
<environment name> Name of the UDF environment to manage
<package name> Name of the Python package to install into or uninstall from the UDF environment
<package version> Version number of the Python package to install

To install the sentence-transformers package and urllib3 version 1.26.18 into the Python UDF environment udfe_st:

Install Python Package Example
1
2
ALTER FUNCTION ENVIRONMENT udfe_st
INSTALL PACKAGE 'urllib3==1.26.18 sentence-transformers';

To uninstall the sentence-transformers & urllib3 packages from the Python UDF environment udfe_st:

Uninstall Python Package Example
1
2
ALTER FUNCTION ENVIRONMENT udfe_st
UNINSTALL PACKAGE 'sentence-transformers urllib3'

DROP FUNCTION ENVIRONMENT

Removes an existing Python UDF/UDTF environment.

DROP FUNCTION ENVIRONMENT Syntax
1
DROP FUNCTION ENVIRONMENT <environment name>
Parameters Description
<environment name> Name of the UDF environment to remove

To remove the Python UDF environment udfe_st:

DROP FUNCTION ENVIRONMENT Example
1
DROP FUNCTION ENVIRONMENT udfe_st

SHOW FUNCTION ENVIRONMENT

Outputs the CREATE FUNCTION ENVIRONMENT statement used to create the given Python UDF/UDTF environment(s).

SHOW FUNCTION ENVIRONMENT Syntax
1
SHOW FUNCTION ENVIRONMENT < <environment name> | * >

Note

The response to SHOW FUNCTION ENVIRONMENT is a single-column result set with the DDL statement as the value in the DDL column.

Parameters Description
<environment name> Name of the UDF environment whose CREATE statement will be output. Use * instead to output the CREATE statements of all UDF environments.

To show the Python UDF environment udfe_st:

SHOW FUNCTION ENVIRONMENT Example
1
SHOW FUNCTION ENVIRONMENT udfe_st

To show all Python UDF environments:

SHOW FUNCTION ENVIRONMENT (All Environments) Example
1
SHOW FUNCTION ENVIRONMENT *

DESCRIBE FUNCTION ENVIRONMENT

Displays the Python package listing for the given Python UDF/UDTF environment(s).

DESCRIBE FUNCTION ENVIRONMENT Syntax
1
DESC[RIBE] FUNCTION ENVIRONMENT < <environment name> | * >

Note

The response to DESCRIBE FUNCTION ENVIRONMENT is a two-column result set:

  • ENVIRONMENT_NAME - name of the function environment name
  • PACKAGES - names & versions of the Python packages installed in the function environment; this includes both user-installed packages and those installed by default
Parameters Description
<environment name> Name of the UDF environment whose Python packages will be output. Use * instead to output the Python packages of all UDF environments.

To list the packages in the Python UDF environment udfe_st:

DESC FUNCTION ENVIRONMENT Example
1
DESC FUNCTION ENVIRONMENT udfe_st

To list the packages in all Python UDF environments:

DESC FUNCTION ENVIRONMENT (All Environments) Example
1
DESC FUNCTION ENVIRONMENT *