Though any of the native APIs, SQL, or GAdmin can be used for deploying and executing UDFs written in any UDF API language, all the examples below are written using the native Python API for convenience.
Deployment
Calling the create_proc() method (CREATE FUNCTION in SQL) will deploy the specified UDF to the Kinetica execution environment on every server in the cluster. The method takes the following parameters:
proc_name
A system-wide unique name for the UDF
execution_mode
An execution mode; either distributed or nondistributed
files
A set of files composing the UDF package, including the names of the files and the binary data for those files. The files specified will be created on the target Kinetica servers in the UDF directory, with the given data and filenames; if the filenames contain subdirectories, that structure will be copied to the target servers.
Files in KiFS can also be used here. To use a KiFS file, pass the KiFS URI as the name of the file and set the binary data portion to b''. In this case, the KiFS files will be copied into the UDF execution environment under their respective UDF directories for use when execute_proc() is called. A file uploaded to KiFS and referenced this way in the files mapping will be made available to the UDF and needs to be referenced in the args listing by its KiFS directory and file name.
Uploading files using the files parameter should be reserved for smaller files; larger files should be uploaded to KiFS and referenced there instead.
command
The name of the command to run, which can be a file within the deployed UDF fileset, or any command able to be executed within the host environment, e.g., python. If a host environment command is specified, the host environment must be properly configured to support that command's execution.
args
A list of command-line arguments to pass to the specified command; e.g., ./<file name>.py
Any files referenced here that were uploaded to KiFS will need to be prefixed with their corresponding KiFS directory names; i.e., <kifs dir>/<file name>.<ext>
options
Optional parameters for UDF creation, including the function environment to use for the Python UDF.
See create_proc() for more details.
Creating a UDF - Direct File Passing
To deploy a C++ UDF using the native Python API, a local, compiled proc executable (udf_tc_proc) can be read in as bytes and then passed into the
create_proc() call as a files map, with the key as the name of the file
and the value as the byte array.
Create UDF Example - Filename Constants
Create UDF Example - File Map Loading
Create UDF Example - create_proc() Call
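A minimal sketch of the direct file passing steps described above. It assumes that db is a connected gpudb.GPUdb handle (or any object exposing a compatible create_proc() method) and that create_proc() accepts the keyword arguments listed under Deployment. The helper names load_files_map and deploy_udf are hypothetical and not part of the Kinetica API.

```python
from pathlib import Path


def load_files_map(paths):
    """Map each file's base name to its raw bytes, as the files parameter expects."""
    return {Path(p).name: Path(p).read_bytes() for p in paths}


def deploy_udf(db, proc_name, paths, command, args=(), mode="distributed"):
    """Deploy a UDF by passing the file contents directly to create_proc().

    Assumption: db behaves like a connected gpudb.GPUdb handle.
    """
    return db.create_proc(
        proc_name=proc_name,
        execution_mode=mode,
        files=load_files_map(paths),  # file name -> byte array
        command=command,
        args=list(args),
        options={},
    )
```

For the compiled executable discussed here, a call might look like deploy_udf(db, "udf_tc_proc", ["udf_tc_proc"], "./udf_tc_proc"). This sketch keys each file by its base name only, so a package that relies on subdirectories would need to build the map with relative paths instead.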
Creating a UDF - Uploading to KiFS
To deploy a C++ UDF using the native Python API, a local, compiled proc executable (udf_tc_proc) can be uploaded to a KiFS directory and then
referenced in the create_proc() call, with the key as the KiFS path to the
file and the value as an empty byte array, b''.
Create UDF Example - KiFS File Uploading
Create UDF Example - KiFS create_proc() Call
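A minimal sketch of the KiFS variant, covering only the create_proc() call. It assumes the executable has already been uploaded to KiFS and that db is a connected gpudb.GPUdb handle (or any object with a compatible create_proc() method). The helper names and the URI shown in the usage note are hypothetical.

```python
def kifs_files_map(kifs_uris):
    """Reference files that already live in KiFS.

    Each key is the KiFS URI of a file; each value is an empty byte array,
    which signals that the content should be taken from KiFS.
    """
    return {uri: b"" for uri in kifs_uris}


def deploy_udf_from_kifs(db, proc_name, kifs_uris, command, args=(),
                         mode="distributed"):
    """Deploy a UDF whose files are referenced in KiFS rather than passed inline.

    Assumption: db behaves like a connected gpudb.GPUdb handle.
    """
    return db.create_proc(
        proc_name=proc_name,
        execution_mode=mode,
        files=kifs_files_map(kifs_uris),  # KiFS URI -> b''
        command=command,
        args=list(args),
        options={},
    )
```

With a hypothetical URI such as kifs://udf/udf_tc_proc, the files map would be {"kifs://udf/udf_tc_proc": b""}. Any file named in args must then carry its KiFS directory prefix, as noted in the args description.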
Concurrency Limits
The max_concurrency_per_node setting is available in the options map of
the /create/proc endpoint. This option defines a per-Kinetica-host
concurrency limit for a UDF; i.e., no more than n OS processes (UDF
instances) evaluating the UDF will be permitted to execute
concurrently on a single Kinetica host. You may want to set a concurrency
limit if you have limited resources (like GPUs) and want to avoid
continually exhausting them. This setting is particularly useful for
distributed UDFs, but it also works for non-distributed UDFs.
You can also set concurrency limits on the Edit Proc screen in the UDF section of GAdmin.
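As an illustration, the options map for create_proc() could be built as follows. This is a sketch under the assumption that option values are passed as strings, as is usual for Kinetica option maps; the helper name concurrency_options is hypothetical.

```python
def concurrency_options(max_per_node, extra=None):
    """Build a create_proc() options map with a per-host concurrency limit.

    Assumption: the limit is passed as a string value under the
    max_concurrency_per_node key.
    """
    if int(max_per_node) < 1:
        raise ValueError("max_per_node must be a positive integer")
    options = dict(extra or {})  # copy, so the caller's map is not modified
    options["max_concurrency_per_node"] = str(max_per_node)
    return options
```

The resulting map, e.g. concurrency_options(2), would be passed as the options argument of create_proc() to allow at most two instances of the UDF per host.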
Execution
Calling the execute_proc() method (see Executing Functions in SQL) will execute the specified UDF within the targeted Kinetica execution environment. The method takes the following parameters:
proc_name
The system-wide unique name for the UDF
params
Set of string-to-string key/value paired parameters to pass to the UDF
bin_params
Set of string-to-binary key/value paired parameters to pass to the UDF
input_table_names
Input data table names, to be processed by the UDF
input_column_names
Mapping of input data table names to their respective column names, to be processed as input data by the UDF
output_table_names
Output data table names, where processed data is to be appended
options
Optional parameters for UDF execution; see execute_proc() for details
The call returns a run_id, which is
a string that can be used in subsequent checks of the execution status.
For example, to execute a proc that’s already been created (udf_tc_proc)
using existing input (udf_tc_in_table) and output (udf_tc_out_table)
tables:
Execute UDF Example - Parameter Constants
Execute UDF Example - execute_proc() Call
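A minimal sketch of such a call, using the proc and table names from the text. It assumes that db is a connected gpudb.GPUdb handle (or any object exposing a compatible execute_proc() method) and that the response can be indexed by "run_id". The helper name run_udf is hypothetical.

```python
PROC_NAME = "udf_tc_proc"
INPUT_TABLE = "udf_tc_in_table"
OUTPUT_TABLE = "udf_tc_out_table"


def run_udf(db, proc_name=PROC_NAME, input_table=INPUT_TABLE,
            output_table=OUTPUT_TABLE):
    """Start a UDF run and return its run_id for later status checks.

    Assumption: db behaves like a connected gpudb.GPUdb handle.
    """
    response = db.execute_proc(
        proc_name=proc_name,
        params={},                # no string parameters in this sketch
        bin_params={},            # no binary parameters in this sketch
        input_table_names=[input_table],
        input_column_names={},    # empty map: no per-table column selection
        output_table_names=[output_table],
        options={},
    )
    return response["run_id"]
```

The returned run_id can then be handed to a status check such as show_proc_status().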
Management
UDFs can be managed using SQL, GAdmin, or through one of the native API calls:

| Native API | SQL Command | Description |
|---|---|---|
| delete_proc() | DROP FUNCTION | Removes the given UDF definition from the system; needs to be called before create_proc() when recreating a UDF |
| has_proc() | | Returns whether the given UDF exists |
| kill_proc() | | Terminates a running UDF (or UDFs) |
| show_proc() | SHOW FUNCTION | Returns the parameter values used in creating the UDF |
| show_proc_status() | SHOW FUNCTION STATUS | Returns whether the UDF (or UDFs) is still running, has completed, or has exited with an error, along with any processed results |
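Because delete_proc() needs to be called before create_proc() when recreating a UDF, a redeployment script might guard the delete with has_proc(). The sketch below assumes that db is a connected gpudb.GPUdb handle and that the has_proc() response exposes a proc_exists flag; the helper name drop_udf_if_exists is hypothetical.

```python
def drop_udf_if_exists(db, proc_name):
    """Delete an existing UDF definition so that create_proc() can recreate it.

    Returns True if a UDF was deleted, False if none existed.
    Assumption: has_proc() returns a mapping with a boolean "proc_exists" entry.
    """
    if db.has_proc(proc_name)["proc_exists"]:
        db.delete_proc(proc_name)
        return True
    return False
```

Calling drop_udf_if_exists(db, "udf_tc_proc") before create_proc() keeps a deployment script safe to rerun.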