Running UDFs

Since UDFs provide the means for submitting and running arbitrary executable code on the server, they are disabled by default. When enabled, it is recommended that user authorization also be enabled, to ensure that UDFs are run only by trusted users. To this end, Kinetica only allows access to UDF functionality for users with administrative access. UDFs are also run under an OS user that is separate from the primary Kinetica user, to provide greater containment of UDF system access.

Prerequisites

  • Kinetica v5.4 or later

  • UDFs enabled in /opt/gpudb/core/etc/gpudb.conf:

    enable_procs = true
    
  • User authorization enabled (recommended); see the Security Configuration section for details

  • Kinetica UDF APIs; these come packaged with the default Kinetica installation, but can be downloaded separately when developing UDFs without a local Kinetica installation

Download the Kinetica APIs

The Kinetica UDF APIs are located in these directories, in a default Kinetica installation:

Default UDF API Locations

API      Directory
C++      /opt/gpudb/udf/api/cpp
Java     /opt/gpudb/udf/api/java
Python   /opt/gpudb/udf/api/python

If developing UDFs without a local Kinetica installation, the APIs can be downloaded from here:

UDF API Download Locations

API      GitHub Link
C++      https://github.com/kineticadb/kinetica-udf-api-cpp.git
Java     https://github.com/kineticadb/kinetica-udf-api-java.git
Python   https://github.com/kineticadb/kinetica-udf-api-python.git

After downloading, see the README.md in the newly created UDF API directory for further setup instructions.

Deployment

Calling the /create/proc endpoint deploys the specified UDF to the Kinetica execution environment on every server in the cluster. The endpoint takes the following parameters:

  • The system-wide unique name for the UDF
  • The set of files composing the UDF package, including the names of the files and the binary data for those files. The specified files will be created on the target Kinetica servers with the given data and filenames; if the filenames contain subdirectories, that structure will be replicated on the target servers.
  • The name of the command to run, which can be a file within the deployed UDF fileset, or any command able to be executed within the host environment. If a host environment command is specified (java, for instance), the host environment must be properly configured to support that command's execution.
  • A list of command-line arguments to pass to the specified command
  • An execution mode, either distributed or nondistributed
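The parameters above map onto a single call in the Python API. The following is a minimal sketch, assuming the `db` handle passed in is a `gpudb.GPUdb` connection whose `create_proc` method accepts the keyword arguments shown. The helper names `load_udf_files` and `deploy_udf` are hypothetical and not part of the Kinetica API.

```python
import os


def load_udf_files(root_dir, relative_paths):
    """Read each UDF file into a {relative_path: bytes} map.

    Subdirectories in the relative paths are kept in the keys, so the
    same directory structure can be replicated on the target servers.
    """
    files = {}
    for rel_path in relative_paths:
        with open(os.path.join(root_dir, rel_path), "rb") as f:
            files[rel_path] = f.read()
    return files


def deploy_udf(db, proc_name, root_dir, relative_paths, command, args,
               distributed=True):
    """Deploy a UDF through /create/proc using an existing connection.

    Assumption: `db` is a gpudb.GPUdb handle, and the keyword names
    below match its create_proc signature.
    """
    return db.create_proc(
        proc_name=proc_name,
        execution_mode="distributed" if distributed else "nondistributed",
        files=load_udf_files(root_dir, relative_paths),
        command=command,
        args=args,
        options={},
    )
```

For a Python UDF, a call might look like `deploy_udf(db, "example_udf", "udf", ["example_udf.py"], "python", ["example_udf.py"])`, where the UDF name and file names are placeholders and `python` must be available in the host environment.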

Note

For convenience, the native Python API can be used for deploying UDFs written in any language. All of the UDF examples given here will use the Python API for deployment.

Execution

Calling the /execute/proc endpoint will execute the specified UDF within the targeted Kinetica execution environment. The endpoint takes the following parameters:

  • The system-wide unique name for the UDF
  • Sets of string and/or binary key/value paired parameters to pass to the UDF
  • Input table and column names, identifying the data to be processed by the UDF
  • Output data table names, where processed data is to be appended
  • Options to cache the input data, and to reuse previously cached input data in subsequent /execute/proc invocations

The call is asynchronous and will return immediately with a run_id, which is a string that can be used in subsequent checks of the execution status.
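As an illustration of the asynchronous call, the sketch below starts a UDF and returns its run_id. It assumes `db` is a `gpudb.GPUdb` connection whose `execute_proc` method accepts the keyword arguments shown and returns a response containing a `run_id` entry. The helper name `run_udf` is hypothetical.

```python
def run_udf(db, proc_name, input_tables, output_tables,
            params=None, bin_params=None, input_columns=None):
    """Start a UDF through /execute/proc and return its run_id.

    Assumption: `db` is a gpudb.GPUdb handle, and the keyword names
    below match its execute_proc signature.
    """
    response = db.execute_proc(
        proc_name=proc_name,
        params=params or {},                      # string key/value pairs
        bin_params=bin_params or {},              # binary key/value pairs
        input_table_names=input_tables,
        input_column_names=input_columns or {},   # table name -> column list
        output_table_names=output_tables,
        options={},
    )
    # The call returns immediately; the UDF may still be running.
    return response["run_id"]
```

The returned run_id is only a handle for later status checks; it does not indicate that the UDF has finished.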

Note

For convenience, the native Python API can also be used for executing UDFs written in any language. All of the UDF examples given here will use the Python API for execution.

Management

The status of a running UDF can be checked, by run_id, using the /show/proc/status endpoint. It will return whether the UDF is still running, has completed, or has exited with an error, along with any processed results.
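A status check like this is typically wrapped in a polling loop. The sketch below is one way to do that, assuming `db` is a `gpudb.GPUdb` connection, that `show_proc_status` accepts `run_id` and `options`, and that its response holds an `overall_statuses` map keyed by run_id with the value `"running"` while the UDF is in progress. These response details are assumptions and should be checked against the API reference for the installed version. The helper name `wait_for_udf` is hypothetical.

```python
import time


def wait_for_udf(db, run_id, poll_seconds=1.0, timeout_seconds=60.0):
    """Poll /show/proc/status until the UDF is no longer running.

    Returns a (status, response) tuple. Assumption: the response has an
    "overall_statuses" map from run_id to a status string, and
    "running" is the only in-progress value.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = db.show_proc_status(run_id=run_id, options={})
        status = response["overall_statuses"][run_id]
        if status != "running":
            return status, response
        if time.monotonic() >= deadline:
            raise TimeoutError("UDF run %s still running after %.0f s"
                               % (run_id, timeout_seconds))
        time.sleep(poll_seconds)
```

The timeout only stops the polling; it does not terminate the UDF itself, which would require a separate /kill/proc call.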

Other UDF management endpoints include:

  • /show/proc - returns the parameter values used in creating the UDF
  • /has/proc - returns whether the given UDF exists
  • /kill/proc - terminates a running UDF by run_id
  • /delete/proc - removes the given UDF definition from the system; must be called before /create/proc when recreating an existing UDF
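The delete-before-create requirement can be handled with a small helper. This is a sketch only, assuming `db` is a `gpudb.GPUdb` connection, that `has_proc` and `delete_proc` accept `proc_name` and `options`, and that the `has_proc` response exposes a boolean `proc_exists` entry. The helper name `recreate_udf` and the `create_kwargs` parameter are hypothetical.

```python
def recreate_udf(db, proc_name, create_kwargs):
    """Delete a UDF if it already exists, then create it again.

    `create_kwargs` holds the remaining create_proc arguments, such as
    execution_mode, files, command, args and options.
    Assumption: the has_proc response contains a "proc_exists" flag.
    """
    exists = db.has_proc(proc_name=proc_name, options={})["proc_exists"]
    if exists:
        # /create/proc is expected to fail for an existing name,
        # so the old definition is removed first.
        db.delete_proc(proc_name=proc_name, options={})
    return db.create_proc(proc_name=proc_name, **create_kwargs)
```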

Note

The Kinetica Administration Application can be used to perform all of these actions via the Advanced Query Tool, under Query ‣ Advanced Query.