Running C++ UDFs

This page covers what you need to know to begin running C++ UDFs. For more information on writing C++ UDFs, see C++ UDF API; for more information on simulating UDF runs, see UDF Simulator. Example C++ UDFs can be found here.

Important

Though any of the native APIs can be used for running UDFs written in any UDF API language, all the examples below are written using the native Python API for convenience.

Deployment

Calling the createProc() method will deploy the specified UDF to the Kinetica execution environment on every server in the cluster. The method takes the following parameters:

Parameter Description
procName A system-wide unique name for the UDF
executionMode An execution mode; either distributed or nondistributed
files A set of files composing the UDF package, including the names of the files and the binary data for those files. The files specified will be created on the target Kinetica servers (default is the /opt/gpudb/procs/ directory) with the given data and filenames; if the filenames contain subdirectories, that structure will be copied to the target servers. Note: Uploading files using the files parameter should be reserved for smaller files; larger files should be uploaded to KiFS.

command The name of the command to run, which can be a file within the deployed UDF fileset (e.g., ./<file name>) or any command able to be executed within the host environment. If a host environment command is specified, the host environment must be properly configured to support that command's execution.
args A list of command-line arguments to pass to the specified command
options Optional parameters for UDF creation; see createProc() for details

For example, to deploy a C++ UDF using the native Python API, read the local, compiled proc executable (udf_tc_cpp_proc) in as bytes, then pass it to the createProc() call (create_proc() in Python) inside the files map, keyed by its file name (udf_tc_cpp_proc).

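# h_db (a database connection handle), file_name, and proc_name are
# assumed to have been defined earlier in the script

# Read proc code in as bytes and add to a file data map
files = {}
with open(file_name, 'rb') as file:
    files[file_name] = file.read()

# Register the proc; the command runs the uploaded executable
print('Registering proc...')
response = h_db.create_proc(
    proc_name=proc_name,
    execution_mode='distributed',
    files=files,
    command='./' + file_name,
    args=[],
    options={}
)
print(response)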
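
When a UDF package consists of more than one file, the files map can be built by walking a local directory. The following is a minimal sketch, not part of the Kinetica API: collect_udf_files() is a hypothetical helper name, and it assumes the relative paths (with forward slashes) are acceptable as file names, in line with the note above that subdirectories in file names are reproduced on the target servers.

```python
import os


def collect_udf_files(package_dir):
    """Return a {relative_path: bytes} map for every file under package_dir.

    Hypothetical helper: the result is intended to be passed as the
    `files` argument of create_proc().
    """
    files = {}
    for dirpath, _dirnames, filenames in os.walk(package_dir):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # Key each file by its path relative to the package root,
            # using forward slashes so the layout is the same on any OS
            rel_path = os.path.relpath(full_path, package_dir)
            rel_path = rel_path.replace(os.sep, '/')
            with open(full_path, 'rb') as f:
                files[rel_path] = f.read()
    return files
```

The returned map could then be passed as files=collect_udf_files('<package directory>') in the create_proc() call; larger packages are better uploaded to KiFS, as noted in the parameter table.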

Concurrency Limits

The max_concurrency_per_node setting is available in the options map of /create/proc. This option defines a per-Kinetica-host concurrency limit for a UDF, i.e., no more than n OS processes (UDF instances) in charge of evaluating the UDF will be permitted to execute concurrently on a single Kinetica host. You may want to set a concurrency limit if you have limited resources (like GPUs) and want to avoid repeatedly exhausting them. This setting is particularly useful for distributed UDFs, but it also works for non-distributed UDFs.

Note

You can also set concurrency limits on the Edit Proc screen in the UDF section of GAdmin.

The default value for the setting is 0, which results in no limit. If you set the value to 4, no more than 4 instances of the UDF will run concurrently on each host. This holds true across all invocations of the proc; even if /execute/proc is called eight times, only 4 processes will be running on a given host at any one time. Another instance will be started as soon as a running instance finishes processing. This repeats, allowing only 4 instances of the UDF to run at a time, until all instances have completed or the UDF is killed.
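
As an illustration, the limit can be supplied when the UDF is registered. The sketch below wraps the same create_proc() call used in the Deployment example; create_proc_with_limit() is a hypothetical helper name, and passing the option value as a string is an assumption that should be checked against the createProc() reference for your version.

```python
def create_proc_with_limit(h_db, proc_name, files, command, max_per_node):
    """Register a distributed UDF with a per-host concurrency limit.

    Hypothetical helper; h_db is an existing database connection handle.
    A max_per_node of 0 means no limit.
    """
    if max_per_node < 0:
        raise ValueError('max_per_node must be 0 (no limit) or positive')
    return h_db.create_proc(
        proc_name=proc_name,
        execution_mode='distributed',
        files=files,
        command=command,
        args=[],
        # Assumption: option values are passed as strings
        options={'max_concurrency_per_node': str(max_per_node)}
    )
```

For example, create_proc_with_limit(h_db, proc_name, files, './' + file_name, 4) would request that no more than 4 instances run at once on any single host.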

Execution

Calling the executeProc() method will execute the specified UDF within the targeted Kinetica execution environment. The method takes the following parameters:

Parameter Description
procName The system-wide unique name for the UDF
params Set of string-to-string key/value paired parameters to pass to the UDF
binParams Set of string-to-binary key/value paired parameters to pass to the UDF
inputTableNames Input data table names, to be processed by the UDF
inputColumnNames Mapping of input data table names to their respective column names, to be processed as input data by the UDF
outputTableNames Output data table names, where processed data is to be appended
options Optional parameters for UDF execution; see executeProc() for details

The call is asynchronous and will return immediately with a run_id, which is a string that can be used in subsequent checks of the execution status.

For example, to execute a proc that's already been created (udf_tc_cpp_proc) using existing input (udf_tc_cpp_in_table) and output (udf_tc_cpp_out_table) tables:

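# h_db, proc_name, INPUT_TABLE, and OUTPUT_TABLE are assumed to have
# been defined earlier in the script
print('Executing proc...')
response = h_db.execute_proc(
    proc_name=proc_name,
    params={},
    bin_params={},
    input_table_names=[INPUT_TABLE],
    input_column_names={},
    output_table_names=[OUTPUT_TABLE],
    options={}
)
# The response includes the run_id used for later status checks
print(response)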
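
Because the call returns before the UDF finishes, a caller will usually poll for completion using the returned run_id. The following is a rough sketch only: wait_for_proc() is a hypothetical helper, and the 'overall_statuses' response key and the 'running' status value are assumptions about the show_proc_status() response that should be verified against the API reference for your version.

```python
import time


def wait_for_proc(h_db, run_id, poll_seconds=1.0, timeout_seconds=60.0):
    """Poll the status of a UDF run until it is no longer running.

    Hypothetical helper; returns the final status string, or raises
    TimeoutError if the run is still in progress at the deadline.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = h_db.show_proc_status(run_id=run_id)
        # Assumption: overall status is reported per run_id under this key
        status = response['overall_statuses'][run_id]
        if status != 'running':
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError('UDF run %s is still running' % run_id)
        time.sleep(poll_seconds)
```

A typical use would be status = wait_for_proc(h_db, response['run_id']), after which the output table can be inspected if the run completed.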

Management

UDFs can be managed using either GAdmin or one of the methods below:

  • showProcStatus() -- returns whether the UDF is still running, has completed, or has exited with an error, along with any processed results
  • showProc() -- returns the parameter values used in creating the UDF
  • hasProc() -- returns whether the given UDF exists
  • killProc() -- terminates a running UDF by run_id
  • deleteProc() -- removes the given UDF definition from the system; needs to be called before createProc() when recreating a UDF
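
Since deleteProc() must be called before createProc() when recreating a UDF, redeployment is often wrapped in a small check-delete-create routine. This is a minimal sketch under stated assumptions: recreate_proc() is a hypothetical helper, and the 'proc_exists' key in the has_proc() response is an assumption to confirm against the hasProc() reference.

```python
def recreate_proc(h_db, proc_name, **create_kwargs):
    """Delete the UDF if it already exists, then create it again.

    Hypothetical helper; create_kwargs are passed through unchanged to
    create_proc() (execution_mode, files, command, args, options).
    """
    # Assumption: has_proc() reports existence under 'proc_exists'
    if h_db.has_proc(proc_name=proc_name)['proc_exists']:
        h_db.delete_proc(proc_name=proc_name)
    return h_db.create_proc(proc_name=proc_name, **create_kwargs)
```

Note that this sketch does not check for or kill runs of the UDF that are still in progress before deleting it.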