The information below covers everything needed to begin running Java UDFs. For more information on writing Java UDFs, see Java UDF API; for more information on simulating running UDFs, see UDF Simulator. Example Java UDFs can be found here.
Important
Though any of the native APIs can be used for running UDFs written in any UDF API language, all the examples below are written using the native Java API.
Calling the createProc() method will deploy the specified UDF to the Kinetica execution environment, to every server in the cluster. The method takes the following parameters:
| Parameter | Description |
|---|---|
| procName | A system-wide unique name for the UDF |
| executionMode | An execution mode; either distributed or nondistributed |
| files | A set of files composing the UDF package, including the names of the files and the binary data for those files; the files specified will be created on the target Kinetica servers (default is the …). Note: Uploading files using the … |
| command | The name of the command to run, which can be a file within the deployed UDF fileset, or any command able to be executed within the host environment, e.g., java. If a host environment command is specified, the host environment must be properly configured to support that command's execution |
| args | A list of command-line arguments to pass to the specified command, e.g., -cp <class path> <class name> or -jar <file name.jar> |
| options | Optional parameters for UDF creation; see createProc() for details |
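The two args forms listed in the table are ordinary Java lists of strings. The sketch below shows one way to assemble them; the UdfArgs class and its method names are hypothetical helpers, not part of the Kinetica API, and joining class-path entries with ':' assumes Linux hosts, as in the class path used in the example that follows.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the Kinetica API) that builds the
// command-line arguments handed to the "java" command by createProc()
final class UdfArgs
{
    // -cp <class path> <class name>: entries joined with ':' (assumes Linux hosts)
    static List<String> classPathArgs(List<String> classPathEntries, String className)
    {
        return Arrays.asList("-cp", String.join(":", classPathEntries), className);
    }

    // -jar <file name.jar>: the JAR's manifest must name the main class
    static List<String> jarArgs(String jarFileName)
    {
        return Arrays.asList("-jar", jarFileName);
    }

    public static void main(String[] args)
    {
        List<String> cp = classPathArgs(
            Arrays.asList(
                "/opt/gpudb/udf/api/java/proc-api/kinetica-proc-api-1.0-jar-with-dependencies.jar",
                "UdfTcJavaProc.jar"),
            "UdfTcJavaProc");
        System.out.println(cp);
        System.out.println(jarArgs("UdfTcJavaProc.jar"));
    }
}
```

The -cp form is the one the example below relies on, since it lets the UDF JAR and the Kinetica proc API JAR be listed separately.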
For example, to deploy a Java UDF using the native Java API, a local, compiled proc JAR (UdfTcJavaProc.jar) must be read in as bytes and then passed into the createProc() call inside the map files, as a value paired with its key (the source file name, UdfTcJavaProc.jar). The example below uploads a data file (rank_tom.csv) the same way. A Java class path argument (-cp /opt/gpudb/udf/api/java/proc-api/kinetica-proc-api-1.0-jar-with-dependencies.jar:UdfTcJavaProc.jar UdfTcJavaProc) is also provided.
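```java
import com.gpudb.GPUdb;
import com.gpudb.protocol.CreateProcResponse;
import java.io.File;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class CreateUdf
{
    static String PROC_NAME = "UdfTcJavaProc";
    static String CSV_FILE_NAME = "rank_tom.csv";
    static String JAR_HOME = "/opt/gpudb/udf/api/java/proc-api/";
    static String PROC_JAR_FILE = PROC_NAME + ".jar";
    // Kinetica proc API JAR (on the servers) plus the uploaded UDF JAR
    static String CLASS_PATH = JAR_HOME + "kinetica-proc-api-1.0-jar-with-dependencies.jar:" + PROC_JAR_FILE;

    public static void main(String[] args) throws Exception
    {
        GPUdb hDb = new GPUdb("http://localhost:9191");  // assumed URL; use your own host

        // Read each local file as bytes, keyed by the file name to create on the servers
        Map<String, ByteBuffer> filesMap = new HashMap<>();
        for (String fileName : Arrays.asList(CSV_FILE_NAME, PROC_JAR_FILE))
        {
            byte[] fileAsBytes = Files.readAllBytes(new File(fileName).toPath());
            filesMap.put(fileName, ByteBuffer.wrap(fileAsBytes));
        }

        System.out.println("Registering distributed proc...");
        CreateProcResponse createProcResponse = hDb.createProc(
            PROC_NAME,                                      // procName
            "distributed",                                  // executionMode
            filesMap,                                       // files
            "java",                                         // command
            Arrays.asList("-cp", CLASS_PATH, PROC_NAME),    // args
            null                                            // options
        );
        System.out.println("Proc created successfully:");
        System.out.println(createProcResponse);
    }
}
```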
The max_concurrency_per_node setting is available in the options map of the /create/proc endpoint. It defines a per-Kinetica-host concurrency limit for a UDF: no more than n OS processes (UDF instances) evaluating the UDF will be permitted to execute concurrently on a single Kinetica host. You may want to set a concurrency limit if you have limited resources (like GPUs) and want to avoid repeatedly exhausting them. This setting is particularly useful for distributed UDFs, but it also works for non-distributed UDFs.
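A minimal sketch of building the options map for createProc() with a concurrency limit. It assumes the option key is the literal string max_concurrency_per_node and that its value is passed as a string, as with other Kinetica option maps; UdfOptions and withMaxConcurrency are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of the Kinetica API) that builds the
// options map passed as the last argument of createProc()
final class UdfOptions
{
    static Map<String, String> withMaxConcurrency(int maxPerNode)
    {
        if (maxPerNode < 0)
            throw new IllegalArgumentException("limit must be >= 0 (0 means no limit)");
        Map<String, String> options = new HashMap<>();
        // Option values are strings; "0" (the default) means unlimited
        options.put("max_concurrency_per_node", Integer.toString(maxPerNode));
        return options;
    }

    public static void main(String[] args)
    {
        // Would replace the trailing null in a createProc() call, e.g.
        // hDb.createProc(PROC_NAME, "distributed", filesMap, "java", procArgs, options)
        Map<String, String> options = withMaxConcurrency(4);
        System.out.println(options);
    }
}
```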
Note
You can also set concurrency limits on the Edit Proc screen in the UDF section of GAdmin.
The default value for the setting is 0, which results in no limit. If you set the value to 4, at most 4 instances of the UDF will run concurrently on each host, and any further instances are queued. This holds true across all invocations of the proc: even if /execute/proc is called eight times, only 4 processes will be running on a host at any one time. As soon as one instance finishes processing, the next queued instance starts. This repeats, allowing only 4 instances of the UDF to run at a time, until all instances have completed or the UDF is killed.
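The queueing behavior described above can be modeled locally with a counting semaphore. This is a single-JVM simulation under stated assumptions, not Kinetica's implementation: threads stand in for /execute/proc invocations on one host, and a fair Semaphore with a fixed number of permits plays the role of max_concurrency_per_node. The class and method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Local simulation (not Kinetica code) of a per-host concurrency limit:
// at most `limit` simulated UDF instances run at once; the rest wait.
final class ConcurrencyLimitSim
{
    // Runs `invocations` simulated instances; returns the peak number running at once
    static int peakConcurrency(int invocations, int limit)
    {
        Semaphore permits = new Semaphore(limit, true);  // fair: FIFO queue of waiters
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(invocations);
        for (int i = 0; i < invocations; i++)
        {
            pool.submit(() -> {
                try
                {
                    permits.acquire();  // queued until an instance finishes
                    try
                    {
                        int now = running.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(50);  // stand-in for UDF work
                    }
                    finally
                    {
                        running.decrementAndGet();
                        permits.release();  // lets the next queued instance start
                    }
                }
                catch (InterruptedException e)
                {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try
        {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args)
    {
        // Eight invocations against a limit of 4, mirroring the text above
        System.out.println("Peak concurrent instances: " + peakConcurrency(8, 4));
    }
}
```

The peak never exceeds the permit count, no matter how many invocations are submitted, which is the guarantee the setting provides per host.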
Calling the executeProc() method will execute the specified UDF within the targeted Kinetica execution environment. The method takes the following parameters:
| Parameter | Description |
|---|---|
| procName | The system-wide unique name for the UDF |
| params | Set of string-to-string key/value paired parameters to pass to the UDF |
| binParams | Set of string-to-binary key/value paired parameters to pass to the UDF |
| inputTableNames | Input data table names, to be processed by the UDF |
| inputColumnNames | Mapping of input data table names to their respective column names, to be processed as input data by the UDF |
| outputTableNames | Output data table names, where processed data is to be appended |
| options | Optional parameters for UDF execution; see executeProc() for details |
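The key/value parameters and the column mapping in the table above are plain Java collections. The sketch below assumes the Java API types Map<String, String> for params, Map<String, ByteBuffer> for binParams, and Map<String, List<String>> for inputColumnNames; the parameter names ("threshold", "model") and column names ("id", "value") are hypothetical, since each UDF decides which keys it reads.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative construction of executeProc() arguments using only the JDK;
// all key and column names here are hypothetical
final class ExecuteProcArgs
{
    static Map<String, String> params()
    {
        Map<String, String> params = new HashMap<>();
        params.put("threshold", "0.5");  // values are always strings
        return params;
    }

    static Map<String, ByteBuffer> binParams()
    {
        Map<String, ByteBuffer> binParams = new HashMap<>();
        binParams.put("model", ByteBuffer.wrap("model-bytes".getBytes(StandardCharsets.UTF_8)));
        return binParams;
    }

    // Maps one input table to the columns the UDF should receive from it
    static Map<String, List<String>> inputColumnNames(String table, String... columns)
    {
        return Collections.singletonMap(table, Arrays.asList(columns));
    }

    public static void main(String[] args)
    {
        System.out.println(params());
        System.out.println(inputColumnNames("udf_tc_java_in_table", "id", "value"));
        // These would be passed positionally, e.g.
        // hDb.executeProc(PROC_NAME, params(), binParams(),
        //     Collections.singletonList(INPUT_TABLE),
        //     inputColumnNames(INPUT_TABLE, "id", "value"),
        //     Collections.singletonList(OUTPUT_TABLE), null);
    }
}
```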
The executeProc() call is asynchronous and returns immediately with a run_id, a string that can be used in subsequent checks of the execution status.
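Because executeProc() returns before the UDF finishes, callers typically poll the run's status. The sketch below is a generic polling loop under stated assumptions: RunPoller and waitForRun are hypothetical, the status lookup is abstracted as a Supplier<String> (in practice it might wrap a status query for the run_id, such as the /show/proc/status endpoint), and the terminal status strings "complete", "killed", and "error" are assumptions rather than a confirmed list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical polling helper (not part of the Kinetica API)
final class RunPoller
{
    // Assumed terminal statuses for a UDF run
    static final Set<String> TERMINAL =
        new HashSet<>(Arrays.asList("complete", "killed", "error"));

    // Checks the status up to maxPolls times, sleeping between checks,
    // and returns the last status observed
    static String waitForRun(Supplier<String> status, int maxPolls, long sleepMillis)
    {
        String current = status.get();
        for (int i = 1; i < maxPolls && !TERMINAL.contains(current); i++)
        {
            try
            {
                Thread.sleep(sleepMillis);
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                break;  // stop polling; report the last status seen
            }
            current = status.get();
        }
        return current;
    }

    public static void main(String[] args)
    {
        // Fake status sequence standing in for successive server responses
        Iterator<String> fake = Arrays.asList("running", "running", "complete").iterator();
        System.out.println(waitForRun(fake::next, 10, 10));  // prints "complete"
    }
}
```

Capping the number of polls keeps a stuck or long-running UDF from blocking the client indefinitely; the caller can decide what to do with a non-terminal result.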
For example, to execute a proc that's already been created (UdfTcJavaProc) using existing input (udf_tc_java_in_table) and output (udf_tc_java_out_table) tables:
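```java
import com.gpudb.GPUdb;
import com.gpudb.protocol.ExecuteProcResponse;
import java.util.Collections;

public class ExecuteUdf
{
    static String INPUT_TABLE = "udf_tc_java_in_table";
    static String OUTPUT_TABLE = "udf_tc_java_out_table";
    static String PROC_NAME = "UdfTcJavaProc";

    public static void main(String[] args) throws Exception
    {
        GPUdb hDb = new GPUdb("http://localhost:9191");  // assumed URL; use your own host

        System.out.println("Executing proc...");
        ExecuteProcResponse executeProcResponse = hDb.executeProc(
            PROC_NAME,
            null,                                       // params
            null,                                       // binParams
            Collections.singletonList(INPUT_TABLE),     // inputTableNames
            null,                                       // inputColumnNames
            Collections.singletonList(OUTPUT_TABLE),    // outputTableNames
            null                                        // options
        );
        // The call returns immediately; the UDF keeps running on the servers
        System.out.println("Proc execution started:");
        System.out.println(executeProcResponse);
        System.out.println("Check the system log or 'gpudb-proc.log' for execution information");
    }
}
```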
UDFs can be managed using either GAdmin or one of the methods below: