The UDF simulator simulates the mechanics of the /execute/proc call without the UDF having to be created in the database. The simulator reads data out of tables, writes it to files in the correct format, and provides the environment variable that must be set for the UDF API. After the UDF code is run, the simulator can optionally read any output that was written and write it back into Kinetica. The simulator does not run the UDF code itself; it only manages the environment, so the developer can run the UDF code from a debugger, a Jupyter notebook, etc., as long as the environment variable is set correctly.
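Because the simulator hands everything to the UDF through the KINETICA_PCF environment variable, a UDF under test can fail fast with a clear message when that variable is missing. A minimal sketch, assuming the variable holds a path that exists on disk (as in the export commands shown in this section); require_pcf is a hypothetical helper name, not part of the UDF API.

```python
import os


def require_pcf() -> str:
    """Return the path in KINETICA_PCF, raising if the simulator environment is not set up.

    Hypothetical helper: it only checks that the variable is set and that the
    path it names exists; it does not validate the file's contents.
    """
    path = os.environ.get("KINETICA_PCF")
    if not path:
        raise RuntimeError(
            "KINETICA_PCF is not set: run 'python udfsim.py execute ...' "
            "and then the export command it prints"
        )
    if not os.path.exists(path):
        raise FileNotFoundError(f"KINETICA_PCF points to a missing path: {path}")
    return path
```

Calling require_pcf() at the top of a UDF script turns a confusing failure deep inside the UDF API into an immediate, explicit error.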
Important
The UDF simulator is invoked via a Python script packaged with the database: /opt/gpudb/api/python/examples/udfsim.py. Though the script is distributed with the Python API, you can use any type of UDF with the simulator (Java, C++, etc.).
Mode | Description |
---|---|
execute <execute parameters> | Simulates proc execution |
output <output parameters> | Processes proc output |
clean | Cleans up files generated from execute or output. Takes no parameters |
Execute parameters:

Parameter/Flag | Type Restriction | Description |
---|---|---|
-f </path/to/file>, --path </path/to/file> | N/A | User-specified directory in which to create the control files. Default directory is /opt/gpudb/api/python/ |
-p <param_name value>, --param <param_name value> | N/A | Proc execution parameter(s) (see /execute/proc for more information) |
-d, --distributed | N/A | Enable distributed UDF simulation. This is the default execution mode |
-i <table_name [column_name, ...]>, --input <table_name [column_name, ...]> | Distributed only | Input table and optional column list |
-o <table_name>, --output <table_name> | Distributed only | Output table |
-n, --nondistributed | N/A | Enable non-distributed UDF simulation |
-K <url>, --url <url> | N/A | Kinetica URL. Default is http://localhost:9191 |
-U <username>, --username <username> | N/A | Kinetica username for authentication |
-P <password>, --password <password> | N/A | Kinetica password for authentication |
-h, --help | N/A | Prints the help menu |
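When the simulator is driven from a test script rather than typed by hand, building the execute command line in one place keeps the "Distributed only" restriction from the table above enforced. A minimal sketch; build_execute_args is a hypothetical name, and it assumes that -p takes the parameter name and value as two separate arguments and is repeated once per parameter, and that -i takes the table name followed by any column names. Neither arity is confirmed here, so check the simulator's -h output.

```python
import sys
from typing import Dict, List, Optional, Sequence

# Path from the Important note; adjust if your installation differs.
UDFSIM = "/opt/gpudb/api/python/examples/udfsim.py"


def build_execute_args(
    control_dir: Optional[str] = None,
    params: Optional[Dict[str, str]] = None,
    distributed: bool = True,
    input_table: Optional[str] = None,
    input_columns: Sequence[str] = (),
    output_table: Optional[str] = None,
    url: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for 'udfsim.py execute' from the documented flags."""
    if not distributed and (input_table or output_table):
        # -i/--input and -o/--output are restricted to distributed simulation
        raise ValueError("-i/--input and -o/--output apply to distributed simulation only")
    args = [sys.executable, UDFSIM, "execute"]
    if control_dir:
        args += ["-f", control_dir]
    for name, value in (params or {}).items():
        args += ["-p", name, value]  # assumed: one -p per parameter
    args.append("-d" if distributed else "-n")
    if input_table:
        args += ["-i", input_table, *input_columns]  # assumed: columns follow the table name
    if output_table:
        args += ["-o", output_table]
    if url:
        args += ["-K", url]
    if username:
        args += ["-U", username]
    if password:
        args += ["-P", password]
    return args
```

Passing the result to subprocess.run(..., check=True) runs the simulator. Called with control_dir="/tmp/data/udf-sim-test/", input_table="udf_tc_py_in_table", and output_table="udf_tc_py_out_table", it produces the same flags as the table copy example in this section.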
Output parameters:

Parameter/Flag | Description |
---|---|
-d, --dry-run | Display output only; no output written to Kinetica |
-K <url>, --url <url> | Kinetica URL. Default is http://localhost:9191 |
-U <username>, --username <username> | Kinetica username for authentication |
-P <password>, --password <password> | Kinetica password for authentication |
-h, --help | Prints the help menu |
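Note that -d means --dry-run in output mode but --distributed in execute mode. A minimal sketch that sidesteps the ambiguity by emitting only the long flag names from the output parameters table; build_output_args is a hypothetical name and the script path is taken from the Important note.

```python
import sys
from typing import List, Optional

# Path from the Important note; adjust if your installation differs.
UDFSIM = "/opt/gpudb/api/python/examples/udfsim.py"


def build_output_args(
    dry_run: bool = False,
    url: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for 'udfsim.py output' using the long flag names."""
    args = [sys.executable, UDFSIM, "output"]
    if dry_run:
        # Long form avoids confusion: here -d is --dry-run, in execute mode it is --distributed.
        args.append("--dry-run")
    if url:
        args += ["--url", url]
    if username:
        args += ["--username", username]
    if password:
        args += ["--password", password]
    return args
```

Running the result with KINETICA_PCF set in the environment, for example subprocess.run(build_output_args(dry_run=True), check=True), previews the output without writing anything to Kinetica.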
1. Run the simulator with the execute argument and any parameters. When it finishes, it prints an export command that sets the environment variable needed for the UDF API:
python udfsim.py execute <execute parameters>
2. Run the printed export command from the command line:
export KINETICA_PCF=/opt/gpudb/api/kinetica-api-python/gpudb/kinetica-udf-sim-icf-Wvbdqi
3. Execute the UDF. For example, to execute a Python UDF script:
python udf_cublas_proc.py
Tip
You can execute the UDF by any method (a Jupyter notebook, a debugger, etc.) as long as the environment variable from step 2 is set. The UDF can be executed multiple times without rerunning steps 1 and 2, as long as it hasn't output any data. For iterative testing, it may be useful to comment out any data output code and use print statements instead.
4. Optionally, run the simulator with the output argument and any parameters to write the UDF's output data into the database:
python udfsim.py output <output parameters>
Note
This mode requires the environment variable from step 2 to be set.
5. Optionally, run the simulator with the clean argument to clean up the files generated by the execute and output modes (steps 1 and 4). The files can also be deleted manually if desired:
python udfsim.py clean
Note
This mode requires the environment variable from step 2 to be set.
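Steps 1 through 3 can also be scripted, which helps when iterating on a UDF from a test harness. A minimal sketch, assuming the execute mode prints the export command to standard output on its own line in the format shown in step 2; parse_pcf_export and run_udf are hypothetical helper names. The UDF command is passed as a list, so the same helper works for Python, Java, or C++ UDFs.

```python
import os
import re
import subprocess
from typing import Optional, Sequence

# Matches the line printed by 'udfsim.py execute', for example:
#   export KINETICA_PCF=/tmp/data/udf-sim-test/kinetica-udf-sim-icf-k2cncp
_EXPORT_RE = re.compile(r"^export\s+KINETICA_PCF=(\S+)\s*$", re.MULTILINE)


def parse_pcf_export(stdout: str) -> Optional[str]:
    """Step 1: extract the KINETICA_PCF value from the simulator's printed export command."""
    match = _EXPORT_RE.search(stdout)
    return match.group(1) if match else None


def run_udf(command: Sequence[str], pcf_path: str) -> subprocess.CompletedProcess:
    """Steps 2 and 3: run a UDF command with KINETICA_PCF set.

    The variable is set only in the child's environment, so the caller's
    environment is left unchanged.
    """
    env = {**os.environ, "KINETICA_PCF": pcf_path}
    return subprocess.run(
        list(command),
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the UDF exits non-zero
    )
```

For example, capture step 1 with subprocess.run([...], capture_output=True, text=True), pass its stdout to parse_pcf_export, then call run_udf(["python", "udf_cublas_proc.py"], pcf). When the UDF code runs inside an already-running Jupyter kernel instead of a child process, an export in a separate shell does not reach the kernel; set os.environ["KINETICA_PCF"] = pcf in the kernel before the UDF code initializes the UDF API (assumed to read the variable at that point).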
Running the UDF simulator for the Python table copy proc with the following parameters/flags:
- distributed mode (-d)
- the udf_tc_py_in_table table as input and the udf_tc_py_out_table table as output (both tables created using the Python table copy init script)
- /tmp/data/udf-sim-test/ as the directory for the control files (-f)
- a dry run (-d) in output mode
$ python /opt/gpudb/api/python/examples/udfsim.py execute -f /tmp/data/udf-sim-test/ -d -i udf_tc_py_in_table -o udf_tc_py_out_table
export KINETICA_PCF=/tmp/data/udf-sim-test/kinetica-udf-sim-icf-k2cncp
$ python /opt/gpudb/udf/api/python/udf_tc_py_proc.py
$ python examples/udfsim.py output -d
No results
Output:
udf_tc_py_out_table: 10000 records
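The dry run above reports a record count for each output table, which a test script can check programmatically, for instance to confirm the table copy wrote as many records as it read. A minimal sketch, assuming every count line has the form table_name: N records, as in the single line shown above; parse_record_counts is a hypothetical name.

```python
import re
from typing import Dict

# Matches report lines such as "udf_tc_py_out_table: 10000 records".
# [ \t] (not \s) keeps a match from spanning line breaks.
_COUNT_RE = re.compile(r"^(\S+):[ \t]+(\d+)[ \t]+records[ \t]*$", re.MULTILINE)


def parse_record_counts(report: str) -> Dict[str, int]:
    """Map each output table named in the output-mode report to its record count."""
    return {table: int(count) for table, count in _COUNT_RE.findall(report)}
```

Lines that do not match the pattern, such as "No results" or "Output:", are ignored, so an empty dictionary means no per-table counts were reported.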
The UDF simulator has some limitations: