The following is a complete example, using the Python API, of a CUDA-based UDF that performs various computations using the scikit-CUDA interface. It will take two vectors and one matrix of data loaded from a Kinetica table and perform various operations in both NumPy & cuBLAS, writing the comparison output to the system log.
This setup assumes the UDF is being developed on the Kinetica host (or head
node host, if a multi-node Kinetica cluster), and that the Python database
API is available at
/opt/gpudb/api/python and the Python UDF API is available at
/opt/gpudb/udf/api/python. This setup also assumes a local
CUDA installation at
This example consists of the following Python scripts:
- udf_cublas_py_init.py : creates the schema & input table and loads test data
- udf_cublas_py_proc.py : the UDF itself
- udf_cublas_py_exec.py : creates & executes the UDF
The CUDA Toolkit is a prerequisite for this example. Visit NVIDIA for details.
All commands should be run as the
The Scikit-CUDA package is also required and must first be installed.
Ensure that the CUDA path is set:
$ export CUDA_ROOT=/usr/local/cuda
$ export PATH=$PATH:$CUDA_ROOT/bin
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_ROOT/lib64
Install the Scikit-CUDA package:
After copying the three example scripts to a
directory on the Kinetica head node, the example can be run as follows,
optionally specifying the database host and a username & password to the Python
The results of the UDF run can be seen in the system log file.
The UDF will load the data on which it performs its calculations from example_udf_python.udf_cublas_in_table. The results of those calculations will be written to the system log.
The input table will contain three float columns and be populated with 10 sets of randomly-generated numbers. The output log will contain the results of each operation, first in NumPy and then in cuBLAS, which should match.
This initialization script creates the schema & input table and populates it using the standard Kinetica Python API outside of the UDF execution framework.
Several aspects of the initialization process are noteworthy:
- The external database connection, indicative of the use of the standard Kinetica Python API--the UDF will not have this, as it runs within the database:
- Schema and input table creation:
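The two steps above might look roughly like the following sketch. It is not the original script: the column names `x`, `y`, `z`, the value range, and the default host and port are assumptions made for illustration (the source only states that the table has three float columns and 10 random rows). The `gpudb.GPUdb`, `create_schema`, and `gpudb.GPUdbTable` calls reflect the Kinetica Python API as commonly documented and should be checked against the installed version.

```python
import random

SCHEMA = "example_udf_python"
INPUT_TABLE = SCHEMA + ".udf_cublas_in_table"

# Assumed column names; the source only says "three float columns"
COLUMNS = [["x", "float"], ["y", "float"], ["z", "float"]]


def make_records(count=10, seed=None):
    """Return `count` rows, each holding one random float per column."""
    rng = random.Random(seed)
    # The 0-10 value range is an arbitrary choice for this sketch
    return [[rng.uniform(0.0, 10.0) for _ in COLUMNS] for _ in range(count)]


def init(host="127.0.0.1", port="9191", username="", password=""):
    """Create the schema and input table, then load the test data."""
    # Imported here so the helpers above can be used without the API installed
    import gpudb

    # External connection: only needed outside the UDF execution framework
    h_db = gpudb.GPUdb(encoding="BINARY", host=host, port=port,
                       username=username, password=password)

    # Create the schema, tolerating the case where it already exists
    h_db.create_schema(SCHEMA, options={"no_error_if_exists": "true"})

    # Create the input table from the column list and insert the rows
    table = gpudb.GPUdbTable(COLUMNS, INPUT_TABLE, db=h_db)
    table.insert_records(make_records())
```

Calling `init()` against a running database would perform the setup; `make_records()` can be exercised on its own to inspect the generated test data.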
This is the UDF itself. It uses the Kinetica Python UDF API to perform various mathematical functions against the input table column values and output the results to the system log. It runs within the UDF execution framework, and as such, is not called directly--instead, it is registered and launched by udf_cublas_py_exec.py.
Noteworthy in the UDF are the following:
- The initial call to ProcData() to access the database:
- The retrieval of input data values:
- The final call to complete() to mark the process as finished and ready for clean-up:
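The three points above can be sketched as a skeleton of the proc script. This is a simplified outline under stated assumptions, not the original UDF: it assumes the UDF API exposes `ProcData` from a `kinetica_proc` module, that input columns can be read by position from `input_data[0]`, and that the table reports its row count as `size`. The helper names `column_values` and `run` are hypothetical, and the NumPy and cuBLAS computations are left out.

```python
def column_values(table, index):
    """Copy one input column into a plain Python list."""
    column = table[index]
    return [column[i] for i in range(table.size)]


def run(proc_data):
    """Read the three input columns, then mark the proc as finished."""
    # The first (and only) input table passed to the UDF
    in_table = proc_data.input_data[0]

    # Assumed layout: three float columns, read by position
    x = column_values(in_table, 0)
    y = column_values(in_table, 1)
    z = column_values(in_table, 2)

    # The NumPy and cuBLAS computations and logging would go here

    # Signal that the process is finished and ready for clean-up
    proc_data.complete()
    return x, y, z


def main():
    # Assumed import path of the Kinetica Python UDF API
    from kinetica_proc import ProcData

    # ProcData() gives the UDF its access to the database; no external
    # connection is created inside the UDF
    run(ProcData())
```

Inside the UDF execution framework, the script would end with a call to `main()`. Keeping `run` separate from the `ProcData()` construction makes it possible to exercise the column handling with a stand-in object outside the database.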
The execution script uses the standard Kinetica Python API to register the UDF in the database and then execute it.
The registration step associates a name with the UDF execution code contained in udf_cublas_py_proc.py, specifies the command ( python ) and arguments (the name of the proc script) used to run it, and declares that it will run in distributed mode.
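As a rough illustration of the registration step, the sketch below passes those pieces to `create_proc` on an existing database handle. The UDF name `udf_cublas_py_proc` and the helper `register_udf` are assumptions for this example; the keyword names follow the Kinetica Python API's `create_proc` as commonly documented and should be verified against the installed version.

```python
PROC_NAME = "udf_cublas_py_proc"      # assumed UDF name, for illustration
PROC_FILE = "udf_cublas_py_proc.py"   # the proc script named in the text


def register_udf(h_db, proc_name=PROC_NAME, file_name=PROC_FILE, file_data=None):
    """Register the proc script under `proc_name` in distributed mode."""
    if file_data is None:
        # Read the UDF source so it can be uploaded with the registration
        with open(file_name, "rb") as proc_file:
            file_data = proc_file.read()

    return h_db.create_proc(
        proc_name=proc_name,
        execution_mode="distributed",    # run in distributed mode
        files={file_name: file_data},    # script name -> script contents
        command="python",                # command used to run the UDF
        args=[file_name],                # argument: the proc script name
        options={},
    )
```

Here `h_db` is expected to be a connected `gpudb.GPUdb` instance. Registering a name that already exists may fail, so a real script would likely remove any previous registration first.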
The execution step invokes the UDF by name, passing in the input table name against which the UDF will execute.
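A minimal sketch of the execution step follows, assuming an existing database handle and a UDF already registered under the given name. The helper `execute_udf` and the UDF name in the usage note are hypothetical; the keyword names follow the Kinetica Python API's `execute_proc` as commonly documented and should be checked against the installed version.

```python
INPUT_TABLE = "example_udf_python.udf_cublas_in_table"


def execute_udf(h_db, proc_name, input_table=INPUT_TABLE):
    """Invoke a registered UDF by name against a single input table."""
    return h_db.execute_proc(
        proc_name=proc_name,
        params={},                         # no string parameters
        bin_params={},                     # no binary parameters
        input_table_names=[input_table],   # table the UDF will read
        input_column_names={},             # empty: pass all columns
        output_table_names=[],             # results go to the system log
        options={},
    )
```

For example, `execute_udf(h_db, "udf_cublas_py_proc")` would launch the UDF, where the name is whatever was used at registration. No output table is passed because this example writes its results to the system log only.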