Example UDF (CUDA) - CUBLAS

The following is a complete example, using the Python API, of a CUDA-based UDF that performs various computations through the scikit-CUDA interface. It takes two vectors and one matrix of data loaded from a Kinetica table, performs various operations on them in both NumPy and cuBLAS, and writes the output of both, for comparison, to /opt/gpudb/core/logs/gpudb.log.

This setup assumes that the UDF is being developed on the Kinetica host (or on the head node host, in a multi-node Kinetica cluster), that the Python database API is available at /opt/gpudb/api/python, and that the Python UDF API is available at /opt/gpudb/udf/api/python. It also assumes a local CUDA installation at /usr/local/cuda.

This example will contain the following Python scripts:

udf_cublas_py_init.py: creates the input table and loads test data
udf_cublas_py_proc.py: the UDF itself
udf_cublas_py_exec.py: creates & executes the UDF

The CUDA Toolkit is a prerequisite for this example. Visit the NVIDIA website for details.

Note

All commands should be run as the gpudb user.

The Scikit-CUDA package is also required and must first be installed.

Ensure that the CUDA path is set:

$ export CUDA_ROOT=/usr/local/cuda
$ export PATH=$PATH:$CUDA_ROOT/bin
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_ROOT/lib64
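Before installing Scikit-CUDA, it can help to confirm that these settings took effect. The following is a minimal sketch, not part of the Kinetica example scripts: missing_cuda_settings is a hypothetical helper, it assumes a Linux host where PATH entries are colon-separated, and it must be run from the same shell session in which the variables were exported.

```python
import os


def missing_cuda_settings(env=None):
    """Return a list describing which of the CUDA settings above are absent.

    Hypothetical helper; `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    cuda_root = env.get("CUDA_ROOT")
    if not cuda_root:
        return ["CUDA_ROOT is not set"]

    problems = []
    bin_dir = os.path.join(cuda_root, "bin")
    lib_dir = os.path.join(cuda_root, "lib64")
    # Split on os.pathsep (':' on Linux); an empty entry from a leading ':' is harmless.
    if bin_dir not in env.get("PATH", "").split(os.pathsep):
        problems.append("PATH lacks " + bin_dir)
    if lib_dir not in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        problems.append("LD_LIBRARY_PATH lacks " + lib_dir)
    return problems


if __name__ == "__main__":
    for problem in missing_cuda_settings():
        print(problem)
```

An empty result means CUDA_ROOT is set and both its bin and lib64 directories are on the respective search paths.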

Install the Scikit-CUDA package:

$ /opt/gpudb/bin/gpudb-pip install scikit-cuda
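To confirm that the package is visible to Kinetica's bundled interpreter rather than only to a system Python, a short check can be run with /opt/gpudb/bin/gpudb_python. This is a minimal sketch assuming Python 3 (it relies on importlib.util.find_spec); module_available is a hypothetical helper, and pycuda is checked as well because Scikit-CUDA is built on top of PyCUDA.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be found by this interpreter.

    find_spec locates the package without importing it, so no CUDA library is loaded.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Run with /opt/gpudb/bin/gpudb_python so the check uses Kinetica's interpreter.
    for name in ("skcuda", "pycuda"):
        print(name, "found" if module_available(name) else "MISSING")
```

Because this only locates the packages, an actual `import skcuda.cublas` remains the stronger test, since importing that module is what loads the cuBLAS shared library.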

After the three example scripts have been copied to a directory on the Kinetica head node that the gpudb user can access, the example can be run as follows:

$ /opt/gpudb/bin/gpudb_python udf_cublas_py_init.py
$ /opt/gpudb/bin/gpudb_python udf_cublas_py_exec.py

The results of the UDF run can be seen in the /opt/gpudb/core/logs/gpudb.log file.
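Since gpudb.log also carries the database's own messages, it can be convenient to view only its most recent lines after a run (the shell equivalent is `tail -n 50`). A minimal Python sketch follows; tail_lines and tail_file are hypothetical helpers, and the default of 50 lines is an arbitrary choice.

```python
from collections import deque


def tail_lines(lines, n=50):
    """Return the last n items of any iterable of lines, with trailing newlines removed."""
    return [line.rstrip("\n") for line in deque(lines, maxlen=n)]


def tail_file(path, n=50):
    """Return the last n lines of a text file, streaming it rather than loading it whole."""
    with open(path, "r", errors="replace") as f:
        return tail_lines(f, n)


if __name__ == "__main__":
    for line in tail_file("/opt/gpudb/core/logs/gpudb.log"):
        print(line)
```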

Execution Detail

The UDF will load the data on which it performs its calculations from udf_cublas_in_table. The results of those calculations will be logged to /opt/gpudb/core/logs/gpudb.log.

The input table will contain three float columns and be populated with 10 sets of randomly generated numbers. The output log will contain the results of each operation, first in NumPy and then in cuBLAS; the two results should match, allowing for small floating-point rounding differences.
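The columns hold single-precision floats, and two libraries may sum the same products in a different order, so matching results agree within a small tolerance rather than bit for bit. The sketch below illustrates this with the standard library only: dot32 rounds every intermediate to float32 in a simple sequential loop (an approximation, since cuBLAS uses its own parallel reduction order), dot64 is a double-precision reference, and results_match is a hypothetical comparison rule whose tolerances are assumptions.

```python
import math
import struct


def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


def dot32(xs, ys):
    """Dot product that rounds every intermediate to float32, accumulating sequentially.

    Only an approximation of single-precision GPU behaviour: cuBLAS uses its own
    (parallel) summation order.
    """
    acc = 0.0
    for a, b in zip(xs, ys):
        product = to_float32(to_float32(a) * to_float32(b))
        acc = to_float32(acc + product)
    return acc


def dot64(xs, ys):
    """Double-precision reference: math.fsum adds the products with a single final rounding."""
    return math.fsum(a * b for a, b in zip(xs, ys))


def results_match(a, b, rel_tol=1e-5, abs_tol=1e-6):
    """Hypothetical comparison rule for float32 results; the tolerances are assumptions."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


if __name__ == "__main__":
    xs = [0.1] * 10  # 10 rows, like the input table
    ys = [0.2] * 10
    print(dot32(xs, ys), dot64(xs, ys), results_match(dot32(xs, ys), dot64(xs, ys)))
```

The two dot products differ in the low-order digits, yet results_match reports them as equal, which is the kind of agreement to look for when reading the NumPy and cuBLAS lines in the log.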

udf_cublas_py_init.py

This initialization script creates the input table and populates it using the standard Kinetica Python API outside of the UDF execution framework.

Several aspects of the initialization process are noteworthy:

The external database connection, which indicates use of the standard Kinetica Python API (the UDF itself has no such connection, as it runs within the database):

h_db = GPUdb(encoding = 'BINARY', host = KINETICA_HOST, port = KINETICA_PORT)

Input table creation:

columns = []
columns.append(GPUdbRecordColumn("x", GPUdbRecordColumn._ColumnType.FLOAT))
columns.append(GPUdbRecordColumn("y", GPUdbRecordColumn._ColumnType.FLOAT))
columns.append(GPUdbRecordColumn("z", GPUdbRecordColumn._ColumnType.FLOAT))
input_table = GPUdbTable(columns, INPUT_TABLE, db = h_db, options = GPUdbTableOptions.default().is_replicated(True))
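The 10 rows of test data can be produced with the standard random module alone. In this sketch, make_test_records is a hypothetical helper, and the value range and seed are arbitrary choices for illustration. The commented-out insert call assumes that GPUdbTable.insert_records accepts a list of dicts keyed by column name; consult the Python API reference for the record formats your version supports.

```python
import random

COLUMN_NAMES = ("x", "y", "z")  # the three FLOAT columns created above


def make_test_records(count=10, seed=None, low=-10.0, high=10.0):
    """Return `count` records, each mapping every column name to a random float.

    Hypothetical helper; the [low, high] range is an arbitrary choice.
    Passing a seed makes the generated data reproducible between runs.
    """
    rng = random.Random(seed)
    return [{name: rng.uniform(low, high) for name in COLUMN_NAMES} for _ in range(count)]


records = make_test_records(10, seed=42)
# Assumed usage with the input_table created above:
# input_table.insert_records(records)
```

Seeding the generator is optional, but it makes a NumPy/cuBLAS discrepancy reproducible if one ever shows up in the log.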

udf_cublas_py_proc.py

This is the UDF itself. It uses the Kinetica Python UDF API to perform various mathematical operations on the input table column values and writes the results to /opt/gpudb/core/logs/gpudb.log. It runs within the UDF execution framework and, as such, is not called directly; instead, it is registered and launched by udf_cublas_py_exec.py.

Noteworthy in the UDF are the following:

The initial call to ProcData() to access the database:

   proc_data = ProcData()

The retrieval of input data values:

   example(proc_data)

def example(pd):

   in_table = pd.input_data[0]

   # x and y are n-by-1 column vectors defined elsewhere in the full script
   for i in range(0, in_table.size):
      x[i,0] = in_table['x'][i]
      y[i,0] = in_table['y'][i]
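Because the loop relies only on in_table exposing a size attribute and column lookup by name, its logic can be exercised outside Kinetica with a small stand-in object. FakeInputTable and load_column_vectors below are hypothetical names, not part of the Kinetica UDF API, and plain nested lists stand in for the n-by-1 arrays the UDF fills.

```python
class FakeInputTable:
    """Hypothetical stand-in for pd.input_data[0] when testing outside Kinetica.

    Mimics only what the loop above uses: a `size` attribute and
    column lookup by name, e.g. in_table['x'][i].
    """

    def __init__(self, columns):
        self._columns = columns
        self.size = len(next(iter(columns.values())))

    def __getitem__(self, name):
        return self._columns[name]


def load_column_vectors(in_table):
    """Copy the x and y columns into n-by-1 column vectors (lists of one-element lists)."""
    x = [[0.0] for _ in range(in_table.size)]
    y = [[0.0] for _ in range(in_table.size)]
    for i in range(in_table.size):
        x[i][0] = in_table['x'][i]
        y[i][0] = in_table['y'][i]
    return x, y


table = FakeInputTable({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0], "z": [7.0, 8.0, 9.0]})
x, y = load_column_vectors(table)
```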

The final call to complete() to mark the process as finished and ready for clean-up:

   proc_data.complete()

udf_cublas_py_exec.py

The execution script uses the standard Kinetica Python API to register the UDF in the database and then execute it.

The registration step associates a name with the UDF execution code contained in udf_cublas_py_proc.py, specifies the command (python) and arguments (the name of the proc script) used to run it, and declares that it will run in distributed mode.

response = h_db.create_proc(proc_name, 'distributed', files, 'python', [file_name], {})
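The files argument in this call is expected to map each file name to that file's binary contents. One way to build it is sketched below; build_proc_files is a hypothetical helper, and the commented usage assumes that h_db, proc_name, and file_name are defined as in the execution script.

```python
import os


def build_proc_files(*paths):
    """Map each file's base name to its raw bytes, for use as create_proc's `files` argument.

    Hypothetical helper; base names are used so the keys match the names passed as args.
    """
    files = {}
    for path in paths:
        with open(path, "rb") as f:
            files[os.path.basename(path)] = f.read()
    return files


# Assumed usage, mirroring the registration call above:
# file_name = "udf_cublas_py_proc.py"
# files = build_proc_files(file_name)
# response = h_db.create_proc(proc_name, 'distributed', files, 'python', [file_name], {})
```

Keying by base name keeps the map consistent with [file_name] in the args list, so the command python udf_cublas_py_proc.py finds the script under the same name it was uploaded with.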

The execution step invokes the UDF by name, passing in the input table name against which the UDF will execute.

response = h_db.execute_proc(proc_name, {}, {}, [INPUT_TABLE], {}, [], {})
