Download & Run
This example consists of the following Python scripts:
- A UDF management program, udf_sos_manager.py, written using the Python API, which creates the input & output tables, then creates and executes the UDF.
- A UDF, udf_sos_proc.py, written using the Python UDF API, which contains the sum-of-squares example.
Run Example
- The udf_sos_py_in_table table is created in the user’s default schema (ki_home, unless a different one was assigned during account creation)
- A matching udf_sos_py_out_table table is created in the same schema
- The udf_sos_py_in_table table contains 10,000 records of random data
- The udf_sos_py_out_table table contains the sum of squares of the two columns from udf_sos_py_in_table
- To show the source columns and the sum-of-squares together, run the following query:
Show Sum of Squares Calculations
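As a rough sketch of such a query, the helper below builds a SQL string that joins the two tables on the calculation identifier. The column names (id, x1, x2, y) and the helper name build_sos_query are assumptions made for illustration; the table definitions are not listed in this section, so adjust them to match the actual tables.

```python
def build_sos_query(in_table="udf_sos_py_in_table",
                    out_table="udf_sos_py_out_table",
                    id_column="id",               # assumed identifier column name
                    value_columns=("x1", "x2"),   # assumed input column names
                    result_column="y"):           # assumed output column name
    """Return SQL that pairs each input row with its computed sum of squares."""
    values = ", ".join(f"i.{col}" for col in value_columns)
    return (
        f"SELECT i.{id_column}, {values}, o.{result_column}\n"
        f"FROM {in_table} AS i\n"
        f"JOIN {out_table} AS o ON o.{id_column} = i.{id_column}\n"
        f"ORDER BY i.{id_column}"
    )


print(build_sos_query())
```

The resulting string can be run from any SQL client connected to the database.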
UDF Detail
The example UDF uses a single table, udf_sos_py_in_table, as input and a
corresponding table, udf_sos_py_out_table, for output.
The input table will contain two float columns and be populated with 10,000
pairs of randomly-generated numbers. The output table will contain one float
column that will hold the sums calculated by the UDF. Both tables will also
contain an int column that is the calculation identifier, allowing the input
data to be matched up with the output data after the UDF has run.
The UDF will assume the first column of the input table, as defined in
the original table creation process, is the identifier field. All of
the remaining columns after the first will be used in the
sum-of-squares calculation.
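The per-record calculation described above can be illustrated in plain Python. This is only a sketch of the arithmetic, not the UDF code itself; the function name and the sample record are made up for the example.

```python
def sum_of_squares(row):
    """Treat the first value as the identifier and square-and-sum the rest."""
    identifier, *values = row
    return identifier, sum(v * v for v in values)


# A record with identifier 7 and two float values
print(sum_of_squares((7, 3.0, 4.0)))  # (7, 25.0)
```

Because every column after the first is included, the same logic would apply unchanged if the input table had more than two value columns.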
Initialization (udf_sos_manager.py init)
The init option invokes the init() function in the
udf_sos_manager.py script. This function will create the input table for
the UDF to use as the source of the calculations and the output table into which
the results will be inserted. It also populates the input data using the
standard Kinetica Python API, all outside of the UDF execution framework.
Several aspects of the initialization process are noteworthy:
- The external database connection, indicative of the use of the standard Kinetica Python API (the UDF itself will not have this, as it runs within the database):
  Connect to the Database
- Input and output table creation:
  Create Input Table
  Populate Input Table Data
  Create Results Table
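A minimal sketch of the initialization steps follows. The column names, the value range of the random data, and the helper names are assumptions for illustration. The init_tables function expects an already-connected gpudb.GPUdb handle and uses gpudb.GPUdbTable as found in the Kinetica Python API; verify the calls against the installed API version before relying on them.

```python
import random

INPUT_TABLE = "udf_sos_py_in_table"
OUTPUT_TABLE = "udf_sos_py_out_table"

# Assumed column layouts: an int identifier first, then the float columns.
INPUT_COLUMNS = [["id", "int"], ["x1", "float"], ["x2", "float"]]
OUTPUT_COLUMNS = [["id", "int"], ["y", "float"]]


def make_input_records(count=10000, seed=None):
    """Build [id, x1, x2] records with random floats in [0, 1) (assumed range)."""
    rng = random.Random(seed)
    return [[i, rng.random(), rng.random()] for i in range(count)]


def init_tables(db, records):
    """Create both tables and load the input data through the standard API.

    `db` is assumed to be a connected gpudb.GPUdb handle.
    """
    import gpudb  # imported here so the helpers above work without the package

    in_table = gpudb.GPUdbTable(INPUT_COLUMNS, name=INPUT_TABLE, db=db)
    in_table.insert_records(records)
    gpudb.GPUdbTable(OUTPUT_COLUMNS, name=OUTPUT_TABLE, db=db)
```

Keeping the record generation separate from the database calls makes it possible to inspect the data without a running database.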
UDF (udf_sos_proc.py)
The udf_sos_proc.py script is the UDF itself. It uses the Kinetica Python
UDF API to compute the sums of squares of input table columns and output those
sums to the output table. It runs within the UDF execution framework, and as
such, is not called directly; instead, it is registered and launched by
udf_sos_manager.py.
Noteworthy in the UDF are the following:
- The initial call to ProcData() to access the database:
  Begin UDF
- The size of the output table must be specified before writing to it:
  Size Results Table to Match the Input Table
- The sum-of-squares computation and writing to the output table:
  Compute Sum-of-Squares and Write to Output Table
- The final call to complete() to mark the process as finished and ready for clean-up:
  End UDF
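The middle two steps, sizing the output and then computing into it, can be sketched with plain Python lists standing in for table columns. The lists are stand-ins only: inside the real UDF the columns come from the object returned by ProcData(), and complete() is called afterwards; neither is invoked here, since they exist only within the UDF execution framework. The function name is hypothetical.

```python
def compute_sum_of_squares(in_columns):
    """in_columns[0] is the identifier column; the rest are value columns."""
    id_column, value_columns = in_columns[0], in_columns[1:]
    row_count = len(id_column)

    # Size the output columns to match the input before writing any values.
    out_ids = [0] * row_count
    out_sums = [0.0] * row_count

    for i in range(row_count):
        out_ids[i] = id_column[i]  # carry the identifier across
        out_sums[i] = sum(column[i] ** 2 for column in value_columns)

    return out_ids, out_sums


ids, sums = compute_sum_of_squares([[1, 2], [3.0, 1.0], [4.0, 2.0]])
print(ids, sums)  # [1, 2] [25.0, 5.0]
```

Writing by index into pre-sized columns mirrors the requirement that the output size be set first; assigning to an index past the end of an unsized list would fail.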
Execution (udf_sos_manager.py exec)
The exec option invokes the exec() function in the
udf_sos_manager.py script. This function will read the UDF script in as
bytes, and create a UDF, uploading the script to the database. The function will
then execute the UDF.
- The registration step associates a name with the UDF execution code contained in udf_sos_proc.py, specifies the command (python3) and arguments (the name of the proc script) used to run it, and sets it to run in distributed mode.
  Create UDF
- The execution step invokes the UDF by name, passing in the input & output table names against which the UDF will execute.
  Execute UDF
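The two steps can be sketched as follows, under a few assumptions: the UDF name udf_sos_py_proc and the helper names are made up for illustration, and db is assumed to be a connected gpudb.GPUdb handle whose create_proc and execute_proc methods accept the keyword arguments shown. Check them against the installed Python API version.

```python
import os

PROC_NAME = "udf_sos_py_proc"  # assumed UDF name, not given in this section


def build_create_proc_kwargs(script_path, proc_name=PROC_NAME):
    """Read the UDF script as bytes and assemble the registration arguments."""
    file_name = os.path.basename(script_path)
    with open(script_path, "rb") as script:
        script_bytes = script.read()
    return {
        "proc_name": proc_name,
        "execution_mode": "distributed",
        "files": {file_name: script_bytes},  # uploaded to the database
        "command": "python3",
        "args": [file_name],                 # the proc script to run
        "options": {},
    }


def build_execute_proc_kwargs(proc_name=PROC_NAME):
    """Assemble the arguments that point the UDF at its tables."""
    return {
        "proc_name": proc_name,
        "input_table_names": ["udf_sos_py_in_table"],
        "output_table_names": ["udf_sos_py_out_table"],
    }


def register_and_run(db, script_path):
    """Register the UDF, then execute it by name (db: connected GPUdb handle)."""
    db.create_proc(**build_create_proc_kwargs(script_path))
    return db.execute_proc(**build_execute_proc_kwargs())
```

Building the arguments in separate functions keeps the file handling checkable without a database connection.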