Since Kinetica isn't a standard RDBMS, we have created this simple guide to get you started with your development efforts.
Note: It is recommended that the locations of python
and pip
are put in
the PATH
environment variable. If they are not, you will need to use the
full path to the python
and pip
executables in the instructions below.
Also, administrative access will most likely be required when installing the
Python packages.
Installation
- Download the Python API corresponding to your version of Kinetica.
All versions of the Kinetica Python API can be found at https://github.com/kineticadb/kinetica-api-python.git. Download the zip file of the branch that corresponds to the version of Kinetica you are running.
Unzip the downloaded file and cd into the resulting directory.
Install the Kinetica API and Avro packages:
pip install -e . pip install avroYou are now ready to run Python programs which use Kinetica. To test:
cd example python example.py
The GPUdb
object provides access to your Kinetica instance. To interact
with Kinetica, you must create a new object of this class, passing the URL of
the instance you are connecting to:
h_db = gpudb.GPUdb( encoding = 'BINARY', host = 'localhost', port = '9191' )
Tables in Kinetica are similar to tables in other databases--they have a
user-provided name and definition. Unlike other databases, the type definition
is declared by creating a new type in Kinetica. The easiest way to do this in
Python is to create an object that represents the type definition in JSON
format and then pass this object to the Kinetica create_type
API call:
#Declare weather record type
WeatherRecordType = """
{
"type": "record",
"name": "Weather",
"fields": [
{"name":"timestamp","type":"long"},
{"name":"x","type":"float"},
{"name":"y","type":"float"},
{"name":"cityName","type":"string"},
{"name":"country","type":"string"},
{"name":"region","type":"string"},
{"name":"temperature","type":"float"},
{"name":"weatherConditions","type":"string"}
]
} """.replace(' ', '').replace('\n', '')
response = h_db.create_type( type_definition = WeatherRecordType, label = 'WeatherType' )
# Get the type_id of this new type. It will be used in the table creation step.
type_id_1 = response['type_id']
At this point, Kinetica now has the new type, registered with the name
Weather
. Now a table can be created with this type, using the type_id
returned by the create_type
call:
response = h_db.create_table( table_name = "Weather", type_id = type_id_1 )
To insert records into our new table, first we need to create a list to hold our records:
# Create a list for your objects to be inserted into the table
encoded_obj_list = []
Next, create an ordered dictionary to represent each record and add the fields
and values corresponding to the type registered above. In this example, we are
inserting 10 of our Weather
records with arbitrary values. After each
object is created and the fields and values are added, we encode the object for
Kinetica insertion and add it to our object list:
# Generate data to be inserted into the table
for val in range(1, 10):
datum = collections.OrderedDict()
datum["timestamp"] = val
datum["x"] = val + 0.1
datum["y"] = val + 0.1
datum["cityName"] = "Rome"
datum["country"] = 'Italy'
datum["region"] = 'Abruzzi'
datum["temperature"] = randint(0, 100)
datum["weatherConditions"] = 'Sunny'
# Encode the data appropriately for insertion and add to our object list
encoded_obj_list.append(h_db.encode_datum(WeatherRecordType, datum))
# Optional parameter that enables returning IDs for the newly inserted records
options = {'return_record_ids':'true'}
Now that we have our encoded object list, we can insert the objects into Kinetica:
# Insert the records into the table
response = h_db.insert_records( table_name = "WeatherType",
data = encoded_obj_list,
list_encoding = 'binary',
options = options )
Since Kinetica was designed to handle big data, records are retrieved in
chunks. For example, to get the first 1000 records from the Weather
table:
getRecordsResponse = h_db.get_records( table_name = 'Weather',
offset = 0, limit = 1000,
encoding = 'json', options = {} )
The records are returned in the get_records
response, in the
records_json
field:
records = getRecordsResponse['records_json']
The records are returned in the form of JSON strings. We can iterate through the records, decoding each one to access the individual fields of the record:
for record in records:
decoded = json.loads(record)
print 'timestamp', decoded["timestamp"], new_line
To read all records in a table, you would retrieve chunks of records until read all of them.
showTableResponse = h_db.show_table(table_name='Weather', options={'get_sizes':'true'})
totalNumRecords = showTableResponse['sizes'][0]
max_read = 1000
start = 0
while start < totalNumRecords:
getRecordsResponse = h_db.get_records(table_name='Weather',offset=start,
limit=max_read,encoding='json',
options={})
records = getRecordsResponse['records_json']
for record in records:
print "record:", record, new_line
# end for
start += max_read
# end while
Kinetica provides a general purpose filter function and many highly optimized
filter functions. These filters create new, usually temporary, views
containing the filtered data. The general purpose filter method works similarly
to the SQL SELECT ... WHERE ...
query. Using our weather example, let's
say you want to filter on records where the temperature is over 70. The
following code will create a new view called Over70
containing all the
records matching our filter expression:
# Filter records into a view. Response contains the count only
filterResponse = h_db.filter( table_name = my_table, view_name = 'Over70', expression = 'temperature > 70.0')
Records from this new table are retrieved in the same manner as described in Reading Records from a Table above.
For more details on filtering expressions, see concepts.
To delete records from a table, you would use filter expressions to identify the records to delete. The delete method allows you to specify multiple filter expressions, so it takes a list of expressions, rather than a single string.
Continuing our example, let's delete all records where the temperature is below 50:
deleteResponse = h_db.delete_records( table_name = my_table, expressions = expressions, options = {} )