Version:

Getting Started In Python

Since Kinetica isn't a standard RDBMS, we have created this simple guide to get you started with your development efforts.

The Kinetica Python API Installation

Requirements
  • Python 2.7
  • pip

Note: It is recommended that the locations of python and pip are put in the PATH environment variable. If they are not, you will need to use the full path to the python and pip executables in the instructions below. Also, administrative access will most likely be required when installing the Python packages.

Installation

  1. Download the Python API corresponding to your version of Kinetica.

    All versions of the Kinetica Python API can be found at https://github.com/kineticadb/kinetica-api-python.git. Download the zip file of the branch that corresponds to the version of Kinetica you are running.

  2. Unzip the downloaded file and cd into the resulting directory.

  3. Install the Kinetica API and Avro packages:

    pip install -e .
    pip install avro
    
  4. You are now ready to run Python programs which use Kinetica. To test:

    cd example
    python example.py
    

Connecting to a Kinetica Instance

The GPUdb object provides access to your Kinetica instance. To interact with Kinetica, you must create a new object of this class, passing the URL of the instance you are connecting to:

h_db = gpudb.GPUdb( encoding = 'BINARY', host = 'localhost', port = '9191' )

Creating a Table

Tables in Kinetica are similar to tables in other databases--they have a user-provided name and definition. Unlike other databases, the type definition is declared by creating a new type in Kinetica. The easiest way to do this in Python is to create an object that represents the type definition in JSON format and then pass this object to the Kinetica create_type API call:

#Declare weather record type
WeatherRecordType = """
{
    "type": "record",
    "name": "Weather",
    "fields": [
        {"name":"timestamp","type":"long"},
        {"name":"x","type":"float"},
        {"name":"y","type":"float"},
        {"name":"cityName","type":"string"},
        {"name":"country","type":"string"},
        {"name":"region","type":"string"},
        {"name":"temperature","type":"float"},
        {"name":"weatherConditions","type":"string"}
    ]
}  """.replace(' ', '').replace('\n', '')

response = h_db.create_type( type_definition = WeatherRecordType, label = 'WeatherType' )
# Get the type_id of this new type.  It will be used in the table creation step.
type_id_1 = response['type_id']

At this point, Kinetica now has the new type, registered with the name Weather. Now a table can be created with this type, using the type_id returned by the create_type call:

response = h_db.create_table( table_name = "Weather", type_id = type_id_1 )

Writing to a Table

To insert records into our new table, first we need to create a list to hold our records:

# Create a list for your objects to be inserted into the table
encoded_obj_list = []

Next, create an ordered dictionary to represent each record and add the fields and values corresponding to the type registered above. In this example, we are inserting 10 of our Weather records with arbitrary values. After each object is created and the fields and values are added, we encode the object for Kinetica insertion and add it to our object list:

# Generate data to be inserted into the table
for val in range(1, 10):
    datum = collections.OrderedDict()
    datum["timestamp"] = val
    datum["x"] = val + 0.1
    datum["y"] = val + 0.1
    datum["cityName"] = "Rome"
    datum["country"] = 'Italy'
    datum["region"] = 'Abruzzi'
    datum["temperature"] = randint(0, 100)
    datum["weatherConditions"] = 'Sunny'

    # Encode the data appropriately for insertion and add to our object list
    encoded_obj_list.append(h_db.encode_datum(WeatherRecordType, datum))

# Optional parameter that enables returning IDs for the newly inserted records
options = {'return_record_ids':'true'}

Now that we have our encoded object list, we can insert the objects into Kinetica:

# Insert the records into the table
response = h_db.insert_records( table_name = "WeatherType",
                                data = encoded_obj_list,
                                list_encoding = 'binary',
                                options = options )

Reading Records from a Table

Since Kinetica was designed to handle big data, records are retrieved in chunks. For example, to get the first 1000 records from the Weather table:

getRecordsResponse = h_db.get_records( table_name = 'Weather',
                                       offset = 0, limit = 1000,
                                       encoding = 'json', options = {} )

The records are returned in the get_records response, in the records_json field:

records = getRecordsResponse['records_json']

The records are returned in the form of JSON strings. We can iterate through the records, decoding each one to access the individual fields of the record:

for record in records:
    decoded = json.loads(record)
    print 'timestamp', decoded["timestamp"], new_line

To read all records in a table, you would retrieve chunks of records until read all of them.

showTableResponse = h_db.show_table(table_name='Weather', options={'get_sizes':'true'})
totalNumRecords   = showTableResponse['sizes'][0]

max_read = 1000
start    = 0

while start < totalNumRecords:
    getRecordsResponse = h_db.get_records(table_name='Weather',offset=start,
                                          limit=max_read,encoding='json',
                                          options={})
    records = getRecordsResponse['records_json']
    for record in records:
        print "record:", record, new_line
    # end for
    start += max_read
# end while

Filtering Data

Kinetica provides a general purpose filter function and many highly optimized filter functions. These filters create new, usually temporary, views containing the filtered data. The general purpose filter method works similarly to the SQL SELECT ... WHERE ... query. Using our weather example, let's say you want to filter on records where the temperature is over 70. The following code will create a new view called Over70 containing all the records matching our filter expression:

# Filter records into a view.  Response contains the count only
filterResponse = h_db.filter( table_name = my_table, view_name = 'Over70', expression = 'temperature > 70.0')

Records from this new table are retrieved in the same manner as described in Reading Records from a Table above.

For more details on filtering expressions, see concepts.

Deleting Records from a Table

To delete records from a table, you would use filter expressions to identify the records to delete. The delete method allows you to specify multiple filter expressions, so it takes a list of expressions, rather than a single string.

Continuing our example, let's delete all records where the temperature is below 50:

deleteResponse = h_db.delete_records( table_name = my_table, expressions = expressions, options = {} )