The following guide provides step-by-step instructions to get started writing Python applications using Kinetica. This guide demonstrates only a small set of the available API. A detailed description of the complete interface is available under Python API Reference.
Requirements
Note
The locations of python and pip should be included in the PATH environment variable. If they are not, you'll need to use the full path to the python and pip executables in the instructions below. Also, administrative access will most likely be required when installing the Python packages.
In the desired directory, run the following, replacing kinetica-version with the branch name matching the installed Kinetica version:
git clone -b [kinetica-version] --single-branch https://github.com/kineticadb/kinetica-api-python.git
Change directory into the newly downloaded repository, then install the Kinetica API and Avro packages:
pip install -e .
pip install avro
Test the installation:
cd example
python example.py
To interact with Kinetica, you must first instantiate an object of the GPUdb
class while providing the connection URL, including the host and port of the
database server.
import gpudb
""" Establish connection with a locally-running instance of Kinetica,
using binary encoding to save memory """
h_db = gpudb.GPUdb(encoding='BINARY', host='127.0.0.1', port='9191')
If the server is configured to authenticate users, the username and password will also need to be provided at the time of creating the object:
h_db = gpudb.GPUdb(encoding='BINARY', host='127.0.0.1', port='9191', username='user1', password='mysecret')
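Rather than hardcoding credentials in a script, they can be pulled from the environment at connect time. The sketch below assembles the connection arguments this way; the KINETICA_USER and KINETICA_PASS variable names are hypothetical, not part of the API:

```python
import os

# Hypothetical environment variable names; adjust to your deployment
username = os.environ.get("KINETICA_USER", "")
password = os.environ.get("KINETICA_PASS", "")

connect_args = {"encoding": "BINARY", "host": "127.0.0.1", "port": "9191"}
if username:
    # Only pass credentials when the server requires authentication
    connect_args.update(username=username, password=password)

# With a live Kinetica instance, the handle would be created as:
# h_db = gpudb.GPUdb(**connect_args)
```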
Before a table can be created and any data can be loaded into the database, a type needs to be defined. The type can be defined using one of two methods: a JSON string specifying the schema in raw format, or the more object-oriented GPUdbRecordType object form.
A JSON string describing the fields (i.e., columns) of the type, along with a name for the type, can be passed to the database handle. Each field consists of a name and a data type:
my_type='''{
"type":"record",
"name":"polygon_point",
"fields":[
{"name":"col1","type":"double"},
{"name":"col2","type":"string"}
]
}'''
response = h_db.create_type(type_definition = my_type, label = 'type_id_1')
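Since the type definition is just a JSON document, it can also be assembled from a Python dictionary with json.dumps, which guards against quoting mistakes in a hand-written string. A sketch, building the same schema as the string above:

```python
import json

# Assemble the same polygon_point schema as the hand-written string
type_def = json.dumps({
    "type": "record",
    "name": "polygon_point",
    "fields": [
        {"name": "col1", "type": "double"},
        {"name": "col2", "type": "string"},
    ],
})

# type_def can then be passed as the type_definition argument of create_type
parsed = json.loads(type_def)  # round-trips cleanly, proving valid JSON
```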
You can also pass in column sub-type properties using the following syntax:
response = h_db.create_type(type_definition = my_type, label = 'type_id_1', properties = {"column_name": ["column_property1", "column_property2"], "column_name2": ["column_property1"]})
Note
To read more about column types and properties, see Types.
The returned object from the create_type call contains a unique type identifier allocated by the system.
Alternatively, the type can be created using the GPUdbRecordType class and the GPUdbRecordColumn and GPUdbColumnProperty helper classes:
""" Create columns; column arguments consist of name then type, e.g.,
gpudb.GPUdbRecordColumn( "column_name", column_type, [column_property1,
column_property2] ) """
columns = []
columns.append(gpudb.GPUdbRecordColumn("city", gpudb.GPUdbRecordColumn._ColumnType.STRING, [gpudb.GPUdbColumnProperty.CHAR16]))
columns.append(gpudb.GPUdbRecordColumn("state_province", gpudb.GPUdbRecordColumn._ColumnType.STRING, [gpudb.GPUdbColumnProperty.CHAR32]))
columns.append(gpudb.GPUdbRecordColumn("country", gpudb.GPUdbRecordColumn._ColumnType.STRING, [gpudb.GPUdbColumnProperty.CHAR16]))
columns.append(gpudb.GPUdbRecordColumn("x", gpudb.GPUdbRecordColumn._ColumnType.DOUBLE))
columns.append(gpudb.GPUdbRecordColumn("y", gpudb.GPUdbRecordColumn._ColumnType.DOUBLE))
columns.append(gpudb.GPUdbRecordColumn("avg_temp", gpudb.GPUdbRecordColumn._ColumnType.DOUBLE))
columns.append(gpudb.GPUdbRecordColumn("time_zone", gpudb.GPUdbRecordColumn._ColumnType.STRING, [gpudb.GPUdbColumnProperty.CHAR8]))
# Create the type object
weather_record_type = gpudb.GPUdbRecordType(columns, label="weather_record_type")
""" Create the type in the database and save the type ID, needed to create
a table in the next step """
weather_record_type.create_type(h_db)
weather_type_id = weather_record_type.type_id
After you've created a type and stored the type ID in a variable, you can create a table.
weather_table = "weather"
response = h_db.create_table(table_name=weather_table, type_id=weather_type_id)
print "Table created: {}".format(response['status_info']['status'])
You can insert single records or multiple records using an encoded object list.
Note
The following methods are not the only way to insert records, but they are the simplest.
In this example, a dictionary object will be used to map key/value pairs to table column values.
# Create ordered dictionary for keys & values of record
datum = collections.OrderedDict()
datum["city"] = "Washington, D.C."
datum["state_province"] = "--"
datum["country"] = "USA"
datum["x"] = -77.016389
datum["y"] = 38.904722
datum["avg_temp"] = 58.5
datum["time_zone"] = "UTC-5"
# Encode record and put into a single element list
single_record = [gpudb.GPUdbRecord(weather_record_type, datum).binary_data]
# Insert the record into the table
response = h_db.insert_records(table_name=weather_table, data=single_record, list_encoding="binary", options={})
print "Number of single records inserted: {}".format(response["count_inserted"])
In this example, records will be defined in-line and added to a list.
# Create a list of encoded in-line records
encoded_obj_list = []
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Paris", "TX", "USA", -95.547778, 33.6625, 64.6, "UTC-6"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Memphis", "TN", "USA", -89.971111, 35.1175, 63, "UTC-6"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Sydney", "Nova Scotia", "Canada", -60.19551, 46.13631, 44.5, "UTC-4"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["La Paz", "Baja California Sur", "Mexico", -110.310833, 24.142222, 77, "UTC-7"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["St. Petersburg", "FL", "USA", -82.64, 27.773056, 74.5, "UTC-5"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Oslo", "--", "Norway", 10.75, 59.95, 45.5, "UTC+1"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Paris", "--", "France", 2.3508, 48.8567, 56.5, "UTC+1"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Memphis", "--", "Egypt", 31.250833, 29.844722, 73, "UTC+2"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["St. Petersburg", "--", "Russia", 30.3, 59.95, 43.5, "UTC+3"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Lagos", "Lagos", "Nigeria", 3.384082, 6.455027, 83, "UTC+1"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["La Paz", "Pedro Domingo Murillo", "Bolivia", -68.15, -16.5, 44, "UTC-4"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Sao Paulo", "Sao Paulo", "Brazil", -46.633333, -23.55, 69.5, "UTC-3"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Santiago", "Santiago Province", "Chile", -70.666667, -33.45, 62, "UTC-4"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Buenos Aires", "--", "Argentina", -58.381667, -34.603333, 65, "UTC-3"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Manaus", "Amazonas", "Brazil", -60.016667, -3.1, 83.5, "UTC-4"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Sydney", "New South Wales", "Australia", 151.209444, -33.865, 63.5, "UTC+10"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Auckland", "--", "New Zealand", 174.74, -36.840556, 60.5, "UTC+12"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Jakarta", "--", "Indonesia", 106.816667, -6.2, 83, "UTC+7"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Hobart", "--", "Tasmania", 147.325, -42.880556, 56, "UTC+10"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Perth", "Western Australia", "Australia", 115.858889, -31.952222, 68, "UTC+8"]).binary_data)
# Insert the records into the table
response = h_db.insert_records(table_name=weather_table, data=encoded_obj_list, list_encoding="binary", options={})
print "Number of batch records inserted: {}".format(response["count_inserted"])
Once the table is populated with data, the data can be retrieved as JSON from the system (by a call to get_records) as shown below:
""" Retrieve no more than 10 records as JSON from weather_table. Note that
records are stringified and have to be parsed. """
weatherLocs = h_db.get_records(table_name=weather_table, offset=0, limit=10, encoding="json", options={"sort_by":"city"})['records_json']
print "{:<20s} {:<25s} {:<15s} {:<10s} {:<11s} {:<9s} {:<8s}".format("City","State/Province","Country","Latitude","Longitude","Avg. Temp","Time Zone")
print "{:=<20s} {:=<25s} {:=<15s} {:=<10s} {:=<11s} {:=<9s} {:=<9s}".format("", "", "", "", "", "", "")
for weatherLoc in weatherLocs:
print "{city:<20s} {state_province:<25s} {country:<15s} {y:10.6f} {x:11.6f} {avg_temp:9.1f} {time_zone}".format(**json.loads(weatherLoc))
The data can also be retrieved using binary encoding, which is much faster than JSON encoding.
""" Retrieve no more than 25 of the remaining records as binary from weather
table. Note that records are binary and have to be decoded. """
response = h_db.get_records(table_name=weather_table, offset=10, limit=25, encoding="binary", options={"sort_by":"city"})
weatherLocs = gpudb.GPUdbRecord.decode_binary_data(response["type_schema"], response["records_binary"])
for weatherLoc in weatherLocs:
print "{city:<20s} {state_province:<25s} {country:<15s} {y:10.6f} {x:11.6f} {avg_temp:9.1f} {time_zone}".format(**weatherLoc)
""" Note that total_number_of_records does not reflect offset/limit; it's
the count of all records or those which match the given expression """
print "\nNumber of records in new table: {:d}".format(response["total_number_of_records"])
For large tables, the data can easily be retrieved in smaller blocks by using the offset and limit parameters, as shown above. The returned response also contains the schema (or data type) of the results.
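The paging pattern amounts to advancing offset by limit until fewer than limit records come back. A minimal sketch of that loop, using a plain list as a stand-in for the table and a hypothetical fetch_page helper in place of get_records:

```python
def fetch_page(records, offset, limit):
    """Stand-in for get_records: return one slice of the table."""
    return records[offset:offset + limit]

table = list(range(53))   # pretend table of 53 records
limit = 10
offset = 0
pages = []
while True:
    page = fetch_page(table, offset, limit)
    pages.append(page)
    if len(page) < limit:  # last (possibly partial) page reached
        break
    offset += limit

total = sum(len(p) for p in pages)
```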
Filters offer an easy way to reduce a larger table into more concise views.
""" Filter records where column x is less than 0, i.e., cities in the
western hemisphere, and store the filter in a view """
response = h_db.filter(table_name=weather_table, view_name=weather_w_view, expression="x < 0")
print "Number of records in the western hemisphere: {}".format(response["count"])
""" Filter records where column x is less than 0 and column y is greater
than 0, i.e., cities in the northwestern semi-hemisphere, and store
the filter in a view """
response = h_db.filter(table_name=weather_table, view_name=weather_nw_view, expression="x < 0 and y > 0")
print "Number of records in the northwestern semi-hemisphere: {}".format(response["count"])
""" Filter records using the same expressions as Example 2, but using
query chaining this time """
# Clear the view, as it will be recreated with chained query
h_db.clear_table(weather_nw_view)
response = h_db.filter(table_name=weather_w_view, view_name=weather_nw_view, expression="y > 0")
print "Number of records in the northwestern semi-hemisphere (with query chaining): {}".format(response["count"])
""" Filter by list where country name is USA, Brazil, or Australia """
country_map = {"country": ["USA", "Brazil", "Australia"]}
response = h_db.filter_by_list(table_name=weather_table, view_name=weather_country_view, column_values_map=country_map)
print "Number of records where country name is USA, Brazil, or Australia: {}".format(response["count"])
""" Filter by range cities that are east of GMT (the Prime Meridian) """
response = h_db.filter_by_range(table_name=weather_table, view_name=weather_e_view, column_name="x", lower_bound=0, upper_bound=180)
print "Number of records that are east of the Prime Meridian (x > 0): {}".format(response["count"])
Kinetica supports various aggregate and group-by queries, which group and aggregate your data to return counts and useful statistics.
""" Aggregate count, min, mean, and max on the average temperature """
response = h_db.aggregate_statistics(table_name=weather_table, column_name="avg_temp", stats="count,min,max,mean")
print "Statistics of values in the average temperature column:"
print "\tCount: {count:.0f}\n\tMin: {min:4.2f}\n\tMean: {mean:4.2f}\n\tMax: {max:4.2f}\n".format(**response["stats"])
""" Find unique city names """
response = h_db.aggregate_unique(table_name=weather_table, column_name="city", offset=0, limit=25, encoding="json")
print "Unique city names:"
weatherLocs = h_db.parse_dynamic_response(response)['response']['city']
for weatherLoc in weatherLocs:
print "\t* {}".format(weatherLoc)
""" Find number of weather locations per country in the northwestern
semi-hemisphere; use binary decoding """
response = h_db.aggregate_group_by(table_name=weather_nw_view, column_names=["country"], offset=0, limit=25, encoding="binary")
countries = gpudb.GPUdbRecord.decode_binary_data(response["response_schema_str"], response["binary_encoded_response"])
print "Weather locations per country in the northwest semi-hemisphere:"
for country in zip(countries["column_1"], countries["column_2"]):
print "\t{:<10s}{:2d}".format(country[0] + ":", country[1])
""" Filter table to southeastern semi-hemisphere records, group by country,
and aggregate min, max, and mean on the average temperature; use JSON
decoding """
h_db.filter(table_name=weather_table, view_name=weather_se_view, expression="x > 0 and y < 0")
response = h_db.aggregate_group_by(table_name=weather_se_view, column_names=["country", "min(avg_temp)", "max(avg_temp)", "mean(avg_temp)"], offset=0, limit=25, encoding="json")
data = h_db.parse_dynamic_response(response)['response']
print "{:<20s} {:^5s} {:^5s} {:^5s}".format("SE Semi-Hemi Country", "Min", "Mean", "Max")
print "{:=<20s} {:=<5s} {:=<5s} {:=<5s}".format("", "", "", "")
for countryWeather in zip(data["country"], data["min(avg_temp)"], data["mean(avg_temp)"], data["max(avg_temp)"]):
print "{:<20s} {:5.2f} {:5.2f} {:5.2f}".format(*countryWeather)
""" Filter for southern hemisphere cities and create a histogram for the
average temperature of those cities (divided into every 10 degrees,
e.g., 40s, 50s, 60s, etc.) """
h_db.filter(table_name=weather_table, view_name=weather_h_view, expression="y < 0")
histogram_result = h_db.aggregate_histogram(table_name=weather_h_view, column_name="avg_temp", start=40, end=90, interval=10)
print "Number of southern hemisphere cities with average temps in the given ranges:"
for histogroup in zip([40, 50, 60, 70, 80], histogram_result['counts']):
print "\t{}s: {:2.0f}".format(*histogroup)
You can delete records from a table using filter expressions. This method accepts multiple filter expressions; note that each expression is applied independently of the others (i.e., a record need only match any one expression's criteria to be deleted from the table).
""" Filter for cities that are either south of latitude -50 or west of
longitude -50 to determine how many records will be deleted; delete
the records, then confirm the deletion by refiltering """
deleteExpression = "x < -50 or y < -50"
delete_filter_result = h_db.filter(table_name=weather_table, expression=deleteExpression)
print "Number of records that meet deletion criteria before deleting: {}".format(delete_filter_result['count'])
h_db.delete_records(table_name = weather_table, expressions=[deleteExpression])
delete_filter_result = h_db.filter(table_name=weather_table, expression=deleteExpression)
print "Number of records that meet deletion criteria after deleting: {}".format(delete_filter_result['count'])
Included below is a complete sample program containing all the above queries:
"""This script walks through how to use the Python API.
Covered here: importing GPUdb, instantiating Kinetica, creating a type,
creating a table, inserting records, retrieving records, filtering records,
aggregating/grouping records, and deleting records.
"""
import collections
import json
import gpudb
def gpudb_example():
print "TUTORIAL OUTPUT"
print "===============\n"
# all tables/views used in examples below
weather_table = "weather"
weather_w_view = "weather_west"
weather_nw_view = "weather_northwest"
weather_country_view = "weather_country"
weather_e_view = "weather_east"
weather_se_view = "weather_southeast"
weather_h_view = "weather_histogram"
""" Establish connection with a locally-running instance of Kinetica,
using binary encoding to save memory """
h_db = gpudb.GPUdb(encoding='BINARY', host='127.0.0.1', port='9191')
print
print "CREATING A TYPE & TABLE"
print "-----------------------"
print
""" Create columns; column arguments consist of name then type, e.g.,
gpudb.GPUdbRecordColumn( "column_name", column_type, [column_property1,
column_property2] ) """
columns = []
columns.append(gpudb.GPUdbRecordColumn("city", gpudb.GPUdbRecordColumn._ColumnType.STRING, [gpudb.GPUdbColumnProperty.CHAR16]))
columns.append(gpudb.GPUdbRecordColumn("state_province", gpudb.GPUdbRecordColumn._ColumnType.STRING, [gpudb.GPUdbColumnProperty.CHAR32]))
columns.append(gpudb.GPUdbRecordColumn("country", gpudb.GPUdbRecordColumn._ColumnType.STRING, [gpudb.GPUdbColumnProperty.CHAR16]))
columns.append(gpudb.GPUdbRecordColumn("x", gpudb.GPUdbRecordColumn._ColumnType.DOUBLE))
columns.append(gpudb.GPUdbRecordColumn("y", gpudb.GPUdbRecordColumn._ColumnType.DOUBLE))
columns.append(gpudb.GPUdbRecordColumn("avg_temp", gpudb.GPUdbRecordColumn._ColumnType.DOUBLE))
columns.append(gpudb.GPUdbRecordColumn("time_zone", gpudb.GPUdbRecordColumn._ColumnType.STRING, [gpudb.GPUdbColumnProperty.CHAR8]))
# Create the type object
weather_record_type = gpudb.GPUdbRecordType(columns, label="weather_record_type")
""" Create the type in the database and save the type ID, needed to create
a table in the next step """
weather_record_type.create_type(h_db)
weather_type_id = weather_record_type.type_id
# Create the table from the type
if h_db.has_table(table_name=weather_table)['table_exists']:
h_db.clear_table(weather_table)
response = h_db.create_table(table_name=weather_table, type_id=weather_type_id)
print "Table created: {}".format(response['status_info']['status'])
print "\n"
print "INSERTING DATA"
print "--------------"
print
# Insert single record example
# Create ordered dictionary for keys & values of record
datum = collections.OrderedDict()
datum["city"] = "Washington, D.C."
datum["state_province"] = "--"
datum["country"] = "USA"
datum["x"] = -77.016389
datum["y"] = 38.904722
datum["avg_temp"] = 58.5
datum["time_zone"] = "UTC-5"
# Encode record and put into a single element list
single_record = [gpudb.GPUdbRecord(weather_record_type, datum).binary_data]
# Insert the record into the table
response = h_db.insert_records(table_name=weather_table, data=single_record, list_encoding="binary", options={})
print "Number of single records inserted: {}".format(response["count_inserted"])
# Insert multiple records example
# Create a list of encoded in-line records
encoded_obj_list = []
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Paris", "TX", "USA", -95.547778, 33.6625, 64.6, "UTC-6"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Memphis", "TN", "USA", -89.971111, 35.1175, 63, "UTC-6"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Sydney", "Nova Scotia", "Canada", -60.19551, 46.13631, 44.5, "UTC-4"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["La Paz", "Baja California Sur", "Mexico", -110.310833, 24.142222, 77, "UTC-7"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["St. Petersburg", "FL", "USA", -82.64, 27.773056, 74.5, "UTC-5"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Oslo", "--", "Norway", 10.75, 59.95, 45.5, "UTC+1"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Paris", "--", "France", 2.3508, 48.8567, 56.5, "UTC+1"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Memphis", "--", "Egypt", 31.250833, 29.844722, 73, "UTC+2"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["St. Petersburg", "--", "Russia", 30.3, 59.95, 43.5, "UTC+3"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Lagos", "Lagos", "Nigeria", 3.384082, 6.455027, 83, "UTC+1"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["La Paz", "Pedro Domingo Murillo", "Bolivia", -68.15, -16.5, 44, "UTC-4"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Sao Paulo", "Sao Paulo", "Brazil", -46.633333, -23.55, 69.5, "UTC-3"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Santiago", "Santiago Province", "Chile", -70.666667, -33.45, 62, "UTC-4"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Buenos Aires", "--", "Argentina", -58.381667, -34.603333, 65, "UTC-3"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Manaus", "Amazonas", "Brazil", -60.016667, -3.1, 83.5, "UTC-4"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Sydney", "New South Wales", "Australia", 151.209444, -33.865, 63.5, "UTC+10"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Auckland", "--", "New Zealand", 174.74, -36.840556, 60.5, "UTC+12"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Jakarta", "--", "Indonesia", 106.816667, -6.2, 83, "UTC+7"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Hobart", "--", "Tasmania", 147.325, -42.880556, 56, "UTC+10"]).binary_data)
encoded_obj_list.append(gpudb.GPUdbRecord(weather_record_type, ["Perth", "Western Australia", "Australia", 115.858889, -31.952222, 68, "UTC+8"]).binary_data)
# Insert the records into the table
response = h_db.insert_records(table_name=weather_table, data=encoded_obj_list, list_encoding="binary", options={})
print "Number of batch records inserted: {}".format(response["count_inserted"])
print "\n"
print "RETRIEVING DATA"
print "---------------"
print
""" Retrieve no more than 10 records as JSON from weather_table. Note that
records are stringified and have to be parsed. """
weatherLocs = h_db.get_records(table_name=weather_table, offset=0, limit=10, encoding="json", options={"sort_by":"city"})['records_json']
print "{:<20s} {:<25s} {:<15s} {:<10s} {:<11s} {:<9s} {:<8s}".format("City","State/Province","Country","Latitude","Longitude","Avg. Temp","Time Zone")
print "{:=<20s} {:=<25s} {:=<15s} {:=<10s} {:=<11s} {:=<9s} {:=<9s}".format("", "", "", "", "", "", "")
for weatherLoc in weatherLocs:
print "{city:<20s} {state_province:<25s} {country:<15s} {y:10.6f} {x:11.6f} {avg_temp:9.1f} {time_zone}".format(**json.loads(weatherLoc))
""" Retrieve no more than 25 of the remaining records as binary from weather
table. Note that records are binary and have to be decoded. """
response = h_db.get_records(table_name=weather_table, offset=10, limit=25, encoding="binary", options={"sort_by":"city"})
weatherLocs = gpudb.GPUdbRecord.decode_binary_data(response["type_schema"], response["records_binary"])
for weatherLoc in weatherLocs:
print "{city:<20s} {state_province:<25s} {country:<15s} {y:10.6f} {x:11.6f} {avg_temp:9.1f} {time_zone}".format(**weatherLoc)
""" Note that total_number_of_records does not reflect offset/limit; it's
the count of all records or those which match the given expression """
print "\nNumber of records in new table: {:d}".format(response["total_number_of_records"])
print "\n"
print "FILTERING"
print "---------"
print
### Filter Example 1
""" Filter records where column x is less than 0, i.e., cities in the
western hemisphere, and store the filter in a view """
response = h_db.filter(table_name=weather_table, view_name=weather_w_view, expression="x < 0")
print "Number of records in the western hemisphere: {}".format(response["count"])
### Filter Example 2
""" Filter records where column x is less than 0 and column y is greater
than 0, i.e., cities in the northwestern semi-hemisphere, and store
the filter in a view """
response = h_db.filter(table_name=weather_table, view_name=weather_nw_view, expression="x < 0 and y > 0")
print "Number of records in the northwestern semi-hemisphere: {}".format(response["count"])
### Filter Example 3
""" Filter records using the same expressions as Example 2, but using
query chaining this time """
# Clear the view, as it will be recreated with chained query
h_db.clear_table(weather_nw_view)
response = h_db.filter(table_name=weather_w_view, view_name=weather_nw_view, expression="y > 0")
print "Number of records in the northwestern semi-hemisphere (with query chaining): {}".format(response["count"])
### Filter Example 4
""" Filter by list where country name is USA, Brazil, or Australia """
country_map = {"country": ["USA", "Brazil", "Australia"]}
response = h_db.filter_by_list(table_name=weather_table, view_name=weather_country_view, column_values_map=country_map)
print "Number of records where country name is USA, Brazil, or Australia: {}".format(response["count"])
### Filter Example 5
""" Filter by range cities that are east of GMT (the Prime Meridian) """
response = h_db.filter_by_range(table_name=weather_table, view_name=weather_e_view, column_name="x", lower_bound=0, upper_bound=180)
print "Number of records that are east of the Prime Meridian (x > 0): {}".format(response["count"])
print "\n"
print "AGGREGATING, GROUPING, and HISTOGRAMS"
print "-------------------------------------"
print
### Aggregate Example 1
""" Aggregate count, min, mean, and max on the average temperature """
response = h_db.aggregate_statistics(table_name=weather_table, column_name="avg_temp", stats="count,min,max,mean")
print "Statistics of values in the average temperature column:"
print "\tCount: {count:.0f}\n\tMin: {min:4.2f}\n\tMean: {mean:4.2f}\n\tMax: {max:4.2f}\n".format(**response["stats"])
### Aggregate Example 2
""" Find unique city names """
response = h_db.aggregate_unique(table_name=weather_table, column_name="city", offset=0, limit=25, encoding="json")
print "Unique city names:"
weatherLocs = h_db.parse_dynamic_response(response)['response']['city']
for weatherLoc in weatherLocs:
print "\t* {}".format(weatherLoc)
print
### Aggregate Example 3
""" Find number of weather locations per country in the northwestern
semi-hemisphere; use binary decoding """
response = h_db.aggregate_group_by(table_name=weather_nw_view, column_names=["country"], offset=0, limit=25, encoding="binary")
countries = gpudb.GPUdbRecord.decode_binary_data(response["response_schema_str"], response["binary_encoded_response"])
print "Weather locations per country in the northwest semi-hemisphere:"
for country in zip(countries["column_1"], countries["column_2"]):
print "\t{:<10s}{:2d}".format(country[0] + ":", country[1])
print
### Aggregate Example 4
""" Filter table to southeastern semi-hemisphere records, group by country,
and aggregate min, max, and mean on the average temperature; use JSON
decoding """
h_db.filter(table_name=weather_table, view_name=weather_se_view, expression="x > 0 and y < 0")
response = h_db.aggregate_group_by(table_name=weather_se_view, column_names=["country", "min(avg_temp)", "max(avg_temp)", "mean(avg_temp)"], offset=0, limit=25, encoding="json")
data = h_db.parse_dynamic_response(response)['response']
print "{:<20s} {:^5s} {:^5s} {:^5s}".format("SE Semi-Hemi Country", "Min", "Mean", "Max")
print "{:=<20s} {:=<5s} {:=<5s} {:=<5s}".format("", "", "", "")
for countryWeather in zip(data["country"], data["min(avg_temp)"], data["mean(avg_temp)"], data["max(avg_temp)"]):
print "{:<20s} {:5.2f} {:5.2f} {:5.2f}".format(*countryWeather)
print
### Aggregate Example 5
""" Filter for southern hemisphere cities and create a histogram for the
average temperature of those cities (divided into every 10 degrees,
e.g., 40s, 50s, 60s, etc.) """
h_db.filter(table_name=weather_table, view_name=weather_h_view, expression="y < 0")
histogram_result = h_db.aggregate_histogram(table_name=weather_h_view, column_name="avg_temp", start=40, end=90, interval=10)
print "Number of southern hemisphere cities with average temps in the given ranges:"
for histogroup in zip([40, 50, 60, 70, 80], histogram_result['counts']):
print "\t{}s: {:2.0f}".format(*histogroup)
print "\n"
print "DELETING DATA"
print "-------------"
print
""" Filter for cities that are either south of latitude -50 or west of
longitude -50 to determine how many records will be deleted; delete
the records, then confirm the deletion by refiltering """
deleteExpression = "x < -50 or y < -50"
delete_filter_result = h_db.filter(table_name=weather_table, expression=deleteExpression)
print "Number of records that meet deletion criteria before deleting: {}".format(delete_filter_result['count'])
h_db.delete_records(table_name = weather_table, expressions=[deleteExpression])
delete_filter_result = h_db.filter(table_name=weather_table, expression=deleteExpression)
print "Number of records that meet deletion criteria after deleting: {}".format(delete_filter_result['count'])
# end gpudb_example()
if __name__ == '__main__':
gpudb_example()
After running the above sample script, the following is output:
TUTORIAL OUTPUT
===============
CREATING A TYPE & TABLE
-----------------------
Table created: OK
INSERTING DATA
--------------
Number of single records inserted: 1
Number of batch records inserted: 20
RETRIEVING DATA
---------------
City State/Province Country Latitude Longitude Avg. Temp Time Zone
==================== ========================= =============== ========== =========== ========= =========
Auckland -- New Zealand -36.840556 174.740000 60.5 UTC+12
Buenos Aires -- Argentina -34.603333 -58.381667 65.0 UTC-3
Hobart -- Tasmania -42.880556 147.325000 56.0 UTC+10
Jakarta -- Indonesia -6.200000 106.816667 83.0 UTC+7
La Paz Baja California Sur Mexico 24.142222 -110.310833 77.0 UTC-7
La Paz Pedro Domingo Murillo Bolivia -16.500000 -68.150000 44.0 UTC-4
Lagos Lagos Nigeria 6.455027 3.384082 83.0 UTC+1
Manaus Amazonas Brazil -3.100000 -60.016667 83.5 UTC-4
Memphis TN USA 35.117500 -89.971111 63.0 UTC-6
Memphis -- Egypt 29.844722 31.250833 73.0 UTC+2
Oslo -- Norway 59.950000 10.750000 45.5 UTC+1
Paris TX USA 33.662500 -95.547778 64.6 UTC-6
Paris -- France 48.856700 2.350800 56.5 UTC+1
Perth Western Australia Australia -31.952222 115.858889 68.0 UTC+8
Santiago Santiago Province Chile -33.450000 -70.666667 62.0 UTC-4
Sao Paulo Sao Paulo Brazil -23.550000 -46.633333 69.5 UTC-3
St. Petersburg FL USA 27.773056 -82.640000 74.5 UTC-5
St. Petersburg -- Russia 59.950000 30.300000 43.5 UTC+3
Sydney Nova Scotia Canada 46.136310 -60.195510 44.5 UTC-4
Sydney New South Wales Australia -33.865000 151.209444 63.5 UTC+10
Washington, D.C. -- USA 38.904722 -77.016389 58.5 UTC-5
Number of records in new table: 21
FILTERING
---------
Number of records in the western hemisphere: 11
Number of records in the northwestern semi-hemisphere: 6
Number of records in the northwestern semi-hemisphere (with query chaining): 6
Number of records where country name is USA, Brazil, or Australia: 8
Number of records that are east of the Prime Meridian (x > 0): 10
AGGREGATING, GROUPING, and HISTOGRAMS
-------------------------------------
Statistics of values in the average temperature column:
Count: 21
Min: 43.50
Mean: 63.74
Max: 83.50
Unique city names:
* Auckland
* Buenos Aires
* Hobart
* Jakarta
* La Paz
* Lagos
* Manaus
* Memphis
* Oslo
* Paris
* Perth
* Santiago
* Sao Paulo
* St. Petersburg
* Sydney
* Washington, D.C.
Weather locations per country in the northwest semi-hemisphere:
Canada: 1
Mexico: 1
USA: 4
SE Semi-Hemi Country Min Mean Max
==================== ===== ===== =====
Tasmania 56.00 56.00 56.00
New Zealand 60.50 60.50 60.50
Australia 63.50 65.75 68.00
Indonesia 83.00 83.00 83.00
Number of southern hemisphere cities with average temps in the given ranges:
40s: 1
50s: 1
60s: 6
70s: 0
80s: 2
DELETING DATA
-------------
Number of records that meet deletion criteria before deleting: 10
Number of records that meet deletion criteria after deleting: 0