Version:

API


Dynamic Schemas

For some endpoints (e.g. /aggregate/groupby, /aggregate/unique, /get/records/bycolumn), the response from Kinetica is formatted depending on the request and the data being returned. This is known as a dynamic schema. The response will contain an Avro schema string (in the response_schema_str field) and then either an Avro binary encoded response (in the binary_encoded_response field) or an Avro JSON encoded response (in the json_encoded_response field).

For dynamic schema responses, the response always consists of a series of arrays of various types, corresponding to the columns being returned. These arrays are labeled column_1, column_2, etc. In addition there is a string array column named column_headers containing the names or aliases of the returned columns. The first name in column_headers corresponds to column_1, the second to column_2 and so on. Lastly, there is a string array column named column_datatypes containing the data types of the returned columns. These correspond to the returned data columns in the same way column_headers do.

Dynamic Schemas in Python

As an example, using Python, consider doing an aggregate_groupby on a string column named dept_name, computing the max of a integer column named sal:

retobj = gpudb.aggregate_group_by(
             table_name = 'emp',
             column_names = ['dept_name','max(sal)'],
             offset = 0,
             limit = 10,
             encoding = 'json',
             options = {'sort_by':'key'}
         )

In this case, retobj['response_schema_str'] will look like:

{
    "fields": [
        {
            "name": "column_1",
            "type": {
                "items": "string",
                "type": "array"
            }
        },
        {
            "name": "column_2",
            "type": {
                "items": [
                    "int",
                    "null"
                ],
                "type": "array"
            }
        },
        {
            "name": "column_headers",
            "type": {
                "items": "string",
                "type": "array"
            }
        },
        {
            "name": "column_datatypes",
            "type": {
                "items": "string",
                "type": "array"
            }
        }
    ],
    "name": "generic_response",
    "type": "record"
}

And the actual data response, retobj['json_encoded_response'], will look like:

{
    "column_1": [
        "DEPT1",
        "DEPT2",
        "DEPT3",
        "DEPT4"
    ],
    "column_2": [
        200000,
        150000,
        110000,
        12345
    ],
    "column_datatypes": [
        "char64",
        "int"
    ],
    "column_headers": [
        "dept_name",
        "max(sal)"
    ]
}

The enumerated columns can be translated into their textual names using the gpudb.parse_dynamic_response(retobj)['response'] function, resulting in this:

OrderedDict
(
    [
        (u'dept_name', [u'DEPT1', u'DEPT2', u'DEPT3', u'DEPT4']),
        (u'max(sal)', [200000, 150000, 110000, 12345])
    ]
)

Dynamic Schemas in Java

Dynamic schemas are simpler to handle in Java. The Record, Type, and Column classes take care of the lower level details and provide a simple way to access the data. Once the records are returned by your query, you can get the type of the record and all the column details from the type. Below is an example of a query, iterating through the records, and iterating through each column in each record.:

AggregateGroupByResponse resp1 = gpudb.aggregateGroupBy(aggGroupByRequest);
List<Record> records = resp1.getData();
for(Record rec : records){
    Type type = rec.getType();

    List<Type.Column> columns = type.getColumns();
    for(int i=0;i<columns.size();i++){
        System.out.printf("%s:%s, ", columns.get(i).getName() ,rec.get(i));
    }
    System.out.println();
}