Class GPUdbTable
- class gpudb.GPUdbTable(_type=None, name=None, options=None, db=None, read_only_table_count=None, delete_temporary_views=True, temporary_view_names=None, create_views=True, use_multihead_io=False, use_multihead_ingest=False, multihead_ingest_batch_size=10000, flush_multi_head_ingest_per_insertion=False, convert_special_types_on_retrieval=False)[source]
Parameters
- _type (RecordType or GPUdbRecordType or list of lists of str) –
Either a GPUdbRecordType or RecordType object which represents a type for the table, or a nested list of lists, where each internal list has one of the following formats:
# Just the name and type
[ "name", "type (double, int etc.)" ]
# Name, type, and one column property
[ "name", "type (double, int etc.)", "nullable" ]
# Name, type, and multiple column properties
[ "name", "string", "char4", "nullable" ]
Pass None for collections. If creating a GPUdbTable object for a pre-existing table, then also pass None.
If no table with the given name exists, then the given type will be created in GPUdb before creating the table.
Default is None.
- name (str) –
The name for the table. If none is provided, a random name will be generated using random_name(). The name may contain the schema name (separated by a period). Alternatively, if the table name has no schema name but a collection name is specified via the options, that collection name will be treated as the schema name. Do not specify both a schema name in this argument and a collection name in the options. The fully qualified version of the table name, i.e. ‘schema_name.table_name’, will be used for all endpoint calls internally.
- options (GPUdbTableOptions or dict) –
A GPUdbTableOptions object or a dict containing options for the table creation.
- db (GPUdb) –
A GPUdb object that allows the user to connect to the GPUdb server.
- read_only_table_count (int) –
For known read-only tables, the number of records in the table. If this is given, the name of the table must also be provided.
- delete_temporary_views (bool) –
If True, then terminal queries (queries that cannot be chained) delete the temporary views upon completion. Default is True.
- create_views (bool) –
Indicates whether or not to create views for this table.
- temporary_view_names (list) –
Optional list of temporary view names (that ought to be deleted upon terminal queries)
- use_multihead_io (bool) –
Indicates whether or not to use multi-head input and output (meaning ingestion and lookup). Default is False. Note that multi-head ingestion is more computation intensive for sharded tables, and it is probably advisable only if there is a heavy ingestion load. Choose carefully.
Please see documentation of parameters multihead_ingest_batch_size and flush_multi_head_ingest_per_insertion for controlling the multi-head ingestion related behavior.
- use_multihead_ingest (bool) –
Indicates whether or not to use multi-head ingestion, if available upon insertion. Note that multi-head ingestion is more computation intensive for sharded tables, and it is probably advisable only if there is a heavy ingestion load. Default is False. Will be deprecated in version 7.0.
- multihead_ingest_batch_size (int) –
Used only in conjunction with use_multihead_ingest; ignored otherwise. Sets the batch size to be used for the ingestor. Must be greater than zero. Default is 10,000. The multi-head ingestor automatically flushes the inserted records each time a worker queue reaches multihead_ingest_batch_size records. Any remaining records must be flushed manually using flush_data_to_server(), or are flushed automatically on every insert_records() call if flush_multi_head_ingest_per_insertion is True.
- flush_multi_head_ingest_per_insertion (bool) –
Used only in conjunction with use_multihead_ingest; ignored otherwise. If True, flushes the multi-head ingestor in every insert_records() call. Otherwise, the multi-head ingestor flushes the data to the server when a worker queue reaches multihead_ingest_batch_size in size, and any remaining records will have to be manually flushed using flush_data_to_server(). Default is False.
- convert_special_types_on_retrieval (bool) –
Convert array types to lists and JSON types to dicts during retrieval. Default is False.
Returns
A GPUdbTable object.
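The nested-list form of _type can be illustrated with a small, standalone sketch. The column names ("id", "price", "code") and the helper check_type_definition are hypothetical and are not part of the gpudb package; the helper only checks the shape described above.

```python
# Hypothetical column layout; the names are illustrative only.
columns = [
    ["id", "int"],                            # just the name and type
    ["price", "double", "nullable"],          # name, type, and one column property
    ["code", "string", "char4", "nullable"],  # name, type, and multiple properties
]


def check_type_definition(definition):
    """Return True if every entry looks like [name, type, *properties] of str.

    This is an assumed sanity check, not a function provided by gpudb.
    """
    return all(
        isinstance(col, list)
        and len(col) >= 2
        and all(isinstance(item, str) for item in col)
        for col in definition
    )


is_valid = check_type_definition(columns)
```

Assuming a reachable server and an existing GPUdb connection object db, such a list would presumably be passed as GPUdbTable(_type=columns, name="example_table", db=db), where "example_table" is a made-up name.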
- static random_name()[source]
Returns a randomly generated UUID-based name, with underscores in place of hyphens.
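A name of this kind can be approximated with the standard library. The function below is a sketch of the described behavior (a UUID with hyphens swapped for underscores), not the library's actual implementation; random_name_sketch is a hypothetical name.

```python
import uuid


def random_name_sketch():
    """Return a UUID-based name with underscores in place of hyphens."""
    # str(uuid.uuid4()) contains four hyphens; replace each with an underscore.
    return str(uuid.uuid4()).replace("-", "_")


name = random_name_sketch()
```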
- set_logger_level(log_level)[source]
Set the log level for the GPUdbTable class and any multi-head i/o related classes it uses.
Parameters
- log_level (int, long, or str) –
A valid log level for the logging module
- property table_name
Return user given name for this table (or the randomly generated one, if applicable).
- property name
Return the user-given name for this table (or the randomly generated one, if applicable). The qualified version of the name is returned.
- property qualified_table_name
Return the fully qualified name for this table, including any schemas.
- property is_read_only
Indicates whether the table is read-only or can be modified.
- property count
Return the table’s size/length/count.
- property is_collection
Returns True if the table is a collection; False otherwise.
- property collection_name
Returns the name of the collection this table is a member of; None if this table does not belong to any collection.
- get_table_type()[source]
Return the table’s (record) type (the GPUdbRecordType object, not the c-extension RecordType).
- alias(alias)[source]
Create an alias string for this table.
Parameters
- alias (str) –
A string that contains the alias.
Returns
A string with the format “this-table-name as alias”.
- create_view(view_name, count=None)[source]
Given a view name and a related response, create a new GPUdbTable object which is a read-only table with the intermediate tables automatically updated.
Returns
A GPUdbTable object.
- cleanup()[source]
Clear/drop all intermediate tables if settings allow it.
Returns
self for enabling chaining method invocations.
- exists(options={})[source]
Checks for the existence of a table with the given name.
Returns
A boolean flag indicating whether the table currently exists in the database.
- flush_data_to_server()[source]
If multi-head ingestion is enabled, then flush all records in the ingestors’ worker queues so that they actually get inserted to the server database.
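The interaction between batch-size flushing, per-insertion flushing, and a manual flush can be sketched with a local stand-in. BatchingQueueSketch is hypothetical: it uses a single in-memory queue instead of per-worker queues and a list instead of a server, so it shows only the flushing rules described for multihead_ingest_batch_size and flush_multi_head_ingest_per_insertion.

```python
class BatchingQueueSketch:
    """Simplified, single-queue model of the documented flushing behavior."""

    def __init__(self, batch_size=10000, flush_per_insertion=False):
        if batch_size <= 0:
            raise ValueError("batch_size must be greater than zero")
        self.batch_size = batch_size
        self.flush_per_insertion = flush_per_insertion
        self.pending = []  # records queued locally
        self.sent = []     # stand-in for records delivered to the server

    def insert_records(self, records):
        for record in records:
            self.pending.append(record)
            # Flush automatically whenever the queue reaches the batch size.
            if len(self.pending) >= self.batch_size:
                self.flush_data_to_server()
        # Optionally flush whatever is left at the end of every call.
        if self.flush_per_insertion:
            self.flush_data_to_server()

    def flush_data_to_server(self):
        self.sent.extend(self.pending)
        self.pending.clear()


queue = BatchingQueueSketch(batch_size=3)
queue.insert_records([[1], [2], [3], [4]])
sent_before_flush = len(queue.sent)  # one full batch has been flushed
queue.flush_data_to_server()
sent_after_flush = len(queue.sent)   # the remaining record is now delivered
```

In this model, forgetting the final manual flush leaves the fourth record in the local queue, which is the situation the documentation warns about when per-insertion flushing is off.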
- static convert_special_type_values_in_insert_records(record_type: GPUdbRecordType, records: list | dict | List[Record])[source]
Convert any special value type (array, json, vector etc.) suitably
Parameters
- record_type (GPUdbRecordType) –
the record type for these ‘records’
- records (Union[Union[ list, dict], List[Record]]) –
A single record value (list or dict)
Returns
- object –
the converted value
- static convert_special_type_values_in_get_records(record_type: GPUdbRecordType, records: list | dict | List[Record])[source]
Convert any special value type (array, json, vector etc.) suitably
Parameters
- record_type (GPUdbRecordType) –
the record type for these ‘records’
- records (Union[Union[ list, dict], List[Record]]) –
A single record value (list or dict)
Returns
- object –
the converted value
- insert_records(*args, **kwargs)[source]
Insert one or more records.
Parameters
- args –
Values for all columns of a single record or multiple records. For a single record, use either of the following syntaxes:
insert_records( 1, 2, 3 )
insert_records( [1, 2, 3] )
For multiple records, use either of the following syntaxes:
insert_records( [ [1, 2, 3], [4, 5, 6] ] )
insert_records( [1, 2, 3], [4, 5, 6] )
Also, the user can pass in values keyed by column name, using dicts:
# For a record type with two integers named 'a' and 'b':
insert_records( {"a": 1, "b": 1}, {"a": 42, "b": 32} )
# Also can use a list to pass the dicts
insert_records( [ {"a": 1, "b": 1}, {"a": 42, "b": 32} ] )
Additionally, the user may provide options for the insertion operation. For example:
insert_records( [1, 2, 3], [4, 5, 6], options = {"return_record_ids": "true"} )
- kwargs –
Values for all columns for a single record. Mutually exclusive with args (i.e. cannot provide both) when it only contains data.
May contain an ‘options’ keyword arg which will be passed to the database for the insertion operation.
Returns
A GPUdbTable object with the insert_records() response fields converted to attributes and stored within.
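The accepted positional-argument shapes can be summarized by a small normalizer that maps each calling convention to a list of records. normalize_records is a hypothetical helper written for illustration; it is not how the library parses its arguments, and it cannot distinguish a single record whose column values are themselves all lists from a list of records.

```python
def normalize_records(*args):
    """Map the documented calling conventions to a list of records."""
    # One argument that is a non-empty list of lists/dicts: a list of records.
    if (
        len(args) == 1
        and isinstance(args[0], list)
        and args[0]
        and all(isinstance(item, (list, dict)) for item in args[0])
    ):
        return list(args[0])
    # Every argument is a list or dict: each argument is one record.
    if all(isinstance(arg, (list, dict)) for arg in args):
        return list(args)
    # Otherwise the arguments are the column values of a single record.
    return [list(args)]


single_a = normalize_records(1, 2, 3)
single_b = normalize_records([1, 2, 3])
multi_a = normalize_records([[1, 2, 3], [4, 5, 6]])
multi_b = normalize_records([1, 2, 3], [4, 5, 6])
dicts = normalize_records({"a": 1, "b": 1}, {"a": 42, "b": 32})
```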
- insert_records_random(count=None, options={})[source]
Generates a specified number of random records and adds them to the given table. There is an optional parameter that allows the user to customize the ranges of the column values. It also allows the user to specify linear profiles for some or all columns in which case linear values are generated rather than random ones. Only individual tables are supported for this operation.
This operation is synchronous, meaning that a response will not be returned until all random records are fully available.
Parameters
- count (long) –
Number of records to generate.
- options (dict of dicts of floats) –
Optional parameter to pass in specifications for the randomness of the values. This map is different from the options parameter of most other endpoints in that it is a map of string to map of string to doubles, while most others are maps of string to string. In this map, the top level keys represent which column’s parameters are being specified, while the internal keys represents which parameter is being specified. These parameters take on different meanings depending on the type of the column. Below follows a more detailed description of the map: Default value is an empty dict ( {} ). Allowed keys are:
seed – If provided, the internal random number generator will be initialized with the given value. The minimum is 0. This allows for the same set of random numbers to be generated across invocation of this endpoint in case the user wants to repeat the test. Since input parameter options, is a map of maps, we need an internal map to provide the seed value. For example, to pass 100 as the seed value through this parameter, you need something equivalent to: ‘options’ = {‘seed’: { ‘value’: 100 } } Allowed keys are:
value – Pass the seed value here.
all – This key indicates that the specifications relayed in the internal map are to be applied to all columns of the records. Allowed keys are:
min – For numerical columns, the minimum of the generated values is set to this value. Default is -99999. For point, shape, and track semantic types, min for numeric ‘x’ and ‘y’ columns needs to be within [-180, 180] and [-90, 90], respectively. The default minimum possible values for these columns in such cases are -180.0 and -90.0. For the ‘TIMESTAMP’ column, the default minimum corresponds to Jan 1, 2010. For string columns, the minimum length of the randomly generated strings is set to this value (default is 0); the value needs to be within [0, 200]. If both minimum and maximum are provided, minimum must be less than or equal to max. If the min is outside the accepted ranges for string columns and ‘x’ and ‘y’ columns for point/shape/track types, then those parameters will not be set; however, an error will not be thrown in such a case. It is the responsibility of the user to use the all parameter judiciously.
max – For numerical columns, the maximum of the generated values is set to this value. Default is 99999. For point, shape, and track semantic types, max for numeric ‘x’ and ‘y’ columns needs to be within [-180, 180] and [-90, 90], respectively. The default maximum possible values for these columns in such cases are 180.0 and 90.0. For string columns, the maximum length of the randomly generated strings is set to this value (default is 200); the value needs to be within [0, 200]. If both minimum and maximum are provided, max must be greater than or equal to min. If the max is outside the accepted ranges for string columns and ‘x’ and ‘y’ columns for point/shape/track types, then those parameters will not be set; however, an error will not be thrown in such a case. It is the responsibility of the user to use the all parameter judiciously.
interval – If specified, generate values for all columns evenly spaced with the given interval value. If a max value is specified for a given column the data is randomly generated between min and max and decimated down to the interval. If no max is provided the data is linearly generated starting at the minimum value (instead of generating random data). For non-decimated string-type columns the interval value is ignored. Instead the values are generated following the pattern: ‘attrname_creationIndex#’, i.e. the column name suffixed with an underscore and a running counter (starting at 0). For string types with limited size (e.g., char4) the prefix is dropped. No nulls will be generated for nullable columns.
null_percentage – If specified, then generate the given percentage of the count as nulls for all nullable columns. This option will be ignored for non-nullable columns. The value must be within the range [0, 1.0]. The default value is 5% (0.05).
cardinality – If specified, limit the randomly generated values to a fixed set. Not allowed on a column with interval specified, and is not applicable to WKT or Track-specific columns. The value must be greater than 0. This option is disabled by default.
attr_name – Set the following parameters for the column specified by the key. This overrides any parameter set by all. Allowed keys are:
min – For numerical columns, the minimum of the generated values is set to this value. Default is -99999. For point, shape, and track semantic types, min for numeric ‘x’ and ‘y’ columns needs to be within [-180, 180] and [-90, 90], respectively. The default minimum possible values for these columns in such cases are -180.0 and -90.0. For the ‘TIMESTAMP’ column, the default minimum corresponds to Jan 1, 2010. For string columns, the minimum length of the randomly generated strings is set to this value (default is 0); the value needs to be within [0, 200]. If both minimum and maximum are provided, minimum must be less than or equal to max. If the min is outside the accepted ranges for string columns and ‘x’ and ‘y’ columns for point/shape/track types, then those parameters will not be set; however, an error will not be thrown in such a case. It is the responsibility of the user to use the all parameter judiciously.
max – For numerical columns, the maximum of the generated values is set to this value. Default is 99999. For point, shape, and track semantic types, max for numeric ‘x’ and ‘y’ columns needs to be within [-180, 180] and [-90, 90], respectively. The default maximum possible values for these columns in such cases are 180.0 and 90.0. For string columns, the maximum length of the randomly generated strings is set to this value (default is 200); the value needs to be within [0, 200]. If both minimum and maximum are provided, max must be greater than or equal to min. If the max is outside the accepted ranges for string columns and ‘x’ and ‘y’ columns for point/shape/track types, then those parameters will not be set; however, an error will not be thrown in such a case. It is the responsibility of the user to use the all parameter judiciously.
interval – If specified, generate values for all columns evenly spaced with the given interval value. If a max value is specified for a given column the data is randomly generated between min and max and decimated down to the interval. If no max is provided the data is linearly generated starting at the minimum value (instead of generating random data). For non-decimated string-type columns the interval value is ignored. Instead the values are generated following the pattern: ‘attrname_creationIndex#’, i.e. the column name suffixed with an underscore and a running counter (starting at 0). For string types with limited size (e.g., char4) the prefix is dropped. No nulls will be generated for nullable columns.
null_percentage – If specified and if this column is nullable, then generate the given percentage of the count as nulls. This option will result in an error if the column is not nullable. The value must be within the range [0, 1.0]. The default value is 5% (0.05).
cardinality – If specified, limit the randomly generated values to a fixed set. Not allowed on a column with interval specified, and is not applicable to WKT or Track-specific columns. The value must be greater than 0. This option is disabled by default.
track_length – This key-map pair is only valid for track type data sets (an error is thrown otherwise). No nulls would be generated for nullable columns. Allowed keys are:
min – Minimum possible length for generated series; default is 100 records per series. Must be an integral value within the range [1, 500]. If both min and max are specified, min must be less than or equal to max.
max – Maximum possible length for generated series; default is 500 records per series. Must be an integral value within the range [1, 500]. If both min and max are specified, max must be greater than or equal to min.
Returns
A GPUdbTable object with the insert_records() response fields converted to attributes (and stored within), with the following entries:
- table_name (str) –
Value of input parameter table_name.
- count (long) –
Value of input parameter count.
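Because options here is a map of maps rather than a map of strings, a short sketch of its shape may help. build_random_options is a hypothetical helper, and the column name "x" is made up; the checks mirror only the constraints stated above (seed at least 0, min not greater than max, null_percentage within [0, 1.0]).

```python
def build_random_options(seed=None, all_columns=None, per_column=None):
    """Assemble a dict-of-dicts in the shape described for 'options'."""
    options = {}
    if seed is not None:
        if seed < 0:
            raise ValueError("seed must be at least 0")
        # The seed needs its own inner map with a 'value' key.
        options["seed"] = {"value": seed}
    if all_columns:
        options["all"] = dict(all_columns)
    # Per-column maps override anything set under 'all'.
    for column, params in (per_column or {}).items():
        options[column] = dict(params)
    for key, params in options.items():
        if "min" in params and "max" in params and params["min"] > params["max"]:
            raise ValueError("min must not exceed max for '%s'" % key)
        if "null_percentage" in params and not 0 <= params["null_percentage"] <= 1.0:
            raise ValueError("null_percentage must be within [0, 1.0] for '%s'" % key)
    return options


options = build_random_options(
    seed=100,
    all_columns={"min": 0, "max": 100},
    per_column={"x": {"min": -180, "max": 180}},  # 'x' is a hypothetical column
)
```

The resulting dict would then be passed as the options argument, for example insert_records_random(count=1000, options=options) on an existing table object.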
- get_records_by_key(key_values, expression='', options=None)[source]
Fetches the record(s) from the appropriate worker rank directly (or, if multi-head record retrieval is not set up, then from the head node) that map to the given shard key.
Parameters
- key_values (list or dict) –
Values for the sharding columns of the record to fetch either in a list (then it is assumed to be in the order of the sharding keys in the record type) or a dict. Must not have any missing sharding/primary column value or any extra column values.
- expression (str) –
Optional parameter. If given, it is passed to /get/records as a filter expression.
- options (dict of str to str or None) –
Any /get/records options to be passed onto the GPUdb server. Optional parameter.
Returns
The decoded records.
- get_records(offset=0, limit=-9999, encoding='binary', options={}, force_primitive_return_types=True)[source]
Retrieves records from a given table, optionally filtered by an expression and/or sorted by a column. This operation can be performed on tables, views, or on homogeneous collections (collections containing tables of all the same type). Records can be returned encoded as binary or json.
This operation supports paging through the data via the offset and limit parameters. Note that when paging through a table, if the table (or the underlying table in case of a view) is updated (records are inserted, deleted or modified), the records retrieved may differ between calls based on the updates applied.
Decodes and returns the fetched records.
Parameters
- offset (long) –
A positive integer indicating the number of initial results to skip (this can be useful for paging through the results). Default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.
- limit (long) –
A positive integer indicating the maximum number of results to be returned. Or END_OF_SET (-9999) to indicate that the max number of results should be returned. Default value is -9999.
- encoding (str) –
Specifies the encoding for returned records. Default value is ‘binary’. Allowed values are:
binary
json
The default value is ‘binary’.
- options (dict of str) –
Default value is an empty dict ( {} ). Allowed keys are:
expression – Optional filter expression to apply to the table.
fast_index_lookup – Indicates if indexes should be used to perform the lookup for a given expression if possible. Only applicable if there is no sorting, the expression contains only equivalence comparisons based on existing tables indexes and the range of requested values is from [0 to END_OF_SET]. The default value is true.
sort_by – Optional column that the data should be sorted by. Empty by default (i.e. no sorting is applied).
sort_order – String indicating how the returned values should be sorted - ascending or descending. If sort_order is provided, sort_by has to be provided. Allowed values are:
ascending
descending
The default value is ‘ascending’.
- force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which, for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.
Returns
A list of Record objects containing the record values.
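Paging with offset and limit usually follows the loop sketched below. The fetch callable is an assumption: with a real table it would presumably wrap a call such as get_records(offset=offset, limit=limit), while here an in-memory list stands in for the table so the sketch runs on its own.

```python
def iterate_pages(fetch, page_size):
    """Yield successive pages until a short or empty page signals the end."""
    offset = 0
    while True:
        page = fetch(offset, page_size)
        if not page:
            break
        yield page
        if len(page) < page_size:
            break
        offset += page_size


rows = list(range(10))  # stand-in for the table's records


def fake_fetch(offset, limit):
    # Mimics an offset/limit retrieval over the in-memory rows.
    return rows[offset:offset + limit]


pages = list(iterate_pages(fake_fetch, page_size=4))
```

As noted above, if the underlying table changes between calls, consecutive pages may overlap or skip records; the loop itself does nothing to guard against that.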
- get_records_by_column(column_names, offset=0, limit=-9999, encoding='binary', options={}, print_data=False, force_primitive_return_types=True, get_column_major=True)[source]
For a given table, retrieves the values of the given columns within a given range. It returns maps of column name to the vector of values for each supported data type (double, float, long, int and string). This operation supports pagination feature, i.e. values that are retrieved are those associated with the indices between the start (offset) and end value (offset + limit) parameters (inclusive). If there are num_points values in the table then each of the indices between 0 and num_points-1 retrieves a unique value.
Note that when using the pagination feature, if the table (or the underlying table in case of a view) is updated (records are inserted, deleted or modified) the records or values retrieved may differ between calls (noncontiguous or overlap) based on the type of the update.
The response is returned as a dynamic schema. For details see: dynamic schemas documentation.
Parameters
- column_names (list of str) –
The list of columns whose values are to be retrieved.
- offset (long) –
A positive integer indicating the number of initial results to skip (this can be useful for paging through the results). The minimum allowed value is 0. The maximum allowed value is MAX_INT.
- limit (long) –
A positive integer indicating the maximum number of results to be returned (if not provided the default is -9999), or END_OF_SET (-9999) to indicate that the maximum number of results allowed by the server should be returned.
- encoding (str) –
Specifies the encoding for returned records; either ‘binary’ or ‘json’. Default value is ‘binary’. Allowed values are:
binary
json
The default value is ‘binary’.
- options (dict of str) –
Default value is an empty dict ( {} ). Allowed keys are:
expression – Optional filter expression to apply to the table.
sort_by – Optional column that the data should be sorted by. Empty by default (i.e. no sorting is applied).
sort_order – String indicating how the returned values should be sorted - ascending or descending. Default is ‘ascending’. If sort_order is provided, sort_by has to be provided. Allowed values are:
ascending
descending
The default value is ‘ascending’.
order_by – Comma-separated list of the columns to be sorted by; e.g. ‘timestamp asc, x desc’. The columns specified must be present in input parameter column_names. If any alias is given for any column name, the alias must be used, rather than the original column name.
- print_data (bool) –
If True, print the fetched data to the console in a tabular format if the data is being returned in the column-major format. Default is False.
- force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which, for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.
- get_column_major (bool) –
Indicates if the decoded records will be transposed to be column-major or returned as is (row-major). Default value is True.
Decodes the fetched records and saves them in the response class in an attribute called data.
Returns
A dict of column name to column values for column-major data, or a list of Record objects for row-major data.
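The difference between the row-major and column-major return shapes can be shown with a plain transposition. to_column_major is a hypothetical helper operating on ordinary dicts; it only illustrates the two layouts and is not the library's decoding code.

```python
from collections import OrderedDict


def to_column_major(rows, column_names):
    """Transpose a list of row mappings into a mapping of column -> values."""
    columns = OrderedDict((name, []) for name in column_names)
    for row in rows:
        for name in column_names:
            columns[name].append(row[name])
    return columns


row_major = [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}]
column_major = to_column_major(row_major, ["x", "y"])
```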
- get_records_by_series(world_table_name=None, offset=0, limit=250, encoding='binary', options={}, force_primitive_return_types=True)[source]
Retrieves the complete series/track records from the given input parameter world_table_name based on the partial track information contained in the input parameter table_name.
This operation supports paging through the data via the offset and limit parameters.
In contrast to get_records(), this returns records grouped by series/track. So if input parameter offset is 0 and input parameter limit is 5, this operation would return the first 5 series/tracks in input parameter table_name. Each series/track will be returned sorted by its TIMESTAMP column.
Parameters
- world_table_name (str) –
Name of the table containing the complete series/track information to be returned for the tracks present in the input parameter table_name. Typically this is used when retrieving series/tracks from a view (which contains partial series/tracks) but the user wants to retrieve the entire original series/tracks. Can be blank.
- offset (int) –
A positive integer indicating the number of initial series/tracks to skip (useful for paging through the results). Default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.
- limit (int) –
A positive integer indicating the maximum number of series/tracks to be returned. Or END_OF_SET (-9999) to indicate that the max number of results should be returned. Default value is 250.
- encoding (str) –
Specifies the encoding for returned records; either ‘binary’ or ‘json’. Default value is ‘binary’. Allowed values are:
binary
json
The default value is ‘binary’.
- options (dict of str) –
Optional parameters. Default value is an empty dict ( {} ).
- force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which, for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.
Returns
A list of lists of Record objects containing the record values. Each outer list entry corresponds to a single track (or series).
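The grouped return shape, with offset and limit counting series rather than individual records, can be sketched locally. group_by_series is a hypothetical helper, and the "TRACKID" key is an assumed column name; only the TIMESTAMP ordering and the END_OF_SET value (-9999) come from the description above.

```python
from collections import OrderedDict

END_OF_SET = -9999


def group_by_series(records, offset=0, limit=250,
                    track_key="TRACKID", time_key="TIMESTAMP"):
    """Group records per track, sort each track by time, then page by track."""
    series = OrderedDict()
    for record in records:
        series.setdefault(record[track_key], []).append(record)
    grouped = [sorted(points, key=lambda r: r[time_key])
               for points in series.values()]
    if limit == END_OF_SET:
        return grouped[offset:]
    return grouped[offset:offset + limit]


records = [
    {"TRACKID": "t1", "TIMESTAMP": 20},
    {"TRACKID": "t2", "TIMESTAMP": 5},
    {"TRACKID": "t1", "TIMESTAMP": 10},
]
first_series = group_by_series(records, offset=0, limit=1)
```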
- get_records_from_collection(offset=0, limit=-9999, encoding='binary', options={}, force_primitive_return_types=True)[source]
Retrieves records from a collection. The operation can optionally return the record IDs, which can be used in certain queries such as delete_records().
This operation supports paging through the data via the offset and limit parameters.
Note that when using the Java API, it is not possible to retrieve records from join tables using this operation.
Parameters
- offset (long) –
A positive integer indicating the number of initial results to skip (this can be useful for paging through the results). Default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.
- limit (long) –
A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the max number of results should be returned. Default value is -9999.
- encoding (str) –
Specifies the encoding for returned records; either ‘binary’ or ‘json’. Default value is ‘binary’. Allowed values are:
binary
json
The default value is ‘binary’.
- options (dict of str) –
Default value is an empty dict ( {} ). Allowed keys are:
return_record_ids – If ‘true’ then return the internal record ID along with each returned record. Default is ‘false’. Allowed values are:
true
false
The default value is ‘false’.
- force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which, for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.
Returns
A list of Record objects containing the record values.
- get_geo_json(offset=0, limit=-9999, options={}, force_primitive_return_types=True)[source]
Retrieves records as a GeoJSON from a given table, optionally filtered by an expression and/or sorted by a column. This operation can be performed on tables, views, or on homogeneous collections (collections containing tables of all the same type). Records can be returned encoded as binary or json.
This operation supports paging through the data via the offset and limit parameters. Note that when paging through a table, if the table (or the underlying table in case of a view) is updated (records are inserted, deleted or modified), the records retrieved may differ between calls based on the updates applied.
Decodes and returns the fetched records.
Parameters
- offset (long) –
A positive integer indicating the number of initial results to skip (this can be useful for paging through the results). Default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.
- limit (long) –
A positive integer indicating the maximum number of results to be returned. Or END_OF_SET (-9999) to indicate that the max number of results should be returned. Default value is -9999.
- encoding (str) –
Specifies the encoding for returned records. Default value is ‘binary’. Allowed values are:
binary
json
The default value is ‘binary’.
- options (dict of str) –
Default value is an empty dict ( {} ). Allowed keys are:
expression – Optional filter expression to apply to the table.
fast_index_lookup – Indicates if indexes should be used to perform the lookup for a given expression if possible. Only applicable if there is no sorting, the expression contains only equivalence comparisons based on existing tables indexes and the range of requested values is from [0 to END_OF_SET]. The default value is true.
sort_by – Optional column that the data should be sorted by. Empty by default (i.e. no sorting is applied).
sort_order – String indicating how the returned values should be sorted - ascending or descending. If sort_order is provided, sort_by has to be provided. Allowed values are:
ascending
descending
The default value is ‘ascending’.
- force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which, for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.
Returns
A GeoJSON object (a dict)
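The offset/limit paging described above can be sketched as a loop that advances the offset by the page size until a short page comes back. The `fetch_page` callable below is a hypothetical stand-in for whichever record-retrieval call is in use; it is not part of the gpudb API, and here it only slices an in-memory list.

```python
def iter_pages(fetch_page, page_size):
    """Yield successive pages until a page shorter than page_size is returned."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if page:
            yield page
        # A short (or empty) page means the end of the data was reached.
        if len(page) < page_size:
            break
        offset += page_size


# Hypothetical stand-in for a server call: slices a local list.
rows = list(range(25))


def fetch_page(offset, limit):
    return rows[offset:offset + limit]


pages = list(iter_pages(fetch_page, 10))
```

Because the table may be updated between calls, pages collected this way are not guaranteed to form a consistent snapshot.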
- to_df(**kwargs)[source]
Converts the table data to a Pandas Data Frame.
Parameters
- batch_size (int) –
The number of records to retrieve at a time from the database
Returns
A Pandas Data Frame containing the table data.
- classmethod from_df(df, db: GPUdb, table_name: str, column_types: dict = {}, clear_table: bool = False, create_table: bool = True, load_data: bool = True, show_progress: bool = False, batch_size: int = 5000, **kwargs)[source]
Load a Data Frame into a table; optionally dropping any existing table, creating it if it doesn’t exist, and loading data into it; and then returning a GPUdbTable reference to the table.
Parameters
- df (pd.DataFrame) –
The Pandas Data Frame to load into a table
- db (GPUdb) –
GPUdb instance
- table_name (str) –
Name of the target Kinetica table for the Data Frame loading
- column_types (dict) –
Optional Kinetica column properties to apply to the column type definitions inferred from the Data Frame; map of column name to a list of column properties for that column, excluding the inferred base type. For example:
{ "middle_name": [ 'char64', 'nullable' ], "state": [ 'char2', 'dict' ] }
- clear_table (bool) –
Whether to drop an existing table of the same name or not before creating this one.
- create_table (bool) –
Whether to create the table if it doesn’t exist or not.
- load_data (bool) –
Whether to load data into the target table or not.
- show_progress (bool) –
Whether to show progress of the operation on the console.
- batch_size (int) –
The number of records at a time to load into the target table.
Raises
GPUdbException
Returns
- GPUdbTable –
a GPUdbTable instance created from the Data Frame passed in
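As an illustration of the `column_types` argument, the sketch below assembles the map of column name to property list in plain Python. The helper `make_column_types` and the table name `example_schema.people` are hypothetical; only the `from_df` signature shown above is taken from this reference, and the call itself is left commented out because it needs a running server.

```python
def make_column_types(**properties):
    """Map each column name to a list of Kinetica column properties."""
    column_types = {}
    for column, props in properties.items():
        # Accept a single property string or an iterable of them.
        if isinstance(props, str):
            props = [props]
        column_types[column] = list(props)
    return column_types


column_types = make_column_types(
    middle_name=["char64", "nullable"],
    state=["char2", "dict"],
    zip_code="char8",
)

# With a live GPUdb connection `db` and a pandas DataFrame `df`:
# table = GPUdbTable.from_df(df, db, "example_schema.people",
#                            column_types=column_types, clear_table=True)
```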
- static create_join_table(db, join_table_name=None, table_names=None, column_names=None, expressions=[], options={})[source]
Creates a table that is the result of a SQL JOIN.
For join details and examples see: Joins. For limitations, see Join Limitations and Cautions.
Parameters
- join_table_name (str) –
Name of the join table to be created, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria.
- table_names (list of str) –
The list of table names composing the join, each in [schema_name.]table_name format, using standard name resolution rules. Corresponds to a SQL statement FROM clause. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- column_names (list of str) –
List of member table columns or column expressions to be included in the join. Columns can be prefixed with ‘table_id.column_name’, where ‘table_id’ is the table name or alias. Columns can be aliased via the syntax ‘column_name as alias’. Wild cards ‘*’ can be used to include all columns across member tables or ‘table_id.*’ for all of a single table’s columns. Columns and column expressions composing the join must be uniquely named or aliased–therefore, the ‘*’ wild card cannot be used if column names aren’t unique across all tables. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- expressions (list of str) –
An optional list of expressions to combine and filter the joined tables. Corresponds to a SQL statement WHERE clause. For details see: expressions. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter join_table_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_join_table_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the join as part of input parameter join_table_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the join. If the schema is non-existent, it will be automatically created. The default value is ‘’.
max_query_dimensions – No longer used.
optimize_lookups – Use more memory to speed up the joining of tables. Allowed values are:
true
false
The default value is ‘false’.
strategy_definition – The tier strategy for the table and its columns.
ttl – Sets the TTL of the join table specified in input parameter join_table_name.
view_id – view this projection is part of. The default value is ‘’.
no_count – Return a count of 0 for the join table for logging and for GPUdbTable.show_table(); optimization needed for large overlapped equi-join stencils. The default value is ‘false’.
chunk_size – Maximum number of records per joined-chunk for this table. Defaults to the gpudb.conf file chunk size.
enable_virtual_chunking – Collect chunks with accumulated size less than chunk_size into a single chunk. The default value is ‘false’.
enable_pk_equi_join – Use equi-join to do primary key joins rather than using primary-key-index.
The default value is an empty dict ( {} ).
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
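To make the `column_names` syntax concrete, here is a small sketch that builds 'table_id.column_name as alias' entries and collects the arguments for a join. The helper `qualified_columns` and all table and column names are hypothetical; the final call is commented out because it requires a live `db` connection.

```python
def qualified_columns(table_id, columns):
    """Build 'table_id.column' entries, adding ' as alias' when an alias is given."""
    result = []
    for column, alias in columns:
        entry = f"{table_id}.{column}"
        if alias:
            entry += f" as {alias}"
        result.append(entry)
    return result


column_names = (
    qualified_columns("orders", [("id", "order_id"), ("total", None)])
    + qualified_columns("customers", [("name", "customer_name")])
)

join_args = dict(
    join_table_name="order_customers",
    table_names=["orders", "customers"],
    column_names=column_names,
    expressions=["orders.customer_id = customers.id"],
    options={},
)

# joined = GPUdbTable.create_join_table(db, **join_args)
```

Aliasing every overlapping column keeps the result uniquely named, which the description above requires.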
- static create_union(db, table_name=None, table_names=None, input_column_names=None, output_column_names=None, options={})[source]
Merges data from one or more tables with comparable data types into a new table.
The following merges are supported:
UNION (DISTINCT/ALL) - For data set union details and examples, see Union. For limitations, see Union Limitations and Cautions.
INTERSECT (DISTINCT/ALL) - For data set intersection details and examples, see Intersect. For limitations, see Intersect Limitations.
EXCEPT (DISTINCT/ALL) - For data set subtraction details and examples, see Except. For limitations, see Except Limitations.
MERGE VIEWS - For a given set of filtered views on a single table, creates a single filtered view containing all of the unique records across all of the given filtered data sets.
Non-charN ‘string’ and ‘bytes’ column types cannot be merged, nor can columns marked as store-only.
Parameters
- table_name (str) –
Name of the table to be created, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria.
- table_names (list of str) –
The list of table names to merge, in [schema_name.]table_name format, using standard name resolution rules. Must contain the names of one or more existing tables. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- input_column_names (list of lists of str) –
The list of columns from each of the corresponding input tables. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- output_column_names (list of str) –
The list of names of the columns to be stored in the output table. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter table_name. If persist is false (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_table_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the projection as part of input parameter table_name and use GPUdb.create_schema() to create the schema if non-existent] Name of the schema for the output table. If the schema provided is non-existent, it will be automatically created. The default value is ‘’.
mode – If merge_views, then this operation will merge the provided views. All input parameter table_names must be views from the same underlying base table. Allowed values are:
union_all – Retains all rows from the specified tables.
union – Retains all unique rows from the specified tables (synonym for union_distinct).
union_distinct – Retains all unique rows from the specified tables.
except – Retains all unique rows from the first table that do not appear in the second table (only works on 2 tables).
except_all – Retains all rows (including duplicates) from the first table that do not appear in the second table (only works on 2 tables).
intersect – Retains all unique rows that appear in both of the specified tables (only works on 2 tables).
intersect_all – Retains all rows (including duplicates) that appear in both of the specified tables (only works on 2 tables).
merge_views – Merge two or more views (or views of views) of the same base data set into a new view. If this mode is selected input parameter input_column_names AND input parameter output_column_names must be empty. The resulting view would match the results of a SQL OR operation, e.g., if filter 1 creates a view using the expression ‘x = 20’ and filter 2 creates a view using the expression ‘x <= 10’, then the merge views operation creates a new view using the expression ‘x = 20 OR x <= 10’.
The default value is ‘union_all’.
chunk_size – Indicates the number of records per chunk to be used for this output table.
chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for this output table.
chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for this output table.
create_indexes – Comma-separated list of columns on which to create indexes on the output table. The columns specified must be present in input parameter output_column_names.
ttl – Sets the TTL of the output table specified in input parameter table_name.
persist – If true, then the output table specified in input parameter table_name will be persisted and will not expire unless a ttl is specified. If false, then the output table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:
true
false
The default value is ‘false’.
view_id – ID of view of which this output table is a member. The default value is ‘’.
force_replicated – If true, then the output table specified in input parameter table_name will be replicated even if the source tables are not. Allowed values are:
true
false
The default value is ‘false’.
strategy_definition – The tier strategy for the table and its columns.
The default value is an empty dict ( {} ).
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
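The mode constraints listed above (two-table modes, and empty column lists for merge_views) can be checked client-side before calling the server. The function `check_union_args` is a hypothetical helper, not part of gpudb; it only encodes the two rules stated in this section.

```python
TWO_TABLE_MODES = {"except", "except_all", "intersect", "intersect_all"}


def check_union_args(mode, table_names, input_column_names, output_column_names):
    """Raise ValueError if the arguments break the documented mode constraints."""
    if mode in TWO_TABLE_MODES and len(table_names) != 2:
        raise ValueError(f"mode '{mode}' only works on 2 tables")
    if mode == "merge_views" and (input_column_names or output_column_names):
        raise ValueError("merge_views requires empty input and output column lists")
    # Options are a dict of str to str, so the mode is passed as a string.
    return {"mode": mode}


options = check_union_args("merge_views", ["view_a", "view_b"], [], [])

# merged = GPUdbTable.create_union(db, table_name="merged_view",
#                                  table_names=["view_a", "view_b"],
#                                  input_column_names=[], output_column_names=[],
#                                  options=options)
```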
- static merge_records(db, table_name=None, source_table_names=None, field_maps=None, options={})[source]
Create a new empty result table (specified by input parameter table_name), and insert all records from source tables (specified by input parameter source_table_names) based on the field mapping information (specified by input parameter field_maps).
For merge records details and examples, see Merge Records. For limitations, see Merge Records Limitations and Cautions.
The field map (specified by input parameter field_maps) holds the user-specified maps of target table column names to source table columns. The array of input parameter field_maps must match one-to-one with the input parameter source_table_names, e.g., there’s a map present in input parameter field_maps for each table listed in input parameter source_table_names.
Parameters
- table_name (str) –
The name of the new result table for the records to be merged into, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. Must NOT be an existing table.
- source_table_names (list of str) –
The list of names of source tables to get the records from, each in [schema_name.]table_name format, using standard name resolution rules. Must be existing table names. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- field_maps (list of dicts of str to str) –
Contains a list of source/target column mappings, one mapping for each source table listed in input parameter source_table_names being merged into the target table specified by input parameter table_name. Each mapping contains the target column names (as keys) that the data in the mapped source columns or column expressions (as values) will be merged into. All of the source columns being merged into a given target column must match in type, as that type will determine the type of the new target column. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter table_name. If persist is false, then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_table_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the merged table as part of input parameter table_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the newly created merged table specified by input parameter table_name.
is_replicated – Indicates the distribution scheme for the data of the merged table specified in input parameter table_name. If true, the table will be replicated. If false, the table will be randomly sharded. Allowed values are:
true
false
The default value is ‘false’.
ttl – Sets the TTL of the merged table specified in input parameter table_name.
persist – If true, then the table specified in input parameter table_name will be persisted and will not expire unless a ttl is specified. If false, then the table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:
true
false
The default value is ‘true’.
chunk_size – Indicates the number of records per chunk to be used for the merged table specified in input parameter table_name.
chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for the merged table specified in input parameter table_name.
chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for the merged table specified in input parameter table_name.
view_id – view this result table is part of. The default value is ‘’.
The default value is an empty dict ( {} ).
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
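The one-to-one pairing between `source_table_names` and `field_maps` is easy to get wrong, so a small pre-flight check can help. The helper `check_field_maps` and the table and column names below are hypothetical; each map uses target column names as keys and source columns as values, as described above.

```python
def check_field_maps(source_table_names, field_maps):
    """Verify one field map per source table; return the sorted target columns."""
    if len(field_maps) != len(source_table_names):
        raise ValueError("field_maps must contain one map per source table")
    targets = set()
    for field_map in field_maps:
        targets.update(field_map)  # keys are the target column names
    return sorted(targets)


source_table_names = ["sales_2022", "sales_2023"]
field_maps = [
    {"amount": "total", "day": "sale_date"},
    {"amount": "gross_amount", "day": "order_day"},
]
target_columns = check_field_maps(source_table_names, field_maps)

# merged = GPUdbTable.merge_records(db, table_name="sales_all",
#                                   source_table_names=source_table_names,
#                                   field_maps=field_maps)
```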
- aggregate_convex_hull(x_column_name=None, y_column_name=None, options={})[source]
Calculates and returns the convex hull for the values in this table.
Parameters
- x_column_name (str) –
Name of the column containing the x coordinates of the points for the operation being performed.
- y_column_name (str) –
Name of the column containing the y coordinates of the points for the operation being performed.
- options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).
Returns
The response from the server, which is a dict containing the following entries:
- x_vector (list of floats) –
Array of x coordinates of the resulting convex set.
- y_vector (list of floats) –
Array of y coordinates of the resulting convex set.
- count (int) –
Count of the number of points in the convex set.
- is_valid (bool)
- info (dict of str to str) –
Additional information.
Raises
- GPUdbException –
Upon an error from the server.
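For readers unfamiliar with the operation, the following sketch computes a convex hull locally with Andrew's monotone chain algorithm and packs it into a dict shaped like the response fields listed above. This is an illustration only, not the server's implementation; the vertex ordering the server uses is not documented here, so the counter-clockwise order below is an assumption.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
hull = convex_hull(points)
result = {
    "x_vector": [x for x, _ in hull],
    "y_vector": [y for _, y in hull],
    "count": len(hull),
}
```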
- aggregate_group_by(column_names=None, offset=0, limit=-9999, encoding='binary', options={}, force_primitive_return_types=True, get_column_major=True)[source]
Calculates unique combinations (groups) of values for the given columns in a given table or view and computes aggregates on each unique combination. This is somewhat analogous to an SQL-style SELECT…GROUP BY.
For aggregation details and examples, see Aggregation. For limitations, see Aggregation Limitations.
Any column(s) can be grouped on, and all column types except unrestricted-length strings may be used for computing applicable aggregates; columns marked as store-only are unable to be used in grouping or aggregation.
The results can be paged via the offset and limit input parameters. For example, to get 10 groups with the largest counts the inputs would be: limit=10, options={"sort_order":"descending", "sort_by":"value"}.
Input parameter options can be used to customize the behavior of this call, e.g., filtering or sorting the results.
To group by columns 'x' and 'y' and compute the number of objects within each group, use: column_names=['x','y','count(*)'].
To also compute the sum of 'z' over each group, use: column_names=['x','y','count(*)','sum(z)'].
Available aggregation functions are: count(*), sum, min, max, avg, mean, stddev, stddev_pop, stddev_samp, var, var_pop, var_samp, arg_min, arg_max and count_distinct.
Available grouping functions are Rollup, Cube, and Grouping Sets.
This service also provides support for Pivot operations.
Filtering on aggregates is supported via expressions using aggregation functions supplied to having.
The response is returned as a dynamic schema. For details see: dynamic schemas documentation.
If a result_table name is specified in the input parameter options, the results are stored in a new table with that name and no results are returned in the response. Both the table name and resulting column names must adhere to standard naming conventions; column/aggregation expressions will need to be aliased. If the source table’s shard key is used as the grouping column(s) and all result records are selected (input parameter offset is 0 and input parameter limit is -9999), the result table will be sharded; in all other cases it will be replicated. Sorting will function properly only if the result table is replicated or if there is only one processing node, and should not be relied upon in other cases. The result_table option is not available when any of the values of input parameter column_names is an unrestricted-length string.
Parameters
- column_names (list of str) –
List of one or more column names, expressions, and aggregate expressions. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- offset (long) –
A non-negative integer indicating the number of initial results to skip (this can be useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.
- limit (long) –
A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results allowed by the server should be returned. The number of records returned will never exceed the server’s own limit, defined by the max_get_records_size parameter in the server configuration. Use output parameter has_more_records to see if more records exist in the result to be fetched, and input parameter offset & input parameter limit to request subsequent pages of results. The default value is -9999.
- encoding (str) –
Specifies the encoding for returned records. Allowed values are:
binary – Indicates that the returned records should be binary encoded.
json – Indicates that the returned records should be json encoded.
The default value is ‘binary’.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of result_table. If result_table_persist is false (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_result_table_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema as part of result_table and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema which is to contain the table specified in result_table. If the schema provided is non-existent, it will be automatically created.
expression – Filter expression to apply to the table prior to computing the aggregate group by.
chunked_expression_evaluation – Evaluate the filter expression during group-by chunk processing. Allowed values are:
true
false
The default value is ‘false’.
having – Filter expression to apply to the aggregated results.
sort_order – [DEPRECATED–use order_by instead] String indicating how the returned values should be sorted - ascending or descending. Allowed values are:
ascending – Indicates that the returned values should be sorted in ascending order.
descending – Indicates that the returned values should be sorted in descending order.
The default value is ‘ascending’.
sort_by – [DEPRECATED–use order_by instead] String determining how the results are sorted. Allowed values are:
key – Indicates that the returned values should be sorted by key, which corresponds to the grouping columns. If you have multiple grouping columns (and are sorting by key), it will first sort the first grouping column, then the second grouping column, etc.
value – Indicates that the returned values should be sorted by value, which corresponds to the aggregates. If you have multiple aggregates (and are sorting by value), it will first sort by the first aggregate, then the second aggregate, etc.
The default value is ‘value’.
order_by – Comma-separated list of the columns to be sorted by as well as the sort direction, e.g., ‘timestamp asc, x desc’. The default value is ‘’.
strategy_definition – The tier strategy for the table and its columns.
result_table – The name of a table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. Column names (group-by and aggregate fields) need to be given aliases e.g. [“FChar256 as fchar256”, “sum(FDouble) as sfd”]. If present, no results are returned in the response. This option is not available if one of the grouping attributes is an unrestricted string (i.e.; not charN) type.
result_table_persist – If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:
true
false
The default value is ‘false’.
result_table_force_replicated – Force the result table to be replicated (ignores any sharding). Must be used in combination with the result_table option. Allowed values are:
true
false
The default value is ‘false’.
result_table_generate_pk – If true then set a primary key for the result table. Must be used in combination with the result_table option. Allowed values are:
true
false
The default value is ‘false’.
ttl – Sets the TTL of the table specified in result_table.
chunk_size – Indicates the number of records per chunk to be used for the result table. Must be used in combination with the result_table option.
chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for the result table. Must be used in combination with the result_table option.
chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for the result table. Must be used in combination with the result_table option.
create_indexes – Comma-separated list of columns on which to create indexes on the result table. Must be used in combination with the result_table option.
view_id – ID of view of which the result table will be a member. The default value is ‘’.
pivot – The pivot column.
pivot_values – The value list provided will become the column headers in the output. Should be the values from the pivot column.
grouping_sets – Customize the grouping attribute sets to compute the aggregates. These sets can include ROLLUP or CUBE operators. The attribute sets should be enclosed in parentheses and can include composite attributes. All attributes specified in the grouping sets must be present in the group-by attributes.
rollup – This option is used to specify the multilevel aggregates.
cube – This option is used to specify the multidimensional aggregates.
shard_key – Comma-separated list of the columns to be sharded on; e.g. ‘column1, column2’. The columns specified must be present in input parameter column_names. If any alias is given for any column name, the alias must be used, rather than the original column name. The default value is ‘’.
The default value is an empty dict ( {} ).
- force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which, for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.
- get_column_major (bool) –
Indicates if the decoded records will be transposed to be column-major or returned as is (row-major). Default value is True.
Returns
A read-only GPUdbTable object if input options has "result_table"; otherwise the response from the server, which is a dict containing the following entries:
- response_schema_str (str) –
Avro schema of output parameter binary_encoded_response or output parameter json_encoded_response.
- total_number_of_records (long) –
Total/Filtered number of records. This may be an over-estimate if a limit was applied and there are additional records (i.e., when output parameter has_more_records is true).
- has_more_records (bool) –
Too many records. Returned a partial set.
- info (dict of str to str) –
Additional information. Allowed keys are:
qualified_result_table_name – The fully qualified name of the table (i.e. including the schema) used to store the results.
The default value is an empty dict ( {} ).
- records (list of Record) –
A list of Record objects which contain the decoded records.
- data (list of Record) –
A list of Record objects which contain the decoded records.
Raises
- GPUdbException –
Upon an error from the server.
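The group-by example from the description (group on 'x' and 'y', count and sum per group, keep the 10 largest) can be assembled as keyword arguments like this. The helper `group_by_args` is hypothetical; it only builds the `column_names` list and a string-valued options dict, and the actual call is commented out because it needs an existing table object. Note that sort_by and sort_order are marked deprecated above in favor of order_by; they are used here only because the description's own example uses them.

```python
def group_by_args(group_columns, aggregates, limit=-9999, **options):
    """Assemble column_names and string-valued options for a group-by call."""
    return {
        "column_names": list(group_columns) + list(aggregates),
        "offset": 0,
        "limit": limit,
        # Options are a dict of str to str, so values are stringified.
        "options": {key: str(value) for key, value in options.items()},
    }


args = group_by_args(
    ["x", "y"],
    ["count(*)", "sum(z)"],
    limit=10,
    sort_by="value",
    sort_order="descending",
)

# `table` would be an existing GPUdbTable instance:
# response = table.aggregate_group_by(**args)
```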
- aggregate_histogram(column_name=None, start=None, end=None, interval=None, options={})[source]
Performs a histogram calculation given a table, a column, and an interval function. The input parameter interval is used to produce bins of that size and the result, computed over the records falling within each bin, is returned. For each bin, the start value is inclusive, but the end value is exclusive–except for the very last bin for which the end value is also inclusive. The value returned for each bin is the number of records in it, except when a column name is provided as a value_column. In this latter case the sum of the values corresponding to the value_column is used as the result instead. The total number of bins requested cannot exceed 10,000.
NOTE: The Kinetica instance being accessed must be running a CUDA (GPU-based) build to service a request that specifies a value_column.
Parameters
- column_name (str) –
Name of a column or an expression of one or more column names over which the histogram will be calculated.
- start (float) –
Lower end value of the histogram interval, inclusive.
- end (float) –
Upper end value of the histogram interval, inclusive.
- interval (float) –
The size of each bin within the start and end parameters.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
value_column – The name of the column to use when calculating the bin values (values are summed). The column must be a numerical type (int, double, long, float).
The default value is an empty dict ( {} ).
Returns
The response from the server, which is a dict containing the following entries:
- counts (list of floats) –
The array of calculated values that represents the histogram data points.
- start (float) –
Value of input parameter start.
- end (float) –
Value of input parameter end.
- info (dict of str to str) –
Additional information.
Raises
- GPUdbException –
Upon an error from the server.
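The bin-edge rule described above (start inclusive, end exclusive, except that the last bin also includes its end value) can be illustrated with a local computation. This sketch is not the server's implementation; in particular, deriving the bin count as the ceiling of (end - start) / interval is an assumption, since this reference does not say how a partial final bin is handled.

```python
import math


def histogram_counts(values, start, end, interval):
    """Count values per bin: [lo, hi) for every bin except the last, which is [lo, hi]."""
    num_bins = math.ceil((end - start) / interval)  # assumed bin-count rule
    if num_bins > 10000:
        raise ValueError("the total number of bins cannot exceed 10,000")
    counts = [0] * num_bins
    for value in values:
        if value < start or value > end:
            continue  # outside the histogram range
        # Clamp so that value == end falls into the last bin.
        index = min(int((value - start) // interval), num_bins - 1)
        counts[index] += 1
    return counts


counts = histogram_counts([0, 1, 2.5, 5, 9.9, 10, 11], start=0, end=10, interval=5)
```

Here the value 5 lands in the second bin because the first bin's end is exclusive, while 10 is kept because the last bin's end is inclusive.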
- aggregate_k_means(column_names=None, k=None, tolerance=None, options={})[source]
This endpoint runs the k-means algorithm - a heuristic algorithm that attempts to do k-means clustering. An ideal k-means clustering algorithm selects k points such that the sum of the mean squared distances of each member of the set to the nearest of the k points is minimized. The k-means algorithm however does not necessarily produce such an ideal cluster. It begins with a randomly selected set of k points and then refines the location of the points iteratively and settles to a local minimum. Various parameters and options are provided to control the heuristic search.
NOTE: The Kinetica instance being accessed must be running a CUDA (GPU-based) build to service this request.
Parameters
- column_names (list of str) –
List of column names on which the operation would be performed. If n columns are provided then each of the k result points will have n dimensions corresponding to the n columns. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- k (int) –
The number of mean points to be determined by the algorithm.
- tolerance (float) –
Stop iterating when the distances between successive points are less than the given tolerance.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
whiten – When set to 1, each of the columns is first normalized by its standard deviation. The default is not to whiten.
max_iters – Number of times to try to hit the tolerance limit before giving up. Default is 10.
num_tries – Number of times to run the k-means algorithm with different randomly selected starting points, which helps avoid local minima. Default is 1.
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of result_table. If result_table_persist is false (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_result_table_name. Allowed values are:
true
false
The default value is ‘false’.
result_table – The name of a table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If this option is specified, the results are not returned in the response.
result_table_persist – If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:
true
false
The default value is ‘false’.
ttl – Sets the TTL of the table specified in result_table.
The default value is an empty dict ( {} ).
Returns
A read-only GPUdbTable object if input options has "result_table"; otherwise the response from the server, which is a dict containing the following entries:
- means (list of lists of floats) –
The k-mean values found.
- counts (list of longs) –
The number of elements in the cluster closest to the corresponding k-means value.
- rms_dists (list of floats) –
The root mean squared distance of the elements in the cluster for each of the k-means values.
- count (long) –
The total count of all the clusters - will be the size of the input table.
- rms_dist (float) –
The sum of all the rms_dists - the value the k-means algorithm is attempting to minimize.
- tolerance (float) –
The distance between the last two iterations of the algorithm before it quit.
- num_iters (int) –
The number of iterations the algorithm executed before it quit.
- info (dict of str to str) –
Additional information. Allowed keys are:
qualified_result_table_name – The fully qualified name of the result table (i.e. including the schema) used to store the results.
The default value is an empty dict ( {} ).
Raises
- GPUdbException –
Upon an error from the server.
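As a minimal sketch of how the options map for the k-means call above might be assembled: every key and value in options is a string, so numeric settings need converting first. The helper name kmeans_options, the column names, and the aggregate_k_means method name in the commented call are assumptions for illustration; the option keys are the ones listed above.

```python
def kmeans_options(whiten=False, max_iters=None, num_tries=None, result_table=None):
    """Build a str-to-str options dict (hypothetical helper, not part of gpudb)."""
    options = {}
    if whiten:
        # The option list above says whitening is enabled by the value 1.
        options["whiten"] = "1"
    if max_iters is not None:
        options["max_iters"] = str(max_iters)
    if num_tries is not None:
        options["num_tries"] = str(num_tries)
    if result_table is not None:
        options["result_table"] = result_table
    return options


opts = kmeans_options(whiten=True, max_iters=25, num_tries=3)
print(opts)

# Assumed usage, with `table` an existing GPUdbTable and made-up column names:
# response = table.aggregate_k_means(column_names=["x", "y"], k=4,
#                                    tolerance=0.001, options=opts)
```

Leaving an option out of the dict lets the server apply its documented default, which is why the helper only adds keys that were explicitly requested.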
- aggregate_min_max(column_name=None, options={})[source]
Calculates and returns the minimum and maximum values of a particular column in a table.
Parameters
- column_name (str) –
Name of a column or an expression of one or more columns on which the min-max will be calculated.
- options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).
Returns
The response from the server, which is a dict containing the following entries:
- min (float) –
Minimum value of the input parameter column_name.
- max (float) –
Maximum value of the input parameter column_name.
- info (dict of str to str) –
Additional information.
Raises
- GPUdbException –
Upon an error from the server.
- aggregate_min_max_geometry(column_name=None, options={})[source]
Calculates and returns the minimum and maximum x- and y-coordinates of a particular geospatial geometry column in a table.
Parameters
- column_name (str) –
Name of a geospatial geometry column on which the min-max will be calculated.
- options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).
Returns
The response from the server, which is a dict containing the following entries:
- min_x (float) –
Minimum x-coordinate value of the input parameter column_name.
- max_x (float) –
Maximum x-coordinate value of the input parameter column_name.
- min_y (float) –
Minimum y-coordinate value of the input parameter column_name.
- max_y (float) –
Maximum y-coordinate value of the input parameter column_name.
- info (dict of str to str) –
Additional information.
Raises
- GPUdbException –
Upon an error from the server.
- aggregate_statistics(column_name=None, stats=None, options={})[source]
Calculates the requested statistics of the given column(s) in a given table.
The available statistics are: count (number of total objects), mean, stdv (standard deviation), variance, skew, kurtosis, sum, min, max, weighted_average, cardinality (unique count), estimated_cardinality, percentile, and percentile_rank.
Estimated cardinality is calculated by using the hyperloglog approximation technique.
Percentiles and percentile ranks are approximate and are calculated using the t-digest algorithm. They must include the desired percentile/percentile_rank. To compute multiple percentiles each value must be specified separately (e.g. ‘percentile(75.0),percentile(99.0),percentile_rank(1234.56),percentile_rank(-5)’).
A second, comma-separated value can be added to the percentile statistic to calculate percentile resolution, e.g., a 50th percentile with 200 resolution would be ‘percentile(50,200)’.
The weighted average statistic requires a weight column to be specified in weight_column_name. The weighted average is then defined as the sum of the products of input parameter column_name times the weight_column_name values divided by the sum of the weight_column_name values.
Additional columns can be used in the calculation of statistics via additional_column_names. Values in these columns will be included in the overall aggregate calculation–individual aggregates will not be calculated per additional column. For instance, requesting the count & mean of input parameter column_name x and additional_column_names y & z, where x holds the numbers 1-10, y holds 11-20, and z holds 21-30, would return the total number of x, y, & z values (30), and the single average value across all x, y, & z values (15.5).
The response includes a list of key/value pairs of each statistic requested and its corresponding value.
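The pooled-column example and the weighted average definition above can be checked with a few lines of plain Python. This is only a client-side illustration of the arithmetic, not how the server computes it; the weighted_average helper and its sample values are made up for the demonstration.

```python
# Pooled statistics: x is the primary column, y and z are additional columns.
x = list(range(1, 11))    # 1-10
y = list(range(11, 21))   # 11-20
z = list(range(21, 31))   # 21-30

# Values from all three columns feed one aggregate; there is no per-column result.
pooled = x + y + z
pooled_count = len(pooled)
pooled_mean = sum(pooled) / pooled_count
print(pooled_count, pooled_mean)  # 30 15.5


def weighted_average(values, weights):
    """sum(value * weight) / sum(weight), as defined for weighted_average."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


print(weighted_average([1.0, 2.0, 3.0], [1.0, 1.0, 2.0]))  # 2.25
```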
Parameters
- column_name (str) –
Name of the primary column for which the statistics are to be calculated.
- stats (str) –
Comma separated list of the statistics to calculate, e.g. “sum,mean”. Allowed values are:
count – Number of objects (independent of the given column(s)).
mean – Arithmetic mean (average), equivalent to sum/count.
stdv – Sample standard deviation (denominator is count-1).
variance – Unbiased sample variance (denominator is count-1).
skew – Skewness (third standardized moment).
kurtosis – Kurtosis (fourth standardized moment).
sum – Sum of all values in the column(s).
min – Minimum value of the column(s).
max – Maximum value of the column(s).
weighted_average – Weighted arithmetic mean (using the option weight_column_name as the weighting column).
cardinality – Number of unique values in the column(s).
estimated_cardinality – Estimate (via hyperloglog technique) of the number of unique values in the column(s).
percentile – Estimate (via t-digest) of the given percentile of the column(s) (percentile(50.0) will be an approximation of the median). Add a second, comma-separated value to calculate percentile resolution, e.g., ‘percentile(75,150)’
percentile_rank – Estimate (via t-digest) of the percentile rank of the given value in the column(s) (if the given value is the median of the column(s), percentile_rank(<median>) will return approximately 50.0).
- options (dict of str to str) –
Optional parameters. Allowed keys are:
additional_column_names – A list of comma separated column names over which statistics can be accumulated along with the primary column. All columns listed and input parameter column_name must be of the same type. Must not include the column specified in input parameter column_name and no column can be listed twice.
weight_column_name – Name of column used as weighting attribute for the weighted average statistic.
The default value is an empty dict ( {} ).
Returns
The response from the server, which is a dict containing the following entries:
- stats (dict of str to floats) –
(statistic name, double value) pairs of the requested statistics, including the total count by default.
- info (dict of str to str) –
Additional information.
Raises
- GPUdbException –
Upon an error from the server.
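Because stats is a single comma-separated string and each percentile must be spelled out separately, it can help to build the string programmatically. The sketch below is one way to do that; build_stats is a hypothetical helper, and the column names x, y, and z are placeholders.

```python
def build_stats(names=(), percentiles=(), percentile_ranks=()):
    """Join statistic names into the comma-separated form expected by stats.

    Each percentile is either a number or a (percentile, resolution) tuple.
    Hypothetical helper; not part of the gpudb package.
    """
    parts = list(names)
    for p in percentiles:
        if isinstance(p, tuple):
            value, resolution = p
            parts.append("percentile({},{})".format(value, resolution))
        else:
            parts.append("percentile({})".format(p))
    for v in percentile_ranks:
        parts.append("percentile_rank({})".format(v))
    return ",".join(parts)


stats = build_stats(["count", "mean"],
                    percentiles=[75.0, (50, 200)],
                    percentile_ranks=[1234.56])
options = {"additional_column_names": ",".join(["y", "z"])}
print(stats)

# Assumed usage, with `table` an existing GPUdbTable:
# response = table.aggregate_statistics(column_name="x", stats=stats, options=options)
# The requested values would then be read from response["stats"].
```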
- aggregate_statistics_by_range(select_expression='', column_name=None, value_column_name=None, stats=None, start=None, end=None, interval=None, options={})[source]
Divides the given set into bins and calculates statistics of the values of a value-column in each bin. The bins are based on the values of a given binning-column. The statistics that may be requested are mean, stdv (standard deviation), variance, skew, kurtosis, sum, min, max, first, last and weighted average. In addition to the requested statistics the count of total samples in each bin is returned. This counts vector is just the histogram of the column used to divide the set members into bins. The weighted average statistic requires a weight column to be specified in weight_column_name. The weighted average is then defined as the sum of the products of the value column times the weight column divided by the sum of the weight column.
There are two methods for binning the set members. In the first, which can be used for numeric valued binning-columns, a min, max and interval are specified (given by start, end and interval). The number of bins, nbins, is the integer upper bound (ceiling) of (max-min)/interval. Values that fall in the range [min+n*interval,min+(n+1)*interval) are placed in the nth bin, where n ranges from 0..nbins-2. The final bin is [min+(nbins-1)*interval,max]. In the second method, bin_values specifies a list of binning-column values. Records whose binning-column value matches the nth member of the bin_values list are placed in the nth bin. When a list is provided, the binning-column must be of type string or int.
NOTE: The Kinetica instance being accessed must be running a CUDA (GPU-based) build to service this request.
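The two binning rules described above can be sketched in plain Python. This is a client-side illustration only; the function names are hypothetical, and returning None for values that fall outside every bin is an assumption, since the text does not say how the server treats them. A positive interval and end greater than start are also assumed.

```python
import math


def num_bins(start, end, interval):
    """Number of bins: the ceiling of (end - start) / interval."""
    return math.ceil((end - start) / interval)


def range_bin(value, start, end, interval):
    """Bin index for the min/max/interval method.

    Bins are half-open except the last one, which also includes `end`.
    Returning None for out-of-range values is an assumption.
    """
    if value < start or value > end:
        return None
    last = num_bins(start, end, interval) - 1
    return min(int((value - start) // interval), last)


def list_bin(value, bin_values):
    """Bin index for the bin_values method: position of the matching entry."""
    try:
        return bin_values.index(value)
    except ValueError:
        return None


print(num_bins(0, 10, 3))        # 4 bins: [0,3) [3,6) [6,9) [9,10]
print(range_bin(5.5, 0, 10, 3))  # 1
print(range_bin(10, 0, 10, 3))   # 3 (the closed upper end of the last bin)
print(list_bin("b", ["a", "b", "c"]))  # 1
```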
Parameters
- select_expression (str) –
For a non-empty expression statistics are calculated for those records for which the expression is true. The default value is ‘’.
- column_name (str) –
Name of the binning-column used to divide the set samples into bins.
- value_column_name (str) –
Name of the value-column for which statistics are to be computed.
- stats (str) –
A comma-separated list of the statistics to calculate, e.g. ‘sum,mean’. Available statistics: mean, stdv (standard deviation), variance, skew, kurtosis, sum.
- start (float) –
The lower bound of the binning-column.
- end (float) –
The upper bound of the binning-column.
- interval (float) –
The interval of a bin. Set members fall into bin i if the binning-column falls in the range [start+interval*i, start+interval*(i+1)).
- options (dict of str to str) –
Map of optional parameters: Allowed keys are:
additional_column_names – A list of comma separated value-column names over which statistics can be accumulated along with the primary value_column.
bin_values – A list of comma separated binning-column values. Values that match the nth bin_values value are placed in the nth bin.
weight_column_name – Name of the column used as weighting column for the weighted_average statistic.
order_column_name – Name of the column used for candlestick charting techniques.
The default value is an empty dict ( {} ).
Returns
The response from the server, which is a dict containing the following entries:
- stats (dict of str to lists of floats) –
A map with a key for each statistic in the stats input parameter having a value that is a vector of the corresponding value-column bin statistics. In addition, the key count has a value that is a histogram of the binning-column.
- info (dict of str to str) –
Additional information.
Raises
- GPUdbException –
Upon an error from the server.
- aggregate_unique(column_name=None, offset=0, limit=-9999, encoding='binary', options={}, force_primitive_return_types=True, get_column_major=True)[source]
Returns all the unique values from a particular column (specified by input parameter column_name) of a particular table or view (specified by input parameter table_name). If input parameter column_name is a numeric column, the values will be in output parameter binary_encoded_response. Otherwise if input parameter column_name is a string column, the values will be in output parameter json_encoded_response. The results can be paged via input parameter offset and input parameter limit.
Columns marked as store-only are unable to be used with this function.
To get the first 10 unique values sorted in descending order input parameter options would be:
{"limit":"10","sort_order":"descending"}
The response is returned as a dynamic schema. For details see: dynamic schemas documentation.
If a result_table name is specified in the input parameter options, the results are stored in a new table with that name–no results are returned in the response. Both the table name and resulting column name must adhere to standard naming conventions; any column expression will need to be aliased. If the source table’s shard key is used as the input parameter column_name, the result table will be sharded; in all other cases it will be replicated. Sorting will properly function only if the result table is replicated or if there is only one processing node and should not be relied upon in other cases. Not available if the value of input parameter column_name is an unrestricted-length string.
Parameters
- column_name (str) –
Name of the column or an expression containing one or more column names on which the unique function would be applied.
- offset (long) –
A positive integer indicating the number of initial results to skip (this can be useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.
- limit (long) –
A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results allowed by the server should be returned. The number of records returned will never exceed the server’s own limit, defined by the max_get_records_size parameter in the server configuration. Use output parameter has_more_records to see if more records exist in the result to be fetched, and input parameter offset & input parameter limit to request subsequent pages of results. The default value is -9999.
- encoding (str) –
Specifies the encoding for returned records. Allowed values are:
binary – Indicates that the returned records should be binary encoded.
json – Indicates that the returned records should be json encoded.
The default value is ‘binary’.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of result_table. If result_table_persist is false (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_result_table_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema as part of result_table and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema which is to contain the table specified in result_table. If the schema provided is non-existent, it will be automatically created.
expression – Optional filter expression to apply to the table.
sort_order – String indicating how the returned values should be sorted. Allowed values are:
ascending
descending
The default value is ‘ascending’.
order_by – Comma-separated list of the columns to be sorted by as well as the sort direction, e.g., ‘timestamp asc, x desc’. The default value is ‘’.
result_table – The name of the table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If present, no results are returned in the response. Not available if input parameter column_name is an unrestricted-length string.
result_table_persist – If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:
true
false
The default value is ‘false’.
result_table_force_replicated – Force the result table to be replicated (ignores any sharding). Must be used in combination with the result_table option. Allowed values are:
true
false
The default value is ‘false’.
result_table_generate_pk – If true then set a primary key for the result table. Must be used in combination with the result_table option. Allowed values are:
true
false
The default value is ‘false’.
ttl – Sets the TTL of the table specified in result_table.
chunk_size – Indicates the number of records per chunk to be used for the result table. Must be used in combination with the result_table option.
chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for the result table. Must be used in combination with the result_table option.
chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for the result table. Must be used in combination with the result_table option.
view_id – ID of view of which the result table will be a member. The default value is ‘’.
The default value is an empty dict ( {} ).
- force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs, used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.
- get_column_major (bool) –
Indicates if the decoded records will be transposed to be column-major or returned as is (row-major). Default value is True.
Returns
A read-only GPUdbTable object if input options has “result_table”; otherwise the response from the server, which is a dict containing the following entries:
- table_name (str) –
The same table name as was passed in the parameter list.
- response_schema_str (str) –
Avro schema of output parameter binary_encoded_response or output parameter json_encoded_response.
- has_more_records (bool) –
Too many records. Returned a partial set.
- info (dict of str to str) –
Additional information. Allowed keys are:
qualified_result_table_name – The fully qualified name of the table (i.e. including the schema) used to store the results.
The default value is an empty dict ( {} ).
- records (list of Record) –
A list of Record objects which contain the decoded records.
- data (list of Record) –
A list of Record objects which contain the decoded records.
Raises
- GPUdbException –
Upon an error from the server.
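Paging with offset, limit, and has_more_records, as described for aggregate_unique above, follows a simple loop. The sketch below shows that loop against an in-memory stand-in so it can run without a server. The names iter_pages and make_fake_fetch are hypothetical, and the assumption that each page arrives under a "records" key should be checked against a real response (the Returns list above also mentions "data").

```python
def iter_pages(fetch_page, page_size):
    """Yield successive pages until the response reports no more records.

    fetch_page(offset, limit) must return a dict with "records" and
    "has_more_records" keys (an assumed response shape).
    """
    offset = 0
    while True:
        response = fetch_page(offset, page_size)
        records = response["records"]
        yield records
        # Stop when the server says we are done, or defensively on an empty page.
        if not response["has_more_records"] or not records:
            break
        offset += len(records)


def make_fake_fetch(values):
    """In-memory stand-in for a server call, used only for this demonstration."""
    def fetch(offset, limit):
        page = values[offset:offset + limit]
        return {"records": page, "has_more_records": offset + limit < len(values)}
    return fetch


pages = list(iter_pages(make_fake_fetch(list("abcdefg")), 3))
print(pages)

# Assumed real-world fetch, with `table` an existing GPUdbTable and a made-up column:
# fetch = lambda offset, limit: table.aggregate_unique(
#     column_name="city", offset=offset, limit=limit, get_column_major=False)
```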
- aggregate_unpivot(column_names=None, variable_column_name='', value_column_name='', pivoted_columns=None, encoding='binary', options={}, force_primitive_return_types=True, get_column_major=True)[source]
Rotates the column values into row values.
For unpivot details and examples, see Unpivot. For limitations, see Unpivot Limitations.
Unpivot is used to normalize tables that are built for cross tabular reporting purposes. The unpivot operator rotates the column values for all the pivoted columns. A variable column, value column and all columns from the source table except the unpivot columns are projected into the result table. The variable column and value columns in the result table indicate the pivoted column name and values respectively.
The response is returned as a dynamic schema. For details see: dynamic schemas documentation.
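To make the rotation concrete, here is a small client-side sketch of what unpivoting does to a list of rows. It only illustrates the semantics described above and is not the server's implementation; the function unpivot_rows, the sample data, and the output row order are all assumptions for illustration.

```python
def unpivot_rows(rows, column_names, pivoted_columns,
                 variable_column_name, value_column_name):
    """Turn each pivoted column of each row into its own output row."""
    result = []
    for row in rows:
        for pivoted in pivoted_columns:
            # Carry over the non-pivoted columns unchanged.
            out = {name: row[name] for name in column_names}
            # The variable column holds the pivoted column's name,
            # the value column holds that column's value.
            out[variable_column_name] = pivoted
            out[value_column_name] = row[pivoted]
            result.append(out)
    return result


source = [
    {"id": 1, "q1": 10, "q2": 20},
    {"id": 2, "q1": 30, "q2": 40},
]
unpivoted = unpivot_rows(source, ["id"], ["q1", "q2"], "quarter", "sales")
for r in unpivoted:
    print(r)
```

Two source rows with two pivoted columns produce four output rows, each carrying the id, the name of the quarter column it came from, and the corresponding value.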
Parameters
- column_names (list of str) –
List of column names or expressions. A wildcard ‘*’ can be used to include all the non-pivoted columns from the source table. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- variable_column_name (str) –
Specifies the variable/parameter column name. The default value is ‘’.
- value_column_name (str) –
Specifies the value column name. The default value is ‘’.
- pivoted_columns (list of str) –
List of one or more values typically the column names of the input table. All the columns in the source table must have the same data type. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- encoding (str) –
Specifies the encoding for returned records. Allowed values are:
binary – Indicates that the returned records should be binary encoded.
json – Indicates that the returned records should be json encoded.
The default value is ‘binary’.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of result_table. If result_table_persist is false (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_result_table_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema as part of result_table and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema which is to contain the table specified in result_table. If the schema is non-existent, it will be automatically created.
result_table – The name of a table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If present, no results are returned in the response.
result_table_persist – If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:
true
false
The default value is ‘false’.
expression – Filter expression to apply to the table prior to unpivot processing.
order_by – Comma-separated list of the columns to be sorted by; e.g. ‘timestamp asc, x desc’. The columns specified must be present in input table. If any alias is given for any column name, the alias must be used, rather than the original column name. The default value is ‘’.
chunk_size – Indicates the number of records per chunk to be used for the result table. Must be used in combination with the result_table option.
chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for the result table. Must be used in combination with the result_table option.
chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for the result table. Must be used in combination with the result_table option.
limit – The number of records to keep. The default value is ‘’.
ttl – Sets the TTL of the table specified in result_table.
view_id – ID of the view this result table is part of. The default value is ‘’.
create_indexes – Comma-separated list of columns on which to create indexes on the table specified in result_table. The columns specified must be present in output column names. If any alias is given for any column name, the alias must be used, rather than the original column name.
result_table_force_replicated – Force the result table to be replicated (ignores any sharding). Must be used in combination with the result_table option. Allowed values are:
true
false
The default value is ‘false’.
The default value is an empty dict ( {} ).
- force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs, used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.
- get_column_major (bool) –
Indicates if the decoded records will be transposed to be column-major or returned as is (row-major). Default value is True.
Returns
A read-only GPUdbTable object if input options has “result_table”; otherwise the response from the server, which is a dict containing the following entries:
- table_name (str) –
Typically shows the result-table name if provided in the request (Ignore otherwise).
- response_schema_str (str) –
Avro schema of output parameter binary_encoded_response or output parameter json_encoded_response.
- total_number_of_records (long) –
Total/Filtered number of records.
- has_more_records (bool) –
Too many records. Returned a partial set.
- info (dict of str to str) –
Additional information. Allowed keys are:
qualified_result_table_name – The fully qualified name of the table (i.e. including the schema) used to store the results.
The default value is an empty dict ( {} ).
- records (list of Record) –
A list of Record objects which contain the decoded records.
- data (list of Record) –
A list of Record objects which contain the decoded records.
Raises
- GPUdbException –
Upon an error from the server.
- alter_table(action=None, value=None, options={})[source]
Apply various modifications to a table or view. The available modifications include the following:
Manage a table’s columns–a column can be added, removed, or have its type and properties modified, including whether it is dictionary encoded or not.
External tables cannot be modified except for their refresh method.
Create or delete a column, low-cardinality index, chunk skip, geospatial, CAGRA, or HNSW index. This can speed up certain operations when using expressions containing equality or relational operators on indexed columns. This only applies to tables.
Create or delete a foreign key on a particular column.
Manage a range-partitioned or a manual list-partitioned table’s partitions.
Set (or reset) the tier strategy of a table or view.
Refresh and manage the refresh mode of a materialized view or an external table.
Set the time-to-live (TTL). This can be applied to tables or views.
Set the global access mode (i.e. locking) for a table. This setting trumps any role-based access controls that may be in place; e.g., a user with write access to a table marked read-only will not be able to insert records into it. The mode can be set to read-only, write-only, read/write, and no access.
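Each modification is expressed as an action string, a value string, and an options map. The sketch below assembles a few such triples using the action names and value formats given in the parameter descriptions for this method. The helpers foreign_key_value and refresh_time_value, and all table, column, and key names, are hypothetical; the commented call assumes `table` is an existing GPUdbTable of the appropriate kind for each action.

```python
from datetime import datetime


def foreign_key_value(source_columns, target_table, target_columns, name=None):
    """Format a create_foreign_key value:
    '(source [, ...]) references target_table(pk [, ...]) [as name]'."""
    value = "({}) references {}({})".format(
        ", ".join(source_columns), target_table, ", ".join(target_columns))
    if name:
        value += " as " + name
    return value


def refresh_time_value(moment):
    """Format a datetime as 'YYYY-MM-DD HH:MM:SS' for set_refresh_start_time."""
    return moment.strftime("%Y-%m-%d %H:%M:%S")


requests = [
    # TTL is given in minutes, passed as a string like every other value.
    ("ttl", "60", {}),
    # Column type and properties travel in options, not in value.
    ("add_column", "notes", {"column_type": "string",
                             "column_properties": "char4,dict"}),
    ("set_global_access_mode", "read_only", {}),
    ("create_foreign_key",
     foreign_key_value(["customer_id"], "customers", ["id"], "fk_customer"), {}),
    # Applies to a materialized view rather than a regular table.
    ("set_refresh_start_time",
     refresh_time_value(datetime(2024, 1, 15, 8, 30, 0)), {}),
]

for action, value, options in requests:
    print(action, "->", value, options)
    # table.alter_table(action=action, value=value, options=options)
```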
Parameters
- action (str) –
Modification operation to be applied. Allowed values are:
allow_homogeneous_tables – No longer supported; action will be ignored.
create_index – Creates a column (attribute) index, low-cardinality index, chunk skip index, geospatial index, CAGRA index, or HNSW index (depending on the specified index_type), on the column name specified in input parameter value. If this column already has the specified index, an error will be returned.
refresh_index – Refreshes an index identified by index_type, on the column name specified in input parameter value. Currently applicable only to CAGRA indices.
delete_index – Deletes a column (attribute) index, low-cardinality index, chunk skip index, geospatial index, CAGRA index, or HNSW index (depending on the specified index_type), on the column name specified in input parameter value. If this column does not have the specified index, an error will be returned.
move_to_collection – [DEPRECATED–please use move_to_schema and use GPUdb.create_schema() to create the schema if non-existent] Moves a table or view into a schema named input parameter value. If the schema provided is non-existent, it will be automatically created.
move_to_schema – Moves a table or view into a schema named input parameter value. If the schema provided is nonexistent, an error will be thrown. If input parameter value is empty, then the table or view will be placed in the user’s default schema.
protected – No longer used. Previously set whether the given input parameter table_name should be protected or not. The input parameter value would have been either ‘true’ or ‘false’.
rename_table – Renames a table or view to input parameter value. Has the same naming restrictions as tables.
ttl – Sets the time-to-live in minutes of the table or view specified in input parameter table_name.
add_comment – Adds the comment specified in input parameter value to the table specified in input parameter table_name. Use column_name to set the comment for a column.
add_column – Adds the column specified in input parameter value to the table specified in input parameter table_name. Use column_type and column_properties in input parameter options to set the column’s type and properties, respectively.
change_column – Changes type and properties of the column specified in input parameter value. Use column_type and column_properties in input parameter options to set the column’s type and properties, respectively. Note that primary key and/or shard key columns cannot be changed. All unchanging column properties must be listed for the change to take place, e.g., to add dictionary encoding to an existing ‘char4’ column, both ‘char4’ and ‘dict’ must be specified in the input parameter options map.
set_column_compression – No longer supported; action will be ignored.
delete_column – Deletes the column specified in input parameter value from the table specified in input parameter table_name.
create_foreign_key – Creates a foreign key specified in input parameter value using the format ‘(source_column_name [, …]) references target_table_name(primary_key_column_name [, …]) [as foreign_key_name]’.
delete_foreign_key – Deletes a foreign key. The input parameter value should be the foreign_key_name specified when creating the key or the complete string used to define it.
add_partition – Adds the partition specified in input parameter value, to either a range-partitioned or manual list-partitioned table.
remove_partition – Removes the partition specified in input parameter value (and relocates all of its data to the default partition) from either a range-partitioned or manual list-partitioned table.
delete_partition – Deletes the partition specified in input parameter value (and all of its data) from either a range-partitioned or manual list-partitioned table.
set_global_access_mode – Sets the global access mode (i.e. locking) for the table specified in input parameter table_name. Specify the access mode in input parameter value. Valid modes are ‘no_access’, ‘read_only’, ‘write_only’ and ‘read_write’.
refresh – For a materialized view, replays all the table creation commands required to create the view. For an external table, reloads all data in the table from its associated source files or data source.
set_refresh_method – For a materialized view, sets the method by which the view is refreshed to the method specified in input parameter value - one of ‘manual’, ‘periodic’, or ‘on_change’. For an external table, sets the method by which the table is refreshed to the method specified in input parameter value - either ‘manual’ or ‘on_start’.
set_refresh_start_time – Sets the time to start periodic refreshes of this materialized view to the datetime string specified in input parameter value with format ‘YYYY-MM-DD HH:MM:SS’. Subsequent refreshes occur at the specified time + N * the refresh period.
set_refresh_stop_time – Sets the time to stop periodic refreshes of this materialized view to the datetime string specified in input parameter value with format ‘YYYY-MM-DD HH:MM:SS’.
set_refresh_period – Sets the time interval in seconds at which to refresh this materialized view to the value specified in input parameter value. Also, sets the refresh method to periodic if not already set.
set_refresh_span – Sets the future time-offset (in seconds) at which the view refresh stops.
set_refresh_execute_as – Sets the user name used to refresh this materialized view to the value specified in input parameter value.
remove_text_search_attributes – Removes text search attribute from all columns.
remove_shard_keys – Removes the shard key property from all columns, so that the table will be considered randomly sharded. The data is not moved. The input parameter value is ignored.
set_strategy_definition – Sets the tier strategy for the table and its columns to the one specified in input parameter value, replacing the existing tier strategy in its entirety.
cancel_datasource_subscription – Permanently unsubscribe a data source that is loading continuously as a stream. The data source can be Kafka / S3 / Azure.
pause_datasource_subscription – Temporarily unsubscribe a data source that is loading continuously as a stream. The data source can be Kafka / S3 / Azure.
resume_datasource_subscription – Resubscribe to a paused data source subscription. The data source can be Kafka / S3 / Azure.
change_owner – Change the owner resource group of the table.
set_load_vectors_policy – Set startup data loading scheme for the table; see description of ‘load_vectors_policy’ in GPUdb.create_table() for possible values for input parameter value.
set_build_pk_index_policy – Set startup primary key generation scheme for the table; see description of ‘build_pk_index_policy’ in GPUdb.create_table() for possible values for input parameter value.
set_build_materialized_view_policy – Set startup rebuilding scheme for the materialized view; see description of ‘build_materialized_view_policy’ in GPUdb.create_materialized_view() for possible values for input parameter value.
- value (str) –
The value of the modification, depending on input parameter action. For example, if input parameter action is add_column, this would be the column name; while the column’s definition would be covered by the column_type, column_properties, column_default_value, and add_column_expression in input parameter options. If input parameter action is ttl, it would be the number of minutes for the new TTL. If input parameter action is refresh, this field would be blank.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
action
column_name
table_name
column_default_value – When adding a column, set a default value for existing records. For nullable columns, the default value will be null, regardless of data type.
column_properties – When adding or changing a column, set the column properties (strings, separated by a comma: data, store_only, text_search, char8, int8 etc).
column_type – When adding or changing a column, set the column type (strings, separated by a comma: int, double, string, null etc).
compression_type – No longer supported; option will be ignored. Allowed values are:
none
snappy
lz4
lz4hc
The default value is ‘snappy’.
copy_values_from_column – [DEPRECATED–please use add_column_expression instead.]
rename_column – When changing a column, specify new column name.
validate_change_column – When changing a column, validate the change before applying it (or not). Allowed values are:
true – Validate all values. A value too large (or too long) for the new type will prevent any change.
false – When a value is too large or long, it will be truncated.
The default value is ‘true’.
update_last_access_time – Indicates whether the time-to-live (TTL) expiration countdown timer should be reset to the table’s TTL. Allowed values are:
true – Reset the expiration countdown timer to the table’s configured TTL.
false – Don’t reset the timer; expiration countdown will continue from where it is, as if the table had not been accessed.
The default value is ‘true’.
add_column_expression – When adding a column, an optional expression to use for the new column’s values. Any valid expression may be used, including one containing references to existing columns in the same table.
strategy_definition – Optional parameter for specifying the tier strategy for the table and its columns when input parameter action is set_strategy_definition, replacing the existing tier strategy in its entirety.
index_type – Type of index to create, when input parameter action is create_index; to refresh, when input parameter action is refresh_index; or to delete, when input parameter action is delete_index. Allowed values are:
column – Create or delete a column (attribute) index.
low_cardinality – Create a low-cardinality column (attribute) index.
chunk_skip – Create or delete a chunk skip index.
geospatial – Create or delete a geospatial index
cagra – Create or delete a CAGRA index on a vector column
hnsw – Create or delete an HNSW index on a vector column
The default value is ‘column’.
index_options – Options to use when creating an index, in the format “key: value [, key: value [, …]]”. Valid options vary by index type.
The default value is an empty dict ( {} ).
Returns
The response from the server, which is a dict containing the following entries:
- table_name (str) –
Table on which the operation was performed.
- action (str) –
Modification operation that was performed.
- value (str) –
The value of the modification that was performed.
- type_id (str) –
return the type_id (when changing a table, a new type may be created)
- type_definition (str) –
return the type_definition (when changing a table, a new type may be created)
- properties (dict of str to lists of str) –
return the type properties (when changing a table, a new type may be created)
- label (str) –
return the type label (when changing a table, a new type may be created)
- info (dict of str to str) –
Additional information.
Raises
- GPUdbException –
Upon an error from the server.
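As a minimal sketch of how the action, value, and options arguments fit together, the helper below assembles the arguments for an add_column request. The helper name build_add_column_args and the column name are hypothetical and not part of the gpudb package; the sketch assumes alter_table accepts action, value, and options as keyword arguments.

```python
def build_add_column_args(column_name, column_type, properties=(), default=None):
    """Assemble keyword arguments for an add_column alteration.

    Hypothetical helper: it only builds plain Python values and never
    contacts a server.
    """
    options = {"column_type": column_type}
    if properties:
        # Column properties are passed as one comma-separated string.
        options["column_properties"] = ",".join(properties)
    if default is not None:
        # Option values are strings, so convert the default explicitly.
        options["column_default_value"] = str(default)
    return {"action": "add_column", "value": column_name, "options": options}


args = build_add_column_args("score", "int", properties=["int8"], default=0)
print(args)
# With an existing GPUdbTable object named `table` (an assumption here),
# the call would then look like: table.alter_table(**args)
```

Keeping the request as a plain dict makes it easy to log or inspect before it is sent.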
- alter_table_columns(column_alterations=None, options=None)[source]
Apply various modifications to columns in a table or view. The available modifications include the following:
Create or delete an index on a particular column. This can speed up certain operations when using expressions containing equality or relational operators on indexed columns. This only applies to tables.
Manage a table’s columns–a column can be added, removed, or have its type and properties modified, including whether it is dictionary encoded or not.
Parameters
- column_alterations (list of dicts of str to str) –
List of alter table add/delete/change column requests, all for the same table. Each request is a map that includes ‘column_name’, ‘action’ and the options specific to the action. Note that the options are the same as in alter table requests, but are given in the same map as the column name and the action. For example: [{'column_name':'col_1','action':'change_column','rename_column':'col_2'},{'column_name':'col_1','action':'add_column','type':'int','default_value':'1'}]. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- options (dict of str to str) –
Optional parameters.
Returns
The response from the server, which is a dict containing the following entries:
- table_name (str) –
Table on which the operation was performed.
- type_id (str) –
return the type_id (when changing a table, a new type may be created)
- type_definition (str) –
return the type_definition (when changing a table, a new type may be created)
- properties (dict of str to lists of str) –
return the type properties (when changing a table, a new type may be created)
- label (str) –
return the type label (when changing a table, a new type may be created)
- column_alterations (list of dicts of str to str) –
List of alter table add/delete/change column requests, all for the same table. Each request is a map that includes ‘column_name’, ‘action’ and the options specific to the action. Note that the options are the same as in alter table requests, but are given in the same map as the column name and the action. For example: [{'column_name':'col_1','action':'change_column','rename_column':'col_2'},{'column_name':'col_1','action':'add_column','type':'int','default_value':'1'}]
- info (dict of str to str) –
Additional information.
Raises
- GPUdbException –
Upon an error from the server.
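The request list described above can be sketched in plain Python as follows. The check_alterations helper is hypothetical and only checks the shape of each request (a column name, an action, and string values); the keys in the sample requests are copied from the example in the parameter description, and the server performs the real validation.

```python
REQUIRED_KEYS = ("column_name", "action")


def check_alterations(alterations):
    """Return the alterations as a list after basic shape checks.

    Hypothetical helper; it does not contact a server.
    """
    # Mirror the documented convenience: a single dict becomes a one-item list.
    if isinstance(alterations, dict):
        alterations = [alterations]
    for request in alterations:
        missing = [key for key in REQUIRED_KEYS if key not in request]
        if missing:
            raise ValueError(f"alteration {request!r} is missing {missing}")
        if not all(isinstance(value, str) for value in request.values()):
            raise TypeError("every value must be a str")
    return list(alterations)


column_alterations = check_alterations([
    {"column_name": "col_1", "action": "change_column", "rename_column": "col_2"},
    {"column_name": "col_1", "action": "add_column", "type": "int", "default_value": "1"},
])
# With an existing GPUdbTable object named `table` (an assumption here):
# response = table.alter_table_columns(column_alterations=column_alterations)
```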
- append_records(source_table_name=None, field_map=None, options={})[source]
Append (or insert) all records from a source table (specified by input parameter source_table_name) to a particular target table (specified by input parameter table_name). The field map (specified by input parameter field_map) holds the user specified map of target table column names with their mapped source column names.
Parameters
- source_table_name (str) –
The source table name to get records from, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table name.
- field_map (dict of str to str) –
Contains the mapping of column names from the target table (specified by input parameter table_name) as the keys, and corresponding column names or expressions (e.g., ‘col_name+1’) from the source table (specified by input parameter source_table_name). Must be existing column names in source table and target table, and their types must be matched. For details on using expressions, see Expressions.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
offset – A non-negative integer indicating the number of initial results to skip from input parameter source_table_name. The minimum allowed value is 0. The maximum allowed value is MAX_INT. The default value is ‘0’.
limit – A positive integer indicating the maximum number of results to be returned from input parameter source_table_name, or END_OF_SET (-9999) to indicate that the maximum number of results should be returned. The default value is ‘-9999’.
expression – Optional filter expression to apply to the input parameter source_table_name. The default value is ‘’.
order_by – Comma-separated list of the columns to be sorted by from source table (specified by input parameter source_table_name), e.g., ‘timestamp asc, x desc’. The order_by columns do not have to be present in input parameter field_map. The default value is ‘’.
update_on_existing_pk – Specifies the record collision policy for inserting source table records (specified by input parameter source_table_name) into a target table (specified by input parameter table_name) with a primary key. If set to true, any existing table record with primary key values that match those of a source table record being inserted will be replaced by that new record (the new data will be “upserted”). If set to false, any existing table record with primary key values that match those of a source table record being inserted will remain unchanged, while the source record will be rejected and an error handled as determined by ignore_existing_pk. If the specified table does not have a primary key, then this option has no effect. Allowed values are:
true – Upsert new records when primary keys match existing records
false – Reject new records when primary keys match existing records
The default value is ‘false’.
ignore_existing_pk – Specifies the record collision error-suppression policy for inserting source table records (specified by input parameter source_table_name) into a target table (specified by input parameter table_name) with a primary key, only used when not in upsert mode (upsert mode is disabled when update_on_existing_pk is false). If set to true, any source table record being inserted that is rejected for having primary key values that match those of an existing target table record will be ignored with no error generated. If false, the rejection of any source table record for having primary key values matching an existing target table record will result in an error being raised. If the specified table does not have a primary key or if upsert mode is in effect (update_on_existing_pk is true), then this option has no effect. Allowed values are:
true – Ignore source table records whose primary key values collide with those of target table records
false – Raise an error for any source table record whose primary key values collide with those of a target table record
The default value is ‘false’.
truncate_strings – If set to true, it allows inserting longer strings into smaller charN string columns by truncating the longer strings to fit. Allowed values are:
true
false
The default value is ‘false’.
The default value is an empty dict ( {} ).
Returns
The response from the server, which is a dict containing the following entries:
- table_name (str)
- info (dict of str to str) –
Additional information. The default value is an empty dict ( {} ).
Raises
- GPUdbException –
Upon an error from the server.
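Because options is a dict of str to str, Python booleans and integers have to be spelled as strings. The sketch below shows one way to do that and builds a field map with an expression as a source value. The to_option_strings helper, the column names, and the source table name are all hypothetical.

```python
def to_option_strings(**options):
    """Convert Python values to the str-to-str mapping the options expect."""
    converted = {}
    for key, value in options.items():
        if isinstance(value, bool):
            # Boolean options are spelled 'true' / 'false'.
            converted[key] = "true" if value else "false"
        else:
            converted[key] = str(value)
    return converted


# Keys are target-table columns; values are source columns or expressions.
field_map = {"id": "id", "total": "price * quantity"}

options = to_option_strings(
    expression="quantity > 0",   # filter applied to the source table
    order_by="id asc",
    limit=1000,
    update_on_existing_pk=True,  # upsert on primary key collisions
)
print(options)
# With an existing GPUdbTable object named `table` as the target (an assumption):
# table.append_records(source_table_name="example.orders",
#                      field_map=field_map, options=options)
```

With update_on_existing_pk set to true, ignore_existing_pk has no effect, so it is left out here.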
- clear_statistics(column_name='', options={})[source]
Clears statistics (cardinality, mean value, etc.) for a column in a specified table.
Parameters
- column_name (str) –
Name of the column in input parameter table_name for which to clear statistics. The column must be from an existing table. An empty string clears statistics for all columns in the table. The default value is ‘’.
- options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).
Returns
The response from the server, which is a dict containing the following entries:
- table_name (str) –
Value of input parameter table_name.
- column_name (str) –
Value of input parameter column_name.
- info (dict of str to str) –
Additional information.
Raises
- GPUdbException –
Upon an error from the server.
- clear(authorization='', options={})[source]
Clears (drops) one or all tables in the database cluster. The operation is synchronous, meaning that the table will be cleared before the function returns. The response payload returns the status of the operation along with the name of the table that was cleared.
Parameters
- authorization (str) –
No longer used. User can pass an empty string. The default value is ‘’.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
no_error_if_not_exists – If true and the table specified in input parameter table_name does not exist, no error is returned. If false and the table specified in input parameter table_name does not exist, an error is returned. Allowed values are:
true
false
The default value is ‘false’.
The default value is an empty dict ( {} ).
Returns
The response from the server, which is a dict containing the following entries:
- table_name (str) –
Value of input parameter table_name for a given table, or ‘ALL CLEARED’ in case of clearing all tables.
- info (dict of str to str) –
Additional information.
Raises
- GPUdbException –
Upon an error from the server.
- collect_statistics(column_names=None, options={})[source]
Collects statistics for one or more columns in a specified table.
Parameters
- column_names (list of str) –
List of one or more column names in input parameter table_name for which to collect statistics (cardinality, mean value, etc.). The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).
Returns
The response from the server, which is a dict containing the following entries:
- table_name (str) –
Value of input parameter table_name.
- column_names (list of str) –
Value of input parameter column_names.
- info (dict of str to str) –
Additional information.
Raises
- GPUdbException –
Upon an error from the server.
- create_projection(column_names=None, options={}, projection_name=None)[source]
Creates a new projection of an existing table. A projection represents a subset of the columns (potentially including derived columns) of a table.
For projection details and examples, see Projections. For limitations, see Projection Limitations and Cautions.
Window functions, which can perform operations like moving averages, are available through this endpoint as well as GPUdb.get_records_by_column().
A projection can be created with a different shard key than the source table. By specifying shard_key, the projection will be sharded according to the specified columns, regardless of how the source table is sharded. The source table can even be unsharded or replicated.
If input parameter table_name is empty, selection is performed against a single-row virtual table. This can be useful in executing temporal (NOW()), identity (USER()), or constant-based functions (GEODIST(-77.11, 38.88, -71.06, 42.36)).
Parameters
- column_names (list of str) –
List of columns from input parameter table_name to be included in the projection. Can include derived columns. Can be specified as aliased via the syntax ‘column_name as alias’. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter projection_name. If persist is false (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_projection_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the projection as part of input parameter projection_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the projection. If the schema is non-existent, it will be automatically created. The default value is ‘’.
expression – An optional filter expression to be applied to the source table prior to the projection. The default value is ‘’.
is_replicated – If true then the projection will be replicated even if the source table is not. Allowed values are:
true
false
The default value is ‘false’.
offset – The number of initial results to skip (this can be useful for paging through the results). The default value is ‘0’.
limit – The number of records to keep. The default value is ‘-9999’.
order_by – Comma-separated list of the columns to be sorted by; e.g. ‘timestamp asc, x desc’. The columns specified must be present in input parameter column_names. If any alias is given for any column name, the alias must be used, rather than the original column name. The default value is ‘’.
chunk_size – Indicates the number of records per chunk to be used for this projection.
chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for this projection.
chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for this projection.
create_indexes – Comma-separated list of columns on which to create indexes on the projection. The columns specified must be present in input parameter column_names. If any alias is given for any column name, the alias must be used, rather than the original column name.
ttl – Sets the TTL of the projection specified in input parameter projection_name.
shard_key – Comma-separated list of the columns to be sharded on; e.g. ‘column1, column2’. The columns specified must be present in input parameter column_names. If any alias is given for any column name, the alias must be used, rather than the original column name. The default value is ‘’.
persist – If true, then the projection specified in input parameter projection_name will be persisted and will not expire unless a ttl is specified. If false, then the projection will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:
true
false
The default value is ‘false’.
preserve_dict_encoding – If true, then columns that were dict encoded in the source table will be dict encoded in the projection. Allowed values are:
true
false
The default value is ‘true’.
retain_partitions – Determines whether the created projection will retain the partitioning scheme from the source table. Allowed values are:
true
false
The default value is ‘false’.
partition_type – Partitioning scheme to use. Allowed values are:
RANGE – Use range partitioning.
INTERVAL – Use interval partitioning.
LIST – Use list partitioning.
HASH – Use hash partitioning.
SERIES – Use series partitioning.
partition_keys – Comma-separated list of partition keys, which are the columns or column expressions by which records will be assigned to partitions defined by partition_definitions.
partition_definitions – Comma-separated list of partition definitions, whose format depends on the choice of partition_type. See range partitioning, interval partitioning, list partitioning, hash partitioning, or series partitioning for example formats.
is_automatic_partition – If true, a new partition will be created for values which don’t fall into an existing partition. Currently only supported for list partitions. Allowed values are:
true
false
The default value is ‘false’.
view_id – ID of view of which this projection is a member. The default value is ‘’.
strategy_definition – The tier strategy for the table and its columns.
join_window_functions – If set, window functions which require a reshard will be computed separately and joined back together, if the width of the projection is greater than the join_window_functions_threshold. The default value is ‘true’.
join_window_functions_threshold – If the projection is greater than this width (in bytes), then window functions which require a reshard will be computed separately and joined back together. The default value is ‘’.
The default value is an empty dict ( {} ).
- projection_name (str) –
Name of the projection to be created, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria.
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
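Several options above (order_by, create_indexes, shard_key) must refer to projection columns by their alias when one is given. The following sketch checks that rule locally before a request is sent. Both helpers and all column names are hypothetical, and the alias parsing is deliberately simple: it only recognizes a trailing "as alias".

```python
import re


def output_names(column_names):
    """Return the name each projection column is known by (alias if given)."""
    names = []
    for spec in column_names:
        # Recognize a trailing "as alias", case-insensitively.
        match = re.match(r"^(.*?)\s+as\s+(\w+)\s*$", spec, flags=re.IGNORECASE)
        names.append(match.group(2) if match else spec.strip())
    return names


def check_column_list(option_value, column_names):
    """Check a comma-separated option (e.g. order_by) against the projection."""
    known = set(output_names(column_names))
    for item in option_value.split(","):
        if not item.strip():
            continue
        # Drop an optional sort direction such as "asc" or "desc".
        name = item.strip().split()[0]
        if name not in known:
            raise ValueError(f"{name!r} is not a projection column or alias")


column_names = ["id", "price * quantity as total", "ts as event_time"]
options = {"order_by": "event_time asc, total desc", "shard_key": "id"}
check_column_list(options["order_by"], column_names)
check_column_list(options["shard_key"], column_names)
# With an existing GPUdbTable object named `table` (an assumption here):
# projection = table.create_projection(column_names=column_names, options=options)
```

Here order_by uses event_time rather than ts, because ts was given an alias.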
- create_table_monitor(options={})[source]
Creates a monitor that watches for a single table modification event type (insert, update, or delete) on a particular table (identified by input parameter table_name) and forwards event notifications to subscribers via ZMQ. After this call completes, subscribe to the returned output parameter topic_id on the ZMQ table monitor port (default 9002). Each time an operation of the given type on the table completes, a multipart message is published for that topic; the first part contains only the topic ID, and each subsequent part contains one binary-encoded Avro object that corresponds to the event and can be decoded using output parameter type_schema. The monitor will continue to run (regardless of whether or not there are any subscribers) until deactivated with GPUdb.clear_table_monitor().
For more information on table monitors, see Table Monitors.
Parameters
- options (dict of str to str) –
Optional parameters. Allowed keys are:
event – Type of modification event on the target table to be monitored by this table monitor. Allowed values are:
insert – Get notifications of new record insertions. The new row images are forwarded to the subscribers.
update – Get notifications of update operations. The modified row count information is forwarded to the subscribers.
delete – Get notifications of delete operations. The deleted row count information is forwarded to the subscribers.
The default value is ‘insert’.
monitor_id – ID to use for this monitor instead of a randomly generated one
datasink_name – Name of an existing data sink to send change data notifications to
destination – Destination for the output data in format ‘destination_type://path[:port]’. Supported destination types are ‘http’, ‘https’ and ‘kafka’.
kafka_topic_name – Name of the Kafka topic to publish to if destination in input parameter options is specified and is a Kafka broker
increasing_column – Column on subscribed table that will increase for new records (e.g., TIMESTAMP).
expression – Filter expression to limit records for notification
refresh_method – Method controlling when the table monitor reports changes to the input parameter table_name. Allowed values are:
on_change – Report changes as they occur.
periodic – Report changes periodically at rate specified by refresh_period.
The default value is ‘on_change’.
refresh_period – When refresh_method is periodic, specifies the period in seconds at which changes are reported.
refresh_start_time – When refresh_method is periodic, specifies the first time at which changes are reported. Value is a datetime string with format ‘YYYY-MM-DD HH:MM:SS’.
The default value is an empty dict ( {} ).
Returns
The response from the server, which is a dict containing the following entries:
- topic_id (str) –
The ZMQ topic ID to subscribe to for table events.
- table_name (str) –
Value of input parameter table_name.
- type_schema (str) –
JSON Avro schema of the table, for use in decoding published records.
- info (dict of str to str) –
Additional information. Allowed keys are:
ttl – For insert_table/delete_table events, the ttl of the table.
insert_topic_id – The topic id for ‘insert’ event in input parameter options
update_topic_id – The topic id for ‘update’ event in input parameter options
delete_topic_id – The topic id for ‘delete’ event in input parameter options
insert_type_schema – The JSON Avro schema of the table in output parameter table_name
update_type_schema – The JSON Avro schema for ‘update’ events
delete_type_schema – The JSON Avro schema for ‘delete’ events
The default value is an empty dict ( {} ).
Raises
- GPUdbException –
Upon an error from the server.
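The two pieces a caller typically assembles around a table monitor can be sketched as below: the options for a periodic monitor, and the split of a published multipart message into its topic ID and its Avro-encoded parts. Both helper names are hypothetical. The sketch assumes the topic ID part is UTF-8 text, and it leaves the ZMQ subscription and the Avro decoding (which would use type_schema) out of scope.

```python
from datetime import datetime


def periodic_monitor_options(event, period_seconds, start):
    """Build options for a monitor that reports changes periodically."""
    return {
        "event": event,
        "refresh_method": "periodic",
        "refresh_period": str(period_seconds),
        # The documented datetime format is 'YYYY-MM-DD HH:MM:SS'.
        "refresh_start_time": start.strftime("%Y-%m-%d %H:%M:%S"),
    }


def split_monitor_message(parts):
    """Split a multipart message into (topic_id, encoded_records).

    The first part holds only the topic ID; each later part is one
    binary-encoded Avro object, left undecoded here.
    """
    if not parts:
        raise ValueError("empty multipart message")
    # Assumption of this sketch: the topic ID is UTF-8 encoded text.
    return parts[0].decode("utf-8"), list(parts[1:])


options = periodic_monitor_options("insert", 60, datetime(2024, 1, 1, 0, 0, 0))
# Sample bytes standing in for a received message; not real Avro payloads.
topic_id, records = split_monitor_message([b"example-topic", b"\x02a", b"\x02b"])
# With an existing GPUdbTable object named `table` (an assumption here):
# response = table.create_table_monitor(options=options)
```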
- delete_records(expressions=None, options={})[source]
Deletes record(s) matching the provided criteria from the given table. The record selection criteria can be one or more input parameter expressions (matching multiple records), a single record identified by the record_id option, or all records when using delete_all_records. Note that the three selection criteria are mutually exclusive. This operation cannot be run on a view. The operation is synchronous, meaning that a response will not be available until the request is completely processed and all the matching records are deleted.
Parameters
- expressions (list of str) –
A list of the actual predicates, one for each select; format should follow the guidelines provided here. Specifying one or more input parameter expressions is mutually exclusive to specifying record_id in the input parameter options. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
global_expression – An optional global expression to reduce the search space of the input parameter expressions. The default value is ‘’.
record_id – A record ID identifying a single record, obtained at the time of insertion of the record or by calling GPUdb.get_records_from_collection() with the return_record_ids option. This option cannot be used to delete records from replicated tables.
delete_all_records – If set to true, all records in the table will be deleted. If set to false, then the option is effectively ignored. Allowed values are:
true
false
The default value is ‘false’.
The default value is an empty dict ( {} ).
Returns
The response from the server, which is a dict containing the following entries:
- count_deleted (long) –
Total number of records deleted across all expressions.
- counts_deleted (list of longs) –
Total number of records deleted per expression.
- info (dict of str to str) –
Additional information.
Raises
- GPUdbException –
Upon an error from the server.
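Since the three selection criteria are mutually exclusive, a small guard can reject ambiguous requests before they reach the server. This is a sketch only: build_delete_args is a hypothetical helper, and passing an empty expressions list alongside record_id or delete_all_records is an assumption rather than documented behavior.

```python
def build_delete_args(expressions=None, record_id=None, delete_all=False):
    """Return keyword arguments for delete_records using exactly one criterion."""
    chosen = [bool(expressions), record_id is not None, bool(delete_all)]
    if sum(chosen) != 1:
        raise ValueError("choose exactly one of expressions, record_id, delete_all")
    if expressions:
        # A single expression is accepted and wrapped in a list.
        if isinstance(expressions, str):
            expressions = [expressions]
        return {"expressions": list(expressions), "options": {}}
    if record_id is not None:
        # Assumption: an empty expressions list accompanies record_id.
        return {"expressions": [], "options": {"record_id": str(record_id)}}
    return {"expressions": [], "options": {"delete_all_records": "true"}}


args = build_delete_args(["x < 0", "y > 100"])
print(args)
# With an existing GPUdbTable object named `table` (an assumption here):
# response = table.delete_records(**args)
```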
- filter(expression=None, options={}, view_name='')[source]
Filters data based on the specified expression. The results are stored in a result set with the given input parameter view_name.
For details see Expressions.
The response message contains the number of points for which the expression evaluated to be true, which is equivalent to the size of the result view.
Parameters
- expression (str) –
The select expression to filter the specified table. For details see Expressions.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the newly created view. If the schema is non-existent, it will be automatically created.
view_id – view this filtered-view is part of. The default value is ‘’.
ttl – Sets the TTL of the view specified in input parameter view_name.
The default value is an empty dict ( {} ).
- view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
- filter_by_area(x_column_name=None, x_vector=None, y_column_name=None, y_vector=None, options={}, view_name='')[source]
Calculates which objects from a table are within a named area of interest (NAI/polygon). The operation is synchronous, meaning that a response will not be returned until all the matching objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input NAI restriction specification is created with the name input parameter view_name passed in as part of the input.
Parameters
- x_column_name (str) –
Name of the column containing the x values to be filtered.
- x_vector (list of floats) –
List of x coordinates of the vertices of the polygon representing the area to be filtered. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- y_column_name (str) –
Name of the column containing the y values to be filtered.
- y_vector (list of floats) –
List of y coordinates of the vertices of the polygon representing the area to be filtered. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the newly created view. If the schema provided is non-existent, it will be automatically created.
The default value is an empty dict ( {} ).
- view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
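The polygon is passed as two parallel lists, one of x coordinates and one of y coordinates. When the vertices are held as (x, y) pairs, they can be split as in the sketch below. The polygon_vectors helper, the sample coordinates, and the column names are hypothetical.

```python
def polygon_vectors(vertices):
    """Split (x, y) vertex pairs into the x_vector and y_vector lists."""
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    x_vector = [float(x) for x, _ in vertices]
    y_vector = [float(y) for _, y in vertices]
    return x_vector, y_vector


# A hypothetical rectangular area of interest given as (x, y) corner points.
vertices = [(-77.5, 38.5), (-76.5, 38.5), (-76.5, 39.5), (-77.5, 39.5)]
x_vector, y_vector = polygon_vectors(vertices)
# With an existing GPUdbTable object named `table` and hypothetical
# columns "lon" and "lat":
# view = table.filter_by_area(x_column_name="lon", x_vector=x_vector,
#                             y_column_name="lat", y_vector=y_vector)
```

Building both lists from one list of pairs keeps them the same length and in the same vertex order.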
- filter_by_area_geometry(column_name=None, x_vector=None, y_vector=None, options={}, view_name='')[source]
Calculates which geospatial geometry objects from a table intersect a named area of interest (NAI/polygon). The operation is synchronous, meaning that a response will not be returned until all the matching objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input NAI restriction specification is created with the name input parameter view_name passed in as part of the input.
Parameters
- column_name (str) –
Name of the geospatial geometry column to be filtered.
- x_vector (list of floats) –
List of x coordinates of the vertices of the polygon representing the area to be filtered. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- y_vector (list of floats) –
List of y coordinates of the vertices of the polygon representing the area to be filtered. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use
GPUdb.create_schema()
to create the schema if non-existent] The schema for the newly created view. If the schema is non-existent, it will be automatically created.
The default value is an empty dict ( {} ).
- view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
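Since x_vector and y_vector are parallel lists of vertex coordinates, it can be convenient to keep the polygon as (x, y) pairs and split it just before the call. The following is a minimal sketch, assuming `table` is an existing GPUdbTable; the helper `polygon_to_vectors`, the column name `geom`, and the view name are hypothetical.

```python
def polygon_to_vectors(vertices):
    """Split (x, y) vertex pairs into parallel x and y coordinate lists."""
    # Sanity check added for this sketch; not a documented server rule.
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    x_vector = [float(x) for x, _ in vertices]
    y_vector = [float(y) for _, y in vertices]
    return x_vector, y_vector


# Hypothetical area of interest.
area = [(-74.02, 40.70), (-73.93, 40.70), (-73.93, 40.80), (-74.02, 40.80)]
x_vector, y_vector = polygon_to_vectors(area)

# With a live connection, the call would look like:
# view = table.filter_by_area_geometry(
#     column_name="geom", x_vector=x_vector, y_vector=y_vector,
#     view_name="example_schema.area_view")
```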
- filter_by_box(x_column_name=None, min_x=None, max_x=None, y_column_name=None, min_y=None, max_y=None, options={}, view_name='')[source]
Calculates how many objects within the given table lie in a rectangular box. The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set which satisfies the input NAI restriction specification is also created when an input parameter view_name is passed in as part of the input payload.
Parameters
- x_column_name (str) –
Name of the column on which to perform the bounding box query. Must be a valid numeric column.
- min_x (float) –
Lower bound for the column chosen by input parameter x_column_name. Must be less than or equal to input parameter max_x.
- max_x (float) –
Upper bound for input parameter x_column_name. Must be greater than or equal to input parameter min_x.
- y_column_name (str) –
Name of a column on which to perform the bounding box query. Must be a valid numeric column.
- min_y (float) –
Lower bound for input parameter y_column_name. Must be less than or equal to input parameter max_y.
- max_y (float) –
Upper bound for input parameter y_column_name. Must be greater than or equal to input parameter min_y.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use
GPUdb.create_schema()
to create the schema if non-existent] Name of a schema for the newly created view. If the schema is non-existent, it will be automatically created.
The default value is an empty dict ( {} ).
- view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
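The bound-ordering rules above can be checked on the client before the request is sent. This is a sketch only, assuming `table` is an existing GPUdbTable; `box_filter_kwargs` and the column and view names are hypothetical.

```python
def box_filter_kwargs(x_column_name, min_x, max_x,
                      y_column_name, min_y, max_y, view_name=""):
    """Check the documented bound ordering and return keyword arguments."""
    if min_x > max_x:
        raise ValueError("min_x must be less than or equal to max_x")
    if min_y > max_y:
        raise ValueError("min_y must be less than or equal to max_y")
    return {
        "x_column_name": x_column_name, "min_x": min_x, "max_x": max_x,
        "y_column_name": y_column_name, "min_y": min_y, "max_y": max_y,
        "view_name": view_name,
    }


# Hypothetical column and view names.
kwargs = box_filter_kwargs("lon", -74.02, -73.93, "lat", 40.70, 40.80,
                           view_name="example_schema.box_view")

# With a live connection:
# view = table.filter_by_box(**kwargs)
```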
- filter_by_box_geometry(column_name=None, min_x=None, max_x=None, min_y=None, max_y=None, options={}, view_name='')[source]
Calculates which geospatial geometry objects from a table intersect a rectangular box. The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set which satisfies the input NAI restriction specification is also created when an input parameter view_name is passed in as part of the input payload.
Parameters
- column_name (str) –
Name of the geospatial geometry column to be filtered.
- min_x (float) –
Lower bound for the x-coordinate of the rectangular box. Must be less than or equal to input parameter max_x.
- max_x (float) –
Upper bound for the x-coordinate of the rectangular box. Must be greater than or equal to input parameter min_x.
- min_y (float) –
Lower bound for the y-coordinate of the rectangular box. Must be less than or equal to input parameter max_y.
- max_y (float) –
Upper bound for the y-coordinate of the rectangular box. Must be greater than or equal to input parameter min_y.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use
GPUdb.create_schema()
to create the schema if non-existent] Name of a schema for the newly created view. If the schema provided is non-existent, it will be automatically created.
The default value is an empty dict ( {} ).
- view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
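When the box is meant to enclose a set of known points, the four bounds can be derived rather than typed by hand. The sketch below assumes `table` is an existing GPUdbTable; `bounding_box` and the column name `geom` are hypothetical.

```python
def bounding_box(points):
    """Return the smallest axis-aligned box enclosing the (x, y) points."""
    if not points:
        raise ValueError("at least one point is required")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {"min_x": min(xs), "max_x": max(xs),
            "min_y": min(ys), "max_y": max(ys)}


box = bounding_box([(1.0, 5.0), (3.0, 2.0), (-2.0, 4.0)])
# box == {"min_x": -2.0, "max_x": 3.0, "min_y": 2.0, "max_y": 5.0}

# With a live connection:
# view = table.filter_by_box_geometry(column_name="geom", **box)
```

Because the bounds come from min() and max(), the required ordering (min_x at most max_x, min_y at most max_y) holds by construction.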
- filter_by_geometry(column_name=None, input_wkt='', operation=None, options={}, view_name='')[source]
Applies a geometry filter against a geospatial geometry column in a given table or view. The filtering geometry is provided by input parameter input_wkt.
Parameters
- column_name (str) –
Name of the column to be used in the filter. Must be a geospatial geometry column.
- input_wkt (str) –
A geometry in WKT format that will be used to filter the objects in input parameter table_name. The default value is ‘’.
- operation (str) –
The geometric filtering operation to perform. Allowed values are:
contains – Matches records that contain the given WKT in input parameter input_wkt, i.e. the given WKT is within the bounds of a record’s geometry.
crosses – Matches records that cross the given WKT.
disjoint – Matches records that are disjoint from the given WKT.
equals – Matches records that are the same as the given WKT.
intersects – Matches records that intersect the given WKT.
overlaps – Matches records that overlap the given WKT.
touches – Matches records that touch the given WKT.
within – Matches records that are within the given WKT.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use
GPUdb.create_schema()
to create the schema if non-existent] Name of a schema for the newly created view. If the schema provided is non-existent, it will be automatically created.
The default value is an empty dict ( {} ).
- view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
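A misspelled operation name would otherwise only surface as a server error, so it may be worth rejecting it locally. A minimal sketch, assuming `table` is an existing GPUdbTable; `geometry_filter_kwargs` and the column name `geom` are hypothetical, and the set of names is copied from the allowed values listed above.

```python
ALLOWED_OPERATIONS = frozenset({
    "contains", "crosses", "disjoint", "equals",
    "intersects", "overlaps", "touches", "within",
})


def geometry_filter_kwargs(column_name, input_wkt, operation, view_name=""):
    """Reject unknown operations locally, then return keyword arguments."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError("unsupported operation: %r" % (operation,))
    return {"column_name": column_name, "input_wkt": input_wkt,
            "operation": operation, "view_name": view_name}


# Hypothetical geometry column and a closed WKT polygon ring.
wkt = "POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))"
kwargs = geometry_filter_kwargs("geom", wkt, "within")

# With a live connection:
# view = table.filter_by_geometry(**kwargs)
```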
- filter_by_list(column_values_map=None, options={}, view_name='')[source]
Calculates which records from a table have values in the given list for the corresponding column. The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input filter specification is also created if an input parameter view_name is passed in as part of the request.
For example, if a type definition has the columns ‘x’ and ‘y’, then a filter by list query with the column map {“x”:[“10.1”, “2.3”], “y”:[“0.0”, “-31.5”, “42.0”]} will return the count of all data points whose x and y values match both in the respective x- and y-lists, e.g., “x = 10.1 and y = 0.0”, “x = 2.3 and y = -31.5”, etc. However, a record with “x = 10.1 and y = -31.5” or “x = 2.3 and y = 0.0” would not be returned because the values in the given lists do not correspond.
Parameters
- column_values_map (dict of str to lists of str) –
Map of column names to lists of values for the corresponding column in the table.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use
GPUdb.create_schema()
to create the schema if non-existent] Name of a schema for the newly created view. If the schema provided is non-existent, it will be automatically created.
filter_mode – String indicating the filter mode, either ‘in_list’ or ‘not_in_list’. Allowed values are:
in_list – The filter will match all items that are in the provided list(s).
not_in_list – The filter will match all items that are not in the provided list(s).
The default value is ‘in_list’.
The default value is an empty dict ( {} ).
- view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
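The map takes its values as strings, even for numeric columns, as in the example above. The following sketch converts ordinary Python values into that shape; it assumes `table` is an existing GPUdbTable, and `to_column_values_map` is a hypothetical helper.

```python
def to_column_values_map(values_by_column):
    """Convert {column: iterable of values} into {column: list of str}."""
    return {column: [str(value) for value in values]
            for column, values in values_by_column.items()}


column_values_map = to_column_values_map({
    "x": [10.1, 2.3],
    "y": [0.0, -31.5, 42.0],
})
# column_values_map == {"x": ["10.1", "2.3"], "y": ["0.0", "-31.5", "42.0"]}

# Invert the match with the documented filter_mode option.
options = {"filter_mode": "not_in_list"}

# With a live connection:
# view = table.filter_by_list(column_values_map=column_values_map,
#                             options=options)
```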
- filter_by_radius(x_column_name=None, x_center=None, y_column_name=None, y_center=None, radius=None, options={}, view_name='')[source]
Calculates which objects from a table lie within a circle with the given radius and center point (i.e. circular NAI). The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input circular NAI restriction specification is also created if an input parameter view_name is passed in as part of the request.
For track data, all track points that lie within the circle plus one point on either side of the circle (if the track goes beyond the circle) will be included in the result.
Parameters
- x_column_name (str) –
Name of the column to be used for the x-coordinate (the longitude) of the center.
- x_center (float) –
Value of the longitude of the center. Must be within [-180.0, 180.0]. The minimum allowed value is -180. The maximum allowed value is 180.
- y_column_name (str) –
Name of the column to be used for the y-coordinate (the latitude) of the center.
- y_center (float) –
Value of the latitude of the center. Must be within [-90.0, 90.0]. The minimum allowed value is -90. The maximum allowed value is 90.
- radius (float) –
The radius of the circle within which the search will be performed. Must be a non-zero positive value. It is in meters; so, for example, a value of ‘42000’ means 42 km. The minimum allowed value is 0. The maximum allowed value is MAX_INT.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use
GPUdb.create_schema()
to create the schema if non-existent] Name of a schema which is to contain the newly created view. If the schema is non-existent, it will be automatically created.
The default value is an empty dict ( {} ).
- view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
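Because the radius is given in meters while distances are often quoted in kilometers, a small wrapper can handle the conversion and the documented coordinate ranges together. This is a sketch under assumptions: `table` is an existing GPUdbTable, and `radius_filter_kwargs` and the column names are hypothetical.

```python
def radius_filter_kwargs(x_column_name, x_center, y_column_name, y_center,
                         radius_km, view_name=""):
    """Validate the documented ranges and convert the radius to meters."""
    if not -180.0 <= x_center <= 180.0:
        raise ValueError("x_center (longitude) must be within [-180, 180]")
    if not -90.0 <= y_center <= 90.0:
        raise ValueError("y_center (latitude) must be within [-90, 90]")
    if radius_km <= 0:
        raise ValueError("radius must be positive")
    return {"x_column_name": x_column_name, "x_center": x_center,
            "y_column_name": y_column_name, "y_center": y_center,
            "radius": radius_km * 1000.0, "view_name": view_name}


kwargs = radius_filter_kwargs("lon", -73.98, "lat", 40.75, radius_km=42)
# kwargs["radius"] == 42000.0, i.e. 42 km expressed in meters

# With a live connection:
# view = table.filter_by_radius(**kwargs)
```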
- filter_by_radius_geometry(column_name=None, x_center=None, y_center=None, radius=None, options={}, view_name='')[source]
Calculates which geospatial geometry objects from a table intersect a circle with the given radius and center point (i.e. circular NAI). The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input circular NAI restriction specification is also created if an input parameter view_name is passed in as part of the request.
Parameters
- column_name (str) –
Name of the geospatial geometry column to be filtered.
- x_center (float) –
Value of the longitude of the center. Must be within [-180.0, 180.0]. The minimum allowed value is -180. The maximum allowed value is 180.
- y_center (float) –
Value of the latitude of the center. Must be within [-90.0, 90.0]. The minimum allowed value is -90. The maximum allowed value is 90.
- radius (float) –
The radius of the circle within which the search will be performed. Must be a non-zero positive value. It is in meters; so, for example, a value of ‘42000’ means 42 km. The minimum allowed value is 0. The maximum allowed value is MAX_INT.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use
GPUdb.create_schema()
to create the schema if non-existent] Name of a schema for the newly created view. If the schema provided is non-existent, it will be automatically created.
The default value is an empty dict ( {} ).
- view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
- filter_by_range(column_name=None, lower_bound=None, upper_bound=None, options={}, view_name='')[source]
Calculates which objects from a table have a column that is within the given bounds. An object from the table identified by input parameter table_name is added to the view input parameter view_name if its column is within [input parameter lower_bound, input parameter upper_bound] (inclusive). The operation is synchronous. The response provides a count of the number of objects which passed the bound filter. Although this functionality can also be accomplished with the standard filter function, this method is more efficient.
For track objects, the count reflects how many points fall within the given bounds (which may not include all the track points of any given track).
Parameters
- column_name (str) –
Name of a column on which the operation would be applied.
- lower_bound (float) –
Value of the lower bound (inclusive).
- upper_bound (float) –
Value of the upper bound (inclusive).
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use
GPUdb.create_schema()
to create the schema if non-existent] Name of a schema for the newly created view. If the schema is non-existent, it will be automatically created.
The default value is an empty dict ( {} ).
- view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
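Both bounds are inclusive, which matters when a value sits exactly on an endpoint. The snippet below only illustrates that rule locally in plain Python; it is not the server's implementation. The column name `speed` in the commented call is hypothetical, and `table` is assumed to be an existing GPUdbTable.

```python
def within_bounds(value, lower_bound, upper_bound):
    """Inclusive check: both endpoints count as matches."""
    return lower_bound <= value <= upper_bound


speeds = [1.0, 2.5, 4.0, 5.0, 7.5]
kept = [v for v in speeds if within_bounds(v, 2.5, 5.0)]
# kept == [2.5, 4.0, 5.0]

# The equivalent request with a live connection:
# view = table.filter_by_range(column_name="speed",
#                              lower_bound=2.5, upper_bound=5.0)
```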
- filter_by_series(track_id=None, target_track_ids=None, options={}, view_name='')[source]
Filters objects matching all points of the given track (works only on track type data). It allows users to specify a particular track to find all other points in the table that fall within specified ranges (spatial and temporal) of all points of the given track. Additionally, the user can specify another track to see if the two intersect (or go close to each other within the specified ranges). The user also has the flexibility of using different metrics for the spatial distance calculation: Euclidean (flat geometry) or Great Circle (spherical geometry to approximate the Earth’s surface distances). The filtered points are stored in a newly created result set. The return value of the function is the number of points in the resultant set (view).
This operation is synchronous, meaning that a response will not be returned until all the objects are fully available.
Parameters
- track_id (str) –
The ID of the track which will act as the filtering points. Must be an existing track within the given table.
- target_track_ids (list of str) –
Up to one track ID to intersect with the “filter” track. If one is provided, it must be a valid track ID within the given set. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use
GPUdb.create_schema()
to create the schema if non-existent] Name of a schema for the newly created view. If the schema is non-existent, it will be automatically created.
spatial_radius – A positive number passed as a string representing the radius of the search area centered around each track point’s geospatial coordinates. The value is interpreted in meters. Required parameter. The minimum allowed value is ‘0’.
time_radius – A positive number passed as a string representing the maximum allowable time difference between the timestamps of a filtered object and the given track’s points. The value is interpreted in seconds. Required parameter. The minimum allowed value is ‘0’.
spatial_distance_metric – A string representing the coordinate system to use for the spatial search criteria. Acceptable values are ‘euclidean’ and ‘great_circle’. Optional parameter; default is ‘euclidean’. Allowed values are:
euclidean
great_circle
The default value is an empty dict ( {} ).
- view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
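The two radii are required and must be passed as strings, which is easy to get wrong when they start out as numbers. A minimal sketch of building the options dict, assuming `table` is an existing GPUdbTable; `series_options` and the track ID are hypothetical.

```python
def series_options(spatial_radius_m, time_radius_s, metric="euclidean"):
    """Build the options dict with the required radii encoded as strings."""
    if metric not in ("euclidean", "great_circle"):
        raise ValueError("unknown spatial_distance_metric: %r" % (metric,))
    if spatial_radius_m < 0 or time_radius_s < 0:
        raise ValueError("radii must not be negative")
    return {
        "spatial_radius": str(spatial_radius_m),   # meters
        "time_radius": str(time_radius_s),         # seconds
        "spatial_distance_metric": metric,
    }


options = series_options(500, 60, metric="great_circle")

# With a live connection:
# view = table.filter_by_series(track_id="track_42", target_track_ids=[],
#                               options=options)
```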
- filter_by_string(expression=None, mode=None, column_names=None, options={}, view_name='')[source]
Calculates which objects from a table or view match a string expression for the given string columns. Setting case_sensitive can modify case sensitivity in matching for all modes except search. For search mode details and limitations, see Full Text Search.
Parameters
- expression (str) –
The expression with which to filter the table.
- mode (str) –
The string filtering mode to apply. See below for details. Allowed values are:
search – Full text search query with wildcards and boolean operators. Note that for this mode, no column can be specified in input parameter column_names; all string columns of the table that have text search enabled will be searched.
equals – Exact whole-string match (accelerated).
contains – Partial substring match (not accelerated). If the column is a string type (non-charN) and the number of records is too large, it will return 0.
starts_with – Strings that start with the given expression (not accelerated). If the column is a string type (non-charN) and the number of records is too large, it will return 0.
regex – Full regular expression search (not accelerated). If the column is a string type (non-charN) and the number of records is too large, it will return 0.
- column_names (list of str) –
List of columns on which to apply the filter. Ignored for search mode. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use
GPUdb.create_schema()
to create the schema if non-existent] Name of a schema for the newly created view. If the schema is non-existent, it will be automatically created.
case_sensitive – If false then string filtering will ignore case. Does not apply to search mode. Allowed values are:
true
false
The default value is ‘true’.
The default value is an empty dict ( {} ).
- view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
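The search mode differs from the other modes in two ways: it takes no column names and it ignores case_sensitive. The sketch below encodes those two rules; it assumes `table` is an existing GPUdbTable, and `string_filter_kwargs` and the column name `city` are hypothetical.

```python
MODES = ("search", "equals", "contains", "starts_with", "regex")


def string_filter_kwargs(expression, mode, column_names=(),
                         case_sensitive=True):
    """Build keyword arguments, dropping what search mode does not use."""
    if mode not in MODES:
        raise ValueError("unknown mode: %r" % (mode,))
    if mode == "search":
        # Search mode covers every text-search-enabled string column.
        columns, options = [], {}
    else:
        columns = list(column_names)
        options = {"case_sensitive": "true" if case_sensitive else "false"}
    return {"expression": expression, "mode": mode,
            "column_names": columns, "options": options}


kwargs = string_filter_kwargs("new york", "contains", ["city"],
                              case_sensitive=False)

# With a live connection:
# view = table.filter_by_string(**kwargs)
```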
- filter_by_table(column_name=None, source_table_name=None, source_table_column_name=None, options={}, view_name='')[source]
Filters objects in one table based on objects in another table. The user must specify matching column types from the two tables (i.e. the target table from which objects will be filtered and the source table based on which the filter will be created); the column names need not be the same. If an input parameter view_name is specified, then the filtered objects will be put in a newly created view. The operation is synchronous, meaning that a response will not be returned until all objects are fully available in the result view. The return value contains the count (i.e. the size) of the resulting view.
Parameters
- column_name (str) –
Name of the column by whose value the data will be filtered from the table designated by input parameter table_name.
- source_table_name (str) –
Name of the table whose data will be compared against the data in the table designated by input parameter table_name, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table.
- source_table_column_name (str) –
Name of the column in the input parameter source_table_name whose values will be used as the filter for table input parameter table_name. Must be a geospatial geometry column if in ‘spatial’ mode; otherwise, must match the type of the input parameter column_name.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use
GPUdb.create_schema()
to create the schema if non-existent] Name of a schema for the newly created view. If the schema is non-existent, it will be automatically created.
filter_mode – String indicating the filter mode, either in_table or not_in_table. Allowed values are:
in_table
not_in_table
The default value is ‘in_table’.
mode – Mode - should be either spatial or normal. Allowed values are:
normal
spatial
The default value is ‘normal’.
buffer – Buffer size, in meters. Only relevant for spatial mode. The default value is ‘0’.
buffer_method – Method used to buffer polygons. Only relevant for spatial mode. Allowed values are:
normal
geos – Use geos 1 edge per corner algorithm
The default value is ‘normal’.
max_partition_size – Maximum number of points in a partition. Only relevant for spatial mode. The default value is ‘0’.
max_partition_score – Maximum number of points * edges in a partition. Only relevant for spatial mode. The default value is ‘8000000’.
x_column_name – Name of column containing x value of point being filtered in spatial mode. The default value is ‘x’.
y_column_name – Name of column containing y value of point being filtered in spatial mode. The default value is ‘y’.
The default value is an empty dict ( {} ).
- view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
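Several of these options only make sense in spatial mode, so it can help to assemble the dict through one function that checks the combination. This is a sketch only, assuming `table` is an existing GPUdbTable; `table_filter_options` and every table and column name in the commented call are hypothetical.

```python
def table_filter_options(filter_mode="in_table", mode="normal", buffer_m=None):
    """Build an options dict, keeping spatial-only keys out of normal mode."""
    if filter_mode not in ("in_table", "not_in_table"):
        raise ValueError("unknown filter_mode: %r" % (filter_mode,))
    if mode not in ("normal", "spatial"):
        raise ValueError("unknown mode: %r" % (mode,))
    options = {"filter_mode": filter_mode, "mode": mode}
    if buffer_m is not None:
        if mode != "spatial":
            raise ValueError("buffer is only relevant for spatial mode")
        options["buffer"] = str(buffer_m)  # option values are strings
    return options


# Keep only the rows whose vendor_id does NOT appear in the source table.
options = table_filter_options(filter_mode="not_in_table")

# With a live connection:
# view = table.filter_by_table(
#     column_name="vendor_id",
#     source_table_name="example_schema.blocked_vendors",
#     source_table_column_name="id",
#     options=options)
```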
- filter_by_value(is_string=None, value=0, value_str='', column_name=None, options={}, view_name='')[source]
Calculates which objects from a table have a particular value for a particular column. The input parameters provide a way to specify either a String or a Double valued column and a desired value for the column on which the filter is performed. The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new result view which satisfies the input filter restriction specification is also created with a view name passed in as part of the input payload. Although this functionality can also be accomplished with the standard filter function, this method is more efficient.
Parameters
- is_string (bool) –
Indicates whether the value being searched for is string or numeric.
- value (float) –
The value to search for. The default value is 0.
- value_str (str) –
The string value to search for. The default value is ‘’.
- column_name (str) –
Name of a column on which the filter by value would be applied.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:
true
false
The default value is ‘false’.
collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use
GPUdb.create_schema()
to create the schema if non-existent] Name of a schema for the newly created view. If the schema is non-existent, it will be automatically created.
The default value is an empty dict ( {} ).
- view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.
Returns
A read-only GPUdbTable object.
Raises
- GPUdbException –
Upon an error from the server.
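Since is_string decides whether value or value_str is used, the three arguments can be derived from a single Python value. A minimal sketch, assuming `table` is an existing GPUdbTable; `value_filter_kwargs` and the column names are hypothetical.

```python
def value_filter_kwargs(column_name, value, view_name=""):
    """Route a Python value to value_str (strings) or value (numbers)."""
    if isinstance(value, str):
        return {"is_string": True, "value": 0, "value_str": value,
                "column_name": column_name, "view_name": view_name}
    return {"is_string": False, "value": float(value), "value_str": "",
            "column_name": column_name, "view_name": view_name}


by_name = value_filter_kwargs("vendor", "ACME")
by_number = value_filter_kwargs("passenger_count", 2)

# With a live connection:
# view = table.filter_by_value(**by_number)
```

The unused argument is left at its documented default (0 for value, an empty string for value_str).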
- lock_table(lock_type='status', options={})[source]
Manages global access to a table’s data. By default, a table has an input parameter lock_type of read_write, indicating all operations are permitted. A user may request a read_only or a write_only lock, after which only read or write operations, respectively, are permitted on the table until the lock is removed. When input parameter lock_type is no_access then no operations are permitted on the table. The lock status can be queried by setting input parameter lock_type to status.
Parameters
- lock_type (str) –
The type of lock being applied to the table. Setting it to status will return the current lock status of the table without changing it. Allowed values are:
status – Show locked status
no_access – Allow no read/write operations
read_only – Allow only read operations
write_only – Allow only write operations
read_write – Allow all read/write operations
The default value is ‘status’.
- options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).
Returns
The response from the server, which is a dict containing the following entries:
- lock_type (str) –
Returns the lock state of the table.
- info (dict of str to str) –
Additional information.
Raises
- GPUdbException –
Upon an error from the server.
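A common pattern is to apply a lock for the duration of one task and then put the previous state back. The sketch below wraps that in a context manager. It assumes the lock_type value returned by a status query can be passed back as input parameter lock_type; `temporary_lock` is a hypothetical helper, and `_StandInTable` is an in-memory stand-in used only so the sketch runs without a server.

```python
from contextlib import contextmanager


@contextmanager
def temporary_lock(table, lock_type="read_only"):
    """Apply lock_type inside a with-block, then restore the prior state."""
    previous = table.lock_table(lock_type="status")["lock_type"]
    table.lock_table(lock_type=lock_type)
    try:
        yield table
    finally:
        table.lock_table(lock_type=previous)


class _StandInTable:
    """Mimics only the documented lock_table request and response shape."""

    def __init__(self):
        self.state = "read_write"

    def lock_table(self, lock_type="status", options=None):
        if lock_type != "status":
            self.state = lock_type
        return {"lock_type": self.state, "info": {}}


table = _StandInTable()  # replace with a real GPUdbTable in practice
with temporary_lock(table, "read_only"):
    during = table.lock_table()["lock_type"]
after = table.lock_table()["lock_type"]
```

The try/finally ensures the earlier lock state is restored even if the body of the with-block raises.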
- show_table(options={})[source]
Retrieves detailed information about a table, view, or schema, specified in input parameter table_name. If the supplied input parameter table_name is a schema the call can return information about either the schema itself or the tables and views it contains. If input parameter table_name is empty, information about all schemas will be returned.
If the option get_sizes is set to true, then the number of records in each table is returned (in output parameter sizes and output parameter full_sizes), along with the total number of objects across all requested tables (in output parameter total_size and output parameter total_full_size).
For a schema, setting the show_children option to false returns only information about the schema itself; setting show_children to true returns a list of tables and views contained in the schema, along with their corresponding detail.
To retrieve a list of every table, view, and schema in the database, set input parameter table_name to ‘*’ and show_children to true. When doing this, the returned output parameter total_size and output parameter total_full_size will not include the sizes of non-base tables (e.g., filters, views, joins, etc.).
Parameters
- options (dict of str to str) –
Optional parameters. Allowed keys are:
dependencies – Include view dependencies in the output. Allowed values are:
true
false
The default value is ‘false’.
force_synchronous – If true, then the request for table sizes will wait for a read lock before returning. Allowed values are:
true
false
The default value is ‘true’.
get_cached_sizes – If true then the number of records in each table, along with a cumulative count, will be returned; blank, otherwise. This version will return the sizes cached at rank 0, which may be stale if there is a multihead insert occurring. Allowed values are:
true
false
The default value is ‘false’.
get_sizes – If true then the number of records in each table, along with a cumulative count, will be returned; blank, otherwise. Allowed values are:
true
false
The default value is ‘false’.
no_error_if_not_exists – If false, an error will be returned if the provided input parameter table_name does not exist. If true, an empty result will be returned instead. Allowed values are:
true
false
The default value is ‘false’.
show_children – If input parameter table_name is a schema, then true will return information about the tables and views in the schema, and false will return information about the schema itself. If input parameter table_name is a table or view, show_children must be false. If input parameter table_name is empty, then show_children must be true. Allowed values are:
true
false
The default value is ‘true’.
get_column_info – If true then column info (memory usage, etc) will be returned. Allowed values are:
true
false
The default value is ‘false’.
The default value is an empty dict ( {} ).
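Every option above takes the strings 'true' or 'false' rather than Python booleans. A minimal sketch of a converter follows; show_table_options and SHOW_TABLE_FLAGS are hypothetical names introduced for illustration, and only the option keys themselves come from the list above.

```python
# Option keys copied from the list above; the helper itself is hypothetical.
SHOW_TABLE_FLAGS = frozenset([
    "dependencies", "force_synchronous", "get_cached_sizes", "get_sizes",
    "no_error_if_not_exists", "show_children", "get_column_info",
])


def show_table_options(**flags):
    """Turn Python booleans into the 'true'/'false' strings used by options."""
    unknown = sorted(set(flags) - SHOW_TABLE_FLAGS)
    if unknown:
        raise KeyError("unknown show_table option(s): " + ", ".join(unknown))
    return {key: "true" if value else "false" for key, value in flags.items()}


# Usage against a live table (not executed here):
#   response = table.show_table(options=show_table_options(get_sizes=True))
```

Keys that are left out simply fall back to the documented defaults, so the helper only emits what the caller sets explicitly.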
Returns
The response from the server, which is a dict containing the following entries:
- table_name (str) –
Value of input parameter table_name.
- table_names (list of str) –
If input parameter table_name is a table or view, then the single element of the array is input parameter table_name. If input parameter table_name is a schema and show_children is set to true, then this array is populated with the names of all tables and views in the given schema; if show_children is false, then this array will only include the schema name itself. If input parameter table_name is an empty string, then the array contains the names of all tables in the user’s default schema.
- table_descriptions (list of lists of str) –
List of descriptions for the respective tables in output parameter table_names. Allowed values are:
COLLECTION
JOIN
LOGICAL_EXTERNAL_TABLE
LOGICAL_VIEW
MATERIALIZED_EXTERNAL_TABLE
MATERIALIZED_VIEW
MATERIALIZED_VIEW_MEMBER
MATERIALIZED_VIEW_UNDER_CONSTRUCTION
REPLICATED
RESULT_TABLE
SCHEMA
VIEW
- type_ids (list of str) –
Type ids of the respective tables in output parameter table_names.
- type_schemas (list of str) –
Type schemas of the respective tables in output parameter table_names.
- type_labels (list of str) –
Type labels of the respective tables in output parameter table_names.
- properties (list of dicts of str to lists of str) –
Property maps of the respective tables in output parameter table_names.
- additional_info (list of dicts of str to str) –
Additional information about the respective tables in output parameter table_names. Allowed keys are:
request_avro_type – Method by which this table was created. Allowed values are:
create_table
create_projection
create_union
request_avro_json – The JSON representation of request creating this table. The default value is ‘’.
protected – No longer used. Indicated whether the respective table was protected or not. Allowed values are:
true
false
record_bytes – The number of in-memory bytes per record which is the sum of the byte sizes of all columns with property ‘data’.
total_bytes – The total size in bytes of all data stored in the table.
collection_names – [DEPRECATED–use schema_name instead] This will now contain the name of the schema for the table. There can only be one schema for a table.
schema_name – The name of the schema for the table. There can only be one schema for a table.
table_ttl – The value of the time-to-live setting. Not present for schemas.
remaining_table_ttl – The remaining time-to-live, in minutes, before the respective table expires (-1 if it will never expire). Not present for schemas.
primary_key_type – The primary key type of the table (if it has a primary key). Allowed values are:
memory – In-memory primary key
disk – On-disk primary key
foreign_keys – Semicolon-separated list of foreign keys, of the format ‘source_column references target_table(primary_key_column)’. Not present for schemas. The default value is ‘’.
foreign_shard_key – Foreign shard key description of the format: <fk_foreign_key> references <pk_column_name> from <pk_table_name>(<pk_primary_key>). Not present for schemas. The default value is ‘’.
partition_type – Partitioning scheme used for this table. Allowed values are:
RANGE – Using range partitioning
INTERVAL – Using interval partitioning
LIST – Using manual list partitioning
HASH – Using hash partitioning.
SERIES – Using series partitioning.
NONE – Using no partitioning
The default value is ‘NONE’.
partition_keys – Comma-separated list of partition keys. The default value is ‘’.
partition_definitions – Comma-separated list of partition definitions, whose format depends on the partition_type. See partitioning documentation for details. The default value is ‘’.
is_automatic_partition – True if partitions will be created for LIST VALUES which don’t fall into existing partitions. The default value is ‘’.
attribute_indexes – Semicolon-separated list of indexes. For column (attribute) indexes, only the indexed column name will be listed. For other index types, the index type will be listed with the colon-delimited indexed column(s) and the comma-delimited index option(s) using the form: <index_type>@<column_list>@<column_options>. Not present for schemas. The default value is ‘’.
compressed_columns – No longer supported. The default value is ‘’.
column_info – JSON-encoded string representing a map of column name to information including memory usage if the get_column_info option is true. The default value is ‘’.
global_access_mode – Returns the global access mode (i.e. lock status) for the table. Allowed values are:
no_access – No read/write operations are allowed on this table.
read_only – Only read operations are allowed on this table.
write_only – Only write operations are allowed on this table.
read_write – All read/write operations are allowed on this table.
view_table_name – For a materialized view, the name of the view this member table is part of; if it is the same as the table_name, then this is the root of the view. The default value is ‘’.
is_view_persisted – True if the view named view_table_name is persisted; reported for each view member. This means the method of recreating this member is saved, not the member’s data. The default value is ‘’.
is_dirty – True if some input table of the materialized view that affects this member table has been modified since the last refresh. The default value is ‘’.
refresh_method – For a materialized view, the current refresh_method: one of manual, periodic, on_change. The default value is ‘’.
refresh_start_time – For a materialized view with periodic refresh_method, the current initial datetime string at which periodic refreshes began. The default value is ‘’.
refresh_stop_time – Time at which the periodic view refresh stops. The default value is ‘’.
refresh_period – For a materialized view with periodic refresh_method, the current refresh period in seconds. The default value is ‘’.
last_refresh_time – For a materialized view, a datetime string indicating the last time the view was refreshed. The default value is ‘’.
next_refresh_time – For a materialized view with periodic refresh_method, a datetime string indicating the next time the view is to be refreshed. The default value is ‘’.
user_chunk_size – User-specified number of records per chunk, if provided at table creation time. The default value is ‘’.
user_chunk_column_max_memory – User-specified target max bytes per column in a chunk, if provided at table creation time. The default value is ‘’.
user_chunk_max_memory – User-specified target max bytes for all columns in a chunk, if provided at table creation time. The default value is ‘’.
owner_resource_group – Name of the owner resource group. The default value is ‘’.
alternate_shard_keys – Semicolon-separated list of shard keys that were equated in joins (applicable for join tables). The default value is ‘’.
datasource_subscriptions – Semicolon-separated list of datasource names the table has subscribed to. The default value is ‘’.
null_modifying_columns – Comma-separated list of null modifying column names. The default value is ‘’.
- sizes (list of longs) –
If get_sizes is true, an array containing the number of records of each corresponding table in output parameter table_names. Otherwise, an empty array.
- full_sizes (list of longs) –
If get_sizes is true, an array containing the number of records of each corresponding table in output parameter table_names (same values as output parameter sizes). Otherwise, an empty array.
- join_sizes (list of floats) –
If get_sizes is true, an array containing the number of unfiltered records in the cross product of the sub-tables of each corresponding join-table in output parameter table_names. For simple tables, this number will be the same as output parameter sizes. For join-tables, this value gives the number of joined-table rows that must be processed by any aggregate functions operating on the table. Otherwise, (if get_sizes is false), an empty array.
- total_size (long) –
If get_sizes is true, the sum of the elements of output parameter sizes. Otherwise, -1.
- total_full_size (long) –
If get_sizes is true, the sum of the elements of output parameter full_sizes (same value as output parameter total_size). Otherwise, -1.
- info (dict of str to str) –
Additional information.
Raises
- GPUdbException –
Upon an error from the server.
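The response lists are parallel: element i of sizes and additional_info describes element i of table_names. The sketch below pairs them up and splits the semicolon-separated foreign_keys string. summarize_show_table is a hypothetical helper, and it guards against short lists because sizes is empty unless get_sizes was set to true.

```python
def summarize_show_table(response):
    """Map each table name to its record count, schema and foreign keys.

    `response` is assumed to be the dict returned by show_table().
    """
    names = response.get("table_names", [])
    sizes = response.get("sizes", [])  # empty unless get_sizes was 'true'
    infos = response.get("additional_info", [])

    summary = {}
    for i, name in enumerate(names):
        extra = infos[i] if i < len(infos) else {}
        # foreign_keys is a semicolon-separated list; drop empty fragments.
        raw_keys = extra.get("foreign_keys", "")
        keys = [part.strip() for part in raw_keys.split(";") if part.strip()]
        summary[name] = {
            "records": sizes[i] if i < len(sizes) else None,
            "schema": extra.get("schema_name"),
            "foreign_keys": keys,
        }
    return summary


# Usage against a live table (not executed here):
#   response = table.show_table(options={"get_sizes": "true"})
#   for name, details in summarize_show_table(response).items():
#       print(name, details["records"])
```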
- update_records(expressions=None, new_values_maps=None, records_to_insert=[], records_to_insert_str=[], record_encoding='binary', options={})[source]
Runs multiple predicate-based updates in a single call. With the list of given expressions, any matching record’s column values will be updated as provided in input parameter new_values_maps. There is also an optional ‘upsert’ capability where if a particular predicate doesn’t match any existing record, then a new record can be inserted.
Note that this operation can only be run on an original table and not on a result view.
This operation can update primary key values. By default only ‘pure primary key’ predicates are allowed when updating primary key values. If the primary key for a table is the column ‘attr1’, then the operation will only accept predicates of the form: “attr1 == ‘foo’” if the attr1 column is being updated. For a composite primary key (e.g. columns ‘attr1’ and ‘attr2’) then this operation will only accept predicates of the form: “(attr1 == ‘foo’) and (attr2 == ‘bar’)”. Meaning, all primary key columns must appear in an equality predicate in the expressions. Furthermore each ‘pure primary key’ predicate must be unique within a given request. These restrictions can be removed by utilizing some available options through input parameter options.
The update_on_existing_pk option specifies the record primary key collision policy for tables with a primary key, while ignore_existing_pk specifies the record primary key collision error-suppression policy when those collisions result in the update being rejected. Both are ignored on tables with no primary key.
Parameters
- expressions (list of str) –
A list of the actual predicates, one for each update; format should follow the guidelines here. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- new_values_maps (list of dicts of str to optional str) –
List of new values for the matching records. Each element is a map with (key, value) pairs where the keys are the names of the columns whose values are to be updated; the values are the new values. The number of elements in the list should match the length of input parameter expressions. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- records_to_insert (list of bytes) –
An optional list of new binary-avro encoded records to insert, one for each update. If one of input parameter expressions does not yield a matching record to be updated, then the corresponding element from this list will be added to the table. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- records_to_insert_str (list of str) –
An optional list of JSON encoded objects to insert, one for each update, to be added if the particular update did not match any objects. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- record_encoding (str) –
Identifies which of input parameter records_to_insert and input parameter records_to_insert_str should be used. Allowed values are:
binary
json
The default value is ‘binary’.
- options (dict of str to str) –
Optional parameters. Allowed keys are:
global_expression – An optional global expression to reduce the search space of the predicates listed in input parameter expressions. The default value is ‘’.
bypass_safety_checks – When set to true, all predicates are available for primary key updates. Keep in mind that it is possible to destroy data in this case, since a single predicate may match multiple objects (potentially all records of a table), and then updating all of those records to have the same primary key will, due to the primary key uniqueness constraints, effectively delete all but one of those updated records. Allowed values are:
true
false
The default value is ‘false’.
update_on_existing_pk – Specifies the record collision policy for updating a table with a primary key. There are two ways that a record collision can occur.
The first is an “update collision”, which happens when the update changes the value of the updated record’s primary key, and that new primary key already exists as the primary key of another record in the table.
The second is an “insert collision”, which occurs when a given filter in input parameter expressions finds no records to update, and the alternate insert record given in input parameter records_to_insert (or input parameter records_to_insert_str) contains a primary key matching that of an existing record in the table.
If update_on_existing_pk is set to true, “update collisions” will result in the existing record collided into being removed and the record updated with values specified in input parameter new_values_maps taking its place; “insert collisions” will result in the collided-into record being updated with the values in input parameter records_to_insert/input parameter records_to_insert_str (if given).
If set to false, the existing collided-into record will remain unchanged, while the update will be rejected and the error handled as determined by ignore_existing_pk. If the specified table does not have a primary key, then this option has no effect. Allowed values are:
true – Overwrite the collided-into record when updating a record’s primary key or inserting an alternate record causes a primary key collision between the record being updated/inserted and another existing record in the table
false – Reject updates which cause primary key collisions between the record being updated/inserted and an existing record in the table
The default value is ‘false’.
ignore_existing_pk – Specifies the record collision error-suppression policy for updating a table with a primary key, only used when primary key record collisions are rejected (update_on_existing_pk is false). If set to true, any record update that is rejected for resulting in a primary key collision with an existing table record will be ignored with no error generated. If false, the rejection of any update for resulting in a primary key collision will cause an error to be reported. If the specified table does not have a primary key or if update_on_existing_pk is true, then this option has no effect. Allowed values are:
true – Ignore updates that result in primary key collisions with existing records
false – Treat as errors any updates that result in primary key collisions with existing records
The default value is ‘false’.
update_partition – Force qualifying records to be deleted and reinserted so their partition membership will be reevaluated. Allowed values are:
true
false
The default value is ‘false’.
truncate_strings – If set to true, any strings which are too long for their charN string fields will be truncated to fit. Allowed values are:
true
false
The default value is ‘false’.
use_expressions_in_new_values_maps – When set to true, all new values in input parameter new_values_maps are considered as expression values. When set to false, all new values in input parameter new_values_maps are considered as constants. NOTE: When true, string constants will need to be quoted to avoid being evaluated as expressions. Allowed values are:
true
false
The default value is ‘false’.
record_id – ID of a single record to be updated (returned in the call to GPUdb.insert_records() or GPUdb.get_records_from_collection()).
The default value is an empty dict ( {} ).
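The interaction between update_on_existing_pk and ignore_existing_pk reduces to a small decision table. The sketch below is a client-side model of the rules described above, not server code; collision_outcome and its return labels are hypothetical, and it applies only to tables that have a primary key.

```python
def collision_outcome(options):
    """Predict how a primary key collision is handled for given options.

    Returns one of three illustrative labels:
      'overwrite' - the collided-into record is replaced or updated
      'ignore'    - the update is rejected silently
      'error'     - the update is rejected and an error is reported
    """
    update = options.get("update_on_existing_pk", "false") == "true"
    ignore = options.get("ignore_existing_pk", "false") == "true"
    if update:
        # ignore_existing_pk has no effect when collisions are overwritten.
        return "overwrite"
    return "ignore" if ignore else "error"
```

With both options left at their default of false, a collision is therefore reported as an error.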
Returns
The response from the server, which is a dict containing the following entries:
- count_updated (long) –
Total number of records updated.
- counts_updated (list of longs) –
Total number of records updated per predicate in input parameter expressions.
- count_inserted (long) –
Total number of records inserted (due to expressions not matching any existing records).
- counts_inserted (list of longs) –
Total number of records inserted per predicate in input parameter expressions (will be either 0 or 1 for each expression).
- info (dict of str to str) –
Additional information.
Raises
- GPUdbException –
Upon an error from the server.
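Because expressions and new_values_maps must be the same length, it can help to write each update as one (predicate, new values) pair and split the pairs just before the call. The sketch below does that; build_update_args is a hypothetical helper, and the column names in the usage comment are made up for illustration.

```python
def build_update_args(updates, global_expression=""):
    """Split (predicate, new_values) pairs into update_records() arguments.

    New values are converted to str (None is kept as None), matching the
    documented type 'list of dicts of str to optional str'.
    """
    expressions = []
    new_values_maps = []
    for predicate, values in updates:
        expressions.append(predicate)
        new_values_maps.append(
            {col: None if val is None else str(val)
             for col, val in values.items()}
        )

    options = {}
    if global_expression:
        options["global_expression"] = global_expression

    return {
        "expressions": expressions,
        "new_values_maps": new_values_maps,
        "options": options,
    }


# Usage against a live table (not executed here; columns are hypothetical):
#   args = build_update_args([
#       ("attr1 == 'foo'", {"qty": 5}),
#       ("attr1 == 'bar'", {"note": None}),
#   ])
#   response = table.update_records(**args)
#   print(response["count_updated"], response["counts_updated"])
```

Keeping each predicate next to its values makes a length mismatch between the two lists impossible by construction.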
- update_records_by_series(world_table_name=None, view_name='', reserved=[], options={})[source]
Updates the view specified by input parameter table_name to include full series (track) information from the input parameter world_table_name for the series (tracks) present in the input parameter view_name.
Parameters
- world_table_name (str) –
Name of the table containing the complete series (track) information, in [schema_name.]table_name format, using standard name resolution rules.
- view_name (str) –
Name of the view containing the series (tracks) which have to be updated, in [schema_name.]view_name format, using standard name resolution rules. The default value is ‘’.
- reserved (list of str) –
The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.
- options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).
Returns
The response from the server, which is a dict containing the following entries:
- count (int)
- info (dict of str to str) –
Additional information.
Raises
- GPUdbException –
Upon an error from the server.
- _type (