Class GPUdb

class gpudb.GPUdb(host=None, options=None, *args, **kwargs)[source]

This is the main class to be used to provide the client functionality to interact with the server.

Usage patterns

Secured setup (Default)

The code below sets up a secured connection. The property ‘skip_ssl_cert_verification’ is ‘False’ by default, so the SSL certificate check is enforced.

options = GPUdb.Options()
options.username = "user"
options.password = "password"
options.logging_level = "debug"

gpudb = GPUdb(host='https://your_server_ip_or_FQDN:8082/gpudb-0', options=options )

Unsecured setup

The code below sets up an unsecured connection to the server. The property ‘skip_ssl_cert_verification’ is explicitly set to ‘True’, so all certificate checks are bypassed regardless of whether SSL is configured.

options = GPUdb.Options()
options.username = "user"
options.password = "password"
options.skip_ssl_cert_verification = True
options.logging_level = "debug"

gpudb = GPUdb(host='https://your_server_ip_or_FQDN:8082/gpudb-0', options=options )

Another way to set up an unsecured connection is shown below. Here the URL uses plain HTTP, so no SSL setup comes into play.

options = GPUdb.Options()
options.username = "user"
options.password = "password"
options.logging_level = "debug"

gpudb = GPUdb(host='http://your_server_ip_or_FQDN:9191', options=options )
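As a rough summary of the three setups above, a certificate check applies only when the URL is HTTPS and skip_ssl_cert_verification is left at False. The helper below is a hypothetical illustration of that rule using only the standard library; its name and signature are assumptions and it is not part of the gpudb API.

```python
from urllib.parse import urlsplit


def certificate_is_verified(url, skip_ssl_cert_verification=False):
    """Hypothetical helper: True when an SSL certificate check would apply."""
    # Plain HTTP never involves SSL; HTTPS verifies unless explicitly skipped.
    is_https = urlsplit(url).scheme.lower() == "https"
    return is_https and not skip_ssl_cert_verification


# The three usage patterns from this section, in order.
print(certificate_is_verified("https://your_server_ip_or_FQDN:8082/gpudb-0"))
print(certificate_is_verified("https://your_server_ip_or_FQDN:8082/gpudb-0", True))
print(certificate_is_verified("http://your_server_ip_or_FQDN:9191"))
```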

Construct a new GPUdb client instance. This object communicates with the database server at the given address. This class implements HA failover: upon certain error conditions, it will try to establish a connection with one of the other clusters (specified by the user or known to the ring) to continue service. Several properties of the GPUdb.Options class control this behavior; set them via options.

Note

Please read the docstring of options for notes on backward compatibility.

Parameters

host (str or list of str) –
The URL(s) of the GPUdb server. May be provided as a comma separated string or a list of strings containing head or worker rank URLs of the server clusters. Must be full and valid URLs. Example: “https://domain.com:port/path/”. If only a single URL or host is given, and no primary_host is explicitly specified via the options, then the given URL will be used as the primary URL. Default is ‘http://127.0.0.1:9191’ (implemented internally).

Note that in versions 7.0 and prior, the URL also allowed username:password@ in front of the hostname. That is now deprecated. For now, anything placed before the @ symbol in the hostname is discarded (the constructor still functions). Please use the appropriate properties of the options argument to set the username and password.
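The host handling described above (a comma-separated string or a list, with any deprecated username:password@ prefix discarded) can be sketched with the standard library. The function names here are hypothetical and the real client may normalize URLs differently; this only illustrates the documented behavior.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url):
    """Drop a deprecated 'username:password@' prefix from a URL."""
    parts = urlsplit(url)
    # Keep only what follows the last '@' in the network location.
    netloc = parts.netloc.rpartition("@")[2]
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def normalize_hosts(host):
    """Accept a comma-separated string or a list of URLs; return a clean list."""
    if isinstance(host, str):
        host = host.split(",")
    return [strip_credentials(h.strip()) for h in host if h.strip()]


print(normalize_hosts("http://a:9191, http://user:pw@b:9191/gpudb-0"))
```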

options (GPUdb.Options or dict) –
Optional arguments for creating this GPUdb object. For backward compatibility with 7.0 versions, keyword arguments are honored, but only if no options is given; i.e., if options is given, no other positional or keyword argument may be given. See Options for all available properties.

See also

GPUdb.Options

class HASynchronicityMode(value, names=None, module=None, type=None, start=1)[source]

Inner enumeration class to represent the high-availability synchronicity override mode that is applied to each endpoint call. Available enumerations are:

DEFAULT – No override; defer to the HA process for synchronizing endpoints (which has different logic for different endpoints). This is the default mode.
NONE – Do not replicate the endpoint calls to the backup cluster.
SYNCHRONOUS – Synchronize all endpoint calls.
ASYNCHRONOUS – Do NOT synchronize any endpoint call.

class HAFailoverOrder(value, names=None, module=None, type=None, start=1)[source]

Inner enumeration class to represent the high-availability failover order that is applied to ring resiliency or inter-cluster failover. The order dictates in which pattern backup clusters will be chosen when a failover needs to happen in the client API. Available enumerations are:

RANDOM – Randomly choose the backup cluster from the available clusters. This is the default mode.
SEQUENTIAL – Choose the cluster sequentially from the list of clusters (the union of the user given clusters and auto-discovered clusters).
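The difference between the two failover orders can be pictured with a small stand-in. The enum and function below are illustrative assumptions, not the client's internals: they only show that SEQUENTIAL keeps the list order while RANDOM shuffles it.

```python
import random
from enum import Enum


class FailoverOrder(Enum):
    """Stand-in for GPUdb.HAFailoverOrder, for illustration only."""
    RANDOM = 1
    SEQUENTIAL = 2


def backup_cluster_order(clusters, order, rng=None):
    """Return the backup clusters in the order they would be tried."""
    ordered = list(clusters)  # copy so the caller's list is untouched
    if order is FailoverOrder.RANDOM:
        (rng or random).shuffle(ordered)
    return ordered


clusters = ["http://a:9191", "http://b:9191", "http://c:9191"]
print(backup_cluster_order(clusters, FailoverOrder.SEQUENTIAL))
```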

class Options(options=None)[source]

Encapsulates the various options used to create a GPUdb object. The same object can be used on multiple GPUdb client handles and state modifications are chained together:

opts = GPUdb.Options.default()
opts.disable_failover = True
db1 = gpudb.GPUdb( host = "http://1.2.3.4:9191",
                   options = opts )
opts.primary_host = "http://7.8.9.0:9191"
db2 = gpudb.GPUdb( host = "http://1.2.3.4:9191",
                   options = opts )

For backward compatibility, the following options from the 7.0 GPUdb keyword arguments are supported and mapped to the following properties:

connection -> protocol
no_init_db_contact -> disable_auto_discovery
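The mapping of 7.0 keyword arguments to the newer property names amounts to a key translation. A minimal sketch, assuming a plain dict of keyword arguments; the function and constant names are hypothetical and do not reflect how the library implements this.

```python
# 7.0 keyword argument -> current Options property (as listed in this section)
LEGACY_KEYWORDS = {
    "connection": "protocol",
    "no_init_db_contact": "disable_auto_discovery",
}


def translate_legacy_kwargs(kwargs):
    """Rename legacy keys; leave every other key unchanged."""
    return {LEGACY_KEYWORDS.get(key, key): value for key, value in kwargs.items()}


print(translate_legacy_kwargs({"connection": "HTTPS", "username": "user"}))
```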

Create a default set of options for GPUdb object creation.

Parameters

options (dict or GPUdb.Options) –
Optional dictionary with options already loaded. If a GPUdb.Options object, then this will work like a copy constructor and make a full copy of the input argument.

Returns

An Options object.

as_json() → str[source]: Return the options as a JSON string. Will stringify parameters as needed. For example, GPUdb.URL and GPUdb.HAFailoverOrder objects will be stringified.

property cluster_reconnect_count

Gets the number of times the API tries to reconnect to the same cluster (when a failover event has been triggered), before actually failing over to any available backup cluster. Does not apply when only a single cluster is available.

This method is now deprecated.

property disable_auto_discovery: Gets the property indicating whether to disable automatic discovery of backup clusters or worker rank URLs. If set to true, then the GPUdb object will not connect to the database at initialization time, and will only work with the URLs given.

property disable_failover: Gets whether failover upon failures is to be completely disabled.

property encoding

Gets the encoding used by the client. Supported values are:

binary
snappy
json

property ha_failover_order: Gets the current inter-cluster failover order. This indicates in which order–sequential or random–the backup clusters would be used when an inter-cluster failover event happens. Default is RANDOM.

property host_manager_port: Gets the host manager port number. Some endpoints are supported only at the host manager, rather than the head node of the database.

property hostname_regex: Gets the regex pattern to be used to filter URLs of the servers. If None, then the first URL encountered per rank will be used. Returns a compiled regex object or None if no regex is being used.

property http_headers: Gets the custom HTTP headers that will be used per HTTP endpoint submission by the GPUdb to the server. The header keys and values must be strings. Returns a deep copy.

add_http_header(header, value)[source]

Adds a custom HTTP header to the set of ones which will be used per HTTP endpoint submission by the GPUdb to the server. The header key and value must be strings. Also, the following headers are protected, and cannot be overridden by the user:

“Accept”
“Authorization”
“Content-type”
“X-Kinetica-Group”

Parameters

header (str) –
The single header to add.

value (str) –
The value of the header to add.
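The rules above (string keys and values, four protected headers) can be sketched as follows. This is an illustration under stated assumptions: the case-insensitive comparison and the choice of exceptions are guesses, and the function name is hypothetical; the real method may instead log and ignore a protected header.

```python
PROTECTED_HEADERS = ("Accept", "Authorization", "Content-type", "X-Kinetica-Group")


def add_custom_header(headers, header, value):
    """Add a header to a dict unless it is malformed or protected."""
    if not isinstance(header, str) or not isinstance(value, str):
        raise TypeError("header key and value must both be strings")
    # Assumption: protected names are compared case-insensitively.
    if header.lower() in {name.lower() for name in PROTECTED_HEADERS}:
        raise ValueError("header %r is protected and cannot be overridden" % header)
    headers[header] = value
    return headers


print(add_custom_header({}, "X-Request-Source", "nightly-batch"))
```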

property initial_connection_attempt_timeout

Gets the timeout used when trying to establish a connection to the database at GPUdb initialization. The value is given in milliseconds and the default is 0. A value of 0 indicates that no retry will be done; instead, the user-given URLs will be stored without further discovery.

If multiple URLs are given by the user, then the API will try all of them once before retrying or giving up. When this timeout is set to a non-zero value and the first attempt fails, the API will wait (sleep) for a certain amount of time and try again. Upon consecutive failures, the sleep amount will be doubled. So, before the first retry (i.e. the second attempt), the API will sleep for one minute. Before the second retry, the API will sleep for two minutes; the next sleep interval would be four minutes, and so on.
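The doubling sleep schedule described above can be written out as a short calculation. How the client caps the schedule against the timeout is an assumption here (sleeps are added while their running total fits within the timeout), and the function name is hypothetical.

```python
def retry_sleep_schedule(timeout_ms, first_sleep_ms=60_000):
    """List the sleep intervals (in ms) that fit within the timeout."""
    sleeps = []
    elapsed = 0
    sleep = first_sleep_ms  # one minute before the first retry
    while elapsed + sleep <= timeout_ms:
        sleeps.append(sleep)
        elapsed += sleep
        sleep *= 2  # doubled after each consecutive failure
    return sleeps


# A timeout of 0 means no retry at all; seven minutes allows three sleeps.
print(retry_sleep_schedule(0))
print(retry_sleep_schedule(7 * 60_000))
```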

property server_connection_timeout

Gets the timeout used when trying to establish a connection to the database at GPUdb initialization. The value is given in milliseconds and the default is 0. A value of 0 indicates that no retry will be done; instead, the user-given URLs will be stored without further discovery.

If multiple URLs are given by the user, then the API will try all of them once before retrying or giving up. When this timeout is set to a non-zero value and the first attempt fails, the API will wait (sleep) for a certain amount of time and try again. Upon consecutive failures, the sleep amount will be doubled. So, before the first retry (i.e. the second attempt), the API will sleep for one minute. Before the second retry, the API will sleep for two minutes; the next sleep interval would be four minutes, and so on.

property intra_cluster_failover_timeout

Gets the timeout used when trying to recover from an intra-cluster failover event. The value is given in seconds. The default is equivalent to 5 minutes.

This method is now deprecated.

property logging_level: Gets the logging level that will be used by the API. By default, logging is set by the root logger (possibly set by the end user application). If the user sets the logging level explicitly via this options class, then the programmatically set level will be used instead.

property password: Gets the password to be used for authentication to GPUdb.

property primary_host: Gets the hostname of the primary cluster of the HA environment.

property protocol: Gets the protocol being used by the client.

property skip_ssl_cert_verification: Gets the value of the property indicating whether to verify the SSL certificate for HTTPS connections.

property timeout: Gets the timeout value, in milliseconds, after which a lack of response from the GPUdb server will result in requests being aborted. A timeout of zero is interpreted as an infinite timeout. Note that this applies independently to various stages of communication, so overall a request may run for longer than this without being aborted.

property username: Gets the username to be used for authentication to GPUdb.

property oauth_token: Gets the OAuth2 token to be used for authentication to GPUdb.

class Version(version_str)[source]

An internal class to handle Kinetica Version (client API or server).

Takes in a string containing a Kinetica version and creates a GPUdb.Version object from it.

Parameters

version_str (str) –
A string containing the Kinetica version (client or server). Expect at least four components separated by a period (.). There may be additional parts after the fourth component that will be discarded.

property first: Read-only property–first component of the version.

property second: Read-only property–second component of the version.

property third: Read-only property–third component of the version.

property fourth: Read-only property–fourth component of the version.

is_version_compatible(other)[source]

Given another version, checks whether the two are compatible, taking only the first two components into account. The third and fourth components are ignored, since the server and the API ought to work together as long as the first two components match.

TODO: Possibly add another optional parameter specifying how many components to take into account when checking for compatibility.

Parameters

other (GPUdb.Version) –
The other version object.

Returns

True if the two are compatible, False otherwise.
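The version rules in this class (at least four dot-separated components, extras discarded, compatibility decided by the first two) can be sketched as below. VersionSketch is a hypothetical stand-in, not GPUdb.Version, and treating the components as integers is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionSketch:
    first: int
    second: int
    third: int
    fourth: int

    @classmethod
    def parse(cls, version_str):
        parts = version_str.split(".")
        if len(parts) < 4:
            raise ValueError("expected at least four components: %r" % version_str)
        # Anything after the fourth component is discarded.
        return cls(*(int(p) for p in parts[:4]))

    def is_version_compatible(self, other):
        # Only the first two components are compared.
        return (self.first, self.second) == (other.first, other.second)


client = VersionSketch.parse("7.1.9.3")
server = VersionSketch.parse("7.1.0.0.20230101")
print(client.is_version_compatible(server))
```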

class ValidateUrl[source]

An internal class to handle connection URL parsing.

static validate_url(url=None)[source]

Takes in a string URL, validates it, adds defaults where necessary, and returns a tuple with the URL components.

Parameters

url (str) –
A string containing a Kinetica connection URL.

Returns

A two-part tuple. The first element is whether or not the URL was able to be parsed, and the second is either the error message (if the URL couldn’t be parsed) or a 7-part tuple containing the parsed URL and its components (if it could be parsed):

* Full URL
* Protocol (HTTP, HTTPS)
* Hostname
* Port
* Path
* Username (if specified in the URL)
* Password (if specified in the URL)
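The shape of that return value can be illustrated with the standard library. This sketch is not the library's parser: it does not fill in defaults, its error messages are made up, and the function name is hypothetical. It only mirrors the (success, message-or-components) structure described above.

```python
from urllib.parse import urlsplit


def validate_url_sketch(url):
    """Return (True, 7-part tuple) on success or (False, error message)."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError for a malformed port
    except ValueError as exc:
        return (False, "could not parse %r: %s" % (url, exc))
    protocol = parts.scheme.upper()
    if protocol not in ("HTTP", "HTTPS") or not parts.hostname:
        return (False, "not a full HTTP(S) URL: %r" % url)
    components = (url, protocol, parts.hostname, port,
                  parts.path, parts.username, parts.password)
    return (True, components)


ok, result = validate_url_sketch("https://user:pw@db.example.com:8082/gpudb-0")
print(ok, result)
```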

class URL(url=None, port=None, protocol=None, accept_full_urls_only=False)[source]

An internal class to handle URLs. Stores the hostname/IP address, port, protocol, path, and the full URL (as a string).

Takes in a string containing a full URL, or another URL object, and creates a URL object from it.

Parameters

url (str or GPUdb.URL) –
Either a hostname/URL string or another GPUdb.URL object to create this object for. Note that the port is not a mandatory part of the URL.

port (int) –
Optional port. If specified, will be appended to any host specified and will override the port of any URL specified.

protocol (str) –
Optional protocol. If specified, will be prepended to any host specified and will override the protocol of any URL specified.

accept_full_urls_only (bool) –
Optional argument. If False, then be flexible in the parsing; for example, if no port is given, use the default port. If True, then accept full URLs only. Default is False.

property host: Read-only property–hostname or IP address.

property port: Read-only property–port.

property using_default_port: Read-only property–boolean indicating if we’re using a default port, or using the user given port (or the lack thereof).

property protocol: Read-only property–protocol (HTTP or HTTPS).

property using_default_protocol: Read-only property–boolean indicating if we’re using a default protocol, or using the user given protocol.

property path: Read-only property–URL path.

property url: Read-only property–fully qualified URL.

property username: Read-only property–username in URL, if present.

property password: Read-only property–password in URL, if present.

class ClusterAddressInfo(head_rank_url, worker_rank_urls=None, host_names=None, host_manager_url=None, host_manager_port=None, is_primary_cluster=None, server_version=None, logging_level=None)[source]

Inner class to keep track of all relevant information for a given Kinetica cluster. It mostly keeps track of URLs and hostnames, with some additional information like whether the cluster is primary or not.

Creates a ClusterAddressInfo object with the given information.

Parameters

head_rank_url (str or GPUdb.URL) –
Only required argument. Must be a full URL string or GPUdb.URL object. E.g. “http://1.2.3.4:8082/gpudb-0”.

worker_rank_urls (list of str or GPUdb.URL) –
Optional argument. Must be a list of fully qualified URLs. These URLs correspond to the worker ranks’ addresses.

host_names (list of str) –
Optional argument. Must be a list of strings. These strings contain hostnames or IP addresses for all the nodes/hosts in the cluster. May contain the protocol (e.g. “http://host0”).

host_manager_url (str or GPUdb.URL) –
Optional argument, mutually exclusive with host_manager_port. If given, must be a fully qualified URL for the host manager of this cluster.

host_manager_port (int) –
Optional argument, mutually exclusive with host_manager_url. If given, must be an integer in the range [1, 65535].

is_primary_cluster (bool) –
Optional boolean argument. Indicates if this cluster is to be treated as the primary cluster. Default is False.

server_version (str or GPUdb.Version) –
Optional string containing the server version. If given, will be parsed as a GPUdb.Version object. Default is None.

logging_level (int) –
Optional level at which logs should be output. Default is None.
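Two of the constraints in the parameter list above, that host_manager_url and host_manager_port are mutually exclusive and that the port must lie in [1, 65535], can be checked as in the sketch below. The function name and the use of ValueError are assumptions; the real constructor may report these problems differently.

```python
def check_host_manager_args(host_manager_url=None, host_manager_port=None):
    """Validate the two mutually exclusive host manager arguments."""
    if host_manager_url is not None and host_manager_port is not None:
        raise ValueError("give host_manager_url or host_manager_port, not both")
    if host_manager_port is not None:
        is_int = isinstance(host_manager_port, int) and not isinstance(host_manager_port, bool)
        if not is_int or not 1 <= host_manager_port <= 65535:
            raise ValueError("host_manager_port must be an integer in [1, 65535]")


check_host_manager_args(host_manager_port=9300)  # passes silently
```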

property head_rank_url: Returns the current head node GPUdb.URL for this cluster.

property protocol: Returns the protocol used (‘HTTP’ or ‘HTTPS’). This is derived from the head rank URL. A read-only property.

property worker_rank_urls: Returns the list of the worker rank GPUdb.URL objects for this cluster. May be empty if worker http servers are disabled.

property host_names: Returns the list of hostnames for this cluster.

property host_manager_url: Returns the host manager GPUdb.URL for this cluster.

property is_primary_cluster: Returns whether this cluster is the primary cluster in the ring.

property is_intra_cluster_failover_enabled

Returns whether this cluster has intra-cluster failover enabled.

This method is now deprecated.

property server_version: Returns the version of this cluster, if known; None otherwise.

does_cluster_contain_node(host_name)[source]

Checks if the given hostname (or IP address) is part of this cluster.

Parameters

host_name (str) –
String containing a hostname or an IP address.

Returns

True if this cluster contains a machine with the given hostname or IP address, False otherwise.

END_OF_SET = -9999: (int) Used for indicating that all of the records (till the end of the set) are desired; generally used for /get/records/* functions.

get_version_info()[source]: Return the version information for this API.

get_host()[source]: Return the host this client is talking to.

get_primary_host()[source]: Return the primary host for this client.

set_primary_host(new_primary_host, start_using_new_primary_host=False, delete_old_primary_host=False)[source]

Set the primary host for this client. Start using this host per the user’s directions. Also, either delete any existing primary host information, or relegate it to the ranks of a backup host.

Parameters

new_primary_host (str) –
A string containing the full URL of the new primary host (of the format ‘http[s]://X.X.X.X:PORT[/httpd-name]’). Must have valid URL format. May be part of the given back-up hosts, or be a completely new one.

start_using_new_primary_host (bool) –
Boolean flag indicating if the new primary host should be used starting immediately. Please be cautious about setting the value of this flag to True; there may be unintended consequences regarding query chaining. Caveat: if the value given is False, but delete_old_primary_host is True and the old primary host, if any, was being used at the time of this function call, then the client still DOES switch over to the new primary host. Default value is False.

delete_old_primary_host (bool) –
Boolean flag indicating that if a primary host was already set, that information should be deleted. If False, then any existing primary host URL would be treated as a regular back-up cluster’s host. Default value is False.

Deprecated since version 7.1.0.0: As of version 7.1.0.0, this method will no longer be functional. This method will be a no-op and will not change the primary host. The method will be removed in version 7.2.0.0. The only way to set the primary host is via GPUdb.Options at GPUdb initialization. It cannot be changed after that.

get_port()[source]: Return the port the host is listening to.

get_host_manager_port()[source]: Return the port the host manager is listening to.

get_url(stringified=True)[source]

Return the GPUdb.URL or its string representation that points to the current head node of the current cluster in use.

Parameters

stringified (bool) –
Optional argument. If True, return the string representation, otherwise return the GPUdb.URL object. Default is True.

Returns

The GPUdb.URL object or its string representation.

get_hm_url(stringified=True)[source]

Return the GPUdb.URL or its string representation that points to the current host manager of the current cluster in use.

Parameters

stringified (bool) –
Optional argument. If True, return the string representation, otherwise return the GPUdb.URL object. Default is True.

Returns

The GPUdb.URL object or its string representation.

get_failover_urls()[source]

Return a list of the head node URLs for each of the clusters in the HA ring in failover order.

Returns

A list of GPUdb.URL objects.

get_head_node_urls()[source]

Return a list of the head node URLs for each of the clusters in the HA ring for the database server.

Returns

A list of GPUdb.URL objects.

get_num_cluster_switches()[source]: Gets the number of times the client has switched to a different cluster amongst the high availability ring.

property current_cluster_info: Return the GPUdb.ClusterAddressInfo object containing information on the current/active cluster.

property all_cluster_info: Return the list of GPUdb.ClusterAddressInfo objects that contain the addresses of each of the clusters in the ring.

property ha_ring_size: Return the number of clusters in the ring.

property options: Return the GPUdb.Options object that contains all the knobs the user can turn for controlling this class’s behavior.

property gpudb_full_url: Returns the full URL of the current head rank of the currently active cluster.

property server_version: Returns the GPUdb.Version object representing the version of the currently active cluster of the Kinetica server.

property protocol: Returns the HTTP protocol being used by the GPUdb object to communicate to the database server.

property primary_host: Returns the primary hostname.

property username: Gets the username to be used for authentication to GPUdb.

property password: Gets the password to be used for authentication to GPUdb.

property oauth_token: Gets the OAuth2 token to be used for authentication to GPUdb.

property timeout: Gets the timeout used for http connections to GPUdb.

property disable_auto_discovery: Returns whether auto-discovery has been disabled.

property logging_level: Returns the integer value of the logging level that is being used by the API. By default, logging is set to NOTSET, and the logger will honor the root logger’s level.

property get_known_types: Return all known types; if none, return None.

get_known_type(type_id, lookup_type=True)[source]

Given a type ID, return any associated known type; if none is found, then optionally try to look it up and save it. Otherwise, return None.

Parameters

type_id (str) –
The ID for the type.

lookup_type (bool) –
If True and the type is not already known, look it up by invoking show_types(), save it for the future, and return it.

Returns

The associated RecordType, if found (or looked up); None otherwise.

get_all_available_full_urls(stringified=True)[source]

Return the list of GPUdb.URL objects, or their string representations, that point to the current head node of each of the clusters in the ring.

Parameters

stringified (bool) –
Optional argument. If True, return the string representation, otherwise return the GPUdb.URL object. Default is True.

Returns

A list of GPUdb.URL objects or their string representations.

add_http_header(header, value)[source]

Adds an HTTP header to the map of additional HTTP headers to send to the server with each request. If the header is already in the map, its value is replaced with the specified value. The user is not allowed to modify the following headers:

Accept
Authorization
X-Kinetica-Group
Content-type

remove_http_header(header)[source]

Removes the given HTTP header from the map of additional HTTP headers to send to GPUdb with each request. The user is not allowed to remove the following headers:

Accept
Authorization
X-Kinetica-Group
Content-type

get_http_headers()[source]: Returns a dict containing all the custom headers used currently by GPUdb. Returns a deep copy so that the user does not accidentally change the headers. Note that the API may use other headers as appropriate; the ones returned here are the custom ones set up by the user.

log_debug(message)[source]: Logging method for debug.

Deprecated since version 7.1.0.0: As of version 7.1.0.0, this method is deprecated, and may be removed in a future version. Previously, this was a static method; now it is an instance method. This method will log messages as intended.

log_warn(message)[source]: Logging method for warnings.

Deprecated since version 7.1.0.0: As of version 7.1.0.0, this method is deprecated, and may be removed in a future version. Previously, this was a static method; now it is an instance method. This method will log messages as intended.

log_info(message)[source]: Logging method for information.

Deprecated since version 7.1.0.0: As of version 7.1.0.0, this method is deprecated, and may be removed in a future version. Previously, this was a static method; now it is an instance method. This method will log messages as intended.

log_error(message)[source]: Logging method for error.

Deprecated since version 7.1.0.0: As of version 7.1.0.0, this method is deprecated, and may be removed in a future version. Previously, this was a static method; now it is an instance method. This method will log messages as intended.

encode_datum(SCHEMA, datum, encoding=None)[source]

Returns an Avro binary or JSON encoded datum dict using its schema.

Parameters

SCHEMA (str or avro.Schema) –
A parsed schema object from avro.schema.parse() or a string containing the schema.

datum (dict) –
A dict of key-value pairs containing the data to encode (the entries must match the schema).

encode_datum_cext(SCHEMA, datum, encoding=None)[source]

Returns an Avro binary or JSON encoded datum dict using its schema.

Parameters

SCHEMA (str or avro.Schema) –
A parsed schema object from avro.schema.parse() or a string containing the schema.

datum (dict) –
A dict of key-value pairs containing the data to encode (the entries must match the schema).

static valid_json(json_string)[source]: Validates a JSON string by trying to parse it into a Python object.

static merge_dicts(*dict_args)[source]: Given any number of dictionaries, shallow copy and merge them into a new dict. Precedence goes to key-value pairs in latter dictionaries.
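The behavior of these two static helpers can be approximated with the standard library. The functions below are sketches with hypothetical names; in particular, returning a bool from the validation helper is an assumption about the real method's return value.

```python
import json


def valid_json_sketch(json_string):
    """Return True if the string parses as JSON, False otherwise."""
    try:
        json.loads(json_string)
    except (TypeError, ValueError):  # JSONDecodeError is a ValueError
        return False
    return True


def merge_dicts_sketch(*dict_args):
    """Shallow-merge dicts; later dicts win on duplicate keys."""
    merged = {}
    for d in dict_args:
        merged.update(d)
    return merged


print(valid_json_sketch('{"a": 1}'))
print(merge_dicts_sketch({"a": 1, "b": 2}, {"b": 3}))
```

Because the merge is shallow, nested values are shared between the inputs and the result rather than copied.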

logger(ranks, log_levels, options={})[source]

Convenience function to change log levels of some or all GPUdb ranks.

Parameters

ranks (list of ints) –
A list containing the ranks to which to apply the new log levels.

log_levels (dict of str to str) –
A map where the keys dictate which log’s levels to change, and the values dictate what the new log levels will be.

options (dict of str to str) –
Optional parameters. Default value is an empty dict ( {} ).

Returns

A dict with the following entries–

status (str) –
The status of the endpoint (‘OK’ or ‘ERROR’).

log_levels (map of str to str) –
A map of each log level to its respective value

set_server_logger_level(ranks, log_levels, options={})[source]

Convenience function to change log levels of some or all GPUdb ranks.

Parameters

ranks (list of ints) –
A list containing the ranks to which to apply the new log levels.

log_levels (dict of str to str) –
A map where the keys dictate which log’s levels to change, and the values dictate what the new log levels will be.

options (dict of str to str) –
Optional parameters. Default value is an empty dict ( {} ).

Returns

A dict with the following entries–

status (str) –
The status of the endpoint (‘OK’ or ‘ERROR’).

log_levels (map of str to str) –
A map of each log level to its respective value

set_client_logger_level(log_level)[source]

Set the log level for the client GPUdb class.

Parameters

log_level (int or str) –
A valid log level for the logging module.
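Since the level may be an int or a string, a caller may want to normalize it before comparing levels. A small sketch using the standard logging module; the helper name is hypothetical, and upper-casing the string is an assumption made so that values such as "debug" resolve.

```python
import logging


def resolve_log_level(level):
    """Turn an int or a level name such as 'debug' into a numeric level."""
    if isinstance(level, int):
        return level
    # getLevelName maps a known level name to its numeric value.
    value = logging.getLevelName(str(level).upper())
    if not isinstance(value, int):
        raise ValueError("unknown log level: %r" % level)
    return value


print(resolve_log_level("debug"), resolve_log_level(logging.WARNING))
```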

insert(*, table_name: str = None, records: List[Any] | Dict[str, Any] | List[List[Any]] | List[Dict[str, Any]] = None, options=None)[source]

Insert one or more records.

Parameters

table_name –
(keyword only) The name of the Kinetica table to insert data into.

records –
(keyword only) Values for all columns of a single record or multiple records. For a single record, use either of the following syntaxes:

insert_records( [1, 2, 3] )

For multiple records, use either of the following syntaxes:

insert_records( [ [1, 2, 3], [4, 5, 6] ] )
insert_records(   [1, 2, 3], [4, 5, 6]   )

Also, the user can use keyword arguments to pass in values:

# For a record type with two integers named 'a' and 'b':
insert_records( {"a":  1, "b":  1},
                {"a": 42, "b": 32} )

# Also can use a list to pass the dicts
insert_records( [ {"a":  1, "b":  1},
                  {"a": 42, "b": 32} ] )

Additionally, the user may provide options for the insertion operation. For example:

insert_records( [1, 2, 3], [4, 5, 6],
                options = {"return_record_ids": "true"} )

options –
(keyword only) Values for all columns for a single record. Mutually exclusive with args (i.e. cannot provide both) when it only contains data.

May contain an ‘options’ keyword arg which will be passed to the database for the insertion operation.

Returns

A GPUdbTable object with the insert_records() response fields converted to attributes and stored within.

delete(*, table_name=None, expression=None)[source]

Deletes the records matching the provided criterion from the given table. The record selection criterion is a single input parameter expression (which may match multiple records). The operation is synchronous, meaning that a response will not be available until the request is completely processed and all the matching records are deleted.

Parameters

table_name (str) –
Name of the table from which to delete records, in [schema_name.]table_name format, using standard name resolution rules. Must contain the name of an existing table; not applicable to views.

expression (str) –
The actual predicate, to be used by the delete operation; format should follow the guidelines provided here. Specifying an input parameter expression is mutually exclusive to specifying record_id in the input parameter options. The user can provide a single element (which will be automatically promoted to a list internally) or a list having a single element.

Returns

A dict with the following entries if successful–

count_deleted (long) –
Total number of records deleted across all expressions.

In case of error, it returns a dict: {‘status’: ‘ERROR’, ‘message’: ‘Some error message’}
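Because the return value is either a success dict or an error dict, callers typically branch on the status key. A minimal sketch operating on plain dicts shaped like the ones described above; the helper name and the choice of RuntimeError are assumptions.

```python
def count_deleted_or_raise(result):
    """Return count_deleted from a delete() result, or raise on error."""
    if result.get("status") == "ERROR":
        raise RuntimeError(result.get("message", "unknown error"))
    return result["count_deleted"]


# Example dicts shaped like the documented return values.
print(count_deleted_or_raise({"count_deleted": 3}))
```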

update(*, table_name=None, expression=None, new_values_map=None)[source]

Runs predicate-based updates in a single call. With the given expression, any matching record’s column values will be updated as provided in input parameter new_values_map.

Note that this operation can only be run on an original table and not on a result view.

Parameters

table_name (str) –
Name of table to be updated, in [schema_name.]table_name format, using standard name resolution rules. Must be a currently existing table and not a view.

expression (str) –
An actual predicate for the update; format should follow the guidelines here. The user should provide a single element (which will be automatically promoted to a list internally).

new_values_map (a dict of str to optional str) –
List of new values for the matching records. Each element is a (key, value) pair where the keys are the names of the columns whose values are to be updated; the values are the new values. The user can provide a single element (which will be automatically promoted to a list internally).

Returns

A dict with the following entries if successful–

count_updated (long) –
Total number of records updated.

In case of error, it returns a dict: {‘status’: ‘ERROR’, ‘message’: ‘Some error message’}

insert_records_from_json(json_records, table_name, json_options=None, create_table_options=None, options=None)[source]

Method to insert a single JSON record or an array of JSON records passed in as a string.

Parameters

json_records (str) –
Either a single JSON record or an array of JSON records (as string). Mandatory.

table_name (str) –
The name of the table to insert into.

json_options (dict) –
The only valid option is validate, which can be True or False.

create_table_options (dict) –
Same options as the create_table_options in GPUdb.insert_records_from_payload() endpoint

options (dict) –
Same options as options in GPUdb.insert_records_from_payload() endpoint

Example

response = gpudb.insert_records_from_json(records, "test_insert_records_json", json_options={'validate': True}, create_table_options={'truncate_table': 'true'})
response_object = json.loads(response)
print(response_object['data']['count_inserted'])

See also

GPUdb.insert_records_from_payload()
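Because json_records must be a string, Python dicts need to be serialized before the call. The sketch below shows one way to do that and to read count_inserted from the response, following the example above. insert_dicts and FakeClient are hypothetical names; the stand-in only mimics the response shape shown in the example.

```python
import json


class FakeClient:
    """Hypothetical stand-in; returns a JSON string shaped like the example."""

    def insert_records_from_json(self, json_records, table_name, json_options=None,
                                 create_table_options=None, options=None):
        parsed = json.loads(json_records)
        count = len(parsed) if isinstance(parsed, list) else 1
        return json.dumps({"data": {"count_inserted": count}})


def insert_dicts(db, table_name, records):
    """Serialize dicts to a JSON array string and return the inserted count."""
    payload = json.dumps(records)
    response = db.insert_records_from_json(
        payload, table_name, json_options={"validate": True}
    )
    return json.loads(response)["data"]["count_inserted"]


inserted = insert_dicts(
    FakeClient(),
    "test_insert_records_json",
    [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
)
print(inserted)
```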

get_records_json(table_name, column_names=None, offset=0, limit=-9999, expression=None, orderby_columns=None, having_clause=None)[source]

This method retrieves records from a Kinetica table as a stringified JSON array. The only mandatory parameter is table_name; the rest are optional, with suitable defaults where applicable.

Parameters

table_name (str) –
Name of the table

column_names (list) –
the column names to retrieve

offset (int) –
the offset to start from - default 0

limit (int) –
the maximum number of records - default GPUdb.END_OF_SET

expression (str) –
the filter expression

orderby_columns (list) –
the list of columns to order by

having_clause (str) –
the having clause

Returns

The response string (JSON)

Raises

GPUdbException –
On detecting invalid parameters or some other internal errors

Example

resp = gpudb.get_records_json("table_name")
json_object = json.loads(resp)
print(json_object["data"]["records"])
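The offset and limit parameters allow a large table to be read in pages. The following sketch assumes the response shape used in the example above (records under data.records) and stops when a page comes back shorter than the page size. iter_records and FakeClient are hypothetical names. With a real table, a stable order (for instance via orderby_columns) is probably needed for consistent pages; that is an assumption, not something stated here.

```python
import json


class FakeClient:
    """Hypothetical stand-in that serves slices of an in-memory list."""

    def __init__(self, rows):
        self._rows = rows

    def get_records_json(self, table_name, column_names=None, offset=0, limit=-9999,
                         expression=None, orderby_columns=None, having_clause=None):
        end = None if limit < 0 else offset + limit
        return json.dumps({"data": {"records": self._rows[offset:end]}})


def iter_records(db, table_name, page_size=2):
    """Yield records page by page until a short page signals the end."""
    offset = 0
    while True:
        response = json.loads(
            db.get_records_json(table_name, offset=offset, limit=page_size)
        )
        page = response["data"]["records"]
        yield from page
        if len(page) < page_size:
            break
        offset += page_size


rows = list(iter_records(FakeClient([{"id": i} for i in range(5)]), "example.table"))
print(len(rows))
```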

wms(wms_params, url=None)[source]

Submits a WMS call to the server.

Parameters

wms_params (str) –
A string containing the WMS endpoint parameters, not containing the ‘/wms’ endpoint itself.

url (str or GPUdb.URL) –
An optional URL to which we submit the /wms endpoint. If None given, use the current URL for this GPUdb object.

Returns

A dict with the following entries–

data –
The /wms content.

status_info (dict) –
A dict containing more information regarding the request. Keys:

status

message

response_time

ping(url)[source]

Pings the given URL and returns the response. If no response, returns an empty string.

Parameters

url (GPUdb.URL) –
The URL which we are supposed to ping.

Returns

The ping response, or an empty string if it fails.

is_kinetica_running(url)[source]

Verifies that GPUdb is running at the given URL (does not do any HA failover).

Parameters

url (GPUdb.URL) –
The URL which we are supposed to ping.

Returns

True if Kinetica is running at that URL, False otherwise.

get_server_debug_information(url)[source]

Gets the database debug information from the given URL and returns the response.

Parameters

url (GPUdb.URL) –
The URL which we are supposed to get information from.

Returns

The debug response.

to_df(sql: str, sql_params: list = [], batch_size: int = 5000, sql_opts: dict = {}, show_progress: bool = False)[source]

Runs the given query and converts the result to a Pandas Data Frame.

Parameters

sql (str) –
The SQL query to run

sql_params (list) –
The SQL parameters that will be substituted for tokens (e.g. $1 $2)

batch_size (int) –
The number of records to retrieve at a time from the database

sql_opts (dict) –
The options for SQL execution, matching the options passed to GPUdb.execute_sql(). Defaults to an empty dict ( {} ).

show_progress (bool) –
Whether to display progress on the console or not. Defaults to False.

Raises

GPUdbException

Returns

pd.DataFrame –
A Pandas Data Frame containing the result set of the SQL query or None if there are no results

query(sql, batch_size=5000, sql_params=[], sql_opts={})[source]

Execute a SQL query and return a GPUdbSqlIterator

Parameters

sql (str) –
The SQL query to run

batch_size (int) –
The number of records to retrieve at a time from the database

sql_params (list of native types) –
The SQL parameters that will be substituted for tokens (e.g. $1 $2)

sql_opts (dict) –
The options for SQL execution, matching the options passed to GPUdb.execute_sql(). Defaults to an empty dict ( {} ).

Returns

An instance of GPUdbSqlIterator.

query_one(sql, sql_params=[], sql_opts={})[source]

Execute a SQL query that returns only one row.

Parameters

sql (str) –
The SQL query to run

sql_params (list of native types) –
The SQL parameters that will be substituted for tokens (e.g. $1 $2)

sql_opts (dict) –
The options for SQL execution, matching the options passed to GPUdb.execute_sql(). Defaults to an empty dict ( {} ).

Returns

The returned row or None.

execute(sql, sql_params=[], sql_opts={})[source]

Execute a SQL query and return the row count.

Parameters

sql (str) –
The SQL to execute

sql_params (list of native types) –
The SQL parameters that will be substituted for tokens (e.g. $1 $2)

sql_opts (dict) –
The options for SQL execution, matching the options passed to GPUdb.execute_sql(). Defaults to an empty dict ( {} ).

Returns

Number of records affected
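The $1, $2 tokens in query_one() and execute() are filled from sql_params by position. The sketch below combines the two calls: it looks a row up first and only runs the update when the lookup does not return None. rename_city, FakeClient, and the example.cities table are hypothetical, and the list used as a row is only a placeholder, since the row type is not specified here.

```python
class FakeClient:
    """Hypothetical stand-in that records calls instead of contacting a server."""

    def __init__(self, row=None):
        self.row = row
        self.calls = []

    def query_one(self, sql, sql_params=None, sql_opts=None):
        self.calls.append((sql, list(sql_params or [])))
        return self.row  # None when nothing matches, as documented

    def execute(self, sql, sql_params=None, sql_opts=None):
        self.calls.append((sql, list(sql_params or [])))
        return 1  # pretend one record was affected


def rename_city(db, old_name, new_name):
    """Rename a city if it exists; return the number of affected records."""
    found = db.query_one(
        "SELECT name FROM example.cities WHERE name = $1",
        sql_params=[old_name],
    )
    if found is None:
        return 0
    # $1 is bound to new_name and $2 to old_name, in list order.
    return db.execute(
        "UPDATE example.cities SET name = $1 WHERE name = $2",
        sql_params=[new_name, old_name],
    )


client = FakeClient(row=["Springfield"])
affected = rename_city(client, "Springfield", "Shelbyville")
print(affected)
```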

static get_connection(enable_ssl_cert_verification=False, enable_auto_discovery=False, enable_failover=False, logging_level='INFO') → GPUdb[source]

Get a connection to Kinetica, reading connection and authentication information from environment variables.

This method is particularly useful for Jupyter notebooks, which then do not need authentication credentials embedded in them. This, in turn, helps prevent credentials from being committed to the notebook’s version control. In addition, some features, including auto-discovery and SSL certificate verification, are disabled by default to simplify connections for simple use cases.

The following environment variables are required:

KINETICA_URL: the URL of the Kinetica server

KINETICA_USER: the username to connect with

KINETICA_PASSWD: the password to connect with

Parameters

enable_ssl_cert_verification (bool) –
Enable SSL certificate verification.

enable_auto_discovery (bool) –
Enable auto-discovery of the initial cluster nodes, as well as any attached failover clusters. This allows for both multi-head ingestion & key lookup, as well as cluster failover.

enable_failover (bool) –
Enable failover to another cluster.

logging_level (str) –
Logging level for the connection. (INFO by default)

Returns

GPUdb –
An active connection to Kinetica.
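Since get_connection() relies on three environment variables, it can help to check them before connecting. The following is a small sketch using only the standard library; missing_kinetica_vars is a hypothetical helper, not part of the gpudb API, and the URL in the sample mapping is a placeholder.

```python
import os

# Variable names taken from the list of required environment variables.
REQUIRED_VARS = ("KINETICA_URL", "KINETICA_USER", "KINETICA_PASSWD")


def missing_kinetica_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


example_env = {"KINETICA_URL": "http://localhost:9191", "KINETICA_USER": "user"}
print(missing_kinetica_vars(example_env))
```

A caller could proceed to GPUdb.get_connection() only when the returned list is empty.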

load_gpudb_schemas()[source]

Saves all request and response schemas for GPUdb queries in a lookup table (lookup by query name).

load_gpudb_func_to_endpoint_map()[source]

Saves a mapping of REST endpoint function names to endpoints in a dictionary.

admin_add_host(host_address=None, options={})[source]

Adds a host to an existing cluster.

Note

This method should be used for on-premise deployments only.

Parameters

host_address (str) –
IP address of the host that will be added to the cluster. This host must have installed the same version of Kinetica as the cluster to which it is being added.

options (dict of str to str) –
Optional parameters. Allowed keys are:

dry_run – If set to true, only validation checks will be performed. No host is added. Allowed values are:

true

false

The default value is ‘false’.

accepts_failover – If set to true, the host will accept processes (ranks, graph server, etc.) in the event of a failover on another node in the cluster. Allowed values are:

true

false

The default value is ‘false’.

public_address – The publicly-accessible IP address for the host being added, typically specified for clients using multi-head operations. This setting is required if any other host(s) in the cluster specify a public address.

host_manager_public_url – The publicly-accessible full path URL to the host manager on the host being added, e.g., ‘http://172.123.45.67:9300’. The default host manager port can be found in the list of ports used by Kinetica.

ram_limit – The desired RAM limit for the host being added, i.e. the sum of RAM usage for all processes on the host will not be able to exceed this value. Supported units: K (thousand), KB (kilobytes), M (million), MB (megabytes), G (billion), GB (gigabytes); if no unit is provided, the value is assumed to be in bytes. For example, if ram_limit is set to 10M, the resulting RAM limit is 10 million bytes. Set ram_limit to -1 to have no RAM limit.

gpus – Comma-delimited list of GPU indices (starting at 1) that are eligible for running worker processes. If left blank, all GPUs on the host being added will be eligible.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

added_host (str) –
Identifier for the newly added host, of the format ‘hostN’ where N is the integer identifier of that host. Note that the host identifier is transient, i.e. it may change in the future if other hosts are removed.

info (dict of str to str) –
Additional information.
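The ram_limit unit rules for admin_add_host() can be illustrated with a small converter. This is only a sketch of how the documented units might be read: ram_limit_to_bytes is a hypothetical helper, and treating KB, MB, and GB as powers of 1024 is an assumption, since the text only names them kilobytes, megabytes, and gigabytes. The server performs its own parsing.

```python
# Multipliers follow the documented unit list. Reading KB/MB/GB as powers
# of 1024 is an assumption; K/M/G are thousand, million, billion as stated.
UNITS = {
    "": 1,
    "K": 10**3, "M": 10**6, "G": 10**9,
    "KB": 2**10, "MB": 2**20, "GB": 2**30,
}


def ram_limit_to_bytes(value):
    """Convert a ram_limit string to bytes; return None for -1 (no limit)."""
    text = value.strip().upper()
    if text == "-1":
        return None
    digits = text.rstrip("KMGB")   # numeric part
    unit = text[len(digits):]      # trailing unit, possibly empty
    if not digits.isdigit() or unit not in UNITS:
        raise ValueError(f"unsupported ram_limit: {value!r}")
    return int(digits) * UNITS[unit]


print(ram_limit_to_bytes("10M"))
```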

admin_add_ranks(hosts=None, config_params=None, options={})[source]

Add one or more ranks to an existing Kinetica cluster. The new ranks will not contain any data initially (other than replicated tables) and will not be assigned any shards. To rebalance data and shards across the cluster, use GPUdb.admin_rebalance().

The database must be offline for this operation; see GPUdb.admin_offline().

For example, if attempting to add three new ranks (two ranks on host 172.123.45.67 and one rank on host 172.123.45.68) to a Kinetica cluster with additional configuration parameters:

input parameter hosts would be an array including 172.123.45.67 in the first two indices (signifying two ranks being added to host 172.123.45.67) and 172.123.45.68 in the last index (signifying one rank being added to host 172.123.45.68)
input parameter config_params would be an array of maps, with each map corresponding to the ranks being added in input parameter hosts. The key of each map would be the configuration parameter name and the value would be the parameter’s value, e.g. ‘{“rank.gpu”:“1”}’

This endpoint’s processing includes copying all replicated table data to the new rank(s) and therefore could take a long time. The API call may time out if run directly. It is recommended to run this endpoint asynchronously via GPUdb.create_job().

Note

This method should be used for on-premise deployments only.

Parameters

hosts (list of str) –
Array of host IP addresses (matching a hostN.address from the gpudb.conf file), or host identifiers (e.g. ‘host0’ from the gpudb.conf file), on which to add ranks to the cluster. The hosts must already be in the cluster. If needed beforehand, to add a new host to the cluster use GPUdb.admin_add_host(). Include the same entry as many times as there are ranks to add to the cluster, e.g., if two ranks on host 172.123.45.67 should be added, input parameter hosts could look like ‘[“172.123.45.67”, “172.123.45.67”]’. All ranks will be added simultaneously, i.e. they’re not added in the order of this array. Each entry in this array corresponds to the entry at the same index in the input parameter config_params. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

config_params (list of dicts of str to str) –
Array of maps containing configuration parameters to apply to the new ranks found in input parameter hosts. For example, ‘{“rank.gpu”:“2”, “tier.ram.rank.limit”:“10000000000”}’. Currently, the available parameters are rank-specific parameters in the Network, Hardware, Text Search, and RAM Tiered Storage sections in the gpudb.conf file, with the key exception of the ‘rankN.host’ settings in the Network section that will be determined by input parameter hosts instead. Though many of these configuration parameters typically are affixed with ‘rankN’ in the gpudb.conf file (where N is the rank number), the ‘N’ should be omitted in input parameter config_params as the new rank number(s) are not allocated until the ranks have been added to the cluster. Each entry in this array corresponds to the entry at the same index in the input parameter hosts. This array must either be completely empty or have the same number of elements as the input parameter hosts. An empty input parameter config_params array will result in the new ranks being set with default parameters. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

dry_run – If true, only validation checks will be performed. No ranks are added. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

added_ranks (list of str) –
The number assigned to each added rank, formatted as ‘rankN’, in the same order as the ranks in input parameter hosts and input parameter config_params.

info (dict of str to str) –
Additional information.
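The index-by-index pairing of hosts and config_params for admin_add_ranks() can be built from a single plan. The sketch below reproduces the three-rank example (two ranks on 172.123.45.67, one on 172.123.45.68). build_add_ranks_args is a hypothetical helper, and the rank.gpu values are illustrative only.

```python
def build_add_ranks_args(plan):
    """Expand (host, config) pairs into parallel hosts and config_params lists.

    Each pair adds one rank; repeat a host to add several ranks to it.
    """
    hosts = [host for host, _ in plan]
    configs = [dict(config or {}) for _, config in plan]
    # config_params must be empty or match hosts in length, so send an
    # empty list when no rank carries any explicit configuration.
    if not any(configs):
        configs = []
    return hosts, configs


plan = [
    ("172.123.45.67", {"rank.gpu": "1"}),
    ("172.123.45.67", {"rank.gpu": "2"}),
    ("172.123.45.68", {"rank.gpu": "1"}),
]
hosts, config_params = build_add_ranks_args(plan)
print(hosts)
```

The two lists could then be passed as the hosts and config_params arguments, optionally with options={'dry_run': 'true'} to validate first.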

admin_alter_host(host=None, options={})[source]

Alter properties on an existing host in the cluster. Currently, the only property that can be altered is a host’s ability to accept failover processes.

Parameters

host (str) –
Identifies the host this applies to. Can be the host address, or formatted as ‘hostN’ where N is the host number as specified in gpudb.conf

options (dict of str to str) –
Optional parameters. Allowed keys are:

accepts_failover – If set to true, the host will accept processes (ranks, graph server, etc.) in the event of a failover on another node in the cluster. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

info (dict of str to str) –
Additional information.

admin_alter_jobs(job_ids=None, action=None, options={})[source]

Perform the requested action on a list of one or more job(s). Based on the type of job and the current state of execution, the action may not be successfully executed. The final result of the attempted actions for each specified job is returned in the status array of the response. See Job Manager for more information.

Parameters

job_ids (list of longs) –
Jobs to be modified. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

action (str) –
Action to be performed on the jobs specified by job_ids. Allowed values are:

cancel

options (dict of str to str) –
Optional parameters. Allowed keys are:

job_tag – Job tag returned in call to create the job

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

job_ids (list of longs) –
Jobs on which the action was performed.

action (str) –
Action requested on the jobs.

status (list of str) –
Status of the requested action for each job.

info (dict of str to str) –
Additional information.
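Because status in the admin_alter_jobs() response is a list aligned with job_ids, pairing the two gives a per-job result. This is a sketch under that reading: cancel_jobs and FakeClient are hypothetical, and the status strings returned by the stand-in are placeholders, not real server values.

```python
class FakeClient:
    """Hypothetical stand-in that echoes the documented response fields."""

    def admin_alter_jobs(self, job_ids=None, action=None, options=None):
        return {
            "job_ids": list(job_ids),
            "action": action,
            "status": ["placeholder-status" for _ in job_ids],
            "info": {},
        }


def cancel_jobs(db, job_ids):
    """Request cancellation and return a {job_id: status} mapping."""
    response = db.admin_alter_jobs(job_ids=job_ids, action="cancel", options={})
    # job_ids and status are parallel lists, so zip pairs them by index.
    return dict(zip(response["job_ids"], response["status"]))


result = cancel_jobs(FakeClient(), [101, 102])
print(result)
```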

admin_backup_begin(options={})[source]

Prepares the system for a backup by closing all open file handles after allowing current active jobs to complete. When the database is in backup mode, queries that result in a disk write operation will be blocked until backup mode has been completed by using GPUdb.admin_backup_end().

Parameters

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

info (dict of str to str) –
Additional information.

admin_backup_end(options={})[source]

Restores the system to normal operating mode after a backup has completed, allowing any queries that were blocked to complete.

Parameters

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

info (dict of str to str) –
Additional information.
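Since writes stay blocked until admin_backup_end() is called, it is worth making sure the end call runs even if the backup step fails. One way to sketch this is a context manager; backup_mode and FakeClient are hypothetical names, and the stand-in only records the call order.

```python
from contextlib import contextmanager


class FakeClient:
    """Hypothetical stand-in that records the order of calls."""

    def __init__(self):
        self.calls = []

    def admin_backup_begin(self, options=None):
        self.calls.append("begin")
        return {"info": {}}

    def admin_backup_end(self, options=None):
        self.calls.append("end")
        return {"info": {}}


@contextmanager
def backup_mode(db):
    """Enter backup mode and always leave it, even if the backup step fails."""
    db.admin_backup_begin(options={})
    try:
        yield db
    finally:
        db.admin_backup_end(options={})


client = FakeClient()
with backup_mode(client):
    client.calls.append("copy files")  # placeholder for the real backup step
print(client.calls)
```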

admin_ha_offline(offline=None, options={})[source]

Pauses consumption of messages from other HA clusters to support data repair/recovery scenarios. In-flight queries may fail to replicate to other clusters in the ring when going offline.

Parameters

offline (bool) –
Set to true if desired state is offline. Allowed values are:

True

False

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

info (dict of str to str) –
Additional information.

admin_ha_refresh(options={})[source]

Restarts the HA processing on the given cluster as a mechanism of accepting breaking HA conf changes. Additionally, the cluster is put into read-only mode while HA is restarting.

Parameters

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

info (dict of str to str) –
Additional information.

admin_offline(offline=None, options={})[source]

Take the system offline. When the system is offline, no user operations can be performed with the exception of a system shutdown.

Parameters

offline (bool) –
Set to true if desired state is offline. Allowed values are:

True

False

options (dict of str to str) –
Optional parameters. Allowed keys are:

flush_to_disk – Flush to disk when going offline. Allowed values are:

true

false

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

is_offline (bool) –
Returns true if the system is offline, or false otherwise.

info (dict of str to str) –
Additional information.

admin_rebalance(options={})[source]

Rebalance the data in the cluster so that all nodes contain approximately equal numbers of records and/or rebalance the shards to be equally distributed (as much as possible) across all the ranks.

The database must be offline for this operation; see GPUdb.admin_offline().

If GPUdb.admin_rebalance() is invoked after a change is made to the cluster, e.g., a host was added or removed, sharded data will be evenly redistributed across the cluster by number of shards per rank while unsharded data will be redistributed across the cluster by data size per rank
If GPUdb.admin_rebalance() is invoked at some point when unsharded data (a.k.a. randomly-sharded) in the cluster is unevenly distributed over time, sharded data will not move while unsharded data will be redistributed across the cluster by data size per rank

NOTE: Replicated data will not move as a result of this call

This endpoint’s processing time depends on the amount of data in the system, thus the API call may time out if run directly. It is recommended to run this endpoint asynchronously via GPUdb.create_job().

Parameters

options (dict of str to str) –
Optional parameters. Allowed keys are:

rebalance_sharded_data – If true, sharded data will be rebalanced approximately equally across the cluster. Note that for clusters with large amounts of sharded data, this data transfer could be time consuming and result in delayed query responses. Allowed values are:

true

false

The default value is ‘true’.

rebalance_unsharded_data – If true, unsharded data (a.k.a. randomly-sharded) will be rebalanced approximately equally across the cluster. Note that for clusters with large amounts of unsharded data, this data transfer could be time consuming and result in delayed query responses. Allowed values are:

true

false

The default value is ‘true’.

table_includes – Comma-separated list of unsharded table names to rebalance. Not applicable to sharded tables because they are always rebalanced. Cannot be used simultaneously with table_excludes. This parameter is ignored if rebalance_unsharded_data is false.

table_excludes – Comma-separated list of unsharded table names to not rebalance. Not applicable to sharded tables because they are always rebalanced. Cannot be used simultaneously with table_includes. This parameter is ignored if rebalance_unsharded_data is false.

aggressiveness – Influences how much data is moved at a time during rebalance. A higher aggressiveness will complete the rebalance faster. A lower aggressiveness will take longer but allow for better interleaving between the rebalance and other queries. Valid values are constants from 1 (lowest) to 10 (highest). The default value is ‘10’.

compact_after_rebalance – Perform compaction of deleted records once the rebalance completes to reclaim memory and disk space. Default is true, unless repair_incorrectly_sharded_data is set to true. Allowed values are:

true

false

The default value is ‘true’.

compact_only – If set to true, ignore rebalance options and attempt to perform compaction of deleted records to reclaim memory and disk space without rebalancing first. Allowed values are:

true

false

The default value is ‘false’.

repair_incorrectly_sharded_data – Scans for any data sharded incorrectly and re-routes the data to the correct location. Only necessary if GPUdb.admin_verify_db() reports an error in sharding alignment. This can be done as part of a typical rebalance after expanding the cluster or in a standalone fashion when it is believed that data is sharded incorrectly somewhere in the cluster. Compaction will not be performed by default when this is enabled. If this option is set to true, the time necessary to rebalance and the memory used by the rebalance may increase. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

info (dict of str to str) –
Additional information.
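The options map for admin_rebalance() is a dict of str to str with a few documented constraints: aggressiveness ranges from 1 to 10, and table_includes cannot be combined with table_excludes. The sketch below builds such a dict and checks those constraints on the client side. rebalance_options is a hypothetical helper; the server remains the authority on validation.

```python
def rebalance_options(sharded=True, unsharded=True, aggressiveness=10,
                      table_includes=None, table_excludes=None):
    """Build a str-to-str options dict for a rebalance request."""
    if not 1 <= aggressiveness <= 10:
        raise ValueError("aggressiveness must be between 1 and 10")
    if table_includes and table_excludes:
        raise ValueError("table_includes and table_excludes cannot be combined")
    options = {
        # Boolean options are passed as the strings 'true' / 'false'.
        "rebalance_sharded_data": "true" if sharded else "false",
        "rebalance_unsharded_data": "true" if unsharded else "false",
        "aggressiveness": str(aggressiveness),
    }
    if table_includes:
        options["table_includes"] = ",".join(table_includes)
    if table_excludes:
        options["table_excludes"] = ",".join(table_excludes)
    return options


opts = rebalance_options(sharded=False, aggressiveness=3,
                         table_includes=["example.a", "example.b"])
print(opts)
```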

admin_remove_host(host=None, options={})[source]

Removes a host from an existing cluster. If the host to be removed has any ranks running on it, the ranks must be removed using GPUdb.admin_remove_ranks() or manually switched over to a new host using GPUdb.admin_switchover() prior to host removal. If the host to be removed has the graph server or SQL planner running on it, these must be manually switched over to a new host using GPUdb.admin_switchover().

Note

This method should be used for on-premise deployments only.

Parameters

host (str) –
Identifies the host this applies to. Can be the host address, or formatted as ‘hostN’ where N is the host number as specified in gpudb.conf

options (dict of str to str) –
Optional parameters. Allowed keys are:

dry_run – If set to true, only validation checks will be performed. No host is removed. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

info (dict of str to str) –
Additional information.

admin_remove_ranks(ranks=None, options={})[source]

Remove one or more ranks from an existing Kinetica cluster. All data will be rebalanced to other ranks before the rank(s) is removed unless the rebalance_sharded_data or rebalance_unsharded_data parameters are set to false in the input parameter options, in which case the corresponding sharded data and/or unsharded data (a.k.a. randomly-sharded) will be deleted.

The database must be offline for this operation; see GPUdb.admin_offline().

This endpoint’s processing time depends on the amount of data in the system, thus the API call may time out if run directly. It is recommended to run this endpoint asynchronously via GPUdb.create_job().

Note

This method should be used for on-premise deployments only.

Parameters

ranks (list of str) –
Each array value designates one or more ranks to remove from the cluster. Values can be formatted as ‘rankN’ for a specific rank, ‘hostN’ (from the gpudb.conf file) to remove all ranks on that host, or the host IP address (hostN.address from the gpudb.conf file) which also removes all ranks on that host. Rank 0 (the head rank) cannot be removed (but can be moved to another host using GPUdb.admin_switchover()). At least one worker rank must be left in the cluster after the operation. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

rebalance_sharded_data – If true, sharded data will be rebalanced approximately equally across the cluster. Note that for clusters with large amounts of sharded data, this data transfer could be time consuming and result in delayed query responses. Allowed values are:

true

false

The default value is ‘true’.

rebalance_unsharded_data – If true, unsharded data (a.k.a. randomly-sharded) will be rebalanced approximately equally across the cluster. Note that for clusters with large amounts of unsharded data, this data transfer could be time consuming and result in delayed query responses. Allowed values are:

true

false

The default value is ‘true’.

aggressiveness – Influences how much data is moved at a time during rebalance. A higher aggressiveness will complete the rebalance faster. A lower aggressiveness will take longer but allow for better interleaving between the rebalance and other queries. Valid values are constants from 1 (lowest) to 10 (highest). The default value is ‘10’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

removed_ranks (list of str) –
The number assigned to each rank removed from the cluster. This array will be empty if the operation fails.

info (dict of str to str) –
Additional information.

admin_repair_table(table_names=None, options={})[source]

Manually repair a corrupted table. Returns information about affected tables.

Parameters

table_names (list of str) –
List of tables to query. An asterisk returns all tables. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

repair_policy – Corrective action to take. Allowed values are:

delete_chunks – Deletes any corrupted chunks

shrink_columns – Shrinks corrupted chunks to the shortest column

replay_wal – Manually invokes write-ahead log (WAL) replay on the table

verify_all – If false only table chunk data already known to be corrupted will be repaired. Otherwise the database will perform a full table scan to check for correctness. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_names (list of str) –
List of repaired tables.

repair_status (list of str) –
List of repair status by table.

info (dict of str to str) –
Additional information.

admin_send_alert(message='', label='', log_level=None, options={})[source]

Sends a user generated alert to the monitoring system.

Parameters

message (str) –
Alert message body. The default value is ‘’.

label (str) –
Label to add to alert message. The default value is ‘’.

log_level (str) –
Alert message logging criteria. Allowed values are:

fatal

error

warn

info

debug

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

info (dict of str to str) –
Additional information.

admin_show_alerts(num_alerts=None, options={})[source]

Requests a list of the most recent alerts. Returns lists of alert data, including timestamp and type.

Parameters

num_alerts (int) –
Number of most recent alerts to request. The response will include up to input parameter num_alerts depending on how many alerts there are in the system. A value of 0 returns all stored alerts.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

timestamps (list of str) –
Timestamp for when the alert occurred, sorted from most recent to least recent. Each array entry corresponds with the entries at the same index in output parameter types and output parameter params.

types (list of str) –
Type of system alert, sorted from most recent to least recent. Each array entry corresponds with the entries at the same index in output parameter timestamps and output parameter params.

params (list of dicts of str to str) –
Parameters for each alert, sorted from most recent to least recent. Each array entry corresponds with the entries at the same index in output parameter timestamps and output parameter types.

info (dict of str to str) –
Additional information.
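The timestamps, types, and params lists returned by admin_show_alerts() are aligned by index, so they can be combined into one record per alert. The following sketch works on a plain dict; alerts_as_records is a hypothetical helper and the sample values are made up for illustration.

```python
def alerts_as_records(response):
    """Combine the parallel alert lists into one dict per alert."""
    return [
        {"timestamp": ts, "type": kind, "params": params}
        for ts, kind, params in zip(
            response["timestamps"], response["types"], response["params"]
        )
    ]


# Made-up sample in the documented shape (most recent alert first).
sample = {
    "timestamps": ["2024-01-02 10:00:00", "2024-01-01 09:00:00"],
    "types": ["example_type_b", "example_type_a"],
    "params": [{"key": "b"}, {"key": "a"}],
    "info": {},
}
alerts = alerts_as_records(sample)
print(alerts[0]["type"])
```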

admin_show_cluster_operations(history_index=0, options={})[source]

Requests the detailed status of the current operation (by default) or a prior cluster operation specified by input parameter history_index. Returns details on the requested cluster operation.

The response will also indicate how many cluster operations are stored in the history.

Parameters

history_index (int) –
Indicates which cluster operation to retrieve. Use 0 for the most recent. The default value is 0.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

history_index (int) –
The index of this cluster operation in the reverse-chronologically sorted list of operations, where 0 is the most recent operation.

history_size (int) –
Number of cluster operations executed to date.

in_progress (bool) –
Whether this cluster operation is currently in progress or not. Allowed values are:

True

False

start_time (str) –
The start time of the cluster operation.

end_time (str) –
The end time of the cluster operation, if completed.

endpoint (str) –
The endpoint that initiated the cluster operation.

endpoint_schema (str) –
The schema for the original request.

overall_status (str) –
Overall success status of the operation. Allowed values are:

OK – The operation was successful, or, if still in progress, the operation is successful so far.

ERROR – An error occurred executing the operation.

user_stopped (bool) –
Whether a user stopped this operation at any point while in progress. Allowed values are:

True

False

percent_complete (int) –
Percent complete of this entire operation.

dry_run (bool) –
Whether this operation was a dry run. Allowed values are:

True

False

messages (list of str) –
Updates and error messages if any.

add_ranks (bool) –
Whether adding ranks is (or was) part of this operation. Allowed values are:

True

False

add_ranks_status (str) –
If this was a rank-adding operation, the add-specific status of the operation. Allowed values are:

NOT_STARTED

IN_PROGRESS

INTERRUPTED

COMPLETED_OK

ERROR

ranks_being_added (list of ints) –
The rank numbers of the ranks currently being added, or the rank numbers that were added if the operation is complete.

rank_hosts (list of str) –
The host IP addresses of the ranks being added, in the same order as the output parameter ranks_being_added list.

add_ranks_percent_complete (int) –
Current percent complete of the add ranks operation.

remove_ranks (bool) –
Whether removing ranks is (or was) part of this operation. Allowed values are:

True

False

remove_ranks_status (str) –
If this was a rank-removing operation, the removal-specific status of the operation. Allowed values are:

NOT_STARTED

IN_PROGRESS

INTERRUPTED

COMPLETED_OK

ERROR

ranks_being_removed (list of ints) –
The ranks being removed, or that have been removed if the operation is completed.

remove_ranks_percent_complete (int) –
Current percent complete of the remove ranks operation.

rebalance (bool) –
Whether data and/or shard rebalancing is (or was) part of this operation. Allowed values are:

True

False

rebalance_unsharded_data (bool) –
Whether rebalancing of unsharded data is (or was) part of this operation. Allowed values are:

True

False

rebalance_unsharded_data_status (str) –
If this was an operation that included rebalancing unsharded data, the rebalancing-specific status of the operation. Allowed values are:

NOT_STARTED

IN_PROGRESS

INTERRUPTED

COMPLETED_OK

ERROR

unsharded_rebalance_percent_complete (int) –
Percentage of unsharded tables that completed rebalancing, out of all unsharded tables to rebalance.

rebalance_sharded_data (bool) –
Whether rebalancing of sharded data is (or was) part of this operation. Allowed values are:

True

False

shard_array_version (long) –
Version of the shard array that is (or was) being rebalanced to. Each change to the shard array results in the version number incrementing.

rebalance_sharded_data_status (str) –
If this was an operation that included rebalancing sharded data, the rebalancing-specific status of the operation. Allowed values are:

NOT_STARTED

IN_PROGRESS

INTERRUPTED

COMPLETED_OK

ERROR

num_shards_changing (int) –
Number of shards that will change as part of rebalance.

sharded_rebalance_percent_complete (int) –
Percentage of shard keys, and their associated data if applicable, that have completed rebalancing.

info (dict of str to str) –
Additional information.
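As a rough illustration of how the status fields above fit together, the sketch below condenses a response into one entry per phase that is part of the operation. It assumes the response behaves like a plain dict with the keys documented above; the helper name summarize_phases and the sample values are hypothetical and not part of the API.

```python
def summarize_phases(response):
    """Return {phase: (status, percent_complete)} for phases that are part of the operation."""
    # Maps each boolean "is this phase included" flag to its status and percent keys.
    phases = {
        "remove_ranks": ("remove_ranks_status", "remove_ranks_percent_complete"),
        "rebalance_unsharded_data": ("rebalance_unsharded_data_status",
                                     "unsharded_rebalance_percent_complete"),
        "rebalance_sharded_data": ("rebalance_sharded_data_status",
                                   "sharded_rebalance_percent_complete"),
    }
    summary = {}
    for flag, (status_key, percent_key) in phases.items():
        if response.get(flag):
            summary[flag] = (response.get(status_key), response.get(percent_key))
    return summary


# Hypothetical response values, for illustration only.
sample = {
    "remove_ranks": False,
    "rebalance_unsharded_data": True,
    "rebalance_unsharded_data_status": "COMPLETED_OK",
    "unsharded_rebalance_percent_complete": 100,
    "rebalance_sharded_data": True,
    "rebalance_sharded_data_status": "IN_PROGRESS",
    "sharded_rebalance_percent_complete": 40,
}
print(summarize_phases(sample))
```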

admin_show_jobs(options={})[source]

Get a list of the current jobs in GPUdb.

Parameters

options (dict of str to str) –
Optional parameters. Allowed keys are:

show_async_jobs – If true, then the completed async jobs are also included in the response. By default, once the async jobs are completed they are no longer included in the jobs list. Allowed values are:

true

false

The default value is ‘false’.

show_worker_info – If true, then information is also returned from worker ranks. By default only status from the head rank is returned. Allowed values are:

true

false

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

job_id (list of longs)

status (list of str)

endpoint_name (list of str)

time_received (list of longs)

auth_id (list of str)

source_ip (list of str)

query_text (list of str)

user_data (list of str)

flags (list of str)

info (dict of str to str) –
Additional information. Allowed keys are:

job_tag – The job tag specified by the user or if unspecified by user, an internally generated unique identifier for the job across clusters.

worker_info – Worker job information as json

The default value is an empty dict ( {} ).
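The response above is a set of lists rather than one record per job. The following sketch regroups it into one dict per job. It assumes the lists are parallel (index i in every list describes the same job), which the listing suggests but does not state; jobs_as_rows and the sample data are hypothetical.

```python
JOB_FIELDS = [
    "job_id", "status", "endpoint_name", "time_received", "auth_id",
    "source_ip", "query_text", "user_data", "flags",
]


def jobs_as_rows(response):
    """Regroup the per-field lists of an admin_show_jobs response into one dict per job."""
    present = [field for field in JOB_FIELDS if field in response]
    columns = [response[field] for field in present]
    # zip(*columns) walks all lists in step; assumes they are index-aligned.
    return [dict(zip(present, values)) for values in zip(*columns)]


# With a live connection this might be: response = db.admin_show_jobs(options={})
# Hypothetical sample, for illustration only.
sample = {
    "job_id": [101, 102],
    "status": ["RUNNING", "DONE"],
    "endpoint_name": ["/execute/sql", "/aggregate/groupby"],
}
for job in jobs_as_rows(sample):
    print(job)
```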

admin_show_shards(options={})[source]

Shows the mapping of shards to their corresponding rank and TOM. The response contains lists of 16384 (the total number of shards in the system) rank and TOM numbers, one for each shard.

Parameters

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

version (long) –
Current shard array version number.

rank (list of ints) –
Array of ranks indexed by the shard number.

tom (list of ints) –
Array of toms to which the corresponding shard belongs.

info (dict of str to str) –
Additional information.
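One way to read the rank and tom arrays is to count how many shards each rank (or each rank/TOM pair) owns. The sketch below does this on a tiny hypothetical response; a real response would carry 16384 entries per list. It assumes the two lists are index-aligned by shard number, and shards_per_rank and shards_per_tom are illustrative names only.

```python
from collections import Counter


def shards_per_rank(response):
    """Count the shards assigned to each rank."""
    return dict(Counter(response["rank"]))


def shards_per_tom(response):
    """Count the shards assigned to each (rank, tom) pair; assumes aligned lists."""
    return dict(Counter(zip(response["rank"], response["tom"])))


# Hypothetical five-shard response, for illustration only.
sample = {"version": 1, "rank": [1, 1, 2, 2, 2], "tom": [0, 1, 0, 0, 1]}
print(shards_per_rank(sample))
print(shards_per_tom(sample))
```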

admin_shutdown(exit_type=None, authorization=None, options={})[source]

Exits the database server application.

Parameters

exit_type (str) –
Reserved for future use. User can pass an empty string.

authorization (str) –
No longer used. User can pass an empty string.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

exit_status (str) –
‘OK’ upon (right before) successful exit.

info (dict of str to str) –
Additional information.

admin_switchover(processes=None, destinations=None, options={})[source]

Manually switch over one or more processes to another host. Individual ranks or entire hosts may be moved to another host.

Note

This method should be used for on-premise deployments only.

Parameters

processes (list of str) –
Indicates the process identifier to switch over to another host. Options are ‘hostN’ and ‘rankN’ where ‘N’ corresponds to the number associated with a host or rank in the Network section of the gpudb.conf file; e.g., ‘host[N].address’ or ‘rank[N].host’. If ‘hostN’ is provided, all processes on that host will be moved to another host. Each entry in this array will be switched over to the corresponding host entry at the same index in input parameter destinations. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

destinations (list of str) –
Indicates to which host to switch over each corresponding process given in input parameter processes. Each index must be specified as ‘hostN’ where ‘N’ corresponds to the number associated with a host or rank in the Network section of the gpudb.conf file; e.g., ‘host[N].address’. Each entry in this array will receive the corresponding process entry at the same index in input parameter processes. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

dry_run – If set to true, only validation checks will be performed. Nothing is switched over. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

info (dict of str to str) –
Additional information.
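Because processes and destinations are matched by index, it can help to build both lists from explicit pairs. The following is a minimal sketch of that idea, defaulting to dry_run so that only validation checks are requested. build_switchover_args is a hypothetical helper, and its prefix checks are only a rough local sanity check, not the server's validation.

```python
def build_switchover_args(moves, dry_run=True):
    """Build keyword arguments for admin_switchover from (process, destination) pairs."""
    processes, destinations = [], []
    for process, destination in moves:
        if not process.startswith(("host", "rank")):
            raise ValueError("process must look like 'hostN' or 'rankN': %r" % process)
        if not destination.startswith("host"):
            raise ValueError("destination must look like 'hostN': %r" % destination)
        processes.append(process)
        destinations.append(destination)
    return {
        "processes": processes,
        "destinations": destinations,
        # Option values are strings, per the 'dict of str to str' type.
        "options": {"dry_run": "true" if dry_run else "false"},
    }


args = build_switchover_args([("rank2", "host1"), ("host3", "host4")])
print(args)
# With a live connection this might be passed as: db.admin_switchover(**args)
```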

admin_verify_db(options={})[source]

Verifies that the database is in a consistent state. When inconsistencies or errors are found, the verified_ok flag in the response is set to false and the errors found are listed in error_list.

Parameters

options (dict of str to str) –
Optional parameters. Allowed keys are:

rebuild_on_error – [DEPRECATED – Use the Rebuild DB feature of GAdmin instead.] Allowed values are:

true

false

The default value is ‘false’.

verify_nulls – When true, verifies that null values are set to zero. Allowed values are:

true

false

The default value is ‘false’.

verify_persist – When true, persistent objects will be compared against their state in memory and workers will be checked for orphaned table data in persist. To check for orphaned worker data, either set concurrent_safe in input parameter options to true or place the database offline. Allowed values are:

true

false

The default value is ‘false’.

concurrent_safe – When true, allows this endpoint to be run safely with other concurrent database operations. Other operations may be slower while this is running. Allowed values are:

true

false

The default value is ‘true’.

verify_rank0 – If true, compare rank0 table metadata against workers’ metadata. Allowed values are:

true

false

The default value is ‘false’.

delete_orphaned_tables – If true, orphaned table directories found on workers for which there is no corresponding metadata will be deleted. It is recommended to run this while the database is offline OR set concurrent_safe in input parameter options to true. Allowed values are:

true

false

The default value is ‘false’.

verify_orphaned_tables_only – If true, only the presence of orphaned table directories will be checked; all persistence and table consistency checks will be skipped. Allowed values are:

true

false

The default value is ‘false’.

table_includes – Comma-separated list of table names to include when verifying table consistency on workers. Cannot be used simultaneously with table_excludes.

table_excludes – Comma-separated list of table names to exclude when verifying table consistency on workers. Cannot be used simultaneously with table_includes.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

verified_ok (bool) –
True if no errors were found, false otherwise. The default value is False.

error_list (list of str) –
List of errors found while validating the database internal state. The default value is an empty list ( [] ).

orphaned_tables_total_size (long) –
If verify_persist is true, verify_orphaned_tables_only is true or delete_orphaned_tables is true, this is the sum in bytes of all orphaned tables found. Otherwise, -1.

info (dict of str to str) –
Additional information.
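The option rules above (string-valued booleans, and table_includes being mutually exclusive with table_excludes) can be checked before a request is sent. The sketch below shows one possible way to do so; build_verify_options and verification_errors are hypothetical helpers, and the response is assumed to behave like a dict.

```python
def build_verify_options(verify_persist=False, concurrent_safe=True,
                         table_includes=None, table_excludes=None):
    """Build an options dict for admin_verify_db, rejecting conflicting table filters."""
    if table_includes and table_excludes:
        raise ValueError("table_includes and table_excludes cannot be used together")
    options = {
        "verify_persist": "true" if verify_persist else "false",
        "concurrent_safe": "true" if concurrent_safe else "false",
    }
    if table_includes:
        options["table_includes"] = ",".join(table_includes)
    if table_excludes:
        options["table_excludes"] = ",".join(table_excludes)
    return options


def verification_errors(response):
    """Return the reported errors, or an empty list if verification passed."""
    if response["verified_ok"]:
        return []
    return list(response["error_list"])


options = build_verify_options(verify_persist=True, table_includes=["example.t1", "example.t2"])
print(options)
# With a live connection this might be: response = db.admin_verify_db(options=options)
```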

aggregate_convex_hull(table_name=None, x_column_name=None, y_column_name=None, options={})[source]

Calculates and returns the convex hull for the values in a table specified by input parameter table_name.

Parameters

table_name (str) –
Name of table on which the operation will be performed. Must be an existing table, in [schema_name.]table_name format, using standard name resolution rules.

x_column_name (str) –
Name of the column containing the x coordinates of the points for the operation being performed.

y_column_name (str) –
Name of the column containing the y coordinates of the points for the operation being performed.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

x_vector (list of floats) –
Array of x coordinates of the resulting convex set.

y_vector (list of floats) –
Array of y coordinates of the resulting convex set.

count (int) –
Count of the number of points in the convex set.

is_valid (bool)

info (dict of str to str) –
Additional information.
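The hull is returned as separate x and y arrays. A small sketch of pairing them back into points follows. It assumes x_vector and y_vector are index-aligned and that count gives the number of valid entries; hull_points, bounding_box and the sample values are hypothetical.

```python
def hull_points(response):
    """Pair x_vector and y_vector into a list of (x, y) points."""
    count = response["count"]
    return list(zip(response["x_vector"][:count], response["y_vector"][:count]))


def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y); independent of the order of the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical response for a unit square, for illustration only.
sample = {
    "x_vector": [0.0, 1.0, 1.0, 0.0],
    "y_vector": [0.0, 0.0, 1.0, 1.0],
    "count": 4,
    "is_valid": True,
}
points = hull_points(sample)
print(points, bounding_box(points))
```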

aggregate_group_by(table_name=None, column_names=None, offset=0, limit=-9999, encoding='binary', options={})[source]

Calculates unique combinations (groups) of values for the given columns in a given table or view and computes aggregates on each unique combination. This is somewhat analogous to an SQL-style SELECT…GROUP BY.

For aggregation details and examples, see Aggregation. For limitations, see Aggregation Limitations.

Any column(s) can be grouped on, and all column types except unrestricted-length strings may be used for computing applicable aggregates; columns marked as store-only are unable to be used in grouping or aggregation.

The results can be paged via input parameter offset and input parameter limit. For example, to get the 10 groups with the largest counts, the inputs would be: limit=10, options={"sort_order":"descending", "sort_by":"value"}.

Input parameter options can be used to customize behavior of this call e.g. filtering or sorting the results.

To group by columns ‘x’ and ‘y’ and compute the number of objects within each group, use: column_names=['x','y','count(*)'].

To also compute the sum of ‘z’ over each group, use: column_names=['x','y','count(*)','sum(z)'].

Available aggregation functions are: count(*), sum, min, max, avg, mean, stddev, stddev_pop, stddev_samp, var, var_pop, var_samp, arg_min, arg_max and count_distinct.

Available grouping functions are Rollup, Cube, and Grouping Sets.

This service also provides support for Pivot operations.

Filtering on aggregates is supported via expressions using aggregation functions supplied to having.

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

If a result_table name is specified in the input parameter options, the results are stored in a new table with that name, and no results are returned in the response. Both the table name and resulting column names must adhere to standard naming conventions; column/aggregation expressions will need to be aliased. If the source table’s shard key is used as the grouping column(s) and all result records are selected (input parameter offset is 0 and input parameter limit is -9999), the result table will be sharded; in all other cases it will be replicated. Sorting will function properly only if the result table is replicated or if there is only one processing node, and should not be relied upon in other cases. The result_table option is not available when any of the values of input parameter column_names is an unrestricted-length string.

Parameters

table_name (str) –
Name of an existing table or view on which the operation will be performed, in [schema_name.]table_name format, using standard name resolution rules.

column_names (list of str) –
List of one or more column names, expressions, and aggregate expressions. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

offset (long) –
A non-negative integer indicating the number of initial results to skip (this can be useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.

limit (long) –
A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results allowed by the server should be returned. The number of records returned will never exceed the server’s own limit, defined by the max_get_records_size parameter in the server configuration. Use output parameter has_more_records to see if more records exist in the result to be fetched, and input parameter offset & input parameter limit to request subsequent pages of results. The default value is -9999.

encoding (str) –
Specifies the encoding for returned records. Allowed values are:

binary – Indicates that the returned records should be binary encoded.

json – Indicates that the returned records should be json encoded.

The default value is ‘binary’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of result_table. If result_table_persist is false (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_result_table_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema as part of result_table and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema which is to contain the table specified in result_table. If the schema provided is non-existent, it will be automatically created.

expression – Filter expression to apply to the table prior to computing the aggregate group by.

pipelined_expression_evaluation – Evaluate the group-by during the last JoinedSet filter plan step. Allowed values are:

true

false

The default value is ‘false’.

having – Filter expression to apply to the aggregated results.

sort_order – [DEPRECATED–use order_by instead] String indicating how the returned values should be sorted - ascending or descending. Allowed values are:

ascending – Indicates that the returned values should be sorted in ascending order.

descending – Indicates that the returned values should be sorted in descending order.

The default value is ‘ascending’.

sort_by – [DEPRECATED–use order_by instead] String determining how the results are sorted. Allowed values are:

key – Indicates that the returned values should be sorted by key, which corresponds to the grouping columns. If you have multiple grouping columns (and are sorting by key), it will first sort the first grouping column, then the second grouping column, etc.

value – Indicates that the returned values should be sorted by value, which corresponds to the aggregates. If you have multiple aggregates (and are sorting by value), it will first sort by the first aggregate, then the second aggregate, etc.

The default value is ‘value’.

order_by – Comma-separated list of the columns to be sorted by as well as the sort direction, e.g., ‘timestamp asc, x desc’. The default value is ‘’.

strategy_definition – The tier strategy for the table and its columns.

compression_codec – The default compression codec for the result table’s columns.

result_table – The name of a table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. Column names (group-by and aggregate fields) need to be given aliases, e.g., ["FChar256 as fchar256", "sum(FDouble) as sfd"]. If present, no results are returned in the response. This option is not available if one of the grouping attributes is an unrestricted string (i.e., not charN) type.

result_table_persist – If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:

true

false

The default value is ‘false’.

result_table_force_replicated – Force the result table to be replicated (ignores any sharding). Must be used in combination with the result_table option. Allowed values are:

true

false

The default value is ‘false’.

result_table_generate_pk – If true then set a primary key for the result table. Must be used in combination with the result_table option. Allowed values are:

true

false

The default value is ‘false’.

result_table_generate_soft_pk – If true then set a soft primary key for the result table. Must be used in combination with the result_table option. Allowed values are:

true

false

The default value is ‘false’.

ttl – Sets the TTL of the table specified in result_table.

chunk_size – Indicates the number of records per chunk to be used for the result table. Must be used in combination with the result_table option.

chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for the result table. Must be used in combination with the result_table option.

chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for the result table. Must be used in combination with the result_table option.

create_indexes – Comma-separated list of columns on which to create indexes on the result table. Must be used in combination with the result_table option.

view_id – ID of view of which the result table will be a member. The default value is ‘’.

pivot – The pivot column.

pivot_values – The value list provided will become the column headers in the output. Should be the values from the pivot_column.

grouping_sets – Customize the grouping attribute sets to compute the aggregates. These sets can include ROLLUP or CUBE operators. The attribute sets should be enclosed in parentheses and can include composite attributes. All attributes specified in the grouping sets must be present in the group-by attributes.

rollup – This option is used to specify the multilevel aggregates.

cube – This option is used to specify the multidimensional aggregates.

shard_key – Comma-separated list of the columns to be sharded on; e.g. ‘column1, column2’. The columns specified must be present in input parameter column_names. If any alias is given for any column name, the alias must be used, rather than the original column name. The default value is ‘’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

response_schema_str (str) –
Avro schema of output parameter binary_encoded_response or output parameter json_encoded_response.

binary_encoded_response (bytes) –
Avro binary encoded response.

json_encoded_response (str) –
Avro JSON encoded response.

total_number_of_records (long) –
Total/Filtered number of records. This may be an over-estimate if a limit was applied and there are additional records (i.e., when output parameter has_more_records is true).

has_more_records (bool) –
True if there were too many records to return in one response, so that only a partial set was returned.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_result_table_name – The fully qualified name of the table (i.e. including the schema) used to store the results.

The default value is an empty dict ( {} ).

record_type (RecordType or None) –
A RecordType object with which the user can decode the binary data by using GPUdbRecord.decode_binary_data(). If JSON encoding is used, then None.
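To make the column_names conventions above concrete, the sketch below assembles the arguments for a group-by with aliased aggregates and an order_by option. build_group_by_args and the table name example.points are hypothetical; whether order_by accepts an alias such as ‘n’ is an assumption to verify against the server.

```python
def build_group_by_args(table_name, group_columns, aggregates, order_by=""):
    """Build keyword arguments for aggregate_group_by.

    aggregates is a list of (expression, alias) pairs, e.g. ("count(*)", "n").
    """
    column_names = list(group_columns)
    for expression, alias in aggregates:
        # Aliases use the 'expression as alias' form shown for result_table above.
        column_names.append("{} as {}".format(expression, alias))
    options = {}
    if order_by:
        options["order_by"] = order_by
    return {
        "table_name": table_name,
        "column_names": column_names,
        "offset": 0,
        "limit": -9999,  # END_OF_SET: as many results as the server allows
        "options": options,
    }


args = build_group_by_args(
    "example.points",
    ["x", "y"],
    [("count(*)", "n"), ("sum(z)", "sum_z")],
    order_by="n desc",
)
print(args)
# With a live connection this might be passed as: db.aggregate_group_by(**args)
```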

aggregate_group_by_and_decode(table_name=None, column_names=None, offset=0, limit=-9999, encoding='binary', options={}, record_type=None, force_primitive_return_types=True, get_column_major=True)[source]

Calculates unique combinations (groups) of values for the given columns in a given table or view and computes aggregates on each unique combination. This is somewhat analogous to an SQL-style SELECT…GROUP BY.

For aggregation details and examples, see Aggregation. For limitations, see Aggregation Limitations.

Any column(s) can be grouped on, and all column types except unrestricted-length strings may be used for computing applicable aggregates; columns marked as store-only are unable to be used in grouping or aggregation.

The results can be paged via input parameter offset and input parameter limit. For example, to get the 10 groups with the largest counts, the inputs would be: limit=10, options={"sort_order":"descending", "sort_by":"value"}.

Input parameter options can be used to customize behavior of this call e.g. filtering or sorting the results.

To group by columns ‘x’ and ‘y’ and compute the number of objects within each group, use: column_names=['x','y','count(*)'].

To also compute the sum of ‘z’ over each group, use: column_names=['x','y','count(*)','sum(z)'].

Available aggregation functions are: count(*), sum, min, max, avg, mean, stddev, stddev_pop, stddev_samp, var, var_pop, var_samp, arg_min, arg_max and count_distinct.

Available grouping functions are Rollup, Cube, and Grouping Sets.

This service also provides support for Pivot operations.

Filtering on aggregates is supported via expressions using aggregation functions supplied to having.

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

If a result_table name is specified in the input parameter options, the results are stored in a new table with that name, and no results are returned in the response. Both the table name and resulting column names must adhere to standard naming conventions; column/aggregation expressions will need to be aliased. If the source table’s shard key is used as the grouping column(s) and all result records are selected (input parameter offset is 0 and input parameter limit is -9999), the result table will be sharded; in all other cases it will be replicated. Sorting will function properly only if the result table is replicated or if there is only one processing node, and should not be relied upon in other cases. The result_table option is not available when any of the values of input parameter column_names is an unrestricted-length string.

Parameters

table_name (str) –
Name of an existing table or view on which the operation will be performed, in [schema_name.]table_name format, using standard name resolution rules.

column_names (list of str) –
List of one or more column names, expressions, and aggregate expressions. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

offset (long) –
A non-negative integer indicating the number of initial results to skip (this can be useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.

limit (long) –
A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results allowed by the server should be returned. The number of records returned will never exceed the server’s own limit, defined by the max_get_records_size parameter in the server configuration. Use output parameter has_more_records to see if more records exist in the result to be fetched, and input parameter offset & input parameter limit to request subsequent pages of results. The default value is -9999.

encoding (str) –
Specifies the encoding for returned records. Allowed values are:

binary – Indicates that the returned records should be binary encoded.

json – Indicates that the returned records should be json encoded.

The default value is ‘binary’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of result_table. If result_table_persist is false (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_result_table_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema as part of result_table and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema which is to contain the table specified in result_table. If the schema provided is non-existent, it will be automatically created.

expression – Filter expression to apply to the table prior to computing the aggregate group by.

pipelined_expression_evaluation – Evaluate the group-by during the last JoinedSet filter plan step. Allowed values are:

true

false

The default value is ‘false’.

having – Filter expression to apply to the aggregated results.

sort_order – [DEPRECATED–use order_by instead] String indicating how the returned values should be sorted - ascending or descending. Allowed values are:

ascending – Indicates that the returned values should be sorted in ascending order.

descending – Indicates that the returned values should be sorted in descending order.

The default value is ‘ascending’.

sort_by – [DEPRECATED–use order_by instead] String determining how the results are sorted. Allowed values are:

key – Indicates that the returned values should be sorted by key, which corresponds to the grouping columns. If you have multiple grouping columns (and are sorting by key), it will first sort the first grouping column, then the second grouping column, etc.

value – Indicates that the returned values should be sorted by value, which corresponds to the aggregates. If you have multiple aggregates (and are sorting by value), it will first sort by the first aggregate, then the second aggregate, etc.

The default value is ‘value’.

order_by – Comma-separated list of the columns to be sorted by as well as the sort direction, e.g., ‘timestamp asc, x desc’. The default value is ‘’.

strategy_definition – The tier strategy for the table and its columns.

compression_codec – The default compression codec for the result table’s columns.

result_table – The name of a table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. Column names (group-by and aggregate fields) need to be given aliases, e.g., ["FChar256 as fchar256", "sum(FDouble) as sfd"]. If present, no results are returned in the response. This option is not available if one of the grouping attributes is an unrestricted string (i.e., not charN) type.

result_table_persist – If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:

true

false

The default value is ‘false’.

result_table_force_replicated – Force the result table to be replicated (ignores any sharding). Must be used in combination with the result_table option. Allowed values are:

true

false

The default value is ‘false’.

result_table_generate_pk – If true then set a primary key for the result table. Must be used in combination with the result_table option. Allowed values are:

true

false

The default value is ‘false’.

result_table_generate_soft_pk – If true then set a soft primary key for the result table. Must be used in combination with the result_table option. Allowed values are:

true

false

The default value is ‘false’.

ttl – Sets the TTL of the table specified in result_table.

chunk_size – Indicates the number of records per chunk to be used for the result table. Must be used in combination with the result_table option.

chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for the result table. Must be used in combination with the result_table option.

chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for the result table. Must be used in combination with the result_table option.

create_indexes – Comma-separated list of columns on which to create indexes on the result table. Must be used in combination with the result_table option.

view_id – ID of view of which the result table will be a member. The default value is ‘’.

pivot – The pivot column.

pivot_values – The value list provided will become the column headers in the output. Should be the values from the pivot_column.

grouping_sets – Customize the grouping attribute sets to compute the aggregates. These sets can include ROLLUP or CUBE operators. The attribute sets should be enclosed in parentheses and can include composite attributes. All attributes specified in the grouping sets must be present in the group-by attributes.

rollup – This option is used to specify the multilevel aggregates.

cube – This option is used to specify the multidimensional aggregates.

shard_key – Comma-separated list of the columns to be sharded on; e.g. ‘column1, column2’. The columns specified must be present in input parameter column_names. If any alias is given for any column name, the alias must be used, rather than the original column name. The default value is ‘’.

The default value is an empty dict ( {} ).

record_type (RecordType or None) –
The record type expected in the results, or None to determine the appropriate type automatically. If known, providing this may improve performance in binary mode. Not used in JSON mode. The default value is None.

force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs used for datetime-type columns would have their values returned as strings. If False, then Record objects will be returned, which, for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.

get_column_major (bool) –
Indicates if the decoded records will be transposed to be column-major or returned as is (row-major). Default value is True.

Returns

A dict with the following entries–

response_schema_str (str) –
Avro schema of output parameter binary_encoded_response or output parameter json_encoded_response.

total_number_of_records (long) –
Total/Filtered number of records. This may be an over-estimate if a limit was applied and there are additional records (i.e., when output parameter has_more_records is true).

has_more_records (bool) –
True if there were too many records to return in one response, so that only a partial set was returned.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_result_table_name – The fully qualified name of the table (i.e. including the schema) used to store the results.

The default value is an empty dict ( {} ).

records (list of Record) –
A list of Record objects which contain the decoded records.
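The offset, limit and has_more_records fields described above suggest a simple paging loop. The following sketch shows one such loop. It is exercised against a stand-in FakeClient class (hypothetical, used only so the example runs without a server) and assumes the response behaves like a dict and that row-major records (get_column_major=False) can be counted with len(). For stable pages on a real server, an explicit order_by option is likely needed.

```python
def fetch_all_groups(db, table_name, column_names, page_size=1000, options=None):
    """Collect all group-by results page by page, using has_more_records to stop."""
    rows = []
    offset = 0
    while True:
        response = db.aggregate_group_by_and_decode(
            table_name=table_name,
            column_names=column_names,
            offset=offset,
            limit=page_size,
            options=dict(options or {}),
            get_column_major=False,  # row-major, so each record is one group
        )
        page = list(response["records"])
        rows.extend(page)
        # Stop when the server reports no more records, or on an empty page.
        if not response["has_more_records"] or not page:
            break
        offset += len(page)
    return rows


class FakeClient:
    """Stand-in for a GPUdb connection; serves slices of an in-memory list."""

    def __init__(self, groups):
        self._groups = groups

    def aggregate_group_by_and_decode(self, table_name=None, column_names=None,
                                      offset=0, limit=-9999, options=None,
                                      get_column_major=True):
        end = len(self._groups) if limit == -9999 else offset + limit
        return {
            "records": self._groups[offset:end],
            "has_more_records": end < len(self._groups),
        }


groups = [{"x": i, "n": 10 - i} for i in range(5)]
fetched = fetch_all_groups(FakeClient(groups), "example.points",
                           ["x", "count(*) as n"], page_size=2,
                           options={"order_by": "x asc"})
print(len(fetched))
```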

aggregate_histogram(table_name=None, column_name=None, start=None, end=None, interval=None, options={})[source]

Performs a histogram calculation given a table, a column, and an interval function. The input parameter interval is used to produce bins of that size and the result, computed over the records falling within each bin, is returned. For each bin, the start value is inclusive, but the end value is exclusive–except for the very last bin for which the end value is also inclusive. The value returned for each bin is the number of records in it, except when a column name is provided as a value_column. In this latter case the sum of the values corresponding to the value_column is used as the result instead. The total number of bins requested cannot exceed 10,000.

NOTE: The Kinetica instance being accessed must be running a CUDA (GPU-based) build to service a request that specifies a value_column.

Parameters

table_name (str) –
Name of the table on which the operation will be performed. Must be an existing table, in [schema_name.]table_name format, using standard name resolution rules.

column_name (str) –
Name of a column or an expression of one or more column names over which the histogram will be calculated.

start (float) –
Lower end value of the histogram interval, inclusive.

end (float) –
Upper end value of the histogram interval, inclusive.

interval (float) –
The size of each bin within the start and end parameters.

options (dict of str to str) –
Optional parameters. Allowed keys are:

value_column – The name of the column to use when calculating the bin values (values are summed). The column must be a numerical type (int, double, long, float).

start – The start parameter for char types.

end – The end parameter for char types.

interval – The interval parameter for char types.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

counts (list of floats) –
The array of calculated values that represents the histogram data points.

start (float) –
Value of input parameter start.

end (float) –
Value of input parameter end.

info (dict of str to str) –
Additional information.
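
The binning rules of aggregate_histogram can be illustrated with a small client-side sketch. This is not the server's implementation. The function name histogram_sketch, the bin count formula (ceiling of (end - start) / interval), and the choice to ignore values outside [start, end] are assumptions made for illustration. The weights argument stands in for the value_column option.

```python
import math


def histogram_sketch(values, start, end, interval, weights=None):
    """Illustrative local histogram; hypothetical helper, not part of gpudb."""
    if interval <= 0 or end <= start:
        raise ValueError("need interval > 0 and end > start")
    # Assumption: the bin count is the ceiling of the range over the bin size.
    nbins = math.ceil((end - start) / interval)
    if nbins > 10000:
        raise ValueError("the total number of bins cannot exceed 10,000")
    counts = [0.0] * nbins
    for i, value in enumerate(values):
        if value < start or value > end:
            continue  # assumption: out-of-range values are not counted
        # Start is inclusive and end is exclusive, except in the last bin,
        # whose end is inclusive: clamp value == end into the last bin.
        index = min(int((value - start) // interval), nbins - 1)
        counts[index] += 1.0 if weights is None else weights[i]
    return counts


data = [0, 1, 2.5, 9.9, 10]
plain_counts = histogram_sketch(data, start=0, end=10, interval=2.5)
summed = histogram_sketch(data, 0, 10, 2.5, weights=[1, 2, 3, 4, 5])
print(plain_counts)
print(summed)
```

In this sketch the value 10 lands in the last bin because the final bin's end is inclusive, while 2.5 opens the second bin because each bin's start is inclusive.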

aggregate_k_means(table_name=None, column_names=None, k=None, tolerance=None, options={})[source]

This endpoint runs the k-means algorithm - a heuristic algorithm that attempts to do k-means clustering. An ideal k-means clustering algorithm selects k points such that the sum of the mean squared distances of each member of the set to the nearest of the k points is minimized. The k-means algorithm, however, does not necessarily produce such an ideal clustering. It begins with a randomly selected set of k points, then refines the location of the points iteratively and settles to a local minimum. Various parameters and options are provided to control the heuristic search.

NOTE: The Kinetica instance being accessed must be running a CUDA (GPU-based) build to service this request.

Parameters

table_name (str) –
Name of the table on which the operation will be performed. Must be an existing table, in [schema_name.]table_name format, using standard name resolution rules.

column_names (list of str) –
List of column names on which the operation would be performed. If n columns are provided then each of the k result points will have n dimensions corresponding to the n columns. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

k (int) –
The number of mean points to be determined by the algorithm.

tolerance (float) –
Stop iterating when the distance between successive points is less than the given tolerance.

options (dict of str to str) –
Optional parameters. Allowed keys are:

whiten – When set to 1, each of the columns is first normalized by its stdv (standard deviation). The default is not to whiten.

max_iters – Number of times to try to hit the tolerance limit before giving up. The default is 10.

num_tries – Number of times to run the k-means algorithm, each time with different randomly selected starting points. This helps avoid local minima. The default is 1.

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of result_table. If result_table_persist is false (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_result_table_name. Allowed values are:

true

false

The default value is ‘false’.

result_table – The name of a table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If this option is specified, the results are not returned in the response.

result_table_persist – If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:

true

false

The default value is ‘false’.

ttl – Sets the TTL of the table specified in result_table.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

means (list of lists of floats) –
The k-mean values found.

counts (list of longs) –
The number of elements in the cluster closest to the corresponding k-means value.

rms_dists (list of floats) –
The root mean squared distance of the elements in the cluster for each of the k-means values.

count (long) –
The total count of all the clusters - will be the size of the input table.

rms_dist (float) –
The sum of all the rms_dists - the value the k-means algorithm is attempting to minimize.

tolerance (float) –
The distance between the last two iterations of the algorithm before it quit.

num_iters (int) –
The number of iterations the algorithm executed before it quit.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_result_table_name – The fully qualified name of the result table (i.e. including the schema) used to store the results.

The default value is an empty dict ( {} ).
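
The iterative refinement described for aggregate_k_means can be sketched locally as follows. This is a minimal illustration under stated assumptions, not the server's implementation: the helper names assign and k_means_sketch are hypothetical, the seeded random start exists only to make the demo repeatable, and the whiten and num_tries options are not modeled. The returned keys mirror the response entries listed above.

```python
import math
import random


def assign(points, means):
    """Group points by the index of their nearest mean."""
    clusters = [[] for _ in means]
    for point in points:
        nearest = min(range(len(means)), key=lambda j: math.dist(point, means[j]))
        clusters[nearest].append(point)
    return clusters


def k_means_sketch(points, k, tolerance, max_iters=10, seed=0):
    """Illustrative k-means refinement; hypothetical helper, not part of gpudb."""
    rng = random.Random(seed)  # assumption: seeded so the demo is repeatable
    means = [tuple(p) for p in rng.sample(points, k)]
    shift, num_iters = math.inf, 0
    while num_iters < max_iters and shift >= tolerance:
        clusters = assign(points, means)
        # Move each mean to its cluster's centroid (keep it if the cluster is empty).
        new_means = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else mean
            for members, mean in zip(clusters, means)
        ]
        # Largest distance any mean moved during this iteration.
        shift = max(math.dist(a, b) for a, b in zip(means, new_means))
        means, num_iters = new_means, num_iters + 1
    clusters = assign(points, means)
    rms_dists = [
        math.sqrt(sum(math.dist(p, m) ** 2 for p in members) / len(members))
        if members else 0.0
        for members, m in zip(clusters, means)
    ]
    return {
        "means": means,
        "counts": [len(members) for members in clusters],
        "rms_dists": rms_dists,
        "count": len(points),
        "rms_dist": sum(rms_dists),
        "tolerance": shift,
        "num_iters": num_iters,
    }


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
result = k_means_sketch(points, k=2, tolerance=1e-9)
print(result["means"], result["counts"], result["num_iters"])
```

Because the starting points are random, a single run can settle in a poor local minimum on less cleanly separated data, which is the situation the num_tries option is meant to mitigate.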

aggregate_min_max(table_name=None, column_name=None, options={})[source]

Calculates and returns the minimum and maximum values of a particular column in a table.

Parameters

table_name (str) –
Name of the table on which the operation will be performed. Must be an existing table, in [schema_name.]table_name format, using standard name resolution rules.

column_name (str) –
Name of a column or an expression of one or more columns on which the min-max will be calculated.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

min (float) –
Minimum value of the input parameter column_name.

max (float) –
Maximum value of the input parameter column_name.

info (dict of str to str) –
Additional information. Allowed keys are:

min_string – The minimum value of input parameter column_name when it is a char type.

max_string – The maximum value of input parameter column_name when it is a char type.

The default value is an empty dict ( {} ).
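
A small wrapper can hide the difference between numeric and char-type results of aggregate_min_max. The sketch below is an illustration only: column_bounds is a hypothetical helper, FakeClient is an offline stand-in with made-up responses so the code runs without a server, and the assumption that min_string and max_string appear only for char-type columns is inferred from the key descriptions above.

```python
def column_bounds(db, table_name, column_name):
    """Return (min, max) for a column; hypothetical helper, not part of gpudb."""
    response = db.aggregate_min_max(
        table_name=table_name, column_name=column_name, options={}
    )
    info = response.get("info", {})
    # Assumption: the string keys are present only for char-type columns.
    if "min_string" in info and "max_string" in info:
        return info["min_string"], info["max_string"]
    return response["min"], response["max"]


class FakeClient:
    """Offline stand-in for a GPUdb instance (hypothetical responses)."""

    def aggregate_min_max(self, table_name=None, column_name=None, options=None):
        if column_name == "name":
            return {"min": 0.0, "max": 0.0,
                    "info": {"min_string": "alice", "max_string": "zoe"}}
        return {"min": 1.5, "max": 9.0, "info": {}}


db = FakeClient()
print(column_bounds(db, "example.people", "age"))
print(column_bounds(db, "example.people", "name"))
```

With a real connection, db would be a GPUdb instance created as shown in the usage patterns at the top of this page.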

aggregate_min_max_geometry(table_name=None, column_name=None, options={})[source]

Calculates and returns the minimum and maximum x- and y-coordinates of a particular geospatial geometry column in a table.

Parameters

table_name (str) –
Name of the table on which the operation will be performed. Must be an existing table, in [schema_name.]table_name format, using standard name resolution rules.

column_name (str) –
Name of a geospatial geometry column on which the min-max will be calculated.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

min_x (float) –
Minimum x-coordinate value of the input parameter column_name.

max_x (float) –
Maximum x-coordinate value of the input parameter column_name.

min_y (float) –
Minimum y-coordinate value of the input parameter column_name.

max_y (float) –
Maximum y-coordinate value of the input parameter column_name.

info (dict of str to str) –
Additional information.

aggregate_statistics(table_name=None, column_name=None, stats=None, options={})[source]

Calculates the requested statistics of the given column(s) in a given table.

The available statistics are: count (number of total objects), mean, stdv (standard deviation), variance, skew, kurtosis, sum, min, max, weighted_average, cardinality (unique count), estimated_cardinality, percentile, and percentile_rank.

Estimated cardinality is calculated by using the hyperloglog approximation technique.

Percentiles and percentile ranks are approximate and are calculated using the t-digest algorithm. They must include the desired percentile/percentile_rank. To compute multiple percentiles each value must be specified separately (i.e. ‘percentile(75.0),percentile(99.0),percentile_rank(1234.56),percentile_rank(-5)’).

A second, comma-separated value can be added to the percentile statistic to calculate percentile resolution, e.g., a 50th percentile with 200 resolution would be ‘percentile(50,200)’.

The weighted average statistic requires a weight column to be specified in weight_column_name. The weighted average is then defined as the sum of the products of input parameter column_name times the weight_column_name values divided by the sum of the weight_column_name values.

Additional columns can be used in the calculation of statistics via additional_column_names. Values in these columns will be included in the overall aggregate calculation–individual aggregates will not be calculated per additional column. For instance, requesting the count & mean of input parameter column_name x and additional_column_names y & z, where x holds the numbers 1-10, y holds 11-20, and z holds 21-30, would return the total number of x, y, & z values (30), and the single average value across all x, y, & z values (15.5).

The response includes a list of key/value pairs of each statistic requested and its corresponding value.
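
The pooled calculation and the weighted average defined above can be checked with a few lines of plain Python. This sketch only reproduces the arithmetic locally; weighted_average is a hypothetical helper name, and the weights in the last line are made-up sample data.

```python
def weighted_average(values, weights):
    """Sum of value * weight products divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


x = list(range(1, 11))   # primary column_name values 1-10
y = list(range(11, 21))  # additional column, values 11-20
z = list(range(21, 31))  # additional column, values 21-30

# Additional columns are pooled with the primary column into one aggregate;
# no per-column aggregates are produced.
pooled = x + y + z
pooled_count = len(pooled)
pooled_mean = sum(pooled) / pooled_count

print(pooled_count, pooled_mean)                  # 30 15.5
print(weighted_average([10, 20, 30], [1, 1, 2]))  # 22.5
```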

Parameters

table_name (str) –
Name of the table on which the statistics operation will be performed, in [schema_name.]table_name format, using standard name resolution rules.

column_name (str) –
Name of the primary column for which the statistics are to be calculated.

stats (str) –
Comma separated list of the statistics to calculate, e.g. “sum,mean”. Allowed values are:

count – Number of objects (independent of the given column(s)).

mean – Arithmetic mean (average), equivalent to sum/count.

stdv – Sample standard deviation (denominator is count-1).

variance – Unbiased sample variance (denominator is count-1).

skew – Skewness (third standardized moment).

kurtosis – Kurtosis (fourth standardized moment).

sum – Sum of all values in the column(s).

min – Minimum value of the column(s).

max – Maximum value of the column(s).

weighted_average – Weighted arithmetic mean (using the option weight_column_name as the weighting column).

cardinality – Number of unique values in the column(s).

estimated_cardinality – Estimate (via hyperloglog technique) of the number of unique values in the column(s).

percentile – Estimate (via t-digest) of the given percentile of the column(s) (percentile(50.0) will be an approximation of the median). Add a second, comma-separated value to calculate percentile resolution, e.g., ‘percentile(75,150)’

percentile_rank – Estimate (via t-digest) of the percentile rank of the given value in the column(s) (if the given value is the median of the column(s), percentile_rank(<median>) will return approximately 50.0).

options (dict of str to str) –
Optional parameters. Allowed keys are:

additional_column_names – A list of comma separated column names over which statistics can be accumulated along with the primary column. All columns listed and input parameter column_name must be of the same type. Must not include the column specified in input parameter column_name and no column can be listed twice.

weight_column_name – Name of column used as weighting attribute for the weighted average statistic.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

stats (dict of str to floats) –
(statistic name, double value) pairs of the requested statistics, including the total count by default.

info (dict of str to str) –
Additional information.
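
Since each percentile and percentile rank must be listed separately in stats, it can be convenient to assemble the string programmatically. The helper below is a hypothetical convenience, not part of the gpudb package; it only formats the string according to the syntax described above.

```python
def build_stats(names=(), percentiles=(), percentile_ranks=()):
    """Build the comma-separated stats string; hypothetical helper."""
    parts = list(names)
    for p in percentiles:
        # A (percentile, resolution) pair becomes 'percentile(p,resolution)'.
        if isinstance(p, tuple):
            parts.append("percentile({},{})".format(*p))
        else:
            parts.append("percentile({})".format(p))
    parts.extend("percentile_rank({})".format(v) for v in percentile_ranks)
    return ",".join(parts)


stats = build_stats(
    names=["count", "mean"],
    percentiles=[75.0, (50, 200)],
    percentile_ranks=[1234.56, -5],
)
print(stats)
```

The resulting string could then be passed as the stats argument, for example gpudb.aggregate_statistics(table_name=..., column_name=..., stats=stats), where the table and column names are left to the caller.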

aggregate_statistics_by_range(table_name=None, select_expression='', column_name=None, value_column_name=None, stats=None, start=None, end=None, interval=None, options={})[source]

Divides the given set into bins and calculates statistics of the values of a value-column in each bin. The bins are based on the values of a given binning-column. The statistics that may be requested are mean, stdv (standard deviation), variance, skew, kurtosis, sum, min, max, first, last and weighted average. In addition to the requested statistics the count of total samples in each bin is returned. This counts vector is just the histogram of the column used to divide the set members into bins. The weighted average statistic requires a weight column to be specified in weight_column_name. The weighted average is then defined as the sum of the products of the value column times the weight column divided by the sum of the weight column.

There are two methods for binning the set members. In the first, which can be used for numeric valued binning-columns, a min, max and interval are specified. The number of bins, nbins, is the integer upper bound (ceiling) of (max-min)/interval. Values that fall in the range [min+n*interval,min+(n+1)*interval) are placed in the nth bin, where n ranges from 0..nbins-2. The final bin is [min+(nbins-1)*interval,max]. In the second method, bin_values specifies a list of binning-column values. Records whose binning-column value matches the nth member of the bin_values list are placed in the nth bin. When a list is provided, the binning-column must be of type string or int.

NOTE: The Kinetica instance being accessed must be running a CUDA (GPU-based) build to service this request.

Parameters

table_name (str) –
Name of the table on which the ranged-statistics operation will be performed, in [schema_name.]table_name format, using standard name resolution rules.

select_expression (str) –
For a non-empty expression, statistics are calculated only for those records for which the expression is true. The default value is ‘’.

column_name (str) –
Name of the binning-column used to divide the set samples into bins.

value_column_name (str) –
Name of the value-column for which statistics are to be computed.

stats (str) –
A comma-separated list of the statistics to calculate, e.g. ‘sum,mean’. Available statistics: mean, stdv (standard deviation), variance, skew, kurtosis, sum.

start (float) –
The lower bound of the binning-column.

end (float) –
The upper bound of the binning-column.

interval (float) –
The interval of a bin. Set members fall into bin i if the binning-column falls in the range [start+interval*i, start+interval*(i+1)).

options (dict of str to str) –
Optional parameters. Allowed keys are:

additional_column_names – A list of comma separated value-column names over which statistics can be accumulated along with the primary value_column.

bin_values – A list of comma separated binning-column values. Values that match the nth bin_values value are placed in the nth bin.

weight_column_name – Name of the column used as weighting column for the weighted_average statistic.

order_column_name – Name of the column used for candlestick charting techniques.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

stats (dict of str to lists of floats) –
A map with a key for each statistic in the stats input parameter, having a value that is a vector of the corresponding value-column bin statistics. In addition, the key count has a value that is a histogram of the binning-column.

info (dict of str to str) –
Additional information.
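
The two binning methods of aggregate_statistics_by_range can be sketched locally as follows. This is an illustration, not the server's implementation: the helper names are hypothetical, only count, sum and mean are computed, and reporting None for the mean of an empty bin is an assumption.

```python
import math


def range_bin(value, start, end, interval):
    """Bin index for the start/end/interval method, or None if out of range."""
    nbins = math.ceil((end - start) / interval)
    if value < start or value > end:
        return None
    # Bins are half-open except the last one, which also includes `end`.
    return min(int((value - start) // interval), nbins - 1)


def list_bin(value, bin_values):
    """Bin index for the bin_values method, or None if the value is not listed."""
    return bin_values.index(value) if value in bin_values else None


def stats_by_bin(bin_indexes, values, nbins):
    """Per-bin count, sum and mean of the value-column."""
    counts, sums = [0] * nbins, [0.0] * nbins
    for index, value in zip(bin_indexes, values):
        if index is not None:
            counts[index] += 1
            sums[index] += value
    # Assumption: an empty bin reports None for its mean.
    means = [s / c if c else None for s, c in zip(sums, counts)]
    return {"count": counts, "sum": sums, "mean": means}


binning_column = [1, 5, 9, 10, 3]
value_column = [10, 20, 30, 40, 50]
indexes = [range_bin(b, start=0, end=10, interval=4) for b in binning_column]
by_range = stats_by_bin(indexes, value_column, nbins=3)
print(by_range)
```

Here (10 - 0) / 4 rounds up to three bins, [0,4), [4,8) and [8,10], so the binning value 10 falls into the last bin rather than being dropped.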

aggregate_unique(table_name=None, column_name=None, offset=0, limit=-9999, encoding='binary', options={})[source]

Returns all the unique values from a particular column (specified by input parameter column_name) of a particular table or view (specified by input parameter table_name). If input parameter column_name is a numeric column, the values will be in output parameter binary_encoded_response. Otherwise, if input parameter column_name is a string column, the values will be in output parameter json_encoded_response. The results can be paged via input parameter offset and input parameter limit.

Columns marked as store-only are unable to be used with this function.

To get the first 10 unique values sorted in descending order, input parameter options would be:

{"limit":"10","sort_order":"descending"}

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

If a result_table name is specified in the input parameter options, the results are stored in a new table with that name and no results are returned in the response. Both the table name and resulting column name must adhere to standard naming conventions; any column expression will need to be aliased. If the source table’s shard key is used as the input parameter column_name, the result table will be sharded; in all other cases, it will be replicated. Sorting will function properly only if the result table is replicated or if there is only one processing node, and should not be relied upon in other cases. Not available if the value of input parameter column_name is an unrestricted-length string.

Parameters

table_name (str) –
Name of an existing table or view on which the operation will be performed, in [schema_name.]table_name format, using standard name resolution rules.

column_name (str) –
Name of the column or an expression containing one or more column names on which the unique function would be applied.

offset (long) –
A positive integer indicating the number of initial results to skip (this can be useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.

limit (long) –
A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results allowed by the server should be returned. The number of records returned will never exceed the server’s own limit, defined by the max_get_records_size parameter in the server configuration. Use output parameter has_more_records to see if more records exist in the result to be fetched, and input parameter offset & input parameter limit to request subsequent pages of results. The default value is -9999.

encoding (str) –
Specifies the encoding for returned records. Allowed values are:

binary – Indicates that the returned records should be binary encoded.

json – Indicates that the returned records should be json encoded.

The default value is ‘binary’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of result_table. If result_table_persist is false (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_result_table_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema as part of result_table and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema which is to contain the table specified in result_table. If the schema provided is non-existent, it will be automatically created.

expression – Optional filter expression to apply to the table.

sort_order – String indicating how the returned values should be sorted. Allowed values are:

ascending

descending

The default value is ‘ascending’.

order_by – Comma-separated list of the columns to be sorted by as well as the sort direction, e.g., ‘timestamp asc, x desc’. The default value is ‘’.

result_table – The name of the table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If present, no results are returned in the response. Not available if input parameter column_name is an unrestricted-length string.

result_table_persist – If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:

true

false

The default value is ‘false’.

result_table_force_replicated – Force the result table to be replicated (ignores any sharding). Must be used in combination with the result_table option. Allowed values are:

true

false

The default value is ‘false’.

result_table_generate_pk – If true then set a primary key for the result table. Must be used in combination with the result_table option. Allowed values are:

true

false

The default value is ‘false’.

ttl – Sets the TTL of the table specified in result_table.

chunk_size – Indicates the number of records per chunk to be used for the result table. Must be used in combination with the result_table option.

chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for the result table. Must be used in combination with the result_table option.

chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for the result table. Must be used in combination with the result_table option.

compression_codec – The default compression codec for the result table’s columns.

view_id – ID of view of which the result table will be a member. The default value is ‘’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_name (str) –
The same table name as was passed in the parameter list.

response_schema_str (str) –
Avro schema of output parameter binary_encoded_response or output parameter json_encoded_response.

binary_encoded_response (bytes) –
Avro binary encoded response.

json_encoded_response (str) –
Avro JSON encoded response.

has_more_records (bool) –
True if more records exist than were returned, i.e. the response contains only a partial set.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_result_table_name – The fully qualified name of the table (i.e. including the schema) used to store the results.

The default value is an empty dict ( {} ).

record_type (RecordType or None) –
A RecordType object that can be used to decode the binary data via GPUdbRecord.decode_binary_data(). None if JSON encoding is used.
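
Paging with offset, limit and has_more_records follows a simple loop, sketched below. The helper iter_pages is hypothetical, and fake_fetch with its values key is a made-up stand-in for the server so that the code runs offline. The sketch also assumes that page_size does not exceed the server's max_get_records_size, so that every page except the last is full.

```python
def iter_pages(fetch_page, page_size):
    """Yield responses page by page until has_more_records is false."""
    offset = 0
    while True:
        response = fetch_page(offset, page_size)
        yield response
        if not response["has_more_records"]:
            break
        # Assumption: page_size does not exceed the server's
        # max_get_records_size, so every page but the last is full.
        offset += page_size


# Stand-in for the server so the sketch runs offline (hypothetical data).
_UNIQUE_VALUES = list(range(25))


def fake_fetch(offset, limit):
    chunk = _UNIQUE_VALUES[offset:offset + limit]
    more = offset + limit < len(_UNIQUE_VALUES)
    return {"values": chunk, "has_more_records": more}


pages = list(iter_pages(fake_fetch, page_size=10))
page_sizes = [len(page["values"]) for page in pages]
print(page_sizes)
```

With a real connection, fetch_page could wrap the client call, for instance lambda offset, limit: gpudb.aggregate_unique(table_name=..., column_name=..., offset=offset, limit=limit, encoding='json').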

aggregate_unique_and_decode(table_name=None, column_name=None, offset=0, limit=-9999, encoding='binary', options={}, record_type=None, force_primitive_return_types=True, get_column_major=True)[source]

Returns all the unique values from a particular column (specified by input parameter column_name) of a particular table or view (specified by input parameter table_name). If input parameter column_name is a numeric column, the values will be in output parameter binary_encoded_response. Otherwise, if input parameter column_name is a string column, the values will be in output parameter json_encoded_response. The results can be paged via input parameter offset and input parameter limit.

Columns marked as store-only are unable to be used with this function.

To get the first 10 unique values sorted in descending order, input parameter options would be:

{"limit":"10","sort_order":"descending"}

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

If a result_table name is specified in the input parameter options, the results are stored in a new table with that name and no results are returned in the response. Both the table name and resulting column name must adhere to standard naming conventions; any column expression will need to be aliased. If the source table’s shard key is used as the input parameter column_name, the result table will be sharded; in all other cases, it will be replicated. Sorting will function properly only if the result table is replicated or if there is only one processing node, and should not be relied upon in other cases. Not available if the value of input parameter column_name is an unrestricted-length string.

Parameters

table_name (str) –
Name of an existing table or view on which the operation will be performed, in [schema_name.]table_name format, using standard name resolution rules.

column_name (str) –
Name of the column or an expression containing one or more column names on which the unique function would be applied.

offset (long) –
A positive integer indicating the number of initial results to skip (this can be useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.

limit (long) –
A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results allowed by the server should be returned. The number of records returned will never exceed the server’s own limit, defined by the max_get_records_size parameter in the server configuration. Use output parameter has_more_records to see if more records exist in the result to be fetched, and input parameter offset & input parameter limit to request subsequent pages of results. The default value is -9999.

encoding (str) –
Specifies the encoding for returned records. Allowed values are:

binary – Indicates that the returned records should be binary encoded.

json – Indicates that the returned records should be json encoded.

The default value is ‘binary’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of result_table. If result_table_persist is false (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_result_table_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema as part of result_table and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema which is to contain the table specified in result_table. If the schema provided is non-existent, it will be automatically created.

expression – Optional filter expression to apply to the table.

sort_order – String indicating how the returned values should be sorted. Allowed values are:

ascending

descending

The default value is ‘ascending’.

order_by – Comma-separated list of the columns to be sorted by as well as the sort direction, e.g., ‘timestamp asc, x desc’. The default value is ‘’.

result_table – The name of the table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If present, no results are returned in the response. Not available if input parameter column_name is an unrestricted-length string.

result_table_persist – If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:

true

false

The default value is ‘false’.

result_table_force_replicated – Force the result table to be replicated (ignores any sharding). Must be used in combination with the result_table option. Allowed values are:

true

false

The default value is ‘false’.

result_table_generate_pk – If true then set a primary key for the result table. Must be used in combination with the result_table option. Allowed values are:

true

false

The default value is ‘false’.

ttl – Sets the TTL of the table specified in result_table.

chunk_size – Indicates the number of records per chunk to be used for the result table. Must be used in combination with the result_table option.

chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for the result table. Must be used in combination with the result_table option.

chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for the result table. Must be used in combination with the result_table option.

compression_codec – The default compression codec for the result table’s columns.

view_id – ID of view of which the result table will be a member. The default value is ‘’.

The default value is an empty dict ( {} ).

record_type (RecordType or None) –
The record type expected in the results, or None to determine the appropriate type automatically. If known, providing this may improve performance in binary mode. Not used in JSON mode. The default value is None.

force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which, for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.

get_column_major (bool) –
Indicates if the decoded records will be transposed to be column-major or returned as is (row-major). Default value is True.

Returns

A dict with the following entries–

table_name (str) –
The same table name as was passed in the parameter list.

response_schema_str (str) –
Avro schema of output parameter binary_encoded_response or output parameter json_encoded_response.

has_more_records (bool) –
True if more records exist than were returned, i.e. the response contains only a partial set.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_result_table_name – The fully qualified name of the table (i.e. including the schema) used to store the results.

The default value is an empty dict ( {} ).

records (list of Record) –
A list of Record objects which contain the decoded records.
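
The difference between row-major and column-major results, as selected by get_column_major, can be pictured with plain dictionaries. This sketch only illustrates the transposition; to_column_major is a hypothetical helper, the sample rows are made up, and the exact container types returned by the library are not modeled here.

```python
from collections import OrderedDict


def to_column_major(rows):
    """Transpose a list of row dicts into an ordered dict of column lists."""
    columns = OrderedDict()
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Row-major: one mapping per record (hypothetical sample data).
row_major = [
    {"city": "Austin", "visits": 3},
    {"city": "Boston", "visits": 7},
]

# Column-major: one list of values per column.
column_major = to_column_major(row_major)
print(dict(column_major))
```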

aggregate_unpivot(table_name=None, column_names=None, variable_column_name='', value_column_name='', pivoted_columns=None, encoding='binary', options={})[source]

Rotates column values into row values.

For unpivot details and examples, see Unpivot. For limitations, see Unpivot Limitations.

Unpivot is used to normalize tables that are built for cross tabular reporting purposes. The unpivot operator rotates the column values for all the pivoted columns. A variable column, value column and all columns from the source table except the unpivot columns are projected into the result table. The variable column and value columns in the result table indicate the pivoted column name and values respectively.

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.
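
The rotation described above can be illustrated on in-memory rows. The sketch below is not the server's implementation: unpivot_sketch is a hypothetical helper, the sample table is made up, and the order of the output rows is an assumption. Its arguments mirror the pivoted_columns, variable_column_name and value_column_name parameters of this method.

```python
def unpivot_sketch(rows, pivoted_columns, variable_column_name, value_column_name):
    """Turn each pivoted column of each row into its own output row."""
    result = []
    for row in rows:
        # Columns that are not pivoted are carried over unchanged.
        kept = {k: v for k, v in row.items() if k not in pivoted_columns}
        for column in pivoted_columns:
            out = dict(kept)
            out[variable_column_name] = column   # name of the pivoted column
            out[value_column_name] = row[column]  # its value in this row
            result.append(out)
    return result


# Hypothetical cross-tabular table: one score column per subject.
scores = [
    {"student": "ann", "math": 90, "art": 80},
    {"student": "bob", "math": 70, "art": 95},
]
unpivoted = unpivot_sketch(scores, ["math", "art"], "subject", "score")
for row in unpivoted:
    print(row)
```

Each input row yields one output row per pivoted column, so two students with two subjects produce four rows.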

Parameters

table_name (str) –
Name of the table on which the operation will be performed. Must be an existing table/view, in [schema_name.]table_name format, using standard name resolution rules.

column_names (list of str) –
List of column names or expressions. A wildcard ‘*’ can be used to include all the non-pivoted columns from the source table. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

variable_column_name (str) –
Specifies the variable/parameter column name. The default value is ‘’.

value_column_name (str) –
Specifies the value column name. The default value is ‘’.

pivoted_columns (list of str) –
List of one or more values, typically the column names of the input table. All the columns in the source table must have the same data type. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

encoding (str) –
Specifies the encoding for returned records. Allowed values are:

binary – Indicates that the returned records should be binary encoded.

json – Indicates that the returned records should be json encoded.

The default value is ‘binary’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of result_table. If result_table_persist is false (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_result_table_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema as part of result_table and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema which is to contain the table specified in result_table. If the schema is non-existent, it will be automatically created.

result_table – The name of a table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If present, no results are returned in the response.

result_table_persist – If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:

true

false

The default value is ‘false’.

expression – Filter expression to apply to the table prior to unpivot processing.

order_by – Comma-separated list of the columns to be sorted by; e.g. ‘timestamp asc, x desc’. The columns specified must be present in input table. If any alias is given for any column name, the alias must be used, rather than the original column name. The default value is ‘’.

chunk_size – Indicates the number of records per chunk to be used for the result table. Must be used in combination with the result_table option.

chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for the result table. Must be used in combination with the result_table option.

chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for the result table. Must be used in combination with the result_table option.

compression_codec – The default compression codec for the result table’s columns.

limit – The number of records to keep. The default value is ‘’.

ttl – Sets the TTL of the table specified in result_table.

view_id – ID of the view this result table is part of. The default value is ‘’.

create_indexes – Comma-separated list of columns on which to create indexes on the table specified in result_table. The columns specified must be present in output column names. If any alias is given for any column name, the alias must be used, rather than the original column name.

result_table_force_replicated – Force the result table to be replicated (ignores any sharding). Must be used in combination with the result_table option. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_name (str) –
Typically shows the result table name if one was provided in the request; ignore otherwise.

response_schema_str (str) –
Avro schema of output parameter binary_encoded_response or output parameter json_encoded_response.

binary_encoded_response (bytes) –
Avro binary encoded response.

json_encoded_response (str) –
Avro JSON encoded response.

total_number_of_records (long) –
Total/Filtered number of records.

has_more_records (bool) –
Too many records. Returned a partial set.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_result_table_name – The fully qualified name of the table (i.e. including the schema) used to store the results.

The default value is an empty dict ( {} ).

record_type (RecordType or None) –
A RecordType object using which the user can decode the binary data by using GPUdbRecord.decode_binary_data(). If JSON encoding is used, then None.

aggregate_unpivot_and_decode(table_name=None, column_names=None, variable_column_name='', value_column_name='', pivoted_columns=None, encoding='binary', options={}, record_type=None, force_primitive_return_types=True, get_column_major=True)[source]

Rotates the column values into row values.

For unpivot details and examples, see Unpivot. For limitations, see Unpivot Limitations.

Unpivot is used to normalize tables that are built for cross tabular reporting purposes. The unpivot operator rotates the column values for all the pivoted columns. A variable column, value column and all columns from the source table except the unpivot columns are projected into the result table. The variable column and value columns in the result table indicate the pivoted column name and values respectively.

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.
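
As a rough illustration of the rotation described above, the following plain-Python sketch unpivots a small in-memory table. It does not call the database; the `unpivot_rows` helper and the sample column names are hypothetical and only mirror the roles of the pivoted, variable, and value columns.

```python
def unpivot_rows(rows, pivoted_columns, variable_column_name, value_column_name):
    """Rotate each pivoted column of every row into its own output row."""
    result = []
    for row in rows:
        # Columns that are not pivoted are carried through unchanged.
        kept = {k: v for k, v in row.items() if k not in pivoted_columns}
        for column in pivoted_columns:
            out = dict(kept)
            out[variable_column_name] = column    # name of the pivoted column
            out[value_column_name] = row[column]  # its value in this row
            result.append(out)
    return result


# Hypothetical cross-tab table: one row per student, one column per quarter.
source = [
    {"name": "ann", "q1": 90, "q2": 85},
    {"name": "bob", "q1": 70, "q2": 75},
]

unpivoted = unpivot_rows(source, ["q1", "q2"], "quarter", "score")
for record in unpivoted:
    print(record)
```

Each source row yields one output row per pivoted column, so two rows with two pivoted columns produce four rows.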

Parameters

table_name (str) –
Name of the table on which the operation will be performed. Must be an existing table/view, in [schema_name.]table_name format, using standard name resolution rules.

column_names (list of str) –
List of column names or expressions. A wildcard ‘*’ can be used to include all the non-pivoted columns from the source table. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

variable_column_name (str) –
Specifies the variable/parameter column name. The default value is ‘’.

value_column_name (str) –
Specifies the value column name. The default value is ‘’.

pivoted_columns (list of str) –
List of one or more values, typically the column names of the input table. All the columns in the source table must have the same data type. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

encoding (str) –
Specifies the encoding for returned records. Allowed values are:

binary – Indicates that the returned records should be binary encoded.

json – Indicates that the returned records should be json encoded.

The default value is ‘binary’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of result_table. If result_table_persist is false (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_result_table_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema as part of result_table and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema which is to contain the table specified in result_table. If the schema is non-existent, it will be automatically created.

result_table – The name of a table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If present, no results are returned in the response.

result_table_persist – If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:

true

false

The default value is ‘false’.

expression – Filter expression to apply to the table prior to unpivot processing.

order_by – Comma-separated list of the columns to be sorted by; e.g. ‘timestamp asc, x desc’. The columns specified must be present in input table. If any alias is given for any column name, the alias must be used, rather than the original column name. The default value is ‘’.

chunk_size – Indicates the number of records per chunk to be used for the result table. Must be used in combination with the result_table option.

chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for the result table. Must be used in combination with the result_table option.

chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for the result table. Must be used in combination with the result_table option.

compression_codec – The default compression codec for the result table’s columns.

limit – The number of records to keep. The default value is ‘’.

ttl – Sets the TTL of the table specified in result_table.

view_id – ID of the view this result table is part of. The default value is ‘’.

create_indexes – Comma-separated list of columns on which to create indexes on the table specified in result_table. The columns specified must be present in output column names. If any alias is given for any column name, the alias must be used, rather than the original column name.

result_table_force_replicated – Force the result table to be replicated (ignores any sharding). Must be used in combination with the result_table option. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

record_type (RecordType or None) –
The record type expected in the results, or None to determine the appropriate type automatically. If known, providing this may improve performance in binary mode. Not used in JSON mode. The default value is None.

force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs, used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.

get_column_major (bool) –
Indicates if the decoded records will be transposed to be column-major or returned as is (row-major). Default value is True.

Returns

A dict with the following entries–

table_name (str) –
Typically shows the result table name if one was provided in the request; ignore otherwise.

response_schema_str (str) –
Avro schema of output parameter binary_encoded_response or output parameter json_encoded_response.

total_number_of_records (long) –
Total/Filtered number of records.

has_more_records (bool) –
Too many records. Returned a partial set.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_result_table_name – The fully qualified name of the table (i.e. including the schema) used to store the results.

The default value is an empty dict ( {} ).

records (list of Record) –
A list of Record objects which contain the decoded records.
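
A minimal sketch of how the arguments documented above might be assembled. The table and column names are hypothetical, `build_unpivot_kwargs` is an illustrative helper rather than part of the API, and the actual call is left commented out because it needs a reachable server.

```python
def build_unpivot_kwargs(table_name, pivoted_columns, variable_column_name,
                         value_column_name, column_names="*", options=None):
    """Assemble keyword arguments matching the documented parameter names."""
    # A single element is accepted by the API; lists are used here for clarity.
    if isinstance(column_names, str):
        column_names = [column_names]
    if isinstance(pivoted_columns, str):
        pivoted_columns = [pivoted_columns]
    return {
        "table_name": table_name,
        "column_names": column_names,
        "variable_column_name": variable_column_name,
        "value_column_name": value_column_name,
        "pivoted_columns": pivoted_columns,
        "encoding": "binary",
        # Option values are strings, per the "dict of str to str" type.
        "options": dict(options or {}),
    }


kwargs = build_unpivot_kwargs(
    table_name="example.scores",      # hypothetical table
    pivoted_columns=["q1", "q2"],     # hypothetical columns
    variable_column_name="quarter",
    value_column_name="score",
    options={"order_by": "name asc", "limit": "100"},
)

# With a connected client (see the usage patterns for the class):
# response = gpudb.aggregate_unpivot_and_decode(**kwargs)
# decoded = response["records"]
```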

alter_backup(backup_name=None, action=None, value=None, datasink_name=None, options={})[source]

Alters an existing database backup containing a current snapshot of existing objects.

Parameters

backup_name (str) –
Name of the backup object to be altered

action (str) –
Operation to be applied. Allowed values are:

checksum – Calculate checksum for backup files

ddl_only – Only save the DDL, do not backup table data

max_incremental_backups_to_keep – Maximum number of incremental backups to keep

merge – Merges all backup instances and creates a single full backup

purge – Purges backup instances

value (str) –
Action specific argument.

datasink_name (str) –
Datasink where backup will be stored.

options (dict of str to str) –
Optional parameters. Allowed keys are:

comment – Comments to store with the new backup instance

dry_run – Dry run of backup changes. Allowed values are:

false

true

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

backup_name (str) –
Value of input parameter backup_name.

backup_id (long) –
Backup ID.

total_bytes (long) –
Total size of files affected by alter operation

total_number_of_records (long) –
Total number of records affected by the alter operation

info (dict of str to str) –
Additional information.
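
The sketch below shows one way the arguments for a dry run could be prepared. The backup and data sink names are hypothetical, `build_alter_backup_kwargs` is an illustrative helper, and passing the retention count as the `value` string is an assumption, since the source only describes `value` as an action-specific argument.

```python
ALLOWED_ACTIONS = {
    "checksum",
    "ddl_only",
    "max_incremental_backups_to_keep",
    "merge",
    "purge",
}


def build_alter_backup_kwargs(backup_name, action, value, datasink_name,
                              dry_run=False):
    """Validate the action and assemble arguments using the documented names."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return {
        "backup_name": backup_name,
        "action": action,
        "value": str(value),  # the parameter is typed as str
        "datasink_name": datasink_name,
        "options": {"dry_run": "true" if dry_run else "false"},
    }


backup_kwargs = build_alter_backup_kwargs(
    "nightly_backup",                     # hypothetical backup name
    "max_incremental_backups_to_keep",
    5,                                    # assumed value format for this action
    "backup_sink",                        # hypothetical data sink name
    dry_run=True,
)

# With a connected client:
# response = gpudb.alter_backup(**backup_kwargs)
```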

alter_credential(credential_name=None, credential_updates_map=None, options=None)[source]

Alter the properties of an existing credential.

Parameters

credential_name (str) –
Name of the credential to be altered. Must be an existing credential.

credential_updates_map (dict of str to str) –
Map containing the properties of the credential to be updated. Error if empty. Allowed keys are:

type – New type for the credential. Allowed values are:

aws_access_key

aws_iam_role

azure_ad

azure_oauth

azure_sas

azure_storage_key

docker

gcs_service_account_id

gcs_service_account_keys

hdfs

kafka

identity – New user for the credential

secret – New password for the credential

schema_name – Updates the schema name. If schema_name doesn’t exist, an error will be thrown. If schema_name is empty, then the user’s default schema will be used.

options (dict of str to str) –
Optional parameters.

Returns

A dict with the following entries–

credential_name (str) –
Value of input parameter credential_name.

info (dict of str to str) –
Additional information.
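
As a hedged example, the update map can be checked on the client side before it is sent. The helper name, credential name, and placeholder values below are hypothetical; only the allowed keys and credential types are taken from the parameter description above.

```python
CREDENTIAL_TYPES = {
    "aws_access_key", "aws_iam_role", "azure_ad", "azure_oauth", "azure_sas",
    "azure_storage_key", "docker", "gcs_service_account_id",
    "gcs_service_account_keys", "hdfs", "kafka",
}
ALLOWED_UPDATE_KEYS = {"type", "identity", "secret", "schema_name"}


def build_credential_updates(**updates):
    """Check keys and the credential type before sending the update map."""
    if not updates:
        # The parameter description states that an empty map is an error.
        raise ValueError("credential_updates_map must not be empty")
    unknown = set(updates) - ALLOWED_UPDATE_KEYS
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    if "type" in updates and updates["type"] not in CREDENTIAL_TYPES:
        raise ValueError(f"unknown credential type: {updates['type']}")
    return dict(updates)


credential_updates = build_credential_updates(
    identity="new_user",     # placeholder value
    secret="new_password",   # placeholder; load real secrets securely
)

# With a connected client:
# gpudb.alter_credential(credential_name="example_cred",
#                        credential_updates_map=credential_updates,
#                        options={})
```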

alter_datasink(name=None, datasink_updates_map=None, options=None)[source]

Alters the properties of an existing data sink.

Parameters

name (str) –
Name of the data sink to be altered. Must be an existing data sink.

datasink_updates_map (dict of str to str) –
Map containing the properties of the data sink to be updated. Error if empty. Allowed keys are:

destination – Destination for the output data in format ‘destination_type://path[:port]’.

Supported destination types are ‘azure’, ‘gcs’, ‘hdfs’, ‘http’, ‘https’, ‘jdbc’, ‘kafka’, and ‘s3’.

connection_timeout – Timeout in seconds for connecting to this sink

wait_timeout – Timeout in seconds for waiting for a response from this sink

credential – Name of the credential object to be used in this data sink

s3_bucket_name – Name of the Amazon S3 bucket to use as the data sink

s3_region – Name of the Amazon S3 region where the given bucket is located

s3_verify_ssl – Whether to verify SSL connections. Allowed values are:

true – Connect with SSL verification

false – Connect without verifying the SSL connection; for testing purposes, bypassing TLS errors, self-signed certificates, etc.

The default value is ‘true’.

s3_use_virtual_addressing – Whether to use virtual addressing when referencing the Amazon S3 sink. Allowed values are:

true – The request URI should be specified in virtual-hosted-style format, where the bucket name is part of the domain name in the URL.

false – Use path-style URI for requests.

The default value is ‘true’.

s3_aws_role_arn – Amazon IAM Role ARN which has required S3 permissions that can be assumed for the given S3 IAM user

s3_encryption_customer_algorithm – Customer encryption algorithm used for encrypting data

s3_encryption_customer_key – Customer encryption key to encrypt or decrypt data

s3_encryption_type – Server side encryption type

s3_kms_key_id – KMS key

hdfs_kerberos_keytab – Kerberos keytab file location for the given HDFS user. This may be a KIFS file.

hdfs_delegation_token – Delegation token for the given HDFS user

hdfs_use_kerberos – Use kerberos authentication for the given HDFS cluster. Allowed values are:

true

false

The default value is ‘false’.

azure_storage_account_name – Name of the Azure storage account to use as the data sink; this is valid only if tenant_id is specified

azure_container_name – Name of the Azure storage container to use as the data sink

azure_tenant_id – Active Directory tenant ID (or directory ID)

azure_sas_token – Shared access signature token for Azure storage account to use as the data sink

azure_oauth_token – OAuth token to access given storage container

gcs_bucket_name – Name of the Google Cloud Storage bucket to use as the data sink

gcs_project_id – Name of the Google Cloud project to use as the data sink

gcs_service_account_keys – Google Cloud service account keys to use for authenticating the data sink

jdbc_driver_jar_path – JDBC driver jar file location. This may be a KIFS file.

jdbc_driver_class_name – Name of the JDBC driver class

kafka_url – The publicly-accessible full path URL to the Kafka broker, e.g., ‘http://172.123.45.67:9300’.

kafka_topic_name – Name of the Kafka topic to use for this data sink, if it references a Kafka broker

anonymous – Create an anonymous connection to the storage provider–DEPRECATED: this is now the default. Specify use_managed_credentials for non-anonymous connection. Allowed values are:

true

false

The default value is ‘true’.

use_managed_credentials – When no credentials are supplied, we use anonymous access by default. If this is set, we will use cloud provider user settings. Allowed values are:

true

false

The default value is ‘false’.

use_https – Use https to connect to datasink if true, otherwise use http. Allowed values are:

true

false

The default value is ‘true’.

max_batch_size – Maximum number of records per notification message. The default value is ‘1’.

max_message_size – Maximum size in bytes of each notification message. The default value is ‘1000000’.

json_format – The desired format of JSON encoded notifications message. Allowed values are:

flat – A single record is returned per message

nested – Records are returned as an array per message

The default value is ‘flat’.

skip_validation – Bypass validation of connection to this data sink. Allowed values are:

true

false

The default value is ‘false’.

schema_name – Updates the schema name. If schema_name doesn’t exist, an error will be thrown. If schema_name is empty, then the user’s default schema will be used.

options (dict of str to str) –
Optional parameters.

Returns

A dict with the following entries–

updated_properties_map (dict of str to str) –
Map of values updated

info (dict of str to str) –
Additional information.
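
The destination format and the supported destination types listed above can be combined in a small helper. This is a sketch only: `make_destination`, the host name, the topic, and the data sink name are hypothetical.

```python
SUPPORTED_DESTINATION_TYPES = (
    "azure", "gcs", "hdfs", "http", "https", "jdbc", "kafka", "s3",
)


def make_destination(destination_type, path, port=None):
    """Build a 'destination_type://path[:port]' string."""
    if destination_type not in SUPPORTED_DESTINATION_TYPES:
        raise ValueError(f"unsupported destination type: {destination_type}")
    destination = f"{destination_type}://{path}"
    if port is not None:
        destination += f":{int(port)}"
    return destination


datasink_updates = {
    "destination": make_destination("kafka", "broker.example.com", 9092),
    "kafka_topic_name": "events",   # hypothetical topic
    "max_batch_size": "100",        # values are strings in a str-to-str map
    "json_format": "nested",
}

# With a connected client:
# gpudb.alter_datasink(name="example_sink",
#                      datasink_updates_map=datasink_updates, options={})
```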

alter_datasource(name=None, datasource_updates_map=None, options=None)[source]

Alters the properties of an existing data source.

Parameters

name (str) –
Name of the data source to be altered. Must be an existing data source.

datasource_updates_map (dict of str to str) –
Map containing the properties of the data source to be updated. Error if empty. Allowed keys are:

location – Location of the remote storage in ‘storage_provider_type://[storage_path[:storage_port]]’ format.

Supported storage provider types are ‘azure’, ‘gcs’, ‘hdfs’, ‘jdbc’, ‘kafka’, ‘confluent’, and ‘s3’.

user_name – Name of the remote system user; may be an empty string

password – Password for the remote system user; may be an empty string

skip_validation – Bypass validation of connection to remote source. Allowed values are:

true

false

The default value is ‘false’.

connection_timeout – Timeout in seconds for connecting to this storage provider

wait_timeout – Timeout in seconds for reading from this storage provider

credential – Name of the credential object to be used in data source

s3_bucket_name – Name of the Amazon S3 bucket to use as the data source

s3_region – Name of the Amazon S3 region where the given bucket is located

s3_verify_ssl – Whether to verify SSL connections. Allowed values are:

true – Connect with SSL verification

false – Connect without verifying the SSL connection; for testing purposes, bypassing TLS errors, self-signed certificates, etc.

The default value is ‘true’.

s3_use_virtual_addressing – Whether to use virtual addressing when referencing the Amazon S3 source. Allowed values are:

true – The request URI should be specified in virtual-hosted-style format, where the bucket name is part of the domain name in the URL.

false – Use path-style URI for requests.

The default value is ‘true’.

s3_aws_role_arn – Amazon IAM Role ARN which has required S3 permissions that can be assumed for the given S3 IAM user

s3_encryption_customer_algorithm – Customer encryption algorithm used for encrypting data

s3_encryption_customer_key – Customer encryption key to encrypt or decrypt data

hdfs_kerberos_keytab – Kerberos keytab file location for the given HDFS user. This may be a KIFS file.

hdfs_delegation_token – Delegation token for the given HDFS user

hdfs_use_kerberos – Use kerberos authentication for the given HDFS cluster. Allowed values are:

true

false

The default value is ‘false’.

azure_storage_account_name – Name of the Azure storage account to use as the data source; this is valid only if tenant_id is specified

azure_container_name – Name of the Azure storage container to use as the data source

azure_tenant_id – Active Directory tenant ID (or directory ID)

azure_sas_token – Shared access signature token for Azure storage account to use as the data source

azure_oauth_token – OAuth token to access given storage container

gcs_bucket_name – Name of the Google Cloud Storage bucket to use as the data source

gcs_project_id – Name of the Google Cloud project to use as the data source

gcs_service_account_keys – Google Cloud service account keys to use for authenticating the data source

jdbc_driver_jar_path – JDBC driver jar file location. This may be a KIFS file.

jdbc_driver_class_name – Name of the JDBC driver class

kafka_url – The publicly-accessible full path URL to the Kafka broker, e.g., ‘http://172.123.45.67:9300’.

kafka_topic_name – Name of the Kafka topic to use as the data source

anonymous – Create an anonymous connection to the storage provider–DEPRECATED: this is now the default. Specify use_managed_credentials for non-anonymous connection. Allowed values are:

true

false

The default value is ‘true’.

use_managed_credentials – When no credentials are supplied, we use anonymous access by default. If this is set, we will use cloud provider user settings. Allowed values are:

true

false

The default value is ‘false’.

use_https – Use https to connect to datasource if true, otherwise use http. Allowed values are:

true

false

The default value is ‘true’.

schema_name – Updates the schema name. If schema_name doesn’t exist, an error will be thrown. If schema_name is empty, then the user’s default schema will be used.

schema_registry_connection_retries – Number of Confluent Schema Registry connection retries

schema_registry_connection_timeout – Confluent Schema Registry connection timeout (in seconds)

schema_registry_credential – Confluent Schema Registry credential object name.

schema_registry_location – Location of Confluent Schema Registry in ‘[storage_path[:storage_port]]’ format.

schema_registry_port – Confluent Schema Registry port (optional).

options (dict of str to str) –
Optional parameters.

Returns

A dict with the following entries–

updated_properties_map (dict of str to str) –
Map of values updated

info (dict of str to str) –
Additional information.
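
Because the update map is a dict of str to str, numeric and boolean settings have to be passed as strings. The following sketch shows one possible conversion; `to_updates_map` and the data source name are hypothetical, and the chosen keys are just a sample of those listed above.

```python
def to_updates_map(settings):
    """Convert Python values to the strings expected in a str-to-str map."""
    updates = {}
    for key, value in settings.items():
        if isinstance(value, bool):
            # Booleans are written as lowercase 'true' / 'false'.
            updates[key] = "true" if value else "false"
        else:
            updates[key] = str(value)
    return updates


datasource_updates = to_updates_map({
    "connection_timeout": 30,
    "wait_timeout": 120,
    "s3_verify_ssl": True,
    "use_managed_credentials": False,
})

# With a connected client:
# gpudb.alter_datasource(name="example_source",
#                        datasource_updates_map=datasource_updates,
#                        options={})
```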

alter_directory(directory_name=None, directory_updates_map=None, options={})[source]

Alters an existing directory in KiFS.

Parameters

directory_name (str) –
Name of the directory in KiFS to be altered.

directory_updates_map (dict of str to str) –
Map containing the properties of the directory to be altered. Error if empty. Allowed keys are:

data_limit – The maximum capacity, in bytes, to apply to the directory. Set to -1 to indicate no upper limit.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

directory_name (str) –
Value of input parameter directory_name.

info (dict of str to str) –
Additional information.
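
Since data_limit is expressed in bytes, a small conversion helper can make quotas easier to read. This is an illustrative sketch; `data_limit_value` and the directory name are hypothetical.

```python
def data_limit_value(gib=None):
    """Return the data_limit string: bytes for a GiB quota, or -1 for no limit."""
    if gib is None:
        return "-1"
    return str(int(gib * 1024 ** 3))


directory_updates = {"data_limit": data_limit_value(5)}

# With a connected client:
# gpudb.alter_directory(directory_name="example_dir",
#                       directory_updates_map=directory_updates, options={})
```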

alter_environment(environment_name=None, action=None, value=None, options={})[source]

Alters an existing environment which can be referenced by a user-defined function (UDF).

Parameters

environment_name (str) –
Name of the environment to be altered.

action (str) –
Modification operation to be applied. Allowed values are:

install_package – Install a python package from PyPI, an external data source or KiFS

install_requirements – Install packages from a requirements file

uninstall_package – Uninstall a python package.

uninstall_requirements – Uninstall packages from a requirements file

reset – Uninstalls all packages in the environment and resets it to the original state at time of creation

rebuild – Recreates the environment and re-installs all packages, upgrades the packages if necessary based on dependencies

value (str) –
The value of the modification, depending on input parameter action. For example, if input parameter action is install_package, this would be the python package name.

If input parameter action is install_requirements, this would be the path of a requirements file from which to install packages.

If an external data source is specified in datasource_name, this can be the path to a wheel file or source archive. Alternatively, if installing from a file (wheel or source archive), the value may be a reference to a file in KiFS.

options (dict of str to str) –
Optional parameters. Allowed keys are:

datasource_name – Name of an existing external data source from which packages specified in input parameter value can be loaded

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

environment_name (str) –
Value of input parameter environment_name.

info (dict of str to str) –
Additional information.
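
The pairing of action and value described above might be prepared as follows. The environment name, package name, wheel path, and data source name are hypothetical, and `environment_step` is an illustrative helper, not part of the API.

```python
ENVIRONMENT_ACTIONS = {
    "install_package",
    "install_requirements",
    "uninstall_package",
    "uninstall_requirements",
    "reset",
    "rebuild",
}


def environment_step(environment_name, action, value="", datasource_name=None):
    """Assemble arguments for one alter_environment call."""
    if action not in ENVIRONMENT_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    options = {}
    if datasource_name is not None:
        # Load the package named in `value` from an external data source.
        options["datasource_name"] = datasource_name
    return {
        "environment_name": environment_name,
        "action": action,
        "value": value,
        "options": options,
    }


steps = [
    # Install a package by name.
    environment_step("udf_env", "install_package", "numpy"),
    # Install a wheel file from a hypothetical external data source.
    environment_step("udf_env", "install_package",
                     "wheels/example_pkg-1.0-py3-none-any.whl",
                     datasource_name="pkg_source"),
]

# With a connected client:
# for step in steps:
#     gpudb.alter_environment(**step)
```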

alter_resource_group(name=None, tier_attributes={}, ranking='', adjoining_resource_group='', options={})[source]

Alters the properties of an existing resource group to facilitate resource management.

Parameters

name (str) –
Name of the group to be altered. Must be an existing resource group name or an empty string when used in conjunction with is_default_group.

tier_attributes (dict of str to dicts of str to str) –
Optional map containing tier names and their respective attribute group limits. The only valid attribute limit that can be set is max_memory (in bytes) for the VRAM & RAM tiers.

For instance, to set max VRAM capacity to 1GB per rank per GPU and max RAM capacity to 10GB per rank, use: {‘VRAM’:{‘max_memory’:’1000000000’}, ‘RAM’:{‘max_memory’:’10000000000’}}. Allowed keys are:

max_memory – Maximum amount of memory usable at one time, per rank, per GPU, for the VRAM tier; or maximum amount of memory usable at one time, per rank, for the RAM tier.

The default value is an empty dict ( {} ).

ranking (str) –
If the resource group ranking is to be updated, this indicates the relative ranking among existing resource groups where this resource group will be placed. Allowed values are:

<blank> – Don’t change the ranking

first – Make this resource group the new first one in the ordering

last – Make this resource group the new last one in the ordering

before – Place this resource group before the one specified by input parameter adjoining_resource_group in the ordering

after – Place this resource group after the one specified by input parameter adjoining_resource_group in the ordering

The default value is ‘’.

adjoining_resource_group (str) –
If input parameter ranking is before or after, this field indicates the resource group before or after which the current group will be placed; otherwise, leave blank. The default value is ‘’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

max_cpu_concurrency – Maximum number of simultaneous threads that will be used to execute a request, per rank, for this group. The minimum allowed value is ‘4’.

max_data – Maximum amount of data, per rank, in bytes, that can be used by all database objects within this group. Set to -1 to indicate no upper limit. The minimum allowed value is ‘-1’.

max_scheduling_priority – Maximum priority of a scheduled task for this group. The minimum allowed value is ‘1’. The maximum allowed value is ‘100’.

max_tier_priority – Maximum priority of a tiered object for this group. The minimum allowed value is ‘1’. The maximum allowed value is ‘10’.

is_default_group – If true, this request applies to the global default resource group. It is an error for this field to be true when the input parameter name field is also populated. Allowed values are:

true

false

The default value is ‘false’.

persist – If true and a system-level change was requested, the system configuration will be written to disk upon successful application of this request. This will commit the changes from this request and any additional in-memory modifications. Allowed values are:

true

false

The default value is ‘true’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

info (dict of str to str) –
Additional information.
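
A sketch of how tier_attributes and the ranking arguments could be built, following the 1GB VRAM / 10GB RAM example given above. The helper names and the resource group names are hypothetical, and rejecting an adjoining group when ranking is not before or after is a client-side choice based on the "otherwise, leave blank" guidance.

```python
GB = 1000 ** 3  # the documented example treats 1GB as 1000000000 bytes


def tier_limits(vram_gb=None, ram_gb=None):
    """Build tier_attributes with max_memory limits expressed as byte strings."""
    tiers = {}
    if vram_gb is not None:
        tiers["VRAM"] = {"max_memory": str(vram_gb * GB)}
    if ram_gb is not None:
        tiers["RAM"] = {"max_memory": str(ram_gb * GB)}
    return tiers


def ranking_args(ranking="", adjoining_resource_group=""):
    """Validate the ranking / adjoining_resource_group combination."""
    if ranking not in ("", "first", "last", "before", "after"):
        raise ValueError(f"unsupported ranking: {ranking}")
    needs_neighbor = ranking in ("before", "after")
    if needs_neighbor and not adjoining_resource_group:
        raise ValueError("before/after requires adjoining_resource_group")
    if not needs_neighbor and adjoining_resource_group:
        raise ValueError("adjoining_resource_group must be blank here")
    return {"ranking": ranking,
            "adjoining_resource_group": adjoining_resource_group}


tier_attributes = tier_limits(vram_gb=1, ram_gb=10)
placement = ranking_args("before", "reporting_group")  # hypothetical group

# With a connected client:
# gpudb.alter_resource_group(name="analytics_group",
#                            tier_attributes=tier_attributes,
#                            options={"max_cpu_concurrency": "8"},
#                            **placement)
```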

alter_role(name=None, action=None, value=None, options={})[source]

Alters a Role.

Parameters

name (str) –
Name of the role to be altered. Must be an existing role.

action (str) –
Modification operation to be applied to the role. Allowed values are:

set_comment – Sets the comment for an internal role.

set_resource_group – Sets the resource group for an internal role. The resource group must exist, otherwise, an empty string assigns the role to the default resource group.

value (str) –
The value of the modification, depending on input parameter action.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

info (dict of str to str) –
Additional information.

alter_schema(schema_name=None, action=None, value=None, options={})[source]

Used to change the name of a SQL-style schema, specified in input parameter schema_name.

Parameters

schema_name (str) –
Name of the schema to be altered.

action (str) –
Modification operation to be applied. Allowed values are:

add_comment – Adds a comment describing the schema

rename_schema – Renames a schema to input parameter value. Has the same naming restrictions as tables.

value (str) –
The value of the modification, depending on input parameter action. For rename_schema, this is the new name of the schema.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

schema_name (str) –
Value of input parameter schema_name.

info (dict of str to str) –
Additional information.

alter_system_properties(property_updates_map=None, options={})[source]

The GPUdb.alter_system_properties() endpoint is primarily used to simplify the testing of the system and is not expected to be used during normal execution. Commands are given through the input parameter property_updates_map whose keys are commands and values are strings representing integer values (for example ‘8000’) or boolean values (‘true’ or ‘false’).
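
The string encoding of integer and boolean values described above can be sketched as a small client-side helper. `property_value` is hypothetical; the bounds in `INTEGER_BOUNDS` are copied from the minimum and maximum values documented for those keys in this section, and checking them before the call is only a convenience.

```python
# (minimum, maximum) bounds taken from the documented key descriptions.
INTEGER_BOUNDS = {
    "request_timeout": (0, 1440),
    "max_get_records_size": (0, 1000000),
    "kafka_batch_size": (1, 10000000),
}


def property_value(name, value):
    """Encode a boolean or integer as the string form the map expects."""
    if isinstance(value, bool):
        return "true" if value else "false"
    number = int(value)
    if name in INTEGER_BOUNDS:
        low, high = INTEGER_BOUNDS[name]
        if not low <= number <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    return str(number)


requested = {
    "request_timeout": 30,
    "max_get_records_size": 50000,
    "enable_audit": True,
}
property_updates = {
    name: property_value(name, value) for name, value in requested.items()
}

# With a connected client:
# gpudb.alter_system_properties(property_updates_map=property_updates,
#                               options={})
```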

Parameters

property_updates_map (dict of str to str) –
Map containing the properties of the system to be updated. Error if empty. Allowed keys are:

concurrent_kernel_execution – Enables concurrent kernel execution if the value is true and disables it if the value is false. Allowed values are:

true

false

subtask_concurrency_limit – Sets the maximum number of simultaneous threads allocated to a given request, on each rank. Note that thread allocation may also be limited by resource group limits and/or system load.

chunk_size – Sets the number of records per chunk to be used for all new tables.

chunk_column_max_memory – Sets the target maximum data size for each column in a chunk to be used for all new tables.

chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for all new tables.

execution_mode – Sets the execution mode for kernel executions to the specified string value. Possible values are host, device, default (engine decides), or an integer value that indicates the maximum chunk size to execute on the host.

external_files_directory – Sets the root directory path where external table data files are accessed from. Path must exist on the head node

request_timeout – Number of minutes after which filtering (e.g., GPUdb.filter()) and aggregating (e.g., GPUdb.aggregate_group_by()) queries will timeout. The default value is ‘20’. The minimum allowed value is ‘0’. The maximum allowed value is ‘1440’.

max_get_records_size – The maximum number of records the database will serve for a given data retrieval call. The default value is ‘20000’. The minimum allowed value is ‘0’. The maximum allowed value is ‘1000000’.

enable_audit – Enable or disable auditing.

audit_headers – Enable or disable auditing of request headers.

audit_body – Enable or disable auditing of request bodies.

audit_data – Enable or disable auditing of request data.

audit_response – Enable or disable auditing of response information.

shadow_agg_size – Size of the shadow aggregate chunk cache in bytes. The default value is ‘10000000’. The minimum allowed value is ‘0’. The maximum allowed value is ‘2147483647’.

shadow_filter_size – Size of the shadow filter chunk cache in bytes. The default value is ‘10000000’. The minimum allowed value is ‘0’. The maximum allowed value is ‘2147483647’.

enable_overlapped_equi_join – Enable overlapped-equi-join filter. The default value is ‘true’.

enable_one_step_compound_equi_join – Enable the one_step compound-equi-join algorithm. The default value is ‘true’.

kafka_batch_size – Maximum number of records to be ingested in a single batch. The default value is ‘1000’. The minimum allowed value is ‘1’. The maximum allowed value is ‘10000000’.

kafka_poll_timeout – Maximum time (milliseconds) for each poll to get records from kafka. The default value is ‘0’. The minimum allowed value is ‘0’. The maximum allowed value is ‘1000’.

kafka_wait_time – Maximum time (seconds) to buffer records received from kafka before ingestion. The default value is ‘30’. The minimum allowed value is ‘1’. The maximum allowed value is ‘120’.

egress_parquet_compression – Parquet file compression type. Allowed values are:

uncompressed

snappy

gzip

The default value is ‘snappy’.

egress_single_file_max_size – Max file size (in MB) to allow saving to a single file. May be overridden by target limitations. The default value is ‘10000’. The minimum allowed value is ‘1’. The maximum allowed value is ‘200000’.

max_concurrent_kernels – Sets the max_concurrent_kernels value of the conf. The minimum allowed value is ‘0’. The maximum allowed value is ‘256’.

system_metadata_retention_period – Sets the system_metadata.retention_period value of the conf. The minimum allowed value is ‘1’.

tcs_per_tom – Size of the worker rank data calculation thread pool. This is primarily used for computation-based operations such as aggregates and record retrieval. The minimum allowed value is ‘2’. The maximum allowed value is ‘8192’.

tps_per_tom – Size of the worker rank data processing thread pool. This includes operations such as inserts, updates, & deletes on table data. Multi-head inserts are not affected by this limit. The minimum allowed value is ‘2’. The maximum allowed value is ‘8192’.

background_worker_threads – Size of the worker rank background thread pool. This includes background operations such as watermark evictions and catalog table updates. The minimum allowed value is ‘1’. The maximum allowed value is ‘8192’.

log_debug_job_info – Outputs various job-related information to the rank logs. Used for troubleshooting.

enable_thread_hang_logging – Log a stack trace for any thread that runs longer than a defined threshold. Used for troubleshooting. The default value is ‘true’.

ai_enable_rag – Enable RAG. The default value is ‘false’.

ai_api_provider – AI API provider type

ai_api_url – AI API URL

ai_api_key – AI API key

ai_api_connection_timeout – AI API connection timeout in seconds

ai_api_embeddings_model – AI API model name

telm_persist_query_metrics – Enable or disable persisting of query metrics.

postgres_proxy_idle_connection_timeout – Idle connection timeout in seconds

postgres_proxy_keep_alive – Enable postgres proxy keep alive. The default value is ‘false’.

kifs_directory_data_limit – The default maximum capacity to apply when creating a KiFS directory (bytes). The minimum allowed value is ‘-1’.

compression_codec – The default compression algorithm applied to any column without a column-level or table-level default compression specified at the time it was created

disk_auto_optimize_timeout – Time interval in seconds after which the database will apply optimizations/transformations to persisted data, such as compression. The minimum allowed value is ‘0’.

ha_consumer_replay_offset – Initializes HA replay from the given timestamp (as milliseconds since unix epoch). The minimum allowed value is ‘-1’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

evict_to_cold – If true and evict_columns is specified, the given objects will be evicted to cold storage (if such a tier exists). Allowed values are:

true

false

persist – If true the system configuration will be written to disk upon successful application of this request. This will commit the changes from this request and any additional in-memory modifications. Allowed values are:

true

false

The default value is ‘true’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

updated_properties_map (dict of str to str) –
Map of the values updated; for speed tests, a map of each measured value to its measurement.

info (dict of str to str) –
Additional information.
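
The documented ranges above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the gpudb package: SYSTEM_PROPERTY_BOUNDS and build_property_updates are hypothetical names, and only a few of the documented keys are covered. The result is a str-to-str map of the kind this parameter expects.

```python
# Hypothetical client-side helper; not part of the gpudb package.
# Bounds are copied from the key descriptions above.
SYSTEM_PROPERTY_BOUNDS = {
    "request_timeout": (0, 1440),
    "max_get_records_size": (0, 1000000),
    "kafka_batch_size": (1, 10000000),
    "kafka_poll_timeout": (0, 1000),
    "kafka_wait_time": (1, 120),
    "max_concurrent_kernels": (0, 256),
    "tcs_per_tom": (2, 8192),
    "tps_per_tom": (2, 8192),
}


def build_property_updates(**updates):
    """Return a str-to-str map, rejecting empty or out-of-range input."""
    if not updates:
        raise ValueError("the property map must not be empty")
    result = {}
    for key, value in updates.items():
        if key in SYSTEM_PROPERTY_BOUNDS:
            low, high = SYSTEM_PROPERTY_BOUNDS[key]
            if not low <= int(value) <= high:
                raise ValueError(f"{key}={value} is outside [{low}, {high}]")
        # Booleans are sent as the lowercase strings 'true' / 'false'
        if isinstance(value, bool):
            value = "true" if value else "false"
        result[key] = str(value)
    return result


updates = build_property_updates(request_timeout=30, enable_audit=True)
```

Validating locally is optional; the server enforces the same limits, so the helper only gives earlier feedback.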

alter_table(table_name=None, action=None, value=None, options={})[source]

Apply various modifications to a table or view. The available modifications include the following:

Manage a table’s columns–a column can be added, removed, or have its type and properties modified, including whether it is dictionary encoded or not.

External tables cannot be modified except for their refresh method.

Create or delete a column, low-cardinality index, chunk skip, geospatial, CAGRA, or HNSW index. This can speed up certain operations when using expressions containing equality or relational operators on indexed columns. This only applies to tables.

Create or delete a foreign key on a particular column.

Manage a range-partitioned or a manual list-partitioned table’s partitions.

Set (or reset) the tier strategy of a table or view.

Refresh and manage the refresh mode of a materialized view or an external table.

Set the time-to-live (TTL). This can be applied to tables or views.

Set the global access mode (i.e. locking) for a table. This setting trumps any role-based access controls that may be in place; e.g., a user with write access to a table marked read-only will not be able to insert records into it. The mode can be set to read-only, write-only, read/write, and no access.

Parameters

table_name (str) –
Table on which the operation will be performed, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table or view.

action (str) –
Modification operation to be applied. Allowed values are:

create_index – Creates a column (attribute) index, low-cardinality index, chunk skip index, geospatial index, CAGRA index, or HNSW index (depending on the specified index_type), on the column name specified in input parameter value. If this column already has the specified index, an error will be returned.

refresh_index – Refreshes an index identified by index_type, on the column name specified in input parameter value. Currently applicable only to CAGRA indices.

delete_index – Deletes a column (attribute) index, low-cardinality index, chunk skip index, geospatial index, CAGRA index, or HNSW index (depending on the specified index_type), on the column name specified in input parameter value. If this column does not have the specified index, an error will be returned.

move_to_collection – [DEPRECATED–please use move_to_schema and use GPUdb.create_schema() to create the schema if non-existent] Moves a table or view into a schema named input parameter value. If the schema provided is non-existent, it will be automatically created.

move_to_schema – Moves a table or view into a schema named input parameter value. If the schema provided is nonexistent, an error will be thrown. If input parameter value is empty, then the table or view will be placed in the user’s default schema.

protected – No longer used. Previously set whether the given input parameter table_name should be protected or not. The input parameter value would have been either ‘true’ or ‘false’.

rename_table – Renames a table or view to input parameter value. Has the same naming restrictions as tables.

ttl – Sets the time-to-live in minutes of the table or view specified in input parameter table_name.

add_comment – Adds the comment specified in input parameter value to the table specified in input parameter table_name. Use column_name to set the comment for a column.

add_column – Adds the column specified in input parameter value to the table specified in input parameter table_name. Use column_type and column_properties in input parameter options to set the column’s type and properties, respectively.

change_column – Changes type and properties of the column specified in input parameter value. Use column_type and column_properties in input parameter options to set the column’s type and properties, respectively. Note that primary key and/or shard key columns cannot be changed. All unchanging column properties must be listed for the change to take place, e.g., to add dictionary encoding to an existing ‘char4’ column, both ‘char4’ and ‘dict’ must be specified in the input parameter options map.

delete_column – Deletes the column specified in input parameter value from the table specified in input parameter table_name.

create_foreign_key – Creates a foreign key specified in input parameter value using the format ‘(source_column_name [, …]) references target_table_name(primary_key_column_name [, …]) [as foreign_key_name]’.

delete_foreign_key – Deletes a foreign key. The input parameter value should be the foreign_key_name specified when creating the key or the complete string used to define it.

add_partition – Adds the partition specified in input parameter value, to either a range-partitioned or manual list-partitioned table.

remove_partition – Removes the partition specified in input parameter value (and relocates all of its data to the default partition) from either a range-partitioned or manual list-partitioned table.

delete_partition – Deletes the partition specified in input parameter value (and all of its data) from either a range-partitioned or manual list-partitioned table.

set_global_access_mode – Sets the global access mode (i.e. locking) for the table specified in input parameter table_name. Specify the access mode in input parameter value. Valid modes are ‘no_access’, ‘read_only’, ‘write_only’ and ‘read_write’.

refresh – For a materialized view, replays all the table creation commands required to create the view. For an external table, reloads all data in the table from its associated source files or data source.

set_refresh_method – For a materialized view, sets the method by which the view is refreshed to the method specified in input parameter value - one of ‘manual’, ‘periodic’, or ‘on_change’. For an external table, sets the method by which the table is refreshed to the method specified in input parameter value - either ‘manual’ or ‘on_start’.

set_refresh_start_time – Sets the time to start periodic refreshes of this materialized view to the datetime string specified in input parameter value with format ‘YYYY-MM-DD HH:MM:SS’. Subsequent refreshes occur at the specified time + N * the refresh period.

set_refresh_stop_time – Sets the time to stop periodic refreshes of this materialized view to the datetime string specified in input parameter value with format ‘YYYY-MM-DD HH:MM:SS’.

set_refresh_period – Sets the time interval in seconds at which to refresh this materialized view to the value specified in input parameter value. Also, sets the refresh method to periodic if not already set.

set_refresh_span – Sets the future time offset (in seconds) at which the view refresh will stop.

set_refresh_execute_as – Sets the user name under which this materialized view is refreshed to the value specified in input parameter value.

remove_text_search_attributes – Removes text search attribute from all columns.

remove_shard_keys – Removes the shard key property from all columns, so that the table will be considered randomly sharded. The data is not moved. The input parameter value is ignored.

set_strategy_definition – Sets the tier strategy for the table and its columns to the one specified in input parameter value, replacing the existing tier strategy in its entirety.

cancel_datasource_subscription – Permanently unsubscribe a data source that is loading continuously as a stream. The data source can be Kafka / S3 / Azure.

pause_datasource_subscription – Temporarily unsubscribe a data source that is loading continuously as a stream. The data source can be Kafka / S3 / Azure.

resume_datasource_subscription – Resubscribe to a paused data source subscription. The data source can be Kafka / S3 / Azure.

change_owner – Change the owner resource group of the table.

set_load_vectors_policy – Set startup data loading scheme for the table; see description of ‘load_vectors_policy’ in GPUdb.create_table() for possible values for input parameter value

set_build_pk_index_policy – Set startup primary key generation scheme for the table; see description of ‘build_pk_index_policy’ in GPUdb.create_table() for possible values for input parameter value

set_build_materialized_view_policy – Set startup rebuilding scheme for the materialized view; see description of ‘build_materialized_view_policy’ in GPUdb.create_materialized_view() for possible values for input parameter value

value (str) –
The value of the modification, depending on input parameter action. For example, if input parameter action is add_column, this would be the column name; while the column’s definition would be covered by the column_type, column_properties, column_default_value, and add_column_expression in input parameter options. If input parameter action is ttl, it would be the number of minutes for the new TTL. If input parameter action is refresh, this field would be blank.

options (dict of str to str) –
Optional parameters. Allowed keys are:

action

column_name

table_name

column_default_value – When adding a column, set a default value for existing records. For nullable columns, the default value will be null, regardless of data type.

column_properties – When adding or changing a column, set the column properties (strings, separated by a comma: data, text_search, char8, int8 etc).

column_type – When adding or changing a column, set the column type (strings, separated by a comma: int, double, string, null etc).

copy_values_from_column – [DEPRECATED–please use add_column_expression instead.]

rename_column – When changing a column, specify new column name.

validate_change_column – When changing a column, validate the change before applying it (or not). Allowed values are:

true – Validate all values. A value too large (or too long) for the new type will prevent any change.

false – When a value is too large or long, it will be truncated.

The default value is ‘true’.

update_last_access_time – Indicates whether the time-to-live (TTL) expiration countdown timer should be reset to the table’s TTL. Allowed values are:

true – Reset the expiration countdown timer to the table’s configured TTL.

false – Don’t reset the timer; expiration countdown will continue from where it is, as if the table had not been accessed.

The default value is ‘true’.

add_column_expression – When adding a column, an optional expression to use for the new column’s values. Any valid expression may be used, including one containing references to existing columns in the same table.

strategy_definition – Optional parameter for specifying the tier strategy for the table and its columns when input parameter action is set_strategy_definition, replacing the existing tier strategy in its entirety.

index_type – Type of index to create, when input parameter action is create_index; to refresh, when input parameter action is refresh_index; or to delete, when input parameter action is delete_index. Allowed values are:

column – Create or delete a column (attribute) index.

low_cardinality – Create a low-cardinality column (attribute) index.

chunk_skip – Create or delete a chunk skip index.

geospatial – Create or delete a geospatial index.

cagra – Create or delete a CAGRA index on a vector column.

hnsw – Create or delete an HNSW index on a vector column.

The default value is ‘column’.

index_options – Options to use when creating an index, in the format “key: value [, key: value [, …]]”. Valid options vary by index type.

The default value is an empty dict ( {} ).
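
As an illustration of how action, value, and options fit together for change_column, here is a minimal sketch. The helper change_column_args and the table and column names are hypothetical; only the argument names and option keys come from the text above. It assumes the base column_type of a ‘char4’ column is ‘string’.

```python
# Hypothetical helper; only the argument names and option keys
# are taken from the alter_table description.
def change_column_args(table_name, column, column_type, column_properties,
                       new_name=None, validate=True):
    """Build keyword arguments for a change_column request."""
    options = {
        "column_type": column_type,
        # Properties are passed as one comma-separated string
        "column_properties": ",".join(column_properties),
        "validate_change_column": "true" if validate else "false",
    }
    if new_name is not None:
        options["rename_column"] = new_name
    return {
        "table_name": table_name,
        "action": "change_column",
        "value": column,
        "options": options,
    }


# Add dictionary encoding to an existing char4 column: the unchanged
# 'char4' property is listed alongside the new 'dict' property.
args = change_column_args("example_schema.orders", "region",
                          "string", ["char4", "dict"])
# With a connected client this would be sent as: db.alter_table(**args)
```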

Returns

A dict with the following entries–

table_name (str) –
Table on which the operation was performed.

action (str) –
Modification operation that was performed.

value (str) –
The value of the modification that was performed.

type_id (str) –
The type_id (when changing a table, a new type may be created).

type_definition (str) –
The type_definition (when changing a table, a new type may be created).

properties (dict of str to lists of str) –
The type properties (when changing a table, a new type may be created).

label (str) –
The type label (when changing a table, a new type may be created).

info (dict of str to str) –
Additional information.
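
Two of the actions above expect value in a fixed string format: create_foreign_key and set_refresh_start_time. The sketch below shows one way to produce those strings. Both helper functions and all table, column, and key names are hypothetical.

```python
from datetime import datetime


def foreign_key_value(source_columns, target_table, target_columns, name=None):
    """Format the value string for the create_foreign_key action."""
    value = "({}) references {}({})".format(
        ", ".join(source_columns), target_table, ", ".join(target_columns)
    )
    if name:
        # The 'as foreign_key_name' suffix is optional
        value += " as " + name
    return value


def refresh_start_value(when):
    """Format a datetime as 'YYYY-MM-DD HH:MM:SS' for set_refresh_start_time."""
    return when.strftime("%Y-%m-%d %H:%M:%S")


fk = foreign_key_value(["customer_id"], "example_schema.customers",
                       ["id"], "fk_customer")
start = refresh_start_value(datetime(2030, 1, 15, 8, 30, 0))
```

Naming the foreign key makes a later delete_foreign_key call simpler, since the name can be passed instead of the complete definition string.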

alter_table_columns(table_name=None, column_alterations=None, options=None)[source]

Apply various modifications to columns in a table or view. The available modifications include the following:

Create or delete an index on a particular column. This can speed up certain operations when using expressions containing equality or relational operators on indexed columns. This only applies to tables.

Manage a table’s columns–a column can be added, removed, or have its type and properties modified, including whether it is dictionary encoded or not.

Parameters

table_name (str) –
Table on which the operation will be performed. Must be an existing table or view, in [schema_name.]table_name format, using standard name resolution rules.

column_alterations (list of dicts of str to str) –
List of alter table add/delete/change column requests, all for the same table. Each request is a map that includes ‘column_name’, ‘action’, and the options specific to the action. The options are the same as in alter table requests, but they are placed in the same map as the column name and the action. For example: [{'column_name':'col_1','action':'change_column','rename_column':'col_2'},{'column_name':'col_1','action':'add_column','type':'int','default_value':'1'}]. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters.

Returns

A dict with the following entries–

table_name (str) –
Table on which the operation was performed.

type_id (str) –
The type_id (when changing a table, a new type may be created).

type_definition (str) –
The type_definition (when changing a table, a new type may be created).

properties (dict of str to lists of str) –
The type properties (when changing a table, a new type may be created).

label (str) –
The type label (when changing a table, a new type may be created).

column_alterations (list of dicts of str to str) –
List of alter table add/delete/change column requests, all for the same table. Each request is a map that includes ‘column_name’, ‘action’, and the options specific to the action. The options are the same as in alter table requests, but they are placed in the same map as the column name and the action. For example: [{'column_name':'col_1','action':'change_column','rename_column':'col_2'},{'column_name':'col_1','action':'add_column','type':'int','default_value':'1'}]

info (dict of str to str) –
Additional information.
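
The shape of column_alterations can be sketched as follows. The helper as_alteration_list is hypothetical; it mirrors the single-element promotion described above and checks that each map carries the two required keys. The two entries reuse the example from the parameter description.

```python
def as_alteration_list(alterations):
    """Accept one alteration dict or a list of them; always return a list."""
    if isinstance(alterations, dict):
        alterations = [alterations]
    for entry in alterations:
        missing = {"column_name", "action"} - set(entry)
        if missing:
            raise ValueError("alteration is missing: " + ", ".join(sorted(missing)))
    return list(alterations)


column_alterations = as_alteration_list([
    {"column_name": "col_1", "action": "change_column", "rename_column": "col_2"},
    {"column_name": "col_1", "action": "add_column", "type": "int",
     "default_value": "1"},
])
# With a connected client, and a hypothetical table name:
# db.alter_table_columns(table_name="example_schema.orders",
#                        column_alterations=column_alterations, options={})
```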

alter_table_metadata(table_names=None, metadata_map=None, options={})[source]

Updates (adds or changes) metadata for tables. The metadata key and values must both be strings. This is an easy way to annotate whole tables rather than single records within tables. Some examples of metadata are owner of the table, table creation timestamp etc.

Parameters

table_names (list of str) –
Names of the tables whose metadata will be updated, in [schema_name.]table_name format, using standard name resolution rules. All specified tables must exist, or an error will be returned. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

metadata_map (dict of str to str) –
A map which contains the metadata of the tables that are to be updated. Note that only one map is provided for all the tables; so the change will be applied to every table. If the provided map is empty, then all existing metadata for the table(s) will be cleared.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_names (list of str) –
Value of input parameter table_names.

metadata_map (dict of str to str) –
Value of input parameter metadata_map.

info (dict of str to str) –
Additional information.
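
Because metadata keys and values must both be strings, non-string values need converting before the call. A minimal sketch, with a hypothetical helper build_metadata_map and hypothetical table names:

```python
def build_metadata_map(**metadata):
    """Convert arbitrary values to the str-to-str map the request requires."""
    return {str(key): str(value) for key, value in metadata.items()}


metadata_map = build_metadata_map(owner="analytics_team", created_at=1700000000)

# One map is applied to every listed table; an empty map would clear
# all existing metadata for those tables.
args = {
    "table_names": ["example_schema.orders", "example_schema.returns"],
    "metadata_map": metadata_map,
    "options": {},
}
# With a connected client this would be sent as: db.alter_table_metadata(**args)
```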

alter_table_monitor(topic_id=None, monitor_updates_map=None, options=None)[source]

Alters a table monitor previously created with GPUdb.create_table_monitor().

Parameters

topic_id (str) –
The topic ID returned by GPUdb.create_table_monitor().

monitor_updates_map (dict of str to str) –
Map containing the properties of the table monitor to be updated. Error if empty. Allowed keys are:

schema_name – Updates the schema name. If schema_name doesn’t exist, an error will be thrown. If schema_name is empty, then the user’s default schema will be used.

options (dict of str to str) –
Optional parameters.

Returns

A dict with the following entries–

topic_id (str) –
Value of input parameter topic_id.

info (dict of str to str) –
Additional information.

alter_tier(name=None, options={})[source]

Alters properties of an existing tier to facilitate resource management.

To disable watermark-based eviction, set both high_watermark and low_watermark to 100.

Parameters

name (str) –
Name of the tier to be altered. Must be an existing tier group name: vram, ram, disk[n], persist, cold[n].

options (dict of str to str) –
Optional parameters. Allowed keys are:

capacity – Maximum size in bytes this tier may hold at once, per rank.

high_watermark – Threshold of usage of this tier’s resource that once exceeded, will trigger watermark-based eviction from this tier. The minimum allowed value is ‘0’. The maximum allowed value is ‘100’.

low_watermark – Threshold of resource usage that once fallen below after crossing the high_watermark, will cease watermark-based eviction from this tier. The minimum allowed value is ‘0’. The maximum allowed value is ‘100’.

wait_timeout – Timeout in seconds for reading from or writing to this resource. Applies to cold storage tiers only.

persist – If true the system configuration will be written to disk upon successful application of this request. This will commit the changes from this request and any additional in-memory modifications. Allowed values are:

true

false

The default value is ‘true’.

rank – Apply the requested change only to a specific rank. The minimum allowed value is ‘0’. The maximum allowed value is ‘10000’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

info (dict of str to str) –
Additional information.
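
The watermark options can be assembled as in the sketch below, which also shows the documented way to disable watermark-based eviction. The helper watermark_options is hypothetical and only enforces the 0 to 100 range stated above.

```python
def watermark_options(high, low, persist=True):
    """Build alter_tier options for the two watermark thresholds."""
    for label, value in (("high_watermark", high), ("low_watermark", low)):
        if not 0 <= value <= 100:
            raise ValueError(f"{label} must be between 0 and 100")
    return {
        "high_watermark": str(high),
        "low_watermark": str(low),
        "persist": "true" if persist else "false",
    }


# Setting both thresholds to 100 disables watermark-based eviction.
disable_eviction = watermark_options(100, 100)
# With a connected client: db.alter_tier(name="ram", options=disable_eviction)
```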

alter_user(name=None, action=None, value=None, options={})[source]

Alters a user.

Parameters

name (str) –
Name of the user to be altered. Must be an existing user.

action (str) –
Modification operation to be applied to the user. Allowed values are:

set_activated – Sets whether the user is allowed to log in.

true – User may log in

false – User may not log in

set_comment – Sets the comment for an internal user.

set_default_schema – Set the default_schema for an internal user. An empty string means the user will have no default schema.

set_password – Sets the password of the user. The user must be an internal user.

set_resource_group – Sets the resource group for an internal user. The resource group must exist; an empty string assigns the user to the default resource group.

value (str) –
The value of the modification, depending on input parameter action.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

info (dict of str to str) –
Additional information.
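
Since value is a string, a boolean setting such as set_activated is passed as ‘true’ or ‘false’. A minimal sketch, with a hypothetical helper and user name:

```python
def set_activated_args(user_name, allowed):
    """Build alter_user arguments that enable or disable a user's login."""
    return {
        "name": user_name,
        "action": "set_activated",
        "value": "true" if allowed else "false",
        "options": {},
    }


args = set_activated_args("report_reader", False)
# With a connected client this would be sent as: db.alter_user(**args)
```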

alter_video(path=None, options={})[source]

Alters a video.

Parameters

path (str) –
Fully-qualified KiFS path to the video to be altered.

options (dict of str to str) –
Optional parameters. Allowed keys are:

ttl – Sets the TTL of the video.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

path (str) –
Fully-qualified KiFS path to the video file.

info (dict of str to str) –
Additional information.

alter_wal(table_names=None, options={})[source]

Alters table write-ahead log (WAL) settings. Returns information about the requested table WAL modifications.

Parameters

table_names (list of str) –
List of tables to modify. An asterisk changes the system settings. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

max_segment_size – Maximum size of an individual segment file

segment_count – Approximate number of segment files to split the WAL across. Must be at least two.

sync_policy – Policy for synchronizing WAL entries to disk. Allowed values are:

none – Disables the WAL

background – WAL entries are periodically written instead of immediately after each operation

flush – Protects entries in the event of a database crash

fsync – Protects entries in the event of an OS crash

flush_frequency – Specifies how frequently WAL entries are written with background sync. This is a global setting and can only be used with the system specifier ‘*’ in input parameter table_names.

checksum – If true each entry will be checked against a protective checksum. Allowed values are:

true

false

The default value is ‘true’.

override_non_default – If true tables with unique WAL settings will be overridden when applying a system level change. Allowed values are:

true

false

The default value is ‘false’.

restore_system_settings – If true tables with unique WAL settings will be reverted to the current global settings. Cannot be used in conjunction with any other option. Allowed values are:

true

false

The default value is ‘false’.

persist – If true and a system-level change was requested, the system configuration will be written to disk upon successful application of this request. This will commit the changes from this request and any additional in-memory modifications. Allowed values are:

true

false

The default value is ‘true’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

info (dict of str to str) –
Additional information.
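
Several of the options above carry constraints that can be checked before sending a request. The following sketch is a hypothetical helper, not part of the gpudb package, and the table name is illustrative.

```python
def build_wal_request(table_names, **options):
    """Check the documented option constraints and return request arguments."""
    if isinstance(table_names, str):
        table_names = [table_names]  # a single name is promoted to a list
    table_names = list(table_names)
    options = {key: str(value) for key, value in options.items()}

    if options.get("restore_system_settings") == "true" and len(options) > 1:
        raise ValueError("restore_system_settings cannot be combined "
                         "with other options")
    if "flush_frequency" in options and table_names != ["*"]:
        raise ValueError("flush_frequency is only valid with the '*' specifier")
    if "segment_count" in options and int(options["segment_count"]) < 2:
        raise ValueError("segment_count must be at least two")
    return {"table_names": table_names, "options": options}


request = build_wal_request("example_schema.orders",
                            sync_policy="fsync", segment_count=4)
# With a connected client this would be sent as: db.alter_wal(**request)
```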

append_records(table_name=None, source_table_name=None, field_map=None, options={})[source]

Append (or insert) all records from a source table (specified by input parameter source_table_name) to a particular target table (specified by input parameter table_name). The field map (specified by input parameter field_map) holds the user specified map of target table column names with their mapped source column names.

Parameters

table_name (str) –
The table name for the records to be appended, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table.

source_table_name (str) –
The source table name to get records from, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table name.

field_map (dict of str to str) –
Contains the mapping of column names from the target table (specified by input parameter table_name) as the keys, and corresponding column names or expressions (e.g., ‘col_name+1’) from the source table (specified by input parameter source_table_name) as the values. The columns must exist in the source and target tables, and their types must match. For details on using expressions, see Expressions.

options (dict of str to str) –
Optional parameters. Allowed keys are:

offset – A positive integer indicating the number of initial results to skip from input parameter source_table_name. The minimum allowed value is 0. The maximum allowed value is MAX_INT. The default value is ‘0’.

limit – A positive integer indicating the maximum number of results to be returned from input parameter source_table_name, or END_OF_SET (-9999) to indicate that the maximum number of results should be returned. The default value is ‘-9999’.

expression – Optional filter expression to apply to the input parameter source_table_name. The default value is ‘’.

order_by – Comma-separated list of the columns to be sorted by from source table (specified by input parameter source_table_name), e.g., ‘timestamp asc, x desc’. The order_by columns do not have to be present in input parameter field_map. The default value is ‘’.

update_on_existing_pk – Specifies the record collision policy for inserting source table records (specified by input parameter source_table_name) into a target table (specified by input parameter table_name) with a primary key. If set to true, any existing table record with primary key values that match those of a source table record being inserted will be replaced by that new record (the new data will be “upserted”). If set to false, any existing table record with primary key values that match those of a source table record being inserted will remain unchanged, while the source record will be rejected and an error handled as determined by ignore_existing_pk. If the specified table does not have a primary key, then this option has no effect. Allowed values are:

true – Upsert new records when primary keys match existing records

false – Reject new records when primary keys match existing records

The default value is ‘false’.

ignore_existing_pk – Specifies the record collision error-suppression policy for inserting source table records (specified by input parameter source_table_name) into a target table (specified by input parameter table_name) with a primary key, only used when not in upsert mode (upsert mode is disabled when update_on_existing_pk is false). If set to true, any source table record being inserted that is rejected for having primary key values that match those of an existing target table record will be ignored with no error generated. If false, the rejection of any source table record for having primary key values matching an existing target table record will result in an error being raised. If the specified table does not have a primary key or if upsert mode is in effect (update_on_existing_pk is true), then this option has no effect. Allowed values are:

true – Ignore source table records whose primary key values collide with those of target table records

false – Raise an error for any source table record whose primary key values collide with those of a target table record

The default value is ‘false’.

pk_conflict_predicate_higher – The record with higher value for the column resolves the primary-key insert conflict. The default value is ‘’.

pk_conflict_predicate_lower – The record with lower value for the column resolves the primary-key insert conflict. The default value is ‘’.

truncate_strings – If set to true, it allows inserting longer strings into smaller charN string columns by truncating the longer strings to fit. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_name (str)

info (dict of str to str) –
Additional information. The default value is an empty dict ( {} ).
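
The interaction between update_on_existing_pk and ignore_existing_pk can be summarized in a few lines. The sketch below is illustrative only: collision_outcome is a hypothetical helper, the table names, column names, and expressions are invented, and the outcome applies only when the target table has a primary key.

```python
def collision_outcome(options):
    """Describe what happens to a source record whose primary key
    already exists in the target table."""
    if options.get("update_on_existing_pk", "false") == "true":
        # Upsert mode: ignore_existing_pk has no effect here
        return "replace existing record"
    if options.get("ignore_existing_pk", "false") == "true":
        return "skip source record silently"
    return "reject source record with an error"


request = {
    "table_name": "example_schema.orders_archive",
    "source_table_name": "example_schema.orders",
    # Keys are target columns; values are source columns or expressions
    "field_map": {"order_id": "id", "total_cents": "total * 100"},
    "options": {"expression": "status = 'closed'",
                "update_on_existing_pk": "true"},
}
outcome = collision_outcome(request["options"])
# With a connected client this would be sent as: db.append_records(**request)
```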

clear_statistics(table_name='', column_name='', options={})[source]

Clears statistics (cardinality, mean value, etc.) for a column in a specified table.

Parameters

table_name (str) –
Name of a table, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table. The default value is ‘’.

column_name (str) –
Name of the column in input parameter table_name for which to clear statistics. The column must be from an existing table. An empty string clears statistics for all columns in the table. The default value is ‘’.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name.

column_name (str) –
Value of input parameter column_name.

info (dict of str to str) –
Additional information.

clear_table(table_name='', authorization='', options={})[source]

Clears (drops) one or all tables in the database cluster. The operation is synchronous, meaning that the table will be cleared before the function returns. The response payload returns the status of the operation along with the name of the table that was cleared.

Parameters

table_name (str) –
Name of the table to be cleared, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table. Empty string clears all available tables, though this behavior is prevented by default via gpudb.conf parameter ‘disable_clear_all’. The default value is ‘’.

authorization (str) –
No longer used. User can pass an empty string. The default value is ‘’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

no_error_if_not_exists – If true and the table specified in input parameter table_name does not exist, no error is returned. If false and the table specified in input parameter table_name does not exist, an error is returned. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name for a given table, or ‘ALL CLEARED’ in case of clearing all tables.

info (dict of str to str) –
Additional information.
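
A minimal sketch of an idempotent drop built on the parameters above. It assumes `db` is a connected GPUdb instance (any object exposing `clear_table` with the documented signature will do); the helper name `drop_table_if_exists` is hypothetical. Option values are passed as strings because options is a dict of str to str.

```python
def drop_table_if_exists(db, table_name):
    """Drop table_name without raising when the table is absent."""
    if not table_name:
        # An empty name asks the server to clear all tables; refuse here.
        raise ValueError("table_name must be a non-empty string")
    return db.clear_table(
        table_name=table_name,
        options={"no_error_if_not_exists": "true"},
    )
```

Rejecting the empty string on the client side keeps the helper from ever requesting a clear of all tables, regardless of how ‘disable_clear_all’ is configured on the server.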

clear_table_monitor(topic_id=None, options={})[source]

Deactivates a table monitor previously created with GPUdb.create_table_monitor().

Parameters

topic_id (str) –
The topic ID returned by GPUdb.create_table_monitor().

options (dict of str to str) –
Optional parameters. Allowed keys are:

keep_autogenerated_sink – If true, the auto-generated datasink associated with this monitor, if there is one, will be retained for further use. If false, then the auto-generated sink will be dropped if there are no other monitors referencing it. Allowed values are:

true

false

The default value is ‘false’.

clear_all_references – If true, all references that share the same input parameter topic_id will be cleared. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

topic_id (str) –
Value of input parameter topic_id.

info (dict of str to str) –
Additional information.
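
The two option keys above can be wrapped as keyword flags. This is a sketch only: `db` is assumed to be a connected GPUdb instance, `topic_id` is assumed to come from an earlier GPUdb.create_table_monitor() call, and the helper name is hypothetical.

```python
def stop_table_monitor(db, topic_id, keep_sink=False, all_references=False):
    """Deactivate a monitor; each flag maps onto one documented option key."""
    options = {
        "keep_autogenerated_sink": "true" if keep_sink else "false",
        "clear_all_references": "true" if all_references else "false",
    }
    return db.clear_table_monitor(topic_id=topic_id, options=options)
```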

clear_tables(table_names=[], options={})[source]

Clears (drops) tables in the database cluster. The operation is synchronous, meaning that the tables will be cleared before the function returns. The response payload returns the status of the operation for each table requested.

Parameters

table_names (list of str) –
Names of the tables to be cleared, in [schema_name.]table_name format, using standard name resolution rules. Must be existing tables. Empty list clears all available tables, though this behavior is prevented by default via gpudb.conf parameter ‘disable_clear_all’. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

no_error_if_not_exists – If true and a table specified in input parameter table_names does not exist, no error is returned. If false and a table specified in input parameter table_names does not exist, an error is returned. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

tables (dict of str to str) –
For each table in input parameter table_names, any error from the clear operation, or an empty string if successful.

info (dict of str to str) –
Additional information.
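
Because the tables entry maps each requested table to an error string (empty on success), failures can be filtered out of the response. The sketch below assumes `db` is a connected GPUdb instance; `failed_clears` and `clear_many` are hypothetical helper names.

```python
def failed_clears(response):
    """Return {table_name: error} for every table whose clear failed."""
    return {name: err for name, err in response["tables"].items() if err}


def clear_many(db, table_names):
    """Clear several tables and report only the ones that failed."""
    names = list(table_names)
    if not names:
        # An empty list asks the server to clear all tables; refuse here.
        raise ValueError("table_names must not be empty")
    response = db.clear_tables(
        table_names=names,
        options={"no_error_if_not_exists": "true"},
    )
    return failed_clears(response)
```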

clear_trigger(trigger_id=None, options={})[source]

Clears or cancels the trigger identified by the specified handle. The output returns the handle of the cleared trigger and indicates the success or failure of the trigger deactivation.

Parameters

trigger_id (str) –
ID for the trigger to be deactivated.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

trigger_id (str) –
Value of input parameter trigger_id.

info (dict of str to str) –
Additional information.

collect_statistics(table_name=None, column_names=None, options={})[source]

Collects statistics for one or more columns in a specified table.

Parameters

table_name (str) –
Name of a table, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table.

column_names (list of str) –
List of one or more column names in input parameter table_name for which to collect statistics (cardinality, mean value, etc.). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name.

column_names (list of str) –
Value of input parameter column_names.

info (dict of str to str) –
Additional information.
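
One possible workflow, shown as a sketch rather than a required sequence, is to clear stale statistics with GPUdb.clear_statistics() and then collect them again. `db` is assumed to be a connected GPUdb instance and `refresh_statistics` is a hypothetical helper name.

```python
def refresh_statistics(db, table_name, column_names):
    """Clear, then re-collect, statistics for the given columns."""
    if isinstance(column_names, str):
        # Mirror the documented single-element-to-list promotion.
        column_names = [column_names]
    columns = list(column_names)
    for column in columns:
        db.clear_statistics(table_name=table_name, column_name=column, options={})
    return db.collect_statistics(
        table_name=table_name, column_names=columns, options={}
    )
```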

create_backup(backup_name=None, backup_type=None, backup_objects_map=None, datasink_name=None, options={})[source]

Creates a database backup containing a current snapshot of existing objects.

Parameters

backup_name (str) –
Name for this backup object. If the backup object already exists, only an incremental or differential backup can be made, unless recreate is specified.

backup_type (str) –
Type of backup to create. Allowed values are:

incremental

differential

full

backup_objects_map (dict of str to str) –
Map of objects to be captured in the backup. Error if empty and creating full backup. Error if non-empty when creating an incremental or differential backup. Allowed keys are:

all – All object types in a schema (excludes permissions, system configuration, host secret key, KiFS directories and user defined functions)

table – Database Table

credential – Credential

context – Context

datasink – Data Sink

datasource – Data Source

stored_procedure – SQL Procedure

monitor – Table Monitor (Stream)

user – User (internal and external) and associated permissions

role – Role, role members (roles or users, recursively) and associated permissions

configuration – If true, backup the database configuration file. Allowed values are:

false

true

The default value is ‘false’.

datasink_name (str) –
Datasink where backup will be stored.

options (dict of str to str) –
Optional parameters. Allowed keys are:

comment – Comments to store with this backup

checksum – Calculate checksum for backup files. Allowed values are:

false

true

The default value is ‘true’.

ddl_only – Only save the DDL, do not backup table data. Allowed values are:

true

false

The default value is ‘false’.

max_incremental_backups_to_keep – Maximum number of incremental backups to keep. The default value is ‘-1’.

delete_intermediate_backups – When the backup type is differential, delete any intermediate incremental or differential backups. This overrides max_incremental_backups_to_keep. Allowed values are:

false

true

The default value is ‘false’.

recreate – Replace the existing backup object with a new full backup if it already exists. Allowed values are:

false

true

The default value is ‘false’.

dry_run – Dry run of backup. Allowed values are:

false

true

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

backup_name (str) –
Value of input parameter backup_name.

backup_id (long) –
Backup ID.

copied_bytes (long) –
Total size of all files copied for this snapshot

copied_files (long) –
Total number of files copied for this snapshot

copied_records (long) –
Total number of records in all files copied for this snapshot

total_number_of_records (long) –
Total number of records that can be restored from this snapshot

info (dict of str to str) –
Additional information.
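
The rules on backup_objects_map (required for a full backup, rejected for incremental and differential ones) can be checked before the request is sent. The sketch below only builds keyword arguments; `backup_request` is a hypothetical helper, and the value format of backup_objects_map is not specified in this section, so any value passed in is the caller's assumption.

```python
BACKUP_TYPES = ("incremental", "differential", "full")


def backup_request(backup_name, backup_type, datasink_name,
                   objects=None, dry_run=False):
    """Build keyword arguments for create_backup, applying documented rules."""
    if backup_type not in BACKUP_TYPES:
        raise ValueError(f"unknown backup_type: {backup_type!r}")
    objects = dict(objects or {})
    if backup_type == "full" and not objects:
        raise ValueError("a full backup needs a non-empty backup_objects_map")
    if backup_type != "full" and objects:
        raise ValueError(f"{backup_type} backups need an empty backup_objects_map")
    return {
        "backup_name": backup_name,
        "backup_type": backup_type,
        "backup_objects_map": objects,
        "datasink_name": datasink_name,
        "options": {"dry_run": "true" if dry_run else "false"},
    }
```

Given a connected GPUdb instance `db`, the result can be unpacked into the call as `db.create_backup(**backup_request(...))`.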

create_credential(credential_name=None, type=None, identity=None, secret=None, options={})[source]

Create a new credential.

Parameters

credential_name (str) –
Name of the credential to be created. Must contain only letters, digits, and underscores, and cannot begin with a digit. Must not match an existing credential name.

type (str) –
Type of the credential to be created. Allowed values are:

aws_access_key

aws_iam_role

azure_ad

azure_oauth

azure_sas

azure_storage_key

confluent

docker

gcs_service_account_id

gcs_service_account_keys

hdfs

jdbc

kafka

nvidia_api_key

openai_api_key

identity (str) –
User of the credential to be created.

secret (str) –
Password of the credential to be created.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

credential_name (str) –
Value of input parameter credential_name.

info (dict of str to str) –
Additional information.
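
The naming rule and the list of allowed types can be validated on the client before calling the server. This is a sketch under two assumptions: "letters" is read as ASCII letters only, and `db` is a connected GPUdb instance. The helper name `create_named_credential` is hypothetical.

```python
import re

# Allowed values for the type parameter, as listed above.
CREDENTIAL_TYPES = frozenset({
    "aws_access_key", "aws_iam_role", "azure_ad", "azure_oauth", "azure_sas",
    "azure_storage_key", "confluent", "docker", "gcs_service_account_id",
    "gcs_service_account_keys", "hdfs", "jdbc", "kafka", "nvidia_api_key",
    "openai_api_key",
})

# Letters, digits and underscores only; must not begin with a digit.
# Assumption: only ASCII letters are accepted.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_named_credential(db, name, cred_type, identity, secret):
    """Validate the name and type locally, then create the credential."""
    if _NAME_RE.fullmatch(name) is None:
        raise ValueError(f"invalid credential name: {name!r}")
    if cred_type not in CREDENTIAL_TYPES:
        raise ValueError(f"unknown credential type: {cred_type!r}")
    return db.create_credential(
        credential_name=name, type=cred_type,
        identity=identity, secret=secret, options={},
    )
```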

create_datasink(name=None, destination=None, options={})[source]

Creates a data sink, which contains the destination information for a data sink that is external to the database.

Parameters

name (str) –
Name of the data sink to be created.

destination (str) –
Destination for the output data in format ‘storage_provider_type://path[:port]’.

Supported storage provider types are ‘azure’, ‘gcs’, ‘hdfs’, ‘http’, ‘https’, ‘jdbc’, ‘kafka’, and ‘s3’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

connection_timeout – Timeout in seconds for connecting to this data sink

wait_timeout – Timeout in seconds for waiting for a response from this data sink

credential – Name of the credential object to be used in this data sink

s3_bucket_name – Name of the Amazon S3 bucket to use as the data sink

s3_region – Name of the Amazon S3 region where the given bucket is located

s3_verify_ssl – Whether to verify SSL connections. Allowed values are:

true – Connect with SSL verification

false – Connect without verifying the SSL connection; for testing purposes, bypassing TLS errors, self-signed certificates, etc.

The default value is ‘true’.

s3_use_virtual_addressing – Whether to use virtual addressing when referencing the Amazon S3 sink. Allowed values are:

true – The requests URI should be specified in virtual-hosted-style format where the bucket name is part of the domain name in the URL.

false – Use path-style URI for requests.

The default value is ‘true’.

s3_aws_role_arn – Amazon IAM Role ARN which has required S3 permissions that can be assumed for the given S3 IAM user

s3_encryption_customer_algorithm – Customer encryption algorithm used for encrypting data

s3_encryption_customer_key – Customer encryption key to encrypt or decrypt data

s3_encryption_type – Server side encryption type

s3_kms_key_id – KMS key

hdfs_kerberos_keytab – Kerberos keytab file location for the given HDFS user. This may be a KIFS file.

hdfs_delegation_token – Delegation token for the given HDFS user

hdfs_use_kerberos – Use kerberos authentication for the given HDFS cluster. Allowed values are:

true

false

The default value is ‘false’.

azure_storage_account_name – Name of the Azure storage account to use as the data sink; this is valid only if tenant_id is specified

azure_container_name – Name of the Azure storage container to use as the data sink

azure_tenant_id – Active Directory tenant ID (or directory ID)

azure_sas_token – Shared access signature token for Azure storage account to use as the data sink

azure_oauth_token – OAuth token to access given storage container

gcs_bucket_name – Name of the Google Cloud Storage bucket to use as the data sink

gcs_project_id – Name of the Google Cloud project to use as the data sink

gcs_service_account_keys – Google Cloud service account keys to use for authenticating the data sink

jdbc_driver_jar_path – JDBC driver jar file location

jdbc_driver_class_name – Name of the JDBC driver class

kafka_topic_name – Name of the Kafka topic to publish to if input parameter destination is a Kafka broker

max_batch_size – Maximum number of records per notification message. The default value is ‘1’.

max_message_size – Maximum size in bytes of each notification message. The default value is ‘1000000’.

json_format – The desired format of JSON encoded notifications message. Allowed values are:

flat – A single record is returned per message

nested – Records are returned as an array per message

The default value is ‘flat’.

use_managed_credentials – When no credentials are supplied, we use anonymous access by default. If this is set, we will use cloud provider user settings. Allowed values are:

true

false

The default value is ‘false’.

use_https – Use https to connect to datasink if true, otherwise use http. Allowed values are:

true

false

The default value is ‘true’.

skip_validation – Bypass validation of connection to this data sink. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

info (dict of str to str) –
Additional information.
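
As an illustration of the Kafka-related keys above, the sketch below builds an options dict and a destination in the ‘storage_provider_type://path[:port]’ format. The broker address, sink name, and helper names are hypothetical, and `db` is assumed to be a connected GPUdb instance.

```python
def kafka_sink_options(topic, credential=None, max_batch_size=1,
                       json_format="flat"):
    """Build the options dict for a Kafka data sink; all values are strings."""
    if json_format not in ("flat", "nested"):
        raise ValueError(f"json_format must be 'flat' or 'nested': {json_format!r}")
    options = {
        "kafka_topic_name": topic,
        "max_batch_size": str(max_batch_size),
        "json_format": json_format,
    }
    if credential is not None:
        options["credential"] = credential
    return options


def create_kafka_sink(db, name, broker, topic, **kwargs):
    """Create a sink publishing to topic on broker (given as 'host:port')."""
    return db.create_datasink(
        name=name,
        destination=f"kafka://{broker}",
        options=kafka_sink_options(topic, **kwargs),
    )
```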

create_datasource(name=None, location=None, user_name=None, password=None, options={})[source]

Creates a data source, which contains the location and connection information for a data store that is external to the database.

Parameters

name (str) –
Name of the data source to be created.

location (str) –
Location of the remote storage in ‘storage_provider_type://[storage_path[:storage_port]]’ format.

Supported storage provider types are ‘azure’, ‘gcs’, ‘hdfs’, ‘jdbc’, ‘kafka’, ‘confluent’, and ‘s3’.

user_name (str) –
Name of the remote system user; may be an empty string

password (str) –
Password for the remote system user; may be an empty string

options (dict of str to str) –
Optional parameters. Allowed keys are:

skip_validation – Bypass validation of connection to remote source. Allowed values are:

true

false

The default value is ‘false’.

connection_timeout – Timeout in seconds for connecting to this storage provider

wait_timeout – Timeout in seconds for reading from this storage provider

credential – Name of the credential object to be used in data source

s3_bucket_name – Name of the Amazon S3 bucket to use as the data source

s3_region – Name of the Amazon S3 region where the given bucket is located

s3_verify_ssl – Whether to verify SSL connections. Allowed values are:

true – Connect with SSL verification

false – Connect without verifying the SSL connection; for testing purposes, bypassing TLS errors, self-signed certificates, etc.

The default value is ‘true’.

s3_use_virtual_addressing – Whether to use virtual addressing when referencing the Amazon S3 source. Allowed values are:

true – The requests URI should be specified in virtual-hosted-style format where the bucket name is part of the domain name in the URL.

false – Use path-style URI for requests.

The default value is ‘true’.

s3_aws_role_arn – Amazon IAM Role ARN which has required S3 permissions that can be assumed for the given S3 IAM user

s3_encryption_customer_algorithm – Customer encryption algorithm used for encrypting data

s3_encryption_customer_key – Customer encryption key to encrypt or decrypt data

hdfs_kerberos_keytab – Kerberos keytab file location for the given HDFS user. This may be a KIFS file.

hdfs_delegation_token – Delegation token for the given HDFS user

hdfs_use_kerberos – Use kerberos authentication for the given HDFS cluster. Allowed values are:

true

false

The default value is ‘false’.

azure_storage_account_name – Name of the Azure storage account to use as the data source; this is valid only if tenant_id is specified

azure_container_name – Name of the Azure storage container to use as the data source

azure_tenant_id – Active Directory tenant ID (or directory ID)

azure_sas_token – Shared access signature token for Azure storage account to use as the data source

azure_oauth_token – OAuth token to access given storage container

gcs_bucket_name – Name of the Google Cloud Storage bucket to use as the data source

gcs_project_id – Name of the Google Cloud project to use as the data source

gcs_service_account_keys – Google Cloud service account keys to use for authenticating the data source

is_stream – Whether to load from Azure/GCS/S3 continuously as a stream. Allowed values are:

true

false

The default value is ‘false’.

kafka_topic_name – Name of the Kafka topic to use as the data source

jdbc_driver_jar_path – JDBC driver jar file location. This may be a KIFS file.

jdbc_driver_class_name – Name of the JDBC driver class

anonymous – Use anonymous connection to storage provider–DEPRECATED: this is now the default. Specify use_managed_credentials for non-anonymous connection. Allowed values are:

true

false

The default value is ‘true’.

use_managed_credentials – When no credentials are supplied, we use anonymous access by default. If this is set, we will use cloud provider user settings. Allowed values are:

true

false

The default value is ‘false’.

use_https – Use https to connect to datasource if true, otherwise use http. Allowed values are:

true

false

The default value is ‘true’.

schema_registry_location – Location of Confluent Schema Registry in ‘[storage_path[:storage_port]]’ format.

schema_registry_credential – Confluent Schema Registry credential object name.

schema_registry_port – Confluent Schema Registry port (optional).

schema_registry_connection_retries – Confluent Schema Registry connection retries

schema_registry_connection_timeout – Confluent Schema Registry connection timeout (in seconds)

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

info (dict of str to str) –
Additional information.
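
A sketch of assembling the S3-related options for a data source. The helper names are hypothetical, `db` is assumed to be a connected GPUdb instance, and the location string is passed through unchanged because its exact form for S3 is not spelled out in this section.

```python
def s3_source_options(bucket, region, credential=None, verify_ssl=True):
    """Build the options dict for an S3-backed data source."""
    options = {
        "s3_bucket_name": bucket,
        "s3_region": region,
        "s3_verify_ssl": "true" if verify_ssl else "false",
    }
    if credential is not None:
        # Name of an existing credential object; when omitted, access is
        # anonymous by default (see use_managed_credentials above).
        options["credential"] = credential
    return options


def create_s3_source(db, name, location, bucket, region, credential=None):
    """Create the data source; user_name and password may be empty strings."""
    return db.create_datasource(
        name=name, location=location, user_name="", password="",
        options=s3_source_options(bucket, region, credential),
    )
```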

create_directory(directory_name=None, options={})[source]

Creates a new directory in KiFS. The new directory serves as a location in which the user can upload files using GPUdb.upload_files().

Parameters

directory_name (str) –
Name of the directory in KiFS to be created.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_home_directory – When set, a home directory is created for the user name provided in the value. The input parameter directory_name must be an empty string in this case. The user must exist.

data_limit – The maximum capacity, in bytes, to apply to the created directory. Set to -1 to indicate no upper limit. If empty, the system default limit is applied.

no_error_if_exists – If true, does not return an error if the directory already exists. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

directory_name (str) –
Value of input parameter directory_name.

info (dict of str to str) –
Additional information.
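
A minimal sketch of an idempotent directory creation with an optional capacity limit. `db` is assumed to be a connected GPUdb instance and `ensure_directory` is a hypothetical helper name.

```python
def ensure_directory(db, directory_name, data_limit_bytes=None):
    """Create a KiFS directory, tolerating one that already exists."""
    options = {"no_error_if_exists": "true"}
    if data_limit_bytes is not None:
        # -1 means no upper limit; omitting the key applies the system default.
        options["data_limit"] = str(data_limit_bytes)
    return db.create_directory(directory_name=directory_name, options=options)
```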

create_environment(environment_name=None, options={})[source]

Creates a new environment which can be used by user-defined functions (UDF).

Parameters

environment_name (str) –
Name of the environment to be created.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

environment_name (str) –
Value of input parameter environment_name.

info (dict of str to str) –
Additional information.

create_graph(graph_name=None, directed_graph=True, nodes=None, edges=None, weights=None, restrictions=None, options={})[source]

Creates a new graph network using given nodes, edges, weights, and restrictions.

IMPORTANT: It’s highly recommended that you review the Graphs & Solvers concepts documentation, the Graph REST Tutorial, and/or some graph examples before using this endpoint.

Parameters

graph_name (str) –
Name of the graph resource to generate.

directed_graph (bool) –
If set to true, the graph will be directed. If set to false, the graph will not be directed. Consult Directed Graphs for more details. Allowed values are:

True

False

The default value is True.

nodes (list of str) –
Nodes represent fundamental topological units of a graph. Nodes must be specified using identifiers; identifiers are grouped as combinations. Identifiers can be used with existing column names, e.g., ‘table.column AS NODE_ID’, expressions, e.g., ‘ST_MAKEPOINT(column1, column2) AS NODE_WKTPOINT’, or constant values, e.g., ‘{9, 10, 11} AS NODE_ID’. If using constant values in an identifier combination, the number of values specified must match across the combination. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

edges (list of str) –
Edges represent the required fundamental topological unit of a graph that typically connect nodes. Edges must be specified using identifiers; identifiers are grouped as combinations. Identifiers can be used with existing column names, e.g., ‘table.column AS EDGE_ID’, expressions, e.g., ‘SUBSTR(column, 1, 6) AS EDGE_NODE1_NAME’, or constant values, e.g., “{‘family’, ‘coworker’} AS EDGE_LABEL”. If using constant values in an identifier combination, the number of values specified must match across the combination. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

weights (list of str) –
Weights represent a method of informing the graph solver of the cost of including a given edge in a solution. Weights must be specified using identifiers; identifiers are grouped as combinations. Identifiers can be used with existing column names, e.g., ‘table.column AS WEIGHTS_EDGE_ID’, expressions, e.g., ‘ST_LENGTH(wkt) AS WEIGHTS_VALUESPECIFIED’, or constant values, e.g., ‘{4, 15} AS WEIGHTS_VALUESPECIFIED’. If using constant values in an identifier combination, the number of values specified must match across the combination. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

restrictions (list of str) –
Restrictions represent a method of informing the graph solver which edges and/or nodes should be ignored for the solution. Restrictions must be specified using identifiers; identifiers are grouped as combinations. Identifiers can be used with existing column names, e.g., ‘table.column AS RESTRICTIONS_EDGE_ID’, expressions, e.g., ‘column/2 AS RESTRICTIONS_VALUECOMPARED’, or constant values, e.g., ‘{0, 0, 0, 1} AS RESTRICTIONS_ONOFFCOMPARED’. If using constant values in an identifier combination, the number of values specified must match across the combination. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

merge_tolerance – If node geospatial positions are input (e.g., WKTPOINT, X, Y), determines the minimum separation allowed between unique nodes. If nodes are within the tolerance of each other, they will be merged as a single node. The default value is ‘1.0E-5’.

recreate – If set to true and the graph (using input parameter graph_name) already exists, the graph is deleted and recreated. Allowed values are:

true

false

The default value is ‘false’.

save_persist – If set to true, the graph will be saved in the persist directory (see the config reference for more information). If set to false, the graph will be removed when the graph server is shut down. Allowed values are:

true

false

The default value is ‘false’.

add_table_monitor – Adds a table monitor to every table used in the creation of the graph; this table monitor will trigger the graph to update dynamically upon inserts to the source table(s). Note that upon database restart, if save_persist is also set to true, the graph will be fully reconstructed and the table monitors will be reattached. For more details on table monitors, see GPUdb.create_table_monitor(). Allowed values are:

true

false

The default value is ‘false’.

graph_table – If specified, the created graph is also created as a table with the given name, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. The table will have the following identifier columns: ‘EDGE_ID’, ‘EDGE_NODE1_ID’, ‘EDGE_NODE2_ID’. If left blank, no table is created. The default value is ‘’.

add_turns – Adds dummy ‘pillowed’ edges around intersection nodes where there are more than three edges so that additional weight penalties can be imposed by the solve endpoints (increases the total number of edges). Allowed values are:

true

false

The default value is ‘false’.

is_partitioned – Allowed values are:

true

false

The default value is ‘false’.

server_id – Indicates which graph server(s) to send the request to. Default is to send to the server with the most available memory.

use_rtree – Use a range tree structure to accelerate and improve the accuracy of snapping, especially to edges. Allowed values are:

true

false

The default value is ‘true’.

label_delimiter – If provided, the label string will be split according to this delimiter and each sub-string will be applied as a separate label onto the specified edge. The default value is ‘’.

allow_multiple_edges – Multigraph choice: if set to true, multiple edges with the same node pair are allowed; otherwise, new edges whose node pair already exists will not be inserted. Allowed values are:

true

false

The default value is ‘true’.

embedding_table – If the table exists (it should be generated by the match/graph match_embedding solver), the vector embeddings for the newly inserted nodes will be appended to this table. The default value is ‘’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

result (bool) –
Indicates a successful creation on all servers.

num_nodes (long) –
Total number of nodes created.

num_edges (long) –
Total number of edges created.

edges_ids (list of longs) –
[Deprecated] Edges given as pairs of node indices. Only populated if export_create_results internal option is set to true.

info (dict of str to str) –
Additional information.
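
The identifier syntax is easier to see in a concrete request. The sketch below builds keyword arguments for an edge table with hypothetical columns id, src, dst, and cost; it uses the EDGE_ID, EDGE_NODE1_ID, EDGE_NODE2_ID, WEIGHTS_EDGE_ID, and WEIGHTS_VALUESPECIFIED identifiers named above, and assumes empty lists are accepted for nodes and restrictions. The helper name `edge_graph_request` is hypothetical.

```python
def edge_graph_request(graph_name, table, recreate=False):
    """Build keyword arguments for create_graph from a single edge table."""
    return {
        "graph_name": graph_name,
        "directed_graph": True,
        "nodes": [],  # assumption: nodes may be left empty when edges are given
        "edges": [
            f"{table}.id AS EDGE_ID",
            f"{table}.src AS EDGE_NODE1_ID",
            f"{table}.dst AS EDGE_NODE2_ID",
        ],
        "weights": [
            f"{table}.id AS WEIGHTS_EDGE_ID",
            f"{table}.cost AS WEIGHTS_VALUESPECIFIED",
        ],
        "restrictions": [],
        "options": {"recreate": "true" if recreate else "false"},
    }
```

Given a connected GPUdb instance `db`, the request would be sent as `db.create_graph(**edge_graph_request("example_graph", "example.roads"))`; both names here are placeholders.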

create_job(endpoint=None, request_encoding='binary', data=None, data_str=None, options={})[source]

Create a job which will run asynchronously. The response returns a job ID, which can be used to query the status and result of the job. The status and the result of the job upon completion can be requested by GPUdb.get_job().

Parameters

endpoint (str) –
Indicates which endpoint to execute, e.g. ‘/alter/table’.

request_encoding (str) –
The encoding of the request payload for the job. Allowed values are:

binary

json

snappy

The default value is ‘binary’.

data (bytes) –
Binary-encoded payload for the job to be run asynchronously. The payload must contain the relevant input parameters for the endpoint indicated in input parameter endpoint. Please see the documentation for the appropriate endpoint to see what values must (or can) be specified. If this parameter is used, then input parameter request_encoding must be binary or snappy.

data_str (str) –
JSON-encoded payload for the job to be run asynchronously. The payload must contain the relevant input parameters for the endpoint indicated in input parameter endpoint. Please see the documentation for the appropriate endpoint to see what values must (or can) be specified. If this parameter is used, then input parameter request_encoding must be json.

options (dict of str to str) –
Optional parameters. Allowed keys are:

job_tag – Tag to use for the submitted job. The same tag can be used on a backup cluster to retrieve the response for the job. Tags can use letters, numbers, ‘_’ and ‘-’

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

job_id (long) –
An identifier for the job created by this call.

info (dict of str to str) –
Additional information. Allowed keys are:

job_tag – The job tag specified by the user or, if unspecified, a unique identifier generated internally.

query_id – A unique identifier for this job generated for use in tracing telemetry data

The default value is an empty dict ( {} ).
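
A sketch of preparing a JSON-encoded job request. When data_str is used, request_encoding must be json; the sketch also checks job_tag against the allowed characters. Assumptions: the unused binary payload is passed as empty bytes, "letters" means ASCII letters, and `json_job_request` is a hypothetical helper name.

```python
import json
import re

# Tags can use letters, numbers, '_' and '-' (assumption: ASCII letters only).
_TAG_RE = re.compile(r"[A-Za-z0-9_-]+")


def json_job_request(endpoint, payload, job_tag=None):
    """Build keyword arguments for create_job with a JSON-encoded payload."""
    options = {}
    if job_tag is not None:
        if _TAG_RE.fullmatch(job_tag) is None:
            raise ValueError(f"invalid job_tag: {job_tag!r}")
        options["job_tag"] = job_tag
    return {
        "endpoint": endpoint,
        "request_encoding": "json",
        "data": b"",  # assumption: binary payload left empty for JSON requests
        "data_str": json.dumps(payload),
        "options": options,
    }
```

Given a connected GPUdb instance `db`, the request would be submitted with `db.create_job(**json_job_request(...))`, and the job_id in the response passed to GPUdb.get_job() to retrieve the status and result.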

create_join_table(join_table_name=None, table_names=None, column_names=None, expressions=[], options={})[source]

Creates a table that is the result of a SQL JOIN.

For join details and examples see: Joins. For limitations, see Join Limitations and Cautions.

Parameters

join_table_name (str) –
Name of the join table to be created, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria.

table_names (list of str) –
The list of table names composing the join, each in [schema_name.]table_name format, using standard name resolution rules. Corresponds to a SQL statement FROM clause. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

column_names (list of str) –
List of member table columns or column expressions to be included in the join. Columns can be prefixed with ‘table_id.column_name’, where ‘table_id’ is the table name or alias. Columns can be aliased via the syntax ‘column_name as alias’. Wild cards ‘*’ can be used to include all columns across member tables or ‘table_id.*’ for all of a single table’s columns. Columns and column expressions composing the join must be uniquely named or aliased–therefore, the ‘*’ wild card cannot be used if column names aren’t unique across all tables. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

expressions (list of str) –
An optional list of expressions to combine and filter the joined tables. Corresponds to a SQL statement WHERE clause. For details see: expressions. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter join_table_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_join_table_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema for the join as part of input parameter join_table_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the join. If the schema is non-existent, it will be automatically created. The default value is ‘’.

max_query_dimensions – No longer used.

optimize_lookups – Use more memory to speed up the joining of tables. Allowed values are:

true

false

The default value is ‘false’.

strategy_definition – The tier strategy for the table and its columns.

ttl – Sets the TTL of the join table specified in input parameter join_table_name.

view_id – View this projection is part of. The default value is ‘’.

no_count – Return a count of 0 for the join table for logging and for GPUdb.show_table(); optimization needed for large overlapped equi-join stencils. The default value is ‘false’.

chunk_size – Maximum number of records per joined-chunk for this table. Defaults to the gpudb.conf file chunk size

enable_virtual_chunking – Collect chunks with accumulated size less than chunk_size into a single chunk. The default value is ‘false’.

max_virtual_chunk_size – Maximum number of records per virtual-chunk. When set, enables virtual chunking. Defaults to chunk_size if virtual chunking is otherwise enabled.

min_virtual_chunk_size – Minimum number of records per virtual-chunk. When set, enables virtual chunking. Defaults to chunk_size if virtual chunking is otherwise enabled.

enable_sparse_virtual_chunking – Materialize virtual chunks with only non-deleted values. The default value is ‘false’.

enable_equi_join_lazy_result_store – Allow using the lazy result store to cache computation of one side of a multichunk equi-join. Reduces computation but also reduces parallelism to the number of chunks on the other side of the equi-join

enable_predicate_equi_join_lazy_result_store – Allow using the lazy result store to cache computation of one side of a multichunk predicate-equi-join. Reduces computation but also reduces parallelism to the number of chunks on the other side of the equi-join

enable_pk_equi_join – Use equi-join to do primary key joins rather than using primary-key-index

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

join_table_name (str) –
Value of input parameter join_table_name.

count (long) –
The number of records in the join table filtered by the given select expression.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_join_table_name – The fully qualified name of the join table (i.e. including the schema)

The default value is an empty dict ( {} ).

create_materialized_view(table_name=None, options={})[source]

Initiates the process of creating a materialized view, reserving the view’s name to prevent other views or tables from being created with that name.

For materialized view details and examples, see Materialized Views.

The response contains output parameter view_id, which is used to tag each subsequent operation (projection, union, aggregation, filter, or join) that will compose the view.

Parameters

table_name (str) –
Name of the table to be created that is the top-level table of the materialized view, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria.

options (dict of str to str) –
Optional parameters. Allowed keys are:

collection_name – [DEPRECATED–please specify the containing schema for the materialized view as part of input parameter table_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema which is to contain the newly created view. If the schema provided is non-existent, it will be automatically created.

execute_as – User name to use to run the refresh job

build_materialized_view_policy – Sets startup materialized view rebuild scheme. Allowed values are:

always – Rebuild as many materialized views as possible before accepting requests.

lazy – Rebuild the necessary materialized views at start, and load the remainder lazily.

on_demand – Rebuild materialized views as requests use them.

system – Rebuild materialized views using the system-configured default.

The default value is ‘system’.

persist – If true, then the materialized view specified in input parameter table_name will be persisted and will not expire unless a ttl is specified. If false, then the materialized view will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:

true

false

The default value is ‘false’.

refresh_span – Sets the future time offset (in seconds) at which periodic refresh stops.

refresh_stop_time – When refresh_method is periodic, specifies the time at which a periodic refresh is stopped. Value is a datetime string with format ‘YYYY-MM-DD HH:MM:SS’.

refresh_method – Method by which the materialized view can be refreshed when the data in the underlying member tables has changed. Allowed values are:

manual – Refresh only occurs when manually requested by calling GPUdb.alter_table() with an ‘action’ of ‘refresh’

on_query – Refresh any time the view is queried.

on_change – If possible, incrementally refresh (refresh just those records added) whenever an insert, update, delete or refresh of input table is done. A full refresh is done if an incremental refresh is not possible.

periodic – Refresh table periodically at rate specified by refresh_period

The default value is ‘manual’.

refresh_period – When refresh_method is periodic, specifies the period in seconds at which refresh occurs

refresh_start_time – When refresh_method is periodic, specifies the first time at which a refresh is to be done. Value is a datetime string with format ‘YYYY-MM-DD HH:MM:SS’.

ttl – Sets the TTL of the table specified in input parameter table_name.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name.

view_id (str) –
Value of view_id.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_table_name – The fully qualified name of the result table (i.e. including the schema)

The default value is an empty dict ( {} ).
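The periodic refresh options above can be sketched as follows. This is a minimal illustration, not the library's own helper: the function name periodic_refresh_options, the view name, and the chosen times are hypothetical, and the client call appears only as a comment because it needs a live server and a connected client (assumed here to be named db).

```python
from datetime import datetime, timedelta

# Datetime format documented for refresh_start_time and refresh_stop_time.
KINETICA_DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def periodic_refresh_options(start, period_seconds, stop=None):
    """Build a create_materialized_view options dict (hypothetical helper).

    Every value is converted to str because options is a dict of str to str.
    """
    options = {
        "refresh_method": "periodic",
        "refresh_period": str(period_seconds),
        "refresh_start_time": start.strftime(KINETICA_DATETIME_FORMAT),
    }
    if stop is not None:
        options["refresh_stop_time"] = stop.strftime(KINETICA_DATETIME_FORMAT)
    return options


start = datetime(2030, 1, 1, 0, 0, 0)
options = periodic_refresh_options(start, 3600, stop=start + timedelta(days=7))

# With a connected client (assumed name: db), the call might look like:
# response = db.create_materialized_view(
#     table_name="example_schema.example_view", options=options)
# view_id = response["view_id"]
```

The returned view_id would then be passed in the options of each operation that composes the view, for example as the view_id option of GPUdb.create_projection().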

create_proc(proc_name=None, execution_mode='distributed', files={}, command='', args=[], options={})[source]

Creates an instance (proc) of the user-defined function (UDF) specified by the given command, options, and files, and makes it available for execution.

Parameters

proc_name (str) –
Name of the proc to be created. Must not be the name of a currently existing proc.

execution_mode (str) –
The execution mode of the proc. Allowed values are:

distributed – Input table data will be divided into data segments that are distributed across all nodes in the cluster, and the proc command will be invoked once per data segment in parallel. Output table data from each invocation will be saved to the same node as the corresponding input data.

nondistributed – The proc command will be invoked only once per execution, and will not have direct access to any tables named as input or output table parameters in the call to GPUdb.execute_proc(). It will, however, be able to access the database using native API calls.

The default value is ‘distributed’.

files (dict of str to bytes) –
A map of the files that make up the proc. The keys of the map are file names, and the values are the binary contents of the files. The file names may include subdirectory names (e.g. ‘subdir/file’) but must not resolve to a directory above the root for the proc.

Files may be loaded from existing files in KiFS. Those file names should be prefixed with the uri kifs:// and the values in the map should be empty. The default value is an empty dict ( {} ).

command (str) –
The command (excluding arguments) that will be invoked when the proc is executed. It will be invoked from the directory containing the proc input parameter files and may be any command that can be resolved from that directory. It need not refer to a file actually in that directory; for example, it could be ‘java’ if the proc is a Java application; however, any necessary external programs must be preinstalled on every database node. If the command refers to a file in that directory, it must be preceded with ‘./’ as per Linux convention. If not specified, and exactly one file is provided in input parameter files, that file will be invoked. The default value is ‘’.

args (list of str) –
An array of command-line arguments that will be passed to input parameter command when the proc is executed. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

max_concurrency_per_node – The maximum number of concurrent instances of the proc that will be executed per node. 0 allows unlimited concurrency. The default value is ‘0’.

set_environment – A python environment to use when executing the proc. Must be an existing environment, else an error will be returned. The default value is ‘’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

proc_name (str) –
Value of input parameter proc_name.

info (dict of str to str) –
Additional information.

create_projection(table_name=None, projection_name=None, column_names=None, options={})[source]

Creates a new projection of an existing table. A projection represents a subset of the columns (potentially including derived columns) of a table.

For projection details and examples, see Projections. For limitations, see Projection Limitations and Cautions.

Window functions, which can perform operations like moving averages, are available through this endpoint as well as GPUdb.get_records_by_column().

A projection can be created with a different shard key than the source table. By specifying shard_key, the projection will be sharded according to the specified columns, regardless of how the source table is sharded. The source table can even be unsharded or replicated.

If input parameter table_name is empty, selection is performed against a single-row virtual table. This can be useful in executing temporal (NOW()), identity (USER()), or constant-based functions (GEODIST(-77.11, 38.88, -71.06, 42.36)).

Parameters

table_name (str) –
Name of the existing table on which the projection is to be applied, in [schema_name.]table_name format, using standard name resolution rules. An empty table name creates a projection from a single-row virtual table, where columns specified should be constants or constant expressions.

projection_name (str) –
Name of the projection to be created, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria.

column_names (list of str) –
List of columns from input parameter table_name to be included in the projection. Can include derived columns. Can be specified as aliased via the syntax ‘column_name as alias’. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter projection_name. If persist is false (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_projection_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema for the projection as part of input parameter projection_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the projection. If the schema is non-existent, it will be automatically created. The default value is ‘’.

expression – An optional filter expression to be applied to the source table prior to the projection. The default value is ‘’.

is_replicated – If true then the projection will be replicated even if the source table is not. Allowed values are:

true

false

The default value is ‘false’.

offset – The number of initial results to skip (this can be useful for paging through the results). The default value is ‘0’.

limit – The number of records to keep. The default value is ‘-9999’.

order_by – Comma-separated list of the columns to be sorted by; e.g. ‘timestamp asc, x desc’. The columns specified must be present in input parameter column_names. If any alias is given for any column name, the alias must be used, rather than the original column name. The default value is ‘’.

chunk_size – Indicates the number of records per chunk to be used for this projection.

chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for this projection.

chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for this projection.

create_indexes – Comma-separated list of columns on which to create indexes on the projection. The columns specified must be present in input parameter column_names. If any alias is given for any column name, the alias must be used, rather than the original column name.

ttl – Sets the TTL of the projection specified in input parameter projection_name.

shard_key – Comma-separated list of the columns to be sharded on; e.g. ‘column1, column2’. The columns specified must be present in input parameter column_names. If any alias is given for any column name, the alias must be used, rather than the original column name. The default value is ‘’.

persist – If true, then the projection specified in input parameter projection_name will be persisted and will not expire unless a ttl is specified. If false, then the projection will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:

true

false

The default value is ‘false’.

preserve_dict_encoding – If true, then columns that were dict encoded in the source table will be dict encoded in the projection. Allowed values are:

true

false

The default value is ‘true’.

retain_partitions – Determines whether the created projection will retain the partitioning scheme from the source table. Allowed values are:

true

false

The default value is ‘false’.

partition_type – Partitioning scheme to use. Allowed values are:

RANGE – Use range partitioning.

INTERVAL – Use interval partitioning.

LIST – Use list partitioning.

HASH – Use hash partitioning.

SERIES – Use series partitioning.

partition_keys – Comma-separated list of partition keys, which are the columns or column expressions by which records will be assigned to partitions defined by partition_definitions.

partition_definitions – Comma-separated list of partition definitions, whose format depends on the choice of partition_type. See range partitioning, interval partitioning, list partitioning, hash partitioning, or series partitioning for example formats.

is_automatic_partition – If true, a new partition will be created for values which don’t fall into an existing partition. Currently only supported for list partitions. Allowed values are:

true

false

The default value is ‘false’.

view_id – ID of view of which this projection is a member. The default value is ‘’.

strategy_definition – The tier strategy for the table and its columns.

compression_codec – The default compression codec for the projection’s columns.

join_window_functions – If set, window functions which require a reshard will be computed separately and joined back together, if the width of the projection is greater than the join_window_functions_threshold. The default value is ‘true’.

join_window_functions_threshold – If the projection is greater than this width (in bytes), then window functions which require a reshard will be computed separately and joined back together. The default value is ‘’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

projection_name (str) –
Value of input parameter projection_name.

info (dict of str to str) –
Additional information. Allowed keys are:

count – Number of records in the final table

qualified_projection_name – The fully qualified name of the projection (i.e. including the schema).

The default value is an empty dict ( {} ).
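Several options above (order_by, shard_key, create_indexes) require that an alias be used in place of the original column name. The sketch below shows one way to check this on the client side before sending a request. Both helper functions are hypothetical, and the alias parsing is deliberately naive: it splits on the last ‘ as ’ and would misread expressions that contain that keyword themselves, such as a cast.

```python
def output_names(column_names):
    """Return the name each projected column is known by (alias if given)."""
    names = []
    for column in column_names:
        index = column.lower().rfind(" as ")
        if index == -1:
            names.append(column.strip())
        else:
            names.append(column[index + len(" as "):].strip())
    return names


def check_listed_columns(option_value, column_names):
    """Raise ValueError if a comma-separated option names an unknown column."""
    known = set(output_names(column_names))
    for item in option_value.split(","):
        tokens = item.split()
        if not tokens:
            continue
        name = tokens[0]  # drop an optional 'asc' or 'desc'
        if name not in known:
            raise ValueError("%r is not a projected column or alias" % name)


column_names = ["id", "salary * 1.1 as raised_salary", "hire_date"]
options = {
    "order_by": "hire_date asc, raised_salary desc",
    "shard_key": "id",
    "limit": "100",
}
check_listed_columns(options["order_by"], column_names)
check_listed_columns(options["shard_key"], column_names)

# With a connected client (assumed name: db) and hypothetical table names:
# db.create_projection(table_name="example_schema.employee",
#                      projection_name="example_schema.employee_raise",
#                      column_names=column_names, options=options)
```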

create_resource_group(name=None, tier_attributes={}, ranking=None, adjoining_resource_group='', options={})[source]

Creates a new resource group to facilitate resource management.

Parameters

name (str) –
Name of the group to be created. Must contain only letters, digits, and underscores, and cannot begin with a digit. Must not match an existing resource group name.

tier_attributes (dict of str to dicts of str to str) –
Optional map containing tier names and their respective attribute group limits. The only valid attribute limit that can be set is max_memory (in bytes) for the VRAM & RAM tiers.

For instance, to set max VRAM capacity to 1GB per rank per GPU and max RAM capacity to 10GB per rank, use: {‘VRAM’:{‘max_memory’:’1000000000’}, ‘RAM’:{‘max_memory’:’10000000000’}}. Allowed keys are:

max_memory – Maximum amount of memory usable at one time, per rank, per GPU, for the VRAM tier; or maximum amount of memory usable at one time, per rank, for the RAM tier.

The default value is an empty dict ( {} ).

ranking (str) –
Indicates the relative ranking among existing resource groups where this new resource group will be placed. Allowed values are:

first – Make this resource group the new first one in the ordering

last – Make this resource group the new last one in the ordering

before – Place this resource group before the one specified by input parameter adjoining_resource_group in the ordering

after – Place this resource group after the one specified by input parameter adjoining_resource_group in the ordering

adjoining_resource_group (str) –
If input parameter ranking is before or after, this field indicates the resource group before or after which the current group will be placed; otherwise, leave blank. The default value is ‘’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

max_cpu_concurrency – Maximum number of simultaneous threads that will be used to execute a request, per rank, for this group. The minimum allowed value is ‘4’.

max_data – Maximum amount of data, per rank, in bytes, that can be used by all database objects within this group. Set to -1 to indicate no upper limit. The minimum allowed value is ‘-1’.

max_scheduling_priority – Maximum priority of a scheduled task for this group. The minimum allowed value is ‘1’. The maximum allowed value is ‘100’.

max_tier_priority – Maximum priority of a tiered object for this group. The minimum allowed value is ‘1’. The maximum allowed value is ‘10’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

info (dict of str to str) –
Additional information.
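To illustrate the nested tier_attributes structure and the documented option bounds, here is a minimal sketch that validates inputs locally and assembles the call arguments. The helper build_resource_group_args and the group name are hypothetical; the memory figures repeat the 1GB VRAM and 10GB RAM example given above.

```python
import re

# Letters, digits, and underscores only; must not begin with a digit.
GROUP_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_resource_group_args(name, vram_bytes, ram_bytes,
                              max_cpu_concurrency, max_scheduling_priority):
    """Assemble create_resource_group arguments (hypothetical helper)."""
    if GROUP_NAME_PATTERN.fullmatch(name) is None:
        raise ValueError("invalid resource group name: %r" % name)
    if max_cpu_concurrency < 4:
        raise ValueError("max_cpu_concurrency must be at least 4")
    if not 1 <= max_scheduling_priority <= 100:
        raise ValueError("max_scheduling_priority must be within 1..100")
    return {
        "name": name,
        "tier_attributes": {
            "VRAM": {"max_memory": str(vram_bytes)},
            "RAM": {"max_memory": str(ram_bytes)},
        },
        "ranking": "last",
        "adjoining_resource_group": "",  # only used with before/after
        "options": {
            "max_cpu_concurrency": str(max_cpu_concurrency),
            "max_scheduling_priority": str(max_scheduling_priority),
        },
    }


group_args = build_resource_group_args(
    "analytics_group", 1000000000, 10000000000, 8, 50)

# With a connected client (assumed name: db):
# db.create_resource_group(**group_args)
```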

create_role(name=None, options={})[source]

Creates a new role.

Note

This method should be used for on-premise deployments only.

Parameters

name (str) –
Name of the role to be created. Must contain only lowercase letters, digits, and underscores, and cannot begin with a digit. Must not be the same name as an existing user or role.

options (dict of str to str) –
Optional parameters. Allowed keys are:

resource_group – Name of an existing resource group to associate with this role.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

info (dict of str to str) –
Additional information.
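The naming rule for roles can be checked locally before calling the server. The following is only a sketch; is_valid_role_name is a hypothetical helper, and it cannot detect a clash with an existing user or role, which only the server knows.

```python
import re

# Lowercase letters, digits, and underscores; must not begin with a digit.
ROLE_NAME_PATTERN = re.compile(r"[a-z_][a-z0-9_]*")


def is_valid_role_name(name):
    """Return True if name satisfies the documented role naming rule."""
    return ROLE_NAME_PATTERN.fullmatch(name) is not None


candidate = "analyst_1"
if is_valid_role_name(candidate):
    # With a connected client (assumed name: db):
    # db.create_role(name=candidate, options={})
    pass
```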

create_schema(schema_name=None, options={})[source]

Creates a SQL-style schema. Schemas are containers for tables and views. Multiple tables and views can be defined with the same name in different schemas.

Parameters

schema_name (str) –
Name of the schema to be created. Has the same naming restrictions as tables.

options (dict of str to str) –
Optional parameters. Allowed keys are:

no_error_if_exists – If true, prevents an error from occurring if the schema already exists. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

schema_name (str) –
Value of input parameter schema_name.

info (dict of str to str) –
Additional information.

create_table(table_name=None, type_id=None, options={})[source]

Creates a new table with the given type (definition of columns). The type is specified in input parameter type_id as either a numerical type ID (as returned by GPUdb.create_type()) or as a list of columns, each specified as a list of the column name, data type, and any column attributes.

Example of a type definition with some parameters:

[
    ["id", "int8", "primary_key"],
    ["dept_id", "int8", "primary_key", "shard_key"],
    ["manager_id", "int8", "nullable"],
    ["first_name", "char32"],
    ["last_name", "char64"],
    ["salary", "decimal"],
    ["hire_date", "date"]
]

Each column definition consists of the column name (which should meet the standard column naming criteria), the column’s specific type (int, long, float, double, string, bytes, or any of the properties map values from GPUdb.create_type()), and any data handling, data key, or data replacement properties.

A table may optionally be designated to use a replicated distribution scheme, or be assigned: foreign keys to other tables, a partitioning scheme, and/or a tier strategy.

Parameters

table_name (str) –
Name of the table to be created, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. The error raised when a table with the same name and type ID already exists can be suppressed with the no_error_if_exists option.

type_id (str) –
The type for the table, specified as either an existing table’s numerical type ID (as returned by GPUdb.create_type()) or a type definition (as described above).

options (dict of str to str) –
Optional parameters. Allowed keys are:

no_error_if_exists – If true, prevents an error from occurring if the table already exists and is of the given type. If a table with the same name but a different type exists, it is still an error. Allowed values are:

true

false

The default value is ‘false’.

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter table_name. If is_result_table is true, then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_table_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema as part of input parameter table_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema which is to contain the newly created table. If the schema is non-existent, it will be automatically created.

is_collection – [DEPRECATED–please use GPUdb.create_schema() to create a schema instead] Indicates whether to create a schema instead of a table. Allowed values are:

true

false

The default value is ‘false’.

is_replicated – Affects the distribution scheme for the table’s data. If true and the given type has no explicit shard key defined, the table will be replicated. If false, the table will be sharded according to the shard key specified in the given input parameter type_id, or randomly sharded, if no shard key is specified. Note that a type containing a shard key cannot be used to create a replicated table. Allowed values are:

true

false

The default value is ‘false’.

foreign_keys – Semicolon-separated list of foreign keys, of the format ‘(source_column_name [, …]) references target_table_name(primary_key_column_name [, …]) [as foreign_key_name]’.

foreign_shard_key – Foreign shard key of the format ‘source_column references shard_by_column from target_table(primary_key_column)’.

partition_type – Partitioning scheme to use. Allowed values are:

RANGE – Use range partitioning.

INTERVAL – Use interval partitioning.

LIST – Use list partitioning.

HASH – Use hash partitioning.

SERIES – Use series partitioning.

partition_keys – Comma-separated list of partition keys, which are the columns or column expressions by which records will be assigned to partitions defined by partition_definitions.

partition_definitions – Comma-separated list of partition definitions, whose format depends on the choice of partition_type. See range partitioning, interval partitioning, list partitioning, hash partitioning, or series partitioning for example formats.

is_automatic_partition – If true, a new partition will be created for values which don’t fall into an existing partition. Currently only supported for list partitions. Allowed values are:

true

false

The default value is ‘false’.

ttl – Sets the TTL of the table specified in input parameter table_name.

chunk_size – Indicates the number of records per chunk to be used for this table.

chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for this table.

chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for this table.

is_result_table – Indicates whether the table is a memory-only table. A result table cannot contain columns with text_search data-handling, and it will not be retained if the server is restarted. Allowed values are:

true

false

The default value is ‘false’.

strategy_definition – The tier strategy for the table and its columns.

compression_codec – The default compression codec for this table’s columns.

load_vectors_policy – Set startup data loading scheme for the table. Allowed values are:

always – Load as much vector data as possible into memory before accepting requests.

lazy – Load the necessary vector data at start, and load the remainder lazily.

on_demand – Load vector data as requests use it.

system – Load vector data using the system-configured default.

The default value is ‘system’.

build_pk_index_policy – Set startup primary-key index generation scheme for the table. Allowed values are:

always – Generate as much primary key index data as possible before accepting requests.

lazy – Generate the necessary primary key index data at start, and load the remainder lazily.

on_demand – Generate primary key index data as requests use it.

system – Generate primary key index data using the system-configured default.

The default value is ‘system’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name.

type_id (str) –
Value of input parameter type_id.

is_collection (bool) –
[DEPRECATED–this will always return false] Indicates if the created entity is a schema.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_table_name – The fully qualified name of the new table (i.e. including the schema)

The default value is an empty dict ( {} ).
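The type definition format described above can be sketched as follows, reusing the example columns. Two points are assumptions rather than documented behavior: because type_id is documented as a str, the sketch passes the definition as its JSON serialization, and the helper names columns_with and build_create_table_args are hypothetical. Check both against your client version before relying on them.

```python
import json

# Each column: [name, type, property, property, ...]
columns = [
    ["id", "int8", "primary_key"],
    ["dept_id", "int8", "primary_key", "shard_key"],
    ["manager_id", "int8", "nullable"],
    ["first_name", "char32"],
    ["last_name", "char64"],
    ["salary", "decimal"],
    ["hire_date", "date"],
]


def columns_with(prop, column_defs):
    """Return the names of the columns that carry the given property."""
    return [c[0] for c in column_defs if prop in c[2:]]


def build_create_table_args(table_name, column_defs, options):
    """Assemble create_table arguments (hypothetical helper)."""
    # A type containing a shard key cannot be used for a replicated table.
    if options.get("is_replicated") == "true" and columns_with("shard_key", column_defs):
        raise ValueError("replicated tables cannot use a type with a shard key")
    return {
        "table_name": table_name,
        "type_id": json.dumps(column_defs),  # assumption: JSON string form
        "options": options,
    }


table_args = build_create_table_args(
    "example_schema.employee", columns, {"no_error_if_exists": "true"})

# With a connected client (assumed name: db):
# response = db.create_table(**table_args)
# qualified_name = response["info"]["qualified_table_name"]
```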

create_table_external(table_name=None, filepaths=None, modify_columns={}, create_table_options={}, options={})[source]

Creates a new external table, which is a local database object whose source data is located externally to the database. The source data can be located either in KiFS; on the cluster, accessible to the database; or remotely, accessible via a pre-defined external data source.

The external table can have its structure defined explicitly, via input parameter create_table_options, which contains many of the options from GPUdb.create_table(); or defined implicitly, inferred from the source data.

Parameters

table_name (str) –
Name of the table to be created, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria.

filepaths (list of str) –
A list of file paths from which data will be sourced:

For paths in KiFS, use the URI prefix of kifs:// followed by the path to a file or directory. File matching by prefix is supported, e.g. kifs://dir/file would match dir/file_1 and dir/file_2. When prefix matching is used, the path must start with a full, valid KiFS directory name.

If an external data source is specified in datasource_name, these file paths must resolve to accessible files at that data source location. Prefix matching is supported. If the data source is hdfs, prefixes must be aligned with directories, i.e. partial file names will not match.

If no data source is specified, the files are assumed to be local to the database and must all be accessible to the gpudb user, residing on the path (or relative to the path) specified by the external files directory in the Kinetica configuration file. Wildcards (*) can be used to specify a group of files. Prefix matching is supported; the prefixes must be aligned with directories.

If the first path ends in .tsv, the text delimiter will be defaulted to a tab character. If the first path ends in .psv, the text delimiter will be defaulted to a pipe character (|). The user can provide a single element (which will be automatically promoted to a list internally) or a list.
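The extension-based delimiter rule can be mirrored on the client side, for example to predict how a list of files will be parsed. This is a minimal sketch: the function default_delimiter_for is hypothetical, the match is assumed to be case-sensitive, and None is returned for any other extension because no extension-based default is described for those cases.

```python
def default_delimiter_for(filepaths):
    """Return the delimiter implied by the first path, or None.

    Accepts a single path or a list, mirroring the promotion of a
    single element to a list.
    """
    if isinstance(filepaths, str):
        filepaths = [filepaths]
    first = filepaths[0]
    if first.endswith(".tsv"):
        return "\t"
    if first.endswith(".psv"):
        return "|"
    return None


# Only the first path decides, even when later paths differ.
paths = ["kifs://data/orders.tsv", "kifs://data/returns.psv"]
delimiter = default_delimiter_for(paths)
```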

modify_columns (dict of str to dicts of str to str) –
Not implemented yet. The default value is an empty dict ( {} ).

create_table_options (dict of str to str) –
Options from GPUdb.create_table(), allowing the structure of the table to be defined independently of the data source. Allowed keys are:

type_id – ID of a currently registered type.

no_error_if_exists – If true, prevents an error from occurring if the table already exists and is of the given type. If a table with the same name but a different type exists, it is still an error. Allowed values are:

true

false

The default value is ‘false’.

is_replicated – Affects the distribution scheme for the table’s data. If true and the given table has no explicit shard key defined, the table will be replicated. If false, the table will be sharded according to the shard key specified in the given type_id, or randomly sharded, if no shard key is specified. Note that a type containing a shard key cannot be used to create a replicated table. Allowed values are:

true

false

The default value is ‘false’.

foreign_keys – Semicolon-separated list of foreign keys, of the format ‘(source_column_name [, …]) references target_table_name(primary_key_column_name [, …]) [as foreign_key_name]’.

foreign_shard_key – Foreign shard key of the format ‘source_column references shard_by_column from target_table(primary_key_column)’.

partition_type – Partitioning scheme to use. Allowed values are:

RANGE – Use range partitioning.

INTERVAL – Use interval partitioning.

LIST – Use list partitioning.

HASH – Use hash partitioning.

SERIES – Use series partitioning.

partition_keys – Comma-separated list of partition keys, which are the columns or column expressions by which records will be assigned to partitions defined by partition_definitions.

partition_definitions – Comma-separated list of partition definitions, whose format depends on the choice of partition_type. See range partitioning, interval partitioning, list partitioning, hash partitioning, or series partitioning for example formats.

is_automatic_partition – If true, a new partition will be created for values which don’t fall into an existing partition. Currently, only supported for list partitions. Allowed values are:

true

false

The default value is ‘false’.

ttl – Sets the TTL of the table specified in input parameter table_name.

chunk_size – Indicates the number of records per chunk to be used for this table.

chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for this table.

chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for this table.

is_result_table – Indicates whether the table is a memory-only table. A result table cannot contain columns with text_search data-handling, and it will not be retained if the server is restarted. Allowed values are:

true

false

The default value is ‘false’.

strategy_definition – The tier strategy for the table and its columns.

compression_codec – The default compression codec for this table’s columns.

The default value is an empty dict ( {} ).

options (dict of str to str) –
Optional parameters. Allowed keys are:

bad_record_table_name – Name of a table to which records that were rejected are written. The bad-record-table has the following columns: line_number (long), line_rejected (string), error_message (string). When error_handling is abort, the bad-record-table is not populated.

bad_record_table_limit – A positive integer indicating the maximum number of records that can be written to the bad-record-table. The default value is ‘10000’.

bad_record_table_limit_per_input – For subscriptions, a positive integer indicating the maximum number of records that can be written to the bad-record-table per file/payload. The default value is the value of bad_record_table_limit, and the total size of the table per rank is limited to bad_record_table_limit.

batch_size – Number of records to insert per batch when inserting data. The default value is ‘50000’.

column_formats – For each target column specified, applies the column-property-bound format to the source data loaded into that column. Each column format will contain a mapping of one or more of its column properties to an appropriate format for each property. Currently supported column properties include date, time, & datetime. The parameter value must be formatted as a JSON string of maps of column names to maps of column properties to their corresponding column formats, e.g., ‘{ “order_date” : { “date” : “%Y.%m.%d” }, “order_time” : { “time” : “%H:%M:%S” } }’.

See default_column_formats for valid format syntax.

columns_to_load – Specifies a comma-delimited list of columns from the source data to load. If more than one file is being loaded, this list applies to all files.

Column numbers can be specified discretely or as a range. For example, a value of ‘5,7,1..3’ will insert values from the fifth column in the source data into the first column in the target table, from the seventh column in the source data into the second column in the target table, and from the first through third columns in the source data into the third through fifth columns in the target table.

If the source data contains a header, column names matching the file header names may be provided instead of column numbers. If the target table doesn’t exist, the table will be created with the columns in this order. If the target table does exist with columns in a different order than the source data, this list can be used to match the order of the target table. For example, a value of ‘C, B, A’ will create a three column table with column C, followed by column B, followed by column A; or will insert those fields in that order into a table created with columns in that order. If the target table exists, the column names must match the source data field names for a name-mapping to be successful.

Mutually exclusive with columns_to_skip.

columns_to_skip – Specifies a comma-delimited list of columns from the source data to skip. Mutually exclusive with columns_to_load.

compression_type – Source data compression type. Allowed values are:

none – No compression.

auto – Auto-detect the compression type.

gzip – gzip file compression.

bzip2 – bzip2 file compression.

The default value is ‘auto’.

datasource_name – Name of an existing external data source from which the data file(s) specified in input parameter filepaths will be loaded.

default_column_formats – Specifies the default format to be applied to source data loaded into columns with the corresponding column property. Currently supported column properties include date, time, & datetime. This default column-property-bound format can be overridden by specifying a column property & format for a given target column in column_formats. For each specified annotation, the format will apply to all columns with that annotation unless a custom column_formats for that annotation is specified.

The parameter value must be formatted as a JSON string that is a map of column properties to their respective column formats, e.g., ‘{ “date” : “%Y.%m.%d”, “time” : “%H:%M:%S” }’. Column formats are specified as a string of control characters and plain text. The supported control characters are ‘Y’, ‘m’, ‘d’, ‘H’, ‘M’, and ‘S’, which follow the Linux ‘strptime()’ specification, as well as ‘s’, which specifies seconds and fractional seconds (though the fractional component will be truncated past milliseconds).

Formats for the ‘date’ annotation must include the ‘Y’, ‘m’, and ‘d’ control characters. Formats for the ‘time’ annotation must include the ‘H’, ‘M’, and either ‘S’ or ‘s’ (but not both) control characters. Formats for the ‘datetime’ annotation must meet both the ‘date’ and ‘time’ control character requirements. For example, ‘{“datetime” : “%m/%d/%Y %H:%M:%S” }’ would be used to interpret text such as “05/04/2000 12:12:11”.

error_handling – Specifies how errors should be handled upon insertion. Allowed values are:

permissive – Records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped.

ignore_bad_records – Malformed records are skipped.

abort – Stops current insertion and aborts entire operation when an error is encountered. Primary key collisions are considered abortable errors in this mode.

The default value is ‘abort’.

external_table_type – Specifies whether the external table holds a local copy of the external data. Allowed values are:

materialized – Loads a copy of the external data into the database, refreshed on demand

logical – External data will not be loaded into the database; the data will be retrieved from the source upon servicing each query against the external table

The default value is ‘materialized’.

file_type – Specifies the type of the file(s) whose records will be inserted. Allowed values are:

avro – Avro file format

delimited_text – Delimited text file format; e.g., CSV, TSV, PSV, etc.

gdb – Esri/GDB file format

json – JSON file format

parquet – Apache Parquet file format

shapefile – ShapeFile file format

The default value is ‘delimited_text’.

flatten_columns – Specifies how to handle nested columns. Allowed values are:

true – Break up nested columns to multiple columns

false – Treat nested columns as json columns instead of flattening

The default value is ‘false’.

gdal_configuration_options – Comma-separated list of GDAL configuration options to apply to this request, each in key=value format.

ignore_existing_pk – Specifies the record collision error-suppression policy for inserting into a table with a primary key, only used when not in upsert mode (upsert mode is disabled when update_on_existing_pk is false). If set to true, any record being inserted that is rejected for having primary key values that match those of an existing table record will be ignored with no error generated. If false, the rejection of any record for having primary key values matching an existing record will result in an error being reported, as determined by error_handling. If the specified table does not have a primary key or if upsert mode is in effect (update_on_existing_pk is true), then this option has no effect. Allowed values are:

true – Ignore new records whose primary key values collide with those of existing records

false – Treat as errors any new records whose primary key values collide with those of existing records

The default value is ‘false’.

ingestion_mode – Whether to do a full load, dry run, or perform a type inference on the source data. Allowed values are:

full – Run a type inference on the source data (if needed) and ingest

dry_run – Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of error_handling.

type_inference_only – Infer the type of the source data and return, without ingesting any data. The inferred type is returned in the response.

The default value is ‘full’.

jdbc_fetch_size – The JDBC fetch size, which determines how many rows to fetch per round trip. The default value is ‘50000’.

kafka_consumers_per_rank – Number of Kafka consumer threads per rank (valid range 1-6). The default value is ‘1’.

kafka_group_id – The group id to be used when consuming data from a Kafka topic (valid only for Kafka datasource subscriptions).

kafka_offset_reset_policy – Policy to determine whether the Kafka data consumption starts either at earliest offset or latest offset. Allowed values are:

earliest

latest

The default value is ‘earliest’.

kafka_optimistic_ingest – Enable optimistic ingestion where Kafka topic offsets and table data are committed independently to achieve parallelism. Allowed values are:

true

false

The default value is ‘false’.

kafka_subscription_cancel_after – Sets the Kafka subscription lifespan (in minutes). Expired subscription will be cancelled automatically.

kafka_type_inference_fetch_timeout – Maximum time to collect Kafka messages before type inferencing on the set of them.

layer – Comma-separated list of the layer name(s) to load from geo files.

loading_mode – Scheme for distributing the extraction and loading of data from the source data file(s). This option applies only when loading files that are local to the database. Allowed values are:

head – The head node loads all data. All files must be available to the head node.

distributed_shared – The head node coordinates loading data by worker processes across all nodes from shared files available to all workers.

NOTE:

Instead of existing on a shared source, the files can be duplicated on a source local to each host to improve performance, though the files must appear as the same data set from the perspective of all hosts performing the load.

distributed_local – A single worker process on each node loads all files that are available to it. This option works best when each worker loads files from its own file system, to maximize performance. In order to avoid data duplication, either each worker performing the load needs to have visibility to a set of files unique to it (no file is visible to more than one node) or the target table needs to have a primary key (which will allow the worker to automatically deduplicate data).

NOTE:

If the target table doesn’t exist, the table structure will be determined by the head node. If the head node has no files local to it, it will be unable to determine the structure and the request will fail.

If the head node is configured to have no worker processes, no data strictly accessible to the head node will be loaded.

The default value is ‘head’.

local_time_offset – Apply an offset to Avro local timestamp columns.

max_records_to_load – Limit the number of records to load in this request: if this number is larger than batch_size, then the number of records loaded will be limited to the next whole number of batch_size (per working thread).

num_tasks_per_rank – Number of tasks for reading file per rank. Default will be system configuration parameter, external_file_reader_num_tasks.

poll_interval – When subscribe is true, the number of seconds between attempts to load external files into the table. If zero, polling will be continuous as long as data is found. If no data is found, the interval will steadily increase to a maximum of 60 seconds. The default value is ‘0’.

primary_keys – Comma separated list of column names to set as primary keys, when not specified in the type.

refresh_method – Method by which the table can be refreshed from its source data. Allowed values are:

manual – Refresh only occurs when manually requested by invoking the refresh action of GPUdb.alter_table() on this table.

on_start – Refresh table on database startup and when manually requested by invoking the refresh action of GPUdb.alter_table() on this table.

The default value is ‘manual’.

schema_registry_connection_retries – Number of times to retry connecting to the Confluent Schema Registry.

schema_registry_connection_timeout – Confluent Schema Registry connection timeout (in seconds).

schema_registry_max_consecutive_connection_failures – Maximum number of records to skip due to schema registry connection failures before failing.

max_consecutive_invalid_schema_failure – Maximum number of records to skip due to schema-related errors before failing.

schema_registry_schema_name – Name of the Avro schema in the schema registry to use when reading Avro records.

shard_keys – Comma separated list of column names to set as shard keys, when not specified in the type.

skip_lines – Skip a number of lines from the beginning of the file.

start_offsets – Starting offsets by partition to fetch from Kafka. A comma-separated list of partition:offset pairs.

subscribe – Continuously poll the data source to check for new data and load it into the table. Allowed values are:

true

false

The default value is ‘false’.

table_insert_mode – Insertion scheme to use when inserting records from multiple shapefiles. Allowed values are:

single – Insert all records into a single table.

table_per_file – Insert records from each file into a new table corresponding to that file.

The default value is ‘single’.

text_comment_string – Specifies the character string that should be interpreted as a comment line prefix in the source data. All lines in the data starting with the provided string are ignored.

For delimited_text file_type only. The default value is ‘#’.

text_delimiter – Specifies the character delimiting field values in the source data and field names in the header (if present).

For delimited_text file_type only. The default value is ‘,’.

text_escape_character – Specifies the character that is used to escape other characters in the source data.

An ‘a’, ‘b’, ‘f’, ‘n’, ‘r’, ‘t’, or ‘v’ preceded by an escape character will be interpreted as the ASCII bell, backspace, form feed, line feed, carriage return, horizontal tab, & vertical tab, respectively. For example, the escape character followed by an ‘n’ will be interpreted as a newline within a field value.

The escape character can also be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not.

For delimited_text file_type only.

text_has_header – Indicates whether the source data contains a header row.

For delimited_text file_type only. Allowed values are:

true

false

The default value is ‘true’.

text_header_property_delimiter – Specifies the delimiter for column properties in the header row (if present). Cannot be set to same value as text_delimiter.

For delimited_text file_type only. The default value is ‘|’.

text_null_string – Specifies the character string that should be interpreted as a null value in the source data.

For delimited_text file_type only. The default value is ‘\N’.

text_quote_character – Specifies the character that should be interpreted as a field value quoting character in the source data. The character must appear at beginning and end of field value to take effect. Delimiters within quoted fields are treated as literals and not delimiters. Within a quoted field, two consecutive quote characters will be interpreted as a single literal quote character, effectively escaping it. To not have a quote character, specify an empty string.

For delimited_text file_type only. The default value is ‘”’.

text_search_columns – Adds the ‘text_search’ property to internally inferred string columns. Comma-separated list of column names, or ‘*’ for all columns. To add the ‘text_search’ property only to string columns greater than or equal to a minimum size, also set text_search_min_column_length.

text_search_min_column_length – Set the minimum column size for strings to apply the ‘text_search’ property to. Used only when text_search_columns has a value.

truncate_strings – If set to true, truncate string values that are longer than the column’s type size. Allowed values are:

true

false

The default value is ‘false’.

truncate_table – If set to true, truncates the table specified by input parameter table_name prior to loading the file(s). Allowed values are:

true

false

The default value is ‘false’.

type_inference_max_records_read

type_inference_mode – Optimize type inferencing for either speed or accuracy. Allowed values are:

accuracy – Scans data to get exactly-typed & sized columns for all data scanned.

speed – Scans a minimal amount of data and picks the widest possible column types so that all values will fit.

The default value is ‘speed’.

remote_query – Remote SQL query from which data will be sourced

remote_query_filter_column – Name of column to be used for splitting remote_query into multiple sub-queries using the data distribution of given column

remote_query_increasing_column – Column on subscribed remote query result that will increase for new records (e.g., TIMESTAMP).

remote_query_partition_column – Alias name for remote_query_filter_column.

update_on_existing_pk – Specifies the record collision policy for inserting into a table with a primary key. If set to true, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record (the new data will be ‘upserted’). If set to false, any existing table record with primary key values that match those of a record being inserted will remain unchanged, while the new record will be rejected and the error handled as determined by ignore_existing_pk & error_handling. If the specified table does not have a primary key, then this option has no effect. Allowed values are:

true – Upsert new records when primary keys match existing records

false – Reject new records when primary keys match existing records

The default value is ‘false’.

The default value is an empty dict ( {} ).
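As a rough illustration of the option formats described above, the sketch below assembles an options dict for a delimited-text load. Every value is a string, and column_formats and default_column_formats are JSON strings. The column name order_date is taken from the example above; the bad-record table name is hypothetical. The final check only confirms that a format built from standard strptime() control characters parses the sample text locally; the server-specific ‘s’ control character is not covered by Python's parser.

```python
import json
from datetime import datetime

# Default formats per column property, plus a per-column override.
default_column_formats = {"date": "%Y.%m.%d", "time": "%H:%M:%S"}
column_formats = {"order_date": {"date": "%m/%d/%Y"}}

# All option values must be strings; nested maps are passed as JSON strings.
options = {
    "file_type": "delimited_text",
    "text_delimiter": ",",
    "text_has_header": "true",
    "error_handling": "permissive",
    "bad_record_table_name": "example_bad_records",  # hypothetical table name
    "default_column_formats": json.dumps(default_column_formats),
    "column_formats": json.dumps(column_formats),
}

# Local sanity check of a datetime format before sending it to the server.
sample = datetime.strptime("05/04/2000 12:12:11", "%m/%d/%Y %H:%M:%S")
print(sample.isoformat())
```

The resulting dict would be passed as the options argument of the request; checking formats locally first can catch a mistyped control character before a long load starts.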

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name.

type_id (str) –
ID of the currently registered table structure type for this external table

type_definition (str) –
A JSON string describing the columns of the created external table

type_label (str) –
The user-defined description associated with the table’s structure

type_properties (dict of str to lists of str) –
A mapping of each external table column name to an array of column properties associated with that column

count_inserted (long) –
Number of records inserted into the external table.

count_skipped (long) –
Number of records skipped, when not running in abort error handling mode.

count_updated (long) –
[Not yet implemented] Number of records updated within the external table.

info (dict of str to str) –
Additional information.

files (list of str)

create_table_monitor(table_name=None, options={})[source]

Creates a monitor that watches for a single table modification event type (insert, update, or delete) on a particular table (identified by input parameter table_name) and forwards event notifications to subscribers via ZMQ. After this call completes, subscribe to the returned output parameter topic_id on the ZMQ table monitor port (default 9002). Each time an operation of the given type on the table completes, a multipart message is published for that topic; the first part contains only the topic ID, and each subsequent part contains one binary-encoded Avro object that corresponds to the event and can be decoded using output parameter type_schema. The monitor will continue to run (regardless of whether or not there are any subscribers) until deactivated with GPUdb.clear_table_monitor().

For more information on table monitors, see Table Monitors.

Parameters

table_name (str) –
Name of the table to monitor, in [schema_name.]table_name format, using standard name resolution rules.

options (dict of str to str) –
Optional parameters. Allowed keys are:

event – Type of modification event on the target table to be monitored by this table monitor. Allowed values are:

insert – Get notifications of new record insertions. The new row images are forwarded to the subscribers.

update – Get notifications of update operations. The modified row count information is forwarded to the subscribers.

delete – Get notifications of delete operations. The deleted row count information is forwarded to the subscribers.

The default value is ‘insert’.

monitor_id – ID to use for this monitor instead of a randomly generated one

datasink_name – Name of an existing data sink to send change data notifications to

destination – Destination for the output data in format ‘destination_type://path[:port]’. Supported destination types are ‘http’, ‘https’ and ‘kafka’.

kafka_topic_name – Name of the Kafka topic to publish to if destination in input parameter options is specified and is a Kafka broker

increasing_column – Column on subscribed table that will increase for new records (e.g., TIMESTAMP).

expression – Filter expression to limit records for notification

join_table_names – A comma-separated list of tables (optionally with aliases) to include in the join. The monitored table input parameter table_name must be included, representing only the newly inserted rows (deltas) since the last notification. Other tables can be any existing tables or views. Aliases can be used with the ‘table_name as alias’ syntax.

join_column_names – A comma-separated list of columns or expressions to include from the joined tables. Column references can use table names or aliases defined in ‘join_table_names’. Each column can optionally be aliased using ‘as’. The selected columns will also appear in the notification output.

join_expressions – Optional filter or join expressions to apply when combining the tables. Expressions are standard SQL-style conditions and can reference any table or alias listed in ‘join_table_names’. This corresponds to the WHERE clause of the underlying join, and can include conditions to filter the delta rows.

refresh_method – Method controlling when the table monitor reports changes to the input parameter table_name. Allowed values are:

on_change – Report changes as they occur.

periodic – Report changes periodically at rate specified by refresh_period.

The default value is ‘on_change’.

refresh_period – When refresh_method is periodic, specifies the period in seconds at which changes are reported.

refresh_start_time – When refresh_method is periodic, specifies the first time at which changes are reported. Value is a datetime string with format ‘YYYY-MM-DD HH:MM:SS’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

topic_id (str) –
The ZMQ topic ID to subscribe to for table events.

table_name (str) –
Value of input parameter table_name.

type_schema (str) –
JSON Avro schema of the table, for use in decoding published records.

info (dict of str to str) –
Additional information. Allowed keys are:

ttl – For insert_table/delete_table events, the ttl of the table.

insert_topic_id – The topic id for ‘insert’ event in input parameter options

update_topic_id – The topic id for ‘update’ event in input parameter options

delete_topic_id – The topic id for ‘delete’ event in input parameter options

insert_type_schema – The JSON Avro schema of the table in output parameter table_name

update_type_schema – The JSON Avro schema for ‘update’ events

delete_type_schema – The JSON Avro schema for ‘delete’ events

The default value is an empty dict ( {} ).
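The subscription flow described for create_table_monitor can be sketched roughly as follows. This is a minimal sketch under several assumptions: the pyzmq package is installed, the topic ID is sent as UTF-8 text, the monitor port is the default 9002, and the helper names split_monitor_message and listen are hypothetical. Decoding the Avro payloads with type_schema is left out.

```python
def split_monitor_message(parts):
    """Split a multipart monitor message into (topic_id, payloads).

    The first part carries only the topic ID; each later part is one
    binary-encoded Avro object.
    """
    if not parts:
        raise ValueError("empty multipart message")
    topic_id = parts[0].decode("utf-8")  # assumption: topic ID is UTF-8 text
    return topic_id, list(parts[1:])


def listen(host, topic_id, port=9002, max_messages=1):
    """Subscribe to a monitor topic and yield (topic_id, payloads) tuples."""
    import zmq  # pyzmq; imported lazily so the helper above needs no extras

    context = zmq.Context()
    socket = context.socket(zmq.SUB)
    try:
        socket.connect(f"tcp://{host}:{port}")
        socket.setsockopt_string(zmq.SUBSCRIBE, topic_id)
        for _ in range(max_messages):
            yield split_monitor_message(socket.recv_multipart())
    finally:
        socket.close()
        context.term()


# Usage against a live server (not run here; table name is hypothetical):
# response = db.create_table_monitor(table_name="example.orders",
#                                    options={"event": "insert"})
# for topic, payloads in listen("your_server_ip_or_FQDN", response["topic_id"]):
#     ...  # decode each payload using response["type_schema"]
```

Keeping the message-splitting step separate from the socket handling makes it possible to exercise the parsing logic without a running server.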

create_trigger_by_area(request_id=None, table_names=None, x_column_name=None, x_vector=None, y_column_name=None, y_vector=None, options={})[source]

Sets up an area trigger mechanism on two columns for one or more tables. (This function is essentially the two-dimensional version of GPUdb.create_trigger_by_range().) Once the trigger has been activated, any record added to the listed table(s) via GPUdb.insert_records() with the chosen columns’ values falling within the specified region will trip the trigger. All such records will be queued at the trigger port (‘9001’ by default; the configured value can be retrieved via GPUdb.show_system_status()) for any listening client to collect. Active triggers can be cancelled by using the GPUdb.clear_trigger() endpoint or by clearing all relevant tables.

The output returns the trigger handle as well as indicating success or failure of the trigger activation.

Parameters

request_id (str) –
User-created ID for the trigger. The ID can be alphanumeric, contain symbols, and must contain at least one character.

table_names (list of str) –
Names of the tables on which the trigger will be activated and maintained, each in [schema_name.]table_name format, using standard name resolution rules. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

x_column_name (str) –
Name of a numeric column on which the trigger is activated. Usually ‘x’ for geospatial data points.

x_vector (list of floats) –
The respective coordinate values for the region on which the trigger is activated. This usually translates to the x-coordinates of a geospatial region. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

y_column_name (str) –
Name of a second numeric column on which the trigger is activated. Usually ‘y’ for geospatial data points.

y_vector (list of floats) –
The respective coordinate values for the region on which the trigger is activated. This usually translates to the y-coordinates of a geospatial region. Must be the same length as x_vector. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

trigger_id (str) –
Value of input parameter request_id.

info (dict of str to str) –
Additional information.
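A minimal sketch of preparing the arguments for create_trigger_by_area, checking the documented constraints before the request is sent. The helper name area_trigger_args, the table name, and the coordinate values are hypothetical; the coordinates are assumed here to be the corner points of a rectangular region.

```python
def area_trigger_args(request_id, table_names, x_column_name, x_vector,
                      y_column_name, y_vector):
    """Build keyword arguments for create_trigger_by_area after basic checks."""
    if not request_id:
        raise ValueError("request_id must contain at least one character")
    if len(x_vector) != len(y_vector):
        raise ValueError("x_vector and y_vector must have the same length")
    if isinstance(table_names, str):
        table_names = [table_names]  # a single name is promoted to a list
    return {
        "request_id": request_id,
        "table_names": list(table_names),
        "x_column_name": x_column_name,
        "x_vector": [float(v) for v in x_vector],
        "y_column_name": y_column_name,
        "y_vector": [float(v) for v in y_vector],
        "options": {},
    }


args = area_trigger_args(
    "area-trigger-1", "example.vehicles",
    "x", [-75.0, -75.0, -74.0, -74.0],
    "y", [39.0, 40.0, 40.0, 39.0],
)
# response = db.create_trigger_by_area(**args)  # requires a live server
```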

create_trigger_by_range(request_id=None, table_names=None, column_name=None, min=None, max=None, options={})[source]

Sets up a simple range trigger on a column for one or more tables. Once the trigger has been activated, any record added to the listed table(s) via GPUdb.insert_records() with the chosen column’s value falling within the specified range will trip the trigger. All such records will be queued at the trigger port (‘9001’ by default; the configured value can be retrieved via GPUdb.show_system_status()) for any listening client to collect. Active triggers can be cancelled by using the GPUdb.clear_trigger() endpoint or by clearing all relevant tables.

The output returns the trigger handle as well as indicating success or failure of the trigger activation.

Parameters

request_id (str) –
User-created ID for the trigger. The ID can be alphanumeric, contain symbols, and must contain at least one character.

table_names (list of str) –
Tables on which the trigger will be active, each in [schema_name.]table_name format, using standard name resolution rules. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

column_name (str) –
Name of a numeric column on which the trigger is activated.

min (float) –
The lower bound (inclusive) for the trigger range.

max (float) –
The upper bound (inclusive) for the trigger range.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

trigger_id (str) –
Value of input parameter request_id.

info (dict of str to str) –
Additional information.
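Because both bounds are inclusive, a value exactly equal to min or max trips the trigger. The sketch below models that rule on the client side and shows a possible argument set; it is an illustration of the documented semantics, not the server's implementation, and the request ID, table name, and column name are hypothetical.

```python
def trips_range_trigger(value, lower, upper):
    """Return True if value lies in the inclusive range [lower, upper]."""
    return lower <= value <= upper


range_args = {
    "request_id": "range-trigger-1",
    "table_names": ["example.sensors"],
    "column_name": "temperature",
    "min": 10.0,
    "max": 20.0,
    "options": {},
}
# response = db.create_trigger_by_range(**range_args)  # requires a live server

print(trips_range_trigger(20.0, range_args["min"], range_args["max"]))
```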

create_type(type_definition=None, label=None, properties={}, options={})[source]

Creates a new type describing the columns of a table. The type definition is specified as a list of columns, each specified as a list of the column name, data type, and any column attributes.

Example of a type definition with some parameters:

[
    ["id", "int8", "primary_key"],
    ["dept_id", "int8", "primary_key", "shard_key"],
    ["manager_id", "int8", "nullable"],
    ["first_name", "char32"],
    ["last_name", "char64"],
    ["salary", "decimal"],
    ["hire_date", "date"]
]

Each column definition consists of the column name (which should meet the standard column naming criteria), the column’s specific type (int, long, float, double, string, bytes, or any of the possible values for input parameter properties), and any data handling, data key, or data replacement properties.

Note that some properties are mutually exclusive; i.e., they cannot be specified together on the same column. For example, primary_key and nullable are mutually exclusive.

A single primary key and/or single shard key can be set across one or more columns. If a primary key is specified, then a uniqueness constraint is enforced, in that only a single object can exist with a given primary key column value (or set of values for the key columns, if using a composite primary key). When inserting data into a table with a primary key, depending on the parameters in the request, incoming objects with primary key values that match existing objects will either overwrite (i.e. update) the existing object or will be skipped and not added into the set.
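Since type_definition is passed as a JSON string, one way to build it is to keep the column list as a Python structure and serialize it. The sketch below reuses the example columns above and adds a small local check for the primary_key/nullable conflict; the helper name check_columns and the label value are hypothetical, and the server performs its own validation regardless.

```python
import json

columns = [
    ["id", "int8", "primary_key"],
    ["dept_id", "int8", "primary_key", "shard_key"],
    ["manager_id", "int8", "nullable"],
    ["first_name", "char32"],
    ["last_name", "char64"],
    ["salary", "decimal"],
    ["hire_date", "date"],
]


def check_columns(columns):
    """Reject columns that combine primary_key with nullable."""
    for name, col_type, *props in columns:
        if "primary_key" in props and "nullable" in props:
            raise ValueError(
                f"column {name!r}: primary_key and nullable are mutually exclusive"
            )


check_columns(columns)
type_definition = json.dumps(columns)
# response = db.create_type(type_definition=type_definition, label="employee")
```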

Parameters

type_definition (str) –
A JSON string describing the columns of the type to be registered, as described above.

label (str) –
A user-defined description string which can be used to differentiate between tables and types with otherwise identical schemas.

properties (dict of str to lists of str) –
[DEPRECATED–please use these property values in the input parameter type_definition directly, as described at the top, instead] Each key-value pair specifies the properties to use for a given column where the key is the column name. All keys used must be relevant column names for the given table. Specifying any property overrides the default properties for that column (which is based on the column’s data type). Allowed values are:

data – Default property for all numeric and string type columns; makes the column available for GPU queries.

text_search – Valid only for select ‘string’ columns. Enables full text search–see Full Text Search for details and applicable string column types.

timestamp – Valid only for ‘long’ columns. Indicates that this field represents a timestamp and will be provided in milliseconds since the Unix epoch: 00:00:00 Jan 1 1970. Dates represented by a timestamp must fall between the year 1000 and the year 2900.

ulong – Valid only for ‘string’ columns. It represents an unsigned long integer data type. The string can only be interpreted as an unsigned long data type with minimum value of zero, and maximum value of 18446744073709551615.

uuid – Valid only for ‘string’ columns. It represents a UUID data type. Internally, it is stored as a 128-bit integer.

decimal – Valid only for ‘string’ columns. It represents a SQL type NUMERIC(19, 4) data type. There can be up to 15 digits before the decimal point and up to four digits in the fractional part. The value can be positive or negative (indicated by a minus sign at the beginning). This property is mutually exclusive with the text_search property.

date – Valid only for ‘string’ columns. Indicates that this field represents a date and will be provided in the format ‘YYYY-MM-DD’. The allowable range is 1000-01-01 through 2900-01-01. This property is mutually exclusive with the text_search property.

time – Valid only for ‘string’ columns. Indicates that this field represents a time-of-day and will be provided in the format ‘HH:MM:SS.mmm’. The allowable range is 00:00:00.000 through 23:59:59.999. This property is mutually exclusive with the text_search property.

datetime – Valid only for ‘string’ columns. Indicates that this field represents a datetime and will be provided in the format ‘YYYY-MM-DD HH:MM:SS.mmm’. The allowable range is 1000-01-01 00:00:00.000 through 2900-01-01 23:59:59.999. This property is mutually exclusive with the text_search property.

char1 – This property provides optimized memory, disk and query performance for string columns. Strings with this property must be no longer than 1 character.

char2 – This property provides optimized memory, disk and query performance for string columns. Strings with this property must be no longer than 2 characters.

char4 – This property provides optimized memory, disk and query performance for string columns. Strings with this property must be no longer than 4 characters.

char8 – This property provides optimized memory, disk and query performance for string columns. Strings with this property must be no longer than 8 characters.

char16 – This property provides optimized memory, disk and query performance for string columns. Strings with this property must be no longer than 16 characters.

char32 – This property provides optimized memory, disk and query performance for string columns. Strings with this property must be no longer than 32 characters.

char64 – This property provides optimized memory, disk and query performance for string columns. Strings with this property must be no longer than 64 characters.

char128 – This property provides optimized memory, disk and query performance for string columns. Strings with this property must be no longer than 128 characters.

char256 – This property provides optimized memory, disk and query performance for string columns. Strings with this property must be no longer than 256 characters.

boolean – This property provides optimized memory and query performance for int columns. Ints with this property must be between 0 and 1 (inclusive).

int8 – This property provides optimized memory and query performance for int columns. Ints with this property must be between -128 and +127 (inclusive).

int16 – This property provides optimized memory and query performance for int columns. Ints with this property must be between -32768 and +32767 (inclusive).

ipv4 – This property provides optimized memory, disk and query performance for string columns representing IPv4 addresses (e.g., 192.168.1.1). Strings with this property must be of the form A.B.C.D, where A, B, C and D are in the range 0-255.

array – Valid only for ‘string’ columns. Indicates that this field contains an array. The value type and (optionally) the item count should be specified in parentheses; e.g., ‘array(int, 10)’ for a 10-integer array. Both ‘array(int)’ and ‘array(int, -1)’ will designate an unlimited-length integer array, though no bounds checking is performed on arrays of any length.

json – Valid only for ‘string’ columns. Indicates that this field contains values in JSON format.

vector – Valid only for ‘bytes’ columns. Indicates that this field contains a vector of floats. The length should be specified in parentheses, e.g., ‘vector(1000)’.

wkt – Valid only for ‘string’ and ‘bytes’ columns. Indicates that this field contains geospatial geometry objects in Well-Known Text (WKT) or Well-Known Binary (WKB) format.

primary_key – This property indicates that this column will be part of (or the entire) primary key.

soft_primary_key – This property indicates that this column will be part of (or the entire) soft primary key.

shard_key – This property indicates that this column will be part of (or the entire) shard key.

nullable – This property indicates that this column is nullable. However, setting this property is insufficient for making the column nullable. The user must declare the type of the column as a union between its regular type and ‘null’ in the Avro schema for the record type in input parameter type_definition. For example, if a column is of type integer and is nullable, then the entry for the column in the Avro schema must be: [‘int’, ‘null’].

The C++, C#, Java, and Python APIs have built-in convenience for bypassing setting the Avro schema by hand. For those languages, one can use this property as usual and not have to worry about the Avro schema for the record.

compress – This property indicates that this column should be compressed with the given codec and optional level; e.g., ‘compress(snappy)’ for Snappy compression and ‘compress(zstd(7))’ for zstd level 7 compression. This property is primarily used in order to save disk space.

dict – This property indicates that this column should be dictionary encoded. It can only be used in conjunction with restricted string (charN), int, long or date columns. Dictionary encoding is best for columns where the cardinality (the number of unique values) is expected to be low. This property can save a large amount of memory.

init_with_now – For ‘date’, ‘time’, ‘datetime’, or ‘timestamp’ column types, replace empty strings and invalid timestamps with ‘NOW()’ upon insert.

init_with_uuid – For ‘uuid’ type, replace empty strings and invalid UUID values with randomly-generated UUIDs upon insert.

update_with_now – For ‘date’, ‘time’, ‘datetime’, or ‘timestamp’ column types, always update the field with ‘NOW()’ upon any update.

The default value is an empty dict ( {} ).
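As an illustration of how the column properties above fit together with the Avro schema, the following is a minimal sketch that builds a type_definition string and a matching properties dict by hand. The helper name build_type_args, the record name "type_name" and the column names are assumptions made for this example only; the Python API's built-in convenience for nullable columns, mentioned above, may make the manual union unnecessary.

```python
import json


def build_type_args(columns):
    """Build (type_definition, properties) from simple column specs.

    `columns` is a list of (name, avro_type, [property, ...]) tuples.
    This helper is hypothetical and not part of the gpudb package.
    """
    fields = []
    properties = {}
    for name, avro_type, props in columns:
        # A nullable column is declared as a union with 'null' in the Avro schema.
        field_type = [avro_type, "null"] if "nullable" in props else avro_type
        fields.append({"name": name, "type": field_type})
        if props:
            properties[name] = list(props)
    # The record name is an arbitrary placeholder chosen for this sketch.
    schema = {"type": "record", "name": "type_name", "fields": fields}
    return json.dumps(schema), properties


type_definition, properties = build_type_args([
    ("id", "int", ["primary_key"]),
    ("code", "string", ["char4", "dict"]),
    ("score", "int", ["nullable"]),
])
```

The two resulting values correspond to input parameters type_definition and properties.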

options (dict of str to str) –
Optional parameters. Allowed keys are:

compression_codec – The default compression codec for this type’s columns.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

type_id (str) –
An identifier representing the created type. This type_id can be used in subsequent calls to create a table.

type_definition (str) –
Value of input parameter type_definition.

label (str) –
Value of input parameter label.

properties (dict of str to lists of str) –
Value of input parameter properties.

info (dict of str to str) –
Additional information.

create_union(table_name=None, table_names=None, input_column_names=None, output_column_names=None, options={})[source]

Merges data from one or more tables with comparable data types into a new table.

The following merges are supported:

UNION (DISTINCT/ALL) - For data set union details and examples, see Union. For limitations, see Union Limitations and Cautions.

INTERSECT (DISTINCT/ALL) - For data set intersection details and examples, see Intersect. For limitations, see Intersect Limitations.

EXCEPT (DISTINCT/ALL) - For data set subtraction details and examples, see Except. For limitations, see Except Limitations.

MERGE VIEWS - For a given set of filtered views on a single table, creates a single filtered view containing all of the unique records across all of the given filtered data sets.

Non-charN ‘string’ and ‘bytes’ column types cannot be merged, nor can columns marked as store-only.

Parameters

table_name (str) –
Name of the table to be created, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria.

table_names (list of str) –
The list of table names to merge, in [schema_name.]table_name format, using standard name resolution rules. Must contain the names of one or more existing tables. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

input_column_names (list of lists of str) –
The list of columns from each of the corresponding input tables. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

output_column_names (list of str) –
The list of names of the columns to be stored in the output table. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter table_name. If persist is false (or unspecified), then this is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_table_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema for the projection as part of input parameter table_name and use GPUdb.create_schema() to create the schema if non-existent] Name of the schema for the output table. If the schema provided is non-existent, it will be automatically created. The default value is ‘’.

mode – The mode describes what rows of the tables being unioned will be retained. Allowed values are:

union_all – Retains all rows from the specified tables.

union – Retains all unique rows from the specified tables (synonym for union_distinct).

union_distinct – Retains all unique rows from the specified tables.

except – Retains all unique rows from the first table that do not appear in the second table (only works on 2 tables).

except_all – Retains all rows (including duplicates) from the first table that do not appear in the second table (only works on 2 tables).

intersect – Retains all unique rows that appear in both of the specified tables (only works on 2 tables).

intersect_all – Retains all rows (including duplicates) that appear in both of the specified tables (only works on 2 tables).

The default value is ‘union_all’.

long_hash – When true, use a 128-bit hash for the union_distinct, except, except_all, intersect and intersect_all modes. Otherwise, use a 64-bit hash.

chunk_size – Indicates the number of records per chunk to be used for this output table.

chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for this output table.

chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for this output table.

create_indexes – Comma-separated list of columns on which to create indexes on the output table. The columns specified must be present in input parameter output_column_names.

ttl – Sets the TTL of the output table specified in input parameter table_name.

persist – If true, then the output table specified in input parameter table_name will be persisted and will not expire unless a ttl is specified. If false, then the output table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:

true

false

The default value is ‘false’.

view_id – ID of view of which this output table is a member. The default value is ‘’.

force_replicated – If true, then the output table specified in input parameter table_name will be replicated even if the source tables are not. Allowed values are:

true

false

The default value is ‘false’.

strategy_definition – The tier strategy for the table and its columns.

compression_codec – The default compression codec for this table’s columns.

no_count – Return a count of 0 for the union table response to avoid the cost of counting; an optimization needed for virtual unions with many chunks. The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name.

info (dict of str to str) –
Additional information. Allowed keys are:

count – Number of records in the final table

qualified_table_name – The fully qualified name of the result table (i.e. including the schema)

The default value is an empty dict ( {} ).
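The parameters above can be assembled as in the following minimal sketch, which builds keyword arguments for GPUdb.create_union() and rejects the modes documented as working on 2 tables only when a different number of tables is given. The helper name build_union_kwargs and the table and column names are hypothetical. The check that input_column_names holds one list per input table is an assumption inferred from the parameter description.

```python
# Modes documented above as working on exactly two tables.
TWO_TABLE_MODES = {"except", "except_all", "intersect", "intersect_all"}


def build_union_kwargs(table_name, table_names, input_column_names,
                       output_column_names, mode="union_all", persist=False):
    """Return keyword arguments suitable for GPUdb.create_union()."""
    if mode in TWO_TABLE_MODES and len(table_names) != 2:
        raise ValueError("mode %r only works on 2 tables" % mode)
    # Assumption: one list of column names per input table.
    if len(input_column_names) != len(table_names):
        raise ValueError("need one column list per input table")
    # Option values are strings, so booleans become 'true' / 'false'.
    options = {"mode": mode, "persist": "true" if persist else "false"}
    return {
        "table_name": table_name,
        "table_names": list(table_names),
        "input_column_names": [list(cols) for cols in input_column_names],
        "output_column_names": list(output_column_names),
        "options": options,
    }


union_kwargs = build_union_kwargs(
    "example.all_ids",                      # hypothetical output table
    ["example.ids_a", "example.ids_b"],     # hypothetical input tables
    [["id"], ["id"]],
    ["id"],
    mode="union_distinct",
)
# With a connected client `db`, the call would be: db.create_union(**union_kwargs)
```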

create_user_external(name=None, options={})[source]

Creates a new external user (a user whose credentials are managed by an external LDAP).

Note

This method should be used for on-premise deployments only.

Parameters

name (str) –
Name of the user to be created. Must exactly match the user’s name in the external LDAP, prefixed with a @. Must not be the same name as an existing user.

options (dict of str to str) –
Optional parameters. Allowed keys are:

activated – Whether the user is allowed to log in. Allowed values are:

true – User may log in

false – User may not log in

The default value is ‘true’.

create_home_directory – When true, a home directory in KiFS is created for this user. Allowed values are:

true

false

The default value is ‘true’.

default_schema – Default schema to associate with this user

directory_data_limit – The maximum capacity to apply to the created directory if create_home_directory is true. Set to -1 to indicate no upper limit. If empty, the system default limit is applied.

resource_group – Name of an existing resource group to associate with this user

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

info (dict of str to str) –
Additional information.

create_user_internal(name=None, password=None, options={})[source]

Creates a new internal user (a user whose credentials are managed by the database system).

Parameters

name (str) –
Name of the user to be created. Must contain only lowercase letters, digits, and underscores, and cannot begin with a digit. Must not be the same name as an existing user or role.

password (str) –
Initial password of the user to be created. May be an empty string for no password.

options (dict of str to str) –
Optional parameters. Allowed keys are:

activated – Whether the user is allowed to log in. Allowed values are:

true – User may log in

false – User may not log in

The default value is ‘true’.

create_home_directory – When true, a home directory in KiFS is created for this user. Allowed values are:

true

false

The default value is ‘true’.

default_schema – Default schema to associate with this user

directory_data_limit – The maximum capacity to apply to the created directory if create_home_directory is true. Set to -1 to indicate no upper limit. If empty, the system default limit is applied.

resource_group – Name of an existing resource group to associate with this user

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

info (dict of str to str) –
Additional information. The default value is an empty dict ( {} ).
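The naming rule for internal users stated above (only lowercase letters, digits, and underscores, and no leading digit) can be checked on the client before calling GPUdb.create_user_internal(). The following is a minimal sketch; the function name is hypothetical, and it assumes "lowercase letters" means ASCII a-z, which the text above does not state explicitly. It does not check for clashes with existing users or roles.

```python
import re

# Assumption: lowercase letters are limited to ASCII a-z.
_INTERNAL_USER_NAME = re.compile(r"[a-z_][a-z0-9_]*")


def is_valid_internal_user_name(name):
    """Return True if `name` satisfies the documented character rules."""
    return _INTERNAL_USER_NAME.fullmatch(name) is not None


checks = {
    "analyst_1": is_valid_internal_user_name("analyst_1"),
    "1analyst": is_valid_internal_user_name("1analyst"),
    "Analyst": is_valid_internal_user_name("Analyst"),
}
```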

create_video(attribute=None, begin=None, duration_seconds=None, end=None, frames_per_second=None, style=None, path=None, style_parameters=None, options={})[source]

Creates a job to generate a sequence of raster images that visualize data over a specified time.

Parameters

attribute (str) –
The animated attribute to map to the video’s frames. Must be present in the LAYERS specified for the visualization. This is often a time-related field but may be any numeric type.

begin (str) –
The start point for the video. Accepts an expression evaluable over the input parameter attribute.

duration_seconds (float) –
Number of seconds of video to produce.

end (str) –
The end point for the video. Accepts an expression evaluable over the input parameter attribute.

frames_per_second (float) –
The presentation frame rate of the encoded video in frames per second.

style (str) –
The name of the visualize mode; should correspond to the schema used for the input parameter style_parameters field. Allowed values are:

chart

raster

classbreak

contour

heatmap

labels

path (str) –
Fully-qualified KiFS path. Write access is required. A file must not exist at that path, unless replace_if_exists is true.

style_parameters (str) –
A string containing the JSON-encoded visualize request. Must correspond to the visualize mode specified in the input parameter style field.

options (dict of str to str) –
Optional parameters. Allowed keys are:

ttl – Sets the TTL of the video.

window – Specified using the data-type corresponding to the input parameter attribute. For a window of size W, a video frame rendered for time t will visualize data in the interval [t-W,t]. The minimum window size is the interval between successive frames. The minimum value is the default. If a value less than the minimum value is specified, it is replaced with the minimum window size. Larger values will make changes throughout the video appear more smooth while smaller values will capture fast variations in the data.

no_error_if_exists – If true, does not return an error if the video already exists. Ignored if replace_if_exists is true. Allowed values are:

false

true

The default value is ‘false’.

replace_if_exists – If true, deletes any existing video with the same path before creating a new video. Allowed values are:

false

true

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

job_id (long) –
An identifier for the created job.

path (str) –
Fully qualified KiFS path to the video file.

info (dict of str to str) –
Additional information.

delete_directory(directory_name=None, options={})[source]

Deletes a directory from KiFS.

Parameters

directory_name (str) –
Name of the directory in KiFS to be deleted. The directory must contain no files, unless recursive is true.

options (dict of str to str) –
Optional parameters. Allowed keys are:

recursive – If true, will delete directory and all files residing in it. If false, directory must be empty for deletion. Allowed values are:

true

false

The default value is ‘false’.

no_error_if_not_exists – If true, no error is returned if specified directory does not exist. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

directory_name (str) –
Value of input parameter directory_name.

info (dict of str to str) –
Additional information.

delete_files(file_names=None, options={})[source]

Deletes one or more files from KiFS.

Parameters

file_names (list of str) –
An array of names of files to be deleted. File paths may contain wildcard characters after the KiFS directory delimiter.

Accepted wildcard characters are asterisk (*) to represent any string of zero or more characters, and question mark (?) to indicate a single character. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

no_error_if_not_exists – If true, no error is returned if a specified file does not exist. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

file_names (list of str) –
Names of the files deleted from KiFS

info (dict of str to str) –
Additional information.
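Because wildcard deletes are hard to undo, it can help to preview which names a pattern would select before passing it to GPUdb.delete_files(). The sketch below translates the two documented wildcard characters into a regular expression and filters a list of candidate names locally. The function names and file names are hypothetical, and the assumption that the asterisk matches any character (including a directory delimiter) may differ from how the server resolves patterns.

```python
import re


def wildcard_to_regex(pattern):
    """Translate '*' and '?' wildcards into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any string of zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def preview_matches(pattern, candidates):
    """Return the candidate names that the pattern would select."""
    regex = wildcard_to_regex(pattern)
    return [name for name in candidates if regex.fullmatch(name)]


names = ["data/report_1.csv", "data/report_10.csv", "data/notes.txt"]
single = preview_matches("data/report_?.csv", names)
many = preview_matches("data/*.csv", names)
```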

delete_graph(graph_name=None, options={})[source]

Deletes an existing graph from the graph server and/or persist.

Parameters

graph_name (str) –
Name of the graph to be deleted.

options (dict of str to str) –
Optional parameters. Allowed keys are:

delete_persist – If set to true, the graph is removed from the server and persist. If set to false, the graph is removed from the server but is left in persist. The graph can be reloaded from persist if it is recreated with the same ‘graph_name’. Allowed values are:

true

false

The default value is ‘true’.

server_id – Indicates which graph server(s) to send the request to. By default, the request is sent to all the servers.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

result (bool) –
Indicates a successful deletion.

info (dict of str to str) –
Additional information.

delete_proc(proc_name=None, options={})[source]

Deletes a proc. Any currently running instances of the proc will be killed.

Parameters

proc_name (str) –
Name of the proc to be deleted. Must be the name of a currently existing proc.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

proc_name (str) –
Value of input parameter proc_name.

info (dict of str to str) –
Additional information.

delete_records(table_name=None, expressions=None, options={})[source]

Deletes record(s) matching the provided criteria from the given table. The record selection criteria can be one or more input parameter expressions (matching multiple records), a single record identified by the record_id option, or all records when using delete_all_records. Note that the three selection criteria are mutually exclusive. This operation cannot be run on a view. The operation is synchronous, meaning that a response will not be available until the request is completely processed and all the matching records are deleted.

Parameters

table_name (str) –
Name of the table from which to delete records, in [schema_name.]table_name format, using standard name resolution rules. Must contain the name of an existing table; not applicable to views.

expressions (list of str) –
A list of the actual predicates, one for each select; format should follow the guidelines provided here. Specifying one or more input parameter expressions is mutually exclusive to specifying record_id in the input parameter options. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

global_expression – An optional global expression to reduce the search space of the input parameter expressions. The default value is ‘’.

record_id – A record ID identifying a single record, obtained at the time of insertion of the record or by calling GPUdb.get_records_from_collection() with the return_record_ids option. This option cannot be used to delete records from replicated tables.

delete_all_records – If set to true, all records in the table will be deleted. If set to false, then the option is effectively ignored. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

count_deleted (long) –
Total number of records deleted across all expressions.

counts_deleted (list of longs) –
Total number of records deleted per expression.

info (dict of str to str) –
Additional information.
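Since the three selection criteria are mutually exclusive, a small client-side guard can catch mixed requests before they reach the server. The following is a minimal sketch under that reading; the helper name build_delete_kwargs, the table name and the filter expression are hypothetical.

```python
def build_delete_kwargs(table_name, expressions=None, record_id=None,
                        delete_all=False):
    """Return keyword arguments for GPUdb.delete_records().

    Exactly one of expressions, record_id or delete_all must be chosen.
    """
    if isinstance(expressions, str):
        expressions = [expressions]  # mirror the single-element promotion
    chosen = [bool(expressions), record_id is not None, bool(delete_all)]
    if sum(chosen) != 1:
        raise ValueError("choose exactly one selection criterion")
    options = {}
    if record_id is not None:
        options["record_id"] = str(record_id)
    if delete_all:
        options["delete_all_records"] = "true"
    return {
        "table_name": table_name,
        "expressions": list(expressions or []),
        "options": options,
    }


by_filter = build_delete_kwargs("example.events", expressions="age < 0")
everything = build_delete_kwargs("example.events", delete_all=True)
# With a connected client `db`: db.delete_records(**by_filter)
```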

delete_resource_group(name=None, options={})[source]

Deletes a resource group.

Parameters

name (str) –
Name of the resource group to be deleted.

options (dict of str to str) –
Optional parameters. Allowed keys are:

cascade_delete – If true, delete any existing entities owned by this group. Otherwise, this request will return an error if any such entities exist. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

info (dict of str to str) –
Additional information.

delete_role(name=None, options={})[source]

Deletes an existing role.

Note

This method should be used for on-premise deployments only.

Parameters

name (str) –
Name of the role to be deleted. Must be an existing role.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

info (dict of str to str) –
Additional information.

delete_user(name=None, options={})[source]

Deletes an existing user.

Note

This method should be used for on-premise deployments only.

Parameters

name (str) –
Name of the user to be deleted. Must be an existing user.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

info (dict of str to str) –
Additional information.

download_files(file_names=None, read_offsets=None, read_lengths=None, options={})[source]

Downloads one or more files from KiFS.

Parameters

file_names (list of str) –
An array of the file names to download from KiFS. File paths may contain wildcard characters after the KiFS directory delimiter.

Accepted wildcard characters are asterisk (*) to represent any string of zero or more characters, and question mark (?) to indicate a single character. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

read_offsets (list of longs) –
An array of starting byte offsets from which to read each respective file in input parameter file_names. Must either be empty or the same length as input parameter file_names. If empty, files are downloaded in their entirety. If not empty, input parameter read_lengths must also not be empty. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

read_lengths (list of longs) –
Array of number of bytes to read from each respective file in input parameter file_names. Must either be empty or the same length as input parameter file_names. If empty, files are downloaded in their entirety. If not empty, input parameter read_offsets must also not be empty. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

file_encoding – Encoding to be applied to the output file data. When using JSON serialization it is recommended to specify this as base64. Allowed values are:

base64 – Apply base64 encoding to the output file data.

none – Do not apply any encoding to the output file data.

The default value is ‘none’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

file_names (list of str) –
Names of the files downloaded from KiFS

file_data (list of bytes) –
Data for the respective downloaded files listed in output parameter file_names

info (dict of str to str) –
Additional information.
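The constraints on read_offsets and read_lengths described above (both empty, or both the same length as file_names) can be verified before the request is sent. This is a minimal sketch; the helper name build_download_kwargs and the file paths are hypothetical.

```python
def build_download_kwargs(file_names, read_offsets=(), read_lengths=(),
                          file_encoding="none"):
    """Return keyword arguments for GPUdb.download_files()."""
    names = [file_names] if isinstance(file_names, str) else list(file_names)
    offsets, lengths = list(read_offsets), list(read_lengths)
    # Offsets and lengths must be given together or not at all.
    if bool(offsets) != bool(lengths):
        raise ValueError("read_offsets and read_lengths must both be set or both be empty")
    # When given, each must have one entry per requested file.
    if offsets and not (len(offsets) == len(lengths) == len(names)):
        raise ValueError("read_offsets and read_lengths must match file_names in length")
    if file_encoding not in ("base64", "none"):
        raise ValueError("file_encoding must be 'base64' or 'none'")
    return {
        "file_names": names,
        "read_offsets": offsets,
        "read_lengths": lengths,
        "options": {"file_encoding": file_encoding},
    }


whole = build_download_kwargs("reports/a.csv")
partial = build_download_kwargs(["reports/a.csv", "reports/b.csv"],
                                read_offsets=[0, 100], read_lengths=[64, 64])
# With a connected client `db`: db.download_files(**partial)
```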

drop_credential(credential_name=None, options={})[source]

Drops an existing credential.

Parameters

credential_name (str) –
Name of the credential to be dropped. Must be an existing credential.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

credential_name (str) –
Value of input parameter credential_name.

info (dict of str to str) –
Additional information.

drop_datasink(name=None, options={})[source]

Drops an existing data sink.

By default, if any table monitors use this sink as a destination, the request will be blocked unless option clear_table_monitors is true.

Parameters

name (str) –
Name of the data sink to be dropped. Must be an existing data sink.

options (dict of str to str) –
Optional parameters. Allowed keys are:

clear_table_monitors – If true, any table monitors that use this data sink will be cleared. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

info (dict of str to str) –
Additional information.

drop_datasource(name=None, options={})[source]

Drops an existing data source. Any external tables that depend on the data source must be dropped before it can be dropped.

Parameters

name (str) –
Name of the data source to be dropped. Must be an existing data source.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

info (dict of str to str) –
Additional information.

drop_environment(environment_name=None, options={})[source]

Drops an existing user-defined function (UDF) environment.

Parameters

environment_name (str) –
Name of the environment to be dropped. Must be an existing environment.

options (dict of str to str) –
Optional parameters. Allowed keys are:

no_error_if_not_exists – If true and if the environment specified in input parameter environment_name does not exist, no error is returned. If false and if the environment specified in input parameter environment_name does not exist, then an error is returned. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

environment_name (str) –
Value of input parameter environment_name.

info (dict of str to str) –
Additional information.

drop_schema(schema_name=None, options={})[source]

Drops an existing SQL-style schema, specified in input parameter schema_name.

Parameters

schema_name (str) –
Name of the schema to be dropped. Must be an existing schema.

options (dict of str to str) –
Optional parameters. Allowed keys are:

no_error_if_not_exists – If true and if the schema specified in input parameter schema_name does not exist, no error is returned. If false and if the schema specified in input parameter schema_name does not exist, then an error is returned. Allowed values are:

true

false

The default value is ‘false’.

cascade – If true, all tables within the schema will be dropped. If false, the schema will be dropped only if empty. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

schema_name (str) –
Value of input parameter schema_name.

info (dict of str to str) –
Additional information.

execute_proc(proc_name=None, params={}, bin_params={}, input_table_names=[], input_column_names={}, output_table_names=[], options={})[source]

Executes a proc. This endpoint is asynchronous and does not wait for the proc to complete before returning.

If the proc being executed is distributed, input parameter input_table_names & input parameter input_column_names may be passed to the proc to use for reading data, and input parameter output_table_names may be passed to the proc to use for writing data.

If the proc being executed is non-distributed, these table parameters will be ignored.

Parameters

proc_name (str) –
Name of the proc to execute. Must be the name of a currently existing proc.

params (dict of str to str) –
A map containing named parameters to pass to the proc. Each key/value pair specifies the name of a parameter and its value. The default value is an empty dict ( {} ).

bin_params (dict of str to bytes) –
A map containing named binary parameters to pass to the proc. Each key/value pair specifies the name of a parameter and its value. The default value is an empty dict ( {} ).

input_table_names (list of str) –
Names of the tables containing data to be passed to the proc. Each name specified must be the name of a currently existing table, in [schema_name.]table_name format, using standard name resolution rules. If no table names are specified, no data will be passed to the proc. This parameter is ignored if the proc has a non-distributed execution mode. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

input_column_names (dict of str to lists of str) –
Map of table names from input parameter input_table_names to lists of names of columns from those tables that will be passed to the proc. Each column name specified must be the name of an existing column in the corresponding table. If a table name from input parameter input_table_names is not included, all columns from that table will be passed to the proc. This parameter is ignored if the proc has a non-distributed execution mode. The default value is an empty dict ( {} ).

output_table_names (list of str) –
Names of the tables to which output data from the proc will be written, each in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If a specified table does not exist, it will automatically be created with the same schema as the corresponding table (by order) from input parameter input_table_names, excluding any primary and shard keys. If a specified table is a non-persistent result table, it must not have primary or shard keys. If no table names are specified, no output data can be returned from the proc. This parameter is ignored if the proc has a non-distributed execution mode. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

cache_input – No longer supported; option will be ignored. The default value is ‘’.

use_cached_input – No longer supported; option will be ignored. The default value is ‘’.

run_tag – A string that, if not empty, can be used in subsequent calls to GPUdb.show_proc_status() or GPUdb.kill_proc() to identify the proc instance. The default value is ‘’.

max_output_lines – The maximum number of lines of output from stdout and stderr to return via GPUdb.show_proc_status(). If the number of lines output exceeds the maximum, earlier lines are discarded. The default value is ‘100’.

execute_at_startup – If true, an instance of the proc will run when the database is started instead of running immediately. The output parameter run_id can be retrieved using GPUdb.show_proc() and used in GPUdb.show_proc_status(). Allowed values are:

true

false

The default value is ‘false’.

execute_at_startup_as – Sets the alternate user name to execute this proc instance as when execute_at_startup is true. The default value is ‘’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

run_id (str) –
The run ID of the running proc instance. This may be passed to GPUdb.show_proc_status() to obtain status information, or GPUdb.kill_proc() to kill the proc instance.

info (dict of str to str) –
Additional information.

execute_sql(statement=None, offset=0, limit=-9999, encoding='binary', request_schema_str='', data=[], options={})[source]

Execute a SQL statement (query, DML, or DDL).

See SQL Support for the complete set of supported SQL commands.

When a caller wants all the results from a large query (e.g., more than max_get_records_size records), they can make multiple calls to this endpoint, using input parameter offset and input parameter limit to page through the results. Normally, each call re-executes the input parameter statement query. To avoid re-executing the query and to keep the results in the same order, the caller should specify a paging_table name to hold the results of the query between calls, and specify the same paging_table on subsequent calls. In that case, the caller should clear the paging table and any other tables in the result_table_list (both returned in the response) once they are done paging through the results. Output parameter paging_table (and result_table_list) will be empty if no paging table was created (e.g., when all the query results were returned in the first call).
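The paging pattern described above might be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: iter_pages and FakeDb are hypothetical names, FakeDb only imitates the documented response fields so the sketch runs without a server, clear_table is assumed to be the cleanup call, and page_size is assumed not to exceed the server's max_get_records_size.

```python
import json


def iter_pages(db, statement, page_size, paging_table):
    """Yield one execute_sql response per page, then clear the paging tables."""
    options = {"paging_table": paging_table}
    offset = 0
    to_clear = set()
    try:
        while True:
            response = db.execute_sql(
                statement=statement,
                offset=offset,
                limit=page_size,
                encoding="json",
                options=options,
            )
            # Remember every table the response says was created for this query.
            if response.get("paging_table"):
                to_clear.add(response["paging_table"])
            extra = response.get("info", {}).get("result_table_list", "")
            to_clear.update(name for name in extra.split(",") if name)
            yield response
            if not response.get("has_more_records"):
                break
            offset += page_size  # assumes each page held page_size records
    finally:
        for name in sorted(to_clear):
            db.clear_table(table_name=name)  # assumed cleanup call


class FakeDb:
    """Stand-in for a GPUdb client so the sketch runs without a server."""

    def __init__(self, rows):
        self.rows = rows
        self.cleared = []

    def execute_sql(self, statement, offset, limit, encoding, options):
        page = self.rows[offset:offset + limit]
        return {
            "json_encoded_response": json.dumps(page),
            "has_more_records": offset + limit < len(self.rows),
            "paging_table": options["paging_table"],
            "info": {"result_table_list": ""},
        }

    def clear_table(self, table_name):
        self.cleared.append(table_name)


db = FakeDb(rows=[{"id": n} for n in range(5)])
pages = list(iter_pages(db, "SELECT id FROM example.t", 2, "example.t_pages"))
print(len(pages), db.cleared)
```

Putting the cleanup in a finally block means the paging tables are cleared even if the caller stops iterating early or a call raises.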

Parameters

statement (str) –
SQL statement (query, DML, or DDL) to be executed

offset (long) –
A non-negative integer indicating the number of initial results to skip (this can be useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.

limit (long) –
A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results allowed by the server should be returned. The number of records returned will never exceed the server’s own limit, defined by the max_get_records_size parameter in the server configuration. Use output parameter has_more_records to see if more records exist in the result to be fetched, and input parameter offset & input parameter limit to request subsequent pages of results. The default value is -9999.

encoding (str) –
Specifies the encoding for returned records; either ‘binary’ or ‘json’. Allowed values are:

binary

json

The default value is ‘binary’.

request_schema_str (str) –
Avro schema of input parameter data. The default value is ‘’.

data (list of bytes) –
An array of binary-encoded data for the records to be bound to the SQL query. Alternatively, use the query_parameters option to pass the data in JSON format. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

cost_based_optimization – If false, disables the cost-based optimization of the given query. Allowed values are:

true

false

The default value is ‘false’.

distributed_joins – If true, enables the use of distributed joins in servicing the given query. Any query requiring a distributed join will succeed, though hints can be used in the query to change the distribution of the source data to allow the query to succeed. Allowed values are:

true

false

The default value is ‘false’.

distributed_operations – If true, enables the use of distributed operations in servicing the given query. Any query requiring a distributed join will succeed, though hints can be used in the query to change the distribution of the source data to allow the query to succeed. Allowed values are:

true

false

The default value is ‘false’.

ignore_existing_pk – Specifies the record collision error-suppression policy for inserting into or updating a table with a primary key, only used when primary key record collisions are rejected (update_on_existing_pk is false). If set to true, any record insert/update that is rejected for resulting in a primary key collision with an existing table record will be ignored with no error generated. If false, the rejection of any insert/update for resulting in a primary key collision will cause an error to be reported. If the specified table does not have a primary key or if update_on_existing_pk is true, then this option has no effect. Allowed values are:

true – Ignore inserts/updates that result in primary key collisions with existing records

false – Treat as errors any inserts/updates that result in primary key collisions with existing records

The default value is ‘false’.

late_materialization – If true, Joins/Filters results will always be materialized (saved to result tables format). Allowed values are:

true

false

The default value is ‘false’.

paging_table – When specified (or paging_table_ttl is set), the system will create a paging table to hold the results of the query, when the output has more records than are in the response (i.e., when output parameter has_more_records is true). If the specified paging table exists, the records from the paging table are returned without re-evaluating the query. It is the caller’s responsibility to clear the output parameter paging_table and other tables in the result_table_list (both returned in the response) when they are done with this query.

paging_table_ttl – Sets the TTL of the paging table. -1 indicates no timeout. Setting this option will cause a paging table to be generated when needed. The output parameter paging_table and other tables in the result_table_list (both returned in the response) will be automatically cleared after the TTL expires, if set to a positive number. However, it is still recommended that the caller clear these tables when they are done with this query.

parallel_execution – If false, disables the parallel step execution of the given query. Allowed values are:

true

false

The default value is ‘true’.

plan_cache – If false, disables plan caching for the given query. Allowed values are:

true

false

The default value is ‘true’.

prepare_mode – If true, compiles a query into an execution plan and saves it in the query cache. Query execution is not performed and an empty response will be returned to the user. Allowed values are:

true

false

The default value is ‘false’.

preserve_dict_encoding – If true, then columns that were dict encoded in the source table will be dict encoded in the projection table. Allowed values are:

true

false

The default value is ‘true’.

query_parameters – Query parameters in JSON array or arrays (for inserting multiple rows). This can be used instead of input parameter data and input parameter request_schema_str.

results_caching – If false, disables caching of the results of the given query. Allowed values are:

true

false

The default value is ‘true’.

rule_based_optimization – If false, disables rule-based rewrite optimizations for the given query. Allowed values are:

true

false

The default value is ‘true’.

ssq_optimization – If false, scalar subqueries will be translated into joins. Allowed values are:

true

false

The default value is ‘true’.

ttl – Sets the TTL of the intermediate result tables used in query execution.

update_on_existing_pk – Specifies the record collision policy for inserting into or updating a table with a primary key. If set to true, any existing table record with primary key values that match those of a record being inserted or updated will be replaced by that record. If set to false, any such primary key collision will result in the insert/update being rejected and the error handled as determined by ignore_existing_pk. If the specified table does not have a primary key, then this option has no effect. Allowed values are:

true – Replace the collided-into record with the record inserted or updated when a new/modified record causes a primary key collision with an existing record

false – Reject the insert or update when it results in a primary key collision with an existing record

The default value is ‘false’.

validate_change_column – When changing a column using alter table, validate the change before applying it. If true, then validate all values. A value too large (or too long) for the new type will prevent any change. If false, then when a value is too large or long, it will be truncated. Allowed values are:

true

false

The default value is ‘true’.

current_schema – Use the supplied value as the default schema when processing this SQL command.

The default value is an empty dict ( {} ).
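As an illustration of the query_parameters option, the sketch below serializes several rows as a JSON array of arrays for a multi-row insert. The helper name insert_rows_options, the table example.people, and the use of ? as the placeholder syntax in the statement are assumptions made for this example; check the SQL documentation for the placeholder form your server accepts.

```python
import json


def insert_rows_options(rows, update_on_existing_pk=False):
    """Build execute_sql options that bind `rows` through query_parameters."""
    return {
        # A JSON array of arrays: one inner array per row to insert.
        "query_parameters": json.dumps([list(row) for row in rows]),
        # Option values are strings, so booleans become 'true' / 'false'.
        "update_on_existing_pk": "true" if update_on_existing_pk else "false",
    }


# Assumed placeholder syntax and hypothetical table name.
statement = "INSERT INTO example.people (id, name) VALUES (?, ?)"
options = insert_rows_options([(1, "Ada"), (2, "Grace")])
print(options["query_parameters"])
```

The dict would then be passed as execute_sql(statement=statement, options=options), leaving data and request_schema_str at their defaults.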

Returns

A dict with the following entries–

count_affected (long) –
The number of objects/records affected.

response_schema_str (str) –
Avro schema of output parameter binary_encoded_response or output parameter json_encoded_response.

binary_encoded_response (bytes) –
Avro binary encoded response.

json_encoded_response (str) –
Avro JSON encoded response.

total_number_of_records (long) –
Total/Filtered number of records.

has_more_records (bool) –
Too many records. Returned a partial set. Allowed values are:

True

False

paging_table (str) –
Name of the table that has the result records of the query. Valid when output parameter has_more_records is true. The caller should clear this and all tables in result_table_list when they are done querying.

info (dict of str to str) –
Additional information. Allowed keys are:

count – Number of records without final limits applied

result_table_list – List of tables, comma-separated, in addition to the output parameter paging_table, created as result of the query. These should be cleared by the caller when they are done querying.

The default value is an empty dict ( {} ).

record_type (RecordType or None) –
A RecordType object using which the user can decode the binary data by using GPUdbRecord.decode_binary_data(). If JSON encoding is used, then None.

execute_sql_and_decode(statement=None, offset=0, limit=-9999, encoding='binary', request_schema_str='', data=[], options={}, record_type=None, force_primitive_return_types=True, get_column_major=True)[source]

Execute a SQL statement (query, DML, or DDL).

See SQL Support for the complete set of supported SQL commands.

When a caller wants all the results from a large query (e.g., more than max_get_records_size records), they can make multiple calls to this endpoint, using input parameter offset and input parameter limit to page through the results. Normally, each call re-executes the input parameter statement query. To avoid re-executing the query and to keep the results in the same order, the caller should specify a paging_table name to hold the results of the query between calls, and specify the same paging_table on subsequent calls. In that case, the caller should clear the paging table and any other tables in the result_table_list (both returned in the response) once they are done paging through the results. Output parameter paging_table (and result_table_list) will be empty if no paging table was created (e.g., when all the query results were returned in the first call).

Parameters

statement (str) –
SQL statement (query, DML, or DDL) to be executed

offset (long) –
A non-negative integer indicating the number of initial results to skip (this can be useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.

limit (long) –
A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results allowed by the server should be returned. The number of records returned will never exceed the server’s own limit, defined by the max_get_records_size parameter in the server configuration. Use output parameter has_more_records to see if more records exist in the result to be fetched, and input parameter offset & input parameter limit to request subsequent pages of results. The default value is -9999.

encoding (str) –
Specifies the encoding for returned records; either ‘binary’ or ‘json’. Allowed values are:

binary

json

The default value is ‘binary’.

request_schema_str (str) –
Avro schema of input parameter data. The default value is ‘’.

data (list of bytes) –
An array of binary-encoded data for the records to be bound to the SQL query. Alternatively, use the query_parameters option to pass the data in JSON format. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

cost_based_optimization – If false, disables the cost-based optimization of the given query. Allowed values are:

true

false

The default value is ‘false’.

distributed_joins – If true, enables the use of distributed joins in servicing the given query. Any query requiring a distributed join will succeed, though hints can be used in the query to change the distribution of the source data to allow the query to succeed. Allowed values are:

true

false

The default value is ‘false’.

distributed_operations – If true, enables the use of distributed operations in servicing the given query. Any query requiring a distributed join will succeed, though hints can be used in the query to change the distribution of the source data to allow the query to succeed. Allowed values are:

true

false

The default value is ‘false’.

ignore_existing_pk – Specifies the record collision error-suppression policy for inserting into or updating a table with a primary key, only used when primary key record collisions are rejected (update_on_existing_pk is false). If set to true, any record insert/update that is rejected for resulting in a primary key collision with an existing table record will be ignored with no error generated. If false, the rejection of any insert/update for resulting in a primary key collision will cause an error to be reported. If the specified table does not have a primary key or if update_on_existing_pk is true, then this option has no effect. Allowed values are:

true – Ignore inserts/updates that result in primary key collisions with existing records

false – Treat as errors any inserts/updates that result in primary key collisions with existing records

The default value is ‘false’.

late_materialization – If true, Joins/Filters results will always be materialized (saved to result tables format). Allowed values are:

true

false

The default value is ‘false’.

paging_table – When specified (or paging_table_ttl is set), the system will create a paging table to hold the results of the query, when the output has more records than are in the response (i.e., when output parameter has_more_records is true). If the specified paging table exists, the records from the paging table are returned without re-evaluating the query. It is the caller’s responsibility to clear the output parameter paging_table and other tables in the result_table_list (both returned in the response) when they are done with this query.

paging_table_ttl – Sets the TTL of the paging table. -1 indicates no timeout. Setting this option will cause a paging table to be generated when needed. The output parameter paging_table and other tables in the result_table_list (both returned in the response) will be automatically cleared after the TTL expires, if set to a positive number. However, it is still recommended that the caller clear these tables when they are done with this query.

parallel_execution – If false, disables the parallel step execution of the given query. Allowed values are:

true

false

The default value is ‘true’.

plan_cache – If false, disables plan caching for the given query. Allowed values are:

true

false

The default value is ‘true’.

prepare_mode – If true, compiles a query into an execution plan and saves it in the query cache. Query execution is not performed and an empty response will be returned to the user. Allowed values are:

true

false

The default value is ‘false’.

preserve_dict_encoding – If true, then columns that were dict encoded in the source table will be dict encoded in the projection table. Allowed values are:

true

false

The default value is ‘true’.

query_parameters – Query parameters in JSON array or arrays (for inserting multiple rows). This can be used instead of input parameter data and input parameter request_schema_str.

results_caching – If false, disables caching of the results of the given query. Allowed values are:

true

false

The default value is ‘true’.

rule_based_optimization – If false, disables rule-based rewrite optimizations for the given query. Allowed values are:

true

false

The default value is ‘true’.

ssq_optimization – If false, scalar subqueries will be translated into joins. Allowed values are:

true

false

The default value is ‘true’.

ttl – Sets the TTL of the intermediate result tables used in query execution.

update_on_existing_pk – Specifies the record collision policy for inserting into or updating a table with a primary key. If set to true, any existing table record with primary key values that match those of a record being inserted or updated will be replaced by that record. If set to false, any such primary key collision will result in the insert/update being rejected and the error handled as determined by ignore_existing_pk. If the specified table does not have a primary key, then this option has no effect. Allowed values are:

true – Replace the collided-into record with the record inserted or updated when a new/modified record causes a primary key collision with an existing record

false – Reject the insert or update when it results in a primary key collision with an existing record

The default value is ‘false’.

validate_change_column – When changing a column using alter table, validate the change before applying it. If true, then validate all values. A value too large (or too long) for the new type will prevent any change. If false, then when a value is too large or long, it will be truncated. Allowed values are:

true

false

The default value is ‘true’.

current_schema – Use the supplied value as the default schema when processing this SQL command.

The default value is an empty dict ( {} ).

record_type (RecordType or None) –
The record type expected in the results, or None to determine the appropriate type automatically. If known, providing this may improve performance in binary mode. Not used in JSON mode. The default value is None.

force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which, for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.

get_column_major (bool) –
Indicates if the decoded records will be transposed to be column-major or returned as is (row-major). Default value is True.
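To make the difference between the two layouts concrete, the sketch below transposes a few hand-written row-major records into column-major form. It only illustrates what get_column_major refers to; to_column_major is a hypothetical helper, the rows are made-up sample data, and the exact container types the client returns are not specified here.

```python
from collections import OrderedDict


def to_column_major(rows):
    """Transpose a list of row dicts into an OrderedDict of column lists."""
    columns = OrderedDict()
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Row-major: one mapping per record (hand-written sample data).
rows = [
    OrderedDict([("id", 1), ("name", "Ada")]),
    OrderedDict([("id", 2), ("name", "Grace")]),
]

# Column-major: one list of values per column.
columns = to_column_major(rows)
print(columns["id"])    # [1, 2]
print(columns["name"])  # ['Ada', 'Grace']
```

Column-major output is convenient when each column is processed as a whole; row-major output is convenient when iterating record by record.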

Returns

A dict with the following entries–

count_affected (long) –
The number of objects/records affected.

response_schema_str (str) –
Avro schema of output parameter binary_encoded_response or output parameter json_encoded_response.

total_number_of_records (long) –
Total/Filtered number of records.

has_more_records (bool) –
Too many records. Returned a partial set. Allowed values are:

True

False

paging_table (str) –
Name of the table that has the result records of the query. Valid when output parameter has_more_records is true. The caller should clear this and all tables in result_table_list when they are done querying.

info (dict of str to str) –
Additional information. Allowed keys are:

count – Number of records without final limits applied

result_table_list – List of tables, comma-separated, in addition to the output parameter paging_table, created as result of the query. These should be cleared by the caller when they are done querying.

The default value is an empty dict ( {} ).

records (list of Record) –
A list of Record objects which contain the decoded records.
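The interaction between the update_on_existing_pk and ignore_existing_pk options described above can be summarized as a small decision function. This is a reading aid built from the option descriptions, not server code; pk_collision_outcome and its return strings are hypothetical.

```python
def pk_collision_outcome(has_primary_key, update_on_existing_pk=False,
                         ignore_existing_pk=False):
    """Describe what happens to a record whose primary key already exists."""
    if not has_primary_key:
        return "insert normally"          # neither option has any effect
    if update_on_existing_pk:
        return "replace existing record"  # ignore_existing_pk has no effect
    if ignore_existing_pk:
        return "skip without error"
    return "reject with error"


for update in (False, True):
    for ignore in (False, True):
        print(update, ignore, pk_collision_outcome(True, update, ignore))
```

With both options left at their default of false, a colliding insert or update is rejected and reported as an error.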

export_query_metrics(options={})[source]

Export query metrics to a given destination. Returns query metrics.

Parameters

options (dict of str to str) –
Optional parameters. Allowed keys are:

expression – Filter for multi query export

filepath – Path to export target specified as a filename or existing directory.

format – Specifies which format to export the metrics. Allowed values are:

json – Generic json output

json_trace_event – Chromium/Perfetto trace event format

The default value is ‘json’.

job_id – Export query metrics for the currently running job

limit – Record limit per file for multi query export

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

info (dict of str to str) –
Additional information. Allowed keys are:

exported_files – Comma separated list of filenames exported if applicable

output – Exported metrics if no other destination specified

The default value is an empty dict ( {} ).
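A call that exports metrics in trace-event format and reads back the exported file names might look like the sketch below. The helper export_trace, the job ID, and the target path are hypothetical, and FakeDb merely imitates the documented response shape so the example runs without a server.

```python
def export_trace(db, job_id, filepath):
    """Export one job's metrics in trace-event format and list the files."""
    response = db.export_query_metrics(options={
        "job_id": str(job_id),          # option values must be strings
        "filepath": filepath,
        "format": "json_trace_event",
    })
    exported = response.get("info", {}).get("exported_files", "")
    # exported_files is a comma-separated list; it may be empty.
    return [name for name in exported.split(",") if name]


class FakeDb:
    """Stand-in for a GPUdb client so the sketch runs without a server."""

    def export_query_metrics(self, options):
        base = options["filepath"]
        return {"info": {"exported_files": base + "/a.json," + base + "/b.json"}}


files = export_trace(FakeDb(), 42, "/metrics")  # hypothetical job ID and path
print(files)
```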

export_records_to_files(table_name=None, filepath=None, options={})[source]

Export records from a table to files. All tables can be exported, in full or in part (see columns_to_export and columns_to_skip). Additional filtering can be applied when using export table with an expression through SQL. The default destination is KIFS, though other storage types (Azure, S3, GCS, and HDFS) are supported through datasink_name; see GPUdb.create_datasink().

Server’s local file system is not supported. Default file format is delimited text. See options for different file types and different options for each file type. Table is saved to a single file if within max file size limits (may vary depending on datasink type). If not, then table is split into multiple files; these may be smaller than the max size limit.

All filenames created are returned in the response.

Parameters

table_name (str)

filepath (str) –
Path to data export target. If input parameter filepath has a file extension, it is read as the name of a file. If input parameter filepath is a directory, then the source table name with a random UUID appended will be used as the name of each exported file, all written to that directory. If filepath is a filename, then all exported files will have a random UUID appended to the given name. In either case, the target directory specified or implied must exist. The names of all exported files are returned in the response.

options (dict of str to str) –
Optional parameters. Allowed keys are:

batch_size – Number of records to be exported as a batch. The default value is ‘1000000’.

column_formats – For each source column specified, applies the column-property-bound format. Currently supported column properties include date, time, & datetime. The parameter value must be formatted as a JSON string of maps of column names to maps of column properties to their corresponding column formats, e.g., ‘{ “order_date” : { “date” : “%Y.%m.%d” }, “order_time” : { “time” : “%H:%M:%S” } }’.

See default_column_formats for valid format syntax.

columns_to_export – Specifies a comma-delimited list of columns from the source table to export, written to the output file in the order they are given.

Column names can be provided, in which case the target file will use those names as the column headers as well.

Alternatively, column numbers can be specified–discretely or as a range. For example, a value of ‘5,7,1..3’ will write values from the fifth column in the source table into the first column in the target file, from the seventh column in the source table into the second column in the target file, and from the first through third columns in the source table into the third through fifth columns in the target file.

Mutually exclusive with columns_to_skip.

columns_to_skip – Comma-separated list of column names or column numbers to not export. All columns in the source table not specified will be written to the target file in the order they appear in the table definition. Mutually exclusive with columns_to_export.

datasink_name – Datasink name, created using GPUdb.create_datasink().

default_column_formats – Specifies the default format to use to write data. Currently supported column properties include date, time, & datetime. This default column-property-bound format can be overridden by specifying a column property & format for a given source column in column_formats. For each specified annotation, the format will apply to all columns with that annotation unless custom column_formats for that annotation are specified.

The parameter value must be formatted as a JSON string that is a map of column properties to their respective column formats, e.g., ‘{ “date” : “%Y.%m.%d”, “time” : “%H:%M:%S” }’. Column formats are specified as a string of control characters and plain text. The supported control characters are ‘Y’, ‘m’, ‘d’, ‘H’, ‘M’, and ‘S’, which follow the Linux ‘strptime()’ specification, as well as ‘s’, which specifies seconds and fractional seconds (though the fractional component will be truncated past milliseconds).

Formats for the ‘date’ annotation must include the ‘Y’, ‘m’, and ‘d’ control characters. Formats for the ‘time’ annotation must include the ‘H’, ‘M’, and either ‘S’ or ‘s’ (but not both) control characters. Formats for the ‘datetime’ annotation must meet both the ‘date’ and ‘time’ control character requirements. For example, ‘{“datetime” : “%m/%d/%Y %H:%M:%S” }’ would be used to write text as “05/04/2000 12:12:11”.

export_ddl – Save DDL to a separate file. The default value is ‘false’.

file_extension – Extension to give the export file. The default value is ‘.csv’.

file_type – Specifies the file format to use when exporting data. Allowed values are:

delimited_text – Delimited text file format; e.g., CSV, TSV, PSV, etc.

parquet

The default value is ‘delimited_text’.

kinetica_header – Whether to include a Kinetica proprietary header. Will not be written if text_has_header is false. Allowed values are:

true

false

The default value is ‘false’.

kinetica_header_delimiter – If a Kinetica proprietary header is included, then specify a property separator. Different from column delimiter. The default value is ‘|’.

compression_type – File compression type. GZip can be applied to text and Parquet files. Snappy can only be applied to Parquet files, and is the default compression for them. Allowed values are:

uncompressed

snappy

gzip

single_file – Save records to a single file. This option may be ignored if file size exceeds internal file size limits (this limit will differ on different targets). Allowed values are:

true

false

overwrite

The default value is ‘true’.

single_file_max_size – Max file size (in MB) to allow saving to a single file. May be overridden by target limitations. The default value is ‘’.

text_delimiter – Specifies the character to write out to delimit field values and field names in the header (if present).

For delimited_text file_type only. The default value is ‘,’.

text_has_header – Indicates whether to write out a header row.

For delimited_text file_type only. Allowed values are:

true

false

The default value is ‘true’.

text_null_string – Specifies the character string that should be written out for the null value in the data.

For delimited_text file_type only. The default value is ‘\N’.

The default value is an empty dict ( {} ).
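Since column_formats and default_column_formats must be JSON strings, it is usually easier to build them as Python dicts and serialize them with json.dumps. The sketch below does that and uses Python's own strftime to preview a format; this works for the Y, m, d, H, M, and S control characters, which Python shares with strptime(), but not for the Kinetica-specific s (fractional seconds). The column names are taken from the example above and are otherwise hypothetical.

```python
import json
from datetime import datetime

# Per-column formats: column name -> {column property -> format}.
column_formats = {
    "order_date": {"date": "%Y.%m.%d"},
    "order_time": {"time": "%H:%M:%S"},
}
# Defaults for every column with the given property.
default_column_formats = {"datetime": "%m/%d/%Y %H:%M:%S"}

options = {
    "file_type": "delimited_text",
    "column_formats": json.dumps(column_formats),
    "default_column_formats": json.dumps(default_column_formats),
}

# Preview what the datetime format would write for a sample value.
sample = datetime(2000, 5, 4, 12, 12, 11)
print(sample.strftime(default_column_formats["datetime"]))  # 05/04/2000 12:12:11
```

The options dict can then be passed to export_records_to_files together with table_name and filepath.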

Returns

A dict with the following entries–

table_name (str) –
Name of source table

count_exported (long) –
Number of source table records exported

count_skipped (long) –
Number of source table records skipped

files (list of str) –
Names of all exported files

last_timestamp (long) –
Timestamp of last file scanned

data_text (list of str)

data_bytes (list of bytes)

info (dict of str to str) –
Additional information
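The column-number form of the columns_to_export option determines which source column lands in which position of each exported file. The sketch below expands a specification such as '5,7,1..3' into that mapping; expand_column_numbers is a hypothetical helper, and it assumes ranges are inclusive and ascending, as in the documented example.

```python
def expand_column_numbers(spec):
    """Expand a spec such as '5,7,1..3' into source column numbers, in order."""
    numbers = []
    for part in spec.split(","):
        part = part.strip()
        if ".." in part:
            first, last = (int(piece) for piece in part.split(".."))
            numbers.extend(range(first, last + 1))  # inclusive range
        else:
            numbers.append(int(part))
    return numbers


source_columns = expand_column_numbers("5,7,1..3")
for target, source in enumerate(source_columns, start=1):
    print(f"target column {target} <- source column {source}")
```

For '5,7,1..3' the target file receives source columns 5, 7, 1, 2, and 3, in that order.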

export_records_to_table(table_name=None, remote_query='', options={})[source]

Exports records from the source table to the specified target table in an external database.

Parameters

table_name (str) –
Name of the table from which the data will be exported to remote database, in [schema_name.]table_name format, using standard name resolution rules.

remote_query (str) –
Parameterized insert query to export gpudb table data into remote database. The default value is ‘’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

batch_size – Batch size, which determines how many rows to export per round trip. The default value is ‘200000’.

datasink_name – Name of an existing external data sink to which table name specified in input parameter table_name will be exported

jdbc_session_init_statement – Executes the statement for each JDBC session before doing the actual load. The default value is ‘’.

jdbc_connection_init_statement – Executes the statement once before doing actual load. The default value is ‘’.

remote_table – Name of the target table to which the source table is exported. When this option is specified, remote_query cannot be specified. The default value is ‘’.

use_st_geomfrom_casts – Wraps parameterized variables with st_geomfromtext or st_geomfromwkb based on source column type. Allowed values are:

true

false

The default value is ‘false’.

use_indexed_parameters – Uses $n style syntax when generating insert query for remote_table option. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name.

count_inserted (long) –
Number of records inserted into the target table.

count_skipped (long) –
Number of records skipped.

count_updated (long) –
[Not yet implemented] Number of records updated within the target table.

info (dict of str to str) –
Additional information.
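Because the remote_table option and the remote_query parameter cannot be combined, it can help to validate the arguments before calling export_records_to_table. The following is a minimal sketch; export_to_table_kwargs, the table names, and the data sink name are hypothetical.

```python
def export_to_table_kwargs(table_name, datasink_name, remote_table="",
                           remote_query="", batch_size=200000):
    """Build keyword arguments for export_records_to_table."""
    if remote_table and remote_query:
        raise ValueError("remote_table and remote_query cannot both be given")
    options = {
        "datasink_name": datasink_name,
        "batch_size": str(batch_size),  # option values must be strings
    }
    if remote_table:
        options["remote_table"] = remote_table
    return {
        "table_name": table_name,
        "remote_query": remote_query,
        "options": options,
    }


kwargs = export_to_table_kwargs(
    "example.orders", "pg_sink", remote_table="public.orders"
)
print(kwargs)
```

The result would be passed on as export_records_to_table(**kwargs) on a connected client.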

filter(table_name=None, view_name='', expression=None, options={})[source]

Filters data based on the specified expression. The results are stored in a result set with the given input parameter view_name.

For details see Expressions.

The response message contains the number of points for which the expression evaluated to be true, which is equivalent to the size of the result view.

Parameters

table_name (str) –
Name of the table to filter, in [schema_name.]table_name format, using standard name resolution rules. This may be the name of a table or a view (when chaining queries).

view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.

expression (str) –
The select expression to filter the specified table. For details see Expressions.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the newly created view. If the schema is non-existent, it will be automatically created.

view_id – ID of the view this filtered view is part of. The default value is ‘’.

ttl – Sets the TTL of the view specified in input parameter view_name.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

count (long) –
The number of records that matched the given select expression.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_view_name – The fully qualified name of the view (i.e. including the schema)

The default value is an empty dict ( {} ).
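
The call and response described above can be sketched as follows. This is a minimal illustration, not part of the API: the helper name run_filter is hypothetical, and db is assumed to be a connected GPUdb instance (or any object exposing filter() with the signature documented above). Because options is a dict of str to str, boolean option values are passed as the strings 'true' and 'false'.

```python
def run_filter(db, table_name, expression, view_name="", use_temp_table=False):
    """Call db.filter() and return (count, qualified_view_name or None).

    Hypothetical helper; `db` is assumed to expose the filter() method
    documented above.
    """
    # options is a dict of str to str, so booleans become 'true'/'false'.
    options = {"create_temp_table": "true" if use_temp_table else "false"}
    response = db.filter(
        table_name=table_name,
        view_name=view_name,
        expression=expression,
        options=options,
    )
    # 'info' may be empty; qualified_view_name is its only documented key.
    qualified_name = response.get("info", {}).get("qualified_view_name")
    return response["count"], qualified_name
```

For example, run_filter(db, "demo.orders", "amount > 100", view_name="demo.big_orders") would return the match count and the qualified view name, where the table, view, and column names are placeholders.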

filter_by_area(table_name=None, view_name='', x_column_name=None, x_vector=None, y_column_name=None, y_vector=None, options={})[source]

Calculates which objects from a table are within a named area of interest (NAI/polygon). The operation is synchronous, meaning that a response will not be returned until all the matching objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input NAI restriction specification is created with the name given in input parameter view_name.

Parameters

table_name (str) –
Name of the table to filter, in [schema_name.]table_name format, using standard name resolution rules. This may be the name of a table or a view (when chaining queries).

view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.

x_column_name (str) –
Name of the column containing the x values to be filtered.

x_vector (list of floats) –
List of x coordinates of the vertices of the polygon representing the area to be filtered. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

y_column_name (str) –
Name of the column containing the y values to be filtered.

y_vector (list of floats) –
List of y coordinates of the vertices of the polygon representing the area to be filtered. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the newly created view. If the schema provided is non-existent, it will be automatically created.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

count (long) –
The number of records passing the area filter.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_view_name – The fully qualified name of the view (i.e. including the schema)

The default value is an empty dict ( {} ).
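
Since the polygon is passed as two parallel lists, one of x coordinates and one of y coordinates, a small helper can keep them aligned. The sketch below is illustrative only: polygon_to_vectors and filter_by_polygon are hypothetical names, db is assumed to expose filter_by_area() as documented above, and the three-vertex minimum is an assumption of this example rather than a documented rule.

```python
def polygon_to_vectors(vertices):
    """Split [(x, y), ...] polygon vertices into parallel x and y lists."""
    # Assumption of this sketch: an area needs at least three vertices.
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    x_vector = [float(x) for x, _ in vertices]
    y_vector = [float(y) for _, y in vertices]
    return x_vector, y_vector


def filter_by_polygon(db, table_name, view_name,
                      x_column_name, y_column_name, vertices):
    """Call db.filter_by_area() for one polygon and return the count."""
    x_vector, y_vector = polygon_to_vectors(vertices)
    response = db.filter_by_area(
        table_name=table_name,
        view_name=view_name,
        x_column_name=x_column_name,
        x_vector=x_vector,
        y_column_name=y_column_name,
        y_vector=y_vector,
        options={},
    )
    return response["count"]
```

Building both vectors from one list of vertex pairs avoids the two lists drifting out of step when a vertex is added or removed.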

filter_by_area_geometry(table_name=None, view_name='', column_name=None, x_vector=None, y_vector=None, options={})[source]

Calculates which geospatial geometry objects from a table intersect a named area of interest (NAI/polygon). The operation is synchronous, meaning that a response will not be returned until all the matching objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input NAI restriction specification is created with the name given in input parameter view_name.

Parameters

table_name (str) –
Name of the table to filter, in [schema_name.]table_name format, using standard name resolution rules. This may be the name of a table or a view (when chaining queries).

view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.

column_name (str) –
Name of the geospatial geometry column to be filtered.

x_vector (list of floats) –
List of x coordinates of the vertices of the polygon representing the area to be filtered. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

y_vector (list of floats) –
List of y coordinates of the vertices of the polygon representing the area to be filtered. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use GPUdb.create_schema() to create the schema if non-existent] The schema for the newly created view. If the schema is non-existent, it will be automatically created.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

count (long) –
The number of records passing the area filter.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_view_name – The fully qualified name of the view (i.e. including the schema)

The default value is an empty dict ( {} ).

filter_by_box(table_name=None, view_name='', x_column_name=None, min_x=None, max_x=None, y_column_name=None, min_y=None, max_y=None, options={})[source]

Calculates how many objects within the given table lie in a rectangular box. The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set which satisfies the input NAI restriction specification is also created when input parameter view_name is passed in as part of the input payload.

Parameters

table_name (str) –
Name of the table on which the bounding box operation will be performed, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table.

view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.

x_column_name (str) –
Name of the column on which to perform the bounding box query. Must be a valid numeric column.

min_x (float) –
Lower bound for the column chosen by input parameter x_column_name. Must be less than or equal to input parameter max_x.

max_x (float) –
Upper bound for input parameter x_column_name. Must be greater than or equal to input parameter min_x.

y_column_name (str) –
Name of a column on which to perform the bounding box query. Must be a valid numeric column.

min_y (float) –
Lower bound for input parameter y_column_name. Must be less than or equal to input parameter max_y.

max_y (float) –
Upper bound for input parameter y_column_name. Must be greater than or equal to input parameter min_y.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the newly created view. If the schema is non-existent, it will be automatically created.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

count (long) –
The number of records passing the box filter.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_view_name – The fully qualified name of the view (i.e. including the schema)

The default value is an empty dict ( {} ).
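
The ordering constraints on the bounds (min_x no greater than max_x, min_y no greater than max_y) can be checked on the client before the request is sent. The following is a minimal sketch; box_filter_args is a hypothetical helper that only assembles keyword arguments matching the signature documented above.

```python
def box_filter_args(table_name, x_column_name, min_x, max_x,
                    y_column_name, min_y, max_y, view_name=""):
    """Validate box bounds and return keyword arguments for filter_by_box()."""
    if min_x > max_x:
        raise ValueError("min_x must be less than or equal to max_x")
    if min_y > max_y:
        raise ValueError("min_y must be less than or equal to max_y")
    return {
        "table_name": table_name,
        "view_name": view_name,
        "x_column_name": x_column_name,
        "min_x": float(min_x),
        "max_x": float(max_x),
        "y_column_name": y_column_name,
        "min_y": float(min_y),
        "max_y": float(max_y),
        "options": {},
    }
```

With a connected client db, the result could be passed on as db.filter_by_box(**box_filter_args("demo.points", "x", -10, 10, "y", 0, 5)), where the table and column names are placeholders.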

filter_by_box_geometry(table_name=None, view_name='', column_name=None, min_x=None, max_x=None, min_y=None, max_y=None, options={})[source]

Calculates which geospatial geometry objects from a table intersect a rectangular box. The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set which satisfies the input NAI restriction specification is also created when input parameter view_name is passed in as part of the input payload.

Parameters

table_name (str) –
Name of the table on which the bounding box operation will be performed, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table.

view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.

column_name (str) –
Name of the geospatial geometry column to be filtered.

min_x (float) –
Lower bound for the x-coordinate of the rectangular box. Must be less than or equal to input parameter max_x.

max_x (float) –
Upper bound for the x-coordinate of the rectangular box. Must be greater than or equal to input parameter min_x.

min_y (float) –
Lower bound for the y-coordinate of the rectangular box. Must be less than or equal to input parameter max_y.

max_y (float) –
Upper bound for the y-coordinate of the rectangular box. Must be greater than or equal to input parameter min_y.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the newly created view. If the schema provided is non-existent, it will be automatically created.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

count (long) –
The number of records passing the box filter.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_view_name – The fully qualified name of the view (i.e. including the schema)

The default value is an empty dict ( {} ).

filter_by_geometry(table_name=None, view_name='', column_name=None, input_wkt='', operation=None, options={})[source]

Applies a geometry filter against a geospatial geometry column in a given table or view. The filtering geometry is provided by input parameter input_wkt.

Parameters

table_name (str) –
Name of the table on which the filter by geometry will be performed, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table or view containing a geospatial geometry column.

view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.

column_name (str) –
Name of the column to be used in the filter. Must be a geospatial geometry column.

input_wkt (str) –
A geometry in WKT format that will be used to filter the objects in input parameter table_name. The default value is ‘’.

operation (str) –
The geometric filtering operation to perform. Allowed values are:

contains – Matches records that contain the given WKT in input parameter input_wkt, i.e. the given WKT is within the bounds of a record’s geometry.

crosses – Matches records that cross the given WKT.

disjoint – Matches records that are disjoint from the given WKT.

equals – Matches records that are the same as the given WKT.

intersects – Matches records that intersect the given WKT.

overlaps – Matches records that overlap the given WKT.

touches – Matches records that touch the given WKT.

within – Matches records that are within the given WKT.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the newly created view. If the schema provided is non-existent, it will be automatically created.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

count (long) –
The number of records passing the geometry filter.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_view_name – The fully qualified name of the view (i.e. including the schema)

The default value is an empty dict ( {} ).
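
As an illustration of preparing input_wkt and operation, the sketch below builds a WKT polygon for an axis-aligned rectangle and checks the operation against the allowed values listed above. The names box_wkt and geometry_filter_args are hypothetical; only the parameter names and allowed operations come from the documentation above.

```python
# Allowed values of the `operation` parameter, as listed above.
ALLOWED_OPERATIONS = frozenset([
    "contains", "crosses", "disjoint", "equals",
    "intersects", "overlaps", "touches", "within",
])


def box_wkt(min_x, min_y, max_x, max_y):
    """Return a WKT POLYGON whose single closed ring is the given rectangle."""
    corners = [
        (min_x, min_y), (max_x, min_y), (max_x, max_y),
        (min_x, max_y), (min_x, min_y),  # repeat first point to close the ring
    ]
    ring = ", ".join("{} {}".format(x, y) for x, y in corners)
    return "POLYGON(({}))".format(ring)


def geometry_filter_args(table_name, column_name, input_wkt, operation,
                         view_name=""):
    """Return keyword arguments for filter_by_geometry() after a basic check."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError("unsupported operation: {!r}".format(operation))
    return {
        "table_name": table_name,
        "view_name": view_name,
        "column_name": column_name,
        "input_wkt": input_wkt,
        "operation": operation,
        "options": {},
    }
```

With a connected client db, the arguments could be passed as db.filter_by_geometry(**geometry_filter_args(...)).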

filter_by_list(table_name=None, view_name='', column_values_map=None, options={})[source]

Calculates which records from a table have values in the given list for the corresponding column. The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input filter specification is also created if input parameter view_name is passed in as part of the request.

For example, if a type definition has the columns ‘x’ and ‘y’, then a filter by list query with the column map {“x”:[“10.1”, “2.3”], “y”:[“0.0”, “-31.5”, “42.0”]} will return the count of all data points whose x and y values match both in the respective x- and y-lists, e.g., “x = 10.1 and y = 0.0”, “x = 2.3 and y = -31.5”, etc. However, a record with “x = 10.1 and y = -31.5” or “x = 2.3 and y = 0.0” would not be returned because the values in the given lists do not correspond.

Parameters

table_name (str) –
Name of the table to filter, in [schema_name.]table_name format, using standard name resolution rules. This may be the name of a table or a view (when chaining queries).

view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.

column_values_map (dict of str to lists of str) –
Map of each column name to the list of values to match for that column in the table.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the newly created view. If the schema provided is non-existent, it will be automatically created.

filter_mode – String indicating the filter mode, either ‘in_list’ or ‘not_in_list’. Allowed values are:

in_list – The filter will match all items that are in the provided list(s).

not_in_list – The filter will match all items that are not in the provided list(s).

The default value is ‘in_list’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

count (long) –
The number of records passing the list filter.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_view_name – The fully qualified name of the view (i.e. including the schema)

The default value is an empty dict ( {} ).
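
Because column_values_map is a dict of str to lists of str, numeric values need to be converted to strings before the call. A minimal sketch of that preparation follows; list_filter_args is a hypothetical helper, and the filter_mode values are the ones listed above.

```python
def list_filter_args(table_name, values_by_column, view_name="",
                     not_in_list=False):
    """Return keyword arguments for filter_by_list() with string values."""
    # Every value is sent as a string, matching "dict of str to lists of str".
    column_values_map = {
        column: [str(value) for value in values]
        for column, values in values_by_column.items()
    }
    options = {"filter_mode": "not_in_list" if not_in_list else "in_list"}
    return {
        "table_name": table_name,
        "view_name": view_name,
        "column_values_map": column_values_map,
        "options": options,
    }
```

With a connected client db, the call would then be db.filter_by_list(**list_filter_args("demo.points", {"x": [10.1, 2.3]})), where the table name is a placeholder.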

filter_by_radius(table_name=None, view_name='', x_column_name=None, x_center=None, y_column_name=None, y_center=None, radius=None, options={})[source]

Calculates which objects from a table lie within a circle with the given radius and center point (i.e. circular NAI). The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input circular NAI restriction specification is also created if input parameter view_name is passed in as part of the request.

For track data, all track points that lie within the circle plus one point on either side of the circle (if the track goes beyond the circle) will be included in the result.

Parameters

table_name (str) –
Name of the table on which the filter by radius operation will be performed, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table.

view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.

x_column_name (str) –
Name of the column to be used for the x-coordinate (the longitude) of the center.

x_center (float) –
Value of the longitude of the center. Must be within [-180.0, 180.0]. The minimum allowed value is -180. The maximum allowed value is 180.

y_column_name (str) –
Name of the column to be used for the y-coordinate (the latitude) of the center.

y_center (float) –
Value of the latitude of the center. Must be within [-90.0, 90.0]. The minimum allowed value is -90. The maximum allowed value is 90.

radius (float) –
The radius of the circle within which the search will be performed. Must be a non-zero positive value. It is in meters; so, for example, a value of ‘42000’ means 42 km. The minimum allowed value is 0. The maximum allowed value is MAX_INT.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema which is to contain the newly created view. If the schema is non-existent, it will be automatically created.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

count (long) –
The number of records passing the radius filter.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_view_name – The fully qualified name of the view (i.e. including the schema)

The default value is an empty dict ( {} ).
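
The documented ranges for the center and radius can be checked before calling the server. The sketch below is one possible client-side check; radius_filter_args is a hypothetical helper, and it follows the "non-zero positive value" wording above by rejecting a radius of zero.

```python
def radius_filter_args(table_name, x_column_name, x_center,
                       y_column_name, y_center, radius, view_name=""):
    """Validate inputs and return keyword arguments for filter_by_radius()."""
    if not -180.0 <= x_center <= 180.0:
        raise ValueError("x_center (longitude) must be within [-180, 180]")
    if not -90.0 <= y_center <= 90.0:
        raise ValueError("y_center (latitude) must be within [-90, 90]")
    if radius <= 0:
        raise ValueError("radius must be a positive number of meters")
    return {
        "table_name": table_name,
        "view_name": view_name,
        "x_column_name": x_column_name,
        "x_center": float(x_center),
        "y_column_name": y_column_name,
        "y_center": float(y_center),
        "radius": float(radius),  # meters, so 42000 means 42 km
        "options": {},
    }
```

With a connected client db, the result could be passed as db.filter_by_radius(**radius_filter_args("demo.points", "lon", -77.03, "lat", 38.89, 42000)), where the table and column names are placeholders.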

filter_by_radius_geometry(table_name=None, view_name='', column_name=None, x_center=None, y_center=None, radius=None, options={})[source]

Calculates which geospatial geometry objects from a table intersect a circle with the given radius and center point (i.e. circular NAI). The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input circular NAI restriction specification is also created if input parameter view_name is passed in as part of the request.

Parameters

table_name (str) –
Name of the table on which the filter by radius operation will be performed, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table.

view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.

column_name (str) –
Name of the geospatial geometry column to be filtered.

x_center (float) –
Value of the longitude of the center. Must be within [-180.0, 180.0]. The minimum allowed value is -180. The maximum allowed value is 180.

y_center (float) –
Value of the latitude of the center. Must be within [-90.0, 90.0]. The minimum allowed value is -90. The maximum allowed value is 90.

radius (float) –
The radius of the circle within which the search will be performed. Must be a non-zero positive value. It is in meters; so, for example, a value of ‘42000’ means 42 km. The minimum allowed value is 0. The maximum allowed value is MAX_INT.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the newly created view. If the schema provided is non-existent, it will be automatically created.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

count (long) –
The number of records passing the radius filter.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_view_name – The fully qualified name of the view (i.e. including the schema)

The default value is an empty dict ( {} ).

filter_by_range(table_name=None, view_name='', column_name=None, lower_bound=None, upper_bound=None, options={})[source]

Calculates which objects from a table have a column that is within the given bounds. An object from the table identified by input parameter table_name is added to the view input parameter view_name if its column is within [input parameter lower_bound, input parameter upper_bound] (inclusive). The operation is synchronous. The response provides a count of the number of objects which passed the bound filter. Although this functionality can also be accomplished with the standard filter function, this function is more efficient.

For track objects, the count reflects how many points fall within the given bounds (which may not include all the track points of any given track).

Parameters

table_name (str) –
Name of the table on which the filter by range operation will be performed, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table.

view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.

column_name (str) –
Name of a column on which the operation would be applied.

lower_bound (float) –
Value of the lower bound (inclusive).

upper_bound (float) –
Value of the upper bound (inclusive).

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the newly created view. If the schema is non-existent, it will be automatically created.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

count (long) –
The number of records passing the range filter.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_view_name – The fully qualified name of the view (i.e. including the schema)

The default value is an empty dict ( {} ).
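
To make the inclusive-bounds rule concrete, the following local sketch counts the values that would pass a range filter. It is a plain-Python illustration of the rule stated above, not the server's implementation, and count_in_range is a hypothetical name.

```python
def count_in_range(values, lower_bound, upper_bound):
    """Count values v with lower_bound <= v <= upper_bound (both inclusive)."""
    return sum(1 for v in values if lower_bound <= v <= upper_bound)
```

For the values 1, 5, 7.5, 10 and 11 with bounds 5 and 10, the values 5, 7.5 and 10 pass, since both endpoints are included.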

filter_by_series(table_name=None, view_name='', track_id=None, target_track_ids=None, options={})[source]

Filters objects matching all points of the given track (works only on track type data). It allows users to specify a particular track to find all other points in the table that fall within specified ranges (spatial and temporal) of all points of the given track. Additionally, the user can specify another track to see if the two intersect (or go close to each other within the specified ranges). The user also has the flexibility of using different metrics for the spatial distance calculation: Euclidean (flat geometry) or Great Circle (spherical geometry to approximate the Earth’s surface distances). The filtered points are stored in a newly created result set. The return value of the function is the number of points in the resultant set (view).

This operation is synchronous, meaning that a response will not be returned until all the objects are fully available.

Parameters

table_name (str) –
Name of the table on which the filter by track operation will be performed, in [schema_name.]table_name format, using standard name resolution rules. Must be a currently existing table with a track present.

view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.

track_id (str) –
The ID of the track which will act as the filtering points. Must be an existing track within the given table.

target_track_ids (list of str) –
Up to one track ID to intersect with the “filter” track. If one is provided, it must be a valid track ID within the given set. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the newly created view. If the schema is non-existent, it will be automatically created.

spatial_radius – A positive number passed as a string representing the radius of the search area centered around each track point’s geospatial coordinates. The value is interpreted in meters. Required parameter. The minimum allowed value is ‘0’.

time_radius – A positive number passed as a string representing the maximum allowable time difference between the timestamps of a filtered object and the given track’s points. The value is interpreted in seconds. Required parameter. The minimum allowed value is ‘0’.

spatial_distance_metric – A string representing the coordinate system to use for the spatial search criteria. Acceptable values are ‘euclidean’ and ‘great_circle’. Optional parameter; default is ‘euclidean’. Allowed values are:

euclidean

great_circle

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

count (long) –
The number of records passing the series filter.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_view_name – The fully qualified name of the view (i.e. including the schema)

The default value is an empty dict ( {} ).
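
The spatial_radius and time_radius options are required and, like all option values, are passed as strings. A minimal sketch of building the options dict follows; series_options is a hypothetical helper, and the key names and allowed metric values are taken from the list above.

```python
def series_options(spatial_radius, time_radius, metric="euclidean"):
    """Build the options dict for filter_by_series() with string values."""
    # The documented minimum for both radii is '0'.
    if spatial_radius < 0 or time_radius < 0:
        raise ValueError("spatial_radius and time_radius must not be negative")
    if metric not in ("euclidean", "great_circle"):
        raise ValueError("metric must be 'euclidean' or 'great_circle'")
    return {
        "spatial_radius": str(spatial_radius),  # meters
        "time_radius": str(time_radius),        # seconds
        "spatial_distance_metric": metric,
    }
```

With a connected client db, the dict could be passed as the options argument, for example db.filter_by_series(table_name="demo.tracks", view_name="demo.near_track", track_id="T1", target_track_ids=[], options=series_options(500, 60)), where the table, view, and track names are placeholders.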

filter_by_string(table_name=None, view_name='', expression=None, mode=None, column_names=None, options={})[source]

Calculates which objects from a table or view match a string expression for the given string columns. Setting case_sensitive can modify case sensitivity in matching for all modes except search. For search mode details and limitations, see Full Text Search.

Parameters

table_name (str) –
Name of the table on which the filter operation will be performed, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table or view.

view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.

expression (str) –
The expression with which to filter the table.

mode (str) –
The string filtering mode to apply. See below for details. Allowed values are:

search – Full text search query with wildcards and boolean operators. Note that for this mode, no column can be specified in input parameter column_names; all string columns of the table that have text search enabled will be searched.

equals – Exact whole-string match (accelerated).

contains – Partial substring match (not accelerated). If the column is a string type (non-charN) and the number of records is too large, it will return 0.

starts_with – Strings that start with the given expression (not accelerated). If the column is a string type (non-charN) and the number of records is too large, it will return 0.

regex – Full regular expression search (not accelerated). If the column is a string type (non-charN) and the number of records is too large, it will return 0.

column_names (list of str) –
List of columns on which to apply the filter. Ignored for search mode. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the newly created view. If the schema is non-existent, it will be automatically created.

case_sensitive – If false then string filtering will ignore case. Does not apply to search mode. Allowed values are:

true

false

The default value is ‘true’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

count (long) –
The number of records that passed the string filter.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_view_name – The fully qualified name of the view (i.e. including the schema)

The default value is an empty dict ( {} ).
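The mode and column rules above can be checked on the client side before the call is made. The following is a minimal sketch, not part of the gpudb API: `build_string_filter_args` is a hypothetical helper name, and it only assembles the documented parameters into a dict of keyword arguments. For search mode it sends an empty column list, since columns are not used in that mode.

```python
ALLOWED_MODES = {"search", "equals", "contains", "starts_with", "regex"}


def build_string_filter_args(expression, mode, column_names=None,
                             case_sensitive=True):
    """Assemble keyword arguments for a string filter call (hypothetical helper)."""
    if mode not in ALLOWED_MODES:
        raise ValueError("unsupported mode: %r" % (mode,))

    # A single column name is promoted to a list, mirroring the documented behavior.
    if isinstance(column_names, str):
        column_names = [column_names]
    column_names = list(column_names or [])

    options = {}
    if mode == "search":
        # Search mode looks at all text-search-enabled string columns.
        column_names = []
    else:
        # Option values are strings ('true'/'false'), not Python booleans.
        options["case_sensitive"] = "true" if case_sensitive else "false"

    return {
        "expression": expression,
        "mode": mode,
        "column_names": column_names,
        "options": options,
    }
```

The returned dict could then be unpacked into the string filter call together with table_name and view_name.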

filter_by_table(table_name=None, view_name='', column_name=None, source_table_name=None, source_table_column_name=None, options={})[source]

Filters objects in one table based on objects in another table. The user must specify matching column types from the two tables (i.e. the target table from which objects will be filtered and the source table based on which the filter will be created); the column names need not be the same. If input parameter view_name is specified, then the filtered objects will be put in a newly created view. The operation is synchronous, meaning that a response will not be returned until all objects are fully available in the result view. The return value contains the count (i.e. the size) of the resulting view.

Parameters

table_name (str) –
Name of the table whose data will be filtered, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table.

view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.

column_name (str) –
Name of the column by whose value the data will be filtered from the table designated by input parameter table_name.

source_table_name (str) –
Name of the table whose data will be compared against the data in the table given by input parameter table_name, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table.

source_table_column_name (str) –
Name of the column in the input parameter source_table_name whose values will be used as the filter for table input parameter table_name. Must be a geospatial geometry column if in ‘spatial’ mode; otherwise, must match the type of the input parameter column_name.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the newly created view. If the schema is non-existent, it will be automatically created.

filter_mode – String indicating the filter mode, either in_table or not_in_table. Allowed values are:

in_table

not_in_table

The default value is ‘in_table’.

mode – The filter mode; either spatial or normal. Allowed values are:

normal

spatial

The default value is ‘normal’.

buffer – Buffer size, in meters. Only relevant for spatial mode. The default value is ‘0’.

buffer_method – Method used to buffer polygons. Only relevant for spatial mode. Allowed values are:

normal

geos – Use the GEOS 1-edge-per-corner algorithm

The default value is ‘normal’.

max_partition_size – Maximum number of points in a partition. Only relevant for spatial mode. The default value is ‘0’.

max_partition_score – Maximum number of points * edges in a partition. Only relevant for spatial mode. The default value is ‘8000000’.

x_column_name – Name of column containing x value of point being filtered in spatial mode. The default value is ‘x’.

y_column_name – Name of column containing y value of point being filtered in spatial mode. The default value is ‘y’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

count (long) –
The number of records in input parameter table_name that have input parameter column_name values matching input parameter source_table_column_name values in input parameter source_table_name.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_view_name – The fully qualified name of the view (i.e. including the schema)

The default value is an empty dict ( {} ).
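As an illustration of the filter_mode option in normal mode, the sketch below keeps (or excludes) rows of one table whose key appears in another. All table, column, and view names are made up for the example, and `FakeDb` is a hypothetical stand-in that only records the call so the snippet runs without a server; with a real connection, a GPUdb instance would be passed instead.

```python
class FakeDb:
    """Hypothetical stand-in for a connected client; records the last call."""

    def filter_by_table(self, table_name=None, view_name='', column_name=None,
                        source_table_name=None, source_table_column_name=None,
                        options=None):
        self.last_call = {
            "table_name": table_name,
            "view_name": view_name,
            "column_name": column_name,
            "source_table_name": source_table_name,
            "source_table_column_name": source_table_column_name,
            "options": dict(options or {}),
        }
        # Canned response shaped like the documented return value.
        return {"count": 3, "info": {"qualified_view_name": view_name}}


def filter_orders_by_customers(db, exclude=False):
    """Filter demo.orders by customer IDs present (or absent) in another table."""
    options = {
        "mode": "normal",
        "filter_mode": "not_in_table" if exclude else "in_table",
    }
    response = db.filter_by_table(
        table_name="demo.orders",
        view_name="demo.orders_filtered",
        column_name="customer_id",            # column in the filtered table
        source_table_name="demo.vip_customers",
        source_table_column_name="id",        # must match customer_id's type
        options=options,
    )
    return response["count"], response["info"].get("qualified_view_name")
```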

filter_by_value(table_name=None, view_name='', is_string=None, value=0, value_str='', column_name=None, options={})[source]

Calculates which objects from a table have a particular value for a particular column. The input parameters provide a way to specify either a String or a Double valued column and a desired value for the column on which the filter is performed. The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new result view which satisfies the input filter restriction specification is also created with a view name passed in as part of the input payload. Although this functionality can also be accomplished with the standard filter function, this method is more efficient.

Parameters

table_name (str) –
Name of an existing table on which to perform the calculation, in [schema_name.]table_name format, using standard name resolution rules.

view_name (str) –
If provided, then this will be the name of the view containing the results, in [schema_name.]view_name format, using standard name resolution rules and meeting table naming criteria. Must not be an already existing table or view. The default value is ‘’.

is_string (bool) –
Indicates whether the value being searched for is string or numeric.

value (float) –
The value to search for. The default value is 0.

value_str (str) –
The string value to search for. The default value is ‘’.

column_name (str) –
Name of a column on which the filter by value would be applied.

options (dict of str to str) –
Optional parameters. Allowed keys are:

create_temp_table – If true, a unique temporary table name will be generated in the sys_temp schema and used in place of input parameter view_name. This is always allowed even if the caller does not have permission to create tables. The generated name is returned in qualified_view_name. Allowed values are:

true

false

The default value is ‘false’.

collection_name – [DEPRECATED–please specify the containing schema for the view as part of input parameter view_name and use GPUdb.create_schema() to create the schema if non-existent] Name of a schema for the newly created view. If the schema is non-existent, it will be automatically created.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

count (long) –
The number of records passing the value filter.

info (dict of str to str) –
Additional information. Allowed keys are:

qualified_view_name – The fully qualified name of the view (i.e. including the schema)

The default value is an empty dict ( {} ).
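Because the value to match is passed through either value or value_str depending on is_string, it can be convenient to derive all three from a single Python value. The helper below is a sketch under that assumption; `filter_value_args` is a hypothetical name and not part of the gpudb API.

```python
def filter_value_args(table_name, column_name, value, view_name=""):
    """Build keyword arguments for filter_by_value from one Python value."""
    is_string = isinstance(value, str)
    return {
        "table_name": table_name,
        "view_name": view_name,
        "is_string": is_string,
        # Only one of value / value_str is meaningful; the other keeps its default.
        "value": 0 if is_string else float(value),
        "value_str": value if is_string else "",
        "column_name": column_name,
    }
```

With a connected client named db, the result could be used as db.filter_by_value(**filter_value_args("demo.orders", "status", "shipped")), where the table and column names are placeholders.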

get_job(job_id=None, options={})[source]

Gets the status and result of an asynchronously running job. See GPUdb.create_job() for starting an asynchronous job. Some fields of the response are filled only after the submitted job has finished execution.

Parameters

job_id (long) –
A unique identifier for the job whose status and result are to be fetched.

options (dict of str to str) –
Optional parameters. Allowed keys are:

job_tag – Job tag returned in call to create the job

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

endpoint (str) –
The endpoint which is being executed asynchronously. E.g. ‘/alter/table’.

job_status (str) –
Status of the submitted job. Allowed values are:

RUNNING – The job is currently executing.

DONE – The job execution has successfully completed and the response is included in the output parameter job_response or output parameter job_response_str field

ERROR – The job was attempted, but an error was encountered. The output parameter status_map contains the details of the error in error_message

CANCELLED – Job cancellation was requested while the execution was in progress.

running (bool) –
True if the end point is still executing.

progress (int) –
Approximate percentage of the job completed.

successful (bool) –
True if the job execution completed and no errors were encountered.

response_encoding (str) –
The encoding of the job result (contained in output parameter job_response or output parameter job_response_str). Allowed values are:

binary – The job result is binary-encoded. It is contained in output parameter job_response.

json – The job result is json-encoded. It is contained in output parameter job_response_str.

job_response (bytes) –
The binary-encoded response of the job. This field is populated only when the job has completed and output parameter response_encoding is binary

job_response_str (str) –
The json-encoded response of the job. This field is populated only when the job has completed and output parameter response_encoding is json

status_map (dict of str to str) –
Map of various status strings for the executed job. Allowed keys are:

error_message – Explains what error occurred while running the job asynchronously. This entry only exists when the job status is ERROR.

info (dict of str to str) –
Additional information.
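The status values and response fields described above suggest a simple polling loop. The following is a minimal sketch that assumes the response can be read as a dict with the documented keys; `wait_for_job` and `FakeJobDb` are hypothetical names, and the fake client only replays canned responses so the example runs without a server.

```python
import time


def wait_for_job(db, job_id, poll_seconds=1.0, max_polls=60):
    """Poll get_job until the job leaves the RUNNING state, then return its result."""
    for _ in range(max_polls):
        status = db.get_job(job_id=job_id)
        state = status["job_status"]
        if state == "RUNNING":
            time.sleep(poll_seconds)
            continue
        if state == "ERROR":
            message = status.get("status_map", {}).get("error_message", "unknown error")
            raise RuntimeError("job %s failed: %s" % (job_id, message))
        if state == "CANCELLED":
            raise RuntimeError("job %s was cancelled" % (job_id,))
        # DONE: the result lives in the field matching response_encoding.
        if status["response_encoding"] == "json":
            return status["job_response_str"]
        return status["job_response"]
    raise TimeoutError("job %s still running after %d polls" % (job_id, max_polls))


class FakeJobDb:
    """Hypothetical stand-in that replays a fixed sequence of get_job responses."""

    def __init__(self, responses):
        self._responses = iter(responses)

    def get_job(self, job_id=None, options=None):
        return next(self._responses)
```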

get_records(table_name=None, offset=0, limit=-9999, encoding='binary', options={}, get_record_type=True)[source]

Retrieves records from a given table, optionally filtered by an expression and/or sorted by a column. This operation can be performed on tables and views. Records can be returned encoded as binary, json, or geojson.

This operation supports paging through the data via input parameter offset and input parameter limit. Note that when paging through a table, if the table (or the underlying table in case of a view) is updated (records are inserted, deleted or modified) the records retrieved may differ between calls based on the updates applied.

Parameters

table_name (str) –
Name of the table or view from which the records will be fetched, in [schema_name.]table_name format, using standard name resolution rules.

offset (long) –
A non-negative integer indicating the number of initial results to skip (this can be useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.

limit (long) –
A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results allowed by the server should be returned. The number of records returned will never exceed the server’s own limit, defined by the max_get_records_size parameter in the server configuration. Use output parameter has_more_records to see if more records exist in the result to be fetched, and input parameter offset & input parameter limit to request subsequent pages of results. The default value is -9999.

encoding (str) –
Specifies the encoding for returned records; one of binary, json, or geojson. Allowed values are:

binary

json

geojson

arrow

The default value is ‘binary’.

options (dict of str to str) –
Allowed keys are:

expression – Optional filter expression to apply to the table.

fast_index_lookup – Indicates if indexes should be used to perform the lookup for a given expression if possible. Only applicable if there is no sorting, the expression contains only equivalence comparisons based on existing table indexes, and the range of requested values is from [0 to END_OF_SET]. Allowed values are:

true

false

The default value is ‘true’.

sort_by – Optional column that the data should be sorted by. Empty by default (i.e. no sorting is applied).

sort_order – String indicating how the returned values should be sorted - ascending or descending. If sort_order is provided, sort_by has to be provided. Allowed values are:

ascending

descending

The default value is ‘ascending’.

The default value is an empty dict ( {} ).

get_record_type (bool) –
If True, deduce and return the record type for the returned records. Default is True.

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name.

type_name (str)

type_schema (str) –
Avro schema of output parameter records_binary or output parameter records_json

records_binary (list of bytes) –
If the input parameter encoding was ‘binary’, then this list contains the binary encoded records retrieved from the table, otherwise not populated.

records_json (list of str) –
If the input parameter encoding was ‘json’, then this list contains the JSON encoded records retrieved from the table. If the input parameter encoding was ‘geojson’ this list contains a single entry consisting of a GeoJSON FeatureCollection containing a feature per record. Otherwise not populated.

total_number_of_records (long) –
Total/Filtered number of records.

has_more_records (bool) –
True if more records exist than were returned, i.e. the response contains only a partial set.

info (dict of str to str) –
Additional information.

record_type (RecordType or None) –
A RecordType object that can be used to decode the binary data via GPUdbRecord.decode_binary_data(). Available only if get_record_type is True.
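Paging with offset, limit, and has_more_records can be sketched as a generator that requests one page at a time. This is an illustration only: `iter_json_pages` is a hypothetical helper, and `FakeRecordsDb` is a stand-in that slices an in-memory list so the loop can run without a server. As noted above, pages may not be contiguous if the table changes between calls.

```python
def iter_json_pages(db, table_name, page_size=1000, options=None):
    """Yield successive pages of JSON-encoded records from get_records."""
    offset = 0
    while True:
        response = db.get_records(
            table_name=table_name,
            offset=offset,
            limit=page_size,
            encoding="json",
            options=options or {},
        )
        page = response["records_json"]
        yield page
        # Stop when the server reports no further records (or returns nothing).
        if not response["has_more_records"] or not page:
            break
        offset += len(page)


class FakeRecordsDb:
    """Hypothetical stand-in serving pages from an in-memory list of JSON strings."""

    def __init__(self, rows):
        self.rows = rows

    def get_records(self, table_name=None, offset=0, limit=-9999,
                    encoding='binary', options=None, get_record_type=True):
        end = len(self.rows) if limit == -9999 else offset + limit
        return {
            "table_name": table_name,
            "records_json": self.rows[offset:end],
            "total_number_of_records": len(self.rows),
            "has_more_records": end < len(self.rows),
        }
```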

get_records_and_decode(table_name=None, offset=0, limit=-9999, encoding='binary', options={}, record_type=None, force_primitive_return_types=True)[source]

Retrieves records from a given table, optionally filtered by an expression and/or sorted by a column. This operation can be performed on tables and views. Records can be returned encoded as binary, json, or geojson.

This operation supports paging through the data via input parameter offset and input parameter limit. Note that when paging through a table, if the table (or the underlying table in case of a view) is updated (records are inserted, deleted or modified) the records retrieved may differ between calls based on the updates applied.

Parameters

table_name (str) –
Name of the table or view from which the records will be fetched, in [schema_name.]table_name format, using standard name resolution rules.

offset (long) –
A non-negative integer indicating the number of initial results to skip (this can be useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.

limit (long) –
A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results allowed by the server should be returned. The number of records returned will never exceed the server’s own limit, defined by the max_get_records_size parameter in the server configuration. Use output parameter has_more_records to see if more records exist in the result to be fetched, and input parameter offset & input parameter limit to request subsequent pages of results. The default value is -9999.

encoding (str) –
Specifies the encoding for returned records; one of binary, json, or geojson. Allowed values are:

binary

json

geojson

arrow

The default value is ‘binary’.

options (dict of str to str) –
Allowed keys are:

expression – Optional filter expression to apply to the table.

fast_index_lookup – Indicates if indexes should be used to perform the lookup for a given expression if possible. Only applicable if there is no sorting, the expression contains only equivalence comparisons based on existing table indexes, and the range of requested values is from [0 to END_OF_SET]. Allowed values are:

true

false

The default value is ‘true’.

sort_by – Optional column that the data should be sorted by. Empty by default (i.e. no sorting is applied).

sort_order – String indicating how the returned values should be sorted - ascending or descending. If sort_order is provided, sort_by has to be provided. Allowed values are:

ascending

descending

The default value is ‘ascending’.

The default value is an empty dict ( {} ).

record_type (RecordType or None) –
The record type expected in the results, or None to determine the appropriate type automatically. If known, providing this may improve performance in binary mode. Not used in JSON mode. The default value is None.

force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, Python datetime structs, used for datetime type columns, would have their values returned as strings. If False, then Record objects will be returned, which, for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. The default value is True.

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name.

type_name (str)

type_schema (str) –
Avro schema of output parameter records_binary or output parameter records_json

total_number_of_records (long) –
Total/Filtered number of records.

has_more_records (bool) –
True if more records exist than were returned, i.e. the response contains only a partial set.

info (dict of str to str) –
Additional information.

records (list of Record) –
A list of Record objects which contain the decoded records.
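The options for this call are interdependent: sort_order is only valid together with sort_by. A small helper can enforce that before the request is sent. This is a sketch, and `record_query_options` is a hypothetical name rather than part of the gpudb API.

```python
def record_query_options(expression=None, sort_by=None, sort_order=None):
    """Build the options dict for get_records / get_records_and_decode."""
    if sort_order not in (None, "ascending", "descending"):
        raise ValueError("sort_order must be 'ascending' or 'descending'")
    if sort_order is not None and not sort_by:
        raise ValueError("sort_order requires sort_by")

    options = {}
    if expression:
        options["expression"] = expression
    if sort_by:
        options["sort_by"] = sort_by
    if sort_order:
        options["sort_order"] = sort_order
    return options
```

With a connected client named db, the result could be passed as db.get_records_and_decode(table_name="demo.t", options=record_query_options("x > 5", "x", "descending")), where the table name and expression are placeholders; the decoded rows are then available under the records key of the response.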

get_records_by_column(table_name=None, column_names=None, offset=0, limit=-9999, encoding='binary', options={})[source]

For a given table, retrieves the values from the requested column(s). Maps of column name to the array of values as well as the column data type are returned. This endpoint supports pagination with input parameter offset and input parameter limit.

Window functions, which can perform operations like moving averages, are available through this endpoint as well as GPUdb.create_projection().

When using pagination, if the table (or the underlying table in the case of a view) is modified (records are inserted, updated, or deleted) during a call to the endpoint, the records or values retrieved may differ between calls based on the type of the update; i.e., contiguity across pages cannot be relied upon.

If input parameter table_name is empty, selection is performed against a single-row virtual table. This can be useful in executing temporal (NOW()), identity (USER()), or constant-based functions (GEODIST(-77.11, 38.88, -71.06, 42.36)).

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

Parameters

table_name (str) –
Name of the table or view on which this operation will be performed, in [schema_name.]table_name format, using standard name resolution rules. An empty table name retrieves one record from a single-row virtual table, where columns specified should be constants or constant expressions.

column_names (list of str) –
The list of column values to retrieve. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

offset (long) –
A non-negative integer indicating the number of initial results to skip (this can be useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.

limit (long) –
A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results allowed by the server should be returned. The number of records returned will never exceed the server’s own limit, defined by the max_get_records_size parameter in the server configuration. Use output parameter has_more_records to see if more records exist in the result to be fetched, and input parameter offset & input parameter limit to request subsequent pages of results. The default value is -9999.

encoding (str) –
Specifies the encoding for returned records; either binary or json. Allowed values are:

binary

json

The default value is ‘binary’.

options (dict of str to str) –
Allowed keys are:

expression – Optional filter expression to apply to the table.

sort_by – Optional column that the data should be sorted by. Used in conjunction with sort_order. The order_by option can be used in lieu of sort_by / sort_order. The default value is ‘’.

sort_order – String indicating how the returned values should be sorted - ascending or descending. If sort_order is provided, sort_by has to be provided. Allowed values are:

ascending

descending

The default value is ‘ascending’.

order_by – Comma-separated list of the columns to be sorted by as well as the sort direction, e.g., ‘timestamp asc, x desc’. The default value is ‘’.

convert_wkts_to_wkbs – If true, then WKT string columns will be returned as WKB bytes. Allowed values are:

true

false

The default value is ‘false’.

route_to_tom – For multihead record retrieval without shard key expression - specifies from which tom to retrieve data.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_name (str) –
The same table name as was passed in the parameter list.

response_schema_str (str) –
Avro schema of output parameter binary_encoded_response or output parameter json_encoded_response.

binary_encoded_response (bytes) –
Avro binary encoded response.

json_encoded_response (str) –
Avro JSON encoded response.

total_number_of_records (long) –
Total/Filtered number of records.

has_more_records (bool) –
True if more records exist than were returned, i.e. the response contains only a partial set.

info (dict of str to str) –
Additional information.

record_type (RecordType or None) –
A RecordType object that can be used to decode the binary data via GPUdbRecord.decode_binary_data(). If JSON encoding is used, then None.
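The order_by option takes a comma-separated string of column and direction pairs. The helper below is a minimal sketch for building that string from Python tuples; `order_by_clause` is a hypothetical name, not a gpudb function.

```python
def order_by_clause(sort_keys):
    """Turn [(column, 'asc' | 'desc'), ...] into an order_by option string."""
    parts = []
    for column, direction in sort_keys:
        if direction not in ("asc", "desc"):
            raise ValueError("direction must be 'asc' or 'desc'")
        parts.append("%s %s" % (column, direction))
    return ", ".join(parts)
```

The result would be supplied as options={"order_by": order_by_clause([("timestamp", "asc"), ("x", "desc")])} in place of the sort_by / sort_order pair.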

get_records_by_column_and_decode(table_name=None, column_names=None, offset=0, limit=-9999, encoding='binary', options={}, record_type=None, force_primitive_return_types=True, get_column_major=True)[source]

For a given table, retrieves the values from the requested column(s). Maps of column name to the array of values as well as the column data type are returned. This endpoint supports pagination with input parameter offset and input parameter limit.

Window functions, which can perform operations like moving averages, are available through this endpoint as well as GPUdb.create_projection().

When using pagination, if the table (or the underlying table in the case of a view) is modified (records are inserted, updated, or deleted) during a call to the endpoint, the records or values retrieved may differ between calls based on the type of the update; i.e., contiguity across pages cannot be relied upon.

If input parameter table_name is empty, selection is performed against a single-row virtual table. This can be useful in executing temporal (NOW()), identity (USER()), or constant-based functions (GEODIST(-77.11, 38.88, -71.06, 42.36)).

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

Parameters

table_name (str) –
Name of the table or view on which this operation will be performed, in [schema_name.]table_name format, using standard name resolution rules. An empty table name retrieves one record from a single-row virtual table, where columns specified should be constants or constant expressions.

column_names (list of str) –
The list of column values to retrieve. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

offset (long) –
A non-negative integer indicating the number of initial results to skip (this can be useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.

limit (long) –
A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results allowed by the server should be returned. The number of records returned will never exceed the server’s own limit, defined by the max_get_records_size parameter in the server configuration. Use output parameter has_more_records to see if more records exist in the result to be fetched, and input parameter offset & input parameter limit to request subsequent pages of results. The default value is -9999.

encoding (str) –
Specifies the encoding for returned records; either binary or json. Allowed values are:

binary

json

The default value is ‘binary’.

options (dict of str to str) –
Allowed keys are:

expression – Optional filter expression to apply to the table.

sort_by – Optional column that the data should be sorted by. Used in conjunction with sort_order. The order_by option can be used in lieu of sort_by / sort_order. The default value is ‘’.

sort_order – String indicating how the returned values should be sorted - ascending or descending. If sort_order is provided, sort_by has to be provided. Allowed values are:

ascending

descending

The default value is ‘ascending’.

order_by – Comma-separated list of the columns to be sorted by as well as the sort direction, e.g., ‘timestamp asc, x desc’. The default value is ‘’.

convert_wkts_to_wkbs – If true, then WKT string columns will be returned as WKB bytes. Allowed values are:

true

false

The default value is ‘false’.

route_to_tom – For multihead record retrieval without shard key expression - specifies from which tom to retrieve data.

The default value is an empty dict ( {} ).

record_type (RecordType or None) –
The record type expected in the results, or None to determine the appropriate type automatically. If known, providing this may improve performance in binary mode. Not used in JSON mode. The default value is None.

force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, Python datetime structs, used for datetime type columns, would have their values returned as strings. If False, then Record objects will be returned, which, for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. The default value is True.

get_column_major (bool) –
Indicates if the decoded records will be transposed to be column-major or returned as is (row-major). Default value is True.

Returns

A dict with the following entries–

table_name (str) –
The same table name as was passed in the parameter list.

response_schema_str (str) –
Avro schema of output parameter binary_encoded_response or output parameter json_encoded_response.

total_number_of_records (long) –
Total/Filtered number of records.

has_more_records (bool) –
True if more records exist than were returned, i.e. the response contains only a partial set.

info (dict of str to str) –
Additional information.

records (list of Record) –
A list of Record objects which contain the decoded records.
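The difference between row-major and column-major results, as selected by get_column_major, can be illustrated in plain Python. The sketch below is not the library's implementation and makes no claim about the exact type the client returns; `to_column_major` is a hypothetical helper that transposes a list of row dicts into a mapping of column name to list of values.

```python
from collections import OrderedDict


def to_column_major(rows):
    """Transpose row-major records (list of dicts) into column name -> values."""
    columns = OrderedDict()
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns
```

Column-major data is convenient when each column is processed as a whole (for example, summing all x values), while row-major data suits per-record handling.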

get_records_by_series(table_name=None, world_table_name=None, offset=0, limit=250, encoding='binary', options={})[source]

Retrieves the complete series/track records from the given input parameter world_table_name based on the partial track information contained in the input parameter table_name.

This operation supports paging through the data via input parameter offset and input parameter limit.

In contrast to GPUdb.get_records(), this returns records grouped by series/track. So if input parameter offset is 0 and input parameter limit is 5, this operation would return the first 5 series/tracks in input parameter table_name. Each series/track will be returned sorted by its TIMESTAMP column.

Parameters

table_name (str) –
Name of the table or view for which series/tracks will be fetched, in [schema_name.]table_name format, using standard name resolution rules.

world_table_name (str) –
Name of the table containing the complete series/track information to be returned for the tracks present in the input parameter table_name, in [schema_name.]table_name format, using standard name resolution rules. Typically this is used when retrieving series/tracks from a view (which contains partial series/tracks) but the user wants to retrieve the entire original series/tracks. Can be blank.

offset (int) –
A non-negative integer indicating the number of initial series/tracks to skip (useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.

limit (int) –
A positive integer indicating the maximum number of series/tracks to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results should be returned. The default value is 250.

encoding (str) –
Specifies the encoding for returned records; either binary or json. Allowed values are:

binary

json

The default value is ‘binary’.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_names (list of str) –
The table name (one per series/track) of the returned series/tracks.

type_names (list of str) –
The type IDs (one per series/track) of the returned series/tracks.

type_schemas (list of str) –
The type schemas (one per series/track) of the returned series/tracks.

list_records_binary (list of lists of bytes) –
If the encoding parameter of the request was ‘binary’ then this list-of-lists contains the binary encoded records for each object (inner list) in each series/track (outer list). Otherwise, empty list-of-lists.

list_records_json (list of lists of str) –
If the encoding parameter of the request was ‘json’ then this list-of-lists contains the json encoded records for each object (inner list) in each series/track (outer list). Otherwise, empty list-of-lists.

info (dict of str to str) –
Additional information.

record_types (list of RecordType) –
A list of RecordType objects that can be used to decode the binary data, per record, via GPUdbRecord.decode_binary_data().
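When json encoding is requested, the list-of-lists layout of list_records_json (outer list per series/track, inner list per record) can be unpacked with the standard json module. The following is a minimal sketch; `decode_series` is a hypothetical helper, and the sample response uses made-up column names.

```python
import json


def decode_series(response):
    """Parse list_records_json into a list of tracks, each a list of dicts."""
    return [
        [json.loads(record) for record in track]
        for track in response["list_records_json"]
    ]


# Sample response shaped like the documented return value (contents invented).
sample_response = {
    "table_names": ["demo.tracks", "demo.tracks"],
    "list_records_json": [
        ['{"track": "a", "x": 1}', '{"track": "a", "x": 2}'],
        ['{"track": "b", "x": 9}'],
    ],
}
tracks = decode_series(sample_response)
```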

get_records_by_series_and_decode(table_name=None, world_table_name=None, offset=0, limit=250, encoding='binary', options={}, force_primitive_return_types=True)[source]

Retrieves the complete series/track records from the given input parameter world_table_name based on the partial track information contained in the input parameter table_name.

This operation supports paging through the data via input parameter offset and input parameter limit.

In contrast to GPUdb.get_records(), this returns records grouped by series/track. So if input parameter offset is 0 and input parameter limit is 5, this operation would return the first 5 series/tracks in input parameter table_name. Each series/track will be returned sorted by its TIMESTAMP column.

Parameters

table_name (str) –
Name of the table or view for which series/tracks will be fetched, in [schema_name.]table_name format, using standard name resolution rules.

world_table_name (str) –
Name of the table containing the complete series/track information to be returned for the tracks present in the input parameter table_name, in [schema_name.]table_name format, using standard name resolution rules. Typically this is used when retrieving series/tracks from a view (which contains partial series/tracks) but the user wants to retrieve the entire original series/tracks. Can be blank.

offset (int) –
A non-negative integer indicating the number of initial series/tracks to skip (useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.

limit (int) –
A positive integer indicating the maximum number of series/tracks to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results should be returned. The default value is 250.

encoding (str) –
Specifies the encoding for returned records; either binary or json. Allowed values are:

binary

json

The default value is ‘binary’.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which, for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. The default value is True.

Returns

A dict with the following entries–

table_names (list of str) –
The table name (one per series/track) of the returned series/tracks.

type_names (list of str) –
The type IDs (one per series/track) of the returned series/tracks.

type_schemas (list of str) –
The type schemas (one per series/track) of the returned series/tracks.

info (dict of str to str) –
Additional information.

records (list of list of Record) –
A list of list of Record objects which contain the decoded records.
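
The offset/limit paging described above can be wrapped in a small generator. This is a minimal sketch, not part of the API: iter_series is a hypothetical helper name, db is assumed to be a connected GPUdb instance, and the loop assumes the server returns full pages until the series/tracks are exhausted.

```python
def iter_series(db, table_name, world_table_name="", page_size=250):
    """Yield one list of decoded records per series/track, page by page.

    Hypothetical helper: `db` is assumed to be a connected GPUdb client.
    """
    offset = 0
    while True:
        response = db.get_records_by_series_and_decode(
            table_name=table_name,
            world_table_name=world_table_name,  # may be blank
            offset=offset,
            limit=page_size,
        )
        page = response["records"]  # list of list of Record
        for series in page:
            yield series
        if len(page) < page_size:
            break  # assumption: a short or empty page means nothing remains
        offset += len(page)
```

Each yielded item is the inner list for one series/track, so a caller can process tracks one at a time without holding every page in memory.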

get_records_from_collection(table_name=None, offset=0, limit=-9999, encoding='binary', options={})[source]

Retrieves records from a collection. The operation can optionally return the record IDs which can be used in certain queries such as GPUdb.delete_records().

This operation supports paging through the data via the input parameters offset and limit.

Note that when using the Java API, it is not possible to retrieve records from join views using this operation. (DEPRECATED)

Parameters

table_name (str) –
Name of the collection or table from which records are to be retrieved, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing collection or table.

offset (long) –
A non-negative integer indicating the number of initial results to skip (this can be useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.

limit (long) –
A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results allowed by the server should be returned. The number of records returned will never exceed the server’s own limit, defined by the max_get_records_size parameter in the server configuration. Use input parameter offset & input parameter limit to request subsequent pages of results. The default value is -9999.

encoding (str) –
Specifies the encoding for returned records; either binary or json. Allowed values are:

binary

json

The default value is ‘binary’.

options (dict of str to str) –
Allowed keys are:

return_record_ids – If true then return the internal record ID along with each returned record. Allowed values are:

true

false

The default value is ‘false’.

expression – Optional filter expression to apply to the table. The default value is ‘’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name.

type_names (list of str) –
The type IDs of the corresponding records in output parameter records_binary or output parameter records_json. This is useful when input parameter table_name is a heterogeneous collection (a collection containing tables of different types).

records_binary (list of bytes) –
If the encoding parameter of the request was ‘binary’ then this list contains the binary encoded records retrieved from the table/collection. Otherwise, empty list.

records_json (list of str) –
If the encoding parameter of the request was ‘json’, then this list contains the JSON encoded records retrieved from the table/collection. Otherwise, empty list.

record_ids (list of str) –
If the ‘return_record_ids’ option of the request was ‘true’, then this list contains the internal ID for each object. Otherwise it will be empty.

info (dict of str to str) –
Additional information. Allowed keys are:

total_number_of_records – Total number of records.

has_more_records – Too many records. Returned a partial set. Allowed values are:

true

false

The default value is an empty dict ( {} ).

record_types (list of RecordType) –
A list of RecordType objects using which the user can decode the binary data by using GPUdbRecord.decode_binary_data() per record.

get_records_from_collection_and_decode(table_name=None, offset=0, limit=-9999, encoding='binary', options={}, force_primitive_return_types=True)[source]

Retrieves records from a collection. The operation can optionally return the record IDs which can be used in certain queries such as GPUdb.delete_records().

This operation supports paging through the data via the input parameters offset and limit.

Note that when using the Java API, it is not possible to retrieve records from join views using this operation. (DEPRECATED)

Parameters

table_name (str) –
Name of the collection or table from which records are to be retrieved, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing collection or table.

offset (long) –
A non-negative integer indicating the number of initial results to skip (this can be useful for paging through the results). The default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.

limit (long) –
A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the maximum number of results allowed by the server should be returned. The number of records returned will never exceed the server’s own limit, defined by the max_get_records_size parameter in the server configuration. Use input parameter offset & input parameter limit to request subsequent pages of results. The default value is -9999.

encoding (str) –
Specifies the encoding for returned records; either binary or json. Allowed values are:

binary

json

The default value is ‘binary’.

options (dict of str to str) –
Allowed keys are:

return_record_ids – If true then return the internal record ID along with each returned record. Allowed values are:

true

false

The default value is ‘false’.

expression – Optional filter expression to apply to the table. The default value is ‘’.

The default value is an empty dict ( {} ).

force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which, for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. The default value is True.

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name.

type_names (list of str) –
The type IDs of the corresponding records in output parameter records. This is useful when input parameter table_name is a heterogeneous collection (a collection containing tables of different types).

record_ids (list of str) –
If the ‘return_record_ids’ option of the request was ‘true’, then this list contains the internal ID for each object. Otherwise it will be empty.

info (dict of str to str) –
Additional information. Allowed keys are:

total_number_of_records – Total number of records.

has_more_records – Too many records. Returned a partial set. Allowed values are:

true

false

The default value is an empty dict ( {} ).

records (list of Record) –
A list of Record objects which contain the decoded records.
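
The has_more_records entry of info can drive a paging loop. The following is a minimal sketch under stated assumptions: fetch_all_records is a hypothetical helper name, db is assumed to be a connected GPUdb instance, and a missing has_more_records key is treated as 'false'.

```python
def fetch_all_records(db, table_name, page_size=1000, expression=""):
    """Collect decoded records page by page until the server reports no more.

    Hypothetical helper: `db` is assumed to be a connected GPUdb client.
    """
    options = {"expression": expression} if expression else {}
    records = []
    offset = 0
    while True:
        response = db.get_records_from_collection_and_decode(
            table_name=table_name,
            offset=offset,
            limit=page_size,
            options=options,
        )
        page = response["records"]
        records.extend(page)
        offset += len(page)
        # info is a dict of str to str, so the flag arrives as 'true'/'false'.
        has_more = response["info"].get("has_more_records", "false") == "true"
        if not page or not has_more:
            break
    return records
```

The empty-page check guards against looping forever if the flag stays 'true' while no further records arrive.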

grant_permission(principal='', object=None, object_type=None, permission=None, options={})[source]

Grants a user or role the specified permission on the specified object.

Parameters

principal (str) –
Name of the user or role for which the permission is being granted. Must be an existing user or role. The default value is ‘’.

object (str) –
Name of the object on which the permission is being granted. It is recommended to use a fully-qualified name when possible.

object_type (str) –
The type of object on which the permission is being granted. Allowed values are:

context – Context

credential – Credential

datasink – Data Sink

datasource – Data Source

directory – KiFS File Directory

graph – A Graph object

proc – UDF Procedure

schema – Schema

sql_proc – SQL Procedure

system – System-level access

table – Database Table

table_monitor – Table monitor

permission (str) –
Permission being granted. Allowed values are:

admin – Full read/write and administrative access on the object.

connect – Connect access on the given data source or data sink.

create – Ability to create new objects of this type.

delete – Delete rows from tables.

execute – Ability to Execute the Procedure object.

insert – Insert access to tables.

read – Ability to read, list and use the object.

send_alert – Ability to send system alerts.

update – Update access to the table.

user_admin – Access to administer users and roles that do not have system_admin permission.

write – Access to write, change and delete objects.

options (dict of str to str) –
Optional parameters. Allowed keys are:

columns – Apply table security to these columns, comma-separated. The default value is ‘’.

filter_expression – Optional filter expression to apply to this grant. Only rows that match the filter will be affected. The default value is ‘’.

with_grant_option – Allow the recipient to grant the same permission (or subset) to others. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

principal (str) –
Value of input parameter principal.

object (str) –
Value of input parameter object.

object_type (str) –
Value of input parameter object_type.

permission (str) –
Value of input parameter permission.

info (dict of str to str) –
Additional information.
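
A column- and row-restricted read grant combines the columns and filter_expression options. This is a minimal sketch: grant_read_on_columns is a hypothetical helper name, db is assumed to be a connected GPUdb instance, and the table, column, and filter values a caller passes are the caller's own.

```python
def grant_read_on_columns(db, principal, table_name, columns, filter_expression=""):
    """Grant read on selected columns of a table, optionally limited to matching rows.

    Hypothetical helper: `db` is assumed to be a connected GPUdb client.
    """
    options = {"columns": ",".join(columns)}  # the option expects a comma-separated list
    if filter_expression:
        options["filter_expression"] = filter_expression
    return db.grant_permission(
        principal=principal,
        object=table_name,
        object_type="table",
        permission="read",
        options=options,
    )
```

The response echoes the principal, object, object_type, and permission that were passed in, which can be logged as a record of the grant.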

grant_permission_credential(name=None, permission=None, credential_name=None, options={})[source]

Grants a credential-level permission to a user or role.

Parameters

name (str) –
Name of the user or role to which the permission will be granted. Must be an existing user or role.

permission (str) –
Permission to grant to the user or role. Allowed values are:

credential_admin – Full read/write and administrative access on the credential.

credential_read – Ability to read and use the credential.

credential_name (str) –
Name of the credential on which the permission will be granted. Must be an existing credential, or an empty string to grant access on all credentials.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

permission (str) –
Value of input parameter permission.

credential_name (str) –
Value of input parameter credential_name.

info (dict of str to str) –
Additional information.

grant_permission_datasource(name=None, permission=None, datasource_name=None, options={})[source]

Grants a data source permission to a user or role.

Parameters

name (str) –
Name of the user or role to which the permission will be granted. Must be an existing user or role.

permission (str) –
Permission to grant to the user or role. Allowed values are:

admin – Admin access on the given data source

connect – Connect access on the given data source

datasource_name (str) –
Name of the data source on which the permission will be granted. Must be an existing data source, or an empty string to grant permission on all data sources.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

permission (str) –
Value of input parameter permission.

datasource_name (str) –
Value of input parameter datasource_name.

info (dict of str to str) –
Additional information.

grant_permission_directory(name=None, permission=None, directory_name=None, options={})[source]

Grants a KiFS directory-level permission to a user or role.

Parameters

name (str) –
Name of the user or role to which the permission will be granted. Must be an existing user or role.

permission (str) –
Permission to grant to the user or role. Allowed values are:

directory_read – For files in the directory, access to list files, download files, or use files in server side functions

directory_write – Access to upload files to, or delete files from, the directory. A user or role with write access automatically has read access

directory_name (str) –
Name of the KiFS directory to which the permission grants access. An empty directory name grants access to all KiFS directories

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

permission (str) –
Value of input parameter permission.

directory_name (str) –
Value of input parameter directory_name.

info (dict of str to str) –
Additional information.

grant_permission_proc(name=None, permission=None, proc_name=None, options={})[source]

Grants a proc-level permission to a user or role.

Parameters

name (str) –
Name of the user or role to which the permission will be granted. Must be an existing user or role.

permission (str) –
Permission to grant to the user or role. Allowed values are:

proc_admin – Admin access to the proc.

proc_execute – Execute access to the proc.

proc_name (str) –
Name of the proc to which the permission grants access. Must be an existing proc, or an empty string to grant access to all procs.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

permission (str) –
Value of input parameter permission.

proc_name (str) –
Value of input parameter proc_name.

info (dict of str to str) –
Additional information.

grant_permission_system(name=None, permission=None, options={})[source]

Grants a system-level permission to a user or role.

Parameters

name (str) –
Name of the user or role to which the permission will be granted. Must be an existing user or role.

permission (str) –
Permission to grant to the user or role. Allowed values are:

system_admin – Full access to all data and system functions.

system_user_admin – Access to administer users and roles that do not have system_admin permission.

system_write – Read and write access to all tables.

system_read – Read-only access to all tables.

system_send_alert – Send system alerts.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

permission (str) –
Value of input parameter permission.

info (dict of str to str) –
Additional information.

grant_permission_table(name=None, permission=None, table_name=None, filter_expression='', options={})[source]

Grants a table-level permission to a user or role.

Parameters

name (str) –
Name of the user or role to which the permission will be granted. Must be an existing user or role.

permission (str) –
Permission to grant to the user or role. Allowed values are:

table_admin – Full read/write and administrative access to the table.

table_insert – Insert access to the table.

table_update – Update access to the table.

table_delete – Delete access to the table.

table_read – Read access to the table.

table_name (str) –
Name of the table to which the permission grants access, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table, view, or schema. If a schema, the permission also applies to tables and views in the schema.

filter_expression (str) –
Optional filter expression to apply to this grant. Only rows that match the filter will be affected. The default value is ‘’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

columns – Apply security to these columns, comma-separated. The default value is ‘’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

permission (str) –
Value of input parameter permission.

table_name (str) –
Value of input parameter table_name.

filter_expression (str) –
Value of input parameter filter_expression.

info (dict of str to str) –
Additional information.
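
Because permission accepts only the values listed above, a caller may want to validate names before contacting the server. The following is a minimal sketch: TABLE_PERMISSIONS and grant_table_permissions are hypothetical names, and db is assumed to be a connected GPUdb instance.

```python
# Allowed values copied from the permission parameter description above.
TABLE_PERMISSIONS = (
    "table_admin",
    "table_insert",
    "table_update",
    "table_delete",
    "table_read",
)


def grant_table_permissions(db, name, table_name, permissions, filter_expression=""):
    """Grant several table-level permissions, rejecting unknown names up front.

    Hypothetical helper: `db` is assumed to be a connected GPUdb client.
    """
    unknown = [p for p in permissions if p not in TABLE_PERMISSIONS]
    if unknown:
        raise ValueError("unsupported table permission(s): " + ", ".join(unknown))
    return [
        db.grant_permission_table(
            name=name,
            permission=permission,
            table_name=table_name,
            filter_expression=filter_expression,
        )
        for permission in permissions
    ]
```

Validating first means that a typo in one permission name does not leave the earlier grants in the list applied.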

grant_role(role=None, member=None, options={})[source]

Grants membership in a role to a user or role.

Parameters

role (str) –
Name of the role in which membership will be granted. Must be an existing role.

member (str) –
Name of the user or role that will be granted membership in input parameter role. Must be an existing user or role.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

role (str) –
Value of input parameter role.

member (str) –
Value of input parameter member.

info (dict of str to str) –
Additional information.

has_permission(principal='', object=None, object_type=None, permission=None, options={})[source]

Checks if the specified user has the specified permission on the specified object.

Parameters

principal (str) –
Name of the user for which the permission is being checked. Must be an existing user. If blank, will use the current user. The default value is ‘’.

object (str) –
Name of object to check for the requested permission. It is recommended to use a fully-qualified name when possible.

object_type (str) –
The type of object being checked. Allowed values are:

context – Context

credential – Credential

datasink – Data Sink

datasource – Data Source

directory – KiFS File Directory

graph – A Graph object

proc – UDF Procedure

schema – Schema

sql_proc – SQL Procedure

system – System-level access

table – Database Table

table_monitor – Table monitor

permission (str) –
Permission to check for. Allowed values are:

admin – Full read/write and administrative access on the object.

connect – Connect access on the given data source or data sink.

create – Ability to create new objects of this type.

delete – Delete rows from tables.

execute – Ability to Execute the Procedure object.

insert – Insert access to tables.

read – Ability to read, list and use the object.

send_alert – Ability to send system alerts.

update – Update access to the table.

user_admin – Access to administer users and roles that do not have system_admin permission.

write – Access to write, change and delete objects.

options (dict of str to str) –
Optional parameters. Allowed keys are:

no_error_if_not_exists – If false, an error is returned if the provided input parameter object does not exist or is blank. If true, output parameter has_permission is returned as false instead. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

principal (str) –
Value of input parameter principal

object (str) –
Fully-qualified value of input parameter object

object_type (str) –
Value of input parameter object_type

permission (str) –
Value of input parameter permission

has_permission (bool) –
Indicates whether the specified user has the specified permission on the specified target. Allowed values are:

True – User has the effective queried permission

False – User does not have the queried permission

filters (dict of str to str) –
Map of column/filters that have been granted.

info (dict of str to str) –
Additional information.
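
A permission check that never raises for a missing object can be built on the no_error_if_not_exists option. This is a minimal sketch: can_access is a hypothetical helper name and db is assumed to be a connected GPUdb instance.

```python
def can_access(db, object_name, object_type, permission, principal=""):
    """Return True only if the principal holds the permission on the object.

    Hypothetical helper: `db` is assumed to be a connected GPUdb client.
    A blank principal means the current user, as documented above.
    """
    response = db.has_permission(
        principal=principal,
        object=object_name,
        object_type=object_type,
        permission=permission,
        # A missing object is reported as has_permission == False, not as an error.
        options={"no_error_if_not_exists": "true"},
    )
    return bool(response["has_permission"])
```

With this option set, a False result does not distinguish a missing object from a missing permission; GPUdb.has_table() or a similar existence check can separate the two cases.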

has_proc(proc_name=None, options={})[source]

Checks for the existence of a proc with the given name.

Parameters

proc_name (str) –
Name of the proc to check for existence.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

proc_name (str) –
Value of input parameter proc_name

proc_exists (bool) –
Indicates whether the proc exists or not. Allowed values are:

True

False

info (dict of str to str) –
Additional information.

has_role(principal='', role=None, options={})[source]

Checks if the specified user has the specified role.

Parameters

principal (str) –
Name of the user for which role membership is being checked. Must be an existing user. If blank, will use the current user. The default value is ‘’.

role (str) –
Name of role to check for membership.

options (dict of str to str) –
Optional parameters. Allowed keys are:

no_error_if_not_exists – If false, an error is returned if the provided input parameter role does not exist or is blank. If true, output parameter has_role is returned as false instead. Allowed values are:

true

false

The default value is ‘false’.

only_direct – If false, the search is recursive: input parameter principal may be a member of input parameter role either directly or through other roles. If true, input parameter principal must directly be a member of input parameter role. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

principal (str) –
Value of input parameter principal

role (str) –
Input parameter role for which membership is being checked

has_role (bool) –
Indicates whether the specified user is a member of the role given in input parameter role. Allowed values are:

True – User has membership in the role

False – User does not have membership in the role

info (dict of str to str) –
Additional information. Allowed keys are:

direct – true when principal is directly a member of the role. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).
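
The has_role flag and the direct entry of info together distinguish three cases. The following is a minimal sketch: role_membership is a hypothetical helper name, the labels 'none', 'direct', and 'inherited' are this sketch's own, and db is assumed to be a connected GPUdb instance.

```python
def role_membership(db, role, principal=""):
    """Classify a principal's membership in a role.

    Hypothetical helper: `db` is assumed to be a connected GPUdb client.
    Returns 'none', 'direct', or 'inherited' (labels chosen for this sketch).
    """
    response = db.has_role(
        principal=principal,
        role=role,
        # only_direct is left at its default ('false'), so the search is recursive.
        options={"no_error_if_not_exists": "true"},
    )
    if not response["has_role"]:
        return "none"
    # info values are strings; 'direct' defaults to 'false' when absent.
    direct = response["info"].get("direct", "false") == "true"
    return "direct" if direct else "inherited"
```

One recursive call is enough here because the response reports whether the membership it found is direct.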

has_schema(schema_name=None, options={})[source]

Checks for the existence of a schema with the given name.

Parameters

schema_name (str) –
Name of the schema to check for existence, in root, using standard name resolution rules.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

schema_name (str) –
Value of input parameter schema_name

schema_exists (bool) –
Indicates whether the schema exists or not. Allowed values are:

True

False

info (dict of str to str) –
Additional information.

has_table(table_name=None, options={})[source]

Checks for the existence of a table with the given name.

Parameters

table_name (str) –
Name of the table to check for existence, in [schema_name.]table_name format, using standard name resolution rules.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name

table_exists (bool) –
Indicates whether the table exists or not. Allowed values are:

True

False

info (dict of str to str) –
Additional information.
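
An existence check is often run over several tables before a job starts. This is a minimal sketch: missing_tables is a hypothetical helper name and db is assumed to be a connected GPUdb instance.

```python
def missing_tables(db, table_names):
    """Return the names for which has_table() reports table_exists as False.

    Hypothetical helper: `db` is assumed to be a connected GPUdb client.
    Names use the [schema_name.]table_name format described above.
    """
    missing = []
    for table_name in table_names:
        response = db.has_table(table_name=table_name)
        if not response["table_exists"]:
            missing.append(table_name)
    return missing
```

The input order is preserved, so the result can be reported to the user as given.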

has_type(type_id=None, options={})[source]

Checks for the existence of a type.

Parameters

type_id (str) –
ID of the type returned in response to a GPUdb.create_type() request.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

type_id (str) –
Value of input parameter type_id.

type_exists (bool) –
Indicates whether the type exists or not. Allowed values are:

True

False

info (dict of str to str) –
Additional information.

insert_records(table_name=None, data=None, list_encoding=None, options={}, record_type=None)[source]

Adds multiple records to the specified table. The operation is synchronous, meaning that a response will not be returned until all the records are fully inserted and available. The response payload provides the counts of the number of records actually inserted and/or updated, and can provide the unique identifier of each added record.

The options input parameter can be used to customize this function’s behavior.

The update_on_existing_pk option specifies the record collision policy for inserting into a table with a primary key, but is ignored if no primary key exists.

The return_record_ids option indicates that the database should return the unique identifiers of inserted records.

Parameters

table_name (str) –
Name of table to which the records are to be added, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table.

data (list of Records) –
An array of binary or json encoded data, or Record objects for the records to be added. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

list_encoding (str) –
The encoding of the records to be inserted. Allowed values are:

binary

json

The default value is ‘binary’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

update_on_existing_pk – Specifies the record collision policy for inserting into a table with a primary key. If set to true, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record (the new data will be “upserted”). If set to false, any existing table record with primary key values that match those of a record being inserted will remain unchanged, while the new record will be rejected and the error handled as determined by ignore_existing_pk, allow_partial_batch, & return_individual_errors. If the specified table does not have a primary key, then this option has no effect. Allowed values are:

true – Upsert new records when primary keys match existing records

false – Reject new records when primary keys match existing records

The default value is ‘false’.

ignore_existing_pk – Specifies the record collision error-suppression policy for inserting into a table with a primary key, only used when not in upsert mode (upsert mode is disabled when update_on_existing_pk is false). If set to true, any record being inserted that is rejected for having primary key values that match those of an existing table record will be ignored with no error generated. If false, the rejection of any record for having primary key values matching an existing record will result in an error being reported, as determined by allow_partial_batch & return_individual_errors. If the specified table does not have a primary key or if upsert mode is in effect (update_on_existing_pk is true), then this option has no effect. Allowed values are:

true – Ignore new records whose primary key values collide with those of existing records

false – Treat as errors any new records whose primary key values collide with those of existing records

The default value is ‘false’.

pk_conflict_predicate_higher – The record with higher value for the column resolves the primary-key insert conflict. The default value is ‘’.

pk_conflict_predicate_lower – The record with lower value for the column resolves the primary-key insert conflict. The default value is ‘’.

return_record_ids – If true then return the internal record ID for each inserted record. Allowed values are:

true

false

The default value is ‘false’.

truncate_strings – If set to true, any strings which are too long for their target charN string columns will be truncated to fit. Allowed values are:

true

false

The default value is ‘false’.

return_individual_errors – If set to true, success will always be returned, and any errors found will be included in the info map. The “bad_record_indices” entry is a comma-separated list of the indices (0-based) of bad records. If any records are bad, there will also be an “error_N” entry for each record with an error, where N is the index (0-based). Allowed values are:

true

false

The default value is ‘false’.

allow_partial_batch – If set to true, all correct records will be inserted and incorrect records will be rejected and reported. Otherwise, the entire batch will be rejected if any records are incorrect. Allowed values are:

true

false

The default value is ‘false’.

dry_run – If set to true, no data will be saved and any errors will be returned. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

record_type (RecordType) –
A RecordType object using which the binary data will be encoded. If None, then it is assumed that the data is already encoded, and no further encoding will occur. Default is None.

Returns

A dict with the following entries–

record_ids (list of str) –
An array containing the IDs with which the added records are identified internally.

count_inserted (int) –
The number of records inserted.

count_updated (int) –
The number of records updated.

info (dict of str to str) –
Additional information. Allowed keys are:

bad_record_indices – If return_individual_errors option is specified or implied, returns a comma-separated list of invalid indices (0-based)

error_N – Error message for record at index N (0-based)
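
The bad_record_indices and error_N entries can be turned into a per-record error map. The following is a minimal sketch: insert_json_rows is a hypothetical helper name, db is assumed to be a connected GPUdb instance, and rows are assumed to be plain dicts whose keys match the table's column names.

```python
import json


def insert_json_rows(db, table_name, rows):
    """Insert dict rows as JSON and return (count_inserted, {index: error message}).

    Hypothetical helper: `db` is assumed to be a connected GPUdb client.
    """
    response = db.insert_records(
        table_name=table_name,
        data=[json.dumps(row) for row in rows],
        list_encoding="json",
        options={
            "return_individual_errors": "true",  # report bad records in info
            "allow_partial_batch": "true",       # still insert the good records
        },
    )
    info = response["info"]
    raw_indices = info.get("bad_record_indices", "")
    bad = [int(index) for index in raw_indices.split(",") if index.strip()]
    errors = {index: info.get("error_%d" % index, "") for index in bad}
    return response["count_inserted"], errors
```

The indices in the returned map are 0-based positions in rows, so a caller can look up and retry or log the rejected records.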

insert_records_from_files(table_name=None, filepaths=None, modify_columns={}, create_table_options={}, options={})[source]

Reads from one or more files and inserts the data into a new or existing table. The source data can be located either in KiFS; on the cluster, accessible to the database; or remotely, accessible via a pre-defined external data source.

For delimited text files, there are two loading schemes: positional and name-based. The name-based loading scheme is enabled when the file has a header present and text_has_header is set to true. In this scheme, the source file(s) field names must match the target table’s column names exactly; however, the source file can have more fields than the target table has columns. If error_handling is set to permissive, the source file can have fewer fields than the target table has columns. If the name-based loading scheme is being used, names matching the file header’s names may be provided to columns_to_load instead of numbers, but ranges are not supported.

Note: Due to data being loaded in parallel, there is no insertion order guaranteed. For tables with primary keys, in the case of a primary key collision, this means it is indeterminate which record will be inserted first and remain, while the rest of the colliding key records are discarded.

Returns once all files are processed.
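
The name-based loading scheme described above can be requested through options. This is a minimal sketch under stated assumptions: load_by_header is a hypothetical helper name, db is assumed to be a connected GPUdb instance, and passing columns_to_load as a comma-separated string of header names is an assumption not confirmed by the text above.

```python
def load_by_header(db, table_name, filepaths, columns=None, permissive=False):
    """Load delimited text files by matching header names to column names.

    Hypothetical helper: `db` is assumed to be a connected GPUdb client.
    """
    options = {"text_has_header": "true"}  # enables the name-based scheme
    if permissive:
        # Lets source files have fewer fields than the table has columns.
        options["error_handling"] = "permissive"
    if columns:
        # Assumption: header names are passed as one comma-separated string.
        options["columns_to_load"] = ",".join(columns)
    return db.insert_records_from_files(
        table_name=table_name,
        filepaths=filepaths,
        modify_columns={},
        create_table_options={},
        options=options,
    )
```

Since insertion order is not guaranteed, this sketch does not try to control which record survives a primary key collision.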

Parameters

table_name (str) –
Name of the table into which the data will be inserted, in [schema_name.]table_name format, using standard name resolution rules. If the table does not exist, the table will be created using either an existing type_id or the type inferred from the file, and the new table name will have to meet standard table naming criteria.

filepaths (list of str) –
A list of file paths from which data will be sourced:

For paths in KiFS, use the URI prefix of kifs:// followed by the path to a file or directory. File matching by prefix is supported, e.g. kifs://dir/file would match dir/file_1 and dir/file_2. When prefix matching is used, the path must start with a full, valid KiFS directory name.

If an external data source is specified in datasource_name, these file paths must resolve to accessible files at that data source location. Prefix matching is supported. If the data source is hdfs, prefixes must be aligned with directories, i.e. partial file names will not match.

If no data source is specified, the files are assumed to be local to the database and must all be accessible to the gpudb user, residing on the path (or relative to the path) specified by the external files directory in the Kinetica configuration file. Wildcards (*) can be used to specify a group of files. Prefix matching is supported, but the prefixes must be aligned with directories.

If the first path ends in .tsv, the text delimiter will be defaulted to a tab character. If the first path ends in .psv, the text delimiter will be defaulted to a pipe character (|). The user can provide a single element (which will be automatically promoted to a list internally) or a list.
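The delimiter defaulting rule can be mirrored on the client side to predict which delimiter a request will use. This is only an illustrative sketch: default_text_delimiter is a hypothetical helper, not part of the API, and it assumes the extension check is a case-sensitive suffix match.

```python
def default_text_delimiter(filepaths):
    """Predict the default text delimiter from the first file path."""
    # A single path is accepted as well, mirroring the promotion to a list.
    first = filepaths if isinstance(filepaths, str) else filepaths[0]
    if first.endswith(".tsv"):
        return "\t"   # tab for .tsv
    if first.endswith(".psv"):
        return "|"    # pipe for .psv
    return ","        # documented default of text_delimiter

print(repr(default_text_delimiter(["kifs://data/a.tsv", "kifs://data/b.csv"])))
```

Only the first path is inspected, so a mixed list such as the one above would default to a tab for every file unless text_delimiter is set explicitly.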

modify_columns (dict of str to dicts of str to str) –
Not implemented yet. The default value is an empty dict ( {} ).

create_table_options (dict of str to str) –
Options from GPUdb.create_table(), allowing the structure of the table to be defined independently of the data source, when creating the target table. Allowed keys are:

type_id – ID of a currently registered type.

no_error_if_exists – If true, prevents an error from occurring if the table already exists and is of the given type. If a table with the same name but a different type exists, it is still an error. Allowed values are:

true

false

The default value is ‘false’.

is_replicated – Affects the distribution scheme for the table’s data. If true and the given type has no explicit shard key defined, the table will be replicated. If false, the table will be sharded according to the shard key specified in the given type_id, or randomly sharded, if no shard key is specified. Note that a type containing a shard key cannot be used to create a replicated table. Allowed values are:

true

false

The default value is ‘false’.

foreign_keys – Semicolon-separated list of foreign keys, of the format ‘(source_column_name [, …]) references target_table_name(primary_key_column_name [, …]) [as foreign_key_name]’.

foreign_shard_key – Foreign shard key of the format ‘source_column references shard_by_column from target_table(primary_key_column)’.

partition_type – Partitioning scheme to use. Allowed values are:

RANGE – Use range partitioning.

INTERVAL – Use interval partitioning.

LIST – Use list partitioning.

HASH – Use hash partitioning.

SERIES – Use series partitioning.

partition_keys – Comma-separated list of partition keys, which are the columns or column expressions by which records will be assigned to partitions defined by partition_definitions.

partition_definitions – Comma-separated list of partition definitions, whose format depends on the choice of partition_type. See range partitioning, interval partitioning, list partitioning, hash partitioning, or series partitioning for example formats.

is_automatic_partition – If true, a new partition will be created for values which don’t fall into an existing partition. Currently, only supported for list partitions. Allowed values are:

true

false

The default value is ‘false’.

ttl – Sets the TTL of the table specified in input parameter table_name.

chunk_size – Indicates the number of records per chunk to be used for this table.

chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for this table.

chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for this table.

is_result_table – Indicates whether the table is a memory-only table. A result table cannot contain columns with text_search data-handling, and it will not be retained if the server is restarted. Allowed values are:

true

false

The default value is ‘false’.

strategy_definition – The tier strategy for the table and its columns.

compression_codec – The default compression codec for this table’s columns.

The default value is an empty dict ( {} ).
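Because create_table_options is a dict of str to str, Python booleans and integers need to be converted before being passed. The sketch below shows one way to do that; as_option_strings is a hypothetical helper and the chunk_size value is purely illustrative.

```python
def as_option_strings(settings):
    """Convert Python values into the str-to-str form expected by option dicts."""
    converted = {}
    for key, value in settings.items():
        if isinstance(value, bool):
            # Boolean options are documented as the strings 'true' and 'false'.
            converted[key] = "true" if value else "false"
        else:
            converted[key] = str(value)
    return converted

create_table_options = as_option_strings({
    "no_error_if_exists": True,
    "is_replicated": False,
    "chunk_size": 1000000,  # illustrative value only
})
print(create_table_options)
```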

options (dict of str to str) –
Optional parameters. Allowed keys are:

bad_record_table_name – Name of a table to which records that were rejected are written. The bad-record-table has the following columns: line_number (long), line_rejected (string), error_message (string). When error_handling is abort, the bad-record-table is not populated.

bad_record_table_limit – A positive integer indicating the maximum number of records that can be written to the bad-record-table. The default value is ‘10000’.

bad_record_table_limit_per_input – For subscriptions, a positive integer indicating the maximum number of records that can be written to the bad-record-table per file/payload. The default value is the value of bad_record_table_limit, and the total size of the table per rank is limited to bad_record_table_limit.

batch_size – Number of records to insert per batch when inserting data. The default value is ‘50000’.

column_formats – For each target column specified, applies the column-property-bound format to the source data loaded into that column. Each column format will contain a mapping of one or more of its column properties to an appropriate format for each property. Currently supported column properties include date, time, & datetime. The parameter value must be formatted as a JSON string of maps of column names to maps of column properties to their corresponding column formats, e.g., ‘{ “order_date” : { “date” : “%Y.%m.%d” }, “order_time” : { “time” : “%H:%M:%S” } }’.

See default_column_formats for valid format syntax.

columns_to_load – Specifies a comma-delimited list of columns from the source data to load. If more than one file is being loaded, this list applies to all files.

Column numbers can be specified discretely or as a range. For example, a value of ‘5,7,1..3’ will insert values from the fifth column in the source data into the first column in the target table, from the seventh column in the source data into the second column in the target table, and from the first through third columns in the source data into the third through fifth columns in the target table.

If the source data contains a header, column names matching the file header names may be provided instead of column numbers. If the target table doesn’t exist, the table will be created with the columns in this order. If the target table does exist with columns in a different order than the source data, this list can be used to match the order of the target table. For example, a value of ‘C, B, A’ will create a three column table with column C, followed by column B, followed by column A; or will insert those fields in that order into a table created with columns in that order. If the target table exists, the column names must match the source data field names for a name-mapping to be successful.

Mutually exclusive with columns_to_skip.
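The numeric form of columns_to_load can be worked through in a few lines of Python. The function below is a hypothetical illustration of the documented mapping, not the server's parser; it handles numbers and ranges only, not header names.

```python
def expand_columns_to_load(spec):
    """Expand a numeric columns_to_load value into (source, target) column pairs."""
    sources = []
    for part in spec.split(","):
        part = part.strip()
        if ".." in part:
            # A range such as 1..3 covers both endpoints.
            start, end = (int(n) for n in part.split(".."))
            sources.extend(range(start, end + 1))
        else:
            sources.append(int(part))
    # Target columns are filled left to right, starting at column 1.
    return [(src, tgt) for tgt, src in enumerate(sources, start=1)]

print(expand_columns_to_load("5,7,1..3"))
# [(5, 1), (7, 2), (1, 3), (2, 4), (3, 5)]
```

Each pair reads as (source column, target column), matching the ‘5,7,1..3’ example above.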

columns_to_skip – Specifies a comma-delimited list of columns from the source data to skip. Mutually exclusive with columns_to_load.

compression_type – Source data compression type. Allowed values are:

none – No compression.

auto – Auto detect compression type

gzip – gzip file compression.

bzip2 – bzip2 file compression.

The default value is ‘auto’.

datasource_name – Name of an existing external data source from which the data file(s) specified in input parameter filepaths will be loaded.

default_column_formats – Specifies the default format to be applied to source data loaded into columns with the corresponding column property. Currently supported column properties include date, time, & datetime. This default column-property-bound format can be overridden by specifying a column property & format for a given target column in column_formats. For each specified annotation, the format will apply to all columns with that annotation unless a custom column_formats for that annotation is specified.

The parameter value must be formatted as a JSON string that is a map of column properties to their respective column formats, e.g., ‘{ “date” : “%Y.%m.%d”, “time” : “%H:%M:%S” }’. Column formats are specified as a string of control characters and plain text. The supported control characters are ‘Y’, ‘m’, ‘d’, ‘H’, ‘M’, and ‘S’, which follow the Linux ‘strptime()’ specification, as well as ‘s’, which specifies seconds and fractional seconds (though the fractional component will be truncated past milliseconds).

Formats for the ‘date’ annotation must include the ‘Y’, ‘m’, and ‘d’ control characters. Formats for the ‘time’ annotation must include the ‘H’, ‘M’, and either ‘S’ or ‘s’ (but not both) control characters. Formats for the ‘datetime’ annotation must meet both the ‘date’ and ‘time’ control character requirements. For example, ‘{“datetime” : “%m/%d/%Y %H:%M:%S” }’ would be used to interpret text such as “05/04/2000 12:12:11”.
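Since both format options take JSON strings, building them with json.dumps avoids quoting mistakes. In this sketch the column name ship_date is hypothetical. The final lines use Python's own strptime as a rough local sanity check of the datetime example; the server's parser is a separate implementation and may differ in detail (for instance, the ‘s’ control character is not checked here).

```python
import json
from datetime import datetime

options = {
    # Defaults applied to every column carrying the given property.
    "default_column_formats": json.dumps({"date": "%Y.%m.%d", "time": "%H:%M:%S"}),
    # Per-column override; "ship_date" is a hypothetical column name.
    "column_formats": json.dumps({"ship_date": {"date": "%m/%d/%Y"}}),
}

# Local sanity check of the documented datetime example format.
parsed = datetime.strptime("05/04/2000 12:12:11", "%m/%d/%Y %H:%M:%S")
print(parsed.isoformat())
```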

error_handling – Specifies how errors should be handled upon insertion. Allowed values are:

permissive – Records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped.

ignore_bad_records – Malformed records are skipped.

abort – Stops current insertion and aborts entire operation when an error is encountered. Primary key collisions are considered abortable errors in this mode.

The default value is ‘abort’.

file_type – Specifies the type of the file(s) whose records will be inserted. Allowed values are:

avro – Avro file format

delimited_text – Delimited text file format; e.g., CSV, TSV, PSV, etc.

gdb – Esri/GDB file format

json – JSON file format

parquet – Apache Parquet file format

shapefile – ShapeFile file format

The default value is ‘delimited_text’.

flatten_columns – Specifies how to handle nested columns. Allowed values are:

true – Break up nested columns to multiple columns

false – Treat nested columns as json columns instead of flattening

The default value is ‘false’.

gdal_configuration_options – Comma-separated list of GDAL configuration options for the specific request, each in key=value form.

ignore_existing_pk – Specifies the record collision error-suppression policy for inserting into a table with a primary key, only used when not in upsert mode (upsert mode is disabled when update_on_existing_pk is false). If set to true, any record being inserted that is rejected for having primary key values that match those of an existing table record will be ignored with no error generated. If false, the rejection of any record for having primary key values matching an existing record will result in an error being reported, as determined by error_handling. If the specified table does not have a primary key or if upsert mode is in effect (update_on_existing_pk is true), then this option has no effect. Allowed values are:

true – Ignore new records whose primary key values collide with those of existing records

false – Treat as errors any new records whose primary key values collide with those of existing records

The default value is ‘false’.

ingestion_mode – Whether to do a full load, dry run, or perform a type inference on the source data. Allowed values are:

full – Run a type inference on the source data (if needed) and ingest

dry_run – Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of error_handling.

type_inference_only – Infer the type of the source data and return, without ingesting any data. The inferred type is returned in the response.

The default value is ‘full’.
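A common use of ingestion_mode is to preview the inferred type before loading anything. The sketch below assumes that the inferred type arrives in the type_definition entry of the response (a JSON string, per the Returns section); preview_inferred_type is a hypothetical helper and db is assumed to be a connected GPUdb instance.

```python
import json

def preview_inferred_type(db, table_name, filepaths):
    """Run type inference only and return the inferred type as a Python object."""
    response = db.insert_records_from_files(
        table_name=table_name,
        filepaths=filepaths,
        modify_columns={},
        create_table_options={},
        options={"ingestion_mode": "type_inference_only"},  # no data is ingested
    )
    # Assumption: the inferred type is carried in 'type_definition'.
    return json.loads(response["type_definition"])
```

If the inferred type looks right, the same call can be repeated with ingestion_mode set to ‘full’ to perform the load.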

kafka_consumers_per_rank – Number of Kafka consumer threads per rank (valid range 1-6). The default value is ‘1’.

kafka_group_id – The group id to be used when consuming data from a Kafka topic (valid only for Kafka datasource subscriptions).

kafka_offset_reset_policy – Policy to determine whether the Kafka data consumption starts either at earliest offset or latest offset. Allowed values are:

earliest

latest

The default value is ‘earliest’.

kafka_optimistic_ingest – Enable optimistic ingestion where Kafka topic offsets and table data are committed independently to achieve parallelism. Allowed values are:

true

false

The default value is ‘false’.

kafka_subscription_cancel_after – Sets the Kafka subscription lifespan (in minutes). An expired subscription will be cancelled automatically.

kafka_type_inference_fetch_timeout – Maximum time to collect Kafka messages before type inferencing on the set of them.

layer – Comma-separated list of geo file layer names.

loading_mode – Scheme for distributing the extraction and loading of data from the source data file(s). This option applies only when loading files that are local to the database. Allowed values are:

head – The head node loads all data. All files must be available to the head node.

distributed_shared – The head node coordinates loading data by worker processes across all nodes from shared files available to all workers.

NOTE:

Instead of existing on a shared source, the files can be duplicated on a source local to each host to improve performance, though the files must appear as the same data set from the perspective of all hosts performing the load.

distributed_local – A single worker process on each node loads all files that are available to it. This option works best when each worker loads files from its own file system, to maximize performance. In order to avoid data duplication, either each worker performing the load needs to have visibility to a set of files unique to it (no file is visible to more than one node) or the target table needs to have a primary key (which will allow the worker to automatically deduplicate data).

NOTE:

If the target table doesn’t exist, the table structure will be determined by the head node. If the head node has no files local to it, it will be unable to determine the structure and the request will fail.

If the head node is configured to have no worker processes, no data strictly accessible to the head node will be loaded.

The default value is ‘head’.

local_time_offset – Apply an offset to Avro local timestamp columns.

max_records_to_load – Limit the number of records to load in this request: if this number is larger than batch_size, then the number of records loaded will be limited to the next whole number of batch_size (per working thread).
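The rounding rule for max_records_to_load can be read as rounding up to the next multiple of batch_size. The sketch below encodes that reading only; effective_record_limit is a hypothetical helper, the behaviour for limits at or below batch_size is an assumption, and the per-thread aspect is not modelled.

```python
import math

def effective_record_limit(max_records_to_load, batch_size=50000):
    """Estimate the record limit per working thread under the rounding rule."""
    if max_records_to_load <= batch_size:
        # Assumption: limits not larger than one batch are applied as given.
        return max_records_to_load
    # Larger limits are rounded up to the next whole multiple of batch_size.
    return math.ceil(max_records_to_load / batch_size) * batch_size

print(effective_record_limit(120000))  # 150000
```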

num_tasks_per_rank – Number of tasks for reading files per rank. The default is the value of the system configuration parameter external_file_reader_num_tasks.

poll_interval – When subscribe is true, the number of seconds between attempts to load external files into the table. If zero, polling will be continuous as long as data is found. If no data is found, the interval will steadily increase to a maximum of 60 seconds. The default value is ‘0’.

primary_keys – Comma separated list of column names to set as primary keys, when not specified in the type.

schema_registry_connection_retries – Number of Confluent Schema Registry connection retries.

schema_registry_connection_timeout – Confluent Schema Registry connection timeout (in seconds).

schema_registry_max_consecutive_connection_failures – Maximum number of records to skip due to Schema Registry connection failures before failing.

max_consecutive_invalid_schema_failure – Maximum number of records to skip due to schema-related errors before failing.

schema_registry_schema_name – Name of the Avro schema in the schema registry to use when reading Avro records.

shard_keys – Comma separated list of column names to set as shard keys, when not specified in the type.

skip_lines – Skip a number of lines from the beginning of the file.

start_offsets – Starting offsets by partition to fetch from Kafka, given as a comma-separated list of partition:offset pairs.

subscribe – Continuously poll the data source to check for new data and load it into the table. Allowed values are:

true

false

The default value is ‘false’.

table_insert_mode – Insertion scheme to use when inserting records from multiple shapefiles. Allowed values are:

single – Insert all records into a single table.

table_per_file – Insert records from each file into a new table corresponding to that file.

The default value is ‘single’.

text_comment_string – Specifies the character string that should be interpreted as a comment line prefix in the source data. All lines in the data starting with the provided string are ignored.

For delimited_text file_type only. The default value is ‘#’.

text_delimiter – Specifies the character delimiting field values in the source data and field names in the header (if present).

For delimited_text file_type only. The default value is ‘,’.

text_escape_character – Specifies the character that is used to escape other characters in the source data.

An ‘a’, ‘b’, ‘f’, ‘n’, ‘r’, ‘t’, or ‘v’ preceded by an escape character will be interpreted as the ASCII bell, backspace, form feed, line feed, carriage return, horizontal tab, & vertical tab, respectively. For example, the escape character followed by an ‘n’ will be interpreted as a newline within a field value.

The escape character can also be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not.

For delimited_text file_type only.

text_has_header – Indicates whether the source data contains a header row.

For delimited_text file_type only. Allowed values are:

true

false

The default value is ‘true’.

text_header_property_delimiter – Specifies the delimiter for column properties in the header row (if present). Cannot be set to same value as text_delimiter.

For delimited_text file_type only. The default value is ‘|’.

text_null_string – Specifies the character string that should be interpreted as a null value in the source data.

For delimited_text file_type only. The default value is ‘\N’.

text_quote_character – Specifies the character that should be interpreted as a field value quoting character in the source data. The character must appear at beginning and end of field value to take effect. Delimiters within quoted fields are treated as literals and not delimiters. Within a quoted field, two consecutive quote characters will be interpreted as a single literal quote character, effectively escaping it. To not have a quote character, specify an empty string.

For delimited_text file_type only. The default value is ‘”’.
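The quoting rules described for text_quote_character resemble those of Python's csv module when doublequote is enabled, which makes it a convenient way to preview how a sample line might be split. This is an analogy only: the csv module is not the server's parser, and the sample data is invented for illustration.

```python
import csv
import io

# Two data rows: one with a delimiter inside quotes, one with doubled quotes.
sample = 'id,comment\n1,"contains, a comma"\n2,"says ""hello"""\n'

rows = list(
    csv.reader(io.StringIO(sample), delimiter=",", quotechar='"', doublequote=True)
)
for row in rows:
    print(row)
# ['id', 'comment']
# ['1', 'contains, a comma']
# ['2', 'says "hello"']
```

The comma inside the quoted field is kept as a literal, and each pair of consecutive quote characters collapses to a single literal quote.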

text_search_columns – Adds the ‘text_search’ property to internally inferred string columns. Specify a comma-separated list of column names, or ‘*’ for all columns. To add the ‘text_search’ property only to string columns greater than or equal to a minimum size, also set text_search_min_column_length.

text_search_min_column_length – Set the minimum column size for strings to apply the ‘text_search’ property to. Used only when text_search_columns has a value.

truncate_strings – If set to true, truncate string values that are longer than the column’s type size. Allowed values are:

true

false

The default value is ‘false’.

truncate_table – If set to true, truncates the table specified by input parameter table_name prior to loading the file(s). Allowed values are:

true

false

The default value is ‘false’.

type_inference_max_records_read

type_inference_mode – Optimize type inferencing for either speed or accuracy. Allowed values are:

accuracy – Scans data to get exactly-typed & sized columns for all data scanned.

speed – Scans data and picks the widest possible column types so that ‘all’ values will fit with minimum data scanned

The default value is ‘accuracy’.

update_on_existing_pk – Specifies the record collision policy for inserting into a table with a primary key. If set to true, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record (the new data will be ‘upserted’). If set to false, any existing table record with primary key values that match those of a record being inserted will remain unchanged, while the new record will be rejected and the error handled as determined by ignore_existing_pk & error_handling. If the specified table does not have a primary key, then this option has no effect. Allowed values are:

true – Upsert new records when primary keys match existing records

false – Reject new records when primary keys match existing records

The default value is ‘false’.
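The interaction of update_on_existing_pk, ignore_existing_pk, and error_handling on a primary key collision can be summarized as a small decision function. This is a hypothetical reading of the descriptions above, assuming the target table has a primary key; the outcome labels are invented for illustration.

```python
def pk_collision_outcome(update_on_existing_pk=False,
                         ignore_existing_pk=False,
                         error_handling="abort"):
    """Describe what happens to a new record whose primary key already exists."""
    if update_on_existing_pk:
        # Upsert mode: ignore_existing_pk has no effect.
        return "existing record replaced (upsert)"
    if ignore_existing_pk:
        return "new record ignored, no error"
    if error_handling == "abort":
        # Collisions are abortable errors in abort mode.
        return "operation aborted"
    return "new record rejected and handled as an error"

print(pk_collision_outcome())  # operation aborted
print(pk_collision_outcome(update_on_existing_pk=True, ignore_existing_pk=True))
# existing record replaced (upsert)
```

With every option left at its default, a single colliding record is enough to abort the whole load.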

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name.

type_id (str) –
ID of the currently registered table structure type for the target table

type_definition (str) –
A JSON string describing the columns of the target table

type_label (str) –
The user-defined description associated with the target table’s structure

type_properties (dict of str to lists of str) –
A mapping of each target table column name to an array of column properties associated with that column

count_inserted (long) –
Number of records inserted into the target table.

count_skipped (long) –
Number of records skipped, when not running in abort error handling mode.

count_updated (long) –
[Not yet implemented] Number of records updated within the target table.

info (dict of str to str) –
Additional information.

files (list of str)
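The returned entries can be condensed into a short report after a load. The sketch below assumes the response supports dict-style access, as described above; summarize_load is a hypothetical helper and the sample values are invented.

```python
def summarize_load(response):
    """Build a one-line summary from the documented response entries."""
    return "{table}: {inserted} inserted, {skipped} skipped".format(
        table=response["table_name"],
        inserted=response["count_inserted"],
        skipped=response["count_skipped"],
    )

# Invented sample response, for illustration only.
sample_response = {
    "table_name": "example.orders",
    "count_inserted": 98,
    "count_skipped": 2,
    "count_updated": 0,
}
print(summarize_load(sample_response))
# example.orders: 98 inserted, 2 skipped
```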

insert_records_from_payload(table_name=None, data_text=None, data_bytes=None, modify_columns={}, create_table_options={}, options={})[source]

Reads from the given text-based or binary payload and inserts the data into a new or existing table. The table will be created if it doesn’t already exist.

Returns once all records are processed.
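A minimal sketch of loading delimited text from memory follows. The helper name load_csv_text is hypothetical and db is assumed to be a connected GPUdb instance. Whether data_bytes may be left unset when only data_text is supplied is not stated here, so passing an empty bytes value is an assumption.

```python
def load_csv_text(db, table_name, csv_text):
    """Insert delimited text held in memory into a new or existing table."""
    return db.insert_records_from_payload(
        table_name=table_name,
        data_text=csv_text,
        data_bytes=b"",  # assumption: empty value when no binary payload is sent
        modify_columns={},
        create_table_options={},
        options={
            "file_type": "delimited_text",
            "error_handling": "ignore_bad_records",  # skip malformed records
        },
    )

# Hypothetical usage, given a connected client named `gpudb`:
# response = load_csv_text(gpudb, "example.orders", "id,total\n1,9.50\n2,12.00\n")
```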

Parameters

table_name (str) –
Name of the table into which the data will be inserted, in [schema_name.]table_name format, using standard name resolution rules. If the table does not exist, the table will be created using either an existing type_id or the type inferred from the payload, and the new table name will have to meet standard table naming criteria.

data_text (str) –
Records formatted as delimited text

data_bytes (bytes) –
Records formatted as binary data

modify_columns (dict of str to dicts of str to str) –
Not implemented yet. The default value is an empty dict ( {} ).

create_table_options (dict of str to str) –
Options used when creating the target table. Includes type to use. The other options match those in GPUdb.create_table(). Allowed keys are:

type_id – ID of a currently registered type. The default value is ‘’.

no_error_if_exists – If true, prevents an error from occurring if the table already exists and is of the given type. If a table with the same ID but a different type exists, it is still an error. Allowed values are:

true

false

The default value is ‘false’.

is_replicated – Affects the distribution scheme for the table’s data. If true and the given type has no explicit shard key defined, the table will be replicated. If false, the table will be sharded according to the shard key specified in the given type_id, or randomly sharded, if no shard key is specified. Note that a type containing a shard key cannot be used to create a replicated table. Allowed values are:

true

false

The default value is ‘false’.

foreign_keys – Semicolon-separated list of foreign keys, of the format ‘(source_column_name [, …]) references target_table_name(primary_key_column_name [, …]) [as foreign_key_name]’.

foreign_shard_key – Foreign shard key of the format ‘source_column references shard_by_column from target_table(primary_key_column)’.

partition_type – Partitioning scheme to use. Allowed values are:

RANGE – Use range partitioning.

INTERVAL – Use interval partitioning.

LIST – Use list partitioning.

HASH – Use hash partitioning.

SERIES – Use series partitioning.

partition_keys – Comma-separated list of partition keys, which are the columns or column expressions by which records will be assigned to partitions defined by partition_definitions.

partition_definitions – Comma-separated list of partition definitions, whose format depends on the choice of partition_type. See range partitioning, interval partitioning, list partitioning, hash partitioning, or series partitioning for example formats.

is_automatic_partition – If true, a new partition will be created for values which don’t fall into an existing partition. Currently only supported for list partitions. Allowed values are:

true

false

The default value is ‘false’.

ttl – Sets the TTL of the table specified in input parameter table_name.

chunk_size – Indicates the number of records per chunk to be used for this table.

chunk_column_max_memory – Indicates the target maximum data size for each column in a chunk to be used for this table.

chunk_max_memory – Indicates the target maximum data size for all columns in a chunk to be used for this table.

is_result_table – Indicates whether the table is a memory-only table. A result table cannot contain columns with text_search data-handling, and it will not be retained if the server is restarted. Allowed values are:

true

false

The default value is ‘false’.

strategy_definition – The tier strategy for the table and its columns.

compression_codec – The default compression codec for this table’s columns.

The default value is an empty dict ( {} ).

options (dict of str to str) –
Optional parameters. Allowed keys are:

bad_record_table_name – Optional name of a table to which records that were rejected are written. The bad-record-table has the following columns: line_number (long), line_rejected (string), error_message (string).

bad_record_table_limit – A positive integer indicating the maximum number of records that can be written to the bad-record-table. The default value is ‘10000’.

bad_record_table_limit_per_input – For subscriptions, a positive integer indicating the maximum number of records that can be written to the bad-record-table per file/payload. The default value is the value of bad_record_table_limit, and the total size of the table per rank is limited to bad_record_table_limit.

batch_size – Internal tuning parameter: the number of records per batch when inserting data.

column_formats – For each target column specified, applies the column-property-bound format to the source data loaded into that column. Each column format will contain a mapping of one or more of its column properties to an appropriate format for each property. Currently supported column properties include date, time, & datetime. The parameter value must be formatted as a JSON string of maps of column names to maps of column properties to their corresponding column formats, e.g., ‘{ “order_date” : { “date” : “%Y.%m.%d” }, “order_time” : { “time” : “%H:%M:%S” } }’.

See default_column_formats for valid format syntax.

columns_to_load – Specifies a comma-delimited list of columns from the source data to load. If more than one file is being loaded, this list applies to all files.

Column numbers can be specified discretely or as a range. For example, a value of ‘5,7,1..3’ will insert values from the fifth column in the source data into the first column in the target table, from the seventh column in the source data into the second column in the target table, and from the first through third columns in the source data into the third through fifth columns in the target table.

If the source data contains a header, column names matching the file header names may be provided instead of column numbers. If the target table doesn’t exist, the table will be created with the columns in this order. If the target table does exist with columns in a different order than the source data, this list can be used to match the order of the target table. For example, a value of ‘C, B, A’ will create a three column table with column C, followed by column B, followed by column A; or will insert those fields in that order into a table created with columns in that order. If the target table exists, the column names must match the source data field names for a name-mapping to be successful.

Mutually exclusive with columns_to_skip.

columns_to_skip – Specifies a comma-delimited list of columns from the source data to skip. Mutually exclusive with columns_to_load.

compression_type – Optional: payload compression type. Allowed values are:

none – Uncompressed

auto – Default. Auto detect compression type

gzip – gzip file compression.

bzip2 – bzip2 file compression.

The default value is ‘auto’.

default_column_formats – Specifies the default format to be applied to source data loaded into columns with the corresponding column property. Currently supported column properties include date, time, & datetime. This default column-property-bound format can be overridden by specifying a column property & format for a given target column in column_formats. For each specified annotation, the format will apply to all columns with that annotation unless a custom column_formats for that annotation is specified.

The parameter value must be formatted as a JSON string that is a map of column properties to their respective column formats, e.g., ‘{ “date” : “%Y.%m.%d”, “time” : “%H:%M:%S” }’. Column formats are specified as a string of control characters and plain text. The supported control characters are ‘Y’, ‘m’, ‘d’, ‘H’, ‘M’, and ‘S’, which follow the Linux ‘strptime()’ specification, as well as ‘s’, which specifies seconds and fractional seconds (though the fractional component will be truncated past milliseconds).

Formats for the ‘date’ annotation must include the ‘Y’, ‘m’, and ‘d’ control characters. Formats for the ‘time’ annotation must include the ‘H’, ‘M’, and either ‘S’ or ‘s’ (but not both) control characters. Formats for the ‘datetime’ annotation must meet both the ‘date’ and ‘time’ control character requirements. For example, ‘{“datetime” : “%m/%d/%Y %H:%M:%S” }’ would be used to interpret text such as “05/04/2000 12:12:11”.

error_handling – Specifies how errors should be handled upon insertion. Allowed values are:

permissive – Records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped.

ignore_bad_records – Malformed records are skipped.

abort – Stops current insertion and aborts entire operation when an error is encountered. Primary key collisions are considered abortable errors in this mode.

The default value is ‘abort’.

file_type – Specifies the type of the file(s) whose records will be inserted. Allowed values are:

avro – Avro file format

delimited_text – Delimited text file format; e.g., CSV, TSV, PSV, etc.

gdb – Esri/GDB file format

json – JSON file format

parquet – Apache Parquet file format

shapefile – ShapeFile file format

The default value is ‘delimited_text’.

flatten_columns – Specifies how to handle nested columns. Allowed values are:

true – Break up nested columns to multiple columns

false – Treat nested columns as json columns instead of flattening

The default value is ‘false’.

gdal_configuration_options – Comma-separated list of GDAL configuration options for the specific request, each in key=value form. The default value is ‘’.

ignore_existing_pk – Specifies the record collision error-suppression policy for inserting into a table with a primary key, only used when not in upsert mode (upsert mode is disabled when update_on_existing_pk is false). If set to true, any record being inserted that is rejected for having primary key values that match those of an existing table record will be ignored with no error generated. If false, the rejection of any record for having primary key values matching an existing record will result in an error being reported, as determined by error_handling. If the specified table does not have a primary key or if upsert mode is in effect (update_on_existing_pk is true), then this option has no effect. Allowed values are:

true – Ignore new records whose primary key values collide with those of existing records

false – Treat as errors any new records whose primary key values collide with those of existing records

The default value is ‘false’.

ingestion_mode – Whether to do a full load, dry run, or perform a type inference on the source data. Allowed values are:

full – Run a type inference on the source data (if needed) and ingest

dry_run – Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of error_handling.

type_inference_only – Infer the type of the source data and return, without ingesting any data. The inferred type is returned in the response.

The default value is ‘full’.

layer – Optional comma-separated list of geo file layer names. The default value is ‘’.

loading_mode – Scheme for distributing the extraction and loading of data from the source data file(s). This option applies only when loading files that are local to the database. Allowed values are:

head – The head node loads all data. All files must be available to the head node.

distributed_shared – The head node coordinates loading data by worker processes across all nodes from shared files available to all workers.

NOTE:

Instead of existing on a shared source, the files can be duplicated on a source local to each host to improve performance, though the files must appear as the same data set from the perspective of all hosts performing the load.

distributed_local – A single worker process on each node loads all files that are available to it. This option works best when each worker loads files from its own file system, to maximize performance. In order to avoid data duplication, either each worker performing the load needs to have visibility to a set of files unique to it (no file is visible to more than one node) or the target table needs to have a primary key (which will allow the worker to automatically deduplicate data).

NOTE:

If the target table doesn’t exist, the table structure will be determined by the head node. If the head node has no files local to it, it will be unable to determine the structure and the request will fail.

If the head node is configured to have no worker processes, no data strictly accessible to the head node will be loaded.

The default value is ‘head’.

local_time_offset – For Avro local timestamp columns

max_records_to_load – Limit the number of records to load in this request. If this number is larger than batch_size, the number of records loaded will be limited to the next whole multiple of batch_size (per working thread). The default value is ‘’.

num_tasks_per_rank – Optional number of tasks per rank for reading files. Defaults to external_file_reader_num_tasks.

poll_interval – If subscribe is true, the number of seconds between attempts to load external files into the table. If zero, polling will be continuous as long as data is found. If no data is found, the interval will steadily increase to a maximum of 60 seconds.

primary_keys – Optional: comma separated list of column names, to set as primary keys, when not specified in the type. The default value is ‘’.

schema_registry_connection_retries – Number of Confluent Schema Registry connection retries

schema_registry_connection_timeout – Confluent Schema Registry connection timeout (in seconds)

schema_registry_max_consecutive_connection_failures – Maximum number of records to skip due to Schema Registry connection failures before failing

max_consecutive_invalid_schema_failure – Maximum number of records to skip due to schema-related errors before failing

schema_registry_schema_name – Name of the Avro schema in the schema registry to use when reading Avro records.

shard_keys – Optional: comma-separated list of column names to set as shard keys, when not specified in the type. The default value is ‘’.

skip_lines – Skip a number of lines from the beginning of the file.

subscribe – Continuously poll the data source to check for new data and load it into the table. Allowed values are:

true

false

The default value is ‘false’.

table_insert_mode – Optional. When inserting records from multiple files, specifies whether to insert all records into a single table or, if table_per_file, to insert the records from each file into a new table. Currently supported only for shapefiles. Allowed values are:

single

table_per_file

The default value is ‘single’.

text_comment_string – Specifies the character string that should be interpreted as a comment line prefix in the source data. All lines in the data starting with the provided string are ignored.

For delimited_text file_type only. The default value is ‘#’.

text_delimiter – Specifies the character delimiting field values in the source data and field names in the header (if present).

For delimited_text file_type only. The default value is ‘,’.

text_escape_character – Specifies the character that is used to escape other characters in the source data.

An ‘a’, ‘b’, ‘f’, ‘n’, ‘r’, ‘t’, or ‘v’ preceded by an escape character will be interpreted as the ASCII bell, backspace, form feed, line feed, carriage return, horizontal tab, & vertical tab, respectively. For example, the escape character followed by an ‘n’ will be interpreted as a newline within a field value.

The escape character can also be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not.

For delimited_text file_type only.

text_has_header – Indicates whether the source data contains a header row.

For delimited_text file_type only. Allowed values are:

true

false

The default value is ‘true’.

text_header_property_delimiter – Specifies the delimiter for column properties in the header row (if present). Cannot be set to same value as text_delimiter.

For delimited_text file_type only. The default value is ‘|’.

text_null_string – Specifies the character string that should be interpreted as a null value in the source data.

For delimited_text file_type only. The default value is ‘\N’.

text_quote_character – Specifies the character that should be interpreted as a field value quoting character in the source data. The character must appear at beginning and end of field value to take effect. Delimiters within quoted fields are treated as literals and not delimiters. Within a quoted field, two consecutive quote characters will be interpreted as a single literal quote character, effectively escaping it. To not have a quote character, specify an empty string.

For delimited_text file_type only. The default value is ‘”’.

text_search_columns – Add the ‘text_search’ property to internally inferred string columns. Comma-separated list of column names, or ‘*’ for all columns. To add the text_search property only to string columns of a minimum size, also set the option ‘text_search_min_column_length’.

text_search_min_column_length – Set minimum column size. Used only when ‘text_search_columns’ has a value.

truncate_strings – If set to true, truncate string values that are longer than the column’s type size. Allowed values are:

true

false

The default value is ‘false’.

truncate_table – If set to true, truncates the table specified by input parameter table_name prior to loading the file(s). Allowed values are:

true

false

The default value is ‘false’.

type_inference_max_records_read

type_inference_mode – Specifies whether type inference should be optimized for accuracy or speed. Allowed values are:

accuracy – Scans data to get exactly-typed & sized columns for all data scanned.

speed – Scans data and picks the widest possible column types so that ‘all’ values will fit with minimum data scanned

The default value is ‘accuracy’.

update_on_existing_pk – Specifies the record collision policy for inserting into a table with a primary key. If set to true, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record (the new data will be “upserted”). If set to false, any existing table record with primary key values that match those of a record being inserted will remain unchanged, while the new record will be rejected and the error handled as determined by ignore_existing_pk & error_handling. If the specified table does not have a primary key, then this option has no effect. Allowed values are:

true – Upsert new records when primary keys match existing records

false – Reject new records when primary keys match existing records

The default value is ‘false’.

The default value is an empty dict ( {} ).
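The interplay of update_on_existing_pk, ignore_existing_pk, and error_handling described above can be sketched as a small decision function. This is only an illustration of the documented rules, not part of the client library: the function name pk_collision_outcome and the returned labels are hypothetical, and option values are passed as the strings ‘true’ and ‘false’ because the options map is a dict of str to str.

```python
def pk_collision_outcome(has_primary_key, options):
    """Describe what happens to a new record whose primary key values
    match an existing record, following the option descriptions above.

    The function name and the returned labels are hypothetical.
    """
    def flag(key):
        # Both flags default to 'false' per the documentation.
        return options.get(key, "false") == "true"

    if not has_primary_key:
        # Without a primary key, neither flag has any effect.
        return "insert"
    if flag("update_on_existing_pk"):
        # Upsert mode: the existing record is replaced by the new one.
        return "upsert"
    if flag("ignore_existing_pk"):
        # The new record is rejected, but no error is generated.
        return "ignore"
    # The rejection is reported as an error, handled per error_handling
    # (which defaults to 'abort').
    return "error:" + options.get("error_handling", "abort")
```

For example, pk_collision_outcome(True, {"ignore_existing_pk": "true"}) evaluates to "ignore", while an empty options map on a table with a primary key yields "error:abort".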

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name.

type_id (str) –
ID of the currently registered table structure type for the target table

type_definition (str) –
A JSON string describing the columns of the target table

type_label (str) –
The user-defined description associated with the target table’s structure

type_properties (dict of str to lists of str) –
A mapping of each target table column name to an array of column properties associated with that column

count_inserted (long) –
Number of records inserted into the target table.

count_skipped (long) –
Number of records skipped, when not running in abort error handling mode.

count_updated (long) –
[Not yet implemented] Number of records updated within the target table.

info (dict of str to str) –
Additional information.

insert_records_from_query(table_name=None, remote_query=None, modify_columns={}, create_table_options={}, options={})[source]

Computes remote query result and inserts the result data into a new or existing table

Parameters

table_name (str) –
Name of the table into which the data will be inserted, in [schema_name.]table_name format, using standard name resolution rules. If the table does not exist, the table will be created using either an existing type_id or the type inferred from the remote query, and the new table name will have to meet standard table naming criteria.

remote_query (str) –
Query for which result data needs to be imported

modify_columns (dict of str to dicts of str to str) –
Not implemented yet. The default value is an empty dict ( {} ).

create_table_options (dict of str to str) –
Options used when creating the target table. Allowed keys are:

type_id – ID of a currently registered type. The default value is ‘’.

no_error_if_exists – If true, prevents an error from occurring if the table already exists and is of the given type. If a table with the same ID but a different type exists, it is still an error. Allowed values are:

true

false

The default value is ‘false’.

is_replicated – Affects the distribution scheme for the table’s data. If true and the given type has no explicit shard key defined, the table will be replicated. If false, the table will be sharded according to the shard key specified in the given type_id, or randomly sharded, if no shard key is specified. Note that a type containing a shard key cannot be used to create a replicated table. Allowed values are:

true

false

The default value is ‘false’.

foreign_keys – Semicolon-separated list of foreign keys, of the format ‘(source_column_name [, …]) references target_table_name(primary_key_column_name [, …]) [as foreign_key_name]’.

foreign_shard_key – Foreign shard key of the format ‘source_column references shard_by_column from target_table(primary_key_column)’.

partition_type – Partitioning scheme to use. Allowed values are:

RANGE – Use range partitioning.

INTERVAL – Use interval partitioning.

LIST – Use list partitioning.

HASH – Use hash partitioning.

SERIES – Use series partitioning.

partition_keys – Comma-separated list of partition keys, which are the columns or column expressions by which records will be assigned to partitions defined by partition_definitions.

partition_definitions – Comma-separated list of partition definitions, whose format depends on the choice of partition_type. See range partitioning, interval partitioning, list partitioning, hash partitioning, or series partitioning for example formats.

is_automatic_partition – If true, a new partition will be created for values which don’t fall into an existing partition. Currently only supported for list partitions. Allowed values are:

true

false

The default value is ‘false’.

ttl – Sets the TTL of the table specified in input parameter table_name.

chunk_size – Indicates the number of records per chunk to be used for this table.

is_result_table – Indicates whether the table is a memory-only table. A result table cannot contain columns with text_search data-handling, and it will not be retained if the server is restarted. Allowed values are:

true

false

The default value is ‘false’.

strategy_definition – The tier strategy for the table and its columns.

compression_codec – The default compression codec for this table’s columns.

The default value is an empty dict ( {} ).

options (dict of str to str) –
Optional parameters. Allowed keys are:

bad_record_table_name – Optional name of a table to which rejected records are written. The bad-record table has the following columns: line_number (long), line_rejected (string), error_message (string). When error_handling is abort, the bad-record table is not populated.

bad_record_table_limit – A positive integer indicating the maximum number of records that can be written to the bad-record-table. Default value is 10000

batch_size – Number of records per batch when inserting data.

datasource_name – Name of an existing external data source from which table will be loaded

error_handling – Specifies how errors should be handled upon insertion. Allowed values are:

permissive – Records with missing columns are populated with nulls if possible; otherwise, the malformed records are skipped.

ignore_bad_records – Malformed records are skipped.

abort – Stops current insertion and aborts entire operation when an error is encountered. Primary key collisions are considered abortable errors in this mode.

The default value is ‘abort’.

ignore_existing_pk – Specifies the record collision error-suppression policy for inserting into a table with a primary key, only used when not in upsert mode (upsert mode is disabled when update_on_existing_pk is false). If set to true, any record being inserted that is rejected for having primary key values that match those of an existing table record will be ignored with no error generated. If false, the rejection of any record for having primary key values matching an existing record will result in an error being reported, as determined by error_handling. If the specified table does not have a primary key or if upsert mode is in effect (update_on_existing_pk is true), then this option has no effect. Allowed values are:

true – Ignore new records whose primary key values collide with those of existing records

false – Treat as errors any new records whose primary key values collide with those of existing records

The default value is ‘false’.

ingestion_mode – Whether to do a full load, dry run, or perform a type inference on the source data. Allowed values are:

full – Run a type inference on the source data (if needed) and ingest

dry_run – Does not load data, but walks through the source data and determines the number of valid records, taking into account the current mode of error_handling.

type_inference_only – Infer the type of the source data and return, without ingesting any data. The inferred type is returned in the response.

The default value is ‘full’.

jdbc_fetch_size – The JDBC fetch size, which determines how many rows to fetch per round trip.

jdbc_session_init_statement – Statement to execute once per JDBC session before performing the actual load. The default value is ‘’.

num_splits_per_rank – Optional: number of splits for reading data per rank. Default will be external_file_reader_num_tasks. The default value is ‘’.

num_tasks_per_rank – Optional: number of tasks for reading data per rank. Default will be external_file_reader_num_tasks

primary_keys – Optional: comma separated list of column names, to set as primary keys, when not specified in the type. The default value is ‘’.

shard_keys – Optional: comma-separated list of column names to set as shard keys, when not specified in the type. The default value is ‘’.

subscribe – Continuously poll the data source to check for new data and load it into the table. Allowed values are:

true

false

The default value is ‘false’.

truncate_table – If set to true, truncates the table specified by input parameter table_name prior to loading the data. Allowed values are:

true

false

The default value is ‘false’.

remote_query – Remote SQL query from which data will be sourced

remote_query_order_by – Name of column to be used for splitting the query into multiple sub-queries using ordering of given column. The default value is ‘’.

remote_query_filter_column – Name of column to be used for splitting the query into multiple sub-queries using the data distribution of given column. The default value is ‘’.

remote_query_increasing_column – Column on subscribed remote query result that will increase for new records (e.g., TIMESTAMP). The default value is ‘’.

remote_query_partition_column – Alias name for remote_query_filter_column. The default value is ‘’.

truncate_strings – If set to true, truncate string values that are longer than the column’s type size. Allowed values are:

true

false

The default value is ‘false’.

update_on_existing_pk – Specifies the record collision policy for inserting into a table with a primary key. If set to true, any existing table record with primary key values that match those of a record being inserted will be replaced by that new record (the new data will be “upserted”). If set to false, any existing table record with primary key values that match those of a record being inserted will remain unchanged, while the new record will be rejected and the error handled as determined by ignore_existing_pk & error_handling. If the specified table does not have a primary key, then this option has no effect. Allowed values are:

true – Upsert new records when primary keys match existing records

false – Reject new records when primary keys match existing records

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name.

type_id (str) –
ID of the currently registered table structure type for the target table

type_definition (str) –
A JSON string describing the columns of the target table

type_label (str) –
The user-defined description associated with the target table’s structure

type_properties (dict of str to lists of str) –
A mapping of each target table column name to an array of column properties associated with that column

count_inserted (long) –
Number of records inserted into the target table.

count_skipped (long) –
Number of records skipped, when not running in abort error handling mode.

count_updated (long) –
[Not yet implemented] Number of records updated within the target table.

info (dict of str to str) –
Additional information.
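A minimal sketch of calling insert_records_from_query() with a few of the options listed above. The wrapper name load_from_remote_query and the particular option choices are illustrative assumptions; db stands for an already-constructed GPUdb instance, and the data source is assumed to exist already.

```python
def load_from_remote_query(db, table_name, remote_query, datasource_name):
    """Run a remote query through an existing data source and load the
    result into table_name. Returns (count_inserted, count_skipped).

    Hypothetical helper: `db` is expected to be a GPUdb instance.
    """
    options = {
        # Name of an existing external data source.
        "datasource_name": datasource_name,
        # Skip malformed records instead of aborting the whole load.
        "error_handling": "ignore_bad_records",
    }
    response = db.insert_records_from_query(
        table_name=table_name,
        remote_query=remote_query,
        # Do not fail if the target table already exists with this type.
        create_table_options={"no_error_if_exists": "true"},
        options=options,
    )
    return response["count_inserted"], response["count_skipped"]
```

Because error_handling is not abort here, count_skipped reports how many records were left out of the load.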

insert_records_random(table_name=None, count=None, options={})[source]

Generates a specified number of random records and adds them to the given table. There is an optional parameter that allows the user to customize the ranges of the column values. It also allows the user to specify linear profiles for some or all columns in which case linear values are generated rather than random ones. Only individual tables are supported for this operation.

This operation is synchronous, meaning that a response will not be returned until all random records are fully available.

Parameters

table_name (str) –
Table to which random records will be added, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table, not a view.

count (long) –
Number of records to generate.

options (dict of str to dicts of str to floats) –
Optional parameter to pass in specifications for the randomness of the values. This map is different from the options parameter of most other endpoints in that it is a map of string to map of string to doubles, while most others are maps of string to string. In this map, the top-level keys represent which column’s parameters are being specified, while the internal keys represent which parameter is being specified. These parameters take on different meanings depending on the type of the column. A more detailed description of the map follows. Allowed keys are:

seed – If provided, the internal random number generator will be initialized with the given value. The minimum is 0. This allows the same set of random numbers to be generated across invocations of this endpoint in case the user wants to repeat the test. Since input parameter options is a map of maps, an internal map is needed to provide the seed value. For example, to pass 100 as the seed value through this parameter, you need something equivalent to: ‘options’ = {‘seed’: { ‘value’: 100 } }. Allowed keys are:

value – The seed value to use

all – This key indicates that the specifications relayed in the internal map are to be applied to all columns of the records. Allowed keys are:

min – For numerical columns, the minimum of the generated values is set to this value. Default is -99999. For point, shape, and track columns, min for numeric ‘x’ and ‘y’ columns needs to be within [-180, 180] and [-90, 90], respectively. The default minimum possible values for these columns in such cases are -180.0 and -90.0. For the ‘TIMESTAMP’ column, the default minimum corresponds to Jan 1, 2010.

For string columns, the minimum length of the randomly generated strings is set to this value (default is 0). If both minimum and maximum are provided, minimum must be less than or equal to max.

If the min is outside the accepted ranges for string columns and ‘x’ and ‘y’ columns for point/shape/track, then those parameters will not be set; however, an error will not be thrown in such a case. It is the responsibility of the user to use the all parameter judiciously.

max – For numerical columns, the maximum of the generated values is set to this value. Default is 99999. For point, shape, and track columns, max for numeric ‘x’ and ‘y’ columns needs to be within [-180, 180] and [-90, 90], respectively. The default maximum possible values for these columns in such cases are 180.0 and 90.0.

For string columns, the maximum length of the randomly generated strings. If both minimum and maximum are provided, max must be greater than or equal to min.

If the max is outside the accepted ranges for string columns and ‘x’ and ‘y’ columns for point/shape/track, then those parameters will not be set; however, an error will not be thrown in such a case. It is the responsibility of the user to use the all parameter judiciously.

interval – If specified, generate values for all columns evenly spaced with the given interval value. If a max value is specified for a given column the data is randomly generated between min and max and decimated down to the interval. If no max is provided the data is linearly generated starting at the minimum value (instead of generating random data). For non-decimated string-type columns the interval value is ignored. Instead the values are generated following the pattern: ‘attrname_creationIndex#’, i.e. the column name suffixed with an underscore and a running counter (starting at 0). For string types with limited size (e.g. char4) the prefix is dropped. No nulls will be generated for nullable columns.

null_percentage – If specified, then generate the given percentage of the count as nulls for all nullable columns. This option will be ignored for non-nullable columns. The value must be within the range [0, 1.0]. The default value is 5% (0.05).

cardinality – If specified, limit the randomly generated values to a fixed set. Not allowed on a column with interval specified, and is not applicable to WKT or Track-specific columns. The value must be greater than 0. This option is disabled by default.

attr_name – Use the desired column name in place of attr_name, and set the following parameters for the column specified. This overrides any parameter set by all. Allowed keys are:

min – For numerical columns, the minimum of the generated values is set to this value. Default is -99999. For point, shape, and track columns, min for numeric ‘x’ and ‘y’ columns needs to be within [-180, 180] and [-90, 90], respectively. The default minimum possible values for these columns in such cases are -180.0 and -90.0. For the ‘TIMESTAMP’ column, the default minimum corresponds to Jan 1, 2010.

For string columns, the minimum length of the randomly generated strings is set to this value (default is 0). If both minimum and maximum are provided, minimum must be less than or equal to max.

If the min is outside the accepted ranges for string columns and ‘x’ and ‘y’ columns for point/shape/track, then those parameters will not be set; however, an error will not be thrown in such a case. It is the responsibility of the user to use the all parameter judiciously.

max – For numerical columns, the maximum of the generated values is set to this value. Default is 99999. For point, shape, and track columns, max for numeric ‘x’ and ‘y’ columns needs to be within [-180, 180] and [-90, 90], respectively. The default maximum possible values for these columns in such cases are 180.0 and 90.0.

For string columns, the maximum length of the randomly generated strings. If both minimum and maximum are provided, max must be greater than or equal to min.

If the max is outside the accepted ranges for string columns and ‘x’ and ‘y’ columns for point/shape/track, then those parameters will not be set; however, an error will not be thrown in such a case. It is the responsibility of the user to use the all parameter judiciously.

interval – If specified, generate values for all columns evenly spaced with the given interval value. If a max value is specified for a given column the data is randomly generated between min and max and decimated down to the interval. If no max is provided the data is linearly generated starting at the minimum value (instead of generating random data). For non-decimated string-type columns the interval value is ignored. Instead the values are generated following the pattern: ‘attrname_creationIndex#’, i.e. the column name suffixed with an underscore and a running counter (starting at 0). For string types with limited size (e.g. char4) the prefix is dropped. No nulls will be generated for nullable columns.

null_percentage – If specified and if this column is nullable, then generate the given percentage of the count as nulls. This option will result in an error if the column is not nullable. The value must be within the range [0, 1.0]. The default value is 5% (0.05).

cardinality – If specified, limit the randomly generated values to a fixed set. Not allowed on a column with interval specified, and is not applicable to WKT or Track-specific columns. The value must be greater than 0. This option is disabled by default.

track_length – This key-map pair is only valid for track data sets (an error is thrown otherwise). No nulls would be generated for nullable columns. Allowed keys are:

min – Minimum possible length for generated series; default is 100 records per series. Must be an integral value within the range [1, 500]. If both min and max are specified, min must be less than or equal to max. The minimum allowed value is 1. The maximum allowed value is 500.

max – Maximum possible length for generated series; default is 500 records per series. Must be an integral value within the range [1, 500]. If both min and max are specified, max must be greater than or equal to min. The minimum allowed value is 1. The maximum allowed value is 500.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name.

count (long) –
Number of records inserted.

info (dict of str to str) –
Additional information.
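Because the options map for insert_records_random() is a map of maps, it can be easier to assemble it with a small builder. The following is a sketch under stated assumptions: random_record_options is a hypothetical helper (not part of the client library), and it covers only the seed, all/null_percentage, and per-column min/max keys described above.

```python
def random_record_options(seed, column_ranges, null_percentage=None):
    """Build the nested options map for insert_records_random().

    Hypothetical helper. `column_ranges` maps a column name to a
    (min, max) pair; all inner values are converted to floats.
    """
    if seed < 0:
        raise ValueError("seed must be at least 0")
    # The seed itself must be wrapped in an inner map under 'value'.
    options = {"seed": {"value": float(seed)}}

    if null_percentage is not None:
        if not 0 <= null_percentage <= 1.0:
            raise ValueError("null_percentage must be within [0, 1.0]")
        # 'all' applies the inner settings to every column.
        options["all"] = {"null_percentage": float(null_percentage)}

    for column, (low, high) in column_ranges.items():
        if low > high:
            raise ValueError("min must be <= max for column " + column)
        # A per-column entry overrides anything set through 'all'.
        options[column] = {"min": float(low), "max": float(high)}
    return options
```

The result would then be passed as the options argument, for example db.insert_records_random(table_name="example.points", count=1000, options=opts), where db is an existing GPUdb instance and the table name is a placeholder.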

insert_symbol(symbol_id=None, symbol_format=None, symbol_data=None, options={})[source]

Adds a symbol or icon (i.e. an image) to represent data points when data is rendered visually. Users must provide the symbol identifier (string), a format (currently supported: ‘svg’ and ‘svg_path’), the data for the symbol, and any additional optional parameter (e.g. color). To have a symbol used for rendering, create a table with a string column named ‘SYMBOLCODE’ (along with ‘x’ or ‘y’ for example). Then, when the table is rendered (via WMS), if the ‘dosymbology’ parameter is ‘true’, the value of the ‘SYMBOLCODE’ column is used to pick the symbol displayed for each point.

Parameters

symbol_id (str) –
The id of the symbol being added. This is the same id that should be in the ‘SYMBOLCODE’ column for objects using this symbol

symbol_format (str) –
Specifies the symbol format. Must be either ‘svg’ or ‘svg_path’. Allowed values are:

svg

svg_path

symbol_data (bytes) –
The actual symbol data. If input parameter symbol_format is ‘svg’ then this should be the raw bytes representing an SVG file. If input parameter symbol_format is ‘svg_path’ then this should be an SVG path string, for example: ‘M25.979,12.896,5.979,12.896,5.979,19.562,25.979,19.562z’

options (dict of str to str) –
Optional parameters. Allowed keys are:

color – If input parameter symbol_format is ‘svg’ this is ignored. If input parameter symbol_format is ‘svg_path’ then this option specifies the color (in RRGGBB hex format) of the path. For example, to have the path rendered in red, use ‘FF0000’. If ‘color’ is not provided then ‘00FF00’ (i.e. green) is used by default.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

symbol_id (str) –
Value of input parameter symbol_id.

info (dict of str to str) –
Additional information.
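As an illustration of the ‘svg_path’ format, the sketch below assembles keyword arguments for insert_symbol(). The helper name svg_path_symbol_args is hypothetical, and encoding the path string as UTF-8 bytes is an assumption made here because symbol_data is documented as bytes.

```python
import re


def svg_path_symbol_args(symbol_id, path, color=None):
    """Assemble keyword arguments for insert_symbol() using 'svg_path'.

    Hypothetical helper. Encoding the path as UTF-8 is an assumption.
    """
    options = {}
    if color is not None:
        # The color option is documented as RRGGBB hex, e.g. 'FF0000'.
        if not re.fullmatch(r"[0-9A-Fa-f]{6}", color):
            raise ValueError("color must be in RRGGBB hex format")
        options["color"] = color
    return {
        "symbol_id": symbol_id,
        "symbol_format": "svg_path",
        "symbol_data": path.encode("utf-8"),
        "options": options,
    }
```

With an existing GPUdb instance db, the call would look like db.insert_symbol(**svg_path_symbol_args("red_box", "M25.979,12.896,5.979,12.896,5.979,19.562,25.979,19.562z", color="FF0000")), where "red_box" is a placeholder symbol ID.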

kill_proc(run_id='', options={})[source]

Kills a running proc instance.

Parameters

run_id (str) –
The run ID of a running proc instance. If a proc with a matching run ID is not found or the proc instance has already completed, no procs will be killed. If not specified, all running proc instances will be killed. The default value is ‘’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

run_tag – If input parameter run_id is specified, kill the proc instance that has a matching run ID and a matching run tag that was provided to GPUdb.execute_proc(). If input parameter run_id is not specified, kill the proc instance(s) where a matching run tag was provided to GPUdb.execute_proc(). The default value is ‘’.

clear_execute_at_startup – If true, kill and remove the instance of the proc matching the auto-start run ID that was created to run when the database is started. The auto-start run ID was returned from GPUdb.execute_proc() and can be retrieved using GPUdb.show_proc(). Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

run_ids (list of str) –
List of run IDs of proc instances that were killed.

info (dict of str to str) –
Additional information.
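A short sketch of killing proc instances by run tag rather than by run ID, following the run_tag description above. The wrapper name kill_procs_by_tag is hypothetical, and db stands for an existing GPUdb instance.

```python
def kill_procs_by_tag(db, run_tag):
    """Kill every running proc instance that was started with the
    given run tag and return the run IDs that were killed.

    Hypothetical helper: `db` is expected to be a GPUdb instance.
    """
    # Leaving run_id empty means the match is made on run_tag alone.
    response = db.kill_proc(run_id="", options={"run_tag": run_tag})
    return list(response["run_ids"])
```

An empty returned list would indicate that no running proc instance matched the tag.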

lock_table(table_name=None, lock_type='status', options={})[source]

Manages global access to a table’s data. By default, a table has a lock type of read_write, indicating all operations are permitted. A user may request a read_only or a write_only lock, after which only read or write operations, respectively, are permitted on the table until the lock is removed. When input parameter lock_type is no_access, no operations are permitted on the table. The lock status can be queried by setting input parameter lock_type to status.

Parameters

table_name (str) –
Name of the table to be locked, in [schema_name.]table_name format, using standard name resolution rules. It must be a currently existing table or view.

lock_type (str) –
The type of lock being applied to the table. Setting it to status will return the current lock status of the table without changing it. Allowed values are:

status – Show locked status

no_access – Allow no read/write operations

read_only – Allow only read operations

write_only – Allow only write operations

read_write – Allow all read/write operations

The default value is ‘status’.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

lock_type (str) –
Returns the lock state of the table.

info (dict of str to str) –
Additional information.
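
A minimal sketch of querying and changing a lock with lock_table(), using only the parameters and response entries documented above. The helper set_table_lock and the table name are hypothetical; db is assumed to be an already connected GPUdb instance.

```python
ALLOWED_LOCK_TYPES = ("status", "no_access", "read_only", "write_only", "read_write")


def set_table_lock(db, table_name, lock_type="status"):
    """Validate lock_type, call db.lock_table(), and return the reported lock state."""
    if lock_type not in ALLOWED_LOCK_TYPES:
        raise ValueError("unsupported lock_type: %r" % (lock_type,))
    response = db.lock_table(table_name=table_name, lock_type=lock_type, options={})
    return response["lock_type"]


# Usage with a connected client `db` (assumed to exist):
# print(set_table_lock(db, "example.orders"))               # query only
# print(set_table_lock(db, "example.orders", "read_only"))  # block writes
# print(set_table_lock(db, "example.orders", "read_write")) # remove the lock
```

Validating the value on the client side is a design choice of this sketch; the server performs its own validation as well.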

match_graph(graph_name=None, sample_points=None, solve_method='markov_chain', solution_table='', options={})[source]

Matches a directed route implied by a given set of latitude/longitude points to an existing underlying road network graph using a given solution type.

IMPORTANT: It’s highly recommended that you review the Graphs & Solvers concepts documentation, the Graph REST Tutorial, and/or some /match/graph examples before using this endpoint.

Parameters

graph_name (str) –
Name of the underlying geospatial graph resource to match to using input parameter sample_points.

sample_points (list of str) –
Sample points used to match to an underlying geospatial graph. Sample points must be specified using identifiers; identifiers are grouped as combinations. Identifiers can be used with: existing column names, e.g., ‘table.column AS SAMPLE_X’; expressions, e.g., ‘ST_MAKEPOINT(table.x, table.y) AS SAMPLE_WKTPOINT’; or constant values, e.g., ‘{1, 2, 10} AS SAMPLE_TRIPID’. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

solve_method (str) –
The type of solver to use for graph matching. Allowed values are:

markov_chain – Matches input parameter sample_points to the graph using the Hidden Markov Model (HMM)-based method, which conducts a range-tree closest-edge search to find the best combinations of possible road segments (num_segments) for each sample point to create the best route. The route is secured one point at a time while looking ahead chain_width number of points, so the prediction is corrected after each point. This solution type is the most accurate but also the most computationally intensive. Related options: num_segments and chain_width.

match_od_pairs – Matches input parameter sample_points to find the most probable path between origin and destination pairs with cost constraints.

match_supply_demand – Matches input parameter sample_points to optimize scheduling multiple supplies (trucks) with varying sizes to varying demand sites with varying capacities per depot. Related options: partial_loading and max_combinations.

match_batch_solves – Matches input parameter sample_points source and destination pairs for the shortest path solves in batch mode.

match_loops – Matches closed loops (Eulerian paths) originating and ending at each graph node within min and max hops (levels).

match_charging_stations – Matches an optimal path across a number of ev-charging stations between source and target locations.

match_similarity – Matches the intersection set(s) by computing the Jaccard similarity score between node pairs.

match_pickup_dropoff – Matches the pickups and dropoffs by optimizing the total trip costs.

match_clusters – Matches the graph nodes with a cluster index using the Louvain clustering algorithm.

match_pattern – Matches a pattern in the graph

match_embedding – Creates vector node embeddings

match_isochrone – Solves for isochrones for a set of input sources

The default value is ‘markov_chain’.

solution_table (str) –
The name of the table used to store the results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. This table contains a track of geospatial points for the matched portion of the graph, a track ID, and a score value. Also outputs a details table containing a trip ID (that matches the track ID), the latitude/longitude pair, the timestamp the point was recorded at, and an edge ID corresponding to the matched road segment. Must not be an existing table of the same name. The default value is ‘’.

options (dict of str to str) –
Additional parameters. Allowed keys are:

gps_noise – GPS noise value (in meters) to remove redundant sample points. Use -1 to disable noise reduction. The default value accounts for 95% of point variation (±5 meters). The default value is ‘5.0’.

num_segments – Maximum number of potentially matching road segments for each sample point. For the markov_chain solver, the default is 3. The default value is ‘3’.

search_radius – Maximum search radius used when snapping sample points onto potentially matching surrounding segments. The default value corresponds to approximately 100 meters. The default value is ‘0.001’.

chain_width – For the markov_chain solver only. Length of the sample points lookahead window within the Markov kernel; the larger the number, the more accurate the solution. The default value is ‘9’.

source – Optional WKT starting point from input parameter sample_points for the solver. The default behavior for the endpoint is to use time to determine the starting point. The default value is ‘POINT NULL’.

destination – Optional WKT ending point from input parameter sample_points for the solver. The default behavior for the endpoint is to use time to determine the destination point. The default value is ‘POINT NULL’.

partial_loading – For the match_supply_demand solver only. When false (non-default), trucks do not off-load at the demand (store) side if the remainder is less than the store’s need. Allowed values are:

true – Partial off-loading at multiple store (demand) locations

false – No partial off-loading allowed if supply is less than the store’s demand.

The default value is ‘true’.

max_combinations – For the match_supply_demand solver only. The cutoff for the number of generated combinations for sequencing the demand locations; it can be increased up to 2M. The default value is ‘10000’.

max_supply_combinations – For the match_supply_demand solver only. This is the cutoff for the number of generated combinations for sequencing the supply locations if/when ‘permute_supplies’ is true. The default value is ‘10000’.

left_turn_penalty – Adds an additional weight to the edges labeled as ‘left turn’ if the ‘add_turns’ option of GPUdb.create_graph() was enabled at graph creation. The default value is ‘0.0’.

right_turn_penalty – Adds an additional weight to the edges labeled as ‘right turn’ if the ‘add_turns’ option of GPUdb.create_graph() was enabled at graph creation. The default value is ‘0.0’.

intersection_penalty – Adds an additional weight to the edges labeled as ‘intersection’ if the ‘add_turns’ option of GPUdb.create_graph() was enabled at graph creation. The default value is ‘0.0’.

sharp_turn_penalty – Adds an additional weight to the edges labeled as ‘sharp turn’ or ‘u-turn’ if the ‘add_turns’ option of GPUdb.create_graph() was enabled at graph creation. The default value is ‘0.0’.

aggregated_output – For the match_supply_demand solver only. When it is true (default), each record in the output table shows a particular truck’s scheduled cumulative round trip path (MULTILINESTRING) and the corresponding aggregated cost. Otherwise, each record shows a single scheduled truck route (LINESTRING) towards a particular demand location (store id) with its corresponding cost. The default value is ‘true’.

output_tracks – For the match_supply_demand solver only. When it is true (non-default), the output will be in tracks format for all the round trips of each truck in which the timestamps are populated directly from the edge weights starting from their originating depots. The default value is ‘false’.

max_trip_cost – For the match_supply_demand and match_pickup_dropoff solvers only. If this constraint is greater than zero, the trucks/rides will skip traveling from one demand/pick location to another if the cost between them (distance or time) is greater than this number. A value of zero (default) means no check is performed. The default value is ‘0.0’.

filter_folding_paths – For the markov_chain solver only. When true (non-default), the paths per sequence combination are checked for folding-over patterns; this can significantly increase the execution time, depending on the chain width and the number of GPS samples. Allowed values are:

true – Filter out the folded paths.

false – Do not filter out the folded paths

The default value is ‘false’.

unit_unloading_cost – For the match_supply_demand solver only. The unit cost per load amount to be delivered. If this value is greater than zero (the default is zero), the additional cost of this unit load multiplied by the total dropped load will be added to the trip cost to the demand location. The default value is ‘0.0’.

max_num_threads – For the markov_chain solver only. If specified (greater than zero), the maximum number of threads will not be greater than the specified value. It can be lower due to the memory and the number of cores available. The default value of zero allows the algorithm to set the maximal number of threads within these constraints. The default value is ‘0’.

service_limit – For the match_supply_demand solver only. If specified (greater than zero), any supply actor’s total service cost (distance or time) will be limited by the specified value including multiple rounds (if set). The default value is ‘0.0’.

enable_reuse – For the match_supply_demand solver only. If specified (true), all supply actors can be scheduled for second rounds from their originating depots. Allowed values are:

true – Allows reusing supply actors (trucks, e.g.) for scheduling again.

false – Supply actors are scheduled only once from their depots.

The default value is ‘false’.

max_stops – For the match_supply_demand solver only. If specified (greater than zero), a supply actor (truck) can have at most this many stops (demand locations) in one round trip. Otherwise, it is unlimited. If ‘enable_reuse’ is on, this condition will be applied separately at each round trip use of the same truck. The default value is ‘0’.

service_radius – For the match_supply_demand and match_pickup_dropoff solvers only. If specified (greater than zero), it filters the demands/picks outside this radius centered around the supply actor/ride’s originating location (distance or time). The default value is ‘0.0’.

permute_supplies – For the match_supply_demand solver only. If specified (true), supply side actors are permuted for the demand combinations during MSDO optimization - note that this option increases optimization time significantly - use of ‘max_combinations’ option is recommended to prevent prohibitively long runs. Allowed values are:

true – Generates sequences over supply side permutations if total supply is less than twice the total demand

false – Permutations are not performed, rather a specific order of supplies based on capacity is computed

The default value is ‘true’.

batch_tsm_mode – For the match_supply_demand solver only. When enabled, the number of visits to each demand location by a single salesman on each trip is limited to one (1); otherwise, there is no bound. Allowed values are:

true – Sets only one visit per demand location by a salesman (TSM mode)

false – No preset limit (usual MSDO mode)

The default value is ‘false’.

round_trip – For the match_supply_demand solver only. When enabled, the supply will have to return to its originating location. Allowed values are:

true – The optimization is done for trips in round trip manner always returning to originating locations

false – Supplies do not have to come back to their originating locations in their routes. The routes are considered finished at the final dropoff.

The default value is ‘true’.

num_cycles – For the match_clusters solver only. Terminates the cluster exchange iterations across 2-step-cycles (outer loop) when quality does not improve during iterations. The default value is ‘10’.

num_loops_per_cycle – For the match_clusters and match_embedding solvers only. Terminates the cluster exchanges within the first step iterations of a cycle (inner loop) unless convergence is reached. The default value is ‘10’.

num_output_clusters – For the match_clusters solver only. Limits the output to the top ‘num_output_clusters’ clusters based on density. Default value of zero outputs all clusters. The default value is ‘0’.

max_num_clusters – For the match_clusters and match_embedding solvers only. If set (value greater than zero), it terminates when the number of clusters goes below this number. For the embedding solver, the default is 8. The default value is ‘0’.

cluster_quality_metric – For the match_clusters solver only. The quality metric for Louvain modularity optimization solver. Allowed values are:

girvan – Uses the Newman Girvan quality metric for cluster solver

spectral – Applies recursive spectral bisection (RSB) partitioning solver

The default value is ‘girvan’.

restricted_type – For the match_supply_demand solver only. Optimization is performed by restricting routes labeled by ‘MSDO_ODDEVEN_RESTRICTED’ only for this supply actor (truck) type. Allowed values are:

odd – Applies odd/even rule restrictions to odd tagged vehicles.

even – Applies odd/even rule restrictions to even tagged vehicles.

none – Does not apply odd/even rule restrictions to any vehicles.

The default value is ‘none’.

server_id – Indicates which graph server(s) to send the request to. Default is to send to the server, amongst those containing the corresponding graph, that has the most computational bandwidth. The default value is ‘’.

inverse_solve – For the match_batch_solves solver only. Solves source-destination pairs using inverse shortest path solver. Allowed values are:

true – Solves using inverse shortest path solver.

false – Solves using direct shortest path solver.

The default value is ‘false’.

min_loop_level – For the match_loops solver only. Finds closed loops around each node deducible not less than this minimal hop (level) deep. The default value is ‘0’.

max_loop_level – For the match_loops solver only. Finds closed loops around each node deducible not more than this maximal hop (level) deep. The default value is ‘5’.

search_limit – For the match_loops solver only. Searches within this limit of nodes per vertex to detect loops. The value zero means there is no limit. The default value is ‘10000’.

output_batch_size – For the match_loops solver only. Uses this value as the batch size of the number of loops when flushing (inserting) to the output table. The default value is ‘1000’.

charging_capacity – For the match_charging_stations solver only. This is the maximum ev-charging capacity of a vehicle (distance in meters or time in seconds depending on the unit of the graph weights). The default value is ‘300000.0’.

charging_candidates – For the match_charging_stations solver only. The solver searches for this many stations closest to each base charging location found by capacity. The default value is ‘10’.

charging_penalty – For the match_charging_stations solver only. This is the penalty for full charging. The default value is ‘30000.0’.

max_hops – For the match_similarity and match_embedding solvers only. Searches within this maximum hops for source and target node pairs to compute the Jaccard scores. The default value is ‘3’.

traversal_node_limit – For the match_similarity solver only. Limits the traversal depth if it reaches this many nodes. The default value is ‘1000’.

paired_similarity – For the match_similarity solver only. If true, it computes Jaccard score between each pair, otherwise it will compute Jaccard from the intersection set between the source and target nodes. Allowed values are:

true

false

The default value is ‘true’.

force_undirected – For the match_pattern and match_embedding solvers only. Pattern matching will be using both pattern and graph as undirected if set to true. Allowed values are:

true

false

The default value is ‘false’.

max_vector_dimension – For the match_embedding solver only. Limits the number of dimensions in node vector embeddings. The default value is ‘1000’.

optimize_embedding_weights – For the match_embedding solver only. Solves to find the optimal weights per sub feature in vector embeddings. Allowed values are:

true

false

The default value is ‘false’.

embedding_weights – For the match_embedding solver only. User specified weights per sub feature in vector embeddings. The string contains the comma separated float values for each sub-feature in the vector space. These values will ONLY be used if ‘optimize_embedding_weights’ is false. The default value is ‘1.0,1.0,1.0,1.0’.

optimization_sampling_size – For the match_embedding solver only. Sets the number of random nodes from the graph for solving the weights using stochastic gradient descent. The default value is ‘1000’.

optimization_max_iterations – For the match_embedding solver only. When the iterations (epochs) for the convergence of the stochastic gradient descent algorithm reaches this number it bails out unless relative error between consecutive iterations is below the ‘optimization_error_tolerance’ option. The default value is ‘1000’.

optimization_error_tolerance – For the match_embedding solver only. When the relative error between all of the weights’ consecutive iterations falls below this threshold, the optimization cycle is interrupted unless the number of iterations reaches the limit set by the option ‘optimization_max_iterations’. The default value is ‘0.001’.

optimization_iteration_rate – For the match_embedding solver only. It is otherwise known as the learning rate, which is the proportionality constant in front of the gradient term in successive iterations. The default value is ‘0.3’.

max_radius – For the match_isochrone solver only. Sets the maximal reachability limit for computing isochrones. Zero means no limit. The default value is ‘0.0’.

The default value is an empty dict ( {} ).
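
Because options is a dict of str to str, numeric and boolean settings have to be passed as strings. The following minimal sketch shows one way to build such dicts for two of the solvers described above; to_option_strings is a hypothetical helper, and the particular values chosen are illustrative only.

```python
def to_option_strings(**settings):
    """Convert Python values to the str-to-str form the options dict expects."""
    options = {}
    for key, value in settings.items():
        if isinstance(value, bool):
            # Booleans are written as the lowercase strings 'true' / 'false'.
            options[key] = "true" if value else "false"
        else:
            options[key] = str(value)
    return options


# Options relevant to the markov_chain solver.
markov_options = to_option_strings(
    gps_noise=5.0,
    num_segments=3,
    search_radius=0.001,
    chain_width=9,
    filter_folding_paths=False,
)

# Options relevant to the match_supply_demand solver.
supply_demand_options = to_option_strings(
    partial_loading=True,
    max_combinations=10000,
    round_trip=True,
    max_stops=4,
)

print(markov_options)
print(supply_demand_options)
```

Either dict can then be passed as the options argument of match_graph() together with the matching solve_method.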

Returns

A dict with the following entries–

result (bool) –
Indicates a successful solution.

match_score (float) –
The mean square error calculation representing the map matching score. Values closer to zero are better.

info (dict of str to str) –
Additional information.
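
A minimal sketch of a markov_chain map-matching call and of reading the documented response entries. The helper match_track, the graph name, and the table and column names are hypothetical; db is assumed to be a connected GPUdb instance. Which identifier combinations a given solver requires is covered by the Graphs & Solvers documentation, so the sample points below are illustrative rather than a complete specification.

```python
# Illustrative sample-point identifiers; table and column names are made up.
SAMPLE_POINTS = [
    "ST_MAKEPOINT(example.gps.x, example.gps.y) AS SAMPLE_WKTPOINT",
    "example.gps.trip_id AS SAMPLE_TRIPID",
]


def match_track(db, graph_name, sample_points, solution_table, options=None):
    """Run the markov_chain solver and return the reported match score."""
    response = db.match_graph(
        graph_name=graph_name,
        sample_points=sample_points,
        solve_method="markov_chain",
        solution_table=solution_table,
        options=options or {},
    )
    if not response["result"]:
        raise RuntimeError("match_graph did not report a successful solution")
    # Values closer to zero indicate a better match.
    return response["match_score"]


# Usage with a connected client `db` (assumed to exist):
# score = match_track(db, "example_road_graph", SAMPLE_POINTS,
#                     "example.matched_track", {"chain_width": "9"})
# print(score)
```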

modify_graph(graph_name=None, nodes=None, edges=None, weights=None, restrictions=None, options={})[source]

Update an existing graph network using given nodes, edges, weights, restrictions, and options.

IMPORTANT: It’s highly recommended that you review the Graphs & Solvers concepts documentation, and Graph REST Tutorial before using this endpoint.

Parameters

graph_name (str) –
Name of the graph resource to modify.

nodes (list of str) –
Nodes with which to update existing input parameter nodes in graph specified by input parameter graph_name. Review Nodes for more information. Nodes must be specified using identifiers; identifiers are grouped as combinations. Identifiers can be used with existing column names, e.g., ‘table.column AS NODE_ID’, expressions, e.g., ‘ST_MAKEPOINT(column1, column2) AS NODE_WKTPOINT’, or raw values, e.g., ‘{9, 10, 11} AS NODE_ID’. If using raw values in an identifier combination, the number of values specified must match across the combination. Identifier combination(s) do not have to match the method used to create the graph, e.g., if column names were specified to create the graph, expressions or raw values could also be used to modify the graph. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

edges (list of str) –
Edges with which to update existing input parameter edges in graph specified by input parameter graph_name. Review Edges for more information. Edges must be specified using identifiers; identifiers are grouped as combinations. Identifiers can be used with existing column names, e.g., ‘table.column AS EDGE_ID’, expressions, e.g., ‘SUBSTR(column, 1, 6) AS EDGE_NODE1_NAME’, or raw values, e.g., “{‘family’, ‘coworker’} AS EDGE_LABEL”. If using raw values in an identifier combination, the number of values specified must match across the combination. Identifier combination(s) do not have to match the method used to create the graph, e.g., if column names were specified to create the graph, expressions or raw values could also be used to modify the graph. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

weights (list of str) –
Weights with which to update existing input parameter weights in graph specified by input parameter graph_name. Review Weights for more information. Weights must be specified using identifiers; identifiers are grouped as combinations. Identifiers can be used with existing column names, e.g., ‘table.column AS WEIGHTS_EDGE_ID’, expressions, e.g., ‘ST_LENGTH(wkt) AS WEIGHTS_VALUESPECIFIED’, or raw values, e.g., ‘{4, 15} AS WEIGHTS_VALUESPECIFIED’. If using raw values in an identifier combination, the number of values specified must match across the combination. Identifier combination(s) do not have to match the method used to create the graph, e.g., if column names were specified to create the graph, expressions or raw values could also be used to modify the graph. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

restrictions (list of str) –
Restrictions with which to update existing input parameter restrictions in graph specified by input parameter graph_name. Review Restrictions for more information. Restrictions must be specified using identifiers; identifiers are grouped as combinations. Identifiers can be used with existing column names, e.g., ‘table.column AS RESTRICTIONS_EDGE_ID’, expressions, e.g., ‘column/2 AS RESTRICTIONS_VALUECOMPARED’, or raw values, e.g., ‘{0, 0, 0, 1} AS RESTRICTIONS_ONOFFCOMPARED’. If using raw values in an identifier combination, the number of values specified must match across the combination. Identifier combination(s) do not have to match the method used to create the graph, e.g., if column names were specified to create the graph, expressions or raw values could also be used to modify the graph. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

restriction_threshold_value – Value-based restriction comparison. Any node or edge with a RESTRICTIONS_VALUECOMPARED value greater than the restriction_threshold_value will not be included in the graph.

export_create_results – If set to true, returns the graph topology in the response as arrays. Allowed values are:

true

false

The default value is ‘false’.

enable_graph_draw – If set to true, adds an ‘EDGE_WKTLINE’ column identifier to the specified graph_table so the graph can be viewed via WMS; for social and non-geospatial graphs, the ‘EDGE_WKTLINE’ column identifier will be populated with spatial coordinates derived from a flattening layout algorithm so the graph can still be viewed. Allowed values are:

true

false

The default value is ‘false’.

save_persist – If set to true, the graph will be saved in the persist directory (see the config reference for more information). If set to false, the graph will be removed when the graph server is shut down. Allowed values are:

true

false

The default value is ‘false’.

add_table_monitor – Adds a table monitor to every table used in the creation of the graph; this table monitor will trigger the graph to update dynamically upon inserts to the source table(s). Note that upon database restart, if save_persist is also set to true, the graph will be fully reconstructed and the table monitors will be reattached. For more details on table monitors, see GPUdb.create_table_monitor(). Allowed values are:

true

false

The default value is ‘false’.

graph_table – If specified, the created graph is also created as a table with the given name, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. This table will have the following identifier columns: ‘EDGE_ID’, ‘EDGE_NODE1_ID’, ‘EDGE_NODE2_ID’. If left blank, no table is created. The default value is ‘’.

remove_label_only – When RESTRICTIONS on labeled entities are requested, if set to true this will NOT delete the entity but only the label associated with the entity. Otherwise (default), it will delete the label AND the entity. Allowed values are:

true

false

The default value is ‘false’.

add_turns – Adds dummy ‘pillowed’ edges around intersection nodes where there are more than three edges so that additional weight penalties can be imposed by the solve endpoints (this increases the total number of edges). Allowed values are:

true

false

The default value is ‘false’.

turn_angle – Value in degrees modifies the thresholds for attributing right, left, sharp turns, and intersections. It is the vertical deviation angle from the incoming edge to the intersection node. The larger the value, the larger the threshold for sharp turns and intersections; the smaller the value, the larger the threshold for right and left turns; 0 < turn_angle < 90. The default value is ‘60’.

use_rtree – Use a range tree structure to accelerate and improve the accuracy of snapping, especially to edges. Allowed values are:

true

false

The default value is ‘true’.

label_delimiter – If provided the label string will be split according to this delimiter and each sub-string will be applied as a separate label onto the specified edge. The default value is ‘’.

allow_multiple_edges – Multigraph choice; allowing multiple edges with the same node pairs if set to true, otherwise, new edges with existing same node pairs will not be inserted. Allowed values are:

true

false

The default value is ‘true’.

embedding_table – If table exists (should be generated by the match/graph match_embedding solver), the vector embeddings for the newly inserted nodes will be appended into this table. The default value is ‘’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

result (bool) –
Indicates a successful modification on all servers.

num_nodes (long) –
Total number of nodes in the graph.

num_edges (long) –
Total number of edges in the graph.

edges_ids (list of longs) –
Edges given as pairs of node indices. Only populated if export_create_results is set to true.

info (dict of str to str) –
Additional information.
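
A minimal sketch of adding edges to an existing graph with modify_graph() and reading back the documented counts. The helper add_edges and the table and column names are hypothetical; db is assumed to be a connected GPUdb instance. Passing empty lists for the components that are not being changed is an assumption of this sketch.

```python
# Illustrative edge identifiers; table and column names are made up.
NEW_EDGES = [
    "example.new_links.from_name AS EDGE_NODE1_NAME",
    "example.new_links.to_name AS EDGE_NODE2_NAME",
    "example.new_links.kind AS EDGE_LABEL",
]


def add_edges(db, graph_name, edges, options=None):
    """Update a graph with new edges and return (num_nodes, num_edges)."""
    response = db.modify_graph(
        graph_name=graph_name,
        nodes=[],          # assumption: leave nodes unchanged
        edges=edges,
        weights=[],        # assumption: leave weights unchanged
        restrictions=[],   # assumption: no restrictions
        options=options or {},
    )
    if not response["result"]:
        raise RuntimeError("modify_graph did not succeed on all servers")
    return response["num_nodes"], response["num_edges"]


# Usage with a connected client `db` (assumed to exist):
# nodes, edges = add_edges(db, "example_social_graph", NEW_EDGES,
#                          {"allow_multiple_edges": "false"})
# print(nodes, edges)
```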

query_graph(graph_name=None, queries=None, restrictions=[], adjacency_table='', rings=1, options={})[source]

Employs a topological query on a graph generated a-priori by GPUdb.create_graph() and returns a list of adjacent edge(s) or node(s), also known as an adjacency list, depending on what’s been provided to the endpoint; providing edges will return nodes and providing nodes will return edges.

To determine the node(s) or edge(s) adjacent to a value from a given column, provide a list of values to input parameter queries. This field can be populated with column values from any table as long as the type is supported by the given identifier. See Query Identifiers for more information.

To return the adjacency list in the response, leave input parameter adjacency_table empty.

IMPORTANT: It’s highly recommended that you review the Graphs & Solvers concepts documentation, the Graph REST Tutorial, and/or some /match/graph examples before using this endpoint.

Parameters

graph_name (str) –
Name of the graph resource to query.

queries (list of str) –
Nodes or edges to be queried specified using query identifiers. Identifiers can be used with existing column names, e.g., ‘table.column AS QUERY_NODE_ID’, raw values, e.g., ‘{0, 2} AS QUERY_NODE_ID’, or expressions, e.g., ‘ST_MAKEPOINT(table.x, table.y) AS QUERY_NODE_WKTPOINT’. Multiple values can be provided as long as the same identifier is used for all values. If using raw values in an identifier combination, the number of values specified must match across the combination. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

restrictions (list of str) –
Additional restrictions to apply to the nodes/edges of an existing graph. Restrictions must be specified using identifiers; identifiers are grouped as combinations. Identifiers can be used with existing column names, e.g., ‘table.column AS RESTRICTIONS_EDGE_ID’, expressions, e.g., ‘column/2 AS RESTRICTIONS_VALUECOMPARED’, or raw values, e.g., ‘{0, 0, 0, 1} AS RESTRICTIONS_ONOFFCOMPARED’. If using raw values in an identifier combination, the number of values specified must match across the combination. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

adjacency_table (str) –
Name of the table to store the resulting adjacencies, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. If left blank, the query results are instead returned in the response. If the ‘QUERY_TARGET_NODE_LABEL’ query identifier is used in input parameter queries, then two additional columns will be available: ‘PATH_ID’ and ‘RING_ID’. See Using Labels for more information. The default value is ‘’.

rings (int) –
Sets the number of rings around the node to query for adjacency, with ‘1’ being the edges directly attached to the queried node. Also known as number of hops. For example, if it is set to ‘2’, the edge(s) directly attached to the queried node(s) will be returned; in addition, the edge(s) attached to the node(s) attached to the initial ring of edge(s) surrounding the queried node(s) will be returned. If the value is set to ‘0’, any nodes that meet the criteria in input parameter queries and input parameter restrictions will be returned. This parameter is only applicable when querying nodes. The default value is 1.

options (dict of str to str) –
Additional parameters. Allowed keys are:

force_undirected – If set to true, all inbound edges and outbound edges relative to the node will be returned. If set to false, only outbound edges relative to the node will be returned. This parameter is only applicable if the queried graph input parameter graph_name is directed and when querying nodes. Consult Directed Graphs for more details. Allowed values are:

true

false

The default value is ‘false’.

limit – When specified (>0), limits the number of query results. The size of the nodes table will be limited by the limit value. The default value is ‘0’.

output_wkt_path – If true then concatenated wkt line segments will be added as the WKT column of the adjacency table. Allowed values are:

true

false

The default value is ‘false’.

and_labels – If set to true, the result of the query has entities that satisfy all of the target labels, instead of any. Allowed values are:

true

false

The default value is ‘false’.

server_id – Indicates which graph server(s) to send the request to. Default is to send to the server, amongst those containing the corresponding graph, that has the most computational bandwidth.

output_charn_length – When specified (>0 and <=256), limits the character length of string-based node values in the output tables. The default value is ‘64’.

find_common_labels – If set to true, for many-to-many queries or multi-level traversals, it lists the common labels between the source and target nodes and edge labels in each path. Otherwise (zero rings), it’ll list all labels of the node(s) queried. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).
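
The effect of the rings parameter described above can be illustrated with a small, purely local breadth-first traversal. This is a conceptual sketch over an in-memory adjacency dict, not the server's implementation; edges_within_rings and the sample data are hypothetical. It follows outbound edges only, which corresponds to force_undirected being false.

```python
def edges_within_rings(adjacency, start, rings):
    """Collect the edges reachable within `rings` hops of `start`.

    adjacency maps a node to the list of nodes it has outbound edges to.
    """
    found = []
    visited = {start}
    frontier = [start]
    for _ in range(rings):
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, []):
                edge = (node, neighbor)
                if edge not in found:
                    found.append(edge)
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return found


roads = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["E"]}

print(edges_within_rings(roads, "A", 1))  # edges directly attached to A
print(edges_within_rings(roads, "A", 2))  # plus edges one hop further out
```

With rings set to 0 this sketch returns no edges; on the server, that setting returns the nodes that meet the query and restriction criteria instead.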

Returns

A dict with the following entries–

result (bool) –
Indicates a successful query.

adjacency_list_int_array (list of longs) –
The adjacency entity integer ID: either edge IDs per node requested (if using QUERY_EDGE_ID or QUERY_NODE1_ID and QUERY_NODE2_ID in the input) or two node IDs per edge requested (if using QUERY_NODE_ID in the input).

adjacency_list_string_array (list of str) –
The adjacency entity string ID: either edge IDs per node requested (if using QUERY_EDGE_NAME or QUERY_NODE1_NAME and QUERY_NODE2_NAME in the input) or two node IDs per edge requested (if using QUERY_NODE_NAME in the input).

adjacency_list_wkt_array (list of str) –
The adjacency entity WKTPOINT or WKTLINE ID: either edge IDs per node requested (if using QUERY_EDGE_WKTLINE or QUERY_NODE1_WKTPOINT and QUERY_NODE2_WKTPOINT in the input) or two node IDs per edge requested (if using QUERY_NODE_WKTPOINT in the input).

info (dict of str to str) –
Additional information.
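
Example: the options listed above are passed as a dict of str to str, so booleans and numbers need to be written as strings. The sketch below is a minimal, hypothetical helper (build_query_options is not part of the gpudb package) that applies the documented range for output_charn_length before building the dict.

```python
def build_query_options(limit=0, and_labels=False, force_undirected=False,
                        output_charn_length=64):
    """Build an options dict (str -> str) from the keys documented above."""
    if limit < 0:
        raise ValueError("limit must be >= 0")
    if not 0 < output_charn_length <= 256:
        raise ValueError("output_charn_length must be > 0 and <= 256")

    def flag(value):
        # Boolean options are passed as the strings 'true' / 'false'.
        return "true" if value else "false"

    return {
        "limit": str(limit),
        "and_labels": flag(and_labels),
        "force_undirected": flag(force_undirected),
        "output_charn_length": str(output_charn_length),
    }


options = build_query_options(limit=100, and_labels=True)
```

The resulting dict can be supplied as the options argument of the graph query call on a connected client.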

repartition_graph(graph_name=None, options={})[source]

Rebalances an existing partitioned graph.

IMPORTANT: It’s highly recommended that you review the Graphs & Solvers concepts documentation, the Graph REST Tutorial, and/or some graph examples before using this endpoint.

Parameters

graph_name (str) –
Name of the graph resource to rebalance.

options (dict of str to str) –
Optional parameters. Allowed keys are:

new_graph_name – If a non-empty value is specified, the original graph will be kept (non-default behavior) and a new balanced graph will be created under this given name. When the value is empty (default), the generated ‘balanced’ graph will replace the original ‘unbalanced’ graph under the same graph name. The default value is ‘’.

source_node – The distributed shortest path solve is run from this source node to all the nodes in the graph to create balanced partitions using the iso-distance levels of the solution. The source node is selected by the rebalance algorithm automatically (default case when the value is an empty string). Otherwise, the user specified node is used as the source. The default value is ‘’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

result (bool) –
Indicates a successful rebalancing on all servers.

info (dict of str to str) –
Additional information.

restore_backup(backup_name='', restore_objects_map=None, datasource_name=None, options={})[source]

Restores objects from a backup instance and returns details of the restoration operation.

Parameters

backup_name (str) –
Name of the backup object, which must refer to a currently existing backup. The default value is ‘’.

restore_objects_map (dict of str to str) –
Map of objects to be restored from the backup. Error if empty. Allowed keys are:

all – All object types in a schema (excludes permissions, system configuration, host secret key, KiFS directories and user defined functions)

table – Database Table

credential – Credential

context – Context

datasink – Data Sink

datasource – Data Source

stored_procedure – SQL Procedure

monitor – Table Monitor (Stream)

user – User (internal and external) and associated permissions

role – Role, role members (roles or users, recursively) and associated permissions

configuration – If true, restore the database configuration file. Allowed values are:

false

true

The default value is ‘false’.

datasource_name (str) –
Datasource where backup is located.

options (dict of str to str) –
Optional parameters. Allowed keys are:

backup_id – Backup instance ID to restore. Leave empty to restore the most recent backup instance. The default value is ‘’.

restore_policy – Behavior to apply when restoring objects that already exist. Allowed values are:

none – If an object to be restored currently exists with the same name, abort and return error

replace – If an object to be restored currently exists with the same name, replace it with the backup version

rename – If an object to be restored currently exists with the same name, rename the original version

The default value is ‘none’.

renamed_objects_schema – If the restore policy is rename, optionally use this schema for renamed objects instead of a default generated one. The default value is ‘’.

create_schema_if_not_exist – Create the schema for an object to be restored if it does not currently exist. Error otherwise. Allowed values are:

false

true

The default value is ‘true’.

ddl_only – Only recreate the objects from their DDL; do not restore table data. Allowed values are:

true

false

The default value is ‘false’.

checksum – Verify checksum for backup files. Allowed values are:

false

true

The default value is ‘true’.

dry_run – Does a dry-run restoration operation. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

backup_name (str) –
The backup name

backup_id (long) –
The backup ID that was restored

restored_bytes (long) –
Total size of data restored from backup

restored_files (long) –
Total number of files restored from backup

restored_records (long) –
Total number of records restored from backup

restored_objects (dict of str to str) –
Objects that were successfully restored and their associated types.

renamed_objects (dict of str to str) –
Original and new names of objects that were successfully restored and their associated types.

failed_objects (dict of str to str) –
Objects that failed to be restored and their associated types.

info (dict of str to str) –
Additional information.
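
Example: a dry run is a cautious way to check what a restore would do. The sketch below assembles keyword arguments for restore_backup() and inspects the failed_objects entry of a response. The helper names, the backup, data source and table names, and the assumption that each restore_objects_map value is an object name are all hypothetical; the value format is not specified above.

```python
RESTORE_POLICIES = {"none", "replace", "rename"}


def restore_backup_args(backup_name, datasource_name, restore_objects_map,
                        restore_policy="none", dry_run=True):
    """Assemble keyword arguments for restore_backup()."""
    if not restore_objects_map:
        # The parameter description states that an empty map is an error.
        raise ValueError("restore_objects_map must not be empty")
    if restore_policy not in RESTORE_POLICIES:
        raise ValueError("unknown restore_policy: %r" % (restore_policy,))
    return {
        "backup_name": backup_name,
        "restore_objects_map": dict(restore_objects_map),
        "datasource_name": datasource_name,
        "options": {
            "restore_policy": restore_policy,
            "dry_run": "true" if dry_run else "false",
        },
    }


def restore_failures(response):
    """Return the failed_objects entry, or an empty dict if it is absent."""
    return dict(response.get("failed_objects") or {})


# Hypothetical names; the value format of restore_objects_map is an assumption.
args = restore_backup_args(
    "nightly_backup", "backup_source", {"table": "example_schema.orders"}
)
# With a connected client: response = db.restore_backup(**args)
```

Here db stands for a connected gpudb.GPUdb instance; once the dry run reports no failed objects, the same arguments can be reused with dry_run=False.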

revoke_permission(principal='', object=None, object_type=None, permission=None, options={})[source]

Revokes the specified permission on the specified object from a user or role.

Parameters

principal (str) –
Name of the user or role for which the permission is being revoked. Must be an existing user or role. The default value is ‘’.

object (str) –
Name of the object from which the permission is being revoked. It is recommended to use a fully-qualified name when possible.

object_type (str) –
The type of the object on which the permission is being revoked. Allowed values are:

context – Context

credential – Credential

datasink – Data Sink

datasource – Data Source

directory – KIFS File Directory

graph – A Graph object

proc – UDF Procedure

schema – Schema

sql_proc – SQL Procedure

system – System-level access

table – Database Table

table_monitor – Table monitor

permission (str) –
Permission being revoked. Allowed values are:

admin – Full read/write and administrative access on the object.

connect – Connect access on the given data source or data sink.

create – Ability to create new objects of this type.

delete – Delete rows from tables.

execute – Ability to Execute the Procedure object.

insert – Insert access to tables.

read – Ability to read, list and use the object.

send_alert – Ability to send system alerts.

update – Update access to the table.

user_admin – Access to administer users and roles that do not have system_admin permission.

write – Access to write, change and delete objects.

options (dict of str to str) –
Optional parameters. Allowed keys are:

columns – Revoke table security from these columns, comma-separated. The default value is ‘’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

principal (str) –
Value of input parameter principal.

object (str) –
Value of input parameter object.

object_type (str) –
Value of input parameter object_type.

permission (str) –
Value of input parameter permission.

info (dict of str to str) –
Additional information.
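
Example: because object_type and permission accept only the values listed above, it can help to check them on the client before sending the request. This is a minimal sketch; revoke_permission_args is a hypothetical helper, and the principal, table and column names are placeholders.

```python
OBJECT_TYPES = {
    "context", "credential", "datasink", "datasource", "directory", "graph",
    "proc", "schema", "sql_proc", "system", "table", "table_monitor",
}
PERMISSIONS = {
    "admin", "connect", "create", "delete", "execute", "insert", "read",
    "send_alert", "update", "user_admin", "write",
}


def revoke_permission_args(principal, object_name, object_type, permission,
                           columns=None):
    """Assemble keyword arguments for revoke_permission()."""
    if object_type not in OBJECT_TYPES:
        raise ValueError("unknown object_type: %r" % (object_type,))
    if permission not in PERMISSIONS:
        raise ValueError("unknown permission: %r" % (permission,))
    options = {}
    if columns:
        # The columns option is documented as a comma-separated string.
        options["columns"] = ",".join(columns)
    return {
        "principal": principal,
        "object": object_name,
        "object_type": object_type,
        "permission": permission,
        "options": options,
    }


# Placeholder names for illustration only.
args = revoke_permission_args(
    "analyst", "example_schema.orders", "table", "read",
    columns=["ssn", "salary"],
)
# With a connected client: response = db.revoke_permission(**args)
```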

revoke_permission_credential(name=None, permission=None, credential_name=None, options={})[source]

Revokes a credential-level permission from a user or role.

Parameters

name (str) –
Name of the user or role from which the permission will be revoked. Must be an existing user or role.

permission (str) –
Permission to revoke from the user or role. Allowed values are:

credential_admin – Full read/write and administrative access on the credential.

credential_read – Ability to read and use the credential.

credential_name (str) –
Name of the credential on which the permission will be revoked. Must be an existing credential, or an empty string to revoke access on all credentials.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

permission (str) –
Value of input parameter permission.

credential_name (str) –
Value of input parameter credential_name.

info (dict of str to str) –
Additional information.

revoke_permission_datasource(name=None, permission=None, datasource_name=None, options={})[source]

Revokes a data source permission from a user or role.

Parameters

name (str) –
Name of the user or role from which the permission will be revoked. Must be an existing user or role.

permission (str) –
Permission to revoke from the user or role. Allowed values are:

admin – Admin access on the given data source

connect – Connect access on the given data source

datasource_name (str) –
Name of the data source on which the permission will be revoked. Must be an existing data source, or an empty string to revoke permission from all data sources.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

permission (str) –
Value of input parameter permission.

datasource_name (str) –
Value of input parameter datasource_name.

info (dict of str to str) –
Additional information.

revoke_permission_directory(name=None, permission=None, directory_name=None, options={})[source]

Revokes a KiFS directory-level permission from a user or role.

Parameters

name (str) –
Name of the user or role from which the permission will be revoked. Must be an existing user or role.

permission (str) –
Permission to revoke from the user or role. Allowed values are:

directory_read – For files in the directory, access to list files, download files, or use files in server side functions.

directory_write – Access to upload files to, or delete files from, the directory. A user or role with write access automatically has read access.

directory_name (str) –
Name of the KiFS directory on which the permission will be revoked.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

permission (str) –
Value of input parameter permission.

directory_name (str) –
Value of input parameter directory_name.

info (dict of str to str) –
Additional information.

revoke_permission_proc(name=None, permission=None, proc_name=None, options={})[source]

Revokes a proc-level permission from a user or role.

Parameters

name (str) –
Name of the user or role from which the permission will be revoked. Must be an existing user or role.

permission (str) –
Permission to revoke from the user or role. Allowed values are:

proc_admin – Admin access to the proc.

proc_execute – Execute access to the proc.

proc_name (str) –
Name of the proc on which the permission will be revoked. Must be an existing proc, or an empty string to revoke the permission on all procs.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

permission (str) –
Value of input parameter permission.

proc_name (str) –
Value of input parameter proc_name.

info (dict of str to str) –
Additional information.

revoke_permission_system(name=None, permission=None, options={})[source]

Revokes a system-level permission from a user or role.

Parameters

name (str) –
Name of the user or role from which the permission will be revoked. Must be an existing user or role.

permission (str) –
Permission to revoke from the user or role. Allowed values are:

system_admin – Full access to all data and system functions.

system_user_admin – Access to administer users and roles that do not have system_admin permission.

system_write – Read and write access to all tables.

system_read – Read-only access to all tables.

system_send_alert – Send system alerts.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

permission (str) –
Value of input parameter permission.

info (dict of str to str) –
Additional information.

revoke_permission_table(name=None, permission=None, table_name=None, options={})[source]

Revokes a table-level permission from a user or role.

Parameters

name (str) –
Name of the user or role from which the permission will be revoked. Must be an existing user or role.

permission (str) –
Permission to revoke from the user or role. Allowed values are:

table_admin – Full read/write and administrative access to the table.

table_insert – Insert access to the table.

table_update – Update access to the table.

table_delete – Delete access to the table.

table_read – Read access to the table.

table_name (str) –
Name of the table on which the permission will be revoked, in [schema_name.]table_name format, using standard name resolution rules. Must be an existing table, view or schema.

options (dict of str to str) –
Optional parameters. Allowed keys are:

columns – Apply security to these columns, comma-separated. The default value is ‘’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

name (str) –
Value of input parameter name.

permission (str) –
Value of input parameter permission.

table_name (str) –
Value of input parameter table_name.

info (dict of str to str) –
Additional information.

revoke_role(role=None, member=None, options={})[source]

Revokes membership in a role from a user or role.

Parameters

role (str) –
Name of the role in which membership will be revoked. Must be an existing role.

member (str) –
Name of the user or role that will be revoked membership in input parameter role. Must be an existing user or role.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

role (str) –
Value of input parameter role.

member (str) –
Value of input parameter member.

info (dict of str to str) –
Additional information.

show_backup(backup_name='', datasource_name=None, options={})[source]

Shows information about a backup. Returns detailed information about one or more backup instances.

Parameters

backup_name (str) –
Name of the backup object. An empty string or ‘*’ will return all existing backups. The default value is ‘’.

datasource_name (str) –
Datasource where backup is located.

options (dict of str to str) –
Optional parameters. Allowed keys are:

backup_id – Backup instance ID to show. Leave empty to show information from the most recent backup instance in the container. The default value is ‘’.

show_contents – Shows the contents of the specified backup_id. Allowed values are:

none – No backup contents

object_names – Object names only

object_files – Object names and files

The default value is ‘none’.

no_error_if_not_exists – If false, an error is returned when the provided input parameter backup_name does not exist. If true, an empty result is returned instead. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

backup_name (str) –
Value of input parameter backup_name.

backup_description (list of dicts of str to str) –
Backup description

backup_ids (list of dicts of str to str) –
Backup instances in this backup

backup_contents (list of dicts of str to str) –
Backup contents

deleted_backup_ids (list of dicts of str to str) –
Backup instances that have been deleted from this backup object

info (dict of str to str) –
Additional information.

show_credential(credential_name=None, options={})[source]

Shows information about a specified credential or all credentials.

Parameters

credential_name (str) –
Name of the credential on which to retrieve information. The name must refer to a currently existing credential. If ‘*’ is specified, information about all credentials will be returned.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

credential_names (list of str) –
A list of all credential names.

credential_types (list of str) –
A list of each credential’s type.

credential_identities (list of str) –
A list of each credential’s identity.

credentials (list of str) –
A list of each credential’s create_credential_request JSON encoded structure.

additional_info (list of dicts of str to str) –
Additional information about the respective credential in output parameter credential_names.

info (dict of str to str) –
Additional information.

show_datasink(name=None, options={})[source]

Shows information about a specified data sink or all data sinks.

Parameters

name (str) –
Name of the data sink for which to retrieve information. The name must refer to a currently existing data sink. If ‘*’ is specified, information about all data sinks will be returned.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

datasink_names (list of str) –
The data sink names.

destination_types (list of str) –
The destination type of the data sinks named in output parameter datasink_names.

additional_info (list of dicts of str to str) –
Additional information about the respective data sinks in output parameter datasink_names. Allowed keys are:

destination – Destination for the output data in ‘destination_type://path[:port]’ format

kafka_topic_name – Kafka topic if the data sink type is a Kafka broker

user_name – Name of the remote system user

info (dict of str to str) –
Additional information.

show_datasource(name=None, options={})[source]

Shows information about a specified data source or all data sources.

Parameters

name (str) –
Name of the data source for which to retrieve information. The name must refer to a currently existing data source. If ‘*’ is specified, information about all data sources will be returned.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

datasource_names (list of str) –
The data source names.

storage_provider_types (list of str) –
The storage provider type of the data sources named in output parameter datasource_names. Allowed values are:

hdfs – Apache Hadoop Distributed File System

s3 – Amazon S3 bucket

additional_info (list of dicts of str to str) –
Additional information about the respective data sources in output parameter datasource_names. Allowed keys are:

location – Location of the remote storage in ‘storage_provider_type://[storage_path[:storage_port]]’ format

s3_bucket_name – Name of the Amazon S3 bucket used as the data source

s3_region – Name of the Amazon S3 region where the bucket is located

hdfs_kerberos_keytab – Kerberos key for the given HDFS user

user_name – Name of the remote system user

info (dict of str to str) –
Additional information.

show_directories(directory_name='', options={})[source]

Shows information about directories in KiFS. Can be used to show a single directory, or all directories.

Parameters

directory_name (str) –
The KiFS directory name to show. If empty, shows all directories. The default value is ‘’.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

directories (list of str) –
KiFS directory names

users (list of str) –
User that created each directory for the respective directories in output parameter directories

creation_times (list of longs) –
The creation time for each directory in milliseconds since epoch, for the respective directories in output parameter directories

data_usages (list of longs) –
The data usage of each directory in bytes, for the respective directories in output parameter directories

data_limits (list of longs) –
The data limit for each directory in bytes, for the respective directories in output parameter directories

permissions (list of str) –
Highest level of permission the calling user has for the respective directories in output parameter directories. Will be empty if no permissions. If a user has been granted both read and write permissions, ‘directory_write’ will be listed.

info (dict of str to str) –
Additional information.
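
Example: the response fields are parallel lists, where the i-th element of each list describes the i-th directory. The sketch below regroups them into one record per directory and converts the creation time to a datetime. The helper name and the sample values are hypothetical; only the field names come from the list above.

```python
from datetime import datetime, timezone

# (response field, key used in each per-directory record)
FIELDS = (
    ("directories", "name"),
    ("users", "user"),
    ("creation_times", "created_ms"),
    ("data_usages", "usage_bytes"),
    ("data_limits", "limit_bytes"),
    ("permissions", "permission"),
)


def directory_records(response):
    """Turn the parallel lists of a show_directories() response into dicts."""
    columns = [response[field] for field, _ in FIELDS]
    keys = [key for _, key in FIELDS]
    records = []
    for row in zip(*columns):
        record = dict(zip(keys, row))
        # creation_times is documented as milliseconds since epoch.
        record["created"] = datetime.fromtimestamp(
            record["created_ms"] / 1000.0, tz=timezone.utc
        )
        records.append(record)
    return records


# Hypothetical response, shaped like the entries listed above.
sample_response = {
    "directories": ["data", "scratch"],
    "users": ["alice", "bob"],
    "creation_times": [1700000000000, 1700000060000],
    "data_usages": [1024, 0],
    "data_limits": [1048576, 1048576],
    "permissions": ["directory_write", "directory_read"],
    "info": {},
}
records = directory_records(sample_response)
```

The same regrouping idea applies to other responses in this class that return one list per attribute.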

show_environment(environment_name='', options={})[source]

Shows information about a specified user-defined function (UDF) environment or all environments. Returns detailed information about existing environments.

Parameters

environment_name (str) –
Name of the environment on which to retrieve information. The name must refer to a currently existing environment. If ‘*’ or an empty value is specified, information about all environments will be returned. The default value is ‘’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

no_error_if_not_exists – If true and if the environment specified in input parameter environment_name does not exist, no error is returned. If false and if the environment specified in input parameter environment_name does not exist, then an error is returned. Allowed values are:

true

false

The default value is ‘false’.

show_names_only – If true, only return the names of the installed environments and omit the package listing. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

environment_names (list of str) –
A list of all environment names.

packages (list of lists of str) –
Information about the installed packages in the respective environments in output parameter environment_names.

info (dict of str to str) –
Additional information.

show_files(paths=None, options={})[source]

Shows information about files in KiFS. Can be used for individual files, or to show all files in a given directory.

Parameters

paths (list of str) –
File paths to show. Each path can be a KiFS directory name, or a full path to a KiFS file. File paths may contain wildcard characters after the KiFS directory delimiter.

Accepted wildcard characters are asterisk (*) to represent any string of zero or more characters, and question mark (?) to indicate a single character. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

file_names (list of str) –
A listing of files in the paths specified

sizes (list of longs) –
Size of each file, in bytes

users (list of str) –
User that created the file

creation_times (list of longs) –
Creation time for each file, in milliseconds since epoch

info (dict of str to str) –
Additional information. Allowed keys are:

multipart_uploads – JSON-encoded information about multipart uploads in progress
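
Example: the wildcard rules described for paths (asterisk for zero or more characters, question mark for exactly one) can be approximated locally, for instance to test a pattern before sending it. This is only a sketch: matching is performed by the server, the helper below is hypothetical, and it does not enforce the rule that wildcards may appear only after the KiFS directory delimiter.

```python
import re


def kifs_pattern_matches(path_pattern, file_path):
    """Approximate the documented '*' and '?' wildcards with a regex."""
    parts = []
    for ch in path_pattern:
        if ch == "*":
            parts.append(".*")   # any string of zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), file_path) is not None


# Placeholder paths for illustration only.
matches_year = kifs_pattern_matches("data/report_*.csv", "data/report_2024.csv")
matches_two_digits = kifs_pattern_matches("data/file?.txt", "data/file10.txt")
```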

show_graph(graph_name='', options={})[source]

Shows information and characteristics of graphs that exist on the graph server.

Parameters

graph_name (str) –
Name of the graph on which to retrieve information. If left as the default value, information about all graphs is returned. The default value is ‘’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

show_original_request – If set to true, the request that was originally used to create the graph is also returned as JSON. Allowed values are:

true

false

The default value is ‘true’.

server_id – Indicates which graph server(s) to send the request to. Default is to get information about all the servers.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

result (bool) –
Indicates a success. This call will fail if the graph specified in the request does not exist.

load (list of ints) –
A percentage approximating the current computational load on the server.

memory (list of longs) –
Available memory.

graph_names (list of str) –
Name(s) of the graph(s).

graph_server_ids (list of ints) –
Id(s) of the graph(s).

graph_owner_user_names (list of str) –
Owner of the graph(s) and associated solution table(s).

graph_owner_resource_groups (list of str) –
Owner of the resource groups(s) of the graph(s).

directed (list of bools) –
Whether or not the edges of the graph have directions (bi-directional edges can still exist in directed graphs). Consult Directed Graphs for more details.

num_nodes (list of longs) –
Total number of nodes in the graph.

num_edges (list of longs) –
Total number of edges in the graph.

num_bytes (list of longs) –
Memory this graph uses in bytes.

resource_capacity (list of longs) –
Memory this graph uses in bytes.

is_persisted (list of bools) –
Shows whether or not the graph is persisted (saved and loaded on launch).

is_partitioned (list of bools) –
Indicates if the graph data is distributed across all available servers.

is_sync_db (list of bools) –
Shows whether or not the graph is linked to the original tables that created it, and will potentially be re-created instead of being loaded from persist on launch.

has_insert_table_monitor (list of bools) –
Shows whether or not the graph has an insert table monitor attached to it.

original_request (list of str) –
The original client request used to create the graph (before any expression evaluation or separator processing).

info (dict of str to str) –
Additional information.

show_proc(proc_name='', options={})[source]

Shows information about a proc.

Parameters

proc_name (str) –
Name of the proc to show information about. If specified, must be the name of a currently existing proc. If not specified, information about all procs will be returned. The default value is ‘’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

include_files – If set to true, the files that make up the proc will be returned. If set to false, the files will not be returned. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

proc_names (list of str) –
The proc names.

execution_modes (list of str) –
The execution modes of the procs named in output parameter proc_names. Allowed values are:

distributed – Distributed

nondistributed – Nondistributed

files (list of dicts of str to bytes) –
Maps of the files that make up the procs named in output parameter proc_names.

commands (list of str) –
The commands (excluding arguments) that will be invoked when the procs named in output parameter proc_names are executed.

args (list of lists of str) –
Arrays of command-line arguments that will be passed to the procs named in output parameter proc_names when executed.

options (list of dicts of str to str) –
The optional parameters for the procs named in output parameter proc_names.

info (dict of str to str) –
Additional information.

show_proc_status(run_id='', options={})[source]

Shows the statuses of running or completed proc instances. Results are grouped by run ID (as returned from GPUdb.execute_proc()) and data segment ID (each invocation of the proc command on a data segment is assigned a data segment ID).

Parameters

run_id (str) –
The run ID of a specific proc instance for which the status will be returned. If a proc with a matching run ID is not found, the response will be empty. If not specified, the statuses of all executed proc instances will be returned. The default value is ‘’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

clear_complete – If set to true, if a proc instance has completed (either successfully or unsuccessfully) then its status will be cleared and no longer returned in subsequent calls. Allowed values are:

true

false

The default value is ‘false’.

run_tag – If input parameter run_id is specified, return the status for a proc instance that has a matching run ID and a matching run tag that was provided to GPUdb.execute_proc(). If input parameter run_id is not specified, return statuses for all proc instances where a matching run tag was provided to GPUdb.execute_proc(). The default value is ‘’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

proc_names (dict of str to str) –
The proc names corresponding to the returned run IDs.

params (dict of str to dicts of str to str) –
The string params passed to GPUdb.execute_proc() for the returned run IDs.

bin_params (dict of str to dicts of str to bytes) –
The binary params passed to GPUdb.execute_proc() for the returned run IDs.

input_table_names (dict of str to lists of str) –
The input table names passed to GPUdb.execute_proc() for the returned run IDs.

input_column_names (dict of str to dicts of str to lists of str) –
The input column names passed to GPUdb.execute_proc() for the returned run IDs, supplemented with the column names for input tables not included in the input column name map.

output_table_names (dict of str to lists of str) –
The output table names passed to GPUdb.execute_proc() for the returned run IDs.

options (dict of str to dicts of str to str) –
The optional parameters passed to GPUdb.execute_proc() for the returned run IDs.

overall_statuses (dict of str to str) –
Overall statuses for the returned run IDs. Note that these are rollups and individual statuses may differ between data segments for the same run ID; see output parameter statuses and output parameter messages for statuses from individual data segments. Allowed values are:

running – The proc instance is currently running.

complete – The proc instance completed with no errors.

killed – The proc instance was killed before completion.

error – The proc instance failed with an error.

none – The proc instance does not have a status, i.e. it has not yet run.

statuses (dict of str to dicts of str to str) –
Statuses for the returned run IDs, grouped by data segment ID. Allowed values are:

running – The proc instance is currently running.

complete – The proc instance completed with no errors.

killed – The proc instance was killed before completion.

error – The proc instance failed with an error.

none – The proc instance does not have a status, i.e. it has not yet run.

messages (dict of str to dicts of str to str) –
Messages containing additional status information for the returned run IDs, grouped by data segment ID.

results (dict of str to dicts of str to dicts of str to str) –
String results for the returned run IDs, grouped by data segment ID.

bin_results (dict of str to dicts of str to dicts of str to bytes) –
Binary results for the returned run IDs, grouped by data segment ID.

output (dict of str to dicts of str to dicts of str to lists of str) –
Output lines for the returned run IDs, grouped by data segment ID. Allowed keys are:

stdout – Output lines from stdout.

stderr – Output lines from stderr.

timings (dict of str to dicts of str to dicts of str to longs) –
Timing information for the returned run IDs, grouped by data segment ID.

info (dict of str to str) –
Additional information.
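
Example: since overall_statuses is a rollup, a run can hide segments whose individual status differs. The sketch below lists the runs that are still running and collects the data segments that ended in error or were killed, together with their messages. The helper names, run IDs and segment IDs are hypothetical; the nesting follows the types listed above.

```python
def running_runs(response):
    """Return the run IDs whose overall status is 'running', sorted."""
    return sorted(
        run_id
        for run_id, status in response["overall_statuses"].items()
        if status == "running"
    )


def problem_segments(response):
    """Map run ID -> {segment ID: (status, message)} for failed segments."""
    problems = {}
    for run_id, per_segment in response["statuses"].items():
        messages = response.get("messages", {}).get(run_id, {})
        bad = {
            segment_id: (status, messages.get(segment_id, ""))
            for segment_id, status in per_segment.items()
            if status in ("error", "killed")
        }
        if bad:
            problems[run_id] = bad
    return problems


# Hypothetical response fragment; real IDs will look different.
sample_response = {
    "overall_statuses": {"run-1": "error", "run-2": "running"},
    "statuses": {
        "run-1": {"0": "complete", "1": "error"},
        "run-2": {"0": "running"},
    },
    "messages": {
        "run-1": {"0": "", "1": "division by zero"},
        "run-2": {"0": ""},
    },
}
still_running = running_runs(sample_response)
problems = problem_segments(sample_response)
```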

show_resource_objects(options={})[source]

Returns information about the internal sub-components (tiered objects) which use resources of the system. The request can either return results from actively used objects (default) or it can be used to query the status of the objects of a given list of tables. Returns detailed information about the requested resource objects.

Parameters

options (dict of str to str) –
Optional parameters. Allowed keys are:

tiers – Comma-separated list of tiers to query, leave blank for all tiers.

expression – An expression to filter the returned objects. Expression is limited to the following operators: =,!=,<,<=,>,>=,+,-,*,AND,OR,LIKE. For details see Expressions. To use a more complex expression, query the ki_catalog.ki_tiered_objects table directly.

order_by – Single column to be sorted by as well as the sort direction, e.g., ‘size asc’. Allowed values are:

size

id

priority

tier

evictable

owner_resource_group

limit – An integer indicating the maximum number of results to be returned, per rank, or (-1) to indicate that the maximum number of results allowed by the server should be returned. The number of records returned will never exceed the server’s own limit, defined by the max_get_records_size parameter in the server configuration. The default value is ‘100’.

table_names – Comma-separated list of tables to restrict the results to. Use ‘*’ to show all tables.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

rank_objects (dict of str to str) –
Tier usage across ranks. Layout is: response.rank_usage[rank_number][resource_group_name] = group_usage (as stringified json)

info (dict of str to str) –
Additional information.
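
A minimal sketch of how the options above might be assembled and the response decoded. It assumes each value in rank_objects is a JSON string, as the layout note suggests. The helper names and the tier names in the usage line are hypothetical and not part of the API.

```python
import json


def resource_object_options(tiers=None, order_by=None, limit=None, table_names=None):
    """Build the options dict; every key and value must be a str."""
    options = {}
    if tiers:
        options["tiers"] = ",".join(tiers)
    if order_by:
        options["order_by"] = order_by  # e.g. "size asc"
    if limit is not None:
        options["limit"] = str(limit)
    if table_names:
        options["table_names"] = ",".join(table_names)
    return options


def decode_rank_objects(response):
    """Parse each rank's stringified JSON (assumed format) into Python objects."""
    return {rank: json.loads(text) for rank, text in response["rank_objects"].items()}


# Tier names here are placeholders; use the tiers configured on your server.
options = resource_object_options(tiers=["RAM", "PERSIST"], order_by="size desc", limit=10)
# response = gpudb.show_resource_objects(options=options)
# objects_by_rank = decode_rank_objects(response)
```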

show_resource_statistics(options={})[source]

Requests various statistics for storage/memory tiers and resource groups. Returns statistics on a per-rank basis.

Parameters

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

statistics_map (dict of str to str) –
Map of resource statistics.

info (dict of str to str) –
Additional information.

show_resource_groups(names=None, options={})[source]

Requests resource group properties. Returns detailed information about the requested resource groups.

Parameters

names (list of str) –
List of names of groups to be shown. A single entry with an empty string returns all groups. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

show_default_values – If true, include values of fields that are based on the default resource group. Allowed values are:

true

false

The default value is ‘true’.

show_default_group – If true, include the default and system resource groups in the response. This value defaults to false if an explicit list of group names is provided, and true otherwise. Allowed values are:

true

false

The default value is ‘true’.

show_tier_usage – If true, include the resource group usage on the worker ranks in the response. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

groups (list of dicts of str to str) –
Map of resource group information.

rank_usage (dict of str to str) –
Tier usage across ranks. Layout is: response.rank_usage[rank_number][resource_group_name] = group_usage (as stringified json)

info (dict of str to str) –
Additional information.
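
Because options is a dict of str to str, boolean flags are passed as the strings 'true' and 'false' rather than Python booleans. The sketch below shows one way to do that conversion; flag_options is a hypothetical helper, not part of the API.

```python
def flag_options(**flags):
    """Convert Python booleans to the 'true'/'false' strings the options dict expects."""
    return {name: "true" if value else "false" for name, value in flags.items()}


options = flag_options(show_default_values=True, show_tier_usage=True)
# A single empty-string entry requests all groups:
# response = gpudb.show_resource_groups(names=[""], options=options)
```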

show_schema(schema_name=None, options={})[source]

Retrieves information about a schema (or all schemas), as specified in input parameter schema_name.

Parameters

schema_name (str) –
Name of the schema for which to retrieve the information. If blank, then info for all schemas is returned.

options (dict of str to str) –
Optional parameters. Allowed keys are:

no_error_if_not_exists – If false, an error is returned if the schema named by input parameter schema_name does not exist. If true, an empty result is returned instead. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

schema_name (str) –
Value of input parameter schema_name.

schema_names (list of str) –
A list of all schema names for which information is returned.

schema_tables (list of lists of str) –
A list of lists, each containing the tables in the respective schema in output parameter schema_names.

additional_info (list of dicts of str to str) –
Additional information about the respective schemas in output parameter schema_names.

info (dict of str to str) –
Additional information.
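
Since schema_names and schema_tables are parallel lists, they can be paired into a single mapping. A minimal sketch; the helper name and the sample schema name are hypothetical.

```python
def tables_by_schema(response):
    """Map each schema name to the list of tables reported for it."""
    return dict(zip(response["schema_names"], response["schema_tables"]))


# response = gpudb.show_schema(schema_name="my_schema")  # hypothetical schema name
# for schema, tables in tables_by_schema(response).items():
#     print(schema, len(tables))
```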

show_security(names=None, options={})[source]

Shows security information relating to users and/or roles. If the caller is not a system administrator, only information relating to the caller and their roles is returned.

Parameters

names (list of str) –
A list of names of users and/or roles about which security information is requested. If none are provided, information about all users and roles will be returned. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

show_current_user – If true, returns only security information for the current user. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

types (dict of str to str) –
Map of user/role name to the type of that user/role. Allowed values are:

internal_user – A user whose credentials are managed by the database system.

external_user – A user whose credentials are managed by an external LDAP.

role – A role.

roles (dict of str to lists of str) –
Map of user/role name to a list of names of roles of which that user/role is a member.

permissions (dict of str to lists of dicts of str to str) –
Map of user/role name to a list of permissions directly granted to that user/role.

resource_groups (dict of str to str) –
Map of user name to resource group name.

info (dict of str to str) –
Additional information.
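
The roles entry can be walked to find every role a user holds indirectly. The sketch below assumes the map lists direct memberships only; whether the server already expands nested roles is not stated here. effective_roles is a hypothetical helper.

```python
def effective_roles(roles, name):
    """Return every role reachable from `name` through role membership."""
    seen = set()
    stack = list(roles.get(name, []))
    while stack:
        role = stack.pop()
        if role not in seen:  # the seen set also guards against membership cycles
            seen.add(role)
            stack.extend(roles.get(role, []))
    return seen


# response = gpudb.show_security(names=["some_user"])  # hypothetical user name
# print(effective_roles(response["roles"], "some_user"))
```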

show_sql_proc(procedure_name='', options={})[source]

Shows information about SQL procedures, including the full definition of each requested procedure.

Parameters

procedure_name (str) –
Name of the procedure for which to retrieve the information. If blank, then information about all procedures is returned. The default value is ‘’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

no_error_if_not_exists – If true, no error will be returned if the requested procedure does not exist. If false, an error will be returned if the requested procedure does not exist. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

procedure_names (list of str) –
A list of the names of the requested procedures.

procedure_definitions (list of str) –
A list of the definitions for the requested procedures.

additional_info (list of dicts of str to str) –
Additional information about the respective procedures in output parameter procedure_names. Allowed keys are:

execute_as – The user that the periodic execution impersonates. The default value is ‘’.

execute_interval – The periodic execution interval in seconds. The default value is ‘’.

execute_start_time – The initial date/time that periodic execution began. The default value is ‘’.

execute_stop_time – Time at which the periodic execution stops. The default value is ‘’.

info (dict of str to str) –
Additional information.

show_statistics(table_names=None, options={})[source]

Retrieves the collected column statistics for the specified table(s).

Parameters

table_names (list of str) –
Names of tables whose metadata will be fetched, each in [schema_name.]table_name format, using standard name resolution rules. All provided tables must exist, or an error is returned. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

no_error_if_not_exists – If true and the tables specified in input parameter table_names do not exist, no error is returned. If false and the tables specified in input parameter table_names do not exist, an error is returned. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_names (list of str) –
Value of input parameter table_names.

stastistics_map (list of lists of dicts of str to str) –
A list of maps containing the column statistics of the tables in input parameter table_names.

info (dict of str to str) –
Additional information.

show_system_properties(options={})[source]

Returns server configuration and version related information to the caller. The admin tool uses it to present server related information to the user.

Parameters

options (dict of str to str) –
Optional parameters. Allowed keys are:

properties – A comma-separated list of the names of the requested properties. If not specified, all properties will be returned.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

property_map (dict of str to str) –
A map of server configuration parameters and version information. Allowed keys are:

conf.enable_worker_http_servers – Boolean value indicating whether the system is configured for multi-head ingestion. Allowed values are:

TRUE – Indicates that the system is configured for multi-head ingestion.

FALSE – Indicates that the system is NOT configured for multi-head ingestion.

conf.worker_http_server_ips – Semicolon (‘;’) separated string of IP addresses of all the ingestion-enabled worker heads of the system.

conf.worker_http_server_ports – Semicolon (‘;’) separated string of the port numbers of all the ingestion-enabled worker ranks of the system.

conf.hm_http_port – The host manager port number (an integer value).

conf.enable_ha – Flag indicating whether high availability (HA) is set up (a boolean value).

conf.ha_ring_head_nodes – A comma-separated string of high availability (HA) ring node URLs. If HA is not set up, then an empty string.

info (dict of str to str) –
Additional information.
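
The multi-head ingestion keys in property_map can be combined into a list of worker endpoints. This sketch assumes the IP and port strings list the workers in the same order with one entry each, which is not guaranteed by the text above. worker_endpoints is a hypothetical helper.

```python
def worker_endpoints(property_map):
    """Pair worker IPs with ports; returns [] when multi-head ingestion is off."""
    enabled = property_map.get("conf.enable_worker_http_servers", "FALSE")
    if enabled.upper() != "TRUE":
        return []
    ips = property_map.get("conf.worker_http_server_ips", "").split(";")
    ports = property_map.get("conf.worker_http_server_ports", "").split(";")
    # Skip empty fields left behind by missing or blank properties.
    return [(ip, int(port)) for ip, port in zip(ips, ports) if ip and port]


# response = gpudb.show_system_properties()
# print(worker_endpoints(response["property_map"]))
```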

show_system_status(options={})[source]

Provides server configuration and health related status to the caller. The admin tool uses it to present server related information to the user.

Parameters

options (dict of str to str) –
Optional parameters, currently unused. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

status_map (dict of str to str) –
A map of server configuration and health related status.

info (dict of str to str) –
Additional information.

show_system_timing(options={})[source]

Returns the last 100 database requests along with the request timing and internal job ID. The admin tool uses it to present request timing information to the user.

Parameters

options (dict of str to str) –
Optional parameters, currently unused. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

endpoints (list of str) –
List of recently called endpoints, most recent first.

time_in_ms (list of floats) –
List of time (in ms) of the recent requests.

jobIds (list of str) –
List of the internal job IDs for the recent requests.

info (dict of str to str) –
Additional information.
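
The three lists in the response are parallel, so they can be zipped into rows and sorted, for example to find the slowest recent requests. A minimal sketch; slowest_requests is a hypothetical helper.

```python
def slowest_requests(response, top=5):
    """Return up to `top` (endpoint, time_in_ms, job_id) rows, slowest first."""
    rows = zip(response["endpoints"], response["time_in_ms"], response["jobIds"])
    return sorted(rows, key=lambda row: row[1], reverse=True)[:top]


# response = gpudb.show_system_timing()
# for endpoint, ms, job_id in slowest_requests(response):
#     print(endpoint, ms, job_id)
```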

show_table(table_name=None, options={})[source]

Retrieves detailed information about a table, view, or schema, specified in input parameter table_name. If the supplied input parameter table_name is a schema the call can return information about either the schema itself or the tables and views it contains. If input parameter table_name is empty, information about all schemas will be returned.

If the option get_sizes is set to true, then the number of records in each table is returned (in output parameter sizes and output parameter full_sizes), along with the total number of objects across all requested tables (in output parameter total_size and output parameter total_full_size).

For a schema, setting the show_children option to false returns only information about the schema itself; setting show_children to true returns a list of tables and views contained in the schema, along with their corresponding detail.

To retrieve a list of every table, view, and schema in the database, set input parameter table_name to ‘*’ and show_children to true. When doing this, the returned output parameter total_size and output parameter total_full_size will not include the sizes of non-base tables (e.g., filters, views, joins, etc.).

Parameters

table_name (str) –
Name of the table for which to retrieve the information, in [schema_name.]table_name format, using standard name resolution rules. If blank, then returns information about all tables and views.

options (dict of str to str) –
Optional parameters. Allowed keys are:

dependencies – Include view dependencies in the output. Allowed values are:

true

false

The default value is ‘false’.

force_synchronous – If true, the table size request will wait for a read lock before returning. Allowed values are:

true

false

The default value is ‘true’.

get_access_data – If true then data about the last read, write, alter and create will be returned. Allowed values are:

true

false

The default value is ‘false’.

get_cached_sizes – If true then the number of records in each table, along with a cumulative count, will be returned; blank, otherwise. This version will return the sizes cached at rank 0, which may be stale if there is a multihead insert occurring. Allowed values are:

true

false

The default value is ‘false’.

get_sizes – If true then the number of records in each table, along with a cumulative count, will be returned; blank, otherwise. Allowed values are:

true

false

The default value is ‘false’.

skip_additional_info – If true then the response will not populate the additional_info field. Allowed values are:

true

false

The default value is ‘false’.

no_error_if_not_exists – If false, an error is returned if the table named by input parameter table_name does not exist. If true, an empty result is returned instead. Allowed values are:

true

false

The default value is ‘false’.

skip_temp_schemas – If true then the table list will not include tables from SYS_TEMP and other system temporary schemas. This is the default behavior for non-admin users. Allowed values are:

true

false

The default value is ‘false’.

show_children – If input parameter table_name is a schema, then true will return information about the tables and views in the schema, and false will return information about the schema itself. If input parameter table_name is a table or view, show_children must be false. If input parameter table_name is empty, then show_children must be true. Allowed values are:

true

false

The default value is ‘true’.

get_column_info – If true then column info (memory usage, etc) will be returned. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_name (str) –
Value of input parameter table_name.

table_names (list of str) –
If input parameter table_name is a table or view, then the single element of the array is input parameter table_name. If input parameter table_name is a schema and show_children is set to true, then this array is populated with the names of all tables and views in the given schema; if show_children is false, then this array will only include the schema name itself. If input parameter table_name is an empty string, then the array contains the names of all tables in the user’s default schema.

table_descriptions (list of lists of str) –
List of descriptions for the respective tables in output parameter table_names. Allowed values are:

COLLECTION

JOIN

LOGICAL_EXTERNAL_TABLE

LOGICAL_VIEW

MATERIALIZED_EXTERNAL_TABLE

MATERIALIZED_VIEW

MATERIALIZED_VIEW_MEMBER

MATERIALIZED_VIEW_UNDER_CONSTRUCTION

REPLICATED

RESULT_TABLE

SCHEMA

VIEW

type_ids (list of str) –
Type IDs of the respective tables in output parameter table_names.

type_schemas (list of str) –
Type schemas of the respective tables in output parameter table_names.

type_labels (list of str) –
Type labels of the respective tables in output parameter table_names.

properties (list of dicts of str to lists of str) –
Property maps of the respective tables in output parameter table_names.

additional_info (list of dicts of str to str) –
Additional information about the respective tables in output parameter table_names. Allowed keys are:

request_avro_type – Method by which this table was created. Allowed values are:

create_table

create_projection

create_union

request_avro_json – The JSON representation of request creating this table. The default value is ‘’.

protected – No longer used. Indicated whether the respective table was protected or not. Allowed values are:

true

false

record_bytes – The number of in-memory bytes per record which is the sum of the byte sizes of all columns with property ‘data’.

total_bytes – The total size in bytes of all data stored in the table.

collection_names – [DEPRECATED–use schema_name instead] This will now contain the name of the schema for the table. There can only be one schema for a table.

schema_name – The name of the schema for the table. There can only be one schema for a table.

table_ttl – The value of the time-to-live setting. Not present for schemas.

remaining_table_ttl – The remaining time-to-live, in minutes, before the respective table expires (-1 if it will never expire). Not present for schemas.

primary_key_type – The primary key type of the table (if it has a primary key). Allowed values are:

memory – In-memory primary key

disk – On-disk primary key

foreign_keys – Semicolon-separated list of foreign keys, of the format ‘source_column references target_table(primary_key_column)’. Not present for schemas. The default value is ‘’.

foreign_shard_key – Foreign shard key description of the format: <fk_foreign_key> references <pk_column_name> from <pk_table_name>(<pk_primary_key>). Not present for schemas. The default value is ‘’.

partition_type – Partitioning scheme used for this table. Allowed values are:

RANGE – Using range partitioning

INTERVAL – Using interval partitioning

LIST – Using manual list partitioning

HASH – Using hash partitioning.

SERIES – Using series partitioning.

NONE – Using no partitioning

The default value is ‘NONE’.

partition_keys – Comma-separated list of partition keys. The default value is ‘’.

partition_definitions – Comma-separated list of partition definitions, whose format depends on the partition_type. See partitioning documentation for details. The default value is ‘’.

is_automatic_partition – True if partitions will be created for LIST VALUES which don’t fall into existing partitions. The default value is ‘’.

attribute_indexes – Semicolon-separated list of indexes. For column (attribute) indexes, only the indexed column name will be listed. For other index types, the index type will be listed with the colon-delimited indexed column(s) and the comma-delimited index option(s) using the form: <index_type>@<column_list>@<column_options>. Not present for schemas. The default value is ‘’.

column_info – JSON-encoded string representing a map of column name to information including memory usage if the get_column_info option is true. The default value is ‘’.

global_access_mode – Returns the global access mode (i.e. lock status) for the table. Allowed values are:

no_access – No read/write operations are allowed on this table.

read_only – Only read operations are allowed on this table.

write_only – Only write operations are allowed on this table.

read_write – All read/write operations are allowed on this table.

view_table_name – For a materialized view, the name of the view this member table is part of; if it is the same as the table_name, this table is the root of the view. The default value is ‘’.

is_view_persisted – True if the view named view_table_name is persisted; reported for each view member. This means the method of recreating this member is saved, not the member’s data. The default value is ‘’.

is_dirty – True if some input table of the materialized view that affects this member table has been modified since the last refresh. The default value is ‘’.

refresh_method – For a materialized view, the current refresh_method: one of manual, periodic, or on_change. The default value is ‘’.

refresh_start_time – For a materialized view with the periodic refresh_method, the initial datetime string at which periodic refreshes began. The default value is ‘’.

refresh_stop_time – Time at which the periodic view refresh stops. The default value is ‘’.

refresh_period – For a materialized view with the periodic refresh_method, the current refresh period in seconds. The default value is ‘’.

last_refresh_time – For a materialized view, the datetime string indicating the last time the view was refreshed. The default value is ‘’.

next_refresh_time – For a materialized view with the periodic refresh_method, a datetime string indicating the next time the view is to be refreshed. The default value is ‘’.

user_chunk_size – User-specified number of records per chunk, if provided at table creation time. The default value is ‘’.

user_chunk_column_max_memory – User-specified target max bytes per column in a chunk, if provided at table creation time. The default value is ‘’.

user_chunk_max_memory – User-specified target max bytes for all columns in a chunk, if provided at table creation time. The default value is ‘’.

owner_resource_group – Name of the owner resource group. The default value is ‘’.

alternate_shard_keys – Semicolon-separated list of shard keys that were equated in joins (applicable for join tables). The default value is ‘’.

datasource_subscriptions – Semicolon-separated list of datasource names the table has subscribed to. The default value is ‘’.

null_modifying_columns – Comma-separated list of null modifying column names. The default value is ‘’.

compression_codec – Default compression codec for the table. The default value is ‘’.

created_by – User that created this table or view. The default value is ‘’.

created_time – Time (UTC) when this table or view was created. The default value is ‘’.

last_read_by – User that last read this table or view. The default value is ‘’.

last_read_time – Time (UTC) when this table or view was last read. The default value is ‘’.

read_count – Count of times this table or view was read. The default value is ‘’.

last_write_by – User that last wrote to this table. The default value is ‘’.

last_write_time – Time (UTC) when this table was last written. The default value is ‘’.

write_count – Count of times this table was written. The default value is ‘’.

last_alter_by – User that last altered this table or view. The default value is ‘’.

last_alter_time – Time (UTC) when this table or view was last altered. The default value is ‘’.

alter_count – Count of times this table or view was altered. The default value is ‘’.

sizes (list of longs) –
If get_sizes is true, an array containing the number of records of each corresponding table in output parameter table_names. Otherwise, an empty array.

full_sizes (list of longs) –
If get_sizes is true, an array containing the number of records of each corresponding table in output parameter table_names (same values as output parameter sizes). Otherwise, an empty array.

join_sizes (list of floats) –
If get_sizes is true, an array containing the number of unfiltered records in the cross product of the sub-tables of each corresponding join-table in output parameter table_names. For simple tables, this number will be the same as output parameter sizes. For join-tables, this value gives the number of joined-table rows that must be processed by any aggregate functions operating on the table. Otherwise, (if get_sizes is false), an empty array.

total_size (long) –
If get_sizes is true, the sum of the elements of output parameter sizes. Otherwise, -1.

total_full_size (long) –
If get_sizes is true, the sum of the elements of output parameter full_sizes (same value as output parameter total_size). Otherwise, -1.

info (dict of str to str) –
Additional information.
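
As a minimal sketch of reading the size fields described above, table_names and sizes can be paired into a mapping. The helper name and the schema name in the commented call are hypothetical.

```python
def table_sizes(response):
    """Map each returned table name to its record count.

    Returns an empty dict when get_sizes was not 'true', because
    sizes is then an empty list.
    """
    return dict(zip(response["table_names"], response["sizes"]))


# Option values are strings, not booleans.
options = {"get_sizes": "true", "show_children": "true"}
# response = gpudb.show_table(table_name="my_schema", options=options)
# sizes = table_sizes(response)
```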

show_table_metadata(table_names=None, options={})[source]

Retrieves the user provided metadata for the specified tables.

Parameters

table_names (list of str) –
Names of tables whose metadata will be fetched, in [schema_name.]table_name format, using standard name resolution rules. All provided tables must exist, or an error is returned. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_names (list of str) –
Value of input parameter table_names.

metadata_maps (list of dicts of str to str) –
A list of maps which contain the metadata of the tables in the order the tables are listed in input parameter table_names. Each map has (metadata attribute name, metadata attribute value) pairs.

info (dict of str to str) –
Additional information.

show_table_monitors(monitor_ids=None, options={})[source]

Show table monitors and their properties. Table monitors are created using GPUdb.create_table_monitor(). Returns detailed information about existing table monitors.

Parameters

monitor_ids (list of str) –
List of monitors to be shown. An empty list or a single entry with an empty string returns all table monitors. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

monitor_ids (list of str) –
List of monitor IDs.

table_names (list of str) –
List of source tables being monitored for the respective output parameter monitor_ids.

events (list of str) –
List of notification events for the respective output parameter monitor_ids.

increasing_columns (list of str) –
List of columns used on the respective tables in output parameter table_names that will increase for new records.

filter_expressions (list of str) –
List of filter expressions used on the respective tables in output parameter table_names to limit records for notifications.

join_table_names (list of str) –
List of join_table_names.

join_column_names (list of str) –
List of join_column_names.

join_expressions (list of str) –
List of join expressions.

refresh_method (list of str) –
List of refresh methods used on the respective tables in output parameter table_names.

refresh_period (list of str) –
List of refresh periods used on the respective tables in output parameter table_names.

refresh_start_time (list of str) –
List of refresh start times used on the respective tables in output parameter table_names.

datasink_names (list of str) –
List of datasink names for the respective output parameter monitor_ids if one is defined.

additional_info (list of dicts of str to str) –
Additional information about the respective monitors in output parameter monitor_ids. Allowed keys are:

monitor_type – Notification type for the respective output parameter monitor_ids and output parameter table_names. The default value is ‘’.

type_schema – Notification type schemas for the respective output parameter monitor_ids and output parameter table_names. The default value is ‘’.

materialized_view_for_change_detector – Materialized view that implements the change detector

materialized_view_for_filter – Materialized views created for the output parameter filter_expressions. The default value is ‘’.

references – Reference count on the respective output parameter monitor_ids. The default value is ‘’.

datasink_json – Datasink info in JSON format for the respective output parameter monitor_ids if one is defined. The default value is ‘’.

info (dict of str to str) –
Additional information.
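
The response is a set of parallel lists indexed by monitor. The sketch below regroups a few of them into one record per monitor ID, assuming every list has one entry per monitor as the descriptions above suggest. monitors_by_id is a hypothetical helper.

```python
def monitors_by_id(response, fields=("table_names", "events", "filter_expressions", "datasink_names")):
    """Regroup the parallel response lists into one dict per monitor ID."""
    result = {}
    for i, monitor_id in enumerate(response["monitor_ids"]):
        result[monitor_id] = {field: response[field][i] for field in fields}
    return result


# An empty list requests all table monitors:
# response = gpudb.show_table_monitors(monitor_ids=[])
# print(monitors_by_id(response))
```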

show_tables_by_type(type_id=None, label=None, options={})[source]

Gets the names of the tables whose type matches the given criteria. Each table has a particular type. This type comprises the schema and properties of the table and sometimes a type label. This function allows a lookup of the existing tables based on full or partial type information. The operation is synchronous.

Parameters

type_id (str) –
Type ID returned by a call to GPUdb.create_type().

label (str) –
Optional user supplied label which can be used instead of the type_id to retrieve all tables with the given label.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_names (list of str) –
List of tables matching the input criteria.

info (dict of str to str) –
Additional information.

show_triggers(trigger_ids=None, options={})[source]

Retrieves information regarding the specified triggers or all existing triggers currently active.

Parameters

trigger_ids (list of str) –
List of IDs of the triggers whose information is to be retrieved. An empty list means information will be retrieved on all active triggers. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

trigger_map (dict of str to dicts of str to str) –
This dictionary contains (key, value) pairs of (trigger ID, information map/dictionary) where the key is a Unicode string representing a Trigger ID. The value is another embedded dictionary containing (key, value) pairs where the keys consist of ‘table_name’, ‘type’ and the parameter names relating to the trigger type, e.g. nai, min, max. The values are unicode strings (numeric values are also converted to strings) representing the value of the respective parameter. If a trigger is associated with multiple tables, then the string value for table_name contains a comma separated list of table names.

info (dict of str to str) –
Additional information.
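
Because table_name holds a comma-separated string when a trigger covers several tables, it is often convenient to split it into a list. A minimal sketch; trigger_tables is a hypothetical helper.

```python
def trigger_tables(trigger_map):
    """Map each trigger ID to the list of tables it is associated with."""
    return {
        trigger_id: [t.strip() for t in info.get("table_name", "").split(",") if t.strip()]
        for trigger_id, info in trigger_map.items()
    }


# An empty list requests all active triggers:
# response = gpudb.show_triggers(trigger_ids=[])
# print(trigger_tables(response["trigger_map"]))
```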

show_types(type_id=None, label=None, options={})[source]

Retrieves information for the specified data type ID or type label. For all data types that match the input criteria, the database returns the type ID, the type schema, the label (if available), and the type’s column properties.

Parameters

type_id (str) –
Type ID returned in response to a call to GPUdb.create_type().

label (str) –
Optional label string that was supplied by the user in a call to GPUdb.create_type().

options (dict of str to str) –
Optional parameters. Allowed keys are:

no_join_types – When set to ‘true’, no join types will be included. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

type_ids (list of str)

type_schemas (list of str)

labels (list of str)

properties (list of dicts of str to lists of str)

info (dict of str to str) –
Additional information.

show_video(paths=None, options={})[source]

Retrieves information about rendered videos.

Parameters

paths (list of str) –
The fully-qualified KiFS paths for the videos to show. If empty, shows all videos. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

creation_times (list of str) –
Creation time for each video as an ISO-8601 datetime.

elapsed_render_time_seconds (list of longs) –
The elapsed time spent rendering each video in seconds.

job_ids (list of longs) –
The job ID of the rendering process, for each video that is still being rendered.

paths (list of str) –
KiFS path to each video.

rendered_bytes (list of longs) –
The number of bytes emitted by the encoder for each video.

rendered_frames (list of longs) –
The number of frames rendered for each video.

rendered_percents (list of longs) –
Percent completion of each video’s rendering process (0-100).

requests (list of str) –
JSON-string reflecting each video’s creation parameters.

status (list of str) –
The status of the last rendered frame for each video. Either OK or Error with a message indicating the nature of the error.

ttls (list of longs) –
The remaining TTL, in minutes, before the respective video expires (-1 if it will never expire).

info (dict of str to str) –
Additional information.
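
As an illustration of reading this response, the sketch below lists the videos that have not finished rendering. It assumes that paths, rendered_percents, and status are index-aligned. job_ids is left out because it is documented only for videos still being rendered and may therefore be shorter. The helper name unfinished_videos and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def unfinished_videos(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return path, percent, and status for videos below 100 percent."""
    rows = zip(response["paths"], response["rendered_percents"], response["status"])
    return [
        {"path": path, "percent": percent, "status": status}
        for path, percent, status in rows
        if percent < 100
    ]


# Hypothetical response containing only the fields used here.
sample_response = {
    "paths": ["/videos/a.mp4", "/videos/b.mp4"],
    "rendered_percents": [100, 40],
    "status": ["OK", "OK"],
}
pending = unfinished_videos(sample_response)
```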

show_wal(table_names=None, options={})[source]

Requests table write-ahead log (WAL) properties. Returns information about the requested table WAL entries.

Parameters

table_names (list of str) –
List of tables to query. An asterisk returns all tables. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

show_settings – If true, include a map of the WAL settings for the requested tables. Allowed values are:

true

false

The default value is ‘true’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

table_names (list of str) –
List of returned tables.

sizes (list of lists of longs) –
List of current WAL usage.

capacities (list of longs) –
List of WAL capacities.

uncommitted (list of lists of longs) –
List of number of uncommitted entries.

settings (list of dicts of str to str) –
List of table WAL settings.

info (dict of str to str) –
Additional information.

solve_graph(graph_name=None, weights_on_edges=[], restrictions=[], solver_type='SHORTEST_PATH', source_nodes=[], destination_nodes=[], solution_table='graph_solutions', options={})[source]

Solves an existing graph for a type of problem (e.g., shortest path, page rank, traveling salesman, etc.) using source nodes, destination nodes, and additional, optional weights and restrictions.

IMPORTANT: It’s highly recommended that you review the Graphs & Solvers concepts documentation, the Graph REST Tutorial, and/or some /solve/graph examples before using this endpoint.

Parameters

graph_name (str) –
Name of the graph resource to solve.

weights_on_edges (list of str) –
Additional weights to apply to the edges of an existing graph. Weights must be specified using identifiers; identifiers are grouped as combinations. Identifiers can be used with existing column names, e.g., ‘table.column AS WEIGHTS_EDGE_ID’, expressions, e.g., ‘ST_LENGTH(wkt) AS WEIGHTS_VALUESPECIFIED’, or constant values, e.g., ‘{4, 15, 2} AS WEIGHTS_VALUESPECIFIED’. Any provided weights will be added (in the case of ‘WEIGHTS_VALUESPECIFIED’) to or multiplied with (in the case of ‘WEIGHTS_FACTORSPECIFIED’) the existing weight(s). If using constant values in an identifier combination, the number of values specified must match across the combination. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

restrictions (list of str) –
Additional restrictions to apply to the nodes/edges of an existing graph. Restrictions must be specified using identifiers; identifiers are grouped as combinations. Identifiers can be used with existing column names, e.g., ‘table.column AS RESTRICTIONS_EDGE_ID’, expressions, e.g., ‘column/2 AS RESTRICTIONS_VALUECOMPARED’, or constant values, e.g., ‘{0, 0, 0, 1} AS RESTRICTIONS_ONOFFCOMPARED’. If using constant values in an identifier combination, the number of values specified must match across the combination. If remove_previous_restrictions option is set to true, any provided restrictions will replace the existing restrictions. Otherwise, any provided restrictions will be added (in the case of ‘RESTRICTIONS_VALUECOMPARED’) to or replaced (in the case of ‘RESTRICTIONS_ONOFFCOMPARED’). The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

solver_type (str) –
The type of solver to use for the graph. Allowed values are:

SHORTEST_PATH – Solves for the optimal (shortest) path based on weights and restrictions from one source to the destination nodes. Also known as the Dijkstra solver.

PAGE_RANK – Solves for the probability of each destination node being visited based on the links of the graph topology. Weights are not required to use this solver.

PROBABILITY_RANK – Solves for the transitional probability (Hidden Markov) for each node based on the weights (probability assigned over given edges).

CENTRALITY – Solves for the degree of a node to depict how many pairs of individuals would have to go through the node to reach one another in the minimum number of hops. Also known as betweenness.

MULTIPLE_ROUTING – Solves for finding the minimum cost cumulative path for a round-trip starting from the given source and visiting each given destination node once then returning to the source. Also known as the traveling salesman problem.

INVERSE_SHORTEST_PATH – Solves for finding the optimal path cost for each destination node to route to the source node. Also known as inverse Dijkstra or the service man routing problem.

BACKHAUL_ROUTING – Solves for optimal routes that connect remote asset nodes to the fixed (backbone) asset nodes.

ALLPATHS – Solves for paths that would give costs between the max and min solution radii. Make sure to limit by the ‘max_solution_targets’ option. Min cost should be >= shortest_path cost.

STATS_ALL – Solves for graph statistics such as graph diameter, longest pairs, vertex valences, topology numbers, average and max cluster sizes, etc.

CLOSENESS – Solves for the centrality closeness score per node as the sum of the inverse shortest path costs to all nodes in the graph.

The default value is ‘SHORTEST_PATH’.

source_nodes (list of str) –
It can be one of the nodal identifiers, e.g., ‘NODE_WKTPOINT’, for source nodes. For BACKHAUL_ROUTING, this list depicts the fixed assets. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

destination_nodes (list of str) –
It can be one of the nodal identifiers, e.g., ‘NODE_WKTPOINT’, for destination (target) nodes. For BACKHAUL_ROUTING, this list depicts the remote assets. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

solution_table (str) –
Name of the table to store the solution, in [schema_name.]table_name format, using standard name resolution rules. The default value is ‘graph_solutions’.

options (dict of str to str) –
Additional parameters. Allowed keys are:

max_solution_radius – For ALLPATHS, SHORTEST_PATH and INVERSE_SHORTEST_PATH solvers only. Sets the maximum solution cost radius, which ignores the input parameter destination_nodes list and instead outputs the nodes within the radius sorted by ascending cost. If set to ‘0.0’, the setting is ignored. The default value is ‘0.0’.

min_solution_radius – For ALLPATHS, SHORTEST_PATH and INVERSE_SHORTEST_PATH solvers only. Applicable only when max_solution_radius is set. Sets the minimum solution cost radius, which ignores the input parameter destination_nodes list and instead outputs the nodes within the radius sorted by ascending cost. If set to ‘0.0’, the setting is ignored. The default value is ‘0.0’.

max_solution_targets – For ALLPATHS, SHORTEST_PATH and INVERSE_SHORTEST_PATH solvers only. Sets the maximum number of solution targets, which ignores the input parameter destination_nodes list and instead outputs no more than n number of nodes sorted by ascending cost where n is equal to the setting value. If set to 0, the setting is ignored. The default value is ‘1000’.

uniform_weights – When specified, assigns the given value to all the edges in the graph. Note that weights provided in input parameter weights_on_edges will override this value.

left_turn_penalty – This will add an additional weight over the edges labeled as ‘left turn’ if the ‘add_turn’ option parameter of the GPUdb.create_graph() was invoked at graph creation. The default value is ‘0.0’.

right_turn_penalty – This will add an additional weight over the edges labeled as ‘right turn’ if the ‘add_turn’ option parameter of the GPUdb.create_graph() was invoked at graph creation. The default value is ‘0.0’.

intersection_penalty – This will add an additional weight over the edges labeled as ‘intersection’ if the ‘add_turn’ option parameter of the GPUdb.create_graph() was invoked at graph creation. The default value is ‘0.0’.

sharp_turn_penalty – This will add an additional weight over the edges labeled as ‘sharp turn’ or ‘u-turn’ if the ‘add_turn’ option parameter of the GPUdb.create_graph() was invoked at graph creation. The default value is ‘0.0’.

num_best_paths – For MULTIPLE_ROUTING solvers only; sets the number of shortest paths computed from each node. This is the heuristic criterion. The default value of zero allows the number to be computed automatically by the solver. The user may want to override this parameter to speed up the solver. The default value is ‘0’.

max_num_combinations – For MULTIPLE_ROUTING solvers only; sets the cap on the combinatorial sequences generated. Overriding the default value of two million with a lesser value can potentially speed up the solver. The default value is ‘2000000’.

output_edge_path – If true then concatenated edge IDs will be added as the EDGE path column of the solution table for each source and target pair in shortest path solves. Allowed values are:

true

false

The default value is ‘false’.

output_wkt_path – If true then concatenated wkt line segments will be added as the Wktroute column of the solution table for each source and target pair in shortest path solves. Allowed values are:

true

false

The default value is ‘true’.

server_id – Indicates which graph server(s) to send the request to. Default is to send to the server, amongst those containing the corresponding graph, that has the most computational bandwidth. For SHORTEST_PATH solver type, the input is split amongst the servers containing the corresponding graph.

convergence_limit – For PAGE_RANK solvers only; maximum percent relative threshold on the page rank scores of each node between consecutive iterations to satisfy convergence. The default value is ‘1.0’ (one percent).

max_iterations – For PAGE_RANK solvers only; maximum number of page rank iterations for satisfying convergence. The default value is ‘100’.

max_runs – For CENTRALITY solvers only; sets the maximum number of shortest path runs; the maximum possible value is the number of nodes in the graph. A value of 0 lets the solver compute this number automatically. The default value is ‘0’.

output_clusters – For STATS_ALL solvers only; the cluster index for each node will be inserted as an additional column in the output. Allowed values are:

true – An additional column ‘CLUSTER’ will be added for each node

false – No extra cluster info per node will be available in the output

The default value is ‘false’.

solve_heuristic – Specify heuristic search criterion only for the geo graphs and shortest path solves towards a single target. Allowed values are:

astar – Employs A-STAR heuristics to speed up the shortest path traversal

none – No heuristics are applied

The default value is ‘none’.

astar_radius – For path solvers only when ‘solve_heuristic’ option is ‘astar’. The shortest path traversal front includes nodes only within this radius (kilometers) as it moves towards the target location. The default value is ‘70’.

The default value is an empty dict ( {} ).
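
Several of the options above are documented for specific solver types only, and all option values are passed as strings. The following sketch is a client-side check built only from those statements. The helper name check_solver_options is hypothetical, and nothing here says how the server treats a mismatched option.

```python
from typing import Dict, List

PATH_SOLVERS = {"ALLPATHS", "SHORTEST_PATH", "INVERSE_SHORTEST_PATH"}

# Option name -> solver types that the option descriptions above tie it to.
SOLVER_SPECIFIC_OPTIONS = {
    "max_solution_radius": PATH_SOLVERS,
    "min_solution_radius": PATH_SOLVERS,
    "max_solution_targets": PATH_SOLVERS,
    "num_best_paths": {"MULTIPLE_ROUTING"},
    "max_num_combinations": {"MULTIPLE_ROUTING"},
    "convergence_limit": {"PAGE_RANK"},
    "max_iterations": {"PAGE_RANK"},
    "max_runs": {"CENTRALITY"},
    "output_clusters": {"STATS_ALL"},
}


def check_solver_options(solver_type: str, options: Dict[str, str]) -> List[str]:
    """Return human-readable problems found in an options dict (empty if none)."""
    problems = []
    for key, value in options.items():
        # The options parameter is a dict of str to str.
        if not isinstance(value, str):
            problems.append(f"{key}: value should be a str, got {type(value).__name__}")
        allowed = SOLVER_SPECIFIC_OPTIONS.get(key)
        if allowed is not None and solver_type not in allowed:
            problems.append(f"{key}: only documented for {', '.join(sorted(allowed))}")
    if "min_solution_radius" in options and "max_solution_radius" not in options:
        problems.append("min_solution_radius: applies only when max_solution_radius is set")
    return problems
```

For example, check_solver_options("SHORTEST_PATH", {"max_runs": "3"}) reports that max_runs is only documented for CENTRALITY.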

Returns

A dict with the following entries–

result (bool) –
Indicates a successful solution on all servers.

result_per_destination_node (list of floats) –
Cost or page rank (based on solver type) for each destination node requested. Only populated if ‘export_solve_results’ option is set to true.

info (dict of str to str) –
Additional information.
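
The weights_on_edges and restrictions parameters accept constant values written as ‘{...} AS IDENTIFIER’, and the number of values must match across an identifier combination. The sketch below formats such strings and checks the lengths. The helper names are hypothetical, only numeric constants are handled, and quoting of string constants is out of scope.

```python
from typing import Dict, List, Sequence


def constant_identifier(values: Sequence[object], identifier: str) -> str:
    """Format constants as '{v1, v2, ...} AS IDENTIFIER'."""
    return "{" + ", ".join(str(value) for value in values) + "} AS " + identifier


def constant_combination(columns: Dict[str, Sequence[object]]) -> List[str]:
    """Build one identifier combination from {identifier: values}, checking lengths."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all constant lists in a combination must have the same length")
    return [constant_identifier(values, name) for name, values in columns.items()]


weights = constant_combination({
    "WEIGHTS_EDGE_ID": [1, 2, 3],
    "WEIGHTS_VALUESPECIFIED": [4, 15, 2],
})
```

The resulting list could then be passed as weights_on_edges to GPUdb.solve_graph() on a connected client. Whether these particular identifiers suit a given graph depends on how the graph was created.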

update_records(table_name=None, expressions=None, new_values_maps=None, records_to_insert=[], records_to_insert_str=[], record_encoding='binary', options={}, record_type=None)[source]

Runs multiple predicate-based updates in a single call. With the list of given expressions, any matching record’s column values will be updated as provided in input parameter new_values_maps. There is also an optional ‘upsert’ capability where if a particular predicate doesn’t match any existing record, then a new record can be inserted.

Note that this operation can only be run on an original table and not on a result view.

This operation can update primary key values. By default only ‘pure primary key’ predicates are allowed when updating primary key values. If the primary key for a table is the column ‘attr1’, then the operation will only accept predicates of the form: “attr1 == ‘foo’” if the attr1 column is being updated. For a composite primary key (e.g. columns ‘attr1’ and ‘attr2’) then this operation will only accept predicates of the form: “(attr1 == ‘foo’) and (attr2 == ‘bar’)”. Meaning, all primary key columns must appear in an equality predicate in the expressions. Furthermore each ‘pure primary key’ predicate must be unique within a given request. These restrictions can be removed by utilizing some available options through input parameter options.

The update_on_existing_pk option specifies the record primary key collision policy for tables with a primary key, while ignore_existing_pk specifies the record primary key collision error-suppression policy when those collisions result in the update being rejected. Both are ignored on tables with no primary key.
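
The ‘pure primary key’ predicate forms described above can be generated mechanically. The following is a minimal sketch that covers string-valued key columns only. The helper name pure_pk_predicate is hypothetical, and values containing a single quote are rejected because quote escaping is not covered here.

```python
from typing import Dict


def pure_pk_predicate(pk_values: Dict[str, str]) -> str:
    """Build an equality predicate covering every primary key column."""
    if not pk_values:
        raise ValueError("at least one primary key column is required")
    clauses = []
    for column, value in pk_values.items():
        if "'" in value:
            # Escaping rules are not described above, so refuse rather than guess.
            raise ValueError("quote escaping is not handled in this sketch")
        clauses.append(f"{column} == '{value}'")
    if len(clauses) == 1:
        return clauses[0]
    return " and ".join(f"({clause})" for clause in clauses)


single = pure_pk_predicate({"attr1": "foo"})
composite = pure_pk_predicate({"attr1": "foo", "attr2": "bar"})
```

Because each such predicate must be unique within a request, a caller may also want to compare len(set(predicates)) against len(predicates) before sending.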

Parameters

table_name (str) –
Name of table to be updated, in [schema_name.]table_name format, using standard name resolution rules. Must be a currently existing table and not a view.

expressions (list of str) –
A list of the actual predicates, one for each update; format should follow the guidelines here. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

new_values_maps (list of dicts of str to optional str) –
List of new values for the matching records. Each element is a map with (key, value) pairs where the keys are the names of the columns whose values are to be updated; the values are the new values. The number of elements in the list should match the length of input parameter expressions. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

records_to_insert (list of bytes) –
An optional list of new binary-avro encoded records to insert, one for each update. If one of input parameter expressions does not yield a matching record to be updated, then the corresponding element from this list will be added to the table. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

records_to_insert_str (list of str) –
An optional list of JSON encoded objects to insert, one for each update, to be added if the particular update did not match any objects. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

record_encoding (str) –
Identifies which of input parameter records_to_insert and input parameter records_to_insert_str should be used. Allowed values are:

binary

json

The default value is ‘binary’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

global_expression – An optional global expression to reduce the search space of the predicates listed in input parameter expressions. The default value is ‘’.

bypass_safety_checks – When set to true, all predicates are available for primary key updates. Keep in mind that it is possible to destroy data in this case, since a single predicate may match multiple objects (potentially all records of a table), and then updating all of those records to have the same primary key will, due to the primary key uniqueness constraints, effectively delete all but one of those updated records. Allowed values are:

true

false

The default value is ‘false’.

update_on_existing_pk – Specifies the record collision policy for updating a table with a primary key. There are two ways that a record collision can occur.

The first is an “update collision”, which happens when the update changes the value of the updated record’s primary key, and that new primary key already exists as the primary key of another record in the table.

The second is an “insert collision”, which occurs when a given filter in input parameter expressions finds no records to update, and the alternate insert record given in input parameter records_to_insert (or input parameter records_to_insert_str) contains a primary key matching that of an existing record in the table.

If update_on_existing_pk is set to true, “update collisions” will result in the existing record collided into being removed and the record updated with values specified in input parameter new_values_maps taking its place; “insert collisions” will result in the collided-into record being updated with the values in input parameter records_to_insert / input parameter records_to_insert_str (if given).

If set to false, the existing collided-into record will remain unchanged, while the update will be rejected and the error handled as determined by ignore_existing_pk. If the specified table does not have a primary key, then this option has no effect. Allowed values are:

true – Overwrite the collided-into record when updating a record’s primary key or inserting an alternate record causes a primary key collision between the record being updated/inserted and another existing record in the table

false – Reject updates which cause primary key collisions between the record being updated/inserted and an existing record in the table

The default value is ‘false’.

ignore_existing_pk – Specifies the record collision error-suppression policy for updating a table with a primary key, only used when primary key record collisions are rejected (update_on_existing_pk is false). If set to true, any record update that is rejected for resulting in a primary key collision with an existing table record will be ignored with no error generated. If false, the rejection of any update for resulting in a primary key collision will cause an error to be reported. If the specified table does not have a primary key or if update_on_existing_pk is true, then this option has no effect. Allowed values are:

true – Ignore updates that result in primary key collisions with existing records

false – Treat as errors any updates that result in primary key collisions with existing records

The default value is ‘false’.

update_partition – Force qualifying records to be deleted and reinserted so their partition membership will be reevaluated. Allowed values are:

true

false

The default value is ‘false’.

truncate_strings – If set to true, any strings which are too long for their charN string fields will be truncated to fit. Allowed values are:

true

false

The default value is ‘false’.

use_expressions_in_new_values_maps – When set to true, all new values in input parameter new_values_maps are considered as expression values. When set to false, all new values in input parameter new_values_maps are considered as constants. NOTE: When true, string constants will need to be quoted to avoid being evaluated as expressions. Allowed values are:

true

false

The default value is ‘false’.

record_id – ID of a single record to be updated (returned in the call to GPUdb.insert_records() or GPUdb.get_records_from_collection()).

The default value is an empty dict ( {} ).

record_type (RecordType) –
A RecordType object using which the binary data will be encoded. If None, then it is assumed that the data is already encoded, and no further encoding will occur. Default is None.
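
The interaction between update_on_existing_pk and ignore_existing_pk can be summarized as a small decision function. This sketch restates the documented policy for a table that has a primary key. It does not contact the server, and the function name and outcome labels are hypothetical.

```python
def collision_outcome(update_on_existing_pk: str = "false",
                      ignore_existing_pk: str = "false") -> str:
    """Describe what happens to an update that causes a primary key collision.

    Option values are the strings 'true' / 'false', as in the options dict.
    """
    if update_on_existing_pk == "true":
        # ignore_existing_pk has no effect in this case.
        return "overwrite"
    if ignore_existing_pk == "true":
        return "reject_silently"
    return "reject_with_error"
```

With the defaults (both ‘false’), the function returns "reject_with_error", matching the documented default behavior.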

Returns

A dict with the following entries–

count_updated (long) –
Total number of records updated.

counts_updated (list of longs) –
Total number of records updated per predicate in input parameter expressions.

count_inserted (long) –
Total number of records inserted (due to expressions not matching any existing records).

counts_inserted (list of longs) –
Total number of records inserted per predicate in input parameter expressions (will be either 0 or 1 for each expression).

info (dict of str to str) –
Additional information.
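
Because expressions and new_values_maps must have the same length, it can help to keep each predicate together with its new values until the call is made. The sketch below does that and then maps counts_updated back to the predicates, assuming the counts are index-aligned with expressions. The helper names, table name, and sample response are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Update = Tuple[str, Dict[str, Optional[str]]]


def build_update_args(table_name: str, updates: List[Update]) -> Dict[str, Any]:
    """Split (expression, new_values) pairs into the two parallel lists."""
    return {
        "table_name": table_name,
        "expressions": [expression for expression, _ in updates],
        "new_values_maps": [dict(values) for _, values in updates],
    }


def counts_by_expression(expressions: List[str], response: Dict[str, Any]) -> Dict[str, int]:
    """Map each predicate to its entry in counts_updated."""
    return dict(zip(expressions, response["counts_updated"]))


args = build_update_args("example.orders", [
    ("id == 1", {"status": "shipped"}),
    ("id == 2", {"status": None}),
])
# Hypothetical response containing only the field used here.
summary = counts_by_expression(args["expressions"], {"counts_updated": [1, 0]})
```

On a connected client, the arguments could be passed as GPUdb.update_records(**args).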

update_records_by_series(table_name=None, world_table_name=None, view_name='', reserved=[], options={})[source]

Updates the view specified by input parameter table_name to include full series (track) information from the input parameter world_table_name for the series (tracks) present in the input parameter view_name.

Parameters

table_name (str) –
Name of the view on which the update operation will be performed, in [schema_name.]view_name format, using standard name resolution rules. Must be an existing view.

world_table_name (str) –
Name of the table containing the complete series (track) information, in [schema_name.]table_name format, using standard name resolution rules.

view_name (str) –
Name of the view containing the series (tracks) which have to be updated, in [schema_name.]view_name format, using standard name resolution rules. The default value is ‘’.

reserved (list of str) –
The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

count (int)

info (dict of str to str) –
Additional information.

upload_files(file_names=None, file_data=None, options={})[source]

Uploads one or more files to KiFS. There are two methods for uploading files: load files in their entirety, or load files in parts. The latter is recommended for files of approximately 60 MB or larger.

To upload files in their entirety, populate input parameter file_names with the names the files should have on KiFS, and their respective byte content in input parameter file_data.

Multiple steps are involved when uploading in multiple parts. Only one file at a time can be uploaded in this manner. A user-provided UUID is utilized to tie all the upload steps together for a given file. To upload a file in multiple parts:

Provide the file name in input parameter file_names, the UUID in the multipart_upload_uuid key in input parameter options, and a multipart_operation value of init.
Upload one or more parts by providing the file name, the part data in input parameter file_data, the UUID, a multipart_operation value of upload_part, and the part number in the multipart_upload_part_number option. The part numbers must start at 1 and increase incrementally. Parts may not be uploaded out of order.
Complete the upload by providing the file name, the UUID, and a multipart_operation value of complete.

Multipart uploads in progress may be canceled by providing the file name, the UUID, and a multipart_operation value of cancel. If a new upload is initialized with a different UUID for an existing upload in progress, the pre-existing upload is automatically canceled in favor of the new upload.

The multipart upload must be completed for the file to be usable in KiFS. Information about multipart uploads in progress is available in GPUdb.show_files().
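
The multipart sequence described above (init, one or more upload_part calls, then complete) can be sketched as a generator of call arguments. The function name multipart_calls is hypothetical. Passing an empty file_data list for the init and complete steps is an assumption, since the text says only which values those steps require.

```python
import uuid
from typing import Any, Dict, Iterator


def multipart_calls(file_name: str, data: bytes, part_size: int) -> Iterator[Dict[str, Any]]:
    """Yield keyword arguments for each step of a multipart upload, in order."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    # One user-provided UUID ties all steps of this upload together.
    base = {"multipart_upload_uuid": str(uuid.uuid4())}

    yield {"file_names": [file_name], "file_data": [],
           "options": {**base, "multipart_operation": "init"}}

    # Part numbers start at 1 and increase by 1; option values are strings.
    for number, start in enumerate(range(0, len(data), part_size), start=1):
        yield {"file_names": [file_name],
               "file_data": [data[start:start + part_size]],
               "options": {**base,
                           "multipart_operation": "upload_part",
                           "multipart_upload_part_number": str(number)}}

    yield {"file_names": [file_name], "file_data": [],
           "options": {**base, "multipart_operation": "complete"}}


calls = list(multipart_calls("/a/b/c/d.txt", b"x" * 25, part_size=10))
```

Each yielded dict could be passed, in order, as GPUdb.upload_files(**call) on a connected client. Because parts may not be uploaded out of order, the calls should be issued sequentially.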

File data may be pre-encoded using base64 encoding. This should be indicated using the file_encoding option, and is recommended when using JSON serialization.

Each file path must reside in a top-level KiFS directory, i.e. one of the directories listed in GPUdb.show_directories(). The user must have write permission on the directory. Nested directories are permitted in file name paths. Directories are delineated with the directory separator of ‘/’. For example, given the file path ‘/a/b/c/d.txt’, ‘a’ must be a KiFS directory.

These characters are allowed in file name paths: letters, numbers, spaces, the path delimiter of ‘/’, and the characters: ‘.’ ‘-’ ‘:’ ‘[’ ‘]’ ‘(’ ‘)’ ‘#’ ‘=’.
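
A client-side pre-check of a file name path might look like the sketch below. It follows the character list above literally, treats ‘letters’ and ‘numbers’ as ASCII (an assumption), and applies the 1024-character limit given for file_names. The function name is hypothetical, and the server remains the authority on what it accepts.

```python
import re

# Letters, digits, space, '/', and the listed punctuation: . - : [ ] ( ) # =
_ALLOWED_PATH = re.compile(r"[A-Za-z0-9 /.\-:\[\]()#=]+")
MAX_PATH_LENGTH = 1024


def is_valid_kifs_path(path: str) -> bool:
    """Return True if the path uses only the listed characters and fits the limit."""
    if not path or len(path) > MAX_PATH_LENGTH:
        return False
    return _ALLOWED_PATH.fullmatch(path) is not None
```

For example, is_valid_kifs_path("/a/b/c/d.txt") returns True, while a path containing ‘?’ is rejected.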

Parameters

file_names (list of str) –
An array of full file name paths to be used for the files uploaded to KiFS. File names may have any number of nested directories in their paths, but the top-level directory must be an existing KiFS directory. Each file must reside in or under a top-level directory. A full file name path cannot be larger than 1024 characters. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

file_data (list of bytes) –
File data for the files being uploaded, for the respective files in input parameter file_names. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. Allowed keys are:

file_encoding – Encoding that has been applied to the uploaded file data. When using JSON serialization it is recommended to utilize base64. The caller is responsible for encoding the data provided in this payload. Allowed values are:

base64 – Specifies that the file data being uploaded has been base64 encoded.

none – The uploaded file data has not been encoded.

The default value is ‘none’.

multipart_operation – Multipart upload operation to perform. Allowed values are:

none – Default, indicates this is not a multipart upload

init – Initialize a multipart file upload

upload_part – Uploads a part of the specified multipart file upload

complete – Complete the specified multipart file upload

cancel – Cancel the specified multipart file upload

The default value is ‘none’.

multipart_upload_uuid – UUID to uniquely identify a multipart upload

multipart_upload_part_number – Incremental part number for each part in a multipart upload. Part numbers start at 1, increment by 1, and must be uploaded sequentially

delete_if_exists – If true, any existing files specified in input parameter file_names will be deleted prior to start of upload. Otherwise the file is replaced once the upload completes. Rollback of the original file is no longer possible if the upload is cancelled, aborted or fails if the file was deleted beforehand. Allowed values are:

true

false

The default value is ‘false’.

The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

info (dict of str to str) –
Additional information.

upload_files_fromurl(file_names=None, urls=None, options={})[source]

Uploads one or more files to KiFS.

Each file path must reside in a top-level KiFS directory, i.e. one of the directories listed in GPUdb.show_directories(). The user must have write permission on the directory. Nested directories are permitted in file name paths. Directories are delineated with the directory separator of ‘/’. For example, given the file path ‘/a/b/c/d.txt’, ‘a’ must be a KiFS directory.

These characters are allowed in file name paths: letters, numbers, spaces, the path delimiter of ‘/’, and the characters: ‘.’ ‘-’ ‘:’ ‘[’ ‘]’ ‘(’ ‘)’ ‘#’ ‘=’.

Parameters

file_names (list of str) –
An array of full file name paths to be used for the files uploaded to KiFS. File names may have any number of nested directories in their paths, but the top-level directory must be an existing KiFS directory. Each file must reside in or under a top-level directory. A full file name path cannot be larger than 1024 characters. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

urls (list of str) –
List of URLs to upload, for each respective file in input parameter file_names. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

A dict with the following entries–

successful_file_names (list of str) –
List of input parameter file_names that were successfully uploaded.

successful_urls (list of str) –
List of input parameter urls that were successfully uploaded.

info (dict of str to str) –
Additional information.

visualize_image_chart(table_name=None, x_column_names=None, y_column_names=None, min_x=None, max_x=None, min_y=None, max_y=None, width=None, height=None, bg_color=None, style_options=None, options={})[source]

Scatter plot is the only plot type currently supported. A non-numeric column can be specified as x or y column and jitters can be added to them to avoid excessive overlapping. All color values must be in the format RRGGBB or AARRGGBB (to specify the alpha value). The image is contained in the output parameter image_data field.

Parameters

table_name (str) –
Name of the table containing the data to be drawn as a chart, in [schema_name.]table_name format, using standard name resolution rules.

x_column_names (list of str) –
Names of the columns containing the data mapped to the x axis of a chart. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

y_column_names (list of str) –
Names of the columns containing the data mapped to the y axis of a chart. The user can provide a single element (which will be automatically promoted to a list internally) or a list.

min_x (float) –
Lower bound for the x column values. For non-numeric x column, each x column item is mapped to an integral value starting from 0.

max_x (float) –
Upper bound for the x column values. For non-numeric x column, each x column item is mapped to an integral value starting from 0.

min_y (float) –
Lower bound for the y column values. For non-numeric y column, each y column item is mapped to an integral value starting from 0.

max_y (float) –
Upper bound for the y column values. For non-numeric y column, each y column item is mapped to an integral value starting from 0.

width (int) –
Width of the generated image in pixels.

height (int) –
Height of the generated image in pixels.

bg_color (str) –
Background color of the generated image.

style_options (dict of str to lists of str) –
Rendering style options for a chart. Allowed keys are:

pointcolor – The color of points in the plot represented as a hexadecimal number. The default value is ‘0000FF’.

pointsize – The size of points in the plot represented as number of pixels. The default value is ‘3’.

pointshape – The shape of points in the plot. Allowed values are:

none

circle

square

diamond

hollowcircle

hollowsquare

hollowdiamond

The default value is ‘square’.

cb_pointcolors – Point color class break information consisting of three entries: class-break attribute, class-break values/ranges, and point color values. This option overrides the pointcolor option if both are provided. Class-break ranges are represented in the form of “min:max”. Class-break values/ranges and point color values are separated by cb_delimiter, e.g. {“price”, “20:30;30:40;40:50”, “0xFF0000;0x00FF00;0x0000FF”}.

cb_pointsizes – Point size class break information consisting of three entries: class-break attribute, class-break values/ranges, and point size values. This option overrides the pointsize option if both are provided. Class-break ranges are represented in the form of “min:max”. Class-break values/ranges and point size values are separated by cb_delimiter, e.g. {“states”, “NY;TX;CA”, “3;5;7”}.

cb_pointshapes – Point shape class break information consisting of three entries: class-break attribute, class-break values/ranges, and point shape names. This option overrides the pointshape option if both are provided. Class-break ranges are represented in the form of “min:max”. Class-break values/ranges and point shape names are separated by cb_delimiter, e.g. {“states”, “NY;TX;CA”, “circle;square;diamond”}.

cb_delimiter – A character or string which separates per-class values in a class-break style option string. The default value is ‘;’.

x_order_by – An expression or aggregate expression by which non-numeric x column values are sorted, e.g. “avg(price) descending”.

y_order_by – An expression or aggregate expression by which non-numeric y column values are sorted, e.g. “avg(price)”, which defaults to “avg(price) ascending”.

scale_type_x – Type of x axis scale. Allowed values are:

none – No scale is applied to the x axis.

log – A base-10 log scale is applied to the x axis.

The default value is ‘none’.

scale_type_y – Type of y axis scale. Allowed values are:

none – No scale is applied to the y axis.

log – A base-10 log scale is applied to the y axis.

The default value is ‘none’.

min_max_scaled – If this option is set to “false”, this endpoint expects the request’s min/max values to be unscaled; they will be scaled according to scale_type_x or scale_type_y in the response. If this option is set to “true”, this endpoint expects the request’s min/max values to be already scaled according to scale_type_x/scale_type_y, and the response’s min/max values will equal the request’s min/max values. The default value is ‘false’.

jitter_x – Amplitude of horizontal jitter applied to non-numeric x column values. The default value is ‘0.0’.

jitter_y – Amplitude of vertical jitter applied to non-numeric y column values. The default value is ‘0.0’.

plot_all – If this option is set to “true”, all non-numeric column values are plotted, ignoring the min_x, max_x, min_y and max_y parameters. The default value is ‘false’.

options (dict of str to str) –
Optional parameters. Allowed keys are:

image_encoding – Encoding to be applied to the output image. When using JSON serialization it is recommended to specify this as base64. Allowed values are:

base64 – Apply base64 encoding to the output image.

none – Do not apply any additional encoding to the output image.

The default value is ‘none’.

The default value is an empty dict ( {} ).
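The class-break options above each take three entries (attribute, values/ranges, style values), with the last two joined by cb_delimiter. A minimal sketch of assembling them follows. The class_break helper is hypothetical and not part of the gpudb package, and wrapping scalar options in one-element lists is an assumption based on the documented type (dict of str to lists of str).

```python
def class_break(attribute, breaks, values, delimiter=";"):
    """Build the three entries of a cb_* style option.

    Hypothetical helper, not part of the gpudb package.
    """
    if len(breaks) != len(values):
        raise ValueError("each class break needs exactly one style value")
    return [
        attribute,
        delimiter.join(breaks),
        delimiter.join(str(v) for v in values),
    ]


# Assumption: scalar options are wrapped in one-element lists because
# style_options is documented as "dict of str to lists of str".
style_options = {
    "pointshape": ["circle"],
    "cb_delimiter": [";"],
    "cb_pointcolors": class_break(
        "price",
        ["20:30", "30:40", "40:50"],
        ["0xFF0000", "0x00FF00", "0x0000FF"],
    ),
    "cb_pointsizes": class_break("states", ["NY", "TX", "CA"], [3, 5, 7]),
}
```

The resulting dict could then be passed as the style_options argument of the chart call. If a different cb_delimiter is used, the same value should be passed to the helper so the joined strings stay consistent.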

Returns

A dict with the following entries–

min_x (float) –
Lower bound for the x column values as provided in input parameter min_x or calculated for non-numeric columns when plot_all option is used.

max_x (float) –
Upper bound for the x column values as provided in input parameter max_x or calculated for non-numeric columns when plot_all option is used.

min_y (float) –
Lower bound for the y column values as provided in input parameter min_y or calculated for non-numeric columns when plot_all option is used.

max_y (float) –
Upper bound for the y column values as provided in input parameter max_y or calculated for non-numeric columns when plot_all option is used.

width (int) –
Width of the image as provided in input parameter width.

height (int) –
Height of the image as provided in input parameter height.

bg_color (str) –
Background color of the image as provided in input parameter bg_color.

image_data (bytes) –
The generated image data.

axes_info (dict of str to lists of str) –
Information returned for drawing labels for the axes associated with non-numeric columns. Allowed keys are:

sorted_x_values – Sorted non-numeric x column value list for drawing x axis label.

location_x – X axis label positions of sorted_x_values in pixel coordinates.

sorted_y_values – Sorted non-numeric y column value list for drawing y axis label.

location_y – Y axis label positions of sorted_y_values in pixel coordinates.

info (dict of str to str) –
Additional information.
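As a rough illustration of consuming the returned dict, the sketch below decodes image_data according to the image_encoding that was requested and pairs each sorted axis value with its label position. Both helpers are hypothetical, the response shown is a hand-made stand-in rather than real server output, and converting the location strings to float is an assumption about their format.

```python
import base64


def decode_image(response, image_encoding="none"):
    """Return raw image bytes from a response dict (hypothetical helper)."""
    data = response["image_data"]
    if image_encoding == "base64":
        return base64.b64decode(data)
    return data


def axis_labels(axes_info, axis):
    """Pair sorted values with label positions for axis 'x' or 'y'.

    Assumption: the location strings parse as numbers.
    """
    values = axes_info.get("sorted_%s_values" % axis, [])
    locations = axes_info.get("location_%s" % axis, [])
    return [(v, float(p)) for v, p in zip(values, locations)]


# Hand-made stand-in for a response; not real server output.
fake_response = {
    "image_data": base64.b64encode(b"raw-image-bytes"),
    "axes_info": {
        "sorted_x_values": ["CA", "NY"],
        "location_x": ["40", "120"],
    },
}

image = decode_image(fake_response, image_encoding="base64")
x_labels = axis_labels(fake_response["axes_info"], "x")
```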

visualize_isochrone(graph_name=None, source_node=None, max_solution_radius=-1.0, weights_on_edges=[], restrictions=[], num_levels=1, generate_image=True, levels_table='', style_options=None, solve_options={}, contour_options={}, options={})[source]

Generate an image containing isolines for travel results using an existing graph. Isolines represent curves of equal cost, with cost typically referring to the time or distance assigned as the weights of the underlying graph. See Graphs & Solvers for more information on graphs.

Parameters

graph_name (str) –
Name of the graph on which the isochrone is to be computed.

source_node (str) –
Starting vertex on the underlying graph from/to which the isochrones are created.

max_solution_radius (float) –
Extent of the search radius around input parameter source_node. Set to ‘-1.0’ for unrestricted search radius. The default value is -1.0.

weights_on_edges (list of str) –
Additional weights to apply to the edges of an existing graph. Weights must be specified using identifiers; identifiers are grouped as combinations. Identifiers can be used with existing column names, e.g., ‘table.column AS WEIGHTS_EDGE_ID’, or expressions, e.g., ‘ST_LENGTH(wkt) AS WEIGHTS_VALUESPECIFIED’. Any provided weights will be added (in the case of ‘WEIGHTS_VALUESPECIFIED’) to or multiplied with (in the case of ‘WEIGHTS_FACTORSPECIFIED’) the existing weight(s). The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

restrictions (list of str) –
Additional restrictions to apply to the nodes/edges of an existing graph. Restrictions must be specified using identifiers; identifiers are grouped as combinations. Identifiers can be used with existing column names, e.g., ‘table.column AS RESTRICTIONS_EDGE_ID’, or expressions, e.g., ‘column/2 AS RESTRICTIONS_VALUECOMPARED’. If remove_previous_restrictions is set to true, any provided restrictions will replace the existing restrictions. If remove_previous_restrictions is set to false, any provided restrictions will be added to (in the case of ‘RESTRICTIONS_VALUECOMPARED’) or will replace (in the case of ‘RESTRICTIONS_ONOFFCOMPARED’) the existing restrictions. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.

num_levels (int) –
Number of equally-separated isochrones to compute. The default value is 1.

generate_image (bool) –
If set to true, generates a PNG image of the isochrones in the response. Allowed values are:

True

False

The default value is True.

levels_table (str) –
Name of the table to output the isochrones to, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. The table will contain levels and their corresponding WKT geometry. If no value is provided, the table is not generated. The default value is ‘’.

style_options (dict of str to str) –
Various style related options of the isochrone image. Allowed keys are:

line_size – The width of the contour lines in pixels. The default value is ‘3’. The minimum allowed value is ‘0’. The maximum allowed value is ‘20’.

color – Color of generated isolines. All color values must be in the format RRGGBB or AARRGGBB (to specify the alpha value). If alpha is specified and flooded contours are enabled, it will be used as the transparency of the latter. The default value is ‘FF696969’.

bg_color – When input parameter generate_image is set to true, background color of the generated image. All color values must be in the format RRGGBB or AARRGGBB (to specify the alpha value). The default value is ‘00000000’.

text_color – When add_labels is set to true, color for the labels. All color values must be in the format RRGGBB or AARRGGBB (to specify the alpha value). The default value is ‘FF000000’.

colormap – Colormap for contours or fill-in regions when applicable. All color values must be in the format RRGGBB or AARRGGBB (to specify the alpha value). Allowed values are:

jet

accent

afmhot

autumn

binary

blues

bone

brbg

brg

bugn

bupu

bwr

cmrmap

cool

coolwarm

copper

cubehelix

dark2

flag

gist_earth

gist_gray

gist_heat

gist_ncar

gist_rainbow

gist_stern

gist_yarg

gnbu

gnuplot2

gnuplot

gray

greens

greys

hot

hsv

inferno

magma

nipy_spectral

ocean

oranges

orrd

paired

pastel1

pastel2

pink

piyg

plasma

prgn

prism

pubu

pubugn

puor

purd

purples

rainbow

rdbu

rdgy

rdpu

rdylbu

rdylgn

reds

seismic

set1

set2

set3

spectral

spring

summer

terrain

viridis

winter

wistia

ylgn

ylgnbu

ylorbr

ylorrd

The default value is ‘jet’.

solve_options (dict of str to str) –
Solver specific parameters. Allowed keys are:

remove_previous_restrictions – Ignore the restrictions applied to the graph during the creation stage and only use the restrictions specified in this request if set to true. Allowed values are:

true

false

The default value is ‘false’.

restriction_threshold_value – Value-based restriction comparison. Any node or edge with a ‘RESTRICTIONS_VALUECOMPARED’ value greater than the restriction_threshold_value will not be included in the solution.

uniform_weights – When specified, assigns the given value to all the edges in the graph. Note that weights provided in input parameter weights_on_edges will override this value.

The default value is an empty dict ( {} ).

contour_options (dict of str to str) –
Contour specific parameters. Allowed keys are:

projection – Spatial Reference System (i.e. EPSG Code). Allowed values are:

3857

102100

900913

EPSG:4326

PLATE_CARREE

EPSG:900913

EPSG:102100

EPSG:3857

WEB_MERCATOR

The default value is ‘PLATE_CARREE’.

width – When input parameter generate_image is set to true, width of the generated image. The default value is ‘512’.

height – When input parameter generate_image is set to true, height of the generated image. If the default value is used, the height is set to the value resulting from multiplying the aspect ratio by the width. The default value is ‘-1’.

search_radius – When interpolating the graph solution to generate the isochrone, neighborhood of influence of sample data (in percent of the image/grid). The default value is ‘20’.

grid_size – When interpolating the graph solution to generate the isochrone, number of subdivisions along the x axis when building the grid (the y is computed using the aspect ratio of the output image). The default value is ‘100’.

color_isolines – Color each isoline according to the colormap; otherwise, use the foreground color. Allowed values are:

true

false

The default value is ‘true’.

add_labels – If set to true, add labels to the isolines. Allowed values are:

true

false

The default value is ‘false’.

labels_font_size – When add_labels is set to true, size of the font (in pixels) to use for labels. The default value is ‘12’.

labels_font_family – When add_labels is set to true, font name to be used when adding labels. The default value is ‘arial’.

labels_search_window – When add_labels is set to true, a search window is used to rate the local quality of each isoline. Smooth, continuous, long stretches with relatively flat angles are favored. The provided value is multiplied by the labels_font_size to calculate the final window size. The default value is ‘4’.

labels_intralevel_separation – When add_labels is set to true, this value determines the distance (in multiples of the labels_font_size) to use when separating labels of different values. The default value is ‘4’.

labels_interlevel_separation – When add_labels is set to true, this value determines the distance (in percent of the total window size) to use when separating labels of the same value. The default value is ‘20’.

labels_max_angle – When add_labels is set to true, maximum angle (in degrees) from the vertical to use when adding labels. The default value is ‘60’.

The default value is an empty dict ( {} ).

options (dict of str to str) –
Additional parameters. Allowed keys are:

solve_table – Name of the table to host intermediate solve results, in [schema_name.]table_name format, using standard name resolution rules and meeting table naming criteria. This table will contain the position and cost for each vertex in the graph. If the default value is used, a temporary table is created and deleted once the solution is calculated. The default value is ‘’.

is_replicated – If set to true, replicate the solve_table. Allowed values are:

true

false

The default value is ‘true’.

data_min_x – Lower bound for the x values. If not provided, it will be computed from the bounds of the input data.

data_max_x – Upper bound for the x values. If not provided, it will be computed from the bounds of the input data.

data_min_y – Lower bound for the y values. If not provided, it will be computed from the bounds of the input data.

data_max_y – Upper bound for the y values. If not provided, it will be computed from the bounds of the input data.

concavity_level – Factor to qualify the concavity of the isochrone curves. The lower the value, the more convex (with ‘0’ being completely convex and ‘1’ being the most concave). The default value is ‘0.5’. The minimum allowed value is ‘0’. The maximum allowed value is ‘1’.

use_priority_queue_solvers – Sets the solver methods explicitly if true. Allowed values are:

true – Uses the solvers scheduled for ‘shortest_path’ and ‘inverse_shortest_path’ based on solve_direction.

false – Uses the solvers ‘priority_queue’ and ‘inverse_priority_queue’ based on solve_direction.

The default value is ‘false’.

solve_direction – Specify whether we are going to the source node, or starting from it. Allowed values are:

from_source – Shortest path to get to the source (inverse Dijkstra)

to_source – Shortest path to source (Dijkstra)

The default value is ‘from_source’.

The default value is an empty dict ( {} ).
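Because the option dicts above map str to str, numeric and boolean settings have to be passed as strings. The following sketch collects the arguments for visualize_isochrone and checks the documented ranges for line_size and concavity_level before anything is sent. The isochrone_kwargs helper, the graph name and the source node value are all hypothetical placeholders.

```python
def isochrone_kwargs(graph_name, source_node, num_levels=1,
                     line_size=3, concavity_level=0.5, add_labels=False):
    """Assemble keyword arguments for visualize_isochrone.

    Hypothetical helper, not part of the gpudb package.
    """
    if not 0 <= line_size <= 20:
        raise ValueError("line_size must be between 0 and 20")
    if not 0 <= concavity_level <= 1:
        raise ValueError("concavity_level must be between 0 and 1")
    return {
        "graph_name": graph_name,
        "source_node": source_node,
        "num_levels": num_levels,
        "generate_image": True,
        # Option dicts are documented as str -> str, so values are strings.
        "style_options": {"line_size": str(line_size), "colormap": "viridis"},
        "contour_options": {
            "projection": "PLATE_CARREE",
            "width": "512",
            "add_labels": "true" if add_labels else "false",
        },
        "options": {
            "concavity_level": str(concavity_level),
            "solve_direction": "from_source",
        },
    }


# Placeholder names; substitute a real graph and source node.
kwargs = isochrone_kwargs("example_graph", "example_source_node",
                          num_levels=4, line_size=5, add_labels=True)

# With an existing client instance (here called db), the call would be:
# response = db.visualize_isochrone(**kwargs)
```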

Returns

A dict with the following entries–

width (int) –
Width of the image as provided in width.

height (int) –
Height of the image as provided in height.

bg_color (long) –
Background color of the image as provided in bg_color.

image_data (bytes) –
Generated contour image data.

info (dict of str to str) –
Additional information.

solve_info (dict of str to str) –
Additional information.

contour_info (dict of str to str) –
Additional information.
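Since generate_image is described as producing a PNG image, one way to handle the returned image_data is to check for the PNG file signature before writing it to disk. This is a sketch under that assumption: save_isochrone_image is a hypothetical helper, the response below is a hand-made stand-in, and image_data is assumed to arrive as raw bytes.

```python
import os
import tempfile

# Standard eight-byte PNG file signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_isochrone_image(response, path):
    """Write image_data to path if it looks like a PNG.

    Hypothetical helper; returns True when the file was written.
    """
    data = response.get("image_data") or b""
    if not data.startswith(PNG_SIGNATURE):
        return False
    with open(path, "wb") as handle:
        handle.write(data)
    return True


# Hand-made stand-in for a response; not real server output.
fake_response = {"image_data": PNG_SIGNATURE + b"placeholder-payload"}

out_path = os.path.join(tempfile.mkdtemp(), "isochrone.png")
written = save_isochrone_image(fake_response, out_path)
```

When generate_image is False, image_data may be empty; in that case the helper returns False and writes nothing.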