Class GPUdb

class gpudb.GPUdb(host=None, options=None, *args, **kwargs)

The main client class, providing the functionality needed to interact with the server.

Usage patterns

  • Secured setup (Default)

    The code below sets up a secured connection. The property ‘skip_ssl_cert_verification’ defaults to ‘False’, so the SSL certificate check is enforced.

    from gpudb import GPUdb

    options = GPUdb.Options()
    options.username = "user"
    options.password = "password"
    options.logging_level = "debug"

    gpudb = GPUdb(host='https://your_server_ip_or_FQDN:8082/gpudb-0', options=options)
    
  • Unsecured setup

    The code below sets up an unsecured connection to the server. The property ‘skip_ssl_cert_verification’ is explicitly set to ‘True’, so all certificate checks are bypassed whether or not SSL is set up on the server.

    from gpudb import GPUdb

    options = GPUdb.Options()
    options.username = "user"
    options.password = "password"
    options.skip_ssl_cert_verification = True  # bypass all certificate checks
    options.logging_level = "debug"

    gpudb = GPUdb(host='https://your_server_ip_or_FQDN:8082/gpudb-0', options=options)
    

    Another way to set up an unsecured connection is shown below. Here the URL is not a secured (HTTPS) one, so no SSL setup comes into play.

    from gpudb import GPUdb

    options = GPUdb.Options()
    options.username = "user"
    options.password = "password"
    options.logging_level = "debug"

    gpudb = GPUdb(host='http://your_server_ip_or_FQDN:9191', options=options)
    

Construct a new GPUdb client instance. This object communicates with the database server at the given address. This class implements HA failover: upon certain error conditions, it will try to establish a connection with one of the other clusters (specified by the user or known to the ring) to continue service. Several properties of the GPUdb.Options class, passed in via options, control this behavior.

Note

Please read the docstring of options for backward-compatibility notes.

Parameters

class HASynchronicityMode(value, names=None, module=None, type=None, start=1)

Inner enumeration class to represent the high-availability synchronicity override mode that is applied to each endpoint call. Available enumerations are:

  • DEFAULT – No override; defer to the HA process for synchronizing endpoints (which has different logic for different endpoints). This is the default mode.

  • NONE – Do not replicate the endpoint calls to the backup cluster.

  • SYNCHRONOUS – Synchronize all endpoint calls

  • ASYNCHRONOUS – Do NOT synchronize any endpoint call

class HAFailoverOrder(value, names=None, module=None, type=None, start=1)

Inner enumeration class to represent the high-availability failover order that is applied to ring resiliency or inter-cluster failover. The order dictates in which pattern backup clusters will be chosen when a failover needs to happen in the client API. Available enumerations are:

  • RANDOM – Randomly choose the backup cluster from the available clusters. This is the default mode.

  • SEQUENTIAL – Choose the cluster sequentially from the list of clusters (the union of the user given clusters and auto-discovered clusters).
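As a rough illustration of the two failover orders, the sketch below arranges a list of backup cluster URLs either sequentially or randomly. It does not use the gpudb package: `FailoverOrder` and `order_backup_clusters` are hypothetical names, and the URLs are placeholders.

```python
import random
from enum import Enum


class FailoverOrder(Enum):
    """Hypothetical stand-in for GPUdb.HAFailoverOrder."""
    RANDOM = 1
    SEQUENTIAL = 2


def order_backup_clusters(clusters, order, rng=None):
    """Return the backup clusters in the order they would be tried."""
    ordered = list(clusters)  # copy so the caller's list is never mutated
    if order is FailoverOrder.RANDOM:
        (rng or random).shuffle(ordered)
    return ordered


# Placeholder cluster URLs, used only for this illustration
clusters = ["http://a:9191", "http://b:9191", "http://c:9191"]
sequential = order_backup_clusters(clusters, FailoverOrder.SEQUENTIAL)
shuffled = order_backup_clusters(clusters, FailoverOrder.RANDOM, random.Random(0))
```

With SEQUENTIAL the list order is preserved; with RANDOM the same clusters are returned in a shuffled order.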

class Options(options=None)

Encapsulates the various options used to create a GPUdb object. The same object can be used on multiple GPUdb client handles and state modifications are chained together:

import gpudb

opts = gpudb.GPUdb.Options.default()
opts.disable_failover = True
db1 = gpudb.GPUdb(host="http://1.2.3.4:9191", options=opts)
opts.primary_url = "http://7.8.9.0:9191"
db2 = gpudb.GPUdb(host="http://1.2.3.4:9191", options=opts)

For backward compatibility, the following options from the 7.0 GPUdb keyword arguments are supported and mapped to the following properties:

  • connection -> protocol

  • no_init_db_contact -> disable_auto_discovery

Create a default set of options for GPUdb object creation.

Parameters

Returns

as_json() -> str

Return the options as a JSON string. Will stringify parameters as needed. For example, GPUdb.URL and GPUdb.HAFailoverOrder objects will be stringified.

property cluster_reconnect_count

Gets the number of times the API tries to reconnect to the same cluster (when a failover event has been triggered), before actually failing over to any available backup cluster. Does not apply when only a single cluster is available.

This method is now deprecated.

property disable_auto_discovery

Gets the property indicating whether to disable automatic discovery of backup clusters or worker rank URLs. If set to true, then the GPUdb object will not connect to the database at initialization time, and will only work with the URLs given.

property disable_failover

Gets whether failover upon failures is to be completely disabled.

property encoding

Gets the encoding used by the client. Supported values are:

  • binary

  • snappy

  • json

property ha_failover_order

Gets the current inter-cluster failover order. This indicates in which order–sequential or random–the backup clusters would be used when an inter-cluster failover event happens. Default is RANDOM.

property host_manager_port

Gets the host manager port number. Some endpoints are supported only at the host manager, rather than the head node of the database.

property hostname_regex

Gets the regex pattern to be used to filter URLs of the servers. If null, then the first URL encountered per rank will be used. Returns a compiled regex object or None if no regex is being used.
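The selection rule described above can be sketched without the gpudb package. In this minimal sketch, `pick_rank_urls` is a hypothetical helper, the URLs are placeholders, and two details are assumptions: the pattern is searched against the whole URL string, and a rank with no matching URL raises an error.

```python
import re


def pick_rank_urls(urls_per_rank, hostname_regex=None):
    """Pick one URL per rank: the first match, or the first URL if no regex is given."""
    chosen = []
    for urls in urls_per_rank:
        if hostname_regex is None:
            chosen.append(urls[0])
            continue
        match = next((u for u in urls if hostname_regex.search(u)), None)
        if match is None:
            # Assumption: a rank without any matching URL is treated as an error
            raise ValueError("no URL matches %r" % hostname_regex.pattern)
        chosen.append(match)
    return chosen


# Each inner list holds the URLs advertised for one rank (placeholders)
urls_per_rank = [
    ["http://10.0.0.1:9191", "http://node1.example.com:9191"],
    ["http://10.0.0.2:9192", "http://node2.example.com:9192"],
]
by_name = pick_rank_urls(urls_per_rank, re.compile(r"example\.com"))
first_seen = pick_rank_urls(urls_per_rank)
```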

property http_headers

Gets the custom HTTP headers that will be used per HTTP endpoint submission by the GPUdb to the server. The header keys and values must be strings. Returns a deep copy.

add_http_header(header, value)

Adds a custom HTTP header to the set of ones which will be used per HTTP endpoint submission by the GPUdb to the server. The header key and value must be strings. Also, the following headers are protected, and cannot be overridden by the user:

  • “Accept”

  • “Authorization”

  • “Content-type”

  • “X-Kinetica-Group”

Parameters

property initial_connection_attempt_timeout

Gets the timeout used when trying to establish a connection to the database at GPUdb initialization. The value is given in milliseconds and the default is 0. 0 indicates no retry will be done; instead, the user given URLs will be stored without further discovery.

If multiple URLs are given by the user, then the API will try all of them once before retrying or giving up. When this timeout is set to a non-zero value and the first attempt fails, the API will wait (sleep) for a certain amount of time and try again. Upon consecutive failures, the sleep amount will be doubled. So, before the first retry (i.e. the second attempt), the API will sleep for one minute. Before the second retry, the API will sleep for two minutes; the next sleep interval would be four minutes, and so on.
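The doubling sleep pattern can be pictured with a small stand-alone sketch. `retry_sleep_schedule` is a hypothetical helper, not part of the API, and the rule for when retries stop (here: once the next sleep would exceed the total timeout) is an assumption, since the text above does not spell it out.

```python
def retry_sleep_schedule(timeout_ms, first_sleep_ms=60_000):
    """List the sleeps (in ms) taken between attempts, doubling each time.

    Assumption: retries stop once the next sleep would push the total
    waiting time past ``timeout_ms``; the real cutoff rule may differ.
    """
    schedule = []
    waited_ms = 0
    sleep_ms = first_sleep_ms  # one minute before the first retry
    while timeout_ms > 0 and waited_ms + sleep_ms <= timeout_ms:
        schedule.append(sleep_ms)
        waited_ms += sleep_ms
        sleep_ms *= 2  # doubled after every consecutive failure
    return schedule


no_retry = retry_sleep_schedule(0)               # timeout 0: no retries at all
seven_minutes = retry_sleep_schedule(7 * 60_000)  # sleeps of 1, 2 and 4 minutes
```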

property server_connection_timeout

Gets the timeout used when trying to establish a connection to the database at GPUdb initialization. The value is given in milliseconds and the default is 0. 0 indicates no retry will be done; instead, the user given URLs will be stored without further discovery.

If multiple URLs are given by the user, then the API will try all of them once before retrying or giving up. When this timeout is set to a non-zero value and the first attempt fails, the API will wait (sleep) for a certain amount of time and try again. Upon consecutive failures, the sleep amount will be doubled. So, before the first retry (i.e. the second attempt), the API will sleep for one minute. Before the second retry, the API will sleep for two minutes; the next sleep interval would be four minutes, and so on.

property intra_cluster_failover_timeout

Gets the timeout used when trying to recover from an intra-cluster failover event. The value is given in seconds. The default is equivalent to 5 minutes.

This method is now deprecated.

property logging_level

Gets the logging level that will be used by the API. By default, logging is set by the root logger (possibly set by the end user application). If the user sets the logging level explicitly via this options class, then the programmatically set level will be used instead.

property password

Gets the password to be used for authentication to GPUdb.

property primary_url

Gets the URL of the primary cluster’s head node in an HA environment.

property primary_host

Gets the hostname of the primary cluster of the HA environment.

Deprecated since version 7.2.3.5: The method will be removed in version 8.0.0.0. Instead of setting the primary host, the primary URL should be set using GPUdb.Options.primary_url at GPUdb initialization.

property protocol

Gets the protocol being used by the client.

property skip_ssl_cert_verification

Gets the value of the property indicating whether to skip verification of the SSL certificate for HTTPS connections.

property timeout

Gets the timeout value, in milliseconds, after which a lack of response from the GPUdb server will result in requests being aborted. A timeout of zero is interpreted as an infinite timeout. Note that this applies independently to various stages of communication, so overall a request may run for longer than this without being aborted.

property username

Gets the username to be used for authentication to GPUdb.

property oauth_token

Gets the OAuth2 token to be used for authentication to GPUdb.

property max_retries

Gets the maximum number of retry attempts for HTTP requests.

property client_name

Gets the client application name to be included in the User-Agent header for HTTP requests.

property client_version

Gets the client application version to be included in the User-Agent header for HTTP requests.

class Version(version_str)

An internal class to handle Kinetica Version (client API or server).

Takes in a string containing a Kinetica version and creates a GPUdb.Version object from it.

Parameters

property first

Read-only property–first component of the version.

property second

Read-only property–second component of the version.

property third

Read-only property–third component of the version.

property fourth

Read-only property–fourth component of the version.

is_version_compatible(other)

Checks whether this version and another one are compatible, taking only the first two components into account. The third and fourth components are ignored since the server and the API ought to work together as long as the first two components match.

TODO: Possibly add another optional parameter for how many components to take into account when checking for compatibility.

Parameters

Returns
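The compatibility rule for is_version_compatible (only the first two components count) can be sketched on plain version strings. `parse_version` and `is_compatible` are hypothetical helpers written for this illustration; they are not the GPUdb.Version implementation.

```python
def parse_version(version_str):
    """Split a dotted version string such as '7.2.3.5' into integer components."""
    return tuple(int(part) for part in version_str.split("."))


def is_compatible(version_a, version_b):
    """Compare only the first two components of each version."""
    return parse_version(version_a)[:2] == parse_version(version_b)[:2]


same_minor = is_compatible("7.2.3.5", "7.2.0.1")       # first two components match
different_minor = is_compatible("7.1.0.0", "7.2.0.0")  # second component differs
```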

class ValidateUrl

An internal class to handle connection URL parsing

static validate_url(url=None)

Takes in a string URL, validates it, adds defaults where necessary, and returns a tuple with the URL components.

Parameters

Returns

class URL(url=None, port=None, protocol=None, accept_full_urls_only=False)

An internal class to handle URLs. Stores the hostname/IP address, port, protocol, path, and the full URL (as a string).

Takes in a string containing a full URL, or another URL object, and creates a URL object from it.

Parameters

property host

Read-only property–hostname or IP address.

property port

Read-only property–port.

property using_default_port

Read-only property–boolean indicating if we’re using a default port, or using the user given port (or the lack thereof).

property protocol

Read-only property–protocol (HTTP or HTTPS).

property using_default_protocol

Read-only property–boolean indicating if we’re using a default protocol, or using the user given protocol.

property path

Read-only property–URL path.

property url

Read-only property–fully qualified URL.

property username

Read-only property–username in URL, if present.

property password

Read-only property–password in URL, if present.
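To make the URL components listed above concrete, the sketch below splits a full URL with the standard library's `urllib.parse.urlsplit`. It is not the GPUdb.URL implementation: `split_url` is a hypothetical helper, the example URLs are placeholders, and `DEFAULT_PORT = 9191` is an assumption made only for this illustration.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 9191  # assumption for this sketch only


def split_url(url):
    """Break a full URL into protocol, host, port, path and credentials."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme.upper(),
        "host": parts.hostname,
        "port": parts.port if parts.port is not None else DEFAULT_PORT,
        "using_default_port": parts.port is None,
        "path": parts.path,
        "username": parts.username,
        "password": parts.password,
    }


info = split_url("https://user:secret@db.example.com:8082/gpudb-0")
bare = split_url("http://db.example.com")  # no port given, falls back to the default
```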

class ClusterAddressInfo(head_rank_url, worker_rank_urls=None, host_names=None, host_manager_url=None, host_manager_port=None, is_primary_cluster=None, server_version=None, logging_level=None)

Inner class to keep track of all relevant information for a given Kinetica cluster. It mostly keeps track of URLs and hostnames, with some additional information like whether the cluster is primary or not.

Creates a ClusterAddressInfo object with the given information.

Parameters

property head_rank_url

Returns the current head node GPUdb.URL for this cluster.

property protocol

Returns the protocol used (‘HTTP’ or ‘HTTPS’). This is derived from the head rank URL. A read-only property.

property worker_rank_urls

Returns the list of the worker rank GPUdb.URL objects for this cluster. May be empty if worker http servers are disabled.

property host_names

Returns the list of hostnames for this cluster.

property host_manager_url

Returns the host manager GPUdb.URL for this cluster.

property is_primary_cluster

Returns whether this cluster is the primary cluster in the ring.

property is_intra_cluster_failover_enabled

Returns whether this cluster has intra-cluster failover enabled.

This method is now deprecated.

property server_version

Returns the version of this cluster, if known; None otherwise.

does_cluster_contain_node(host_name)

Checks if the given hostname (or IP address) is part of this cluster.

Parameters

Returns

END_OF_SET = -9999

(int) Used for indicating that all of the records (till the end of the set) are desired; generally used for /get/records/* functions.
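The role of this sentinel can be sketched with in-memory paging. `page_records` is a hypothetical helper that imitates offset/limit semantics on a Python list; it does not call the server.

```python
END_OF_SET = -9999


def page_records(records, offset=0, limit=END_OF_SET):
    """Return records from offset onward, stopping after limit unless it is END_OF_SET."""
    if limit == END_OF_SET:
        return records[offset:]          # everything till the end of the set
    return records[offset:offset + limit]


records = list(range(10))
tail = page_records(records, offset=7)             # all remaining records
window = page_records(records, offset=2, limit=3)  # a bounded page
```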

get_version_info()

Return the version information for this API.

get_host()

Return the host this client is talking to.

get_primary_host()

Return the primary host for this client.

set_primary_host(new_primary_host, start_using_new_primary_host=False, delete_old_primary_host=False)

Set the primary host for this client. Start using this host per the user’s directions. Also, either delete any existing primary host information, or relegate it to the ranks of a backup host.

Parameters

Deprecated since version 7.1.0.0: As of version 7.1.0.0, this method is no longer functional; it is a no-op that does not change the primary host. The method will be removed in version 7.2.0.0. The only way to set the primary host is via GPUdb.Options at GPUdb initialization; it cannot be changed after that.

get_port()

Return the port the host is listening to.

get_host_manager_port()

Return the port the host manager is listening to.

get_url(stringified=True)

Return the GPUdb.URL or its string representation that points to the current head node of the current cluster in use.

Parameters

Returns

get_hm_url(stringified=True)

Return the GPUdb.URL or its string representation that points to the current host manager of the current cluster in use.

Parameters

Returns

get_failover_urls()

Return a list of the head node URLs for each of the clusters in the HA ring in failover order.

Returns

get_head_node_urls()

Return a list of the head node URLs for each of the clusters in the HA ring for the database server.

Returns

get_num_cluster_switches()

Gets the number of times the client has switched to a different cluster amongst the high availability ring.

property current_cluster_info

Return the GPUdb.ClusterAddressInfo object containing information on the current/active cluster.

property all_cluster_info

Return the list of GPUdb.ClusterAddressInfo objects that contain address of each of the clusters in the ring.

property ha_ring_size

Return the size of the HA ring, i.e. the number of clusters in the ring.

property options

Return the GPUdb.Options object that contains all the knobs the user can turn for controlling this class’s behavior.

property gpudb_full_url

Returns the full URL of the current head rank of the currently active cluster.

property server_version

Returns the GPUdb.Version object representing the version of the currently active cluster of the Kinetica server.

property protocol

Returns the HTTP protocol being used by the GPUdb object to communicate to the database server.

property primary_host

Returns the primary hostname.

property username

Gets the username to be used for authentication to GPUdb.

property password

Gets the password to be used for authentication to GPUdb.

property oauth_token

Gets the OAuth2 token to be used for authentication to GPUdb.

property timeout

Gets the timeout used for http connections to GPUdb.

property disable_auto_discovery

Returns whether auto-discovery has been disabled.

property logging_level

Returns the integer value of the logging level that is being used by the API. By default, logging is set to NOTSET, and the logger will honor the root logger’s level.

property get_known_types

Return all known types; if none, return None.

get_known_type(type_id, lookup_type=True)

Given a type ID, return any associated known type; if none is found, then optionally try to look it up and save it. Otherwise, return None.

Parameters

Returns

get_all_available_full_urls(stringified=True)

Return the list of GPUdb.URL objects, or their string representations, that point to the current head node of each of the clusters in the ring.

Parameters

Returns

add_http_header(header, value)

Adds an HTTP header to the map of additional HTTP headers to send to the server with each request. If the header is already in the map, its value is replaced with the specified value. The user is not allowed to modify the following headers:

  • Accept

  • Authorization

  • X-Kinetica-Group

  • Content-type

remove_http_header(header)

Removes the given HTTP header from the map of additional HTTP headers to send to GPUdb with each request. The user is not allowed to remove the following headers:

  • Accept

  • Authorization

  • X-Kinetica-Group

  • Content-type
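The add/remove rules for custom headers can be sketched as a small stand-alone class. `HeaderStore` is hypothetical and does not reflect the client's internals; in particular, comparing header names case-insensitively and raising `ValueError` for protected headers are assumptions of this sketch.

```python
PROTECTED_HEADERS = {"accept", "authorization", "content-type", "x-kinetica-group"}


class HeaderStore:
    """Hypothetical container mimicking the add/remove/get behaviour described above."""

    def __init__(self):
        self._headers = {}

    def add(self, header, value):
        if not isinstance(header, str) or not isinstance(value, str):
            raise TypeError("header and value must both be strings")
        if header.lower() in PROTECTED_HEADERS:
            # Assumption: protected headers are rejected with an exception
            raise ValueError("header %r is protected" % header)
        self._headers[header] = value  # replaces any earlier value

    def remove(self, header):
        if header.lower() in PROTECTED_HEADERS:
            raise ValueError("header %r is protected" % header)
        self._headers.pop(header, None)

    def get(self):
        return dict(self._headers)  # copy, so callers cannot mutate the store


store = HeaderStore()
store.add("X-Request-Source", "nightly-job")
store.add("X-Request-Source", "ad-hoc")  # same key: the value is replaced
try:
    store.add("Authorization", "Bearer abc")
    rejected = False
except ValueError:
    rejected = True
```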

get_http_headers()

Returns a dict containing all the custom headers used currently by GPUdb. Returns a deep copy so that the user does not accidentally change the headers. Note that the API may use other headers as appropriate; the ones returned here are the custom ones set up by the user.

log_debug(message)

Logging method for debug.

Deprecated since version 7.1.0.0: As of version 7.1.0.0, this method is deprecated, and may be removed in a future version. Previously, this was a static method; now it is an instance method. This method will log messages as intended.

log_warn(message)

Logging method for warnings.

Deprecated since version 7.1.0.0: As of version 7.1.0.0, this method is deprecated, and may be removed in a future version. Previously, this was a static method; now it is an instance method. This method will log messages as intended.

log_info(message)

Logging method for information.

Deprecated since version 7.1.0.0: As of version 7.1.0.0, this method is deprecated, and may be removed in a future version. Previously, this was a static method; now it is an instance method. This method will log messages as intended.

log_error(message)

Logging method for error.

Deprecated since version 7.1.0.0: As of version 7.1.0.0, this method is deprecated, and may be removed in a future version. Previously, this was a static method; now it is an instance method. This method will log messages as intended.

encode_datum(SCHEMA, datum, encoding=None)

Returns an Avro binary or JSON encoded datum dict using its schema.

Parameters

encode_datum_cext(SCHEMA, datum, encoding=None)

Returns an Avro binary or JSON encoded datum dict using its schema.

Parameters

static valid_json(json_string)

Validates a JSON string by trying to parse it into a Python object

static merge_dicts(*dict_args)

Given any number of dictionaries, shallow copy and merge into a new dict, precedence goes to key-value pairs in latter dictionaries.
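The precedence rule for merge_dicts (later dictionaries win) is easy to see in a stand-alone sketch. `merge_dicts_sketch` below is written for illustration and is not claimed to be the library's code; the dictionary contents are made up.

```python
def merge_dicts_sketch(*dict_args):
    """Shallow-merge any number of dicts into a new one; later dicts take precedence."""
    merged = {}
    for d in dict_args:
        merged.update(d)  # keys from later dicts overwrite earlier ones
    return merged


defaults = {"limit": 100, "encoding": "binary"}
overrides = {"limit": 10}
merged = merge_dicts_sketch(defaults, overrides)
```

The inputs are left untouched: `defaults` still holds its original `limit` after the merge.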

logger(ranks, log_levels, options=)

Convenience function to change log levels of some or all GPUdb ranks.

Parameters

Returns

set_server_logger_level(ranks, log_levels, options=)

Convenience function to change log levels of some or all GPUdb ranks.

Parameters

Returns

set_client_logger_level(log_level)

Set the log level for the client GPUdb class.

Parameters

insert(*, table_name: str = None, records: List[Any] | Dict[str, Any] | List[List[Any]] | List[Dict[str, Any]] = None, options=None)

Insert one or more records.

Parameters

Returns

delete(*, table_name=None, expression=None)

Deletes the records matching the provided criterion from the given table. The record selection criterion can be a single input parameter expression (matching multiple records). The operation is synchronous, meaning that a response will not be available until the request is completely processed and all the matching records are deleted.

Parameters

Returns

update(*, table_name=None, expression=None, new_values_map=None)

Runs predicate-based updates in a single call. With the given expression, any matching record’s column values will be updated as provided in input parameter new_values_map.

Note that this operation can only be run on an original table and not on a result view.

Parameters

Returns

insert_records_json(json_records, table_name, json_options=None, create_table_options=None, options=None)

Inserts a single JSON record or an array of JSON records passed in as a string.

If a fail-over event is triggered, this function will handle failing over to a secondary cluster.

Parameters

Returns

Raises

Example

import json

# 'records' is a JSON string holding a single record or an array of records
response = gpudb.insert_records_json(records, "test_insert_records_json", json_options={'validate': True}, create_table_options={'truncate_table': 'true'})
response_object = json.loads(response)
print(response_object['data']['count_inserted'])

get_records_json(table_name, column_names=None, offset=0, limit=-9999, expression=None, orderby_columns=None, having_clause=None)

Retrieves records from a table in the form of a JSON array (stringified). The only mandatory parameter is table_name. The rest are all optional, with suitable defaults wherever applicable.

If a fail-over event is triggered, this function will handle failing over to a secondary cluster.

Parameters

Returns

Raises

Example

import json

resp = gpudb.get_records_json("table_name")
json_object = json.loads(resp)
print(json_object["data"]["records"])

wms(wms_params, url=None)

Submits a WMS call to the server.

Parameters

Returns

ping(url)

Pings the given URL and returns the response. If no response, returns an empty string.

Parameters

Returns

is_kinetica_running(url)

Verifies that GPUdb is running at the given URL (does not do any HA failover).

Parameters

Returns

get_server_debug_information(url)

Gets the database debug information from the given URL and returns the response.

Parameters

Returns

to_df(sql: str, sql_params: list = [], batch_size: int = 5000, sql_opts: dict = , show_progress: bool = False)

Runs the given query and converts the result to a Pandas Data Frame.

Parameters

Raises

Returns

query(sql, batch_size=5000, sql_params=[], sql_opts=)

Execute a SQL query and return a GPUdbSqlIterator

Parameters

Returns

query_one(sql, sql_params=[], sql_opts=)

Execute a SQL query that returns only one row.

Parameters

Returns

execute(sql, sql_params=[], sql_opts=)

Execute a SQL query and return the row count.

Parameters

Returns

static get_connection(enable_ssl_cert_verification=False, enable_auto_discovery=False, enable_failover=False, logging_level='INFO') -> GPUdb

Get a connection to Kinetica getting connection and authentication information from environment variables.

This method is useful particularly for Jupyter notebooks, which won’t need authentication credentials embedded within them. This, in turn, helps to prevent commit of credentials to the notebook version control. In addition, some features including auto-discovery and SSL certificate verification are disabled by default to simplify connections for simple use cases.

The following environment variables are required:

  • KINETICA_URL: the URL of the Kinetica server

  • KINETICA_USER: the username to connect with

  • KINETICA_PASSWD: the password to connect with

Parameters

Returns (GPUdb):

An active connection to Kinetica.
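Before calling get_connection, it can help to check that the three required variables are actually set. The sketch below does that check with the standard library only; `read_connection_env` is a hypothetical helper and the sample values are placeholders.

```python
import os

REQUIRED_VARS = ("KINETICA_URL", "KINETICA_USER", "KINETICA_PASSWD")


def read_connection_env(env=None):
    """Collect the three required variables, failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# A plain dict stands in for os.environ so the sketch runs anywhere
sample_env = {
    "KINETICA_URL": "http://localhost:9191",
    "KINETICA_USER": "user",
    "KINETICA_PASSWD": "password",
}
settings = read_connection_env(sample_env)
```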

load_gpudb_schemas()

Saves all request and response schemas for GPUdb queries in a lookup table (lookup by query name).

load_gpudb_func_to_endpoint_map()

Saves a mapping of rest endpoint function names to endpoints in a dictionary.

admin_add_host(host_address=None, options=)

Adds a host to an existing cluster.

Note

This method should be used for on-premise deployments only.

Parameters

Returns

admin_add_ranks(hosts=None, config_params=None, options=)

Add one or more ranks to an existing Kinetica cluster. The new ranks will not contain any data initially (other than replicated tables) and will not be assigned any shards. To rebalance data and shards across the cluster, use GPUdb.admin_rebalance().

The database must be offline for this operation, see GPUdb.admin_offline()

For example, if attempting to add three new ranks (two ranks on host 172.123.45.67 and one rank on host 172.123.45.68) to a Kinetica cluster with additional configuration parameters:

  • input parameter hosts would be an array including 172.123.45.67 in the first two indices (signifying two ranks being added to host 172.123.45.67) and 172.123.45.68 in the last index (signifying one rank being added to host 172.123.45.68)

  • input parameter config_params would be an array of maps, with each map corresponding to the ranks being added in input parameter hosts. The key of each map would be the configuration parameter name and the value would be the parameter’s value, e.g. "rank.gpu": "1"
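The two arguments from this example can be laid out as plain Python data. Only the host addresses and the "rank.gpu" key come from the text above; the specific GPU values are illustrative assumptions, and no call to the server is made here.

```python
from collections import Counter

# One entry per rank being added: two on the first host, one on the second
hosts = ["172.123.45.67", "172.123.45.67", "172.123.45.68"]

# One map per entry in hosts; the values are illustrative only
config_params = [
    {"rank.gpu": "0"},
    {"rank.gpu": "1"},
    {"rank.gpu": "0"},
]

# The two arrays line up index by index
pairs = list(zip(hosts, config_params))
ranks_per_host = Counter(hosts)
```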

This endpoint’s processing includes copying all replicated table data to the new rank(s) and therefore could take a long time. The API call may time out if run directly. It is recommended to run this endpoint asynchronously via GPUdb.create_job().

Note

This method should be used for on-premise deployments only.

Parameters

Returns

admin_alter_host(host=None, options=)

Alter properties on an existing host in the cluster. Currently, the only property that can be altered is a host’s ability to accept failover processes.

Parameters

Returns

admin_alter_jobs(job_ids=None, action=None, options=)

Perform the requested action on a list of one or more job(s). Based on the type of job and the current state of execution, the action may not be successfully executed. The final result of the attempted actions for each specified job is returned in the status array of the response. See Job Manager for more information.

Parameters

Returns

admin_backup_begin(options=)

Prepares the system for a backup by closing all open file handles after allowing current active jobs to complete. When the database is in backup mode, queries that result in a disk write operation will be blocked until backup mode has been completed by using GPUdb.admin_backup_end().

Parameters

Returns

admin_backup_end(options=)

Restores the system to normal operating mode after a backup has completed, allowing any queries that were blocked to complete.

Parameters

Returns

admin_ha_offline(offline=None, options=)

Pauses consumption of messages from other HA clusters to support data repair/recovery scenarios. In-flight queries may fail to replicate to other clusters in the ring when going offline.

Parameters

Returns

admin_ha_refresh(options=)

Restarts the HA processing on the given cluster as a mechanism of accepting breaking HA conf changes. Additionally, the cluster is put into read-only mode while HA is restarting.

Parameters

Returns

admin_offline(offline=None, options=)

Take the system offline. When the system is offline, no user operations can be performed with the exception of a system shutdown.

Parameters

Returns

admin_rebalance(options=)

Rebalance the data in the cluster so that all nodes contain an approximately equal number of records and/or rebalance the shards to be equally distributed (as much as possible) across all the ranks.

The database must be offline for this operation, see GPUdb.admin_offline()

  • If GPUdb.admin_rebalance() is invoked after a change is made to the cluster, e.g., a host was added or removed, sharded data will be evenly redistributed across the cluster by number of shards per rank while unsharded data will be redistributed across the cluster by data size per rank

  • If GPUdb.admin_rebalance() is invoked at some point when unsharded data (a.k.a. randomly-sharded) in the cluster is unevenly distributed over time, sharded data will not move while unsharded data will be redistributed across the cluster by data size per rank

NOTE: Replicated data will not move as a result of this call

This endpoint’s processing time depends on the amount of data in the system, thus the API call may time out if run directly. It is recommended to run this endpoint asynchronously via GPUdb.create_job().

Parameters

Returns

admin_remove_host(host=None, options=)

Removes a host from an existing cluster. If the host to be removed has any ranks running on it, the ranks must be removed using GPUdb.admin_remove_ranks() or manually switched over to a new host using GPUdb.admin_switchover() prior to host removal. If the host to be removed has the graph server or SQL planner running on it, these must be manually switched over to a new host using GPUdb.admin_switchover().

Note

This method should be used for on-premise deployments only.

Parameters

Returns

admin_remove_ranks(ranks=None, options=)

Remove one or more ranks from an existing Kinetica cluster. All data will be rebalanced to other ranks before the rank(s) is removed unless the rebalance_sharded_data or rebalance_unsharded_data parameters are set to false in the input parameter options, in which case the corresponding sharded data and/or unsharded data (a.k.a. randomly-sharded) will be deleted.

The database must be offline for this operation, see GPUdb.admin_offline()

This endpoint’s processing time depends on the amount of data in the system, thus the API call may time out if run directly. It is recommended to run this endpoint asynchronously via GPUdb.create_job().

Note

This method should be used for on-premise deployments only.

Parameters

Returns

admin_repair_table(table_names=None, table_types=None, options=)

Manually repair a corrupted table. Returns information about affected tables.

Parameters

Returns

admin_send_alert(message=, label=, log_level=None, options=)

Sends a user generated alert to the monitoring system.

Parameters

Returns

admin_show_alerts(num_alerts=None, options=)

Requests a list of the most recent alerts. Returns lists of alert data, including timestamp and type.

Parameters

Returns

admin_show_cluster_operations(history_index=0, options=)

Requests the detailed status of the current operation (by default) or a prior cluster operation specified by input parameter history_index. Returns details on the requested cluster operation.

The response will also indicate how many cluster operations are stored in the history.

Parameters

Returns

admin_show_jobs(options=)

Get a list of the current jobs in GPUdb.

Parameters

Returns

admin_show_shards(options=)

Show the mapping of shards to the corresponding rank and TOM. The response message contains a list of 16384 (the total number of shards in the system) rank and TOM numbers, one pair for each shard.

Parameters

Returns

admin_shutdown(exit_type=None, authorization=None, options=)

Exits the database server application.

Parameters

Returns

admin_switchover(processes=None, destinations=None, options=)

Manually switch over one or more processes to another host. Individual ranks or entire hosts may be moved to another host.

Note

This method should be used for on-premise deployments only.

Parameters

Returns

admin_verify_db(options=)

Verify database is in a consistent state. When inconsistencies or errors are found, the verified_ok flag in the response is set to false and the list of errors found is provided in the error_list.

Parameters

Returns

aggregate_convex_hull(table_name=None, x_column_name=None, y_column_name=None, options=)

Calculates and returns the convex hull for the values in a table specified by input parameter table_name.

Parameters

Returns

aggregate_group_by(table_name=None, column_names=None, offset=0, limit=-9999, encoding=‘binary’, options=)

Calculates unique combinations (groups) of values for the given columns in a given table or view and computes aggregates on each unique combination. This is somewhat analogous to an SQL-style SELECT…GROUP BY.

For aggregation details and examples, see Aggregation. For limitations, see Aggregation Limitations.

Any column(s) can be grouped on, and all column types except unrestricted-length strings may be used for computing applicable aggregates.

The results can be paged via the input parameters offset and limit. For example, to get 10 groups with the largest counts the inputs would be: limit=10, options={"sort_order":"descending", "sort_by":"value"}.

Input parameter options can be used to customize behavior of this call e.g. filtering or sorting the results.

To group by columns 'x' and 'y' and compute the number of objects within each group, use: column_names=['x','y','count(*)'].

To also compute the sum of 'z' over each group, use: column_names=['x','y','count(*)','sum(z)'].

Available aggregation functions are: count(*), sum, min, max, avg, mean, stddev, stddev_pop, stddev_samp, var, var_pop, var_samp, arg_min, arg_max and count_distinct.

Available grouping functions are Rollup, Cube, and Grouping Sets.

This service also provides support for Pivot operations.

Filtering on aggregates is supported via expressions using aggregation functions supplied to having.

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

If a result_table name is specified in the input parameter options, the results are stored in a new table with that name and no results are returned in the response. Both the table name and resulting column names must adhere to standard naming conventions; column/aggregation expressions will need to be aliased. If the source table’s shard key is used as the grouping column(s) and all result records are selected (input parameter offset is 0 and input parameter limit is -9999), the result table will be sharded; in all other cases it will be replicated. Sorting will function properly only if the result table is replicated or if there is only one processing node, and should not be relied upon in other cases. The result_table option is not available when any of the values of input parameter column_names is an unrestricted-length string.
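The grouping and paging inputs described above can be assembled as plain Python values before the call is made. The following is a minimal sketch, not the library's own helper: `group_by_arguments` is a hypothetical function name, and the table name in the commented call is a placeholder.

```python
def group_by_arguments(group_columns, aggregates, limit=10):
    """Build keyword arguments for a group-by call that returns the largest groups first.

    Hypothetical helper for illustration; it only assembles values
    that are described in the text above.
    """
    return {
        # Grouping columns and aggregate expressions share one list.
        "column_names": list(group_columns) + list(aggregates),
        "offset": 0,
        "limit": limit,
        # Sort by aggregate value, largest first.
        "options": {"sort_order": "descending", "sort_by": "value"},
    }


args = group_by_arguments(["x", "y"], ["count(*)", "sum(z)"])
print(args["column_names"])

# With a connected client `db` (an assumption here), the call would look like:
# response = db.aggregate_group_by(table_name="my_table", **args)
```

Keeping the argument construction separate from the call makes it easy to inspect the request before sending it.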

Parameters

Returns

aggregate_group_by_and_decode(table_name=None, column_names=None, offset=0, limit=-9999, encoding=‘binary’, options=, record_type=None, force_primitive_return_types=True, get_column_major=True)

Calculates unique combinations (groups) of values for the given columns in a given table or view and computes aggregates on each unique combination. This is somewhat analogous to an SQL-style SELECT…GROUP BY.

For aggregation details and examples, see Aggregation. For limitations, see Aggregation Limitations.

Any column(s) can be grouped on, and all column types except unrestricted-length strings may be used for computing applicable aggregates.

The results can be paged via the input parameters offset and limit. For example, to get 10 groups with the largest counts the inputs would be: limit=10, options={"sort_order":"descending", "sort_by":"value"}.

Input parameter options can be used to customize behavior of this call e.g. filtering or sorting the results.

To group by columns 'x' and 'y' and compute the number of objects within each group, use: column_names=['x','y','count(*)'].

To also compute the sum of 'z' over each group, use: column_names=['x','y','count(*)','sum(z)'].

Available aggregation functions are: count(*), sum, min, max, avg, mean, stddev, stddev_pop, stddev_samp, var, var_pop, var_samp, arg_min, arg_max and count_distinct.

Available grouping functions are Rollup, Cube, and Grouping Sets.

This service also provides support for Pivot operations.

Filtering on aggregates is supported via expressions using aggregation functions supplied to having.

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

If a result_table name is specified in the input parameter options, the results are stored in a new table with that name and no results are returned in the response. Both the table name and resulting column names must adhere to standard naming conventions; column/aggregation expressions will need to be aliased. If the source table’s shard key is used as the grouping column(s) and all result records are selected (input parameter offset is 0 and input parameter limit is -9999), the result table will be sharded; in all other cases it will be replicated. Sorting will function properly only if the result table is replicated or if there is only one processing node, and should not be relied upon in other cases. The result_table option is not available when any of the values of input parameter column_names is an unrestricted-length string.

Parameters

Returns

aggregate_histogram(table_name=None, column_name=None, start=None, end=None, interval=None, options=)

Performs a histogram calculation given a table, a column, and an interval function. The input parameter interval is used to produce bins of that size and the result, computed over the records falling within each bin, is returned. For each bin, the start value is inclusive, but the end value is exclusive–except for the very last bin for which the end value is also inclusive. The value returned for each bin is the number of records in it, except when a column name is provided as a value_column. In this latter case the sum of the values corresponding to the value_column is used as the result instead. The total number of bins requested cannot exceed 10,000.

NOTE: The Kinetica instance being accessed must be running a CUDA (GPU-based) build to service a request that specifies a value_column.
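The bin boundary rules above (start inclusive, end exclusive, last bin closed on both sides) can be emulated locally to check expectations against a server response. This is a sketch in plain Python, not the server's implementation; it assumes the number of bins is the ceiling of (end - start) / interval and that values outside [start, end] are ignored.

```python
import math


def histogram(keys, start, end, interval, values=None):
    """Count keys per bin, or sum `values` per bin when a value list is given."""
    # Assumption: bin count is the ceiling of the range divided by the interval.
    nbins = math.ceil((end - start) / interval)
    bins = [0] * nbins
    for i, key in enumerate(keys):
        # Assumption: keys outside the requested range are skipped.
        if key < start or key > end:
            continue
        # Clamp so that a key equal to `end` lands in the last bin (inclusive end).
        n = min(int((key - start) // interval), nbins - 1)
        bins[n] += 1 if values is None else values[i]
    return bins


counts = histogram([0, 1, 4.9, 5, 10, 11], start=0, end=10, interval=5)
sums = histogram([0, 1, 4.9, 5, 10, 11], 0, 10, 5, values=[1, 1, 1, 10, 20, 99])
print(counts, sums)
```

The `values` argument plays the role of a value_column: each bin then holds a sum instead of a record count.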

Parameters

Returns

aggregate_k_means(table_name=None, column_names=None, k=None, tolerance=None, options=)

This endpoint runs the k-means algorithm - a heuristic algorithm that attempts to do k-means clustering. An ideal k-means clustering algorithm selects k points such that the sum of the mean squared distances of each member of the set to the nearest of the k points is minimized. The k-means algorithm however does not necessarily produce such an ideal cluster. It begins with a randomly selected set of k points and then refines the location of the points iteratively and settles to a local minimum. Various parameters and options are provided to control the heuristic search.

NOTE: The Kinetica instance being accessed must be running a CUDA (GPU-based) build to service this request.

Parameters

Returns

aggregate_min_max(table_name=None, column_name=None, options=)

Calculates and returns the minimum and maximum values of a particular column in a table.

Parameters

Returns

aggregate_min_max_geometry(table_name=None, column_name=None, options=)

Calculates and returns the minimum and maximum x- and y-coordinates of a particular geospatial geometry column in a table.

Parameters

Returns

aggregate_statistics(table_name=None, column_name=None, stats=None, options=)

Calculates the requested statistics of the given column(s) in a given table.

The available statistics are: count (number of total objects), mean, stdv (standard deviation), variance, skew, kurtosis, sum, min, max, weighted_average, cardinality (unique count), estimated_cardinality, percentile, and percentile_rank.

Estimated cardinality is calculated by using the hyperloglog approximation technique.

Percentiles and percentile ranks are approximate and are calculated using the t-digest algorithm. They must include the desired percentile/percentile_rank. To compute multiple percentiles, each value must be specified separately (e.g. 'percentile(75.0),percentile(99.0),percentile_rank(1234.56),percentile_rank(-5)').

A second, comma-separated value can be added to the percentile statistic to calculate percentile resolution, e.g., a 50th percentile with 200 resolution would be ‘percentile(50,200)’.
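As an illustration of the statistic strings described above, the comma-separated value can be assembled from small pieces. The helper names below are hypothetical and not part of the API; only the resulting string format comes from the text.

```python
def percentile_stat(percentile, resolution=None):
    """Format one percentile statistic, optionally with a resolution."""
    if resolution is None:
        return f"percentile({percentile})"
    return f"percentile({percentile},{resolution})"


def percentile_rank_stat(value):
    """Format one percentile_rank statistic."""
    return f"percentile_rank({value})"


# Each percentile is listed separately, then joined with commas.
stats = ",".join([
    percentile_stat(75.0),
    percentile_stat(99.0),
    percentile_rank_stat(1234.56),
    percentile_rank_stat(-5),
])
print(stats)
```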

The weighted average statistic requires a weight column to be specified in weight_column_name. The weighted average is then defined as the sum of the products of input parameter column_name times the weight_column_name values divided by the sum of the weight_column_name values.

Additional columns can be used in the calculation of statistics via additional_column_names. Values in these columns will be included in the overall aggregate calculation–individual aggregates will not be calculated per additional column. For instance, requesting the count and mean of input parameter column_name x and additional_column_names y and z, where x holds the numbers 1-10, y holds 11-20, and z holds 21-30, would return the total number of x, y, and z values (30), and the single average value across all x, y, and z values (15.5).
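The arithmetic in the two preceding paragraphs can be checked locally. The sketch below reproduces the pooled count and mean from the x, y, z example and applies the weighted-average definition to made-up numbers; it runs on plain lists and does not contact the database.

```python
def weighted_average(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Columns from the example: x holds 1-10, y holds 11-20, z holds 21-30.
x = list(range(1, 11))
y = list(range(11, 21))
z = list(range(21, 31))

# Additional columns are pooled into one overall aggregate.
pooled = x + y + z
count = len(pooled)
mean = sum(pooled) / count
print(count, mean)

# Illustrative numbers only: (1*1 + 2*1 + 3*2) / (1 + 1 + 2)
print(weighted_average([1, 2, 3], [1, 1, 2]))
```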

The response includes a list of key/value pairs of each statistic requested and its corresponding value.

Parameters

Returns

aggregate_statistics_by_range(table_name=None, select_expression=, column_name=None, value_column_name=None, stats=None, start=None, end=None, interval=None, options=)

Divides the given set into bins and calculates statistics of the values of a value-column in each bin. The bins are based on the values of a given binning-column. The statistics that may be requested are mean, stdv (standard deviation), variance, skew, kurtosis, sum, min, max, first, last and weighted average. In addition to the requested statistics the count of total samples in each bin is returned. This counts vector is just the histogram of the column used to divide the set members into bins. The weighted average statistic requires a weight column to be specified in weight_column_name. The weighted average is then defined as the sum of the products of the value column times the weight column divided by the sum of the weight column.

There are two methods for binning the set members. In the first, which can be used for numeric valued binning-columns, a min, max and interval are specified. The number of bins, nbins, is the integer upper bound (ceiling) of (max-min)/interval. Values that fall in the range [min+n*interval,min+(n+1)*interval) are placed in the nth bin, where n ranges from 0 to nbins-2. The final bin is [min+(nbins-1)*interval,max]. In the second method, bin_values specifies a list of binning column values. Records whose binning-column value matches the nth member of the bin_values list are placed in the nth bin. When a list is provided, the binning-column must be of type string or int.
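The first binning method can be sketched as a function that lists the bin ranges. This is a local illustration of the formula in the paragraph above, with hypothetical function and parameter names (`lo` and `hi` stand for min and max); every range is half-open except the last, which includes its upper bound.

```python
import math


def bin_ranges(lo, hi, interval):
    """Return (left, right) bounds for each bin; only the last bin includes `right`."""
    nbins = math.ceil((hi - lo) / interval)
    ranges = []
    for n in range(nbins):
        left = lo + n * interval
        # The final bin is cut off at `hi` rather than at a full interval.
        right = hi if n == nbins - 1 else lo + (n + 1) * interval
        ranges.append((left, right))
    return ranges


print(bin_ranges(0, 10, 4))
```

When the interval does not divide the range evenly, the last bin is narrower than the others, as the (8, 10) range in this example shows.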

NOTE: The Kinetica instance being accessed must be running a CUDA (GPU-based) build to service this request.

Parameters

Returns

aggregate_unique(table_name=None, column_name=None, offset=0, limit=-9999, encoding=‘binary’, options=)

Returns all the unique values from a particular column (specified by input parameter column_name) of a particular table or view (specified by input parameter table_name). If input parameter column_name is a numeric column, the values will be in output parameter binary_encoded_response. Otherwise if input parameter column_name is a string column, the values will be in output parameter json_encoded_response. The results can be paged via the input parameters offset and limit.

For example, options that sort the returned values in descending order with a limit of 10: {"limit":"10","sort_order":"descending"}

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

If a result_table name is specified in the input parameter options, the results are stored in a new table with that name and no results are returned in the response. Both the table name and resulting column name must adhere to standard naming conventions; any column expression will need to be aliased. If the source table’s shard key is used as the input parameter column_name, the result table will be sharded; in all other cases it will be replicated. Sorting will function properly only if the result table is replicated or if there is only one processing node, and should not be relied upon in other cases. The result_table option is not available if the value of input parameter column_name is an unrestricted-length string.

Parameters

Returns

aggregate_unique_and_decode(table_name=None, column_name=None, offset=0, limit=-9999, encoding=‘binary’, options=, record_type=None, force_primitive_return_types=True, get_column_major=True)

Returns all the unique values from a particular column (specified by input parameter column_name) of a particular table or view (specified by input parameter table_name). If input parameter column_name is a numeric column, the values will be in output parameter binary_encoded_response. Otherwise if input parameter column_name is a string column, the values will be in output parameter json_encoded_response. The results can be paged via the input parameters offset and limit.

For example, options that sort the returned values in descending order with a limit of 10: {"limit":"10","sort_order":"descending"}

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

If a result_table name is specified in the input parameter options, the results are stored in a new table with that name and no results are returned in the response. Both the table name and resulting column name must adhere to standard naming conventions; any column expression will need to be aliased. If the source table’s shard key is used as the input parameter column_name, the result table will be sharded; in all other cases it will be replicated. Sorting will function properly only if the result table is replicated or if there is only one processing node, and should not be relied upon in other cases. The result_table option is not available if the value of input parameter column_name is an unrestricted-length string.

Parameters

Returns

aggregate_unpivot(table_name=None, column_names=None, variable_column_name=, value_column_name=, pivoted_columns=None, encoding=‘binary’, options=)

Rotate the column values into row values.

For unpivot details and examples, see Unpivot. For limitations, see Unpivot Limitations.

Unpivot is used to normalize tables that are built for cross tabular reporting purposes. The unpivot operator rotates the column values for all the pivoted columns. A variable column, value column and all columns from the source table except the unpivot columns are projected into the result table. The variable column and value columns in the result table indicate the pivoted column name and values respectively.

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

Parameters

Returns

aggregate_unpivot_and_decode(table_name=None, column_names=None, variable_column_name=, value_column_name=, pivoted_columns=None, encoding=‘binary’, options=, record_type=None, force_primitive_return_types=True, get_column_major=True)

Rotate the column values into row values.

For unpivot details and examples, see Unpivot. For limitations, see Unpivot Limitations.

Unpivot is used to normalize tables that are built for cross tabular reporting purposes. The unpivot operator rotates the column values for all the pivoted columns. A variable column, value column and all columns from the source table except the unpivot columns are projected into the result table. The variable column and value columns in the result table indicate the pivoted column name and values respectively.

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

Parameters

Returns

alter_backup(backup_name=None, action=None, value=None, datasink_name=None, options=)

Alters an existing database backup, accessible via the data sink specified by input parameter datasink_name.

Parameters

Returns

alter_credential(credential_name=None, credential_updates_map=None, options=None)

Alter the properties of an existing credential.

Parameters

Returns

alter_datasink(name=None, datasink_updates_map=None, options=None)

Alters the properties of an existing data sink.

Parameters

Returns

alter_datasource(name=None, datasource_updates_map=None, options=None)

Alters the properties of an existing data source.

Parameters

Returns

alter_directory(directory_name=None, directory_updates_map=None, options=)

Alters an existing directory in KiFS.

Parameters

Returns

alter_environment(environment_name=None, action=None, value=None, options=)

Alters an existing environment which can be referenced by a user-defined function (UDF).

Parameters

Returns

alter_resource_group(name=None, tier_attributes=, ranking=, adjoining_resource_group=, options=)

Alters the properties of an existing resource group to facilitate resource management.

Parameters

Returns

alter_role(name=None, action=None, value=None, options=)

Alters a Role.

Parameters

Returns

alter_schema(schema_name=None, action=None, value=None, options=)

Used to change the name of a SQL-style schema, specified in input parameter schema_name.

Parameters

Returns

alter_system_properties(property_updates_map=None, options=)

The GPUdb.alter_system_properties() endpoint is primarily used to simplify the testing of the system and is not expected to be used during normal execution. Commands are given through the input parameter property_updates_map whose keys are commands and values are strings representing integer values (for example ‘8000’) or boolean values (‘true’ or ‘false’).
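Because the map values must be strings, Python integers and booleans need converting before the call. The following is a minimal sketch with a hypothetical helper and hypothetical property names; the actual command keys are not listed in this section and are not reproduced here.

```python
def to_property_strings(updates):
    """Convert int and bool values to the string forms described above."""
    converted = {}
    for key, value in updates.items():
        # bool is a subclass of int in Python, so it must be checked first.
        if isinstance(value, bool):
            converted[key] = "true" if value else "false"
        elif isinstance(value, int):
            converted[key] = str(value)
        else:
            raise TypeError(f"unsupported value for {key!r}: {value!r}")
    return converted


# Hypothetical keys, for illustration only.
property_updates_map = to_property_strings(
    {"example_numeric_setting": 8000, "example_flag_setting": True}
)
print(property_updates_map)

# With a connected client `db` (an assumption here):
# response = db.alter_system_properties(property_updates_map=property_updates_map)
```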

Parameters

Returns

alter_table(table_name=None, action=None, value=None, options=)

Apply various modifications to a table or view. The available modifications include the following:

Manage a table’s columns–a column can be added, removed, or have its type and properties modified, including whether it is dictionary encoded or not.

External tables cannot be modified except for their refresh method.

Create or delete a column, low-cardinality index, chunk skip, geospatial, CAGRA, or HNSW index. This can speed up certain operations when using expressions containing equality or relational operators on indexed columns. This only applies to tables.

Create or delete a foreign key on a particular column.

Manage a range-partitioned or a manual list-partitioned table’s partitions.

Set (or reset) the tier strategy of a table or view.

Refresh and manage the refresh mode of a materialized view or an external table.

Set the time-to-live (TTL). This can be applied to tables or views.

Set the global access mode (i.e. locking) for a table. This setting trumps any role-based access controls that may be in place; e.g., a user with write access to a table marked read-only will not be able to insert records into it. The mode can be set to read-only, write-only, read/write, and no access.

Parameters

Returns

alter_table_columns(table_name=None, column_alterations=None, options=None)

Apply various modifications to columns in a table or view. The available modifications include the following:

Create or delete an index on a particular column. This can speed up certain operations when using expressions containing equality or relational operators on indexed columns. This only applies to tables.

Manage a table’s columns–a column can be added, removed, or have its type and properties modified, including whether it is dictionary encoded or not.

Parameters

Returns

alter_table_metadata(table_names=None, metadata_map=None, options=)

Updates (adds or changes) metadata for tables. The metadata key and values must both be strings. This is an easy way to annotate whole tables rather than single records within tables. Some examples of metadata are owner of the table, table creation timestamp etc.

Parameters

Returns

alter_table_monitor(topic_id=None, monitor_updates_map=None, options=)

Alters a table monitor previously created with GPUdb.create_table_monitor().

Parameters

Returns

alter_tier(name=None, options=)

Alters properties of an existing tier to facilitate resource management.

To disable watermark-based eviction, set both high_watermark and low_watermark to 100.
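As a small sketch of the watermark setting above, the options map can be built as follows. It assumes option values are passed as strings, and the tier name in the commented call is a placeholder; neither is confirmed by this section.

```python
def watermark_options(high_watermark, low_watermark):
    """Build an options map with both watermarks as strings (assumed value type)."""
    return {
        "high_watermark": str(high_watermark),
        "low_watermark": str(low_watermark),
    }


# Setting both watermarks to 100 disables watermark-based eviction.
disable_eviction = watermark_options(100, 100)
print(disable_eviction)

# With a connected client `db` and a placeholder tier name (both assumptions):
# response = db.alter_tier(name="EXAMPLE_TIER", options=disable_eviction)
```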

Parameters

Returns

alter_user(name=None, action=None, value=None, options=)

Alters a user.

Parameters

Returns

alter_video(path=None, options=)

Alters a video.

Parameters

Returns

alter_wal(table_names=None, options=)

Alters table write-ahead log (WAL) settings. Returns information about the requested table WAL modifications.

Parameters

Returns

append_records(table_name=None, source_table_name=None, field_map=None, options=)

Append (or insert) all records from a source table (specified by input parameter source_table_name) to a particular target table (specified by input parameter table_name). The field map (specified by input parameter field_map) holds the user specified map of target table column names with their mapped source column names.
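The field map pairs each target column with the source column that feeds it. The sketch below builds such a map for the common case where most names match and a few differ; the helper, table names, and column names are hypothetical.

```python
def build_field_map(shared_columns, renamed=None):
    """Map target column names to source column names.

    `shared_columns` have the same name in both tables;
    `renamed` maps a target column name to a differently named source column.
    """
    field_map = {name: name for name in shared_columns}
    field_map.update(renamed or {})
    return field_map


field_map = build_field_map(["id", "salary"], renamed={"full_name": "name"})
print(field_map)

# With a connected client `db` and placeholder table names (assumptions):
# response = db.append_records(
#     table_name="example.target",
#     source_table_name="example.source",
#     field_map=field_map,
#     options={},
# )
```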

Parameters

Returns

check_table(table_names=None, options=)

Scans the requested tables as specified in input parameter table_names for integrity. Any table chunks which fail the check will be marked as corrupt. By default the database will automatically repair corrupt tables (via truncating). Note that since this reads every table column from disk it may be a potentially long-running operation. The option local_only can be used to skip any table files already written to a remote storage. Returns table corruption results.

Parameters

Returns

clear_statistics(table_name=, column_name=, options=)

Clears statistics (cardinality, mean value, etc.) for a column in a specified table.

Parameters

Returns

clear_table(table_name=, authorization=, options=)

Clears (drops) one or all tables in the database cluster. The operation is synchronous meaning that the table will be cleared before the function returns. The response payload returns the status of the operation along with the name of the table that was cleared.

Parameters

Returns

clear_table_monitor(topic_id=None, options=)

Deactivates a table monitor previously created with GPUdb.create_table_monitor().

Parameters

Returns

clear_tables(table_names=[], options=)

Clears (drops) tables in the database cluster. The operation is synchronous meaning that the tables will be cleared before the function returns. The response payload returns the status of the operation for each table requested.

Parameters

Returns

clear_trigger(trigger_id=None, options=)

Clears or cancels the trigger identified by the specified handle. The output returns the handle of the trigger cleared as well as indicating success or failure of the trigger deactivation.

Parameters

Returns

collect_statistics(table_name=None, column_names=None, options=)

Collect statistics for one or more columns in a specified table.

Parameters

Returns

create_backup(backup_name=None, backup_type=None, backup_objects_map=, datasink_name=None, options=)

Creates a database backup, containing a snapshot of existing objects, at the remote file store accessible via the data sink specified by input parameter datasink_name.

Parameters

Returns

create_catalog(name=None, table_format=None, location=None, type=None, credential=None, datasource=None, options=)

Creates a catalog, which contains the location and connection information for a deltalake catalog that is external to the database.

Parameters

Returns

create_credential(credential_name=None, type=None, identity=None, secret=None, options=)

Create a new credential.

Parameters

Returns

create_datasink(name=None, destination=None, options=)

Creates a data sink, which contains the destination information for a data sink that is external to the database.

Parameters

Returns

create_datasource(name=None, location=None, user_name=None, password=None, options=)

Creates a data source, which contains the location and connection information for a data store that is external to the database.

Parameters

Returns

create_directory(directory_name=None, options=)

Creates a new directory in KiFS. The new directory serves as a location in which the user can upload files using GPUdb.upload_files().

Parameters

Returns

create_environment(environment_name=None, options=)

Creates a new environment which can be used by user-defined functions (UDF).

Parameters

Returns

create_graph(graph_name=None, directed_graph=True, nodes=None, edges=None, weights=None, restrictions=None, options=)

Creates a new graph network using given nodes, edges, weights, and restrictions.

IMPORTANT: It’s highly recommended that you review the Graphs and Solvers concepts documentation, the Graph REST Tutorial, and/or some graph examples before using this endpoint.

Parameters

Returns

create_job(endpoint=None, request_encoding=‘binary’, data=None, data_str=None, options=)

Create a job which will run asynchronously. The response returns a job ID, which can be used to query the status and result of the job. The status and the result of the job upon completion can be requested by GPUdb.get_job().
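A typical pattern after creating a job is to poll until it finishes. The sketch below is one way to do that and makes several assumptions: `db` is any object with a `get_job` method accepting a `job_id` keyword, and the caller supplies `is_finished` to interpret the response, because the response fields are not described in this section.

```python
import time


def wait_for_job(db, job_id, is_finished, poll_seconds=1.0, max_polls=60):
    """Poll db.get_job until is_finished(response) is true, then return the response."""
    for _ in range(max_polls):
        response = db.get_job(job_id=job_id)  # keyword name is an assumption
        if is_finished(response):
            return response
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing the completion check in as a function keeps the sketch independent of the exact response layout; a caller would inspect whichever status field the response actually provides.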

Parameters

Returns

create_join_table(join_table_name=None, table_names=None, column_names=None, expressions=[], options=)

Creates a table that is the result of a SQL JOIN.

For join details and examples see: Joins. For limitations, see Join Limitations and Cautions.

Parameters

Returns

create_materialized_view(table_name=None, options=)

Initiates the process of creating a materialized view, reserving the view’s name to prevent other views or tables from being created with that name.

For materialized view details and examples, see Materialized Views.

The response contains output parameter view_id, which is used to tag each subsequent operation (projection, union, aggregation, filter, or join) that will compose the view.

Parameters

Returns

create_proc(proc_name=None, execution_mode=‘distributed’, files=, command=, args=[], options=)

Creates an instance (proc) of the user-defined functions (UDF) specified by the given command, options, and files, and makes it available for execution.

Parameters

Returns

create_projection(table_name=None, projection_name=None, column_names=None, options=)

Creates a new projection of an existing table. A projection represents a subset of the columns (potentially including derived columns) of a table.

For projection details and examples, see Projections. For limitations, see Projection Limitations and Cautions.

Window functions, which can perform operations like moving averages, are available through this endpoint as well as GPUdb.get_records_by_column().

A projection can be created with a different shard key than the source table. By specifying shard_key, the projection will be sharded according to the specified columns, regardless of how the source table is sharded. The source table can even be unsharded or replicated.

If input parameter table_name is empty, selection is performed against a single-row virtual table. This can be useful in executing temporal (NOW()), identity (USER()), or constant-based functions (GEODIST(-77.11, 38.88, -71.06, 42.36)).

Parameters

Returns

create_resource_group(name=None, tier_attributes=, ranking=None, adjoining_resource_group=, options=)

Creates a new resource group to facilitate resource management.

Parameters

Returns

create_role(name=None, options=)

Creates a new role.

Note

This method should be used for on-premise deployments only.

Parameters

Returns

create_schema(schema_name=None, options=)

Creates a SQL-style schema. Schemas are containers for tables and views. Multiple tables and views can be defined with the same name in different schemas.

Parameters

Returns

create_table(table_name=None, type_id=None, options=)

Creates a new table with the given type (definition of columns). The type is specified in input parameter type_id as either a numerical type ID (as returned by GPUdb.create_type()) or as a list of columns, each specified as a list of the column name, data type, and any column attributes.

Example of a type definition with some parameters:

[
    ["id", "int8", "primary_key"],
    ["dept_id", "int8", "primary_key", "shard_key"],
    ["manager_id", "int8", "nullable"],
    ["first_name", "char32"],
    ["last_name", "char64"],
    ["salary", "decimal"],
    ["hire_date", "date"]
]

Each column definition consists of the column name (which should meet the standard column naming criteria), the column’s specific type (int, long, float, double, string, bytes, or any of the properties map values from GPUdb.create_type()), and any data handling, data key, or data replacement properties.

A table may optionally be designated to use a replicated distribution scheme, or be assigned: foreign keys to other tables, a partitioning scheme, and/or a tier strategy.
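The column-list form of the type definition can be written directly as a Python list. The sketch below reuses the example definition above and adds a hypothetical helper that lists which columns carry a given property; the table name in the commented call is a placeholder.

```python
# The example type definition from above, as a Python list of column definitions.
columns = [
    ["id", "int8", "primary_key"],
    ["dept_id", "int8", "primary_key", "shard_key"],
    ["manager_id", "int8", "nullable"],
    ["first_name", "char32"],
    ["last_name", "char64"],
    ["salary", "decimal"],
    ["hire_date", "date"],
]


def columns_with(columns, prop):
    """Return the names of columns whose attributes (after name and type) include `prop`."""
    return [column[0] for column in columns if prop in column[2:]]


print(columns_with(columns, "primary_key"))
print(columns_with(columns, "shard_key"))

# With a connected client `db` (an assumption here), the list is passed as type_id:
# response = db.create_table(table_name="example.employee", type_id=columns, options={})
```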

Parameters

Returns

create_table_external(table_name=None, filepaths=None, modify_columns=, create_table_options=, options=)

Creates a new external table, which is a local database object whose source data is located externally to the database. The source data can be located either in KiFS; on the cluster, accessible to the database; or remotely, accessible via a pre-defined external data source.

The external table can have its structure defined explicitly, via input parameter create_table_options, which contains many of the options from GPUdb.create_table(); or defined implicitly, inferred from the source data.

Parameters

Returns

create_table_monitor(table_name=None, options=)

Creates a monitor that watches for a single table modification event type (insert, update, or delete) on a particular table (identified by input parameter table_name) and forwards event notifications to subscribers via ZMQ. After this call completes, subscribe to the returned output parameter topic_id on the ZMQ table monitor port (default 9002). Each time an operation of the given type on the table completes, a multipart message is published for that topic; the first part contains only the topic ID, and each subsequent part contains one binary-encoded Avro object that corresponds to the event and can be decoded using output parameter type_schema. The monitor will continue to run (regardless of whether or not there are any subscribers) until deactivated with GPUdb.clear_table_monitor().
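The multipart layout described above (topic ID first, then one Avro object per part) can be separated with a few lines of Python. This sketch assumes the parts arrive as a list of `bytes`, for example from pyzmq's `recv_multipart()`, and that the topic ID is UTF-8 text; decoding the Avro payloads with type_schema is left out.

```python
def split_monitor_message(frames):
    """Split a multipart message into (topic_id, list of encoded records)."""
    if not frames:
        raise ValueError("expected at least the topic ID part")
    # Assumption: the topic ID part is UTF-8 encoded text.
    topic_id = frames[0].decode("utf-8")
    # Every remaining part is one binary-encoded Avro object.
    return topic_id, list(frames[1:])


# Made-up bytes standing in for a received message.
topic_id, records = split_monitor_message([b"example-topic", b"\x02\x04", b"\x06"])
print(topic_id, len(records))
```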

For more information on table monitors, see Table Monitors.

Parameters

Returns

create_trigger_by_area(request_id=None, table_names=None, x_column_name=None, x_vector=None, y_column_name=None, y_vector=None, options=)

Sets up an area trigger mechanism for two column_names for one or more tables. (This function is essentially the two-dimensional version of GPUdb.create_trigger_by_range().) Once the trigger has been activated, any record added to the listed table(s) via GPUdb.insert_records() with the chosen columns’ values falling within the specified region will trip the trigger. All such records will be queued at the trigger port (by default ‘9001’ but able to be retrieved via GPUdb.show_system_status()) for any listening client to collect. Active triggers can be cancelled by using the GPUdb.clear_trigger() endpoint or by clearing all relevant tables.

The output returns the trigger handle as well as indicating success or failure of the trigger activation.

Parameters

Returns

create_trigger_by_range(request_id=None, table_names=None, column_name=None, min=None, max=None, options=)

Sets up a simple range trigger for a column_name for one or more tables. Once the trigger has been activated, any record added to the listed table(s) via GPUdb.insert_records() with the chosen column_name’s value falling within the specified range will trip the trigger. All such records will be queued at the trigger port (by default ‘9001’ but able to be retrieved via GPUdb.show_system_status()) for any listening client to collect. Active triggers can be cancelled by using the GPUdb.clear_trigger() endpoint or by clearing all relevant tables.

The output returns the trigger handle as well as indicating success or failure of the trigger activation.

Parameters

Returns

create_type(type_definition=None, label=None, properties=, options=)

Creates a new type describing the columns of a table. The type definition is specified as a list of columns, each specified as a list of the column name, data type, and any column attributes.

Example of a type definition with some parameters:

[
    ["id", "int8", "primary_key"],
    ["dept_id", "int8", "primary_key", "shard_key"],
    ["manager_id", "int8", "nullable"],
    ["first_name", "char32"],
    ["last_name", "char64"],
    ["salary", "decimal"],
    ["hire_date", "date"]
]

Each column definition consists of the column name (which should meet the standard column naming criteria), the column’s specific type (int, long, float, double, string, bytes, or any of the possible values for input parameter properties), and any data handling, data key, or data replacement properties.

Note that some properties are mutually exclusive; that is, they cannot both be specified for the same column. One example of a mutually exclusive pair is primary_key and nullable.
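A type definition can be checked for such conflicts before it is sent. The following is a minimal client-side sketch under stated assumptions: find_conflicts and MUTUALLY_EXCLUSIVE are hypothetical names, and only the primary_key/nullable pair is taken from the text; the server applies its own, fuller validation.

```python
# Pairs of properties that must not appear together on one column.
# Only the primary_key/nullable pair comes from the text; extend as needed.
MUTUALLY_EXCLUSIVE = [("primary_key", "nullable")]


def find_conflicts(type_definition):
    """Return (column_name, property_a, property_b) for each conflicting pair."""
    conflicts = []
    for column in type_definition:
        # Each column is [name, data type, *properties].
        name, properties = column[0], set(column[2:])
        for a, b in MUTUALLY_EXCLUSIVE:
            if a in properties and b in properties:
                conflicts.append((name, a, b))
    return conflicts


type_definition = [
    ["id", "int8", "primary_key"],
    ["manager_id", "int8", "nullable"],
    ["bad_column", "int8", "primary_key", "nullable"],  # deliberately invalid
]
print(find_conflicts(type_definition))  # [('bad_column', 'primary_key', 'nullable')]
```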

A single primary key and/or single shard key can be set across one or more columns. If a primary key is specified, then a uniqueness constraint is enforced, in that only a single object can exist with a given primary key column value (or set of values for the key columns, if using a composite primary key). When inserting data into a table with a primary key, depending on the parameters in the request, incoming objects with primary key values that match existing objects will either overwrite (i.e. update) the existing object or will be skipped and not added into the set.

Parameters

Returns

create_union(table_name=None, table_names=None, input_column_names=None, output_column_names=None, options=)

Merges data from one or more tables with comparable data types into a new table.

The following merges are supported:

UNION (DISTINCT/ALL) - For data set union details and examples, see Union. For limitations, see Union Limitations and Cautions.

INTERSECT (DISTINCT/ALL) - For data set intersection details and examples, see Intersect. For limitations, see Intersect Limitations.

EXCEPT (DISTINCT/ALL) - For data set subtraction details and examples, see Except. For limitations, see Except Limitations.

Parameters

Returns

create_user_external(name=None, options=)

Creates a new external user (a user whose credentials are managed by an external LDAP).

Note

This method should be used for on-premise deployments only.

Parameters

Returns

create_user_internal(name=None, password=None, options=)

Creates a new internal user (a user whose credentials are managed by the database system).

Parameters

Returns

create_video(attribute=None, begin=None, duration_seconds=None, end=None, frames_per_second=None, style=None, path=None, style_parameters=None, options=)

Creates a job to generate a sequence of raster images that visualize data over a specified time.

Parameters

Returns

delete_directory(directory_name=None, options=)

Deletes a directory from KiFS.

Parameters

Returns

delete_files(file_names=None, options=)

Deletes one or more files from KiFS.

Parameters

Returns

delete_graph(graph_name=None, options=)

Deletes an existing graph from the graph server and/or persist.

Parameters

Returns

delete_proc(proc_name=None, options=)

Deletes a proc. Any currently running instances of the proc will be killed.

Parameters

Returns

delete_records(table_name=None, expressions=None, options=)

Deletes record(s) matching the provided criteria from the given table. The record selection criteria can be one or more input parameter expressions (matching multiple records), a single record identified by the record_id option, or all records when using delete_all_records. Note that the three selection criteria are mutually exclusive. This operation cannot be run on a view. The operation is synchronous, meaning that a response will not be available until the request is completely processed and all the matching records are deleted.
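Because the three selection criteria are mutually exclusive, a caller may want to confirm that exactly one is set before issuing the request. The sketch below is a hypothetical client-side check, not part of the gpudb API: check_delete_criteria is an invented name, and treating the string "true" as the enabling value for delete_all_records is an assumption of this sketch.

```python
def check_delete_criteria(expressions=None, options=None):
    """Return which selection criterion a request uses, or raise ValueError.

    Hypothetical client-side check; the server performs its own validation.
    """
    options = options or {}
    chosen = []
    if expressions:
        chosen.append("expressions")
    if "record_id" in options:
        chosen.append("record_id")
    # Assumption: the option is enabled by the string value "true".
    if options.get("delete_all_records") == "true":
        chosen.append("delete_all_records")
    if len(chosen) != 1:
        raise ValueError("expected exactly one selection criterion, got: %s" % chosen)
    return chosen[0]


print(check_delete_criteria(expressions=["x > 10"]))
print(check_delete_criteria(options={"delete_all_records": "true"}))
```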

Parameters

Returns

delete_resource_group(name=None, options=)

Deletes a resource group.

Parameters

Returns

delete_role(name=None, options=)

Deletes an existing role.

Note

This method should be used for on-premise deployments only.

Parameters

Returns

delete_user(name=None, options=)

Deletes an existing user.

Note

This method should be used for on-premise deployments only.

Parameters

Returns

download_files(file_names=None, read_offsets=None, read_lengths=None, options=)

Downloads one or more files from KiFS.

Parameters

Returns

drop_backup(backup_name=None, datasink_name=None, options=)

Deletes one or more existing database backups and contained snapshots, accessible via the data sink specified by input parameter datasink_name.

Parameters

Returns

drop_catalog(name=None, options=)

Drops an existing catalog. Any external tables that depend on the catalog must be dropped before it can be dropped.

Parameters

Returns

drop_credential(credential_name=None, options=)

Drops an existing credential.

Parameters

Returns

drop_datasink(name=None, options=)

Drops an existing data sink.

By default, if any table monitors use this sink as a destination, the request will be blocked unless option clear_table_monitors is true.

Parameters

Returns

drop_datasource(name=None, options=)

Drops an existing data source. Any external tables that depend on the data source must be dropped before it can be dropped.

Parameters

Returns

drop_environment(environment_name=None, options=)

Drops an existing user-defined function (UDF) environment.

Parameters

Returns

drop_schema(schema_name=None, options=)

Drops an existing SQL-style schema, specified in input parameter schema_name.

Parameters

Returns

execute_proc(proc_name=None, params=, bin_params=, input_table_names=[], input_column_names=, output_table_names=[], options=)

Executes a proc. This endpoint is asynchronous and does not wait for the proc to complete before returning.

If the proc being executed is distributed, input parameter input_table_names and input parameter input_column_names may be passed to the proc to use for reading data, and input parameter output_table_names may be passed to the proc to use for writing data.

If the proc being executed is non-distributed, these table parameters will be ignored.

Parameters

Returns

execute_sql(statement=None, offset=0, limit=-9999, encoding='binary', request_schema_str=, data=[], options=)

Executes a SQL statement (query, DML, or DDL).

See SQL Support for the complete set of supported SQL commands.

When a caller wants all the results from a large query (e.g., more than max_get_records_size records), they can make multiple calls to this endpoint, using input parameter offset and input parameter limit to page through the results. Normally, this will execute the input parameter statement query each time. To avoid re-executing the query each time and to keep the results in the same order, the caller should specify a paging_table name to hold the results of the query between calls and specify the same paging_table on subsequent calls. When this is done, the caller should clear the paging table and any other tables in the result_table_list (both returned in the response) once they are done paging through the results. Output parameter paging_table (and result_table_list) will be empty if no paging table was created (e.g., when all the query results were returned in the first call).
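The offset/limit paging pattern can be sketched as a simple loop. This is an illustration only: fetch_page is a hypothetical callable standing in for one call to the endpoint, the paging_table handling is not modeled, and using a short page as the end-of-data signal is an assumption of this sketch rather than documented behavior.

```python
def fetch_all(fetch_page, page_size):
    """Collect every record by calling fetch_page(offset, limit) repeatedly.

    `fetch_page` is a hypothetical callable standing in for one call to the
    endpoint with the given offset and limit; it returns a list of records.
    """
    records = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        records.extend(page)
        if len(page) < page_size:  # a short page is taken to mean the end
            break
        offset += page_size
    return records


# Stand-in data source so the sketch runs without a server.
rows = list(range(23))


def fake_fetch(offset, limit):
    return rows[offset:offset + limit]


print(len(fetch_all(fake_fetch, 10)))  # 23
```

When the total is an exact multiple of the page size, the loop makes one extra call that returns an empty page before stopping.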

Parameters

Returns

execute_sql_and_decode(statement=None, offset=0, limit=-9999, encoding='binary', request_schema_str=, data=[], options=, record_type=None, force_primitive_return_types=True, get_column_major=True)

Executes a SQL statement (query, DML, or DDL).

See SQL Support for the complete set of supported SQL commands.

When a caller wants all the results from a large query (e.g., more than max_get_records_size records), they can make multiple calls to this endpoint, using input parameter offset and input parameter limit to page through the results. Normally, this will execute the input parameter statement query each time. To avoid re-executing the query each time and to keep the results in the same order, the caller should specify a paging_table name to hold the results of the query between calls and specify the same paging_table on subsequent calls. When this is done, the caller should clear the paging table and any other tables in the result_table_list (both returned in the response) once they are done paging through the results. Output parameter paging_table (and result_table_list) will be empty if no paging table was created (e.g., when all the query results were returned in the first call).

Parameters

Returns

export_query_metrics(options=)

Exports query metrics to a given destination. Returns query metrics.

Parameters

Returns

export_records_to_files(table_name=None, filepath=None, options=)

Exports records from a table to files. Any table can be exported, in full or in part (see columns_to_export and columns_to_skip). Additional filtering can be applied through SQL, by using export table with an expression. The default destination is KiFS, though other storage types (Azure, S3, GCS, and HDFS) are supported through datasink_name; see GPUdb.create_datasink().

The server’s local file system is not supported. The default file format is delimited text. See options for the available file types and the options specific to each. The table is saved to a single file if it is within the maximum file size limit (which may vary depending on the data sink type). Otherwise, the table is split into multiple files, each of which may be smaller than the maximum size limit.

All filenames created are returned in the response.

Parameters

Returns

export_records_to_table(table_name=None, remote_query=, options=)

Exports records from source table to the specified target table in an external database.

Parameters

Returns

filter(table_name=None, view_name=, expression=None, options=)

Filters data based on the specified expression. The results are stored in a result set with the given input parameter view_name.

For details see Expressions.

The response message contains the number of points for which the expression evaluated to true, which is equivalent to the size of the result view.

Parameters

Returns

filter_by_area(table_name=None, view_name=, x_column_name=None, x_vector=None, y_column_name=None, y_vector=None, options=)

Calculates which objects from a table are within a named area of interest (NAI/polygon). The operation is synchronous, meaning that a response will not be returned until all the matching objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input NAI restriction specification is created with the name input parameter view_name passed in as part of the input.

Parameters

Returns

filter_by_area_geometry(table_name=None, view_name=, column_name=None, x_vector=None, y_vector=None, options=)

Calculates which geospatial geometry objects from a table intersect a named area of interest (NAI/polygon). The operation is synchronous, meaning that a response will not be returned until all the matching objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input NAI restriction specification is created with the name input parameter view_name passed in as part of the input.

Parameters

Returns

filter_by_box(table_name=None, view_name=, x_column_name=None, min_x=None, max_x=None, y_column_name=None, min_y=None, max_y=None, options=)

Calculates how many objects within the given table lie in a rectangular box. The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set which satisfies the input NAI restriction specification is also created when input parameter view_name is passed in as part of the input payload.

Parameters

Returns

filter_by_box_geometry(table_name=None, view_name=, column_name=None, min_x=None, max_x=None, min_y=None, max_y=None, options=)

Calculates which geospatial geometry objects from a table intersect a rectangular box. The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set which satisfies the input NAI restriction specification is also created when input parameter view_name is passed in as part of the input payload.

Parameters

Returns

filter_by_geometry(table_name=None, view_name=, column_name=None, input_wkt=, operation=None, options=)

Applies a geometry filter against a geospatial geometry column in a given table or view. The filtering geometry is provided by input parameter input_wkt.

Parameters

Returns

filter_by_list(table_name=None, view_name=, column_values_map=None, options=)

Calculates which records from a table have values in the given list for the corresponding column. The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input filter specification is also created if input parameter view_name is passed in as part of the request.

For example, if a type definition has the columns ‘x’ and ‘y’, then a filter by list query with the column map "x":["10.1", "2.3"], "y":["0.0", "-31.5", "42.0"] will return the count of all data points whose x and y values match both in the respective x- and y-lists, e.g., “x = 10.1 and y = 0.0”, “x = 2.3 and y = -31.5”, etc. However, a record with “x = 10.1 and y = -31.5” or “x = 2.3 and y = 0.0” would not be returned because the values in the given lists do not correspond.

Parameters

Returns

filter_by_radius(table_name=None, view_name=, x_column_name=None, x_center=None, y_column_name=None, y_center=None, radius=None, options=)

Calculates which objects from a table lie within a circle with the given radius and center point (i.e. circular NAI). The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input circular NAI restriction specification is also created if input parameter view_name is passed in as part of the request.

For track data, all track points that lie within the circle plus one point on either side of the circle (if the track goes beyond the circle) will be included in the result.

Parameters

Returns

filter_by_radius_geometry(table_name=None, view_name=, column_name=None, x_center=None, y_center=None, radius=None, options=)

Calculates which geospatial geometry objects from a table intersect a circle with the given radius and center point (i.e. circular NAI). The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input circular NAI restriction specification is also created if input parameter view_name is passed in as part of the request.

Parameters

Returns

filter_by_range(table_name=None, view_name=, column_name=None, lower_bound=None, upper_bound=None, options=)

Calculates which objects from a table have a column that is within the given bounds. An object from the table identified by input parameter table_name is added to the view input parameter view_name if its column is within [input parameter lower_bound, input parameter upper_bound] (inclusive). The operation is synchronous. The response provides a count of the number of objects which passed the bound filter. Although this functionality can also be accomplished with the standard filter function, it is more efficient.

For track objects, the count reflects how many points fall within the given bounds (which may not include all the track points of any given track).
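The inclusive bound check can be pictured locally as follows. This is only a sketch of the comparison described above, run on an in-memory list; count_in_range is a hypothetical name and nothing here calls the server.

```python
def count_in_range(values, lower_bound, upper_bound):
    """Count values v with lower_bound <= v <= upper_bound (both ends inclusive)."""
    return sum(1 for v in values if lower_bound <= v <= upper_bound)


# 5 and 10 sit exactly on the bounds and are counted; 1 and 11 are not.
print(count_in_range([1, 5, 10, 11], 5, 10))  # 2
```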

Parameters

Returns

filter_by_series(table_name=None, view_name=, track_id=None, target_track_ids=None, options=)

Filters objects matching all points of the given track (works only on track type data). It allows users to specify a particular track to find all other points in the table that fall within specified ranges (spatial and temporal) of all points of the given track. Additionally, the user can specify another track to see if the two intersect (or go close to each other within the specified ranges). The user also has the flexibility of using different metrics for the spatial distance calculation: Euclidean (flat geometry) or Great Circle (spherical geometry to approximate the Earth’s surface distances). The filtered points are stored in a newly created result set. The return value of the function is the number of points in the resultant set (view).

This operation is synchronous, meaning that a response will not be returned until all the objects are fully available.
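To make the difference between the two spatial metrics concrete, the sketch below computes a flat Euclidean distance and a great-circle distance for the same pair of points. The text does not state which great-circle formula the server uses; the haversine formula and the mean Earth radius used here are assumptions of this sketch, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters; an assumption of this sketch


def euclidean(x1, y1, x2, y2):
    """Flat-geometry distance, in the same units as the coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


def great_circle(lon1, lat1, lon2, lat2):
    """Haversine distance in meters between two longitude/latitude points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


print(euclidean(0, 0, 3, 4))  # 5.0
print(great_circle(-77.11, 38.88, -71.06, 42.36))
```

The Euclidean result is expressed in coordinate units (degrees, for longitude/latitude data), while the great-circle result is a surface distance, so thresholds for the two metrics are not interchangeable.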

Parameters

Returns

filter_by_string(table_name=None, view_name=, expression=None, mode=None, column_names=None, options=)

Calculates which objects from a table or view match a string expression for the given string columns. Setting case_sensitive can modify case sensitivity in matching for all modes except search. For search mode details and limitations, see Full Text Search.

Parameters

Returns

filter_by_table(table_name=None, view_name=, column_name=None, source_table_name=None, source_table_column_name=None, options=)

Filters objects in one table based on objects in another table. The user must specify matching column types from the two tables (i.e. the target table from which objects will be filtered and the source table based on which the filter will be created); the column names need not be the same. If input parameter view_name is specified, the filtered objects will be put in a newly created view. The operation is synchronous, meaning that a response will not be returned until all objects are fully available in the result view. The return value contains the count (i.e. the size) of the resulting view.

Parameters

Returns

filter_by_value(table_name=None, view_name=, is_string=None, value=0, value_str=, column_name=None, options=)

Calculates which objects from a table have a particular value for a particular column. The input parameters provide a way to specify either a String or a Double valued column and a desired value for the column on which the filter is performed. The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new result view which satisfies the input filter restriction specification is also created with a view name passed in as part of the input payload. Although this functionality can also be accomplished with the standard filter function, it is more efficient.

Parameters

Returns

get_graph_entities(graph_name=None, offset=0, limit=10000, options=)

Retrieves node or edge entities from an existing graph, with pagination support via offset and limit. Use GPUdb.show_graph() to obtain the total number of nodes and edges.

Parameters

Returns

get_job(job_id=None, options=)

Gets the status and result of an asynchronously running job. See GPUdb.create_job() for starting an asynchronous job. Some fields of the response are filled only after the submitted job has finished execution.

Parameters

Returns

get_records(table_name=None, offset=0, limit=-9999, encoding='binary', options=, get_record_type=True)

Retrieves records from a given table, optionally filtered by an expression and/or sorted by a column. This operation can be performed on tables and views. Records can be returned encoded as binary, json, or geojson.

This operation supports paging through the data via input parameter offset and input parameter limit. Note that when paging through a table, if the table (or the underlying table, in the case of a view) is updated (records are inserted, deleted, or modified), the records retrieved may differ between calls, depending on the updates applied.

Parameters

Returns

get_records_and_decode(table_name=None, offset=0, limit=-9999, encoding='binary', options=, record_type=None, force_primitive_return_types=True)

Retrieves records from a given table, optionally filtered by an expression and/or sorted by a column. This operation can be performed on tables and views. Records can be returned encoded as binary, json, or geojson.

This operation supports paging through the data via input parameter offset and input parameter limit. Note that when paging through a table, if the table (or the underlying table, in the case of a view) is updated (records are inserted, deleted, or modified), the records retrieved may differ between calls, depending on the updates applied.

Parameters

Returns

get_records_by_column(table_name=None, column_names=None, offset=0, limit=-9999, encoding='binary', options=)

For a given table, retrieves the values from the requested column(s). Maps of column name to the array of values as well as the column data type are returned. This endpoint supports pagination with input parameter offset and input parameter limit.

Window functions, which can perform operations like moving averages, are available through this endpoint as well as GPUdb.create_projection().

When using pagination, if the table (or the underlying table in the case of a view) is modified (records are inserted, updated, or deleted) during a call to the endpoint, the records or values retrieved may differ between calls, depending on the type of update; in other words, contiguity across pages cannot be relied upon.

If input parameter table_name is empty, selection is performed against a single-row virtual table. This can be useful in executing temporal (NOW()), identity (USER()), or constant-based functions (GEODIST(-77.11, 38.88, -71.06, 42.36)).

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

Parameters

Returns

get_records_by_column_and_decode(table_name=None, column_names=None, offset=0, limit=-9999, encoding='binary', options=, record_type=None, force_primitive_return_types=True, get_column_major=True)

For a given table, retrieves the values from the requested column(s). Maps of column name to the array of values as well as the column data type are returned. This endpoint supports pagination with input parameter offset and input parameter limit.

Window functions, which can perform operations like moving averages, are available through this endpoint as well as GPUdb.create_projection().

When using pagination, if the table (or the underlying table in the case of a view) is modified (records are inserted, updated, or deleted) during a call to the endpoint, the records or values retrieved may differ between calls, depending on the type of update; in other words, contiguity across pages cannot be relied upon.

If input parameter table_name is empty, selection is performed against a single-row virtual table. This can be useful in executing temporal (NOW()), identity (USER()), or constant-based functions (GEODIST(-77.11, 38.88, -71.06, 42.36)).

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

Parameters

Returns

get_records_by_series(table_name=None, world_table_name=None, offset=0, limit=250, encoding='binary', options=)

Retrieves the complete series/track records from the given input parameter world_table_name based on the partial track information contained in the input parameter table_name.

This operation supports paging through the data via input parameter offset and input parameter limit.

In contrast to GPUdb.get_records(), this returns records grouped by series/track. So if input parameter offset is 0 and input parameter limit is 5, this operation would return the first 5 series/tracks in input parameter table_name. Each series/track will be returned sorted by its TIMESTAMP column.
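The difference between paging by record and paging by series can be modeled locally. In the sketch below, page_by_series is a hypothetical helper operating on in-memory dicts; the TRACKID key name and the ordering of tracks by first appearance are assumptions of this sketch, not statements about the server.

```python
from collections import OrderedDict


def page_by_series(records, offset, limit):
    """Group records by track ID, then return `limit` tracks starting at `offset`.

    Hypothetical local model: each record is a dict with "TRACKID" and
    "TIMESTAMP" keys; tracks are kept in order of first appearance.
    """
    tracks = OrderedDict()
    for record in records:
        tracks.setdefault(record["TRACKID"], []).append(record)
    # offset and limit count whole tracks, not individual records.
    selected = list(tracks.values())[offset:offset + limit]
    # Each track is returned sorted by its TIMESTAMP column.
    return [sorted(track, key=lambda r: r["TIMESTAMP"]) for track in selected]


records = [
    {"TRACKID": "a", "TIMESTAMP": 2},
    {"TRACKID": "b", "TIMESTAMP": 1},
    {"TRACKID": "a", "TIMESTAMP": 1},
]
first_page = page_by_series(records, offset=0, limit=1)
print(first_page)
```

With limit=1 the page holds one track containing both of its records, whereas record-level paging with the same limit would return a single record.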

Parameters

Returns

get_records_by_series_and_decode(table_name=None, world_table_name=None, offset=0, limit=250, encoding='binary', options=, force_primitive_return_types=True)

Retrieves the complete series/track records from the given input parameter world_table_name based on the partial track information contained in the input parameter table_name.

This operation supports paging through the data via input parameter offset and input parameter limit.

In contrast to GPUdb.get_records(), this returns records grouped by series/track. So if input parameter offset is 0 and input parameter limit is 5, this operation would return the first 5 series/tracks in input parameter table_name. Each series/track will be returned sorted by its TIMESTAMP column.

Parameters

Returns

get_records_from_collection(table_name=None, offset=0, limit=-9999, encoding='binary', options=)

Retrieves records from a collection. The operation can optionally return the record IDs which can be used in certain queries such as GPUdb.delete_records().

This operation supports paging through the data via input parameter offset and input parameter limit.

Note that when using the Java API, it is not possible to retrieve records from join views using this operation. (DEPRECATED)

Parameters

Returns

get_records_from_collection_and_decode(table_name=None, offset=0, limit=-9999, encoding='binary', options=, force_primitive_return_types=True)

Retrieves records from a collection. The operation can optionally return the record IDs which can be used in certain queries such as GPUdb.delete_records().

This operation supports paging through the data via input parameter offset and input parameter limit.

Note that when using the Java API, it is not possible to retrieve records from join views using this operation. (DEPRECATED)

Parameters

Returns

grant_permission(principal=, object=None, object_type=None, permission=None, options=)

Grants the specified permission on the specified object to a user or role.

Parameters

Returns

grant_permission_credential(name=None, permission=None, credential_name=None, options=)

Grants a credential-level permission to a user or role.

Parameters

Returns

grant_permission_datasource(name=None, permission=None, datasource_name=None, options=)

Grants a data source permission to a user or role.

Parameters

Returns

grant_permission_directory(name=None, permission=None, directory_name=None, options=)

Grants a KiFS directory-level permission to a user or role.

Parameters

Returns

grant_permission_proc(name=None, permission=None, proc_name=None, options=)

Grants a proc-level permission to a user or role.

Parameters

Returns

grant_permission_system(name=None, permission=None, options=)

Grants a system-level permission to a user or role.

Parameters

Returns

grant_permission_table(name=None, permission=None, table_name=None, filter_expression=, options=)

Grants a table-level permission to a user or role.

Parameters

Returns

grant_role(role=None, member=None, options=)

Grants membership in a role to a user or role.

Parameters

Returns

has_permission(principal=, object=None, object_type=None, permission=None, options=)

Checks if the specified user has the specified permission on the specified object.

Parameters

Returns

has_proc(proc_name=None, options=)

Checks the existence of a proc with the given name.

Parameters

Returns

has_role(principal=, role=None, options=)

Checks if the specified user has the specified role.

Parameters

Returns

has_schema(schema_name=None, options=)

Checks for the existence of a schema with the given name.

Parameters

Returns

has_table(table_name=None, options=)

Checks for the existence of a table with the given name.

Parameters

Returns

has_type(type_id=None, options=)

Checks for the existence of a type.

Parameters

Returns

insert_records(table_name=None, data=None, list_encoding=None, options=, record_type=None)

Adds multiple records to the specified table. The operation is synchronous, meaning that a response will not be returned until all the records are fully inserted and available. The response payload provides the counts of the number of records actually inserted and/or updated, and can provide the unique identifier of each added record.

Input parameter options can be used to customize this function’s behavior.

The update_on_existing_pk option specifies the record collision policy for inserting into a table with a primary key, but is ignored if no primary key exists.

The return_record_ids option indicates that the database should return the unique identifiers of inserted records.
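The effect of the record collision policy on the inserted and updated counts can be simulated with a plain dictionary. This is a local model only: insert_with_pk is a hypothetical name, the table is a dict keyed by primary key value, and the False default used here is a choice of the sketch rather than a documented default.

```python
def insert_with_pk(table, records, pk, update_on_existing_pk=False):
    """Insert records into `table` (a dict keyed by primary key value).

    Returns (count_inserted, count_updated). A colliding record overwrites the
    existing one when update_on_existing_pk is True and is skipped otherwise.
    """
    inserted = updated = 0
    for record in records:
        key = record[pk]
        if key not in table:
            table[key] = record
            inserted += 1
        elif update_on_existing_pk:
            table[key] = record
            updated += 1
        # Otherwise the colliding record is skipped and the table is unchanged.
    return inserted, updated


table = {1: {"id": 1, "name": "old"}}
incoming = [{"id": 1, "name": "new"}, {"id": 2, "name": "other"}]
print(insert_with_pk(dict(table), incoming, "id"))  # (1, 0)
print(insert_with_pk(dict(table), incoming, "id", update_on_existing_pk=True))  # (1, 1)
```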

Parameters

Returns

insert_records_from_files(table_name=None, filepaths=None, modify_columns=, create_table_options=, options=)

Reads from one or more files and inserts the data into a new or existing table. The source data can be located either in KiFS; on the cluster, accessible to the database; or remotely, accessible via a pre-defined external data source.

For delimited text files, there are two loading schemes: positional and name-based. The name-based loading scheme is enabled when the file has a header present and text_has_header is set to true. In this scheme, the field names in the source file(s) must match the target table’s column names exactly; however, the source file can have more fields than the target table has columns. If error_handling is set to permissive, the source file can have fewer fields than the target table has columns. If the name-based loading scheme is being used, names matching the file header’s names may be provided to columns_to_load instead of numbers, but ranges are not supported.
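The name-based matching rules can be expressed as a short pre-flight check on a file header. This is a sketch under stated assumptions: check_header is a hypothetical helper, the permissive flag stands in for error_handling being set to permissive, and the comparison is case-sensitive because names must match exactly.

```python
def check_header(file_fields, table_columns, permissive=False):
    """Return the table columns a file can fill, or raise ValueError.

    Extra file fields are ignored. Missing table columns are tolerated only
    when `permissive` is True (standing in for error_handling = permissive).
    """
    missing = [c for c in table_columns if c not in file_fields]
    if missing and not permissive:
        raise ValueError("file is missing columns: %s" % missing)
    return [c for c in table_columns if c in file_fields]


# More fields than columns is acceptable; the extra field is ignored.
print(check_header(["id", "name", "extra"], ["id", "name"]))  # ['id', 'name']
# Fewer fields than columns is accepted only in permissive mode.
print(check_header(["id"], ["id", "name"], permissive=True))  # ['id']
```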

Note: Due to data being loaded in parallel, there is no insertion order guaranteed. For tables with primary keys, in the case of a primary key collision, this means it is indeterminate which record will be inserted first and remain, while the rest of the colliding key records are discarded.

Returns once all files are processed.

Parameters

Returns

insert_records_from_payload(table_name=None, data_text=None, data_bytes=None, modify_columns=, create_table_options=, options=)

Reads from the given text-based or binary payload and inserts the data into a new or existing table. The table will be created if it doesn’t already exist.

Returns once all records are processed.

Parameters

Returns

insert_records_from_query(table_name=None, remote_query=None, modify_columns=, create_table_options=, options=)

Computes remote query result and inserts the result data into a new or existing table.

Parameters

Returns

insert_records_random(table_name=None, count=None, options=)

Generates a specified number of random records and adds them to the given table. There is an optional parameter that allows the user to customize the ranges of the column values. It also allows the user to specify linear profiles for some or all columns in which case linear values are generated rather than random ones. Only individual tables are supported for this operation.

This operation is synchronous, meaning that a response will not be returned until all random records are fully available.

Parameters

Returns

insert_symbol(symbol_id=None, symbol_format=None, symbol_data=None, options=)

Adds a symbol or icon (i.e. an image) to represent data points when data is rendered visually. Users must provide the symbol identifier (string), a format (currently supported: ‘svg’ and ‘svg_path’), the data for the symbol, and any additional optional parameters (e.g. color). To have a symbol used for rendering, create a table with a string column named ‘SYMBOLCODE’ (along with ‘x’ or ‘y’, for example). Then, when the table is rendered (via WMS), if the ‘dosymbology’ parameter is ‘true’, the value of the ‘SYMBOLCODE’ column is used to pick the symbol displayed for each point.

Parameters

Returns

kill_proc(run_id=, options=)

Kills a running proc instance.

Parameters

Returns

lock_table(table_name=None, lock_type=‘status’, options=)

Manages global access to a table’s data. By default, a table has an input parameter lock_type of read_write, indicating that all operations are permitted. A user may request a read_only or a write_only lock, after which only read or write operations, respectively, are permitted on the table until the lock is removed. When input parameter lock_type is no_access, no operations are permitted on the table. The lock status can be queried by setting input parameter lock_type to status.
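The lock types described above can be wrapped in a small helper. This is a minimal sketch, assuming `db` is an already-connected client object that exposes `lock_table(table_name, lock_type)` as in the signature above; the helper name `set_table_lock` is hypothetical and not part of the API.

```python
# Lock types named in the description above.
LOCK_TYPES = ("status", "no_access", "read_only", "write_only", "read_write")


def set_table_lock(db, table_name, lock_type="status"):
    """Check lock_type locally, then forward it to db.lock_table().

    `db` is assumed to be a connected client; this helper is illustrative only.
    """
    if lock_type not in LOCK_TYPES:
        raise ValueError("unsupported lock_type: %r" % (lock_type,))
    return db.lock_table(table_name=table_name, lock_type=lock_type)
```

For example, `set_table_lock(db, "my_table", "read_only")` would request a read-only lock, and a later call with `"read_write"` would restore the default access described above. The table name here is a placeholder.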

Parameters

Returns

match_graph(graph_name=None, sample_points=None, solve_method=‘markov_chain’, solution_table=, options=)

Matches a directed route implied by a given set of latitude/longitude points to an existing underlying road network graph using a given solution type.

IMPORTANT: It’s highly recommended that you review the Graphs and Solvers concepts documentation, the Graph REST Tutorial, and/or some /match/graph examples before using this endpoint.

Parameters

Returns

modify_graph(graph_name=None, nodes=None, edges=None, weights=None, restrictions=None, options=)

Updates an existing graph network using given nodes, edges, weights, restrictions, and options.

IMPORTANT: It’s highly recommended that you review the Graphs and Solvers concepts documentation, and Graph REST Tutorial before using this endpoint.

Parameters

Returns

query_graph(graph_name=None, queries=None, restrictions=[], adjacency_table=, rings=1, options=)

Employs a topological query on a graph generated a priori by GPUdb.create_graph() and returns a list of adjacent edge(s) or node(s), also known as an adjacency list, depending on what has been provided to the endpoint: providing edges will return nodes, and providing nodes will return edges.

To determine the node(s) or edge(s) adjacent to a value from a given column, provide a list of values to input parameter queries. This field can be populated with column values from any table as long as the type is supported by the given identifier. See Query Identifiers for more information.

To return the adjacency list in the response, leave input parameter adjacency_table empty.

IMPORTANT: It’s highly recommended that you review the Graphs and Solvers concepts documentation, the Graph REST Tutorial, and/or some /match/graph examples before using this endpoint.

Parameters

Returns

repartition_graph(graph_name=None, options=)

Rebalances an existing partitioned graph.

IMPORTANT: It’s highly recommended that you review the Graphs and Solvers concepts documentation, the Graph REST Tutorial, and/or some graph examples before using this endpoint.

Parameters

Returns

restore_backup(backup_name=, restore_objects_map=None, datasource_name=None, options=)

Restores database objects from a backup accessible via the data source specified by input parameter datasource_name.

Parameters

Returns

revoke_permission(principal=, object=None, object_type=None, permission=None, options=)

Revokes the specified permission on the specified object from a user or role.

Parameters

Returns

revoke_permission_credential(name=None, permission=None, credential_name=None, options=)

Revokes a credential-level permission from a user or role.

Parameters

Returns

revoke_permission_datasource(name=None, permission=None, datasource_name=None, options=)

Revokes a data source permission from a user or role.

Parameters

Returns

revoke_permission_directory(name=None, permission=None, directory_name=None, options=)

Revokes a KiFS directory-level permission from a user or role.

Parameters

Returns

revoke_permission_proc(name=None, permission=None, proc_name=None, options=)

Revokes a proc-level permission from a user or role.

Parameters

Returns

revoke_permission_system(name=None, permission=None, options=)

Revokes a system-level permission from a user or role.

Parameters

Returns

revoke_permission_table(name=None, permission=None, table_name=None, options=)

Revokes a table-level permission from a user or role.

Parameters

Returns

revoke_role(role=None, member=None, options=)

Revokes membership in a role from a user or role.

Parameters

Returns

show_backup(backup_name=, datasource_name=None, options=)

Shows information about one or more backups accessible via the data source specified by input parameter datasource_name.

Parameters

Returns

show_credential(credential_name=None, options=)

Shows information about a specified credential or all credentials.

Parameters

Returns

show_datasink(name=None, options=)

Shows information about a specified data sink or all data sinks.

Parameters

Returns

show_datasource(name=None, options=)

Shows information about a specified data source or all data sources.

Parameters

Returns

show_directories(directory_name=, options=)

Shows information about directories in KiFS. Can be used to show a single directory, or all directories.

Parameters

Returns

show_environment(environment_name=, options=)

Shows information about a specified user-defined function (UDF) environment or all environments. Returns detailed information about existing environments.

Parameters

Returns

show_files(paths=None, options=)

Shows information about files in KiFS. Can be used for individual files, or to show all files in a given directory.

Parameters

Returns

show_graph(graph_name=, options=)

Shows information and characteristics of graphs that exist on the graph server.

Parameters

Returns

show_proc(proc_name=, options=)

Shows information about a proc.

Parameters

Returns

show_proc_status(run_id=, options=)

Shows the statuses of running or completed proc instances. Results are grouped by run ID (as returned from GPUdb.execute_proc()) and data segment ID (each invocation of the proc command on a data segment is assigned a data segment ID).

Parameters

Returns

show_resource_objects(options=)

Returns information about the internal sub-components (tiered objects) which use resources of the system. The request can either return results from actively used objects (default) or it can be used to query the status of the objects of a given list of tables. Returns detailed information about the requested resource objects.

Parameters

Returns

show_resource_statistics(options=)

Requests various statistics for storage/memory tiers and resource groups. Returns statistics on a per-rank basis.

Parameters

Returns

show_resource_groups(names=None, options=)

Requests resource group properties. Returns detailed information about the requested resource groups.

Parameters

Returns

show_schema(schema_name=None, options=)

Retrieves information about a schema (or all schemas), as specified in input parameter schema_name.

Parameters

Returns

show_security(names=None, options=)

Shows security information relating to users and/or roles. If the caller is not a system administrator, only information relating to the caller and their roles is returned.

Parameters

Returns

show_sql_proc(procedure_name=, options=)

Shows information about SQL procedures, including the full definition of each requested procedure.

Parameters

Returns

show_statistics(table_names=None, options=)

Retrieves the collected column statistics for the specified table(s).

Parameters

Returns

show_system_properties(options=)

Returns server configuration and version related information to the caller. The admin tool uses it to present server related information to the user.

Parameters

Returns

show_system_status(options=)

Provides server configuration and health related status to the caller. The admin tool uses it to present server related information to the user.

Parameters

Returns

show_system_timing(options=)

Returns the last 100 database requests along with the request timing and internal job ID. The admin tool uses it to present request timing information to the user.

Parameters

Returns

show_table(table_name=None, options=)

Retrieves detailed information about a table, view, or schema, specified in input parameter table_name. If the supplied input parameter table_name is a schema the call can return information about either the schema itself or the tables and views it contains. If input parameter table_name is empty, information about all schemas will be returned.

If the option get_sizes is set to true, then the number of records in each table is returned (in output parameter sizes and output parameter full_sizes), along with the total number of objects across all requested tables (in output parameter total_size and output parameter total_full_size).

For a schema, setting the show_children option to false returns only information about the schema itself; setting show_children to true returns a list of tables and views contained in the schema, along with their corresponding detail.

To retrieve a list of every table, view, and schema in the database, set input parameter table_name to ‘*’ and show_children to true. When doing this, the returned output parameter total_size and output parameter total_full_size will not include the sizes of non-base tables (e.g., filters, views, joins, etc.).
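The options described above can be combined into a single request. This is a minimal sketch, assuming `db` is a connected client exposing `show_table(table_name, options)` as in the signature above, and assuming option values are passed as the strings `"true"` and `"false"`; the helper name `list_all_objects` is hypothetical.

```python
def list_all_objects(db, with_sizes=False):
    """Request every table, view, and schema by passing '*' with show_children."""
    # Assumption: option values are strings rather than Python booleans.
    options = {"show_children": "true"}
    if with_sizes:
        # Adds record counts to the response, as described for get_sizes.
        options["get_sizes"] = "true"
    return db.show_table(table_name="*", options=options)
```

Keeping the option map in one place makes it easier to see that `get_sizes` is only requested when the caller asks for it.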

Parameters

Returns

show_table_metadata(table_names=None, options=)

Retrieves the user provided metadata for the specified tables.

Parameters

Returns

show_table_monitors(monitor_ids=None, options=)

Shows table monitors and their properties. Table monitors are created using GPUdb.create_table_monitor(). Returns detailed information about existing table monitors.

Parameters

Returns

show_tables_by_type(type_id=None, label=None, options=)

Gets names of the tables whose type matches the given criteria. Each table has a particular type. This type comprises the schema and properties of the table and sometimes a type label. This function allows a look up of the existing tables based on full or partial type information. The operation is synchronous.

Parameters

Returns

show_triggers(trigger_ids=None, options=)

Retrieves information regarding the specified triggers or all existing triggers currently active.

Parameters

Returns

show_types(type_id=None, label=None, options=)

Retrieves information for the specified data type ID or type label. For all data types that match the input criteria, the database returns the type ID, the type schema, the label (if available), and the type’s column properties.

Parameters

Returns

show_video(paths=None, options=)

Retrieves information about rendered videos.

Parameters

Returns

show_wal(table_names=None, options=)

Requests table write-ahead log (WAL) properties. Returns information about the requested table WAL entries.

Parameters

Returns

solve_graph(graph_name=None, weights_on_edges=[], restrictions=[], solver_type=‘SHORTEST_PATH’, source_nodes=[], destination_nodes=[], solution_table=‘graph_solutions’, options=)

Solves an existing graph for a type of problem (e.g., shortest path, page rank, traveling salesman, etc.) using source nodes, destination nodes, and additional, optional weights and restrictions.

IMPORTANT: It’s highly recommended that you review the Graphs and Solvers concepts documentation, the Graph REST Tutorial, and/or some /solve/graph examples before using this endpoint.

Parameters

Returns

update_records(table_name=None, expressions=None, new_values_maps=None, records_to_insert=[], records_to_insert_str=[], record_encoding=‘binary’, options=, record_type=None)

Runs multiple predicate-based updates in a single call. With the list of given expressions, any matching record’s column values will be updated as provided in input parameter new_values_maps. There is also an optional ‘upsert’ capability where if a particular predicate doesn’t match any existing record, then a new record can be inserted.

Note that this operation can only be run on an original table and not on a result view.

This operation can update primary key values. By default only ‘pure primary key’ predicates are allowed when updating primary key values. If the primary key for a table is the column ‘attr1’, then the operation will only accept predicates of the form: “attr1 == ‘foo’” if the attr1 column is being updated. For a composite primary key (e.g. columns ‘attr1’ and ‘attr2’) then this operation will only accept predicates of the form: “(attr1 == ‘foo’) and (attr2 == ‘bar’)”. Meaning, all primary key columns must appear in an equality predicate in the expressions. Furthermore each ‘pure primary key’ predicate must be unique within a given request. These restrictions can be removed by utilizing some available options through input parameter options.
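The "pure primary key" predicate forms described above can be generated programmatically. This is a minimal sketch that only builds expression strings and does not call the server; the helper names are hypothetical, and values are assumed to be plain strings without single quotes, since quoting rules are not covered here.

```python
def pure_pk_expression(pk_values):
    """Build an equality predicate that covers every primary key column.

    pk_values maps each primary key column name to the string value to match.
    """
    if not pk_values:
        raise ValueError("at least one primary key column is required")
    terms = []
    for column, value in pk_values.items():
        if "'" in value:
            # Assumption: escaping is out of scope for this sketch.
            raise ValueError("values containing single quotes are not handled")
        terms.append("%s == '%s'" % (column, value))
    if len(terms) == 1:
        return terms[0]
    # Composite keys: parenthesize each equality and join with 'and'.
    return " and ".join("(%s)" % term for term in terms)


def build_pk_expressions(rows):
    """Build one predicate per row and enforce uniqueness within the request."""
    expressions = [pure_pk_expression(row) for row in rows]
    if len(set(expressions)) != len(expressions):
        raise ValueError("each pure primary key predicate must be unique")
    return expressions
```

The resulting list could then be passed as input parameter expressions, with a matching list in input parameter new_values_maps.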

The update_on_existing_pk option specifies the record primary key collision policy for tables with a primary key, while ignore_existing_pk specifies the record primary key collision error-suppression policy when those collisions result in the update being rejected. Both are ignored on tables with no primary key.

Parameters

Returns

upload_files(file_names=None, file_data=None, options=)

Uploads one or more files to KiFS. There are two methods for uploading files: load files in their entirety, or load files in parts. The latter is recommended for files of approximately 60 MB or larger.

To upload files in their entirety, populate input parameter file_names with the names the files will have in KiFS, and input parameter file_data with their respective byte content.

Multiple steps are involved when uploading in multiple parts. Only one file at a time can be uploaded in this manner. A user-provided UUID is utilized to tie all the upload steps together for a given file. To upload a file in multiple parts:

  1. Provide the file name in input parameter file_names, the UUID in the multipart_upload_uuid key in input parameter options, and a multipart_operation value of init.

  2. Upload one or more parts by providing the file name, the part data in input parameter file_data, the UUID, a multipart_operation value of upload_part, and the part number in the multipart_upload_part_number key. The part numbers must start at 1 and increase incrementally. Parts may not be uploaded out of order.

  3. Complete the upload by providing the file name, the UUID, and a multipart_operation value of complete.

Multipart uploads in progress may be canceled by providing the file name, the UUID, and a multipart_operation value of cancel. If a new upload is initialized with a different UUID for an existing upload in progress, the pre-existing upload is automatically canceled in favor of the new upload.

The multipart upload must be completed for the file to be usable in KiFS. Information about multipart uploads in progress is available in GPUdb.show_files().
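The multipart steps above (init, upload_part, complete, and cancel on failure) can be sketched as a single helper. This is an illustration under stated assumptions, not a definitive implementation: `db` is assumed to be a connected client exposing `upload_files(file_names, file_data, options)` as in the signature above, option values are assumed to be strings, and an empty `file_data` list is assumed to be acceptable for steps that carry no data. The helper name `multipart_upload` is hypothetical.

```python
import uuid


def multipart_upload(db, file_name, parts, upload_id=None):
    """Upload one file to KiFS in parts, tied together by a single UUID."""
    upload_id = upload_id or str(uuid.uuid4())

    def send(operation, data=None, **extra):
        options = {
            "multipart_upload_uuid": upload_id,
            "multipart_operation": operation,
        }
        options.update(extra)
        # Assumption: steps without part data send an empty file_data list.
        return db.upload_files(
            file_names=[file_name],
            file_data=[] if data is None else [data],
            options=options,
        )

    send("init")  # Step 1: initialize the upload.
    try:
        # Step 2: part numbers start at 1 and are sent strictly in order.
        for number, chunk in enumerate(parts, start=1):
            send("upload_part", chunk, multipart_upload_part_number=str(number))
        send("complete")  # Step 3: finish so the file becomes usable.
    except Exception:
        send("cancel")  # Abandon the in-progress upload, then re-raise.
        raise
    return upload_id
```

Because only one file at a time can be uploaded this way, the helper takes a single file name and an iterable of byte chunks.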

File data may be pre-encoded using base64 encoding. This should be indicated using the file_encoding option, and is recommended when using JSON serialization.
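Pre-encoding can be done with the Python standard library. This is a minimal sketch; the option value `"base64"` for file_encoding and the helper name `encode_for_upload` are assumptions made for illustration, and whether the encoded payload should be passed as bytes or text is not specified here.

```python
import base64


def encode_for_upload(raw_bytes):
    """Return base64-encoded file data plus a matching options map."""
    encoded = base64.b64encode(raw_bytes)
    # Assumption: 'base64' is the value that signals pre-encoded data.
    options = {"file_encoding": "base64"}
    return encoded, options
```

The returned pair would be supplied as one entry of input parameter file_data and as input parameter options, respectively.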

Each file path must reside in a top-level KiFS directory, i.e. one of the directories listed in GPUdb.show_directories(). The user must have write permission on the directory. Nested directories are permitted in file name paths. Directories are delineated with the directory separator of ‘/’. For example, given the file path ‘/a/b/c/d.txt’, ‘a’ must be a KiFS directory.

These characters are allowed in file name paths: letters, numbers, spaces, the path delimiter of ‘/’, and the characters: ‘.’ ‘-’ ‘:’ ‘[’ ‘]’ ‘(’ ‘)’ ‘#’ ‘=’.
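The path rules above can be checked on the client before uploading. This is a minimal sketch that mirrors only the characters listed above, assuming "letters" and "numbers" mean ASCII letters and digits; the helper names are hypothetical, and the server remains the authority on what it accepts.

```python
import re

# Letters, digits, space, '/', and the listed punctuation: . - : [ ] ( ) # =
_ALLOWED_PATH = re.compile(r"[A-Za-z0-9 /.\-:\[\]()#=]+")


def is_valid_kifs_path(path):
    """Return True if the path uses only the characters listed above."""
    return bool(_ALLOWED_PATH.fullmatch(path))


def top_level_directory(path):
    """Return the first path component, which must be a KiFS directory."""
    return path.strip("/").split("/")[0]
```

For the example path from the text, `is_valid_kifs_path("/a/b/c/d.txt")` returns `True` and `top_level_directory("/a/b/c/d.txt")` returns `'a'`.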

Parameters

Returns

upload_files_fromurl(file_names=None, urls=None, options=)

Uploads one or more files to KiFS.

Each file path must reside in a top-level KiFS directory, i.e. one of the directories listed in GPUdb.show_directories(). The user must have write permission on the directory. Nested directories are permitted in file name paths. Directories are delineated with the directory separator of ‘/’. For example, given the file path ‘/a/b/c/d.txt’, ‘a’ must be a KiFS directory.

These characters are allowed in file name paths: letters, numbers, spaces, the path delimiter of ‘/’, and the characters: ‘.’ ‘-’ ‘:’ ‘[’ ‘]’ ‘(’ ‘)’ ‘#’ ‘=’.

Parameters

Returns

verify_backup(backup_name=, datasource_name=None, options=)

Inspects the requested database backup(s) for conformity at the remote file store accessible via the data source specified by input parameter datasource_name. By default all snapshots are inspected unless the option backup_id is used to target a specific instance. Returns backup verification results.

Parameters

Returns

visualize_image_chart(table_name=None, x_column_names=None, y_column_names=None, min_x=None, max_x=None, min_y=None, max_y=None, width=None, height=None, bg_color=None, style_options=None, options=)

Scatter plot is the only plot type currently supported. A non-numeric column can be specified as x or y column and jitters can be added to them to avoid excessive overlapping. All color values must be in the format RRGGBB or AARRGGBB (to specify the alpha value). The image is contained in the output parameter image_data field.
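The color format above can be validated before a request is sent. This is a minimal sketch, assuming each pair of characters is a hexadecimal byte; the helper names are hypothetical and not part of the API.

```python
import re

# Six hex digits (RRGGBB), optionally preceded by two for alpha (AARRGGBB).
_COLOR = re.compile(r"(?:[0-9A-Fa-f]{2})?[0-9A-Fa-f]{6}")


def is_valid_color(value):
    """Return True for RRGGBB or AARRGGBB strings."""
    return bool(_COLOR.fullmatch(value))


def split_color(value):
    """Split a color into (alpha, red, green, blue); alpha is None if absent."""
    if not is_valid_color(value):
        raise ValueError("expected RRGGBB or AARRGGBB, got %r" % (value,))
    channels = [int(value[i:i + 2], 16) for i in range(0, len(value), 2)]
    if len(channels) == 3:
        return (None,) + tuple(channels)
    return tuple(channels)
```

Checking values such as bg_color locally gives an early error for strings like `"#FF8800"`, which include a character outside the stated format.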

Parameters

Returns

visualize_isochrone(graph_name=None, source_node=None, max_solution_radius=-1.0, weights_on_edges=[], restrictions=[], num_levels=1, generate_image=True, levels_table=, style_options=None, solve_options=, contour_options=, options=)

Generates an image containing isolines for travel results using an existing graph. Isolines represent curves of equal cost, with cost typically referring to the time or distance assigned as the weights of the underlying graph. See Graphs and Solvers for more information on graphs.

Parameters

Returns