Class GPUdbIngestor

class gpudb_multihead_io.GPUdbIngestor(gpudb, table_name, record_type, batch_size, options=None, workers=None, is_table_replicated=False, json_ingestion=False)

Initializes the GPUdbIngestor instance.

Parameters

gpudb (gpudb.GPUdb) –

The client handle through which the ingestion process is to be conducted.

table_name (str) –

The name of the table into which records will be ingested. Must be an existing table.

record_type (gpudb.GPUdbRecordType) –

The type for the records which will be ingested; must match the type of the given table.

batch_size (int) –

The size of the queues. There is one queue per worker rank of the database server; when any queue reaches this size, its queued records are flushed automatically. Until then, those records are held client-side and are not ingested unless flush() is called.

options (dict of str to str) –

Any insertion options to be passed on to the GPUdb server. Optional parameter.

workers (GPUdbWorkerList) –

Optional parameter. A list of GPUdb worker rank addresses.

is_table_replicated (bool) –

Optional boolean flag indicating whether the table is replicated. If True, multi-head ingestion is not used and records are ingested through the head node instead, because GPUdb does not support multi-head ingestion on replicated tables.

json_ingestion (bool) –

Indicates whether the GPUdbIngestor instance is used to insert JSON records. Defaults to False; it must be set to True to insert JSON records.

Example

gpudb_ingestor = GPUdbIngestor(gpudb, table_name, record_type, ingestor_batch_size, ingestor_options, workers, json_ingestion=True)
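
The batch_size behavior described above can be pictured with a small in-memory model. This is only a sketch: ToyBatchingQueues is a hypothetical stand-in and not part of the gpudb package, and the caller picks the rank directly here, whereas the real ingestor decides which worker rank a record belongs to.

```python
class ToyBatchingQueues:
    """Hypothetical model of per-rank queues; NOT the gpudb implementation."""

    def __init__(self, num_ranks, batch_size):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        self.queues = [[] for _ in range(num_ranks)]
        # Each entry is (rank, batch), standing in for one call to the server.
        self.flushed_batches = []

    def insert_record(self, rank, record):
        # Queue the record; flush only this rank's queue once it is full.
        queue = self.queues[rank]
        queue.append(record)
        if len(queue) >= self.batch_size:
            self._flush_rank(rank)

    def _flush_rank(self, rank):
        if self.queues[rank]:
            self.flushed_batches.append((rank, self.queues[rank]))
            self.queues[rank] = []

    def flush(self):
        # Send whatever is still held client-side, for every rank.
        for rank in range(len(self.queues)):
            self._flush_rank(rank)


toy = ToyBatchingQueues(num_ranks=2, batch_size=3)
for i in range(5):
    toy.insert_record(rank=i % 2, record={"id": i})

# Rank 0 received ids 0, 2, 4 and flushed itself; rank 1 still holds ids 1, 3.
pending_before_flush = [len(q) for q in toy.queues]
toy.flush()
pending_after_flush = [len(q) for q in toy.queues]
```

In this model, two records stay client-side until flush() is called, which is why a final flush matters at the end of an ingestion run.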

get_gpudb()

Return the instance of GPUdb client used by this ingestor.

property retry_count

Return the number of times ingestion will be attempted upon failure.

get_table_name()

Return the GPUdb table associated with this ingestor.

get_batch_size()

Return the batch_size used for this ingestor.

get_options()

Return the options used for this ingestor.

get_count_inserted()

Return the number of records inserted thus far.

get_count_updated()

Return the number of records updated thus far.

set_logger_level(log_level)

Set the log level for the GPUdb multi-head I/O module.

Parameters

log_level (int, long, or str) –

A valid log level for the logging module.
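
For reference, the standard logging module accepts a level as either an integer or a level name. The sketch below uses only the standard library and a hypothetical logger name; it assumes that set_logger_level accepts the same kinds of values, which the text above suggests but does not spell out.

```python
import logging

# Hypothetical logger used only for this demonstration.
logger = logging.getLogger("log_level_demo")

resolved = []
for level in (logging.DEBUG, "INFO", 30):
    # Logger.setLevel accepts an int constant, a level name, or a raw int.
    logger.setLevel(level)
    resolved.append(logging.getLevelName(logger.level))

print(resolved)  # ['DEBUG', 'INFO', 'WARNING']
```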

insert_record(record, record_encoding='binary', is_data_encoded=True)

Queues a record for insertion into GPUdb. If the queue reaches the batch size, all records in the queue are inserted into GPUdb before the method returns. If an error occurs while inserting the records, they will be neither in the queue nor in GPUdb; catch InsertionException to get the list of records that were being inserted if needed (for example, to retry).

Parameters

record (list, dict, collections.OrderedDict, gpudb.GPUdbRecord, Record, or JSON) –

The record to insert.

record_encoding (str) –

The encoding to use for the insertion. Allowed values are:

  • binary

  • json

The default value is binary.

is_data_encoded (bool) –

Indicates whether the data has already been encoded, so that it is not encoded a second time. Use ONLY if the data has already been encoded. Default is True.

Raises

InsertionException

If an error occurs while inserting.
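
The retry pattern mentioned above might look like the following sketch. Everything here is a stand-in so that the example runs without a database: StubInsertionException and FlakyIngestor are hypothetical, and the get_records() accessor on the exception is an assumption to check against the real InsertionException before relying on it.

```python
class StubInsertionException(Exception):
    """Hypothetical stand-in for InsertionException; get_records() is assumed."""

    def __init__(self, message, records):
        super().__init__(message)
        self.records = records

    def get_records(self):
        return self.records


class FlakyIngestor:
    """Hypothetical single-queue ingestor whose first flush fails."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.queue = []
        self.stored = []
        self.failures_left = 1

    def insert_record(self, record):
        self.queue.append(record)
        if len(self.queue) >= self.batch_size:
            # The batch leaves the queue whether or not the insert succeeds.
            batch, self.queue = self.queue, []
            if self.failures_left > 0:
                self.failures_left -= 1
                raise StubInsertionException("simulated failure", batch)
            self.stored.extend(batch)


def insert_with_retry(ingestor, record, max_attempts=3):
    """Queue a record; on failure, re-queue the records the exception reports."""
    pending = [record]
    for _ in range(max_attempts):
        retry = []
        for index, rec in enumerate(pending):
            try:
                ingestor.insert_record(rec)
            except StubInsertionException as exc:
                # Failed batch plus anything not yet handed to the ingestor.
                retry = list(exc.get_records()) + pending[index + 1:]
                break
        if not retry:
            return True
        pending = retry
    return False


ingestor = FlakyIngestor(batch_size=2)
first_ok = insert_with_retry(ingestor, {"id": 1})   # only queued, no flush yet
second_ok = insert_with_retry(ingestor, {"id": 2})  # flush fails once, then succeeds
```

Because a failed batch is in neither the queue nor the database, the helper re-queues every record reported by the exception, not just the record that triggered the flush.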

insert_records(records, record_encoding='binary', is_data_encoded=True)

Queues a list of records for insertion into GPUdb. If any queue reaches the batch size, all records in that queue are inserted into GPUdb before the method returns. If an error occurs while inserting the queued records, they will be neither in that queue nor in GPUdb; catch InsertionException to get the list of records that were being inserted (including any from the queue in question and any remaining in the list that were not yet queued) if needed (for example, to retry). Note that, depending on the number of records, multiple calls to GPUdb may occur.

Parameters

records (list, dict, collections.OrderedDict, gpudb.GPUdbRecord, Record, or JSON) –

The record(s) to insert.

record_encoding (str) –

The encoding to use for the insertion. Allowed values are:

  • binary

  • json

The default value is binary.

is_data_encoded (bool) –

Indicates whether the data has already been encoded, so that it is not encoded a second time. Use ONLY if the data has already been encoded. Default is True.

Raises

InsertionException

If an error occurs while inserting.
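
One way to act on the failure semantics above is to submit records in chunks and collect whatever a failed call reports, instead of stopping at the first error. The sketch below is illustrative only: StubInsertionException, StubIngestor, the "bad" marker, and the get_records() accessor are all hypothetical stand-ins, not the gpudb API.

```python
class StubInsertionException(Exception):
    """Hypothetical stand-in for InsertionException; get_records() is assumed."""

    def __init__(self, message, records):
        super().__init__(message)
        self.records = records

    def get_records(self):
        return self.records


class StubIngestor:
    """Hypothetical single-queue ingestor that rejects batches with a 'bad' record."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.queue = []
        self.stored = []

    def insert_records(self, records):
        for index, rec in enumerate(records):
            self.queue.append(rec)
            if len(self.queue) >= self.batch_size:
                batch, self.queue = self.queue, []
                if any(r.get("bad") for r in batch):
                    # Report the failed batch plus the records not yet queued.
                    remaining = list(records[index + 1:])
                    raise StubInsertionException("simulated failure", batch + remaining)
                self.stored.extend(batch)


def insert_in_chunks(ingestor, records, chunk_size):
    """Submit records chunk by chunk; return the records reported as failed."""
    dead_letters = []
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        try:
            ingestor.insert_records(chunk)
        except StubInsertionException as exc:
            dead_letters.extend(exc.get_records())
    return dead_letters


records = [{"id": i} for i in range(6)]
records[3]["bad"] = True  # hypothetical marker that makes its batch fail

ingestor = StubIngestor(batch_size=2)
dead_letters = insert_in_chunks(ingestor, records, chunk_size=4)
failed_ids = [r["id"] for r in dead_letters]
stored_ids = [r["id"] for r in ingestor.stored]
```

In this run the valid record with id 2 ends up in the failed list because it shared a batch with the rejected record, so a caller would typically inspect or resubmit the collected records rather than discard them.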

flush(forced_flush=True, is_data_encoded=True)

Ensures that any queued records are inserted into GPUdb. If an error occurs while inserting the records from any queue, those records will be neither in that queue nor in GPUdb; catch InsertionException to get the list of records that were being inserted if needed (for example, to retry). If this occurs, other queues may still contain unflushed records.

Parameters

forced_flush (bool) –

Boolean flag indicating whether a user invoked this method or an internal method called it.

is_data_encoded (bool) –

Indicates whether the data has already been encoded, so that it is not encoded a second time. Use ONLY if the data has already been encoded. Default is True.

Raises

InsertionException

If an error occurs while inserting records.
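
Since records below the batch size stay client-side until flush() is called, it can help to tie the final flush to the end of an ingestion block. The following is a minimal sketch using the standard contextlib module; RecordingIngestor and flush_on_exit are hypothetical names, and the stub only mimics the queue-then-flush behavior described above.

```python
from contextlib import contextmanager


class RecordingIngestor:
    """Hypothetical stand-in that only queues records until flush() is called."""

    def __init__(self):
        self.queue = []
        self.stored = []

    def insert_record(self, record):
        self.queue.append(record)

    def flush(self):
        self.stored.extend(self.queue)
        self.queue = []


@contextmanager
def flush_on_exit(ingestor):
    """Flush the ingestor when the with-block finishes without an error."""
    yield ingestor
    # Reached only on a clean exit; an exception in the block skips the flush.
    ingestor.flush()


demo = RecordingIngestor()
with flush_on_exit(demo) as ingestor:
    for i in range(3):
        ingestor.insert_record({"id": i})
    queued_inside = len(ingestor.queue)

queued_after = len(demo.queue)
stored_after = len(demo.stored)
```

Flushing only on a clean exit is a design choice made for this sketch; wrapping the yield in try/finally would flush even after an error in the block, which may or may not be what a given pipeline wants.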