Class GPUdbIngestor

class gpudb_multihead_io.GPUdbIngestor(gpudb, table_name, record_type, batch_size, options=None, workers=None, is_table_replicated=False)

Initializes the GPUdbIngestor instance.

Parameters

gpudb (GPUdb) –
The client handle through which the ingestion process is to be conducted.
table_name (str) –
The name of the table into which records will be ingested. Must be an existing table.
record_type (GPUdbRecordType) –
The type for the records which will be ingested; must match the type of the given table.
batch_size (int) –
The size of the queues; when any queue (one per worker rank of the database server) reaches the given size, the queued records are flushed automatically. Until then, those records are held client-side and are not ingested unless flush() is called.
options (dict of str to str) –
Any insertion options to be passed onto the GPUdb server. Optional parameter.
workers (GPUdbWorkerList) –
Optional parameter. A list of GPUdb worker rank addresses.
is_table_replicated (bool) –
Optional boolean flag indicating whether the table is replicated; if True, multi-head ingestion is not used and the head node handles ingestion instead, because GPUdb does not support multi-head ingestion on replicated tables.
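The batch_size rule described above can be pictured with a small toy model. This is a sketch for illustration only, not the GPUdbIngestor implementation: the class ToyWorkerQueues and the explicit worker_index argument are hypothetical, and how the real ingestor assigns a record to a worker queue is not covered here.

```python
# Toy model only: NOT the GPUdbIngestor implementation. It mimics the
# documented rule that each worker queue is flushed once it holds
# batch_size records, and that flush() sends whatever is still queued.
class ToyWorkerQueues:
    def __init__(self, num_workers, batch_size):
        self.batch_size = batch_size
        self.queues = [[] for _ in range(num_workers)]
        self.flushed = []  # stands in for records sent to the server

    def insert(self, worker_index, record):
        # worker_index is a hypothetical stand-in for the real routing.
        queue = self.queues[worker_index]
        queue.append(record)
        # Automatic flush: only the queue that reached batch_size is sent.
        if len(queue) >= self.batch_size:
            self._flush_queue(worker_index)

    def flush(self):
        # Explicit flush: send everything still held client-side.
        for index in range(len(self.queues)):
            self._flush_queue(index)

    def _flush_queue(self, worker_index):
        self.flushed.extend(self.queues[worker_index])
        self.queues[worker_index] = []


toy = ToyWorkerQueues(num_workers=2, batch_size=3)
for i in range(4):
    toy.insert(worker_index=0, record={"id": i})
toy.insert(worker_index=1, record={"id": 99})

# Records still held client-side before the explicit flush.
held_before_flush = sum(len(q) for q in toy.queues)
toy.flush()
```

In this model, records that do not fill a queue stay client-side until flush() is called, which is why a final flush matters at the end of an ingestion run.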
get_gpudb()

Return the instance of GPUdb client used by this ingestor.

get_table_name()

Return the GPUdb table associated with this ingestor.

get_batch_size()

Return the batch_size used for this ingestor.

get_options()

Return the options used for this ingestor.

get_count_inserted()

Return the number of records inserted thus far.

get_count_updated()

Return the number of records updated thus far.

set_logger_level(log_level)

Set the log level for the GPUdb multi-head i/o module.

Parameters

log_level (int, long, or str) –
A valid log level for the logging module.
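As a reminder of what counts as a valid level, Python's standard logging module accepts either an integer constant or a level name string. The sketch below uses only the standard library; the logger name is hypothetical, and whether set_logger_level passes its argument through to the logging module unchanged is an assumption.

```python
import logging

# Hypothetical logger name, used only for this demonstration.
demo_logger = logging.getLogger("multihead_io_demo")

# An integer constant from the logging module.
demo_logger.setLevel(logging.DEBUG)
level_from_int = demo_logger.level

# The same kind of setting expressed as a level name string.
demo_logger.setLevel("WARNING")
level_from_str = demo_logger.level
```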
insert_record(record, record_encoding='binary', is_data_encoded=True)

Queues a record for insertion into GPUdb. If the queue reaches the batch size (see get_batch_size()), all records in the queue will be inserted into GPUdb before the method returns. If an error occurs while inserting the records, the records will no longer be in the queue nor in GPUdb; catch InsertionException to get the list of records that were being inserted if needed (for example, to retry).

Parameters

record (dict, GPUdbRecord, collections.OrderedDict, Record) –
The record to insert.
record_encoding (str) –

The encoding to use for the insertion. Allowed values are:

  • ‘binary’
  • ‘json’

The default value is ‘binary’.

is_data_encoded (bool) –
Indicates whether the data has already been encoded (to avoid encoding it twice). Use ONLY if the data has already been encoded. The default in the method signature is True.

Raises InsertionException if an error occurs while inserting.
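The retry pattern mentioned above can be sketched as follows. Everything here is illustrative: StandInInsertionException is a local stand-in for the module's InsertionException, its records attribute is an assumed way of reaching the failed records, FlakyIngestor is a fake used only to exercise the helper, and the helper ignores the record_encoding and is_data_encoded arguments.

```python
class StandInInsertionException(Exception):
    """Local stand-in for InsertionException (assumed shape, not the real class)."""

    def __init__(self, message, records):
        super().__init__(message)
        self.records = records  # assumed: the records that failed to insert


def insert_with_retry(ingestor, record, max_attempts=3,
                      insertion_error=StandInInsertionException):
    """Queue one record; if an automatic flush fails, re-queue the lost records."""
    pending = [record]
    for attempt in range(1, max_attempts + 1):
        try:
            while pending:
                ingestor.insert_record(pending[0])
                pending.pop(0)
            return attempt
        except insertion_error as error:
            # The failed batch is in neither the queue nor the database, so it
            # is submitted again. The record that triggered the flush is assumed
            # to be part of error.records, so it is dropped from pending.
            pending = list(error.records) + pending[1:]
    raise RuntimeError("insertion still failing after %d attempts" % max_attempts)


class FlakyIngestor:
    """Fake ingestor for demonstration: fails its first `failures` flushes."""

    def __init__(self, batch_size, failures):
        self.batch_size = batch_size
        self.failures = failures
        self.queue = []
        self.inserted = []

    def insert_record(self, record):
        self.queue.append(record)
        if len(self.queue) >= self.batch_size:
            batch, self.queue = self.queue, []
            if self.failures > 0:
                self.failures -= 1
                raise StandInInsertionException("simulated failure", batch)
            self.inserted.extend(batch)


flaky = FlakyIngestor(batch_size=2, failures=1)
flaky.insert_record("a")                     # queued, no flush yet
attempts_used = insert_with_retry(flaky, "b")  # first flush fails, retry succeeds
```

With the real ingestor, the exception class to catch would be the module's InsertionException, passed in through the insertion_error parameter.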

insert_records(records, record_encoding='binary', is_data_encoded=True)

Queues a list of records for insertion into GPUdb. If any queue reaches the batch size (see get_batch_size()), all records in that queue will be inserted into GPUdb before the method returns. If an error occurs while inserting the queued records, the records will no longer be in that queue nor in GPUdb; catch InsertionException to get the list of records that were being inserted (including any from the queue in question and any remaining in the list not yet queued) if needed (for example, to retry). Note that depending on the number of records, multiple calls to GPUdb may occur.

Parameters

records (GPUdbRecord, collections.OrderedDict, Record) –
The records to insert.
record_encoding (str) –

The encoding to use for the insertion. Allowed values are:

  • ‘binary’
  • ‘json’

The default value is ‘binary’.

is_data_encoded (bool) –
Indicates whether the data has already been encoded (to avoid encoding it twice). Use ONLY if the data has already been encoded. The default in the method signature is True.

Raises InsertionException if an error occurs while inserting.

flush(forced_flush=True, is_data_encoded=True)

Ensures that any queued records are inserted into GPUdb. If an error occurs while inserting the records from any queue, the records will no longer be in that queue nor in GPUdb; catch InsertionException to get the list of records that were being inserted if needed (for example, to retry). Other queues may also still contain unflushed records if this occurs.

Parameters

forced_flush (bool) –
Boolean flag indicating whether a user invoked this method or an internal method called it.
is_data_encoded (bool) –
Indicates whether the data has already been encoded (to avoid encoding it twice). Use ONLY if the data has already been encoded. The default in the method signature is True.

Raises InsertionException if an error occurs while inserting records.
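A typical calling pattern is to queue a list of records and then flush once at the end, so that a partially filled queue is not left client-side. The sketch below is a minimal illustration: ingest_and_flush is a hypothetical helper, and RecordingIngestor is a fake with a single queue that only imitates the method names documented here (insert_records, flush, get_count_inserted).

```python
def ingest_and_flush(ingestor, records):
    """Queue all records, flush the remainder, and return the inserted count."""
    ingestor.insert_records(records)
    # Without this call, records in queues that have not yet reached
    # batch_size would remain held client-side.
    ingestor.flush()
    return ingestor.get_count_inserted()


class RecordingIngestor:
    """Fake single-queue ingestor used only to exercise the helper."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.queue = []
        self.count_inserted = 0

    def insert_records(self, records):
        for record in records:
            self.queue.append(record)
            if len(self.queue) >= self.batch_size:
                self.flush()

    def flush(self):
        self.count_inserted += len(self.queue)
        self.queue = []

    def get_count_inserted(self):
        return self.count_inserted


fake = RecordingIngestor(batch_size=4)
total_inserted = ingest_and_flush(fake, [{"id": i} for i in range(10)])
```

Here ten records with a batch size of four produce two automatic flushes, and the final flush() call sends the remaining two.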