Class GPUdbIngestor
- class gpudb_multihead_io.GPUdbIngestor(gpudb, table_name, record_type, batch_size, options=None, workers=None, is_table_replicated=False, json_ingestion=False)
Initializes the GPUdbIngestor instance.
Parameters
- gpudb (gpudb.GPUdb) – The client handle through which the ingestion process is to be conducted.
- table_name (str) –
The name of the table into which records will be ingested. Must be an existing table.
- record_type (gpudb.GPUdbRecordType) – The type for the records which will be ingested; must match the type of the given table.
- batch_size (int) –
The size of the queues; when any queue (one per worker rank of the database server) reaches the given size, the queued records are automatically flushed. Until then, those records are held client-side and not actually ingested, unless flush() is called.
- options (dict of str to str) –
Any insertion options to be passed on to the GPUdb server. Optional parameter.
- workers (GPUdbWorkerList) – Optional parameter. A list of GPUdb worker rank addresses.
- is_table_replicated (bool) –
Optional boolean flag indicating whether the table is replicated; if True, multi-head ingestion is not used and the head node is used for ingestion instead, because GPUdb does not support multi-head ingestion on replicated tables.
- json_ingestion (bool) –
Indicates whether the GPUdbIngestor instance is used to insert JSON records. The default is False; it must be set to True to use GPUdbIngestor for inserting JSON records.
Example
gpudb_ingestor = GPUdbIngestor(gpudb, table_name, record_type, ingestor_batch_size, ingestor_options, workers, json_ingestion=True)
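The queueing behaviour described for batch_size can be illustrated with a small toy model. This is only a sketch and not the library's implementation: ToyBatchingIngestor, its num_queues argument, and the modulo routing rule are all hypothetical, chosen to show that only the queue that reaches the batch size is flushed while other records stay client-side until flush() is called.

```python
from collections import defaultdict


class ToyBatchingIngestor:
    """Illustrative stand-in, NOT the real GPUdbIngestor."""

    def __init__(self, batch_size, num_queues=2):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        self.num_queues = num_queues      # hypothetical: one queue per worker rank
        self.queues = defaultdict(list)   # queue index -> pending records
        self.sent = []                    # batches "sent" to the server

    def _route(self, record):
        # Hypothetical routing rule: choose a queue from the record's id.
        return record["id"] % self.num_queues

    def insert_record(self, record):
        queue_id = self._route(record)
        queue = self.queues[queue_id]
        queue.append(record)
        # Only the queue that reached batch_size is flushed.
        if len(queue) >= self.batch_size:
            self._flush_queue(queue_id)

    def _flush_queue(self, queue_id):
        batch = self.queues.pop(queue_id, [])
        if batch:
            self.sent.append(batch)

    def flush(self):
        # Send whatever is still pending in every queue.
        for queue_id in sorted(self.queues):
            self._flush_queue(queue_id)


ingestor = ToyBatchingIngestor(batch_size=2)
for i in range(5):
    ingestor.insert_record({"id": i})
```

In this toy run, the record with id 4 remains queued after the loop; it is only "sent" once flush() is called.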
- property retry_count
Return the number of times ingestion will be attempted upon failure.
- set_logger_level(log_level)
Set the log level for the GPUdb multi-head I/O module.
Parameters
- log_level (int, long, or str) –
A valid log level for the logging module
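As a reminder of what counts as a valid level for Python's standard logging module, the sketch below sets a level on a plain logger using both an integer constant and a level name. The logger name is hypothetical and used only for illustration; it is not the name the GPUdb module uses internally.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("example.multihead_io")

logger.setLevel(logging.DEBUG)   # integer constant
int_level = logger.level

logger.setLevel("WARNING")       # level name given as a string
str_level = logger.level
```

Either form would presumably be acceptable as the log_level argument, since both are levels the logging module understands.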
- insert_record(record, record_encoding='binary', is_data_encoded=True)
Queues a record for insertion into GPUdb. If the queue reaches the batch size, all records in the queue will be inserted into GPUdb before the method returns. If an error occurs while inserting the records, the records will be neither in the queue nor in GPUdb; catch InsertionException to get the list of records that were being inserted if needed (for example, to retry).
Parameters
- record (list, dict, collections.OrderedDict, gpudb.GPUdbRecord, Record, or JSON) – The record to insert.
- record_encoding (str) –
The encoding to use for the insertion. Allowed values are binary and json; the default value is binary.
- is_data_encoded (bool) –
Indicates if the data has already been encoded (so that we don’t do double encoding). Use ONLY if the data has already been encoded. Default is True.
Raises
InsertionException – If an error occurs while inserting.
- insert_records(records, record_encoding='binary', is_data_encoded=True)
Queues a list of records for insertion into GPUdb. If any queue reaches the batch size, all records in that queue will be inserted into GPUdb before the method returns. If an error occurs while inserting the queued records, the records will be neither in that queue nor in GPUdb; catch InsertionException to get the list of records that were being inserted (including any from the queue in question and any remaining in the list not yet queued) if needed (for example, to retry). Note that depending on the number of records, multiple calls to GPUdb may occur.
Parameters
- records (list, dict, collections.OrderedDict, gpudb.GPUdbRecord, Record, or JSON) – The record(s) to insert.
- record_encoding (str) –
The encoding to use for the insertion. Allowed values are binary and json; the default value is binary.
- is_data_encoded (bool) –
Indicates if the data has already been encoded (so that we don’t do double encoding). Use ONLY if the data has already been encoded. Default is True.
Raises
InsertionException – If an error occurs while inserting.
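The catch-and-retry pattern mentioned above might look roughly like the following. This is a sketch under stated assumptions: the InsertionException class here is a local stand-in, its records attribute is an assumed way of exposing the failed records (this section does not show how the real exception exposes them), and FlakyIngestor and insert_with_retry are hypothetical helpers that simulate one failure.

```python
class InsertionException(Exception):
    """Local stand-in for the library's exception (assumed shape)."""

    def __init__(self, message, records):
        super().__init__(message)
        self.records = records  # assumption: failed records are carried here


class FlakyIngestor:
    """Hypothetical ingestor whose first insert attempt fails."""

    def __init__(self):
        self.calls = 0
        self.stored = []

    def insert_records(self, records):
        self.calls += 1
        if self.calls == 1:
            raise InsertionException("simulated failure", list(records))
        self.stored.extend(records)


def insert_with_retry(ingestor, records, max_attempts=3):
    """Retry with the records reported by the exception; return attempts used."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    pending = list(records)
    for attempt in range(1, max_attempts + 1):
        try:
            ingestor.insert_records(pending)
            return attempt
        except InsertionException as exc:
            if attempt == max_attempts:
                raise  # give up and surface the last failure
            # The failed records are in neither the queue nor the database,
            # so they have to be resubmitted explicitly.
            pending = list(exc.records)


flaky = FlakyIngestor()
attempts_used = insert_with_retry(flaky, [{"id": 1}, {"id": 2}])
```

Resubmitting from the exception rather than from the original list matters because, per the description above, the failed records are dropped from the client-side queue.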
- flush(forced_flush=True, is_data_encoded=True)
Ensures that any queued records are inserted into GPUdb. If an error occurs while inserting the records from any queue, the records will be neither in that queue nor in GPUdb; catch InsertionException to get the list of records that were being inserted if needed (for example, to retry). Other queues may also still contain unflushed records if this occurs.
Parameters
- forced_flush (bool) –
Boolean flag indicating whether a user invoked this method or an internal method called it.
- is_data_encoded (bool) –
Indicates if the data has already been encoded (so that we don’t do double encoding). Use ONLY if the data has already been encoded. Default is True.
Raises
InsertionException – If an error occurs while inserting records.
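Because records below the batch size stay client-side until flush() is called, one possible way to avoid forgetting the final flush is a small context manager. The sketch below is an illustration only: flushed_on_exit and FakeIngestor are hypothetical names, and the fake object merely tracks queued and flushed records rather than talking to a server.

```python
from contextlib import contextmanager


class FakeIngestor:
    """Hypothetical stand-in that only tracks queued and flushed records."""

    def __init__(self):
        self.queued = []
        self.flushed = []

    def insert_record(self, record):
        self.queued.append(record)

    def flush(self):
        self.flushed.extend(self.queued)
        self.queued = []


@contextmanager
def flushed_on_exit(ingestor):
    yield ingestor
    # Reached only if the with-block finished without raising.
    ingestor.flush()


fake = FakeIngestor()
with flushed_on_exit(fake) as ing:
    ing.insert_record({"id": 1})
    ing.insert_record({"id": 2})
    queued_inside = len(ing.queued)
```

Flushing only on a clean exit is a design choice in this sketch; if the block raises, the caller decides what to do with any records still queued.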