Package com.gpudb

Class BulkInserter<T>

  • Type Parameters:
    T - the type of object being inserted
    All Implemented Interfaces:
    AutoCloseable

    public class BulkInserter<T>
    extends Object
    implements AutoCloseable
    Object that manages the insertion into GPUdb of large numbers of records in bulk, with automatic batch management and support for multi-head ingest. BulkInserter instances are thread safe and may be used from any number of threads simultaneously. Use the insert(Object) and insert(List) methods to queue records for insertion, and the flush() method to ensure that all queued records have been inserted.
    • Method Detail

      • setTimedFlushOptions

        public void setTimedFlushOptions​(BulkInserter.FlushOptions flushOptions)
                                  throws GPUdbException
        This method could potentially result in two different scenarios
        1. It could start a timed flush thread if it was not already active when the BulkInserter was created.
        2. If the timed flush thread was already active then setting this to a new value will first terminate the existing thread; set the options to the new value and finally restart a new thread with the options passed in. This case could result in a delay since the thread needs to be cleaned up and restarted.
        Parameters:
        flushOptions - BulkInserter.FlushOptions to use for timed flush operation
        Throws:
        GPUdbException - in case an invalid timeout values is set in the flushOptions parameter.
      • close

        public void close()
                   throws BulkInserter.InsertException
        Closes BulkInserter resources:
        1. Performs a final flush of any data yet to be ingested
        2. Terminates the timed flush mechanism, if needed
        3. Terminates the ingest executor service

        This method will be called automatically if the BulkInserter class is used in a try-with-resources block. If not used that way it is mandatory to call this method to initiate a smooth cleanup of the underlying resources.
             try( BulkInserter inserter = new BulkInserter(...) ) {
                 // Do something with the BulkInserter instance
                 // inserter.{some_method}
             }
             // Here the close method of the BulkInserter class will be called
             // automatically
         
        or
             BulkInserter inserter = new BulkInserter<>(...)
             // Invoke some methods on the inserter
             //Explicitly call close() method
             inserter.close();
         
        Specified by:
        close in interface AutoCloseable
        Throws:
        BulkInserter.InsertException - - While doing the final flush
      • getGPUdb

        public GPUdb getGPUdb()
        Gets the GPUdb instance into which records will be inserted.
        Returns:
        the GPUdb instance into which records will be inserted
      • getTableName

        public String getTableName()
        Gets the name of the table into which records will be inserted.
        Returns:
        the name of the table into which records will be inserted
      • getBatchSize

        public int getBatchSize()
        Gets the batch size (the number of records to insert into GPUdb at a time). For multi-head ingest this value is per worker.
        Returns:
        the batch size
      • getOptions

        public Map<String,​String> getOptions()
        Gets the optional parameters that will be passed to GPUdb while inserting.
        Returns:
        the optional parameters that will be passed to GPUdb while inserting
        See Also:
        InsertRecordsRequest.Options
      • isMultiHeadEnabled

        public boolean isMultiHeadEnabled()
      • getMaxRetries

        public int getMaxRetries()
        Gets the number of times inserts into GPUdb will be retried in the event of an error. After this many retries, BulkInserter.InsertException will be thrown.
        Returns:
        the number of retries
        See Also:
        setMaxRetries(int)
      • setMaxRetries

        public void setMaxRetries​(int value)
        Sets the number of times inserts into GPUdb will be retried in the event of an error. After this many retries, BulkInserter.InsertException will be thrown.
        Parameters:
        value - the number of retries
        Throws:
        IllegalArgumentException - if value is less than zero
        See Also:
        getMaxRetries()
      • getRetryCount

        @Deprecated(since="7.1.10",
                    forRemoval=true)
        public int getRetryCount()
        Deprecated, for removal: This API element is subject to removal in a future version.
        Gets the number of times inserts into GPUdb will be retried in the event of an error. After this many retries, BulkInserter.InsertException will be thrown.
        Returns:
        the number of retries
        See Also:
        setRetryCount(int)
      • setRetryCount

        @Deprecated(since="7.1.10",
                    forRemoval=true)
        public void setRetryCount​(int value)
        Deprecated, for removal: This API element is subject to removal in a future version.
        Sets the number of times inserts into GPUdb will be retried in the event of an error. After this many retries, BulkInserter.InsertException will be thrown.
        Parameters:
        value - the number of retries
        Throws:
        IllegalArgumentException - if value is less than zero
        See Also:
        getRetryCount()
      • getCountInserted

        public long getCountInserted()
        Gets the number of records inserted into GPUdb. Excludes records that are currently queued but not yet inserted and records not inserted due to primary key conflicts.
        Returns:
        the number of records inserted
      • getCountUpdated

        public long getCountUpdated()
        Gets the number of records updated (instead of inserted) in GPUdb due to primary key conflicts.
        Returns:
        the number of records updated
      • getErrors

        public List<BulkInserter.InsertException> getErrors()
        Gets the list of errors received since the last call to getErrors().
        Returns:
        list of InsertException objects
      • getWarnings

        public List<BulkInserter.InsertException> getWarnings()
        Gets the list of warnings received since the last call to getWarnings().
        Returns:
        list of InsertException objects
      • flush

        public void flush()
                   throws BulkInserter.InsertException
        Ensures that any queued records are inserted into GPUdb. If an error occurs while inserting the records from any queue, the records will no longer be in that queue nor in GPUdb; catch BulkInserter.InsertException to get the list of records that were being inserted if needed (for example, to retry). Other queues may also still contain unflushed records if this occurs.
        Throws:
        BulkInserter.InsertException - if an error occurs while inserting
      • insert

        public void insert​(T record)
                    throws BulkInserter.InsertException
        Queues a record for insertion into GPUdb. If the queue reaches the batch size, all records in the queue will be inserted into GPUdb before the method returns. If an error occurs while inserting the records, the records will no longer be in the queue nor in GPUdb; catch BulkInserter.InsertException to get the list of records that were being inserted if needed (for example, to retry). Note: This version of insert(Object) will result in sequential batch inserts in a background thread. Use insert(List) to allow multiple queues to reach their batch size and parallelize all of those batch inserts at once.
        Parameters:
        record - the record to insert
        Throws:
        GPUdbException - if an error occurs while calculating shard/primary keys
        BulkInserter.InsertException - if an error occurs while inserting
      • insert

        public void insert​(List<T> records)
                    throws BulkInserter.InsertException
        Queues a list of records for insertion into GPUdb. If any queue reaches the batch size, all queues that have reached the batch size will have their records inserted into the database before the method returns. If an error occurs while inserting the records, they will no longer be in that queue or the database; catch BulkInserter.InsertException to get the list of records that were being inserted (including any from the queue in question and any remaining in the list not yet queued) if needed (for example, to retry). Note that depending on the number of records, multiple calls to the database may occur. Note: This version of insert(List) will result in parallelizing the batch inserts in background threads.
        Parameters:
        records - the records to insert
        Throws:
        BulkInserter.InsertException - if an error occurs while inserting