Class DataFrameUtils

class gpudb_dataframe.DataFrameUtils

classmethod sql_to_df(db: GPUdb, sql: str, sql_params: list = [], batch_size: int = 5000, sql_opts: dict = {}, show_progress: bool = False) → DataFrame | None

Create a pd.DataFrame from the results of a SQL query.

Parameters

db (GPUdb) –
a GPUdb instance

sql (str) –
the SQL query

sql_params (list) –
the query parameters. Defaults to an empty list.

batch_size (int) –
the number of records to retrieve per batch of SQL execution results. Defaults to BATCH_SIZE (5000).

sql_opts (dict) –
the SQL options as a dict. Defaults to an empty dict.

show_progress (bool) –
whether to display progress or not. Defaults to False.

Raises

GPUdbException –
If the SQL query fails

Returns

pd.DataFrame | None –
a Pandas pd.DataFrame, or None if the SQL query returned no results
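The interplay of batch_size and the None return value can be pictured with a minimal sketch in plain Python. It assumes results are pulled page by page using an offset and a limit; fetch_all_rows and fetch_page are hypothetical names, not part of gpudb_dataframe, and rows are plain dicts rather than a pd.DataFrame.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


def fetch_all_rows(
    fetch_page: Callable[[int, int], List[Row]],
    batch_size: int = 5000,
) -> Optional[List[Row]]:
    """Collect every row of a result set, batch_size rows per request.

    fetch_page(offset, limit) is a hypothetical stand-in for one round trip
    to the database; it returns at most `limit` rows. Returns None when the
    result set is empty, mirroring the documented sql_to_df behavior.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    rows: List[Row] = []
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        rows.extend(page)
        # A short (or empty) page means the result set is exhausted.
        if len(page) < batch_size:
            break
        offset += batch_size
    return rows or None


# Usage with an in-memory stand-in for 12 rows of query results.
data = [{"id": i} for i in range(12)]
calls = []


def fake_fetch(offset: int, limit: int) -> List[Row]:
    calls.append((offset, limit))
    return data[offset:offset + limit]


result = fetch_all_rows(fake_fetch, batch_size=5)
```

Since sql_to_df may return None, check the result before treating it as a DataFrame.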

classmethod table_to_df(db: GPUdb, table_name: str, batch_size: int = 5000, show_progress: bool = False) → DataFrame

Load the contents of a Kinetica table into a new pd.DataFrame.

Parameters

db (GPUdb) –
a GPUdb instance

table_name (str) –
name of the Kinetica table

batch_size (int) –
the number of records to retrieve per batch. Defaults to BATCH_SIZE (5000).

show_progress (bool) –
whether to display progress or not. Defaults to False.

Returns

pd.DataFrame –
a Pandas pd.DataFrame created from the Kinetica table
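When only part of a table is needed, a full read is conceptually close to calling sql_to_df with a SELECT * query over that table plus a WHERE clause. Building such a query safely requires quoting the table name. The sketch below assumes Kinetica accepts standard SQL double-quoted identifiers and that a schema-qualified name has the form schema.table; select_all_sql and quote_identifier are hypothetical helpers, not how table_to_df is implemented.

```python
def quote_identifier(part: str) -> str:
    """Quote one identifier the standard SQL way (assumed to apply here)."""
    # Wrap in double quotes and double any embedded double quote.
    return '"' + part.replace('"', '""') + '"'


def select_all_sql(table_name: str) -> str:
    """Build a SELECT * query for a table name that may be schema-qualified."""
    # Split on the first '.' only: "schema.table" -> ("schema", ".", "table").
    schema, dot, table = table_name.partition(".")
    if not dot:
        return f"SELECT * FROM {quote_identifier(table_name)}"
    return f"SELECT * FROM {quote_identifier(schema)}.{quote_identifier(table)}"
```

The resulting string (optionally extended with a WHERE clause) could then be passed as the sql argument of sql_to_df.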

classmethod table_type_as_df(gpudb_table: GPUdbTable) → DataFrame

Convert the type schema (column list) of a GPUdbTable into a pd.DataFrame.

Parameters

gpudb_table (GPUdbTable) –
a GPUdbTable instance

Returns

pd.DataFrame –
a Pandas pd.DataFrame created by analyzing the table’s column types
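What analyzing the column types involves can be sketched in plain Python. The sketch assumes the table type is available as an Avro-style record schema string plus a separate map of column properties, which is how Kinetica type definitions are commonly expressed; schema_rows and its output keys (name, type, nullable, properties) are hypothetical and need not match the columns of the DataFrame that table_type_as_df actually returns.

```python
import json
from typing import Dict, List


def schema_rows(type_schema: str, column_properties: Dict[str, List[str]]) -> List[dict]:
    """Flatten an Avro-style record schema into one row per column."""
    rows = []
    for field in json.loads(type_schema)["fields"]:
        avro_type = field["type"]
        # A nullable column appears as an Avro union such as ["string", "null"].
        if isinstance(avro_type, list):
            base = next(t for t in avro_type if t != "null")
            nullable = "null" in avro_type
        else:
            base, nullable = avro_type, False
        rows.append({
            "name": field["name"],
            "type": base,
            "nullable": nullable,
            "properties": column_properties.get(field["name"], []),
        })
    return rows


# Usage with a small hand-written schema.
schema = json.dumps({
    "type": "record",
    "name": "type_name",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "middle_name", "type": ["string", "null"]},
    ],
})
rows = schema_rows(schema, {"middle_name": ["char64", "nullable"]})
```

Passing the resulting list of dicts to pd.DataFrame would give one row per column.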

classmethod df_to_table(df: DataFrame, db: GPUdb, table_name: str, column_types: dict = {}, clear_table: bool = False, create_table: bool = True, load_data: bool = True, show_progress: bool = False, batch_size: int = 5000, **kwargs) → GPUdbTable

Load a pd.DataFrame into a Kinetica table and return a GPUdbTable reference to it. Depending on the parameters, this optionally drops any existing table of the same name first, creates the table if it doesn’t exist, and loads the pd.DataFrame’s data into it.
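The way clear_table, create_table, and load_data combine can be summarized as a small planning function. This is a sketch of one reading of the parameter and Raises descriptions, not the library's code: plan_df_to_table is hypothetical, and it raises ValueError where the real method raises GPUdbException.

```python
from typing import List


def plan_df_to_table(
    row_count: int,
    table_exists: bool,
    clear_table: bool = False,
    create_table: bool = True,
    load_data: bool = True,
) -> List[str]:
    """Return the ordered steps implied by the df_to_table flags.

    Every error is raised while planning, so no step is executed when the
    combination of inputs is invalid.
    """
    if row_count == 0:
        raise ValueError("DataFrame is empty")
    steps = []
    if clear_table and table_exists:
        steps.append("drop")
        table_exists = False
    if not table_exists:
        if not create_table:
            raise ValueError("table does not exist and create_table is False")
        steps.append("create")
    if load_data:
        steps.append("insert")
    return steps
```

With the defaults, an existing table simply receives the new rows, while a missing one is created first.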

Parameters

df (pd.DataFrame) –
The Pandas pd.DataFrame to load into a table

db (GPUdb) –
a GPUdb instance

table_name (str) –
Name of the Kinetica table to load the pd.DataFrame into

column_types (dict) –
Optional Kinetica column properties to apply to the column type definitions inferred from the pd.DataFrame: a map of column name to a list of column properties for that column, excluding the inferred base type. Defaults to an empty map. For example:
{ "middle_name": [ 'char64', 'nullable' ], "state": [ 'char2', 'dict' ] }

clear_table (bool) –
Whether to drop an existing table of the same name before creating this one. Defaults to False.

create_table (bool) –
Whether to create the table if it doesn’t exist or not. Defaults to True.

load_data (bool) –
Whether to load data into the target table or not. Defaults to True.

show_progress (bool) –
Whether to show progress of the operation on the console. Defaults to False.

batch_size (int) –
The number of records to load into the target table per batch. Defaults to BATCH_SIZE (5000).

Raises

GPUdbException –
If the pd.DataFrame is empty, the table doesn’t exist and create_table is False, or the data ingest fails

Returns

GPUdbTable –
a GPUdbTable instance created from the pd.DataFrame passed in
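The column_types parameter adds properties on top of an inferred base type. The sketch below shows that merge, assuming a column definition takes the form [name, base_type, *properties]. infer_base_type and build_column_defs are hypothetical helpers, and the Python-type-to-base-type mapping is a deliberate simplification; the real method infers types from the pd.DataFrame itself, by rules this sketch does not reproduce.

```python
from typing import Dict, List, Sequence

# Hypothetical mapping from Python value types to Kinetica base types.
_BASE_TYPES = {int: "long", float: "double", str: "string"}


def infer_base_type(values: Sequence[object]) -> str:
    """Infer a base type from the first non-None value in a column."""
    for value in values:
        if value is not None:
            base = _BASE_TYPES.get(type(value))
            if base is None:
                raise TypeError(f"unsupported value type: {type(value).__name__}")
            return base
    return "string"  # an all-None column falls back to string in this sketch


def build_column_defs(
    columns: Dict[str, Sequence[object]],
    column_types: Dict[str, List[str]],
) -> List[List[str]]:
    """Combine inferred base types with user-supplied column properties."""
    # Reject properties for columns that are not in the data.
    unknown = set(column_types) - set(columns)
    if unknown:
        raise KeyError(f"column_types names unknown columns: {sorted(unknown)}")
    return [
        [name, infer_base_type(values), *column_types.get(name, [])]
        for name, values in columns.items()
    ]


# Usage with the column_types example from the parameter description.
defs = build_column_defs(
    {"id": [1, 2], "middle_name": ["Ann", None], "state": ["CA", "NY"]},
    {"middle_name": ["char64", "nullable"], "state": ["char2", "dict"]},
)
```

Columns without an entry in column_types keep only their inferred base type.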

classmethod df_insert_into_table(df: DataFrame, gpudb_table: GPUdbTable, batch_size: int = 5000, show_progress: bool = False) → int

Load a Pandas pd.DataFrame into a GPUdbTable.

Parameters

df (pd.DataFrame) –
a Pandas pd.DataFrame

gpudb_table (GPUdbTable) –
a GPUdbTable instance

batch_size (int) –
the batch size to use for loading data into the table. Defaults to BATCH_SIZE (5000).

show_progress (bool) –
whether to show progress of the operation. Defaults to False.

Raises

GPUdbException –
If the data ingest fails

Returns

int –
the number of rows of the pd.DataFrame actually inserted into the Kinetica table
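Batched insertion with a returned row count can be sketched in plain Python. insert_in_batches and insert_batch are hypothetical names, records are plain dicts rather than DataFrame rows, and each call to insert_batch stands in for one insert request. The sketch sums the per-batch counts reported by the callable rather than the number of rows sent, matching the documented return value (rows actually inserted).

```python
from typing import Callable, List, Sequence


def insert_in_batches(
    records: Sequence[dict],
    insert_batch: Callable[[List[dict]], int],
    batch_size: int = 5000,
) -> int:
    """Insert records batch_size at a time; return the total inserted."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    inserted = 0
    for start in range(0, len(records), batch_size):
        batch = list(records[start:start + batch_size])
        # insert_batch reports how many rows were accepted, which may be
        # fewer than len(batch).
        inserted += insert_batch(batch)
    return inserted


# Usage with a fake insert that accepts every row it is sent.
sent_sizes = []


def fake_insert(batch: List[dict]) -> int:
    sent_sizes.append(len(batch))
    return len(batch)


total = insert_in_batches([{"id": i} for i in range(12)], fake_insert, batch_size=5)
```

With 12 records and a batch size of 5, three requests are made, of sizes 5, 5, and 2.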