Class DataFrameUtils

class gpudb_dataframe.DataFrameUtils
classmethod sql_to_df(db: GPUdb, sql: str, sql_params: list = [], batch_size: int = 5000, sql_opts: dict = {}, show_progress: bool = False) → DataFrame | None

Create a pd.DataFrame from the results of a SQL query.

Parameters

db (GPUdb) –

a GPUdb instance

sql (str) –

the SQL query

sql_params (list) –

the query parameters. Defaults to an empty list.

batch_size (int) –

the number of records to fetch per batch of SQL results. Defaults to BATCH_SIZE (5000).

sql_opts (dict) –

the SQL options as a dict. Defaults to an empty dict.

show_progress (bool) –

whether to display progress or not. Defaults to False.

Raises

GPUdbException –

If the SQL query failed

Returns

pd.DataFrame –

a Pandas pd.DataFrame, or None if the SQL query returned no results
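
Because sql_to_df returns None rather than an empty DataFrame when a query yields no rows, callers generally need a guard before using the result. The following is a minimal sketch of such a guard, not part of the documented API: the helper name query_to_records is hypothetical, and the import path gpudb.gpudb_dataframe is an assumption about how the client package is laid out.

```python
def query_to_records(db, sql, sql_params=None, batch_size=5000):
    """Run a query and return its rows as a list of dicts ([] when empty).

    `db` is expected to be an already-connected GPUdb instance.
    """
    # Assumed import path; imported lazily so this module loads
    # even where the Kinetica client is not installed.
    from gpudb.gpudb_dataframe import DataFrameUtils

    df = DataFrameUtils.sql_to_df(
        db,
        sql,
        sql_params=list(sql_params or []),
        batch_size=batch_size,
    )
    # sql_to_df returns None, not an empty DataFrame, when there are no results.
    if df is None:
        return []
    return df.to_dict("records")
```

A failed query still raises GPUdbException; the sketch deliberately lets it propagate to the caller.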

classmethod table_to_df(db: GPUdb, table_name: str, batch_size: int = 5000, show_progress: bool = False) → DataFrame

Load the contents of a Kinetica table into a pd.DataFrame.

Parameters

db (GPUdb) –

a GPUdb instance

table_name (str) –

name of the Kinetica table

batch_size (int) –

the number of records to fetch per batch. Defaults to BATCH_SIZE (5000).

show_progress (bool) –

whether to display progress or not. Defaults to False.

Returns

pd.DataFrame –

a Pandas pd.DataFrame created from the Kinetica table

classmethod table_type_as_df(gpudb_table: GPUdbTable) → DataFrame

Convert the type schema (column list) of a GPUdbTable into a pd.DataFrame.

Parameters

gpudb_table (GPUdbTable) –

a GPUdbTable instance

Returns

pd.DataFrame –

a Pandas pd.DataFrame created by analyzing the table column types
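
table_type_as_df takes a GPUdbTable rather than a table name, so inspecting a table's schema alongside its data involves one extra step. The sketch below is illustrative only: table_snapshot is a hypothetical helper, the import path gpudb.gpudb_dataframe is an assumption, and so is the expectation that constructing GPUdbTable with a type of None attaches to an existing table.

```python
def table_snapshot(db, table_name, batch_size=5000):
    """Return (schema_df, data_df) for an existing Kinetica table."""
    # Lazy imports so this module loads without the Kinetica client installed.
    from gpudb import GPUdbTable
    from gpudb.gpudb_dataframe import DataFrameUtils  # assumed import path

    # Assumption: a type of None makes GPUdbTable look up the existing table.
    table = GPUdbTable(None, table_name, db=db)

    # One DataFrame describing the columns, one holding the rows.
    schema_df = DataFrameUtils.table_type_as_df(table)
    data_df = DataFrameUtils.table_to_df(db, table_name, batch_size=batch_size)
    return schema_df, data_df
```

The layout of the schema DataFrame is not specified here, so the sketch returns it as-is without assuming any column names.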

classmethod df_to_table(df: DataFrame, db: GPUdb, table_name: str, column_types: dict = {}, clear_table: bool = False, create_table: bool = True, load_data: bool = True, show_progress: bool = False, batch_size: int = 5000, **kwargs) → GPUdbTable

Load a pd.DataFrame into a Kinetica table, optionally dropping any existing table of the same name, creating the table if it doesn’t exist, and loading data into it. Returns a GPUdbTable reference to the table.

Parameters

df (pd.DataFrame) –

The Pandas pd.DataFrame to load into a table

db (GPUdb) –

GPUdb instance

table_name (str) –

Name of the target Kinetica table into which the pd.DataFrame is loaded

column_types (dict) –

Optional Kinetica column properties to apply on top of the column types inferred from the pd.DataFrame: a map of column name to a list of column properties for that column, excluding the inferred base type. Defaults to an empty map. For example:

{ "middle_name": [ 'char64', 'nullable' ], "state": [ 'char2', 'dict' ] }

clear_table (bool) –

Whether to drop any existing table of the same name before creating this one. Defaults to False.

create_table (bool) –

Whether to create the table if it doesn’t already exist. Defaults to True.

load_data (bool) –

Whether to load data into the target table or not. Defaults to True.

show_progress (bool) –

Whether to show progress of the operation on the console. Defaults to False.

batch_size (int) –

The number of records to load into the target table per batch. Defaults to BATCH_SIZE (5000).

Raises

GPUdbException –

If the pd.DataFrame is empty, the table doesn’t exist and create_table is False, or the data ingest fails

Returns

GPUdbTable –

a GPUdbTable instance created from the pd.DataFrame passed in
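
The column_types map can be written by hand, as in the example above, or derived from the data. The sketch below shows one possible derivation: it picks the smallest Kinetica charN property that fits the longest sample value in each string column, then passes the result to df_to_table. The helper names, the samples argument, the use of character count as a proxy for the charN limit, and the import path gpudb.gpudb_dataframe are all assumptions made for illustration.

```python
CHAR_SIZES = (1, 2, 4, 8, 16, 32, 64, 128, 256)  # Kinetica charN widths


def build_column_types(samples, nullable=()):
    """Build a column_types map: column name -> list of column properties.

    samples:  dict of column name -> iterable of string values (None allowed)
    nullable: names of columns that should also get the 'nullable' property
    """
    column_types = {}
    for name, values in samples.items():
        # Longest non-null value seen for this column (0 if there is none).
        longest = max((len(v) for v in values if v is not None), default=0)
        props = []
        for size in CHAR_SIZES:
            if 0 < longest <= size:
                props.append(f"char{size}")
                break
        if name in nullable:
            props.append("nullable")
        if props:  # columns with no extra properties are left to inference
            column_types[name] = props
    return column_types


def load_frame(df, db, table_name, samples, nullable=()):
    """Recreate table_name from df, applying the derived column properties."""
    from gpudb.gpudb_dataframe import DataFrameUtils  # assumed import path

    return DataFrameUtils.df_to_table(
        df,
        db,
        table_name,
        column_types=build_column_types(samples, nullable),
        clear_table=True,   # drop any existing table of the same name first
        create_table=True,
    )
```

Values longer than 256 characters get no charN property at all, which leaves the column at whatever base type df_to_table infers.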

classmethod df_insert_into_table(df: DataFrame, gpudb_table: GPUdbTable, batch_size: int = 5000, show_progress: bool = False) → int

Load a Pandas pd.DataFrame into a GPUdbTable.

Parameters

df (pd.DataFrame) –

a Pandas pd.DataFrame

gpudb_table (GPUdbTable) –

a GPUdbTable instance

batch_size (int) –

the batch size to use when loading data into the table. Defaults to BATCH_SIZE (5000).

show_progress (bool) –

whether to show progress of the operation. Defaults to False.

Raises

GPUdbException –

If the data ingest fails

Returns

int –

the number of rows of the pd.DataFrame actually inserted into the Kinetica table
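
Since the return value is the number of rows actually inserted, it can be compared with the size of the DataFrame to detect rows that did not make it into the table. A minimal sketch of that check follows; append_rows is a hypothetical helper and the import path gpudb.gpudb_dataframe is an assumption.

```python
def append_rows(df, gpudb_table, batch_size=5000):
    """Insert df into gpudb_table and return (inserted, skipped) row counts."""
    # Assumed import path; imported lazily so this module loads
    # even where the Kinetica client is not installed.
    from gpudb.gpudb_dataframe import DataFrameUtils

    inserted = DataFrameUtils.df_insert_into_table(
        df, gpudb_table, batch_size=batch_size
    )
    # Rows present in the DataFrame but not reported as inserted.
    skipped = len(df) - inserted
    return inserted, skipped
```

A nonzero skipped count is only a signal to investigate; why rows were not inserted depends on the table's configuration and is outside the scope of this sketch.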