Class DataFrameUtils
- class gpudb_dataframe.DataFrameUtils
- classmethod sql_to_df(db: GPUdb, sql: str, sql_params: list = [], batch_size: int = 5000, sql_opts: dict = {}, show_progress: bool = False) -> DataFrame | None
Create a pd.DataFrame from the results of a SQL query.
Parameters
- db (GPUdb) –
a GPUdb instance
- sql (str) –
the SQL query
- sql_params (list) –
the query parameters. Defaults to an empty list.
- batch_size (int) –
the batch size for the SQL execution results. Defaults to BATCH_SIZE.
- sql_opts (dict) –
the SQL options as a dict. Defaults to an empty dict.
- show_progress (bool) –
whether to display progress or not. Defaults to False.
Raises
- GPUdbException –
If the SQL query failed
Returns
- pd.DataFrame –
a Pandas pd.DataFrame, or None if the SQL query returned no results
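Because sql_to_df is documented to return None when the query yields no rows, callers may want to normalize that case. The following is a minimal sketch, not the library's own code: the helper name query_records is hypothetical, and the import path is an assumption taken from the module name shown above (it may differ in a given installation).

```python
def query_records(db, sql, params=None):
    """Run a query and return its rows as a list of dicts (empty if no rows)."""
    # Assumption: DataFrameUtils is importable under the documented module name.
    from gpudb_dataframe import DataFrameUtils

    df = DataFrameUtils.sql_to_df(
        db,
        sql,
        sql_params=list(params or []),
        batch_size=5000,
        show_progress=False,
    )
    # sql_to_df returns None, not an empty DataFrame, when there are no results.
    if df is None:
        return []
    return df.to_dict(orient="records")
```

With a live GPUdb connection db, a call might look like query_records(db, "SELECT * FROM example.people"), where example.people is a placeholder table name.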
- classmethod table_to_df(db: GPUdb, table_name: str, batch_size: int = 5000, show_progress: bool = False) -> DataFrame
Convert a Kinetica table into a pd.DataFrame and load data into it.
Parameters
- db (GPUdb) –
a GPUdb instance
- table_name (str) –
name of the Kinetica table
- batch_size (int) –
the batch size for the SQL execution results. Defaults to BATCH_SIZE.
- show_progress (bool) –
whether to display progress or not. Defaults to False.
Returns
- pd.DataFrame –
a Pandas pd.DataFrame created from the Kinetica table
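As a rough illustration of table_to_df, the sketch below loads a table and returns its first few rows. The helper name preview_table is hypothetical and the import path is an assumption based on the module name shown above.

```python
def preview_table(db, table_name, rows=5):
    """Load a Kinetica table into a DataFrame and return its first rows."""
    # Assumption: DataFrameUtils is importable under the documented module name.
    from gpudb_dataframe import DataFrameUtils

    # The whole table is fetched first (in batches); head() only trims the result.
    df = DataFrameUtils.table_to_df(
        db, table_name, batch_size=5000, show_progress=False
    )
    return df.head(rows)
```

Since the full table is loaded before trimming, this pattern suits small tables; for large ones, a LIMIT clause passed to sql_to_df is likely the better fit.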
- classmethod table_type_as_df(gpudb_table: GPUdbTable) -> DataFrame
Convert the type schema (column list) of a GPUdbTable into a pd.DataFrame.
Parameters
- gpudb_table (GPUdbTable) –
a GPUdbTable instance
Returns
- pd.DataFrame –
a Pandas pd.DataFrame created by analyzing the table column types
- classmethod df_to_table(df: DataFrame, db: GPUdb, table_name: str, column_types: dict = {}, clear_table: bool = False, create_table: bool = True, load_data: bool = True, show_progress: bool = False, batch_size: int = 5000, **kwargs) -> GPUdbTable
Load a pd.DataFrame into a table, optionally dropping any existing table, creating the table if it doesn’t exist, and loading data into it; then return a GPUdbTable reference to the table.
Parameters
- df (pd.DataFrame) –
The Pandas pd.DataFrame to load into a table
- db (GPUdb) –
a GPUdb instance
- table_name (str) –
Name of the target Kinetica table for the pd.DataFrame loading
- column_types (dict) –
Optional Kinetica column properties to apply to the column type definitions inferred from the pd.DataFrame; map of column name to a list of column properties for that column, excluding the inferred base type. Defaults to empty map. For example:
{ "middle_name": [ 'char64', 'nullable' ], "state": [ 'char2', 'dict' ] }
- clear_table (bool) –
Whether to drop an existing table of the same name or not before creating this one. Defaults to False.
- create_table (bool) –
Whether to create the table if it doesn’t exist or not. Defaults to True.
- load_data (bool) –
Whether to load data into the target table or not. Defaults to True.
- show_progress (bool) –
Whether to show progress of the operation on the console. Defaults to False.
- batch_size (int) –
The number of records at a time to load into the target table. Defaults to BATCH_SIZE.
Raises
- GPUdbException –
If the pd.DataFrame is empty, the table doesn’t exist and create_table is False, or the data ingest fails
Returns
- GPUdbTable –
a GPUdbTable instance created from the pd.DataFrame passed in
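The df_to_table parameters above can be combined as in the following sketch. It is an illustration under assumptions: the helper name load_people and the table name example.people are hypothetical, the import path is taken from the module name shown above, and the column_types map reuses the example from the parameter description.

```python
def load_people(df, db, table_name="example.people"):
    """Replace a Kinetica table with the contents of a DataFrame."""
    # Assumption: DataFrameUtils is importable under the documented module name.
    from gpudb_dataframe import DataFrameUtils

    # An empty DataFrame is documented to raise GPUdbException, so fail early.
    if len(df) == 0:
        raise ValueError("refusing to load an empty DataFrame")

    # Extra column properties layered on top of the inferred base types.
    column_types = {
        "middle_name": ["char64", "nullable"],
        "state": ["char2", "dict"],
    }
    return DataFrameUtils.df_to_table(
        df,
        db,
        table_name,
        column_types=column_types,
        clear_table=True,   # drop any existing table of the same name
        create_table=True,  # then create it from the DataFrame's schema
        load_data=True,
        show_progress=False,
        batch_size=5000,
    )
```

Setting clear_table=True makes the load repeatable at the cost of discarding existing rows; leaving it False keeps the existing table.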
- classmethod df_insert_into_table(df: DataFrame, gpudb_table: GPUdbTable, batch_size: int = 5000, show_progress: bool = False) -> int
Load a Pandas pd.DataFrame into a GPUdbTable.
Parameters
- df (pd.DataFrame) –
a Pandas pd.DataFrame
- gpudb_table (GPUdbTable) –
a GPUdbTable instance
- batch_size (int) –
a batch size to use for loading data into the table. Defaults to BATCH_SIZE.
- show_progress (bool) –
whether to show progress of the operation. Defaults to False.
Raises
- GPUdbException –
If the data ingest fails
Returns
- int –
the number of rows of the pd.DataFrame actually inserted into the Kinetica table
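Since the return value is the count of rows actually inserted, it can be compared against the DataFrame length to detect rows that did not make it into the table. A minimal sketch, with a hypothetical helper name append_rows and the import path assumed from the module name shown above:

```python
def append_rows(df, gpudb_table, batch_size=5000):
    """Insert a DataFrame into an existing table; return (inserted, skipped)."""
    # Assumption: DataFrameUtils is importable under the documented module name.
    from gpudb_dataframe import DataFrameUtils

    inserted = DataFrameUtils.df_insert_into_table(
        df, gpudb_table, batch_size=batch_size, show_progress=False
    )
    # Rows in the DataFrame that were not reported as inserted.
    skipped = len(df) - inserted
    return inserted, skipped
```

A nonzero skipped count is only a signal to investigate; this page does not document why rows might be left out.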