Loading Data

Loading Data from Files

Kinetica supports loading of text-based delimited data files via SQL. There are two supported paths:

INSERT INTO...SELECT...FROM FILE

Kinetica can load data from text-based data files that are local to the client into existing tables using a modified INSERT INTO...SELECT statement.

Note

This command is only available through KiSQL or database clients configured with the Kinetica JDBC driver.

When reading data from a file, Kinetica assumes the following format, which can be overridden using command options:

  • If there is a header row, it is either:

    • a simple list of comma-delimited column names:

      id,category,name,description,stock
      
    • the Kinetica standard format:

      id|int|data,category|string|data|char16,name|string|data|char32,description|string|data|char128|nullable,stock|int|data|nullable
      
  • The data fields are comma-delimited

  • Strings are optionally enclosed in double-quotes; double-quotes must be used when the data contains commas; two consecutive double-quotes are used as an escape code for double-quoted string data containing double-quotes; e.g.:

    "This string contains a "" quote mark and a "","" double-quoted comma."

    This would be loaded into a single string column in Kinetica as:

    This string contains a " quote mark and a "," double-quoted comma.

INSERT INTO...SELECT...FROM FILE Syntax
INSERT INTO [<schema name>.]<table name> [( <column list> )]
SELECT <field list>
FROM FILE."<file name | file name expression>"
[ <option list> ]
Parameters:

<schema name> - Name of the schema containing the table into which data will be inserted
<table name> - Name of the table into which data will be inserted
<column list> - Optional list of target table columns into which data will be inserted; if not specified, all target table columns will be assumed. Whether specified or implied, this list needs to match the fields specified in <field list> in number, order, & type.
<field list> - List of the source data file field names from which data will be extracted; use * to extract all fields (this wildcard must be used when a source file has no header defining field names). This list needs to match the target table columns (or those specified in <column list>) in number, order, & type.
<file name | file name expression> - Name of the source data file or set of source data files from which records will be extracted; wildcards (*) can be used to specify a group of files
<option list> - Optional list of whitespace-separated processing options

Options

The following options can be specified when loading data from files. When reading from multiple files (using wildcards when specifying the file names), options specific to the source file will be applied to each file being read.

BATCH_SIZE=<n>

Use an ingest batch size of n records.

The default batch size is 10,000.

COMMENT='<string>'

Treat lines in the source file that begin with string as comments and skip.

The default comment marker is #.

DELIMITER='<char>'

Use char as the source file field delimiter.

The default delimiter is the comma (,), unless the source file has one of these extensions:

  • .psv - will cause | to be the delimiter
  • .tsv - will cause the tab character to be the delimiter

See Option Characters for allowed characters.

ESCAPE='<char>'

Use char as the source file data escape character. The escape character preceding any other character, in the source data, will be converted into that other character, except in the following special cases:

Source Data String - Representation when Loaded into the Database

  • <char>a - ASCII bell
  • <char>b - ASCII backspace
  • <char>f - ASCII form feed
  • <char>n - ASCII line feed
  • <char>r - ASCII carriage return
  • <char>t - ASCII horizontal tab
  • <char>v - ASCII vertical tab

For instance, if the escape character is \, a \t encountered in the data will be converted to a tab character when stored in the database.

The escape character can be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not.

There is no default escape character. See Option Characters for allowed characters.

HEADER=<Y|N>

Declare that the source file has or does not have a header. If not given, that determination will be intuited.

INITIAL_CLEAR

Truncate the target table before loading data.

The default is to not truncate the target table.

LIMIT=<n>

Limit the total number of records inserted into the target table to n. If reading from multiple source files, this count is assessed across all files being read. For example, if n is 15 and the three files being read have 10 records each, all of the records from the 1st file and the first 5 records from the 2nd file will be loaded. The remaining records in the 2nd file and all of the records from the 3rd file will be skipped.

The default is no limit.

NULL='<string>'

Treat string as the indicator of a null source field value.

The default is \N.

ON_ERROR=<mode>

When an error is encountered loading a record, handle it using the given <mode>; one of the following:

  • PERMISSIVE - If an error is encountered parsing a source record, attempt to insert as many of the valid fields from the record as possible.
  • SKIP - If an error is encountered parsing a source record, skip the record.
  • ABORT - If an error is encountered parsing a source record, stop the data load process. Primary key collisions are considered abortable errors in this mode.

The default mode is SKIP.

QUOTE='<char>'

Use char as the source file data quoting character, for enclosing field values. Usually used to wrap field values that contain embedded delimiter characters, though any field may be enclosed in quote characters (for clarity, for instance). The quote character must appear as the first and last character of a field value in order to be interpreted as quoting the value. Within a quoted value, embedded quote characters may be escaped by preceding them with another quote character or the escape character specified by ESCAPE, if given.

The default is the " (double-quote) character. See Option Characters for allowed characters.

SKIP=<n>

Skip the first n source file data lines read, not including header and comment lines. If reading from multiple source files, this count is assessed across all files being read. For example, if n is 15 and the first two files read have 10 records each, the first record to be loaded (not skipped) will be the 6th record of the 2nd file.

The default is to skip no records.

Option Characters

For DELIMITER, ESCAPE, & QUOTE, any single character can be used, or any one of the following escaped characters:

Escaped Char - Corresponding Source File Character

  • '' - Single quote
  • \\ - Backslash
  • \a - ASCII bell
  • \b - ASCII backspace
  • \f - ASCII form feed
  • \t - ASCII horizontal tab
  • \v - ASCII vertical tab

For instance, if two single quotes ('') are specified for a QUOTE character, the parser will interpret single quotes in the source file as quoting characters; specifying \t for DELIMITER will cause the parser to interpret ASCII horizontal tab characters in the source file as delimiter characters.
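
As a minimal sketch of the escaped-character form (the table name and file path are illustrative), a tab-delimited file could be loaded by supplying the escaped tab character as the delimiter:

INSERT INTO example.product
SELECT *
FROM FILE."/tmp/data/products.tab"
DELIMITER='\t'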

Examples

Single File

To load a CSV file of product data located at /tmp/data/products.csv into a table named product:

INSERT INTO...SELECT...FROM FILE (Single File) Example
INSERT INTO example.product
SELECT *
FROM FILE."/tmp/data/products.csv"
Multiple Files

To load all files whose names begin with products and end in a csv extension, located under /tmp/data, into a table named product:

INSERT INTO...SELECT...FROM FILE (Multiple File) Example
INSERT INTO example.product
SELECT *
FROM FILE."/tmp/data/products*csv"
Column/Field Names

To load specific fields from a product data file located at /tmp/data/products.csv into specific target table columns:

INSERT INTO...SELECT...FROM FILE (Column/Field Names) Example
INSERT INTO example.product (id, stock, category, name)
SELECT ID, Stock, Category, Name
FROM FILE."/tmp/data/products.csv"

Note

When specifying source data file field names, it is not required that target table column names also be specified, or vice versa. The only requirement is that the set of source data file fields selected align with the target table columns into which data will be inserted.

Options

To load data from a product data file with the following options:

  • file located at /tmp/data/products.csv
  • only records 4 through 10 will be loaded
  • any error will cause the insert to fail
  • the target table will be truncated before loading
  • records will be inserted into the table in groups of 2
INSERT INTO...SELECT...FROM FILE with Load Options Example
INSERT INTO example.product
SELECT *
FROM FILE."/tmp/data/products.csv"
BATCH_SIZE=2
INITIAL_CLEAR
LIMIT=7
ON_ERROR=abort
SKIP=3

To load data from a product data file with the following options:

  • file located at /tmp/data/products.ssv
  • the file has a header
  • the field delimiter is a semicolon
  • data requiring quotes will have single quotes as the quoting character
  • the escape character is the backtick
  • the data file represents null values as <null>
  • data file comments are on lines starting with --
INSERT INTO...SELECT...FROM FILE with Parse Options Example
INSERT INTO example.product
SELECT *
FROM FILE."/tmp/data/products.ssv"
COMMENT='--'
DELIMITER=';'
ESCAPE='`'
HEADER=Y
NULL='<null>'
QUOTE='\''

Considerations

Multi-Head

Blank Lines

  • A blank line in a source data file will be treated as a valid record and will be inserted into the target table as a record with all null fields. If blank lines should not be considered as data, they should be prefixed with the comment marker in order to be skipped by the loading process.

GAdmin

  • While GAdmin does support SQL file ingestion, it is recommended to use server-side loading instead. Upon receiving this client-side load command, GAdmin will spawn a JDBC client on its host that will need to reference files accessible to the gpudb user located on that host (usually, the head node of the Kinetica cluster).

/execute/sql endpoint

  • While SQL file ingestion is available to ODBC/JDBC clients and via GAdmin, it is not supported within the /execute/sql endpoint, either as a direct REST call or as an API call (like the Python API's execute_sql() function).

LOAD INTO

Kinetica can load data from text-based data files, located on the Kinetica cluster itself, in KiFS, or from a data source, into existing tables using a LOAD INTO statement.

Its use with ring resiliency has additional considerations.

The LOAD INTO command can initiate a subscription to a data source that references an Azure, Kafka, or S3 source of data configured for streaming.

Note

Only JSON & GeoJSON are supported for Kafka streams.

Also, because data is loaded in parallel, insertion order is not guaranteed. For tables with primary keys, this means that in the case of a primary key collision it is indeterminate which colliding record will be inserted first and remain, while the rest are discarded.

LOAD INTO Syntax
<COPY | LOAD> [DATA] INTO [<schema name>.]<table name>
[FROM FILE PATHS <file paths>]
[FORMAT <[DELIMITED] TEXT [(<delimited text options>)] | PARQUET | JSON | SHAPEFILE>]
[WITH OPTIONS (<load option name> = '<load option value>'[,...])]
[<table property clause>]
Parameters:

DATA - Optional keyword for compatibility
<schema name> - Name of the schema containing the table into which data will be loaded
<table name> - Name of the table into which data will be loaded

FROM FILE PATHS <file paths>

Source file specification clause, where <file paths> is a comma-separated list of single-quoted file paths from which data will be loaded; all files specified are presumed to have the same format and data types.

Note

This clause is not applicable when streaming from a Kafka data source.

The form of a file path is dependent on the source referenced:

  • Data Source: If a data source is specified in the load options, these file paths must resolve to accessible files at that data source location. A "path prefix" can be specified, which will cause all files whose path begins with the given prefix to be included.

    For example, a "path prefix" of /data/ge for <file paths> would match all of the following:

    • /data/geo.csv
    • /data/geo/flights.csv
    • /data/geo/2021/airline.csv

    If using an HDFS data source, the "path prefix" must be the name of an HDFS directory.

  • KiFS: The path must resolve to an accessible file path within KiFS. A "path prefix" can be specified, which will cause all files whose path begins with the given prefix to be included.

    For example, a "path prefix" of kifs://data/ge would match all of the following files under the KiFS data directory:

    • kifs://data/geo.csv
    • kifs://data/geo/flights.csv
    • kifs://data/geo/2021/airline.csv
  • Kinetica Cluster: The paths are OS-level paths under the Kinetica configuration file's external files directory that are accessible to the gpudb user and either:

    • accessible to all nodes in the Kinetica cluster
    • mirrored across each node in the Kinetica cluster

    A "path prefix" that references a directory on the cluster file system can be specified, which will cause all files contained within that directory and its subdirectories to be included. Wildcards can also be used to specify a group of files.

    For example, if external_files_directory is set to /opt/gpudb/data, the following <file paths> specifications will refer to these corresponding files:

    File Path - Reference

    • /opt/gpudb/data/products.csv - File /opt/gpudb/data/products.csv
    • archive/products.csv - File /opt/gpudb/data/archive/products.csv
    • archive/*.csv - All files under /opt/gpudb/data/archive with a csv extension
    • archive - All files under /opt/gpudb/data/archive and its subdirectories
    • /opt/gpudb/products.csv - Error, as the absolute path /opt/gpudb does not contain the external files directory /opt/gpudb/data as part of the path
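
    For instance, a minimal sketch of loading that archive directory as a "path prefix" (assuming external_files_directory is /opt/gpudb/data; the target table is illustrative):

    LOAD DATA INTO example.product
    FROM FILE PATHS 'archive'
    FORMAT TEXT
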
FORMAT

Optional indicator of source file type; will be inferred from file extension if not given.

Supported formats include:

[DELIMITED] TEXT

Any text-based, delimited field data file (CSV, PSV, TSV, etc.); a comma-delimited list of options can be given to specify the way in which the data file(s) should be parsed, including the delimiter used, whether headers are present, etc.

See Delimited Text Options for the complete list of <delimited text options>.

PARQUET - Apache Parquet data file
JSON - Either a JSON or GeoJSON data file
SHAPEFILE - ArcGIS shapefile

WITH OPTIONS

Optional indicator that a comma-delimited list of connection & global option/value assignments will follow.

See Load Options for the complete list of options

<table property clause> - Optional comma-separated list of table properties, from a subset of those available, to apply to the table to load into, if it is created by this call.

Delimited Text Options

The following options can be specified when loading data from delimited text files. When reading from multiple files (using wildcards when specifying the file names), options specific to the source file will be applied to each file being read.

COMMENT = '<string>'

Treat lines in the source file(s) that begin with string as comments and skip.

The default comment marker is #.

DELIMITER = '<char>'

Use char as the source file field delimiter.

The default delimiter is a comma, unless a source file has one of these extensions:

  • .psv - will cause | to be the delimiter
  • .tsv - will cause the tab character to be the delimiter

See Delimited Text Option Characters for allowed characters.

ESCAPE = '<char>'

Use char as the source file data escape character. The escape character preceding any other character, in the source data, will be converted into that other character, except in the following special cases:

Source Data String - Representation when Loaded into the Database

  • <char>a - ASCII bell
  • <char>b - ASCII backspace
  • <char>f - ASCII form feed
  • <char>n - ASCII line feed
  • <char>r - ASCII carriage return
  • <char>t - ASCII horizontal tab
  • <char>v - ASCII vertical tab

For instance, if the escape character is \, a \t encountered in the data will be converted to a tab character when stored in the database.

The escape character can be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not.

There is no default escape character.

HEADER DELIMITER = '<char>'

Use char as the source file header field name/property delimiter, when the source file header contains both names and properties. This is largely specific to the Kinetica export to delimited text feature, which will, within each field's header, contain the field name and any associated properties, delimited by the pipe | character.

An example Kinetica header in a CSV file:

id|int|data,category|string|data|char16,name|string|data|char32,description|string|data|char128|nullable,stock|int|data|nullable

The default is the | (pipe) character. See Delimited Text Option Characters for allowed characters.

Note

The DELIMITER character will still be used to separate field name/property sets from each other in the header row

INCLUDES HEADER = <TRUE|FALSE>

Declare that the source file(s) will or will not have a header. If not given, that determination will be intuited.

NULL = '<string>'

Treat string as the indicator of a null source field value.

The default is the empty string.

QUOTE = '<char>'

Use char as the source file data quoting character, for enclosing field values. Usually used to wrap field values that contain embedded delimiter characters, though any field may be enclosed in quote characters (for clarity, for instance). The quote character must appear as the first and last character of a field value in order to be interpreted as quoting the value. Within a quoted value, embedded quote characters may be escaped by preceding them with another quote character or the escape character specified by ESCAPE, if given.

The default is the " (double-quote) character. See Delimited Text Option Characters for allowed characters.

Delimited Text Option Characters

For DELIMITER, HEADER DELIMITER, ESCAPE, & QUOTE, any single character can be used, or any one of the following escaped characters:

Escaped Char - Corresponding Source File Character

  • '' - Single quote
  • \a - ASCII bell
  • \b - ASCII backspace
  • \f - ASCII form feed
  • \t - ASCII horizontal tab
  • \v - ASCII vertical tab

For instance, if two single quotes ('') are specified for a QUOTE character, the parser will interpret single quotes in the source file as quoting characters; specifying \t for DELIMITER will cause the parser to interpret ASCII horizontal tab characters in the source file as delimiter characters.
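
As a minimal sketch (the table name and file path are illustrative), a tab-delimited file whose values are quoted with single quotes could be parsed with:

LOAD DATA INTO example.product
FROM FILE PATHS 'data/products.dat'
FORMAT TEXT (DELIMITER = '\t', QUOTE = '''')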

Load Options

The following options can be specified to modify the way data is loaded (or not loaded) into the target table.

BATCH SIZE

Use an ingest batch size of the given number of records.

The default batch size is 10,000.

COLUMN FORMATS

Use the given type-specific formatting for the given column when parsing source data being loaded into that column. This should be a map of column names to format specifications, where each format specification is a map of column type to data format, all formatted as a JSON string.

Supported column types include:

Column Type - Source Data Format Codes
date

Apply the given date format to the given column.

Important

Valid formats must contain one year, month, & day code

Format codes include:

  • %Y - 4-digit year
  • %m - 2-digit month, where January is 01
  • %d - 2-digit day of the month, where the 1st of each month is 01

time

Apply the given time format to the given column.

Important

Valid formats must contain hours, minutes, & one of the seconds codes.

Format codes include:

  • %H - 24-based hour, where 12:00 AM is 00 and 7:00 PM is 19
  • %M - 2-digit minute of the hour
  • %S - 2-digit second of the minute
  • %s - 2-digit second of the minute plus an N-digit fractional component, separated by a dot

Note

For %s, fractional seconds will be truncated after the milliseconds place.

datetime

Apply the given date/time format to the given column.

Important

Valid formats must contain a date & time format, as detailed above.

For example, to use the YYYY.MM.DD format when loading source data into date column d and the HH:MM:SS format when loading source data into time column t:

{
    "d": {"date": "%Y.%m.%d"},
    "t": {"time": "%H:%M:%S"}
}
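
For instance, a minimal sketch of a load applying that mapping via this option (the table name, file path, and columns d & t are illustrative):

LOAD DATA INTO example.orders
FROM FILE PATHS 'data/orders.csv'
WITH OPTIONS (COLUMN FORMATS = '{"d": {"date": "%Y.%m.%d"}, "t": {"time": "%H:%M:%S"}}')
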
DATA SOURCE

Load data from the given data source. Data source connect privilege is required when loading from a data source.

DEFAULT COLUMN FORMATS

Use the given formats for source data being loaded into target table columns with the corresponding column types. This should be a map of target column type to source format for data being loaded into columns of that type, formatted as a JSON string.

Supported column properties and source data formats are the same as those listed in the description of the COLUMN FORMATS option.

For example, to set the default formats so that source data dates are loaded in the form YYYY.MM.DD and times in the form HH:MM:SS:

{
    "date": "%Y.%m.%d",
    "time": "%H:%M:%S",
    "datetime": "%Y.%m.%d %H:%M:%S"
}
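
A corresponding minimal sketch using this option in a load (the table name and file path are illustrative):

LOAD DATA INTO example.orders
FROM FILE PATHS 'data/orders.csv'
WITH OPTIONS (DEFAULT COLUMN FORMATS = '{"date": "%Y.%m.%d", "time": "%H:%M:%S"}')
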
FIELDS MAPPED BY < POSITION(<col#s>) | NAME(<field names>) >

Choose a comma-separated list of fields from the source file(s) to load, specifying fields by either position or name. If loading by name, the source file field names must match the target table column names exactly.

Note

When specifying source data file fields, the set of source data file fields must align, in type & number, with the target table columns into which data will be loaded.

Important

Field mapping by position is not supported for Parquet files.
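
For instance, a minimal sketch loading only the 1st, 3rd, & 5th source fields by position into a three-column target table (the table name, file path, and positions are illustrative):

LOAD DATA INTO example.product_name_stock
FROM FILE PATHS 'data/products.csv'
WITH OPTIONS (FIELDS MAPPED BY POSITION(1, 3, 5))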

INGESTION MODE

Whether to do a full ingest of the data or perform a dry run or type inference instead. The default mode is FULL.

  • DRY RUN - No data will be inserted, but the file will be read with the applied ON ERROR mode and the number of valid records that would normally be inserted is returned.
  • FULL - Data is fully ingested according to the active ON ERROR mode.
  • TYPE INFERENCE - Infer the type of the source data and return it, without ingesting any data. The inferred type is returned in the response, as the output of a SHOW CREATE TABLE command.

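For example, a minimal sketch of a type-inference-only pass over a file (no data is ingested; the table name and file path are illustrative):

LOAD DATA INTO example.product
FROM FILE PATHS 'data/products.csv'
WITH OPTIONS (INGESTION MODE = type inference)
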
KAFKA GROUP ID

Consumer group ID to use for data sources configured for Kafka. If not specified, one will be randomly generated.

LOADING MODE

Use one of the following distribution schemes to load data files. The default mode is HEAD.

HEAD

The head node loads all data. All files must be available to the head node.

DISTRIBUTED LOCAL

A single worker process on each node loads all files that are available to it. This option works best when each worker loads files from its own file system, to maximize performance. In order to avoid data duplication, either each worker performing the load needs to have visibility to a set of files unique to it (no file is visible to more than one node) or the target table needs to have a primary key (which will allow the worker to automatically deduplicate data).

Note

  • If the target table doesn't exist, the table structure will be determined by the head node. If the head node has no files local to it, it will be unable to determine the structure and the request will fail.
  • This mode should not be used in conjunction with a data source, as data sources are seen by all worker processes as shared resources with no "local" component.
  • If the head node is configured to have no worker processes, no data strictly accessible to the head node will be loaded.
DISTRIBUTED SHARED

The head node coordinates loading data by worker processes across all nodes from shared files available to all workers.

Note

Instead of existing on a shared source, the files can be duplicated on a source local to each host to improve performance, though the files must appear as the same data set from the perspective of all hosts performing the load.
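
For example, a minimal sketch of a distributed load of shared files visible to all workers (the wildcard path and table name are illustrative):

LOAD DATA INTO example.product
FROM FILE PATHS 'data/products*csv'
FORMAT TEXT
WITH OPTIONS (LOADING MODE = distributed shared)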

ON ERROR

When an error is encountered loading a record, handle it using one of the following modes. The default mode is PERMISSIVE.

  • PERMISSIVE - If an error is encountered parsing a source record, attempt to insert as many of the valid fields from the record as possible; insert a null for each errant value found.
  • SKIP - If an error is encountered parsing a source record, skip the record.
  • ABORT - If an error is encountered parsing a source record, stop the data load process. Primary key collisions are considered abortable errors in this mode.

POLL_INTERVAL

Interval, in seconds, at which a data source is polled for updates. Only valid when SUBSCRIBE is true. Default is 60 seconds.

Note

The number of seconds must be passed as a single-quoted string.

SUBSCRIBE

Whether to subscribe to the data source specified in the DATA SOURCE option. The default value is FALSE.

Important

This option is only available for data sources configured for Azure, Kafka, or S3.

  • TRUE - Stream data from the specified data source.
  • FALSE - Load all data once from the specified data source.
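
For example, a minimal sketch of a streaming subscription that polls for new data every 2 minutes; the table name, the S3 data source name s3_order_ds, and the orders/ path prefix are illustrative, and note the single-quoted POLL_INTERVAL value:

LOAD DATA INTO example.orders
FROM FILE PATHS 'orders/'
WITH OPTIONS (DATA SOURCE = 's3_order_ds', SUBSCRIBE = true, POLL_INTERVAL = '120')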

Table Property Clause

If the target table doesn't exist and the source data file has Kinetica headers, a comma-separated list of options for creating the target table can be specified.

Kinetica headers contain column type information, which is used in creating the target table if it doesn't exist. An example Kinetica header:

id|int|data,category|string|data|char16,name|string|data|char32,description|string|data|char128|nullable,stock|int|data|nullable
LOAD INTO Table Property Clause
USING TABLE PROPERTIES
(
    <table property> = <value>,
    ...
    <table property> = <value>
)
Property - Description

CHUNK SIZE - Size of the blocks of memory holding the data, when loaded; specified as the maximum number of records each block of memory should hold
TTL - The time-to-live (TTL) for the table; if not set, the table will not expire
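
For instance, a minimal sketch specifying both properties for a table created by the load (the property values, table name, and file path are illustrative; see also the Create Table with Kinetica Headers example below):

LOAD DATA INTO example.product_create_on_load
FROM FILE PATHS 'data/products.kh.csv'
USING TABLE PROPERTIES (CHUNK SIZE = 1000000, TTL = 120)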

Examples

Single File

To load a CSV file of product data located at data/products.csv, relative to the configured external files directory, into a table named product:

LOAD INTO (Single File) Example
LOAD DATA INTO example.product
FROM FILE PATHS 'data/products.csv'

Single File - No Header

To load a headerless CSV file of product data located at data/products.nh.csv, relative to the configured external files directory, into a table named product:

LOAD INTO (No Header) Example
LOAD DATA INTO example.product
FROM FILE PATHS 'data/products.nh.csv'
FORMAT TEXT (INCLUDES HEADER = false)

Multiple Files

To load all files whose names begin with products and end in a csv extension, located at data/products*csv, relative to the configured external files directory, into a table named product:

LOAD INTO (Multiple Files by Wildcard) Example
LOAD DATA INTO example.product
FROM FILE PATHS 'data/products*csv'
FORMAT TEXT

When loading files from multiple paths, specify a comma-delimited list of those file paths. To load one file located at data/products.csv and one file located at data/products.kh.csv, relative to the configured external files directory, into a table named product:

LOAD INTO (Multiple Files by List) Example
LOAD DATA INTO example.product
FROM FILE PATHS 'data/products.csv', 'data/products.kh.csv'
FORMAT TEXT

Note

Multiple paths and wildcards specifying multiple files on those paths can be used together.

Column/Field Names

To load specific fields from a product data file, with the following conditions:

  • data file located at data/products.title-case.csv, relative to the configured external files directory
  • data file contains a header with fields named ID, Category, Name, Description, & Stock
  • target table named product_name_stock
  • target columns named id, name, & stock
  • only load fields ID, Name, & Stock into columns id, name, & stock
  • skip records that fail to load successfully
LOAD INTO (Column/Field Names) Example
LOAD DATA INTO example.product_name_stock
FROM FILE PATHS 'data/products.title-case.csv'
FORMAT TEXT
WITH OPTIONS (ON ERROR = skip, FIELDS MAPPED BY NAME(ID, Name, Stock))

Note

When specifying source data file field names, the set of source data file fields selected must align, in type & number, with the target table columns into which data will be loaded.

Delimited Text Options

To load data from a product data file with the following options:

  • file located at data/products.ssv, relative to the configured external files directory
  • the file has a header
  • the field delimiter is a semicolon
  • data requiring quotes will have single quotes as the quoting character
  • the escape character is the backtick
  • the data file represents null values as <null>
  • data file comments are on lines starting with --
  • when parse errors are encountered, as much of the record's data as possible will be loaded into the target table
LOAD INTO with Delimited Text Options Example
LOAD DATA INTO example.product
FROM FILE PATHS 'data/products.ssv'
FORMAT TEXT
(
  COMMENT = '--',
  DELIMITER = ';',
  ESCAPE = '`',
  INCLUDES HEADER = true,
  NULL = '<null>',
  QUOTE = ''''
)
WITH OPTIONS (ON ERROR = permissive)

Create Table with Kinetica Headers

A CSV file with Kinetica headers can be used to create the target table if it doesn't exist.

To load such a file of product data with the following conditions:

  • data file located at data/products.kh.csv, relative to the configured external files directory
  • data file contains a Kinetica header
  • target table named product_create_on_load
  • target table will be created if it doesn't exist
  • target table will have a chunk size of 1,000,000 records, if it is created by this load command
LOAD INTO (Create Table/Kinetica Header) Example
LOAD DATA INTO example.product_create_on_load
FROM FILE PATHS 'data/products.kh.csv'
USING TABLE PROPERTIES (CHUNK SIZE = 1000000)

Parquet File

To load a Parquet file of employee data located at data/employee.parquet, relative to the configured external files directory, into a table named employee_2000:

LOAD INTO (Parquet) Example
LOAD DATA INTO example.employee_2000
FROM FILE PATHS 'data/employee.parquet'
FORMAT PARQUET

Kafka

To subscribe to a Kafka data feed with the following conditions:

  • Kafka remote source is represented by the kafka_ds data source
  • data will be continuously streamed from the Kafka source
  • target table named orders
LOAD INTO (Kafka) Example
LOAD DATA INTO example.orders
WITH OPTIONS (DATA SOURCE = 'kafka_ds', SUBSCRIBE = true)

Dry Run

To perform a dry run of a load of a CSV file of product data located at data/products.csv, relative to the configured external files directory, into a table named product:

LOAD INTO (Dry Run) Example
LOAD DATA INTO example.product
FROM FILE PATHS 'data/products.csv'
WITH OPTIONS (INGESTION MODE = dry run)

Note

The dry run will return the number of records that would have been loaded, but not actually insert any records into the target table.

KiFS File

To load a CSV file, products.csv, in the KiFS directory data, into a table named example.product:

LOAD INTO (KiFS) Example
LOAD DATA INTO example.product
FROM FILE PATHS 'kifs://data/products.csv'

Data Source File

To load a CSV file, products.csv, from a data source named product_ds, into a table named example.product:

LOAD INTO (Data Source) Example
LOAD DATA INTO example.product
FROM FILE PATHS 'products.csv'
WITH OPTIONS (DATA SOURCE = 'product_ds')