Loading Data

Kinetica supports loading of data via SQL from a variety of file types and through several mechanisms. There are two primary paths:


LOAD INTO

Kinetica can load data into a table, using a LOAD INTO statement. The data can be sourced through either of the following:

  • local files uploaded into KiFS
  • a data source configured to allow access to:
    • remote files on Azure, GCS, HDFS, or S3
    • remote messages on a Kafka topic
    • a remote database queryable via JDBC (or CData JDBC)

Note

For contextualized examples, see Examples. For copy/paste examples, see Loading Data. For an overview of loading data into Kinetica, see Data Loading Concepts.

LOAD INTO Syntax
1
2
3
4
5
6
7
8
9
<COPY | LOAD> [DATA] INTO [<schema name>.]<table name>
<
      FROM FILE PATHS <file paths>
            [FORMAT <[DELIMITED] TEXT [(<delimited text options>)] | AVRO | JSON | PARQUET | SHAPEFILE>]
   |
   FROM REMOTE QUERY '<source data query>'
>
[WITH OPTIONS (<load option name> = '<load option value>'[,...])]
[<table property clause>]

A LOAD INTO that uses a data source can perform a one-time load upon creation and optionally subscribe for updates on an interval, depending on the data source provider:

Provider Description One-Time Load Subscription
Azure Microsoft blob storage Yes Yes
CData

CData Software source-specific JDBC driver

See driver list for the supported drivers

Yes  
GCS Google Cloud Storage Yes Yes
HDFS Apache Hadoop Distributed File System Yes  
JDBC Java DataBase Connectivity; requires user-supplied driver Yes  
Kafka Apache Kafka streaming feed (JSON/GeoJSON only)   Yes
S3 Amazon S3 Bucket Yes Yes

See Manage Subscription for pausing, resuming, & canceling subscriptions on the target table.

While a target table can have a primary key defined, there are two limitations to consider when loading data into one with LOAD INTO:

  • A primary key collision between an incoming record and one already in the external table will result in the incoming record being rejected--there are no primary key record updates when using LOAD INTO
  • A primary key collision between two records within the incoming data set will result in one of the two records being chosen non-deterministically for insert into the target table; assuming there is no collision between that record and one already in the table.

The use of LOAD INTO with ring resiliency has additional considerations.

Parameters Description
DATA Optional keyword for compatibility
<schema name> Name of the schema containing the table into which data will be loaded
<table name> Name of the table into which data will be loaded

FROM FILE PATHS

<file paths>

Source file specification clause, where <file paths> is a comma-separated list of single-quoted file paths from which data will be loaded; all files specified are presumed to have the same format and data types.

Note

This clause is mutually exclusive with the FROM REMOTE QUERY clause, and is only applicable when loading from an Azure, GCS, HDFS, or S3 data source, or from KiFS.

The form of a file path is dependent on the source referenced:

  • Data Source: If a data source is specified in the external table load options these file paths must resolve to accessible files at that data source location. A "path prefix" can be specified, which will cause all files whose path begins with the given prefix to be included.

    For example, a "path prefix" of /data/ge for <file paths> would match all of the following:

    • /data/geo.csv
    • /data/geo/flights.csv
    • /data/geo/2021/airline.csv

    If using an HDFS data source, the "path prefix" must be the name of an HDFS directory.

  • KiFS: The path must resolve to an accessible file path within KiFS. A "path prefix" can be specified, which will cause all files whose path begins with the given prefix to be included.

    For example, a "path prefix" of kifs://data/ge would match all of the following files under the KiFS data directory:

    • kifs://data/geo.csv
    • kifs://data/geo/flights.csv
    • kifs://data/geo/2021/airline.csv
FORMAT

Optional indicator of source file type, for file-based data sources; will be inferred from file extension if not given.

Supported formats include:

Keyword Description
[DELIMITED] TEXT

Any text-based, delimited field data file (CSV, PSV, TSV, etc.); a comma-delimited list of options can be given to specify the way in which the data file(s) should be parsed, including the delimiter used, whether headers are present, etc. Records spanning multiple lines are not supported.

See Delimited Text Options for the complete list of <delimited text options>.

AVRO Apache Avro data file
JSON

Either a JSON or GeoJSON data file

See JSON/GeoJSON Limitations for the supported data types.

PARQUET

Apache Parquet data file

See Parquet Limitations for the supported data types.

SHAPEFILE ArcGIS shapefile

FROM REMOTE QUERY

'<source data query>'

Source data specification clause, where <source data query> is a SQL query selecting the data which will be loaded. Any column expressions used should be aliased.

Note

This clause is mutually exclusive with the FROM FILE PATHS clause, and is only applicable to CData & JDBC data sources.

The query should meet the following criteria:

  • Any column expression used is given a column alias.
  • The first column is not a WKT or unlimited length VARCHAR type.
  • The columns and expressions queried should match the order, number, & type of the columns in the target table. If the target table doesn't exist, it will be created using the names and data types of the queried columns in the order in which they appear in the query.

Type inferencing is limited by the available JDBC types. To take advantage of Kinetica-specific types and properties, create the target table manually before loading data.

WITH OPTIONS

Optional indicator that a comma-delimited list of connection & global option/value assignments will follow.

See Load Options for the complete list of options.

<table property clause> Optional comma-separated list of table properties, from a subset of those available, to the table to load into, if it is created by this call.

Delimited Text Options

The following options can be specified when loading data from delimited text files. When reading from multiple files (using wildcards when specifying the file names), options specific to the source file will be applied to each file being read.

Option Description
COMMENT = '<string>'

Treat lines in the source file(s) that begin with string as comments and skip.

The default comment marker is #.

DELIMITER = '<char>'

Use char as the source file field delimiter.

The default delimiter is a comma, unless a source file has one of these extensions:

  • .psv - will cause | to be the delimiter
  • .tsv - will cause the tab character to be the delimiter

See Delimited Text Option Characters for allowed characters.

ESCAPE = '<char>'

Use char as the source file data escape character. The escape character preceding any other character, in the source data, will be converted into that other character, except in the following special cases:

Source Data String Representation when Loaded into the Database
<char>a ASCII bell
<char>b ASCII backspace
<char>f ASCII form feed
<char>n ASCII line feed
<char>r ASCII carriage return
<char>t ASCII horizontal tab
<char>v ASCII vertical tab

For instance, if the escape character is \, a \t encountered in the data will be converted to a tab character when stored in the database.

The escape character can be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not.

There is no default escape character.

HEADER DELIMITER = '<char>'

Use char as the source file header field name/property delimiter, when the source file header contains both names and properties. This is largely specific to the Kinetica export to delimited text feature, which will, within each field's header, contain the field name and any associated properties, delimited by the pipe | character.

An example Kinetica header in a CSV file:

1
id|int|data,category|string|data|char16,name|string|data|char32

The default is the | (pipe) character. See Delimited Text Option Characters for allowed characters.

Note

The DELIMITER character will still be used to separate field name/property sets from each other in the header row

INCLUDES HEADER = <TRUE|FALSE>

Declare that the source file(s) will or will not have a header.

The default is TRUE.

NULL = '<string>'

Treat string as the indicator of a null source field value.

The default is the empty string.

QUOTE = '<char>'

Use char as the source file data quoting character, for enclosing field values. Usually used to wrap field values that contain embedded delimiter characters, though any field may be enclosed in quote characters (for clarity, for instance). The quote character must appear as the first and last character of a field value in order to be interpreted as quoting the value. Within a quoted value, embedded quote characters may be escaped by preceding them with another quote character or the escape character specified by ESCAPE, if given.

The default is the " (double-quote) character. See Delimited Text Option Characters for allowed characters.

Delimited Text Option Characters

For DELIMITER, HEADER DELIMITER, ESCAPE, & QUOTE, any single character can be used, or any one of the following escaped characters:

Escaped Char Corresponding Source File Character
'' Single quote
\a ASCII bell
\b ASCII backspace
\f ASCII form feed
\t ASCII horizontal tab
\v ASCII vertical tab

For instance, if two single quotes ('') are specified for a QUOTE character, the parser will interpret single quotes in the source file as quoting characters; specifying \t for DELIMITER will cause the parser to interpret ASCII horizontal tab characters in the source file as delimiter characters.

Load Options

The following options can be specified to modify the way data is loaded (or not loaded) into the target table.

Option Description
BAD RECORD TABLE

Name of the table containing records that failed to be loaded into the target table. This bad record table will include the following columns:

Column Name Source Data Format Codes
line_number Number of the line in the input file containing the failed record
char_number Position of character within a failed record that is assessed as the beginning of the portion of the record that failed to process
filename Name of file that contained the failed record
line_rejected Text of the record that failed to process
error_msg Error message associated with the record processing failure

Note

This option is not applicable for an ON ERROR mode of ABORT. In that mode, processing stops at the first error and that error is returned to the user.

BATCH SIZE

Use an ingest batch size of the given number of records.

The default batch size is 200,000.

Note

This option is not applicable to Kafka data sources.

COLUMN FORMATS

Use the given type-specific formatting for the given column when parsing source data being loaded into that column. This should be a map of column names to format specifications, where each format specification is map of column type to data format, all formatted as a JSON string.

Supported column types include:

Column Type Source Data Format Codes
date

Apply the given date format to the given column.

Common date format codes follow. For the complete list, see Date/Time Conversion Codes.

Code Description
YYYY 4-digit year
MM 2-digit month, where January is 01
DD 2-digit day of the month, where the 1st of each month is 01
time

Apply the given time format to the given column.

Common time format codes follow. For the complete list, see Date/Time Conversion Codes.

Code Description
HH24 24-based hour, where 12:00 AM is 00 and 7:00 PM is 19
MI 2-digit minute of the hour
SS 2-digit second of the minute
MS milliseconds
datetime Apply the given date/time format to the given column.

For example, to load dates of the format 2010.10.30 into date column d and times of the 24-hour format 18:36:54.789 into time column t:

{
    "d": {"date": "YYYY.MM.DD"},
    "t": {"time": "HH24:MI:SS.MS"}
}

Note

This option is not applicable to JDBC data sources.

DATA SOURCE Load data from the given data source. Data source connect privilege is required when loading from a data source.
DEFAULT COLUMN FORMATS

Use the given formats for source data being loaded into target table columns with the corresponding column types. This should be a map of target column type to source format for data being loaded into columns of that type, formatted as a JSON string.

Supported column properties and source data formats are the same as those listed in the description of the COLUMN FORMATS option.

For example, to make the default format for loading source data dates like 2010.10.30 and 24-hour times like 18:36:54.789:

{
    "date": "YYYY.MM.DD",
    "time": "HH24:MI:SS.MS",
    "datetime": "YYYY.MM.DD HH24:MI:SS.MS"
}

Note

This option is not applicable to JDBC data sources.

FIELDS IGNORED BY < POSITION(<col#s>) | NAME(<field names>) >

Choose a comma-separated list of fields from the source file(s) to ignore, loading only those fields that are not in the identified list in the order they appear in the file. Fields can be identified by either POSITION or NAME. If ignoring by NAME, the specified names must match the source file field names exactly.

Note

When ignoring source data file fields and the target table exists, the set of fields that are not ignored must align, in type & number in their order in the source file, with the target table columns into which the data will be loaded.

Important

Ignoring fields by POSITION is only supported for delimited text files.

FIELDS MAPPED BY < POSITION(<col#s>) | NAME(<field names>) >

Choose a comma-separated list of fields from the source file(s) to load, in the specified order, identifying fields by either POSITION or NAME. If mapping by NAME, the specified names must match the source file field names exactly.

Note

When mapping source data file fields and the target table exists, the set of fields that are identified must align, in type & number in the specified order, with the target table columns into which data will be loaded.

Important

Mapping fields by POSITION is only supported for delimited text files.

INGESTION MODE

Whether to do a full ingest of the data or perform a dry run or type inference instead. The default mode is FULL.

Value Description
DRY RUN No data will be inserted, but the file will be read with the applied ON ERROR mode and the number of valid records that would normally be inserted is returned.
FULL Data is fully ingested according to the active ON ERROR mode.
TYPE INFERENCE Infer the type of the source data and return, without ingesting any data. The inferred type is returned in the response, as the output of a SHOW TABLE command.
KAFKA GROUP ID Consumer group ID to use for data sources configured for Kafka. If not specified, one will be randomly generated.
ON ERROR

When an error is encountered loading a record, handle it using one of the following modes. The default mode is ABORT.

Mode Description
PERMISSIVE If an error is encountered parsing a source record, attempt to insert as many of the valid fields from the record as possible; insert a null for each errant value found.
SKIP If an error is encountered parsing a source record, skip the record.
ABORT If an error is encountered parsing a source record, stop the data load process. Primary key collisions are considered abortable errors in this mode.
POLL_INTERVAL

Interval, in seconds, at which a data source is polled for updates. The number of seconds must be passed as a single-quoted string. Default is 60 seconds.

Important

This option is only available for data sources configured for Azure or S3 and when SUBSCRIBE is true.

PRIMARY KEY

Optional primary key specification clause, where the value is a parenthesized comma-separated list of columns to use as the primary key for the table:

PRIMARY KEY = ('id', 'dept_id')
SHARD KEY

Optional shard key specification clause, where the value is a parenthesized comma-separated list of columns to use as the shard key for the table:

SHARD KEY = ('id')
SUBSCRIBE

Whether to subscribe to the data source specified in the DATA SOURCE option. The default value is FALSE.

Important

This option is only available for data sources configured for Azure, GCS, Kafka, or S3.

Value Description
TRUE Stream data from the specified data source.
FALSE Load all data once from the specified data source.

Table Property Clause

A comma-separated list of options for creating the target table, if it doesn't exist and if the source data file has Kinetica headers, can be specified.

Kinetica headers contain column type information, which are used in creating the target table if it doesn't exist. An example Kinetica header:

id|int|data,category|string|data|char16,name|string|data|char32,description|string|data|char128|nullable,stock|int|data|nullable
LOAD INTO Table Property Clause
1
2
3
4
5
6
USING TABLE PROPERTIES
(
    <table property> = <value>,
    ...
    <table property> = <value>
)
Property Description
CHUNK SIZE Size of the blocks of memory holding the data, when loaded; specified as the maximum number of records each block of memory should hold
TTL The time-to-live (TTL) for the table; if not set, the table will not expire

Examples

Single File

To load a CSV file of product data named products.csv in the KiFS directory data, into a table named product:

LOAD INTO (Single File) Example
1
2
LOAD DATA INTO example.product
FROM FILE PATHS 'kifs://data/products.csv'

Single File - No Header

To load a headerless CSV file of product data named products.nh.csv in the KiFS directory data, into a table named product:

LOAD INTO (No Header) Example
1
2
3
LOAD DATA INTO example.product
FROM FILE PATHS 'kifs://data/products.nh.csv'
FORMAT TEXT (INCLUDES HEADER = FALSE)

Multiple Files

When loading files from multiple paths, specify a comma-delimited list of those file paths. To load one file named products.csv and one file named products.kh.csv, both under the KiFS directory data, into a table named product:

LOAD INTO (Multiple Files) Example
1
2
3
LOAD DATA INTO example.product
FROM FILE PATHS 'kifs://data/products.csv', 'kifs://data/products.kh.csv'
FORMAT TEXT

Column/Field Names

To load specific fields from a product data file, with the following conditions:

  • data file named products.title-case.csv located in the KiFS directory data
  • data file contains a header with fields named ID, Category, Name, Description, & Stock
  • target table named product_name_stock
  • target columns named id, name, & stock
  • only load fields ID, Name, & Stock into columns id, name, & stock
  • skip records that fail to load successfully
LOAD INTO (Column/Field Names) Example
1
2
3
4
LOAD DATA INTO example.product_name_stock
FROM FILE PATHS 'kifs://data/products.title-case.csv'
FORMAT TEXT
WITH OPTIONS (ON ERROR = SKIP, FIELDS MAPPED BY NAME(ID, Name, Stock))

Note

When specifying source data file field names, the set of source data file fields selected must align, in type & number, with the target table columns into which data will be loaded.

Date/Time Column Types

To apply a specific date format to the hire_date field:

LOAD INTO (Date/Time Column Types) Example
1
2
3
4
5
6
7
8
9
LOAD INTO example.employee
FROM FILE PATHS 'kifs://data/employee.csv'
WITH OPTIONS
(
    COLUMN FORMATS = '
    {
        "hire_date": {"date": "YYYY-MM-DD"}
    }'
)

Delimited Text Options

To load data from a product data file with the following options:

  • data file named products.ssv located in the KiFS directory data
  • the file has a header
  • the field delimiter is a semicolon
  • data requiring quotes will have single quotes as the quoting character
  • the escape character is the backtick
  • the data file represents null values as <null>
  • data file comments are on lines starting with --
  • when parse errors are encountered, as much of the record's data as possible will be loaded into the target table
LOAD INTO with Delimited Text Options Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
LOAD DATA INTO example.product
FROM FILE PATHS 'kifs://data/products.ssv'
FORMAT TEXT
(
      COMMENT = '--',
      DELIMITER = ';',
      ESCAPE = '`',
      INCLUDES HEADER = TRUE,
      NULL = '<null>',
      QUOTE = ''''
)
WITH OPTIONS (ON ERROR = PERMISSIVE)

Create Table with Kinetica Headers

A CSV file with Kinetica headers can be used to create the target table if it doesn't exist.

To load such a file of product data with the following conditions:

  • data file named products.kh.csv located in the KiFS directory data
  • data file contains a Kinetica header
  • target table named product_create_on_load
  • target table will be created if it doesn't exist
  • target table will have a chunk size of 1,000,000 records, if it is created by this load command
LOAD INTO (Create Table/Kinetica Header) Example
1
2
3
LOAD DATA INTO example.product_create_on_load
FROM FILE PATHS 'kifs://data/products.kh.csv'
USING TABLE PROPERTIES (CHUNK SIZE = 1000000)

Primary/Shard Keys

A CSV file without Kinetica headers can be used to create a target table with primary & shard keys if it doesn't exist.

To load such a file of employee data with the following conditions:

  • data file named employee.csv located in the KiFS directory data
  • data file does not contain a Kinetica header
  • target table named example.employee
  • target table will be created if it doesn't exist
  • target table will have a composite primary key on the id & dept_id columns and a shard key on id column, if it is created by this load command
LOAD INTO (Primary/Shard Key) Example
1
2
3
4
5
6
7
LOAD INTO example.employee
FROM FILE PATHS 'kifs://data/employee.csv'
WITH OPTIONS
(
    PRIMARY KEY = (id, dept_id),
    SHARD KEY = (id)
)

Important

The data must have no nulls in any columns designated as primary key columns.

Parquet File

To load a Parquet file of employee data, employee.parquet, in the KiFS directory data, into a table named example.employee_2000:

LOAD INTO (Parquet) Example
1
2
3
LOAD DATA INTO example.employee_2000
FROM FILE PATHS 'kifs://data/employee.parquet'
FORMAT PARQUET

Kafka

To subscribe to a Kafka data feed with the following conditions:

  • Kafka remote source is represented by the kafka_ds data source
  • data will be continuously streamed from the Kafka source
  • target table named orders
LOAD INTO (Kafka) Example
1
2
3
LOAD DATA INTO example.orders
FORMAT JSON
WITH OPTIONS (DATA SOURCE = 'kafka_ds', SUBSCRIBE = TRUE)

Dry Run

To perform a dry run of a load of a CSV file, products.csv, in the KiFS directory data, into a table named example.product:

LOAD INTO (Dry Run) Example
1
2
3
LOAD DATA INTO example.product
FROM FILE PATHS 'kifs://data/products.csv'
WITH OPTIONS (INGESTION MODE = DRY RUN)

Note

The dry run will return the number of records that would have been loaded, but not actually insert any records into the target table.

Data Source File

To load a CSV file, products.csv, from the data source example.product_ds, into a table named example.product:

LOAD INTO (Data Source File) Example
1
2
3
LOAD DATA INTO example.product
FROM FILE PATHS 'products.csv'
WITH OPTIONS (DATA SOURCE = 'example.product_ds')

Data Source Query

To load the results of a remote query of employees in department 2 from the JDBC data source example.jdbc_ds, into a local table named example.employee_dept2:

LOAD INTO (Data Source Query) Example
1
2
3
LOAD INTO example.employee_dept2
FROM REMOTE QUERY 'SELECT * FROM example.employee WHERE dept_id = 2'
WITH OPTIONS (DATA SOURCE = 'example.jdbc_ds')

INSERT INTO ... SELECT ... FROM FILE

Kinetica can load data from text-based data files that are local to the client into existing tables using a modified INSERT INTO ... SELECT statement.

Note

This command is only available through KiSQL or database clients configured with the Kinetica ODBC/JDBC driver.

When reading data from a file, Kinetica assumes the following format, able to be overridden using command options:

  • If there is a header row, it is either:

    • a simple list of comma-delimited column names:

      id,category,name,description,stock
      
    • the Kinetica standard format:

      id|int|data,category|string|data|char16,name|string|data|char32,description|string|data|char128|nullable,stock|int|data|nullable
      
  • The data fields are comma-delimited

  • Strings are optionally enclosed in double-quotes; double-quotes must be used when the data contains commas; two consecutive double-quotes are used as an escape code for double-quoted string data containing double-quotes; e.g.:

    "This string contains a "" quote mark and a "","" double-quoted comma."

    This would be loaded into a single string column in Kinetica as:

    This string contains a " quote mark and a "," double-quoted comma.

INSERT INTO ... SELECT ... FROM FILE Syntax
1
2
3
4
INSERT INTO [<schema name>.]<table name> [( <column list> )]
SELECT <field list>
FROM FILE."<file name | file name expression>"
[ <option list> ]
Parameters Description
<schema name> Name of the schema containing the table into which data will be inserted
<table name> Name of the table into which data will be inserted
<column list> Optional list of target table columns into which data will be inserted; if not specified, all target table columns will be assumed. Whether specified or implied, this list needs to match the fields specified in <field list> in number, order, & type.
<field list> List of the source data file field names from which data will be extracted; use * to extract all columns (this wildcard must be used when a source file has no header which defines field names). This list needs to match all the target table columns (or those specified in the <column list> option) in number, order, & type.
<file name | file name expression> Name of the source data file or set of source data files from which records will be extracted; wildcards (*) can be used to specify a group of files
<option list> Optional list of whitespace-separated processing options

Options

The following options can be specified when loading data from files. When reading from multiple files (using wildcards when specifying the file names), options specific to the source file will be applied to each file being read.

Option Description
BATCH_SIZE=<n>

Use an ingest batch size of n records.

The default batch size is 10,000.

COMMENT='<string>'

Treat lines in the source file that begin with string as comments and skip.

The default comment marker is #.

DELIMITER='<char>'

Use char as the source file field delimiter.

The default delimiter is ,, unless the source file has one of these extensions:

  • .psv - will cause | to be the delimiter
  • .tsv - will cause the tab character to be the delimiter

See Option Characters for allowed characters.

ESCAPE='<char>'

Use char as the source file data escape character. The escape character preceding any other character, in the source data, will be converted into that other character, except in the following special cases:

Source Data String Representation when Loaded into the Database
<char>a ASCII bell
<char>b ASCII backspace
<char>f ASCII form feed
<char>n ASCII line feed
<char>r ASCII carriage return
<char>t ASCII horizontal tab
<char>v ASCII vertical tab

For instance, if the escape character is \, a \t encountered in the data will be converted to a tab character when stored in the database.

The escape character can be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not.

There is no default escape character. See Option Characters for allowed characters.

HEADER=<Y|N> Declare that the source file has or does not have a header. If not given, that determination will be intuited.
INITIAL_CLEAR

Truncate the target table before loading data.

The default is to not truncate the target table.

LIMIT=<n>

Limit the total number of records inserted into the target table to n. If reading from multiple source files, this count is assessed across all files being read. For example, if n is 15 and the three files being read have 10 records each, all of the records from the 1st file and the first 5 records from the 2nd file will be loaded. The remaining records in the 2nd file and all of the records from the 3rd file will be skipped.

The default is no limit.

NULL='<string>'

Treat string as the indicator of a null source field value.

The default is \N.

ON_ERROR=<mode>

When an error is encountered loading a record, handle it using using the given <mode>; one of the following:

Mode Description
PERMISSIVE If an error is encountered parsing a source record, attempt to insert as much of the valid fields from the record as possible.
SKIP If an error is encountered parsing a source record, skip the record.
ABORT If an error is encountered parsing a source record, stop the data load process. Primary key collisions are considered abortable errors in this mode.

The default mode is SKIP.

QUOTE='<char>'

Use char as the source file data quoting character, for enclosing field values. Usually used to wrap field values that contain embedded delimiter characters, though any field may be enclosed in quote characters (for clarity, for instance). The quote character must appear as the first and last character of a field value in order to be interpreted as quoting the value. Within a quoted value, embedded quote characters may be escaped by preceding them with another quote character or the escape character specified by ESCAPE, if given.

The default is the " (double-quote) character. See Option Characters for allowed characters.

SKIP=<n>

Skip the first n source file data lines read, not including header and comment lines. If reading from multiple source files, this count is assessed across all files being read. For example, if n is 15 and the first two files read have 10 records each, the first record to be loaded (not skipped) will be the 6th record of the 2nd file.

The default is to skip no records.

Option Characters

For DELIMITER, ESCAPE, & QUOTE, any single character can be used, or any one of the following escaped characters:

Escaped Char Corresponding Source File Character
'' Single quote
\\ Backslash
\a ASCII bell
\b ASCII backspace
\f ASCII form feed
\t ASCII horizontal tab
\v ASCII vertical tab

For instance, if two single quotes ('') are specified for a QUOTE character, the parser will interpret single quotes in the source file as quoting characters; specifying \t for DELIMITER will cause the parser to interpret ASCII horizontal tab characters in the source file as delimiter characters.

Examples

Single File

To load a CSV file of product data located at ../data/products.csv into a table named product:

INSERT INTO ... SELECT ... FROM FILE (Single File) Example
1
2
3
INSERT INTO example.product
SELECT *
FROM FILE."../data/products.csv"

Multiple Files

To load all files whose names begin with products and end in a csv extension, located under ../data, into a table named product:

INSERT INTO ... SELECT ... FROM FILE (Multiple File) Example
1
2
3
INSERT INTO example.product
SELECT *
FROM FILE."../data/products*csv"

Column/Field Names

To load specific fields from a product data file located at ../data/products.csv into specific target table columns:

INSERT INTO ... SELECT ... FROM FILE (Column/Field Names) Example
1
2
3
INSERT INTO example.product (id, stock, category, name)
SELECT ID, Stock, Category, Name
FROM FILE."../data/products.csv"

Note

When specifying source data file field names, it is not required that target table column names also be specified, or vice versa. The only requirement is that the set of source data file fields selected align with the target table columns into which data will be inserted.

Options

To load data from a product data file with the following options:

  • file located at ../data/products.csv
  • only records 4 through 10 will be loaded
  • any error will cause the insert to fail
  • the target table will be truncated before loading
  • records will be inserted into the table in groups of 2
INSERT INTO ... SELECT ... FROM FILE with Load Options Example
1
2
3
4
5
6
7
8
INSERT INTO example.product
SELECT *
FROM FILE."../data/products.csv"
BATCH_SIZE=2
INITIAL_CLEAR
LIMIT=7
ON_ERROR=abort
SKIP=3

To load data from a product data file with the following options:

  • file located at ../data/products.ssv
  • the file has a header
  • the field delimiter is a semicolon
  • data requiring quotes will have single quotes as the quoting character
  • the escape character is the backtick
  • the data file represents null values as <null>
  • data file comments are on lines starting with --
INSERT INTO ... SELECT ... FROM FILE with Parse Options Example
1
2
3
4
5
6
7
8
9
INSERT INTO example.product
SELECT *
FROM FILE."../data/products.ssv"
COMMENT='--'
DELIMITER=';'
ESCAPE='`'
HEADER=Y
NULL='<null>'
QUOTE='\''

Considerations

Multi-Head

Blank Lines

  • A blank line in a source data file will be treated as a valid record and will be inserted into the target table as a record with all null fields. If blank lines should not be considered as data, they should be prefixed with the comment marker in order to be skipped by the loading process.

GAdmin

  • While GAdmin does not support this command, it does support importing data from client-local files directly.

/execute/sql endpoint

  • While SQL file ingestion is available to ODBC/JDBC clients; it is not supported within the /execute/sql endpoint, either as a REST call directly or as an API call (like the Python API's execute_sql() function).