# Loading Data

*Kinetica* supports loading data via SQL from a variety of file types and through several mechanisms. There are two primary paths:

* [Loading server-accessible data sets](/content/sql/load#sql-load-file-server) via SQL (or the [/execute/sql](/content/api/rest/execute_sql_rest) endpoint call) that either:
  * can be reached by *Kinetica* via an [external data source](/content/sql/ddl#sql-create-data-source) object
  * have been uploaded to *Kinetica* via [SQL](/content/sql/kifs#sql-kifs-upload-file) or [API](/content/api/rest/insert_records_fromfiles_rest)
* [Loading client-side files](/content/sql/load#sql-load-file-client) via ODBC/JDBC

## LOAD INTO

*Kinetica* can load either internal or external data into a table, using a `LOAD INTO` statement.

```sql title="LOAD INTO Syntax" theme={null}
LOAD [DATA] INTO [<schema name>.]<table name>
<
    FROM REMOTE QUERY '<source data query>'
    |
    FROM FILE PATHS <file paths>
    [FORMAT <[DELIMITED] TEXT [(<delimited text options>)] | AVRO | JSON | PARQUET | SHAPEFILE>]
>
[WITH OPTIONS (<load option name> = '<load option value>'[,...])]
[<table property clause>]
```

For contextualized examples, see [Examples](/content/sql/load#sql-load-file-server-examples). For copy/paste examples, see [Loading Data](/content/snippets/load-data). For an overview of loading data into *Kinetica*, see [Data Loading Concepts](/content/load_data/concepts).

The data can be sourced through either of the following:

* local files uploaded into [KiFS](/content/tools/kifs)
* a [data source](/content/sql/ddl#sql-create-data-source) configured to allow access to:
  * remote files on *Azure*, *GCS*, *HDFS*, or *S3*
  * remote messages on a *Kafka* topic
  * a remote database queryable via *JDBC*

A `LOAD INTO` that uses a *data source* can perform a one-time load upon creation and optionally subscribe for updates on an interval, depending on the *data source* provider:

| Provider    | Description                                                                                                                   | One-Time Load | Subscription |
| ----------- | ----------------------------------------------------------------------------------------------------------------------------- | ------------- | ------------ |
| *Azure*     | Microsoft blob storage                                                                                                        | Yes           | Yes          |
| *Confluent* | Confluent Kafka streaming feed (JSON, GeoJSON, or Avro)                                                                       |               | Yes          |
| *GCS*       | Google Cloud Storage                                                                                                          | Yes           | Yes          |
| *HDFS*      | Apache Hadoop Distributed File System                                                                                         | Yes           |              |
| *JDBC*      | Java DataBase Connectivity; using a user-supplied driver or one of the drivers on the [supported list](/content/concepts/jdbc_drivers) | Yes           | Yes          |
| *Kafka*     | Apache Kafka streaming feed (JSON, GeoJSON, or Avro)                                                                          |               | Yes          |
| *S3*        | Amazon S3 Bucket                                                                                                              | Yes           | Yes          |

See [Manage Subscription](/content/sql/ddl#sql-alter-table-manage-sub) for pausing, resuming, & canceling subscriptions on the target table.

The use of `LOAD INTO` with [ring resiliency](/content/ha) has additional [considerations](/content/ha/ha_configuration#ring-extdata).
### Parameters

`DATA`: Optional keyword for compatibility.

Schema name: Name of the [schema](/content/sql/ddl#sql-create-schema) containing the table into which data will be loaded.

Table name: Name of the [table](/content/sql/ddl#sql-create-table) into which data will be loaded.

`FROM REMOTE QUERY`: Source data specification clause, where the quoted string is a SQL query selecting the data which will be loaded. This clause is mutually exclusive with the `FROM FILE PATHS` clause, and is only applicable to JDBC *data sources*.

The query should meet the following criteria:

* Any column expression used is given a column alias.
* The first column is not a `WKT` or unlimited length `VARCHAR` type.
* The columns and expressions queried should match the order, number, & type of the columns in the target table. If the target table doesn't exist, it will be created using the names and data types of the queried columns in the order in which they appear in the query.

Any query resulting in more than *10,000* records will be distributed and loaded in parallel (unless directed otherwise) using the following rule sequence:

1. If `REMOTE_QUERY_NO_SPLIT` is `TRUE`, the query will not be distributed.
2. If a valid `REMOTE_QUERY_PARTITION_COLUMN` is specified, the query will be distributed by partitioning on the given column's values.
3. If a valid `REMOTE_QUERY_ORDER_BY` is specified, the query will be distributed by ordering the data accordingly and then partitioning into sequential blocks from the first record.
4. If a non-null numeric/date/time column exists, the query will be distributed by partitioning on the first such column's values.
5. The query will be distributed by sorting the data on the first column and then partitioning into sequential blocks from the first record.

Type inferencing is limited by the available JDBC types. To take advantage of Kinetica-specific types and properties, create the target table manually before loading data.
`FROM FILE PATHS`: Source file specification clause, which takes a comma-separated list of single-quoted file paths from which data will be loaded; all files specified are presumed to have the same format and data types. This clause is mutually exclusive with the `FROM REMOTE QUERY` clause, and is not applicable to JDBC *data sources*.

The form of a file path is dependent on the source referenced:

* *Data Source*: If a *data source* is specified in the [load options](/content/sql/load#sql-load-file-server-load-opt), these file paths must resolve to accessible files at that *data source* location. A "path prefix" can be specified instead, which will cause all files whose path begins with the given prefix to be included.

  For example, a "path prefix" of `/data/ge` would match all of the following:

  * `/data/geo.csv`
  * `/data/geo/flights.csv`
  * `/data/geo/2021/airline.csv`

  If using an HDFS *data source*, the "path prefix" must be the name of an HDFS directory.

* [KiFS](/content/tools/kifs): The path must resolve to an accessible file path within *KiFS*. A "path prefix" can be specified instead, which will cause all files whose path begins with the given prefix to be included.

  For example, a "path prefix" of `kifs://data/ge` would match all of the following files under the *KiFS* `data` directory:

  * `kifs://data/geo.csv`
  * `kifs://data/geo/flights.csv`
  * `kifs://data/geo/2021/airline.csv`

`FORMAT`: Optional indicator of source file type, for file-based data sources; will be inferred from the file extension if not given. Supported formats include:

| Keyword | Description |
| ------- | ----------- |
| `[DELIMITED] TEXT` | Any text-based, delimited field data file (CSV, PSV, TSV, etc.); a comma-delimited list of options can be given to specify the way in which the data file(s) should be parsed, including the delimiter used, whether headers are present, etc. Records spanning multiple lines are not supported. See [Delimited Text Options](/content/sql/load#sql-load-file-server-delim-opt) for the complete list of options. |
| `AVRO` | Apache Avro data file |
| `JSON` | Either a JSON or GeoJSON data file. See [JSON/GeoJSON Limitations](/content/load_data/concepts#ingest-json-limitations) for the supported data types. |
| `PARQUET` | Apache Parquet data file. See [Parquet Limitations](/content/load_data/concepts#ingest-parquet-limitations) for the supported data types. |
| `SHAPEFILE` | ArcGIS shapefile |
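
As a minimal sketch of combining a "path prefix" with an explicit format, the statement below would load every delimited text file in *KiFS* whose path begins with `kifs://data/ge`. The target table `example.geo` is a hypothetical name, and the matched files are assumed to share the same columns, as the file paths clause requires.

```sql
-- Hypothetical target table; all files matching the prefix are
-- assumed to have the same format and data types
LOAD DATA INTO example.geo
FROM FILE PATHS 'kifs://data/ge'
FORMAT TEXT
```

Stating the format explicitly avoids relying on file-extension inference when the prefix matches files with differing extensions.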

`WITH OPTIONS`: Optional indicator that a comma-delimited list of connection & global option/value assignments will follow. See [Load Options](/content/sql/load#sql-load-file-server-load-opt) for the complete list of options.

Table property clause: Optional comma-separated list of table properties, from a subset of those available, to apply to the table to load into, if it is created by this call.

### Delimited Text Options

The following options can be specified when loading data from delimited text files. When reading from multiple files, options specific to the source file will be applied to each file being read.

`COMMENT`: Treat lines in the source file(s) that begin with `string` as comments and skip them. The default comment marker is `#`.

`DELIMITER`: Use `char` as the source file field delimiter. The default delimiter is a comma, unless a source file has one of these extensions:

* `.psv` - will cause `|` to be the delimiter
* `.tsv` - will cause the tab character to be the delimiter

See [Delimited Text Option Characters](/content/sql/load#sql-load-file-server-delim-opt-char) for allowed characters.

`ESCAPE`: Use `char` as the source file data escape character. The escape character preceding any other character, in the source data, will be converted into that other character, except in the following special cases:

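| Source Data String | Representation when Loaded into the Database |
| ------------------ | -------------------------------------------- |
| `\a`               | ASCII bell                                   |
| `\b`               | ASCII backspace                              |
| `\f`               | ASCII form feed                              |
| `\n`               | ASCII line feed                              |
| `\r`               | ASCII carriage return                        |
| `\t`               | ASCII horizontal tab                         |
| `\v`               | ASCII vertical tab                           |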

For instance, if the escape character is `\`, a `\t` encountered in the data will be converted to a tab character when stored in the database.

The escape character can be used to escape the quoting character, and will be treated as an escape character whether it is within a quoted field value or not.

There is no default escape character.

`HEADER DELIMITER`: Use `char` as the source file header field name/property delimiter, when the source file header contains both names and properties. This is largely specific to the Kinetica export to delimited text feature, which will, within each field's header, contain the field name and any associated properties, delimited by the pipe `|` character.

An example *Kinetica* header in a CSV file:

```
id|int|data,category|string|data|char16,name|string|data|char32
```

The default is the `|` (pipe) character. See [Delimited Text Option Characters](/content/sql/load#sql-load-file-server-delim-opt-char) for allowed characters.

The `DELIMITER` character will still be used to separate field name/property sets from each other in the header row.

`INCLUDES HEADER`: Declare that the source file(s) will or will not have a header. The default is `TRUE`.

`NULL`: Treat `string` as the indicator of a null source field value. The default is `\N`.

`QUOTE`: Use `char` as the source file data quoting character, for enclosing field values. Usually used to wrap field values that contain embedded delimiter characters, though any field may be enclosed in quote characters *(for clarity, for instance)*. The quote character must appear as the first and last character of a field value in order to be interpreted as quoting the value. Within a quoted value, embedded quote characters may be escaped by preceding them with another quote character or the escape character specified by `ESCAPE`, if given.

The default is the `"` (double-quote) character. See [Delimited Text Option Characters](/content/sql/load#sql-load-file-server-delim-opt-char) for allowed characters.
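
The delimited text options above can be combined in a single `FORMAT TEXT` clause. The following is a sketch only, assuming a tab-delimited file with a header row and single-quoted field values; the file name `products.txt` is hypothetical.

```sql
-- Hypothetical file name; '\t' selects the tab character as the field
-- delimiter and '''' selects the single quote as the quoting character
LOAD DATA INTO example.product
FROM FILE PATHS 'kifs://data/products.txt'
FORMAT TEXT
(
    DELIMITER = '\t',
    INCLUDES HEADER = TRUE,
    QUOTE = ''''
)
```

Because the `.txt` extension has no default delimiter other than the comma, the `DELIMITER` option is what makes the tab character take effect here.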
#### Delimited Text Option Characters

For `DELIMITER`, `HEADER DELIMITER`, `ESCAPE`, & `QUOTE`, any single character can be used, or any one of the following escaped characters:

| Escaped Char | Corresponding Source File Character |
| ------------ | ----------------------------------- |
| `''`         | Single quote                        |
| `\a`         | ASCII bell                          |
| `\b`         | ASCII backspace                     |
| `\f`         | ASCII form feed                     |
| `\t`         | ASCII horizontal tab                |
| `\v`         | ASCII vertical tab                  |

For instance, if two single quotes (`''`) are specified for a `QUOTE` character, the parser will interpret *single quotes* in the source file as *quoting* characters; specifying `\t` for `DELIMITER` will cause the parser to interpret *ASCII horizontal tab* characters in the source file as *delimiter* characters.

### Load Options

The following options can be specified to modify the way data is loaded (or not loaded) into the target table.

Name of the table containing records that failed to be loaded into the target table. This bad record table will include the following columns:

| Column Name     | Description |
| --------------- | ----------- |
| `line_number`   | Number of the line in the input file containing the failed record |
| `char_number`   | Position of character within a failed record that is assessed as the beginning of the portion of the record that failed to process |
| `filename`      | Name of file that contained the failed record |
| `line_rejected` | Text of the record that failed to process |
| `error_msg`     | Error message associated with the record processing failure |

This option is not applicable for an `ON ERROR` mode of `ABORT`. In that mode, processing stops at the first error and that error is returned to the user.

Use an ingest batch size of the given number of records. The default batch size is *50,000*.

Use the given type-specific formatting for the given column when parsing source data being loaded into that column. This should be a map of column names to format specifications, where each format specification is a map of column type to data format, all formatted as a JSON string. Supported column types include:

Apply the given date format to the given column. Common date format codes follow. For the complete list, see [Date/Time Conversion Codes](/content/sql/query#sql-datetime-conversion-codes).

| Code   | Description |
| ------ | ----------- |
| `YYYY` | 4-digit year |
| `MM`   | 2-digit month, where January is `01` |
| `DD`   | 2-digit day of the month, where the 1st of each month is `01` |

Apply the given time format to the given column. Common time format codes follow. For the complete list, see [Date/Time Conversion Codes](/content/sql/query#sql-datetime-conversion-codes).

| Code   | Description |
| ------ | ----------- |
| `HH24` | Hour on a 24-hour clock, where 12:00 AM is `00` and 7:00 PM is `19` |
| `MI`   | 2-digit minute of the hour |
| `SS`   | 2-digit second of the minute |
| `MS`   | Milliseconds |

Apply the given date/time format to the given column. For example, to load dates of the format `2010.10.30` into date column *d* and times of the 24-hour format `18:36:54.789` into time column *t*: ``` { "d": {"date": "YYYY.MM.DD"}, "t": {"time": "HH24:MI:SS.MS"} } ``` This option is not available for *data sources* configured for *JDBC*. Load data from the given [data source](/content/sql/ddl#sql-create-data-source). [Data source connect privilege](/content/sql/security#sql-security-priv-mgmt-ds-grant) is required when loading from a *data source*. Use the given formats for source data being loaded into target table columns with the corresponding column types. This should be a map of target column type to source format for data being loaded into columns of that type, formatted as a JSON string. Supported column properties and source data formats are the same as those listed in the description of the `COLUMN FORMATS` option. For example, to make the default format for loading source data dates like `2010.10.30` and 24-hour times like `18:36:54.789`: ``` { "date": "YYYY.MM.DD", "time": "HH24:MI:SS.MS", "datetime": "YYYY.MM.DD HH24:MI:SS.MS" } ``` This option is not available for *data sources* configured for *JDBC*. Choose a comma-separated list of fields from the source file(s) to ignore, loading only those fields that are not in the identified list in the order they appear in the file. Fields can be identified by either `POSITION` or `NAME`. If ignoring by `NAME`, the specified names must match the source file field names exactly. * Identifying by Name: ``` FIELDS IGNORED BY NAME(Category, Description) ``` * Identifying by Position: ``` FIELDS IGNORED BY POSITION(3, 4) ``` - When ignoring source data file fields, the set of fields that are not ignored must align, in type & number in their order in the source file, with the target table columns into which the data will be loaded. - Ignoring fields by `POSITION` is only supported for delimited text files. 
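
As an illustrative sketch of ignoring fields by name, the statement below reuses the file and table from the Column/Field Names example in this document: it assumes a header with fields `ID`, `Category`, `Name`, `Description`, & `Stock`, and a target table with columns `id`, `name`, & `stock`. Ignoring `Category` and `Description` leaves the three remaining fields aligned with the target columns.

```sql
-- Skip two source fields by name; the remaining fields (ID, Name, Stock)
-- are loaded in file order into the target table's columns
LOAD DATA INTO example.product_name_stock
FROM FILE PATHS 'kifs://data/products.title-case.csv'
WITH OPTIONS (FIELDS IGNORED BY NAME(Category, Description))
```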
Choose a comma-separated list of fields from the source file(s) to load, in the specified order, identifying fields by either `POSITION` or `NAME`. If mapping by `NAME`, the specified names must match the source file field names exactly. * Identifying by Name: ``` FIELDS MAPPED BY NAME(ID, Name, Stock) ``` * Identifying by Position: ``` FIELDS MAPPED BY POSITION(1, 2, 5) ``` - When mapping source data file fields and the target table exists, the set of fields that are identified must align, in type & number in the specified order, with the target table columns into which data will be loaded. - Mapping fields by `POSITION` is only supported for delimited text files. Specify the policy for handling nested columns within JSON data. The default is `FALSE`.

| Value   | Description |
| ------- | ----------- |
| `TRUE`  | Break up nested columns into multiple columns. |
| `FALSE` | Treat nested columns as JSON columns instead of flattening. |

Specify the error suppression policy for inserting duplicate primary key values into a table with a primary key. If the specified table does not have a primary key or the `UPDATE_ON_EXISTING_PK` option is used, then this option has no effect. The default is `FALSE`.

| Value   | Description |
| ------- | ----------- |
| `TRUE`  | Suppress errors when inserted records and existing records' PKs match. |
| `FALSE` | Return errors when inserted records and existing records' PKs match. |

Whether to do a full ingest of the data or perform a *dry run* or *type inference* instead. The default mode is `FULL`.

| Value            | Description |
| ---------------- | ----------- |
| `DRY RUN`        | No data will be inserted, but the file will be read with the applied `ON ERROR` mode and the number of valid records that would normally be inserted is returned. |
| `FULL`           | Data is fully ingested according to the active `ON ERROR` mode. |
| `TYPE INFERENCE` | Infer the type of the source data and return, without ingesting any data. The inferred type is returned in the response, as the output of a `SHOW TABLE` command. |
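
A type-inference pass can be requested in the same way as the dry run shown in the Examples section. The sketch below assumes the `products.csv` file used elsewhere in this document; no records are inserted.

```sql
-- Returns the inferred table definition instead of loading any data
LOAD DATA INTO example.product
FROM FILE PATHS 'kifs://data/products.csv'
WITH OPTIONS (INGESTION MODE = TYPE INFERENCE)
```

Reviewing the inferred column types first can help decide whether to create the target table manually before running a `FULL` load.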

Retrieve this many records at a time from the remote database. Lowering this number will help tables with large record sizes fit into available memory during ingest. The default is *50,000*. This option is only available for *data sources* configured for *JDBC*. Run the single given statement before the initial load is performed. Also run it before each subsequent reload, if `SUBSCRIBE` is `TRUE`. For example, to set the time zone to *UTC* before running each load, use: ``` JDBC_SESSION_INIT_STATEMENT = 'SET TIME ZONE ''UTC''' ``` This option is only available for *data sources* configured for *JDBC*. Number of *Kafka* consumer threads per rank; valid range is `1` through `6`. The default is `1`. Consumer group ID to use for *data sources* configured for *Kafka*. If not specified, one will be randomly generated. Enable optimistic ingestion where *Kafka* topic offsets and table data are committed independently to achieve parallelism. The default is `FALSE`. Whether to start a *Kafka* subscription at the earliest or latest offset, for *data sources* configured for *Kafka*. The default is `earliest`.

| Value      | Description |
| ---------- | ----------- |
| `earliest` | Start streaming from the oldest message in the queue. |
| `latest`   | Start streaming from the newest message in the queue. |

Optional number of minutes after which a *Kafka* subscription will be cancelled, for *data sources* configured for *Kafka*. Maximum time to collect *Kafka* messages before type inferencing on the set of them. The default is generally `30` seconds. Limit the number of records to load in this request. If the number is larger than `BATCH SIZE`, the number of records loaded will be limited to the next whole number of `BATCH SIZE` (per working thread). The number of remote query partitions to assign each *Kinetica* worker process. The queries assigned to a worker process will be executed by the tasks allotted to the process. To decrease memory pressure, increase the number of splits per rank. The default is *8* splits per rank. This option is only available for *data sources* configured for *JDBC*. The number of tasks to use on each *Kinetica* worker process to process remote queries. The tasks assigned to a worker process will execute any remote query partitions assigned to it. To decrease memory pressure, decrease the number of tasks per rank. The default is *8* tasks per rank. This option is only available for *data sources* configured for *JDBC*. When an error is encountered loading a record, handle it using either of the following modes. The default mode is `ABORT`.

| Value   | Description |
| ------- | ----------- |
| `SKIP`  | If an error is encountered parsing a source record, skip the record. |
| `ABORT` | If an error is encountered parsing a source record, stop the data load process. Primary key collisions are considered abortable errors in this mode. |

Interval, in seconds, at which a *data source* is polled for updates. The number of seconds must be passed as a single-quoted string. The default interval is *60* seconds. This option is only applicable when `SUBSCRIBE` is `TRUE`. This option is not available for *data sources* configured for *Kafka*.

Optional [primary key](/content/concepts/tables#primary-key) specification clause, where the value is a parenthesized comma-separated list of columns to use as the *primary key* for the *table*:

```
PRIMARY KEY = ('id', 'dept_id')
```

For a JDBC query *change data capture* loading scheme, the remote query column that will be used to determine whether a record is new and should be loaded or not. This column should have an ever-increasing value and be of an integral or date/timestamp type. Often, this column will be a sequence-based ID or create/modify timestamp. This option is only applicable when `SUBSCRIBE` is `TRUE`. This option is only available for *data sources* configured for *JDBC*.

Whether to retrieve the remote data with a single query, rather than distributing the retrieval and issuing queries for blocks of data at a time in parallel. The default is `FALSE`. This option is only available for *data sources* configured for *JDBC*.

| Value   | Description |
| ------- | ----------- |
| `TRUE`  | Issue the remote data retrieval as a single query. |
| `FALSE` | Distribute and parallelize the remote data retrieval in queries for blocks of data at a time. |
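
The two statements below sketch how the distribution of a remote query might be steered, reusing the JDBC *data source* and query from the Query-Based example in this document. They are alternatives, not a sequence. The use of `id` as a partition column is an assumption about the remote table, which would need a numeric, evenly distributed column for this to be effective.

```sql
-- Alternative 1: retrieve the remote data with a single, undistributed query
LOAD INTO example.employee_dept2
FROM REMOTE QUERY 'SELECT * FROM example.employee WHERE dept_id = 2'
WITH OPTIONS
(
    DATA SOURCE = 'example.jdbc_ds',
    REMOTE_QUERY_NO_SPLIT = TRUE
)

-- Alternative 2: distribute retrieval by partitioning on a numeric column
-- (assumes the remote table has a suitable "id" column)
LOAD INTO example.employee_dept2
FROM REMOTE QUERY 'SELECT * FROM example.employee WHERE dept_id = 2'
WITH OPTIONS
(
    DATA SOURCE = 'example.jdbc_ds',
    REMOTE_QUERY_PARTITION_COLUMN = 'id'
)
```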

Ordering expression to use in partitioning remote data for retrieval. The remote data will be ordered according to this expression and then retrieved in sequential blocks from the first record. This is potentially less performant than using `REMOTE_QUERY_PARTITION_COLUMN`. If `REMOTE_QUERY_NO_SPLIT` is `TRUE`, a valid `REMOTE_QUERY_PARTITION_COLUMN` is specified, or the column given is invalid, this option is ignored. This option is only available for *data sources* configured for *JDBC* Column to use to partition remote data for retrieval. The column must be numeric and should be relatively evenly distributed so that queries using values of this column to partition data will retrieve relatively consistently-sized result sets. If `REMOTE_QUERY_NO_SPLIT` is `TRUE` or the column given is invalid, this option is ignored. This option is only available for *data sources* configured for *JDBC* Name of the Avro schema in the schema registry to use when reading Avro records. This option is not available for *data sources* configured for *Kafka*. Optional [shard key](/content/concepts/tables#shard-key) specification clause, where the value is a parenthesized comma-separated list of columns to use as the *shard key* for the *table*: ``` SHARD KEY = ('id') ``` Skip this number of lines when loading from a source file. Whether to subscribe to the [data source](/content/sql/ddl#sql-create-data-source) specified in the `DATA SOURCE` option. The default is `FALSE`. This option is not available for *data sources* configured for *HDFS*.

| Value   | Description |
| ------- | ----------- |
| `TRUE`  | Stream data from the specified data source. |
| `FALSE` | Load all data once from the specified data source. |

Specify the string truncation policy for inserting text into `VARCHAR` columns that are not large enough to hold the entire text value. The default is `FALSE`.

| Value   | Description |
| ------- | ----------- |
| `TRUE`  | Truncate any inserted string value at the maximum size for its column. |
| `FALSE` | Reject any record with a string value that is too long for its column. |

Specify whether to truncate the target table before beginning the load or subscription. The default is `FALSE` (don't truncate the target table). When making a type inference of the data values in order to define column types for the target table, use one of the following modes. The default mode is `SPEED`.

| Value      | Description |
| ---------- | ----------- |
| `ACCURACY` | Scan all available data to arrive at column types that are the narrowest possible that can still hold all the data. |
| `SPEED`    | Pick the widest possible column types from the minimum data scanned in order to quickly arrive at column types that should fit all data values. |

Specify the record collision policy for inserting into a table with a primary key. If the specified table does not have a primary key, then this option has no effect. The default is `FALSE`.

| Value   | Description |
| ------- | ----------- |
| `TRUE`  | Update existing records with records being inserted, when PKs match. |
| `FALSE` | Discard records being inserted when existing records' PKs match. |
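
As a sketch of the record collision policy in use, the statement below reloads an employee file into a table that is assumed to already exist with a primary key (for instance, one created as in the Primary/Shard Keys example in this document). Incoming records whose primary key matches an existing record replace that record rather than being discarded.

```sql
-- Assumes example.employee already exists and has a primary key
LOAD DATA INTO example.employee
FROM FILE PATHS 'kifs://data/employee.csv'
WITH OPTIONS (UPDATE_ON_EXISTING_PK = TRUE)
```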

### Table Property Clause

A comma-separated list of options for creating the target table, if it doesn't exist **and** if the source data file has *Kinetica* headers, can be specified.

*Kinetica* headers contain column type information, which is used in creating the target table if it doesn't exist. An example *Kinetica* header:

```
id|int|data,category|string|data|char16,name|string|data|char32,description|string|data|char128|nullable,stock|int|data|nullable
```

```sql title="LOAD INTO Table Property Clause" theme={null}
USING TABLE PROPERTIES
(
    <table property> = <value>,
    ...
    <table property> = <value>
)
```

Size of the blocks of memory holding the data, when loaded; specified as the maximum number of bytes any one column should hold. The size of dictionary-encoded columns is estimated.

Size of the blocks of memory holding the data, when loaded; specified as the maximum total number of bytes all columns should hold. The size of dictionary-encoded columns is estimated.

Size of the blocks of memory holding the data, when loaded; specified as the maximum number of records each block of memory should hold.

The default [compression](/content/concepts/column_compression) type to apply to columns of this table not explicitly given one.

The [time-to-live (TTL)](/content/concepts/ttl) for the *table*; if not set, the *table* will not expire.

### Examples

#### File Paths

Single File

To load a CSV file of product data named `products.csv` in the *KiFS* directory `data`, into a table named `product`:

```sql title="LOAD INTO (Single File) Example" theme={null}
LOAD DATA INTO example.product
FROM FILE PATHS 'kifs://data/products.csv'
```

Multiple Files

When loading files from multiple paths, specify a comma-delimited list of those file paths. To load one file named `products.csv` and one file named `products.kh.csv`, both under the *KiFS* directory `data`, into a table named `product`:

```sql title="LOAD INTO (Multiple Files) Example" theme={null}
LOAD DATA INTO example.product
FROM FILE PATHS 'kifs://data/products.csv', 'kifs://data/products.kh.csv'
```

#### File Formats

Single File - No Header

To load a headerless CSV file of product data named `products.nh.csv` in the *KiFS* directory `data`, into a table named `product`:

```sql title="LOAD INTO (No Header) Example" theme={null}
LOAD DATA INTO example.product
FROM FILE PATHS 'kifs://data/products.nh.csv'
FORMAT TEXT (INCLUDES HEADER = FALSE)
```

Delimited Text Options

To load data from a product data file with the following options:

* data file named `products.ssv` located in the *KiFS* directory `data`
* the file has a header
* the field delimiter is a semicolon
* data requiring quotes will have single quotes as the quoting character
* the escape character is the backtick
* the data file represents *null* values as ``
* data file comments are on lines starting with `--`
* when parse errors are encountered, the record will be skipped

```sql title="LOAD INTO with Delimited Text Options Example" theme={null}
LOAD DATA INTO example.product
FROM FILE PATHS 'kifs://data/products.ssv'
FORMAT TEXT
(
    COMMENT = '--',
    DELIMITER = ';',
    ESCAPE = '`',
    INCLUDES HEADER = TRUE,
    NULL = '',
    QUOTE = ''''
)
WITH OPTIONS (ON ERROR = SKIP)
```

Kinetica Headers

A CSV file with *Kinetica* headers can be used to specify the column types of the target table when creating it if it doesn't exist. To load such a file of product data with the following conditions:

* data file named `products.kh.csv` located in the *KiFS* directory `data`
* data file contains a *Kinetica* header
* target table named `product_create_on_load`
* target table will be created if it doesn't exist

```sql title="LOAD INTO (Kinetica Header) Example" theme={null}
LOAD DATA INTO example.product_create_on_load
FROM FILE PATHS 'kifs://data/products.kh.csv'
```

Parquet File

To load a *Parquet* file of employee data, `employee.parquet`, in the *KiFS* directory `data`, into a table named `example.employee_2000`:

```sql title="LOAD INTO (Parquet) Example" theme={null}
LOAD DATA INTO example.employee_2000
FROM FILE PATHS 'kifs://data/employee.parquet'
FORMAT PARQUET
```

Kafka

To subscribe to a *Kafka* data feed with the following conditions:

* *Kafka* remote source is represented by the `kafka_ds` *data source*
* data will be continuously streamed from the *Kafka* source
* target table named `orders`

```sql title="LOAD INTO (Kafka) Example" theme={null}
LOAD DATA INTO example.orders
FORMAT JSON
WITH OPTIONS (DATA SOURCE = 'kafka_ds', SUBSCRIBE = TRUE)
```

#### Table Options

If the target table doesn't exist, the `LOAD INTO` command will create it using the given table options and data type inferencing for the column types.

See [Kinetica Headers](/content/sql/load#sql-load-file-server-examples-kh) for an example of loading a file with *Kinetica* headers to define the column types of the created table.

Primary/Shard Keys

A CSV file can be used to create a target table with primary & shard keys if it doesn't exist. To load such a file of employee data with the following conditions:

* data file named `employee.csv` located in the *KiFS* directory `data`
* data file does not contain a *Kinetica* header
* target table named `example.employee`
* target table will be created if it doesn't exist
* target table will have a composite *primary key* on the `id` & `dept_id` columns and a *shard key* on the `id` column, if it is created by this load command

```sql title="LOAD INTO (Primary/Shard Key) Example" theme={null}
LOAD INTO example.employee
FROM FILE PATHS 'kifs://data/employee.csv'
WITH OPTIONS
(
    PRIMARY KEY = (id, dept_id),
    SHARD KEY = (id)
)
```

The data must have no nulls in any columns designated as *primary key* columns.

Creation Options

Table creation options can be used to define characteristics of a table when creating it if it doesn't exist. To load a file of product data with the following conditions:

* data file named `products.csv` located in the *KiFS* directory `data`
* data file does not contain a *Kinetica* header
* target table named `product_create_options`
* target table will be created if it doesn't exist
* target table will have a chunk size of *1,000,000* records, if it is created by this load command
* target table will have a *TTL* of *5* minutes, and will be dropped if unused for any 5-minute span, if it is created by this load command

```sql title="LOAD INTO (Create Table) Example" theme={null}
LOAD DATA INTO example.product_create_options
FROM FILE PATHS 'kifs://data/products.csv'
USING TABLE PROPERTIES (CHUNK SIZE = 1000000, TTL = 5)
```

#### Column Options

Column/Field Names

To load specific fields from a product data file, with the following conditions:

* data file named `products.title-case.csv` located in the *KiFS* directory `data`
* data file contains a header with fields named `ID`, `Category`, `Name`, `Description`, & `Stock`
* target table named `product_name_stock`
* target columns named `id`, `name`, & `stock`
* only load fields `ID`, `Name`, & `Stock` into columns `id`, `name`, & `stock`
* skip records that fail to load successfully

```sql title="LOAD INTO (Column/Field Names) Example" theme={null}
LOAD DATA INTO example.product_name_stock
FROM FILE PATHS 'kifs://data/products.title-case.csv'
WITH OPTIONS (ON ERROR = SKIP, FIELDS MAPPED BY NAME(ID, Name, Stock))
```

When specifying source data file field names, the set of source data file fields selected must align, in type & number, with the target table columns into which data will be loaded.

Date/Time Column Types

To apply a specific date format to the `hire_date` field:

```sql title="LOAD INTO (Date/Time Column Types) Example" theme={null}
LOAD INTO example.employee
FROM FILE PATHS 'kifs://data/employee.csv'
WITH OPTIONS
(
    COLUMN FORMATS = '{"hire_date": {"date": "YYYY-MM-DD"}}'
)
```

#### Dry Run

To perform a dry run of a load of a CSV file, `products.csv`, in the *KiFS* directory `data`, into a table named `example.product`:

```sql title="LOAD INTO (Dry Run) Example" theme={null}
LOAD DATA INTO example.product
FROM FILE PATHS 'kifs://data/products.csv'
WITH OPTIONS (INGESTION MODE = DRY RUN)
```

The dry run will return the number of records that would have been loaded, but not actually insert any records into the target table.

#### Data Sources

##### File-Based

To load a CSV file, `products.csv`, from the *data source* `example.product_ds`, into a table named `example.product`:

```sql title="LOAD INTO (Data Source File) Example" theme={null}
LOAD DATA INTO example.product
FROM FILE PATHS 'products.csv'
WITH OPTIONS (DATA SOURCE = 'example.product_ds')
```

##### Query-Based

To load the results of a remote query of employees in department *2* from the JDBC *data source* `example.jdbc_ds`, into a local table named `example.employee_dept2`:

```sql title="LOAD INTO (Data Source Query) Example" theme={null}
LOAD INTO example.employee_dept2
FROM REMOTE QUERY 'SELECT * FROM example.employee WHERE dept_id = 2'
WITH OPTIONS (DATA SOURCE = 'example.jdbc_ds')
```

#### Change Data Capture

##### File-Based

To load a set of order data in a change data capture scheme with the following conditions:

* data pulled through a *data source*, `example.order_ds`
* data files contained within an `orders` directory
* initially, all files in the directory will be loaded; subsequently, only those files that have been updated since the last check will be reloaded
* files will be polled for updates every *5* minutes
* target table named `example.orders`
* target table will be created if it doesn't exist

```sql title="LOAD INTO (File Change Data Capture) Example" theme={null}
-- Load files in the orders directory with timestamps newer than the time of the
-- previous poll interval; poll every 5 minutes
LOAD DATA INTO example.orders
FROM FILE PATHS 'orders/'
WITH OPTIONS
(
    DATA SOURCE = 'example.order_ds',
    SUBSCRIBE = TRUE,
    POLL_INTERVAL = 300
)
```
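The selection rule described above (everything on the first pass, then only files modified since the previous poll) can be illustrated with a small Python sketch. This is a conceptual model only, not how *Kinetica* implements subscriptions: it assumes a local directory standing in for the data source, and the function name `select_files_to_load` and its `last_poll_time` parameter are hypothetical.

```python
import os


def select_files_to_load(directory, last_poll_time=None):
    """Pick the files to load for one poll cycle.

    On the first cycle (last_poll_time is None) every file is selected;
    afterwards only files whose modification time is newer than the
    previous poll time (epoch seconds) are selected.
    """
    selected = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # ignore subdirectories in this simplified model
        if last_poll_time is None or os.path.getmtime(path) > last_poll_time:
            selected.append(name)
    return selected
```

A caller would record the time of each poll and pass it as `last_poll_time` on the next cycle, which is the role the *5*-minute `POLL_INTERVAL` plays in the example above.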

##### Query-Based

To load the data from a remote query of orders in a change data capture scheme with the following conditions:

* data pulled through a *data source*, `example.jdbc_ds`
* data contained within an `example.orders` table, where only orders for the product with ID *42* will be loaded into the target table
* initially, all orders will be loaded; subsequently, only those orders with an `order_id` column value higher than the highest one on the previous poll cycle will be loaded
* remote table will be polled for updates every *60* seconds
* target table named `example.order_product42`
* target table will be created if it doesn't exist

```sql title="LOAD INTO (Query Change Data Capture) Example" theme={null}
-- Load new orders for product 42 continuously into a table
-- order_id is an ever-increasing sequence allotted to each new order
LOAD INTO example.order_product42
FROM REMOTE QUERY 'SELECT * FROM example.orders WHERE product_id = 42'
WITH OPTIONS
(
    DATA SOURCE = 'example.jdbc_ds',
    SUBSCRIBE = TRUE,
    REMOTE_QUERY_INCREASING_COLUMN = 'order_id'
)
```

## INSERT INTO ... SELECT ... FROM FILE

*Kinetica* can load data from text-based data files that are local to the client into existing tables using a modified `INSERT INTO ... SELECT` statement. This command is only available through [KiSQL](/content/tools/kisql) or database clients configured with the [Kinetica ODBC/JDBC driver](/content/connectors/sql_guide#odbc-jdbc).
When reading data from a file, *Kinetica* assumes the following format, which can be overridden using command [options](/content/sql/load#sql-load-file-client-options):

* If there is a header row, it is either:
  * a simple list of comma-delimited column names:

    ```
    id,category,name,description,stock
    ```

  * the *Kinetica* standard format:

    ```
    id|int|data,category|string|data|char16,name|string|data|char32,description|string|data|char128|nullable,stock|int|data|nullable
    ```

* The data fields are comma-delimited
* Strings are optionally enclosed in double-quotes; double-quotes must be used when the data contains commas; two consecutive double-quotes are used as an escape code for double-quoted string data containing double-quotes; e.g.:

  `"This string contains a "" quote mark and a "","" double-quoted comma."`

  This would be loaded into a single string column in *Kinetica* as:

  `This string contains a " quote mark and a "," double-quoted comma.`

```sql title="INSERT INTO ... SELECT ... FROM FILE Syntax" theme={null}
INSERT INTO [<schema name>.]<table name> [(<column list>)]
SELECT <field list>
FROM FILE."<file name>"
[<option list>]
```

### Options

The following options can be specified when loading data from files. When reading from multiple files (using wildcards when specifying the file names), options specific to the source file will be applied to each file being read.

`BATCH_SIZE=n` - Use an ingest batch size of `n` records. The default batch size is *10,000*.

`COMMENT='string'` - Treat lines in the source file that begin with `string` as comments and skip them. The default comment marker is `#`.

`DELIMITER='char'` - Use `char` as the source file field delimiter. The default delimiter is `,`, unless the source file has one of these extensions:

* `.psv` - will cause `|` to be the delimiter
* `.tsv` - will cause the tab character to be the delimiter

See [Option Characters](/content/sql/load#sql-load-file-client-option-chars) for allowed characters.

`ESCAPE='char'` - Use `char` as the source file data escape character. The escape character preceding any other character, in the source data, will be converted into that other character, except in the following special cases:

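| Source Data String | Representation when Loaded into the Database |
| ------------------ | -------------------------------------------- |
| `\a`               | ASCII bell                                   |
| `\b`               | ASCII backspace                              |
| `\f`               | ASCII form feed                              |
| `\n`               | ASCII line feed                              |
| `\r`               | ASCII carriage return                        |
| `\t`               | ASCII horizontal tab                         |
| `\v`               | ASCII vertical tab                           |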
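The escape rule and its special cases can be sketched in Python as follows. This only illustrates the behavior described in this section and is not *Kinetica*'s parser. The names `unescape` and `SPECIAL_CASES` are hypothetical, and two details are assumptions made for the example: that the special cases apply to whichever escape character is configured, and that a lone escape character at the end of a value is left untouched.

```python
# Special cases from the table: escape character + letter -> control character.
SPECIAL_CASES = {
    "a": "\a",  # ASCII bell
    "b": "\b",  # ASCII backspace
    "f": "\f",  # ASCII form feed
    "n": "\n",  # ASCII line feed
    "r": "\r",  # ASCII carriage return
    "t": "\t",  # ASCII horizontal tab
    "v": "\v",  # ASCII vertical tab
}


def unescape(value, esc="\\"):
    """Apply the escape rule to one field value (illustrative only)."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == esc and i + 1 < len(value):
            following = value[i + 1]
            # Special cases become control characters; any other character
            # following the escape character is kept as that character.
            out.append(SPECIAL_CASES.get(following, following))
            i += 2
        else:
            # Ordinary character, or a trailing lone escape (assumption: kept).
            out.append(ch)
            i += 1
    return "".join(out)
```

For instance, `unescape(r"50\% off")` yields `50% off`, while `unescape(r"a\tb")` yields `a`, a horizontal tab, then `b`.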

`ON_ERROR=mode` - When an error is encountered loading a record, handle it using one of the following modes:

| Mode         | Description                                                                                                                                          |
| ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| `PERMISSIVE` | If an error is encountered parsing a source record, attempt to insert as many of the valid fields from the record as possible.                       |
| `SKIP`       | If an error is encountered parsing a source record, skip the record.                                                                                 |
| `ABORT`      | If an error is encountered parsing a source record, stop the data load process. Primary key collisions are considered abortable errors in this mode. |

The default mode is `SKIP`.

`QUOTE='char'` - Use `char` as the source file data quoting character, for enclosing field values. Usually used to wrap field values that contain embedded delimiter characters, though any field may be enclosed in quote characters *(for clarity, for instance)*. The quote character must appear as the first and last character of a field value in order to be interpreted as quoting the value. Within a quoted value, embedded quote characters may be escaped by preceding them with another quote character or the escape character specified by `ESCAPE`, if given. The default is the `"` (double-quote) character. See [Option Characters](/content/sql/load#sql-load-file-client-option-chars) for allowed characters.

`SKIP=n` - Skip the first `n` source file data lines read, not including header and comment lines. If reading from multiple source files, this count is assessed across all files being read. For example, if `n` is *15* and the first two files read have *10* records each, the first record to be loaded (not skipped) will be the *6th* record of the *2nd* file. The default is to skip no records.

### Option Characters

For `DELIMITER`, `ESCAPE`, & `QUOTE`, any single character can be used, or any one of the following escaped characters:

| Escaped Char | Corresponding Source File Character |
| ------------ | ----------------------------------- |
| `''`         | Single quote                        |
| `\\`         | Backslash                           |
| `\a`         | ASCII bell                          |
| `\b`         | ASCII backspace                     |
| `\f`         | ASCII form feed                     |
| `\t`         | ASCII horizontal tab                |
| `\v`         | ASCII vertical tab                  |

For instance, if two single quotes (`''`) are specified for a `QUOTE` character, the parser will interpret *single quotes* in the source file as *quoting* characters; specifying `\t` for `DELIMITER` will cause the parser to interpret *ASCII horizontal tab* characters in the source file as *delimiter* characters.

### Examples

#### File Paths

##### Single File

To load a CSV file of product data located at `../data/products.csv` into a table named `product`:

```sql title="INSERT INTO ... SELECT ... FROM FILE (Single File) Example" theme={null}
INSERT INTO example.product
SELECT *
FROM FILE."../data/products.csv"
```

##### Multiple Files

To load all files whose names begin with `products` and end in a `csv` extension, located under `../data`, into a table named `product`:

```sql title="INSERT INTO ... SELECT ... FROM FILE (Multiple File) Example" theme={null}
INSERT INTO example.product
SELECT *
FROM FILE."../data/products*csv"
```

#### Options

##### Column/Field Names

To load specific fields from a product data file located at `../data/products.csv` into specific target table columns:

```sql title="INSERT INTO ... SELECT ... FROM FILE (Column/Field Names) Example" theme={null}
INSERT INTO example.product (id, stock, category, name)
SELECT ID, Stock, Category, Name
FROM FILE."../data/products.csv"
```

When specifying source data file field names, it is not required that target table column names also be specified, or vice versa. The only requirement is that the set of source data file fields selected align with the target table columns into which data will be inserted.

##### Load Options

To load data from a product data file with the following options:

* file located at `../data/products.csv`
* only records *4* through *10* will be loaded
* any error will cause the insert to fail
* the target table will be truncated before loading
* records will be inserted into the table in groups of *2*

```sql title="INSERT INTO ... SELECT ... FROM FILE with Load Options Example" theme={null}
INSERT INTO example.product
SELECT *
FROM FILE."../data/products.csv"
    BATCH_SIZE=2
    INITIAL_CLEAR
    LIMIT=7
    ON_ERROR=abort
    SKIP=3
```

To load data from a product data file with the following options:

* file located at `../data/products.ssv`
* the file has a header
* the field delimiter is a semicolon
* data requiring quotes will have single quotes as the quoting character
* the escape character is the backtick
* the data file represents *null* values as ``
* data file comments are on lines starting with `--`

```sql title="INSERT INTO ... SELECT ... FROM FILE with Parse Options Example" theme={null}
INSERT INTO example.product
SELECT *
FROM FILE."../data/products.ssv"
    COMMENT='--'
    DELIMITER=';'
    ESCAPE='`'
    HEADER=Y
    NULL=''
    QUOTE='\''
```

### Considerations

*Multi-Head*

* If [Multi-Head Ingest](/content/tuning/multihead/multihead_ingest) has been [enabled on the database server](/content/tuning/multihead/multihead_ingest#multi-head-ingest-configuration), file loading through ODBC/JDBC will automatically leverage it, when [applicable](/content/tuning/multihead/multihead_ingest#multi-head-ingest-considerations).

*Blank Lines*

* A blank line in a source data file will be treated as a valid record and will be inserted into the target table as a record with all null fields. If blank lines should not be considered as data, they should be prefixed with the comment marker in order to be skipped by the loading process.
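One way to apply the blank-line advice before loading is to preprocess the file on the client. The following Python sketch is not part of *Kinetica*; the function name `comment_out_blank_lines` is hypothetical, and it assumes the default comment marker `#` unless another is passed. Only completely empty lines are treated as blank; whether whitespace-only lines should also be commented out is left as an open choice.

```python
def comment_out_blank_lines(lines, marker="#"):
    """Return a copy of lines in which empty lines carry the comment marker.

    Each input line is expected to keep its line ending, as produced by
    str.splitlines(keepends=True) or by iterating over an open file.
    """
    result = []
    for line in lines:
        body = line.rstrip("\r\n")
        if body == "":
            # Preserve the original line ending after the marker.
            ending = line[len(body):]
            result.append(marker + ending)
        else:
            result.append(line)
    return result


source = "id,name\n1,Widget\n\n2,Gadget\n"
cleaned = "".join(comment_out_blank_lines(source.splitlines(keepends=True)))
print(cleaned)
```

If the load uses a non-default `COMMENT` option, the same string would be passed as `marker`.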
*GAdmin*

* While [GAdmin](/content/admin/gadmin/query#gadmin-sql) does not support this command, it does support [importing data](/content/admin/gadmin/data#import-data) from client-local files directly.

[/execute/sql](/content/api/rest/execute_sql_rest) *endpoint*

* While SQL file ingestion is available to ODBC/JDBC clients, it is not supported within the [/execute/sql](/content/api/rest/execute_sql_rest) endpoint, either as a REST call directly or as an API call (like the *Python* API's `execute_sql()` function).