| bad_record_table_name | Name of a table to which records that
were rejected are written. The
bad-record-table has the following
columns: line_number (long),
line_rejected (string), error_message
(string). When error_handling is
abort, the bad-record-table is not
populated. |
| bad_record_table_limit | A positive integer indicating the
maximum number of records that can be
written to the bad-record-table. The default value is 10000. |
| bad_record_table_limit_per_input | For subscriptions, a positive integer
indicating the maximum number of records
that can be written to the
bad-record-table per file/payload.
The default value is the value of
bad_record_table_limit; the total size
of the table per rank is also limited to
bad_record_table_limit. |
| batch_size | Number of records to insert per batch
when inserting data. The default value is 50000. |
| column_formats | For each target column specified,
applies the column-property-bound format
to the source data loaded into that
column. Each column format will contain
a mapping of one or more of its column
properties to an appropriate format for
each property. Currently supported
column properties include date, time,
and datetime. The parameter value must
be formatted as a JSON string of maps of
column names to maps of column
properties to their corresponding column
formats, e.g., '{ "order_date" : {
"date" : "%Y.%m.%d" }, "order_time" : {
"time" : "%H:%M:%S" } }'. See default_column_formats for valid
format syntax. |
| columns_to_load | Specifies a comma-delimited list of
columns from the source data to load.
If more than one file is being loaded,
this list applies to all files. Column numbers can be specified
discretely or as a range. For example,
a value of '5,7,1..3' will insert values
from the fifth column in the source data
into the first column in the target
table, from the seventh column in the
source data into the second column in
the target table, and from the first
through third columns in the source data
into the third through fifth columns in
the target table. If the source data contains a header,
column names matching the file header
names may be provided instead of column
numbers. If the target table doesn't
exist, the table will be created with
the columns in this order. If the
target table does exist with columns in
a different order than the source data,
this list can be used to match the order
of the target table. For example, a
value of 'C, B, A' will create a
three-column table with column C, followed by
column B, followed by column A; or will
insert those fields in that order into a
table created with columns in that
order. If the target table exists, the
column names must match the source data
field names for a name-mapping to be
successful. Mutually exclusive with
columns_to_skip. |
| columns_to_skip | Specifies a comma-delimited list of
columns from the source data to skip.
Mutually exclusive with
columns_to_load. |
| compression_type | Source data compression type. The default value is auto.
Supported values: none (no compression),
auto (auto-detect the compression type),
gzip (gzip file compression),
bzip2 (bzip2 file compression). |
| datasource_name | Name of an existing external data source
from which data file(s) specified in
input parameter filepaths will be
loaded. |
| default_column_formats | Specifies the default format to be
applied to source data loaded into
columns with the corresponding column
property. Currently supported column
properties include date, time, and
datetime. This default
column-property-bound format can be
overridden by specifying a column
property and format for a given target
column in column_formats. For each
specified annotation, the format will
apply to all columns with that
annotation unless a custom
column_formats for that annotation is
specified. The parameter value must be formatted as
a JSON string that is a map of column
properties to their respective column
formats, e.g., '{ "date" : "%Y.%m.%d",
"time" : "%H:%M:%S" }'. Column formats
are specified as a string of control
characters and plain text. The supported
control characters are 'Y', 'm', 'd',
'H', 'M', and 'S', which follow the
Linux 'strptime()' specification, as
well as 's', which specifies seconds and
fractional seconds (though the
fractional component will be truncated
past milliseconds). Formats for the 'date' annotation must
include the 'Y', 'm', and 'd' control
characters. Formats for the 'time'
annotation must include the 'H', 'M',
and either 'S' or 's' (but not both)
control characters. Formats for the
'datetime' annotation must meet both the
'date' and 'time' control character
requirements. For example, '{"datetime"
: "%m/%d/%Y %H:%M:%S" }' would be used
to interpret text as "05/04/2000
12:12:11". |
| datalake_catalog | Name of an existing data lake (Iceberg)
catalog used in loading files. |
| datalake_path | Path of the data lake (Iceberg) object. |
| datalake_snapshot | Snapshot ID of the data lake (Iceberg) object. |
| error_handling | Specifies how errors should be handled
upon insertion. The default value is abort.
Supported values: permissive (records with missing columns are
populated with nulls if possible;
otherwise, the malformed records are
skipped), ignore_bad_records (malformed records are skipped),
abort (stops the current insertion and aborts
the entire operation when an error is
encountered; primary key collisions are
considered abortable errors in this
mode). |
| external_table_type | Specifies whether the external table
holds a local copy of the external data. The default value is materialized.
Supported values: materialized (loads a copy of the external data into
the database, refreshed on demand),
logical (external data is not loaded into
the database; the data is retrieved
from the source upon servicing each
query against the external table). |
| file_type | Specifies the type of the file(s) whose
records will be inserted. The default value is delimited_text.
Supported values: avro (Avro file format),
delimited_text (delimited text file format; e.g., CSV,
TSV, PSV), gdb (Esri/GDB file format),
json (JSON file format), parquet (Apache Parquet file format),
shapefile (ShapeFile file format). |
| flatten_columns | Specifies how to handle nested columns. The default value is false.
Supported values: true (break up nested columns into multiple
columns), false (treat nested columns as JSON columns
instead of flattening them). |
| gdal_configuration_options | Comma-separated list of GDAL
configuration options for this request,
each in key=value form. |
| ignore_existing_pk | Specifies the record collision
error-suppression policy for inserting
into a table with a
primary key,
only used when not in upsert mode
(upsert mode is disabled when
update_on_existing_pk is false). If
set to true, any record being inserted
that is rejected for having primary key
values that match those of an existing
table record will be ignored with no
error generated. If false, the
rejection of any record for having
primary key values matching an existing
record will result in an error being
reported, as determined by
error_handling. If the specified
table does not have a primary key or if
upsert mode is in effect
(update_on_existing_pk is true),
then this option has no effect. The default value is false.
Supported values: true (ignore new records whose primary key
values collide with those of existing
records), false (treat as errors any new records whose
primary key values collide with those of
existing records). |
| ingestion_mode | Whether to do a full load, dry run, or
perform a type inference on the source
data. The default value is full.
Supported values: full (run a type inference on the source data,
if needed, and ingest), dry_run (does not load data, but walks through
the source data and determines the
number of valid records, taking into
account the current mode of
error_handling), type_inference_only (infer the type of the source data and
return without ingesting any data; the
inferred type is returned in the
response). |
| jdbc_fetch_size | The JDBC fetch size, which determines
how many rows to fetch per round trip. The default value is 50000. |
| kafka_consumers_per_rank | Number of Kafka consumer threads per
rank (valid range 1-6). The default value is 1. |
| kafka_group_id | The group id to be used when consuming
data from a Kafka topic (valid only for
Kafka datasource subscriptions). |
| kafka_offset_reset_policy | Policy to determine whether the Kafka
data consumption starts either at
earliest offset or latest offset. The default value is earliest. The supported values are earliest and latest. |
| kafka_optimistic_ingest | Enable optimistic ingestion where Kafka
topic offsets and table data are
committed independently to achieve
parallelism. The default value is false. The supported values are true and false. |
| kafka_subscription_cancel_after | Sets the Kafka subscription lifespan (in
minutes). An expired subscription is
cancelled automatically. |
| kafka_type_inference_fetch_timeout | Maximum time to collect Kafka messages
before running type inference on
them. |
| layer | Comma-separated list of geo file
layer names. |
| loading_mode | Scheme for distributing the extraction
and loading of data from the source data
file(s). This option applies only when
loading files that are local to the
database. The default value is head.
Supported values are head, distributed_shared, and distributed_local.
With head, the head node loads all data; all files
must be available to the head node.
With distributed_shared, the head node coordinates loading data
by worker processes across all nodes
from shared files available to all
workers. NOTE: Instead of existing on a shared source,
the files can be duplicated on a source
local to each host to improve
performance, though the files must
appear as the same data set from the
perspective of all hosts performing the
load. With distributed_local, a single worker process on each node
loads all files that are available to
it. This option works best when each
worker loads files from its own file
system, to maximize performance. To
avoid data duplication, either
each worker performing the load needs to
have visibility to a set of files unique
to it (no file is visible to more than
one node) or the target table needs to
have a primary key (which will allow the
worker to automatically deduplicate
data). NOTE: If the target table doesn't exist, the
table structure will be determined by
the head node. If the head node has no
files local to it, it will be unable to
determine the structure and the request
will fail. If the head node is configured to have
no worker processes, no data strictly
accessible to the head node will be
loaded. |
| local_time_offset | Apply an offset to Avro local timestamp
columns. |
| max_records_to_load | Limits the number of records to load in
this request. If this number is larger
than batch_size, the number of
records loaded will be limited to the
next whole multiple of batch_size (per
working thread). |
| num_tasks_per_rank | Number of tasks used to read files per
rank. The default is the value of the system
configuration parameter
external_file_reader_num_tasks. |
| poll_interval | If subscribe is true, the number of seconds between
attempts to load external files into the
table. If zero, polling will be
continuous as long as data is found. If
no data is found, the interval will
steadily increase to a maximum of 60
seconds. The default value is 0. |
| primary_keys | Comma separated list of column names to
set as primary keys, when not specified
in the type. |
| refresh_method | Method by which the table can be
refreshed from its source data. The default value is manual.
Supported values: manual (refresh occurs only when manually
requested by invoking the refresh action
of /alter/table on this table),
on_start (refresh the table on database startup and
when manually requested by invoking the
refresh action of /alter/table
on this table). |
| schema_registry_connection_retries | Number of Confluent Schema Registry
connection retries. |
| schema_registry_connection_timeout | Confluent Schema Registry connection
timeout (in seconds). |
| schema_registry_max_consecutive_connection_failures | Maximum number of records to skip due to
Schema Registry connection failures before failing. |
| max_consecutive_invalid_schema_failure | Maximum number of records to skip due to
schema-related errors before failing. |
| schema_registry_schema_name | Name of the Avro schema in the schema
registry to use when reading Avro
records. |
| shard_keys | Comma separated list of column names to
set as shard keys, when not specified in
the type. |
| skip_lines | Skip a number of lines from the
beginning of the file. |
| start_offsets | Starting offsets by partition to fetch
from Kafka, given as a comma-separated list of
partition:offset pairs. |
| subscribe | Continuously poll the data source to
check for new data and load it into the
table. The default value is false. The supported values are true and false. |
| table_insert_mode | Insertion scheme to use when inserting
records from multiple shapefiles. The default value is single.
Supported values: single (insert all records into a single table),
table_per_file (insert records from each file into a new
table corresponding to that file). |
| text_comment_string | Specifies the character string that
should be interpreted as a comment line
prefix in the source data. All lines in
the data starting with the provided
string are ignored. For delimited_text file_type only. The default value is #. |
| text_delimiter | Specifies the character delimiting field
values in the source data and field
names in the header (if present). For delimited_text file_type only. The default value is ,. |
| text_escape_character | Specifies the character that is used to
escape other characters in the source
data. An 'a', 'b', 'f', 'n', 'r', 't', or 'v'
preceded by an escape character will be
interpreted as the ASCII bell,
backspace, form feed, line feed,
carriage return, horizontal tab, and
vertical tab, respectively. For
example, the escape character followed
by an 'n' will be interpreted as a
newline within a field value. The escape character can also be used to
escape the quoting character, and will
be treated as an escape character
whether it is within a quoted field
value or not. For delimited_text file_type only. |
| text_has_header | Indicates whether the source data
contains a header row. For delimited_text file_type only. The default value is true. The supported values are true and false. |
| text_header_property_delimiter | Specifies the delimiter for
column properties
in the header row (if present). Cannot
be set to same value as
text_delimiter. For delimited_text file_type only. The default value is \|. |
| text_null_string | Specifies the character string that
should be interpreted as a null value in
the source data. For delimited_text file_type only. The default value is \N. |
| text_quote_character | Specifies the character that should be
interpreted as a field value quoting
character in the source data. The
character must appear at beginning and
end of field value to take effect.
Delimiters within quoted fields are
treated as literals and not delimiters.
Within a quoted field, two consecutive
quote characters will be interpreted as
a single literal quote character,
effectively escaping it. To not have a
quote character, specify an empty
string. For delimited_text file_type only. The default value is ". |
| text_search_columns | Adds the 'text_search' property to internally
inferred string columns. Comma-separated
list of column names or '*'
for all columns. To add the 'text_search'
property only to string columns greater
than or equal to a minimum size, also
set text_search_min_column_length. |
| text_search_min_column_length | Set the minimum column size for strings
to apply the 'text_search' property to.
Used only when text_search_columns has
a value. |
| trim_space | If set to true, remove leading or
trailing space from fields. The default value is false. The supported values are true and false. |
| truncate_strings | If set to true, truncate string values
that are longer than the column's type
size. The default value is false. The supported values are true and false. |
| truncate_table | If set to true, truncates the table
specified by input parameter
table_name prior to loading the
file(s). The default value is false. The supported values are true and false. |
| type_inference_max_records_read | |
| type_inference_mode | Optimize type inferencing for either
speed or accuracy. The default value is speed.
Supported values: accuracy (scans data to get exactly-typed and
sized columns for all data scanned),
speed (scans data and picks the widest possible
column types so that 'all' values will
fit with minimum data scanned). |
| remote_query | Remote SQL query from which data will be
sourced. |
| remote_query_filter_column | Name of column to be used for splitting
remote_query into multiple sub-queries
using the data distribution of given
column. |
| remote_query_increasing_column | Column on subscribed remote query result
that will increase for new records
(e.g., TIMESTAMP). |
| remote_query_partition_column | Alias name for
remote_query_filter_column. |
| update_on_existing_pk | Specifies the record collision policy
for inserting into a table with a
primary key.
If set to true, any existing table
record with primary key values that
match those of a record being inserted
will be replaced by that new record (the
new data will be 'upserted'). If set to
false, any existing table record with
primary key values that match those of a
record being inserted will remain
unchanged, while the new record will be
rejected and the error handled as
determined by ignore_existing_pk and
error_handling. If the specified
table does not have a primary key, then
this option has no effect. The default value is false.
Supported values: true (upsert new records when primary keys
match existing records), false (reject new records when primary keys
match existing records). |
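
As an illustration of the columns_to_load, column_formats, and default_column_formats options described above, the following Python sketch expands a numeric columns_to_load value into the ordered source column numbers it selects, checks the control-character requirements for each format, and serializes the two format options as JSON strings. The helper names (expand_columns_to_load, check_format, build_options), the reject table name, and the choice to return a plain dict of strings are assumptions made for this example. They are not part of the documented API, and the resulting map would still need to be passed to whichever client call accepts these options.

```python
import json
from typing import Dict, List

# Control characters each column property must contain, per the
# default_column_formats description ('S' vs. 's' is checked separately).
REQUIRED = {"date": "Ymd", "time": "HM", "datetime": "YmdHM"}


def expand_columns_to_load(spec: str) -> List[int]:
    """Expand a numeric columns_to_load value such as '5,7,1..3' into the
    ordered list of 1-based source column numbers it selects."""
    columns: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if ".." in part:
            first, last = (int(p) for p in part.split(".."))
            columns.extend(range(first, last + 1))
        else:
            columns.append(int(part))
    return columns


def check_format(column_property: str, fmt: str) -> None:
    """Raise ValueError if fmt lacks the control characters the property needs."""
    if column_property not in REQUIRED:
        raise ValueError(f"unsupported column property: {column_property}")
    missing = [c for c in REQUIRED[column_property] if "%" + c not in fmt]
    if missing:
        raise ValueError(f"{column_property} format {fmt!r} is missing {missing}")
    # time and datetime need either %S or %s, but not both.
    if column_property != "date" and ("%S" in fmt) == ("%s" in fmt):
        raise ValueError(f"{column_property} format needs exactly one of %S or %s")


def build_options(columns_to_load: str,
                  column_formats: Dict[str, Dict[str, str]],
                  default_column_formats: Dict[str, str]) -> Dict[str, str]:
    """Assemble a hypothetical options map; format options become JSON strings."""
    for properties in column_formats.values():
        for column_property, fmt in properties.items():
            check_format(column_property, fmt)
    for column_property, fmt in default_column_formats.items():
        check_format(column_property, fmt)
    return {
        "columns_to_load": columns_to_load,
        "column_formats": json.dumps(column_formats),
        "default_column_formats": json.dumps(default_column_formats),
        "error_handling": "ignore_bad_records",
        "bad_record_table_name": "order_rejects",  # hypothetical table name
    }


print(expand_columns_to_load("5,7,1..3"))  # [5, 7, 1, 2, 3]

options = build_options(
    "5,7,1..3",
    {"order_date": {"date": "%Y.%m.%d"}, "order_time": {"time": "%H:%M:%S"}},
    {"datetime": "%m/%d/%Y %H:%M:%S"},
)
print(options["column_formats"])
```

In the expanded list, the value at position i is the source column loaded into the i-th target column, which matches the '5,7,1..3' example in the columns_to_load description.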