> ## Documentation Index
> Fetch the complete documentation index at: https://docs.kinetica.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Load Data from GCS

> Copy and paste tutorial for loading data from Google Cloud Storage

<a id="load-data-by-source-gcs" />

Loading data from Google Cloud Storage can be done in three easy steps:

1. [Create a credential](/content/load_data/by_source/gcs#load-from-gcs-credential) - holds GCS
   account authentication information
2. [Create a data source](/content/load_data/by_source/gcs#load-from-gcs-data-source) - holds
   GCS connection information; uses the *credential* for authentication
3. [Ingest the data](/content/load_data/by_source/gcs#load-from-gcs-ingest) - Load data from
   GCS; uses the *data source* to identify the GCS source

<a id="load-from-gcs-credential" />

## Create Credential

To store the access details for your *data source*, first create a *credential*
with the [CREATE CREDENTIAL](/content/sql/ddl#sql-create-credential) command, and
reference it in your `CREATE DATA SOURCE` statement.  This will allow you to
store the authentication details of your connection separately.

If connecting to a public GCS bucket that allows anonymous access, this step can
be skipped.

<CodeGroup>
  ```sql User ID & Key theme={null}
  CREATE CREDENTIAL gcs_cred
  TYPE = 'gcs_service_account_id',
  IDENTITY = 'auser@auser.iam.gserviceaccount.com',
  SECRET = '-----BEGIN PRIVATE KEY-----\nABCDEFG=\n-----END PRIVATE KEY-----\n'
  ```

  ```sql JSON Key theme={null}
  CREATE CREDENTIAL gcs_cred
  TYPE = 'gcs_service_account_keys'
  WITH OPTIONS
  (
  	GCS_SERVICE_ACCOUNT_KEYS = '
  	{
  		"type": "service_account",
  		"project_id": "auser",
  		"private_key_id": "abcdef1234567890",
  		"private_key": "-----BEGIN PRIVATE KEY-----\nABCDEFG=\n-----END PRIVATE KEY-----\n",
  		"client_email": "auser@auser.iam.gserviceaccount.com",
  		"client_id": "1234567890",
  		"auth_uri": "https://accounts.google.com/o/oauth2/auth",
  		"token_uri": "https://oauth2.googleapis.com/token",
  		"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  		"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/auser%40auser.iam.gserviceaccount.com"
  	}'
  )
  ```
</CodeGroup>

<a id="load-from-gcs-data-source" />

## Create Data Source

Next, create a *data source* using the
[CREATE DATA SOURCE](/content/sql/ddl#sql-create-data-source) command in Kinetica.
The *data source* defines how Kinetica connects to Google Cloud Storage.

<CodeGroup>
  ```sql Credential theme={null}
  CREATE DATA SOURCE gcs_ds
  LOCATION = 'GCS'
  WITH OPTIONS
  (
  	CREDENTIAL = 'gcs_cred',
  	GCS_BUCKET_NAME = 'gcs-public'
  )
  ```

  ```sql Managed Credentials theme={null}
  CREATE DATA SOURCE gcs_ds
  LOCATION = 'GCS'
  WITH OPTIONS
  (
  	USE_MANAGED_CREDENTIALS = true,
  	GCS_BUCKET_NAME = 'gcs-public'
  )
  ```

  ```sql Public (No Auth) theme={null}
  CREATE DATA SOURCE gcs_ds
  LOCATION = 'GCS'
  WITH OPTIONS
  (
  	GCS_BUCKET_NAME = 'gcs-public'
  )
  ```

  ```sql User ID & Key theme={null}
  CREATE DATA SOURCE gcs_ds
  LOCATION = 'GCS'
  USER = 'auser@auser.iam.gserviceaccount.com'
  PASSWORD = '-----BEGIN PRIVATE KEY-----\nABCDEFG=\n-----END PRIVATE KEY-----\n'
  WITH OPTIONS
  (
  	GCS_BUCKET_NAME = 'gcs-public'
  )
  ```

  ```sql JSON Key theme={null}
  CREATE DATA SOURCE gcs_ds
  LOCATION = 'GCS'
  WITH OPTIONS
  (
  	GCS_SERVICE_ACCOUNT_KEYS = '
  	{
  		"type": "service_account",
  		"project_id": "auser",
  		"private_key_id": "abcdef1234567890",
  		"private_key": "-----BEGIN PRIVATE KEY-----\nABCDEFG=\n-----END PRIVATE KEY-----\n",
  		"client_email": "auser@auser.iam.gserviceaccount.com",
  		"client_id": "1234567890",
  		"auth_uri": "https://accounts.google.com/o/oauth2/auth",
  		"token_uri": "https://oauth2.googleapis.com/token",
  		"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  		"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/auser%40auser.iam.gserviceaccount.com"
  	}',
  	GCS_BUCKET_NAME = 'gcs-public'
  )
  ```
</CodeGroup>

<a id="load-from-gcs-ingest" />

## Ingest Data

To initiate the ingest of data into a Kinetica table, use the
[LOAD INTO](/content/sql/load#sql-load-into) command. You can initiate a one-time
batch load, or a continuous load using change-data-capture by using the option
`SUBSCRIBE = TRUE`.

<CodeGroup>
  ```sql Batch Load theme={null}
  LOAD DATA INTO example.orders
  FROM FILE PATHS 'example_orders.csv'
  FORMAT TEXT
  WITH OPTIONS (DATA SOURCE = 'gcs_ds')
  ```

  ```sql Streaming Load theme={null}
  LOAD DATA INTO example.orders
  FROM FILE PATHS 'example_orders.csv'
  FORMAT TEXT
  WITH OPTIONS (
      DATA SOURCE = 'gcs_ds',
      SUBSCRIBE = TRUE
  )
  ```
</CodeGroup>

## Considerations

The following are some common options used when loading.  For the full list of
options, see [LOAD INTO](/content/sql/load#sql-load-into).  For copy/paste examples of many of
the options, see [Loading Data](/content/snippets/load-data).

### Error Handling

Kinetica has two different
[error handling modes](/content/sql/load#sql-load-file-server-load-opt) for dealing
with erroneous data. To halt ingestion after a bad record is found, use the
`ABORT` mode. To skip erroneous records and continue the ingest, use the
`SKIP` mode.

To inspect erroneous records, you may use the
[BAD RECORD TABLE NAME](/content/sql/load#sql-load-file-server-load-opt) option.
All bad records encountered will be stored there for review. The bad records
table is limited to 10,000 records by default and may be overridden using the
[BAD RECORD TABLE LIMIT](/content/sql/load#sql-load-file-server-load-opt) option.

<CodeGroup>
  ```sql Abort theme={null}
  LOAD INTO ki_home.error_example
  FROM FILE PATHS 'error_example.csv'
  FORMAT TEXT
  WITH OPTIONS (
      DATA SOURCE = 's3_ds',
      ON ERROR = ABORT
  )
  ```

  ```sql Log Bad Records theme={null}
  LOAD INTO ki_home.error_example
  FROM FILE PATHS 'error_example.csv'
  FORMAT TEXT
  WITH OPTIONS (
      DATA SOURCE = 's3_ds',
      BAD RECORD TABLE NAME = 'error_example_invalid',
      ON ERROR = SKIP
  )
  ```
</CodeGroup>

### Load Specific Columns

In some cases, you may only want to store certain columns from your source data.
Use the [FIELDS MAPPED BY NAME(...)](/content/sql/load#sql-load-file-server-load-opt)
option, which allows you to specify the desired fields to store in Kinetica.

```sql title="Load Specific Columns Example" theme={null}
LOAD DATA INTO example.orders
FROM FILE PATHS 'example_orders.csv'
FORMAT TEXT
WITH OPTIONS (
    DATA SOURCE = 's3_ds',
    FIELDS MAPPED BY NAME(ID, Name, Product_ID, Quantity)
)
```

### DateTime Formatting

Use the  [COLUMN FORMATS](/content/sql/load#sql-load-file-server-load-opt) option
to format date and time fields into Kinetica Date, Time, and DateTime columns.
Time formats are specified using a JSON formatted string. Non-placeholder characters
must be wrapped in quotes, which must also be escaped (e.g.
`'{"dt": {"date": "\"(\"YYYY\")\" Mon, DD"}}'`). Alternatively, you can use
the `?` character as a wildcard character. Note that
Kinetica does not handle or store timezones and they will be discarded.
See the full list of supported
[date and time format codes](/content/sql/query#sql-datetime-conversion-codes).

<CodeGroup>
  ```sql ISO 8601 Timestamps theme={null}
  -- Example Data:
  -- dt
  -- 2022-01-19T15:50:42Z+05:00

  LOAD INTO ki_home.timestamp_example
  FROM FILE PATHS 'timestamp_example.csv'
  FORMAT TEXT
  WITH OPTIONS (
      DATA SOURCE = 's3_ds',
      COLUMN FORMATS = '
      {
          "dt": {"datetime": "YYYY-MM-DD?HH:MI:SS"}
      }'
  )
  ```

  ```sql Custom Date theme={null}
  -- Example Data:
  -- d
  -- (2022) Feb, 22

  LOAD INTO ki_home.date_example
  FROM FILE PATHS 'date_example.csv'
  FORMAT TEXT
  WITH OPTIONS (
      DATA SOURCE = 's3_ds',
      COLUMN FORMATS = '{"d": {"date": "\"(\"YYYY\")\" Mon, DD"}}'
  )
  ```

  ```sql Custom Time theme={null}
  -- Example Data:
  -- t
  -- 18-27-59.5536

  LOAD INTO ki_home.time_example
  FROM FILE PATHS 'time_example.csv'
  FORMAT TEXT
  WITH OPTIONS (
      DATA SOURCE = 's3_ds',
      COLUMN FORMATS = '{"t": {"time": "HH-MI-SS.MS"}}'
  )
  ```
</CodeGroup>
