/create/datasource

URL: http://<db.host>:<db.port>/create/datasource

Creates a data source, which contains the location and connection information for a data store that is external to the database.
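
As a minimal sketch of how the URL template above is filled in, the following Python builds a POST request for this endpoint without sending it. The host `localhost`, the port `9191`, the helper name `build_request`, and the JSON content type are assumptions made for illustration; this reference does not state the request encoding.

```python
import json
import urllib.request


def build_request(host: str, port: int, payload: dict) -> urllib.request.Request:
    """Build, but do not send, a POST request for /create/datasource."""
    # Substitute <db.host> and <db.port> in the documented URL template.
    url = f"http://{host}:{port}/create/datasource"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the server accepts a JSON-encoded request body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, port, and partial payload; the full set of input
# parameters is described in the next section.
request = build_request("localhost", 9191, {"name": "example_source"})
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run without a database.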

Input Parameter Description

Name	Type	Description
name	string	Name of the data source to be created.
location	string	Location of the remote storage in 'storage_provider_type://[storage_path[:storage_port]]' format. Supported storage provider types are 'azure', 'gcs', 'hdfs', 'jdbc', 'kafka', 'confluent', and 's3'.
user_name	string	Name of the remote system user; may be an empty string.
password	string	Password for the remote system user; may be an empty string.
options	map of string to strings	Optional parameters. The default value is an empty map ( {} ).

Supported Parameters (keys)

Parameter Description

skip_validation

Bypass validation of connection to remote source. The default value is false. The supported values are:

true
false

connection_timeout

Timeout in seconds for connecting to this storage provider

wait_timeout

Timeout in seconds for reading from this storage provider

credential

Name of the credential object to be used in this data source

s3_bucket_name

Name of the Amazon S3 bucket to use as the data source

s3_region

Name of the Amazon S3 region where the given bucket is located

s3_verify_ssl

Whether to verify SSL connections. The default value is true.

Supported Values	Description
true	Connect with SSL verification
false	Connect without verifying the SSL connection; for testing purposes, bypassing TLS errors, self-signed certificates, etc.

s3_use_virtual_addressing

Whether to use virtual addressing when referencing the Amazon S3 source. The default value is true.

Supported Values	Description
true	Specify the request URI in virtual-hosted-style format, where the bucket name is part of the domain name in the URL.
false	Use path-style URI for requests.

s3_aws_role_arn

ARN of an Amazon IAM role that has the required S3 permissions and can be assumed by the given S3 IAM user

s3_encryption_customer_algorithm

Customer encryption algorithm used for encrypting data

s3_encryption_customer_key

Customer encryption key to encrypt or decrypt data

hdfs_kerberos_keytab

Kerberos keytab file location for the given HDFS user. This may be a KIFS file.

hdfs_delegation_token

Delegation token for the given HDFS user

hdfs_use_kerberos

Use Kerberos authentication for the given HDFS cluster. The default value is false. The supported values are:

true
false

azure_storage_account_name

Name of the Azure storage account to use as the data source; this is valid only if tenant_id is specified

azure_container_name

Name of the Azure storage container to use as the data source

azure_tenant_id

Active Directory tenant ID (or directory ID)

azure_sas_token

Shared access signature token for Azure storage account to use as the data source

azure_oauth_token

OAuth token to access given storage container

gcs_bucket_name

Name of the Google Cloud Storage bucket to use as the data source

gcs_project_id

Name of the Google Cloud project to use as the data source

gcs_service_account_keys

Google Cloud service account keys to use for authenticating the data source

is_stream

Whether to load from Azure/GCS/S3 continuously, as a stream. The default value is false. The supported values are:

true
false

kafka_topic_name

Name of the Kafka topic to use as the data source

jdbc_driver_jar_path

JDBC driver jar file location. This may be a KIFS file.

jdbc_driver_class_name

Name of the JDBC driver class

anonymous

Use an anonymous connection to the storage provider. DEPRECATED: this is now the default. Specify use_managed_credentials for a non-anonymous connection. The default value is true. The supported values are:

true
false

use_managed_credentials

When no credentials are supplied, anonymous access is used by default. If this is set to true, the cloud provider user settings are used instead. The default value is false. The supported values are:

true
false

use_https

Use HTTPS to connect to the data source if true; otherwise, use HTTP. The default value is true. The supported values are:

true
false

schema_registry_location

Location of Confluent Schema Registry in '[storage_path[:storage_port]]' format.

schema_registry_credential

Confluent Schema Registry credential object name.

schema_registry_port

Confluent Schema Registry port (optional).

schema_registry_connection_retries

Number of Confluent Schema Registry connection retries

schema_registry_connection_timeout

Confluent Schema Registry connection timeout, in seconds
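
The input parameters above can be assembled into a request body along the following lines. This is a sketch under stated assumptions, not a client library API: the helper names `check_location`, `to_option_string`, and `build_payload` are hypothetical, as are the data source, bucket, and region values. It also assumes that an empty storage path (`s3://`) is acceptable, which the bracketed location format suggests but does not confirm. Because `options` is a map of string to strings, the sketch converts booleans and numbers to strings before they are placed in the map.

```python
import json

# Storage provider types listed for the `location` parameter.
SUPPORTED_PROVIDERS = {"azure", "gcs", "hdfs", "jdbc", "kafka", "confluent", "s3"}


def check_location(location: str) -> str:
    """Return the provider type, or raise ValueError if it is not a listed one."""
    provider, separator, _rest = location.partition("://")
    if not separator or provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported location: {location!r}")
    return provider


def to_option_string(value) -> str:
    """Render an option value as a string; booleans become 'true' or 'false'."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_payload(name, location, user_name="", password="", options=None):
    """Assemble the five input parameters into a JSON-serializable dict."""
    check_location(location)
    return {
        "name": name,
        "location": location,
        "user_name": user_name,  # may be an empty string
        "password": password,    # may be an empty string
        # Defaults to an empty map when no options are given.
        "options": {k: to_option_string(v) for k, v in (options or {}).items()},
    }


# Hypothetical S3 data source; all values are placeholders.
payload = build_payload(
    name="example_s3_source",
    location="s3://",
    options={
        "s3_bucket_name": "example-bucket",
        "s3_region": "us-east-1",
        "connection_timeout": 30,
        "skip_validation": True,
    },
)
print(json.dumps(payload, indent=2))
```

Checking the provider prefix on the client side is optional; it only catches a mistyped `location` before the request is made.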

Output Parameter Description

The GPUdb server embeds the endpoint response inside a standard response structure which contains status information and the actual response to the query. Here is a description of the various fields of the wrapper:

Name	Type	Description
status	String	'OK' or 'ERROR'
message	String	Empty on success; otherwise, an error message
data_type	String	'create_datasource_response', or 'none' in case of an error
data	String	Empty string
data_str	JSON or String	This embedded JSON represents the result of the /create/datasource endpoint:

Name	Type	Description
name	string	Value of input parameter name.
info	map of string to strings	Additional information.

Empty string in case of an error.
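
The wrapper described above can be unpacked roughly as follows. This is a sketch only: the function name `parse_create_datasource_response` and the sample response body are hypothetical, written to match the field tables in this section rather than captured from a server.

```python
import json


def parse_create_datasource_response(raw: str) -> dict:
    """Unwrap the standard response structure and return the embedded result."""
    wrapper = json.loads(raw)
    if wrapper["status"] != "OK":
        # On error, `message` carries the error text and `data_str` is empty.
        raise RuntimeError(wrapper["message"])
    data_str = wrapper["data_str"]
    # `data_str` is typed "JSON or String": decode it only if it is a string.
    return json.loads(data_str) if isinstance(data_str, str) else data_str


# Hypothetical response body shaped after the tables above.
sample = json.dumps({
    "status": "OK",
    "message": "",
    "data_type": "create_datasource_response",
    "data": "",
    "data_str": json.dumps({"name": "example_s3_source", "info": {}}),
})
result = parse_create_datasource_response(sample)
print(result["name"])
```

Raising on an 'ERROR' status keeps the error message from being mistaken for a result; callers that prefer to inspect `message` themselves can return the wrapper instead.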