A data source is a reference object for a data set that is external to the
database. It consists of the location & connection information for that
external source, but doesn't hold the names of any specific data sets/files
within that source. A data source can make use of a credential object for
storing remote authentication information.

A data source name must adhere to the standard naming criteria. Each data
source exists within a schema and follows the standard name resolution rules
for tables.
The following data source providers are supported:

- Azure (Microsoft blob storage)
- CData (CData Software source-specific JDBC driver; see the driver list for
  the full list of supported JDBC drivers)
- GCS (Google Cloud Storage)
- HDFS (Apache Hadoop Distributed File System)
- JDBC (Java Database Connectivity, using a user-supplied driver)
- Kafka (streaming feed)
- S3 (Amazon S3 Bucket)

Note
The following default hosts are used for Azure & S3, but can be overridden in
the location parameter:

Azure: <service_account_name>.blob.core.windows.net
S3: <region>.amazonaws.com

Data sources perform no function by themselves, but act as proxies for
accessing external data when referenced in certain database operations.
Individual files within a data source need to be identified when the data
source is referenced within those calls.
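For instance, the location value for an S3 data source can either use the
default host or name an endpoint explicitly; a minimal sketch in Python, with
a purely illustrative override endpoint:

# Location values for an S3 data source; the default host is derived
# from the region, while an explicit host overrides it.
location_default = 's3'                               # host defaults to <region>.amazonaws.com
location_override = 's3://s3.us-west-2.amazonaws.com' # illustrative explicit endpoint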
Note

The data source will be validated upon creation, by default, and will fail to
be created if an authorized connection cannot be established. CData data
sources can use a JDBC credential for authentication.

Managing Data Sources

A data source can be managed using the corresponding API endpoint calls. For
managing data sources in SQL, see CREATE DATA SOURCE.
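As a sketch of that lifecycle, the calls below assume the show/alter/drop
endpoints mirror create_datasource's naming and the REST API's parameters;
they reference the kin_ds data source created in the next section:

# Inspect an existing data source (assumed endpoint naming).
h_db.show_datasource(name = 'kin_ds', options = {})

# Update a connection property (the update map key is illustrative).
h_db.alter_datasource(
    name = 'kin_ds',
    datasource_updates_map = {'s3_region': 'us-east-2'},
    options = {}
)

# Remove the data source when no longer needed.
h_db.drop_datasource(name = 'kin_ds', options = {})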
Creating a Data Source

To create a data source, kin_ds, that connects to an Amazon S3 bucket,
kinetica-ds, in the US East (N. Virginia) region, in Python:
h_db.create_datasource(
    name = 'kin_ds',
    location = 's3',
    user_name = aws_id,
    password = aws_key,
    options = {
        's3_bucket_name': 'kinetica-ds',
        's3_region': 'us-east-1'
    }
)
Important

For Amazon S3 connections, the user_name & password parameters refer to the
AWS access key ID & secret access key, respectively.
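Once created, the data source is referenced by name when loading data, with
individual files identified at that point. A minimal sketch, assuming a
target table example_table exists and that the file path & datasource_name
option key shown are valid for the insert-records-from-files endpoint (both
illustrative):

# Load a file from the kin_ds data source into a table; the file path
# and option key are illustrative, not definitive.
h_db.insert_records_from_files(
    table_name = 'example_table',
    filepaths = ['orders/2023/orders.csv'],
    options = {'datasource_name': 'kin_ds'}
)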
Provider-Specific Syntax

Several authentication schemes across multiple providers are supported.

Azure BLOB

Credential
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 'azure[://<host>]',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>',
        'azure_container_name': '<azure container name>'
    }
)
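For example, a concrete call with hypothetical names throughout (azure_ds,
azure_cred, example-container), assuming the credential object already exists
and holds the Azure authentication details:

h_db.create_datasource(
    name = 'azure_ds',
    location = 'azure',
    user_name = '',
    password = '',
    options = {
        'credential': 'azure_cred',                   # hypothetical credential object
        'azure_container_name': 'example-container'
    }
)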
Public (No Auth)
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'azure[://<host>]',
    user_name = '<azure storage account name>',
    password = '',
    options = {
        'azure_container_name': '<azure container name>'
    }
)
Password
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'azure[://<host>]',
    user_name = '<azure storage account name>',
    password = '<azure storage account key>',
    options = {
        'azure_container_name': '<azure container name>'
    }
)
SAS Token
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'azure[://<host>]',
    user_name = '<azure storage account name>',
    password = '',
    options = {
        'azure_sas_token': '<azure sas token>',
        'azure_container_name': '<azure container name>'
    }
)
Active Directory
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'azure[://<host>]',
    user_name = '<ad client id>',
    password = '<ad client secret key>',
    options = {
        'azure_storage_account_name': '<azure storage account name>',
        'azure_container_name': '<azure container name>',
        'azure_tenant_id': '<azure tenant id>'
    }
)
CData

Credential
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = '<cdata jdbc url>',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>'
    }
)
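As an illustration only (connection string formats vary by CData driver; the
CSV-style URL and names below are hypothetical):

h_db.create_datasource(
    name = 'cdata_ds',
    location = 'jdbc:csv:URI=http://example.com/data/',  # illustrative CData-style URL
    user_name = '',
    password = '',
    options = {
        'credential': 'cdata_cred'                       # hypothetical credential object
    }
)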
Password in URL
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = '<cdata jdbc url with username/password>',
    user_name = '',
    password = '',
    options = {}
)
Password as Parameter
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = '<cdata jdbc url>',
    user_name = '<jdbc username>',
    password = '<jdbc password>',
    options = {}
)
GCS

Credential
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 'gcs[://<host>]',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>',
        ['gcs_project_id': '<gcs project id>',]
        'gcs_bucket_name': '<gcs bucket name>'
    }
)
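For example, with hypothetical names (gcs_ds, gcs_cred, example-bucket) and
the optional project ID omitted:

h_db.create_datasource(
    name = 'gcs_ds',
    location = 'gcs',
    user_name = '',
    password = '',
    options = {
        'credential': 'gcs_cred',           # hypothetical credential object
        'gcs_bucket_name': 'example-bucket'
    }
)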
Public (No Auth)
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'gcs[://<host>]',
    user_name = '',
    password = '',
    options = {
        ['gcs_project_id': '<gcs project id>',]
        'gcs_bucket_name': '<gcs bucket name>'
    }
)
User ID & Key
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'gcs[://<host>]',
    user_name = '<gcs account id>',
    password = '<gcs account private key>',
    options = {
        ['gcs_project_id': '<gcs project id>',]
        'gcs_bucket_name': '<gcs bucket name>'
    }
)
JSON Key
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'gcs[://<host>]',
    user_name = '',
    password = '',
    options = {
        'gcs_service_account_keys': '<gcs account json key text>',
        ['gcs_project_id': '<gcs project id>',]
        'gcs_bucket_name': '<gcs bucket name>'
    }
)
HDFS

Credential
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>'
    }
)
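For example, with hypothetical names and the NameNode's conventional port
8020:

h_db.create_datasource(
    name = 'hdfs_ds',
    location = 'hdfs://example-namenode:8020',  # illustrative host & port
    user_name = '',
    password = '',
    options = {
        'credential': 'hdfs_cred'               # hypothetical credential object
    }
)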
Password
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '<hdfs username>',
    password = '<hdfs password>',
    options = {}
)
Kerberos Token
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '<hdfs username>',
    password = '',
    options = {
        'hdfs_use_kerberos': 'true'
    }
)
Kerberos Keytab
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '<hdfs username>',
    password = '',
    options = {
        'hdfs_kerberos_keytab': 'kifs://<keytab file/path>'
    }
)
JDBC

Credential
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = '<jdbc url>',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>',
        'jdbc_driver_class_name': '<jdbc driver class full path>',
        'jdbc_driver_jar_path': 'kifs://<jdbc driver jar path>'
    }
)
Password
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = '<jdbc url>',
    user_name = '<jdbc username>',
    password = '<jdbc password>',
    options = {
        'jdbc_driver_class_name': '<jdbc driver class full path>',
        'jdbc_driver_jar_path': 'kifs://<jdbc driver jar path>'
    }
)
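For example, a password-authenticated PostgreSQL data source, with
hypothetical host, database, & account names, and the driver JAR assumed to
have been uploaded to KIFS beforehand:

h_db.create_datasource(
    name = 'pg_ds',
    location = 'jdbc:postgresql://example-host:5432/exampledb',  # illustrative URL
    user_name = 'pg_user',
    password = 'pg_pass',
    options = {
        'jdbc_driver_class_name': 'org.postgresql.Driver',
        'jdbc_driver_jar_path': 'kifs://drivers/postgresql.jar'  # assumed KIFS path
    }
)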
Kafka (Apache)

Credential
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 'kafka://<host>:<port>',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>',
        'kafka_topic_name': '<kafka topic name>'
    }
)
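For example, with hypothetical broker, topic, & credential names, using
Kafka's conventional port 9092:

h_db.create_datasource(
    name = 'kafka_ds',
    location = 'kafka://example-broker:9092',  # illustrative host & port
    user_name = '',
    password = '',
    options = {
        'credential': 'kafka_cred',            # hypothetical credential object
        'kafka_topic_name': 'example-topic'
    }
)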
Credential w/ Schema Registry
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 'kafka://<host>:<port>',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>',
        'kafka_topic_name': '<kafka topic name>',
        'schema_registry_credential': '[<sr credential schema name>.]<sr credential name>',
        'schema_registry_location': '<schema registry url>'
    }
)
Public (No Auth)
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'kafka://<host>:<port>',
    user_name = '',
    password = '',
    options = {
        'kafka_topic_name': '<kafka topic name>'
    }
)
Kafka (Confluent)

Credential
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 'confluent://<host>:<port>',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>',
        'kafka_topic_name': '<kafka topic name>'
    }
)
Credential w/ Schema Registry
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 'confluent://<host>:<port>',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>',
        'kafka_topic_name': '<kafka topic name>',
        'schema_registry_credential': '[<sr credential schema name>.]<sr credential name>',
        'schema_registry_location': '<schema registry url>'
    }
)
Public (No Auth)
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'confluent://<host>:<port>',
    user_name = '',
    password = '',
    options = {
        'kafka_topic_name': '<kafka topic name>'
    }
)
S3 (Amazon)

Credential
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 's3[://<host>]',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>',
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': '<aws s3 region>'
    }
)
Public (No Auth)
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 's3[://<host>]',
    user_name = '',
    password = '',
    options = {
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': '<aws s3 region>'
    }
)
Access Key
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 's3[://<host>]',
    user_name = '<aws access key id>',
    password = '<aws secret access key>',
    options = {
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': '<aws s3 region>'
    }
)
IAM Role
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 's3[://<host>]',
    user_name = '<aws access key id>',
    password = '<aws secret access key>',
    options = {
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': '<aws s3 region>',
        's3_aws_role_arn': '<amazon resource name>'
    }
)
Limitations

- Azure anonymous data sources are only supported when both the container and
  the contained objects allow anonymous access.
- HDFS systems with wire encryption are not supported.
- Kafka data sources require an associated credential object for
  authentication.