A data source is a reference object for a data set that is external to the database. It consists of the location and connection information for that external source, but does not hold the names of any specific data sets/files within that source. A data source can make use of a credential object for storing remote authentication information.
A data source name must adhere to the standard naming criteria. Each data source exists within a schema and follows the standard name resolution rules for tables.
The following data source providers are supported:
Azure (Microsoft blob storage)
HDFS (Apache Hadoop Distributed File System)
Kafka (Apache Kafka streaming feed)
S3 (Amazon S3 Bucket)
Note
The following default hosts are used for Azure and S3, but can be overridden in the location parameter:
Azure: <service_account_name>.blob.core.windows.net
S3: <region>.amazonaws.com
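For example, an S3 data source can name its host explicitly in the location parameter. A minimal sketch following the default host pattern above; the bucket, region, and credential values are illustrative placeholders:

h_db.create_datasource(
    name = '<data source name>',
    location = 's3://us-east-1.amazonaws.com',  # explicit host, matching the default pattern above
    user_name = '<aws access key id>',
    password = '<aws secret access key>',
    options = {
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': 'us-east-1'
    }
)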
Data sources perform no function by themselves, but act as proxies for accessing external data when referenced in certain database operations, such as data loading calls. Individual files within a data source need to be identified when the data source is referenced in these calls, as shown in the sketch below.
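For example, a file load can reference a data source by name. A minimal sketch, assuming the insert_records_from_files endpoint and its datasource_name option; the table and file names are hypothetical:

h_db.insert_records_from_files(
    table_name = 'remote_data',       # hypothetical target table
    filepaths = ['products.csv'],     # file identified within the data source
    options = {'datasource_name': '<data source name>'}
)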
Note
By default, the data source is validated upon creation, and creation will fail if an authorized connection cannot be established. (A sketch for deferring this check follows the first example below.)
Managing Data Sources
A data source can be managed using the data source API endpoint calls. For managing data sources in SQL, see CREATE DATA SOURCE.
Creating a Data Source
To create a data source, kin_ds, that connects to an Amazon S3 bucket, kinetica-ds, in the US East (N. Virginia) region, in Python:
h_db.create_datasource(
    name = 'kin_ds',
    location = 's3',
    user_name = aws_id,
    password = aws_key,
    options = {
        's3_bucket_name': 'kinetica-ds',
        's3_region': 'us-east-1'
    }
)
Important
For Amazon S3 connections, the user_name and password parameters refer to the AWS access key ID and secret access key, respectively.
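As noted above, the connection check performed at creation time can be deferred. A minimal sketch, assuming a skip_validation option (an assumption; not confirmed in this section):

h_db.create_datasource(
    name = 'kin_ds_unvalidated',    # hypothetical name
    location = 's3',
    user_name = aws_id,
    password = aws_key,
    options = {
        's3_bucket_name': 'kinetica-ds',
        's3_region': 'us-east-1',
        'skip_validation': 'true'   # assumed option for skipping the connection check
    }
)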
Provider-Specific Syntax
Several authentication schemes across multiple providers are supported.
Amazon S3
Public (No Auth)
h_db.create_datasource(
    name = '<data source name>',
    location = 's3[://<host>]',
    options = {
        'anonymous': 'true',
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': '<aws s3 region>'
    }
)
Access Key
h_db.create_datasource(
    name = '<data source name>',
    location = 's3[://<host>]',
    user_name = '<aws access key id>',
    password = '<aws secret access key>',
    options = {
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': '<aws s3 region>'
    }
)
IAM Role
h_db.create_datasource(
    name = '<data source name>',
    location = 's3[://<host>]',
    user_name = '<aws access key id>',
    password = '<aws secret access key>',
    options = {
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': '<aws s3 region>',
        's3_aws_role_arn': '<amazon resource name>'
    }
)
Azure BLOB
Public (No Auth)
h_db.create_datasource(
    name = '<data source name>',
    location = 'azure[://<host>]',
    user_name = '<azure storage account name>',
    options = {
        'anonymous': 'true',
        'azure_container_name': '<azure container name>'
    }
)
Password
h_db.create_datasource(
    name = '<data source name>',
    location = 'azure[://<host>]',
    user_name = '<azure storage account name>',
    password = '<azure storage account key>',
    options = {
        'azure_container_name': '<azure container name>'
    }
)
SAS Token
h_db.create_datasource(
    name = '<data source name>',
    location = 'azure[://<host>]',
    user_name = '<azure storage account name>',
    password = '',
    options = {
        'azure_container_name': '<azure container name>',
        'azure_sas_token': '<azure sas token>'
    }
)
OAuth Token
h_db.create_datasource(
    name = '<data source name>',
    location = 'azure[://<host>]',
    user_name = '<azure storage account name>',
    password = '',
    options = {
        'azure_container_name': '<azure container name>',
        'azure_oauth_token': '<azure oauth token>'
    }
)
Active Directory
h_db.create_datasource(
    name = '<data source name>',
    location = 'azure[://<host>]',
    user_name = '<ad client id>',
    password = '<ad client secret key>',
    options = {
        'azure_storage_account_name': '<azure storage account name>',
        'azure_container_name': '<azure container name>',
        'azure_tenant_id': '<azure tenant id>'
    }
)
HDFS
Password
h_db.create_datasource(
    name = '<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '<hdfs username>',
    password = '<hdfs password>',
    options = {}
)
Kerberos Token
h_db.create_datasource(
    name = '<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '<hdfs username>',
    password = '',
    options = {
        'hdfs_use_kerberos': 'true'
    }
)
Kerberos Keytab
h_db.create_datasource(
    name = '<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '<hdfs username>',
    password = '',
    options = {
        'hdfs_kerberos_keytab': '<keytab file/path>'
    }
)
Apache Kafka
Anonymous
h_db.create_datasource(
    name = '<data source name>',
    location = 'kafka://<host>:<port>',
    options = {
        'kafka_topic_name': '<kafka topic name>'
    }
)
Authenticated
h_db.create_datasource(
    name = '<data source name>',
    location = 'kafka://<host>:<port>',
    options = {
        'kafka_topic_name': '<kafka topic name>',
        'credential': '<kafka credential name>'
    }
)
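The referenced credential must exist before the data source is created. A minimal sketch of creating one, assuming the create_credential endpoint and a 'kafka' credential type (both are assumptions; see the credential object documentation):

h_db.create_credential(
    credential_name = '<kafka credential name>',
    type = 'kafka',                 # assumed credential type identifier
    identity = '<kafka username>',
    secret = '<kafka password>',
    options = {}
)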
Limitations
Azure anonymous data sources are only supported when both the container and
the contained objects allow anonymous access.
HDFS systems with wire encryption are not supported.
Kafka data sources require an associated
credential object for authentication.