A data source is a reference object for a data set that is external to the database. It holds the location & connection information for that external source, but not the names of any specific data sets/files within it. A data source can use a credential object to store remote authentication information.
The following data source types are supported:
- Azure (Microsoft blob storage)
- HDFS (Apache Hadoop Distributed File System)
- Kafka (Apache Kafka streaming feed)
- S3 (Amazon S3 Bucket)
The following hosts are used for each of the data source providers:
- Azure: <service_account_name>.blob.core.windows.net
- HDFS: Specified via the location parameter
- Kafka: Specified via the location parameter
- S3: <region>.amazonaws.com
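The host rules above can be sketched as a small lookup. This is only an illustration: the function name `datasource_host` and its parameters are hypothetical, and the `scheme://host:port` form used for HDFS and Kafka locations is an assumption rather than something stated here.

```python
from urllib.parse import urlsplit


def datasource_host(provider, *, account=None, region=None, location=None):
    """Return the host a data source of the given provider type would use.

    Hypothetical helper mirroring the provider/host list above.
    """
    provider = provider.lower()
    if provider == "azure":
        if not account:
            raise ValueError("Azure needs a service account name")
        return f"{account}.blob.core.windows.net"
    if provider == "s3":
        if not region:
            raise ValueError("S3 needs a region")
        return f"{region}.amazonaws.com"
    if provider in ("hdfs", "kafka"):
        if not location:
            raise ValueError(f"{provider} needs a location")
        # Assumption: location looks like scheme://host:port; otherwise
        # the location string is returned unchanged.
        return urlsplit(location).netloc or location
    raise ValueError(f"unsupported data source type: {provider}")


s3_host = datasource_host("S3", region="us-east-1")
azure_host = datasource_host("azure", account="myaccount")
kafka_host = datasource_host("kafka", location="kafka://broker.example.com:9092")
```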
Data sources perform no function by themselves, but act as proxies for accessing external data when referenced in certain database operations. The following can make use of data sources:
- External tables (see also the CREATE EXTERNAL TABLE command in SQL)
- Insert records (from files) API calls (see also the LOAD DATA command in SQL)
Individual files within a data source need to be identified when the data source is referenced within these calls.
By default, a data source is validated when it is created; creation fails if an authorized connection cannot be established.
Managing Data Sources
A data source can be managed using the following API endpoint calls. For managing data sources in SQL, see CREATE DATA SOURCE.
|API Call|Description|
|---|---|
|/create/datasource|Creates a data source, given a location and connection information|
|/alter/datasource|Modifies the properties of a data source, validating the new connection|
|/drop/datasource|Removes the data source reference from the database; will not modify the external source data|
|/show/datasource|Outputs the data source properties; passwords are redacted|
|/grant/permission/datasource|Grants the permission for a user to connect to a data source|
|/revoke/permission/datasource|Revokes the permission for a user to connect to a data source|
Creating a Data Source
To create a data source, kin_ds, that connects to an Amazon S3 bucket, kinetica_ds, in the US East (N. Virginia) region, in Python:
For Amazon S3 connections, the user_name & password parameters refer to the AWS Access ID & Key, respectively.
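A minimal sketch of the request for this example follows. It only assembles the /create/datasource parameters as a dictionary, so it runs without a database. The option keys `s3_bucket_name` and `s3_region`, the location value `s3`, and the helper name `s3_datasource_request` are assumptions, as is the idea that the Python API accepts these fields as keyword arguments to a `create_datasource` call.

```python
import os


def s3_datasource_request(name, bucket, region, access_id, access_key):
    """Assemble /create/datasource parameters for an S3 bucket (illustrative)."""
    return {
        "name": name,
        "location": "s3",            # assumed location value for the S3 provider
        "user_name": access_id,      # AWS Access ID
        "password": access_key,      # AWS Access Key
        "options": {
            "s3_bucket_name": bucket,  # assumed option key
            "s3_region": region,       # assumed option key
        },
    }


# Credentials are read from the environment; the fallbacks are placeholders only.
payload = s3_datasource_request(
    name="kin_ds",
    bucket="kinetica_ds",
    region="us-east-1",  # US East (N. Virginia)
    access_id=os.environ.get("AWS_ACCESS_KEY_ID", "<aws-access-id>"),
    access_key=os.environ.get("AWS_SECRET_ACCESS_KEY", "<aws-access-key>"),
)

# With a connected client object this might be passed along as, for example,
# db.create_datasource(**payload); that call is an assumption and is not run here.
```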
Several authentication schemes across multiple providers are supported.
- Amazon S3 Using Access Key
- Amazon S3 Using IAM Role
- Azure Using Password
- Azure Using SAS Token
- Azure Using OAuth Token
- Azure Using Active Directory
- HDFS Using Password
- HDFS Using Kerberos Token
- HDFS Using Kerberos Keytab
- Kafka without Authentication
- Kafka with Authentication
Kafka with Authentication
Creating an authenticated Kafka data source requires creating a corresponding credential object to store the Kafka credentials and then referencing that object when creating the data source.
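The two-step order described above can be sketched as a pair of request payloads: the credential first, then the data source that references it by name. Everything specific here is an assumption for illustration: the /create/credential endpoint name and its fields, the `kafka` credential type, the `credential` and `kafka_topic_name` option keys, the `kafka://host:port` location form, and all sample names and values.

```python
def kafka_credential_request(credential_name, identity, secret):
    """Assemble parameters for the credential object (assumed field names)."""
    return {
        "credential_name": credential_name,
        "type": "kafka",       # assumed credential type
        "identity": identity,
        "secret": secret,
        "options": {},
    }


def kafka_datasource_request(name, broker, topic, credential_name):
    """Assemble parameters for a Kafka data source that uses a credential."""
    return {
        "name": name,
        "location": f"kafka://{broker}",   # assumed location format
        "user_name": "",                   # left empty; the credential carries the login
        "password": "",
        "options": {
            "kafka_topic_name": topic,      # assumed option key
            "credential": credential_name,  # assumed option key
        },
    }


# Step 1 must come before step 2, since the data source refers to the credential.
steps = [
    ("/create/credential",
     kafka_credential_request("kafka_cred", "kafka_user", "<kafka-password>")),
    ("/create/datasource",
     kafka_datasource_request("kin_kafka_ds", "broker.example.com:9092",
                              "sample_topic", "kafka_cred")),
]
```

Keeping the secret in the credential object means the data source definition itself carries no password, only a reference.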