Kinetica File System (KiFS)

The Kinetica File System (KiFS) is a file system interface packaged with Kinetica. It gives users who lack direct access to the database cluster's file system a place to store files and make use of them within the database.

KiFS can be leveraged by several Kinetica features:

KiFS files can be referenced with the following URI:

kifs://<kifs directory>/<kifs file>

For example, the following URI can be broken down into three components:

kifs://data/geospatial/flights.csv
Component   Value
Scheme      kifs://
Directory   data
File        /geospatial/flights.csv

The unique KiFS file name, when referenced in the API, is the composite of the directory and file:

data/geospatial/flights.csv
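The URI breakdown above can be illustrated with a small, self-contained Java sketch. `KifsPath` is a hypothetical helper written for this page, not part of the Kinetica API. It assumes the first path segment after `kifs://` is the KiFS directory and everything after the next slash is the file name.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not part of the Kinetica API) that splits a
 * kifs://<directory>/<file> URI into its components.
 */
final class KifsPath {
    static final String SCHEME = "kifs://";

    final String directory;
    final String file;

    private KifsPath(String directory, String file) {
        this.directory = directory;
        this.file = file;
    }

    /** Parses a URI such as "kifs://data/geospatial/flights.csv". */
    static KifsPath parse(String uri) {
        Objects.requireNonNull(uri, "uri");
        if (!uri.startsWith(SCHEME))
            throw new IllegalArgumentException("Not a KiFS URI: " + uri);
        String rest = uri.substring(SCHEME.length());
        // The first path segment is the top-level KiFS directory;
        // everything after the first slash is the file name
        int slash = rest.indexOf('/');
        if (slash <= 0 || slash == rest.length() - 1)
            throw new IllegalArgumentException("Missing directory or file: " + uri);
        return new KifsPath(rest.substring(0, slash), rest.substring(slash + 1));
    }

    /** Composite name used by the API, e.g. "data/geospatial/flights.csv". */
    String fileName() {
        return directory + "/" + file;
    }
}
```

For example, `KifsPath.parse("kifs://data/geospatial/flights.csv").fileName()` yields `data/geospatial/flights.csv`, the same composite name shown above.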

Configuration

KiFS can be configured to use any of the following for file storage:

  • Local shared storage, mounted and accessible to every node in the Kinetica cluster
  • Azure (Microsoft blob storage)
  • HDFS (Apache Hadoop Distributed File System)
  • S3 (Amazon S3 Bucket)

To enable KiFS, apply one of the following setups in the KiFS section of the /opt/gpudb/core/etc/gpudb.conf configuration file, then restart the database.

Note

Remote storage configuration parameters mirror those used for defining cold storage tiers. See Cold Storage Tier in the Configuration Reference for the full set of parameters.

Local KiFS Storage Configuration Example
kifs.type=disk
kifs.base_path=/opt/gpudb/kifs
Azure KiFS Storage Configuration Example
kifs.type = azure
kifs.base_path = /gpudb/kifs
kifs.azure_container_name = kinetica
kifs.azure_storage_account_name = <azure account name>
kifs.azure_storage_account_key = <azure account key>
HDFS KiFS Storage Configuration Example
kifs.type = hdfs
kifs.base_path = /gpudb/kifs
kifs.hdfs_uri = hdfs://localhost:9000
kifs.hdfs_principal = kinetica
kifs.hdfs_use_kerberos = true
kifs.hdfs_kerberos_keytab = /opt/gpudb/krb5.keytab
S3 KiFS Storage Configuration Example
kifs.type = s3
kifs.base_path = /gpudb/kifs
kifs.wait_timeout = 10
kifs.connection_timeout = 30
kifs.s3_bucket_name = kifs-bucket
kifs.s3_aws_access_key_id = <aws access key>
kifs.s3_aws_secret_access_key = <aws secret key>
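When checking which KiFS setup a server is using, it can help to pull the kifs.* settings out of gpudb.conf. The following is a minimal, hypothetical Java sketch (the `KifsConfig` class is not part of Kinetica). It accepts both the `key=value` and `key = value` forms used in the examples above, and it assumes comment lines start with `#`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper that extracts the kifs.* settings from the text of a
 * gpudb.conf file. Both "key=value" and "key = value" forms are accepted.
 */
final class KifsConfig {
    static Map<String, String> parse(String confText) {
        Map<String, String> settings = new LinkedHashMap<>();
        for (String rawLine : confText.split("\\R")) {
            String line = rawLine.trim();
            // Skip blank lines and comment lines (assumed to start with '#')
            if (line.isEmpty() || line.startsWith("#"))
                continue;
            int eq = line.indexOf('=');
            if (eq < 0)
                continue; // not a key/value line
            String key = line.substring(0, eq).trim();
            // Keep only KiFS settings, in the order they appear
            if (key.startsWith("kifs."))
                settings.put(key, line.substring(eq + 1).trim());
        }
        return settings;
    }
}
```

Applied to the S3 example above, `KifsConfig.parse(...).get("kifs.type")` would return `s3`.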

Managing KiFS

KiFS can be managed using the following API endpoint calls.

API Call                       Description
/create/directory              Creates a directory, a container for files
/delete/directory              Removes the directory; can optionally remove all contained files
/show/directories              Outputs the properties of one or more specified directories, or optionally, all directories
/grant/permission/directory    Grants the permission for a user to access files within a directory
/revoke/permission/directory   Revokes the permission for a user to access files within a directory
/delete/files                  Removes one or more files
/download/files                Downloads one or more files
/show/files                    Outputs the properties of one or more files
/upload/files                  Uploads one or more files to a directory

API Initialization

The Kinetica Java API provides a streamlined interface for managing files and directories in KiFS. To use KiFS through the Java API, first initialize the GPUdbFileHandler class. The examples in the following sections use this interface and assume these steps have been taken.

KiFS API Imports
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import com.gpudb.GPUdb;
import com.gpudb.GPUdbBase;
import com.gpudb.filesystem.GPUdbFileHandler;
KiFS API Initialization
// Establish connection with a running instance of Kinetica
//   url, user, and pass are placeholders for the database URL and credentials
GPUdbBase.Options options = new GPUdbBase.Options();
options.setUsername(user);
options.setPassword(pass);
GPUdb kdb = new GPUdb(url, options);

// Acquire handle for KiFS interface
GPUdbFileHandler gfh = new GPUdbFileHandler(kdb);

Directories

In KiFS, a directory is a top-level container for files. A directory cannot be nested within another directory, though a file contained within a directory may have a path that gives the appearance of being contained within one or more nested virtual directories. A directory must exist before files can be uploaded into it.
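The rules in the paragraph above (flat, top-level directories, and a directory that must exist before files are added) can be modeled in memory. This is a hypothetical Java sketch for illustration only. `KifsModel` and its methods are not the Kinetica API and make no network calls. Unlike the real API, it silently ignores a request to create a directory that already exists.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical in-memory model (not the Kinetica API) of the KiFS directory
 * rules: directories are flat, top-level containers, and a directory must
 * exist before files can be added to it.
 */
final class KifsModel {
    // Directory name -> file names (which may contain '/' for virtual paths)
    private final Map<String, Set<String>> dirs = new HashMap<>();

    void createDirectory(String name) {
        // Directories cannot be nested, so a directory name has no slash
        if (name.isEmpty() || name.contains("/"))
            throw new IllegalArgumentException("Invalid directory name: " + name);
        // This sketch silently ignores an already-existing directory
        dirs.putIfAbsent(name, new HashSet<>());
    }

    void addFile(String directory, String fileName) {
        Set<String> files = dirs.get(directory);
        if (files == null)
            throw new IllegalStateException("Directory does not exist: " + directory);
        // Slashes in the file name only create virtual directories
        files.add(fileName);
    }

    boolean hasFile(String directory, String fileName) {
        Set<String> files = dirs.get(directory);
        return files != null && files.contains(fileName);
    }
}
```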

Creating Directories

To create a KiFS directory, data:

Create Directory Example
gfh.createDirectory("data", false);

Showing Directories

To show the stats of a KiFS directory, data:

Show Directory Example
List<KifsDirectoryInfo> dirs = gfh.showDirectory("data", null);
for (KifsDirectoryInfo dir : dirs)
{
  System.out.println(dir.getKifsPath());
  System.out.println("* Created by: " + dir.getCreatedBy());
  System.out.println("* Created time: " + dir.getCreationTime());
}

To show the stats of all KiFS directories:

Show All Directories Example
List<KifsDirectoryInfo> allDirs = gfh.showAllDirectories(null);
for (KifsDirectoryInfo dir : allDirs)
  System.out.println("* " + dir.getKifsPath());

Deleting Directories

To delete an empty, existing KiFS directory, use the following parameters:

  • data - name of the directory
  • false - don't delete files contained within the directory, and return an error if any are found
  • false - return an error if the directory is not found
Delete Empty Directory Example
gfh.deleteDirectory("data", false, false);

To delete a KiFS directory, data, and all files contained within, regardless of whether the directory exists (similar to a Unix rm -rf):

  • data - name of the directory
  • true - delete any files contained within the directory
  • true - suppress the error in the case that the directory is not found
Delete Non-Empty Directory Example
gfh.deleteDirectory("data", true, true);
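The two boolean arguments in the delete examples above combine as follows. This hypothetical Java model is not the Kinetica implementation. It only restates the behavior described in the two parameter lists: an error for a non-empty directory unless files are also deleted, and an error for a missing directory unless that error is suppressed. The parameter names `deleteFiles` and `ignoreMissing` are invented for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the two boolean flags passed to deleteDirectory:
 * whether to delete contained files, and whether to suppress the error
 * when the directory is not found.
 */
final class DeleteDirectoryModel {
    // Directory name -> number of files it contains
    final Map<String, Integer> fileCounts = new HashMap<>();

    void deleteDirectory(String name, boolean deleteFiles, boolean ignoreMissing) {
        Integer count = fileCounts.get(name);
        if (count == null) {
            if (ignoreMissing)
                return; // like rm -rf: a missing directory is not an error
            throw new IllegalStateException("Directory not found: " + name);
        }
        if (count > 0 && !deleteFiles)
            throw new IllegalStateException("Directory not empty: " + name);
        fileCounts.remove(name); // removes the directory and any files in it
    }
}
```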

Files

In KiFS, a file is a user-uploaded text or binary object. Each file is located within a directory and must have a unique name within that directory. The name can contain forward slashes to create the appearance of a hierarchical virtual directory structure and help namespace the files. Overall, each file is referenced by the composite of the directory and file name.
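Because the hierarchy comes only from forward slashes in file names, you can derive the virtual subdirectories from a list of names. The following is a minimal, hypothetical Java sketch. `VirtualDirs` is not part of the Kinetica API. Given the file names in one KiFS directory, it returns the immediate virtual subdirectories under a prefix.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Hypothetical helper showing how slash-separated file names form a virtual
 * hierarchy: it returns the immediate virtual subdirectories under a prefix.
 */
final class VirtualDirs {
    static SortedSet<String> children(List<String> fileNames, String prefix) {
        // Normalize the prefix so "geo" and "geo/" behave the same
        String base = prefix.isEmpty() || prefix.endsWith("/") ? prefix : prefix + "/";
        SortedSet<String> result = new TreeSet<>();
        for (String name : fileNames) {
            if (!name.startsWith(base))
                continue;
            String rest = name.substring(base.length());
            int slash = rest.indexOf('/');
            // Only names with a further slash contribute a virtual directory
            if (slash > 0)
                result.add(rest.substring(0, slash));
        }
        return result;
    }
}
```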

Uploading Files

To upload a file to a KiFS directory, data, under a virtual directory, geo:

Upload Single File Example
gfh.upload("upload/data/csv/flights.csv", "data/geo", null, null);

To upload a set of files to a KiFS directory, data, under a virtual directory, geo:

Upload Multiple Files Example
// Add files to the list of files to upload
List<String> localFileNames = new ArrayList<>();
localFileNames.add("upload/programs/analyze-paths.jar");
localFileNames.add("upload/data/csv/flights.csv");

// Upload each file as data/geo/<filename>
//   All files, regardless of source path, will be placed directly
//   under "data/geo"
gfh.upload(localFileNames, "data/geo", null, null);
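As the comments in the upload example note, only each local file's base name is kept. Two local files with the same base name would therefore map to the same KiFS name. This hypothetical Java sketch (`UploadPlan` is not part of the Kinetica API) predicts the KiFS name for each local file. It refuses a batch with clashing names instead of guessing which file the server would keep.

```java
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper that predicts the KiFS name each local file receives:
 * the local base name placed directly under the target KiFS path.
 */
final class UploadPlan {
    static Map<String, String> destinations(List<String> localFileNames, String kifsPath) {
        Map<String, String> plan = new LinkedHashMap<>();
        for (String local : localFileNames) {
            // Drop the local source path; keep only the file's base name
            String baseName = Paths.get(local).getFileName().toString();
            String remote = kifsPath + "/" + baseName;
            // Reject batches where two local files would land on one KiFS name
            if (plan.containsValue(remote))
                throw new IllegalArgumentException("Two files map to " + remote);
            plan.put(local, remote);
        }
        return plan;
    }
}
```

With the two files from the example above, `upload/data/csv/flights.csv` maps to `data/geo/flights.csv`.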

Listing Files

To list the KiFS files in a directory, data:

List Files Example
for (KifsFileInfo fileInfo : gfh.showFiles(Collections.singletonList("data")))
  System.out.println("* " + fileInfo.getFileName() + " (" + fileInfo.getFileSize() + "B)");

Downloading Files

To download a KiFS file, placing it directly under a local directory, download:

Download Single File Example
gfh.download("data/geo/flights.csv", "download", null, null);

To download a set of KiFS files, placing them all directly under a local directory, download:

Download Multiple Files Example
List<String> remoteFileNames = new ArrayList<>();
remoteFileNames.add("data/geo/analyze-paths.jar");
remoteFileNames.add("data/geo/flights.csv");

gfh.download(remoteFileNames, "download", null, null);
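The download examples place each file directly under the local directory. Assuming this means only the last segment of the KiFS name is kept, the local destination can be predicted with the minimal Java sketch below. `DownloadTarget` is a hypothetical helper, not part of the Kinetica API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper that predicts where a downloaded KiFS file lands,
 * assuming only the last path segment of its KiFS name is kept.
 */
final class DownloadTarget {
    static Path localPath(String kifsFileName, String localDir) {
        // Strip the KiFS directory and any virtual directories
        int slash = kifsFileName.lastIndexOf('/');
        String baseName = kifsFileName.substring(slash + 1);
        return Paths.get(localDir, baseName);
    }
}
```

Under that assumption, `data/geo/flights.csv` downloaded to `download` ends up at `download/flights.csv`.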

Deleting Files

To delete a set of KiFS files:

Delete Multiple Files Example
List<String> remoteFileNames = new ArrayList<>();
remoteFileNames.add("data/geo/analyze-paths.jar");
remoteFileNames.add("data/geo/flights.csv");

gfh.deleteFiles(remoteFileNames, false);