Kinetica provides the Active Analytics Workbench (AAW) to simplify and accelerate data science and machine learning at scale. With AAW, users can ingest data, train models, make inferences (answers/output from models), and audit models with a few endpoint calls (or a few clicks in the UI). The AAW package can be installed automatically via KAgent and coexists with the database, which gives it easy access to one's data and GPUs. AAW leverages Kubernetes to deploy, train, and test models.
The AAW workflow is defined by five key concepts:
Docker container registries house the ML models themselves and abstract their implementation details. Any model used by Kinetica must exist in a Docker container registry.
Data comprises ingests, datasets, and feature sets.
- Ingests represent an ingest tool (e.g., KIO or Kafka) that pulls data from a given source (Kinetica, PostgreSQL, S3, etc.) and puts it in a new table inside Kinetica. Data can be pulled in batches or via continuous streaming.
- Datasets represent a set (or sub-set) of column data from a source table in Kinetica. One or more columns can be filtered to create a dataset.
- Feature sets represent a group of features, which are datasets transformed inline with functions or relationally using Materialized Views.
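The relationship between a source table, a dataset, and a feature set can be sketched in plain Python. This is only an illustration of the concepts, not AAW's API: the table contents and the helpers `make_dataset` and `make_feature_set` are hypothetical, and an in-memory list stands in for a Kinetica table.

```python
# Hypothetical in-memory stand-in for a source table in Kinetica.
source_table = [
    {"id": 1, "price": 10.0, "qty": 3, "region": "east"},
    {"id": 2, "price": 4.5, "qty": 10, "region": "west"},
]


def make_dataset(rows, columns):
    """Keep only the requested columns of each row (a dataset)."""
    return [{col: row[col] for col in columns} for row in rows]


def make_feature_set(dataset, transforms):
    """Apply each named transform function to every dataset row (a feature set)."""
    return [{name: fn(row) for name, fn in transforms.items()} for row in dataset]


# A dataset is a column subset of the source table.
dataset = make_dataset(source_table, ["price", "qty"])

# A feature set is the dataset transformed inline with functions.
features = make_feature_set(
    dataset,
    {
        "revenue": lambda row: row["price"] * row["qty"],
        "is_bulk": lambda row: row["qty"] >= 5,
    },
)
```

Here `dataset` keeps only the `price` and `qty` columns, and `features` holds one derived record per row.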
A model can be a function, statistical model, regression, data model, or similar construct that is deployed to enable inferencing. AAW can deploy any number of replicas of a model, allowing for scalability and better resource management. AAW currently supports only Blackbox models, whose implementation details are abstracted and housed in Docker containers. Input and output are the only available interface, and Blackbox models do not require a training dataset.
Blackbox models rely on the Kinetica Blackbox SDK to fetch AAW-compatible output from a custom Blackbox. Kinetica can assist in Blackbox model container creation if necessary. Consult New Blackbox Model for more information.
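As a rough illustration of the "input and output are the only interface" idea, a Blackbox can be thought of as a function that accepts one input record and returns one output record. The sketch below is plain Python and does not use the Kinetica Blackbox SDK; the function name `predict` and the field names `celsius` and `fahrenheit` are assumptions chosen for the example.

```python
def predict(in_map):
    """Hypothetical blackbox: one input record in, one output record out.

    Callers see only the input and output fields; the logic inside the
    function is hidden from them, as it would be inside a container.
    """
    celsius = float(in_map["celsius"])
    return {"fahrenheit": celsius * 9.0 / 5.0 + 32.0}


result = predict({"celsius": 100})
```

Because nothing about the implementation leaks through the interface, the body could be swapped for a statistical model or any other logic without changing how it is called.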
A deployment represents a model that has been deployed. A deployed model can have inference tests run manually, automatically, or in batches, depending on the type of deployment. Currently there are three types of deployments:
- On-Demand -- inference tests are run as necessary using user-provided input with results being returned based on the given input
- Continuous -- inference tests are run automatically against records being streamed into an input table; inference results are inserted into an output table
- Batch -- inference tests are run against a batch of data in an existing table all at once
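The difference between the three deployment types can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not AAW's API: `score` is a hypothetical model, lists stand in for Kinetica tables, and an iterator stands in for records streaming into an input table.

```python
def score(record):
    """Hypothetical model: doubles the input value."""
    return {"score": record["x"] * 2.0}


def on_demand(model, record):
    """On-Demand: run one inference on user-provided input and return the result."""
    return model(record)


def continuous(model, stream, output_table):
    """Continuous: run an inference on each record as it arrives and
    insert the result into an output table."""
    for record in stream:
        output_table.append(model(record))


def batch(model, table):
    """Batch: run inferences against every record of an existing table at once."""
    return [model(record) for record in table]


single = on_demand(score, {"x": 2.0})

output_table = []
continuous(score, iter([{"x": 1.0}, {"x": 3.0}]), output_table)

batch_results = batch(score, [{"x": 0.5}, {"x": 4.0}])
```

The same model function serves all three modes; only the way input is supplied and output is collected changes.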
An audit is a review of a model deployment that verifies its training, testing, and inferencing have not been tampered with. An audit enables drilling into specific inferences from a deployment and filtering them by input parameter, process status, and more.
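Filtering inferences by input parameter and process status can be pictured with a small Python sketch. The record layout (`id`, `input`, `status`) and the helper `filter_inferences` are hypothetical and exist only to illustrate the idea; they do not reflect how AAW stores or exposes audit data.

```python
# Hypothetical inference records from a single deployment.
inference_log = [
    {"id": 1, "input": {"x": 1.0}, "status": "success"},
    {"id": 2, "input": {"x": 2.0}, "status": "failed"},
    {"id": 3, "input": {"x": 1.0}, "status": "success"},
]


def filter_inferences(log, status=None, **input_params):
    """Return the entries matching the given status and input parameter values."""
    matches = []
    for entry in log:
        if status is not None and entry["status"] != status:
            continue
        if any(entry["input"].get(key) != value for key, value in input_params.items()):
            continue
        matches.append(entry)
    return matches


failed = filter_inferences(inference_log, status="failed")
with_x_one = filter_inferences(inference_log, x=1.0)
```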
Kinetica recommends installing AAW using KAgent, as KAgent can install everything AAW requires and can preconfigure everything for the best out-of-the-box experience. Consult Cluster for AAW package install instructions using KAgent. Since AAW relies on Kubernetes, there are two official methods of setting up Kubernetes with KAgent:
- Using an embedded version of Kubernetes -- KAgent will install and set up Kubernetes for you on the same infrastructure as Kinetica
- Using an external Kubernetes cluster -- KAgent will configure AAW to communicate with the external Kubernetes cluster
After the installation is finished, a kml service is available to manage the AAW user interface and API.
Kinetica recommends users opt for an external Kubernetes cluster for the following reasons:
- External Kubernetes clusters are fully managed, monitored, and updated
- External Kubernetes clusters don't contend with the database for resources, so both the database and AAW can achieve better performance
- External Kubernetes clusters can be scaled independently of the database
Amazon Web Services (AWS) offers Elastic Kubernetes Service (EKS) and Microsoft Azure offers Azure Kubernetes Service (AKS); both are fully managed services that can be used with AAW.
If users still wish to opt for the embedded Kubernetes setup, please consider the following limitations:
- The embedded Kubernetes cluster is a stock Kubernetes installation that is statically sized upon cluster initialization
- The embedded Kubernetes cluster cannot be updated independently of the database; Kubernetes is updated only when the database is updated, and only if the database version being installed includes Kubernetes updates
- Kinetica does not provide direct management tools for the embedded Kubernetes cluster, but other tools can be used to connect to and manage it. Additionally, utility scripts located in /opt/gpudb/kml/utils can assist with some cluster management tasks
All the main logs for the AAW service and API are located in
/opt/gpudb/kml/logs. All logs for the AAW user interface are located