SQL Developer Guide
Step-by-step instructions to interact with Kinetica via SQL
Step-by-step instructions to interact with Kinetica via SQL
The following guide provides step-by-step instructions to get started interacting with Kinetica using SQL. This guide demonstrates some of the most common and useful SQL commands, but a detailed look into the full SQL support is available on the SQL page.
For a description of how to configure various clients for issuing SQL commands, see Connecting.
The tutorial script makes reference to a data file in the current directory. This path can be updated to point to a different path local to where the script is being run:
|
|
The data file will be uploaded into KiFS and then loaded into the table from there:
|
|
Before creating any database objects for this tutorial, a schema needs to be created to contain them:
|
|
Tables can be created as replicated using the REPLICATED specification. The OR REPLACE specification can be added to ensure the table is replaced if it already exists in the database. Adding a schema name before the table name in the CREATE TABLE command, e.g., CREATE TABLE my_schema.mytable will create the table under the given schema; however, the schema must first exist. For this tutorial, three tables will be created:
|
|
|
|
|
|
The INSERT INTO ... VALUES statement is used to insert records into a table. At a minimum, each record must have values specified for all of the NOT NULL columns in the table. The values specified in the VALUES clause must match the number and order of the columns specified, or they must align with the ordering of the columns in the table if no columns were specified:
|
|
If each record in the VALUES clause specifies values for all columns in their natural table ordering (the order in which they appeared in the original CREATE TABLE statement) no column names need to be specified, like so:
|
|
As mentioned in the Prerequisites section, the data file is uploaded into Kinetica File System (KiFS) and then inserted into the target table from there:
|
|
A SELECT statement will retrieve the records from the given table. You can reduce the amount of records returned by using a TOP n clause. This will select the first n records returned by the query. Use * to select all columns in the table.
|
|
|
|
The UPDATE statement is used to update matching key values for all records in a table. A WHERE clause can be used to specify the records to update.
|
|
The DELETE statement is used to delete records from a table. A WHERE clause can be used to specify the records to delete.
|
|
Some properties can be altered or added after table creation, including indexes and dictionary encoding. Use the ALTER TABLE command to specify the table and property you want to alter.
ADD INDEX will add a column index to a table column.
|
|
Use ALTER COLUMN to add column properties like dictionary encoding.
|
|
Important
When adding a new property, all existing column properties must be listed along with any new properties; those not listed will be removed.
A simple filter can be performed using the SELECT statement's WHERE clause.
|
|
Note
This query makes use of the column index created in the preceding Indexes section.
BETWEEN can be used to filter values within a range, inclusively.
|
|
The CASE ... WHEN statement is a logical function that allows you to categorize records based on given filters.
|
|
Aggregate functions are just a small offering of the available SQL functions. Aggregate functions are paired with a GROUP BY clause that groups the returned records by values in a given column. The HAVING clause filters records after they've been aggregated.
|
|
Subqueries allow for nested queries within the clauses of a SQL statement. The below example is a simple SELECT statement that includes subqueries in each of its three clauses.
|
|
The WITH statement can be used to give a subquery an alias for use in a larger query. You can reuse the aliased query as if it were another result set, but only in the query immediately following the WITH statement. The "parameters" to the WITH statement are the aliases that will be given to the result columns returned by the SELECT contained within the WITH; the number of "parameters" and columns in the SELECT clause should match.
|
|
Joins allow you to link multiple tables together, along their relations, retrieving associated information from any or all of them. Tables can only be joined if they're sharded similarly or replicated.
An inner join returns only records that have matching values in both tables.
|
|
A left join returns all of the records an inner join does, but additionally, for each record in the table on the left side of the join that has no match along the relation to a record in the table on the right side of the join, a corresponding record will be returned with "left-side" columns populated with the "left-side" record data and the "right-side" columns populated with nulls.
|
|
Note
Full outer joins may require both tables to be replicated or joined on their shard keys. Set merges that perform deduplication of records, like UNION DISTINCT, INTERSECT, and EXCEPT may also need to use replicated tables to ensure the correct results, so a replicated version of the taxi (taxi_trip_data_replicated) table is created at this point in the tutorial.
|
|
A full outer join returns all of the records a left join does, but additionally, for each record in the table on the right side of the join that has no match along the relation to a record in the table on the left side of the join, a corresponding record will be returned with "right-side" columns populated with the "right-side" record data and the "left-side" columns populated with nulls.
|
|
You can also create a table directly from a query using CREATE TABLE ... AS. These can also be created as temporary tables by applying the TEMP specification. Much like regular table creation, you can also specify OR REPLACE to replace the table if it already exists.
|
|
|
|
A UNION can be used to combine homogeneous data sets into one larger data set. UNION & UNION DISTINCT will both combine data sets but only retain the records that are unique across the chosen columns, removing all duplicates. UNION ALL will combine data sets, retaining all records from the source data sets.
|
|
A INTERSECT will combine data sets but only include the records found in both data sets, removing duplicate result records. INTERSECT ALL will retain duplicate intersecting records.
|
|
A EXCEPT will return records that appear in the first data set but not the second data set. Note that the data sets on each side of the EXCEPT will have duplicates removed first, and then the set subtraction will be processed. EXCEPT ALL will retain duplicates from the first set.
|
|
Use TRUNCATE TABLE to remove all records from a table without deleting the table.
|
|
Included below is a complete example containing all the above queries, the data files, and output.
The script can be run via Workbooks in Workbench or any SQL client, including KiSQL, as follows:
|
|
Note
As this script creates a schema and several database objects within it, system admin permission is required to run it.