Overview
Kinetica supports arbitrary iteration over records within a data set for the purpose of generating0 to N result records per iterated record.
The primary use case for this is in returning sub-elements of a given geospatial
column as individual records; e.g., returning each point contained within a set
of WKTs as a distinct row in a result set.
This capability exists in both the native API and SQL using the ITER virtual
table. By joining a data set with the ITER table, each record in the data
set can be returned as a user-defined number of records in the result set.
The iteration can occur over sub-elements of a column value (as in the case of
the point extraction from a WKT) or a fixed number of times for each record in
a given data set. To accomplish this, an iteration expression must be used to
limit the ITER table’s upper bound column, i:
-
a value-based number of iterations—here, the number of iterations for a given
record is equal to the length of a string column in that record:
-
a constant number of iterations—here, there are 5 iterations (result rows)
for each record in the data set:
ITER table can be utilized via the native /create/jointable
endpoint or by joining against the ITER table in SQL.
Examples
Geospatial
In these examples, point & line extraction from a WKT column will be demonstrated, using variable & fixed iteration techniques. They will operate against aline table containing an identifier column and a
WKT column:
Point Extraction - Variable Iteration
The individual points can be extracted from eachLINESTRING. Each record in
line will be iterated over a number of times equal to the number of points
in each line’s WKT value (ST_NPOINTS(wkt)). Each resulting record will be
the point within the WKT at the ith + 1 position (ST_POINTN(wkt, i + 1)).
Note that the ITER table’s i column values begin at 0.
LINESTRING values of three & four points
each were turned into seven records—one for each POINT represented in the
two LINESTRING values.
Line Segment Extraction - Variable Iteration
The line segments can be extracted from eachLINESTRING. Each record in
line will be iterated over a number of times equal to the number of
component line segments in each line’s WKT value (ST_NPOINTS(wkt) - 1).
Each resulting record will be a LINESTRING created from the points within
the WKT at the ith + 1 and ith + 2 positions
(ST_POINTN(wkt, i + 1) & ST_POINTN(wkt, i + 2), respectively). Note
that the ITER table’s i column values begin at 0.
LINESTRING values of three & four points
each were turned into five records—two component line segments for the first
LINESTRING value and three for the second one.
Endpoint Extraction - Fixed Iteration
The points at both ends of eachLINESTRING can be extracted. Each record in
line will be iterated over a fixed number of times—here, twice, one time
for each point at the ends of the WKT value.
Each resulting record will either be the point at the beginning
(ST_STARTPOINT(wkt)) or end (ST_ENDPOINT(wkt)) of the LINETSTRING.
Note that the ITER table’s i column values begin at 0, and that this
query uses implied join syntax—no JOIN keyword is necessary.
LINESTRING values were turned into four
records—one for the POINT at each end of the two LINESTRING values.
Non-Geospatial
In these examples, character extraction from a string column will be demonstrated, using variable & fixed iteration techniques. They will operate against adictionary table containing an identifier column
and a charN word column:
Letter Counting - Variable Iteration
Individual letters can be extracted from each word and then aggregated to count the number of occurrences of each letter across all words indictionary.
Each record in dictionary will be iterated over a number of times equal to
the number of letters in each word (LENGTH(word)). Each resulting record
will be the letter within the word at the ith + 1 position
(SUBSTR(word, i + 1, 1)). Note that the ITER table’s i column
values begin at 0.
JOIN keyword
is necessary.
Data Duplication - Fixed Iteration
Data can be duplicated without being materialized through the use of iteration. This may be useful in several situations, including generating randomized test data from a template table or generating a large number of records for load testing. In either case, the memory footprint will be small, relative to the size of the table involved in the iteration operation. Queries against the result set should be as performant as those on tables with an equivalent number of materialized records. In this example, the words in thedictionary table will be duplicated five
times:
Limitations
- As outer joins (
LEFT,RIGHT, &FULL) have no meaning within the context of iteration, they should not be used in the join - Though multiple iterators can be used in the same operation, they cannot
refer to each other for bounding purposes; e.g.,
iter1.i < iter2.i