A collection is a container for tables and/or views, somewhat akin to a schema in other databases. Other types of views, like joins, can also be created within collections. The contained tables and/or views can have uniform or mixed type schemas.
A view collection is what results from performing a filter operation against another collection.
Besides providing a means for logical grouping of tables, a collection provides the ability to query the contained tables, regardless of the mixture of type schemas, for columns shared between all tables. Tables within the collection that lack the queried-for columns will simply not contribute to the resulting data set.
For example, a transportation department may have two teams gathering ongoing spatial information in their respective jurisdictions. Team A ingests geo-referenced objects from Twitter into a table named TWITTER inside of the collection MASTER. Their objects have X and Y columns corresponding to longitude and latitude, as well as TIMESTAMP and other columns. Meanwhile, Team B collects vehicle movement information, adding their objects with X, Y, TIMESTAMP, TRACK_ID and other columns to a table named VEHICLE_TRACKS, also inside the MASTER collection.
Now, queries for X, Y, and TIMESTAMP on the collection MASTER will be applied to both tables.
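The shared-column behavior described above can be illustrated with a small in-memory model. This is only a sketch of the concept, not the database's implementation: the `query_collection` helper, the sample values, and the extra `NOTES` table are hypothetical and exist purely for illustration.

```python
# Toy model of a collection: each table has a column list and some records.
# All values below are made-up sample data.
MASTER = {
    "TWITTER": {
        "columns": ["X", "Y", "TIMESTAMP", "TEXT"],
        "records": [{"X": -77.03, "Y": 38.89, "TIMESTAMP": 1000, "TEXT": "jam"}],
    },
    "VEHICLE_TRACKS": {
        "columns": ["X", "Y", "TIMESTAMP", "TRACK_ID"],
        "records": [{"X": -77.10, "Y": 38.95, "TIMESTAMP": 1005, "TRACK_ID": "bus-7"}],
    },
    # Hypothetical table that lacks the X and Y columns
    "NOTES": {
        "columns": ["TIMESTAMP", "BODY"],
        "records": [{"TIMESTAMP": 1010, "BODY": "road closed"}],
    },
}


def query_collection(collection, columns):
    """Return the requested columns from every table that has all of them."""
    rows = []
    for table in collection.values():
        # A table lacking any queried column contributes nothing
        if not set(columns) <= set(table["columns"]):
            continue
        for record in table["records"]:
            rows.append({col: record[col] for col in columns})
    return rows


shared = query_collection(MASTER, ["X", "Y", "TIMESTAMP"])
tracks_only = query_collection(MASTER, ["TRACK_ID"])
```

In this model, `shared` holds one row from each of TWITTER and VEHICLE_TRACKS, while NOTES is skipped because it has no X or Y column; `tracks_only` draws on VEHICLE_TRACKS alone.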
Collections do not themselves have a time-to-live in the way that views do. However, an empty collection will be removed from memory automatically when the database on which it resides is restarted. In that case, it will need to be recreated for applications that depend on its existence, like ODBC or Reveal.
Setting a time-to-live on a collection will set the time-to-live of every table & view contained within it. This creates an effective time-to-live for the collection, as each access of a member of the collection will extend its life.
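One way to picture the effective time-to-live is with a toy model in which the collection is considered alive as long as any member is. This is a sketch under that assumption, not the database's actual expiration logic; the `Member` class, the helper functions, and the abstract time units are all hypothetical.

```python
class Member:
    """Hypothetical stand-in for a table or view inside a collection."""

    def __init__(self, name):
        self.name = name
        self.ttl = None
        self.expires_at = None

    def access(self, now):
        # Assumption: each access restarts the member's time-to-live countdown
        self.expires_at = now + self.ttl


def set_collection_ttl(members, ttl, now):
    """Apply the same time-to-live to every member of the collection."""
    for member in members:
        member.ttl = ttl
        member.expires_at = now + ttl


def collection_expires_at(members):
    """Assumption: the collection is effectively alive until its last member expires."""
    return max(member.expires_at for member in members)


members = [Member("TWITTER"), Member("VEHICLE_TRACKS")]
set_collection_ttl(members, ttl=10, now=0)
before_access = collection_expires_at(members)

members[0].access(now=7)  # accessing one member extends the effective lifetime
after_access = collection_expires_at(members)
```

Here `before_access` is 10 and `after_access` is 17: a single access to one member pushes out the point at which the whole collection would be empty.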
Collections have the same naming criteria as tables.
A collection can be created using the /create/table endpoint with the is_collection option set to true.
Note
Since collections can contain tables of different types, you don't need to specify a type ID.
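As a rough illustration of what such a request might carry, the sketch below builds a JSON body for /create/table. The field names (`table_name`, `type_id`, `options`) and the use of the string "true" are assumptions about the request layout rather than something stated here, and the `build_create_collection_request` helper is hypothetical; check the endpoint reference before relying on it.

```python
import json


def build_create_collection_request(name):
    """Build an assumed /create/table request body for a collection."""
    return {
        "table_name": name,
        # No type ID is needed for a collection, so it is left empty
        "type_id": "",
        # Assumption: option values are passed as strings
        "options": {"is_collection": "true"},
    }


payload = json.dumps(build_create_collection_request("my_collection"))
```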
In Python, the GPUdbTable class serves as a convenient wrapper for many table-related endpoints and can be used to create a collection implicitly, if it doesn't already exist; for example:
    # h_db is assumed to be an existing database connection handle
    gpudb.GPUdbTable(
        None,
        name = "my_collection",
        db = h_db,
        options = gpudb.GPUdbTableOptions.default().is_collection(True)
    )
Once a collection exists, tables and views can be placed in it at the time they are created by using the compatible endpoints with the collection_name option set to the collection's name.
For example, in Python:
    # Create a column list
    columns = [
        [ "id", gpudb.GPUdbRecordColumn._ColumnType.INT ],
        [ "name", gpudb.GPUdbRecordColumn._ColumnType.STRING, gpudb.GPUdbColumnProperty.CHAR64 ]
    ]
    # Create a simple table using the column list
    gpudb.GPUdbTable(
        columns,
        name = "table_in_collection",
        db = h_db,
        options = gpudb.GPUdbTableOptions.default().collection_name("my_collection")
    )
Once records/tables are available in the collection, you can use the /get/records/fromcollection endpoint to retrieve data from the collection.
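For example, in Python:
    # Retrieve records from the collection, starting at offset 0
    result = h_db.get_records_from_collection_and_decode("my_collection", 0, -9999)
    # Print the column values of each returned record
    for collection_record in result["records"]:
        print(list(collection_record.values()))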