Full Text Search

The SQL FILTER_BY_STRING function, as well as the /filter/bystring endpoint, enables the use of several string-based search modes, including full text search via the search mode. The syntax and restrictions for this search follow, including examples of applying different search criteria.

In order for a full text search to be applied to a column, the following requirements must be met:

  • Full text search must be enabled by setting enable_text_search to true in the /opt/gpudb/core/etc/gpudb.conf configuration file; see Text Search for details.
  • The column must be a string type and have the text_search column property applied. The following effective types under the string base type are text-searchable:
    • char1 - char256
    • date
    • datetime
    • ipv4
    • string
    • time
    • uuid

Individual Term Match

To search for one or more search terms, add the terms to a space-separated list.

Search TextResult
perfect unionmatches any record containing perfect, union, or both

Exact Phrase Match

To search for an exact phrase, double quote the entire search text.

Search TextResult
"Perfect Union"matches the exact phrase Perfect Union

Boolean Operations

Valid operators include NOT, AND, OR, & parentheses (). An OR is assumed if no operator is specified.

Search TextResult
justice AND tranquilitymatches records containing both justice and tranquility
justice OR tranquility

matches records containing either justice or tranquility (or both)

Note

This is the default behavior, so justice OR tranquility is equivalent to justice tranquility

justice NOT tranquilitymatches records containing justice that don't contain tranquility
(justice tranquility honesty) AND peacematches records containing peace that also contain any one or more of justice, tranquility, & honesty

Wildcard Match

Wildcard matches can be specified with either single-character or multi-character wildcards, or a combination of the two:

  • ? - Exactly one character
  • * - Zero or more characters
Search TextResult
est*ish*matches any records containing a word that starts with est, followed by any number of characters, followed by is, and ending with any number of additional characters; this would match records containing establish, establishable, and establishment
est???is?matches any records containing a word that starts with est, followed by exactly three characters, followed by is, and ending with any single character; this would match records containing establish, but not establishable or establishment
est*abl?matches any records containing a word that starts with est, followed by any number of characters, followed by abl, and ending with any single character; this would match records containing establishable, but not establish or establishment

Fuzzy Match

Fuzzy matches can be specified by appending a tilde (~) to the end of the fuzzy match term. A decimal number between 0 (inclusive) and 1 (exclusive) can be used to specify the minimum relative similarity of the match, with 0 indicating the fuzziest match possible, and values closer to 1 indicating an increasingly stricter match. The default match threshold is .5.

Search TextResult
rear~matches records containing rear, fear, bear, or read
rear~.8matches records containing rear, but not fear, bear, or read

Proximity Match

To match two words within a specified distance of each other, quote the two terms, separated by a space, and append a tilde (~) followed by the distance in number of words.

Search TextResult
"Union Tranquility"~10matches records containing Union and Tranquility within 10 words of each other

Term Relevance

To increase the relevance of a matched search term versus other given terms, append a caret (^) followed by a positive numeric boosting factor for that term. A quote-enclosed phrase can also have its relevance increased. The default factor is 1.

Search TextResult
we have fun^5matches records containing any of we, have, or fun, but giving a greater match relevance score to records containing fun

Required, Optional, & Prohibited Matches

Within the context of a group of terms, any of the terms can be marked as required or prohibited:

  • + - when prepended to a term, requires that the term be matched
  • - - when prepended to a term, prohibits matches on the term
  • <no marker> - when neither marker is prepended to a term, the term is optional
Search TextResult
we -never have +funmatches records containing fun, optionally containing we or have, and not containing never

Range Matches

Inclusive [<term1> TO <term2>] and exclusive {<term1> TO <term2>} matches are supported. Matches where one side is inclusive and the other exclusive are not supported.

Search TextResult
[100 TO 200]

matches records containing any numbers that lexicographically sort between 100 and 200, including 100 and 200

Note

This is a lexicographical comparison, so a search of [100 TO 200] would match 2, 20, or 1AZ

{alpha TO beta}matches records containing words that lexicographically sort between alpha and beta, not including either alpha or beta

Escaping Special Characters

Special characters are escaped with a backslash (\). Special characters include:

+

-

^

"

~

*

?

:

\

&&

||

!

( )

{ }

[ ]

Restrictions

  • The first character of a search term cannot be a wildcard (* or ?)

  • Search terms cannot be any of the following:

    a

    an

    and

    are

    as

    at

    be

    but

    by

    for

    if

    in

    into

    is

    it

    no

    not

    of

    on

    or

    such

    that

    the

    their

    then

    these

    they

    this

    to

    was

    will

    with