.. _catalog-search-api:

Search API Reference
====================

This API reference is organized by actions and features. Each resource type has one or more data representations and one or more methods.

.. _catalog-search-standard:

OpenSearch standard search
--------------------------

The basic usage of OpenSearch interfaces is described in the section about :ref:`opensearch-1`.

The possible search parameters are defined by the OpenSearch description document. With few exceptions, an entry must match a parameter value exactly to be listed in a search result.

Typical filter parameters that can be used are the following:

.. list-table::
   :widths: 20 10 50
   :header-rows: 1

   * - Identifier(s)
     - Purpose
     - Remarks
   * - start/stop
     - Specify the **start and the end** of the temporal range products have to match.
     - Should always be used together.
   * - trel
     - Specifies what **relation** a product's **time** must have with the temporal range defined by the values for the search parameters ``start``/``stop`` in order to create a match.
     - The default value is *intersects*.
   * - geom
     - Specifies the **area of interest**.
     - WKT format is used; the geometry is usually a POLYGON.
   * - bbox
     - Specifies the area of interest as a **bounding box**.
     - Two pairs of coordinates for the bottom-left and top-right corners of the rectangle.
   * - dcg
     - Specifies whether to **double-check the geometry** for more precise filtering.
     - TODO: values?
   * - rel
     - Specifies what **relation** a product's **spatial geometry** must have with the geometry defined by ``geom`` or ``bbox`` in order to create a match.
     - The default value is *intersects*.
   * - uid
     - Specifies the **unique identifier** of the product to be matched.
     - This returns one product.
   * - cat
     - Specifies the **category** matching products must belong to.
     - Categories are often used to group products of an index into different series and are used as a filter for series.
   * - pt
     - Specifies the **product type** that products have to match.
     - The list of possible values depends on the searched catalog index.
   * - psn
     - Specifies the **short name of the platform** (satellite) from which matching products derive.
     - The list of possible values depends on the searched catalog index.
   * - psi
     - Specifies the **serial identifier of the platform** (satellite) from which matching products derive.
     - The list of possible values depends on the searched catalog index.
   * - isn
     - Specifies the **short name of the instrument** from which matching products derive.
     - The list of possible values depends on the searched catalog index.
   * - st
     - Specifies the **sensor type** that products have to match.
     - The list of possible values depends on the searched catalog index; typical values are *OPTICAL* or *RADAR*.
   * - od
     - Specifies the **orbit direction** that products have to match.
     - The possible values are *ASCENDING* and *DESCENDING*.
   * - ot
     - Specifies the **orbit type** that products have to match.
     - The only possible value is usually *LEO*.
   * - title
     - Specifies the **title** that a product has to match.
     - Using this search parameter usually results in a single match.
   * - track
     - Specifies the orbit **track number** that a product has to match.
     - The value is expressed as a number, set or interval.
   * - swath
     - Specifies the **swath** identifier.
     - The list of possible values depends on the searched catalog index.
   * - lc
     - Specifies the **land coverage** that products have to match.
     - The value is expressed as a number, set or interval.
   * - cc
     - Specifies the **cloud coverage** that products have to match.
     - The value is expressed as a number, set or interval.
   * - pi
     - Specifies the **parent identifier**, or the **collection** of the entry in a hierarchy of datasets.
     - The list of possible values depends on the searched catalog index.

The following search parameters do not represent filter criteria, but influence the search result:

.. list-table::
   :widths: 20 10 50
   :header-rows: 1

   * - Identifier
     - Purpose
     - Remarks
   * - count
     - Specifies the **number of entries** to be returned.
     - The default is usually 20 and the maximum allowed is usually 200.
   * - startIndex
     - Specifies the **index of the first entry** of the result to be returned (alternative to ``startPage``).
     - The first entry usually has the index 1; the actual value is specified in the OpenSearch description document related to the search.
   * - startPage
     - Specifies the number of the **result page** to be returned (alternative to ``startIndex``).
     - A page has as many entries as the value (or default value) of ``count``. The first page usually has the index 1; the actual value is specified in the OpenSearch description document related to the search.
   * - do
     - Specifies the **download origin** (keyword, hostname...) to adapt the enclosure, i.e. the URL of the related downloadable resource.
     - If the value is enclosed in square brackets (e.g. ``[terradue]``), an enclosure is returned only if one is found for this source.

.. _catalog-search-free:

Free text search
----------------

There is an OpenSearch parameter, usually named **q** (*searchTerms*), that allows more sophisticated free-text searches. The free-text input provides a quick search that uses a consistent notation for phrases and logical operators.

The query string is parsed into a series of terms and operators. A term can be a single word, such as ``L0`` or ``SLC``, or a phrase surrounded by double quotes, such as ``"Synthetic Aperture Radar"``, which searches for all the words in the phrase, in the same order.

Catalog Metadata fields
^^^^^^^^^^^^^^^^^^^^^^^^^

A dataset is described by a set of fields. Some of them can be targeted by field-specific filters:

.. list-table::
   :widths: 20 10 50
   :header-rows: 1

   * - Field name
     - Type
     - Description
   * - isn
     - string
     - Instrument short name
   * - ot
     - string
     - Orbit type
   * - pl
     - string
     - Processing level
   * - psn
     - string
     - Platform short name
   * - psi
     - string
     - Platform serial identifier
   * - pt
     - string
     - Product type
   * - sr
     - string
     - Sensor spectral range
   * - st
     - string
     - Sensor type
   * - summary
     - string
     - Abstract of the dataset
   * - title
     - string
     - Title of the dataset
   * - track
     - number
     - Orbit track

Field names
^^^^^^^^^^^

By default, all metadata fields of the dataset are searched for the search terms, but it is possible to target specific fields in the query syntax:

- where the track field is 15 ::

      track:15

- where the swath field contains IW or EW. If you omit the OR operator, the default operator will be used ::

      swath:(IW OR EW)

- where the instrument description field contains the exact phrase "Synthetic Aperture Radar" ::

      instrumentDescription:"Synthetic Aperture Radar"

- where any of the fields processingInformation.method, processingInformation.processorName or processingInformation.processingLevel contains L1 or Level 1 (note how we need to escape the ``*`` with a backslash): ::

      processingInformation.\*:(L1 "Level 1")

- where the field title has no value (or is missing): ::

      _missing_:title

- where the field title has any non-null value: ::

      _exists_:title

Wildcards
^^^^^^^^^

Wildcard searches can be run on individual terms, using ``?`` to replace a single character, and ``*`` to replace zero or more characters: ::

    IW? S1*

Be aware that wildcard queries can use an enormous amount of memory and perform very badly: consider how many terms need to be queried to match the query string ``a* b* c*``.

.. warning::

   Allowing a wildcard at the beginning of a word (e.g. ``*ing``) is particularly heavy, because all terms in the index need to be examined, just in case they match.

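Given the cost described in the warning above, a client may want to spot terms with a leading wildcard before sending a query. The following is a rough Python sketch of such a check. ``leading_wildcard_terms`` is a hypothetical helper, not part of the catalog API, and it only splits on whitespace rather than parsing the full query syntax.

```python
from typing import List


def leading_wildcard_terms(query: str) -> List[str]:
    """Return the whitespace-separated terms whose value starts with * or ?."""
    flagged = []
    for token in query.split():
        # Ignore leading boolean operators and an opening parenthesis.
        term = token.lstrip("+-(")
        # Drop a "field:" prefix if there is one.
        _, _, value = term.rpartition(":")
        value = value.lstrip('("')
        # An open range bound such as "*]" or "*}" is not a wildcard term.
        is_range_bound = value.rstrip(")").endswith(("]", "}"))
        if value[:1] in ("*", "?") and not is_range_bound:
            flagged.append(token)
    return flagged
```

For example, ``leading_wildcard_terms("S1* IW? *ing")`` flags only ``*ing``, while an open range such as ``pl:[L1 TO *]`` is not flagged.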
Wildcarded terms are not analyzed by default: they are lowercased but no further analysis is done, mainly because it is impossible to accurately analyze a word that is missing some of its letters.

Regular expressions
+++++++++++++++++++

Regular expression patterns can be embedded in the query string by wrapping them in forward slashes (``/``): ::

    parentIdentifier:/[EI]W_SLC__1SS.?/

The supported regular expression syntax is explained in :doc:`Regular expression syntax `.

.. WARNING::

   A query string such as the following would force Elasticsearch to visit every term in the index: ::

       /.*n/

   Use with caution!

Fuzziness
^^^^^^^^^

We can search for terms that are similar to, but not exactly like, our search terms, using the “fuzzy” operator: ::

    sent~ rdar~

This uses the Damerau-Levenshtein distance to find all terms with a maximum of two changes, where a change is the insertion, deletion or substitution of a single character, or the transposition of two adjacent characters.

The default edit distance is 2, but an edit distance of 1 should be sufficient to catch 80% of all human misspellings. It can be specified as: ::

    quikc~1

Ranges
^^^^^^

Ranges can be specified for date, numeric or string fields. Inclusive ranges are specified with square brackets ``[min TO max]`` and exclusive ranges with curly brackets ``{min TO max}``.

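The bracket notation is regular enough to be generated rather than typed by hand. The following is a minimal Python sketch. ``range_clause`` is a hypothetical helper name, not part of the catalog API, and it assumes that an omitted bound is written as ``*``.

```python
def range_clause(field, lower=None, upper=None,
                 include_lower=True, include_upper=True):
    """Build a "field:[min TO max]" style range clause."""
    # A missing bound is left open with "*".
    lo = "*" if lower is None else str(lower)
    hi = "*" if upper is None else str(upper)
    # Square brackets are inclusive, curly brackets are exclusive.
    open_bracket = "[" if include_lower else "{"
    close_bracket = "]" if include_upper else "}"
    return f"{field}:{open_bracket}{lo} TO {hi}{close_bracket}"
```

With this sketch, ``range_clause("track", 1, 5)`` gives ``track:[1 TO 5]`` and ``range_clause("track", 1, 5, include_upper=False)`` gives ``track:[1 TO 5}``.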
All days in 2012: ::

    startDate:[2012-01-01 TO 2012-12-31]

Track 1..5 ::

    track:[1 TO 5]

Topic categories between alpha and omega, excluding alpha and omega: ::

    tc:{alpha TO omega}

Processing Level from L1 upwards ::

    pl:[L1 TO *]

Modified before 2012 ::

    modified:{* TO 2012-01-01}

Curly and square brackets can be combined:

Numbers from 1 up to but not including 5 ::

    track:[1 TO 5}

Ranges with one side unbounded can use the following syntax: ::

    orbitNumber:>10
    orbitNumber:>=10
    orbitNumber:<10
    orbitNumber:<=10

.. note::

   To combine an upper and lower bound with the simplified syntax, you would need to join two clauses with an AND operator: ::

       orbitNumber:(>=10 AND <20)
       orbitNumber:(+>=10 +<20)

The parsing of ranges in query strings can be complex and error prone. It is much more reliable to use an explicit range filter.

Boosting
^^^^^^^^

Use the boost operator ``^`` to make one term more relevant than another. For instance, if we want to find all datasets in dual polarisation, but we are especially interested in dual polarisation in IW swath: ::

    som:IW_DP^2 pm:D

The default boost value is 1, but can be any positive floating point number. Boosts between 0 and 1 reduce relevance.

Boosts can also be applied to phrases or to groups: ::

    "Synthetic Aperture Radar"^2 (IW_DP SAR)^4

Boolean operators
^^^^^^^^^^^^^^^^^

By default, all terms are optional, as long as one term matches. A search for ``sar msi atsr`` will find any document that contains one or more of ``sar`` or ``msi`` or ``atsr``. We have already discussed the default operator above which allows you to force all terms to be required, but there are also boolean operators which can be used in the query string itself to provide more control.

The preferred operators are ``+`` (this term must be present) and ``-`` (this term must not be present). All other terms are optional.
For example, this query: ::

    S1A SAR +IW -EW

states that:

- IW must be present
- EW must not be present
- S1A and SAR are optional; their presence increases the relevance

The familiar operators AND, OR and NOT (also written ``&&``, ``||`` and ``!``) are also supported. However, the effects of these operators can be more complicated than is obvious at first glance. NOT takes precedence over AND, which takes precedence over OR. While ``+`` and ``-`` only affect the term to the right of the operator, AND and OR can affect the terms to the left and right.

Rewriting the above query using AND, OR and NOT demonstrates the complexity: ::

    S1A OR SAR AND IW AND NOT EW

This is incorrect, because SAR is now a required term. ::

    (S1A OR SAR) AND IW AND NOT EW

This is incorrect because at least one of S1A or SAR is now required and the search for those terms would be scored differently from the original query. ::

    ((S1A AND IW) OR (SAR AND IW) OR IW) AND NOT EW

This form now replicates the logic from the original query correctly, but the relevance scoring bears little resemblance to the original.

Grouping
^^^^^^^^

Multiple terms or clauses can be grouped together with parentheses, to form sub-queries: ::

    (S1A OR SAR) AND IW

Groups can be used to target a particular field, or to boost the result of a sub-query: ::

    status:(archived OR planned) at:(nominal calibration)^2

Reserved characters
^^^^^^^^^^^^^^^^^^^

If you need to use any of the characters which function as operators in your query itself (and not as operators), then you should escape them with a leading backslash. For instance, to search for ``(1+1)=2``, you would need to write your query as ``\(1\+1\)\=2``.

The reserved characters are: ::

    + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /

Failing to escape these special characters correctly could lead to a syntax error which prevents your query from running.
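
Escaping by hand is easy to get wrong, so a client can apply it automatically to user-supplied text. The following is a minimal Python sketch. ``escape_query_term`` is a hypothetical helper, not part of the catalog API. One assumption is labelled in the code: Elasticsearch's own query string documentation states that ``<`` and ``>`` cannot be escaped at all, so the sketch removes them instead, on the assumption that the catalog behaves the same way.

```python
import re

# Reserved characters from the list above, minus "<" and ">".
_RESERVED = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/])')


def escape_query_term(text: str) -> str:
    """Escape text so that it is searched literally, not parsed as operators."""
    # Assumption: as in Elasticsearch, "<" and ">" cannot be escaped,
    # so they are removed from the text.
    text = text.replace("<", "").replace(">", "")
    # Prefix every remaining reserved character with a backslash.
    return _RESERVED.sub(r"\\\1", text)
```

Applied to the example above, ``escape_query_term("(1+1)=2")`` returns ``\(1\+1\)\=2``. The helper is meant for literal values only; passing a whole query through it would also escape the operators that are meant to act as operators.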