Hands-On Exercise 9: using an OpenSearch catalogue
In this exercise we will use an OpenSearch [1] catalogue URL as the input source of the workflow, and query the catalogue with OpenSearch parameters to get the input products.
Prerequisites
- You have cloned the Hands-On git repository (see Clone the Hands-On repository),
- (Only for python) You have installed the required software (see Prerequisites when using python),
- You have installed the BEAM Toolbox (see Install the BEAM Toolbox).
Install the Hands-On
- To install Hands-On Exercise 9, type:
cd
cd dcs-hands-on
mvn clean install -D hands.on=9 -P bash
Inspect the application.xml
- Inspect the application.xml. It differs slightly from the one in Hands-On Exercise 6 (a multi-node workflow):
<?xml version="1.0" encoding="UTF-8"?>
<application id="beam_arithm">
<jobTemplates>
<!-- BEAM BandMaths operator job template -->
<jobTemplate id="expression">
<streamingExecutable>/application/expression/run</streamingExecutable>
<defaultParameters>
<parameter id="expression">l1_flags.INVALID?0:radiance_13>15?0:100+radiance_9-(radiance_8+(radiance_10-radiance_8)*27.524/72.570)</parameter>
<parameter id="startdate" type="opensearch" target="time:start">2012-04-06T10:24:29.000Z</parameter>
<parameter id="enddate" type="opensearch" target="time:end">2012-04-07</parameter>
<parameter id="qbbox" type="opensearch" target="geo:box">2.99,58.45,0.53,58.26</parameter>
</defaultParameters>
</jobTemplate>
<!-- BEAM Level 3 processor job template -->
<jobTemplate id="binning">
<streamingExecutable>/application/binning/run</streamingExecutable>
<defaultParameters>
<parameter id="cellsize">9.28</parameter>
<parameter id="bandname">out</parameter>
<parameter id="bitmask">l1_flags.INVALID?0:radiance_13>15?0:100+radiance_9-(radiance_8+(radiance_10-radiance_8)*27.524/72.570)</parameter>
<parameter id="bbox">-180,-90,180,90</parameter>
<parameter id="algorithm">MIN_MAX</parameter>
<parameter id="outputname">binned</parameter>
<parameter id="resampling">binning</parameter>
<parameter id="palette">#MCI_Palette
color0=0,0,0
color1=0,0,154
color2=54,99,250
color3=110,201,136
color4=166,245,8
color5=222,224,0
color6=234,136,0
color7=245,47,0
color8=255,255,255
numPoints=9
sample0=98.19878118960284
sample1=98.64947122314665
sample2=99.10016125669047
sample3=99.5508512902343
sample4=100.0015413237781
sample5=100.4522313573219
sample6=100.90292139086574
sample7=101.35361142440956
sample8=101.80430145795337</parameter>
<parameter id="band">1</parameter>
<parameter id="tailor">true</parameter>
</defaultParameters>
<defaultJobconf>
<property id="ciop.job.max.tasks">1</property>
</defaultJobconf>
</jobTemplate>
</jobTemplates>
<workflow id="hands-on-9" title="Exercise, using an OpenSearch catalogue" abstract="Exercise 9, using an OpenSearch catalogue">
<workflowVersion>1.0</workflowVersion>
<node id="node_expression">
<job id="expression"></job>
<sources>
<source refid="cas:series">https://catalog.terradue.com/eo-samples/series/mer_rr__1p/description</source>
</sources>
<parameters>
</parameters>
</node>
<node id="node_binning">
<job id="binning"></job>
<sources>
<source refid="wf:node">node_expression</source>
</sources>
<parameters>
<parameter id="bitmask"/>
</parameters>
</node>
</workflow>
</application>
Note the different source in the node_expression:
<node id="node_expression">
<job id="expression"></job>
<sources>
<source refid="cas:series">https://catalog.terradue.com/eo-samples/series/mer_rr__1p/description</source>
</sources>
<parameters>
</parameters>
</node>
The source is the URL of an OpenSearch description document of a catalogue series. That XML document contains information on how to query products of a certain type or collection (or series), i.e. URL templates and the parameter descriptions.
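To make "URL templates" concrete, the following Python sketch reads the <Url> elements of a description document and maps each MIME type to its search template. The description string is a hypothetical, heavily trimmed stand-in (the example.com URLs and the two MIME types are placeholders, not the contents of the real mer_rr__1p description); only the OpenSearch 1.1 namespace and the type/template attributes of <Url> come from the OpenSearch specification.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OpenSearch 1.1 specification.
OS_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Hypothetical, trimmed stand-in for a description document; the real one
# published by the catalogue contains more <Url> elements and parameters.
SAMPLE_DESCRIPTION = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>mer_rr__1p</ShortName>
  <Url type="application/atom+xml"
       template="https://example.com/search?format=atom&amp;start={time:start?}&amp;stop={time:end?}&amp;bbox={geo:box?}"/>
  <Url type="application/json"
       template="https://example.com/search?format=json&amp;start={time:start?}"/>
</OpenSearchDescription>"""


def url_templates(description_xml: str) -> dict[str, str]:
    """Map the MIME type of each <Url> element to its search URL template."""
    root = ET.fromstring(description_xml)
    return {
        url.get("type"): url.get("template")
        for url in root.findall(f"{{{OS_NS}}}Url")
    }


for mime_type, template in url_templates(SAMPLE_DESCRIPTION).items():
    print(mime_type, "->", template)
```

Against the real catalogue, the document would first be fetched from the description URL (for instance with urllib.request.urlopen) and then parsed in the same way.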
Note also the OpenSearch parameters (those with type="opensearch" and a target attribute) defined in the expression jobTemplate:
<jobTemplate id="expression">
<streamingExecutable>/application/expression/run</streamingExecutable>
<defaultParameters>
<parameter id="expression">l1_flags.INVALID?0:radiance_13>15?0:100+radiance_9-(radiance_8+(radiance_10-radiance_8)*27.524/72.570)</parameter>
<parameter id="startdate" type="opensearch" target="time:start">2012-04-06T10:24:29.000Z</parameter>
<parameter id="enddate" type="opensearch" target="time:end">2012-04-07</parameter>
<parameter id="qbbox" type="opensearch" target="geo:box">2.99,58.45,0.53,58.26</parameter>
</defaultParameters>
</jobTemplate>
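The selection rule above can be illustrated with a short Python sketch that keeps only the parameters marked type="opensearch" and maps each target to its default value. It only illustrates the rule as described here and is not the sandbox framework's own code; the function name opensearch_targets is hypothetical.

```python
import xml.etree.ElementTree as ET

# The expression job template from application.xml.
JOB_TEMPLATE = """<jobTemplate id="expression">
<streamingExecutable>/application/expression/run</streamingExecutable>
<defaultParameters>
<parameter id="expression">l1_flags.INVALID?0:radiance_13>15?0:100+radiance_9-(radiance_8+(radiance_10-radiance_8)*27.524/72.570)</parameter>
<parameter id="startdate" type="opensearch" target="time:start">2012-04-06T10:24:29.000Z</parameter>
<parameter id="enddate" type="opensearch" target="time:end">2012-04-07</parameter>
<parameter id="qbbox" type="opensearch" target="geo:box">2.99,58.45,0.53,58.26</parameter>
</defaultParameters>
</jobTemplate>"""


def opensearch_targets(job_template_xml: str) -> dict[str, str]:
    """Return {target: default value} for every type="opensearch" parameter."""
    root = ET.fromstring(job_template_xml)
    return {
        p.get("target"): (p.text or "").strip()
        for p in root.iter("parameter")
        # Plain processing parameters such as "expression" are skipped.
        if p.get("type") == "opensearch" and p.get("target")
    }


print(opensearch_targets(JOB_TEMPLATE))
# {'time:start': '2012-04-06T10:24:29.000Z', 'time:end': '2012-04-07', 'geo:box': '2.99,58.45,0.53,58.26'}
```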
Catalogue query
This section is a short digression to help you understand how the products are retrieved through OpenSearch.
The OpenSearch description document introduced above contains a number of URL templates for searching, each returning product metadata in a different format.
The URL template used for querying the products is this:
Note
URL template for product query
https://catalog.terradue.com:443//eo-samples/series/mer_rr__1p/search?format=atomeop&count={count?}&startPage={startPage?}&startIndex={startIndex?}&q={searchTerms?}&lang={language?}&update={dct:modified?}&do={t2:downloadOrigin?}&start={time:start?}&stop={time:end?}&trel={time:relation?}&bbox={geo:box?}&uid={geo:uid?}&geom={geo:geometry?}&rel={geo:relation?}&cat={dc:subject?}&psn={eop:platform?}&isn={eop:instrument?}&st={eop:sensorType?}&pl={eop:processingLevel?}&ot={eop:orbitType?}&title={eop:title?}&pi={eop:parentIdentifier?}&od={eop:orbitDirection?}&lc={t2:landCover?}&dcg={t2:doubleCheckGeometry?}
The parts time:start, time:end and geo:box are unique identifiers that refer to standardised search criteria: here, the temporal range (defined by a start and an end time) and the geographical area (a bounding box defined by the WGS 84 coordinates of its lower-left and upper-right corners) that the desired products must match. The question mark after an identifier means that the parameter is optional for the search.
The parameters in the <defaultParameters> section use these three identifiers as targets. This means that, when the actual product query is performed, each portion in curly brackets is replaced with the text content of the parameter element that targets it, or with an empty string if no default value has been provided.
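A minimal Python sketch of this substitution, assuming a template trimmed to a few of the parameters shown above and the three default values from the expression job template. The regular expression and the helper names resolve_template and placeholders are illustrative assumptions, not the sandbox's implementation. Values are inserted verbatim, as in the resolved URL shown in this section; a production client would normally percent-encode them.

```python
import re

# Matches placeholders such as {count?} or {time:start?}: group 1 is the
# identifier, and a trailing "?" marks the parameter as optional.
PLACEHOLDER = re.compile(r"\{([^{}?]+)\??\}")

# Template trimmed to a subset of the parameters of the real one.
TEMPLATE = (
    "https://catalog.terradue.com:443//eo-samples/series/mer_rr__1p/search"
    "?format=atomeop&count={count?}&startPage={startPage?}"
    "&start={time:start?}&stop={time:end?}&bbox={geo:box?}"
)

# Defaults of the type="opensearch" parameters, keyed by their target.
VALUES = {
    "time:start": "2012-04-06T10:24:29.000Z",
    "time:end": "2012-04-07",
    "geo:box": "2.99,58.45,0.53,58.26",
}


def placeholders(template: str) -> list[tuple[str, bool]]:
    """List (identifier, is_optional) for every placeholder in the template."""
    return [(m.group(1), m.group(0).endswith("?}")) for m in PLACEHOLDER.finditer(template)]


def resolve_template(template: str, values: dict[str, str]) -> str:
    """Replace each placeholder with its value, or "" when no value is given.

    Values are inserted verbatim; a real client would usually percent-encode them.
    """
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), ""), template)


print(resolve_template(TEMPLATE, VALUES))
# https://catalog.terradue.com:443//eo-samples/series/mer_rr__1p/search?format=atomeop&count=&startPage=&start=2012-04-06T10:24:29.000Z&stop=2012-04-07&bbox=2.99,58.45,0.53,58.26
```

Note that count and startPage end up empty; dropping such empty parameters gives the shorter form used for readability in this section.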
The replacement results in this query URL (shortened for readability):
Note
Resolved URL for query
https://catalog.terradue.com:443//eo-samples/series/mer_rr__1p/search?format=atomeop&start=2012-04-06T10:24:29.000Z&stop=2012-04-07&bbox=2.99,58.45,0.53,58.26
At that URL we find an Atom XML document containing two entries, one for each product. These entries contain the download URLs of the actual product files.
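The following Python sketch extracts the download URLs from such a feed. It assumes that each entry carries its download URL in an Atom link element with rel="enclosure", a link relation defined by the Atom specification (RFC 4287); whether the catalogue uses exactly this form is an assumption. The two entries and their example.com URLs are hypothetical placeholders, not the actual catalogue response.

```python
import xml.etree.ElementTree as ET

# Atom namespace (RFC 4287).
ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical two-entry feed; titles and URLs are placeholders.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>product-1</title>
    <link rel="enclosure" href="https://example.com/data/product-1"/>
  </entry>
  <entry>
    <title>product-2</title>
    <link rel="enclosure" href="https://example.com/data/product-2"/>
  </entry>
</feed>"""


def download_urls(feed_xml: str) -> list[str]:
    """Collect the href of every rel="enclosure" link found in the feed's entries."""
    root = ET.fromstring(feed_xml)
    urls = []
    for entry in root.findall(f"{{{ATOM_NS}}}entry"):
        for link in entry.findall(f"{{{ATOM_NS}}}link"):
            if link.get("rel") == "enclosure":
                urls.append(link.get("href"))
    return urls


for url in download_urls(SAMPLE_FEED):
    print(url)
```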
The sandbox framework downloads the products, so at this point we have the same products as in Hands-On Exercise 6: a multi-node workflow.
Using OpenSearch makes the application more flexible: changing the search parameters selects different input data, without having to edit input file lists on the sandbox host.
Run and debug the workflow
- Run the node_expression:
ciop-run node_expression
- Copy the Tracking URL and paste it into a browser,
- Check the log of one of the two Tasks, as described in Exercise 2: make a robust workflow and debug it.
Note that the input product is now downloaded from an external repository: its location results from the initial OpenSearch query to the catalogue and from the way the catalogue references datasets.
Recap
- We used an OpenSearch catalogue as the source of the first node of the workflow;
- We defined a number of OpenSearch parameters to query the catalogue;
- We processed the query results in the node_expression;
- We learnt that OpenSearch job parameters determine which input products are selected.
Footnotes
[1] OpenSearch