Run executable design patterns

A processing task triggers the execution of a run executable.

The run executable reads the inputs via the stdin channel as if one would do:

$ echo "file1\nfile2" | myExecutable

This section defines typical design patterns which include:

  • sourcing the required libraries (bash and Python are available; R is in beta)
  • get parameter values
  • reading the stdin
  • stage-in/out
  • apply user processing

There are two standard design patterns:

  • Process n inputs to generate n (or m) outputs (parallel)
  • Process n inputs to generate one output (aggregation)

And there auxiliary nodes that do not process the inputs but arrange and/or combine them for subsequent nodes in the workflow.

Process n inputs and generate n (or m) outputs

This design pattern processes inputs independently from one another. There will several processing task processing a number of inputs each.

The typical structure of such a run executable is:

!define DIAG_NAME Workflow example

!include includes/skins.iuml

skinparam backgroundColor #FFFFFF
skinparam componentStyle uml2

start

:Source libraries;

:Get parameter values;

while (check stdin?) is (line)
  :Stage-in data;
  :Apply user application;
  :Stage-out result;
endwhile (empty)

stop

Below you find templates implementing this design pattern:

Bash streaming executable

R streaming executable

Python streaming executable

Process n inputs to generate one output

This design pattern processes all inputs to generate the result. There will one single processing task processing all the inputs.

The typical structure of such a run executable is:

!define DIAG_NAME Workflow example

!include includes/skins.iuml

skinparam backgroundColor #FFFFFF
skinparam componentStyle uml2

start

:Source libraries;

:Get parameter values;

while (check stdin?) is (line)
  :Stage-in data;
endwhile (empty)

:Apply user application;
:Stage-out result;

stop

Auxiliary nodes

Auxiliary nodes are needed when the output of a node cannot be directly processed by the subsequent nodes (e.g. parallel processing would not be possible).

These nodes usually process the data by reference (no stage-in) and combines or arranges these references and provides those references as outputs.

Typical examples are:

  • Group catalogue products by periods of time (e.g. produce daily aggregated products)
  • Couple RADAR SAR master/slave images in the interferometry domain