Data upload¶
For correct data publication, it is essential that the relevant files belonging to the output of an application run are present in a repository on the platform. This can happen either automatically after processing or via a manual upload.
Automatic upload¶
When an application runs on a production centre, its results are usually uploaded to a repository directory with a predictable URL. This URL is composed as follows:
<repository-base-url>/<repository-name>/_results/workflows/<app_name>/run/<process-id>/
The parts of the URL contain the following information:

<repository-base-url>
: The repository’s base URL (usually https://store.terradue.com)

<repository-name>
: It is usually the platform username assigned to you, your group or your project, and also the name of the related catalog index.

<app_name>
: The identifier (including the version number) of the WPS process offering providing the interface to run the application on the production centre.

<process-id>
: The production centre (Oozie) ID of the job that produced the output.
For example:
https://store.terradue.com/myrepo/_results/workflows/my_app_identifier_0_7/run/2c78c91c-39ba-11e9-9804-0242ac110004/
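The composition of the result URL can be sketched in Python, using the example values from this section (all values are illustrative placeholders):

```python
# Sketch: composing the automatic result URL from its parts.
# The variable values are example placeholders from this section.
repository_base_url = "https://store.terradue.com"
repository_name = "myrepo"
app_name = "my_app_identifier_0_7"
process_id = "2c78c91c-39ba-11e9-9804-0242ac110004"

result_url = "{0}/{1}/_results/workflows/{2}/run/{3}/".format(
    repository_base_url, repository_name, app_name, process_id)

print(result_url)
```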
Manual upload¶
Manual upload of data allows you to upload outputs generated outside the platform and to specify more user-friendly locations.
To upload data on a repository, you need the following:
- The repository base URL (usually https://store.terradue.com),
- A repository name (which is usually the platform username assigned to you, or a specific one assigned to your group or your project),
- A platform username for an account with write permission on the above repository (this can be the repository’s own account, but it might be a personal user account), and
- The API key for that account.
For more details on the Terradue storage API that is used below, see the section Storage.
Data Preparation¶
Before uploading your data, it is important to “prepare and organise” them so that they can be published and visualised correctly. This is especially important if the recast service is used.
This includes proper naming of files, i.e. making sure that files that logically belong together share the same base name. A single result entry may have different facets, reflected by groups of files that form datasets analysed separately by the recast service.
For example, the following files …
Flood_Map_20170405_max_extent.shp
Flood_Map_20170405_max_extent.shx
Flood_Map_20170405_max_extent.prj
Flood_Map_20170405_max_extent.dbf
Flood_Map_20170405_max_extent.png
Flood_Map_20170405_max_extent.pngw
Flood_Map_20170405_max_extent.properties
Flood_Map_20170405_freq.tif
Flood_Map_20170405_freq.rgb.tif
Flood_Map_20170405_freq.tif.legend.png
Flood_Map_20170405_freq.png
Flood_Map_20170405_freq.pngw
Flood_Map_20170405_freq.xml
logs.zip
… would be organised in these groups, using the base name as an indication:
Flood_Map_20170405_max_extent
├── Shapefile (zip)
├── PNG
├── PNGW
├── Properties
└── Legends (PNG)
Flood_Map_20170405_freq
├── GeoTiff
├── PNG
├── PNGW
├── XML
└── Legends (PNG)
logs.zip
As will become clear in the following step (Metadata generation), it is useful to provide as much accompanying metadata as possible for the recast process. The easiest way of doing that is via a .properties file (which can also be seen in the example above). The .properties file is a simple text file containing key/value pairs with basic metadata information. A .properties file is formatted like this, with key and value separated by = and one field per line:
key1=value1
key2=value2
...
The following property keys are recognised (list not exhaustive):
| Key | Usage/remarks |
|---|---|
| title | Defines the title visible in the list of results. |
| date | Sets start date/time and end date/time values (format: startdate/enddate). A single date sets an instant time. The format is ISO-8601, i.e. YYYY-MM-DDThh:mm:ss.ffffZ. |
| bbox | Geo-references the dataset using the format minX,minY,maxX,maxY. |
| envelope | Geo-references the dataset using the format Env[minX : maxX, minY : maxY]. |
| geometry | Geo-references the dataset using WKT format. |
| image_url | Sets a URL to a thumbnail image. |
| copyright | Adds credits in the summary table. |
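Putting the recognised keys together, a .properties file for the flood-map example above might look like this (all values are illustrative, not taken from a real product):

```
title=Flood map of 2017-04-05 (maximum extent)
date=2017-04-05T00:00:00.0000Z/2017-04-05T23:59:59.0000Z
bbox=8.1,44.5,8.9,45.2
image_url=https://store.terradue.com/myrepo/.../Flood_Map_20170405_max_extent.png
copyright=My Project
```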
Upload URL¶
The URL on the repository to which the output file is uploaded has to follow certain rules and should conform to the following pattern:
<repository-base-url>/<repository-name>/<org-path>/files/<version>/<product-path>
The first two parts are explained above. The other parts contain the following information:

<org-path>
: A path structure reflecting the product type and the date or period the product covers. It should be composed like this: <product-type>/<yyyy>/<mm>/<dd>. The <product-type> part can be subdivided into further levels. The <yyyy>/<mm>/<dd> part for the product date does not have to be fully respected for products that regularly cover more than one day. In that case, the <dd> portion can be omitted; otherwise, for products spanning several periods, the one referring to the first date should be used.

<version>
: A string indicating the repository version. It usually has the value v1.

<product-path>
: The relative path to the product. The way this is organised for each product type can be decided based on the specifics of the products. Make sure, however, that it is consistent for products of the same type. Files belonging to one product should reside in a single directory, separated from other products’ files.
Only the first part of the URL (<repository-base-url>/<repository-name>/) is mandatory (e.g. http://store.terradue.com/eo-samples/), but it is recommended to follow the indicated rules as closely as possible.
A valid destination URL for a single file of a product could look like this:
http://store.terradue.com/eo-samples/data-publication-sample/data-publication-sample.tif
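The pattern above can be sketched in Python; the product type, date and file names below are illustrative placeholders, not real repository content:

```python
# Sketch: composing an upload URL according to the pattern
# <repository-base-url>/<repository-name>/<org-path>/files/<version>/<product-path>
# All values are illustrative placeholders.
repository_base_url = "https://store.terradue.com"
repository_name = "eo-samples"
org_path = "flood-maps/2017/04/05"  # <product-type>/<yyyy>/<mm>/<dd>
version = "v1"
product_path = "Flood_Map_20170405/Flood_Map_20170405_freq.tif"

upload_url = "/".join([repository_base_url, repository_name,
                       org_path, "files", version, product_path])
print(upload_url)
```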
File upload via Python¶
The following Python snippet shows how the file upload can be done in Python.
import os
import requests
# Set variables
local_file = '...' # path to file to be uploaded in the local file system
store_url = '...' # destination URL for file on the Terradue storage (see above)
content_type = '...' # content type (MIME type) of file
username = '...' # Ellip username (see above)
api_key = '...' # corresponding API key
# Open file to send
content = open(local_file, 'rb').read()
# Send file (using the HTTP PUT method)
res = requests.put(url="{0}/{1}".format(store_url, os.path.basename(local_file)),
headers={"Content-Type": content_type},
auth=(username, api_key),
data=content
)
The following Jupyter notebook demonstrates in more detail how to use Python code to upload files to a repository on the Terradue storage:
Alternative: Manual upload via shell commands¶
The files can also be uploaded with the widely used curl command, as shown in the following code snippet (the variables set before the call are for clarity and readability):
username=... # Ellip username (see above)
api_key=... # corresponding API key
file=... # path to file to be uploaded in the local file system
url=... # destination URL for file on the Terradue storage (see above)
curl -u "$username:$api_key" -X PUT -T "$file" "$url"
The server response will look similar to this:
{
"uri": "https://store.terradue.com/myproducttype/2019/02/27/files/v1/high-res/mp_20190227_102238_ABC/mp_20190227_102238_ABC.tif",
"downloadUri": "https://store.terradue.com/myproducttype/2019/02/27/files/v1/high-res/mp_20190227_102238_ABC/mp_20190227_102238_ABC.tif",
"repo": "myrepo",
"path": "myproducttype/2019/02/27/files/v1/high-res/mp_20190227_102238_ABC/mp_20190227_102238_ABC.tif",
"created": "2019-03-01T10:55:20.034Z",
"createdBy": "user",
"size": "7654321",
"mimeType": "application/octet",
"checksums": {
"md5" : "23285d1984632347b6ad60800a26531b",
"sha1" : "1add322bfc564ef570e334517d60f69d941607d2"
},
"originalChecksums": {
"md5" : "23285d1984632347b6ad60800a26531b",
"sha1" : "1add322bfc564ef570e334517d60f69d941607d2"
}
}
The correct link for further reference to the file is the one provided in the downloadUri field of the response.
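Extracting that field can be sketched as follows; the JSON body below is an abbreviated, illustrative response, not a real server reply:

```python
import json

# Sketch: picking the reference link out of the upload response.
# The response body is an abbreviated, illustrative example.
response_text = '''
{
  "uri": "https://store.terradue.com/myrepo/sample/file.tif",
  "downloadUri": "https://store.terradue.com/myrepo/sample/file.tif",
  "repo": "myrepo"
}
'''
download_uri = json.loads(response_text)["downloadUri"]
print(download_uri)
```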
The upload has to be repeated for every single file that belongs to an output. Make sure all files that are logically part of the same output product are in the same folder and that there are separate folders for each product.