.. _publication-metadata-ingestion:

Metadata ingestion
==================

If the data recasting has been performed (see :ref:`publication-recast`), all extracted metadata is available via the OpenSearch description document (OSDD) URL, which is the output of the recast process. We do not have to make any query ourselves, but can instead use that URL as input for a publication web service in order to publish the result metadata on a catalog index, for example our personal index or a public index that allows others to find the new entry.

Automatic ingestion
-------------------

Similarly to the recasting (*analyzeResults*) web service, there is a web service named *dataPublication* for the ingestion/publication on the Terradue catalog:

https://recast.terradue.com/t2api/ows?service=WPS&request=DescribeProcess&version=1.0.0&identifier=dataPublication&status=true

It requires the following input parameters:

* ``items``: The OpenSearch description URL of a catalog entry (usually the result of the recast process, see :ref:`publication-recast`).

  - e.g. *https://recast.terradue.com/t2api/describe/eo-samples/data-publication-sample*

* ``index``: The destination index on the catalog where the entry will be published.

  - e.g. your personal index

* ``category``: Used for the name of the product (optional); if absent, a UUID is generated.

  - e.g. *data-publication-sample*

* ``_T2Username``: The name of the Terradue platform user performing the publication operation (requires write access to the destination index).

* ``_T2ApiKey``: The corresponding Terradue platform API key. You can get it from your account profile on terradue.com.

The web service takes an OpenSearch description URL as input (this is the output of the *analyzeResults* web service) and publishes it on the desired index on the Terradue catalog.

As with the recast web service (*analyzeResults*), using this web service with Python is very simple; the code snippet below shows the basic usage:

.. code-block:: python

   from owslib.wps import WebProcessingService, monitorExecution

   # Set WPS server information
   wps_url = "https://recast.terradue.com/t2api/ows"
   wps_process_id = "dataPublication"

   # Define parameters
   input_osd = '...'   # Input OpenSearch description URL (see above)
   index_name = '...'  # destination index
   category = '...'    # result identifier/category
   username = '...'    # Ellip username (see above)
   api_key = '...'     # corresponding API key

   wps_inputs = [
       ('items', input_osd),
       ('index', index_name),
       ('category', category),
       ('_T2ApiKey', api_key),
       ('_T2Username', username)
   ]

   # Get WPS instance
   wps = WebProcessingService(wps_url)

   # Invoke publication process
   wps_execution = wps.execute(wps_process_id,
                               wps_inputs,
                               output=[('result_osd', True)]
                               )

   # Poll job status until job finishes (takes a few seconds)
   monitorExecution(wps_execution, sleepSecs=10)

The following Jupyter notebook demonstrates in more detail how to use Python code to perform the ingestion of a metadata entry previously generated via recast: `Publish metadata on the Terradue catalog `_ (step 3).

Manual ingestion
----------------

Alternatively, a metadata file previously generated locally can be fed into the appropriate index on the Terradue catalog. For more details on the Terradue catalog API used below, see the section :ref:`catalog-api`.

All this can be achieved using Python code, as in the following snippet:

.. code-block:: python

   import requests

   # Set variables
   atom_file = '...'   # path to file containing the ATOM feed to be sent to the catalog
   index_name = '...'  # destination index
   username = '...'    # Ellip username (see above)
   api_key = '...'     # corresponding API key

   # Open file to read the content to be sent
   content = open(atom_file, 'rb').read()

   # Send file (using the HTTP POST method)
   request = requests.post("https://catalog.terradue.com/{0}".format(index_name),
                           headers={"Content-Type": "application/atom+xml", "Accept": "application/xml"},
                           auth=(username, api_key),
                           data=content
                           )

The entire process, from the creation of the metadata to its ingestion, can be found in this Jupyter notebook: `Generate an ATOM entry for the catalog and perform its ingestion `_.

It is also possible to use basic shell commands to feed an entry (created entirely manually, as described :ref:`here `) into the Terradue catalog, using commands like the following:

.. code-block:: bash

   username=...    # Ellip username
   api_key=...     # corresponding API key
   atom_file=...   # path to file containing the ATOM feed to be sent to the catalog
   index_name=...  # catalog destination index

   curl -u "$username:$api_key" -X POST -H "Content-Type: application/atom+xml" -T "$atom_file" "https://catalog.terradue.com/${index_name}"
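The manual route requires an ATOM file to send; the snippet below sketches how such a payload could be assembled in memory with the standard library instead of reading it from disk. This is only a minimal illustration using generic Atom elements (``feed``, ``entry``, ``title``, ``id``, ``summary``); the element names and values here are illustrative assumptions, and the Terradue catalog may require additional, platform-specific elements as described in :ref:`catalog-api`.

.. code-block:: python

   import xml.etree.ElementTree as ET

   ATOM_NS = 'http://www.w3.org/2005/Atom'
   ET.register_namespace('', ATOM_NS)

   # Build a feed with a single entry, using generic Atom elements only
   feed = ET.Element('{%s}feed' % ATOM_NS)
   entry = ET.SubElement(feed, '{%s}entry' % ATOM_NS)
   ET.SubElement(entry, '{%s}title' % ATOM_NS).text = 'data-publication-sample'
   ET.SubElement(entry, '{%s}id' % ATOM_NS).text = 'data-publication-sample'
   ET.SubElement(entry, '{%s}summary' % ATOM_NS).text = 'Sample entry for manual ingestion'

   # Serialised bytes, ready to be used as the request body
   atom_bytes = ET.tostring(feed, encoding='utf-8')

The resulting ``atom_bytes`` could then be passed as the ``data`` argument of the ``requests.post`` call shown above, in place of the file content.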
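For the automatic ingestion described earlier, the ``category`` parameter is optional: when it is absent the service generates a UUID. A small helper can therefore build the WPS input list and omit ``category`` when it is not set. ``build_publication_inputs`` is a hypothetical convenience function, not part of the Terradue API; only the parameter names come from the *dataPublication* process description above.

.. code-block:: python

   # Hypothetical helper (not part of the Terradue API): assemble the
   # dataPublication input list, leaving out the optional 'category'
   # parameter so the service falls back to a generated UUID.
   def build_publication_inputs(input_osd, index_name, username, api_key, category=None):
       inputs = [
           ('items', input_osd),
           ('index', index_name),
           ('_T2ApiKey', api_key),
           ('_T2Username', username),
       ]
       if category is not None:
           inputs.append(('category', category))
       return inputs

   # With no category given, only the four mandatory parameters are passed
   wps_inputs = build_publication_inputs(
       'https://recast.terradue.com/t2api/describe/eo-samples/data-publication-sample',
       'my-index', 'username', 'api-key')

The returned list can be passed to ``wps.execute`` exactly like the hand-written ``wps_inputs`` in the snippet above.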