In the context of your application development life-cycle, the Sandbox provides you with two filesystems (or directories): HOME and APPLICATION.
A user’s home directory is intended to contain the user’s files: text documents, pictures, videos, etc. It may also include configuration files holding the preferred settings of any software used there, tailored to the user’s liking: web browser bookmarks, favorite desktop wallpaper and themes, passwords to external services accessed through a given program, and so on. The user can install executable software in this directory, but it will only be available to users with permissions on the directory. The home directory can be organized further with the use of sub-directories.
As such, HOME is used to store the user’s files. It can be used to store source files (the compiled programs would then go to APPLICATION).
Note
At job or workflow execution time, the Sandbox uses a system user to execute the application. This system user cannot read files in HOME. When the application is run on a Production Environment (cluster mode), the HOME directory is no longer available on any of the computing nodes.
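Since HOME is not readable at execution time, any software the application needs must be installed under APPLICATION beforehand. A minimal sketch (the source location, the make step and the myapp binary are hypothetical):

   cd $HOME/src/myapp                          # hypothetical source location in HOME
   make                                        # build the hypothetical myapp binary
   cp myapp /application/job_template_1/bin/   # install it under APPLICATION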
The APPLICATION filesystem contains all the files required to run the application.
The APPLICATION filesystem is available on the Sandbox as /application.
Note
Whenever an application wrapper script needs to refer to the APPLICATION value (/application), use the variable $_CIOP_APPLICATION_PATH, for example:
export BEAM_HOME=$_CIOP_APPLICATION_PATH/common/beam-4.11
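The variable can then be used to build the rest of the wrapper script environment; a short sketch, where the PATH update is an illustrative addition and not part of the original example:

   export BEAM_HOME=$_CIOP_APPLICATION_PATH/common/beam-4.11
   export PATH=$BEAM_HOME/bin:$PATH   # illustrative: expose the tool's binaries on PATH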
The APPLICATION contains the Application Descriptor file (application.xml) and one or more job template folders.
See also
The Application Descriptor file is described in Application descriptor reference
A job template folder contains the streaming executable script. There is no defined naming convention for this script, although it is often called run with an extension reflecting its language (e.g. run.sh, as in the layout below).
Note
The streaming executable script reads its inputs via stdin, managed by the underlying Hadoop MapReduce streaming layer.
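As an illustration, a minimal streaming executable written in Bash could consume its inputs line by line from stdin; the processing step below is a hypothetical placeholder:

   #!/bin/bash
   # Inputs arrive one per line on stdin, delivered by the
   # Hadoop MapReduce streaming layer
   while read input
   do
     # hypothetical placeholder: replace with the actual processing step
     echo "processing $input" >&2
   done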
Note
There are no particular rules for the folders inside a job template folder.
The APPLICATION of a workflow with two jobs can then be represented as:

/application/
   application.xml
   /job_template_1
      run.sh
      /bin
      /etc
   /job_template_2
      run.sh
      /bin
      /lib
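Such a layout can be created on the Sandbox with ordinary shell commands; a sketch assuming the two job template names above:

   mkdir -p /application/job_template_1/{bin,etc}
   mkdir -p /application/job_template_2/{bin,lib}
   touch /application/application.xml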
The DAG helps you sequence your Application workflow with simple rules. For the Hadoop Map/Reduce programming framework, a workflow is subject to ordering constraints: certain tasks must be performed before others.
The application nodes of the DAG can be Mappers, Reducers or (starting from ciop v1.2) Map/Reduce Hadoop jobs.
The Developer Cloud Sandbox environment builds on a “shared-nothing” architecture that partitions and distributes each large dataset across the disks attached directly to the worker nodes of the cluster. Hadoop splits (distributes) the standard input of a Job among the tasks created on the cluster. A task is created from a Job template. The input split depends on the number of available task slots, which in turn depends on the cluster dimension (the number of worker nodes).
In the Developer Cloud Sandbox environment (pseudo-cluster mode), the cluster dimension is 1 and the number of available task slots is 2 (running on a 2-core CPU).
In the IaaS Production environment (cluster mode), the cluster dimension is n (the servers provisioned on the cluster) and the number of available task slots is n x m (m being the number of CPU cores of the provisioned server type); for example, 4 provisioned servers with 4-core CPUs yield 16 task slots.
The application descriptor file contains the definition of the application, and is composed of two sections: the job templates section and the workflow section.
The application descriptor is an XML file managed on the Sandbox APPLICATION filesystem, and is located at $_CIOP_APPLICATION_PATH/application.xml (the value of $_CIOP_APPLICATION_PATH is “/application”).
See also
The Application Descriptor file structure is documented in Application descriptor reference
Tip
Check that your application descriptor file is well formed with the ciop-appcheck utility
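As a generic complement to ciop-appcheck, the standard xmllint tool (from libxml2) can also verify that the file is well-formed XML:

   xmllint --noout $_CIOP_APPLICATION_PATH/application.xml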