General Notes

The Sandbox, short for Terradue’s Developer Cloud Sandbox, is a Virtual Machine (VM) running the CentOS 6.x Linux distribution. This VM provides the complete set of CIOP framework tools (the ‘ciop’ command-line tools). It also has CDH (Cloudera’s Distribution of Apache Hadoop) installed in Pseudo-Distributed mode [1].

Clone the Hands-On repository

To have the code available locally on your Sandbox, clone the Hands-On git repository from Terradue’s GitHub organization:

  • Log on to your Sandbox (see Connect to your Sandbox)
  • Type:
cd
git clone https://github.com/Terradue/dcs-hands-on.git
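If you prefer to script this step, for example from python, the following is a minimal sketch that runs the same git clone only when the target directory is not already a git working copy. The helper names (needs_clone, clone_hands_on) and the default destination ~/dcs-hands-on (where git clone puts the repository when run from your home directory after cd) are illustrative assumptions, not part of the Hands-On code.

```python
import subprocess
from pathlib import Path

# Repository URL as given in the instructions above.
REPO_URL = "https://github.com/Terradue/dcs-hands-on.git"


def needs_clone(dest):
    """True unless `dest` already contains a git working copy (a .git directory)."""
    return not (Path(dest) / ".git").is_dir()


def clone_hands_on(dest=None):
    """Clone the Hands-On repository into `dest` (hypothetical default: ~/dcs-hands-on).

    Returns True if git clone was run, False if the copy was already there.
    """
    dest = Path(dest) if dest is not None else Path.home() / "dcs-hands-on"
    if not needs_clone(dest):
        return False
    # check=True raises CalledProcessError if git exits with a non-zero status.
    subprocess.run(["git", "clone", REPO_URL, str(dest)], check=True)
    return True


if __name__ == "__main__":
    print("cloned" if clone_hands_on() else "already cloned")
```

Running it twice is harmless: the second run finds the .git directory and skips the clone.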

Congrats

You now have the resources needed to complete all the Hands-On exercises!

Programming languages

The Hands-On exercises are implemented in two languages, bash and python, and you can choose which one to use. The instructions refer to bash by default; if you choose python instead, the behaviour of the exercises does not change.

The dollar sign ($) at the beginning of each line indicates the Linux shell prompt. The actual prompt will include additional information (e.g. [user@sb-10-15-10-10.terradue.int]$ ) but it is omitted from these instructions for brevity.

  • For example, to select bash for Hands-On 1, type:
mvn clean install -D hands.on=1 -P bash
  • To select python for the same Hands-On, type:
mvn clean install -D hands.on=1 -P python
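The two commands above differ only in the Hands-On number passed with -D hands.on and in the Maven profile passed with -P. A minimal sketch of how that command line could be assembled and run from python follows; the helper names are hypothetical, and running the build from the cloned dcs-hands-on directory is an assumption, since this page does not state the working directory.

```python
import subprocess

# The two profiles described above.
LANGUAGES = ("bash", "python")


def build_command(hands_on, language="bash"):
    """Return the mvn command line that selects Hands-On `hands_on` in `language`."""
    if language not in LANGUAGES:
        raise ValueError(f"language must be one of {LANGUAGES}, got {language!r}")
    return ["mvn", "clean", "install", "-D", f"hands.on={int(hands_on)}", "-P", language]


def run_hands_on(hands_on, language="bash", repo_dir="."):
    """Run the build from `repo_dir` (assumed: the cloned dcs-hands-on directory)."""
    subprocess.run(build_command(hands_on, language), cwd=repo_dir, check=True)


if __name__ == "__main__":
    print(" ".join(build_command(1, "python")))
    # → mvn clean install -D hands.on=1 -P python
```

Validating the language up front gives a clear error instead of a Maven failure about an unknown profile.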

Prerequisites when using python

If you want to use python as the programming language for the Hands-On exercises, you have to install Miniconda (the minimal Anaconda distribution) and the cioppy package. To do that, type:

sudo yum install -y miniconda
sudo conda install -y cioppy
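To confirm that the installation worked, you can check whether cioppy is importable. This is a minimal sketch using only the standard library; it assumes you run it with the python interpreter provided by the Miniconda installation (its location on the Sandbox is not stated here), and the helper name module_available is hypothetical.

```python
import importlib.util


def module_available(name):
    """True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("cioppy"):
        print("cioppy is installed")
    else:
        print("cioppy is missing: run sudo conda install -y cioppy")
```

find_spec only locates the module without importing it, so the check has no side effects.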

Using the code examples

  • In some command-line steps in the exercises, you will see lines like this:
ciop-run my_node
  • Unless otherwise specified, all the commands in these Hands-On exercises refer to the $_CIOP_APPLICATION_PATH directory. To print its value, type:
echo $_CIOP_APPLICATION_PATH
/application
  • Some Hands-On exercises refer to the $HOSTNAME variable. To obtain its value, type:
echo $HOSTNAME

The output will be similar to:

sb-xx-xx-xx-xx.lab.terradue.int
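In the python version of the exercises, the same two values can be read from the process environment. The sketch below is an assumption-laden illustration: falling back to /application when $_CIOP_APPLICATION_PATH is unset simply reuses the example output above, falling back to socket.gethostname() covers shells that do not export $HOSTNAME, and the helper names are hypothetical.

```python
import os
import socket


def application_path(env=None):
    """Return the CIOP application directory ($_CIOP_APPLICATION_PATH).

    The /application fallback is an assumption taken from the example output.
    """
    env = os.environ if env is None else env
    return env.get("_CIOP_APPLICATION_PATH", "/application")


def application_file(*parts, env=None):
    """Build a path relative to the application directory, e.g. a node folder."""
    return os.path.join(application_path(env), *parts)


def sandbox_hostname(env=None):
    """Return $HOSTNAME if it is set, otherwise ask the OS for the host name."""
    env = os.environ if env is None else env
    return env.get("HOSTNAME") or socket.gethostname()


if __name__ == "__main__":
    print(application_path())
    print(application_file("my_node"))
    print(sandbox_hostname())
```

Passing an explicit env dictionary makes the helpers easy to test without changing the real environment.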

Install additional software

  • You can install any software you need with the yum command, since the sandbox user (usually your username) has sudo privileges for yum:
sudo yum install <package name>
  • You can also install any Python packages you need with the conda command, since the sandbox user (usually your username) has sudo privileges for conda:
sudo conda install <package name>
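Before reaching for yum, it can help to check which command-line tools are already on your $PATH. This is a minimal sketch using shutil.which; the list of tools is only an example, and note that a command name is not always the same as the name of the yum package that provides it.

```python
import shutil


def missing_commands(commands):
    """Return the commands from `commands` that are not found on $PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    # Example list of tools used in these Hands-On; adjust it to your needs.
    for cmd in missing_commands(["git", "mvn", "ciop-run", "conda"]):
        print(f"{cmd} not found: install the package that provides it")
```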

Note

As the exercises progress and you become more familiar with the CIOP framework, Hadoop and MapReduce, we provide fewer step-by-step instructions. Feel free to ask us for explanations or to raise any doubts on our Support Site https://support.terradue.com. We’ll be happy to help!

[1] Pseudo-distributed mode is a way of running Hadoop in which all Hadoop daemons run on the same machine, i.e. a cluster consisting of a single machine. It works just like a larger cluster; the only key difference (apart from speed, of course!) is that the block replication factor is set to 1 (in a multi-node Hadoop cluster, HDFS blocks normally have a replication factor of 3).
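To make the replication difference concrete, the arithmetic below compares the raw disk space and the number of block copies HDFS keeps for the same file with replication factor 1 (the Sandbox) and 3 (a typical cluster). The 300 MB file and 128 MB block size are hypothetical values chosen only for illustration.

```python
def raw_storage(file_size, replication):
    """Total HDFS disk space used by a file: every block is stored `replication` times."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return file_size * replication


def block_replicas(file_size, block_size, replication):
    """Number of block copies HDFS keeps for a file (sizes in the same unit)."""
    blocks = -(-file_size // block_size)  # ceiling division: a partial last block still counts
    return blocks * replication


# Hypothetical 300 MB file stored with 128 MB blocks:
print(raw_storage(300, 1), block_replicas(300, 128, 1))  # → 300 3 (pseudo-distributed Sandbox)
print(raw_storage(300, 3), block_replicas(300, 128, 3))  # → 900 9 (typical multi-node cluster)
```

The last block holds only the remaining 44 MB, which is why raw storage scales with the file size rather than with the number of blocks.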