The Sandbox, short for Terradue’s Developer Cloud Sandbox, is a Virtual Machine (VM) running the CentOS 6.x Linux distribution. This VM ships with the complete set of tools of the CIOP framework (the ‘ciop’ command line tools). It also has CDH (Cloudera’s Distribution of Apache Hadoop) installed in pseudo-distributed mode [1].
To have the code available locally on your Sandbox, clone the Hands-On git repository from Terradue’s GitHub organization:
cd
git clone https://github.com/Terradue/dcs-hands-on.git
Congrats
You now have the resources needed to complete all the Hands-On exercises!
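To double-check that the clone succeeded, a small shell function can look for the repository metadata. This is only a minimal sketch: the function name has_git_checkout is hypothetical (it is not part of the CIOP tools), and the path assumes the repository was cloned from your home directory as in the commands above.

```shell
# Hypothetical helper (not part of the CIOP tools): succeeds when the given
# directory contains a .git entry, i.e. looks like a git checkout.
has_git_checkout() {
  [ -e "$1/.git" ]
}

# Assumption: the repository was cloned from the home directory.
if has_git_checkout "$HOME/dcs-hands-on"; then
  echo "dcs-hands-on is available"
else
  echo "dcs-hands-on not found, run the git clone command again" >&2
fi
```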
The Hands-On exercises are implemented in two languages: bash and python. The exercises use bash by default, but you can choose python instead, and the behaviour will not change. The language is selected through the Maven profile (-P) given when installing an exercise:
mvn clean install -D hands.on=1 -P bash
mvn clean install -D hands.on=1 -P python
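The two commands differ only in the Maven profile passed with -P. As a sketch, that choice can be wrapped in a small function that validates the language before printing the command to run. The function name hands_on_command is hypothetical, and treating the hands.on value as the exercise number is an assumption, since only hands.on=1 appears here.

```shell
# Hypothetical helper: prints the Maven command for an exercise and a language.
# Assumption: hands.on takes the exercise number (only hands.on=1 is documented).
hands_on_command() {
  exercise="$1"
  language="${2:-bash}"   # bash is the default language of the exercises
  case "$language" in
    bash|python)
      echo "mvn clean install -D hands.on=$exercise -P $language"
      ;;
    *)
      echo "unsupported language: $language (use bash or python)" >&2
      return 1
      ;;
  esac
}

# Print the command for the first exercise with the python profile.
hands_on_command 1 python
```

Printing the command instead of running it keeps the sketch safe to try outside the repository; once the output looks right, it can be copied to the shell.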
To use python as the programming language for the Hands-On exercises, you first have to install the Anaconda distribution (here through the miniconda package) and the cioppy package:
sudo yum install -y miniconda
sudo conda install -y cioppy
ciop-run my_node
The dollar sign ($) at the beginning of each line indicates the Linux shell prompt. The actual prompt will include additional information (e.g. [user@sb-10-15-10-10.terradue.int]$ ) but it is omitted from these instructions for brevity.
echo $_CIOP_APPLICATION_PATH
/application
echo $HOSTNAME
The output will be similar to:
sb-xx-xx-xx-xx.lab.terradue.int
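A script that should only run on a Sandbox could test the hostname against this naming scheme. The sketch below is an assumption based on the two example names in this page ("sb-" followed by four dash-separated numbers and a terradue.int domain); the function name is_sandbox_hostname is hypothetical.

```shell
# Hypothetical helper: succeeds when the name looks like a Sandbox hostname,
# i.e. "sb-" followed by four dash-separated numbers and a terradue.int domain.
is_sandbox_hostname() {
  printf '%s\n' "$1" | grep -Eq '^sb(-[0-9]+){4}(\.[a-z0-9-]+)*\.terradue\.int$'
}

# Fall back to the hostname command when HOSTNAME is not set by the shell.
name="${HOSTNAME:-$(hostname 2>/dev/null || echo unknown)}"
if is_sandbox_hostname "$name"; then
  echo "running on Sandbox $name"
else
  echo "$name does not look like a Sandbox hostname"
fi
```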
sudo yum install <package name>
sudo conda install <package name>
Note
As the exercises progress and you gain more familiarity with the CIOP framework, Hadoop and MapReduce, we provide fewer step-by-step instructions. Feel free to ask us for explanations or to raise any doubts on our Support Site, https://support.terradue.com. We’ll be happy to help!
[1] Pseudo-distributed mode is a method of running Hadoop whereby all Hadoop daemons run on the same machine. It is a cluster consisting of a single machine. It works just like a larger cluster; the only key difference (apart from the speed, of course!) is that the block replication factor is set to 1 (in a regular Hadoop cluster, blocks on HDFS normally have a replication factor of 3).
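To see the replication factor in effect, Hadoop's hdfs getconf -confKey command can read the dfs.replication property. The sketch below wraps the interpretation in a hypothetical helper, describe_replication, and only calls hdfs when the client is found on the PATH.

```shell
# Hypothetical helper: interprets a dfs.replication value.
describe_replication() {
  case "$1" in
    1) echo "replication factor 1: consistent with pseudo-distributed mode" ;;
    ''|*[!0-9]*) echo "unexpected replication value: '$1'" >&2; return 1 ;;
    *) echo "replication factor $1: each block is stored $1 times" ;;
  esac
}

# Only query Hadoop when the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  describe_replication "$(hdfs getconf -confKey dfs.replication)"
fi
```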