Apache Whirr basic tutorial explained
Apache Whirr is an open-source Java library for creating and setting up Hadoop clusters on different cloud services.
It also provides command-line tools to launch Hadoop services.
Whirr uses the jclouds API as an intermediate layer to interact with different cloud providers.
Apache Whirr provides the following advantages:
- No need to write separate scripts for each cloud provider to deploy and run cloud services
- A common API to interact with different cloud providers for provisioning
- Hadoop clusters can be installed, configured, and deployed within minutes
If you look at the recipes folder of the Whirr software package, you can see that the following cloud providers and services are supported.
Whirr supported cloud providers:
- Amazon cloud: Hadoop can be set up very easily on Amazon EC2 instances. Clusters can be launched dynamically and destroyed when they are no longer required
- Rackspace cloud
- OpenStack cloud
Whirr supported services:
How to install Whirr on a local instance
Java is required to set up and install Whirr on any instance.
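Before going further, it can help to confirm that Java is actually available. The following is a minimal sketch, assuming a POSIX-style shell; the `JAVA_STATUS` variable is only an illustrative name and is not something Whirr reads.

```shell
# Check whether a `java` executable can be found on the PATH.
if command -v java >/dev/null 2>&1; then
  # `java -version` prints to stderr, so redirect it to stdout.
  java -version 2>&1 || true
  JAVA_STATUS="found"
else
  echo "Java was not found on PATH; install a JDK before continuing." >&2
  JAVA_STATUS="missing"
fi
echo "Java status: $JAVA_STATUS"
```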
First, download Whirr from an Apache mirror site, then extract the Whirr tarball:
$ tar -xzvf whirr-0.8.0.tar.gz
$ cd whirr-0.8.0
Add the bin directory of the extracted Whirr package to the PATH environment variable.
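As a rough sketch of the PATH step, assuming the tarball was extracted into the home directory (the `WHIRR_HOME` variable name and its location are assumptions for illustration, so adjust them to your own setup):

```shell
# Hypothetical install location; change this to wherever the tarball was extracted.
WHIRR_HOME="$HOME/whirr-0.8.0"
# Append Whirr's bin directory so the `whirr` command resolves from any directory.
export PATH="$PATH:$WHIRR_HOME/bin"
```

To keep the setting across sessions, the same two lines could be added to a shell startup file such as `~/.bashrc`.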
To test whether Whirr is working:
$ whirr version
Apache Whirr 0.8.0
The above command displays the version of the installed Whirr package.
To configure a cloud provider, users have to write a whirr.properties file that contains the roles and cluster information:
whirr.cluster-name=name of the cluster
# the instance templates list the different roles and services
whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,1 hadoop-datanode+hadoop-tasktracker
whirr.provider=name of the cloud provider
whirr.identity=access key ID of the cloud provider
whirr.credential=secret access key of the cloud provider
whirr.private-key-file=private key file used to log in to the cloud instances
whirr.public-key-file=public key file matching the private key
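To make the placeholders above more concrete, here is a hedged sketch that writes a sample whirr.properties for Amazon EC2. The cluster name `myhadoopcluster`, the use of the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, and the `~/.ssh/id_rsa` key location are assumptions chosen for illustration, modeled on the style of the recipes shipped with Whirr; replace them with your own values.

```shell
# Write a sample whirr.properties into the current directory.
# The quoted 'EOF' keeps ${...} literal so that Whirr, not the shell, resolves it.
cat > whirr.properties <<'EOF'
whirr.cluster-name=myhadoopcluster
whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,1 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${whirr.private-key-file}.pub
EOF
```

With a file like this in place, a cluster would typically be started with `whirr launch-cluster --config whirr.properties` and removed again with `whirr destroy-cluster --config whirr.properties` once it is no longer needed.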
That’s all I understand about Apache Whirr so far. Please comment below if you have any questions.