What is hive in Hadoop?

Thursday, November 24, 2011

Hive is an open source framework written in Java and a subcomponent of the Hadoop ecosystem, developed by Facebook on top of the Hadoop HDFS system.

I have already blogged about the basics of HDFS in Basics of HDFS in Hadoop. Hive can be used to access data (files in HDFS) stored in the Hadoop Distributed File System, or data stored in HBase.
MapReduce is the Java framework that processes this data in parallel.

Hive can be used to analyze large amounts of data on Hadoop without knowing Java MapReduce programming.

Hive provides the Hive Query Language (HiveQL), which is similar to Structured Query Language; it covers most queries with minimal ANSI SQL support.
If we want complex query features such as custom aggregations or custom functions, we have to write a custom MapReduce program that can be plugged into Hive.

Executing Hive Queries:-
Hive provides a command line interface, the Hive shell, for executing Hive queries. You can also write the queries in a shell script and call that script. These Hive queries are translated into MapReduce jobs, which process the data.

Hive Advantages:-
1. Hive is built on Hadoop, so it inherits the capabilities Hadoop provides: reliability, high availability, tolerance of node failure, and the use of commodity hardware.
2. Database developers need not learn Java programming to write MapReduce programs for retrieving data from the Hadoop system.

Hive Disadvantages:-
1. Hive is not suitable for OLTP (online transaction processing); it is designed for batch analytics.

This topic has been a very basic start to exploring what Hive is in Hadoop. Hopefully you have enough information to get started.

If you have any questions, please feel free to leave a comment and I will get back to you.

Tomcat is an open source application server built on the Java framework. Tomcat can be downloaded from the Apache site; extract the zip file to the D drive, for example D:\jakarta-tomcat-5.5.0.

First, make sure that you have installed and set up the Java/JDK software, since Tomcat requires a JDK installation. Add set JAVA_HOME="path where the JDK is installed" and set CATALINA_HOME=.. in both startup.bat and shutdown.bat, and set the environment variables like this:
JAVA_HOME=D:\JDK1.5 (i.e. where the JDK is installed)
CATALINA_HOME=D:\jakarta-tomcat-5.5.0

Starting and Stopping Tomcat:-
To start Tomcat, run CATALINA_HOME\bin\startup.bat. To stop Tomcat, run CATALINA_HOME\bin\shutdown.bat.

By default Tomcat is configured to run on port 8080, meaning Tomcat is listening at port 8080. If you want to change the port, you can change the configuration located in CATALINA_HOME\conf\server.xml.

Deploying web applications:-
Web applications are packaged in the form of a war file, so you can deploy to Tomcat using the console, or copy your war file directly to the webapps folder.

Please leave us a comment if you have any difficulty in setting up the Tomcat server.
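The port change mentioned above is made on the Connector element inside CATALINA_HOME\conf\server.xml. A minimal sketch of that element (the attribute values shown are typical defaults, not taken from your installation; adjust the port as needed):

```xml
<!-- In CATALINA_HOME\conf\server.xml: change port="8080" to the port you want -->
<Connector port="8080" maxThreads="150"
           connectionTimeout="20000" redirectPort="8443" />
```

After changing the port, restart Tomcat with shutdown.bat and startup.bat for the new value to take effect.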
Hadoop installation on windows.

Hadoop is an Apache framework used to process large amounts of data in parallel. For production use, Hadoop is mostly run on Linux-flavoured systems.

As developers, to explore more of Hadoop, we need a way to start on Windows.

As Hadoop is developed and executed on Linux flavours, we have a couple of options to set it up on Windows: either Cygwin or VM Player.

I am going to list the steps required for installing and configuring Hadoop on Windows using Cygwin.

What is Cygwin:-
Cygwin is a mock Linux environment for Windows-based systems. It provides a command line interface that processes Unix commands and translates them into calls to the Windows DLLs and APIs, so you need to know the most common Unix commands to use it.

It is free and open source software.

1. Download Cygwin from the Cygwin site and click on setup.exe. In the Select Packages step, type openssh in the search box, and install it along with the required dependencies.

Once Cygwin is installed in your system, make sure that it works.

2. Make sure the JAVA_HOME environment variable points to JDK 1.5 or JDK 1.6.

3. Unpack the Hadoop distribution archive. From inside the unpacked distribution, type the command below to verify the installation:

c:\> bin/hadoop

Now you are ready to start a Hadoop node.

Other tutorials that you may like:
Learn Basics of HDFS in Hadoop
Introduction to Hadoop
Difference between Distributed File System and Normal File System

Most of the time we encounter the situation of reading the first element of an ArrayList using the get(0) method. In some instances you want to read the last element of an ArrayList; you can use the following code snippet.
ArrayList list=new ArrayList();

To access the last element, we have to use list.get(list.size()-1). Here the size() method returns the size of the list, and size-1 is the last index of the list. Note:-

At runtime, if there are no elements in the list and you call list.get(list.size()-1), it throws IndexOutOfBoundsException. Always make sure you call this method only after a null check and a check that the list is not empty.

Here is the code snippet
String lastElement=(String) list.get(list.size()-1);
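The null and empty checks described above can be wrapped in a small helper so that callers never hit IndexOutOfBoundsException. A minimal sketch (the LastElementDemo class and lastOrNull helper names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class LastElementDemo {

    // Hypothetical helper: returns the last element, or null when the list
    // itself is null or empty, so no IndexOutOfBoundsException is thrown.
    static String lastOrNull(List<String> list) {
        if (list == null || list.isEmpty()) {
            return null;
        }
        return list.get(list.size() - 1);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("first");
        list.add("last");
        System.out.println(lastOrNull(list));                    // prints last
        System.out.println(lastOrNull(new ArrayList<String>())); // prints null
    }
}
```

Returning null for the empty case is only one option; throwing a checked exception or returning a default value would work the same way.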
In the Unix environment we have several frequently used concepts and commands.

1. What are relative path and absolute path?
Absolute path:- the path from the root directory to the current directory.
Relative path:- the path relative to the current directory.

2. What are SHELL or BASH variables?
SHELL or BASH variables are symbolic names to which values are assigned.
In software development, a design pattern is a reusable solution to problems that occur frequently in the design of a system. In OOP programming we have several design patterns; the following are the popular ones.
There are different categories of design patterns in object oriented programming.

1. Creational patterns:- These patterns describe object creation in the best possible ways in different contexts. Singleton is an example.
2. Structural design patterns:- These describe how classes and objects are composed into larger structures.

3. Behavioral design patterns:- These describe how objects communicate and divide responsibilities between them.

Advantages of design patterns:-

1. Can improve the design, and in some cases the performance, of the system.
2. Solve well-known bottleneck problems.
3. Make the best possible design for the system achievable.
4. Improve the code by using object oriented concepts like inheritance and encapsulation.

Disadvantage of the Design patterns:-

In my view, more code is introduced into the existing system for the sake of better design.
As design patterns aim for the best design, the system can become more complex to understand.

Please leave a comment if you see any other pros and cons of design patterns.


The Singleton design pattern is a design pattern for maintaining a single instance of an object in a system. Whenever an object is created with new Object(), one new instance is created; if we do this repeatedly, multiple instances accumulate in heap memory. Over time, as the calls to new grow, the number of objects in the heap grows, which can cause performance overhead. To avoid this, we create the object once and return the same object for multiple calls.

public class Singleton {

 // Initializing the static member variable as null
 private static Singleton single = null;

 // private constructor: no object can be created with the new operator
 // outside this class
 private Singleton() {
 }

 // This method always returns the same instance. It is synchronized so
 // that multiple threads cannot create separate instances at the same time.
 public static synchronized Singleton getInstance() {
  if (single == null) {
   single = new Singleton();
  }
  return single;
 }

 // Cloning is not supported and throws an exception if we try to clone
 // this object
 // @see java.lang.Object#clone()
 public Object clone() throws CloneNotSupportedException {
  throw new CloneNotSupportedException(
    "This is a singleton class, cloning is not supported");
 }

 public static void main(String args[]) {
  // Calling getInstance multiple times always returns the same instance
  System.out.println("Object=1 " + getInstance());
  System.out.println("Object=2 " + getInstance());
 }
}
This pattern maintains one instance of a Java object in the heap memory of the Java virtual machine instead of creating multiple instances. This improves performance because fewer objects are created in heap memory.

ps, abbreviated from process status, lists information about the active processes running on a Linux/Unix machine.

Command #1:- ps -ef lists all the processes in the system.

Command #2:- ps -ef | grep java. We can pipe the ps command into grep to list all the processes on the Linux/Unix machine whose name contains java.

Command #3:- kill processId kills the process with the given process id. We can use kill -9 pid to forcibly kill the process.

Command #4:- killall processName kills the processes with the given name.

Command #5:- kill 0 sends the signal to every process in the current process group, stopping all of them.
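The ps and grep combination above can also be driven from Java when a program needs to inspect running processes. A minimal sketch (the PsDemo class and countJavaProcesses method names are made up for illustration), assuming a Unix-like system where ps -ef is available:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class PsDemo {

    // Runs "ps -ef" and counts the output lines that mention "java",
    // mirroring "ps -ef | grep java" from the shell.
    public static int countJavaProcesses() throws Exception {
        Process p = new ProcessBuilder("ps", "-ef").start();
        BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()));
        int count = 0;
        String line;
        while ((line = r.readLine()) != null) {
            if (line.contains("java")) {
                count++;
            }
        }
        p.waitFor();
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("java processes: " + countJavaProcesses());
    }
}
```

For destroying a process started from Java, Process.destroy() plays the role that kill plays in the shell.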

A ConcurrentModificationException occurs when one thread is iterating a collection while another thread (or the same thread) modifies the collection. This can happen with both maps and lists:

1. Adding or removing a key or value in a Map implementation (for example HashMap, Hashtable, LinkedHashMap) while iterating over the map.
2. Adding or removing an object in a collection class, other than via Iterator.remove, while iterating over that collection at the same time.

You can use ConcurrentHashMap to avoid this exception, but there is no guarantee that all your objects are iterated. The solution for modifying values during iteration is to use map.entrySet():

HashMap<String, String> mapDemo = new HashMap<String, String>();
mapDemo.put("key-1", "value-1");
mapDemo.put("key-2", "value-2");
mapDemo.put("key-3", "value-3");
for (Map.Entry<String, String> entry : mapDemo.entrySet()) {
 if (entry.getKey().contains("key-2")) {
  entry.setValue("new Value-2");
 }
}
for (Map.Entry<String, String> entry : mapDemo.entrySet()) {
 System.out.println(entry.getKey() + "===" + entry.getValue());
}

and the output is:-
key-1===value-1
key-2===new Value-2
key-3===value-3
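For the list case, removing elements through the Iterator is the safe approach: calling list.remove(...) inside the loop would trigger the exception on the next call to next(). A minimal sketch (the IteratorRemoveDemo class name is made up for illustration):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class IteratorRemoveDemo {
    public static void main(String[] args) {
        List<String> names = new ArrayList<String>();
        names.add("alpha");
        names.add("beta");
        names.add("gamma");

        // Removing through the Iterator keeps the iteration valid;
        // names.remove(...) here would cause ConcurrentModificationException.
        Iterator<String> it = names.iterator();
        while (it.hasNext()) {
            if (it.next().startsWith("b")) {
                it.remove();
            }
        }
        System.out.println(names); // prints [alpha, gamma]
    }
}
```

The same Iterator.remove technique works for any Collection implementation whose iterator supports removal, not only ArrayList.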