What is Hive in Hadoop?

Hive is an open source framework written in Java and one of the subcomponents of the Hadoop ecosystem. It was originally developed at Facebook and runs on top of Hadoop's HDFS.

I have already blogged about HDFS in Basics of HDFS in Hadoop. Hive can be used to access data stored as files in the Hadoop Distributed File System (HDFS) or data stored in HBase.

MapReduce is a Java framework for processing data in parallel.
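To make the map and reduce steps concrete, here is a minimal single-process sketch of the MapReduce model in plain Java that counts words. It is not the Hadoop MapReduce API (a real job would use Hadoop's Mapper and Reducer classes and run across a cluster), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process sketch of the MapReduce model, NOT the Hadoop API:
// map emits (word, 1) pairs, shuffle groups them by key, reduce sums each group.
final class WordCountSketch {

    // Map phase: split one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) { // skip the empty token produced by leading whitespace
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: collect every emitted value under its key (sorted by key).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in sequence; on a real cluster many map and
    // reduce calls would run in parallel on different nodes.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        shuffle(pairs).forEach((word, values) -> counts.put(word, reduce(values)));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("hive runs on hadoop", "Hadoop stores data")));
    }
}
```

Running main prints {data=1, hadoop=2, hive=1, on=1, runs=1, stores=1}. In Hive, a query such as SELECT word, COUNT(*) FROM words GROUP BY word (with words as a hypothetical table) asks for the same result without writing any of this by hand.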

Hive lets you analyze large amounts of data on Hadoop without knowing Java MapReduce programming.

Hive provides the Hive Query Language (HiveQL, or HQL), which is similar to SQL. HiveQL supports only a subset of ANSI SQL.

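Common query features such as joins and aggregation (GROUP BY with COUNT, SUM, AVG and so on) are built into HiveQL. When a query needs logic that HiveQL cannot express, such as a custom function, you write custom code and plug it into Hive: either a user-defined function (UDF) in Java, or your own map and reduce scripts called from a query.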
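As a rough illustration of a custom function, here is a hypothetical NormalizeText function that trims and lower-cases a string. To keep it runnable on its own, it leaves out the Hive dependency: a real simple UDF would extend org.apache.hadoop.hive.ql.exec.UDF (newer code often uses GenericUDF instead) and often works with Hadoop's Text type. Hive locates the evaluate method by reflection, which is why the sketch follows that naming.

```java
import java.util.Locale;

// Hypothetical custom function, written WITHOUT the Hive dependency so it runs standalone.
// A real simple UDF would extend org.apache.hadoop.hive.ql.exec.UDF; Hive calls
// its evaluate method once per row.
final class NormalizeText {

    // NULL column values arrive as null, so return null instead of throwing.
    public String evaluate(String input) {
        if (input == null) {
            return null;
        }
        // Locale.ROOT keeps lower-casing independent of the machine's default locale.
        return input.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        NormalizeText udf = new NormalizeText();
        System.out.println(udf.evaluate("  Hadoop HIVE  ")); // prints "hadoop hive"
    }
}
```

After packaging the real UDF class into a JAR, you would register it in the Hive shell with ADD JAR and CREATE TEMPORARY FUNCTION, then call it in a SELECT like any built-in function.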

Execute Hive Queries

Hive provides a command-line interface, the Hive shell, for executing Hive queries. You can also write the queries in a shell script and run the script. Hive compiles these queries into MapReduce jobs, which query and process the data.
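Here is a small sketch of running Hive queries from a program, the same way a shell script would call the hive command. It assumes the legacy hive CLI is installed and on the PATH; the -e option runs a query passed as a string and -f runs the statements in a script file. The class name, the page_views table, and any script file names are hypothetical.

```java
import java.io.IOException;
import java.util.List;

// Sketch of driving the Hive CLI from Java, as a shell script would.
// Assumes the "hive" command is installed and on the PATH.
final class HiveCliRunner {

    // Equivalent to: hive -e "<query>"
    static List<String> inlineQuery(String query) {
        return List.of("hive", "-e", query);
    }

    // Equivalent to: hive -f <path>  (runs every statement in an .hql file)
    static List<String> scriptFile(String path) {
        return List.of("hive", "-f", path);
    }

    // Starts the command, streams Hive's output to this console, returns the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // page_views is a hypothetical table used only for illustration.
        int exitCode = run(inlineQuery("SELECT COUNT(*) FROM page_views;"));
        System.out.println("hive exited with code " + exitCode);
    }
}
```

The command-building methods are kept separate from run so they can be checked without a Hive installation. Newer Hive releases steer users toward the Beeline client, which accepts similar -e and -f options.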

Hive Advantages

  1. Hive is built on Hadoop, so it inherits the capabilities Hadoop provides, such as reliability, high availability, tolerance of node failures, and running on commodity hardware.
  2. Database developers do not need to learn Java to write MapReduce programs for retrieving data from Hadoop.
  3. Data is stored in HDFS, so you get HDFS scalability and redundancy while querying it with Hive's SQL-like language.
  4. Querying data with Hive is simple and easy.

Hive Disadvantages

  1. Hive is designed for OLAP (batch analytical) processing, not OLTP; it is not suited to low-latency queries or frequent row-level updates.
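  2. Subquery support is limited: older Hive versions allow subqueries only in the FROM clause, and newer versions add only restricted support in the WHERE clause.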

This topic has been a very basic introduction to what Hive is in Hadoop. Hopefully, you now have enough information to get started.

If you have any questions, please feel free to leave a comment and I will get back to you.
