What is Hive in Hadoop? Advantages and disadvantages

Hive is an open-source data warehouse framework written in Java. It is one of the sub-components of the Hadoop ecosystem and was originally developed by Facebook on top of HDFS, the Hadoop Distributed File System.

The Hadoop architecture is made up of several components:

  • HDFS is the distributed file system that stores the data.
  • Hive is used to access data stored as files in HDFS, or data stored in HBase.
  • MapReduce is a Java framework for processing the data in parallel.
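To make the map-reduce idea concrete, here is a minimal sketch of the model in plain Python, using a word count as the example. It is an illustration only: it runs in a single process, whereas Hadoop distributes the map and reduce tasks across the nodes of a cluster. The function names (map_phase, shuffle, reduce_phase, word_count) are made up for this sketch and are not part of any Hadoop API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over the input lines (sequentially here)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["hive runs on hadoop", "hadoop stores data in hdfs"]))
```

Each line can be mapped independently and each key can be reduced independently, which is what lets the real framework process the data in parallel.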

Hive lets you query large amounts of data on Hadoop without knowing Java map-reduce programming.

It provides the Hive Query Language (HQL), which is similar to Structured Query Language (SQL).

HQL supports only a subset of ANSI SQL.

If we need complex features that HQL does not cover, such as custom aggregations or custom functions, we have to write a custom map-reduce program, which can be plugged into Hive queries easily.
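One way to plug custom logic into a query is Hive's TRANSFORM clause, which streams rows through an external script as tab-separated lines on standard input and reads the result from standard output. The script below is a minimal sketch of that approach, not the only way to extend Hive. The file name upper_name.py, the users table, and its id and name columns are assumptions made up for this example.

```python
import sys

# Hypothetical HQL that would call this script (table and columns are assumed):
#   ADD FILE upper_name.py;
#   SELECT TRANSFORM (id, name) USING 'python upper_name.py' AS (id, name)
#   FROM users;


def transform_line(line):
    """Upper-case the second column of one tab-separated row."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 2:
        fields[1] = fields[1].upper()
    return "\t".join(fields)


def main():
    """Read rows from stdin and write the transformed rows to stdout."""
    for line in sys.stdin:
        sys.stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    # Hive pipes rows in on stdin; there is nothing to do when run from a terminal.
    if not sys.stdin.isatty():
        main()
```

The script can be tried locally without Hive by piping a tab-separated file into it, which is a quick way to check the logic before using it in a query.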

How to Execute Hive Queries?

Hive provides a command-line interface, the Hive shell, for executing Hive queries. You can also write the queries in a shell script and call that script. Hive translates these queries into map-reduce jobs, which query and process the data.
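The same command line can also be driven from a script. The sketch below builds the argument list for the hive command, using its -e option for an inline query and its -f option for a file of queries, and only runs it if hive is found on the PATH. The helper names (build_hive_command, run_hive) and the employees table are assumptions for illustration.

```python
import shutil
import subprocess


def build_hive_command(query=None, script_path=None):
    """Return the hive CLI argument list: -e for an inline query, -f for a file."""
    if (query is None) == (script_path is None):
        raise ValueError("pass exactly one of query or script_path")
    if query is not None:
        return ["hive", "-e", query]
    return ["hive", "-f", script_path]


def run_hive(command):
    """Run the command and return its output, or None if hive is not installed."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # "employees" is a hypothetical table used only for this example.
    cmd = build_hive_command(
        query="SELECT name, COUNT(*) FROM employees GROUP BY name"
    )
    print(cmd)
```

Passing the arguments as a list avoids shell quoting problems when the query itself contains quotes.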

Hive Advantages

  • Hive is built on Hadoop, so it inherits Hadoop's capabilities: reliability, high availability, tolerance of node failures, and the ability to run on commodity hardware.
  • Database developers do not need to learn Java programming or write map-reduce programs to retrieve data from the Hadoop system.
  • The data is stored in HDFS, so you get scalability and redundancy while querying it with the Hive SQL language.
  • Querying data with Hive is simple and easy.

Hive Disadvantages

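  • Hive is not designed for OLTP (online transaction processing); it is suited to OLAP-style analytical queries over large data sets.
  • Subquery support is limited. Early versions of Hive allowed subqueries only in the FROM clause.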

This topic has been a basic start to exploring Hive on Hadoop. Hopefully, you now have enough information to get started.

Conclusion

This was a short tutorial about Hive basics.