Partitioning in Hive

The partitioning in Hive means dividing the table into some parts based on the values of a particular column like date, course, city or country. The advantage of partitioning is that since the data is stored in slices, the query response time becomes faster.

As we know that Hadoop is used to handle the huge amount of data, it is always required to use the best approach to deal with it. The partitioning in Hive is the best example of it.

Let's assume we have a data of 10 million students studying in an institute. Now, we have to fetch the students of a particular course. If we use a traditional approach, we have to go through the entire data. This leads to performance degradation. In such a case, we can adopt the better approach i.e., partitioning in Hive and divide the data among the different datasets based on particular columns.

The partitioning in Hive can be executed in two ways -

Static Partitioning

In static or manual partitioning, it is required to pass the values of partitioned columns manually while loading the data into the table. Hence, the data file doesn't contain the partitioned columns.

Example of Static Partitioning

First, select the database in which we want to create a table.

Create the table and provide the partitioned columns by using the following command: -

hive> create table student (id int, name string, age int,  institute string) 
partitioned by (course string)
row format delimited
fields terminated by ',';

Let's retrieve the information associated with the table.

Load the data into the table and pass the values of partition columns with it by using the following command: -

hive> load data local inpath '/home/codegyani/hive/student_details1' into table student
partition(course= "java");  

Here, we are partitioning the students of an institute based on courses.

Load the data of another file into the same table and pass the values of partition columns with it by using the following command: -

hive> load data local inpath '/home/codegyani/hive/student_details2' into table student
partition(course= "hadoop");

In the following screenshot, we can see that the table student is divided into two categories.

Let's retrieve the entire data of the able by using the following command: -

Now, try to retrieve the data based on partitioned columns by using the following command: -

In this case, we are not examining the entire data. Hence, this approach improves query response time.

Let's also retrieve the data of another partitioned dataset by using the following command: -

Next TopicDynamic Partitioning

← prev next →

For Videos Join Our Youtube Channel: Join Now

Feedback

Send your Feedback to [email protected]

Help Others, Please Share

Learn Latest Tutorials

Splunk

SPSS

Swagger

Transact-SQL

Tumblr

ReactJS

Regex

Reinforcement Learning

R Programming

RxJS

React Native

Python Design Patterns

Python Pillow

Python Turtle

Keras

Preparation

Aptitude

Reasoning

Verbal Ability

Interview Questions

Company Questions

Trending Technologies

Artificial Intelligence

AWS

Selenium

Cloud Computing

Hadoop

ReactJS

Data Science

Angular 7

Blockchain

Git

Machine Learning

DevOps

B.Tech / MCA

DBMS

Data Structures

DAA

Operating System

Computer Network

Compiler Design

Computer Organization

Discrete Mathematics

Ethical Hacking

Computer Graphics

Software Engineering

Web Technology

Cyber Security

Automata

C Programming

C++

Java

.Net

Python

Programs

Control System

Data Mining

Data Warehouse

^{Like/Subscribe us for latest updates or newsletter}

Hadoop Tutorial

Hadoop Modules

Hadoop - MapReduce

HBase

Hive Tutorial

Pig

Sqoop

Spark Tutorial

Misc