Bucketing in Hive

Bucketing in Hive is a data-organizing technique. Like partitioning, it divides a large dataset into more manageable parts, here known as buckets, but it assigns each row to a bucket by hashing a column value. Bucketing is therefore useful when partitioning is difficult to apply. Partitions can also be divided further into buckets.

Working of Bucketing in Hive

Bucketing is based on hashing.
Hive takes the hash of the bucketing column's value and computes its modulus by the number of required buckets (say, F(x) % 3).
Each row is then stored in the bucket that corresponds to the result.
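
The rule above can be tried out directly in the Hive shell. The following is only a sketch: it uses Hive's built-in hash() and pmod() functions and assumes the older bucketing scheme, in which the hash of an int is the value itself. Newer Hive releases may hash differently when placing rows, so the query illustrates the idea rather than the exact placement.

```sql
-- 7 stands in for a value of the bucketing column, 3 for the number of buckets.
-- Running select without a from clause needs Hive 0.13 or later.
select pmod(hash(7), 3);
```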

Example of Bucketing in Hive

First, select the database in which we want to create a table.
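
A minimal sketch of this step is shown here. The database name bucket_demo is hypothetical and does not come from the original example, so substitute the database you actually use.

```sql
-- bucket_demo is a placeholder name chosen for illustration only
create database if not exists bucket_demo;
use bucket_demo;
```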

Create a dummy table to store the data.

hive> create table emp_demo (Id int, Name string , Salary float)  
row format delimited  
fields terminated by ',' ; 

Now, load the data into the table.

hive> load data local inpath '/home/codegyani/hive/emp_details' into table emp_demo;

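Enable bucketing by setting the hive.enforce.bucketing property to true. Hive releases before 2.0 need this so that inserts honor the table's bucket definition; from Hive 2.0 onward the property was removed and bucketing is always enforced.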
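
The command for this step did not survive in the text, so the line below is a reconstruction rather than the original. It assumes a Hive release older than 2.0, where the property still exists.

```sql
-- Assumption: Hive 1.x or earlier; on Hive 2.0+ this setting is not needed
set hive.enforce.bucketing = true;
```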

Create a bucketed table by using the following command:

hive> create table emp_bucket(Id int, Name string , Salary float)  
clustered by (Id) into 3 buckets
row format delimited  
fields terminated by ',' ;  

Now, insert the data from the dummy table into the bucketed table.
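
The insert statement itself is missing from the text. A plausible form, using the two tables defined above, would be the following. Whether the original used insert overwrite or insert into is an assumption.

```sql
-- Copies every row of emp_demo into emp_bucket; Hive hashes Id to pick the bucket.
-- overwrite replaces any rows already in emp_bucket.
insert overwrite table emp_bucket
select * from emp_demo;
```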

Here, we can see that the data is divided into three buckets.
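
One way to check the bucket layout, offered as a sketch, is to ask Hive for the table's metadata. The output includes the number of buckets, the bucketing column, and the storage location whose files can then be listed.

```sql
-- Reports table metadata, including bucket count, bucket columns and location
describe formatted emp_bucket;
```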

Let's retrieve the data of bucket 0.
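
The original command for reading a single bucket is not preserved here. One alternative, which may differ from the author's method, is Hive's tablesample clause. Note that tablesample numbers buckets from 1, so the bucket called 0 in this tutorial is bucket 1 out of 3.

```sql
-- bucket 1 out of 3 corresponds to this tutorial's bucket 0;
-- use bucket 2 and bucket 3 for the tutorial's buckets 1 and 2
select * from emp_bucket tablesample(bucket 1 out of 3 on Id);
```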

According to the hash function:
6 % 3 = 0
3 % 3 = 0
So, these rows are stored in bucket 0.

Let's retrieve the data of bucket 1.

According to the hash function:
7 % 3 = 1
4 % 3 = 1
1 % 3 = 1
So, these rows are stored in bucket 1.

Let's retrieve the data of bucket 2.

According to the hash function:
8 % 3 = 2
5 % 3 = 2
2 % 3 = 2
So, these rows are stored in bucket 2.
