How Hash Partition Algorithm Works

If you’re one of those people that wait for the movie in order to comment that book you were too lazy to read, this post is for you. You’ll either see the following clip or keep reading. Enjoy.

How Hash Partition Algorithm Works

Golden Rules about hash partitioning a table
1. Number of partitions = degree of parallelism of the table
2. Choose columns with a good data distribution. Status=>bad; SomeRandomCode=>Good
3. Make it a power of 2 (2,4,8,16,32,… etc)

I shall exemplify:
• a) we’ll create a table with 3 columns with loads of records hash partitioned it by each column
∘ col1 : very skewed set of data
∘ col2 : very even data distribution
∘ col3 : very small cardinality
• b) load the data
• c) measure how rows are scattered along the partitions and see what will have the best result

a) — We’ll use only number for the sake of simplicity
create table hashtab (
col1 number,
col2 number,
col3 number)
partition by hash (col1) partitions 4;

– b) some pl/sql science here ;-)
for i in 1..10000 loop
insert into hashtab values
mod(i,4) — there is only 4 possible values here
end loop;

select col1, count(*) from hashtab group by col1 having count(*) > 1;
select col2, count(*) from hashtab group by col2;
select col3, count(*) from hashtab group by col3;

– c) Let’s collect stats

exec dbms_stats.gather_table_stats(‘LMC’,’HASHTAB’);

select partition_name, num_rows from user_tab_partitions where table_name=’HASHTAB';

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
SYS_P41                              2462
SYS_P42                              2524
SYS_P43                              2472
SYS_P44                              2542
PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
SYS_P45                              2330
SYS_P46                              2450
SYS_P47                              2610
SYS_P48                              2610
PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
SYS_P49                                 0
SYS_P50                              3333
SYS_P51                              3333
SYS_P52                              3334

If you’re wondering if that 0 it’s only because there we had 3 different values on 4 hash partitions, you’re wrong, because here are the results with 4 different values:

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
SYS_P53                                 0
SYS_P54                              2500
SYS_P55                              2500
SYS_P56                              5000

Conclusion: The more scattered and random the best this hashing algorithm performs!

So no more hash partition based on country codes and stuff like that.

Hope it was useful.


Kudos for Maria Colgan and the group she works inside Oracle for inspiring the likes of me:

About these ads

2 thoughts on “How Hash Partition Algorithm Works

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s