If you’re one of those people that wait for the movie in order to comment that book you were too lazy to read, this post is for you. You’ll either see the following clip or keep reading. Enjoy.
How Hash Partition Algorithm Works
Golden Rules about hash partitioning a table
1. Number of partitions = degree of parallelism of the table
2. Choose columns with a good data distribution. Status=>bad; SomeRandomCode=>Good
3. Make it a power of 2 (2,4,8,16,32,… etc)
I shall exemplify:
• a) we’ll create a table with 3 columns with loads of records hash partitioned it by each column
∘ col1 : very skewed set of data
∘ col2 : very even data distribution
∘ col3 : very small cardinality
• b) load the data
• c) measure how rows are scattered along the partitions and see what will have the best result
a) — We’ll use only number for the sake of simplicity
create table hashtab (
col1 number,
col2 number,
col3 number)
partition by hash (col1) partitions 4;
– b) some pl/sql science here ![]()
begin
for i in 1..10000 loop
insert into hashtab values
(dbms_random.random,
mod(i,1000),
mod(i,4) — there is only 4 possible values here
);
end loop;
end;
/
select col1, count(*) from hashtab group by col1 having count(*) > 1;
select col2, count(*) from hashtab group by col2;
select col3, count(*) from hashtab group by col3;
– c) Let’s collect stats
exec dbms_stats.gather_table_stats(‘LMC’,'HASHTAB’);
select partition_name, num_rows from user_tab_partitions where table_name=’HASHTAB’;
Col1 PARTITION_NAME NUM_ROWS ------------------------------ ---------- SYS_P41 2462 SYS_P42 2524 SYS_P43 2472 SYS_P44 2542
Col2 PARTITION_NAME NUM_ROWS ------------------------------ ---------- SYS_P45 2330 SYS_P46 2450 SYS_P47 2610 SYS_P48 2610
Col3 PARTITION_NAME NUM_ROWS ------------------------------ ---------- SYS_P49 0 SYS_P50 3333 SYS_P51 3333 SYS_P52 3334
If you’re wondering if that 0 it’s only because there we had 3 different values on 4 hash partitions, you’re wrong, because here are the results with 4 different values:
PARTITION_NAME NUM_ROWS ------------------------------ ---------- SYS_P53 0 SYS_P54 2500 SYS_P55 2500 SYS_P56 5000
Conclusion: The more scattered and random the best this hashing algorithm performs!
So no more hash partition based on country codes and stuff like that.
Hope it was useful.
LMC
Kudos for Maria Colgan and the group she works inside Oracle for inspiring the likes of me: http://blogs.oracle.com/optimizer
Pingback: Como usar o método de particionamento por Hash « gUPo
Pingback: Recover HCC partitioned tables to non-Exadata storage – The Parallel Quest | ocpdba oracle weblog