Best Practices for Statistics Gathering on Oracle 11g


Some people look at statistics gathering the way they look at taking vitamins: nice to have, but optional. That’s wrong.

You shouldn’t base your tuning strategy on dynamic sampling alone, or on luck. Some people freak out because they once gathered stats on some system and everything went south. That’s like saying that because cars crash and people get killed, we shouldn’t drive at all.

You have to use stats. Period.

So my small contribution to making your life easier is these 5 tips for Oracle 11g:

1) Run all queries against empty tables to populate column usage (histograms will be gathered based on this column usage).

2) Run stats after data has been loaded but before indexes are created (the creation of the index gathers stats on them automatically).

3) Find (business) correlations between columns and inform the optimizer about these correlations by means of creating Column Groups. Also use this technique instead of function-based indexes.

4) Always gather stats with AUTO_SAMPLE_SIZE so the new stats gathering algorithm of Oracle 11g can kick in.

5) If you use partitioned tables you *must* use synopses, so that global stats can be derived incrementally, much, much faster.
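For tip 1, you can check what column usage the optimizer has recorded before gathering stats. A minimal sketch, assuming 11.2’s DBMS_STATS.REPORT_COL_USAGE function is available (on earlier releases the raw data sits in sys.col_usage$), and using the SCOTT.TABLE_XYZ names from the examples below:

```sql
-- Run the representative workload first, then flush the monitoring info to disk
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

-- Report which columns appeared in predicates; this drives histogram decisions
SET LONG 100000
SELECT DBMS_STATS.REPORT_COL_USAGE('SCOTT', 'TABLE_XYZ') FROM DUAL;
```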

I guess 1, 2 and 4 leave no room for doubt.
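For the record, tip 4 is just a matter of not overriding the default. A sketch, again using the hypothetical SCOTT.TABLE_XYZ:

```sql
-- AUTO_SAMPLE_SIZE lets the new 11g one-pass NDV algorithm kick in:
-- accuracy close to a 100% sample at a fraction of the cost
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',
    tabname          => 'TABLE_XYZ',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);
END;
/
```

Note that passing a hard-coded estimate_percent (or NULL, meaning compute) disables that algorithm.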

Regarding 3 and 5, here’s the usual techie yada yada:

3) Column Groups are created by means of Extended Optimizer Statistics, via a new function in the DBMS_STATS package: CREATE_EXTENDED_STATS.

See an example in this very good article.
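A minimal sketch of the call (the CUSTOMERS table and the COUNTRY/STATE columns are hypothetical names for illustration):

```sql
-- Tell the optimizer that COUNTRY and STATE are correlated
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(
         ownname   => 'SCOTT',
         tabname   => 'CUSTOMERS',
         extension => '(COUNTRY,STATE)')
  FROM DUAL;

-- The column group is a hidden virtual column; its stats are
-- picked up on the next gather
EXEC DBMS_STATS.GATHER_TABLE_STATS('SCOTT', 'CUSTOMERS');
```

The same function also accepts an expression, e.g. '(UPPER(LAST_NAME))', which is the alternative to a function-based index mentioned in tip 3.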

5) Synopses are created when you enable incremental stats at the table or system level:

table level: SQL> exec DBMS_STATS.SET_TABLE_PREFS('SCOTT','TABLE_XYZ','INCREMENTAL','TRUE');

system level: SQL> exec DBMS_STATS.SET_GLOBAL_PREFS('INCREMENTAL','TRUE');

This will avoid rescanning partitions that haven’t changed since the last stats generation, making it much faster.
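Putting tips 4 and 5 together, a sketch of the full incremental setup for a partitioned table (SCOTT.TABLE_XYZ as above):

```sql
-- Enable synopses; subsequent gathers only scan changed partitions
EXEC DBMS_STATS.SET_TABLE_PREFS('SCOTT', 'TABLE_XYZ', 'INCREMENTAL', 'TRUE');

-- Incremental stats only work with AUTO_SAMPLE_SIZE and GRANULARITY => 'AUTO'
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',
    tabname          => 'TABLE_XYZ',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    granularity      => 'AUTO');
END;
/
```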

So remember: stats gathering is a matter of survival, not an option.

LMC.



11 thoughts on “Best Practices for Statistics Gathering on Oracle 11g”

  1. Luis, I am testing the performance of our application on Oracle 11gR2 and comparing the statistics gathering step with what we used to have on Oracle 11gR1. Same scope of data and same instance configuration: statistics gathering on Oracle 11gR2 takes twice as long. Originally I had estimate_pct set to NULL. So I gathered statistics on the R2 schema first with estimate_pct set to NULL – it took 17 s – and then on the same R2 instance with AUTO_SAMPLE_SIZE – it took 30 s. So it seems that it takes longer on R2 than on R1, and AUTO_SAMPLE_SIZE did not help. Would you have any suggestions or explanations?
    Best regards.

  2. Quick Question. I’m referring to point number 2.

    My scenario:

    On a weekly basis we ‘Duplicate’ a table onto a new table space, load the data, build the indexes and analyze.

    My question:

    If I understand correctly, it’s safe to say that you first have to analyze the table after the data load, before creating the indexes? We are running on Oracle 10g, not 11.

  3. Interesting suggestions. I’m curious about the importance of 1 & 2, trying to imagine a situation where I have empty tables in production, or tables without indexes. This seems to assume you create a new db and run scripts to create tables, then run load jobs. We use export/import at the schema level to migrate everything. For us it would be hard to justify the former approach “just” to achieve 1 & 2.

  4. Thanks for the feedback Steve. You might want to load the tables with Data Pump as well, but do it in two blows: first the structure, then the data; after that you can create the indexes (which you can extract as a script from the Data Pump file). Does this make sense for you?


  6. Catching up on stats with a production system means step 1 is not feasible. I cannot run it against empty tables. So does it make sense to run it for a longer period to try and approximate? Or is there some way to interpret the results given that the tables did not start out empty?

  7. Sure, you can’t run it against production because the tables are not empty. But you could run it against an empty copy of the same tables, into which you have just imported the production stats. Once you’ve got the ball rolling, it’s a matter of using incremental stats.
