It’s a bird, It’s a plane, it’s Supercluster

Whenever I give my kids a choice of breakfast, they will invariably choose what I reckon as the “wrong” choice. It’s always the one with more sugar, or non-natural ingredients. When they don’t have a choice, they always state their preferences. In technology such preferences also exist. Some of them very passionate. Specially after Oracle announced a new Engineered System called SPARC Supercluster (SSC), last October at Oracle Open World 2011 in San Francisco. Whilst the positioning and comparison of SSC seemed quite easy and spontaneous for some, the vast majority still wonders where does this new product sits in the midst of its closest cousins: Exalogic and Exadata. For the former I challenge you to step back a bit and try to think in a more agnostic fashion.

In order to do that let me go back to the concept of “preference”.

So what is a technological preference? It’s preferring a set or subset of technologies in detriment of others (not necessarily the “bad ones”). It’s just a preference. Why would you have a preference? Mainly because it gets the job done. For some people preference is attached to a career or knowledge area, so they are professionally defined by their knowledge. If you want to get the job done in the best way possible you must be technologically agnostic. Technology is a means to an end, not the end itself. So if you’re reading this and you are trusted by your customers, or want to be trusted as a strategic partner, you must concentrate on how technology products would get your customers from point A to point B. When you strip down a solution to its architectural components, you would want it to have building blocks that encapsulate the absolute best of breed in each area.

Let’s say you’re laying down an infrastructure for an enterprise application assuming the following granularity: storage, network, operating system, database, logic and interface.

Do you think that 5 years ago one could sum up all those layers into one single building block? Probably you would have to go even more back in time, but those days of closed systems are doomed to end (mainly because the knowledge to operate those is fading away). Assuming we are talking only about open systems the answer to the question is: no, such large building block didn’t exist. The technological purists might beg to differ, but this is no task for the light hearted. It’s something so big, so complicated, that only a concentration of enterprise class building blocks can achieve. In space research, or astrophysics, a large collection of groups of stars is called a Supercluster, and that’s exactly what Oracle has come up with. The good people of IT departments can now tackle the immensity of this task using a new building block. Your customers might “prefer” mixing other building blocks, it’s up to them to have the pains of integration. On the other hand, if they want to get the job done, they shouldn’t have the meaningless work of connecting an array of diffuse technologies just because of “preferences”. Getting the job done does not comply with these micro-questions.

So what comes inside a Supercluster? Storage, network, operating system, database, logic and interface! How cool is that? Exactly what we needed!

Some might say that they could build such a system but with different components, that would also get the job done. My question to them is: How many price tags would such a system had to have? Supercluster only has one.

LMC

If you want learn more about SSC:

http://www.oracle.com/technetwork/server-storage/hardware-solutions/sparc-supercluster-191641.html

Smart Flash Redo Logging: A small step from development a giant leap in database performance

Since the first ages of Oracle Database Architecture, one of the most common bottlenecks in high load transactional systems could be the speed of the disks where the log files were stored. One of the symptoms of this problem would be an upsetting event called “Log File Sync”. Basically the hiccups on the redo log files storage, no matter how good the latency was, would create a cascading effect on the performance of the whole logging subsystem affecting the whole system’s responsiveness.

An apparent solution would be to put redo log files inside SSD disks or any other low latency storage solution. But you will eventually hit the wall and its cascading effects, if the storage couldn’t cope with the amount of transactions that it’s supposed to deal with. Another apparent solution that some might think of would be to create only one member per redo log group. But it’s not, and it raises the risk which is an unnecessary measure.

The solution that the Oracle Database development team came up with in 11.2.0.3 (also in 11.2.0.2 BP11) is so powerful and yet so simple!

Now when log writer (LGWR) picks up a bunch of log entries from the log buffer to write them to redo logs, it just sends a write request to both disk and flash, the first one to send back an acknowledgement that the write was done, is taken as the confirmation of the write. By now you might be thinking: “Well flash is going to be always be the first one!”. You’re almost right. But if flash is always faster, then what happens to the disk files? They’re written anyway, but if disk fails or lags behind, the log information is in flash. And if flash fails, the log information will be in another cell. So the hiccups are eliminated and the high transactional systems would have to wear out flash cards, which is something that just won’t happen before any other piece get worn out.

Let’s recap:

  1. Requirements:
    1. Version 11.2.2.4 at the cell level
    2. Version 11.2.0.3 at the DB level (when it comes out) or 11.2.0.2 BP11
  2. Smart Flash Logging has to be enabled (it’s enabled by default, only worry if you disabled it)
  3. In systems that are not highly transactional like ODSs or DWs you can disable this feature for all databases or for just one database (in cellcli prompt):
    1. For all use the command DROP FLASHLOG [FORCE]
    2. For just one, alter the default IORM plan: ALTER IORMPLAN dbplan=((name=test, flashLog=off))
  4. To monitor the behaviour of this mechanism go to the cellcli console and use the following command:
    1. LIST FLASHLOG [DETAIL]
    2. or use all the statistics that start with FL_*, like for example: FL_FLASH_IO_ERRS (number of errors when writing to flash => the few times that the disk saved the day).

See the picture below:

This is ground breaking stuff. It’s the removal of an old restriction with an ingenious solution.

LMC

Divide to conquer: Parallel load distribution and balance

My first encounter with a machine with several CPUs was back in the university where I developed a paper about Load Balancing in a Massive Parallel Processing environment. I was trying to develop an algorithm that would balance load amongst CPUs in a dynamic way. I guess the purpose was to investigate more than to create, but two decades later the challenges of Load Balancing (LB) in parallel processing still remain a key aspect to a successful use of several processing units. In times like these never the ability to scale in a linear way were so important, hence dividing is still the most effective way to conquer!

In Oracle database, parallel processing, specifically in the query processing arena, is of utmost importance. The last big machine I had my hands on, an X2-8 Exadata, has nothing less than 128 Cores of processing and the smallest error in balancing the load has an immediate effect.

The Parallel Query (PQ) model of processing was introduced in Oracle7 and the basic mechanism still remains the same as you can see here. In this documentation you can read the following:

“(…) there are actually eight query servers involved in the query even though the degree of parallelism is four. This is because a parent and child operator can be performed at the same time.

Later on in Oracle 11g the same piece of documentation referred to this relationship of parenthood in a different manner:

“(…) there are actually eight parallel execution servers involved in the query even though the DOP is 4. This is because a producer and consumer operator can be performed at the same time (inter-operation parallelism)

My point is that, it’s not enough to parallel your load if it’s not balanced. With this in mind I think that your top priorities regarding PQ processing should be:

- Does your statement produce a good amount of scattered scanning in order to take advantage of a parallel operation?

- Is your operation load intensive or join intensive?

- How data is distributed amongst producer and consumer parallel processes?

The purpose of this post is to make you aware of the 3rd question!

The same doc I’ve quoted, has a section about “Balancing the Workload” from where you can read the common sense concept of “workload skewing, which occurs when some parallel execution servers perform significantly more work than the other processes“. I bet that most of you out there would think that the Oracle optimizer would always do an equitative distribution of load among parallel processes, but I have news for you: it tries its best, and sometimes “best” and “optimal” are not the same. That’s where the human knowledge of the nature of data comes in handy.

You can control this process in three ways:

- Through the wise use of the Degree Of Parallelism (DOP) versus the number of partitions in a table. using a 15 DOP in a 16 partitions table like the doc refers to is a classical error that occurs more times than you might think!

- Enabling correlated (incremental) stats where they apply (the classical example: Car Manufacturer and Model: When I say “Camaro” I don’t need to scan the word “Chevrolet”!)

- Using the hint PQ_DISTRIBUTE, but like any other hint please test it thoroughly before you use it in production.

My goal was to make you aware that parallel distribution of load is important, and from here on I recommend that you read some more about it, and if you can, try it too with your data (not in production please!). So here’s a list I’ve put for your reading pleasure:

* About the PQ_DISTRIBUTE hint

* Blog entries about the subject:

The Cost of PX Distribution Method by Christian Antognini

Parallel query distribution methods by Tony Hasler

* Oracle Forum discussion thread about PQ_DISTRIBUTE

I still haven’t found proper MOS note about this so if you guys out there would want to share it please do so.

Enjoy.

LMC

 

How to gain space on indexes of HCC partitioned tables

The problem only exists because indexed information is not HCC compressed in Exadata, so the database ends up having tables 10 times smaller, but indexes stay the same.

Well the idea came to me through Ghassan Salem from RWP and hence the credits belong to him. If the table is partitioned, you can save space by marking unusable the old index partitions (so they will be wiped out saving space). According to Ghassan even if those old partition’s are accessed, the absence of an indexed can be compensated by having those partitions with a higher compression level, and so the Full Partition Scan will not be noticed. Great idea? But how can we do this?

After googling a bit and browsing the Oracle docs, it turned out that a blog post explaining how to implement Ghassan’s idea  already existed. It’s from Christian Antognini the author from the book: “Troubleshooting Oracle Performance”.

This is truly a great idea and instead of loosing my time and energy trying to make it my own, I’d rather share it and acknowledge it through this blog post.

LMC

Dealing with corruption in an IOT: the practical hands-on approach

If you find yourself with a corrupted block in an Oracle database you should go immediately to My Oracle Support (MOS) and read note 1088018.1. There you will find the appropriate MOS notes you should read if your problem is related to table, index, or LOB corruption. If you’re really unlucky you might have Undo or Redo corruption or even Dictionary or Controlfile corruption. In these cases this note will help you as well.

That’s why this is called a Master Note.

But if you crash into a block corruption inside an IOT, you basically have to give a step back and read again and again the part of the note that says:

IOT

** Contact Oracle Support **

Generic notes about these objects: * None Public*

If you’re not very aware of what’s underneath an Index Organized Table (IOT) you would know that you can’t reorganize the index of these tables because the index … is the table!

So what can you do?

Well, I say you should open a Service Request and ask for help in trying to recover this corruption inside the IOT.

If you can afford loosing some data of this IOT, after you open your SR you can create an image of this IOT and point your app towards this. This is meant to be a temporary practical approach, so you can provision the data inside the corrupted IOT to your users.

Here’s the deal:

1. Mark the IOT so it skips the corrupted blocks

2. Build, rebuild or create another structure by scanning the corrupted IOT successfully

3. Disable the corrupted block skipping

4. Point your users temporary to the non-corrupted data if they need asap

Here’s the code:

1. Marking the IOT to skip corrupted blocks can’t be done with events. They simply don’t work with IOTs:

alter session set events '10231 trace name context forever, level 10';

=> for tablescans

alter session set events '10232 trace name context forever, level 10';

=> to dump to trace files…

alter session set events '10233 trace name context forever, level 10';

=> for index scans

IOTs are not tables and are not indexes, so none of those events will make it. So what can you do? Use the DBMS_REPAIR package, that it will work like a charm:

SQL> exec DBMS_REPAIR.SKIP_CORRUPT_BLOCKS('MY_SCHMEA','MY_IOT');

PL/SQL procedure successfully completed.

2. Rebuild the IOT (this will make you loose those corrupted entries for good) or CTAS it. I’ve tested with MOVE and it worked fine:

SQL> alter table MY_SCHEMA.MY_IOT move;

Table altered.

3. Now disable the corrupted block skipping using the package again:

SQL> exec DBMS_REPAIR.SKIP_CORRUPT_BLOCKS('MY_SCHEMA','MY_IOT',flags=>dbms_repair.noskip_flag);

PL/SQL procedure successfully completed.

4. Now you’re good to go and use the info that was not affected by the corruption while MOS people are helping you do a more deep analysis of the problem and provide an action plan to solve it.

LMC

Stuff I’ve learned this week

Been trapped in a war-like situation since sunday night so here’s some of the stuff I’ve learned on this mad week:
- the freaking em11g agent only installs in silent mode, and you have to edit a response file before you invoke the damn thing that only accepts it in absolute path. Are we in the 70′s again? What is this?
- if you ssh with X11 forwarding you don’t need to export the DISPLAY var. It only takes xauth list, copy the whole line and “xauth add “
- recover a db until seq is tricky. You have to check what was the seq# of the last arch log backed up and then make it recover until that seq+1.
- asm is a devilish thing for sys and storage admins.
- before creating a standby db to a blank (binaries only) host you should install the em agent there
- when a data guard standby creation fails with em the newly created standby db target in em has to be removed manually
- data guard wizard in em will only work if your agent is secured (https)

Yet another:

- The lack of SQL skills is the root of all evil

Last but not least:

The line that separates tech guys who catch flies and trap them inside upside-down plastic cups, and potential serial killers is a very thin one.

LMC

Recover HCC partitioned tables to non-Exadata storage – The Parallel Quest

If you read my previous post, you might have noticed that I was using a non-partitioned table, of a somewhat small size. The basic stuff is there, but if you’re managing bigger tables you might want to do it using partitions and do it using parallel DDL, to speed things up.

This raises a new challenge: you might not want to uncompress all your partitions, and those that you decide to uncompress should be able to do it in parallel. The sole propose of this post is to show you how can you do it, assuming you want to uncompress all your HCC compressed partitions in the non-Exadata storage… and in parallel :-D

Now the steps

1. Create 4 different tablespaces to accommodate the four partitions of the table we’re about to create

2. Create a table called BRAGA (3rd biggest city in Portugal and the finalists in the UEFA Europa Cup next 18th May in Dublin’s Arena). This table is hash partitioned, with four partitions pointing to the four tablespaces created previously. The table will use HCC compression at the default level (QUERY HIGH), which uses the exact same algorithms and transformations as ARCHIVE LOW, but with a smaller Compression Unit (CU).

3. Direct Load the table with info from the Data Dictionary, creating 415805 rows (about 70MB of uncompressed info).

4. Check the space taken by the CUs: 2.75MB. A compression ratio around 25!

5. Gather stats for the table and its partitions

6. Check how the hashing algorithm spread the data across the four partitions. It should be balanced. I’ve written another post this year about this. You should choose the column with the biggest cardinality to spread the data more evenly. It should be a number around 415805/4=103952 rows for each partition.

7. Backup the tablespaces

8. Recover them back to non-Exadata storage, this time to a local disk in node 1 (/home/oracle/lmc). I was using only one instance to avoid having ORA-01157 errors blowing in my face.

9. Check if the datafiles are on the expected place

10. Query table BRAGA and get error ORA-64307 thrown at you

11. Now it’s the tricky bit! First you modify the compression status of each partition to “nocompress” and then …

12. … you uncompressed them at will and in parallel (after enabling parallel DDL at the session level). Each ALTER TABLE braga MOVE PARTITION  will be a parallel operation in itself, and can be done in parallel amongst themselves to raise the bar!

13. After the uncompression gets done you’ll see that the table is readable and has 69MB!

14. To play around a bit I’ve decided to compress them again but using OLTP compression and in the end I measure the ratio. Nice one too.

Now the code

create tablespace knod1
 datafile '+DGDATA' size 100m
 autoextend on next 250m maxsize 5g;
 create tablespace knod2
 datafile '+DGDATA' size 100m
 autoextend on next 250m maxsize 5g;
 create tablespace knod3
 datafile '+DGDATA' size 100m
 autoextend on next 250m maxsize 5g;
 create tablespace knod4
 datafile '+DGDATA' size 100m
 autoextend on next 250m maxsize 5g;
create table lmc.braga ( --column mimic dba_segments
 OWNER                VARCHAR2(30),
 SEGMENT_NAME            VARCHAR2(81),
 PARTITION_NAME         VARCHAR2(30),
 SEGMENT_TYPE            VARCHAR2(18),
 SEGMENT_SUBTYPE        VARCHAR2(10),
 TABLESPACE_NAME        VARCHAR2(30),
 HEADER_FILE            NUMBER,
 HEADER_BLOCK            NUMBER,
 BYTES                NUMBER,
 BLOCKS             NUMBER,
 EXTENTS            NUMBER,
 INITIAL_EXTENT         NUMBER,
 NEXT_EXTENT            NUMBER,
 MIN_EXTENTS            NUMBER,
 MAX_EXTENTS            NUMBER,
 MAX_SIZE            NUMBER,
 RETENTION            VARCHAR2(7),
 MINRETENTION            NUMBER,
 PCT_INCREASE            NUMBER,
 FREELISTS            NUMBER,
 FREELIST_GROUPS        NUMBER,
 RELATIVE_FNO            NUMBER,
 BUFFER_POOL            VARCHAR2(7),
 FLASH_CACHE            VARCHAR2(7),
 CELL_FLASH_CACHE        VARCHAR2(7)
 )
 partition by hash(header_block)
 partitions 4
 store in (knod1, knod2, knod3, knod4)
 compress for query high
 ;

Direct-Load it so HCC can kick-in:

SQL> insert /*+ APPEND */ into lmc.braga
 2   select * from dba_segments;
415805 rows created.
select sum(bytes)/1024/1024 Megs
 from dba_extents
 where segment_name = 'BRAGA';
MEGS
 ----------
 2.75
-- Collect Stats
 exec dbms_stats.gather_table_stats('LMC','BRAGA');
select partition_name, num_rows
 from user_tab_partitions
 where table_name='BRAGA';
PARTITION_NAME             NUM_ROWS
 ------------------------------ ----------
 SYS_P61985               101624
 SYS_P61986               104043
 SYS_P61987               103247
 SYS_P61988               106891

Nice Distribution! This means we’ve picked the right column to hash partition it.

Now let’s move it to non-Exadata storage:

1st) Backup.
You can do it one by one or backup them all with the same rman command, see my example:

RMAN> backup tablespace knod1;
Starting backup at 13-MAY-11
 using target database control file instead of recovery catalog
 allocated channel: ORA_DISK_1
 channel ORA_DISK_1: SID=693 instance=orcl1 device type=DISK
 channel ORA_DISK_1: starting full datafile backup set
 channel ORA_DISK_1: specifying datafile(s) in backup set
 input datafile file number=01006 name=+DGDATA/orcl/datafile/knod1.1153.750943521
 channel ORA_DISK_1: starting piece 1 at 13-MAY-11
 channel ORA_DISK_1: finished piece 1 at 13-MAY-11
 piece handle=+DGARCH/orcl/backupset/2011_05_13/nnndf0_tag20110512t120530_0.279.750945931 tag=TAG20110512T120530 comment=NONE
 channel ORA_DISK_1: backup set complete, elapsed time: 00:00:07
 Finished backup at 13-MAY-11
RMAN> backup tablespace knod2,knod3,knod4;
Starting backup at 13-MAY-11
 using channel ORA_DISK_1
 channel ORA_DISK_1: starting full datafile backup set
 channel ORA_DISK_1: specifying datafile(s) in backup set
 input datafile file number=01007 name=+DGDATA/orcl/datafile/knod2.1154.750943521
 input datafile file number=01008 name=+DGDATA/orcl/datafile/knod3.1155.750943523
 input datafile file number=01009 name=+DGDATA/orcl/datafile/knod4.1156.750943525
 channel ORA_DISK_1: starting piece 1 at 13-MAY-11
 channel ORA_DISK_1: finished piece 1 at 13-MAY-11
 piece handle=+DGARCH/orcl/backupset/2011_05_13/nnndf0_tag20110512t120633_0.278.750945995 tag=TAG20110512T120633 comment=NONE
 channel ORA_DISK_1: backup set complete, elapsed time: 00:00:07
 Finished backup at 13-MAY-11

Now restore to “/home/oracle/lmc”:

run {
 sql 'alter tablespace "KNOD1" offline immediate';
 sql 'alter tablespace "KNOD2" offline immediate';
 sql 'alter tablespace "KNOD3" offline immediate';
 sql 'alter tablespace "KNOD4" offline immediate';
 set newname for datafile '+DGDATA/orcl/datafile/knod1.1153.750943521' to '/home/oracle/lmc/knod1.dbf';
 restore tablespace 'KNOD1';
 set newname for datafile '+DGDATA/orcl/datafile/knod2.1154.750943521' to '/home/oracle/lmc/knod2.dbf';
 restore tablespace 'KNOD2';
 set newname for datafile '+DGDATA/orcl/datafile/knod3.1155.750943523' to '/home/oracle/lmc/knod3.dbf';
 restore tablespace 'KNOD3';
 set newname for datafile '+DGDATA/orcl/datafile/knod4.1156.750943525' to '/home/oracle/lmc/knod4.dbf';
 restore tablespace 'KNOD4';
 switch datafile all;
 recover tablespace 'KNOD1';
 recover tablespace 'KNOD2';
 recover tablespace 'KNOD3';
 recover tablespace 'KNOD4';
 sql 'alter tablespace "KNOD1" online';
 sql 'alter tablespace "KNOD2" online';
 sql 'alter tablespace "KNOD3" online';
 sql 'alter tablespace "KNOD4" online';
 }

After a while everything is back to non-Exadata storage:

[oracle@db01 lmc]$ ll
 total 410144
 -rw-r----- 1 oracle oinstall 104890368 May 13 18:12 knod1.dbf
 -rw-r----- 1 oracle oinstall 104890368 May 13 18:13 knod2.dbf
 -rw-r----- 1 oracle oinstall 104890368 May 13 18:13 knod3.dbf
 -rw-r----- 1 oracle oinstall 104890368 May 13 18:13 knod4.dbf
 [oracle@db01 lmc]$ pwd
 /home/oracle/lmc

Let’s try to query the table:

SQL> select count(*)
 2  from lmc.braga;
 from lmc.braga
 *
 ERROR at line 2:
 ORA-64307: hybrid columnar compression is only supported in tablespaces
 residing on Exadata storage

This is fine! Now let’s uncompress it in parallel:

SQL> ALTER SESSION ENABLE PARALLEL DDL;
Session altered.
SQL> alter table braga modify partition SYS_P61985 nocompress;
Table altered.
SQL> alter table braga modify partition SYS_P61986 nocompress;
Table altered.
SQL> alter table braga modify partition SYS_P61987 nocompress;
Table altered.
SQL> alter table braga modify partition SYS_P61988 nocompress;
Table altered.
SQL> select sum(bytes)/1024/1024 Megs
 from dba_extents
 where segment_name = 'BRAGA';  2    3
MEGS
 ----------
 2.75
SQL> select count(*)
 2  from braga;
 from braga
 *
 ERROR at line 2:
 ORA-64307: hybrid columnar compression is only supported in tablespaces
 residing on Exadata storage
Still can't!!!!!!! because we haven't released it yet.
SQL> alter table braga move partition SYS_P61985 parallel;
Table altered.
SQL> alter table braga move partition SYS_P61986 parallel;
Table altered.
SQL> alter table braga move partition SYS_P61987 parallel;
Table altered.
SQL> alter table braga move partition SYS_P61988 parallel;
Table altered.
SQL> select count(*)
 2  from braga;
COUNT(*)
 ----------
 415805
SQL> select sum(bytes)/1024/1024 Megs
 from dba_extents
 where segment_name = 'BRAGA';  2    3
MEGS
 ----------
 69

Wow! Looks like the ratio was 69/2.75=25 or in orageek language:

SQL> select round(69/2.75) Ratio from dual;
RATIO
 ----------
 25

Now that we got kicked off Exaland, we can only use OLTP compression. Let’s do it to all four partitions:

alter table braga modify partition SYS_P61985
 compress for oltp;
 alter table braga modify partition SYS_P61986
 compress for oltp;
 alter table braga modify partition SYS_P61987
 compress for oltp;
 alter table braga modify partition SYS_P61988
 compress for oltp;
 -- Gosh I was dumm not naming the partitions :-( 

Now rebuild them:

alter table braga move partition SYS_P61985;
 alter table braga move partition SYS_P61986;
 alter table braga move partition SYS_P61987;
 alter table braga move partition SYS_P61988;
SQL> select sum(bytes)/1024/1024 Megs
 from dba_extents
 where segment_name = 'BRAGA';  2    3
MEGS
 ----------
 16

Still a good ratio:

SQL> select round(69/16) Ratio from dual;
RATIO
 ----------
 4

Hope this was of some utility.

LMC

Recover HCC Compressed Tables to non Exadata storage

Is this for me? (always helpful for busy people)

As many of you know by now, Exadata comes with an exclusive way to compress information, based on columns, and with a different storage unit called “Compression Unit”, that responds by the name of Hybrid Columnar Compression (HCC). The only storage in the world that can read this compressed information is Exadata Storage Server. So if you have your main production database inside an Exadata box, and you want to duplicate that database to non-Exadata storage (for instance SAN), then this article is for you.

Stuff you need to know first:

- As of this day Golden Gate development teams haven’t been able to capture information from compressed tables (HCC or other Oracle compression algorithm) but they are working on it

- If you use standby database (Data Guard) either in active or on mount mode you need to do these steps first, before you start syncronizing

- HCC is a very good compression technique. I’ve seen hundreds of examples and very few of them don’t have a 10 to 20 average compression ratio. So you need to have 10 to 20 times the space in your non-Exadata storage before you consider using the following steps. The only way to compensate the need for that amount of space is to “explode” the HCC tables to OLTP compressed tables. I’ll show you that too.

The Steps

These steps were done inside an Exadata X2-2 Full Machine on a server pool with 4 nodes, but can be reproduced on any other Exadata V2/X2 Machine.

The following example will:

1. Create a tablespace called “knod”

2. Create a table “lisboa” inside the tablespace, with an HCC compression level of 2 (Query High). The table will be loaded during creation with the view “DBA_EXTENTS” to create some volume (28674 rows).

3. Backup the tablespace “knod” with RMAN to the FRA that resides on the ASM diskgroup called “RECO” (inside Exadata).

4. Restore the tablespace from Exadata to a non-Exadata storage, using RMAN.

5. Query the table and get the ORA-64307 Error.

6. Try to move the table back to a tablespace inside Exadata (called “tbs_with_hcc”) and see that it can’t read it either, throwing the 64307 error again.

7. Try to DML on the table and watch that only the INSERT statement would work, because you can add data, you just can’t read the one that’s there.

8. To put you out of your misery the table will be decompressed (it could be recompressed with OLTP compression too in this stage)

9. And now you can query or DML on top of the table that everything is working normally (the table was decompressed from it’s HCC format).

10. Finally we count the bytes it takes up (3 MB), OLTP compress it and count them again (0.625) to see that the table was 3 times smaller.

Main Conclusion: You don’t have to “explode” your HCC tables and then OLTP compressed them, you can go *directly* from HCC on Exadata to OLTP compression on non-Exadata storage.

The Script

SQL> create tablespace knod datafile '+DATA_DM01' size 1g;

Tablespace created.

SQL> create table lisboa
  2  tablespace knod
  3  compress for query high
  4  as
  5  select * from dba_extents
  6  ;

Table created.

Elapsed: 00:00:12.00
SQL> select count(*) from lisboa;

  COUNT(*)
----------
     28674

RMAN> backup tablespace knod;

Starting backup at 23-MAR-11
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00009 name=+DATA/orcl/datafile/knod.3800.746570399
channel ORA_DISK_1: starting piece 1 at 23-MAR-11
channel ORA_DISK_1: finished piece 1 at 23-MAR-11
piece handle=+RECO/orcl/backupset/2011_03_23/nnndf0_tag20110323t204617_0.366.746570777 tag=TAG20110323T204617 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
Finished backup at 23-MAR-11

---

run {
sql 'alter tablespace "KNOD" offline immediate';
set newname for datafile '+DATA/orcl/datafile/knod.3799.746573393' to '/home/lmc/rec/knod.dbf';
restore tablespace 'KNOD';
switch datafile all;
recover tablespace 'KNOD';
sql 'alter tablespace "KNOD" online';
}

---

SQL> desc lisboa
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                              VARCHAR2(30)
 SEGMENT_NAME                                       VARCHAR2(81)
 PARTITION_NAME                                     VARCHAR2(30)
 SEGMENT_TYPE                                       VARCHAR2(18)
 TABLESPACE_NAME                                    VARCHAR2(30)
 EXTENT_ID                                          NUMBER
 FILE_ID                                            NUMBER
 BLOCK_ID                                           NUMBER
 BYTES                                              NUMBER
 BLOCKS                                             NUMBER
 RELATIVE_FNO                                       NUMBER

SQL> select count(*) from lisboa;
select count(*) from lisboa
*
ERROR at line 1:
ORA-64307: hybrid columnar compression is only supported in tablespaces
residing on Exadata storage

SQL> alter table lisboa move tablespace tbs_with_hcc;
alter table lisboa move tablespace tbs_with_hcc
            *
ERROR at line 1:
ORA-64307: hybrid columnar compression is only supported in tablespaces
residing on Exadata storage

SQL> insert into lisboa select * from dba_extents where rownum < 2;

1 row created.

SQL> commit;

SQL> update lisboa
  2  set owner = owner;
update lisboa
       *
ERROR at line 1:
ORA-64307: hybrid columnar compression is only supported in tablespaces
residing on Exadata storage

SQL> delete lisboa where owner = 'KUS';
delete lisboa where owner = 'KUS'
       *
ERROR at line 1:
ORA-64307: hybrid columnar compression is only supported in tablespaces
residing on Exadata storage

------------------- BUT !!!!!!!!!!!!! ----------------------
If you alter table move 'it with NOCOMPRESS things will be different:

SQL> alter table lisboa move nocompress;

Table altered.

SQL> select count(*) from lisboa;

  COUNT(*)
----------
     28675

-- Now I can even do all the DML I can:

SQL> r
  1* update lisboa set owner = 'USK' where owner = 'SYSMAN'

1951 rows updated.

SQL> commit;

Commit complete.

SQL> delete lisboa where owner = 'USK';

1951 rows deleted.

SQL> commit;

Commit complete.

-- ****************************************************************
SQL> select sum(bytes)/1024/1024 from dba_extents where segment_name = 'LISBOA';
SUM(BYTES)/1024/1024
--------------------
                   3

SQL> alter table lisboa move compress for oltp;

Table altered.

SQL>  select sum(bytes)/1024/1024 from dba_extents where segment_name = 'LISBOA';

SUM(BYTES)/1024/1024
--------------------
                .625

-- **************************************************************

Feel free to bash or inquire.

LMC

Myth breaker: Patching Exadata

Exadata Storage Software has become a powerfull extension of the Oracle Database, not only leveraging performance but also offloading tasks from the database server, off to the storage server.

The span of this relationship is more visible when you have to patch the cells because of something happening at the database level. And since the Database Machine (DBM) works in grid, you’ll have three layers to patch: exadata cells, grid infrastructure and database. Let’s call them cell, grid and db.

To help you out in this task you have to make a favourite out of MOS Note 888828.1 that tells you exactly what patch Bundle you have to apply in each layer. The good thing is that cells work with a large set of versions at the grid and db level, but these last two have a more dependable patch tracking. Again, reading the referred note you ‘ll find what those combinations are.

So you have to be aware of the several tracks:

Cell Track

Exadata V1 – Has a track on it’s own, although you can install V2 software on it.

Exadata V2/X2-2 -  These two share the same track. The oldest one beeing 11.2.1.x.x and the latest one beeing 11.2.2.x.x. When V2 was released it came out with 11.2.1.2.1, and when X2-2 was release it came out with 11.2.1.3.1.

Grid Track

This one is exclusive of the V2/X2-2 DBMs simply because they came with 11gR2 which created this aggregated layer of software that includes clusterware and ASM software, called Grid Infrastructure. In Exadata V2 onwards development decided that this layer would have a Bundle Patch (BP) on it’s own, and so you’ll have a special BP for Grid. This BP might need that you’ll have a minimum BP at the Database level, hence you have to check out note 888828.1 to see have is the latest combination, and the older ones, where one might be the fit for you.

Database Track

This is the normal db patching, with the advantages of having a Bundle Patch that agregates: one-offs, PSUs and CPUs. All specific to the DBM platform. This is*not* a Patchset! Patchsets are a superset of BPs. So:

Exadata V2/X2-2 has two tracks:

11.2.0.1 (No Patchset) + Bundle Patch (the latest one is BP8)

11.2.0.2 (Patchset 1) + Bundle Patch (the latest one is BP2)

Putting it all together

Storage Nodes: here you can have the Exadata Cell Patch version you want. So there is no need to worry, just try to keep it to the latest since you can do it online.

At the database nodes: Patchset + Grid + Database Tracks have bundle combinations. This is “business-as-usual” patching that can be done in rolling fashion to avoid downtime. Some combinations require a specific version at the cell level so that’s why you should worry about having cells patched to the latest version.

Example

ACME,co bought a Database Machine that came with 11.2.1.2.1 Exadata version at the cells and 11.2.0.1 at the database level without any bundle patching.  Throughout 12 months this company only applied one-off patches for specific bugs but have never applied any Bundle Patch at the grid or db level, neither at the cell level.

This company chooses to have downtime and patch the whole infrastructure, so here’s the plan:

1st – Patch cells from a database node (so you can do them in parallel) to latest cell version. In this case going from 11.2.1.2.1 to 11.2.2.2.0 directly. Although it seems a great leap, it is possible.

2nd – From 11.2.2.2.x onwards the cell patches have a step that needs to be run at the db nodes

3rd – Patch db level to BP6

4th – Patch grid level to BP4

5th – Patch both db+grid to 11.2.0.2 (Patchset 1 – PS1)

6th – Patch db level to PS1-BP2

It seems complex but it works, it patches firmware, OS level, storage level, database level, EVERYTHING is patched in a certified way whereas if you had to do it with several teams and several manufacturers you would take months to plan, huge risk margin, and an enormous manpower effort to cary this on non-working hours. More expensive.

I hate to sound like a broken record, but Exadata is truely a game changer not only at the business and tech level, but also at the operational level.

LMC

How Hash Partition Algorithm Works

If you’re one of those people that wait for the movie in order to comment that book you were too lazy to read, this post is for you. You’ll either see the following clip or keep reading. Enjoy.

How Hash Partition Algorithm Works

Golden Rules about hash partitioning a table
1. Number of partitions = degree of parallelism of the table
2. Choose columns with a good data distribution. Status=>bad; SomeRandomCode=>Good
3. Make it a power of 2 (2,4,8,16,32,… etc)

I shall exemplify:
• a) we’ll create a table with 3 columns with loads of records hash partitioned it by each column
∘ col1 : very skewed set of data
∘ col2 : very even data distribution
∘ col3 : very small cardinality
• b) load the data
• c) measure how rows are scattered along the partitions and see what will have the best result

a) — We’ll use only number for the sake of simplicity
create table hashtab (
col1 number,
col2 number,
col3 number)
partition by hash (col1) partitions 4;

– b) some pl/sql science here ;-)
begin
for i in 1..10000 loop
insert into hashtab values
(dbms_random.random,
mod(i,1000),
mod(i,4) — there is only 4 possible values here
);
end loop;
end;
/

select col1, count(*) from hashtab group by col1 having count(*) > 1;
select col2, count(*) from hashtab group by col2;
select col3, count(*) from hashtab group by col3;

– c) Let’s collect stats

exec dbms_stats.gather_table_stats(‘LMC’,'HASHTAB’);

select partition_name, num_rows from user_tab_partitions where table_name=’HASHTAB’;

Col1
PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
SYS_P41                              2462
SYS_P42                              2524
SYS_P43                              2472
SYS_P44                              2542
Col2
PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
SYS_P45                              2330
SYS_P46                              2450
SYS_P47                              2610
SYS_P48                              2610
Col3
PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
SYS_P49                                 0
SYS_P50                              3333
SYS_P51                              3333
SYS_P52                              3334

If you’re wondering if that 0 it’s only because there we had 3 different values on 4 hash partitions, you’re wrong, because here are the results with 4 different values:

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
SYS_P53                                 0
SYS_P54                              2500
SYS_P55                              2500
SYS_P56                              5000

Conclusion: The more scattered and random the best this hashing algorithm performs!

So no more hash partition based on country codes and stuff like that.

Hope it was useful.

LMC

Kudos for Maria Colgan and the group she works inside Oracle for inspiring the likes of me: http://blogs.oracle.com/optimizer



Downloading 11.2.0.2 from MoS? Don’t get caught!

Simple post, just to warn the souls that haven’t downloaded 11.2.0.2 patchset yet:

1 – You don’t need the base version 11.2.0.1 anymore, this patchset is selfsufficient

2 – There are 7 files in a total 4.7 Gigs but you don’t have to download them all

So which zip files do you need? Here’s the list :

zip1of7 + zip2of7 => Database (if you’re creating a non-ASM, non-RAC environment this is all you need)

zip3of7 => Grid Infrastructure (only needed for either or both ASM and RAC installations)

zip4of7 => Client

zip5of7 => Gateways

zip6of7 => Examples

zip7of7 => deinstall (uninstalling Oracle software it’s thru a different set of media since 11gR2)

So if you need basic non-rac non-asm upgrade or groud up installation, just download the first 2 zips.

LMC.

Exadata Storage Foundation

Exadata is software. It’s the secret sauce in the Database Machine solution, but actually aside from all the fancy stuff that it does (HCC, Storage Indexes, or Smart Scan), it has to present the physical disks back to the ASM layer, so the party can begin upstairs at the database level.

So how it all happens?

First Exadata picks each physical disk and turns it into an entity called Cell Disk. This entity is a “kind-of” formatting. After you have created your Cell Disk you can chop it into smaller entities called Grid Disks. The Grid Disks themselves are still raw, unformatted slots of space in a bigger physical disk, but a Grid Disk always maps into one and only one physical disk.

The Grid Disks are always created with an ASM intention behind. Just because these areas in each and every physical or cell disk, will map directly into a ASM Disk, and in order to enable redundancy you have to make sure you are using all physical disks from all cells. It doesn’t make any sense create an ASM Disk Group based only on Grid Disks from a single cell. It’s possible to do so, but it’s senseless.

In a quarter-rack, the default configuration from Oracle very own installation services creates by default three ASM Disk Groups: DATA, RECO and SYSTEM. Each of these ASM Disk Groups, have all it’s Grid Disks created across all cell disks from all cells. The last time I’ve explained this to a customer I did two drawing: one having all disks divided into smaller units called grid disks (with the respective default sizes in gigs on the side) and other with the physical disk numbering on a given cell:


Hope to have shed some light over this, and to the storage guys: please stop crying about not having RAID and low-level copies. You don’t need them for database!!

LMC.

Exalogic is out there!

The all brand new Middleware Machine, also called Exalogic is a box filled with 30 dual-sixcore servers, with a total of 360 cores of horse power for Java aps. One file for patching the whole hardware and software, 40TB SAS server with a 960GB SSD storage and 2.8TB RAM with a 4TB read cache and 72GB write cache. Wow!

The new tier here is Exalogic  Elastic Cloud software a new layer of software that stands between JRockit and the Operating System or Oracle VM. JRockit itself comes combined with Hotspot best of both worlds in Java Virtual Machine, all provisioned by Exalogic software and made available as one resource by Coherence.

It’s a very exiting new era the one that unveils machines capable of running clouds of of the box, because Exalogic was built to talk natively to Exadata and take the power out of that relation.

Much more to come!

Another Exadata bites the Portuguese dust!

The first half-rack Database Machine has just arrived in Portugal, it is setted up and running. Not yet on production though, but it took a record breaking day and a half to hand the system over to the customer cluster ready and with a database ready to trash!

Let’s do the math:

1 – Operating System installation on all four nodes

2 – Setting it up network-wise (including the IB switches)

3 – Setting it up oracle-wise

4 – Disks config on each of the 7 Exadata Storage Servers

5 – Grid Infrastructure installation on all 4 nodes

6 – RDBMS Installation on all 4 nodes

7 – ASM config upon the grid disks

8 – Smart Flash Cache config

9 – RAC enabled database creation on the ASM diskgroups

10 – Testing and Documenting

Now tell me: for how long do you think some consultancy firm would rate this job? A week? No way!! Put a month and only if all the stars and planets aligned ;-)

We did it in a day and a half for a huge portuguese retail costumer when delivering a half-rack Database Machine, with 4 database servers and 7 exadata storage servers inside the box. This is truly revolutionary. Exadata it’s not just for Extreme Performance, it’s a matter of survival for most businesses that shutdown because of some small glitch in the complex array of pieces scatered around their IT infrastructure.

This culture of trying to make it more simple by bundling it it’s not new in some other areas, but in the IT Infrastructure alleys it is! Specially in Database!

How many hours did the DBAs worked all-nighters just to find out the problem was somewhere else in the pile of complexity? By saving precious time to these fellows we are enabling them to fit more work in the same day, bringing together the two sides that generally work apart: business and IT.

But is this all just a sea of roses? No, just because it joins the two most unpredictable machines in the world: computers and people.

DBAs need to know the true meaning of RAC on a many-as-one perspective, and sys-admins need to face the fact that their realm is not beeing invaded but that there is a new kid on the storage block: Exadata technology. Apps developers need to know how to code scalable apps, etc.

The journey has just began, but it’s one that a small country like Portugal has proven to be taken it seriously, because we already have 5 DBMs around here and plan to plant more in the months to come.

LMC.

New Physical Database Limits of 11.2

It has been a while since I’ve gone through the documentation part of Oracle Database Server that talks about limits! I think the last one was when the rowid format changed from Oracle7 to Oracle8.

This post is not just about publishing those in another website, it’s commenting them. If you want to read the original text click here, if you want to read my remarks on the limitations, keep scrolling down, they are in bold (but don’t expect rocket science though!)

Physical Database Limits

Item Type of Limit Limit Value
Database Block Size Minimum 2048 bytes; must be a multiple of operating system physical block size
Database Block Size Maximum Operating system dependent; never more than 32 KB
Database Blocks Minimum in initial extent of a segment 2 blocks (just because one is for data other is for segment header)
Database Blocks Maximum per datafile Platform dependent; typically 222 – 1 blocks
Controlfiles Number of control files 1 minimum; 2 or more (on separate devices) strongly recommended
Controlfiles Size of a control file Dependent on operating system and database creation options; maximum of 20,000 x (database block size)
Database files Maximum per tablespace Operating system dependent; usually 1022 (Here the bigfile tablespaces actually can index much more space with just one datafile. It's because the rowid lacks the datafile encoding hence saving more space to index more blocks)
Database files Maximum per database 65533 – May be less on some operating systems

Limited also by size of database blocks and by the DB_FILES initialization parameter for a particular instance

Database extents Maximum per dictionary managed tablespace 4 GB * physical block size (with K/M modifier); 4 GB (without K/M modifier)
Database extents Maximum per locally managed (uniform) tablespace 2 GB * physical block size (with K/M modifier); 2 GB (without K/M modifier)
Database file size Maximum Operating system dependent. Limited by maximum operating system file size; typically 222 or 4 MB blocks
MAXEXTENTS Default value Derived from tablespace default storage or DB_BLOCK_SIZE initialization parameter
MAXEXTENTS Maximum Unlimited (you should put a brake here not letting the segment have more than 100,000 extents on huge objects; 10,000 on big ones and less than 1,000 on average sized ones: typically huge is terabyte sized, big is hundreds of gigs and average sized less than 10 gigs)
Redo Log Files Maximum number of logfiles Limited by value of MAXLOGFILES parameter in the CREATE DATABASE statementControl file can be resized to allow more entries; ultimately an operating system limit (always change the default setting specially in a clustered environment)
Redo Log Files Maximum number of logfiles per group Unlimited
Redo Log File Size Minimum size 4 MB (always start with at least 100MB on OLTP and 500MB on DWH shops)
Redo Log File Size Maximum Size Operating system limit; typically 2 GB
Tablespaces Maximum number per database 64 K Number of tablespaces cannot exceed the number of database files because each tablespace must include at least one file
Bigfile Tablespaces Number of blocks A bigfile tablespace contains only one datafile or tempfile, which can contain up to approximately 4 billion ( 232 ) blocks. The maximum size of the single datafile or tempfile is 128 terabytes (TB) for a tablespace with 32 K blocks and 32 TB for a tablespace with 8 K blocks.
Smallfile (traditional) Tablespaces Number of blocks A smallfile tablespace is a traditional Oracle tablespace, which can contain 1022 datafiles or tempfiles, each of which can contain up to approximately 4 million (222) blocks.
External Tables file Maximum size Dependent on the operating system.An external table can be composed of multiple files.

Slice and Dice bigfile tablespaces with RMAN

Back in 2003 when Oracle 10g was released and bigfile tablespaces were first introduced, not many databases were in the terabyte or petabyte weight-class. The very feature of having your data warehouse downsized in the number of files through the use of bigfile tablespaces, was not considered very useful unless you would use Automatic Storage Management (ASM).

But those who believed that having a tablespace terabyte sized, in a single datafile, crashed their belief in the moment of having to do a backup. Why? Because in Oracle 10g you couldn’t do the backup in parallel of a single datafile. Better said: you couldn’t use the RMAN multi-channelling feature because a single datafile had to go through the same pipe in the whole backup process.

It’s typical in the technology arena to create new features that create new needs, that will be solved and made easier, through the introduction of, yet, new features. So in that same line of thought, Oracle 11g introduced the capability of doing a RMAN multi-channelling, hence parallel, backup of a single datafile, by slicing and dicing it into smaller chunks.

How can we do that? It’s simple: add the following clause to the RMAN command “BACKUP”:

SECTION SIZE <size in bytes>

To give you some examples for the see-it-to-believe-it folks why not give some URLs from the hard-working bloggers (not lazy like me!):

http://jhdba.wordpress.com/2010/04/06/reducing-rman-backup-time-for-unevenly-sized-tablespaces/

Oracle Base Web-Site

Interesting article on Oracle Magazine

If you want to Google it, try: “Multi-section backups”. You’ll speed up the time it takes to backup your huge datafiles in your even bigger database.

Hope it helped some one out there.

LMC.

Enterprise Manager 11g Installation

People who care to surf around the posts on this blog have found out that step-by-steps, howtos, and installation guides are not the point because you have a bunch of those in Oracle official docs and tons of other blogs.
Having said that, last (and first!) time I’ve installed the all new Enterprise Manager 11g, I’ve captured the screen steps and I thought: “Why don’t fill the internet with mode stuff about stuff for people interested in stuff?”. So here it is.

The main changes I’ve found in the installation process were:

1. Look and feel like Oracle Database 11gR2
2. The installation process doesn’t make you create another helplessly old database instance: you provide a connection string for a previous one (10.2.0.4 or above). I’ve used my general purpose 11gR2 database and it went fine.
3. This one is a bit of a pain: it only works with Weblogic Server 10.3.2. You don’t have to create any domains, just install it and EM 11g will recognize your WLS 10.3.2


Best Practices for Statistics Gathering on Oracle 11g

Some people look at Statistics Gathering as taking vitamins: nice to have. And it’s wrong.

You shouldn’t base your tuning strategy on dynamic sampling alone, or luck. Some people freak out because once they generated stats for some system and it all went south. That’s the same as saying that because cars crash and people get killed we shouldn’t drive at all.

You have to use stats. Period.

So my small contribution to make your life easier is to give 5 tips if you are using Oracle 11g:

1) Run all queries against empty tables to populate column usage (histograms will be gathered based on this column usage).

2) Run stats after data has been loaded but before indexes are created (the creation of the index gathers stats on them automatically).

3) Find (business) correlations between columns and inform the optimizer about these correlations by the means of creating Column Groups. Also use this technique instead of function-based indexes.

4) Always gather stats with AUTO_SAMPLE_SIZE so the new stats gathering algorithm of Oracle 11g can kick in.

5) If you use partition tables you *must* use synopsis so the global stats can be derived much much much faster in an incremental fashion.

I guess 1, 2 and 4 don’t offer any doubts.

Regarding 3 and 5 here’s the usual techie yada yada:

3) Column Groups are created by the means of Extended Optimizer Statistics a new procedure of the DBMS_STATS package: CREATE_EXTENDED_STATS.

See an example in this very good article.

5) Synopsis are created when you enable incremental stats at the table or system level:

table level: SQL> exec DBMS_STATS.SET_TABLE_PREFS(‘SCOTT’,'TABLE_XYZ’,'INCREMENTAL”TRUE’);

system level: SQL> exec DBMS_STATS.SET_GLOBAL_PREFS(‘INCREMENTAL”TRUE’);

This will avoid rescaning partitions that haven’t changed since the last stats generation, making it faster.

So remember: ASICS it’s a matter of survival, not an option.

LMC.

Extended Optimizer Statistics

Exadata Storage Server Hybrid Columnar Compression (EHCC)

Since I first wrote the article I’m about to publish, there were several developments until last 27th April when I presented it in public at the Extreme Performance Day in Lisbon, and internally at the “Exadata Top Guns” event in Berlin, 11th April.

So I decided I could make it public. It’s the story of my first Exadata PoC trying to prove out the EHCC compression ratio. Hope you all like it and can learn and comment it.

————

It was the second day of my second week at Oracle as an employee. I’ve been cruising the Oracle hallways for two decades, but always wearing other hats. I’ve even represented Oracle when as a freelance consultant I was an Oracle University trainer. But this is it. This is the real deal. I’ve made it into the lion’s den!
So I was trying to calmly settle myself into this huge organization that is Oracle Corp, when I got the call: “You have to go and do a Proof-of-Concept [PoC] at this customer about how Exadata compresses data”. No panic, only thrill. The customer, TIMw.e., a mobile content producer and seller, based in Portugal, but with offices in four continents, was one of my first clients back in 2006 when I was an independent consultant. So no news here. Doing a PoC, fine. No news either.
But the hard one here was how would I return at a site where I had accountability and play that asset into making them at ease with Exadata Hybrid Column Compressing (EHCC) enough so they could buy the DB Machine. Technical stuff was not a problem. I’ve already had a CentOS 5 Virtual Machine setted up with Oracle 11gR2, and I just had to apply here a patch in order to be able to emulate the algorithm of EHCC. I’ve applied patch n.o 8896202 on my box and ran a coupple of scripts in order to get the DBMS_COMPRESSION package created. This package would emulate EHCC algorithm and tells you the magic number: compression ratio. By using the following method (PL/SQL procedure):
- DBMS_COMPRESSION.GET_COMPRESSION_RATIO
You would feed it with parameters like username, table, tablespace and desired method of compression and the procedure would sample a million rows out of your table and give you back the magic number. The four methods of compression are divided into 2 categories and inside these two categories, a high and a low method. These are the combinations:
1.   For Query Low Compression
2.   For Query High Compression
3.   For Archive Low Compression
4.   For Archive High Compression
The “For Query” methods lets you run OLTP without any compromise, and the “For Archive” methods can achieve astonishing ratios, but applies at historical data. I’ve decided to have a number in my hand prior to head off in to the customer site. And so I’ve created a table with highly random data using the following SQL:

SQL>
CREATE TABLE MYTABLE
TABLESPACE SUCHANDSUCH
AS SELECT dbms_random.normal ColNum1,
dbms_random.random ColNum2,
dbms_random.value ColNum3,
dbms_random.string('a',40) ColStr1
FROM dual
CONNECT BY level <= 1000000;

And then I ran the DBMS_COMPRESSION.GET_COMPRESSION_RATIO with method n.o 2 (“For Query High Compression”) and I’ve got a ratio of 1.5. Not exactly what I was expecting, but at least with this number I would better manage expectations and not compromise myself or the corporation with any ratios.
And so I went to the customer’s office and sat down with the CTO and their DBA. The plan would be to run the algorithm into a single partition of a table that receives each day about 20 million records.
We began late, and so only by 07:20pm did their DBA handed over me the data so I could load into by 11gR2 box. Whilst I was importing the whole data, I was curious to see what kind of ratio could I achive. And so by the time the import process had imported about 2.5 million records, I launched the HCC simulation onto this table and I got the following output:

SQL> set serverout on
SQL> exec get_ehcc_cr(2);
Compression Advisor self-check validation successful. select count(*) on both
Uncompressed and EHCC Compressed format = 1000001 rows
COMPRESSED_TYPE = "Compress For Query High"
COMPRESSED_BLOCKS = 4999
UNCOMPRESSED_BLOCKS = 40299
COMPRESSED_ROWS = 200
UNCOMPRESSED_ROWS = 54
COMPRESSION_RATIO = 8

PL/SQL procedure successfully completed.

I could only see the last line saying the compression ratio was 8. Eight!

I freaked out and immediatly punched a huge puncher the customer has in the middle of the open space. The CTO jumped out of his chair and came to see the (partial yet conclusive) result. He started to talk some mambo-jambo speach about Oracle writting a statement of commitement where we would compromise with a compression ratio of at least 5. I laughed and saw his excitement as a positive thing. Of course Oracle wouldn’t write such a document, but the customer was thrilled and getting that much out of a PoC was enough for me. I’ve ended up loading the whole 20 million records, arriving home
after midnight and crunching the numbers again, this time with the whole 20 mil. I got a solid 8.2 and went to bed happy for my first PoC at Oracle Corp.

———————

It turned out this customer bought the box, and are becoming a worldwide Oracle Exadata reference.

Why is Exadata so fast?

At the moment I’m writing this post from Berlin, where Oracle is hosting a huge Exadata event. The message that passed out there to the market is that DB Machine is a very fast machine, but the way things should be put are: that it’s a unique architecture, since there is nothing new on the hardware side.

So why it is so fast?

I’ll try to give an honest answer, respecting the disclosure agreement as an Oracle employee, and unveiling something that makes this post worthwhile reading.

There are 3 reasons:

1) The team that design it knows “a heck of a lot” about hardware, and so they tailored it to be the ideal architecture to run an Oracle database

2) 11gR2 is the new newness, and it helps speed up a lot of processes

3) I/O is no longer a bottleneck

You could find a lot of tech doc around oracle.com, but the bottomline is that the vision to even build a V2 and place it as an OLTP and DW (general purpose) machine is just pure genious.

What freaks me the most about Exadata? Knowing so much about the tecnhnicality of the product and not beeing able to share it ;-) Sorry.

LMC.

Disable Firewall in Oracle VM Templates saves you a lot of trouble

If you’re addicted to use Oracle VM templates like myself and sometimes have networking problems there is one “must-do” task that will save you a lot of trouble: disabling the Linux firewall.

I experienced sshd, EM, scp, you-name-it-kind-of-networking-tool problems until I disabled the firewall inside each and every Oracle Virtual Machine (on a lab OVS server pool, not production).

And how to you acomplish that simple task?

Easy-peasy lemon-squeazy:

1. log on as root

2. # service iptables stop

3. # chkconfig iptables off

If you want to save the config for later use, you can do this command before stoping the service:

# service iptables save

If you’re using IPv6 then the service is called “ip6tables” and the commands are the same.

Hope it helped.

ACFS and Oracle VM: A silent revolution in storage management

(11gR2 only)

Slice and dice physical machines into small, much manageable, much performant pieces, is only one side behind virtualization. The other is: storage virtualization. In Oracle VM you can either add a virtual disk (dedicated to that VM) or a shared disk (that you can share across VMs).

After the virtual disks are visible to the VM you will treat it like a LUN and use the block device, build a raw or even create a partition and format it. This process takes forever to complete in big files (above 100 GB or so). If you want to create 10 mount points in a virtual machine, you can create 10 virtual disks, but they are not extensible. If something goes wrong, you have to unmount, delete the virtual disk and but it back on again, this time bigger. It’s a huge waste of space.

So what is the alternative?

Create the same 10 virtual disks, use them all in an ASM instance and build a disk group (with or without redundancy).

Imagine that these 10 virtual disks have 10 gigs each. With the new 11gR2 ASM capabilities you can use these 100 gigs in two ways: store database or clusterware files, or even create dynamic volumes that you can mount (with the “-t acfs” switch).

3 uses for the same 100 gigs:

- Database files

- Clusterware Files

- Dynamic Volumes

I can use 80 gigs for database files, and 10 gigs to mount for Oracle binaries. If by some lack of chance my 10 gigs are not enough, I can expand them, instead of having to painfully wait for another virtual disk that takes forever to be active on the VM.

Bottomline?

If you have a 700GB space for your VM on Oracle Virtual Server (OVS)  give it all to ASM 11gR2 and he will help you out on all kinds of needs you have.

LMC.

Exadata DB Machine Performance tests, specs and snaps

Here in Portugal, the European country with more Sun Oracle Exadata v2 DB Machines per square-mile, there’s an Oracle customer that  loaded it’s data inside the box and without doing any optimizations what-so-ever, reached the amazing value of 25 times faster… Should I repeat?

I take this oportunity to remind you all that Oracle Exadata v2 comes in 4 flavours:

Sun Oracle Database Machine Full Rack

8 db servers, 3 QDR InfiniBand switches, 14 Exadata Storage Servers with 14×12 X 600 GB SAS disks or 14×12 x 2 TB SATA disks and  5.3 TB of Exadata Smart Flash Cache.
Sun Oracle Database Machine Half Rack

4 db servers, 2 QDR InfiniBand switches, 7 Exadata Storage Servers with 7×12 X 600 GB SAS disks or 7×12 x 2 TB SATA disks and  2.6 TB of Exadata Smart Flash Cache.
Sun Oracle Database Machine Quarter Rack

2 db servers, 2 QDR InfiniBand switches, 3 Exadata Storage Servers with 3×12 X 600 GB SAS disks or 3×12 x 2 TB SATA disks and  1.1 TB of Exadata Smart Flash Cache.

Sun Oracle Database Machine Basic System

well… who cares about basic system?! ;-)

How does an Exadata Storage Server looks like? Here’s a couple of snaps:

PS: I haven’t disclosed the customer name but I’ll invited them both and the Oracle Consultant to share publicly.

First Portuguese Oracle Nerd to Unpack an Oracle-Sun Exadata v2

Last 23rd December two Exadata v2 DB Machines arrived at our little Portugal! I’ve unpacked both, and at the first one, fellows at the data center were nice enough to put up with 3 “crazy” dudes from Oracle that were taking photo shots from every angle: I was one of these 3 ;-)

These are the very first two DB Machines in the whole Iberia market (Portugal+Spain)!!!

I proudly became speacialized in these machines that we hope can take Database Computing to the next level. These babies can do things that one year ago looked sci-fi.

I haven’t been able to post more tech stuff here because life at Oracle has been (happily) very intense, but stay tunned for Exadata v2 because it will change the way people look at database systems.

LMC.

PS: Photo Credits to J. Oliveira :-)

The Difference between Online and Offline compression

Imagine you have two tables each one with 250,000 records and they just had been created with CTAS. One was created using regular compression and the other with Advanced Compression. Let’s check out many 8k data blocks each one has:

 

TABLE_NAME                     COMPRESS COMPRESS_FOR     BLOCKS
------------------------------ -------- ------------ ----------
TAB1                           ENABLED  BASIC              3584
TAB2                           ENABLED  OLTP               4096

 

TAB1 was compressed using the BASIC compression method and has 3585 blocks, whilst TAB2 has been compressed using the Advanced Compression algorithm (OLTP) and strangely has more blocks, 4096 in total.

 

What’s going on here?

 

Is Advanced Compression worst than regular compression? If so why Oracle charges extra for that feature?

 

I think the answer is in the future! Let’s insert another 250,000 in each table and see what will be the size in each one. I have a table called NPTAB with millions of records in which those two tables were based upon. So I’ll just go to that table and fetch another 250k records:

 

SQL> insert into rep.tab1 select * from rep.nptab where rownum < 250001;

250000 rows created.

SQL> c/tab1/tab2
  1* insert into rep.tab2 select * from rep.nptab where rownum < 250001
SQL> r
  1* insert into rep.tab2 select * from rep.nptab where rownum < 250001

250000 rows created.

SQL> commit;

Commit complete.

 

And now let’s check out how many blocks are there in each table:

 

TABLE_NAME                     COMPRESS COMPRESS_FOR     BLOCKS
------------------------------ -------- ------------ ----------
TAB1                           ENABLED  BASIC             16384
TAB2                           ENABLED  OLTP               9216

 

What the hell? The Advanced Compressed table is now almost 7 thousand blocks smaller than the TAB1 compressed with the basic method. What happened?

Easy! Advanced Compression is not just good compression, but it’s online compression. Data is compressed once and read many times. Data is compressed as it comes, it’s not a two time process, it’s all done at the same time. So when data is inserted compressed it already takes less space, whilst the other table (TAB1) had it’s data compressed when it was created, but when data came, everything was de-compressed.

 

Queries take less time because objects are smaller. Take a look of how much time it takes to count all the records:

 

SQL> select count(*) from rep.tab1;

  COUNT(*)
----------
    499999

Elapsed: 00:00:16.03

SQL> select count(*) from rep.tab2;

  COUNT(*)
----------
    499999

Elapsed: 00:00:07.45
 

The Advanced Compressed table takes 9 seconds less to scan!

Less space, less time to execute queries, wow! Is it all good? I’m inclined to say yes, but data insertion in Advanced Compressed table have a slight performance degradation. How much? Let’s truncate both tables and see how much time it will take to put all the 499999 records back in:

 

SQL> insert into rep.tab1 select * from rep.nptab where rownum < 500000;

499999 rows created.

Elapsed: 00:00:41.37

SQL> insert into rep.tab2 select * from rep.nptab where rownum < 500000;

499999 rows created.

Elapsed: 00:01:25.87

 

There is no perfect world, and so data insertion was slower 200%.

 

You decide what’s best for you: slower data insertion or less storage and far faster queries?

 

Take your pick.

 

LMC.

How to transport a tablespace from 9i to 11g

It has been a while since I’ve written here. Main reason: Oracle Corp aquired me! Anyway, today on the menu we have a very interesting thing:

“How to transport a tablespace from 9i to 11g”.

It’s easy. No science here. You just have to follow the basics steps of the regular from 9i to 9i TTS procedure. I’ll reproduce the drill here of my testings with 9.2.0.1 to 11.1.0.6.

SQL> create tablespace lmc datafile ‘c:\oracle\oradata\orc9\lmc_01.dbf’ size 10m;

Tablespace criado.

SQL> create table mytable tablespace lmc as

2  select * from dba_objects where rownum < 101;

Tabela criada.

SQL> EXEC DBMS_TTS.TRANSPORT_SET_CHECK(ts_list => ‘LMC’, incl_constraints => TRUE);

Procedimento de PL/SQL concluφdo com Ωxito.

SQL> SELECT * FROM transport_set_violations;

VIOLATIONS

——————————————————————————–

Sys owned object  MYTABLE in tablespace LMC not allowed in pluggable set

– ooooopsie dasy

SQL> drop table mytable;

Tabela suprimida.

SQL> conn system/oracle

Ligado.

SQL> create table mytable tablespace lmc as

2  select * from dba_objects where rownum < 101;

Tabela criada.

SQL> conn sys as sysdba

Introduzir senha:

Ligado.

SQL> EXEC DBMS_TTS.TRANSPORT_SET_CHECK(ts_list => ‘LMC’, incl_constraints => TRUE);

Procedimento de PL/SQL concluφdo com Ωxito.

SQL> SELECT * FROM transport_set_violations;

nπo foram seleccionadas linhas

– Cool, we can go on

exp USERID=’sys/oracle AS SYSDBA’         TRANSPORT_TABLESPACE=y         TABLESPACES=LMC    FILE=C:\oracle\oradata\orc9\lmc_tt.dmp

SQL> alter tablespace lmc read only;

Tablespace alterado.

——- On Database 11G ——

Copy the file “lmc_01.dbf” to: “c:\oracle\oradata\orco”

imp USERID=’sys/oracle AS SYSDBA’   TRANSPORT_TABLESPACE=y     DATAFILES=’C:\oracle\oradata\orco\LMC_01.DBF’   TABLESPACES=lmc FILE=C:\oracle\oradata\orc9\lmc_tt.dmp

SQL> alter tablespace lmc read write;

Tablespace alterado.

SQL> conn system/oracle

Ligado.

SQL> desc mytable

Nome                                      Nulo?    Tipo
----------------------------------------- -------- -------------
OWNER                                              VARCHAR2(30)
OBJECT_NAME                                        VARCHAR2(128)
SUBOBJECT_NAME                                     VARCHAR2(30)
OBJECT_ID                                          NUMBER
DATA_OBJECT_ID                                     NUMBER
OBJECT_TYPE                                        VARCHAR2(18)
CREATED                                            DATE
LAST_DDL_TIME                                      DATE
TIMESTAMP                                          VARCHAR2(19)
STATUS                                             VARCHAR2(7)
TEMPORARY                                          VARCHAR2(1)
GENERATED                                          VARCHAR2(1)
SECONDARY                                          VARCHAR2(1)

SQL> select count(*) from mytable;

COUNT(*)

———-

100

Bingo. Has if by magic the table arrives at the 11g database untouched!

Hope you liked it.

LMC.