Since the earliest days of the Oracle Database architecture, one of the most common bottlenecks in high-load transactional systems has been the speed of the disks where the redo log files are stored. One symptom of this problem is an upsetting wait event called “Log File Sync”. Basically, hiccups in the redo log storage, no matter how good the average latency is, create a cascading effect on the performance of the whole logging subsystem, and that hurts the responsiveness of the whole system.
An apparent solution would be to put the redo log files on SSDs or some other low-latency storage. But you will eventually hit the same wall, with the same cascading effects, once the storage can't cope with the volume of transactions it's supposed to handle. Another apparent solution that some might think of would be to create only one member per redo log group. But that is not a real solution either: it adds risk, and unnecessarily so.
The solution that the Oracle Database development team came up with in 11.2.0.3 (also in 11.2.0.2 BP11) is so powerful and yet so simple!
Now, when the log writer (LGWR) picks up a bunch of log entries from the log buffer to write them to the redo logs, it simply sends a write request to both disk and flash. Whichever one acknowledges the write first is taken as the confirmation of the write. By now you might be thinking: “Well, flash is always going to be the first one!”. You’re almost right. But if flash is always faster, then what happens to the disk files? They’re written anyway, but if disk fails or lags behind, the log information is in flash. And if flash fails, the log information will be in another cell. So the hiccups are eliminated, and the only price is that highly transactional systems put extra wear on the flash cards, which just won’t wear out before any other component does.
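The "first acknowledgement wins" idea can be sketched in a few lines of Python. This is only a toy simulation of the behaviour described above, not how the cell software is actually implemented: the function names `write_to_device` and `first_ack`, the latencies and the failure flag are all made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def write_to_device(name, latency_s, fails, payload):
    """Hypothetical device write: sleep to simulate latency, then ack or fail."""
    time.sleep(latency_s)
    if fails:
        raise IOError(f"{name} write failed")
    return name  # the "acknowledgement" is just the device name


def first_ack(payload, devices):
    """Send the same redo write to every device and return the name of the
    first one that acknowledges it successfully.

    devices maps a device name to a (latency_seconds, fails) tuple.
    """
    pool = ThreadPoolExecutor(max_workers=len(devices))
    futures = [
        pool.submit(write_to_device, name, latency_s, fails, payload)
        for name, (latency_s, fails) in devices.items()
    ]
    try:
        # as_completed yields futures in the order they finish
        for future in as_completed(futures):
            if future.exception() is None:
                return future.result()
        raise IOError("no device acknowledged the write")
    finally:
        # Do not wait for the slower write: it still completes in the background
        pool.shutdown(wait=False)
```

Calling `first_ack(b"redo", {"disk": (0.2, False), "flash": (0.01, False)})` returns `"flash"`, while a failing or lagging flash write lets the disk acknowledgement through instead. The caller never waits on the slower device, which is the whole point of the mechanism.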
- Version 11.2.2.4 at the cell level
- Version 11.2.0.3 at the DB level (when it comes out) or 11.2.0.2 BP11
- Smart Flash Logging has to be enabled (it’s enabled by default, only worry if you disabled it)
- In systems that are not highly transactional, such as ODSs or DWs, you can disable this feature for all databases or for just one database (at the cellcli prompt):
  - For all databases, use the command DROP FLASHLOG [FORCE]
  - For just one, alter the default IORM plan: ALTER IORMPLAN dbplan=((name=test, flashLog=off))
- To monitor the behaviour of this mechanism, go to the cellcli console and use the following command:
  - LIST FLASHLOG [DETAIL]
  - or use the statistics whose names start with FL_, for example FL_FLASH_IO_ERRS (number of errors when writing to flash, that is, the few times that the disk saved the day).
See the picture below:
This is groundbreaking stuff: an old restriction removed by an ingenious solution.