Friday, December 6, 2013

Exadata Storage Index solving an endless update

The following simple update was running every evening, endlessly, on a production Exadata machine, Oracle version 11.2.0.3 (well, not truly endlessly, but for about 30-60 minutes, which is quite endless by my standards...).

UPDATE T1
SET EXP_DATE = SYSDATE
WHERE P_TIME IS NOT NULL
AND EXP_DATE IS NULL


The update was running on a 20M-row table, using only two NULL/NOT NULL filtering criteria, and updated around 10-20K rows.
Since it ran for about an hour using a FULL TABLE SCAN, other updates that were running simultaneously on the same table (with different criteria) were suffering from row locks on a daily basis, and this was simply not acceptable.
Analyzing this intrusive update indicated that it was spending most of its time on I/O waits, as many unneeded blocks were returned by the Exadata storage tier to the database tier, and only then could the database tier start applying the null filters (i.e. "P_TIME IS NOT NULL AND EXP_DATE IS NULL").

An index was not an option here. So what is?

Since nulls are used here, an Exadata Storage Index (part of the Exadata offloading/Smart Scan features) could be the solution, if we just let Oracle use it.

For those of you who are yet unfamiliar with Exadata storage indexes, here is a brief explanation:

Storage Indexes are a very powerful Exadata feature, when used. Don't confuse them with Oracle's traditional B-tree or bitmap indexes, as they are not stored in the database. Instead, they are a feature (in-memory structures) of the storage server software, designed to eliminate disk I/O. They identify locations where the requested records are not, instead of the other way around.
They store minimum and maximum column values for disk storage units. Because SQL predicates are passed to the storage servers when Smart Scans are performed, the storage software can check the predicates against the Storage Index metadata (minimum and maximum values) before doing the requested I/O. Any storage region that cannot possibly contain a matching row is skipped.
In addition to the minimum and maximum values, there is a flag indicating whether any of the records in a storage region contain nulls.
You can read more about Exadata Storage Indexes at the following link:
http://www.oracle.com/technetwork/issue-archive/2011/11-may/o31exadata-354069.html  
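A quick way to see whether storage indexes are doing any work at all on a system is to watch the relevant cumulative statistic. The following is just a sketch (it assumes an Exadata environment; on non-Exadata systems the value will simply stay at zero):

```sql
-- Sketch: system-wide cumulative bytes saved by storage indexes.
-- V$SYSSTAT exposes NAME and VALUE directly; the statistic is
-- cumulative since instance startup.
SELECT name, value
FROM   v$sysstat
WHERE  name = 'cell physical IO bytes saved by storage index';
```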

Anyway, before we drill into how Storage Indexes tremendously helped in this case, here is some basic info on the update: its text, execution plan, and runtime statistics over the last two weeks:

SQL> select sql_text from dba_hist_sqltext where sql_id='bxcvv372pkr7m';
 
UPDATE T1 SET EXP_DATE = SYSDATE WHERE P_TIME IS NOT NULL AND EXP_DATE IS NULL
 
SQL> select * from table ( DBMS_XPLAN.DISPLAY_AWR('bxcvv372pkr7m',3865564104) );
 
PLAN_TABLE_OUTPUT
---------------------------------------------------
SQL_ID bxcvv372pkr7m
--------------------
UPDATE T1 SET EXP_DATE = SYSDATE WHERE P_TIME
IS NOT NULL AND EXP_DATE IS NULL

 
Plan hash value: 3865564104
----------------------------------------------
| Id  | Operation                    | Name |
----------------------------------------------
|   0 | UPDATE STATEMENT             |      |
|   1 |  UPDATE                      | T1   |
|   2 |   TABLE ACCESS STORAGE FULL  | T1   |
----------------------------------------------
 
The above plan (several columns removed for easier display) shows that a full table scan is used while running the update, and that Smart Scan could, theoretically, be used here.
Selecting basic statistics on table T1 shows that it has almost 20M rows over roughly 1M blocks (8K block size, i.e. about 8 GB).

SQL> select num_rows,blocks,last_analyzed from dba_tables a where table_name='T1';
 
  NUM_ROWS     BLOCKS LAST_ANAL
---------- ---------- ---------
  19309303    1096332 28-NOV-13


And here are some statistics on its executions, taken from dba_hist_sqlstat (b.t.w., dba_hist_sqlstat is an excellent place to grab over-time statistics on SQL executions):

select  trunc(b.begin_interval_time) btime ,
      a.plan_hash_value  plan_hs,
      sum(a.executions_delta) execs,
      sum(trunc(a.elapsed_time_delta/ 1000000/decode(a.executions_delta,0,1,a.executions_delta) ,3) ) tm_sec,
      sum(trunc(a.rows_processed_delta/decode(a.executions_delta,0,1,a.executions_delta) ,2) ) nrows,
      sum(trunc( a.buffer_gets_delta/decode(a.executions_delta,0,1,a.executions_delta) ,2)) buff_avg,
      sum(a.PHYSICAL_READ_BYTES_DELTA) pysrd,
      sum(a.PHYSICAL_READ_REQUESTS_DELTA) pysrq,
      sum(a.DISK_READS_DELTA) dkrd,
      sum(a.IO_INTERCONNECT_BYTES_DELTA) intercn,
      sum(a.IO_OFFLOAD_RETURN_BYTES_DELTA) offld,
      sum(a.IO_OFFLOAD_ELIG_BYTES_DELTA) elig
from dba_hist_sqlstat a , dba_hist_snapshot b
where sql_id='bxcvv372pkr7m'
and a.snap_id=b.snap_id
and b.begin_interval_time > sysdate -14
group by trunc(begin_interval_time), plan_hash_value
order by trunc(begin_interval_time) desc;


btime        plan_hs    execs tm_sec  nrows buff_avg pysrd       pysrq  dkrd    intercn     offld elig
------------ ---------- ----- ------ ------ -------- ----------- ------ ------- ----------- ----- ----
01-DEC-2013  3865564104   1   1751   16752   4327717   32936501248   104370   4020569   32936501248   0   0
28-NOV-2013  3865564104   1   2511          0    3873633   30624784384   59219     3738377   30624784384   0   0
27-NOV-2013  3865564104   1   3042   21454   5210873   41140527104   74379   5022037   41140527104   0   0
26-NOV-2013  3865564104   1   1504   23270   3220407   25268518912   50146   3084536   25268518912   0   0
25-NOV-2013  3865564104   1   1730          0    1954352   15447539712   27075   1885686   15447539712   0   0
24-NOV-2013  3865564104   1   1712   17479   4329434   33956782080   77921   4145115   33956782080   0   0
21-NOV-2013  3865564104   1   2492   16112   4303976   34013880320   71177   4152085   34013880320   0   0
20-NOV-2013  3865564104   1   2749   14181   4352718   33895473152   81683   4137631   33895473152   0   0
19-NOV-2013  3865564104   1   2591   18819   5365660   42522451968   76118   5190729   42522451968   0   0

As indicated by IO_INTERCONNECT_BYTES_DELTA, the Exadata storage tier had to pass a huge amount of I/O to the database tier, and as indicated by the zero values of IO_OFFLOAD_RETURN_BYTES_DELTA, although the plan shows "TABLE ACCESS STORAGE FULL", there was no Exadata Smart Scan/offloading here.

So why is that? Why is Oracle not using Exadata's powerful offloading abilities?

This is because, in order for Smart Scan to happen, a few basic prerequisites need to be fulfilled:

1) There must be a full scan of an object (and indeed there is)
2) The object must be stored on Exadata storage (and indeed it is)
3) The scan must use Oracle's direct path read mechanism (and it does not) => this unfulfilled prerequisite is what disables the use of Smart Scan here!
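As a side note, parallel execution (used later in this post) is not the only way to get direct path reads; serial direct path reads can also be forced per session. The sketch below uses a hidden, undocumented parameter, so treat it as an assumption to be tested on your own version rather than a supported recipe:

```sql
-- Sketch: forcing serial direct path reads at the session level.
-- "_serial_direct_read" is a hidden, undocumented parameter; its
-- behavior may differ between Oracle versions - test before relying on it.
ALTER SESSION SET "_serial_direct_read" = ALWAYS;
```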

How do I know direct path read was not used?
Let's look at DBA_HIST_ACTIVE_SESS_HISTORY (b.t.w., another great place to historically analyze what was going on while a SQL was executing). It will show us what Oracle was doing while running this update over the last two weeks:

SQL> select nvl(event,'CPU') event,count(*)
from DBA_HIST_ACTIVE_SESS_HISTORY
where sql_id='bxcvv372pkr7m'
and sample_time > sysdate - 14
group by event
order by count(*) desc; 

 
EVENT                                                              COUNT(*)
---------------------------------------------------------------- ----------
cell multiblock physical read                                          2018
CPU                                                                      94
cell single block physical read                                          37
read by other session                                                     1

 
And indeed, what we see here is that most of the time Oracle was waiting on "cell multiblock physical read", which is the regular full scan wait on Exadata (known as "db file scattered read" on non-Exadata machines).
Had Smart Scan been used here, I would have expected to see the "cell smart table scan" wait event.

OK, so let's remove the third obstacle: let's force Oracle to use direct path reads.
How do we force it? By telling Oracle, using hints for example, to run the update in parallel (parallel execution uses direct path reads):

SQL> select sql_text from dba_hist_sqltext where sql_id='88s54njj4n3mj';
 
UPDATE /*+ full(A) parallel(A,8) */ T1 A SET EXP_DATE = SYSDATE WHERE P_TIME IS NOT NULL AND EXP_DATE IS NULL

Let's run it with the new hints and see what Oracle does this time.
When Oracle uses a Storage Index, the only way to verify it is by checking that the statistic "cell physical IO bytes saved by storage index" is increasing.

SQL> select name,value
from v$mystat s, v$statname n
where s.statistic# = n.statistic#
and n.name in ('cell physical IO bytes saved by storage index',
        'cell physical IO interconnect bytes returned by smart scan');

NAME                                                            VALUE
---------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                       0
cell physical IO interconnect bytes returned by smart scan          0


These statistics show us that up till now, no storage index savings were seen for this session.
Now, let's run the update and see whether the statistic increases as expected.

SQL> UPDATE /*+ full(A) parallel(A,8) */ T1 A SET EXP_DATE = SYSDATE WHERE P_TIME IS NOT NULL AND EXP_DATE IS NULL;

SQL> select name,value
from v$mystat s, v$statname n
where s.statistic# = n.statistic#
and n.name in ('cell physical IO bytes saved by storage index',
        'cell physical IO interconnect bytes returned by smart scan');


NAME                                                            VALUE
---------------------------------------------------------- ----------
cell physical IO bytes saved by storage index              3106807808
cell physical IO interconnect bytes returned by smart scan    4052928


Yes! The above update used the Storage Index: 3,106,807,808 bytes (about 2.9 GB) were saved by it. Wow, that's a lot!
And look at the other statistic: now that the Storage Index was used, only 4,052,928 bytes were returned from the storage tier to the database tier through the interconnect.
This is a dramatic saving.

The update finished in 8 seconds! I will say that again, folks, in case you did not read that right:
updating ~15K rows in a 20M-row table now took only 8 seconds!
No more 1-hour update. Big wow! I call this a huge saving, a tremendous improvement, a great relief.

Let's view the execution plan this time:

SQL> select * from table ( DBMS_XPLAN.DISPLAY_CURSOR() );
PLAN_TABLE_OUTPUT
------------------------------------
SQL_ID  43x29vnth9z4x, child number 0
-------------------------------------
UPDATE /*+ full(A) parallel(A,8) */ T1 A SET
EXP_DATE = SYSDATE WHERE P_TIME IS NOT NULL AND EXP_DATE IS NULL


Plan hash value: 4279346641
---------------------------------------------------------------------------------
| Id  | Operation                     | Name     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT              |          |        |      |            |
|   1 |  UPDATE                       | T1       |        |      |            |
|   2 |   PX COORDINATOR              |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)        | :TQ10000 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     PX BLOCK ITERATOR         |          |  Q1,00 | PCWC |            |
|*  5 |      TABLE ACCESS STORAGE FULL| T1       |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - storage(:Z>=:Z AND :Z<=:Z AND ("EXP_DATE" IS NULL AND "P_TIME" IS NOT NULL))
       filter(("EXP_DATE" IS NULL AND "P_TIME" IS NOT NULL))


The plan shows that parallel execution did happen, and that Smart Scan is now possible, as the predicate information displays the storage() predicate line.
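One subtlety worth noting: the UPDATE step sits above the PX COORDINATOR in the plan, meaning only the scan ran in parallel, while the update itself was executed serially by the query coordinator. That is fine here, since only 10-20K rows are actually updated and the scan was the bottleneck. If the DML part itself ever became the bottleneck, parallel DML would have to be enabled explicitly; a sketch:

```sql
-- Sketch: enabling parallel DML for the session (not needed in this
-- case - the scan, not the update, was the bottleneck here).
ALTER SESSION ENABLE PARALLEL DML;

UPDATE /*+ full(A) parallel(A,8) */ T1 A
SET    EXP_DATE = SYSDATE
WHERE  P_TIME IS NOT NULL
AND    EXP_DATE IS NULL;

COMMIT;  -- parallel DML requires a commit/rollback before further queries on T1
```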

And finally, let's observe which wait events occurred this time while running the update:

SQL> select nvl(event,'CPU') event,count(*)
from v$active_session_history
where sql_id='43x29vnth9z4x'
and sample_time > sysdate - 1/24
group by event
order by count(*) desc; 

  
EVENT                                                                          COUNT(*)
---------------------------------------------------------------- ----------
cell smart table scan                                                     8


As expected, we now see the "cell smart table scan" wait event, which confirms that Smart Scan was used this time.

Needless to say, after this change the locking situation stopped occurring.
Mission accomplished. Well done :)

That's all for now. Hope you've enjoyed this post. As always, comments are most welcomed.

Merav Kedem,
Founder of
DB OPTimize
Oracle Performance Tuning & DBA Consulting
