Search This Blog

Friday, December 6, 2013

Exadata Storage Index solving an endless update

The following simple update was running every evening, endlessly, on a production Exadata machine, Oracle version 11.2.0.3 (well, not truly endlessly, but for about an 30-60 minutes, which is considered to be quite endless in my standards...).

UPDATE T1
SET EXP_DATE = SYSDATE
WHERE P_TIME IS NOT NULL
AND EXP_DATE IS NULL


The update was running on a 20M rows table, only using two NULL/NOT NULL filtering criteria, and updates around 10-20K rows.
Since it run for about an hour, using a FULL TABLE SCAN, in this hour other updates that were running simultaneously on the same table (with different criteria) were suffering from row locks on a daily basis, and this was simply not acceptable. 
Analyzing this intrusive update indicated that the update was spending most of its time on IO Waits, as many unneeded blocks were returned by the Exadata storage tier to the Exadata database tier, and only then database tied was able to start applying the null filters ( i.e "P_TIME IS NOT NULL AND EXP_DATE IS NULL").

Index was not an option here. So what is?

Since nulls are used here, Exadata Storage Index (which is part of Exadata offloading/smart scan features) could be the solution, if we just let Oracle use it.

For those of you who are yet unfamiliar with Exadata storage indexes, here is a brief explanation:

Storage Indexes is a very powerful Exadata feature, when used. Don't mix it with Oracle’s traditional B-Tree or bitmapped indexes as they are not stored in the database. Instead, its a feature (in-memory structures) of the storage server software that is designed to eliminate disk I/O. They identify locations where the requested records are not, instead of the other way around.
They store minimum and maximum column values for disk storage units. Because SQL predicates are passed to the storage servers when Smart Scans are performed, the storage software can check the predicates against the Storage Index metadata (maximum and minimum values) before doing the requested I/O. Any storage region that cannot possibly have a matching row is skipped.
In addition to the minimum and maximum values, there is a flag indicating whether any of the records in a storage region contain nulls.
You can read more about exadata storage index in the following link
http://www.oracle.com/technetwork/issue-archive/2011/11-may/o31exadata-354069.html  

Anyway, before we drill into how storage indexes tremendously helped in this case, here are some basic info on the update: its text, execution plan, and runtime statistics over the last two weeks: 

SQL> select sql_text from dba_hist_sqltext where sql_id='bxcvv372pkr7m';
 
UPDATE T1 SET EXP_DATE = SYSDATE WHERE P_TIME IS NOT NULL AND EXP_DATE IS NULL
 
SQL> select * from table ( DBMS_XPLAN.DISPLAY_AWR('bxcvv372pkr7m',3865564104) );
 
PLAN_TABLE_OUTPUT
---------------------------------------------------
SQL_ID bxcvv372pkr7m
--------------------
UPDATE T1 SET EXP_DATE = SYSDATE WHERE P_TIME
IS NOT NULL AND EXP_DATE IS NULL

 
Plan hash value: 3865564104
--------------------------------------------------------------------------------------------------
| Id  | Operation                                             | Name            |
--------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                         |                     |           

|   1 |  UPDATE                                               | T1                |           
|   2 |   TABLE ACCESS STORAGE FULL | T1                 | 
--------------------------------------------------------------------------------------------------
 
This above plan (to achieve an easier display, I have removed several fields from it) is showing us that full table scan is being used while running the update, and that smart scan can be used here, theoretically.
Selecting basic statistics on Table T1 shows that it has almost 20M rows over 1M blocks  (block size is 8k).

SQL> select num_rows,blocks,last_analyzed from dba_tables a where table_name='T1';
 
  NUM_ROWS     BLOCKS      LAST_ANAL
-------------------    ------------      -----------------
  19309303           1096332         28-NOV-13


And here are some statistics on its execution taken from dba_hist_sqlstat (b.t.w., dba_hist_sqlstat is an excellent place to grab overtime statistics information on sql executions):

select  trunc(b.begin_interval_time) btime ,
      a.plan_hash_value  plan_hs,
      sum(a.executions_delta) execs,
      sum(trunc(a.elapsed_time_delta/ 1000000/decode(a.executions_delta,0,1,a.executions_delta) ,3) ) tm_sec,
      sum(trunc(a.rows_processed_delta/decode(a.executions_delta,0,1,a.executions_delta) ,2) ) nrows,
      sum(trunc( a.buffer_gets_delta/decode(a.executions_delta,0,1,a.executions_delta) ,2)) buff_avg,
      sum(a.PHYSICAL_READ_BYTES_DELTA) pysrd,
      sum(a.PHYSICAL_READ_REQUESTS_DELTA) pysrq,
      sum(a.DISK_READS_DELTA) dkrd,
      sum(a.IO_INTERCONNECT_BYTES_DELTA) intercn,
      sum(a.IO_OFFLOAD_RETURN_BYTES_DELTA)
offld,
      sum(a.IO_OFFLOAD_ELIG_BYTES_DELTA) elig
from dba_hist_sqlstat a , dba_hist_snapshot b
where sql_id='bxcvv372pkr7m'
and a.snap_id=b.snap_id
and b.begin_interval_time > sysdate -14
group by trunc(begin_interval_time), plan_hash_value
order by trunc(begin_interval_time) desc;


btime                plan_hs  execs   tm_sec nrows  buff_avg    pysrd                pysrq     dkrd          intercn      offld  elig
------                 ------         ------     ------         ------              ------ ------ ------ ------ ------
01-DEC-2013  3865564104   1   1751   16752   4327717   32936501248   104370   4020569   32936501248   0   0
28-NOV-2013  3865564104   1   2511          0    3873633   30624784384   59219     3738377   30624784384   0   0
27-NOV-2013  3865564104   1   3042   21454   5210873   41140527104   74379   5022037   41140527104   0   0
26-NOV-2013  3865564104   1   1504   23270   3220407   25268518912   50146   3084536   25268518912   0   0
25-NOV-2013  3865564104   1   1730          0    1954352   15447539712   27075   1885686   15447539712   0   0
24-NOV-2013  3865564104   1   1712   17479   4329434   33956782080   77921   4145115   33956782080   0   0
21-NOV-2013  3865564104   1   2492   16112   4303976   34013880320   71177   4152085   34013880320   0   0
20-NOV-2013  3865564104   1   2749   14181   4352718   33895473152   81683   4137631   33895473152   0   0
19-NOV-2013  3865564104   1   2591   18819   5365660   42522451968   76118   5190729   42522451968   0   0

As indicated by IO_INTERCONNECT_BYTES_DELTA,  exadata storage tier had to pass huge amount of IO to the database tier, and as indicate by the zero values of IO_OFFLOAD_RETURN_BYTES_DELTA, although plan is showing "TABLE ACCESS STORAGE FULL", there was no exadata smart scan/offloading here.

So why is that? Why does Oracle not using Exadata powerful offloading abilities?

This is because in order for smart scan to happen, few basic prerequisites need to be fulfilled:

1) There must be a full scan of an object (and indeed there is)
2) The object must be stored on Exadata storage (and indeed it is)
3) The scan must use Oracle’s Direct Path Read mechanism (and it does not) => So this unfulfilled prerequisite disables the use of smart scan here!!!!

How do I know direct path read was not used?
Lets look at DBA_HIST_ACTIVE_SESS_HISTORY (b.t.w, its another great place to historically analyze  what was going on while SQL was executed). It will show us what Oracle was doing while running this update over the last two weeks:

SQL> select nvl(event,'CPU') event,count(*)
from DBA_HIST_ACTIVE_SESS_HISTORY
where sql_id='bxcvv372pkr7m'
and sample_time > sysdate - 14
group by event
order by count(*) desc; 

 
EVENT                                                              COUNT(*)
---------------------------------------------------------------- ----------
cell multiblock physical read                                          2018
CPU                                                                                    94
cell single block physical read                                           37
read by other session                                                            1

 
And indeed, what we see here is that most of the time Oracle was waiting on "cell multiblock physical read", which is the regular full scan of exadata (also known as "db file scattered read" in non exeadata machines).
If a smart scan would have been used here I would expect to see the "cell smart table scan" wait event.

OK, so lets undo the third restriction. Lets force Oracle use direct path read.
How do we force it? By telling oracle, using hints for example, to run the update on parallel:

SQL> select sql_text from dba_hist_sqltext where sql_id='88s54njj4n3mj';
 
UPDATE /*+ full(A) parallel(A,8) */ T1 A SET EXP_DATE = SYSDATE WHERE P_TIME IS NOT NULL AND EXP_DATE IS NULL

Lets run it with the new hints and see what Oracle will do this time:
When Oracle is using Storage Index, the only way to verify it is by checking the statistics "cell physical IO bytes saved by storage index" is increasing.

SQL> select name,value
from v$mystat s, v$statname n
where s.statistic# = n.statistic#
and n.name in ('cell physical IO bytes saved by storage index',
        'cell physical IO interconnect bytes returned by smart scan');

NAME                                                                                               VALUE
----------------------------------------------------------------                      ----------
cell physical IO bytes saved by storage index                                   0
cell physical IO interconnect bytes returned by smart scan               0


The statistics is showing us that up till now, no storage index offload was used by this session.
Now, lets run the update and see whether this statistic will increase as expected.

SQL> UPDATE /*+ full(A) parallel(A,8) */ T1 A SET EXP_DATE = SYSDATE WHERE P_TIME IS NOT NULL AND EXP_DATE IS NULL;

SQL> select name,value
from v$mystat s, v$statname n
where s.statistic# = n.statistic#
and n.name in ('cell physical IO bytes saved by storage index',
        'cell physical IO interconnect bytes returned by smart scan');


NAME                                                                                               VALUE
----------------------------------------------------------------                      ----------
cell physical IO bytes saved by storage index                                   3106807808
cell physical IO interconnect bytes returned by smart scan              4052928


Yes! the above update was using storage index. It is indicating that 3,106,807,808 bytes were saved by storage index. Wow!!! That's a lot!!!
And see other statistics. Now, after storage index was used, only 4052928 bytes were returned from the storage tier to the database tier, through the interconnect.
This is a dramatic saving.

The update has finished  in 8 seconds!!! I will say that again folks, as you may not hear me right.
Updating 15k rows in 20M rows table now took only 8 second!!!
No more 1 hour update. Big wow! I call this a huge saving, a tremendous improvement, great relief.

Lets view that execution plan this time:

SQL> select * from table ( DBMS_XPLAN.DISPLAY_cursor) ;
PLAN_TABLE_OUTPUT
------------------------------------
SQL_ID  43x29vnth9z4x, child number 0
-------------------------------------
UPDATE /*+ full(A) parallel(A,8) */ T1 A SET
EXP_DATE = SYSDATE WHERE P_TIME IS NOT NULL AND EXP_DATE IS NULL


Plan hash value: 4279346641
--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                    |     TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                       |                   |      
|   1 |  UPDATE                                             | T1              |

|   2 |   PX COORDINATOR                                    
|   3 |    PX SEND QC (RANDOM)                | :TQ10000 |    Q1,00 | P->S    | QC (RAND)  |
|   4 |     PX BLOCK ITERATOR                    |                  |    Q1,00 | PCWC |                       |
|*  5 |      TABLE ACCESS STORAGE FULL| T1             |    Q1,00 | PCWP |                       |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - storage(:Z>=:Z AND :Z<=:Z AND ("EXP_DATE" IS NULL AND "P_TIME" IS NOT NULL))
       filter(("EXP_DATE" IS NULL AND "P_TIME" IS NOT NULL))


The plan is showing us that parallel execution did happen, and that Smart Scan is possible now, as the predicate information is displaying the storage line.

And finally, lets observe which wait events occurred this time while running the update:

SQL> select nvl(event,'CPU') event,count(*)
from v$active_session_history
where sql_id='43x29vnth9z4x'
and sample_time > sysdate - 1/24
group by event
order by count(*) desc; 

  
EVENT                                                                          COUNT(*)
---------------------------------------------------------------- ----------
cell smart table scan                                                     8


As expected, we now see the wait event "cell smart table scan", which proves that Smart Scan was used now.

Needless to say that after this change locking situation stopped occuring.
Mission completed. Well done :)

That's all for now. Hope you've enjoyed this post. As always, comments are most welcomed.

Merav Kedem,
Founder of
DB OPTimize
Oracle Performance Tuning & DBA Consulting

Thursday, May 2, 2013

Exadata, Oracle and its Optimizer - The Dark Effect of Wrong Statistics

Exadata or not, sometimes Oracle is still Oracle, optimizer is still optimizer, and math is still math.

Here is the story: Instance is Oracle 11.2.0.3, running on an Exadata machine (X2-2 half Rack).
User is complaining query is taking too long to finish (about 5 minutes).
User has also said that last time he run that query, the range scanned was much wider (about a year), nevertheless no performance problems had occurred. Query had finished rather fast the previous time it run... (this is a point to remember for later on).

Let's look at the query. For the sake of this post, I have simplified it a bit but the idea remains the same.
Basically it is just a simple select, with 4 months range predicates on a date column (SCAN_DATE), and another predicates on some non unique id column (POLICY_NUM).

     SELECT COUNT(SCAN_DATE)
     FROM my_user.my_table a
     WHERE scan_date >= TO_DATE('01/01/2013 0:0:0', 'DD/MM/YYYY HH24:MI:SS')
     AND scan_date < TO_DATE('01/05/2013 0:0:0', 'DD/MM/YYYY HH24:MI:SS')
     AND ( policy_num ='183' OR policy_num='000000183');

Let's run it and see what happens:

09:51:12 SQL> SELECT COUNT(SCAN_DATE)
 FROM my_user.my_table A
WHERE SCAN_DATE >= TO_DATE('01/01/2013 0:0:0', 'DD/MM/YYYY HH24:MI:SS')
AND SCAN_DATE  < TO_DATE('01/05/2013 0:0:0', 'DD/MM/YYYY HH24:MI:SS')
and ( POLICY_NUM   ='183' OR policy_num='000000183');


COUNT(SCAN_DATE)
----------------
               0

Elapsed: 00:04:43.87      => NOTICE THE HIGH ELAPSED TIME

Execution Plan
----------------------------------------------------------
Plan hash value: 877559244

----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                                 | Name                       | Rows| Bytes | Cost (%CPU)|Time |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                                     |     1 |    16 |     7   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE                              |                                    |     1 |    16 |               |                 |
|*  2 |   TABLE ACCESS BY INDEX ROWID  | MY_TABLE                |     1 |    16 |     7   (0)| 00:00:01  |
|*  3 |    INDEX RANGE SCAN                      | SCAN_DATE_INDX |     9 |         |     3   (0)| 00:00:01  |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("POLICY_NUM"='000000183' OR "POLICY_NUM"='183')
   3 - access("SCAN_DATE">=TO_DATE(' 2013-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "SCAN_DATE"<TO_DATE(' 2013-05-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    4622469  consistent gets      => NOTICE THE HIGH CONSISTENT GETS
     375620  physical reads       => AND HIGH PHYSICAL READS
       8320  redo size
        533  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

The above execution plan is showing index range scan is used (bolded line). It is also stated in the plan that optimizer thinks only 9 rows will be returned by the index. This looked suspicious to me, since table has millions of rows.
   
Let's view the execution in OEM SQL Monitoring (this is a great and most recommended feature!!!).



Indeed it shows that although estimated rows is 9 (and this is what making the optimizer wrongly choose his decision), the actual returned rows are 10M.
Exadata or not Exadata, optimizer is still optimizer, and math is still math.
Scanning 10M rows through an index is obviously not the best thing to do (and this is an understatement).
Just think about it. Even if IO latency is very very good, lets say 0.5ms (which is considered to be very good latency for random reads), multiplying 0.5ms by 10M seeks will still take forever.

So what is making Oracle choose a wrong decision?

10:21:32 SQL> select num_rows,last_analyzed
from dba_tables
where owner='MY_OWNER'
and table_name='MY_TABLE';


NUM_ROWS       LAST_ANAL
------------------     -----------------
168,562,490          26-OCT-12


Table was last analyzed on 26.10.2012? Six months ago? This doesn't sound promising. But let's continue and look what are the statistics of those two relevant columns, especially low value and high value..
(btw, I have simplified the select on the low/high_value to make the query below more readable and short. To be able to see the real values on these two column you need to use some conversion on it.)

10:28:29 SQL> SELECT column_name,data_type,num_distinct,density,last_analyzed,low_value,high_value
10:28:29  18   from DBA_TAB_COLUMNS a
10:28:29  19   where table_name='MY_TABLE'
10:28:29  20   and column_name in ('POLICY_NUM','SCAN_DATE');


COLUMN_NAME DATA_TYPE  NUM_DISTINCT    DENSITY LAST_ANAL  LOW_V                HI_V
---------------            ----------            ------------ ---------- --------- ---------- -------------------- --------------------
POLICY_NUM      VARCHAR2    3922917    5.1066E-06 26-OCT-12                                          8888
SCAN_DATE       DATE              19276420  .00002758   26-OCT-12           1981-1-1 0:0:0      2012-10-22 2:51:48



Look what we found. The highest value of SCAN_DATE column is 22.10.2012. Obviously this is because last time table was analyzed was in 26.10.2012. No wonder Oracle thought there won't be many rows returned for the requested predicate. The requested predicate is requesting rows when scan date between 1.1.2013 - 1.5.2013. This date range is higher thna the high_value of scan_date column:

      WHERE SCAN_DATE >= TO_DATE('01/01/2013 0:0:0', 'DD/MM/YYYY HH24:MI:SS')
      AND SCAN_DATE  < TO_DATE('01/05/2013 0:0:0', 'DD/MM/YYYY HH24:MI:SS')


Now let's see how many rows are actually in the table. Notice the hints I have used, and how fast Oracle returned the result. Those hints are forcing Oracle to use index fast full scan with parallel query (degree 4), which is much more appropriate in this case than a simple index range scan, since table has millions of rows (Note: Exadata offloading is also relevant here, and I have added a special section for that at the end of this post).

10:04:49 SQL> SELECT /*+ index_ffs (a,scan_date_indx) parallel_index(a,scan_date_indx,4)  */

COUNT(SCAN_DATE)
FROM my_owner.my_table A;

COUNT(SCAN_DATE)
----------------
       186,863,326


Elapsed: 00:00:05.44

Table has 186M rows. Now let's see how many rows will be returned for the requested dates:

10:05:11 SQL> SELECT /*+ index_ffs (a,scan_date_indx)  parallel_index(a,scan_date_indx,4)  */ COUNT(SCAN_DATE)
  FROM my_owner.my_table A
  WHERE SCAN_DATE >= TO_DATE('01/01/2013 0:0:0', 'DD/MM/YYYY HH24:MI:SS')
  and scan_date  < to_date('01/05/2013 0:0:0', 'DD/MM/YYYY HH24:MI:SS');


COUNT(SCAN_DATE)
----------------
        10,213,081


Elapsed: 00:00:04.95

Indeed, 10M rows are returned here. This is not a surprise, since we have already seen the huge gap between estimated rows and actual rows in OEM's SQL Monitoring screen.

Here is the time to say that after finding there is a statistics problem here, I have explored the issue a bit more with the customer, to figure out what was the case when query run much faster (remember this was part of the complaint?). It appeared that query run much faster when the rows were selected from year 2012. Customer has informed me that since they started running the query on year 2013, query was taking forever..
And now we know why..
Let's explore it a bit more, and see how many rows are returned for 2012:

10:06:07 SQL> SELECT /*+ index_ffs (a,scan_date_indx)  parallel_index(a,scan_date_indx,4)  */ COUNT(SCAN_DATE)
10:06:07   2  FROM my_owner.my_table A
10:06:07   3  WHERE SCAN_DATE >= TO_DATE('01/01/2012 0:0:0', 'DD/MM/YYYY HH24:MI:SS')
10:06:07   4  and scan_date  < to_date('01/05/2012 0:0:0', 'DD/MM/YYYY HH24:MI:SS');


COUNT(SCAN_DATE)
----------------
        10,381,170


Elapsed: 00:00:04.24

Let's see what Oracle thinks about the cardinality of this predicate:

10:07:16 SQL> SELECT /*+ index_ffs (a,scan_date_indx)  parallel_index(a,scan_date_indx,4)  */ COUNT(SCAN_DATE)
10:07:16   2  FROM my_owner.my_table  A
10:07:16   3  WHERE SCAN_DATE >= TO_DATE('01/01/2012 0:0:0', 'DD/MM/YYYY HH24:MI:SS')
10:07:16   4  and scan_date  < to_date('01/05/2012 0:0:0', 'DD/MM/YYYY HH24:MI:SS');


COUNT(SCAN_DATE)
----------------
        10381170


Elapsed: 00:00:04.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1948420453

---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name           | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                              |                         |     1 |     7 |   155K  (1)| 00:31:05 |        |      |            |
|   1 |  SORT AGGREGATE                                 |                         |     1 |     7 |                  |          |        |      |            |
|   2 |   PX COORDINATOR                                |                         |        |        |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)                      | :TQ10000       |     1 |     7 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE                              |                         |     1 |     7 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR                         |                        |  9473K|    63M|   155K  (1)| 00:31:05 |  Q1,00 |PCWC||
|*  6 |       INDEX STORAGE FAST FULL SCAN| SCAN_DATE_INDX |  9473K|    63M|   155K  (1)| 00:31:05 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - storage("SCAN_DATE">=TO_DATE(' 2012-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "SCAN_DATE"<TO_DATE('
              2012-05-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("SCAN_DATE">=TO_DATE(' 2012-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "SCAN_DATE"<TO_DATE('
              2012-05-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


Statistics
----------------------------------------------------------
         12  recursive calls
        108  db block gets
     588111  consistent gets
     585699  physical reads
          0  redo size
        537  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
 

  
As expected, when the given date range is for 2012 (i.e SCAN_DATE >= TO_DATE('01/01/2012 0:0:0', 'DD/MM/YYYY HH24:MI:SS') and SCAN_DATE < to_date('01/05/2012 0:0:0', 'DD/MM/YYYY HH24:MI:SS') ),
the estimated returned rows Oracle is showing (9473K) is much more accurate.

And why does the query returns much faster when the given date range was for 2012?
Because Oracle knows scanning this index will bring 10M rows, so it smartly decides not to use this index. Instead, it uses the index on POLICY_NUM, as shown below.
Notice also the monitor hint. This will enable us to view the execution in SQL Monitoring, since by default, only long running queries (above 5 seconds) or parallel queries are shown.


09:49:11 SQL> set autotrace on
09:49:34 SQL>
09:49:46 SQL> SELECT /*+ monitor */ COUNT(SCAN_DATE)
09:49:47   2   FROM my_owner.my_table A
09:49:47   3  WHERE SCAN_DATE >= TO_DATE('01/01/2012 0:0:0', 'DD/MM/YYYY HH24:MI:SS')
09:49:47   4  AND SCAN_DATE  < TO_DATE('01/05/2012 0:0:0', 'DD/MM/YYYY HH24:MI:SS')
09:49:47   5  and ( POLICY_NUM   ='183' OR policy_num='000000183');


COUNT(SCAN_DATE)
----------------
               0


Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1090044264

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                                     |     1 |    16   |    62   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE                             |                                      |     1 |    16  |                |          |
|   2 |   INLIST ITERATOR                              |                                      |         |          |                |          |
|*  3 |    TABLE ACCESS BY INDEX ROWID| MY_TABLE                 |     4   |    64 |    62   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                      | INDX_POLICY_NUM |    79 |          |     4   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("SCAN_DATE">=TO_DATE(' 2012-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "SCAN_DATE"<TO_DATE(' 2012-05-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   4 - access("POLICY_NUM"='000000183' OR "POLICY_NUM"='183')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          8  consistent gets
          3  physical reads
          0  redo size
        533  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed


So the only question remains open is why does statistics are not updated, as this is an Oracle 11.2.0.3 instance, and statistics gathering through dbms_stats should be done automatically.
I suspected locked statistics. Let's check that:

10:08:41 SQL> select owner, table_name, stattype_locked
from dba_tab_statistics
where table_name='MY_TABLE' ;


OWNER      TABLE_NAME      STATTYPE_LOCKED
----------       ---------------        -------------------------
IMAGE      MY_TABLE       ALL


As suspected, statistics are locked!!! No wonder last time table was analyzed was 6 months ago. For date fields, this is a bad thing. You can lock statistics if and only if it has no effect the optimizer decision. And clearly this is not the case here. Optimizer decision was effected becuase high_value on the date column had a wrong value (much lower value than it should).
Resolution here is simple. Unlock statistics, re-gather statistics, and... Voilà!! . Query "suddenly" runs much faster...

Let's see what happens to "high_value" statistics, after unlocking and re-gathering statistics on MY_TABLE:

12:41:32 SQL> SELECT owner,column_name,data_type,num_distinct,density,last_analyzed,num_nulls,
low_value,high_value

    from DBA_TAB_COLUMNS a
   where table_name='MY_TABLE'

  and column_name in ('POLICY_NUM','SCAN_DATE');

 COLUMN_NAME     DATA_TYPE  NUM_DISTINCT    DENSITY LAST_ANAL  NUM_NULLS LOW_V                HI_V
--------------- ---------- ------------ ---------- --------- ---------- -------------------- --------------------
POLICY_NUM      VARCHAR2        5634560   9.9753E-06     28-APR-13          0                                   8888
SCAN_DATE        DATE                   5644288  0 .000045952   28-APR-13          0   1981-1-1 0:0:0       2013-4-28 12:1:31


Brilliant. HIGH_VALUE now has the correct value, so let's re-run the original query and see how Oracle execute it this time:

12:42:21 SQL> SELECT COUNT(SCAN_DATE)
 FROM my_owner.my_table A
WHERE SCAN_DATE >= TO_DATE('01/01/2013 0:0:0', 'DD/MM/YYYY HH24:MI:SS')
AND SCAN_DATE  < TO_DATE('01/05/2013 0:0:0', 'DD/MM/YYYY HH24:MI:SS')
and ( POLICY_NUM   ='183' OR policy_num='000000183');


COUNT(SCAN_DATE)
----------------
               0


Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1090044264

-------------------------------------------------------------------------------------------------
| Id  | Operation                                                  | Name                             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                                         |     1  |    18 |    52   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE                                |                                         |     1  |    18 |                 |                   |
|   2 |   INLIST ITERATOR                                 |                                         |         |          |                 |                   |
|*  3 |    TABLE ACCESS BY INDEX ROWID   | MY_TABLE                     |     3  |    54 |    52   (0)  | 00:00:01 |
|*  4 |     INDEX RANGE SCAN                        | INDX_POLICY_NUM   |    61 |          |     5   (0)  | 00:00:01  |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("SCAN_DATE">=TO_DATE(' 2013-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "SCAN_DATE"<TO_DATE(' 2013-05-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   4 - access("POLICY_NUM"='000000183' OR "POLICY_NUM"='183')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
        533  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

 

Excellent. As expected, after Oracle sees the right statistics, it now knows the highest value for SCAN_DATE is 28.4.13 and not 26.10.12, therefore it can smartly choose much more suitable execution plan. This time Oracle decides not  to range scan the index SCAN_DATE_INDX (since it knows such range will be wide), but instead, it chooses to range scan INDX_POLICY_NUM.
This time the result is returned in no time.
  
Oracle is no mystery. Just need to understand it, and know how it works.
:)

And now, if you can just bare with me a bit longer, here is some cool exadata staff to notice:
Since we forced an index fast full scan with parallel, Oracle was able to use Exadata smart scan offloading (btw, to enable the possibility of smart scan use, full scan and direct read are a must). Since the only column used in this query is scan_date, offloading is extremely effective here: more than 4GB were offloaded (i.e database didn't need to process those 4G as most of them were already filtered by the storage). This can also be seen in OEM, as "cell offload efficiency: 97%":

This smart scan can be also verified through v$sesstat:

SQL> select b.name,a.value from v$sesstat a, v$statname b, v$session c
where a.statistic# = b.statistic#
AND A.SID=C.SID
AND ( B.NAME like '%storage%' or b.name like '%offload%')
and a.sid=1542
order by a.value desc; 


NAME                                                                  VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes eligible for predicate offload                     0

cell simulated physical IO bytes returned by predicate offload            0
cell simulated physical IO bytes eligible for predicate offload           0
cell physical IO bytes saved by storage index                             0


Notice that before running the query, "cell physical IO bytes eligible for predicate offload" is 0.
Now, let's run the query again.

SQL> SELECT /*+ index_ffs (a,scan_date_indx)  parallel_index(a,scan_date_indx,4)  */ COUNT(SCAN_DATE)
  2  FROM IMAGE.INS_DOC_S A
  3  WHERE SCAN_DATE >= TO_DATE('01/01/2012 0:0:0', 'DD/MM/YYYY HH24:MI:SS')
  4  and scan_date  < to_date('01/05/2012 0:0:0', 'DD/MM/YYYY HH24:MI:SS');


COUNT(SCAN_DATE)
----------------
        10381208


Now let's see if statistics values have changed:

SQL> select b.name,a.value from v$sesstat a, v$statname b, v$session c
where a.statistic# = b.statistic#
AND A.SID=C.SID
AND ( B.NAME like '%storage%' or b.name like '%offload%')
and a.sid=1542
order by a.value desc; 


NAME                                                                  VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes eligible for predicate offload            4798046208
cell simulated physical IO bytes eligible for predicate offload           0
cell simulated physical IO bytes returned by predicate offload            0
cell physical IO bytes saved by storage index                             0


After we execute the query, "cell physical IO bytes eligible for predicate offload" had jump to 4798046208. Meaning, smart scan did happen, and this offload is all due to this query.
Cool and effective!
:)

That's all for now. Hope you've enjoyed this post.
As always, comments are most welcomed.

Merav Kedem,
Founder of
DB OPTimize
Oracle Performance Tuning & DBA Consulting







Thursday, July 12, 2012

No, It is not always the database fault..

When things are starting to get wrong, Database ( i.e DBA ) is always the first tier to blame. I am sure you have faced this situation, probably much more than once or twice.
Guilty as charged, until proven innocent, right?

Well, it is not that nothing can't go wrong in the database, but I wish it would stop be treated as the black sheep of the family.

Recently, I had been in a situation where application was constantly complaining for bad response time, day after day. First thing to explore was whether application changed its behaviour, since most of the time that is the cause for performance issues. Surprisingly enough, it did not. No new SQLs, no major changes in execution count.
Second, I went to look for database changing its behaviour. But it didn't. No changes in execution plan (usually major cause for performance degradation). As much as I have looked, from the database side, it looked like database didn't change its behavior.

So what could be wrong?

Storage can go wrong for example, and that was the case here.

Looking for problems in the wrong place will not assist in solving them. But just saying "it is not the database" probably won't help. You will probably always have to prove it...
Average physical read/write time is what oracle experiencing when dealing with IO. But underneath these numbers, two more tiers are involved. Storage and File System.

Well, after checking average physical read time and average physical write time of several files in this database, it was clear (unfortunately, just to me for a while) that writes became much slower, at exactly the time when the application have started to complain.

Don't think it was easy. For many days Storage guys continues to say "Nothing is wrong with the storage" (how many times have you heard this? I wonder), but finally, when numbers became so high, it couldn't be ignored any more.

At last, spotlight went to spot on the right place (and stopped pointing to the wrong place, the database). Well, probably just until the next complaint will rise again...but I guess I can't win it all..

Here is a query that can be run in Oracle 10g and above. It might look scary, but in short, it's just go over dba_hist_filestatxs (where file statistics are held overtime), and since numbers there are incremented every snapshot (it doesn't hold the deltas), I have used LAG analytic function to subtract current value from preceding value (i.e, current snapshot from preceding snapshot), and I did it for every statistics I wanted to observe.
That's it.


select *   from (
   select
      b.begin_interval_time begin_interval_time ,  
      nullif((phywrts - ( lag(phywrts,1,0) over( order by a.snap_id ) )),0) delta_phywrts,
      trunc( (writetim*10 - ( lag(writetim*10,1,0) over( order by a.snap_id ) ) ) /  
             nullif((phywrts - ( lag(phywrts,1,0) over( order by a.snap_id ) )),0)  ,3 ) avg_write_ms,
      nullif((phyrds - ( lag(phyrds,1,0) over( order by a.snap_id ) )),0)   delta_phyrds,
      trunc( (readtim*10 - ( lag(readtim*10,1,0) over( order by a.snap_id ) ) ) /  
             nullif((phyrds - ( lag(phyrds,1,0) over( order by a.snap_id ) )),0)  ,3 ) avg_read_ms,
      nullif((singleblkrds - ( lag(singleblkrds,1,0) over( order by a.snap_id ) )),0) delta_singleblkrds,
      trunc( (singleblkrdtim*10 - ( lag(singleblkrdtim*10,1,0) over( order by a.snap_id ) ) ) /  
             nullif((singleblkrds - ( lag(singleblkrds,1,0) over( order by a.snap_id ) ))  ,0),3 ) avg_singleblkrds_ms
   from dba_hist_filestatxs a , 
   dba_hist_snapshot b 
   where filename = 'my_file.dbf'
   and  a.snap_id=b.snap_id  
  )
 where to_char(begin_interval_time,'HH24') between 7 and 11  -- in case you wish to see only specific hours
 and to_char(begin_interval_time,'HH24') not in ('FRIDAY','SATURDAY ') -- in case you wish to filter nonworking days
  order by begin_interval_time;




Below is part of the real life output, where major changes in average write time can clearly be seen, especially since 24.5. I have taken statistics of a specific data file, each time between 09:30-12:00, in normal working days (not in a weekend). Snapshot interval is 15 minutes. (For simplification, although above query shows reads and writes statistics, below I am showing only writes, since this is the issue relevant for this post).

on 7.5, average write time was between 2-7 ms.
on 16.5, average write time was between 6-11 ms.
on 21.5, average write time was between 6-14 ms.
on 24.5, average write time was between 12-21 ms.
on 13.6, average write time was very very bad. around 40 ms write time. This is 10 times slower than a month ago!
Finally, on 6/7, average write time went back to normal. went down to around 5 ms.

Pay attention to the increasing values of the 3rd column, avg_write_ms, which shows the average physical write time of a specific datafile, for every snapshot.

BEGIN_INTERVAL_TIMEDELTA_PHYWRTSAVG_WRITE_MS
============================================
07-MAY-12 09:30:0519703.395
07-MAY-12 09:45:1020074.489
07-MAY-12 10:00:1216792.829
07-MAY-12 10:15:1721203.037
07-MAY-12 10:30:1923552.169
07-MAY-12 10:45:2316092.094
07-MAY-12 11:00:247507.533
07-MAY-12 11:15:267372.713
07-MAY-12 11:30:287793.465
07-MAY-12 11:45:296252.448
16-MAY-12 09:30:485726.73
16-MAY-12 09:45:496455.395
16-MAY-12 10:00:515768.072
16-MAY-12 10:15:5272010.513
16-MAY-12 10:30:5462211.495
16-MAY-12 10:45:557457.208
16-MAY-12 11:01:006929.205
16-MAY-12 11:15:0192010.793
16-MAY-12 11:30:038398.116
16-MAY-12 11:45:057558.172
21-MAY-12 09:30:5421568.409
21-MAY-12 09:45:57201614.627
21-MAY-12 10:00:59187710.436
21-MAY-12 10:15:0322246.753
21-MAY-12 10:30:06114512.043
21-MAY-12 10:45:1086912.105
21-MAY-12 11:00:1277411.989
21-MAY-12 11:15:148578.191
21-MAY-12 11:30:167378.71
21-MAY-12 11:45:1882612.421
24-MAY-12 09:30:2766114.19
24-MAY-12 09:45:2971512.881
24-MAY-12 10:00:3080015.637
24-MAY-12 10:15:3271612.835
24-MAY-12 10:30:3389519.307
24-MAY-12 10:45:3779919.148
24-MAY-12 11:00:3871621.983
24-MAY-12 11:15:4184121.878
24-MAY-12 11:30:4472415.138
24-MAY-12 11:45:4665315.773
13-JUN-12 09:30:152460824.476
13-JUN-12 09:45:172424931.427
13-JUN-12 10:00:212016439.09
13-JUN-12 10:15:231980738.68
13-JUN-12 10:30:241957238.088
13-JUN-12 10:45:261881241.594
13-JUN-12 11:00:291900340.993
13-JUN-12 11:15:332166044.534
13-JUN-12 11:30:342324238.171
13-JUN-12 11:45:362105542.248
06-JUL-12 09:30:5030576.319
06-JUL-12 09:45:5134065.733
06-JUL-12 10:00:5238808.103
06-JUL-12 10:15:5343245.381
06-JUL-12 10:30:5454094.906
06-JUL-12 10:45:5527145.036
06-JUL-12 11:00:5627234.697
06-JUL-12 11:15:5724347.781
06-JUL-12 11:30:5838044.91
06-JUL-12 11:45:5922435.737


Ladies and gentlemen of the jury, what is your verdict?
We have found the defendant NOT GUILTY, your honor.

To sum it all up, as stated in the beginning, Database is not always to blame. This time the jury found it innocent.
:)
First thing to check is the application. If you see that applications is behaving more or less the same (no new SQLs, same execution count per SQL), check the database.
And if the database is behaving more or less the same, it is simply not there.
Something else is the source of the problem.

As always, comments are most welcome..


Merav Kedem,
Founder of
DB OPTimize
Oracle Performance Tuning & DBA Consulting


Tuesday, March 20, 2012

Bind variables in Oracle 11.2



A bit of an history before I begin...
Oracle 10.2 had exposed bind variables values passed to a SQL statement via a new view named V$SQL_BIND_CAPTURE. Nevertheless, to limit the overhead, Oracle captured the bind at most every 15 minutes for a given cursor. 


In other words, you were not able to see bind values for each SQL execution.
In some cases, especially when binds values have major influence on SQL execution, knowing the bind values passed to a given execution could be very important.


Oracle 11.2 has extended this ability.V$SQL_MONITOR was expanded and a new column BIND_XML was added to it.
V$SQL_MONITOR  is one of my favourite additions to 11g. It displays very useful information on SQL statement executions and it is automatically started when a SQL statement runs parallel or when it has consumed at least 5 seconds of CPU or I/O time. (My only complaint about this view is that it is not kept in history. No dba_hist_sql_monitor yet. Maybe someone in Oracle will read this post and add it to their TODO list...).



Anyway, with the addition of BIND_XML new column, this means that in 11gR2 we will be able to see bind variables values for each execution of such SQL statements. Sure, it is still only partitial picture of bind variables world (since we will have bind information only for those statements who run in parallel or run longer that 5 seconds), but it is still more than we had before...

And as examples always make life much easier, here is a simple demonstration:

SQL> var  a1 number;
SQL> var a2 varchar2(30);
SQL> var a3 varchar2(40);
SQL>
SQL> exec :a1 := 1000;


PL/SQL procedure successfully completed.


SQL> exec :a2 := '%A%';


PL/SQL procedure successfully completed.


SQL> exec :a3 := '01-JAN-2010';


PL/SQL procedure successfully completed.


SQL> set timing on


SQL> select count(*)
  2  from dba_tables t, dba_objects o
  3  where o.object_id > :a1
  4  and t.table_name like :a2
  5  and o.created > to_date(:a3,'DD-MON-YYYY')
  6  and o.object_type='TABLE'
  7  and o.owner=t.owner
  8  and o.object_name=t.table_name;


  COUNT(*)                                                                      
----------                                                                      
      2163                                                                      


SQL> set long 1000
SQL> set lines 130
SQL> select  xmltype(binds_xml) from v$sql_monitor
  2  where sql_text like 'select count(*)%from dba_tables t, dba_objects o%';


XMLTYPE(BINDS_XML)                                                              
--------------------------------------------------------------------------------
<binds>                                                                         
  <bind name=":A1" pos="1" dty="2" dtystr="NUMBER" maxlen="22" len="2">1000</bind>                                                                              
  <bind name=":A2" pos="2" dty="1" dtystr="VARCHAR2(32)" maxlen="32" csid="38" len="3">%A%</bind>                                                               
  <bind name=":A3" pos="3" dty="1" dtystr="VARCHAR2(128)" maxlen="128" csid="38" len="11">01-JAN-2010</bind>                                                    
</binds>                                                                        
                                                                                
As you can see, since running this SQL took a bit more than 5 seconds, its execution as well as all bind variable values passed for this execution can be instantly observed in v$sql_monitor.


Note #1: It is possible to change the 15 minutes bind capture interval by setting an undocumented parameter “_cursor_bind_capture_interval”, (though it can have a large impact on system performance), i.e 
ALTER SYSTEM SET "_cursor_bind_capture_interval" = 30; -- capture every 30 seconds 


Note #2: v$sql_monitor requires a licenses for Oracle Diagnostics & Tuning Packs  


Comments are most welcome..


Merav Kedem,
Founder of
DB OPTimize
Oracle Performance Tuning & DBA Consulting






Friday, October 21, 2011

AWR Formatter - worth a try

I have bumped into one of Tom Kyte's posts, recommending an AWR Formatter plug-in, written by Tyler Muth.
As Tom Kytes said: "It's pretty cool - works as a Chrome plugin - and it makes an AWR report a little more 'friendly' to use. It creates hot links for many of the wait events (so you know what they mean) and it summarizes up a lot of stuff - making the AWR report a lot more "interactive". Check it out and give him feedback on it if you have time."

Merav Kedem,
Founder of
DB OPTimize
Oracle Performance Tuning & DBA Consulting