Wednesday, March 3, 2010

Learning About Performance

I've found myself at a certain disadvantage lately, specifically with regard to understanding tuning. I don't like being at a disadvantage...I'm competitive that way.

I understand, at a very high level, what needs to be done, but I couldn't prove my theories. Well, I could, sort of, but it wasn't necessarily a reasoned or logical approach; I just tried all possible permutations. There's a certain time constraint with that method, and when you're talking about huge volumes of data (i.e. data warehouses), time is short.

I've decided it's time to change that, so I'll begin to peck away ever so slowly.

In Usage Tracking, there is a table called S_ETL_TIME_DAY. It's a sister table to S_ETL_DAY, which is your everyday time dimension. S_ETL_TIME_DAY breaks a single day down into hours and minutes, which means there are 1440 records (24 * 60).
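
A quick sanity check of that math (assuming the table is populated one row per minute, as described):

SELECT COUNT(*) total_rows,                -- expect 1440
       COUNT(DISTINCT hours) distinct_hours -- expect 24
FROM s_etl_time_day;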

Somewhere I saw the following SQL:
SELECT DISTINCT hours
FROM s_etl_time_day
which just returns the hours in the day (24 rows). I've pondered whether needing DISTINCT here indicates a bug, but it seems fairly innocuous.
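
One caveat worth noting: neither this query nor the DUAL version below guarantees row order (the HASH UNIQUE in the plan further down certainly doesn't), so if the consumer expects 0 through 23 in sequence, it needs an explicit ORDER BY:

SELECT DISTINCT hours
FROM s_etl_time_day
ORDER BY hours;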

What about this though?
SELECT rownum - 1 hours
FROM dual
CONNECT BY LEVEL <= 24;
Which one is faster?

Let's try an explain plan

Query 1
S_NQ_SCHED@TESTING>EXPLAIN PLAN FOR
2 SELECT rownum - 1 hours
3 FROM dual
4 CONNECT BY LEVEL <= 24;

Explained.

Elapsed: 00:00:00.05
S_NQ_SCHED@TESTING>@EXPLAIN

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
Plan hash value: 1731520519

------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |     1 |     2   (0)| 00:00:01 |
|   1 |  COUNT                        |      |       |            |          |
|*  2 |   CONNECT BY WITHOUT FILTERING|      |       |            |          |
|   3 |    FAST DUAL                  |      |     1 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(LEVEL<=24)

15 rows selected.

Query 2
S_NQ_SCHED@TESTING>EXPLAIN PLAN FOR
2 SELECT DISTINCT hours
3 FROM s_etl_time_day;

Explained.

Elapsed: 00:00:00.04
S_NQ_SCHED@TESTING>@explain

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
Plan hash value: 878743397

-------------------------------------------------------------------------------------
| Id  | Operation          | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                |    24 |    72 |     5  (20)| 00:00:01 |
|   1 |  HASH UNIQUE       |                |    24 |    72 |     5  (20)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| S_ETL_TIME_DAY |  1440 |  4320 |     4   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

9 rows selected.
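
For reference, the @explain script used above is presumably just a thin wrapper around DBMS_XPLAN, something like:

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY);
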
Sadly, about the only things I can usually spot in an explain plan are whether an index is being picked up and whether there's a nested loop. I'm not going to worry about all of it now...this is just a start, more to get me in the habit.

Next, I set up tracing, since "reading" the trace file is so much easier now.
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HOURS';

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';
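
After both queries are run, tracing gets turned off and the file pulled through tkprof. A rough sketch of those steps; the trace file name below is made up, but yours will contain the HOURS identifier:

ALTER SESSION SET EVENTS '10046 trace name context off';

-- where the trace file lands (10g; on 11g, v$diag_info is easier)
SELECT value
FROM v$parameter
WHERE name = 'user_dump_dest';

Then, from the OS, something like:

tkprof testing_ora_12345_HOURS.trc hours_tkprof.txt sys=no sort=exeela
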
Here's the output:
Query 1 | Query 2
What does it all mean? I'm sure you know or understand it better than I do. For me, I just need to create the habit. I'll read (more) about the specifics of the explain plan and the output from tkprof. If you want to explain, feel free. If you want to do it in your own post, link it up, or you can write it here...I'd be happy to host it.
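
For the actual "which is faster?" question, one brute-force approach (in the asktom spirit) is to run each statement in a loop and time it. A rough sketch; the 10,000 iteration count is arbitrary, and a single-session loop like this says nothing about concurrency:

SET SERVEROUTPUT ON
DECLARE
  l_start NUMBER;
  l_dummy NUMBER;
BEGIN
  -- DBMS_UTILITY.GET_TIME returns elapsed centiseconds
  l_start := DBMS_UTILITY.GET_TIME;
  FOR i IN 1 .. 10000 LOOP
    SELECT COUNT(*) INTO l_dummy
    FROM (SELECT DISTINCT hours FROM s_etl_time_day);
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('DISTINCT:   '
    || (DBMS_UTILITY.GET_TIME - l_start) || ' cs');

  l_start := DBMS_UTILITY.GET_TIME;
  FOR i IN 1 .. 10000 LOOP
    SELECT COUNT(*) INTO l_dummy
    FROM (SELECT rownum - 1 hours FROM dual CONNECT BY LEVEL <= 24);
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('CONNECT BY: '
    || (DBMS_UTILITY.GET_TIME - l_start) || ' cs');
END;
/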

8 comments:

  1. > I understand, at a very high level, what needs to be done, but I couldn't prove my theories. Well, I could, sort of, but it wasn't necessarily a reasoned or logical approach; I just tried all possible permutations.

    My apologies if this is teaching you to suck eggs, but it sounds like these might help the approach side of things: Brilliant performance articles by Cary Millsap

    ReplyDelete
  2. I have no idea about your application, but if this will be run by more than one session at the same time, then before drawing conclusions from a single-session test, you should run both statements with the number of sessions that will actually be running them.

    My vote is for the DUAL one, which is simplest and doesn't depend on the size of the table you are querying from.

    ReplyDelete
  3. I'd second Coskan. Especially as FAST DUAL is really fast ;-)
    In your particular case, it might be sufficient to get v$sesstat before and after each statement and calculate the deltas, to see what the session was doing (a sketch of this appears after the comments). This could give you an even closer insight.

    ReplyDelete
  4. I wouldn't say either would noticeably benefit from tuning for a single call/session. If there were LOTS of calls it might be worth it.

    I suspect that in a very high concurrency environment (lots of sessions running the same query concurrently) you'd be more worried about latch contention. When a block is queried it needs to be latched momentarily (for the duration of the read), as you can't have another session (on another CPU) writing to the block at the same time.

    Because DUAL doesn't really exist as a block, it doesn't need that protection, so using a DUAL query can reduce latch contention.

    ReplyDelete
  5. Here's another vote for DUAL. I'm 99% sure it will scale better than the query against the table. If I have time, I'll test tomorrow at work with Quest's Benchmark Factory.

    ReplyDelete
  6. @rnm1978

    Now I know you aren't reading...that often anyway. :)

    ReplyDelete
  7. @coskan, @gary, @martin, @enrique

    That's my point. I would bet all of you have a much better understanding of the whys involved...I don't. I would probably pick DUAL, mostly because it will always be there.

    I wish I could say, emphatically and with examples, that DUAL is the single best way to go. I can't, because I don't understand things at that level yet.

    I don't like being in that situation...at all. Much of my knowledge is based on reading and internalizing asktom for years...but repeating the tests that he performs, or even creating tests like them, is for now just out of my grasp. That's what I want to change.

    chet

    ReplyDelete
  8. @oraclenerd of course I'm reading ... just a bad memory :)

    ReplyDelete
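
For anyone who wants to try the v$sesstat approach from comment 3, here's a minimal sketch using v$mystat for the current session (assumes SELECT access to the v$ views). Run it before and after the statement under test and subtract the values:

SELECT sn.name, ms.value
FROM v$mystat ms, v$statname sn
WHERE sn.statistic# = ms.statistic#
AND sn.name IN ('consistent gets', 'db block gets', 'CPU used by this session');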