I understand, from a very high level of what needs to be done, but I could not prove my theories. Well, I could, sort of, but it wasn't necessarily a reasoned or logical approach. I just tried all possible permutations. There's a certain time constraint with that method and when you're talking about huge volumes of data (i.e. datawarehouses), time is short.
I've decided it's time to change that, so I'll begin to peck away ever so slowly.
In Usage Tracking, there is a table called S_ETL_TIME_DAY. It's a sister table to S_ETL_DAY which is your everyday time dimension. S_ETL_TIME_DAY breaks down a single day into hours and minutes which means there are 1440 records (24*60*60).
Somewhere I saw the following SQL:
SELECT DISTINCT hourswhich just returns the hours in the day (24). I've pondered on whether DISTINCT is a bug, but it seems fairly innocuous here.
FROM s_etl_time_day
What about this though?
SELECT rownum - 1 hoursWhich one is faster?
FROM dual
CONNECT BY LEVEL <= 24;
Let's try an explain plan
Query 1
S_NQ_SCHED@TESTING>EXPLAIN PLAN FORQuery 2
2 SELECT rownum - 1 hours
3 FROM dual
4 CONNECT BY LEVEL <= 24;
Explained.
Elapsed: 00:00:00.05
S_NQ_SCHED@TESTING>@EXPLAIN
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
Plan hash value: 1731520519
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 (0)| 00:00:01 |
| 1 | COUNT | | | | |
|* 2 | CONNECT BY WITHOUT FILTERING| | | | |
| 3 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(LEVEL<=24)
15 rows selected.
S_NQ_SCHED@TESTING>EXPLAIN PLAN FORSadly, about the only thing I can usually see in an explain plan is whether an index is being picked up or if there is a nested loop. I'm not going to worry about all of it now...this is just a start. More to get me in the habit.
2 SELECT DISTINCT hours
3 FROM s_etl_time_day;
Explained.
Elapsed: 00:00:00.04
S_NQ_SCHED@TESTING>@explain
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
Plan hash value: 878743397
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 24 | 72 | 5 (20)| 00:00:01 |
| 1 | HASH UNIQUE | | 24 | 72 | 5 (20)| 00:00:01 |
| 2 | TABLE ACCESS FULL| S_ETL_TIME_DAY | 1440 | 4320 | 4 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
9 rows selected.
Next I set up tracing since "reading" the file is so much easier now.
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HOURS';Here's the output:
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';
Query 1 | Query 2 |
> I understand, from a very high level of what needs to be done, but I could not prove my theories. Well, I could, sort of, but it wasn't necessarily a reasoned or logical approach. I just tried all possible permutations.
ReplyDeleteMy apologies if this is teaching you to suck eggs, but sounds like these might help the approach side of things: Brilliant performance articles by Cary Millsap
I have no idea about your application but
ReplyDeleteIf this will be running by more than one session at the same time , before conclusion with one session test, you should run both statements with number of session which will be actively running them.
My vote is to dual one which is simplest and doesn't depend on the size of the table you are querying from.
I'd second Coskan. Especially as FAST DUAL is really fast ;-)
ReplyDeleteIn <our particular case, It might be sufficient to get v$sesstat before and after each statement and calculate the deltas (to see what the session was doing). This could give you an even closer insight.
I wouldn't say either would noticably benefit from tuning for a single call/session. If there were LOTS of calls it may be worth it.
ReplyDeleteI suspect in a very high concurrency environment (lots of sessions running the same query concurrently) you'd be more worried by latch contention. When a block is queried it needs to be latched momentarily (for the duration of the read) as you can't have another session (on another CPU) writing to the block at the same time.
Because DUAL doesn't really exist as a block, it doesn't need that protection so using a DUAL query can reduce latch contention.
Here's another vote for DUAL. I'm 99% sure it will scale better than the query against the table. If I have time, I'll test tomorrow at work with Quest's Benchmark Factory.
ReplyDelete@rnm1978
ReplyDeleteNow I know you aren't reading...that often anyway. :)
@coskan, @gary, @martin @enrique
ReplyDeleteThat's my point. I would bet all of you have a much better understanding of the why's involved...I don't. I would probably pick dual, mostly because it will always be there.
I wish I could say emphatically and with examples, that dual is the single best way to go. I can't yet because I don't understand at that level yet.
I don't like being in that situation...at all. Much of my knowledge is based on reading and internalizing asktom for years...but to repeat the tests that he performs, or to even create the tests that he performs is just now out of my grasp. That's what I want to change.
chet
@oraclenerd of course I'm reading ... just a bad memory :)
ReplyDelete