Tom's mantra:
- You should do it in a single SQL statement if at all possible.
- If you cannot do it in a single SQL Statement, then do it in PL/SQL.
- If you cannot do it in PL/SQL, try a Java Stored Procedure.
- If you cannot do it in Java, do it in a C external procedure.
- If you cannot do it in a C external routine, you might want to seriously think about why it is you need to do it…
I'm faced with a new datawarehouse load, eventually it will be about 100 million records. If I go the straight SQL way (embedded in PL/SQL), I'll need to create multiple objects to support that method and each time we add a new line of business, we'll have to add more objects and more code.
If I can encapsulte it in PL/SQL, I could eliminate most of the necessary supporting objects, but at a big cost, performance. It would make coding time much quicker and ultimately more maintainable (in my opinion), adding a new line of business would simple by updating the packaged functions with an additional control check.
It's a difficult pill to swallow though. I'm originally an OLTP guy and I can write pretty good SQL. Being so new to datawarehousing concepts I feel like I am cheating somehow. Making that decision of when to move to slow-by-slow seems exceedingly difficult. Hopefully with more experience in the datawarehouse environment it will become easier...but it's still tough.
3 comments:
Is there any reason why you're not using an ETL tool to do this?
DW loader routines are hard to get, especially if you don't have any relevant previous in handling large volumes of data. This is the Voice of Experience.
Cheers, APC
Tom has it exactly right in my opinion.
Whilst it can take a bit more thought to do things in SQL, it's generally better when it comes to performance...and when you're dealing with warehouse type volumes then performance tends to be pretty important.
We had the same issues on the DW I look after, where a number of developers were rolled onto the project over time, often from OLTP PL/SQL backgrounds, and wanting to code things in a "slow by slow" fashion as you put it...trying to explain to them why the set based approach is better was hard work sometimes, but in the end they got it.
"Design Patterns" was one of the central approaches we instigated within our team and that worked well to ensure that we identified and tested set based approaches for achieving certain tasks, which we then ensured that all similar tasks
were coded according to the same design pattern.
I can't think of one case where we've had to resort to row based processing yet, although the flip side is that we've had to come up with some, sometimes convoluted, solutions just to keep it set based.
All good fun though.
@APC
It's a pure PL/SQL environment. Our management (from one of our larger competitors) believes that with the volume of data we will be using PL/SQL will handle the load better. That said, we are soon to begin evaluating OWB. So who knows where it's going.
That's exactly what I'm going through at this time...it would be nice to see how a tool does it to understand some of the concepts behind it.
@jeff
That is the fun part, I agree. I guess that's why I love what I do.
Post a Comment