Wednesday, September 26, 2007
To LOOP or Not To LOOP - Revisited
Initially I went the easy route, writing PL/SQL functions that would populate collections and then do the matching. I tried to tune the functions so that the collection would be populated only once for each line of business and then ordering the result set that would be passed through. In my mind, this would reduce the number of cursors opened. I wasn't if that would make a difference or not, but I was going to give it a try.
Well, it was dog slow. One of the lookups calls was estimated to take days to complete (I only know this because I have gotten into the habit of using dbms_application_info.set_session_longops).
So I rethought it.
I only needed to get the keys right?
I first created a primary key on the staging table (same key that would be used in the dimension). I then created a "work" table that would reference that key and then store the other key that I needed to lookup. I also made the foreign key a unique key (one to one relationship) so that I would know immediately if I did something wrong. I then used straight SQL to populate that table. One pass for all the values that matched and then another pass with those that didn't have a matching record (padded rows).
From days to minutes...2 minutes in fact. So I applied that technique to two other lookup situations.
The code is still pretty straight-forward and adding another line of business is relatively easy (I checked this today and it took all of 10 minutes). All the fancy PL/SQL and collection stuff I had created was thrown out the window.
So I did manage to use straight SQL, the performance is phenomenal and maintenance will be easy.
It even passed muster (albeit grudgingly) with our more seasoned datawarehouse folks. Ultimately, they couldn't complain when it ran in under two hours...
Wednesday, September 19, 2007
To CONSTRAINT or Not to CONSTRAINT
I can understand that constraints my slow down table loads, but with the volume of data we are currently using (100 million row tables), I just don't see that as a huge impact on performance. For me, writing less code is better than any minimal performance gains we might achieve.
For instance, today he told me he wanted to add a new column. The possible values would either be M or G. I asked (I'm lead on the project) him to throw a CHECK constraint on the table limiting the values that can go in that field.
For me, I guess it's a support issue and part database purist issue. Use it for more than a bucket, Oracle's expensive. It forces developers to deal with that and will immediately tell them if they have done something wrong. It also makes support a tad easier as they won't have to wonder what actually goes in the field.
We talked at length about it but never really came to a decision (i.e. I couldn't convince him).
I know that a benchmark test would be the best way to prove either way, but I don't have the time right now to do it. Perhaps when things slow down again I will.
Any suggestions out there?
Tuesday, September 18, 2007
Zero Day is NOT Upon Me
Neither of my two deployments are going through tomorrow, so I have two more weeks to go until another opportunity to screw something up. ;)
Speaking of which...
I watch SpongeBob Squarepants quite a lot with a 5 year old boy in the house. I was watching this episode the other day where Patrick (the starfish and best buddy to SpongeBob) found out his parents were coming over to visit him.
He was not very happy as his parents treated him like he was stupid. SpongeBob told him that he would help out by acting even dumber than Patrick thus making Patrick actually look smart. The act worked, but too well. Patrick, even out of earshot of his parents, kept cracking stupid jokes at the expense of SpongeBob. This resulted in SpongeBob getting mad and reminding Patrick that he was only acting stupid to help him out.
Since I put my sign up (not counting my counter up top, no one here gives me a hard time) everyone has consistently beat me up over it. Today was it for me. Yes, I brought it on myself with the sign and all that, but I'm through with it. My boss mentioned that if I had to reset my sign that I wouldn't really be there to do so as I would have been escorted out the door (he didn't say this in a mean-spirited way...he might be right next to me) and unable to reset.
I started practicing how I would answer questions about my firing in my next interview.
Interviewer: "So, why did you get fired?"
Me: "Well, I screwed up production a couple of times."
Interviewer: "How did you do that?"
Me: "The first time I deployed code from development to production."
Interviewer: "Really?! Why did you have access to production?"
Me: "..."
Me: "On the second occurence, I deployed a bug into production; I changed a non-requirement driven piece of code"
Interviewer: "What about QA? Didn't they do regression testing?"
Me: "..."
I felt a little better.
I've have never been happy about my mistakes, I tried to "man-up" and own them; perspective and context are good things though. I should learn from my mistakes and correct the behavior that led to them, that's all I can really do.
Thursday, September 13, 2007
To LOOP or Not to LOOP
Tom's mantra:
- You should do it in a single SQL statement if at all possible.
- If you cannot do it in a single SQL Statement, then do it in PL/SQL.
- If you cannot do it in PL/SQL, try a Java Stored Procedure.
- If you cannot do it in Java, do it in a C external procedure.
- If you cannot do it in a C external routine, you might want to seriously think about why it is you need to do it…
I'm faced with a new datawarehouse load, eventually it will be about 100 million records. If I go the straight SQL way (embedded in PL/SQL), I'll need to create multiple objects to support that method and each time we add a new line of business, we'll have to add more objects and more code.
If I can encapsulte it in PL/SQL, I could eliminate most of the necessary supporting objects, but at a big cost, performance. It would make coding time much quicker and ultimately more maintainable (in my opinion), adding a new line of business would simple by updating the packaged functions with an additional control check.
It's a difficult pill to swallow though. I'm originally an OLTP guy and I can write pretty good SQL. Being so new to datawarehousing concepts I feel like I am cheating somehow. Making that decision of when to move to slow-by-slow seems exceedingly difficult. Hopefully with more experience in the datawarehouse environment it will become easier...but it's still tough.
The Chicken Almost Came Home to Roost
I have a scheduled deployment next week, mainly just performance improvements. I've managed to get this down from 10-12 hours to 7-8.
At the end of the meeting, the Business questions the need for the performance improvements - we're re-architecting my solution in parallel because mine was just a conversion from SAS (yuk!) - as we'll be live in just 2 months with the new one.
Obviously, I know why they're thinking that.
IT, me, can't be trusted to do it right. Since I've, umm...screwed up a couple of times. Why do this if it's not broken?
I just hung my head low, I knew, and I couldn't really argue with them. I had no ground to stand on. Our PM said that he'd take it back to our manager and let them know.
I went to lunch and thought about this blog entry I would be writing.
I got back from lunch and my PM informed me that this would be going into production...the Business' big boss, Miss VP, said so.
Woohoo! Someone has some semblance of confidence in IT (me)!
I've done everything I could to make sure that I didn't do something silly. We had a peer code review to compare the most recent build against that which was in production. My unit tests were much more thorough. I worked with QA to get them to look at specific points. Let's just hope all goes well.
This is a big test for me. Either I pass and gain some credibility back or fail and lose my job. Wish me luck!
Code Style
I'm sure if I could make the rules, I'd have everyone writing code according to my ways. Power corrupts and all that.
I have learned to accept others' ways though. It's been difficult at times. If the code is readable and it works, I usually just bite my tongue.
Strangely, I love to type. I manually create all of my scripts. I make my code "pretty." I don't use GUI tools to auto-generate table definitions, and I especially don't use tools to format my code.
Toad Formatter, or whatever it's called, is my enemy. I'm sure Toad is not the only one with some sort of auto-formatter and I don't like any of them. For some reason I believe it is an abomination.
When I get to do the technical interview, I scan their resume for SQL*Plus. If I don't see it, I ask them what tools they use. If they don't mention SQL*Plus, I'm skeptical of their abilities...until I get to the five constraints question.
This is not to say that people that use Toad or SQLDeveloper are bad coders or anything, it's just my preference.
So what's your preference?
Tuesday, September 11, 2007
Inline vs. Out-of-line Constraints?
So he came over and we talked about inline vs. out-of-line constraints. I asked him the advantage of out-of-line. He asked me the advantage of inline. I'm sure his answer was better than mine as he is much more articulate.
For me, it's mostly a style thing, except for FOREIGN KEY constraints which, if defined inline, will inherit the data type of the parent column. That makes life easier if you ever need to change the data type of a parent key (not a recommended best practice mind you). I just think it looks prettier (factual based evidence). It's all in one file and I like to see how many constraints I can put on a single table. I'm shooting for at least one per column to save myself time down the road of coding exceptions. Let the database do it's job!
Here's a sample of inline constraints:
DROP TABLE s;
DROP TABLE t;
CREATE TABLE t
(
id NUMBER(10)
CONSTRAINT pk_id PRIMARY KEY,
name VARCHAR2(30)
CONSTRAINT nn_name_t NOT NULL,
age NUMBER(2,0)
CONSTRAINT nn_age_t NOT NULL
CONSTRAINT ck_gtzero_age_t CHECK ( age >= 0 )
);
CREATE TABLE s
(
sid NUMBER(10)
CONSTRAINT pk_sid PRIMARY KEY,
id
CONSTRAINT fk_id_s REFERENCES t ( id ),
something_unique VARCHAR2(30)
CONSTRAINT uq_somethingunique_s UNIQUE
);
And out-of-line Constraints:
DROP TABLE s;
DROP TABLE t;
CREATE TABLE t
(
id NUMBER(10),
name VARCHAR2(30),
age NUMBER(2,0)
);
ALTER TABLE t ADD CONSTRAINT pk_id PRIMARY KEY ( id );
ALTER TABLE t ADD CONSTRAINT nn_name_t CHECK ( name IS NOT NULL );
ALTER TABLE t ADD CONSTRAINT nn_age_t CHECK ( age IS NOT NULL );
ALTER TABLE t ADD CONSTRAINT ck_gtzero_age_t CHECK ( age >= 0 );
CREATE TABLE s
(
sid NUMBER(10),
id NUMBER(10),
something_unique VARCHAR2(30)
);
ALTER TABLE s ADD CONSTRAINT pk_sid PRIMARY KEY ( sid );
ALTER TABLE s
ADD CONSTRAINT fk_id_s FOREIGN KEY ( id ) REFERENCES t ( id );
ALTER TABLE s
ADD CONSTRAINT uq_somethingunique_s UNIQUE ( something_unique );
So, what are the advantages for either method? I couldn’t find much via google or asktom, but I probably just didn't search on the right terms. I find it hard to believe that this topic hasn't come up before.
* Note that I did use 5 constraint types
The Good Manager
This is something I think about quite often. I am in a very chaotic, immature organization currently. The process to deploy code changes about every other day and of course non of it is documented. Then there's the fact that I had complete control at my previous job, I was the DBA, Architect, Web Developer and Designer (suck at that), and most importantly Database Developer. I had a very good manager who just literally let me run wild (within reason of course).
For me, that was a perfect situation. I felt I was under-utilized at my previous job and that was the perfect opportunity to flex my muscles. I learned a great deal there and I am forever thankful for that.
One of the big reasons I took my current job was because of the chaos and the immaturity of the IT organization. There are countless opportunities to help shape the future, to build the foundation. I'd also get to experience life in the for-profit corporate world where performance is rewarded financially. There's also significant room to advance relatively quickly compared to more established environments.
I have learned things on the technical side, but far and away my biggest gain in knowledge is in how to do software development in a team environment and the peculiar politics of a company.
I have my manager to thank for that. He is a former military officer who attended one of the military academies. He has worked in our industry for a number of years and is our subject matter expert on the financial side of things.
- He gives us (developers) the opportunity to voice our opinions.
- He gives us a view into the politics.
- He gives us the big picture view.
- He is fair.
- He does not do things just because that's the way they're done. He fights those battles so that we don't have to do it the wrong way.
- He backs us up.
- Shit doesn't roll downhill with him.
If he ever decided to leave, I might just have to follow him.
The Countdown Timer
Now I'll have a reminder at work and on the blog not to touch that which is out of scope.
Monday, September 10, 2007
Oracle Tools I've Used
SQL*Plus
APEX - Application Express (formerly HTMLDB)
JDeveloper
SQL Developer
Reports Builder
OC4J
Discoverer
Oracle Server 8i, 9i, 10g (Windows)
Oracle Application Server 9i, 10g (Windows)
rman
oradim
lsnrctl
tkprof
Coming Soon:
Oracle Warehouse Builder
Database Features:
Java
Object Types
Workflow
Advanced Queueing (in conjunction with Workflow)
Heterogenous Services
Features I'd like to use:
Change Data Capture
XML
Spatial
interMedia
Regular Expressions
Database Supplied Packages:
DBMS_OUTPUT
DBMS_LOB
UTL_FILE
DBMS_APPLICATION_INFO
DBMS_CRYPTO
DBMS_UTILITY
DBMS_METADATA
DBMS_EPG
DBMS_HS_PASSTHROUGH
DBMS_JAVA
DBMS_JOB
DBMS_LOCK
DBMS_MVIEW
DBMS_OBFUSCATION_TOOLKIT
DBMS_RANDOM
DBMS_SESSION
DBMS_SQL
DBMS_STATS
DBMS_XDB
DBMS_XPLAN
HTMLDB_APPLICATION
HTMLDB_ITEM
HTMLDB_UTIL
HTP
OWA_COOKIE
UTL_MAIL
UTL_RAW
UTL_SMTP
WPG_DOCLOAD
Packages I'd like learn to use (10g):
UTL_DBWS
DBMS_CDC_PUBLISH
DBMS_CDC_SUBSCRIBE
DBMS_DATA_MINING
DBMS_DATAPUMP
DBMS_SCHEDULER
DBMS_OLAP
DBMS_PIPE
DBMS_STREAMS
DBMS_STREAMS_ADM
DBMS_STREAMS_AUTH
DBMS_STREAMS_MESSAGING
Wednesday, September 5, 2007
Working with the Business
Probably one of the biggest reasons I got into IT was because I didn't like the control that IT (seemingly) had over me. I wanted that kind of control, but I also wanted to share it.
I have tried to carry my end user mentality with me to IT. I try and go out of my way to learn the business; from reading procedures to just sitting down with them and discussing what they do. This was fairly easy to do in the smaller companies I've worked with, but now that I am in a 3,000+ person company, it has gotten a bit more difficult.
I read an article by Steve Jones about getting closer to your business. I couldn’t agree more.
I would love (and have asked) to be able to spend more time with the Business folks. Perhaps shadowing them for a week, or just writing reports so I can get a better feel for what they do with the data/information we provide them. Better yet, make it part of the on-boarding for new employees. Spend 6 weeks with various departments or something like that. Some companies have management programs that do just that.
Anyway, I believe time spent with the Business makes me a better developer and hopefully builds up a good working relationship (trust) between IT and the Business.
Business Logic: In the Database or in the Application
Today I read a post from Dan Linstedt titled "How Data Models can Impact Business." I followed the rather lengthy, but descriptive post, until I got to this part:
"Ok, I kind-of buy it, but what about Referential Integrity, when should that be enforced?
In two places:
1) When the data is captured within the application capturing it - it would clean up a LOT of these source systems, and put much more rigorous business logic (and cleaner data) into the source systems to begin with.
2) When the data is "released" for users, to reports, to screens, to output. This is when reusable common services / routines for getting data out are helpful. They implement the referential integrity in the application layer."
Now, I haven't been in IT all that long (5+ years now), but I put as much business logic into the database as I possibly can. I use the front end (APEX, woohoo!), to handle row color or something along those lines.
If you start with a good, flexible model, have a strong database team (DBAs, Developers) and you build a good API to your physical model (no INSERT, UPDATE or DELETE to any users, the only entry point to your tables being your API), flexibility and maintainability should not be a problem. Making changes should not entail a monumental effort. I suppose if the application in Mr. Linstedt's article is the only point of entry into the tables, I probably wouldn't really disagree (he's just moved it from my database API to his application), but then you have a giant bucket. Why not just use what you have paid for and build it in the database? That way, more than one application can use the same API over and over.