Tuesday, April 15, 2008

How Do You Audit?

I'm not necessarily talking about system auditing, which I understand to be pretty much like throwing a switch, I'm talking about table level auditing.

Given the following table:

CREATE TABLE t
(
id NUMBER(15) PRIMARY KEY,
ssn VARCHAR2(9) UNIQUE,
last_name VARCHAR2(40),
first_name VARCHAR2(30),
dob DATE
);
The requirement is to track any changes to everything but the SSN. ID is just a surrogate key.

I typically do something like this:

CREATE TABLE t
(
id NUMBER(15) PRIMARY KEY,
ssn VARCHAR2(9) UNIQUE,
last_name VARCHAR2(40),
first_name VARCHAR2(30),
dob DATE,
create_date DATE,
create_user VARCHAR2(30),
end_date DATE,
end_user VARCHAR2(30)

);
I added the create_ and end_ columns to see who did what when. If someone comes in and decides to change the record, the end_date and end_user are populated (thereby "terminating" the record) and a new record is created with the updated values. That then becomes the "current" record.

I know there are other ways, but I'd like to hear some of your ideas/methods.

Updated
I realized that I should not have put ID as the primary key, I should have used a unique key (SSN) instead but generated a surrogate key (ID). I've updated it to use SSN as the unique key, so the primary key can and will change (it's a surrogate), but the SSN cannot.

Updated II
Then it would fail when you create a new record because the SSN would no longer be unique. Oy, I usually read and re-read anything technically related.

So I'll remove the UNIQUE constraint from SSN and that should do the trick. SSN will be used to find the record of the appropriate person, along with END_DATE IS NOT NULL (a "current" row).

5 comments:

  1. I'm actually a big fan of surrogate keys. SSNs can change you know. I never, ever use a meaningful column as the PK and I never run into this issue.

    Also, looks like you're implying that it's OK to store a SSN in clear text. That would be a very, very bad thing. Make sure it's encrypted or even better, hashed.

    ReplyDelete
  2. I agree Tyler. I would encrypt it at the very least and only allow users with access view the plain-text.

    It was probably a bad example, I should have known better with the Oracle community. ;)

    Let's assume it's encrypted and you didn't have the surrogate key to find the record to update so you had to use SSN (in this bad example anyway). How would you capture changes to the other fields (first name, last name, dob)?

    ReplyDelete
  3. Yes, Use DBMS_CRYPTO or Transparent Data Encryption. TDE needs to be licensed under ASO and it only encrypts the data at the file and log level. It's good for packaged apps because it is hard to get them to change things as quickly as you need it done. Still, if someone gets access to the data inside the DB, they can see it.

    DBMS_Crypto is slick in that they still need to provide a key to unlock/decrypt the value along with a decryption function that you build using DBMS_Crypto.

    You do risk having the table grow a large amount.

    What if you had a sequence generated or some other generated value for the PK. You can derive it from a number of things. For contention issues you might even look at making it a reverse key index if you see a lot of buffer busy waits on the end block of the index. This way you can have high concurrency and be able to use a sequence.

    Then have an auditing table that you insert the old values and new values and maybe even what field changed. You can then log when it changed and have a foreign key that links back to the primary key of the main record table so you could always go back and build a history. If you foresee you are doing a lot of joins, build a and index on the FK column and see if Oracle uses your index : )...

    ReplyDelete
  4. In the past I used DBMS_CRYPTO along with a modified VPD (Standard Edition). I created a view on top of the table that called the column and the column the the decrypt function and only if the user was authorized could they view the decrypted column, otherwise it just displayed 'XXX-XX-XXXX' or something.

    I have always wondered about having a single audit table and using that key, but I will still need a create_audit_key and a end_audit_key (and possibly an update_audit_key) column.

    Ultimately, if you could build your application in parallel with your datawarehouse, you could just update the old record in your type 2 dimension and create a new current row there, only maintaining one record at a time (thus being able to use the SSN as a PK) in the base table.

    ReplyDelete
  5. Lewis C did some Auditing with streams and change data capture to send the audit information to an audit DB that was out of control of the DBAs. This is much better than triggers which are abused A LOT in auditing situations.

    People wonder why their throughput went down after adding triggers. With CDC it's asynchronous instead of syncronous like triggers.

    ReplyDelete