Advertisements
Posted by: jsonmez | December 9, 2009

Tracking a Change on a Column

I recently ran across a very interesting problem involving database design.

I have changed the table names to simplify the problem, so let me describe it as such:

I have 2 tables, fruit and vegetable. Each stores whether or not a fruit or vegetable is tasty.  We can ignore the other columns, so it would be like

Now lets say that someone keeps changing the IsTasty setting through the UI of my application and management is demanding the ability to see when someone changed it last and who.  The tricky part here is although we are ignoring the other data on the table, there is other data, and we don’t want to track when any data on the table was changed, just this one column.

Solution 1: Track the data in the table by adding columns to track updated on and updated by.

Drawback here is we are adding 2 extra columns per table, and anywhere else we wanted to do this would require the same.  Imagine if we had more columns in this same table we wanted to track changes on.  Arghh…

Solution 2: Track the data in a separate table for the fruit and vegetable tables for just that column.

This seems to be a little more clean and extensible in the future.  Now we have created two extra tables, and we haven’t added a bunch of extra columns to our existing tables, but if we wanted to track more columns, we would have to add more tables that are copy and pastes of these tables.  These tables are also exactly the same as each other except for the foreign key.

Solution 3: Create a new table that has 2 optional foreign keys

Here we are removing the duplication of the tables that are almost exactly alike.  It will scale, but we have optional foreign keys and it is very hard to do joins.  We also lose knowing what data this table belongs to and automatic referential integrity provided by the database.  (Which is the ability for the database to know what tables are linked together and make sure the referential data is correct.)  We could create a custom constraint, but each time we need a new column we would have to add a new key, and for a different type of column, we would need another table.

Solution 4: Create a meta table to store information about columns in other tables

To me this seems like the best approach.  When a column was changed has nothing to do with a fruit or vegetable.  It is data about data or meta-data.  By creating a table to store meta-data about any table it is very scalable.  We don’t have a foreign key, but we can either create triggers, or use our DAO framework to automatically make inserts to the audit table.

Other options?  Pro’s con’s?  Thoughts?

Advertisements

Responses

  1. John:
    Whatever solution you choose, you will still need smart programmers to make the change when new columns are added. You will still need smart data analysts to come up with a way to provide the reports.

    What difference does it really make? Size of the database certainly matters. Complexity matters, for when you are gone someone will need to pick this up. My vote is for simplicity so that I can hand this off to the next guy without much training. That makes it solution 2 for me.

  2. The real problem here is that you have Fruit and Vegetable as subclasses of, let’s say, EdibleObject, and the IsTasty attribute is one of EdibleObject, not Fruit or Vegetable. You really want to be able to have EdibleObject contain a history (perhaps a sorted collection) of IsTasty, of type AuditableField, which in turn would have a value and some other attributes to track when it was last changed.

    The database design exposes the impedence mismatch problem between the object and relational views of the data. So what you are really trying to develop is a general solution to the implementation of AuditableField in a relational database setting.

    I think your last solution, with the metatable, comes as close as you will get to a clean relational implementation of the problem. (Now if only relational databases would permit the type of complex typing present in OO languages…)

    Your only drawbacks to this solution are:
    no RI to ensure that data in the metatable refers back to an actual table/column combination
    additional coding overhead to storing the AuditableField columns, in the form of
    update triggers: since the general rule is to use these sparingly, I’m not fond of this as a solution
    new DAO code in the application: which is more time consuming to write but is cleaner in some respects from a database position.

    Of course, I also like what tboehm30 says about simplicity – the metatable solution is a good one, but one that will require significant instruction and direction so that the next guy doesn’t completely mess it up.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Categories

%d bloggers like this: