Print

Print


Yeah, disk space isn't the issue, I don't think: the reason you
should use a normalized database is because it will make maintenance
easier and code simpler. An un-normalized database will likely lead
to complicated (and thus hard to maintain) code, and make it harder
to maintain the database.

The only reason to make the database un-normalized is if you really
do need it for execution performance. I doubt you do in this case,
but I've generally subscribed to the "don't optimize prematurely"
school of thought. My first priority is clean and easy to maintain
code, unless I've actually got real evidence that I need to sacrifice
this for performance/optimization.

The disk space differences are negligible in contemporary terms.
Clean code and clean design vs. (premature?) optimization is the
issue. Whoever said that the un-normalized schema was simpler or
easier was, in my opinion, wrong; the normalized way is simpler and
ultimately easier, the un-normalized way (theoretically) gives you a
(probably negligible) run-time performance boost (on select; and a
(likely just as negligible) performance hit on update).

--Jonathan

>On 5/22/06, Houghton,Andrew <[log in to unmask]> wrote:
>>  > From: Code for Libraries [mailto:[log in to unmask]] On Behalf
>
>>  > Of Ross Singer
>>  > Sent: 22 May, 2006 11:58
>>  > To: [log in to unmask]
>>  > Subject: Re: [CODE4LIB] Musings on using xISBN in our Horizon
>>  > catalog
>>  >
>>  > Given the size of both datasets, we are still talking about a few
>>  > seconds per query with most currently available SQL databases.  But
>>  > what happens when the size of the catalog doubles, triples,
>quadruples, etc.?
>>
>>  Wait.  A few seconds? You're kidding, right?  An indexed look up of
>the group id
>
>Oops, flagrant typo revealed.  That should have read "... We are still
>talking about a few milliseconds per query..."  If only my fingers
>could read my mind...
>
>>  Just so we're clear, we are talking about, at most, .5M rows that are
>perhaps 50 bytes > wide, with internal data.  That translates to ~25
>megabytes for the table, and the
>>  indexes would be /maybe/ 5 megs each.
>
>There is more than one way to solve a problem.  You are free to use an
>unnormalized database design if that is adequate for your needs.  But
>why would you want a database ~25 MB with 10MB of indexes, when you
>could get the same results from a normalized database design whose
>storage requirements were *roughly* 4 times less and get better
>scalability should your requirements change in the future?
>
>Yes, I can hear the come-back, "disk space is cheap", but someone pays
>for it at some point in time.  I can agree to disagree with using an
>unnormalized database design and as I indicated in my first post, "If
>I were setting up the SQL database, I would probably do the
>following..."
>
>>  > A good database design will go a long way...
>
>Andy.