On 5/22/06, Houghton,Andrew <[log in to unmask]> wrote:
> On 5/22/06, Houghton,Andrew <[log in to unmask]> wrote:
> > > From: Code for Libraries [mailto:[log in to unmask]] On Behalf
> > > Of Ross Singer
> > > Sent: 22 May, 2006 11:58
> > > To: [log in to unmask]
> > > Subject: Re: [CODE4LIB] Musings on using xISBN in our Horizon
> > > catalog
> > >
> > > Given the size of both datasets, we are still talking about a few
> > > seconds per query with most currently available SQL databases. But
> > > what happens when the size of the catalog doubles, triples,
> quadruples, etc.?
> > Wait. A few seconds? You're kidding, right? An indexed look up of
> the group id
> Oops, flagrant typo revealed. That should have read "... We are still
> talking about a few milliseconds per query..." If only my fingers
> could read my mind...
> > Just so we're clear, we are talking about, at most, .5M rows that are
> perhaps 50 bytes > wide, with internal data. That translates to ~25
> megabytes for the table, and the
> > indexes would be /maybe/ 5 megs each.
> There is more than one way to solve a problem. You are free to use an
> unnormalized database design if that is adequate for your needs. But
> why would you want a database ~25 MB with 10MB of indexes, when you
> could get the same results from a normalized database design whose
> storage requirements were *roughly* 4 times less and get better
> scalability should your requirements change in the future?
> Yes, I can hear the come-back, "disk space is cheap", but someone pays
> for it at some point in time. I can agree to disagree with using an
> unnormalized database design and as I indicated in my first post, "If
> I were setting up the SQL database, I would probably do the
I don't disagree with these points, and I think you missed mine. ;)
I earn my salary living by the mantra "normalize 'til it hurts,
denormalize 'til it works", and I don't think anyone who has inherited
a badly structured database /could/ disagree.
My point wasn't about relative cost of one approach over the other,
but that the storage and indexing should not be an optimization
consideration today. If designed well, it shouldn't make any
difference, beyond the literal SQL query level, how the database is
laid out. If the table structure decided upon is the only possible
way to generate the output required by the application, then there's
something seriously wrong with the data model as designed.
To get back to Horizon, though, can the catalog be queried with a
combination of ISBN and format, or at least queried quickly by ISBN
and filtered on format in a machine-friendly way? If so, I wouldn't
even think of storing the type data in the xISBN generated linking
database. That's one more piece of data to be wrong and/or stale, and
duplication for the sake of premature optimization (IMHO :).
> > > A good database design will go a long way...
I couldn't agree more.
[log in to unmask]
GPLS -- PINES Development