> From: Code for Libraries [mailto:[log in to unmask]] On
> Behalf Of Ross Singer
> Sent: 22 May, 2006 11:58
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] Musings on using xISBN in our Horizon catalog
>
> Er, how many records are we talking about?
>
> I don't really see this table getting too unmanageable. He's
> not frbrizing Worldcat this way, after all.
>
Lets say that the catalog we are talking about has 30,000 items. Also,
lets say that xISBN, on average, returns 4 related ISBN's per the given
ISBN in the catalog for each of those 30,000 items. Using the ISBN to
ISBN relationship table, described in my previous message, we will have
16 rows * 30,000 items for a total of 480,000 rows in the database.
While that may not be too unmanageable, depending upon hardware and SQL
database being used, if we look at the FRBR to ISBN relationship table,
described in my previous message, we will have 4 rows * 30,000 items for
a total of 120,000 rows. That is *roughly* 4 times less data that the
SQL database has to index and search through per query. Given the size
of both datasets, we are still talking about a few seconds per query
with most currently available SQL databases. But what happens when the
size of the catalog doubles, triples, quadruples, etc.?
The tradeoff is that the ISBN to ISBN relationship table uses more
storage space which translates to longer database reindexing, backup,
etc. times from an operational perspective. It also means that during
updates, you have 4 times the number of records to update which will
impact performance. Also, if we try to scale the ISBN to ISBN
relationship table to a catalog with significantly more items, then
it will become unmanageable.
When reading Ben's description of the problem he states: "we want to be
able to display links to other versions of the same work (audiobooks,
large print, possibly translations)." From a database design, you
probably want to track "audiobook", "large print", etc. with the item.
Using the ISBN to ISBN relationship table design, because the data is
unnormalized, we will continue to store duplicate information in the
database as we extend the database design:
Column 1 Column 2 Column 3
-------- -------- --------
ISBN-1 ISBN-1 book
ISBN-1 ISBN-2 audiobook
ISBN-1 ISBN-3 large print
ISBN-1 ISBN-4 video
ISBN-2 ISBN-1 book
ISBN-2 ISBN-2 audiobook
ISBN-2 ISBN-3 large print
ISBN-2 ISBN-4 video
ISBN-3 ISBN-1 book
ISBN-3 ISBN-2 audiobook
ISBN-3 ISBN-3 large print
ISBN-3 ISBN-4 video
ISBN-4 ISBN-1 book
ISBN-4 ISBN-2 audiobook
ISBN-4 ISBN-3 large print
ISBN-4 ISBN-4 video
we could mitigate this duplicate information in the database design
by creating a "format" table and use the Foreign Key. However, we
still have 12 duplicate foreign keys because the original data was
unnormalized.
When looking at the FRBR to ISBN relationship table, you have
normalized out the "work" and "item" levels and can attach information
at either the "work" or "item" levels *without* duplicating information
in the database.
Column 1 Column 2 Column 3
-------- -------- --------
GROUP-1 ISBN-1 book
GROUP-1 ISBN-2 audiobook
GROUP-1 ISBN-3 large print
GROUP-1 ISBN-4 video
Again we could, and probably should, create a "format" table and use a
foreign key for Column 3.
You can see that as you start to extend the database designs, the ISBN
to ISBN relationship table will become unmanageable from a design
standpoint and will also cause additional duplicate information to be
stored in the database, which will increase storage requirements, etc.
A good database design will go a long way...
Andy.
|