LISTSERV 16.5 - CODE4LIB Archives

On 5/22/06, Houghton,Andrew <[log in to unmask]> wrote:
> > From: Code for Libraries [mailto:[log in to unmask]] On
> > Behalf Of Ross Singer
> > Sent: 22 May, 2006 11:58
> > To: [log in to unmask]
> > Subject: Re: [CODE4LIB] Musings on using xISBN in our Horizon catalog
> >
> > Er, how many records are we talking about?
> >
> > I don't really see this table getting too unmanageable.  He's
> > not frbrizing Worldcat this way, after all.
> >
>
> Lets say that the catalog we are talking about has 30,000 items.  Also,
> lets say that xISBN, on average, returns 4 related ISBN's per the given
> ISBN in the catalog for each of those 30,000 items.  Using the ISBN to
> ISBN relationship table, described in my previous message, we will have
> 16 rows * 30,000 items for a total of 480,000 rows in the database.
>
> While that may not be too unmanageable, depending upon hardware and SQL
> database being used, if we look at the FRBR to ISBN relationship table,
> described in my previous message, we will have 4 rows * 30,000 items for
> a total of 120,000 rows.  That is *roughly* 4 times less data that the
> SQL database has to index and search through per query.  Given the size
> of both datasets, we are still talking about a few seconds per query
> with most currently available SQL databases.  But what happens when the
> size of the catalog doubles, triples, quadruples, etc.?

Wait.  A few seconds? You're kidding, right?  An indexed look up of
the group id for the original ISBN, and then 1 subsequent look up by
that group id (should also be indexed) should be on the order of 1
millisecond /total/.  Equality index scans are fast in every database
worth using.

>
> The tradeoff is that the ISBN to ISBN relationship table uses more
> storage space which translates to longer database reindexing, backup,
> etc. times from an operational perspective.  It also means that during
> updates, you have 4 times the number of records to update which will
> impact performance.  Also, if we try to scale the ISBN to ISBN
> relationship table to a catalog with significantly more items, then
> it will become unmanageable.
>

Just so we're clear, we are talking about, at most, .5M rows that are
perhaps 50 bytes wide, with internal data.  That translates to ~25
megabytes for the table, and the indexes would be /maybe/ 5 megs each.

> When reading Ben's description of the problem he states: "we want to be
> able to display links to other versions of the same work (audiobooks,
> large print, possibly translations)."  From a database design, you
> probably want to track "audiobook", "large print", etc. with the item.
> Using the ISBN to ISBN relationship table design, because the data is
> unnormalized, we will continue to store duplicate information in the
> database as we extend the database design:
>
> Column 1        Column 2        Column 3
> --------        --------        --------
> ISBN-1  ISBN-1  book
> ISBN-1  ISBN-2  audiobook
> ISBN-1  ISBN-3  large print
> ISBN-1  ISBN-4  video
> ISBN-2  ISBN-1  book
> ISBN-2  ISBN-2  audiobook
> ISBN-2  ISBN-3  large print
> ISBN-2  ISBN-4  video
> ISBN-3  ISBN-1  book
> ISBN-3  ISBN-2  audiobook
> ISBN-3  ISBN-3  large print
> ISBN-3  ISBN-4  video
> ISBN-4  ISBN-1  book
> ISBN-4  ISBN-2  audiobook
> ISBN-4  ISBN-3  large print
> ISBN-4  ISBN-4  video
>
> we could mitigate this duplicate information in the database design
> by creating a "format" table and use the Foreign Key.  However, we
> still have 12 duplicate foreign keys because the original data was
> unnormalized.
>

Can you not search by ISBN and type in the main catalog?  (I'm not a
Horizon user, so I have no idea.)

> When looking at the FRBR to ISBN relationship table, you have
> normalized out the "work" and "item" levels and can attach information
> at either the "work" or "item" levels *without* duplicating information
> in the database.
>
> Column 1        Column 2        Column 3
> --------        --------        --------
> GROUP-1 ISBN-1  book
> GROUP-1 ISBN-2  audiobook
> GROUP-1 ISBN-3  large print
> GROUP-1 ISBN-4  video
>
> Again we could, and probably should, create a "format" table and use a
> foreign key for Column 3.
>
> You can see that as you start to extend the database designs, the ISBN
> to ISBN relationship table will become unmanageable from a design
> standpoint and will also cause additional duplicate information to be
> stored in the database, which will increase storage requirements, etc.
>
> A good database design will go a long way...
>
>
> Andy.
>


--
Mike Rylander
[log in to unmask]
GPLS -- PINES Development
Database Developer
http://open-ils.org