> From: Code for Libraries [mailto:[log in to unmask]] On > Behalf Of Ross Singer > Sent: 22 May, 2006 11:58 > To: [log in to unmask] > Subject: Re: [CODE4LIB] Musings on using xISBN in our Horizon catalog > > Er, how many records are we talking about? > > I don't really see this table getting too unmanageable. He's > not frbrizing Worldcat this way, after all. > Lets say that the catalog we are talking about has 30,000 items. Also, lets say that xISBN, on average, returns 4 related ISBN's per the given ISBN in the catalog for each of those 30,000 items. Using the ISBN to ISBN relationship table, described in my previous message, we will have 16 rows * 30,000 items for a total of 480,000 rows in the database. While that may not be too unmanageable, depending upon hardware and SQL database being used, if we look at the FRBR to ISBN relationship table, described in my previous message, we will have 4 rows * 30,000 items for a total of 120,000 rows. That is *roughly* 4 times less data that the SQL database has to index and search through per query. Given the size of both datasets, we are still talking about a few seconds per query with most currently available SQL databases. But what happens when the size of the catalog doubles, triples, quadruples, etc.? The tradeoff is that the ISBN to ISBN relationship table uses more storage space which translates to longer database reindexing, backup, etc. times from an operational perspective. It also means that during updates, you have 4 times the number of records to update which will impact performance. Also, if we try to scale the ISBN to ISBN relationship table to a catalog with significantly more items, then it will become unmanageable. When reading Ben's description of the problem he states: "we want to be able to display links to other versions of the same work (audiobooks, large print, possibly translations)." From a database design, you probably want to track "audiobook", "large print", etc. with the item. Using the ISBN to ISBN relationship table design, because the data is unnormalized, we will continue to store duplicate information in the database as we extend the database design: Column 1 Column 2 Column 3 -------- -------- -------- ISBN-1 ISBN-1 book ISBN-1 ISBN-2 audiobook ISBN-1 ISBN-3 large print ISBN-1 ISBN-4 video ISBN-2 ISBN-1 book ISBN-2 ISBN-2 audiobook ISBN-2 ISBN-3 large print ISBN-2 ISBN-4 video ISBN-3 ISBN-1 book ISBN-3 ISBN-2 audiobook ISBN-3 ISBN-3 large print ISBN-3 ISBN-4 video ISBN-4 ISBN-1 book ISBN-4 ISBN-2 audiobook ISBN-4 ISBN-3 large print ISBN-4 ISBN-4 video we could mitigate this duplicate information in the database design by creating a "format" table and use the Foreign Key. However, we still have 12 duplicate foreign keys because the original data was unnormalized. When looking at the FRBR to ISBN relationship table, you have normalized out the "work" and "item" levels and can attach information at either the "work" or "item" levels *without* duplicating information in the database. Column 1 Column 2 Column 3 -------- -------- -------- GROUP-1 ISBN-1 book GROUP-1 ISBN-2 audiobook GROUP-1 ISBN-3 large print GROUP-1 ISBN-4 video Again we could, and probably should, create a "format" table and use a foreign key for Column 3. You can see that as you start to extend the database designs, the ISBN to ISBN relationship table will become unmanageable from a design standpoint and will also cause additional duplicate information to be stored in the database, which will increase storage requirements, etc. A good database design will go a long way... Andy.