On 5/22/06, Houghton,Andrew <[log in to unmask]> wrote: > > From: Code for Libraries [mailto:[log in to unmask]] On > > Behalf Of Ross Singer > > Sent: 22 May, 2006 11:58 > > To: [log in to unmask] > > Subject: Re: [CODE4LIB] Musings on using xISBN in our Horizon catalog > > > > Er, how many records are we talking about? > > > > I don't really see this table getting too unmanageable. He's > > not frbrizing Worldcat this way, after all. > > > > Lets say that the catalog we are talking about has 30,000 items. Also, > lets say that xISBN, on average, returns 4 related ISBN's per the given > ISBN in the catalog for each of those 30,000 items. Using the ISBN to > ISBN relationship table, described in my previous message, we will have > 16 rows * 30,000 items for a total of 480,000 rows in the database. > > While that may not be too unmanageable, depending upon hardware and SQL > database being used, if we look at the FRBR to ISBN relationship table, > described in my previous message, we will have 4 rows * 30,000 items for > a total of 120,000 rows. That is *roughly* 4 times less data that the > SQL database has to index and search through per query. Given the size > of both datasets, we are still talking about a few seconds per query > with most currently available SQL databases. But what happens when the > size of the catalog doubles, triples, quadruples, etc.? Wait. A few seconds? You're kidding, right? An indexed look up of the group id for the original ISBN, and then 1 subsequent look up by that group id (should also be indexed) should be on the order of 1 millisecond /total/. Equality index scans are fast in every database worth using. > > The tradeoff is that the ISBN to ISBN relationship table uses more > storage space which translates to longer database reindexing, backup, > etc. times from an operational perspective. It also means that during > updates, you have 4 times the number of records to update which will > impact performance. Also, if we try to scale the ISBN to ISBN > relationship table to a catalog with significantly more items, then > it will become unmanageable. > Just so we're clear, we are talking about, at most, .5M rows that are perhaps 50 bytes wide, with internal data. That translates to ~25 megabytes for the table, and the indexes would be /maybe/ 5 megs each. > When reading Ben's description of the problem he states: "we want to be > able to display links to other versions of the same work (audiobooks, > large print, possibly translations)." From a database design, you > probably want to track "audiobook", "large print", etc. with the item. > Using the ISBN to ISBN relationship table design, because the data is > unnormalized, we will continue to store duplicate information in the > database as we extend the database design: > > Column 1 Column 2 Column 3 > -------- -------- -------- > ISBN-1 ISBN-1 book > ISBN-1 ISBN-2 audiobook > ISBN-1 ISBN-3 large print > ISBN-1 ISBN-4 video > ISBN-2 ISBN-1 book > ISBN-2 ISBN-2 audiobook > ISBN-2 ISBN-3 large print > ISBN-2 ISBN-4 video > ISBN-3 ISBN-1 book > ISBN-3 ISBN-2 audiobook > ISBN-3 ISBN-3 large print > ISBN-3 ISBN-4 video > ISBN-4 ISBN-1 book > ISBN-4 ISBN-2 audiobook > ISBN-4 ISBN-3 large print > ISBN-4 ISBN-4 video > > we could mitigate this duplicate information in the database design > by creating a "format" table and use the Foreign Key. However, we > still have 12 duplicate foreign keys because the original data was > unnormalized. > Can you not search by ISBN and type in the main catalog? (I'm not a Horizon user, so I have no idea.) > When looking at the FRBR to ISBN relationship table, you have > normalized out the "work" and "item" levels and can attach information > at either the "work" or "item" levels *without* duplicating information > in the database. > > Column 1 Column 2 Column 3 > -------- -------- -------- > GROUP-1 ISBN-1 book > GROUP-1 ISBN-2 audiobook > GROUP-1 ISBN-3 large print > GROUP-1 ISBN-4 video > > Again we could, and probably should, create a "format" table and use a > foreign key for Column 3. > > You can see that as you start to extend the database designs, the ISBN > to ISBN relationship table will become unmanageable from a design > standpoint and will also cause additional duplicate information to be > stored in the database, which will increase storage requirements, etc. > > A good database design will go a long way... > > > Andy. > -- Mike Rylander [log in to unmask] GPLS -- PINES Development Database Developer http://open-ils.org