> So the ideal use case for our xISBN cache is that we would be querying
> only a local database, and that database would only return ISBNs (or bib
> numbers) of other editions which are actually in our catalog.
Very nice idea.
> My guess is that we should have a row for each ISBN in the system, along
> with a column that links that ISBN to some common identifier that will
> symbolically mean e.g. "SQL for Dummies, any version". We could then
> ask "What is the identifier for the first ISBN of the item being
> displayed?" and then do a second query that asks "What other ISBNs in
> our catalog have that identifier?"
My gut instinct is that while this is nice, I'm wondering if for this
particular project it couldn't be simplier. Why not just simply have a
two column table, both being ISBNs. Something like source,related. Then
you simply feed in the results for every hit from the xISBN service.
And of course you might want to refresh this over time, or try to use some
other techniques. (Ie occasionally look for places where source isbn has
related isbn, but related does not have source.).
Of course, there would be a definite advantage to be able to have the
identifier. I don't know what number you'd use. I would think you'd just
have to use a internal id (auto-incrementing or something of that nature).
>
> The only remaining question, I think, is whether we should have one
> table with ISBN as primary key and identifier as a second column (one
> row per ISBN), and then another table with identifier as primary key and
> ISBN as second column (again, one row per ISBN). This could make our
> queries faster at the cost of doubling storage space. On the other
> hand, perhaps we don't need to optimize that far. It depends on how
> mySQL reacts to searching the entire table for a non-primary-key column
> value. I'll have to fiddle with it and find out, unless one of you
> knows already.
>
Nope, sounds like a fiddling thing to me. I'd suspect that a two column
table should be optimized for lookups in either direction since they tend
to be used in joining queries (ie are a relation). But I don't know the
behavior of MySQL by default.
Jon Gorman
|