On Wed, Feb 27, 2013 at 1:42 PM, Andy Kohler wrote:
> I agree with Terry: use a database. Since you're doing multiple queries,
> invest the time up front to import your data in a queryable format, with
> indexes, instead of repeatedly building comparison files...
> Another, completely unrelated, possible solution depending on your needs:
> run the records through solrmarc and do your queries via solr?
> Good luck... let us know what you eventually decide to do.
After a few experiments, it appears that my use case (mostly comparing huge
record sets against an even bigger set of records on indexed points) is well
suited to a relational model. My main goal is to help a group of libraries
migrate to a common catalog, so what people most want to know is which data
exists only in their local catalog. Identifying access points and relevant
description in their catalog that are not in the master record involves
questions like "Give me a list of records where field X occurs more times in
our local catalog than in the master record (or where the value is missing
from the master record; thank goodness for LEFT JOIN)", so that arrangements
can be made.
I'm getting surprisingly good performance, and being able to do everything
from the command line is convenient.