On Wed, Feb 27, 2013 at 1:42 PM, Andy Kohler wrote:

> I agree with Terry: use a database. Since you're doing multiple queries,
> invest the time up front to import your data in a queryable format, with
> indexes, instead of repeatedly building comparison files...
>
> Another, completely unrelated, possible solution depending on your needs:
> run the records through solrmarc and do your queries via solr?
>
> Good luck... let us know what you eventually decide to do.

After trying a few experiments, it appears that my use case (mostly comparing huge record sets against an even bigger record set on indexed points) is well suited to a relational model.

My primary goal is to help a bunch of libraries migrate to a common catalog, so the main thing people want to know is what data is local to their catalog. Identifying access points and relevant description in their catalog that are not in the master record involves questions like "Give me a list of records where field X occurs more times in our local catalog than in the master record (or that value is missing from the master record -- thank goodness for LEFT JOIN)" so that arrangements can be made.

I'm getting surprisingly good performance, and the convenience of being able to do everything from the command line is nice.

kyle
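
For anyone who wants to try this kind of comparison, here is a minimal sketch of the "field X occurs more times locally than in the master record" question. It is an illustration only, not the schema or queries actually used above: the table names (`local_fields`, `master_fields`), the columns (`record_id`, `tag`, `value`), the sample rows, and the choice of SQLite are all assumptions, and it presumes local and master records have already been matched on a shared identifier.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One row per field occurrence; record_id is assumed to be an
        -- identifier shared by the local and master catalogs.
        CREATE TABLE local_fields  (record_id INTEGER, tag TEXT, value TEXT);
        CREATE TABLE master_fields (record_id INTEGER, tag TEXT, value TEXT);
        -- Index the points being compared.
        CREATE INDEX local_idx  ON local_fields  (tag, record_id);
        CREATE INDEX master_idx ON master_fields (tag, record_id);
        """
    )
    conn.executemany(
        "INSERT INTO local_fields VALUES (?, ?, ?)",
        [
            (1, "650", "Cats"),
            (1, "650", "Dogs"),
            (2, "650", "Birds"),
            (3, "650", "Fish"),
            (3, "590", "Local note"),
        ],
    )
    conn.executemany(
        "INSERT INTO master_fields VALUES (?, ?, ?)",
        [
            (1, "650", "Cats"),
            (2, "650", "Birds"),
        ],
    )
    return conn


def fields_more_local(conn, tag):
    """Return (record_id, local_count, master_count) for records where
    `tag` occurs more often locally than in the master record."""
    sql = """
        SELECT l.record_id, l.n, COALESCE(m.n, 0)
        FROM (SELECT record_id, COUNT(*) AS n
              FROM local_fields WHERE tag = ? GROUP BY record_id) AS l
        -- LEFT JOIN keeps local records whose master record lacks the
        -- field entirely; their master count comes back NULL, read as 0.
        LEFT JOIN (SELECT record_id, COUNT(*) AS n
                   FROM master_fields WHERE tag = ? GROUP BY record_id) AS m
          ON m.record_id = l.record_id
        WHERE l.n > COALESCE(m.n, 0)
        ORDER BY l.record_id
    """
    return conn.execute(sql, (tag, tag)).fetchall()


if __name__ == "__main__":
    for row in fields_more_local(build_demo_db(), "650"):
        print(row)
```

With the sample rows, record 1 is reported because it has two 650 fields locally and one in the master, and record 3 is reported because the master has no 650 at all; record 2 matches and is left out. The same query text can be run directly from a command-line SQL shell against a real import.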