This is why I think we should figure out smart ways to manage facets independently of Lucene index fields. Solr populates a facet by setting up a bitset for every value found in a given index field, and it uses those bitsets to filter query result sets by deriving an intersection set. We can extend that functionality by populating and maintaining bitsets based on external data sources that can map to Lucene document ids. This allows us to update the bitset (relatively cheap) without having to update the index (relatively expensive).

We could use this for those attributes that change relatively often, like circ status or user-applied tags (or even full-text in Roy's environment). When something like this changes, we look up the Lucene document id and add it to or delete it from the relevant bitset. We might also update a quick external datastore like a MySQL db that's dedicated to handling these dynamic facets, so we can rebuild the facet from scratch when we need to. That way we avoid having to refetch and reindex the bib record into Lucene via Solr every time a dynamic attribute changes (since you can't update a single field in a Lucene index).

I'm assuming that those frequent updates to the Lucene index are enough overhead to be worth avoiding; that will need to be confirmed by practice. My experience with Solr is in a project where we're indexing full text along with bib metadata, so reindexing (potentially hundreds of pages of text) is something we definitely want to avoid. How do people expect this to play out with bib records without full text?

Peter

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Roy Tennant
Sent: Friday, January 19, 2007 11:29 AM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Limiting by availability (was Re: [CODE4LIB] Getting data from Voyager into XML?)
On 1/19/07 9:26 AM, "Steve Toub" <[log in to unmask]> wrote:

> Also, as a possible sweet-spot, I'm wondering if it's practical to do
> post-search winnowing by availability after doing the FCLA-style
> real-time query, by doing indexing on the fly of the responses from
> the real-time queries for that particular search.

Interesting idea if done on a screen-by-screen basis. That is, you simply don't display to the user those that aren't available. I've thought about this same strategy for a "full-text" filter. That is, you bring back all the results, but if the user only wants items that have full-text available, you filter out those that don't as you build the screen display. This of course has a hit on response time, but with APIs that allow multiple-item lookups, it is at least not as bad as it could be.

Roy
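
For anyone who wants to see the two approaches in this thread side by side, here is a minimal sketch in plain Python. It does not use any Solr or Lucene API. The names `DynamicFacet`, `to_bits`, `to_ids`, `visible_page`, and the `lookup_available` callback are all hypothetical and exist only for illustration. The sketch assumes document ids are small non-negative integers and that some batch lookup can report which of a list of ids are currently available.

```python
class DynamicFacet:
    """One facet value (e.g. "available") kept as a bitset outside the index.

    Bit i is set when the document with id i currently has this value.
    A Python int is arbitrary-precision, so it can stand in for the bitset.
    """

    def __init__(self):
        self.bits = 0

    def add(self, doc_id):
        # Cheap update: set one bit, no reindexing of the record.
        self.bits |= 1 << doc_id

    def remove(self, doc_id):
        # Cheap update: clear one bit.
        self.bits &= ~(1 << doc_id)

    def filter(self, result_bits):
        # Intersect a query's result bitset with this facet's bitset.
        return result_bits & self.bits


def to_bits(doc_ids):
    """Pack an iterable of document ids into a bitset."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


def to_ids(bits):
    """Unpack a bitset into a sorted list of document ids."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


def visible_page(result_ids, lookup_available, page_size=10):
    """Build one screen of results, dropping unavailable items at display time.

    lookup_available is a hypothetical multiple-item lookup: it takes a list
    of ids and returns the set of those ids that are currently available.
    """
    page = []
    for start in range(0, len(result_ids), page_size):
        batch = result_ids[start:start + page_size]
        available = lookup_available(batch)  # one call per batch, not per item
        page.extend(d for d in batch if d in available)
        if len(page) >= page_size:
            break
    return page[:page_size]
```

With the first approach, a circ status change becomes a call to `add` or `remove` on the relevant facet, and a search limited by availability becomes `to_ids(facet.filter(to_bits(result_ids)))`. With the second, the response-time cost shows up as the extra `lookup_available` calls needed whenever a batch contains unavailable items.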