I built a Queryset class using pysolr that does this for one of my projects. It's not available as a standalone package, but the code is here:

https://github.com/unt-libraries/catalog-api/blob/master/django/sierra/utils/solr.py

Django-Haystack does this as well, but I'm not sure how usable Haystack is outside of Django. In fact, what I built here was an attempt to re-implement parts of Haystack's SearchQuerySet interface using something a bit more barebones. Haystack's interfaces are perfect for use in Django where you want to use Solr in place of Django models (e.g., using a Solr queryset in place of Django's QuerySets), but I found in this particular case that it was overkill for my needs and actually added quite a bit of overhead to Solr searches. I was able to shave a few hundred milliseconds off search requests this way. And since this is in the context of a web API, that was important for me.

I guess it works about like you would think. After you instantiate a utils.solr.Queryset object (passing connection details, a page_by parameter, and the kwargs you want to send to Solr), you access individual results using the Queryset object like you would a list. Behind the scenes, it sends queries to Solr as needed to fetch the result (or results) you're accessing and caches the last result set. It only sends a query when you try to access a result outside the cached result set.
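
To give a rough idea of the pattern, here is a minimal sketch. It is not the code from the repository linked above; the class name LazySolrResults and its arguments are made up for illustration. It assumes only that the connection object has a pysolr-style search(q, **kwargs) method whose return value exposes .docs (the list of documents on that page) and .hits (the total number of matches), which is what pysolr.Solr.search gives you.

```python
class LazySolrResults:
    """List-like view over a Solr result set that fetches one page at a time.

    Illustrative sketch only. `conn` is assumed to behave like a pysolr.Solr
    instance: conn.search(q, **kwargs) returns an object with .docs and .hits.
    """

    def __init__(self, conn, q="*:*", page_by=100, **search_kwargs):
        self._conn = conn
        self._q = q
        self._page_by = page_by
        self._kwargs = search_kwargs
        self._cache = None       # documents from the most recently fetched page
        self._cache_start = 0    # index of the first document in the cache
        self._hits = None        # total number of matches, as reported by Solr

    def _fetch(self, start):
        # One round trip to Solr: ask for `page_by` rows beginning at `start`.
        results = self._conn.search(
            self._q, start=start, rows=self._page_by, **self._kwargs
        )
        self._cache = list(results.docs)
        self._cache_start = start
        self._hits = results.hits

    def __len__(self):
        if self._hits is None:
            self._fetch(0)
        return self._hits

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("this sketch only supports integer indexes")
        if index < 0:
            index += len(self)
        if index < 0 or index >= len(self):
            raise IndexError("result index out of range")
        offset = index - self._cache_start
        if self._cache is None or not 0 <= offset < len(self._cache):
            # Cache miss: fetch the page that contains the requested index.
            page_start = (index // self._page_by) * self._page_by
            self._fetch(page_start)
            offset = index - page_start
        return self._cache[offset]
```

With pysolr you would pass in something like pysolr.Solr(your_core_url) as conn. Because the class defines __getitem__ and raises IndexError past the end, a plain "for doc in results" loop walks the whole result set and only hits Solr once per page.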

Some of the filters still need some work, and it *is* barebones--features like highlighting and faceting aren't implemented at all--but it works for what I use it for, and it shows how you might go about abstracting Solr paging logic with pysolr. (FWIW I haven't tried any of the other numerous Python modules for interacting with Solr so I'm not sure if others do something similar...)

Jason


> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Tod Olson
> Sent: Thursday, September 01, 2016 5:28 AM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] python for solr
> 
> Exactly! The question is whether there is a python solr library that
> provides a layer of abstraction over that paging logic.
> 
> -Tod
> 
> Sent from the æther.
> 
> > On Sep 1, 2016, at 04:59, Andrew Hankinson
> <[log in to unmask]> wrote:
> >
> > Solr itself has an internal limit to the number of results you can
> return on a single page (I think it is 1000) and AFAIK always returns a
> paged result. For speed and memory usage over large result sets it would
> probably be most efficient to build in paging logic.
> >
> >> On Aug 31, 2016, at 10:45 PM, Tod Olson <[log in to unmask]> wrote:
> >>
> >> On a related note, do any of the libraries allow the user to iterate
> over a large result set without having to be aware of repeated calls,
> incrementing the start parameter, and that sort of bookkeeping?
> >>
> >> It seems like someone must have built an iterator to hide that when
> you're trying to sift through a large number of hits.
> >>
> >> -Tod
> >>
> >>> On Aug 31, 2016, at 4:09 PM, Rhoads, Joseph
> <[log in to unmask]> wrote:
> >>>
> >>> I've used several of these.  I like the interface of mysolr but (as
> >>> mentioned) it hasn't been updated in a while.
> >>>
> >>> pysolr is fairly up to date (v3.5 came out in May this year), and is
> used
> >>> in django-haystack for the solr backend.
> >>> https://github.com/django-haystack/pysolr
> >>>
> >>> Haystack itself is great if you want an ORM-like interface for solr
> and use
> >>> django.
> >>> https://github.com/django-haystack/django-haystack
> >>>
> >>> -Joseph
> >>>
> >>>
> >>>
> >>>> On Wed, Aug 31, 2016 at 3:42 PM, Chris Gray <[log in to unmask]>
> wrote:
> >>>>
> >>>> I haven't done much of that but you can submit documents via the
> API and
> >>>> have them indexed (and processed by Tika).  Once you understand how
> to do
> >>>> that, you might find that you can do everything you want to do.
> >>>>
> >>>> An alternative would be reading the source of one of those
> libraries.  In
> >>>> the list you referenced, the only mention of inserting documents
> was for
> >>>> sunburnt.  I would be inclined to look there first, especially
> since it
> >>>> mentions a pythonic interface to Solr.
> >>>>
> >>>> A good, and amusing, cautionary tale about overwritten Python
> libraries is
> >>>> at https://www.youtube.com/watch?v=o9pEzgHorH0.
> >>>>
> >>>> Chris
> >>>>
> >>>>
> >>>>> On 2016-08-31 03:28 PM, Eric Lease Morgan wrote:
> >>>>>
> >>>>> On Aug 31, 2016, at 3:25 PM, Chris Gray <[log in to unmask]>
> wrote:
> >>>>>
> >>>>> Okay, there are SO many Python libraries [1] for Solr, and I'd
> like to
> >>>>>>> know which one is the most popular (not necessarily the "best").
> >>>>>> What do you want to do with it?
> >>>>>>
> >>>>>> I didn't feel the need to even look for a Python library for my
> needs.
> >>>>>> I use Python to submit searches to the Solr web API and consume
> the results
> >>>>>> as JSON.
> >>>>>
> >>>>> Good question. I want to add documents to a Solr index, and I want
> to
> >>>>> query the same index. Hmmm. -Eric M.
> >>