Well OK, but sorry in advance if this bores people.
I'm working on a frontend to CONTENTdm, specifically for viewing
historical newspapers. Pretty much like the LC newspapers
(http://chroniclingamerica.loc.gov/), but with CDM on the backend
instead of groovy XML stuff.
Tribulation: is it even possible to do a combined fulltext and date
range search? I'm using the new dmwebservices interface that's been
included in v.6. I'm pretty certain (after having crawled around the
code for about a week) that neither dmwebservices nor dmQuery (in
DMSystem.php) is the problem, so the suspect becomes the black-box "Find
service". It seems like when a date search is combined with fulltext,
fulltext suddenly gets redefined to be somewhat less than the actual
full text. I don't know exactly what it gets reduced to, but it seems
like it combines title and description and maybe subject but not the
actual OCR'ed "full text". Remove the date clause from the search, and
everything's fine. I've tried this on our vanilla installation, the same
problem. Is this a known thing? Google reveals nothing, nor does the
official site. I'm pretty close to just giving up and decoupling the two
in the search interface, but it seems really unsatisfying.
Triumph: in an OCR'ed collection, there will be a "words.txt" and
"words2.txt" file. The coordinates for each word are stored as
1/65535ths of the width/height of the original image dimensions. The
coordinates are stored in words2.txt as <term x, y, width, height>.
From there you can just overlay a positioned <div> instead of relying
on the composited image you get from getimage.exe (which crashes quite
reliably when the image border intersects a highlight). What the
difference between words.txt and words2.txt is, I don't know yet; but
I've written a little script to pull pixel coordinates of terms out of
words2.txt, if anyone wants.
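
For anyone who wants the gist without waiting for the script, here is a
minimal Python sketch of the coordinate conversion (not the script
mentioned above). Assumptions on my part: each words2.txt line is a term
followed by four integers (x, y, width, height) separated by commas
and/or spaces, and you already know the pixel dimensions of the image
you are displaying. The function names and the "highlight" CSS class are
made up for illustration.

```python
import re

SCALE = 65535  # coordinates are stored as 1/65535ths of the image dimensions


def to_pixels(x, y, w, h, img_width, img_height):
    """Convert a 1/65535-scaled box to a pixel box (left, top, width, height)."""
    return (
        round(x * img_width / SCALE),
        round(y * img_height / SCALE),
        round(w * img_width / SCALE),
        round(h * img_height / SCALE),
    )


def parse_line(line):
    """Parse one line into (term, (x, y, w, h)), or None if it doesn't fit.

    ASSUMED layout: term, then four integers, separated by commas/spaces.
    Check this against your own words2.txt before trusting it.
    """
    parts = re.split(r"[,\s]+", line.strip().strip("<>"))
    if len(parts) < 5:
        return None
    try:
        x, y, w, h = (int(p) for p in parts[-4:])
    except ValueError:
        return None
    term = " ".join(parts[:-4])
    return term, (x, y, w, h)


def highlight_div(box_px):
    """Build an absolutely positioned <div> for one pixel box."""
    left, top, width, height = box_px
    return (
        '<div class="highlight" style="position:absolute; '
        f'left:{left}px; top:{top}px; width:{width}px; height:{height}px;">'
        "</div>"
    )
```

Usage would be something like
highlight_div(to_pixels(*parse_line(line)[1], img_width, img_height))
for each line whose term matches the search, with the resulting divs
placed inside a relatively positioned container that wraps the page
image.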
On 11-05-27 09:21 PM, Kevin S. Clarke wrote:
> I'm sure there are folks on this mailing list who use ContentDM. You
> could always post advances, trials, and tribulations here.
>
> Kevin
>
>
> On Fri, May 27, 2011 at 11:31 PM, Rod McFarland wrote:
>> Subject tells it all really, I've found some really old wikis and a bunch of
>> unhelpful Powerpoints via Google. The forum on the official page seems to be
>> pretty much dormant. Is there an untainted forum for CONTENTdm
>> users/hackers/victims out there? I've pretty much given up on the OCLC
>> support, but I've made some advances to share, and met some roadblocks to
>> ask about.
>>
>> If there isn't one, I could probably set something up, if there's interest.
>>