Well OK, but sorry in advance if this bores people.

I'm working on a frontend to CONTENTdm, specifically for viewing 
historical newspapers. Pretty much like the LC newspapers 
(http://chroniclingamerica.loc.gov/), but with CDM on the backend 
instead of groovy XML stuff.

Tribulation: is it even possible to do a combined fulltext and date 
range search? I'm using the new dmwebservices interface that's been 
included in v.6. I'm pretty certain (after having crawled around the 
code for about a week) that neither dmwebservices nor dmQuery (in 
DMSystem.php) is the problem, so the suspect becomes the black-box "Find 
service". It seems like when a date search is combined with fulltext, 
fulltext suddenly gets redefined to be somewhat less than the actual 
full text. I don't know exactly what it gets reduced to, but it seems 
like it combines title and description and maybe subject but not the 
actual OCR'ed "full text". Remove the date clause from the search, and 
everything's fine. I've tried this on our vanilla installation and got 
the same problem. Is this a known thing? Google reveals nothing, nor does the 
official site. I'm pretty close to just giving up and decoupling the two 
in the search interface, but it seems really unsatisfying.

Triumph: in an OCR'ed collection, there will be a "words.txt" and 
"words2.txt" file. The coordinates for each word are stored as 
1/65535ths of the width/height of the original image. The 
coordinates are stored in words2.txt as <term    x, y, width, height>. 
From there you can just overlay a positioned <div> instead of relying 
on the composited image you get from getimage.exe (which crashes quite 
reliably when the image border intersects a highlight). What the 
difference between words.txt and words2.txt is, I don't know yet; but 
I've written a little script to pull pixel coordinates of terms out of 
words2.txt, if anyone wants.
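
For anyone who wants to try the overlay idea, here is a minimal Python sketch of the coordinate conversion. It is not the script mentioned above, just an illustration of the arithmetic. It assumes each line of words2.txt holds a term, some whitespace, then four comma-separated integers (x, y, width, height), and that the angle brackets in the description are notation rather than literal characters; the real file may differ. The function names (parse_line, to_pixels, find_term) are made up for this example.

```python
import re

SCALE = 65535  # coordinates are stored as 1/65535ths of the image size

# Assumed line layout: term, whitespace, then "x, y, width, height".
LINE_RE = re.compile(r"^(\S+)\s+(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)$")


def parse_line(line):
    """Return (term, x, y, width, height), or None if the line doesn't fit."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    x, y, w, h = (int(g) for g in m.groups()[1:])
    return m.group(1), x, y, w, h


def to_pixels(box, image_width, image_height):
    """Scale an (x, y, width, height) box from 1/65535ths to pixels."""
    x, y, w, h = box
    return (
        round(x * image_width / SCALE),
        round(y * image_height / SCALE),
        round(w * image_width / SCALE),
        round(h * image_height / SCALE),
    )


def find_term(lines, term, image_width, image_height):
    """Collect pixel boxes for every case-insensitive match of term."""
    wanted = term.lower()
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and parsed[0].lower() == wanted:
            hits.append(to_pixels(parsed[1:], image_width, image_height))
    return hits
```

Each returned box can be used directly as the left/top/width/height of an absolutely positioned <div> over the page image, provided image_width and image_height are the dimensions of the image as displayed.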


On 11-05-27 09:21 PM, Kevin S. Clarke wrote:
> I'm sure there are folks on this mailing list who use ContentDM.  You
> could always post advances, trials, and tribulations here.
>
> Kevin
>
>
> On Fri, May 27, 2011 at 11:31 PM, Rod McFarland<[log in to unmask]>  wrote:
>> Subject tells it all really, I've found some really old wikis and a bunch of
>> unhelpful Powerpoints via Google. The forum on the official page seems to be
>> pretty much dormant. Is there an untainted forum for CONTENTdm
>> users/hackers/victims out there? I've pretty much given up on the OCLC
>> support, but I've made some advances to share, and met some roadblocks to
>> ask about.
>>
>> If there isn't one, I could probably set something up, if there's interest.
>>