
Can someone point me in the direction of a good, robust broken-link scanner other than Xenu, which is not quite powerful or adaptable enough for my needs? We are trying to get more serious about our content strategy in my library, and linking in various parts of our site is abysmal. Here's my dream app...

A web app that collects, from a non-technical library staff user, a base URL path under which to crawl and scan links. The user creates an object that includes a descriptive title, their email address, and some hidden metadata, such as the creation date. The app crawls the links of that URL and any children, ignoring site URLs not under the given path, and returns a report (web, PDF, CSV, whatever) of page title / page URL / broken link text / broken link URL / error code.

Further, the app is hooked into cron and runs a new report based on the existing criteria every X days. On day X, the user gets an email with the updated report. At login, the user has a sortable table view of all of their objects, and each object keeps a record of its reports. Stats on how many links per section, and frequency of broken-ness tracked over time, would be nice but not a deal breaker.

From the admin side of things, we would need to be able to configure global error codes to include/exclude, internal URLs to exclude, timeout lengths, crawl depths, and websites to treat specially since they may not play well with the crawler, proxy, whatever. Finally, it would be nice to override these and other settings at the local object level as well (e.g., set a shorter or longer day cycle, set a maximum depth to crawl, etc.).
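For what it's worth, the core crawl-and-report piece is simple enough to sketch. Here is a rough Python outline, assuming the third-party requests and BeautifulSoup libraries and using made-up names (check_section, the example URL, the config constants) purely for illustration. It only follows pages under the given base path, checks every link it finds once, and dumps a CSV of page title / page URL / link text / link URL / error code. The cron scheduling, email delivery, and per-object overrides would sit on top of something like this.

```python
"""Minimal sketch of the scoped broken-link report described above.
Assumes `requests` and `beautifulsoup4` are installed; all names are
illustrative, not part of any existing product."""
import csv
from collections import deque
from urllib.parse import urljoin, urldefrag

import requests
from bs4 import BeautifulSoup

TIMEOUT = 10                 # admin-configurable timeout (seconds)
MAX_DEPTH = 3                # admin-configurable crawl depth
OK_CODES = {200, 301, 302}   # globally configurable "not broken" codes


def check_section(base_url, out_csv="broken_links.csv"):
    """Crawl pages under base_url and write a CSV of broken links."""
    seen_pages = set()
    link_status = {}                  # cache so each URL is checked once
    queue = deque([(base_url, 0)])
    rows = []

    while queue:
        page_url, depth = queue.popleft()
        if page_url in seen_pages or depth > MAX_DEPTH:
            continue
        seen_pages.add(page_url)

        try:
            resp = requests.get(page_url, timeout=TIMEOUT)
        except requests.RequestException:
            continue
        soup = BeautifulSoup(resp.text, "html.parser")
        title = soup.title.get_text(strip=True) if soup.title else page_url

        for a in soup.find_all("a", href=True):
            link, _ = urldefrag(urljoin(page_url, a["href"]))

            # Only crawl children of the base path; still check outbound links.
            if link.startswith(base_url) and link not in seen_pages:
                queue.append((link, depth + 1))

            if link not in link_status:
                try:
                    head = requests.head(link, timeout=TIMEOUT,
                                         allow_redirects=True)
                    link_status[link] = head.status_code
                except requests.RequestException:
                    link_status[link] = "connection error"

            if link_status[link] not in OK_CODES:
                rows.append([title, page_url, a.get_text(strip=True),
                             link, link_status[link]])

    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["page title", "page url", "link text",
                         "link url", "error code"])
        writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical section; replace with a staff member's base path.
    check_section("https://www.example.edu/library/research-guides/")
```

Wrapping that in a web front end that stores each staff member's object, re-runs it on a schedule, and emails the CSV is where the real product work would be.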

It seems like something of this sort should exist, but I'm not finding exactly what I want. The closest right now is LinkTiger, but I don't want to set librarians loose on the whole site, just their targeted areas.

Thoughts?

W