LISTSERV 16.5 - CODE4LIB Archives

On Tue, Jul 30, 2013 at 11:07 AM, Marc Chantreux wrote:

> On Tue, Jul 30, 2013 at 10:27:01AM -0400, Mark A. Matienzo wrote:
> > i don't know why we're not talking about Haskell
>
> I did to tell there is a lack of libraries and it is not as convenient
> as perl when it comes to use regexps.
>
> I wrote a MARC::MIR reader for ISO2709 but have some issues (it seems it
> is not lazy as expected) and need ISO5426 support.
>

If your code is not lazy then you're being too strict :-)

If you're using ghc, you can use the "-ddump-stranal" flag to see which
expressions the compiler is identifying as strict. Since the goal of this
analysis is to propagate strictness information as far as possible, so as
to avoid consing up suspensions wherever it can, it might be hard to figure
out where the contagion starts.

If you separate the purely functional parts of the code from the code that
is messing around with monads it should be easier to figure out where
things are getting forced unexpectedly.

Also, what parts of ISO2709 does French MARC use that are not compatible
with Z39.2? Likewise, are characters encoded in real 5426, or in the MARC-8
simulacrum thereof?

Getting back to language non-wars: lazy parsing of MARC can be a big win,
even in eager languages. Once you know that you've read a whole record
(e.g. by some miracle the length at the start of the record happens to be
correct), there is no immediate need to parse the contents of the record.
If you need the leader, you can parse it when you need it; if you need
tags, you can start parsing directory entries; if you need fields, you can
use the directory info; and if you need subfields, you can find the
subfield markers.
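
A minimal sketch of that idea in an eager language (Python here), offered
as an illustration rather than anyone's actual reader. It assumes the
MARC 21 flavour of the directory (12-byte entries: 3-digit tag, 4-digit
length, 5-digit offset) and, as above, that the length at the start of the
record is correct. The names LazyRecord and read_records are made up for
this example.

```python
from functools import cached_property

FIELD_END = b"\x1e"   # field terminator
SUBFIELD = b"\x1f"    # subfield delimiter


class LazyRecord:
    """Keeps the raw bytes of one record and parses pieces only on demand."""

    def __init__(self, raw: bytes):
        self.raw = raw  # nothing is parsed at construction time

    @cached_property
    def leader(self) -> bytes:
        return self.raw[:24]

    @cached_property
    def base(self) -> int:
        # Leader positions 12-16: base address of the data area.
        return int(self.leader[12:17])

    @cached_property
    def directory(self):
        # Decoded on first use; assumes MARC 21 style 12-byte entries.
        entries = self.raw[24:self.base - 1]  # drop the directory terminator
        return [
            (entries[i:i + 3].decode("ascii"),   # tag
             int(entries[i + 3:i + 7]),          # field length
             int(entries[i + 7:i + 12]))         # offset from base address
            for i in range(0, len(entries), 12)
        ]

    def fields(self, tag: str):
        # Slice fields out of the raw bytes using the directory only.
        for t, length, offset in self.directory:
            if t == tag:
                start = self.base + offset
                yield self.raw[start:start + length - 1]  # minus terminator

    def subfields(self, tag: str):
        # Look for subfield markers only in the fields that were asked for.
        for field in self.fields(tag):
            for chunk in field.split(SUBFIELD)[1:]:
                yield chunk[:1].decode("ascii"), chunk[1:]


def read_records(stream):
    """Split a binary stream into records, trusting the 5-digit length."""
    while True:
        head = stream.read(5)
        if len(head) < 5:
            return
        yield LazyRecord(head + stream.read(int(head) - 5))
```

A caller that only filters on, say, the presence of a tag touches the
leader and directory and never looks at the field data; subfield values
are left as undecoded bytes, so the 5426 vs. MARC-8 question is deferred
until someone actually needs the text.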

If you're lucky, you can avoid a lot of first- and second-level cache
pollution. Also, if you're using multiple cores, and even more so multiple
processors, you may be able to avoid sharing the whole of a record across
caches.

Simon