> Author, title, and publication year.... won't get you many false positives,
> but might get you lots of false negatives.
>
> It's certainly true that there is no good "naive" approach to matching
> without identifiers and getting a good balance of minimal false positives
> and false negatives. There are tricky ways to approach it that I haven't
> really tried yet, but you can sometimes get closer to "good enough" than you
> think with just author/title or author/title/year.
>
> Depends on the source of your data too. If you have an AACR2/NAF controlled
> heading for an author, instead of just a free-text author entry field, that
> certainly makes it easier.


You're right -- I meant false negatives, since the data often doesn't line
up. And I agree that with author, title, and year you can get pretty far.
But once you get into lesser-used material, the names won't be controlled,
and you'll see lots of minimal-level records with missing data elements,
nonunique titles, and other problems. It gets messy fast.
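
To make the failure modes concrete, here is a rough sketch of the naive
author/title/year match key. It assumes each record is a plain dict with
hypothetical "author", "title", and "year" keys, and that authors are entered
surname-first. Neither assumption holds for real-world data, which is
exactly the problem.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    if not text:
        return ""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def match_key(record):
    """Build a naive (surname, title, year) key.

    Returns None when any element is missing, as in a minimal-level record.
    The field names are illustrative assumptions, not a real schema.
    """
    author = normalize(record.get("author"))
    title = normalize(record.get("title"))
    year_match = re.search(r"\d{4}", str(record.get("year") or ""))
    if not (author and title and year_match):
        return None
    surname = author.split()[0]  # crude: assumes "Surname, Forename" order
    return (surname, title, year_match.group())


controlled = {"author": "Melville, Herman, 1819-1891",
              "title": "Moby Dick; or, The whale /", "year": "c1851"}
sloppy = {"author": "MELVILLE, H.",
          "title": "Moby Dick, or the Whale", "year": 1851}
free_text = {"author": "Herman Melville",
             "title": "Moby Dick, or the Whale", "year": "1851"}
minimal = {"title": "Report"}

print(match_key(controlled) == match_key(sloppy))     # same key: a match
print(match_key(controlled) == match_key(free_text))  # false negative
print(match_key(minimal))                             # unmatchable
```

The first pair matches despite differences in punctuation and case. The
free-text author entry produces a different key for the same work, and the
minimal-level record produces no key at all.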

kyle