LISTSERV 16.5 - CODE4LIB Archives

Hi all,

 

I am developing a software system that analyzes the content of research
documents (e.g., papers and articles) archived in scientific
repositories (e.g., http://citeseerx.ist.psu.edu/ and http://arxiv.org/)
and automatically classifies them according to FAST and DDC. To evaluate
the system's performance objectively, I need a collection of research
documents that have been manually classified according to the DDC and
assigned FAST subject headings. Is anyone aware of such a dataset
available online?

 

Regards,

Arash