Hi all,

I am developing a software system that analyzes the content of research documents (e.g., research papers and articles) archived in scientific repositories (e.g., http://citeseerx.ist.psu.edu, http://arxiv.org/) and automatically classifies them according to FAST (Faceted Application of Subject Terminology) and the DDC (Dewey Decimal Classification).

To objectively evaluate the system's performance, I need a collection of research documents that have been manually classified according to the DDC and assigned FAST subject headings. Is anyone aware of such a dataset available online?

Regards,
Arash