Dear colleagues,
Do you want to automate the discovery of people, place names and events within a large corpus of unstructured documents or metadata (e.g. a description field)? Then you might want to try the Named-Entity Recognition (NER) extension for OpenRefine, developed by Multimedia Lab (ELIS - Ghent University / iMinds) and MaSTIC (Université Libre de Bruxelles).
At http://freeyourmetadata.org/named-entity-extraction/, you will find all the information you need to start experimenting with NER on your own. The extension was developed in the context of a research paper entitled "Named-Entity Recognition: A Gateway Drug for Cultural Heritage Collections to the Linked Data Cloud?". A preprint of this paper is available at http://freeyourmetadata.org/publications/named-entity-recognition.pdf. The paper also aims to foster a discussion within the Digital Library community about the quality of the concepts described in knowledge bases (e.g. Freebase versus DBpedia) and the current struggle between schemes (e.g. schema.org versus the Open Graph protocol).
We will be presenting our work in North and Latin America in March (Boston), April (New York and Philadelphia), May (Quito) and June (New York and Montreal). If you are located in or near one of those cities and are interested in collaborating or hosting a workshop on this topic, don't hesitate to get in touch.
Kind regards,
Seth van Hooland
Président du Master en Sciences et Technologies de l'Information et de la Communication (MaSTIC)
Université Libre de Bruxelles
Av. F.D. Roosevelt, 50 CP 123 | 1050 Bruxelles
http://homepages.ulb.ac.be/~svhoolan/
http://twitter.com/#!/sethvanhooland
http://mastic.ulb.ac.be
0032 2 650 4765
Office: DC11.102