Hi Code4Libbers,

I am working with a colleague on a side project that involves some scanned bibliographies. We want to make them more web searchable, sortable, and browsable. I am quite familiar with the metadata and organization aspects we need. However, I am at a bit of a loss on how to automate converting the bibliographies into a more structured format, so that we can avoid going through hundreds of pages by hand.

I am pretty sure regular expressions are needed. Until now, though, I have not had to automate extracting data from one file type (PDF OCR output, or text extracted into a Word document) and placing it, with some enrichment, into another (either a database or an XML file). I would appreciate any suggestions for approaches or tools to look into.

Thanks for any help or thoughts people can give.

Matt Sherman
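For anyone answering, here is a rough illustration of the regex step in Python. It is a minimal sketch, not a recommended tool. It assumes the OCR text has already been exported to plain text with one entry per line. It also assumes a hypothetical citation format, `Surname, Forename. Title. Place: Publisher, Year.`; real bibliographies will need their own patterns. Lines that do not match are set aside for hand review instead of being forced into fields, and matched entries are written out as simple XML with `xml.etree.ElementTree`. The element names (`bibliography`, `entry`, and so on) are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical entry pattern (an assumption, adjust per bibliography):
# "Surname, Forename. Title. Place: Publisher, 1999."
# Note: authors with initials ("Smith, J. R.") will NOT match this simple pattern.
ENTRY_RE = re.compile(
    r"^(?P<author>[^.]+?)\.\s+"    # author: everything up to the first period
    r"(?P<title>.+?)\.\s+"         # title: shortest run up to a following period
    r"(?P<place>[^:]+):\s*"        # place of publication, ended by a colon
    r"(?P<publisher>[^,]+),\s*"    # publisher, ended by a comma
    r"(?P<year>\d{4})\.?$"         # four-digit year, optional trailing period
)

FIELDS = ("author", "title", "place", "publisher", "year")


def parse_entries(lines):
    """Split lines into parsed records and unparsed leftovers for hand review."""
    parsed, unparsed = [], []
    for line in lines:
        line = " ".join(line.split())  # collapse stray OCR whitespace
        if not line:
            continue
        match = ENTRY_RE.match(line)
        if match:
            parsed.append(match.groupdict())
        else:
            unparsed.append(line)
    return parsed, unparsed


def to_xml(records):
    """Serialize parsed records as a simple (hypothetical) XML structure."""
    root = ET.Element("bibliography")
    for rec in records:
        entry = ET.SubElement(root, "entry")
        for field in FIELDS:
            ET.SubElement(entry, field).text = rec[field].strip()
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [
        "Smith, John. A History of Things. London: Macmillan, 1999.",
        "garbled OCR line without structure",
    ]
    records, leftovers = parse_entries(sample)
    print(to_xml(records))
    print("Needs review:", leftovers)
```

Keeping the non-matching lines, rather than discarding them, means hand work is limited to the entries the pattern cannot handle. Enrichment, such as authority lookups, could then run over the parsed records before they are loaded into a database.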