LISTSERV 16.5 - CODE4LIB Archives

Amy,

It sounds like this is a three-step process for each file:

1) Feed the PDF (as a data blob) into a script
2) Parse out the data that you're looking for (title, author, year)
3) Build a string using your parsed data, and move the file to that new
filename

Steps 1 and 3 should be simple in any scripting language; unfortunately,
step 2 may be very difficult.  PDF describes page layout rather than
structured data, and its optional metadata fields (title, author) are often
empty or unreliable, so there's no guarantee that the data you need can be
easily parsed out.  If the PDFs
were uniformly generated (e.g. they were all generated from LaTeX markup or
a single content management system) then it may be possible to parse out
information from the file.  If not - for example, if the PDFs consist of
scanned pages - then you'll need to generate that data elsewhere (perhaps
from an existing catalog), create the new filenames that way, and feed that
list into a script/tool to rename the files.
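
For the catalog route, here is a rough Python sketch of the rename step.
It is only an illustration under assumptions, not a finished tool: it
assumes a CSV file whose column names (current_name, first_author, title,
year) are hypothetical, as are the function names and the choice of four
title words.

```python
import csv
import re
from pathlib import Path


def slug(text, max_words=None):
    """Lowercase text, keep only ASCII letters and digits, optionally cap the word count."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if max_words is not None:
        words = words[:max_words]
    return "".join(words)


def build_filename(author, title, year, title_words=4):
    """Return a name shaped like firstauthor_firstfewwordsoftitle_year.pdf."""
    return f"{slug(author)}_{slug(title, title_words)}_{year}.pdf"


def rename_from_catalog(catalog_csv, pdf_dir, dry_run=True):
    """Rename each PDF listed in the catalog; return the (old, new) name pairs.

    The CSV column names used here are assumptions for this sketch.
    """
    pdf_dir = Path(pdf_dir)
    planned = []
    with open(catalog_csv, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            source = pdf_dir / row["current_name"]
            target = pdf_dir / build_filename(
                row["first_author"], row["title"], row["year"]
            )
            if not source.exists() or target.exists():
                continue  # skip missing files and never overwrite an existing file
            planned.append((source.name, target.name))
            if not dry_run:
                source.rename(target)
    return planned
```

Running it with dry_run=True first returns the planned (old, new) pairs
without touching any files, so the list can be checked by eye before
rerunning with dry_run=False.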

Best of luck,
--Alex

On Fri, Jan 15, 2016 at 11:06 AM, Chris Moschini wrote:

> It won't surprise you that coders do this all the time, so there are 80
> ways to do it; your peril is choice, not scarcity.
>
> There are also a ton of tools that will do this for non-coders:
> https://www.google.com/webhp?q=file%20renamer
>
> On Windows robocopy is popular.
>
> The truth is, though, most coders just pick the programming language of
> their choice and go for it. The most common choice is Bash and regex. Bash
> is built in to Linux and Macs and pretty easy to get onto Windows, via
> Git for Windows <https://git-for-windows.github.io/> or Cygwin
> <https://www.cygwin.com/>. It's an old and ugly language, but it's also
> the kitchen sink of "I just need to do this quick thing." That said, if
> you dislike old and ugly languages, unintuitive syntax, or odd command
> names, pick a programming language you do like, or one of the tools above.
>
>
> On Fri, Jan 15, 2016 at 10:56 AM, Amy Schuler wrote:
>
> > Hi,
> > I'm looking for a smart bulk file editor, if it exists.  Specifically I'd
> > like it to be able to move through a list of PDF files that are published
> > research papers, and rename them in this approximate format, based on the
> > contents of the file:
> > firstauthor_firstfewwordsoftitle_year.pdf
> >
> > I know this is probably a crazy dream.  The bulk file editors that I know
> > about are simpler.  They can bulk rename files according to a pre-set
> > pattern, or they just remove/add/re-position bits of the existing file
> > name.
> >
> > Thanks!
> >
> > Amy Schuler
> > Cary Institute of Ecosystem Studies
> >
>