I recommend looking at BibUtils:
https://sourceforge.net/p/bibutils/home/Bibutils/ . It takes a citation in
one format, parses the metadata out into XML (using this part of the
program: https://sourceforge.net/p/bibutils/home/bib2xml/ ), and then
converts it to another citation format. Part of parsing the metadata out of
a citation is identifying what kind of item is being cited (i.e., article,
book, etc.). The intermediate format is MODS XML. You would use BibUtils to
go from your citation list to MODS XML, then analyze the MODS XML. MODS is
one of the more full-featured / complicated / baroque formats... but you
will only be dealing with a handful of fields.

In practice, I think there will be formatting problems in the citations you
feed it, and it's going to give you data with significant inaccuracies. I
have not worked nuts and bolts with BibUtils; rather, I have worked with
digital library software that integrated it, and I have only looked at how
it generates citations in various formats (i.e., not the part that ingests
citations). My experience is that it tends to default to article as the
format, so I expect you will find lots of false positives: things that
aren't articles being treated as articles.
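As a rough illustration of the "analyze the MODS XML" step, here is a minimal Python sketch using only the standard library. It assumes you have already run your references through BibUtils (for BibTeX input, something like `bib2xml refs.bib > refs.xml`) and have the MODS output as a string. The MODS v3 namespace and the `genre` and `relatedItem type="host"` elements are standard MODS; the function names (`classify`, `tally`) and the genre-to-type mapping are hypothetical and should be checked against what BibUtils actually emits for your data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by MODS version 3 documents.
MODS_URI = "http://www.loc.gov/mods/v3"
MODS_NS = {"m": MODS_URI}


def genres(elem, path):
    """Return the lower-cased, stripped text of every <genre> matched by path."""
    return [g.text.strip().lower() for g in elem.findall(path, MODS_NS) if g.text]


def classify(mods):
    """Guess a coarse publication type for one <mods> record.

    HYPOTHETICAL mapping: the genre strings tested here are illustrative,
    not a complete list of what BibUtils produces. Adjust them after
    looking at your own MODS output.
    """
    own = genres(mods, "m:genre")  # genres on the record itself
    host = genres(mods, "m:relatedItem[@type='host']/m:genre")  # container
    if any("magazine" in g for g in host):
        return "magazine article"
    if any("article" in g for g in own) or any(
        "periodical" in g or "journal" in g for g in host
    ):
        return "journal article"
    if any("book" in g for g in host):
        return "book chapter"
    if any("book" in g for g in own):
        return "book"
    return "unknown"  # websites, Gov Docs, NGO reports, etc. need review


def tally(mods_xml):
    """Count guessed types across all <mods> records in a MODS XML string."""
    root = ET.fromstring(mods_xml)
    # Accept either a single <mods> root or a <modsCollection> wrapper.
    if root.tag == "{%s}mods" % MODS_URI:
        records = [root]
    else:
        records = root.findall("m:mods", MODS_NS)
    return Counter(classify(r) for r in records)
```

Usage would be roughly `tally(open("refs.xml", encoding="utf-8").read())`. Given the tendency to default to article, it is worth hand-checking a sample of the records counted as "journal article", and anything landing in "unknown" will need manual review or additional rules.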
Best,
-Wilhelmina
Wilhelmina Randtke
Head of Libraries Technologies and Systems
Zach S. Henderson Library
1400 Southern Dr.
Statesboro, GA, 30458
(912) 478-5035
>
> Date: Sun, 29 Sep 2024 22:25:58 +0000
> From: "Park, Sarah"
> Subject: identifying publication types from citations
>
> Hi,
>
> I am looking for a tool or method that can help us identify publication
> types from citations/references using scripts or AI-based tools. My
> colleague and I are interested in citation analysis to determine the types
> of sources used in a discipline, for example, journal articles, review
> articles, magazine articles, book chapters, books, websites, government
> documents (Gov Docs), and NGO documents.
>
> One possible method I have found so far is using article database APIs,
> like Scopus, to identify document types, but Scopus seems to track some
> types but not all. I also heard that a model can be trained using ChatGPT
> or other generative AI, but I haven't heard how effective it can be.
>
> Any thoughts or suggestions that could lead to a possible solution would
> be greatly appreciated!
>
> Best,
>
>
> Sarah G. Park, she/her
> Mathematics and Computational Sciences Librarian
> Head, Mathematics Library
> Assistant Professor
> University of Illinois at Urbana-Champaign