How is Code4Lib Journal indexed? What software is used, and more specifically, what characteristics of each article are included in the index?

Our journal is pretty cool, but as a library-related journal, I think it can be better. For example, what are the various indexed fields? Maybe we can support faceted browsing? Search results are returned in a very narrative form -- a format that is not very computable. If search results were in some sort of columnar format (TSV, CSV, etc.), sorting and grouping would be possible, as well as analysis.
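To illustrate what I mean by "computable", here is a minimal sketch in Python, assuming search results were delivered as TSV. The column names, the sample rows, and the helper functions (read_rows, sort_by, group_by) are all made up for illustration; they are not real Code4Lib Journal data, nor anything the journal's software provides today.

```python
import csv
import io
from collections import defaultdict

# Hypothetical search results in TSV form. The columns and rows are
# invented placeholders, not real Code4Lib Journal records.
SAMPLE_TSV = (
    "author\ttitle\tdate\tissue\n"
    "Author A\tExample Title One\t2019-05-07\t44\n"
    "Author B\tExample Title Two\t2018-02-05\t39\n"
    "Author C\tExample Title Three\t2019-05-07\t44\n"
)


def read_rows(tsv_text):
    """Parse TSV text into a list of dictionaries, one per result."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def sort_by(rows, field):
    """Return the rows sorted by the string value of the given field."""
    return sorted(rows, key=lambda row: row[field])


def group_by(rows, field):
    """Return a dictionary mapping each field value to its rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row)
    return dict(groups)


rows = read_rows(SAMPLE_TSV)
by_date = sort_by(rows, "date")    # ISO dates sort correctly as strings
by_issue = group_by(rows, "issue")

# Count the number of results in each issue.
for issue, items in sorted(by_issue.items()):
    print(issue, len(items))
```

With results in this shape, the same rows could just as easily be loaded into a spreadsheet or a statistics package for further analysis.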

Recently, I have been playing a lot with natural language processing, and this has resulted in the extraction of statistically significant keywords, named entities, parts-of-speech, and even the identification of sentences matching a given grammar. All of these things lend themselves as inputs to machine learning processes. In turn, the results of all these things can be re-incorporated into an index of Code4Lib. Thus the index not only supports find & get but also analysis. For a good time, I'd like to give this a go, just as an experiment.

Is there someplace where I can download a rudimentary metadata file of all Code4Lib articles? At the least, I hope such a metadata file includes fields such as:

  * author(s)
  * title
  * date
  * abstract
  * link to full text
  * issue

Is there a place where I can get such metadata?

--
Eric Morgan