How is Code4Lib Journal indexed? What software is used, and more specifically, what characteristics of each article are included in the index?
Our journal is pretty cool, but as a library-related journal, I think it can be better. For example, what are the various indexed fields? Maybe we can support faceted browsing? Search results are returned in a very narrative form, a format that is not very computable. If search results were in some sort of columnar format (TSV, CSV, etc.), sorting and grouping would be possible, as well as analysis.
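To illustrate what I mean by "computable", here is a minimal sketch in Python using only the standard library. The column names (title, author, year, issue) and the sample rows are made up for illustration; I don't know what fields the real index exposes.

```python
import csv
import io
from collections import defaultdict

# Hypothetical search results in TSV form; columns and rows are invented.
SAMPLE_TSV = (
    "title\tauthor\tyear\tissue\n"
    "Article A\tAuthor One\t2019\t44\n"
    "Article B\tAuthor Two\t2012\t17\n"
    "Article C\tAuthor One\t2015\t29\n"
)


def read_rows(tsv_text):
    """Parse TSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def sort_by_year(rows):
    """Return the rows ordered from oldest to newest."""
    return sorted(rows, key=lambda row: int(row["year"]))


def group_titles_by(rows, field):
    """Map each distinct value of `field` to the titles that share it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row["title"])
    return dict(groups)


rows = read_rows(SAMPLE_TSV)
by_year = sort_by_year(rows)
by_author = group_titles_by(rows, "author")

for row in by_year:
    print(row["year"], row["title"], sep="\t")
```

With results in this shape, the same few lines would work for any field the index provides, which is much harder to do against narrative search results.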
Recently, I have been playing a lot with natural language processing, and this has resulted in the extraction of statistically significant keywords, named entities, parts-of-speech, and even the identification of sentences matching a given grammar. All of these things lend themselves as inputs to machine learning processes. In turn, the results of all these things can be re-incorporated into an index of Code4Lib. Thus the index not only supports find & get but also analysis. For a good time, I'd like to give this a go, just as an experiment.
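As a rough idea of the keyword part, here is a small sketch that scores words with plain TF-IDF, again in standard-library Python. TF-IDF is only a stand-in for "statistically significant" here, not necessarily the measure I would end up using, and the document identifiers and texts are placeholders.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(docs, n=3):
    """Return the n highest-scoring TF-IDF terms for each document.

    `docs` maps a document identifier to its full text.
    """
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))

    total = len(docs)
    result = {}
    for doc_id, words in tokens.items():
        counts = Counter(words)
        scores = {
            word: (count / len(words)) * math.log(total / doc_freq[word])
            for word, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda word: (-scores[word], word))
        result[doc_id] = ranked[:n]
    return result


# Placeholder texts standing in for article full text.
sample_docs = {
    "article-1": "faceted search with an index of metadata and more search",
    "article-2": "named entities extracted from text with an index",
}
keywords = top_keywords(sample_docs, n=2)
```

The resulting lists could then be written back as one more column in the metadata file, so that the keywords become searchable and facetable like any other field.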
Is there someplace where I can download a rudimentary metadata file of all Code4Lib articles? At the least, I hope such a metadata file includes fields such as:
* link to full text
Is there a place where I can get such metadata?