LISTSERV 16.5 - CODE4LIB Archives

This is the screenshot.
Thanks

On Tue, Nov 12, 2019, 9:12 PM Vinit Kumar <[log in to unmask]> wrote:

> Thank you all.
> I was able to use pandas and Plotly's Dash to build the n-gram viewer.
> Eric's suggestion was apt and helped with transposing the data.
>
> Owen's experience matches my approach too. Thanks for sharing your
> experiences with me.
>
> Thank you all
>
> On Fri, Nov 8, 2019, 3:27 PM Owen Stephens <[log in to unmask]> wrote:
>
>> I was involved in some work on calculating and visualising this kind of
>> word/phrase frequency on a project a few years ago - this was based on
>> queries to a corpus indexed in Elasticsearch, and charting word/phrase
>> frequency against each other using javascript - see
>> https://ukmhl.historicaltexts.jisc.ac.uk/ngram for an example - but
>> although it looks superficially similar it isn't anywhere close to as
>> sophisticated as the Google n-gram viewer.
>>
>> You probably already know, but I think it's worth stating, that the
>> Google n-gram viewer (https://books.google.com/ngrams) is not simply
>> visualising the frequency of word/phrase occurrences, but the frequency
>> as a percentage of the frequency of all n-grams of the same size in the
>> corpus (see https://books.google.com/ngrams/info). The Google n-gram
>> viewer goes well beyond this as well, supporting ways of being more
>> specific (e.g. you can limit by part of speech, and by language of the
>> texts analysed). This suggests a sophisticated linguistic parsing of a
>> large corpus with the ability to answer complex questions quickly at
>> scale - something we weren't able to do in our project.
>>
>> In the project I was involved in, we are simply showing the percentage
>> of texts in the corpus in which a word appears, not the frequency as a
>> percentage of all same sized n-grams - which means our viewer is more
>> about reflecting general book topics than it is about linguistic
>> analysis. You can also see issues with the measurement at either end of
>> the graph where there seem to be spikes in usage - but this actually
>> reflects that the collection simply lacks large numbers of texts from
>> those years, which means a term only has to appear in a few books to get
>> a high percentage. Despite this, the tool is still useful (IMO) within
>> those constraints - in the example search it is possible to see that the
>> term "tuberculosis" rises in frequency, while the term "phthisis" (for
>> the same condition) drops off - a trend also shown by the Google n-gram
>> viewer:
>>
>> https://books.google.com/ngrams/graph?content=tuberculosis%2Cphthisis&year_start=1800&year_end=1930&corpus=15&smoothing=3&share=&direct_url=t1%3B%2Ctuberculosis%3B%2Cc0%3B.t1%3B%2Cphthisis%3B%2Cc0
>>
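
The two measures Owen contrasts can be sketched side by side. This is only
an illustration under stated assumptions: the toy corpus is invented, the
function names document_share and relative_frequency are hypothetical, and
tokenising by whitespace is a simplification. It is not how either the
project above or the Google n-gram viewer is implemented.

```python
from collections import Counter

# Hypothetical toy corpus of (year, text) pairs, invented for illustration.
corpus = [
    (1850, "phthisis is a wasting disease"),
    (1850, "the treatment of phthisis"),
    (1900, "tuberculosis and phthisis compared"),
    (1900, "tuberculosis in cities"),
    (1900, "a history of surgery"),
]


def document_share(corpus, term):
    """Percentage of texts per year that contain the term at least once."""
    docs = Counter()
    hits = Counter()
    for year, text in corpus:
        docs[year] += 1
        if term in text.lower().split():
            hits[year] += 1
    return {year: 100.0 * hits[year] / docs[year] for year in sorted(docs)}


def relative_frequency(corpus, term):
    """Occurrences of the term as a percentage of all 1-grams per year."""
    total = Counter()
    hits = Counter()
    for year, text in corpus:
        tokens = text.lower().split()
        total[year] += len(tokens)
        hits[year] += tokens.count(term)
    return {year: 100.0 * hits[year] / total[year] for year in sorted(total)}


for term in ("phthisis", "tuberculosis"):
    print(term, document_share(corpus, term), relative_frequency(corpus, term))
```

With only two texts in 1850, the document share for "phthisis" reaches 100%
for that year, which is the small-denominator effect described above for
years where the collection holds few texts.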
>> Finally, if you want to do more sophisticated analysis it may be worth
>> looking at specialist tools - e.g. AntConc
>> http://www.laurenceanthony.net/software/antconc/
>>
>> Hope some of that is helpful
>>
>> Owen
>>
>>
>> Fitchett, Deborah wrote on 08/11/2019 01:10:
>> > You might be interested in Chart.js (https://www.chartjs.org/) - it
>> > does the visualisation part, if you could do the search part.
>> >
>> > Deborah
>> >
>> > -----Original Message-----
>> > From: Code for Libraries <[log in to unmask]> On Behalf Of Vinit Kumar
>> > Sent: Thursday, 7 November 2019 6:17 PM
>> > To: [log in to unmask]
>> > Subject: [CODE4LIB] n-gram visualisation
>> >
>> > Dear Code4Libers,
>> >
>> > I have data with the following structure:
>> > ngrams      2009  2010  2011  2012  2013  2014  2015
>> > library       22     3    32    32    35    21    21
>> > technology     3     4    43    32    30    43    32
>> > and so on
>> >
>> > Is it possible to visualise this data in a manner similar to the
>> > Google n-gram viewer, where one can enter a keyword in a search bar
>> > and the visualisation displays the year-wise trend of that keyword in
>> > the corpus, based on data structured as above?
>> > Any pointers or tools would be of help.
>> > Thanking you in anticipation.
>> >
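
A minimal sketch of the lookup step behind such a search bar, assuming the
table above has been exported as CSV. The names load_trends and lookup are
hypothetical, and only the Python standard library is used here; drawing
the chart itself is left to a charting library such as Chart.js or Plotly's
Dash, both mentioned elsewhere in this thread.

```python
import csv
import io

# The table from the question, re-typed as CSV for this sketch.
TABLE = """ngrams,2009,2010,2011,2012,2013,2014,2015
library,22,3,32,32,35,21,21
technology,3,4,43,32,30,43,32
"""


def load_trends(text):
    """Map each n-gram to a list of (year, count) pairs in column order."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    years = [int(y) for y in header[1:]]
    trends = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts = [int(v) for v in row[1:]]
        trends[row[0].lower()] = list(zip(years, counts))
    return trends


def lookup(trends, keyword):
    """Return the year-wise series for a keyword, or an empty list."""
    return trends.get(keyword.strip().lower(), [])


trends = load_trends(TABLE)
for year, count in lookup(trends, "Library"):
    print(year, count)
```

Each keyword maps to a series with years along one axis, which is the
transposed arrangement a line chart needs: one point per year for the
searched term.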
>> >
>> > --
>> > Regards
>> > Vinit Kumar, Ph.D.
>> > Assistant Professor,
>> > Department of Library and Information Science, Babasaheb Bhimrao
>> > Ambedkar University, Rae Bareilly Road, Lucknow, India 226025
>> > +919454120174
>> >
>> >
>>
>>
>