This is the screenshot.
Thanks

On Tue, Nov 12, 2019, 9:12 PM Vinit Kumar wrote:

> Thank you all.
> I could use Pandas and Plotly's Dash to develop the ngram viewer.
> Eric's suggestion was apt and helped with the transposed arrangement.
>
> Owen's experience matches my approach too. Thanks for sharing your
> experiences with me.
>
> Thank you all
>
> On Fri, Nov 8, 2019, 3:27 PM Owen Stephens wrote:
>
>> I was involved in some work on calculating and visualising this kind of
>> word/phrase frequency on a project a few years ago - this was based on
>> queries to a corpus indexed in ElasticSearch, and charting word/phrase
>> frequency against each other using javascript - see
>> https://ukmhl.historicaltexts.jisc.ac.uk/ngram for an example - but
>> although it looks superficially similar it isn't anywhere close to as
>> sophisticated as the Google n-gram viewer.
>>
>> You probably already know, but I think it's worth stating, that the
>> Google n-gram viewer (https://books.google.com/ngrams) is not simply
>> visualising the frequency of word/phrase occurrences, but the frequency
>> as a percentage of the frequency of all n-grams of the same size in the
>> corpus (see https://books.google.com/ngrams/info). The Google n-gram
>> viewer goes well beyond this as well, supporting ways of being more
>> specific (e.g. you can limit by part of speech, and by language of the
>> texts analysed). This suggests a sophisticated linguistic parsing of a
>> large corpus with the ability to answer complex questions quickly at
>> scale - something we weren't able to do in our project.
>>
>> In the project I was involved in, we are simply showing the percentage
>> of texts in the corpus in which a word appears, not the frequency as a
>> percentage of all same-sized n-grams - which means our viewer is more
>> about reflecting general book topics than it is about linguistic
>> analysis.
>> You can also see issues with the measurement at either end of
>> the graph where there seem to be spikes in usage - but this actually
>> reflects that the collection simply lacks large numbers of texts from
>> those years, which means a term only has to appear in a few books to get
>> a high percentage. Despite this, the tool is still useful (IMO) within
>> those constraints - in the example search it is possible to see that the
>> term "tuberculosis" rises in frequency, while the term "phthisis" (for
>> the same condition) drops off - a trend also shown by the Google n-gram
>> viewer:
>>
>> https://books.google.com/ngrams/graph?content=tuberculosis%2Cphthisis&year_start=1800&year_end=1930&corpus=15&smoothing=3&share=&direct_url=t1%3B%2Ctuberculosis%3B%2Cc0%3B.t1%3B%2Cphthisis%3B%2Cc0
>>
>> Finally, if you want to do more sophisticated analysis it may be worth
>> looking at specialist tools - e.g. AntConc
>> http://www.laurenceanthony.net/software/antconc/
>>
>> Hope some of that is helpful
>>
>> Owen
>>
>>
>> Fitchett, Deborah wrote on 08/11/2019 01:10:
>> > You might be interested in Chart.js (https://www.chartjs.org/) - it
>> > does the visualisation part, if you could do the search part.
>> >
>> > Deborah
>> >
>> > -----Original Message-----
>> > From: Code for Libraries On Behalf Of Vinit Kumar
>> > Sent: Thursday, 7 November 2019 6:17 PM
>> > Subject: [CODE4LIB] n-gram visualisation
>> >
>> > Dear Code4Libers,
>> >
>> > I have data with the following structure:
>> >
>> > ngrams      2009  2010  2011  2012  2013  2014  2015
>> > library       22     3    32    32    35    21    21
>> > technology     3     4    43    32    30    43    32
>> > and so on
>> >
>> > Is it possible to visualise this data in a manner similar to the
>> > Google N-gram viewer, where one can enter a keyword in a search bar
>> > and the visualisation displays the year-wise trend of that keyword in
>> > the corpus, based on the data structured as above?
>> > Any pointers or tools would be of help.
>> > Thanking you in anticipation.
>> >
>> >
>> > --
>> > Regards
>> > Vinit Kumar, Ph.D.
>> > Assistant Professor,
>> > Department of Library and Information Science
>> > Babasaheb Bhimrao Ambedkar University, Rae Bareilly Road, Lucknow, India 226025
>> > +919454120174
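
For anyone finding this thread later, here is a rough sketch of the two ideas discussed above: the transposed (year-by-year) lookup for one keyword, and Owen's point about showing a count as a percentage of all n-gram counts for that year. It is not the code behind the viewer mentioned at the top of the thread. It uses only the Python standard library, and the function names (parse_counts, yearly_trend, yearly_share) are made up for illustration. It assumes single-word n-grams separated by whitespace, and it treats the two sample rows as if they were the whole corpus, so the percentages are illustrative only.

```python
# Sample rows copied from the original question; a real table would have
# many more n-grams. Assumption: n-grams contain no spaces (a tab-separated
# file with multi-word phrases would need line.split("\t") instead).
RAW = """\
ngrams 2009 2010 2011 2012 2013 2014 2015
library 22 3 32 32 35 21 21
technology 3 4 43 32 30 43 32
"""


def parse_counts(text):
    """Return (years, {ngram: [count per year]}) from a whitespace table."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    years = [int(y) for y in header[1:]]
    counts = {row[0]: [int(v) for v in row[1:]] for row in rows}
    return years, counts


def yearly_trend(years, counts, keyword):
    """Transposed view for one keyword: a list of (year, raw count) pairs."""
    return list(zip(years, counts[keyword.lower()]))


def yearly_share(years, counts, keyword):
    """Count as a percentage of all n-gram counts in the same year.

    Assumption: the table holds every n-gram of this size in the corpus.
    With only a sample of rows the shares are not meaningful in themselves.
    """
    totals = [sum(column) for column in zip(*counts.values())]
    return [
        (year, round(100 * n / total, 1) if total else 0.0)
        for year, n, total in zip(years, counts[keyword.lower()], totals)
    ]


years, counts = parse_counts(RAW)
print(yearly_trend(years, counts, "library"))
print(yearly_share(years, counts, "library"))
```

The (year, value) pairs returned by either function are the shape a line chart needs, so they could be handed to whichever charting tool is in use (Chart.js and Plotly's Dash were both mentioned above). Note that years with very small totals will show the same kind of spikes Owen describes.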