Do you have a link to the code you're using?

On Tue, Sep 26, 2017 at 1:25 PM, Eric Lease Morgan wrote:

> Does anybody here know how to access a Python compressed sparse row format
> (CSR) object? [1]
>
> I am using Python to do a bit of topic modeling (think “classification”),
> and so far, the results are more than plausible, but the results only
> return topics, not the documents corresponding to the topics. Along the
> way, my script creates a compressed sparse row format object, and it looks
> something like this:
>
>   (0, 16099)    0.055924002143
>   (0, 9497)     0.0256051292226
>   (0, 16202)    0.140746540109
>   (0, 38982)    0.000842900625312
>   :  :
>   (309, 40805)  0.0435077792741
>   (309, 45679)  0.0435077792741
>   (309, 19462)  0.0435077792741
>   (309, 8346)   0.0435077792741
>   (309, 31204)  0.0435077792741
>
> Where the first column denotes a document identifier, the second column
> denotes a topic identifier, and the third column denotes the score of the
> topic in the document. In the example above, document #0 is a lot about
> topic #16202 but not a lot about topic #38982.
>
> I want to query my CSR object. For example, given a topic identifier
> (i.e., 48692), return a list of all document identifiers and scores from
> the object. I will then sort the scores to find which documents most
> significantly use the given topic.
>
> I can’t for the life of me figure out how to get what I need. I can get
> specific values of rows like this, where tfidf is my CSR object:
>
>   >>> print( tfidf[ 309, 31204 ] )
>   0.0435077792741
>
> Any help would be greatly appreciated.
>
> [1] CSR - http://bit.ly/2fPj42V
>
> --
> Eric Morgan

--
Andromeda Yelton
Senior Software Engineer, MIT Libraries: https://libraries.mit.edu/
President, Library & Information Technology Association: http://www.lita.org
http://andromedayelton.com
@ThatAndromeda <http://twitter.com/ThatAndromeda>
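
For reference, the query described in the quoted message (given a column identifier, list every row identifier and score, highest first) might be sketched as below. This is only a sketch under assumptions: it supposes that tfidf is a scipy.sparse CSR matrix, which the thread does not confirm, and the function name documents_for_column and the small toy matrix are hypothetical stand-ins for the real data.

```python
import numpy as np
from scipy.sparse import csr_matrix


def documents_for_column(matrix, column):
    """Return (row, score) pairs stored in one column, highest score first.

    Hypothetical helper: `matrix` is assumed to be a scipy.sparse matrix.
    """
    # CSR stores data row by row; CSC stores it column by column,
    # so one column becomes a contiguous slice after conversion.
    csc = matrix.tocsc()
    start, end = csc.indptr[column], csc.indptr[column + 1]
    rows = csc.indices[start:end]    # row (document) identifiers
    scores = csc.data[start:end]     # stored scores for this column
    order = np.argsort(-scores, kind="stable")  # descending by score
    return [(int(rows[i]), float(scores[i])) for i in order]


# Toy stand-in for the real matrix: 3 documents (rows) by 4 columns.
tfidf = csr_matrix(np.array([
    [0.0, 0.5, 0.0, 0.1],
    [0.2, 0.0, 0.0, 0.9],
    [0.0, 0.3, 0.0, 0.0],
]))

for doc_id, score in documents_for_column(tfidf, 3):
    print(doc_id, score)
```

Converting once with tocsc() and reusing the result would avoid repeating the conversion when many columns are queried; the sketch converts on every call only to stay short.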