Do you have a link to the code you're using?

On Tue, Sep 26, 2017 at 1:25 PM, Eric Lease Morgan <[log in to unmask]> wrote:

> Does anybody here know how to access a Python compressed sparse row format
> (CSR) object? [1]
>
> I am using Python to do a bit of topic modeling (think “classification”),
> and so far the results are more than plausible, but they only return
> topics, not the documents corresponding to those topics. Along the way, my
> script creates a compressed sparse row format object, and it looks
> something like this:
>
>   (0, 16099)    0.055924002143
>   (0, 9497)     0.0256051292226
>   (0, 16202)    0.140746540109
>   (0, 38982)    0.000842900625312
>   :     :
>   (309, 40805)  0.0435077792741
>   (309, 45679)  0.0435077792741
>   (309, 19462)  0.0435077792741
>   (309, 8346)   0.0435077792741
>   (309, 31204)  0.0435077792741
>
> Here the first column denotes a document identifier, the second column
> denotes a topic identifier, and the third column denotes the score of the
> topic in the document. In the example above, document #0 is a lot about
> topic #16202 but not a lot about topic #38982.
>
> I want to query my CSR object. For example, given a topic identifier (e.g.,
> 48692), return a list of all document identifiers and scores from the
> object. I will then sort the scores to find which documents most
> significantly use the given topic.
>
> I can’t for the life of me figure out how to get what I need. I can get
> specific values like this, where tfidf is my CSR object:
>
>   >>> print( tfidf[ 309, 31204 ] )
>   0.0435077792741
>
> Any help would be greatly appreciated.
>
> [1] CSR - http://bit.ly/2fPj42V
>
> —
> Eric Morgan
>
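A minimal sketch of the query asked about above, assuming tfidf is a SciPy csr_matrix (the thread does not confirm which library built it). A CSR object stores three flat arrays, exposed in SciPy as tfidf.data, tfidf.indices and tfidf.indptr: the stored entries of row r sit at positions indptr[r] through indptr[r+1]-1 of data, and indices gives each entry's column. The helper names triples, column_scores and top_documents are hypothetical, and the tiny matrix in the demo is made up for illustration.

```python
def triples(data, indices, indptr):
    """Yield (row, column, value) for every stored entry, like the printed listing."""
    for row in range(len(indptr) - 1):
        # Entries for this row occupy data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            yield row, indices[k], data[k]


def column_scores(data, indices, indptr, col):
    """Return (row, value) pairs for every stored entry in column `col`.

    Does not assume column indices are sorted within a row
    (the listing in the question shows they may not be).
    """
    return [(row, value) for row, c, value in triples(data, indices, indptr) if c == col]


def top_documents(data, indices, indptr, col, n=10):
    """Return up to n (row, value) pairs for column `col`, highest value first."""
    hits = column_scores(data, indices, indptr, col)
    return sorted(hits, key=lambda pair: pair[1], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical 3 x 5 matrix:
    #   row 0: col 1 = 0.5, col 3 = 0.2
    #   row 1: col 3 = 0.9
    #   row 2: col 0 = 0.1, col 3 = 0.4
    data = [0.5, 0.2, 0.9, 0.1, 0.4]
    indices = [1, 3, 3, 0, 3]
    indptr = [0, 2, 3, 5]
    print(top_documents(data, indices, indptr, col=3))  # [(1, 0.9), (2, 0.4), (0, 0.2)]
```

With a real SciPy matrix this would be called as top_documents(tfidf.data, tfidf.indices, tfidf.indptr, 48692). The plain scan touches every stored entry, which is fine for a few hundred documents; for repeated column queries, SciPy can slice the column directly, e.g. col = tfidf[:, 48692].tocoo() and then zip(col.row, col.data), or convert once with tfidf.tocsc(), which is laid out for fast column access.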



-- 
Andromeda Yelton
Senior Software Engineer, MIT Libraries: https://libraries.mit.edu/
President, Library & Information Technology Association: http://www.lita.org
http://andromedayelton.com
@ThatAndromeda <http://twitter.com/ThatAndromeda>