Do you have a link to the code you're using?
On Tue, Sep 26, 2017 at 1:25 PM, Eric Lease Morgan <[log in to unmask]> wrote:
> Does anybody here know how to access a Python compressed sparse row format
> (CSR) object? [1]
>
> I am using Python to do a bit of topic modeling (think “classification”),
> and so far, the results are more than plausible, but they include only
> the topics, not the documents corresponding to the topics. Along the way, my
> script creates a compressed sparse row format object, and it looks
> something like this:
>
> (0, 16099) 0.055924002143
> (0, 9497) 0.0256051292226
> (0, 16202) 0.140746540109
> (0, 38982) 0.000842900625312
> : :
> (309, 40805) 0.0435077792741
> (309, 45679) 0.0435077792741
> (309, 19462) 0.0435077792741
> (309, 8346) 0.0435077792741
> (309, 31204) 0.0435077792741
>
> Where the first column denotes a document identifier, the second column
> denotes a topic identifier, and the third column denotes the score of the
> topic in the document. In the example above, document #0 is a lot about
> topic #16202 but not a lot about topic #38982.
>
> I want to query my CSR object. For example, given a topic identifier (e.g.,
> 48692), return a list of all document identifiers and scores from the
> object. I will then sort the scores to find which documents most
> significantly use the given topic.
>
> I can’t for the life of me figure out how to get what I need. I can get
> specific values of rows like this, where tfidf is my CSR object:
>
> >>> print( tfidf[ 309, 31204 ] )
> 0.0435077792741
>
> Any help would be greatly appreciated.
>
> [1] CSR - http://bit.ly/2fPj42V
>
> —
> Eric Morgan
>
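Without seeing the code I can only guess, but here is a minimal sketch of the lookup described above, assuming tfidf is a scipy.sparse.csr_matrix. The helper name documents_for_column and the toy matrix are made up for illustration; they are not from Eric's script.

```python
from scipy.sparse import csr_matrix


def documents_for_column(matrix, column_id):
    """Return (row_id, score) pairs for one column, highest score first.

    Hypothetical helper; assumes `matrix` is a SciPy sparse matrix.
    """
    # CSR is built for row access, so convert to CSC before slicing a column.
    column = matrix.tocsc()[:, column_id]
    # COO exposes the row index and value of each stored entry.
    coo = column.tocoo()
    pairs = zip(coo.row.tolist(), coo.data.tolist())
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Toy stand-in for the real object: 3 documents (rows) by 4 columns.
rows = [0, 0, 1, 2, 2]
cols = [1, 3, 3, 0, 3]
vals = [0.5, 0.1, 0.7, 0.2, 0.4]
tfidf = csr_matrix((vals, (rows, cols)), shape=(3, 4))

ranked = documents_for_column(tfidf, 3)
print(ranked)
```

Only stored (nonzero) entries come back, so documents that never use the given column are simply absent from the list.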
--
Andromeda Yelton
Senior Software Engineer, MIT Libraries: https://libraries.mit.edu/
President, Library & Information Technology Association: http://www.lita.org
http://andromedayelton.com
@ThatAndromeda <http://twitter.com/ThatAndromeda>