One of the difficulties with your surface analysis of Thoreau vs Austen is
that Thoreau wrote a memoir and Austen wrote fictional narrative. If the
texts were available, it might be interesting to see how something like
Bridget Jones compares. It will clearly have a lot of female 3rd person in
it, but it will also have a lot of 1st person.
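A rough sketch of how the first-person vs. third-person comparison could be measured, assuming plain-text copies of the works are at hand. The pronoun groupings and the names `PRONOUN_GROUPS`, `tokenize`, and `pronoun_profile` are made up for illustration and are not taken from Eric's scripts:

```python
import re
from collections import Counter

# Hypothetical pronoun groupings, chosen only for illustration.
PRONOUN_GROUPS = {
    "first_singular": {"i", "me", "my", "mine", "myself"},
    "first_plural": {"we", "us", "our", "ours", "ourselves"},
    "third_feminine": {"she", "her", "hers", "herself"},
    "third_masculine": {"he", "him", "his", "himself"},
}


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def pronoun_profile(text):
    """Return each pronoun group's share of all tokens (0.0 to 1.0)."""
    tokens = tokenize(text)
    total = len(tokens)
    counts = Counter(tokens)
    return {
        group: (sum(counts[word] for word in words) / total if total else 0.0)
        for group, words in PRONOUN_GROUPS.items()
    }


# Tiny made-up sample; a real run would read a whole book from a file.
sample = (
    "I went to the woods because I wished to live deliberately. "
    "She told her sister."
)
profile = pronoun_profile(sample)
for group, share in profile.items():
    print(f"{group}: {share:.3f}")
```

Contractions such as "I'm" are kept as single tokens here and so are not counted as "I"; a diary-style novel would probably need a smarter tokenizer before the numbers mean much.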
On Tue, Feb 22, 2011 at 19:02, Eric Lease Morgan wrote:
> On Feb 22, 2011, at 9:02 AM, Cindy Harper wrote:
> > It's not ironic - my post was musing inspired by your work. I guess I
> > wasn't sure if I understood your results. You were looking at the overall
> > POS usage in the entire texts as a possible way of ranking the texts. I was
> > wondering about POS of particular search terms - those that could take on
> > several POS....
> Initially I wanted to see if I could classify works based on their POS
> usage.  I was hoping to find lots of action verbs in one work and call it
> an action story. I was hoping to find lots of nouns in another story and
> call it... I don't know, something else. Instead, after rudimentary
> investigation, I discovered that all of the works I analyzed had the same
> relative percentage of nouns, pronouns, verbs, adverbs, adjectives, etc.
> Maybe such a thing is indicative of the English language.
> On the other hand, I did notice a difference in the use of particular
> pronouns between works. In Walden by Thoreau, a story about an individual
> living on the banks of a "pond", there was a lot of use of the word "I", but
> in a different story, where the author and his brother canoe down a river,
> the word "we" predominated. Similarly, three Jane Austen stories have many
> words like "she" and "her", whereas those words are less frequent in the works
> by Thoreau. While my analysis was trivial and thin, I think we might be able
> to classify some works by gender or speaking voice.
> Similar things may be possible with other parts-of-speech, like adjectives,
> specifically colors. For example, 214 of the 117,540 words in Walden (0.18%)
> are color words, but only 13 of the 121,917 words in Pride and Prejudice
> (0.01%) are. Despite the similar lengths of the works, Walden is roughly 17
> times more "colorful" than Pride. Interesting? This only raises other
> questions. Is 0.18% a high value or a low value? Is the relative use of
> colors similar within a particular author or not? Has the use of color
> changed over time, or is it indicative of genres? Does the use of specific
> colors actually denote mood?
> In the past, libraries did not have a whole lot of full text with which to
> evaluate content. That is not true nowadays. It is now possible to
> literally count and measure a book's characteristics. Since this metadata is
> numeric in nature, it lends itself to visualization. (Think Karen C's
> presentation at Code4Lib.) And this whole thing is good fodder for search,
> discovery, and evaluation. Too much of our metadata is qualitative.
>  forays into POS - http://bit.ly/aM2eZx
>  color words in Walden - http://t.co/hlg5ibL
>  color words in Pride - http://t.co/VflNf3n
> Eric Lease Morgan
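
For anyone who wants to try the color-word measure from the quoted message on other texts, here is a minimal sketch. The word list `COLOR_WORDS` and the function `color_share` are assumptions made for illustration; the list actually used for Walden and Pride and Prejudice is not given in the message and may well differ. The last few lines simply redo the arithmetic on the counts quoted above.

```python
import re

# Hypothetical color list; the list used in the quoted analysis is unknown.
COLOR_WORDS = {
    "red", "orange", "yellow", "green", "blue", "purple",
    "brown", "black", "white", "gray", "grey", "pink",
}


def color_share(text, colors=COLOR_WORDS):
    """Return (color-word count, total word count, percentage of color words)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = sum(1 for token in tokens if token in colors)
    percent = 100.0 * hits / len(tokens) if tokens else 0.0
    return hits, len(tokens), percent


# Made-up sentence, just to show the function in use.
hits, total, percent = color_share("The green pond under a blue sky")
print(hits, total, round(percent, 2))

# Redo the arithmetic with the counts quoted in the message.
walden_pct = 100.0 * 214 / 117540
pride_pct = 100.0 * 13 / 121917
ratio = walden_pct / pride_pct
print(f"Walden {walden_pct:.2f}%, Pride {pride_pct:.2f}%, ratio {ratio:.1f}")
```

Dividing the rounded percentages (0.18 / 0.01) gives 18, while the unrounded shares give a ratio closer to 17, so the size of the gap depends a little on where the rounding happens.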