Playing with words (just for fun)!

A word cloud (or tag cloud) is a visual representation of text data: frequent words are drawn larger and infrequent ones smaller. This post's graph was made in R using the wordcloud package. An article from the New York Times is the input for the graph, but you may use your own corpus. The annotated code below, which uses the crude corpus from the tm package, shows the logic.

Code:
library(wordcloud) # provides the wordcloud() function

library(tm) # text-mining tools; also ships the crude corpus

data(crude) # crude is a corpus with 20 text documents

crude <- tm_map(crude, removePunctuation) # normalization: strip punctuation

crude <- tm_map(crude, removeWords, stopwords("english")) # drop common English stopwords

tdm <- TermDocumentMatrix(crude) # creating a term-document matrix (924 terms and 20 documents)

m <- as.matrix(tdm) # converting tdm to a plain matrix (rows = terms, columns = documents)

v <- sort(rowSums(m), decreasing = TRUE) # total count of each term, most frequent first

d <- data.frame(word = names(v), freq = v) # creating a data frame of words and their frequencies

wordcloud(d$word, d$freq) # takes the 'word' and 'freq' columns of the data frame and draws the word cloud
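
If you want to try this on your own text rather than crude, the same word/frequency table can be built with base R alone. The following is only a minimal sketch: the sample sentence and the variable name text are made up for illustration, and no stopword removal is done here.

```r
# Hypothetical input text; replace it with your own document.
text <- "Oil prices rose today. Oil producers said prices may rise again."

# Lowercase the text, strip punctuation, and split on whitespace.
words <- strsplit(gsub("[[:punct:]]", "", tolower(text)), "\\s+")[[1]]
words <- words[nzchar(words)] # drop any empty strings

# Count each word and sort from most to least frequent.
v <- sort(table(words), decreasing = TRUE)

# Same two-column layout as the data frame d in the code above.
d <- data.frame(word = names(v), freq = as.integer(v), stringsAsFactors = FALSE)
```

The resulting d can be passed to wordcloud(d$word, d$freq) as before. For a very short text, note that wordcloud() skips words below its min.freq argument (3 by default), so you would probably want min.freq = 1.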
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
