The wordcloud, or tag cloud, is a visual representation of text data in which frequent words are drawn larger and infrequent ones smaller. The graph in this post was made in R with the wordcloud package. It was built from a New York Times article, but you can use your own corpus; the annotated code below runs on the crude sample corpus and shows the logic step by step.
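# Load the packages: tm supplies the crude corpus and the text-processing functions, wordcloud draws the plot
library(tm)
library(wordcloud)
data(crude) # crude ships with tm: 20 Reuters news articles about crude oil
crude <- tm_map(crude, content_transformer(tolower)) # normalization: lowercase first, because removeWords is case-sensitive and stopwords() are lowercase
crude <- tm_map(crude, removePunctuation) # normalization: strip punctuation
crude <- tm_map(crude, removeWords, stopwords("english")) # drop common English stop words ("the", "and", ...)
tdm <- TermDocumentMatrix(crude) # term-document matrix: one row per term, one column per document (20 documents)
m <- as.matrix(tdm) # convert the sparse tdm to an ordinary matrix
v <- sort(rowSums(m), decreasing = TRUE) # total count of each term over all documents, most frequent first
d <- data.frame(word = names(v), freq = v) # one row per word with its frequency
set.seed(1234) # word placement is random; fix the seed for a reproducible layout
wordcloud(d$word, d$freq) # sizes each word by its frequency; by default words occurring fewer than 3 times (min.freq = 3) are left out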
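To see what the tm pipeline computes, here is a minimal base-R sketch that repeats the same steps (normalization, stop-word removal, term-document matrix, row sums, sorted data frame) on a made-up three-document corpus, without tm or wordcloud. The corpus, the short stop-word list and the size rule are all assumptions for illustration: the size rule is a simple linear mapping from frequency to a font-size range, in the spirit of the scale argument of wordcloud(), not the package's actual drawing code.

```r
# Hypothetical toy corpus: three short "documents"
docs <- c(
  "Oil prices rose; oil demand rose.",
  "The price of crude oil fell.",
  "Demand for crude fell, and prices fell."
)

# Hypothetical stop-word list (a tiny stand-in for tm's stopwords("english"))
stop_words <- c("the", "of", "for", "and")

# Normalize each document: lowercase, strip punctuation, split on whitespace,
# then drop empty strings and stop words
tokens <- lapply(docs, function(doc) {
  doc <- gsub("[[:punct:]]", "", tolower(doc))
  words <- strsplit(doc, "[[:space:]]+")[[1]]
  words[nzchar(words) & !(words %in% stop_words)]
})

# Term-document matrix: one row per term, one column per document
terms <- sort(unique(unlist(tokens)))
tdm <- vapply(tokens,
              function(words) as.vector(table(factor(words, levels = terms))),
              integer(length(terms)))
rownames(tdm) <- terms
colnames(tdm) <- paste0("doc", seq_along(docs))

# Same steps as the post: total frequency per term, sorted, into a data frame
v <- sort(rowSums(tdm), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v,
                row.names = NULL, stringsAsFactors = FALSE)

# Assumed linear size rule: the most frequent word gets size_range[1],
# and sizes shrink toward size_range[2] as frequency drops
size_range <- c(4, 0.5)
d$size <- (size_range[1] - size_range[2]) * d$freq / max(d$freq) + size_range[2]

print(d)
```

In this toy corpus "oil" and "fell" each occur three times and receive the largest size. Note that "price" and "prices" are counted as separate words, since neither this sketch nor the pipeline above applies stemming.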
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.