Some time ago I saw a blog post on visualising your Twitter history using streamgraphs in R. As my tweets have moved from mostly psychology/philosophy tweeting around my teaching through to my current mish-mash of learning analytics stuff, I thought it’d be interesting to play with this. So, here’s the code reproduced, plus my streamgraph. If anyone has any other ideas for analysing my Twitter data, let me know: now I’ve got the archive, it’d be easy enough to do new stuff.
# Adapted from http://www.r-bloggers.com/visualizing-twitter-history-with-streamgraphs-in-r/
# Script for producing a streamgraph of tweet hashtags
# Note: you have to install the 'devtools' package first, in order to install streamgraph from GitHub
# On Windows, devtools also needs Rtools from http://cran.r-project.org/bin/windows/Rtools/
# I also had to install the withr package
#library(devtools)
#devtools::install_github("hrbrmstr/streamgraph")
# Load packages
library("readr")
library("dplyr")
library("lubridate")
library("streamgraph")
library("htmlwidgets")
# Read my tweets
tweets_df <- read_csv("tweets.csv")
# iconv() is vectorised, so it can be applied to the whole column at once
tweets_df$text <- iconv(tweets_df$text, "", "UTF-8")
tweets_df$text <- tolower(tweets_df$text)
# Pick hashtags with regexp
hashtags_list <- regmatches(tweets_df$text, gregexpr("#[[:alnum:]]+", tweets_df$text))
#if you want to play with expanding the hashtags to other keywords one way might be to look at a tfidf matrix, removing very infrequent and very frequent terms
# Create a new data frame of (timestamp, hashtag) pairs
# (tibble() replaces the deprecated data_frame())
hashtags_df <- tibble()
for (i in which(sapply(hashtags_list, length) > 0)) {
  hashtags_df <- bind_rows(hashtags_df, tibble(timestamp = tweets_df$timestamp[i],
                                               hashtag = hashtags_list[[i]]))
}
# Process data for plotting
hashtags_df <- hashtags_df %>%
  # Pick top 20 hashtags
  filter(hashtag %in% names(sort(table(hashtag), decreasing = TRUE))[1:20]) %>%
  # Group by year-month (daily is too messy)
  # Need to add '-01' to make it a valid date for streamgraph
  mutate(yearmonth = paste0(format(as.Date(timestamp), format = "%Y-%m"), "-01")) %>%
  group_by(yearmonth, hashtag) %>%
  summarise(value = n())
# Create streamgraph
sg <- streamgraph(data = hashtags_df, key = "hashtag", value = "value", date = "yearmonth",
                  offset = "silhouette", interpolate = "cardinal",
                  width = "700", height = "400") %>%
  sg_legend(TRUE, "hashtag: ") %>%
  sg_axis_x(tick_interval = 1, tick_units = "year", tick_format = "%Y")
# Save it for viewing in the blog post
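# The original code used file.rename() after saving, but I'm not sure I have the same problem they encountered.
# Renaming the file to its own name does nothing, so the call is commented out; it only matters if you want to move the file.
saveWidget(sg, file = "twitter_streamgraph.html", selfcontained = TRUE)
# file.rename("twitter_streamgraph.html", "twitter_streamgraph.html")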
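As an aside, the loop that builds the (timestamp, hashtag) pairs could probably be written without repeated row-binding. The following is only a sketch in base R, using a made-up `toy_tweets` data frame in place of my archive; the idea is to repeat each timestamp once per hashtag found in that tweet and then flatten the list of matches.

```r
# Hypothetical stand-in for tweets_df (not real data from my archive)
toy_tweets <- data.frame(
  timestamp = c("2015-01-03 10:00:00 +0000",
                "2015-01-20 09:30:00 +0000",
                "2015-02-11 18:45:00 +0000"),
  text = c("off to #lak15 to talk #learninganalytics",
           "no hashtags here",
           "more #learninganalytics reading"),
  stringsAsFactors = FALSE
)

# Same regexp as in the main script
hashtags_list <- regmatches(toy_tweets$text,
                            gregexpr("#[[:alnum:]]+", toy_tweets$text))

# lengths() gives the number of hashtags per tweet; tweets with none
# get a count of 0 and so drop out of both columns
pairs <- data.frame(
  timestamp = rep(toy_tweets$timestamp, lengths(hashtags_list)),
  hashtag   = unlist(hashtags_list),
  stringsAsFactors = FALSE
)
```

The resulting `pairs` has one row per hashtag occurrence, so it should slot into the same processing pipeline in place of the loop's output.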
The live version can be found here (I recommend zooming in), with a screenshot highlighting ‘learning analytics’ below. As I note in the code above, another way of visualising this would be to select different terms, but for now this is quite nice :-). One thing it indicates is that I’m probably not using hashtags very effectively(!) except for events like conferences: most of the peaks here are around particular, often very short-term, events. My tweeting certainly goes up then, but the difference shouldn’t be quite so dramatic.
[Image: twitter_streamgraph]
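For the idea of selecting terms other than hashtags, here is a rough sketch of the tf-idf approach mentioned in the code comments. It uses base R only rather than a text-mining package; the `docs` vector is made up, and the frequency thresholds are arbitrary assumptions that would need tuning on real tweets.

```r
# Hypothetical documents; in practice each would be a lower-cased tweet
docs <- c("learning analytics at the conference",
          "teaching philosophy of the mind",
          "learning analytics and the writing tools")

# Split on anything that is not a letter or digit, dropping empty tokens
tokens <- lapply(strsplit(docs, "[^[:alnum:]]+"), function(x) x[nzchar(x)])
vocab <- sort(unique(unlist(tokens)))

# Term-frequency matrix: one row per document, one column per term
tf <- t(vapply(tokens,
               function(x) as.vector(table(factor(x, levels = vocab))),
               integer(length(vocab))))
colnames(tf) <- vocab

# Document frequency (number of documents containing each term) and idf
doc_freq <- colSums(tf > 0)
idf <- log(length(docs) / doc_freq)

# Weight the counts by idf; a term found in every document gets weight 0
tfidf <- sweep(tf, 2, idf, `*`)

# Drop very infrequent and very frequent terms (thresholds are arbitrary)
keep <- doc_freq >= 2 & doc_freq < length(docs)
candidate_terms <- names(which(keep))
```

The terms left in `candidate_terms` could then be matched against tweet text in the same way as the hashtags, giving a streamgraph of keywords rather than tags.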