Visualising Twitter History
Some time ago I saw a blog post on visualising your Twitter history using streamgraphs in R. As my tweets have moved from mostly psychology/philosophy tweeting around my teaching to my current mish-mash of learning analytics stuff, I thought it’d be interesting to play with this. So, here’s the code reproduced, plus my streamgraph. If anyone has other ideas for analysing my Twitter data, now that I’ve got the archive it’d be easy enough to do new stuff.
# Adapted from http://www.r-bloggers.com/visualizing-twitter-history-with-streamgraphs-in-r/
# Script for producing a streamgraph of tweet hashtags

# Note: you have to install the 'devtools' package first, to install streamgraph from GitHub.
# To use devtools you also need Rtools from http://cran.r-project.org/bin/windows/Rtools/
# I also had to install the withr package.
# library(devtools)
# devtools::install_github("hrbrmstr/streamgraph")

# Load packages
library("readr")
library("dplyr")
library("lubridate")
library("streamgraph")
library("htmlwidgets")

# Read my tweets, convert the text to UTF-8 and lower-case it
tweets_df <- read_csv("tweets.csv")
tweets_df$text <- sapply(tweets_df$text, function(x) iconv(x, "", "UTF-8"))
tweets_df$text <- tolower(tweets_df$text)

# Pick hashtags with regexp
hashtags_list <- regmatches(tweets_df$text, gregexpr("#[[:alnum:]]+", tweets_df$text))
# If you want to play with expanding the hashtags to other keywords, one way
# might be to look at a tf-idf matrix, removing very infrequent and very
# frequent terms

# Create a new data frame with (timestamp, hashtag) pairs
# (tibble() replaces the deprecated data_frame() used in the original script)
hashtags_df <- tibble()
for (i in which(sapply(hashtags_list, length) > 0)) {
  hashtags_df <- bind_rows(hashtags_df,
                           tibble(timestamp = tweets_df$timestamp[i],
                                  hashtag = hashtags_list[[i]]))
}

# Process data for plotting
hashtags_df <- hashtags_df %>%
  # Pick top 20 hashtags
  filter(hashtag %in% names(sort(table(hashtag), decreasing = TRUE))[1:20]) %>%
  # Group by year-month (daily is too messy)
  # Need to add '-01' to make it a valid date for streamgraph
  mutate(yearmonth = paste0(format(as.Date(timestamp), format = "%Y-%m"), "-01")) %>%
  group_by(yearmonth, hashtag) %>%
  summarise(value = n())

# Create streamgraph
sg <- streamgraph(data = hashtags_df, key = "hashtag", value = "value", date = "yearmonth",
                  offset = "silhouette", interpolate = "cardinal",
                  width = "700", height = "400") %>%
  sg_legend(TRUE, "hashtag: ") %>%
  sg_axis_x(tick_interval = 1, tick_units = "year", tick_format = "%Y")

# Save it for viewing in the blog post
# Using the original code to save here (with file.rename()), but I'm not sure
# whether I have the same problem they encountered.
# Both file names are identical here, so the rename is effectively a no-op.
saveWidget(sg, file = "twitter_streamgraph.html", selfcontained = TRUE)
file.rename("twitter_streamgraph.html", "twitter_streamgraph.html")
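The loop that builds the (timestamp, hashtag) pairs grows the data frame one tweet at a time, which can get slow on a large archive. A minimal sketch of a vectorised alternative in base R follows. The three toy tweets are made up for illustration and stand in for tweets.csv; this is one possible approach, not what the original script does.

```r
# Toy stand-in for the archive (hypothetical data); the real script reads
# tweets.csv instead.
tweets_df <- data.frame(
  timestamp = c("2015-03-01 10:00:00", "2015-03-02 11:30:00", "2015-04-05 09:15:00"),
  text = c("marking essays #philosophy", "no tags here", "off to #lak15 #learninganalytics"),
  stringsAsFactors = FALSE
)

# Same hashtag extraction as in the script above
hashtags_list <- regmatches(tweets_df$text, gregexpr("#[[:alnum:]]+", tweets_df$text))

# Number of hashtags found in each tweet (0 for tweets without any)
n_tags <- lengths(hashtags_list)

# Repeat each timestamp once per hashtag and flatten the list of hashtags,
# so row k pairs the k-th hashtag with the timestamp of its tweet
hashtags_df <- data.frame(
  timestamp = rep(tweets_df$timestamp, times = n_tags),
  hashtag = unlist(hashtags_list),
  stringsAsFactors = FALSE
)

print(hashtags_df)
```

Tweets without hashtags get a repeat count of zero, so they drop out without needing the which() filter.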
The live version can be found here (I recommend zooming in), with a screenshot highlighting ‘learning analytics’ below. As I note in the code above, another way of visualising this would be to select different terms, but for now this is quite nice :-). One thing it indicates is that I am probably not using hashtags very effectively(!) except for events like conferences: most of the peaks here are around particular, often very short-term events. My tweeting certainly goes up then, but it shouldn’t be quite as dramatic as it is.
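For the idea of selecting different terms, here is a rough base R sketch of the tf-idf approach mentioned in the code comments. It treats each tweet as a document and drops very rare and very common words. The toy tweets and the cut-offs (at least two tweets, but not every tweet) are assumptions for illustration only and would need tuning on a real archive.

```r
# Hypothetical toy tweets; in the real script these would be tweets_df$text
texts <- c(
  "the learning analytics conference starts today",
  "marking the philosophy essays today",
  "learning analytics and the philosophy of education",
  "the slides from the conference"
)

# Split each tweet into lower-case word tokens, dropping empty strings
tokens <- strsplit(tolower(texts), "[^[:alnum:]]+")
tokens <- lapply(tokens, function(x) x[nzchar(x)])

vocab <- sort(unique(unlist(tokens)))

# Term-frequency matrix: one row per tweet, one column per term
tf <- t(sapply(tokens, function(x) table(factor(x, levels = vocab))))

# Document frequency: in how many tweets does each term appear?
doc_freq <- colSums(tf > 0)

# Drop terms that appear in only one tweet (too rare) or in every tweet
# (too common); these thresholds are arbitrary assumptions
keep <- doc_freq >= 2 & doc_freq < length(texts)

# Weight the remaining term counts by inverse document frequency
idf <- log(length(texts) / doc_freq[keep])
tfidf <- sweep(tf[, keep, drop = FALSE], 2, idf, `*`)

# Candidate keywords that could be counted per month instead of hashtags
keywords <- colnames(tfidf)
print(keywords)
```

The surviving keywords could then be fed through the same year-month grouping as the hashtags to produce a keyword streamgraph.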


