Wikimedia meetups in UK - ugly draft 1

Wikimedia meetups in UK – ugly draft 1

Despite the fact I seem to be spending ALL OF THE TIME on R at the minute, I’ve been wanting to explore it a bit more, ideally through manageable chunks (perhaps the achievable element seems comforting contra PhD analysis). In any case, one idea I had recently was to map events, for example Wikimedia meetups, and think about creating scaled blobs over geographic regions. Ideally these would indicate different types of event (e.g. past v. future) and would be interactive (e.g. labelled and hyperlinked). I made a start on it…it was harder than expected :/ but here’s a good beginning. The below is a start on that code. It’d be cool to make it prettier (I mean really), and I think this might be a good candidate for my first Shiny app, so I could make it a bit interactive (E.g. a dropdown list of meetup regions).It might also be fun to use the geonames Wikipedia call at some point, e.g. to find local Wikipedia info on geolocated events. It would also be cool to use WikiData rather than geonames for the location calls (but I’d need to write the api calls, or/and a package to connect, none exists as yet). The reason I’m using ‘rworldmap’ rather than ggmap (by the way) is that even trying to ‘get_openstreetmap’ I was having problems connecting.

library(XML)
library(plyr)
library(shiny)

WikiMeets < - readHTMLTable(doc="http://meta.wikimedia.org/wiki/Meetup/UK/Footer_list") #get the page

WikiMeets <- WikiMeets[[^1]] #happily, there's only one (unnamed) table on that page
#each Wikimeet is numbered "x - " and separated "n" with col 1 giving single cell locations

WikiMeets\$V1 <- as.character(WikiMeets\$V1) #we want char not factor
WikiMeets\$V2 <- as.character(WikiMeets\$V2) #we want char not factor

#There's almost certainly a more elegant way to do this, but we can just do a sapply over the rows, and replace each "n" with the location. 
#Or...this more elegant way!
library(stringr)
WikiMeets <- ddply(WikiMeets, .(V1), 
      function(x) data.frame(WDate=str_split(x\$V2, "n")[[^1]]))
#Now we have a table where each row is 1 event, col1 = location, col2= date

#We want to separate col 2 into n & date
WikiMeets <- cbind(WikiMeets,str_split_fixed(WikiMeets\$WDate, " - ", 2))
WikiMeets <- WikiMeets[c(1,3,4)] #select relevant columns
WikiMeets[] <- lapply(WikiMeets,as.character) #set all as character, use [] to put list into df
WikiMeets <- rename(WikiMeets,c("2"="WDate"))

#Then we'll want to interpret the dates
#it turns out they're filthy.  We need to:
## 1. Abbreviate all months
## 2. Strip leading 0s from days
## 3. Check all years are of a 4 digit type
## 4. Remove the odd multi-day event (there's at least one 8-10 type date)  163
## 5. Deal with 'cancelled' 
## 6. There's at least one instance of month1999 conjoined...the sick bastards  121

WD <- WikiMeets[^3] #get the date column

WD[[^2]] <- lapply(WDl, function(x) x[^1]) #get the day
WD[[^3]] <- lapply(WDl, function(x) ifelse(length(x)>3,x[^4],ifelse(length(x)==1,NA,x[^2]))) #month
WD[[^4]] < - lapply(WDl, function(x) ifelse(length(x)>3,x[^5],ifelse(length(x)==1,NA,x[^3]))) #year

WD < - mutate(WD,
             month = ifelse(is.na(WD[[^3]]),NA,ifelse(nchar(WD[[^3]])>3 & grepl("[0-9]",WD[[^3]])==T,str_sub(paste0(WD[[^3]]),1,3),str_sub(paste0(WD[[^3]]),1,3))),
             year = ifelse(nchar(WD[[^3]])>3 & grepl("[0-9]",WD[[^3]])==T,unlist(strsplit(paste0(WD[[^3]]), "[^0-9]+")),unlist(WD[^4])),
             day = ifelse(nchar(WD[[^2]])==1,paste0("0",unlist(WD[^2])),unlist(WD[^2]))
             )
#cleanup in aisle 3 please.  This works on the 'month' data and (1)checks if NA, if so NA,then checks if it's over 3 AND if it includes numerics (the year) if so it sets WD[[^4]] to that year (strsplit) & pastes 1st 3 letters into col 5, if not it just takes the 1st 3 letters into col 5

WikiMeets < - cbind(WikiMeets,WD[5:7]) #join the split date back onto the meets
rm(WD)

WikiMeets\$WDate <- paste(WikiMeets\$day,WikiMeets\$month,WikiMeets\$year,sep="/") #rejoin the date strings in a standardised format

WikiMeets\$WDate <- ifelse(grepl("cancel",WikiMeets\$WDate,ignore.case=T)==T,"Cancelled",WikiMeets\$WDate) #standardise any with 'cancel' in 

#WikiMeets\$WDate <- ifelse(grepl("cancel",WikiMeets\$WDate,ignore.case=T)==T,"Cancelled",as.Date(WikiMeets\$WDate,"%d/%b/%Y")) looks like as.Date isn't needed here 
Today <- format(Sys.Date(),"%d/%b/%Y") #get today's date in my format

WikiMeets\$Status <- ifelse(WikiMeets\$WDate=="Cancelled","Cancelled",ifelse(WikiMeets\$WDate>Today,"Future",ifelse(WikiMeets\$WDate==Today,"Today","Past"))) #for each date, check if cancelled, today, future, or past

WikiMeets < - rename(WikiMeets,c("V1"="Location"))

#Create the aggregates (future,past,today,cancelled)
WikiMeetsL <- ddply(WikiMeets,.(Status,Location),nrow)

#Get location coordinates
library(geonames)
options(geonamesUsername="sjgknight")
options(geonamesHost="api.geonames.org")
# conveninence function to look up and format results
GNsearchUK <- function(x) {
  res <- GNsearch(name=x, countryBias="GB")
  return(res[1, ])
}
GNresult <- sapply(unique(WikiMeetsL\$Location),GNsearchUK)
GNresult <- t(GNresult) #easier to use transposed
GNresult <- cbind(row.names(GNresult),GNresult) #add loc
GNresult <- GNresult[2:nrow(GNresult),] #get rid of blank row
colnames(GNresult)[^1] <- "Location" #name loc col
WikiMeetsL <- merge(WikiMeetsL,GNresult,by="Location",all.x=T,all.y=F) #attach location data to the WikiMeets table
WikiMeetsL <- WikiMeetsL[c("Location","Status","V1","StatCol","lng","lat")]

#http://api.geonames.org/wikipediaSearch?q=london&maxRows=10&username=demo might be fun to play with at some point

#library(ggmap)
#library(mapproj)
library(rworldmap)
map <- getMap(resolution="low") #get the map

#Thanks https://en.wikipedia.org/wiki/Extreme_points_of_the_United_Kingdom
#Also useful https://www.students.ncl.ac.uk/keith.newman/r/maps-in-r#uk
#http://www.milanor.net/blog/?p=594

#We can create blobs by (1) create a column indicating future,past,cancelled, (2) counting those, (3) layering them in order of size giving each type a diff colour
 par(mai=c(0,0,0.2,0),xaxs="i",yaxs="i") #what does this do?
mapBubbles(dF=WikiMeetsL
           , nameX = "lng"
           , nameY = "lat"
           , nameZSize="V1"
           , nameZColour="Status"
           , colourPalette="rainbow"
           , oceanCol="lightblue"
           , landCol="wheat"
           #, mapRegion="uk"
           , addLegend=T
           , addColourLegend =T
           , symbolSize = .5
           , xlim = c(-11,3)
           , ylim = c(49,60.9)
           , asp = 1
           , lwd=T
           , legendVals = T
           )