Sitting outside in Ireland (near the place below) seems as good a time as any to think about how the year’s gone, collaborations forged, skills developed, etc. (this is also one of the last things I need to slot in to my 1st year report before I’m done with it!). While this is primarily something I’m writing for my own purposes – as is the rest of the blog – aspects of it may be of interest to others working in similar areas, or possibly just those thinking about reflecting on the year (and look, there’s a pretty picture).

Blog as staging ground

First off, the
blog has been a staging ground for a lot of my thinking this year – I’ve
often drafted thoughts here (where they are searchable and organised in
posts) and then copied much of the text into formal publications.
That’s been useful to me, perhaps partly because I can “publish” things
unfinished here, and it pushes me away from my horrible habit of having
dozens of documents open at any one time (although I still have a
problem with >400 tabs open!). When I set the blog up I was pondering
pulling a [Belshaw]1, and writing my thesis largely online from
the beginning (on wikis and wordpress). I was particularly interested
in using CommentPress or Digress.it for paragraph level feedback, and
ZotPress for citation management. In the end, although the blog has been a drafting space, it hasn’t been used in the way I’d originally conceived. That’s partly because of the way I like to write. It’s also because I was shouted at for having a hideous blog (thanks [Gol]2)…so I fixed it up, including removing the paragraph-level commenting (which in any case no one was using). ZotPress has remained,
and it’s actually got much better (I’d highly recommend it to academic
bloggers) but for longer pieces of writing the use of shortcodes is
still problematic (you need to flit between the published and raw
versions to see what citation the shortcode refers to – and sometimes
that’s just not workable). My main plan for the blog next year is to
move to self-hosting or at least my own domain name – long term it’s
pretty stupid to send people to an institutional webpage given I’m
likely to only be there for another 2 years. I’d welcome any other
suggestions on things to improve!

Conferences and Meetings

I’ve been lucky enough to attend a few conferences this year, with a couple of other things lined up for the future.

Wikimedia visits (including San Francisco and Lincoln AGM)

One of the things I’ve been exploring
this year is the potential of [mediawiki for learning analytics]3,
and whether [wikimedia contributions could be badged]4. That
would be particularly interesting for me given that such platforms might
offer [insight into epistemic cognition,]5 and that the platform
might be used as a [collaborative information seeking/sharing]6
one (see below). More broadly I’m also interested in the potential of
collective intelligence tools such as [Wikipedia for learning
environments and OER development]7 (and talked about this at the
WMUK AGM/Conference in Lincoln earlier in the year), and am particularly
keen to encourage people to edit Wikipedia – including [editing the
Learning Analytics article]8 and [Massive Open Online Course
article]9. Prior to this year I’d never edited Wikipedia (or rather, I had, but it was one minor correction some time ago); my work on ORBIT involved using the platform, which helped develop the skills, and it’s something I’ve become more interested in. Both the platform skills around organising knowledge and the specific skills of writing a Wikipedia-style article are pretty valuable, and something I’d recommend thinking about.

CSCW13 (both workshops) – San Antonio, Texas

In
February I attended [CSCW in San Antonio, Texas]10 with two
workshop papers. One of those was on the use of a toolset (including
Cohere) to support collaborative sensemaking in collaborative
information seeking environments; in an updated form this is included as
part of my first year report. In addition, that workshop has led to a collaboration in which two other attendees and I are co-authoring a paper on the nature of ‘context’ in CIS. The other workshop – on the relationship between CSCW and Education – was also a useful networking event, through which I have maintained contact with (and met up again with) at least one other attendee.

LAK13 – Leuven, Belgium

Almost immediately upon
starting the PhD we (my supervisors and I) set about writing a
submission for the [3rd Learning Analytics and Knowledge
conference]11, held in Leuven, Belgium. In the end, what we
submitted was nominated for ‘Best Paper’ award, and will be revised and
updated for submission in the first issue of the Journal of Learning
Analytics (as well as forming a significant chunk of the earlier parts
of my first year report). Much of the work on this paper also informed
my and Simon’s work on the [6th week of the Learning Analytics Open
Course]12 this year – which was on epistemology and LAK, and
included talks from me, George Siemens (on connectivism), and David
Williamson Shaffer (on epistemic games). In addition to that paper,
Karen and I wrote a paper for the Discourse Centric Learning Analytics
workshop on the importance of context for educational discourse, and
some challenges to DCLA of context. This paper informed my subsequent
analysis of DCLA techniques and a paper (in draft, included as a Work in
Progress in the report) on the multiple levels of context in the
analysis of exploratory dialogue, some challenges for machine learning
techniques, and a proposed method.

LASI – Palo Alto, California

In
early July, I was at Stanford (in fact, I’ve really only just returned)
at the Learning Analytics Summer Institute. Unusually, I didn’t write a
blog afterwards for two reasons: 1) I was exhausted, I went to the coast
and just chilled for a couple of days afterwards; 2) it was so busy, and
there were so many small useful conversations that it was hard to summarise or think of anything useful to say to other people (other than what I’d been tweeting, etc. during the event). In terms of “things to report”:
there are a number of hopeful collaborations (e.g. a group of us looking
at information seeking/knowledge management including Dragan, Bodong and
Emily); I got to talk to some google people and I’m hoping a few things
will come of that; I finally met Rebecca Eynon who holds a joint post
between the Oxford Internet Institute (OII) and the Education dept there
and works on young people’s internet use; I had a really useful chat
with Carolyn Rosé about machine learning stuff; Tony Hirst and I finally
met (despite both being at the OU, this hadn’t happened yet!) and had
some interesting chats about search based pedagogy [jfgi] and the
scope for interesting research using Google Trends/Insights.
Society of the Query Conference – Amsterdam

One of my continued
interests is in how we conceptualise knowledge, particularly in the
context of tools such as Google and Wikipedia. Paul Matthews (who was at CSCW, and with whom I hope to write something in the not too distant future) wrote a piece on this in the context of social epistemology, and I also have great hopes for the [Extended Knowledge Project]13
(which I hope to be able to contribute to). In this area, I’ve been
invited to contribute to the 2nd Society of the Query Conference in
Amsterdam this November on the subject of Education and the role of
context (see below), and submit a piece to their reader. The particular
panel is:

4. Search in context

There is a long-term cultural
shift in trust happening, away from the library, the book store, even
the school towards Google’s algorithms. What does that mean? How are
search engines used in today’s classrooms and do teachers have enough
critical understanding of what it means to hand over authority? We think
we find more and in a faster way, while we might actually find less or
useless information. The way we search is related to the way we see the
world – how do we learn to operate in this context?

Specific projects

Over the year I’ve also been working on specific
projects, particularly developing skills and practical work around the
literature I’ve been reading and writing about.

Developing a CIS environment for Epistemic Commitments

A core deliverable for my work is
the development of an environment on which to conduct my research. Over
the year I’ve been exploring a range of internal (Cohere and Evidence
Hub particularly) and external (mediawiki, and the existing CIS
environments in particular) tools which could be used for my research.
As a part of this I learnt how to use WAMPserver to mirror an existing
Wiki (the Schome project at the OU) and download and install extensions
to that Wiki. I have also explored the use of Google Analytics for
tracking user behaviours on websites (not appropriate due to constraints
on identifiable information), and the potential to use external feeds
(RSS) to seed another environment (Cohere). From these explorations and
my reading I designed a specification for a tool in collaboration with
my supervisors and Michelle Bachler (a developer in KMi) which will be a
Firefox addon for the EvidenceHub tool, developed by Michelle. In
addition I have had useful conversations with others about developing
tools to explore epistemic commitments (e.g. [Sean Lip at
Google]14).

EDAM

Another core piece of my work is around
educationally productive dialogue. In particular, given my interest in
epistemic commitments a core part of my project is to identify when
commitments are ‘accountable’ in the group (i.e. are within the scope of
exploratory/accountable dialogue). To some extent keyword spotting may
be enough here particularly within the constrained environment of the
EvidenceHub (and indeed, that is the finding of the Epistemic Games
group). However, given existing work in the department to further
develop from bag of words approaches, and my own paper with Karen
Littleton discussing the role of context in exploratory dialogue, it was
of interest to explore how machine learning techniques might be used for
such classification. Therefore, over the course of the year I have
learnt to deploy the existing Exploratory Discourse Detection Module
(see section on Maturing EDAM in the first year report) which is built
on the MALLET command line tool. My checking of outputs from this tool
has been conducted in Excel (the tool essentially produces a .csv
format), and I would not anticipate delving further into work with
MALLET. I have, however, further explored the use of GATE, and to some
extent WEKA (and I am pleased to have met one of its founders – Ian
Witten – at LASI). I have also had useful discussions with Elijah
Mayfield who developed the LightSIDE tool, and has been kind enough to
share the code he used to detect ‘authoritative talk’ using an ‘Integer
Linear Programming’ approach. I would hope over the course of the PhD
to be able to utilise GUI tools such as LightSIDE and GATE in
appropriate contexts, while also working with machine learning
specialists to develop custom tools. To that end I have had helpful
technical conversations (for which I am very grateful) with Carolyn Rosé
and Elijah Mayfield, Zhongyu Wei (who conducted much of the original
work on the EDDM tool), and Yulan He (who also worked on the EDDM tool,
and who we hope to continue to work with). The joint paper with Karen
Littleton on Maturing EDAM includes a technical proposal for continued
work which we hope provides a specification for the next generation
tool.

So.cl

In 2012 Microsoft released a dataset from the social
search tool ‘[so.cl]15‘ to researchers. So.cl is an experimental
social network in which when one searches, a post is created based
around that search, to which interesting results from the search may be
pinned. It is multimedia intensive, and visually quite attractive. The
original intention was that the tool be used particularly in
universities, although that appears to have died down a bit now – one interesting new development may be in the use of [so.cl at TEDActive]16, where “conference-goers can assemble images, research links, videos, and text into collages that express their reactions and associations around the TED Talks.” In addition to the literature review
which offers a justification for the interest in dialogue around CIS,
some public blog posts around interesting so.cl discussions (e.g. on
whether [Aliens built the pyramids]17) indicated it might
potentially hold some interesting exploratory dialogue. Thus, in order
to attempt to investigate the exploratory properties of dialogue around
CIS the dataset was requested, ethical clearance granted, and the
dataset opened in R. I used language-detection modules and considered the potential of SNA, but mostly used R for subsetting and column classifications (e.g. if a probability was >80, then ‘yes’) – a poor use of R’s power. A key lesson here is that while R has lots of great packages, loading a whole dataset into dataframes in R is not a good idea – instead, it is better to load the data into MySQL, do the database work within the database (like most subsetting, joins, etc.) and then run R commands where necessary on subsets from the larger database.
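As a minimal sketch of that MySQL-first workflow, using the DBI/RMySQL packages (the database name, table name and credentials here are hypothetical, not the actual so.cl setup):

```
# Sketch: push the bulk data into MySQL once, then pull only the
# subset you need into R (db name, table and credentials are made up)
library(DBI)
library(RMySQL)

con <- dbConnect(MySQL(), dbname = "socl", user = "me", password = "...")

# One-off: write the combined behaviour dataframe into the database
dbWriteTable(con, "behaviour", behavfiles, row.names = FALSE)

# Do the heavy lifting (subsetting, joins) in SQL, not in R dataframes
subbehav <- dbGetQuery(con,
  "SELECT * FROM behaviour WHERE ActionId IN (78, 49, 153)")

dbDisconnect(con)
```

The point being that only the filtered subset ever has to live in R’s memory.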
[expand title="Those commands can be expanded here"]

```
# To create summary tables you can load a table using e.g.
attach(behavfiles)
# Then build tables using table(); at its most basic just pass a single column for frequencies, see e.g. http://www.statmethods.net/stats/frequencies.html
table(description)
# E.g. to convert from factor to numeric
BLE$ExploratoryProbability1 <- as.numeric(as.character(BLE$ExploratoryProbability))
# Run an ifelse on the numeric column; you can insert an & (after the 40 on this) for additional conditions
BLE$Exploratory40 <- ifelse(BLE$ExploratoryProbability1 >= 40, "yes", "no")
# To join a table and a sparsely populated table with the same columns, one nice way is to subset out all the Table1 rows that exist in Table2, then rbind Table1(a) and Table2
# using something like: data2[data1$char1 %in% c("string1","string2"), 1]. OR do the reverse of the subset you did before using !
NotText = subset(behavfiles, !(ActionId %in% c('78','49','153')))
# 78 = message on party, 153 = message, 49 = a comment. You might want to do something with the likes data at some point too
# To filter the data down, subset
SubBehav = subset(behavfiles, ActionId=='78' | ActionId=='49' | ActionId=='153')
# You'll want to export the data; to do that: write.table(SubBehav, "c:/mydata.txt", sep="\t")
# First though, let's put it into the format EDAM will actually accept!
# Blank some columns for EDAM (so not actually blank)
SubBehavExporting$TargetUserId = "NA"
# Then you can reorder columns using subsetting
SubBehavExporting1 = subset(SubBehavExporting, select = c(
# Detect language
install.packages("cldr")
library(cldr)
BehavLang = detectLanguage(SubBehavExporting$Context, isPlainText=FALSE, includeExtendedLanguages=FALSE, pickSummaryLanguage=FALSE, removeWeakMatches=FALSE, hintTopLevelDomain=NULL, hintLanguageCode=Languages$UNKNOWN_LANGUAGE, hintEncoding=Encodings$UNKNOWN)
BL = cbind(SubBehavExporting, BehavLang)
BL = subset(BL, detectedLanguage=='ENGLISH')
write.table(BL, "BL.txt", sep="\t")
# Load behav files as:
behav1 = read.delim2("BehaviorData_0001.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "UTF-8")
behav2 = read.delim2("BehaviorData_0002.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "UTF-8")
behav3 = read.delim2("BehaviorData_0003.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "UTF-8")
behav4 = read.delim2("BehaviorData_0004.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "UTF-8")
behav5 = read.delim2("BehaviorData_0005.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "UTF-8")
# Combine behav files using
behavfiles = rbind(behav1, behav2, behav3, behav4, behav5)
# Load user files
users = read.delim2("Users1.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "utf-8")
usersDel = read.delim2("DeletedUsers.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "utf-8")
# Load Action logs
Posts1 = read.delim2("OnelinePosts-2012.11.01-2012.11.20.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "utf-8")
Posts2 = read.delim2("OnelinePosts-2012.10.01-2012.11.01.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "utf-8")
Posts3 = read.delim2("OnelinePosts-2012.09.01-2012.10.01.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "utf-8")
Posts4 = read.delim2("OnelinePosts-2012.08.01-2012.09.01.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "utf-8")
Posts5 = read.delim2("OnelinePosts-2012.05.01-2012.06.01.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "utf-8")
Posts6 = read.delim2("OnelinePosts-2012.06.01-2012.07.01.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "utf-8")
Posts7 = read.delim2("OnelinePosts-2012.07.01-2012.08.01.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "utf-8")
Posts8 = read.delim2("OnelinePosts-2012.01.01-2012.02.01.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "utf-8")
Posts9 = read.delim2("OnelinePosts-2012.02.01-2012.03.01.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "utf-8")
Posts10 = read.delim2("OnelinePosts-2012.03.01-2012.04.01.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "utf-8")
Posts11 = read.delim2("OnelinePosts-2012.04.01-2012.05.01.txt", header = TRUE, sep = "\t", quote = "", comment.char = "", encoding = "utf-8")
# Combine post files using
postfiles = rbind(Posts1, Posts2, Posts3, Posts4, Posts5, Posts6, Posts7, Posts8, Posts9, Posts10, Posts11)
```

[/expand]

CIS on Wikipedia (R)

The lesson regarding R and
MySQL was further reinforced by another case. Wikipedia Talk pages are a place in which editors can make sense of, and share, information – I hypothesised that we could see these two distinct types of behaviour in link patterns: first, moving from articles to talk pages (sensemaking); second, from talk pages to articles ([CIS in Wikipedia Talk pages]6). So, with some generous help from Aaron Halfaker (who
scraped the edit histories of Wikipedia to send me, for each edit on
each page, every link added or removed), I set about trying to use [R
to process Wikipedia LinkFlow data]18…again, this was a mistake –
although I did learn some useful R along the way (as detailed in that
blog post). While at Stanford, hosted by the Lytics Lab, I had a chance to talk to René Kizilcec, and the weekend I left we (mostly he) played with trying to get the dataset into a readable form in
order to get it into MySQL, and then reshape it (or, just count) as per
the discussion in that blog post in which I discuss looking for strings
of ATDR (Inserted on Article; Inserted on Talk; Deleted from Article;
Removed from Talk [essentially the same thing but we need to
distinguish the two]). By looking at that, we could then count the
number of times any particular link has moved from A to T or vice versa
(as well as the other doubles) and we could even insert in S/N – same
user, not same user – on each double to explore that aspect too. If we
wanted to do a bit more, we could index each link such that if the same
link appears on multiple pages it has the same ID – that would allow us
to start exploring SNA potential too. Having returned to the UK and been very busy (as has René), this is currently not quite on hold, but certainly not fully active (I’m hoping to get a ‘Note’ style paper written by mid-September). The version of the data I managed to get into R wasn’t complete (the process must have failed at some point), so the next step is to get my version of the data into the same format we got it into at Stanford, get it into MySQL and go from there (which will include me learning to operate on the dataset within a database, and then, if appropriate, moving partially into R).

ENA

One of the hopes with the
so.cl dataset (above) was that it might provide some interesting coded
data on which Epistemic Network Analysis could be conducted. My visit
to University of Wisconsin-Madison was to learn how to use this method,
and deploy it on some data – intended initially to be the so.cl dataset,
but in the end I recoded my MPhil data. ENA is based on the theory of
Epistemic Frames which posits that the important component of
‘knowledge’ is not facts and skills in isolation, but understanding how
those are connected. For example, in the case of information seeking,
seeing that users seek ‘authority’ is, in isolation, not terribly
informative (because standards of ‘authority’ for knowledge may be
inappropriate or appropriate depending on other contextual factors).
However, understanding that a searcher’s ‘authority seeking’ talk is
connected to other talk related to community practices (perhaps around
who we assume authorities to be, such as scientists – a ‘value’ of that
community), or seeing that searchers engage in what Shaffer would call
‘epistemic’ talk and what I have called accountable or exploratory talk
(to justify their selection of authorities) is interesting. So, we see
in this example a case where simply exploring one component in isolation
provides relatively little information, while looking for combinations
offers more insight. For example, we could explore not only a reliance
on authority/corroboration in sourcing, but also instances in which they
are more/less likely to be connected to particular types of
justification (attempts to understand the material v. simple matching
information to plug answers in). This work is ongoing and has required
me to learn to use the ENA tool. Again I have a paper in draft which I
hope to finish by mid-November.

Multiple Document Processing & Google

I hope to learn some Python, and to be able to play with credibility judgement work.
Footnotes

3. http://sjgknight.com/finding-knowledge/2013/01/mediawiki-for-learning-analytics/ "MediaWiki for Learning Analytics?"
4. http://sjgknight.com/finding-knowledge/2013/07/badging-wikipedia-contributions/ "Badging Wikipedia Contributions"
5. http://sjgknight.com/finding-knowledge/2013/01/wikipedia-feedback-ratings-as-an-epistemic-tool/ "Wikipedia Feedback Ratings as an Epistemic Tool"
6. http://sjgknight.com/finding-knowledge/2013/05/collaborative-information-seeking-on-wikipedia-talk-pages/ "Collaborative Information Seeking on Wikipedia Talk Pages"
7. http://sjgknight.com/finding-knowledge/2013/06/wmuk-conference-mediawiki-for-oer-and-learning-analytics/ "WMUK Conference – Mediawiki for OER and Learning Analytics"
8. http://sjgknight.com/finding-knowledge/2013/04/wikipedia-learning-analytics-editathon/ "Wikipedia Learning Analytics editathon"
9. http://sjgknight.com/finding-knowledge/2013/05/an-invititation-to-the-massive-online-open-course-mooc-wikipedia-page/ "An Invititation to the Massive Open Online Course (MOOC) Wikipedia Page"
10. http://sjgknight.com/finding-knowledge/2013/03/cscw2013-2-workshop-papers-texan-fun/ "CSCW2013 – 2 workshop papers & Texan fun"
11. http://sjgknight.com/finding-knowledge/2013/04/lak13/ "#lak13 Conference & Workshops"
12. http://sjgknight.com/finding-knowledge/2013/03/lak13-mooc-week-on-epistemology-assessment-and-pedagogy/ "#lak13 mooc week on epistemology, assessment and pedagogy"
13. http://sjgknight.com/finding-knowledge/2013/03/the-extended-knowledge-project/ "The Extended Knowledge Project"
14. http://sjgknight.com/finding-knowledge/2013/07/google-coursebuilder-and-search-education/ "Google Coursebuilder and Search Education"
16. http://www.bing.com/blogs/site_blogs/b/search/archive/2013/02/26/bing-microsoft-research-bring-the-power-of-social-search-and-whimsy-to-tedactive-with-ted-so-cl.aspx
17. http://blog.fuselabs.org/post/29422205211/some-argue-that-aliens-built-the-pyramids-a-heated
18. http://sjgknight.com/finding-knowledge/2013/06/using-r-to-process-wikipedia-link-flow-data/ "Using R to process Wikipedia link flow data"