
Collaborative Information Seeking on Wikipedia Talk Pages


I’m interested in whether links (internal and external) are added to talk pages and, if so, whether they then ‘flow’ into the article page. I know I park links on talk pages: sometimes because I’m not qualified to assess them, sometimes because I’m busy, sometimes because, although the link is a good source of information (e.g. a repository), I don’t have a direct link to the relevant information, etc. This is a form of collaborative information seeking. A study of it would be relatively easy: track link insertions on the talk page, and then see whether each link is moved into the article space. In fact, even the suggestion of new subheadings, etc. on the talk page is a way of thinking about the sorts of epistemically salient distinctions we should be making between types of information on the article page, some of which may be new and some of which could involve restructuring information that is already present.

Subsequently, I have also thought about other ways to support such collaborative information seeking on the talk page. Would a chat function beyond the current wiki markup be useful? How could we facilitate the meaningful sharing of results? Should talk pages have a ‘useful resources’ section, or ‘queries you might like to try’? Some of this could be automatic: for example, at a very basic level, since we know when pages “link in” to an article, a suggestion might be made to “link out” too.

So, here are some preliminary notes on some related wiki-research:

Information seeking in wiki

Some research has been done in this area.

Collaborative talk in wiki

Of course Wikipedia talk page collaboration is also hugely centred on…well, ‘talk’.  Given my interest in dialogue as a mediating context for collaborative information seeking, that’s pretty interesting.  And indeed, there has been some nice research in this area:

  • Kaltenbrunner and Laniado look at the time evolution of Wikipedia discussions, and how it correlates with editing activity, based on 9.4 million comments from the March 12, 2010 dump.[9] Peaks in commenting and peaks in editing often co-occur within two days (for sufficiently large peaks of 20 comments, 63% of the time). They show the articles with the longest comment peaks and most edit peaks, and the 20 slowest and 20 fastest discussions. They find that “the fastest growing discussions are more likely to have long lasting edit peaks”. Includes a suggestion for network analysis: http://meta.wikimedia.org/wiki/Research:Newsletter/2012/April#Time_evolution_of_Wikipedia_discussions
  • “Dynamics of Conflicts in Wikipedia”[1] develops an interesting “measure of controversiality”, something that might be of interest to editors at large if it were a more widely popularized and dynamically updated statistic. The authors look at the patterns of edit warring on Wikipedia articles, finding that edit warriors are usually prone to reaching consensus, and that the rare cases of never-ending warring involve articles that continuously attract new editors who have not yet joined the consensus: http://meta.wikimedia.org/wiki/Research:Newsletter/2012/June#Dynamics_of_edit_wars
  • Some people have wondered whether Issue Based Information Systems (IBIS) might also be interesting in Wikipedia http://meta.wikimedia.org/wiki/Research:The_role_of_IBIS_in_Wikimedia (including bibliography)

A lot of this is based on http://meta.wikimedia.org/wiki/Research:Data, which I really ought to play with at some point (when I’m feeling a bit more competent perhaps).

There’s also an (incomplete) list http://en.wikipedia.org/wiki/Wikipedia:Academic_studies_of_Wikipedia of studies on Wikipedia, including:

In addition to that one, there’s a usefully categorised compilation at http://wikipapers.referata.com/wiki/List_of_research_areas. I’m also aware there’s a fair bit of education research on Wikipedia (e.g. “From Wikipedia to the classroom: exploring online publication and learning”), but that’s another area for another time!



This Post Has 10 Comments

  1. Simon Knight says:

    I had an idea earlier re: a quick and easy way to ‘test’ some of this, which I think would be relatively easy from Wikipedia data (although I can’t think how I’d extract it; a lot to learn 🙂).

    I imagine that if we could get a flat, ordered list of links on talk and article pages (literally 4 columns: Order/Timestamp, Link, [Talk/Article], ArticleName), it could be queried easily. For each link posted on a ‘Talk’ page for any ArticleName:
    1) If that link already existed in the ArticleName article space, then “TalkAfter”
    2) If that link appeared in the ArticleName article space afterwards, then “TalkBefore” (flow)
    3) If it never appears in edits to the article space, then “no flow”
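A rough sketch of how that query might look in Python, assuming the flat list is already available as tuples of (timestamp, link, space, article). The function name `classify_talk_links`, the tuple layout and the sample rows are all made up for illustration, and treating a tie in timestamps as “TalkBefore” is an arbitrary choice:

```python
def classify_talk_links(rows):
    """Label each link posted on a talk page by how it relates to the article.

    rows: iterable of (timestamp, link, space, article) tuples, where space
    is "Talk" or "Article" (a hypothetical layout, not a real dump format).
    """
    # Record the earliest time each link was seen in each space of each article.
    first_seen = {}
    for timestamp, link, space, article in rows:
        key = (article, link, space)
        if key not in first_seen or timestamp < first_seen[key]:
            first_seen[key] = timestamp

    labels = {}
    for (article, link, space), talk_time in first_seen.items():
        if space != "Talk":
            continue
        article_time = first_seen.get((article, link, "Article"))
        if article_time is None:
            labels[(article, link)] = "no flow"      # never reached the article
        elif article_time < talk_time:
            labels[(article, link)] = "TalkAfter"    # article first, then talk
        else:
            labels[(article, link)] = "TalkBefore"   # talk first (ties land here)
    return labels


# Invented sample data, purely to show the shape of the input.
sample_rows = [
    (1, "http://example.org/a", "Talk", "SomeArticle"),
    (5, "http://example.org/a", "Article", "SomeArticle"),
    (3, "http://example.org/b", "Talk", "SomeArticle"),
    (2, "http://example.org/b", "Article", "SomeArticle"),
    (4, "http://example.org/c", "Talk", "SomeArticle"),
]
sample_labels = classify_talk_links(sample_rows)
```

Only the first appearance of a link in each space is used, so a link that is removed and later re-added would need more careful handling.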

    To go beyond that you’d want to do something more interesting though, like seeing whether more or less discussed links were more or less likely to appear. Seeing who is involved. Seeing if there are any qualities of the discussion, etc.

    As an add on to the above it would be relatively trivial (Ha! I think) to add a 5th column, ‘Editor’, and check whether for each ‘link’ it was only ever added by one editor, or by multiples (e.g. added to a talk page by one and to the article space by another).
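The editor check might be sketched like this in Python, assuming rows of (timestamp, link, space, article, editor) with space being either "Talk" or "Article". The function name `summarise_link_editors`, the row layout and the "handoff" flag are hypothetical, just one way of capturing “added to talk by one editor and to the article by another”:

```python
from collections import defaultdict


def summarise_link_editors(rows):
    """For each (article, link), report who added it and where.

    rows: iterable of (timestamp, link, space, article, editor) tuples, with
    space equal to "Talk" or "Article" (an assumed layout for illustration).
    """
    # Collect the set of editors who added each link, split by space.
    editors = defaultdict(lambda: {"Talk": set(), "Article": set()})
    for _timestamp, link, space, article, editor in rows:
        editors[(article, link)][space].add(editor)

    summary = {}
    for key, by_space in editors.items():
        everyone = by_space["Talk"] | by_space["Article"]
        summary[key] = {
            # More than one distinct editor ever added this link.
            "multiple_editors": len(everyone) > 1,
            # Someone posted it on talk, and someone else put it in the article.
            "handoff": bool(by_space["Talk"])
            and bool(by_space["Article"] - by_space["Talk"]),
        }
    return summary
```

This ignores timestamps entirely, so it says nothing about which editor came first; that would need combining with the ordering information.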

    Again, it’d be nice to have more detail (e.g. did the editor get involved in a discussion or not, did they move a link to discuss it critically or positively, etc.).

    • Simon Knight says:

      Talk after could probably be characterised as ‘sensemaking’ on article links (assuming ‘critiquing’ counts).

      Talk before might be more appropriately thought of as sourcing, or collaborative information seeking – sourcing and collaborative sensemaking on that.

      That leaves links that appear on a talk page (but never on the article) v. those that appear in the article (but never on the talk page, presumably a more common scenario). Presumably the latter are just “solid sources”. The former might be interesting to look at in more detail: presumably some of them are ‘rejected sources’ (so would fit into a ‘CIS’ notion) and some are just ‘ignored’ (ditto); we could probably assess which via edit activity. Some might be repositories (top-level domains might be a good indicator of that)… might be interesting anyway!

  2. Simon Knight says:

    Dumping some links.
    Datadump description: http://meta.wikimedia.org/wiki/Data_dumps/What%27s_available_for_download

    pagelinks.sql.gz provides intra-wiki links (see http://haselgrove.id.au/wikipedia.htm for a nice description to play with).

    “Pull links out of a Wikipedia XML dump, real fast.” https://gist.github.com/mnot/843195

    “Ways to process and use Wikipedia dumps” describes some stuff, inc. processing externallinks.sql.gz (439.6 MB; links from pages to external sites) and pagelinks.sql.gz (1.2 GB; links between wiki pages) with Python: http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/
