In a spark of creativity that kept me busy for a while over Christmas, I had an idea about using the feedback ratings at the bottom of most Wikipedia pages as a tool to analyse the epistemic judgements made about those pages, and their relationship to the qualities of the pages themselves. One cool possibility here is that by analysing the properties of highly rated pages, we could predict how new pages might be rated, and how they might be improved (without direct human input). Furthermore, there may be scope for the lessons learned to be applied to another corpus, so from Wikipedia data we could learn properties of texts more generally and assess them automatically elsewhere. See

So what does the feedback tool ask? Well, first it asks for a rating (on a 5 star scale) for how:

* Trustworthy
* Objective
* Complete
* Well-Written

It also asks the respondent to tick a box if they are “highly knowledgeable about this topic”.

When I describe my interest in epistemic cognition (or ‘epistemic behaviour’, which is my preferred expression at the moment) I normally simplify and suggest that there are four dimensions: beliefs about the source, structure, complexity, and justification for knowledge. These relate to how people search for information, and how they deal with ‘multiple document processing’. For example, someone with a more fixed view of knowledge as coming from authority may search for different things, decide their search is done earlier, and trust different documents than someone who thinks that knowledge should be justified, is discursive, changes over time and is complex.

Trustworthiness seems to relate to the first and last of these dimensions (source and justification), while objectiveness might relate to the first. Completeness is probably associated with complexity and structure. Being a subject expert is, one would think, likely to tell us something interesting about where these feedback assessments are coming from.
Wikimedia have done some research on this data, particularly looking at how to get users engaged in editing, and at some basic metrics to predict article quality (length is a pretty good one). I’m interested in whether other factors could be explored in this context, including:

* Factors relating to the ‘talk’ pages (e.g. unresolved issues, high quality discourse, lack of talk)
* Number of sources cited (presumably standardised for page length)
* Number of edits and editors (presumably standardised for article age and probably popularity?)
* Internal links (links to other Wikipedia articles)

I can certainly see issues (both practical and theoretical) with this sort of analysis, but that said I think it’s also a great opportunity to explore a pretty large dataset across a diverse corpus. If anyone knows of anyone doing this sort of work, I’d be very interested to know: comment or tweet me.

[EDIT 17:03 06/01/2013] But what about divergent opinion? Can Wikipedia (and its rating tool) represent the complexity of epistemic justification? Are such tools oppressive in nature? Should we worry about the epistemic qualities of the users and authors as much as the articles (e.g. dominance of authorship by gender, language, race, etc.)? What are the political-epistemic factors involved in the testimonial nature of socially authored texts, in contexts as diverse as science (e.g. climate change), crisis management (use of tools to provide warning systems to users), health data, etc.?
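Returning to the candidate factors listed earlier, here is a minimal sketch of what "standardising" them might look like before comparing them with the feedback ratings. Everything in it is an assumption for illustration: the `ArticleStats` record, its field names, and the choice of per-1,000-words and per-year scaling are hypothetical, and the three articles at the bottom are invented toy values, not real Wikipedia data. It uses a plain Pearson correlation only as a first look, not as a claim about the right model.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class ArticleStats:
    """Hypothetical per-article record; field names are illustrative only."""
    word_count: int
    sources_cited: int
    edits: int
    age_days: int
    internal_links: int
    mean_rating: float  # mean of the 1-5 star feedback scores


def normalised_features(a: ArticleStats) -> dict[str, float]:
    """Standardise raw counts by page length and by article age."""
    per_1k_words = 1000 / a.word_count
    return {
        "sources_per_1k_words": a.sources_cited * per_1k_words,
        "links_per_1k_words": a.internal_links * per_1k_words,
        "edits_per_year": a.edits * 365 / a.age_days,
    }


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Invented toy values, purely to exercise the functions above.
    articles = [
        ArticleStats(2000, 40, 730, 730, 100, 4.5),
        ArticleStats(500, 2, 10, 365, 5, 2.0),
        ArticleStats(1000, 10, 100, 365, 30, 3.5),
    ]
    ratings = [a.mean_rating for a in articles]
    features = [normalised_features(a) for a in articles]
    for name in features[0]:
        values = [f[name] for f in features]
        print(name, round(pearson(values, ratings), 3))
```

With real data, talk-page factors and the number of distinct editors could be added as further keys in the same dictionary, and popularity (e.g. page views) could serve as another denominator.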