A couple of years ago I wrote a report for Nominet Trust on measuring value in social media projects. That report started with a discussion of the various things we might mean by “value” – monetary, social, personal, progress towards charitable aims, and so on. This is a complex question – an image on a website is just another set of pixels to me, but to someone else it might be a previously untold story, an example of some outcome, the first image they’ve ever uploaded, or a focus for discussion and networking. Value can propagate through systems, and it’s important to think about what outcomes we’re looking for. I then went on to discuss the various ways people were using tools (including the very tools they were deploying their programmes through) to both create and track ‘value’, using qualitative and quantitative means.

I’m really glad I had the opportunity to take on that internship, as it has helped me with a few things since, including thinking about evaluation on Wikimedia projects. Wikimedia is in many ways an unusual organisation because of its stakeholder structure and the ways its benefits are seen. It is not just a knowledge dissemination charity (although that is perhaps the closest parallel): its running depends on community, and we care about that community – its health, its diversity, and how it continues to develop. We also care about who has access to the materials, and about wide reach.

So, here are some rough collated thoughts tying some bits together. You should also check out the work of the various people who’ve been working on Program evaluation in Wikimedia projects, and Metrics in Wikimedia research. On which…

# Metrics

## Metrics – A Partial Story

Metrics are part of the story. But of course, we know they only tell part of the story. “1 new editor” doesn’t tell us the story of how that editor has taken on new roles in their community and developed important new pages on previously under-represented topics (and wow, telling that story is hard). If we’re not careful, it also belittles the tireless contributions of people who make small changes across articles, keep watch for vandalism, and encourage new editors. That’s not to say we can’t capture metrics on those things, but we need to be aware of what we’re capturing and the stories it allows us to tell. And we need to be able to build a narrative that includes, but is not limited to, metrics.

If we [take the example of editathons]1, we might be interested in tracking content production as an output. (As an aside, the reporting on that example calls content production an outcome; this is fairly nuanced, but I’d say it is an output – the outcome is “valuable contribution”.) We might also be interested in editors retained: here, the number of participants is an output (who attended), while the retention rate is a proxy performance indicator for an outcome like “community building”. These things matter, but we need to be sure what we’re measuring.

## Metrics – A Complex Story

Of course, metrics are also just hard to capture, even for relatively simple things. For example, if we want to explore [how resources are used in the wild]2 (outside of Wikimedia projects – within them, usage can be tracked rather well), that’s a real challenge. But such data (paradata) is rather important to [understanding how open educational resources (OER) are used]3 and can help us tell that story.
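To make the output/outcome distinction above concrete, here is a minimal sketch of how an editathon’s attendance (an output) and a retention rate (a proxy indicator) might be computed. All the data and the 90-day window are invented for illustration; this is not any real Wikimedia tooling.

```python
from datetime import date

# Hypothetical editathon records: attendee -> date of their most recent edit.
participants = {
    "Editor_A": date(2014, 3, 1),
    "Editor_B": date(2014, 6, 20),
    "Editor_C": date(2014, 6, 25),
}

def retention_rate(last_edits, event_date, window_days=90):
    """Share of attendees still editing `window_days` or more after the event.

    This is a proxy indicator for a 'community building' outcome,
    not a measure of the outcome itself.
    """
    still_active = sum(
        1 for last in last_edits.values()
        if (last - event_date).days >= window_days
    )
    return still_active / len(last_edits)

event = date(2014, 3, 1)
print(f"Attendees (output): {len(participants)}")
print(f"90-day retention (proxy indicator): {retention_rate(participants, event):.0%}")
```

The point of keeping the two numbers separate is exactly the one above: attendance tells you who came; the retention rate hints at, but does not prove, community building.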
It’s also worth noting that aggregation can hide stories by making opaque some meaningful layers or subsets of the data. We (Wikimedia) are well placed for the kind of open aggregation – in which you can click through to the underlying records – that Ed Anderton talks about [here]4. That’s something we should think about when developing metric systems and tracking tools, particularly given the technical capabilities we have through the tools around Wikidata.

# Narratives

One way we can add to the story the metrics let us tell is by tracking narratives too, and there are good reasons [we should link evaluation to storytelling]5. There are ways we can capture these meaningfully as micro-stories – little evaluations from people at the end of sessions – which can then be aggregated (although, see above!):

> This simple exchange (that resulted in perhaps a sentence or two) multiplied over the 1000s of exchanges that took place over the project has generated a massive set of qualitative and quantitative data with which to understand the impact of the work. The trainer ID is attached to their outreach location, so that every exchange they note down is attached to a geographic location and outreach centre. This means you can analyse and cross-reference the nature of the exchanges that took place: in different geographic areas, among different work statuses, and for people of different ages. You can, in 1 click, go from an overview of a geographic region to reading the specific exchanges that made up that headline data.

– See more at: http://www.nominettrust.org.uk/knowledge-centre/blogs/evaluation-ground
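That “one click from headline to raw exchanges” idea is essentially aggregation that keeps the underlying records attached rather than throwing them away. A hedged sketch with invented records (the field names are illustrative, not from any real schema):

```python
from collections import defaultdict

# Invented micro-story records, roughly as the quoted project might capture them.
exchanges = [
    {"region": "North East", "age": 34, "story": "First time online; set up email."},
    {"region": "North East", "age": 61, "story": "Found local history photos."},
    {"region": "Wales", "age": 47, "story": "Uploaded a photo of the village hall."},
]

def aggregate(records, key):
    """Headline counts per key, with the raw records kept for drill-down."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return {k: {"count": len(v), "records": v} for k, v in groups.items()}

by_region = aggregate(exchanges, "region")
print(by_region["North East"]["count"])                 # the headline number
print(by_region["North East"]["records"][0]["story"])   # one click to a raw story
```

Because `records` travels with `count`, any headline figure can be unpacked back into the stories that produced it – the opposite of aggregation that hides its subsets.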

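On the wiki side, category membership is queryable through the standard MediaWiki API (`action=query`, `list=categorymembers`), so collecting tracked stories needn’t mean scraping pages. A sketch using only the standard library – the category name is made up for illustration:

```python
import json
import urllib.parse
import urllib.request

API = "https://meta.wikimedia.org/w/api.php"

def categorymembers_url(category, limit=50):
    """Build a MediaWiki API query URL for the members of a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)

def fetch_members(category):
    """Return the page titles in a category via the live API."""
    with urllib.request.urlopen(categorymembers_url(category)) as resp:
        data = json.load(resp)
    return [m["title"] for m in data["query"]["categorymembers"]]

# e.g. fetch_members("Editathon impact stories")  # hypothetical category name
```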
Of course, we in Wikimedia could use templates and categories to keep track of stories, and aggregate them (in raw form) through transclusion. As I’ll mention below, I think this sort of use of the platform is also a way of adding value.

# Getting Value from our Measures

If you’ve yet to be convinced that we should be evaluating, I suggest you take a look at these [10 reasons why you should evaluate]6. More broadly, I’d also suggest thinking about how evaluation can be a force for good:

> Promoting evidence-based measurement might bring a number of potential benefits:
>
>   • Empowering and equipping community organisations to prove the value of what they do and have created
>   • Building the capacity of community organisations to demonstrate their impact
>   • Enabling groups to share their findings and promote their work
>   • Expanding, rather than restricting, the range of programmes that can be considered for funding
>
> One of my concerns about the evidence-based agenda as it is currently manifesting (not necessarily the way it has to go, or is intended) is that it is potentially very disempowering: the commissioners/funders tell the providers what they think works, how it should be measured, and what they are therefore prepared to fund. Not exactly a world of equal partnerships and co-production. Nor a process that accurately reflects the importance of context, complexity and localised needs. As a colleague of mine rightly stresses, “evaluate” comes from the Old French evaluer, i.e. to find out the value of something. Using tried and tested measurement tools has to be about helping organisations find the value in the work they do, and helping them to learn and improve the service they provide.

– See more at: http://www.nominettrust.org.uk/knowledge-centre/blogs/measurement-force-good

What does this mean for Wikimedia? I’d suggest that we should:

  • Think about how the tools we use for evaluation can be used by others, both within the Wikimedia movement and beyond – lots of organisations need platforms both to deliver their content and to track its value, and we can help them. We’re Wikimedia, for goodness’ sake: what is the point in using MediaWiki if we can’t make it do cool things?
  • Make sure that the community ‘buys in’ to what we’re doing, and that the tools are not just centralised – we’re Wikimedia, for goodness’ sake: what is the point in us if we can’t be open and engaged with a community?
  • Consider how we can create reports, visualisations, and analytics that enable organisations to share and gather meaning from their impact narratives. (We’re Wikimedia, for goodness’ sake… what’s the point in all that semantic stuff if we’re not going to eat our own dog food?)

# Reflecting what we know

Another aspect of that last point might be to take some sort of ‘knowledge mapping’ approach, telling our data and stories effectively through empowering impact narratives… I discuss this [here]7, and may return to it another time for Wikimedia.

# High Risk Projects Metrics?

Of course, as we develop these tools, in addition to thinking about what exactly we’re tracking (and excluding), we also need to be aware of how our measurement relates to external factors (or, more cynically, creates perverse incentives). For example, if we use metrics to measure editor retention, one thing we might do is maximise the efficacy of our events (a good thing). However, another is that we might start to run events (perhaps for perfectly innocent reasons) that successfully recruit people who would have joined anyway, or who are in any case well represented – that is, we take low-risk participants. This is a classic issue in education and health: mortality rates are a poor indicator of ‘success’ without some indication of how high-risk the patients were in the first place. Now, we mitigate that in part by looking to increase diversity, but nonetheless – intersectionality, yo.

# Building Measurement into What we do

Finally, when I’m thinking about programme evaluation, what I’m really keen to see is that the metrics and narratives we collect are not collected – or collated – just for reports. The collection and collation of impact narratives should be part of our day-to-day activities; they should be embedded in our programmes and how they run. That means we need a clear strategy tied to our activities, with activities that tie back to that strategy and can be reported on in relation to it. Any planning needs to be built around the outcomes we’re looking for, and the sorts of outputs (metrics and narratives) that are good indicators of those outcomes.
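The mortality-rate analogy above suggests one way an indicator could be corrected: compare observed retention against what the mix of participants would predict anyway. A toy sketch of that adjustment – every number here is invented, and real baselines would need actual cohort data:

```python
# Invented baseline retention probabilities by participant profile:
# already-active people tend to stay regardless of the event.
BASELINE = {
    "already_active": 0.80,
    "new_low_barrier": 0.40,
    "new_underrepresented": 0.20,
}

def adjusted_lift(attendees, retained):
    """Observed retention minus the retention the participant mix predicts.

    attendees: list of profile labels; retained: set of retained attendee indices.
    A high raw retention rate from low-risk recruits can still yield zero lift.
    """
    expected = sum(BASELINE[p] for p in attendees) / len(attendees)
    observed = len(retained) / len(attendees)
    return observed - expected

# An event that mostly recruited people likely to stay anyway:
easy = ["already_active"] * 8 + ["new_low_barrier"] * 2
print(round(adjusted_lift(easy, set(range(7))), 2))  # observed 0.70 vs expected 0.72
```

The headline figure (70% retained!) looks great; the adjusted lift shows the event added roughly nothing over the baseline – which is the perverse-incentive worry in miniature.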
[Wiki exampled Logic Model]8

# Eating our own dog food

What particularly excites me about all this is that we’re creating further value through the use of the platform – MediaWiki – by embedding our evaluation in our activities (which are MediaWiki-based), and by creating value for the platform such that other organisations could ‘buy in’ to it. That’s not true of all chapters (some use WordPress or similar on the front end), and I understand why they do that – they can certainly demonstrate value, including through WordPress – but they don’t have this element…


  1. https://meta.wikimedia.org/wiki/Programs:Evaluation_portal/Library/Edit-a-thons
  2. http://www.nominettrust.org.uk/knowledge-centre/blogs/measuring-impact-tracking-open-content-wild
  3. http://www.nominettrust.org.uk/knowledge-centre/blogs/wikipedia-education-and-tracking-how-knowledge-used
  4. http://www.nominettrust.org.uk/knowledge-centre/blogs/where-it-counts
  5. http://evaluationstories.wordpress.com/2013/12/10/once-upon-a-time-why-link-evaluation-with-storytelling/
  6. http://www.nominettrust.org.uk/knowledge-centre/blogs/evaluation-10-reasons-why-you-should
  7. http://www.nominettrust.org.uk/knowledge-centre/blogs/knowledge-mapping-third-sector
  8. https://commons.wikimedia.org/wiki/File%3AWiki_exampled_Logic_Model.png – “Wiki exampled Logic Model” by JAnstee (WMF) (own work), CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0), via Wikimedia Commons