Teachers out there, what advice do u give students on discriminating between stories on web? asks @dmrussell http://t.co/BZrT2pmMl8 #edtech

— Simon Knight (@sjgknight) November 16, 2013

In that case, Dan was talking about a specific case which serves as a nice example: how did Earl Grey tea come to be? Let's imagine there are two competing stories (near enough true) with some overlap, and one is broadly more dominant. If I ask my students to find out about the history of the tea, how do I judge them? Now, as an information scientist (for the purpose of this example) I'm an Information Retrieval expert, not a teastorian, so I'm not going to make claims about the factual account; instead I'm looking to investigate the students' search and information literacy skills. How would I do that? Well, we recently wrote a paper on that [1], in which I propose a model with three facets:

1. Claim sourcing: whether we use authorities (implicitly, by just selecting a single source, or explicitly, through actual discussion of source qualities) or corroboration across sources (i.e. use of multiple documents to source a single claim)
2. Sourcing use: whether we use sources in a very question-directed way (not necessarily a bad thing, but limited in scope if it doesn't contain…) or in a deeper, more exploratory way (including markers of exploratory dialogue such as "because")
3. Claim connecting: whether we make assertions but fail to connect them, or we build on assertions and make connections between them

We can imagine setting a task in which we give students a set of documents on the history of Earl Grey tea, and ask them to select the best-supported claims from those documents. Because we (as researchers) have prior knowledge of the documents, we can make some claims regarding the ways in which students process them. For example, we might provide some patently biased sources alongside some stronger cases (meta-analysis or whatever), personal narratives, historical records and so on (this partly depends on the discipline, of course). From these sources we can, for example:

1. Ascertain (and, to some extent, modify) where information is corroborated across sources (to some extent a high tf-idf weight can model this…) to see which sources are used (of course, logging pageviews and looking at citations also does this); a sketch of one way to approximate this follows the list
2. Explore authority use by looking at how metadata facets (time, authorship, publisher/venue, any other relevant elements) are used, and by looking at how document ordering plays a part (e.g. how Google indicates document authoritativeness in its SERP rankings)
3. Topic model the documents to look at how many different 'components' there might be in a story
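As a rough illustration of point 1, here is a minimal Python sketch of how "corroboration" of a claim across a fixed document set might be approximated with tf-idf vectors and cosine similarity. The folder name, the claim sentence and the similarity threshold are illustrative assumptions, not part of the paper's method.

```python
# Minimal sketch: approximate how widely a claim is "corroborated" across a
# fixed set of source documents using tf-idf vectors and cosine similarity.
# Assumes scikit-learn is installed; the folder, claim text and 0.2 threshold
# are placeholders for illustration only.
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical folder of plain-text sources on the history of Earl Grey tea
doc_paths = sorted(Path("earl_grey_docs").glob("*.txt"))
documents = [p.read_text(encoding="utf-8") for p in doc_paths]

claim = "Earl Grey tea was named after the 2nd Earl Grey."

# Fit tf-idf on the document set, then project the claim into the same space
vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(documents)
claim_vector = vectorizer.transform([claim])

# Similarity of the claim to each document; documents above a (rough) threshold
# are treated as potentially corroborating sources
similarities = cosine_similarity(claim_vector, doc_vectors)[0]
corroborating = [
    (path.name, round(score, 3))
    for path, score in zip(doc_paths, similarities)
    if score > 0.2
]

print(f"Claim appears related to {len(corroborating)} of {len(documents)} documents:")
for name, score in corroborating:
    print(f"  {name}: {score}")
```

A claim scoring highly against several documents is only a crude proxy for corroboration, of course; the point is that because the document set is known in advance, scores like these can be pre-computed by the researcher.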
I'm particularly interested in looking at epistemic dialogue between partners engaged in collaborative information seeking. However, I think most of the suggestions below would work on any document-processing task set along the lines of "select the best claims from this set of documents". In giving feedback to our students we might reply:

On Sourcing
1. You presented more than one angle, but you didn't seem to explore the sources much (sources used as competing, without metadata reference being made)
2. You presented more than one angle, and you tried to see whether other documents agreed (you corroborated across sources), but you might like to consider whether some of those documents are more authoritative or more biased than others (corroboration without metadata references)
3. You presented more than one angle, but you might like to "look around" a bit more; you relied on the first document/a particular source, and although it might be a good source, other perspectives may come out if you look for different sources. You may need to try other searches for this. (Where reference to SERP ordering and/or metadata is made, but with limited exploration of sources)

On Use
1. You've presented facts that are clearly relevant to the question, but you haven't always made it clear how these facts answer the question; you should improve your explanation
2. You've presented a clear explanation, but it isn't always related back to the question
3. You've presented a clear explanation, but in places you are lacking relevant facts to support your explanation

On Connection
1. You seem to extract a lot of claims from the documents. Have you considered whether some of the claims are related? Or whether some are repeated in different ways across documents? (Where claims are stated but not connected)
2. You're using the language from the documents well, but you could consider whether some of the documents use different words for the same thing (this is particularly interesting in open web search, where we might want people to use terms from documents in their subsequent queries)
3. You seem to have extracted some claims from the documents, but only a limited range of their topics is covered (where claims are stated but only a limited set of topics from the documents is covered)
4. You are making lots of connections between claims, but you should try to relate these to the question too

Now, as it happens, I think we can probably do something similar on open web search if we're logging pageviews and queries, by using tools such as DBPedia, Metadata Extractor, and text-processing tools (including topic modelling and tf-idf). What I should be doing right now is working out how to set Python up to do that stuff on the documents I've already sourced… so, onwards!
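For a first pass at what that Python setup might look like, here is a minimal sketch of a topic-modelling run over a folder of already-downloaded documents, using scikit-learn's CountVectorizer and LatentDirichletAllocation. The folder name, the number of topics and the other settings are assumptions for illustration, and the DBPedia/metadata side isn't covered here.

```python
# Minimal sketch: rough topic modelling over a folder of locally saved documents,
# to get a feel for how many 'components' a story might have.
# Assumes scikit-learn; the folder name, n_topics and other settings are
# placeholders, not a finished pipeline.
from pathlib import Path

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

doc_paths = sorted(Path("earl_grey_docs").glob("*.txt"))  # hypothetical corpus
documents = [p.read_text(encoding="utf-8") for p in doc_paths]

# Bag-of-words counts (LDA works on raw counts rather than tf-idf weights)
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(documents)
vocab = vectorizer.get_feature_names_out()

# Fit a small LDA model; the number of topics is a guess, to be tuned by eye
n_topics = 5
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
doc_topics = lda.fit_transform(counts)

# Show the top words for each topic, and which topic dominates each document
for topic_idx, weights in enumerate(lda.components_):
    top_words = [vocab[i] for i in weights.argsort()[::-1][:8]]
    print(f"Topic {topic_idx}: {', '.join(top_words)}")

for path, dist in zip(doc_paths, doc_topics):
    print(f"{path.name}: mostly topic {dist.argmax()} ({dist.max():.2f})")
```

Per-document metadata (time, authorship, publisher and so on) would still need to be pulled separately, e.g. from page markup or a DBPedia lookup, before it could feed into the authority-use side of the analysis.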

Footnotes

  1. http://kmi.open.ac.uk/publications/techreport/kmi-13-03