In the analysis of written texts, there are various reasons we might want to understand how similar multiple texts are. We might be interested in:

1. Whether a text is plagiarised, either from a particular external source or from another student
2. Whether a text has features indicating particular authorship
3. Whether a text has features indicating a particular genre
4. Whether a text apparently draws on particular sources (for example, it has features, such as its topics, indicating that particular sources were used in its writing)

Sometimes similarity is a good thing: it indicates that a useful source has been used productively, or that the writing adheres to particular stylistic conventions. Other times it is less welcome, as in the case of plagiarism.

Notes on plagiarism

It is important to note that there are a range of reasons texts overlap (some of which might look like plagiarism), and that in some cases (e.g. Creative Commons adaptation) the reuse and alteration of material is perfectly acceptable (see e.g. [this post]1). Indeed, as a recent discussion on the ALT discussion list noted, software can detect similarity, but judgement is required to understand where that similarity comes from.

In the case of plagiarism, though, there are some great resources available (this set is in no way exhaustive):

* UTS have developed a really nice [quiz to help students understand the requirements]2

  * There’s also a nice paper from Rolfe on [using Turnitin for formative feedback on plagiarism]3
* The University of Sydney has some great advice on [ways to prevent students from plagiarising]4
  * JISC/Oxford Brookes published a great guide on the same issue, including [‘designing out’ opportunities for plagiarism]5 (e.g. by not using the same assignments every year!)
  * There is also a nice resource on combating [‘contract cheating’ (from the HEA)]6
* In addition (and with more acknowledgement of the appropriate adaptation of open knowledge material), Wikipedia has some rather good [guidance on Plagiarism and Copyright]7, and the WikiEd Foundation produced a [really nice leaflet]8.

Importantly, as [McGowan discusses (abstract)]

Tools to detect keyphrases in a document which appear elsewhere on the web

1. Duplichecker: searches the web for phrases taken from a provided document and returns links to their sources (this could be used to identify the original source of material, or uses of that material across the web)
2. Plagiarism-detect: does the same as Duplichecker, but in a much sexier way (identified text is highlighted and becomes a clickable hyperlink to its source on the web); however, it does not appear to be free
3. Plagiarism-checker: same as above; a nice feature is that it ignores text in quotation marks, i.e. it skips already-cited material and looks only at uncited text
4. PlagScan: same again; it looks for keyphrases from a given text across the web and returns matches
5. ArticleChecker: same as above; it seems to be targeted at content producers looking for reuse of their content across the web (and it isn’t working for me currently)
6. Plagiarism-checker: same as above but a bit limited. It lets you run a few searches at once, but you have to select the lines of text to search on yourself; it won’t do it for you. It is targeted at both content producers and teachers, and has a built-in “Google Alerts” function, which is pretty cool, to alert authors if their content appears elsewhere on the web
7. Copyscape: same as above; give it a URL and it will show whether the content is being used elsewhere on the web, again with a Google Alerts-style alert. Fairly limited.
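The tools above broadly work by taking phrases from a document and looking for them elsewhere. As a rough illustration of that idea, here is a minimal local sketch that compares two texts directly (for instance, two student submissions) rather than searching the web. It splits each text into overlapping word n-grams ("shingles"), optionally drops passages in double quotation marks (mirroring the quotation-skipping feature described for Plagiarism-checker), and reports the shared phrases plus a Jaccard similarity score. The function names (`strip_quotations`, `shingles`, `compare`), the default 5-word window and the quote-detection rule are all assumptions for illustration; this is not how any of the listed tools is known to work internally.

```python
import re

# Assumption: a "quotation" is any passage in straight ("...") or curly (“...”) double quotes.
QUOTE_PATTERN = re.compile(r'"[^"]*"|“[^”]*”')
# Simple tokenizer: runs of lowercase letters, digits and straight apostrophes.
WORD_PATTERN = re.compile(r"[a-z0-9']+")


def strip_quotations(text: str) -> str:
    """Replace quoted passages with a space so already-cited material is not counted."""
    return QUOTE_PATTERN.sub(" ", text)


def shingles(text: str, n: int = 5, ignore_quotes: bool = True) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word phrases ("shingles") in a text."""
    if ignore_quotes:
        text = strip_quotations(text)
    words = WORD_PATTERN.findall(text.lower())
    # A text shorter than n words yields no shingles (range is empty).
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def compare(a: str, b: str, n: int = 5) -> tuple[float, list[str]]:
    """Return the Jaccard similarity of the two shingle sets and the shared phrases."""
    sa, sb = shingles(a, n), shingles(b, n)
    common = sa & sb
    union = sa | sb
    score = len(common) / len(union) if union else 0.0
    return score, sorted(" ".join(s) for s in common)


if __name__ == "__main__":
    student = 'As the report says, "plagiarism is a serious issue", and copying text from the web is easy to detect.'
    other = "Copying text from the web is easy to detect with the right tools."
    score, shared = compare(student, other, n=4)
    print(f"Jaccard similarity: {score:.2f}")
    for phrase in shared:
        print("shared:", phrase)
```

A score near 1 means the texts share most of their phrasing, and the list of shared phrases is what a human would then inspect. As noted above, the software only surfaces overlap; judgement is still needed to decide whether it reflects plagiarism, legitimate reuse or common phrasing. Smaller values of `n` catch more paraphrase but also more coincidental matches.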