One of the things I do in my Data Science for Innovation class is illustrate some of the issues we face in working with real datasets. Unlike many sample datasets, the datasets we encounter in authentic contexts are often messy in various ways, and/or need wrangling into the shape we need for analysis. A great tool to illustrate some of these issues is OpenRefine. I confess, it isn't a tool I'd used until I started teaching this module, and it's likely not a tool many professional data scientists would continue using (I still use R for most of the bits and pieces I do, even knowing about OpenRefine). What OpenRefine is great for is illustrating the issues and principles without getting bogged down in the specifics of "how do we code that". It's also incredibly useful because it is a tool I can imagine introducing to colleagues who don't want to code.

After I had taught the OpenRefine way a couple of times, student feedback was mostly positive, with a few wishing I'd selected a different tool. So, I took the OpenRefine tutorial and worked out how you'd implement each step both in R and in a spreadsheet (the latter only as far as is possible, and solely for illustrative purposes). The three are embedded below and can be treated as CC-BY.

# Word document OpenRefine instructions

https://goo.gl/mQC7oe

# RPubs R instructions

# Spreadsheet illustration (a good example of the limitations of spreadsheets…)