I’ve borrowed the idea of ‘deep-fried data’ from the title of a presentation by Maciej Cegłowski to the Collections as Data conference at the Library of Congress last month. As an archaeologist living and working in Scotland for 26 years, the idea of deep-fried data spoke to me, not least of course because of Scotland’s culinary reputation for deep-frying anything and everything. Deep-fried Mars bars, deep-fried Creme Eggs, deep-fried butter balls in Irn-Bru batter, deep-fried pizza, deep-fried steak pies, and so it goes on (see some more not entirely serious examples).
Hardened arteries aside, what does deep-fried data mean, and how is this relevant to the archaeological situation? In fact, you don’t have to look too hard to see that cooking is often used as a metaphor for our relationship with and use of data.
At the same time, some of the assumptions behind Big Data were being questioned. It was no longer quite so straightforward to claim that ‘big data’ could overcome ‘small data’ by throwing computer power at a problem, or that quantity outweighed quality such that the large size of a dataset offset any problems of errors and inaccuracies in the data (e.g. Mayer-Schönberger and Cukier 2013, 33), or that these data could be analysed in the absence of any hypotheses (Anderson 2008).
For instance, boyd and Crawford had highlighted the mythical status of ‘big data’; in particular that it somehow provided a higher order of intelligence that could create insights that were otherwise impossible, and assigned those insights an aura of truth, objectivity and accuracy (2012, 663). Others followed suit. For example, McFarland and McFarland (2015) have recently shown how most Big Data analyses produce “precisely inaccurate” results simply because the sample size is so large that they yield statistically highly significant results (hence the debacle over Google Flu Trends; see, for example, Lazer and Kennedy 2015). Similarly, Pechenick et al. (2015) showed how, counter-intuitively, results from Google’s Books Corpus could easily be distorted by a single prolific author, or by the fact that there was a marked increase in scientific articles included in the corpus after the 1960s. Indeed, Peter Sondergaard, a senior vice president at Gartner and global head of Research, underlined that data (big or otherwise) are inherently dumb without algorithms to work on them (Gartner Inc. 2015b). In this regard, one might claim Big Data have been superseded by Big Algorithms in many respects.
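The point about sample size and “precisely inaccurate” results can be illustrated with a rough sketch. It is not McFarland and McFarland’s own analysis: it assumes a simple two-sample z-test with unit variance in both groups, and a hypothetical, fixed discrepancy of 0.01 standard deviations between the groups (the kind of small distortion a biased sample might introduce). The function name and the numbers are illustrative only.

```python
import math


def two_sided_p_value(effect_size: float, n_per_group: int) -> float:
    """Two-sided p-value for a two-sample z-test.

    Assumes both groups have unit variance and n_per_group observations,
    so the standard error of the difference in means is sqrt(2 / n).
    """
    z = effect_size * math.sqrt(n_per_group / 2)
    # For a standard normal, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical discrepancy between groups, in standard deviations.
# It stays the same however much data is collected.
BIAS = 0.01

for n in (100, 10_000, 10_000_000):
    print(f"n = {n:>10,}  p = {two_sided_p_value(BIAS, n):.3g}")
```

Under these assumptions the discrepancy itself never changes, yet the p-value shrinks as the sample grows, so a trivially small or spurious difference ends up looking highly significant once the dataset is big enough.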
It was only a matter of time before a ‘big data’ company latched onto archaeology for commercial purposes. Reported in a New Scientist article last week (with an unfortunate focus on ‘treasure’), a UK data analytics start-up called Democrata is incorporating archaeological data into a system to allow engineering and construction firms to predict the likelihood of encountering archaeological remains. This, of course, is what local authority archaeologists do, along with environmental impact assessments undertaken by commercial archaeology units. But this isn’t (yet) an argument about a potential threat to archaeological jobs.
As the end of 2014 approaches, Facebook has unleashed its new “Year in Review” app, purporting to show the highlights of your year. In my case, it did little other than demonstrate a more or less complete lack of Facebook activity on my part other than some conference photos a colleague had posted to my wall; in Eric Meyer’s case, it presented him with a picture of his daughter who had died earlier in the year. In a thoughtful and thought-provoking piece, he describes this as ‘Inadvertent Algorithmic Cruelty’: it wasn’t deliberate on the part of Facebook (who have now apologised), and for many people it worked well as evidenced by the numbers who opted to include it on their timelines, but it lacked an opt-in facility and there was an absence of what Meyer calls ‘empathetic design’. Om Malik picks up on this, pointing to the way Facebook now has an ‘Empathy Team’ apparently intended to make designers understand what it is like to be a user (sorry, a person), although Facebook’s ability to highlight what people see as important is driven by crude data such as the number of ‘likes’ and comments without any understanding of the underlying meanings which are present.
One of the features of the availability of increasing amounts of archaeological data online is that it frequently arrives without an accompanying awareness of context. Far from being a problem, this is often seen as an advantage in relation to ‘big data’ – indeed, Chris Anderson has claimed that context can be established later once statistical algorithms have found correlations in large datasets that might not otherwise be revealed.