Biggish Data

Big Data 😉

Big Data is (are?) old hat …  Big Data dropped off Gartner’s Emerging Technologies Hype Cycle altogether in 2015, having slipped into the ‘Trough of Disillusionment’ in 2014 (Gartner Inc. 2014, 2015a). The reason given for this was simply that it had evolved and had become the new normal – the high-volume, high-velocity, high-variety types of information that classically defined ‘big data’ were becoming embedded in a range of different practices (e.g. Heudecker 2015).

At the same time, some of the assumptions behind Big Data were being questioned. It was no longer quite so straightforward to claim that ‘big data’ could overcome ‘small data’ by throwing computer power at a problem, or that quantity outweighed quality such that the large size of a dataset offset any problems of errors and inaccuracies in the data (e.g. Mayer-Schönberger and Cukier 2013, 33), or that these data could be analysed in the absence of any hypotheses (Anderson 2008).

For instance, boyd and Crawford had highlighted the mythical status of ‘big data’, in particular the belief that it somehow provided a higher order of intelligence that could create insights that were otherwise impossible, and assigned them an aura of truth, objectivity and accuracy (2012, 663). Others followed suit. For example, McFarland and McFarland (2015) have recently shown how most Big Data analyses produce “precisely inaccurate” results simply because the sample size is so large that the results appear statistically highly significant (hence the debacle over Google Flu Trends: see, for example, Lazer and Kennedy 2015). Similarly, Pechenick et al. (2015) showed how, counter-intuitively, results from Google’s Books Corpus could easily be distorted by a single prolific author, or by the fact that there was a marked increase in scientific articles included in the corpus after the 1960s. Indeed, Peter Sondergaard, a senior vice president at Gartner and global head of Research, underlined that data (big or otherwise) are inherently dumb without algorithms to work on them (Gartner Inc. 2015b). In this regard, one might claim Big Data have been superseded by Big Algorithms in many respects.


Let’s talk about digital archaeology

Andre Costopoulos lays down a series of provocations in his opening editorial for the new Digital Archaeology section of the Frontiers in Digital Humanities journal. So far, there doesn’t seem to have been much response – Twitter chatter, for example, simply draws attention to the article without comment (except perhaps in one instance where it may or may not be addressed tongue-in-cheek – such is the danger of social media!).

Mimi and Eunice – (CC BY-SA 3.0)

He starts by saying simply:

“I want to stop talking about digital archeology. I want to continue doing archeology digitally … I would like to lay the groundwork for the journal as a place primarily to do archeology digitally, rather than as a place to discuss digital archeology”.

There’s certainly nothing wrong with a journal focussed on digital archaeological applications, but what’s wrong with talking about digital archaeology? He goes on:
