Ian Milligan, Department of History, University of Waterloo
Just as Big Data is reshaping contemporary society, for better or worse, so too will it reshape the historical profession. The astounding growth of digital sources since the advent of the World Wide Web in 1990–91 presents tremendous new opportunities for social and cultural historians, but the sheer amount of information these sources contain about how life is now lived can also be terrifying.
Web archives are created by web crawlers that systematically download websites, which are then stored and preserved in repositories ranging from the Internet Archive to national libraries like the British Library and the Bibliothèque Nationale de France. They contain incredible recollections: the everyday thoughts of Canadians, recorded on social media, blogs, websites, and beyond. If you know a site’s exact address, you can access archived copies yourself with the Internet Archive’s Wayback Machine; if you want to try searching some old Canadian websites, visit http://WebArchives.ca.
First — the opportunity. Imagine what this might mean. As James Gleick has argued in his book The Information, the norm before the digital age was that human information would vanish but now, “expectations have inverted. Everything may be recorded and preserved, at least potentially.”
We can see the signs of this enormous resource already. The old “gold standard” of historical data included collections like the records of the Old Bailey (now digitized), the central London criminal court that provides so much insight into the lives of English people between 1674 and 1913. Its custodians can today rightfully claim that it is the “largest body of texts detailing the lives of non-elite people ever published.” Until now, that is. The online network GeoCities.com, which between 1994 and 2009 let everyday people create their own personal websites for no charge, saw the creation of some 38 million documents, generated by as many as seven million users. Or consider the #IdleNoMore protests of January 2013: on one day alone, 11 January 2013, some 55,334 tweets used that hashtag.
Herein lies the problem that tempers this considerable opportunity. You can’t read 38 million documents, nor can you feasibly read all of these tweets. To use these sources, historians will need to use computers, but that’s not in itself a simple solution. To really understand how computer programs are parsing these sources, I think that historians need to know how to program. This doesn’t mean that historians all need to become computer scientists, of course, but rather that they should begin to think algorithmically and bring a basic digital awareness to bear on these questions that will so profoundly affect us.
This is critically important because historians can bring a humanist sensibility to these questions. Ethical dimensions may be particularly pronounced, as we begin to discuss what it means to access and use the sources generated by millions, soon billions, of people, the vast majority of whom are unaware that their words are being preserved. Are blogs or tweets fair game for the historian, now or in the future? Historians have considerable experience with ethical discussions around both print and oral sources, which is yet another reminder that we can take a leading role in these conversations.
This is where resources like the Programming Historian, a free, open access online textbook that shows historians how to program, or my own Exploring Big Historical Data: The Historian’s Macroscope come in handy. These are team-written projects that try to give historians the tools to interpret digital sources, and also make the case that historians need to embrace collaboration, learn their own digital skills, and face the challenges of the 21st century head on.
All historians will benefit from fruitful, technically sound, and ethically principled engagement with web archives. As we begin to write histories of the 1990s (strange as it may sound, we are almost as far from the 1990s as we were from the 1960s when histories of that decade began to appear), we will need web archives to do justice to the period. Imagine: military historians will have the thoughts of soldiers deployed overseas who tweeted or used discussion boards to communicate with loved ones; political historians can trace how elections in the 1990s or 2000s played out online, or how governments forged ties with communities; and social and cultural historians can see how our societies functioned. It will require some rethinking of how we as historians approach our profession, but the results will be worth it.
- The task of the historian changes along with the character of source material. Digital records pose new challenges and opportunities.
- Just as the data opens up new frontiers, it presents the obligation to think anew about ethical issues in archival research.
- James Gleick, The Information: A History, a Theory, a Flood (London: Vintage, 2012), 396–7.