Big Data is a controversial subject featured in the news almost daily, from the NSA spying programs to the rise of corporate data brokers. For better or worse this data exists, and the high value of information to governments and private interests alike makes it look as though the practice is here to stay in one form or another. But it is not the entry of data collection into the many aspects of our lives that I am exploring here; rather, it is how this data can be mined in the future by historians.
Though the emergence of metadata is no doubt unprecedented, it is not completely foreign to the study of history; it is just a different and much denser form of archive. It is a vast digital archive of the everyday, one which could allow this profusion of data to be viewed through a historical lens, turning numbers into human narratives. In short, I would like to raise the possibility of a new historiography – Data history – which would be the study of the past through the mining of data.
The challenges, though, may arise not from the size of the researchable data but from the almost endless possibilities of new historical approaches. Since the times of Herodotus and Thucydides there have been different methods of history. The German School of the nineteenth century ushered in the age of academic research based on documentary evidence. Jacob Burckhardt, whose The Civilization of the Renaissance in Italy was published in 1860, studied in Berlin under the influential Leopold von Ranke and is an example of history’s turn to the written records of the past. Burckhardt was also one of the first modern practitioners of cultural and art history.
The Progressive Era brought with it the rise of statistics and their use, most prominently in sociology and history. The work of W.E.B. Du Bois, whose use of graphs and footnotes revolutionized serious social science and history writing, is a prime example of this period. Fernand Braudel would surface from the archives of three continents with his epic The Mediterranean and the Mediterranean World in the Age of Philip II in 1949, leading the Annales School to the foreground of twentieth-century historiography with its focus on the longue durée and on a history written from the bottom of society up. And since the majority of those who were literate in the past were both male and elite, statistics drawn from economic and criminal records shone light on those who had been hidden in the darkness of the past.
In contrast, many modern historians can still illuminate past lives and events with limited archival material; the works of Jill Lepore and Natalie Zemon Davis come to mind, to name just two. This points to an obvious conclusion: the amount of information that could become available in the future through metadata does not by itself translate into better historical practice. In putting forward the idea of a tomorrow where the study of Data history may exist, I recognize that I am raising more questions than answers here. But one window into this vast digital landscape that I’d like to explore is genealogy.
There has always been an interest in one’s ancestors, but as I mentioned before, the paper trails into the past were for the most part open only to the privileged minority. Today, with Ancestry.com and other genealogical sites and societies, the doors to those we descend from have opened wider than ever before. The largest genealogical company, Ancestry.com, offers over 11 billion historical records to its subscribers. In comparison, Alice E. Marwick, in her article “How Your Data Are Being Deeply Mined” for the New York Review of Books, writes: “The industry of collecting, aggregating, and brokering personal data is known as ‘database marketing.’ The second largest company in this field, Acxiom, has 23,000 computer servers that process more than 50 trillion data transactions per year.”
If Acxiom, as the second largest collector, processes more than 50 trillion transactions a year, then the largest must handle at least as many, which suggests that well over 100 trillion pieces of data are being collected each year, with this number set to rise exponentially. This is not even factoring in whatever the NSA and other government agencies around the world are collecting. That is why it is called Big Data. It becomes less daunting when you imagine your descendants one day being able to view your Facebook or other social media profile, with access to the pictures, posts, and comments of a bygone era. In this sense, to genealogists and the historians of tomorrow this information is priceless.
Outside of history this genealogical data could be valuable as well. Michel Foucault in his essay “Nietzsche, Genealogy, History,” states “the body is molded by many distinct regimes; it is broken down by rhythms of work, rest, holidays, and eating habits. Genealogy, is thus situated within the body and history.” Foucault’s argument is of course more epistemological, but it relates perfectly to genetics. Our genes bear the imprints of how others lived in the past. The mining of historical data could then not only bring a family tree to life, but also improve the lives of those who will come after us. Medical records mined as data could be used both to treat individuals and to gain a better understanding of diseases that affect whole societies.
The field of medicine is just one example outside history. Many more questions, I believe, will be raised by the study of data, especially those concerning the three C’s of culture, class, and consumption.
Ultimately, I think that an archive for this data should be established, one where governments, businesses, and individuals can donate their data for the higher purpose of studying the human condition in the future. Of course, these data donations could come with research restrictions; some records, for example, might not be opened until a certain date in order to protect sensitive information. But if we are serious about leaving a better world for those who will inhabit it after us, our data from today could create a better tomorrow. Data history will be the study of the past through the mining of data, the depth of which is our legacy to decide.