Crowdsourcing in the Cultural Heritage Domain

A paper of mine from 2012. Unfortunately, the final version with citations and references seems to have disappeared from my computer. A paper copy is also mysteriously absent… Still, one of my better papers on a nascent practice in the LIS field, and one that I would like preserved. Maybe I can crowdsource the citations back in!

In the humanities, the cultural heritage domain consists of a variety of cultural institutions and Web presences that protect and preserve the shared artefacts of human history. Almost everyone is familiar with these places to a lesser or greater extent. In the digital humanities lexicon, they have come to be referred to as GLAMS, an acronym standing for Galleries, Libraries, Archives and Museums. To be sure, these are some of our oldest and most valued institutions. They are also some of the hardest hit in our age of recession, where budgets are cut and programs deemed marginal are eliminated. This forces GLAMS to use whatever tools are at their disposal to stay relevant and engaged with the public. Indeed, it is this very public that they depend on, and not just for their fiscal survival. GLAMS depend on generally large groups of people to confirm their mission – their raison d’être – that is, to impart culture, knowledge and memory to future generations.

The institutions that comprise GLAMS have been around for hundreds of years, but the older humanities disciplines are now being complemented by newer digital disciplines. Indeed, this interdisciplinary field is now shifting towards predominately computer-based realities. Of course, the humanities already has an intriguing history with the computer. Humanities computing as such began in 1949 with Father Roberto Busa, who with the help of Thomas J. Watson at IBM created an automated program to index 11 million words in Latin penned by Thomas Aquinas and others (citation). Busa set a precedent for humanities computing and there is now an award named after him given triennially by the Alliance of Digital Humanities Organisations (citation). Following Busa, the march of computer technology through the humanities led to a series of journals, symposia and academic computing centers. It culminated in the arrival of the Internet, but more specifically the World Wide Web. Over time, a broader base of academic interest was delivered via the computer, as scholarly material soon began to circulate over the Internet. This gave scholars in the humanities more options. For instance, they could better manage annotations with hypertext, conduct more efficient peer-review, and have a more expedient means to publication (citation).

There is no settled definition of the term ‘digital humanities.’ Broadly speaking, it is the intersection of computer technologies with traditional humanities disciplines. A variety of resources on the Web seek to explain the term, but each definition varies. One soon discovers that a definition of digital humanities is contingent on specific professions. For example, an archeologist will have a different approach to their field than a linguist, and a linguist will have a different approach than a musician. Indeed, the humanities encompass a wide range of professions that can be further subdivided into methodologies and typologies. It is the same in the digital humanities. Fred Gibbs, Associate Professor at George Mason University, has analyzed what 170 participants working in some area of the digital humanities have said about DH. He found that participants’ answers could be grouped into nine different categories (citation). Of particular interest here are the categories he labeled “methods AND community” and “digitization / archives.” Both can be seen in what the following respondent had to say about digital humanities:

“Creating, documenting, deploying and supporting software used in Humanities teaching and research; digitization, archiving and publication of Humanities texts through electronic means; using digital tools to generate and answer research questions related to Humanities texts; collaborating on Humanities projects through digital means; etc. etc… – Martin Holmes, University of Victoria” (citation).

Today, researchers in the humanities are increasingly producing online editions of texts and manuscripts. What is more, with the advent of crowdsourcing, humanities-based research projects are going from a specialty area to a full-fledged community effort. Crowdsourcing projects now abound in the digital humanities realm, and GLAMS are beginning to explore the potential of this new phenomenon. Indeed, the more articulate and productive GLAMS are looking to distributed labor networks to keep the conversation going between the institution and the public. Crowdsourcing, however, is a term that developed peripherally, with no immediate relation to the digital humanities. Jeff Howe, a contributing editor for Wired magazine, coined the term in 2006. Introducing the idea in his article, The Rise of Crowdsourcing, Howe says “Technological advances. . . are breaking down the cost barriers that once separated amateurs from professionals. . . The labor isn’t always free, but it costs a lot less than paying traditional employees. It’s not outsourcing; it’s crowdsourcing” (citation).

Howe was thinking mostly in terms of the business world, but he did open his article with an example from the humanities. Howe mentions Claudia Menashe, who was then project director of the National Health Museum in Washington D.C. Looking to purchase some images related to the health care industry, she turned to iStockphoto, a “free image-sharing site. . . of amateur photographers – homemakers, students, engineers, dancers” (citation). Instead of paying a freelance photographer for her exhibition photos, Menashe looked to the crowd for a less expensive alternative, and her discovery was not without effect. Now, a number of professionals in the cultural heritage domain are building on Howe’s idea. In crowdsourcing, they see the potential for a cost-effective strategy that will keep their institutions relevant in the digital age. Art and library directors, museum curators, archivists and historians of all stripes are beginning to see crowdsourcing as a way to work on projects that may have taken a backseat for lack of resources and skilled labor. Furthermore, these professionals see crowdsourcing as mutually beneficial. Theoretically, all individuals involved in crowdsourcing are in some way motivated to participate. According to interviews and surveys, some reasons to get involved in a crowdsourcing project as an amateur include, but are not limited to: the opportunity to develop one’s creative skills, build a portfolio for future employment, network with professionals, and contribute to a large project (citation). GLAMS recognize these incentives and are making their projects even more attractive by introducing them via social media.

Of course, there are different models of crowdsourcing, and different organizations may rely on crowdsourced help for different reasons. As Howe predicted, the phenomenon is becoming a fairly standard component of the business world. For example, there are companies like InnoCentive, which allow the “world’s smartest people to compete to provide ideas and solutions to important business, social, policy, scientific, and technical challenges” (citation). There are countless examples like this across our media-saturated landscape. Businesses rely on the crowd for information-gathering, scientific problems, market support, etc. Ideas are promulgated across Facebook, Twitter and company-specific network platforms. However, crowdsourcing in the cultural heritage domain is a bit different from the business model. GLAMS tend to offer crowdsourcing platforms that resemble a model called ‘Distributed Human Intelligence Tasking.’ In this model, an organization tasks the crowd with analyzing large amounts of information. It is ideal for large-scale data analysis where human intelligence is more efficient or effective than computer analysis (citation). The crowd that is sourced to participate is therefore presented with a corpus of data that is already known. For example, GLAMS often have manuscripts and documents in their collections that they want transcribed.

One of the first institutions to try crowdsourced transcription is University College London, and its Transcribe Bentham project is perhaps the most famous example of crowdsourcing in the digital humanities. In the fall of 2010, the University College London Centre for Digital Humanities invited anyone to help them transcribe some of the then-40,000 unpublished manuscripts of the British utilitarian philosopher Jeremy Bentham. These manuscripts were scanned and put online at the University’s Transcribe Bentham website. In preparation for their contributions, “users are given a long list of guidelines instructing them on how to enter codes for deletions, additions, marginal notes, headings and other textual quirks” (citation). The guidelines may be extensive, but they are far from draconian rules. Indeed, there are plenty of personal choices in the endeavor, as users are free to choose manuscripts on any subject Bentham addressed. These include drunkenness, swearing, adultery and much more. Of course, many users will choose to work on documents that are easier to read, since Bentham’s handwriting apparently deteriorated in his later years. Four months after the project opened to the public, 350 registered users had produced 435 transcripts (citation). The transcripts, in turn, were reviewed and corrected by editors before being set aside for use in printed editions of the collected works of Bentham. In line with the justifications mentioned earlier, the editors at the UCL Centre for Digital Humanities see the potential in Transcribe Bentham to “cut years, even decades, from the transcription process” (citation).

Another major crowdsourced project is the Papers of the War Department, sponsored by the Roy Rosenzweig Center for History and New Media at George Mason University. Here, 55,000 documents from the original federal War Office in Washington D.C. have been reconstructed and housed. The early documentary records had been ravaged by fire on November 8, 1800, but scholars were able to track down copies of War Department material from individual recipients (citation). As of this past September, there were 760 volunteers working to transcribe War Office records (citation). The Roy Rosenzweig Center for History and New Media has developed an open-source program called Scripto, which allows administrators of universities, libraries, archives and museums to open up their collections to an institutional public. Scripto builds upon MediaWiki, the free, open-source software package used for Transcribe Bentham (citation). The goal for the Papers of the War Department is to “use the best technology of the early twenty-first century to recover and make widely available the vital record of American history that was seemingly lost at the dawn of the nineteenth century” (citation). Transcribe Bentham and Papers of the War Department are perhaps the two biggest examples of crowdsourced transcription in the cultural heritage domain.

Of course, Transcribe Bentham and Papers of the War Department are just two examples within a larger classification. Two authors from the Netherlands have done an empirical study of a substantial number of projects initiated by relevant cultural heritage institutions. Johan Oomen and Lora Aroyo arrived at a list of six types of crowdsourcing initiative that GLAMS offer to the public. These are Correction and Transcription Tasks, Contextualization, Complementing Collection, Classification, Co-curation and Crowdfunding. We have already seen how Correction and Transcription Tasks work, but a brief summary of the remaining five should further illustrate the nature of crowdsourcing in the cultural heritage domain. Contextualization is the activity of placing objects in a meaningful context. For example, the public may provide personal stories through a wiki-style web page to describe an object associated with public or family history. Complementing Collection is the joint task between an institution and the public of creating collections that are supplemented by crowd contributions. Classification can be seen in the popular activity of social tagging – or creating taxonomies as finding aids. Co-curation is when the institution draws on the inspiration or expertise of non-professional curators to create exhibits. Finally, Crowdfunding is the collective cooperation of people who pool their money and other resources to support efforts initiated by an institution (citation).

Crowdsourcing advocates are excited over the potential to “improve access to material, build an engaged audience for collections and perhaps save money” (citation). This last reason makes crowdsourcing increasingly attractive because of the fiscal pressures facing GLAMS. However, there are those who deny the efficiency of crowdsourcing. Edward G. Lengel, editor in chief of the Papers of George Washington at the University of Virginia, says of crowdsourcing, “[It is] an unproven concept” (citation). Commenting on Transcribe Bentham, Lengel goes on to state that “members of the public. . . are never going to be able to produce complete editions to the same level of accuracy that trained professionals will. . .” (citation). This is because the public isn’t trained in documentary editing. Indeed, early analysis of data from the Transcribe Bentham project indicates that trained manuscript-readers would be able to operate at a pace two and a half times faster if they were devoted to transcription rather than moderating submissions (citation). Another criticism can be seen in a co-curated project: the Walters Art Museum’s “Public Property” exhibit. This past year, the Walters Art Museum in Baltimore, Maryland sponsored an art exhibit in which “the show’s title, themes and artworks were chosen by more than 53,000 votes cast online and by museum visitors” (citation). Museum director Gary Vikan justifies the exhibit by using the engaged audience argument, but critics accuse the Walters Art Museum of effectively dumbing down standards and forfeiting its traditional role of selecting and portraying culturally relevant and high-quality pieces of art. Never mind the fact that each piece selected for “Public Property” was already part of the Walters collection (citation). Some people are just resistant to change.

There are other crowdsourcing projects with similar goals. The National Archives has its Citizen Archivist Dashboard, where users can help tag images, transcribe manuscripts, edit articles as well as upload and share their own National Archives-related photos (citation). Another example is The George Eastman House International Museum of Photography and Film, which is undertaking a large online effort to “[tag] and [catalog] its archive of more than 400,000 images” (citation). There are of course many more, lesser-known projects that seek to engage the public in what we have seen to be a mutually beneficial exercise. However, there may be risks to sponsoring institutions that offer crowdsourced projects to the public. Indeed, members of crowdsourcing networks can potentially subvert and sabotage an institution’s goals. In crowdsourcing, GLAMS often have to specify the parameters of a project, which may expose some of their proprietary computer data and make them vulnerable to hackers. Online community management and security are definitely important considerations for GLAMS heading into the future.

Overall, crowdsourcing in the cultural heritage domain is important because it encourages social memory and the production of an enlightened citizenry. Furthermore, crowdsourcing promotes the values of democracy through transparency and open participation. However, when we speak of democracy and democratizing the Web, we should be careful not to lavish too much praise on crowdsourcing. Since crowdsourcing occurs on the Internet, and since Internet access is lower among the economically disadvantaged and racial and ethnic minorities, we can’t really claim that crowdsourcing is altogether improving society – even in the cultural heritage domain. After all, those who engage in crowdsourcing projects are likely people of comfortable socioeconomic standing who have access to higher education. Indeed, many probably get involved because of higher education. Therefore, although crowdsourcing is free, opportunities for everyone who wishes to participate are still largely limited. Poverty and access to education are important issues, but another important issue may be closer to the heart of GLAMS. If GLAMS digitize their collections, fewer patrons may come through their doors. Whether this is a legitimate concern or not, GLAMS are still in a unique position to try new things to stay relevant today.