Somehow it’s a month since I posted about my CENDARI research project (in Moving forward: modelling and indexing WWI battalions) on this site. That probably reflects the rhythm of the project – less trying to work out what I want to do and more getting on with doing it. A draft post I started last month simply said, ‘A lot of battalions were involved in World War One’. I’ll do a retrospective post soon, and here’s a quick summary of on-going work.
First, a quick recap. My project has two goals – one, to collect a personal narrative for each battalion in the Allied armies of the First World War; two, to create a service that would allow someone to ask ‘where was a specific battalion at a specific time?’. Together, they help address a common situation for people new to WWI history who might ask something like ‘I know my great-uncle was in the 27th Australian battalion in March 1916, where would he have been and what would he have experienced?’.
You can see the infobox structures in progress by flipping from the talk to the Template tabs. You’ll need to request an account to join in but more views, sample data and edge cases would be really welcome.
Populating the list of battalions and other units has been a huge task in itself, partly because very few cultural institutions have definitive lists of units they can (or want to) share, but it’s necessary to support both core goals. I’ve been fortunate to have help (see ‘Thanks and recent contributions’ on ‘How you can help’) but the task is on-going so get in touch if you can help!
So there are three different ways you can help with ‘In their own words: collecting experiences of the First World War’:
If you can help with any of that, let me know! Or just get stuck in and edit the site.
I’ve started another Google Doc with very sketchy Notes towards modelling information about World War One Battalions. I need to test it with more battalion histories and update it iteratively. At this stage my thinking is to turn it into an InfoBox format to create structured data via the wiki. It’s all very lo-fi and much less designed than my usual projects, but I’m hoping people will be able to help regardless.
So, in this phase of the project, the aim is to find a personal narrative – a diary, letters, memoirs or images – for each military unit in the British Army. Can you help?
However, the main point of this post is to publicly thank those who’ve helped by commenting and sharing on the doc, on twitter or via email. Hopefully I’m not forgetting anyone, as I’ve been blown away by and am incredibly grateful for the generosity of those who’ve taken the time to at least skim 1600 words (!). It’s all helped me clarify my ideas and find solutions I’m able to start implementing next week. In no order at all – at CENDARI, Jennifer Edmond, Alex O’Connor, David Stuart, Benjamin Štular, Francesca Morselli, Deirdre Byrne; online Andrew Gray @generalising; Alex Stinson @ DHKState; jason webber @jasonmarkwebber; Alastair Dunning @alastairdunning; Ben Brumfield @benwbrum; Christine Pittsley; Owen Stephens @ostephens; David Haskiya @DavidHaskiya; Jeremy Ottevanger @jottevanger; Monika Lechner @lemondesign; Gavin Robinson @merozcursed; Tom Pert @trompet2 – thank you all!
Worthy goals (i.e. things I’m hoping to accomplish, with the help of historians and the public; only some of which I’ll manage in the time)
At the end of this project, someone who wants to research a soldier in WWI but doesn’t know a thing about how armies were structured should be able to find a personal narrative from a soldier in the same bit of the army, to help them understand experiences of the Great War.
Hopefully these personal accounts will provide some context, in their own words, for the lived experiences of WWI. Some goals listed are behind-the-scenes stuff that should just invisibly make personal diaries, letters and memoirs more easily discoverable. It needs datasets that provide structures that support relationships between people and documents; participatory interfaces for creating or enhancing information about contemporary materials (which feed into those supporting structures), and interfaces that use the data created.
More specifically, my goals include:
A personal account by someone in each unit linked to that unit’s record, so that anyone researching a WWI name would have at least one account to read. To populate this dataset, personal accounts (diaries, letters, etc) would need to be linked to specific soldiers, who can then be linked to specific units. Linking published accounts such as official unit histories would be a bonus. [Semantic MediaWiki]
Researched links between individual men and the units they served in, to allow their personal accounts to be linked to the relevant military unit. I’m hoping I can find historians willing to help with the process of finding and confirming the military unit the writer was in. [Semantic MediaWiki]
A platform for crowdsourcing the transcription and annotation of digitised documents. The catch is that the documents for transcription would be held remotely on a range of large and small sites, from Europeana’s collection to library sites that contain just one or two digitised diaries. Documents could be tagged/annotated with the names of people, places, events, or concepts represented in them. [Semantic MediaWiki??]
A published webpage for each unit, to hold those links to official and personal documents about that unit in WWI. In future this page could include maps, timelines and other visualisations tailored to the attributes of a unit, possibly including theatres of war, events, campaigns, battles, number of privates and officers, etc. (Possibly related to CENDARI Work Package 9?) [Semantic MediaWiki]
A better understanding of what people want to know at different stages of researching WWI histories. This might include formal data gathering, possibly a combination of interviews, forum discussions or surveys.
Goals that are more likely to drop off, or become quick experiments to see how far you can get with accessible tools:
Trained ‘named entity recognition’ and ‘natural language processing’ tools that could be run over transcribed text to suggest possible people, places, events, concepts, etc [this might drop off the list as the CENDARI project is working on a tool called Pineapple (PDF poster). That said, I’ll probably still experiment with the Stanford NER tool to see what the results are like]
A way of presenting possible matches from the text tools above for verification or correction by researchers. Ideally, this would be tied in with the ability to annotate documents
The ability to search across different repositories for a particular soldier, to help with the above.
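To make the ‘suggest, then verify’ idea concrete, here is a minimal sketch in Python. It is not the Stanford NER tool or CENDARI’s tooling: it swaps in a much simpler technique, plain lookup against a hand-made list of known names (a gazetteer), purely to show the shape of the data a verification interface might receive. The function name, the `status` field and the sample text are all assumptions made for illustration.

```python
import re


def suggest_entities(text, gazetteer):
    """Return candidate entity mentions found in `text`.

    `gazetteer` maps a known name to its kind (e.g. "unit", "place").
    Every suggestion is marked "unverified" so that a researcher can
    later confirm or correct it.
    """
    suggestions = []
    for term, kind in gazetteer.items():
        # Match the whole term, ignoring case, on word boundaries.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            suggestions.append({
                "text": match.group(0),
                "kind": kind,
                "start": match.start(),
                "end": match.end(),
                "status": "unverified",
            })
    # Present suggestions in reading order.
    suggestions.sort(key=lambda s: s["start"])
    return suggestions


# Hypothetical gazetteer and transcript line, for illustration only.
SAMPLE_GAZETTEER = {
    "Auckland Mounted Rifles": "unit",
    "Anzac Cove": "place",
}
SAMPLE_TEXT = "Landed with the Auckland Mounted Rifles near Anzac Cove."

for suggestion in suggest_entities(SAMPLE_TEXT, SAMPLE_GAZETTEER):
    print(suggestion["kind"], suggestion["text"], suggestion["status"])
```

A trained NER tool would replace the lookup step, but the output could keep the same shape: a span of text, a suggested type and a verification status.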
Another update from my CENDARI Fellowship at Trinity College Dublin, looking at ‘In their own words: linking lived experiences of the First World War’, which is a small-scale, short-term pilot based on WWI collections. My first post is Defining the scope: week one as a CENDARI Fellow. Over the past two weeks I’ve done a lot of reading – more WWI diaries and letters; WWI histories and historiography; specialist information like military structures (orders of battle, etc). I’ve also sketched out lots of snippets of possible functions, data, relationships and other outcomes.
I’ve narrowed the key goal (or minimum viable product, if you prefer) of my project to linking personal accounts of the war – letters, diaries, memoirs, photographs, etc – to battalions, by creating links from the individual who wrote them to their military unit. Once these personal accounts are linked to particular military units, they can be linked to higher units – from the battalion, ship or regiment to brigade, corps, etc – and to particular places, activities, events and campaigns. The idea behind this is to provide context for an individual’s experience of WWI by linking to narratives written by people in the same situation. I’m still working out how to organise the research process of matching the right soldier to the right battalion/regiment/ship so that relevant personal stories are discoverable. I’m also still working out which attributes of a battalion are relevant, how granular the data will be, and how to design for the inevitable variation in data quality (for example, the availability of records for different armies varies hugely). Finally, I’m still working out which bits need computer science tools and which need the help of other historians.
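The chain of links described above (account → writer → unit → higher units) could be sketched roughly as follows. This is only an illustration in Python, not the project’s actual data model: the class names, fields and example records are all hypothetical, and a real model would also need dates, sources and some handling of uncertainty.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    name: str
    level: str                      # e.g. "battalion", "brigade", "division"
    parent: Optional["Unit"] = None

    def chain(self) -> List[str]:
        """Names from this unit up to the highest known parent."""
        names = []
        unit = self
        while unit is not None:
            names.append(unit.name)
            unit = unit.parent
        return names


@dataclass
class Person:
    name: str
    unit: Unit                      # the unit the writer served in


@dataclass
class Account:
    title: str
    kind: str                       # "diary", "letters", "memoir", ...
    author: Person


def accounts_for_unit(accounts, unit_name):
    """Accounts whose author served in `unit_name` or any unit below it."""
    return [a for a in accounts if unit_name in a.author.unit.chain()]


# Hypothetical example records (placeholder names, not real units).
division = Unit("Example Division", "division")
brigade = Unit("Example Brigade", "brigade", parent=division)
battalion = Unit("Example Battalion", "battalion", parent=brigade)
other_battalion = Unit("Other Battalion", "battalion", parent=brigade)

accounts = [
    Account("Diary, 1916", "diary", Person("Private A. Example", battalion)),
    Account("Letters home", "letters", Person("Private B. Example", other_battalion)),
]

print([a.title for a in accounts_for_unit(accounts, "Example Brigade")])
```

Because each account is linked only to its writer, and each writer only to a unit, the links to brigades and divisions come for free from the unit hierarchy rather than having to be recorded on every document.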
Given the number of centenary projects, I was hoping to find more structured data about WWI entities. Trenches to Triples would be a useful source of permanent URLs, and terms to train named entity recognition, but am I missing other sources?
There’s a lot of content, and so much activity around WWI records, but it’s spread out across the internet. Individual people and small organisations are digitising and transcribing diaries and letters. Big collecting projects like Europeana have lots of personal accounts, but they’re often not transcribed and they don’t seem to be linked to structured data about the item itself. Some people have painstakingly transcribed unit diaries, but they’re not linked from the official site, so others wouldn’t know there’s a more easily read version of the diary available. I’ve been wondering if you could crowdsource the process of transcribing records held elsewhere, and offer the transcripts back to sites. Using dedicated transcription software would let others suggest corrections, and might also make it possible to link sections of the text to external ‘entities’ like names, places, events and concepts.
Albert Henry Bailey. Image: Sir George Grey Special Collections, Auckland Libraries, AWNS-19150909-39-5
To help figure out the issues researchers face and the variations in available resources, I’m researching randomly selected soldiers from different Allied forces. I’ve posted my notes on Private Albert Henry Bailey, service number 13/970a. You’ll see that they’re in prose form, and don’t contain any structured data. Most of my research used digitised-but-not-transcribed images of documents, with some transcribed accounts. It would definitely benefit from deeper knowledge of military history – for a start, which battalions were in the same place as his unit at the same time?
If there aren’t already any structured data sources for military hierarchies in WWI, do I have to make one? And if so, how? The idea would be to turn prose descriptions like this Australian War Memorial history of the 27th AIF Battalion, this order of battle of the 2nd Australian Division and any other suitable sources into structured data. I can see some ways it might be possible to crowdsource the task, but it’s a big task. But it’s worth it – providing a service that lets people look up which higher military units, places, activities and campaigns a particular battalion/regiment/ship was linked to at a given time would be a good legacy for my research.
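One way to picture the structured data behind such a lookup is as a list of dated ‘attachments’ between a unit and its parent, since units moved between higher formations during the war. The Python below is a rough sketch under that assumption; the `Attachment` record, the function names and all of the units and dates are placeholders invented for illustration, not data from any order of battle.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Attachment:
    unit: str
    parent: str
    start: date
    end: Optional[date]  # None means the end date is open or unknown


def parent_on(attachments, unit, when):
    """Return the parent of `unit` on the date `when`, or None if unknown."""
    for a in attachments:
        if a.unit == unit and a.start <= when and (a.end is None or when <= a.end):
            return a.parent
    return None


def chain_on(attachments, unit, when):
    """Return `unit` followed by each higher unit it belonged to on `when`."""
    chain = [unit]
    seen = {unit}
    current = unit
    while True:
        parent = parent_on(attachments, current, when)
        # Stop at the top of the hierarchy, or if bad data forms a loop.
        if parent is None or parent in seen:
            break
        chain.append(parent)
        seen.add(parent)
        current = parent
    return chain


# Placeholder data: Battalion A moves from Brigade X to Brigade Y mid-1916.
ATTACHMENTS = [
    Attachment("Battalion A", "Brigade X", date(1915, 1, 1), date(1916, 6, 30)),
    Attachment("Battalion A", "Brigade Y", date(1916, 7, 1), None),
    Attachment("Brigade X", "Division 1", date(1915, 1, 1), None),
    Attachment("Brigade Y", "Division 2", date(1915, 1, 1), None),
]

print(chain_on(ATTACHMENTS, "Battalion A", date(1916, 3, 1)))
print(chain_on(ATTACHMENTS, "Battalion A", date(1917, 3, 1)))
```

The same dated-record pattern could be reused for places and campaigns, so that ‘where was this battalion in March 1916?’ becomes a filter on date ranges rather than a reading of prose histories.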
I’m sure I’m forgetting lots of things, and my list of questions is longer than my list of answers, but I should end here. To close, I want to share a quote from the official history of the Auckland Mounted Rifles. The author said he ‘would like to speak of the splendid men of the rank and file who died during this three months’ struggle. Many names rush to the memory, but it is not possible to mention some without doing an injustice to the memory of others’. I guess my project is driven by a vision of doing justice to the memory of every soldier, particularly those ordinary men who aren’t as easily found in the records. I’m hoping that drawing on the work of other historians and re-linking disparate sources will help provide as much context as possible for their experiences of the First World War.
There are two parts to my CENDARI project ‘Bridging collections with a participatory Commons: a pilot with World War One archives’. The first involves working on the technical, data and cultural context/requirements for the ‘participatory history commons’ as an infrastructure; the second is a demonstrator based on that infrastructure. I’ll be working out how official records and ‘shoebox archives’ can be mined and indexed to help provide what I’m calling ‘computationally-generated context’ for people researching lives touched by World War One.
This week I’ve read metadata schema (MODS extended with TEI and a local schema, if you’re interested) and ontology guidelines, attended some lively seminars on Irish history, gotten my head around CENDARI’s work packages and the structure of the British army during WWI. I’ve started a list of nearby local history societies with active research projects to see if I can find some working on WWI history – I’d love to work with people who have sources they want to digitise and generally do more with, and people who are actively doing research on First World War lives. I’ve started to read sample primary materials and collect machine-readable sources so I can test out approaches by manually marking-up and linking different repositories of records. I’m going to spend the rest of the day tidying up my list of outcomes and deliverables and sketching out how all the different aspects of my project fit together. And tonight I’m going to check out some of the events at Discover Research Dublin. Nerd joy!
‘The cooperative archive’?
Finally, I’ve dealt with something I’d put off for ages. ‘Commons’ is one of those tricky words that’s less resonant than it could be, so I looked for a better name than the ‘participatory history commons’. I doodled around words like collation, congeries, cluster, demos, assemblage, sources, commons, active, engaged, participatory, opus, archive, digital, posse, mob, cahoots and phrases like collaborative collections, collaborative history, history cooperative, but eventually settled on ‘cooperative archive’. This appeals because ‘cooperative’ encompasses attitudes or values around working together for a common purpose, and it includes those who share records and those who actively work to enhance and contextualise them. ‘Archive’ suggests primary sources, and can be applied to informal collections of ‘shoebox archives’ and the official holdings of museums, libraries and archives.
What do you think – does ‘cooperative archive’ work for you? Does your first reaction to the name evoke anything like my thoughts above?
Update, October 11: following some market testing on Facebook, it seems ‘collaborative collections’ best describes my vision.
Enriching cultural heritage collections through a Participatory Commons platform: a provocation about collaborating with users
Mia Ridge, Open University Contact me: @mia_out or http://miaridge.com/
[I was invited to Copenhagen to talk about my research on crowdsourcing in cultural heritage at the 3rd international Sharing is Caring seminar on April 1. I’m sharing my notes in advance to make life easier for those awesome people following along in a second or third language, particularly since I’m delivering my talk via video.]
Today I’d like to present both a proposal for something called the ‘Participatory Commons’, and a provocation (or conversation starter): there’s a paradox in our hopes for deeper audience engagement through crowdsourcing: projects that don’t grow with their participants will lose them as they develop new skills and interests and move on. This talk presents some options for dealing with this paradox and suggests a Participatory Commons provides a way to take a sector-wide view of active engagement with heritage content and redefine our sense of what it means when everybody wins.
I’d love to hear your thoughts about this – I’ll be following the hashtag during the session and my contact details are above.
Before diving in, I wanted to reflect on some lessons from my work in museums on public engagement and participation.
My philosophy for crowdsourcing in cultural heritage (aka what I’ve learnt from making crowdsourcing games)
One thing I learnt over the past years: museums can be intimidating places. When we ask for help with things like tagging or describing our collections, people want to help but they worry about getting it wrong and looking stupid or about harming the museum.
The best technology in the world won’t solve a single problem unless it’s empathically designed and accompanied by social solutions. This isn’t a talk about technology, it’s a talk about people – what they want, what they’re afraid of, how we can overcome all that to collaborate and work together.
Dora’s Lost Data
So a few years ago I explored the potential of crowdsourcing games to make helping a museum less scary and more fun. In this game, ‘Dora’s Lost Data’, players meet a junior curator who asks them to tag objects so they’ll be findable in Google. Games aren’t the answer to everything, but identifying barriers to participation is always important. You have to understand your audiences – their motivations for starting and continuing to participate; the fears, anxieties, uncertainties that prevent them participating. [My games were hacked together outside of work hours, more information is available at My MSc dissertation: crowdsourcing games for museums; if you’d like to see properly polished metadata games check out Tiltfactor’s http://www.metadatagames.org/#games]
Mutual wins – everybody’s happy
My definition of crowdsourcing: cultural heritage crowdsourcing projects ask the public to undertake tasks that cannot be done automatically, in an environment where the activities, goals (or both) provide inherent rewards for participation, and where their participation contributes to a shared, significant goal or research area.
It helps to think of crowdsourcing in cultural heritage as a form of volunteering. Participation has to be rewarding for everyone involved. That sounds simple, but focusing on the audiences’ needs can be difficult when there are so many organisational needs competing for priority and limited resources for polishing the user experience. Further, as many projects discover, participant needs change over time…
What is a Participatory Commons and why would we want one?
First, I have to introduce you to some people. These are composite stories (personas) based on my research…
Two archival historians, Simone and Andre. Simone travels to archives in her semester breaks to stock up on research material, taking photos of most documents ‘in case they’re useful later’, transcribing key text from others. Andre is often at the next table, also looking for material for his research. The documents he collected for his last research project would be useful for Simone’s current book but they’ve never met and he has no way of sharing that part of his ‘personal research collection’ with her. Currently, each of these highly skilled researchers takes their cumulative knowledge away with them at the end of the day, leaving no trace of their work in the archive itself. Next…
Two people from a nearby village, Martha and Bob. They joined their local history society when they retired and moved to the village. They’re helping find out what happened to children from the village school’s class of 1898 in the lead-up to and during World War I. They are using census returns and other online documents to add records to a database the society’s secretary set up in Excel. Meanwhile…
A family historian, Daniel. He has a classic ‘shoebox archive’ – a box containing his grandmother Sarah’s letters and diary, describing her travels and everyday life at the turn of the century. He’s transcribing them and wants to put them online to share with his extended family. One day he wants to make a map for his kids that shows all the places their great-grandmother lived and visited. Finally, there’s…
Crowdsourcer Nisha. She has two young kids and works for a local authority. She enjoys playing games like Candy Crush on her mobile, and after the kids have gone to bed she transcribes ship logs on the Old Weather website while watching TV with her husband. She finds it relaxing, feels good about contributing to science and enjoys the glimpses of life at sea. Sites like Old Weather use ‘microtasks’ – tiny, easily accomplished tasks – and crowdsourcing to digitise large amounts of text.
Helping each other?
None of our friends above know it, but they’re all looking at material from roughly the same time and place. Andre and Simone could help each other by sharing the documents they’ve collected over the years. Sarah’s diaries include the names of many children from her village that would help Martha and Bob’s project, and Nisha could help everyone if she transcribed sections of Sarah’s diary.
Connecting everyone’s efforts for the greater good: Participatory Commons
This image shows the two main aspects of the Participatory Commons: the different sources for content, and the activities that people can do with that content.
The Participatory Commons (image: Mia Ridge)
The Participatory Commons is a platform where content from different sources can be aggregated. Access to shared resources underlies the idea of the ‘Commons’, particularly material that is not currently suitable for sites like Europeana, like ‘shoebox archives’ and historians’ personal record collections. So if the ‘Commons’ part refers to shared resources, how is it participatory?
The Participatory Commons interface supports a range of activities, from the types of tasks historians typically do, like assessing and contextualising documents, through activities that specialists or the public can do, like identifying particular people, places, events or things in sources, to typical crowdsourcing tasks like full-text transcription or structured tagging.
By combining the energy of crowdsourcing with the knowledge historians create on a platform that can store or link to primary sources from museums, libraries and archives with ‘shoebox archives’, the Commons could help make our shared heritage more accessible to all. As a platform that makes material about ordinary people available alongside official archives and as an interface for enjoyable, meaningful participation in heritage work, the Commons could be a basis for ‘open source history’, redressing some of the absences in official archives while improving the quality of all records.
As a work in progress, this idea of the Participatory Heritage Commons has two roles: an academic thought experiment to frame my research, and as a provocation for GLAMs (galleries, museums, libraries, archives) to think outside their individual walls. As a vision for ‘open source history’, it’s inspired by community archives, public history, participant digitisation and history from below… This combination of a large underlying repository and more intimate interfaces could be quite powerful. Capturing some of the knowledge generated when scholars access collections would benefit both archives and other researchers.
‘Niche projects’ can be built on a Participatory Commons
As a platform for crowdsourcing, the Participatory Commons provides efficiencies of scale in the backend work for verifying and validating contributions, managing user accounts, forums, etc. But that doesn’t mean that each user would experience the same front-end interface.
Niche projects build on the Participatory Commons (quick and dirty image: Mia Ridge)
My research so far suggests that tightly-focused projects are better able to motivate participants and create a sense of community. These ‘niche’ projects may be related to a particular location, period or topic, or to a particular type of material. The success of the New York Public Library’s What’s on the Menu project, designed around a collection of historic menus, and the British Library’s GeoReferencer project, designed around their historic map collection, both demonstrate the value of defining projects around niche topics.
The best crowdsourcing projects use carefully designed interactions tailored to the specific content, audience and data requirements of a given project. These interactions are usually For example, the Zooniverse body of projects use much of the same underlying software but projects are designed around specific tasks on specific types of material, whether classifying simple galaxy types, plankton or animals on the Serengeti, or transcribing ship logs or military diaries.
The Participatory Commons is not only a collection of content, it also allows ‘niche’ projects to be layered on top, presenting more focused sets of content, and specialist interfaces designed around the content, audience and purpose.
Now I want to set the idea of the Participatory Commons aside for a moment, and return to crowdsourcing in cultural heritage. I’ve been looking for factors in the success or otherwise of crowdsourcing projects, from grassroots, community-led projects to big glamorous institutionally-led sites.
I mentioned that Nisha found transcribing text relaxing. Like many people who start transcribing text, she found herself getting interested in the events, people and places mentioned in the text. Forums or other methods for participants to discuss their questions seem to help keep participants motivated, and they also provide somewhere for a spark of curiosity to grow (as in this forum post). We know that some people on crowdsourcing projects like Old Weather get interested in history, and even start their own research projects.
Crowdsourcing as gateway to further activity
You can see that happening on other crowdsourcing projects too. For example, Herbaria@Home aims to document historical herbarium collections within museums based on photographs of specimen cards. So far participants have documented over 130,000 historic specimens. In the process, some participants also found themselves being interested in the people whose specimens they were documenting.
As a result, the project has expanded to include biographies of the original specimen collectors. It was able to accommodate this new interest through a project wiki, which has a combination of free text and structured data linking records between the transcribed specimen cards and individual biographies.
‘Levels of Engagement’ in citizen science
There’s a consistent enough pattern in science crowdsourcing projects that there’s a model from ‘citizen science’ that outlines different stages participants can move through, from undertaking simple tasks, joining in community discussion, through to ‘working independently on self-identified research projects’.
There’s a tension between GLAMs’ desire to invite people to ‘go deeper’, to find their own research interests, to begin to become citizen historians; and the desire to ask people to help us with tasks set by GLAMs to help their work. Heritage organisations can try to channel that impulse to start research into questions about their own collections, but sometimes it feels like we’re asking people to do our homework for us. The scaffolds put in place to help make tasks easier may start to feel like a constraint.
Who has agency?
If people move beyond simple tasks into more complex tasks that require a greater investment of time and learning, then issues of agency – participants’ ability to make choices about what they’re working on and why – start to become more important. Would Wikipedia have succeeded if it dictated what contributors had to write about? We shouldn’t mistake volunteers for a workforce just because they can be impressively dedicated contributors.
Participatory project models
Turning again to citizen science – this time public participation in science research, we have a model for participatory projects according to the amount of control participants have over the design of the project itself – or to look at it another way, how much authority the organisation has ceded to the crowd. This model contains three categories: ‘contributory’, where the public contributes data to a project designed by the organisation; ‘collaborative’, where the public can help refine project design and analyse data in a project led by the organisation; and ‘co-creative’, where the public can take part in all or nearly all processes, and all parties design the project together.
As you can imagine, truly co-creative projects are rare. It seems cultural organisations find it hard to truly collaborate with members of the public, for many understandable reasons. The level of transparency required, and the investment of time for negotiating mutual interests, goals and capabilities increase as collaboration deepens. Institutional constraints and lack of time to engage in deep dialogue with participants make it difficult to find shared goals that work for all parties. It seems GLAMs sometimes try to take shortcuts and end up making decisions for the group, which means their ‘co-creative’ project is actually more just ‘collaborative’.
When participants start to out-grow the tasks that originally got them hooked, projects face a choice. Some projects are experimenting with setting challenges for participants. Here you see ‘mysteries’ set by the UK’s Museum of Design in Plastics, and by San Francisco Public Library on History Pin. Finding the right match between the challenge set and the object can be difficult without some existing knowledge of the collection, and it can require a lot of on-going time to encourage participants. Putting the mystery under the nose of the person who has the knowledge or skills to solve it is another challenge that projects like this will have to tackle.
Working with existing communities of interest is a good start, but it also takes work to figure out where they hang out online (or in-person) and understand how they prefer to work. GLAMs sometimes fall into the trap of choosing the technology first, or trying something because it’s trendy; it’s better to start with the intersection between your content and the preferences of potential audiences.
But is it wishful thinking to hope that others will be interested in answering the questions GLAMs are asking?
Should projects accept that some people will move on as they develop new interests, and concentrate on recruiting new participants to replace them? Do they try to find more interesting tasks or new responsibilities for participants, such as helping moderate discussions, or checking and validating other people’s work? Or should they find ways for the project to grow as participants’ skill and knowledge increase? It’s important to make these decisions mindfully as the default is otherwise to accept a level of turnover as participants move on.
To return to lessons from citizen science, possible areas for deeper involvement include choosing or defining questions for study, analysing or interpreting data and drawing conclusions, discussing results and asking new questions. However, heritage organisations might have to accept that the questions people want to ask might not involve their collections, and that these citizen historians’ new interests might not leave time for their previous crowdsourcing tasks.
Why is a critical mass of content in a Participatory Commons useful?
And now we return to the Participatory Commons and the question of why a critical mass of content would be useful.
Increasingly, the old divisions between museum, library and archive collections don’t make sense. For most people, content is content, and they don’t understand why a pamphlet about a village fete in 1898 would be described and accessed differently depending on whether it had ended up in a museum, library or archive catalogue.
Basing niche projects on a wider range of content creates opportunities for different types of tasks and levels of responsibility. Projects that provide a variety of tasks and roles can support a range of different levels and types of participant skills, availability, knowledge and experience.
A critical mass of material is also important for the discoverability of heritage content. Even the most sophisticated researcher turns to Google sometimes, and if your content doesn’t come up in the first few results, many researchers will never know it exists. It’s easy to say but less easy to make a reality: the easier it is to find your collections, the more likely it is that researchers will use them.
Commons as party?
More importantly, a critical mass of content in a Commons allows us to re-define ‘winning’. If participation is narrowly defined as belonging to individual GLAMs, when a citizen historian moves on to a project that doesn’t involve your collection then it can seem like you’ve lost a collaborator. But the people who developed a new research interest through a project at one museum might find they end up using records from the archive down the road, and transcribing or enhancing their records during their investigation. If all the institutions in the region shared their records on the Commons or let researchers take and share photos while using their collections, the researcher would have a critical mass of content for their research and hopefully, as a side-effect, their activities would improve links between collections. If the Commons allows GLAMs to take a sector-wide view then someone moving on to a different collection becomes a moment to celebrate, a form of graduation. In our wildest imagination, the Commons could be like a fabulous party where you never know what interesting people and things you’ll discover…
To conclude – by designing platforms that allow people to collect and improve records as they work, we’re helping everybody win.
Thank you! I’m looking forward to hearing your thoughts.
M. Jordan Raddick et al., ‘Citizen Science: Status and Research Directions for the Coming Decade’, in astro2010: The Astronomy and Astrophysics Decadal Survey, vol. 2010, 2009, http://www8.nationalacademies.org/astro2010/DetailFileDisplay.aspx?id=454.
Rick Bonney et al., Public Participation in Scientific Research: Defining the Field and Assessing Its Potential for Informal Science Education. A CAISE Inquiry Group Report (Washington D.C.: Center for Advancement of Informal Science Education (CAISE), July 2009), http://caise.insci.org/uploads/docs/PPSR%20report%20FINAL.pdf.
Bonney et al., Public Participation in Scientific Research: Defining the Field and Assessing Its Potential for Informal Science Education. A CAISE Inquiry Group Report.
These are some of my notes for my invited plenary talk at GLAM-Wiki 2013 (Galleries, Libraries, Archives, Museums & Wikimedia, #GLAMWiki), held at the British Library on April 12-13, 2013. I don’t think I stuck that closely to them on the day, and in the interests of brevity I’ve left out the ‘timeline’ bits (but you can read about some of them in a related MuseumID article, ‘Where next for open cultural data in museums?‘) to focus on the lessons to be learnt from changes so far. There were lots of great talks and discussion at the event; you can view some of the presentations on Wikimedia UK’s YouTube channel.
A (now very) brief history of open cultural data
Firstly, thank you for the invitation to speak… This morning I want to highlight some key moments of change in the history of open cultural data – a history not only of licenses and data, but also of conversations, standards, and collaborations, of moments where things changed… I’ve included key moments from funders, legislative influences and the commercial sector too, as they create the context in which change happens and often have an effect on what’s considered possible. I’ll close by considering some of the lessons learnt.
‘Open cultural data’ is data from cultural institutions that is made available for use in a machine-readable format under an open licence. But each word in open, cultural, data is slightly more complicated so I’ll unpack them a little…
While the degree of openness required to be ‘open’ data can be contentious, at its simplest, ‘open’ refers to content that is available for use outside the institution that created it, whether for school homework projects, academic monographs or mobile phone apps. ‘Open’ may refer to licences that clarify the permissions and restrictions placed on data, or to the use of non-proprietary digital technologies, or ideally, to a combination of both open licences and technologies.
Ideally, open data is freely available for use and redistribution by anyone for any purpose, but in reality there are often restrictions. GLAMs may limit commercial use by licensing content for ‘non-commercial use only’, but as there is no clear definition of ‘non-commercial use’ in Creative Commons licences, some developers may choose not to risk using a dataset with an unclear licence. GLAMs may also release data for commercial use but still require attribution, either to help retain the provenance of the content, to help people find their way to related content or just because they’d like some credit for their work. GLAMs might also release data under custom licences that deal with their specific circumstances, but they are then difficult to integrate with content from other openly-licensed datasets.
Hybrid licensing models are a pragmatic solution for the current environment. They at least allow some use and may contribute to greater use of open cultural data while other issues are being worked out. For example, some institutions in the UK are making lower-resolution images available for re-use under an open licence while reserving high-resolution versions for commercial sales and licensing. Or they may differentiate between scholarly and commercial use, or use more restrictive licences for commercially valuable images and release everything else openly.
I think this type of access is better than nothing, particularly if organisations can learn from the experience and release more data next time. Because these hybrid models are often experimental, their reception is important, and it’s helpful for GLAMs to be able to show they’ve had a positive impact and hopefully helped create relationships with groups like Wikipedia.
Cultural data is data about objects, publications (such as books, pamphlets, posters or musical scores), archival material, etc, created and distributed by museums, libraries, archives and other organisations.
It’s a useful distinction to discuss early with other cultural heritage staff as it’s easy to be talking at cross-purposes: data can refer to different types of content, from metadata or tombstone records (the basic titles, names, dates, places, materials, etc of a catalogue record), to entire collection records (including data such as researched and interpretive descriptions of objects, bibliographic data, related themes and narratives) to full digital surrogates of an object, document or book as images or transcribed text. Some organisations release open metadata, others release all their data including their images. If you can’t do open data (full content or ‘digital surrogates’ like photographs or texts) then at least open up the metadata (data about the content) as e.g. CC0 and the rest with another licence. Releasing data may involve licensing images, offering downloads from catalogue sites; ‘content donations’, APIs and machine-facing interfaces; term lists, etc. Much of the data that isn’t images isn’t immediately interesting, and may be designed for inter-collections interoperability or mashups rather than media commons.
Why is open cultural data important?
Before I go on, why do we care? Open cultural data is the foundation on which many projects can be built. It helps achieve organisational goals, mission; can help increase engagement with content; can create ‘network effect’ with related institutions; can be re-used by people who share your goals around access to knowledge and information – people like Wikipedians.
Some key moments in open cultural data
Events I discussed included the founding of Wikimedia, Europeana and Flickr Commons, previous GLAM-Wiki conferences, changes in licences for art images, library catalogue records and museum content, GLAM APIs and linked data services and the launch of the Digital Public Library of America next week.
Many of the changes are the results of years of conversation and collaboration – change is slow but it does happen. GLAMs work through slow iterations – try something, and if no-one dies, they’ll try something else. We are all ambassadors, and we are all translators, helping each domain understand the other.
Contradictory things GLAMs are told they must do
Give content away for the benefit of all
Monetise assets; protect against loss of potential income; protect against mis-use of collections; conserve collections in perpetuity; protect the IP of artists; demonstrate ROI on digitisation
It’s not easy for GLAMs to release all their data under an entirely open licence, but they don’t do it just to be annoying – it’s important to understand some of the pressures they’re under. For example, GLAMs usually need to be able to track uses of their data and content to show the impact of digitising and publishing content, so they prefer attribution licences.
The issue of potential lost income – imaginary money that could be made one day if circumstances change, or profit that someone else makes off their opened data – is particularly hard to deal with [and here I ad-libbed, saying that it was like worrying about failing to meet the love of your life because you got on a different tube carriage – you can’t live your life chasing ghosts]. Ideally, open data needs to be understood as an input to the creative economy rather than an item on the balance sheet of an individual GLAM.
GLAMs worry about reputational damage, whether appearing on the front page of a tabloid newspaper for the ‘wrong’ reasons, questions being asked in Parliament, or critique from Wikipedians. Over time, their mindset is changing from keeping ‘our data’ to being holders, custodians of our shared heritage.
Conversations, communities, collaborations
Conversations matter… we’re all working towards the same goal, but we have different types of anxieties and different problems we have to address.
GLAMs are about collections, knowledge, and audiences. Unlike most online work, they are used to seeing the excitement people experience walking through their door – help GLAMs understand what Wikipedians can do for different audiences by making those audiences real to them. GLAMs are also used to being wined and dined before you lay the hard word on them. Just because you don’t need to ask for permission to use content doesn’t mean you shouldn’t start a conversation with an organisation. There are lots of people with similar goals inside organisations, so try to find them and work with them. Trust is a currency, don’t blow it!
Being truly collaborative sometimes means compromising (or picking your battles) and it definitely means practising empathy. Open data people could stop talking about open data as something you *do* to GLAMs, and GLAMs could stop thinking open data people just want to make their lives difficult.
The role of higher powers
Government attitudes to open data make a big difference and they can also change the risks associated with publishing orphan works. Governments can also help GLAMs open up their content by indemnifying them against the chance that someone else will monetise their data – consider it not a failure of the GLAM but a contribution to the creative and digital economy.
Things that are better than a poke in the eye with a sharp stick
Kittens (and puppies)
Cultural data that’s available online but isn’t (yet) openly licensed
Cultural data online that is licensed for non-commercial use
Yes, the last two aren’t ideal, but they are a great deal better than nothing.
Into the future…
GLAMs and Wikipedians may move at different paces, and may have different priorities and different ways of viewing the world, but we’re all working towards the same goals. Not everything is open, but a lot more is open than it used to be. I sensed yesterday [the first day of the conference] that there are still some tensions between Wikimedians and GLAMers, moments when we need to take a deep breath and put empathy before a pithy put-down, but I loved that Kat Walsh’s welcome yesterday described how Wikipedia used to focus on how it was different from others but now focuses on reaching out to others and figuring out how we’re the same.
GLAMs and Wikipedians have already used open cultural data to make the world a better place. Let’s celebrate the progress we’ve made and keep working on that…
Congratulations to everyone who helped make it a great event, but particularly to Daria Cybulska and Andrew Gray (@generalising) for making everything work so smoothly, and Liam Wyatt (@wittylama) for the original invitation to speak.
I’ve called this post ‘Reflections on teaching Neatline’ but I could also have called it ‘when new digital humanists meet new software’. Or perhaps even ‘growing pains in the digital humanities?’.
A few months ago, Anouk Lang at the University of Strathclyde asked me to lead a workshop on Neatline, software from the Scholar’s Lab that plots ‘archives, objects, and concepts in space and time’. It’s a really exciting project, designed especially for humanists – the interfaces and processes are designed to express complexity and nuance through handcrafted exhibits that link historical materials, maps and timelines.
The workshop was on Thursday, and looking at the evaluation forms, most people found it useful but a few really struggled and teaching it was also slightly tough going. I’ve been thinking a lot about the possible reasons for that and I’m sharing them both as a request for others to share their experiences in similar circumstances and also in the hope that they’ll help others.
The basic outline of the workshop was an intros round (who I am, who they are and what they want to learn); information on what Neatline is and what it can do; time to explore Neatline and explore what the software can and can’t do (e.g. login, follow the steps at neatline.org/plugins/neatline to create an item based on a series of correspondence Anouk had been working on, deciding whether you want to transcribe or describe the letter, tweaking its appearance or linking it to other items); and a short period for reflection and discussion (e.g. ‘What kinds of interpretive decisions did you find yourself making? What delighted you? What frustrated you?’) to finish. If you’re curious, you can follow along with my slides and notes or try out the Neatline sandbox site.
The first half was fine but some people really struggled with the hands-on section. Some of it was to do with the software itself – as a workshop, it was a brilliant usability test of the admin interfaces of the software for audiences outside the original set of users. Neatline was only launched in July this year and isn’t even in version 2 yet so it’s entirely understandable that it appears to have a few functional or UX bugs. The documentation isn’t integrated into the interface yet (and sometimes lacks information that is probably part of the shared tacit knowledge of people working on the project) but they have a very comprehensive page about working with Neatline items. Overall, the process of handcrafting timelines and maps for a Neatline exhibit is still closer to ‘first, catch your rabbit‘ than making a batch of ready-mix cupcakes. Neatline is also designed for a particular view of the world, and as it’s built on top of other software (Omeka) with another very particular view of the world (and hello, Dublin Core), there’s a strong underlying mental model that informs the processes for creating content that is foreign to many of its potential users, including some at the workshop.
But it was also partly because I set the bar too high for the exercises and didn’t provide enough structure for some of the group. If I’d designed it so they created a simple Neatline item by closely following detailed instructions (as I have done for other, more consciously tech-for-beginners workshops), at least everyone would have achieved a nice quick win and have something they could admire on the screen. From there some could have tried customising the appearance of their items in small ways, and the more adventurous could have tried a few of the potential ways to present the sample correspondence they were working with to explore the effects of their digitisation decisions. An even more pragmatic but potentially divisive solution might have been to start with the background and demonstration as I did, but then do the hands-on activity with a smaller group of people who were up for exploring uncharted waters. On a purely practical level, I also should have uploaded the images of the letters used in the exercise to my own host so that they didn’t have to faff with Dropbox and Omeka records to get an online version of the image to use in Neatline.
And finally it was also because the group had really mixed ICT skills. Most were fine (bar the occasional bug), but some were not. It’s always hard teaching technical subjects when participants have varying levels of skill and aptitude, but when does it go beyond aptitude into your attitude about being pushed out of your comfort zone? I’d warned everyone at the start that it was new software, but if you haven’t experienced beta software before I guess you don’t have the context for understanding what that actually means.
I should make it clear here that I think the participants’ achievements outshine any shortcomings – Neatline is a great tool for people working with messy humanities data who want to go beyond plonking markers on Google Maps, and I think everyone got that, and most people enjoyed the chance to play with Neatline.
But more generally, I also wonder if it has to do with changing demographics in the digital humanities – increasingly, not everyone interested in DH is an early, or even a late adopter, and someone interested in DH for the funding possibilities and cool factor might not naturally enjoy unstructured exploration of new software, or be intrigued by trying out different combinations of content and functionality just ‘to see what happens’.
Practically, more information for people thinking of attending would be useful – ‘if you know x already, you’ll be fine; if you know y already, you’ll be bored’ would be useful in future. Describing an event as ‘if you like trying new software, this is for you’ would probably help, but it looks like the digital humanities might also now be attracting people who don’t particularly like working things out as they go along – are they to be excluded? If using software like this is the onboarding experience for people new to the digital humanities, they’re not getting the best first impression, but how do you balance the need for fast-moving innovative work-in-progress to be a bit hacky and untidy around the edges with the desires of a wider group of digital humanities-curious scholars? Is it ok to say ‘here be dragons, enter at your own risk’?
The short version: if you’ve got ideas on how museums, libraries and archives (i.e. GLAM) and the digital humanities can inspire and learn from each other, it’s your lucky day! Go add your ideas about concrete actions the Association for Computers and the Humanities can take to bring the two communities together or suggestions for a top ten ‘get started in museums and the digital humanities’ list (whether conference papers, journal articles, blogs or blog posts, videos, etc) to: ‘GLAM and Digital Humanities together FTW‘.
Update, August 23, 2012: the document is shaping up to be largely about ‘what can be done’ – which issues are shared by GLAMs and DH, how can we reach people in each field, what kinds of activities and conversations would be beneficial, how do we explain the core concepts and benefits of each field to the other? This suggests there’d be a useful second stage in focusing on filling in the detail around each of the issues and ideas raised in this initial creative phase. In the meantime, keep adding suggestions and sharing issues at the intersection of digital humanities and memory institutions.
A note on nomenclature: the genesis of this particular conversation was among museumy people so the original title of the document reflects that; it also reflects the desire to be practical and start with a field we knew well. The acronym GLAM (galleries, libraries, archives and museums) neatly covers the field of cultural heritage and the arts, but I’m never quite sure how effective it is as a recognisable call-to-action. There’s also a lot we could learn from the field of public history, so if that’s you, consider yourself invited to the party!
The longer version: in an earlier post from July’s Digital Humanities conference in Hamburg I mentioned that a conversation over Twitter about museums and digital humanities led to a lunch with @ericdmj, @clairey_ross, @briancroxall, @amyeetx where we discussed simple ways to help digital humanists get a sense of what can be learnt from museums on topics like digital projects, audience outreach, education and public participation. It turns out the Digital Humanities community is also interested in working more closely with museums, as demonstrated by the votes for point 3 of the Association for Computers and the Humanities (ACH)’s ‘Next Steps’ document, “to explore relationships w/ DH-sympathetic orgs operating beyond the academy (Museum Computer Network, Nat’l Council on Public History, etc)”. At the request of ACH’s Bethany Nowviskie (@nowviskie) and Stéfan Sinclair (@sgsinclair), Eric D. M. Johnson and I had been tossing around some ideas for concrete next steps and working up to asking people working at the intersection of GLAM and DH for their input.
However, last night a conversation on twitter about DH and museums (prompted by Miriam Posner‘s tweet asking for input on a post ‘What are some challenges to doing DH in the library?‘) suddenly took off so I seized the moment by throwing the outline of the document Eric and I had been tinkering with onto Google docs. It was getting late in the UK so I tweeted the link and left it so anyone could edit it. I came back the next morning to find lots of useful and interesting comments and additions and a whole list of people who are interested in continuing the conversation. Even better, people have continued to add to it today and it’s already a good resource. If you weren’t online at that particular time it’s easy to miss it, so this post is partly to act as a more findable marker for the conversation about museums, libraries, archives and the digital humanities.
Explaining the digital humanities to GLAMs
This definition was added to the document overnight. If you’re a GLAM person, does it resonate with you or does it need tweaking?
“The broadest definition would be 1) using digital technologies to answer humanities research questions, 2) studying born digital objects as a humanist would have studied physical objects, and or 3) using digital tools to transform what scholarship is by making it more accessible on the open web.”
How can you get involved?
Off the top of my head…
Add your name to the list of people interested in keeping up with the conversation
Read through the suggestions already posted; if you love an idea that’s already there, say so!
Read and share the links already added to the document
Suggest specific events where GLAM and DH people can mingle and share ideas/presentations
Suggest specific events where a small travel bursary might help get conversations started
Offer to present on GLAMs and DH at an event
Add examples of digital projects that bridge the various worlds
Add examples of issues that bridge the various worlds
Write case studies that address some of the issues shared by GLAMs and DH
Spread the word via specialist mailing lists or personal contacts
Share links to conference papers, journal articles, videos, podcasts, books, blog posts, etc, that summarise some of the best ideas in ways that will resonate with other fields
Consider attending or starting something like Decoding Digital Humanities to discuss issues in DH. (If you’re in or near Oxford and want to help me get one started, let me know!)
Something else I haven’t thought of…
I’m super-excited about this because everyone wins when we have better links between museums and digital humanities. Personally, I’ve spent a decade working in various museums (and their associated libraries and archives) and my PhD is in Digital Humanities (or more realistically, Digital History), and my inner geek itches to find an efficient solution when I see each field asking some of the same questions, or asking questions the other field has been working to answer for a while. This conversation has already started to help me discover useful synergies between GLAMs and DH, so I hope it helps you too.
Over time I’ve noticed the repetition of various misconceptions and apprehensions about crowdsourcing for cultural heritage and digital history, so since this is a large part of my PhD topic I thought I’d collect various resources together as I work to answer some FAQs. I’ll update this post over time in response to changes in the field, my research and comments from readers. While this is partly based on some writing for my PhD, I’ve tried not to be too academic and where possible I’ve gone for publicly accessible sources like blog posts rather than send you to a journal paywall.
[Last updated: February 2016, to address ‘crowdsourcing steals jobs’. Previous updates added a link to CCLA events, crowdsourcing projects to explore and a post on machine learning+crowdsourcing.]
What is crowdsourcing?
Definitions are tricky. Even Jeff Howe, the author of ‘Crowdsourcing’, has two definitions:
The White Paper Version: Crowdsourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.
The Soundbyte Version: The application of Open Source principles to fields outside of software.
For many reasons, the term ‘crowdsourcing’ isn’t appropriate for many cultural heritage projects but the term is such neat shorthand that it’ll stick until something better comes along. Trevor Owens (@tjowens) has neatly problematised this in The Crowd and The Library:
‘Many of the projects that end up falling under the heading of crowdsourcing in libraries, archives and museums have not involved large and massive crowds and they have very little to do with outsourcing labor. … They are about inviting participation from interested and engaged members of the public [and] continue a long standing tradition of volunteerism and involvement of citizens in the creation and continued development of public goods’
Defining crowdsourcing in cultural heritage
To summarise my own thinking and the related literature, I’d define crowdsourcing in cultural heritage as an emerging form of engagement with cultural heritage that contributes towards a shared, significant goal or research area by asking the public to undertake tasks that cannot be done automatically, in an environment where the tasks, goals (or both) provide inherent rewards for participation.
Who is ‘the crowd’?
Good question! One tension underlying the ‘openness’ of the call to participate in cultural heritage is the fact that there’s often a difference between the theoretical reach of a project (i.e. everybody) and the practical reach, the subset of ‘everybody’ with access to the materials needed (like a computer and an internet connection), the skills, experience and time… While ‘the crowd’ may carry connotations of ‘the mob’, in ‘Digital Curiosities: Resource Creation Via Amateur Digitisation‘, Melissa Terras (@melissaterras) points out that many ‘amateur’ content creators are ‘extremely self motivated, enthusiastic, and dedicated’ and test the boundaries ‘between definitions of amateur and professional, work and hobby, independent and institutional’ and quotes Leadbeater and Miller’s ‘The Pro-Am Revolution‘ on people who pursue an activity ‘as an amateur, mainly for the love of it, but sets a professional standard’.
There’s more and more talk of ‘community-sourcing’ in cultural heritage, and it’s a useful distinction but it also masks the fact that nearly all crowdsourcing projects in cultural heritage involve a community rather than a crowd, whether they’re the traditional ‘enthusiasts’ or ‘volunteers’, citizen historians, engaged audiences, whatever. That said, Amy Sample Ward has a diagram that’s quite useful for planning how to work with different groups. It puts the ‘crowd’ (people you don’t know), ‘network’ (the community of your community) and ‘community’ (people with a relationship to your organisation) in different rings based on their closeness to you.
‘The crowd’ is differentiated not just by their relationship to your organisation, or by their skills and abilities, but their motivation for participating is also important – some people participate in crowdsourcing projects for altruistic reasons, others because doing so furthers their own goals.
I’m worried about crowdsourcing because…
…isn’t letting the public in like that just asking for trouble?
@lottebelice said she’d heard people worry that ‘people are highly likely to troll and put in bad data/content/etc on purpose’ – but this rarely happens. People worried about this with user-generated content, too, and while kids in galleries delight in leaving rude messages about each other, it’s rare online.
It’s much more likely that people will mistakenly add bad data, but a good crowdsourcing project should build any necessary data validation into the project. Besides, there are generally much more interesting places to troll than a cultural heritage site.
And as Matt Popke pointed out in a comment, ‘When you have thousands of people contributing to an entry you have that many more pairs of eyes watching it. It’s like having several hundred editors and fact-checkers. Not all of them are experts, but not all of them have to be. The crowd is effectively self-policing because when someone trolls an entry, somebody else is sure to notice it, and they’re just as likely to fix it or report the issue’. If you’re really worried about this, an earlier post on ‘Designing for participatory projects: emergent best practice’ has some other tips.
…doesn’t crowdsourcing take advantage of people?
Sadly, yes, some of the activities that are labelled ‘crowdsourcing’ do. Design competitions that expect lots of people to produce full designs and pay a pittance (if anything) to the winner are rightly hated. (See antispec.com for more and a good list of links).
But in cultural heritage, no. Museums, galleries, libraries, archives and academic projects are in the fortunate position of having interesting work that involves an element of social good, and they also have hugely varied work, from microtasks to co-curated research projects. Crowdsourcing is part of a long tradition of volunteering and altruistic participation, and to quote Owens again, ‘Crowdsourcing is a concept that was invented and defined in the business world and it is important that we recast it and think through what changes when we bring it into cultural heritage.’
[Update, May 2013: it turns out museums aren’t immune from the dangers of design competitions and spec work: I’ve written On the trickiness of crowdsourcing competitions to draw some lessons from the Sydney Design competition kerfuffle.]
“when you treat a crowd as disposable and anonymous, you prevent them from achieving their maximum ability. Disposable crowds create disposable output. Simply put: crowds need a sense of identity and community to achieve their potential.”
…crowdsourcing can’t be used for academic work
Reasons given include ‘humanists don’t like to share their knowledge’ with just anyone. And it’s possible that they don’t, but as projects like Transcribe Bentham and Trove show, academics and other researchers will share the work that helps produce that knowledge. (This is also something I’m examining in my PhD. I’ll post some early findings after the Digital Humanities 2012 conference in July).
Looking beyond transcription and other forms of digitisation, it’s worth checking out Prism, ‘a digital tool for generating crowd-sourced interpretations of texts’.
…it steals jobs
Once upon a time, people starting a career in academia or cultural heritage could get jobs as digitisation assistants, or they could work on a scholarly edition. Sadly, that’s not the case now, but that’s probably more to do with year upon year of funding cuts. Blame the bankers, not the crowdsourcers.
The good news? Crowdsourcing projects can create jobs – participatory projects need someone to act as community liaison, to write the updates that demonstrate the impact of crowdsourced contributions, to explain the research value of the project, to help people integrate it into teaching, to organise challenges and editathons and more.
So what’s the difference between crowdsourcing and user-generated content? The lines are blurry, but crowdsourcing is inherently productive – the point is to get a job done, whether that’s identifying people or things, creating content or digitising material.
Conversely, the value of user-generated content lies in the act of creating it rather than in the content itself – for example, museums might value the engagement in a visitor thinking about a subject or object and forming a response to it in order to comment on it. Once posted it might be displayed as a comment or counted as a statistic somewhere but usually that’s as far as it goes.
And as @sherah1918 pointed out, there’s a difference between asking for assistance with tasks and asking for feedback or comments: ‘A comment book or a blog w/comments isn’t crowdsourcing to me … nor is asking ppl to share a story on a web form. That is a diff appr to collecting & saving personal histories, oral histories’.
Crowdfunding (it’s often just asking for micro-donations, though it seems that successful crowdfunding projects have a significant public engagement component, which brings them closer to the concerns of cultural heritage organisations. It’s also not that new. See Seventeenth-century crowd funding for one example.)
Data-mining social media and other content (though I’ve heard this called ‘passive’ or ‘implicit’ crowdsourcing)
General calls for content, help or participation (see ‘user-generated content’) or vaguely asking people what they think about an idea. Asking for feedback is not crowdsourcing. Asking for help with your homework isn’t crowdsourcing, as it only benefits you.
Buzzwords applied to marketing online. And as @emmclean said, “I think many (esp mkting) see “crowdsourcing” as they do “viral” – just happens if you throw money at it. NO!!! Must be great idea” – it must make sense as a crowdsourced task.
Ok, so what’s different about crowdsourcing in cultural heritage?
‘The process of crowdsourcing projects fulfills the mission of digital collections better than the resulting searches… Far better than being an instrument for generating data that we can use to get our collections more used it is actually the single greatest advancement in getting people using and interacting with our collections. … At its best, crowdsourcing is not about getting someone to do work for you, it is about offering your users the opportunity to participate in public memory … it is about providing meaningful ways for the public to enhance collections while more deeply engaging and exploring them’.
[This was written in 2012. I’ve kept it for historical reasons but think differently now.]
First, another definition. As Fiona Romeo writes, ‘Citizen science projects use the time, abilities and energies of a distributed community of amateurs to analyse scientific data. In doing so, such projects further both science itself and the public understanding of science’. As Romeo points out in a different post, ‘All citizen science projects start with well-defined tasks that answer a real research question’, while citizen history projects rarely if ever seem to be based around specific research questions but are aimed more generally at providing data for exploration. Process vs product?
I’m still thinking through the differences between citizen science and citizen history, particularly where they meet in historical projects like Old Weather. Both citizen science and citizen history achieve some sort of engagement with the mindset and work of the equivalent professional occupations, but are the traditional differences between scientific and humanistic enquiry apparent in crowdsourcing projects? Are tools developed for citizen science suitable for citizen history? Does it make a difference that it’s easier to take a new interest in history further without a big investment in learning and access to equipment?
I have a feeling that ‘citizen science’ projects are often more focused on producing data as accurately and efficiently as possible, while ‘citizen history’ projects end up being as much about engaging people with the content as they are about content production. But I’m very open to challenges on this…
What kind of cultural heritage stuff can be crowdsourced?
I wrote this list of ‘Activity types and data generated’ over a year ago for my Masters dissertation on crowdsourcing games for museums and a subsequent paper for Museums and the Web 2011, Playing with Difficult Objects – Game Designs to Improve Museum Collections (which also lists validation types and requirements). This version should be read in the light of discussion about the difference between crowdsourcing and user-generated content and in the context of things people can do with museums and with games, but it’ll do for now:
Tagging (e.g. steve.museum, Brooklyn Museum Tag! You’re It; variations include two-player ‘tag agreement’ games like Waisda?, extensions such as guessing games e.g. GWAP ESP Game, Verbosity, Tiltfactor Guess What?; structured tagging/categorisation e.g. GWAP Verbosity, Tiltfactor Cattegory)
Tags; folksonomies; multilingual term equivalents; structured tags (e.g. ‘looks like’, ‘is used for’, ‘is a type of’).
Debunking (e.g. flagging content for review and/or researching and providing corrections).
Linking (e.g. linking objects with other objects, objects to subject authorities, objects to related media or websites; e.g. MMG Donald).
Relationship data; contextualising detail; information on history, workings and use of objects; illustrative examples.
Stating preferences (e.g. choosing between two objects e.g. GWAP Matchin; voting on or ‘liking’ content).
Preference data; subsets of ‘highlight’ objects; ‘interestingness’ values for content or objects for different audiences. May also provide information on reason for choice.
Categorising (e.g. applying structured labels to a group of objects, collecting sets of objects or guessing the label for or relationship between a presented set of objects).
Relationship data; preference data; insight into audience mental models; group labels.
Creative responses (e.g. writing an interesting fake history for a known object or a purpose for a mystery object).
Relevance; interestingness; ability to act as social object; insight into common misconceptions.
You can also divide crowdsourcing projects into ‘macro’ and ‘micro’ tasks – giving people a goal and letting them solve it as they prefer, vs small, well-defined pieces of work – as in the ‘Umbrella of Crowdsourcing’ at The Daily Crowdsource. There’s a fair bit of academic literature on other ways of categorising and describing crowdsourcing.
Using crowdsourcing to manage crowdsourcing
There’s also a growing body of literature on ecosystems of crowdsourcing activities, where different tasks and platforms target different stages of the process. A great example is Brooklyn Museum’s ‘Freeze Tag!’, a game that cleans up data added in their tagging game. An ecosystem of linked activities (or games) can maximise the benefits of a diverse audience by providing a range of activities designed for different types of participant skills, knowledge, experience and motivations, and can encompass different levels of participation, from liking to tagging to finding facts and links.
A participatory ecosystem can also resolve some of the difficulties around validating specialist tags or long-form, more subjective content by circulating content between activities so that other players can validate it and rank it for correctness, ‘interestingness’ and so on (see for example the ‘Contributed data lifecycle’ diagram in my MW2011 paper or the ‘Digital Content Life Cycle’ for crowdsourcing in Oomen and Aroyo’s paper below). As Nina Simon said in The Participatory Museum, ‘By making it easy to create content but impossible to sort or prioritize it, many cultural institutions end up with what they fear most: a jumbled mass of low-quality content’. Crowdsourcing the improvement of cultural heritage data would also make possible non-crowdsourcing engagement projects that need better content to be viable.
Platforms aimed at bootstrapping projects – that is, getting new projects up and running as quickly and as painlessly as possible – seem to be the next big thing. Designing tasks and interfaces suitable for mobile and tablets will allow even more of us to help out while killing time. There’s also a lot of work on the integration of machine learning and human computation; my post ‘Helping us fly? Machine learning and crowdsourcing’ has more on this.
Find out how crowdsourcing in cultural heritage works by exploring projects
There’s a lot of academic literature on all kinds of aspects of crowdsourcing, but I’ve gone for sources that are accessible both intellectually and in terms of licensing. If a key reference isn’t there, it might be because I can’t find a pre-print or whatever outside a paywall – let me know if you know of one!
Thanks to everyone who responded to my call for their favourite ‘misconceptions and apprehensions about crowdsourcing (esp in history and cultural heritage)’, and to those who inspired this post in the first place by asking questions in various places about the negative side of crowdsourcing. I’ll update the post as I hear of more, so let me know your favourites. I’ll also keep adding links and resources as I hear of them.