Helping us fly? Machine learning and crowdsourcing

Image of a man in a flying contraption powered by birds
Moon Machine by Bernard Brussel-Smith via Serendip-o-matic

Over the past few years we've seen an increasing number of projects that take the phrase 'human-computer interaction' literally (perhaps turning 'HCI' into human-computer integration), organising tasks done by people and by computers into a unified system. One of the most obvious benefits of crowdsourcing on digital platforms has been the ability to coordinate the distribution and validation of tasks. Increasingly, data manually classified through crowdsourcing is being fed into computers to improve machine learning so that computers can learn to recognise images or words almost as well as we do. I've outlined a few projects putting this approach to work below.

This creates new challenges for the future: if fun, easy tasks like image tagging and text transcription can be done by computers, what are the implications for cultural heritage and digital humanities crowdsourcing projects that used simple tasks as the first step in public engagement? After all, Fast Company reported that 'at least one Zooniverse project, Galaxy Zoo Supernova, has already automated itself out of existence'. What impact will this have on citizen science and history communities? How might machine learning free us to fly further, taking on more interesting tasks with cultural heritage collections?

The Public Catalogue Foundation has taken tags created through Your Paintings Tagger and achieved impressive results in the art of computer image recognition: 'Using the 3.5 million or so tags provided by taggers, the research team at Oxford 'educated' image-recognition software to recognise the top tagged terms'. All paintings tagged with a particular subject (e.g. 'horse') were fed into feature extraction processes to build an 'object model' of a horse (a set of characteristics that would indicate that a horse is depicted), then tested to see whether the system could correctly tag horses.
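As a rough illustration of the first step described above (turning crowd tags into examples a system can learn from, then holding some back for testing), here is a minimal sketch. It is not the Oxford team's pipeline: the painting IDs, the `min_votes` threshold and the function names are all assumptions made for the example, and the feature extraction and model training themselves are left out.

```python
import random
from collections import Counter


def build_label_sets(tags, term, min_votes=2):
    """Split painting IDs into positive and negative examples for one term.

    tags: iterable of (painting_id, tag) pairs, one per tagger contribution.
    A painting is a positive example only if at least `min_votes` taggers
    used the term (an assumed threshold to filter out stray tags). Paintings
    with some votes, but fewer than `min_votes`, are left out as ambiguous.
    """
    votes = Counter()
    paintings = set()
    for painting_id, tag in tags:
        paintings.add(painting_id)
        if tag == term:
            votes[painting_id] += 1
    positives = {p for p in paintings if votes[p] >= min_votes}
    negatives = {p for p in paintings if votes[p] == 0}
    return positives, negatives


def train_test_split(ids, test_fraction=0.25, seed=0):
    """Shuffle IDs reproducibly and hold back a fraction for testing."""
    ordered = sorted(ids)
    random.Random(seed).shuffle(ordered)
    cut = int(len(ordered) * (1 - test_fraction))
    return ordered[:cut], ordered[cut:]


# Hypothetical tag data: (painting_id, tag)
example_tags = [
    ("p1", "horse"), ("p1", "horse"), ("p1", "field"),
    ("p2", "horse"),
    ("p3", "dog"),
]
horse_yes, horse_no = build_label_sets(example_tags, "horse")
```

The held-back IDs from `train_test_split` would be the ones used to check whether the trained system tags horses correctly.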

The BBC World Service archive used an 'open-source speech recognition toolkit to listen to every programme and convert it to text' and keywords, then asked people to check the correctness of the data created (Algorithms and Crowd-Sourcing for Digital Archives; see also What we learnt by crowdsourcing the World Service archive).
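One way to picture the 'machine proposes, people check' pattern is a simple tally of votes on each machine-generated keyword. The sketch below is only an assumption about how such checking could work, not a description of the BBC's system: the thresholds, the status labels and the function name are hypothetical.

```python
from collections import defaultdict


def review_keywords(votes, min_votes=3, accept_ratio=0.7):
    """Decide the status of machine-generated keywords from crowd votes.

    votes: iterable of (programme_id, keyword, is_correct) tuples, where
    is_correct is True if a listener agreed with the keyword.
    Returns a dict mapping (programme_id, keyword) to a status string.
    """
    tally = defaultdict(lambda: [0, 0])  # key -> [agreeing votes, total votes]
    for programme_id, keyword, is_correct in votes:
        counts = tally[(programme_id, keyword)]
        if is_correct:
            counts[0] += 1
        counts[1] += 1

    decisions = {}
    for key, (agree, total) in tally.items():
        if total < min_votes:
            decisions[key] = "needs more votes"
        elif agree / total >= accept_ratio:
            decisions[key] = "accepted"
        elif agree / total <= 1 - accept_ratio:
            decisions[key] = "rejected"
        else:
            decisions[key] = "disputed"
    return decisions
```

Keywords that end up 'accepted' or 'rejected' could in principle be fed back as training data, while 'disputed' ones are exactly the cases where human judgement is still needed.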

The CUbRIK project combines 'machine, human and social computation for multimedia search' in their technical demonstrator, HistoGraph. The SOCIAM: The Theory and Practice of Social Machines project is looking at 'a new kind of emergent, collective problem solving', including 'citizen science social machines'.

And of course the Zooniverse is working on this, most recently with Galaxy Zoo. A paper summarised on their Milky Way project blog outlines the powerful synergy between citizen scientists, professional scientists, and machine learning: 'citizens can identify patterns that machines cannot detect without training, machine learning algorithms can use citizen science projects as input training sets, creating amazing new opportunities to speed-up the pace of discovery', addressing the weakness of each approach if deployed alone.

Further reading: an early discussion of human input into machine learning is in Quinn and Bederson's 2011 Human Computation: A Survey and Taxonomy of a Growing Field. You can get a sense of the state of the field from various conference papers, including ICML ’13 Workshop: Machine Learning Meets Crowdsourcing and ICML ’14 Workshop: Crowdsourcing and Human Computing. There's also a mega-list of academic crowdsourcing conferences and workshops, though it doesn't include much on the tiny corner of the world that is crowdsourcing in cultural heritage.

Last update: March 2015. This post collects my thoughts on machine learning and human-computer integration as I finish my thesis. Do you know of examples I've missed, or implications we should consider?

Piloting a Participatory History Commons

I've been awarded a CENDARI Visiting Research Fellowship at Trinity College Dublin for a project called 'Bridging collections with a participatory Commons: a pilot with World War One archives'. I've posted my proposal at the link above, and when I start in September I'll post about my progress here. CENDARI have now published the list of all 2014 Fellows and a neat summary of the programme: 'The CENDARI Visiting Research Fellowships are intended to support and stimulate historical research in the two pilot areas of medieval European culture and the First World War, by facilitating access to key archives, specialist knowledge and collections in CENDARI host institutions'.

As I said in my post, 'it's an ambitious project which requires tackling community building, user experience design, historical materials and programming, and I'll be drawing on the expertise of many people'. I'll post as I go – but first, I'd best get back to finishing up my PhD thesis!

In the meantime, here's a small collection of things I've written as I think through what a participatory commons is and how it might work: my poster and talk notes for Herrenhausen conference and my keynote for Sharing is Caring, 'Enriching cultural heritage collections through a Participatory Commons platform: a provocation about collaborating with users'.

Sharing is caring keynote 'Enriching cultural heritage collections through a Participatory Commons'

Enriching cultural heritage collections through a Participatory Commons platform: a provocation about collaborating with users

Mia Ridge, Open University. Contact me: @mia_out or https://miaridge.com/

[I was invited to Copenhagen to talk about my research on crowdsourcing in cultural heritage at the 3rd international Sharing is Caring seminar on April 1. I'm sharing my notes in advance to make life easier for those awesome people following along in a second or third language, particularly since I'm delivering my talk via video.]

Today I'd like to present both a proposal for something called the 'Participatory Commons', and a provocation (or conversation starter): there's a paradox in our hopes for deeper audience engagement through crowdsourcing: projects that don't grow with their participants will lose them as they develop new skills and interests and move on. This talk presents some options for dealing with this paradox and suggests a Participatory Commons provides a way to take a sector-wide view of active engagement with heritage content and redefine our sense of what it means when everybody wins.

I'd love to hear your thoughts about this – I'll be following the hashtag during the session and my contact details are above.

Before diving in, I wanted to reflect on some lessons from my work in museums on public engagement and participation.

My philosophy for crowdsourcing in cultural heritage (aka what I've learnt from making crowdsourcing games)

One thing I learnt over the past years: museums can be intimidating places. When we ask for help with things like tagging or describing our collections, people want to help but they worry about getting it wrong and looking stupid or about harming the museum.

The best technology in the world won't solve a single problem unless it's empathically designed and accompanied by social solutions. This isn't a talk about technology, it's a talk about people – what they want, what they're afraid of, how we can overcome all that to collaborate and work together.

Dora's Lost Data

So a few years ago I explored the potential of crowdsourcing games to make helping a museum less scary and more fun. In this game, 'Dora's Lost Data', players meet a junior curator who asks them to tag objects so they'll be findable in Google. Games aren't the answer to everything, but identifying barriers to participation is always important. You have to understand your audiences – their motivations for starting and continuing to participate; the fears, anxieties, uncertainties that prevent them participating. [My games were hacked together outside of work hours, more information is available at My MSc dissertation: crowdsourcing games for museums; if you'd like to see more polished metadata games check out Tiltfactor's http://www.metadatagames.org/#games]

Mutual wins – everybody's happy

My definition of crowdsourcing: cultural heritage crowdsourcing projects ask the public to undertake tasks that cannot be done automatically, in an environment where the activities, goals (or both) provide inherent rewards for participation, and where their participation contributes to a shared, significant goal or research area.

It helps to think of crowdsourcing in cultural heritage as a form of volunteering. Participation has to be rewarding for everyone involved. That sounds simple, but focusing on the audiences' needs can be difficult when there are so many organisational needs competing for priority and limited resources for polishing the user experience. Further, as many projects discover, participant needs change over time…

What is a Participatory Commons and why would we want one?

First, I have to introduce you to some people. These are composite stories (personas) based on my research…

Two archival historians, Simone and Andre. Simone travels to archives in her semester breaks to stock up on research material, taking photos of most documents 'in case they're useful later', transcribing key text from others. Andre is often at the next table, also looking for material for his research. The documents he collected for his last research project would be useful for Simone's current book but they've never met and he has no way of sharing that part of his 'personal research collection' with her. Currently, each of these highly skilled researchers takes their cumulative knowledge away with them at the end of the day, leaving no trace of their work in the archive itself. Next…

Two people from a nearby village, Martha and Bob. They joined their local history society when they retired and moved to the village. They're helping find out what happened to children from the village school's class of 1898 in the lead-up to and during World War I. They are using census returns and other online documents to add records to a database the society's secretary set up in Excel. Meanwhile…

A family historian, Daniel. He has a classic 'shoebox archive' – a box containing his grandmother Sarah's letters and diary, describing her travels and everyday life at the turn of the century. He's transcribing them and wants to put them online to share with his extended family. One day he wants to make a map for his kids that shows all the places their great-grandmother lived and visited. Finally, there's…

Crowdsourcer Nisha. She has two young kids and works for a local authority. She enjoys playing games like Candy Crush on her mobile, and after the kids have gone to bed she transcribes ship logs on the Old Weather website while watching TV with her husband. She finds it relaxing, feels good about contributing to science and enjoys the glimpses of life at sea. Sites like Old Weather use 'microtasks' – tiny, easily accomplished tasks – and crowdsourcing to digitise large amounts of text.

Helping each other?

None of our friends above know it, but they're all looking at material from roughly the same time and place. Andre and Simone could help each other by sharing the documents they've collected over the years. Sarah's diaries include the names of many children from her village that would help Martha and Bob's project, and Nisha could help everyone if she transcribed sections of Sarah's diary.

Connecting everyone's efforts for the greater good: Participatory Commons

This image shows the two main aspects of the Participatory Commons: the different sources for content, and the activities that people can do with that content.

The Participatory Commons (image: Mia Ridge)

The Participatory Commons is a platform where content from different sources can be aggregated. Access to shared resources underlies the idea of the 'Commons', particularly material that is not currently suitable for sites like Europeana, like 'shoebox archives' and historians' personal record collections. So if the 'Commons' part refers to shared resources, how is it participatory?

The Participatory Commons interface supports a range of activities, from the types of tasks historians typically do, like assessing and contextualising documents, to activities that specialists or the public can do, like identifying particular people, places, events or things in sources, to typical crowdsourcing tasks like fulltext transcription or structured tagging.

By combining the energy of crowdsourcing with the knowledge historians create on a platform that can store or link to primary sources from museums, libraries and archives with 'shoebox archives', the Commons could help make our shared heritage more accessible to all. As a platform that makes material about ordinary people available alongside official archives and as an interface for enjoyable, meaningful participation in heritage work, the Commons could be a basis for 'open source history', redressing some of the absences in official archives while improving the quality of all records.

As a work in progress, this idea of the Participatory Heritage Commons has two roles: as an academic thought experiment to frame my research, and as a provocation for GLAMs (galleries, museums, libraries, archives) to think outside their individual walls. As a vision for 'open source history', it's inspired by community archives, public history, participant digitisation and history from below… This combination of a large underlying repository and more intimate interfaces could be quite powerful. Capturing some of the knowledge generated when scholars access collections would benefit both archives and other researchers.

'Niche projects' can be built on a Participatory Commons

As a platform for crowdsourcing, the Participatory Commons provides efficiencies of scale in the backend work for verifying and validating contributions, managing user accounts, forums, etc. But that doesn't mean that each user would experience the same front-end interface.

Niche projects build on the Participatory Commons
(quick and dirty image: Mia Ridge)

My research so far suggests that tightly-focused projects are better able to motivate participants and create a sense of community. These 'niche' projects may be related to a particular location, period or topic, or to a particular type of material. The success of the New York Public Library's What's on the Menu project, designed around a collection of historic menus, and the British Library's GeoReferencer project, designed around their historic map collection, both demonstrate the value of defining projects around niche topics.

The best crowdsourcing projects use carefully designed interactions tailored to the specific content, audience and data requirements of a given project. For example, the Zooniverse projects use much of the same underlying software but are designed around specific tasks on specific types of material, whether classifying simple galaxy types, plankton or animals on the Serengeti, or transcribing ship logs or military diaries.

The Participatory Commons is not only a collection of content, it also allows 'niche' projects to be layered on top, presenting more focused sets of content, and specialist interfaces designed around the content, audience and purpose.

Barriers

But there are still many barriers to consider, including copyright and technical issues and important cultural issues around authority, reliability, trust, academic credit and authorship. [There's more background on this at my earlier post on historians and the Participatory Commons and Early PhD findings: Exploring historians' resistance to crowdsourced resources.]

Now I want to set the idea of the Participatory Commons aside for a moment, and return to crowdsourcing in cultural heritage. I've been looking for factors in the success or otherwise of crowdsourcing projects, from grassroots, community-led projects to big glamorous institutionally-led sites.

I mentioned that Nisha found transcribing text relaxing. Like many people who start transcribing text, she found herself getting interested in the events, people and places mentioned in the text. Forums or other methods for participants to discuss their questions seem to help keep participants motivated, and they also provide somewhere for a spark of curiosity to grow (as in this forum post). We know that some people on crowdsourcing projects like Old Weather get interested in history, and even start their own research projects.

Crowdsourcing as gateway to further activity

You can see that happening on other crowdsourcing projects too. For example, Herbaria@Home aims to document historical herbarium collections within museums based on photographs of specimen cards. So far participants have documented over 130,000 historic specimens. In the process, some participants also found themselves becoming interested in the people whose specimens they were documenting.

As a result, the project has expanded to include biographies of the original specimen collectors. It was able to accommodate this new interest through a project wiki, which has a combination of free text and structured data linking records between the transcribed specimen cards and individual biographies.

'Levels of Engagement' in citizen science

There's a consistent enough pattern in science crowdsourcing projects that there's a model from 'citizen science' that outlines different stages participants can move through, from undertaking simple tasks, joining in community discussion, through to 'working independently on self-identified research projects'.[1]

Is this 'mission accomplished'?

This is Nick Poole's word cloud based on 40 museum mission statements. With words like 'enjoyment', 'access', 'learning' appearing in museum missions, doesn't this mean that turning transcribers into citizen historians while digitising and enhancing collections is a success? Well, yes, but…

Paths diverge; paradox ahead?

There's a tension between GLAMs' desire to invite people to 'go deeper', to find their own research interests, to begin to become citizen historians; and the desire to ask people to help us with tasks set by GLAMs to help their work. Heritage organisations can try to channel that impulse to start research into questions about their own collections, but sometimes it feels like we're asking people to do our homework for us. The scaffolds put in place to help make tasks easier may start to feel like a constraint.

Who has agency?

If people move beyond simple tasks into more complex tasks that require a greater investment of time and learning, then issues of agency – participants' ability to make choices about what they're working on and why – start to become more important. Would Wikipedia have succeeded if it dictated what contributors had to write about? We shouldn't mistake volunteers for a workforce just because they can be impressively dedicated contributors.

Participatory project models

Turning again to citizen science – this time public participation in science research – we have a model for participatory projects according to the amount of control participants have over the design of the project itself – or to look at it another way, how much authority the organisation has ceded to the crowd. This model contains three categories: 'contributory', where the public contributes data to a project designed by the organisation; 'collaborative', where the public can help refine project design and analyse data in a project led by the organisation; and 'co-creative', where the public can take part in all or nearly all processes, and all parties design the project together.[2]

As you can imagine, truly co-creative projects are rare. It seems cultural organisations find it hard to truly collaborate with members of the public, for many understandable reasons. The level of transparency required, and the investment of time for negotiating mutual interests, goals and capabilities, increase as collaboration deepens. Institutional constraints and lack of time to engage in deep dialogue with participants make it difficult to find shared goals that work for all parties. It seems GLAMs sometimes try to take shortcuts and end up making decisions for the group, which means their 'co-creative' project is actually closer to 'collaborative'.

New challenges

When participants start to out-grow the tasks that originally got them hooked, projects face a choice. Some projects are experimenting with setting challenges for participants. Here you see 'mysteries' set by the UK's Museum of Design in Plastics, and by San Francisco Public Library on History Pin. Finding the right match between the challenge set and the object can be difficult without some existing knowledge of the collection, and it can require a lot of on-going time to encourage participants. Putting the mystery under the nose of the person who has the knowledge or skills to solve it is another challenge that projects like this will have to tackle.

Working with existing communities of interest is a good start, but it also takes work to figure out where they hang out online (or in-person) and understand how they prefer to work. GLAMs sometimes fall into the trap of choosing the technology first, or trying something because it's trendy; it's better to start with the intersection between your content and the preferences of potential audiences.

But is it wishful thinking to hope that others will be interested in answering the questions GLAMs are asking?

A tension?

Should projects accept that some people will move on as they develop new interests, and concentrate on recruiting new participants to replace them? Do they try to find more interesting tasks or new responsibilities for participants, such as helping moderate discussions, or checking and validating other people's work? Or should they find ways for the project to grow as participants' skill and knowledge increase? It's important to make these decisions mindfully as the default is otherwise to accept a level of turnover as participants move on.

To return to lessons from citizen science, possible areas for deeper involvement include choosing or defining questions for study, analysing or interpreting data and drawing conclusions, discussing results and asking new questions.[3] However, heritage organisations might have to accept that the questions people want to ask might not involve their collections, and that these citizen historians' new interests might not leave time for their previous crowdsourcing tasks.

Why is a critical mass of content in a Participatory Commons useful?

And now we return to the Participatory Commons and the question of why a critical mass of content would be useful.

Increasingly, the old divisions between museum, library and archive collections don't make sense. For most people, content is content, and they don't understand why a pamphlet about a village fete in 1898 would be described and accessed differently depending on whether it had ended up in a museum, library or archive catalogue.

Basing niche projects on a wider range of content creates opportunities for different types of tasks and levels of responsibility. Projects that provide a variety of tasks and roles can support a range of different levels and types of participant skills, availability, knowledge and experience.

A critical mass of material is also important for the discoverability of heritage content. Even the most sophisticated researcher turns to Google sometimes, and if your content doesn't come up in the first few results, many researchers will never know it exists. It's easy to say but less easy to make a reality: the easier it is to find your collections, the more likely it is that researchers will use them.

Commons as party?

More importantly, a critical mass of content in a Commons allows us to re-define 'winning'. If participation is narrowly defined as belonging to individual GLAMs, when a citizen historian moves on to a project that doesn't involve your collection then it can seem like you've lost a collaborator. But the people who developed a new research interest through a project at one museum might find they end up using records from the archive down the road, and transcribing or enhancing their records during their investigation. If all the institutions in the region shared their records on the Commons or let researchers take and share photos while using their collections, the researcher has a critical mass of content for their research and hopefully, as a side-effect, their activities will improve links between collections. If the Commons allows GLAMs to take a sector-wide view then someone moving on to a different collection becomes a moment to celebrate, a form of graduation. In our wildest imagination, the Commons could be like a fabulous party where you never know what fabulous interesting people and things you'll discover…

To conclude – by designing platforms that allow people to collect and improve records as they work, we're helping everybody win.

Thank you! I'm looking forward to hearing your thoughts.


[1] M. Jordan Raddick et al., 'Citizen Science: Status and Research Directions for the Coming Decade', in astro2010: The Astronomy and Astrophysics Decadal Survey, vol. 2010, 2009, http://www8.nationalacademies.org/astro2010/DetailFileDisplay.aspx?id=454.

[2] Rick Bonney et al., Public Participation in Scientific Research: Defining the Field and Assessing Its Potential for Informal Science Education. A CAISE Inquiry Group Report (Washington D.C.: Center for Advancement of Informal Science Education (CAISE), July 2009), http://caise.insci.org/uploads/docs/PPSR%20report%20FINAL.pdf.

[3] Bonney et al., Public Participation in Scientific Research: Defining the Field and Assessing Its Potential for Informal Science Education. A CAISE Inquiry Group Report.


Image credits in order of appearance: Glider, Library of Congress; Great hall, Library of Congress; Curzona Allport from Tasmanian Archive and Heritage Office; Hålanda Church, Västergötland, Sweden, Swedish National Heritage Board; Smithsonian Institution, Postmaster General James A. Farley During National Air Mail Week, 1938; Powerhouse Museum, Canterbury Bankstown Rugby League Football Club's third annual Ball.

Early PhD findings: Exploring historians' resistance to crowdsourced resources

I wrote up some early findings from my PhD research for conferences back in 2012 when I was working on questions around 'but will historians really use resources created by unknown members of the public?'. People keep asking me for copies of my notes (and I've noticed people citing an online video version, which isn't ideal) and since they might be useful and any comments would help me write up the final thesis, I thought I'd be brave and post my notes.

A million caveats apply – these were early findings, my research questions and focus have changed and I've interviewed more historians and reviewed many more participative history projects since then; as a short paper I don't address methods etc; and obviously it's only a tiny part of a huge topic… (If you're interested in crowdsourcing, you might be interested in other writing related to scholarly crowdsourcing and collaboration from my PhD, or my edited volume on 'Crowdsourcing our cultural heritage'.) So, with those health warnings out of the way, here it is. I'd love to hear from you, whether with critiques, suggestions, or just stories about how it relates to your experience. And obviously, if you use this, please cite it!

Exploring historians' resistance to crowdsourced resources

Scholarly crowdsourcing may be seen as a solution to the backlog of historical material to be digitised, but will historians really use resources created by unknown members of the public?

The Transcribe Bentham project describes crowdsourcing as 'the harnessing of online activity to aid in large scale projects that require human cognition' (Terras, 2010a). 'Scholarly crowdsourcing' is a related concept that generally seems to involve the collaborative creation of resources through collection, digitisation or transcription. Crowdsourcing projects often divide up large tasks (like digitising an archive) into smaller, more manageable tasks (like transcribing a name, a line, or a page); this method has helped digitise vast numbers of primary sources.
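To make the idea of dividing a large task into smaller ones concrete, here is a minimal sketch of splitting a digitised archive into page-level or line-level transcription tasks. The data layout, the field names and the `make_microtasks` function are hypothetical, chosen only for illustration; real projects such as Transcribe Bentham will organise their tasks differently.

```python
def make_microtasks(line_counts, unit="line"):
    """Divide a set of digitised pages into small transcription tasks.

    line_counts: dict mapping a page ID to the number of lines on that page.
    unit: "page" for one task per page, "line" for one task per line.
    Returns a list of task dicts; "lines" is a (start, end) range with the
    end excluded, so (0, 1) means the first line only.
    """
    tasks = []
    for page_id, n_lines in line_counts.items():
        if unit == "page":
            tasks.append({"page": page_id, "lines": (0, n_lines)})
        elif unit == "line":
            for i in range(n_lines):
                tasks.append({"page": page_id, "lines": (i, i + 1)})
        else:
            raise ValueError("unit must be 'page' or 'line'")
    return tasks


# A hypothetical two-page document: three lines on one page, two on the other.
archive = {"folio-001": 3, "folio-002": 2}
line_tasks = make_microtasks(archive, unit="line")
page_tasks = make_microtasks(archive, unit="page")
```

Smaller units are quicker for a participant to complete, at the cost of more tasks to distribute and reassemble afterwards.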

My doctoral research was inspired by a vision of 'participant digitization', a form of scholarly crowdsourcing that seeks to capture the digital records and knowledge generated when researchers access primary materials in order to openly share and re-use them. Unlike many crowdsourcing projects which are designed for tasks performed specifically for the project, participant digitization harnesses the transcription, metadata creation, image capture and other activities already undertaken during research and aggregates them to create re-usable collections of resources.

Research questions and concepts

When Howe clarified his original definition, stating that the 'crucial prerequisite' in crowdsourcing is 'the use of the open call format and the large network of potential laborers', a 'perfect meritocracy' based not on external qualifications but on 'the quality of the work itself', he created a challenge for traditional academic models of authority and credibility (Howe 2006, 2008). Furthermore, how does anonymity or pseudonymity (defined here as often long-standing false names chosen by users of websites) complicate the process of assessing the provenance of information on sites open to contributions from non-academics? An academic might choose to disguise their identity to mask their research activities from competing peers, from a desire to conduct early exploratory work in private or simply because their preferred username was unavailable; but when contributors are not using their real names they cannot derive any authority from their personal or institutional identity. Finally, which technical, social and scholarly contexts would encourage researchers to share (for example) their snippets of transcription created from archival documents, and to use content transcribed by others? What barriers exist to participation in crowdsourcing or prevent the use of crowdsourced content?

Methods

I interviewed academic and family/local historians about how they evaluate, use, and contribute to crowdsourced and traditional resources to investigate how a resource based on 'meritocracy' disrupts current notions of scholarly authority, reliability, trust, and authorship. These interviews aimed to understand current research practices and probe more deeply into how participants assess different types of resources, their feelings about resources created by crowdsourcing, and to discover when and how they would share research data and findings.

I sought historians investigating the same country and time period in order to have a group of participants who faced common issues with the availability and types of primary sources from early modern England. I focused on academic and 'amateur' family or local historians because I was interested in exploring the differences between them to discover which behaviours and attitudes are common to most researchers and which are particular to academics and the pressures of academia.

I recruited participants through personal networks and social media, and conducted interviews in person or on Skype. At the time of writing, 17 participants have been interviewed for up to 2 hours each. It should be noted that these results are of a provisional nature and represent a snapshot of on-going research and analysis.

Early results

I soon discovered that citizen historians are perfect examples of Pro-Ams: 'knowledgeable, educated, committed, and networked' amateurs 'who work to professional standards' (Leadbeater and Miller, 2004; Terras, 2010b).

How do historians assess the quality of resources?

Participants often simply said they drew on their knowledge and experience when sniffing out unreliable documents or statements. When assessing secondary sources, their tacit knowledge of good research and publication practices was evident in common statements like '[I can tell from] it's the way it's written'. They also cited the presence and quality of footnotes, and the depth and accuracy of information as important factors. Transcribed sources introduced another layer of quality assessment – researchers might assess a resource by checking for transcription errors that are often copied from one database to another. Most researchers used multiple sources to verify and document facts found in online or offline sources.

When and how do historians share research data and findings?

It appears that between accessing original records and publishing information, there are several key stages where research data and findings might be shared. Stages include acquiring and transcribing records, producing visualisations like family trees and maps, publishing informal notes and publishing synthesised content or analysis; whether a researcher passes through all the stages depends on their motivation and audience. Information may change formats between stages, and since many claim not to share information that has not yet been sufficiently verified, some information would drop out before each stage. It also appears that in later stages of the research process the size of the potential audience increases and the level of trust required to share with them decreases.

For academics, there may be an additional, post-publication stage when resources are regarded as 'depleted' – once they have published what they need from them, they would be happy to share them. Family historians meanwhile see some value in sharing versions of family trees online, or in posting names of people they are researching to attract others looking for the same names.

Sharing is often negotiated through private channels and personal relationships. Methods of controlling sharing include showing people work in progress on a screen rather than sending it to them and using email in preference to sharing functionality supplied by websites – this targeted, localised sharing allows the researcher to retain a sense of control over early stage data, and so this is one key area where identity matters. Information is often shared progressively, and getting access to more information depends on your behaviour after the initial exchange – for example, crediting the provider in any further use of the data, or reciprocating with good data of your own.

When might historians resist sharing data?

Participants gave a range of reasons for their reluctance to share data. Being able to convey the context of creation and the qualities of the source materials is important for historians who may consider sharing their 'depleted' personal archives – not being able to provide this means they are unlikely to share. Being able to convey information about data reliability is also important. Some information about the reliability of a piece of information is implicitly encoded in its format (for example, in pencil in notebooks versus electronic records), hedging phrases in text, in the number of corroborating sources, or a value judgement about those sources. If it is difficult to convey levels of 'certainty' about reliability when sharing data, it is less likely that people will share it – participants felt a sense of responsibility about not publishing (even informally) information that hasn't been fully verified. This was particularly strong in academics. Some participants confessed to sneaking forbidden photos of archival documents they ran out of time to transcribe in the archive; unsurprisingly it is unlikely they would share those images.

Overall, if historians do not feel they would get information of equal value back in exchange, they seem less likely to share. Professional researchers do not want to give away intellectual property, and feel sharing data online is risky because the protocols of citation and fair use are presently uncertain. Finally, researchers did not always see a point in sharing their data. Family history content was seen as too specific and personal to have value for others; academics may realise the value of their data within their own tightly-defined circles but not realise that their records may have information for other biographical researchers (i.e. people searching by name) or other forms of history.

Which concerns are particular to academic historians?

Reputational risk is an issue for some academics who might otherwise share data. One researcher said: 'we are wary of others trawling through our research looking for errors or inconsistencies. […] Obviously we were trying to get things right, but if we have made mistakes we don't want to have them used against us. In some ways, the less you make available the better!'. Scholarly territoriality can be an issue – if there is another academic working on the same resources, their attitude may affect how much others share. It is also unclear how academic historians would be credited for their work if it was performed under a pseudonym that does not match the name they use in academia.

What may cause crowdsourced resources to be under-used?

In this research, 'amateur' and academic historians shared many of the same concerns for authority, reliability, and trust. The main reported cause of under-use (for all resources) is not providing access to original documents as well as transcriptions. Researchers will use almost any information as pointers or leads to further sources, but they will not publish findings based on that data unless the original documents are available or the source has been peer-reviewed. Checking the transcriptions against the original is seen as 'good practice', part of a sense of responsibility 'to the world's knowledge'.

Overall, the identity of the data creator is less important than expected – for digitised versions of primary sources, reliability is not vested in the identity of the digitiser but in the source itself. Content found on online sites is tested against a set of finely-tuned ideas about the normal range of documents rather than the authority of the digitiser.

Cite as:

Ridge, Mia. “Early PhD Findings: Exploring Historians’ Resistance to Crowdsourced Resources.” Open Objects, March 19, 2014. https://www.openobjects.org.uk/2014/03/early-phd-findings-exploring-historians-resistance-to-crowdsourced-resources/.

References

Howe, J. (undated). Crowdsourcing: A Definition http://crowdsourcing.typepad.com

Howe, J. (2006). Crowdsourcing: A Definition. http://crowdsourcing.typepad.com/cs/2006/06/crowdsourcing_a.html

Howe, J. (2008). Join the crowd: Why do multinationals use amateurs to solve scientific and technical problems? The Independent. http://www.independent.co.uk/life-style/gadgets-and-tech/features/join-the-crowd-why-do-multinationals-use-amateurs-to-solve-scientific-and-technical-problems-915658.html

Leadbeater, C., and Miller, P. (2004). The Pro-Am Revolution: How Enthusiasts Are Changing Our Economy and Society. Demos, London, 2004. http://www.demos.co.uk/files/proamrevolutionfinal.pdf

Terras, M. (2010a) Crowdsourcing cultural heritage: UCL's Transcribe Bentham project. Presented at: Seeing Is Believing: New Technologies For Cultural Heritage. International Society for Knowledge Organization, UCL (University College London). http://eprints.ucl.ac.uk/20157/

Terras, M. (2010b). “Digital Curiosities: Resource Creation via Amateur Digitization.” Literary and Linguistic Computing 25, no. 4 (October 14, 2010): 425–438. http://llc.oxfordjournals.org/cgi/doi/10.1093/llc/fqq019

2013 in review: crowdsourcing, digital history, visualisation, and lots and lots of words

A quick and incomplete summary of my 2013 for those days when I wonder where the year went… My PhD was my main priority throughout the year, but the slow increase in word count across my thesis is probably only of interest to me and my supervisors (except where I've turned down invitations to concentrate on my PhD). Various other projects have spanned the years: my edited volume on 'Crowdsourcing our Cultural Heritage', working as a consultant on the 'Let's Get Real' project with Culture24, and I've continued to work with the Open University Digital Humanities Steering Group, ACH and to chair the Museums Computer Group.

In January (and April/June) I taught all-day workshops on 'Data Visualisation for Analysis in Scholarly Research' and 'Crowdsourcing in Libraries, Museums and Cultural Heritage Institutions' for the British Library's Digital Scholarship Training Programme.

In February I was invited to give a keynote on 'Crowd-sourcing as participation' at iSay: Visitor-Generated Content in Heritage Institutions in Leicester (my event notes). This was an opportunity to think through the impact of the 'close reading' people do while transcribing text or describing images, crowdsourcing as a form of deeper engagement with cultural heritage, and the potential for 'citizen history' this creates (also finally bringing together my museum work and my PhD research). This later became an article for Curator journal, From Tagging to Theorizing: Deepening Engagement with Cultural Heritage through Crowdsourcing (proof copy available at http://oro.open.ac.uk/39117). I also ran a workshop on 'Data visualisation for humanities researchers' with Dr. Elton Barker (one of my PhD supervisors) for the CHASE 'Going Digital' doctoral training programme.

In March I was in the US for THATCamp Feminisms in Claremont, California (my notes), to run a workshop on 'Data visualisation as a gateway to programming', and I gave a paper on 'New Challenges in Digital History: Sharing Women's History on Wikipedia' at the 'Women's History in the Digital World' conference at Bryn Mawr, Philadelphia (posted as 'New challenges in digital history: sharing women's history on Wikipedia – my draft talk notes'). I also wrote an article for Museum Identity magazine, Where next for open cultural data in museums?

In April I gave a paper, 'A thousand readers are wanted, and confidently asked for': public participation as engagement in the arts and humanities, on my PhD research at Digital Impacts: Crowdsourcing in the Arts and Humanities (see also my notes from the event), and a keynote on 'A Brief History of Open Cultural Data' at GLAM-WIKI 2013.

In May I gave an online seminar on crowdsourcing (with a focus on how it might be used in teaching undergraduates wider skills) for the NITLE Shared Academics series. I gave a short paper on 'Digital participation and public engagement' at the London Museums Group's 'Museums and Social Media' at Tate Britain on May 24, and was in Belfast for the Museums Computer Group's Spring meeting, 'Engaging Visitors Through Play' then whipped across to Venice for a quick keynote on 'Participatory Practices: Inclusion, Dialogue and Trust' (with Helen Weinstein) for the We Curate kick-off seminar at the start of June.

In June the Collections Trust and MCG organised a Museum Informatics event in York and we organised a 'Failure Swapshop' the evening before. I also went to Zooniverse's ZooCon (my notes on the citizen science talks) and to Canterbury Cathedral Archives for a CHASE event on 'Opening up the archives: Digitization and user communities'.

In July I chaired a session on Digital Transformations at the Open Culture 2013 conference in London on July 2, gave an invited lightning talk at the Digital Humanities Oxford Summer School 2013, ran a half-day workshop on 'Designing successful digital humanities crowdsourcing projects' at the Digital Humanities 2013 conference in Nebraska, and had an amazing time making what turned out to be Serendip-o-matic at the Roy Rosenzweig Center for History and New Media at George Mason University's One Week, One Tool in Fairfax, Virginia (my posts on the process), with a museumy road trip via Amtrak and Greyhound to Chicago, Cleveland and Pittsburgh in between the two events.

In August I tidied up some talk notes for publication as 'Tips for digital participation, engagement and crowdsourcing in museums' on the London Museums Group blog.

October saw the publication of my Curator article and Creating Deep Maps and Spatial Narratives through Design with Don Lafreniere and Scott Nesbit for the International Journal of Humanities and Arts Computing, based on our work at the Summer 2012 NEH Advanced Institute on Spatial Narrative and Deep Maps: Explorations in the Spatial Humanities. (I also saw my family in Australia and finally went to MONA).

In November I presented on 'Messy understandings in code' at Speaking in Code at UVA's Scholars' Lab, Charlottesville, Virginia, gave a half-day workshop on 'Data Visualizations as an Introduction to Computational Thinking' at the University of Manchester and spoke at the Digital Humanities at Manchester conference the next day. Then it was down to London for the MCG's annual conference, Museums on the Web 2013 at Tate Modern. Later that month I gave a talk on 'Sustaining Collaboration from Afar' at Sustainable History: Ensuring today's digital history survives.

In December I went to Hannover, Germany for the Herrenhausen Conference: "(Digital) Humanities Revisited – Challenges and Opportunities in the Digital Age" where I presented on 'Creating a Digital History Commons through crowdsourcing and participant digitisation' (my lightning talk notes and poster are probably the best representation of how my PhD research on public engagement through crowdsourcing and historians' contributions to scholarly resources through participant digitisation are coming together). In the final days of 2013, I went back to my old museum metadata games, updated them to include images from the British Library and took a first pass at making them responsive for mobile and tablet devices.

DHOxSS: 'From broadcast to collaboration: the challenges of public engagement in museums'

I'm just back from giving a lightning talk for the Cultural Connections strand of the Digital.Humanities@Oxford Summer School 2013, and since the projector wasn't working to show my examples during my talk I thought I'd share my notes (below) and some quick highlights from the other presentations.

Mark Doffman said that it's important that academic work challenges and provokes, but that you should make sure you get headlines for the right reasons and not, for example, for how much the project costs. He concluded that impact is about provocation, not just getting people to say your work is wonderful.

Gurinder Punn of the university's Isis Innovation made the point that intellectual property and expertise can be transferred into businesses by consulting through your department or personally. (And it's not just for senior academics – one of the training sessions offered to PhD students at the Open University is 'commercialising your research').

Giles Bergel @ChapBookPro spoke on the Broadside Ballads Online (blog), explaining that folksong scholarship is often outside academia – there's a lot of vernacular scholarship and all sorts of domain specialists including musicians. They've considered crowdsourcing but want to be in a position to take the contributions as seriously as any print accession. They also have an image-match demonstrator from Oxford's Visual Geometry Group which can be used to find similar images on different ballad sheets.

Christian von Goldbeck-Stier offered some reflections on working with conductors as part of his research on Wagner. And perfectly for a summer's day:

Christian quotes Wilde on beauty: "one of the great facts of the world, like sunlight, or springtime…" http://t.co/8qGE9tLdBZ #dhoxss
— Pip Willcox (@pipwillcox) July 11, 2013

My talk notes: 'From broadcast to collaboration: the challenges of public engagement in museums'

I’m interested in academic engagement from two sides – for the past decade or so I was a museum technologist; now I’m a PhD student in the Department of History at the Open University, where I’m investigating the issues around academic and ‘amateur’ historians and scholarly crowdsourcing.

As I’ve moved into academia, I’ve discovered there’s often a disconnect between academia and museum practice (to take an example I know well), and that their different ways of working can make connecting difficult, even before they try to actually collaborate. But it’s worth it because the reward is more relevant, cutting-edge research that directly benefits practitioners in the relevant fields and has greater potential impact.

I tend to focus on engagement through participation and crowdsourcing, but engagement can be as simple as blogging about your work in accessible terms: sharing the questions that drive your research, how you’ve come to some answers, and what that means for the world at large; or writing answers to common questions from the public alongside journal articles.

Plan it

For a long time, museums worked with two publics: visitors and volunteers. They’d ask visitors what they thought in ‘have your say’ interactives, but to be honest, they often didn’t listen to the answers. They’d also work with volunteers but sometimes they valued their productivity more than they valued their own kinds of knowledge. But things are more positive these days – you've already heard a lot about crowdsourcing as a key example of more productive engagement.

Public engagement works better when it’s incorporated into a project from the start. Museums are exploring co-curation – working with the public to design exhibitions. Museums are recognising that they can’t know everything about a subject, and figuring out how to access knowledge ‘out there’ in the rest of the world. In the Oramics project at the Science Museum (e.g. Oramics to Electronica or Engaging enthusiasts online), electronic musicians were invited to co-curate an exhibition to help interpret an early electronic instrument for the public. 

There’s a model from 'Public Participation in Scientific Research' (or 'citizen science') I find useful in my work when thinking about how much agency the public has in a project, and it's also useful for planning engagement projects. Where can you benefit from questions or contributions from the public, and how much control are you willing to give up? 

  • Contributory projects designed by scientists, with participants involved primarily in collecting samples and recording data;
  • Collaborative projects in which the public is also involved in analyzing data, refining project design, and disseminating findings;
  • Co-created projects are designed by scientists and members of the public working together, and at least some of the public participants are involved in all aspects of the work.

(Source: Public Participation in Scientific Research: Defining the Field and Assessing Its Potential for Informal Science Education (full report, PDF, 3 MB))

Do it

Museums have learnt that engaging the public means getting out of their venues (and their comfort zones). One example is Wikipedians-in-Residence, including working with Wikipedians to share images, hold events and contribute to articles (e.g. The British Museum and Me, A Wikipedian-in-Residence at the British Museum, The Children's Museum's Wikipedian in Residence).
It’s not always straightforward – museums don’t do ‘neutral’ points of view, which is a key goal for Wikipedia. Museums are object-centric, Wikipedia is knowledge-centric. Museums are used to individual scholarship and institutional credentials, Wikipedia is consensus-driven and your only credentials are your editing history and your references. Museums are slowly learning to share authority, to trust the values of other platforms. You need to invest time to learn what drives the other groups, how to talk with them and you have to be open to being challenged.

Mean it

Done right, engagement should be transformative for all sides. According to the National Co-ordinating Centre for Public Engagement, engagement ‘is by definition a two-way process, involving interaction and listening, with the goal of generating mutual benefit.’ Saying something is ‘open to the public’ is easy; making efforts to make sure that it’s intellectually and practically accessible takes more effort; active outreach is a step beyond open. It's not the same as marketing – it may use the same social media channels, but it's a conversation, not a broadcast. It’s hard to fake being truly engaged (and it's rude) so you have to mean it – doing it cynically doesn't help anyone.

Asking people to do work that helps your mission is a double win. For example, Brooklyn Museum's 'Freeze Tag' asks members of their community to help moderate tags entered by people elsewhere – they're trusting members of the community to clean up content for them.

Enjoy it

My final example is the National Library of Ireland on Flickr Commons, who do a great job of engaging people in Irish history, partly through their enthusiasm for the subject and partly through the effort they put into collating comments and updating their records, showing how much they value contributions. 

Almost by definition, any collaboration around engagement will be with people who are interested in your work, and they’ll bring new perspectives to it. You might end up working with international peers, academics from different disciplines, practitioner groups, scholarly amateurs or kids from the school down the road. And it’s not all online – running events is a great way to generate real impact and helps start conversations with potential for future collaboration.

You might benefit too! Talking about your research sometimes reminds you why you were originally interested in it… It’s a way of looking back and seeing how far you’ve come. It’s also just plain rewarding seeing people benefit from your research, so it's worth doing well.


Thanks again to Pip Willcox for the invitation to speak, and to the other speakers for their fascinating perspectives.  Participation and engagement lessons from cultural heritage and academia is a bit of a hot topic at the moment – there's more on it (including notes from a related paper I gave with Helen Weinstein) at Participatory Practices.

Setting off small fireworks: leaving space for curiosity

Remember when blog posts didn't need titles, didn't need to be long or take ages to write, and had nothing to do with your 'personal brand'? I've realised that while I'm writing up the PhD I'll barely blog at all if I don't blog like it's 2007 and just share interesting stuff when I've got a moment. Here goes…

I've been interested in the role of curiosity in engaging people with museum collections since I evaluated museum 'tagging' crowdsourcing games for my MSc project and learnt that the randomness of the objects presented made players really curious about what would appear next, and in turn that curiosity was one reason they kept playing. It turns out other metadata game designers have noticed the same effect. Flanagan and Carini (2012) wrote: 'Curiosity and doubt are key design opportunities. … In a number of instances, players became so curious about the images they were tagging that they would tag images with inquiry phrases, such as "want to know more about this culture."'

I returned to 'curiosity' for a talk I gave at the iSay conference in Leicester, where I related it to Raddick et al's (2009) 'Levels of Engagement' in citizen science, where Level 2 is participation in community discussion (e.g. forums on crowdsourcing sites) and Level 3 is 'working independently on self-identified research projects'. To me, this suggested you should leave room for curiosity and wonder to develop – it might turn into a new personal journey for the participant or visitor, or even a new research question for a crowdsourcing project.

The reason I'm posting now is that I just came across Langer's definition of 'mindfulness', as quoted in Csikszentmihalyi and Hermanson (1995): 'the "state of mind that results from drawing novel distinctions, examining information from new perspectives, and being sensitive to context. It is an open, creative, probabilistic state of mind in which the individual might be led to finding differences among things thought similar and similarities among things thought different" (Langer 1993, p.44).' Further:

'Exhibits that facilitate mindfulness display information in context and present various viewpoints. For example, Langer (1993, p.47) contrasts the statement "The three main reasons for the Civil War were…" with the statement "From the perspective of the white male living in the twentieth century, the main reasons for the Civil War were…" (p.47). The latter approach calls for thoughtful comparisons. For example, How did women feel during the Civil War? the old? the old from the North? the black male today? and so on.'

I don't know about you, but my curiosity was piqued and my mind started going in lots of different directions. The second statement carefully creates a gap just big enough to let a hundred new questions through and is a brilliant example of why both museum interpretation and participatory projects should leave room for curiosity…

Works cited:

  • Csikszentmihalyi, Mihaly, and Kim Hermanson. 1995. “Intrinsic Motivation in Museums: Why Does One Want to Learn?” In Public Institutions for Personal Learning: Establishing a Research Agenda, edited by John Falk and Lynn D. Dierking, 66 – 77. Washington D.C.: American Association of Museums. [This is seriously ace, track down a copy if you can]
  • Flanagan, Mary, and Peter Carini. 2012. “How Games Can Help Us Access and Understand Archival Images.” American Archivist 75 (2): 514–537.
  • Raddick, M. Jordan, Georgia Bracey, K. Carney, G. Gyuk, K. Borne, J. Wallin, and S Jacoby. 2009. “Citizen Science: Status and Research Directions for the Coming Decade.” In Astro2010: The Astronomy and Astrophysics Decadal Survey. Vol. 2010. http://www8.nationalacademies.org/astro2010/DetailFileDisplay.aspx?id=454.

(Ok, so a post with references is not exactly blogging like it's 2006, but you've got to start somewhere…)
(Someone is literally setting off fireworks somewhere nearby. I have no idea why.)
(And yeah, I am working on a Saturday night. Friends don't let friends do PhDs, innit.)

We're all looking at the stars: citizen science projects at ZooCon13

Last Saturday I escaped my desk to head to the Physics department at the University of Oxford and be awed by what we're learning about space (and more terrestrial subjects) through citizen science projects run by Zooniverse at ZooCon13. All the usual caveats about notes from events apply – in particular, assume any errors are mine and that everyone was much more intelligent and articulate than my notes make them sound. These notes are partly written for people in cultural heritage and the humanities who are interested in the design of crowdsourcing projects, and while I enjoyed the scientific presentations I am not even going to attempt to represent them!  Chris Lintott live-blogged some of the talks on the day, so check out 'Live from ZooCon' for more. If you're familiar with citizen science you may well know a lot of these examples already – and if you're not, you can't really go wrong by looking at Zooniverse projects.

Aprajita Verma kicked off with SpaceWarps and 'Crowd-sourcing the Discovery of Gravitational Lenses with Citizen Scientists'. She explained the different ways gravitational lenses show up in astronomical images, and that 'strong gravitational lensing research is traditionally very labour-intensive' – computer algorithms generate lots of false positives, so you need people to help. SpaceWarps includes some simulated lenses (i.e. images of the sky with lenses added), mostly as a teaching tool (to provide more examples and increase familiarity with what lenses can look like) but also to make it more interesting for participants. The SpaceWarps interface lets you know when you've missed a (simulated, presumably) lens as well as noting lenses you've marked. They had 2 million image classifications in the first week, and 8500 citizen scientists have participated so far, 40% of whom have participated in 'Talk', the discussion feature. As discussed in their post 'What happens to your markers? A look inside the Space Warps Analysis Pipeline', they've analysed the results so far on ranges between astute/obtuse and pessimistic/optimistic markers – it turns out most people are astute. Each image is reviewed by ten people, so they've got confidence in the results.
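
To make the idea of combining several people's markers on one image a bit more concrete, here is a minimal sketch of one way it could be done. It is not the Space Warps Analysis Pipeline itself: the function names, the example probabilities and the prior are all hypothetical, and it assumes that each volunteer can be summarised by two numbers (how often they say 'lens' when there is one, and how often they say 'not a lens' when there isn't), with an 'astute' marker being right more often than not on both. Each classification then updates the probability that the image contains a lens using Bayes' rule.

```python
def update_lens_probability(prior, said_lens, p_lens_given_lens, p_not_given_not):
    """Bayes update of P(image contains a lens) after one volunteer's classification.

    p_lens_given_lens: assumed chance this volunteer says 'lens' when a lens is present.
    p_not_given_not: assumed chance this volunteer says 'not a lens' when none is present.
    Both values are hypothetical inputs, e.g. estimated from simulated training images.
    """
    if said_lens:
        likelihood_if_lens = p_lens_given_lens
        likelihood_if_not = 1.0 - p_not_given_not
    else:
        likelihood_if_lens = 1.0 - p_lens_given_lens
        likelihood_if_not = p_not_given_not
    numerator = likelihood_if_lens * prior
    return numerator / (numerator + likelihood_if_not * (1.0 - prior))


def combine_classifications(prior, classifications):
    """Apply the update once per (said_lens, p_lens_given_lens, p_not_given_not) tuple."""
    probability = prior
    for said_lens, p_lens, p_not in classifications:
        probability = update_lens_probability(probability, said_lens, p_lens, p_not)
    return probability


if __name__ == "__main__":
    # Ten hypothetical 'astute' volunteers (right 80% of the time) all mark a lens.
    ten_astute_votes = [(True, 0.8, 0.8)] * 10
    print(combine_classifications(0.001, ten_astute_votes))
```

In this sketch a volunteer who is right only half the time leaves the probability unchanged, which is one way to see why having mostly astute markers, plus ten reviews per image, gives more confidence in the results.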

Karen Masters talked about 'Cosmic Evolution in the Galaxy Zoo', taking us back to the first Galaxy Zoo project's hopes to have 30,000 volunteers and contrasting that with subsequent peer-reviewed papers that thanked 85,000, or 160,000 or 200,000 volunteers. The project launched in 2007 (before the Zooniverse itself) to look at spiral vs elliptical galaxies and it's all grown from there. The project has found rare objects, most famously the pea galaxies, and as further proof that the Zooniverse is doing 'real science online', the team have produced 36 peer-reviewed papers, some with 100+ citations. At least 50 more papers have been produced by others using their data.

Phil Brohan discussed 'New Users for Old Weather'. The Old Weather project is using data from historic ships' logs to help answer the question 'is this climate change or just weather?'. Some data was already known but there's a 'metaphorical fog' from missing observations from the past. Since the BBC won't let him put a satellite in a Tardis, they've been creative about finding other sources to help lift 'the fog of ignorance'. This project has long fascinated me because it started off all about science: in Phil's words, 'when we started all this, I was only thinking about the weather', but ended up being about history as well: 'these documents are intrinsically interesting' – he learnt what else was interesting about the logs from project participants who discovered the stories of people, disasters and strange events that lay within them. The third thing the project has generated (after weather and history) is 'a lot of experts'. One example he gave was evidence of the 1918-19 Spanish flu epidemic on board ship, which was investigated after forum posts about it. There's still a lot to do – more logs, possibly including French and Dutch ones, are to come – and things would ideally speed up 'by a factor of ten'.

In Brooke Simmons' talk on 'Future plans for Galaxy Zoo', she raised the eternal issue of what to call participants in crowdsourcing: 'just call everyone collaborators'. 'Citizen scientists' makes a distinction between paid and unpaid scientists, as does 'volunteers'. She wants to help people do their own science, and they're working on making it easier than downloading and learning how to use more complicated tools. As an example, she talked about people collecting 'galaxies with small bulges' and analysing the differences in bulges (like a souped-up Galaxy Zoo Navigator?). She also talked about Zoo Teach, with resources for learning at all ages.

After the break we learnt about 'The Planet 4 Invasion', the climate and seasons of Mars from Meg Schwamb and about Solar Stormwatch in 'Only you can save planet Earth!' from Chris Davis, who was also presenting research from his student Kim Tucker-Wood (sp?). Who knew that solar winds could take the tail off a comet?!

Next up was Chris Lintott on 'Planet Hunting with and without Kepler'. Science communication advice says 'don't show people graphs', and since Planet Hunters is looking at graphs for fun, he thought no-one would want to do Planet Hunters. However, the response has surprised him. And 'it turns out that stars are actually quite interesting as well'. In another example of participants going above and beyond the original scope of the project, project participants watched a talk streamed online on 'heartbeat binaries', then went and found 30 of them in archives and their own records and posted them on the forum.  Now a bunch of Planet Hunters are working with the Kepler team to follow them up.  (As an aside, he showed a screenshot of a future journal paper – the journal couldn't accept the idea that you could be a Planet Hunter and not be part of an academic team so they're listed as the Department of Astronomy at Yale.)

The final speaker was Rob Simpson on 'The Future of the Zooniverse'.  To put things in context, he said the human race spends 16 years cumulatively playing the game Angry Birds every day; people spend 2 months every day on the Zooniverse. In the past year, the human race spent 52 years on the Zooniverse's 15 live projects (they've had 23 projects in total). The Andromeda project went through all their data in 22 days – other projects take longer, but still attract dedicated people.  In the Zooniverse's immediate future are 'tools for (citizen) scientists' – adding the ability to do analysis in the browser, 'because people have a habit of finding things, just by being given access to the data'. They're also working on 'Letters' – public versions of what might otherwise be detailed forum posts that can be cited, and as a form of publication, it puts them 'in the domain'.  They're helping people communicate with each other and embracing their 'machine overlords', using Galaxy Zoo as a training tool for machine learning.  As computers get more powerful, the division of work between machines and people will change, perhaps leaving the beautiful, tricky, or complex bits for humans. [Update, June 29, 2013: Rob's posted about his talk on the Zooniverse blog, '52 Years of Human Effort', and corrected his original figure of 35 years to 52 years of human effort.]

At one point a speaker asked who in the room was a moderator on a Zooniverse project, and nearly everyone put their hand up. I felt a bit like giving them a round of applause because their hard work is behind the success of many projects. They're also a lovely, friendly bunch, as I discovered in the pub afterwards.

Conversations in the pub also reminded me of the flipside of people learning so much through these projects – sometimes people lose interest in the original task as their skills and knowledge grow, and it can be tricky to find time to contribute outside of moderating.  After a comment by Chris at another event I've been thinking about how you might match people to crowdsourcing projects or tasks – sometimes it might be about finding something that suits their love of the topic, or that matches the complexity or type of task they've previously enjoyed, or finding another unusual skill to learn, or perhaps building really solid stepping stones from their current tasks to more complex ones. But it's tricky to know what someone likes – I quite like transcribing text on sites like Trove or Notes from Nature, but I didn't like it much on Old Weather. And my own preferences change – I didn't think much of Ancient Lives the first time I saw it, but on another occasion I ended up getting completely absorbed in the task. Helping people find the right task and project is also a design issue for projects that have built an 'ecosystem' of parts that contribute to a larger programme, as discussed in 'Using crowdsourcing to manage crowdsourcing' in Frequently Asked Questions about crowdsourcing in cultural heritage and 'A suite of museum metadata games?' in Playing with Difficult Objects – Game Designs to Improve Museum Collections.

An event like ZooCon showed how much citizen science is leading the way – there are lots of useful lessons for humanities and cultural heritage crowdsourcing. If you've read this thinking 'I'd love to try it for my data, but x is a problem', try talking to someone about it – often there are computational techniques for solving similar problems, and if it's not already solved it might be interesting enough that people want to get involved and work with you on it.

On the trickiness of crowdsourcing competitions: some lessons from Sydney Design

I generally maintain a diplomatic silence about crowdsourcing competitions when I'm talking about crowdsourcing in cultural heritage as I believe spec work (or asking people to invest time in creating designs then paying just one 'winner') is unethical, and it's really tricky for design competitions to avoid looking like 'spec work'. I discovered this for myself when I ran the 'Cosmic Collections' mashup competition, so I have a lot of sympathy for museums who unknowingly get it wrong when experimenting with crowdsourcing. I also tend not to talk about poorly conceived or executed crowdsourcing projects as it doesn't seem fair to single out cultural heritage institutions that were trying to do the right thing against odds that ended up beating them, but I think the lessons to be drawn from the Sydney Design festival's competition are important enough to discuss here.

'Is it a free poster yet?'

A crowdsourcing competition model that the museum had previously applied successfully (the Lace Award and Trainspotting, with prizes up to $AUD20,000 and display in the exhibition for winning designs) had a very different reception when the context and rewards changed. When the Powerhouse Museum's design competition to produce the visual identity for the Sydney Design festival was launched with a $US1000 prize, the design community's sensitivity to spec work and 'free pitching' was triggered, and they started throwing in some sarcastic responses.  The public feedback loop created as people could see previous designs and realised their own would also be featured on the site had a 4Chan-ish feel of a fun new meme about it, and once the norm of satirical responses was set, it was only going to escalate.

More importantly, there was a sense that Sydney Design was pulling a swifty. As Kate Sweetapple puts it in How the Sydney Design festival poster competition went horribly wrong:

'The fundamental difference [to the previous competitions], however, is that by running the competition, the Museum pulled a substantial job – worth tens of thousands of dollars – out of the professional marketplace. The submissions to Love Lace and Trainspotting did not have a commercial context one year, and none the next.'

If the previous reward was mostly monetary, offering a lesser intrinsic reward in exchange for a previously extrinsic reward is unlikely to work. If there's a bigger reward than the competition brief itself would suggest, one important lesson is to make it unavoidably obvious. In this case, the Sydney Design Team's response said 'the Museum would have engaged the winning designer for further work and remuneration required to roll out the winning design into a more comprehensive marketing campaign', but this wasn't clear in the original brief. Many museum competitions display highly-ranked entries in their gallery spaces, and being exhibited in the museum or festival spaces might have been another form of valid reward, but only if it worked as an aspiration for the competition's audience, who in this case might well have a breadth of experience and exposure that rendered it less valuable.

Finally, in working with museums online, I've noticed the harshness of criticism is often proportionate to how deeply people care about you or identify you with certain values they hold dear.  When you're a beloved institution, people who care deeply about you feel betrayed when you get things wrong. As one commentator said in With friends like these, who needs enemies?, 'Sydney Design are meant to be in our corner'. If you regard critics as 'critical friends' you can turn the relationship around (as Merel van der Vaart discusses in the 'Opening up' section of her post on lessons from the Science Museum's Oramics exhibition) and build an even stronger relationship with them. Maybe Sydney Design can still turn this around…

Notes from 'Crowdsourcing in the Arts and Humanities'

Last week I attended a one-day conference, 'Digital Impacts: Crowdsourcing in the Arts and Humanities' (#oxcrowd), convened by Kathryn Eccles of the Oxford Internet Institute, and I'm sharing my (sketchy, as always) notes in the hope that they'll help people who couldn't attend.

Stuart Dunn reported on the Humanities Crowdsourcing scoping report (PDF) he wrote with Mark Hedges and noted that if we want humanities crowdsourcing to take off we should move beyond crowdsourcing as a business model and look to form, nurture and connect with communities.

Alice Warley and Andrew Greg presented a useful overview of the design decisions behind the Your Paintings Tagger and sparked some discussion on how many people need to view a painting before it's 'completed', and the differences between structured and unstructured tagging. Interestingly, paintings can be 'retired' from the Tagger once enough data has been gathered – I personally think the inherent engagement in tagging is valuable enough to keep paintings taggable forever, even if they're not prioritised in the tagging interface.

Kate Lindsay brought a depth of experience to her presentation on 'The Oxford Community Collection Model' (as seen in Europeana 1914-1918 and RunCoCo's 2011 report on 'How to run a community collection online' (PDF)). Some of the questions brought out the importance of planning for sustainability in technology, licences, etc, and the role of existing networks of volunteers with the expertise to help review objects on the community collection days.

The role of the community in ensuring the quality of crowdsourced contributions was also discussed in Kimberly Kowal's presentation on the British Library's Georeferencer project. She also reflected on what she'd learnt after the first phase of the Georeferencer project, including that the inherent reward of participating in the activity was a bigger motivator than competitiveness, and the impact on the British Library itself, which has opened up data for wider digital uses and has more crowdsourcing projects planned.

I gave a paper which was based on an earlier version, The gift that gives twice: crowdsourcing as productive engagement with cultural heritage, but pushed my thinking about crowdsourcing as a tool for deep engagement with museums and other memory organisations even further. I also succumbed to the temptation to play with my own definitions of crowdsourcing in cultural heritage: 'a form of engagement that contributes towards a shared, significant goal or research question by asking the public to undertake tasks that cannot be done automatically' or 'productive public engagement with the mission and work of memory institutions'.

Chris Lintott of Galaxy Zoo fame shared his definition of success for a crowdsourcing/citizen science project: it has to produce results of value to the research community in less time than could have been done by other means (i.e. it must have been able to achieve something with the crowd that it couldn't have without them) and discussed how the Ancient Lives project challenged that at first by turning 'a few thousand papyri they didn't have time to transcribe into several thousand data points they didn't have time to read'.  While 'serendipitous discovery is a natural consequence of exposing data to large numbers of users' (in the words of the Citizen Science Alliance), they wanted a more sophisticated method for recording potential discoveries experts made while engaging with the material and built a focused 'talk' tool which can programmatically filter out the most interesting unanswered comments and email them to their 30 or 40 expert users. They also have Letters for more structured, journal-style reporting. (I hope I have that right).  He also discussed decisions around full text transcriptions (difficult to automatically reconcile) vs 'rich metadata', or more structured indexes of the content of the page, which contain enough information to help historians decide which pages to transcribe in full for themselves.

Some other thoughts that struck me during the day… humanities crowdsourcing has a lot to learn from the application of maths and logic in citizen science – lots of problems (like validating data) that seem intractable can actually be solved algorithmically, and citizen science's hypothesis-based approach to testing task and interface design would help humanities projects. Niche projects help solve the problem of putting the right obscure item in front of the right user (which was an issue I wrestled with during my short residency at the Powerhouse Museum last year – in hindsight, building niche projects could have meant a stronger call-to-action and no worries about getting people to navigate to the right range of objects).  The variable role of forums and participants' relationship to the project owners and each other came up at various points – in some projects, interactions with a central authority are more valued, in others, community interactions are really important. I wonder how much it depends on the length and size of the project? The potential and dangers of 'gamification' and 'badgeification' and their potentially negative impact on motivation were raised. I agree with Lintott that games require a level of polish that could mean you'd invest more in making them than you'd get back in value, but as a form of engagement that can create deeper relationships with cultural heritage and/or validate some procrastination over a cup of tea, I think they potentially have a wider value that balances that.

I was also asked to chair the panel discussion, which featured Kimberly Kowal, Andrew Greg, Alice Warley, Laura Carletti, Stuart Dunn and Tim Causer.  Questions during the panel discussion included:

  • 'what happens if your super-user dies?' (Super-users or super contributors are the tiny percentage of people who do most of the work, as in this Old Weather post) – discussion included mass media as a numbers game, the idea that someone else will respond to the need/challenge, and asking your community how they'd reach someone like them. (This also helped answer the question 'how do you find your crowd?' that came in from Twitter)
  • 'have you ever paid anyone?' Answer: no
  • 'can you recruit participants through specialist societies?' From memory, the answer was 'yes but it does depend'.
  • something like 'have you met participants in real life?' – answer, yes, and it was an opportunity to learn from them, and to align the community, institution, subject and process.
  • 'badgeification?'. Answer: the quality of the reward matters more than the levels (so badges are probably out).
  • 'what happens if you force students to work on crowdsourcing projects?' – one suggestion was to look for entries on Transcribe Bentham in a US English class blog
  • 'what's happened to tagging in art museums, where's the new steve.museum or Brooklyn Museum?' – is it normalised and not written about as much, or has it declined?
  • 'how can you get funding for crowdsourcing projects?'. One answer – put a good application in to the Heritage Lottery Fund. Or start small, prove the value of the project and get a larger sum. Other advice was to be creative or use existing platforms. Speaking of which, last year the Citizen Science Alliance announced 'the first open call for proposals by researchers who wish to develop citizen science projects which take advantage of the experience, tools and community of the Zooniverse. Successful proposals will receive donated effort of the Adler-based team to build and launch a new citizen science project'.
  • 'can you tell in advance which communities will make use of a forum?' – a great question that drew on various discussions of the role of communities of participants in supporting each other and devising new research questions
  • a question on 'quality control' provoked a range of responses, from the manual quality control in Transcribe Bentham to the high number of Taggers initially required for each painting in Your Paintings (which slowed things down), and led into a discussion of shallow vs deep interactions
  • the final questioner asked about documenting film with crowdsourcing and was answered by someone else in the audience, which seemed a very fitting way to close the day.
James Murray in his Scriptorium with thousands of word references sent in by members of the public for the first Oxford English Dictionary. Early crowdsourcing?

If you found this post useful, you might also like Frequently Asked Questions about crowdsourcing in cultural heritage or my earlier Museums and the Web paper on Playing with Difficult Objects – Game Designs to Improve Museum Collections.