Looking for (crowdsourcing) love in all the right places

One of the most important exercises in the crowdsourcing workshops I run is the 'speed dating' session. The idea is to spend some time looking at a bunch of crowdsourcing projects until you find a project you love. Finding a project you enjoy gives you a deeper insight into why other people participate in crowdsourcing, and will see you through the work required to get a crowdsourcing project going. I think making a personal connection like this helps reduce some of the cynicism I occasionally encounter about why people would volunteer their time to help cultural heritage collections. Trying lots of projects also gives you a much better sense of the types of barriers projects can accidentally put in the way of participation. It's also a good reminder that everyone is a nerd about something, and that there's a community of passion for every topic you can think of.

If you want to learn more about designing history or cultural heritage crowdsourcing projects, trying out lots of project is a great place to start. The more time you can spend on this the better – an hour is ideal – but trying just one or two projects is better than nothing. In a workshop I get people to note how a project made them feel – what they liked most and least about a project, and who they'd recommend it to. You can also note the input and output types to help build your mental database of relevant crowdsourcing projects.

The list of projects I suggest varies according to the background of workshop participants, and I'll often throw in suggestions tailored to specific interests, but here's a generic list to get you started.

10 Most Wanted http://10most.org.uk/ Research object histories
Ancient Lives http://ancientlives.org/ Humanities, language, text transcription
British Library Georeferencer http://www.bl.uk/maps/ Locating and georeferencing maps (warning: if it's running, only hard maps may be left!)
Children of the Lodz Ghetto http://online.ushmm.org/lodzchildren/ Citizen history, research
Describe Me http://describeme.museumvictoria.com.au/ Describe objects
DIY History http://diyhistory.lib.uiowa.edu/ Transcribe historical letters, recipes, diaries
Family History Transcription Project http://www.flickr.com/photos/statelibrarync/collections/ Document transcription (Flickr/Yahoo login required to comment)
Herbaria@home http://herbariaunited.org/atHome/ (for bonus points, compare it with Notes from Nature https://www.zooniverse.org/project/notes_from_nature) Transcribing specimen sheets (or biographical research)
HistoryPin Year of the Bay 'Mysteries' https://www.historypin.org/attach/project/22-yearofthebay/mysteries/index/ Help find dates, locations, titles for historic photographs; overlay images on StreetView
iSpot http://www.ispotnature.org/ Help 'identify wildlife and share nature'
Letters of 1916 http://dh.tcd.ie/letters1916/ Transcribe letters and/or contribute letters
London Street Views 1840 http://crowd.museumoflondon.org.uk/lsv1840/ Help transcribe London business directories
Micropasts http://crowdsourced.micropasts.org/app/photomasking/newtask Photo-masking to help produce 3D objects; also structured transcription
Museum Metadata Games: Dora http://museumgam.es/dora/ Tagging game with cultural heritage objects (my prototype from 2010)
NYPL Building Inspector http://buildinginspector.nypl.org/ A range of tasks, including checking building footprints, entering addresses
Operation War Diary http://operationwardiary.org/ Structured transcription of WWI unit diaries
Papers of the War Department http://wardepartmentpapers.org/ Document transcription
Planet Hunters http://planethunters.org/ Citizen science; review visualised data
Powerhouse Museum Collection Search http://www.powerhousemuseum.com/collection/database/menu.php Tagging objects
Reading Experience Database http://www.open.ac.uk/Arts/RED/ Text selection, transcription, description.
Smithsonian Digital Volunteers: Transcription Center https://transcription.si.edu/ Text transcription
Tiltfactor Metadata Games http://www.metadatagames.org/ Games with cultural heritage images
Transcribe Bentham http://www.transcribe-bentham.da.ulcc.ac.uk/ History; text transcription
Trove http://trove.nla.gov.au/newspaper?q= Correct OCR errors, transcribe text, tag or describe documents
US National Archives http://www.amara.org/en/teams/national-archives/ Transcribing videos
What's the Score at the Bodleian http://www.whats-the-score.org/ Music and text transcription, description
What's on the menu http://menus.nypl.org/ Structured transcription of restaurant menus
What's on the menu? Geotagger http://menusgeo.herokuapp.com/ Geolocating historic restaurant menus
Wikisource – random item link http://en.wikisource.org/wiki/Special:Random/Index Transcribing texts
Worm Watch http://www.wormwatchlab.org Citizen science; video
Your Paintings Tagger http://tagger.thepcf.org.uk/ Paintings; free-text or structured tagging

NB: crowdsourcing is a dynamic field, some sites may be temporarily out of content or have otherwise settled in transit. Some sites require registration, so you may need to find another site to explore while you're waiting for your registration email.

It's here! Crowdsourcing our Cultural Heritage is now available

My edited volume, Crowdsourcing our Cultural Heritage, is now available! My introduction (Crowdsourcing our cultural heritage: Introduction), which provides an overview of the field and outlines the contribution of the 12 chapters, is online at Ashgate's site, along with the table of contents and index. There's a 10% discount if you order online.

If you're in London on the evening of Thursday 20th November, we're celebrating with a book launch party at the UCL Centre for Digital Humanities. Register at http://crowdsourcingculturalheritage.eventbrite.co.uk.

Here's the back page blurb: "Crowdsourcing, or asking the general public to help contribute to shared goals, is increasingly popular in memory institutions as a tool for digitising or computing vast amounts of data. This book brings together for the first time the collected wisdom of international leaders in the theory and practice of crowdsourcing in cultural heritage. It features eight accessible case studies of groundbreaking projects from leading cultural heritage and academic institutions, and four thought-provoking essays that reflect on the wider implications of this engagement for participants and on the institutions themselves.

Crowdsourcing in cultural heritage is more than a framework for creating content: as a form of mutually beneficial engagement with the collections and research of museums, libraries, archives and academia, it benefits both audiences and institutions. However, successful crowdsourcing projects reflect a commitment to developing effective interface and technical designs. This book will help practitioners who wish to create their own crowdsourcing projects understand how other institutions devised the right combination of source material and the tasks for their ‘crowd’. The authors provide theoretically informed, actionable insights on crowdsourcing in cultural heritage, outlining the context in which their projects were created, the challenges and opportunities that informed decisions during implementation, and reflecting on the results.

This book will be essential reading for information and cultural management professionals, students and researchers in universities, corporate, public or academic libraries, museums and archives."

Massive thanks to the following authors of chapters for their intellectual generosity and their patience with up to five rounds of edits, plus proofing, indexing and more…

  1. Crowdsourcing in Brooklyn, Shelley Bernstein;
  2. Old Weather: approaching collections from a different angle, Lucinda Blaser;
  3. ‘Many hands make light work. Many hands together make merry work’: Transcribe Bentham and crowdsourcing manuscript collections, Tim Causer and Melissa Terras;
  4. Build, analyse and generalise: community transcription of the Papers of the War Department and the development of Scripto, Sharon M. Leon;
  5. What's on the menu?: crowdsourcing at the New York Public Library, Michael Lascarides and Ben Vershbow;
  6. What’s Welsh for ‘crowdsourcing’? Citizen science and community engagement at the National Library of Wales, Lyn Lewis Dafis, Lorna M. Hughes and Rhian James;
  7. Waisda?: making videos findable through crowdsourced annotations, Johan Oomen, Riste Gligorov and Michiel Hildebrand;
  8. Your Paintings Tagger: crowdsourcing descriptive metadata for a national virtual collection, Kathryn Eccles and Andrew Greg.
  9. Crowdsourcing: Crowding out the archivist? Locating crowdsourcing within the broader landscape of participatory archives, Alexandra Eveleigh;
  10.  How the crowd can surprise us: humanities crowdsourcing and the creation of knowledge, Stuart Dunn and Mark Hedges;
  11. The role of open authority in a collaborative web, Lori Byrd Phillips;
  12. Making crowdsourcing compatible with the missions and values of cultural heritage organisations, Trevor Owens.

Notes from 'Crowdsourcing in the Arts and Humanities'

Last week I attended a one-day conference, 'Digital Impacts: Crowdsourcing in the Arts and Humanities' (#oxcrowd), convened by Kathryn Eccles of Oxford's Internet Institute, and I'm sharing my (sketchy, as always) notes in the hope that they'll help people who couldn't attend.

Stuart Dunn reported on the Humanities Crowdsourcing scoping report (PDF) he wrote with Mark Hedges and noted that if we want humanities crowdsourcing to take off we should move beyond crowdsourcing as a business model and look to form, nurture and connect with communities.  Alice Warley and Andrew Greg presented a useful overview of the design decisions behind the Your Paintings Tagger and sparked some discussion on how many people need to view a painting before it's 'completed', and the differences between structured and unstructured tagging. Interestingly, paintings can be 'retired' from the Tagger once enough data has been gathered – I personally think the inherent engagement in tagging is valuable enough to keep paintings taggable forever, even if they're not prioritised in the tagging interface.  Kate Lindsay brought a depth of experience to her presentation on 'The Oxford Community Collection Model' (as seen in Europeana 1914-1918 and RunCoCo's 2011 report on 'How to run a community collection online' (PDF)). Some of the questions brought out the importance of planning for sustainability in technology, licences, etc, and the role of existing networks of volunteers with the expertise to help review objects on the community collection days.  The role of the community in ensuring the quality of crowdsourced contributions was also discussed in Kimberly Kowal's presentation on the British Library's Georeferencer project. She also reflected on what she'd learnt after the first phase of the Georeferencer project, including that the inherent reward of participating in the activity was a bigger motivator than competitiveness, and the impact on the British Library itself, which has opened up data for wider digital uses and has more crowdsourcing projects planned. I gave a paper which was based on an earlier version, The gift that gives twice: crowdsourcing as productive engagement with cultural heritage, but pushed my thinking about crowdsourcing as a tool for deep engagement with museums and other memory organisations even further. I also succumbed to the temptation to play with my own definitions of crowdsourcing in cultural heritage: 'a form of engagement that contributes towards a shared, significant goal or research question by asking the public to undertake tasks that cannot be done automatically' or 'productive public engagement with the mission and work of memory institutions'.

Chris Lintott of Galaxy Zoo fame shared his definition of success for a crowdsourcing/citizen science project: it has to produce results of value to the research community in less time than could have been done by other means (i.e. it must have been able to achieve something with crowd that couldn't have without them) and discussed how the Ancient Lives project challenged that at first by turning 'a few thousand papyri they didn't have time to transcribe into several thousand data points they didn't have time to read'.  While 'serendipitous discovery is a natural consequence of exposing data to large numbers of users' (in the words of the Citizen Science Alliance), they wanted a more sophisticated method for recording potential discoveries experts made while engaging with the material and built a focused 'talk' tool which can programmatically filter out the most interesting unanswered comments and email them to their 30 or 40 expert users. They also have Letters for more structured, journal-style reporting. (I hope I have that right).  He also discussed decisions around full text transcriptions (difficult to automatically reconcile) vs 'rich metadata', or more structured indexes of the content of the page, which contain enough information to help historians decide which pages to transcribe in full for themselves.

Some other thoughts that struck me during the day… humanities crowdsourcing has a lot to learn from the application of maths and logic in citizen science – lots of problems (like validating data) that seem intractable can actually be solved algorithmically, and citizen science hypothesis-based approach to testing task and interface design would help humanities projects. Niche projects help solve the problem of putting the right obscure item in front of the right user (which was an issue I wrestled with during my short residency at the Powerhouse Museum last year – in hindsight, building niche projects could have meant a stronger call-to-action and no worries about getting people to navigate to the right range of objects).  The variable role of forums and participants' relationship to the project owners and each other came up at various points – in some projects, interactions with a central authority are more valued, in others, community interactions are really important. I wonder how much it depends on the length and size of the project? The potential and dangers of 'gamification' and 'badgeification' and their potentially negative impact on motivation were raised. I agree with Lintott that games require a level of polish that could mean you'd invest more in making them than you'd get back in value, but as a form of engagement that can create deeper relationships with cultural heritage and/or validate some procrastination over a cup of tea, I think they potentially have a wider value that balances that.

I was also asked to chair the panel discussion, which featured Kimberly Kowal, Andrew Greg, Alice Warley, Laura Carletti, Stuart Dunn and Tim Causer.  Questions during the panel discussion included:

  • 'what happens if your super-user dies?' (Super-users or super contributors are the tiny percentage of people who do most of the work, as in this Old Weather post) – discussion included mass media as a numbers game, the idea that someone else will respond to the need/challenge, and asking your community how they'd reach someone like them. (This also helped answer the question 'how do you find your crowd?' that came in from twitter)
  • 'have you ever paid anyone?' Answer: no
  • 'can you recruit participants through specialist societies?' From memory, the answer was 'yes but it does depend'.
  • something like 'have you met participants in real life?' – answer, yes, and it was an opportunity to learn from them, and to align the community, institution, subject and process.
  • 'badgeification?'. Answer: the quality of the reward matters more than the levels (so badges are probably out).
  • 'what happens if you force students to work on crowdsourcing projects?' – one suggestion was to look for entries on Transcribe Bentham in a US English class blog
  • 'what's happened to tagging in art museums, where's the new steve.museum or Brooklyn Museum?' – is it normalised and not written about as much, or has it declined?
  • 'how can you get funding for crowdsourcing projects?'. One answer – put a good application in to the Heritage Lottery Fund. Or start small, prove the value of the project and get a larger sum. Other advice was to be creative or use existing platforms. Speaking of which, last year the Citizen Science Alliance announced 'the first open call for proposals by researchers who wish to develop citizen science projects which take advantage of the experience, tools and community of the Zooniverse. Successful proposals will receive donated effort of the Adler-based team to build and launch a new citizen science project'.
  • 'can you tell in advance which communities will make use of a forum?' – a great question that drew on various discussions of the role of communities of participants in supporting each other and devising new research questions
  • a question on 'quality control' provoked a range of responses, from the manual quality control in Transcribe Bentham and the high number of Taggers initially required for each painting in Your Paintings which slowed things down, and lead into a discussion of shallow vs deep interactions
  • the final questioner asked about documenting film with crowdsourcing and was answered by someone else in the audience, which seemed a very fitting way to close the day.
James Murray in his Scriptorium with thousands of word references sent in by members of the public for the first Oxford English Dictionary. Early crowdsourcing?

If you found this post useful, you might also like Frequently Asked Questions about crowdsourcing in cultural heritage or my earlier Museums and the Web paper on Playing with Difficult Objects – Game Designs to Improve Museum Collections.

Can you capture visitors with a steampunk arm?

Credits: Science Museum

This may be familiar to you if you've worked on a museum website: an object will capture the imagination of someone who starts to spread the link around, there's a flurry of tweets and tumblrs and links (that hopefully you'll notice in time because you've previously set up alerts for keywords or URLs on various media), others like it too and it starts to go viral and 50,000 people look at that one page in a day, 20,000 the next, furious discussions break out on social media and other sites… then they're gone, onto the next random link on someone else's site.  It's hugely exciting, but it can also feel like a missed opportunity to show these visitors other cool things you have in your collection, to address some of the issues raised and to give them more information about the object.

There are three key aspects to riding these waves of interest: the ability to spot content that's suddenly getting a lot of hits; the ability to respond with interesting, relevant content while the link is still hot (i.e. within anything from a couple of hours to a couple of days); and the ability to put that relevant content on the page where fly-by-night visitors will see it.

For many museums, caught between a templated CMS and layers of sign-off for new content , it's not as easy as it sounds.  When the Science Museum's 'steampunk artificial arm' started circulating on twitter and then made boingboing, I was able to work with curators to get a post on the collections blog about it the next day, but then there was no way of adding that link to the Brought to Life page that was all most people saw.

In his post on “The Guardian’s Facebook app”, Martin Belam discusses how their Facebook app has helped archived content live again:

Someone shares an old article with their friends, some of their friends either already use or install the app, and the viral effect begins to take hold. … We’ve got over 1.3 million articles live on the website, so that is a lot of content to be discovered, and the app means that suddenly any page, languishing unloved in our database, can become a new landing page. When an article becomes popular in the app, we sometimes package it with content. Because we know the attention has come at a specific time from a specific place, we can add related links that are appropriate to the audience rather than to the original content. …when you’ve got the audience there, you need to optimise for them

As a content company with great technical and user experience teams, the Guardian is better placed to put together existing content around a viral article, but still, I'm curious: are any museums currently managing to respond to sudden waves of interest in random objects?  And if so, how?

'Go forth and digitise' – Bill Thompson at OpenTech 2010

I've realised events like OpenTech are a bit like geek Christmas – a brief intense moment of brilliant fun with inspiring people who not only get what you're saying, they'll give you an idea back that'll push you further… then it's back to the inching progress of everyday life, but hopefully with enough of that event energy to make it all easier. Anyway, enough rambling and onto my sketchy notes from the talk. Stuff in square brackets is me thinking aloud, any mistakes are mine, etc.

Giving the Enlightenment Another Five Hundred Years, Bill Thompson
Session 3, Track A #3A
[A confession – working in a museum, and a science museum at that, I have a long-standing interest in conserving enough of the past to understand the present and plan for the future, and just because it's fascinating. It was ace to hear from someone passionate about the role of archives and cultural heritage in the defence of reason, and even more ace to see the tweets flying around as other people got excited about it too.]

The importance of the scientific method; of asking hard questions and looking for refutation not confirmation.

But surely history is all about progress – what could go wrong? But imagine President Palin… History has shown that it's possible for progress to go backwards.

What can we do? He's not speaking on behalf of the BBC here, but his job is to figure out what you can do with the BBC's archive. [Video of seeing the BBC charter – the powerful impact of holding the actual physical object is reason enough to conserve things from the past, it's an oddly visceral connection to the people who made it that I've noticed again and again while working in museums and archaeology.]

We need to remember. To remember is to understand, to resist. We need to digitise. Remembering comes along with digitising; our experience of the world is so mediated by bits that unless we makes archives digital in some form, there's a real danger that they will be forgotten, inaccessible. Also need to build mechanisms so that stuff that's created now are preserved alongside the records of the past. We need to do it all. If we do it well, we'll give current and future generations the evidence they need to resist the onslaught of ignorance, the tide of unreason that's sweeping the world. Need to create reasonable digitisation of solid artefacts too.

We need to do it soon 'because the kids may not want to'. The technology exists but thinks there's a real danger that if not done in the next ten years, it won't be done; people won't realise the value of the archives and understand why it has to be done. Kids who've grown up on Google will never do the deep research that will take them to the stuff that's not digitised; non-digital stuff will fall into disuse; conservation/preservation will stop.
Don't let Google do it, they don't value the right things.

Once it's in bits, preserve the data and the artefact; catalogue it, make it findable, make it usable – open data world meets open knowledge world. Access to APIs and datasets is important to make sure material can be found. If you know it's there you can ask for it to be digitised. Build layers on top of the assets that have been digitised.

Need to make it usable so have to sort out the rights fiasco… Need a place to put it all, not sure that exists yet. New tools, services, standards so it can be preserved forever and found in future. Not a trivial task but vitally important. The information in the archives supports true understanding. Possibility of doing something transformative at the moment. [He finished with:] 'Go forth and digitise. And don't forget the metadata'.

Crowdsourcing metadata seems like a good idea; V&A gets a shout-out for crowdsourcing image cropping [with an ad hoc description 'which one of these are in focus' – they might be horrified to hear their photography described like that. I got all excited that other people were excited about crowdsourcing metadata, because creating interfaces with game dynamics to encourage people to create content about collections is my MSc dissertation project.]

OCRing text in digitised images – amazing [I need to find a reference to that – if we can do it it'd instantly make our archives and 2D collections much more accessible and discoverable]

Question re Internet Archive – ans that it doesn't have enough curation – 'like throwing your archives down a well before the invaders arrive' – they might be there in a usable form when you come back for them, they might not be.

Question: preservation and digital archaeology are two different things, how closely are they aligned? [digital archaeology presumably not destructive though]

[And that's the end of my notes for that session, notes on the Guardian platform and game session to come]

Wellcome Library blog – a quick review

I originally wrote this a while ago, and a whole bunch of new content has been added by what seems like a range of authors, so it's worth checking out.

The Wellcome Library has a quite lovely blog. I like their 'item of the month', the way they're addressing common questions 'where do things come from', the list of latest aquisitions (though it's about as human readable as I feared it might be), a 'call for testing' when they've got newly digitised records up… it's a good example of transparency and the provision of access in practice. It feels a little as if you had a friend who worked there who sent on little tidbits they came across during the work. 

The site says it has (the uber-annoying) Snap Shots but it didn't seem to actually be interfering with my browsing experience when I checked it out today.

There's a Flickr stream too (though they haven't yet nabbed a name, so it's at the not-so-snappy http://www.flickr.com/photos/26127598@N04/) and it hasn't been updated since what looks like a big batch upload in May 2008. Some of the images are lovely, check out the human cancer cells, or neurons in the brain or historically important – such as the first DNA fingerprint.

They've just added Charles Babbage, I wonder if they have anything on Ada Lovelace they could highlight for Ada Lovelace Day.