Crowdsourcing in cultural heritage, citizen science – recent updates

A small* collection of links from the past little while.

Projects

  • A new Zooniverse project, Decoding the Civil War, launched in June: 'Witness the United States Civil War by transcribing and deciphering messages and codes from the United States Military Telegraph'.
  • Another Zooniverse project, Camera CATalogue: 'Analyze Wildlife Photos to Help Panthera Protect Big Cats'.

Articles

  • Palmer, Stuart, and Deb Verhoeven, ‘Crowdfunding Academic Researchers–the Importance of Academic Social Media Profiles’, in ECSM 2016: Proceedings of the 3rd European Conference on Social Media (Academic Conferences and Publishing International, 2016), pp. 291–299
  • Preece, Jennifer, ‘Citizen Science: New Research Challenges for Human–Computer Interaction’, International Journal of Human-Computer Interaction, 32 (2016), 585–612 <http://dx.doi.org/10.1080/10447318.2016.1194153>
  • Dillon, Justin, Robert B. Stevenson, and Arjen E. J. Wals, ‘Introduction: Special Section: Moving from Citizen to Civic Science to Address Wicked Conservation Problems’, Conservation Biology, 30 (2016), 450–55 <http://dx.doi.org/10.1111/cobi.12689> – has an interesting new model, putting citizen sciences 'on a continuum from highly instrumental forms driven by experts or science to more emancipatory forms driven by public concern. The variations explain why citizens participate in CS and why scientists participate too. To advance the conversation, we distinguish between three strands or prototypes: science-driven CS, policy-driven CS, and transition-driven civic science.'

    'We combined Jickling and Wals’ (2008) heuristic for understanding environmental and sustainability education (Jickling & Wals 2008) and M. Fox and R. Gibson's problem typology (Fig. 1) to provide an overview of the different possible configurations of citizen science (Fig. 2). The heuristic has 2 axes. We call the horizontal axis the participation axis, along which extend the possibilities (increasing from left to right) for stakeholders, including the public, to participate in setting the agenda; determining the questions to be addressed; deciding the mechanisms and tools to be used; choosing how to monitor, evaluate, and interpret data; and choosing the course of action to take. The vertical (goal) axis shows the possibilities for autonomy and self-determination in setting goals and objectives. The resulting quadrants correspond to a particular strand of citizen science. All three occupied quadrants are important and legitimate.'

    A heuristic of citizen science based on Wals and Jickling (2008). From Dillon, Justin, Robert B. Stevenson, and Arjen E. J. Wals (2016)

    * It's a short list this month as I've been busy and things seem quieter over the northern hemisphere summer.

Crowdsourcing workshop at DH2016 – session overview

A quick signal boost for the collaborative notes taken at the DH2016 Expert Workshop: Beyond The Basics: What Next For Crowdsourcing? (held in Kraków, Poland, on 12 July as part of the Digital Humanities 2016 conference, abstract below). We'd emphasised the need to document the unconference-style sessions (see FAQ) so that future projects could benefit from the collective experiences of participants. Since it can be impossible to find Google Docs or past tweets, I've copied the session overview below. The text is a summary of key takeaways or topics discussed in each session, created in a plenary session at the end of the workshop.

Participant introductions and interests – live notes
Ethics, Labour, sensitive material

Key takeaway – questions for projects to ask at the start; don't impose your own ethics on a project, discussing them is the start of designing the project.

Where to start
Engaging volunteers, tips including online communities, being open to levels of contribution, being flexible, setting up standards, quality
Workflow, lifecycle, platforms
What people were up to, the problems with hacking systems together, iiif.io, flexibility and workflows
Public expertise, education, what’s unique to humanities crowdsourcing
The humanities are contestable! Responsibility to give the public back the results of the process in re-usable
Options, schemas and goals for text encoding
Encoding systems will depend on your goals; full-text transcription always has some form of encoding, data models – who decides what it is, and when? Then how are people guided to use it? Trying to avoid short-term solutions
UX, flow, motivation
Making tasks as small as possible; creating a sense of contribution; creating a space for volunteers to communicate; potential rewards, issues like badgefication and individual preferences. Supporting unexpected contributions; larger-scale tasks
Project scale – thinking ahead to ending projects technically, and in terms of community – where can life continue after your project ends
Finding and engaging volunteers
Using social media, reliance on personal networks, super-transcribers, problematic individuals who took more time than they gave to the project. Successful strategies are very project-dependent. Something about beer (production of Itinera Nova beer with label containing info on the project and link to website).
Ecosystems and automatic transcription
Makes sense for some projects, but not all – value in having people engage with the text. Ecosystem – depending on goals, which parts work better? Also as publication – editions, corpora – credit, copyright, intellectual property
Plenary session, possible next steps – put information into a wiki. Based around project lifecycle, critical points? Publication in an online journal? Updateable, short-ish case studies. Could be categorised by different attributes. Flexible, allows for pace of change. Illustrate principles, various challenges.

Short-term action: post introductions, project updates and new blog posts, research, etc to https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=CROWDSOURCING – a central place to send new conference papers, project blog posts, questions, meet-ups.

The workshop abstract:

Crowdsourcing – asking the public to help with inherently rewarding tasks that contribute to a shared, significant goal or research interest related to cultural heritage collections or knowledge – is reasonably well established in the humanities and cultural heritage sector. The success of projects such as Transcribe Bentham, Old Weather and the Smithsonian Transcription Center in processing content and engaging participants, and the subsequent development of crowdsourcing platforms that make launching a project easier, have increased interest in this area. While emerging best practices have been documented in a growing body of scholarship, including a recent report from the Crowd Consortium for Libraries and Archives symposium, this workshop looks to the next 5 – 10 years of crowdsourcing in the humanities, the sciences and in cultural heritage. The workshop will gather international experts and senior project staff to document the lessons to be learnt from projects to date and to discuss issues we expect to be important in the future.

Photo by Digital Humanities ‏@DH_Western

The workshop is organised by Mia Ridge (British Library), Meghan Ferriter (Smithsonian Transcription Center), Christy Henshaw (Wellcome Library) and Ben Brumfield (FromThePage).

If you're new to crowdsourcing, here's a reading list created for another event.

 

April news in crowdsourcing, citizen science, citizen history

Another quick post with news on crowdsourcing in cultural heritage, citizen science and citizen history in April(ish) 2016…

Acceptances for our DH2016 Expert Workshop: Beyond The Basics: What Next For Crowdsourcing? have been sent out. If you missed the boat, don't panic! We're taking a few more applications on a rolling basis to allow for people with late travel approval for the DH2016 conference in July.

Probably the biggest news is the launch of citizenscience.gov, as it signals the importance of citizen science and crowdsourcing to the US government.

From the press release: 'the White House announced that the U.S. General Services Administration (GSA) has partnered with the Woodrow Wilson International Center for Scholars (WWICS), a Trust instrumentality of the U.S. Government, to launch CitizenScience.gov as the new hub for citizen science and crowdsourcing initiatives in the public sector.

CitizenScience.gov provides information, resources, and tools for government personnel and citizens actively engaged in or looking to participate in citizen science and crowdsourcing projects. … Citizen science and crowdsourcing are powerful approaches that engage the public and provide multiple benefits to the Federal government, volunteer participants, and society as a whole.'

There's also work to 'standardize data and metadata related to citizen science, allowing for greater information exchange and collaboration both within individual projects and across different projects'.

Other news:

Responses to questions about if the volunteers agreed that the Zooniverse… From Science Learning via Participation in Online Citizen Science

Have I missed something important? Let me know in the comments or @mia_out.

SXSW, project anniversaries and more – news on heritage crowdsourcing

Photo of programme
Our panel listing at SXSW

I've just spent two weeks in Texas, enjoying the wonderful hospitality and probing questions after giving various talks at universities in Houston and Austin before heading to SXSW. I was there for a panel on 'Build the Crowdsourcing Community of Your Dreams' (link to our slides and collected resources) with Ben Brumfield, Siobhan Leachman, and Meghan Ferriter. Siobhan, a 'super-volunteer' in more ways than one, posted her talk notes on 'How cultural institutions encouraged me to participate in crowdsourcing & the factors I consider before donating my time'.

In other news, we (me, Ben, Meghan and Christy Henshaw from the Wellcome Library) have had a workshop accepted for the Digital Humanities 2016 conference, to be held in Kraków in July. We're looking for people with different kinds of expertise for our DH2016 Expert Workshop: Beyond The Basics: What Next For Crowdsourcing? You can apply via this form.

One of the questions at our SXSW panel was about crowdsourcing in teaching, which reminded me of this recent post on 'The War Department in the Classroom' in which Zayna Bizri 'describes her approach to using the Papers of the War Department in the classroom and offers suggestions for those who wish to do the same'. In related news, the PWD project is now five years old! There's also this post on Primary School Zooniverse Volunteers.

The Science Gossip project is one year old, and they're asking their contributors to decide which periodicals they'll work on next and to start new discussions about the documents and images they find interesting.

The History Harvest project have released their Handbook (PDF).

The Danish Nationalmuseet is having a 'Crowdsource4dk' crowdsourcing event on April 9. You can also transcribe Churchill's WWII daily appointments, 1939 – 1945 or take part in Old Weather: Whaling (and there's a great Hyperallergic post with lots of images about the whaling log books).

I've seen a few interesting studentships and jobs posted lately, hinting at research and projects to come. There's a funded PhD in HCI and online civic engagement and a (now closed) studentship on Co-creating Citizen Science for Innovation.

And in old news, this 1996 post on FamilySearch's collaborative indexing is a good reminder that very little is entirely new in crowdsourcing.

From grey dots to trenches to field books – news in heritage crowdsourcing

Apparently you can finish a thesis but you can't stop scanning for articles and blog posts on your topic. Sharing them here is a good way to shake the 'I should be doing something with this' feeling.* This is a fairly random sample of recent material, but if people find it useful I can go back and pull out other things I've collected.

Victoria Van Hyning, ‘What’s up with those grey dots?’ you ask – brief blog post on using software rather than manual processes to review multiple text transcriptions, and on the interface challenges that brings.

Melissa Terras, 'Crowdsourcing in the Digital Humanities' – pre-print PDF for a chapter in A New Companion to Digital Humanities.

Richard Grayson, 'A Life in the Trenches? The Use of Operation War Diary and Crowdsourcing Methods to Provide an Understanding of the British Army’s Day-to-Day Life on the Western Front' – a peer-reviewed article based on data created through Operation War Diary.

The Impact of Coordinated Social Media Campaigns on Online Citizen Science Engagement – a poster by Lesley Parilla and Meghan Ferriter reported on the Biodiversity Heritage Library blog.


Ben Brumfield, Crowdsourcing Transcription Failures – a response to a mailing list post asking 'where are the failures?'

And finally, something related to my interest in participatory history commons: Martin Luther King Jr. Memorial Library – Central Library launches Memory Lab, a 'DIY space where you can digitize your home movies, scan photographs and slides, and learn how to care for your physical and digital family heirlooms'. I was so excited when I heard about this project – it's addressing such important issues. Jaime Mears is blogging about the project.

 

* How long after a PhD does it take for that feeling to go? Asking for a friend.

Exercises for 'The basics of crowdsourcing in cultural heritage'

I'm running a workshop (at a Knowledge Exchange event organised by the Scottish Network on Digital Cultural Resources Evaluation and the Museums Galleries Scotland Digital Transformation Network) to help people get started with crowdsourcing in cultural heritage. These exercises are designed to give participants some hands-on experience with existing projects while developing their ability to discuss the elements of successful crowdsourcing projects. They are also an opportunity to appreciate the importance of design and text in marketing a project, and the role of user experience design in creating projects that attract and retain contributors.

Exercise: compare front pages

Choose two of the sites below to review.

The most important question to keep in mind is: how effective is the front page at making you want to participate in a project? How does it achieve that?

Exercise: try some crowdsourcing projects

Try one of the sites listed above; others are listed in this post; non-English language sites are listed here. You can also ask for suggestions!

Attributes to discuss include:

The overall 'call to action'

  • Is the first step toward participating obvious?
  • Is the type of task, source material and output obvious?

Probable audience

  • Can you tell who the project wants to reach?
  • Does the text relate to their motivations for starting or continuing?
  • How are they rewarded?
  • Are there any barriers to their participation?

Data input and data produced

  • What kinds of tasks create that data?
  • How are contributions validated?

How productive and successful does the site seem overall?

Exercise: lessons from game design

  • Go to http://git.io/2048
  • Spend 2 minutes trying it out
  • Did you understand what to do?
  • Did you want to keep playing?

Exercise: your plans

Some questions to help make ideas into reality:

  • Who already loves and/or uses your collections?
  • Which material needs what kind of work?
  • Do any existing platforms meet most of your needs?
  • What potential barriers could you turn into tasks?
  • How will you resource community interaction?
  • How would a project support your mission, engagement strategy and digitisation goals?

How an ecosystem of machine learning and crowdsourcing could help you

Back in September last year I blogged about what advances in machine learning mean for cultural heritage and digital humanities crowdsourcing projects that use simple tasks as the first step in public engagement: fun, easy tasks like image tagging and text transcription could be done by computers. (Broadly speaking, 'machine learning' is a label for technologies that allow computers to learn from the data available to them. It means they don't have to be specifically programmed to know how to do a task like categorising images – they can learn from the material they're given.)

One reason I like crowdsourcing in cultural heritage so much is that time spent on simple tasks can provide opportunities for curiosity, help people find new research interests, and help them develop historical or scientific skills as they follow those interests. People can notice details that computers would overlook, and those moments of curiosity can drive all kinds of new inquiries. I concluded that, rather than taking the best tasks from human crowdsourcers, 'human computation' systems that combine the capabilities of people and machines can free up our time for the harder tasks and more interesting questions.

I've been thinking about 'ecosystems' of crowdsourcing tasks since I worked on museum metadata games back in 2010. An ecosystem of tasks – for example, classifying images into broad types and topics in one workflow so that people can find text to transcribe on subjects they're interested in, and marking up that text with relevant subjects in a final workflow – means that each task can be smaller (and thereby faster and more enjoyable). Other workflows might validate the classifications or transcribed text, allowing participants with different interests, motivations and time constraints to make meaningful contributions to a project.

The New York Public Library's Building Inspector is an excellent example of this – they offer five tasks (checking or fixing automatically-detected building 'footprints', entering street numbers, classifying colours or finding place names), each as tiny as possible, which together result in a complete set of checked and corrected building footprints and addresses. (They've also pre-processed the maps to find the building footprints so that most of the work has already been done before people are asked to help.)

Screenshot from NYPL's Building Inspector
Check building footprints in NYPL's Building Inspector

After teaching 'crowdsourcing cultural heritage' at HILT over the summer, where the concept of 'ecosystems' of crowdsourced tasks was put into practice as we thought about combining classification-focused systems like Zooniverse's Panoptes with full-text transcription systems, I thought it could be useful to give some specific examples of ecosystems for human computation in cultural heritage. If there are daunting data cleaning, preparation or validation tasks necessary before or after a core crowdsourcing task, computational ecosystems might be able to help. So how can computational ecosystems help pre- and post-process cultural heritage data for a better crowdsourcing experience?

While older ecosystems like Project Gutenberg and Distributed Proofreaders have been around for a while, we're only just seeing the huge potential for combining people + machines into crowdsourcing ecosystems. The success of the Smithsonian Transcription Center points to the value of 'niche' mini-projects, but breaking vast repositories into smaller sets of items about particular topics, times or places also takes resources. Machines can learn to classify source material by topic, by type, by difficulty or any other system that crowdsourcers can teach them. You can improve machine learning by giving systems 'ground truth' datasets with (for example) a crowdsourced transcription of the text in images, and as Ted Underwood pointed out on my last post, comparing the performance of machine learning and crowdsourced transcriptions can provide useful benchmarks for the accuracy of each method. Small, easy correction tasks can help improve machine learning processes while producing cleaner data.
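To make the benchmarking idea concrete, here is a minimal sketch of one common way to compare transcriptions: character error rate, the number of single-character edits needed to turn a transcription into the 'ground truth', divided by the length of the ground truth. The function names and the sample line are invented for illustration, and a real project would probably normalise spelling and whitespace before comparing.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


def character_error_rate(candidate: str, ground_truth: str) -> float:
    """Edits needed to reach the ground truth, per ground-truth character."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(candidate, ground_truth) / len(ground_truth)


# Invented sample line, purely for illustration.
ground_truth = "Arrived at the harbour"
machine_output = "Arrived at tbe harbonr"
crowd_output = "Arrived at the harbor"

machine_cer = character_error_rate(machine_output, ground_truth)
crowd_cer = character_error_rate(crowd_output, ground_truth)
print(f"machine: {machine_cer:.3f}, crowd: {crowd_cer:.3f}")
```

A lower rate means fewer corrections are needed. Tracking the rate per page or per hand could also suggest which material to route to people and which to machines.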

Computational ecosystems might be able to provide better data validation methods. Currently, tagging tasks often rely on raw consensus counts when deciding whether a tag is valid for a particular image. This is a pretty crude measure – while three non-specialists might apply terms like 'steering' to a picture of a ship, a sailor might enter 'helm', 'tiller' or 'wheelhouse', but their terms would be discarded if no-one else enters them. Mining discipline-specific literature for relevant specialist terms, or finding other signals for subject-specific expertise, would make more of that sailor's knowledge.
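A rough sketch of the difference, using the ship example above: the first call applies a raw consensus count, the second also accepts tags found in a specialist vocabulary. The threshold, the function name and the vocabulary are all assumptions for illustration, not a description of how any particular tagging platform works.

```python
from collections import Counter


def validate_tags(entries, min_votes=2, specialist_terms=frozenset()):
    """Split the tags entered for one image into accepted and discarded sets.

    entries: list of (contributor_id, tag) pairs.
    min_votes: hypothetical consensus threshold.
    specialist_terms: hypothetical vocabulary mined from discipline-specific sources.
    """
    # Count each tag once per contributor so repeat entries don't inflate consensus.
    unique = {(contributor, tag.strip().lower()) for contributor, tag in entries}
    votes = Counter(tag for _, tag in unique)
    accepted = {
        tag for tag, count in votes.items()
        if count >= min_votes or tag in specialist_terms
    }
    return accepted, set(votes) - accepted


entries = [
    ("anna", "steering"), ("ben", "steering"), ("cy", "Steering"),
    ("sailor", "helm"), ("sailor", "tiller"), ("dee", "boat"),
]

# Raw consensus only: the sailor's terms are discarded.
raw_accepted, raw_discarded = validate_tags(entries)

# Consensus plus a specialist vocabulary: the sailor's terms are kept.
assisted_accepted, assisted_discarded = validate_tags(
    entries, specialist_terms={"helm", "tiller", "wheelhouse"}
)
```

With these sample entries the raw count keeps only 'steering', while the assisted version also keeps 'helm' and 'tiller'. A vocabulary match is only one possible signal; a contributor's track record on related images could be another.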

Computational ecosystems can help at the personal, as well as the project level. One really exciting development is computational assistance during crowdsourcing tasks. In Transcribing Bentham … with the help of a machine?, Tim Causer discusses TSX, a new crowdsourced transcription platform from the Transcribe Bentham and tranScriptorium projects. You can correct transcriptions generated computationally by handwritten text recognition (HTR), which is a big advance in itself. Most importantly, you can also request help if you get stuck transcribing a specific word. Previously, you'd have to find a friendly human to help with this task. And from here, it shouldn't be too difficult to combine HTR with computational systems to give people individualised feedback on their transcriptions. The potential for helping people learn palaeography is huge!

Better validation techniques would also improve the participants' experience. Providing personalised feedback on the first tasks a participant completes would help reassure them while nudging them to improve weaker skills.

Most science and heritage projects working on human computation are very mindful of the impact of their choices on the participants' experience. However, there's a risk that anyone who treats human computation like a computer science problem (for example, computationally assigning tasks to the people with the best skills for them) will lose sight of the 'human' part of the project. Individual agency is important, and learning or mastering skills is an important motivation. Non-profit crowdsourcing should never feel like homework. We're still learning about the best ways to design crowdsourcing tasks, and that job is only going to get more interesting.

 

 


Crowdsourcing the world's heritage

NB: this post was last updated 16 June 2024. In general, I add new sites but don't remove old sites that are no longer live. This post is now supplemented with another on National approaches to crowdsourcing / citizen science. I've also shared a 2015 list of 'participatory digital heritage sites' that includes many crowdsourcing sites. Contact me via my main website contact page to suggest a site.

It's all too easy to forget that there are international crowdsourcing projects in languages other than English so I thought I'd collect some projects related to cultural heritage, history and science here (following my definition of crowdsourcing in cultural heritage as 'asking the public to help with tasks that contribute to a shared, significant goal or research interest related to cultural heritage collections or knowledge'). This list is drawn from my PhD research, but this is a fast-moving field and I was focusing on early modern England, so inevitably this list will be missing loads of examples. Please suggest links to help people discover new projects! Also, I'm often taking my best guess at the correct translation for terms, so please correct me if I've misunderstood.

If you're interested in crowdsourcing in cultural heritage, my edited volume has chapters with lessons learnt from a range of projects.

English-language projects tend to be easier to find, but for completeness:

UK: irecord.org.uk/ (thanks Rita Singer)

USA: archives.gov/citizen-archivist and weather.gov/cle/CWOP (thanks @BuffaloResearch), crowd.loc.gov, transcription.si.edu/

  • 'Your project goes here' – what have I missed?
reseau-correct.fr correction
Correcting text from the Bibliothèque nationale de France on 'Correct'.

Three ways you can help with 'In their own words: collecting experiences of the First World War' (and a CENDARI project update)

Somehow it's a month since I posted about my CENDARI research project (in Moving forward: modelling and indexing WWI battalions) on this site. That probably reflects the rhythm of the project – less trying to work out what I want to do and more getting on with doing it. A draft post I started last month simply said, 'A lot of battalions were involved in World War One'. I'll do a retrospective post soon, and here's a quick summary of on-going work.

First, a quick recap. My project has two goals – one, to collect a personal narrative for each battalion in the Allied armies of the First World War; two, to create a service that would allow someone to ask 'where was a specific battalion at a specific time?'. Together, they help address a common situation for people new to WWI history who might ask something like 'I know my great-uncle was in the 27th Australian battalion in March 1916, where would he have been and what would he have experienced?'.
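As a minimal sketch of the second goal, the look-up could be as simple as filtering dated records for a unit. Everything here is an assumption for illustration: the record shape, the field names and the placeholder values are invented, and they are not the infobox structure being worked out on the wiki.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Posting:
    """One period a unit spent in one place (a hypothetical record shape)."""
    unit: str
    place: str
    start: date
    end: date  # inclusive
    activity: str = ""


def where_was(postings, unit, on):
    """Return every posting for `unit` whose date range covers the date `on`."""
    return [p for p in postings if p.unit == unit and p.start <= on <= p.end]


# Placeholder records, not historical data.
postings = [
    Posting("Example Battalion", "Place A", date(1916, 1, 1), date(1916, 3, 15), "training"),
    Posting("Example Battalion", "Place B", date(1916, 3, 16), date(1916, 6, 30), "front line"),
    Posting("Other Battalion", "Place C", date(1916, 1, 1), date(1916, 12, 31)),
]

matches = where_was(postings, "Example Battalion", date(1916, 3, 20))
for posting in matches:
    print(posting.place, posting.activity)
```

Returning a list rather than a single record allows for sources that disagree or for a unit split across several places. Linking each record to a personal narrative would connect the look-up to the first goal.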

I've been working on streamlining and simplifying the public-facing task of collecting a personal narrative for each battalion, and have written a blog post, Help collect soldiers’ experiences of WWI in their own words, that reduces it to three steps:

  1. Take one of the diaries, letters and memoirs listed on the Collaborative Collections wiki, and
  2. Match its author with a specific regiment or battalion.
  3. Send in the results via this form.

If you know of a local history society, family historian or anyone else who might be interested in helping, please send them along to this post: Help collect soldiers’ experiences of WWI in their own words.

Work on specifying the relevant data structures to support a look-up service to answer questions about a specific unit's location and activities at a specific time largely moved to the wiki:

You can see the infobox structures in progress by flipping from the talk to the Template tabs. You'll need to request an account to join in but more views, sample data and edge cases would be really welcome.

Populating the list of battalions and other units has been a huge task in itself, partly because very few cultural institutions have definitive lists of units they can (or want to) share, but it's necessary to support both core goals. I've been fortunate to have help (see 'Thanks and recent contributions' on 'How you can help') but the task is on-going so get in touch if you can help!

So there are three different ways you can help with 'In their own words: collecting experiences of the First World War':

Finally, last week I was in New Zealand to give a keynote on this work at the National Digital Forum. The video for 'Collaborative collections through a participatory commons' is online, so you can catch up on the background for my project if you've got 40 minutes or so to spare. Should you be in Dublin, I'm giving a talk on 'A pilot with public participation in historical research: linking lived experiences of the First World War' at the Trinity Long Room Hub today (thus the poster).

And if you've made it this far, perhaps you'd like to apply for a CENDARI Visiting Research Fellowship 2015 yourself?

Moving forward: modelling and indexing WWI battalions

A super-quick update from my CENDARI Fellowship this week. I set up the wiki for In their own words: linking lived experiences of the First World War a week ago but only got stuck into populating it with lists of various national battalions this week. My current task list, copied from the front page, is to:

If you can help with any of that, let me know! Or just get stuck in and edit the site.
I've started another Google Doc with very sketchy Notes towards modelling information about World War One Battalions. I need to test it with more battalion histories and update it iteratively. At this stage my thinking is to turn it into an InfoBox format to create structured data via the wiki. It's all very lo-fi and much less designed than my usual projects, but I'm hoping people will be able to help regardless.
So, in this phase of the project, the aim is to find a personal narrative – a diary, letters, memoirs or images – for each military unit in the British Army. Can you help?