Keynote online: ‘Reaching out: museums, crowdsourcing and participatory heritage’

In September I was invited to give a keynote at the Museum Theme Days 2016 in Helsinki. I spoke on ‘Reaching out: museums, crowdsourcing and participatory heritage’. In lieu of my notes or slides, the video is below. (Great image, thanks YouTube!)

Crowdsourcing in cultural heritage, citizen science – September 2016

More new projects and project updates I’ve noticed over September 2016.

Gillian Lattimore @Irl_HeritageDig has posted some of her dissertation research on Crowdsourcing Motivations in a GLAM Context: A Research Survey of Transcriber Motivations of the Meitheal Dúchas.ie Crowdsourcing Project. dúchas.ie is ‘a project to digitize the National Folklore Collection of Ireland, one of the largest folklore collections in the world’.

A long read on Brighton Pavilion and Museums’ Map The Museum, ‘#HeritageEveryware Map The Museum: connecting collections to the street’, includes some great insights from Kevin Bacon.

Meghan Ferriter and Christine Rosenfeld have produced a special edition of a journal, ‘Exploring the Smithsonian Institution Transcription Center’, with articles on ‘Crowdsourcing as Practice and Method in the Smithsonian Transcription Center’ and more.

Two YouGov posts on American and British people’s knowledge of their recent family history provide some useful figures on how many people in each region have researched family history.

Richard Light’s posted some interesting questions and feedback for crowdsourcing projects at The GB1900.org project – first look.

‘Archiving the Civil War’s Text Messages’ provides more information about the Decoding the Civil War project.

Zooniverse blog post ‘Why Cyclone Center is the CrockPot of citizen science projects’ gives some insight into why some projects appear ‘slower’ than others.

A December 2015 post, ‘How a citizen science app with over 70,000 users is creating local community’ (HT Jill Nugent @ntxscied), is an interesting contrast to ‘Volunteer field technicians are bad for wildlife ecology’. A nice quote from the first piece: ‘Young says that the number one thing that keeps iNaturalist users involved is the community that they create: “meeting other people who are into the same thing I am”’.

iNaturalist Bioblitzes are also more evidence for the value of time-limited challenges, or as they describe them, ‘a communal citizen-science effort to record as many species within a designated location and time period as possible’.

Micropasts continue to add historical and archaeological projects.

Survey of London and CASA launched the Histories of Whitechapel website, providing ‘a new interactive map for exploring the Survey’s ongoing research into Whitechapel’ and ‘inviting people to submit their own memories, research, photographs, and videos of the area to help us uncover Whitechapel’s long and rich history’.

New Zooniverse project Mapping Change: ‘Help us use over a century’s worth of specimens to map the distribution of animals, plants, and fungi. Your data will let us know where species have been and predict where they may end up in the future!’

New Europeana project Europeana Transcribe: ‘a crowdsourcing initiative for the transcription of digital material from the First World War, compiled by Europeana 1914-1918. With your help, we can create a vast and fully digital record of personal documents from the collection.’

‘Holiday pictures help preserve the memory of world heritage sites’ introduces Curious Travellers, a ‘data-mining and crowd sourced infrastructure to help with digital documentation of archaeological sites, monuments and heritage at risk’. Or in non-academese, send them your photos and videos of threatened historic sites, particularly those in ‘North Africa, including Cyrene in Libya, as well as those in Syria and the Middle East’.

I’ve added two new international projects to my post on Crowdsourcing the world’s heritage: Les herbonautes, a French herbarium transcription project led by the Paris Natural History Museum, and Loki, a Finnish project on maritime and coastal history. As always, let me know of other projects that should be included.

 

Survey of London site

Crowdsourcing in cultural heritage, citizen science – recent updates

A small* collection of links from the past little while.

Projects

  • A new Zooniverse project, Decoding the Civil War, launched in June: ‘Witness the United States Civil War by transcribing and deciphering messages and codes from the United States Military Telegraph’.
  • Another Zooniverse project, Camera CATalogue: ‘Analyze Wildlife Photos to Help Panthera Protect Big Cats’.

Articles

  • Palmer, Stuart, and Deb Verhoeven, ‘Crowdfunding Academic Researchers–the Importance of Academic Social Media Profiles’, in ECSM 2016: Proceedings of the 3rd European Conference on Social Media (Academic Conferences and Publishing International, 2016), pp. 291–299
  • Preece, Jennifer, ‘Citizen Science: New Research Challenges for Human–Computer Interaction’, International Journal of Human-Computer Interaction, 32 (2016), 585–612 <http://dx.doi.org/10.1080/10447318.2016.1194153>
  • Dillon, Justin, Robert B. Stevenson, and Arjen E. J. Wals, ‘Introduction: Special Section: Moving from Citizen to Civic Science to Address Wicked Conservation Problems’, Conservation Biology, 30 (2016), 450–55 <http://dx.doi.org/10.1111/cobi.12689> – has an interesting new model, putting citizen sciences ‘on a continuum from highly instrumental forms driven by experts or science to more emancipatory forms driven by public concern. The variations explain why citizens participate in CS and why scientists participate too. To advance the conversation, we distinguish between three strands or prototypes: science-driven CS, policy-driven CS, and transition-driven civic science.’

    ‘We combined Jickling and Wals’ (2008) heuristic for understanding environmental and sustainability education (Jickling & Wals 2008) and M. Fox and R. Gibson’s problem typology (Fig. 1) to provide an overview of the different possible configurations of citizen science (Fig. 2). The heuristic has 2 axes. We call the horizontal axis the participation axis, along which extend the possibilities (increasing from left to right) for stakeholders, including the public, to participate in setting the agenda; determining the questions to be addressed; deciding the mechanisms and tools to be used; choosing how to monitor, evaluate, and interpret data; and choosing the course of action to take. The vertical (goal) axis shows the possibilities for autonomy and self-determination in setting goals and objectives. The resulting quadrants correspond to a particular strand of citizen science. All three occupied quadrants are important and legitimate.’

    A heuristic of citizen science based on Wals and Jickling (2008). From Dillon, Justin, Robert B. Stevenson, and Arjen E. J. Wals (2016)

    * It’s a short list this month as I’ve been busy and things seem quieter over the northern hemisphere summer.

Crowdsourcing workshop at DH2016 – session overview

A quick signal boost for the collaborative notes taken at the DH2016 Expert Workshop: Beyond The Basics: What Next For Crowdsourcing? (held in Kraków, Poland, on 12 July as part of the Digital Humanities 2016 conference, abstract below). We’d emphasised the need to document the unconference-style sessions (see FAQ) so that future projects could benefit from the collective experiences of participants. Since it can be impossible to find Google Docs or past tweets, I’ve copied the session overview below. The text is a summary of key takeaways or topics discussed in each session, created in a plenary session at the end of the workshop.

Participant introductions and interests – live notes
Ethics, Labour, sensitive material

Key takeaway – questions for projects to ask at the start; don’t impose your own ethics on a project, discussing them is start of designing the project.

Where to start
Engaging volunteers, tips including online communities, being open to levels of contribution, being flexible, setting up standards, quality
Workflow, lifecycle, platforms
What people were up to, the problems with hacking systems together, iiif.io, flexibility and workflows
Public expertise, education, what’s unique to humanities crowdsourcing
The humanities are contestable! Responsibility to give the public back the results of the process in re-usable
Options, schemas and goals for text encoding
Encoding systems will depend on your goals; full-text transcription always has some form of encoding, data models – who decides what it is, and when? Then how are people guided to use it? Trying to avoid short-term solutions
UX, flow, motivation
Making tasks as small as possible; creating a sense of contribution; creating a space for volunteers to communicate; potential rewards, issues like badgefication and individual preferences. Supporting unexpected contributions; larger-scale tasks
Project scale – thinking ahead to ending projects technically, and in terms of community – where can life continue after your project ends
Finding and engaging volunteers
Using social media, reliance on personal networks, super-transcribers, problematic individuals who took more time than they gave to the project. Successful strategies are very project-dependent. Something about beer (production of Itinera Nova beer with label containing info on the project and link to website).
Ecosystems and automatic transcription
Makes sense for some projects, but not all – value in having people engage with the text. Ecosystem – depending on goals, which parts work better? Also as publication – editions, corpora – credit, copyright, intellectual property
Plenary session, possible next steps – put information into a wiki. Based around project lifecycle, critical points? Publication in an online journal? Updateable, short-ish case studies. Could be categorised by different attributes. Flexible, allows for pace of change. Illustrate principles, various challenges.

Short-term action: post introductions, project updates and new blog posts, research, etc to https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=CROWDSOURCING – a central place to send new conference papers, project blog posts, questions, meet-ups.

The workshop abstract:

Crowdsourcing – asking the public to help with inherently rewarding tasks that contribute to a shared, significant goal or research interest related to cultural heritage collections or knowledge – is reasonably well established in the humanities and cultural heritage sector. The success of projects such as Transcribe Bentham, Old Weather and the Smithsonian Transcription Center in processing content and engaging participants, and the subsequent development of crowdsourcing platforms that make launching a project easier, have increased interest in this area. While emerging best practices have been documented in a growing body of scholarship, including a recent report from the Crowd Consortium for Libraries and Archives symposium, this workshop looks to the next 5 – 10 years of crowdsourcing in the humanities, the sciences and in cultural heritage. The workshop will gather international experts and senior project staff to document the lessons to be learnt from projects to date and to discuss issues we expect to be important in the future.

Photo by Digital Humanities ‏@DH_Western

The workshop is organised by Mia Ridge (British Library), Meghan Ferriter (Smithsonian Transcription Center), Christy Henshaw (Wellcome Library) and Ben Brumfield (FromThePage).

If you’re new to crowdsourcing, here’s a reading list created for another event.

 

April news in crowdsourcing, citizen science, citizen history

Another quick post with news on crowdsourcing in cultural heritage, citizen science and citizen history in April(ish) 2016…

Acceptances for our DH2016 Expert Workshop: Beyond The Basics: What Next For Crowdsourcing? have been sent out. If you missed the boat, don’t panic! We’re taking a few more applications on a rolling basis to allow for people with late travel approval for the DH2016 conference in July.

Probably the biggest news is the launch of citizenscience.gov, as it signals the importance of citizen science and crowdsourcing to the US government.

From the press release: ‘the White House announced that the U.S. General Services Administration (GSA) has partnered with the Woodrow Wilson International Center for Scholars (WWICS), a Trust instrumentality of the U.S. Government, to launch CitizenScience.gov as the new hub for citizen science and crowdsourcing initiatives in the public sector.

CitizenScience.gov provides information, resources, and tools for government personnel and citizens actively engaged in or looking to participate in citizen science and crowdsourcing projects. … Citizen science and crowdsourcing are powerful approaches that engage the public and provide multiple benefits to the Federal government, volunteer participants, and society as a whole.’

There’s also work to ‘standardize data and metadata related to citizen science, allowing for greater information exchange and collaboration both within individual projects and across different projects’.

Other news:

Responses to questions about if the volunteers agreed that the Zooniverse… From Science Learning via Participation in Online Citizen Science

Have I missed something important? Let me know in the comments or @mia_out.

SXSW, project anniversaries and more – news on heritage crowdsourcing

Photo of programme
Our panel listing at SXSW

I’ve just spent two weeks in Texas, enjoying the wonderful hospitality and probing questions after giving various talks at universities in Houston and Austin before heading to SXSW. I was there for a panel on ‘Build the Crowdsourcing Community of Your Dreams’ (link to our slides and collected resources) with Ben Brumfield, Siobhan Leachman, and Meghan Ferriter. Siobhan, a ‘super-volunteer’ in more ways than one, posted her talk notes on ‘How cultural institutions encouraged me to participate in crowdsourcing & the factors I consider before donating my time’.

In other news, we (me, Ben, Meghan and Christy Henshaw from the Wellcome Library) have had a workshop accepted for the Digital Humanities 2016 conference, to be held in Kraków in July. We’re looking for people with different kinds of expertise for our DH2016 Expert Workshop: Beyond The Basics: What Next For Crowdsourcing? You can apply via this form.

One of the questions at our SXSW panel was about crowdsourcing in teaching, which reminded me of this recent post on ‘The War Department in the Classroom’ in which Zayna Bizri ‘describes her approach to using the Papers of the War Department in the classroom and offers suggestions for those who wish to do the same’. In related news, the PWD project is now five years old! There’s also this post on Primary School Zooniverse Volunteers.

The Science Gossip project is one year old, and they’re asking their contributors to decide which periodicals they’ll work on next and to start new discussions about the documents and images they find interesting.

The History Harvest project have released their Handbook (PDF).

The Danish Nationalmuseet is having a ‘Crowdsource4dk’ crowdsourcing event on April 9. You can also transcribe Churchill’s WWII daily appointments, 1939 – 1945 or take part in Old Weather: Whaling (and there’s a great Hyperallergic post with lots of images about the whaling log books).

I’ve seen a few interesting studentships and jobs posted lately, hinting at research and projects to come. There’s a funded PhD in HCI and online civic engagement and a (now closed) studentship on Co-creating Citizen Science for Innovation.

And in old news, this 1996 post on FamilySearch’s collaborative indexing is a good reminder that very little is entirely new in crowdsourcing.

From grey dots to trenches to field books – news in heritage crowdsourcing

Apparently you can finish a thesis but you can’t stop scanning for articles and blog posts on your topic. Sharing them here is a good way to shake the ‘I should be doing something with this’ feeling.* This is a fairly random sample of recent material, but if people find it useful I can go back and pull out other things I’ve collected.

Victoria Van Hyning, ‘What’s up with those grey dots?’ you ask – brief blog post on using software rather than manual processes to review multiple text transcriptions, and on the interface challenges that brings.

Melissa Terras, ‘Crowdsourcing in the Digital Humanities’ – pre-print PDF for a chapter in A New Companion to Digital Humanities.

Richard Grayson, ‘A Life in the Trenches? The Use of Operation War Diary and Crowdsourcing Methods to Provide an Understanding of the British Army’s Day-to-Day Life on the Western Front’ – a peer-reviewed article based on data created through Operation War Diary.

The Impact of Coordinated Social Media Campaigns on Online Citizen Science Engagement – a poster by Lesley Parilla and Meghan Ferriter reported on the Biodiversity Heritage Library blog.

Ben Brumfield, Crowdsourcing Transcription Failures – a response to a mailing list post asking ‘where are the failures?’

And finally, something related to my interest in participatory history commons: Martin Luther King Jr. Memorial Library – Central Library launches Memory Lab, a ‘DIY space where you can digitize your home movies, scan photographs and slides, and learn how to care for your physical and digital family heirlooms’. I was so excited when I heard about this project – it’s addressing such important issues. Jaime Mears is blogging about the project.

 

* How long after a PhD does it take for that feeling to go? Asking for a friend.

Exercises for ‘The basics of crowdsourcing in cultural heritage’

I’m running a workshop (at a Knowledge Exchange event organised by the Scottish Network on Digital Cultural Resources Evaluation and the Museums Galleries Scotland Digital Transformation Network) to help people get started with crowdsourcing in cultural heritage. These exercises are designed to give participants some hands-on experience with existing projects while developing their ability to discuss the elements of successful crowdsourcing projects. They are also an opportunity to appreciate the importance of design and text in marketing a project, and the role of user experience design in creating projects that attract and retain contributors.

Exercise: compare front pages

Choose two of the sites below to review.

The most important question to keep in mind is: how effective is the front page at making you want to participate in a project? How does it achieve that?

Exercise: try some crowdsourcing projects

Try one of the sites listed above; others are listed in this post; non-English language sites are listed here. You can also ask for suggestions!

Attributes to discuss include:

The overall ‘call to action’

  • Is the first step toward participating obvious?
  • Is the type of task, source material and output obvious?

Probable audience

  • Can you tell who the project wants to reach?
  • Does the text relate to their motivations for starting and continuing?
  • How are they rewarded?
  • Are there any barriers to their participation?

Data input and data produced

  • What kinds of tasks create that data?
  • How are contributions validated?

How productive, successful does the site seem overall?

Exercise: lessons from game design

  • Go to http://git.io/2048
  • Spend 2 minutes trying it out
  • Did you understand what to do?
  • Did you want to keep playing?

Exercise: your plans

Some questions to help make ideas into reality:

  • Who already loves and/or uses your collections?
  • Which material needs what kind of work?
  • Do any existing platforms meet most of your needs?
  • What potential barriers could you turn into tasks?
  • How will you resource community interaction?
  • How would a project support your mission, engagement strategy and digitisation goals?

How an ecosystem of machine learning and crowdsourcing could help you

Back in September last year I blogged about what advances in machine learning mean for cultural heritage and digital humanities crowdsourcing projects that use simple tasks as the first step in public engagement: fun, easy tasks like image tagging and text transcription could be done by computers. (Broadly speaking, ‘machine learning’ is a label for technologies that allow computers to learn from the data available to them. It means they don’t have to be specifically programmed to know how to do a task like categorising images – they can learn from the material they’re given.)

One reason I like crowdsourcing in cultural heritage so much is that time spent on simple tasks can provide opportunities for curiosity, help people find new research interests, and help them develop historical or scientific skills as they follow those interests. People can notice details that computers would overlook, and those moments of curiosity can drive all kinds of new inquiries. I concluded that, rather than taking the best tasks from human crowdsourcers, ‘human computation’ systems that combine the capabilities of people and machines can free up our time for the harder tasks and more interesting questions.

I’ve been thinking about ‘ecosystems’ of crowdsourcing tasks since I worked on museum metadata games back in 2010. An ecosystem of tasks – for example, classifying images into broad types and topics in one workflow so that people can find text to transcribe on subjects they’re interested in, and marking up that text with relevant subjects in a final workflow – means that each task can be smaller (and thereby faster and more enjoyable). Other workflows might validate the classifications or transcribed text, allowing participants with different interests, motivations and time constraints to make meaningful contributions to a project.

The New York Public Library’s Building Inspector is an excellent example of this – they offer five tasks (checking or fixing automatically-detected building ‘footprints’, entering street numbers, classifying colours or finding place names), each as tiny as possible, which together result in a complete set of checked and corrected building footprints and addresses. (They’ve also pre-processed the maps to find the building footprints so that most of the work has already been done before they asked people to help.)

Screenshot from NYPL's Building Inspector
Check building footprints in NYPL’s Building Inspector

After teaching ‘crowdsourcing cultural heritage’ at HILT over the summer, where the concept of ‘ecosystems’ of crowdsourced tasks was put into practice as we thought about combining classification-focused systems like Zooniverse’s Panoptes with full-text transcription systems, I thought it could be useful to give some specific examples of ecosystems for human computation in cultural heritage. If there are daunting data cleaning, preparation or validation tasks necessary before or after a core crowdsourcing task, computational ecosystems might be able to help. So how can computational ecosystems help pre- and post-process cultural heritage data for a better crowdsourcing experience?

While older ecosystems like Project Gutenberg and Distributed Proofreaders have been around for a while, we’re only just seeing the huge potential for combining people + machines into crowdsourcing ecosystems. The success of the Smithsonian Transcription Center points to the value of ‘niche’ mini-projects, but breaking vast repositories into smaller sets of items about particular topics, times or places also takes resources. Machines can learn to classify source material by topic, by type, by difficulty or any other system that crowdsourcers can teach them. You can improve machine learning by giving systems ‘ground truth’ datasets with (for example) a crowdsourced transcription of the text in images, and as Ted Underwood pointed out on my last post, comparing the performance of machine learning and crowdsourced transcriptions can provide useful benchmarks for the accuracy of each method. Small, easy correction tasks can help improve machine learning processes while producing cleaner data.
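To make the benchmarking idea a little more concrete, here is a minimal sketch of one way to compare a machine transcription and a crowdsourced transcription against a checked ‘ground truth’ text. The metric (character error rate, based on edit distance), the function names and the sample strings are all my own assumptions for illustration; they are not taken from any of the projects mentioned above.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the number of single-character insertions,
    deletions or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(candidate: str, ground_truth: str) -> float:
    """Edit distance divided by the length of the ground truth text."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(candidate, ground_truth) / len(ground_truth)


# Hypothetical sample data, invented for this sketch.
ground_truth = "the helm was lashed"
machine_text = "the helrn was lashed"  # a typical 'm' read as 'rn' error
crowd_text = "the helm was lashed"

print("machine:", character_error_rate(machine_text, ground_truth))
print("crowd:", character_error_rate(crowd_text, ground_truth))
```

A lower rate means a transcription is closer to the ground truth. A real project would probably want to normalise spacing, capitalisation and line breaks before comparing, and to average over many pages rather than a single line.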

Computational ecosystems might be able to provide better data validation methods. Currently, tagging tasks often rely on raw consensus counts when deciding whether a tag is valid for a particular image. This is a pretty crude measure – while three non-specialists might apply terms like ‘steering’ to a picture of a ship, a sailor might enter ‘helm’, ’tiller’ or ‘wheelhouse’, but their terms would be discarded if no-one else enters them. Mining disciplinary-specific literature for relevant specialist terms, or finding other signals for subject-specific expertise would make more of that sailor’s knowledge.
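As a rough illustration of the difference, here is a small sketch of consensus-based tag validation with an optional list of specialist terms. The function name, the agreement threshold of two and the vocabulary are all hypothetical; I am not describing how any particular tagging project works.

```python
from collections import Counter


def validate_tags(entries, min_agreement=2, specialist_terms=frozenset()):
    """Decide which tags to keep for one image.

    entries: tags entered by different participants for the same image.
    min_agreement: how many participants must enter a tag for it to pass
        on raw consensus alone (an assumed threshold).
    specialist_terms: lower-case terms mined from disciplinary sources
        (hypothetical); a single entry is enough for these.
    Returns a dict mapping each accepted tag to the reason it was accepted.
    """
    counts = Counter(tag.strip().lower() for tag in entries)
    accepted = {}
    for tag, count in counts.items():
        if count >= min_agreement:
            accepted[tag] = "consensus"
        elif tag in specialist_terms:
            accepted[tag] = "specialist vocabulary"
    return accepted


# Three non-specialists and one sailor tag a picture of a ship.
entries = ["steering", "Steering", "steering", "helm"]
maritime_terms = frozenset({"helm", "tiller", "wheelhouse"})

print(validate_tags(entries))
print(validate_tags(entries, specialist_terms=maritime_terms))
```

With raw consensus alone the sailor’s ‘helm’ is discarded; with the specialist vocabulary it is kept and labelled so that it can be reviewed separately. Where that vocabulary comes from, and who checks it, is the hard part that the sketch leaves out.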

Computational ecosystems can help at the personal, as well as the project level. One really exciting development is computational assistance during crowdsourcing tasks. In Transcribing Bentham … with the help of a machine?, Tim Causer discusses TSX, a new crowdsourced transcription platform from the Transcribe Bentham and tranScriptorium projects. You can correct transcriptions generated by handwritten text recognition (HTR), which is a big advance in itself. Most importantly, you can also request help if you get stuck transcribing a specific word. Previously, you’d have to find a friendly human to help with this task. And from here, it shouldn’t be too difficult to combine HTR with computational systems to give people individualised feedback on their transcriptions. The potential for helping people learn palaeography is huge!

Better validation techniques would also improve the participants’ experience. Providing personalised feedback on the first tasks a participant completes would help reassure them while nudging them to improve weaker skills.

Most science and heritage projects working on human computation are very mindful of the impact of their choices on the participants’ experience. However, there’s a risk that anyone who treats human computation like a computer science problem (for example, computationally assigning tasks to the people with the best skills for them) will lose sight of the ‘human’ part of the project. Individual agency is important, and learning or mastering skills is an important motivation. Non-profit crowdsourcing should never feel like homework. We’re still learning about the best ways to design crowdsourcing tasks, and that job is only going to get more interesting.

 

 

Crowdsourcing the world’s heritage

It’s all too easy to forget that there are crowdsourcing projects in languages other than English so I thought I’d collect some projects related to cultural heritage, history and science here (following my definition of crowdsourcing in cultural heritage as ‘asking the public to help with tasks that contribute to a shared, significant goal or research interest related to cultural heritage collections or knowledge’). This list is drawn from my PhD research, but this is a fast-moving field and I was focusing on early modern England, so inevitably this list will be missing loads of examples. Please suggest links to help people discover new projects! Also, I’m often taking my best guess at the correct translation for terms, so please correct me if I’ve misunderstood.

reseau-correct.fr correction
Correcting text from the Bibliothèque nationale de France on ‘Correct’.

 
