In September I was invited to give a keynote at the Museum Theme Days 2016 in Helsinki. I spoke on ‘Reaching out: museums, crowdsourcing and participatory heritage. In lieu of my notes or slides, the video is below. (Great image, thanks YouTube!)
iNaturalist Bioblitz‘s are also more evidence for the value of time-limited challenges, or as they describe them, ‘a communal citizen-science effort to record as many species within a designated location and time period as possible’.
Survey of London and CASA launched the Histories of Whitechapel website, providing ‘a new interactive map for exploring the Survey’s ongoing research into Whitechapel’ and ‘inviting people to submit their own memories, research, photographs, and videos of the area to help us uncover Whitechapel’s long and rich history’.
New Zooniverse project Mapping Change: ‘Help us use over a century’s worth of specimens to map the distribution of animals, plants, and fungi. Your data will let us know where species have been and predict where they may end up in the future!’
New Europeana project Europeana Transcribe: ‘a crowdsourcing initiative for the transcription of digital material from the First World War, compiled by Europeana 1914-1918. With your help, we can create a vast and fully digital record of personal documents from the collection.’
‘Holiday pictures help preserve the memory of world heritage sites’ introduces Curious Travellers, a ‘data-mining and crowd sourced infrastructure to help with digital documentation of archaeological sites, monuments and heritage at risk’. Or in non-academese, send them your photos and videos of threatened historic sites, particularly those in ‘North Africa, including Cyrene in Libya, as well as those in Syria and the Middle East’.
I’ve added two new international projects, Les herbonautes, a French herbarium transcription project led by the Paris Natural History Museum, and Loki a Finnish project on maritime, coastal history to my post on Crowdsourcing the world’s heritage – as always, let me know of other projects that should be included.
Dillon, Justin, Robert B. Stevenson, and Arjen E. J. Wals, ‘Introduction: Special Section: Moving from Citizen to Civic Science to Address Wicked Conservation Problems’, Conservation Biology, 30 (2016), 450–55 <http://dx.doi.org/10.1111/cobi.12689> – has an interesting new model, putting citizen sciences ‘on a continuum from highly instrumental forms driven by experts or science to more emancipatory forms driven by public concern. The variations explain why citizens participate in CS and why scientists participate too. To advance the conversation, we distinguish between three strands or prototypes: science-driven CS, policy-driven CS, and transition-driven civic science.’
‘We combined Jickling and Wals’ (2008) heuristic for understanding environmental and sustainability education (Jickling & Wals 2008) and M. Fox and R. Gibson’s problem typology (Fig. 1) to provide an overview of the different possible configurations of citizen science (Fig. 2). The heuristic has 2 axes. We call the horizontal axis the participation axis, along which extend the possibilities (increasing from left to right) for stakeholders, including the public, to participate in setting the agenda; determining the questions to be addressed; deciding the mechanisms and tools to be used; choosing how to monitor, evaluate, and interpret data; and choosing the course of action to take. The vertical (goal) axis shows the possibilities for autonomy and self-determination in setting goals and objectives. The resulting quadrants correspond to a particular strand of citizen science. All three occupied quadrants are important and legitimate.’
* It’s a short list this month as I’ve been busy and things seem quieter over the northern hemisphere summer.
A quick signal boost for the collaborative notes taken at the DH2016 Expert Workshop: Beyond The Basics: What Next For Crowdsourcing? (held in Kraków, Poland, on 12 July as part of the Digital Humanities 2016 conference, abstract below). We’d emphasised the need to document the unconference-style sessions (see FAQ) so that future projects could benefit from the collective experiences of participants. Since it can be impossible to find Google Docs or past tweets, I’ve copied the session overview below. The text is a summary of key takeaways or topics discussed in each session, created in a plenary session at the end of the workshop.
Options, schemas and goals for text encoding
Encoding systems will depend on your goals; full-text transcription always has some form of encoding, data models – who decides what it is, and when? Then how are people guided to use it?Trying to avoid short-term solutions
UX, flow, motivation
Making tasks as small as possible; creating a sense of contribution; creating a space for volunteers to communicate; potential rewards, issues like badgefication and individual preferences. Supporting unexpected contributions; larger-scale tasks
Project scale – thinking ahead to ending projects technically, and in terms of community – where can life continue after your project ends
Finding and engaging volunteers
Using social media, reliance on personal networks, super-transcribers, problematic individuals who took more time than they gave to the project. Successful strategies are very-project dependent. Something about beer (production of Itinera Nova beer with label containing info on the project and link to website).
Ecosystems and automatic transcription
Makes sense for some projects, but not all – value in having people engage with the text. Ecosystem – depending on goals, which parts work better? Also as publication – editions, corpora – credit, copyright, intellectual property
Plenary session, possible next steps – put information into a wiki. Based around project lifecycle, critical points? Publication in an online journal? Updateable, short-ish case studies. Could be categorised by different attributes. Flexible, allows for pace of change. Illustrate principles, various challenges.
Crowdsourcing – asking the public to help with inherently rewarding tasks that contribute to a shared, significant goal or research interest related to cultural heritage collections or knowledge – is reasonably well established in the humanities and cultural heritage sector. The success of projects such as Transcribe Bentham, Old Weather and the Smithsonian Transcription Center in processing content and engaging participants, and the subsequent development of crowdsourcing platforms that make launching a project easier, have increased interest in this area. While emerging best practices have been documented in a growing body of scholarship, including a recent report from the Crowd Consortium for Libraries and Archives symposium, this workshop looks to the next 5 – 10 years of crowdsourcing in the humanities, the sciences and in cultural heritage. The workshop will gather international experts and senior project staff to document the lessons to be learnt from projects to date and to discuss issues we expect to be important in the future.
The workshop is organised by Mia Ridge (British Library), Meghan Ferriter (Smithsonian Transcription Centre), Christy Henshaw (Wellcome Library) and Ben Brumfield (FromThePage).
Probably the biggest news is the launch of citizenscience.gov, as it signals the importance of citizen science and crowdsourcing to the US government.
From the press release: ‘the White House announced that the U.S. General Services Administration (GSA) has partnered with the Woodrow Wilson International Center for Scholars (WWICS), a Trust instrumentality of the U.S. Government, to launch CitizenScience.gov as the new hub for citizen science and crowdsourcing initiatives in the public sector.
CitizenScience.gov provides information, resources, and tools for government personnel and citizens actively engaged in or looking to participate in citizen science and crowdsourcing projects. … Citizen science and crowdsourcing are powerful approaches that engage the public and provide multiple benefits to the Federal government, volunteer participants, and society as a whole.’
There’s also work to ‘standardize data and metadata related to citizen science, allowing for greater information exchange and collaboration both within individual projects and across different projects’.
The Science Gossip project is one year old, and they’re asking their contributors to decide which periodicals they’ll work on next and to start new discussions about the documents and images they find interesting.
I’ve seen a few interesting studentships and jobs posted lately, hinting at research and projects to come. There’s a funded PhD in HCI and online civic engagement and a (now closed) studentship on Co-creating Citizen Science for Innovation.
Apparently you can finish a thesis but you can’t stop scanning for articles and blog posts on your topic. Sharing them here is a good way to shake the ‘I should be doing something with this’ feeling.* This is a fairly random sample of recent material, but if people find it useful I can go back and pull out other things I’ve collected.
Back in September last year I blogged about the implications for cultural heritage and digital humanities crowdsourcing projects that used simple tasks as the first step in public engagement of advances in machine learning that mean that fun, easy tasks like image tagging and text transcription could be done by computers. (Broadly speaking, ‘machine learning’ is a label for technologies that allow computers to learn from the data available to them. It means they don’t have to specifically programmed to know how to do a task like categorising images – they can learn from the material they’re given.)
One reason I like crowdsourcing in cultural heritage so much is that time spent on simple tasks can provide opportunities for curiosity, help people find new research interests, and help them develop historical or scientific skills as they follow those interests. People can notice details that computers would overlook, and those moments of curiosity can drive all kinds of new inquiries. I concluded that, rather than taking the best tasks from human crowdsourcers, ‘human computation‘ systems that combine the capabilities of people and machines can free up our time for the harder tasks and more interesting questions.
I’ve been thinking about ‘ecosystems’ of crowdsourcing tasks since I worked on museum metadata games back in 2010. An ecosystem of tasks – for example, classifying images into broad types and topics in one workflow so that people can find text to transcribe on subjects they’re interested in, and marking up that text with relevant subjects in a final workflow – means that each task can be smaller (and thereby faster and more enjoyable). Other workflows might validate the classifications or transcribed text, allowing participants with different interests, motivations and time constraints to make meaningful contributions to a project.
The New York Public Library’s Building Inspector is an excellent example of this – they offer five tasks (checking or fixing automatically-detected building ‘footprints’, entering street numbers, classifying colours or finding place names), each as tiny as possible, which together result in a complete set of checked and corrected building footprints and addresses. (They’ve also pre-processed the maps to find the building footprints so that most of the work has already been done before they asked people to help.)
After teaching ‘crowdsourcing cultural heritage’ at HILT over the summer, where the concept of ‘ecosystems’ of crowdsourced tasks was put into practice as we thought about combining classification-focused systems like Zooniverse’s Panoptes with full-text transcription systems, I thought it could be useful to give some specific examples of ecosystems for human computation in cultural heritage. If there are daunting data cleaning, preparation or validation tasks necessary before or after a core crowdsourcing task, computational ecosystems might be able to help. So how can computational ecosystems help pre- and post-process cultural heritage data for a better crowdsourcing experience?
While older ecosystems like Project Gutenberg and Distributed Proofreaders have been around for a while, we’re only just seeing the huge potential for combining people + machines into crowdsourcing ecosystems. The success of the Smithsonian Transcription Center points to the value of ‘niche’ mini-projects, but breaking vast repositories into smaller sets of items about particular topics, times or places also takes resources. Machines can learn to classify source material by topic, by type, by difficulty or any other system that crowdsourcers can teach it. You can improve machine learning by giving systems ‘ground truth’ datasets with (for example) a crowdsourced transcription of the text in images, and as Ted Underwood pointed out on my last post, comparing the performance of machine learning and crowdsourced transcriptions can provide useful benchmarks for the accuracy of each method. Small, easy correction tasks can help improve machine learning processes while producing cleaner data.
Computational ecosystems might be able to provide better data validation methods. Currently, tagging tasks often rely on raw consensus counts when deciding whether a tag is valid for a particular image. This is a pretty crude measure – while three non-specialists might apply terms like ‘steering’ to a picture of a ship, a sailor might enter ‘helm’, ’tiller’ or ‘wheelhouse’, but their terms would be discarded if no-one else enters them. Mining disciplinary-specific literature for relevant specialist terms, or finding other signals for subject-specific expertise would make more of that sailor’s knowledge.
Computational ecosystems can help at the personal, as well as the project level. One really exciting development is computational assistance during crowdsourcing tasks. In Transcribing Bentham … with the help of a machine?, Tim Causer discusses TSX, a new crowdsourced transcription platform from the Transcribe Bentham and tranScriptorium projects. You can correct computationally-generated handwritten text transcription (HTR), which is a big advance in itself. Most importantly, you can also request help if you get stuck transcribing a specific word. Previously, you’d have to find a friendly human to help with this task. And from here, it shouldn’t be too difficult to combine HTR with computational systems to give people individualised feedback on their transcriptions. The potential for helping people learn palaeography is huge!
Better validation techniques would also improve the participants’ experience. Providing personalised feedback on the first tasks a participant completes would help reassure them while nudging them to improve weaker skills.
Most science and heritage projects working on human computation are very mindful of the impact of their choices on the participants’ experience. However, there’s a risk that anyone who treats human computation like a computer science problem (for example, computationally assigning tasks to the people with the best skills for them) will lose sight of the ‘human’ part of the project. Individual agency is important, and learning or mastering skills is an important motivation. Non-profit crowdsourcing should never feel like homework. We’re still learning about the best ways to design crowdsourcing tasks, and that job is only going to get more interesting.
It’s all too easy to forget that there are crowdsourcing projects in languages other than English so I thought I’d collect some projects related to cultural heritage, history and science here (following my definition of crowdsourcing in cultural heritage as ‘asking the public to help with tasks that contribute to a shared, significant goal or research interest related to cultural heritage collections or knowledge’). This list is drawn from my PhD research, but this is a fast-moving field and I was focusing on early modern England, so inevitably this list will be missing loads of examples. Please suggest links to help people discover new projects! Also, I’m often taking my best guess at the correct translation for terms, so please correct me if I’ve misunderstood.
You can ‘use the site’s comment features to share any supplements (such as citations to published works, transcription of notes not yet addressed, authorial attribution for a particular text, etc.) or remarks on the significance of the manuscript codices and contents’ to help Islamic Manuscripts at Michigan.
The Norwegian The Digital Inn is for ‘sources/documents digitised by institutions, associations or persons outside the organisation of the National Archives of Norway’ – a fantastic way of collecting the work that community historians are doing