Talk notes for #AIUK on the British Library and crowdsourcing

I had a strict five-minute slot for my talk in the panel on 'Reimagining the past with AI' at Turing's AI UK event today, so I wrote out my notes and thought I might as well share them…

The panel blurb was 'The past shapes the present and influences the future, but the historical record isn’t straightforward, and neither are its digital representations. Join the AHRC project Living with Machines and friends on their journey to reimagine the past through AI and data science and the challenges and opportunities within.' It was a delight to chat with Dave Beavan, Mariona Coll Ardanuy, Melodee Wood and Tim Hitchcock.

My prepared talk: A bit about the British Library for those who aren't familiar with it. It's one of the two biggest libraries in the world, and it’s the national library for the UK. 
 
Its collections are vast – somewhere between 180 and 200 million collection items, including 14 million books; hundreds of terabytes of archived websites; and over 600,000 bound volumes of historical newspapers (of which about 60 million pages have been digitised with partners FindMyPast so far)…
 
We've been working with crowdsourcing – which we defined as working with the public on tasks that contribute to a shared, significant goal related to cultural heritage collections or knowledge – for about a decade now. We've collected local sounds and accents around Britain, georeferenced gorgeous historical maps, matched card catalogue records in Urdu and Chinese to digital catalogue records, and brought the history of theatre across the UK to life via old playbills. 
 
Some of our crowdsourcing work is designed to help improve the discoverability of cultural heritage collections, and some, like our work with Living with Machines, is designed to build datasets to help answer wider research questions. 
 
In all cases, our work with crowdsourcing is closely aligned with the BL's mission: it helps make our shared intellectual heritage available for research, inspiration and enjoyment. 
 
We think of crowdsourcing activities as a form of digital volunteering, where participation in the task is rewarding in its own right. Our crowdsourcing projects are a platform for privileged access and deeper engagement with our digitised collections. They're an avenue for people who wouldn't normally encounter historical records close up to work with them, while helping make those items easier for others to access.
 
Through Living with Machines, we've worked out how to design tasks that fit into computational linguistic research questions and timelines… 
 
So that's all great – but… the scale of our collections is hard to ignore. Individual crowdsourcing tasks that make items more accessible by transcribing or classifying items are beyond the capacity of even the keenest crowd. Enter machine learning, human computation, human in the loop… 
 
While we're keen to start building systems that combine machine learning and human input to help scale up our work, we don't want to buy into terms like 'crowdworkers' or 'gig work' that we see in some academic and commercial work. If crowdsourcing is a form of public engagement, as well as a productive platform for tasks, we can't think of our volunteers as 'cogs' in a system.
 
We think that it's important to help shape the future of 'human computation' systems, to ensure that work on machine learning / AI is in alignment with Library values. We look to work that peers at the Library of Congress are doing to create human-in-the-loop systems that 'cultivate responsible practices'.
 
We want to retain the opportunities for the public to get started with simpler tasks based on historical collections, while also being careful not to 'waste clicks' by having people do tasks that computers can do faster. 
 
With Living with Machines, we've built tasks that provide opportunities for participants to think about how their classifications form training datasets for machine learning. 
 
So my question for the next year is: how can we design human computation systems that help participants acquire new literacies and skills, while scaling up and amplifying their work?

[Image: screenshot of the Zoom view from the conference stage, with a large green clock and red countdown timer. Caption: The conference 'backstage' view on Zoom]
